college/Spring-2024/CS-2124/Lectures.org

* Lecture 1

[2024-01-16 Tue]

** Recommended Books

- Data Structures Using C and C++, by Langsam, Augenstein, and Tenenbaum
- Data Structures and Algorithm Analysis in C, Mark Allen Wiss

** Office Hours

- Tuesday & Thursdays, 3pm - 4pm

** Attendance

- Not mandatory
- Missing will cause a loss of /bonus/ points

** Late Work

- Point reduction for late work

** Why Study Data Structures


*** Application

- Big Data
- Data is everything and it must be managed to extract information
- Applications, websites must be optimized (Data access)

*** Student

- Fundamental
- Develops thinking for programming
- Improves solving problems with better time complexities (performance)
- Many self-taught programmer lack fundamentals of Computer Science
- Popular technologies change, Data Structures or Analysis of Algorithms remain the same

*** Textbook Definition

- Refers to a scheme for organizing related pieces of information
- Basic types of data structures include:
  - Files / lists
  - Arrays / Records
  - Trees / Tables
  - Graphs

** Types of Data Structures


*** Structures & Unions

- Stuctures
  Contains ordered group of data objects, each data object in a structure is a /member/ or a /field/.
- Union
  Similar to a structure except that all of its members start at the same location in memory. A
  union variable can represent the value of only one of its members at a time.

*** Graphs and Trees

- Graphs and Trees are linked abstract data structures composed of nodes.
- Each node contains a value and one or more pointers to other nodes arranged in a hierarchy.
- Graphs can be used to represent networks, while variants of trees can be used for sorting and
  searching.

*** Data Structure Operations

1. Traversal
2. Searching
3. Insertion
4. Deletion
5. Sorting
6. Merging

*** Planning a Computer Program

- As a programmer you are not supposed to start directly by coding
- The most important part of programming is brain storming on how to solve the problem
  - First step can be on paper
  - Technically we term such steps as *Pseudocode*
  - Some programmers also use Algorithm to solve the issue on paper, then start programming

*** Algorithm Specification

- An *algorithm* is a finite set of instructions that, if followed, accomplishes a particular
  task.
- All Algorithms must satisfy the following criteria:
  1. Input
  2. Output 3. Defi

  5. Effectiveness

*** How Programs Solve Problems

- Program Flow Control
  - The order in which program statements are executed
- Heuristics
  - Some problems are very complex or no algorithm exist to solve some problems, at such
    conditions programmers rely on heuristics
  - Intrusion Detection Systems can rely on heuristics to identify attacks
  - Heuristics are basically identified patterns or elements to assist in creating a solution to
    some problem

* Lecture 2

[2024-01-23 Tue]

- Time and Space Complexity
- Intro to Asymptotic Notations
- Big O Notation
- Searching
  - Binary Search
  - Linear Search

** Time and Space Complexity

*** Time Complexity
- Time taken by an algorithm for execution
  - Process of determining how processing time increases as the /size of the problem/ (input size)
    increases
  - Generally time complexity is expressed by keeping only the /values which affects runtime most/.

- *For example*, if time complexity for a pgrogram needs to be calculated as a function of (i.e.
  $n^4 + n^3 + n^2 + n = n^4$) as all terms are small and they have a lesser impact on overall
  computation time when compared with $n^4$

*** Space Complexity

- Memory required by an algorithm to execute a program
  - Space complexity is the total amount of /memory space used/ by an algorithm/pgroam including
    the space of input values for execution

*** Time and Space Complexity Importance

- Intrusion Detection Systems must process gigabits of data or more with minimal latency, time
  complexity is important for this
- Handling potentially petabytes of data, memory complexity is important here

*** Data Structures

- Data Structures are necessary for designing efficient algorithms
  - Provides reusability and abstraction
  - Appropriate data structures helps programmers save time and space
  - Assists in optimizing data manipulation (i.e. add, remove, edit large amounts of data)

- For example: A tree based data structure is ideal for college course info storage in a program
  - Determining what courses need to be taken, courses have "dependencies" on other courses that
    could be more than a single course dependency

*** Intro to Asymptotic Notations

- Tells us how good an algorithm is when compared with another algorithm
- Parameters can play a part
  - i.e. hardware used for implementation, Operating System, CPU model, processor generation,
    etc.
- Therefore, we use Asymptotic analysis to compare space and time complexity
- Analyzes two algorithms based on changes in their performance concerning the increment or
  decrement in the input size

- *Big O* ($O()$) describes the upper bound of complexity
  - Worst case scenario

- Runtime usually depends on the size of the input:
  - $T(n)$: /the time taken on an input of size $n$.

- Asymptotic analysis considers the growth of $T(n)$

*** Big O Notation

- Worst case scenario (analyzes algorithm's upper bound)
- Best-case scenario is not considered for use in a comparative analysis
  - That's why we employ worst-case scenarios to get meaningful input
- Algorithm in data structure while programming code is critical
  - Big O makes it easier to compare algorithms

- *Big O* notation, $O(g(n))$ is a collection of functions
- A function $f(n)$ is a member of that collection only if it fits the following criteria:
  - Constant $c$ and $n_0$ exist where $f(n) <= c.g(n)$ for all $n >= n_0$ Where '$c$'
    represents constant values

- $O(f(n))$ describes the upper bound of $f(n)$, /worst-case scenario/
- $\omega(f(n))$ describes the lower bound of $f(n)$, /best-case scenario/

**** Why Do We Need Big O?

- World we live in today consists of complicated apps & software, each running various devices
  and each has different capabilities
- Some devices like desktops can run heavy machine learning software, but others like phones can
  only run apps
- WHen you create an application, you'll need to optimize your code so that it runs smoothly
  across devices to give you an edge over your competitors

**** Computing Big O Notation

- *Big-O* Asymptotic Notation gives up Upper Bound:
  1. Determine what the input is and what '$n$' represents (i.e. $f(n)=O(g(n))$)
  2. Identify maximum number of operations the algorithm performs in terms of '$n$'. (i.e.
     addition of two numbers is just 1 operation)
  3. Eliminate all excluding the highest order terms. (i.e. if you have $n^4$ and $n^3$ consider
     only $n^4$)
  4. Remove all constant factors. Constants will remain /constant/ regardless of user input
- Basically *Big-O* is used to measure and compare worst-case scenarios of algorithms


**** Example Big O Notations

| Big O Notation          | Example                        |
|-------------------------|--------------------------------|
| Constant: $O(c)$        | $O(1)$                         |
| Logarithmic $O(log(n))$ | $n=20$ means $log(20) = 2.996$ |
| Linear: $O(n)$          | $n=20$ means $20$              |
| Quadratic: $O(n^2)$     | $n=20$ means $20^2 = 400$      |
| Exponential: $O(2^n)$   | $n=20$ means $2^20$ = 1084576  |
| Factorial: $O(n!)$      | $N=20$ means $20!$             |

**** Example Program

#+begin_src c
#include <stdio.h>
int main() {
    int n;
    printf("N = ");
    scanf("%d", &n);
    printf("Got: %d\n", n);
    int a[n];
    for (int i = 0; i < n; i++)
      printf("a[%d] = %d \n", i, a[n]=i-1)
}
#+end_src

- Big O of this is $O(n)$

*** Searching

- Searching in data structures refers to the process of finding the location of an element in a
  list
- One of the important parts of many data structures algorithms, as one operation can be
  performed on an element if and only if we find it
- We do not want searching to take '$n$' steps for searching an array of '$n$' number of
  elements
  - In some cases we are bound to take '$n$' steps
  - Different algorithms try to minimize the number of steps to search an element

**** Binary Search

- Divide and conquer approach
- Requires the data to be sorted
- In sequential search, when we compare against the first item, there are at most more items to
  look through if the first item is not what we are looking for
  - Instead of searching the list in sequence, a /binary search/ will start by examining the
    middle term
  - If that term is the one we are searching for, we are done
  - If it is not the correct term, we can use the ordered nature of the list to eliminate half
    of the remaining items
  - If the term we are searching for is greater then the middle item, we know that the entire
    lower half of the list as well as the middle item can be eliminated from further
    consideration
  - The term, if it is the list, must be in the upper half

***** Algorithm Steps

#+begin_src python
def binary_search(arr: list[int], term: int, low: int, high: int) -> int:
    mid = (low + high)/2
    while low <= high:
      if (low > high):
        raise("Unable to find the search term!")
      else if (arr[mid] < term):
        low = mid + 1
      else if (arr[mid] == term):
        return mid

      else
        high = mid - 1;
        mid = (low + high) / 2

binary_search([0, 1, 2, 3, 4], 0, 0, 5) # Outputs: 5
#+end_src

**** Linear Search

- The Linear Search (sequential search) algorithm starts at one end of a list and goes through
  each element of a list until the desired element is found, otherwise the search continues till the
  end of the data set
- Does not require data to be sorted
- Poor *Big-O* complexity: $O(n)$

#+begin_src python
def linear_search(arr: list[int], term: int) -> int:
for i in range(0, len(arr)):
    if arr[i] == term:
        return i
raise(f"Unable to find {term} in array!")
#+end_src

* Lecture 2


** Big O Notation

*** $O(log(n))$

- Divide and conquer
- If the base is not specified in CS, assume a base of $2$: $O(log_2(n))$
- Binary search is an algorithm that is of $O(log(n))$ complexity


** Sorting Algorithms

- Sorting refers to arranging data in a particular format
- Many search algorithms depend on sorted data, hence /sorting/
- In general there are *2 approaches* to sort an array of elements:
  1. Some algorithms work by moving elements to their final position, one at a time. You sort an
     array of size N, put 1 item in place, and continue sorting an array of size N - 1.
     - Memory efficient
     - Performance inefficient
  2. Some algorithms put items into a temporary position, close(r) to their final position. YOu
     rescan, moving items closer to the final position with each iteration.
     - Memory inefficient
     - Performance efficient

*** Complexity and Running Time

- Factors:
  1. Algorithmic complexity
  2. Additional space requirements
  3. Use of recursion
     - Have to be careful with recursion, can easily spiral out into an $b^n$ complexity
  4. Worst-case behavior
     - Worst-case behavior is important for real-time systems that need guaranteed performance
  5. Behavior on already-sorted or nearly-sorted data

*** Stable vs Unstable Sorting

- A stable sort is one which preserves the original order of the input set
  - Elements of same value will be in order
- An unstable sort does not preserve order of elements
  - Ordering will be only on the sorted value, original order is not respected

*** In-place and Out-of-place Sorting

- An *In-place algorithm* modifies the inputs, which can be a list or an array, without using
  any additional memory As the algorithm runs, the input is usally overwritten by the output, so no
  additional space is required.
  - In-place algorithms may take some memory, like using some variables for its operation
  - Overall, it takes constant memory. Space complexity of $O(1)$
- An algorithm that is not in place is called a *not-place* or *out-of-place* algorithm. These
  sorting algorithms use =extra space= for sorting, which depends upon the size of the input

*** Bubble Sort

- *Bubble Sort* works by repeatedly swapping adjacent elements if they are in the wrong order
- Not suitable for large data sets as its average and worst-case complexity is quite high
- Bubble sort is an In-place and Stable sorting algorithm
- Big O's:
  - Time Complexity: $O(n^2)$
  - Space Complexity: $O(n^2)$
- Steps:
  1. Walk through the array n-times
  2. As you walk through the array, check if the current element and it's next neighbor are out
     of order
  3. If they are out of order, swap them

*** Selection Sort

- *Selection Sort* is an *in-place* algorithm in which the list is divided into two parts
  1. The sorted part at the left end
  2. The unsorted part at the right end
- The smallest element is selected from the unsorted array and swapped with the leftmost
  element, and that element becoems a part of the sorted array. This process continues moving
  unsorted array boundary by one element to the right.
- Selection sort is generally preferred over Bubble Sort
- Big O's:
  - Time Complexity: $O(n^2)$
  - Space Complexity: $O(1)$
- Steps:
  1. Set ~MIN~ to location $0$
  2. Search the minimum element in the list
  3. Swap with value at location ~MIN~
  4. Increment ~MIN~ to point to next element
  5. Repeat until list is sorted

*** Insertion Sort

# TODO: Finish this section out
- Insertion sort is a simple sorting algorithm that works similar to the way you sort playing
  cards in your hands
- The array is virtually split into a sorted and an unsorted part
- Values from the unsorted part are picked and placed at the correct position in the sorted part
- Steps:
  1. If it is the first element, it is already sorted. return 1;
  2. Pick next element
  3. Compare with all elements in the sorted sub-list
  4. Shift all elements in the sorted sub-list that is greater than the value to be sorted

*** Algorithm Comparison

| Bubble Sort                   | Selection Sort                                         | Insertion Sort                        |
|-------------------------------|--------------------------------------------------------|---------------------------------------|
| Simple Sorting Algorithm      | Simple Sorting Algorithm                               | Simple Sorting Algorithm              |
| Compares Neighboring Elements | Takes the smallest element and moves it into its place | Transfer one element at a time to its |

*** Merge Sort

- Divide and Conquer Algorithm
- Works by dividing an array into smaller subarrays, sorting each subarray, and then merging the
  sorted subarrays back together
- Does not check if the data is already sorted.
- Steps:
  1. Find middle index of the array: ~Middle = 1 + (last-first)/2~
  2. Divide the array from the middle
  3. Call merge sort for the first half of the array: ~MergeSort(array, first, middle)~
  4. Call merge sort for the second half of the array: ~MergeSort(array, middle + 1, last)~
  5. Merge the two sorted halves into a single sorted array

- Big O's:
  - Time Complexity: $nlog(n)$
  - Space Complexity: $O(n)$

*** Quick Sort

- Divide and Conquer Algorithm
- Picks an element as a pivot and partitions the given array around the picked pivot
- There are many different versions of Quick Sort that pick pivot in different ways:
  1. Always pick the first element as a pivot
  2. Always pick the last element as a pivot
  3. Pick a random elemt as a pivot
  4. Pick median as the pivot
- Big O's:
  - Time Complexity: $O(n^2)$, average case is $O(nlog(n))$
- Steps:
  1. Pick an element from the array as the pivot
  2. Divide the unosrted array of elements in two arrays
     a) Values less than the pivot come in the first sub array
     b) Values greater than the pivot come in the second sub-array
  3. Recursively repeat step ~2~ (until the sub-arrays are sorted)

* Lecture 4

** Methods for Reducing Complexity

- Goes along with the *Simplicity* Design Principle
- They are:
  1. Abstraction
  2. Modularity
  3. Layering
  4. Hierarchy

** Stack

- What is a stack
  - A data structure.
  - New items are inserted on the "top" and deleted or removed from the top. Top is the most
    recent item inserted.
  - A stack is a Last In First Out (LIFO) or First In Last Out (FILO) data structure. This means
    that the last item to get stored on the stack (often called Push operation) is the first one
    to get out of it (often called as Pop operation).

** Stack Operations

1. *push*: Adds an element to the top of the stack
2. *pop*: Removes the topmost element from the stack
3. *isEmpty*: Checks whether the stack is empty
4. *isFull*: checks whether the stack is full
5. *top*: Displays the topmost element of the stack
6. *Peek*: View the top element on the stack

- *Important* things to remember when wokring with stacks:
  - Initially, a pointer (top) is set to keep track of the topmost itemi n the stack. The stack
    is initialized to -1
  - Then, a check is performed to determine if the stack is empty by comparing top to -1
  - As elements are added to the stack, the position of the top is updated
  - As soon as elements are popped or deleted, the topmost element is removed and the position of
    top is updated
- There are *two ways to implement* a stack:
  1. Using a =array=
  2. Using a =linked list=