college/Spring-2023/CS-2124/Lectures.org

17 KiB

Lecture 1

[2024-01-16 Tue]

Recommended Books

  • Data Structures Using C and C++, by Langsam, Augenstein, and Tenenbaum
  • Data Structures and Algorithm Analysis in C, Mark Allen Wiss

Office Hours

  • Tuesday & Thursdays, 3pm - 4pm

Attendance

  • Not mandatory
  • Missing will cause a loss of bonus points

Late Work

  • Point reduction for late work

Why Study Data Structures

Application

  • Big Data
  • Data is everything and it must be managed to extract information
  • Applications, websites must be optimized (Data access)

Student

  • Fundamental
  • Develops thinking for programming
  • Improves solving problems with better time complexities (performance)
  • Many self-taught programmer lack fundamentals of Computer Science
  • Popular technologies change, Data Structures or Analysis of Algorithms remain the same

Textbook Definition

  • Refers to a scheme for organizing related pieces of information
  • Basic types of data structures include:

    • Files / lists
    • Arrays / Records
    • Trees / Tables
    • Graphs

Types of Data Structures

Structures & Unions

  • Stuctures Contains ordered group of data objects, each data object in a structure is a member or a field.
  • Union Similar to a structure except that all of its members start at the same location in memory. A union variable can represent the value of only one of its members at a time.

Graphs and Trees

  • Graphs and Trees are linked abstract data structures composed of nodes.
  • Each node contains a value and one or more pointers to other nodes arranged in a hierarchy.
  • Graphs can be used to represent networks, while variants of trees can be used for sorting and searching.

Data Structure Operations

  1. Traversal
  2. Searching
  3. Insertion
  4. Deletion
  5. Sorting
  6. Merging

Planning a Computer Program

  • As a programmer you are not supposed to start directly by coding
  • The most important part of programming is brain storming on how to solve the problem

    • First step can be on paper
    • Technically we term such steps as Pseudocode
    • Some programmers also use Algorithm to solve the issue on paper, then start programming

Algorithm Specification

  • An algorithm is a finite set of instructions that, if followed, accomplishes a particular task.
  • All Algorithms must satisfy the following criteria:

    1. Input
    2. Output 3. Defi
    3. Effectiveness

How Programs Solve Problems

  • Program Flow Control

    • The order in which program statements are executed
  • Heuristics

    • Some problems are very complex or no algorithm exist to solve some problems, at such conditions programmers rely on heuristics
    • Intrusion Detection Systems can rely on heuristics to identify attacks
    • Heuristics are basically identified patterns or elements to assist in creating a solution to some problem

Lecture 2

[2024-01-23 Tue]

  • Time and Space Complexity
  • Intro to Asymptotic Notations
  • Big O Notation
  • Searching

    • Binary Search
    • Linear Search

Time and Space Complexity

Time Complexity

  • Time taken by an algorithm for execution

    • Process of determining how processing time increases as the size of the problem (input size) increases
    • Generally time complexity is expressed by keeping only the values which affects runtime most.
  • For example, if time complexity for a pgrogram needs to be calculated as a function of (i.e. $n^4 + n^3 + n^2 + n = n^4$) as all terms are small and they have a lesser impact on overall computation time when compared with $n^4$

Space Complexity

  • Memory required by an algorithm to execute a program

    • Space complexity is the total amount of memory space used by an algorithm/pgroam including the space of input values for execution

Time and Space Complexity Importance

  • Intrusion Detection Systems must process gigabits of data or more with minimal latency, time complexity is important for this
  • Handling potentially petabytes of data, memory complexity is important here

Data Structures

  • Data Structures are necessary for designing efficient algorithms

    • Provides reusability and abstraction
    • Appropriate data structures helps programmers save time and space
    • Assists in optimizing data manipulation (i.e. add, remove, edit large amounts of data)
  • For example: A tree based data structure is ideal for college course info storage in a program

    • Determining what courses need to be taken, courses have "dependencies" on other courses that could be more than a single course dependency

Intro to Asymptotic Notations

  • Tells us how good an algorithm is when compared with another algorithm
  • Parameters can play a part

    • i.e. hardware used for implementation, Operating System, CPU model, processor generation, etc.
  • Therefore, we use Asymptotic analysis to compare space and time complexity
  • Analyzes two algorithms based on changes in their performance concerning the increment or decrement in the input size
  • Big O ($O()$) describes the upper bound of complexity

    • Worst case scenario
  • Runtime usually depends on the size of the input:

    • $T(n)$: /the time taken on an input of size $n$.
  • Asymptotic analysis considers the growth of $T(n)$

Big O Notation

  • Worst case scenario (analyzes algorithm's upper bound)
  • Best-case scenario is not considered for use in a comparative analysis

    • That's why we employ worst-case scenarios to get meaningful input
  • Algorithm in data structure while programming code is critical

    • Big O makes it easier to compare algorithms
  • Big O notation, $O(g(n))$ is a collection of functions
  • A function $f(n)$ is a member of that collection only if it fits the following criteria:

    • Constant $c$ and $n_0$ exist where $f(n) <= c.g(n)$ for all $n >= n_0$ Where '$c$' represents constant values
  • $O(f(n))$ describes the upper bound of $f(n)$, worst-case scenario
  • $\omega(f(n))$ describes the lower bound of $f(n)$, best-case scenario
Why Do We Need Big O?
  • World we live in today consists of complicated apps & software, each running various devices and each has different capabilities
  • Some devices like desktops can run heavy machine learning software, but others like phones can only run apps
  • WHen you create an application, you'll need to optimize your code so that it runs smoothly across devices to give you an edge over your competitors
Computing Big O Notation
  • Big-O Asymptotic Notation gives up Upper Bound:

    1. Determine what the input is and what '$n$' represents (i.e. $f(n)=O(g(n))$)
    2. Identify maximum number of operations the algorithm performs in terms of '$n$'. (i.e. addition of two numbers is just 1 operation)
    3. Eliminate all excluding the highest order terms. (i.e. if you have $n^4$ and $n^3$ consider only $n^4$)
    4. Remove all constant factors. Constants will remain constant regardless of user input
  • Basically Big-O is used to measure and compare worst-case scenarios of algorithms
Example Big O Notations
Big O Notation Example
Constant: $O(c)$ $O(1)$
Logarithmic $O(log(n))$ $n=20$ means $log(20) = 2.996$
Linear: $O(n)$ $n=20$ means $20$
Quadratic: $O(n^2)$ $n=20$ means $20^2 = 400$
Exponential: $O(2^n)$ $n=20$ means $2^20$ = 1084576
Factorial: $O(n!)$ $N=20$ means $20!$
Example Program
#include <stdio.h>
int main() {
    int n;
    printf("N = ");
    scanf("%d", &n);
    printf("Got: %d\n", n);
    int a[n];
    for (int i = 0; i < n; i++)
      printf("a[%d] = %d \n", i, a[n]=i-1)
}
  • Big O of this is $O(n)$

Searching

  • Searching in data structures refers to the process of finding the location of an element in a list
  • One of the important parts of many data structures algorithms, as one operation can be performed on an element if and only if we find it
  • We do not want searching to take '$n$' steps for searching an array of '$n$' number of elements

    • In some cases we are bound to take '$n$' steps
    • Different algorithms try to minimize the number of steps to search an element
Binary Search
  • Divide and conquer approach
  • Requires the data to be sorted
  • In sequential search, when we compare against the first item, there are at most more items to look through if the first item is not what we are looking for

    • Instead of searching the list in sequence, a binary search will start by examining the middle term
    • If that term is the one we are searching for, we are done
    • If it is not the correct term, we can use the ordered nature of the list to eliminate half of the remaining items
    • If the term we are searching for is greater then the middle item, we know that the entire lower half of the list as well as the middle item can be eliminated from further consideration
    • The term, if it is the list, must be in the upper half
Algorithm Steps
def binary_search(arr: list[int], term: int, low: int, high: int) -> int:
    mid = (low + high)/2
    while low <= high:
      if (low > high):
        raise("Unable to find the search term!")
      else if (arr[mid] < term):
        low = mid + 1
      else if (arr[mid] == term):
        return mid

      else
        high = mid - 1;
        mid = (low + high) / 2

binary_search([0, 1, 2, 3, 4], 0, 0, 5) # Outputs: 5
Linear Search
  • The Linear Search (sequential search) algorithm starts at one end of a list and goes through each element of a list until the desired element is found, otherwise the search continues till the end of the data set
  • Does not require data to be sorted
  • Poor Big-O complexity: $O(n)$
def linear_search(arr: list[int], term: int) -> int:
for i in range(0, len(arr)):
    if arr[i] == term:
        return i
raise(f"Unable to find {term} in array!")

Lecture 2

Big O Notation

$O(log(n))$

  • Divide and conquer
  • If the base is not specified in CS, assume a base of $2$: $O(log_2(n))$
  • Binary search is an algorithm that is of $O(log(n))$ complexity

Sorting Algorithms

  • Sorting refers to arranging data in a particular format
  • Many search algorithms depend on sorted data, hence sorting
  • In general there are 2 approaches to sort an array of elements:

    1. Some algorithms work by moving elements to their final position, one at a time. You sort an array of size N, put 1 item in place, and continue sorting an array of size N - 1.

      • Memory efficient
      • Performance inefficient
    2. Some algorithms put items into a temporary position, close(r) to their final position. YOu rescan, moving items closer to the final position with each iteration.

      • Memory inefficient
      • Performance efficient

Complexity and Running Time

  • Factors:

    1. Algorithmic complexity
    2. Additional space requirements
    3. Use of recursion

      • Have to be careful with recursion, can easily spiral out into an $b^n$ complexity
    4. Worst-case behavior

      • Worst-case behavior is important for real-time systems that need guaranteed performance
    5. Behavior on already-sorted or nearly-sorted data

Stable vs Unstable Sorting

  • A stable sort is one which preserves the original order of the input set

    • Elements of same value will be in order
  • An unstable sort does not preserve order of elements

    • Ordering will be only on the sorted value, original order is not respected

In-place and Out-of-place Sorting

  • An In-place algorithm modifies the inputs, which can be a list or an array, without using any additional memory As the algorithm runs, the input is usally overwritten by the output, so no additional space is required.

    • In-place algorithms may take some memory, like using some variables for its operation
    • Overall, it takes constant memory. Space complexity of $O(1)$
  • An algorithm that is not in place is called a not-place or out-of-place algorithm. These sorting algorithms use extra space for sorting, which depends upon the size of the input

Bubble Sort

  • Bubble Sort works by repeatedly swapping adjacent elements if they are in the wrong order
  • Not suitable for large data sets as its average and worst-case complexity is quite high
  • Bubble sort is an In-place and Stable sorting algorithm
  • Big O's:

    • Time Complexity: $O(n^2)$
    • Space Complexity: $O(n^2)$
  • Steps:

    1. Walk through the array n-times
    2. As you walk through the array, check if the current element and it's next neighbor are out of order
    3. If they are out of order, swap them

Selection Sort

  • Selection Sort is an in-place algorithm in which the list is divided into two parts

    1. The sorted part at the left end
    2. The unsorted part at the right end
  • The smallest element is selected from the unsorted array and swapped with the leftmost element, and that element becoems a part of the sorted array. This process continues moving unsorted array boundary by one element to the right.
  • Selection sort is generally preferred over Bubble Sort
  • Big O's:

    • Time Complexity: $O(n^2)$
    • Space Complexity: $O(1)$
  • Steps:

    1. Set MIN to location $0$
    2. Search the minimum element in the list
    3. Swap with value at location MIN
    4. Increment MIN to point to next element
    5. Repeat until list is sorted

Insertion Sort

  • Insertion sort is a simple sorting algorithm that works similar to the way you sort playing cards in your hands
  • The array is virtually split into a sorted and an unsorted part
  • Values from the unsorted part are picked and placed at the correct position in the sorted part
  • Steps:

    1. If it is the first element, it is already sorted. return 1;
    2. Pick next element
    3. Compare with all elements in the sorted sub-list
    4. Shift all elements in the sorted sub-list that is greater than the value to be sorted

Algorithm Comparison

Bubble Sort Selection Sort Insertion Sort
Simple Sorting Algorithm Simple Sorting Algorithm Simple Sorting Algorithm
Compares Neighboring Elements Takes the smallest element and moves it into its place Transfer one element at a time to its

Merge Sort

  • Divide and Conquer Algorithm
  • Works by dividing an array into smaller subarrays, sorting each subarray, and then merging the sorted subarrays back together
  • Does not check if the data is already sorted.
  • Steps:

    1. Find middle index of the array: Middle = 1 + (last-first)/2
    2. Divide the array from the middle
    3. Call merge sort for the first half of the array: MergeSort(array, first, middle)
    4. Call merge sort for the second half of the array: MergeSort(array, middle + 1, last)
    5. Merge the two sorted halves into a single sorted array
  • Big O's:

    • Time Complexity: $nlog(n)$
    • Space Complexity: $O(n)$

Quick Sort

  • Divide and Conquer Algorithm
  • Picks an element as a pivot and partitions the given array around the picked pivot
  • There are many different versions of Quick Sort that pick pivot in different ways:

    1. Always pick the first element as a pivot
    2. Always pick the last element as a pivot
    3. Pick a random elemt as a pivot
    4. Pick median as the pivot
  • Big O's:

    • Time Complexity: $O(n^2)$, average case is $O(nlog(n))$
  • Steps:

    1. Pick an element from the array as the pivot
    2. Divide the unosrted array of elements in two arrays

      1. Values less than the pivot come in the first sub array
      2. Values greater than the pivot come in the second sub-array
    3. Recursively repeat step 2 (until the sub-arrays are sorted)

Lecture 4

Methods for Reducing Complexity

  • Goes along with the Simplicity Design Principle
  • They are:

    1. Abstraction
    2. Modularity
    3. Layering
    4. Hierarchy

Stack

  • What is a stack

    • A data structure.
    • New items are inserted on the "top" and deleted or removed from the top. Top is the most recent item inserted.
    • A stack is a Last In First Out (LIFO) or First In Last Out (FILO) data structure. This means that the last item to get stored on the stack (often called Push operation) is the first one to get out of it (often called as Pop operation).

Stack Operations

  1. push: Adds an element to the top of the stack
  2. pop: Removes the topmost element from the stack
  3. isEmpty: Checks whether the stack is empty
  4. isFull: checks whether the stack is full
  5. top: Displays the topmost element of the stack
  6. Peek: View the top element on the stack
  • Important things to remember when wokring with stacks:

    • Initially, a pointer (top) is set to keep track of the topmost itemi n the stack. The stack is initialized to -1
    • Then, a check is performed to determine if the stack is empty by comparing top to -1
    • As elements are added to the stack, the position of the top is updated
    • As soon as elements are popped or deleted, the topmost element is removed and the position of top is updated
  • There are two ways to implement a stack:

    1. Using a array
    2. Using a linked list