439 lines
16 KiB
Org Mode
439 lines
16 KiB
Org Mode
* Lecture 1
|
|
|
|
[2024-01-16 Tue]
|
|
|
|
** Recommended Books
|
|
|
|
- Data Structures Using C and C++, by Langsam, Augenstein, and Tenenbaum
|
|
- Data Structures and Algorithm Analysis in C, Mark Allen Wiss
|
|
|
|
** Office Hours
|
|
|
|
- Tuesday & Thursdays, 3pm - 4pm
|
|
|
|
** Attendance
|
|
|
|
- Not mandatory
|
|
- Missing will cause a loss of /bonus/ points
|
|
|
|
** Late Work
|
|
|
|
- Point reduction for late work
|
|
|
|
** Why Study Data Structures
|
|
|
|
|
|
*** Application
|
|
|
|
- Big Data
|
|
- Data is everything and it must be managed to extract information
|
|
- Applications, websites must be optimized (Data access)
|
|
|
|
*** Student
|
|
|
|
- Fundamental
|
|
- Develops thinking for programming
|
|
- Improves solving problems with better time complexities (performance)
|
|
- Many self-taught programmer lack fundamentals of Computer Science
|
|
- Popular technologies change, Data Structures or Analysis of Algorithms remain the same
|
|
|
|
*** Textbook Definition
|
|
|
|
- Refers to a scheme for organizing related pieces of information
|
|
- Basic types of data structures include:
|
|
- Files / lists
|
|
- Arrays / Records
|
|
- Trees / Tables
|
|
- Graphs
|
|
|
|
** Types of Data Structures
|
|
|
|
|
|
*** Structures & Unions
|
|
|
|
- Stuctures
|
|
Contains ordered group of data objects, each data object in a structure is a /member/ or a /field/.
|
|
- Union
|
|
Similar to a structure except that all of its members start at the same location in memory. A
|
|
union variable can represent the value of only one of its members at a time.
|
|
|
|
*** Graphs and Trees
|
|
|
|
- Graphs and Trees are linked abstract data structures composed of nodes.
|
|
- Each node contains a value and one or more pointers to other nodes arranged in a hierarchy.
|
|
- Graphs can be used to represent networks, while variants of trees can be used for sorting and
|
|
searching.
|
|
|
|
*** Data Structure Operations
|
|
|
|
1. Traversal
|
|
2. Searching
|
|
3. Insertion
|
|
4. Deletion
|
|
5. Sorting
|
|
6. Merging
|
|
|
|
*** Planning a Computer Program
|
|
|
|
- As a programmer you are not supposed to start directly by coding
|
|
- The most important part of programming is brain storming on how to solve the problem
|
|
- First step can be on paper
|
|
- Technically we term such steps as *Pseudocode*
|
|
- Some programmers also use Algorithm to solve the issue on paper, then start programming
|
|
|
|
*** Algorithm Specification
|
|
|
|
- An *algorithm* is a finite set of instructions that, if followed, accomplishes a particular
|
|
task.
|
|
- All Algorithms must satisfy the following criteria:
|
|
1. Input
|
|
2. Output 3. Defi
|
|
|
|
5. Effectiveness
|
|
|
|
*** How Programs Solve Problems
|
|
|
|
- Program Flow Control
|
|
- The order in which program statements are executed
|
|
- Heuristics
|
|
- Some problems are very complex or no algorithm exist to solve some problems, at such
|
|
conditions programmers rely on heuristics
|
|
- Intrusion Detection Systems can rely on heuristics to identify attacks
|
|
- Heuristics are basically identified patterns or elements to assist in creating a solution to
|
|
some problem
|
|
|
|
* Lecture 2
|
|
|
|
[2024-01-23 Tue]
|
|
|
|
- Time and Space Complexity
|
|
- Intro to Asymptotic Notations
|
|
- Big O Notation
|
|
- Searching
|
|
- Binary Search
|
|
- Linear Search
|
|
|
|
** Time and Space Complexity
|
|
|
|
*** Time Complexity
|
|
- Time taken by an algorithm for execution
|
|
- Process of determining how processing time increases as the /size of the problem/ (input size)
|
|
increases
|
|
- Generally time complexity is expressed by keeping only the /values which affects runtime most/.
|
|
|
|
- *For example*, if time complexity for a pgrogram needs to be calculated as a function of (i.e.
|
|
$n^4 + n^3 + n^2 + n = n^4$) as all terms are small and they have a lesser impact on overall
|
|
computation time when compared with $n^4$
|
|
|
|
*** Space Complexity
|
|
|
|
- Memory required by an algorithm to execute a program
|
|
- Space complexity is the total amount of /memory space used/ by an algorithm/pgroam including
|
|
the space of input values for execution
|
|
|
|
*** Time and Space Complexity Importance
|
|
|
|
- Intrusion Detection Systems must process gigabits of data or more with minimal latency, time
|
|
complexity is important for this
|
|
- Handling potentially petabytes of data, memory complexity is important here
|
|
|
|
*** Data Structures
|
|
|
|
- Data Structures are necessary for designing efficient algorithms
|
|
- Provides reusability and abstraction
|
|
- Appropriate data structures helps programmers save time and space
|
|
- Assists in optimizing data manipulation (i.e. add, remove, edit large amounts of data)
|
|
|
|
- For example: A tree based data structure is ideal for college course info storage in a program
|
|
- Determining what courses need to be taken, courses have "dependencies" on other courses that
|
|
could be more than a single course dependency
|
|
|
|
*** Intro to Asymptotic Notations
|
|
|
|
- Tells us how good an algorithm is when compared with another algorithm
|
|
- Parameters can play a part
|
|
- i.e. hardware used for implementation, Operating System, CPU model, processor generation,
|
|
etc.
|
|
- Therefore, we use Asymptotic analysis to compare space and time complexity
|
|
- Analyzes two algorithms based on changes in their performance concerning the increment or
|
|
decrement in the input size
|
|
|
|
- *Big O* ($O()$) describes the upper bound of complexity
|
|
- Worst case scenario
|
|
|
|
- Runtime usually depends on the size of the input:
|
|
- $T(n)$: /the time taken on an input of size $n$.
|
|
|
|
- Asymptotic analysis considers the growth of $T(n)$
|
|
|
|
*** Big O Notation
|
|
|
|
- Worst case scenario (analyzes algorithm's upper bound)
|
|
- Best-case scenario is not considered for use in a comparative analysis
|
|
- That's why we employ worst-case scenarios to get meaningful input
|
|
- Algorithm in data structure while programming code is critical
|
|
- Big O makes it easier to compare algorithms
|
|
|
|
- *Big O* notation, $O(g(n))$ is a collection of functions
|
|
- A function $f(n)$ is a member of that collection only if it fits the following criteria:
|
|
- Constant $c$ and $n_0$ exist where $f(n) <= c.g(n)$ for all $n >= n_0$ Where '$c$'
|
|
represents constant values
|
|
|
|
- $O(f(n))$ describes the upper bound of $f(n)$, /worst-case scenario/
|
|
- $\omega(f(n))$ describes the lower bound of $f(n)$, /best-case scenario/
|
|
|
|
**** Why Do We Need Big O?
|
|
|
|
- World we live in today consists of complicated apps & software, each running various devices
|
|
and each has different capabilities
|
|
- Some devices like desktops can run heavy machine learning software, but others like phones can
|
|
only run apps
|
|
- WHen you create an application, you'll need to optimize your code so that it runs smoothly
|
|
across devices to give you an edge over your competitors
|
|
|
|
**** Computing Big O Notation
|
|
|
|
- *Big-O* Asymptotic Notation gives up Upper Bound:
|
|
1. Determine what the input is and what '$n$' represents (i.e. $f(n)=O(g(n))$)
|
|
2. Identify maximum number of operations the algorithm performs in terms of '$n$'. (i.e.
|
|
addition of two numbers is just 1 operation)
|
|
3. Eliminate all excluding the highest order terms. (i.e. if you have $n^4$ and $n^3$ consider
|
|
only $n^4$)
|
|
4. Remove all constant factors. Constants will remain /constant/ regardless of user input
|
|
- Basically *Big-O* is used to measure and compare worst-case scenarios of algorithms
|
|
|
|
|
|
**** Example Big O Notations
|
|
|
|
| Big O Notation | Example |
|
|
|-------------------------|--------------------------------|
|
|
| Constant: $O(c)$ | $O(1)$ |
|
|
| Logarithmic $O(log(n))$ | $n=20$ means $log(20) = 2.996$ |
|
|
| Linear: $O(n)$ | $n=20$ means $20$ |
|
|
| Quadratic: $O(n^2)$ | $n=20$ means $20^2 = 400$ |
|
|
| Exponential: $O(2^n)$ | $n=20$ means $2^20$ = 1084576 |
|
|
| Factorial: $O(n!)$ | $N=20$ means $20!$ |
|
|
|
|
**** Example Program
|
|
|
|
#+begin_src c
|
|
#include <stdio.h>
|
|
int main() {
|
|
int n;
|
|
printf("N = ");
|
|
scanf("%d", &n);
|
|
printf("Got: %d\n", n);
|
|
int a[n];
|
|
for (int i = 0; i < n; i++)
|
|
printf("a[%d] = %d \n", i, a[n]=i-1)
|
|
}
|
|
#+end_src
|
|
|
|
- Big O of this is $O(n)$
|
|
|
|
*** Searching
|
|
|
|
- Searching in data structures refers to the process of finding the location of an element in a
|
|
list
|
|
- One of the important parts of many data structures algorithms, as one operation can be
|
|
performed on an element if and only if we find it
|
|
- We do not want searching to take '$n$' steps for searching an array of '$n$' number of
|
|
elements
|
|
- In some cases we are bound to take '$n$' steps
|
|
- Different algorithms try to minimize the number of steps to search an element
|
|
|
|
**** Binary Search
|
|
|
|
- Divide and conquer approach
|
|
- Requires the data to be sorted
|
|
- In sequential search, when we compare against the first item, there are at most more items to
|
|
look through if the first item is not what we are looking for
|
|
- Instead of searching the list in sequence, a /binary search/ will start by examining the
|
|
middle term
|
|
- If that term is the one we are searching for, we are done
|
|
- If it is not the correct term, we can use the ordered nature of the list to eliminate half
|
|
of the remaining items
|
|
- If the term we are searching for is greater then the middle item, we know that the entire
|
|
lower half of the list as well as the middle item can be eliminated from further
|
|
consideration
|
|
- The term, if it is the list, must be in the upper half
|
|
|
|
***** Algorithm Steps
|
|
|
|
#+begin_src python
|
|
def binary_search(arr: list[int], term: int, low: int, high: int) -> int:
|
|
mid = (low + high)/2
|
|
while low <= high:
|
|
if (low > high):
|
|
raise("Unable to find the search term!")
|
|
else if (arr[mid] < term):
|
|
low = mid + 1
|
|
else if (arr[mid] == term):
|
|
return mid
|
|
|
|
else
|
|
high = mid - 1;
|
|
mid = (low + high) / 2
|
|
|
|
binary_search([0, 1, 2, 3, 4], 0, 0, 5) # Outputs: 5
|
|
#+end_src
|
|
|
|
**** Linear Search
|
|
|
|
- The Linear Search (sequential search) algorithm starts at one end of a list and goes through
|
|
each element of a list until the desired element is found, otherwise the search continues till the
|
|
end of the data set
|
|
- Does not require data to be sorted
|
|
- Poor *Big-O* complexity: $O(n)$
|
|
|
|
#+begin_src python
|
|
def linear_search(arr: list[int], term: int) -> int:
|
|
for i in range(0, len(arr)):
|
|
if arr[i] == term:
|
|
return i
|
|
raise(f"Unable to find {term} in array!")
|
|
#+end_src
|
|
|
|
* Lecture 2
|
|
|
|
|
|
** Big O Notation
|
|
|
|
*** $O(log(n))$
|
|
|
|
- Divide and conquer
|
|
- If the base is not specified in CS, assume a base of $2$: $O(log_2(n))$
|
|
- Binary search is an algorithm that is of $O(log(n))$ complexity
|
|
|
|
|
|
** Sorting Algorithms
|
|
|
|
- Sorting refers to arranging data in a particular format
|
|
- Many search algorithms depend on sorted data, hence /sorting/
|
|
- In general there are *2 approaches* to sort an array of elements:
|
|
1. Some algorithms work by moving elements to their final position, one at a time. You sort an
|
|
array of size N, put 1 item in place, and continue sorting an array of size N - 1.
|
|
- Memory efficient
|
|
- Performance inefficient
|
|
2. Some algorithms put items into a temporary position, close(r) to their final position. YOu
|
|
rescan, moving items closer to the final position with each iteration.
|
|
- Memory inefficient
|
|
- Performance efficient
|
|
|
|
*** Complexity and Running Time
|
|
|
|
- Factors:
|
|
1. Algorithmic complexity
|
|
2. Additional space requirements
|
|
3. Use of recursion
|
|
- Have to be careful with recursion, can easily spiral out into an $b^n$ complexity
|
|
4. Worst-case behavior
|
|
- Worst-case behavior is important for real-time systems that need guaranteed performance
|
|
5. Behavior on already-sorted or nearly-sorted data
|
|
|
|
*** Stable vs Unstable Sorting
|
|
|
|
- A stable sort is one which preserves the original order of the input set
|
|
- Elements of same value will be in order
|
|
- An unstable sort does not preserve order of elements
|
|
- Ordering will be only on the sorted value, original order is not respected
|
|
|
|
*** In-place and Out-of-place Sorting
|
|
|
|
- An *In-place algorithm* modifies the inputs, which can be a list or an array, without using
|
|
any additional memory As the algorithm runs, the input is usally overwritten by the output, so no
|
|
additional space is required.
|
|
- In-place algorithms may take some memory, like using some variables for its operation
|
|
- Overall, it takes constant memory. Space complexity of $O(1)$
|
|
- An algorithm that is not in place is called a *not-place* or *out-of-place* algorithm. These
|
|
sorting algorithms use =extra space= for sorting, which depends upon the size of the input
|
|
|
|
*** Bubble Sort
|
|
|
|
- *Bubble Sort* works by repeatedly swapping adjacent elements if they are in the wrong order
|
|
- Not suitable for large data sets as its average and worst-case complexity is quite high
|
|
- Bubble sort is an In-place and Stable sorting algorithm
|
|
- Big O's:
|
|
- Time Complexity: $O(n^2)$
|
|
- Space Complexity: $O(n^2)$
|
|
- Steps:
|
|
1. Walk through the array n-times
|
|
2. As you walk through the array, check if the current element and it's next neighbor are out
|
|
of order
|
|
3. If they are out of order, swap them
|
|
|
|
*** Selection Sort
|
|
|
|
- *Selection Sort* is an *in-place* algorithm in which the list is divided into two parts
|
|
1. The sorted part at the left end
|
|
2. The unsorted part at the right end
|
|
- The smallest element is selected from the unsorted array and swapped with the leftmost
|
|
element, and that element becoems a part of the sorted array. This process continues moving
|
|
unsorted array boundary by one element to the right.
|
|
- Selection sort is generally preferred over Bubble Sort
|
|
- Big O's:
|
|
- Time Complexity: $O(n^2)$
|
|
- Space Complexity: $O(1)$
|
|
- Steps:
|
|
1. Set ~MIN~ to location $0$
|
|
2. Search the minimum element in the list
|
|
3. Swap with value at location ~MIN~
|
|
4. Increment ~MIN~ to point to next element
|
|
5. Repeat until list is sorted
|
|
|
|
*** Insertion Sort
|
|
|
|
# TODO: Finish this section out
|
|
- Insertion sort is a simple sorting algorithm that works similar to the way you sort playing
|
|
cards in your hands
|
|
- The array is virtually split into a sorted and an unsorted part
|
|
- Values from the unsorted part are picked and placed at the correct position in the sorted part
|
|
- Steps:
|
|
1. If it is the first element, it is already sorted. return 1;
|
|
2. Pick next element
|
|
3. Compare with all elements in the sorted sub-list
|
|
4. Shift all elements in the sorted sub-list that is greater than the value to be sorted
|
|
|
|
*** Algorithm Comparison
|
|
|
|
| Bubble Sort | Selection Sort | Insertion Sort |
|
|
|-------------------------------|--------------------------------------------------------|--------------------------|
|
|
| Simple Sorting Algorithm | Simple Sorting Algorithm | Simple Sorting Algorithm |
|
|
| Compares Neighboring Elements | Takes the smallest element and moves it into its place | Transfer one element at a time to its
|
|
|
|
*** Merge Sort
|
|
|
|
- Divide and Conquer Algorithm
|
|
- Works by dividing an array into smaller subarrays, sorting each subarray, and then merging the
|
|
sorted subarrays back together
|
|
- Does not check if the data is already sorted.
|
|
- Steps:
|
|
1. Find middle index of the array: ~Middle = 1 + (last-first)/2~
|
|
2. Divide the array from the middle
|
|
3. Call merge sort for the first half of the array: ~MergeSort(array, first, middle)~
|
|
4. Call merge sort for the second half of the array: ~MergeSort(array, middle + 1, last)~
|
|
5. Merge the two sorted halves into a single sorted array
|
|
|
|
- Big O's:
|
|
- Time Complexity: $nlog(n)$
|
|
- Space Complexity: $O(n)$
|
|
|
|
*** Quick Sort
|
|
|
|
- Divide and Conquer Algorithm
|
|
- Picks an element as a pivot and partitions the given array around the picked pivot
|
|
- There are many different versions of Quick Sort that pick pivot in different ways:
|
|
1. Always pick the first element as a pivot
|
|
2. Always pick the last element as a pivot
|
|
3. Pick a random elemt as a pivot
|
|
4. Pick median as the pivot
|
|
- Big O's:
|
|
- Time Complexity: $O(n^2)$, average case is $O(nlog(n))$
|
|
- Steps:
|
|
1. Pick an element from the array as the pivot
|
|
2. Divide the unosrted array of elements in two arrays
|
|
a) Values less than the pivot come in the first sub array
|
|
b) Values greater than the pivot come in the second sub-array
|
|
3. Recursively repeat step ~2~ (until the sub-arrays are sorted)
|