Preparing for wild interview rounds
A lot of interviews coming up! I did some DSA practice earlier along with foundational CS courses, but I was forgetting them. I used to solve DSA problems randomly, taking my own sweet time to come up with solutions. So I just wanted to note down some important concepts to revise later; I can't solve all these problems again, there are just too many. Also, remembering stuff matters given how difficult Leetcode interviews have become in 2025! I get nightmares thinking about how someone with an actual job has to do these things to get an offer from another company.
Hilariously, most candidates can't even explain their day job if you go deep enough. Interviewing is interesting in the age of LLMs.
Data Structures and Algorithms
After doing a lot of FOSS for almost 2.5 years since freshman year, it's time I actively start preparing for jobs. Now, DSA is the only thing that will be expected from me in my interviews, and my resume will be thrown in the trash anyway. So let's do some Data Structures and Algorithms. I have a lot of interviews coming up!
Generally, in 40 minutes I will need to solve a minimum of 2 algorithm questions with follow-ups, so recognizing patterns is crucial. I don't think anyone can problem-solve their way out of such interviews without prior exposure to these problems, unless they are deep into Competitive Programming.
I am very, very time constrained right now: undergraduate research in ML + internships + mentoring folks in Google Summer of Code + managing the dreaded academics of IIT Kanpur + some dating.
So the approach I am following for optimal results is :
- Not solving random problems, brain needs to internalize patterns.
- Take a look at easy problem solutions when starting out, don't solve them. Solve only medium-hard questions on Leetcode. Difficult problems are just a mix of some easy problems with some observations involved.
- Focus on whether you can think through the intuitive steps from brute force to optimal. Don't waste a lot of time coding each and every problem out; just code the difficult ones.
- Working hard on a single problem until you figure it out is a great approach for Competitive Programming, given you have a lot of time. But when you are heavily time constrained, or preparing for simpler interviews, pattern recognition is the key!
And I personally feel that practicing DSA in Rust (or your favourite programming language) is one of the best ways to actually learn the language itself. So Leetcode is not a total waste of time.
DSA rounds get more in-depth for experienced candidates. I don't need that much depth as a fresh grad, but you know, I can't change myself :) So let's go all out!
You know I am Rusty, but you should never use Rust in DSA interviews, unless you either want to be mentally harassed in the interview or, even better, harass your interviewer ( ̄▽ ̄)
You need to be borderline masochistic to even think about doing that! Implementing data structures in Rust is a serious task.
General Stuff
- In Python, self refers to the instance of the class: the object on which the method is called. Any variable assigned as self.some_var becomes an instance variable/property, accessible by all methods of the class on that specific object. Variables without self inside methods are local to that method and do not persist or share state across methods.
- Despite their efficiency, hash collisions can degrade performance, and techniques such as chaining and open addressing are used to manage them.
- A list isn't hashable because it's mutable. Convert it to a tuple, which is immutable and hence hashable.
- ord('a') converts a character to its code point (97) and chr(97) converts back to the character. min("abc", "def") returns the lexicographically smaller string.
- Playing with arrays :
my_list = [1, 2, 3]
my_list.append(4) # add one element at the end of list
print(my_list) # Output: [1, 2, 3, 4]
my_list.extend([5, 6])
# Or: my_list += [5, 6] # concatenate with another list
print(my_list) # Output: [1, 2, 3, 4, 5, 6]
my_list.insert(0, 10) # insert value at a specific index
print(my_list) # Output: [10, 1, 2, 3, 4, 5, 6]
my_list.remove(3) # remove first occurrence of value
print(my_list) # Output: [10, 1, 2, 4, 5, 6]
my_list.pop(2) # remove value at specific index, default removal of last value
print(my_list) # Output: [10, 1, 4, 5, 6]
del my_list[1:3] # remove slice
print(my_list) # Output: [1, 6]
my_list.clear()
print(my_list) # Output: []
visit = set()
visit.add((r, c)) # add a tuple to the set; a list can't be added since lists are mutable (unhashable)
visit.remove((r, c))
- To remove all occurrences of an element from a list in-place (i.e., without creating a new array), the usual approach is to modify the original list by overwriting unwanted elements and then truncating its length. Python lists don't support deleting elements efficiently inside loops without extra space.
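A minimal sketch of that overwrite-and-truncate idea (the helper name remove_all and the sample values are just for illustration):
def remove_all(nums, val):
    # Copy every element we want to keep toward the front, then cut off the tail
    write = 0
    for x in nums:
        if x != val:
            nums[write] = x
            write += 1
    del nums[write:]  # truncate in place, no new list allocated
    return nums

nums = [3, 1, 3, 2, 3]
remove_all(nums, 3)
print(nums)  # [1, 2]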
- A stable sorting algorithm is one that keeps identical elements in the same relative order as they appear in the input, whereas an unstable sort may not. Merge Sort, Bubble Sort, Insertion Sort, etc. are stable, whereas Heap Sort, Quick Sort, etc. are unstable
Collection to be sorted: {(6, 3), (5, 5), (6, 1), (1, 3)}
Stable Sorted: {(1, 3), (5, 5), (6, 3), (6, 1)}
Regular Sorted: Either {(1, 3), (5, 5), (6, 3), (6, 1)}, or {(1, 3), (5, 5), (6, 1), (6, 3)}
- arr.sort(reverse=True) sorts in descending order and arr.sort() in ascending order. arr.sort(key=lambda x: x[0] + x[1]) sorts an array by sum.
- Avoid checking if a key exists before updating it by using defaultdict instead of dict :
from collections import defaultdict
freq = defaultdict(int) # defaultdict(lambda: 1) for default 1 frequency instead of 0
nums = [1, 2, 2, 3, 3, 3]
for num in nums:
freq[num] += 1 # No need for if-else check of key
print(dict(freq)) # {1: 1, 2: 2, 3: 3}
- Use Counter for frequency counting :
from collections import Counter
nums = [5, 1, 2, 2, 3, 3, 3, 4]
freq = Counter(nums)
print(dict(freq)) # Output: {5: 1, 1: 1, 2: 2, 3: 3, 4: 1}
# Sort by key (ascending)
print(sorted(freq.items())) # [(1, 1), (2, 2), (3, 3), (4, 1), (5, 1)]
# Sort ONLY by frequency (ascending)
print(sorted(freq.items(), key=lambda x: x[1]))
# Output: [(5, 1), (1, 1), (4, 1), (2, 2), (3, 3)]
# Sort ONLY by frequency (descending)
print(sorted(freq.items(), key=lambda x: -x[1]))
# Output: [(3, 3), (2, 2), (5, 1), (1, 1), (4, 1)]
# Sort by frequency (descending), then by key (ascending)
print(sorted(freq.items(), key=lambda x: (-x[1], x[0])))
diff = ["aaa", "aa", "a"]
freq2 = Counter(diff)
print(dict(freq2)) # Output: {'aaa': 1, 'aa': 1, 'a': 1}
- Modify a list in-place (reusing the same list object instead of re-binding the name to a new list) via slice assignment
nums = [1, 2, 3, 4, 5]
nums[:] = [x for x in nums if x % 2 == 0] # Removes odd numbers in-place
squares = [x * x for x in nums]
- Enumerate instead of manual indexing :
nums = [10, 20, 30]
for i, num in enumerate(nums):
print(i, num) # 0 10, 1 20, 2 30
- Zip to iterate multiple lists and create window :
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 22]
for name, age in zip(names, ages):
print(name, age)
arr = [1, 3, 5, 7, 9]
windows = list(zip(arr, arr[1:], arr[2:]))
print(windows) # [(1, 3, 5), (3, 5, 7), (5, 7, 9)]
- Ceil and Floor
import math
print(math.ceil(4.7)) # Output: 5
print(math.ceil(-10.7)) # Output: -10
print(math.floor(2.3)) # Output: 2
print(math.floor(-33.89))# Output: -34
- Use .join() instead of + for string concatenation for better performance
words = ["hello", "world"]
sentence = " ".join(words) # Efficient
sentence = words[0] + " " + words[1] # Slow in loops
- bisect for simple Binary Search
"""
Only for ascending order
"""
import bisect
arr = [1, 3, 5, 7, 9]
# bisect_left returns the index of the first element >= target; if target = 5 is present
# this is its first position, otherwise it is where the target would be inserted.
target = bisect.bisect_left(arr, 5) # 2
lower_bound_index = bisect.bisect_left(arr, 6) # 3
# bisect_right returns the index of the first element > 7
upper_bound_index = bisect.bisect_right(arr, 7) # 4
- Generate Permutations and Combinations directly
from itertools import combinations, permutations
arr = [1, 2, 3]
print(list(combinations(arr, 2))) # [(1,2), (1,3), (2,3)]
print(list(permutations(arr, 2))) # [(1,2), (1,3), (2,1), (2,3), (3,1), (3,2)]
- In a queue (FIFO), elements are added at the back and removed from the front; the first element added is the first one removed. In a stack (LIFO), elements are added to the top and removed from the top; the last element added is removed first. Popping everything off a stack yields the elements in reverse order, so when you want reversed elements, don't store them in a list and reverse it, just pop them from a stack.
from collections import deque
stack = []
# Push O(1)
stack.append(10)
stack.append(20)
# Pop O(1)
stack.pop() # 20
stack.pop() # 10
queue = deque()
# Enqueue elements
queue.append(1)
queue.append(2)
# Dequeue elements (removes from front)
front = queue.popleft() # O(1)
# O(1) time complexity for both append (to the right) and popleft (from the left)
- heapq for Priority Queues
"""
A min-heap has the smallest element on top - use it for finding the k largest; a max-heap has the largest element on top - use it for finding the k smallest.
Popping from a heap of size k is O(log k).
The k-th largest element is the smallest element among the top k largest elements. This means we only need to maintain k elements in our Min-Heap to efficiently determine the k-th largest element. Whenever the size of the Min-Heap exceeds k, we remove the smallest element by popping from the heap.
heapify() - Used when you have an existing unsorted list and want to convert it into a heap all at once. Runs in O(n) time. You only call this once, then you can use heappop or heappush.
heappush() - Used to insert one element into an already valid heap. Maintains heap property by reordering as needed. Runs in O(log n) time per push. If you build a heap by pushing elements one after another, no need to call heapify().
"""
import heapq
nums = [7, 2, 9, 4, 1]
heapq.heapify(nums) # Convert to heap
print(heapq.heappop(nums)) # Removes smallest element (1)
# Convert to a max-heap in-place
nums = [-num for num in nums] # Negate all values
heapq.heapify(nums) # Convert to min-heap (which acts as max-heap)
# Example list of pairs
nums = [[1, 2], [7, 3], [4, 1], [2, 5]]
k = 2
# Create an empty list to use as a heap
heap = []
# Push all elements into the heap with their sum as the priority
for pair in nums:
# tuple: (priority, actual_value)
heapq.heappush(heap, (pair[0] + pair[1], pair))
# Pop the smallest k elements from the heap
result = []
for _ in range(min(k, len(heap))):
sum_val, pair = heapq.heappop(heap)
result.append(pair)
print(result) # [[1, 2], [4, 1]] ā smallest sums
- Kth largest/smallest element: the smallest element is always at the top in a min-heap. Every time the heap's size exceeds k, pop the min element (smallest in heap). After all insertions, the min-heap contains the k largest elements from the array. The root of the min-heap (min_heap) is the smallest among the k largest elements, i.e. it is the k-th largest in the whole array.
import heapq
"""
Time complexity: O(k log n)
Space complexity: O(k)
Heapify in O(n) and we pop k times O(logn)
For a list of lists or tuples, it uses the first element of each sub-list/tuple unless specified otherwise.
"""
# O(n + k log n)
def kth_largest_heap(arr, k):
# returns a list of the k largest, sorted largest to smallest
ans = heapq.nlargest(k, arr)
return ans[-1] # the last element is kth largest
# O(n + k log n)
def kth_smallest_heap(arr, k):
# returns a list of the k smallest, sorted smallest to largest
ans = heapq.nsmallest(k, arr)
return ans[-1]
data = [(2, "B"), (1, "C"), (3, "A")]
# Get 2 "smallest" by first value
out = heapq.nsmallest(2, data, key=lambda x: x[0]) # [(1, 'C'), (2, 'B')]
# Get 2 "largest" by the string, alphabetically
out2 = heapq.nlargest(2, data, key=lambda x: x[1]) # [(3, 'A'), (2, 'B')]
- If the array is small in problems of finding the kth smallest or largest, it's better to do it with a single loop than a heap. For example, tracking the top three:
first_largest = float('-inf')
second_largest = float('-inf')
third_largest = float('-inf')
for num in nums:
"""
If num is greater than first_largest: This means we've found a new largest number. We update third_largest to the previous second_largest, second_largest to the previous first_largest, and first_largest to the current num.
"""
if num > first_largest:
third_largest = second_largest
second_largest = first_largest
first_largest = num
"""
Else if num is greater than second_largest: This means num is the new second largest (but not the largest). We update third_largest to the previous second_largest and second_largest to num.
"""
elif num > second_largest:
third_largest = second_largest
second_largest = num
"""
Else if num is greater than third_largest: This means num is the new third largest (but not larger than the first or second). We update third_largest to num.
"""
elif num > third_largest:
third_largest = num
- Deque for efficient queue and stack implementation. Insert and pop are O(1) in a deque. Useful for rolling average or moving sum with a fixed window size (a sketch follows the snippet below) and for sliding window maximum.
from collections import deque
queue = deque([1, 2, 3, 4, 5])
queue.popleft() # Removes 1 in O(1), NO shifting required!
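A small sketch of the rolling sum mentioned above, with a fixed window of size k (the function name moving_sums is made up for illustration):
from collections import deque

def moving_sums(nums, k):
    # Maintain a window of the last k elements and its running sum
    window = deque()
    window_sum = 0
    sums = []
    for num in nums:
        window.append(num)
        window_sum += num
        if len(window) > k:
            window_sum -= window.popleft()  # drop the element leaving the window
        if len(window) == k:
            sums.append(window_sum)
    return sums

print(moving_sums([1, 2, 3, 4, 5], 3))  # [6, 9, 12]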
- If characters are only lowercase, then storing frequencies in a 26-size list is better than a hashmap (dictionary). A list of 26 integers uses less memory than a dictionary, which stores keys, values, and additional metadata, and list access by index is faster than dictionary key lookup.
s = "abracadabra"
freq = [0] * 26
for ch in s:
freq[ord(ch) - ord('a')] += 1
- Recursion simply simulates the future and combines the results of explored paths. Every recursive call eventually reaches the base case, where it returns. So just write the recursive call assuming it already returns the correct answer for the smaller input.
- @lru_cache for memoization. Do it manually if you need to customize the key structure or cache policy. Memoization is most effective when the problem has overlapping subproblems and you only care about the result (e.g., count, existence, or max sum), not the actual combinations. A manual version is below; a decorator sketch follows it.
def factorial(n, memo=None):
# Use memo dictionary to store previously computed factorials
if memo is None:
memo = {}
# Base case
if n == 0 or n == 1:
return 1
# Check memo
if n in memo:
return memo[n]
# Recursive computation with memoization
memo[n] = n * factorial(n - 1, memo)
return memo[n]
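And the decorator version mentioned above, as a minimal sketch on a toy Fibonacci (the example function is mine, just for illustration):
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every distinct argument seen so far
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, computed without exponential recursion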
- Python doesn't have a built-in Ordered Map (TreeMap / BTreeMap), but it's useful for keeping keys in sorted order in a key-value store, i.e. storing the frequency of sorted elements directly. This allows for efficient (O(log n)) insertion, deletion, and lookup operations, as well as range queries (a bisect-based stand-in is sketched after the even/odd snippet below).
- A priority queue (implemented using a min-heap or max-heap) achieves O(log n) time complexity for insertion (add the element to the heap and heapify-up) and deletion (remove the root (min/max), replace it with the last element, and heapify-down), along with O(1) access to the highest-priority element. However, arbitrary element lookup/search followed by removal remains O(n) due to the heap's structure. Prefer this for contiguous arrays.
- Integer exponentiation pow(base, exponent) by repeated squaring takes O(log exponent) multiplications. In case this causes overflow (exponents can be huge), resort to a simple for loop that computes the power only as far as needed for the comparison.
- Use bits for faster even/odd :
def is_odd(x):
return (x & 1) == 1 # Odd if last bit is 1
def is_even(x):
return (x & 1) == 0 # Even if last bit is 0
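Since the standard library has no ordered map, here is a rough stand-in as a sketch: a sorted list of keys plus bisect gives O(log n) lookups and range queries (insertion into a Python list is still O(n)). The helper names insert and range_query are made up; the third-party sortedcontainers package is another option.
import bisect

keys = []    # kept sorted at all times
values = {}  # key -> value

def insert(key, val):
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        keys.insert(i, key)  # O(n) shift, but keeps keys sorted
    values[key] = val

def range_query(lo, hi):
    # All (key, value) pairs with lo <= key <= hi, in sorted order
    left = bisect.bisect_left(keys, lo)
    right = bisect.bisect_right(keys, hi)
    return [(k, values[k]) for k in keys[left:right]]

insert(5, "e"); insert(1, "a"); insert(3, "c")
print(range_query(2, 5))  # [(3, 'c'), (5, 'e')]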
Now that we know how to exploit Python properly for our interviews, let's jump into the waters of actual problem solving.
There are too many questions, but the concepts are limited, so crux of problem solving would be :
- always write examples to understand the problem
- break the problem into simpler versions
- exploit the defining features of the problem
- seek symmetry and patterns among problems
- dry run for intuitions
Always do some examples to understand the problem well. Never forget Edge Cases:
- Empty input
- Duplicate elements
- Zero/Negative numbers
- Overflow cases
- Constraints on inputs and outputs
- Indentation errors
Things to notice :
Constraints
- If the constraints are n <= 20, then Brute force approaches are viable. Backtracking and recursion leading to exponential time complexity (2^n, n!) is acceptable.
- If n lies between 10^3 and 10^6, then most probably an O(n log n) or O(n) solution is required. Try Heaps, Pointers, Greedy or Dynamic Programming.
- If n >= 10^7, only an O(log n) or O(1) solution is viable.
Output Format
- List of Lists (combinations, subsets, paths) - Backtracking is almost always the answer. Generate all possibilities. Use recursion with choice/no-choice pattern
- Single Number (max/min profit, cost, ways, jumps) - Dynamic Programming for optimization. Greedy for local optimal choices.
- Modified Array/String (in-place operations) - Two Pointers for in-place modifications
Patterns
- If sorted array is given, Binary Search, Pointers, Greedy.
- Nearest smaller/larger element problems, next greater/smaller element queries -> Monotonic Stack
- BFS for level-based traversal (shortest path), whereas DFS for recursive exploration. When you want to find the shortest distance/time from any one of several starting nodes to other nodes, use multi-source BFS: problems where multiple initial points influence the state simultaneously (e.g., infection spreading, water flooding). See the sketch after this list.
- Topological Sort for dependency and Union Find for group finding
- Dynamic Programming Keywords - "Number of ways", "Maximum/minimum" + "sum/profit/cost", "Can you reach", "Longest/shortest subsequence", "Optimal" or "best"
- Two Pointers Keywords - "Palindrome", "Sorted array", "Target sum", "Remove duplicates"
- Sliding Window - "Substring" with conditions, "Subarray" with fixed/variable size, "Maximum/minimum window", "Contains all"
- Heap Keywords - "K largest" or "K smallest", "Top K elements", "Median", "Priority"
- Stack Keywords - "Nested structure", "Undo operations"
- Binary Search - "Search in sorted", "Minimize maximum", "First/last occurrence"
- We need a left and right boundary (min and max) for Binary Search, along with a property that lets us discard the elements before/after mid. One such property is sortedness, but it can be any check() as long as it is monotonic. The main idea is to recast the problem so the answer is not simply a matter of picking the largest values first (greedy), but of searching for a threshold or minimal/optimal value that cannot be reached by sequentially picking values.
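Here is the multi-source BFS idea from the patterns list as a hedged sketch: a grid where '#' marks a blocked cell, every source starts at distance 0 in the same queue, and the first time a cell is reached gives its shortest distance from any source (the grid format and function name are my own assumptions).
from collections import deque

def multi_source_bfs(grid, sources):
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]  # -1 means unvisited / unreachable
    queue = deque()
    for r, c in sources:          # seed the queue with every source at distance 0
        dist[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#' and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

grid = [list(".#."), list("..."), list(".#.")]
print(multi_source_bfs(grid, [(0, 0), (2, 2)]))  # [[0, -1, 2], [1, 2, 1], [2, -1, 0]]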
Every problem listed is unique and teaches you something. Let's gooo....
Arrays
- XOR to find unique element and duplicates
nums = [1, 2, 2, 3, 3]
unique = 0
for num in nums:
unique ^= num # XOR accumulates and cancels out duplicates
print(unique) # Output: 1
- Rotation of array and Reverse after m positions
def rotate_right(arr, k):
    k %= len(arr) # Handle cases where k > len(arr)
    return arr[-k:] + arr[:-k] # Move last k elements to the front

def rotate_left(arr, k):
    k %= len(arr)
    return arr[k:] + arr[:k] # Move first k elements to the back
def reverse_after_m(arr, m):
arr[m+1:] = arr[m+1:][::-1] # Reverse portion after m
return arr
- A subarray/substring is a contiguous part of an array, meaning all elements are taken in sequence without skipping any in between. A subsequence is a sequence that can be derived from the array by deleting some (or no) elements without changing the order of the remaining elements; elements need not be contiguous. A subset is any possible combination of the original elements, regardless of order or contiguity. Generating subarrays by brute force is O(n^2); generating all subsets/subsequences is O(n * 2^n) with O(n) space, as sketched below.
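A small sketch of that O(n * 2^n) subset generation using bitmasks (the helper name all_subsets is just for illustration):
def all_subsets(nums):
    # Each bitmask from 0 to 2^n - 1 selects one subset; building each costs O(n)
    n = len(nums)
    subsets = []
    for mask in range(1 << n):
        subsets.append([nums[i] for i in range(n) if mask & (1 << i)])
    return subsets

print(all_subsets([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]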
- When finding a target sum, if the array is not sorted, it's better to use a hashmap to store value-index pairs and then do an O(n) pass over the array with O(1) lookups in the hashmap to find the complement. But if the array is sorted, two pointers should be used, since the sum can be increased/decreased to match the target (move i forward to increase the sum, move j back to decrease it). Both are sketched below.
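Both variants as minimal sketches returning a pair of indices (function names are mine):
def two_sum_unsorted(nums, target):
    # Hashmap of value -> index; one pass, O(n) time, O(n) space
    seen = {}
    for i, num in enumerate(nums):
        if target - num in seen:
            return [seen[target - num], i]
        seen[num] = i
    return []

def two_sum_sorted(nums, target):
    # Two pointers on a sorted array; O(n) time, O(1) space
    i, j = 0, len(nums) - 1
    while i < j:
        s = nums[i] + nums[j]
        if s == target:
            return [i, j]
        if s < target:
            i += 1   # need a bigger sum
        else:
            j -= 1   # need a smaller sum
    return []

print(two_sum_unsorted([3, 2, 4], 6))   # [1, 2]
print(two_sum_sorted([1, 3, 4, 6], 7))  # [0, 3]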
- Prefix Sum to find sum between specific indices in array. Useful for subarray sum and continuity (contiguous) is maintained.
"""
prefix[i] stores the sum of all elements from index 0 to i-1.
prefix = 0 represents the sum before any elements.
This lets you get the sum of any subarray arr[l:r] using:
sum(l,r) = prefix[r+1] - prefix[l]
This reduces querying subarray sums from O(n) to O(1) in time, using O(n) in space for prefix sum array
"""
def prefix_sum(self, arr):
n = len(arr)
prefix = [0] * (n + 1) # prefix sum array has added element as sum of all numbers at last
for i in range(n): # iterate till n - 1
prefix[i+1] = prefix[i] + arr[i]
return prefix
# iterating across all l and r is O(n^2)
def range_sum(self, prefix, l, r):
return prefix[r+1] - prefix[l]
# Instead, count subarrays with sum == k by looking up curr_sum - k among earlier prefix sums
def count_subarrays_with_sum(self, nums, k):
    count = 0
    prefix = self.prefix_sum(nums)
    prefix_counts = defaultdict(int)
    prefix_counts[0] = 1 # sum = 0 before starting {0 : 1}
    for i in range(1, len(prefix)):
        curr_sum = prefix[i]
        needed = curr_sum - k
        count += prefix_counts.get(needed, 0)
        prefix_counts[curr_sum] += 1
    return count
- The Difference Array technique lets us update a whole range of an array by changing just 2 elements. To increase the values in range [a, b] by x, increase the value at position a by x and the value at position b+1 by -x, then take a prefix sum over those indices. The 2D Difference Array is quite useful in image processing.
def apply_range_updates(array_length, updates):
"""
O(1) per update
"""
# Step 1: Create a difference array of size (n + 1) initialized to 0
diff_array = [0] * (array_length + 1)
# Step 2: Apply each update in O(1) using the diff array
for start, end, increment in updates:
diff_array[start] += increment
if end + 1 < array_length:
diff_array[end + 1] -= increment
# Step 3: Convert the diff array to the final result using prefix sum
final_array = [0] * array_length
final_array[0] = diff_array[0]
for i in range(1, array_length):
final_array[i] = final_array[i - 1] + diff_array[i]
return final_array
- Isomorphic Strings
def is_isomorphic_single_direction(s: str, t: str) -> bool:
"""
Checks if string s can be transformed into string t following the
isomorphic string rules (a consistent one-to-one character mapping, preserving order).
"""
if len(s) != len(t):
return False
s_to_t_map = {}
t_to_s_map = {}
for i in range(len(s)):
char_s = s[i]
char_t = t[i]
# Check mapping from s to t
if char_s in s_to_t_map:
if s_to_t_map[char_s] != char_t:
return False
else:
if char_t in t_to_s_map:
return False # t_char is already mapped to another s_char
s_to_t_map[char_s] = char_t
t_to_s_map[char_t] = char_s
return True
def is_isomorphic(s: str, t: str) -> bool:
"""
Determines if two strings s and t are isomorphic by checking the
isomorphism in both directions.
"""
return is_isomorphic_single_direction(s, t) and is_isomorphic_single_direction(t, s)
- Longest Consecutive Subsequence
def longestConsecutive(self, nums: List[int]) -> int:
numSet = set(nums) # no need to worry about duplicates in between, we are hopping
longest = 0
for num in numSet:
    if (num - 1) not in numSet: # O(1) set lookup; only start counting at the beginning of a run
length = 1
while (num + length) in numSet: # consecutives are in set
length += 1
longest = max(length, longest)
return longest
- Next Permutation
def nextPermutation(self, nums: List[int]) -> None:
n = len(nums)
i = n - 2
# Find the pivot: first index where nums[i] < nums[i+1]
while i >= 0 and nums[i] >= nums[i + 1]:
i -= 1
# If pivot found (i >= 0), find successor to swap
if i >= 0:
j = n - 1
# Find smallest element greater than nums[i] from the end
while j > i and nums[j] <= nums[i]:
j -= 1
# Swap pivot with successor
nums[i], nums[j] = nums[j], nums[i]
# Reverse the suffix starting from i+1 to end
left, right = i + 1, n - 1
while left < right:
nums[left], nums[right] = nums[right], nums[left]
left += 1
right -= 1
- Count Longest Subarray with Sum = k
"""
When the array can contain negative integers (or zeros), this prefix-sum solution is the one to use; when it contains only non-negative numbers we can further optimize with greedy + two pointers, exploiting monotonicity. When all elements are positive, increasing the end pointer (right) of a window always increases the sum, and moving the start pointer (left) forward always decreases it. With negative numbers or zeros, increasing the window size can decrease or not change the sum, so the sliding window property breaks.
In both cases, time complexity is O(n)
"""
def longest_subarray_sum_k(arr, k):
# Dictionary to store the earliest index of each prefix sum
prefix_sum_indices = {0: -1}
prefix_sum = 0
max_length = 0 # Length of the longest subarray with sum = k
count = 0 # Number of longest subarrays with sum = k
for i, num in enumerate(arr):
prefix_sum += num
# Check if there is a prefix sum that, when subtracted, gives sum = k
if prefix_sum - k in prefix_sum_indices:
length = i - prefix_sum_indices[prefix_sum - k]
# If found a longer subarray, update max_length and reset count
if length > max_length:
max_length = length
count = 1
# If found another subarray of same max_length, increment count
elif length == max_length:
count += 1
# Only store the earliest index for each prefix sum to maximize subarray length
if prefix_sum not in prefix_sum_indices:
prefix_sum_indices[prefix_sum] = i
return max_length, count
- Maximum Subarray sum
def kadane(arr):
"""Maximum subarray sum"""
max_current = max_global = arr[0]
for i in range(1, len(arr)):
max_current = max(arr[i], max_current + arr[i])
max_global = max(max_global, max_current)
return max_global
- Sliding Window Maximum
def max_sliding_window(nums: list[int], k: int) -> list[int]:
"""Maximum in sliding window using deque"""
if not nums:
return []
result = []
window = deque()
for i in range(len(nums)):
while window and window[0] < i - k + 1:
window.popleft()
while window and nums[window[-1]] < nums[i]:
window.pop()
window.append(i)
if i >= k - 1:
result.append(nums[window[0]])
return result
- Quick and Merge Sort are O(n log n), but if the values come from a small finite range, Bucket Sort can sort in O(n) time. It is also useful when you have a large dataset and want to distribute the sorting workload: sorting in memory is faster than sorting on disk, so if you have more data than fits in memory, you can bucket the data so each bucket is small enough to fit into memory and then quicksort each bucket individually.
- Top K Frequent Elements
"""
O(n) Space and Time
Bucket sort is an algorithm which involves grouping numbers based on their frequency. Use the bucket sort algorithm to create n buckets, grouping numbers based on their frequencies from 1 to n. Then, pick the top k numbers from the buckets, starting from n down to 1.
Heap solution is better than sorting, as it will be O(k log n) compared to O(n log n), but needs O(n + k) space compared to O(n).
"""
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
count = {}
freq = [[] for i in range(len(nums) + 1)] # frequencies are bounded by the length of the array
for num in nums:
count[num] = 1 + count.get(num, 0)
for num, cnt in count.items():
# count is index for array of nums there!
freq[cnt].append(num)
res = []
for i in range(len(freq) - 1, 0, -1): # reverse traversal for top k
for num in freq[i]:
res.append(num)
if len(res) == k:
return res
- Starting a pointer from the back is beneficial when you are trying to merge things under some condition (like non-decreasing order).
Strings
- Minimum Changes To Make Alternating Binary String
"""
There are two valid alternating patterns for any string of length n:
Starting with '0': e.g. "010101..."
Starting with '1': e.g. "101010..."
"""
def minOperations(self, s: str) -> int:
flips_start_with_0 = 0
flips_start_with_1 = 0
for i, ch in enumerate(s):
expected_0 = '0' if i % 2 == 0 else '1'
expected_1 = '1' if i % 2 == 0 else '0'
if ch != expected_0:
flips_start_with_0 += 1
if ch != expected_1:
flips_start_with_1 += 1
return min(flips_start_with_0, flips_start_with_1)
- Longest Substring without Repeating Character
"""
Time: O(n), each character is visited at most twice
Space: O(min(n, m)), where m is the alphabet size (26 when only lowercase)
"""
def lengthOfLongestSubstring(self, s: str) -> int:
char_index_map = {} # stores character -> last seen index
start = 0 # start index of current window
max_length = 0 # result
for end, char in enumerate(s):
# If the character is already seen and is in the current window
if char in char_index_map and char_index_map[char] >= start:
start = char_index_map[char] + 1 # shrink window from left
char_index_map[char] = end # update last seen index
max_length = max(max_length, end - start + 1)
return max_length
- Find Anagram or Permutation or similar in String
"""
O(n+m) Time and O(1) space
If the input consists only of lowercase English letters, then a fixed-size frequency count (26 slots) is most optimal. Otherwise, simply use the sorted tuple of the string as the hashmap key.
"""
from typing import List
class Solution:
def findAnagrams(self, s: str, p: str) -> List[int]:
n, m = len(s), len(p)
if m > n:
return []
res = []
# a-z has 26 characters
pCount = [0] * 26
sCount = [0] * 26
# Build frequency counts for the first window and target
for i in range(m):
pCount[ord(p[i]) - ord('a')] += 1
sCount[ord(s[i]) - ord('a')] += 1
# Track how many characters match in frequency
matches = sum(pCount[i] == sCount[i] for i in range(26))
# Initial match
if matches == 26:
res.append(0)
for i in range(m, n):
left = ord(s[i - m]) - ord('a')
right = ord(s[i]) - ord('a')
# Remove the outgoing character
if sCount[left] == pCount[left]:
matches -= 1
sCount[left] -= 1
if sCount[left] == pCount[left]:
matches += 1
# Add the new character
if sCount[right] == pCount[right]:
matches -= 1
sCount[right] += 1
if sCount[right] == pCount[right]:
matches += 1
if matches == 26:
res.append(i - m + 1)
return res
# Grouping anagrams
d = defaultdict(list)
for s in strs: # O(N) strings, each sorted in O(K log K)
    d[tuple(sorted(s))].append(s)
return list(d.values())
- Maximum Score After Splitting a String
"""
Score = zero_in_left + (total_ones - ones_in_left)
Time - O(N) in one pass through array
"""
def maxScore(self, s: str) -> int:
# Initialize left and right counts
left_zeros = 1 if s[0] == '0' else 0
right_ones = s[1:].count('1')
max_score = left_zeros + right_ones
for i in range(1, len(s) - 1):
if s[i] == '0':
left_zeros += 1
else:
right_ones -= 1
max_score = max(max_score, left_zeros + right_ones)
return max_score
- String Compression
"""
O(n) Time and O(1) Space - in-place modification
"a" -> just "a" (not "a1")
"a","a" -> "a","2"
"a","b" -> "a","b" (no numbers needed)
"""
from typing import List
class Solution:
def compress(self, chars: List[str]) -> int:
write = 0 # Where we write next
left = 0 # Start of current group
for right in range(len(chars) + 1): # Extra step to flush last group
# Check group break (or end of list)
if right == len(chars) or chars[right] != chars[left]:
count = right - left
chars[write] = chars[left]
write += 1
if count > 1:
for digit in str(count):
chars[write] = digit
write += 1
left = right # Move to start of next group
return write
- Next Smallest Palindrome
"""
Time Complexity - O(N)
Lots of edge cases :
All 9s, e.g. 999 -> 1001
Mirroring creates a smaller or equal number, e.g. 12932 -> mirror -> 12921 < 12932, so you must increment and mirror again
Odd vs Even Length
Middle carry propagation when incrementing, e.g. 1991 -> mirror = 1991, which is not greater, so increment 19 -> 20, giving 2002
"""
def next_palindrome(num: str) -> str:
n = len(num)
# Edge case: all 9s (e.g. 999 -> 1001)
if set(num) == {'9'}:
    return "1" + "0" * (n - 1) + "1"
half = n // 2
is_odd = (n % 2 != 0)
# Split into left, middle (if odd), right
left = num[:half]
middle = num[half] if is_odd else ''
right = num[half+1:] if is_odd else num[half:]
# Mirror left + middle to create a palindrome
if is_odd:
candidate = left + middle + left[::-1]
else:
candidate = left + left[::-1]
if candidate > num:
return candidate
# If not, we need to increment (left + middle)
if is_odd:
incremented = str(int(left + middle) + 1)
new_left = incremented[:-1]
new_middle = incremented[-1]
return new_left + new_middle + new_left[::-1]
else:
incremented = str(int(left) + 1)
return incremented + incremented[::-1]
- Flip String to Monotonic Increasing
"""
We want to decide a split point in the string such that:
All characters to the left are 0
All characters to the right are 1
We try every split point i (between characters):
Count number of 1s before i (that need to be flipped to 0)
Count number of 0s after i (that need to be flipped to 1)
The total flips = ones_before + zeros_after
Time Complexity: O(n)
Space Complexity: O(n) (due to the prefix array)
"""
"""
|00110 -> 0 ones left, 3 zeros right -> 3
0|0110 -> 0 ones left, 2 zeros right -> 2
00|110 -> 0 ones left, 1 zero right -> 1
001|10 -> 1 one left, 1 zero right -> 2
0011|0 -> 2 ones left, 1 zero right -> 3
00110| -> 2 ones left, 0 zeros right -> 2
"""
def minFlipsMonoIncr(self, s: str) -> int:
n = len(s)
# Compute prefix sum of 1s
ones_prefix = [0] * (n + 1)
for i in range(n):
ones_prefix[i + 1] = ones_prefix[i] + (1 if s[i] == '1' else 0)
min_flips = float('inf')
for i in range(n + 1):
# Left: 0..i-1 should be all 0s, so flip every 1 in that region
ones_left = ones_prefix[i]
# Right: i..n-1 should be all 1s, so flip every 0 in that region
zeros_right = (n - i) - (ones_prefix[n] - ones_prefix[i])
min_flips = min(min_flips, ones_left + zeros_right)
return min_flips
LinkedList
- In typical brute-force solutions on linked lists, when we want to find the length of the list :
"""
O(N) Time
"""
def length_of_linked_list(head: Optional[ListNode]):
"""
Although nodes is just a Python list, each element in nodes is still a reference to a node in the original linked list.
"""
nodes = []
current = head
while current is not None:
nodes.append(current)
current = current.next
return len(nodes)
# if we want to remove from back in bruteforce way
removeIndex = len(nodes) - n
if removeIndex == 0:
return head.next
# directly operate on nodes array as its reference to LL
nodes[removeIndex - 1].next = nodes[removeIndex].next
return head
- If we are working on problems where we may be asked to remove the head node (and there may be no other nodes in the linked list), a dummy node handles this case and our logic works unchanged; otherwise we would need extra if conditions for this special case. Beware of .next.next giving a null pointer exception.
- Reverse Sublist
"""
Time - O(n) and Space - O(1)
"""
class Solution:
def reverseBetween(self, head: Optional[ListNode], left: int, right: int) -> Optional[ListNode]:
if not head or left == right:
return head
dummy = ListNode(0, head)
prev, head = dummy, dummy.next
# Move `prev` to node before `left`
for _ in range(left - 1):
prev = prev.next
# `start` is the first node to reverse
start = prev.next
then = start.next
# Reverse the sublist using head-insertion
for _ in range(right - left):
start.next = then.next
then.next = prev.next
prev.next = then
then = start.next
return dummy.next
- Loop Detection and Removal from LinkedList (Fast and Slow pointers)
def detect_and_remove_cycle(head: ListNode) -> ListNode:
if not head or not head.next:
return head
# Step 1: Detect loop
slow = fast = head
has_cycle = False
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if slow == fast:
has_cycle = True
break
if not has_cycle:
return head # No loop detected
# Step 2: Find the start of the loop
slow = head
while slow != fast:
slow = slow.next
fast = fast.next
# slow (or fast) is now at the start of the loop
# Step 3: Find the last node in the cycle and break it
ptr = slow
while ptr.next != slow:
ptr = ptr.next
ptr.next = None # Break the loop
return head
Monotonic Stack
- Next/Previous Greater/Smaller Element
def next_greater_element(nums):
n = len(nums)
res = [-1] * n # Initialize result with -1 (no greater element)
stack = [] # Will store indices in decreasing order of nums values
for i in range(n):
# Pop from stack while current element is greater than element at stack top
while stack and nums[i] > nums[stack[-1]]:
idx = stack.pop()
res[idx] = nums[i] # Found next greater element for idx
stack.append(i)
return res
def next_smaller_element(nums):
n = len(nums)
res = [-1] * n # Initialize with -1 (no smaller element)
stack = [] # Will store indices in increasing order of nums values
for i in range(n):
while stack and nums[i] < nums[stack[-1]]:
idx = stack.pop()
res[idx] = nums[i] # Found next smaller element
stack.append(i)
return res
def previous_greater_element(nums):
"""
For each element, find the nearest element to the LEFT that is GREATER.
Returns a list where res[i] is the previous greater element of nums[i],
or -1 if no such element exists.
"""
n = len(nums)
res = [-1] * n # Initialize result array with -1 (means no previous greater)
stack = [] # Stack to keep indices of elements, maintaining strictly decreasing order
for i in range(n):
# Pop elements from the stack while current element is >= element at stack top
# because they cannot be previous greater for future elements
while stack and nums[i] >= nums[stack[-1]]:
stack.pop()
# If stack is not empty, the top is the index of previous greater element
res[i] = nums[stack[-1]] if stack else -1
# Push current index to stack to possibly be previous greater for future elements
stack.append(i)
return res
def previous_smaller_element(nums):
"""
For each element, find the nearest element to the LEFT that is SMALLER.
Returns a list where res[i] is the previous smaller element of nums[i],
or -1 if no such element exists.
"""
n = len(nums)
res = [-1] * n # Initialize result array with -1 (means no previous smaller)
stack = [] # Stack to keep indices of elements, maintaining strictly increasing order
for i in range(n):
# Pop elements from stack while current element is <= element at top of stack
# because they cannot be previous smaller for future elements
while stack and nums[i] <= nums[stack[-1]]:
stack.pop()
# If stack not empty, top of stack is index of previous smaller element
res[i] = nums[stack[-1]] if stack else -1
# Push current index for future elements to compare
stack.append(i)
return res
- In problems like Valid Parentheses, you can simply treat ( as +1 and ) as -1; if the final sum is 0, the string is valid. If the running sum ever becomes -1, we can stop then and there, since there is now a ) without a matching (. A stack is not necessary.
def is_valid_parentheses(s: str) -> bool:
open_count = 0 # Tracks the net number of unmatched '('
for char in s:
if char == '(':
open_count += 1
elif char == ')':
open_count -= 1
# If at any point open_count is negative, a ')' is unmatched
if open_count < 0:
return False
# At the end, all '(' must be matched by ')'
return open_count == 0
- Expression Evaluation
def evaluate(expression: str) -> int:
    def calculate(tokens):
        stack = []
        num = 0
        sign = '+' # Default sign is '+'
        while tokens:
            token = tokens.pop(0) # Read one character at a time
            if token.isdigit(): # Build multi-digit number
                num = num * 10 + int(token)
            elif token == '(': # Solve sub-expression inside parentheses
                num = calculate(tokens)
            # Flush the pending number when we hit an operator, a ')', or the end of input
            if token in "+-*/)" or not tokens:
                if sign == '+':
                    stack.append(num)
                elif sign == '-':
                    stack.append(-num)
                elif sign == '*':
                    stack.append(stack.pop() * num)
                elif sign == '/':
                    stack.append(int(stack.pop() / num)) # Python's `/` is float, so use `int()` to truncate
                sign = token # Update the operator
                num = 0 # Reset the number
                if token == ')': # Return computed result once the closing parenthesis is handled
                    break
        return sum(stack) # Sum all elements left in the stack
    return calculate(list(expression.replace(" ", ""))) # Convert expression to list (char by char)
Binary Search
Binary search is used when you need to repeatedly find a value in a sorted array without worrying about removals, or when each query is independent. Apply it whenever there is a bounded range and we want to find a given target or a min/max (by reducing the search space via logarithmic partitioning of the monotonic answer function's range).
def binary_search(arr, target):
left, right = 0, len(arr) - 1
while left <= right:
mid = left + (right - left) // 2 # Avoids overflow
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1 # Target not found
- The lower bound of target is the index of the first element in arr that is greater than or equal to target. The upper bound of target is the index of the first element in arr that is strictly greater than target. If target is not present, both give the index where target would be inserted to maintain sorted order. In a sorted array, the first occurrence is at lower bound and the last occurrence is at upper bound - 1. The total number of occurrences of an element in a sorted array is (last occurrence - first occurrence) + 1 = (upper bound - lower bound).
def lower_bound(array, target):
"""
Returns the smallest index l such that array[l] >= target (every element before index l is < target).
If all elements are < target, returns len(array).
"""
left, right = 0, len(array) # allows the solution to return len(arr) when the target is greater than all elements in the array (i.e., the insertion point is at the end) and 0 if less than all elements in array
while left < right:
mid = (left + right) // 2
# If mid element is less than target, move left up
if array[mid] < target:
left = mid + 1
else:
# mid is a candidate, but keep searching left
right = mid
# Now, left == right is the first index where array[index] >= target
# If all elements in the array are less than target, the search will finish with left == len(array).
return left
def upper_bound(array, target):
"""
Returns the smallest index r such that array[r] > target and array[left] <= target.
If all elements are <= target, returns len(array).
To get the index of the last element <= target, return upper_bound(array, target) - 1
"""
left, right = 0, len(array)
while left < right:
mid = (left + right) // 2
if array[mid] <= target:
# mid is too small or just right, move left up
left = mid + 1
else:
# mid is a candidate, move right down
right = mid
# left == right is the first index where array[index] > target
# If all elements in the array are less than or equal to target, the search will finish with left == len(array).
return left
- In binary search on answers, the answer space is a sorted contiguous range. The check function over that range looks like [false, false, false, true, true, true, ...], changing monotonically, and the first true is what we are interested in, i.e. the lower bound.
def binary_search_answer(space):
left, right = 0, len(space)
while left < right:
mid = (left + right) // 2
if check(space[mid]):
right = mid # mid might be the first True, so keep it
else:
left = mid + 1 # mid is False, need larger
return left # index of first True (if any)
- Longest Common Prefix
"""
Approach Time Complexity Space Complexity Suitable For
Brute Force (Vertical Scanning) O(N * M) O(1) Small Inputs
Binary Search O(N log M) O(1) Medium Inputs
Divide & Conquer O(N * M) O(log N) Recursion-friendly cases
Trie (Prefix Tree) O(N * M) O(N * M) Large Inputs
"""
def is_common_prefix(strs, length):
prefix = strs[0][:length]
return all(s.startswith(prefix) for s in strs)
def longest_common_prefix(strs):
if not strs:
return ""
left, right = 0, min(map(len, strs))
while left < right:
mid = (left + right + 1) // 2
if is_common_prefix(strs, mid):
left = mid # Move right (increase prefix size)
else:
right = mid - 1 # Move left (decrease prefix size)
return strs[0][:left]
- Search in Rotated Sorted Array
def findPivot(self, arr: List[int]) -> int:
    # Returns the index of the minimum element, i.e. the rotation point
    left, right = 0, len(arr) - 1
    while left < right: # strict <, so we stop exactly when left == right
        mid = (left + right) // 2
        # Both parts are sorted individually; if the mid value is greater than the rightmost value,
        # the minimum must lie to the right of mid
        if arr[mid] > arr[right]:
            left = mid + 1
        else:
            # arr[mid] <= arr[right]: the minimum is at mid or to its left
            right = mid
    return left # left == right here, pointing at the minimum element
Now, from the [0, pivot_index - 1] and [pivot_index, end] apply binary search to both to get the target in rotated array. Also pivot_index is the number of times the array has been rotated.
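Putting the pieces together, a hedged sketch of the full search: find the pivot as above, then binary search whichever sorted half could hold the target (search_rotated is an illustrative name, not from the problem statement):
def search_rotated(nums, target):
    # Step 1: find the pivot (index of the minimum element)
    left, right = 0, len(nums) - 1
    while left < right:
        mid = (left + right) // 2
        if nums[mid] > nums[right]:
            left = mid + 1
        else:
            right = mid
    pivot = left

    # Step 2: plain binary search inside one sorted half
    def bsearch(lo, hi):
        while lo <= hi:
            mid = (lo + hi) // 2
            if nums[mid] == target:
                return mid
            if nums[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    # The target is either in the left sorted part [0, pivot-1] or the right part [pivot, n-1]
    res = bsearch(0, pivot - 1)
    return res if res != -1 else bsearch(pivot, len(nums) - 1)

print(search_rotated([4, 5, 6, 7, 0, 1, 2], 0))  # 4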
- Minimum Size Subarray Sum
"""
Prefix + Binary Search is O(n log n). This is the best we can do when the array contains positive, negative, and zero values. Prefix sums allow you to compute the sum of any subarray efficiently. Convert the problem of minimal subarray length with sum >= target into finding, for each i, the smallest j such that prefixSum[j] - prefixSum[i] >= target.
"""
def minSubArrayLen(self, target: int, nums: List[int]) -> int:
n = len(nums)
prefix_sums = [0] * (n + 1)
# Compute prefix sums
for i in range(n):
prefix_sums[i + 1] = prefix_sums[i] + nums[i]
min_length = n + 1 # this length cannot be the answer, so min() with it from any subarray will be fine
for start in range(n):
left, right = start, n
# Binary search to find smallest end index where subarray sum >= target (lower bound)
while left < right:
mid = (left + right) // 2
current_sum = prefix_sums[mid + 1] - prefix_sums[start]
if current_sum >= target:
right = mid
else:
left = mid + 1
if left != n:
min_length = min(min_length, left - start + 1)
# If no valid subarray found, return 0
return 0 if min_length == n + 1 else min_length
"""
Sliding Window exploits the fact that we have only positive integers in the array. The technique relies on the property that adding more elements to the window never decreases the sum (because all are positive). So once a window with sum >= target is found, extending the right pointer would only make it longer while staying valid; since we want the minimum length, we instead move the left pointer to shrink the window as soon as we see a valid subarray.
"""
def minSubArrayLen(self, target: int, nums: List[int]) -> int:
left, sum = 0, 0
count = float("inf")
"""
Just because we are using 2 loops doesn't make it O(n^2).
The outer for right in range(len(nums)) loop runs n times. The inner while sum >= target: loop moves the left pointer forward. Critically, left pointer only moves forward throughout the entire algorithm (never back). Both left and right pointers together move at most 2n steps (right goes from 0 to n-1; left only moves forward). Each element is visited at most twice: once when right includes it and once when left excludes it.
"""
for right in range(len(nums)):
sum += nums[right]
while sum >= target:
count = min(right - left + 1, count)
sum -= nums[left]
left += 1
return 0 if count == float("inf") else count
- Koko Eating Banana
import math

def minEatingSpeed(self, piles: list[int], h: int) -> int:
"""
Finds the minimum eating speed (k) such that Koko can eat all bananas
within h hours.
Args:
piles: A list of integers representing the number of bananas in each pile.
h: The maximum number of hours Koko has to eat all bananas.
Returns:
The minimum integer k (eating speed).
"""
# The search space for 'k'
# The minimum possible speed is 1 (Koko must eat at least 1 banana per hour).
# The maximum possible speed is the largest pile size. If Koko eats at this speed,
# she'll finish each pile in 1 hour, which is the fastest possible.
low = 1
high = max(piles)
# This will store the minimum valid 'k' found so far.
# Initialize with 'high' because we know 'high' is always a valid speed,
# but we want to find the minimum.
min_k = high
# Binary search for the minimum k
while low <= high:
mid = (low + high) // 2 # Calculate the middle speed
# Calculate the total time taken to eat all bananas at 'mid' speed
total_time = 0
for pile in piles:
# For each pile, the time taken is ceil(pile / mid).
# math.ceil(a / b) calculates the smallest integer greater than or equal to a/b.
# This correctly accounts for cases where pile < mid (eats all in 1 hour)
# or pile is a multiple of mid (exact division) or not (needs an extra hour).
total_time += math.ceil(pile / mid)
# Check if Koko can eat all bananas within 'h' hours at 'mid' speed
if total_time <= h:
# If she can, 'mid' is a possible answer.
# We try to find an even smaller 'k' by searching in the lower half.
min_k = mid
high = mid - 1
else:
# If she cannot, 'mid' is too slow.
# We need a faster speed, so search in the upper half.
low = mid + 1
return min_k
- Most Beautiful Item for Each Query
"""
Binary Search + Prefix Sum for range queries
"""
from bisect import bisect_right

def maximumBeauty(self, items, queries):
# Step 1: Sort items by price
items.sort()
# Step 2: Build prefix max array for beauty
prices = []
prefix_max_beauty = []
max_beauty = 0
for price, beauty in items:
# Only keep the best beauty seen so far for each price
max_beauty = max(max_beauty, beauty)
if prices and prices[-1] == price:
# skip duplicate prices, keep max
prefix_max_beauty[-1] = max_beauty
else:
prices.append(price)
prefix_max_beauty.append(max_beauty)
# Step 3: Binary search for each query
res = []
for q in queries:
# Find the rightmost item with price <= q
idx = bisect_right(prices, q) - 1
if idx >= 0:
res.append(prefix_max_beauty[idx])
else:
res.append(0)
return res
- Find in Mountain Array
# """
# This is MountainArray's API interface.
# You should not implement it, or speculate about its implementation
# """
#class MountainArray:
# def get(self, index: int) -> int:
# def length(self) -> int:
class Solution:
def findInMountainArray(self, target: int, mountainArr: 'MountainArray') -> int:
n = mountainArr.length()
# Step 1: Find peak index of the mountain array using binary search
left, right = 1, n - 2 # peak cannot be first or last element
while left <= right:
mid = (left + right) // 2
val_left = mountainArr.get(mid - 1)
val_mid = mountainArr.get(mid)
val_right = mountainArr.get(mid + 1)
# If slope is ascending: move right to find peak
if val_left < val_mid < val_right:
left = mid + 1
# If slope is descending: move left to find peak
elif val_left > val_mid > val_right:
right = mid - 1
else:
# Found peak point where array changes from increasing to decreasing
break
peak_index = mid
# Step 2: Binary search on ascending part from start to peak
left, right = 0, peak_index
while left <= right:
mid = (left + right) // 2
val = mountainArr.get(mid)
if val == target:
return mid
elif val < target:
left = mid + 1
else:
right = mid - 1
# Step 3: Binary search on descending part from peak+1 to end
left, right = peak_index + 1, n - 1
while left <= right:
mid = (left + right) // 2
val = mountainArr.get(mid)
if val == target:
return mid
elif val > target:
left = mid + 1
else:
right = mid - 1
# If target is not found in either part
return -1
Recursion and Backtracking
- Lexicographical Numbers
def lexicalOrder(n: int) -> List[int]:
result = []
def dfs(curr):
if curr > n:
return
result.append(curr)
for i in range(10):
next_num = curr * 10 + i
if next_num > n:
break # Prune this branch
dfs(next_num)
for i in range(1, 10):
dfs(i)
return result
- Palindrome Partitioning
"""
Returns all possible palindrome partitionings of string s.
Each partition is a list of substrings, all of which are palindromes.
At each recursion, you are not deciding for each character whether to include it or not. Instead, you are choosing every possible end index for a substring starting at the current position, and only recursing on those substrings that are palindromes. This is a multi-way branching at each step, not binary.
Time and Space Complexity - O(n * 2^n), as each partitioning operation is O(n) and there are up to 2^(n-1) partitions
"""
from typing import List
class Solution:
def partition(self, s: str) -> List[List[str]]:
result = []
substring = []
self.backtrack(result, 0, substring, s)
return result # Return all palindrome partitions collected
def backtrack(self, result: List[List[str]], start: int, substring: List[str], s: str) -> None:
# If we reached the end, add a copy of current partition to the result
if start == len(s):
result.append(substring[:]) # add copy of substring
return
# Try all possible partitions starting at 'start'
for end in range(start, len(s)):
if self.isPalindrome(s, start, end):
# Append current palindrome substring
substring.append(s[start:end+1])
# Recurse for the remaining substring
self.backtrack(result, end + 1, substring, s)
# Backtrack: remove last substring and try next
substring.pop()
def isPalindrome(self, s: str, left: int, right: int) -> bool:
# Check palindrome between indices left and right inclusive
while left < right:
if s[left] != s[right]:
return False
left += 1
right -= 1
return True
- Word Search
def exist(board, word):
def dfs(i, j, k):
if k == len(word):
return True
if i < 0 or i >= len(board) or j < 0 or j >= len(board[0]) or board[i][j] != word[k]:
return False
temp = board[i][j]
board[i][j] = '#' # Mark as visited
# Explore neighbors
if (dfs(i + 1, j, k + 1) or dfs(i - 1, j, k + 1) or dfs(i, j + 1, k + 1) or dfs(i, j - 1, k + 1)):
return True
board[i][j] = temp # Backtrack
return False
for i in range(len(board)):
for j in range(len(board[0])):
if dfs(i, j, 0):
return True
return False
- Letter combinations of a phone number
"""
Approach Time Space Notes
Backtracking (DFS) O(4^N) O(N) Efficient, recursive
Iterative (BFS with Queue) O(4^N) O(4^N) Uses queue for level-wise expansion
"""
from typing import List
def letterCombinations(digits: str) -> List[str]:
if not digits:
return []
phone_map = {
"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
"6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"
}
res = []
def backtrack(index, path):
if index == len(digits):
res.append("".join(path))
return
for char in phone_map[digits[index]]:
path.append(char)
backtrack(index + 1, path)
path.pop() # Backtrack to previous state
backtrack(0, [])
return res
- Partition to K Equal Sum Subsets
"""
We can generate all subsets such that, each element falls into one of k buckets - each has k options, such that their sum is same = sum(nums)/k. Such decision tree would be of height n, so O(k^n) - pretty bad!
Now we don't really need to put the elements into actual buckets; we can just recurse over each element and, whenever the running sum matches the target, store that subset, until we have k such subsets stored. That would be O(k * 2^n) as we will have k such decision trees! Also, we have to track which numbers we have already used so we don't use them again.
Interestingly, with right pruning, its 30ms solution!
Memoization is also possible, if we pass subsets as tuple.
"""
from typing import List
class Solution:
def canPartitionKSubsets(self, nums: List[int], k: int) -> bool:
total_sum = sum(nums)
# If total sum is not divisible by k, partitioning into k equal subsets is impossible
if total_sum % k != 0:
return False
target_sum = total_sum // k
# This list keeps track of the current sum of each subset (not the elements themselves). Tracking sum only is sufficient here because we only need to know if subsets can achieve the target sum, not the exact elements in them.
subsets_sums = [0] * k
# Sorting in descending order helps pruning:
# We try to place bigger numbers first, so if a big number doesn't fit early,
# we can backtrack faster.
nums.sort(reverse=True)
def backtrack(start: int) -> bool:
# If all numbers have been placed successfully
if start == len(nums):
return True
current_num = nums[start]
# Try to put current_num into each of the k subsets
for end in range(k):
# If adding current_num doesn't exceed the target sum for this subset
if subsets_sums[end] + current_num <= target_sum:
subsets_sums[end] += current_num # Choose to add
# Recursive call to place next number
if backtrack(start + 1):
return True # Found a valid partition, no need to explore further
subsets_sums[end] -= current_num # Backtrack (undo the choice)
# Optimization :
# If this subset was empty (sum zero) and adding current_num didn't lead to solution,
# then trying other empty subsets at this level will also fail.
if subsets_sums[end] == 0:
break
# If no subset could accommodate current_num without violating target sum, fail
return False

# Kick off the recursion from the first number (backtrack is the nested helper above)
return backtrack(0)
- Fair Distribution of Cookies
"""
In palindrome partitioning, we divide continuously in a linear manner. However, in this question, we can divide from any point, pick any element, and partition discontinuously.
As the bags chosen are not necessarily contiguous subarrays of cookies, we can't use Greedy or a sliding window of size k.
Greedy algorithms make locally optimal choices at each step, hoping to reach a globally optimal solution. In distribution problems, a locally "fair" choice might lead to a globally unfair outcome later. Imagine cookies [10, 1, 1, 10] and 2 children. If we had to give continuous subarrays, a greedy approach might try to give the first child [10] (largest first). Then the second child gets [1, 1, 10] (unfairness 12). However, the optimal solution might be to give the first child [10, 1] (unfairness 11) and the second child [1, 10] (unfairness 11). The initial greedy choice prevented this better outcome.
"""
def distributeCookies(self, cookies: List[int], k: int) -> int:
n = len(cookies)
# Minimum possible unfairness is the largest single bag.
# Maximum possible unfairness is the sum of all cookies.
left = max(cookies)
right = sum(cookies)
ans = right
def can_distribute(max_unfairness: int) -> bool:
children_cookies = [0] * k
def backtrack(cookie_index: int):
if cookie_index == n:
return True
for i in range(k):
if children_cookies[i] + cookies[cookie_index] <= max_unfairness:
children_cookies[i] += cookies[cookie_index]
if backtrack(cookie_index + 1):
children_cookies[i] -= cookies[cookie_index] # Backtrack
return True
children_cookies[i] -= cookies[cookie_index] # Backtrack
return False
return backtrack(0)
while left <= right:
mid = (left + right) // 2
if can_distribute(mid):
ans = mid
right = mid - 1
else:
left = mid + 1
return ans
Greedy
- Non Overlapping Intervals
"""
Minimize the number of intervals that need to be removed to avoid overlaps, i.e., select the maximum number of non-overlapping intervals.
At every step, always pick the interval that finishes earliest among the available ones. By always picking the interval that ends earliest, you leave the most room for the remaining intervals, maximizing your chances of fitting more without overlap.
"""
def eraseOverlapIntervals(intervals):
if not intervals:
return 0
# Sort intervals by end time
intervals.sort(key=lambda x: x[1])
count = 0
prev_end = intervals[0][1]
for i in range(1, len(intervals)):
start, end = intervals[i]
if start < prev_end: # If the current interval starts before the previous interval ends, they overlap.
count += 1
else:
# No overlap, update prev_end
prev_end = end
return count
- Huffman Coding
import heapq
from collections import defaultdict, Counter
class HuffmanNode:
def __init__(self, freq, char=None, left=None, right=None):
self.freq = freq # Frequency of the character(s)
self.char = char # Character stored (None for internal nodes)
self.left = left # Left child
self.right = right # Right child
# For heapq to sort nodes by frequency
def __lt__(self, other):
return self.freq < other.freq
def build_huffman_tree(char_freq):
"""
Build Huffman tree from character frequencies.
Returns the root of the Huffman tree.
"""
min_heap = []
# Initialize the heap with leaf nodes for each character
for char, freq in char_freq.items():
heapq.heappush(min_heap, HuffmanNode(freq, char))
# Combine nodes until single tree remains
while len(min_heap) > 1:
# Pop two nodes with smallest freq
left = heapq.heappop(min_heap)
right = heapq.heappop(min_heap)
# Merge these nodes into a new internal node
merged = HuffmanNode(left.freq + right.freq, None, left, right)
heapq.heappush(min_heap, merged)
# The remaining node is the root of the Huffman tree
return min_heap[0] if min_heap else None
def build_huffman_codes(root):
"""
Traverse the Huffman tree to build a dictionary of character->code mappings.
"""
huffman_codes = {}
def dfs(node, code):
if node is None:
return
# If leaf node
if node.char is not None:
huffman_codes[node.char] = code if code else "0" # handle single char edge case
return
# Traverse left adds "0", right adds "1"
dfs(node.left, code + "0")
dfs(node.right, code + "1")
dfs(root, "")
return huffman_codes
def huffman_encode(text):
"""
Encode the input text using Huffman coding.
Returns the encoded string and the Huffman tree root for decoding.
"""
# Count frequency of each character
char_freq = Counter(text)
# Build Huffman tree
root = build_huffman_tree(char_freq)
# Build codes by traversing the tree
codes = build_huffman_codes(root)
# Encode the text using code map
encoded_text = "".join(codes[ch] for ch in text)
return encoded_text, root
def huffman_decode(encoded_text, root):
"""
Decode the encoded text using the Huffman tree.
Returns the original text string.
"""
decoded_chars = []
current = root
for bit in encoded_text:
# Traverse tree per bit
if bit == "0":
current = current.left
else:
current = current.right
# On reaching leaf node, append character and restart from root
if current.char is not None:
decoded_chars.append(current.char)
current = root
return "".join(decoded_chars)
- Fractional Knapsack
def fractional_knapsack(weights, values, capacity):
"""
Args:
weights: List of item weights.
values: List of item values.
capacity: Maximum capacity of the knapsack.
Returns:
max_value: Maximum value that can be carried in the knapsack.
Time Complexity - O(nlogn)
"""
# Step 1: Calculate value-to-weight ratio for each item
n = len(weights)
items = []
for i in range(n):
ratio = values[i] / weights[i]
items.append((ratio, weights[i], values[i])) # (ratio, weight, value)
# Step 2: Sort items by ratio in descending order
items.sort(reverse=True)
# Step 3: Accumulate items in the knapsack
max_value = 0.0
for ratio, weight, value in items:
if capacity >= weight:
# If the whole item can be taken, take it all
max_value += value
capacity -= weight
else:
# Take the fraction that fits
fraction = capacity / weight
max_value += value * fraction
break # Knapsack is full
return max_value
Tree
def level_order_traversal(root):
"""BFS for tree"""
if not root:
return []
result = []
# We enqueue child nodes at the back.
# We dequeue nodes from the front to process them level by level.
queue = deque([root]) # queue = [1]
level_sum = 0 # when it's never reset, it gives the sum of all nodes in the binary tree
# each iteration of while loop is one level
while queue:
level = []
level_sum = 0 # reset sum at each level, so last level sum is of deepest layer
level_size = len(queue)
# each iteration of for loop is for nodes at each level
for _ in range(level_size):
node = queue.popleft()
level.append(node.val)
level_sum += node.val
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
result.append(level)
return result
# Iteration 1 (Level 0):
# - level_size = 1
# - Process node 1:
# * Add 1 to current level
# * Add children (2,3) to queue
# - After iteration:
# * queue = [2,3]
# * result = [[1]]
# Iteration 2 (Level 1):
# - level_size = 2
# - Process node 2:
# * Add 2 to current level
# * Add children (4,5) to queue
# - Process node 3:
# * Add 3 to current level
# * Add child (6) to queue
# - After iteration:
# * queue = [4,5,6]
# * result = [[1], [2,3]]
# Iteration 3 (Level 2):
# - level_size = 3
# - Process nodes 4,5,6:
# * Add 4,5,6 to current level
# * No children to add
# - After iteration:
# * queue = []
# * result = [[1], [2,3], [4,5,6]]
# Final result: [[1], [2,3], [4,5,6]]
# Time: O(n) - visit each node once
# Space: O(w) - where w is max width of tree
# Left -> Root -> Right (Sorted for BST)
def inorder(node: TreeNode) -> List[int]:
if not node:
return []
return inorder(node.left) + [node.val] + inorder(node.right)
return inorder(root)
# Root -> Left -> Right
def preorder(node: TreeNode) -> List[int]:
if not node:
return []
return [node.val] + preorder(node.left) + preorder(node.right)
return preorder(root)
# Left -> Right -> Root
def postorder(node: TreeNode) -> List[int]:
if not node:
return []
return postorder(node.left) + postorder(node.right) + [node.val]
return postorder(root)
- Count Nodes Equal to Average of Subtree
class Solution:
def averageOfSubtree(self, root: TreeNode) -> int:
# If you use result = 0 (a local variable inside the method), each recursive call gets its own copy of result, and their changes are not reflected back to the caller or accumulated. With self.result, all recursive calls share the same variable attached to the Solution object. Any modification in any nested call is preserved and visible globally.
self.result = 0
self.dfs(root)
return self.result
def dfs(self, node: TreeNode):
if not node:
return (0, 0) # sum = 0, count = 0
left_sum, left_count = self.dfs(node.left)
right_sum, right_count = self.dfs(node.right)
total_sum = node.val + left_sum + right_sum
total_count = 1 + left_count + right_count
average = total_sum // total_count
if node.val == average:
self.result += 1
return (total_sum, total_count)
- Delete Leaves With a Given Value
"""
O(n) Time and O(h) space
"""
def removeLeafNodes(self, root: Optional[TreeNode], target: int) -> Optional[TreeNode]:
if not root:
return None
# First recurse into children
root.left = self.removeLeafNodes(root.left, target)
root.right = self.removeLeafNodes(root.right, target)
# Then check if current node is now a leaf
if not root.left and not root.right and root.val == target:
# in case of leaf node, it gets cut from the parent as removeLeafNodes will return None making root.left/right None. When the node has no reference, python's garbage collector will handle it
return None
return root
- Side view and Boundary Traversal
def right_view(root: TreeNode) -> List[int]:
"""Right view of binary tree using level order"""
if not root:
return []
result = []
queue = deque([root])
while queue:
level_size = len(queue)
for i in range(level_size):
node = queue.popleft()
if i == level_size - 1: # rightmost node of current level
result.append(node.val)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
return result
def boundary_traversal(root: TreeNode) -> List[int]:
"""
Boundary Traversal of Binary Tree
Intuition:
- Traverse tree boundary in anti-clockwise direction
- Three parts: left boundary, leaves, right boundary
- Need to avoid duplicate nodes
Algorithm steps:
1. Add root
2. Add left boundary (excluding leaves)
3. Add leaves from left to right
4. Add right boundary (excluding leaves) in reverse
Helper functions:
- add_left_boundary: Adds nodes on leftmost path
- add_leaves: Adds all leaf nodes
- add_right_boundary: Adds nodes on rightmost path (bottom-up)
Use cases:
1. Printing tree perimeter
2. Finding external nodes
3. Tree visualization
Time: O(n), Space: O(h) where h is height of tree
"""
def add_left_boundary(node: TreeNode, result: List[int]):
if not node or (not node.left and not node.right):
return
result.append(node.val)
if node.left:
add_left_boundary(node.left, result)
elif node.right:
add_left_boundary(node.right, result)
def add_leaves(node: TreeNode, result: List[int]):
if not node:
return
if not node.left and not node.right:
result.append(node.val)
return
add_leaves(node.left, result)
add_leaves(node.right, result)
def add_right_boundary(node: TreeNode, result: List[int]):
if not node or (not node.left and not node.right):
return
if node.right:
add_right_boundary(node.right, result)
elif node.left:
add_right_boundary(node.left, result)
result.append(node.val)
if not root:
return []
result = [root.val]
if not root.left and not root.right:
return result
if root.left:
add_left_boundary(root.left, result)
add_leaves(root.left, result)
add_leaves(root.right, result)
if root.right:
add_right_boundary(root.right, result)
return result
- When finding the most frequent value(s) in a tree, build a value -> frequency hashmap during one traversal, then scan it in one go: start with max_freq = 0; if freq > max_freq, reset the answer list to just that value and update max_freq; if freq == max_freq, append the value. A minimal sketch follows.
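A minimal sketch of that idea, assuming the usual TreeNode with val/left/right (the function name is just illustrative):
from collections import defaultdict

def most_frequent_tree_values(root):
    freq = defaultdict(int)

    def dfs(node):
        if not node:
            return
        freq[node.val] += 1
        dfs(node.left)
        dfs(node.right)

    dfs(root)
    max_freq, modes = 0, []
    for val, cnt in freq.items():    # one pass over the hashmap
        if cnt > max_freq:
            max_freq, modes = cnt, [val]
        elif cnt == max_freq:
            modes.append(val)
    return modes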
- House Robber
def rob(self, root: TreeNode) -> int:
def dfs(root):
if not root:
return [0, 0]
leftPair = dfs(root.left)
rightPair = dfs(root.right)
withRoot = root.val + leftPair[1] + rightPair[1]
withoutRoot = max(leftPair) + max(rightPair)
return [withRoot, withoutRoot]
return max(dfs(root))
- Path Sum
"""
Brute force is O(n^2) in the worst case if we start a DFS from each node of the tree.
Prefix Sum with HashMap does better:
Time: O(n) - each node is visited once.
Space: O(h + n) - h for the recursion stack, n for the prefix sum map.
"""
def pathSum(self, root: TreeNode, targetSum: int) -> int:
prefix_sum_count = defaultdict(int)
prefix_sum_count[0] = 1 # base case to handle exact target from root
def dfs(node, currSum):
if not node:
return 0
currSum += node.val
count = prefix_sum_count[currSum - targetSum] # number of valid paths ending here
prefix_sum_count[currSum] += 1
count += dfs(node.left, currSum)
count += dfs(node.right, currSum)
prefix_sum_count[currSum] -= 1 # backtrack
return count
return dfs(root, 0)
- Maximum Path Sum
"""
The path does not have to pass through the root.
greedy pruning - if a subtree gives a negative contribution, we treat it as 0.
Time - O(n) Space - O(h)
"""
def maxPathSum(self, root: TreeNode) -> int:
max_sum = float('-inf') # This will hold the final result
def dfs(node):
nonlocal max_sum
if not node:
return 0
# Recursively calculate max path sum from left and right
left_gain = max(dfs(node.left), 0) # Only add if > 0
right_gain = max(dfs(node.right), 0)
# Price of current path: may be the max sum at this node
current_path = node.val + left_gain + right_gain
# Update the global max if this path is better
max_sum = max(max_sum, current_path)
# Return max gain this node can contribute to its parent
return node.val + max(left_gain, right_gain)
dfs(root)
return max_sum
- Maximum Product of Splitted Binary Tree
"""
Time: O(N) - two DFS traversals.
Space: O(H) - recursion stack, where H is the height of the tree.
Maximize sum(T) * (totalSum - sum(T)) over every subtree T, then take the result modulo 10^9 + 7.
"""
def maxProduct(self, root: Optional[TreeNode]) -> int:
MOD = 10**9 + 7
self.total = 0
self.max_prod = 0
# First pass: get the total sum of all nodes
def get_total_sum(node):
if not node:
return 0
return node.val + get_total_sum(node.left) + get_total_sum(node.right)
self.total = get_total_sum(root)
# Second pass: for each subtree, calculate product
def dfs(node):
if not node:
return 0
left_sum = dfs(node.left)
right_sum = dfs(node.right)
sub_sum = node.val + left_sum + right_sum
# Consider removing the edge above this node
product = sub_sum * (self.total - sub_sum)
self.max_prod = max(self.max_prod, product)
return sub_sum
dfs(root)
return self.max_prod % MOD
- Insufficient Nodes in Root to Leaf Paths
"""
Greedy won't work when we start from the bottom. We retain a node only if some root-to-leaf path through it has total sum >= limit. That requires examining all root-to-leaf paths, so a node must be pruned, even if it has a high value, when every path through it falls below the limit.
"""
def sufficientSubset(self, root: TreeNode, limit: int) -> TreeNode:
def dfs(node, path_sum):
if not node:
return None
# If it's a leaf
if not node.left and not node.right:
return node if path_sum + node.val >= limit else None
# Post-order DFS: left and right first
node.left = dfs(node.left, path_sum + node.val)
node.right = dfs(node.right, path_sum + node.val)
# If both subtrees are pruned, prune this node
if not node.left and not node.right:
return None
return node
return dfs(root, 0)
- Distance between 2 Nodes
def lowestCommonAncestor(self, root: TreeNode, node1: TreeNode, node2: TreeNode) -> TreeNode:
"""
Recursively finds the lowest common ancestor (LCA) of node1 and node2 in the binary tree.
"""
if not root or root == node1 or root == node2:
return root
left_subtree = self.lowestCommonAncestor(root.left, node1, node2)
right_subtree = self.lowestCommonAncestor(root.right, node1, node2)
if left_subtree and right_subtree:
return root # current node is LCA if node1 and node2 are found in left and right
return left_subtree if left_subtree else right_subtree
def findDepthFromAncestor(self, ancestor: TreeNode, target: TreeNode, current_depth: int = 0) -> int:
"""
Finds the depth of the target node starting from the ancestor node.
Returns -1 if the target is not found in the subtree rooted at ancestor.
"""
if not ancestor:
return -1
if ancestor == target:
return current_depth
# Search left subtree
left_depth = self.findDepthFromAncestor(ancestor.left, target, current_depth + 1)
if left_depth != -1:
return left_depth
# Search right subtree
return self.findDepthFromAncestor(ancestor.right, target, current_depth + 1)
def distanceBetweenNodes(self, root: TreeNode, node1: TreeNode, node2: TreeNode) -> int:
"""
Computes the distance (number of edges) between node1 and node2 in a binary tree.
Time - O(n) and Space - O(h)
"""
# Step 1: Find LCA of the two nodes
lca = self.lowestCommonAncestor(root, node1, node2)
# Step 2: Find depth of each node from the LCA
depth1 = self.findDepthFromAncestor(lca, node1)
depth2 = self.findDepthFromAncestor(lca, node2)
# Step 3: Distance = depth1 + depth2
return depth1 + depth2
- Maximum Difference between Node and Ancestor
"""
Time - O(n) and Space - O(h)
"""
def maxAncestorDiff(self, root: TreeNode) -> int:
def dfs(node, current_min, current_max):
if not node:
return current_max - current_min
# Update min and max for current path
current_min = min(current_min, node.val)
current_max = max(current_max, node.val)
# Recurse on left and right
left_diff = dfs(node.left, current_min, current_max)
right_diff = dfs(node.right, current_min, current_max)
# Return the max diff seen in either subtree
return max(left_diff, right_diff)
return dfs(root, root.val, root.val)
- Diameter and Width of Tree
"""
The diameter is the length (number of edges) of the longest path between any two nodes in the tree. This path may or may not pass through the root.
Time - O(n) and Space - O(h)
"""
def diameterOfBinaryTree(self, root: TreeNode) -> int:
self.max_diameter = 0
def dfs(node):
if not node:
return 0
# Height of left and right subtrees
left = dfs(node.left)
right = dfs(node.right)
# Update max diameter
self.max_diameter = max(self.max_diameter, left + right)
# Height of current node
return 1 + max(left, right)
dfs(root)
return self.max_diameter
"""
The width is defined as the length between the leftmost and rightmost non-null nodes at any level (inclusive of nulls in between).
Time - O(n) and Space - O(n)
"""
def widthOfBinaryTree(self, root: TreeNode) -> int:
if not root:
return 0
max_width = 0
# Each element: (node, index)
queue = deque([(root, 0)])
while queue:
level_length = len(queue)
_, level_head_index = queue[0]
for _ in range(level_length):
node, idx = queue.popleft()
# Normalize index to prevent overflow
idx -= level_head_index
if node.left:
queue.append((node.left, 2 * idx))
if node.right:
queue.append((node.right, 2 * idx + 1))
current_width = idx + 1 # since idx is the last one after popping all
max_width = max(max_width, current_width)
return max_width
- In a balanced BST, operations like search, insert, and delete run in average and best-case O(log n) time; in the worst (unbalanced) case, these degrade to O(n). BSTs are the basis for self-balancing trees (e.g., AVL, Red-Black), which maintain the O(log n) guarantee in all cases.
"""
Search in BST
"""
def searchAllBST(root: Optional[TreeNode], target: int) -> List[TreeNode]:
"""
Returns a list of all TreeNode references with .val == target in a BST with duplicates.
"""
res: List[TreeNode] = []
def dfs(node: Optional[TreeNode]) -> None:
if not node:
return
if node.val == target:
res.append(node)
# Duplicates may be both left and right in some BSTs, so check both subtrees
dfs(node.left)
dfs(node.right)
elif target < node.val:
dfs(node.left)
else:
dfs(node.right)
dfs(root)
return res
- Inorder traversal is sorted in ascending order for a BST. Reverse inorder traversal (Right -> Root -> Left) gives descending order. Exploit this in >= or <= style problems on BSTs; see the sketch below.
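A minimal sketch of the reverse-inorder pattern with a running sum (the idea behind, e.g., converting a BST into a Greater Sum Tree); names are illustrative:
def reverse_inorder_running_sum(root):
    """Visit BST nodes in descending order while keeping a running sum,
    e.g., to replace each node's value with the sum of all values >= it."""
    running = 0

    def dfs(node):
        nonlocal running
        if not node:
            return
        dfs(node.right)       # larger values first
        running += node.val
        node.val = running
        dfs(node.left)

    dfs(root)
    return root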
- Balance BST
"""
O(n)
"""
def balanceBST(self, root: TreeNode) -> TreeNode:
# Step 1: Inorder traversal to get sorted node values
def inorder(node):
if not node:
return []
return inorder(node.left) + [node.val] + inorder(node.right)
sorted_vals = inorder(root)
# Step 2: Build balanced BST from sorted values
def build_balanced_tree(start, end):
if start > end:
return None
mid = (start + end) // 2
node = TreeNode(sorted_vals[mid])
node.left = build_balanced_tree(start, mid - 1)
node.right = build_balanced_tree(mid + 1, end)
return node
return build_balanced_tree(0, len(sorted_vals) - 1)
- Range Sum of BST
"""
Simply doing inorder is worse as we don't necessarily need to go through all the node values in BST. We can prune the search.
Time - O(n)
"""
def rangeSumBST(self, root: Optional[TreeNode], low: int, high: int) -> int:
if not root:
return 0
# If root value is greater than high, skip the right subtree
if root.val > high:
return self.rangeSumBST(root.left, low, high)
# If root value is less than low, skip the left subtree
if root.val < low:
return self.rangeSumBST(root.right, low, high)
# Root is within [low, high]; include it and recurse both sides
return (
root.val +
self.rangeSumBST(root.left, low, high) +
self.rangeSumBST(root.right, low, high)
)
- Insert and Delete Node in BST
def insertIntoBST(self, root: TreeNode, val: int) -> TreeNode:
# If the tree is empty, create a new node with the value and return it
if not root:
return TreeNode(val)
# If the value is less than the current node, go to the left subtree
if val < root.val:
root.left = self.insertIntoBST(root.left, val)
else:
# If the value is greater than or equal to the current node, go to the right subtree
root.right = self.insertIntoBST(root.right, val)
# Return the root as the tree structure is updated bottom-up
return root
def deleteNode(self, root: Optional[TreeNode], key: int) -> Optional[TreeNode]:
if not root:
return root
if key > root.val:
root.right = self.deleteNode(root.right, key)
elif key < root.val:
root.left = self.deleteNode(root.left, key)
else:
if not root.left:
return root.right
elif not root.right:
return root.left
cur = root.right
while cur.left:
cur = cur.left
root.val = cur.val
root.right = self.deleteNode(root.right, root.val)
return root
- Minimum Absolute Difference in BST
"""
The minimum difference must occur between consecutive nodes in the inorder traversal as the inorder traversal will be sorted.
We need to keep track of previous node.
"""
def getMinimumDifference(self, root: TreeNode) -> int:
# Initialize minimum difference and previous node value
self.min_diff = float('inf')
self.prev_node = None
def inorder(node):
if not node:
return
# Traverse the left subtree
inorder(node.left)
# Process current node: compute difference with previous node
if self.prev_node is not None:
diff = abs(node.val - self.prev_node)
self.min_diff = min(self.min_diff, diff)
# Update previous node value
self.prev_node = node.val
# Traverse the right subtree
inorder(node.right)
inorder(root)
return self.min_diff
- Serialize and Deserialize Binary Tree
"""
Preorder traversal using DFS to get a non-ambiguous string (no need for a separate array). Decoding
the string should be feasible.
O(n) for both serializing and deserializing
"""
# Encodes a tree to a single string.
def serialize(self, root: Optional[TreeNode]) -> str:
res = []
def dfs(node):
if not node:
res.append("N")
return
res.append(str(node.val))
dfs(node.left)
dfs(node.right)
dfs(root)
return ",".join(res)
# Decodes your encoded data to tree.
def deserialize(self, data: str) -> Optional[TreeNode]:
vals = data.split(",")  # comma is the delimiter
self.i = 0
def dfs():
if vals[self.i] == "N":  # empty node
self.i += 1
return None
node = TreeNode(int(vals[self.i]))
self.i += 1
node.left = dfs()
node.right = dfs()
return node
return dfs()
- Queue Based N-ary Tree Level Order Traversal
"""
Time - O(n) and Space - O(w), w is max width of tree
"""
def levelOrder(self, root: 'Node') -> List[List[int]]:
if not root:
return []
result = []
queue = deque([root])
while queue:
level_size = len(queue)
level_nodes = []
for _ in range(level_size):
node = queue.popleft()
level_nodes.append(node.val)
for child in node.children:
queue.append(child)
result.append(level_nodes)
return result
Heap
- Top K frequent elements
"""
Better than MaxHeap solution is
Bucket Sort O(n) Space and Time Complexity
It works because the frequency of any element is bounded: at least 1 (the element appears only once) and at most n, the size of the array (the whole array is one repeated element). So a bucket array of size n + 1 indexed by frequency is enough.
"""
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
count = {}
freq = [[] for i in range(len(nums) + 1)]
for num in nums:
count[num] = 1 + count.get(num, 0)
for num, cnt in count.items():
freq[cnt].append(num)
res = []
for i in range(len(freq) - 1, 0, -1):
for num in freq[i]:
res.append(num)
if len(res) == k:
return res
- Find K Pairs with Smallest Sums
"""
Two Pointers + Heap
For merging two sorted lists, two pointers are enough: move the pointers according to a simple comparison.
Since both nums1 and nums2 are sorted, the smallest pairs (by sum) "radiate out" from the (0,0) pair.
But to find the k pairs with smallest sums, you can't just systematically generate (0,0), (1,0), (0,1), (2,0), (1,1), etc., with pure two-pointer logic. Why not? Because at each step there could be multiple "next" smallest pairs that you haven't yet visited, and their indices can overlap in complex ways.
For example :
nums1 = [1, 7, 11]
nums2 = [2, 4, 6]
(1,2) sum=3
(1,4) sum=5
(7,2) sum=9 <-- this one is the problem
(1,6) sum=7
(7,4) sum=11
(11,2) sum=13
...
At any step there can be many candidate pairs waiting to be considered, not just the "next in row" or "next in column".
The sorted structure of nums1 and nums2 means:
Row 0: [nums1[0]+nums2[0], nums1[0]+nums2[1], nums1[0]+nums2[2], ...] -> increasing in j
Row 1: [nums1[1]+nums2[0], nums1[1]+nums2[1], nums1[1]+nums2[2], ...] -> increasing in j
But sums across rows are not in row/col traversal order.
A heap ensures global ordering of the next smallest pair; two pointers in a fixed pattern only give local ordering.
The min-heap is used here because:
* It always gives you the next smallest sum pair.
* It lets you explore "next candidates" in the correct order, potentially expanding along either direction.
* It allows duplicate sums or equal value pairs naturally.
But, with min-heap you cannot keep the size of heap in check. It will be O(mn). To restrict the heap size to at most k while building it for the k smallest sum pairs problem, you need to keep the heap as a max-heap of size k so that you can discard pairs with sums larger than the current largest in the heap.
By maintaining a max-heap of size at most k, you only keep the k pairs with smallest sums so far. When a new pair (u, v) has sum smaller than the largest sum in the heap (the max in the heap), it replaces the largest one. This way, the heap never grows beyond size k, keeping space complexity at O(k). Time complexity for pushing all pairs remains O(nm * log k), which is better than full sort (O(n*m * log (n*m))) when k is much smaller than nm.
j=0 j=1 j=2
i=0 3 5 7
i=1 9 11 13
i=2 13 15 17
we are checking right (i, j+1) and down (i+1, j)!
"""
import heapq
def kSmallestPairs(nums1, nums2, k):
if not nums1 or not nums2 or k == 0:
return []
max_heap = [] # will store (-sum, u, v)
for u in nums1:
for v in nums2:
current_sum = u + v
if len(max_heap) < k:
# Push negative sum to simulate max heap
heapq.heappush(max_heap, (-current_sum, u, v))
else:
# If current sum is smaller than the largest sum in max_heap (which is -max_heap[0][0])
if current_sum < -max_heap[0][0]:
# Pop the largest and add the smaller current pair
heapq.heappop(max_heap)
heapq.heappush(max_heap, (-current_sum, u, v))
else:
# Current sum is not smaller, no need to proceed further in this inner loop since nums2 is sorted
# You could add break here if you want to optimize for sorted arrays
break
# Convert max_heap to result list, reverse negative sums back
result = [[u, v] for (_, u, v) in max_heap]
return result
- Task Scheduler
import heapq
from collections import Counter, deque
def leastInterval(tasks, n):
"""
Greedy: always schedule the most frequent available task to minimize idle time. A Priority Queue (Max-Heap) gives efficient retrieval of the next best task, and a Cooldown Queue manages tasks that are waiting to be scheduled again.
Args:
tasks: List[str] - List of tasks (e.g., ['A', 'A', 'A', 'B', 'B', 'B'])
n: int - cooldown period between same tasks
Returns:
int - least number of units of time to finish all tasks
Counting tasks: O(N), where N = number of tasks. Each task is pushed and popped from the heap at most once: O(N log K), where K is the number of unique tasks (K <= 26). Each task is added and removed from the cooldown queue at most once: O(N). Since K is at most 26, O(log K) is O(1), so the overall time is O(N).
"""
# Step 1: Count frequency of each task
task_counts = Counter(tasks)
# Step 2: Use a max-heap (invert counts for heapq which is min-heap by default)
max_heap = [-cnt for cnt in task_counts.values()]
heapq.heapify(max_heap)
# Step 3: Queue to keep tasks in cooldown (task, available_time)
cooldown = deque() # (ready_time, count)
time = 0 # Current time
while max_heap or cooldown:
time += 1
if max_heap:
# Pop the most frequent task
cnt = heapq.heappop(max_heap)
cnt += 1 # Decrease count (since stored as negative)
if cnt != 0:
# If there are more of this task, add to cooldown
cooldown.append((time + n, cnt))
# If any task's cooldown has expired, push back to heap
if cooldown and cooldown[0][0] == time:
_, ready_cnt = cooldown.popleft()
heapq.heappush(max_heap, ready_cnt)
return time
- Find Median from Data Stream
"""
Mean is the average of all values: add up all the numbers and divide by how many there are. Median is the middle value when the numbers are arranged in order. If there's an even number of values, it's the average of the two middle numbers. Mode is the value that appears most frequently in the data set.
We don't want to re-sort the array every time a new element is added, so we could keep elements ordered in a priority queue; but digging out the middle elements of a single heap costs roughly O(k log n) pops. Instead we use a max-heap for the smaller half and a min-heap for the larger half, so the middle value(s) sit at the heap tops and the median is read in O(1). With an odd count of elements, the extra one lives in one designated heap (the max-heap here).
"""
import heapq
class MedianFinder:
def __init__(self):
# Max-heap for the smaller half (store negatives to simulate max-heap)
self.left = []
# Min-heap for the larger half
self.right = []
def addNum(self, num: int) -> None:
"""
Adds a number into the data structure.
"""
# Step 1: Add to max-heap (left)
heapq.heappush(self.left, -num)
# Step 2: Balance: Move the largest from left to right (if necessary)
# This ensures all elements in left <= all elements in right
if self.left and self.right and (-self.left[0] > self.right[0]):
val = -heapq.heappop(self.left)
heapq.heappush(self.right, val)
# Step 3: Maintain size property (left can have at most 1 more element than right)
if len(self.left) > len(self.right) + 1:
val = -heapq.heappop(self.left)
heapq.heappush(self.right, val)
elif len(self.right) > len(self.left):
val = heapq.heappop(self.right)
heapq.heappush(self.left, -val)
def findMedian(self) -> float:
"""
Returns the median of all elements so far.
"""
if len(self.left) > len(self.right):
# Odd number of elements: median is the top of max-heap
return -self.left[0]
else:
# Even number of elements: median is the average of two middles
return (-self.left[0] + self.right[0]) / 2
Trie/Prefix Tree
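Nothing noted for this one yet, so here is a minimal insert/search/startsWith sketch (a dict-of-children Trie; class and method names follow the usual LeetCode "Implement Trie" signature):
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks the end of an inserted word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def _walk(self, s: str):
        # Follow the characters of s; return the final node or None if we fall off
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word: str) -> bool:         # is the full word present?
        node = self._walk(word)
        return node is not None and node.is_end

    def startsWith(self, prefix: str) -> bool:   # is any word with this prefix present?
        return self._walk(prefix) is not None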
Graphs
"""
BFS naturally processes nodes in increasing "distance" (layers) from the starting node/root. Want the minimum/smallest "number of moves" to reach something? BFS is ideal if all moves have the same cost.
"""
def bfs(graph, start):
"""Breadth-First Search
Time Complexity - O(n) + O(2e)
Space Complexity - O(n)
"""
visited = set()
queue = deque([start])
visited.add(start)
result = []
while queue:
vertex = queue.popleft()
result.append(vertex)
for neighbor in graph[vertex]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
return result
"""
When all ways must be explored, or the visitor order matters.
"""
def dfs(graph, start, visited=None, result=None):
if visited is None:
visited = set()
if result is None:
result = []
visited.add(start)
result.append(start)
for neighbor in graph[start]:
if neighbor not in visited:
dfs(graph, neighbor, visited, result)
return result
- Cycle Detection
def undirected_has_cycle_dfs(graph):
"""
Each DFS call tracks the parent to avoid falsely detecting the immediate back edge as a cycle.
The outer loop ensures all connected components are checked.
Time Complexity - O(n + 2e)
Space Complexity - O(n)
"""
visited = set()
def dfs(node, parent):
visited.add(node)
for neighbor in graph.get(node, []):
if neighbor not in visited:
if dfs(neighbor, node):
return True
elif neighbor != parent:
# Found a cycle
return True # no need for more recursion after cycle is already detected
return False
# Handle disconnected components
for node in graph:
if node not in visited: # one node per component
if dfs(node, None):
return True
return False
from collections import deque
def undirected_has_cycle_bfs(graph):
"""
Each BFS node tracks its parent to avoid confusing the edge back to the parent as a cycle.
The outer loop ensures all components are checked, even if the graph is disconnected.
"""
visited = set()
for start in graph:
if start not in visited:
queue = deque([(start, None)]) # (current_node, parent)
visited.add(start)
while queue:
node, parent = queue.popleft()
for neighbor in graph.get(node, []):
if neighbor not in visited:
visited.add(neighbor)
queue.append((neighbor, node))
elif neighbor != parent:
# Found a back edge (cycle)
return True
return False
from collections import defaultdict
class Solution:
def detectCycleInDirectedGraph(self, V: int, adj: list[list[int]]) -> bool:
"""
Detects if a directed graph contains a cycle using DFS.
Args:
V: The number of vertices in the graph.
adj: An adjacency list representation of the graph.
adj[i] contains a list of neighbors of vertex i.
Returns:
True if a cycle is detected, False otherwise.
"""
# visited array: This array keeps track of all nodes that have been part of any DFS traversal at some point. Once a node is marked visited, it means we've explored it from at least one path. This prevents redundant processing of nodes and helps optimize the traversal.
# visited[i] = True if node i has been visited in any DFS traversal.
visited = [False] * V
# path_visited array: This is the key for cycle detection in a directed graph. It keeps track of all nodes that are currently in the current recursion stack of the DFS traversal. If we encounter a node that is already in our path_visited array, it means we've found a back edge, which indicates a cycle. Think of it as: "I'm currently exploring this path, and I just tried to go to a node I've already visited on this exact path.
# path_visited[i] = True if node i is currently in the recursion stack.
path_visited = [False] * V
# The DFS helper function
def dfs(node: int) -> bool:
"""
Performs DFS starting from 'node' to detect cycles.
Args:
node: The current node being visited.
Returns:
True if a cycle is detected starting from this node's path, False otherwise.
"""
visited[node] = True # Mark current node as globally visited
path_visited[node] = True # Mark current node as part of the current path
# Explore all neighbors of the current node
for neighbor in adj[node]:
if not visited[neighbor]:
# If neighbor is not visited, recursively call DFS
if dfs(neighbor):
# If a cycle is detected in the subtree, propagate True upwards
return True
elif path_visited[neighbor]:
# If neighbor is already in the current recursion stack (path_visited is True),
# it means we found a back edge, hence a cycle.
return True
# Backtrack: Remove the current node from the current path
# as we are done exploring all paths from this node.
path_visited[node] = False
return False # No cycle found from this node's path
# Iterate through all nodes to handle disconnected components
for i in range(V):
if not visited[i]:
# If a node hasn't been visited, start a new DFS traversal
if dfs(i):
# If DFS from this node finds a cycle, return True immediately
return True
# No cycle found after checking all components
return False
- Topological Sort
"""
For each vertex, we iterate through its outgoing edges. Thus, every vertex and every edge in the graph is processed exactly once. This leads to O(V+E) for the DFS part. adj (Adjacency List): O(V+E) to store the graph connections, visited_state array: O(V) space, result list: O(V) space to store the topological order, Recursion Stack: In the worst-case scenario (a linear graph, e.g., 0 -> 1 -> 2 -> ... -> V-1), the depth of the recursion stack can go up to O(V).
0 (UNVISITED): The node has not been encountered at all by any DFS traversal.
1 (VISITING / IN_STACK): The node has been visited, and it is currently in the active recursion stack of the current DFS path. This means we are currently exploring its descendants.
2 (VISITED / PROCESSED): The node has been completely processed. Its DFS traversal (and all of its reachable descendants) has finished, and it has been added to the topological sort result.
"""
def _dfs(self, node: int):
"""
Recursive DFS helper function for topological sort and cycle detection.
"""
if self.has_cycle:
# If a cycle has already been detected, stop further processing
return
# Mark the current node as VISITING (in recursion stack)
self.visited_state[node] = 1
# Explore all neighbors of the current node
for neighbor in self.adj[node]:
if self.visited_state[neighbor] == 0:
# If neighbor is UNVISITED, recursively call DFS on it
self._dfs(neighbor)
if self.has_cycle: # Propagate cycle detection upwards
return
elif self.visited_state[neighbor] == 1:
# If neighbor is VISITING, it means we found a back edge,
# hence a cycle exists in the graph.
self.has_cycle = True
return
# After visiting all neighbors and ensuring no cycle, mark current node as VISITED
self.visited_state[node] = 2
# Add the current node to the result list.
# Nodes are added when all their descendants (dependencies) have been processed.
# This naturally builds the list in reverse topological order.
self.result.append(node)
def topological_sort(self) -> list[int]:
"""
Performs topological sort on the graph.
Returns:
A list representing a valid topological order if one exists.
An empty list if a cycle is detected (no valid order).
"""
self.result = [] # Reset result for new sort attempts
self.has_cycle = False
self.visited_state = [0] * self.num_nodes # Reset visited states
# Iterate through all nodes to handle disconnected components in the graph
for i in range(self.num_nodes):
if self.visited_state[i] == 0: # If node is unvisited, start a new DFS
self._dfs(i)
if self.has_cycle:
# If a cycle is detected during any DFS call, return empty list
return []
# If no cycle was found, reverse the result list to get the correct topological order.
# The DFS appends nodes after all their descendants are processed, so the
# sink nodes are added first. Reversing puts source nodes first.
return self.result[::-1]
- Dijkstra
def dijkstra(graph, start):
"""Dijkstra's shortest path algorithm"""
distances = {vertex: float('infinity') for vertex in graph}
distances[start] = 0
pq = [(0, start)]
visited = set()
while pq:
current_distance, current_vertex = heapq.heappop(pq)
if current_vertex in visited:
continue
visited.add(current_vertex)
for neighbor, weight in graph[current_vertex].items():
distance = current_distance + weight
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(pq, (distance, neighbor))
return distances
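The sketch above assumes the graph is a dict of dicts ({vertex: {neighbor: weight}}); a hypothetical usage example, with overall complexity O((V + E) log V) thanks to the heap:
graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'C': 2, 'D': 5},
    'C': {'D': 1},
    'D': {},
}
print(dijkstra(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}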
- Course Schedule
- All Nodes Distance K in Binary Tree
- Word Ladder (Graph Transformation)
- Union-Find (Disjoint Set Union) - a minimal sketch follows below
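None of the four topics above have notes yet; since Union-Find shows up in several of them (cycle detection, connectivity), here is a minimal sketch with path compression and union by rank:
class DSU:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path compression: point nodes along the way closer to the root
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same set (useful for cycle detection)
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True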
Matrix
- Number of Islands
def numIslands(self, grid: List[List[str]]) -> int:
if not grid:
return 0
self.rows, self.cols = len(grid), len(grid[0])
self.visited = set()
self.directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
self.grid = grid # store grid as instance variable
islands = 0
for r in range(self.rows):
for c in range(self.cols):
if self.grid[r][c] == "1" and (r, c) not in self.visited:
self.bfs(r, c)
islands += 1
return islands
def bfs(self, r, c):
q = deque()
q.append((r, c))
self.visited.add((r, c))
while q:
row, col = q.popleft()
for dr, dc in self.directions:
nr, nc = row + dr, col + dc
if (0 <= nr < self.rows and 0 <= nc < self.cols and
self.grid[nr][nc] == "1" and (nr, nc) not in self.visited):
q.append((nr, nc))
self.visited.add((nr, nc))
- Matrix Traversal - Rat in a Maze
"""
Explore all the paths - DFS, but if shortest path is required then BFS
"""
def findPath(self, maze: List[List[int]], n: int) -> List[str]:
if maze[0][0] == 0 or maze[n - 1][n - 1] == 0:
return [] # No path if start or end is blocked
result = []
visited = [[False for _ in range(n)] for _ in range(n)]
directions = [('D', 1, 0), ('L', 0, -1), ('R', 0, 1), ('U', -1, 0)]
def is_safe(row, col):
return 0 <= row < n and 0 <= col < n and maze[row][col] == 1 and not visited[row][col]
def dfs(row, col, path):
if row == n - 1 and col == n - 1:
result.append(path)
return
# Mark current cell as visited
visited[row][col] = True
for move, dx, dy in directions:
new_row, new_col = row + dx, col + dy
if is_safe(new_row, new_col):
dfs(new_row, new_col, path + move)
# Backtrack: unmark current cell
visited[row][col] = False
dfs(0, 0, "")
return sorted(result) # Lexicographically sorted paths
- Minimum square to fit N rectangles
"""
For any candidate side length X, you can check in O(1) whether all n rectangles fit by seeing how many fit along the width and along the height:
max_rects = floor(X / w) * floor(X / h)
As X increases, max_rects also increases (monotonically).
The property: "can fit at least n rectangles" transitions from False (when X is too small) to True as X increases.
"""
# Function to check if side of square X
# can pack all the N rectangles or not
def bound(w, h, N, x):
# Find the number of rectangle
# it can pack
val = (x//w)*(x//h)
# If val is atleast N,
# then return true
if(val >= N):
return True
# Otherwise, return false
else:
return False
# Function to find the size of the
# smallest square that can contain
# N rectangles of dimensions W * H
def FindSquare(N, W, H):
# Stores the lower bound
left = 1
# Stores the upper bound
right = (W * H) * N
# Iterate until left is less than right
while(left < right):
# Calculate the mid value
mid = left + (right - left)//2
# If the current size of square can contain N rectangles
if(bound(W, H, N, mid)):
right = mid
# Otherwise, update left
else:
left = mid + 1
# Return the minimum size of the square required
return right
- Search in Sorted Matrix
import bisect
"""
For using bisect you will have to flatten the matrix which will make the time complexity O(mn). Directly applying Binary Search on Matrix is better to get O(log(m*n)) time complexity.
"""
def searchMatrix(matrix, target):
if not matrix or not matrix[0]:
return False
rows, columns = len(matrix), len(matrix[0])
# Binary Search on Matrix
left, right = 0, rows * columns - 1
while left <= right:
mid = left + (right - left) // 2
row, column = mid // columns, mid % columns
mid_value = matrix[row][column]
if mid_value == target:
return True
elif mid_value < target:
left = mid + 1
else:
right = mid - 1
return False
- Rotten Oranges
"""
Multi-source BFS - O(mn) Space and Time complexity
"""
from collections import deque
def orangesRotting(grid):
# no oranges to process
if not grid:
return -1
rows, cols = len(grid), len(grid[0])
queue = deque() # Queue for BFS storing positions of rotten oranges and the time when they became rotten
fresh = 0 # number of fresh oranges
# Step 1: Collect all rotten oranges and count fresh ones
for r in range(rows):
for c in range(cols):
# Append rotten orange location with time = 0 (start)
if grid[r][c] == 2:
queue.append((r, c, 0)) # (row, col, time)
elif grid[r][c] == 1:
fresh += 1 # count the fresh ones
# If there are no fresh oranges from the start, no time needed to rot anything
if fresh == 0:
return 0
# Step 2: BFS traversal (4-directional)
directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]
minutes = 0 # time elapsed as rot spread
while queue:
r, c, minutes = queue.popleft() # Current rotten orange and time of rot
# Check all 4 adjacent neighbors
for dr, dc in directions:
nr, nc = r + dr, c + dc
# If the neighbor is a fresh orange, rot it
if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
grid[nr][nc] = 2 # Mark neighbor as rotten
fresh -= 1 # One fewer fresh orange left
queue.append((nr, nc, minutes + 1)) # Add newly rotten orange to queue with incremented time
# If no fresh oranges left, return number of minutes elapsed to rot all
# Otherwise, return -1 indicating some fresh oranges cannot rot
return minutes if fresh == 0 else -1
- 0/1 Matrix
from collections import deque
class Solution:
def updateMatrix(self, mat: list[list[int]]) -> list[list[int]]:
"""
Calculates the distance of the nearest 0 for each cell in a binary matrix.
Args:
mat: A 2D list of integers representing the matrix, where 0 represents
an empty cell and 1 represents a filled cell.
Returns:
A 2D list of integers where each cell contains the distance to the
nearest 0.
"""
m, n = len(mat), len(mat[0])
# Initialize the result matrix 'dist' with -1 for cells that are not 0.
# Cells that are 0 will have a distance of 0.
dist = [[-1] * n for _ in range(m)]
# Initialize a queue for Breadth-First Search (BFS).
# We will start BFS from all cells that contain 0.
q = deque()
for r in range(m):
for c in range(n):
if mat[r][c] == 0:
dist[r][c] = 0 # Distance to itself is 0
q.append((r, c)) # Add 0-cells to the queue
# Define the possible directions for movement (up, down, left, right)
directions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
# Perform BFS
while q:
row, col = q.popleft()
# Explore all 4 neighboring cells
for dr, dc in directions:
new_row, new_col = row + dr, col + dc
# Check if the new cell is within the matrix bounds
if 0 <= new_row < m and 0 <= new_col < n:
# If the cell has not been visited yet (distance is -1)
if dist[new_row][new_col] == -1:
# The distance to this cell is 1 more than the current cell's distance
dist[new_row][new_col] = dist[row][col] + 1
q.append((new_row, new_col)) # Add to queue for further exploration
return dist
- Tic Tac Toe (Winning Conditions and Board State)
class TicTacToe:
def __init__(self, n: int):
self.n = n
self.rows = [0] * n
self.cols = [0] * n
self.diag = 0
self.anti_diag = 0
"""
Time - O(1) per move
Space - O(N)
"""
def move(self, row: int, col: int, player: int) -> int:
value = 1 if player == 1 else -1
self.rows[row] += value
self.cols[col] += value
if row == col:
self.diag += value
if row + col == self.n - 1:
self.anti_diag += value
if (abs(self.rows[row]) == self.n or
abs(self.cols[col]) == self.n or
abs(self.diag) == self.n or
abs(self.anti_diag) == self.n):
return player
return 0 # No winner yet
- Shortest path in matrix with Obstacles
"""
BFS is preferred in grid traversals as it allows us to visit all possible neighbours at once. All the cells at a certain level from the start are pushed into the queue in a single iteration (imagine moving radially outward). So if the endpoint exists at multiple levels, the first return happens at the minimum possible level. DFS, by the same reasoning, is complicated here: we would need to backtrack, keep track of visited cells (not required the same way in BFS), record path lengths, and compare them at the end. It doesn't guarantee the minimum path length unless all paths are explored, and it is prone to TLE if not heavily pruned.
"""
"""
Time: O(m * n * (k+1)) - Each cell can be visited multiple times with different k values (like (i, j, 0)...(i, j, k))
Space: O(m * n * (k+1)) - Due to the size of visited and queue
"""
from collections import deque
class Solution:
def shortestPath(self, grid, k):
m, n = len(grid), len(grid[0])
# Early exit: if k is large enough to walk straight ignoring obstacles
if k >= m + n - 2:
return m + n - 2
# Each state: (row, col, remaining k)
queue = deque([(0, 0, k)])
visited = set([(0, 0, k)])
steps = 0
directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]
while queue:
for _ in range(len(queue)):
x, y, remaining = queue.popleft()
# Reached destination
if x == m - 1 and y == n - 1:
return steps
for dx, dy in directions:
nx, ny = x + dx, y + dy
if 0 <= nx < m and 0 <= ny < n:
nk = remaining - grid[nx][ny] # decrease k if obstacle
state = (nx, ny, nk)
if nk >= 0 and state not in visited:
visited.add(state)
queue.append(state)
steps += 1
return -1 # Not reachable
- Sudoku solving (Backtracking with constraints)
"""
Time - O(9^(n)), where n is the number of empty cells. Each cell has at most 9 choices. In practice, backtracking prunes invalid branches, making it much faster.
Space - O(1), since the board is modified in place. The recursion depth is at most O(81).
Row/Col/Box lookup - O(1) using sets
Sorting Empty Cells (MRV) - O(k log k), where k is the number of empty cells
Backtracking Calls - Much fewer than O(9^k) due to MRV
"""
from typing import List
def solveSudoku(board: List[List[str]]) -> None:
row_sets = [set() for _ in range(9)]
col_sets = [set() for _ in range(9)]
box_sets = [[set() for _ in range(3)] for _ in range(3)]
empty_cells = []
# Initialize constraints and collect empty cells
for i in range(9):
for j in range(9):
if board[i][j] != ".":
num = board[i][j]
row_sets[i].add(num)
col_sets[j].add(num)
box_sets[i // 3][j // 3].add(num)
else:
empty_cells.append((i, j))
# Sort empty cells by MRV (least available options first)
def count_options(cell):
i, j = cell
return sum(1 for num in "123456789" if num not in row_sets[i] and num not in col_sets[j] and num not in box_sets[i // 3][j // 3])
empty_cells.sort(key=count_options)
def backtrack(index):
if index == len(empty_cells):
return True # Solved
i, j = empty_cells[index]
for num in "123456789":
if num not in row_sets[i] and num not in col_sets[j] and num not in box_sets[i // 3][j // 3]:
# Place the number
board[i][j] = num
row_sets[i].add(num)
col_sets[j].add(num)
box_sets[i // 3][j // 3].add(num)
if backtrack(index + 1):
return True # Continue solving
# Undo placement (backtrack)
board[i][j] = "."
row_sets[i].remove(num)
col_sets[j].remove(num)
box_sets[i // 3][j // 3].remove(num)
return False # No valid number found
backtrack(0)
- N-queens (Backtracking with column AND diagonal tracking)
"""
Worst case O(N!) time complexity, because for each row we can try all N columns (but backtracking prunes a lot of branches).
"""
def solveNQueens(self, n: int) -> List[List[str]]:
def backtrack(row: int):
if row == n:
# Found a valid board configuration
board = ["".join(r) for r in current_board]
results.append(board)
return
for col in range(n):
if col in cols or (row + col) in pos_diagonals or (row - col) in neg_diagonals:
continue # Conflict: skip this position
# Place queen
cols.add(col)
pos_diagonals.add(row + col)
neg_diagonals.add(row - col)
current_board[row][col] = 'Q'
backtrack(row + 1)
# Remove queen (backtrack)
cols.remove(col)
pos_diagonals.remove(row + col)
neg_diagonals.remove(row - col)
current_board[row][col] = '.'
results = []
cols = set() # columns where queens are placed
pos_diagonals = set() # r + c diagonals (anti-diagonals)
neg_diagonals = set() # r - c diagonals (main diagonals)
current_board = [["."] * n for _ in range(n)]
backtrack(0)
return results
- Number of Distinct Islands
class Solution:
def numDistinctIslands(self, grid: list[list[int]]) -> int:
m, n = len(grid), len(grid[0])
# A set to store the unique shapes of islands.
# We'll store shapes as frozensets of relative coordinates (tuples).
distinct_islands = set()
# 8 Directions for DFS:
# (dr, dc)
# Horizontal: (0, 1), (0, -1)
# Vertical: (1, 0), (-1, 0)
# Diagonal: (1, 1), (1, -1), (-1, 1), (-1, -1)
# directions = [
# (0, 1), (0, -1), # Right, Left
# (1, 0), (-1, 0), # Down, Up
# (1, 1), (1, -1), # Down-Right, Down-Left
# (-1, 1), (-1, -1) # Up-Right, Up-Left
# ]
# Generate 8 directions using nested loops
# dr will go from -1, 0, 1 (for up, current row, down)
# dc will go from -1, 0, 1 (for left, current col, right)
# We exclude (0, 0) as that's the current cell itself.
directions = []
for dr in range(-1, 2):
for dc in range(-1, 2):
if dr == 0 and dc == 0:
continue # Skip the current cell itself
directions.append((dr, dc))
# DFS helper function to explore an island and record its shape
def dfs(r: int, c: int, island_shape: list, row_offset: int, col_offset: int):
# Mark the current cell as visited by changing it from 1 to 0
grid[r][c] = 0 # Change land to water (or a distinct visited marker)
# Add the relative coordinates of the current cell to the island_shape list
# Normalized relative to the top-leftmost point of the island.
island_shape.append((r - row_offset, c - col_offset))
# Explore neighbors in all 8 directions
for dr, dc in directions:
nr, nc = r + dr, c + dc
# Check bounds and if it's an unvisited land cell
if 0 <= nr < m and 0 <= nc < n and grid[nr][nc] == 1:
dfs(nr, nc, island_shape, row_offset, col_offset)
# Iterate through the grid to find islands
for r in range(m):
for c in range(n):
if grid[r][c] == 1: # Check if it's land
# Found a new island, start DFS from here
current_island_shape = []
# Pass the starting coordinates as offsets to normalize future coordinates
dfs(r, c, current_island_shape, r, c)
# Add the unique shape (as a frozenset for immutability) to our set
# Using frozenset handles cases where the points might be visited
# in different orders but represent the same shape.
# If maintaining a specific order within the shape is crucial (e.g., for
# a path-based signature), then sorting current_island_shape and then
# converting to a tuple would be more appropriate:
# current_island_shape.sort()
# distinct_islands.add(tuple(current_island_shape))
distinct_islands.add(frozenset(current_island_shape))
return len(distinct_islands)
Dynamic Programming
- Palindromic Substrings
"""
Expand Around Center:
A palindrome can be centered around a single character (odd length, e.g., "aba") or between two characters (even length, e.g., "abba").
The "expand around center" approach iterates through every possible center of a palindrome and then expands outwards to find all palindromes centered at that point.
Odd Length Palindromes - Each character in the string can be the center of an odd-length palindrome. We can start with l = r = i (where i is the current character's index) and expand outwards.
Even Length Palindromes - Each pair of adjacent characters can be the center of an even-length palindrome. We can start with l = i and r = i + 1 and expand outwards.
By doing this for every possible center, we ensure that all palindromic substrings are counted exactly once.
"""
class Solution:
def countSubstrings(self, s: str) -> int:
"""
Counts the total number of palindromic substrings in a given string.
Args:
s: The input string.
Returns:
The total count of palindromic substrings.
"""
# Initialize the total count of palindromic substrings found.
total_palindromes = 0
n = len(s) # Get the length of the string for boundary checks
# Iterate through each possible center of a palindrome.
# A single character can be the center of an odd-length palindrome (i, i).
# The space between two characters can be the center of an even-length palindrome (i, i + 1).
for i in range(n):
# Case 1: Odd-length palindromes (center is a single character)
# Start expanding from s[i] as the center.
total_palindromes += self._count_palindromes_from_center(s, i, i)
# Case 2: Even-length palindromes (center is between two characters)
# Start expanding from s[i] and s[i+1] as the center.
total_palindromes += self._count_palindromes_from_center(s, i, i + 1)
return total_palindromes
def _count_palindromes_from_center(self, s: str, l: int, r: int) -> int:
"""
Helper function to count palindromes by expanding outwards from a given center.
Args:
s: The input string.
l: The left pointer, starting at the left side of the potential center.
r: The right pointer, starting at the right side of the potential center.
Returns:
The number of palindromes found by expanding from this center.
"""
# Initialize count for palindromes found from this specific center.
current_center_palindromes = 0
n = len(s) # Get the length of the string for boundary checks
# Expand outwards as long as:
# 1. The left pointer 'l' is within bounds (>= 0).
# 2. The right pointer 'r' is within bounds (< string length).
# 3. The characters at 'l' and 'r' are equal (maintaining palindrome property).
while l >= 0 and r < n and s[l] == s[r]:
# If the characters match, it means we've found a valid palindrome.
# E.g., for "aba" with center i=1: l=1, r=1 matches ("b"), count=1.
# Then l=0, r=2, s[0]==s[2] (a==a), count=2 for "aba".
current_center_palindromes += 1
# Move pointers outwards for the next check.
l -= 1 # Move left pointer to the left
r += 1 # Move right pointer to the right
return current_center_palindromes
- Decode Ways
"""
Time: O(N)
Space: O(N) due to recursion stack + memoization
"""
def numDecodings(self, s: str) -> int:
n = len(s)
@lru_cache(None)
def recurse(start: int) -> int:
if start >= n:
return 1
one, two = 0, 0
# Take one digit if it's valid (i.e., not '0')
if "1" <= s[start:start+1] <= "9":
one = recurse(start + 1)
# Take two digits if they form a valid number from 10 to 26
if start + 2 <= n and "10" <= s[start:start+2] <= "26":
two = recurse(start + 2)
return one + two
return recurse(0)
- Edit Distance
def edit_distance(word1: str, word2: str) -> int:
"""Minimum operations to convert word1 to word2"""
m, n = len(word1), len(word2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(m + 1):
dp[i][0] = i
for j in range(n + 1):
dp[0][j] = j
for i in range(1, m + 1):
for j in range(1, n + 1):
if word1[i-1] == word2[j-1]:
dp[i][j] = dp[i-1][j-1]
else:
dp[i][j] = min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) + 1
return dp[m][n]
- Knapsack Problem
def knapsack_01(values, weights, capacity):
"""0/1 Knapsack Problem"""
n = len(values)
dp = [[0] * (capacity + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
for w in range(capacity + 1):
if weights[i-1] <= w:
dp[i][w] = max(values[i-1] + dp[i-1][w-weights[i-1]], dp[i-1][w])
else:
dp[i][w] = dp[i-1][w]
return dp[n][capacity]
- Longest Common Subsequence and Longest Increasing Subsequence
def longest_common_subsequence(text1: str, text2: str) -> int:
"""Dynamic Programming - LCS"""
m, n = len(text1), len(text2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if text1[i-1] == text2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
return dp[m][n]
def longest_increasing_subsequence(nums):
"""LIS using binary search O(nlogn)"""
if not nums:
return 0
tails = [0] * len(nums)
size = 0
for num in nums:
left, right = 0, size
while left < right:
mid = (left + right) // 2
if tails[mid] < num:
left = mid + 1
else:
right = mid
tails[left] = num
size = max(size, left + 1)
return size
- Best Time to Buy and Sell Stock limited to k Transactions
"""
Time Complexity: O(n * k)
Space Complexity: O(n * k) due to recursion stack and cache
"""
from functools import lru_cache
from math import inf
from typing import List
def maxProfit(self, k: int, prices: List[int]) -> int:
@lru_cache(None)
def recurse(time: int, stock: int, count: int) -> int:
# Base cases
if count >= k or time == len(prices):
return 0
# Option 1: Hold
hold = recurse(time + 1, stock, count)
# Option 2: Buy (if not holding)
if stock == 0:
buy = -prices[time] + recurse(time + 1, 1, count)
else:
buy = -inf
# Option 3: Sell (if holding)
if stock == 1:
sell = prices[time] + recurse(time + 1, 0, count + 1)
else:
sell = -inf
return max(hold, buy, sell)
return recurse(0, 0, 0)
- Unique BST
"""
Time: O(n^2)
Space: O(n)
"""
def numTrees(n: int) -> int:
dp = [0] * (n + 1)
dp[0] = dp[1] = 1 # Base cases
for nodes in range(2, n + 1):
for root in range(1, nodes + 1):
left = dp[root - 1] # Nodes in left subtree
right = dp[nodes - root] # Nodes in right subtree
dp[nodes] += left * right
return dp[n]
- Matrix Chain Multiplication
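No code was noted for this one, so here is a minimal sketch of the classic O(n^3) interval DP (function and variable names are illustrative):
def matrix_chain_order(dims):
    """Minimum scalar multiplications to multiply a chain of matrices,
    where matrix i has dimensions dims[i-1] x dims[i]."""
    n = len(dims) - 1  # number of matrices in the chain
    # dp[i][j] = minimum cost to multiply matrices i..j (1-indexed)
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):            # current chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            # Try every split point k between i and j
            dp[i][j] = min(
                dp[i][k] + dp[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                for k in range(i, j)
            )
    return dp[1][n]
# e.g. matrix_chain_order([10, 20, 30, 40]) -> 18000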
Don't underestimate easy topics. Generally Dynamic Programming and Advanced Graphs are not asked in interviews (until and unless you are very unlucky). Just know the basics really well, and rest you can manage in the interview.
Impractical Stuff
If I ever take interviews, I won't ask Hard Leetcode questions. They are mostly impractical for any Senior Engineer with a day-job and a family ( Āŗļ¹Āŗ ). I am even crazier, I would ask the candidate to build an Iterator or some applied Data Structure problem, any basic thing that's used in a day-to-day job just to check the ability to fight through a problem. And that would cover most relevant Data Structures and Algorithms concepts. Even LeetCode has some interesting questions like :
- Design HashMap and HashSet
- Design Twitter
- Design Add and Search Words Data Structure
- Design Hit Counter
- LFU and LRU Cache (a minimal LRU sketch follows this list)
- Design Circular Queue
- Insert Delete GetRandom O(1)
- Time Based Key-Value Store
- Single-Threaded CPU
- Design Browser History
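As a warm-up for the cache questions above, a minimal LRU sketch using OrderedDict (just the idea, not the full LeetCode interface):
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry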
OOPs
When only one instance of a class is required, we use a Singleton class. For this, we make the constructor private (a Python sketch follows the bullets below). Clarity over OOP concepts is crucial: Tight and Loose Coupling, Inheritance and Composition, Abstract Classes and Interfaces, the Diamond Problem, Dependency Injection. The Pillars of OOPs are :
- Data Abstraction - hide the internal implementation, expose only the essential functionality. Achieved via Interfaces and Abstract Classes.
- Data Encapsulation - Classes bundle the data with its corresponding code.
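A minimal Singleton sketch; Python has no private constructors, so one common idiom guards creation in __new__ instead (the class name Config is just illustrative):
class Config:
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the single instance lazily on first use, then reuse it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a, b = Config(), Config()
assert a is b  # both names refer to the same object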
Operating Systems
A process has its own memory and pages. Multi-processing is different from multi-threading. Inter-process communication is hard, so naturally we use multi-threading more.
System Calls
Process Scheduling and Page Replacement Algorithms
Synchronization and Deadlocks
Multi-threading and Concurrency
Another crucial thing to focus on is concurrency (context switching in the user space), parallelism (exploiting multiple cores) and asynchronous programming (non-blocking coroutines). Most Python devs using async/await don't really know what's going on under the hood. async/await is an asynchronous mechanism, NOT a concurrency model. It lets you write non-blocking code that looks like sequential code. It (by itself) does NOT make tasks run concurrently or in parallel. We get concurrency only when we initiate multiple async operations together.
A thread is a single unit of execution within a process. In each process, there is at least a main thread. Multi-threading works with shared memory. Due to the illusion of concurrency, we can run as many threads as we want, but they will not all be actual hardware threads.
Threading is a concurrent execution model whereby multiple threads within a single process take turns executing tasks. One process can contain multiple threads. Python has a complicated relationship with threading thanks to its GIL, which limits the ability of threads to run Python code in parallel, so threads mostly help with I/O-bound work. For CPU-bound tasks in Python, process-based concurrency is superior as it's not nerfed by the GIL. However, since each process has its own memory space, inter-process communication (IPC) can be more challenging to implement. Asyncio (which allows concurrent execution of coroutines) can be seen as combining the benefits of threading and processes: it handles I/O-bound tasks efficiently, while CPU-bound work can be offloaded to thread/process pools. The async keyword turns a Python function into a coroutine; calling it returns a coroutine object that is run by the event loop. What this essentially means is that the coroutine can momentarily pause its execution under the following circumstances:
- Waiting for I/O operations - making network requests, interacting with databases, and more.
- Waiting for external events - specific test conditions before proceeding with actions, monitoring & logging server-side issues, and more.
- Achieving better concurrency - yielding control to the event loop when there are waits (or sleep), running multiple coroutines using asyncio.gather()
Coroutines can pause their execution using the await keyword. The await keyword suspends the currently executing coroutine, and control is yielded to the event loop. The suspended coroutine/task is scheduled for execution again when the awaitable (i.e. I/O, timer expiry, etc.) completes. With the current task suspended, the event loop schedules and executes coroutines that are ready for execution. Once the awaited task is completed, the earlier suspended coroutine resumes execution from the point where it was paused.
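A minimal sketch of the above, with asyncio.sleep standing in for real I/O; the three coroutines overlap, so the whole thing takes roughly as long as the slowest one:
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Stand-in for a network call; await suspends this coroutine and
    # yields control to the event loop so other coroutines can run.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # gather() schedules all three coroutines, so they wait concurrently:
    # total wall time is ~2s instead of ~4.5s.
    results = await asyncio.gather(fetch("a", 1.0), fetch("b", 1.5), fetch("c", 2.0))
    print(results)

asyncio.run(main())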
Debugging can be challenging here in complex applications, though, as the interactions between coroutines can be difficult to reason about.
There are other ways to handle asynchronous logic as well - Callbacks that are hard to manage, Futures/Promises which are cleaner yet nested, and manual Threads that are heavy and complex. Sometimes we need those. But in general interviews, I think the focus would be on async/await, which made it possible to write asynchronous logic that looks like normal sequential code, avoiding thread blocking while improving readability and maintainability.
It is very evident that asynchronous execution is the way to go when the application involves operations with external resources - network requests, database queries, I/O, etc. In such scenarios, the CPU is less loaded, so it can pick up other tasks that require its attention. For CPU-bound tasks or blocking I/O, Python asyncio can be used with ThreadPoolExecutor for offloading tasks from the asyncio event loop. Also, Python asyncio with ProcessPoolExecutor offers the benefits of parallelism by making the best use of multiple CPU cores.
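A minimal offloading sketch, assuming the heavy function is pure CPU work; run_in_executor keeps the event loop responsive while the pool does the crunching:
import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n: int) -> int:
    # CPU-bound work; running this directly in a coroutine would block the loop.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Offload to another process; the event loop stays free for other tasks.
        result = await loop.run_in_executor(pool, crunch, 10_000_000)
    print(result)

if __name__ == "__main__":  # guard needed for process pools on spawn-based platforms
    asyncio.run(main())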
Using async in the code may result in some unexpected behavior as global objects and singletons can share state.
But multiple threads working on the same data at the same time can create race conditions, so use a Mutex. We want our locks to be as granular as possible, since locks make parts of our program synchronous for a while. In read-heavy applications, shared (reader-writer) locks help: many readers can proceed concurrently, while writes take an exclusive lock. Another thing is Deadlocks, which are typically avoided with consistent lock ordering; condition variables help threads wait for a state change without busy-looping.
And the age-old interview question on Semaphores and Mutexes. Both are synchronization primitives, but a Mutex is a locking mechanism that allows only one thread into the critical section at a time - it is either locked or unlocked, and the thread that locks it is expected to unlock it. A Semaphore is a signalling mechanism with a counter, allowing up to N threads to proceed.
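A small sketch with Python's threading primitives to make the difference concrete (the "connection pool" here is hypothetical):
import threading

counter = 0
lock = threading.Lock()              # mutex: one thread at a time, lock/unlock
pool_slots = threading.Semaphore(3)  # semaphore: up to 3 threads may proceed

def increment():
    global counter
    with lock:                       # critical section around shared state
        counter += 1

def use_connection():
    with pool_slots:                 # at most 3 "connections" in use at once
        pass                         # work with the limited resource here

threads = [threading.Thread(target=increment) for _ in range(100)]
threads += [threading.Thread(target=use_connection) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 100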
Interestingly, there are some cute problems on LeetCode revolving around these concepts :
Databases
A database transaction is ACID if it satisfies Atomicity (all-or-nothing execution preventing data loss and corruption from occurring), Consistency (transactions only make changes with rules/constraints preserved, ensuring that corruption in the data does not create unintended consequences for the integrity of your table), Isolation (concurrent transactions behave as if serial), and Durability (successfully executed transactions are committed, changes survive system failure). In a distributed environment, maintaining ACID requires a coordination layer. For example, a two-phase commit (2PC) protocol is often used: a coordinator asks each participant to prepare (vote yes/no), and only if all vote "Yes" does the coordinator broadcast a commit; otherwise it aborts. This ensures atomic commits across nodes, but 2PC can block under failures and is not Byzantine fault tolerant. Some modern systems use consensus (Raft/Paxos) for metadata or use optimistic commit via logs (e.g. Delta Lake's transaction log) to provide ACID guarantees on object stores. The beauty of ACID transactions is that users can trust the data that is stored in Delta Lake.
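A toy 2PC coordinator to illustrate the prepare/commit phases; the Participant class is a stand-in, not a real protocol implementation (it ignores logging, timeouts and coordinator failure):
class Participant:
    def __init__(self, name: str, will_vote_yes: bool = True):
        self.name, self.will_vote_yes = name, will_vote_yes

    def prepare(self) -> bool:
        # Phase 1: persist enough to commit later, then vote yes/no.
        return self.will_vote_yes

    def commit(self):
        print(f"{self.name}: commit")

    def abort(self):
        print(f"{self.name}: abort")

def two_phase_commit(participants) -> str:
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    decision = all(votes)                        # commit only if everyone voted yes
    for p in participants:                       # phase 2: broadcast the decision
        p.commit() if decision else p.abort()
    return "committed" if decision else "aborted"

print(two_phase_commit([Participant("A"), Participant("B", will_vote_yes=False)]))  # aborted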
Multi-version concurrency control (MVCC) allows lock-free reads by keeping multiple versions of each record. Writers create new versions instead of overwriting, so readers access a consistent snapshot based on a timestamp or transaction ID. MVCC thus ensures readers never see "half-committed" data. Periodically, old versions are garbage-collected (e.g. PostgreSQL's VACUUM). It's widely used in RDBMS (Postgres, Oracle) and also in storage engines like RocksDB to handle concurrent updates and snapshot reads without locks.
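A toy MVCC sketch: each key keeps a list of versions, and a reader only sees versions committed at or before its snapshot timestamp (no garbage collection or write-conflict handling here):
class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def snapshot(self) -> int:
        return self.clock   # a reader captures its snapshot timestamp

    def read(self, key, snapshot_ts):
        # Newest version that was committed at or before the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
store.write("x", 1)
snap = store.snapshot()
store.write("x", 2)           # invisible to the earlier snapshot
print(store.read("x", snap))  # -> 1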
Traditional relational engines use B-tree indexes: balanced tree structures on disk where each node holds sorted key ranges. B-trees allow efficient point queries and range scans by traversing from root to leaf (requiring O(log N) I/O). In contrast, LSM-tree (Log-Structured Merge-tree) engines (used in Cassandra, RocksDB, HBase) batch writes in memory (memtables) and periodically merge to disk in sorted runs. LSM-trees optimize write throughput by sequentializing disk writes and trading some read complexity (reads may need to check multiple levels).
SQL query optimizers use cost-based planning: they enumerate possible join orders, access paths, and choose the lowest-cost plan. Heuristics (e.g. join reordering, predicate pushdown) and statistics (table sizes and column cardinalities) guide the planner. Common optimization strategies include transforming subqueries to joins, combining filters, and choosing efficient join algorithms (see below). Indexes dramatically speed up queries by allowing lookups instead of full scans. The most common index is the B-tree index, which stores keys in sorted tree nodes. B-tree indexes offer logarithmic lookup time and support efficient range queries (e.g. all rows between values). For example, a B-tree index on a date column can quickly return all rows within a date range. Hash indexes, by contrast, map keys via a hash function and allow O(1) exact-match lookups, but do not support ordered or range scans (they lose the key order). In analytical engines, columnar indexes and bloom-filters are used: Parquet files, for instance, record per-column dictionary or min/max stats to skip non-matching row-groups. Parquet can also embed Bloom filters per page to quickly exclude pages that do not contain a value.
As above, 2PC is the classic protocol for atomic commits in a distributed DB, ensuring all nodes commit or all abort. 2PC is simple but can be blocking: if the coordinator fails after sending "prepare" but before the final decision, participants may wait indefinitely. To mitigate this, some systems use consensus (e.g. Raft) to coordinate commit logs or use asynchronous replication with conflict detection. Another approach is optimistic concurrency + transaction logs (as in Delta Lake/Iceberg) where readers always see a consistent snapshot and writers append new versions atomically, sidestepping classic 2PC among data nodes.
ACID
Normalization
Normalization is the process of structuring relational tables to reduce redundancy and anomalies. First Normal Form (1NF) requires atomic values and a primary key. Second Normal Form (2NF) requires 1NF plus no partial dependencies: every non-key column must depend on the entire primary key, not just part of it. In practice, this means eliminating tables where a composite key partially determines a column. Third Normal Form (3NF) requires 2NF and no transitive dependencies: non-key columns cannot depend on other non-key columns. For example, splitting "employees" from "roles" tables so that attributes like employee name aren't duplicated per role. Higher forms (BCNF, 4NF) address more subtle cases. Normalizing to 3NF ensures that each fact is stored once, which avoids insert/delete/update anomalies.
Query Optimization
SQL Joins
SQL supports several join types:
- Inner Join: Returns rows where the join condition matches on both tables.
- Outer Joins: Left/Right Outer include all rows from one side and matched rows from the other (filling nulls if no match). Full Outer includes all rows from both sides.
- Cross Join: Cartesian product (all combinations).
- Semijoins/Antijoins: (e.g. WHERE EXISTS) return rows from one side that have (or don't have) matches in the other.
For Join Implementation, at execution time, a DB uses algorithms like:
- Nested-Loop Join: For each row in Table A (outer), scan matching rows in Table B. Simple and good when one table is small or when indexed on the join key - worst-case cost is O(N × M).
- Hash Join: Build a hash table on the smaller input's join keys (build phase), then probe with the larger input's rows (probe phase). Provides O(N + M) time for equi-joins. Excellent for large unordered tables with exact-key predicates (see the sketch after this list).
- Sort-Merge Join: Sort both inputs on the join key and then perform a linear merge scan. Efficient when inputs are pre-sorted (or indexed) on the join key. It requires additional cost to sort if not already ordered.
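To make the build/probe phases of the hash join concrete, a toy in-memory equi-join over lists of dicts (table and column names are made up):
def hash_join(left, right, key):
    # Build a hash table on the smaller input, then probe with the larger one.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    table = {}
    for row in build:                          # build phase
        table.setdefault(row[key], []).append(row)
    out = []
    for row in probe:                          # probe phase
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out

users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [{"id": 1, "item": "pen"}, {"id": 1, "item": "book"}]
print(hash_join(users, orders, "id"))  # two joined rows for id=1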
Modern systems choose among these based on statistics: e.g., use a hash join when both sides are large and unsorted, use merge join if sorted (or a merge-friendly storage like columnar), and fall back to nested loops for highly selective conditions or small inputs. Cost-based optimizers will pick the plan with lowest estimated I/O/CPU.
When analyzing a query in databases like MySQL, the EXPLAIN statement shows the join type, indicating how the database engine accesses and joins tables.
Join types (from best to worst): const, eq_ref, ref, range, index, and ALL. ALL - the worst join type: a full table scan, meaning every row is checked. This is slow, especially for large tables, thus bad for performance. index and range - these are better because they mean the database uses an index to access a subset of data, avoiding full table scans.
Distributed Systems
The CAP theorem states a distributed system can simultaneously provide only two of {Consistency, Availability, Partition Tolerance}. In practice, partitions (network failures) are a fact, so systems choose between strong consistency (always agree on a single value) and availability (always respond, possibly stale). Strong consistency (as in traditional RDBMS or Paxos-based systems) ensures every read sees the latest committed write, but can block or fail under partition. Eventual consistency (as in many NoSQL stores) allows reads to return outdated data but guarantees convergence when nodes reconcile. Many modern distributed DBs offer tunable consistency (e.g. Quorum reads/writes in Cassandra, linearizable reads vs timeline in DynamoDB).
Fault-tolerant systems use consensus protocols to agree on state changes. Paxos is the original consensus protocol for agreeing on a value among unreliable nodes. It guarantees safety (consistency) but is complex. Raft is a later protocol designed for understandability: it decomposes consensus into leader election, log replication, and safety rules. In Raft, one leader handles all log appends; if the leader fails, a new election is held. Both Paxos and Raft require a majority of replicas and can tolerate node failures (but not network partitions beyond a majority). These protocols underlie many systems (e.g. etcd, Consul, TiKV) to manage a replicated log or state machine, ensuring that updates (schema changes, metadata commits) occur atomically across nodes.
Beyond MVCC, distributed systems use patterns like distributed locking (e.g. ZooKeeper/etcd locks), leader-follower sharding (each partition has a single writer leader), and CRDTs (conflict-free replicated data types) to handle concurrent updates. Many data lake systems use optimistic concurrency: readers get snapshot views and writers append updates; conflicts are resolved via atomic replace or merge (as with Delta Lake's transaction log).
Low-Level Design
This round primarily focuses on SOLID, DRY, KISS, OOPs, Design Patterns and the overall modularity, maintainability and testability of the code.
The best way to master LLD is to write good code in your day job ;)
I will do all of these in different languages to get a feel for how each language has its own philosophy of doing things. Also, in these Machine Coding Rounds, you will be expected to write fully-functional code (maybe with class diagrams as well) in an hour, so practice these :
- Design Tic Tac Toe
- Design ATM
- Design Login System
- Design Web Scraper (will be used to train AI models, different forms of data finally converted to text)
- Design Parking Lot (vehicle entry, exit and pricing model)
- Design Library Management (catalog users, borrow-return logic)
- Design Snake Game (dynamic movement and collision detection)
- Design Chess Game
- Design Rate Limiter (Token bucket or sliding window algorithm; a token-bucket sketch follows this list)
- Design Elevator System (multi-threaded handling of floors and requests)
- Design Message Queue
- Design File System
- Design SQL (data structures to use, generic type of data, adding indexing, making it multithreaded, making it extensible, adding logs, deadlocks management)
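For the Rate Limiter above, a minimal token-bucket sketch (single-threaded; a real one would need locking and per-client buckets):
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                  # tokens refilled per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1              # spend one token for this request
            return True
        return False

limiter = TokenBucket(rate=5, capacity=10)
print([limiter.allow() for _ in range(12)])  # first 10 allowed, then throttled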
High Level Rounds
Everybody seems to throw scaling buzzwords in interviews as if they are being hired to build the next Whatsapp! Most probably you will do some vibe coding in your day-job, be honest!
But anyways, understanding the building blocks of designing a large-scale system is cool. A broad understanding of designing large-scale systems in these domains is required.
Data Engineering
Data Lake is the foundational storage containing raw data, on top of which Delta Lake acts as an open-format storage layer, managing metadata and giving us reliability, security and performance on the data lake - for both streaming and batch operations. It extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling.
These days, Delta Lake is mostly used for managing data - useful for unstructured, semi-structured and structured data as well as streaming data, with Schema on Read - data is accepted even if it is not exactly aligned with the schema. A Data Lake also supports this, whereas a Data Warehouse is Schema on Write, i.e. data that does not align with the schema is discarded at write time. ACID transaction support exists in Delta Lake and the Data Warehouse, while the Data Lake has only minimal support. Also, unlike a plain Data Lake, data in Delta Lake and a Data Warehouse does not leave the system corrupted, making them reliable.
- Design a distributed data lake architecture that can handle petabyte-scale data
- How would you build a high-performance object storage system with ACID compliance?
- Design a metadata management service for lakehouse operations
- Architecture for data compression and optimization at scale
MLOps
You should know the tradeoffs of your decisions.
Why did I write this?
So that I can ponder over these notes in my university classes before interviews as well (ć„ļ½”āāæāæāļ½”)ć„