Preparing for wild interview rounds
A lot of interviews coming up! I did some DSA practice earlier along with foundational CS courses, but I was forgetting them. I used to solve DSA problems randomly, taking my own sweet time to come up with solutions. So I just wanted to note down some important concepts to revise later; I can't solve all these problems again, there are just too many. Also, remembering stuff matters given how difficult Leetcode interviews have become in 2025! I get nightmares thinking about how someone with an actual job has to do these things to get an offer from another company.
Hilariously, most candidates can't even explain their day job if you go deep enough. Interviewing is interesting in the age of LLMs.
Data Structures and Algorithms
After doing a lot of FOSS for almost 2.5 years since freshman year, it's time I actively start preparing for jobs. Now, DSA is the only thing that will be expected from me in my interviews, and my resume will be thrown in the trash anyway. So let's do some Data Structures and Algorithms. I have a lot of interviews coming up!
Generally, in 40 minutes I will need to solve a minimum of 2 algorithm questions with follow-ups, so recognizing patterns is crucial. I don't think anyone can problem-solve their way out of such interviews without prior exposure to these problems, unless they are deep into Competitive Programming.
I am very, very time constrained right now: undergraduate research in ML + internships + mentoring folks in Google Summer of Code + managing the dreaded academics of IIT Kanpur + some dating.
So the approach I am following for optimal results is :
- Not solving random problems, brain needs to internalize patterns.
- Take a look at easy problem solutions when starting out, don't solve them. Solve only medium-hard questions on Leetcode. Difficult problems are just a mix of some easy problems with some observations involved.
- Focus on whether you can think through the intuitive steps from brute force to optimal. Don't waste a lot of time coding each and every problem out; just code the difficult ones.
- Working hard on a single problem until you figure it out is a great approach for Competitive Programming, given you have a lot of time. But when you are heavily time constrained, or preparing for simpler interviews, pattern recognition is the key!
And I personally feel that practicing DSA in Rust (or your favourite programming language) is one of the best ways to actually learn the language itself. So Leetcode is not a total waste of time.
DSA rounds get more in-depth for experienced candidates. I don't need that much depth as a fresh grad, but you know, I can't change myself :) So let's go all out!
You know I am Rusty, but you should never use Rust in DSA interviews, unless you either want to be mentally harassed in the interview or, even better, harass your interviewer ( ̄▽ ̄)
You need to be borderline masochistic to even think about doing that! Implementing data structures in Rust is a serious task.
General Stuff
- In Python, self refers to the instance of the class: the object on which the method is called. Any variable assigned as self.some_var becomes an instance variable/property, accessible by all methods of the class on that specific object. Variables without self inside methods are local to that method and do not persist or share state across methods.
- Despite their efficiency, hash collisions can degrade performance, and techniques such as chaining and open addressing are used to manage them.
- A list isn't hashable because it's mutable. Convert it to a tuple, which is immutable and hence hashable.
- ord('a') converts a character to its code point (97) and chr(97) converts back to the character. min("abc", "def") returns the lexicographically smaller string.
- Playing with arrays :
my_list = [1, 2, 3]
my_list.append(4) # add one element at the end of list
print(my_list) # Output: [1, 2, 3, 4]
my_list.extend([5, 6])
# Or: my_list += [5, 6] # concatenate with another list
print(my_list) # Output: [1, 2, 3, 4, 5, 6]
my_list.insert(0, 10) # insert value at a specific index
print(my_list) # Output: [10, 1, 2, 3, 4, 5, 6]
my_list.remove(3) # remove first occurrence of value
print(my_list) # Output: [10, 1, 2, 4, 5, 6]
my_list.pop(2) # remove value at specific index, default removal of last value
print(my_list) # Output: [10, 1, 4, 5, 6]
del my_list[1:3] # remove slice
print(my_list) # Output: [1, 6]
my_list.clear()
print(my_list) # Output: []
visit = set()
visit.add((r, c)) # add a tuple to the set; a list can't be added since lists are mutable (unhashable)
visit.remove((r, c))
- To remove all occurrences of an element from a list in-place (i.e., without creating a new array), the usual approach is to modify the original list by overwriting unwanted elements and then truncating its length. Python lists don't support deleting elements efficiently inside loops without extra space.
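A minimal sketch of that overwrite-and-truncate idea (the helper name remove_all and the sample values are just for illustration):
def remove_all(nums, val):
    # Copy every element we want to keep toward the front, then cut off the tail
    write = 0
    for x in nums:
        if x != val:
            nums[write] = x
            write += 1
    del nums[write:]  # truncate in place, no new list allocated
    return nums

nums = [3, 1, 3, 2, 3]
remove_all(nums, 3)
print(nums)  # [1, 2]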
- A stable sorting algorithm is one that keeps identical elements in the same relative order as they appear in the input, whereas an unstable sort may not. Merge Sort, Bubble Sort, Insertion Sort, etc. are stable, whereas Heap Sort, Quick Sort, etc. are unstable
Collection to be sorted: {(6, 3), (5, 5), (6, 1), (1, 3)}
Stable Sorted: {(1, 3), (5, 5), (6, 3), (6, 1)}
Regular Sorted: Either {(1, 3), (5, 5), (6, 3), (6, 1)}, or {(1, 3), (5, 5), (6, 1), (6, 3)}
- arr.sort(reverse=True) sorts in descending order and arr.sort() in ascending order. arr.sort(key=lambda x: x[0] + x[1]) sorts an array by sum.
- Avoid checking if a key exists before updating it by using defaultdict instead of dict :
from collections import defaultdict
freq = defaultdict(int) # defaultdict(lambda: 1) for default 1 frequency instead of 0
nums = [1, 2, 2, 3, 3, 3]
for num in nums:
freq[num] += 1 # No need for if-else check of key
print(dict(freq)) # {1: 1, 2: 2, 3: 3}
- Use Counter for frequency counting :
from collections import Counter
nums = [5, 1, 2, 2, 3, 3, 3, 4]
freq = Counter(nums)
print(dict(freq)) # Output: {5: 1, 1: 1, 2: 2, 3: 3, 4: 1}
# Sort by key (ascending)
print(sorted(freq.items())) # [(1, 1), (2, 2), (3, 3), (4, 1), (5, 1)]
# Sort ONLY by frequency (ascending)
print(sorted(freq.items(), key=lambda x: x[1]))
# Output: [(5, 1), (1, 1), (4, 1), (2, 2), (3, 3)]
# Sort ONLY by frequency (descending)
print(sorted(freq.items(), key=lambda x: -x[1]))
# Output: [(3, 3), (2, 2), (5, 1), (1, 1), (4, 1)]
# Sort by frequency (descending), then by key (ascending)
print(sorted(freq.items(), key=lambda x: (-x[1], x[0])))
diff = ["aaa", "aa", "a"]
freq2 = Counter(diff)
print(dict(freq2)) # Output: {'aaa': 1, 'aa': 1, 'a': 1}
- Modify a list in-place (reusing the same list object instead of re-binding the name to a new list) via slice assignment
nums = [1, 2, 3, 4, 5]
nums[:] = [x for x in nums if x % 2 == 0] # Removes odd numbers in-place
squares = [x * x for x in nums]
- Enumerate instead of manual indexing :
nums = [10, 20, 30]
for i, num in enumerate(nums):
print(i, num) # 0 10, 1 20, 2 30
- Zip to iterate multiple lists and create window :
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 22]
for name, age in zip(names, ages):
print(name, age)
arr = [1, 3, 5, 7, 9]
windows = list(zip(arr, arr[1:], arr[2:]))
print(windows) # [(1, 3, 5), (3, 5, 7), (5, 7, 9)]
- Ceil and Floor
import math
print(math.ceil(4.7)) # Output: 5
print(math.ceil(-10.7)) # Output: -10
print(math.floor(2.3)) # Output: 2
print(math.floor(-33.89))# Output: -34
- Use .join() instead of + for string concatenation for better performance
words = ["hello", "world"]
sentence = " ".join(words) # Efficient
sentence = words[0] + " " + words[1] # Slow in loops
- bisect for simple Binary Search
"""
Only for ascending order
"""
import bisect
arr = [1, 3, 5, 7, 9]
# bisect_left returns the index of the first element >= target; if target = 5 is present
# this is its first position, otherwise it is where the target would be inserted.
target = bisect.bisect_left(arr, 5) # 2
lower_bound_index = bisect.bisect_left(arr, 6) # 3
# bisect_right returns the index of the first element > 7
upper_bound_index = bisect.bisect_right(arr, 7) # 4
- Generate Permutations and Combinations directly
from itertools import combinations, permutations
arr = [1, 2, 3]
print(list(combinations(arr, 2))) # [(1,2), (1,3), (2,3)]
print(list(permutations(arr, 2))) # [(1,2), (1,3), (2,1), (2,3), (3,1), (3,2)]
- In a queue (FIFO), elements are added at the back and removed from the front; the first element added is the first one removed. In a stack (LIFO), elements are added to the top and removed from the top; the last element added is removed first. Popping everything off a stack yields the elements in reverse order, so when you want reversed elements, don't store them in a list and reverse it, just pop them from a stack.
from collections import deque
stack = []
# Push O(1)
stack.append(10)
stack.append(20)
# Pop O(1)
stack.pop() # 20
stack.pop() # 10
queue = deque()
# Enqueue elements
queue.append(1)
queue.append(2)
# Dequeue elements (removes from front)
front = queue.popleft() # O(1)
# O(1) time complexity for both append (to the right) and popleft (from the left)
- heapq for Priority Queues
"""
A min-heap has the smallest element on top - use it for finding the k largest; a max-heap has the largest element on top - use it for finding the k smallest.
Popping from a heap of size k is O(log k).
The k-th largest element is the smallest element among the top k largest elements. This means we only need to maintain k elements in our Min-Heap to efficiently determine the k-th largest element. Whenever the size of the Min-Heap exceeds k, we remove the smallest element by popping from the heap.
heapify() - Used when you have an existing unsorted list and want to convert it into a heap all at once. Runs in O(n) time. You only call this once, then you can use heappop or heappush.
heappush() - Used to insert one element into an already valid heap. Maintains heap property by reordering as needed. Runs in O(log n) time per push. If you build a heap by pushing elements one after another, no need to call heapify().
"""
import heapq
nums = [7, 2, 9, 4, 1]
heapq.heapify(nums) # Convert to heap
print(heapq.heappop(nums)) # Removes smallest element (1)
# Convert to a max-heap in-place
nums = [-num for num in nums] # Negate all values
heapq.heapify(nums) # Convert to min-heap (which acts as max-heap)
# Example list of pairs
nums = [[1, 2], [7, 3], [4, 1], [2, 5]]
k = 2
# Create an empty list to use as a heap
heap = []
# Push all elements into the heap with their sum as the priority
for pair in nums:
# tuple: (priority, actual_value)
heapq.heappush(heap, (pair[0] + pair[1], pair))
# Pop the smallest k elements from the heap
result = []
for _ in range(min(k, len(heap))):
sum_val, pair = heapq.heappop(heap)
result.append(pair)
print(result) # [[1, 2], [4, 1]] ā smallest sums
- Kth largest/smallest element: the smallest element is always at the top in a min-heap. Every time the heap's size exceeds k, pop the min element (smallest in heap). After all insertions, the min-heap contains the k largest elements from the array. The root of the min-heap (min_heap) is the smallest among the k largest elements, i.e. it is the k-th largest in the whole array.
import heapq
"""
Time complexity: O(k log n)
Space complexity: O(k)
Heapify in O(n) and we pop k times O(logn)
For a list of lists or tuples, it uses the first element of each sub-list/tuple unless specified otherwise.
"""
# O(n + k log n)
def kth_largest_heap(arr, k):
# returns a list of the k largest, sorted largest to smallest
ans = heapq.nlargest(k, arr)
return ans[-1] # the last element is kth largest
# O(n + k log n)
def kth_smallest_heap(arr, k):
# returns a list of the k smallest, sorted smallest to largest
ans = heapq.nsmallest(k, arr)
return ans[-1]
data = [(2, "B"), (1, "C"), (3, "A")]
# Get 2 "smallest" by first value
out = heapq.nsmallest(2, data, key=lambda x: x[0]) # [(1, 'C'), (2, 'B')]
# Get 2 "largest" by the string, alphabetically
out2 = heapq.nlargest(2, data, key=lambda x: x[1]) # [(3, 'A'), (2, 'B')]
- If the array is small in problems of finding the kth smallest or largest, it's better to do it with a single loop than a heap. For example, tracking the top three:
first_largest = float('-inf')
second_largest = float('-inf')
third_largest = float('-inf')
for num in nums:
"""
If num is greater than first_largest: This means we've found a new largest number. We update third_largest to the previous second_largest, second_largest to the previous first_largest, and first_largest to the current num.
"""
if num > first_largest:
third_largest = second_largest
second_largest = first_largest
first_largest = num
"""
Else if num is greater than second_largest: This means num is the new second largest (but not the largest). We update third_largest to the previous second_largest and second_largest to num.
"""
elif num > second_largest:
third_largest = second_largest
second_largest = num
"""
Else if num is greater than third_largest: This means num is the new third largest (but not larger than the first or second). We update third_largest to num.
"""
elif num > third_largest:
third_largest = num
- Deque for efficient queue and stack implementation. Insert and pop are O(1) in a deque. Useful for rolling average or moving sum with a fixed window size (a sketch follows the snippet below) and for sliding window maximum.
from collections import deque
queue = deque([1, 2, 3, 4, 5])
queue.popleft() # Removes 1 in O(1), NO shifting required!
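A small sketch of the rolling sum mentioned above, with a fixed window of size k (the function name moving_sums is made up for illustration):
from collections import deque

def moving_sums(nums, k):
    # Maintain a window of the last k elements and its running sum
    window = deque()
    window_sum = 0
    sums = []
    for num in nums:
        window.append(num)
        window_sum += num
        if len(window) > k:
            window_sum -= window.popleft()  # drop the element leaving the window
        if len(window) == k:
            sums.append(window_sum)
    return sums

print(moving_sums([1, 2, 3, 4, 5], 3))  # [6, 9, 12]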
- If characters are only lowercase, then storing frequencies in a 26-size list is better than a hashmap (dictionary). A list of 26 integers uses less memory than a dictionary, which stores keys, values, and additional metadata, and list access by index is faster than dictionary key lookup.
s = "abracadabra"
freq = [0] * 26
for ch in s:
freq[ord(ch) - ord('a')] += 1
- Recursion simply simulates the future and combines the results of explored paths. Every recursive call eventually reaches the base case, where it returns. So just write the recursive call assuming it already returns the correct answer for the smaller input.
- @lru_cache for memoization. Do it manually if you need to customize the key structure or cache policy. Memoization is most effective when the problem has overlapping subproblems and you only care about the result (e.g., count, existence, or max sum), not the actual combinations. A manual version is below; a decorator sketch follows it.
def factorial(n, memo=None):
# Use memo dictionary to store previously computed factorials
if memo is None:
memo = {}
# Base case
if n == 0 or n == 1:
return 1
# Check memo
if n in memo:
return memo[n]
# Recursive computation with memoization
memo[n] = n * factorial(n - 1, memo)
return memo[n]
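And the decorator version mentioned above, as a minimal sketch on a toy Fibonacci (the example function is mine, just for illustration):
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every distinct argument seen so far
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, computed without exponential recursion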
- Python doesn't have a built-in Ordered Map (TreeMap / BTreeMap), but it's useful for keeping keys in sorted order in a key-value store, i.e. storing the frequency of sorted elements directly. This allows for efficient (O(log n)) insertion, deletion, and lookup operations, as well as range queries (a bisect-based stand-in is sketched after the even/odd snippet below).
- A priority queue (implemented using a min-heap or max-heap) achieves O(log n) time complexity for insertion (add the element to the heap and heapify-up) and deletion (remove the root (min/max), replace it with the last element, and heapify-down), along with O(1) access to the highest-priority element. However, arbitrary element lookup/search followed by removal remains O(n) due to the heap's structure. Prefer this for contiguous arrays.
- Integer exponentiation pow(base, exponent) by repeated squaring takes O(log exponent) multiplications. In case this causes overflow (exponents can be huge), resort to a simple for loop that computes the power only as far as needed for the comparison.
- Use bits for faster even/odd :
def is_odd(x):
return (x & 1) == 1 # Odd if last bit is 1
def is_even(x):
return (x & 1) == 0 # Even if last bit is 0
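Since the standard library has no ordered map, here is a rough stand-in as a sketch: a sorted list of keys plus bisect gives O(log n) lookups and range queries (insertion into a Python list is still O(n)). The helper names insert and range_query are made up; the third-party sortedcontainers package is another option.
import bisect

keys = []    # kept sorted at all times
values = {}  # key -> value

def insert(key, val):
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        keys.insert(i, key)  # O(n) shift, but keeps keys sorted
    values[key] = val

def range_query(lo, hi):
    # All (key, value) pairs with lo <= key <= hi, in sorted order
    left = bisect.bisect_left(keys, lo)
    right = bisect.bisect_right(keys, hi)
    return [(k, values[k]) for k in keys[left:right]]

insert(5, "e"); insert(1, "a"); insert(3, "c")
print(range_query(2, 5))  # [(3, 'c'), (5, 'e')]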
Now that we know how to exploit Python properly for our interviews, let's jump into the waters of actual problem solving.
There are too many questions, but the concepts are limited, so crux of problem solving would be :
- always write examples to understand the problem
- break the problem into simpler versions
- exploit the defining features of the problem
- seek symmetry and patterns among problems
- dry run for intuitions
Always do some examples to understand the problem well. Never forget Edge Cases:
- Empty input
- Duplicate elements
- Zero/Negative numbers
- Overflow cases
- Constraints on inputs and outputs
- Indentation errors
Things to notice :
Constraints
- If the constraints are n <= 20, then Brute force approaches are viable. Backtracking and recursion leading to exponential time complexity (2^n, n!) is acceptable.
- If n lies between 10^3 and 10^6, then most probably an O(n log n) or O(n) solution is required. Try Heaps, Pointers, Greedy or Dynamic Programming.
- If n >= 10^7, only an O(log n) or O(1) solution is viable.
Output Format
- List of Lists (combinations, subsets, paths) - Backtracking is almost always the answer. Generate all possibilities. Use recursion with choice/no-choice pattern
- Single Number (max/min profit, cost, ways, jumps) - Dynamic Programming for optimization. Greedy for local optimal choices.
- Modified Array/String (in-place operations) - Two Pointers for in-place modifications
Patterns
- If sorted array is given, Binary Search, Pointers, Greedy.
- Nearest smaller/larger element problems, next greater/smaller element queries -> Monotonic Stack
- BFS for level-based traversal (shortest path), whereas DFS for recursive exploration. When you want to find the shortest distance/time from any one of several starting nodes to other nodes, use multi-source BFS: problems where multiple initial points influence the state simultaneously (e.g., infection spreading, water flooding). See the sketch after this list.
- Topological Sort for dependency and Union Find for group finding
- Dynamic Programming Keywords - "Number of ways", "Maximum/minimum" + "sum/profit/cost", "Can you reach", "Longest/shortest subsequence", "Optimal" or "best"
- Two Pointers Keywords - "Palindrome", "Sorted array", "Target sum", "Remove duplicates"
- Sliding Window - "Substring" with conditions, "Subarray" with fixed/variable size, "Maximum/minimum window", "Contains all"
- Heap Keywords - "K largest" or "K smallest", "Top K elements", "Median", "Priority"
- Stack Keywords - "Nested structure", "Undo operations"
- Binary Search - "Search in sorted", "Minimize maximum", "First/last occurrence"
- We need a left and right boundary (min and max) for Binary Search, along with a property that lets us discard the elements before/after mid. One such property is sortedness, but it can be any check() as long as it is monotonic. The main idea is to recast the problem so the answer is not simply a matter of picking the largest values first (greedy), but of searching for a threshold or minimal/optimal value that cannot be reached by sequentially picking values.
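Here is the multi-source BFS idea from the patterns list as a hedged sketch: a grid where '#' marks a blocked cell, every source starts at distance 0 in the same queue, and the first time a cell is reached gives its shortest distance from any source (the grid format and function name are my own assumptions).
from collections import deque

def multi_source_bfs(grid, sources):
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]  # -1 means unvisited / unreachable
    queue = deque()
    for r, c in sources:          # seed the queue with every source at distance 0
        dist[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#' and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

grid = [list(".#."), list("..."), list(".#.")]
print(multi_source_bfs(grid, [(0, 0), (2, 2)]))  # [[0, -1, 2], [1, 2, 1], [2, -1, 0]]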
Every problem listed is unique and teaches you something. Let's gooo....
Arrays
- XOR to find unique element and duplicates
nums = [1, 2, 2, 3, 3]
unique = 0
for num in nums:
unique ^= num # XOR accumulates and cancels out duplicates
print(unique) # Output: 1
- Rotation of array and Reverse after m positions
def rotate_right(arr, k):
    k %= len(arr) # Handle cases where k > len(arr)
    return arr[-k:] + arr[:-k] # Move last k elements to the front

def rotate_left(arr, k):
    k %= len(arr)
    return arr[k:] + arr[:k] # Move first k elements to the back
def reverse_after_m(arr, m):
arr[m+1:] = arr[m+1:][::-1] # Reverse portion after m
return arr
- A subarray/substring is a contiguous part of an array, meaning all elements are taken in sequence without skipping any in between. A subsequence is a sequence that can be derived from the array by deleting some (or no) elements without changing the order of the remaining elements; elements need not be contiguous. A subset is any possible combination of the original elements, regardless of order or contiguity. Generating subarrays by brute force is O(n^2); generating all subsets/subsequences is O(n * 2^n) with O(n) space, as sketched below.
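A small sketch of that O(n * 2^n) subset generation using bitmasks (the helper name all_subsets is just for illustration):
def all_subsets(nums):
    # Each bitmask from 0 to 2^n - 1 selects one subset; building each costs O(n)
    n = len(nums)
    subsets = []
    for mask in range(1 << n):
        subsets.append([nums[i] for i in range(n) if mask & (1 << i)])
    return subsets

print(all_subsets([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]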
- When finding a target sum, if the array is not sorted, it's better to use a hashmap to store value-index pairs and then do an O(n) pass over the array with O(1) lookups in the hashmap to find the complement. But if the array is sorted, two pointers should be used, since the sum can be increased/decreased to match the target (move i forward to increase the sum, move j back to decrease it). Both are sketched below.
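Both variants as minimal sketches returning a pair of indices (function names are mine):
def two_sum_unsorted(nums, target):
    # Hashmap of value -> index; one pass, O(n) time, O(n) space
    seen = {}
    for i, num in enumerate(nums):
        if target - num in seen:
            return [seen[target - num], i]
        seen[num] = i
    return []

def two_sum_sorted(nums, target):
    # Two pointers on a sorted array; O(n) time, O(1) space
    i, j = 0, len(nums) - 1
    while i < j:
        s = nums[i] + nums[j]
        if s == target:
            return [i, j]
        if s < target:
            i += 1   # need a bigger sum
        else:
            j -= 1   # need a smaller sum
    return []

print(two_sum_unsorted([3, 2, 4], 6))   # [1, 2]
print(two_sum_sorted([1, 3, 4, 6], 7))  # [0, 3]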
- Prefix Sum to find sum between specific indices in array. Useful for subarray sum and continuity (contiguous) is maintained.
"""
prefix[i] stores the sum of all elements from index 0 to i-1.
prefix = 0 represents the sum before any elements.
This lets you get the sum of any subarray arr[l:r] using:
sum(l,r) = prefix[r+1] - prefix[l]
This reduces querying subarray sums from O(n) to O(1) in time, using O(n) in space for prefix sum array
"""
def prefix_sum(self, arr):
n = len(arr)
prefix = [0] * (n + 1) # prefix sum array has added element as sum of all numbers at last
for i in range(n): # iterate till n - 1
prefix[i+1] = prefix[i] + arr[i]
return prefix
# iterating across all l and r is O(n^2)
def range_sum(self, prefix, l, r):
return prefix[r+1] - prefix[l]
# Instead, count subarrays with sum == k by looking up curr_sum - k among earlier prefix sums
def count_subarrays_with_sum(self, nums, k):
    count = 0
    prefix = self.prefix_sum(nums)
    prefix_counts = defaultdict(int)
    prefix_counts[0] = 1 # sum = 0 before starting {0 : 1}
    for i in range(1, len(prefix)):
        curr_sum = prefix[i]
        needed = curr_sum - k
        count += prefix_counts.get(needed, 0)
        prefix_counts[curr_sum] += 1
    return count
- The Difference Array technique lets us update a whole range of an array by changing just 2 elements. To increase the values in range [a, b] by x, increase the value at position a by x and the value at position b+1 by -x, then take a prefix sum over those indices. The 2D Difference Array is quite useful in image processing.
def apply_range_updates(array_length, updates):
"""
O(1) per update
"""
# Step 1: Create a difference array of size (n + 1) initialized to 0
diff_array = [0] * (array_length + 1)
# Step 2: Apply each update in O(1) using the diff array
for start, end, increment in updates:
diff_array[start] += increment
if end + 1 < array_length:
diff_array[end + 1] -= increment
# Step 3: Convert the diff array to the final result using prefix sum
final_array = [0] * array_length
final_array[0] = diff_array[0]
for i in range(1, array_length):
final_array[i] = final_array[i - 1] + diff_array[i]
return final_array
- Isomorphic Strings
def is_isomorphic_single_direction(s: str, t: str) -> bool:
"""
Checks if string s can be transformed into string t following the
isomorphic string rules (a consistent one-to-one character mapping, preserving order).
"""
if len(s) != len(t):
return False
s_to_t_map = {}
t_to_s_map = {}
for i in range(len(s)):
char_s = s[i]
char_t = t[i]
# Check mapping from s to t
if char_s in s_to_t_map:
if s_to_t_map[char_s] != char_t:
return False
else:
if char_t in t_to_s_map:
return False # t_char is already mapped to another s_char
s_to_t_map[char_s] = char_t
t_to_s_map[char_t] = char_s
return True
def is_isomorphic(s: str, t: str) -> bool:
"""
Determines if two strings s and t are isomorphic by checking the
isomorphism in both directions.
"""
return is_isomorphic_single_direction(s, t) and is_isomorphic_single_direction(t, s)
- Longest Consecutive Subsequence
def longestConsecutive(self, nums: List[int]) -> int:
numSet = set(nums) # no need to worry about duplicates in between, we are hopping
longest = 0
for num in numSet:
    if (num - 1) not in numSet: # O(1) set lookup; only start counting at the beginning of a run
length = 1
while (num + length) in numSet: # consecutives are in set
length += 1
longest = max(length, longest)
return longest
- Next Permutation
def nextPermutation(self, nums: List[int]) -> None:
n = len(nums)
i = n - 2
# Find the pivot: first index where nums[i] < nums[i+1]
while i >= 0 and nums[i] >= nums[i + 1]:
i -= 1
# If pivot found (i >= 0), find successor to swap
if i >= 0:
j = n - 1
# Find smallest element greater than nums[i] from the end
while j > i and nums[j] <= nums[i]:
j -= 1
# Swap pivot with successor
nums[i], nums[j] = nums[j], nums[i]
# Reverse the suffix starting from i+1 to end
left, right = i + 1, n - 1
while left < right:
nums[left], nums[right] = nums[right], nums[left]
left += 1
right -= 1
- Count Longest Subarray with Sum = k
"""
When the array can contain negative integers (or zeros), this prefix-sum solution is the one to use; when it contains only non-negative numbers we can further optimize with greedy + two pointers, exploiting monotonicity. When all elements are positive, increasing the end pointer (right) of a window always increases the sum, and moving the start pointer (left) forward always decreases it. With negative numbers or zeros, increasing the window size can decrease or not change the sum, so the sliding window property breaks.
In both cases, time complexity is O(n)
"""
def longest_subarray_sum_k(arr, k):
# Dictionary to store the earliest index of each prefix sum
prefix_sum_indices = {0: -1}
prefix_sum = 0
max_length = 0 # Length of the longest subarray with sum = k
count = 0 # Number of longest subarrays with sum = k
for i, num in enumerate(arr):
prefix_sum += num
# Check if there is a prefix sum that, when subtracted, gives sum = k
if prefix_sum - k in prefix_sum_indices:
length = i - prefix_sum_indices[prefix_sum - k]
# If found a longer subarray, update max_length and reset count
if length > max_length:
max_length = length
count = 1
# If found another subarray of same max_length, increment count
elif length == max_length:
count += 1
# Only store the earliest index for each prefix sum to maximize subarray length
if prefix_sum not in prefix_sum_indices:
prefix_sum_indices[prefix_sum] = i
return max_length, count
- Maximum Subarray sum
def kadane(arr):
"""Maximum subarray sum"""
max_current = max_global = arr[0]
for i in range(1, len(arr)):
max_current = max(arr[i], max_current + arr[i])
max_global = max(max_global, max_current)
return max_global
- Sliding Window Maximum
def max_sliding_window(nums: list[int], k: int) -> list[int]:
"""Maximum in sliding window using deque"""
if not nums:
return []
result = []
window = deque()
for i in range(len(nums)):
while window and window[0] < i - k + 1:
window.popleft()
while window and nums[window[-1]] < nums[i]:
window.pop()
window.append(i)
if i >= k - 1:
result.append(nums[window[0]])
return result
- Quick and Merge Sort are O(n log n), but if the values come from a small finite range, Bucket Sort can sort in O(n) time. It is also useful when you have a large dataset and want to distribute the sorting workload: sorting in memory is faster than sorting on disk, so if you have more data than fits in memory, you can bucket the data so each bucket is small enough to fit into memory and then quicksort each bucket individually.
- Top K Frequent Elements
"""
O(n) Space and Time
Bucket sort is an algorithm which involves grouping numbers based on their frequency. Use the bucket sort algorithm to create n buckets, grouping numbers based on their frequencies from 1 to n. Then, pick the top k numbers from the buckets, starting from n down to 1.
Heap solution is better than sorting, as it will be O(k log n) compared to O(n log n), but needs O(n + k) space compared to O(n).
"""
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
count = {}
freq = [[] for i in range(len(nums) + 1)] # frequencies are bounded by the length of the array
for num in nums:
count[num] = 1 + count.get(num, 0)
for num, cnt in count.items():
# count is index for array of nums there!
freq[cnt].append(num)
res = []
for i in range(len(freq) - 1, 0, -1): # reverse traversal for top k
for num in freq[i]:
res.append(num)
if len(res) == k:
return res
- Starting a pointer from the back is beneficial when you are trying to merge things under some condition (like non-decreasing order).
Strings
- Minimum Changes To Make Alternating Binary String
"""
There are two valid alternating patterns for any string of length n:
Starting with '0': e.g. "010101..."
Starting with '1': e.g. "101010..."
"""
def minOperations(self, s: str) -> int:
flips_start_with_0 = 0
flips_start_with_1 = 0
for i, ch in enumerate(s):
expected_0 = '0' if i % 2 == 0 else '1'
expected_1 = '1' if i % 2 == 0 else '0'
if ch != expected_0:
flips_start_with_0 += 1
if ch != expected_1:
flips_start_with_1 += 1
return min(flips_start_with_0, flips_start_with_1)
- Longest Substring without Repeating Character
"""
Time: O(n), each character is visited at most twice
Space: O(min(n, m)), where m is the alphabet size (26 when only lowercase)
"""
def lengthOfLongestSubstring(self, s: str) -> int:
char_index_map = {} # stores character -> last seen index
start = 0 # start index of current window
max_length = 0 # result
for end, char in enumerate(s):
# If the character is already seen and is in the current window
if char in char_index_map and char_index_map[char] >= start:
start = char_index_map[char] + 1 # shrink window from left
char_index_map[char] = end # update last seen index
max_length = max(max_length, end - start + 1)
return max_length
- Find Anagram or Permutation or similar in String
"""
O(n+m) Time and O(1) space
If the input consists only of lowercase English letters, then a fixed-size frequency count (26 slots) is most optimal. Otherwise, simply use the sorted tuple of the string as the hashmap key.
"""
from typing import List
class Solution:
def findAnagrams(self, s: str, p: str) -> List[int]:
n, m = len(s), len(p)
if m > n:
return []
res = []
# a-z has 26 characters
pCount = [0] * 26
sCount = [0] * 26
# Build frequency counts for the first window and target
for i in range(m):
pCount[ord(p[i]) - ord('a')] += 1
sCount[ord(s[i]) - ord('a')] += 1
# Track how many characters match in frequency
matches = sum(pCount[i] == sCount[i] for i in range(26))
# Initial match
if matches == 26:
res.append(0)
for i in range(m, n):
left = ord(s[i - m]) - ord('a')
right = ord(s[i]) - ord('a')
# Remove the outgoing character
if sCount[left] == pCount[left]:
matches -= 1
sCount[left] -= 1
if sCount[left] == pCount[left]:
matches += 1
# Add the new character
if sCount[right] == pCount[right]:
matches -= 1
sCount[right] += 1
if sCount[right] == pCount[right]:
matches += 1
if matches == 26:
res.append(i - m + 1)
return res
# Grouping anagrams
d = defaultdict(list)
for s in strs: # O(N) strings, each sorted in O(K log K)
    d[tuple(sorted(s))].append(s)
return list(d.values())
- Maximum Score After Splitting a String
"""
Score = zero_in_left + (total_ones - ones_in_left)
Time - O(N) in one pass through array
"""
def maxScore(self, s: str) -> int:
# Initialize left and right counts
left_zeros = 1 if s[0] == '0' else 0
right_ones = s[1:].count('1')
max_score = left_zeros + right_ones
for i in range(1, len(s) - 1):
if s[i] == '0':
left_zeros += 1
else:
right_ones -= 1
max_score = max(max_score, left_zeros + right_ones)
return max_score
- String Compression
"""
O(n) Time and O(1) Space - in-place modification
"a" -> just "a" (not "a1")
"a","a" -> "a","2"
"a","b" -> "a","b" (no numbers needed)
"""
from typing import List
class Solution:
def compress(self, chars: List[str]) -> int:
write = 0 # Where we write next
left = 0 # Start of current group
for right in range(len(chars) + 1): # Extra step to flush last group
# Check group break (or end of list)
if right == len(chars) or chars[right] != chars[left]:
count = right - left
chars[write] = chars[left]
write += 1
if count > 1:
for digit in str(count):
chars[write] = digit
write += 1
left = right # Move to start of next group
return write
- Next Smallest Palindrome
"""
Time Complexity - O(N)
Lots of edge cases :
All 9s, e.g. 999 -> 1001
Mirroring creates a smaller or equal number, e.g. 12932 -> mirror -> 12921 < 12932, so you must increment and mirror again
Odd vs Even Length
Middle carry propagation when incrementing, e.g. 1991 -> mirror = 1991, which is not greater, so increment 19 -> 20, giving 2002
"""
def next_palindrome(num: str) -> str:
n = len(num)
# Edge case: all 9s (e.g. 999 -> 1001)
if set(num) == {'9'}:
    return "1" + "0" * (n - 1) + "1"
half = n // 2
is_odd = (n % 2 != 0)
# Split into left, middle (if odd), right
left = num[:half]
middle = num[half] if is_odd else ''
right = num[half+1:] if is_odd else num[half:]
# Mirror left + middle to create a palindrome
if is_odd:
candidate = left + middle + left[::-1]
else:
candidate = left + left[::-1]
if candidate > num:
return candidate
# If not, we need to increment (left + middle)
if is_odd:
incremented = str(int(left + middle) + 1)
new_left = incremented[:-1]
new_middle = incremented[-1]
return new_left + new_middle + new_left[::-1]
else:
incremented = str(int(left) + 1)
return incremented + incremented[::-1]
- Flip String to Monotonic Increasing
"""
We want to decide a split point in the string such that:
All characters to the left are 0
All characters to the right are 1
We try every split point i (between characters):
Count number of 1s before i (that need to be flipped to 0)
Count number of 0s after i (that need to be flipped to 1)
The total flips = ones_before + zeros_after
Time Complexity: O(n)
Space Complexity: O(n) (due to the prefix array)
"""
"""
|00110 -> 0 ones left, 3 zeros right -> 3
0|0110 -> 0 ones left, 2 zeros right -> 2
00|110 -> 0 ones left, 1 zero right -> 1
001|10 -> 1 one left, 1 zero right -> 2
0011|0 -> 2 ones left, 1 zero right -> 3
00110| -> 2 ones left, 0 zeros right -> 2
"""
def minFlipsMonoIncr(self, s: str) -> int:
n = len(s)
# Compute prefix sum of 1s
ones_prefix = [0] * (n + 1)
for i in range(n):
ones_prefix[i + 1] = ones_prefix[i] + (1 if s[i] == '1' else 0)
min_flips = float('inf')
for i in range(n + 1):
# Left: 0..i-1 should be all 0s, so flip every 1 in that region
ones_left = ones_prefix[i]
# Right: i..n-1 should be all 1s, so flip every 0 in that region
zeros_right = (n - i) - (ones_prefix[n] - ones_prefix[i])
min_flips = min(min_flips, ones_left + zeros_right)
return min_flips
LinkedList
- In typical brute-force solutions on linked lists, when we want to find the length of the list :
"""
O(N) Time
"""
def length_of_linked_list(head: Optional[ListNode]):
"""
Although nodes is just a Python list, each element in nodes is still a reference to a node in the original linked list.
"""
nodes = []
current = head
while current is not None:
nodes.append(current)
current = current.next
return len(nodes)
# if we want to remove from back in bruteforce way
removeIndex = len(nodes) - n
if removeIndex == 0:
return head.next
# directly operate on nodes array as its reference to LL
nodes[removeIndex - 1].next = nodes[removeIndex].next
return head
- If we are working on problems where we may be asked to remove the head node (and there may be no other nodes in the linked list), a dummy node handles this case and our logic works unchanged; otherwise we would need extra if conditions for this special case. Beware of .next.next giving a null pointer exception.
- Reverse Sublist
"""
Time - O(n) and Space - O(1)
"""
class Solution:
def reverseBetween(self, head: Optional[ListNode], left: int, right: int) -> Optional[ListNode]:
if not head or left == right:
return head
dummy = ListNode(0, head)
prev, head = dummy, dummy.next
# Move `prev` to node before `left`
for _ in range(left - 1):
prev = prev.next
# `start` is the first node to reverse
start = prev.next
then = start.next
# Reverse the sublist using head-insertion
for _ in range(right - left):
start.next = then.next
then.next = prev.next
prev.next = then
then = start.next
return dummy.next
- Loop Detection and Removal from LinkedList (Fast and Slow pointers)
def detect_and_remove_cycle(head: ListNode) -> ListNode:
if not head or not head.next:
return head
# Step 1: Detect loop
slow = fast = head
has_cycle = False
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if slow == fast:
has_cycle = True
break
if not has_cycle:
return head # No loop detected
# Step 2: Find the start of the loop
slow = head
while slow != fast:
slow = slow.next
fast = fast.next
# slow (or fast) is now at the start of the loop
# Step 3: Find the last node in the cycle and break it
ptr = slow
while ptr.next != slow:
ptr = ptr.next
ptr.next = None # Break the loop
return head
Monotonic Stack
- Next/Previous Greater/Smaller Element
def next_greater_element(nums):
n = len(nums)
res = [-1] * n # Initialize result with -1 (no greater element)
stack = [] # Will store indices in decreasing order of nums values
for i in range(n):
# Pop from stack while current element is greater than element at stack top
while stack and nums[i] > nums[stack[-1]]:
idx = stack.pop()
res[idx] = nums[i] # Found next greater element for idx
stack.append(i)
return res
def next_smaller_element(nums):
n = len(nums)
res = [-1] * n # Initialize with -1 (no smaller element)
stack = [] # Will store indices in increasing order of nums values
for i in range(n):
while stack and nums[i] < nums[stack[-1]]:
idx = stack.pop()
res[idx] = nums[i] # Found next smaller element
stack.append(i)
return res
def previous_greater_element(nums):
"""
For each element, find the nearest element to the LEFT that is GREATER.
Returns a list where res[i] is the previous greater element of nums[i],
or -1 if no such element exists.
"""
n = len(nums)
res = [-1] * n # Initialize result array with -1 (means no previous greater)
stack = [] # Stack to keep indices of elements, maintaining strictly decreasing order
for i in range(n):
# Pop elements from the stack while current element is >= element at stack top
# because they cannot be previous greater for future elements
while stack and nums[i] >= nums[stack[-1]]:
stack.pop()
# If stack is not empty, the top is the index of previous greater element
res[i] = nums[stack[-1]] if stack else -1
# Push current index to stack to possibly be previous greater for future elements
stack.append(i)
return res
def previous_smaller_element(nums):
"""
For each element, find the nearest element to the LEFT that is SMALLER.
Returns a list where res[i] is the previous smaller element of nums[i],
or -1 if no such element exists.
"""
n = len(nums)
res = [-1] * n # Initialize result array with -1 (means no previous smaller)
stack = [] # Stack to keep indices of elements, maintaining strictly increasing order
for i in range(n):
# Pop elements from stack while current element is <= element at top of stack
# because they cannot be previous smaller for future elements
while stack and nums[i] <= nums[stack[-1]]:
stack.pop()
# If stack not empty, top of stack is index of previous smaller element
res[i] = nums[stack[-1]] if stack else -1
# Push current index for future elements to compare
stack.append(i)
return res
- In problems like Valid Parentheses, you can simply treat ( as +1 and ) as -1; if the final sum is 0, the string is valid. If the running sum ever becomes -1, we can stop then and there, since there is now a ) without a matching (. A stack is not necessary.
def is_valid_parentheses(s: str) -> bool:
open_count = 0 # Tracks the net number of unmatched '('
for char in s:
if char == '(':
open_count += 1
elif char == ')':
open_count -= 1
# If at any point open_count is negative, a ')' is unmatched
if open_count < 0:
return False
# At the end, all '(' must be matched by ')'
return open_count == 0
- Expression Evaluation
def evaluate(expression: str) -> int:
    def calculate(tokens):
        stack = []
        num = 0
        sign = '+' # Default sign is '+'
        while tokens:
            token = tokens.pop(0) # Read one character at a time
            if token.isdigit(): # Build multi-digit number
                num = num * 10 + int(token)
            elif token == '(': # Solve sub-expression inside parentheses
                num = calculate(tokens)
            # Flush the pending number when we hit an operator, a ')', or the end of input
            if token in "+-*/)" or not tokens:
                if sign == '+':
                    stack.append(num)
                elif sign == '-':
                    stack.append(-num)
                elif sign == '*':
                    stack.append(stack.pop() * num)
                elif sign == '/':
                    stack.append(int(stack.pop() / num)) # Python's `/` is float, so use `int()` to truncate
                sign = token # Update the operator
                num = 0 # Reset the number
                if token == ')': # Return computed result once the closing parenthesis is handled
                    break
        return sum(stack) # Sum all elements left in the stack
    return calculate(list(expression.replace(" ", ""))) # Convert expression to list (char by char)
Binary Search
Binary search is used when you need to repeatedly find a value in a sorted array without worrying about removals, or when each query is independent. Apply it whenever there is a bounded range and we want to find a given target or a min/max (by reducing the search space via logarithmic partitioning of the monotonic answer function's range).
def binary_search(arr, target):
left, right = 0, len(arr) - 1
while left <= right:
mid = left + (right - left) // 2 # Avoids overflow
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1 # Target not found
- The lower bound of target is the index of the first element in arr that is greater than or equal to target. The upper bound of target is the index of the first element in arr that is strictly greater than target. If target is not present, both give the index where target would be inserted to maintain sorted order. In a sorted array, the first occurrence is at lower bound and the last occurrence is at upper bound - 1. The total number of occurrences of an element in a sorted array is (last occurrence - first occurrence) + 1 = (upper bound - lower bound).
def lower_bound(array, target):
"""
Returns the smallest index l such that array[l] >= target (every element before index l is < target).
If all elements are < target, returns len(array).
"""
left, right = 0, len(array) # allows the solution to return len(arr) when the target is greater than all elements in the array (i.e., the insertion point is at the end) and 0 if less than all elements in array
while left < right:
mid = (left + right) // 2
# If mid element is less than target, move left up
if array[mid] < target:
left = mid + 1
else:
# mid is a candidate, but keep searching left
right = mid
# Now, left == right is the first index where array[index] >= target
# If all elements in the array are less than target, the search will finish with left == len(array).
return left
def upper_bound(array, target):
"""
Returns the smallest index r such that array[r] > target and array[left] <= target.
If all elements are <= target, returns len(array).
To get the index of the last element <= target, return upper_bound(array, target) - 1
"""
left, right = 0, len(array)
while left < right:
mid = (left + right) // 2
if array[mid] <= target:
# mid is too small or just right, move left up
left = mid + 1
else:
# mid is a candidate, move right down
right = mid
# left == right is the first index where array[index] > target
# If all elements in the array are less than or equal to target, the search will finish with left == len(array).
return left
- In binary search on answers, the answer space is a sorted contiguous range. The check function over that range looks like [false, false, false, true, true, true, ...], changing monotonically, and the first true is what we are interested in, i.e. the lower bound.
def binary_search_answer(space):
left, right = 0, len(space)
while left < right:
mid = (left + right) // 2
if check(space[mid]):
right = mid # mid might be the first True, so keep it
else:
left = mid + 1 # mid is False, need larger
return left # index of first True (if any)
- Longest Common Prefix
"""
Approach Time Complexity Space Complexity Suitable For
Brute Force (Vertical Scanning) O(N * M) O(1) Small Inputs
Binary Search O(N log M) O(1) Medium Inputs
Divide & Conquer O(N * M) O(log N) Recursion-friendly cases
Trie (Prefix Tree) O(N * M) O(N * M) Large Inputs
"""
def is_common_prefix(strs, length):
prefix = strs[0][:length]
return all(s.startswith(prefix) for s in strs)
def longest_common_prefix(strs):
if not strs:
return ""
left, right = 0, min(map(len, strs))
while left < right:
mid = (left + right + 1) // 2
if is_common_prefix(strs, mid):
left = mid # Move right (increase prefix size)
else:
right = mid - 1 # Move left (decrease prefix size)
return strs[0][:left]
- Search in Rotated Sorted Array
def findPivot(self, arr: List[int]) -> int:
    # Returns the index of the minimum element, i.e. the rotation point
    left, right = 0, len(arr) - 1
    while left < right: # strict <, so we stop exactly when left == right
        mid = (left + right) // 2
        # Both parts are sorted individually; if the mid value is greater than the rightmost value,
        # the minimum must lie to the right of mid
        if arr[mid] > arr[right]:
            left = mid + 1
        else:
            # arr[mid] <= arr[right]: the minimum is at mid or to its left
            right = mid
    return left # left == right here, pointing at the minimum element
Now, from the [0, pivot_index - 1] and [pivot_index, end] apply binary search to both to get the target in rotated array. Also pivot_index is the number of times the array has been rotated.
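Putting the pieces together, a hedged sketch of the full search: find the pivot as above, then binary search whichever sorted half could hold the target (search_rotated is an illustrative name, not from the problem statement):
def search_rotated(nums, target):
    # Step 1: find the pivot (index of the minimum element)
    left, right = 0, len(nums) - 1
    while left < right:
        mid = (left + right) // 2
        if nums[mid] > nums[right]:
            left = mid + 1
        else:
            right = mid
    pivot = left

    # Step 2: plain binary search inside one sorted half
    def bsearch(lo, hi):
        while lo <= hi:
            mid = (lo + hi) // 2
            if nums[mid] == target:
                return mid
            if nums[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    # The target is either in the left sorted part [0, pivot-1] or the right part [pivot, n-1]
    res = bsearch(0, pivot - 1)
    return res if res != -1 else bsearch(pivot, len(nums) - 1)

print(search_rotated([4, 5, 6, 7, 0, 1, 2], 0))  # 4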
- Minimum Size Subarray Sum
"""
Prefix + Binary Search is O(n log n). This is the best we can do when the array contains positive, negative, and zero values. Prefix sums allow you to compute the sum of any subarray efficiently. Convert the problem of minimal subarray length with sum >= target into finding, for each i, the smallest j such that prefixSum[j] - prefixSum[i] >= target.
"""
def minSubArrayLen(self, target: int, nums: List[int]) -> int:
n = len(nums)
prefix_sums = [0] * (n + 1)
# Compute prefix sums
for i in range(n):
prefix_sums[i + 1] = prefix_sums[i] + nums[i]
min_length = n + 1 # this length cannot be the answer, so min() with it from any subarray will be fine
for start in range(n):
left, right = start, n
# Binary search to find smallest end index where subarray sum >= target (lower bound)
while left < right:
mid = (left + right) // 2
current_sum = prefix_sums[mid + 1] - prefix_sums[start]
if current_sum >= target:
right = mid
else:
left = mid + 1
if left != n:
min_length = min(min_length, left - start + 1)
# If no valid subarray found, return 0
return 0 if min_length == n + 1 else min_length
"""
Sliding Window exploits the fact that we have only positive integers in the array. The technique relies on the property that adding more elements to the window never decreases the sum (because all are positive). So once a window with sum >= target is found, extending the right pointer would only make it longer while staying valid; since we want the minimum length, we instead move the left pointer to shrink the window as soon as we see a valid subarray.
"""
def minSubArrayLen(self, target: int, nums: List[int]) -> int:
left, sum = 0, 0
count = float("inf")
"""
Just because we are using 2 loops doesn't make it O(n^2).
The outer for right in range(len(nums)) loop runs n times. The inner while sum >= target: loop moves the left pointer forward. Critically, left pointer only moves forward throughout the entire algorithm (never back). Both left and right pointers together move at most 2n steps (right goes from 0 to n-1; left only moves forward). Each element is visited at most twice: once when right includes it and once when left excludes it.
"""
for right in range(len(nums)):
sum += nums[right]
while sum >= target:
count = min(right - left + 1, count)
sum -= nums[left]
left += 1
return 0 if count == float("inf") else count
- Koko Eating Banana
import math

def minEatingSpeed(self, piles: list[int], h: int) -> int:
"""
Finds the minimum eating speed (k) such that Koko can eat all bananas
within h hours.
Args:
piles: A list of integers representing the number of bananas in each pile.
h: The maximum number of hours Koko has to eat all bananas.
Returns:
The minimum integer k (eating speed).
"""
# The search space for 'k'
# The minimum possible speed is 1 (Koko must eat at least 1 banana per hour).
# The maximum possible speed is the largest pile size. If Koko eats at this speed,
# she'll finish each pile in 1 hour, which is the fastest possible.
low = 1
high = max(piles)
# This will store the minimum valid 'k' found so far.
# Initialize with 'high' because we know 'high' is always a valid speed,
# but we want to find the minimum.
min_k = high
# Binary search for the minimum k
while low <= high:
mid = (low + high) // 2 # Calculate the middle speed
# Calculate the total time taken to eat all bananas at 'mid' speed
total_time = 0
for pile in piles:
# For each pile, the time taken is ceil(pile / mid).
# math.ceil(a / b) calculates the smallest integer greater than or equal to a/b.
# This correctly accounts for cases where pile < mid (eats all in 1 hour)
# or pile is a multiple of mid (exact division) or not (needs an extra hour).
total_time += math.ceil(pile / mid)
# Check if Koko can eat all bananas within 'h' hours at 'mid' speed
if total_time <= h:
# If she can, 'mid' is a possible answer.
# We try to find an even smaller 'k' by searching in the lower half.
min_k = mid
high = mid - 1
else:
# If she cannot, 'mid' is too slow.
# We need a faster speed, so search in the upper half.
low = mid + 1
return min_k
- Most Beautiful Item for Each Query
"""
Binary Search + Prefix Sum for range queries
"""
from bisect import bisect_right

def maximumBeauty(self, items, queries):
# Step 1: Sort items by price
items.sort()
# Step 2: Build prefix max array for beauty
prices = []
prefix_max_beauty = []
max_beauty = 0
for price, beauty in items:
# Only keep the best beauty seen so far for each price
max_beauty = max(max_beauty, beauty)
if prices and prices[-1] == price:
# skip duplicate prices, keep max
prefix_max_beauty[-1] = max_beauty
else:
prices.append(price)
prefix_max_beauty.append(max_beauty)
# Step 3: Binary search for each query
res = []
for q in queries:
# Find the rightmost item with price <= q
idx = bisect_right(prices, q) - 1
if idx >= 0:
res.append(prefix_max_beauty[idx])
else:
res.append(0)
return res
- Find in Mountain Array
# """
# This is MountainArray's API interface.
# You should not implement it, or speculate about its implementation
# """
#class MountainArray:
# def get(self, index: int) -> int:
# def length(self) -> int:
class Solution:
def findInMountainArray(self, target: int, mountainArr: 'MountainArray') -> int:
n = mountainArr.length()
# Step 1: Find peak index of the mountain array using binary search
left, right = 1, n - 2 # peak cannot be first or last element
while left <= right:
mid = (left + right) // 2
val_left = mountainArr.get(mid - 1)
val_mid = mountainArr.get(mid)
val_right = mountainArr.get(mid + 1)
# If slope is ascending: move right to find peak
if val_left < val_mid < val_right:
left = mid + 1
# If slope is descending: move left to find peak
elif val_left > val_mid > val_right:
right = mid - 1
else:
# Found peak point where array changes from increasing to decreasing
break
peak_index = mid
# Step 2: Binary search on ascending part from start to peak
left, right = 0, peak_index
while left <= right:
mid = (left + right) // 2
val = mountainArr.get(mid)
if val == target:
return mid
elif val < target:
left = mid + 1
else:
right = mid - 1
# Step 3: Binary search on descending part from peak+1 to end
left, right = peak_index + 1, n - 1
while left <= right:
mid = (left + right) // 2
val = mountainArr.get(mid)
if val == target:
return mid
elif val > target:
left = mid + 1
else:
right = mid - 1
# If target is not found in either part
return -1
Recursion and Backtracking
- Lexicographical Numbers
def lexicalOrder(n: int) -> List[int]:
result = []
def dfs(curr):
if curr > n:
return
result.append(curr)
for i in range(10):
next_num = curr * 10 + i
if next_num > n:
break # Prune this branch
dfs(next_num)
for i in range(1, 10):
dfs(i)
return result
- Palindrome Partitioning
"""
Returns all possible palindrome partitionings of string s.
Each partition is a list of substrings, all of which are palindromes.
At each recursion, you are not deciding for each character whether to include it or not. Instead, you are choosing every possible end index for a substring starting at the current position, and only recursing on those substrings that are palindromes. This is a multi-way branching at each step, not binary.
Time and Space Complexity - O(n * 2^n), as each partitioning operation is O(n) and there are up to 2^(n-1) partitions
"""
from typing import List
class Solution:
def partition(self, s: str) -> List[List[str]]:
result = []
substring = []
self.backtrack(result, 0, substring, s)
return result # Return all palindrome partitions collected
def backtrack(self, result: List[List[str]], start: int, substring: List[str], s: str) -> None:
# If we reached the end, add a copy of current partition to the result
if start == len(s):
result.append(substring[:]) # add copy of substring
return
# Try all possible partitions starting at 'start'
for end in range(start, len(s)):
if self.isPalindrome(s, start, end):
# Append current palindrome substring
substring.append(s[start:end+1])
# Recurse for the remaining substring
self.backtrack(result, end + 1, substring, s)
# Backtrack: remove last substring and try next
substring.pop()
def isPalindrome(self, s: str, left: int, right: int) -> bool:
# Check palindrome between indices left and right inclusive
while left < right:
if s[left] != s[right]:
return False
left += 1
right -= 1
return True
- Word Search
def exist(board, word):
def dfs(i, j, k):
if k == len(word):
return True
if i < 0 or i >= len(board) or j < 0 or j >= len(board[0]) or board[i][j] != word[k]:
return False
temp = board[i][j]
board[i][j] = '#' # Mark as visited
# Explore neighbors
if (dfs(i + 1, j, k + 1) or dfs(i - 1, j, k + 1) or dfs(i, j + 1, k + 1) or dfs(i, j - 1, k + 1)):
return True
board[i][j] = temp # Backtrack
return False
for i in range(len(board)):
for j in range(len(board[0])):
if dfs(i, j, 0):
return True
return False
- Letter combinations of a phone number
"""
Approach Time Space Notes
Backtracking (DFS) O(4^N) O(N) Efficient, recursive
Iterative (BFS with Queue) O(4^N) O(4^N) Uses queue for level-wise expansion
"""
from typing import List
def letterCombinations(digits: str) -> List[str]:
if not digits:
return []
phone_map = {
"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
"6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"
}
res = []
def backtrack(index, path):
if index == len(digits):
res.append("".join(path))
return
for char in phone_map[digits[index]]:
path.append(char)
backtrack(index + 1, path)
path.pop() # Backtrack to previous state
backtrack(0, [])
return res
- Partition to K Equal Sum Subsets
"""
We can generate all subsets such that, each element falls into one of k buckets - each has k options, such that their sum is same = sum(nums)/k. Such decision tree would be of height n, so O(k^n) - pretty bad!
Now we don't really need to put the elements into actual buckets; we can just recurse over each element and, whenever the running sum matches the target, store that subset, until we have k such subsets stored. That would be O(k * 2^n) as we will have k such decision trees! Also, we have to track which numbers we have already used so we don't use them again.
Interestingly, with right pruning, its 30ms solution!
Memoization is also possible, if we pass subsets as tuple.
"""
from typing import List
class Solution:
def canPartitionKSubsets(self, nums: List[int], k: int) -> bool:
total_sum = sum(nums)
# If total sum is not divisible by k, partitioning into k equal subsets is impossible
if total_sum % k != 0:
return False
target_sum = total_sum // k
# This list keeps track of the current sum of each subset (not the elements themselves). Tracking sum only is sufficient here because we only need to know if subsets can achieve the target sum, not the exact elements in them.
subsets_sums = [0] * k
# Sorting in descending order helps pruning:
# We try to place bigger numbers first, so if a big number doesn't fit early,
# we can backtrack faster.
nums.sort(reverse=True)
def backtrack(start: int) -> bool:
# If all numbers have been placed successfully
if start == len(nums):
return True
current_num = nums[start]
# Try to put current_num into each of the k subsets
for end in range(k):
# If adding current_num doesn't exceed the target sum for this subset
if subsets_sums[end] + current_num <= target_sum:
subsets_sums[end] += current_num # Choose to add
# Recursive call to place next number
if backtrack(start + 1):
return True # Found a valid partition, no need to explore further
subsets_sums[end] -= current_num # Backtrack (undo the choice)
# Optimization :
# If this subset was empty (sum zero) and adding current_num didn't lead to solution,
# then trying other empty subsets at this level will also fail.
if subsets_sums[end] == 0:
break
# If no subset could accommodate current_num without violating target sum, fail
return False

# Kick off the recursion from the first number (backtrack is the nested helper above)
return backtrack(0)
- Fair Distribution of Cookies
"""
In palindrome partitioning, we divide continuously in a linear manner. However, in this question, we can divide from any point, pick any element, and partition discontinuously.
As the bags chosen are not necessarily contiguous subarrays of cookies, we can't use Greedy or a sliding window of size k.
Greedy algorithms make locally optimal choices at each step, hoping to reach a globally optimal solution. In distribution problems, a locally "fair" choice might lead to a globally unfair outcome later. Imagine cookies [10, 1, 1, 10] and 2 children. If we had to give continuous subarrays, a greedy approach might try to give the first child [10] (largest first). Then the second child gets [1, 1, 10] (unfairness 12). However, the optimal solution might be to give the first child [10, 1] (unfairness 11) and the second child [1, 10] (unfairness 11). The initial greedy choice prevented this better outcome.
"""
def distributeCookies(self, cookies: List[int], k: int) -> int:
n = len(cookies)
# Minimum possible unfairness is the largest single bag.
# Maximum possible unfairness is the sum of all cookies.
left = max(cookies)
right = sum(cookies)
ans = right
def can_distribute(max_unfairness: int) -> bool:
children_cookies = [0] * k
def backtrack(cookie_index: int):
if cookie_index == n:
return True
for i in range(k):
if children_cookies[i] + cookies[cookie_index] <= max_unfairness:
children_cookies[i] += cookies[cookie_index]
if backtrack(cookie_index + 1):
children_cookies[i] -= cookies[cookie_index] # Backtrack
return True
children_cookies[i] -= cookies[cookie_index] # Backtrack
return False
return backtrack(0)
while left <= right:
mid = (left + right) // 2
if can_distribute(mid):
ans = mid
right = mid - 1
else:
left = mid + 1
return ans
Greedy
- Non Overlapping Intervals
"""
Minimize the number of intervals that need to be removed to avoid overlaps, i.e., select the maximum number of non-overlapping intervals.
At every step, always pick the interval that finishes earliest among the available ones. By always picking the interval that ends earliest, you leave the most room for the remaining intervals, maximizing your chances of fitting more without overlap.
"""
def eraseOverlapIntervals(intervals):
if not intervals:
return 0
# Sort intervals by end time
intervals.sort(key=lambda x: x[1])
count = 0
prev_end = intervals[0][1]
for i in range(1, len(intervals)):
start, end = intervals[i]
if start < prev_end: # If the current interval starts before the previous interval ends, they overlap.
count += 1
else:
# No overlap, update prev_end
prev_end = end
return count
- Huffman Coding
import heapq
from collections import defaultdict, Counter
class HuffmanNode:
def __init__(self, freq, char=None, left=None, right=None):
self.freq = freq # Frequency of the character(s)
self.char = char # Character stored (None for internal nodes)
self.left = left # Left child
self.right = right # Right child
# For heapq to sort nodes by frequency
def __lt__(self, other):
return self.freq < other.freq
def build_huffman_tree(char_freq):
"""
Build Huffman tree from character frequencies.
Returns the root of the Huffman tree.
"""
min_heap = []
# Initialize the heap with leaf nodes for each character
for char, freq in char_freq.items():
heapq.heappush(min_heap, HuffmanNode(freq, char))
# Combine nodes until single tree remains
while len(min_heap) > 1:
# Pop two nodes with smallest freq
left = heapq.heappop(min_heap)
right = heapq.heappop(min_heap)
# Merge these nodes into a new internal node
merged = HuffmanNode(left.freq + right.freq, None, left, right)
heapq.heappush(min_heap, merged)
# The remaining node is the root of the Huffman tree
return min_heap[0] if min_heap else None
def build_huffman_codes(root):
"""
Traverse the Huffman tree to build a dictionary of character->code mappings.
"""
huffman_codes = {}
def dfs(node, code):
if node is None:
return
# If leaf node
if node.char is not None:
huffman_codes[node.char] = code if code else "0" # handle single char edge case
return
# Traverse left adds "0", right adds "1"
dfs(node.left, code + "0")
dfs(node.right, code + "1")
dfs(root, "")
return huffman_codes
def huffman_encode(text):
"""
Encode the input text using Huffman coding.
Returns the encoded string and the Huffman tree root for decoding.
"""
# Count frequency of each character
char_freq = Counter(text)
# Build Huffman tree
root = build_huffman_tree(char_freq)
# Build codes by traversing the tree
codes = build_huffman_codes(root)
# Encode the text using code map
encoded_text = "".join(codes[ch] for ch in text)
return encoded_text, root
def huffman_decode(encoded_text, root):
"""
Decode the encoded text using the Huffman tree.
Returns the original text string.
"""
decoded_chars = []
current = root
for bit in encoded_text:
# Traverse tree per bit
if bit == "0":
current = current.left
else:
current = current.right
# On reaching leaf node, append character and restart from root
if current.char is not None:
decoded_chars.append(current.char)
current = root
return "".join(decoded_chars)
- Fractional Knapsack
def fractional_knapsack(weights, values, capacity):
"""
Args:
weights: List of item weights.
values: List of item values.
capacity: Maximum capacity of the knapsack.
Returns:
max_value: Maximum value that can be carried in the knapsack.
Time Complexity - O(nlogn)
"""
# Step 1: Calculate value-to-weight ratio for each item
n = len(weights)
items = []
for i in range(n):
ratio = values[i] / weights[i]
items.append((ratio, weights[i], values[i])) # (ratio, weight, value)
# Step 2: Sort items by ratio in descending order
items.sort(reverse=True)
# Step 3: Accumulate items in the knapsack
max_value = 0.0
for ratio, weight, value in items:
if capacity >= weight:
# If the whole item can be taken, take it all
max_value += value
capacity -= weight
else:
# Take the fraction that fits
fraction = capacity / weight
max_value += value * fraction
break # Knapsack is full
return max_value
Tree
def level_order_traversal(root):
"""BFS for tree"""
if not root:
return []
result = []
# We enqueue child nodes at the back.
# We dequeue nodes from the front to process them level by level.
queue = deque([root]) # queue = [1]
level_sum = 0 # when it's never reset, it gives the sum of all nodes in the binary tree
# each iteration of while loop is one level
while queue:
level = []
level_sum = 0 # reset sum at each level, so last level sum is of deepest layer
level_size = len(queue)
# each iteration of for loop is for nodes at each level
for _ in range(level_size):
node = queue.popleft()
level.append(node.val)
level_sum += node.val
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
result.append(level)
return result
# Iteration 1 (Level 0):
# - level_size = 1
# - Process node 1:
# * Add 1 to current level
# * Add children (2,3) to queue
# - After iteration:
# * queue = [2,3]
# * result = [[1]]
# Iteration 2 (Level 1):
# - level_size = 2
# - Process node 2:
# * Add 2 to current level
# * Add children (4,5) to queue
# - Process node 3:
# * Add 3 to current level
# * Add child (6) to queue
# - After iteration:
# * queue = [4,5,6]
# * result = [[1], [2,3]]
# Iteration 3 (Level 2):
# - level_size = 3
# - Process nodes 4,5,6:
# * Add 4,5,6 to current level
# * No children to add
# - After iteration:
# * queue = []
# * result = [[1], [2,3], [4,5,6]]
# Final result: [[1], [2,3], [4,5,6]]
# Time: O(n) - visit each node once
# Space: O(w) - where w is max width of tree
# Left -> Root -> Right (Sorted for BST)
def inorder(node: TreeNode) -> List[int]:
if not node:
return []
return inorder(node.left) + [node.val] + inorder(node.right)
return inorder(root)
# Root -> Left -> Right
def preorder(node: TreeNode) -> List[int]:
if not node:
return []
return [node.val] + preorder(node.left) + preorder(node.right)
return preorder(root)
# Left -> Right -> Root
def postorder(node: TreeNode) -> List[int]:
if not node:
return []
return postorder(node.left) + postorder(node.right) + [node.val]
return postorder(root)
- Count Nodes Equal to Average of Subtree
class Solution:
def averageOfSubtree(self, root: TreeNode) -> int:
# If you use result = 0 (a local variable inside the method), each recursive call gets its own copy of result, and their changes are not reflected back to the caller or accumulated. With self.result, all recursive calls share the same variable attached to the Solution object. Any modification in any nested call is preserved and visible globally.
self.result = 0
self.dfs(root)
return self.result
def dfs(self, node: TreeNode):
if not node:
return (0, 0) # sum = 0, count = 0
left_sum, left_count = self.dfs(node.left)
right_sum, right_count = self.dfs(node.right)
total_sum = node.val + left_sum + right_sum
total_count = 1 + left_count + right_count
average = total_sum // total_count
if node.val == average:
self.result += 1
return (total_sum, total_count)
- Delete Leaves With a Given Value
"""
O(n) Time and O(h) space
"""
def removeLeafNodes(self, root: Optional[TreeNode], target: int) -> Optional[TreeNode]:
if not root:
return None
# First recurse into children
root.left = self.removeLeafNodes(root.left, target)
root.right = self.removeLeafNodes(root.right, target)
# Then check if current node is now a leaf
if not root.left and not root.right and root.val == target:
# in case of leaf node, it gets cut from the parent as removeLeafNodes will return None making root.left/right None. When the node has no reference, python's garbage collector will handle it
return None
return root
- Side view and Boundary Traversal
def right_view(root: TreeNode) -> List[int]:
"""Right view of binary tree using level order"""
if not root:
return []
result = []
queue = deque([root])
while queue:
level_size = len(queue)
for i in range(level_size):
node = queue.popleft()
if i == level_size - 1: # rightmost node of current level
result.append(node.val)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
return result
def boundary_traversal(root: TreeNode) -> List[int]:
"""
Boundary Traversal of Binary Tree
Intuition:
- Traverse tree boundary in anti-clockwise direction
- Three parts: left boundary, leaves, right boundary
- Need to avoid duplicate nodes
Algorithm steps:
1. Add root
2. Add left boundary (excluding leaves)
3. Add leaves from left to right
4. Add right boundary (excluding leaves) in reverse
Helper functions:
- add_left_boundary: Adds nodes on leftmost path
- add_leaves: Adds all leaf nodes
- add_right_boundary: Adds nodes on rightmost path (bottom-up)
Use cases:
1. Printing tree perimeter
2. Finding external nodes
3. Tree visualization
Time: O(n), Space: O(h) where h is height of tree
"""
def add_left_boundary(node: TreeNode, result: List[int]):
if not node or (not node.left and not node.right):
return
result.append(node.val)
if node.left:
add_left_boundary(node.left, result)
elif node.right:
add_left_boundary(node.right, result)
def add_leaves(node: TreeNode, result: List[int]):
if not node:
return
if not node.left and not node.right:
result.append(node.val)
return
add_leaves(node.left, result)
add_leaves(node.right, result)
def add_right_boundary(node: TreeNode, result: List[int]):
if not node or (not node.left and not node.right):
return
if node.right:
add_right_boundary(node.right, result)
elif node.left:
add_right_boundary(node.left, result)
result.append(node.val)
if not root:
return []
result = [root.val]
if not root.left and not root.right:
return result
if root.left:
add_left_boundary(root.left, result)
add_leaves(root.left, result)
add_leaves(root.right, result)
if root.right:
add_right_boundary(root.right, result)
return result
- When finding the most frequent value(s) in a tree, build a value -> frequency hashmap during one traversal, then scan it in one go: start with max_freq = 0; if freq > max_freq, reset the answer list to just that value and update max_freq; if freq == max_freq, append the value. A minimal sketch follows.
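A minimal sketch of that idea, assuming the usual TreeNode with val/left/right (the function name is just illustrative):
from collections import defaultdict

def most_frequent_tree_values(root):
    freq = defaultdict(int)

    def dfs(node):
        if not node:
            return
        freq[node.val] += 1
        dfs(node.left)
        dfs(node.right)

    dfs(root)
    max_freq, modes = 0, []
    for val, cnt in freq.items():    # one pass over the hashmap
        if cnt > max_freq:
            max_freq, modes = cnt, [val]
        elif cnt == max_freq:
            modes.append(val)
    return modes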
- House Robber
def rob(self, root: TreeNode) -> int:
def dfs(root):
if not root:
return [0, 0]
leftPair = dfs(root.left)
rightPair = dfs(root.right)
withRoot = root.val + leftPair[1] + rightPair[1]
withoutRoot = max(leftPair) + max(rightPair)
return [withRoot, withoutRoot]
return max(dfs(root))
- Path Sum
"""
Brute force is O(n^2) in the worst case if we start a DFS from each node of the tree.
Prefix Sum with HashMap does better:
Time: O(n) - each node is visited once.
Space: O(h + n) - h for the recursion stack, n for the prefix sum map.
"""
def pathSum(self, root: TreeNode, targetSum: int) -> int:
prefix_sum_count = defaultdict(int)
prefix_sum_count[0] = 1 # base case to handle exact target from root
def dfs(node, currSum):
if not node:
return 0
currSum += node.val
count = prefix_sum_count[currSum - targetSum] # number of valid paths ending here
prefix_sum_count[currSum] += 1
count += dfs(node.left, currSum)
count += dfs(node.right, currSum)
prefix_sum_count[currSum] -= 1 # backtrack
return count
return dfs(root, 0)
- Maximum Path Sum
"""
The path does not have to pass through the root.
greedy pruning - if a subtree gives a negative contribution, we treat it as 0.
Time - O(n) Space - O(h)
"""
def maxPathSum(self, root: TreeNode) -> int:
max_sum = float('-inf') # This will hold the final result
def dfs(node):
nonlocal max_sum
if not node:
return 0
# Recursively calculate max path sum from left and right
left_gain = max(dfs(node.left), 0) # Only add if > 0
right_gain = max(dfs(node.right), 0)
# Price of current path: may be the max sum at this node
current_path = node.val + left_gain + right_gain
# Update the global max if this path is better
max_sum = max(max_sum, current_path)
# Return max gain this node can contribute to its parent
return node.val + max(left_gain, right_gain)
dfs(root)
return max_sum
- Maximum Product of Splitted Binary Tree
"""
Time: O(N) - two DFS traversals.
Space: O(H) - recursion stack, where H is the height of the tree.
Maximize sum(T) * (totalSum - sum(T)) over every subtree T, then take the result modulo 10^9 + 7.
"""
def maxProduct(self, root: Optional[TreeNode]) -> int:
MOD = 10**9 + 7
self.total = 0
self.max_prod = 0
# First pass: get the total sum of all nodes
def get_total_sum(node):
if not node:
return 0
return node.val + get_total_sum(node.left) + get_total_sum(node.right)
self.total = get_total_sum(root)
# Second pass: for each subtree, calculate product
def dfs(node):
if not node:
return 0
left_sum = dfs(node.left)
right_sum = dfs(node.right)
sub_sum = node.val + left_sum + right_sum
# Consider removing the edge above this node
product = sub_sum * (self.total - sub_sum)
self.max_prod = max(self.max_prod, product)
return sub_sum
dfs(root)
return self.max_prod % MOD
- Insufficient Nodes in Root to Leaf Paths
"""
Greedy won't work when we start from the bottom. We retain a node only if some root-to-leaf path through it has total sum >= limit. That requires examining all root-to-leaf paths, so a node must be pruned, even if it has a high value, when every path through it falls below the limit.
"""
def sufficientSubset(self, root: TreeNode, limit: int) -> TreeNode:
def dfs(node, path_sum):
if not node:
return None
# If it's a leaf
if not node.left and not node.right:
return node if path_sum + node.val >= limit else None
# Post-order DFS: left and right first
node.left = dfs(node.left, path_sum + node.val)
node.right = dfs(node.right, path_sum + node.val)
# If both subtrees are pruned, prune this node
if not node.left and not node.right:
return None
return node
return dfs(root, 0)
- Distance between 2 Nodes
def lowestCommonAncestor(self, root: TreeNode, node1: TreeNode, node2: TreeNode) -> TreeNode:
"""
Recursively finds the lowest common ancestor (LCA) of node1 and node2 in the binary tree.
"""
if not root or root == node1 or root == node2:
return root
left_subtree = self.lowestCommonAncestor(root.left, node1, node2)
right_subtree = self.lowestCommonAncestor(root.right, node1, node2)
if left_subtree and right_subtree:
return root # current node is LCA if node1 and node2 are found in left and right
return left_subtree if left_subtree else right_subtree
def findDepthFromAncestor(self, ancestor: TreeNode, target: TreeNode, current_depth: int = 0) -> int:
"""
Finds the depth of the target node starting from the ancestor node.
Returns -1 if the target is not found in the subtree rooted at ancestor.
"""
if not ancestor:
return -1
if ancestor == target:
return current_depth
# Search left subtree
left_depth = self.findDepthFromAncestor(ancestor.left, target, current_depth + 1)
if left_depth != -1:
return left_depth
# Search right subtree
return self.findDepthFromAncestor(ancestor.right, target, current_depth + 1)
def distanceBetweenNodes(self, root: TreeNode, node1: TreeNode, node2: TreeNode) -> int:
"""
Computes the distance (number of edges) between node1 and node2 in a binary tree.
Time - O(n) and Space - O(h)
"""
# Step 1: Find LCA of the two nodes
lca = self.lowestCommonAncestor(root, node1, node2)
# Step 2: Find depth of each node from the LCA
depth1 = self.findDepthFromAncestor(lca, node1)
depth2 = self.findDepthFromAncestor(lca, node2)
# Step 3: Distance = depth1 + depth2
return depth1 + depth2
- Maximum Difference between Node and Ancestor
"""
Time - O(n) and Space - O(h)
"""
def maxAncestorDiff(self, root: TreeNode) -> int:
def dfs(node, current_min, current_max):
if not node:
return current_max - current_min
# Update min and max for current path
current_min = min(current_min, node.val)
current_max = max(current_max, node.val)
# Recurse on left and right
left_diff = dfs(node.left, current_min, current_max)
right_diff = dfs(node.right, current_min, current_max)
# Return the max diff seen in either subtree
return max(left_diff, right_diff)
return dfs(root, root.val, root.val)
- Diameter and Width of Tree
"""
The diameter is the length (number of edges) of the longest path between any two nodes in the tree. This path may or may not pass through the root.
Time - O(n) and Space - O(h)
"""
def diameterOfBinaryTree(self, root: TreeNode) -> int:
self.max_diameter = 0
def dfs(node):
if not node:
return 0
# Height of left and right subtrees
left = dfs(node.left)
right = dfs(node.right)
# Update max diameter
self.max_diameter = max(self.max_diameter, left + right)
# Height of current node
return 1 + max(left, right)
dfs(root)
return self.max_diameter
"""
The width is defined as the length between the leftmost and rightmost non-null nodes at any level (inclusive of nulls in between).
Time - O(n) and Space - O(n)
"""
def widthOfBinaryTree(self, root: TreeNode) -> int:
if not root:
return 0
max_width = 0
# Each element: (node, index)
queue = deque([(root, 0)])
while queue:
level_length = len(queue)
_, level_head_index = queue[0]
for _ in range(level_length):
node, idx = queue.popleft()
# Normalize index to prevent overflow
idx -= level_head_index
if node.left:
queue.append((node.left, 2 * idx))
if node.right:
queue.append((node.right, 2 * idx + 1))
current_width = idx + 1 # since idx is the last one after popping all
max_width = max(max_width, current_width)
return max_width
- In a balanced BST, operations like search, insert, and delete run in average and best-case O(log n) time; in the worst (unbalanced) case, these degrade to O(n). BSTs are the basis for self-balancing trees (e.g., AVL, Red-Black), which maintain the O(log n) guarantee in all cases.
"""
Search in BST
"""
def searchAllBST(root: Optional[TreeNode], target: int) -> List[TreeNode]:
"""
Returns a list of all TreeNode references with .val == target in a BST with duplicates.
"""
res: List[TreeNode] = []
def dfs(node: Optional[TreeNode]) -> None:
if not node:
return
if node.val == target:
res.append(node)
# Duplicates may be both left and right in some BSTs, so check both subtrees
dfs(node.left)
dfs(node.right)
elif target < node.val:
dfs(node.left)
else:
dfs(node.right)
dfs(root)
return res
- Inorder traversal is sorted in ascending order for a BST. Reverse inorder traversal (Right -> Root -> Left) gives descending order. Exploit this in >= or <= style problems on BSTs; see the sketch below.
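A minimal sketch of the reverse-inorder pattern with a running sum (the idea behind, e.g., converting a BST into a Greater Sum Tree); names are illustrative:
def reverse_inorder_running_sum(root):
    """Visit BST nodes in descending order while keeping a running sum,
    e.g., to replace each node's value with the sum of all values >= it."""
    running = 0

    def dfs(node):
        nonlocal running
        if not node:
            return
        dfs(node.right)       # larger values first
        running += node.val
        node.val = running
        dfs(node.left)

    dfs(root)
    return root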
- Balance BST
"""
O(n)
"""
def balanceBST(self, root: TreeNode) -> TreeNode:
# Step 1: Inorder traversal to get sorted node values
def inorder(node):
if not node:
return []
return inorder(node.left) + [node.val] + inorder(node.right)
sorted_vals = inorder(root)
# Step 2: Build balanced BST from sorted values
def build_balanced_tree(start, end):
if start > end:
return None
mid = (start + end) // 2
node = TreeNode(sorted_vals[mid])
node.left = build_balanced_tree(start, mid - 1)
node.right = build_balanced_tree(mid + 1, end)
return node
return build_balanced_tree(0, len(sorted_vals) - 1)
- Range Sum of BST
"""
Simply doing inorder is worse as we don't necessarily need to go through all the node values in BST. We can prune the search.
Time - O(n)
"""
def rangeSumBST(self, root: Optional[TreeNode], low: int, high: int) -> int:
if not root:
return 0
# If root value is greater than high, skip the right subtree
if root.val > high:
return self.rangeSumBST(root.left, low, high)
# If root value is less than low, skip the left subtree
if root.val < low:
return self.rangeSumBST(root.right, low, high)
# Root is within [low, high]; include it and recurse both sides
return (
root.val +
self.rangeSumBST(root.left, low, high) +
self.rangeSumBST(root.right, low, high)
)
- Insert and Delete Node in BST
def insertIntoBST(self, root: TreeNode, val: int) -> TreeNode:
# If the tree is empty, create a new node with the value and return it
if not root:
return TreeNode(val)
# If the value is less than the current node, go to the left subtree
if val < root.val:
root.left = self.insertIntoBST(root.left, val)
else:
# If the value is greater than or equal to the current node, go to the right subtree
root.right = self.insertIntoBST(root.right, val)
# Return the root as the tree structure is updated bottom-up
return root
def deleteNode(self, root: Optional[TreeNode], key: int) -> Optional[TreeNode]:
if not root:
return root
if key > root.val:
root.right = self.deleteNode(root.right, key)
elif key < root.val:
root.left = self.deleteNode(root.left, key)
else:
if not root.left:
return root.right
elif not root.right:
return root.left
cur = root.right
while cur.left:
cur = cur.left
root.val = cur.val
root.right = self.deleteNode(root.right, root.val)
return root
- Minimum Absolute Difference in BST
"""
The minimum difference must occur between consecutive nodes in the inorder traversal as the inorder traversal will be sorted.
We need to keep track of previous node.
"""
def getMinimumDifference(self, root: TreeNode) -> int:
# Initialize minimum difference and previous node value
self.min_diff = float('inf')
self.prev_node = None
def inorder(node):
if not node:
return
# Traverse the left subtree
inorder(node.left)
# Process current node: compute difference with previous node
if self.prev_node is not None:
diff = abs(node.val - self.prev_node)
self.min_diff = min(self.min_diff, diff)
# Update previous node value
self.prev_node = node.val
# Traverse the right subtree
inorder(node.right)
inorder(root)
return self.min_diff
- Serialize and Deserialize Binary Tree
"""
Preorder traversal using DFS to get a non-ambiguous string (no need for a separate array). Decoding
the string should be feasible.
O(n) for both serializing and deserializing
"""
# Encodes a tree to a single string.
def serialize(self, root: Optional[TreeNode]) -> str:
res = []
def dfs(node):
if not node:
res.append("N")
return
res.append(str(node.val))
dfs(node.left)
dfs(node.right)
dfs(root)
return ",".join(res)
# Decodes your encoded data to tree.
def deserialize(self, data: str) -> Optional[TreeNode]:
vals = data.split(",")  # comma is the delimiter
self.i = 0
def dfs():
if vals[self.i] == "N":  # empty node
self.i += 1
return None
node = TreeNode(int(vals[self.i]))
self.i += 1
node.left = dfs()
node.right = dfs()
return node
return dfs()
- Queue Based N-ary Tree Level Order Traversal
"""
Time - O(n) and Space - O(w), w is max width of tree
"""
def levelOrder(self, root: 'Node') -> List[List[int]]:
if not root:
return []
result = []
queue = deque([root])
while queue:
level_size = len(queue)
level_nodes = []
for _ in range(level_size):
node = queue.popleft()
level_nodes.append(node.val)
for child in node.children:
queue.append(child)
result.append(level_nodes)
return result
Heap
- Top K frequent elements
"""
Better than MaxHeap solution is
Bucket Sort O(n) Space and Time Complexity
It works because the frequency of any element is bounded: at least 1 (the element appears only once) and at most n, the size of the array (the whole array is one repeated element). So a bucket array of size n + 1 indexed by frequency is enough.
"""
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
count = {}
freq = [[] for i in range(len(nums) + 1)]
for num in nums:
count[num] = 1 + count.get(num, 0)
for num, cnt in count.items():
freq[cnt].append(num)
res = []
for i in range(len(freq) - 1, 0, -1):
for num in freq[i]:
res.append(num)
if len(res) == k:
return res
- Find K Pairs with Smallest Sums
"""
Two Pointers + Heap
For merging two sorted lists, two pointers are enough: move the pointers according to a simple comparison.
Since both nums1 and nums2 are sorted, the smallest pairs (by sum) "radiate out" from the (0,0) pair.
But to find the k pairs with smallest sums, you can't just systematically generate (0,0), (1,0), (0,1), (2,0), (1,1), etc., with pure two-pointer logic. Why not? Because at each step there could be multiple "next" smallest pairs that you haven't yet visited, and their indices can overlap in complex ways.
For example :
nums1 = [1, 7, 11]
nums2 = [2, 4, 6]
(1,2) sum=3
(1,4) sum=5
(7,2) sum=9 <-- this one is the problem
(1,6) sum=7
(7,4) sum=11
(11,2) sum=13
...
At any step there can be many candidate pairs waiting to be considered, not just the "next in row" or "next in column".
The sorted structure of nums1 and nums2 means:
Row 0: [nums1[0]+nums2[0], nums1[0]+nums2[1], nums1[0]+nums2[2], ...] -> increasing in j
Row 1: [nums1[1]+nums2[0], nums1[1]+nums2[1], nums1[1]+nums2[2], ...] -> increasing in j
But sums across rows are not in row/col traversal order.
A heap ensures global ordering of the next smallest pair; two pointers in a fixed pattern only give local ordering.
The min-heap is used here because:
* It always gives you the next smallest sum pair.
* It lets you explore "next candidates" in the correct order, potentially expanding along either direction.
* It allows duplicate sums or equal value pairs naturally.
But, with min-heap you cannot keep the size of heap in check. It will be O(mn). To restrict the heap size to at most k while building it for the k smallest sum pairs problem, you need to keep the heap as a max-heap of size k so that you can discard pairs with sums larger than the current largest in the heap.
By maintaining a max-heap of size at most k, you only keep the k pairs with smallest sums so far. When a new pair (u, v) has sum smaller than the largest sum in the heap (the max in the heap), it replaces the largest one. This way, the heap never grows beyond size k, keeping space complexity at O(k). Time complexity for pushing all pairs remains O(nm * log k), which is better than full sort (O(n*m * log (n*m))) when k is much smaller than nm.
j=0 j=1 j=2
i=0 3 5 7
i=1 9 11 13
i=2 13 15 17
we are checking right (i, j+1) and down (i+1, j)!
"""
import heapq
def kSmallestPairs(nums1, nums2, k):
if not nums1 or not nums2 or k == 0:
return []
max_heap = [] # will store (-sum, u, v)
for u in nums1:
for v in nums2:
current_sum = u + v
if len(max_heap) < k:
# Push negative sum to simulate max heap
heapq.heappush(max_heap, (-current_sum, u, v))
else:
# If current sum is smaller than the largest sum in max_heap (which is -max_heap[0][0])
if current_sum < -max_heap[0][0]:
# Pop the largest and add the smaller current pair
heapq.heappop(max_heap)
heapq.heappush(max_heap, (-current_sum, u, v))
else:
# Current sum is not smaller, no need to proceed further in this inner loop since nums2 is sorted
# You could add break here if you want to optimize for sorted arrays
break
# Convert max_heap to result list, reverse negative sums back
result = [[u, v] for (_, u, v) in max_heap]
return result
- Task Scheduler
import heapq
from collections import Counter, deque
def leastInterval(tasks, n):
"""
Greedy: always schedule the most frequent available task to minimize idle time. A Priority Queue (Max-Heap) gives efficient retrieval of the next best task, and a Cooldown Queue manages tasks that are waiting to be scheduled again.
Args:
tasks: List[str] - List of tasks (e.g., ['A', 'A', 'A', 'B', 'B', 'B'])
n: int - cooldown period between same tasks
Returns:
int - least number of units of time to finish all tasks
Counting tasks: O(N), where N = number of tasks. Each task is pushed and popped from the heap at most once: O(N log K), where K is the number of unique tasks (K <= 26). Each task is added and removed from the cooldown queue at most once: O(N). Since K is at most 26, O(log K) is O(1), so the overall time is O(N).
"""
# Step 1: Count frequency of each task
task_counts = Counter(tasks)
# Step 2: Use a max-heap (invert counts for heapq which is min-heap by default)
max_heap = [-cnt for cnt in task_counts.values()]
heapq.heapify(max_heap)
# Step 3: Queue to keep tasks in cooldown (task, available_time)
cooldown = deque() # (ready_time, count)
time = 0 # Current time
while max_heap or cooldown:
time += 1
if max_heap:
# Pop the most frequent task
cnt = heapq.heappop(max_heap)
cnt += 1 # Decrease count (since stored as negative)
if cnt != 0:
# If there are more of this task, add to cooldown
cooldown.append((time + n, cnt))
# If any task's cooldown has expired, push back to heap
if cooldown and cooldown[0][0] == time:
_, ready_cnt = cooldown.popleft()
heapq.heappush(max_heap, ready_cnt)
return time
- Find Median from Data Stream
"""
Mean is the average of all values: add up all the numbers and divide by how many there are. Median is the middle value when the numbers are arranged in order. If there's an even number of values, it's the average of the two middle numbers. Mode is the value that appears most frequently in the data set.
We don't want to re-sort the array every time a new element is added, so we could keep elements ordered in a priority queue; but digging out the middle elements of a single heap costs roughly O(k log n) pops. Instead we use a max-heap for the smaller half and a min-heap for the larger half, so the middle value(s) sit at the heap tops and the median is read in O(1). With an odd count of elements, the extra one lives in one designated heap (the max-heap here).
"""
import heapq
class MedianFinder:
def __init__(self):
# Max-heap for the smaller half (store negatives to simulate max-heap)
self.left = []
# Min-heap for the larger half
self.right = []
def addNum(self, num: int) -> None:
"""
Adds a number into the data structure.
"""
# Step 1: Add to max-heap (left)
heapq.heappush(self.left, -num)
# Step 2: Balance: Move the largest from left to right (if necessary)
# This ensures all elements in left <= all elements in right
if self.left and self.right and (-self.left[0] > self.right[0]):
val = -heapq.heappop(self.left)
heapq.heappush(self.right, val)
# Step 3: Maintain size property (left can have at most 1 more element than right)
if len(self.left) > len(self.right) + 1:
val = -heapq.heappop(self.left)
heapq.heappush(self.right, val)
elif len(self.right) > len(self.left):
val = heapq.heappop(self.right)
heapq.heappush(self.left, -val)
def findMedian(self) -> float:
"""
Returns the median of all elements so far.
"""
if len(self.left) > len(self.right):
# Odd number of elements: median is the top of max-heap
return -self.left[0]
else:
# Even number of elements: median is the average of two middles
return (-self.left[0] + self.right[0]) / 2
Trie/Prefix Tree
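Nothing noted for this one yet, so here is a minimal insert/search/startsWith sketch (a dict-of-children Trie; class and method names follow the usual LeetCode "Implement Trie" signature):
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks the end of an inserted word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def _walk(self, s: str):
        # Follow the characters of s; return the final node or None if we fall off
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word: str) -> bool:         # is the full word present?
        node = self._walk(word)
        return node is not None and node.is_end

    def startsWith(self, prefix: str) -> bool:   # is any word with this prefix present?
        return self._walk(prefix) is not None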
Graphs
"""
BFS naturally processes nodes in increasing "distance" (layers) from the starting node/root. Want the minimum/smallest "number of moves" to reach something? BFS is ideal if all moves have the same cost.
"""
def bfs(graph, start):
"""Breadth-First Search
Time Complexity - O(n) + O(2e)
Space Complexity - O(n)
"""
visited = set()
queue = deque([start])
visited.add(start)
result = []
while queue:
vertex = queue.popleft()
result.append(vertex)
for neighbor in graph[vertex]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
return result
"""
When all ways must be explored, or the visitor order matters.
"""
def dfs(graph, start, visited=None, result=None):
if visited is None:
visited = set()
if result is None:
result = []
visited.add(start)
result.append(start)
for neighbor in graph[start]:
if neighbor not in visited:
dfs(graph, neighbor, visited, result)
return result
- Cycle Detection
def undirected_has_cycle_dfs(graph):
"""
Each DFS call tracks the parent to avoid falsely detecting the immediate back edge as a cycle.
The outer loop ensures all connected components are checked.
Time Complexity - O(n + 2e)
Space Complexity - O(n)
"""
visited = set()
def dfs(node, parent):
visited.add(node)
for neighbor in graph.get(node, []):
if neighbor not in visited:
if dfs(neighbor, node):
return True
elif neighbor != parent:
# Found a cycle
return True # no need for more recursion after cycle is already detected
return False
# Handle disconnected components
for node in graph:
if node not in visited: # one node per component
if dfs(node, None):
return True
return False
from collections import deque
def undirected_has_cycle_bfs(graph):
"""
Each BFS node tracks its parent to avoid confusing the edge back to the parent as a cycle.
The outer loop ensures all components are checked, even if the graph is disconnected.
"""
visited = set()
for start in graph:
if start not in visited:
queue = deque([(start, None)]) # (current_node, parent)
visited.add(start)
while queue:
node, parent = queue.popleft()
for neighbor in graph.get(node, []):
if neighbor not in visited:
visited.add(neighbor)
queue.append((neighbor, node))
elif neighbor != parent:
# Found a back edge (cycle)
return True
return False
from collections import defaultdict
class Solution:
def detectCycleInDirectedGraph(self, V: int, adj: list[list[int]]) -> bool:
"""
Detects if a directed graph contains a cycle using DFS.
Args:
V: The number of vertices in the graph.
adj: An adjacency list representation of the graph.
adj[i] contains a list of neighbors of vertex i.
Returns:
True if a cycle is detected, False otherwise.
"""
# visited array: This array keeps track of all nodes that have been part of any DFS traversal at some point. Once a node is marked visited, it means we've explored it from at least one path. This prevents redundant processing of nodes and helps optimize the traversal.
# visited[i] = True if node i has been visited in any DFS traversal.
visited = [False] * V
# path_visited array: This is the key for cycle detection in a directed graph. It keeps track of all nodes that are currently in the current recursion stack of the DFS traversal. If we encounter a node that is already in our path_visited array, it means we've found a back edge, which indicates a cycle. Think of it as: "I'm currently exploring this path, and I just tried to go to a node I've already visited on this exact path.
# path_visited[i] = True if node i is currently in the recursion stack.
path_visited = [False] * V
# The DFS helper function
def dfs(node: int) -> bool:
"""
Performs DFS starting from 'node' to detect cycles.
Args:
node: The current node being visited.
Returns:
True if a cycle is detected starting from this node's path, False otherwise.
"""
visited[node] = True # Mark current node as globally visited
path_visited[node] = True # Mark current node as part of the current path
# Explore all neighbors of the current node
for neighbor in adj[node]:
if not visited[neighbor]:
# If neighbor is not visited, recursively call DFS
if dfs(neighbor):
# If a cycle is detected in the subtree, propagate True upwards
return True
elif path_visited[neighbor]:
# If neighbor is already in the current recursion stack (path_visited is True),
# it means we found a back edge, hence a cycle.
return True
# Backtrack: Remove the current node from the current path
# as we are done exploring all paths from this node.
path_visited[node] = False
return False # No cycle found from this node's path
# Iterate through all nodes to handle disconnected components
for i in range(V):
if not visited[i]:
# If a node hasn't been visited, start a new DFS traversal
if dfs(i):
# If DFS from this node finds a cycle, return True immediately
return True
# No cycle found after checking all components
return False
- Topological Sort
"""
For each vertex, we iterate through its outgoing edges. Thus, every vertex and every edge in the graph is processed exactly once. This leads to O(V+E) for the DFS part. adj (Adjacency List): O(V+E) to store the graph connections, visited_state array: O(V) space, result list: O(V) space to store the topological order, Recursion Stack: In the worst-case scenario (a linear graph, e.g., 0 -> 1 -> 2 -> ... -> V-1), the depth of the recursion stack can go up to O(V).
0 (UNVISITED): The node has not been encountered at all by any DFS traversal.
1 (VISITING / IN_STACK): The node has been visited, and it is currently in the active recursion stack of the current DFS path. This means we are currently exploring its descendants.
2 (VISITED / PROCESSED): The node has been completely processed. Its DFS traversal (and all of its reachable descendants) has finished, and it has been added to the topological sort result.
"""
def _dfs(self, node: int):
"""
Recursive DFS helper function for topological sort and cycle detection.
"""
if self.has_cycle:
# If a cycle has already been detected, stop further processing
return
# Mark the current node as VISITING (in recursion stack)
self.visited_state[node] = 1
# Explore all neighbors of the current node
for neighbor in self.adj[node]:
if self.visited_state[neighbor] == 0:
# If neighbor is UNVISITED, recursively call DFS on it
self._dfs(neighbor)
if self.has_cycle: # Propagate cycle detection upwards
return
elif self.visited_state[neighbor] == 1:
# If neighbor is VISITING, it means we found a back edge,
# hence a cycle exists in the graph.
self.has_cycle = True
return
# After visiting all neighbors and ensuring no cycle, mark current node as VISITED
self.visited_state[node] = 2
# Add the current node to the result list.
# Nodes are added when all their descendants (dependencies) have been processed.
# This naturally builds the list in reverse topological order.
self.result.append(node)
def topological_sort(self) -> list[int]:
"""
Performs topological sort on the graph.
Returns:
A list representing a valid topological order if one exists.
An empty list if a cycle is detected (no valid order).
"""
self.result = [] # Reset result for new sort attempts
self.has_cycle = False
self.visited_state = [0] * self.num_nodes # Reset visited states
# Iterate through all nodes to handle disconnected components in the graph
for i in range(self.num_nodes):
if self.visited_state[i] == 0: # If node is unvisited, start a new DFS
self._dfs(i)
if self.has_cycle:
# If a cycle is detected during any DFS call, return empty list
return []
# If no cycle was found, reverse the result list to get the correct topological order.
# The DFS appends nodes after all their descendants are processed, so the
# sink nodes are added first. Reversing puts source nodes first.
return self.result[::-1]
- Dijkstra
def dijkstra(graph, start):
"""Dijkstra's shortest path algorithm"""
distances = {vertex: float('infinity') for vertex in graph}
distances[start] = 0
pq = [(0, start)]
visited = set()
while pq:
current_distance, current_vertex = heapq.heappop(pq)
if current_vertex in visited:
continue
visited.add(current_vertex)
for neighbor, weight in graph[current_vertex].items():
distance = current_distance + weight
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(pq, (distance, neighbor))
return distances
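The sketch above assumes the graph is a dict of dicts ({vertex: {neighbor: weight}}); a hypothetical usage example, with overall complexity O((V + E) log V) thanks to the heap:
graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'C': 2, 'D': 5},
    'C': {'D': 1},
    'D': {},
}
print(dijkstra(graph, 'A'))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}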
- Course Schedule
- All Nodes Distance K in Binary Tree
- Word Ladder (Graph Transformation)
- Union-Find (Disjoint Set Union) - a minimal sketch follows below
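None of the four topics above have notes yet; since Union-Find shows up in several of them (cycle detection, connectivity), here is a minimal sketch with path compression and union by rank:
class DSU:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path compression: point nodes along the way closer to the root
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same set (useful for cycle detection)
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True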
Matrix
- Number of Islands
def numIslands(self, grid: List[List[str]]) -> int:
if not grid:
return 0
self.rows, self.cols = len(grid), len(grid[0])
self.visited = set()
self.directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
self.grid = grid # store grid as instance variable
islands = 0
for r in range(self.rows):
for c in range(self.cols):
if self.grid[r][c] == "1" and (r, c) not in self.visited:
self.bfs(r, c)
islands += 1
return islands
def bfs(self, r, c):
q = deque()
q.append((r, c))
self.visited.add((r, c))
while q:
row, col = q.popleft()
for dr, dc in self.directions:
nr, nc = row + dr, col + dc
if (0 <= nr < self.rows and 0 <= nc < self.cols and
self.grid[nr][nc] == "1" and (nr, nc) not in self.visited):
q.append((nr, nc))
self.visited.add((nr, nc))
- Matrix Traversal - Rat in a Maze
"""
Explore all the paths - DFS, but if shortest path is required then BFS
"""
def findPath(self, maze: List[List[int]], n: int) -> List[str]:
if maze[0][0] == 0 or maze[n - 1][n - 1] == 0:
return [] # No path if start or end is blocked
result = []
visited = [[False for _ in range(n)] for _ in range(n)]
directions = [('D', 1, 0), ('L', 0, -1), ('R', 0, 1), ('U', -1, 0)]
def is_safe(row, col):
return 0 <= row < n and 0 <= col < n and maze[row][col] == 1 and not visited[row][col]
def dfs(row, col, path):
if row == n - 1 and col == n - 1:
result.append(path)
return
# Mark current cell as visited
visited[row][col] = True
for move, dx, dy in directions:
new_row, new_col = row + dx, col + dy
if is_safe(new_row, new_col):
dfs(new_row, new_col, path + move)
# Backtrack: unmark current cell
visited[row][col] = False
dfs(0, 0, "")
return sorted(result) # Lexicographically sorted paths
- Minimum square to fit N rectangles
"""
For any candidate side length X, you can check in O(1) whether all n rectangles fit by seeing how many fit along the width and along the height:
max_rects = floor(X / w) * floor(X / h)
As X increases, max_rects also increases (monotonically).
The property: "can fit at least n rectangles" transitions from False (when X is too small) to True as X increases.
"""
# Function to check if side of square X
# can pack all the N rectangles or not
def bound(w, h, N, x):
# Find the number of rectangle
# it can pack
val = (x//w)*(x//h)
# If val is atleast N,
# then return true
if(val >= N):
return True
# Otherwise, return false
else:
return False
# Function to find the size of the
# smallest square that can contain
# N rectangles of dimensions W * H
def FindSquare(N, W, H):
# Stores the lower bound
left = 1
# Stores the upper bound
right = (W * H) * N
# Iterate until left is less than right
while(left < right):
# Calculate the mid value
mid = left + (right - left)//2
# If the current size of square can contain N rectangles
if(bound(W, H, N, mid)):
right = mid
# Otherwise, update left
else:
left = mid + 1
# Return the minimum size of the square required
return right
- Search in Sorted Matrix
import bisect
"""
For using bisect you will have to flatten the matrix which will make the time complexity O(mn). Directly applying Binary Search on Matrix is better to get O(log(m*n)) time complexity.
"""
def searchMatrix(matrix, target):
if not matrix or not matrix[0]:
return False
rows, columns = len(matrix), len(matrix[0])
# Binary Search on Matrix
left, right = 0, rows * columns - 1
while left <= right:
mid = left + (right - left) // 2
row, column = mid // columns, mid % columns
mid_value = matrix[row][column]
if mid_value == target:
return True
elif mid_value < target:
left = mid + 1
else:
right = mid - 1
return False
- Rotten Oranges
"""
Multi-source BFS - O(mn) Space and Time complexity
"""
from collections import deque
def orangesRotting(grid):
# no oranges to process
if not grid:
return -1
rows, cols = len(grid), len(grid[0])
queue = deque() # Queue for BFS storing positions of rotten oranges and the time when they became rotten
fresh = 0 # number of fresh oranges
# Step 1: Collect all rotten oranges and count fresh ones
for r in range(rows):
for c in range(cols):
# Append rotten orange location with time = 0 (start)
if grid[r][c] == 2:
queue.append((r, c, 0)) # (row, col, time)
elif grid[r][c] == 1:
fresh += 1 # count the fresh ones
# If there are no fresh oranges from the start, no time needed to rot anything
if fresh == 0:
return 0
# Step 2: BFS traversal (4-directional)
directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]
minutes = 0 # time elapsed as rot spread
while queue:
r, c, minutes = queue.popleft() # Current rotten orange and time of rot
# Check all 4 adjacent neighbors
for dr, dc in directions:
nr, nc = r + dr, c + dc
# If the neighbor is a fresh orange, rot it
if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
grid[nr][nc] = 2 # Mark neighbor as rotten
fresh -= 1 # One fewer fresh orange left
queue.append((nr, nc, minutes + 1)) # Add newly rotten orange to queue with incremented time
# If no fresh oranges left, return number of minutes elapsed to rot all
# Otherwise, return -1 indicating some fresh oranges cannot rot
return minutes if fresh == 0 else -1
- 0/1 Matrix
from collections import deque
class Solution:
def updateMatrix(self, mat: list[list[int]]) -> list[list[int]]:
"""
Calculates the distance of the nearest 0 for each cell in a binary matrix.
Args:
mat: A 2D list of integers representing the matrix, where 0 represents
an empty cell and 1 represents a filled cell.
Returns:
A 2D list of integers where each cell contains the distance to the
nearest 0.
"""
m, n = len(mat), len(mat[0])
# Initialize the result matrix 'dist' with -1 for cells that are not 0.
# Cells that are 0 will have a distance of 0.
dist = [[-1] * n for _ in range(m)]
# Initialize a queue for Breadth-First Search (BFS).
# We will start BFS from all cells that contain 0.
q = deque()
for r in range(m):
for c in range(n):
if mat[r][c] == 0:
dist[r][c] = 0 # Distance to itself is 0
q.append((r, c)) # Add 0-cells to the queue
# Define the possible directions for movement (up, down, left, right)
directions = [(0, 1), (0, -1), (1, 0), (-1, 0)]
# Perform BFS
while q:
row, col = q.popleft()
# Explore all 4 neighboring cells
for dr, dc in directions:
new_row, new_col = row + dr, col + dc
# Check if the new cell is within the matrix bounds
if 0 <= new_row < m and 0 <= new_col < n:
# If the cell has not been visited yet (distance is -1)
if dist[new_row][new_col] == -1:
# The distance to this cell is 1 more than the current cell's distance
dist[new_row][new_col] = dist[row][col] + 1
q.append((new_row, new_col)) # Add to queue for further exploration
return dist
- Tic Tac Toe (Winning Conditions and Board State)
class TicTacToe:
def __init__(self, n: int):
self.n = n
self.rows = [0] * n
self.cols = [0] * n
self.diag = 0
self.anti_diag = 0
"""
Time - O(1) per move
Space - O(N)
"""
def move(self, row: int, col: int, player: int) -> int:
value = 1 if player == 1 else -1
self.rows[row] += value
self.cols[col] += value
if row == col:
self.diag += value
if row + col == self.n - 1:
self.anti_diag += value
if (abs(self.rows[row]) == self.n or
abs(self.cols[col]) == self.n or
abs(self.diag) == self.n or
abs(self.anti_diag) == self.n):
return player
return 0 # No winner yet
- Shortest path in matrix with Obstacles
"""
BFS is preferred in grid traversals as it allows us to visit all possible neighbours at once. All the cells at a certain level from the start are pushed into the queue in a single iteration (imagine moving radially outward). So if the endpoint exists at multiple levels, the first return happens at the minimum possible level. DFS, by the same reasoning, is complicated here: we would need to backtrack, keep track of visited cells (not required the same way in BFS), record path lengths, and compare them at the end. It doesn't guarantee the minimum path length unless all paths are explored, and it is prone to TLE if not heavily pruned.
"""
"""
Time: O(m * n * (k+1)) - Each cell can be visited multiple times with different k values (like (i, j, 0)...(i, j, k))
Space: O(m * n * (k+1)) - Due to the size of visited and queue
"""
from collections import deque
class Solution:
def shortestPath(self, grid, k):
m, n = len(grid), len(grid[0])
# Early exit: if k is large enough to walk straight ignoring obstacles
if k >= m + n - 2:
return m + n - 2
# Each state: (row, col, remaining k)
queue = deque([(0, 0, k)])
visited = set([(0, 0, k)])
steps = 0
directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]
while queue:
for _ in range(len(queue)):
x, y, remaining = queue.popleft()
# Reached destination
if x == m - 1 and y == n - 1:
return steps
for dx, dy in directions:
nx, ny = x + dx, y + dy
if 0 <= nx < m and 0 <= ny < n:
nk = remaining - grid[nx][ny] # decrease k if obstacle
state = (nx, ny, nk)
if nk >= 0 and state not in visited:
visited.add(state)
queue.append(state)
steps += 1
return -1 # Not reachable
- Sudoku solving (Backtracking with constraints)
"""
Time - O(9^(n)), where n is the number of empty cells. Each cell has at most 9 choices. In practice, backtracking prunes invalid branches, making it much faster.
Space - O(1), since the board is modified in place. The recursion depth is at most O(81).
Row/Col/Box lookup - O(1) using sets
Sorting Empty Cells (MRV) - O(k log k), where k is the number of empty cells
Backtracking Calls - Much fewer than O(9^k) due to MRV
"""
from typing import List
def solveSudoku(board: List[List[str]]) -> None:
row_sets = [set() for _ in range(9)]
col_sets = [set() for _ in range(9)]
box_sets = [[set() for _ in range(3)] for _ in range(3)]
empty_cells = []
# Initialize constraints and collect empty cells
for i in range(9):
for j in range(9):
if board[i][j] != ".":
num = board[i][j]
row_sets[i].add(num)
col_sets[j].add(num)
box_sets[i // 3][j // 3].add(num)
else:
empty_cells.append((i, j))
# Sort empty cells by MRV (least available options first)
def count_options(cell):
i, j = cell
return sum(1 for num in "123456789" if num not in row_sets[i] and num not in col_sets[j] and num not in box_sets[i // 3][j // 3])
empty_cells.sort(key=count_options)
def backtrack(index):
if index == len(empty_cells):
return True # Solved
i, j = empty_cells[index]
for num in "123456789":
if num not in row_sets[i] and num not in col_sets[j] and num not in box_sets[i // 3][j // 3]:
# Place the number
board[i][j] = num
row_sets[i].add(num)
col_sets[j].add(num)
box_sets[i // 3][j // 3].add(num)
if backtrack(index + 1):
return True # Continue solving
# Undo placement (backtrack)
board[i][j] = "."
row_sets[i].remove(num)
col_sets[j].remove(num)
box_sets[i // 3][j // 3].remove(num)
return False # No valid number found
backtrack(0)
- N-queens (Backtracking with column AND diagonal tracking)
"""
Worst case O(N!) time complexity, because for each row we can try all N columns (but backtracking prunes a lot of branches).
"""
def solveNQueens(self, n: int) -> List[List[str]]:
def backtrack(row: int):
if row == n:
# Found a valid board configuration
board = ["".join(r) for r in current_board]
results.append(board)
return
for col in range(n):
if col in cols or (row + col) in pos_diagonals or (row - col) in neg_diagonals:
continue # Conflict: skip this position
# Place queen
cols.add(col)
pos_diagonals.add(row + col)
neg_diagonals.add(row - col)
current_board[row][col] = 'Q'
backtrack(row + 1)
# Remove queen (backtrack)
cols.remove(col)
pos_diagonals.remove(row + col)
neg_diagonals.remove(row - col)
current_board[row][col] = '.'
results = []
cols = set() # columns where queens are placed
pos_diagonals = set() # r + c diagonals (anti-diagonals)
neg_diagonals = set() # r - c diagonals (main diagonals)
current_board = [["."] * n for _ in range(n)]
backtrack(0)
return results
- Number of Distinct Islands
class Solution:
def numDistinctIslands(self, grid: list[list[int]]) -> int:
m, n = len(grid), len(grid[0])
# A set to store the unique shapes of islands.
# We'll store shapes as frozensets of relative coordinates (tuples).
distinct_islands = set()
# 8 Directions for DFS:
# (dr, dc)
# Horizontal: (0, 1), (0, -1)
# Vertical: (1, 0), (-1, 0)
# Diagonal: (1, 1), (1, -1), (-1, 1), (-1, -1)
# directions = [
# (0, 1), (0, -1), # Right, Left
# (1, 0), (-1, 0), # Down, Up
# (1, 1), (1, -1), # Down-Right, Down-Left
# (-1, 1), (-1, -1) # Up-Right, Up-Left
# ]
# Generate 8 directions using nested loops
# dr will go from -1, 0, 1 (for up, current row, down)
# dc will go from -1, 0, 1 (for left, current col, right)
# We exclude (0, 0) as that's the current cell itself.
directions = []
for dr in range(-1, 2):
for dc in range(-1, 2):
if dr == 0 and dc == 0:
continue # Skip the current cell itself
directions.append((dr, dc))
# DFS helper function to explore an island and record its shape
def dfs(r: int, c: int, island_shape: list, row_offset: int, col_offset: int):
# Mark the current cell as visited by changing it from 1 to 0
grid[r][c] = 0 # Change land to water (or a distinct visited marker)
# Add the relative coordinates of the current cell to the island_shape list
# Normalized relative to the top-leftmost point of the island.
island_shape.append((r - row_offset, c - col_offset))
# Explore neighbors in all 8 directions
for dr, dc in directions:
nr, nc = r + dr, c + dc
# Check bounds and if it's an unvisited land cell
if 0 <= nr < m and 0 <= nc < n and grid[nr][nc] == 1:
dfs(nr, nc, island_shape, row_offset, col_offset)
# Iterate through the grid to find islands
for r in range(m):
for c in range(n):
if grid[r][c] == 1: # Check if it's land
# Found a new island, start DFS from here
current_island_shape = []
# Pass the starting coordinates as offsets to normalize future coordinates
dfs(r, c, current_island_shape, r, c)
# Add the unique shape (as a frozenset for immutability) to our set
# Using frozenset handles cases where the points might be visited
# in different orders but represent the same shape.
# If maintaining a specific order within the shape is crucial (e.g., for
# a path-based signature), then sorting current_island_shape and then
# converting to a tuple would be more appropriate:
# current_island_shape.sort()
# distinct_islands.add(tuple(current_island_shape))
distinct_islands.add(frozenset(current_island_shape))
return len(distinct_islands)
Dynamic Programming
- Palindromic Substrings
"""
Expand Around Center:
A palindrome can be centered around a single character (odd length, e.g., "aba") or between two characters (even length, e.g., "abba").
The "expand around center" approach iterates through every possible center of a palindrome and then expands outwards to find all palindromes centered at that point.
Odd Length Palindromes - Each character in the string can be the center of an odd-length palindrome. We can start with l = r = i (where i is the current character's index) and expand outwards.
Even Length Palindromes - Each pair of adjacent characters can be the center of an even-length palindrome. We can start with l = i and r = i + 1 and expand outwards.
By doing this for every possible center, we ensure that all palindromic substrings are counted exactly once.
"""
class Solution:
def countSubstrings(self, s: str) -> int:
"""
Counts the total number of palindromic substrings in a given string.
Args:
s: The input string.
Returns:
The total count of palindromic substrings.
"""
# Initialize the total count of palindromic substrings found.
total_palindromes = 0
n = len(s) # Get the length of the string for boundary checks
# Iterate through each possible center of a palindrome.
# A single character can be the center of an odd-length palindrome (i, i).
# The space between two characters can be the center of an even-length palindrome (i, i + 1).
for i in range(n):
# Case 1: Odd-length palindromes (center is a single character)
# Start expanding from s[i] as the center.
total_palindromes += self._count_palindromes_from_center(s, i, i)
# Case 2: Even-length palindromes (center is between two characters)
# Start expanding from s[i] and s[i+1] as the center.
total_palindromes += self._count_palindromes_from_center(s, i, i + 1)
return total_palindromes
def _count_palindromes_from_center(self, s: str, l: int, r: int) -> int:
"""
Helper function to count palindromes by expanding outwards from a given center.
Args:
s: The input string.
l: The left pointer, starting at the left side of the potential center.
r: The right pointer, starting at the right side of the potential center.
Returns:
The number of palindromes found by expanding from this center.
"""
# Initialize count for palindromes found from this specific center.
current_center_palindromes = 0
n = len(s) # Get the length of the string for boundary checks
# Expand outwards as long as:
# 1. The left pointer 'l' is within bounds (>= 0).
# 2. The right pointer 'r' is within bounds (< string length).
# 3. The characters at 'l' and 'r' are equal (maintaining palindrome property).
while l >= 0 and r < n and s[l] == s[r]:
# If the characters match, it means we've found a valid palindrome.
# E.g., for "aba" with center i=1: l=1, r=1 matches ("b"), count=1.
# Then l=0, r=2, s[0]==s[2] (a==a), count=2 for "aba".
current_center_palindromes += 1
# Move pointers outwards for the next check.
l -= 1 # Move left pointer to the left
r += 1 # Move right pointer to the right
return current_center_palindromes
- Decode Ways
"""
Time: O(N)
Space: O(N) due to recursion stack + memoization
"""
def numDecodings(self, s: str) -> int:
n = len(s)
@lru_cache(None)
def recurse(start: int) -> int:
if start >= n:
return 1
one, two = 0, 0
# Take one digit if it's valid (i.e., not '0')
if "1" <= s[start:start+1] <= "9":
one = recurse(start + 1)
# Take two digits if they form a valid number from 10 to 26
if start + 2 <= n and "10" <= s[start:start+2] <= "26":
two = recurse(start + 2)
return one + two
return recurse(0)
- Edit Distance
def edit_distance(word1: str, word2: str) -> int:
"""Minimum operations to convert word1 to word2"""
m, n = len(word1), len(word2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(m + 1):
dp[i][0] = i
for j in range(n + 1):
dp[0][j] = j
for i in range(1, m + 1):
for j in range(1, n + 1):
if word1[i-1] == word2[j-1]:
dp[i][j] = dp[i-1][j-1]
else:
dp[i][j] = min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) + 1
return dp[m][n]
- Knapsack Problem
def knapsack_01(values, weights, capacity):
"""0/1 Knapsack Problem"""
n = len(values)
dp = [[0] * (capacity + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
for w in range(capacity + 1):
if weights[i-1] <= w:
dp[i][w] = max(values[i-1] + dp[i-1][w-weights[i-1]], dp[i-1][w])
else:
dp[i][w] = dp[i-1][w]
return dp[n][capacity]
- Longest Common Subsequence and Longest Increasing Subsequence
def longest_common_subsequence(text1: str, text2: str) -> int:
"""Dynamic Programming - LCS"""
m, n = len(text1), len(text2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if text1[i-1] == text2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
return dp[m][n]
def longest_increasing_subsequence(nums):
"""LIS using binary search O(nlogn)"""
if not nums:
return 0
tails = [0] * len(nums)
size = 0
for num in nums:
left, right = 0, size
while left < right:
mid = (left + right) // 2
if tails[mid] < num:
left = mid + 1
else:
right = mid
tails[left] = num
size = max(size, left + 1)
return size
- Best Time to Buy and Sell Stock limited to k Transactions
"""
Time Complexity: O(n * k)
Space Complexity: O(n * k) due to recursion stack and cache
"""
from functools import lru_cache
from math import inf
from typing import List
def maxProfit(self, k: int, prices: List[int]) -> int:
@lru_cache(None)
def recurse(time: int, stock: int, count: int) -> int:
# Base cases
if count >= k or time == len(prices):
return 0
# Option 1: Hold
hold = recurse(time + 1, stock, count)
# Option 2: Buy (if not holding)
if stock == 0:
buy = -prices[time] + recurse(time + 1, 1, count)
else:
buy = -inf
# Option 3: Sell (if holding)
if stock == 1:
sell = prices[time] + recurse(time + 1, 0, count + 1)
else:
sell = -inf
return max(hold, buy, sell)
return recurse(0, 0, 0)
- Unique BST
"""
Time: O(n^2)
Space: O(n)
"""
def numTrees(n: int) -> int:
dp = [0] * (n + 1)
dp[0] = dp[1] = 1 # Base cases
for nodes in range(2, n + 1):
for root in range(1, nodes + 1):
left = dp[root - 1] # Nodes in left subtree
right = dp[nodes - root] # Nodes in right subtree
dp[nodes] += left * right
return dp[n]
- Matrix Chain Multiplication
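No code was noted for this one, so here is a minimal sketch of the classic O(n^3) interval DP (function and variable names are illustrative):
def matrix_chain_order(dims):
    """Minimum scalar multiplications to multiply a chain of matrices,
    where matrix i has dimensions dims[i-1] x dims[i]."""
    n = len(dims) - 1  # number of matrices in the chain
    # dp[i][j] = minimum cost to multiply matrices i..j (1-indexed)
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):            # current chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            # Try every split point k between i and j
            dp[i][j] = min(
                dp[i][k] + dp[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                for k in range(i, j)
            )
    return dp[1][n]
# e.g. matrix_chain_order([10, 20, 30, 40]) -> 18000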
Don't underestimate easy topics. Generally Dynamic Programming and Advanced Graphs are not asked in interviews (until and unless you are very unlucky). Just know the basics really well, and rest you can manage in the interview.
Impractical Stuff
If I ever take interviews, I won't ask Hard Leetcode questions. They are mostly impractical for any Senior Engineer with a day-job and a family ( Āŗļ¹Āŗ ). I am even crazier, I would ask the candidate to build an Iterator or some applied Data Structure problem, any basic thing that's used in a day-to-day job just to check the ability to fight through a problem. And that would cover most relevant Data Structures and Algorithms concepts. Even LeetCode has some interesting questions like :
- Design HashMap and HashSet
- Design Twitter
- Design Add and Search Words Data Structure
- Design Hit Counter
- LFU and LRU Cache (a minimal LRU sketch follows this list)
- Design Circular Queue
- Insert Delete GetRandom O(1)
- Time Based Key-Value Store
- Single-Threaded CPU
- Design Browser History
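As a warm-up for the cache questions above, a minimal LRU sketch using OrderedDict (just the idea, not the full LeetCode interface):
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry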
OOPs
When only one instance of a class is required, we use a Singleton class. For this, we make the constructor private (a Python sketch follows the bullets below). Clarity over OOP concepts is crucial: Tight and Loose Coupling, Inheritance and Composition, Abstract Classes and Interfaces, the Diamond Problem, Dependency Injection. The Pillars of OOPs are :
- Data Abstraction - hide the internal implementation, expose only the essential functionality. Achieved via Interfaces and Abstract Classes.
- Data Encapsulation - Classes bundle the data with its corresponding code.
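A minimal Singleton sketch; Python has no private constructors, so one common idiom guards creation in __new__ instead (the class name Config is just illustrative):
class Config:
    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the single instance lazily on first use, then reuse it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a, b = Config(), Config()
assert a is b  # both names refer to the same object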
Operating Systems
A process has its own memory and pages. Multi-processing is different from multi-threading. Inter-process communication is hard, so naturally we use multi-threading more.
System Calls
Process Scheduling and Page Replacement Algorithms
Synchronization and Deadlocks
Multi-threading and Concurrency
Another crucial thing to focus on is concurrency (context switching in the user space), parallelism (exploiting multiple cores) and asynchronous programming (non-blocking coroutines). Most Python devs using async/await don't really know what's going on under the hood. async/await is an asynchronous mechanism, NOT a concurrency model. It lets you write non-blocking code that looks like sequential code. It (by itself) does NOT make tasks run concurrently or in parallel. We get concurrency only when we initiate multiple async operations together.
A thread is a single unit of execution within a process. In each process, there is at least a main thread. Multi-threading works with shared memory. Due to the illusion of concurrency, we can run as many threads as we want, but they will not all be actual hardware threads.
Threading is a concurrent execution model whereby multiple threads within a single process take turns executing tasks. One process can contain multiple threads. Python has a complicated relationship with threading thanks to its GIL, which limits the ability of threads to run Python code in parallel, so threads mostly help with I/O-bound work. For CPU-bound tasks in Python, process-based concurrency is superior as it's not nerfed by the GIL. However, since each process has its own memory space, inter-process communication (IPC) can be more challenging to implement. Asyncio (which allows concurrent execution of coroutines) can be seen as combining the benefits of threading and processes: it handles I/O-bound tasks efficiently, while CPU-bound work can be offloaded to thread/process pools. The async keyword turns a Python function into a coroutine; calling it returns a coroutine object that is run by the event loop. What this essentially means is that the coroutine can momentarily pause its execution under the following circumstances:
- Waiting for I/O operations - making network requests, interacting with databases, and more.
- Waiting for external events - specific test conditions before proceeding with actions, monitoring & logging server-side issues, and more.
- Achieving better concurrency - yielding control to the event loop when there are waits (or sleep), running multiple coroutines using asyncio.gather()
Coroutines can pause their execution using the await keyword. The await keyword suspends the currently executing coroutine, and control is yielded to the event loop. The suspended coroutine/task is scheduled for execution again when the awaitable (i.e. I/O, timer expiry, etc.) completes. With the current task suspended, the event loop schedules and executes coroutines that are ready for execution. Once the awaited task is completed, the earlier suspended coroutine resumes execution from the point where it was paused.
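A minimal sketch of the above, with asyncio.sleep standing in for real I/O; the three coroutines overlap, so the whole thing takes roughly as long as the slowest one:
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Stand-in for a network call; await suspends this coroutine and
    # yields control to the event loop so other coroutines can run.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # gather() schedules all three coroutines, so they wait concurrently:
    # total wall time is ~2s instead of ~4.5s.
    results = await asyncio.gather(fetch("a", 1.0), fetch("b", 1.5), fetch("c", 2.0))
    print(results)

asyncio.run(main())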
Debugging can be challenging here in complex applications, though, as the interactions between coroutines can be difficult to reason about.
There are other ways to handle asynchronous logic as well - Callbacks that are hard to manage, Futures/Promises which are cleaner yet nested, and manual Threads that are heavy and complex. Sometimes we need those. But in general interviews, I think the focus would be on async/await, which made it possible to write asynchronous logic that looks like normal sequential code, avoiding thread blocking while improving readability and maintainability.
It is very evident that asynchronous execution is the way to go when the application involves operations with external resources - network requests, database queries, I/O, etc. In such scenarios, the CPU is less loaded, so it can pick up other tasks that require its attention. For CPU-bound tasks or blocking I/O, Python asyncio can be used with ThreadPoolExecutor for offloading tasks from the asyncio event loop. Also, Python asyncio with ProcessPoolExecutor offers the benefits of parallelism by making the best use of multiple CPU cores.
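A minimal offloading sketch, assuming the heavy function is pure CPU work; run_in_executor keeps the event loop responsive while the pool does the crunching:
import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n: int) -> int:
    # CPU-bound work; running this directly in a coroutine would block the loop.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Offload to another process; the event loop stays free for other tasks.
        result = await loop.run_in_executor(pool, crunch, 10_000_000)
    print(result)

if __name__ == "__main__":  # guard needed for process pools on spawn-based platforms
    asyncio.run(main())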
Using async in the code may result in some unexpected behavior as global objects and singletons can share state.
But multiple threads working on the same data at the same time can create race conditions, so use a Mutex. We want our locks to be as granular as possible, since locks make parts of our program synchronous for a while. In read-heavy applications, shared (reader-writer) locks help: many readers can proceed concurrently, while writes take an exclusive lock. Another thing is Deadlocks, which are typically avoided with consistent lock ordering; condition variables help threads wait for a state change without busy-looping.
And the age-old interview question on Semaphores and Mutexes. Both are synchronization primitives, but a Mutex is a locking mechanism that allows only one thread into the critical section at a time - it is either locked or unlocked, and the thread that locks it is expected to unlock it. A Semaphore is a signalling mechanism with a counter, allowing up to N threads to proceed.
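A small sketch with Python's threading primitives to make the difference concrete (the "connection pool" here is hypothetical):
import threading

counter = 0
lock = threading.Lock()              # mutex: one thread at a time, lock/unlock
pool_slots = threading.Semaphore(3)  # semaphore: up to 3 threads may proceed

def increment():
    global counter
    with lock:                       # critical section around shared state
        counter += 1

def use_connection():
    with pool_slots:                 # at most 3 "connections" in use at once
        pass                         # work with the limited resource here

threads = [threading.Thread(target=increment) for _ in range(100)]
threads += [threading.Thread(target=use_connection) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 100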
Interestingly, there are some cute problems on LeetCode revolving around these concepts :
Databases
A database transaction is ACID if it satisfies Atomicity (all-or-nothing execution preventing data loss and corruption from occurring), Consistency (transactions only make changes with rules/constraints preserved, ensuring that corruption in the data does not create unintended consequences for the integrity of your table), Isolation (concurrent transactions behave as if serial), and Durability (successfully executed transactions are committed, changes survive system failure). In a distributed environment, maintaining ACID requires a coordination layer. For example, a two-phase commit (2PC) protocol is often used: a coordinator asks each participant to prepare (vote yes/no), and only if all vote "Yes" does the coordinator broadcast a commit; otherwise it aborts. This ensures atomic commits across nodes, but 2PC can block under failures and is not Byzantine fault tolerant. Some modern systems use consensus (Raft/Paxos) for metadata or use optimistic commit via logs (e.g. Delta Lake's transaction log) to provide ACID guarantees on object stores. The beauty of ACID transactions is that users can trust the data that is stored in Delta Lake.
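A toy 2PC coordinator to illustrate the prepare/commit phases; the Participant class is a stand-in, not a real protocol implementation (it ignores logging, timeouts and coordinator failure):
class Participant:
    def __init__(self, name: str, will_vote_yes: bool = True):
        self.name, self.will_vote_yes = name, will_vote_yes

    def prepare(self) -> bool:
        # Phase 1: persist enough to commit later, then vote yes/no.
        return self.will_vote_yes

    def commit(self):
        print(f"{self.name}: commit")

    def abort(self):
        print(f"{self.name}: abort")

def two_phase_commit(participants) -> str:
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    decision = all(votes)                        # commit only if everyone voted yes
    for p in participants:                       # phase 2: broadcast the decision
        p.commit() if decision else p.abort()
    return "committed" if decision else "aborted"

print(two_phase_commit([Participant("A"), Participant("B", will_vote_yes=False)]))  # aborted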
Multi-version concurrency control (MVCC) allows lock-free reads by keeping multiple versions of each record. Writers create new versions instead of overwriting, so readers access a consistent snapshot based on a timestamp or transaction ID. MVCC thus ensures readers never see "half-committed" data. Periodically, old versions are garbage-collected (e.g. PostgreSQL's VACUUM). It's widely used in RDBMS (Postgres, Oracle) and also in storage engines like RocksDB to handle concurrent updates and snapshot reads without locks.
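A toy MVCC sketch: each key keeps a list of versions, and a reader only sees versions committed at or before its snapshot timestamp (no garbage collection or write-conflict handling here):
class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def snapshot(self) -> int:
        return self.clock   # a reader captures its snapshot timestamp

    def read(self, key, snapshot_ts):
        # Newest version that was committed at or before the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
store.write("x", 1)
snap = store.snapshot()
store.write("x", 2)           # invisible to the earlier snapshot
print(store.read("x", snap))  # -> 1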
Traditional relational engines use B-tree indexes: balanced tree structures on disk where each node holds sorted key ranges. B-trees allow efficient point queries and range scans by traversing from root to leaf (requiring O(log N) I/O). In contrast, LSM-tree (Log-Structured Merge-tree) engines (used in Cassandra, RocksDB, HBase) batch writes in memory (memtables) and periodically merge to disk in sorted runs. LSM-trees optimize write throughput by sequentializing disk writes and trading some read complexity (reads may need to check multiple levels).
SQL query optimizers use cost-based planning: they enumerate possible join orders, access paths, and choose the lowest-cost plan. Heuristics (e.g. join reordering, predicate pushdown) and statistics (table sizes and column cardinalities) guide the planner. Common optimization strategies include transforming subqueries to joins, combining filters, and choosing efficient join algorithms (see below). Indexes dramatically speed up queries by allowing lookups instead of full scans. The most common index is the B-tree index, which stores keys in sorted tree nodes. B-tree indexes offer logarithmic lookup time and support efficient range queries (e.g. all rows between values). For example, a B-tree index on a date column can quickly return all rows within a date range. Hash indexes, by contrast, map keys via a hash function and allow O(1) exact-match lookups, but do not support ordered or range scans (they lose the key order). In analytical engines, columnar indexes and bloom-filters are used: Parquet files, for instance, record per-column dictionary or min/max stats to skip non-matching row-groups. Parquet can also embed Bloom filters per page to quickly exclude pages that do not contain a value.
As above, 2PC is the classic protocol for atomic commits in a distributed DB, ensuring all nodes commit or all abort. 2PC is simple but can be blocking: if the coordinator fails after sending "prepare" but before the final decision, participants may wait indefinitely. To mitigate this, some systems use consensus (e.g. Raft) to coordinate commit logs or use asynchronous replication with conflict detection. Another approach is optimistic concurrency + transaction logs (as in Delta Lake/Iceberg) where readers always see a consistent snapshot and writers append new versions atomically, sidestepping classic 2PC among data nodes.
ACID
Normalization
Normalization is the process of structuring relational tables to reduce redundancy and anomalies. First Normal Form (1NF) requires atomic values and a primary key. Second Normal Form (2NF) requires 1NF plus no partial dependencies: every non-key column must depend on the entire primary key, not just part of it. In practice, this means eliminating tables where a composite key partially determines a column. Third Normal Form (3NF) requires 2NF and no transitive dependencies: non-key columns cannot depend on other non-key columns. For example, splitting "employees" from "roles" tables so that attributes like employee name aren't duplicated per role. Higher forms (BCNF, 4NF) address more subtle cases. Normalizing to 3NF ensures that each fact is stored once, which avoids insert/delete/update anomalies.
Query Optimization
SQL Joins
SQL supports several join types:
- Inner Join: Returns rows where the join condition matches on both tables.
- Outer Joins: Left/Right Outer include all rows from one side and matched rows from the other (filling nulls if no match). Full Outer includes all rows from both sides.
- Cross Join: Cartesian product (all combinations).
- Semijoins/Antijoins: (e.g. WHERE EXISTS) return rows from one side that have (or don't have) matches in the other.
For Join Implementation, at execution time, a DB uses algorithms like:
- Nested-Loop Join: For each row in Table A (outer), scan matching rows in Table B. Simple and good when one table is small or when indexed on the join key - worst-case cost is O(N × M).
- Hash Join: Build a hash table on the smaller input's join keys (build phase), then probe with the larger input's rows (probe phase). Provides O(N + M) time for equi-joins. Excellent for large unordered tables with exact-key predicates (see the sketch after this list).
- Sort-Merge Join: Sort both inputs on the join key and then perform a linear merge scan. Efficient when inputs are pre-sorted (or indexed) on the join key. It requires additional cost to sort if not already ordered.
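To make the build/probe phases of the hash join concrete, a toy in-memory equi-join over lists of dicts (table and column names are made up):
def hash_join(left, right, key):
    # Build a hash table on the smaller input, then probe with the larger one.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    table = {}
    for row in build:                          # build phase
        table.setdefault(row[key], []).append(row)
    out = []
    for row in probe:                          # probe phase
        for match in table.get(row[key], []):
            out.append({**match, **row})
    return out

users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [{"id": 1, "item": "pen"}, {"id": 1, "item": "book"}]
print(hash_join(users, orders, "id"))  # two joined rows for id=1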
Modern systems choose among these based on statistics: e.g., use a hash join when both sides are large and unsorted, use merge join if sorted (or a merge-friendly storage like columnar), and fall back to nested loops for highly selective conditions or small inputs. Cost-based optimizers will pick the plan with lowest estimated I/O/CPU.
When analyzing a query in databases like MySQL, the EXPLAIN statement shows the join type, indicating how the database engine accesses and joins tables.
Join types (from best to worst): const, eq_ref, ref, range, index, and ALL. ALL - the worst join type: a full table scan, meaning every row is checked. This is slow, especially for large tables, thus bad for performance. index and range - these are better because they mean the database uses an index to access a subset of data, avoiding full table scans.
Distributed Systems
The CAP theorem states a distributed system can simultaneously provide only two of {Consistency, Availability, Partition Tolerance}. In practice, partitions (network failures) are a fact, so systems choose between strong consistency (always agree on a single value) and availability (always respond, possibly stale). Strong consistency (as in traditional RDBMS or Paxos-based systems) ensures every read sees the latest committed write, but can block or fail under partition. Eventual consistency (as in many NoSQL stores) allows reads to return outdated data but guarantees convergence when nodes reconcile. Many modern distributed DBs offer tunable consistency (e.g. Quorum reads/writes in Cassandra, linearizable reads vs timeline in DynamoDB).
Fault-tolerant systems use consensus protocols to agree on state changes. Paxos is the original consensus protocol for agreeing on a value among unreliable nodes. It guarantees safety (consistency) but is complex. Raft is a later protocol designed for understandability: it decomposes consensus into leader election, log replication, and safety rules. In Raft, one leader handles all log appends; if the leader fails, a new election is held. Both Paxos and Raft require a majority of replicas and can tolerate node failures (but not network partitions beyond a majority). These protocols underlie many systems (e.g. etcd, Consul, TiKV) to manage a replicated log or state machine, ensuring that updates (schema changes, metadata commits) occur atomically across nodes.
Beyond MVCC, distributed systems use patterns like distributed locking (e.g. ZooKeeper/etcd locks), leader-follower sharding (each partition has a single writer leader), and CRDTs (conflict-free replicated data types) to handle concurrent updates. Many data lake systems use optimistic concurrency: readers get snapshot views and writers append updates; conflicts are resolved via atomic replace or merge (as with Delta Lake's transaction log).
Low-Level Design
This round primarily focuses on SOLID, DRY, KISS, OOPs, Design Patterns and the overall modularity, maintainability and testability of the code.
The best way to master LLD is to write good code in your day job ;)
I will do all of these in different languages to get a feel for how each language has its own philosophy of doing things. Also, in these Machine Coding Rounds, you will be expected to write fully-functional code (maybe with class diagrams as well) in an hour, so practice these :
- Design Tic Tac Toe
- Design ATM
- Design Login System
- Design Web Scraper (will be used to train AI models, different forms of data finally converted to text)
- Design Parking Lot (vehicle entry, exit and pricing model)
- Design Library Management (catalog users, borrow-return logic)
- Design Snake Game (dynamic movement and collision detection)
- Design Chess Game
- Design Rate Limiter (Token bucket or sliding window algorithm; a token-bucket sketch follows this list)
- Design Elevator System (multi-threaded handling of floors and requests)
- Design Message Queue
- Design File System
- Design SQL (data structures to use, generic type of data, adding indexing, making it multithreaded, making it extensible, adding logs, deadlocks management)
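For the Rate Limiter above, a minimal token-bucket sketch (single-threaded; a real one would need locking and per-client buckets):
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                  # tokens refilled per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1              # spend one token for this request
            return True
        return False

limiter = TokenBucket(rate=5, capacity=10)
print([limiter.allow() for _ in range(12)])  # first 10 allowed, then throttled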
High Level Rounds
Everybody seems to throw scaling buzzwords in interviews as if they are being hired to build the next Whatsapp! Most probably you will do some vibe coding in your day-job, be honest!
But anyways, understanding the building blocks of designing a large-scale system is cool. A broad understanding of designing large-scale systems in these domains is required.
Data Engineering
Data Lake is the foundational storage containing raw data, on top of which Delta Lake acts as an open-format storage layer, managing metadata and giving us reliability, security and performance on the data lake - for both streaming and batch operations. It extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling.
These days, Delta Lake is mostly used for managing data - useful for unstructured, semi-structured and structured data as well as streaming data, with Schema on Read - data is accepted even if it is not exactly aligned with the schema. A Data Lake also supports this, whereas a Data Warehouse is Schema on Write, i.e. data that does not align with the schema is discarded at write time. ACID transaction support exists in Delta Lake and the Data Warehouse, while the Data Lake has only minimal support. Also, unlike a plain Data Lake, data in Delta Lake and a Data Warehouse does not leave the system corrupted, making them reliable.
- Design a distributed data lake architecture that can handle petabyte-scale data
- How would you build a high-performance object storage system with ACID compliance?
- Design a metadata management service for lakehouse operations
- Architecture for data compression and optimization at scale
MLOps
You should know the tradeoffs of your decisions.
Why did I write this?
So that I can ponder over these notes in my university classes before interviews as well (ć„ļ½”āāæāæāļ½”)ć„