From production systems to offer letters.
One hub that turns what you already ship — RAG pipelines, multi-registry data systems, agentic bots — into crisp, interview-ready knowledge, then extends it into the platform layer: containers, Kubernetes, AWS, and the Ops disciplines (MLOps · LLMOps · AIOps). Every topic runs the same rail: concept → workflow → code → on-the-job → interview.
Tip: combine the search box with the colour filters. Click an active pill again to clear it.
Python Foundations
You write this every day — so this section is tuned for revision speed and interview traps, not basics. The gotchas below are the ones panels actually probe: identity vs equality, mutability, scope, the GIL.
The object model & dynamic typing model
Every value in Python is an object with an identity, a type, and a value. A variable is just a name bound to an object — not a box holding bytes. Names are dynamically typed; the object carries the type, which is why x = 10 then x = "hi" is legal.
x = 10
print(type(x), id(x)) # <class 'int'> 140...
print(isinstance(x, int)) # True — prefer isinstance over type() ==
# Truthiness: empty containers / 0 / None / "" are falsy
if not []: print("empty list is falsy")
if (0 or "fallback"): print("or returns first truthy → 'fallback'")
Interview Q&A
isinstance(x, int) over type(x) == int?isinstance respects inheritance (a subclass passes), and it accepts a tuple of types. type() == is an exact-class check that breaks polymorphism.Hold this picture: an object lives on the heap and owns three things forever — an identity (its address, via id()), a type (fixed at creation), and a value. A name is a label in a namespace dict that points at an object. Assignment never copies a value; it only re-points a label. That single rule explains aliasing, garbage collection timing, and why two names can mutate "each other".
Python is dynamically typed (the type check happens at runtime, on the object) yet strongly typed (it will not silently coerce "3" + 5). Mixing those up is a classic interview slip. Type hints add an optional, erased layer — they are read by tools like mypy but the interpreter ignores them at runtime.
# strong typing: no implicit string/number coercion
try:
total = "3" + 5 # TypeError, unlike JS/PHP
except TypeError as e:
print("refused:", e)
# duck typing: behaviour, not declared type, decides usability
def total_len(items):
return sum(len(x) for x in items) # works for list/str/tuple/dict
print(total_len(["ab", "cde"])) # 5
print(total_len(("x", "yz"))) # 3 — same code, different type
# type hints are erased at runtime — they don't enforce anything
def greet(name: str) -> str:
return "hi " + name
print(greet(123) if False else greet("Sam")) # hint is advisory only
| Concept | Means | Python's answer |
|---|---|---|
| Dynamic vs static | when types are checked | dynamic (runtime, on the object) |
| Strong vs weak | how strict coercion is | strong (no implicit mixing) |
| Nominal vs duck | how usability is decided | duck (has the method? good enough) |
Interview Q&A · deep dive
"3" + 5 raises TypeError). The two axes are independent — dynamic is about when, strong is about how strict.__annotations__ and read by static checkers/IDEs/runtime libraries that opt in (pydantic, dataclasses), but the interpreter does no enforcement. greet(123) runs unless an external tool flags it.id()), type (fixed at creation), and value (mutable only if the object's type allows it). You can never change an object's identity or type — you make a new object and rebind the name.int itself?type(int) is type — classes are objects too, and their type is the metaclass. This is what makes class a first-class, programmable construct.Mutable vs immutable — the classic trap gotcha
Immutable: int float str tuple frozenset bytes. Mutable: list dict set and most custom objects. This drives copying behaviour, dict keys, and the single most-asked Python bug: the mutable default argument.
# BUG: default list is created once, shared across calls
def add(item, bucket=[]): # ❌
bucket.append(item); return bucket
add(1); add(2) # → [1, 2] (leaks between calls!)
def add(item, bucket=None): # ✅ sentinel pattern
if bucket is None: bucket = []
bucket.append(item); return bucket
# is vs == : identity vs equality
a = [1,2]; b = a[:]
print(a == b, a is b) # True False — equal value, different object
Interview Q&A
tuple (or frozenset) instead.a is b surprisingly True for ints/strings?is may return True — but that's an implementation detail. Never use is for value comparison; only for None/sentinels.Immutability is not just a restriction — it is what makes an object usable as a dict key or set member, because the hash must stay constant for the object's lifetime. It also makes objects safe to share freely (no defensive copying) and lets CPython cache/intern them. The mental rule: if you would not want a value to change under you while it sits in a set, it should be immutable.
# A tuple is immutable, but it can hold a mutable object —
# so a tuple is only hashable if ALL its members are hashable.
t = (1, [2, 3])
t[1].append(4) # legal! the tuple slot still points to the same list
print(t) # (1, [2, 3, 4]) — "immutable" container, mutable content
try:
{t: "x"} # TypeError: unhashable type: 'list'
except TypeError as e:
print(e)
# Freeze a coordinate so it can be a key
def freeze(point):
return tuple(point) # list -> tuple, now hashable
grid = {}
grid[freeze([0, 0])] = "start"
grid[freeze([1, 2])] = "goal"
print(grid[(0, 0)]) # start
import copy
config = {"retries": 3, "hosts": ["a", "b"]}
shallow = copy.copy(config)
deep = copy.deepcopy(config)
shallow["hosts"].append("c") # mutates the SHARED inner list
print(config["hosts"]) # ['a','b','c'] — leaked into the original!
print(deep["hosts"]) # ['a','b'] — fully independent
Interview Q&A · deep dive
(1, 2) is fine; (1, [2]) raises TypeError: unhashable type: 'list' when you try to hash it, because its hash would depend on a mutable member.__hash__ and __eq__?__eq__ and want the object hashable, you must also define __hash__ over the same fields, or Python sets it to None (unhashable).copy.copy duplicates only the top level; inner mutable objects stay shared, so mutating them leaks across copies. Reach for copy.deepcopy for true independence — at the cost of full traversal.id() of an immutable sometimes match after "modifying" it?x += 1 on an int create a new int and rebind x; the original object is untouched.Scope, LEGB & closures scope
Name lookup walks L→E→G→B: Local, Enclosing, Global, Built-in. A closure is an inner function that captures variables from its enclosing scope and keeps them alive after the outer function returns.
def counter():
count = 0
def tick():
nonlocal count # write to enclosing var, not a new local
count += 1; return count
return tick # closure: 'count' survives
c = counter(); print(c(), c(), c()) # 1 2 3
Interview Q&A
[lambda: i for i in range(3)] all return 2 — they capture the variable i, not its value. Fix by binding per-iteration: lambda i=i: i.A closure does not copy the enclosing variables — it keeps a live reference to each captured variable through a cell object. Those cells are exposed as fn.__closure__, and the names live in fn.__code__.co_freevars. This is why two closures created from the same call share the same cell and see each other's writes.
def make_pair():
n = 0
def inc(): nonlocal n; n += 1; return n
def get(): return n
return inc, get
inc, get = make_pair()
inc(); inc()
print(get()) # 2 — both close over the SAME n cell
print(inc.__code__.co_freevars) # ('n',)
print(inc.__closure__[0].cell_contents) # 2 — the live captured value
# BUG: all closures share one 'i', read AFTER the loop ends
bad = [lambda: i for i in range(3)]
print([f() for f in bad]) # [2, 2, 2]
# FIX 1: default argument snapshots the value at def-time
ok1 = [lambda i=i: i for i in range(3)]
print([f() for f in ok1]) # [0, 1, 2]
# FIX 2: a factory gives each closure its own scope
def make(i): return lambda: i
ok2 = [make(i) for i in range(3)]
print([f() for f in ok2]) # [0, 1, 2]
| Keyword | What it rebinds | When to use |
|---|---|---|
| (none) | creates/reads a local | default — most code |
| nonlocal | nearest enclosing function var | closures, counters |
| global | module-level var | rare; module config/singletons |
Interview Q&A · deep dive
__closure__). Sibling closures from the same call share cells, so a write through one is visible to the other. That's why late binding happens: the value is read when the closure runs, not when it's defined.nonlocal instead of just assigning?nonlocal, count += 1 creates a new local and raises UnboundLocalError on the read. nonlocal tells Python to bind the existing enclosing variable.global and nonlocal?global rebinds a name at module scope; nonlocal rebinds the nearest enclosing function scope (never module, never builtin). nonlocal requires an enclosing function with that name to exist or it's a SyntaxError.[i for i in range(3)] leaves no i behind. This also prevents accidentally clobbering an outer i.Decorators pattern
A decorator is a callable that takes a function and returns a replacement — the standard way to add cross-cutting behaviour (timing, retries, caching, auth) without touching the wrapped code.
import functools, time
def retry(n=3):
def deco(fn):
@functools.wraps(fn) # preserve name/docstring
def wrap(*a, **kw):
for i in range(n):
try: return fn(*a, **kw)
except Exception:
if i == n-1: raise
time.sleep(2**i) # exponential backoff
return wrap
return deco
@retry(5)
def call_registry_api(url): ...
Interview Q&A
@functools.wraps?__name__, __doc__ and signature — breaking introspection, logging, and tools that read metadata. wraps copies them across.A decorator runs in two distinct phases that trip people up. At definition time (when Python reads the @deco line) the decorator is called once with the function and returns a replacement that gets bound to the name. At call time (every invocation) it is the wrapper that runs, deciding whether/how to delegate to the original. Decorators with arguments add a third outer layer that returns the actual decorator.
import functools, time, logging
log = logging.getLogger("perf")
def timed(_fn=None, *, threshold=0.0):
# supports both @timed and @timed(threshold=0.5)
def deco(fn):
@functools.wraps(fn)
def wrap(*a, **kw):
t0 = time.perf_counter()
try:
return fn(*a, **kw)
finally:
dt = time.perf_counter() - t0
if dt >= threshold:
log.info("%s took %.3fs", fn.__name__, dt)
return wrap
return deco if _fn is None else deco(_fn)
@timed(threshold=0.5)
def extract(url): time.sleep(0.6); return "ok"
extract("https://reg/api") # logs: extract took 0.6xx s
Interview Q&A · deep dive
@deco → deco(fn). With arguments, @deco(x) first calls deco(x), which must return the real decorator; that return value is then applied to fn. So an argumented decorator is one extra layer of nesting.def wraps first. They execute top-down: a call enters the outermost wrapper first. So @a @b def f is a(b(f)) — apply b then a, run a then b then f.@functools.wraps copy, and why care?__name__, __doc__, __module__, __qualname__, __dict__ and sets __wrapped__ to the original. Without it, introspection, logging, OpenAPI generation, and signature-based dispatch all see the wrapper instead of your function.__init__ stores the function and whose __call__ implements the wrapper behaves as a decorator, and it can hold state (call counts, caches) cleanly as instance attributes.Comprehensions & the functional trio idiom
Comprehensions are the Pythonic map+filter. Map transforms, filter selects, reduce aggregates. Prefer a comprehension for readability; reach for map/filter when passing an existing function.
nums = [1,2,3,4,5,6]
squares_even = [n*n for n in nums if n%2==0] # [4,16,36]
by_id = {r["id"]: r for r in records} # dict comp = fast index
seen = {r["email"] for r in records} # set comp = dedupe
from functools import reduce
total = reduce(lambda a,b: a+b, nums) # 21 (sum() is clearer)
gen = (n*n for n in nums) # generator — lazy, O(1) memory
Interview Q&A
len). Generator when you iterate once and want constant memory — e.g. streaming a 5M-row export through a transform.A list comprehension is not just shorter — it is faster than an equivalent for+append loop because CPython uses a specialised LIST_APPEND opcode and skips repeated attribute lookups of .append. The decision tree: need the whole collection now → comprehension; iterate once over huge data → generator expression (lazy, O(1) memory); building a lookup → dict/set comprehension.
from collections import Counter
def rows(path):
with open(path) as f:
for line in f:
yield line.rstrip().split(",")
# generator pipeline — nothing is materialised until consumed
recs = rows("trials.csv")
phases = (r[2] for r in recs if r[2]) # lazy filter+map
counts = Counter(phases) # consumes the stream once
for phase, n in counts.most_common(3):
print(f"{phase}: {n}")
# nested comp: flatten a list of lists (read left-to-right as nested for)
matrix = [[1, 2], [3, 4]]
flat = [x for row in matrix for x in row] # [1,2,3,4]
| Form | Builds | Memory | Reach for it when |
|---|---|---|---|
| [x for …] | list | O(n) | need to index / reuse / len |
| (x for …) | generator | O(1) | iterate once over big data |
| {k: v for …} | dict | O(n) | build an index / lookup |
| {x for …} | set | O(n) | dedupe / membership |
Interview Q&A · deep dive
LIST_APPEND opcode and avoids re-resolving list.append on every iteration. The work happens in C, with fewer Python-level frame operations.for statements: [x for row in m for x in row] means for row in m: for x in row: yield x. The first clause is the outer loop.map/filter preferable to a comprehension?map(int, tokens) is cleaner than [int(t) for t in tokens]. But if you'd need a lambda, the comprehension is usually clearer and as fast.Lambdas — anonymous, single-expression functions functional
A lambda is a function with no name, written inline: lambda args: expression. It exists for exactly one reason — passing a tiny, throwaway function where giving it a name would be noise (a sort key, a callback, a one-line transform). The senior rule: reach for a lambda only when the body is a single trivial expression used once. The moment you need a statement, a name worth reusing, or a docstring, write a def.
# these two are equivalent — lambda is just sugar for a one-expression def
add = lambda a, b: a + b
def add(a, b): return a + b
# a lambda body is ONE expression — no statements allowed:
# lambda x: x += 1 -> SyntaxError (assignment is a statement)
# lambda x: print(x); x -> SyntaxError (two statements)
# lambda x: return x -> SyntaxError (return is a statement)
ok = lambda x: x if x > 0 else 0 # conditional EXPRESSION is fine
# 1) the key= argument — by far the most common real use
trials.sort(key=lambda t: t["enrollment"]) # sort by one field
top = max(sites, key=lambda s: s.recruited) # pick by a derived value
rows = sorted(rows, key=lambda r: (r.country, -r.n)) # multi-key sort
# 2) tiny inline transforms / callbacks
names = list(map(lambda s: s.strip().lower(), raw)) # (a comprehension is often clearer)
df["band"] = df.apply(lambda r: "big" if r.n > 100 else "small", axis=1)
# 3) a default factory that needs an argument-free callable
from collections import defaultdict
counts = defaultdict(lambda: 0) # or just int; lambda shines for non-trivial defaults
groups = defaultdict(lambda: {"n": 0, "ids": []})
| Use a lambda when… | Use a def when… |
|---|---|
| it's a one-line expression passed inline | the body needs a statement, loop, or try |
| it's a key= / callback used once | you'll reuse it or call it from several places |
| naming it would add no clarity | it deserves a docstring or a clear name |
| a default factory (defaultdict) | it needs unit tests of its own |
Interview Q&A
def is for anything with a body worth naming, testing, documenting, or reusing — or that needs statements (assignments, loops, try), which a lambda can't contain. If you're tempted to assign a lambda to a name, that's the signal to use def.return, =, for, or try aren't allowed. You can still use expression-form constructs: a conditional expression (a if c else b), comprehensions, or the walrus := for an inline assignment-expression. Anything beyond that means you've outgrown a lambda.lambda i=i: ...) or build the function in a helper that takes i as a parameter, creating a fresh binding per call.map/filter + lambda for the same work.)*args, **kwargs & argument passing api
*args collects extra positional args into a tuple; **kwargs collects extra keyword args into a dict. They make functions flexible and are how you write transparent wrappers and pass config through layers.
def run_extractor(name, *sources, retries=3, **opts):
print(name, sources, retries, opts)
run_extractor("ctgov", "v1", "v2", retries=5, timeout=30)
# ctgov ('v1','v2') 5 {'timeout': 30}
cfg = {"retries":5, "timeout":30}
run_extractor("euct", **cfg) # unpack dict into kwargs
Interview Q&A
* in a signature do?def f(a, *, b) makes b keyword-only — callers must write f(1, b=2). Great for booleans/flags so call sites stay readable.Python's signature grammar is richer than most people use. The full order is positional-only (before /), then normal, then *args, then keyword-only (after *), then **kwargs. The / marker (PEP 570) lets you forbid passing an argument by name — useful when a parameter name is an implementation detail you may rename.
# pos-only before /, keyword-only after *
def connect(host, /, port=443, *, timeout=30, **driver_opts):
return (host, port, timeout, driver_opts)
connect("db.local", 5432, timeout=5, ssl=True)
# host is positional-only: connect(host="x") would raise TypeError
# timeout is keyword-only: must be named, never positional
# A transparent wrapper forwards everything unchanged
def traced(fn):
def wrap(*args, **kwargs):
print("call", fn.__name__, args, kwargs)
return fn(*args, **kwargs) # re-unpack: pass through intact
return wrap
defaults = {"retries": 3, "timeout": 30}
override = {"timeout": 5}
# later keys win — clean layered config without mutation
final = {**defaults, **override} # {'retries':3,'timeout':5}
def run(*sources, **cfg):
return sources, cfg
print(run(*["a", "b"], **final)) # (('a','b'), {'retries':3,'timeout':5})
| Marker | Effect | Why |
|---|---|---|
| / | args before it are positional-only | free to rename params later |
| *args | collects extra positionals (tuple) | variadic, forwarding |
| * (bare) | everything after is keyword-only | readable, safe flags |
| **kwargs | collects extra keywords (dict) | pass-through config |
Interview Q&A · deep dive
* in a definition vs in a call?*args collects surplus positional arguments into a tuple. In a call, *iterable unpacks the iterable into separate positional arguments. Same for **: collect into a dict vs unpack a dict into keyword args./ and bare * markers do?/ makes every parameter before it positional-only (callers can't use its name). A bare * makes every parameter after it keyword-only (callers must name them). Together they give precise control over the calling convention.*args, and keywords?*args, then matches keyword arguments to keyword-only / remaining params, and finally collects leftover keywords into **kwargs. A keyword that duplicates an already-filled positional raises TypeError.{**a, **b} resolve to on key conflict?b's value overrides a's for shared keys. It builds a new dict without mutating either, which makes it the idiomatic layered-config merge.OOP, dunder methods & MRO design
Four pillars — encapsulation, abstraction, inheritance, polymorphism. Dunder (magic) methods hook your objects into language syntax. @dataclass removes boilerplate for data-holding classes.
| Kind | First arg | Use |
|---|---|---|
| instance | self | per-object state |
| @classmethod | cls | alt constructors, class state |
| @staticmethod | — | namespaced helper |
| @property | self | computed attr w/o () |
from dataclasses import dataclass
@dataclass
class Trial:
nct_id: str
phase: str = "NA"
def __repr__(self): return f"<Trial {self.nct_id}>"
def __eq__(self, o): return self.nct_id == o.nct_id # identity by NCT
# super() + MRO: cooperative inheritance
class Base: def load(self): print("base")
class Registry(Base): def load(self): super().load(); print("registry")
Trial.__mro__ # resolution order C3 linearisation
Interview Q&A
super() solve?super() follows the MRO rather than hard-coding a parent, enabling cooperative mixins.With multiple inheritance Python needs one deterministic order in which to search bases. It uses C3 linearisation: the MRO of a class is the class itself, followed by a merge of the MROs of its parents and the list of parents — preserving each parent's order and never placing a class before its subclass. If no consistent order exists, the class statement itself raises TypeError. super() walks this list, which is what makes cooperative multiple inheritance work.
class A:
def load(self): print("A")
class B(A):
def load(self): print("B"); super().load()
class C(A):
def load(self): print("C"); super().load()
class D(B, C): # the diamond
def load(self): print("D"); super().load()
D().load() # D B C A — each runs once, in MRO order
print([c.__name__ for c in D.__mro__]) # ['D','B','C','A','object']
class Money:
__slots__ = ("cents",) # no __dict__: less memory, fixed attrs
def __init__(self, cents): self.cents = cents
def __repr__(self): return f"Money({self.cents})"
def __add__(self, o): return Money(self.cents + o.cents) # enables +
def __eq__(self, o): return self.cents == o.cents # enables ==
def __hash__(self): return hash(self.cents) # keep hashable after __eq__
def __lt__(self, o): return self.cents < o.cents # enables sort/<
print(Money(150) + Money(50)) # Money(200)
print(sorted([Money(9), Money(1)])) # [Money(1), Money(9)]
| Dunder | Triggered by | Note |
|---|---|---|
| __repr__ / __str__ | repr() / str(), print | repr for devs, str for users |
| __eq__ + __hash__ | ==, set/dict keys | define together or lose hashability |
| __lt__ … | <, sorted() | or use @total_ordering |
| __enter__ / __exit__ | with | resource management |
| __call__ | obj() | makes instances callable |
Interview Q&A · deep dive
super() follows this list, so each ancestor's cooperative method runs exactly once.super() be used everywhere for cooperative inheritance to work?super() delegates to the next class in the MRO, not a hard-coded parent. If one class in a diamond hard-codes A.load(self) or skips super(), the chain breaks and some ancestors are skipped or run twice.__slots__ buy and cost you?__dict__ with a fixed set of descriptors: lower memory and faster attribute access. The cost is you can't add new attributes dynamically, and multiple-inheritance with slots needs care. Great for many small objects, unnecessary for a handful.__repr__ and __str__?__repr__ targets developers — ideally unambiguous and eval-able; it's the fallback for containers and the REPL. __str__ targets end users / display. If you define only one, define __repr__, since str() falls back to it.__eq__ require thinking about __hash__?__eq__ sets __hash__ to None (unhashable) unless you also define __hash__ over the same fields — otherwise sets and dict keys would behave inconsistently with your equality.The four pillars of OOP fundamentals
Every OOP interview circles the same four ideas. Don't just define them — say what each buys you and how Python expresses it (which is looser than Java/C++: Python uses convention and duck typing, not hard access modifiers).
| Pillar | One line | Python expresses it as |
|---|---|---|
| Encapsulation | bundle data + behaviour, hide internals behind an interface | convention (_x protected, __x name-mangled), @property |
| Abstraction | expose what, hide how | ABCs (abc.ABC), Protocol, clean public methods |
| Inheritance | a subclass reuses/specialises a base ("is-a") | class Sub(Base), super(), the MRO |
| Polymorphism | one interface, many behaviours | method overriding + duck typing ("if it quacks…") |
from abc import ABC, abstractmethod
class Extractor(ABC): # abstraction: defines the contract
def __init__(self, name):
self._name = name # encapsulation: protected by convention
@property
def name(self): return self._name # controlled access
@abstractmethod
def parse(self, raw): ... # subclasses MUST implement
class CtgovExtractor(Extractor): # inheritance: is-a Extractor
def parse(self, raw): return {"phase": raw["phase"]} # polymorphism: overrides parse
def run(ex: Extractor, raw): return ex.parse(raw) # works for ANY Extractor
Interview Q&A
Both express abstraction, but they answer "what counts as a valid type?" differently. An abc.ABC is nominal: a class is an Extractor only if it explicitly subclasses it. A typing.Protocol is structural (static duck typing): anything with the right methods satisfies it, no inheritance required — the type checker verifies the shape.
from typing import Protocol, runtime_checkable
@runtime_checkable
class Parser(Protocol): # a shape, not an ancestor
def parse(self, raw: dict) -> dict: ...
class Ctgov: # note: does NOT subclass Parser
def parse(self, raw): return {"phase": raw["phase"]}
class Euct:
def parse(self, raw): return {"phase": raw.get("trialPhase")}
def run_all(parsers: list[Parser], raw: dict):
return [p.parse(raw) for p in parsers] # polymorphism by shape
rows = run_all([Ctgov(), Euct()], {"phase": "III", "trialPhase": "III"})
print(isinstance(Ctgov(), Parser)) # True — runtime_checkable checks methods
class Base:
def log(self): print("Base")
class Audited(Base):
def log(self): print("Audited"); super().log()
class Cached(Base):
def log(self): print("Cached"); super().log()
class Service(Audited, Cached): pass
Service().log() # Audited -> Cached -> Base (each super() walks the MRO once)
print([c.__name__ for c in Service.__mro__])
# ['Service', 'Audited', 'Cached', 'Base', 'object']
| Concept | ABC (nominal) | Protocol (structural) |
|---|---|---|
| Conformance | must explicitly subclass | just match the method shape |
| Checked when | at instantiation (runtime) | statically by mypy/pyright |
| Best for | your own class trees you control | typing third-party / external classes |
| Cost | couples to a base class | zero coupling, no inheritance |
Interview Q&A · deep dive
ABC uses nominal typing: a class conforms only by explicitly subclassing, and an abstract method blocks instantiation until overridden — good for class trees you own and want enforced at runtime. A Protocol uses structural typing: anything with the matching methods conforms, checked statically. Reach for Protocol to type objects you don't control (third-party SDKs) without forcing inheritance.super() do in multiple inheritance?super().method() calls, a diamond hierarchy runs each class's method exactly once in C3-linearized order. The catch: every participant must call super() with a compatible signature, or the cooperative chain breaks.*args/**kwargs branching, or — cleanly — functools.singledispatch to dispatch on the type of the first argument. For type-checker-visible overloads of a single implementation, typing.overload declares the signatures.property — computed & validated attributes attributes
@property makes a method look like a plain attribute — callers write obj.area, not obj.area(). The senior point is the uniform access principle: expose attributes directly, and the day you need validation or a computed value, swap in a property without changing the public API. That's why idiomatic Python has no Java-style getX()/setX() boilerplate up front — you add the getter/setter only when a real reason appears, and no caller has to change.
class Account:
def __init__(self, balance):
self._balance = balance # note the underscore: real storage
@property # the getter — read obj.balance
def balance(self):
return self._balance
@balance.setter # validate on write: obj.balance = 50
def balance(self, value):
if value < 0:
raise ValueError("balance cannot be negative")
self._balance = value
@balance.deleter # del obj.balance
def balance(self):
del self._balance
@property # read-only computed value — no setter
def is_overdrawn(self):
return self._balance < 0
a = Account(100)
a.balance = 50 # goes through the setter (validated)
a.is_overdrawn # computed on access; assigning to it raises AttributeError
| Piece | What it gives you | Note |
|---|---|---|
| @property (getter) | read obj.x runs your code | with no setter, the attribute is read-only |
| @x.setter | validate/transform on write | store to a different name (self._x) |
| @x.deleter | hook del obj.x | rarely needed |
| functools.cached_property | compute once, cache on the instance | expensive derived value; recomputed only if you del it |
from functools import cached_property, lru_cache
class Dataset:
@cached_property # stored in self.__dict__; per-instance; recompute = del obj.stats
def stats(self):
return expensive_scan(self.path)
@property # DON'T stack @property over @lru_cache:
@lru_cache # lru_cache keys on `self`, so it pins every
def bad(self): # instance alive forever -> memory leak
return expensive_scan(self.path)
Interview Q&A
@property instead of a plain attribute or a get_x() method?obj.x, so you can start with a public attribute and later add validation, computation, or logging behind it without breaking a single caller. A get_x() method would force every call site to change the day you needed control — the property gives you that control for free while preserving the simple attribute syntax.self.x = ...), which re-invokes the setter. Store the value under a different backing name (self._x) and have the getter return that. The property is the public interface; _x is the real storage.cached_property vs property vs lru_cache?property recomputes on every access. cached_property computes once and stores the result in the instance's __dict__ — subsequent reads are a plain dict hit, and it only recomputes if you del the attribute (it has no setter). Stacking property over lru_cache is an anti-pattern: the cache keys on self, keeping every instance alive and leaking memory.property a descriptor?__get__ and __set__ on the class). That's why it takes precedence over an instance attribute of the same name and can't be shadowed. It's the textbook example of the descriptor protocol that also powers methods, classmethod, and ORM fields.Generators & iterators scale
An iterator implements __next__; a generator is the easy way to make one using yield. It produces values lazily and holds constant memory regardless of dataset size — the backbone of streaming ETL.
def read_rows(path):
with open(path) as f:
for line in f: # file object is itself lazy
yield line.rstrip().split(",")
# chain lazy stages — nothing materialises until consumed
rows = read_rows("investigators.csv")
valid = (r for r in rows if r[2])
parsed = (normalize(r) for r in valid)
for rec in parsed: # 5M rows, O(1) memory
upsert(rec)
Interview Q&A
A for loop is sugar. The interpreter calls iter(obj) once to get an iterator, then calls next() on it repeatedly until StopIteration is raised — that exception is the loop's stop signal, not an error. A generator function builds this iterator for you: each yield hands back one value and freezes the frame (locals, instruction pointer); the next next() thaws it and resumes on the line after the yield.
def countdown(n):
while n > 0:
yield n # pause here, return n, remember n and the line
n -= 1 # resumes HERE on the next next()
g = countdown(3) # nothing runs yet — calling a gen fn returns a generator
print(next(g)) # 3 (runs to the first yield)
print(next(g)) # 2
print(next(g)) # 1
next(g) # raises StopIteration -> a for-loop catches this and stops
def running_avg():
total = count = 0
avg = None
while True:
x = yield avg # yield is also an expression: receives send()'d values
total += x; count += 1
avg = total / count
a = running_avg()
next(a) # prime it: advance to the first yield
print(a.send(10)) # 10.0
print(a.send(20)) # 15.0 (state persists between calls)
a.close() # raises GeneratorExit inside, stops the coroutine
def flatten(nested):
for sub in nested:
yield from sub # delegate: re-yield every item, transparently
print(list(flatten([[1, 2], [3]]))) # [1, 2, 3]
| Construct | Memory | Re-iterable? | Use when |
|---|---|---|---|
| list comp [x for x] | O(n) | yes | need it more than once / random access |
| gen expr (x for x) | O(1) | no (single pass) | stream once into a sink (sum, write, upsert) |
| generator fn | O(1) | fresh per call | complex lazy logic, statefulness, pipelines |
Interview Q&A · deep dive
yield?next()/send(), the frame is restored and execution resumes on the statement after the yield. It's a paused function, not a returned value — which is why state survives across calls for free.for loop know when to stop?iter() once, then next() repeatedly. When the iterator raises StopIteration, the loop catches it and ends normally. In a generator, falling off the end (or hitting return) raises StopIteration automatically — StopIteration is a control-flow signal, not an error condition.send() do that next() doesn't?yield is an expression, so it can receive a value: x = yield. send(val) resumes the generator and makes that yield evaluate to val (plain next() is equivalent to send(None)). This turns a generator into a coroutine you can push data into — though you must "prime" it with one next() first so it's paused at a yield.yield from sub instead of a loop that re-yields?yield from delegates the entire sub-iterator: it re-yields every value, and crucially forwards send(), throw(), and the sub-generator's return value back to the delegating generator. A manual for x in sub: yield x only handles the value flow, not the two-way coroutine protocol — which is exactly why yield from was the foundation of pre-async coroutines.Context managers (with) safety
The with block guarantees setup/teardown even on exceptions — closing files, releasing connections, committing or rolling back transactions. Implement __enter__/__exit__ or use @contextmanager.
from contextlib import contextmanager
@contextmanager
def transaction(conn):
cur = conn.cursor()
try:
yield cur
conn.commit() # success → commit
except Exception:
conn.rollback() # failure → roll back, then re-raise
raise
finally:
cur.close()
with transaction(conn) as cur:
cur.execute("INSERT INTO trials VALUES (?,?)", row)
Interview Q&A
with guarantee cleanup on error?__exit__ is called whether the block exits normally or via exception (it receives the exception info). Returning falsy from __exit__ re-raises; the finally in a @contextmanager generator plays the same role.A with expr as v: block is shorthand for calling two dunders around the body. __enter__ runs first and its return value is bound to v; the body runs; then __exit__(exc_type, exc, tb) runs no matter what — normal exit passes three Nones, an exception passes its details. The teardown is guaranteed even on return, break, or a raised exception mid-body.
import time
class Timed:
def __init__(self, label): self.label = label
def __enter__(self):
self.t0 = time.perf_counter()
return self # bound to the 'as' target
def __exit__(self, exc_type, exc, tb):
dt = time.perf_counter() - self.t0
print(f"{self.label}: {dt:.3f}s")
if exc_type is TimeoutError:
print(" (suppressing timeout)")
return True # truthy -> SWALLOW the exception
return False # falsy / None -> let it propagate
with Timed("query") as t:
raise TimeoutError # __exit__ still runs, sees it, returns True -> no crash
print("continued") # reached, because the exception was suppressed
from contextlib import ExitStack
# open a dynamic number of resources, all closed in reverse on exit
with ExitStack() as stack:
files = [stack.enter_context(open(p)) for p in paths]
merged = merge(files) # every file guaranteed closed, even on error
# async resources need __aenter__/__aexit__, driven by 'async with'
from contextlib import asynccontextmanager
@asynccontextmanager
async def lease(pool):
conn = await pool.acquire()
try:
yield conn
finally:
await pool.release(conn)
| __exit__ returns | If body raised | Result |
|---|---|---|
| falsy / None | yes | exception re-raised (the default) |
| truthy (True) | yes | exception suppressed — body looks like it succeeded |
| any value | no | ignored; teardown ran, control continues |
Interview Q&A · deep dive
with desugar to exactly?mgr.__enter__() and bind its result to the as target; run the body inside an implicit try; in the equivalent of finally, call mgr.__exit__(exc_type, exc, tb) with either the live exception's details or three Nones. The return value of __exit__ decides whether a pending exception is re-raised or suppressed.@contextmanager generator map onto __enter__/__exit__?yield is __enter__ (the yielded value becomes the as target); the code after the yield is __exit__. If the body raises, the exception is thrown into the generator at the yield point, so wrapping the yield in try/except/finally gives you the rollback-and-cleanup logic. Re-raising (or not) inside that except is how you choose to propagate or suppress.__exit__ (or catch-and-don't-reraise in a @contextmanager). You should only do it for a specific expected exception (contextlib.suppress(FileNotFoundError) is the clean idiom). Blanket suppression hides real bugs and makes a failing block look successful — almost always the wrong default.ExitStack over nested with statements?with blocks. ExitStack lets you register each opened resource with enter_context as you go and guarantees all of them are exited in reverse order when the stack closes — including correct exception handling — replacing brittle hand-written try/finally ladders.Concurrency & the GIL heavy hitter
CPython's Global Interpreter Lock lets only one thread execute Python bytecode at a time. So threads help I/O-bound work (waiting on network/disk releases the GIL) but not CPU-bound work — for that you need processes.
| Workload | Tool | Why |
|---|---|---|
| Many API calls / I/O wait | asyncio or threads | GIL released while waiting; huge concurrency |
| Heavy CPU (parse, embed, math) | multiprocessing | separate interpreters → true parallelism |
| Mixed / simplest | concurrent.futures | one API, swap Thread/Process pool |
import asyncio, aiohttp
async def fetch(session, url):
async with session.get(url) as r: return await r.json()
async def pull_all(urls):
async with aiohttp.ClientSession() as s:
return await asyncio.gather(*[fetch(s,u) for u in urls])
# 40 registries pulled concurrently; GIL is a non-issue (I/O bound)
Interview Q&A
asyncio scales to thousands of in-flight calls on one thread with low overhead; threads cost ~MBs of stack each and add context-switching. Async wins for high-concurrency I/O — provided the libraries are async-native.The GIL isn't laziness — it's the price of CPython's reference counting. Every object's refcount is mutated constantly; making those increments atomic per-object would need a lock on every object and wreck single-thread speed. One global lock is the cheap alternative. A thread holds the GIL while running bytecode and releases it (a) voluntarily on blocking I/O and many C-extension calls, and (b) involuntarily every few milliseconds (the "check interval", sys.setswitchinterval) so other threads get a turn.
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def cpu(n): # pure-Python CPU work: holds the GIL
return sum(i*i for i in range(n))
def timed(executor, fn, args):
t0 = time.perf_counter()
with executor() as ex:
list(ex.map(fn, args))
return time.perf_counter() - t0
work = [5_000_000] * 4
print("threads:", timed(ThreadPoolExecutor, cpu, work)) # ~no speedup: GIL serializes
print("procs: ", timed(ProcessPoolExecutor, cpu, work)) # ~4x: real parallelism
# swap cpu() for a requests.get() and the THREAD version wins instead
import threading
counter = 0
def bump():
global counter
for _ in range(1_000_000):
counter += 1 # load, add, store: 3 bytecodes — GIL can switch between them
ts = [threading.Thread(target=bump) for _ in range(4)]
for t in ts: t.start()
for t in ts: t.join()
print(counter) # < 4_000_000: lost updates. GIL != thread safety
# fix: guard the shared state with threading.Lock()
| Model | Parallelism | Cost / scale | Best for |
|---|---|---|---|
| asyncio | none (1 thread) | thousands of tasks, KBs each | massive I/O concurrency, async-native libs |
| threads | I/O only | ~MBs of stack each | blocking I/O libs, moderate concurrency |
| processes | true (multi-core) | heavy: separate interpreters, IPC pickling | CPU-bound: parsing, math, embedding |
Interview Q&A · deep dive
sys.setswitchinterval) so a CPU-bound thread can't starve the others. That periodic release is exactly why pure-Python threads round-robin but don't parallelize.x += 1 on shared state races. A few single-bytecode operations happen to be atomic, but you should never rely on that; guard shared mutable state with locks or hand it through a queue.Queue.asyncio (thousands of cheap tasks on one thread). Blocking I/O with only sync libraries → threads. CPU-bound work that needs multiple cores → processes (separate interpreters sidestep the GIL, at the cost of IPC/pickling). concurrent.futures lets you start with a thread pool and swap to a process pool with a one-line change.Python internals — how it really works deep
Past syntax, these are the mechanics that explain Python's behaviour and show up in senior rounds: how memory is freed, how attribute access really works, and the machinery behind classes.
| Mechanism | What's actually happening |
|---|---|
| Reference counting | every object tracks how many references point to it; at zero it's freed immediately. Fast and deterministic, but can't free reference cycles. |
| Generational GC | a second collector finds and frees cyclic garbage, organized into 3 generations (newer objects checked more often) for efficiency. |
| The GIL | one thread runs bytecode at a time, partly so reference counts don't need per-object locks — see the concurrency card. |
| Interning | small ints and some strings are cached and reused, so is can surprise you — compare values with ==. |
# an object defining __get__/__set__ controls attribute access
class Positive:
def __set_name__(self, owner, name): self.n = "_" + name
def __get__(self, obj, owner): return getattr(obj, self.n)
def __set__(self, obj, value):
if value < 0: raise ValueError("must be non-negative")
setattr(obj, self.n, value)
class Account:
balance = Positive() # validation runs on every assignment
class Point:
__slots__ = ("x", "y") # no per-instance __dict__: less memory, no new attrs
C.__mro__ # method resolution order: the exact lookup chain (C3 linearization)
| Tool | When it earns its place |
|---|---|
| Descriptors | reusable managed attributes (validation, lazy load) — what property and ORM fields are built on |
| Metaclasses | customize class creation (registries, enforcing APIs). Rare — "if you wonder whether you need one, you don't." |
| __slots__ | drop the per-instance dict to save memory across millions of small objects |
Interview Q&A
Python is compiled and interpreted. Your .py is parsed to an AST, compiled to bytecode (cached in __pycache__/*.pyc), and that bytecode is run by the CPython evaluation loop — a giant dispatch over opcodes operating on a per-frame value stack. "Interpreted" means there's no machine-code build step you run; the VM executes the bytecode each time.
import dis
def add(a, b): return a + b
dis.dis(add)
# LOAD_FAST a / LOAD_FAST b / BINARY_OP + / RETURN_VALUE
# attribute lookup obj.x walks a precise chain:
# 1) data descriptor on the type (has __set__) -> wins
# 2) instance __dict__ -> obj.__dict__["x"]
# 3) non-data descriptor / class attr on the MRO -> e.g. methods
# 4) __getattr__ fallback -> only if all miss
class C:
cls_attr = 1
def m(self): return 42
c = C(); c.inst_attr = 9
print(c.inst_attr, c.cls_attr, c.m()) # 9 1 42
class Singleton:
_inst = None
def __new__(cls): # allocates/returns the instance (runs BEFORE __init__)
if cls._inst is None:
cls._inst = super().__new__(cls)
return cls._inst
def __init__(self): self.ready = True # initializes the (possibly reused) instance
print(Singleton() is Singleton()) # True
class Plugin:
registry = {}
def __init_subclass__(cls, key, **kw): # runs once per subclass DEFINITION
super().__init_subclass__(**kw)
Plugin.registry[key] = cls # auto-register — no metaclass needed
class Ctgov(Plugin, key="ctgov"): pass
print(Plugin.registry) # {'ctgov': <class 'Ctgov'>}
| Hook | Fires when | Typical use |
|---|---|---|
| __new__ | instance is allocated | immutable subclasses, singletons, caching/interning |
| __init__ | after allocation | normal instance setup |
| __init_subclass__ | a subclass is defined | auto-registration, API enforcement (vs a metaclass) |
| __set_name__ | a descriptor is bound in a class body | descriptor learns its own attribute name |
Interview Q&A · deep dive
.pyc in __pycache__), and CPython's evaluation loop interprets that bytecode on a per-frame value stack. There's no separate native build step you invoke — the VM runs the bytecode — but it is genuinely a compile-then-execute pipeline, not line-by-line interpretation of source text.obj.x attribute resolution.__getattribute__ drives it: first a data descriptor (has __set__/__delete__) found on the type's MRO wins; otherwise the instance __dict__; otherwise a non-data descriptor or plain class attribute on the MRO (this is how methods are found); and only if all of those miss is __getattr__ called as a fallback. That precedence is why property overrides an instance attribute of the same name.__new__ vs __init__?__new__ is the allocator: a static method that creates and returns the instance, running before __init__. __init__ just initializes the already-created instance and returns None. You override __new__ when you must control creation itself — immutable types (subclassing int/tuple/str), singletons, or instance caching — because by the time __init__ runs the object already exists.__init_subclass__?type.__call__, or framework-level magic. But most "do something whenever a subclass is defined" needs — registries, enforcing that subclasses set certain attributes — are cleaner with __init_subclass__ (subclass hook) and __set_name__ (descriptor naming). Reach for a metaclass only when those genuinely can't express it.Python memory & garbage collection — deep dive internals
Python frees memory with two cooperating systems: reference counting (the workhorse — immediate, deterministic) and a generational cycle collector (the backstop for reference cycles). Beneath them sits pymalloc, a tiered allocator tuned for many small, short-lived objects. All three together explain leaks, latency, and why memory won't return to the OS. (Expands the internals card.)
import sys
x = [] # the list object's refcount = 1
y = x # 2 (another name points at it)
sys.getrefcount(x) # 3: x, y, + the temp arg to getrefcount
del y # back to 1
del x # 0 -> object freed IMMEDIATELY (no pause)
| refcount goes up when… | … and down when |
|---|---|
| a new name binds it, it's added to a container, or passed into a function | a name leaves scope or is reassigned, it's removed from a container, or del |
a = {}; b = {}
a["b"] = b; b["a"] = a # a and b reference each other
del a, b # names gone, but each still has refcount 1
# -> unreachable yet NOT freed by refcounting; the cyclic GC handles it
import gc
gc.collect() # force a full collection; returns # objects freed
gc.get_threshold() # (700, 10, 10) -> gen0, gen1, gen2 trigger ratios
gc.get_count() # live (gen0, gen1, gen2) allocation counters
gc.disable() # stop the CYCLIC gc (refcounting still runs)
gc.set_threshold(0) # also disables automatic gen0 collection
| Layer | Size | Role |
|---|---|---|
| Arena | 256 KB | chunk requested from the OS via malloc |
| Pool | 4 KB | a page inside an arena serving one size class |
| Block | one slot | the actual memory handed to a small object |
# reduce footprint
__slots__ # drop per-instance __dict__ on small objects
generators # stream instead of materializing big lists
numpy / array # packed typed buffers vs lists of boxed objects
weakref.WeakValueDictionary() # caches that don't keep objects alive
sys.intern(s) # dedupe many identical strings
# find the leak
import tracemalloc
tracemalloc.start()
snap = tracemalloc.take_snapshot()
snap.statistics("lineno")[:10] # top allocation sites
gc.get_referrers(obj) # what still points at it?
Interview Q&A
Two systems run together. Refcounting reclaims the moment an object becomes unreachable (no pause). The generational collector is only triggered by net allocations crossing a threshold, and only scans container types (the only ones that can form cycles); ints, strings, and floats never participate. The diagram traces one object from birth to either an instant refcount-zero free or promotion through the generations.
import gc
gc.collect() # clean slate
print(gc.get_count()) # (gen0, gen1, gen2) live allocation counters, e.g. (12, 0, 0)
print(gc.get_threshold()) # (700, 10, 10): gen0 collects after ~700 net allocs
class Node: pass
def make_cycle():
a, b = Node(), Node()
a.peer = b; b.peer = a # mutual references -> a cycle
# a, b go out of scope here: refcount stays 1 each, NOT freed
for _ in range(5): make_cycle()
print("freed by cyclic gc:", gc.collect()) # > 0: the cycles refcounting missed
import weakref, tracemalloc
# a parent/child cycle that a normal dict would keep alive forever
class Child:
def __init__(self, parent):
self.parent = weakref.ref(parent) # weak: does NOT bump parent's refcount
# a cache that releases entries when no one else holds them
cache = weakref.WeakValueDictionary()
# diff allocations over time to localize a growing leak
tracemalloc.start()
snap1 = tracemalloc.take_snapshot()
# ... run the suspect workload ...
snap2 = tracemalloc.take_snapshot()
for stat in snap2.compare_to(snap1, "lineno")[:5]:
print(stat) # the lines whose memory grew most -> your leak
| Generation | Holds | Scanned | Idea |
|---|---|---|---|
| gen 0 | newest objects | most often (~700 net allocs) | most objects die here, cheaply |
| gen 1 | gen-0 survivors | after ~10 gen-0 collections | middle-aged, checked less |
| gen 2 | long-lived | rarely | caches, modules — pay scan cost seldom |
Interview Q&A · deep dive
weakref?WeakValueDictionary), back-references in parent/child graphs to avoid cycles, and observer registrations that shouldn't pin observers. A weakref doesn't increment the refcount, so the target can still be collected and the ref then reads as dead.a is b be True for some equal values and False for others?is reports True. Compute the same value at runtime or use a larger int, and you get distinct objects, so is is False — even though == is True. The rule: compare values with ==; use is only for identity and singletons like None.tracemalloc to snapshot allocations and compare_to across time to see which source lines grow — that localizes the leak. Then gc.get_referrers (or objgraph) reveals what still references the leaking objects, since the cause is almost always a lingering reference (global cache, lru_cache, captured closure). Fix by bounding or removing that reference or using weakref; for transient spikes that won't release to the OS, recycle the worker process.Type hints, generics & static checking typing
Type hints are an optional, erased layer: the interpreter stores them in __annotations__ but never enforces them. Value comes from static checkers (mypy, pyright) that read them ahead of runtime to catch None-bugs, wrong shapes, and bad refactors before they ship. The mental model: hints document intent and let a tool prove it; they cost you nothing at runtime unless a library opts in to read them.
Two checkers dominate: mypy (the reference checker) and pyright (Microsoft, powers Pylance in VS Code, very fast). They do flow-sensitive type narrowing: after if x is None: return, the checker knows x is non-None below. Hints are zero-cost at runtime — but tools like pydantic and dataclasses deliberately do read annotations to build validation and __init__. Gradual typing means you can add hints file-by-file and tighten mypy --strict over time.
from typing import Protocol, TypedDict, Optional
class SupportsClose(Protocol): # structural / "static duck typing"
def close(self) -> None: ...
def shutdown(res: SupportsClose) -> None:
res.close() # any object WITH close() type-checks
class Trial(TypedDict): # dict shape known to the checker
id: str
phase: int
sponsor: Optional[str] # Optional[str] == str | None
def label(t: Trial) -> str:
s = t["sponsor"] # type: str | None
if s is None: # narrowing: below here s is str | None
return f("{t['id']} (unsponsored)")
return f("{t['id']} / {s.upper()}") # s narrowed to str — .upper() is safe
# NEW PEP 695 syntax (Python 3.12+): type params inline, no imports
def first[T](items: list[T]) -> T | None:
return items[0] if items else None
class Box[T]: # generic class, no Generic[T] base
def __init__(self, value: T) -> None:
self.value = value
type Result[T] = T | None # PEP 695 lazy type alias
# LEGACY (still valid, pre-3.12): explicit TypeVar + Generic
from typing import TypeVar, Generic
U = TypeVar("U")
class OldBox(Generic[U]):
def __init__(self, value: U) -> None:
self.value = value
print(first([1, 2, 3])) # 1 — checker infers T = int
print(Box("hi").value) # hi — Box[str]
| Construct | Use it for | Note |
|---|---|---|
| Protocol | structural "has these methods" | no inheritance needed (PEP 544) |
| TypedDict | JSON / dict with known keys | still a plain dict at runtime |
| Optional[X] | value or None | alias for X | None |
| Union / | | one of several types | prefer X | Y (3.10+) |
| type X = ... | named alias (PEP 695) | 3.12+, lazily evaluated |
Interview Q&A · deep dive
Protocol?if x is None: return narrows x to its non-None type afterward; isinstance(x, int), assert x, x is not None, and even TypeGuard functions all narrow. It is how a checker proves .upper() is safe on a str | None after a guard.def first[T](...), class Box[T]:, and the type Alias = ... statement — eliminating most explicit TypeVar declarations and the Generic[T] base. The new type aliases are evaluated lazily (forward references just work). The old TypeVar/Generic style still works and is required on older runtimes.TypedDict a real class at runtime?TypedDict value is an ordinary dict — there is no instance type, no isinstance check, and no key enforcement. It exists purely for static checkers to verify keys and value types. Use a dataclass or pydantic model if you want runtime structure/validation.__annotations__ and otherwise ignored by the interpreter, so they don't slow execution. The exception is libraries that opt in to read them: dataclasses, pydantic, and DI frameworks inspect annotations at class-definition time to generate code or validators — that work happens once, at import, not on every call.Exceptions, chaining & error design errors
Exceptions are Python's primary control-flow for failure. The culture is EAFP — "easier to ask forgiveness than permission": just attempt the operation and catch what breaks, rather than pre-checking everything (LBYL). Good error code is mostly about catching narrowly, preserving the original cause, and raising a domain-specific type callers can act on.
All exceptions derive from BaseException; almost everything you should catch derives from Exception. Above it sit SystemExit, KeyboardInterrupt, and GeneratorExit — never swallow these with a bare except:. The four clauses split cleanly: try = risky code, except = handle a specific failure, else = ran only if no exception (keeps the try body minimal), finally = always runs, even on return or re-raise — the place for cleanup.
class TrialError(Exception): # domain base — callers catch this
"""Base for trial-pipeline failures."""
class ParseError(TrialError): # specific subtype
def __init__(self, trial_id: str, reason: str):
self.trial_id = trial_id
super().__init__(f("{trial_id}: {reason}"))
def parse_phase(raw: dict) -> int:
try: # EAFP: attempt, don't pre-check
return int(raw["phase"])
except (KeyError, ValueError) as e:
# raise from: keep original cause in the traceback (__cause__)
raise ParseError(raw.get("id", "?"), "bad phase") from e
try:
parse_phase({"id": "T1", "phase": "x"})
except TrialError as e: # catch the domain base → handles all subtypes
print("handled:", e, "| cause:", repr(e.__cause__))
# Concurrent work can fail in MANY ways at once → ExceptionGroup
def run_batch():
errors = []
for tid in ("T1", "T2", "T3"):
try:
if tid != "T2":
raise ValueError(f("{tid} invalid"))
except Exception as e:
errors.append(e)
if errors:
raise ExceptionGroup("batch failed", errors)
try:
run_batch()
except* ValueError as eg: # except* handles a SUBSET of the group
print("value errors:", len(eg.exceptions)) # value errors: 2
| Style | Means | Best when |
|---|---|---|
| EAFP | try the op, catch failure | races / costly pre-checks (dict, file, DB) |
| LBYL | check before acting | cheap check, no race (validate user input) |
| raise from e | chain, set __cause__ | wrapping a low-level error in a domain one |
| raise from None | suppress the chain | internal detail you don't want leaked |
Interview Q&A · deep dive
except with else vs putting code in the try body?else runs only if the try succeeded, but it is outside the protected region — so an exception it raises is not caught by the same except. Putting that code in the try would accidentally catch its errors too, masking bugs. else keeps the try body minimal and precise about what you're guarding.finally run, and what happens if it contains a return?finally always runs — after normal completion, after a handled or unhandled exception, and even when the try/except has a return. If finally itself executes a return (or raises), it overrides any pending return or in-flight exception — a notorious way to silently swallow errors. Keep finally to cleanup only.raise X from e, plain raise X, and raise X from None?raise X from e sets __cause__ = e ("The above exception was the direct cause..."). A plain raise X inside an except block implicitly sets __context__ ("During handling... another occurred"). raise X from None suppresses the chain entirely — useful when the underlying error is an implementation detail you don't want to leak.ExceptionGroup and except* solve?ExceptionGroup bundles them; except* lets a handler peel off and handle just the matching subtypes while re-raising the rest as a smaller group. It is the foundation of structured concurrency error handling.BaseException?BaseException is the root of everything, including SystemExit, KeyboardInterrupt, and GeneratorExit — control signals you almost never want to intercept. Catching it makes processes un-killable by Ctrl-C and can hang shutdown. Catch Exception (or narrower) so those control-flow exceptions still propagate.dataclasses, namedtuples & enums data
When a class is mostly data with a little behaviour, @dataclass writes the boilerplate for you: __init__, __repr__, __eq__, and optionally ordering and hashing — generated from the annotated fields at class-definition time. Enums give named, type-safe constants instead of magic strings/ints. Picking the right container (dataclass vs namedtuple vs pydantic) is a recurring design call.
The decorator inspects the class's __annotations__ and synthesises dunder methods. frozen=True makes instances immutable (and hashable, so they work as dict keys / set members). slots=True (3.10+) generates __slots__, cutting per-instance memory and blocking accidental new attributes. field(default_factory=list) is the correct way to give a mutable default — sharing one list across instances is the same trap as a mutable default argument. __post_init__ runs after the generated __init__ for validation or derived fields.
from dataclasses import dataclass, field
@dataclass(frozen=True, slots=True) # immutable + memory-lean + hashable
class Trial:
id: str
phase: int = 1
# default_factory: each instance gets its own list (NOT tags=[] — shared!).
# compare=False keeps it out of __hash__ so a list field can't break hashing.
tags: list[str] = field(default_factory=list, compare=False)
code: str = field(init=False, default="") # derived, not a ctor arg
def __post_init__(self):
if self.phase not in (1, 2, 3, 4):
raise ValueError("phase must be 1-4")
# frozen → must use object.__setattr__ to set derived field
object.__setattr__(self, "code", f("{self.id}-P{self.phase}"))
t = Trial("NCT01", 3, ["oncology"])
print(t) # Trial(id='NCT01', phase=3, tags=['oncology'], code='NCT01-P3')
print({t}) # hashable because frozen — works in a set
from enum import Enum, IntEnum, StrEnum, auto
class Status(Enum): # named constants; identity comparison
DRAFT = auto() # auto() → 1, 2, 3...
ACTIVE = auto()
CLOSED = auto()
class Priority(IntEnum): # compares/sorts as ints
LOW = 1; HIGH = 9
class Region(StrEnum): # is-a str → JSON-friendly (3.11+)
US = "us"; EU = "eu"
print(Status.ACTIVE, Status.ACTIVE.value) # Status.ACTIVE 2
print(Priority.HIGH > Priority.LOW) # True — IntEnum sorts
print(Region.EU == "eu") # True — StrEnum equals its str
| Container | Mutable? | Validates? | Reach for it when |
|---|---|---|---|
| NamedTuple | no (tuple) | no | tiny immutable record, tuple-unpack, lightweight |
| @dataclass | yes (or frozen) | only via __post_init__ | internal value objects, stdlib-only |
| pydantic | yes | yes (coerces & validates) | untrusted input: API bodies, config, JSON |
Interview Q&A · deep dive
@dataclass actually generate, and when?__init__, __repr__, and __eq__ by default; with order=True it adds the comparison dunders, and frozen=True makes it immutable and hashable. It is pure code generation from the field declarations — no runtime overhead per call beyond what hand-written dunders would cost.tags: list = [] as a dataclass default?field(default_factory=list) so each instance gets a fresh list.NamedTuple over a dataclass?x, y = point), is indexable, is hashable for free, and has the lowest memory footprint. Choose a dataclass when you need mutability, methods, default factories, or clearer attribute-only semantics; slots=True closes most of the memory gap.Enum, IntEnum, and StrEnum?Enum members are distinct objects compared by identity and are not equal to their underlying value. IntEnum members are ints (sort, compare, do arithmetic) and StrEnum members (3.11+) are strings — handy for JSON serialization and DB columns where the member must behave as its primitive. The trade-off: the mixed-in types compare equal to raw values, which can hide bugs that plain Enum would catch.match/case — structural pattern matching 3.10+
Introduced in Python 3.10 (PEP 634), match is not a switch — it inspects the structure of a value and destructures it, binding parts to names. You match a subject against patterns top-to-bottom; the first matching case runs and there is no fall-through. It shines on shaped data: parsing ASTs, command dispatch, JSON-like payloads, and tagged unions.
Patterns compose: a literal matches a value; a capture (a bare lowercase name) binds whatever is there; a sequence pattern [a, b, *rest] matches and unpacks lists/tuples; a mapping pattern {"type": t} matches dict subsets (extra keys allowed); a class pattern Point(x=0, y=y) matches by type and destructures attributes. Add guards (case p if p.phase > 2) for extra conditions, | for or-patterns, and as to bind a whole sub-pattern. _ is the wildcard default.
from dataclasses import dataclass
@dataclass
class Click: x: int; y: int
@dataclass
class Key: code: str
def handle(event):
match event:
case Click(x=0, y=0): # class pattern + literal
return "origin click"
case Click(x=x, y=y) if x == y: # class pattern + guard
return f("diagonal at {x}")
case Key(code="esc" | "q"): # or-pattern
return "quit"
case {"type": "scroll", "dy": dy}: # mapping pattern (subset)
return f("scroll {dy}")
case [first, *rest]: # sequence pattern + capture
return f("batch of {1 + len(rest)}, head={first}")
case _: # wildcard fallback
return "unknown"
print(handle(Click(3, 3))) # diagonal at 3
print(handle({"type": "scroll", "dy": -4})) # scroll -4
print(handle([1, 2, 3])) # batch of 3, head=1
OK = 200
def classify(status):
match status:
# WRONG: a bare OK here is read as a CAPTURE name, not the constant 200!
# case OK: ... ← would match EVERYTHING and rebind OK
case 200: # compare against a literal — fine
return "ok"
case http.OK: # dotted name → treated as a VALUE to compare
return "ok-const"
case int() as code if code >= 500: # class pattern + as-bind + guard
return f("server error {code}")
case _:
return "other"
| Pattern | Example | Matches |
|---|---|---|
| Literal | case 200: | exact value (== / is for None/True/False) |
| Capture | case x: | anything; binds to x |
| Sequence | case [a, *rest]: | list/tuple; unpacks like assignment |
| Mapping | case {"k": v}: | dict containing key k (extras ok) |
| Class | case Point(x=0): | instance of type + matching attrs |
| Guard | case x if x>0: | pattern matched AND condition true |
Interview Q&A · deep dive
match different from a C-style switch?switch compares a value against constants. match does structural matching: it checks the shape/type of the subject and destructures it, binding inner parts to names (like unpacking). It supports class, sequence, and mapping patterns, guards, and or-patterns. There is also no fall-through — only the first matching case runs.case some_name: match everything?Color.RED) or a literal — otherwise you silently rebind your "constant" and match anything.{"a": x} matches any dict that contains key "a"; extra keys are ignored. A class pattern checks isinstance and only the attributes you name; other attributes are irrelevant. Sequence patterns, by contrast, must match length unless you include a *rest star.Point(0, 0)?__match_args__ tuple, which maps positional sub-patterns to attribute names (dataclasses set it automatically from field order). So case Point(0, y) compares the first __match_args__ attr to 0 and binds the second to y. Keyword sub-patterns (Point(x=0)) bypass __match_args__ entirely.match?handlers[key]()) or an if/elif chain is clearer and faster to read. Also avoid it in libraries that must support Python < 3.10. match pays off specifically when you're branching on the shape of data and want to bind its parts in one step.Regular expressions with re pattern
A regex is a tiny pattern language compiled into a state machine that scans text. In Python you reach for the re module; the skill is knowing which entry point to use (match vs search vs fullmatch), how to capture what you matched, and how to avoid the two classic traps: greedy quantifiers eating too much and catastrophic backtracking hanging the interpreter.
re.match anchors at the start only, re.search scans the whole string for the first hit, and re.fullmatch requires the pattern to consume the entire string. Most "my regex doesn't work" bugs are really "I used match when I meant search". Anchors ^/$ make intent explicit and are usually clearer than relying on which function you called.
import re
# Use a RAW string r"" so backslashes mean regex, not Python escapes.
# Named groups (?P<name>...) give you a dict instead of fragile indexes.
LOG = re.compile(
r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
r"\[(?P<ts>[^\]]+)\]\s+"
r'"(?P<method>[A-Z]+)\s+(?P<path>\S+)"\s+'
r"(?P<status>\d{3})"
)
line = '10.0.0.7 [28/Jun/2026:10:00:00] "GET /api/users" 200'
m = LOG.search(line)
if m:
print(m.group("method"), m.group("path"), m.group("status"))
print(m.groupdict()) # {'ip': '10.0.0.7', 'ts': ..., 'method': 'GET', ...}
# findall returns tuples of groups; finditer yields match objects (better)
errors = [mm.group("path")
for mm in LOG.finditer(line)
if mm.group("status").startswith("5")]
# sub with a callback: redact emails, keep the domain
text = "reach me at sam@acme.io or jo@acme.io"
redacted = re.sub(r"[\w.]+@([\w.]+)",
lambda g: "***@" + g.group(1), text)
print(redacted) # reach me at ***@acme.io or ***@acme.io
import re
# Lookaround asserts context WITHOUT consuming it (zero-width).
# Password rule: 8+ chars, at least one digit and one letter.
pw = re.compile(r"(?=.*[A-Za-z])(?=.*\d).{8,}")
print(bool(pw.fullmatch("alpha123"))) # True
print(bool(pw.fullmatch("alphabet"))) # False — no digit
# Negative lookbehind: a price NOT preceded by a currency code already.
print(re.findall(r"(?<!USD )\d+\.\d{2}", "USD 9.99 and 4.50")) # ['4.50']
# re.VERBOSE: whitespace/comments ignored — document complex patterns.
phone = re.compile(r"""
(\+\d{1,2}\s?)? # optional country code
\(?\d{3}\)?[\s.-]? # area code
\d{3}[\s.-]?\d{4} # local number
""", re.VERBOSE)
print(bool(phone.search("+1 (415) 555-2671"))) # True
| Token | Means | Note |
|---|---|---|
| * + ? | greedy 0+/1+/0-1 | match as much as possible |
| *? +? ?? | lazy variants | match as little as possible |
| (?:...) | non-capturing group | group without a capture slot |
| (?P<n>...) | named capture | read via groupdict() |
| (?=...) (?!...) | lookahead pos/neg | zero-width, no consume |
| (?<=...) (?<!...) | lookbehind pos/neg | must be fixed-width |
Interview Q&A · deep dive
lxml, json, csv, email). Regex can't balance nested delimiters and becomes unmaintainable. Reach for regex on flat, line-oriented, token-level text..* grabs as much as possible then backtracks; lazy .*? grabs the minimum then expands. On "<a><b>", <.*> matches the whole string, while <.*?> matches just <a>. For "first closing tag", lazy (or a negated class <[^>]*>) is correct.finditer over findall?findall returns strings/tuples and loses position and named-group convenience; with groups its return shape changes confusingly. finditer yields match objects lazily — you keep .start(), .span(), .groupdict(), and you don't materialize a huge list for large inputs.^ $ \b) and lookaround ((?=) (?!) (?<=) (?<!)). It lets you match "X followed by Y" while only capturing X — invaluable for validation rules combining multiple independent conditions.re.VERBOSE so you can add whitespace and # comments, name every capture, and store the compiled object at module scope. Combine flags with | (e.g. re.IGNORECASE | re.MULTILINE) or inline as (?im).datetime & timezones done right time
Time is where correct-looking code quietly corrupts data. The core types are datetime, date, and timedelta; the core discipline is the split between naive datetimes (no timezone — ambiguous) and aware ones (carry a tzinfo). The rule that prevents 90% of bugs: store and compute in UTC, convert to local only at the edges for display.
A naive datetime like datetime(2026, 6, 28, 10, 0) means "10:00 — somewhere, who knows". You cannot subtract a naive from an aware one (it raises), and comparing two naives from different zones silently lies. An aware datetime pins the instant. Use zoneinfo (stdlib since Python 3.9, IANA tz database) — the old pytz is no longer needed and had a famous localize() footgun.
from datetime import datetime, timezone, timedelta
from zoneinfo import ZoneInfo # stdlib since 3.9 (IANA tz data)
# ✅ Current instant, explicitly aware in UTC.
now = datetime.now(timezone.utc)
print(now.isoformat()) # 2026-06-28T14:00:00+00:00
# A user submits a local wall-clock time in New York — attach the zone.
ny = ZoneInfo("America/New_York")
local = datetime(2026, 11, 1, 1, 30, tzinfo=ny) # near DST fall-back
# Normalize to UTC for storage / arithmetic.
utc = local.astimezone(timezone.utc)
print(utc.isoformat()) # 2026-11-01T05:30:00+00:00
# timedelta arithmetic is unambiguous in UTC.
deadline = utc + timedelta(days=3, hours=12)
remaining = deadline - now
print(remaining.total_seconds() / 3600, "hours left")
# Display back in the user's zone only at the edge.
print(deadline.astimezone(ny).strftime("%Y-%m-%d %H:%M %Z"))
from datetime import datetime, date
# Prefer fromisoformat for machine input — fast, no format string.
dt = datetime.fromisoformat("2026-06-28T14:00:00+00:00")
# strptime when you must parse a custom human format.
human = datetime.strptime("28/06/2026 09:15", "%d/%m/%Y %H:%M")
# Unix epoch round-trip (epoch is ALWAYS UTC seconds).
ts = dt.timestamp() # float seconds since 1970-01-01 UTC
back = datetime.fromtimestamp(ts, tz=datetime.now().astimezone().tzinfo)
# date math: business-agnostic; .today() is naive — use .date() of an aware dt
age_days = (date(2026, 6, 28) - date(2000, 1, 1)).days
print(age_days) # 9675
| Need | Use | Avoid |
|---|---|---|
| Current instant | datetime.now(timezone.utc) | datetime.utcnow() (naive! deprecated) |
| Attach a zone | ZoneInfo("Area/City") | pytz.localize() |
| Parse machine ISO | fromisoformat | hand-rolled strptime |
| Compare/subtract | both aware, in UTC | mixing naive + aware (raises) |
Interview Q&A · deep dive
tzinfo is None — it's an unanchored wall-clock with no offset, so it can't be unambiguously converted or compared across zones. An aware datetime carries a tzinfo, pinning an exact instant. Mixing them in arithmetic raises TypeError.timedelta(days=1) to a local-aware datetime adds 24 real hours, which may land on a different wall-clock. Do duration math in UTC; only convert to local for display.zoneinfo over pytz?zoneinfo is stdlib (3.9+), uses the OS IANA database, and works correctly with the normal tzinfo= constructor and astimezone. pytz required the non-obvious localize()/normalize() dance because attaching it directly gave a wrong historical offset (LMT). New code should use zoneinfo.1970-01-01T00:00:00Z. It is inherently UTC and tz-free. aware_dt.timestamp() is well-defined; calling .timestamp() on a naive datetime assumes local time, a common source of off-by-offset errors.Power stdlib: itertools · functools · pathlib batteries
"Batteries included" is real leverage: three modules turn verbose loops into declarative, fast, memory-light code. itertools composes lazy iterators in C; functools gives you memoization and partial application; pathlib replaces brittle os.path string-mashing with an object that knows it's a path. Reaching for these first is a hallmark of idiomatic Python.
itertools functions return iterators, not lists — they pull one item at a time, so you can chain them over a multi-gigabyte stream in constant memory. functools.lru_cache trades memory for time by memoizing pure functions. pathlib.Path overloads the / operator for joining and unifies the dozens of os.path helpers into methods — and it's cross-platform without manual separators.
from itertools import chain, groupby, islice, accumulate, pairwise
from operator import itemgetter
rows = [
{"team": "A", "pts": 3}, {"team": "A", "pts": 5},
{"team": "B", "pts": 2}, {"team": "B", "pts": 9},
]
# groupby needs the data PRE-SORTED on the key (it groups runs).
rows.sort(key=itemgetter("team"))
for team, grp in groupby(rows, key=itemgetter("team")):
total = sum(r["pts"] for r in grp)
print(team, total) # A 8 / B 11
# islice: take a window from an infinite/large iterator without a list.
def naturals():
n = 1
while True:
yield n; n += 1
print(list(islice(naturals(), 5, 10))) # [6, 7, 8, 9, 10]
# chain flattens; accumulate runs a running total; pairwise (3.10+) windows.
print(list(chain([1, 2], [3, 4]))) # [1, 2, 3, 4]
print(list(accumulate([1, 2, 3, 4]))) # [1, 3, 6, 10]
print(list(pairwise([1, 2, 3]))) # [(1, 2), (2, 3)]
from functools import lru_cache, partial, reduce
from pathlib import Path
# lru_cache: memoize a pure, expensive function (here, recursion).
@lru_cache(maxsize=None) # functools.cache is the 3.9+ alias for this
def fib(n):
return n if n < 2 else fib(n - 1) + fib(n - 2)
print(fib(50)) # instant; without cache: exponential
print(fib.cache_info()) # hits/misses/maxsize/currsize
# partial: freeze arguments to build a specialized callable.
to_int = partial(int, base=16)
print(to_int("ff")) # 255
# reduce: fold a sequence (use sparingly — a loop is often clearer).
print(reduce(lambda a, b: a * b, range(1, 6))) # 120 = 5!
# pathlib: build, inspect, and read paths cross-platform.
cfg = Path.home() / ".config" / "app" / "settings.toml"
print(cfg.suffix, cfg.stem, cfg.parent.name) # .toml settings app
for py in Path(".").glob("**/*.py"): # recursive glob
if py.stat().st_size > 0:
text = py.read_text(encoding="utf-8") # one call, no open()
| os.path | pathlib equivalent |
|---|---|
| os.path.join(a, b) | Path(a) / b |
| os.path.basename(p) | p.name |
| os.path.splitext(p)[1] | p.suffix |
| os.path.exists(p) | p.exists() |
| glob.glob("*.py") | Path().glob("*.py") |
Interview Q&A · deep dive
itertools.groupby "miss" groups sometimes?uniq. If the data isn't sorted on the grouping key, identical keys scattered through the input produce multiple separate groups. Sort by the same key first.lru_cache?partial better than a lambda?partial is picklable, introspectable (keeps func/args), and signals intent ("pre-bind these args"). A lambda creates a new closure each time and can't be pickled — a problem when passing callables to multiprocessing. Use partial for "specialize an existing function".functools.wraps do and why care?__name__, __doc__, and __wrapped__. @wraps(fn) copies that metadata over so introspection, tracebacks, and tools like Sphinx/pydoc still see the real function.Packaging, venvs & dependency management 2026
Two problems, often conflated. Environment isolation: keep each project's dependencies separate (a venv). Distribution: turn your code into an installable artifact (wheel + sdist) others can pip install. Modern Python has converged on a single declarative file — pyproject.toml (PEP 621) — and a fast new contender, uv, that collapses venv + pip + lock into one tool.
pyproject.toml declares a build backend (hatchling, setuptools, flit, or uv's own). A frontend (pip, build, uv) reads it, spins up an isolated build env, and asks the backend to produce a wheel (a zip you install directly) and an sdist (source tarball). You then upload to PyPI with twine or uv publish. A lockfile (uv.lock, or PEP 751 pylock.toml) pins exact transitive versions for reproducible installs.
# Create and activate an isolated environment (stdlib, no install needed).
python -m venv .venv
# Windows: .venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
# Install, freeze, and reproduce.
pip install "requests>=2.32" rich
pip freeze > requirements.txt # exact pins of what's installed
pip install -r requirements.txt # recreate elsewhere
# Editable/dev install of YOUR package (reads pyproject.toml).
pip install -e "." # changes to source apply live
pip install -e ".[dev]" # with the 'dev' optional-deps group
# --- pyproject.toml ---
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "acme-tools"
version = "1.2.0"
requires-python = ">=3.10"
dependencies = ["requests>=2.32", "click>=8.1"]
[project.optional-dependencies] # installed via .[dev]
dev = ["pytest>=8", "ruff", "mypy"]
[project.scripts]
acme = "acme_tools.cli:main" # creates an `acme` console command
# --- build & publish (shell) ---
# pip-based: python -m build → twine upload dist/*
# uv-based (fastest, Rust): one tool for venv + deps + build + publish
uv init acme-tools # scaffold a standards-compliant project
uv add requests click # resolve + write uv.lock + sync .venv
uv build # produce wheel + sdist in dist/
uv publish # upload to PyPI
| Tool | Role | Use it when |
|---|---|---|
| venv + pip | stdlib baseline | always available; simple scripts/CI |
| uv | all-in-one, Rust, ~10-100x faster | new projects, CI speed, team standard |
| Poetry | library workflow; 2.0+ speaks PEP 621 | publishing libraries to PyPI |
| hatchling | PEP 621 build backend | building wheels/sdists |
| twine | upload artifacts | pip-based publish to PyPI |
Interview Q&A · deep dive
pip install can break OS tools that depend on the system interpreter. A venv is just a directory with its own site-packages and a tweaked path.setup.py, and why?pyproject.toml with the PEP 621 [project] table for metadata and PEP 517/518 for the build-system declaration. setup.py was executable config (arbitrary code at build time — a security and reproducibility hazard); the TOML is declarative, tool-agnostic, and lets any frontend build any backend. Keep setup.py only for programmatic needs like C extensions.uv.lock, requirements.txt from pip freeze, or PEP 751 pylock.toml) for reproducible deploys. Libraries declare ranges in dependencies so consumers can resolve compatibly — over-pinning a library forces dependency conflicts downstream.Poetry — dependency management & packaging tooling
Poetry is an all-in-one project + dependency manager. From a single pyproject.toml it resolves the full dependency graph, writes a lockfile (poetry.lock) for byte-for-byte reproducible installs, manages a per-project virtualenv for you, and builds/publishes wheels. Think of it as the curated, batteries-included alternative to wiring up pip + venv + pip-tools yourself — and a slower-but-mature sibling to the newer Rust tool uv.
# start a project (or `poetry init` inside an existing one)
poetry new trainhub && cd trainhub
# add deps: edits pyproject.toml, re-resolves, updates poetry.lock, installs
poetry add django celery
poetry add --group dev pytest ruff # dev-only group, not shipped to prod
poetry remove celery
# install from the lockfile — every machine gets identical versions
poetry install # app deps + your package (editable)
poetry install --only main --no-root # CI/prod: no dev deps, don't install the project itself
# re-resolve within your constraints and rewrite the lock
poetry update # everything; or `poetry update django`
poetry lock # re-lock without installing
# run things inside the managed venv (no manual activate needed)
poetry run pytest
poetry run python manage.py migrate
poetry env info --path # where the venv lives
# ship it
poetry build # wheel + sdist into dist/
poetry publish -r pypi # upload (configure the token first)
# Poetry 2.x understands the standard [project] table (PEP 621);
# classic projects still use [tool.poetry]. Dependencies + groups:
[tool.poetry.dependencies]
python = "^3.12"
django = "^5.0" # caret: >=5.0.0, <6.0.0 (no breaking major bump)
redis = "~5.1" # tilde: >=5.1.0, <5.2.0 (only patch updates)
[tool.poetry.group.dev.dependencies]
pytest = "*"
ruff = "*"
[build-system] # makes the project pip-installable too
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
| Command | Does |
|---|---|
| poetry add / remove | change a dep — updates both pyproject.toml and the lock |
| poetry install | sync the venv to exactly what poetry.lock says |
| poetry lock | re-resolve and rewrite the lock (no install) |
| poetry update | re-resolve within constraints, bump pins, rewrite lock |
| poetry run / shell | execute inside the managed virtualenv |
| poetry show --tree | visualise the resolved dependency graph |
| poetry build / publish | produce wheel + sdist, upload to an index |
Interview Q&A
pip + venv is the always-available baseline but you manage isolation, resolution, and locking yourself. Poetry bundles all of that — resolver, lockfile, venv, build/publish — behind one tool with a curated workflow, ideal for application and library projects that want reproducibility out of the box. uv is the newer Rust tool that does much the same far faster with a pip-compatible interface; it's increasingly the speed-first choice. The trade-off is maturity/ecosystem familiarity (Poetry) vs raw speed and a single static binary (uv).poetry.lock give you over requirements.txt?requirements.txt is usually hand-maintained top-level deps; unless you also pin transitively (e.g. via pip-tools) you can get different sub-dependency versions across machines. The lock makes installs deterministic and tamper-evident.dev group for test/lint tools separate from runtime main deps. You install selectively: poetry install --only main in production keeps the image small and the attack surface down, while developers get the full set. It replaces the old "extra requirements-dev.txt" pattern with something the resolver understands.^1.2.3 and tilde ~1.2.3?^1.2.3 means >=1.2.3, <2.0.0 (any new minor/patch, no breaking major). Tilde is tighter — ~1.2.3 means >=1.2.3, <1.3.0 (patch-level only). Caret is the common default; tilde is for when you want to pin a minor line.uv — the fast all-in-one Python toolchain 2026
uv (from Astral, the team behind Ruff) is a single Rust binary that collapses pip + venv + pip-tools + pipx + pyenv + much of poetry into one tool — and runs them 10–100× faster. The speed comes from a parallel resolver and a global content-addressed cache that hard-links packages into each venv instead of re-downloading and re-extracting them. There are two ways to drive it: project mode (a pyproject.toml + universal uv.lock) and a pip-compatible interface you can drop into an existing pip workflow with zero changes.
# install the standalone binary — needs neither Python nor Rust
curl -LsSf https://astral.sh/uv/install.sh | sh # macOS / Linux
powershell -c "irm https://astral.sh/uv/install.ps1 | iex" # Windows
# --- project mode: pyproject.toml + uv.lock ---
uv init myapp && cd myapp
uv add django celery # resolve + write uv.lock + sync .venv (auto-creates it)
uv add --dev pytest ruff # dev dependency group
uv remove celery
uv sync # make .venv match uv.lock EXACTLY
uv sync --frozen --no-dev # CI/prod: don't touch the lock, skip dev deps
uv lock # re-resolve and rewrite uv.lock (no install)
uv run pytest # run inside the env — auto-syncs first, no activate
uv run python manage.py migrate
# --- pip mode: a near drop-in replacement for pip + venv ---
uv venv # create .venv in ~10ms (vs ~1s for python -m venv)
uv pip install -r requirements.txt # same flags as pip, 10-100x faster
uv pip install "fastapi>=0.110"
uv pip compile requirements.in --universal -o requirements.txt # pip-tools
uv pip sync requirements.txt # make the env match the file exactly
# --- manage Python interpreters (replaces pyenv) ---
uv python install 3.12 3.13 # download + manage multiple versions
uv python pin 3.12 # writes .python-version for the project
uv run --python 3.13 script.py
# --- run / install CLI tools (replaces pipx) ---
uvx ruff check . # run a tool in a throwaway env (alias for `uv tool run`)
uv tool install ruff # install a CLI tool globally, isolated from projects
FROM python:3.12-slim
# copy the uv binary straight from Astral's published image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
# cache deps separately from source for fast rebuilds
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-install-project --no-dev
COPY . .
RUN uv sync --frozen --no-dev
ENV PATH="/app/.venv/bin:$PATH"
CMD ["python", "-m", "my_service"]
| uv command | Replaces | Does |
|---|---|---|
| uv venv | python -m venv / virtualenv | create a venv near-instantly |
| uv pip install | pip install | drop-in, same flags, far faster |
| uv pip compile / sync | pip-tools | lock a .in file / make env match it |
| uv add / remove | — | edit pyproject + re-resolve + update uv.lock |
| uv sync | — | install env to exactly uv.lock |
| uv lock | — | resolve and write the universal lockfile |
| uv run | activate + run | run a command in the env (auto-syncs) |
| uv python install / pin | pyenv | install + select interpreter versions |
| uvx / uv tool install | pipx | run / install CLI tools in isolated envs |
Interview Q&A
uv pip install vs uv add / uv sync — what's the difference?uv pip install is the imperative, pip-compatible mode — you manage the env yourself, just faster. uv add/uv sync is project mode: add edits pyproject.toml and re-resolves the universal uv.lock; sync makes the environment match that lock exactly. Use the pip interface to speed up an existing workflow; use project mode for reproducible, lock-driven environments.uv.lock vs requirements.txt vs poetry.lock?uv.lock is universal — a single lockfile that resolves for every platform/Python combination, so the same file works on your laptop and a different-OS CI runner. poetry.lock is also a real resolved lock but Poetry-specific. A plain requirements.txt is usually a flat pinned list with no cross-platform guarantees unless you also use pip-tools. uv sync --frozen installs from the lock without re-resolving — the deterministic prod path.uv python install 3.12 3.13 downloads and manages interpreters (pyenv's job), and uv python pin records the chosen version in .python-version; uv run --python 3.13 runs against a specific one. For tools, uvx <tool> runs a CLI in an ephemeral env and uv tool install installs it globally but isolated — the pipx role — so linters/formatters never collide with project dependencies.Logging done right observability
A print goes to one place and tells you nothing about when, where, or how severe. The logging module gives you leveled, routable, formatted records you can dial up in production without touching code. The mental model: a logger creates a record, a filter may drop it, a handler routes it to a destination, and a formatter shapes the text.
Get a logger per module with logging.getLogger(__name__) — that names records by their origin and forms a dotted hierarchy. Records flow up the hierarchy (propagation) to the root logger's handlers. Set levels in two places: the logger's level gates what it emits; each handler's level gates what that destination accepts. Configure handlers/formatters once at startup (ideally via dictConfig), never per call.
import logging
# One logger per module — naming by __name__ gives a clean hierarchy.
log = logging.getLogger(__name__)
# Configure ONCE at the app entry point (not in library modules).
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(levelname)-8s %(name)s:%(lineno)d %(message)s",
)
def charge(user_id, cents):
# Use %-style args, NOT f-strings: formatting is deferred
# and skipped entirely if the level is disabled.
log.info("charging user=%s amount=%d", user_id, cents)
try:
if cents < 0:
raise ValueError("negative amount")
return True
except ValueError:
# exc_info=True attaches the full traceback to the record.
log.exception("charge failed user=%s", user_id)
return False
charge(42, 1500) # INFO ... charging user=42 amount=1500
charge(42, -1) # ERROR ... charge failed + traceback
import logging, json, sys
class JsonFormatter(logging.Formatter):
def format(self, record):
payload = {
"ts": self.formatTime(record),
"level": record.levelname,
"logger": record.name,
"msg": record.getMessage(),
}
# 'extra=' kwargs land as attributes on the record — pull them in.
if hasattr(record, "request_id"):
payload["request_id"] = record.request_id
if record.exc_info:
payload["exc"] = self.formatException(record.exc_info)
return json.dumps(payload)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(handler)
# Attach request-scoped context via extra= (great for correlation IDs).
logging.getLogger("api").info(
"request handled", extra={"request_id": "req-7f3a"})
# {"ts": ..., "level": "INFO", "logger": "api",
# "msg": "request handled", "request_id": "req-7f3a"}
| Level | When to use |
|---|---|
| DEBUG | diagnostic detail for developers; off in prod |
| INFO | normal lifecycle events worth recording |
| WARNING | unexpected but handled; default root level |
| ERROR | an operation failed; needs attention |
| CRITICAL | the app/service may be unable to continue |
Interview Q&A · deep dive
logging better than print?print goes only to stdout with no metadata and no control.logger.propagate = False on the child or configuring handlers only at root.except block call log.exception("msg") (it implies exc_info=True at ERROR level) or any level with exc_info=True. That attaches the current exception and stack to the record so the formatter can render the full traceback.getLogger(__name__) the recommended pattern?logging.getLogger("urllib3").setLevel(WARNING)) and every record self-identifies its origin without hardcoding strings.extra={"request_id": rid} on each call (lands as a record attribute), or install a Filter/contextvar that injects it onto every record automatically. A structured formatter then serializes it, letting you grep/trace one request across many log lines and services.Data Structures & SQL
The complexity table you must have at your fingertips, the standard-library structures that win interviews, the four DSA patterns that solve most screens, and the SQL that keeps your pipelines correct and injection-safe.
Big-O & choosing the structure fundamentals
Pick the structure by the operation you do most. Hashing (dict/set) gives O(1) membership — the single biggest practical speedup, turning O(n²) nested scans into O(n).
| Op | list | dict / set | note |
|---|---|---|---|
| index access | O(1) | — | list by position |
membership x in | O(n) | O(1) | use a set for lookups |
| insert/delete end | O(1) amortised | O(1) | list front is O(n) → use deque |
| search by key | O(n) | O(1) | dict = index |
Interview Q&A
value → index: for each x, check if target − x is already in the dict. Hash lookup replaces the inner loop.collections.deque gives O(1) appends/pops at both ends.A dict/set is a hash table: the key is run through hash(), the low bits index into a backing array of slots, and the value lands there directly — no scan. That's why lookup is average O(1): you compute the slot, you don't search for it. The cost you pay is hashing the key and tolerating collisions (two keys → same slot), which CPython resolves by open addressing + probing. Average O(1) holds while the table stays under its load factor; it resizes (re-hashing everything) when it fills, which is why insert is "amortised" O(1), not worst-case.
Amortised O(1) means: averaged over a long run of operations, each costs O(1) — even though one occasional op is expensive. A list.append is the canonical case: usually free, but when the backing array fills it allocates a bigger one and copies everything (O(n)). Because the array grows geometrically (~1.125× in CPython, doubling in many languages), those copies are rare enough that the per-append average stays constant. Don't confuse amortised O(1) (dict insert, list append) with true worst-case O(1) (list index by position) — an adversarial input or a resize can spike a single op.
| Structure | Access | Search | Insert | Delete | Ordered? |
|---|---|---|---|---|---|
| list | O(1) by index | O(n) | O(1)* end / O(n) mid | O(n) | insertion |
| dict / set | — | O(1) avg | O(1)* avg | O(1) avg | dict: insertion (3.7+) |
| deque | O(n) mid | O(n) | O(1) both ends | O(1) both ends | insertion |
| heapq (list) | O(1) min only | O(n) | O(log n) | O(log n) pop-min | heap order |
| sorted list + bisect | O(1) by index | O(log n) | O(n) shift | O(n) shift | sorted |
import timeit, random
n = 100_000
data = [random.randint(0, n) for _ in range(n)]
as_list = data # membership is O(n)
as_set = set(data) # membership is O(1) avg
needle = -1 # worst case: not present → full scan for list
t_list = timeit.timeit(lambda: needle in as_list, number=1000)
t_set = timeit.timeit(lambda: needle in as_set, number=1000)
print(f"list in: {t_list:.4f}s set in: {t_set:.6f}s")
print(f"set is ~{t_list / t_set:,.0f}x faster") # typically 1000x+ at this n
# The O(n^2) -> O(n) refactor in one screen:
def has_dupe_slow(xs): # O(n^2): scan seen-list each time
seen = []
for x in xs:
if x in seen: return True
seen.append(x)
return False
def has_dupe_fast(xs): # O(n): set membership is O(1) avg
seen = set()
for x in xs:
if x in seen: return True
seen.add(x)
return False
Interview Q&A · deep dive
PYTHONHASHSEED) and open addressing, but the guarantee is amortised average O(1), not worst-case.heapq.nsmallest(k, xs) picks the strategy for you (it sorts when k is close to n).Standard-library power tools stdlib
Knowing these signals fluency. They replace fragile hand-rolled code in interviews and production alike.
| Tool | Solves |
|---|---|
defaultdict(list) | grouping without "if key not in dict" boilerplate |
Counter | frequency counts, .most_common(k) |
deque | O(1) both-ends queue / sliding window |
heapq | top-k / priority queue in O(n log k) |
bisect | keep a list sorted; binary search insert |
from collections import defaultdict, Counter
import heapq
groups = defaultdict(list)
for r in records: groups[r["registry"]].append(r) # group by registry
top = Counter(t["phase"] for t in trials).most_common(3)
top3 = heapq.nlargest(3, scores) # top-k without full sort
Interview Q&A
heapq): push each item, pop when size > k. O(n log k) time, O(k) space — never sort the whole stream.namedtuple gives you a lightweight, immutable, memory-efficient record with named fields — clearer than a tuple of mystery indices, lighter than a class, and it stays hashable so it works as a dict key or set member. deque is a doubly-linked block list: O(1) push/pop at both ends (a plain list is O(n) at the front), plus a maxlen that makes a fixed-size sliding window or a "last N events" buffer trivial.
from collections import namedtuple, deque
# namedtuple: a self-documenting record, still a tuple (hashable, unpackable)
Trial = namedtuple("Trial", "gdcid phase indication")
t = Trial("GDC-91", 3, "NSCLC")
print(t.phase, t[1]) # 3 3 — name OR index
late = {t for t in [t] if t.phase >= 3} # usable in a set: it's hashable
# deque with maxlen: a fixed sliding window that drops the oldest for free
window = deque(maxlen=3)
for x in [10, 20, 30, 40]:
window.append(x) # appending the 4th auto-evicts 10
print(list(window)) # [20, 30, 40]
window.appendleft(5) # O(1) at the front (list would be O(n))
heapq turns a plain list into a binary min-heap in place: heappush/heappop are O(log n) and the smallest item is always at [0]. For a max-heap or a priority queue with a custom key, push (priority, item) tuples (negate for max). bisect does binary search on an already-sorted list — O(log n) to find the insertion point — so you can keep a list sorted as items arrive (insort) or bucket scores into grades without a chain of if.
import heapq, bisect
# priority queue: a scheduler that always pops the most urgent task
pq = []
for prio, name in [(3, "low"), (1, "urgent"), (2, "mid")]:
heapq.heappush(pq, (prio, name))
print(heapq.heappop(pq)) # (1, 'urgent') — smallest priority first
# bisect: classify a value into ordered buckets in O(log n)
cuts = [60, 70, 80, 90]
grade = "FDCBA"
def to_grade(score):
return grade[bisect.bisect_right(cuts, score)]
print([to_grade(s) for s in [55, 73, 95]]) # ['F', 'C', 'A']
| Need | Reach for | Why not the obvious thing |
|---|---|---|
| group rows by key | defaultdict(list) | skips the setdefault/if-in dance |
| count then rank | Counter().most_common(k) | hand-rolled dict + sort is slower & longer |
| queue / sliding window | deque(maxlen=k) | list .pop(0) is O(n) |
| streaming top-k | heapq.nlargest(k, …) | full sort is O(n log n) vs O(n log k) |
| keep a list sorted | bisect.insort | re-sorting after each insert is O(n log n) |
| named record | namedtuple / NamedTuple | cheaper than a class, clearer than a bare tuple |
Interview Q&A · deep dive
defaultdict vs dict.setdefault vs Counter — when each?defaultdict(factory) for repeated grouping/accumulating where the default is built lazily per missing key. setdefault for a one-off default on a plain dict (note it always evaluates its default arg, so it's wasteful in a loop). Counter specifically for integer tallies — it adds most_common, set-like arithmetic, and missing-key-returns-0.heapq implement a max-heap or custom priority?-score) or use heapq.nlargest. For custom priority, push tuples (priority, tiebreak, item); the tiebreak (e.g. an insertion counter) avoids comparing the items themselves when priorities tie.deque.popleft() O(1) but list.pop(0) O(n)?bisect beat a hash set?The four patterns that clear most screens patterns
Most coding screens reduce to one of these. Recognise the pattern from the problem shape, then the code is mechanical.
| Pattern | Signal in the prompt | Idea |
|---|---|---|
| Hashing | "have we seen…", counts, pairs | dict/set for O(1) lookup |
| Two pointers | sorted array, pair/triplet, in-place | converge from both ends |
| Sliding window | longest/shortest substring/subarray | grow/shrink a window, track best |
| BFS / DFS | grid, tree, graph, "connected" | queue (BFS) / stack-recursion (DFS) |
# Sliding window: longest substring of distinct chars
def longest_unique(s):
seen = {}; start = best = 0
for i, ch in enumerate(s):
if ch in seen and seen[ch] >= start:
start = seen[ch] + 1
seen[ch] = i
best = max(best, i - start + 1)
return best
Interview Q&A
The screen is won at the moment of recognition, not the typing. Each pattern has a tell in the prompt; once you name it, the skeleton is muscle memory. The flow below is the triage you run in the first 60 seconds.
# TWO POINTERS — sorted array, find a pair summing to target
def pair_sum(nums, target): # nums sorted ascending
lo, hi = 0, len(nums) - 1
while lo < hi:
s = nums[lo] + nums[hi]
if s == target: return (lo, hi)
elif s < target: lo += 1 # need bigger → move left ptr up
else: hi -= 1 # need smaller → move right ptr down
return None # O(n) time, O(1) space
# HASHING — unsorted two-sum, one pass, value -> index
def two_sum(nums, target):
seen = {}
for i, x in enumerate(nums):
if target - x in seen:
return (seen[target - x], i)
seen[x] = i # O(n) time, O(n) space
return None
from collections import deque
# SLIDING WINDOW — shortest subarray with sum >= target (positives)
def min_window(nums, target):
start = total = 0; best = float("inf")
for end, x in enumerate(nums):
total += x # grow window to the right
while total >= target: # shrink from the left while valid
best = min(best, end - start + 1)
total -= nums[start]; start += 1
return 0 if best == float("inf") else best
# BFS — shortest hops in an unweighted graph (adjacency dict)
def bfs_dist(graph, src):
dist = {src: 0}; q = deque([src])
while q:
node = q.popleft() # FIFO = level order
for nxt in graph[node]:
if nxt not in dist:
dist[nxt] = dist[node] + 1
q.append(nxt)
return dist
# DFS — count connected components, iterative (no recursion-depth risk)
def components(graph):
seen = set(); count = 0
for start in graph:
if start in seen: continue
count += 1; stack = [start]
while stack: # LIFO = depth first
node = stack.pop()
if node in seen: continue
seen.add(node)
stack.extend(graph[node])
return count
Interview Q&A · deep dive
sys.setrecursionlimit is a band-aid that risks a C-level stack overflow.SQL that keeps pipelines correct data
Beyond CRUD, three things separate juniors from seniors: parameterised queries (never string-format user data), indexes (the lever for read latency), and transactions (all-or-nothing writes).
# ✅ parameterised — driver escapes safely, prevents SQL injection
cur.execute("SELECT * FROM trials WHERE phase = ? AND registry = ?",
(phase, registry))
# ❌ never: f-string lets input become SQL
# cur.execute(f"... WHERE phase = '{phase}'")
# index the columns you filter/join on
cur.execute("CREATE INDEX idx_trials_phase ON trials(phase)")
Interview Q&A
A join is a filter over the cross-product of two tables. The join type decides what happens to unmatched rows: INNER keeps only matches; LEFT keeps all left rows and pads the right with NULL; FULL keeps everything. The classic bug is a missing or non-unique join key fanning out rows (a one-to-many becomes a row multiplier) and silently inflating a SUM.
-- LEFT JOIN: every trial, even those with no recorded sites (sites = NULL)
SELECT t.gdcid, t.phase, COUNT(s.site_id) AS n_sites
FROM trials t
LEFT JOIN sites s ON s.gdcid = t.gdcid -- COUNT(s.site_id) ignores NULLs → 0
WHERE t.phase = 3
GROUP BY t.gdcid, t.phase
HAVING COUNT(s.site_id) = 0; -- phase-3 trials with no sites
A window function computes across a set of rows related to the current row but, unlike GROUP BY, keeps every row. That's how you do running totals, rankings within a group, and "compare each row to its group's average" in one pass. The OVER (PARTITION BY … ORDER BY …) clause defines the window.
-- rank trials by enrollment within each phase, keep all rows
SELECT gdcid, phase, enrollment,
RANK() OVER (PARTITION BY phase ORDER BY enrollment DESC) AS rnk,
AVG(enrollment) OVER (PARTITION BY phase) AS phase_avg,
SUM(enrollment) OVER (ORDER BY start_date
ROWS UNBOUNDED PRECEDING) AS running_total
FROM trials;
-- ROW_NUMBER for dedupe: keep the latest row per key
-- ROW_NUMBER() OVER (PARTITION BY gdcid ORDER BY updated_at DESC) = 1
A CTE (WITH … AS (…)) names a subquery so a complex pipeline reads top-to-bottom instead of nesting inside-out; it can also be recursive (org charts, graph reachability). EXPLAIN (and EXPLAIN ANALYZE for real timings) shows the planner's chosen access path — the words you hunt for are Seq Scan (full-table read; usually bad on a big filtered table) vs Index Scan/Seek, plus the join algorithm (nested-loop vs hash vs merge).
WITH site_counts AS ( -- aggregate the many-side ONCE
SELECT gdcid, COUNT(*) AS n_sites
FROM sites GROUP BY gdcid
)
SELECT t.gdcid, t.phase, sc.n_sites
FROM trials t
LEFT JOIN site_counts sc ON sc.gdcid = t.gdcid; -- no fan-out, no inflated sums
EXPLAIN ANALYZE SELECT * FROM trials WHERE phase = 3;
-- look for: Index Scan using idx_trials_phase (good)
-- vs: Seq Scan on trials Filter: (phase = 3) (add an index)
| Isolation level | Prevents | Still allows |
|---|---|---|
| Read Uncommitted | — | dirty reads (sees uncommitted data) |
| Read Committed | dirty reads | non-repeatable reads, phantoms |
| Repeatable Read | + non-repeatable reads | phantoms (in standard SQL) |
| Serializable | everything (acts as if serial) | nothing — most overhead/contention |
Interview Q&A · deep dive
WHERE instead of the ON clause silently turns a LEFT JOIN back into an INNER JOIN — because WHERE right.col = x filters out the NULL-padded rows. Put right-side filters in ON to preserve the outer join.pg_locks/sys.dm_tran_locks.Databases compared — pick the right store decision
"Which database?" is a senior question because the answer is access pattern first, not popularity. The big forks: relational vs NoSQL, transactional (OLTP) vs analytical (OLAP), and strong vs eventual consistency.
| Family | Shape | Reach for it when | Examples |
|---|---|---|---|
| Relational | tables + joins, schema, ACID | structured data, integrity, complex queries | PostgreSQL, MySQL |
| Document | JSON-like, flexible schema | nested, evolving shapes; per-doc access | MongoDB |
| Key-value | hash by key, O(1) | cache, sessions, leaderboards | Redis, DynamoDB |
| Wide-column | rows with dynamic columns, write-optimised | massive writes, time-series at scale | Cassandra, ScyllaDB |
| Search | inverted index, full-text + relevance | text search, log analytics | Elasticsearch, OpenSearch |
| Vector | ANN over embeddings | semantic search, RAG retrieval | pgvector, Qdrant, Pinecone |
| Graph | nodes + edges, traversal | relationships, recommendation, fraud | Neo4j |
| Axis | Left | Right |
|---|---|---|
| OLTP vs OLAP | OLTP: many small read/writes (an app) | OLAP: few huge scans/aggregations (a warehouse: Redshift, Snowflake, BigQuery) |
| ACID vs BASE | ACID: atomic, consistent, isolated, durable (relational) | BASE: basically-available, soft-state, eventually-consistent (many NoSQL) |
| Normalize vs denormalize | normalize: no duplication, integrity (OLTP) | denormalize: duplicate for read speed (OLAP, document) |
| Replication vs sharding | replication: copies for HA + read scale | sharding: split data across nodes for write scale |
Interview Q&A
CAP only describes behaviour during a partition. PACELC extends it: if Partition, trade Availability vs Consistency; Else (normal operation), trade Latency vs Consistency. That "Else" is what you actually feel day-to-day — a strongly-consistent store pays a latency tax on every write (quorum/round-trips) even when nothing is failing. Dynamo-style stores (Cassandra, DynamoDB) are PA/EL: available under partition, low-latency normally, at the cost of consistency. Spanner-style systems are PC/EC: consistent always, latency is the price.
| System | Partition → | Else (normal) → | PACELC |
|---|---|---|---|
| PostgreSQL (single) | consistency | consistency | PC/EC |
| Spanner / CockroachDB | consistency | consistency | PC/EC |
| DynamoDB / Cassandra | availability | latency | PA/EL (tunable) |
| MongoDB (default) | consistency | latency | PC/EL |
"Eventually consistent" is not a single thing — the spectrum runs from strong (every read sees the latest write) through read-your-writes and monotonic reads (you never go backward in time) to plain eventual (replicas converge someday). Many stores let you tune this per query via quorums: with N replicas, choosing read+write quorums where R + W > N guarantees a read overlaps the latest write — strong consistency. Drop below that and you trade staleness for latency and availability.
| Model | Guarantee | Typical use |
|---|---|---|
| Strong / linearizable | read always sees newest write | balances, inventory, locks |
| Read-your-writes | you see your own updates | user editing their profile |
| Monotonic reads | never see older than before | feeds, timelines |
| Eventual | converges, no ordering promise | caches, view counters, DNS |
| Family | Primary index | Write cost | Scales by | Weak at |
|---|---|---|---|---|
| Relational | B-tree | moderate (index upkeep) | vertical + read replicas | huge write throughput, flexible schema |
| Document | B-tree per field | low (single-doc) | sharding by key | cross-document joins/transactions |
| Key-value | hash | very low | consistent-hash sharding | range queries, ad-hoc filters |
| Wide-column | LSM-tree | very low (append) | linear, add nodes | read amplification, ad-hoc joins |
| Graph | adjacency + index | moderate | hard (traversals cross shards) | scale-out, bulk aggregation |
Interview Q&A · deep dive
Snowflake — the cloud data warehouse analytics
Snowflake is a fully-managed cloud data warehouse (OLAP) whose defining idea is the separation of storage and compute: data sits once in cheap cloud object storage, and independent virtual warehouses (compute clusters) read it — so analytics, ETL, and data-science teams scale up/down separately and never block each other. (Builds on Databases compared.)
| Feature | Why it's a big deal |
|---|---|
| Storage ⟂ compute | resize compute in seconds, pay per-second only while a warehouse runs; one team's heavy query can't starve another's |
| Micro-partitions | data auto-split into pruned, columnar chunks — fast scans, no manual indexing/partitioning |
| Time Travel | query or restore data as-of a past timestamp (oops-recovery, audits) within a retention window |
| Zero-copy clone | instant copy of a table/DB sharing storage until changed — spin a full prod-like dev env in seconds |
| Data sharing | share live data with another account without copying — no export/ingest |
| Snowpark · Cortex | run Python/Java in-DB (Snowpark); call LLMs / ML from SQL (Cortex) — bring compute to the data |
-- compute is a named, resizable resource you turn on per workload
CREATE WAREHOUSE etl_wh WITH warehouse_size = 'MEDIUM'
auto_suspend = 60 auto_resume = TRUE; -- pause when idle = save money
-- instant dev copy of prod, no storage cost until you change it
CREATE TABLE trials_dev CLONE trials_prod;
-- query the table as it looked 2 hours ago
SELECT * FROM trials AT(offset => -7200);
-- call an LLM from SQL (Cortex) to summarise rows
SELECT snowflake.cortex.summarize(notes) FROM site_inspections;
| Pick | When |
|---|---|
| Snowflake | multi-cloud, easy ops, strong sharing/cloning, mixed analytics teams |
| BigQuery | deep in GCP, fully serverless, pay-per-query analytics |
| Redshift | deep in AWS, tight S3 / Bedrock integration |
| Databricks | lakehouse + heavy Spark / ML on open formats (Delta / Iceberg) |
Interview Q&A
Snowflake's architecture is three decoupled layers, and almost every interview answer reduces to which one you're talking about. Storage = your data, kept once as compressed columnar micro-partitions in cloud object storage (S3/Azure Blob/GCS). Compute = virtual warehouses, ephemeral MPP clusters you size and suspend per workload. Cloud Services = the always-on brain that does query optimisation, transaction management, security, and metadata. You pay storage and compute on separate meters — that decoupling is the whole product.
Snowflake auto-divides every table into micro-partitions of ~50–500 MB uncompressed, each storing columns separately with per-partition metadata (min/max, distinct counts, nulls). A WHERE filter is answered by partition pruning: the optimiser reads the metadata, skips partitions whose min/max can't match, and only scans the survivors — no B-tree indexes to design or maintain. Data arriving in roughly sorted order prunes well for free; when query patterns drift off the natural order you add automatic clustering on a clustering key so background re-clustering keeps pruning effective. The 2026 engine extends pruning to Iceberg tables, Top-K, and LIKE predicates.
-- keep pruning healthy on a column you filter that isn't the load order
ALTER TABLE events CLUSTER BY (event_date, registry);
SELECT system$clustering_information('events', '(event_date)'); -- check overlap depth
-- semi-structured JSON lands in a VARIANT and is queried with path + FLATTEN
SELECT v:patient:age::int AS age, f.value:dose::float
FROM raw_payloads,
LATERAL FLATTEN(input => v:medications) f;
-- three layers of caching, no config: result cache (24h, exact-query reuse),
-- local SSD cache on the warehouse, and remote storage. Re-run = often free.
-- 2026 AISQL (Cortex) — AI_ as first-class SQL operators over columns
SELECT review_id,
AI_CLASSIFY(body, ['bug', 'praise', 'billing']):labels[0] AS topic,
AI_SENTIMENT(body) AS mood,
AI_COMPLETE('llama3.1-70b', 'Summarise in 8 words: ' || body) AS tldr
FROM reviews
WHERE created >= dateadd(day, -7, current_date());
| Feature | What it does |
|---|---|
| Snowpipe / Snowpipe Streaming | continuous low-latency ingest — files auto-load on arrival, or rows stream in via SDK without staging files |
| Streams + Tasks | a stream is a CDC cursor (what changed since last read); tasks are scheduled SQL — together they build incremental ELT |
| Dynamic Tables | declarative materialised pipelines: you state the query + target freshness, Snowflake handles incremental refresh (AISQL pipelines build on these) |
| Snowpark | DataFrame API for Python/Java/Scala pushed down to the warehouse; SPCS runs full containers (incl. GPUs) next to the data |
| RBAC roles | privileges granted to roles, roles to users — least-privilege, hierarchical; the exam-favourite security model |
Interview Q&A · deep dive
DATE(created_at) suddenly scans the whole table. Why?created_at, not for DATE(created_at), so every partition must be scanned. Rewrite as a half-open range on the bare column (>= day AND < next_day) and pruning returns.Pandas — the complete working reference analysis
Pandas is the workhorse for tabular data in Python. The golden rule: think in vectorised column operations, never row loops. This section covers the whole surface a data role assumes you own — structures, I/O, selection, the core verbs, missing data, dtypes & memory, time series, performance, and the traps that bite everyone.
| Object | What it is |
|---|---|
| Series | a 1-D labelled array (one column) — values + an Index |
| DataFrame | a 2-D table — a dict of Series sharing one Index (rows) and columns |
| Index | the row labels; enables fast alignment, joins, and lookups |
import pandas as pd
df = pd.read_csv("trials.csv") # also read_excel, read_json, read_sql
df = pd.read_parquet("trials.parquet") # columnar: faster + smaller than CSV
df.to_parquet("out.parquet", index=False) # prefer parquet for re-use
chunks = pd.read_csv("huge.csv", chunksize=100_000) # stream a file too big for RAM
df["phase"] # a column (Series)
df[["phase", "status"]] # several columns (DataFrame)
df.loc[df["status"] == "Recruiting", "phase"] # LABEL-based: rows by mask, one col
df.iloc[0:5, 0:3] # POSITION-based: first 5 rows, 3 cols
df[(df.phase == "2") & (df.enrollment > 100)] # boolean AND — wrap each term in ()
df[df.registry.isin(["NCT", "CTRI"])] # membership filter
| Use | When |
|---|---|
| .loc | select by label (column names, index values, boolean mask) — your default |
| .iloc | select by integer position |
| boolean mask | filter rows by condition — combine with & / |, each side parenthesised |
# filter + sort
act = df[df.status == "Recruiting"].sort_values("enrollment", ascending=False)
# groupby + aggregate (split-apply-combine)
g = df.groupby("phase").agg(
n=("nct_id", "count"),
avg_enroll=("enrollment", "mean")) # named aggregations
# merge = SQL join; concat = stack
joined = df.merge(sites, on="nct_id", how="left") # inner|left|right|outer
stacked = pd.concat([jan, feb], ignore_index=True) # append rows
# reshape: long<->wide
wide = df.pivot_table(index="registry", columns="phase", values="enrollment", aggfunc="mean")
long = wide.melt(ignore_index=False) # unpivot back to long
| Verb | SQL analogue | Does |
|---|---|---|
| groupby + agg | GROUP BY | split into groups, compute per group, combine |
| merge | JOIN | combine tables on key(s); pick how |
| concat | UNION / append | stack rows or columns |
| pivot_table | crosstab | long → wide with aggregation |
| melt | unpivot | wide → long |
df.isna().sum() # count nulls per column
df.dropna(subset=["enrollment"]) # drop rows missing a key field
df["enrollment"] = df.enrollment.fillna(df.enrollment.median()) # impute
df.assign(flag=df.sponsor.isna()) # derive a column, chain-friendly
df.info(memory_usage="deep") # see real memory
df["registry"] = df.registry.astype("category") # repeated strings → huge RAM win
df["enrollment"] = pd.to_numeric(df.enrollment, downcast="integer")
df["start"] = pd.to_datetime(df.start) # real datetimes, not strings
ts = df.set_index("start").sort_index()
ts.resample("M")["nct_id"].count() # trials per month
ts["enrollment"].rolling(7).mean() # 7-period moving average
g.groupby("phase")["enrollment"].transform("mean") # broadcast group stat back to rows
out = (df
.query("status == 'Recruiting'")
.assign(yr=lambda d: pd.to_datetime(d.start).dt.year)
.groupby("yr", as_index=False)
.agg(n=("nct_id", "count"))
.sort_values("yr")) # reads top-to-bottom, no temp vars
| Performance lever | Why |
|---|---|
| Vectorise (column ops) | runs in C over NumPy arrays — 10–100× faster than loops |
| Avoid iterrows / apply(axis=1) | per-row Python overhead; last resort only |
| category dtype | shrinks repeated-string memory, speeds groupby/merge |
| chunksize | process files larger than RAM in streams |
| Parquet over CSV | columnar, typed, compressed — faster re-reads |
| Polars / DuckDB | when pandas is too slow/big — know they exist |
Interview Q&A
The card already says "vectorise, not loop" — here is the decision tree underneath it. A true vectorised op runs one C loop over a NumPy buffer. .apply(axis=1) and .iterrows() call back into Python per row and build a Series each time — slowest. .apply on a Series (axis-free) is a Python-level loop too, but cheaper. .map on a Series is element-wise and accepts a dict (great for lookups). For string/date work use the accessors (.str, .dt) which are vectorised, not .apply(str.upper).
import numpy as np
# SLOW: row-wise Python callback, materialises a Series per row
df["band"] = df.apply(lambda r: "big" if r.enrollment > 100 else "small", axis=1)
# FAST: vectorised — np.where over the whole column in C
df["band"] = np.where(df.enrollment > 100, "big", "small")
# many branches: np.select beats a chain of apply()
conds = [df.enrollment > 500, df.enrollment > 100]
labels = ["mega", "big"]
df["band"] = np.select(conds, labels, default="small")
# lookup: .map with a dict is the idiomatic recode, not apply
df["region"] = df.country.map({"US": "NA", "BR": "LATAM"}).fillna("other")
# string/date work: use the vectorised accessors, never apply()
df["sponsor"] = df.sponsor.str.strip().str.upper()
df["qtr"] = df.start.dt.quarter
The card shows the fix; here's the why. Pandas can't always tell whether an indexing result is a view (shares the parent's NumPy buffer) or a copy. Chained indexing df[mask]["col"] = v compiles to two calls: __getitem__ returns a possibly-temporary object, then __setitem__ writes to that — which may be a copy that's discarded, so your write vanishes. The single-call form df.loc[mask, "col"] = v goes through one __setitem__ on the original, which is unambiguous. The other reliable cause is operating on a slice you forgot to .copy(). Pandas 3.0's Copy-on-Write (default) ends the ambiguity: every indexing result behaves like an independent object, the warning is retired, and you opt into mutation explicitly with .copy() or .loc.
# transform: returns a result aligned to the ORIGINAL index (same length)
df["z"] = (df.enrollment - df.groupby("phase").enrollment.transform("mean")) \
/ df.groupby("phase").enrollment.transform("std") # per-group z-score
# filter: keep whole GROUPS by a group-level predicate (not rows)
big = df.groupby("sponsor").filter(lambda g: len(g) >= 10)
# validate a merge so a silent fan-out can't happen
j = df.merge(sites, on="nct_id", how="left",
validate="m:1", # raise if sites isn't unique on key
indicator=True) # _merge col: left_only / both — audit match rate
assert (j._merge == "both").mean() > 0.95 # >95% matched or investigate
| Trap | What goes wrong | Guard |
|---|---|---|
| Duplicate join key | m:m fan-out silently multiplies rows; sums double-count | validate="m:1" / check .duplicated() first |
| NaN != NaN | rows with null keys never match; == on NaN is False | filter nulls before merge; .isna() not == None |
| Index misalignment | arithmetic on two Series aligns by index, injecting NaN | compare indexes, or .reset_index / .values |
| object dtype | silent fallback to Python objects kills speed | check .dtypes; cast category / numeric / datetime |
Interview Q&A · deep dive
.apply actually justified?np.select — e.g. calling an external API, parsing irregular text with branching, or applying a stateful function. Even then prefer a vectorised path first; if you must loop, apply over a Series beats apply(axis=1), and a list comprehension over .to_numpy() can beat both..to_numpy() / .values when you want positional math.agg collapses each group to one value (length = number of groups). transform returns a result broadcast back to the original row index (same length as input) — ideal for per-group normalisation. apply is the flexible, slow fallback that can return any shape. Reach for agg/transform first; they're vectorised per group..copy() / .loc when you do want to mutate.NumPy — the complete array reference numerical core
NumPy is the foundation of the entire Python data/ML stack — Pandas, scikit-learn, PyTorch, and TensorFlow all sit on the ndarray: a fixed-type, contiguous block of memory you operate on in bulk with fast C loops instead of Python loops. Master the array and broadcasting and everything above it makes sense.
import numpy as np
np.array([[1, 2], [3, 4]]) # from a list
np.zeros((2, 3)); np.ones((2, 3)); np.full((2, 2), 9)
np.arange(0, 10, 2) # [0 2 4 6 8]
np.linspace(0, 1, 5) # 5 evenly spaced points
np.eye(3) # identity matrix
| Attribute | Tells you |
|---|---|
| a.shape | dimensions, e.g. (rows, cols) |
| a.ndim | number of axes |
| a.size | total elements |
| a.dtype | element type (int64, float32…) — fixed & uniform |
a = np.arange(12)
a.reshape(3, 4) # 3x4 (use -1 to infer: reshape(3, -1))
a.reshape(3, 4).T # transpose to 4x3 (just swaps strides)
a.ravel() # flatten to 1-D (a view when possible)
a[:, np.newaxis] # add an axis: column vector
a = np.arange(10)
a[2:7:2] # slice start:stop:step
a[[0, 3, 9]] # fancy: pick indices
a[a > 5] # boolean mask: elements over 5
a[a > 5] = 0 # assign through a mask
np.where(a > 5, 1, 0) # vectorized if/else
# shapes align from the RIGHT; a dim of 1 stretches to match
A = np.ones((3, 4)) # (3, 4)
b = np.array([1, 2, 3, 4]) # (4,) stretches across rows
A + b # each row gets b added, no loop
col = np.array([[10], [20], [30]]) # (3, 1) stretches across columns
A + col # each column gets col added
a = np.arange(6).reshape(2, 3)
np.exp(a); np.sqrt(a); a ** 2 # element-wise, vectorized
a.sum() # 15 (everything)
a.sum(axis=0) # down columns: shape (3,)
a.sum(axis=1) # across rows: shape (2,)
a.mean(); a.std(); a.max(axis=1)
| Need | Function |
|---|---|
| Stack / split | np.concatenate · vstack · hstack · stack · split |
| Linear algebra | a @ b · np.linalg.inv · solve · eig · svd · norm |
| Random | rng = np.random.default_rng() then rng.normal · rng.choice |
| Type / clip | a.astype(np.float32) · np.clip(a, lo, hi) |
Interview Q&A
The card states the rule; here is the exact procedure NumPy runs. (1) Right-align the two shapes and left-pad the shorter with 1s. (2) For each dimension, sizes are compatible if they're equal or one is 1; otherwise raise. (3) The output dimension is the max of the two; a size-1 dimension is virtually stretched by setting its stride to 0 — no data is copied, the same element is re-read. That zero-stride trick is why (1000,1) + (1,1000) produces a million-element result without a million-element temporary for either input.
import numpy as np
a = np.arange(3).reshape(3, 1) # (3,1)
b = np.arange(4) # (4,) -> padded to (1,4)
(a + b).shape # (3,4) outer-style grid, no python loop
# classic use: pairwise distances without a loop
pts = np.random.default_rng(0).normal(size=(5, 2))
diff = pts[:, None, :] - pts[None, :, :] # (5,1,2)-(1,5,2) => (5,5,2)
dist = np.sqrt((diff ** 2).sum(axis=-1)) # (5,5) distance matrix
# trap: (n,) and (n,1) do NOT broadcast the way you expect
x = np.arange(3) # (3,)
(x + x[:, None]).shape # (3,3) outer sum, not element-wise! keep ranks explicit
An ndarray is a view over a flat buffer plus three pieces of metadata: shape, dtype, and strides (bytes to step per axis). .T, basic slices, and reshape (when possible) just synthesise new strides over the same buffer — O(1), zero copy. .reshape must copy only when the requested layout isn't expressible as strides on the current buffer (e.g. reshaping a non-contiguous transpose); .ravel() returns a view if it can, .flatten() always copies. C-order (row-major, default) stores rows contiguously; F-order (column-major) stores columns contiguously — and the difference decides cache performance: summing along the contiguous axis is dramatically faster.
a = np.arange(12).reshape(3, 4)
a.strides # (32, 8) on int64: 32B per row, 8B per col
a.T.strides # (8, 32) — transpose just swaps strides, no copy
a.flags["C_CONTIGUOUS"], a.T.flags["C_CONTIGUOUS"] # True, False
# contiguity drives speed: same data, ~order-of-magnitude gap on big arrays
big = np.random.default_rng(0).normal(size=(4000, 4000))
big.sum(axis=1) # fast: walks contiguous rows (C-order)
big.sum(axis=0) # slower: strides across rows, cache-unfriendly
# a stride trick: sliding windows with ZERO copy (use the safe helper)
from numpy.lib.stride_tricks import sliding_window_view
w = sliding_window_view(np.arange(6), 3) # [[0,1,2],[1,2,3],[2,3,4],[3,4,5]]
Ufuncs (np.add, np.exp, …) are compiled element-wise kernels with broadcasting, type promotion, and an out= parameter built in. np.vectorize is not vectorisation — it's a convenience wrapper around a Python loop, so it gives the API but not the speed; reach for real ufuncs, np.where/np.select, or Numba instead. Two real levers: ufunc methods like .reduce / .accumulate / .outer, and in-place ops (out= or +=) to avoid allocating a fresh array on hot paths.
a = np.arange(1, 5, dtype=np.float64)
np.add.reduce(a) # 10.0 (same as a.sum, the reduction form)
np.multiply.outer([1,2], [3,4]) # outer product via ufunc method
# in-place: no new allocation, writes back into a (mind the dtype!)
np.exp(a, out=a) # a is overwritten with exp(a)
a /= a.sum() # normalise in place
Interview Q&A · deep dive
(3,1) + (1,4) broadcasts and what it costs.(3,4); each size-1 axis is stretched by setting its stride to 0 so the single element is re-read rather than copied. No large temporaries for the inputs are materialised — only the (3,4) output. That zero-stride mechanism is what makes outer-style operations memory-cheap.reshape copy, and how do you guarantee no copy?reshape returns a view when the new shape is expressible as strides over the existing buffer; it must copy when it isn't — e.g. reshaping a transposed (non-contiguous) array to a shape that would require reordering bytes. To guarantee a view you can assign to a.shape (raises instead of silently copying), and to control layout use np.ascontiguousarray first.axis=0 of a big C-order array slower than axis=1?axis=1 walks each row in cache-friendly contiguous steps; axis=0 strides across rows (one row-length jump per element), thrashing the cache. Same FLOPs, very different memory access pattern — layout, not arithmetic, dominates. Transposing/copying to the favourable order can pay off if you reduce repeatedly.np.vectorize a real speedup? What do you use instead?for loop for ergonomics, not performance. For speed use genuine ufuncs and array expressions, np.where/np.select for branching, boolean masking for filters, and Numba or Cython when the logic truly can't be expressed in array ops.a[[0,2]]) and boolean masking return copies, so those writes don't. When you need independence, call .copy(); when you want the in-place effect, slice deliberately.SQLAlchemy — Python SQL toolkit & ORM orm
SQLAlchemy is two layers: Core (a Pythonic SQL expression language) and the ORM (map classes to tables). The modern 2.0 style uses typed declarative models, a Session (the unit of work), and select(). It's the default data layer behind most Flask and FastAPI apps.
from sqlalchemy import create_engine, select, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, Session
class Base(DeclarativeBase): pass
class Trial(Base):
__tablename__ = "trials"
id: Mapped[int] = mapped_column(primary_key=True)
title: Mapped[str] = mapped_column(String(200))
engine = create_engine("postgresql+psycopg://...")
with Session(engine) as s:
rows = s.scalars(select(Trial).where(Trial.id == 42)).all()
s.add(Trial(title="new")); s.commit() # unit of work: flush + commit
| Concept | What it is |
|---|---|
| Core vs ORM | Core = compose SQL in Python (full control); ORM = map classes to rows (productivity). They interoperate. |
| Session | the unit of work — tracks changes and flushes them on commit; the transactional boundary |
| relationship() | declares links between models so you traverse trial.sponsor in Python |
| Engine + pool | manages connections; pooling reuses them instead of reconnecting per query |
| Alembic | the migration tool — versioned schema changes from your models |
Interview Q&A
A Session is a workspace plus a transaction. Every object it loads or adds is tracked in its identity map — a dict keyed by class+primary-key — so two queries for the same row return the same Python object, and the session knows exactly what changed. Objects move through states: transient (new, not added) → pending (added, not yet in DB) → persistent (flushed/loaded, has a PK, tracked) → detached (session closed) / deleted. A flush emits the pending INSERT/UPDATE/DELETE SQL in dependency order inside the transaction; commit flushes then commits. Flush ≠ commit — autoflush sends SQL before a query so your reads see your own writes, but nothing is durable until commit.
from sqlalchemy import ForeignKey, select, func
from sqlalchemy.orm import (DeclarativeBase, Mapped, mapped_column,
relationship, Session, selectinload, joinedload)
class Base(DeclarativeBase): pass
class Sponsor(Base):
__tablename__ = "sponsors"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str]
# lazy="raise" makes any accidental lazy load blow up loudly in dev/tests
trials: Mapped[list["Trial"]] = relationship(back_populates="sponsor", lazy="raise")
class Trial(Base):
__tablename__ = "trials"
id: Mapped[int] = mapped_column(primary_key=True)
title: Mapped[str]
sponsor_id: Mapped[int] = mapped_column(ForeignKey("sponsors.id"))
sponsor: Mapped[Sponsor] = relationship(back_populates="trials")
with Session(engine) as s:
# N+1 FIX: one query for sponsors + one batched IN() query for all trials
stmt = select(Sponsor).options(selectinload(Sponsor.trials))
for sp in s.scalars(stmt):
print(sp.name, len(sp.trials)) # no extra query per sponsor
# many-to-one: joinedload pulls parent+child in ONE join
t = s.scalars(select(Trial).options(joinedload(Trial.sponsor))).first()
# aggregate in SQL, not Python (2.0 select + func)
counts = s.execute(
select(Trial.sponsor_id, func.count())
.group_by(Trial.sponsor_id)).all()
| Loader | Emits | Best for |
|---|---|---|
| lazy (default) | one query per access | rarely-touched relations; the N+1 source under loops |
| selectinload | 2nd SELECT with IN (pks) | collections (one-to-many / many-to-many) — the default eager choice |
| joinedload | LEFT OUTER JOIN, one query | many-to-one / one-to-one scalars |
| contains_eager | none — you wrote the JOIN | reuse a join you already filtered on; pair with populate_existing |
| lazy="raise" | raises on access | forcing every load to be explicit — catch N+1 in tests, not prod |
# one-time
alembic init migrations
# point env.py target_metadata = Base.metadata, then autogenerate a diff
alembic revision --autogenerate -m "add sponsor.country"
alembic upgrade head # apply; alembic downgrade -1 to roll back one
alembic current; alembic history --verbose
Interview Q&A · deep dive
expire_on_commit=False).IN query, no row multiplication, plays well with LIMIT). Scalar many-to-one compresses, so joinedload (single LEFT JOIN) is ideal. Using joinedload for a collection inflates rows and breaks pagination; using selectinload for a single scalar adds an unnecessary round trip.lazy="raise" (or raiseload()) so any unplanned lazy load raises in tests/dev; declare loader strategy explicitly at the query with .options(); and assert query counts in integration tests (e.g. via sqlalchemy event hooks or a fixture that counts statements). That turns a latent perf bug into a hard failure during review.AsyncSession with create_async_engine and await session.execute(...). The catch: implicit lazy loads don't work under async (a lazy load is sync I/O), so you must eager-load with selectinload/joinedload or use AsyncSession.run_sync / write-only relationships. That constraint is exactly why explicit loading discipline matters even more in async services.Linked lists pointers
A linked list is a chain of nodes, each holding a value and a reference (next) to the following node. There is no contiguous block and no index — you reach element k by walking k hops from the head. That single property explains every trade-off: O(1) insert/delete once you hold the node, but O(n) to find a position and terrible cache locality versus a Python list (a contiguous array).
A singly linked list stores only a forward pointer; a doubly linked list adds a prev pointer so you can delete a node in O(1) without first walking to its predecessor — the reason collections.deque and an LRU cache use one internally. The cost is an extra pointer per node and two links to fix on every edit. A sentinel/dummy head node (a fake node before the real first one) removes almost all "is this the head?" special-casing — senior code uses one by default.
class Node:
def __init__(self, val, nxt=None):
self.val, self.next = val, nxt
class LinkedList:
def __init__(self):
self.head = None
def push_front(self, val): # O(1) insert at head
self.head = Node(val, self.head)
def delete(self, target): # O(n) find, O(1) unlink
dummy = Node(None, self.head) # sentinel kills head edge-case
prev, cur = dummy, self.head
while cur:
if cur.val == target:
prev.next = cur.next # skip the node = delete
break
prev, cur = cur, cur.next
self.head = dummy.next
def reverse(self): # O(n) time, O(1) space
prev, cur = None, self.head
while cur:
cur.next, prev, cur = prev, cur, cur.next # flip one link
self.head = prev
def __iter__(self):
cur = self.head
while cur:
yield cur.val
cur = cur.next
ll = LinkedList()
for x in [3, 2, 1]: ll.push_front(x)
ll.reverse()
print(list(ll)) # [3, 2, 1]
Many list problems collapse if you walk two pointers at different speeds. Move fast two steps and slow one: when fast hits the end, slow sits at the middle (one pass, no length needed). The same trick finds the k-th from the end (start fast k nodes ahead) and detects a cycle (Floyd's tortoise & hare — if there is a loop the two pointers must eventually meet).
def has_cycle(head): # Floyd's algorithm, O(1) space
slow = fast = head
while fast and fast.next:
slow = slow.next # 1 hop
fast = fast.next.next # 2 hops
if slow is fast: # they collided → loop exists
return True
return False
| Op | Linked list | Array (list) | Why |
|---|---|---|---|
| access by index | O(n) | O(1) | array = pointer arithmetic; list = walk |
| insert/delete at head | O(1) | O(n) | array shifts every element |
| insert/delete mid (node held) | O(1) | O(n) | list relinks; array shifts |
| cache / memory | poor, scattered | contiguous | array wins real-world scans |
Interview Q&A · deep dive
fast reaches None first and you return early. Time O(n), space O(1).head and advance both one step at a time; they meet at the cycle entry. It works because the distance from head to entry equals the distance from the meeting point to the entry (modulo loop length) — the classic two-phase Floyd proof.tail pointer; repeatedly splice whichever input node is smaller onto tail.next and advance. You re-use the existing nodes (no allocation), so it's O(n) time, O(1) auxiliary space — the building block of merge sort on lists.list usually faster than a linked list even for inserts?Stacks & queues LIFO / FIFO
Two restricted lists defined by where you add and remove. A stack is LIFO (push/pop the same end) — the shape of recursion, undo, and bracket matching. A queue is FIFO (enqueue one end, dequeue the other) — the shape of BFS, task pipelines, and fair scheduling. In Python a plain list is a fine stack; for a queue use collections.deque so both ends are O(1).
A Python list appends in amortised O(1) but list.pop(0) is O(n) — it shifts every remaining element left. A deque is a doubly linked list of fixed-size blocks, so append/appendleft/pop/popleft are all O(1). Use a list for a stack (pop from the end is O(1)); use a deque the moment you touch the front. For thread-safe producer/consumer hand-off, reach for queue.Queue instead — it adds locking and blocking.
def is_balanced(s):
pairs = {')': '(', ']': '[', '}': '{'}
stack = []
for ch in s:
if ch in '([{':
stack.append(ch) # open → push
elif ch in pairs:
if not stack or stack.pop() != pairs[ch]:
return False # mismatch or nothing to close
return not stack # all opens were closed
print(is_balanced("a(b[c]{d})")) # True
print(is_balanced("([)]")) # False — wrong nesting
from collections import deque
def bfs(graph, start): # level-order, shortest hops
seen, q, order = {start}, deque([start]), []
while q:
node = q.popleft() # FIFO → explore nearest first
order.append(node)
for nb in graph[node]:
if nb not in seen:
seen.add(nb)
q.append(nb)
return order
def next_greater(nums): # monotonic decreasing stack, O(n)
res, stack = [-1] * len(nums), [] # stack holds indices
for i, x in enumerate(nums):
while stack and nums[stack[-1]] < x:
res[stack.pop()] = x # x is the next-greater for that index
stack.append(i)
return res
print(next_greater([2, 1, 3, 0])) # [3, 3, -1, -1]
| Need | Use | Ops | Note |
|---|---|---|---|
| Stack (LIFO) | list | append / pop() | pop from end is O(1) |
| Queue (FIFO) | deque | append / popleft() | both ends O(1) |
| Double-ended | deque | both ends | sliding-window, monotonic deque |
| Thread-safe handoff | queue.Queue | put / get (blocking) | locks + backpressure |
Interview Q&A · deep dive
in stack and an out stack. Push to in; to dequeue, if out is empty pour all of in into out (reversing order), then pop out. Each element is moved at most once, so dequeue is amortised O(1) even though a single transfer is O(n).deque(maxlen=n) auto-evicts — a free ring buffer), or thread-safe-ish single appends. The cost of a deque is no O(1) random indexing.RecursionError. An explicit list-as-stack moves the frames to the heap, lifting the limit to available memory and often running faster.Hash tables internals O(1) avg
A hash table turns a key into an array slot in one shot: run the key through hash(), fold the digest down to an index, store the entry there. No scan — you compute the location. That's why dict/set lookup is average O(1). The whole engineering problem is what happens when two keys land in the same slot (collisions) and how the table grows (resizing) to keep collisions rare.
Two strategies resolve a collision. Separate chaining: each slot holds a small list (or tree) of entries that hashed there — simple, degrades gracefully, but pointer-chases and wastes memory (used by Java's HashMap). Open addressing: keep everything in one array and, on a collision, probe to another slot by a deterministic rule (linear, quadratic, or double hashing) — cache-friendly, no per-entry allocation, but suffers clustering and needs tombstones on delete. CPython's dict uses open addressing with a perturbation probe sequence.
The load factor α = entries / slots measures how full the table is. As α rises, collisions and probe lengths grow; performance stays O(1) only while α is bounded. So the table resizes (allocates a bigger array, ~2× or 4×, and re-hashes every entry into it) when α crosses a threshold — about 2/3 for CPython dicts, 0.75 for Java. A resize is O(n), but because the array grows geometrically the cost amortises to O(1) per insert. This is exactly why dict insertion is "amortised O(1)," not worst-case.
class HashMap:
def __init__(self, cap=8):
self._cap = cap
self._n = 0
self._slots = [None] * cap # each slot: None or (key, value)
def _index(self, key):
i = hash(key) & (self._cap - 1) # fold digest to a slot
while self._slots[i] is not None and self._slots[i][0] != key:
i = (i + 1) & (self._cap - 1) # linear probe, wrap around
return i
def put(self, key, value):
if (self._n + 1) / self._cap > 0.66: # load factor > 2/3
self._resize()
i = self._index(key)
if self._slots[i] is None:
self._n += 1
self._slots[i] = (key, value)
def get(self, key, default=None):
i = self._index(key)
slot = self._slots[i]
return slot[1] if slot else default
def _resize(self):
old = [s for s in self._slots if s]
self._cap *= 2
self._slots = [None] * self._cap
self._n = 0
for k, v in old: # re-hash everything into bigger table
self.put(k, v)
m = HashMap()
for i in range(20): m.put(f"k{i}", i * i)
print(m.get("k7"), m.get("nope", -1)) # 49 -1
| Chaining | Open addressing | |
|---|---|---|
| Storage | array of lists | single flat array |
| Cache | pointer-chasing | cache-friendly |
| Delete | just unlink | needs tombstone |
| Best load factor | can exceed 1 | keep below ~0.7 |
| Used by | Java HashMap | CPython dict |
Interview Q&A · deep dive
h & (size-1) is a single bitwise op, far cheaper than %. It only uses the low bits, so it relies on the hash already mixing high bits down — CPython adds a "perturbation" that folds high bits into the probe sequence. Prime-sized tables tolerate weaker hashes via modulo's mixing but pay for the division. It's a hash-quality vs arithmetic-cost trade.Heaps & priority queues priority
A binary heap is a complete binary tree stored in a flat array where every parent is ≤ (min-heap) or ≥ (max-heap) its children. That single invariant gives you the smallest (or largest) element at index 0 in O(1) and lets you push/pop in O(log n) — the engine behind a priority queue. Python's heapq turns any list into a min-heap in place; no separate class needed.
Because the tree is complete, you don't store child pointers — arithmetic finds them. For node at index i: parent is (i-1)//2, children are 2i+1 and 2i+2. That's why a heap is just a list. The two repair operations keep the invariant: sift-up (a freshly pushed leaf bubbles toward the root while smaller than its parent) and sift-down (after popping the root, the last element drops to the top and sinks past its smaller child). Both touch one root-to-leaf path → O(log n).
import heapq
nums = [5, 1, 8, 3, 9, 2]
heapq.heapify(nums) # O(n), in place → min-heap
heapq.heappush(nums, 0) # O(log n)
print(heapq.heappop(nums)) # 0 → smallest, O(log n)
# Top-k largest with a bounded MIN-heap of size k: O(n log k), O(k) space
def top_k(stream, k):
h = []
for x in stream:
if len(h) < k:
heapq.heappush(h, x)
elif x > h[0]: # bigger than the smallest kept?
heapq.heapreplace(h, x) # pop min + push x, one sift
return sorted(h, reverse=True)
print(top_k([5, 1, 8, 3, 9, 2], 3)) # [9, 8, 5]
# Max-heap or tie-broken priority queue: push (priority, counter, item)
import itertools
counter = itertools.count()
pq = []
for prio, task in [(2, "b"), (1, "a"), (2, "c")]:
heapq.heappush(pq, (prio, next(counter), task)) # counter breaks ties, avoids comparing tasks
while pq:
print(heapq.heappop(pq)[2], end=" ") # a b c → priority 1 first, then FIFO within ties
| Operation | heapq | Cost | Note |
|---|---|---|---|
| peek min | h[0] | O(1) | root is always smallest |
| push | heappush | O(log n) | sift-up |
| pop min | heappop | O(log n) | sift-down |
| build from list | heapify | O(n) | not O(n log n) |
| top-k | nlargest(k, …) | O(n log k) | beats full sort when k≪n |
Interview Q&A · deep dive
heapify O(n) and not O(n log n)?heapify is strictly better.(priority, counter, item) instead of just (priority, item)?__lt__) you get a TypeError. A monotonically increasing counter is always comparable, breaks ties deterministically (FIFO within a priority), and stops Python from ever comparing the items.heapq has no decrease-key, you don't update an existing entry; you push a new (dist, node) and, on pop, skip any entry whose distance is stale (worse than the best already finalised). Simpler than a real decrease-key and fast enough.Trees, BST & traversals trees
A tree is a connected acyclic graph with one root; each node points to children. A binary search tree keeps an ordering invariant — everything in the left subtree is smaller, everything right is larger — which turns search, insert and delete into O(h) where h is height. Keep the tree balanced and h ≈ log n; let it degrade to a linked list and h = n. Closely related: complexity tradeoffs and the heapq priority queue.
A BST is not "a tree that holds sorted data" — it is a tree where every node satisfies left < node < right recursively. That single rule is what lets you discard half the tree at each step (binary search on a structure). The moment the invariant is violated, search is just an O(n) walk. Three operations preserve it: insert descends to a leaf slot; search compares and branches; delete has three cases (leaf, one child, two children → replace with in-order successor).
from collections import deque
class Node:
def __init__(self, key):
self.key, self.left, self.right = key, None, None
def insert(root, key):
if root is None: return Node(key)
if key < root.key: root.left = insert(root.left, key)
elif key > root.key: root.right = insert(root.right, key) # dup ignored
return root
def search(root, key):
while root and root.key != key:
root = root.left if key < root.key else root.right
return root # Node or None — O(h)
def inorder(n): # sorted order for a BST!
if n: yield from inorder(n.left); yield n.key; yield from inorder(n.right)
def level_order(root): # BFS — uses a queue, not recursion
q, out = deque([root] if root else []), []
while q:
n = q.popleft(); out.append(n.key)
if n.left: q.append(n.left)
if n.right: q.append(n.right)
return out
root = None
for k in (8, 3, 10, 1, 6, 14, 4): root = insert(root, k)
print(list(inorder(root))) # [1, 3, 4, 6, 8, 10, 14] ← sorted
print(level_order(root)) # [8, 3, 10, 1, 6, 14, 4] ← by depth
print(bool(search(root, 6)), bool(search(root, 7))) # True False
| Traversal | Order rule | Classic use |
|---|---|---|
| In-order | left, node, right | BST → emit keys sorted |
| Pre-order | node, left, right | serialize / copy a tree |
| Post-order | left, right, node | delete / size / evaluate expr |
| Level-order | BFS by depth (queue) | shortest path in unweighted tree, "by row" |
A plain BST has no self-healing: insert 1,2,3,4,5 in order and you get a degenerate right-leaning chain with h = n — search is O(n). Self-balancing trees fix this by rotating on insert/delete to keep h = O(log n). An AVL tree is strictly balanced (heights of siblings differ by ≤ 1) → fastest lookups, more rotations. A red-black tree is loosely balanced (the rule Python's sortedcontainers and most language maps use, e.g. Java TreeMap, C++ std::map) → fewer rotations, great for write-heavy workloads. A trie (prefix tree) is a different beast: keys are paths of characters, so prefix lookup is O(key length) regardless of how many words are stored — the backbone of autocomplete and IP routing.
class Trie:
def __init__(self): self.root = {} # nested dict of chars
def add(self, word):
node = self.root
for ch in word: node = node.setdefault(ch, {})
node["$"] = True # end-of-word marker
def starts_with(self, prefix):
node = self.root
for ch in prefix:
if ch not in node: return False
node = node[ch]
return True
t = Trie(); t.add("cat"); t.add("car")
print(t.starts_with("ca"), t.starts_with("do")) # True False
Interview Q&A · deep dive
left, node, right and the BST invariant guarantees all of left < node < all of right at every level. Recursively that emits the smallest subtree first, then the node, then the larger subtree — exactly ascending order.queue and explores by depth — pick it for shortest-path-in-edges or "process by row". DFS (pre/in/post) uses a stack (often the call stack via recursion) and goes deep first — pick it for path-existence, subtree aggregation, or when the answer is near the leaves.Graphs: BFS, DFS, Dijkstra & friends graphs
A graph is vertices joined by edges — directed or not, weighted or not. Almost every "find a path / detect a cycle / order tasks / spread through a network" problem is a graph problem in disguise. The two universal walks are BFS (queue, shortest path in edges) and DFS (stack/recursion, reachability and ordering); add edge weights and you graduate to Dijkstra. See the BFS/DFS pattern card and heapq which powers Dijkstra.
How you store the graph dominates performance. An adjacency list (dict of node → neighbours) costs O(V+E) space and makes "who are my neighbours?" O(degree) — the default for sparse real-world graphs. An adjacency matrix is a V×V grid costing O(V²) space but answers "is there an edge u→v?" in O(1) — only worth it for dense graphs or when you need fast edge existence. Most code you write uses an adjacency list via defaultdict(list).
| Adjacency list | Adjacency matrix | |
|---|---|---|
| Space | O(V + E) | O(V²) |
| Edge exists? | O(degree) | O(1) |
| Iterate neighbours | O(degree) | O(V) |
| Best for | sparse (most graphs) | dense / many edge checks |
from collections import deque, defaultdict
import heapq
g = defaultdict(list)
for u, v in [("A","B"),("A","C"),("B","D"),("C","D"),("D","E")]:
g[u].append(v); g[v].append(u) # undirected
def bfs_dist(start): # fewest edges from start — O(V+E)
dist, q = {start: 0}, deque([start])
while q:
u = q.popleft()
for v in g[u]:
if v not in dist: # mark on enqueue, not dequeue
dist[v] = dist[u] + 1; q.append(v)
return dist
def has_cycle(start): # DFS, track parent (undirected)
seen = set()
def dfs(u, parent):
seen.add(u)
for v in g[u]:
if v not in seen:
if dfs(v, u): return True
elif v != parent: return True # back-edge
return False
return dfs(start, None)
def dijkstra(adj, src): # weighted shortest path — O(E log V)
dist = {src: 0}; pq = [(0, src)]
while pq:
d, u = heapq.heappop(pq)
if d > dist.get(u, float("inf")): continue # stale entry
for v, w in adj[u]:
nd = d + w
if nd < dist.get(v, float("inf")):
dist[v] = nd; heapq.heappush(pq, (nd, v))
return dist
print(bfs_dist("A")) # {'A':0,'B':1,'C':1,'D':2,'E':3}
print(has_cycle("A")) # True (A-B-D-C-A)
w = {"A":[("B",4),("C",1)], "C":[("B",2)], "B":[]}
print(dijkstra(w, "A")) # {'A':0,'C':1,'B':3} via C, not direct
Topological sort orders the nodes of a DAG so every edge points "forward" — the answer to "in what order can I run these tasks given dependencies?" (build systems, course prerequisites, package installs). Kahn's algorithm repeatedly removes nodes with in-degree 0; if any remain, there is a cycle. Union-find (disjoint-set) answers "are these two nodes in the same group?" in near-O(1) with path compression — the engine behind connected-components, Kruskal's MST, and dynamic connectivity.
from collections import deque
def topo_sort(nodes, edges): # Kahn's algorithm
indeg = {n: 0 for n in nodes}
adj = {n: [] for n in nodes}
for u, v in edges: adj[u].append(v); indeg[v] += 1
q = deque(n for n in nodes if indeg[n] == 0)
order = []
while q:
u = q.popleft(); order.append(u)
for v in adj[u]:
indeg[v] -= 1
if indeg[v] == 0: q.append(v)
if len(order) != len(nodes): raise ValueError("cycle!")
return order
print(topo_sort("abcd", [("a","b"),("a","c"),("b","d"),("c","d")]))
# ['a', 'b', 'c', 'd'] — a valid dependency order
Interview Q&A · deep dive
if d > dist[u]: continue skips those stale copies so we process each node only at its final distance.Sorting algorithms sorting
Sorting is the canonical divide-and-conquer playground and a comparison-sort floor of O(n log n) is one of CS's most-cited results. You will almost never write a sort in production — you call sorted() — but understanding quicksort (fast in practice, in-place), mergesort (stable, predictable, parallelisable), heapsort (in-place, guaranteed), and Python's hybrid Timsort tells you which built-in behaviour to expect and when an O(n) non-comparison sort is possible. Pairs with Big-O and the heapq structure.
Any sort that only compares elements needs at least ⌈log₂(n!)⌉ ≈ n log n comparisons — there are n! orderings and each comparison gives one bit. Both flagship sorts hit that bound by halving, but they differ in where the work happens. Mergesort splits trivially and does the work merging two sorted halves (stable, O(n) extra space). Quicksort does the work up front by partitioning around a pivot, then recurses on trivially-ordered halves (in-place, but O(n²) worst case if pivots are unlucky). Choosing a random/median pivot makes the worst case astronomically unlikely.
def merge_sort(a): # stable, O(n log n), O(n) space
if len(a) <= 1: return a
mid = len(a) // 2
left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
out, i, j = [], 0, 0
while i < len(left) and j < len(right):
if left[i] <= right[j]: # <= keeps it STABLE
out.append(left[i]); i += 1
else:
out.append(right[j]); j += 1
out.extend(left[i:]); out.extend(right[j:])
return out
def quick_sort(a, lo=0, hi=None): # in-place, Lomuto partition
if hi is None: hi = len(a) - 1
if lo >= hi: return a
import random
p = random.randint(lo, hi) # random pivot dodges O(n^2)
a[p], a[hi] = a[hi], a[p]
pivot, i = a[hi], lo
for k in range(lo, hi):
if a[k] < pivot:
a[i], a[k] = a[k], a[i]; i += 1
a[i], a[hi] = a[hi], a[i] # pivot to its final slot
quick_sort(a, lo, i - 1); quick_sort(a, i + 1, hi)
return a
print(merge_sort([5,2,9,1,5,6])) # [1, 2, 5, 5, 6, 9]
print(quick_sort([5,2,9,1,5,6])) # [1, 2, 5, 5, 6, 9]
| Algorithm | Avg / Worst | Space | Stable? | Notes |
|---|---|---|---|---|
| Quicksort | n log n / n² | O(log n) | No | fastest in practice, in-place, cache-friendly |
| Mergesort | n log n / n log n | O(n) | Yes | predictable, parallelisable, external sort |
| Heapsort | n log n / n log n | O(1) | No | in-place + guaranteed, but poor cache locality |
| Timsort | n log n / n log n | O(n) | Yes | Python/Java default; O(n) on near-sorted data |
| Counting/Radix | O(n + k) | O(n + k) | Yes | non-comparison; ints/fixed keys in small range |
Python's sorted() and list.sort() use Timsort — a hybrid of mergesort and insertion sort by Tim Peters. It scans for already-sorted "runs" (ascending or descending), extends short runs with insertion sort, then merges runs with clever rules. The payoff: it is stable and runs in O(n) on already-sorted or reverse-sorted data — extremely common in real datasets. When keys are integers in a small range you can beat the n log n floor entirely with counting sort (O(n+k)) or radix sort (sort digit by digit), because they never compare elements.
# Real-world sort: stable, multi-key, with a custom key fn
people = [
{"name": "Ada", "team": "infra", "age": 31},
{"name": "Bo", "team": "data", "age": 31},
{"name": "Cy", "team": "data", "age": 25},
]
# sort by team asc, then age desc — tuple key, - for descending
ordered = sorted(people, key=lambda p: (p["team"], -p["age"]))
print([p["name"] for p in ordered]) # ['Bo', 'Cy', 'Ada']
def counting_sort(a, k): # O(n + k), ints in 0..k
cnt = [0] * (k + 1)
for x in a: cnt[x] += 1
out = []
for val, c in enumerate(cnt): out += [val] * c
return out
print(counting_sort([3,0,2,3,1], 3)) # [0, 1, 2, 3, 3]
Interview Q&A · deep dive
heapq.nlargest(10, data), which uses this strategy internally.Recursion & dynamic programming dp
Recursion solves a problem by solving smaller copies of itself until a base case. Dynamic programming is recursion plus memory: when those smaller copies overlap, you cache each answer once instead of recomputing it exponentially. DP turns "this is 2ⁿ and times out" into a clean polynomial table. The trick interviewers test is recognising a DP problem and writing the state transition. Builds on complexity analysis and the algorithm patterns card.
A problem is DP-shaped when it has both: overlapping subproblems (the naive recursion recomputes the same inputs many times) and optimal substructure (the best answer is built from best answers to subproblems). If subproblems don't overlap, plain recursion / divide-and-conquer is enough (mergesort doesn't need DP). The tell in a prompt: "count the number of ways…", "minimum/maximum cost to…", "can you reach…", "longest/shortest …subsequence", especially with choices made step by step.
from functools import lru_cache
# Naive recursion: O(2^n) — recomputes the same n exponentially
def fib_slow(n):
return n if n < 2 else fib_slow(n-1) + fib_slow(n-2)
# Top-down DP: same recursion + a cache → O(n). One line!
@lru_cache(maxsize=None)
def fib_memo(n):
return n if n < 2 else fib_memo(n-1) + fib_memo(n-2)
# Bottom-up DP: fill a table, no recursion, O(n) time O(1) space
def fib_tab(n):
if n < 2: return n
a, b = 0, 1
for _ in range(n - 1): a, b = b, a + b
return b
print(fib_memo(50)) # 12586269025 — instant; fib_slow would hang
print(fib_tab(50)) # 12586269025
| Memoization (top-down) | Tabulation (bottom-up) | |
|---|---|---|
| Direction | recurse from the goal, cache results | iterate from base cases up to the goal |
| Code feel | natural — add a cache to recursion | loop filling an array/grid |
| Computes | only states you actually need | every state in range |
| Risk | recursion-depth / stack limits | more upfront thought on fill order |
| Space trick | — | often drop to O(1)/O(width) rolling rows |
Most DP problems are one of a handful of templates wearing a costume. 0/1 knapsack (each item taken or not, maximise value under a weight cap) is the parent of countless "choose a subset under a budget" problems — the state is (item index, remaining capacity) and the transition is max(skip it, take it). Coin change (fewest coins to make an amount) and LCS (longest common subsequence — the core of diff and DNA alignment) are the other two you should be able to write cold.
def knapsack(weights, values, cap): # 0/1 knapsack — O(n*cap)
n = len(weights)
# dp[w] = best value achievable with capacity w
dp = [0] * (cap + 1)
for i in range(n):
# iterate capacity DOWNWARD so each item is used once
for w in range(cap, weights[i] - 1, -1):
dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
return dp[cap]
def coin_change(coins, amount): # fewest coins — O(amount*coins)
INF = float("inf")
dp = [0] + [INF] * amount # dp[a] = min coins to make a
for a in range(1, amount + 1):
for c in coins:
if c <= a: dp[a] = min(dp[a], dp[a - c] + 1)
return dp[amount] if dp[amount] != INF else -1
print(knapsack([1,3,4], [15,20,30], 4)) # 35 (items 1+3)
print(coin_change([1,2,5], 11)) # 3 (5+5+1)
Interview Q&A · deep dive
dp[w] is updated from dp[w - weight], and we need that source value to still reflect the previous item row (item not yet used). Going downward guarantees dp[w - weight] hasn't been touched this iteration, so each item is counted at most once. Going upward reuses the item arbitrarily — that's the unbounded knapsack.Window functions, end to end data
A window function computes across a set of rows related to the current row while keeping every row — that's the difference from GROUP BY, which collapses. The whole grammar lives in one clause: func() OVER (PARTITION BY … ORDER BY … frame). Master the three knobs — partition (the reset boundary), order (the sequence), and frame (which rows count) — and you can express rankings, running totals, moving averages, gaps-and-islands, and period-over-period deltas without a single self-join.
Read OVER right-to-left in effect: first the rows are split into independent partitions (no PARTITION BY means one big partition = the whole result). Within each partition the ORDER BY imposes a sequence. The frame then picks a moving slice of that sequence for the current row. Crucial gotcha: the moment you add ORDER BY to an aggregate like SUM(), the default frame becomes RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — i.e. it silently turns into a running total.
-- sales: (region, sale_date, rep, amount)
SELECT region, rep, sale_date, amount,
-- ranking family: ties handled differently
ROW_NUMBER() OVER w AS rn, -- 1,2,3,4 (arbitrary tiebreak)
RANK() OVER w AS rnk, -- 1,2,2,4 (gaps after ties)
DENSE_RANK() OVER w AS drnk, -- 1,2,2,3 (no gaps)
NTILE(4) OVER w AS quartile, -- bucket into 4
-- navigation: peek at neighbouring rows for deltas
LAG(amount) OVER w AS prev_amt,
amount - LAG(amount, 1, 0) OVER w AS day_delta,
-- running total: ORDER BY flips SUM into cumulative
SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total,
-- 7-row moving average (explicit ROWS frame)
AVG(amount) OVER (PARTITION BY region ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma7
FROM sales
WINDOW w AS (PARTITION BY region ORDER BY amount DESC) -- named window, reused above
ORDER BY region, amount DESC;
-- "top 3 reps per region": rank in a subquery, then filter outside
-- (you CANNOT put a window function in WHERE — it runs after WHERE)
SELECT * FROM (
SELECT region, rep, total,
DENSE_RANK() OVER (PARTITION BY region ORDER BY total DESC) AS r
FROM rep_totals
) ranked
WHERE r <= 3;
-- gaps-and-islands: collapse consecutive active days into streaks
SELECT user_id, MIN(d) AS streak_start, MAX(d) AS streak_end, COUNT(*) AS days
FROM (
SELECT user_id, d,
-- the classic trick: date minus its row-number is constant within a run
d - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY d)) AS grp
FROM active_days
) t
GROUP BY user_id, grp;
| Function | Ties / behaviour | Reach for it when |
|---|---|---|
| ROW_NUMBER | always 1,2,3… (non-deterministic on ties) | dedupe, exact pagination, pick latest-per-key |
| RANK | 1,2,2,4 — leaves gaps | leaderboards where ties skip places |
| DENSE_RANK | 1,2,2,3 — no gaps | top-N-per-group (includes all tied rows) |
| LAG/LEAD | value from N rows back/ahead | period-over-period deltas, "previous status" |
| SUM/AVG OVER | running/moving when ORDER BY present | cumulative totals, moving averages |
Interview Q&A · deep dive
FROM → WHERE → GROUP BY → HAVING → window functions → SELECT → ORDER BY. Window functions are computed after WHERE/GROUP BY, so the rank doesn't exist yet when WHERE runs. Wrap the query in a subquery/CTE and filter on the computed column outside, or use QUALIFY in engines that support it (Snowflake, BigQuery, DuckDB).DENSE_RANK() <= 3 if ties should all count (could return more than 3 rows). Use ROW_NUMBER() <= 3 for exactly 3 (but add a deterministic tiebreaker to ORDER BY or results are arbitrary). RANK skips numbers after ties, so RANK <= 3 can return fewer than three distinct groups.RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. So adding ORDER BY to SUM() OVER silently makes it cumulative and RANGE means tied rows share a value. People expect either the full-partition total or a strict per-row running sum and get neither. Be explicit with ROWS.ROWS BETWEEN 6 PRECEDING averages the last 7 rows, which is wrong if days are missing. Use a time-range frame: RANGE BETWEEN INTERVAL '6' DAY PRECEDING AND CURRENT ROW, or first densify the calendar with a generated date series LEFT JOINed to the data so every day is a row.WindowAgg over a Sort instead of a join — far less I/O and no fan-out risk.CTEs & recursion data
A CTE (WITH name AS (…)) names a subquery so a multi-stage query reads top-to-bottom like a pipeline instead of nesting inside-out. Beyond readability, the recursive CTE is SQL's loop — it walks hierarchies and graphs (org charts, bill-of-materials, category trees, reachability) that a flat join can't express. Two things separate a working recursive CTE from an infinite one: a correct anchor, and a termination guard against cycles.
A recursive CTE has two halves joined by UNION ALL: the anchor (the seed rows — the root of the tree) and the recursive member (which references the CTE name and produces the next level from the previous one). The engine runs the anchor once, then repeatedly runs the recursive member against the rows produced last iteration, appending results until an iteration returns zero rows. That fixed-point loop is how you descend a tree of unknown depth.
-- employees(id, name, manager_id) → full reporting tree under a CEO
WITH RECURSIVE org AS (
-- ANCHOR: the roots (no manager)
SELECT id, name, manager_id,
1 AS depth,
CAST(name AS TEXT) AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- RECURSIVE MEMBER: each child of the rows found so far
SELECT e.id, e.name, e.manager_id,
o.depth + 1,
o.path || ' > ' || e.name -- breadcrumb path
FROM employees e
JOIN org o ON e.manager_id = o.id
WHERE o.depth < 100 -- cycle / runaway guard
)
SELECT REPEAT(' ', depth - 1) || name AS tree, depth, path
FROM org
ORDER BY path;
-- edges(src, dst): which nodes are reachable from node 'A'?
-- a real graph can have cycles, so track the visited path explicitly
WITH RECURSIVE reach(node, hops, visited) AS (
SELECT 'A', 0, ARRAY['A'] -- anchor: start node
UNION ALL
SELECT e.dst, r.hops + 1, r.visited || e.dst
FROM edges e
JOIN reach r ON e.src = r.node
WHERE e.dst <> ALL(r.visited) -- ← prevents infinite cycling
)
SELECT DISTINCT node, MIN(hops) AS shortest_hops
FROM reach GROUP BY node;
Interview Q&A · deep dive
UNION ALL is the normal, fast choice. UNION (some engines forbid it here) deduplicates each step, which is one way to halt on cyclic graphs without an explicit visited set — but it's slower and the semantics differ. For trees use UNION ALL plus a depth guard; for graphs use UNION ALL plus an explicit visited array.MATERIALIZED/NOT MATERIALIZED hints when you need to override, and verify with EXPLAIN rather than trusting folklore.WHERE depth < N) and a cycle detector — either an explicit visited array with NOT (next = ANY(visited)), or the built-in CYCLE … SET … USING … clause (SQL standard / Postgres 14+ / Oracle). The depth cap is the seatbelt; the cycle check is the correct fix.Query optimization & the planner data
SQL is declarative — you state the result, the cost-based optimizer decides how to get it. It enumerates plans (which index, which join order, which join algorithm), estimates each plan's cost from table statistics, and picks the cheapest. Tuning is mostly a conversation with that estimator: read EXPLAIN ANALYZE, find where estimated rows diverge wildly from actual rows, and fix the thing that misled it — a missing index, stale stats, or a non-sargable predicate.
Parse → rewrite → plan/optimize → execute. The optimizer is the interesting stage: it uses statistics (row counts, value histograms, distinct-value estimates) to predict the cardinality of each step, then assigns a cost (an abstract blend of I/O + CPU). Bad cardinality estimates are the root of most bad plans — if it thinks a filter returns 5 rows but it returns 5 million, it'll pick a nested loop that becomes catastrophic.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name
FROM orders o JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'shipped' AND o.created_at >= '2026-01-01';
-- READ IT BOTTOM-UP, INSIDE-OUT. The red flags:
-- Seq Scan on orders (cost=0..18211 rows=5 width=…) (actual rows=2104388)
-- ^ estimate 5, actual 2.1M → stats are stale OR predicate not sargable
-- Nested Loop (chosen because it expected 5 rows on the inner side)
-- ^ a hash join would be far cheaper for 2.1M rows
-- FIX 1: refresh the estimator's picture of the data
ANALYZE orders;
-- FIX 2: a partial/composite index matching the predicate
CREATE INDEX idx_orders_shipped
ON orders (created_at)
WHERE status = 'shipped'; -- partial index: tiny, hot-path only
-- ❌ NON-SARGABLE: wrapping the column kills the index (must scan + compute)
WHERE YEAR(created_at) = 2026
WHERE UPPER(email) = 'A@B.COM'
WHERE amount * 1.1 > 100
WHERE status LIKE '%shipped' -- leading wildcard = no index
-- ✅ SARGABLE: leave the column bare so the index range-scans
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01'
WHERE email = 'a@b.com' -- or build a functional index on UPPER(email)
WHERE amount > 100 / 1.1
WHERE status LIKE 'shipped%' -- trailing wildcard CAN use a B-tree
| Join algorithm | How it works | Best when | Cost shape |
|---|---|---|---|
| Nested loop | for each outer row, probe inner (ideally via index) | small outer side, indexed inner | O(n · index lookup) |
| Hash join | build hash on smaller side, probe with larger | large unsorted inputs, equi-join | O(n + m), needs memory |
| Merge join | sort both, walk in lockstep | inputs already sorted on the key | O(n log n) if a sort is needed |
Interview Q&A · deep dive
EXPLAIN shows the planned tree with estimated rows/cost without running it. EXPLAIN ANALYZE actually executes and adds actual rows and timing. The first thing to check is the gap between estimated and actual row counts per node — a big divergence means the optimizer was working from a wrong cardinality, which is the root cause behind most bad join/scan choices.YEAR(col), UPPER(col)), arithmetic on it, an implicit type cast, or a leading-wildcard LIKE forces the engine to compute the expression per row → full scan. Fix by rewriting to a range, or building a functional/expression index that matches the predicate.ANALYZE (or autovacuum/auto-update stats) refreshes them; extended/multi-column statistics help with correlated columns the single-column histograms miss.a, a,b, and a,b,c (and ORDER BY in that order). It can't seek on b alone or c alone. An equality on a plus a range on b is fine; a range on a means b can't be used for seeking past it. Column order should put equality-filtered, high-selectivity columns first.Transactions, isolation & concurrency data
A transaction is the unit of all-or-nothing, never-corrupt change. ACID names the guarantees; the hard, practical part is the I — Isolation: how much one transaction sees of another's in-flight work. The SQL standard defines isolation by which read anomalies it forbids — dirty, non-repeatable, and phantom reads. Underneath, engines deliver isolation two very different ways: pessimistic locking (block conflicting access) or MVCC (give each transaction a consistent snapshot of versioned rows). Knowing which your engine uses explains your deadlocks and your throughput.
Read these as a ladder: each higher level forbids one more anomaly at the cost of more contention. A dirty read sees another txn's uncommitted change. A non-repeatable read sees a row's value change when you re-read it (another txn committed an UPDATE). A phantom sees new rows appear in a range you re-query (another txn committed an INSERT). A write skew (only Serializable stops it) is two txns each reading an overlapping set and writing based on it, jointly violating an invariant neither could alone.
| Isolation level | Dirty read | Non-repeatable | Phantom | Write skew |
|---|---|---|---|---|
| Read Uncommitted | possible | possible | possible | possible |
| Read Committed | no | possible | possible | possible |
| Repeatable Read | no | no | possible* | possible |
| Serializable | no | no | no | no |
-- transfer $100: must be atomic AND not lose a concurrent update
BEGIN;
-- pessimistic lock: nobody else can modify these rows until commit
SELECT balance FROM accounts WHERE id IN (1, 2) FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- both writes durable, or neither (on ROLLBACK / crash)
-- OPTIMISTIC alternative (no lock held): version-check on write
UPDATE accounts
SET balance = balance - 100, version = version + 1
WHERE id = 1 AND version = 42; -- if 0 rows updated → someone else won, retry
MVCC (Postgres, Oracle, InnoDB) keeps multiple versions of a row; readers see a snapshot as of their transaction start and never block writers — "readers don't block writers, writers don't block readers." The cost is version bloat that VACUUM must reclaim. Two-phase locking instead acquires locks growing-then-shrinking around the commit. A deadlock is a lock cycle: txn A holds row 1 and waits for row 2 while txn B holds row 2 and waits for row 1. The engine's deadlock detector picks a victim and aborts it with an error you must catch and retry.
Interview Q&A · deep dive
COUNT(*) WHERE status='open', someone commits an INSERT, you re-run and the count grew. Write skew: two on-call doctors each check "≥1 other on duty" (true), each goes off duty — now zero are on duty.SELECT … FOR UPDATE before computing, do the arithmetic in SQL (SET balance = balance - 100) so it's atomic, or use optimistic concurrency: include a version/timestamp in the WHERE and retry if zero rows updated. Read Committed alone does not prevent lost updates.NULLs, pivot & upsert — the sharp edges data
Three things trip up otherwise-strong SQL: NULL isn't a value, it's "unknown," and it makes logic three-valued; reshaping rows↔columns (pivot/unpivot) is just conditional aggregation in disguise; and upsert ("insert or update") needs an atomic, race-free construct, not a check-then-insert. Get these right and a whole class of silent-wrong-answer bugs disappears.
In SQL a comparison can be TRUE, FALSE, or UNKNOWN. Any arithmetic or comparison with NULL yields NULL/UNKNOWN — so NULL = NULL is not TRUE, and x <> 5 silently drops rows where x is NULL. WHERE keeps only TRUE rows, so UNKNOWN rows vanish from filters but a CHECK constraint passes on UNKNOWN. Aggregates skip NULLs (so AVG ignores them, but COUNT(*) counts them and COUNT(col) doesn't). Test for null only with IS NULL / IS NOT NULL (or IS DISTINCT FROM).
-- ❌ TRAP: NOT IN with a NULL in the list returns NOTHING
-- x NOT IN (1, 2, NULL) → x<>1 AND x<>2 AND x<>NULL → ... AND UNKNOWN
SELECT * FROM orders
WHERE customer_id NOT IN (SELECT id FROM banned); -- empty if banned has a NULL!
-- ✅ FIX: NOT EXISTS is NULL-safe
SELECT * FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM banned b WHERE b.id = o.customer_id);
-- COALESCE: first non-NULL. NULLIF: NULL when equal (guard /0)
SELECT
COALESCE(nickname, full_name, '(anonymous)') AS display,
revenue / NULLIF(orders_count, 0) AS avg_order, -- no divide-by-zero
-- IS DISTINCT FROM treats NULL as a comparable value
(old_status IS DISTINCT FROM new_status) AS changed
FROM customers;
-- PIVOT = conditional aggregation: rows → columns (portable everywhere)
SELECT product,
SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2,
SUM(CASE WHEN quarter = 'Q3' THEN amount ELSE 0 END) AS q3
FROM sales GROUP BY product;
-- UPSERT (Postgres / SQLite): atomic insert-or-update, no race
INSERT INTO inventory (sku, qty)
VALUES ('A-100', 5)
ON CONFLICT (sku) DO UPDATE
SET qty = inventory.qty + EXCLUDED.qty; -- EXCLUDED = the row we tried to insert
-- MERGE (SQL standard / SQL Server / Oracle / PG 15+): multi-action
MERGE INTO inventory t
USING staging s ON t.sku = s.sku
WHEN MATCHED THEN UPDATE SET t.qty = s.qty
WHEN NOT MATCHED THEN INSERT (sku, qty) VALUES (s.sku, s.qty);
-- one pass, multiple grouping granularities (region, region+product, grand total)
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY ROLLUP (region, product); -- = GROUPING SETS ((region,product),(region),())
-- rows with NULL region/product are the subtotal/grand-total lines;
-- use GROUPING(region) to tell a real NULL from a subtotal marker
| Need | Use | Watch out for |
|---|---|---|
| Default for NULL | COALESCE(a, b, …) | returns NULL only if all args NULL; type must match |
| Avoid divide-by-zero | x / NULLIF(y, 0) | result is NULL (not error) when y=0 — handle it |
| Insert-or-update | ON CONFLICT / MERGE | needs a unique/PK constraint; MERGE has had concurrency CVEs |
| Subtotals + grand total | ROLLUP / CUBE | NULL markers vs real NULLs — use GROUPING() |
Interview Q&A · deep dive
WHERE x <> 'a' exclude rows where x is NULL?NULL <> 'a' evaluates to UNKNOWN, not TRUE, and WHERE keeps only TRUE rows. So any predicate that should include unknowns must say so explicitly: WHERE x <> 'a' OR x IS NULL. This is three-valued logic — the single most common source of "rows mysteriously missing."COUNT(*) counts every row including NULLs. COUNT(col) counts only rows where col IS NOT NULL. COUNT(DISTINCT col) counts distinct non-NULL values. Likewise AVG(col) divides by the non-NULL count, so AVG ≠ SUM/COUNT(*) when NULLs exist — a frequent reconciliation bug.COALESCE(a,b,…) is standard and variadic: returns the first non-NULL. NULLIF(a,b) returns NULL if a=b else a — perfect for guarding division. ISNULL/IFNULL are two-arg, vendor-specific (SQL Server / MySQL) and differ in return type rules; prefer COALESCE for portability.SUM(CASE WHEN key='X' THEN val END) per target column, grouped by the row key. It's fully portable and clearer than vendor PIVOT syntax. For a dynamic set of columns (unknown at write time) you must generate the SQL string from the distinct keys, then execute it — there's no static SQL that produces a variable number of columns.ON CONFLICT (Postgres/SQLite) is purpose-built for single-table upsert, is concise, and is atomic against concurrent inserts via the unique index. MERGE is the SQL-standard, multi-table/multi-action statement (also does deletes) but historically had concurrency pitfalls (non-atomic match-then-act races, documented in SQL Server) requiring careful locking hints. For plain upsert, ON CONFLICT is simpler and safer; reach for MERGE when you genuinely need INSERT+UPDATE+DELETE in one pass.GROUPING(col) function: it returns 1 for a row where that column was aggregated away (a subtotal/grand-total line) and 0 for a normal grouping value. CASE WHEN GROUPING(region)=1 THEN 'All regions' ELSE region END labels totals cleanly instead of showing a bare NULL.Design Patterns, Concurrency & APIs
The software-engineering craft a senior Python role is assumed to own: the named design patterns interviewers probe for, the three concurrency models and when each wins, and how to design an API that other teams can build on. Anchored to systems you actually run — extractor registries, async scrapers, Celery workers, FastAPI services.
Design patterns — what they are & when to reach for one orientation
A design pattern is a named, proven solution to a recurring design problem — not a library you import, but a shape your code takes. The 23 Gang-of-Four patterns fall into three families. Their real value in an interview is vocabulary: naming the force that makes a pattern necessary, and knowing when a plain function beats a pattern.
| Family | Solves | The ones that come up |
|---|---|---|
| Creational | how objects get made | Factory, Builder, Singleton |
| Structural | how objects compose | Adapter, Decorator, Facade, Proxy |
| Behavioural | how objects collaborate | Strategy, Observer, Iterator, Command |
Interview Q&A
Every pattern exists to absorb a specific force — a pressure that will otherwise leak into your code as a smell. Strategy absorbs "the algorithm varies"; Observer absorbs "the listeners vary"; Adapter absorbs "the interface is wrong"; Decorator absorbs "behaviour stacks". The interview-grade move is to name the force first, then say which pattern neutralises it. If you can't name a force, you don't need a pattern — you have a function.
| Smell you feel | Force underneath | Pattern that fits |
|---|---|---|
| Big if/elif ladder on a type | behaviour varies by case | Strategy / polymorphism |
| Constructor with 8 optional args | complex stepwise assembly | Builder |
| Calling code knows concrete classes | creation is coupled to use | Factory |
| Wrapping to add log/retry/cache | behaviour layers independently | Decorator |
| One change must notify many | fan-out without coupling | Observer |
The honest default is no pattern. Reach for one only when you have observed (not imagined) variation, or a force that keeps recurring. A pattern bought against speculative future flexibility is the most common form of over-engineering — it adds indirection you pay for on every read and refactor, with no payoff until the day (often never) the variation arrives. YAGNI beats GoF.
# Stage 0 — a function. If this is all you need, STOP HERE.
def discount(price): return price * 0.9
# Stage 1 — variation appears: two discount rules. A dict of callables
# is the Pythonic Strategy. No classes, no ceremony.
RULES = {
"black_friday": lambda p: p * 0.5,
"loyalty": lambda p: p - 5,
}
def price_for(price, rule): return RULES[rule](price)
# Stage 2 — rules now need state + validation + names → promote to a
# Protocol-typed Strategy ONLY now, because the force finally justifies it.
from typing import Protocol
class DiscountRule(Protocol):
def apply(self, price: float) -> float: ...
print(price_for(100, "loyalty")) # 95
Interview Q&A · deep dive
__iter__/generators are Iterator; first-class classes make Factory a dict lookup. A pattern is a workaround for a missing language feature — when the feature exists, the pattern dissolves into idiom. Peter Norvig showed 16 of 23 GoF patterns are simpler or invisible in dynamic languages.Creational — controlling how objects are born creational
These decouple what you want from how it's constructed. Factory picks the concrete class for you; Builder assembles a complex object step by step; Singleton guarantees one shared instance.
EXTRACTORS = {} # registry
def register(name):
def wrap(cls): EXTRACTORS[name] = cls; return cls
return wrap
@register("ctgov")
class CtgovExtractor: ...
def make_extractor(name): # the factory
return EXTRACTORS[name]() # caller never names the class
Interview Q&A
| Pattern | The question it answers | Reach for it when |
|---|---|---|
| Factory Method | which concrete class do I make? | the type depends on input/config |
| Abstract Factory | which whole family do I make? | products come in matched sets (e.g. cloud provider's client+bucket+queue) |
| Builder | how do I assemble a complex object? | many optional parts, validation, immutability at the end |
| Prototype | how do I copy an existing object? | cloning is cheaper than constructing |
| Singleton | how do I share one instance? | almost never in Python — use a module |
| Registry | how do I find a class by name? | plugins self-register; factory resolves by key |
from typing import Protocol
# Abstract Factory: one factory makes a coherent SET of objects that
# must agree with each other (same cloud, same auth, same region).
class Storage: def put(self, k, v): ...
class Queue: def push(self, m): ...
class CloudFactory(Protocol):
def storage(self) -> Storage: ...
def queue(self) -> Queue: ...
class AwsFactory:
def storage(self): return Storage() # would be S3
def queue(self): return Queue() # would be SQS
def build_app(factory: CloudFactory): # app never names AWS/GCP
store, q = factory.storage(), factory.queue()
return store, q
build_app(AwsFactory()) # swap to GcpFactory() with zero app edits
from dataclasses import dataclass, replace
@dataclass(frozen=True)
class Query:
table: str
where: tuple = ()
limit: int | None = None
# fluent builder steps return NEW immutable objects (replace)
def filter(self, c): return replace(self, where=self.where + (c,))
def top(self, n): return replace(self, limit=n)
def sql(self):
w = " AND ".join(self.where) or "1=1"
l = f" LIMIT {self.limit}" if self.limit else ""
return f"SELECT * FROM {self.table} WHERE {w}{l}"
q = Query("trials").filter("phase=3").top(10)
print(q.sql()) # SELECT * FROM trials WHERE phase=3 LIMIT 10
Interview Q&A · deep dive
make() -> Product). Abstract Factory makes a family of related products that must be consistent (storage(), queue(), db() all from the same cloud). Rule of thumb: if you'd otherwise risk mixing an AWS bucket with a GCP queue, you want an Abstract Factory to keep the set coherent.class Single(type): _i={}; def __call__(cls,*a,**k): with _lock: if cls not in cls._i: cls._i[cls]=super().__call__(*a,**k); return cls._i[cls]. Avoid because it's global state — untestable, hidden coupling, lifecycle tied to interpreter, not request/job. A module-level instance plus DI gives sharing without the downsides.copy.deepcopy(template) clones the assembled state. Watch the deep-vs-shallow trap: shallow copy shares nested mutables, so a clone can mutate the original's lists.if name in REG: raise KeyError), or namespace keys per package. Silent override is how a third-party plugin can shadow your built-in handler and nobody notices until prod.Structural — composing objects into bigger shapes structural
Adapter makes an incompatible interface fit; Decorator adds behaviour by wrapping; Facade hides a messy subsystem behind one simple entry point; Proxy stands in for another object to add control (lazy load, cache, access).
| Pattern | Intent | Everyday example |
|---|---|---|
| Adapter | translate one interface to another | wrap a vendor SDK so it matches your own client interface |
| Decorator | add behaviour without subclassing | retry / cache / log wrappers around a function |
| Facade | one simple API over many parts | a PipelineService hiding ingest+embed+index |
| Proxy | control access to an object | a lazy-loading or rate-limited client stand-in |
Interview Q&A
| Pattern | Wraps | To change |
|---|---|---|
| Adapter | one object | its interface (make it fit yours) |
| Decorator | one object | its behaviour (add, keep interface) |
| Proxy | one object | its access (lazy, cache, guard, remote) |
| Facade | many objects | the surface (one simple door) |
| Composite | a tree of objects | treat leaf & group uniformly |
| Bridge | two hierarchies | vary abstraction & impl independently |
Adapter, Decorator, and Proxy have identical structure (wrap one object, hold a reference, delegate) and differ only in intent. Adapter changes the shape of the door; Decorator adds locks to the door; Proxy decides whether you may open it. Interviewers love this because it tests whether you reason about intent, not just UML.
from dataclasses import dataclass, field
# Composite: a File and a Folder share one interface (.size()),
# so client code recurses a tree without checking leaf-vs-node.
@dataclass
class File:
name: str; bytes_: int
def size(self): return self.bytes_
@dataclass
class Folder:
name: str; children: list = field(default_factory=list)
def size(self): # same method name as leaf
return sum(c.size() for c in self.children)
root = Folder("/", [File("a.txt", 100),
Folder("sub", [File("b.txt", 250)])])
print(root.size()) # 350 — client never special-cases the tree
import time
class RealModel:
def __init__(self):
time.sleep(0) # pretend: slow 2s warm-up
self.weights = "loaded"
def embed(self, text): return hash(text) % 997
class ModelProxy:
def __init__(self): self._real = None; self._cache = {}
def embed(self, text):
if self._real is None: # lazy: build only on first real use
self._real = RealModel()
if text not in self._cache: # caching proxy
self._cache[text] = self._real.embed(text)
return self._cache[text]
m = ModelProxy() # cheap — no warm-up yet
print(m.embed("hi")) # warms up + caches; same call signature as RealModel
Interview Q&A · deep dive
fetch() → get()). Decorator: same interface, but it does extra work around the same call (log, retry, then delegate). Proxy: same interface, but it decides whether/when to delegate (lazy, cache, permission check). Same skeleton, three different reasons.File.add_child() that throws is the "rejected request" smell (a Liskov violation). Composite shines for genuine part-whole trees (filesystems, UI widgets, org charts, expression trees) where "do X to the whole subtree" is a real operation.functools.lru_cache relate to these patterns?@ syntax is the Decorator pattern; the behaviour it adds (intercept call, return cached result, only invoke the real function on a miss) is a Proxy's job. Real code blends patterns — naming the blend ("a caching proxy applied via a decorator") is the senior articulation.VectorCircle, RasterCircle…). Bridge makes Shape hold a Renderer via composition, so you add one shape or one renderer in isolation — N+M classes, and you can mix at runtime.Behavioural — how objects talk to each other behavioural
Strategy swaps an algorithm at runtime; Observer notifies subscribers of changes; Iterator walks a collection without exposing it; Command turns a request into an object you can queue, log, or undo.
def match(record, strategy): # strategy is just a callable
return strategy(record)
# swap behaviour without touching match()
score = match(r, exact_name_strategy)
score = match(r, fuzzy_plus_location_strategy)
Interview Q&A
Both delegate to a swappable object behind a stable interface. The difference is who pulls the lever. In Strategy the client chooses the algorithm and it stays put for the call ("sort with this comparator"). In State the object transitions itself between states based on events ("a connection goes Connecting → Open → Closed"), and each state knows which state comes next. State is a Strategy that rewires its own pointer.
| Pattern | Turns into an object… | So you can |
|---|---|---|
| Command | a request / action | queue, log, retry, undo it |
| Observer | a subscription | fan one event out to many |
| State | a mode of behaviour | replace mode-flag spaghetti |
| Template Method | the fixed skeleton of an algorithm | let subclasses fill the gaps |
| Chain of Resp. | a handler in a pipeline | pass a request down until handled |
| Iterator | a cursor over a collection | walk it without exposing internals |
Below, each state is a class that handles events and returns the next state. The win over a giant if self.mode == ... block: adding a state is a new class, and illegal transitions are simply absent — you can't "send" while "closed" because that state has no send path.
class Closed:
def open(self): print("opening"); return Open()
def send(self, m): raise RuntimeError("not open")
class Open:
def send(self, m): print("sent:", m); return self
def close(self): print("closing"); return Closed()
class Connection:
def __init__(self): self.state = Closed()
def __getattr__(self, name): # delegate to current state
def call(*a):
self.state = getattr(self.state, name)(*a)
return call
c = Connection()
c.open(); c.send("ping"); c.close() # transitions handled by the states
class Bus: # Observer: source doesn't know subscribers
def __init__(self): self.subs = []
def on(self, fn): self.subs.append(fn); return fn
def emit(self, ev):
for fn in self.subs: fn(ev)
class AddItem: # Command: action as an object with undo
def __init__(self, cart, item): self.cart, self.item = cart, item
def do(self): self.cart.append(self.item)
def undo(self): self.cart.remove(self.item)
bus, cart, history = Bus(), [], []
@bus.on
def log(ev): print("event:", ev)
cmd = AddItem(cart, "book"); cmd.do(); history.append(cmd)
bus.emit("added book")
history.pop().undo() # Ctrl-Z: cart back to []
print(cart) # []
Interview Q&A · deep dive
run() calls step1(); step2()) and subclasses override the holes. Strategy uses composition: you inject the varying part as an object. Template Method fixes the structure and varies steps via subclassing (compile-time-ish); Strategy varies the whole behaviour at runtime and avoids inheritance. Modern advice: prefer Strategy (composition) unless the skeleton is genuinely fixed and shared.__iter__/__next__ with state suspended between yields. It buys laziness (compute one item at a time, O(1) memory over a stream), composability (pipe generators), and infinite sequences. The pattern that needed a whole class in Java is one keyword in Python.send() on Closed is an AttributeError/explicit raise instead of silently corrupting a mode flag. Compared to a status enum + scattered if checks, the transition logic is co-located with the behaviour it guards, so adding a state can't miss a check elsewhere.SOLID & Pythonic design principles
Patterns are tactics; SOLID is the strategy underneath them — five principles that keep code changeable. In Python they show up as composition, dependency injection, dataclasses, and Protocols (structural typing).
| Principle | In one line |
|---|---|
| Single responsibility | a class/function has one reason to change |
| Open/closed | open to extension, closed to modification (add, don't edit) |
| Liskov substitution | a subtype must work anywhere its base does |
| Interface segregation | small focused interfaces beat one fat one |
| Dependency inversion | depend on abstractions, inject the concrete |
Interview Q&A
| Principle | The smell it kills | Python-native tool |
|---|---|---|
| SRP | a class that parses and validates and saves | small modules/functions; dataclasses for data |
| OCP | editing a big if/elif for every new case | registry/dispatch dict; @singledispatch |
| LSP | a subclass that throws on a base method | favour composition; honour the contract |
| ISP | implementing a fat ABC's 9 methods to use 1 | Protocol — split into narrow ones |
| DIP | Service builds its own DB/LLM client | constructor injection of a Protocol |
In Python, SOLID leans on structural typing. You don't need a class to declare it implements an interface — if it has the methods, it fits the Protocol. This makes DIP and ISP nearly free: the abstraction is a Protocol, the concrete is anything that matches, and tests inject a hand-written fake with no inheritance.
from typing import Protocol
# ISP: a NARROW interface — Notifier needs only one method, not a god-class.
class Notifier(Protocol):
def send(self, to: str, msg: str) -> None: ...
# DIP: AlertService depends on the ABSTRACTION, injected — not a concrete SDK.
class AlertService:
def __init__(self, notifier: Notifier):
self.notifier = notifier
def trip(self, who):
self.notifier.send(who, "circuit OPEN")
class Slack: # production impl — no inheritance needed
def send(self, to, msg): print(f"slack→{to}: {msg}")
class FakeNotifier: # test double — just matches the shape
def __init__(self): self.sent = []
def send(self, to, msg): self.sent.append((to, msg))
f = FakeNotifier()
AlertService(f).trip("oncall")
assert f.sent == [("oncall", "circuit OPEN")] # deterministic test, no mocks
from functools import singledispatch
# OCP: add a new shape by registering a function — never edit area() itself.
@singledispatch
def area(shape): raise TypeError(f"no area for {type(shape)}")
class Circle: def __init__(self, r): self.r = r
class Square: def __init__(self, s): self.s = s
@area.register
def _(c: Circle): return 3.14159 * c.r ** 2
@area.register
def _(s: Square): return s.s ** 2
print(area(Circle(2)), area(Square(3))) # 12.56636 9
# A Triangle ships in its own file with one @area.register — core untouched.
Interview Q&A · deep dive
.read()" and for typing third-party objects you can't subclass. ABC for a nominal family you control and want to enforce at instantiation (it raises if abstract methods are missing) and to share implementation via concrete base methods. Rule: Protocol to describe a shape; ABC to own and enforce a hierarchy.Square(Rectangle) where setting width also forces height — a function written against Rectangle that sets them independently breaks. It's a violation because the subtype strengthens preconditions / weakens postconditions. Fix: drop the "is-a" (a square isn't a substitutable rectangle here) and use composition or a shared Shape with area(), not mutable width/height.Concurrency models — threads vs async vs multiprocessing decide
The single most useful decision in Python performance work. The fork in the road is CPU-bound vs I/O-bound, because the GIL lets only one thread run Python bytecode at a time — so threads help I/O wait, but not raw computation.
| Model | Best for | Mechanism |
|---|---|---|
| threading | I/O-bound, moderate concurrency | OS threads, share memory, GIL-limited for CPU |
| asyncio | I/O-bound, huge concurrency | one thread, cooperative await on the event loop |
| multiprocessing | CPU-bound | separate processes, separate GILs, real parallelism |
Interview Q&A
Untangle two words people use interchangeably. Concurrency is structure — many tasks in flight, interleaved on possibly one core. Parallelism is execution — many tasks literally running at the same instant on different cores. The GIL is why Python gives you concurrency on threads for free but reserves true parallelism for processes. The decision is never "threads or async" in the abstract — it is "what is this task waiting on?"
| Question | threading | asyncio | multiprocessing |
|---|---|---|---|
| Workload | I/O-bound | I/O-bound | CPU-bound |
| Scale ceiling | ~hundreds (stack + OS limits) | tens of thousands (cheap coroutines) | ~#cores (memory-bound) |
| Real parallelism? | no (GIL) | no (one loop) | yes (separate GILs) |
| Sharing state | shared memory + locks (risky) | one thread, no locks needed | IPC / pickling (no shared memory) |
| Library cost | works with any blocking lib | needs async libs end-to-end | data must be picklable |
| Failure blast radius | one bad lib can deadlock all | one blocking call freezes all | a crashed worker is isolated |
import time, math
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def io_task(_): # simulates a network/disk wait
time.sleep(0.2); return 1
def cpu_task(n): # pure computation — GIL-bound on threads
return sum(math.isqrt(i) for i in range(n))
def timed(label, pool_cls, fn, args):
t = time.perf_counter()
with pool_cls(max_workers=8) as ex:
list(ex.map(fn, args))
print(f"{label:<28}{time.perf_counter() - t:.2f}s")
if __name__ == "__main__": # guard is REQUIRED for processes
timed("IO · threads", ThreadPoolExecutor, io_task, range(40)) # ~1s (overlaps waits)
timed("CPU · threads (GIL!)", ThreadPoolExecutor, cpu_task, [2_000_000]*8) # no speedup
timed("CPU · processes", ProcessPoolExecutor, cpu_task, [2_000_000]*8) # ~Nx faster
Interview Q&A · deep dive
asyncio gives concurrency on a single core; multiprocessing gives parallelism across cores. Threads in CPython give concurrency but not CPU parallelism because of the GIL.multiprocessing need picklable arguments while threads don't?loop.run_in_executor(ProcessPoolExecutor(), cpu_fn, data), which awaits the result without blocking the loop.asyncio in practice async
One thread, one event loop, thousands of in-flight I/O operations. A coroutine (async def) yields control at every await, letting the loop run others while it waits — perfect for fan-out network calls.
import asyncio, aiohttp
async def fetch(session, url, sem):
async with sem: # cap concurrency
async with session.get(url) as r:
return await r.json()
async def run(urls):
sem = asyncio.Semaphore(10)
async with aiohttp.ClientSession() as s:
tasks = [fetch(s, u, sem) for u in urls]
return await asyncio.gather(*tasks) # all at once, bounded
Interview Q&A
The loop is a single-threaded scheduler running a ready queue of callbacks. Each iteration it: (1) runs every callback currently ready, (2) asks the OS via selectors (epoll/kqueue/IOCP) "which of these sockets/timers are now ready?", and (3) schedules those callbacks for the next tick. An await is the point where a coroutine hands a future to the loop and says "wake me when this resolves". Nothing is pre-empted — a coroutine that never awaits never yields, and a blocking call between awaits stalls the whole loop.
import asyncio, httpx
async def fetch(client, url):
r = await client.get(url, timeout=10)
r.raise_for_status()
return r.json()
async def main(urls):
results = {}
async with httpx.AsyncClient() as client:
try:
async with asyncio.TaskGroup() as tg: # all tasks share one scope
tasks = {u: tg.create_task(fetch(client, u)) for u in urls}
# block exits only when EVERY task is done or cancelled
results = {u: t.result() for u, t in tasks.items()}
except* httpx.HTTPError as eg: # except* unpacks an ExceptionGroup
print(f"{len(eg.exceptions)} fetch(es) failed")
return results
asyncio.run(main(["https://example.com/a", "https://example.com/b"]))
| Behaviour | asyncio.gather | asyncio.TaskGroup (3.11+) |
|---|---|---|
| One task raises | others keep running (orphaned) | siblings auto-cancelled |
| Multiple failures | only the first surfaces | all aggregated in an ExceptionGroup |
| Catch by type | normal except | except* filters the group |
| Leaked tasks on error | likely | impossible — scope joins all |
Interview Q&A · deep dive
await point.await. It is cooperative — control only moves at await.TaskGroup preferred over gather for new code?async with block can't exit until every child task finishes, so there are no orphaned tasks. On failure it cancels siblings and raises an ExceptionGroup aggregating all errors, which you filter with except*. gather leaks running tasks on error and only surfaces the first exception unless you pass return_exceptions=True and inspect each result manually.await asyncio.to_thread(blocking_fn, arg) for I/O-ish or GIL-releasing C calls, or loop.run_in_executor(ProcessPoolExecutor(), cpu_fn, arg) for pure-Python CPU work. Both return an awaitable so the loop stays responsive while the work runs elsewhere.async def — inert until awaited. A Future is a low-level placeholder for a result that will arrive. A Task is a Future that wraps and drives a coroutine on the loop — created by create_task/TaskGroup.create_task — which is what actually makes the coroutine run concurrently rather than just when you await it inline.Synchronisation & pools safety
When work runs in parallel, shared mutable state is the enemy. A race condition is two workers touching the same data without ordering. The fix is rarely manual locks — prefer queues and executors that hide the sharp edges.
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=10) as ex:
results = list(ex.map(fetch_one, urls)) # parallel I/O, ordered results
| Tool | Use when |
|---|---|
| Lock | a short critical section must be exclusive |
| Queue | hand work between producers/consumers safely |
| ThreadPoolExecutor | I/O-bound fan-out, bounded workers |
| ProcessPoolExecutor | CPU-bound fan-out across cores |
Interview Q&A
People assume counter += 1 is one step. It is three bytecodes — load, add, store — and the GIL can switch threads between any of them. Two threads can both load the old value, both add one, and both store the same result: one increment is lost. The GIL prevents memory corruption, not logical race conditions. A critical section is any sequence that must appear atomic to other threads; a Lock makes it so.
import threading
counter = 0
lock = threading.Lock()
def unsafe():
global counter
for _ in range(100_000):
counter += 1 # load-add-store: NOT atomic
def safe():
global counter
for _ in range(100_000):
with lock: # critical section — only one thread inside
counter += 1
def race(target):
global counter; counter = 0
ts = [threading.Thread(target=target) for _ in range(8)]
for t in ts: t.start()
for t in ts: t.join()
return counter
print("unsafe:", race(unsafe)) # < 800000, varies run to run
print("safe: ", race(safe)) # exactly 800000, always
| Primitive | Guarantees | Reach for it when |
|---|---|---|
| Lock | one holder; not re-entrant | a simple exclusive critical section |
| RLock | same thread can re-acquire | a locked method calls another locked method |
| Semaphore(n) | at most n holders at once | cap concurrency to a pool of n resources |
| Event | broadcast a one-shot signal | threads wait until "ready"/"shutdown" is set |
| Condition | wait/notify on a predicate | "wake a consumer when the buffer is non-empty" |
| Queue | thread-safe FIFO, built-in locking | producer/consumer — the lock you don't write |
import threading, queue
q = queue.Queue(maxsize=20) # bounded → built-in back-pressure
DONE = object() # sentinel to signal "no more work"
def producer(items):
for it in items: q.put(it) # blocks if full → throttles producer
q.put(DONE)
def consumer():
while True:
it = q.get() # blocks if empty
if it is DONE: q.task_done(); break
handle(it)
q.task_done()
threading.Thread(target=producer, args=(range(100),)).start()
threading.Thread(target=consumer, daemon=True).start()
q.join() # wait until every item is task_done()
Interview Q&A · deep dive
counter += 1 is load-add-store; an interleaving loses updates. The GIL prevents interpreter-state corruption, not application-level races, so you still serialise multi-step critical sections with a lock.Lock deadlocks if the same thread tries to acquire it twice — common when a locked method calls another method that also locks. RLock tracks the owning thread and a recursion count, so re-acquisition by the holder succeeds. Use RLock for re-entrant code paths; prefer plain Lock otherwise since it's cheaper and surfaces accidental recursion as a bug.Lock allows exactly one holder; a Semaphore(n) allows up to n concurrent holders. Use a semaphore to cap concurrency against a limited resource — e.g. at most 10 simultaneous calls to a downstream API, or a connection pool of size n. A Lock is just a Semaphore(1).queue.Queue already has correct internal locking, condition-variable wait/notify, and (when bounded) back-pressure that blocks producers when full — three things you'd otherwise hand-roll and get subtly wrong. It also gives task_done()/join() for clean completion. Less code, no custom lock to deadlock on.REST API design interfaces
REST models your system as resources (nouns) acted on by HTTP verbs. Good API design is mostly consistency: predictable URLs, correct verbs, honest status codes, and stable contracts other teams can build against.
| Verb | Means | Idempotent? |
|---|---|---|
| GET | read a resource | yes (no side effects) |
| POST | create / action | no (creates each time) |
| PUT | replace fully | yes (same result if repeated) |
| PATCH | update partially | usually no |
| DELETE | remove | yes |
Interview Q&A
REST is a set of constraints (Fielding's thesis), and the ones interviewers probe are: statelessness (every request carries all context — no server-side session affinity, which is what lets you scale horizontally behind a load balancer), uniform interface (the same verbs/status codes everywhere, so clients are predictable), and cacheability (responses say whether they can be cached). "RESTful" CRUD over HTTP satisfies a subset; the constraints are the part that actually buys you scale and evolvability.
| PUT | PATCH | |
|---|---|---|
| Semantics | replace the entire resource | apply a partial change |
| Missing fields | treated as cleared/defaulted | left untouched |
| Idempotent? | yes — same body → same end state | not inherently (e.g. {"qty": "+1"}) |
| Safe to blind-retry? | yes | only if the patch is itself idempotent |
Make PATCH idempotent by sending absolute values ({"status":"paid"}), not deltas. For deltas or any non-idempotent write, attach an idempotency key so a retried request is deduplicated server-side.
# GET /trials?limit=50&cursor=<opaque> — opaque cursor = base64(last sort key)
import base64, json
def list_trials(db, limit=50, cursor=None):
after = json.loads(base64.urlsafe_b64decode(cursor)) if cursor else None
# keyset: WHERE (created_at, id) > (:ts, :id) ORDER BY created_at, id LIMIT n+1
rows = db.query_after(after, limit + 1) # fetch one extra to detect more
has_more = len(rows) > limit
rows = rows[:limit]
next_cursor = None
if has_more:
last = rows[-1]
token = json.dumps([last["created_at"], last["id"]])
next_cursor = base64.urlsafe_b64encode(token.encode()).decode()
return {"items": rows, "next_cursor": next_cursor} # null = last page
Interview Q&A · deep dive
Idempotency-Key header; the server stores the key with the result of the first successful execution. A retry with the same key returns the stored result instead of creating a second order. POST isn't idempotent by nature, so you make this specific operation idempotent with the key — and set a TTL on stored keys.400 Bad Request is for syntactically malformed requests the server can't parse (bad JSON, missing required structure). 422 Unprocessable Entity is for well-formed requests that fail business/validation rules (valid JSON, but age: -5). FastAPI returns 422 for Pydantic validation failures for exactly this reason — the syntax was fine, the semantics weren't.Auth, API styles & FastAPI production
Beyond REST: how to secure an API, when another style fits, and the FastAPI patterns that make Python services clean — typed validation, dependency injection, async endpoints, and auto-generated docs.
| Style | Strength | Reach for it when |
|---|---|---|
| REST | simple, cacheable, universal | resource CRUD, public APIs |
| GraphQL | client picks exact fields | varied clients, over/under-fetching pain |
| gRPC | fast binary, streaming, typed | internal service-to-service, low latency |
from fastapi import FastAPI, Depends
from pydantic import BaseModel
class Query(BaseModel): # validated automatically
text: str; top_k: int = 5
app = FastAPI()
@app.post("/search")
async def search(q: Query, svc=Depends(get_service)):
return await svc.search(q.text, q.top_k)
Interview Q&A
REST, GraphQL, and gRPC aren't a maturity ladder — they optimise different things. REST optimises for cacheability and ubiquity (every proxy, browser, and CDN understands it). GraphQL optimises for diverse clients fetching exactly what they need from one endpoint (mobile + web + partners, no over/under-fetch). gRPC optimises for low-latency typed service-to-service calls (HTTP/2, binary Protobuf, bidirectional streaming). Pick by who calls you and how often the shape of the data they need changes.
| REST | GraphQL | gRPC | |
|---|---|---|---|
| Transport | HTTP/1.1+, JSON | HTTP, JSON over one POST | HTTP/2, Protobuf binary |
| Caching | native (HTTP caches) | hard (one POST endpoint) | app-level only |
| Over/under-fetch | common | client picks fields | fixed message, compact |
| Streaming | SSE / WebSocket bolt-on | subscriptions | first-class bidirectional |
| Browser-native | yes | yes | needs gRPC-Web proxy |
API key: a shared secret identifying a client, sent per request — coarse, easy, no expiry by default (rotate it). JWT: a signed, self-contained token carrying claims (sub, exp, scope); the server verifies the signature, so it needs no lookup — stateless and fast, but you can't easily revoke one before it expires. OAuth2: a framework for delegated access — "let app X act on user U's behalf without U handing over their password" — which mints access tokens (often JWTs). OAuth2 is the how you get the token; JWT is often what the token is.
import time, jwt # PyJWT
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
oauth2 = OAuth2PasswordBearer(tokenUrl="token") # pulls Bearer from header
SECRET = "..."; ALGO = "HS256" # RS256 in prod (asymmetric)
def current_user(token: str = Depends(oauth2)):
try:
claims = jwt.decode(token, SECRET, algorithms=[ALGO]) # verifies sig + exp
except jwt.ExpiredSignatureError:
raise HTTPException(status.HTTP_401_UNAUTHORIZED, "token expired")
except jwt.InvalidTokenError:
raise HTTPException(status.HTTP_401_UNAUTHORIZED, "invalid token")
if "read:trials" not in claims.get("scope", "").split():
raise HTTPException(status.HTTP_403_FORBIDDEN, "missing scope")
return claims["sub"] # inject this into endpoints
Interview Q&A · deep dive
read:trials, write:orders) — they bound the delegated access an OAuth2 client was granted. Roles describe what a user/principal is (admin, analyst) and usually map to a set of permissions. A request is allowed only if the token's scope and the principal's role both permit it — scope caps the client, role caps the user.FastAPI in depth framework
FastAPI is a thin layer over Starlette (ASGI server) + Pydantic (validation). The reason it became the Python API default: types are the contract, validation and OpenAPI docs are free, and async endpoints scale I/O concurrency without ceremony.
from fastapi import FastAPI, Depends, HTTPException, status
from pydantic import BaseModel, Field
class SearchIn(BaseModel):
text: str = Field(min_length=1, max_length=512)
top_k: int = Field(5, ge=1, le=50)
class Hit(BaseModel):
gdcid: str; score: float
app = FastAPI()
@app.post("/search", response_model=list[Hit], status_code=status.HTTP_200_OK)
async def search(q: SearchIn, svc=Depends(get_service)):
if not await svc.ready():
raise HTTPException(status.HTTP_503_SERVICE_UNAVAILABLE)
return await svc.search(q.text, q.top_k)
| Lever | What it gives you |
|---|---|
| Pydantic models | request validation, response shape, OpenAPI schema — one declaration |
| Depends() | dependency injection — services, DB sessions, auth, scoped cleanly per request |
| async def | non-blocking I/O; FastAPI runs def endpoints in a threadpool, so don't mix blocking I/O into async |
| Lifespan | startup/shutdown context — load models, warm caches, close pools |
| Middleware | cross-cutting: logging, request-id, CORS, auth, rate-limit — runs around every request |
| Background tasks | fire-and-forget after response; for real work use Celery/RQ instead |
| WebSockets | streaming endpoints (LLM token stream, live updates) |
Interview Q&A
FastAPI inspects each endpoint. An async def runs directly on the event loop — so every call inside it must be awaitable, and a blocking call there freezes the whole process. A plain def is dispatched to a bounded threadpool (default ~40 threads) so blocking libraries are safe — but throughput is capped by that pool. The rule: async endpoint → async clients only; sync library → keep the endpoint def. The disaster is an async def that calls requests.get() — it looks concurrent and serialises under load.
from contextlib import asynccontextmanager
from typing import Annotated
from fastapi import FastAPI, Depends, BackgroundTasks
@asynccontextmanager
async def lifespan(app: FastAPI):
app.state.pool = await open_db_pool() # startup: warm caches, load models
yield # <-- app serves requests here
await app.state.pool.close() # shutdown: graceful cleanup
app = FastAPI(lifespan=lifespan)
async def get_db(): # yield dependency = setup/teardown per request
conn = await app.state.pool.acquire()
try:
yield conn
finally:
await app.state.pool.release(conn) # runs AFTER the response is sent
DB = Annotated[object, Depends(get_db)] # reusable typed dependency alias
@app.post("/reports", status_code=202)
async def make_report(db: DB, bg: BackgroundTasks):
rid = await db.insert_pending_report()
bg.add_task(build_report, rid) # after response: fire-and-forget
return {"id": rid, "status": "accepted"} # 202 returns immediately
import time, uuid
from fastapi import Request
from fastapi.responses import JSONResponse
@app.middleware("http")
async def add_request_id(request: Request, call_next):
rid = str(uuid.uuid4())
t = time.perf_counter()
response = await call_next(request) # runs the rest of the pipeline
response.headers["X-Request-ID"] = rid
response.headers["X-Process-Time"] = f"{time.perf_counter()-t:.3f}"
return response
@app.exception_handler(ValueError)
async def on_value_error(request: Request, exc: ValueError):
return JSONResponse(status_code=422, content={"detail": str(exc)})
Interview Q&A · deep dive
lifespan instead of the old @app.on_event("startup")?lifespan is a single async context manager, so setup and teardown live together (acquired-on-startup is exactly what's released-on-shutdown) and it's harder to leak a resource. The on_event hooks are deprecated, split startup/shutdown apart, and don't share scope. Lifespan also integrates cleanly with ASGI's lifespan protocol used by Uvicorn/Gunicorn.yield dependency give you over a plain return dependency?yield runs as setup, the yielded value is injected, and code after yield (in a finally) runs after the response is sent — perfect for releasing a DB connection or closing a transaction. FastAPI orders teardown correctly even with nested dependencies, and runs it whether the endpoint succeeded or raised.async def but calls a synchronous SQLAlchemy session. What breaks and how do you fix it?def so FastAPI runs it in the threadpool, switch to the async SQLAlchemy engine and await it, or wrap the blocking call in await asyncio.to_thread(...). Don't mix a blocking call into an async def.422 Unprocessable Entity with a structured per-field error list, automatically, without your code running. That's why the endpoint can assume its inputs are already typed and valid, and why the OpenAPI schema is generated from the same model.API limits, quotas & rate limiting capacity
Every API has hard limits — payload size, URL length, headers, requests per second, tokens per minute, context window. The senior move is naming the limits before they bite production and choosing the right rate-limit algorithm for the workload.
| Algorithm | How it works | Sweet spot |
|---|---|---|
| Fixed window | counter per N-second window; resets at boundary | simple, but bursts at window edges |
| Sliding window | rolling count over last N seconds | smoother; per-key memory cost |
| Token bucket | bucket refills at rate R, request costs 1 token; burst = bucket size | default choice — allows controlled bursts |
| Leaky bucket | requests queue, drain at fixed rate; overflow drops | strict downstream rate-shaping |
| Concurrency cap | max N in-flight; reject or queue beyond | protecting a slow backend (an LLM, a DB) |
import time, random, httpx
def call_with_retry(url, payload, tries=6):
for i in range(tries):
r = httpx.post(url, json=payload, timeout=30)
if r.status_code != 429 and r.status_code < 500:
return r
# honour server hint; otherwise jittered exponential backoff
wait = float(r.headers.get("Retry-After", 0)) or (2**i + random.random())
time.sleep(wait)
r.raise_for_status()
Interview Q&A
All four limiters answer "is this request allowed?", but they differ in how they treat bursts. Fixed window counts per calendar window and resets hard — so a client can fire 2× the limit across a boundary (the "edge burst" problem). Sliding window log keeps timestamps of recent requests and counts the true last-N-seconds — accurate but O(requests) memory per key. Token bucket refills tokens at rate R up to cap B; idle time banks burst budget, so it allows controlled bursts. Leaky bucket drains at exactly rate R regardless of input — it smooths output, never bursts. Bursty real traffic → token bucket; protect a fragile downstream → leaky bucket.
import time
class TokenBucket:
def __init__(self, rate, capacity):
self.rate = rate # tokens added per second
self.capacity = capacity # max burst size
self.tokens = capacity
self.updated = time.monotonic()
def allow(self, cost=1):
now = time.monotonic()
# lazy refill: add only the tokens earned since last check
self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
self.updated = now
if self.tokens >= cost:
self.tokens -= cost
return True, 0.0
deficit = cost - self.tokens
return False, deficit / self.rate # seconds → Retry-After
bucket = TokenBucket(rate=5, capacity=10) # 5 req/s steady, burst 10
for i in range(14):
ok, retry = bucket.allow()
print(i, "OK" if ok else f"429 retry in {retry:.2f}s")
import asyncio, random, httpx
async def call(client, url, payload, tries=6):
for i in range(tries):
r = await client.post(url, json=payload, timeout=30)
if r.status_code < 500 and r.status_code != 429:
return r # success or non-retryable
# prefer the server's hint; else exponential backoff with full jitter
hint = r.headers.get("Retry-After")
wait = float(hint) if hint else random.uniform(0, 2 ** i)
await asyncio.sleep(wait)
r.raise_for_status() # give up after N tries
Interview Q&A · deep dive
Retry-After (how long to wait) and ideally RateLimit/RateLimit-Policy headers showing remaining quota and the policy. A good client honours Retry-After exactly when present, otherwise uses exponential backoff with full jitter, caps the number of retries, and attaches an idempotency key on writes so a retry after a partial success can't double-apply.Resilience & agentic patterns senior
Patterns the GoF book never covered but a senior is assumed to own: the distributed-resilience set that keeps a service alive when its dependencies fail, and the emerging agentic vocabulary that's becoming the architecture layer for LLM systems.
| Pattern | Force it resolves | Where it lands for you |
|---|---|---|
| Circuit breaker | stop hammering a failing dependency | wrap the per-field LLM endpoints (CT LLM Executor) so an outage trips open, not cascades |
| Bulkhead | isolate resource pools so one slow path can't starve others | one registry's slowness mustn't drain the shared worker pool |
| Retry + backoff | ride out transient faults without a thundering herd | TrainHub chunked-upload resumability; jittered exponential backoff |
| Saga | consistency across steps with no distributed transaction | multi-stage ingest where each step has a compensating undo |
import time, random, functools
def retry(tries=5, base=0.5):
def deco(fn):
@functools.wraps(fn)
def wrap(*a, **kw):
for i in range(tries):
try: return fn(*a, **kw)
except Exception:
if i == tries - 1: raise
time.sleep(base * 2**i + random.random()) # jitter
return wrap
return deco
Interview Q&A
| Layer | Failure it absorbs | Without it |
|---|---|---|
| Timeout | a call that never returns | threads pile up, pool exhausts |
| Retry + backoff + jitter | transient blips | fail on a 1-in-100 hiccup |
| Circuit breaker | a sustained outage | retries amplify the outage (DDoS yourself) |
| Bulkhead | one slow path | it drains the shared pool, everything stalls |
| Idempotency | duplicate delivery | double-charge, double-write |
| Fallback | the dependency is just gone | hard error reaches the user |
These compose in a precise order, innermost to outermost: timeout wraps the raw call, retry wraps the timeout, the breaker wraps the retry, the bulkhead caps concurrency around all of it. Get the order wrong — e.g. retrying outside a breaker that's already open — and you defeat the breaker. The deepest gotcha: retries make a transient outage worse unless a breaker caps them, because every client retries the struggling service in unison (the retry storm / thundering herd).
Closed = healthy, calls flow, count failures. Trip past a threshold → Open: fail fast (or fallback) for a cool-down, sending zero load to the sick dependency. After the cool-down → Half-open: allow a few probe calls; success closes the breaker, failure re-opens it. This is the State pattern applied to fault handling.
import time
class CircuitOpen(Exception): pass
class Breaker:
def __init__(self, fail_max=3, cool=5.0):
self.fail_max, self.cool = fail_max, cool
self.fails = 0; self.opened_at = None; self.state = "closed"
def call(self, fn, *a):
if self.state == "open":
if time.monotonic() - self.opened_at < self.cool:
raise CircuitOpen("fail fast") # open: no load sent
self.state = "half" # cool-down elapsed → probe
try:
r = fn(*a)
except Exception:
self.fails += 1
if self.fails >= self.fail_max or self.state == "half":
self.state, self.opened_at = "open", time.monotonic()
raise
self.fails = 0; self.state = "closed" # probe/call ok → close
return r
# Idempotency: dedupe by key so retried/duplicate deliveries are safe.
SEEN = {}
def charge(idem_key, amount):
if idem_key in SEEN: # same key → return prior result, no re-charge
return SEEN[idem_key]
result = {"charged": amount} # the real side effect happens once
SEEN[idem_key] = result
return result
charge("req-7", 50); charge("req-7", 50) # billed once, not twice
# ReAct: bounded reason→act→observe loop over tools (the agent control loop).
def react(goal, tools, llm, max_steps=5):
scratch = []
for _ in range(max_steps): # bound is non-negotiable — no infinite loops
thought, action, arg = llm(goal, scratch) # reason → choose tool
if action == "finish":
return arg # terminal: model says it's done
obs = tools[action](arg) # act, then observe the result
scratch.append((thought, action, obs))
return "gave up: step budget exhausted" # fail safe, not silent
Interview Q&A · deep dive
PUT x=5, DELETE, set-membership) need no key; INSERT/increment/charge do.UI/UX concepts for engineers product craft
You don't need to be a designer, but a senior who builds tools and dashboards is expected to make usable interfaces and speak the language. UX is how it works and feels; UI is how it looks. The fastest level-up is a handful of durable principles, not pixel-pushing.
| Visibility of status | always show what's happening — loading, saved, progress |
| Match the real world | use the user's words and mental models, not internal jargon |
| User control | undo, cancel, clear exits — never trap the user |
| Consistency | the same action looks & behaves the same everywhere (a design system enforces this) |
| Error prevention | stop mistakes before they happen (confirm destructive actions, validate inputs) |
| Recognition over recall | show options; don't make people remember them |
| Clear error recovery | plain-language errors that say what to do next |
| Concept | What to apply |
|---|---|
| Visual hierarchy | size, weight, colour, spacing guide the eye to what matters first; one primary action per screen |
| Accessibility (WCAG / a11y) | sufficient colour contrast, keyboard navigation, alt text, labels — usable by everyone, often legally required |
| Responsive design | layouts that work mobile → desktop; design mobile-first, enhance up |
| Design system | reusable tokens + components (spacing, colour, type, buttons) so a team ships consistent UI fast |
| Information architecture | group and label so users find things — fewer top-level choices, clear paths (your jobs-to-be-done framing) |
Interview Q&A
Engineers ship the ideal state and forget the other four. A robust screen has five states, and the boring ones (loading, empty, error) are where trust is won or lost. The discipline: for every view that fetches data, sketch all five before writing the component. "It works on my machine with seeded data" is the ideal state in disguise.
| State | What it must do |
|---|---|
| Ideal | the rich, populated view — what you naturally build first |
| Empty | no data yet — explain why and give the next action (not a blank box) |
| Loading | skeleton that mirrors layout, not a centred spinner that hides shape |
| Partial | some data, some still streaming in — keep the page usable |
| Error | plain-language cause + a retry, never a stack trace |
Speed is felt, not measured. Three thresholds (the classic HCI numbers) decide what feedback you owe the user. Below 100 ms feels instant — no indicator. Up to ~1 s the user stays "in flow" — show subtle motion, no blocking spinner. Past ~10 s attention is gone — show real progress and let them work elsewhere. A skeleton screen tests ~20% faster than a spinner for the same wait because it primes the eye to the final shape; optimistic UI (render the success state immediately, reconcile on the server reply) makes a 300 ms round-trip feel like zero.
Interview Q&A · deep dive
<button>/<label>/heading order and alt text, so screen readers announce meaning, not "clickable div".Pydantic — typed data validation models
Pydantic turns Python type hints into runtime validation. Declare a model with annotated fields; Pydantic parses, coerces, and validates input, raising clear structured errors on bad data. It's the schema layer beneath FastAPI and the standard for config and API I/O.
from pydantic import BaseModel, Field, field_validator
from datetime import date
class Trial(BaseModel):
id: int
title: str = Field(min_length=3)
phase: int = Field(ge=1, le=4) # 1..4 enforced
start: date | None = None
@field_validator("title")
@classmethod
def not_blank(cls, v):
if not v.strip(): raise ValueError("blank title")
return v.strip()
t = Trial(id="42", title="NSCLC study", phase="3") # coerces "42"->42
t.model_dump() # {'id':42,'title':'NSCLC study','phase':3,'start':None}
t.model_dump_json() # -> JSON string
| Feature | What it does |
|---|---|
| Field(gt=, max_length=) | declarative constraints on a field |
| @field_validator / @model_validator | custom checks on one field / the whole model |
| model_dump() / model_validate() | serialize to dict / parse + validate input (v2 names) |
| BaseSettings (pydantic-settings) | typed config loaded and validated from env vars |
Interview Q&A
At import time Pydantic compiles each model into a CoreSchema — a tree describing every field's validators and serializers — and hands it to pydantic-core, a Rust engine. Validation at runtime is then a tight Rust loop, not Python attribute-by-attribute checking, which is why v2 is roughly 5–50× faster than v1. The compile-once / validate-many split is the mental model: model definition is the slow part (paid once), instantiation is cheap. Validators run in modes — mode="before" sees raw input (good for normalising/parsing), mode="after" sees the already-coerced typed value (good for business rules).
from pydantic import BaseModel, Field, field_validator, model_validator, computed_field, field_serializer
from datetime import date
class Enrollment(BaseModel):
site: str
opened: date
closed: date | None = None
target: int = Field(gt=0)
enrolled: int = Field(ge=0)
@field_validator("site", mode="before") # runs on RAW input, before coercion
@classmethod
def upper(cls, v: str) -> str:
return v.strip().upper()
@model_validator(mode="after") # whole-model rule, post-coercion
def check_window(self) -> "Enrollment":
if self.closed and self.closed < self.opened:
raise ValueError("closed before opened")
if self.enrolled > self.target:
raise ValueError("over-enrolled")
return self
@computed_field # derived, appears in model_dump()
@property
def pct_full(self) -> float:
return round(100 * self.enrolled / self.target, 1)
@field_serializer("opened") # control wire format
def iso(self, v: date) -> str:
return v.isoformat()
e = Enrollment(site=" bdx-07 ", opened="2026-01-10", target=50, enrolled="20")
print(e.model_dump()) # {'site':'BDX-07', ..., 'pct_full': 40.0}
from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict
class Settings(BaseSettings):
model_config = SettingsConfigDict(env_prefix="APP_", env_file=".env")
db_url: str # from APP_DB_URL (required)
pool_size: int = Field(default=10, ge=1, le=100)
debug: bool = False # "1"/"true"/"yes" all coerce
settings = Settings() # validated at startup -> fail fast on bad config
| v1 (legacy) | v2 (current) |
|---|---|
| .dict() / .json() | .model_dump() / .model_dump_json() |
| .parse_obj() | .model_validate() / .model_validate_json() |
| @validator / @root_validator | @field_validator / @model_validator (+ mode=) |
| class Config: | model_config = ConfigDict(...) |
| (none) | @computed_field, @field_serializer, @model_serializer |
Interview Q&A · deep dive
before and after validator, and when do you reach for a model validator?mode="before" validator receives the raw input prior to type coercion — ideal for normalising or parsing (trim a string, split a CSV). A mode="after" validator receives the value already coerced to the field's type — ideal for business invariants. Use a @model_validator when a rule spans multiple fields (e.g. end > start), since a field validator only sees its own field.pydantic-core). Each model is compiled once into a CoreSchema, then validation runs as a Rust loop rather than per-attribute Python. Real-world speedups are ~5–50×. The cost moves to model-definition time, which is fine because you define once and validate many."5" to 5?model_config = ConfigDict(strict=True), per-field with a Strict* type (StrictInt, StrictStr) or Field(strict=True), or per-call with model_validate(data, strict=True). Choose lax at human/forgiving boundaries, strict at machine contracts.@computed_field (above @property). It is excluded from validation input but included in model_dump() / JSON Schema, and can be conditionally dropped via exclude_if. For reshaping an existing field on the way out, use @field_serializer; for whole-model output, @model_serializer.@dataclass with type hints — what does Pydantic add at runtime?ValidationError, and serialises to/from dict and JSON with schema generation. Dataclass for internal plumbing; Pydantic at trust boundaries.Flask — the micro-framework minimal
Flask gives you routing, request/response handling, and templating, then stays out of the way — you assemble the rest (DB, auth, validation) from extensions. It's WSGI / synchronous by default: ideal when you want a small footprint and full control over the stack.
from flask import Flask, request, jsonify
def create_app(): # factory -> testable, configurable
app = Flask(__name__)
@app.route("/trials/<int:tid>") # typed URL converter
def get_trial(tid):
return jsonify(id=tid, status="active")
@app.post("/trials")
def create():
data = request.get_json()
return jsonify(created=data), 201
return app
| Piece | Role |
|---|---|
| Blueprints | split routes into modular, registrable groups |
| App factory | build the app in a function so tests get a fresh instance |
| Extensions | Flask-SQLAlchemy, Marshmallow, JWT — you pick the stack |
| WSGI / sync | blocking by default; async views exist but are limited |
Interview Q&A
Flask's most-misunderstood feature is that request, session, g, and current_app are global-looking proxies that are actually per-request. On each incoming request Flask pushes a request context (and an application context) onto a stack; the proxies resolve to whatever is on top of the stack for the current worker/thread/coroutine. That is how the same from flask import request import gives every concurrent request its own data without you passing it around. Outside a request (a script, a CLI command) those proxies are unbound — touching them raises "Working outside of application context", which you fix with with app.app_context():.
| Proxy | Scope / holds |
|---|---|
| request | request context — the incoming HTTP request |
| g | application context — scratch space for one request (e.g. db handle) |
| current_app | application context — the active app (factory-friendly) |
| session | request context — signed cookie store |
# trials/api.py -- a blueprint groups related routes
from flask import Blueprint, request, jsonify, g, abort
bp = Blueprint("trials", __name__, url_prefix="/api/trials")
@bp.get("/<int:tid>")
def get_one(tid):
row = g.db.find(tid) # g = per-request scratch space
if row is None:
abort(404) # short-circuits to the 404 handler
return jsonify(row)
# app.py -- the factory assembles & returns the app
from flask import Flask, jsonify
from trials.api import bp as trials_bp
def create_app(config=None):
app = Flask(__name__)
app.config.update(config or {})
app.register_blueprint(trials_bp) # mount the module
@app.errorhandler(404) # JSON errors, not HTML pages
def not_found(e):
return jsonify(error="not found"), 404
@app.teardown_appcontext # runs after every request
def close_db(exc):
db = g.pop("db", None)
if db is not None: db.close()
return app
Interview Q&A · deep dive
from flask import request be a module-level import yet give each concurrent request its own data?request is a context-local proxy, not the request itself. Flask pushes a request context onto a stack at the start of each request; the proxy forwards attribute access to whatever sits on top of the stack for the current execution context (thread/greenlet/task). So the global name resolves to a different object per in-flight request — no thread-safety problem, no passing it around.g for, and how long does it live?g is per-request scratch storage tied to the application context — typically a DB connection or the authenticated user, set once and reused within that request. It is reset for every request and is not shared between them, so it is not a cache. Clean up in teardown_appcontext.app = Flask(__name__) works?init_app. It removes import-time side effects.async def views, but each still runs in a worker thread — it's not a true async stack like ASGI/FastAPI, so for high-concurrency async I/O, FastAPI is the better fit.Django — batteries included full-stack
Django ships everything: a powerful ORM with migrations, an auto-generated admin site, auth, forms, and templating, organized as MTV (model–template–view). You trade Flask's flexibility for convention and speed on CRUD-heavy apps; Django REST Framework adds APIs on top.
# models.py (a migration is generated from this)
from django.db import models
class Trial(models.Model):
title = models.CharField(max_length=200)
phase = models.IntegerField()
# the ORM — a lazy QuerySet
Trial.objects.filter(phase=3).order_by("title")
Trial.objects.select_related("sponsor").get(id=42) # avoid N+1
# views.py
from django.http import JsonResponse
def active(request):
qs = Trial.objects.filter(phase__gte=3).values()
return JsonResponse(list(qs), safe=False)
| Built in | What you get |
|---|---|
| ORM + migrations | models become tables; schema changes are versioned migrations |
| Admin | auto CRUD UI over your models — huge time-saver |
| Auth, forms, templates | users/permissions, validation, server-rendered HTML |
| Django REST Framework | serializers + viewsets to expose the ORM as a REST API |
Interview Q&A
A Django request is a pipeline, not a function call. The URL resolver maps the path to a view; middleware wraps the view as nested layers (each can short-circuit or post-process — session, auth, CSRF, GZip all live here); the view runs business logic against the ORM and renders a template or returns JSON. "MTV" is Django's MVC: the Model is the ORM layer, the Template is the presentation, the View is the controller that ties them. Knowing the order matters: request.user exists only because AuthenticationMiddleware ran before your view.
from django.db.models import Count, Q, F, Avg
# annotate = compute per-row aggregates IN SQL (not in Python)
sites = (Site.objects
.annotate(n_active=Count("trial", filter=Q(trial__phase__gte=3)))
.filter(n_active__gt=0)
.order_by("-n_active"))
# F() references a column -> atomic update, no read-modify-write race
Trial.objects.filter(id=42).update(enrolled=F("enrolled") + 1)
# Q() builds complex boolean filters (| OR, & AND, ~ NOT)
Trial.objects.filter(Q(phase=3) | Q(phase=4), ~Q(status="closed"))
# beat N+1: one query for the FK join, one batched query for the reverse set
qs = (Trial.objects
.select_related("sponsor") # to-one -> SQL JOIN
.prefetch_related("sites") # to-many -> 2nd query, joined in Python
.filter(phase__gte=3))
from rest_framework import serializers, viewsets
class TrialSerializer(serializers.ModelSerializer):
sponsor = serializers.StringRelatedField() # nested read-only field
class Meta:
model = Trial
fields = ["id", "title", "phase", "sponsor"]
class TrialViewSet(viewsets.ModelViewSet): # full CRUD from one class
serializer_class = TrialSerializer
# select_related here so the API doesn't trigger N+1 per row
queryset = Trial.objects.select_related("sponsor").all()
| Need | Tool | Why |
|---|---|---|
| to-one relation | select_related | SQL JOIN in one query |
| to-many / reverse FK | prefetch_related | second batched query, joined in Python |
| per-row aggregate | annotate | computed in SQL, not a Python loop |
| atomic counter | F() | UPDATE in DB, dodges read-modify-write races |
| complex boolean filter | Q() | OR / NOT / grouped conditions |
Interview Q&A · deep dive
HttpRequest → the middleware stack runs top-down (request phase: session, auth populating request.user, CSRF) → the URL resolver matches the path to a view → the view runs logic, queries the Model/ORM, renders a Template or returns JSON as an HttpResponse → middleware runs bottom-up (response phase: GZip, headers) → the response is returned. That ordering is why request.user is available in the view at all.filter().order_by() constructs SQL but executes nothing. Evaluation is triggered by iteration, slicing with a step, len(), list(), bool(), or pickling. Results are then cached on the QuerySet, so re-iterating reuses them — but .count() and a fresh slice issue new queries. This laziness lets you compose filters cheaply and chain them across functions.F() for enrolled = enrolled + 1 instead of doing it in Python?obj.enrolled += 1; obj.save() reads the value, increments in Python, and writes back — two concurrent requests can both read the same value and one increment is lost. update(enrolled=F("enrolled") + 1) compiles to a single atomic UPDATE ... SET enrolled = enrolled + 1 in the database, so the increment is race-free and skips loading the object.select_related follows to-one relations (FK, one-to-one) via a SQL JOIN, so the related rows come back in the same query — cheap but widens each row. prefetch_related handles to-many (reverse FK, M2M) with a second query that fetches all related objects in one shot, then stitches them in Python — more queries (a small constant) but avoids a giant cartesian JOIN. Both turn N+1 into O(1) queries.select_related/prefetch_related to the view or ViewSet queryset, and to guard with assertNumQueries in tests or django-debug-toolbar in dev.requests & httpx — HTTP clients calling APIs
requests is the classic synchronous HTTP client — simple and ubiquitous. httpx is the modern successor: the same ergonomic API plus async support, HTTP/2, and connection pooling via a client. For anything async (FastAPI, agents calling tools), httpx is the default.
import requests, httpx
# requests -- synchronous, the classic
r = requests.get("https://api.example.com/trials", timeout=10)
r.raise_for_status() # turn 4xx/5xx into an exception
data = r.json()
# httpx -- async + reused connections
async with httpx.AsyncClient(timeout=10) as client:
resp = await client.post("/trials", json={"phase": 3})
resp.raise_for_status()
| Tool / habit | Why |
|---|---|
| requests | sync, dead simple — scripts and most server code |
| httpx | sync and async, HTTP/2 — modern apps and concurrency |
| Session / Client | reuse one across calls to pool connections — a big perf win |
| timeout= | always set it — no timeout means a hang can wedge your service |
| raise_for_status() | fail loudly on HTTP errors instead of parsing an error body |
Interview Q&A
The jump from "it works" to "it survives production" is three habits beyond raise_for_status(). (1) One long-lived client with a connection-pool Limits so concurrent calls reuse warm TCP/TLS instead of handshaking every time. (2) Granular timeouts — a single timeout=10 is a blunt instrument; httpx lets you bound connect, read, write, and pool separately, which is what you want when a server accepts the connection fast but streams slowly. (3) Retries with backoff — but know what the built-in retry does and does not cover.
| Timeout phase | Bounds |
|---|---|
| connect | time to establish the TCP/TLS connection |
| read | max gap between received chunks of the response |
| write | max gap while sending the request body |
| pool | time waited for a free connection from the pool |
import httpx
# transport retries ONLY connect errors/timeouts -- not 429/5xx responses
transport = httpx.AsyncHTTPTransport(retries=3)
limits = httpx.Limits(max_connections=100, max_keepalive_connections=20)
timeout = httpx.Timeout(connect=5.0, read=10.0, write=5.0, pool=2.0)
# build ONE client at startup and reuse it (DI / module singleton)
client = httpx.AsyncClient(
base_url="https://api.example.com",
transport=transport, limits=limits, timeout=timeout,
headers={"authorization": "Bearer ..."},
)
async def fetch_trial(tid: int) -> dict:
r = await client.get(f"/trials/{tid}")
r.raise_for_status() # 4xx/5xx -> HTTPStatusError
return r.json()
# on shutdown: await client.aclose() -- release pooled sockets
async def download(url: str, dest: str):
# .stream() returns headers immediately; body is pulled lazily
async with client.stream("GET", url) as resp:
resp.raise_for_status()
with open(dest, "wb") as f:
async for chunk in resp.aiter_bytes(chunk_size=65536):
f.write(chunk) # constant memory, even for a 2 GB file
Interview Q&A · deep dive
timeout=10 is set but requests still hang for minutes. What's likely wrong?Timeout(connect=, read=, write=, pool=) and, for a hard ceiling on total time, wrap the whole call (e.g. asyncio.wait_for / an overall deadline) so no single request can exceed a wall-clock budget.retries= covers only connection errors, not HTTP error responses, and adds no backoff. Handle 429/503 yourself: retry only idempotent methods, use exponential backoff with jitter, honour the Retry-After header if present, cap attempts, and ideally pair with a circuit breaker. Libraries like tenacity or httpx-retries express this cleanly.Limits control?Limits(max_connections, max_keepalive_connections, keepalive_expiry) caps total concurrent sockets (back-pressure so you don't exhaust the upstream or your file descriptors) and how many idle connections to keep warm and for how long.client.stream("GET", url) in a context manager and iterate aiter_bytes() (or aiter_lines()), writing each chunk to disk — memory stays constant. The gotcha: inside the stream block the body isn't buffered, so resp.text/resp.json() raise until you've read it; if you need the parsed body, call resp.read() first or don't stream.Machine Learning & Data Science
The classical-ML and data-science layer underneath the LLM work — the DS workflow, the algorithms and when each fits, honest evaluation, the scikit-learn ecosystem, and MLflow for tracking and shipping models. This is the foundation a Python/ML/GenAI role expects you to stand on before the LLM specifics.
The data-science workflow lifecycle
A model is the small part. Most of the value — and most interview discussion — is in framing the problem, understanding the data, and evaluating honestly. The work is a loop, not a line.
Interview Q&A
The diagram people draw is a straight pipeline; the work is a cycle with two inner loops. The fast loop (features → train → eval) runs dozens of times a day against the validation set. The slow loop (re-frame the problem, re-collect data, redefine the metric) runs when the fast loop plateaus or production drifts. The senior signal is knowing which loop you are in: tuning hyperparameters when the real problem is a wrong success metric is wasted motion.
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import average_precision_score
# 1. FRAME: target + the metric that matches the business cost (recall-heavy)
df = pd.read_parquet("events.parquet")
y = df.pop("churned_30d"); X = df
# 2. SPLIT first — stratify so the rare class survives in every split
X_tr, X_te, y_tr, y_te = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=42)
# 3. BASELINE — the bar every later model must clear on the SAME metric
base = DummyClassifier(strategy="prior").fit(X_tr, y_tr)
print("baseline AP:", average_precision_score(y_te, base.predict_proba(X_te)[:, 1]))
# 4. ITERATE — measure on CV (train only), not the test set
model = HistGradientBoostingClassifier(random_state=42)
cv = cross_val_score(model, X_tr, y_tr, scoring="average_precision", cv=5)
print("model CV AP:", cv.mean().round(3))
# 5. EVAL once on held-out test only after you've stopped iterating
model.fit(X_tr, y_tr)
print("held-out AP:", average_precision_score(y_te, model.predict_proba(X_te)[:, 1]))
| Stage | Real time spent | What goes wrong here |
|---|---|---|
| Frame & metric | under-invested | optimising accuracy on a 2% class; metric doesn't match cost of errors |
| EDA & data cleaning | ~60-80% | missingness patterns and leakage missed; train data not like production |
| Modelling | ~10% | jumping to deep nets before a baseline exists |
| Eval & ship | under-invested | no monitoring; offline metric ≠ online metric (selection bias, drift) |
Interview Q&A · deep dive
Model development — rules & process discipline
A model is shipped on process, not on a single clever idea. The senior tell is naming the rules you never break (no test-set leakage, baseline first, one variable at a time) and the loop you always run (frame → split → baseline → iterate → eval → ship).
| Rule | What it means |
|---|---|
| Baseline first | simplest model + naive features. Anything later must beat it on the same eval, or it's not progress. |
| No leakage | fit scalers/encoders on train only; never let test data leak into preprocessing, feature selection, or model picking. |
| Hold out the test set | touch it once at the end. Use train/val for everything else. Use cross-validation when data is small. |
| One change at a time | change features or model or hyperparams per experiment, log everything to MLflow, so you know what moved the needle. |
| Regularize before complicating | L1/L2, dropout, early stopping, simpler model. Don't add features to a model that's already overfitting. |
| Reproducibility | seed RNGs, pin versions, version data, log the run. "It worked on my notebook" is not a ship. |
Interview Q&A
Every discipline rule reduces to one principle: your validation must simulate prediction time exactly. A plain random k-fold is fine for i.i.d. tabular rows, but it lies when there is structure — time, groups, or rare classes. The fix is to choose a splitter that respects that structure, so your CV score is an honest forecast of held-out performance.
| Data has... | Wrong splitter | Right splitter |
|---|---|---|
| Time order | random KFold (peeks at the future) | TimeSeriesSplit — train past, test future |
| Groups (per user/site) | KFold (same user in train & test) | GroupKFold — no group spans folds |
| Class imbalance | KFold (a fold may have 0 positives) | StratifiedKFold — keep class ratio |
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
# Pipeline = scaler fit inside each fold → no leakage during CV
pipe = Pipeline([("sc", StandardScaler()),
("clf", LogisticRegression(max_iter=1000))])
grid = {"clf__C": [0.01, 0.1, 1, 10]} # regularisation strength
inner = StratifiedKFold(5, shuffle=True, random_state=0)
outer = StratifiedKFold(5, shuffle=True, random_state=1)
# INNER loop tunes C; OUTER loop estimates generalisation of the whole procedure
search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=inner)
nested = cross_val_score(search, X, y, scoring="roc_auc", cv=outer)
print("unbiased AUC estimate:", nested.mean().round(3),
"+/-", nested.std().round(3))
Interview Q&A · deep dive
TimeSeriesSplit (expanding or rolling window: train on [t0..t], test on [t+1..t+k]). Also lag/window features must be computed without crossing the split boundary.C = stronger regularisation (C is inverse strength).account_closed_date for churn)? (2) was preprocessing fit on the full set? (3) are duplicate/near-duplicate rows split across train and test? (4) is there a group (user/session) appearing on both sides? Reproduce with the suspect feature removed and inside a proper pipeline.Supervised learning labels in
Learn a mapping from features to a labelled target. Two shapes: regression (predict a number) and classification (predict a category). Knowing when each algorithm fits beats memorising math.
| Algorithm | Use when |
|---|---|
| Linear / Logistic regression | baseline, interpretable, roughly linear signal |
| k-NN | small data, local structure, simple baseline |
| SVM | clear margins, medium data, high-dimensional |
| Naive Bayes | text/spam, fast, strong-independence ok |
| Tree ensembles | tabular default — see the ensembles card |
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1] # a probability, not just a label
print(classification_report(y_test, proba > 0.5))
print(dict(zip(features, clf.coef_[0]))) # which signal drove the decision
Interview Q&A
Strip the marketing and a supervised algorithm is two choices: the hypothesis class (what shapes of decision boundary it can draw) and the loss function (how it scores being wrong). Training is just minimising the loss over that class. This is why the same data gives different boundaries: a linear model can only draw a hyperplane; a tree draws axis-aligned rectangles; an SVM with an RBF kernel draws smooth curved regions.
| Task / model | Loss minimised | Boundary shape |
|---|---|---|
| Linear regression | MSE (squared error) | hyperplane (a number) |
| Logistic regression | log-loss (cross-entropy) | linear in feature space |
| Linear SVM | hinge loss (max-margin) | max-margin hyperplane |
| kNN | none (lazy, no training) | local, jagged (Voronoi) |
| Decision tree | Gini / entropy split gain | axis-aligned rectangles |
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
X, y = make_classification(n_samples=2000, n_informative=8, random_state=0)
# distance/gradient models NEED scaling; the tree does not — so pipe per model
models = {
"logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
"svm-rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
"knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
"tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}
for name, m in models.items():
s = cross_val_score(m, X, y, scoring="roc_auc", cv=5)
print(f"{name:8s} AUC {s.mean():.3f} +/- {s.std():.3f}")
import numpy as np
from sklearn.metrics import precision_recall_curve
proba = clf.predict_proba(X_val)[:, 1]
prec, rec, thr = precision_recall_curve(y_val, proba)
# pick the lowest threshold that still gives >= 90% precision
ok = prec[:-1] >= 0.90
best = thr[ok][np.argmax(rec[:-1][ok])] # max recall at that precision
print("deploy threshold:", round(best, 3)) # NOT the default 0.5
Interview Q&A · deep dive
(prediction - label) term, so optimisation is well-behaved and penalises confident mistakes heavily. It's also the maximum-likelihood objective for a Bernoulli target.Unsupervised learning no labels
Find structure without a target. Clustering groups similar points; dimensionality reduction compresses many features into a few while keeping signal; both power exploration and anomaly detection.
| Task | Tool | Note |
|---|---|---|
| Clustering (known k) | k-means | fast, assumes round, similar-size clusters |
| Clustering (density) | DBSCAN | finds arbitrary shapes + outliers, no k needed |
| Reduce dimensions | PCA | linear, keeps max variance; great preprocessing |
| Visualise clusters | t-SNE / UMAP | 2-D plots only — don't feed downstream |
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
km = KMeans(n_clusters=5, n_init="auto").fit(X)
print(silhouette_score(X, km.labels_)) # higher = tighter, better-separated
Interview Q&A
Unsupervised methods have no label to tell you you're right, so the danger is finding structure that isn't there. Two disciplines guard against it: defend k (don't eyeball it — use the elbow, silhouette, or a stability check) and defend the distance (scale features first, or one large-magnitude column silently becomes "the cluster"). A clustering is only as meaningful as the metric it's built on.
| Method | Picks k? | Cluster shape | Scales to big n? | Outliers |
|---|---|---|---|---|
| k-means | you must | convex, similar size | yes (O(nki)) | forced into a cluster |
| Hierarchical | cut the tree | any (linkage-dependent) | no (O(n²)) | visible in dendrogram |
| DBSCAN | no (eps, minPts) | arbitrary, density-based | medium | labelled as noise (-1) |
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import silhouette_score
Xs = StandardScaler().fit_transform(X) # scale FIRST — distance is everything
# defend k: sweep candidates, take the best average silhouette
scores = {}
for k in range(2, 9):
km = KMeans(n_clusters=k, n_init="auto", random_state=0).fit(Xs)
scores[k] = silhouette_score(Xs, km.labels_)
best_k = max(scores, key=scores.get)
print("chosen k:", best_k, scores)
# density clustering: no k, and -1 means "noise / outlier"
db = DBSCAN(eps=0.8, min_samples=10).fit(Xs)
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("DBSCAN clusters:", n_clusters,
"outliers:", int((db.labels_ == -1).sum()))
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
# PCA: linear, deterministic, keeps 95% of variance — safe to feed downstream
pca = PCA(n_components=0.95).fit(Xs)
print("kept dims:", pca.n_components_,
"var explained:", pca.explained_variance_ratio_.sum().round(3))
# t-SNE/UMAP: nonlinear, for 2-D PLOTS ONLY — distances/sizes are not meaningful
emb2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Xs)
Interview Q&A · deep dive
Feature engineering & preprocessing where models are won
Better features beat fancier models more often than not. The core moves: handle missing data, scale numerics, encode categoricals — and above all, avoid leakage.
| Step | How |
|---|---|
| Missing values | impute (mean/median/model) or flag; never silently drop |
| Scale numerics | standardise/normalise for distance- & gradient-based models |
| Encode categoricals | one-hot (low cardinality), target/frequency (high) |
| New signal | ratios, dates→parts, text→length/keywords, domain features |
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
pre = ColumnTransformer([
("num", StandardScaler(), num_cols),
("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)])
model = Pipeline([("pre", pre), ("clf", clf)])
model.fit(X_train, y_train) # scaler/encoder learn from TRAIN only
Interview Q&A
Every transformer that learns a statistic — a scaler's mean/std, an imputer's median, an encoder's category list, a target-encoder's per-category mean — must learn it from the training fold only, then apply (transform) that frozen statistic to validation, test, and production. The moment you call fit on data that includes the rows you'll later score, the test distribution has leaked in. The diagram below is the discipline made visual.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
num = Pipeline([("imp", SimpleImputer(strategy="median")),
("sc", StandardScaler())])
cat = Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
("oh", OneHotEncoder(handle_unknown="ignore"))]) # unseen cats → all-zeros
pre = ColumnTransformer([("num", num, num_cols),
("cat", cat, cat_cols)])
clf = Pipeline([("pre", pre), ("lr", LogisticRegression(max_iter=1000))])
clf.fit(X_train, y_train) # every statistic above is learned from TRAIN only
clf.predict(X_new) # same frozen transforms apply at prediction time
from sklearn.preprocessing import TargetEncoder # sklearn 1.3+, CV-fitted internally
from sklearn.feature_selection import SelectKBest, mutual_info_classif
# one-hot explodes on 10k zip codes → target encoding stays compact.
# TargetEncoder cross-fits internally so a row never sees its own label.
te = TargetEncoder(smooth="auto")
zip_encoded = te.fit_transform(X_train[["zip"]], y_train)
# feature selection is ALSO fit-on-train — put it in the pipeline, not before split
sel = SelectKBest(mutual_info_classif, k=20) # picks k highest mutual-info feats
| Scenario | Encoding | Why |
|---|---|---|
| Low cardinality (< ~15) | one-hot | no false ordinal order; sparse, exact |
| High cardinality (zip, sku) | target / frequency / hashing | one-hot would explode dimensionality |
| True ordinal (S<M<L) | ordinal encoder | the order is real signal |
| Trees, any cardinality | native categorical (LightGBM / HistGB) | handles categories without one-hot |
Interview Q&A · deep dive
Pipeline and passing that to cross_val_score makes sklearn re-fit the transforms on each training fold and only transform the validation fold, so no fold sees its own held-out data. Same for feature selection and resampling.OneHotEncoder(handle_unknown="ignore") (unseen → all-zero vector) or a target/frequency encoder with a fallback to the global prior. The deeper guard is monitoring for new categories and a retraining trigger — an all-zeros row is a quiet signal the model is now extrapolating.TargetEncoder does the cross-fitting for you.Vectorization & NumPy performance
Pure-Python loops over millions of records are slow because every iteration pays interpreter overhead. Vectorization pushes the loop into C — NumPy & pandas operate on contiguous typed arrays with batched SIMD-friendly ops, typically 10–100× faster than equivalent Python loops.
import numpy as np
x = np.arange(10_000_000, dtype=np.float32)
# slow: ~3s — interpreter loop, boxed Python floats
out = [v*v + 1.0 for v in x]
# fast: ~20ms — one C-level ufunc, no Python overhead
out = x*x + 1.0
# broadcasting: align shapes without copying — scale rows by per-column means
M = np.random.randn(1000, 50)
centred = M - M.mean(axis=0) # (1000,50) - (50,) → broadcast
| Lever | What it gives | Trap |
|---|---|---|
| Vectorized ufuncs | 10–100× over loops | only works on numeric, fixed-dtype arrays |
| Broadcasting | align shapes without copies | silent shape bugs — assert shapes explicitly |
| Right dtype | float32 halves RAM vs float64; categoricals shrink string memory in pandas | narrow dtypes overflow; precision loss in long sums |
| Avoid iterrows | use apply with a vectorised function, or build columns directly | iterrows boxes every row — slowest path in pandas |
| Embeddings = vectors | cosine similarity is one dot product over a (N×d) matrix | not normalising before cosine |
Interview Q&A
Two costs vanish when you vectorize. First, interpreter overhead: a Python loop re-dispatches bytecode and boxes/unboxes a PyObject every iteration; a ufunc loops once in C over raw machine ints/floats. Second, memory layout: a NumPy array is one contiguous block of a single dtype, so the CPU's cache and SIMD units stay fed — whereas a Python list is an array of pointers scattered across the heap, a cache-miss per element. Vectorization is as much about data layout as about avoiding the loop.
import numpy as np
A = np.random.randn(1000, 64) # 1000 points, 64 dims
B = np.random.randn(500, 64)
# every A[i] vs every B[j] WITHOUT a Python loop, via the (a-b)^2 = a^2 - 2ab + b^2 trick
# shapes broadcast: (1000,1) + (500,) - 2*(1000,500) → (1000,500)
d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
dist = np.sqrt(np.maximum(d2, 0)) # clamp tiny negatives from float error
print(dist.shape) # (1000, 500) — this is kNN's inner loop
import numpy as np
X = np.random.randn(32, 128) # batch of 32, feature dim 128
W = np.random.randn(128, 10) # projection to 10 classes
# a dense layer: 'bf,fc->bc' (batch,feat) x (feat,class) → (batch,class)
logits = np.einsum("bf,fc->bc", X, W) # == X @ W, but the indices document intent
# batched attention scores: 'bid,bjd->bij' — each query·key dot, per batch
Q = np.random.randn(8, 20, 64)
K = np.random.randn(8, 20, 64)
scores = np.einsum("bid,bjd->bij", Q, K) # (8,20,20) — no loops, no transpose juggling
import numpy as np
M = np.random.randn(10_000, 1_000).astype(np.float32)
# C-order: rows are contiguous → summing over axis=1 (rows) is cache-friendly
print(M.flags["C_CONTIGUOUS"]) # True
# in-place: no new 40MB array allocated; out= reuses the buffer
np.multiply(M, 2.0, out=M) # vs M = M * 2.0 which copies
M /= M.sum(axis=1, keepdims=True) # row-normalise in place, broadcast denom
| einsum string | Operation | Equivalent |
|---|---|---|
| 'ij,jk->ik' | matrix multiply | A @ B |
| 'ii->i' | diagonal | np.diag(A) |
| 'ij->ji' | transpose | A.T |
| 'bij,bjk->bik' | batched matmul | A @ B (3-D) |
| 'i,i->' | dot product | a @ b |
Interview Q&A · deep dive
np.ascontiguousarray deliberately rather than letting copies happen silently.optimize=True finds a good contraction order. But a plain A @ B dispatches straight to tuned BLAS (GEMM); einsum may not, so for a single large 2-D matmul, @ can be faster. Use einsum for clarity and exotic contractions; @ for the hot 2-D path.np.sum(x, dtype=np.float64)) for those while keeping storage in float32.out= / in-place ops to reuse buffers, process in chunks/batches (tile the big axis), or use einsum with optimize=True which can avoid materialising intermediates. Vectorized doesn't mean free — budget the peak memory of every intermediate shape.Evaluation & metrics prove it works
The most interview-tested topic in ML. Split honestly, pick the metric that matches the cost of errors, and read the bias–variance trade-off to know whether to add or remove complexity.
| Problem | Metric | Why |
|---|---|---|
| Balanced classes | accuracy | fine when classes are even |
| Imbalanced / costly FN | precision, recall, F1 | accuracy lies when one class is rare |
| Ranking / threshold-free | ROC-AUC, PR-AUC | quality across all thresholds |
| Regression | RMSE / MAE / R² | error in the target's units |
from sklearn.metrics import (precision_score, recall_score,
f1_score, roc_auc_score, confusion_matrix)
print("precision", precision_score(y, pred))
print("recall ", recall_score(y, pred))
print("f1 ", f1_score(y, pred))
print("roc_auc ", roc_auc_score(y, proba)) # threshold-independent
print(confusion_matrix(y, pred)) # TN FP / FN TP
Interview Q&A
Don't memorise metrics — derive them from the cost of each error and whether you control a threshold. The first question is always: is this classification or regression, and do I score a hard label or a probability?
ROC-AUC plots TPR vs FPR and is insensitive to class balance — on a 1-in-1000 problem it can look gorgeous (0.95) while the model is useless in production, because FPR has a huge negative denominator. PR-AUC (precision vs recall) keeps the rare positive class in the numerator on both axes, so it collapses honestly when you flood the output with false positives. Rule: balanced or you care about ranking both classes → ROC-AUC; rare positive class you actually act on → PR-AUC.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score
# 1000 samples, 1% positive — a realistic fraud-style imbalance
rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.01).astype(int)
# a weak scorer: barely correlated with the label
proba = np.clip(y * 0.3 + rng.random(1000) * 0.7, 0, 1)
print("ROC-AUC", round(roc_auc_score(y, proba), 3)) # looks healthy
print("PR-AUC ", round(average_precision_score(y, proba), 3)) # tells the truth on rare class
import numpy as np
from sklearn.metrics import precision_recall_curve, brier_score_loss
from sklearn.metrics import mean_absolute_error, root_mean_squared_error, r2_score
# 1) pick the operating threshold, don't default to 0.5
prec, rec, thr = precision_recall_curve(y_true, y_proba)
f1 = 2 * prec * rec / (prec + rec + 1e-9)
best = thr[np.argmax(f1[:-1])] # threshold that maximises F1
print("operating threshold", round(float(best), 3))
# 2) calibration: are predicted probabilities trustworthy?
print("brier", round(brier_score_loss(y_true, y_proba), 4)) # lower = better calibrated
# 3) regression: RMSE in target units, MAE robust to outliers, R^2 unitless
print("MAE ", mean_absolute_error(yr, pr))
print("RMSE", root_mean_squared_error(yr, pr)) # sklearn >=1.4 helper
print("R2 ", r2_score(yr, pr))
| Symptom | What it means | Reach for |
|---|---|---|
| Accuracy high, recall low | imbalanced, predicting majority | PR-AUC, lower the threshold |
| Probabilities cluster near 0.5 | poor calibration / confidence | Brier, reliability curve, isotonic/Platt |
| RMSE >> MAE | a few large errors dominate | inspect outliers; consider MAE/Huber |
| Great CV, bad in prod | leakage or distribution shift | audit splits, time-based CV, monitor |
Interview Q&A · deep dive
PR-AUC (average precision) keeps precision in view, so it drops when you generate false positives — the metric that matches a rare-event detection job.CalibratedClassifierCV.Tree ensembles — the tabular workhorses go-to
On real tabular data, ensembles of decision trees win most of the time. Two recipes: bagging (Random Forest — many independent trees, averaged, lowers variance) and boosting (XGBoost/LightGBM — trees built sequentially, each fixing the last, lowers bias).
| Random Forest | Gradient Boosting | |
|---|---|---|
| How | parallel trees, averaged (bagging) | sequential trees, error-correcting (boosting) |
| Strength | robust, hard to overfit, low tuning | usually higher accuracy |
| Watch | can underfit vs boosting | needs tuning; can overfit if unchecked |
Interview Q&A
A single decision tree greedily picks the split that most reduces impurity (Gini/entropy for classification, variance for regression). It is high-variance — reshuffle the data and you get a different tree. The two ensemble families attack different errors: bagging grows deep, decorrelated trees on bootstrap samples and averages them (variance ↓); boosting grows shallow trees in sequence, each fitting the residual gradient of the loss so far (bias ↓). That residual-fitting view is the whole idea of gradient boosting.
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
model = lgb.LGBMClassifier(
n_estimators=2000, # upper bound; early stopping picks the real count
learning_rate=0.05, # low LR + many trees = the boosting sweet spot
num_leaves=31, # LightGBM grows leaf-wise; cap leaves to control overfit
subsample=0.8, colsample_bytree=0.8, # stochastic boosting = regularisation
)
model.fit(
X_tr, y_tr,
eval_set=[(X_va, y_va)], eval_metric="auc",
callbacks=[lgb.early_stopping(50), lgb.log_evaluation(0)], # LightGBM 4.x callback API
)
print("best iteration", model.best_iteration_)
print("val auc", round(roc_auc_score(y_va, model.predict_proba(X_va)[:, 1]), 4))
from xgboost import XGBClassifier
clf = XGBClassifier(
n_estimators=2000, learning_rate=0.05, max_depth=6,
subsample=0.8, colsample_bytree=0.8,
tree_method="hist", # histogram splitting (default since 2.x); device="cuda" for GPU
early_stopping_rounds=50, # now a constructor arg in modern XGBoost
eval_metric="auc",
)
clf.fit(X_tr, y_tr, eval_set=[(X_va, y_va)], verbose=False)
print("best_iteration", clf.best_iteration)
| XGBoost | LightGBM | CatBoost | |
|---|---|---|---|
| Tree growth | level-wise (depth-balanced) | leaf-wise (best-gain leaf) | symmetric / oblivious |
| Speed on wide data | fast (hist) | usually fastest | moderate |
| Categoricals | native (recent) | native | best-in-class, built-in |
| Default risk | solid all-rounder | can overfit small data (leaf-wise) | great defaults, less tuning |
Interview Q&A · deep dive
num_leaves/min_child_samples. Level-wise grows balanced trees, more conservative and easier to reason about. On big data leaf-wise usually wins.scikit-learn & pipelines the toolkit
scikit-learn's power is one consistent interface — fit / transform / predict — across every estimator. The single most important habit is wrapping preprocessing + model in a Pipeline so cross-validation is leak-free.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
pipe = Pipeline([
("scale", StandardScaler()), # fit on train fold only
("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train) # CV does scaling per fold — no leakage
Interview Q&A
Production tables are mixed: numeric columns want imputing + scaling, categoricals want imputing + one-hot. ColumnTransformer routes each column group to its own sub-pipeline and stitches the outputs back together, all inside one estimator. Nesting it in a Pipeline with the model is what makes the whole transform fit per-fold and serialise as a single unit.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV
num = ["age", "income"]
cat = ["country", "plan"]
pre = ColumnTransformer([
("num", Pipeline([("imp", SimpleImputer(strategy="median")),
("sc", StandardScaler())]), num),
("cat", Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
("oh", OneHotEncoder(handle_unknown="ignore"))]), cat),
])
pipe = Pipeline([("pre", pre),
("clf", HistGradientBoostingClassifier())])
search = RandomizedSearchCV(
pipe,
{"clf__max_depth": [3, 5, None], "clf__learning_rate": [0.05, 0.1]},
n_iter=6, cv=5, scoring="roc_auc", random_state=0,
)
search.fit(X_train, y_train) # every transform refit inside each fold
print(search.best_params_, round(search.best_score_, 3))
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np
class LogClip(BaseEstimator, TransformerMixin):
"""Winsorise at a learned upper percentile, then log1p — fit on train only."""
def __init__(self, q=0.99):
self.q = q # params set in __init__, never mutated in fit
def fit(self, X, y=None):
self.cap_ = np.quantile(X, self.q, axis=0) # learned state ends with _
return self
def transform(self, X):
return np.log1p(np.minimum(X, self.cap_))
# drops straight into a Pipeline step; get_params/set_params come free
| Need | Tool | Note |
|---|---|---|
| Route columns by type | ColumnTransformer | numeric vs categorical sub-pipelines |
| Exhaustive small grid | GridSearchCV | cartesian product; expensive |
| Many params, budget-bound | RandomizedSearchCV | often finds as-good with fewer fits |
| Transform the target too | TransformedTargetRegressor | e.g. log the target safely |
| Keep DataFrame columns | set_output(transform="pandas") | named outputs, easier debugging |
Interview Q&A · deep dive
BaseEstimator + TransformerMixin; declare all hyperparameters as __init__ args and store them unchanged (so get_params/set_params and cloning work); learn state in fit and store it on attributes ending in _; implement transform as pure given that state. Following this lets it slot into Pipelines, grid search, and cloning without surprises.HalvingRandomSearchCV) is even more efficient.TransformedTargetRegressor with a forward func (e.g. log1p) and its inverse (expm1). It applies the transform during fit and automatically inverts predictions, all inside CV, so the target transform is part of the estimator and never computed on the full dataset.set_output(transform="pandas") matter in real pipelines?Neural networks & deep learning foundations
A neural network is layered, differentiable, vectorised function approximation. Inputs flow forward through linear projections + non-linear activations, a loss compares output to truth, and backpropagation uses the chain rule to push gradients back so an optimiser nudges the weights. Everything else (CNNs, RNNs, Transformers) is a clever choice of layer.
| Architecture | Inductive bias | Lives at |
|---|---|---|
| MLP | universal approximator, no spatial/temporal prior | tabular features, embeddings |
| CNN | local spatial structure, translation invariance | images, signals, grid data |
| RNN / LSTM / GRU | sequential order, memory across steps | time series; mostly replaced by attention |
| Transformer | global token interaction via attention, parallel training | text, code, multimodal — every LLM you use |
for epoch in range(E):
for x, y in loader: # mini-batches
y_hat = model(x) # forward
loss = loss_fn(y_hat, y) # scalar
loss.backward() # gradients via autograd
optimiser.step(); optimiser.zero_grad()
| Lever | Default | What it does |
|---|---|---|
| Activation | ReLU hidden, softmax classification, sigmoid binary | introduces non-linearity |
| Loss | cross-entropy classification, MSE/MAE regression | what gradient descent is minimising |
| Optimiser | Adam(W) for almost everything; SGD+momentum for vision research | how weights step on the loss surface |
| Regularise | dropout, weight decay, early stopping, data aug | fight overfitting |
| Normalize | BatchNorm (CNNs) / LayerNorm (Transformers) | stabilise & speed training |
Interview Q&A
Backprop is just the chain rule run in reverse over a recorded computation graph. The forward pass computes activations and the scalar loss while autograd records every op; the backward pass walks that graph from the loss back to each parameter, multiplying local derivatives, to fill .grad. The optimiser then steps. Seeing the cycle as a loop — and where zero_grad sits — is what makes the framework code stop feeling magical.
import torch
from torch import nn
class MLP(nn.Module):
def __init__(self, d_in, d_h, d_out, p=0.1):
super().__init__()
self.net = nn.Sequential(
nn.Linear(d_in, d_h), nn.LayerNorm(d_h), nn.GELU(),
nn.Dropout(p), nn.Linear(d_h, d_out))
def forward(self, x): return self.net(x)
model = MLP(784, 256, 10)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
lossf = nn.CrossEntropyLoss() # expects raw logits, NOT softmax
for epoch in range(epochs):
model.train() # dropout/BN in train mode
for x, y in train_loader:
opt.zero_grad() # grads accumulate by default — clear them
logits = model(x)
loss = lossf(logits, y)
loss.backward() # chain rule fills every .grad
nn.utils.clip_grad_norm_(model.parameters(), 1.0) # tame exploding grads
opt.step()
model.eval() # turn off dropout for validation
with torch.no_grad(): # no graph -> less memory, faster
acc = evaluate(model, val_loader)
| Activation | Shape | Use / caveat |
|---|---|---|
| ReLU | max(0,x) | cheap default; "dying ReLU" on negatives |
| GELU / SiLU | smooth gate | Transformer default; better gradients |
| Sigmoid | (0,1) | binary output; saturates → vanishing grad |
| Softmax | probs sum 1 | multiclass output layer only |
| Tanh | (-1,1) | zero-centred; still saturates |
Interview Q&A · deep dive
backward() applies the chain rule from the loss back to each parameter, populating .grad. The optimiser's step() updates each weight using its gradient (and momentum/adaptive state for Adam). zero_grad() clears grads so the next batch starts clean.eval() mode; the framework scales activations so expected magnitudes match training.PyTorch · TensorFlow · the framework choice tools
Two frameworks dominate. PyTorch won research and is now the production default for most LLM/CV work; TensorFlow/Keras remains strong in established enterprise pipelines and on TPU. The differences narrowed (TF went eager, PyTorch added compile), so the senior answer is "depends on the team's stack and the deployment target" — but be ready to defend a choice.
| Concern | PyTorch | TensorFlow / Keras |
|---|---|---|
| Default mode | eager (define-by-run) | eager since 2.x; tf.function compiles to graph |
| Autograd | tensor.requires_grad + .backward() | GradientTape context manager |
| Ecosystem | Hugging Face, Lightning, vLLM, torch.compile | Keras 3 (now multi-backend), TF-Serving, TFX |
| Hardware | CUDA-first, Apple MPS, growing ROCm | CUDA + first-class TPU support |
| Deploy | TorchScript, ONNX, vLLM, Triton | SavedModel, TF-Lite (mobile), TF-Serving |
| Sweet spot | research, LLMs, custom models | large established pipelines, TPU, mobile |
import torch
from torch import nn, optim
class Net(nn.Module):
def __init__(self):
super().__init__()
self.fc = nn.Sequential(nn.Linear(784,128), nn.ReLU(), nn.Linear(128,10))
def forward(self, x): return self.fc(x)
model, opt, lossf = Net(), optim.AdamW(Net().parameters()), nn.CrossEntropyLoss()
for x, y in loader:
opt.zero_grad()
lossf(model(x), y).backward()
opt.step()
Interview Q&A
The original split was define-by-run (PyTorch eager: build the graph as Python executes, easy to debug) vs define-and-run (old TF static graph: compile once, optimise hard, then feed data). That gap has largely closed from both sides: TF 2 made eager the default and uses @tf.function + XLA to recover graph speed; PyTorch added torch.compile (Dynamo + Inductor) to fuse and compile eager code, delivering ~30–60% speedups while you still write plain Python. So in 2026 the choice is less about "which can go fast" and more about ecosystem, deployment target, and team familiarity.
# --- PyTorch: imperative, .backward() walks the recorded graph ---
import torch
w = torch.zeros(3, requires_grad=True)
loss = ((X @ w - y) ** 2).mean()
loss.backward() # dL/dw lands in w.grad
# --- TensorFlow: record ops under a GradientTape, then ask for grads ---
import tensorflow as tf
w = tf.Variable(tf.zeros([3]))
with tf.GradientTape() as tape:
loss = tf.reduce_mean((X @ w - y) ** 2)
grad = tape.gradient(loss, w) # explicit grad request
# --- JAX: grad is a function transform; pure functions, no in-place state ---
import jax, jax.numpy as jnp
def loss_fn(w): return jnp.mean((X @ w - y) ** 2)
grad = jax.jit(jax.grad(loss_fn))(w) # compiled + differentiated
# PyTorch 2.x: one line for graph-level fusion on top of eager code
model = torch.compile(model) # Dynamo traces, Inductor fuses kernels
scaler = torch.amp.GradScaler("cuda")
for x, y in loader:
opt.zero_grad()
with torch.autocast("cuda", dtype=torch.bfloat16): # half-precision math
loss = lossf(model(x), y)
scaler.scale(loss).backward() # loss scaling avoids fp16 underflow
scaler.step(opt); scaler.update()
| If you... | Lean | Because |
|---|---|---|
| Start LLM/CV research today | PyTorch + HF | shortest research→prod path, biggest ecosystem |
| Deploy to mobile / edge | TF Lite | most mature on-device runtime |
| Train at scale on TPU | JAX or TF | first-class TPU + XLA performance |
| Want one code path, many backends | Keras 3 | swap TF/PyTorch/JAX under one API |
| Need max single-GPU throughput | PyTorch + compile | fused kernels, FlashAttention |
Interview Q&A · deep dive
torch.compile for PyTorch, tf.function/XLA for TF, and JAX is jit-compiled by design. You get eager ergonomics with graph-mode speed.torch.compile actually do?.backward(). JAX treats differentiation as a function transformation: grad(f) returns a new pure function computing the gradient, composable with jit (compile) and vmap (auto-batch). It demands pure, side-effect-free functions, which is more rigid but composes beautifully and shines on TPU.GradScaler). bf16 has fp32's exponent range so it usually needs no scaling, which is why it's the modern default on capable hardware.MLflow — track, version, and ship models mlops bridge
MLflow answers "which run produced this model, with what data and params, and how good was it?" Four components, but Tracking and the Model Registry are the ones you'll use daily.
| Component | Does |
|---|---|
| Tracking | logs params, metrics, and artifacts per run — the experiment journal |
| Models | a standard packaging format that serves anywhere |
| Model Registry | versioned models with stages: Staging → Production |
| Projects | reproducible, re-runnable packaging of the code |
import mlflow
with mlflow.start_run():
mlflow.log_param("C", 1.0)
mlflow.log_metric("f1", 0.91)
mlflow.sklearn.log_model(model, "model") # now reproducible + servable
Interview Q&A
MLflow's value is answering "which run, with what data/params/code, produced this model, and is it the one in prod?" A run logs params/metrics/artifacts under an experiment; the best run's model is registered as a versioned entry; that version is then pointed at by environments. Crucially, the old hard-coded stages (Staging/Production) have been deprecated since MLflow 2.9 in favour of free-form aliases (e.g. @champion) and tags — more flexible, multiple per version, no rigid state machine. MLflow 3 also added a first-class LoggedModel entity carrying its own metrics and params.
import mlflow
from mlflow import MlflowClient
mlflow.set_experiment("churn")
mlflow.sklearn.autolog() # params, metrics, model logged automatically
with mlflow.start_run() as run:
model.fit(X_train, y_train)
mlflow.log_metric("f1", f1)
info = mlflow.sklearn.log_model( # log + register together
model, name="model",
registered_model_name="churn-clf")
# promote by ALIAS instead of the deprecated stage transition
client = MlflowClient()
mv = client.get_latest_versions("churn-clf")[0]
client.set_registered_model_alias("churn-clf", "champion", mv.version)
client.set_model_version_tag("churn-clf", mv.version,
"validation", "passed")
import mlflow
# load whatever version currently holds the @champion alias
model = mlflow.pyfunc.load_model("models:/churn-clf@champion")
preds = model.predict(X_new)
# rollback = repoint the alias to an older version; no redeploy of app code
# client.set_registered_model_alias("churn-clf", "champion", "7")
| Old (deprecated) | Now | Why better |
|---|---|---|
| stage = "Production" | alias @champion | any name, multiple per version |
| transition_model_version_stage | set_registered_model_alias | no rigid state machine |
| stage as status flag | model version tags | validation=passed, owner, etc. |
| load by stage | models:/name@alias URI | swap version without touching app code |
Interview Q&A · deep dive
@champion, @challenger), you can set several on different versions, and you load via models:/name@alias so app code never hard-codes a version. Tags carry status metadata (e.g. validation=passed). It's a flexible labelling scheme instead of a constrained state machine.log_model call via registered_model_name). You attach an alias like @champion to the approved version. Serving loads models:/name@champion, so promotion and rollback are just repointing the alias — no application redeploy.mlflow.autolog() buy you, and what's the catch?log_metric calls for the numbers that drive promotion decisions.@champion on a held-out golden set, and only if it wins by a meaningful margin set @champion to the candidate (keeping the prior version for one-line rollback). Tag the version with the eval result and the commit SHA so every production model traces back to exact code, data, and metrics.Stats for interviews foundations
DS rounds test whether you reason about uncertainty. You don't need proofs — you need to wield distributions, hypothesis testing, and the line between correlation and causation correctly.
| Concept | The interview-ready version |
|---|---|
| Mean vs median | median resists outliers/skew; report it for skewed data |
| p-value | P(data this extreme | null true) — not P(hypothesis) |
| Confidence interval | a range of plausible values for the estimate |
| Correlation ≠ causation | a relationship isn't a cause; confounders lurk |
Interview Q&A
Every frequentist test is really one comparison: signal ÷ noise. The signal is the effect you saw (a difference in means, a lift in conversion); the noise is the standard error — how much that estimate would wobble across resamples. A t-statistic is literally effect / standard error, and the p-value just asks how far out in the null distribution that ratio lands. Internalise that and the whole zoo of tests collapses into one idea: is the effect big relative to its own uncertainty?
import numpy as np
from scipy import stats
rng = np.random.default_rng(42)
a = rng.normal(100, 15, 200) # control
b = rng.normal(104, 15, 200) # treatment (+4 true lift)
t, p = stats.ttest_ind(b, a, equal_var=False) # Welch: don't assume equal variance
diff = b.mean() - a.mean()
se = np.sqrt(b.var(ddof=1)/len(b) + a.var(ddof=1)/len(a))
ci = (diff - 1.96*se, diff + 1.96*se) # 95% CI for the difference
d = diff / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2) # Cohen's d
print(f"diff={diff:.2f} t={t:.2f} p={p:.4f}")
print(f"95% CI=({ci[0]:.2f}, {ci[1]:.2f}) Cohen's d={d:.2f}")
# report ALL of it: a tiny p with d=0.05 is statistically real, practically nothing
from statsmodels.stats.power import TTestIndPower
from statsmodels.stats.proportion import proportions_ztest
# How many users per arm to detect d=0.2 at 80% power, alpha 5%?
n = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(round(n)) # ~394 per arm
# A/B on a binary metric (conversions) -> two-proportion z-test
conv = [182, 219] # control, treatment successes
total = [2000, 2000]
z, p = proportions_ztest(conv, total)
print(f"z={z:.2f} p={p:.4f}") # size FIRST, then peek once at the end
import numpy as np
rng = np.random.default_rng(0)
# Beta-Binomial: prior Beta(1,1) + observed successes/failures = posterior
post_a = rng.beta(1 + 182, 1 + (2000 - 182), 100_000)
post_b = rng.beta(1 + 219, 1 + (2000 - 219), 100_000)
print(f"P(B > A) = {(post_b > post_a).mean():.3f}") # a directly useful answer
print(f"expected lift = {(post_b - post_a).mean():.4f}") # with full uncertainty
| Frequentist | Bayesian |
|---|---|
| p-value: P(data | null) | posterior: P(hypothesis | data) |
| fixed sample size, peeking inflates error | can update continuously, but priors matter |
| answers “is it ≠ 0?” | answers “P(B beats A) and by how much?” |
| CI: 95% of such intervals cover truth | credible interval: 95% prob the value is inside |
Interview Q&A · deep dive
equal_var=False above.NLP — natural language processing text ML
NLP turns unstructured text into something a model can use. The pipeline is always: text → tokens → numeric features → model → task output. The last few years collapsed most of it onto transformers, but the classical stack still wins when data is small, latency is tight, or you need interpretability.
raw docs→ Tokenize
+ clean→ Represent
TF-IDF / embeddings→ Model
classifier / transformer→ Task
label / entities / answer
| Step / concept | What it is |
|---|---|
| Tokenization | split text into units (words / sub-words). Modern models use sub-word (BPE / WordPiece) so unknown words still encode. |
| Normalization | lowercasing, stop-word removal, stemming (chop to root) vs lemmatization (dictionary base form — cleaner). |
| Bag-of-Words / TF-IDF | count-based features; TF-IDF down-weights common words. Fast, interpretable, strong baseline. |
| Word embeddings | word2vec / GloVe map words to dense vectors where similar words are close — but one vector per word, no context. |
| Contextual embeddings | BERT / transformers give a different vector per usage (river “bank” vs money “bank”) — the modern default. |
| Common task | Example |
|---|---|
| Text classification | spam, sentiment, topic, intent |
| Named-entity recognition (NER) | pull people, orgs, drugs, sites from free text |
| Summarization / QA | condense a doc / answer from context (RAG) |
| Translation / generation | seq-to-seq with transformers |
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
clf = make_pipeline(
TfidfVectorizer(ngram_range=(1, 2), min_df=2), # words + bigrams
LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)
clf.predict(["phase 3 NSCLC trial terminated for futility"])
from transformers import pipeline
ner = pipeline("ner", grouped_entities=True) # pretrained model
ner("Dr. Jane Smith enrolled patients at Mayo Clinic.")
# -> [{'entity_group':'PER','word':'Jane Smith'}, {'ORG','Mayo Clinic'}]
| spaCy | fast production NLP — tokenize, POS, NER, pipelines |
| NLTK | teaching / classical building blocks |
| Gensim | topic modelling (LDA), word2vec |
| scikit-learn | TF-IDF + classical classifiers |
| Hugging Face | transformers for everything modern |
Interview Q&A
Word-level vocabularies explode (millions of words, every typo is “unknown”); character-level sequences are tiny in vocab but brutally long. Sub-word tokenization (BPE, WordPiece, SentencePiece) is the compromise that powers every transformer: it greedily merges frequent character pairs into a fixed ~30k–100k vocabulary, so common words stay one token while rare ones split into reusable pieces (tokenization → token + ##ization). Nothing is ever truly out-of-vocabulary, and morphology gets shared for free.
import numpy as np
from collections import Counter
docs = ["trial enrolled patients", "trial terminated early", "patients withdrew"]
toks = [d.split() for d in docs]
vocab = sorted({w for t in toks for w in t})
N = len(docs)
def tfidf(term, doc):
tf = doc.count(term) / len(doc) # freq in this doc
df = sum(term in d for d in toks) # docs containing term
idf = np.log((1 + N) / (1 + df)) + 1 # smoothed, sklearn-style
return tf * idf
M = np.array([[tfidf(w, d) for w in vocab] for d in toks])
print(vocab)
print(M.round(2)) # 'trial' is common -> low weight; 'withdrew' is rare -> high
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer("all-MiniLM-L6-v2") # small, fast, 384-dim
corpus = ["study halted for safety",
"site activated in Boston",
"primary endpoint not met"]
emb = model.encode(corpus, convert_to_tensor=True)
q = model.encode("trial stopped due to adverse events", convert_to_tensor=True)
scores = util.cos_sim(q, emb)[0] # cosine similarity
best = scores.argmax().item()
print(corpus[best], float(scores[best])) # matches 'halted for safety' by meaning, not words
from transformers import pipeline
clf = pipeline("zero-shot-classification") # no training data needed
out = clf("The DSMB recommended stopping the study.",
candidate_labels=["safety", "efficacy", "enrollment"])
print(out["labels"][0]) # -> 'safety'
| Representation | Captures | Cost / limit |
|---|---|---|
| Bag-of-Words | word presence/counts | no order, no semantics, sparse |
| TF-IDF | distinctive words per doc | still no semantics; great baseline |
| word2vec / GloVe | static semantic similarity | one vector per word, no context |
| Transformer (BERT) | contextual meaning | compute & latency heavy |
Interview Q&A · deep dive
[CLS] token; its final hidden state is meant to summarise the sequence for classification. But for sentence similarity, raw [CLS] is mediocre — mean-pooling the token vectors (as Sentence-BERT does) usually gives better embeddings. The lesson: how you pool the per-token vectors into one sentence vector matters as much as the model.NLTK — the classical NLP toolkit text toolkit
NLTK (Natural Language Toolkit) is the long-standing library for classical NLP building blocks and learning. It's where you go for explicit tokenization, stemming, POS tagging, WordNet, and corpora — the explainable primitives beneath the transformer era. (Pairs with the NLP card.)
from nltk.tokenize import word_tokenize, sent_tokenize
text = "NLTK is great. It tokenizes text easily!"
sent_tokenize(text) # ['NLTK is great.', 'It tokenizes text easily!']
word_tokenize(text) # ['NLTK', 'is', 'great', '.', 'It', ...]
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
stop = set(stopwords.words("english"))
[w for w in words if w.lower() not in stop] # drop noise words
PorterStemmer().stem("studies") # 'studi' (crude, fast)
WordNetLemmatizer().lemmatize("studies") # 'study' (real word)
from nltk import pos_tag, ne_chunk
tags = pos_tag(word_tokenize("Apple opened a store in Paris"))
# [('Apple','NNP'), ('opened','VBD'), ('store','NN'), ('Paris','NNP')]
ne_chunk(tags) # groups: (ORGANIZATION Apple) ... (GPE Paris)
from nltk import bigrams, FreqDist
list(bigrams(["a", "b", "c"])) # [('a','b'), ('b','c')]
FreqDist(words).most_common(5) # top 5 words by count
from nltk.corpus import wordnet as wn
wn.synsets("car")[0].definition() # the meaning
wn.synsets("car")[0].lemma_names() # synonyms: car, auto, automobile
from nltk.sentiment import SentimentIntensityAnalyzer
SentimentIntensityAnalyzer().polarity_scores("I love this!")
# {'neg': 0.0, 'neu': 0.2, 'pos': 0.8, 'compound': 0.69}
| Library | Reach for it when |
|---|---|
| NLTK | learning, classical primitives, WordNet, quick prototyping |
| spaCy | fast production pipelines (tokenize, POS, NER, dependencies) |
| Hugging Face | state-of-the-art transformers for any modern task |
Interview Q&A
NLTK ships code but not corpora. The single most common “it doesn't work” with NLTK is a LookupError because punkt, stopwords, wordnet, or the POS tagger model isn't on disk. Download once, then the imports above work. Recent NLTK split the tokenizer data into punkt_tab, so pin what you download.
import nltk
for pkg in ["punkt_tab", "stopwords", "wordnet",
"averaged_perceptron_tagger_eng", "maxent_ne_chunker_tab", "words"]:
nltk.download(pkg, quiet=True) # run once; cached under ~/nltk_data
import nltk, string
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag, FreqDist
lem = WordNetLemmatizer()
stop = set(stopwords.words("english")) | set(string.punctuation)
def keywords(text, k=5):
toks = word_tokenize(text.lower())
tagged = pos_tag(toks)
# keep only nouns (NN*), lemmatize, drop stopwords
nouns = [lem.lemmatize(w) for w, t in tagged
if t.startswith("NN") and w not in stop]
return FreqDist(nouns).most_common(k)
print(keywords("The clinical trial enrolled patients across many trial sites."))
# -> [('trial', 2), ('patient', 1), ('site', 1)] (note: lemmatized + noun-filtered)
from nltk.stem import WordNetLemmatizer
lem = WordNetLemmatizer()
lem.lemmatize("better") # 'better' (default treats it as a noun)
lem.lemmatize("better", pos="a") # 'good' (told it's an adjective)
lem.lemmatize("running", pos="v") # 'run' (verb)
# real pipelines map the Penn-Treebank POS tag -> WordNet pos for accuracy
| Stemming | Lemmatization |
|---|---|
| rule-based suffix chop | dictionary (WordNet) lookup |
| fast, no POS needed | slower, wants the POS tag |
| output may not be a word (“studi”) | always a real lemma (“study”) |
| good enough for search/IR recall | use when output is shown or fed to NER |
Interview Q&A · deep dive
WordNetLemmatizer defaults to pos="n" (noun), and “better” as a noun is already a lemma. Pass pos="a" and you get “good”. The takeaway: lemmatization is POS-conditioned, so a real pipeline tags first, maps the Penn-Treebank tag to a WordNet pos, then lemmatizes — otherwise you silently get wrong base forms.Time-series forecasting temporal
Time-series data is ordered by time, so the rules change: observations aren't independent, you can't shuffle, and the cardinal sin is using the future to predict the past. Most series decompose into trend + seasonality + residual — model those and you're most of the way there.
| Approach | Use when |
|---|---|
| Classical (ARIMA / SARIMA) | single series, clear autocorrelation / seasonality, want a statistical model |
| Exponential smoothing (Holt-Winters) | trend + seasonality, simple robust baseline |
| Prophet | business series with holidays / seasonality, easy, good defaults |
| ML (lags → gradient boosting) | many series / extra covariates; turn time into features |
| Deep learning (LSTM / TFT) | long, many, complex series with rich covariates and enough data |
# reframe forecasting as supervised learning with lagged columns
for lag in [1, 7, 14]:
df[f"lag_{lag}"] = df["volume"].shift(lag) # past values as features
df["roll7"] = df["volume"].shift(1).rolling(7).mean()
from lightgbm import LGBMRegressor
X, y = df.drop(columns=["volume"]).dropna(), df["volume"]
model = LGBMRegressor().fit(X[:-30], y[:-30]) # train on the past only
from prophet import Prophet
m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.fit(df.rename(columns={"date": "ds", "volume": "y"}))
forecast = m.predict(m.make_future_dataframe(periods=30))
Interview Q&A
ARIMA's I is “integrated” — the d in (p,d,q) is how many times you difference the series to kill the trend and make it stationary. Read the order off the autocorrelation plots: PACF cutting off after lag p suggests the AR order; ACF cutting off after lag q suggests the MA order. A formal ADF test (Augmented Dickey-Fuller) tells you whether you've differenced enough — a small p-value means “stationary, stop differencing.”
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA # modern import path
y = df.set_index("date")["volume"].asfreq("D") # regular daily index
p_adf = adfuller(y.dropna())[1]
if p_adf > 0.05:
y_d = y.diff().dropna() # difference once -> usually stationary
print("differenced; new ADF p =", round(adfuller(y_d)[1], 4))
model = ARIMA(y, order=(2, 1, 2)) # (p,d,q): AR=2, diff=1, MA=2
fit = model.fit()
fc = fit.get_forecast(steps=14) # 14-day horizon
print(fc.predicted_mean.round(1))
print(fc.conf_int().round(1)) # forecast WITH an uncertainty band
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5) # expanding train, future test — never shuffles
errs = []
for tr, te in tscv.split(X):
model.fit(X.iloc[tr], y.iloc[tr]) # train strictly on the past
pred = model.predict(X.iloc[te])
mape = np.mean(np.abs((y.iloc[te] - pred) / y.iloc[te])) * 100
errs.append(mape)
# the bar: beat the seasonal-naive baseline (last week's same day)
naive_mape = np.mean(np.abs((y - y.shift(7)) / y).dropna()) * 100
print(f"model MAPE={np.mean(errs):.2f}% naive MAPE={naive_mape:.2f}%")
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics
m = Prophet(yearly_seasonality=True, weekly_seasonality=True,
changepoint_prior_scale=0.05) # higher = more flexible trend
m.add_country_holidays(country_name="US") # holidays as known regressors
m.fit(df.rename(columns={"date": "ds", "volume": "y"}))
cv = cross_validation(m, initial="365 days", period="30 days", horizon="30 days")
print(performance_metrics(cv)[["horizon", "mae", "mape"]].head())
| Symptom in ACF/PACF | Likely model term |
|---|---|
| ACF decays slowly, never cuts off | non-stationary → difference (raise d) |
| PACF cuts off after lag p | AR(p) component |
| ACF cuts off after lag q | MA(q) component |
| spike at the seasonal lag (e.g. 7, 12) | add seasonal terms → SARIMA |
Interview Q&A · deep dive
auto_arima (pmdarima) or AIC minimisation pick, then sanity-check residuals are white noise (Ljung-Box test) — if residuals still have structure, the model is under-specified.TimeSeriesSplit / walk-forward guarantees every test point lies strictly after its training data, so the offline metric actually estimates production performance.Computer vision image ML
Computer vision teaches models to extract meaning from pixels. The breakthrough idea is the convolution: small learnable filters slide over the image detecting edges → textures → shapes → objects, layer by layer. Today you rarely train from scratch — you transfer-learn from a pretrained backbone.
| Task | What it answers |
|---|---|
| Classification | what is in this image? (one label) |
| Object detection | what and where? (boxes — YOLO, Faster R-CNN) |
| Segmentation | which pixels belong to what? (masks — U-Net, SAM) |
| OCR | read text from an image / scan (Tesseract, vision transformers) |
| Concept | What it is |
|---|---|
| Convolution + pooling | filters detect local patterns; pooling shrinks & keeps the strongest signal → translation-tolerant features |
| CNN backbone | stacked conv layers (ResNet, EfficientNet) that learn a feature hierarchy |
| Transfer learning | take a model pretrained on millions of images, swap the head, fine-tune on your few thousand — the default |
| ViT · CLIP · SAM | modern attention-based vision; CLIP links images ↔ text; SAM segments anything |
import torch, torchvision as tv
model = tv.models.resnet50(weights="IMAGENET1K_V2") # pretrained
for p in model.parameters(): p.requires_grad = False # freeze backbone
model.fc = torch.nn.Linear(model.fc.in_features, num_classes) # new head
# now train only model.fc on your labelled images
import pytesseract, cv2
img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
img = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)[1] # binarize first
text = pytesseract.image_to_string(img) # read the text
Interview Q&A
A conv layer is defined by a few hyperparameters that decide its receptive field and output size: kernel size (the patch each filter sees, e.g. 3×3), stride (how far it hops — stride 2 halves resolution), padding (zeros at the border to keep size), and channels (how many filters = depth of the output). Early layers (small receptive field) learn edges and color blobs; stacking layers grows the receptive field so deep layers “see” whole objects. Pooling (or strided conv) downsamples to buy translation tolerance and compute.
import torch
from torch import nn
class ConvBlock(nn.Module):
def __init__(self, c_in, c_out):
super().__init__()
self.net = nn.Sequential(
nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), # same-size output
nn.BatchNorm2d(c_out), # stabilises & speeds training
nn.ReLU(inplace=True),
nn.MaxPool2d(2)) # halve H and W
def forward(self, x):
return self.net(x)
x = torch.randn(8, 3, 64, 64) # batch=8, RGB, 64x64
out = ConvBlock(3, 32)(x)
print(out.shape) # torch.Size([8, 32, 32, 32]) - depth up, size halved
import torch
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights
weights = EfficientNet_V2_S_Weights.DEFAULT # best available pretrained weights
model = efficientnet_v2_s(weights=weights)
preprocess = weights.transforms() # EXACT resize/normalize the model expects
for p in model.features.parameters():
p.requires_grad = False # freeze the backbone
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, num_classes)
# train only the new head; later unfreeze top blocks at a low LR to fine-tune
from torchvision.transforms import v2
from torchvision.models.detection import (
fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights)
# augmentation: cheap regularisation that simulates real-world variation
train_tf = v2.Compose([v2.RandomResizedCrop(224), v2.RandomHorizontalFlip(),
v2.ColorJitter(0.2, 0.2), v2.ToDtype(torch.float32, scale=True)])
det = fasterrcnn_resnet50_fpn_v2(weights=FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT)
det.eval()
out = det([img_tensor]) # -> boxes, labels, scores per detection
| Task | Output | Typical loss / metric |
|---|---|---|
| Classification | one label | cross-entropy / top-1 accuracy |
| Detection | boxes + labels | box + class loss / mAP @ IoU |
| Segmentation | per-pixel mask | Dice / IoU (Jaccard) |
| OCR | text string | CTC loss / character error rate |
Interview Q&A · deep dive
F(x) + x, so the identity path lets gradients flow straight back and makes it trivial for a layer to learn “do nothing” if it isn't useful. That's what allowed 50/100+ layer networks to train at all — the foundation of every modern backbone.model.eval() matters — forgetting it is a classic bug.Probability essentials foundations
Probability is the grammar of uncertainty that every model speaks. A data scientist doesn't memorise formulas — they recognise which random variable generated the data, reach for the right distribution, and update beliefs with Bayes' theorem. Get the modelling assumption right and the maths follows; get it wrong and no amount of tuning saves you.
A random variable (RV) maps outcomes to numbers: a coin flip → {0,1}, a session → minutes watched. The distribution describes how probability mass spreads over those numbers — a PMF for discrete RVs (probability at each value) and a PDF for continuous ones (density, where probability is area under the curve, so P(X=x)=0 for any single point). The CDF F(x)=P(X≤x) works for both and is what you actually compute for tail probabilities. Two summary numbers carry most of the weight: expectation E[X] (the long-run average, the centre) and variance Var(X)=E[(X-μ)²] (the spread). The single most useful identity in practice is the computational form Var(X)=E[X²]-E[X]².
| Distribution | Models | Mean / Variance |
|---|---|---|
| Bernoulli(p) | one yes/no trial (a single click/convert) | p / p(1-p) |
| Binomial(n,p) | count of successes in n independent trials | np / np(1-p) |
| Poisson(λ) | count of rare events in a fixed window (arrivals/errors) | λ / λ |
| Normal(μ,σ²) | sums/averages of many small effects (the CLT magnet) | μ / σ² |
| Exponential(λ) | waiting time between Poisson events; memoryless | 1/λ / 1/λ² |
import numpy as np
from scipy import stats
rng = np.random.default_rng(0)
# 1) Closed-form moments straight from scipy frozen distributions
for name, dist in {
"Binom(10,0.3)": stats.binom(10, 0.3),
"Poisson(4)": stats.poisson(4),
"Expon(1/2)": stats.expon(scale=2), # scale = 1/lambda
}.items():
print(f"{name:14s} mean={dist.mean():.2f} var={dist.var():.2f}")
# 2) Tail probability via the CDF (no integral by hand)
p_busy = 1 - stats.poisson(4).cdf(7) # P(>7 arrivals)
print(f"P(more than 7 arrivals) = {p_busy:.3f}")
# 3) Central Limit Theorem: means of a SKEWED variable go Normal
pop = stats.expon(scale=2) # heavily right-skewed
means = pop.rvs(size=(20_000, 50), random_state=rng).mean(axis=1)
print(f"sample-mean: mean={means.mean():.2f} (≈2), "
f"std={means.std(ddof=1):.3f} (≈sigma/sqrt(n)={2/np.sqrt(50):.3f})")
# Shapiro on the MEANS is ~normal even though the raw data never is
print(f"normality p (of the means) = {stats.shapiro(means[:500]).pvalue:.3f}")
Bayes inverts a conditional: P(H|E) = P(E|H)·P(H) / P(E). In words, posterior ∝ likelihood × prior. The classic interview trap: a test is 99% accurate, you test positive — what's the chance you're actually sick? The answer hinges on the base rate people ignore. With a 1% prevalence, most positives are false positives because the healthy population is so much larger.
# P(sick)=0.01, sensitivity P(+|sick)=0.99, specificity P(-|healthy)=0.99
prior = 0.01
sens = 0.99 # true-positive rate
spec = 0.99 # true-negative rate
p_pos_sick = sens
p_pos_well = 1 - spec # false-positive rate = 0.01
# Law of total probability for the evidence P(+)
p_pos = p_pos_sick*prior + p_pos_well*(1 - prior)
posterior = p_pos_sick*prior / p_pos
print(f"P(sick | positive) = {posterior:.1%}") # only 50% !
# Re-test (now prior = 0.50) and the posterior jumps to ~99%:
print(f"after a 2nd positive = {(sens*posterior)/(sens*posterior + (1-spec)*(1-posterior)):.1%}")
Interview Q&A · deep dive
P(+|sick) with P(sick|+). With 1% prevalence the answer is only ~50% (see the code): among 10,000 people, ~99 true positives but ~99 false positives, so a positive is a coin flip. This is base-rate neglect; the posterior depends on prevalence, not just test accuracy. A confirmatory second test pushes it to ~99%.mean == variance == λ. If real data shows variance > mean (overdispersion, very common with bursty or correlated events) the Poisson under-states uncertainty and you switch to a Negative Binomial.P(X > s+t | X > s) = P(X > t) — having already waited s doesn't change the remaining wait. The continuous Exponential and the discrete Geometric are the only ones. It's why "the bus is overdue so it must come soon" is a fallacy under a memoryless arrival model, and it's the assumption baked into basic queueing/Markov models.E[X+Y] = E[X]+E[Y] always, even when X and Y are dependent (linearity of expectation — hugely useful). But Var(X+Y) = Var(X)+Var(Y)+2·Cov(X,Y); the covariance term vanishes only if they're uncorrelated. Forgetting the covariance term is how people miscompute portfolio risk or the variance of correlated metrics.Hypothesis testing inference
A hypothesis test is a structured way to decide whether a pattern is signal or noise. It is not a truth machine — it controls the rate at which you fool yourself. The card on t-ds-stats covers the t-test and p-value intuition; here we go deeper into choosing the right test, chi-square & ANOVA, the Type I/II / power triangle, and the multiple-comparisons problem that quietly invalidates most exploratory analyses.
Frequentist testing is one recipe with swappable parts. State a null H₀ ("no effect", the boring default) and an alternative H₁. Pick α (your tolerated false-positive rate, usually 0.05) before looking. Compute a test statistic that measures effect relative to noise, find where it lands in the null's sampling distribution — the p-value — and reject H₀ iff p < α. The whole game is that if H₀ were true, this procedure wrongly rejects only α of the time.
| Question / data | Test | Note |
|---|---|---|
| Mean of 2 groups, numeric | Welch's t-test | default; don't assume equal variance |
| Means of 3+ groups, numeric | One-way ANOVA | then post-hoc (Tukey) for which pair |
| Two categorical variables linked? | Chi-square independence | on a contingency table of counts |
| Observed counts vs expected | Chi-square goodness-of-fit | dice fairness, category mix |
| 2 groups, skewed / ordinal / outliers | Mann–Whitney U | nonparametric; tests distributions |
| Paired before/after, non-normal | Wilcoxon signed-rank | paired nonparametric |
import numpy as np
from scipy import stats
# --- Chi-square test of INDEPENDENCE: does plan tier relate to churn? ---
# churned retained
table = np.array([[ 90, 310], # free
[ 40, 560]]) # paid
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2={chi2:.1f} dof={dof} p={p:.2e}") # tier and churn are dependent
# --- One-way ANOVA: do 3 landing pages have different time-on-site? ---
rng = np.random.default_rng(7)
A = rng.normal(60, 12, 120)
B = rng.normal(63, 12, 120)
C = rng.normal(60, 12, 120)
F, p = stats.f_oneway(A, B, C)
print(f"ANOVA F={F:.2f} p={p:.4f}") # omnibus: ANY page differs?
# --- Assumptions shaky (skew/outliers)? fall back to rank-based test ---
skewed_A = stats.expon(scale=5).rvs(200, random_state=rng)
skewed_B = stats.expon(scale=6).rvs(200, random_state=rng)
U, p = stats.mannwhitneyu(skewed_A, skewed_B, alternative="two-sided")
print(f"Mann-Whitney U={U:.0f} p={p:.4f}") # no normality assumed
| H₀ true (no effect) | H₀ false (real effect) | |
|---|---|---|
| Reject H₀ | Type I error (α) — false positive | Correct — power = 1−β |
| Fail to reject | Correct | Type II error (β) — false negative |
Four knobs trade off and you only freely pick three: α, power, effect size, and n. Lower α to cut false positives and you raise β (miss more real effects) unless you add sample size. Power analysis solves for n before you run anything — running an under-powered test is the most common way to waste a quarter and then wrongly conclude "no effect".
from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
# n per arm to detect a small effect (d=0.2) at 80% power, alpha 5%
n = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(round(n)) # ~394 per arm
# Flip it: with only 100/arm, what power do we actually have for d=0.2?
print(round(analysis.power(effect_size=0.2, nobs1=100, alpha=0.05), 2)) # ~0.29 — badly under-powered
from statsmodels.stats.multitest import multipletests
raw = [0.001, 0.013, 0.021, 0.04, 0.31, 0.55]
for method in ("bonferroni", "fdr_bh"):
reject, p_adj, *_ = multipletests(raw, alpha=0.05, method=method)
print(method, reject.sum(), "survive", np.round(p_adj, 3))
# Bonferroni keeps fewer; BH keeps more while still controlling false discoveries
Interview Q&A · deep dive
A/B testing & experiments causal
A randomised online experiment is the gold standard for causal inference at scale — randomisation makes the two arms exchangeable, so a difference in outcome is the treatment effect, confounders and all. The t-ds-stats card covers the maths of significance; this card is about the lifecycle and the traps that decide whether the number you ship is real: design, sizing, guardrails, the peeking problem, variance reduction, and the failure modes that silently corrupt results.
A trustworthy experiment runs a fixed path. Each stage has a way to go wrong, and the discipline is refusing to skip ahead — especially refusing to look at the result before the committed sample size.
First pick the OEC (Overall Evaluation Criterion) — one primary metric that captures success and is hard to game (revenue-per-user, not raw clicks which a clickbait change inflates). Then state the MDE (Minimum Detectable Effect): the smallest lift you'd care to ship. Sample size falls out of MDE, baseline variance, α, and power — smaller MDE means quadratically more users. You commit to this n before launch; that pre-commitment is what makes the later p-value valid.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize
base, mde = 0.10, 0.005 # 10% baseline conversion, detect +0.5pp
es = proportion_effectsize(base + mde, base)
n = NormalIndPower().solve_power(effect_size=es, alpha=0.05, power=0.8,
alternative="two-sided")
print(f"need ~{n:,.0f} users PER ARM") # ~57k/arm — small lifts are expensive
Checking a fixed-horizon test repeatedly and stopping the instant p<0.05 inflates the real false-positive rate from 5% toward 30%+ — each peek is another roll of the dice. Two principled cures, both now standard in industry platforms:
| Technique | What it does | When |
|---|---|---|
| Sequential / always-valid p-values | p-values & CIs valid at every peek (group-sequential or mSPRT); you can stop early safely | you want to monitor live and stop as soon as a clear winner emerges |
| CUPED | uses pre-experiment data as a covariate to strip out predictable variance — same power with ~30–50% fewer users | you have stable pre-period metrics per user (most growth teams) |
| Fixed-horizon (classic) | pre-commit n, look once at the end | simple, when you can wait the full run |
import numpy as np
# CUPED: adjust the outcome Y using pre-period metric X (theta = Cov/Var)
rng = np.random.default_rng(1)
X = rng.normal(50, 10, 10_000) # pre-experiment spend
Y = 0.8*X + rng.normal(0, 6, 10_000) # in-experiment spend (correlated)
theta = np.cov(X, Y)[0, 1] / np.var(X)
Y_cuped = Y - theta*(X - X.mean()) # same mean, lower variance
print(f"variance Y={Y.var():.1f} -> CUPED={Y_cuped.var():.1f} "
f"({1 - Y_cuped.var()/Y.var():.0%} reduction)")
Beyond the primary metric, every experiment carries guardrail metrics — things that must not regress even for a winning change (latency, crash rate, unsubscribe rate, revenue when you optimise engagement). And before trusting any result you run automatic sanity checks, the most important being Sample-Ratio Mismatch (SRM): if you split 50/50 but observe 50.8/49.2 on millions of users, the randomisation or logging is broken and the whole result is void.
from scipy import stats
# SRM check: are the arm sizes consistent with the intended split?
obs = [501_200, 498_800] # users in control, treatment
expected = [sum(obs)/2]*2 # intended 50/50
chi2, p = stats.chisquare(obs, expected)
print(f"SRM check p={p:.4f}")
if p < 0.001: # very low threshold: SRM is a hard stop
print("SRM detected -> DO NOT trust the experiment; debug assignment/logging")
Interview Q&A · deep dive
AI / ML / LLM Engineering
Your home turf, organised for the panel. From "when do I even use ML" through RAG and agents to evaluation — the discipline a Principal QE role is hired to own. Real anchors: CI-Radar (RAG), the Dell ReAct bot (agents), the investigator-matching system (applied ML logic).
ML algorithm map — when to use what fundamentals
Match the algorithm to the problem shape and the data you have, not to hype. Start simple (linear/tree); reach for deep learning when data is large and unstructured (text, images).
| You have… | You want… | Reach for |
|---|---|---|
| labelled data, categories | predict a class | Logistic Reg, Random Forest, XGBoost |
| labelled data, numbers | predict a quantity | Linear Reg, Gradient Boosting |
| no labels | find groups | K-Means, DBSCAN, hierarchical |
| high-dim data | compress / visualise | PCA, t-SNE, UMAP |
| text / images / sequence | rich patterns | neural nets, transformers |
Interview Q&A
The table above answers "what fits"; this answers "in what order to think". Walk it top-down and you almost never reach for deep learning when a tree would have won. The senior reflex is to start with the cheapest model that could plausibly work and only climb when a held-out gap forces you to.
For structured / tabular data — rows and columns, the shape most businesses actually have — gradient-boosted decision trees still beat neural nets in 2026. A model isn't "better" because it's deeper; trees win here because tabular features have no spatial structure to exploit and boosting handles mixed types, missing values, and non-linear thresholds natively.
| Pick | Killer trait | When |
|---|---|---|
| XGBoost | battle-tested, max accuracy with tuning | you have time to tune and want the safest default |
| LightGBM | leaf-wise growth + GOSS → 3–10× faster | millions of rows, fast iteration, GPU training |
| CatBoost | ordered target encoding, no leakage | many categorical features, messy data, little tuning |
# Compare a simple baseline against a boosted tree the HONEST way:
# stratified k-fold CV so the accuracy gap is measured, not guessed.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold
from xgboost import XGBClassifier
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
models = {
"baseline_logreg": LogisticRegression(max_iter=1000),
"xgboost": XGBClassifier(n_estimators=400, max_depth=5,
learning_rate=0.05, subsample=0.8),
}
for name, clf in models.items():
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")
print(f"{name:16s} f1={scores.mean():.3f} +/- {scores.std():.3f}")
# Rule: ship the simpler model unless the boosted tree's CV mean
# clears it by MORE than the std bands overlap. Beating noise != better.
Interview Q&A · deep dive
k up front; it's fast and scales. DBSCAN finds arbitrary shapes, decides the cluster count itself, and labels outliers as noise — but it's sensitive to its eps/min_samples and struggles when densities vary. Pick DBSCAN for spatial/anomaly data with noise; K-Means for clean, convex, large data.The AI stack — a clean mental model model
A useful analogy panels love: the LLM is the brain, RAG is open-book memory, tools/MCP are hands, and an agent is the brain that plans, acts with hands, and loops.
Interview Q&A
"Brain + memory + hands" is the analogy; the control loop is the mechanism. An agent is a while loop wrapped around an LLM: it reasons about a goal, picks a tool, executes it, feeds the observation back into context, and repeats until it decides it's done or a guardrail stops it. Everything advanced — multi-step research, coding agents, computer use — is this loop with better tools and stopping rules.
# An agent is a loop, not a library. This is the whole idea in <25 lines.
def run_agent(goal, tools, llm, max_steps=6):
messages = [{"role": "user", "content": goal}]
for step in range(max_steps):
reply = llm.chat(messages, tools=tools) # model plans / picks a tool
if not reply.tool_calls: # no tool wanted = it's answering
return reply.content # goal met -> exit the loop
for call in reply.tool_calls: # ACT
result = tools[call.name].run(**call.args) # call API / DB / code
messages.append({"role": "tool", # OBSERVE -> back into context
"name": call.name,
"content": str(result)})
return "stopped: hit max_steps without finishing" # guardrail beats infinite loop
| Capability you need | Layer that supplies it |
|---|---|
| fresh / private facts with citations | RAG (retrieval over your data) |
| act on the world (read/write APIs, DBs, code) | tools, discovered & called via MCP |
| state across a multi-step task | memory (scratchpad + conversation + long-term store) |
| decide what to do next, loop, recover | the agent control loop + a planner |
Interview Q&A · deep dive
The full AI stack — every layer, named production map
A production GenAI system is a stack of seven swappable layers. The senior signal in interviews and design reviews is being able to name real options at each layer and justify a pick — then swap one without rewriting the rest. This is that map, with the tools that actually ship in 2026.
| Layer | Job | How to choose |
|---|---|---|
| 1 · LLMs | the reasoning engine that generates the answer | capability vs cost vs latency — closed frontier (Claude / GPT / Gemini) for the hardest reasoning; open-weight (Llama / Qwen / DeepSeek / Mistral) for control, privacy, or price |
| 2 · Vector DB | stores embeddings, serves nearest-neighbour search | Chroma to prototype · pgvector if you already run Postgres · Qdrant / Milvus / Weaviate at scale · Pinecone for fully-managed · OpenSearch inside AWS |
| 3 · Embeddings | turn text into vectors so “similar” = “close” | OpenAI / Voyage / Cohere for managed quality · nomic or sentence-transformers (SBERT) for open + self-hosted · match the model to your domain & language |
| 4 · Data extraction | turn PDFs, web pages, docs into clean Markdown / JSON | Docling / Unstructured (self-host) · LlamaParse (best tables, LlamaIndex-native) · Firecrawl (web → Markdown, agent-friendly) · Crawl4AI (open crawler you control) |
| 5 · Open-LLM access | run or serve open-weight models | Ollama locally · vLLM for production throughput · Groq for ultra-low-latency inference · Together / Hugging Face for hosted endpoints + fine-tuning |
| 6 · Framework | glue: chunking, retrieval, tool-calling, agent loops | LlamaIndex if RAG / data-connectors are the core · LangChain for breadth of integrations · LangGraph for stateful / graph agents · Haystack for production pipelines · txtai for an all-in-one embeddings DB |
| 7 · Evaluation | prove a non-deterministic system is good enough to ship | RAGAS for RAG metrics (no ground truth needed) · DeepEval for pytest / CI gating · TruLens for tracing + feedback · Giskard for robustness / bias / risk testing |
| Closed / frontier (rent via API) | Open-weight (download & run) |
|---|---|
| Claude (Anthropic) · GPT-5 (OpenAI) · Gemini (Google) · Grok (xAI) | Llama 4 (Meta) · Qwen 3 (Alibaba) · DeepSeek V4 · Mistral (Large 3 / Small 4) · Gemma 4 (Google) · Phi-4 (Microsoft) · GLM-5 (Z.ai) |
| Best raw reasoning, safety & polish; no infra; you pay per token | Control, privacy, no per-token cost at scale; you own GPUs & ops; licence terms matter — prefer Apache-2.0 / MIT, check Llama's usage caps |
# Layers 5+1 · pull & run an OPEN model locally with Hugging Face
from transformers import pipeline
gen = pipeline("text-generation", model="Qwen/Qwen3-8B", device_map="auto")
gen("Summarise NCT01234567 in one line:", max_new_tokens=80)
# Layers 4+3+2+6 · a full RAG query engine in ~6 lines with LlamaIndex
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
docs = SimpleDirectoryReader("trials/").load_data() # 4 · extraction
index = VectorStoreIndex.from_documents(docs) # 3 embed + 2 store
qe = index.as_query_engine() # 6 · framework wires it
qe.query("Which phase-3 trials target NSCLC?") # 1 · LLM answers, grounded
Interview Q&A
The model table above lists families; here is the current pecking order so you can speak to it without being stale. Anthropic now ships a generation-plus-tier scheme: Claude Fable 5 is the flagship, with Opus 4.8 as the top reasoning workhorse, Sonnet 4.6 the balanced default, and Haiku 4.5 the speed/cost tier. Opus 4.8 and Sonnet 4.6 both serve a 1M-token context window generally; Haiku 4.5 is 200k. Across labs the pattern repeats: a flagship, a balanced mid-tier, and a cheap fast tier — interviewers want you fluent in that shape, not in one vendor's marketing.
| Tier | Anthropic (2026) | What it's for |
|---|---|---|
| Flagship | Claude Fable 5 · Opus 4.8 | hardest reasoning, agents, long-horizon coding |
| Balanced | Claude Sonnet 4.6 | most production traffic — quality near flagship, far cheaper |
| Fast / cheap | Claude Haiku 4.5 | high-volume, latency-sensitive, simple calls |
The seven layers describe the application stack; underneath sits the serving and ops layer that makes it survive production. This is where the real cost and latency wins live, and naming it is a senior tell.
| Concern | What you reach for | Why it matters |
|---|---|---|
| Throughput serving | vLLM · SGLang · TensorRT-LLM | continuous batching + paged KV cache → many× the tokens/sec of naive serving |
| Inference speedups | speculative decoding · quantization (FP8/INT4) | 2–3× faster decode and cheaper memory with negligible quality loss |
| Gateway / routing | LiteLLM · model router · semantic cache | one API across vendors, fallback, cost-based routing, cache identical calls |
| Observability | LangSmith · Langfuse · OpenTelemetry GenAI | trace prompts/tools/tokens/cost; you can't fix what you can't see |
| Guardrails | input/output filters · PII redaction · schema validation | block injection, leakage, and malformed tool calls before they act |
# Layer 8 in practice: one interface, many providers, graceful fallback.
# Route cheap traffic to Haiku, escalate hard prompts to Opus, fail over if down.
import litellm
litellm.set_verbose = False
ROUTES = {
"easy": "anthropic/claude-haiku-4-5", # cheap, fast tier
"hard": "anthropic/claude-opus-4-8", # flagship reasoning
}
def ask(prompt, difficulty="easy"):
primary = ROUTES[difficulty]
try:
r = litellm.completion(model=primary,
messages=[{"role": "user", "content": prompt}],
timeout=30)
return r.choices[0].message.content
except Exception: # provider hiccup / rate limit
r = litellm.completion(model="openai/gpt-5", # cross-vendor fallback
messages=[{"role": "user", "content": prompt}])
return r.choices[0].message.content
Interview Q&A · deep dive
Transformers & attention — under the hood the core mechanism
Every modern LLM is a stack of transformer blocks, and the engine inside each block is self-attention: a mechanism that lets every token look at every other token and decide what's relevant. Attention is the single highest-leverage deep concept for a GenAI interview.
# each token's Q dotted with every K -> relevance scores
scores = Q @ K.T / sqrt(d_k) # scale keeps gradients sane
weights = softmax(scores, axis=-1) # how much to attend to each token
output = weights @ V # weighted blend of values
# Attention(Q,K,V) = softmax(QK^T / sqrt(d_k)) . V
| Piece | Why it exists |
|---|---|
| Multi-head attention | run attention h times in parallel subspaces — different heads learn different relations (syntax, coreference, position) |
| Positional encoding | attention is order-blind, so positions are injected (sinusoidal / learned / RoPE) to encode sequence order |
| Feed-forward (MLP) | per-token non-linear transform after attention — where much of the stored "knowledge" lives |
| Residual + LayerNorm | skip connections + normalization keep very deep stacks trainable |
| Causal mask | in decoders, hides future tokens so each prediction only sees the past |
Interview Q&A
The formula above is the algebra; this is the plumbing. Trace a single token's vector as it becomes Q/K/V, scores against every other token, gets masked and softmaxed, and emerges as a context-mixed output — then remember every head does this in parallel and the block stacks dozens deep.
import numpy as np
def softmax(x, axis=-1):
x = x - x.max(axis=axis, keepdims=True) # subtract max -> numerically stable
e = np.exp(x)
return e / e.sum(axis=axis, keepdims=True)
def attention(Q, K, V, causal=True):
d_k = Q.shape[-1]
scores = Q @ K.T / np.sqrt(d_k) # (T,T) relevance, scaled
if causal: # decoder: hide the future
T = scores.shape[0]
mask = np.triu(np.ones((T, T)), k=1).astype(bool)
scores[mask] = -np.inf # softmax sends these to 0
weights = softmax(scores) # each row sums to 1
return weights @ V, weights # context-mixed output + attn map
T, d_k = 4, 8 # 4 tokens, head dim 8
Q = K = V = np.random.randn(T, d_k)
out, attn = attention(Q, K, V)
print(out.shape, attn[0]) # token 0 attends ONLY to itself (causal)
The naive form above materialises the full T×T score matrix — fine for 4 tokens, fatal at 100k. Modern stacks change the shape of attention to fight the O(n²) memory and the KV-cache bloat, without changing the math you'd describe in an interview.
| Technique | What it changes | Win |
|---|---|---|
| FlashAttention-3 | tiled, fused kernel; never writes the full score matrix to HBM | same result, far less memory + faster on Hopper GPUs |
| GQA (grouped-query) | many query heads share a few KV heads | de-facto standard; shrinks the KV cache with little quality loss |
| MLA (multi-head latent) | compress K/V into a low-rank latent before caching | ~90%+ KV-cache reduction (DeepSeek), often better quality than MQA/GQA |
| RoPE (rotary) | rotates Q/K by position → relative position is baked in | extrapolates to longer contexts; now standard (Llama, Mistral, Qwen) |
Interview Q&A · deep dive
i can only attend to ≤ i — required for autoregressive training/generation. You add -inf to those scores before softmax so they become exactly 0 weight afterward; masking after softmax would leak normalization mass from the future.How an LLM actually generates text mechanics
An LLM is an autoregressive next-token predictor: given the tokens so far it outputs a probability over the whole vocabulary for the next token, you pick one, append it, and repeat. Everything users feel — creativity, determinism, cost — falls out of this loop and how you sample from it.
| Stage | What happens |
|---|---|
| Tokenization (BPE) | text splits into sub-word tokens from a fixed vocabulary (~50–100k). Common words = 1 token; rare words split. "1 token ≈ 0.75 words" drives cost. |
| Embedding | each token id → a learned vector; positions added; fed through the transformer stack. |
| Logits → softmax | the final layer scores every vocab token; softmax turns scores into a probability distribution. |
| Sampling | choose the next token from that distribution — the knob you control. |
| temperature | flattens (high → creative) or sharpens (low → focused) the distribution; 0 ≈ deterministic |
| top-k | sample only from the k most likely tokens |
| top-p (nucleus) | sample from the smallest set whose probabilities sum to p |
| greedy / beam | always take the argmax / track several best sequences — precise, less diverse |
Interview Q&A
The chips above list the stages; this shows the two-phase reality that explains every latency number you'll ever debug. Prefill reads your whole prompt in one parallel pass (compute-bound) and fills the KV cache; decode then emits one token per step reusing that cache (memory-bandwidth-bound). Time-to-first-token comes from prefill; tokens-per-second comes from decode.
import numpy as np
def sample(logits, temperature=0.8, top_k=40, top_p=0.95):
if temperature == 0: # greedy: deterministic argmax
return int(np.argmax(logits))
logits = logits / temperature # scale BEFORE softmax
if top_k: # keep only k most likely
kth = np.sort(logits)[-top_k]
logits = np.where(logits < kth, -np.inf, logits)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
if top_p: # nucleus: smallest set summing to p
order = np.argsort(probs)[::-1]
cum = np.cumsum(probs[order])
keep = order[cum <= top_p]
if len(keep) == 0: keep = order[:1] # always keep the top token
mask = np.ones_like(probs, dtype=bool); mask[keep] = False
probs[mask] = 0; probs /= probs.sum()
return int(np.random.choice(len(probs), p=probs))
| Knob / trick | What it does | Use it for |
|---|---|---|
| min-p | keeps tokens above (min_p × top-token prob) — threshold scales with confidence | creative output that stays coherent; robust at high temperature |
| repetition / frequency penalty | down-weights tokens already produced | stop the model looping the same phrase |
| speculative decoding | a small draft model proposes 5–8 tokens; the big model verifies them in parallel | 2–3× faster decode, identical output distribution |
| seed + temperature 0 | removes sampling randomness | reproducible evals and tests |
Interview Q&A · deep dive
min_p × the top token's probability, so the cutoff tightens when the model is confident and loosens when it's uncertain. That makes min-p more robust at high temperature — coherent but creative — which is why it's now in HF Transformers, vLLM, and Ollama.Training & adapting LLMs the lifecycle
A frontier model is built in stages, and "fine-tuning" means adapting one of them to your needs. Knowing the lifecycle — and when not to fine-tune — is a senior signal.
next-token on the web→ SFT
instruction / chat examples→ Align
RLHF / DPO→ aligned assistant
| Stage | What it teaches |
|---|---|
| Pretraining | raw next-token prediction over trillions of tokens → world knowledge + language. Hugely expensive; done once by labs. |
| SFT (instruction tuning) | fine-tune on (prompt, good answer) pairs so the model follows instructions instead of merely continuing text. |
| RLHF / DPO | align to human preference. RLHF trains a reward model then optimizes against it; DPO skips the reward model and optimizes preferences directly — simpler, now common. |
| How you'd adapt one | Cost / use |
|---|---|
| Full fine-tune | update all weights — powerful, expensive, risks catastrophic forgetting |
| LoRA / QLoRA (PEFT) | freeze the base, train tiny low-rank adapters (QLoRA also quantizes the base) — cheap, fast, swappable; the default |
| Distillation | train a small model to mimic a big one — cheaper inference |
Interview Q&A
Think of the three lifecycle stages as moving different knobs. Pretraining sets the model's knowledge and language priors; SFT sets which behaviour the model expresses from that knowledge (it follows instructions rather than autocompleting); alignment (RLHF/DPO) sets how it ranks competing good answers — tone, refusal, formatting, helpfulness. A LoRA fine-tune nudges the second and third knobs cheaply. It does not reliably move the first — you cannot LoRA in a fact the base never saw and expect recall; you can only make a behaviour the base can produce far more consistent.
LoRA's bet is that the change a task needs (ΔW) lives in a low-dimensional subspace, even though W itself is huge. So instead of learning a full d×d update, you learn two skinny matrices A (r×d) and B (d×r) whose product approximates ΔW, with r as small as 8–32. The effective update is scaled by α/r — alpha is a learning-rate-like gain on the adapter, not a separate capacity knob. QLoRA adds the orthogonal trick: hold the frozen base in 4-bit NF4 so the whole thing fits one GPU, while the adapters stay in higher precision. DoRA (weight-decomposed LoRA) splits each weight into magnitude + direction and only LoRA-adapts the direction — a near-free quality bump now exposed as a single flag in PEFT and Unsloth.
| Knob | 2025–26 default | What moving it does |
|---|---|---|
| rank r | 16 (8 light · 64 heavy) | adapter capacity; higher r = more to learn, more VRAM |
| alpha α | ≈ r (or 2r) | scales the update; treat α/r as the effective gain |
| target modules | all linear (q,k,v,o,gate,up,down) | all-linear beats attention-only with little extra VRAM |
| DoRA | on for hard tasks | decompose magnitude/direction → closer to full FT |
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer, SFTConfig
import torch
# 1) load the base in 4-bit NF4 — this is the "Q" in QLoRA
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B", quantization_config=bnb)
model = prepare_model_for_kbit_training(model)
# 2) attach low-rank adapters to ALL linear layers; DoRA via one flag
lora = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, use_dora=True,
target_modules="all-linear", task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters() # e.g. trainable: 0.4% of 8B
# 3) train only the adapters on your (prompt, completion) pairs
trainer = SFTTrainer(model=model, train_dataset=ds,
args=SFTConfig(per_device_train_batch_size=4, num_train_epochs=2,
learning_rate=2e-4, bf16=True, output_dir="out"))
trainer.train()
model.save_pretrained("trial-adapter") # a few MB — version it like code
from trl import DPOTrainer, DPOConfig
# dataset rows: {"prompt": ..., "chosen": good_answer, "rejected": bad_answer}
# DPO turns the RLHF objective into a simple classification loss on pairs.
dpo = DPOTrainer(model=sft_model, ref_model=None, # ref_model=None reuses a frozen copy
train_dataset=pairs,
args=DPOConfig(beta=0.1, learning_rate=5e-6, output_dir="dpo"))
dpo.train() # ORPO collapses SFT+preference into one pass, dropping the ref model
Interview Q&A · deep dive
rank and alpha actually control, and what happens if you crank rank up?rank is the dimensionality of the low-rank update — the adapter's capacity. alpha scales the update; the effective gain is alpha/rank, so people often set α≈r or 2r to keep it stable. Cranking rank up adds capacity and VRAM but, on a small dataset, mostly buys overfitting and forgetting, not skill. Start at r=16, all-linear, and only raise it if held-out task metrics are still climbing.Inference & serving optimization fast & cheap
Training gets the headlines; inference is where the bill lives. Serving an LLM well is a latency-vs-throughput-vs-cost trade-off, and these are the levers a senior is expected to name.
| Lever | What it buys |
|---|---|
| Quantization (int8 / int4) | store weights in fewer bits (GPTQ, AWQ) → less memory, faster, cheaper; small accuracy hit |
| KV cache | reuse past keys / values so each token isn't recomputed from scratch |
| Continuous batching | pack many requests through the GPU together, filling slots as they free → large throughput gain |
| PagedAttention (vLLM) | manage KV-cache memory like OS paging → less waste, more concurrent requests |
| Speculative decoding | a small draft model proposes tokens a big model verifies in one pass → lower latency |
| Tensor / pipeline parallelism | split a model too big for one GPU across several |
| You care about… | Optimize for |
|---|---|
| Chat UX | latency + time-to-first-token (prefill, speculative decoding) |
| Batch / offline jobs | throughput (continuous batching, quantization) |
| Cost | tokens / sec / $ (quantization, smaller models, caching) |
Interview Q&A
Serving cost is governed by two scarce resources: compute (FLOPs) and memory bandwidth (GB/s). The split tracks the two phases. Prefill runs the whole prompt through the model in one big matmul — it's compute-bound and parallel, so it dominates time-to-first-token on long prompts. Decode emits one token at a time, each step reloading the entire model + KV cache from memory to produce a single token — it's memory-bandwidth-bound and sequential, so it dominates tokens-per-second. Almost every optimization is "make decode less bandwidth-starved" or "stop wasting KV memory so more requests fit."
| Prefill | Decode | |
|---|---|---|
| Work shape | all prompt tokens at once | one token per step, autoregressive |
| Bottleneck | compute (FLOPs) | memory bandwidth |
| Drives | time-to-first-token (TTFT) | inter-token latency, throughput |
| Helped by | flash attention, chunked prefill | batching, KV paging, quantization, spec decode |
Static (request-level) batching waits for a full batch, runs them together, and can't return any result until the slowest sequence finishes — short requests sit idle behind long ones. Continuous (iteration-level) batching schedules at the granularity of a single decode step: the moment any sequence emits its stop token, its slot is freed and a queued request takes its place that same iteration. The GPU stays saturated, and tail latency stops being hostage to the longest generation. vLLM pairs this with chunked prefill — slicing a long prompt's prefill into pieces interleaved with ongoing decodes — so one giant prompt no longer stalls every other user's token stream.
PagedAttention is the memory half. The KV cache for each sequence grows unpredictably; allocating a contiguous max-length buffer per request wastes most of the GPU on padding. PagedAttention stores KV in fixed-size blocks mapped through a block table — exactly like OS virtual memory pages — so memory is allocated on demand and shared. Identical prefixes (a shared system prompt across thousands of requests) can even point at the same physical blocks (prefix caching), cutting both memory and prefill.
from vllm import LLM, SamplingParams
# PagedAttention + continuous batching + chunked prefill are on by default.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct",
quantization="awq", # 4-bit weight-only; or "fp8" on Hopper/Blackwell
gpu_memory_utilization=0.90, # headroom for the KV-cache block pool
max_model_len=8192,
enable_prefix_caching=True, # reuse KV for shared system prompts
speculative_config={"method": "eagle", "num_speculative_tokens": 5})
params = SamplingParams(temperature=0, max_tokens=256)
# Throw 1000 prompts at it; the scheduler batches them across decode steps.
outs = llm.generate(prompts, params) # engine fills/frees slots per iteration
for o in outs:
print(o.outputs[0].text)
Interview Q&A · deep dive
Prompting techniques — the senior catalogue technique
Prompting is interface design for a model. There are a dozen named techniques and the senior move is knowing which one resolves which failure mode — not citing a buzzword list. This card is the catalogue; the next two go deep on reasoning and production.
| Technique | What it is | Reach for it when |
|---|---|---|
| Zero-shot | instruction only, no examples | well-defined task, model already capable |
| Few-shot (ICL) | 2–8 input→output examples in the prompt | format or edge-cases hard to describe in words |
| Role / Persona | "You are a senior clinical-trial analyst…" | set tone, expertise, behavioural constraints |
| Delimiters / XML tags | <context>…</context> blocks | separate instructions / data / examples cleanly |
| Structured output | force JSON / schema / function call | downstream code parses the result |
| Prefilling | seed the start of the assistant turn | force a format (e.g. start with {) or character |
| Sampling controls | temperature, top-p, top-k, stop sequences | dial determinism vs diversity per task |
prompt = """You are a clinical-trial metadata extractor.
<rules>
- Return ONLY JSON matching the schema. No prose, no markdown fences.
- If a field is missing in the text, use null. Do not invent values.
</rules>
<schema>
{"phase": "string|null", "status": "string|null", "sponsor": "string|null"}
</schema>
<examples>
<ex>input: "Phase 2 trial, sponsored by Acme, currently recruiting."
output: {"phase":"2","status":"Recruiting","sponsor":"Acme"}</ex>
</examples>
<trial>
{doc}
</trial>"""
# temperature=0, then json.loads inside try/except with a repair re-prompt
Interview Q&A
Every named technique is really one of four moves: show the shape (few-shot examples pin format the words can't), separate the parts (delimiters keep instructions, data, and examples from bleeding), constrain the output (schema/structured mode so code can trust it), or set the frame (role/persona to fix tone and expertise). The senior skill is diagnosing the failure mode first, then reaching for the one move that fixes it — not stacking every technique because more feels safer. Each addition costs tokens, latency, and a chance to confuse the model.
prompt = """Classify the support ticket's intent. Return one label only.
<labels>billing | bug | feature_request | other</labels>
<examples>
ticket: "I was charged twice this month" -> billing
ticket: "The export button does nothing on iOS" -> bug
ticket: "Could you add dark mode?" -> feature_request
ticket: "ok thanks!" -> other
ticket: "App crashes AND I want a refund" -> bug # bug wins over billing
</examples>
ticket: "{text}"
-> """
# The last example resolves an ambiguity prose would argue about forever:
# a tie-break rule shown once is worth a paragraph of instruction.
messages = [
{"role": "system", "content":
"You are a senior SRE. Be terse. Never speculate; say 'insufficient data' if unsure."},
{"role": "user", "content":
"Triage this alert in 3 numbered steps: (1) likely cause (2) blast radius "
"(3) first action.\n\n<alert>{payload}</alert>"}
]
# temperature=0 for stable triage; stop=["\n4."] guarantees exactly 3 steps
# so the parser downstream never has to handle a runaway 4th line.
resp = client.chat(messages=messages, temperature=0, stop=["\n4."])
Interview Q&A · deep dive
<document>…</document>) gives the model an unambiguous boundary between instructions and data, which sharply reduces the model treating content as commands — the first line of prompt-injection defense. Second, tags are machine-addressable: you can tell the model to put its answer in <result> tags and extract deterministically. Models tuned on tagged formats (Claude especially) follow structure better than prose markers.{ or Here are the three options:). The model continues from there, which cheaply forces a format or skips a preamble without a longer instruction. It shines for JSON-only output and for steering past a hedging opener. Caveat: providers that enforce structured outputs at the decoding layer make prefilling for JSON largely unnecessary — prefer the mode when it exists.Reasoning prompts — CoT, ToT, Reflexion & friends reasoning
When a task needs multi-step thought, you don't make the model "smarter" — you give it space and structure to reason. The named techniques below are different shapes of that space, each with a known win condition.
| Technique | One-line idea | Best for |
|---|---|---|
| Chain-of-Thought (CoT) | show worked steps before the answer | arithmetic, multi-hop, extraction logic |
| Zero-shot CoT | append "Let's think step by step." | cheapest reasoning lift, no examples needed |
| Self-Consistency | sample N CoT paths, majority-vote the answer | tasks with a single right answer; trades cost for accuracy |
| Tree of Thoughts (ToT) | branch & evaluate alternative reasoning paths | planning, puzzles, search-like problems |
| Least-to-Most | decompose into sub-problems, solve in order | complex tasks made of simpler ones |
| Step-Back | ask the abstract/general question first | retrieve principles before applying them |
| Generated Knowledge | have the model state relevant facts first | knowledge-light tasks; primes the answer |
| Reflexion / self-critique | generate → critique → revise loop | quality-sensitive output; tolerates 2–3× latency |
| Meta-prompting | ask the model to write the prompt | bootstrapping or hard-to-articulate tasks |
| Prompt chaining | pipeline of small prompts, each focused | multi-stage flows; debuggable; cacheable |
from collections import Counter
def vote(question, n=8):
answers = []
for _ in range(n):
out = llm(f"{question}\n\nLet's think step by step.",
temperature=0.7) # diversity needed for voting
answers.append(parse_final_answer(out))
return Counter(answers).most_common(1)[0][0] # majority vote
Interview Q&A
Chain-of-Thought is the primitive — make the model write its reasoning before the answer so it has working space. The named techniques are different search strategies over CoT: Self-Consistency samples many independent chains and votes (ensemble, no structure); Tree of Thoughts branches, evaluates partial states, and prunes (explicit search); ReAct interleaves a thought with an action in the world and an observation back (CoT + tools); Reflexion wraps a generate→critique→retry loop (CoT + feedback memory). Pick by the shape of the problem: ensemble for "one right answer, want reliability," search for "many paths, need planning," tools for "needs external state," feedback for "first draft is rarely good enough."
The big shift: models like o1/o3 and DeepSeek-R1 are trained to reason and spend test-time compute internally. On those models you should not hand-write "let's think step by step" or stack heavy CoT scaffolding — they already do it, and over-prompting can hurt. The technique catalogue still matters, but its center of gravity moved: explicit CoT/Self-Consistency are most valuable on non-reasoning models, while on reasoning models you instead control the reasoning effort/budget and keep the prompt clean. The senior tell is knowing which model you're on before reaching for a technique.
| Technique | Win condition | Cost / caveat |
|---|---|---|
| Zero-shot CoT | cheap lift on non-reasoning models | noise on reasoning models; can derail simple tasks |
| Self-Consistency | single correct answer, want reliability | N× calls; needs temperature > 0 for path diversity |
| Tree of Thoughts | planning/puzzles, search-shaped | 10–100× cost; complex to operate |
| ReAct | needs external tools/state | cheapest agentic loop; can loop forever w/o limits |
| Reflexion | quality-critical, draft rarely good | 2–3× latency; needs a real critique signal |
def react(question, tools, max_steps=6):
transcript = f"Question: {question}\n"
for _ in range(max_steps):
# model emits a Thought then either an Action or a Final Answer
step = llm(transcript + "Thought:", stop=["Observation:"], temperature=0)
transcript += "Thought:" + step
if "Final Answer:" in step:
return step.split("Final Answer:")[-1].strip()
name, arg = parse_action(step) # e.g. search("vLLM paged attention")
obs = tools[name](arg) # ground the next thought in reality
transcript += f"\nObservation: {obs}\n"
return "insufficient steps" # always bound the loop
def reflexion(task, check, max_tries=3):
draft = llm(task)
for _ in range(max_tries):
verdict = check(draft) # tests/linter/eval — a REAL signal, not vibes
if verdict.ok:
return draft
# feed the concrete failure back as reflective memory
draft = llm(f"{task}\n\nYour last attempt failed: {verdict.reason}\n"
f"Diagnose why, then produce a corrected version.")
return draft
Interview Q&A · deep dive
Production prompting — structured outputs, DSPy & safety prod
Once prompts ship, the work shifts from "what to write" to "how to operate": typed contracts with the model, programmatic prompt construction, versioning, evaluation gates, and defending the prompt boundary against injection.
| Production lever | What it gives you |
|---|---|
| Function calling / tool use | provider-enforced schema for tool arguments — no parsing |
| JSON mode / structured outputs | provider-enforced JSON validity at the decoding layer |
| Pydantic-typed responses | schema = code; validation = first-class |
| DSPy | compile prompts from declarative signatures; auto-optimise demos |
| Prompt versioning | git-tracked templates with eval scores per version |
| System / user / assistant separation | instructions in system, untrusted data in user; never blend |
from pydantic import BaseModel, Field
class TrialMeta(BaseModel):
phase: str | None = Field(description="Phase 1/2/3/4 or null")
status: str | None
sponsor: str | None
# provider-side schema enforcement — no regex, no JSON-repair gymnastics
resp = client.responses.parse(model="gpt-5",
input=prompt, response_format=TrialMeta)
trial: TrialMeta = resp.output_parsed # typed object, validated
Interview Q&A
"Get JSON out" hides a precision ladder, and senior answers name where the guarantee comes from. Layer 1 (weakest): prompt pleading — "respond only in JSON" — no guarantee, needs a parse + repair loop. Layer 2: JSON mode — the provider forces syntactically valid JSON but not your schema, so you still validate fields. Layer 3 (strongest): constrained decoding / structured outputs — the provider compiles your JSON Schema into a grammar (a finite-state machine) and restricts the token sampler at every step so the model literally cannot emit a token that violates the schema. Strict tool use is the same mechanism applied to tool-call arguments. This is now table stakes: OpenAI shipped strict structured outputs in 2024, Anthropic shipped constrained decoding for Claude in Nov 2025, and grammar backends like XGrammar are the default in vLLM/SGLang/TensorRT-LLM.
| Layer | Guarantee | You still must… |
|---|---|---|
| Prompt only | none | parse, repair, retry |
| JSON mode | valid JSON syntax | validate your schema/types |
| Structured outputs (strict) | schema-conformant tokens | handle refusals & semantic correctness |
from pydantic import BaseModel, Field
from enum import Enum
class Intent(str, Enum): # enum → strict mode forbids any other value
billing = "billing"; bug = "bug"; feature = "feature"; other = "other"
class Ticket(BaseModel):
intent: Intent
severity: int = Field(ge=1, le=5) # bounds the model can't violate
summary: str = Field(max_length=120)
needs_human: bool
# provider enforces the schema during decoding — no regex, no repair loop
resp = client.responses.parse(model="gpt-5.1", input=prompt, text_format=Ticket)
ticket: Ticket = resp.output_parsed # typed, validated object
if ticket.needs_human: escalate(ticket)
import dspy
dspy.configure(lm=dspy.LM("openai/gpt-5.1"))
class Triage(dspy.Signature):
"""Classify a support ticket and flag escalations."""
ticket: str = dspy.InputField()
intent: str = dspy.OutputField()
needs_human: bool = dspy.OutputField()
triage = dspy.ChainOfThought(Triage)
# GEPA (2025) reflectively evolves the instruction+demos against your metric;
# MIPROv2 Bayesian-searches instruction/demo combos. Either COMPILES the prompt.
compiled = dspy.GEPA(metric=accuracy).compile(triage, trainset=labeled, valset=dev)
Interview Q&A · deep dive
strict: true — the provider validates the tool name and arguments against your schema at decode time, killing a whole class of parsing and type bugs before any tool runs. For fixed-shape extraction: Pydantic-typed structured outputs — one declarative schema is both code and contract. JSON mode is the floor: it forces valid JSON syntax but not your schema, so you still validate types and fields downstream. The 2025–26 reality is that strict modes exist on the major providers, so prompt-only JSON should be a fallback, not a default.Embeddings & vector databases retrieval
An embedding maps text to a vector so that semantic similarity ≈ geometric closeness (cosine similarity). A vector DB indexes millions of these for fast approximate-nearest-neighbour (ANN) search — the retrieval engine under RAG and semantic search.
Interview Q&A
An embedding model is a learned function that projects text into a few hundred to a few thousand dimensions where each axis is a latent feature the model invented during training. You never read the axes — what carries meaning is the angle between vectors. "Heart attack" and "myocardial infarction" land in nearly the same direction even though they share no characters, which is exactly what keyword search cannot do. The model is frozen at query time: same text in, same vector out, so you can precompute and store them.
Three distance metrics show up, and the choice is not cosmetic. If your vectors are L2-normalised (unit length, which most modern text embedders are), cosine and dot product rank results identically and Euclidean becomes a monotonic function of cosine — so the "which metric" debate often collapses to "did you normalise?".
| Metric | Formula (intuition) | Sensitive to | Use when |
|---|---|---|---|
| Cosine | angle between vectors, magnitude divided out | direction only | text embeddings (the default) |
| Dot product | cosine × both magnitudes | direction and length | vectors already normalised, or magnitude is a signal |
| Euclidean (L2) | straight-line distance in space | absolute position | spatial / non-normalised features |
| Index | How it searches | Build / memory | Best for |
|---|---|---|---|
| HNSW | greedy walk down a multi-layer proximity graph | slow build, ~3–4× the RAM of IVF at 1M vectors | low-latency online queries, high recall |
| IVF / IVF-Flat | cluster into lists, probe the nearest nprobe lists | fast build, low memory | huge static corpora, batch, tight RAM |
HNSW exposes two knobs that are the recall/latency dial: ef_construction (graph quality at build) and ef_search (how wide the walk explores at query). Raising ef_search buys recall for latency with no re-index. IVF's equivalent is nprobe. Quantization (scalar/product/binary) then shrinks each vector 4×–32× so the index fits in RAM, trading a little recall for big cost savings — the standard move past ~10M vectors.
# pgvector turns Postgres into a vector DB — no new datastore to operate.
import psycopg, numpy as np
from openai import OpenAI # any embedder works; vectors must be L2-normalised
client = OpenAI()
def embed(text):
v = client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding
v = np.array(v); return (v / np.linalg.norm(v)).tolist() # normalise → cosine == dot
db = psycopg.connect("dbname=rag")
db.execute("CREATE EXTENSION IF NOT EXISTS vector")
db.execute("""CREATE TABLE IF NOT EXISTS chunks(
id bigserial PRIMARY KEY, source text, body text, embedding vector(1536))""")
# cosine index; build AFTER bulk load so the graph is built once
db.execute("CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)")
def search(q, source, k=5):
qv = embed(q)
# <=> is cosine distance in pgvector; filter FIRST, then rank by similarity
rows = db.execute(
"SELECT body, 1 - (embedding <=> %s::vector) AS sim "
"FROM chunks WHERE source = %s ORDER BY embedding <=> %s::vector LIMIT %s",
(qv, source, qv, k)).fetchall()
return rows
| Strategy | Idea | Tradeoff |
|---|---|---|
| Fixed-size + overlap | N tokens, slide with 10–20% overlap | simple, cheap; splits mid-thought |
| Recursive / structural | split on headings → paragraphs → sentences | respects document shape; needs clean structure |
| Semantic | break where adjacent sentence embeddings diverge | coherent chunks; extra embedding cost at ingest |
Interview Q&A · deep dive
ef_search at query time — it widens the graph traversal, lifting recall at the cost of latency, with no re-index. If that plateaus, rebuild with higher ef_construction/M for a denser graph (slower build, more RAM). Tune query-side first because it's free to revert.RAG architecture flagship
Retrieval-Augmented Generation grounds an LLM in your data: retrieve relevant chunks, inject them into the prompt, generate an answer with citations. It's the standard cure for hallucination and stale knowledge.
def answer(question):
qv = embed(question)
hits = vstore.search(qv, k=8, filter={"registry": "ctgov"})
context = "\n\n".join(f"[{h.id}] {h.text}" for h in hits)
prompt = ("Answer ONLY from context. Cite [ids]. "
"If not in context, say you don't know.\n\n"
f"Context:\n{context}\n\nQ: {question}")
return llm(prompt, temperature=0)
Interview Q&A
RAG is an offline indexing job bolted to an online answering loop, and almost every production failure lives in the seam between them. The indexing side (load → chunk → embed → store) decides what is possible to retrieve; the answering side (embed query → filter → search → rerank → augment → generate → cite) decides what actually surfaces. Treat retrieval quality and generation faithfulness as two separate metrics with two separate fixes — conflating them is the number-one reason teams thrash on RAG.
The grounding contract is enforced in the prompt, not the model: instruct it to answer only from context, to cite chunk ids, and to say "I don't know" when the answer isn't present. Without an explicit "don't know" escape hatch, the model fills gaps with its parametric memory — which is precisely the hallucination RAG was meant to remove.
| Stage | Job | Failure if skipped/wrong |
|---|---|---|
| Ingest | load & clean source docs | boilerplate/nav text pollutes chunks |
| Chunk | split into retrievable units | facts split across chunk boundaries |
| Embed + Index | vectorise, build ANN index | poor recall ceiling |
| Retrieve | filter + ANN top-k | wrong or missing evidence |
| Rerank | cross-encoder reorders top-N | best chunk buried below the budget cut |
| Augment | assemble system + context + question | "lost in the middle", token overflow |
| Generate | answer strictly from context | hallucination, ignored evidence |
| Cite + Evaluate | attach sources, score faithfulness | untrustworthy, unmeasurable answers |
def rag_answer(question, vstore, llm, k=20, keep=5):
qv = embed(question)
hits = vstore.search(qv, k=k, filter={"lang": "en"}) # over-fetch for the reranker
ranked = rerank(question, hits)[:keep] # cross-encoder → precision
if not ranked or ranked[0].score < 0.2: # weak context → abstain, don't guess
return {"answer": "No grounded answer found.", "cited": []}
context = "\n\n".join(f"[{h.id}] {h.text}" for h in ranked)
msgs = [
{"role": "system", "content":
"Answer ONLY from CONTEXT. Cite ids like [3]. If absent, say you don't know."},
{"role": "user", "content": f"CONTEXT:\n{context}\n\nQUESTION: {question}"},
]
out = llm.chat(msgs, temperature=0) # temp 0 → deterministic, faithful
cited = [h.id for h in ranked if f"[{h.id}]" in out] # verify claims trace to evidence
return {"answer": out, "cited": cited}
Interview Q&A · deep dive
Advanced RAG — make retrieval actually work deep
Naive RAG (embed → top-k → stuff context) fails in predictable ways: wrong chunks, missed facts, or right facts buried where the model ignores them. Advanced RAG is the set of fixes a senior reaches for, grouped by where in the pipeline the problem lives.
| Stage | Technique | Fixes |
|---|---|---|
| Chunking | semantic / recursive / parent-doc ("small-to-big") | chunks that split mid-thought; too coarse vs too fine |
| Retrieval | hybrid = BM25 (keyword) + dense (vector), fused | vector misses exact IDs/codes; keyword misses meaning |
| Query | rewriting, multi-query, HyDE, RAG-fusion | vague/under-specified user queries |
| Rerank | cross-encoder reranker on the top-N | bi-encoder recall is noisy; precision @ top-k |
| Filter | metadata pre-filter (date, registry, type) | scanning irrelevant partitions of the index |
| Assemble | dedupe, order by relevance, fit budget | "lost in the middle" — models ignore mid-context |
| Pattern | What it is |
|---|---|
| Hybrid search | run keyword + vector, combine scores (Reciprocal Rank Fusion). The single highest-ROI upgrade. |
| HyDE | generate a hypothetical answer, embed that, retrieve on it — closes the question/answer vocabulary gap. |
| Reranking | cheap bi-encoder fetches 50, an accurate cross-encoder reorders to the best 5. |
| Self-RAG / CRAG | the model grades retrieval and retries/abstains if context is weak — where RAG meets agents. |
| GraphRAG | retrieve over a knowledge graph for multi-hop / "connect-the-entities" questions. |
| Contextual retrieval | prepend a short doc-level summary to each chunk before embedding — big recall gain. |
Interview Q&A
A chunk pulled out of its document loses the context that made it findable — "the company grew 3%" doesn't say which company or which quarter. Contextual Retrieval fixes this at ingest: before embedding, prepend a 50–100 token LLM-generated blurb situating the chunk in its document, then build both the embedding and the BM25 index on the augmented chunk. Anthropic measured that contextual embeddings cut failed retrievals ~35%, adding contextual BM25 reaches ~49%, and combining with reranking reaches ~67% (top-20 failure rate 5.7% → 1.9%). Prompt-caching the document makes generating per-chunk context cheap.
| Reranker | Type | Note |
|---|---|---|
| Cohere Rerank 3.5 | hosted cross-encoder | strong multilingual, ~600ms latency class |
| Voyage rerank-2.5 | hosted, instruction-following | balanced quality/latency for agents |
| bge-reranker-v2-m3 | open-source | self-host, data stays in-house |
| mxbai-rerank (Qwen2.5) | open-source | RL-trained cross-encoder, deployable |
Why a second stage exists at all: first-stage bi-encoders embed query and document separately, so they're fast and indexable but lose cross-term interaction. A cross-encoder feeds query+document together through the model and scores true relevance — far more accurate but O(N) per query, so you only run it on the ~20–50 candidates the bi-encoder already surfaced.
Vector RAG retrieves chunks that look like the query; it cannot answer "what connects A to D through B and C?" because the connecting chunks aren't individually similar to the question. GraphRAG (Microsoft) extracts entities and relationships into a knowledge graph, then uses the Leiden algorithm to cluster it into hierarchical communities with LLM-written summaries. Global search reasons over community summaries for corpus-wide thematic questions; local search expands from specific entities to neighbours for fact lookups; DRIFT blends both. It costs far more to build than vector RAG — reserve it for genuinely multi-hop, connect-the-entities problems.
# RRF fuses two ranked lists without tuning score scales — the workhorse of hybrid search.
def reciprocal_rank_fusion(rankings, k=60):
# rankings: list of lists, each a ranked list of doc ids (e.g. [bm25_ids, dense_ids])
scores = {}
for ranked in rankings:
for rank, doc_id in enumerate(ranked):
scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
return sorted(scores, key=scores.get, reverse=True)
bm25_hits = lexical.search(query, k=50) # exact IDs, codes, rare tokens
dense_hits = vstore.search(embed(query), k=50) # paraphrase, semantics
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])[:20]
final = rerank(query, fused)[:5] # cross-encoder picks the best 5
# Questions and answers use different vocab; embedding a drafted answer closes that gap.
def hyde_retrieve(question, llm, vstore, k=8):
draft = llm.chat([{"role": "user",
"content": f"Write a short, plausible passage answering: {question}"}])
# embed the hypothetical doc, not the question — it lives in 'answer space'
return vstore.search(embed(draft), k=k)
Interview Q&A · deep dive
k (~60) damps the influence of very top ranks so a single list can't monopolise the result.Agentic patterns — ReAct, planning, multi-agent flagship
An agent gives the LLM a control loop and tools. The workhorse is ReAct (Reason + Act): the model thinks, picks a tool, observes the result, and repeats until done.
| Pattern | Shape | Use when |
|---|---|---|
| ReAct | think→act→observe loop | tool use, lookups, interactive tasks |
| Plan-and-execute | plan all steps, then run | complex multi-step jobs |
| Multi-agent | specialised agents + orchestrator | distinct roles (researcher/writer/checker) |
| Reflection | generate→critique→revise | quality-critical output |
Interview Q&A
Strip away the framework and an agent is a while-loop around the model: keep calling the LLM, let it emit a tool call, execute it, feed the result back, repeat until it emits a final answer or you hit a guardrail. Modern tool-use APIs make the loop explicit — the model returns a structured tool_use block, you run it, and return a tool_result. ReAct is this loop with the model's reasoning ("Thought") interleaved; the framework is sugar over the same control flow.
The single hardest engineering problem is context management. Every step appends thought + action + observation, so a 15-step task can blow the window with stale tool output. Senior agents prune, summarise, or offload old observations to external memory — "context engineering" is now as important as prompt engineering.
| Memory | Holds | Stored in |
|---|---|---|
| Working / short-term | current task scratchpad & tool results | the context window |
| Episodic | past interactions/events ("last time we…") | a vector store, retrieved on demand |
| Semantic | distilled facts/preferences about the world/user | a DB / knowledge store |
| Procedural | how-to skills, SOPs, learned workflows | prompts, tools, or fine-tuned weights |
Reflection (Reflexion-style) is what turns episodic memory into improvement: after a failed attempt the agent writes a verbal self-critique, stores it, and conditions the next attempt on it — verbal reinforcement learning, no weight updates.
def run_agent(task, llm, tools, max_steps=8, budget_usd=0.50):
messages = [{"role": "user", "content": task}]
spent = 0.0
for step in range(max_steps): # hard cap → can't loop forever
reply = llm.chat(messages, tools=tools) # model thinks + may call a tool
spent += reply.cost
if spent > budget_usd: # cost guardrail
return "Stopped: budget exceeded."
if not reply.tool_calls: # no tool → it's the final answer
return reply.text
messages.append(reply.message)
for call in reply.tool_calls:
fn = tools.get(call.name)
if fn is None: # validate before executing
result = f"Error: unknown tool {call.name}"
else:
try:
result = fn(**call.args) # run the tool
except Exception as e:
result = f"Tool error: {e}" # feed errors back, don't crash
messages.append({"role": "tool",
"tool_call_id": call.id, "content": str(result)})
return "Stopped: max steps reached."
Interview Q&A · deep dive
MCP — the Model Context Protocol standard
MCP is an open standard (introduced by Anthropic, now broadly adopted) for connecting LLM apps to tools and data through one protocol instead of N bespoke integrations. The "USB-C for AI tools" framing: write a server once, any MCP-capable host can use it.
| Primitive | What the server exposes | Think |
|---|---|---|
| Tools | callable functions the model can invoke (with side effects) | "do something" — query DB, send email |
| Resources | readable data the host can load into context | "read something" — a file, a record |
| Prompts | reusable templated workflows the user can trigger | "a saved recipe" |
| Transport | Use |
|---|---|
| stdio | local server as a subprocess — desktop tools, dev |
| Streamable HTTP / SSE | remote servers — hosted, multi-user connectors |
Interview Q&A
Under the architecture diagram, every MCP message is JSON-RPC 2.0. A session begins with an initialize request where client and server negotiate capabilities and protocol version, so a host only offers what a given server actually supports. After the handshake the host calls tools/list to discover tools at runtime (dynamic discovery — no hard-coded integration), then tools/call to invoke one. Resources and prompts have parallel resources/list / prompts/list methods.
| Version | Notable additions |
|---|---|
| 2025-03-26 | Streamable HTTP transport (replaces HTTP+SSE); single endpoint, optional SSE streaming |
| 2025-06-18 | OAuth 2.0/2.1 alignment; elicitation (server can request input from the user mid-call) |
| 2025-11-25 | current stable: async Tasks, refined OAuth, extensions, server identity |
Elicitation is the senior-relevant one: a server can pause a tool call to ask the user for missing input — including a URL elicitation that opens a browser for OAuth/API-key/payment flows, so the secret token is obtained server-side and the LLM never sees it. That closes a real credential-leak hole in earlier designs.
| Dimension | stdio | Streamable HTTP |
|---|---|---|
| Topology | local subprocess of the host | remote, networked, multi-client |
| Auth | inherits the local user | OAuth / bearer tokens |
| Use when | desktop tools, dev, local files | hosted connectors, SaaS, teams |
Streamable HTTP is JSON-RPC over one POST/GET endpoint with optional Server-Sent Events for streaming partial results — it superseded the older two-endpoint HTTP+SSE design and is the standard for remote servers.
# FastMCP: declare a tool with a typed signature; the SDK generates the JSON schema
# that the host's tools/list returns — discovery is automatic.
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("trials")
@mcp.tool()
def get_trial(nct_id: str) -> dict:
"""Fetch one clinical trial by its NCT identifier.""" # docstring → tool description
return db.fetch_one("SELECT * FROM trials WHERE nct_id = %s", nct_id)
@mcp.resource("schema://trials")
def trials_schema() -> str:
"""Readable table schema the host can load into context."""
return db.describe("trials")
if __name__ == "__main__":
mcp.run(transport="stdio") # swap to "streamable-http" for a remote server
Interview Q&A · deep dive
initialize handshake that negotiates protocol version and capabilities; then the host calls tools/list to discover available tools at runtime (with their JSON schemas) and tools/call to invoke one. Discovery is dynamic, which is what removes the need for hard-coded, per-tool integration code.Multi-agent systems architecture
When one agent juggling many tools gets unreliable, you decompose into specialists coordinated by a topology — agentic AI's "microservices moment." The skill is picking the coordination shape and knowing the coordination tax you pay for it.
| Topology | Shape | Use when |
|---|---|---|
| Orchestrator–worker | a lead agent plans & delegates to specialists | a task decomposes into parallel sub-tasks (the default) |
| Sequential pipeline | agent A → B → C, each refines | clear stages (extract → draft → review) |
| Hierarchical | supervisors of supervisors | large org-shaped problems |
| Debate / critique | agents argue or one critiques another | quality, reasoning, reducing error |
| Blackboard | shared memory all agents read/write | loosely-coupled collaboration |
| Swarm / handoff | agents pass control peer-to-peer | routing by capability, no central boss |
| Framework | Model |
|---|---|
| LangGraph | agents as a graph/state-machine — explicit control, durable state |
| CrewAI | role-based "crews" with tasks — high-level, fast to stand up |
| AutoGen | conversational multi-agent, flexible message passing |
| OpenAI Agents SDK / Swarm | lightweight handoffs between agents |
Interview Q&A
Don't pick a topology because it sounds clever — derive it from the shape of the work. If the task fans into independent sub-questions, you want parallel workers under an orchestrator. If it's a strict assembly line, a pipeline. If two answers must be reconciled, debate. The 2025 lesson from Anthropic's own research system is blunt: a multi-agent setup beat a single agent by ~90% on hard research, but cost ~15x the tokens, and token volume alone explained ~80% of the quality gain. So the real mechanism is "more parallel context windows", not magic coordination — which means multi-agent only pays off when the task genuinely decomposes into parallel threads with little shared state.
Two dominant 2025–2026 shapes. In a supervisor, every hop goes through the lead — clean to debug and the routing logic lives in one place, but you pay 2 LLM calls per domain (worker, then back to supervisor). In a swarm agents hand control peer-to-peer and the system remembers who was last active, so it's 1 call per domain after the first — cheaper and lower-latency, but routing is smeared across every agent's prompt and far harder to trace. The mature default: start supervisor, graduate to swarm only when data shows latency is the bottleneck and misroutes are rare.
| Axis | Supervisor | Swarm / handoff |
|---|---|---|
| Control | central router owns the turn | peer-to-peer, decentralized |
| Cost | ~2 LLM calls per domain | ~1 call per domain (after first) |
| Debuggability | routing in one place — easy | routing spread across agents — hard |
| Best for | early builds, audited routing | latency-critical, capability routing |
# Modern LangGraph: the supervisor delegates via handoff TOOLS, not a
# bespoke router node — this is now the recommended pattern.
from langgraph.prebuilt import create_react_agent
from langgraph.graph import StateGraph, MessagesState, START
from langgraph.types import Command
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-opus-4-8") # smart router
worker_llm = ChatAnthropic(model="claude-haiku-4-5") # cheap workers
researcher = create_react_agent(worker_llm, [search_trials], name="researcher")
checker = create_react_agent(worker_llm, [verify_citation], name="checker")
def supervisor(state: MessagesState) -> Command:
# LLM decides the NEXT worker (or to finish); Command routes the graph
decision = llm.invoke(state["messages"] + [ROUTER_PROMPT])
nxt = parse_route(decision) # "researcher" | "checker" | "__end__"
return Command(goto=nxt, update={"messages": [decision]})
g = StateGraph(MessagesState)
g.add_node("supervisor", supervisor)
g.add_node(researcher); g.add_node(checker)
g.add_edge(START, "supervisor")
g.add_edge("researcher", "supervisor") # workers report back
g.add_edge("checker", "supervisor")
app = g.compile() # durable state + checkpointing for free
# Swarm: an agent hands control directly to a peer via a tool that
# returns Command(goto=..., graph=PARENT). The HANDOFF CONTRACT — what
# context travels with control — is where swarms quietly lose state.
from langchain_core.tools import tool
from langgraph.types import Command
def make_handoff(to_agent: str):
@tool(f"handoff_to_{to_agent}")
def _handoff(reason: str, payload: dict) -> Command:
"""Transfer control. `payload` = the explicit state the next
agent needs — pass enough or it repeats your work."""
return Command(
goto=to_agent, graph=Command.PARENT,
update={"handoff": {"from": to_agent, "why": reason, "ctx": payload}},
)
return _handoff
Interview Q&A · deep dive
Agentic AI — the complete guide capstone
Pulling it together: an agent is an LLM given a goal, a loop, memory, and tools, allowed to decide its own next action. This card is the mental model — anatomy, autonomy, the loop, and what it takes to run one in production.
| Component | Role |
|---|---|
| Model (brain) | reasons, decides the next action |
| Tools | hands — retrieval, APIs, code, MCP servers |
| Memory | working (this run) + episodic/semantic/procedural (across runs) |
| Planning | decompose goal → steps (upfront or adaptive) |
| Loop + termination | act→observe→repeat until done or capped |
| Autonomy level | What the model controls |
|---|---|
| 1 · Workflow | fixed pipeline, LLM fills steps (most reliable) |
| 2 · Router | LLM picks among predefined paths/tools |
| 3 · Tool-calling agent | LLM decides which tools, in what order (ReAct) |
| 4 · Autonomous / multi-agent | LLM plans, spawns, self-corrects (most capable, least predictable) |
Interview Q&A
Strip an agent to its essence and you get a while-loop around a model that can call tools. The model proposes an action; the harness executes it; the result is fed back; repeat until a stop condition. Every framework — LangGraph, CrewAI, the Agent SDKs — is sugar over this loop plus three orthogonal concerns: state (what persists), control (who decides the next step), and safety (what's allowed). When you debug an agent, locate the failure on those three axes: a wrong answer is usually a context/state problem, a runaway is a control problem, a dangerous action is a safety problem.
An agent's hardest constraint isn't reasoning — it's the context window as a working-memory budget. Long-horizon agents fail not because the model got dumber but because the window filled with stale tool output and the relevant fact scrolled out of attention ("lost in the middle"). Production agents therefore actively curate context: summarise old turns, drop raw tool dumps after extracting the answer, retrieve long-term memory only as needed, and offload bulk state to files the agent reads on demand. Treat tokens like RAM, not disk.
| Pressure | Symptom | Mitigation |
|---|---|---|
| Window fills | forgets early instructions | summarise + pin the system goal each step |
| Tool-output bloat | cost spikes, signal buried | extract → discard raw payload |
| Lost in the middle | ignores mid-context facts | put critical facts at the edges; retrieve just-in-time |
| State > window | can't hold the whole task | offload to files / external store, read slices |
import time
from anthropic import Anthropic
client = Anthropic()
def agent(goal, tools, dispatch, *, max_steps=8, max_cost=0.50, approve=None):
msgs = [{"role": "user", "content": goal}]
spent, t0 = 0.0, time.time()
for step in range(max_steps): # GUARD 1: bounded steps
if spent > max_cost or time.time() - t0 > 60: # GUARD 2: cost + wall-clock
return "halted: budget exceeded"
r = client.messages.create(model="claude-opus-4-8", max_tokens=1024,
system=SYSTEM, tools=tools, messages=msgs)
spent += est_cost(r.usage)
msgs.append({"role": "assistant", "content": r.content})
if r.stop_reason != "tool_use": # GUARD 3: clear termination
return r.content[-1].text
out = []
for b in r.content:
if b.type != "tool_use": continue
if b.name in IRREVERSIBLE and approve and not approve(b):
out.append(_result(b.id, "denied by human gate")) # GUARD 4: HITL
continue
try:
out.append(_result(b.id, dispatch(b.name, validate(b.input))))
except Exception as e: # GUARD 5: tools fail closed
out.append(_result(b.id, f"error: {e}", is_error=True))
msgs.append({"role": "user", "content": out})
trace(step, r, out) # GUARD 6: full observability
return "halted: step budget exhausted"
Interview Q&A · deep dive
The 5 types of AI agents taxonomy
Agents differ on two axes: how much they decide on their own and which capability dominates. The labels blur in the wild, but this is the taxonomy interviewers expect — and every type runs the same core loop underneath: perceive → reason → act → learn.
read input / env→ Reason
plan · infer→ Act
call tools / APIs→ Observe
check result→ Learn
update · improve
| # | Type | What it is | Core capabilities |
|---|---|---|---|
| 1 | Self-directed | fully autonomous; decides & executes without human input | define goal · perceive environment · plan actions · execute via APIs/tools · observe & learn · self-correct |
| 2 | Collaborative multi-agent | many agents coordinating to solve one complex task | assign roles · share context · divide & parallelize · exchange feedback · merge outcomes · produce final output |
| 3 | Cognitive | simulates human-like reasoning with memory + context | perceive input · retrieve relevant memory · reason & infer · generate · evaluate correctness · store learnings |
| 4 | Tool-augmented | extends an LLM with external tools, APIs & databases | receive task · identify tools · connect via API/plugin · fetch/process data · validate · return enriched response |
| 5 | Reflective (self-improving) | learns from feedback & refines performance over time | execute · analyze outcome · spot improvements · adjust reasoning · update memory/models · improve accuracy |
Interview Q&A
Interviewers who studied AI formally expect the Russell & Norvig five, which map cleanly onto the modern labels above. Knowing both vocabularies lets you bridge a CS-fundamentals question to LLM practice in one sentence — a strong signal.
| # | Classic type | Decision rule | Modern echo |
|---|---|---|---|
| 1 | Simple reflex | condition → action on current percept; no memory | a stateless rule / regex router |
| 2 | Model-based reflex | keeps internal state to handle partial observability | agent with working memory of the session |
| 3 | Goal-based | searches/plans toward an explicit goal state | planner / ReAct that decomposes a goal |
| 4 | Utility-based | maximizes a utility function across competing goals | agent optimizing a scored objective / reward |
| 5 | Learning | improves its policy from feedback over time | reflective / self-improving agent |
Concretize with one running scenario — a thermostat-style support deflection bot — so the jump in capability is visible:
| Type | What it does on the same ticket |
|---|---|
| Simple reflex | keyword "refund" → canned reply. No context, no follow-up. |
| Model-based | remembers the user already tried a restart this session, so it skips that step. |
| Goal-based | goal = "resolve or escalate"; plans: diagnose → check KB → propose fix → verify. |
| Utility-based | trades off resolution speed vs. CSAT vs. escalation cost, picking the action with best expected score. |
| Learning | feeds resolved/unresolved outcomes back to tune which fixes it offers first. |
# 1 · SIMPLE REFLEX — pure condition→action, no state, no model
def reflex(percept):
return "escalate" if "refund" in percept.lower() else "ack"
# 2 · MODEL-BASED — carries internal state across percepts
class ModelBased:
def __init__(self): self.state = {"tried_restart": False}
def act(self, percept):
if "restart" in percept: self.state["tried_restart"] = True
if self.state["tried_restart"]: return "try_next_fix"
return "suggest_restart"
# 4 · UTILITY-BASED — scores candidate actions, picks the argmax
def utility(action, ctx): # expected value, not just "valid"
speed, csat, cost = ACTION_EFFECTS[action]
return 0.5*csat + 0.3*speed - 0.2*cost
def utility_agent(ctx, actions):
return max(actions, key=lambda a: utility(a, ctx)) # the defining move
Interview Q&A · deep dive
How to build an AI agent — the 8-step blueprint build
A practical checklist for shipping an agent end to end. Each step is a real decision with a failure mode if you skip it — this is the order a senior actually builds in, and it lines up with the agentic guide's anatomy.
| # | Step | The decision · what to nail |
|---|---|---|
| 1 | Purpose & scope | use case, user needs, success criteria, hard constraints — a narrow goal beats a vague "do anything" |
| 2 | System-prompt design | goals, role/persona, instructions, guardrails — the agent's constitution |
| 3 | Choose the model | base model + parameters (temperature, top-p) + context window; capability vs cost vs latency |
| 4 | Tools & integration | APIs (web/data), databases & storage, services, custom functions — ideally via MCP |
| 5 | Memory systems | episodic + semantic (vector store) + procedural; SQL/structured + file storage |
| 6 | Orchestration | workflows/flows, triggers, parameters, message queues, agent routing, error handling |
| 7 | User interface | chat, web app, API endpoint, Slack/Discord bot — how people actually reach it |
| 8 | Testing & evals | unit tests, latency testing, quality metrics, then iterate & improve — the release gate |
| # | Stage | The sub-decisions you make here |
|---|---|---|
| 1 | Purpose & scope | use case · user needs · success criteria · hard constraints |
| 2 | System-prompt design | goals · role / persona · instructions · guardrails |
| 3 | Choose the model | base model · parameters (temp, top-p) · context window |
| 4 | Tools & integration | web/data APIs · databases & storage · AI tools & services · custom functions |
| 5 | Memory systems | episodic · semantic (vector) · SQL / structured · file storage |
| 6 | Orchestration | workflows · triggers · parameters · message queues · agent routing · error handling |
| 7 | User interface | chat · web app · API endpoint · Slack / Discord bot |
| 8 | Testing & evals | unit tests · latency testing · quality metrics · iterate & improve |
read request→ Reason
plan next step→ Act
call a tool→ Observe
read result↺ Reflect
done? loop or stop
from anthropic import Anthropic # step 3 · the model
client = Anthropic()
# 1-2 · purpose + system prompt = the agent's constitution
SYSTEM = ("You are a clinical-trials analyst. Answer ONLY from tool "
"results, cite the GDCID, and say so if unsure.")
# 4 · tools the model is allowed to call (JSON schema)
TOOLS = [{"name": "search_trials",
"description": "Search the trial index. Returns GDCID, phase, status.",
"input_schema": {"type": "object",
"properties": {"query": {"type": "string"}},
"required": ["query"]}}]
def run_tool(name, args): # 6 · orchestration dispatch
if name == "search_trials":
return db.search(args["query"]) # 5 · your real retrieval / memory
raise ValueError(name)
# 6 · the loop: model → tool → model, bounded so it can't run away
def agent(user_msg, max_steps=6):
msgs = [{"role": "user", "content": user_msg}]
for _ in range(max_steps):
r = client.messages.create(model="claude-opus-4-8",
system=SYSTEM, tools=TOOLS, max_tokens=1024, messages=msgs)
msgs.append({"role": "assistant", "content": r.content})
if r.stop_reason != "tool_use": # 8 · termination
return r.content[-1].text
results = [] # run every requested tool
for b in r.content:
if b.type == "tool_use":
out = run_tool(b.name, b.input)
results.append({"type": "tool_result",
"tool_use_id": b.id, "content": str(out)})
msgs.append({"role": "user", "content": results})
return "stopped: step budget exhausted" # guardrail
# LangGraph — a ReAct agent in ~5 lines; it owns the loop & state
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic
agent = create_react_agent(
model=ChatAnthropic(model="claude-opus-4-8"),
tools=[search_trials], prompt=SYSTEM) # your @tool functions
agent.invoke({"messages": [("user", "What phase is GDC-00123?")]})
# CrewAI — when the job splits into roles (multi-agent)
from crewai import Agent, Task, Crew
researcher = Agent(role="Trial researcher", goal="find the trial",
tools=[search_trials], llm="claude-opus-4-8")
Crew(agents=[researcher],
tasks=[Task(description="...", agent=researcher)]).kickoff()
| Category | Tools | Best for |
|---|---|---|
| Consumer assistants | Claude, ChatGPT, Perplexity | research, writing, analysis, general work |
| Agentic coding | Claude Code, Cursor, Windsurf | terminal/IDE-native, multi-file, autonomous coding |
| No-code builders | Lindy, Relay.app, n8n | business automation, integrations, non-technical teams |
| Dev frameworks | LangGraph, CrewAI, LlamaIndex | graph/state flows, multi-agent crews, RAG-first apps |
Interview Q&A
The card's earlier snippets show the pieces; here is the whole thing in one file — two real tools, the bounded loop, input validation, a human gate on the dangerous tool, and a final termination. This is the smallest program that is honestly "an agent you could ship a v0 of."
import json, math
from anthropic import Anthropic
client = Anthropic()
SYSTEM = "You are an ops assistant. Use tools; never guess numbers. " \
"Confirm before any write. Cite which tool gave each fact."
# --- step 4: two tools, JSON-schema'd so the model can call them ---
TOOLS = [
{"name": "calc", "description": "Evaluate a safe arithmetic expression.",
"input_schema": {"type": "object",
"properties": {"expr": {"type": "string"}}, "required": ["expr"]}},
{"name": "set_quota", "description": "WRITE: set a user's quota (irreversible-ish).",
"input_schema": {"type": "object",
"properties": {"user": {"type": "string"}, "gb": {"type": "number"}},
"required": ["user", "gb"]}},
]
WRITE_TOOLS = {"set_quota"} # gate these behind a human
def dispatch(name, args): # step 6: orchestration
if name == "calc":
if not set(args["expr"]) <= set("0123456789+-*/(). "): # validate!
raise ValueError("unsafe expression")
return {"result": eval(args["expr"], {"__builtins__": {}})}
if name == "set_quota":
db[args["user"]] = args["gb"]; return {"ok": True}
raise ValueError(f"unknown tool {name}")
def approve(block): # step 8: human-in-the-loop
return input(f"Run {block.name}({block.input})? [y/N] ") == "y"
def run(goal, max_steps=6): # step 6: the bounded loop
msgs = [{"role": "user", "content": goal}]
for _ in range(max_steps):
r = client.messages.create(model="claude-opus-4-8", max_tokens=1024,
system=SYSTEM, tools=TOOLS, messages=msgs)
msgs.append({"role": "assistant", "content": r.content})
if r.stop_reason != "tool_use": # termination
return r.content[-1].text
results = []
for b in r.content:
if b.type != "tool_use": continue
if b.name in WRITE_TOOLS and not approve(b):
payload, err = "denied by human", True
else:
try: payload, err = dispatch(b.name, b.input), False
except Exception as e: payload, err = str(e), True
results.append({"type": "tool_result", "tool_use_id": b.id,
"content": json.dumps(payload), "is_error": err})
msgs.append({"role": "user", "content": results})
return "stopped: step budget exhausted" # guardrail
db = {}
print(run("Compute 240*0.85 then set that many GB quota for user 'kiran'."))
A frequent confusion: the 8-step build order (a one-time engineering sequence) is not the runtime loop (what the shipped agent does every request). The card shows both as chips; the diagram below makes the build pipeline explicit so the "scope and evals are the bookends" point is visual.
Interview Q&A · deep dive
dispatch (the charset check on calc) prevents arbitrary code execution. Tool errors fail closed — caught and returned as is_error so the model can recover instead of crashing. Human approval on WRITE_TOOLS gates the irreversible action. The termination check on stop_reason ends cleanly when the model stops requesting tools. A real build adds cost/wall-clock caps and tracing.Evaluation — RAGAS, DeepEval, LLM-as-judge your edge
This is the discipline a Principal QE (AI/LLM) role exists to own: how do you prove a non-deterministic system is good enough to ship, and catch regressions? You measure with reference-free metrics, golden datasets, and CI-gated eval suites.
| RAG metric | Answers |
|---|---|
| Faithfulness | Is the answer grounded in retrieved context (no hallucination)? |
| Answer relevance | Does it actually address the question? |
| Context precision | Are the top-ranked chunks the relevant ones? |
| Context recall | Did retrieval fetch all needed info? |
# DeepEval-style assertion in a pytest suite
from deepeval import assert_test
from deepeval.metrics import FaithfulnessMetric, AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
def test_rag_faithful():
tc = LLMTestCase(
input="What phase is NCT01234567?",
actual_output=rag.answer("What phase is NCT01234567?"),
retrieval_context=rag.last_context)
assert_test(tc, [FaithfulnessMetric(threshold=0.8),
AnswerRelevancyMetric(threshold=0.7)])
Interview Q&A
A RAG score that just says "bad" is useless — you need to know which half failed. Split every metric onto one of three layers. Retrieval metrics (context precision/recall) ask "did we fetch the right chunks?" — they ignore the LLM entirely. Generation metrics (faithfulness, answer relevance) ask "given these chunks, did the model answer well?" — they ignore the retriever. End-to-end (task success, citation validity) is what the user actually feels. The diagnostic rule: low context-recall but high faithfulness = your retriever is starving the model (fix chunking/embeddings); high context-recall but low faithfulness = the model is hallucinating despite having the facts (fix the prompt/model). Conflating the two is the #1 reason teams "tune RAG" for weeks with no movement.
# RAGAS 0.2+ API: build an EvaluationDataset, pick metrics, pass a judge LLM.
from ragas import evaluate, EvaluationDataset
from ragas.metrics import Faithfulness, ResponseRelevancy, LLMContextRecall, LLMContextPrecisionWithReference
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI
judge = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini", temperature=0))
samples = [{
"user_input": "What phase is trial NCT01234567?",
"response": rag.answer("What phase is trial NCT01234567?"),
"retrieved_contexts": rag.last_context, # list[str] of chunks shown to the model
"reference": "Phase 2", # needed for *recall*; faithfulness needs none
}]
ds = EvaluationDataset.from_list(samples)
result = evaluate(
dataset=ds,
metrics=[Faithfulness(), ResponseRelevancy(), LLMContextRecall(), LLMContextPrecisionWithReference()],
llm=judge,
)
print(result) # {'faithfulness': 0.92, 'answer_relevancy': 0.88, ...}
df = result.to_pandas() # per-row scores → triage the worst questions, grow the golden set
# Roll-your-own G-Eval: a rubric, structured JSON out, and pairwise position de-biasing.
import json, statistics
from anthropic import Anthropic
client = Anthropic()
RUBRIC = """Score the ANSWER 1-5 for groundedness in CONTEXT only.
5=every claim supported; 1=fabricated. Return JSON: {"score":int,"reason":str}."""
def judge(question, answer, context):
msg = client.messages.create(
model="claude-sonnet-4-5", max_tokens=300, temperature=0, # temp 0 = repeatable judge
system=RUBRIC,
messages=[{"role":"user","content":f"Q: {question}\nCONTEXT: {context}\nANSWER: {answer}"}])
return json.loads(msg.content[0].text)["score"]
def pairwise(q, a, b, ctx):
# position bias is real (>10% swing): judge A-then-B and B-then-A, average
s1 = judge(q, a, ctx) - judge(q, b, ctx)
s2 = judge(q, b, ctx) - judge(q, a, ctx)
return (s1 - s2) / 2 # >0 → A wins, order-invariant
| Metric kind | Needs a reference? | Catches | Blind to |
|---|---|---|---|
| Faithfulness | No (uses context) | hallucination / ungrounded claims | whether retrieval was complete |
| Answer relevance | No | off-topic / evasive answers | factual correctness |
| Context recall | Yes (ground truth) | missing evidence / starved retriever | generation quality |
| Context precision | Yes/ranked | noisy, diluted top-k | whether the model used the good chunk |
Interview Q&A · deep dive
context recall against ground truth to see the gap. Faithfulness measures honesty-to-evidence; recall measures whether the evidence was even there.gpt-4o-mini latest, or a hosted judge), the provider can roll a new version and every score shifts — usually upward as judges get more lenient/verbose-tolerant. Always pin the judge to a dated snapshot and version your rubric; a metric you can't reproduce isn't a gate.Advanced AI techniques depth
Beyond prompting and vanilla RAG, this is the toolkit a GenAI interview expects you to recognise and place — you won't train a frontier model, but you must know what each technique buys and costs.
| Technique | What it does | Cost / when |
|---|---|---|
| Full fine-tuning | update all weights on your data | expensive; needs lots of data + GPUs; rare |
| LoRA / QLoRA (PEFT) | train tiny low-rank adapters (+quantised base) | cheap, fast, swappable — the default fine-tune |
| RLHF / DPO | align to human preferences (DPO is simpler, no reward model) | behaviour/safety tuning; DPO is the modern path |
| Distillation | train a small model to mimic a big one | cut latency/cost while keeping much quality |
| Quantization | int8/int4 weights (GGUF, AWQ) | run big models on small hardware; tiny quality loss |
| Mixture-of-Experts | route each token to a few expert sub-nets | more capacity at constant inference cost |
| Long-context / FlashAttention | efficient attention over very long inputs | whole-doc reasoning; watch cost & recall |
Interview Q&A
The biggest shift since this card was first written: the scaling frontier is no longer just "bigger model, more pre-training tokens" — it's reasoning models that spend more inference compute to think. DeepSeek-R1 (Jan 2025) showed reasoning can emerge from pure RL (no supervised fine-tuning needed first), and OpenAI's o-series + Claude's extended thinking proved that letting a model emit a long chain of thought before answering — then training it to verify and self-correct — beats a much larger base model on math/code/science. The trade is stark: o3 at high compute can burn tens of millions of tokens and minutes per hard question. So the new lever isn't only model size; it's how long you let it think, and the new cost dimension is reasoning tokens you pay for but never see.
| Era | Lever | Cost paid | Wins at |
|---|---|---|---|
| 2020-23 | pre-training scale (params, tokens) | training compute | breadth of knowledge |
| 2024-26 | test-time compute (long CoT, verify, self-correct) | inference tokens/latency | hard reasoning, math, code, agents |
Three generations of "make the model behave". RLHF/PPO trains a separate reward model from human preference pairs, then optimises the policy against it with PPO — powerful but a 3-stage, unstable, compute-heavy pipeline (you're training two networks). DPO (Direct Preference Optimization) collapses this: it reformulates the RLHF objective as a simple classification loss directly on chosen/rejected pairs — no reward model, no RL loop, far easier to run. The catch: DPO can help chat yet barely move (or hurt) math reasoning. GRPO (Group Relative Policy Optimization, the DeepSeek-R1 recipe) is the reasoning-era default: it keeps RL but drops PPO's value/critic network — it samples a group of answers per prompt, scores each, and uses the group's mean as the baseline for advantage. That pairs perfectly with verifiable rewards (math answer is right/wrong, code passes tests) where reward is cheap to compute and hard to game.
| Method | Reward model? | Critic/value net? | Best for |
|---|---|---|---|
| RLHF (PPO) | yes (trained) | yes | general preference/safety alignment |
| DPO | no — implicit | no | cheap chat/style alignment from pairs |
| GRPO | often a verifier/rule | no (group baseline) | reasoning, math, code, agents |
# Soft-label distillation: the student learns the teacher's full probability
# distribution (the "dark knowledge"), not just the hard argmax label.
import torch
import torch.nn.functional as F
def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
# T = temperature: softens distributions so small probs carry signal
soft_teacher = F.softmax(teacher_logits / T, dim=-1)
soft_student = F.log_softmax(student_logits / T, dim=-1)
# KL term: match the teacher's whole distribution (scaled by T^2)
kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
# standard supervised term against the real labels
ce = F.cross_entropy(student_logits, labels)
return alpha * kd + (1 - alpha) * ce # blend: imitate teacher + stay correct
# Why it works: a 1.5B student trained on a 70B teacher's outputs keeps
# most of the quality at a fraction of the latency/cost — the same recipe
# that produced the DeepSeek-R1-distill models (reasoning on consumer GPUs).
MoE replaces the dense feed-forward block with N expert sub-networks plus a tiny router (gating network). For each token, the router picks the top-k experts (k is usually 2), so only a slice of the parameters fires per token. That decouples total capacity from active compute: a 671B-parameter MoE (e.g. DeepSeek-V3-class) might activate only ~37B per token, giving you a huge knowledge store at the inference cost of a much smaller dense model. The hard parts are load balancing (an auxiliary loss stops the router collapsing onto a few favourite experts) and memory — you still must hold all experts in VRAM even though most are idle each step.
Interview Q&A · deep dive
Future AI evaluation — the discipline that's becoming the job QE edge
As models get more capable and more autonomous, the bottleneck shifts from building to proving it works. Evaluation is becoming the new unit test — and the core of the Principal QE role you're targeting.
| Method | What it measures |
|---|---|
| Reference metrics | vs ground truth: exact-match, BLEU/ROUGE, retrieval recall/precision |
| RAGAS / DeepEval | RAG-specific: faithfulness, answer-relevance, context-precision/recall |
| LLM-as-judge (G-Eval) | a model scores output against a rubric — scalable, needs calibration vs humans |
| Agent / trajectory eval | did the agent pick the right tools, in the right order, and finish the task? |
| Red-teaming / adversarial | prompt-injection, jailbreaks, harmful-output probes in the eval set |
| Online (production) | citation validity, tool-success rate, user signals, A/B — the real test |
Interview Q&A
Treat evaluation as a closed loop with an offline side and an online side, and know what each can and can't see. Offline (CI, on a golden set) is your gate: fast, reproducible, blocks regressions before merge — but it only knows the inputs you thought to write down. Online (production telemetry) is your truth: real queries, real failure modes, real distribution shift — but it's noisy, lagged, and you can't block on it. The discipline that's becoming the QE job is closing the loop: every online failure (a bad citation, a tool misuse, a thumbs-down) becomes a new offline regression case, so the gate grows to cover reality. A team whose golden set never grows is flying blind between releases.
# conftest-style: run the whole golden set, fail the build if mean faithfulness
# drops below the committed baseline. This is the "eval as the new unit test".
import json, pytest, statistics
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase
GOLDEN = json.load(open("golden_set.v7.json")) # versioned in git, grows per incident
BASELINE = 0.85 # committed; a drop = a blocking regression
@pytest.mark.parametrize("case", GOLDEN, ids=lambda c: c["id"])
def test_case_is_faithful(case):
out = rag.answer(case["q"])
tc = LLMTestCase(input=case["q"], actual_output=out.text,
retrieval_context=out.context)
m = FaithfulnessMetric(threshold=0.0) # score now, gate on the aggregate below
m.measure(tc)
case["_score"] = m.score
def test_suite_above_baseline(record_property):
scores = [c["_score"] for c in GOLDEN if "_score" in c]
mean = statistics.mean(scores)
record_property("faithfulness_mean", mean) # surfaced in CI report / trend
assert mean >= BASELINE, f"regression: {mean:.3f} < {BASELINE}"
# You can't block prod on an LLM judge (latency/cost), so eval async on a sample
# and alert on a rolling window. This is the heart of LLM observability.
import random
from collections import deque
window = deque(maxlen=200) # rolling faithfulness over last 200 sampled calls
def on_response(trace): # called after every prod RAG response
log.emit(trace) # Langfuse/Braintrust span: latency, tokens, cost, cites
if random.random() < 0.05: # 5% sample keeps judge cost bounded
score = llm_judge_faithfulness(trace.question, trace.answer, trace.context)
window.append(score)
if len(window) == window.maxlen and mean(window) < 0.80:
page_oncall("faithfulness drift on live traffic") # SLO breach → alert
| Axis | Offline / CI gate | Online / production |
|---|---|---|
| Distribution | curated golden set | real, drifting traffic |
| Reproducible? | yes — same inputs each run | no — noisy, lagged |
| Can it block a release? | yes (the gate) | no (alert/rollback only) |
| Catches | known regressions pre-merge | novel failures, drift, abuse |
| Tooling | pytest + RAGAS/DeepEval | Langfuse / Braintrust / LangSmith |
Interview Q&A · deep dive
RAG vs fine-tune vs prompt — the decision judgment
A favourite senior question. The rule of thumb: prompt for behaviour you can describe, RAG for knowledge that changes or is private, fine-tune for consistent style/format or narrow tasks where prompting plateaus.
| Need | Best lever | Why |
|---|---|---|
| current / private facts | RAG | update index, not weights; citations; access control |
| consistent format / tone / narrow skill | Fine-tune | bakes behaviour in; shorter prompts; lower latency |
| describable behaviour, fast iteration | Prompt | cheapest, instant to change, no training |
| both knowledge + behaviour | RAG + fine-tune | they're complementary, not either/or |
Interview Q&A
Don't reach for fine-tuning because it sounds sophisticated — answer four questions in order, and the cheapest sufficient lever wins. (1) Is it knowledge or behaviour? Facts that change → RAG; how-to-respond → fine-tune/prompt. (2) Does the knowledge change? Daily/private → RAG (re-index, don't retrain). (3) Can you describe the behaviour in words? Yes → prompt; only if prompting plateaus → fine-tune. (4) Do you have hundreds+ of clean labelled examples? No → you cannot fine-tune well, so don't. Most "we need fine-tuning" requests are actually prompt or RAG problems wearing a costume.
| Axis | Prompt | RAG | Fine-tune |
|---|---|---|---|
| Knowledge freshness | frozen at training | live (re-index) | frozen until retrain |
| Changes behaviour/format | somewhat | no | yes, deeply |
| Cost to change | seconds, free | re-embed docs | GPU hours + data |
| Latency / token cost | grows with prompt | +retrieval, long ctx | lowest (short prompts) |
| Citations / auditability | no | yes (source chunks) | no |
| Data needed | none | a corpus | 100s+ clean labels |
| Access control | n/a | per-doc ACLs | baked in (leaky) |
Interview Q&A · deep dive
Claude Mastery
Every Claude topic in depth — how the model works, how to prompt it, its features (Artifacts, Projects, Memory, Design), how to drive real work with it, and how to build on it. Written from Claude's actual current capabilities; features evolve, so the live source of truth is docs.claude.com and support.claude.com.
How Claude works — and why it's different foundations
Claude is a family of large language models from Anthropic, trained to be helpful, honest, and harmless. Under the hood it's a Transformer doing next-token prediction; what distinguishes it is the alignment approach — Constitutional AI, where the model is trained against an explicit set of principles rather than only human preference labels.
| Model | Profile | Reach for it |
|---|---|---|
| Claude Opus 4.8 | most capable; deepest reasoning | hard, complex, high-stakes work |
| Claude Sonnet 4.6 | balanced capability / speed / cost | the everyday default for most work |
| Claude Haiku 4.5 | fastest, cheapest | high-volume, latency-sensitive tasks |
Interview Q&A
It helps to separate the two things that make Claude Claude. The base capability comes from large-scale next-token pre-training on text — that is the raw "knows things, can reason" engine, shared in spirit with every frontier LLM. The character comes from a second stage: post-training for the HHH goals (helpful · honest · harmless) using Constitutional AI. CAI is not a content filter bolted on at inference; it is a training signal. The model drafts a response, critiques it against written principles (the "constitution"), revises, and those revisions become preference data — a loop Anthropic calls RLAIF (reinforcement learning from AI feedback), which scales beyond what hand-labelled RLHF alone can reach. The payoff you feel: consistent values, calibrated uncertainty, and resistance to being talked out of its guardrails.
The card above lists the tiers; the senior question is which knob to turn per request. All current models (Opus 4.8, Sonnet 4.6, Haiku 4.5, and the Fable 5 generation) share adaptive thinking, tool use, and vision; they differ on the capability/latency/cost curve. Two extra levers matter as much as the tier: the effort setting (how hard the model thinks) and the context window you feed it. Reach for a bigger model or higher effort for genuinely hard reasoning; reach for a cheaper model for high-volume classification where you would otherwise pay Opus rates to stamp labels.
| Lever | What it trades | Turn it up when… |
|---|---|---|
| Model tier | capability vs $/latency | the task is open-ended, multi-step, high-stakes |
| effort setting | answer quality vs thinking tokens | multi-step logic, math, tricky debugging |
| Context size | recall vs cost & "lost in the middle" | long source docs you must reason over |
| Extended/adaptive thinking | depth vs speed | the model needs room to plan before answering |
Interview Q&A · deep dive
Prompting Claude — basics to advanced technique
Most of Claude's quality comes from how you ask. The reliable levers, in the order you should reach for them — and Claude responds especially well to XML-tagged structure.
| Lever | How to use it | Fixes |
|---|---|---|
| Be clear & specific | state goal, audience, format, length | vague, generic, wrong-shaped output |
| Give examples | 1–3 input→output pairs (positive + negative) | format you can't fully describe in words |
| Set role & tone | "You are a senior… writing for…" | wrong register or expertise level |
| XML tags | fence parts: <context>, <rules>, <example> | instructions blurring into data |
| Chain-of-thought | "think step by step" / "reason first" | multi-step logic, math, extraction |
| Prefilling | start the answer for it (e.g. { or a heading) | force a format or skip preamble |
| Iterate & save | refine in-thread; keep winners as templates | re-deriving good prompts every time |
<role>You are a senior clinical-trial analyst writing for executives.</role>
<task>Summarise the attached trial into 5 bullets + 1 risk callout.</task>
<rules>
- Plain language; one line per bullet; no jargon.
- If a value isn't in the source, write "not stated" — never invent it.
- Output as a markdown list, nothing else.
</rules>
<example>
input: "Phase 2, recruiting, sponsor Acme."
output: "- Phase 2 trial, currently recruiting (sponsor: Acme)"
</example>
<trial>{attached document}</trial>
Interview Q&A
The single mental model behind every advanced technique: a prompt is a mix of instructions (what to do), data (what to do it to), and examples (what "done right" looks like). The model's failures are usually category errors — it treats your data as an instruction, or your example as the real task. XML tags exist to make the categories unambiguous: there are no magic tag names, but <instructions>, <document>, and <examples> let the model parse role from content. Anthropic's measured guidance: use 3–5 examples (relevant, diverse, edge-case-covering), wrap each in <example> inside an <examples> block, and for long inputs put the data at the top and the query at the bottom.
import anthropic
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY from env
SYSTEM = "You are a senior clinical-trial analyst. Be precise; never invent figures."
# Multishot: 3-5 diverse examples, each fenced so the model can't confuse
# an example for the real task. Long data goes near the TOP of the user turn.
USER = """<documents>
<document index="1"><source>NCT00000.txt</source>
<document_content>{trial_text}</document_content></document>
</documents>
<examples>
<example>input: "Phase 2, recruiting" → output: "- Phase 2 (recruiting)"</example>
</examples>
<instructions>Quote the relevant span first in <q> tags, then give 5 bullets.
If a value is absent write "not stated".</instructions>"""
msg = client.messages.create(
model="claude-opus-4-8",
max_tokens=1024,
system=SYSTEM,
thinking={"type": "adaptive"}, # model decides depth
output_config={"effort": "high"}, # turn the dial up for hard tasks
messages=[{"role": "user", "content": USER}],
)
print(msg.content[0].text)
Two techniques people still reach for are now obsolete or harmful on Claude's latest models, and it's a sharp interview signal to know why.
| Old habit | Status on 4.6+ / Fable 5 | Do this instead |
|---|---|---|
| Prefilling the assistant turn (e.g. start the reply with {) | Removed — a prefilled last assistant message returns a 400 error | Structured Outputs for schemas; "respond without preamble" for formatting; for continuations, move the partial text into the user turn |
| Manual thinking budget (budget_tokens) | Deprecated; 400 on Opus 4.7+ and Fable/Mythos 5 | Adaptive thinking + the effort parameter; cap cost with max_tokens |
| "CRITICAL: you MUST use this tool" | Causes over-triggering — newer models are more obedient | Plain "Use this tool when…"; dial language down |
| Hand-written step-by-step CoT | Often beaten by the model's own reasoning | "Think thoroughly" + let adaptive thinking plan |
Interview Q&A · deep dive
Artifacts & Claude Design (canvas) feature
Artifacts are standalone, rendered outputs Claude produces beside the chat — code, documents, HTML/React apps, diagrams — that you can view, edit, and reuse. Claude Design adds a visual canvas with design tools you iterate on by chatting. Together they turn "describe it" into "see it and refine it."
| Artifact type | Good for |
|---|---|
| Documents (markdown) | reports, guides, articles, specs you'll keep or publish |
| Code / scripts | standalone code >20 lines you'll run or reuse |
| HTML / React apps | interactive tools, dashboards, widgets (this hub is one) |
| Diagrams (SVG / Mermaid) | flows, architectures, visual explainers |
Interview Q&A
Under the surface an artifact is a self-contained, versioned document rendered in its own pane — not chat text. Two consequences shape how you use them. First, it's iterable: each edit produces a new version, so Claude patches the existing artifact surgically instead of regenerating, and you can step back through versions. Second, interactive artifacts (HTML/React) run in a sandbox — great for self-contained tools, but the boundary is real (no arbitrary external network calls, ephemeral storage). The trigger heuristic is concrete: Claude reaches for an artifact when content is significant and self-contained — roughly over ~15 lines, something you'd want to edit, run, or reuse — and answers inline for quick explanations.
| Property | Artifact | Inline chat answer |
|---|---|---|
| Persistence | versioned, editable, shareable | ephemeral in the transcript |
| Best for | code >~15 lines, docs, apps, diagrams | short answers, reasoning, Q&A |
| Editing | surgical patch → new version | full re-print each time |
| Execution | HTML/React run sandboxed | not executable |
The leap most people miss: an artifact can call the model at runtime. You ask for "a chatbot / grader / generator that uses Claude," and the generated app embeds calls to a completion API exposed inside the artifact sandbox. The economics are the headline — no API key, no per-call charges, no deployment: calls run against the current user's plan limits, so when you publish and share an app, each user's usage counts against their subscription, not yours. That makes artifacts a genuine zero-infra prototyping surface for AI features. Available across Free, Pro, Max, Team, and Enterprise.
Claude Design is the frontend-focused sibling: a visual canvas where Claude generates and iterates on UI/design interactively, rather than emitting one block of code. It pairs with the model's frontend strength — but the documented failure mode is "AI slop": generic fonts (Inter/Arial), purple-on-white gradients, predictable layouts. The senior move is to steer aesthetics explicitly — distinctive typography, a committed color theme via CSS variables, one well-orchestrated load animation — exactly the discipline this hub's own styling follows.
Interview Q&A · deep dive
Projects, Memory & Styles workspace
Three features that stop you re-explaining yourself. Projects hold standing context for a body of work, Memory carries relevant continuity across chats, and Styles keep Claude writing in your voice.
| Feature | What it does | Use it to |
|---|---|---|
| Projects | a workspace holding shared knowledge/context for related chats | keep a workstream's docs & instructions in one place |
| Memory | builds memory from past chats; can search/reference earlier ones | get continuity without re-pasting context |
| Styles | customise Claude's writing style/voice | match a brand or personal tone consistently |
| Preferences | store tone/format/feature defaults | stop repeating "be concise / no bullets" |
Interview Q&A
A Project is more than a folder of attachments. Each Project carries its own 200K-token context window (the Enterprise tier goes higher, up to ~500K on some models), and its uploaded files form a knowledge base every chat in the Project can see. The clever part is what happens when you over-fill it: as project knowledge approaches the context limit, Claude automatically switches to RAG mode — retrieving the most relevant chunks instead of stuffing everything in — which expands effective capacity by up to 10x while keeping answer quality. So the practical advice flips with scale: under the limit, the whole knowledge base is "in head"; past it, file naming and chunk-ability matter because retrieval, not raw inclusion, decides what the model sees.
"Memory" is overloaded; the distinction is a favourite interview trap. (1) Project knowledge is curated material you deliberately upload, shared by every chat in that Project. (2) The consumer Memory feature is automatic: Claude synthesises key insights from your past chats (refreshed roughly every 24h) so it builds understanding over time — and crucially it is project-scoped, each Project gets its own separate memory space and summary, isolated from other Projects and from non-project chats. (3) Chat search is on-demand retrieval — "what did we decide about X?" pulls from prior conversations via RAG. One you fill, one accrues, one you query.
| Mechanism | How it's populated | Scope & control |
|---|---|---|
| Project knowledge | you upload docs/instructions | shared across the Project; you edit it |
| Memory (auto) | synthesised from past chats (~24h) | per-project; view/edit/pause/reset; off in Incognito |
| Chat search | RAG over prior conversations | on demand; "find what we discussed" |
| Project instructions | you write tone/role/format rules | applied to every chat in the Project |
Interview Q&A · deep dive
Claude for real work — the use-case playbook how-to
The day-to-day jobs, each with how to drive Claude and which feature does the heavy lifting. The pattern is always the same: give context + constraints, pick the right feature, iterate.
| Job | How to drive Claude | Feature |
|---|---|---|
| Emails | gist + tone + recipient; ask for 2–3 variants | chat |
| Reports | raw notes → structured doc; specify sections | artifact → Word file |
| Research | one clear question + scope; turn on web search | web search / research |
| Presentations | outline → full deck; say slides & audience | PowerPoint artifact |
| Proposals | context + win themes → first draft to edit | artifact |
| Data | attach CSV/Excel; ask for analysis + charts | code execution ("analyse without Excel") |
| Long documents | attach the file; ask targeted questions | file upload + extraction |
Interview Q&A
The reason "context + exact format + the right export" beats hunting for a magic prompt is that Claude is a steerable reasoner, not a search box. Every deliverable degrades on the same three axes: missing context (it invents the parts you didn't give), missing constraints (it picks a default shape you didn't want), and missing examples (it guesses your house style). Fix all three up front and the first draft lands close; then you edit in-thread rather than re-prompting from zero. The senior habit is to treat the thread as a workspace, not a one-shot query.
For recurring work, stop free-typing. An XML-tagged skeleton separates the instruction (stable) from the data (changes each run), which is what makes a prompt a template you can hand to a teammate. The tags also stop Claude from confusing your instructions with the pasted content — a real failure mode when you dump a 5-page email thread inline.
<!-- a report template you reuse weekly: only <source> changes -->
You are a clinical-trial analyst writing for a non-technical sponsor.
<task>Turn the raw notes below into a 1-page status report.</task>
<format>
- Sections: Summary, Risks, Decisions Needed, Next Steps
- Each bullet ≤ 2 lines. No jargon without a parenthetical.
- End with a "What's uncertain" line — do not pad it.
</format>
<source>
{{paste this week's notes here}}
</source>
Draft it, then list any place you guessed because the notes were thin.
That last line — "list where you guessed" — is the cheapest hallucination guard there is. It converts silent invention into a visible to-do list you can fill in.
| Failure you see | Real cause | The fix (not "try again") |
|---|---|---|
| Generic, hedgy prose | no audience or "why" | name the reader and what they'll do with it |
| Wrong structure | format left implicit | specify sections + length per section |
| Confident wrong facts | not grounded in a source | attach the file; ask it to quote/cite back |
| Drifts on re-prompt | regenerating from scratch | edit in place: "change only X" |
| Data analysis you can't trust | can't see the steps | ask it to show the code + a sanity check on totals |
Interview Q&A · deep dive
Claude for builders — Code, Cowork, API & connectors build
Beyond chat, Claude is a platform. If you write software or automate work, these are the surfaces that matter — and where your engineering background turns Claude from an assistant into infrastructure.
| Surface | What it is | For |
|---|---|---|
| Claude Code | agentic coding from the terminal/desktop; git-aware, autonomous, multi-file | developers, CLI workflows, automation |
| Claude Cowork | agentic knowledge-work desktop app | non-developers automating real tasks |
| API & Platform | build on Claude directly; model strings like claude-opus-4-8 | products, pipelines, custom apps |
| Chrome / Excel / PowerPoint | browsing, spreadsheet, slide agents (beta) | in-tool automation |
| Connectors (MCP) | wire external apps/data to Claude via MCP | giving Claude governed tool access |
Interview Q&A
Everything Anthropic ships for builders bottoms out in one HTTP endpoint: the Messages API (POST /v1/messages). Tool use, structured outputs, vision, prompt caching, extended/adaptive thinking and server-side tools are all features of that one call, not separate APIs. The surfaces above it are escalating amounts of "who runs the loop": you write a single call → you orchestrate a tool loop → the Agent SDK runs the loop on your infra → Managed Agents runs the loop and hosts the sandbox. Pick the lowest tier that does the job; reach up only when the task genuinely needs autonomy.
classify · extract · summarise→ + tool use
you control the loop→ Agent SDK
you host the agent→ Managed Agents
Anthropic hosts loop + sandbox
The current Python SDK (pip install anthropic) call. Note the 2026 details: model id claude-opus-4-8 (a pinned dateless snapshot, 1M context), adaptive thinking (the fixed budget_tokens is gone — 4.7/4.8 return a 400 if you send it), and reading block.type before .text because content is a list of typed blocks.
import anthropic
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY from env
resp = client.messages.create(
model="claude-opus-4-8", # Opus 4.8 · 1M-token context
max_tokens=2000,
system="You are a precise clinical-trial data assistant.",
thinking={"type": "adaptive"}, # model decides depth; no budget_tokens
messages=[
{"role": "user", "content": "Extract every investigator name + site id."},
],
)
for block in resp.content: # content is a LIST of typed blocks
if block.type == "text":
print(block.text)
print(resp.usage.input_tokens, resp.usage.output_tokens, resp.stop_reason)
An "agent" is this call in a while loop: you advertise tools, the model emits a tool_use block, you execute it and feed back a tool_result, repeat until stop_reason == "end_turn". That loop is the whole game; MCP is just a standard way to supply those tools, and the Agent SDK is a packaged version of the loop.
tools = [{
"name": "match_investigator",
"description": "Look up a site's PI by site_id. Call when a site_id appears.",
"input_schema": {"type": "object",
"properties": {"site_id": {"type": "string"}},
"required": ["site_id"]},
}]
messages = [{"role": "user", "content": "Who runs site 04-217?"}]
while True:
resp = client.messages.create(model="claude-opus-4-8",
max_tokens=1024, tools=tools, messages=messages)
if resp.stop_reason != "tool_use":
break
messages.append({"role": "assistant", "content": resp.content})
results = []
for b in resp.content:
if b.type == "tool_use":
out = lookup_pi(b.input["site_id"]) # your function
results.append({"type": "tool_result",
"tool_use_id": b.id, "content": out})
messages.append({"role": "user", "content": results}) # all results, one message
| Surface | Who runs the agent loop | Reach for it when |
|---|---|---|
| Messages API | nobody — single request | classify, extract, summarise, Q&A |
| API + tool use | you (your loop / SDK tool_runner) | multi-step, your tools, you host compute |
| Claude Agent SDK | you (packaged loop + harness) | building a coding/ops agent on your infra |
| Managed Agents | Anthropic (loop + per-session sandbox) | stateful agent with a hosted workspace |
| MCP | n/a — it supplies tools | wiring external systems in as governed tools |
Claude Code is the agentic coding tool — terminal, IDE, and desktop — and its real surface is composable: subagents (isolated context windows for parallel/bounded work), skills (a SKILL.md bundle loaded on demand; slash commands are now unified with skills), hooks (e.g. PreToolUse to veto a dangerous bash command before it runs), MCP servers, and plugins that ship all of those together. On the API, the two levers that change the economics of a real workload are prompt caching (cache reads cost ~0.1× input; a large fixed prefix becomes nearly free after the first call) and the Batch API (50% off, async, up to 100k requests) for anything not latency-sensitive.
Interview Q&A · deep dive
messages.create. You advertise tools; on stop_reason == "tool_use" you execute the requested tool, append a tool_result, and call again; you exit on end_turn. Everything else — MCP, the Agent SDK, Managed Agents — is about who runs that loop and where the tools live, not a different mechanism. There is one endpoint.budget_tokens gone, and what replaces it?thinking: {"type": "adaptive"}), where the model decides how much to think per request, plus the output_config.effort dial (low|medium|high|xhigh|max) to trade intelligence against latency and cost. The mental shift: you set effort, not a token count.custom_id, submit, poll until ended, then key results by custom_id (results come back unordered). Put the shared instruction/schema in a cached prefix so each request only pays full price for the document itself.Claude mastery — the full curriculum, in depth curriculum
Every Claude capability worth knowing, grouped into a skill ladder: understand it → prompt it → drive real work → power features → systematise. Each row is the senior move, not the toggle — and ties to your actual deliverables.
how it works→ Prompt
basics → advanced→ Real work
email → deck → data→ Power features
projects · memory · artifacts→ Systematise
templates · workflow
| Topic | The depth that matters |
|---|---|
| How Claude works — & why it's different | a transformer trained with Constitutional AI to be helpful / honest / harmless; you're steering a probabilistic reasoner, not querying a database — so context, framing, and constraints do the work. (Full detail in How Claude works.) |
| Prompting basics — the right answer every time | be explicit about role, task, format, and length; give examples; put long context first and the instruction last. Clarity beats cleverness. |
| Advanced prompting — CoT & role prompts | “think step by step” for reasoning; a sharp persona to set tone/expertise; XML tags to separate instructions from data; prefill to lock format. (See Prompting Claude.) |
| Use case | The senior move + your anchor |
|---|---|
| Emails — any message in <2 min | give the goal, recipient, and 3 bullets; ask for 2 tonal variants. The R&A / stakeholder threads on the investigator pipeline. |
| Reports — raw notes → polished doc | paste messy notes, specify the structure, let it draft — then tighten. Your CT accuracy reports and FDA-inspection write-ups. |
| Research — any topic in one prompt | ask for a structured brief with sources and an explicit “what's uncertain” section; verify the load-bearing claims. |
| Presentations — bullets → full deck | hand it an outline; get a slide-by-slide deck in your design system — the TrainHub pitch-slides workflow. |
| Proposals — first draft, no blank page | describe the ask, audience, constraints; iterate the draft. The TrainHub / Political Pulse funding decks. |
| Data — analyse without Excel | upload a CSV / xlsx; ask for the cut, the chart, and the takeaway. The 2,295 red-name extraction and DECRS cleanup. |
| Long documents — extract key info fast | drop a 600-page PDF; ask for a structured extract against a schema. The ECO-2026 abstract parse, but conversational. |
| Feature | What it really buys you |
|---|---|
| Projects — a personal AI workspace | a persistent space with its own knowledge base + instructions, so Claude has standing context for a workstream (one per CI-Radar / investigator pipeline). (See Projects, Memory & Styles.) |
| Artifacts — documents / apps ready to use | standalone, rendered, editable output (code, HTML, a doc) you iterate on in place — this hub is an Artifact. (See Artifacts & Design.) |
| Memory — remembers your work style | carries context across chats so you stop re-explaining your systems, stack, and preferences; you curate what it keeps. |
| Canvas (Design) — write/edit in real time | a side-by-side surface to iterate on a document or design by chat instead of regenerating from scratch. |
| Move | The depth |
|---|---|
| Prompt templates — build once, reuse | turn your best prompts into parameterised templates (a report template, a code-review template) so quality is repeatable, not re-discovered each time. |
| Personal workflow — built in an hour | wire Projects + templates + the right surface (chat / Claude Code / API) into a standing routine per workstream — the difference between a tool and a system. |
| Every feature, every use case | the goal of the ladder: reach for the right capability automatically. The builders card is the next rung — Code, Cowork, API & MCP. |
Interview Q&A
The five rungs answer two different questions and people conflate them. Rungs 1–2 (understand, prompt) build your skill at steering one model; rungs 3–5 (real work, power features, systematise) build a system that holds quality even when you're not the one running it. A casual user lives on rung 2 forever — better and better prompts, every task from scratch. A power user climbs to rung 5, where prompts are reusable assets and context is infrastructure. The whole point of naming the rungs is so you can answer "how do you actually use AI day-to-day?" with a structured progression, not a feature list.
The four-rung ladder above is the knowledge-worker path. For engineers there's a fifth rung that the builders card details: Claude stops being an assistant and becomes infrastructure. The progression mirrors the lower ladder — understand the platform (one Messages API endpoint), prompt it programmatically (system + tools + structured outputs), drive real work (Claude Code on a live repo), power features (MCP connectors, prompt caching, Batch), systematise (a maintained agent + governed tools). The same teaching move applies: name the rung, then point at the deliverable it unlocks.
| Builder rung | The capability | The senior move |
|---|---|---|
| Understand the platform | one endpoint; tools/caching/thinking are features of it | choose the lowest surface that does the job |
| Prompt programmatically | system prompt + tool schemas + output_config.format | typed I/O at service boundaries, not free text |
| Claude Code on the repo | agentic, git-aware, multi-file edits | subagents for parallel work; hooks for guardrails |
| Power features | MCP, prompt caching, Batch API | caching for the fixed prefix; Batch for bulk |
| Systematise | a maintained agent over least-privilege tools | connectors as auditable trust boundaries |
A concrete sequence that climbs the ladder by shipping something on each rung. The discipline is to produce a real artefact every week tied to your own work, not to study features in the abstract — capability you can't point at a deliverable for hasn't been learned.
Interview Q&A · deep dive
MLOps · LLMOps · AIOps
Three related-but-distinct disciplines. MLOps operationalises ML models. LLMOps adapts that for prompts, RAG and tokens. AIOps is the inverse — using AI to run IT/operations. Knowing the boundaries cleanly is itself a senior signal.
MLOps lifecycle discipline
MLOps is DevOps for ML, plus two things software doesn't have: data and models as first-class versioned artifacts, and CT — continuous training alongside CI/CD. Goal: reproducible, automated, monitored model delivery.
| Capability | What it gives you | Tools (examples) |
|---|---|---|
| Experiment tracking | compare runs, params, metrics | MLflow, Weights & Biases |
| Model registry | versioning + stage promotion | MLflow Registry, SageMaker |
| Feature store | consistent features train↔serve | Feast, SageMaker FS |
| Pipeline orchestration | repeatable DAGs | Airflow, Kubeflow |
Interview Q&A
The clearest way to reason about MLOps maturity is to stop thinking "a pipeline" and count three independent ones, each with its own trigger. CI ships code (tests + lint + a model-unit test). CD ships an artifact (the trained model + serving image) through staging to prod. CT — continuous training — produces a new model when the data or the world changes. Google's maturity ladder maps directly onto how automated each of these is: Level 0 notebooks-to-prod by hand, Level 1 an automated training pipeline triggered on data/schedule, Level 2 full CI/CD/CT where a code commit rebuilds the pipeline and drift auto-triggers retraining behind a champion/challenger gate.
A feature store solves two problems at once, and that is why it keeps appearing in 2025-2026 reference stacks (Feast, Tecton, Databricks FS). The offline store serves point-in-time-correct feature snapshots for training; the online store serves the same logic at low latency for inference. "Write the feature once, serve it everywhere" is the slogan — and it is the direct cure for train/serve skew, because training and serving read from the same definition rather than two reimplementations. Point-in-time correctness (no leakage of future values into a training row) is the subtle part juniors miss.
import mlflow
from mlflow import MlflowClient
from sklearn.metrics import f1_score
mlflow.set_experiment("trial-matcher")
with mlflow.start_run() as run:
model = train(X_tr, y_tr) # your estimator
f1 = f1_score(y_val, model.predict(X_val))
mlflow.log_params({"max_depth": 8, "seed": 42})
mlflow.log_metric("val_f1", f1)
info = mlflow.sklearn.log_model(model, name="model",
registered_model_name="matcher")
# promote only if the challenger beats the current champion
client = MlflowClient()
champ = client.get_model_version_by_alias("matcher", "champion")
if f1 > float(client.get_run(champ.run_id).data.metrics["val_f1"]):
client.set_registered_model_alias("matcher", "champion", info.registered_model_version)
print("promoted", info.registered_model_version) # aliases replaced stages in MLflow 3
Interview Q&A · deep dive
t must only contain feature values knowable before t; pulling a value computed later leaks the future and inflates offline metrics that collapse in prod. A feature store does as-of joins against the offline store so each label sees only its causally-valid features, and serves the identical transformation online — killing both leakage and train/serve skew.Airflow & workflow orchestration orchestration
An orchestrator runs tasks in the right order with the right retries on the right schedule. Airflow's model is a DAG (directed acyclic graph) of tasks defined in Python; the scheduler decides what's ready, an executor runs it (locally, on Celery, on Kubernetes), and the metadata DB records every run for replay.
| Concept | What it is |
|---|---|
| DAG | the workflow definition (Python file) — tasks + their dependencies |
| Operator | a task's worker class: PythonOperator, BashOperator, KubernetesPodOperator, etc. |
| Sensor | a task that waits for a condition (file appears, partition lands) |
| Scheduler | finds ready tasks, dispatches to the executor based on dependencies + schedule |
| Executor | where tasks actually run: LocalExecutor, CeleryExecutor, KubernetesExecutor |
| XCom | small key/value hand-offs between tasks (don't push GB through it) |
| Backfill / catchup | re-run a date range — only safe if tasks are idempotent |
from airflow.decorators import dag, task
from datetime import datetime
@dag(schedule="0 11 * * 1", start_date=datetime(2026,1,1),
catchup=False, default_args={"retries": 3}) # Mondays 11:00 UTC
def registry_ingest():
@task
def fetch(reg): return crawl(reg)
@task
def extract(raw): return run_extractor(raw)
@task
def load(rows): return upsert(rows)
for reg in ["ANZCTR", "CTRI", "EUCT", "ISRCTN"]:
load(extract(fetch(reg))) # dependency inferred
dag = registry_ingest()
| Tool | Sweet spot |
|---|---|
| Airflow | scheduled batch ELT/ML pipelines; mature; broad operator ecosystem |
| Prefect | more Pythonic, dynamic workflows, hybrid cloud; great DX |
| Dagster | asset-centric: declare data assets and their lineage, not just tasks |
| Argo Workflows | Kubernetes-native, YAML-defined, container-per-task |
Interview Q&A
If you learned Airflow 2.x, three things changed that interviewers in 2026 probe for. (1) DAG versioning is now native — the metadata DB records structural changes, so the UI shows the exact DAG shape a historical run used (no more "the graph in the UI doesn't match what ran"). (2) Asset-based scheduling generalises 2.x Datasets: DAGs declare the assets they produce/consume and trigger on data events, not just clock time. (3) The Task Execution API + Task SDK decouple task execution from the scheduler, so tasks can run remotely (Edge Executor) and even in non-Python languages. Backfills are now scheduler-managed rather than a separate CLI job.
| Airflow 2.x | Airflow 3 (2025) |
|---|---|
| Datasets for data-aware scheduling | Assets — first-class, lineage-oriented |
| DAG structure not historically tracked | DAG versioning in the metadata DB + UI |
| Backfill = separate airflow dags backfill CLI | Backfill managed by the scheduler |
| Tasks coupled to scheduler (Python) | Task SDK + Execution API; remote/multi-lang |
from airflow.sdk import dag, task, Asset
from datetime import datetime
trials = Asset("s3://lake/trials/normalised") # a data asset, not a clock
@dag(schedule="0 11 * * 1", start_date=datetime(2026,1,1), catchup=False)
def ingest():
@task(outlets=[trials]) # declares it WRITES the asset
def normalise():
rows = crawl_and_clean()
upsert(rows) # idempotent → safe to backfill
normalise()
ingest()
@dag(schedule=[trials], start_date=datetime(2026,1,1)) # runs WHEN the asset updates
def rematch():
@task
def score(): run_matcher()
score()
rematch()
Interview Q&A · deep dive
catchup=True goes live (or is unpaused) the scheduler creates runs for every missed interval since start_date. Backfill is a deliberate re-run of a chosen historical date range. Both only produce correct results if tasks are idempotent; catchup=False is the safe default to avoid an accidental thundering herd of historical runs.NiFi, Kafka & streaming data flow data movement
Orchestration runs jobs; streaming moves data. Two tools dominate: Apache NiFi for visual, flow-based data movement (great for routing/enrichment across heterogeneous sources), and Apache Kafka as the durable event log that fan-out consumers read from. Together they cover "data in motion" the way Airflow covers "scheduled work."
| Tool | Mental model | Use when |
|---|---|---|
| NiFi | visual graph of processors connected by queues; flow-based | routing, enrichment, ETL across many heterogeneous sources; non-programmer friendly |
| Kafka | durable append-only log; topics + partitions; pub/sub | backbone of event-driven systems; many consumers, replay, decoupling |
| Spark Structured Streaming | micro-batch DataFrame ops over a stream | analytics over event streams with the same code as batch |
| Flink | true event-at-a-time stream processing | low-latency, event-time, exactly-once stateful processing |
Interview Q&A
Batch and stream are not different technologies so much as different answers to "how big is my window and when do I close it?" Batch waits for a bounded chunk (a day, a file) then computes. Streaming computes over an unbounded sequence, closing windows continuously. The hard part of streaming is therefore time: event time (when the thing happened) almost always lags processing time (when you saw it), so you need watermarks — a heuristic that says "I believe I've now seen all events up to time T" — to decide when a window is safe to emit and how long to wait for stragglers.
| Axis | Batch | Stream |
|---|---|---|
| Input | bounded (file/day) | unbounded (never-ending) |
| Latency | minutes–hours | ms–seconds |
| Window | the whole batch | tumbling/sliding/session + watermark |
| Reprocessing | rerun the job | rewind the offset / replay the log |
A topic is split into partitions; a partition is an ordered, append-only log. Ordering is per-partition only — choose a partition key (e.g. trial_id) so all events for one entity land on one partition and stay ordered. A consumer group shares partitions among its members: at most one consumer per partition, so your max parallelism equals the partition count (more consumers than partitions = idle consumers). Each consumer tracks an offset; "replay" is just resetting it. Kafka 4.0 (2025) dropped ZooKeeper entirely — KRaft is the only mode — and added share groups (KIP-932), true queue semantics with per-message acks and redelivery, so Kafka can now do work-queue patterns it previously couldn't.
from confluent_kafka import Consumer
c = Consumer({
"bootstrap.servers": "broker:9092",
"group.id": "trial-matcher",
"enable.auto.commit": False, # commit only after the sink write
"auto.offset.reset": "earliest",
})
c.subscribe(["trials.normalised"])
try:
while True:
msg = c.poll(1.0)
if msg is None or msg.error(): continue
key = msg.key().decode()
upsert_idempotent(key, msg.value()) # sink is idempotent on key → effectively-once
c.commit(msg, asynchronous=False) # commit AFTER successful write
finally:
c.close()
Interview Q&A · deep dive
Monitoring & drift production
A deployed model degrades over time. Data drift = input distribution shifts; concept drift = the input→output relationship itself changes. You can't fix what you don't measure — monitoring closes the loop back to retraining.
| Watch | Signal |
|---|---|
| Data drift | feature distributions move (PSI, KS test) |
| Concept drift | accuracy/precision drops on fresh labels |
| Data quality | nulls, schema changes, out-of-range |
| Operational | latency, throughput, error rate, cost |
Interview Q&A
Drift detection is distribution comparison: reference window vs current window, per feature. The two staples behave very differently. KS (Kolmogorov-Smirnov) compares two empirical CDFs and returns a p-value — sensitive, great for numeric features on modest samples, but too sensitive on big data (it flags statistically-significant-but-meaningless shifts). PSI (Population Stability Index) bins the feature and sums (curr% - ref%) * ln(curr%/ref%) across bins — it returns a magnitude that is roughly sample-size-independent, with the field-standard thresholds <0.1 stable, 0.1–0.25 moderate, ≥0.25 significant. Rule of thumb: PSI for monitoring dashboards and thresholded alerts; KS when you need a hypothesis test on a sane sample size.
import numpy as np
def psi(ref, cur, bins=10):
# fixed bin edges from the REFERENCE (quantile bins handle skew)
edges = np.quantile(ref, np.linspace(0, 1, bins + 1))
edges[0], edges[-1] = -np.inf, np.inf
r = np.histogram(ref, edges)[0] / len(ref)
c = np.histogram(cur, edges)[0] / len(cur)
r, c = np.clip(r, 1e-6, None), np.clip(c, 1e-6, None) # avoid log(0)/div0
return float(np.sum((c - r) * np.log(c / r)))
score = psi(train_feature, live_feature)
verdict = ("stable" if score < 0.1 else
"moderate" if score < 0.25 else "SIGNIFICANT")
print(round(score, 3), verdict) # e.g. 0.317 SIGNIFICANT → trigger retrain review
Interview Q&A · deep dive
LLMOps your stack
LLMOps = MLOps adapted to LLM apps. The artifacts shift from "weights you trained" to prompts, retrieval indexes, and provider models, and new concerns appear: token cost, latency, guardrails, and eval-as-CI.
| Concern | Practice |
|---|---|
| Prompt versioning | treat prompts as code: version, review, A/B |
| Cost & tokens | track tokens per call/feature; cache; route to cheaper models |
| Caching | semantic/exact cache for repeated queries → latency + $ down |
| Guardrails | input/output validation, PII checks, schema enforcement, refusals |
| Observability | trace every step (retrieval, prompt, tokens, latency) |
| Eval pipeline | RAGAS/DeepEval gates in CI (see Evals) |
Interview Q&A
The deep difference is the artifact you own. In MLOps you trained the weights, so you control them and version them. In LLMOps the weights live behind a provider API you cannot retrain — so your versioned artifacts become the prompt, the retrieval index, the tool definitions, and the model+params selection. That reshapes every concern: "training" becomes prompt iteration + eval; "model registry" becomes a prompt registry (LangSmith calls a version a prompt commit with a hash); "drift" includes silent provider model updates changing behavior under you; and a brand-new first-class cost axis appears — tokens — because every request has a marginal dollar and latency cost that a trained classifier never did.
| Concern | MLOps | LLMOps |
|---|---|---|
| Core artifact | trained weights | prompt + index + model choice |
| "Training" | fit on labeled data | prompt iteration + eval-in-loop |
| Cost driver | compute at train time | tokens per request, forever |
| Eval | F1/AUC on a test set | LLM-as-judge, faithfulness, groundedness |
| Silent regression | data drift | provider model update + prompt edits |
import hashlib, time, json
def cached_generate(prompt, cache, *, model="claude-...", max_tokens=512):
# exact-match cache; a semantic cache embeds the prompt instead
key = hashlib.sha256((model + prompt).encode()).hexdigest()
if key in cache:
return {**cache[key], "cache": True, "cost_usd": 0.0}
if not guard_input(prompt): # PII / injection / length checks
raise ValueError("input guardrail failed")
t0 = time.perf_counter()
resp = client.generate(model, prompt, max_tokens=max_tokens)
out = {
"text": resp.text, "cache": False,
"in_tok": resp.usage.input_tokens, # meter every call
"out_tok": resp.usage.output_tokens,
"latency_ms": round((time.perf_counter() - t0) * 1000),
"cost_usd": price(resp.usage, model),
}
assert guard_output(out["text"]) # schema / grounding / safety
cache[key] = out
log_trace(json.dumps({"feature": "matcher", **out})) # tag spend by feature
return out
Interview Q&A · deep dive
AIOps inverse
AIOps applies AI to IT operations — using ML on logs, metrics, traces, and events to detect anomalies, correlate alerts, find root cause, and automate response. Don't confuse it with MLOps (which operationalises ML); the arrow points the other way.
| Capability | Value |
|---|---|
| Anomaly detection | catch issues before threshold alerts fire |
| Event correlation | collapse 100 alerts into 1 incident |
| Root-cause analysis | point at the likely failing component |
| Auto-remediation | restart/scale/rollback known patterns |
Interview Q&A
The reason AIOps exists is that static threshold alerting breaks at scale in two directions. It is too noisy (CPU > 80% fires nightly during the backup window — a false positive) and too blind (a 3am latency creep that never crosses any single threshold but is a real incident in the making). AIOps replaces fixed lines with learned baselines: anomaly detection that knows the seasonal shape of normal (weekday vs weekend, hourly cycles) and flags deviation from expected, not deviation from a constant. The second leg is correlation: a single root failure emits a storm of symptom alerts across dozens of services, and the value is collapsing those 100 alerts into one incident with a probable cause — directly cutting MTTR.
import numpy as np
def anomalies(series, period=24, win=14, z=3.5):
# series: hourly metric. Deseasonalise by same-hour-of-day baseline.
s = np.asarray(series, dtype=float)
out = []
for i in range(period * win, len(s)):
same_hour = s[i - period * win : i : period] # last win same-hour points
mu, sd = same_hour.mean(), same_hour.std() + 1e-9
score = (s[i] - mu) / sd # robust-ish residual z
if abs(score) > z:
out.append((i, round(float(score), 2))) # (index, severity)
return out # feed these to correlation, not straight to a page
print(anomalies(latency_hourly)) # e.g. [(412, 6.1)] → 3am spike vs its own baseline
Interview Q&A · deep dive
Celery — distributed task queues async jobs
Celery runs work outside the request/response cycle: a producer enqueues a task to a broker (Redis or RabbitMQ), workers pull and execute it, and a result backend stores the outcome. It's how apps offload slow or scheduled work — emails, ETL, ML inference, report generation.
from celery import Celery
app = Celery("tasks", broker="redis://localhost",
backend="redis://localhost")
@app.task(bind=True, max_retries=3, acks_late=True)
def process(self, record_id):
try:
return crunch(record_id) # heavy work, off the request path
except TransientError as e:
raise self.retry(exc=e, countdown=5) # backoff + retry
process.delay(42) # enqueue; returns immediately (async)
| Piece | Role |
|---|---|
| Broker (Redis / RabbitMQ) | the queue tasks are pushed to and pulled from |
| Worker | a process that consumes and runs tasks (scale horizontally) |
| Result backend | stores return values / state (optional) |
| .delay() / .apply_async() | enqueue a call; apply_async adds eta, retries, routing |
| Celery Beat | the scheduler — cron-like periodic tasks |
| chain / group / chord | compose tasks into sequential / parallel workflows |
Interview Q&A
Every Celery design question reduces to when is the message acknowledged? Default (early ack, acks_late=False): the broker removes the message the moment a worker receives it. If that worker crashes mid-task the message is gone — at-most-once, you can lose work. Late ack (acks_late=True): the message is acked only after the task returns, so a crash mid-task re-delivers it to another worker — at-least-once, you never lose work but you must be idempotent. The subtle 2025-era gotcha: task_acks_on_failure_or_timeout defaults to True, so a task that raises or times out is still acked and not auto-redelivered — only a hard worker crash redelivers. Use explicit self.retry for failures you want replayed.
| Setting | Effect | Requires |
|---|---|---|
| acks_late=False (default) | at-most-once; lose task on crash | nothing; fine for cheap, replayable work |
| acks_late=True | at-least-once; redeliver on crash | idempotent tasks |
| worker_prefetch_multiplier=1 | fair dispatch for long tasks | set with acks_late for long jobs |
| task_acks_on_failure_or_timeout | True ⇒ failures/timeouts ack'd, no auto-redeliver | use self.retry to replay |
from celery import Celery, chord, group
from celery.schedules import crontab
app = Celery("ingest", broker="redis://localhost", backend="redis://localhost")
app.conf.worker_prefetch_multiplier = 1 # long tasks → fair dispatch
@app.task(bind=True, acks_late=True, max_retries=5,
retry_backoff=True, retry_jitter=True) # exp backoff + jitter
def crawl(self, reg):
if already_done(reg, self.request.id): # dedupe → idempotent on retry
return "skip"
try:
return upsert(fetch(reg)) # upsert, never blind insert
except TransientError as e:
raise self.retry(exc=e) # replay only transient failures
@app.task
def summarise(results): return rollup(results)
# fan out all registries, then run summarise once when ALL finish (chord)
def kickoff(regs):
return chord(group(crawl.s(r) for r in regs))(summarise.s())
app.conf.beat_schedule = { # cron-like periodic trigger
"weekly-crawl": {"task": "ingest.crawl",
"schedule": crontab(hour=11, minute=0, day_of_week=1)},
}
Interview Q&A · deep dive
task_acks_on_failure_or_timeout defaults to True so it's acked and not redelivered. Only an actual worker crash (with acks_late) triggers redelivery. To replay a failure you must call self.retry() (ideally only for transient errors, with backoff and a max_retries cap) or configure autoretry_for.chain = sequential pipeline (output of one feeds the next). group = parallel fan-out of independent tasks. chord = a group plus a callback that runs once after all group tasks complete (map-reduce). Use chord when you must aggregate results of a parallel fan-out; note the chord callback waits on the whole group, so one slow task delays the rollup.Docker & Kubernetes
How your pipelines and services actually run in production. Docker makes one app portable and reproducible; Kubernetes runs many containers reliably at scale — self-healing, scaling, and rolling them out. Concept → real commands → the architecture diagram.
Docker fundamentals — what a container actually is containers
A container packages your app + every dependency into one isolated, portable unit that runs the same on any host. Underneath it's not magic: containers are a Linux process that the kernel restricts using namespaces (what it can see) and cgroups (how much it can use). They share the host kernel — so they start in milliseconds and weigh megabytes, where a VM boots an OS and weighs gigabytes.
| Term | Means |
|---|---|
| Image | immutable blueprint — a stack of read-only layers + metadata (entrypoint, env, ports) |
| Container | a running instance of an image plus one writable layer on top |
| Registry | store/distribution for images (Docker Hub, ECR, GitHub, Harbor) |
| Layer | a single filesystem change — cached and shared across images |
| Volume / bind mount | persistent data outside the container's writable layer |
| OCI | the standard image & runtime spec; runc + containerd are the typical engine |
docker build -t myapp:1.0 . # build image from current dir
docker run -d --name api -p 8080:80 myapp:1.0 # run detached, map host:container ports
docker ps # running containers (-a = include stopped)
docker logs -f api # follow logs
docker exec -it api bash # shell into a running container
docker stop api && docker rm api # clean stop + remove
docker image prune -a # reclaim disk; deletes unused images
Interview Q&A
Three nouns get conflated in interviews. Code is your source on disk. An image is the frozen, content-addressed result of building that code — a stack of read-only layers identified by a sha256 digest, not just a tag. A container is a running image: the kernel takes those read-only layers, adds one thin writable layer on top (copy-on-write), wraps the process in namespaces and cgroups, and starts PID 1. Same image, ten containers = ten writable layers over one shared read-only stack — that sharing is why density is high and pulls are cheap.
No "container" object exists in Linux — it's an illusion assembled from three primitives. Namespaces control what a process can see (its own PID 1, network stack, mounts, hostname, users). cgroups v2 control how much it can use (CPU shares, memory limit + OOM, pids, IO). Union/overlay filesystem (overlayfs) stacks the read-only image layers under one writable layer so the FS looks unified but writes never touch the image.
# A container is a normal host process — find its real PID
docker run -d --name web nginx
docker inspect --format '{{.State.Pid}}' web # e.g. 24817 — visible on the HOST
# Inside the container it thinks it is PID 1 (pid namespace)
docker exec web ps -o pid,cmd # PID 1 nginx — same process, different view
# cgroup limits are enforced by the kernel, not Docker
docker run --memory=256m --cpus="0.5" --pids-limit=100 myapp
cat /sys/fs/cgroup/memory.max # 268435456 — the 256m ceiling, set on the host
# Layers are content-addressed: the digest, not the tag, is identity
docker image inspect nginx --format '{{.Id}}' # sha256:... immutable
docker pull nginx@sha256:abc123... # pin by digest in prod, never :latest
| Isolation primitive | What it bounds | You feel it as |
|---|---|---|
| PID namespace | process visibility | your app is PID 1; can't see host processes |
| Network namespace | interfaces, ports, routes | container's own eth0, its own localhost |
| Mount namespace | filesystem view | your own / from the image layers |
| cgroup memory.max | RAM ceiling | OOM-kill at the limit, not host exhaustion |
| User namespace | UID mapping | root in container ≠ root on host (when enabled) |
Interview Q&A · deep dive
docker run to a running process — who does what?dockerd daemon over its socket; dockerd hands the work to containerd (the high-level runtime that manages image pulls and container lifecycle); containerd spawns a containerd-shim per container and calls runc, the low-level OCI runtime, which sets up the namespaces + cgroups and execs your entrypoint as PID 1. The shim stays alive so the container survives daemon restarts.FROM python:3.12-slim, both reference the identical base layers by digest — stored once, reused everywhere. Only the layers that actually differ cost extra disk and network. This is also why reordering Dockerfile instructions changes cache hits across builds.myapp:1.0) is a mutable human label that can be re-pointed at a new image any time. A digest (myapp@sha256:...) is the immutable cryptographic identity of the exact bytes. For reproducible, tamper-evident deploys you pin by digest; tags are for humans and dev convenience.Dockerfile & the multi-stage build build
A Dockerfile is the recipe. Every instruction creates a layer the engine caches and reuses on the next build. The senior moves are: order instructions for cache hits, use a multi-stage build so build-time tools never reach production, and pin/minimise the base image.
# ---- builder stage: heavy, has compilers ----
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
# ---- runtime stage: tiny, no build tools ----
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH PYTHONUNBUFFERED=1
RUN useradd -m app && chown -R app /app
USER app # NEVER run as root
EXPOSE 8080
HEALTHCHECK --interval=30s CMD curl -fsS http://localhost:8080/health || exit 1
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
| Instruction | What it does | Tip |
|---|---|---|
| FROM | base image | pin a tag (:3.12-slim) — never :latest in prod |
| WORKDIR | cd inside the image | set once; avoids cd in RUN |
| COPY / ADD | copy files in | use COPY; ADD only for tarballs/URLs |
| RUN | execute a command in a new layer | chain with && and clean caches in the same layer |
| ENV / ARG | runtime env / build-time arg | don't bake secrets into either |
| USER | which UID runs the process | non-root for production |
| HEALTHCHECK | liveness probe inside the image | K8s usually overrides this with its own probes |
| CMD vs ENTRYPOINT | default cmd vs fixed binary + args | use both: ENTRYPOINT for the bin, CMD for args you may override |
Interview Q&A
Modern docker build uses BuildKit (default since Docker 23, the only builder in recent releases). It doesn't run instructions top-to-bottom blindly — it builds a DAG of build targets, runs independent stages in parallel, and skips any stage whose output nobody needs. That's why a multi-stage file with a test stage and a prod stage only builds prod when you target it. BuildKit also adds cache mounts (persist a package cache between builds without baking it into the image) and secret mounts (inject a token at build time that never lands in any layer).
# syntax=docker/dockerfile:1 ← enables BuildKit frontend features
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
# cache mount: pip's cache survives across builds, NOT baked into the layer
RUN --mount=type=cache,target=/root/.cache/pip \
pip install --prefix=/install -r requirements.txt
# secret mount: token is available here but never persisted in any layer
RUN --mount=type=secret,id=pip_token \
PIP_INDEX_URL=$(cat /run/secrets/pip_token) pip install --prefix=/install internal-pkg
# ---- final: distroless = no shell, no package manager, tiny attack surface ----
FROM gcr.io/distroless/python3-debian12:nonroot
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
USER nonroot # distroless:nonroot already ships uid 65532
EXPOSE 8080
# exec-form ENTRYPOINT → app is PID 1, gets SIGTERM directly for clean shutdown
ENTRYPOINT ["python", "-m", "uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8080"]
# BuildKit is on by default; pass the secret from an env var or file
docker build --secret id=pip_token,env=PIP_TOKEN -t myapp:1.0 .
# build only a named stage (e.g. run tests in CI without shipping them)
docker build --target builder -t myapp:test .
# inspect what actually bloats the image, layer by layer
docker history myapp:1.0 # or the `dive` tool for an interactive view
| Base image choice | Has a shell? | Size / use when |
|---|---|---|
| python:3.12 | yes (full Debian) | ~1GB — only if you need apt at runtime |
| python:3.12-slim | yes | ~150MB — sane default, debuggable |
| alpine | yes (busybox) | tiny, but musl libc breaks some wheels |
| distroless/python3 :nonroot | no shell, no apt | smallest attack surface, prod-grade |
| scratch | nothing at all | static Go/Rust binaries only |
Interview Q&A · deep dive
ARG or ENV a vulnerability, and what's the fix?ARG and ENV values are recorded in the image's layer history — anyone with the image can run docker history or unpack the layer and read them, even if a later layer "deletes" the file. The fix is BuildKit --mount=type=secret: the secret is mounted only for that one RUN and is never written to any layer. For runtime secrets, inject via the orchestrator (K8s Secret / env at run time), not the build.--mount=type=cache) is a build-time-only scratch area — it persists across builds on the builder to speed up package installs, but it is discarded when the RUN finishes and never becomes part of the image. So you get fast re-installs without bloating the final image with download caches.scratch for a static binary). The tradeoff is debuggability — no exec into a shell, no package manager, healthchecks that call curl stop working. You mitigate with the :debug variant, ephemeral debug containers, and moving liveness checks to the platform.--target (or the final stage) is skipped entirely. So you can keep a test stage and a heavy builder stage in the same file with zero cost to the production build — only what the target needs is executed.Docker workflow — dev → build → ship → run lifecycle
The Dockerfile is the recipe; the workflow is everything around it — the loop you actually live in. One sentence: you build an immutable image from a Dockerfile, tag it with a name and version, push it to a registry, then anywhere it's needed you pull and run it. Dev does this by hand with Compose; CI does it on every merge; the orchestrator does the pull+run for you. Below: the full lifecycle, every instruction in one place, and the registry/runtime commands that don't live in the Dockerfile.
| Instruction | What it does | The bit interviews probe |
|---|---|---|
| FROM | base image; every build starts here | pin a tag or digest — :latest is non-reproducible. Use FROM x AS builder for multi-stage |
| LABEL | metadata (maintainer, version, source) | free; use OCI keys (org.opencontainers.image.source) so registries link the image back to the repo |
| ARG | build-time variable | only exists during build; lands in docker history — never a secret. Scoped per-stage |
| ENV | env var set at build & baked into runtime | persists in the running container; also in history — not for secrets either |
| WORKDIR | cd inside the image (creates the dir) | set once; avoids cd in RUN and absolute-path bugs |
| COPY | copy files from build context into the image | the default — predictable, no surprises. COPY --from=builder pulls artefacts across stages |
| ADD | COPY + auto-extract local tar + fetch URLs | avoid unless you actually want tar extraction; the URL/extract magic causes cache and security surprises |
| RUN | execute a command, freeze the result as a layer | chain with && and clean caches in the same layer; add --mount=type=cache for fast re-installs |
| EXPOSE | documents the listening port | documentation only — does not publish it. You still need -p host:container at run time |
| VOLUME | marks a path as externally-mounted storage | data there escapes the image layers; in K8s you usually skip it and mount a PVC explicitly instead |
| USER | which UID runs the following steps + the process | switch to non-root before CMD; root-in-container is root-on-kernel without userns |
| HEALTHCHECK | liveness probe baked into the image | fine for plain Docker/Compose; K8s overrides it with its own liveness/readiness probes |
| ENTRYPOINT | the fixed executable | use exec form ["python","app.py"] so your app is PID 1 and receives SIGTERM |
| CMD | default args (or default command) | with an ENTRYPOINT, CMD becomes the default args that docker run can override |
| ONBUILD | deferred instruction that fires in a child build | only for shared base images (e.g. a company "python-service" base); surprising, so document it loudly |
# 1) build with a meaningful name (BuildKit is the default builder)
docker build -t myapp:1.4.0 .
# 2) tag for a specific registry. format: registry/namespace/repo:version
docker tag myapp:1.4.0 ghcr.io/globaldatahc/myapp:1.4.0
docker tag myapp:1.4.0 ghcr.io/globaldatahc/myapp:latest # moving pointer for convenience only
# 3) authenticate (token via stdin so it never hits your shell history)
echo $GHCR_TOKEN | docker login ghcr.io -u myuser --password-stdin
# AWS ECR uses a short-lived token instead of a static password:
aws ecr get-login-password --region ap-south-1 | docker login --password-stdin 1234.dkr.ecr.ap-south-1.amazonaws.com
# 4) push every tag, then pull elsewhere
docker push ghcr.io/globaldatahc/myapp:1.4.0
docker pull ghcr.io/globaldatahc/myapp:1.4.0
# 5) in PROD pin the immutable digest, not a mutable tag — guarantees the exact bytes
docker pull ghcr.io/globaldatahc/myapp@sha256:9f2b...c1
docker run -d ghcr.io/globaldatahc/myapp@sha256:9f2b...c1
# named volume: managed by Docker, survives container recreation (DBs, uploads)
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:16
# bind mount: a host path mapped in — the dev hot-reload pattern (NOT for prod data)
docker run -d -v $(pwd)/src:/app/src myapp:1.4.0
# user-defined network: containers reach each other by service NAME via Docker DNS
docker network create appnet
docker run -d --name api --network appnet -p 8080:8080 myapp:1.4.0
# 'api' can now reach 'db' as the hostname db:5432 — no IPs to hard-code
# inject config/secrets at RUN time (never bake them into the image)
docker run -d --env-file ./prod.env --name api myapp:1.4.0
# restart policy + resource caps the kernel enforces via cgroups
docker run -d --restart unless-stopped --memory=512m --cpus="1.0" myapp:1.4.0
| Do this | Because |
|---|---|
| Minimal/pinned base (-slim, distroless, :3.12 not :latest) | smaller pulls, fewer CVEs, reproducible builds |
| Multi-stage build | compilers and build tools never reach the runtime image — smaller + safer |
| .dockerignore for .git, venv, secrets, node_modules | shrinks build context, speeds builds, stops secrets leaking into layers |
| Order layers stable→volatile (deps before source) | a code change rebuilds only the last layers; the heavy install stays cached |
| One concern per RUN, clean caches in the same layer | a later rm can't shrink an earlier layer — the bytes are already frozen |
| Run as a non-root USER | limits blast radius if the process or a kernel bug is exploited |
| Secrets via --mount=type=secret / runtime env, never ARG/ENV | ARG/ENV are readable in docker history forever |
| Exec-form ENTRYPOINT + a HEALTHCHECK | clean SIGTERM shutdown as PID 1; orchestrator knows when you're really ready |
| Deploy by version tag or digest, scan the image | auditable, rollback-able, and you catch known CVEs before they ship |
Interview Q&A
build turns the Dockerfile + context into an immutable, content-addressed image; tag gives it a registry-qualified name and version; push uploads the layers to a registry; the prod node (or K8s) does pull by digest and run, which adds a writable layer and starts your process as PID 1. CI automates build/tag/push on merge; the orchestrator automates pull/run on deploy.-p — what's the difference?EXPOSE is documentation inside the image — it records which port the app listens on but publishes nothing. To actually reach the container from the host you pass -p host:container at run time (or ports: in Compose). So EXPOSE 8080 with no -p means the port is unreachable from outside.:latest for production deploys?:latest isn't a fixed thing — two pulls can return different images. A @sha256 digest addresses the exact bytes of the image, so a deploy is fully reproducible and rollbacks are precise. Tag with a SHA/semver for humans, deploy by digest for guarantees.Docker Compose — multi-container, the right way orchestration
Compose declares a multi-container app in one YAML file: services, networks, volumes, dependencies, and health. It's the right tool for local development and simple single-host deployments; for production at scale you graduate to Kubernetes.
services:
api:
build: .
ports: ["8080:8080"]
environment:
DATABASE_URL: postgres://app:secret@db:5432/trials
depends_on:
db: { condition: service_healthy } # wait for db's healthcheck
restart: unless-stopped
db:
image: postgres:16-alpine
environment:
POSTGRES_USER: app
POSTGRES_PASSWORD: secret
POSTGRES_DB: trials
volumes: [pgdata:/var/lib/postgresql/data] # persists across restarts
healthcheck:
test: ["CMD-SHELL", "pg_isready -U app"]
interval: 5s
retries: 5
volumes:
pgdata:
| Concept | What it does |
|---|---|
| service | one container definition (image + config) |
| depends_on + healthcheck | order startup and wait until a dependency is actually ready, not just running |
| networks | services on the same network reach each other by service name (DNS) — api calls db:5432 |
| volumes | named volumes for persistent data; bind mounts for source code in dev |
| profiles | opt-in services (e.g. --profile monitoring) without changing the file |
| override files | docker-compose.override.yml layered automatically; great for dev-only mounts |
| .env | variable substitution from a file — keep secrets out of the YAML |
docker compose up -d # start everything detached
docker compose ps # status
docker compose logs -f api # follow one service
docker compose exec api bash # shell into a running service
docker compose build --no-cache # force a clean build
docker compose down -v # stop + remove containers AND volumes
Interview Q&A
The base compose.yaml (the modern filename; docker-compose.yml still works) describes the app. Real teams layer on top of it instead of forking it: override files merge automatically for dev-only mounts, profiles gate optional services (monitoring, seed jobs) behind a flag, and Compose watch gives container-native hot-reload by syncing source or rebuilding on change. One file set drives laptop, CI, and a single-host prod box — the difference is just which override and which profiles you enable.
# compose.yaml — no top-level `version:` key; it's obsolete in Compose v2
services:
api:
build:
context: .
target: builder # build a specific multi-stage target
ports: ["8080:8080"]
env_file: [.env] # keep config out of YAML
secrets: [db_password] # mounted at /run/secrets/db_password
depends_on:
db: { condition: service_healthy } # gate on health, not just start
deploy:
resources: { limits: { cpus: "1.0", memory: 512M } }
develop:
watch: # hot reload without rebuilding the world
- { action: sync, path: ./src, target: /app/src }
- { action: rebuild, path: requirements.txt }
restart: unless-stopped
db:
image: postgres:16-alpine
environment:
POSTGRES_PASSWORD_FILE: /run/secrets/db_password
volumes: [pgdata:/var/lib/postgresql/data]
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 3s
retries: 5
start_period: 30s # grace window before failures count
seed:
image: myapp:dev
profiles: [tools] # only runs with: compose --profile tools up
command: python -m seed_db
depends_on:
db: { condition: service_healthy }
volumes:
pgdata:
secrets:
db_password:
file: ./secrets/db_password.txt
# compose.override.yaml — auto-merged on top in dev only
services:
api:
build: { target: builder }
volumes: ["./src:/app/src"] # live bind mount for fast edit-loop
environment: { LOG_LEVEL: debug }
# --- commands ---
# docker compose up -d base + override auto-merge
# docker compose --profile tools run seed one-off opt-in service
# docker compose -f compose.yaml up prod: ignore the dev override
# docker compose watch start with file-sync hot reload
# docker compose config print the fully-merged, resolved config
| Mechanism | Solves | Gotcha |
|---|---|---|
| healthcheck + service_healthy | "DB is up but not ready" crashes | without start_period, init counts as failures |
| override file | dev mounts without forking prod config | auto-merged only when named override |
| profiles | optional services in one file | profiled services don't start by default |
| secrets | creds out of env/YAML | file-based locally; real secret store in prod |
| develop.watch | hot reload, no rebuild churn | sync needs the app to reload; rebuild for deps |
Interview Q&A · deep dive
compose.override.yaml is auto-applied on top of compose.yaml when no -f is specified. For prod you pass an explicit set (-f compose.yaml -f compose.prod.yaml) and skip the dev override. Lists like ports are replaced, maps like environment are merged — verify with docker compose config.depends_on orders startup — but does it guarantee readiness?depends_on only waits for the dependency container to start, not for the app inside to accept traffic. To wait for actual readiness you add a healthcheck to the dependency and use depends_on: { db: { condition: service_healthy } }. Even then, your app should retry connections — health gating reduces but never fully eliminates startup races..env / env_file; sensitive values go through the secrets: block, which mounts them as files under /run/secrets/ rather than putting them in the environment. Env vars leak via docker inspect, child processes, and crash dumps; a mounted secret file with restricted perms is harder to exfiltrate. In real prod you back secrets with an external store.version: key, and why did it go away?docker compose) ignores the legacy version: field and derives capabilities from the schema directly, so it's now considered obsolete and omitted. The old Compose v1 Python tool (docker-compose) used it to select a schema version; v2 made it unnecessary.The Kubernetes architecture — control plane + nodes model
Kubernetes runs many containers reliably at scale. You declare desired state as YAML; controllers continuously reconcile actual state toward it. If a pod dies, it's recreated; if traffic spikes, replicas are added; if a node fails, work is rescheduled — all automatically.
| Control-plane component | Role |
|---|---|
| kube-apiserver | the only thing anything talks to — REST API in front of etcd; auth, validation, admission |
| etcd | strongly-consistent KV store — the cluster's source of truth; back this up |
| kube-scheduler | picks the node for each new pod (resources, affinity, taints, topology) |
| kube-controller-manager | bundle of controllers that reconcile desired vs actual state (Deployment, ReplicaSet, Node, etc.) |
| cloud-controller-manager | cloud-specific bits (load balancers, volumes, node lifecycle) on managed K8s |
| On every node | Role |
|---|---|
| kubelet | the node agent — talks to API server, asks the container runtime to run pods, reports status |
| Container runtime (CRI) | containerd or CRI-O — actually runs the containers (Docker Engine needs cri-dockerd as a shim; dockershim was removed in v1.24) |
| kube-proxy | programs iptables/IPVS so Service IPs route to the right pod |
| CNI plugin | pod networking (Calico, Cilium, Flannel) — no CNI installed means CoreDNS stays Pending |
Interview Q&A
Strip away every object and one idea remains: a control loop. A controller watches the API server for objects of a kind, reads their spec (desired) and status (observed), computes the diff, takes one action to close it, writes status back, and repeats — forever. Deployments, ReplicaSets, the scheduler, even your own custom resources are all just this loop. "Self-healing," "rolling updates," and "autoscaling" aren't features bolted on; they are emergent from many tiny reconcilers each driving actual state toward desired.
Components never talk to each other directly — they all talk to kube-apiserver, which is the only thing that touches etcd. The scheduler watches for unscheduled pods and writes a node binding back; the kubelet watches for pods bound to its node and acts; controllers watch their objects and write status. This level-triggered, watch-based design (react to current state, not to a one-shot event) is what makes K8s resilient: any component can crash and restart, re-list the current state, and carry on — no missed events, no central message bus to lose.
# Apply desired state; the loop takes over from here
kubectl apply -f deploy.yaml
kubectl scale deploy/api --replicas=5 # edit desired → controller reconciles to 5
# Kill a pod and watch the ReplicaSet controller recreate it
kubectl get pods -w # -w = watch the stream live
kubectl delete pod api-7d9f-abcde # a fresh pod appears within seconds
# See WHY the scheduler placed (or couldn't place) a pod
kubectl describe pod api-7d9f-fghij | grep -A5 Events
# Events: FailedScheduling 0/3 nodes available: insufficient cpu — the loop is telling you the diff
# etcd is the source of truth; everything else is a cache + a loop
kubectl get --raw=/healthz/etcd # ok — if etcd is unhealthy, the brain is down
| Symptom | Which component / loop | What it means |
|---|---|---|
| Pod stuck Pending | kube-scheduler | no node satisfies resources/affinity/taints |
| Pod stuck ContainerCreating | kubelet + CNI/runtime | image pull, volume mount, or missing CNI |
| Replicas not restored | controller-manager | ReplicaSet loop wedged or paused rollout |
| Service has no endpoints | endpoints/kube-proxy | selector mismatch or pods not Ready |
| Whole cluster read-only / slow | etcd / apiserver | etcd quorum lost or apiserver overloaded |
Interview Q&A · deep dive
kubectl describe pod and read Events — Pending almost always means the scheduler found no fitting node: insufficient CPU/memory, a node taint the pod doesn't tolerate, an unsatisfiable affinity/anti-affinity, or no node matching a required topology/PVC zone. Fix is to add capacity, adjust requests, add a toleration, or relax the constraint. (ContainerCreating, by contrast, is the kubelet stage — image/volume/CNI.)containerd or CRI-O, which in turn use runc to create the namespaces/cgroups. Docker Engine isn't CRI-native; the built-in shim was deprecated in 1.20 and removed in 1.24, so using Docker as the node runtime now requires the external cri-dockerd adapter. Most clusters just standardize on containerd.Core Kubernetes objects — the ones you write every week api
K8s is a system of objects — each is a typed YAML record with a spec (desired) and status (actual). These are the dozen you actually use; everything else builds on them.
| Workload | Use it for |
|---|---|
| Pod | the unit of scheduling — 1+ containers sharing network/volumes. Rarely created directly |
| Deployment | stateless apps; declares replicas + image; gives you rolling updates & rollback |
| StatefulSet | stateful apps needing stable identity + ordered start (databases, leader-election) |
| DaemonSet | one pod per node (log collector, node-level agent) |
| Job / CronJob | run-to-completion / scheduled tasks (batch ingest, nightly aggregation) |
| ReplicaSet | kept by Deployment; you almost never touch it directly |
| Networking | Use it for |
|---|---|
| Service · ClusterIP | stable virtual IP + DNS inside the cluster (default) |
| Service · NodePort | expose on every node's IP at a port — dev only |
| Service · LoadBalancer | provisions a cloud load balancer in front of the Service |
| Ingress | HTTP(S) routing rules (host/path) into Services; needs an ingress controller |
| NetworkPolicy | pod-level firewall — default-deny + explicit allow rules |
| Config & storage | Use it for |
|---|---|
| ConfigMap | non-sensitive config (env vars, files) |
| Secret | credentials and TLS — base64 by default, enable etcd-at-rest encryption |
| PVC / PV / StorageClass | persistent storage — claim, volume, and the provisioner that fulfils it |
| Namespace | logical partition for RBAC, quotas, and naming — your team/env boundary |
apiVersion: apps/v1
kind: Deployment
metadata: { name: api }
spec:
replicas: 3
selector: { matchLabels: { app: api } }
template:
metadata: { labels: { app: api } }
spec:
containers:
- name: api
image: registry.example.com/myapp:1.0
ports: [{ containerPort: 8080 }]
resources:
requests: { cpu: "100m", memory: "256Mi" }
limits: { cpu: "500m", memory: "512Mi" }
envFrom:
- configMapRef: { name: api-config }
- secretRef: { name: api-secrets }
---
apiVersion: v1
kind: Service
metadata: { name: api }
spec:
selector: { app: api } # match the Deployment's labels
ports: [{ port: 80, targetPort: 8080 }]
Interview Q&A
Every object you write is a declaration of desired state, not a command. You apply a spec; a controller watches it and runs a loop forever: observe actual → diff against spec → take one corrective action → repeat. There is no "create pod" verb under the hood — a Deployment controller notices it has 2 pods but wants 3 and makes one. This is why deleting a Deployment-managed pod just brings it back: you changed actual state, not desired. Internalising this loop explains 90% of "why did K8s do that?" moments.
# ConfigMap: non-secret config, consumed two ways
apiVersion: v1
kind: ConfigMap
metadata: { name: api-config }
data:
LOG_LEVEL: "info"
app.yaml: | # a whole file, mounted as a volume
timeout: 30
retries: 3
---
apiVersion: v1
kind: Secret
metadata: { name: api-secrets }
type: Opaque
stringData: # stringData = plain in, base64 at rest (no manual encode)
DATABASE_URL: "postgres://app:pw@db:5432/prod"
---
apiVersion: batch/v1
kind: CronJob
metadata: { name: nightly-rollup }
spec:
schedule: "0 2 * * *" # 02:00 daily
concurrencyPolicy: Forbid # skip a run if the prior one is still going
jobTemplate:
spec:
backoffLimit: 3 # retry the Job 3x before marking failed
template:
spec:
restartPolicy: OnFailure
containers:
- name: rollup
image: registry.example.com/rollup:2.1
envFrom: [{ configMapRef: { name: api-config } }]
| Mounting a ConfigMap/Secret | Env var (envFrom) | Volume file |
|---|---|---|
| Update without restart? | No — env is frozen at start | Yes — file is refreshed (~1 min), if app re-reads |
| Good for | flags, URLs, small values | config files, TLS certs, large blobs |
| Gotcha | changing the CM does NOT roll pods | subPath mounts do NOT auto-update |
Interview Q&A · deep dive
ownerReferences in each child's metadata. It matters for two reasons: cascading deletion (delete the Deployment and garbage collection removes the RS and Pods), and adoption (a controller only manages objects whose labels match its selector AND whose owner it is). A pod with the right labels but no owner is an orphan the controller won't touch.selector → pod labels → pod Ready → container targetPort. Most often the Service has zero endpoints because the selector and labels diverged, or pods aren't Ready (readiness probe). kubectl get endpoints <svc> is the one command that tells you instantly whether the Service found any backends.completions and parallelism, applies backoffLimit for retries, and stops once the target completions are met. Using a Deployment for batch work means your "finished" pods get restarted endlessly.kubectl apply do that create doesn't?apply is declarative and idempotent: it computes a three-way merge between your manifest, the live object, and the stored last-applied-configuration annotation (or, with server-side apply, field-ownership metadata). Re-running it converges to your file. create fails if the object exists, and replace overwrites fields others manage. apply is the only safe verb for GitOps.Scaling, probes & rollouts — how K8s self-heals runtime
The features that make K8s feel magical — except they're not; each is a controller doing one well-defined job. Knowing the levers (probes, resources, HPA, rolling strategy, PDB) is what separates "I deployed it" from "I operate it."
| Probe | Question it answers | What happens on fail |
|---|---|---|
| liveness | is the process alive? | kubelet kills + restarts the container |
| readiness | can it serve traffic right now? | pod removed from Service endpoints (no kill) |
| startup | has it finished initialising? | delays liveness until it passes — great for slow boot |
spec:
strategy:
type: RollingUpdate
rollingUpdate: { maxUnavailable: 1, maxSurge: 1 }
template:
spec:
containers:
- name: api
image: myapp:1.2
resources:
requests: { cpu: "100m", memory: "256Mi" } # scheduler uses requests
limits: { cpu: "500m", memory: "512Mi" } # kernel kills if exceeded
readinessProbe:
httpGet: { path: /ready, port: 8080 }
periodSeconds: 5
livenessProbe:
httpGet: { path: /healthz, port: 8080 }
periodSeconds: 10
failureThreshold: 3
| Scaling lever | What it does |
|---|---|
| HPA (Horizontal Pod Autoscaler) | scales pod count on CPU / memory / custom metrics |
| VPA (Vertical Pod Autoscaler) | recommends/adjusts pod requests & limits |
| Cluster Autoscaler | adds/removes nodes when pods don't fit |
| PDB (PodDisruptionBudget) | "never take more than N pods down at once" — protects you during node drains |
Interview Q&A
Scaling and self-healing aren't one feature — they're four independent control loops at different layers, and they can fight each other if you're careless. The HPA edits a Deployment's replicas; the Deployment controller reconciles pods; the scheduler places them; the Cluster Autoscaler adds nodes when they don't fit. The classic conflict: setting replicas by hand in a manifest that an HPA also manages — your apply and the HPA tug-of-war every reconcile. Rule: once an HPA owns a workload, remove replicas from the manifest (or it will revert the HPA on every deploy).
apiVersion: autoscaling/v2 # v2 is the current API — supports multiple + custom metrics
kind: HorizontalPodAutoscaler
metadata: { name: api }
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target: { type: Utilization, averageUtilization: 70 }
- type: Pods # custom metric: requests/sec per pod (via adapter)
pods:
metric: { name: http_requests_per_second }
target: { type: AverageValue, averageValue: "100" }
behavior: # tune the velocity — new in v2
scaleDown:
stabilizationWindowSeconds: 300 # default: wait 5 min before scaling in (anti-flap)
policies: [{ type: Percent, value: 10, periodSeconds: 60 }]
scaleUp:
stabilizationWindowSeconds: 0 # default: react to spikes immediately
policies: [{ type: Percent, value: 100, periodSeconds: 15 }]
# startupProbe gates liveness so a 60s boot isn't killed mid-init
startupProbe:
httpGet: { path: /healthz, port: 8080 }
failureThreshold: 30 # 30 × 5s = up to 150s to start before liveness applies
periodSeconds: 5
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: api-pdb }
spec:
minAvailable: 2 # OR maxUnavailable: 1 — never both
selector: { matchLabels: { app: api } }
| QoS class | How you get it | Eviction order under node pressure |
|---|---|---|
| Guaranteed | requests == limits for CPU & mem on every container | last to be evicted |
| Burstable | requests set, but < limits (or only one set) | middle |
| BestEffort | no requests or limits at all | first to be killed |
Interview Q&A · deep dive
behavior.scaleDown.stabilizationWindowSeconds (default 300s) — the controller takes the highest recommendation over that window, so a brief dip won't trigger an immediate scale-in. Also widen the target band, cap the scale-down policy (e.g. 10%/min), and check your metric isn't noisy. Flapping is almost always too-aggressive scale-down, not scale-up.preStop hook / graceful shutdown race: K8s removes the pod from endpoints and sends SIGTERM nearly simultaneously, but in-flight requests and stale kube-proxy iptables rules can still route to the dying pod for a moment. Add a preStop: sleep 5 (or app-level connection draining) so the pod keeps serving while endpoint removal propagates, then exits.kubectl — the daily reference cli
kubectl is the one tool you use every day. Master ~25 commands and the rest is recall. Group them by intent: inspect, change, debug, target.
kubectl get pods -A # all pods, all namespaces
kubectl get deploy,svc,ing -n trial-ai # multiple kinds at once
kubectl get pod api-7f8c -o yaml # full spec + status
kubectl describe pod api-7f8c # events + container state (the #1 debug command)
kubectl top pods -n trial-ai # live CPU/mem (needs metrics-server)
kubectl get events --sort-by=.lastTimestamp # what just happened in this namespace
kubectl apply -f manifests/ # apply a directory of YAML — the right verb
kubectl diff -f manifests/ # dry-run a diff first — safe habit
kubectl scale deploy/api --replicas=6 # quick scale (CI/HPA usually owns this)
kubectl set image deploy/api api=myapp:1.3 # patch image — triggers a rollout
kubectl rollout status deploy/api # follow the rolling update
kubectl rollout undo deploy/api # revert to previous ReplicaSet
kubectl logs -f deploy/api -c api # tail logs for a container
kubectl logs pod/api-7f8c --previous # logs from the PREVIOUS crashed instance
kubectl exec -it deploy/api -- sh # shell into a running pod
kubectl port-forward svc/api 8080:80 # tunnel local:remote to the Service
kubectl debug pod/api-7f8c -it --image=busybox # attach a debug sidecar (ephemeral container)
kubectl config get-contexts # list configured clusters
kubectl config use-context prod-eks # switch cluster
kubectl config set-context --current --namespace=trial-ai # pin namespace
| Habit | Why |
|---|---|
| -o yaml / -o json | see the full object — status and events are where bugs hide |
| describe first | shows recent Events; 80% of failures are visible there |
| --dry-run=client -o yaml | generate a starter manifest without applying — great for new resources |
| -l app=api | label selectors beat typing pod names |
| shell aliases | k for kubectl, kns to switch namespace — save hours per week |
Interview Q&A
Under pressure you want a fixed sequence, not improvisation. The loop below is the one that resolves most pod-level incidents before you ever open a dashboard: start with events, read the previous crash's logs, compare against a healthy replica, then confirm the fix landed with a rollout watch.
# JSONPath: pull a single value out of the object graph
kubectl get pod api-7f8c -o jsonpath='{.status.podIP}'
kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'
# custom-columns: a tidy table of just what you care about
kubectl get pods -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,RESTARTS:.status.containerStatuses[0].restartCount'
# sort + filter server-side, then thin client-side
kubectl get pods --field-selector=status.phase=Running --sort-by=.metadata.creationTimestamp
kubectl get pods -A -o wide | grep -v Running # everything NOT healthy
# 1. ephemeral container: debug a pod whose image has NO shell (distroless)
kubectl debug -it api-7f8c --image=busybox:1.36 --target=api
# --target shares the target container's process namespace → you see its PIDs
# 2. copy a crashing pod with a debug image + command override, untouched original
kubectl debug api-7f8c --copy-to=api-dbg --image=ubuntu --share-processes -- sleep 1d
# 3. node-level debug: a privileged pod in the node's host namespaces
kubectl debug node/ip-10-0-1-23 -it --image=busybox # /host = node root fs
# 4. who can do what? (RBAC self-check before you blame permissions)
kubectl auth can-i create deployments -n trial-ai
kubectl auth can-i '*' '*' --as=system:serviceaccount:trial-ai:api
| Symptom in get pods | Most likely cause | First command |
|---|---|---|
| ImagePullBackOff | bad tag, private registry, no pull secret | describe pod (Events) |
| CrashLoopBackOff | app exits on start; bad config/secret | logs --previous |
| Pending | no node fits (resources, taints, PVC) | describe pod + get events |
| OOMKilled (in describe) | memory limit too low / leak | top pod + raise limit |
| 0/1 Running (not Ready) | readiness probe failing | describe → probe events |
Interview Q&A · deep dive
kubectl debug instead of kubectl exec?exec needs a shell inside the target image — useless for distroless/scratch images or a crashed container. kubectl debug attaches an ephemeral container (with your own tooling image) into the running pod, optionally sharing the target's process namespace via --target so you can inspect its files and PIDs. For a node-level problem, kubectl debug node/<n> drops a privileged pod with the host fs mounted at /host.grep on human output. -o jsonpath / -o json | jq for fields; kubectl wait --for=condition=Available deploy/api --timeout=120s instead of sleeping; kubectl rollout status --timeout=120s which exits non-zero on a stalled rollout so CI fails correctly. Pin the context explicitly so a script never targets the wrong cluster.edit, patch, and apply for a quick change?edit opens the live object in $EDITOR — convenient, but the change isn't in git (config drift). patch applies a targeted strategic/JSON merge from the CLI — scriptable, still imperative. apply reconciles from a file you keep in source control. For anything that should survive the next GitOps sync, change the file and apply; edit/patch are for break-glass only.kubectl auth can-i <verb> <resource> --as=system:serviceaccount:<ns>:<sa> -n <ns>. That answers the exact RBAC question without redeploying. If it says no, inspect the bound Role/ClusterRole (kubectl describe rolebinding -n <ns>) and add the missing rule — least privilege means you grant exactly that verb/resource, not *.Production Kubernetes — HA, networking, security operate
Anything you'd put behind an SLA needs more than a single-node cluster. Production K8s adds highly-available control plane, real networking, RBAC and admission control, and a backup/upgrade story — most easily by using a managed cluster (EKS/GKE/AKS) and focusing on the workload side.
| Pillar | What "production" means |
|---|---|
| HA control plane | 3+ apiserver/controller/scheduler replicas across zones; etcd as an odd-numbered HA cluster (3 or 5); load balancer in front of apiservers |
| etcd backup | periodic snapshots offsite — the only thing protecting you from cluster-state loss |
| Multi-zone | nodes spread across availability zones; topology-spread constraints ensure replicas aren't all in one zone |
| CNI | pick a real CNI (Calico, Cilium, Flannel) with NetworkPolicy support; default-deny + explicit allows |
| RBAC | least-privilege ServiceAccounts; humans via OIDC/SSO; system:masters is break-glass only |
| Admission | PodSecurity standards (baseline/restricted), OPA Gatekeeper / Kyverno for policy; image-signature verification |
| Secrets | etcd encryption-at-rest enabled; secrets actually live in Vault / cloud KMS and are mounted in |
| Upgrades | plan node + control-plane skew; drain nodes one zone at a time; respect PDBs |
| Observability | Prometheus + Grafana for metrics, a log pipeline (Loki/ELK), and tracing (OTel) before the first outage |
Interview Q&A
Production hardening is defence in depth: a request crosses several independent gates, and no single one is trusted to be enough. Identity (RBAC) decides who; admission (PSA / Kyverno) decides what kind of pod; NetworkPolicy decides what can talk to what; secrets-at-rest and a CNI that enforces policy back it. Picture them as rings the traffic and the workload must pass through — break one and the next still holds.
# 1. default-deny ALL ingress + egress in this namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: default-deny, namespace: trial-ai }
spec:
podSelector: {} # {} = selects every pod in the namespace
policyTypes: [Ingress, Egress] # no rules below = deny both directions
---
# 2. allow the api pods to receive from the ingress controller, and reach DNS + db
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: api-allow, namespace: trial-ai }
spec:
podSelector: { matchLabels: { app: api } }
policyTypes: [Ingress, Egress]
ingress:
- from: [{ namespaceSelector: { matchLabels: { kubernetes.io/metadata.name: ingress-nginx } } }]
ports: [{ protocol: TCP, port: 8080 }]
egress:
- to: [{ podSelector: { matchLabels: { app: db } } }]
ports: [{ protocol: TCP, port: 5432 }]
- to: [{ namespaceSelector: {} }] # MUST allow DNS or all lookups fail
ports: [{ protocol: UDP, port: 53 }, { protocol: TCP, port: 53 }]
# A Role granting exactly read-only access to its own namespace's workloads
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata: { name: api-reader, namespace: trial-ai }
rules:
- apiGroups: ["", "apps"]
resources: [pods, pods/log, deployments]
verbs: [get, list, watch] # no create/delete — least privilege
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata: { name: api-reader-bind, namespace: trial-ai }
subjects: [{ kind: ServiceAccount, name: api, namespace: trial-ai }]
roleRef: { kind: Role, name: api-reader, apiGroup: rbac.authorization.k8s.io }
---
# Pod Security Admission: enforce the 'restricted' profile via namespace labels
# (PSS replaced the removed PodSecurityPolicy; this is built-in, no controller to install)
apiVersion: v1
kind: Namespace
metadata:
name: trial-ai
labels:
pod-security.kubernetes.io/enforce: restricted # block non-conforming pods
pod-security.kubernetes.io/warn: restricted # warn on kubectl apply
pod-security.kubernetes.io/audit: restricted # log violations
| Pod Security level | Allows | Use for |
|---|---|---|
| privileged | everything (no restrictions) | system/infra DaemonSets only |
| baseline | blocks known privilege escalations | most app workloads, easy migration |
| restricted | + non-root, drop ALL caps, seccomp, no host* | hardened production default |
Interview Q&A · deep dive
etcdctl snapshot save), encrypt secrets at rest, and restrict access to it tighter than anything else. A clean etcd snapshot is your only true disaster-recovery path for the control plane.kubectl apply in CI doesn't?apply is push-once: it sets state at deploy time but doesn't notice or revert manual drift afterward. Argo CD/Flux run inside the cluster, continuously diff live state against git, and re-apply — so an out-of-band kubectl edit is detected and reverted, the repo is always the single source of truth, and every prod change is a reviewed, revertible commit.AWS Cloud
A working map of the services you'll actually name in interviews, the serverless vs container choice, the ML/GenAI services, and a reference architecture for deploying an LLM app. Service capabilities are stable; exact prices and the newest model names drift, so those are kept conceptual.
The service map orientation
Group services by job. You don't need all 200+ — you need the dozen that come up constantly and the ability to reason about the rest.
| Job | Service | One-liner |
|---|---|---|
| Compute | EC2 · Lambda · ECS/EKS · Fargate | VMs · functions · containers · serverless containers |
| Storage | S3 · EBS · EFS | object · block (disk) · shared file |
| Database | RDS/Aurora · DynamoDB · OpenSearch | relational · NoSQL key-value · search/vector |
| Networking | VPC · ALB · Route 53 · CloudFront | private network · load balancer · DNS · CDN |
| Identity | IAM | who can do what — least privilege |
| Ops | CloudWatch · CloudTrail | metrics/logs · audit of API actions |
| Messaging | SQS · SNS · EventBridge | queue · pub/sub · event bus |
Interview Q&A
Past the brand names, every AWS architecture is the same five planes stacked: a compute plane running your code, a data plane holding state, a network plane wiring it together inside a VPC, a control plane (IAM + APIs) deciding who may do what, and an observability plane watching it all. When you can place any of the 200+ services into one of these five, you can reason about a service you've never used. The exam-and-interview trick: name the plane first, then the service.
| Category | Beyond the headline service | When it earns its keep |
|---|---|---|
| Compute | Batch, App Runner, Lightsail, Graviton (ARM) instances | queued GPU/CPU jobs; simple container PaaS; ~20% cheaper ARM |
| Storage | S3 storage classes (Intelligent-Tiering, Glacier, Express One Zone), FSx | lifecycle cost control; Lustre/Windows file systems |
| Database | Aurora Serverless v2, ElastiCache, Neptune, Timestream | auto-scaling SQL; Redis cache; graph; time-series |
| Networking | PrivateLink, Transit Gateway, NAT Gateway, WAF | private service access; hub-spoke; egress; L7 firewall |
| Security | KMS, Secrets Manager, GuardDuty, Security Hub, Organizations/SCP | encryption keys; rotating secrets; threat detection; guardrails |
| ML / AI | Bedrock, SageMaker, Textract, Comprehend, Kendra/OpenSearch | foundation models; train/serve; OCR; NLP; semantic search |
Interview Q&A · deep dive
lambda:GetFunctionConfiguration and are baked into the function version — they don't rotate. Store the credential in Secrets Manager (auto-rotation, encrypted with KMS, fetched at runtime) or SSM Parameter Store (SecureString) and grant the function's role read access to that one secret. The DB password never appears in code, config, or CloudTrail.s3:DeleteBucket org-wide means even an account's own admin can't delete buckets — the senior pattern for preventing footguns at scale.Choosing compute decision
The recurring design question. Trade control vs operational burden: more managed = less ops, less control.
| Option | Best for | Watch out |
|---|---|---|
| Lambda | event-driven, spiky, short tasks | time/size limits, cold starts |
| Fargate | containers without managing servers | less node-level control |
| ECS/EKS | long-running container services at scale | you run the orchestration |
| EC2 | full control, special hardware (GPU) | you patch & scale it |
Interview Q&A
"EC2 vs Lambda" is the wrong framing. Compute choice is three independent questions: (1) execution model — request/response, batch, or always-on? (2) packaging — raw process, container, or zip? (3) scaling shape — scale-to-zero or warm baseline? Lambda is "function + zip/container + scale-to-zero"; EKS is "container + tunable + warm baseline." Most "pick the wrong compute" mistakes come from optimising one axis and ignoring another — e.g. choosing Lambda for cost (scale-to-zero) on a path that actually needs predictable p99 latency (warm baseline).
# Lambda: per-request, scale-to-zero. Billed only while running.
# Cost driver = invocations x duration x memory. Idle = $0.
import json
def handler(event, _ctx):
body = json.loads(event["body"])
return {"statusCode": 200,
"body": json.dumps({"score": rank(body)})}
# Fargate/EKS: always-on container. Billed per vCPU-second the task
# exists, even at 0 RPS. Wins once traffic is steady & high.
# break-even rule of thumb: if the box is busy > ~40-50% of the
# day, a right-sized container beats per-invocation Lambda pricing.
| Constraint | Lambda | Fargate / ECS / EKS | EC2 |
|---|---|---|---|
| Max runtime / request | 15 min hard cap | unbounded (long-running) | unbounded |
| Memory ceiling | 10 GB (10,240 MB) | up to ~120 GB / task | up to TBs (instance type) |
| Local /tmp | 512 MB, raisable to 10 GB | container ephemeral / EFS | full EBS volumes |
| GPU | no | EKS yes / Fargate no | yes (P/G instances) |
| Cold start | yes (mitigate w/ SnapStart, prov. concurrency) | task start ~secs, then warm | none once running |
| Default concurrency | 1,000 / region (raisable) | service / cluster limits | instance + ASG limits |
Interview Q&A · deep dive
Serverless & analytics — Lambda, Athena & friends popular
The services that come up most: run code without servers (Lambda), query files in place without a database (Athena), and glue it together with events and ETL. Know the one-liner, the use case, and the gotcha for each.
| Service | What it is | Classic use case |
|---|---|---|
| Lambda | run a function on an event; pay per ms; no servers | S3-upload → process; API backend; cron jobs |
| Athena | serverless SQL directly on S3 files (Presto/Trino) | ad-hoc query logs/CSV/Parquet; pay per TB scanned |
| Glue | serverless ETL + a data catalog (crawlers infer schema) | transform raw → curated; catalog for Athena |
| Step Functions | visual state machine orchestrating Lambdas/services | multi-step serverless workflows with retries |
| EventBridge | event bus — route events by rule to targets | decoupled event-driven architecture |
| SQS / SNS | queue (point-to-point) / pub-sub (fan-out) | buffer load; broadcast notifications |
| API Gateway | managed HTTP/REST front door → Lambda/service | expose serverless APIs with auth + throttling |
def handler(event, context):
for rec in event["Records"]:
bucket = rec["s3"]["bucket"]["name"]
key = rec["s3"]["object"]["key"] # the uploaded file
process(bucket, key) # your logic
return {"status": "ok"} # billed per ms of runtime
SELECT registry, count(*) AS trials
FROM s3_trials -- a table over s3://.../trials/*.parquet
WHERE load_date = '2026-06-01'
GROUP BY registry; -- no DB to load; pay per TB scanned
Interview Q&A
Serverless event sources split into two integration styles, and the failure handling differs completely. Push (synchronous & async invoke): the source calls Lambda — API Gateway (sync), S3 / SNS / EventBridge (async). For async push, Lambda retries twice on failure then drops the event unless you wire a Dead Letter Queue or on-failure destination. Pull (poll-based): Lambda's service polls the source — SQS, Kinesis, DynamoDB Streams — in batches, and a poison message can stall a whole shard until it expires or you configure bisectBatchOnFunctionError + a DLQ. Knowing which style a source uses tells you exactly where messages go to die.
import json, boto3
ddb = boto3.resource("dynamodb").Table("processed_ids")
def handler(event, _ctx):
failures = [] # items to retry, not the whole batch
for rec in event["Records"]:
msg = json.loads(rec["body"])
try:
# idempotency: conditional put fails if we've seen this id
ddb.put_item(Item={"id": msg["id"]},
ConditionExpression="attribute_not_exists(id)")
process(msg) # real work, now exactly-once
except ddb.meta.client.exceptions.ConditionalCheckFailedException:
pass # duplicate delivery — safely skip
except Exception:
failures.append({"itemIdentifier": rec["messageId"]})
# only failed messages return to the queue (needs ReportBatchItemFailures)
return {"batchItemFailures": failures}
| Dimension | Standard | Express |
|---|---|---|
| Max duration | up to 1 year | up to 5 minutes |
| Execution guarantee | exactly-once | at-least-once |
| Pricing model | per state transition ($/1k) | per request + GB-second |
| Best for | long, auditable, human-in-loop workflows | high-volume, short event processing / streaming |
| History | full visual history (~90 days) | logs to CloudWatch only |
Interview Q&A · deep dive
SELECT * compounds all three. Raw CSV with no partitions is the beginner mistake that scans terabytes for a one-day query.maxReceiveCount so the message moves to a Dead Letter Queue after N failures instead of recycling forever, and enable ReportBatchItemFailures so one bad record doesn't fail the whole batch (only the failed itemIdentifier returns). Then alarm on DLQ depth and inspect/replay from there. The root-cause habit: a growing DLQ is a paging signal, not a place messages quietly accumulate.Cloud & service differences decision
Two kinds of "difference" come up: which provider (AWS vs Azure vs GCP — mostly the same primitives, different names) and which service within one (compute, storage, DB tiers). Senior answers map by capability, not brand.
| Capability | AWS | Azure | GCP |
|---|---|---|---|
| VMs | EC2 | Virtual Machines | Compute Engine |
| Serverless fn | Lambda | Functions | Cloud Functions |
| Containers (managed K8s) | EKS | AKS | GKE |
| Object storage | S3 | Blob Storage | Cloud Storage |
| Managed relational | RDS / Aurora | SQL Database | Cloud SQL |
| Data warehouse | Redshift | Synapse | BigQuery |
| NoSQL | DynamoDB | Cosmos DB | Firestore / Bigtable |
| Within AWS | Difference that matters |
|---|---|
| EC2 vs Fargate vs Lambda | you manage the VM → you manage only the container → you manage only the function. Control ↓, ops burden ↓, granularity ↑. |
| S3 vs EBS vs EFS | object store (HTTP, infinite) vs block volume (one EC2, like a disk) vs shared file system (many EC2, NFS). |
| RDS vs DynamoDB | managed relational (SQL, joins, ACID) vs managed NoSQL key-value (scale, single-digit-ms, no joins). |
| RDS vs Aurora | Aurora is AWS's cloud-native MySQL/Postgres-compatible engine — more throughput, storage auto-grows, faster failover. |
Interview Q&A
A Region is a geographic area (e.g. us-east-1); an Availability Zone is one or more discrete data centres inside a Region with independent power/cooling/network, close enough for low-latency sync replication but far enough to fail independently. The design rule: spread across ≥2-3 AZs for high availability (an AZ outage shouldn't take you down), and go multi-Region only for disaster recovery, data-residency law, or global latency — because cross-Region adds real cost, latency, and replication complexity. Below the AZ sits the edge / PoP layer (CloudFront, Route 53) for caching and DNS close to users.
| Layer | IaaS (EC2) | PaaS / managed (RDS, Lambda) | SaaS (Workspaces, M365) |
|---|---|---|---|
| Physical / DC / hardware | provider | provider | provider |
| Hypervisor / network fabric | provider | provider | provider |
| OS & patching | you | provider | provider |
| Runtime / middleware | you | provider | provider |
| App config & access (IAM) | you | you | you |
| Your data & encryption choices | you | you | you |
The one-line version: the provider secures "security of the cloud" (infra), you secure "security in the cloud" (your data, identity, config). Notice the bottom two rows never leave you — data and access are always yours, even in SaaS. Most cloud breaches are misconfiguration in those rows, not a provider failure.
| Model | Discount vs on-demand | Trade | Use for |
|---|---|---|---|
| On-demand | baseline (0%) | none — pay per hour/second | spiky, unpredictable, dev |
| Spot | up to ~70-90% | can be reclaimed w/ ~2 min notice | fault-tolerant batch, CI, stateless |
| Savings Plans / Reserved | up to ~72% | 1- or 3-yr spend/usage commitment | steady always-on baseline |
| Serverless (per-use) | $0 when idle | per-invocation premium at scale | event-driven, bursty, low-duty-cycle |
Interview Q&A · deep dive
AWS ↔ Azure ↔ GCP — the full service Rosetta stone comparison
The complete cross-cloud map. ~90% of primitives are the same idea wearing three names — senior engineers answer by capability, then name the one or two services that actually differentiate a cloud. Grouped by job so you can find any service fast.
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Virtual machines | EC2 | Virtual Machines | Compute Engine |
| Serverless functions | Lambda | Functions | Cloud Run functions |
| Managed Kubernetes | EKS | AKS | GKE |
| Serverless containers | Fargate | Container Apps | Cloud Run |
| PaaS app hosting | Elastic Beanstalk | App Service | App Engine |
| Container registry | ECR | ACR | Artifact Registry |
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Object storage | S3 | Blob Storage | Cloud Storage |
| Block (disk) storage | EBS | Managed Disks | Persistent Disk |
| Shared file storage | EFS | Azure Files | Filestore |
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Managed relational | RDS / Aurora | Azure SQL DB | Cloud SQL / AlloyDB |
| NoSQL document / KV | DynamoDB | Cosmos DB | Firestore / Bigtable |
| In-memory cache | ElastiCache | Cache for Redis | Memorystore |
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Data warehouse | Redshift | Synapse / Fabric | BigQuery |
| Managed Spark / big data | EMR | HDInsight | Dataproc |
| ETL / data integration | Glue | Data Factory | Dataflow / Data Fusion |
| Query-in-place (lake) | Athena | Synapse Serverless | BigQuery |
| Streaming ingest | Kinesis | Event Hubs | Pub/Sub |
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Message queue | SQS | Service Bus | Pub/Sub |
| Pub/sub & events | SNS / EventBridge | Event Grid | Pub/Sub / Eventarc |
| Workflow orchestration | Step Functions | Logic Apps | Workflows |
| API gateway | API Gateway | API Management | API Gateway / Apigee |
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Virtual network | VPC | VNet | VPC |
| Load balancer | ELB / ALB / NLB | Load Balancer / App Gateway | Cloud Load Balancing |
| DNS | Route 53 | Azure DNS | Cloud DNS |
| CDN | CloudFront | Front Door / CDN | Cloud CDN |
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Identity & access (IAM) | IAM | Entra ID + RBAC | Cloud IAM |
| Secrets | Secrets Manager | Key Vault | Secret Manager |
| Key management | KMS | Key Vault | Cloud KMS |
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Metrics & logs | CloudWatch | Monitor | Cloud Monitoring / Logging |
| Distributed tracing | X-Ray | App Insights | Cloud Trace |
| Native IaC | CloudFormation / CDK | ARM / Bicep | Deployment Manager |
| CI/CD | CodePipeline / CodeBuild | Azure DevOps / Pipelines | Cloud Build |
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Managed foundation models | Bedrock | Azure OpenAI / AI Foundry | Vertex AI |
| Full ML platform | SageMaker | Azure Machine Learning | Vertex AI |
| Vector / semantic search | OpenSearch / Kendra | AI Search | Vertex Vector Search |
| Document extraction | Textract | Document Intelligence | Document AI |
| If… | Lean | Because |
|---|---|---|
| already on Microsoft 365 / need governed OpenAI models | Azure | first-party Azure OpenAI + tight identity & 365 integration |
| data already in a warehouse, analytics/ML-heavy | GCP | BigQuery + Vertex is the smoothest data → model path |
| want the widest catalogue + multi-vendor models | AWS | Bedrock (Claude / Llama / Mistral / Nova) + deepest breadth |
Interview Q&A
The Rosetta tables above are ~90% honest, but a senior answer flags the ~10% where the equivalence leaks — and that's where the differentiating decisions live. Naming a few of these is the difference between "I memorised a chart" and "I've actually run workloads on more than one cloud."
| "Equivalent" pair | The mismatch that matters |
|---|---|
| DynamoDB ≈ Cosmos DB ≈ Firestore/Bigtable | Cosmos is multi-model + tunable consistency (5 levels); Firestore (document) and Bigtable (wide-column, no secondary indexes) are two different products — DynamoDB sits between them. Not interchangeable. |
| Redshift ≈ Synapse/Fabric ≈ BigQuery | BigQuery is fully serverless, separates storage/compute by default; Redshift is cluster-based (RA3 separates them; Serverless is newer). Different cost and tuning model entirely. |
| Lambda ≈ Azure Functions ≈ Cloud Run functions | GCP folded Functions into Cloud Run (request-based, container-native, can scale to many concurrent requests per instance) — a different concurrency model from Lambda's one-request-per-env. |
| SQS ≈ Service Bus ≈ Pub/Sub | Azure Service Bus is an enterprise broker (sessions, transactions, topics); GCP Pub/Sub is one service doing both queue and pub/sub — AWS splits that across SQS + SNS. |
| IAM ≈ Entra ID + RBAC ≈ Cloud IAM | AWS IAM is policy-on-resource/principal; Azure splits identity (Entra ID) from authorization (RBAC roles + scopes); GCP is role bindings on a resource hierarchy (org→folder→project). The mental model differs, not just the name. |
Engineers porting between clouds underestimate this: compute and storage map almost trivially, but the permission model is genuinely different in shape. AWS attaches JSON policies to identities and resources and evaluates an explicit-deny-wins union. Azure separates who you are (Entra ID) from what you can do where (RBAC role assignment at a scope). GCP binds roles to members on a hierarchical tree where permissions inherit downward. Re-implementing least-privilege correctly across these three is where multi-cloud migrations actually spend their time.
Interview Q&A · deep dive
AWS vs Azure vs GCP — the AI/ML lane your lane
The general compare card maps compute / storage / DB. This one maps the part you actually own: where the models, training, and GenAI services live on each cloud — and how to answer “why this cloud?” for an AI workload.
| Capability | AWS | Azure | GCP |
|---|---|---|---|
| Managed foundation models (GenAI) | Bedrock | Azure OpenAI / AI Foundry | Vertex AI |
| Full ML platform (train + deploy) | SageMaker | Azure Machine Learning | Vertex AI |
| Vector / semantic search | OpenSearch · Kendra | AI Search | Vertex Vector Search |
| Document / data extraction | Textract | Document Intelligence | Document AI |
| Notebooks / dev surface | SageMaker Studio | Azure ML Studio | Vertex Workbench |
| Warehouse (the ML data source) | Redshift | Synapse / Fabric | BigQuery |
S3 → EKS → OpenSearch → Bedrock → CloudWatch≈ Azure
Blob → AKS → AI Search → Azure OpenAI → Monitor≈ GCP
Cloud Storage → GKE → Vector Search → Vertex AI → Cloud Monitoring
Interview Q&A
Two big rebrands landed and an interviewer will probe whether you track them. Azure AI Foundry → Microsoft Foundry (effective Jan 1 2026) — it folds the old Azure OpenAI Service, AI Studio and AI Services into one resource; the Azure OpenAI SKU still exists and still ships new GPT models, so saying "Azure OpenAI is dead" is wrong. Vertex AI → Gemini Enterprise Agent Platform (announced Apr 22 2026) — Model Garden, Vector Search, RAG Engine, Custom Training and Pipelines all live on under it. Say the capability, then footnote the current brand; that signals you reason in primitives, not press releases.
| Capability | AWS | Azure (Microsoft Foundry) | GCP (Gemini Ent. Agent Platform) |
|---|---|---|---|
| House models | Amazon Nova 2 (Lite/Pro) | OpenAI GPT family (1st-party) | Gemini 3.x · Imagen · Veo |
| Managed RAG / knowledge base | Bedrock Knowledge Bases | Foundry + Azure AI Search | Vertex AI Search · RAG Engine |
| Managed agents | Bedrock Agents · Strands | Foundry Agent Service | Agent Builder / ADK · Agent Garden |
| Safety / guardrails | Bedrock Guardrails (6 policies) | Azure AI Content Safety | Vertex safety filters |
| Model catalogue breadth | 110+ models · 18 providers | 11,000+ models in catalog | 200+ in Model Garden |
Interview Q&A · deep dive
Azure OpenAI SKU is still creatable, existing endpoints/keys keep working, and it still receives new GPT models. It's a reorganization and superset (adds non-OpenAI models, agents, observability), not a deprecation.ML & GenAI services your lane
AWS offers managed ML at two altitudes: SageMaker for building/training/serving your own models, and Bedrock for consuming hosted foundation models via API (incl. building RAG/agents) without managing infrastructure.
| Service | Use it to… |
|---|---|
| SageMaker | train, tune, register, deploy custom models; managed notebooks & pipelines |
| Bedrock | call foundation models via API; managed RAG (knowledge bases), agents, guardrails |
| OpenSearch | keyword + vector search backend for RAG |
| Textract / Comprehend | extract text from docs · NLP (entities, sentiment) |
Interview Q&A
"Bedrock" isn't one thing — it's a kit. Naming the four pieces separately is what separates "I've used the chat API" from "I've shipped a GenAI system." Knowledge Bases = managed RAG (ingest → chunk → embed → store → retrieve). Agents = the model plans + calls your tools/APIs in a loop (action groups). Guardrails = a policy layer you attach to either, with six controls: denied topics, content filters, word filters, PII redaction, prompt-attack detection, and contextual-grounding + Automated Reasoning hallucination checks. Flows = a visual graph chaining all of the above.
| Service | Altitude | Reach for it when… |
|---|---|---|
| Bedrock Knowledge Bases | managed RAG | you want grounding without writing the ingest pipeline |
| Bedrock Agents | tool-using loop | the model must take actions (query a DB, call an API) |
| Bedrock Guardrails | policy filter | PII, denied topics, or hallucination grounding is required |
| SageMaker Unified Studio | data + AI IDE | one workbench over Glue, Athena, Redshift, EMR + SageMaker AI |
| Kendra | enterprise search | connector-driven retrieval (SharePoint, S3, Salesforce) |
| Comprehend · Textract | narrow NLP/OCR | entities/sentiment · text + tables out of PDFs |
import boto3, json
# 1) Plain generation via the unified Converse API (model-agnostic shape)
brt = boto3.client("bedrock-runtime", region_name="us-east-1")
resp = brt.converse(
modelId="anthropic.claude-sonnet-4-v1:0",
messages=[{"role": "user",
"content": [{"text": "Summarise our refund policy in 2 lines."}]}],
inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(resp["output"]["message"]["content"][0]["text"])
# 2) Managed RAG: retrieve from a Knowledge Base, then generate — one call
agent = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
rag = agent.retrieve_and_generate(
input={"text": "What is our SLA for enterprise tier?"},
retrieveAndGenerateConfiguration={
"type": "KNOWLEDGE_BASE",
"knowledgeBaseConfiguration": {
"knowledgeBaseId": "KB12345678",
"modelArn": "anthropic.claude-sonnet-4-v1:0",
},
},
)
print(rag["output"]["text"])
# citations carry source S3 URIs — surface them, never trust ungrounded text
for c in rag.get("citations", []):
print(c["retrievedReferences"])
Interview Q&A · deep dive
converse gives one normalized request/response shape (messages, system, tool config, inference params) across all Bedrock model families, plus streaming via converse_stream. invoke_model takes each vendor's raw body, so switching models means rewriting the payload. Converse is the model-portability seam; reach for raw invoke only for vendor features Converse hasn't surfaced yet.Reference architecture: a production RAG service on AWS system design
A concrete, defensible design you can sketch on a whiteboard — split into the offline ingestion path and the online query path.
Interview Q&A
The existing card lists the boxes; the senior signal is narrating why each hop exists and what fails without it. The online path is latency-budgeted: every box adds milliseconds you must justify.
"Use OpenSearch" is the safe default but a real interview wants the tradeoff. As of 2026, Bedrock Knowledge Bases can sit on OpenSearch Serverless, Aurora PostgreSQL (pgvector), Neptune Analytics, S3 Vectors (GA Dec 2025), Pinecone, MongoDB Atlas, or Redis.
| Store | Strength | Watch out |
|---|---|---|
| OpenSearch Serverless | low-latency, hybrid search, rich filters | ~4 OCU floor → real monthly minimum even when idle |
| Aurora pgvector | cheap, SQL + vectors in one DB, joins | tune HNSW/IVF; not a search engine — fewer filter tricks |
| S3 Vectors | zero idle cost, big cost savings, cold tier | ~100ms warm latency — pair as cold tier behind OpenSearch |
| Pinecone / Mongo / Redis | portable, multi-cloud, familiar ops | another vendor + egress; less native IAM story |
# Terraform: the app's task role can ONLY invoke one model + read one prefix
data "aws_iam_policy_document" "rag_app" {
statement {
actions = ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"]
resources = ["arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-v1:0"]
}
statement {
actions = ["aoss:APIAccessAll"] # OpenSearch Serverless data-plane
resources = ["arn:aws:aoss:us-east-1:123456789012:collection/rag-prod"]
}
statement {
actions = ["s3:GetObject"]
resources = ["arn:aws:s3:::ci-radar-docs/prod/*"] # read-only, one prefix
}
}
# No bedrock:* , no s3:* — blast radius if the task is compromised stays tiny
Interview Q&A · deep dive
maxTokens cap. Compute autoscaling is the easy part; these two are where senior judgement shows.tenant_id and enforce it as a mandatory metadata filter on every retrieval — never rely on the prompt to "remember" the tenant. Back it with per-tenant IAM/encryption (KMS keys) where regulation demands hard isolation, and consider separate indices/collections for the largest tenants. The classic leak is a query that returns another tenant's chunks because the filter was optional.Well-Architected pillars framework
AWS's design checklist — handy vocabulary for "how would you make this production-grade?" Six pillars:
| Pillar | Question it forces |
|---|---|
| Operational excellence | can you deploy, observe, and recover smoothly? |
| Security | least privilege, encryption, auditability? |
| Reliability | does it self-heal and degrade gracefully? |
| Performance efficiency | right-sized resources, scales with load? |
| Cost optimisation | paying only for what you use? |
| Sustainability | minimising resource/energy footprint? |
Interview Q&A
Naming the six pillars is table stakes. The differentiator is knowing the framework extends through Lenses — workload-specific overlays. At re:Invent 2025 AWS shipped a new Responsible AI Lens and refreshed the Generative AI Lens and Machine Learning Lens. The GenAI Lens applies all six pillars across six lifecycle phases — scoping → model selection → customization → development → deployment → continuous improvement — and now includes an agentic-AI preamble. Mentioning a Lens by name in a design review is the senior tell.
Pillars conflict, and naming the tension is the interview gold. Optimizing one usually taxes another; "well-architected" means making the tradeoff consciously.
| Tension | What pulls each way | How to resolve |
|---|---|---|
| Cost ↔ Reliability | multi-AZ + spare capacity costs money | match redundancy to the SLA, not to fear |
| Performance ↔ Cost | bigger instances / provisioned throughput | autoscale + cache; pay for peak only at peak |
| Security ↔ Operational ex. | least privilege slows shipping | automate access via IaC + short-lived roles |
| Sustainability ↔ Performance | idle headroom wastes energy | right-size, Graviton, scale-to-zero where possible |
Interview Q&A · deep dive
Cloud cheat sheet — the codes worth memorising recall
The fast-recall layer: the CLI you actually type, the service → job one-liners you name-drop, the “if X reach for Y” shortcuts, and the gotchas. Built for the night-before scan.
# identity · always start here
aws sts get-caller-identity # which role / account am I?
# S3 · the data-lake workhorse
aws s3 ls s3://bucket/prefix/
aws s3 cp file.csv s3://bucket/ --sse # upload, encrypted at rest
aws s3 sync ./local s3://bucket/ --delete # mirror a folder
# compute
aws ec2 describe-instances --filters "Name=instance-state-name,Values=running"
aws lambda invoke --function-name fn out.json
# logs · debugging in prod
aws logs tail /aws/lambda/fn --follow # live tail
# IAM · least-privilege check
aws iam list-attached-role-policies --role-name myrole
| S3 | cheap infinite object storage — the gravity centre of every data/ML system |
| EC2 · Fargate · Lambda | VM → serverless container → function; control ↓, ops ↓ |
| RDS · DynamoDB | managed SQL with joins / managed NoSQL at single-digit-ms scale |
| VPC · SG · IAM | private network · instance firewall · who-can-do-what |
| SQS · SNS · EventBridge | queue · pub/sub · event bus — decouple everything |
| CloudWatch · CloudTrail | metrics/logs · audit trail of every API call |
| Bedrock · SageMaker | call foundation models (no infra) · train/serve your own |
| If you need… | Reach for |
|---|---|
| event-driven, spiky, short task | Lambda (watch cold starts) |
| steady long-running service | Fargate / EKS behind an ALB |
| GPU / full host control | EC2 GPU instance |
| ad-hoc SQL over files in S3 | Athena |
| scheduled ETL / cataloguing | Glue |
| decouple producers / consumers | SQS (queue) or SNS / EventBridge (fan-out) |
| store an API key / DB password | Secrets Manager (never env vars in code) |
| multi-cloud provisioning | Terraform, not CloudFormation |
| AZ availability zone | SG security group | AMI machine image |
| IAM identity & access mgmt | VPC virtual private cloud | ALB / NLB app / network LB |
| ASG auto-scaling group | NACL subnet firewall | TTL time-to-live |
Interview Q&A
Interviewers love "what's the X equivalent on Azure/GCP?" Memorise the rows, not the brands — and keep the 2026 renames straight.
| Job | AWS | Azure | GCP |
|---|---|---|---|
| Object storage | S3 | Blob Storage | Cloud Storage |
| Serverless function | Lambda | Functions | Cloud Functions / Run |
| Managed K8s | EKS | AKS | GKE |
| Identity / IAM | IAM | Entra ID | Cloud IAM |
| Secrets | Secrets Manager | Key Vault | Secret Manager |
| Warehouse | Redshift | Synapse / Fabric | BigQuery |
| GenAI front door | Bedrock | Microsoft Foundry | Gemini Agent Platform |
# Bedrock · what can I call, and quick smoke-test a model
aws bedrock list-foundation-models --query "modelSummaries[].modelId"
aws bedrock-runtime converse --model-id anthropic.claude-sonnet-4-v1:0 \
--messages '[{"role":"user","content":[{"text":"ping"}]}]'
# Assume a role explicitly (the right way to get scoped, short-lived creds)
aws sts assume-role --role-arn arn:aws:iam::123:role/deploy \
--role-session-name ci --duration-seconds 3600
# SSM Parameter Store · config & secrets without baking them in
aws ssm get-parameter --name /prod/db/url --with-decryption
# Terraform · the multi-cloud provisioning loop
terraform plan -out tf.plan # preview — never apply blind
terraform apply tf.plan # apply the exact reviewed plan
terraform state list # what does state think exists?
| If you need… | Reach for |
|---|---|
| ship a grounded chatbot fast | Bedrock Knowledge Bases (managed RAG) |
| the model to call your APIs | Bedrock Agents (action groups) |
| block PII / jailbreaks in/out | Bedrock Guardrails |
| cheapest vector store, has SQL | Aurora pgvector |
| zero-idle-cost vectors | S3 Vectors (cold tier) |
| one IDE over data + ML | SageMaker Unified Studio |
| config without secrets in code | SSM Parameter Store / Secrets Manager |
Interview Q&A · deep dive
aws sts get-caller-identity (who/what am I, which account), then aws iam list-attached-role-policies or simulate-principal-policy to learn what I can actually do, then the resource inventory for the task (e.g. aws s3 ls, aws ec2 describe-instances). Identity first, permissions second, resources third — never act before you know your blast radius.terraform plan -out then apply <plan> instead of bare terraform apply?plan -out tf.plan freezes an explicit, reviewable artifact; apply tf.plan runs exactly that. In CI this gives you a human-approvable diff and prevents the "it applied something I didn't see" class of incident.VPC & networking — how a packet actually reaches your service network plane
A VPC is your own private slice of the AWS network — a CIDR block (e.g. 10.0.0.0/16) carved into subnets, each pinned to one Availability Zone. The whole game is reachability: a subnet is "public" or "private" not by a checkbox but by what its route table points at. Master the path a request takes and the firewalls it passes, and the other 90% of AWS networking falls out of that.
There is no "public subnet" attribute. A subnet is public when its route table has a 0.0.0.0/0 → igw-… route (an Internet Gateway) and its instances have public IPs. A subnet is private when its default route points at a NAT Gateway (outbound-only to the internet) or at nothing internet-facing. The IGW is a two-way door for things with a public IP; the NAT is a one-way valve so private instances can call out (pull packages, hit APIs) but the world can't call in. Put load balancers and bastions in public subnets; put app servers and databases in private ones.
Two firewalls operate at different layers. A security group guards the instance/ENI, is stateful (return traffic for an allowed request is auto-permitted), and has allow rules only. A network ACL guards the whole subnet, is stateless (you must allow both inbound and the ephemeral-port return traffic explicitly), and supports deny rules evaluated in numbered order. SGs are your primary, everyday control; NACLs are a coarse blast-door for subnet-wide blocks (e.g. ban a bad CIDR).
| Dimension | Security group | Network ACL |
|---|---|---|
| Scope | instance / ENI | entire subnet |
| State | stateful (return auto-allowed) | stateless (allow both directions) |
| Rules | allow only | allow and deny, numbered order |
| Default | deny all in, allow all out | default NACL allows all both ways |
| Can reference | other SGs (chaining) | CIDR ranges only |
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true # needed for private DNS on endpoints
}
# Public subnet — route to the Internet Gateway
resource "aws_subnet" "public" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = "us-east-1a"
map_public_ip_on_launch = true
}
resource "aws_internet_gateway" "igw" { vpc_id = aws_vpc.main.id }
# NAT lives in the PUBLIC subnet; private subnets route 0.0.0.0/0 at it
resource "aws_nat_gateway" "nat" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public.id # NAT must sit in a public subnet
}
resource "aws_subnet" "private" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.2.0/24"
availability_zone = "us-east-1a" # match the NAT's AZ to avoid cross-AZ $
}
# Free gateway endpoint — keeps S3 traffic off the NAT entirely
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.s3"
vpc_endpoint_type = "Gateway" # $0 — adds a route-table entry
}
| Need | Use | Why |
|---|---|---|
| Two VPCs talk privately | VPC Peering | 1:1, non-transitive, no overlapping CIDRs; cheapest for a pair |
| Many VPCs + on-prem, hub-spoke | Transit Gateway | one router for N networks; transitive routing; scales past peering mesh |
| Expose ONE service privately | PrivateLink (interface endpoint) | consumer reaches your service via a private ENI; no route/CIDR coupling |
| Reach AWS service privately | VPC endpoint | gateway (S3/DynamoDB, free) or interface (PrivateLink, paid) |
Interview Q&A · deep dive
0.0.0.0/0 route pointing at an Internet Gateway, and instances in it have public/elastic IPs (plus SG/NACL allowing the traffic). Remove the IGW route and the same subnet becomes private. "Public" is a property of routing, not a subnet setting.IAM deep dive — how a request is allowed or denied control plane
IAM is the gate in front of every AWS API call. Stop thinking "users and passwords" — think identities (users, roles, services) presenting credentials, and a policy evaluation engine that says yes or no. The single most valuable thing to internalise is the evaluation algorithm: default deny → explicit deny always wins → an allow must survive every layer. Get that and IAM stops being mysterious.
A policy is JSON with one or more statements. Each has an Effect (Allow/Deny), an Action (e.g. s3:GetObject), a Resource (an ARN), and optional Condition keys. Identity-based policies attach to a user/role/group ("what can this identity do"). Resource-based policies attach to the resource ("who may touch me", with a Principal) — an S3 bucket policy is the canonical example. The two combine.
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "ReadOneBucketOnly",
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:ListBucket"],
"Resource": [
"arn:aws:s3:::trials-curated",
"arn:aws:s3:::trials-curated/*"
],
"Condition": { // least privilege, tightened
"Bool": { "aws:SecureTransport": "true" }, // require TLS
"StringEquals": { "aws:PrincipalTag/team": "data" }
}
}]
}
When a request arrives, AWS starts from an implicit deny and walks the layers. An explicit Deny anywhere short-circuits to DENY — no Allow can override it. To be allowed, the action must be permitted at every applicable layer (it's an intersection): the SCP (org guardrail), the permission boundary (if set), and an identity- or resource-based Allow, plus any session policy. Miss an Allow at one layer and the request is denied even if the others allow it.
import boto3
# 1) Ask STS for short-lived creds by assuming a role in another account.
# No long-lived keys travel anywhere — the role's trust policy gates this.
sts = boto3.client("sts")
creds = sts.assume_role(
RoleArn="arn:aws:iam::222233334444:role/DataReader",
RoleSessionName="radar-etl", # shows up in CloudTrail — auditable
DurationSeconds=3600,
)["Credentials"]
# 2) Use the temporary credentials. They auto-expire — nothing to rotate.
s3 = boto3.client(
"s3",
aws_access_key_id=creds["AccessKeyId"],
aws_secret_access_key=creds["SecretAccessKey"],
aws_session_token=creds["SessionToken"], # the token is mandatory for STS creds
)
for obj in s3.list_objects_v2(Bucket="trials-curated").get("Contents", []):
print(obj["Key"])
# The DataReader role's TRUST policy must name account 1111... as Principal,
# i.e. who is allowed to assume it; its PERMISSION policy says what it can do.
| Control | Applies to | What it does | Can it grant? |
|---|---|---|---|
| Identity policy | a user/role | grants permissions | yes |
| Permission boundary | a single user/role | caps the max that identity can have | no — only limits |
| SCP | whole account / OU | org-wide guardrail / max | no — only limits |
| Resource policy | a resource (bucket, queue) | says who may touch it | yes |
Interview Q&A · deep dive
aws:SecureTransport).Principal and sts:AssumeRole. The permission policy answers "what can the assumed role do." Both must be right: the caller needs sts:AssumeRole permission AND the role's trust policy must list that caller as a principal.sts:AssumeRole on a role in account B. B's role trust policy must name A (account or specific role) as principal; A's identity must be allowed sts:AssumeRole on that ARN. STS returns short-lived credentials scoped to B's role permissions. Optionally add an ExternalId condition to defend against the "confused deputy" problem when a third party assumes on your behalf.Cloud cost & FinOps — where the money actually goes finops
The cloud bill is an engineering artifact, not a finance one. Two systems with identical features can differ 5x in cost based on choices engineers make: pricing model, data-transfer paths, and idle resources. FinOps is the practice of making cost a first-class, observable metric — owned by the teams that create it. The leverage is in three places: commit to baseline, use Spot for the flexible part, and stop paying the silent taxes (egress, idle, NAT, logs).
| Model | Discount vs on-demand | Best for | Catch |
|---|---|---|---|
| On-demand | 0% (baseline) | spiky, unpredictable, short-lived | most expensive per hour |
| Compute Savings Plan | up to ~66% | steady baseline, flexible across EC2/Fargate/Lambda | 1 or 3-yr $/hr commitment |
| EC2 Instance Savings Plan / Standard RI | up to ~72% | steady, known instance family/region | least flexible; locked to a family |
| Spot | up to ~90% | fault-tolerant batch, CI, stateless workers | can be reclaimed with ~2 min notice |
The mature pattern uses all of them at once: a Savings Plan covering the always-on baseline, Spot for interruption-tolerant work (batch, training, CI runners), and on-demand only for the unpredictable spillover. Committing to 100% of current usage is a trap — commit to the floor you're confident persists (often ~70-80% of baseline), leave headroom for change.
| Cost trap | Why it sneaks up | Fix |
|---|---|---|
| Data egress | inbound is free; outbound to internet and cross-AZ/region is metered | keep traffic in-AZ; CloudFront for egress; co-locate chatty services |
| NAT Gateway | ~$0.045/hr + ~$0.045/GB processed, on top of transfer | VPC gateway endpoints for S3/DynamoDB (free); interface endpoints for ECR/Secrets |
| Idle / orphaned | unattached EBS, old snapshots, unused EIPs, dev boxes left on | scheduled stop/start; lifecycle on snapshots; tag + sweep |
| CloudWatch Logs | ingestion + indefinite retention bills forever | set retention; sample/filter; ship cold logs to S3 |
| Over-provisioned | "just in case" instance sizes; gp2 over gp3 | right-size from metrics; gp3; Graviton (ARM) for ~20% better price-perf |
import boto3
ec2 = boto3.client("ec2")
waste = []
# 1) Unattached EBS volumes — you pay for provisioned GB even when idle.
for v in ec2.describe_volumes(
Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]:
waste.append(("orphan-ebs", v["VolumeId"], v["Size"])) # GB still billed
# 2) Unassociated Elastic IPs — an idle public IP is charged hourly.
for a in ec2.describe_addresses()["Addresses"]:
if "AssociationId" not in a:
waste.append(("idle-eip", a["PublicIp"], None))
# 3) Cost allocation: untagged instances can't be charged back to a team.
for r in ec2.describe_instances()["Reservations"]:
for i in r["Instances"]:
tags = {t["Key"] for t in i.get("Tags", [])}
if "team" not in tags or "env" not in tags:
waste.append(("untagged", i["InstanceId"], None))
for kind, rid, size in waste:
print(f"{kind:12} {rid} {size or ''}") # feed into a ticket / Slack alert
Interview Q&A · deep dive
Security & Cryptography
Security at a senior level is threat-models and defaults, not a checklist. Frame every topic as: who is the adversary, what's the asset, where's the trust boundary, what's the blast radius. Anchored to the Kubernetes mutual-TLS mesh you run, the LLM agents you ship, and the pharma data you protect.
AuthN, AuthZ & RBAC foundation
Authentication = who are you (identity). Authorization = what may you do (permission). Conflating the two is the most common junior error — they are separate stages with separate failure modes.
| Model | How it grants | When |
|---|---|---|
| RBAC | permission sets bound to roles, roles to users | most systems — coarse, auditable, default |
| ABAC | policies over resource/subject attributes | fine-grained, context-dependent access |
| ReBAC | relationship graph (owner-of, member-of) | sharing/hierarchies (Zanzibar-style) |
Interview Q&A
A request crosses four distinct checks, and each one fails differently. AuthN establishes a principal (a verified identity). AuthZ maps that principal to permissions on a resource. Admission/policy applies orthogonal rules (quotas, defaulting, org policy) that aren't about identity at all. Audit records who did what. The deepest junior error after conflating AuthN/AuthZ is forgetting that authorization must be re-checked per resource — passing the front door does not authorise touching object #42.
| Axis | Session (stateful) | Token / JWT (stateless) |
|---|---|---|
| State | server stores session; cookie holds an opaque id | server stores nothing; claims travel in the token |
| Revocation | instant — delete the server record | hard — valid until expiry unless you keep a denylist |
| Scale | needs shared store (Redis) across nodes | self-contained, scales horizontally for free |
| Best for | classic web apps, easy logout-everywhere | APIs, microservices, short-lived access tokens |
OAuth2 vs OIDC: OAuth2 is an authorization framework — it issues access tokens that say "this client may call that API". OIDC layers authentication on top, adding an ID token (a JWT about the user) and a /userinfo endpoint. Rule of thumb: access token = for an API, ID token = for the client app to learn who logged in. Never use an ID token to call an API.
import time, jwt # PyJWT
from jwt import PyJWKClient
# Fetch the IdP's public signing keys (cached); never hardcode a key
ISSUER = "https://login.example.com/"
AUDIENCE = "api://trainhub"
jwks = PyJWKClient("https://login.example.com/.well-known/jwks.json")
def verify(token: str) -> dict:
key = jwks.get_signing_key_from_jwt(token).key
claims = jwt.decode(
token, key,
algorithms=["RS256"], # pin alg — block the alg=none / HS256 confusion attack
audience=AUDIENCE, # must match: stops token-from-another-app reuse
issuer=ISSUER, # must match: stops token from a rogue issuer
options={"require": ["exp", "iat", "aud", "iss"]},
) # raises on bad sig / expiry / aud / iss
return claims
# AuthZ is a SEPARATE step — a valid token is not a yes
def authorize(claims: dict, need: str) -> bool:
return need in claims.get("scope", "").split()
Interview Q&A · deep dive
algorithms=["RS256"] instead of trusting the token's header?alg:none (some libs then skip signature verification entirely) and RS256→HS256 confusion (the attacker sets alg:HS256 and signs with the public RSA key, which a naive verifier uses as the HMAC secret). Pinning the expected algorithm server-side neutralises both.aud is an API; the resource server validates it and reads scope. The ID token's aud is the client application; it carries user identity claims (sub, email, name) and should never be forwarded to an API as authorization. Sending an ID token to an API is a common misconfiguration.PKI, TLS & certificate hygiene crypto
TLS gives you three things at once: identity (certs), confidentiality (encryption), and integrity (MAC). Kubernetes is a mutual-TLS mesh — almost every hop authenticates both ends with a certificate, which makes it the best real-world PKI to reason about.
| CA | Signs | Protects |
|---|---|---|
| kubernetes-ca | apiserver, kubelet-client, admin certs | the general control-plane mesh |
| etcd-ca | etcd server/peer, apiserver-etcd-client | the cluster datastore |
| front-proxy-ca | front-proxy client | aggregated API extension |
Interview Q&A
The whole point of a handshake is to bootstrap fast symmetric encryption using slow asymmetric crypto for trust only. Asymmetric (RSA/ECDSA) is used to prove identity (the cert signature) and agree a key (ECDHE), then the bulk data flows under a symmetric cipher (AES-GCM / ChaCha20-Poly1305) which is orders of magnitude faster. So: asymmetric = trust + key agreement, symmetric = throughput. A cert is just a public key plus identity (SAN) wrapped in a signature from a CA you already trust.
| Aspect | TLS 1.2 | TLS 1.3 (RFC 8446) |
|---|---|---|
| Round trips | 2-RTT to first byte | 1-RTT; 0-RTT for resumption |
| Key exchange | static RSA allowed | ephemeral only (ECDHE) — forward secrecy mandatory |
| Cipher suites | large, many weak (CBC, RC4) | 5 AEAD-only suites; legacy removed |
| Handshake privacy | cert sent in cleartext | cert encrypted after key exchange |
Forward secrecy is the headline: because every session uses a fresh ephemeral ECDHE key that's never written to disk, stealing the server's long-term private key tomorrow doesn't decrypt traffic you captured today. The cost: 0-RTT data is replayable and lacks full forward secrecy, so it must be limited to idempotent requests.
# Inspect the live cert a host actually serves (SAN + expiry)
import ssl, socket, datetime
def peek(host, port=443):
ctx = ssl.create_default_context() # verifies chain to system CA store
with socket.create_connection((host, port), timeout=5) as s:
with ctx.wrap_socket(s, server_hostname=host) as tls:
cert = tls.getpeercert() # raises if chain/SAN invalid
sans = [v for t, v in cert["subjectAltName"] if t == "DNS"]
exp = datetime.datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z")
left = (exp - datetime.datetime.utcnow()).days
print(f"SANs={sans} expires_in={left}d")
if left < 21: # alert window — rotate BEFORE the outage
raise SystemExit(f"ROTATE {host}: only {left} days left")
peek("api.example.com")
Interview Q&A · deep dive
notBefore/notAfter evaluated against a wrong system clock makes a good cert look expired or not-yet-valid.OWASP Top 10 + the LLM Top 10 threats
Don't memorise the list — know the shapes. The recurring web risks plus the new LLM-app risks that land directly on your RAG pipelines and agentic bots.
| Classic risk | What it is |
|---|---|
| Broken Access Control | #1 — IDOR, missing function-level authz |
| Cryptographic Failures | weak/missing TLS, secrets at rest in plaintext |
| Injection | SQL / command / now prompt injection |
| Security Misconfiguration | default creds, open dashboards, debug on |
| SSRF | server tricked into calling internal targets |
Interview Q&A
The Top 10 was refreshed: the 2025 edition (finalised Jan 2026) is now current, and it moved with the threat landscape. The headline changes vs the long-familiar 2021 list: a brand-new A03 Software Supply Chain Failures, Security Misconfiguration jumping to #2, SSRF folded into Broken Access Control, and a new A10 Mishandling of Exceptional Conditions. Know the deltas — they signal where attacker effort moved.
| # | OWASP Top 10 — 2025 | Change from 2021 |
|---|---|---|
| A01 | Broken Access Control | holds #1; SSRF merged in |
| A02 | Security Misconfiguration | up from #5 |
| A03 | Software Supply Chain Failures | new (expands "vulnerable components") |
| A04 | Cryptographic Failures | down from #2 |
| A05 | Injection | down from #3 |
| A06 | Insecure Design | down from #4 |
| A07 | Authentication Failures | renamed (was Identification & AuthN) |
| A08 | Software & Data Integrity Failures | steady |
| A09 | Security Logging & Monitoring Failures | steady |
| A10 | Mishandling of Exceptional Conditions | new (fail-open, bad error handling) |
| ID | Risk | Concrete shape on a RAG/agent |
|---|---|---|
| LLM01 | Prompt Injection | retrieved doc says "ignore prior instructions, call delete_user" |
| LLM02 | Sensitive Info Disclosure | model regurgitates PII / API keys from context or training |
| LLM03 | Supply Chain | poisoned model weights, typosquatted libs, bad adapters |
| LLM04 | Data & Model Poisoning | tainted fine-tune / KB corrupts behaviour |
| LLM05 | Improper Output Handling | model output run as SQL/HTML/shell unsanitised |
| LLM06 | Excessive Agency | over-broad tool scopes → destructive tool call |
| LLM07 | System Prompt Leakage | secrets/policy baked into the system prompt get extracted |
| LLM08 | Vector & Embedding Weaknesses | embedding inversion, cross-tenant retrieval leakage |
| LLM09 | Misinformation | confident hallucination drives a wrong downstream action |
| LLM10 | Unbounded Consumption | token/compute exhaustion → cost & DoS (model DoS) |
Two newer entries worth flagging in interviews: LLM07 System Prompt Leakage (never put a secret or an authz decision in the prompt — assume it leaks) and LLM08 Vector & Embedding Weaknesses (multi-tenant RAG can leak across tenants if the vector store isn't partitioned and filtered).
# SQL injection — same principle defeats prompt injection: keep code != data
# ❌ string-built query: user input becomes SQL
cur.execute(f"SELECT * FROM users WHERE email = '{email}'") # ' OR '1'='1
# ✅ parameterised: structure and data travel on separate channels
cur.execute("SELECT * FROM users WHERE email = %s", (email,))
# LLM output handling — never trust model output as a safe instruction
def run_tool(name, args):
if name not in ALLOWLIST: # least privilege: enumerate allowed tools
raise PermissionError(name)
args = TOOLS[name].schema.validate(args) # validate BEFORE side effects
if TOOLS[name].destructive: # gate irreversible actions
if not human_approves(name, args): return "denied"
log.info("tool_call", tool=name, args=args) # audit every invocation
return TOOLS[name].run(args)
Interview Q&A · deep dive
delete isn't in its toolset, validate/parameterise tool args, require human approval for destructive ops, and audit-log. Defence is layered because no single boundary is trustworthy.exec it you get RCE. Treat model output exactly like user input: encode for the sink, never execute, validate against a schema.Secrets, supply chain & Zero Trust defence-in-depth
The three programmes that separate "we have a firewall" from a real security posture: keeping secrets out of code, trusting your build pipeline, and dropping implicit network trust entirely.
| Pillar | The senior move |
|---|---|
| Secrets | secrets manager (Vault / cloud KMS), never baked into images or env layers |
| Supply chain | SBOMs, signed artifacts (Sigstore/cosign), pinned deps, provenance (SLSA) |
| Zero Trust | authenticate & authorise every request, assume breach, segment to shrink blast radius |
Interview Q&A
Zero Trust isn't a product — it's an architecture (NIST SP 800-207) where every access decision runs through a Policy Decision Point and is enforced at a Policy Enforcement Point. The PDP is split into a Policy Engine (runs the trust algorithm over identity, device posture, and threat signals) and a Policy Administrator (opens/closes the actual session). The seven tenets boil down to: every resource is protected, no network location grants trust, sessions are per-request, authenticated, encrypted, and continuously evaluated.
| Layer | Question it answers | Tool |
|---|---|---|
| SBOM | what's in this artifact? | Syft, CycloneDX, SPDX |
| Provenance (SLSA) | how/where was it built? | SLSA Build L1-L3, slsa-github-generator |
| Signing | is it authentic & untampered? | Sigstore cosign (Fulcio + Rekor) |
| Verification | should I deploy it? | admission policy (Kyverno / cosign verify) |
Keyless signing is the 2025 default worth knowing: instead of guarding a long-lived private key, cosign uses your CI's OIDC identity to get a short-lived cert from Fulcio and records the signature in Rekor, a public transparency log. There's no key to leak — the signing identity is your verifiable build, and SLSA Build L3 means the provenance was produced by the build service itself, non-falsifiable by the developer.
# Sign in CI with no private key — identity comes from the OIDC token
cosign sign --yes \
$IMG@$DIGEST # Fulcio issues a short-lived cert; entry → Rekor
# Attach SLSA build provenance as an attestation
cosign attest --yes \
--predicate provenance.json \
--type slsaprovenance \
$IMG@$DIGEST
# Deploy gate: refuse anything not signed by OUR build identity
cosign verify \
--certificate-identity-regexp "https://github.com/acme/.+" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
$IMG@$DIGEST # fails closed if sig/identity/log check fails
Interview Q&A · deep dive
Systems & Platform Craft
The cross-cutting senior layer — the things a Principal / Manager loop assumes you carry in your head regardless of the role's title. Version control discipline, the reusable building blocks of any backend, the laws of distributed systems, and how you keep it all observable, secure, and shippable.
Git & branching that scales to a team version control
Git is easy solo and hard in a team. The senior skill isn't memorising commands — it's choosing a branching model that keeps many people shipping without stepping on each other, and recovering cleanly when history gets messy.
| Need | Do | Why |
|---|---|---|
| Combine branches keeping history | git merge | Preserves the true graph; one merge commit records the join. |
| Linear, clean history | git rebase | Replays your commits on top of main — tidy, but rewrites history (never rebase shared branches). |
| Undo a public commit safely | git revert | Creates an inverse commit — history stays intact for everyone. |
| Move your branch pointer | git reset | Rewrites local history — powerful, local-only. |
Interview Q&A
Underneath the commands, Git is a tiny key-value database. Every piece of content is hashed (SHA-1, migrating to SHA-256) and stored by that hash, so identical content is stored once. There are exactly four object types: a blob (file bytes), a tree (a directory listing of blobs + subtrees), a commit (one tree + parent(s) + author + message), and a tag. A branch is not a thing — it is a 41-byte file under .git/refs/heads/ holding a commit SHA. HEAD is a pointer to the current branch. That is the entire model; everything else is moving pointers.
Git tracks state across three "trees": the working directory (your files), the index/staging area (the proposed next commit), and HEAD (the last commit). add moves working→index; commit moves index→HEAD. This is exactly why reset --soft (moves HEAD only), --mixed (HEAD + index, the default), and --hard (HEAD + index + working dir) differ — each one stops at a different tree.
# peek under the hood — Git really is an object DB
git cat-file -t HEAD # commit
git cat-file -p HEAD # tree, parent, author, message
git rev-parse HEAD # the 40-char commit SHA
cat .git/refs/heads/main # a branch IS just this SHA
# trunk-based daily loop: tiny PRs onto a protected main
git switch -c feat/CT-204 main
# ...edit...
git add -p # stage hunks selectively (review your own diff)
git commit -m "add quorum read path"
git rebase origin/main # replay on latest before opening the PR
git push -u origin feat/CT-204 # CI runs; reviewer approves; squash-merge
| Strategy | History you get | Pick when |
|---|---|---|
| Merge commit | true graph; one extra commit per PR | you want an auditable record of when each PR landed |
| Rebase + FF | perfectly linear, every commit preserved | small, well-curated commit series matter (libraries) |
| Squash-merge | one commit per PR; messy WIP gone | trunk-based teams — clean main, PR = unit of change |
Interview Q&A · deep dive
blob = file content, tree = a directory mapping names to blob/tree SHAs, commit = a snapshot pointing at one root tree plus parent commit(s) and metadata, tag = an annotated pointer to a commit. A commit references a whole tree, so each commit is a full snapshot (deduplicated by hash), not a diff — Git computes diffs on demand..git/refs/heads/ containing a commit SHA (or a packed-refs entry). HEAD is a symbolic ref pointing at the current branch (ref: refs/heads/main). "Detached HEAD" means HEAD points directly at a commit instead of a branch, so new commits aren't recorded on any branch.reset --soft, --mixed, and --hard?--soft stops there (changes stay staged). --mixed (default) also resets the index (changes stay in the working dir, unstaged). --hard also resets the working directory (changes discarded). Mapped to the three trees: soft = HEAD, mixed = HEAD+index, hard = HEAD+index+working.Git & Bitbucket — the differences that trip people version control
Most Git confusion is pairs of commands that feel similar but do different things. Knowing the exact difference (and the safe one) is a senior tell — and it's where your real Bitbucket workflow on the clinical-trial repo lives.
| vs | What it does | Use when |
|---|---|---|
| fetch | downloads remote commits — does not touch your working tree | "show me what's on origin" safely |
| pull | fetch + merge (or --rebase) into your branch | actually integrate remote changes now |
| — moving / creating branches — | ||
| checkout | overloaded: switch branches and restore files (legacy) | old habit; still works everywhere |
| switch | switch/create branches only (clearer, newer) | changing branches — the modern verb |
| restore | restore file contents only | discard local file changes safely |
| — combining history — | ||
| merge | joins branches, keeps both histories (merge commit) | shared branches; preserve true history |
| rebase | replays your commits on top of another base (linear history) | tidy local history before a PR |
| — undoing — | ||
| reset | moves the branch pointer back (rewrites history) | local only; --soft keeps changes, --hard discards |
| revert | new commit that undoes a commit (keeps history) | shared branches — the safe undo |
git switch -c feature/CT-1234 # new branch (vs checkout -b)
# ...edit, commit...
git fetch origin # see remote, no merge
git rebase origin/main # replay onto latest main (tidy)
git push -u origin feature/CT-1234 # open PR in Bitbucket from here
# after review + approvals -> "Merge" (squash) -> pipeline deploys
Interview Q&A
"I lost my commits" is almost never true. As long as a commit was created, it lives in the object store and is reachable via the reflog — a local log of everywhere HEAD has pointed — for ~90 days before garbage collection. Bad merge, blown-away branch, botched rebase, accidental reset --hard: reflog finds the pre-disaster SHA every time.
# 1. UNDO A BAD RESET/REBASE — reflog is the time machine
git reflog # HEAD@{0}, HEAD@{1}... every move
git reset --hard HEAD@{2} # jump back to before the mistake
# 2. FIND THE COMMIT THAT BROKE A TEST — binary search history
git bisect start
git bisect bad # current is broken
git bisect good v1.4.0 # this tag was fine
# Git checks out the midpoint; you test, then mark each:
git bisect good # ...or 'bad'. log2(N) steps -> the culprit
git bisect reset # or: git bisect run pytest -x (fully automated)
# 3. GRAB ONE COMMIT FROM ANOTHER BRANCH
git cherry-pick a1b2c3d # apply that commit here (new SHA)
git cherry-pick --abort # if it conflicts and you change your mind
# 4. PARK WORK TO SWITCH BRANCHES FAST
git stash push -m "wip parser"
git stash list # stash@{0}: On feat: wip parser
git stash pop # reapply + drop (or 'apply' to keep it)
git pull with the default merge config creates ugly "Merge branch 'main' of origin" commits on your feature branch. Configure pull.rebase (or pull with --rebase) so your local commits replay on top of fetched ones — linear history, no noise. --ff-only is the safest pull: it refuses to do anything if a real merge/rebase would be needed, forcing you to decide consciously.
| Command | What actually happens |
|---|---|
| git pull | fetch + merge → can add a merge commit to your branch |
| git pull --rebase | fetch + replay your commits on top → linear, preferred |
| git pull --ff-only | fetch + only fast-forward; aborts if divergent → safest |
| git fetch + git log @..@{u} | fetch, then inspect incoming commits before integrating |
Interview Q&A · deep dive
reset --hard'd and lost a day of commits — they're not in any branch. Recover them?git reflog (or git fsck --lost-found for dangling commits) to find the SHA, then git branch rescue <sha> or git cherry-pick them back. Objects survive until gc prunes unreachable ones (default ~90 days), so act before that.git bisect work and when is it the right tool?good/bad and it narrows to the culprit in O(log N) steps. git bisect run <cmd> automates it with a script that exits 0 (good) / non-zero (bad). Ideal for "it worked last release, broke now, no idea which change."push --force-with-lease over --force?--force overwrites the remote unconditionally, clobbering commits a teammate pushed since you fetched. --force-with-lease only forces if the remote is still where you last saw it — if someone else pushed, it aborts. It's the difference between "I'm sure my view is current" and "overwrite no matter what."The building blocks of any backend system design
Almost every system-design answer is assembled from the same dozen parts. Know what each one buys you and what it costs, and you can compose a credible architecture for anything.
| Block | Buys you |
|---|---|
| Load balancer | Horizontal scale + failover across many app instances. |
| Cache (Redis) | Cheap, fast reads — absorbs the hot path before it hits the DB. |
| Queue (SQS/Kafka) | Decouples producers from consumers; smooths spikes; enables retries. |
| CDN | Serves static/edge content close to users. |
| Rate limiter | Protects you from abuse and runaway clients. |
| Idempotency key | Makes a repeated request safe — the backbone of reliable retries. |
Interview Q&A
The basic path is client → LB → app → cache → DB. A production system has a few more layers worth naming, because interviewers probe the edges. The CDN and API gateway sit in front; the object store and search index sit beside the DB; the queue + workers hang off the side for async work.
| Block | Buys you | Costs you |
|---|---|---|
| API gateway | one entry point: auth, rate limit, routing, TLS termination | a single chokepoint to keep highly available |
| Object store (S3) | cheap, infinite, durable storage for blobs/files/backups | high latency, eventual listing — not a database |
| Search index (ES/OpenSearch) | full-text + faceted queries the DB can't do well | a second copy to keep in sync with the source of truth |
| Read replica | scales reads; offloads the primary | replication lag → stale reads |
| Blob/CDN edge | static assets served near the user, off your origin | cache invalidation across edges |
Vertical scaling (bigger box) is the cheapest first move — no code changes, just more CPU/RAM — but it has a ceiling and a single point of failure. Horizontal scaling (more boxes behind a load balancer) is effectively unbounded but only works if your app tier is stateless: any instance must be able to serve any request. The moment you store session state in process memory, you've broken horizontal scaling and forced sticky sessions.
Interview Q&A · deep dive
The laws of distributed systems fundamentals
Once data lives on more than one machine, physics imposes trade-offs you can't engineer away — only choose between. The senior move is naming the trade-off you're making, out loud.
| Idea | What it forces |
|---|---|
| CAP theorem | During a network partition you must choose: stay Consistent (reject) or stay Available (serve possibly-stale). You can't have both mid-partition. |
| Strong consistency | Every read sees the latest write — simpler to reason about, costs latency and availability. |
| Eventual consistency | Reads may lag; converges over time — high availability, weaker guarantees. |
| Replication | Copies for durability + read scale; introduces lag and conflict. |
| Partitioning / sharding | Splits data by key for write scale; cross-shard queries get hard. |
| Consensus (Raft) | How a cluster agrees on one value despite failures — the basis of leader election. |
Interview Q&A
CAP only describes behaviour during a partition, which is rare. PACELC completes the picture: if Partition, choose Availability or Consistency; Else (normal operation), choose Latency or Consistency. Every real system trades latency for consistency even when nothing is broken — that's the part CAP ignores, and it's the more common decision.
| System | On partition | Normal (else) |
|---|---|---|
| DynamoDB / Cassandra | PA (stay available) | EL (favor latency) |
| Spanner | PC (stay consistent) | EC (favor consistency) |
| Default RDBMS | PC (refuse / fail over) | EC (consistent reads) |
| MongoDB (default) | PC (primary only) | EC, but tunable per read |
"Strong" and "eventual" are the endpoints; the useful guarantees live in between. Most production correctness bugs come from assuming a stronger model than the store actually provides.
| Model | Guarantee |
|---|---|
| Linearizable | strongest: every op appears to happen instantly at one point in real time; reads see the latest write. |
| Sequential | all nodes see ops in the same order, but not necessarily real-time order. |
| Causal | operations that are causally related are seen in order by everyone; concurrent ops may differ. |
| Read-your-writes | a client always sees its own prior writes (a session guarantee). |
| Eventual | weakest: replicas converge if writes stop; no ordering or recency promise. |
The fallacies of distributed computing are the false assumptions that sink naive designs. The FLP impossibility result is the theoretical cousin: in a fully asynchronous network with even one faulty process, no consensus algorithm can guarantee it always terminates — which is why real systems (Raft, Paxos) add timeouts/randomization to make progress in practice.
| The fallacy (it's false) | Reality you must design for |
|---|---|
| The network is reliable | packets drop; calls hang — use timeouts + retries |
| Latency is zero | round trips dominate — batch, cache, co-locate |
| Bandwidth is infinite | large payloads throttle — paginate, compress |
| The network is secure | assume hostile — authn/authz, TLS everywhere |
| Topology doesn't change | nodes come and go — service discovery, no hardcoded IPs |
| There is one administrator | many owners — version contracts, backward compat |
| Transport cost is zero | serialization + bandwidth cost real money/CPU |
| The network is homogeneous | mixed clients/protocols — standard formats, negotiation |
Interview Q&A · deep dive
Distributed systems — the patterns deep
CAP names the trade-off; these are the patterns you reach for once you accept it. Naming the right one for a failure scenario is the heart of a senior system-design round. (Builds on the laws card.)
| Pattern | Problem it solves |
|---|---|
| Idempotency | networks retry, so the same request can arrive twice. An idempotent operation (or an idempotency key the server dedupes on) makes a retry harmless — vital for payments, "create order", etc. |
| Exactly-once (= at-least-once + dedup) | true exactly-once delivery is impossible across a network; you get it in effect by making consumers idempotent and deduping on a message id. |
| Consistent hashing | distribute keys so adding/removing a node moves only ~1/N keys, not everything — the basis of caches, shards, and DHTs. |
| 2PC vs Saga | a transaction across services. 2PC locks all participants (consistent but blocking, fragile); a saga is a chain of local commits with compensating undo steps (available, eventually consistent) — the microservices default. |
| Outbox pattern | write the DB row and the "to-publish" event in one local transaction, then relay the event — avoids the dual-write problem (DB committed but event lost). |
| CRDTs | data types that merge concurrent edits without conflict (counters, sets) — power offline-first and multi-region writes. |
| Backpressure | when a consumer can't keep up, signal upstream to slow down (bounded queues, credits) instead of exploding memory. |
| Leader election | pick one coordinator among peers (via consensus / a lease) so exactly one node owns a task — Raft, ZooKeeper, etcd. |
# client sends a stable key; server dedupes so a retry is a no-op
def create_order(req, key):
if store.seen(key): # already processed this key?
return store.result(key) # same result, no double-charge
result = process(req)
store.save(key, result) # remember key -> result
return result
Interview Q&A
Consensus is "get N nodes to agree on one ordered log despite failures." Raft makes it understandable by splitting it into three sub-problems: leader election (one node wins a majority vote per term), log replication (only the leader takes writes; it appends to a majority before committing), and safety (a new leader must contain all committed entries). The magic word is quorum: any majority overlaps any other majority, so a committed entry can never be lost or contradicted.
You don't always need full Raft. For "exactly one worker runs this job," a lease in a strongly-consistent store (etcd, ZooKeeper, Redis with care) is enough: whoever holds the unexpired lease is leader; they must renew before it expires (fencing). The classic bug is a leader that pauses (GC, network stall) past its lease, a new leader takes over, then the old one wakes and acts — two leaders. The fix is a monotonic fencing token the resource checks.
# single-leader via a fenced lease (pseudo-etcd)
def run_as_leader(node_id):
lease = etcd.grant(ttl=10) # 10s lease
got = etcd.put_if_absent("leader/job", node_id, lease)
if not got:
return # someone else leads; stand by
while etcd.keep_alive(lease): # renew before TTL expires
token = etcd.revision("leader/job") # monotonic fencing token
do_leader_work(fencing=token) # resource rejects stale tokens
The dual-write trap: you update the DB and publish an event in two systems, and a crash between them leaves them inconsistent (row saved, event lost — or vice versa). The transactional outbox fixes it by writing the event into an outbox table in the same DB transaction as the business row. A separate relay (or change-data-capture like Debezium) reads the outbox and publishes — at-least-once — so consumers must be idempotent.
with db.transaction(): # one atomic commit
db.execute("INSERT INTO orders ...", order)
db.execute("INSERT INTO outbox(topic, payload, status) "
"VALUES ('order.created', %s, 'pending')", event)
# --- separate relay process, polls or via CDC ---
for row in db.fetch("SELECT * FROM outbox WHERE status='pending'"):
broker.publish(row.topic, row.payload) # at-least-once
db.execute("UPDATE outbox SET status='sent' WHERE id=%s", row.id)
Interview Q&A · deep dive
Caching — the cheapest performance win, the hardest correctness bug performance
Caching turns expensive work into a fast lookup. The catch is the famous one: cache invalidation. Know the patterns and the failure modes and you get the speed without the stale-data pain.
| Pattern | Behaviour |
|---|---|
| Cache-aside | App manages it; load on miss. Most common, most flexible. |
| Read-through | Cache loads from DB itself on miss — app just asks the cache. |
| Write-through | Write to cache + DB together — consistent, slower writes. |
| Write-back | Write cache now, DB later — fast, risks loss on crash. |
Interview Q&A
A cache is not a second source of truth — it is a guess that the next read wants the same bytes as a recent one. Every entry trades memory + a staleness risk for latency. That framing decides everything: pick a TTL by asking "how wrong can this be before a user notices?", size by working-set not total dataset, and accept that a cache is allowed to be empty at any moment — your DB must survive a 0% hit rate (a cold start or a flush). If it can't, the cache is load-bearing and you've built a fragile system, not a fast one.
| Policy | Keeps | Best when |
|---|---|---|
| LRU (least-recently-used) | recently touched keys | access has temporal locality (sessions, feeds) |
| LFU (least-frequently-used) | popular keys over time | a stable hot set (top products, hot trials) that a one-off scan shouldn't flush |
| FIFO / TTL-only | newest / unexpired | data with a natural freshness clock (tokens, quotes) |
import time, random, threading, hashlib
_locks: dict = {} # per-key in-process locks
_guard = threading.Lock()
def _lock_for(key):
with _guard:
return _locks.setdefault(key, threading.Lock())
def get_or_load(r, key, loader, ttl=300):
val = r.get(key)
if val is not None:
return val # hit
# miss: only ONE caller per key recomputes; others wait + re-read
with _lock_for(key):
val = r.get(key) # double-check after acquiring
if val is None:
val = loader() # the expensive DB / API call
jitter = int(ttl * random.uniform(0.8, 1.2))
r.set(key, val, ex=jitter) # spread expiries → no synchronized stampede
return val
In a multi-process / multi-host fleet the in-process lock isn't enough — promote it to a distributed lock (SET key uuid NX EX 10, released with a Lua compare-and-delete) so exactly one replica rebuilds a hot key.
Interview Q&A · deep dive
Observability & SRE — know it's broken before users do reliability
You can't operate what you can't see. Observability is the three signals that let you ask new questions of a live system; SRE is the discipline of turning reliability into measurable targets.
| Term | Meaning |
|---|---|
| SLI | Service Level Indicator — the measured number (e.g. p95 latency, error rate). |
| SLO | Service Level Objective — the target for that SLI (e.g. 99.9% success). |
| SLA | The contractual promise to a customer, with consequences if missed. |
| Error budget | 1 − SLO. The allowed unreliability you can 'spend' on shipping fast. |
Interview Q&A
The three signals answer different questions: metrics tell you that something is wrong (cheap, aggregated, alertable), traces tell you where in a request path (which span ate the latency), and logs tell you why (the exact error, the bad input). The modern shift is structured + correlated: one trace_id threaded through logs, metrics exemplars, and spans so you pivot from a latency spike straight to the offending request. OpenTelemetry (OTel) is the now-standard vendor-neutral way to emit all three — instrument once, export anywhere (Prometheus, Grafana, Datadog, CloudWatch).
# SLO: 99.9% of requests succeed over a 28-day window
# Error budget = (1 - 0.999) = 0.1% of total requests allowed to fail
total = 50_000_000 # requests in window
budget = total * (1 - 0.999) # = 50,000 allowed failures
failed = 18_500
remaining = budget - failed # 31,500 left
burn_rate = (failed / budget) # 0.37 of budget used
# Multi-window burn-rate alerting (Google SRE): page only on FAST burns
# 14.4x burn over 1h → exhausts a 30d budget in ~2 days → PAGE
# 1x burn over 6h → on track, no action → TICKET / none
def should_page(short_burn, long_burn):
return short_burn > 14.4 and long_burn > 14.4 # both windows confirm
Burn-rate alerting beats "alert if error rate > 1%" because it ties urgency to how fast you're spending the budget: a brief blip self-heals and shouldn't wake anyone; a sustained fast burn that will exhaust the month in days should. Requiring two windows (short + long) kills both flapping and slow-creep blindness.
| Method | Applies to | The three signals |
|---|---|---|
| RED | request-driven services | Rate, Errors, Duration |
| USE | resources (CPU, disk, queue) | Utilisation, Saturation, Errors |
| Four Golden Signals | any user-facing system | Latency, Traffic, Errors, Saturation |
Interview Q&A · deep dive
Security essentials — the non-negotiables security
You don't need to be a security specialist, but a senior engineer is expected to not introduce the obvious holes. Carry this short list and apply it to every design.
| Concept | Plain meaning |
|---|---|
| AuthN (authentication) | Who are you? — verify identity (password, token, OAuth). |
| AuthZ (authorization) | What are you allowed to do? — permissions, roles (RBAC). |
| Least privilege | Grant the minimum access needed — the core of IAM. |
| Secrets management | Keys never in code or Git — use a vault / Secrets Manager + env injection. |
| Encryption | TLS in transit, encryption at rest — both, always, for sensitive data. |
| Parameterised queries | The fix for SQL injection — separate code from data. |
Interview Q&A
Security isn't a checklist of features bolted on at the end — it's a way of drawing trust boundaries and asking, at each one, "what can a hostile input do here?" Every place data crosses from less-trusted to more-trusted (browser→API, API→DB, retrieved doc→LLM prompt) is a boundary that needs validation. Defense in depth means no single control is load-bearing: even if the WAF is bypassed and authN is broken, parameterised queries + least-privilege DB creds + encryption should still contain the blast.
The single highest-leverage habit: validate against what's allowed, not what's forbidden. Deny-lists ("strip <script>") are an arms race you lose — attackers find the encoding you forgot. Allow-lists ("this field is a UUID / an int 1–100 / one of these enum values") fail closed.
from pydantic import BaseModel, EmailStr, conint, constr
class CreateUser(BaseModel): # schema = the trust boundary
email: EmailStr # validated format, not regex-by-hand
age: conint(ge=13, le=120) # bounded int, rejects garbage
role: constr(pattern=r"^(viewer|editor|admin)$") # allow-list enum
# reject unknown/extra fields instead of silently trusting them
class Config:
extra = "forbid"
# parameterised query — code and data never mix (no string-building SQL)
cur.execute("SELECT * FROM users WHERE id = %s", (user_id,))
| Rule | Why |
|---|---|
| Never in Git | history is forever; a leaked key in commit #3 is live even after deletion. Use git-secrets / pre-commit hooks + repo scanning. |
| Inject at runtime | from a vault (HashiCorp Vault, AWS Secrets Manager) into env/memory — not baked into the image layer. |
| Rotate & scope | short-lived, narrowly-scoped credentials limit a leak's damage window and reach (least privilege applied to secrets). |
| Audit access | who read which secret when — so a compromise is detectable, not silent. |
Most of your code is code you didn't write. SCA (software composition analysis — pip-audit, npm audit, Dependabot, Snyk) flags known-vulnerable transitive deps; a lockfile + hash pinning stops a malicious version swap; an SBOM (software bill of materials) lets you answer "are we exposed to CVE-X?" in minutes, not days. This is now a CI gate, not an afterthought — see the scan stage in CI/CD.
Interview Q&A · deep dive
"role": "admin". Can you trust it?CI/CD & a testing strategy that ships delivery
Continuous Integration = every change is built and tested automatically. Continuous Delivery = that change is always releasable. The point is to make shipping boring, frequent, and reversible.
| Layer | Test pyramid |
|---|---|
| Unit (many, fast) | One function/class in isolation — the broad base. |
| Integration (some) | Components together — DB, API, queue. |
| End-to-end (few, slow) | Whole flow via the UI (Playwright/Selenium) — the thin top. |
Interview Q&A
A good pipeline is ordered by cost and confidence: run the fast, cheap, high-signal checks first (lint, unit tests in seconds) so a bad commit dies before it ever spins up a slow integration env or burns cloud minutes. Each stage is a gate — green is required to proceed. The mental model is a funnel: thousands of unit tests, dozens of integration tests, a handful of e2e checks, one deploy.
Trunk-based: everyone commits to main (or short-lived branches merged daily), behind feature flags for incomplete work — so integration happens continuously instead of in one painful long-lived-branch merge. Pair it with the golden rule: build the artifact once, promote the same artifact through dev→staging→prod. Never rebuild per environment (a rebuild can pull a different dependency and ship something you never tested). Config differs per environment; the binary does not.
# .github/workflows/ci.yml
name: ci
on: [push]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: pip install -r requirements.txt
- run: ruff check . # lint — fastest, fail first
- run: pytest -m "not integration" --cov # unit (broad base)
- run: pytest -m integration # integration (some)
- run: pip-audit # dependency CVE scan (security gate)
deploy:
needs: test # gate: only if tests pass
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- run: ./deploy.sh canary --weight 5 # 5% first, watch SLOs, then ramp
| Strategy | How it rolls out | Rollback & cost |
|---|---|---|
| Rolling | replace instances in batches | slow rollback (re-deploy old); cheap (no extra infra) |
| Blue-green | stand up full parallel env, flip the router | instant rollback (flip back); 2x infra briefly |
| Canary | 1–5% of traffic first, auto-ramp on healthy metrics | smallest blast radius; needs good observability to judge "healthy" |
Interview Q&A · deep dive
Redis (& Valkey) — the in-memory swiss-army store data infra
Redis is an in-memory key-value store used as cache, session store, rate-limiter, queue, and leaderboard. It's fast because data lives in RAM and the core is effectively single-threaded — operations are atomic, no lock contention. The real skill is knowing which of its data types turns a hard problem into one command. (Extends Caching.)
| Data type | Use it for |
|---|---|
| String | cache values, counters (INCR), feature flags |
| Hash | objects / records (user:42 → {name, email}) |
| List | queues, recent-items, simple job pipelines |
| Set | unique membership, tags, de-dup |
| Sorted set (ZSET) | leaderboards, priority queues, rate-limit windows |
| Stream | append-only event log with consumer groups |
| Pub/Sub | fire-and-forget messaging, live notifications |
| Vector set (Redis 8) | in-cache semantic search (HNSW) for RAG |
# atomic: increment this caller's counter, expire the window on first hit
def allow(user_id, limit=100, window=60):
key = f"rl:{user_id}:{int(time.time()) // window}"
n = r.incr(key) # INCR is atomic — no race
if n == 1:
r.expire(key, window) # first request sets the TTL
return n <= limit # True = allowed, False = 429
| Operational knob | What to know |
|---|---|
| Persistence | RDB (point-in-time snapshots, fast restart) vs AOF (append every write, more durable). Many run both. |
| Eviction | when memory is full: allkeys-lru for a pure cache, volatile-ttl to respect TTLs, noeviction to error instead of dropping. |
| Scale | replicas for read scale + failover (Sentinel); Cluster mode for sharding across nodes (16384 hash slots). |
Interview Q&A
The leap from "Redis as a dumb cache" to "Redis as a tool" is realising each data type is an algorithm you get for free, atomically, in RAM. A leaderboard is a hard problem in SQL (ranked window queries on every read) and one command in Redis (ZADD + ZREVRANK, O(log N)). The skill is matching the access pattern to the structure before reaching for a string + JSON blob, which throws away every operation the native type would have given you.
import time, redis
r = redis.Redis()
def allow(user_id, limit=100, window=60):
key = f"rl:{user_id}"
now = time.time()
pipe = r.pipeline() # batch 4 ops in one round-trip
pipe.zremrangebyscore(key, 0, now - window) # drop events outside the window
pipe.zadd(key, {str(now): now}) # record this request
pipe.zcard(key) # how many in the window now?
pipe.expire(key, window) # auto-clean idle users
_, _, count, _ = pipe.execute()
return count <= limit # True = allowed
Unlike a fixed window (which lets a user fire 2x the limit across a boundary), the sliding window counts the last 60s exactly. For strict atomicity under contention, wrap the same logic in a Lua script — it runs server-side as one indivisible operation.
| RDB (snapshot) | AOF (append-only log) | |
|---|---|---|
| What | periodic point-in-time dump (fork + copy-on-write) | every write command logged, replayed on restart |
| Durability | lose everything since last snapshot (minutes) | everysec fsync → lose ≤1s; always → near-zero but slow |
| Restart | fast (load one compact file) | slower (replay the log; periodic rewrite compacts it) |
| Cost | fork can stall on huge datasets | larger files, fsync I/O on the hot path |
Stay current here — it's a common senior interview probe. As of mid-2026: Valkey 9.1 (May 2026, Linux Foundation, BSD-3) reports ~2.1M ops/s with a ~10% memory cut and is the default for new clusters on AWS ElastiCache and Google Memorystore; Redis 8.2 (GA Feb 2026) is tri-licensed (RSALv2 / SSPLv1 / OSI-approved AGPLv3 since May 2025) and leads on in-core vector search (vector sets, dual cosine + dot-product similarity). They share the protocol and are ~drop-in compatible. Pragmatic default: Valkey for a clean BSD license + cloud-default pricing (benchmarks ~8% faster, ~20% cheaper, lower p99); pick Redis 8.2 when you specifically want its richer in-core vector / search modules. AGPL is fine for internal use but many legal teams treat network-copyleft as a blocker for SaaS — another reason new greenfield work leans Valkey.
Interview Q&A · deep dive
Apache Kafka — the distributed event log data infra
Kafka is a distributed, append-only commit log you publish events to and many consumers read from independently. It's the backbone of event-driven and streaming systems: durable, ordered per partition, replayable, horizontally scalable. The model: a topic is a log, split into partitions (the unit of parallelism & ordering), and consumers track their position by offset. (Complements NiFi · Kafka · streaming.)
| Concept | What it is |
|---|---|
| Topic | a named stream of events (the log) |
| Partition | an ordered shard of a topic — parallelism & ordering live here |
| Offset | a consumer's position in a partition (Kafka stores the data; you track where you are) |
| Producer / Consumer | writes events / reads events |
| Consumer group | consumers sharing the work — each partition goes to exactly one member |
| Broker | a server holding partitions; replication across brokers gives durability |
| At-most-once | commit offset before processing — may lose messages, never duplicates |
| At-least-once | process then commit — never lose, may duplicate (the common default; make consumers idempotent) |
| Exactly-once | idempotent producer + transactions — strongest, costs throughput; for money / ledgers |
from confluent_kafka import Producer, Consumer
p = Producer({"bootstrap.servers": "broker:9092"})
p.produce("trial-updates", key=gdcid, value=json.dumps(update))
p.flush() # ensure it's sent
c = Consumer({"bootstrap.servers": "broker:9092",
"group.id": "indexer", # the consumer group
"group.protocol": "consumer", # KIP-848 (Kafka 4.0)
"auto.offset.reset": "earliest"})
c.subscribe(["trial-updates"])
while True:
msg = c.poll(1.0)
if msg and not msg.error():
index(msg.value()) # do the work first...
c.commit(msg) # ...then commit = at-least-once
| Reach for | When |
|---|---|
| Kafka | high-throughput streams, multiple independent consumers, replay / audit, event sourcing |
| SQS / RabbitMQ | simple task queues, per-message ack/delete, no replay needed, lower ops |
Interview Q&A
A partition is not an abstraction — it's a directory of segment files (*.log plus *.index / *.timeindex). Writes are append-only sequential I/O, which is why Kafka saturates disks: it never seeks. Old segments roll off by retention (time or size). Consumers don't pull one message over the wire at a time — the broker serves a byte range straight from the page cache via sendfile() (zero-copy), so a healthy cluster barely touches the JVM heap for payloads. Read throughput is dominated by the OS, not Kafka code.
The producer's three knobs decide your durability/throughput tradeoff. acks=0 fire-and-forget (fastest, lossy), acks=1 leader-only (loses data if the leader dies before replication), acks=all waits for the whole ISR. With acks=all + min.insync.replicas=2 a single broker loss never loses an acked write. The idempotent producer (default since 3.0) stamps each record with a producer id + sequence number so a retry after a network blip can't create a duplicate — and it preserves order even with max.in.flight.requests=5.
from confluent_kafka import Producer
p = Producer({
"bootstrap.servers": "broker:9092",
"acks": "all", # wait for full ISR before ack
"enable.idempotence": True, # dedup + ordered retries (pid + seq)
"max.in.flight.requests.per.connection": 5,
"compression.type": "zstd", # batch-level, big throughput win
"linger.ms": 10, # wait 10ms to fill bigger batches
})
def on_delivery(err, msg): # async callback per record
if err: log.error("failed %s", err)
else: log.info("%s[%d]@%d", msg.topic(), msg.partition(), msg.offset())
for evt in updates:
p.produce("trial-updates", key=evt["gdcid"], # key → same partition → ordered
value=json.dumps(evt), callback=on_delivery)
p.poll(0) # serve delivery callbacks without blocking
p.flush(10) # block up to 10s for in-flight to drain
EOS is more than the idempotent producer. A transaction atomically commits both the output records and the consumed input offsets, so a stream job that reads topic A and writes topic B can't double-count on a crash. Consumers must set isolation.level=read_committed to skip aborted batches. This is exactly the machinery Kafka Streams uses under processing.guarantee=exactly_once_v2 — you rarely hand-roll it.
# transactional read-process-write loop (skeleton)
producer.init_transactions()
while True:
batch = consumer.poll(1.0)
producer.begin_transaction()
for rec in batch:
producer.produce("enriched", value=transform(rec.value()))
# offsets committed INSIDE the txn — atomic with the output
producer.send_offsets_to_transaction(consumer.position(consumer.assignment()),
consumer.consumer_group_metadata())
producer.commit_transaction() # both visible together, or neither
| cleanup.policy | Behaviour · use it for |
|---|---|
| delete | drop whole segments past retention.ms/.bytes — event streams, metrics, logs |
| compact | keep the latest value per key forever; tombstone (null value) deletes a key — changelogs, CDC, config, the __consumer_offsets topic itself |
| compact,delete | both: latest-per-key, but also age out very old keys |
Interview Q&A · deep dive
acks=all meets min.insync.replicas?replica.lag.time.max.ms. With acks=all a write is acked only once every ISR member has it. If the ISR shrinks below min.insync.replicas (say a broker dies), the producer gets NotEnoughReplicas and the partition rejects writes — Kafka chooses consistency over availability here rather than ack data it can't durably hold. Tuning min.insync.replicas=2 on RF=3 is the standard durability posture.isolation.level=read_committed. People conflate the two constantly; the idempotent producer is necessary but not sufficient.enable.auto.commit=true commits on a timer and the process dies mid-batch before the next tick. Make commits explicit after successful processing, and make the handler idempotent so the inevitable at-least-once duplicate is harmless.Terraform & IaC — infrastructure as code platform
Infrastructure as Code means your servers, networks, and databases are defined in version-controlled files, not clicked together by hand — so environments are reproducible, reviewable, and disposable. Terraform is the dominant tool: you write declarative HCL describing the desired end state, and it computes the changes to get there.
# declarative: describe the end state, not the steps
resource "aws_s3_bucket" "trials" {
bucket = "ci-radar-trial-exports"
tags = { team = "automation", env = var.env }
}
resource "aws_s3_bucket_versioning" "v" {
bucket = aws_s3_bucket.trials.id # reference = a dependency edge
versioning_configuration { status = "Enabled" }
}
# reuse with a module + variables across dev / stage / prod
module "network" {
source = "./modules/vpc"
cidr = var.vpc_cidr
}
| Concept | Why it matters |
|---|---|
| State | Terraform's record of real resources — the source of truth for diffs. Keep it in a remote backend (S3 + lock) so a team shares it safely; never commit it to git. |
| Provider | the plugin that talks to AWS / Azure / GCP / K8s (3,900+ exist) — one language, every cloud. |
| Module | a reusable, parameterised bundle of resources — your “function” for infrastructure. |
| Drift | when reality diverges from state (someone clicked in the console); plan detects it. |
Interview Q&A
Terraform is a three-way merge, not a script runner. Every plan compares three things: your config (desired), the state file (last-known), and reality (a live refresh of the provider API). The diff is computed from all three — which is why deleting a resource from config schedules a destroy (config says gone, state says exists), and why someone editing in the console shows as drift (state says X, reality says Y). Holding this triangle in your head explains almost every "why is Terraform doing that?" moment.
Junior HCL copy-pastes a resource five times. Senior HCL uses for_each over a map (stable addressing — removing one key destroys only that one, unlike count which re-indexes and can recreate everything after a delete), depends_on only for hidden dependencies the graph can't infer, lifecycle to protect or order changes, and dynamic blocks to template nested config. count is a list (index keys); for_each is a map (semantic keys) — prefer the map.
# for_each over a map → stable, named instances
variable "buckets" {
type = map(object({ versioned = bool }))
default = {
exports = { versioned = true }
cache = { versioned = false }
}
}
resource "aws_s3_bucket" "b" {
for_each = var.buckets # each.key / each.value available
bucket = "ci-radar-${each.key}-${var.env}"
lifecycle {
prevent_destroy = true # refuse to delete prod data buckets
ignore_changes = [tags["LastScanned"]] # a process mutates this; don't fight it
}
}
output "bucket_arns" {
value = { for k, b in aws_s3_bucket.b : k => b.arn }
}
Two engineers running apply at once against shared state corrupts it. The remote backend (S3 + DynamoDB lock, or a managed backend) gives a state lock so the second apply waits. For environments, prefer separate state files per environment (a backend key per env, or directory-per-env) over a single state with workspace switches — workspaces share one backend config and one provider config, so a fat-fingered workspace select prod can apply dev changes to prod. Workspaces suit ephemeral/parallel copies, not the prod/stage boundary.
terraform {
backend "s3" {
bucket = "ci-radar-tfstate"
key = "prod/network.tfstate" # one key per env per stack
region = "us-east-1"
dynamodb_table = "tf-locks" # the lock table
encrypt = true
}
}
| Command | What it really does |
|---|---|
| terraform import | adopt an existing live resource into state without recreating it — how you bring click-ops infra under management |
| terraform state rm | forget a resource (stop managing) without destroying it — surgical state edits |
| terraform taint / -replace | force-recreate a resource on next apply (cordon a bad instance) |
| terraform plan -out | save a plan so apply runs exactly that diff — the safe CI pattern |
Interview Q&A · deep dive
for_each over count?count addresses instances by list index (res[0], res[1]). Delete the middle element and everything after it shifts index — Terraform sees that as destroy+recreate of those resources. for_each keys by a stable map key (res["cache"]), so removing one entry touches only that one. Use count for "N identical copies" or a simple on/off (count = var.enabled ? 1 : 0); use for_each for a set of distinct, named things.plan refreshes live state, detects drift (reality ≠ state), and proposes changes to pull reality back to your config. You either accept (apply re-asserts the config — config is the source of truth) or, if the manual change should be kept, codify it in HCL first. For a resource that should genuinely no longer be managed, terraform state rm. Continuous drift detection in CI (plan on a schedule) catches this before it bites.plan -out=tfplan on the PR (posted as a reviewable diff), gate merge on human approval of that plan, then apply tfplan on merge so apply runs the exact reviewed diff — no surprise drift between plan and apply. Remote state with locking prevents concurrent applies, and a policy-as-code layer (OPA/Sentinel) can hard-block disallowed changes before apply.The breadth shelf — name-drop the rest with judgement breadth
Senior interviews reward breadth with a one-line “when” more than shallow tutorials. These come up constantly; you don't need to have shipped all of them, but you should know what each is and when it's the right reach.
| Tech | What it is · when to reach for it |
|---|---|
| gRPC + Protobuf | fast, typed, binary RPC over HTTP/2 — internal service-to-service calls where REST/JSON is too slow or loose |
| GraphQL | one endpoint, client picks exactly the fields — great for varied front-end needs; watch N+1 and caching |
| Elasticsearch / OpenSearch | full-text + analytics search engine; log search, faceted search, hybrid vector search |
| dbt | SQL transformation + tests + lineage in the warehouse — the “T” in modern ELT (pairs with Snowflake) |
| Apache Spark | distributed compute for big-data ETL & ML over data that won't fit one machine |
| Prometheus + Grafana | metrics scraping + dashboards/alerts — the default observability stack (see Observability) |
| WebSockets | persistent two-way connection for real-time UIs (chat, live dashboards, streaming tokens) |
| Iceberg / Delta Lake | open table formats bringing ACID + time-travel to data-lake files — the “lakehouse” foundation |
| Polars / DuckDB | fast modern data tools — Polars (Rust DataFrames), DuckDB (in-process analytical SQL) when Pandas/Postgres strain |
| Feature store (Feast) | consistent features for training & serving — closes the train/serve skew gap in MLOps |
| Service mesh (Istio) | traffic, mTLS, retries between microservices without app code — when you have many services |
| Tech | What it is · when to reach for it |
|---|---|
| Apache Flink | true streaming compute with event-time, watermarks & large keyed state — when you need stateful joins/windows on streams, not micro-batches (pairs with Kafka) |
| Kafka Connect | config-driven connectors to move data in/out of Kafka (CDC from Postgres, sink to S3) — no custom producer/consumer code |
| Temporal | durable workflow engine — long-running, retryable, stateful orchestrations as plain code that survive crashes (sagas, human-in-the-loop) |
| Celery | Python distributed task queue (Redis/RabbitMQ broker) — background jobs, scheduled work, fan-out when you don't need a full streaming platform |
| Airbyte / Fivetran | managed EL connectors (the "extract-load") — buy the boring pipes from SaaS into the warehouse instead of building 200 integrations |
| Redis | in-memory store — cache, rate-limit counters, ephemeral queues, pub/sub, leaderboards; reach for it when a millisecond matters |
| Envoy / Istio | L7 proxy + mesh control plane — mTLS, retries, traffic-splitting between many services without touching app code (see Kubernetes) |
| Iceberg (table format) | ACID, schema evolution & time-travel over object-store files — the open lakehouse table layer queried by Spark/Trino/Snowflake (see Snowflake) |
| Trino / Presto | distributed SQL engine that federates queries across lake, warehouse & DBs — one SQL surface over many sources without copying data |
| Pulsar | Kafka-alternative log with built-in multi-tenancy, geo-replication & tiered storage — when those are first-class needs over raw throughput |
Interview Q&A · deep dive
YAML — config & data serialization config
YAML is the human-friendly format behind Kubernetes, CI/CD, docker-compose, and Ansible. It's a superset of JSON with indentation-based structure — readable, but with sharp edges that bite in production if you don't know the rules.
defaults: &base # & defines an anchor
retries: 3
timeout: 30
prod:
<<: *base # << merges the anchor, * references it
timeout: 60 # override a single value
hosts: # a list
- web-1
- web-2
notes: | # literal block: newlines preserved
first line
second line
| Rule | Detail |
|---|---|
| Indentation | spaces only (never tabs); nesting is by indent depth |
| Mappings & lists | key: value · list items begin with - |
| Multi-document | --- separates multiple docs in one file |
| Block scalars | | literal (keep newlines) · > folded (join lines) |
| Anchors / merge | &name defines · *name reuses · << merges — keeps config DRY |
import yaml
cfg = yaml.safe_load(open("config.yaml")) # safe: data only, never code
cfg["prod"]["timeout"] # 60
Interview Q&A
YAML's selling point is being readable to humans and a strict superset of JSON — so any valid JSON is valid YAML, and you can mix flow style ({a: 1, b: [2, 3]}) with block style. Under the hood a YAML document is a graph of three node kinds: scalars, sequences (lists), and mappings (dicts). Anchors/aliases make it a graph, not just a tree — the same node can be referenced from multiple places, which is how merge keys avoid copy-paste. The price of human-friendliness is ambiguity: the spec has implicit typing rules that guess scalar types, and that guessing is where production bugs live.
# one file, two documents — common in k8s manifests & --- separators
apiVersion: v1
kind: ConfigMap
data:
port: "8080" # quoted → stays a STRING (k8s data must be strings)
---
apiVersion: v1
kind: Service
spec:
ports: [{ port: 80, targetPort: 8080 }] # flow style = inline JSON-ish
selector: { app: api }
---
# force a type with an explicit tag when the guesser would be wrong
version: !!str 1.10 # without !!str this becomes the float 1.1
country: !!str no # without !!str this becomes the boolean false
ratio: !!float 3 # 3.0 not int 3
import yaml
with open("deploy.yaml") as f:
docs = list(yaml.safe_load_all(f)) # safe + multi-doc aware
for d in docs:
if d.get("kind") == "ConfigMap":
d["data"]["port"] = "9090"
with open("deploy.yaml", "w") as f:
yaml.safe_dump_all(
docs, f,
default_flow_style=False, # block style, human-readable
sort_keys=False, # preserve author ordering
) # NOTE: comments & anchors are LOST on dump
| Written | YAML 1.1 parses it as | Keep it a string by… |
|---|---|---|
| no / off / n | boolean false | quoting: "no" |
| 1.10 | float 1.1 (trailing zero lost) | quoting: "1.10" |
| 3:30 | sexagesimal → 210 (1.1) | quoting: "3:30" |
| 0x1F / 0o17 | int from hex/octal | quoting |
| null / ~ / (empty) | None | quoting: "null" |
Interview Q&A · deep dive
&name anchors a node, *name aliases (references the same node), and <<: *name merges a mapping's keys into the current one (DRY config). The surprise: << is a YAML 1.1 extension, not core 1.2 — strict 1.2 parsers may not honour it, and merge precedence means explicit keys in the child override merged ones, which trips people expecting last-wins across multiple merges. Also, aliases share identity, so mutating an aliased node after load can affect every reference.safe_load a security control, not just a style choice?yaml.load honours type tags like !!python/object/apply that instantiate arbitrary Python objects — feeding it untrusted YAML is remote code execution, a real deserialization CVE class. safe_load restricts construction to the standard scalar/list/dict types. Treat any externally-sourced YAML (user uploads, fetched config) as hostile and always use the safe loader.| vs folded > block scalars, and what do the chomping indicators do?| keeps newlines verbatim (scripts, embedded files, certs); > folds line breaks into spaces (long prose wrapped for readability). The chomping indicator controls the trailing newline: |- strips it, |+ keeps all trailing blanks, | (clip, default) keeps exactly one. This matters for embedded shell scripts where a stray trailing newline or its absence changes behaviour.pytest — testing in Python quality
pytest is the de-facto Python test framework: plain assert statements with rich failure output, fixtures for setup/teardown, parametrization to run one test over many inputs, and a deep plugin ecosystem. Tests are both your safety net and design feedback.
import pytest
@pytest.fixture
def client(): # setup/teardown shared across tests
c = make_client()
yield c # test runs here
c.close() # teardown after
@pytest.mark.parametrize("n,expected", [(2, 4), (3, 9)])
def test_square(n, expected):
assert square(n) == expected # plain assert; pytest shows the diff
def test_calls_api(monkeypatch):
monkeypatch.setattr(api, "get", lambda u: {"ok": True})
assert fetch()["ok"] # no real network call
| Feature | What it gives |
|---|---|
| Fixtures | reusable setup/teardown injected by name; scope per function/module/session |
| @parametrize | one test body, many input/expected cases — great for edge cases |
| monkeypatch / mock | replace external calls (network, time, DB) so tests are fast and deterministic |
| conftest.py | share fixtures across a test tree without importing |
| markers + pytest-cov | tag/select tests (slow, integration) and measure coverage |
Interview Q&A
A fixture isn't just setup code — it's dependency injection by name. Request a fixture by putting its name in a test's signature, and pytest builds the dependency graph (fixtures can depend on other fixtures) and resolves it. Scope controls how often it's built: function (default, fresh per test), class, module, session (once per run — for expensive things like a DB container). The code after yield is teardown and runs even if the test fails, which makes yield fixtures the correct place for cleanup rather than try/finally in every test.
# conftest.py — fixtures here are auto-available to the whole tree, no import
import pytest
@pytest.fixture(scope="session") # built once for the entire run
def db_engine():
eng = create_engine("sqlite:///:memory:")
migrate(eng)
yield eng
eng.dispose()
@pytest.fixture # function-scope, depends on db_engine
def session(db_engine):
conn = db_engine.connect()
txn = conn.begin()
yield Session(bind=conn)
txn.rollback() # each test gets a clean, isolated DB
conn.close()
# a PARAMETRIZED fixture: every test using it runs once per param
@pytest.fixture(params=["v1", "v2"])
def api_version(request):
return request.param
import pytest
@pytest.mark.parametrize(
"raw,expected",
[
pytest.param("NCT01", "NCT01", id="already-clean"),
pytest.param(" nct01 ", "NCT01", id="trim-and-upcase"),
pytest.param("", None, marks=pytest.mark.xfail(reason="empty unsupported")),
],
)
def test_normalize_id(raw, expected):
assert normalize(raw) == expected
@pytest.mark.slow # register in pyproject; select with -m "not slow"
def test_full_pipeline(session):
with pytest.raises(ValueError, match="unknown registry"):
ingest(session, source="???") # assert on the exception, not just that it raised
| Tool | monkeypatch vs unittest.mock |
|---|---|
| monkeypatch | pytest-native, auto-undone at test end; great for env vars, attributes, setattr/setenv/chdir — simple, no assertions on calls |
| mock / MagicMock | when you must assert how it was called (assert_called_once_with), set return values/side-effects, or build a stand-in object |
| mocker (pytest-mock) | thin fixture wrapping mock with auto-cleanup — best of both for call-assertions without manual with patch() nesting |
Interview Q&A · deep dive
session but you need per-test isolation for the DB — how?session-scoped fixture creates the expensive engine/schema once, and a function-scoped fixture opens a transaction (or savepoint) per test and rolls it back in teardown. Each test sees a pristine DB without paying migration cost every time. This split — expensive thing wide, isolation thin — is the standard pattern.pytest -p randomly (pytest-randomly) or --lf/-x, then bisect. The fix is narrowing fixture scope so state can't leak, and never mutating session-scoped fixtures from a test.else paths.Quantum & the 2026 Frontier
The forward-looking layer: where quantum computing actually stands (and the trap of over-claiming it), the cryptography migration it's already forcing on you today, and the agentic-AI shift reshaping how systems get built. Facts here are current as of 2026 — figures stated precisely, never rounded.
Quantum computing & Google Willow state of the art
A qubit holds a superposition of 0 and 1; entangled qubits explore a state space that grows exponentially. The catch is fragility — qubits decohere, so the whole field hinges on error correction: grouping many physical qubits into one stable logical qubit via a surface code.
| Willow fact | Figure |
|---|---|
| Physical qubits (superconducting transmon) | 105, fabbed at Santa Barbara |
| Error suppression per +2 code distance | factor Λ = 2.14 (error halves) |
| Distance-7 logical qubit (101 qubits) | 0.143% error / cycle |
| Beyond breakeven (vs best physical qubit) | lives ~2.4× longer; T1 ~20µs → ~68µs |
| RCS benchmark | <5 min vs ~10²⁵ yrs classical |
Interview Q&A
A classical bit is a switch — 0 or 1. A qubit is a vector on the surface of a sphere (the Bloch sphere): it has a direction, encoding amplitudes for 0 and 1 plus a phase. You never read that direction — measurement collapses it to a single 0/1 with probability set by the amplitudes. The power is not "trying all answers at once" (a popular myth); it's interference — a good algorithm arranges amplitudes so wrong answers cancel and the right one is amplified before you measure.
| Primitive | What it buys | The catch |
|---|---|---|
| Superposition | n qubits hold 2ⁿ amplitudes at once | you can't read them — only sample one outcome |
| Entanglement | correlations no classical state can fake | fragile; touching one qubit disturbs its partners |
| Gates (X, H, CNOT, T) | reversible, unitary rotations build circuits | every gate adds error; depth is the enemy |
Gates are reversible (unlike a classical AND, you can always run them backward), which is why there is no quantum "delete" — and why uncomputing intermediate junk is a real cost. The hard universal gate is the T gate; in a fault-tolerant machine T gates are far more expensive than the rest, so circuit cost is often quoted as T-count.
You cannot copy an unknown qubit (the no-cloning theorem), so classical "store three copies and vote" is illegal. The surface code sidesteps this: spread one logical qubit across a 2-D lattice of physical qubits and measure stabilisers (parity checks on neighbours) every cycle. Those checks reveal where an error happened without ever measuring the data itself; a classical decoder infers the fix in real time. Willow's headline is that this finally crossed below threshold — going 3×3 → 5×5 → 7×7 made the logical error fall (Λ ≈ 2.14 per +2 distance) instead of rising.
# Qiskit: a Bell pair — the "hello world" of entanglement
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector
qc = QuantumCircuit(2, 2)
qc.h(0) # Hadamard: put q0 into equal superposition
qc.cx(0, 1) # CNOT: entangle q0 -> q1
sv = Statevector.from_instruction(qc)
print(sv.probabilities_dict()) # {'00': 0.5, '11': 0.5} — never 01 or 10
qc.measure([0, 1], [0, 1]) # collapse: each shot is 00 or 11, perfectly correlated
# The 50/50 split is interference at work, not "both values stored as data".
Google proved the surface code scales; IBM is racing a different error-correcting code, qLDPC, which needs far fewer physical qubits per logical qubit. At its Nov 2025 Quantum Developer Conference IBM showed Nighthawk (120 qubits, 218 tunable couplers, ~5,000 two-qubit gates) and Loon, the first chip with all the components qLDPC needs, plus real-time error decoding on classical hardware in under 480 ns. The stated targets: quantum advantage by end of 2026 and Starling, a fault-tolerant machine of ~200 logical qubits running 100M gates, by 2029.
Interview Q&A · deep dive
Post-quantum cryptography act now
Quantum's near-term impact on you is defensive, not computational. Shor's algorithm breaks RSA, ECDH and ECDSA in polynomial time on a large fault-tolerant machine (~4,000 logical qubits for RSA-2048). The migration must precede the threat — which is why this is a 2026 problem, not a future one.
| Standard | Algorithm | Replaces |
|---|---|---|
| FIPS 203 | ML-KEM (from CRYSTALS-Kyber) | RSA / ECDH key exchange |
| FIPS 204 | ML-DSA (from CRYSTALS-Dilithium) | ECDSA / RSA signatures |
| FIPS 205 | SLH-DSA (from SPHINCS+) | hash-based signature fallback |
Interview Q&A
All the panic traces to two algorithms with completely different reach. Shor is an exponential break: it turns factoring and discrete-log from intractable into polynomial-time, so RSA, ECDH and ECDSA collapse entirely once a big enough fault-tolerant machine exists. Grover is only a quadratic speedup on brute-force search — it halves the effective bits of a symmetric key. That single asymmetry decides the whole migration: rebuild public-key crypto, merely resize symmetric crypto.
| Algorithm | Speedup | Hits | Response |
|---|---|---|---|
| Shor | exponential | RSA, ECDH, ECDSA, DH | replace with PQC (FIPS 203/204/205) |
| Grover | quadratic (√) | AES, SHA-2/3 (brute force) | double the size: AES-256, SHA-384/512 |
So AES-256 keeps ~128-bit effective security against Grover — still comfortable. The cliff is entirely on the asymmetric side, and lattice math (Module-LWE) is the new foundation because no efficient quantum or classical attack on it is known.
NIST finalised the first three on 13 Aug 2024; the family has since grown with deliberately non-lattice backups so a future break of lattice math isn't catastrophic — defence in depth applied to algorithm families.
| Standard | Algorithm · basis | Status (2026) |
|---|---|---|
| FIPS 203 | ML-KEM · lattice (Kyber) | final, Aug 2024 — primary KEM |
| FIPS 204 | ML-DSA · lattice (Dilithium) | final, Aug 2024 — primary signature |
| FIPS 205 | SLH-DSA · hash (SPHINCS+) | final, Aug 2024 — conservative fallback |
| HQC | KEM · code-based (not lattice) | selected Mar 2025; draft early 2026, final ~2027 |
| FIPS 206 | FN-DSA · lattice (Falcon) | draft submitted Aug 2025; ~1-yr review, final ~2026/27 |
Nobody flips to pure PQC overnight. The 2026 pattern is hybrid: run a classical and a PQC key exchange together and mix both shared secrets through a KDF, so the channel stays safe if either algorithm survives. This is already what browsers ship (X25519 + ML-KEM-768 in TLS 1.3).
# Hybrid KEM: secure if EITHER the classical OR the PQC half holds.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
def hybrid_secret(ss_classical: bytes, ss_pqc: bytes) -> bytes:
# Concatenate both shared secrets, then derive one session key.
# An attacker must break X25519 AND ML-KEM to recover it.
return HKDF(
algorithm=hashes.SHA384(), # SHA-384: Grover-resistant margin
length=32,
salt=None,
info=b"tls13 hybrid x25519+ml-kem-768",
).derive(ss_classical + ss_pqc)
# ss_classical <- X25519 ECDH ; ss_pqc <- ML-KEM-768 decapsulation
key = hybrid_secret(x25519_shared, mlkem_shared)
Interview Q&A · deep dive
The agentic 2026 frontier trending
The dominant near-term shift isn't quantum — it's agentic AI moving from single chatbots to orchestrated systems, and the engineering disciplines forming around it. The senior value migrates from writing code to orchestrating and evaluating it.
| Current | What it is |
|---|---|
| Multi-agent orchestration | a "puppeteer" coordinating specialist agents — agentic's microservices moment |
| MCP | Model Context Protocol — the "USB-C" standard wiring agents to tools/data |
| Small Language Models | route cheap/narrow sub-tasks to SLMs; escalate only hard steps |
| CLI coding agents | delegation over suggestion — autonomous, multi-file, git-worktree isolation |
Interview Q&A
2025–26 is the year agentic AI stopped being a pile of clever frameworks and grew the boring infrastructure that means it's real: open protocols and a neutral standards body. The same arc as the early web — once HTTP and the W3C existed, the platform mattered more than any one browser. For agents that connective tissue is now MCP (agent ↔ tools/data) and A2A (agent ↔ agent), both moved under the Linux Foundation's Agentic AI Foundation (AAIF), founded Dec 2025.
| Layer | 2025–26 standard | Analogy |
|---|---|---|
| Agent → tools/data | MCP (Model Context Protocol) | USB-C for context |
| Agent → agent | A2A (Agent2Agent), v1.2, signed agent cards | HTTP between services |
| Governance | AAIF — Linux Foundation (MCP, goose, AGENTS.md) | the W3C of agents |
| Observability | OpenTelemetry (OTLP) traces across hops | distributed tracing, reused |
The car-repair analogy clarifies it: MCP connects the mechanic (one agent) to their tools — a wrench, the parts database. A2A lets the customer talk to the mechanic and lets mechanics coordinate with each other — peer agents, possibly built by different vendors on different frameworks, discovering one another and exchanging tasks. They are complementary, not competing: a single agent uses MCP inside and speaks A2A outward.
A2A interoperability starts with discovery: each agent serves a small agent card at a well-known URL describing who it is and what skills it offers, so peers can find and call it. v1.2 added cryptographic signing of these cards for domain verification — identity is now part of the protocol, not bolted on.
# An A2A "agent card" — the public manifest peers discover.
# Served at https://host/.well-known/agent-card.json
{
"protocolVersion": "1.2",
"name": "trial-matcher",
"description": "Matches patients to clinical trials",
"url": "https://agents.example.com/a2a",
"capabilities": { "streaming": true },
"skills": [
{ "id": "eligibility-check",
"description": "Score a patient against trial criteria",
"inputModes": ["application/json"] }
],
"securitySchemes": { "oauth2": { "type": "oauth2" } }
}
# A peer reads this, then POSTs a task to /a2a; trace IDs (OTLP)
# follow the call across every agent hop for unified observability.
Three durable trends underneath the protocol noise: (1) multi-model routing — the best systems no longer use one model; they route by cost/latency/capability, frontier models for hard reasoning, small/open models for extraction and classification; (2) pilots → production — 2026 is the year of KPI-gated, human-in-command deployment, not demos; (3) evaluation & observability as the gate — autonomy is only shippable if you can measure faithfulness, tool-call success, and cost continuously.
Interview Q&A · deep dive
Leadership & Career Growth
Your current role is Python Development Manager leading the AT & DS teams. This domain is the deliberate move from senior IC who happens to manage to Senior Manager / Director who multiplies a team — what each level actually requires, how to operate one level up now, business thinking, and the daily / monthly cadence that gets you promoted instead of just busier.
Where you are today — the honest inventory baseline
You can't level up cleanly without naming what your weeks actually contain. As a Python Dev Manager at GlobalData Pharma Intelligence leading AT & DS, your time today splits roughly across four buckets — the goal isn't to do less, it's to shift the mix as you climb.
| Bucket | What it looks like for you now | Healthy mix today |
|---|---|---|
| Build (IC) | CI-Radar cache layer, investigator matcher tiers, Bitbucket/Windows ops, Word/PPTX/Excel deliverables | ~40–50% |
| Lead the team | 1:1s, code review, sprint cadence, unblockers, hiring | ~25–30% |
| Stakeholders & cross-team | R&A feedback loops, scheduler/server alignment, CI-Radar handovers, exec demos | ~15–20% |
| Strategy & thinking | TrainHub roadmap, Political Pulse POC, CI Radar consolidated platform design | ~10–15% |
Interview Q&A
Don't read your time mix as a to-do list — read it as a derivative. Build is output (you produce). Lead is the first derivative (you change the team's output). Stakeholders and Strategy are the second derivative (you change what the org chooses to build at all). Every promotion is the same physical move: shift mass up the ladder while keeping the lower rungs credible. The trap is going to zero on Build — you lose the technical authority that makes your strategy land; the goal is to make Build chosen and rare, not absent.
"I should delegate more" is not a plan. Turn the inventory into a scored, dated artefact you re-run quarterly. Rate each next-level competency 1–5 on evidence (not intent), name the single proof that would move it +1, and let the lowest two scores set your quarter. This is the same self-assessment a calibration committee runs on you — running it yourself first is the whole game.
| Next-level competency | Evidence that scores it high | Your honest signal |
|---|---|---|
| Delegation / multiplier | a team member shipped a hard thing you'd normally own | still >40% personal build = low |
| Successor depth | someone could run the team for a month without you | name them or score it low |
| Written leadership | a circulated one-pager changed a decision | count them in the last quarter |
| Business fluency | you pitched an initiative in revenue/cost/customer terms | did anyone above you repeat it? |
| Cross-team influence | a peer team changed behaviour on your argument | no recent case = a real gap |
# A tiny readiness self-audit you actually re-run each quarter.
# Score on EVIDENCE (what shipped), not intent. Lowest two set the quarter.
from dataclasses import dataclass
@dataclass
class Competency:
name: str
score: int # 1=no evidence ... 5=consistent evidence
next_proof: str # the ONE artefact that moves it +1
audit = [
Competency("delegate signature work", 2, "hand off FDA cleanup end-to-end"),
Competency("grow a successor", 2, "a lead presents to R&A without me"),
Competency("lead in writing", 3, "circulate CI-Radar platform 1-pager"),
Competency("business fluency", 3, "re-pitch cache layer as margin"),
]
focus = sorted(audit, key=lambda c: c.score)[:2] # attack the weakest, not the loudest
for c in focus:
print(f"Q-focus: {c.name} ({c.score}/5) → {c.next_proof}")
Interview Q&A · deep dive
Manager → Senior Manager → Director — what each level actually requires map
Promotions stall when people assume the next level is "more of this." It isn't — the kind of value changes. Here's the honest expectations matrix, calibrated to engineering management in a product/intelligence org like yours.
| Axis | Manager (today) | Senior Manager (next) | Director (after) |
|---|---|---|---|
| Scope | one team, one product/area | multiple teams or a large team; one full product line | a portfolio; a function across the org |
| Horizon | this quarter | 2–3 quarters ahead | 1–2 year strategy + hiring plan |
| Source of value | delivery + raising your ICs | raising your managers / leads + multi-team outcomes | org design, capability bets, talent density |
| Tech depth | code-level on hard problems | architecture & trade-off review | tech bets & build/buy at platform scale |
| Business view | understands product KPIs | owns product KPIs; speaks revenue/cost/customer | connects tech bets to P&L & strategy |
| Stakeholders | peers + product mgr + 1–2 levels up | cross-functional execs, customers, vendors | C-suite, board-adjacent, external |
| Hiring | hires juniors/seniors | hires staff engineers + leads; builds bench | hires managers; succession planning |
| Failure mode | becomes the team's bottleneck | still firefighting at IC level | too far from reality; trusts slide decks over signal |
Interview Q&A
There are two different motions people confuse. Scope (more teams, a bigger product, a harder bet) is granted by your manager — it's a bet on you and it comes before the title. Level is ratified by a calibration committee after you've demonstrably operated at it, usually for ~two quarters, with evidence other leaders can see. So the sequence is always: take scope → operate up → accumulate cross-org evidence → get ratified. People who ask for the title before taking the scope are reversing the only order that works.
A common fear is that climbing means abandoning engineering. It doesn't; the resolution changes. As Manager you debug at the line; as Senior Manager you review the architecture and the trade-off; as Director you make the platform-scale bet and the build/buy call. The depth must stay real enough to call BS — a Director who can't tell a credible architecture from a confident slide is the failure mode the matrix already named. Keep one channel into real technical signal (a design review you actually attend, a postmortem you actually read) at every level.
| Decision | Manager owns | Senior Manager owns | Director owns |
|---|---|---|---|
| Architecture | this service's design | cross-team contracts & review bar | platform bets, build/buy/partner |
| Hiring | fills roles to plan | raises the bar; builds bench | sets headcount & org shape |
| Roadmap | this quarter's commitments | 2-3 quarter sequencing & bets | 1-2 year strategy & capability |
| Conflict | within the team | between teams / with peers | across functions / external |
Interview Q&A · deep dive
Operating a level up now — the seven moves practical
Promotion isn't granted; it's ratified after you've already been doing the next job. These seven moves shift you there without waiting for a title.
| # | Move | What it looks like |
|---|---|---|
| 1 | Cap your IC time | one signature hard problem in your hands; the rest delegated with you on review & coaching |
| 2 | Grow a successor | one engineer reaches "could run this team for a month" — the single biggest signal of readiness |
| 3 | Own a multi-quarter bet | not just sprints — something with a 6–12-month arc (CI Radar consolidated platform, AT × DS roadmap, eval-driven QE practice) |
| 4 | Write, don't just talk | one-pagers / strategy docs / vision memos — leaders at the next level work in writing |
| 5 | Connect tech to business | every initiative tagged to a KPI (revenue, retention, cost, time-to-insight) |
| 6 | Influence without authority | get a peer team or stakeholder to change behaviour because your argument was right — not because you outrank them |
| 7 | Say no, well | protect the team from low-leverage work; explain the trade-off in business terms, propose the alternative |
Interview Q&A
"Delegate more" fails because it's a volume instruction, not a sorting rule. Sort every piece of work on two axes: leverage (does only you have the context/authority?) and growth value (would owning this stretch a teammate?). That gives four quadrants and four different actions — the most common mistake is hoarding the bottom-right (high-growth, low-leverage) because you're faster, which starves your successor of exactly the reps they need.
| Low growth value | High growth value | |
|---|---|---|
| High leverage (only you) | do it now, briefly (sign-offs, exec asks) | do-with: pair, narrate your reasoning, then hand the next one over |
| Low leverage (others can) | automate or kill it (it shouldn't need a human) | delegate the outcome + decision rights — your highest-ROI move |
Operating up means decisions stop being yours to make alone and start being yours to build consensus for. Two senior mechanics do most of the work. First, the pre-wire: never let a stakeholder hear a proposal for the first time in the room — talk to each one-on-one beforehand, absorb their objection, and walk in with it already handled. Second, explicit decision rights (a lightweight RACI) so "who decides" is settled before the debate, not during it. Surprises and ambiguous ownership are what actually kill cross-team initiatives — not bad ideas.
| Role | Means | For CI-Radar consolidated platform |
|---|---|---|
| Responsible | does the work | your AT/DS leads |
| Accountable | one neck, owns the outcome | you |
| Consulted | two-way input before deciding | R&A, scheduling/server owners |
| Informed | told after, one-way | exec sponsors, adjacent teams |
Hiring is a level-up move because it compounds for years and is where unstructured judgment does the most damage. Replace "felt strong" with a weighted, pre-committed scorecard and a default-no bar: independent scores first (kill groupthink), then debrief.
# Structured hiring decision — weights set BEFORE interviews, scores independent.
SIGNALS = { # weight by what THIS role needs
"problem_solving": 0.30,
"code_quality": 0.20,
"system_design": 0.25,
"collaboration": 0.15,
"ownership": 0.10,
}
def decide(scores, bar=3.2): # scores: dict signal -> 1..4 per interviewer
weighted = sum(SIGNALS[s] * scores[s] for s in SIGNALS)
no_hire = any(v <= 2 for v in scores.values()) # any hard fail = stop
verdict = "NO — default to no on doubt"
if weighted >= bar and not no_hire:
verdict = "HIRE"
return round(weighted, 2), verdict
print(decide({"problem_solving":4,"code_quality":3,
"system_design":4,"collaboration":3,"ownership":3}))
# (3.45, 'HIRE') — defensible, repeatable, bias-resistant
Interview Q&A · deep dive
Business & commercial thinking — the second language commercial
At Manager and below, engineering excellence is enough. From Senior Manager up, you must also speak commercial — the language of revenue, cost, customers, and trade-offs that the rest of the company uses. You don't need an MBA; you need the same dozen ideas to be muscle memory.
| Concept | One-line meaning | How it lands for your work |
|---|---|---|
| P&L | revenue − costs = profit (for a product / unit) | CI-Radar is a P&L: subscription revenue minus the cost of running it; cache layer cut cost |
| Unit economics | per-customer cost vs per-customer revenue | your LLM _track_usage() is unit-economics instrumentation |
| ARR / MRR | annualised / monthly recurring revenue | what enterprise pharma sales actually book |
| CAC / LTV | cost to acquire vs lifetime value | LTV/CAC > 3 is healthy SaaS |
| Churn | customers (or revenue) lost per period | logo churn vs gross-revenue retention vs net-revenue retention |
| Gross margin | (revenue − COGS) / revenue | LLM token cost is now a real COGS line |
| Build / buy / partner | do it yourself, license, or partner | frontier LLMs → buy; matching logic → build; registries → partner |
| ROI & payback | return on investment / months to recoup | Dell ReAct: 400+ FTE saved → measurable payback <1 quarter |
| Opportunity cost | what you didn't do because you did this | the most ignored cost in engineering planning |
| Moat | what makes your product hard to replicate | the 5.4M-record investigator graph and your registry breadth are moats |
Interview Q&A
The basic vocabulary (P&L, CAC, LTV, churn) gets you in the room. The metrics that investors and your CEO actually watch are the composite ones — and as of 2025 the bar has moved. Knowing the current numbers, not the textbook ideals, is what makes you sound like you sit in the business, not adjacent to it.
| Metric | What it is | Current bar (2025) |
|---|---|---|
| Rule of 40 | revenue growth % + profit margin % | > 40 is healthy; public-SaaS median ~28 — only ~1 in 5 clear it |
| NRR (net rev retention) | this cohort's revenue a year later (expansion − churn) | median ~82%; top-quartile ~130% (>100 = grows without new logos) |
| CAC payback | months of gross profit to recoup acquisition cost | top-quartile ~16 mo; >18 mo + margin <75% = borrowing from the future |
| LTV : CAC | lifetime value vs cost to acquire | >3:1 viable; enterprise top-quartile 4-6:1 |
| Magic number | new ARR ÷ prior-period S&M spend | >0.75 = efficient growth, spend more; <0.5 = fix the funnel first |
A senior business case is a small, honest model anyone can stress-test — not a paragraph of optimism. Make the assumptions explicit, compute payback and annualised ROI, and let the reader change the inputs. The reframe that lands with a CFO: cost in weeks, savings in dollars/month, payback in months, then the risk you'd accept.
# Business case for the CI-Radar cache layer. Inputs are assumptions —
# make them visible so a CFO can push on them, not on your conviction.
ENG_WEEKS = 6
COST_PER_WEEK = 3000 # fully-loaded engineering cost
QUERIES_MONTH = 900_000
LLM_COST_QUERY = 0.014 # $ per query before caching
CACHE_HIT_RATE = 0.55 # conservative; measure, don't hope
def case():
build_cost = ENG_WEEKS * COST_PER_WEEK
monthly_save = QUERIES_MONTH * LLM_COST_QUERY * CACHE_HIT_RATE
payback_mo = build_cost / monthly_save
roi_year = (monthly_save * 12 - build_cost) / build_cost
return build_cost, round(monthly_save), round(payback_mo, 1), round(roi_year, 1)
cost, save, payback, roi = case()
print(f"Build ${cost:,} · saves ${save:,}/mo · payback {payback} mo · {roi:.0%} 1-yr ROI")
# Build $18,000 · saves $6,930/mo · payback 2.6 mo · 362% 1-yr ROI
# Talk-track: "and it lifts gross margin on the intelligence product,
# removing the cost blocker to expand to the 3 clients we deferred."
Interview Q&A · deep dive
The promotion cadence — daily, weekly, monthly, quarterly habits
Most managers stall not from lack of ability but from lack of rhythm. Promotions follow visible, sustained operation at the next level — which means a deliberate cadence you actually run. Steal this one and adapt it.
| Cadence | Ritual | Output |
|---|---|---|
| Daily (15 min) | review priority list; one act of delegation or coaching; one block of strategic time protected on the calendar | your time mix shifts — the only thing that matters |
| Weekly | 1:1 with each direct (focused on their growth, not status); 1:1 with your manager (focused on outcomes & risks, not tasks); one cross-team conversation outside your line | relationships compound; risks surface early |
| Bi-weekly | update the brag doc — dated entries: outcome, impact in numbers, who saw it | raw material for promo case & perf review |
| Monthly | skip-level with your manager's manager (or a senior peer / sponsor); one written artefact (one-pager / vision / postmortem) | visibility + written evidence |
| Quarterly | career conversation with your manager: "Am I operating at L+1? What's the gap?"; revise the 12-month growth plan | explicit promo trajectory & signal |
| Yearly | self-review: ship a written promo case = scope, outcomes, growth, gaps closed; pick one stretch bet for next year | compounding visibility & intentional growth |
Interview Q&A
Promotions are decided on evidence at a moment in time (calibration) by people with imperfect memory of a year's work. Two forces work against you: recency bias (the last six weeks dominate the impression) and memory decay (your January wins are gone by December — yours and your manager's). Cadence is the engineering answer: a write-ahead log of impact (the brag doc) plus scheduled visibility (skip-levels, written artefacts) so the moment the decision is made, the evidence is already assembled and already seen. Effort without cadence is a tree falling in an empty forest.
A brag doc isn't a diary; it's structured promo evidence written in advance. Every entry answers four questions the committee will ask, in their order. Make it dated and append-only so it doubles as your perf review and your promo case with citations.
# brag-doc.md — append-only, dated. Update bi-weekly (15 min).
## 2026-Q2
### CI-Radar cache layer
- What: designed + shipped query/embedding cache across the RAG pipeline
- Impact: -55% LLM cost/query, -40% p95 latency # NUMBERS, always
- Level: SM signal — owned cross-team rollout, not just code
- Witness: R&A lead, scheduling owner, demoed to exec sponsor
### Grew a successor
- What: handed investigator R&A feedback loop end-to-end to a senior
- Impact: they presented to R&A without me in the room
- Level: the single strongest readiness signal for SM
- Witness: their manager (me), R&A stakeholders
Your manager nominates; a sponsor spends political capital defending the promotion in the room you're not in. You don't ask for sponsorship — you earn it by doing visible, sustained, excellent work on the things that sponsor already cares about, then making it trivially easy for them to advocate: a tight evidence packet, a clear narrative, no homework required. The cadence feeds this directly — the brag doc is the packet you hand your sponsor.
| Cadence touchpoint | What it builds | Sponsorship link |
|---|---|---|
| Weekly cross-team chat | relationships outside your line | future sponsors form here |
| Monthly skip-level | visibility two levels up | the room where ratification happens |
| Monthly written artefact | durable, shareable evidence | what a sponsor forwards on your behalf |
| Quarterly career convo | explicit gap + trajectory | turns your manager into your first sponsor |
Interview Q&A · deep dive
Problem Solving & Reasoning
The meta-skill beneath every other domain: how to analyze a problem from the root, reason from fundamentals instead of memorized recipes, and pick the right method to crack it. Tools change; these thinking patterns are what let you solve problems you've never seen — and they're what interviewers are really scoring when they watch you work.
How to analyze any problem — the loop framework
Most people jump straight to solving. Strong problem-solvers spend more time understanding. This is the universal loop — Polya's method generalized — that works on a bug, a system design, a business question, or a whiteboard problem.
| Step | The question it answers |
|---|---|
| Understand | what is actually being asked? what's known vs unknown? restate it in your own words. |
| Define | what does "solved" look like? success criteria, constraints, scope. |
| Decompose | break it into sub-problems; locate the core difficulty. |
| Strategize | pick a method (analogy, work backwards, simplify…); plan before you build. |
| Execute | build the smallest thing that could possibly work. |
| Verify | test against the definition; check edge cases; does it truly solve it? |
| Reflect | what generalizes? what would you do differently next time? |
Interview Q&A
The arrows aren't one-way. The loop is a controller with an error signal: every pass through Verify measures the gap between where you are and the Defined goal, and feeds that gap back to refine Understand. Beginners run it once, top-down, and call it done. Experts run tight, cheap loops — small experiments that buy information — and let the result reshape the problem. The skill is not following seven steps; it's deciding, at each turn, which step is currently the bottleneck and spending there.
Request: "the dashboard is slow, make it fast." Watch the loop turn a non-problem into a solvable one.
| Step | Concrete move on this request |
|---|---|
| Understand | "Slow" for whom? One user, one page. p95 load is 9s; the rest are fine. Known: one tenant; unknown: which query. |
| Define | Done = that page's p95 under 2s for that tenant, no regression elsewhere. Now it's measurable. |
| Decompose | Render time vs network vs server time. Trace shows 8.4s is server-side, in one endpoint. |
| Strategize | Hypothesize an N+1 query (analogy to a pattern seen before) — cheapest theory to test first. |
| Execute | Add a single eager-load / join; measure on that tenant's data, not synthetic data. |
| Verify | p95 now 1.3s; spot-check three other tenants for regressions. Meets the definition. |
| Reflect | Generalize: add a query-count assertion in tests so N+1 can't silently return. |
Interview Q&A · deep dive
Define (an explicit success criterion, because engineering problems are solved, not just understood) and Reflect (capturing the generalizable lesson so the next problem is cheaper). Science seeks truth; this loop seeks a verified, durable solution.First-principles thinking — roots & fundamentals reason from scratch
Reasoning by analogy copies what others did ("everyone uses X"). First-principles reasoning strips a problem to its fundamental, irreducible truths and rebuilds from there — how you find non-obvious solutions and escape inherited assumptions.
| Reasoning by analogy | Reasoning from first principles |
|---|---|
| "It's done this way, so we do it this way" | "What do we actually know to be true here?" |
| Fast, usually fine, inherits hidden limits | Slower, harder, finds what others miss |
| Copies the surface | Asks why the surface exists |
Interview Q&A
First-principles thinking is climbing down the abstraction ladder until every rung is something you can independently verify, then climbing back up by construction. Reasoning by analogy operates near the top of the ladder ("use the framework everyone uses"); it inherits whatever assumptions are baked into the rungs below — including the broken ones. The discipline is to ask, at each rung, "is this a law, or a convention someone chose?" Laws (the spec, the math, the physics, the data) you keep; conventions you are free to discard.
"This managed log pipeline costs $40k/month, that's just what observability costs." First-principles: price the irreducible inputs, ignore the vendor's bundle.
| Layer | Analogy answer | First-principles answer |
|---|---|---|
| What you pay for | "the platform's per-GB rate" | storage bytes + ingest CPU + query compute — three separable costs |
| The actual driver | "we log a lot" | 92% of bytes are DEBUG logs no one queries after 24h |
| The rebuild | "negotiate the contract" | sample DEBUG at 5%, tier old logs to cold storage → same signal, ~$6k/month |
The inherited frame ("observability is expensive") was never the constraint. The irreducible question — which bytes carry information we'll actually use? — was, and it had a 6x answer hiding in plain sight.
Interview Q&A · deep dive
Root cause analysis — root-level investigation find the real cause
A symptom is where it hurts; the root cause is why. Fixing symptoms makes problems recur. Root-cause techniques force you past the surface to the underlying defect — in process, design, or assumption — so the fix actually holds.
| Technique | How it works |
|---|---|
| 5 Whys | ask "why?" about five times, walking from symptom down to the root |
| Fishbone (Ishikawa) | brainstorm candidate causes by category — people, process, tools, data, environment |
| Pareto (80/20) | a few causes drive most failures; fix those first for the biggest win |
| Fault tree | work top-down from the failure through AND/OR cause branches |
Interview Q&A
RCA rests on one shift: every recurring failure is a property of the system, not the person. If a human error could cause an outage, the real root cause is the missing guardrail that let a single human error reach production. "Operator typed the wrong flag" is never a root cause — "a destructive command had no confirmation and no staging gate" is. This reframing is what makes postmortems blameless and useful: it points the fix at something you can actually change.
Pareto tells you which failure to dig into; 5 Whys tells you how deep to dig. They compose.
| Failure class (last 30 days) | Count | % of pages |
|---|---|---|
| Deploy-time config drift | 22 | 61% |
| Upstream timeout | 7 | 19% |
| Disk full | 5 | 14% |
| Other | 2 | 6% |
Pareto says: ignore the long tail, attack config drift (61%). Now 5 Whys on it: pages on deploy → why? prod config differed from staging → why? values were edited by hand in the console → why? there was no config-as-code path → why? the original service shipped before IaC was standard → why? no migration was ever scheduled. Root cause: config lives outside version control. Fix: move it into reviewed IaC — kills 61% of pages, not one of them.
Interview Q&A · deep dive
Decomposition & solving strategies methods toolbox
When a problem is too big to solve directly, you change its shape. These are the classic strategies — the moves an expert reaches for when stuck. Keep them as a checklist.
| Strategy | The move |
|---|---|
| Divide & conquer | split into independent sub-problems, solve, combine (merge sort, MapReduce) |
| Abstraction | drop the detail, model the essence, solve the general case |
| Work backwards | start at the goal and reason toward the start (proofs, planning, mazes) |
| Simplify / specialize | solve a smaller or special case first (set n=1, then n), then generalize |
| Analogy / pattern-match | "what known problem is this like?" — map it to a graph, a queue, a DP |
| Invariants & constraints | find what must stay true; use constraints to prune the search space |
| Inversion | instead of "how to succeed", ask "how would this fail?" and avoid that |
Interview Q&A
The strategies in the table are all the same primitive — cut the problem along a seam where the pieces are weakly coupled — applied to different axes. Divide-and-conquer cuts along data (halve the input). Abstraction cuts along detail (drop what doesn't matter). MECE cuts along categories (no gaps, no overlaps). The expert move isn't knowing the list; it's having a feel for where the natural seams are, because a cut across a tightly-coupled join just creates two sub-problems that have to constantly talk to each other — which is harder than the original.
"Solve halves, combine" isn't a metaphor — it's an algorithm shape with a known cost. Here it counts inversions (how far a list is from sorted) in O(n log n), something the brute-force double loop does in O(n²):
def sort_and_count(a):
# base case: a single element is sorted, 0 inversions
if len(a) <= 1:
return a, 0
mid = len(a) // 2
left, cl = sort_and_count(a[:mid]) # divide
right, cr = sort_and_count(a[mid:])
merged, cm = _merge_count(left, right) # conquer + combine
return merged, cl + cr + cm
def _merge_count(l, r):
out, i, j, inv = [], 0, 0, 0
while i < len(l) and j < len(r):
if l[i] <= r[j]:
out.append(l[i]); i += 1
else:
out.append(r[j]); j += 1
inv += len(l) - i # every remaining left elem is an inversion
out.extend(l[i:]); out.extend(r[j:])
return out, inv
print(sort_and_count([2, 4, 1, 3, 5])[1]) # 3 inversions
The structure pays off twice: it's faster and the inversion count falls out of the combine step for free — a count the naive approach can't get without the full O(n²) comparison.
| Stuck signal | Reach for | Because |
|---|---|---|
| Too many cases / huge input | Simplify: solve n=1, n=2 | the pattern that scales is visible in the small case |
| Goal is clear, start is murky | Work backwards | the last step often forces the second-to-last |
| "I've seen something like this" | Pattern-match to a known structure | borrow a proven algorithm instead of inventing |
| Search space explodes | Find an invariant / constraint | each constraint prunes whole branches |
| Hard to define success | Inversion: define failure, avoid it | "how would this break?" is often easier to enumerate |
Interview Q&A · deep dive
Logical reasoning & mental models think clearly
How you move from evidence to conclusion. Knowing the modes of inference — and the biases that corrupt them — keeps your reasoning honest under pressure.
| Mode | From → to |
|---|---|
| Deduction | general rule → certain conclusion (all A are B; x is A; so x is B). Math, logic. |
| Induction | specific observations → probable rule (it rose every day → it'll rise tomorrow). Science, ML. |
| Abduction | observation → best explanation (the lawn is wet → it probably rained). Debugging, diagnosis. |
| Hypothesis-driven | form a falsifiable guess → design a test that could disprove it → run it. |
| Bayesian updating | start from a prior → update on evidence; extraordinary claims need extraordinary evidence. |
| Mental model | Use |
|---|---|
| Occam's razor | prefer the simplest explanation that fits the facts |
| MECE | split a space into mutually exclusive, collectively exhaustive parts — no gaps, no overlaps |
| Second-order thinking | "and then what?" — the consequences of the consequences |
| Inversion | solve a goal by working out how it would fail, then avoid that |
Interview Q&A
The three modes differ in what they're allowed to conclude. Deduction transfers certainty: if the premises hold, the conclusion must. Induction manufactures confidence from repetition — it can be overturned by one black swan. Abduction picks the best available story and is the weakest of the three (the obvious explanation can be wrong), which is exactly why debugging — pure abduction — must always be followed by a deductive test. Misjudging which mode you're in is the deepest reasoning error: treating an inductive pattern ("it's always been fine") as a deductive guarantee is how systems get blindsided.
A test for a rare condition is 99% accurate; base rate is 1 in 1000. A positive comes back. How worried should you be? Intuition screams "99%." The math says ~9%.
def posterior(prior, sensitivity, false_pos):
# P(condition | positive) via Bayes' rule
p_pos = prior * sensitivity + (1 - prior) * false_pos
return (prior * sensitivity) / p_pos
p = posterior(prior=0.001, sensitivity=0.99, false_pos=0.01)
print(round(p, 3)) # 0.09 — a rare prior swamps a "good" test
Engineering version: an alert that fires "99% accurately" on an event that's genuinely rare is mostly false positives. Base rates beat test accuracy — this is the math behind why noisy alerting and over-eager anomaly detectors get muted, and why "the model is 99% accurate" tells you almost nothing without the prior.
| Bias | How it shows up in engineering | The counter |
|---|---|---|
| Confirmation | only checking logs that fit your theory | predict what you'd see if you're wrong, then look for that |
| Anchoring | the first estimate sets the whole sprint plan | estimate independently before hearing others' numbers |
| Survivorship | studying only the services that didn't fail | deliberately go find the failures / the churned users |
| Recency / availability | last incident dominates the roadmap | weight by frequency & impact data, not vividness |
| Sunk cost | "we've spent 3 months, we can't stop now" | decide on future value only; past spend is gone |
Interview Q&A · deep dive
Debugging as applied science systematic
Debugging isn't luck — it's the scientific method aimed at code. Random changes ("shotgun debugging") burn hours; a disciplined loop finds the cause fast.
| Step | What to do |
|---|---|
| Reproduce | get a reliable, minimal repro — a bug you can't reproduce, you can't fix |
| Read the error | the stack trace usually names the file, line, and cause — read it before guessing |
| Isolate (bisect) | binary-search the space: comment out half, git bisect across commits, shrink the input — each step halves it |
| Hypothesize | form one testable theory of the cause |
| Test | change one thing, predict the result, observe — never two at once |
| Verify & prevent | confirm the fix, then add a test so it can't regress |
Interview Q&A
Bisection turns a linear search into a logarithmic one, and the gap is enormous: across 1,000 commits, a linear walk averages 500 checks; git bisect needs at most 10. That's the whole reason the loop's Isolate step dominates — every other step gets cheaper once the bug is localized to one commit, one function, or one input row. The precondition is a reliable test for "broken vs not"; with that, bisection is nearly mechanical. Here's the engine that git bisect automates, made explicit:
def find_first_bad(commits, is_bad):
# commits: chronological list; is_bad(c): True once the bug exists
lo, hi = 0, len(commits) - 1
first_bad = None
while lo <= hi:
mid = (lo + hi) // 2
if is_bad(commits[mid]):
first_bad = commits[mid] # candidate; look earlier
hi = mid - 1
else:
lo = mid + 1 # still good; look later
return first_bad
# the bug appears at commit index 37 (a regression in formatting)
log = [f"c{i}" for i in range(1000)]
print(find_first_bad(log, lambda c: int(c[1:]) >= 37)) # c37, in ~10 probes
You can't bisect a bug you can't trigger. For races, the move is to amplify the timing window until the bug is reliable, prove it, then fix and prove it's gone. This harness exposes a classic check-then-act race:
import threading
balance = {"v": 100}
def withdraw(amount, slow):
if balance["v"] >= amount: # check
slow() # widen the race window on purpose
balance["v"] -= amount # act — two threads can both pass the check
def race():
balance["v"] = 100
delay = lambda: __import__("time").sleep(0.01)
ts = [threading.Thread(target=withdraw, args=(100, delay)) for _ in range(2)]
for t in ts: t.start()
for t in ts: t.join()
return balance["v"]
print(race()) # -100 reliably — the injected sleep makes the race deterministic
Once it's deterministic you have a regression test. The fix (a lock around check-and-act) is then trivial to verify — re-run the same harness and it stays at 0. The skill was never the lock; it was making the ghost stand still.
Interview Q&A · deep dive
git bisect to work, and what breaks it?git bisect skip). So for flaky bugs you must first make the failure deterministic — then bisect. Bisection is mechanical only once "broken vs not" is a sharp, repeatable signal.Systematic solving — coding & design problems interview-ready
A repeatable method for whiteboard and system-design problems so you never freeze. The rule: clarify before coding, correctness before speed.
| Step | What to do |
|---|---|
| Clarify | restate it; ask about input ranges, types, edge cases, constraints — catches "the wrong problem" |
| Examples | work one concrete case by hand, including an edge case |
| Brute force first | state the obvious O(n²) solution out loud — correctness before cleverness; never freeze hunting the optimal |
| Optimize | name the bottleneck, then reach for a pattern: hash map, two pointers, sorting, heap, DP |
| Code | clean, named, in small pieces |
| Test | walk edge cases: empty, single element, duplicates, overflow, null |
Interview Q&A
The spine is worth seeing as a single transcript, because interviewers score the seams between steps — the moment you say "the brute force is O(n²) because of the nested scan; the bottleneck is the repeated lookup, so I'll trade space for time with a hash map" is the moment you demonstrate optimization is a deliberate move, not a memorized trick. Below, the same problem (two-sum) carried through the rail, naive then optimized, with the reasoning that connects them.
# CLARIFY: ints may be negative; exactly one answer; return indices.
# EXAMPLE by hand: [2,7,11,15], target 9 -> (0,1) since 2+7=9.
def two_sum_brute(nums, target): # O(n^2) time, O(1) space
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] + nums[j] == target:
return (i, j)
return None
# BOTTLENECK: the inner loop re-searches for the complement every time.
# OPTIMIZE: remember what we've seen -> hash lookup is O(1).
def two_sum(nums, target): # O(n) time, O(n) space
seen = {} # value -> index
for i, x in enumerate(nums):
need = target - x
if need in seen:
return (seen[need], i)
seen[x] = i
return None
# TEST: normal, no-solution, duplicates, negatives, two-element edge.
assert two_sum([2, 7, 11, 15], 9) == (0, 1)
assert two_sum([3, 3], 6) == (0, 1) # duplicate values
assert two_sum([-1, -2, -3], -5) == (1, 2) # negatives
assert two_sum([1], 2) is None # no pair
print("all cases pass")
Note the narration baked into comments — clarify, example, bottleneck, optimize, test. That's the spoken track an interviewer hears. The hash-map move (remember what you've seen so each future element is an O(1) lookup) is one of the five patterns that crack most array problems; naming it out loud is the signal.
| Bottleneck you name | Pattern to reach for | Buys you |
|---|---|---|
| repeated lookup / "have I seen X?" | hash map / set | O(n²) → O(n) |
| find pairs in a sorted array | two pointers | O(n²) → O(n) |
| "k largest / smallest", streaming | heap | O(n log n) → O(n log k) |
| contiguous subarray / window | sliding window | O(n²) → O(n) |
| overlapping subproblems | dynamic programming | exponential → polynomial |
Interview Q&A · deep dive
Multi-Domain Mastery
The reframe that makes everything else portable: the examples in this hub lean on clinical-trial & pharma intelligence because that's where the work happened — but the skills are domain-agnostic. This section pulls the transferable spine out from under the pharma skin, teaches you to ramp into any industry fast, maps your exact stack onto other high-value domains, and turns your range into a deliberate growth path. The goal: be the engineer who drops into any domain and is productive in weeks.
Your transferable skill spine domain-agnostic
The most freeing realization for your career: you're not a "pharma engineer." You're a data + AI engineer who happens to work in pharma. The same spine — ingest messy real-world data, structure it, layer ML/GenAI, serve it — is exactly what finance, legal, healthcare, govtech, and retail pay for. The domain is a swappable layer on top.
messy / regulated sources→ Structure
clean · model · match→ Intelligence
RAG · ML · agents→ Serve
dashboard · API · report
| Portable (≈90%) | Domain-specific (≈10%) |
|---|---|
| scraping, parsing, ETL, data modelling | the entities (trials vs trades vs cases) |
| RAG, embeddings, agents, summarization | the jargon & mental model |
| entity matching, dedup, fuzzy logic | the regulations (GxP vs SOX vs HIPAA) |
| pipelines, APIs, dashboards, MLOps | the key business metric |
Interview Q&A
Picture your career as two stacked layers. The value stack (ingest → structure → intelligence → serve) is what produces money and never changes. The domain skin (entities, jargon, regs, the one metric) wraps it and does change — but it's the thin part. When you say "I'm a pharma engineer" you accidentally name yourself after the skin; when you say "I build production data+AI systems over messy regulated sources" you name yourself after the stack. Pricing, mobility, and confidence all follow which layer you anchor your identity to.
A mechanical drill: take any bullet that names a domain and rewrite it so the capability leads and the domain becomes the evidence clause. Do this once per resume line.
# skin-first (boxes you in)
"Built a clinical-trial RAG pipeline over 40+ pharma registries."
# stack-first (capability leads, domain is the proof)
"Build production RAG + entity-resolution systems over messy,"
"regulated, heterogeneous sources — proven on a 440K-record"
"trial-intelligence platform spanning 40+ registries."
# the reusable formula:
# [portable capability] + [scale/quality] + [domain as evidence, last]
When you learn something new, ask: "Would a different industry pay for this exact thing tomorrow?" If yes, it belongs on the spine — invest deeply and put it on the resume's top line. If it's only legible inside one vertical, it's skin — learn enough to be credible, but don't let it define you.
| Signal | Spine (invest) | Skin (rent) |
|---|---|---|
| Transfers across industries | yes — ETL, RAG, matching, MLOps | no — GxP audit trails, ICH codes |
| Shelf life | years to a decade | changes with the vertical |
| Resume placement | headline capability | evidence clause / context |
| Re-learn cost on domain switch | ≈ zero | 2–4 weeks (see ramp method) |
Interview Q&A · deep dive
Ramp into any domain — fast domain acquisition
Productivity in a new industry isn't about years — it's about learning the right seven things quickly. This is the repeatable method to go from zero to credible in a new domain in weeks, the way a good consultant onboards a new client.
| Learn fast | The question — with a pharma→finance analogy |
|---|---|
| 1 Entities | what are the core nouns? (trials, investigators → trades, counterparties) |
| 2 Data sources | where does the messy data live? (registries → filings, ledgers, market feeds) |
| 3 Regulations | what rules constrain it? (GxP, HIPAA → SOX, KYC, GDPR) |
| 4 Key metrics | what does the business optimize? (enrollment → risk, conversion, churn) |
| 5 Workflows | what's the end-to-end process, and where's the pain? |
| 6 Stakeholders | who decides, and what do they actually care about? |
| 7 Jargon | the ~50 words that unlock every conversation |
Interview Q&A
The seven questions tell you what to learn; this is the order and tempo. Treat a new domain like a system to reverse-engineer: read for the map, interview to correct your map, then ship to prove you actually understand it. The ship step is non-negotiable — building one small real thing surfaces every wrong assumption that reading let you keep.
A consultant-style onboarding you can literally paste into a planning doc. The constraint is what makes it work: a hard deadline forces you to learn the load-bearing 20% and ignore the rest.
# WEEK 1 — build the map (input-heavy)
Day 1-2 Entities + data sources # the core nouns & where truth lives
Day 3 Regulations + key metric # what constrains, what's optimized
Day 4 Workflow walk-through # end-to-end; mark the pain point
Day 5 Two expert interviews # correct the map; harvest jargon
# WEEK 2 — convert reading into understanding (output-heavy)
Day 6-8 Ship ONE small real thing # a parser, a match, a tiny dashboard
Day 9 Demo to a domain expert # their corrections = the real syllabus
Day 10 Write the analogy doc # "X in new domain == Y I already know"
# exit test: can you explain the workflow's #1 pain to a stranger?
The seven things get you the surface. To get credible, follow each with a "where does it break?" probe — practitioners trust people who ask about the messy edges, not the brochure version.
| Surface question | The probe that earns trust |
|---|---|
| What are the core entities? | Which entity is hardest to identify uniquely, and why? (that's your matching problem) |
| Where's the data? | Which source do people secretly not trust? (that's your data-quality work) |
| What's the key metric? | What do people game to hit it? (that's where the real incentives live) |
| What's the workflow? | What's still done in a spreadsheet by one person? (that's your automation wedge) |
Interview Q&A · deep dive
The same stack across industries where your skills sell
A concrete map of how your exact capabilities — scraping, structuring, RAG, entity matching, dashboards — translate into other high-paying domains. Same engine, different fuel.
| Domain | The messy data | The AI/ML win |
|---|---|---|
| Finance / Fintech | filings, transactions, market feeds | fraud detection, risk RAG, KYC entity matching, document intelligence |
| Legal / RegTech | contracts, case law, dockets | clause extraction, contract review, e-discovery, compliance RAG |
| Healthcare | EHRs, claims, literature | clinical NLP, coding automation, patient matching, prior-auth |
| Govtech / Civic | rolls, records, budgets | public-data pipelines, transparency dashboards (your Political Pulse) |
| Retail / E-commerce | catalogs, reviews, clickstream | recommendation, demand forecasting, catalog matching, search |
| Insurance | claims, policies, documents | claims triage, fraud, underwriting NLP |
Interview Q&A
The reason your skills sell everywhere is that the reference architecture is identical across these domains — only the type parameters change. Think of your stack as a generic system with the domain as a parameter you bind at the seams. This is also the cleanest way to scope a new-domain project: instantiate the generic, then ask only "what fills these four slots?"
# your reference pipeline, written as a generic — domain is a parameter
class IntelligencePlatform:
def run(self, source, entity, rule, metric):
raw = self.ingest(source) # scrape / API / file drop
clean = self.structure(raw, entity) # normalize + resolve entities
intel = self.enrich(clean, rule) # RAG / ML / scoring
return self.serve(intel, metric) # dashboard / API / report
# bind the type parameters per domain — the body never changes:
pharma = ("registries", "trial", "GxP", "enrollment")
fintech = ("filings", "counterparty", "KYC", "risk")
legal = ("dockets", "party", "privilege", "exposure")
retail = ("catalogs", "SKU", "PCI", "conversion")
Not all spine skills transfer equally. Entity resolution and ingestion are the most universal (every domain has duplicate, dirty records); domain-tuned ML models are the least (a churn model isn't a fraud model). Invest your deepest hours in the top rows — they're the ones that make six domains feel like one.
| Capability | Transfer strength | Why |
|---|---|---|
| Entity matching / dedup | universal | every domain has the same "are these two records the same thing?" problem |
| Ingestion / ETL | universal | messy heterogeneous sources are the default everywhere |
| RAG over documents | very high | contracts, filings, EHRs, literature — all "ground an LLM on our docs" |
| Evaluation / observability | high | same discipline; only the gold labels are domain-specific |
| Domain-tuned ML model | moderate | the pattern transfers; weights and features are retrained per domain |
Interview Q&A · deep dive
Your proven cross-domain range the narrative
You're not asking employers to take a leap — you've already shipped across domains. Framed right, your portfolio proves range, not narrowness. This is your multi-domain story, ready for an interview.
| Project | Domain | Transferable proof |
|---|---|---|
| CI-Radar / registry pipelines | pharma intelligence | production RAG over 440K records across 40+ messy sources |
| AD patient-flow / market models | pharma epidemiology & market access | data viz + AI forecasting from Excel models (React + Streamlit) |
| FDA inspection pipeline | regulatory / compliance | fuzzy entity matching + multi-sheet reporting |
| India Political Pulse | civic / political analytics | constituency dashboards, DPDP-compliant aggregation |
| Electoral-roll OCR | govtech | computer-vision / OCR pipeline at scale |
| TrainHub | edtech / SaaS | Django video platform, Celery/HLS transcoding, RBAC |
| Surabhi Vanam | nonprofit / community | web platform for a goshala & spiritual initiative |
Interview Q&A
Most candidates land in one of three common quadrants; the valuable one is nearly empty. Plot yourself on two axes — how deep is the core craft, and how many unrelated domains has it been proven in. Deep-and-narrow is the typical specialist; shallow-and-broad is the typical generalist (and the one that scares hiring managers). Your portfolio puts you in deep-and-broad, which is exactly the staff/principal and founding-engineer signal because it's the hardest to fake.
| Narrow (1 domain) | Broad (5+ domains) | |
|---|---|---|
| Deep craft | specialist — valuable but boxed in | you — staff/principal signal, rare |
| Shallow craft | junior / early career | jack-of-all-trades — the scary hire |
A reusable script for the "tell me about your range" question. Pick the two projects nearest the role's domain, then explicitly name the spine that connects all of them. The structure: anchor → spread → through-line → fit.
# ANCHOR — your deepest proof (always lead here)
"My deepest work is a 440K-record production RAG + entity-matching"
"platform over 40+ messy regulated sources."
# SPREAD — name 2 unrelated domains to prove range
"The same pattern shipped in civic analytics (Political Pulse,"
"DPDP-compliant) and govtech (electoral-roll OCR at scale)."
# THROUGH-LINE — say the spine out loud
"The constant is: messy data -> structure -> AI -> product."
# FIT — bridge to THEIR domain (swap per interview)
"For your fraud problem, that pattern instantiates as ..."
Never dump all seven projects — it reads as a list, not a range. Pick by domain distance from the role: one project near the target (shows relevance) and one far from it (shows you adapt). Then the through-line does the work of connecting them.
| If the role is… | Lead near | Lead far (range proof) |
|---|---|---|
| Fintech / RegTech | FDA inspection (compliance, matching) | TrainHub (edtech SaaS, infra) |
| Govtech / Civic | Political Pulse + Electoral OCR | CI-Radar (regulated RAG at scale) |
| AI platform / founding eng | CI-Radar (RAG, scale) | Surabhi Vanam (0→1 product build) |
| Healthcare / health-tech | AD patient-flow models | Political Pulse (privacy-aware data) |
Interview Q&A · deep dive
The multi-domain learning path how to grow
How to deliberately become valuable across domains without being shallow. The shape of your expertise matters more than its size.
| Shape | What it is |
|---|---|
| I-shaped | one deep skill, little breadth — capable but fragile and easily boxed in |
| T-shaped | one deep domain + broad working knowledge — the baseline for "senior" |
| π-shaped (Pi) | two deep legs — rare and powerful (you: data-engineering depth + GenAI depth) |
| Comb-shaped | several deep competencies — staff / principal and independent-consulting range |
| Step | For you, concretely |
|---|---|
| 1 Anchor the spine | done — Python + data engineering + GenAI is your deep leg |
| 2 Add a second deep leg | a domain (fintech, healthtech) or a discipline (system design, MLOps, agentic architecture) |
| 3 Cross-train via projects | ship one real thing in a new domain — Political Pulse was exactly this move |
| 4 Abstract the pattern | after 2–3 domains you see the meta-pattern and ramp into the next in weeks |
Interview Q&A
The shapes (I → T → π → comb) aren't personality types — they're a route you walk on purpose. Each transition has one move that earns it. This is the lifecycle of deliberately widening without going shallow: anchor depth, add a second deep leg, cross-train each new domain through a shipped project, then abstract the meta-pattern so the next leg is cheaper than the last.
Step 2 (add a deep leg) forks. A second domain (fintech, healthtech) widens the markets you can sell into; a second discipline (system design, MLOps, agentic architecture) deepens the craft itself. Pick by what your target roles screen for — and note disciplines compound across all domains, so they're usually the higher-leverage second leg.
| Pick a 2nd DOMAIN if… | Pick a 2nd DISCIPLINE if… |
|---|---|
| you want to switch industries / consult | you want staff/principal in your current industry |
| your market is geographically domain-locked | you want leverage that applies in every domain |
| a specific high-pay vertical attracts you | you keep hitting an architecture/scale ceiling |
| example: add fintech for the comp band | example: add agentic system design — pays everywhere |
The compounding rule made operational: every quarter, convert one breadth ambition into a shipped artifact. A certificate proves you watched; a deployed project proves you can. Use this loop to add a comb tooth every 3–4 months.
# quarterly breadth loop — repeat to grow the comb
def breadth_quarter(new_area):
pick = smallest_real_problem(new_area) # scoped to ship in weeks
ship = build_end_to_end(pick) # real users or real data, not a toy
proof = add_to_portfolio(ship) # beats any certificate
lesson = abstract_pattern(ship) # what transfers to leg #N+1?
return proof, lesson
# after 2-3 cycles the meta-pattern emerges and ramp cost -> drops
# Political Pulse was one such quarter: pharma -> civic, shipped, proven
Interview Q&A · deep dive
The Path to Mastery
The capstone, and an honest one: the 157 cards in this hub are inputs, not expertise. Reading them gives you knowledge; only deliberate practice, retention, and application turn knowledge into mastery. This section is the operating system for becoming — and staying — expert in every direction: how skill actually forms, how to make it stick, the order to learn it in, and how to keep from going stale.
How expertise is actually built deliberate practice
Reading this hub gives you knowledge. Knowledge isn't expertise. Expertise comes from deliberate practice — focused, effortful work at the edge of your ability, with immediate feedback. Understanding how mastery actually forms is what turns 157 cards into a real skill.
| Principle | What it means |
|---|---|
| Deliberate practice | not "doing the job" — specific, hard tasks just beyond your current ability, with feedback (Ericsson's finding across every expert field) |
| The learning zone | comfort zone (no growth) → learning zone (hard, error-prone, where growth lives) → panic zone (too hard). Live in the middle. |
| Feedback loops | practice without feedback entrenches errors. Tighten the loop — tests, code review, mentors, predicting outcomes before you check |
| Experience ≠ expertise | ten years of the same year repeated is a plateau, not mastery; deliberate practice is what keeps you climbing |
Interview Q&A
Anders Ericsson's research is precise about what separates deliberate practice from mere repetition. A rep only counts if it has all four: a specific stretch goal (one named thing slightly beyond reach), full focus (no autopilot, no multitasking), immediate feedback (you find out fast whether it worked), and error correction (you adjust and retry the same edge). Drop any one and you're back to logging hours. "I coded for three hours" is not three hours of practice; "I rewrote this parser without lookups until I could do it clean, twice" is.
Skill does not rise smoothly. It moves in a power law: fast early gains, then a long plateau where effort seems to produce nothing, then a jump. The plateau is not failure — it's the brain consolidating and your old method hitting its ceiling. Plateaus break when you change the constraint, not the volume: slow down to fix the broken sub-skill, raise difficulty deliberately, or get an outside eye on the error you can't see. Pushing the same method harder just deepens the rut.
A repeatable template you can run on any hub card. The point is the tight feedback loop and predict-before-check step — predicting forces retrieval and surfaces the exact gap.
# A deliberate-practice rep, written as pseudocode you actually run
def practice_rep(skill, edge_task):
# 1. specific stretch: one thing just past your current ability
goal = pick_edge(skill) # e.g. "write LRU cache from memory, no lookups"
# 2. predict BEFORE you check — this is the high-yield step
prediction = attempt_from_memory(goal)
actual = run_and_observe(prediction) # tests, repl, mock interviewer
# 3. immediate feedback → name the exact error
gap = diff(prediction, actual)
if not gap:
return raise_difficulty(goal) # too easy = no learning; re-stretch
# 4. correct the SAME edge immediately, then space it
redo_until_clean(goal, times=2)
return schedule_review(goal, days=3) # hand off to your retention system
# Effort budget: 80% at the edge (hard, error-prone), 20% review.
# If a session felt smooth and easy, the edge was set too low.
Interview Q&A · deep dive
Make it stick — your learning system retention
You forget most of what you read within a day (the forgetting curve). A deliberate retention system is the difference between "I read about RAG once" and "I can build RAG from memory." Here's how to convert this hub into permanent knowledge.
| Method | How to use it here |
|---|---|
| Active recall | close the card and explain it from memory — retrieval builds memory far more than re-reading. The Q&A rail on every card is built for exactly this. |
| Spaced repetition | review at expanding intervals (1d → 3d → 1w → 1m) to beat the forgetting curve; put the facts in Anki |
| Feynman technique | explain it simply, as if teaching a beginner — where you stumble is where you don't really understand it yet |
| Interleaving | mix topics instead of blocking one (a Python card, a system-design card, a reasoning card) — harder, but builds flexible recall |
| Elaboration | connect each new idea to what you already know — "why does this work? what is it like?" |
Interview Q&A
Ebbinghaus showed memory decays roughly exponentially: without review you lose the majority of new material within a day or two. Each successful retrieval just before you'd forget flattens that curve and lengthens the next interval — this is the spacing effect, and it's why expanding intervals (1d → 3d → 1w → 1m → 3m) beat cramming on total retention per minute invested. The hard part is counterintuitive: difficulty is the mechanism, not a side effect. A review that feels effortful (you almost forgot) strengthens memory far more than one that feels easy — these are Bjork's "desirable difficulties."
The schedule below is the SM-2-style algorithm Anki uses, simplified. The deeper skill is writing good cards: one idea per card (atomic), phrased as a question that forces recall of a fact you can't guess, and—crucially—your own words, not a copy-paste.
# Minimal spaced-repetition scheduler (the core of SM-2 / Anki)
def next_interval(card, quality):
# quality 0-5: how well you recalled. <3 = failed.
if quality < 3:
card.interval = 1 # reset — relearn tomorrow
card.reps = 0
return card
card.reps += 1
if card.reps == 1: card.interval = 1
elif card.reps == 2: card.interval = 6
else: card.interval = round(card.interval * card.ease)
# ease drifts with performance, floored so it never collapses
card.ease = max(1.3, card.ease + (0.1 - (5 - quality) * 0.08))
return card
# Card-writing rule the algorithm can't fix for you:
# BAD : "Tell me everything about RAG." (not atomic, not recallable)
# GOOD: "RAG: what step turns the query into a vector?" -> "embedding"
# GOOD: "Why does RAG reduce hallucination?" -> "grounds answer in retrieved text"
| Method | Best for | Failure mode |
|---|---|---|
| Active recall | any fact you must reproduce under pressure | skipped because re-reading feels more productive |
| Spaced repetition | durable facts, vocabulary, APIs, definitions | cramming 200 new cards/day → review avalanche, burnout |
| Feynman technique | conceptual understanding, exposing fuzzy "knowing" | stopping at the part you can explain, skipping the gap |
| Note systems (Zettelkasten) | connecting ideas across domains over months | collecting notes you never revisit ("digital hoarding") |
| Project-based | integrated skill — using ideas together in the wild | no transferable extraction; lessons stay stuck to one project |
Interview Q&A · deep dive
The roadmap across all 16 domains sequenced path
"Expert in all directions" needs an order, not a pile — trying to learn everything at once learns nothing. This sequences the hub into layers, each building on the last, so you always know what to learn next.
| Layer | Domains — and why here |
|---|---|
| 0 · Foundations | Python Foundations, Data Structures & SQL, Problem Solving — everything else sits on these |
| 1 · Core craft | Design/Concurrency/APIs, ML & Data Science, Systems & Platform Craft — the working engineer's toolkit |
| 2 · AI specialization | AI/ML/LLM Engineering, Claude Mastery, the transformer + LLM-internals cluster — your differentiator |
| 3 · Production & scale | MLOps/Orchestration, Docker & Kubernetes, AWS Cloud, Security — how it survives real traffic |
| 4 · Range & leadership | Multi-Domain Mastery, Leadership & Growth, Architecture & System Design — scope beyond code |
| 5 · Frontier & interview | Quantum & PQC, Interview Playbook — as the goal demands |
Interview Q&A
A roadmap is a dependency graph, not a reading order. Each layer unlocks the next: you can't reason about RAG retrieval quality without embeddings and vector intuition; you can't run an LLM service at scale without containers and cloud first. The diagram below is the critical path — follow the arrows, and let your target role decide how deep to go in each layer rather than trying to complete them all.
The classic T-shape — broad awareness across many areas, deep in one — is the right default, but for a senior generalist a π-shape (two deep legs, e.g. AI engineering + production/MLOps) is the higher-leverage target because the two depths reinforce each other. The mistake is the dash with no stem (shallow everywhere → impressive in conversation, useless under load) or the lone vertical line (one deep skill, no context → can't operate in real systems).
| Shape | Profile | Where it wins / fails |
|---|---|---|
| I-shape | one deep skill, narrow | wins as a pure specialist; fails the moment work spans domains |
| Dash (—) | broad, shallow everywhere | great at meetings; can't actually build or debug the hard part |
| T-shape | one deep leg + broad awareness | the reliable default for most engineers |
| π-shape | two deep legs + broad awareness | the generalist-expert; two depths compound (AI × infra) |
# Turn the layered roadmap into a runnable quarter.
# Rule: ONE deep target gets 70% of learning time; rest stays "warm."
target = "AI/LLM Engineering" # chosen by the role you're aiming at
horizon = "Q3 2026"
plan = {
"deep (70%)": ["t-rag", "t-prompt", "build: eval harness for a RAG app"],
"warm (20%)": ["one Docker card/wk", "one SQL card/wk"], # maintain prerequisites
"aware (10%)": ["skim quantum + leadership once"], # single awareness pass
}
def milestone(plan):
# a layer is "done" when you can BUILD from it, not when you've read it
return "ship one project that exercises the deep layer end-to-end"
# Re-sequence every quarter: the target shifts, the roadmap shifts with it.
# Foundations (Layer 0) are the only thing you never let go cold.
Interview Q&A · deep dive
Staying expert in a fast field never stale
In AI especially, expertise decays — what's current in 2026 is legacy by 2028. The half-life of a skill is shrinking, so staying expert is a system, not an event. Here's how to compound instead of decay.
| Habit | Why it compounds |
|---|---|
| Curate your information diet | follow primary sources (papers, lab blogs, release notes) over hot takes; ~10 high-signal sources, cut the rest |
| Build, don't just consume | "tutorial hell" is endless consuming with no building; one real project teaches more than ten courses |
| Learn in public | write, post, teach — explaining forces real understanding and compounds your reputation at the same time |
| Teaching is the final form | if you can teach it clearly, you own it — this hub is itself an act of learning-by-teaching |
| First principles over trends | understand why a technique works, not just that it's hot — fundamentals don't expire, frameworks do |
Interview Q&A
Think of expertise as a balance that decays. The "half-life of a skill" is how long until half of what you know is obsolete — for stable fundamentals (algorithms, OS, networking) it's decades; for fast tooling (a specific LLM framework's API) it can be under a year. Staying expert means your learning rate must exceed your decay rate. The leverage move is to invest most of your time in the slow-decaying layer (first principles) and just enough in the fast layer to stay fluent — because fundamentals transfer to whatever replaces today's tools.
The failure mode is volume, not scarcity — infinite feeds optimized for engagement, not signal. Build a small, primary-source-weighted diet and a discipline for turning consumption into output, so reading converts to skill instead of dopamine.
| Tier | Source type | Cadence / rule |
|---|---|---|
| Primary | papers, lab/release notes, official docs, source code | ~1 deep read/week; this is ground truth, weight it heaviest |
| Curated | a few high-signal newsletters / practitioners you trust | skim weekly for what to go read, not as the read itself |
| Social | feeds, forums, hot takes | timeboxed; treat as a discovery layer, never the source |
| Build | your own small projects exercising the new idea | 1/month — the diet's output; without it the rest is consumption |
# When a shiny new tool/paper/framework appears, triage it:
def triage(item):
if item.touches_a_current_project:
return "BUILD a 1-hour spike with it now" # learning by doing, in context
if item.is_a_fundamental_shift: # new paradigm, not new wrapper
return "READ the primary source, take 1 atomic note"
if item.is_a_thin_wrapper_on_what_you_know:
return "SKIP — note it exists, move on" # most things land here
return "BOOKMARK, revisit only if it keeps recurring"
# Heuristic: signal recurs. If three sources you trust independently
# keep mentioning it over a month, it's worth a real build. One viral
# thread is noise. Let the recurrence filter the hype for you.
Interview Q&A · deep dive
Interview Playbook
The other domains give you the knowledge; this one packages it. Your edge is that you don't have to invent stories — you ship the systems. The job is to compress what you already run into tight, structured answers. Every story below uses only your real, stated numbers.
STAR & your three headline stories anchors
Most questions — technical or behavioural — are best answered by routing them to one of three production systems you own. Keep each as a STAR skeleton: Situation, Task, Action, Result. Lead with the result when the interviewer is senior; build up to it when they want the reasoning.
| Story | S / T | Action | Result |
|---|---|---|---|
| Dell ReAct agentic bot | A high-volume manual workflow needed automating with reasoning, not just rules. | Built a LangChain ReAct agent (reason → act → observe loop) with tool use over the relevant systems. | 95% processing-time reduction, 400+ FTE of effort saved. |
| CI-Radar | Competitive clinical-trial intelligence needed to be searchable and synthesised across many sources. | Production RAG pipeline (Streamlit + FastAPI) — ingest, index, retrieve, generate — across the registry estate. | 440K+ trials across 40+ registries served through one retrieval layer. |
| Investigator matching | Investigators had to be resolved and de-duplicated across many registries with messy names. | 8-tier matching logic with fuzzy name matching + location verification over the record estate. | 5.4M records reconciled across 13 registries. |
Interview Q&A
Plain STAR tells the story; STARL (adding Learning) is what reads as senior. Anyone can narrate a win — a Principal/Manager candidate closes with what the experience changed in how they work or what they built so it never recurs. That final beat converts a war story into evidence of judgement.
You don't need ten stories — you need three orthogonal ones that you can re-aim at almost any prompt. Pick stories that each carry a different dominant theme, then the interviewer's question only has to map to the closest axis.
| Anchor | Dominant theme it owns | Re-aims to answer |
|---|---|---|
| Dell ReAct bot | technical ambition · autonomy · ROI | "proudest", "biggest impact", "took a risk", "automated something" |
| CI-Radar | scale · architecture · delivery under scope | "complex system", "scaling", "shipped end-to-end", "tech choice you defend" |
| Investigator matching | ambiguity · quality · stakeholder feedback | "hardest problem", "data quality", "got it wrong then fixed it", "tradeoff" |
Half of real impact isn't pre-measured. The senior move is to reconstruct a defensible estimate out loud rather than hand-wave. Show the arithmetic — interviewers trust a number they watched you derive.
# Turning "it saved a lot of time" into a number you can defend
manual_minutes_per_case = 11 # measured from 20 timed runs
cases_per_month = 38000 # pulled from the ticket system
automated_minutes = 0.6 # agent latency, observed p50
saved_min = (manual_minutes_per_case - automated_minutes) * cases_per_month
saved_fte = saved_min / (60 * 160) # 160 productive hrs / FTE-month
reduction = (manual_minutes_per_case - automated_minutes) / manual_minutes_per_case
print(f"{reduction:.0%} time cut, ~{saved_fte:.0f} FTE/mo")
# 95% time cut, ~411 FTE/mo — now defensible, with every input named
Interview Q&A · deep dive
A system-design framework that always works structure
Senior loops grade structure over trivia. Drive the conversation through the same six steps every time so you never freeze on a blank whiteboard.
Interview Q&A
The earlier rail names the steps; here is the expanded version with the one stage candidates skip — the deep-dive, where the interviewer probes a single component to depth. Budget your 45 minutes so you reach it: spend ~5 on requirements, ~5 on estimation, ~10 on API+data, ~10 on high-level, then leave ~15 for the deep-dive and bottlenecks. Running out of time at the high-level diagram is the most common silent fail.
You are graded on being directionally correct and consistent, not exact. Memorise three anchors and derive the rest: 1M requests/day ≈ 12 QPS average, peak is roughly 10× the average, and storage = writes × retention × replication × overhead. Round aggressively to powers of ten.
# Sizing CI-Radar-style RAG retrieval at scale
trials = 440_000
chunks_per_trial= 12
dim = 1024 # embedding dimension
bytes_per_float = 4 # float32
vectors = trials * chunks_per_trial # ~5.3M vectors
index_gb = vectors * dim * bytes_per_float / 1e9
print(f"{vectors/1e6:.1f}M vectors, ~{index_gb:.0f} GB raw")
# 5.3M vectors, ~22 GB raw → fits in RAM on one large node; no shard yet
# Online QPS & the real constraint
daily_queries = 2_000_000
avg_qps = daily_queries / 86_400 # ~23 QPS
peak_qps = avg_qps * 10 # ~230 QPS at peak
llm_p50_s= 1.8 # generation dominates latency
print(f"peak {peak_qps:.0f} QPS; bottleneck = LLM at {llm_p50_s}s, not the ANN index")
Show the contract before the boxes. A tight endpoint and a normalised schema signal you think in interfaces, not diagrams.
# API contract — explicit pagination, idempotency, versioned
POST /v1/search
{ "q": "phase 3 oncology in EU", "k": 8, "filters": {"phase": "3"} }
-> { "answer": "...", "citations": ["trial_id"], "latency_ms": 1900 }
# Data model — canonical entity + source rows linked to it (matching pattern)
class Entity: # the resolved record
id: str; canonical_name: str; n_sources: int
class SourceRecord: # one row from one registry, points at an Entity
id: str; entity_id: str; registry: str; raw_name: str; score: float
Interview Q&A · deep dive
The QE / LLM-evaluation angle role-specific
For a Principal Engineer QE loop, the question behind every question is: "how do you prove an AI system works — and keep proving it?" You have the rare combination of having built the systems and needing to test them, so frame evaluation as engineering, not QA-as-afterthought.
| They ask about | Your framing |
|---|---|
| Testing non-deterministic LLM output | You can't assert exact strings — you assert properties: faithfulness to context, answer relevance, no hallucination. That's what RAGAS / DeepEval measure. |
| RAGAS / DeepEval | Reference-free metrics over a RAG system — faithfulness, context precision/recall, answer relevancy — runnable in a pipeline like any other test. |
| pytest | The harness: parametrise over a golden dataset, run metric assertions with thresholds, fail the build when quality regresses. |
| Selenium / Playwright | End-to-end UI verification on top of the model layer — the app actually renders the cited answer, not just the API. |
Interview Q&A
Frame QE for AI as a pyramid, widest and cheapest at the bottom. Most quality is caught by deterministic tests; LLM-as-judge metrics sit above them for the irreducibly probabilistic layer; human review caps the tip for the ambiguous tail. Saying "I'd LLM-judge everything" is a junior answer — judges are slow, costly, and themselves need validating.
| Layer | Catches | Tooling | Cost / speed |
|---|---|---|---|
| Deterministic | schema, parsing, regex, exact-match, latency budgets | pytest, contract tests | cheap · ms |
| Reference-based metrics | retrieval quality vs labels | RAGAS context precision/recall | cheap · no LLM call |
| LLM-as-judge | faithfulness, relevancy, tone, G-Eval rubrics | DeepEval (50+ metrics), RAGAS faithfulness | costly · seconds |
| Human | the ambiguous, high-stakes tail | labelling UI feeding the golden set | expensive · slow |
Talk about the gate concretely. A golden set, a metric, a threshold, a build that goes red — that's the whole loop, and being able to write it is the difference between describing evaluation and owning it.
import pytest
from deepeval import assert_test
from deepeval.metrics import FaithfulnessMetric, ContextualRecallMetric
from deepeval.test_case import LLMTestCase
GOLDEN = load_golden("trials_v7.jsonl") # versioned, grown from prod failures
@pytest.mark.parametrize("item", GOLDEN)
def test_rag_quality(item):
out = rag_pipeline(item["question"])
case = LLMTestCase(
input=item["question"],
actual_output=out.answer,
retrieval_context=out.chunks,
expected_output=item["reference"],
)
# fail the BUILD if faithfulness or recall regresses below the bar
assert_test(case, [
FaithfulnessMetric(threshold=0.9), # no hallucination past the context
ContextualRecallMetric(threshold=0.8), # retriever found the right chunks
])
Interview Q&A · deep dive
Leadership & behavioural manager
You're a Development Manager leading two teams — so behavioural answers should show judgement and multiplication, not just individual heroics. The pattern: a situation, the call you made, how you brought people with you, the outcome.
Interview Q&A
Behavioural rounds aren't random — each question screens for a named trait. Recognise the trait and you know which beat to emphasise. Answer with SCRO: Situation, the Call you made, how you brought people with you (Rally), the Outcome — then the trait surfaces itself.
| Question | Trait it screens for | Beat to land |
|---|---|---|
| Disagreement with a senior | conviction + disagree-and-commit | brought data; committed fully once decided |
| A time you failed / missed a deadline | ownership + recovery | owned it; built the mechanism that prevents recurrence |
| Conflict between two engineers | de-escalation + fairness | moved it to data/criteria, not personalities |
| Growing / mentoring someone | multiplication, not heroics | handed ownership; stepped back; raised their ceiling |
| Prioritising under pressure | judgement + saying no | made the tradeoff explicit; protected the team's focus |
The trap in conflict stories is sounding like you won a fight. Reframe every one around criteria over personalities: you didn't out-argue anyone, you moved the decision onto shared, objective ground.
Interview Q&A · deep dive
Rapid-fire bank drill
One-breath answers across every domain in this hub. If you can give the crisp version, you can always expand — and the crisp version is what gets you past the screen.
Open the bank
The first bank covers the screen; this round covers the follow-up. One-breath answers to the harder second questions that separate "knows the term" from "has shipped it".
Open the deeper bank
Interview Q&A · deep dive
Question bank — by category, with pointers research
A working catalogue of what panels actually ask, grouped so you can drill the weak categories. Each question has a one-line cue and a jump to the card with the senior-level answer. Treat this as the dashboard for revision, not the destination.
| Question | Cue | Card |
|---|---|---|
| What is the GIL and when does it bite? | one thread of bytecode at a time → CPU-bound suffers, I/O is fine | Concurrency · GIL |
| Mutable default argument bug | default evaluated once at def-time → shared list across calls | Mutability |
| Explain decorators | function-returning-function; @ = syntactic sugar | Decorators |
| Generators vs lists — when? | lazy, constant memory, single-pass | Generators |
| __init__ vs __new__ | new constructs the instance, init configures it | OOP & dunder |
| LEGB / closures | name lookup; closures capture by reference, not value | Scope · LEGB |
| async vs threads vs multiprocessing | I/O-many → async; I/O-few → threads; CPU → mp | Concurrency models |
| Question | Cue | Card |
|---|---|---|
| Big-O of common operations | dict/set O(1) avg; list append O(1) amortised; in on list O(n) | Big-O & pick |
| Pick the right container | deque / heapq / Counter / defaultdict | collections · heapq |
| Two-sum, sliding window, BFS/DFS | name the pattern first, then implement | DSA patterns |
| SQL joins & index selection | EXPLAIN; composite index column order | SQL |
| Find duplicates / dedupe a DataFrame | drop_duplicates; vectorised is the answer | Pandas |
| Question | Cue | Card |
|---|---|---|
| Explain SOLID with an example | walk one (DIP injection) end to end | SOLID & Pythonic |
| Factory vs Builder vs Singleton | creation / step-by-step / one-instance; Python rarely needs Singleton | Creational |
| Adapter vs Facade | translate-1-to-1 vs simplify-many | Structural |
| Strategy with a real example | "my 8-tier matcher" | Behavioural |
| Circuit breaker, retry, idempotency | protect the dependency; safe to retry | Resilience |
| PUT vs PATCH; status codes | idempotency; honest 4xx/5xx | REST |
| FastAPI def vs async def | blocking → threadpool; await → loop | FastAPI in depth |
| Rate limit a noisy client | token bucket; 429 + Retry-After | API limits |
| Question | Cue | Card |
|---|---|---|
| Walk me through building a model | frame → split → baseline → iterate → eval → ship | Model dev rules |
| Bias-variance tradeoff | under vs over; fix variance with data/regularisation | Model dev rules |
| How would you detect data leakage? | fit-on-train only; suspicious val scores | Feature engineering |
| Precision vs recall — when each? | cost of FP vs FN; F1 balances | Evaluation |
| Why XGBoost on tabular? | handles mixed types, missing, interactions; strong default | Tree ensembles |
| Why is NumPy fast? | contiguous C array, ufuncs, no interpreter loop | Vectorization |
| Backprop in one minute | chain rule; autograd records the graph | Deep learning |
| PyTorch or TensorFlow? | PyTorch greenfield 2026; TF for established TFX/TPU | Frameworks |
| What does .backward() do? | autograd walks the recorded graph in reverse | Frameworks |
| Question | Cue | Card |
|---|---|---|
| Explain RAG end to end | ingest → embed → retrieve → rerank → augment → generate → eval | RAG architecture |
| RAG vs fine-tune vs prompt | fresh knowledge → RAG; style/behaviour → fine-tune | RAG vs FT vs prompt |
| Get reliable JSON from an LLM | schema + temp 0 + structured output + repair | Prompt catalogue |
| Zero-shot vs few-shot | add examples when format / edge-cases hard to describe | Prompt catalogue |
| When does CoT not help? | single-step tasks; missing world knowledge | CoT · ToT · Reflexion |
| Self-Consistency vs ToT | parallel votes vs branching search | CoT · ToT · Reflexion |
| Defend a RAG against prompt injection | fence untrusted, validate tools, gate writes | Production prompting |
| Cosine vs Euclidean | direction (orientation) vs distance (magnitude) | Embeddings |
| ReAct vs plain RAG | RAG is a capability; ReAct is an architecture | Resilience & agentic |
| Evaluate a RAG system | faithfulness, relevance, recall (RAGAS); golden set | Evals |
| Question | Cue | Card |
|---|---|---|
| When Airflow over cron? | dependencies, retries, backfills, SLAs | Airflow |
| Airflow vs NiFi | tasks vs data movement | NiFi · Kafka |
| What does Kafka give you? | durable replay, decoupled consumers, partition scale | NiFi · Kafka |
| How do you detect model drift? | monitor inputs & outputs; reference window; alert | Monitoring & drift |
| What's in LLMOps that MLOps misses? | prompts versioned, tokens metered, guardrails | LLMOps |
| Question | Cue | Card |
|---|---|---|
| Image vs container | blueprint vs running instance | Docker |
| Multi-stage Dockerfile — why? | build deps stay out of runtime image | Dockerfile |
| Pod vs Deployment vs Service | unit / desired state / stable network | K8s objects |
| HPA — what triggers a scale event? | metric crosses threshold for stabilisation window | K8s autoscale |
| Why managed K8s on AWS? | control plane HA; you focus on workloads | AWS compute |
| Pick AWS services for a RAG app | ALB → ECS → S3 / OpenSearch / Bedrock | AWS architecture |
| Question | Cue | Card |
|---|---|---|
| AuthN vs AuthZ | who vs what-may; different failure modes | AuthN/AuthZ |
| Walk the TLS handshake | hello → cert chain → verify → key agreement → mTLS adds reverse | PKI · TLS |
| Defend an LLM agent with tool access | layered: scope tools, validate args, human gate, audit | OWASP + LLM |
| Zero Trust concretely | no implicit network trust; identity is the perimeter | Secrets · ZT |
| Does quantum break all crypto? | asymmetric yes (Shor); symmetric halved (Grover) | PQC |
| Supremacy vs advantage | any task vs useful task | Willow |
| Question | Cue | Card |
|---|---|---|
| Design a RAG system at scale | 6-step rail: req → est → API → data → blocks → ops | System design |
| Most challenging project | Dell ReAct headline: 95% time cut, 400+ FTE | STAR stories |
| Disagreement with a senior | data-led, scope-bounded, disagree-and-commit | Leadership |
| How do you ship an LLM feature with quality? | golden set, faithfulness gate, eval in CI | QE / eval |
The follow-up questions panels reach for once the basics land. Each still jumps to the card with the senior answer.
| Question | Cue | Card |
|---|---|---|
| How does CPython manage memory? | refcounting + cycle-collecting GC; arenas/pools | Memory model |
| Does removing the GIL fix everything? | frees CPU threads but reintroduces locking cost & races | Concurrency · GIL |
| What guarantees does a context manager give? | __exit__ runs on exception → deterministic cleanup | Context managers |
| Closure captures value or variable? | the variable (late binding) — the loop-var gotcha | Scope · LEGB |
| When does a generator beat a list comprehension? | streaming / infinite / memory-bound single pass | Generators |
| Question | Cue | Card |
|---|---|---|
| Make a non-idempotent write safe to retry | client idempotency key → server dedupes by key | Resilience |
| Circuit breaker states | closed → open → half-open probe → close | Resilience |
| Dependency injection — why bother? | invert control → testable, swappable seams (DIP) | SOLID & Pythonic |
| FastAPI dependency for auth & db session | Depends; per-request lifecycle, yield for cleanup | FastAPI in depth |
| Token bucket vs leaky bucket | burst-tolerant vs smooth-rate; 429 + Retry-After | API limits |
| POST vs PUT vs PATCH idempotency | POST not, PUT yes (full), PATCH partial | REST |
| Question | Cue | Card |
|---|---|---|
| Diagnose: train acc high, val acc low | overfit → more data / regularise / simpler model | Model dev rules |
| Why scale features before KNN / SVM? | distance/gradient is scale-sensitive | Feature engineering |
| ROC-AUC vs PR-AUC on imbalanced data | PR-AUC is honest when positives are rare | Evaluation |
| Why XGBoost still beats DL on tabular | handles mixed types, missing, interactions | Tree ensembles |
| What makes NumPy fast vs a Python loop? | contiguous C buffer, ufuncs, no per-elem interp | Vectorization |
| Why does .backward() need a scalar? | gradient of a scalar loss w.r.t. params | Deep learning |
| Question | Cue | Card |
|---|---|---|
| Chunking strategy & the overlap tradeoff | precision vs coherence; overlap saves boundaries | RAG architecture |
| Why add a reranker after ANN? | cross-encoder lifts top-k precision before the LLM | RAG architecture |
| Cosine vs dot vs Euclidean for embeddings | direction vs magnitude; normalise then cosine | Embeddings |
| Structured output reliably from an LLM | schema + temp 0 + validate + repair loop | Production prompting |
| When does ReAct beat plain RAG? | multi-step, tool-using, decide-then-act tasks | Resilience & agentic |
| How do you evaluate a RAG system? | faithfulness + context recall on a golden set | QE / eval |
| Self-Consistency vs Tree-of-Thoughts | parallel votes vs branching search + backtrack | CoT · ToT · Reflexion |
| Question | Cue | Card |
|---|---|---|
| Liveness vs readiness probe | restart-the-pod vs pull-from-LB | K8s objects |
| Requests vs limits, and OOMKill | schedule/guarantee vs cap; memory cap kills | K8s autoscale |
| What does Kafka actually guarantee? | ordered within a partition; durable replay | NiFi · Kafka |
| Airflow over cron — when? | dependencies, retries, backfills, SLAs | Airflow |
| What does LLMOps add over MLOps? | prompt versioning, token cost, guardrails, eval gate | LLMOps |
| Detect drift without ground truth | monitor input + prediction distributions | Monitoring & drift |
| Multi-stage Dockerfile payoff | build deps out of runtime → slim, safer image | Dockerfile |
| Question | Cue | Card |
|---|---|---|
| How does mTLS differ from one-way TLS? | both sides present + verify certs | PKI · TLS |
| Defend an agent with tool access | scope tools, validate args, human-gate writes, audit | OWASP + LLM |
| Where do secrets actually live? | vault/KMS, injected at runtime, never in image | Secrets · ZT |
| "Harvest now, decrypt later" — why care today? | migrate to PQC before quantum matures | PQC |
| Quantum supremacy vs advantage | any contrived task vs a useful one | Willow |
| Question | Cue | Card |
|---|---|---|
| Back-of-envelope: size a vector index | vectors × dim × 4 bytes; does it fit RAM? | System design |
| "Now 100× the traffic" — first move | re-run the estimate; name the new bottleneck | System design |
| Tell me about a failure (and the L) | own it; install the mechanism that prevents recurrence | STAR / STARL |
| Disagree-and-commit story | steelman theirs; move to criteria; commit fully | Leadership |
| Prove an LLM feature is good enough to ship | golden set + faithfulness/recall gate in CI | QE / eval |
| How do you quantify fuzzy impact? | reconstruct the estimate out loud; name inputs | STAR / STARL |