Each question comes with a tip for answering and a note on what the interviewer is actually evaluating.

Q1 (case)

Design an A/B test to evaluate a new ranking algorithm for Google Search.

Why Google asks: This is the canonical stats-round question. They want to see that you understand experimental constraints at Google scale: interaction effects, network effects, novelty bias.

How to answer: Cover the unit of randomization (query, session, or user), metric selection (a relevance proxy such as CTR plus a counter-metric such as dwell time), the sample size calculation, novelty-effect handling, and interaction with other ongoing experiments.

What they evaluate: Experimental design rigor, awareness of Google-scale challenges (interaction effects), counter-metric thinking
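
For the sample size step, it helps to have the standard two-proportion formula at your fingertips. The sketch below is a minimal version, assuming a two-sided z-test on a rate metric such as CTR with user-level randomization; the function name, the defaults, and the example rates are illustrative assumptions, not anything specific to Google's tooling.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, min_detectable_lift, alpha=0.05, power=0.8):
    """Approximate users per arm for a two-sided two-proportion z-test.

    baseline_rate: control rate of the metric (e.g. CTR), an assumed input.
    min_detectable_lift: smallest absolute lift worth detecting (0.01 = 1 point).
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of per-arm Bernoulli variances
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Halving the detectable lift roughly quadruples the required sample.
n_coarse = sample_size_per_arm(0.10, 0.01)
n_fine = sample_size_per_arm(0.10, 0.005)
```

If you randomize by user but measure per query, the observations within a user are correlated, so this formula will understate the sample you need; say so in the interview rather than quoting the number as exact.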

Q2 (technical)

Write a SQL query to find users who logged in 3 consecutive days.

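How to answer: Use a window function. First reduce the data to one row per user per day, since several logins on the same day would otherwise break the pattern. Then compute ROW_NUMBER() OVER (PARTITION BY user ORDER BY login_date) and subtract it, as a number of days, from login_date. Rows in the same run of consecutive days produce the same difference, so it works as a group ID. Group by user and that difference, then filter for groups of size 3+.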

What they evaluate: Window function fluency, comfort with the gaps-and-islands pattern, clean SQL
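
One way to write the gaps-and-islands query is sketched below, wrapped in Python's built-in `sqlite3` module so it can be run as is. The `logins(user_id, login_date)` table is a hypothetical schema, and `julianday` is SQLite's way of turning a date into a day number; other dialects would use their own date arithmetic. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical schema: logins(user_id, login_date), possibly several rows per day.
STREAK_QUERY = """
WITH daily AS (              -- one row per user per calendar day
    SELECT DISTINCT user_id, DATE(login_date) AS login_day
    FROM logins
),
numbered AS (
    SELECT user_id,
           login_day,
           -- day number minus row number is constant within a consecutive run
           julianday(login_day)
             - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_day) AS island
    FROM daily
)
SELECT DISTINCT user_id
FROM (
    SELECT user_id
    FROM numbered
    GROUP BY user_id, island
    HAVING COUNT(*) >= 3     -- runs of 3 or more consecutive days
)
ORDER BY user_id;
"""


def users_with_3_day_streak(rows):
    """rows: iterable of (user_id, 'YYYY-MM-DD') tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (user_id TEXT, login_date TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(STREAK_QUERY)]
    conn.close()
    return result
```

The `DISTINCT` in the first CTE is the step candidates most often forget: without it, two logins on the same day shift the row numbers and split a genuine streak.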

Q3 (case)

Daily active users on YouTube dropped 5% yesterday. How do you investigate?

Why Google asks: Product analytics round. They want to see you decompose by user segment, geography, and platform before jumping to a root cause.

How to answer: Frame it as a decomposition tree: segment (new vs existing), geo, platform (iOS vs Android vs web), entry source (organic vs notification vs deep link). Rule out false positives such as weekly patterns and seasonality, and check for external events (outages, holidays, competitor launches). State your top 2-3 hypotheses.

What they evaluate: Structured decomposition, multiple hypotheses, awareness of common false positives (weekly patterns, seasonality)
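
The first level of the decomposition tree boils down to simple arithmetic: how much of the total change does each segment account for? A minimal sketch follows, assuming you already have DAU counts per segment for the two days being compared; the function name and the dictionary layout are illustrative.

```python
def segment_contributions(before, after):
    """Rank segments by how much of the total DAU change each one explains.

    before / after: dicts mapping segment name -> DAU on each day (assumed inputs).
    """
    total_before = sum(before.values())
    rows = []
    for segment in sorted(set(before) | set(after)):
        b = before.get(segment, 0)
        a = after.get(segment, 0)
        delta = a - b
        rows.append({
            "segment": segment,
            "delta": delta,
            "pct_change": delta / b if b else None,  # change within the segment
            "contribution": delta / total_before,    # share of the overall change
        })
    # Largest drops first
    return sorted(rows, key=lambda row: row["delta"])
```

If one segment carries almost the whole drop, drill into it along the next dimension. If every segment is down by a similar percentage, suspect something global such as a logging change, an outage, or a calendar effect.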

Q4 (technical)

Explain the difference between Type I and Type II errors. When do you care more about each?

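How to answer: Type I = false positive (rejecting a true null). Type II = false negative (failing to reject a false null). Care more about Type I when a false-positive launch is costly (e.g. revenue-critical features). Care more about Type II when missing a real effect is expensive (e.g. disease screening, where a missed case is costly). The two trade off: at a fixed sample size, a stricter significance threshold reduces Type I errors but lowers power.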

What they evaluate: Conceptual clarity, ability to map to real product decisions, awareness of power-vs-significance tradeoff
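
To make the power-vs-significance tradeoff concrete, the sketch below computes the power of a two-sided, two-sample z-test analytically. It assumes equal arm sizes and a known common standard deviation; the function and parameter names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided_z(effect, sigma, n_per_arm, alpha=0.05):
    """Probability of rejecting the null when the true mean difference is `effect`.

    With effect = 0 this is the Type I error rate (alpha).
    With effect != 0 the Type II error rate is 1 - power.
    """
    std_normal = NormalDist()
    standard_error = sigma * sqrt(2 / n_per_arm)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / standard_error
    # Mass of the shifted test statistic in the upper and lower rejection regions
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Same data, stricter threshold: fewer false positives, more missed effects.
power_at_05 = power_two_sided_z(effect=0.2, sigma=1.0, n_per_arm=400, alpha=0.05)
power_at_01 = power_two_sided_z(effect=0.2, sigma=1.0, n_per_arm=400, alpha=0.01)
```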

Q5 (case)

How would you build a model to predict whether a YouTube user will churn?

How to answer: Cover the churn definition (no logins in 28 days?), feature engineering (engagement signals, content preferences, social graph), model selection (gradient boosting for tabular data vs neural models for sequences), class imbalance (churners are usually a small minority), evaluation (AUC, precision at top decile), and serving (offline batch vs online).

What they evaluate: End-to-end ML thinking, awareness of class imbalance, business-relevant feature engineering, productionization realism
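
Precision at top decile is worth being able to define precisely, because it maps directly to a business action: target the 10% of users the model ranks as most at risk. A minimal sketch, assuming `scores` are model outputs and `labels` are 0/1 churn outcomes; both function names are illustrative.

```python
def precision_at_top_decile(scores, labels):
    """Share of true churners among the 10% of users with the highest scores."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and the same length")
    k = max(1, len(scores) // 10)  # size of the top decile
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def lift_at_top_decile(scores, labels):
    """How many times better than random targeting the top decile is."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        return float("nan")  # no churners at all, lift is undefined
    return precision_at_top_decile(scores, labels) / base_rate
```

Reporting lift alongside precision keeps the metric meaningful under class imbalance: 50% precision is excellent if the base churn rate is 5% and unremarkable if it is 40%.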

Q6 (technical)

What's the difference between a difference-in-differences and a simple before-after analysis?

Why Google asks: Causal inference is increasingly required for Google DS roles, especially in Ads and Search. They want to see you handle observational data correctly.

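How to answer: Before-after assumes nothing else changed over the period, which is often false. DiD compares the change in the treatment group with the change in a control group over the same period, so any time trend shared by both groups cancels out. Discuss the parallel trends assumption (absent treatment, both groups would have moved together) and how to test it, for example by comparing their pre-period trends.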

What they evaluate: Causal inference fluency, awareness of common identification strategies, ability to discuss assumptions and validations
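
The contrast between the two estimators is easiest to see in the simple 2x2 case. The sketch below works from four group means; the function name and the numbers in the example are made up for illustration.

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """2x2 difference-in-differences from group means."""
    before_after = treat_post - treat_pre        # naive before-after estimate
    control_trend = control_post - control_pre   # what changed without treatment
    return {
        "before_after": before_after,
        "control_trend": control_trend,
        "did": before_after - control_trend,     # effect net of the shared trend
    }


# Illustrative numbers: the treated group rose by 20, but the control group
# rose by 15 over the same period, so only 5 is attributable to treatment.
estimate = diff_in_diff(treat_pre=100, treat_post=120, control_pre=90, control_post=105)
```

The estimate is only as good as the parallel trends assumption: if the control group would have moved differently even without the treatment, the subtraction removes the wrong trend.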

Q7 (behavioral)

Tell me about a time you communicated a complex finding to a non-technical audience.

How to answer: Pick a real example. Show how you found the right analogy or visualization, what objections you anticipated, and what action your audience took afterward.

What they evaluate: Empathy for the audience, ability to simplify without dumbing down, evidence the communication drove action

Q8 (technical)

Write a query to find the median revenue per user without using PERCENTILE_CONT.

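How to answer: Use ROW_NUMBER() + COUNT() to find each row's rank and the total N. The median sits at ranks ceil(N/2) and floor(N/2)+1, which are the same row when N is odd and the two middle rows when N is even. Use a self-join or subquery to fetch those rows and average them.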

What they evaluate: Comfort with SQL when shortcuts aren't available, ability to think through median definition rigorously, clean implementation
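
A runnable sketch of that approach follows, again using Python's built-in `sqlite3` so the SQL can be executed directly. The `payments(user_id, amount)` table is a hypothetical schema, and the query first sums to one revenue figure per user. It relies on integer division, which is how SQLite treats `/` between integers; some dialects need an explicit FLOOR or a different operator.

```python
import sqlite3

# Hypothetical schema: payments(user_id, amount), several rows per user.
MEDIAN_QUERY = """
WITH per_user AS (
    SELECT user_id, SUM(amount) AS revenue
    FROM payments
    GROUP BY user_id
),
ranked AS (
    SELECT revenue,
           ROW_NUMBER() OVER (ORDER BY revenue) AS rn,
           COUNT(*)     OVER ()                 AS n
    FROM per_user
)
SELECT AVG(revenue) AS median_revenue
FROM ranked
-- With integer division, (n + 1) / 2 = ceil(n/2) and (n + 2) / 2 = floor(n/2) + 1.
-- Both point at the same row when n is odd.
WHERE rn IN ((n + 1) / 2, (n + 2) / 2);
"""


def median_revenue_per_user(rows):
    """rows: iterable of (user_id, amount) tuples. Returns None for no data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (user_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    (median,) = conn.execute(MEDIAN_QUERY).fetchone()
    conn.close()
    return median
```

Computing the total with `COUNT(*) OVER ()` avoids the self-join entirely; a subquery that selects `COUNT(*)` separately is an equally valid answer.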

Q9 (case)

You ran an A/B test for 14 days and the result is significant at p=0.04. Should you ship?

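How to answer: A p-value alone is not enough information to ship. Check the peeking problem (was the 14-day duration fixed in advance?), multiple comparisons (how many metrics and segments were tested?), Simpson's paradox across segments, secondary and counter-metric movements, the novelty effect (does the lift hold for new users only?), and whether the effect size is large enough to matter for the business.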

What they evaluate: Awareness of multiple-comparisons issues, peeking problem, Simpson's paradox, ability to push back on 'significant means ship'
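
If the interviewer asks why peeking matters, a small simulation makes the point. The sketch below runs A/A tests (no true effect) and compares two stopping rules: test once on day 14, or declare a win the first time any daily check crosses the threshold. It assumes unit-variance outcomes and equal arm sizes, and every parameter value is illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist


def peeking_false_positive_rates(n_sims=2000, days=14, users_per_day=100,
                                 alpha=0.05, seed=0):
    """Return (fixed_horizon_rate, peeking_rate) for simulated A/A tests."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Treatment-minus-control sum over one day's users has variance 2 * users_per_day,
    # so each day is drawn in a single step instead of user by user.
    daily_sd = sqrt(2 * users_per_day)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        diff_sum = 0.0
        z = 0.0
        crossed_any_day = False
        for day in range(1, days + 1):
            diff_sum += rng.gauss(0.0, daily_sd)
            z = diff_sum / sqrt(2 * users_per_day * day)  # z-statistic so far
            if abs(z) > z_crit:
                crossed_any_day = True
        fixed_hits += abs(z) > z_crit     # significant on the final day only
        peeking_hits += crossed_any_day   # significant on any day
    return fixed_hits / n_sims, peeking_hits / n_sims
```

The fixed-horizon rate should land near alpha, while the peeking rate can only be equal or higher, because every final-day rejection also counts as a rejection under daily checks. That gap is why "was 14 days predetermined?" is the first question to ask about a p=0.04 result.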

Q10 (values)

Why Google specifically vs Meta or Amazon?

How to answer: Connect to Google's specific DS challenges (Search ranking, Ads auction, Maps routing) and the rigor culture. Show you've researched the team.

What they evaluate: Genuine team interest, alignment with rigor-first culture, signal of multi-year intent
