Google Data Scientist Interview Questions

Google Data Scientist interviews cover stats, SQL, ML, and product analytics. The bar is high on both depth (causal inference, experimental design) and breadth (product sense, business framing).

Process length: 8-10 weeks
Rounds: 7
Questions: 10
Mid-level TC: $240k–$330k (L4)

The Google Data Scientist interview process

What to expect, in order.

  1. Recruiter screen (30 min: fit + level)
  2. Technical screen (60 min: SQL + stats fundamentals)
  3. Onsite (typically 4-5 rounds)
  4. Stats round (60 min: hypothesis testing, A/B test design, causal inference)
  5. SQL round (60 min: complex queries on a realistic schema)
  6. ML round (60 min: model selection, evaluation, productionization)
  7. Product analytics round (60 min: case-based metric movement diagnosis)

What Google actually evaluates

Google Data Scientists are partnered tightly with PMs and Engineers. The interview reflects this — pure stats brilliance without business framing won't pass. Expect product analytics cases that test whether you can drive a decision.

Rigor — get the stats right before you ship anything
Product impact — analytics that drive product decisions
Causal thinking — separate correlation from causation
Communication — explain stats to non-stats partners

Process quirks worth knowing

Google's DS hiring committee weighs the four dimensions (stats, SQL, ML, product) equally. A 'no-hire' in any of them is hard to recover from. Unlike SWE roles, where a strong round can offset a weak one, DS candidates must be balanced.

10 questions Google actually asks

Each question comes with a tip for answering and a note on what the interviewer is actually evaluating.

Q1 (case)

Design an A/B test to evaluate a new ranking algorithm for Google Search.

Why Google asks: Stats round canonical question. They want to see you understand Google-scale experimental constraints — interaction effects, network effects, novelty bias.
How to answer: Cover: unit of randomization (query? session? user?), metric selection (relevance proxy like CTR + counter-metric like dwell time), sample size calculation, novelty effect handling, interaction with other ongoing experiments.
What they evaluate: Experimental design rigor, awareness of Google-scale challenges (interaction effects), counter-metric thinking
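For the sample size step, a rough sketch using the standard two-proportion z-test approximation can help you sanity-check numbers in the room. The 10% baseline CTR and 11% target are hypothetical, and the formula assumes independent units, so it fits user-level randomization better than query-level randomization, where one user's queries are correlated.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate units per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical numbers: 10% baseline CTR, hoping to detect a lift to 11%.
n = sample_size_per_arm(0.10, 0.11)
print(f"about {n:,} users per arm")
```

Halving the detectable lift roughly quadruples the required sample, which is worth saying out loud when you discuss how small an effect the test should be able to see.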
Q2 (technical)

Write a SQL query to find users who logged in 3 consecutive days.

How to answer: Deduplicate to one row per user per day first, then use a window function: ROW_NUMBER() OVER (PARTITION BY user ORDER BY login_date). Subtract the row number from login_date. Runs of consecutive days produce the same difference, so group by user and that difference and filter for groups of size 3+.
What they evaluate: Window function fluency, comfort with the gaps-and-islands pattern, clean SQL
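The gaps-and-islands approach above can be sketched against an in-memory SQLite database from Python. The `logins` table, its column names, and the sample rows are assumptions for illustration, and window functions need SQLite 3.25 or newer. Date arithmetic differs by dialect, so treat the `date(...)` call as SQLite-specific.

```python
import sqlite3

STREAK_QUERY = """
WITH days AS (              -- one row per user per calendar day
    SELECT DISTINCT user_id, login_date FROM logins
),
numbered AS (
    SELECT user_id, login_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
    FROM days
),
islands AS (                -- login_date minus rn is constant within a streak
    SELECT user_id,
           date(login_date, '-' || rn || ' days') AS streak_key,
           COUNT(*) AS streak_len
    FROM numbered
    GROUP BY user_id, streak_key
)
SELECT DISTINCT user_id FROM islands WHERE streak_len >= 3 ORDER BY user_id
"""


def users_with_3_day_streak(rows):
    """rows: iterable of (user_id, 'YYYY-MM-DD') login events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (user_id TEXT, login_date TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(STREAK_QUERY)]
    conn.close()
    return result


# Hypothetical data: "a" has a duplicate login, "b" has a gap, "c" crosses a month boundary.
sample = [
    ("a", "2024-01-01"), ("a", "2024-01-02"), ("a", "2024-01-02"), ("a", "2024-01-03"),
    ("b", "2024-01-01"), ("b", "2024-01-03"), ("b", "2024-01-04"),
    ("c", "2024-01-30"), ("c", "2024-01-31"), ("c", "2024-02-01"),
]
print(users_with_3_day_streak(sample))
```

Without the `DISTINCT` step, a user who logs in twice on one day would get two row numbers for the same date and the streak would split.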
Q3 (case)

Daily active users on YouTube dropped 5% yesterday. How do you investigate?

Why Google asks: Product analytics round. They want to see you decompose by user segment, geography, platform, before jumping to root cause.
How to answer: Frame as decomposition tree: segment (new vs existing), geo, platform (iOS vs Android vs web), entry source (organic vs notification vs deep link). Check for external events (outages, holidays, competitor launches). State your top 2-3 hypotheses.
What they evaluate: Structured decomposition, multiple hypotheses, awareness of common false positives (weekly patterns, seasonality)
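One way to make the decomposition concrete is to compute how much of the total change each segment accounts for. This is a minimal sketch with made-up platform numbers (in millions of DAU); the function name and data are hypothetical, and the same cut can be repeated for geo, user tenure, or entry source.

```python
def segment_contributions(before, after):
    """Return (segment, change, share of total change), largest drop first."""
    total_change = sum(after.values()) - sum(before.values())
    rows = []
    for segment in before:
        change = after.get(segment, 0) - before[segment]
        share = change / total_change if total_change else 0.0
        rows.append((segment, change, share))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical DAU by platform, in millions: 100 total falls to 95 (a 5% drop).
before = {"ios": 40.0, "android": 50.0, "web": 10.0}
after = {"ios": 40.0, "android": 45.0, "web": 10.0}

for segment, change, share in segment_contributions(before, after):
    print(f"{segment:8s} change={change:+.1f}M  share of drop={share:.0%}")
```

If one segment explains nearly all of the drop, as Android does in this toy data, the investigation narrows to releases, outages, or logging changes on that platform.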
Q4 (technical)

Explain the difference between Type I and Type II errors. When do you care more about each?

How to answer: Type I = false positive (rejecting a true null). Type II = false negative (failing to reject a false null). Care more about Type I when a false-positive launch is costly (e.g. revenue-critical features). Care more about Type II when missing a real effect is expensive (e.g. medical screening, where a missed case is costly). At a fixed sample size, a stricter significance threshold lowers power, so the two trade off.
What they evaluate: Conceptual clarity, ability to map to real product decisions, awareness of power-vs-significance tradeoff
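To illustrate the power-vs-significance tradeoff, here is a small sketch for a two-sided z-test. The true effect of 2.8 standard errors is an assumption chosen so that power lands near the conventional 80% at alpha = 0.05.

```python
from statistics import NormalDist


def power_two_sided(effect_in_se, alpha):
    """Power of a two-sided z-test when the true effect equals `effect_in_se` standard errors."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    # Probability the test statistic lands in either rejection region.
    return (1 - z.cdf(crit - effect_in_se)) + z.cdf(-crit - effect_in_se)


# Same experiment, three significance thresholds: stricter alpha means more misses.
for alpha in (0.10, 0.05, 0.01):
    type_2_rate = 1 - power_two_sided(2.8, alpha)
    print(f"alpha={alpha:.2f}  Type II rate={type_2_rate:.2f}")
```

With sample size held fixed, the only way to cut false positives is to accept more false negatives, which is why the right threshold depends on which mistake costs the product more.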
Q5 (case)

How would you build a model to predict whether a YouTube user will churn?

How to answer: Cover: define churn (no logins in 28 days?), feature engineering (engagement signals, content preferences, social graph), model selection (gradient boosting for tabular vs neural for sequence), evaluation (AUC, precision at top decile), serving (offline batch vs online).
What they evaluate: End-to-end ML thinking, awareness of class imbalance, business-relevant feature engineering, productionization realism
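The two evaluation metrics named above can be computed by hand, which is useful if the interviewer asks what they actually measure. This is a dependency-free sketch on ten hypothetical users; the pairwise AUC is quadratic in the number of users, so it is for illustration rather than production use.

```python
def precision_at_top_fraction(scores, labels, fraction=0.10):
    """Share of true churners among the highest-scored `fraction` of users."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def auc(scores, labels):
    """Probability a random churner outscores a random non-churner (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and churn labels (1 = churned); only 2 of 10 users churn.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print("precision at top decile:", precision_at_top_fraction(scores, labels))
print("AUC:", auc(scores, labels))
```

Precision at the top decile maps directly to a retention campaign with a fixed budget, while AUC summarizes ranking quality across all thresholds and is not distorted by class imbalance in the way accuracy is.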
Q6 (technical)

What's the difference between a difference-in-differences and a simple before-after analysis?

Why Google asks: Causal inference is increasingly required for Google DS roles, especially in Ads and Search. They want to see you handle observational data correctly.
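How to answer: Before-after assumes nothing else changed over the period (often false). DiD compares the change in the treated group with the change in a control group over the same period, which removes time trends common to both. Discuss the parallel trends assumption and how to check it, for example by comparing the two groups' trends in the pre-period.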
What they evaluate: Causal inference fluency, awareness of common identification strategies, ability to discuss assumptions and validations
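The contrast between the two estimators fits in a few lines. This sketch uses made-up outcome values for a treated and a control group; the function names and numbers are hypothetical, and a real analysis would fit a regression with standard errors rather than compare raw means.

```python
from statistics import mean


def before_after(treat_pre, treat_post):
    """Naive estimate: change in the treated group only."""
    return mean(treat_post) - mean(treat_pre)


def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Treated change minus control change; valid under parallel trends."""
    treat_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treat_change - control_change


# Hypothetical metric values per unit; both groups drift upward over time.
treat_pre, treat_post = [10.0, 12.0, 11.0], [15.0, 17.0, 16.0]
control_pre, control_post = [8.0, 9.0, 10.0], [11.0, 12.0, 13.0]

print("before-after:", before_after(treat_pre, treat_post))
print("diff-in-diff:", diff_in_diff(treat_pre, treat_post, control_pre, control_post))
```

In this toy data the before-after estimate is 5, but the control group rose by 3 over the same period, so the DiD estimate attributes only 2 to the treatment.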
Q7 (behavioral)

Tell me about a time you communicated a complex finding to a non-technical audience.

How to answer: Pick a real example. Show how you found the right analogy or visualization, what objections you anticipated, what action your audience took after.
What they evaluate: Empathy for the audience, ability to simplify without dumbing down, evidence the communication drove action
Q8 (technical)

Write a query to find the median revenue per user without using PERCENTILE_CONT.

How to answer: Use ROW_NUMBER() + COUNT() to find each row's rank and the total N. The median is the average of the rows at positions ceil(N/2) and floor(N/2)+1, which are the same row when N is odd. Use a self-join or subquery to fetch those rows and average them.
What they evaluate: Comfort with SQL when shortcuts aren't available, ability to think through median definition rigorously, clean implementation
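A minimal sketch of that approach, run against in-memory SQLite from Python. The `orders` table and its columns are assumptions for illustration, and window functions need SQLite 3.25 or newer. With integer division, `(n + 1) / 2` equals ceil(N/2) and `(n + 2) / 2` equals floor(N/2)+1.

```python
import sqlite3

MEDIAN_QUERY = """
WITH per_user AS (          -- revenue per user comes first, then the median across users
    SELECT user_id, SUM(amount) AS revenue FROM orders GROUP BY user_id
),
ranked AS (
    SELECT revenue,
           ROW_NUMBER() OVER (ORDER BY revenue) AS rn,
           COUNT(*) OVER () AS n
    FROM per_user
)
SELECT AVG(revenue) FROM ranked
WHERE rn IN ((n + 1) / 2, (n + 2) / 2)   -- same row twice when n is odd
"""


def median_revenue_per_user(rows):
    """rows: iterable of (user_id, amount) order records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    (median,) = conn.execute(MEDIAN_QUERY).fetchone()
    conn.close()
    return median


# Hypothetical orders: per-user revenue is a=20, b=30, c=50, d=100.
orders = [("a", 10), ("a", 10), ("b", 30), ("c", 50), ("d", 100)]
print(median_revenue_per_user(orders))
```

Stating out loud that the median is taken over users, not over individual orders, avoids the most common misreading of this question.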
Q9 (case)

You ran an A/B test for 14 days and the result is significant at p=0.04. Should you ship?

How to answer: Not enough info to ship blindly. Check: peeking problem (was 14 days predetermined?), Simpson's paradox by segment, secondary metric movements, novelty effect (re-run with new users only?), business impact magnitude.
What they evaluate: Awareness of multiple-comparisons issues, peeking problem, Simpson's paradox, ability to push back on 'significant means ship'
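The peeking problem is easy to demonstrate with a simulated A/A test, where no true effect exists. This sketch assumes unit-variance outcomes, 500 users per arm per day, and a z-test checked once per day; all of those settings are hypothetical. It compares a single pre-registered look on day 14 with stopping at the first significant day.

```python
import math
import random
from statistics import NormalDist


def false_positive_rates(days=14, users_per_day=500, sims=2000, alpha=0.05, seed=0):
    """Simulate A/A tests; return (fixed-horizon rate, peek-daily rate) of false positives."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    daily_sd = math.sqrt(2 * users_per_day)  # sd of (treatment sum - control sum) per day
    fixed_hits = peek_hits = 0
    for _ in range(sims):
        cumulative_diff = 0.0
        ever_significant = False
        for day in range(1, days + 1):
            cumulative_diff += rng.gauss(0.0, daily_sd)
            z = cumulative_diff / math.sqrt(2 * users_per_day * day)
            if abs(z) > crit:
                ever_significant = True
        fixed_hits += abs(z) > crit       # only the final look counts
        peek_hits += ever_significant     # any daily look could have triggered a ship
    return fixed_hits / sims, peek_hits / sims


fixed_rate, peek_rate = false_positive_rates()
print(f"fixed horizon: {fixed_rate:.3f}   daily peeking: {peek_rate:.3f}")
```

The fixed-horizon rate stays near the nominal 5%, while daily peeking inflates it several-fold, so p=0.04 means little unless the 14-day stopping point was set in advance.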
Q10 (values)

Why Google specifically vs Meta or Amazon?

How to answer: Connect to Google's specific DS challenges (Search ranking, Ads auction, Maps routing) and the rigor culture. Show you've researched the team.
What they evaluate: Genuine team interest, alignment with rigor-first culture, signal of multi-year intent

Common ways candidates fail this interview

Specific to Google, not generic interview advice.

  • Strong stats but weak SQL: one bad SQL round can fail you
  • Treating product analytics like academic stats: Google wants action-oriented analysis
  • Generic 'big data' answers without specific Google-scale context
  • Missing causal inference, which is increasingly required, especially in Ads/Search
  • Underprepping the ML round on the assumption that stats covers it

Google Data Scientist compensation (2026)

Entry / Junior (L3): $170k–$220k total comp
Mid-level (L4): $240k–$330k total comp
Senior+ (L5): $350k–$500k total comp

Sources: levels.fyi, Glassdoor, public filings (US figures, total compensation including base + bonus + equity).
