Week 3 Practice: Business Metrics
ECON250: Big Data Analytics
Your Submission
Deadline: 3 days after the practice sessions
Points: 2 points for satisfactory completion
Required Submission Items
Create the following two views in your personal dataset (econ250-2026.student__yourname):
View 1: week3_monthly_trend
What to create: Monthly order and revenue summary for 2025, one row per month.
Required columns: month, orders, revenue, returned_items
Where to find it: Practice 1, Part 4.
View 2: week3_submission
What to create: A complete monthly business summary for 2025, one row per month.
Required columns:
- `month` — first day of each month (DATE)
- `total_revenue` — total `sale_price`, rounded to 2 decimals
- `total_orders` — count of distinct orders
- `aov` — average order value (revenue / orders), rounded to 2 decimals
- `unique_customers` — count of distinct customers
- `return_rate` — returned items / total items as a percentage, rounded to 2 decimals
- `avg_ship_days` — cleaned order-level shipping time: use `MIN(created_at)` per order as the order date, `TIMESTAMP_DIFF` in seconds / 86400.0, rounded to 1 decimal
Where to find it: Practice 2, Synthesis section.
Quality check: View must run without errors, contain 12 rows (Jan–Dec 2025), all avg_ship_days values positive, and use correct NULL handling in return rate calculations.
How to Submit
For each view:
- In BigQuery Explorer, find the view in your `student__yourname` dataset
- Click the three dots menu (⋮) next to the view name
- Select “Copy link”
- Paste both links in Moodle (one per line)
Before You Start
Required: BigQuery console open at console.cloud.google.com/bigquery?project=econ250-2026
Dataset: bigquery-public-data.thelook_ecommerce — a synthetic e-commerce dataset with orders, products, and users. The data is continuously generated, so exact row counts will vary slightly.
Analysis period: All queries should filter to calendar year 2025:
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
Tables you’ll use
| Table | Key columns | Notes |
|---|---|---|
| `order_items` | `order_id`, `user_id`, `product_id`, `sale_price`, `status`, `created_at`, `shipped_at`, `returned_at` | Main fact table — one row per item purchased |
| `products` | `id`, `category`, `brand`, `department`, `retail_price` | Product catalog — JOIN on `products.id = order_items.product_id` |
| `users` | `id`, `first_name`, `last_name`, `country`, `created_at` | Customer data — JOIN on `users.id = order_items.user_id` |
| `inventory_items` | `id`, `product_id`, `cost`, `sold_at` | Inventory tracking — multiple rows per product (used in Bug Hunt) |
Some columns are NULL by design, not because of data quality issues:
- `shipped_at` — NULL if the item hasn’t shipped yet
- `delivered_at` — NULL if not yet delivered
- `returned_at` — NULL if the item was not returned
These NULLs carry business meaning. Don’t filter them out — you’ll need them for metric calculations.
What you’ll practice
- Additive metrics: SUM, COUNT, COUNTIF, DATE_TRUNC grouping
- Distinct counts: COUNT(DISTINCT) and non-additivity
- Ratios: return rates with different denominators
- Averages: AVG with NULL awareness
- Nested aggregation via CTEs (aggregate, then aggregate again)
- DATE_DIFF and TIMESTAMP_DIFF for time-between calculations
- Validating JOINs by comparing totals before and after
Practice 1: Building Business Metrics
You’ll calculate business metrics from the thelook e-commerce dataset and run into some common pitfalls.
Part 1: Get to know the data
Before calculating anything, look at the data. In the BigQuery Explorer panel, find bigquery-public-data → thelook_ecommerce → order_items, click the table name, then open the Preview tab. This shows a sample of rows at no cost.
What you should notice:
- Each row is one item in an order (not one order)
- `order_id` repeats when an order has multiple items
- `sale_price` is the price for that single item
- `shipped_at` and `returned_at` are NULL for many rows — that’s expected
Also check the Schema tab while you’re there — you’ll see all available columns. Do the same for the products table: you’ll need the category column from there. And check users — one row per customer, with id, created_at, and a few profile columns. You’ll use this table later in Practice 2.
Now look at inventory_items. The Preview tab will show you multiple rows with the same product_id. Confirm this with a quick query:
SELECT
product_id,
COUNT(*) AS inventory_rows
FROM `bigquery-public-data.thelook_ecommerce.inventory_items`
GROUP BY product_id
ORDER BY inventory_rows DESC
LIMIT 5
A single product can appear dozens of times — one row per physical unit in stock. Keep this in mind: joining order_items to inventory_items on product_id will multiply rows. You’ll see exactly what this does to a metric in Practice 2, Bug 1.
Part 2: Revenue and order counts
Write a single query that returns, for 2025:
- Total number of items sold (`COUNT(*)`)
- Total number of distinct orders (`COUNT(DISTINCT order_id)`)
- Total revenue (`ROUND(SUM(sale_price), 2)`)
- Number of returned items (`COUNTIF(returned_at IS NOT NULL)`)
- Revenue from returned items (`ROUND(SUM(CASE WHEN returned_at IS NOT NULL THEN sale_price ELSE 0 END), 2)`)
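Assembling the pieces above into one query — a sketch (the alias names are my own choice):

```sql
-- Sketch of the Part 2 yearly summary; aliases are illustrative
SELECT
  COUNT(*) AS items_sold,
  COUNT(DISTINCT order_id) AS total_orders,
  ROUND(SUM(sale_price), 2) AS total_revenue,
  COUNTIF(returned_at IS NOT NULL) AS returned_items,
  ROUND(SUM(CASE WHEN returned_at IS NOT NULL THEN sale_price ELSE 0 END), 2) AS returned_revenue
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
```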
What you should be able to answer: How many items were sold in 2025? How many distinct orders does that represent? What fraction of items were returned?
You’ll use this number later to validate that JOINs don’t inflate your totals.
Part 3: Revenue by product category
Join order_items to products to get revenue broken down by category:
SELECT
p.category,
ROUND(SUM(oi.sale_price), 2) AS revenue
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
JOIN `bigquery-public-data.thelook_ecommerce.products` p
ON p.id = oi.product_id
WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
GROUP BY p.category
ORDER BY revenue DESC
Validation check: Sum the per-category revenues (mentally or with a CTE). Does the total match your Part 2 revenue? If yes, the JOIN is clean — no fan-out. If not, something went wrong.
Every JOIN can potentially duplicate rows if the relationship is one-to-many. Comparing your total before and after the JOIN is a quick check. You’ll see what fan-out looks like in Practice 2’s Bug Hunt.
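If you’d rather not compare by hand, the before/after check can be scripted — a sketch using a CTE plus two scalar subqueries (the column names are my own):

```sql
-- Fan-out check: JOINed total vs. raw order_items total for 2025
WITH by_category AS (
  SELECT p.category, SUM(oi.sale_price) AS revenue
  FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
  JOIN `bigquery-public-data.thelook_ecommerce.products` p
    ON p.id = oi.product_id
  WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
  GROUP BY p.category
)
SELECT
  (SELECT ROUND(SUM(revenue), 2) FROM by_category) AS joined_total,
  (SELECT ROUND(SUM(sale_price), 2)
   FROM `bigquery-public-data.thelook_ecommerce.order_items`
   WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01') AS raw_total
```

The two columns should match to the cent if the JOIN introduced no fan-out.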
Part 4: Monthly revenue trend
Group order_items by month to see how revenue and returns change across 2025. Use:
- `DATE_TRUNC(DATE(created_at), MONTH)` to group by month — this converts the timestamp to a date and truncates it to the first day of the month
- Three aggregations: `COUNT(DISTINCT order_id) AS orders`, `ROUND(SUM(sale_price), 2) AS revenue`, `COUNTIF(returned_at IS NOT NULL) AS returned_items`
- Filter to 2025, `GROUP BY month`, `ORDER BY month`
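Put together, the monthly query can take this shape — a sketch assembled from the components listed above:

```sql
-- Monthly orders, revenue, and returns for 2025
SELECT
  DATE_TRUNC(DATE(created_at), MONTH) AS month,
  COUNT(DISTINCT order_id) AS orders,
  ROUND(SUM(sale_price), 2) AS revenue,
  COUNTIF(returned_at IS NOT NULL) AS returned_items
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
GROUP BY month
ORDER BY month
```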
What you should be able to answer: Is there a seasonal pattern? Which month had the highest revenue? Are returns fairly constant across months or do they spike?
If you sum the monthly orders column across all 12 rows and compare to the yearly COUNT(DISTINCT order_id) from Part 2, the monthly sum is slightly higher (~738 orders, or 1.7%). This is because items within the same order can have created_at values in different calendar months (the timestamp issue from Part 1). An order with items in both January and February gets counted as a distinct order in both months.
Revenue is not affected — each item has one created_at and its sale_price lands in exactly one month. But any COUNT(DISTINCT ...) grouped by a time period can exhibit this non-additivity. You’ll see the same pattern with unique customers in Part 5.
CREATE OR REPLACE VIEW `econ250-2026.student__yourname.week3_monthly_trend` AS
-- Your monthly query
This is submission item 1 of 2.
Part 5: Unique customers — the non-additivity test
Step 1: Write a query that returns COUNT(DISTINCT user_id) per month for 2025, one row per month ordered by month.
Step 2: Now get two numbers and write them both down:
- The sum of your monthly unique counts — wrap your Step 1 query in a CTE and `SUM` the result, or just add the monthly values by hand.
- The yearly unique customer count — a single `COUNT(DISTINCT user_id)` for all of 2025, no `GROUP BY`.
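The CTE route for the first number can look like this — a sketch (the inner query is a Step 1-style monthly count; aliases are my own):

```sql
-- Sum of monthly distinct-customer counts (NOT the yearly unique count)
WITH monthly_uniques AS (
  SELECT
    DATE_TRUNC(DATE(created_at), MONTH) AS month,
    COUNT(DISTINCT user_id) AS unique_customers
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  GROUP BY month
)
SELECT SUM(unique_customers) AS summed_monthly_uniques
FROM monthly_uniques
```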
The two numbers should not match. Can you explain why? Think through what happens to a customer who places orders in both January and March.
The mismatch between summed monthly uniques and yearly uniques is a property of distinct counts. A customer who orders in January and March gets counted in both months but only once for the year. You cannot roll up distinct counts by summing — always compute them at the grain you need.
The same applies to COUNT(DISTINCT order_id) from Part 4 — orders whose items span month boundaries get counted in both months. Any COUNT(DISTINCT ...) is non-additive when the entity (customer, order) can appear in multiple groups.
Part 6: Orders per customer
How many orders does a typical customer place? This requires nested aggregation — aggregate once to get orders per customer, then aggregate again to get the distribution.
Build it in two steps with a CTE:
WITH customer_orders AS (
-- Step 1: one row per customer, with their order count
SELECT
user_id,
??? AS order_count
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
GROUP BY user_id
)
-- Step 2: how many customers placed 1 order? 2 orders? etc.
SELECT
???,
COUNT(*) AS num_customers
FROM customer_orders
GROUP BY ???
ORDER BY ???
What you should be able to answer: What percentage of customers placed exactly one order in 2025? Is this a typical e-commerce distribution?
Notice that getting this distribution requires two separate GROUP BY passes — you can’t compute it in a single aggregation. This is one of the places where SQL’s row-at-a-time model shows its limits. Window functions (next week) make this simpler.
Part 7: Return rate — two definitions
Calculate the return rate two ways:
Approach A — item-level: What fraction of items were returned? Use COUNTIF(returned_at IS NOT NULL) as the numerator and COUNT(*) as the denominator. Multiply by 100.0 and round to 2 decimals.
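Approach A fits in a single SELECT — a sketch (the alias is my own):

```sql
-- Item-level return rate: returned items as a share of all items
SELECT
  ROUND(COUNTIF(returned_at IS NOT NULL) * 100.0 / COUNT(*), 2) AS return_rate_items
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
```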
Approach B — order-level: What fraction of orders had at least one return? This requires collapsing from item-level to order-level first. Use a CTE:
WITH order_returns AS (
SELECT
order_id,
MAX(???) AS has_return -- 1 if any item in the order was returned, 0 otherwise
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
GROUP BY order_id
)
SELECT
ROUND(??? * 100.0 / COUNT(*), 2) AS return_rate_orders
FROM order_returns
What you should find: The two numbers may be very close in this dataset because most thelook orders contain only 1-2 items, so the item-level and order-level denominators are similar. The conceptual difference still matters: an order with 3 items where 1 is returned is 33% at item level but 100% at order level.
If a manager asks “what’s our return rate?” — which definition would you use? Under what circumstances would you choose the other?
If you wanted to filter to only categories with a return rate above 10%, you couldn’t use WHERE — the rate doesn’t exist until after GROUP BY runs. That’s what HAVING is for: it filters groups after aggregation. WHERE filters rows before aggregation, HAVING filters groups after.
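For example, a per-category return rate filtered with HAVING might look like this — a sketch (the 10% threshold is illustrative; BigQuery lets HAVING reference the SELECT alias):

```sql
-- Keep only categories whose item-level return rate exceeds 10%
SELECT
  p.category,
  ROUND(COUNTIF(oi.returned_at IS NOT NULL) * 100.0 / COUNT(*), 2) AS return_rate
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
JOIN `bigquery-public-data.thelook_ecommerce.products` p
  ON p.id = oi.product_id
WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
GROUP BY p.category
HAVING return_rate > 10  -- runs AFTER aggregation, unlike WHERE
ORDER BY return_rate DESC
```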
Part 8: Average shipping time
Now we return to the timestamp issue from Part 1. Calculate the naive average number of days between order creation and shipment, so you can compare it to the cleaned version in Practice 2:
SELECT
ROUND(AVG(DATE_DIFF(DATE(shipped_at), DATE(created_at), DAY)), 1) AS avg_ship_days
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
Think about this before running: Which items will AVG include? Which will it silently exclude? Check by running COUNTIF(shipped_at IS NULL) alongside the average. About a third of items haven’t shipped — AVG silently excludes them because DATE_DIFF returns NULL when shipped_at is NULL.
The result will also look suspiciously low. Remember the timestamp issue you found in Part 1 — 19% of items have shipped_at before created_at, producing negative values that drag the average down. We’ll fix this properly in Practice 2.
Precision: DAY vs SECOND
There’s a second problem: DATE_DIFF(..., DAY) truncates each row to a whole number of days before averaging. A 1.8-day shipment becomes 1, a 0.3-day shipment becomes 0. A more precise approach — compute in seconds, then convert after averaging:
ROUND(AVG(TIMESTAMP_DIFF(shipped_at, created_at, SECOND) / 86400.0), 1)
Compare the two results. For the submission, use TIMESTAMP_DIFF(..., SECOND) / 86400.0 — it preserves sub-day precision.
Practice 2: The Metric Bug Hunt
Each query below has an error that produces wrong results. The queries run without syntax errors — the bugs are in the logic.
For each bug:
- Read the stated goal
- Run the query
- Figure out what’s wrong and why
- Write the corrected version
Bug 1: Revenue by category
Goal: Calculate total revenue by product category for 2025.
SELECT
p.category,
ROUND(SUM(oi.sale_price), 2) AS revenue
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
JOIN `bigquery-public-data.thelook_ecommerce.inventory_items` ii
ON oi.product_id = ii.product_id
JOIN `bigquery-public-data.thelook_ecommerce.products` p
ON p.id = oi.product_id
WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
GROUP BY p.category
ORDER BY revenue DESC
Clue: Sum up the per-category revenues from this query and compare with the total revenue you calculated in Practice 1, Part 2. Are they the same?
Task: Explain what’s wrong and write the corrected query.
Bug 2: Return rate
Goal: Calculate the item-level return rate for 2025.
SELECT
ROUND(COUNT(returned_at) * 100.0 / COUNT(returned_at), 2) AS return_rate
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
Clue: The result is 100%. That can’t be right.
Task: What’s wrong with the denominator? Write the corrected query.
Bug 3: A defensive coding pattern
Goal: Count items that are NOT in “Complete” status for 2025.
This bug may not produce wrong results in this particular dataset — but the pattern will silently drop rows in production data where columns are nullable.
SELECT COUNT(*) AS not_complete
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE status != 'Complete'
AND created_at >= '2025-01-01' AND created_at < '2026-01-01'
Investigation: First, check whether the status column has any NULL values:
SELECT
COUNTIF(status IS NULL) AS null_status,
COUNT(*) AS total
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
If there are NULLs, the main query silently undercounts — because NULL != 'Complete' evaluates to NULL (not TRUE), so those rows are dropped.
Task: Write a version that correctly includes items with NULL status. Use either OR status IS NULL or IS DISTINCT FROM.
If status happens to have no NULLs in thelook, the bug is “dormant” — it won’t cause wrong results here, but the same pattern will silently drop rows in any dataset where the column is nullable. The fix takes one extra line and avoids silent data loss in other datasets.
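One possible corrected version, using the IS DISTINCT FROM spelling (the explicit OR status IS NULL form is equivalent here):

```sql
-- NULL status now counts as "not Complete" instead of being dropped
SELECT COUNT(*) AS not_complete
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE status IS DISTINCT FROM 'Complete'
  AND created_at >= '2025-01-01' AND created_at < '2026-01-01'
```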
Bug 4: Fixing shipping time with data cleaning
Goal: Calculate an accurate average shipping time for 2025.
In Part 1 you discovered that items within the same order have different created_at timestamps, and some items appear to ship before they were “created.” The naive per-item query produces misleading results:
SELECT
ROUND(AVG(TIMESTAMP_DIFF(shipped_at, created_at, SECOND) / 86400.0), 1) AS avg_ship_days_naive
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
AND shipped_at IS NOT NULL
Run this and note the result. It’s low because ~19% of rows contribute negative values.
The fix: Since shipped_at is an order-level timestamp (shared by all items), compare it to the order’s earliest created_at instead of each item’s individual timestamp. Build a CTE that computes one shipping time per order:
WITH order_ship AS (
SELECT
order_id,
MIN(created_at) AS order_date,
MIN(shipped_at) AS shipped_at -- same across all items, MIN just extracts it
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
AND shipped_at IS NOT NULL
GROUP BY order_id
)
SELECT
ROUND(AVG(TIMESTAMP_DIFF(shipped_at, order_date, SECOND) / 86400.0), 1) AS avg_ship_days_clean
FROM order_ship
Compare the two results. The cleaned version is higher — and always non-negative, because the earliest item in an order is always created before (or at) shipment.
What changed:
- `MIN(created_at)` per order eliminates the negative values
- `TIMESTAMP_DIFF(..., SECOND) / 86400.0` preserves sub-day precision (vs `DATE_DIFF(..., DAY)` which truncates)
- Grouping by `order_id` means each order contributes one shipping time, not one per item
Why keep `shipped_at IS NOT NULL` explicitly?
AVG skips NULLs automatically, so the filter is technically redundant for the average calculation. But including it makes the metric definition explicit — this is the average for shipped orders only. When someone reads your query six months later, they immediately know unshipped orders are excluded by design, not by accident.
This pattern comes up often: investigate the data, aggregate to the right grain, fix known issues, then compute your metric. You’ll use this same CTE in the Synthesis section.
Bug 5: Company-wide average order value
Goal: Calculate the company-wide average order value (AOV) by averaging the per-category AOVs.
WITH category_aov AS (
SELECT
p.category,
ROUND(SUM(oi.sale_price) / COUNT(DISTINCT oi.order_id), 2) AS aov
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
JOIN `bigquery-public-data.thelook_ecommerce.products` p
ON p.id = oi.product_id
WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
GROUP BY p.category
)
SELECT ROUND(AVG(aov), 2) AS avg_of_category_aovs
FROM category_aov
Clue: Compare this result with the overall AOV you can compute directly:
SELECT ROUND(SUM(sale_price) / COUNT(DISTINCT order_id), 2) AS overall_aov
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
The two numbers are noticeably different. Which is the correct “overall AOV”?
Task: Explain why averaging the per-category AOVs gives a different result from computing AOV directly. (Hint: categories like “Socks & Hosiery” have low AOV and “Outerwear & Coats” have high AOV — but they don’t have the same number of orders. AVG gives each category equal weight regardless of size.)
Bug 6: Average revenue per customer
Goal: Calculate the average revenue per customer for all users who registered in 2025 — including customers who registered but placed no orders, who should contribute $0 to the average.
Before looking at the query, open the users table in BigQuery Explorer and check the Preview tab. Note that created_at here is the registration date, not an order date — a user can exist in users without having any rows in order_items.
WITH customer_spend AS (
SELECT
u.id,
SUM(oi.sale_price) AS total_spend
FROM `bigquery-public-data.thelook_ecommerce.users` u
JOIN `bigquery-public-data.thelook_ecommerce.order_items` oi
ON oi.user_id = u.id
WHERE u.created_at >= '2025-01-01' AND u.created_at < '2026-01-01'
GROUP BY u.id
)
SELECT ROUND(AVG(total_spend), 2) AS avg_revenue_per_customer
FROM customer_spend
Clue: Check how many users appear in customer_spend vs. how many registered in 2025:
SELECT COUNT(*) AS registered_in_2025
FROM `bigquery-public-data.thelook_ecommerce.users`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
If fewer users appear in customer_spend than registered in 2025, the INNER JOIN is silently excluding users who placed no orders — and the average is computed over a smaller, higher-spending population.
Task: Fix this with a LEFT JOIN. Use COALESCE(SUM(oi.sale_price), 0) to replace NULL spend (no orders) with zero.
The original query only filters on u.created_at in WHERE — there’s no filter on order_items columns. So the LEFT JOIN fix is straightforward: change JOIN to LEFT JOIN and add COALESCE.
Note that the fixed query includes all orders ever placed by 2025-registered users, not just their 2025 orders. If you wanted to restrict to 2025 orders only, you’d need to add that date filter in the ON clause, not in WHERE:
FROM `bigquery-public-data.thelook_ecommerce.users` u
LEFT JOIN `bigquery-public-data.thelook_ecommerce.order_items` oi
ON oi.user_id = u.id
AND oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
WHERE u.created_at >= '2025-01-01' AND u.created_at < '2026-01-01'
Why not WHERE? A WHERE condition on the right table’s columns silently converts a LEFT JOIN back into an inner join — unmatched rows have NULL for those columns, which fails the WHERE condition and drops them.
NULL-handling reference: Three functions for working with NULLs in aggregations:
| Function | What it does | Example |
|---|---|---|
| `COALESCE(x, fallback)` | First non-NULL value | `COALESCE(SUM(sale_price), 0)` |
| `IFNULL(x, fallback)` | Two-argument shorthand | `IFNULL(SUM(sale_price), 0)` |
| `NULLIF(x, value)` | Return NULL if x equals value | `NULLIF(total_spend, 0)` |
COALESCE and IFNULL are interchangeable here. NULLIF goes the other direction — useful when you want to convert a sentinel value (like 0) back to NULL so that AVG excludes it rather than dragging the average down.
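A common use of NULLIF is guarding a denominator — a sketch with made-up inline rows (the customer_totals CTE and its values are hypothetical, just to show the pattern):

```sql
-- NULLIF turns a 0 denominator into NULL, so the division yields NULL
-- instead of a divide-by-zero error
WITH customer_totals AS (
  SELECT 1 AS user_id, 120.50 AS total_spend, 3 AS order_count
  UNION ALL
  SELECT 2, 0.0, 0
)
SELECT
  user_id,
  ROUND(total_spend / NULLIF(order_count, 0), 2) AS aov_per_customer
FROM customer_totals
```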
Synthesis: Monthly Metrics Summary
Combine everything from today’s sessions into a single query that produces a monthly business summary for 2025. Most metrics come from a GROUP BY month on order_items, but avg_ship_days needs the order-level cleaning pattern you built in Bug 4.
Your query should return one row per month with these columns:
| Column | Notes |
|---|---|
| `month` | First day of month (DATE) |
| `total_revenue` | Total `sale_price`, rounded to 2 decimals |
| `total_orders` | Count of distinct orders |
| `aov` | Revenue divided by orders, rounded to 2 decimals |
| `unique_customers` | Count of distinct customers |
| `return_rate` | Returned items / total items as a percentage, rounded to 2 decimals |
| `avg_ship_days` | Cleaned: order-level shipping time using `MIN(created_at)` as order date, `TIMESTAMP_DIFF` in seconds / 86400.0, rounded to 1 decimal |
Structure: 3 CTEs + JOIN
Revenue, orders, and return rate are item-level metrics — they work fine with a GROUP BY month directly on order_items. But shipping time needs to be computed at the order level first (Bug 4 fix), then averaged per month. These are two different grains, so compute them separately and join:
WITH order_shipping AS (
-- One row per shipped order: cleaned shipping time
SELECT
order_id,
DATE_TRUNC(DATE(MIN(created_at)), MONTH) AS month, -- assigns order to month of its earliest item
TIMESTAMP_DIFF(MIN(shipped_at), MIN(created_at), SECOND) / 86400.0 AS ship_days
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
AND shipped_at IS NOT NULL
GROUP BY order_id
),
monthly_shipping AS (
-- Average shipping time per month (order-weighted)
SELECT month, ROUND(AVG(ship_days), 1) AS avg_ship_days
FROM order_shipping
GROUP BY month
),
monthly_metrics AS (
-- All other metrics at item level
SELECT
DATE_TRUNC(DATE(created_at), MONTH) AS month,
...
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
GROUP BY month
)
SELECT
m.month,
m.total_revenue,
m.total_orders,
m.aov,
m.unique_customers,
m.return_rate,
s.avg_ship_days
FROM monthly_metrics m
LEFT JOIN monthly_shipping s ON s.month = m.month
ORDER BY m.month
Fill in the ... in monthly_metrics with the column expressions you built in Parts 2–7.
Requirements:
- Filter to 2025 in both `monthly_metrics` and `order_shipping`
- Return rate denominator must be `COUNT(*)`, not `COUNT(returned_at)`
- Use `LEFT JOIN` for `monthly_shipping` (defensive — ensures no month is dropped even if it has no shipped orders)
The order_shipping CTE assigns each order to the month of its earliest item (MIN(created_at)), while monthly_metrics assigns each item to its own created_at month. For orders whose items span a month boundary (~1.7% of shipped orders), shipping time and revenue land in different months. The effect is small enough to ignore here, but it’s worth recognizing: whenever you join metrics computed at different grains, check whether the grouping keys align.
Checkpoint: Your query should return exactly 12 rows. Verify that:
- Revenue numbers roughly match Part 2
- AOV is reasonable (tens of dollars per order)
- Return rate is fairly consistent across months
- Unique customers per month sum to MORE than yearly uniques (the non-additivity you observed in Practice 1, Part 5)
- All `avg_ship_days` values are positive (a few days)
Save as Submission View ⭐ SUBMISSION
This creates your submission item: the week3_submission view. See the Your Submission section at the top for full requirements.
CREATE OR REPLACE VIEW `econ250-2026.student__yourname.week3_submission` AS
-- Your monthly metrics summary query here
Replace yourname with your actual dataset suffix.
Verify it works
SELECT *
FROM `econ250-2026.student__yourname.week3_submission`
ORDER BY month
Check: 12 rows, all columns populated, numbers look reasonable.
Going Further (optional)
- New vs returning customers: For each month, how many customers are placing their first-ever order vs. coming back? (Hint: compare each order’s `created_at` to the customer’s `MIN(created_at)` using a CTE. This is awkward with pure aggregation — window functions in Week 4 will make it much simpler.)
- Revenue per customer: Divide monthly revenue by monthly unique customers. How does this differ from AOV?
- Month-over-month growth: Using a self-join on a monthly CTE (join each month to the previous month via `DATE_ADD(..., INTERVAL 1 MONTH)`), calculate the percentage change in revenue month to month. The first month will have NULL growth. Next week’s `LAG` function simplifies this.
Troubleshooting
| Problem | Solution |
|---|---|
| Query scans too much data | Add a date filter: WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01' |
| Revenue totals don’t match after JOIN | Check for fan-out — compare COUNT(*) before and after the JOIN |
| Return rate is 100% | Check your denominator — COUNT(returned_at) only counts non-NULL values |
| AVG returns unexpected value | Remember: AVG skips NULLs. Use COALESCE(value, 0) if NULLs should count as zeros |
| Shipping time is negative | Use the order-level CTE from Bug 4: MIN(created_at) per order eliminates negative values |
| Monthly uniques sum to more than yearly | Correct! That’s the non-additivity property of distinct counts |
| DATE_TRUNC error | Convert TIMESTAMP to DATE first: DATE_TRUNC(DATE(created_at), MONTH) |
| View creation fails | Check that you replaced yourname with your actual dataset suffix |