Week 3 Practice: Business Metrics

ECON250: Big Data Analytics

Published: February 18, 2026
Modified: March 21, 2026

Your Submission

Deadline: 3 days after the practice sessions

Points: 2 points for satisfactory completion

Required Submission Items

Create the following two views in your personal dataset (econ250-2026.student__yourname):


View 1: week3_monthly_trend

What to create: Monthly order and revenue summary for 2025, one row per month.

Required columns: month, orders, revenue, returned_items

Where to find it: Practice 1, Part 4.


View 2: week3_submission

What to create: A complete monthly business summary for 2025, one row per month.

Required columns:

  • month — first day of each month (DATE)
  • total_revenue — total sale_price, rounded to 2 decimals
  • total_orders — count of distinct orders
  • aov — average order value (revenue / orders), rounded to 2 decimals
  • unique_customers — count of distinct customers
  • return_rate — returned items / total items as a percentage, rounded to 2 decimals
  • avg_ship_days — cleaned order-level shipping time: use MIN(created_at) per order as the order date, TIMESTAMP_DIFF in seconds / 86400.0, rounded to 1 decimal

Where to find it: Practice 2, Synthesis section.

Quality check: View must run without errors, contain 12 rows (Jan–Dec 2025), all avg_ship_days values positive, and use correct NULL handling in return rate calculations.


How to Submit

For each view:

  1. In BigQuery Explorer, find the view in your student__yourname dataset
  2. Click the three dots menu (⋮) next to the view name
  3. Select “Copy link”
  4. Paste both links in Moodle (one per line)

Before You Start

Required: BigQuery console open at console.cloud.google.com/bigquery?project=econ250-2026

Dataset: bigquery-public-data.thelook_ecommerce — a synthetic e-commerce dataset with orders, products, and users. The data is continuously generated, so exact row counts will vary slightly.

Analysis period: All queries should filter to calendar year 2025:

WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'

Tables you’ll use

  • order_items — key columns: order_id, user_id, product_id, sale_price, status, created_at, shipped_at, returned_at. Main fact table: one row per item purchased.
  • products — key columns: id, category, brand, department, retail_price. Product catalog; JOIN on products.id = order_items.product_id.
  • users — key columns: id, first_name, last_name, country, created_at. Customer data; JOIN on users.id = order_items.user_id.
  • inventory_items — key columns: id, product_id, cost, sold_at. Inventory tracking; multiple rows per product (used in the Bug Hunt).
Note: NULL columns in order_items

Some columns are NULL by design, not because of data quality issues:

  • shipped_at — NULL if the item hasn’t shipped yet
  • delivered_at — NULL if not yet delivered
  • returned_at — NULL if the item was not returned

These NULLs carry business meaning. Don’t filter them out — you’ll need them for metric calculations.
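To see how common these meaningful NULLs are, you can profile the three columns directly (a quick sketch using COUNTIF, which the callout above is about):

SELECT
  COUNTIF(shipped_at IS NULL) AS not_yet_shipped,
  COUNTIF(delivered_at IS NULL) AS not_yet_delivered,
  COUNTIF(returned_at IS NULL) AS not_returned,
  COUNT(*) AS total_items
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'

Each count here is a business fact (items in flight, items kept), not missing data.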

What you’ll practice

  • Additive metrics: SUM, COUNT, COUNTIF, DATE_TRUNC grouping
  • Distinct counts: COUNT(DISTINCT) and non-additivity
  • Ratios: return rates with different denominators
  • Averages: AVG with NULL awareness
  • Nested aggregation via CTEs (aggregate, then aggregate again)
  • DATE_DIFF for time-between calculations
  • Validating JOINs by comparing totals before and after

Practice 1: Building Business Metrics

You’ll calculate business metrics from the thelook e-commerce dataset and run into some common pitfalls.

Part 1: Get to know the data

Before calculating anything, look at the data. In the BigQuery Explorer panel, navigate to bigquery-public-data › thelook_ecommerce › order_items, click the table name, then open the Preview tab. This shows a sample of rows at no cost.

What you should notice:

  • Each row is one item in an order (not one order)
  • order_id repeats when an order has multiple items
  • sale_price is the price for that single item
  • shipped_at and returned_at are NULL for many rows — that’s expected

Also check the Schema tab while you’re there — you’ll see all available columns. Do the same for the products table: you’ll need the category column from there. And check users — one row per customer, with id, created_at, and a few profile columns. You’ll use this table later in Practice 2.

Now look at inventory_items. The Preview tab will show you multiple rows with the same product_id. Confirm this with a quick query:

SELECT
  product_id,
  COUNT(*) AS inventory_rows
FROM `bigquery-public-data.thelook_ecommerce.inventory_items`
GROUP BY product_id
ORDER BY inventory_rows DESC
LIMIT 5

A single product can appear dozens of times — one row per physical unit in stock. Keep this in mind: joining order_items to inventory_items on product_id will multiply rows. You’ll see exactly what this does to a metric in Practice 2, Bug 1.
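You can see the multiplication directly by comparing row counts with and without the join (a sketch; exact counts will vary as the dataset regenerates):

SELECT
  (SELECT COUNT(*)
   FROM `bigquery-public-data.thelook_ecommerce.order_items`
   WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01') AS items_alone,
  (SELECT COUNT(*)
   FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
   JOIN `bigquery-public-data.thelook_ecommerce.inventory_items` ii
     ON ii.product_id = oi.product_id
   WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01') AS items_joined

The second number is far larger: every item row is duplicated once per matching inventory row.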

Data investigation: do all items in an order share the same timestamp?

You noticed that order_id repeats when an order has multiple items. But do those items all have the same created_at? Check:

SELECT
  COUNT(DISTINCT order_id) AS orders_with_date_spread
FROM (
  SELECT order_id
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  GROUP BY order_id
  HAVING COUNT(DISTINCT DATE(created_at)) > 1
)

Over 25% of orders have items on different calendar days. Look at a few of these orders to see the pattern:

SELECT order_id, id AS item_id, created_at, shipped_at, status, sale_price
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE order_id IN (
  SELECT order_id
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  GROUP BY order_id
  HAVING COUNT(DISTINCT DATE(created_at)) > 1
  LIMIT 3
)
ORDER BY order_id, created_at

What you should notice:

  • Items within the same order are spaced roughly 1 day apart in created_at
  • All items in the order share the exact same shipped_at timestamp
  • Some items appear to have been shipped before they were created — shipped_at is earlier than created_at
Important: This is a data generation artifact

The thelook dataset is synthetic. The generator creates items within an order one at a time, roughly one per simulated day, but assigns a single shared shipped_at to the entire order. This means created_at in order_items is not really an “order date” — it’s a per-item synthetic timestamp.

For most metrics (revenue, order counts, return rates), this doesn’t matter — each item’s sale_price and returned_at are correct regardless of timing. But for shipping time calculations, comparing shipped_at to an item’s created_at will produce negative values for about 19% of rows. We’ll fix this in Practice 2.

Always investigate your data before computing metrics. Timestamps that look reasonable at first may have structural issues.

For Parts 2–7 below, work directly with order_items as-is — no cleaning needed. The timestamp issue only affects shipping time, which we’ll address in Part 8 and fix properly in Practice 2.

Part 2: Revenue and order counts

Write a single query that returns, for 2025:

  • Total number of items sold (COUNT(*))
  • Total number of distinct orders (COUNT(DISTINCT order_id))
  • Total revenue (ROUND(SUM(sale_price), 2))
  • Number of returned items (COUNTIF(returned_at IS NOT NULL))
  • Revenue from returned items (ROUND(SUM(CASE WHEN returned_at IS NOT NULL THEN sale_price ELSE 0 END), 2))

What you should be able to answer: How many items were sold in 2025? How many distinct orders does that represent? What fraction of items were returned?

Tip: Write Down Your Total Revenue

You’ll use this number later to validate that JOINs don’t inflate your totals.
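If you want to check your assembly, the five bulleted expressions combine into a single query like this (a sketch using exactly the expressions listed above):

SELECT
  COUNT(*) AS items_sold,
  COUNT(DISTINCT order_id) AS total_orders,
  ROUND(SUM(sale_price), 2) AS total_revenue,
  COUNTIF(returned_at IS NOT NULL) AS returned_items,
  ROUND(SUM(CASE WHEN returned_at IS NOT NULL THEN sale_price ELSE 0 END), 2) AS returned_revenue
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'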

Part 3: Revenue by product category

Join order_items to products to get revenue broken down by category:

SELECT
  p.category,
  ROUND(SUM(oi.sale_price), 2) AS revenue
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
JOIN `bigquery-public-data.thelook_ecommerce.products` p
  ON p.id = oi.product_id
WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
GROUP BY p.category
ORDER BY revenue DESC

Validation check: Sum the per-category revenues (mentally or with a CTE). Does the total match your Part 2 revenue? If yes, the JOIN is clean — no fan-out. If not, something went wrong.

Tip: Why Validate After a JOIN?

Every JOIN can potentially duplicate rows if the relationship is one-to-many. Comparing your total before and after the JOIN is a quick check. You’ll see what fan-out looks like in Practice 2’s Bug Hunt.
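One way to make the before/after comparison concrete is to compute both totals in a single query (a sketch):

SELECT
  (SELECT ROUND(SUM(sale_price), 2)
   FROM `bigquery-public-data.thelook_ecommerce.order_items`
   WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01') AS revenue_before_join,
  (SELECT ROUND(SUM(oi.sale_price), 2)
   FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
   JOIN `bigquery-public-data.thelook_ecommerce.products` p
     ON p.id = oi.product_id
   WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01') AS revenue_after_join

If the two numbers match, the JOIN did not fan out any rows.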

Part 4: Monthly revenue trend

Group order_items by month to see how revenue and returns change across 2025. Use:

  • DATE_TRUNC(DATE(created_at), MONTH) to group by month — this converts the timestamp to a date and truncates it to the first day of the month
  • Three aggregations: COUNT(DISTINCT order_id) AS orders, ROUND(SUM(sale_price), 2) AS revenue, COUNTIF(returned_at IS NOT NULL) AS returned_items
  • Filter to 2025, GROUP BY month, ORDER BY month

What you should be able to answer: Is there a seasonal pattern? Which month had the highest revenue? Are returns fairly constant across months or do they spike?
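Putting the three bullets together, the query skeleton looks like this (all expressions come from the list above):

SELECT
  DATE_TRUNC(DATE(created_at), MONTH) AS month,
  COUNT(DISTINCT order_id) AS orders,
  ROUND(SUM(sale_price), 2) AS revenue,
  COUNTIF(returned_at IS NOT NULL) AS returned_items
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
GROUP BY month
ORDER BY month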

Note: Monthly order counts won’t sum to the yearly total

If you sum the monthly orders column across all 12 rows and compare to the yearly COUNT(DISTINCT order_id) from Part 2, the monthly sum is slightly higher (~738 orders, or 1.7%). This is because items within the same order can have created_at values in different calendar months (the timestamp issue from Part 1). An order with items in both January and February gets counted as a distinct order in both months.

Revenue is not affected — each item has one created_at and its sale_price lands in exactly one month. But any COUNT(DISTINCT ...) grouped by a time period can exhibit this non-additivity. You’ll see the same pattern with unique customers in Part 5.

Important: Required — Save This View
CREATE OR REPLACE VIEW `econ250-2026.student__yourname.week3_monthly_trend` AS
-- Your monthly query

This is submission item 1 of 2.

Part 5: Unique customers — the non-additivity test

Step 1: Write a query that returns COUNT(DISTINCT user_id) per month for 2025, one row per month ordered by month.

Step 2: Now get two numbers and write them both down:

  1. The sum of your monthly unique counts — wrap your Step 1 query in a CTE and SUM the result, or just add the monthly values by hand.
  2. The yearly unique customer count — a single COUNT(DISTINCT user_id) for all of 2025, no GROUP BY.

The two numbers should not match. Can you explain why? Think through what happens to a customer who places orders in both January and March.
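The CTE version of Step 2’s first number might look like this (a sketch; the inner query is Step 1):

WITH monthly_uniques AS (
  SELECT
    DATE_TRUNC(DATE(created_at), MONTH) AS month,
    COUNT(DISTINCT user_id) AS unique_customers
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  GROUP BY month
)
SELECT SUM(unique_customers) AS sum_of_monthly_uniques
FROM monthly_uniques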

Important: This Is Not a Bug

The mismatch between summed monthly uniques and yearly uniques is a property of distinct counts. A customer who orders in January and March gets counted in both months but only once for the year. You cannot roll up distinct counts by summing — always compute them at the grain you need.

The same applies to COUNT(DISTINCT order_id) from Part 4 — orders whose items span month boundaries get counted in both months. Any COUNT(DISTINCT ...) is non-additive when the entity (customer, order) can appear in multiple groups.

Part 6: Orders per customer

How many orders does a typical customer place? This requires nested aggregation — aggregate once to get orders per customer, then aggregate again to get the distribution.

Build it in two steps with a CTE:

WITH customer_orders AS (
  -- Step 1: one row per customer, with their order count
  SELECT
    user_id,
    ??? AS order_count
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  GROUP BY user_id
)
-- Step 2: how many customers placed 1 order? 2 orders? etc.
SELECT
  ???,
  COUNT(*) AS num_customers
FROM customer_orders
GROUP BY ???
ORDER BY ???

What you should be able to answer: What percentage of customers placed exactly one order in 2025? Is this a typical e-commerce distribution?

Note: A SQL Limitation

Notice that getting this distribution requires two separate GROUP BY passes — you can’t compute it in a single aggregation. This is one of the places where SQL’s row-at-a-time model shows its limits. Window functions (next week) make this simpler.

Part 7: Return rate — two definitions

Calculate the return rate two ways:

Approach A — item-level: What fraction of items were returned? Use COUNTIF(returned_at IS NOT NULL) as the numerator and COUNT(*) as the denominator. Multiply by 100.0 and round to 2 decimals.

Approach B — order-level: What fraction of orders had at least one return? This requires collapsing from item-level to order-level first. Use a CTE:

WITH order_returns AS (
  SELECT
    order_id,
    MAX(???) AS has_return  -- 1 if any item in the order was returned, 0 otherwise
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  GROUP BY order_id
)
SELECT
  ROUND(??? * 100.0 / COUNT(*), 2) AS return_rate_orders
FROM order_returns

What you should find: The two numbers may be very close in this dataset because most thelook orders contain only 1-2 items, so the item-level and order-level denominators are similar. The conceptual difference still matters: an order with 3 items where 1 is returned is 33% at item level but 100% at order level.

If a manager asks “what’s our return rate?” — which definition would you use? Under what circumstances would you choose the other?

Tip: HAVING vs WHERE

If you wanted to filter to only categories with a return rate above 10%, you couldn’t use WHERE — the rate doesn’t exist until after GROUP BY runs. That’s what HAVING is for: it filters groups after aggregation. WHERE filters rows before aggregation, HAVING filters groups after.
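For example, a per-category return-rate query that keeps only high-return categories could look like this (a sketch; the 10% threshold is the one from the tip above):

SELECT
  p.category,
  ROUND(COUNTIF(oi.returned_at IS NOT NULL) * 100.0 / COUNT(*), 2) AS return_rate
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
JOIN `bigquery-public-data.thelook_ecommerce.products` p
  ON p.id = oi.product_id
WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
GROUP BY p.category
HAVING return_rate > 10
ORDER BY return_rate DESC

Note that BigQuery lets HAVING reference the SELECT alias return_rate, which only exists after aggregation.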

Part 8: Average shipping time

Now we return to the timestamp issue from Part 1. Calculate the naive average number of days between order creation and shipment, so you can compare it to the cleaned version in Practice 2:

SELECT
  ROUND(AVG(DATE_DIFF(shipped_at, created_at, DAY)), 1) AS avg_ship_days
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'

Think about this before running: Which items will AVG include? Which will it silently exclude? Check by running COUNTIF(shipped_at IS NULL) alongside the average. About a third of items haven’t shipped — AVG silently excludes them because DATE_DIFF returns NULL when shipped_at is NULL.

The result will also look suspiciously low. Remember the timestamp issue you found in Part 1 — 19% of items have shipped_at before created_at, producing negative values that drag the average down. We’ll fix this properly in Practice 2.

Precision: DAY vs SECOND

There’s a second problem: DATE_DIFF(..., DAY) truncates each row to a whole number of days before averaging. A 1.8-day shipment becomes 1, a 0.3-day shipment becomes 0. A more precise approach — compute in seconds, then convert after averaging:

ROUND(AVG(TIMESTAMP_DIFF(shipped_at, created_at, SECOND) / 86400.0), 1)

Compare the two results. For the submission, use TIMESTAMP_DIFF(..., SECOND) / 86400.0 — it preserves sub-day precision.


Practice 2: The Metric Bug Hunt

Each query below has an error that produces wrong results. The queries run without syntax errors — the bugs are in the logic.

For each bug:

  1. Read the stated goal
  2. Run the query
  3. Figure out what’s wrong and why
  4. Write the corrected version

Bug 1: Revenue by category

Goal: Calculate total revenue by product category for 2025.

SELECT
  p.category,
  ROUND(SUM(oi.sale_price), 2) AS revenue
FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
JOIN `bigquery-public-data.thelook_ecommerce.inventory_items` ii
  ON oi.product_id = ii.product_id
JOIN `bigquery-public-data.thelook_ecommerce.products` p
  ON p.id = oi.product_id
WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
GROUP BY p.category
ORDER BY revenue DESC

Clue: Sum up the per-category revenues from this query and compare with the total revenue you calculated in Practice 1, Part 2. Are they the same?

Task: Explain what’s wrong and write the corrected query.

Bug 2: Return rate

Goal: Calculate the item-level return rate for 2025.

SELECT
  ROUND(COUNT(returned_at) * 100.0 / COUNT(returned_at), 2) AS return_rate
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'

Clue: The result is 100%. That can’t be right.

Task: What’s wrong with the denominator? Write the corrected query.

Bug 3: A defensive coding pattern

Goal: Count items that are NOT in “Complete” status for 2025.

This bug may not produce wrong results in this particular dataset — but the pattern will silently drop rows in production data where columns are nullable.

SELECT COUNT(*) AS not_complete
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE status != 'Complete'
  AND created_at >= '2025-01-01' AND created_at < '2026-01-01'

Investigation: First, check whether the status column has any NULL values:

SELECT
  COUNTIF(status IS NULL) AS null_status,
  COUNT(*) AS total
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'

If there are NULLs, the main query silently undercounts — because NULL != 'Complete' evaluates to NULL (not TRUE), so those rows are dropped.

Task: Write a version that correctly includes items with NULL status. Use either OR status IS NULL or IS DISTINCT FROM.
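For reference, the IS DISTINCT FROM form looks like this (unlike !=, it treats NULL as a comparable value, so NULL-status rows are kept):

SELECT COUNT(*) AS not_complete
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE status IS DISTINCT FROM 'Complete'
  AND created_at >= '2025-01-01' AND created_at < '2026-01-01'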

Note: Even If This Dataset Has No NULLs…

If status happens to have no NULLs in thelook, the bug is “dormant” — it won’t cause wrong results here, but the same pattern will silently drop rows in any dataset where the column is nullable. The fix takes one extra line and avoids silent data loss in other datasets.

Bug 4: Fixing shipping time with data cleaning

Goal: Calculate an accurate average shipping time for 2025.

In Part 1 you discovered that items within the same order have different created_at timestamps, and some items appear to ship before they were “created.” The naive per-item query produces misleading results:

SELECT
  ROUND(AVG(TIMESTAMP_DIFF(shipped_at, created_at, SECOND) / 86400.0), 1) AS avg_ship_days_naive
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  AND shipped_at IS NOT NULL

Run this and note the result. It’s low because ~19% of rows contribute negative values.

The fix: Since shipped_at is an order-level timestamp (shared by all items), compare it to the order’s earliest created_at instead of each item’s individual timestamp. Build a CTE that computes one shipping time per order:

WITH order_ship AS (
  SELECT
    order_id,
    MIN(created_at) AS order_date,
    MIN(shipped_at) AS shipped_at  -- same across all items, MIN just extracts it
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
    AND shipped_at IS NOT NULL
  GROUP BY order_id
)
SELECT
  ROUND(AVG(TIMESTAMP_DIFF(shipped_at, order_date, SECOND) / 86400.0), 1) AS avg_ship_days_clean
FROM order_ship

Compare the two results. The cleaned version is higher — and always non-negative, because the earliest item in an order is always created before (or at) shipment.

What changed:

  • MIN(created_at) per order eliminates the negative values
  • TIMESTAMP_DIFF(..., SECOND) / 86400.0 preserves sub-day precision (vs DATE_DIFF(..., DAY) which truncates)
  • Grouping by order_id means each order contributes one shipping time, not one per item
Note: Why shipped_at IS NOT NULL explicitly?

AVG skips NULLs automatically, so the filter is technically redundant for the average calculation. But including it makes the metric definition explicit — this is the average for shipped orders only. When someone reads your query six months later, they immediately know unshipped orders are excluded by design, not by accident.

This pattern comes up often: investigate the data, aggregate to the right grain, fix known issues, then compute your metric. You’ll use this same CTE in the Synthesis section.

Bug 5: Company-wide average order value

Goal: Calculate the company-wide average order value (AOV) by averaging the per-category AOVs.

WITH category_aov AS (
  SELECT
    p.category,
    ROUND(SUM(oi.sale_price) / COUNT(DISTINCT oi.order_id), 2) AS aov
  FROM `bigquery-public-data.thelook_ecommerce.order_items` oi
  JOIN `bigquery-public-data.thelook_ecommerce.products` p
    ON p.id = oi.product_id
  WHERE oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
  GROUP BY p.category
)
SELECT ROUND(AVG(aov), 2) AS avg_of_category_aovs
FROM category_aov

Clue: Compare this result with the overall AOV you can compute directly:

SELECT ROUND(SUM(sale_price) / COUNT(DISTINCT order_id), 2) AS overall_aov
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'

The two numbers are noticeably different. Which is the correct “overall AOV”?

Task: Explain why averaging the per-category AOVs gives a different result from computing AOV directly. (Hint: categories like “Socks & Hosiery” have low AOV and “Outerwear & Coats” have high AOV — but they don’t have the same number of orders. AVG gives each category equal weight regardless of size.)

Bug 6: Average revenue per customer

Goal: Calculate the average revenue per customer for all users who registered in 2025 — including customers who registered but placed no orders, who should contribute $0 to the average.

Before looking at the query, open the users table in BigQuery Explorer and check the Preview tab. Note that created_at here is the registration date, not an order date — a user can exist in users without having any rows in order_items.

WITH customer_spend AS (
  SELECT
    u.id,
    SUM(oi.sale_price) AS total_spend
  FROM `bigquery-public-data.thelook_ecommerce.users` u
  JOIN `bigquery-public-data.thelook_ecommerce.order_items` oi
    ON oi.user_id = u.id
  WHERE u.created_at >= '2025-01-01' AND u.created_at < '2026-01-01'
  GROUP BY u.id
)
SELECT ROUND(AVG(total_spend), 2) AS avg_revenue_per_customer
FROM customer_spend

Clue: Check how many users appear in customer_spend vs. how many registered in 2025:

SELECT COUNT(*) AS registered_in_2025
FROM `bigquery-public-data.thelook_ecommerce.users`
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'

If fewer users appear in customer_spend than registered in 2025, the INNER JOIN is silently excluding users who placed no orders — and the average is computed over a smaller, higher-spending population.

Task: Fix this with a LEFT JOIN. Use COALESCE(SUM(oi.sale_price), 0) to replace NULL spend (no orders) with zero.

Warning: Watch out when adding filters to LEFT-JOINed tables

The original query only filters on u.created_at in WHERE — there’s no filter on order_items columns. So the LEFT JOIN fix is straightforward: change JOIN to LEFT JOIN and add COALESCE.

Note that the fixed query includes all orders ever placed by 2025-registered users, not just their 2025 orders. If you wanted to restrict to 2025 orders only, you’d need to add that date filter in the ON clause, not in WHERE:

FROM `bigquery-public-data.thelook_ecommerce.users` u
LEFT JOIN `bigquery-public-data.thelook_ecommerce.order_items` oi
  ON oi.user_id = u.id
  AND oi.created_at >= '2025-01-01' AND oi.created_at < '2026-01-01'
WHERE u.created_at >= '2025-01-01' AND u.created_at < '2026-01-01'

Why not WHERE? A WHERE condition on the right table’s columns silently converts a LEFT JOIN back into an inner join — unmatched rows have NULL for those columns, which fails the WHERE condition and drops them.

NULL-handling reference: Three functions for working with NULLs in aggregations:

  • COALESCE(x, fallback) — returns the first non-NULL value, e.g. COALESCE(SUM(sale_price), 0)
  • IFNULL(x, fallback) — two-argument shorthand, e.g. IFNULL(SUM(sale_price), 0)
  • NULLIF(x, value) — returns NULL if x equals value, e.g. NULLIF(total_spend, 0)

COALESCE and IFNULL are interchangeable here. NULLIF goes the other direction — useful when you want to convert a sentinel value (like 0) back to NULL so that AVG excludes it rather than dragging the average down.
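A sketch of that other direction, assuming the customer_spend CTE from Bug 6 is in scope (hypothetical fragment, not a required query):

SELECT
  ROUND(AVG(NULLIF(total_spend, 0)), 2) AS avg_spend_of_actual_buyers  -- zeros become NULL, so AVG skips them
FROM customer_spend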


Synthesis: Monthly Metrics Summary

Combine everything from today’s sessions into a single query that produces a monthly business summary for 2025. Most metrics come from a GROUP BY month on order_items, but avg_ship_days needs the order-level cleaning pattern you built in Bug 4.

Your query should return one row per month with these columns:

  • month — first day of month (DATE)
  • total_revenue — total sale_price, rounded to 2 decimals
  • total_orders — count of distinct orders
  • aov — revenue divided by orders, rounded to 2 decimals
  • unique_customers — count of distinct customers
  • return_rate — returned items / total items as a percentage, rounded to 2 decimals
  • avg_ship_days — cleaned: order-level shipping time using MIN(created_at) as order date, TIMESTAMP_DIFF in seconds / 86400.0, rounded to 1 decimal

Structure: 3 CTEs + JOIN

Revenue, orders, and return rate are item-level metrics — they work fine with a GROUP BY month directly on order_items. But shipping time needs to be computed at the order level first (Bug 4 fix), then averaged per month. These are two different grains, so compute them separately and join:

WITH order_shipping AS (
  -- One row per shipped order: cleaned shipping time
  SELECT
    order_id,
    DATE_TRUNC(DATE(MIN(created_at)), MONTH) AS month,  -- assigns order to month of its earliest item
    TIMESTAMP_DIFF(MIN(shipped_at), MIN(created_at), SECOND) / 86400.0 AS ship_days
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
    AND shipped_at IS NOT NULL
  GROUP BY order_id
),
monthly_shipping AS (
  -- Average shipping time per month (order-weighted)
  SELECT month, ROUND(AVG(ship_days), 1) AS avg_ship_days
  FROM order_shipping
  GROUP BY month
),
monthly_metrics AS (
  -- All other metrics at item level
  SELECT
    DATE_TRUNC(DATE(created_at), MONTH) AS month,
    ...
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  GROUP BY month
)
SELECT
  m.month,
  m.total_revenue,
  m.total_orders,
  m.aov,
  m.unique_customers,
  m.return_rate,
  s.avg_ship_days
FROM monthly_metrics m
LEFT JOIN monthly_shipping s ON s.month = m.month
ORDER BY m.month

Fill in the ... in monthly_metrics with the column expressions you built in Parts 2–7.

Requirements:

  • Filter to 2025 in both monthly_metrics and order_shipping
  • Return rate denominator must be COUNT(*), not COUNT(returned_at)
  • Use LEFT JOIN for monthly_shipping (defensive — ensures no month is dropped even if it has no shipped orders)
Note: Different grains, slightly different month assignments

The order_shipping CTE assigns each order to the month of its earliest item (MIN(created_at)), while monthly_metrics assigns each item to its own created_at month. For orders whose items span a month boundary (~1.7% of shipped orders), shipping time and revenue land in different months. The effect is small enough to ignore here, but it’s worth recognizing: whenever you join metrics computed at different grains, check whether the grouping keys align.

Checkpoint: Your query should return exactly 12 rows. Verify that:

  • Revenue numbers roughly match Part 2
  • AOV is reasonable (tens of dollars per order)
  • Return rate is fairly consistent across months
  • Unique customers per month sum to MORE than yearly uniques (the non-additivity you observed in Practice 1, Part 5)
  • All avg_ship_days values are positive (a few days)

Save as Submission View

Important: Required Submission

This creates your submission item: the week3_submission view. See the Your Submission section at the top for full requirements.

CREATE OR REPLACE VIEW `econ250-2026.student__yourname.week3_submission` AS
-- Your monthly metrics summary query here

Replace yourname with your actual dataset suffix.

Verify it works

SELECT *
FROM `econ250-2026.student__yourname.week3_submission`
ORDER BY month

Check: 12 rows, all columns populated, numbers look reasonable.


Going Further (optional)

  • New vs returning customers: For each month, how many customers are placing their first-ever order vs. coming back? (Hint: compare each order’s created_at to the customer’s MIN(created_at) using a CTE. This is awkward with pure aggregation — window functions in Week 4 will make it much simpler.)
  • Revenue per customer: Divide monthly revenue by monthly unique customers. How does this differ from AOV?
  • Month-over-month growth: Using a self-join on a monthly CTE (join each month to the previous month via DATE_ADD(..., INTERVAL 1 MONTH)), calculate the percentage change in revenue month to month. The first month will have NULL growth. Next week’s LAG function simplifies this.
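A skeleton for the self-join approach in the last bullet (a sketch; column names are illustrative):

WITH monthly AS (
  SELECT
    DATE_TRUNC(DATE(created_at), MONTH) AS month,
    SUM(sale_price) AS revenue
  FROM `bigquery-public-data.thelook_ecommerce.order_items`
  WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  GROUP BY month
)
SELECT
  cur.month,
  ROUND((cur.revenue - prev.revenue) * 100.0 / prev.revenue, 2) AS revenue_growth_pct
FROM monthly cur
LEFT JOIN monthly prev
  ON cur.month = DATE_ADD(prev.month, INTERVAL 1 MONTH)
ORDER BY cur.month

The LEFT JOIN keeps January with a NULL growth value, as described above.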

Troubleshooting

  • Query scans too much data — add a date filter: WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'
  • Revenue totals don’t match after a JOIN — check for fan-out: compare COUNT(*) before and after the JOIN
  • Return rate is 100% — check your denominator: COUNT(returned_at) only counts non-NULL values
  • AVG returns an unexpected value — remember that AVG skips NULLs; use COALESCE(value, 0) if NULLs should count as zeros
  • Shipping time is negative — use the order-level CTE from Bug 4: MIN(created_at) per order eliminates negative values
  • Monthly uniques sum to more than yearly — correct! That’s the non-additivity property of distinct counts
  • DATE_TRUNC error — convert TIMESTAMP to DATE first: DATE_TRUNC(DATE(created_at), MONTH)
  • View creation fails — check that you replaced yourname with your actual dataset suffix