Week 1 Practice: Getting Started with BigQuery
ECON250: Big Data Analytics
Your Submission
Deadline: 3 days after practice session (2 points for satisfactory completion)
Create the following objects in your personal dataset (student__yourname):
| Object | Name | Requirements |
|---|---|---|
| Table (upload) | weather_australia | Upload the weatherAUS.csv file from Moodle Week 1 materials. |
| View | week1_regional_summary | Superstore data grouped by Region and Category. Include: order count, total sales, total profit, profit margin %. Order by total sales descending. |
| Table (from query) | week1_weather_summary | Aggregated summary of your uploaded weather data, grouped by Location. Include: observation count, average max temperature, average rainfall, count of rainy days. Order by average rainfall descending. |
| External table | week1_live_sheet | Connected to your personal copy of the provided Google Sheet. External tables read directly from the source rather than storing a copy. |
The view and summary table require you to write aggregation queries yourself — use the examples in this guide as reference, but the submission queries are not provided. The external table just needs to exist and be queryable.
To submit: Copy the links to week1_regional_summary, week1_weather_summary, and week1_live_sheet from BigQuery Explorer (click ⋮ → “Copy link”) and paste them in Moodle.
The rest of this guide helps you build the skills needed to complete this submission.
Before You Start
Required: KSE Google account (yourname@kse.org.ua) with access to project econ250-2026
Open BigQuery: console.cloud.google.com/bigquery?project=econ250-2026
In the Explorer panel (left side), you should see:
- course_data — shared course materials (read-only)
- student__yourname — your personal workspace (you can create tables/views here)
Cost Awareness
Every query shows “bytes processed” in the job information — this determines cost. A query scanning 10 GB costs the same whether it runs for 1 second or 10 seconds.
Before running any query, check the estimate in the top-right corner of the editor. After running, look at the “Job information” panel below your results to see actual bytes processed. Project and individual quotas exist to protect against runaway costs, but be mindful of how much data your queries scan.
SELECT * scans every column, which is expensive and usually unnecessary. Always select only the columns you need. This is the single most important habit for cost-effective BigQuery usage.
-- Expensive: scans all columns
SELECT * FROM `bigquery-public-data.thelook_ecommerce.order_items`;
-- Better: scans only what you need
SELECT order_id, product_id, sale_price
FROM `bigquery-public-data.thelook_ecommerce.order_items`;

Quick Start: Your First Queries
Verify your setup with this simple query:
SELECT
'Hello, BigQuery!' AS greeting,
CURRENT_TIMESTAMP() AS query_time,
SESSION_USER() AS your_email;

This query scans 0 bytes — it doesn’t touch any tables.
Explore the Superstore Dataset
The Superstore dataset (econ250-2026.course_data.superstore) is small (~10K rows, ~2.5 MB) and safe for experimentation. Any query on this table costs essentially nothing.
SELECT
`Order ID`,
`Customer Name`,
`Product Name`,
Sales,
Profit
FROM `econ250-2026.course_data.superstore`
LIMIT 10;

Note the backticks around column names with spaces. BigQuery requires backticks for identifiers containing special characters.
Understanding Columnar Storage
BigQuery stores data by column, not by row. Selecting fewer columns means scanning less data, which means lower cost.
Run these two queries and compare the bytes processed (shown in job information after each query runs):
Query A — all columns:
SELECT *
FROM `econ250-2026.course_data.superstore`;

Query B — specific columns:
SELECT `Order ID`, Sales, Profit
FROM `econ250-2026.course_data.superstore`;

Query B scans significantly less data, even though both return the same number of rows.
LIMIT Doesn’t Reduce Cost
A common misconception: adding LIMIT 10 does not reduce bytes scanned. BigQuery still reads the referenced columns in full — LIMIT only restricts how many rows are returned to you.
-- This still scans the entire table
SELECT *
FROM `bigquery-public-data.thelook_ecommerce.order_items`
LIMIT 10;

Check the bytes processed — it’s the same as without LIMIT.
Analytical Queries
Sales by Category
SELECT
Category,
COUNT(*) AS order_count,
ROUND(SUM(Sales), 2) AS total_sales,
ROUND(AVG(Sales), 2) AS avg_sale
FROM `econ250-2026.course_data.superstore`
GROUP BY Category
ORDER BY total_sales DESC;

Profitability Analysis
SELECT
`Sub-Category`,
COUNT(*) AS order_count,
ROUND(SUM(Sales), 2) AS total_sales,
ROUND(SUM(Profit), 2) AS total_profit,
ROUND(SUM(Profit) / SUM(Sales) * 100, 1) AS profit_margin_pct
FROM `econ250-2026.course_data.superstore`
GROUP BY `Sub-Category`
ORDER BY total_profit DESC;

Which sub-category has the worst profit margin? Why might that be?
Monthly Trends
SELECT
FORMAT_DATE('%Y-%m', `Order Date`) AS month,
COUNT(*) AS order_count,
ROUND(SUM(Sales), 2) AS total_sales
FROM `econ250-2026.course_data.superstore`
GROUP BY month
ORDER BY month;

Creating Tables and Views
You have full control over your personal dataset (student__yourname). Let’s use it.
Tables vs Views
| Aspect | Table | View |
|---|---|---|
| Stores data? | Yes | No (stores query definition) |
| Updates automatically? | No (snapshot at creation) | Yes (runs query each time) |
| Storage cost? | Yes | No |
| When to use | Repeated analysis, performance matters | Source data changes, always need current results |
Create a Table
-- Replace 'yourname' with your actual dataset suffix
CREATE OR REPLACE TABLE `econ250-2026.student__yourname.category_summary` AS
SELECT
Category,
`Sub-Category`,
COUNT(*) AS order_count,
ROUND(SUM(Sales), 2) AS total_sales,
ROUND(SUM(Profit), 2) AS total_profit
FROM `econ250-2026.course_data.superstore`
GROUP BY Category, `Sub-Category`;

Refresh your dataset in Explorer to see the new table.
Create a View
CREATE OR REPLACE VIEW `econ250-2026.student__yourname.high_value_orders` AS
SELECT
`Order ID`,
`Order Date`,
`Customer Name`,
Sales,
Profit
FROM `econ250-2026.course_data.superstore`
WHERE Sales > 500
ORDER BY Sales DESC;

Query it like a table:
SELECT * FROM `econ250-2026.student__yourname.high_value_orders`;

Working with Your Own Data
BigQuery can ingest data from multiple sources. Let’s explore three methods.
Upload a CSV File
Download the weatherAUS.csv file from the Week 1 materials in Moodle. This dataset contains Australian weather observations.
Steps:
- In Explorer, find your personal dataset (student__yourname)
- Click the three dots (⋮) → Create table
- For “Create table from”, select Upload
- Browse and select the CSV file
- Table name: weather_australia
- Check Auto detect for schema
- Click Create table
Query your uploaded data:
SELECT
Location,
COUNT(*) AS observations,
ROUND(AVG(MaxTemp), 1) AS avg_max_temp,
ROUND(AVG(Rainfall), 2) AS avg_rainfall
FROM `econ250-2026.student__yourname.weather_australia`
GROUP BY Location
ORDER BY avg_max_temp DESC
LIMIT 10;

Connect a Google Sheet (Live Data)
BigQuery can query Google Sheets directly as external tables. Changes in the sheet are reflected immediately in query results — no re-import needed.
Step 1: Make your own copy of the source sheet
Open this Google Sheet and make a copy to your own Google Drive (File → Make a copy):
Step 2: Make your copy accessible
In your copied sheet, click Share → General access → Anyone with the link (Viewer is sufficient). Copy the URL from your browser’s address bar.
Step 3: Create an external table in BigQuery
- In Explorer, find your personal dataset
- Click ⋮ → Create table
- For “Create table from”, select Drive
- Paste your Google Sheet URL
- File format: Google Sheet
- Table name: week1_live_sheet (use this name for your submission)
- Check Auto detect for schema
- Click Create table
Step 4: Query and observe live updates
SELECT *
FROM `econ250-2026.student__yourname.week1_live_sheet`;

Now go to your Google Sheet, change a value in one of the cells, and run the query again. The updated value appears immediately — BigQuery reads directly from the sheet each time you query.
This is useful for dashboards and reports that need to reflect manually-updated data without re-importing.
Create from Query Results
You’ve already seen this approach — CREATE TABLE AS SELECT saves query results as a new table. This is the most common method for creating derived datasets from existing data.
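As a reminder of the pattern, here is a generic sketch — the table name my_derived_table and the grouping column are illustrative only, not part of the submission:

```sql
-- Generic CREATE TABLE AS SELECT pattern: save the results of
-- any query as a new table in your personal dataset.
CREATE OR REPLACE TABLE `econ250-2026.student__yourname.my_derived_table` AS
SELECT
  Region,
  COUNT(*) AS order_count
FROM `econ250-2026.course_data.superstore`
GROUP BY Region;
```

The table is a snapshot: if the source data changes later, re-run the statement (CREATE OR REPLACE overwrites the previous version).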
Partitioning Preview
Large tables should be partitioned to reduce query costs. When you filter on the partition column, BigQuery only scans relevant partitions instead of the entire table.
Let’s see this with a public dataset that’s already partitioned by date:
Query without date filter (scans all partitions):
SELECT
event_type,
COUNT(*) AS event_count
FROM `bigquery-public-data.thelook_ecommerce.events`
GROUP BY event_type;

Query with date filter (scans only matching partitions):
SELECT
event_type,
COUNT(*) AS event_count
FROM `bigquery-public-data.thelook_ecommerce.events`
WHERE created_at >= '2024-01-01'
GROUP BY event_type;

Compare the bytes processed. The filtered query scans significantly less data because it only reads partitions within the date range.
When working with large tables, check if they’re partitioned (look at table details in Explorer). If so, filter on the partition column to minimize costs.
We’ll explore partitioning in depth in Week 2. You don’t need to use partitioning for this week’s submission.
Completing Your Submission
Now you have all the skills needed. Here’s what to create:
1. Regional Summary View (Superstore)
Create a VIEW named week1_regional_summary that analyzes the Superstore dataset.
Requirements:
- Group by Region and Category
- Calculate: order count, total sales (rounded), total profit (rounded), profit margin percentage
- Order by total sales descending
Use the “Profitability Analysis” example from the Analytical Queries section as a starting point — you’ll need to adapt it for different grouping columns. See “Create a View” for the syntax.
2. Weather Summary Table (Uploaded CSV)
After uploading weatherAUS.csv as weather_australia (see “Upload a CSV File” section), create a TABLE named week1_weather_summary.
Requirements:
- Group by Location
- Calculate: observation count, average max temperature, average rainfall, count of rainy days
- Order by average rainfall descending
Hint: The RainToday column contains ‘Yes’ or ‘No’ values. To count rainy days, you can use COUNTIF(RainToday = 'Yes') or SUM(CASE WHEN RainToday = 'Yes' THEN 1 ELSE 0 END).
Use CREATE TABLE ... AS SELECT to save query results as a table (see “Create a Table” section for syntax).
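To see COUNTIF in action on a table you already have, here is an unrelated example on the Superstore data (counting discounted orders per category — this is not the submission query):

```sql
-- COUNTIF counts the rows matching a condition within each group.
SELECT
  Category,
  COUNT(*) AS order_count,
  COUNTIF(Discount > 0) AS discounted_orders
FROM `econ250-2026.course_data.superstore`
GROUP BY Category;
```

For your submission, the same idea applies with a condition on the RainToday column instead.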
3. Live Sheet External Table
After connecting your Google Sheet copy (see “Connect a Google Sheet” section), the external table week1_live_sheet should already exist and be queryable. Verify it works by running a simple SELECT query.
Submit
Copy the links to all three objects from BigQuery Explorer (⋮ → “Copy link”) and paste them in Moodle.
Bonus Challenges
Finished early? Try these:
Customer Analysis: Which customers have placed the most orders? What’s their total spend?
Shipping Analysis: How does ship mode affect delivery time (Ship Date - Order Date)? Which is fastest?
Discount Impact: Is there a relationship between discount level and profit margin?
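For the shipping analysis, DATE_DIFF is BigQuery's function for date arithmetic. A starting-point sketch, assuming Ship Date and Order Date are DATE columns in the Superstore table:

```sql
-- Average delivery time in days, per ship mode.
-- DATE_DIFF(end_date, start_date, DAY) returns the difference in days.
SELECT
  `Ship Mode`,
  ROUND(AVG(DATE_DIFF(`Ship Date`, `Order Date`, DAY)), 1) AS avg_days_to_ship
FROM `econ250-2026.course_data.superstore`
GROUP BY `Ship Mode`
ORDER BY avg_days_to_ship;
```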
Google Trends: Query bigquery-public-data.google_trends.top_terms to find recent trending searches. This table is partitioned by refresh_date — always include a date filter to avoid scanning gigabytes of data.
SELECT DISTINCT term, rank, refresh_date
FROM `bigquery-public-data.google_trends.top_terms`
WHERE refresh_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
AND rank <= 5
ORDER BY refresh_date DESC, rank
LIMIT 10;

Reference: BigQuery Costs
BigQuery pricing differs from traditional databases. Understanding this helps you write cost-effective queries.
What you pay for:
| Factor | Charged? | Notes |
|---|---|---|
| Data scanned (bytes processed) | Yes | ~$5 per TB scanned |
| Query execution time | No | A 1-second query costs the same as a 10-second query |
| Number of rows returned | No | Returning 10 rows vs 10,000 costs the same |
| Storage (tables you create) | Yes | ~$0.02/GB/month (negligible for this course) |
Cost examples:
| Data Scanned | Approximate Cost |
|---|---|
| 10 MB | ~$0.00005 (essentially free) |
| 1 GB | ~$0.005 |
| 10 GB | ~$0.05 |
| 100 GB | ~$0.50 |
| 1 TB | ~$5.00 |
Free tier: First 1 TB of queries per month is free. Stay well within this limit.
Recommended safety setting: In Query settings, set “Maximum bytes billed” to 10737418240 (10 GB) to prevent expensive accidents.
Troubleshooting
| Problem | Solution |
|---|---|
| “Permission denied” | Verify you’re in project econ250-2026 (check top-left dropdown) |
| Can’t find personal dataset | Look for student__yourname (double underscore) |
| “Syntax error near …” | Check for missing backticks around names with spaces |
| “Table not found” | Use full path: project.dataset.table |
| Query estimate too high | Select fewer columns, add filters, avoid SELECT * |
Backticks vs Quotes:
- Backticks (`) — for identifiers (table names, column names with spaces)
- Single quotes (') — for string values (WHERE country = 'USA')