Week 1 Practice: Getting Started with BigQuery
ECON250: Big Data Analytics
Your Submission
Deadline: 3 days after practice session (2 points for satisfactory completion)
Create the following objects in your personal dataset (student__yourname):
| Object | Name | Requirements |
|---|---|---|
| Table (upload) | weather_australia | Upload the weatherAUS.csv file from Moodle Week 1 materials. |
| View | week1_regional_summary | Superstore data grouped by Region and Category. Include: order count, total sales, total profit, profit margin %. Order by total sales descending. |
| Table (from query) | week1_weather_summary | Aggregated summary of your uploaded weather data, grouped by Location. Include: observation count, average max temperature, average rainfall, count of rainy days. Order by average rainfall descending. |
| External table | week1_live_sheet | Connected to your personal copy of the provided Google Sheet. External tables read directly from the source rather than storing a copy. |
The view and summary table require you to write aggregation queries yourself — use the examples in this guide as reference, but the submission queries are not provided. The external table just needs to exist and be queryable.
To submit: Copy the links to week1_regional_summary, week1_weather_summary, and week1_live_sheet from BigQuery Explorer (click ⋮ → “Copy link”) and paste them in Moodle.
The rest of this guide helps you build the skills needed to complete this submission.
Before You Start
Required: KSE Google account (yourname@kse.org.ua) with access to project econ250-2026
Open BigQuery: console.cloud.google.com/bigquery?project=econ250-2026
In the Explorer panel (left side), you should see:
- course_data — shared course materials (read-only)
- student__yourname — your personal workspace (you can create tables/views here)
Cost Awareness
Every query shows “bytes processed” in the job information — this determines cost. A query scanning 10 GB costs the same whether it runs for 1 second or 10 seconds.
Before running any query, check the estimate in the top-right corner of the editor. After running, look at the “Job information” panel below your results to see actual bytes processed. Project and individual quotas exist to protect against runaway costs, but be mindful of how much data your queries scan.
SELECT * scans every column, which is expensive and usually unnecessary. Always select only the columns you need. This is the single most important habit for cost-effective BigQuery usage.
-- Expensive: scans all columns
SELECT * FROM `bigquery-public-data.thelook_ecommerce.order_items`;
-- Better: scans only what you need
SELECT order_id, product_id, sale_price
FROM `bigquery-public-data.thelook_ecommerce.order_items`;

Quick Start: Your First Queries
Verify your setup with this simple query:
SELECT
'Hello, BigQuery!' AS greeting,
CURRENT_TIMESTAMP() AS query_time,
SESSION_USER() AS your_email;

This query scans 0 bytes — it doesn’t touch any tables.
Explore the Superstore Dataset
The Superstore dataset (econ250-2026.course_data.superstore) is small (~10K rows, ~2.5 MB) and safe for experimentation. Any query on this table costs essentially nothing.
SELECT
`Order ID`,
`Customer Name`,
`Product Name`,
Sales,
Profit
FROM `econ250-2026.course_data.superstore`
LIMIT 10;

Note the backticks around column names with spaces. BigQuery requires backticks for identifiers containing special characters.
Understanding Columnar Storage
BigQuery stores data by column, not by row. Selecting fewer columns means scanning less data, which means lower cost.
Run these two queries and compare the bytes processed (shown in job information after each query runs):
Query A — all columns:
SELECT *
FROM `econ250-2026.course_data.superstore`;

Query B — specific columns:
SELECT `Order ID`, Sales, Profit
FROM `econ250-2026.course_data.superstore`;

Query B scans significantly less data, even though both return the same number of rows.
LIMIT Doesn’t Reduce Cost
A common misconception: adding LIMIT 10 does not reduce bytes scanned. BigQuery still reads the referenced columns in full — LIMIT only restricts how many rows are returned to you.
-- This still scans the entire table
SELECT *
FROM `bigquery-public-data.thelook_ecommerce.order_items`
LIMIT 10;

Check the bytes processed — it’s the same as without LIMIT.
Analytical Queries
Sales by Category
SELECT
Category,
COUNT(*) AS order_count,
ROUND(SUM(Sales), 2) AS total_sales,
ROUND(AVG(Sales), 2) AS avg_sale
FROM `econ250-2026.course_data.superstore`
GROUP BY Category
ORDER BY total_sales DESC;

Profitability Analysis
SELECT
`Sub-Category`,
COUNT(*) AS order_count,
ROUND(SUM(Sales), 2) AS total_sales,
ROUND(SUM(Profit), 2) AS total_profit,
ROUND(SUM(Profit) / SUM(Sales) * 100, 1) AS profit_margin_pct
FROM `econ250-2026.course_data.superstore`
GROUP BY `Sub-Category`
ORDER BY total_profit DESC;

Which sub-category has the worst profit margin? Why might that be?
Monthly Trends
SELECT
FORMAT_DATE('%Y-%m', `Order Date`) AS month,
COUNT(*) AS order_count,
ROUND(SUM(Sales), 2) AS total_sales
FROM `econ250-2026.course_data.superstore`
GROUP BY month
ORDER BY month;

Creating Tables and Views
You have full control over your personal dataset (student__yourname). Let’s use it.
Tables vs Views
| Aspect | Table | View |
|---|---|---|
| Stores data? | Yes | No (stores query definition) |
| Updates automatically? | No (snapshot at creation) | Yes (runs query each time) |
| Storage cost? | Yes | No |
| When to use | Repeated analysis, performance matters | Source data changes, always need current results |
Create a Table
-- Replace 'yourname' with your actual dataset suffix
CREATE OR REPLACE TABLE `econ250-2026.student__yourname.category_summary` AS
SELECT
Category,
`Sub-Category`,
COUNT(*) AS order_count,
ROUND(SUM(Sales), 2) AS total_sales,
ROUND(SUM(Profit), 2) AS total_profit
FROM `econ250-2026.course_data.superstore`
GROUP BY Category, `Sub-Category`;

Refresh your dataset in Explorer to see the new table.
Create a View
CREATE OR REPLACE VIEW `econ250-2026.student__yourname.high_value_orders` AS
SELECT
`Order ID`,
`Order Date`,
`Customer Name`,
Sales,
Profit
FROM `econ250-2026.course_data.superstore`
WHERE Sales > 500
ORDER BY Sales DESC;

Query it like a table:
SELECT * FROM `econ250-2026.student__yourname.high_value_orders`;

Working with Your Own Data
BigQuery can ingest data from multiple sources. Let’s explore three methods.
Upload a CSV File
Download the weatherAUS.csv file from the Week 1 materials in Moodle. This dataset contains Australian weather observations.
Steps:
- In Explorer, find your personal dataset (student__yourname)
- Click the three dots (⋮) → Create table
- For “Create table from”, select Upload
- Browse and select the CSV file
- Table name: weather_australia
- Check Auto detect for schema
- Click Create table
Query your uploaded data:
SELECT
Location,
COUNT(*) AS observations,
ROUND(AVG(MaxTemp), 1) AS avg_max_temp,
ROUND(AVG(Rainfall), 2) AS avg_rainfall
FROM `econ250-2026.student__yourname.weather_australia`
GROUP BY Location
ORDER BY avg_max_temp DESC
LIMIT 10;

Connect a Google Sheet (Live Data)
BigQuery can query Google Sheets directly as external tables. Changes in the sheet are reflected immediately in query results — no re-import needed.
Step 1: Make your own copy of the source sheet
Open this Google Sheet and make a copy to your own Google Drive (File → Make a copy):
Step 2: Make your copy accessible
In your copied sheet, click Share → General access → Anyone with the link (Viewer is sufficient). Copy the URL from your browser’s address bar.
Step 3: Create an external table in BigQuery
- In Explorer, find your personal dataset
- Click ⋮ → Create table
- For “Create table from”, select Drive
- Paste your Google Sheet URL
- File format: Google Sheet
- Table name: week1_live_sheet (use this name for your submission)
- Check Auto detect for schema
- Click Create table
Step 4: Query and observe live updates
SELECT *
FROM `econ250-2026.student__yourname.week1_live_sheet`;

Now go to your Google Sheet, change a value in one of the cells, and run the query again. The updated value appears immediately — BigQuery reads directly from the sheet each time you query.
This is useful for dashboards and reports that need to reflect manually-updated data without re-importing.
Create from Query Results
You’ve already seen this approach — CREATE TABLE AS SELECT saves query results as a new table. This is the most common method for creating derived datasets from existing data.
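As a reminder of the pattern, here is a generic sketch — the table name my_derived_table and the grouping column are illustrative only, not part of the submission:

```sql
-- Generic CREATE TABLE AS SELECT pattern: save the results of
-- any query as a new table in your personal dataset.
CREATE OR REPLACE TABLE `econ250-2026.student__yourname.my_derived_table` AS
SELECT
  Region,
  COUNT(*) AS order_count
FROM `econ250-2026.course_data.superstore`
GROUP BY Region;
```

The table is a snapshot: if the source data changes later, re-run the statement (CREATE OR REPLACE overwrites the previous version).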
Partitioning Preview
Large tables should be partitioned to reduce query costs. When you filter on the partition column, BigQuery only scans relevant partitions instead of the entire table.
Let’s see this with a public dataset that’s already partitioned by date:
Query without date filter (scans all partitions):
SELECT
event_type,
COUNT(*) AS event_count
FROM `bigquery-public-data.thelook_ecommerce.events`
GROUP BY event_type;

Query with date filter (scans only matching partitions):
SELECT
event_type,
COUNT(*) AS event_count
FROM `bigquery-public-data.thelook_ecommerce.events`
WHERE created_at >= '2024-01-01'
GROUP BY event_type;

Compare the bytes processed. The filtered query scans significantly less data because it only reads partitions within the date range.
When working with large tables, check if they’re partitioned (look at table details in Explorer). If so, filter on the partition column to minimize costs.
We’ll explore partitioning in depth in Week 2. You don’t need to use partitioning for this week’s submission.
Completing Your Submission
Now you have all the skills needed. Here’s what to create:
1. Regional Summary View (Superstore)
Create a VIEW named week1_regional_summary that analyzes the Superstore dataset.
Requirements:
- Group by Region and Category
- Calculate: order count, total sales (rounded), total profit (rounded), profit margin percentage
- Order by total sales descending
Use the “Profitability Analysis” example from the Analytical Queries section as a starting point — you’ll need to adapt it for different grouping columns. See “Create a View” for the syntax.
2. Weather Summary Table (Uploaded CSV)
After uploading weatherAUS.csv as weather_australia (see “Upload a CSV File” section), create a TABLE named week1_weather_summary.
Requirements:
- Group by Location
- Calculate: observation count, average max temperature, average rainfall, count of rainy days
- Order by average rainfall descending
Hint: The RainToday column contains ‘Yes’ or ‘No’ values. To count rainy days, you can use COUNTIF(RainToday = 'Yes') or SUM(CASE WHEN RainToday = 'Yes' THEN 1 ELSE 0 END).
Use CREATE TABLE ... AS SELECT to save query results as a table (see “Create a Table” section for syntax).
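To see COUNTIF in action on a table you already have, here is an unrelated example on the Superstore data (counting discounted orders per category — this is not the submission query):

```sql
-- COUNTIF counts the rows matching a condition within each group.
SELECT
  Category,
  COUNT(*) AS order_count,
  COUNTIF(Discount > 0) AS discounted_orders
FROM `econ250-2026.course_data.superstore`
GROUP BY Category;
```

For your submission, the same idea applies with a condition on the RainToday column instead.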
3. Live Sheet External Table
After connecting your Google Sheet copy (see “Connect a Google Sheet” section), the external table week1_live_sheet should already exist and be queryable. Verify it works by running a simple SELECT query.
Submit
Copy the links to all three objects from BigQuery Explorer (⋮ → “Copy link”) and paste them in Moodle.
Bonus Challenges
Finished early? Try these:
Customer Analysis: Which customers have placed the most orders? What’s their total spend?
Shipping Analysis: How does ship mode affect delivery time (Ship Date - Order Date)? Which is fastest?
Discount Impact: Is there a relationship between discount level and profit margin?
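For the shipping analysis, DATE_DIFF is BigQuery's function for date arithmetic. A starting-point sketch, assuming Ship Date and Order Date are DATE columns in the Superstore table:

```sql
-- Average delivery time in days, per ship mode.
-- DATE_DIFF(end_date, start_date, DAY) returns the difference in days.
SELECT
  `Ship Mode`,
  ROUND(AVG(DATE_DIFF(`Ship Date`, `Order Date`, DAY)), 1) AS avg_days_to_ship
FROM `econ250-2026.course_data.superstore`
GROUP BY `Ship Mode`
ORDER BY avg_days_to_ship;
```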
Google Trends: Query bigquery-public-data.google_trends.top_terms to find recent trending searches. This table is partitioned by refresh_date — always include a date filter to avoid scanning gigabytes of data.
SELECT DISTINCT term, rank, refresh_date
FROM `bigquery-public-data.google_trends.top_terms`
WHERE refresh_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
AND rank <= 5
ORDER BY refresh_date DESC, rank
LIMIT 10;

Reference: BigQuery Costs
BigQuery pricing differs from traditional databases. Understanding this helps you write cost-effective queries.
What you pay for:
| Factor | Charged? | Notes |
|---|---|---|
| Data scanned (bytes processed) | Yes | ~$5 per TB scanned |
| Query execution time | No | A 1-second query costs the same as a 10-second query |
| Number of rows returned | No | Returning 10 rows vs 10,000 costs the same |
| Storage (tables you create) | Yes | ~$0.02/GB/month (negligible for this course) |
Cost examples:
| Data Scanned | Approximate Cost |
|---|---|
| 10 MB | ~$0.00005 (essentially free) |
| 1 GB | ~$0.005 |
| 10 GB | ~$0.05 |
| 100 GB | ~$0.50 |
| 1 TB | ~$5.00 |
Free tier: First 1 TB of queries per month is free. Stay well within this limit.
Recommended safety setting: In Query settings, set “Maximum bytes billed” to 10737418240 (10 GB) to prevent expensive accidents.
Troubleshooting
| Problem | Solution |
|---|---|
| “Permission denied” | Verify you’re in project econ250-2026 (check top-left dropdown) |
| Can’t find personal dataset | Look for student__yourname (double underscore) |
| “Syntax error near …” | Check for missing backticks around names with spaces |
| “Table not found” | Use full path: project.dataset.table |
| Query estimate too high | Select fewer columns, add filters, avoid SELECT * |
Backticks vs Quotes:
- Backticks (`) — for identifiers (table names, column names with spaces)
- Single quotes (') — for string values (WHERE country = 'USA')