ECON250 - Big Data Analytics | Week 1
2026-01-01
Analytical SQL — not database administration
Cloud-scale thinking — not local tools
Business questions → SQL → Insights
| We cover | We don’t cover |
|---|---|
| Writing analytical queries | Building databases |
| BigQuery, cloud warehouses | PostgreSQL administration |
| Deriving insights from data | Building data pipelines |
| SQL patterns for analysis | Software engineering |
Practical reasons
Learning reasons
8 weeks:
1-2: Foundations (BigQuery, data types, cost awareness)
3-4: Core patterns (aggregations, window functions)
5-6: Advanced analysis (cohorts, complex structures)
7-8: Integration & projects
| Day | Session | Focus |
|---|---|---|
| Day 1 | Lecture (80 min) | Concepts, demonstrations |
| Day 2 | Practice 1 (80 min) | Hands-on exercises |
| Day 2 | Practice 2 (80 min) | Continued work, submission |
| Component | Points | Notes |
|---|---|---|
| Weekly quizzes | 12 | Weeks 2-7, at lecture start |
| Practice submissions | 14 | Satisfactory completion |
| Assignments | 50 | 5 assignments, individual |
| Group project | 35 | Groups of 3-4 |
Practical skills
You’ll leave with abilities you can immediately apply in internships and jobs.
Raise your hand if you’ve ever had a file that was too big to analyze on your laptop
RAM: 8-16 GB (where analysis happens)
Storage: 256 GB - 1 TB (where files sit)
CPUs: 4-8 cores
A “big” Excel file: ~1M rows × 20 columns ≈ 200 MB
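Rough arithmetic, assuming ~10 bytes per cell: 1,000,000 rows × 20 columns × 10 bytes = 200,000,000 bytes ≈ 200 MB.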
Your dataset is 50 GB. Your laptop has 16 GB RAM.
Now what?
| Scale | Example | Fits on laptop? |
|---|---|---|
| MB | Course grades spreadsheet | ✓ Easy |
| GB | University database | ✓ Manageable |
| TB | Large e-commerce transactions | ✗ Difficult |
| PB | Spotify, Netflix | ✗ Impossible |
1 PB = 1,000,000 GB
At some point, buying a bigger computer stops working.
The size dimension of big data
When data is too large for a single machine:
Solution: Distribute across many machines
Rozetka on Black Friday
Millions of users. Thousands of orders per minute.
What challenges arise when data arrives faster than you can process it?
| Processing type | Latency | Example |
|---|---|---|
| Batch | Hours-days | Monthly reports |
| Near-real-time | Minutes | Dashboard updates |
| Real-time | Milliseconds | Fraud detection |
The speed dimension of big data
When data arrives faster than batch processing allows:
Most analytics (including this course) uses batch processing.
What if your data isn’t a nice table?
Think about what companies actually store:
Structured
Semi/Unstructured
BigQuery handles structured and semi-structured (JSON, arrays).
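A minimal sketch of what querying semi-structured data looks like in BigQuery, assuming a hypothetical `events` table with a JSON string column `payload` and an `ARRAY<STRING>` column `tags`:

```sql
SELECT
  JSON_VALUE(payload, '$.user_id') AS user_id,  -- pull one field out of the JSON payload
  tag                                           -- UNNEST produces one row per array element
FROM events,
  UNNEST(tags) AS tag;
```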
The structure dimension of big data
When data isn’t rows and columns:
| VOLUME | VELOCITY | VARIETY |
|---|---|---|
| Size dimension | Speed dimension | Structure dimension |
| “Data too big for one machine” | “Data arriving too fast” | “Data that isn’t tables” |
| Company | Scale | Primary Challenge |
|---|---|---|
| Spotify | 100+ PB, 600M users | Volume + Velocity |
| Netflix | Billions of events/day | Velocity |
| | Exabytes | All three |
| Monobank | Millions of txns/day | Velocity (fraud) |
Data is getting bigger everywhere
You need to know how to work beyond your laptop
BigQuery sits at: Storage + Transform + Analysis
Why start here: Most immediately applicable skill
| Data Warehouse | Data Lake |
|---|---|
| Structured, cleaned | Raw files |
| Optimized for analytics | Store everything |
| Schema-on-write | Schema-on-read |
| BigQuery, Snowflake | S3, GCS |
For this course: We work in the warehouse layer.
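One way to see the schema-on-write vs. schema-on-read split in BigQuery DDL; the dataset, table, and bucket names below are hypothetical:

```sql
-- Schema-on-write (warehouse): columns and types are fixed before any data is loaded
CREATE TABLE analytics.orders (
  order_id    INT64,
  customer_id INT64,
  order_date  DATE,
  amount      NUMERIC
);

-- Schema-on-read (lake): point at raw files in cloud storage;
-- the structure is read from the files when you query them
CREATE EXTERNAL TABLE analytics.raw_orders
OPTIONS (
  format = 'PARQUET',
  uris   = ['gs://example-bucket/orders/*.parquet']
);
```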
| Role | Primary focus |
|---|---|
| Data Analyst | Analysis + Consumption |
| Analytics Engineer | Transform + Analysis |
| Data Engineer | Ingestion + Storage + Transform |
| Data Scientist | Analysis + ML |
This course: Data Analyst skills. You’ll know where everything else fits.
Why? Different tools for different problems. SQL skills transfer everywhere — that’s the foundation.
SQL = Lingua franca of data
SQL skills compound. Every data tool speaks SQL.
Imagine you work at an e-commerce company. Your CEO asks:
“How many customers purchased again this month?”
“What’s our revenue trend by region over 3 years?”
“Are we retaining our customers?”
Not because the SQL syntax is complex…
But because of what answering them requires.
Transactional databases are designed for applications:
Optimized for finding one needle in the haystack
Business questions ask about the whole haystack.
This requires a different kind of database — and a different mindset.
| Transactional (OLTP) | Analytical (OLAP) |
|---|---|
| Support applications | Support decisions |
| Single-row lookups | Aggregate millions of rows |
| “Get this customer’s order” | “What are our top products?” |
| CRUD operations | Read-heavy analysis |
| Normalized schemas | Denormalized for reading |
| Transactional | Analytical |
|---|---|
| Find a needle | Understand the haystack |
| What happened? | Why did it happen? |
| | What will happen? |
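The contrast in actual queries, using hypothetical `orders` and `order_items` tables:

```sql
-- Transactional pattern: find one needle (single-row lookup by key)
SELECT *
FROM orders
WHERE order_id = 12345;

-- Analytical pattern: understand the haystack (aggregate across millions of rows)
SELECT product_id, SUM(quantity) AS units_sold
FROM order_items
GROUP BY product_id
ORDER BY units_sold DESC
LIMIT 10;
```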
To answer business questions at scale, you need:
Collapsing rows into insights
Calculations across rows without collapsing
Trends, seasonality, comparisons
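A rough sketch of the first two, on a hypothetical `orders` table with `region` and `amount` columns:

```sql
-- Aggregation: collapse rows into one number per group
SELECT region, SUM(amount) AS revenue
FROM orders
GROUP BY region;

-- Window function: calculate across rows without collapsing them
SELECT
  order_id,
  amount,
  SUM(amount) OVER (PARTITION BY region) AS region_revenue  -- same regional total on every row
FROM orders;
```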
“Are we retaining customers?”
-- Step 1: Find when each customer first purchased
WITH customer_first_purchase AS (
  SELECT customer_id, MIN(order_date) AS first_date
  FROM orders
  GROUP BY customer_id
),
-- Step 2: Track activity relative to first purchase (one way to do it)
customer_activity AS (
  SELECT
    o.customer_id,
    DATE_TRUNC(f.first_date, MONTH) AS cohort_month,
    DATE_DIFF(o.order_date, f.first_date, MONTH) AS months_since_first
  FROM orders AS o
  JOIN customer_first_purchase AS f USING (customer_id)
)
-- Step 3: Aggregate into cohort retention
SELECT
  cohort_month,
  months_since_first,
  COUNT(DISTINCT customer_id) AS retained
FROM customer_activity
GROUP BY 1, 2

Always ask:
SQL can answer these questions. That’s what this course teaches.
Weeks 2-4
Weeks 5-7
Practice 1 (80 min):
Practice 2 (80 min):
Bring your laptop
Make sure you can access:
console.cloud.google.com/bigquery
(Use your KSE Google account)
Project: econ250-2026
Questions?
o_omelchenko@kse.org.ua

Kyiv School of Economics