What Makes Data “Big”?
ECON250 - Big Data Analytics | Week 1
2026-01-01
You already work with data
Is the data you’ve worked with “big”?
What’s the largest dataset you’ve personally worked with?
Pause and think for a minute.
Maybe “big” means…
A CSV file that’s several gigabytes?
A database query that takes minutes to run?
Data that crashes your laptop?
Your laptop’s limits
RAM — 8–16 GB
Storage — 256 GB – 1 TB
CPUs — 4–8 cores
From KSE HUB…
[Image: KSE HUB]
How many rows? What tables? Think about it.
…to Rozetka
[Image: Rozetka]
Millions of products × millions of users × every interaction = ?
“Just buy a bigger computer?”
| Machine | RAM / data size | Cost |
|---|---|---|
| Your laptop | 16 GB | ~$1,000 |
| Powerful workstation | 128 GB | ~$5,000 |
| High-end server | 1 TB | ~$50,000 |
| Rozetka's data | 10+ TB | ??? |
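A quick back-of-envelope check makes the point. The row width and row counts below are illustrative assumptions, not Rozetka's real numbers:

```python
# Back-of-envelope: does a table fit in one machine's RAM?
# All figures are illustrative assumptions, not real Rozetka numbers.

def fits_in_ram(rows: int, bytes_per_row: int, ram_gb: int) -> bool:
    """Return True if `rows` rows of `bytes_per_row` bytes fit in `ram_gb` GB."""
    return rows * bytes_per_row <= ram_gb * 10**9

BYTES_PER_ROW = 200  # assumed average width of one event row

print(fits_in_ram(10**6, BYTES_PER_ROW, 16))         # 1M rows ≈ 0.2 GB → True
print(fits_in_ram(10**9, BYTES_PER_ROW, 16))         # 1B rows ≈ 200 GB → False
print(fits_in_ram(10**9, BYTES_PER_ROW, 1000))       # a 1 TB server still copes
print(fits_in_ram(50 * 10**9, BYTES_PER_ROW, 1000))  # ~10 TB → False on any single machine
```

Past roughly a terabyte, "buy a bigger computer" stops being an option at any price.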
The first dimension of “big”
When data doesn’t fit on one machine
Orders of magnitude
MB → Course grades spreadsheet ✓
GB → KSE HUB (all tabular data) ✓
TB → Rozetka (transactions + behavior) ✗
PB → Netflix, Spotify ✗✗
1 PB = 1,000 TB = 1,000,000 GB
Now consider a different problem
[Image: Weather station, calm sky]
Daily measurements → Monthly analysis → Quarterly report
Taxi pricing at rush hour
[Image: Uklon app showing surge pricing]
Is demand being met right now?
Same problem, different domain
[Image: Monobank app notification]
Is this transaction fraudulent?
You have milliseconds to decide.
The second dimension of “big”
When you can’t wait for batch processing
The velocity spectrum
Batch (hours–days)
↓
Near-real-time (minutes)
↓
Real-time (milliseconds)
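The two ends of the spectrum can be sketched in a few lines. This is a toy illustration, not a real pipeline; the names and numbers are invented for the example:

```python
# Batch vs. streaming aggregation (toy illustration).

# Batch: wait until all the data has arrived, then compute once.
def batch_average(amounts: list[float]) -> float:
    return sum(amounts) / len(amounts)

# Streaming: update the answer as each event arrives, so a current
# estimate exists within milliseconds of any event.
class RunningAverage:
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, amount: float) -> float:
        self.count += 1
        self.total += amount
        return self.total / self.count  # always up to date

events = [150.00, 89.50, 42.00]       # hypothetical transaction amounts
ra = RunningAverage()
for amount in events:
    current = ra.update(amount)        # usable immediately, e.g. for fraud scoring

assert current == batch_average(events)  # same final answer, available continuously
```

Both reach the same final number; the difference is *when* an answer is available, and that difference is exactly what forces the fraud-detection case into real time.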
One more dimension to consider
| Order ID | Customer | Date | Amount |
|---|---|---|---|
| 1001 | Anna | 2024-03-15 | 150.00 |
| 1002 | Oleh | 2024-03-16 | 89.50 |
This is comfortable. Rows, columns, clear types.
What if your data looks like this?
Scientific papers
[Image: scientific papers]
Web pages to scrape
[Image: web pages]
Or like this?
Nested API responses
[Image: nested JSON response]
Text in an image, needing OCR
[Image: photographed text for OCR]
The third dimension of “big”
When data isn’t rows and columns
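To see why variety is a problem, consider what it takes to get a nested API response into rows and columns at all. The response below is a made-up example; the field names are assumptions for illustration:

```python
import json

# A nested API response (hypothetical example): one order, nested
# customer object, and a variable-length list of items.
response = json.loads("""
{
  "order_id": 1001,
  "customer": {"name": "Anna", "city": "Kyiv"},
  "items": [
    {"sku": "A-1", "price": 100.00},
    {"sku": "B-2", "price": 50.00}
  ]
}
""")

# Flatten to tabular form: one row per item, with the order-level
# fields repeated on every row.
rows = [
    {
        "order_id": response["order_id"],
        "customer_name": response["customer"]["name"],
        "sku": item["sku"],
        "price": item["price"],
    }
    for item in response["items"]
]

for row in rows:
    print(row)  # two flat rows, ready for a table
```

The flattening step is where the work (and the design decisions) live: a comfortable two-row table hides a structure that never existed as rows and columns.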
The Three Vs
📦
VOLUME
Too much for one machine
⚡
VELOCITY
Too fast for batch
🔀
VARIETY
Too diverse for tables
Some add more Vs
Veracity — Can you trust the data? Is it accurate?
Value — Is there actual insight worth extracting?
The 3 Vs remain the core framework.
Real-world scale
| Company | Scale | Use case |
|---|---|---|
| Spotify | 100+ PB, 600M users | Real-time recommendations |
| Rozetka | Millions of daily events | Inventory, personalization |
| Monobank | Millions of txns/day | Fraud detection |
| Nova Poshta | Every package, every scan | Logistics optimization |
Economic research is changing
[Satellite imagery → economic activity]
[Mobile data → migration patterns]
What you’ll learn
This course
- Analytical SQL at scale
- BigQuery handles the Volume
- Patterns that work on billions of rows
Why it transfers
- Same SQL everywhere
- Foundation for Velocity tools
- Foundation for Variety tools
“Big data” isn’t a buzzword
It’s a threshold where your tools and techniques
must fundamentally change.
This course teaches you to work beyond that threshold.
Next up
The Big Data Technology Landscape
What tools exist, why they exist,
and why we’re focusing on one part.