What Makes Data “Big”?

ECON250 - Big Data Analytics | Week 1

Oleh Omelchenko

2026-01-01

You already work with data

Is the data you’ve worked with “big”?


What’s the largest dataset you’ve personally worked with?


Pause and think for a minute.

Maybe “big” means…

A CSV file that’s several gigabytes?

A database query that takes minutes to run?

Data that crashes your laptop?

Your laptop’s limits


RAM — 8–16 GB

Storage — 256 GB – 1 TB

CPUs — 4–8 cores

From KSE HUB…


How many rows? What tables? Think about it.

…to Rozetka

Millions of products × millions of users × every interaction = ?

“Just buy a bigger computer?”

Machine                RAM      Cost
Your laptop            16 GB    ~$1,000
Powerful workstation   128 GB   ~$5,000
High-end server        1 TB     ~$50,000
Rozetka’s data         10+ TB   ???
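
A quick back-of-envelope on the table above. The sketch below is BigQuery SQL (the course's tool); the RAM sizes and prices are the rough figures from the table, not market quotes, and 1 TB is taken as 1,024 GB:

```sql
-- Cost per GB of RAM for each machine in the table above.
-- Figures are the rough course numbers, not market quotes.
SELECT
  machine,
  ram_gb,
  cost_usd,
  ROUND(cost_usd / ram_gb, 2) AS usd_per_gb
FROM UNNEST([
  STRUCT('Your laptop' AS machine, 16 AS ram_gb, 1000 AS cost_usd),
  ('Powerful workstation', 128, 5000),
  ('High-end server', 1024, 50000)
]);
```

On these figures, cost per GB stops falling once you pass workstation scale, and machines big enough to hold 10+ TB in RAM are exotic and priced accordingly. At some point scaling up loses to scaling out.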

The first dimension of “big”

VOLUME

When data doesn’t fit on one machine

Orders of magnitude


MB → Course grades spreadsheet ✓

GB → KSE HUB (all tabular data) ✓

TB → Rozetka (transactions + behavior) ✗

PB → Netflix, Spotify ✗✗

1 PB = 1,000 TB = 1,000,000 GB
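
To make those tiers concrete, here is a back-of-envelope sketch in BigQuery SQL. The 100-bytes-per-row figure is an illustrative assumption, not a benchmark:

```sql
-- Roughly how many ~100-byte rows fit at each scale
-- (100 bytes/row is an illustrative assumption).
SELECT
  POW(10, 6)  / 100 AS rows_per_mb,   -- ~10 thousand rows
  POW(10, 9)  / 100 AS rows_per_gb,   -- ~10 million rows
  POW(10, 12) / 100 AS rows_per_tb,   -- ~10 billion rows
  POW(10, 15) / 100 AS rows_per_pb;   -- ~10 trillion rows
```

So a TB-scale dataset already means billions of rows, which is exactly where laptop tools give out.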

Now consider a different problem

[Image: Weather station, calm sky]


Daily measurements → Monthly analysis → Quarterly report

Taxi pricing at rush hour

[Image: Uklon app showing surge pricing]


Is demand being met right now?

Same problem, different domain

[Image: Monobank app notification]


Is this transaction fraudulent?

You have milliseconds to decide.

The second dimension of “big”

VELOCITY

When you can’t wait for batch processing

The velocity spectrum


Batch (hours–days)

Near-real-time (minutes)

Real-time (milliseconds)

One more dimension to consider

customer_id   name   purchase_date   amount
1001          Anna   2024-03-15      150.00
1002          Oleh   2024-03-16      89.50


This is comfortable. Rows, columns, clear types.
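
In SQL terms, "comfortable" means you can write the schema down explicitly. A minimal sketch in BigQuery SQL; the dataset and table names (shop.purchases) are made up for illustration:

```sql
-- The table above as an explicit schema: every column has a clear type.
CREATE TABLE shop.purchases (
  customer_id   INT64,
  name          STRING,
  purchase_date DATE,
  amount        NUMERIC
);

INSERT INTO shop.purchases VALUES
  (1001, 'Anna', DATE '2024-03-15', 150.00),
  (1002, 'Oleh', DATE '2024-03-16', 89.50);
```

Every column has a declared type the engine can check and optimize around.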

What if your data looks like this?

Scientific papers

Web pages to scrape

Or like this?

Nested API responses

Text on image for OCR
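
To see why nested data is uncomfortable, here is a hedged sketch of querying one nested record in BigQuery SQL (the JSON payload is invented for illustration). Plain column references no longer work; you need path expressions:

```sql
-- Pulling fields out of a nested record: paths instead of plain columns.
-- The JSON literal stands in for a (hypothetical) API response.
SELECT
  JSON_VALUE(payload, '$.user.id')       AS user_id,
  JSON_VALUE(payload, '$.items[0].name') AS first_item
FROM UNNEST([
  JSON '{"user": {"id": "1001"}, "items": [{"name": "laptop"}, {"name": "mouse"}]}'
]) AS payload;
```

The structure that a schema used to carry now lives inside each value, which is what makes variety a dimension of its own.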

The third dimension of “big”

VARIETY

When data isn’t rows and columns

The Three Vs


📦

VOLUME

Too much for one machine

⚡

VELOCITY

Too fast for batch

🔀

VARIETY

Too diverse for tables

Some add more Vs

Veracity — Can you trust the data? Is it accurate?

Value — Is there actual insight worth extracting?


The 3 Vs remain the core framework.

Real-world scale

Who           Scale                       Challenge
Spotify       100+ PB, 600M users         Real-time recommendations
Rozetka       Millions of daily events    Inventory, personalization
Monobank      Millions of txns/day        Fraud detection
Nova Poshta   Every package, every scan   Logistics optimization

Economic research is changing

[Satellite imagery → economic activity]

[Mobile data → migration patterns]

What you’ll learn

This course

  • Analytical SQL at scale
  • BigQuery handles the Volume
  • Patterns that work on billions of rows
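
For a first taste, here is the shape of query this course builds toward. The shop.events table is hypothetical; the point is that the pattern does not change with scale:

```sql
-- One aggregation pass: daily event counts and distinct users.
-- The same GROUP BY runs unchanged on thousands of rows or billions;
-- BigQuery supplies the machines. (shop.events is a hypothetical table.)
SELECT
  DATE(event_ts)          AS day,
  COUNT(*)                AS events,
  COUNT(DISTINCT user_id) AS users
FROM shop.events
GROUP BY day
ORDER BY day;
```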

Why it transfers

  • Same SQL everywhere
  • Foundation for Velocity tools
  • Foundation for Variety tools

“Big data” isn’t a buzzword


It’s a threshold where your tools and techniques must fundamentally change.


This course teaches you to work beyond that threshold.

Next up


The Big Data Technology Landscape


What tools exist, why they exist, and why we’re focusing on one part.