Big Data Analytics
2025/2026, Spring Trimester
| Course faculty | Oleh Omelchenko e-mail: o_omelchenko@kse.org.ua |
|---|---|
| Course assistants | e-mail: |
| Department | Computer Science Department |
| Study program | Economics and Big Data |
| ECTS credits | 4 (120 academic hours total) |
| Class hours | 48 academic hours, 16 lecture hours and 32 practice hours |
| Course language | English |
| Course format Form of study | Offline Full-time studies |
Overview
Prerequisites
Must have:
- Completion of a database fundamentals course (understanding of relational model, basic SQL: SELECT, WHERE, JOIN, GROUP BY)
- Familiarity with data manipulation concepts (from Data Manipulation Essentials or equivalent)
- Basic understanding of business metrics and KPIs (from Business Intelligence Software or equivalent)
Nice to have:
- Experience with data visualization tools (Tableau, Power BI)
- Familiarity with working in cloud environments
- Understanding of basic statistics (averages, distributions, percentiles)
Note: The course includes a SQL foundations refresh in Week 1, but assumes students have encountered SQL previously and need reactivation rather than first-time learning.
Background and course rationale
Data analysis at scale requires more than knowing SQL syntax — it demands understanding how to translate business questions into efficient queries, navigate the quirks of real-world datasets, and extract meaningful insights from millions of rows. As organizations increasingly rely on cloud data warehouses, the ability to perform analytical work directly in SQL becomes a core competency for data professionals.
This course bridges the gap between foundational database knowledge and practical analytical work with large-scale data. Students will learn to approach SQL not as a data retrieval language, but as a powerful analytical tool — working with window functions, cohort analysis, time-series patterns, and other techniques that form the backbone of modern business analytics.
Using Google BigQuery as our platform, students will work with realistic datasets spanning business operations, public policy, and economic data — gaining hands-on experience with the scale and complexity they will encounter in professional settings.
Course aims
This course aims to:
- Develop fluency in analytical SQL patterns that go beyond basic data retrieval — including window functions, cohort analysis, and complex aggregations
- Build practical experience with cloud-based data warehouses using Google BigQuery and realistic large-scale datasets
- Teach systematic approaches to handling data quality challenges, particularly NULL values and inconsistent data
- Equip students with the ability to translate business and analytical questions into well-structured SQL queries
- Provide hands-on experience with datasets reflecting the diverse data landscape graduates will encounter
- Prepare students for data analyst roles requiring SQL proficiency and analytical thinking
Learning outcomes
By the end of the course, students will be able to:
- Write Analytical SQL: Go beyond basic queries to implement window functions, complex aggregations, and multi-step analytical logic
- Work with BigQuery at Scale: Navigate BigQuery’s interface, understand cost implications of queries, and apply platform-specific optimizations
- Handle Real-World Data Quality Issues: Identify and correctly handle NULL values, duplicates, and inconsistent data in analytical queries
- Perform Time-Based Analysis: Build cohort tables, calculate retention metrics, and analyze trends over time
- Translate Business Questions to SQL: Break down analytical problems into queryable components and structure analysis systematically
- Communicate Analytical Findings: Document query logic clearly and present data-driven insights in a professional format
Methods of Teaching and Assessment
This course employs following methods of teaching and assessment:
- Lectures
- Practices
- Quizzes (graded and ungraded) and interactive exercises
- Personal Assignments
- Group project
Course Timeline (approximate)
TBD
Course Faculty
Oleh Omelchenko, M.Sc. in Telecommunications
Office hours: offline, every Wednesday, 16:00-18:00, KSE Dragon Capital Building. (also available online in Slack during working hours)
Professional Experience: I have 8 years of experience in Data analytics and analytics development, specifically with web and mobile applications. Prior work experience includes e-commerce websites and startups. Since 2020, my current position is Senior Product Analyst at MacPaw - Ukrainian software company building applications for macOS and iOS.
Research Interests: I am interested in visualization literacy and study of dataviz comprehension differences.
Teaching Philosophy: I believe in an approach that goes beyond predefined topics and rigid scope - hence, during my courses we may touch topics that aren’t typically present in similar courses. However, the choice of topics is not arbitrary but rather based on what students might expect in the various working environments that they might get into.
There is a set of technical challenges that students are expected to learn to solve, however I aim to help to finish the course with a comprehensive understanding of the area. This, however, requires bi-directional approach, so I also expect the students to be inquisitive during the lectures and not hesitate to ask questions.
Personal Interests: Besides working and teaching, I practice martial arts (Kyokushin) and enjoy participating in mind games (pub quizzes etc.)
Interesting Facts: I used to do competitive touch typing so I possibly type in Ukrainian faster than any person you know (and even faster in English) 🙂
Teacher Assistants
, …
Office hours: …
Course Structure
The course runs for 8 weeks with one lecture and two consecutive practice sessions per week. Topics within each week are contextually aligned, with practice sessions building directly on lecture concepts.
Preliminary Materials: Before most lectures, students will be assigned required preparatory materials (readings, videos, tutorials). Students should review these materials before the lecture to maximize in-class time for discussion, clarification, and advanced concepts rather than basic introduction.
Lectures: Eight lectures, one per week, introducing key concepts and frameworks. Starting from Week 2, each lecture begins with a graded quiz covering the previous week’s material. Quizzes are only available during lecture time.
Practice Sessions: Sixteen practice sessions (two per week) provide hands-on experience with the tools and techniques covered in lectures. The sessions feature datasets from different business domains, with brief discussions of domain-specific analytical considerations. Students submit their practice work (SQL queries, BigQuery saved queries, or exported results) through Moodle after each session.
Practice submissions are evaluated on a “satisfactory completion” basis — demonstrating understanding of core concepts and reasonable effort is more important than perfection. Practices can be completed outside of scheduled hours within the submission window.
Individual Assignments: Five assignments throughout the course deepen understanding and broaden experience with realistic analytical scenarios. Assignments are assessed on both technical implementation (70%) and documentation/explanation (30%), reflecting real-world requirements where communicating analytical processes is as important as technical execution.
Group Project: A comprehensive analytical report project (groups of 3-4 students) synthesizing all course concepts. Groups may propose their own datasets (subject to instructor approval) or select from provided options. The project includes a milestone submission in Week 6 and final presentations in Week 8.
Note on Class Hours
Since this is a 4-credit course, students are expected to dedicate 120 hours of total work. Class activities (lectures & practices) take 48 hours, leaving 72 academic hours for self-study: preliminary materials, assignments, and group project work.
Make-Up Policy on the graded activities
Weekly Quizzes: Must be completed during lecture or practice hours. Make-up opportunities may be granted at the instructor’s discretion for documented emergencies or unavoidable conflicts if reported before the lecture.
Practice Submissions: Must be submitted within a 3-day window (inclusive) after the practice day. For example, if practice is on Thursday, the deadline is Sunday 23:59 Kyiv time; practice on Friday means the deadline is Monday 23:59.
Assignments: Must be submitted by the specified deadline. Submissions after the deadline but within one week will be graded at 50% of earned points. Submissions more than one week late will not be accepted.
Deadline Extensions: Students may request extensions on individual assignments by contacting the instructor at least 2 days before the deadline. Requests made closer to the deadline will not be considered.
Grading Policy
General Grading Scheme
| Task | Total Points | Total |
|---|---|---|
| Lectures (graded quiz) | 6 weeks * 2 pts each | 12 |
| Practices | 7 weeks * 2 pts each | 14 |
| Assignments | 5 assignments * 8+2 pts each | 40+10 |
| Group Project | 35 pts | 35 |
| Total (if sum of grades exceeds 100 pts, final grade is 100) | 101+10=111 max |
Graded Lecture Quizzes (12 pts total, 2 pts each)
Short interactive quizzes administered at the start of lectures via Moodle:
- Tests retention of previous week’s concepts
- 5-7 questions, 10 minutes total
- Must be present to participate (no makeups)
- Grading: 60%+ correct = 2 pts, 30-60% = 1 pt, <30% = 0 pts
Practice Submissions (14 pts total, 2 pts each)
Hands-on work completed during or immediately after practice sessions:
- Evaluated on “satisfactory” basis - completion over perfection
- Must demonstrate understanding of core concepts
- Submit on Moodle within 3-day window
- Grading: Complete & correct approach = 2 pts, Missing/inadequate = 0 pts
Individual Assignments (40+10 pts total, 8+2 pts each)
Late Submission Impact:
- On time: 100% of earned grade
- 1-7 days late: 50% of earned grade
- 7+ days late: 0 points (not accepted)
Group Project (35 pts)
Students will work in groups of 3-4 to analyze a complex dataset provided by the instructor. The project requires examining the data from multiple analytical perspectives and producing a comprehensive report with supporting SQL queries.
Detailed grading breakdown will be presented in Week 4 when introducing the project. Key evaluation criteria include:
- Analytical Depth (12 pts): Variety and sophistication of questions explored; appropriate use of techniques learned throughout the course
- SQL Quality (10 pts): Correct, efficient, and well-structured queries; proper handling of NULLs and edge cases
- Insights & Interpretation (8 pts): Meaningful conclusions drawn from the analysis; business/practical relevance of findings
- Documentation & Presentation (5 pts): Clear explanation of methodology; professional formatting; group presentation quality
Short descriptions of assignments
| Assignment | Learning Objectives | Description |
|---|---|---|
| Assignment 1: SQL Refresh & BigQuery Basics | LO1, LO2 | Reactivate foundational SQL skills while getting familiar with the BigQuery environment. Students will write queries against a provided dataset covering filtering, sorting, basic joins, and simple aggregations — establishing the baseline for more complex analytical work ahead. |
| Assignment 2: Aggregations, NULLs handling | LO1, LO3, LO5 | Calculate business metrics from a messy real-world dataset containing NULL values, duplicates, and inconsistencies. Students will practice defensive SQL techniques, explore how NULLs affect aggregation results, and learn to validate their outputs against expected behavior |
| Assignment 3: Window Functions | LO1, LO5 | Apply window functions to solve analytical problems: rankings, running totals, moving averages, and period-over-period comparisons. Students will work with a transactional dataset to answer business questions that require looking across rows rather than collapsing them. |
| Assignment 4: Cohort & Time-Series Analysis | LO1, LO4, LO5 | Build a cohort retention analysis from raw event data. Students will transform timestamped records into cohort tables, calculate retention rates across time periods, and interpret patterns in user or customer behavior over time. |
| Assignment 5: Complex Analytical Patterns | LO4, LO2, LO5, LO6 | Tackle an advanced analytical scenario combining multiple techniques: funnel analysis, sessionization, or working with nested data structures. Students will document their approach and present findings in a format suitable for a business audience. |
| Group Project | LO6 | A comprehensive analysis developed by groups of 3-4 students addressing a real-world analytical challenge. |
Academic integrity warning
Academic integrity is highly valued by the KSE community and its vast majority of students. We have a zero-tolerance policy towards academic plagiarism, self-plagiarism, fabrication, falsification, cheating, deception, bribery and other types of violations. Due to the highly competitive nature of the programmes, all students must be treated equally. Even a single case of violation of academic integrity is a serious misdemeanour that may lead to unjust redistribution of grades and, consequently, overall rankings, possible grants, fee reductions, and other merit-based awards. Therefore, penalties may vary from receiving zero points or a negative grade for an assignment to expulsion of the student from KSE, depending on the severity of the case and might include additional consequences such as deprivation of scholarships, financial assistance etc. All the rules and procedures regarding academic integrity are stated in the KSE Code of Academic Integrity. Before enrolling in this course, students must be aware of and abide by the KSE Code of Academic Integrity.
AI Tools usage policy
Encouraged:
- Using AI as a learning assistant before lectures to get familiar with and explore new concepts and topics
- Asking clarifying questions about complex topics
- Generating example code snippets to understand specific functions or techniques
- Debugging assistance and error explanation
- Brainstorming alternative approaches to problems
Prohibited:
- Using AI to complete entire assignments or project components without understanding of the part of the work AI performed
- Submitting AI-generated solutions without significant personal modification and understanding
- Using AI for final project presentations or documentation without disclosure
Documentation Requirements
Unless otherwise specified, all AI usage must be documented in a separate section of your submission including:
- Which AI tool was used
- Complete chat history or prompts used
- Clear identification of:
- Parts completed independently
- Parts where AI provided assistance
- Nature and extent of AI contribution
- Parts completed independently
- Short summary of how the AI-generated content was modified and integrated
Verification process
- Submissions will be checked for undisclosed AI usage during evaluation by the lecturer or TA. No automated AI checks will be performed.
- Students may be asked to explain any part of their submission in detail.
- Random submission reviews may include questions about implementation decisions.
Penalties for Non-Compliance
Any violations of the AI Tools usage policy will be treated as violation of the KSE Code of Academic Integrity.
All such cases will be handled according to the plagiarism procedures outlined in the KSE Code of Academic Integrity, with penalties ranging from grade reduction to course failure and possible expulsion from KSE in case of repeated violations of the Code.
Course Plan
Learning materials
- Google BigQuery: The Definitive Guide (O’Reilly link)
- SQL for Data Analysis (O’Reilly Link)
- BigQuery documentation
Schedule
| Week | Topics | Learning Materials | Learning outcomes & activities |
|---|---|---|---|
| 1 | BigQuery Introduction & SQL Foundations Assignment 1 | ||
| Course overview, objectives, and expectations Big Data landscape: what makes data “big” and why it matters SQL as an analytical tool: relational vs analytical mindset P1: BigQuery environment setup, interface navigation, running first queries P2: SQL foundations refresh — filtering, sorting, joins, basic aggregations | Published on Moodle | Navigate BigQuery interface confidently Write foundational SQL queries Understand the difference between transactional and analytical SQL usage Practice submission (2 pts) | |
| 2 | BigQuery Fundamentals & Working with Data Assignment 2 | ||
| BigQuery architecture: how it processes queries at scale Data types in BigQuery: strings, numbers, dates, timestamps Cost-aware querying: understanding what you’re scanning P1: Working with different data types, type conversions, date/time functions P2: Query optimization basics — partitioning, clustering concepts, query plans | Published on Moodle | Apply BigQuery-specific syntax and functions Write cost-conscious queries Work confidently with various data types Quiz W2 (2 pts) Practice submission (2 pts) Assignment 1 deadline (8+2 pts) | |
| 3 | Aggregations, Business Metrics & NULL Handling Assignment 3 | ||
| Aggregation patterns: GROUP BY, HAVING, conditional aggregations The NULL problem: how NULLs propagate through calculations Defensive SQL: COALESCE, NULLIF, IFNULL, COUNT(*) vs COUNT(column) P1: Calculating business KPIs from transactional data P2: NULL handling exercises — identifying and fixing NULL-related bugs in queries | Published on Moodle | Calculate meaningful business metrics using aggregations Identify and correctly handle NULL values in analytical queries Validate query outputs against expected behavior Quiz W3 (2 pts) Practice submission (2 pts) Assignment 2 deadline (8+2 pts) | |
| 4 | Window Functions Assignment 4 Group Project Introduction | ||
| Window function fundamentals: OVER(), PARTITION BY, ORDER BY Ranking functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE Analytical functions: running totals, moving averages, LAG/LEAD P1: Implementing rankings and row numbering for business scenarios P2: Period-over-period comparisons, running calculations, identifying trends | Published on Moodle | Apply window functions to solve analytical problems Choose appropriate window specifications for different use cases Compare values across rows without collapsing data Quiz W4 (2 pts) Practice submission (2 pts) Assignment 3 deadline (8+2 pts) | |
| 5 | Cohort & Time-Series Analysis ASsignment 5 | ||
| Time-series analysis fundamentals: trends, seasonality, date spines Cohort analysis methodology: defining cohorts, measuring retention Common pitfalls in time-based analysis P1: Building date spines, filling gaps in time-series data P2: Constructing cohort tables and calculating retention metrics | Published on Moodle | Build cohort tables from raw event data Calculate and interpret retention rates Analyze trends and patterns over time Quiz W5 (2 pts) Practice submission (2 pts) Assignment 4 deadline (8+2 pts) | |
| 6 | Complex Data Structures Assignment 6 | ||
| Nested and repeated data in BigQuery: ARRAYs, STRUCTs Working with JSON data: JSON functions, extraction patterns UNNEST and flattening techniques P1: Querying nested structures — typical patterns with GA4/Firebase-style data P2: Transforming nested data into flat analytical tables | Published on Moodle | Work with complex nested data structures common in modern datasets Flatten and reshape data for analytical purposes Apply UNNEST and CROSS JOIN patterns appropriately Activities: Quiz W6 (2 pts) Practice submission (2 pts) Assignment 5 deadline (8+2 pts) | |
| 7 | Advanced Analytical Patterns | ||
| Funnel analysis: conversion tracking, drop-off identification Sessionization: defining and building sessions from event streams Attribution basics: first-touch, last-touch, multi-touch concepts P1: Building and analyzing conversion funnels P2: Implementing sessionization logic from raw events | Published on Moodle | Implement funnel analysis to track user journeys Define and construct sessions from event-level data Understand attribution models and their SQL implementation Quiz W7 (2 pts) Practice submission (2 pts) | |
| 8 | Integration and Project Presentations | ||
| Group project presentations Course synthesis: connecting the techniques Career paths in data analytics, further learning resources | Communicate analytical findings effectively to an audience Integrate multiple SQL techniques to solve complex analytical problems Group Project deadline (35 pts) |