# Exercises

Complete list of all available exercises in the course.

## Exercise Roadmap

### Module 1: Databases
| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| 1.1 | Introduction to SQLite | SQLite + Pandas | Basic | Available |
| 2.1 | PostgreSQL HR | PostgreSQL | Intermediate | Available |
| 2.2 | PostgreSQL Gardening | PostgreSQL | Intermediate | Available |
| 2.3 | SQLite to PostgreSQL Migration | PostgreSQL + Python | Intermediate | Available |
| 3.1 | Oracle HR | Oracle Database | Advanced | Available |
| 5.1 | Excel/Python Analysis | Pandas + Excel | Basic | Available |

### Module 2: Data Cleaning and ETL

| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| 02 | ETL Pipeline QoG | PostgreSQL + Pandas | Advanced | Available |

### Module 3: Distributed Processing

| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| 03 | Distributed Processing with Dask | Dask + Parquet | Intermediate | Available |

### Module 4: Machine Learning

| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| 04 | Machine Learning (PCA, K-Means) | Scikit-Learn, PCA, K-Means | Advanced | Available |
| 04.2 | Transfer Learning Flowers | TensorFlow, MobileNetV2 | Advanced | Available |
| ARIMA | Time Series ARIMA/SARIMA | statsmodels, Box-Jenkins | Advanced | Available |

### Module 5: NLP and Text Mining

| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| 05 | NLP and Text Mining | NLTK, TF-IDF, Jaccard, Sentiment | Advanced | Available |

### Module 6: Panel Data Analysis

| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| 06 | Panel Data Analysis | linearmodels, Panel OLS, Altair | Advanced | Available |

### Module 7: Big Data Infrastructure

| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| 07 | Big Data Infrastructure | Docker Compose, Apache Spark | Intermediate-Advanced | Available |

### Module 8: Streaming with Kafka

| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| 08 | Streaming with Kafka | Apache Kafka, Spark Streaming, KRaft | Advanced | Available |

### Module 9: Cloud with LocalStack

| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| 09 | Cloud with LocalStack | LocalStack, Terraform, AWS | Advanced | Available |

### Capstone Project

| # | Exercise | Technology | Level | Status |
|---|---|---|---|---|
| TF | Capstone Integrative Project | Docker + Spark + PostgreSQL + QoG | Advanced | Available |

## MODULE 1: Databases

### Exercise 1.1: Introduction to SQLite

Details
- Level: Basic
- Dataset: NYC Taxi (10MB sample)
- Technologies: SQLite, Pandas
What you'll learn:
- Load CSV data into a SQLite database
- Basic SQL queries (SELECT, WHERE, GROUP BY)
- Optimization with indexes
- Export results to CSV
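The full round trip above (load, index, query, export) fits in a few lines. A minimal sketch with a tiny inline DataFrame standing in for the NYC Taxi CSV; column names here are illustrative, not the dataset's actual schema:

```python
import sqlite3
import pandas as pd

# Tiny stand-in for the NYC Taxi sample (columns are illustrative)
trips = pd.DataFrame({
    "pickup_zone": ["Midtown", "Midtown", "Harlem", "Soho"],
    "fare": [12.5, 9.0, 20.0, 7.5],
})

conn = sqlite3.connect(":memory:")        # a file path in the real exercise
trips.to_sql("trips", conn, index=False)  # load the data into SQLite

# An index on the filter/group column speeds up repeated lookups
conn.execute("CREATE INDEX idx_zone ON trips(pickup_zone)")

# Basic SELECT / GROUP BY query, back into pandas
result = pd.read_sql_query(
    "SELECT pickup_zone, AVG(fare) AS avg_fare "
    "FROM trips GROUP BY pickup_zone ORDER BY avg_fare DESC",
    conn,
)
result.to_csv("avg_fare_by_zone.csv", index=False)  # export results
```

`DataFrame.to_sql` and `pd.read_sql_query` both accept a plain `sqlite3` connection, so no extra driver is needed for this exercise.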

### Exercise 2.1: PostgreSQL with HR Database

Details
- Level: Intermediate
- Database: HR (Human Resources) from Oracle
- Technologies: PostgreSQL, SQL
What you'll learn:
- Install and configure PostgreSQL
- Load databases from SQL scripts
- Complex queries with multiple JOINs
- PostgreSQL-specific functions
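The multi-JOIN pattern at the heart of this exercise can be sketched without a PostgreSQL server. The snippet below uses SQLite so it is self-contained, but the SQL itself is standard and runs unchanged in PostgreSQL; the three tables mirror the HR schema's `employees` → `departments` → `locations` chain with made-up rows:

```python
import sqlite3

# Self-contained stand-in: same JOIN syntax as PostgreSQL against the HR schema
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locations (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE departments (department_id INTEGER PRIMARY KEY,
                          department_name TEXT, location_id INTEGER);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT,
                        salary REAL, department_id INTEGER);
INSERT INTO locations VALUES (1, 'Seattle'), (2, 'Toronto');
INSERT INTO departments VALUES (10, 'IT', 1), (20, 'Marketing', 2);
INSERT INTO employees VALUES (100, 'King', 24000, 10), (101, 'Kochhar', 17000, 20);
""")

# Two JOINs chained: employee -> department -> location
rows = conn.execute("""
    SELECT e.last_name, d.department_name, l.city
    FROM employees e
    JOIN departments d ON d.department_id = e.department_id
    JOIN locations   l ON l.location_id  = d.location_id
    ORDER BY e.salary DESC
""").fetchall()
```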

### Exercise 2.2: PostgreSQL Gardening

Details
- Level: Intermediate
- Database: Gardening sales system
- Technologies: PostgreSQL, Window Functions
What you'll learn:
- Sales analysis with SQL
- Complex aggregations (GROUP BY, HAVING)
- Window Functions for rankings
- Materialized views
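Window-function rankings are the centerpiece of this exercise. The `RANK() OVER (PARTITION BY ...)` syntax below is the same in PostgreSQL; SQLite (3.25+) is used only so the sketch runs anywhere, and the sales rows are invented:

```python
import sqlite3

# Rank products by amount within each region (illustrative data)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, region TEXT, amount REAL);
INSERT INTO sales VALUES
  ('rose', 'north', 300), ('tulip', 'north', 500),
  ('rose', 'south', 400), ('tulip', 'south', 100);
""")

rows = conn.execute("""
    SELECT region, product, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
""").fetchall()
```

Unlike `GROUP BY`, the window function keeps every row and attaches the rank alongside it, which is what makes per-region "top seller" queries one-liners.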

### Exercise 2.3: SQLite to PostgreSQL Migration

Details
- Level: Intermediate
- Technologies: SQLite, PostgreSQL, Python
What you'll learn:
- Differences between database engines
- Migrate schemas and data
- Adapt data types
- Validate integrity
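The schema-adaptation step can be sketched without a PostgreSQL target: read the SQLite schema via `PRAGMA table_info` and emit PostgreSQL DDL with mapped types. The type map below is illustrative and far from exhaustive:

```python
import sqlite3

# Illustrative SQLite -> PostgreSQL type mapping (not exhaustive)
TYPE_MAP = {"INTEGER": "BIGINT", "REAL": "DOUBLE PRECISION",
            "TEXT": "TEXT", "BLOB": "BYTEA"}

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE trips (id INTEGER, fare REAL, note TEXT)")

def pg_ddl(conn, table):
    # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    defs = ", ".join(f"{name} {TYPE_MAP.get(ctype.upper(), 'TEXT')}"
                     for _, name, ctype, *_ in cols)
    return f"CREATE TABLE {table} ({defs});"

ddl = pg_ddl(src, "trips")
# Integrity validation step: after copying rows, compare COUNT(*) in both engines
```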

### Exercise 3.1: Oracle with HR Database

Advanced
- Level: Advanced
- Database: HR on native Oracle
- Technologies: Oracle Database, PL/SQL
What you'll learn:
- Install Oracle Database XE
- Oracle-specific syntax
- PL/SQL (procedures, functions)
- Sequences and triggers

### Exercise 5.1: Excel/Python Analysis

Details
- Level: Basic-Intermediate
- Technologies: Python, Pandas, Excel
What you'll learn:
- Read Excel files with Python
- Exploratory Data Analysis (EDA)
- Visualizations with matplotlib/seaborn
- Automate analyses
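A minimal EDA sketch of the pandas side. In the exercise the data would come from `pd.read_excel("sales.xlsx")` (which needs the `openpyxl` engine); inline data and invented column names keep this example self-contained, and plotting is omitted so it runs headless:

```python
import pandas as pd

# Inline stand-in for an Excel sheet (columns are illustrative)
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "region": ["north", "south", "north", "south"],
    "sales": [120, 90, 150, 80],
})

summary = df["sales"].describe()                      # quick distribution overview
by_month = df.groupby("month", sort=False)["sales"].sum()  # simple aggregation

# matplotlib/seaborn visualizations would go here, e.g. by_month.plot(kind="bar")
```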

## MODULE 2: Data Cleaning and ETL

### Professional ETL Pipeline - Quality of Government

Details
- Level: Advanced
- Dataset: QoG (1289 variables, 194+ countries)
- Technologies: PostgreSQL, Pandas, psycopg2
What you'll learn:
- Design a modular ETL architecture
- Work with PostgreSQL for longitudinal analysis
- Clean complex datasets (>1000 variables)
- Prepare panel data for econometrics
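"Modular ETL architecture" means each stage lives behind its own function so it can be tested and swapped independently. A minimal sketch of that shape, assuming invented column names; SQLite stands in for the PostgreSQL target so the example runs anywhere, and the real pipeline would use `psycopg2`:

```python
import sqlite3
import pandas as pd

def extract() -> pd.DataFrame:
    # In the exercise: read the QoG CSV (>1000 variables); tiny stand-in here
    return pd.DataFrame({"ccode": ["SWE", "SWE", None],
                         "year": [2019, 2020, 2020],
                         "gdp": [1.1, None, 2.0]})

def transform(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna(subset=["ccode"])   # drop rows missing the country key
    df["gdp"] = df["gdp"].ffill()      # simplistic imputation, for the sketch only
    return df

def load(df: pd.DataFrame, conn) -> int:
    df.to_sql("qog_panel", conn, index=False, if_exists="replace")
    return conn.execute("SELECT COUNT(*) FROM qog_panel").fetchone()[0]

conn = sqlite3.connect(":memory:")
n_rows = load(transform(extract()), conn)
```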

## MODULE 3: Distributed Processing

### Distributed Processing with Dask

Details
- Level: Intermediate
- Technologies: Dask, Parquet, LocalCluster
What you'll learn:
- Set up a Local Cluster with Dask
- Read Parquet files in a partitioned manner
- Execute complex aggregations in parallel
- Compare performance vs Pandas
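The pattern Dask automates is split-apply-combine: aggregate each partition independently, then merge the partial results. A hand-rolled, standard-library sketch of that idea (not Dask itself; in the exercise the partitions are Parquet row groups and the workers come from `LocalCluster`):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for partitions of a Parquet dataset
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

def partial_sum(part):
    # "Apply" step: each worker aggregates only its own partition
    return sum(part)

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(partial_sum, partitions))

total = sum(partials)  # "Combine" step: merge the partial aggregates
```

Dask builds exactly this kind of task graph for you (lazily, via `dd.read_parquet(...).sum().compute()`), and scales it from threads to a cluster without changing the user code.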

## MODULE 4: Machine Learning

### Machine Learning in Big Data

Details
- Level: Advanced
- Technologies: Scikit-Learn, PCA, K-Means
- Scripts: PCA Iris, FactoMineR, Breast Cancer, Wine, TF-IDF
What you'll learn:
- Dimensionality reduction with PCA
- Clustering with K-Means and Hierarchical Clustering
- Principal component interpretation
- Cluster profiling
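The PCA-then-K-Means pipeline from the Iris script fits in a few lines. A sketch of that flow (hyperparameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Standardize first: PCA is sensitive to feature scale
X = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)      # 4 features -> 2 principal components
explained = pca.explained_variance_ratio_.sum()

# Cluster in the reduced space; k=3 matches the three Iris species
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X2)
```

Cluster profiling then means characterizing each `km.labels_` group by the original feature means, and interpretation means reading `pca.components_` to see which features load on each component.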

### Transfer Learning: Flower Classification

Details
- Level: Advanced
- Technologies: TensorFlow, MobileNetV2, Scikit-Learn
- Dataset: TensorFlow Flowers (3,670 images, 5 classes)
What you'll learn:
- Transfer Learning with pre-trained networks (ImageNet)
- Embedding extraction with CNNs
- Image classification with traditional ML (KNN, SVM, Random Forest)
- t-SNE visualization of high-dimensional spaces
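The second half of the pipeline (embeddings in, traditional classifier out) can be sketched without TensorFlow. Below, random vectors with shifted class means stand in for the 1280-dimensional MobileNetV2 embeddings; everything except the 1280-dim shape and the five classes is invented:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for MobileNetV2 embeddings: 5 classes, 1280 dims
rng = np.random.default_rng(0)
n_per_class, dim = 40, 1280
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(n_per_class, dim))
               for c in range(5)])
y = np.repeat(np.arange(5), n_per_class)

# Once images are embeddings, classic ML takes over (KNN here; SVM/RF work too)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
acc = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr).score(X_te, y_te)
```

This is the whole point of transfer learning here: the expensive representation is borrowed from ImageNet, so the classifier you train yourself stays cheap.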

### Time Series: ARIMA/SARIMA

Details
- Level: Advanced
- Dataset: AirPassengers (144 observations, 1949-1960)
- Technologies: statsmodels, Box-Jenkins Methodology
What you'll learn:
- Complete Box-Jenkins methodology (Identification, Estimation, Diagnostics, Forecasting)
- ARIMA and SARIMA models with seasonality
- ACF/PACF for order identification
- Residual diagnostics and forecasts

## MODULE 5: NLP and Text Mining

### NLP and Text Mining

Details
- Level: Advanced
- Technologies: NLTK, TF-IDF, Jaccard, Sentiment Analysis
- Scripts: Counting, Cleaning, Sentiment, Similarity
What you'll learn:
- Tokenization and text cleaning
- Stopword removal
- Jaccard similarity between documents
- Lexicon-based sentiment analysis
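Tokenization, stopword removal, and Jaccard similarity can be shown with the standard library alone; the exercise does the same with NLTK's tokenizers and stopword lists. The tiny stopword set and example sentences below are illustrative:

```python
import re

STOPWORDS = {"the", "a", "of", "and", "is"}  # tiny illustrative list

def tokens(text: str) -> set[str]:
    # Lowercase, keep word characters, drop stopwords
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def jaccard(a: str, b: str) -> float:
    # |intersection| / |union| of the two token sets
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

sim = jaccard("the quality of government", "government quality index")
```

Here the token sets are `{quality, government}` and `{government, quality, index}`, so the similarity is 2/3.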

## MODULE 6: Panel Data Analysis

### Panel Data Analysis

Details
- Level: Advanced
- Datasets: Guns (gun laws), Fatalities (traffic mortality)
- Technologies: linearmodels, Panel OLS, Altair
What you'll learn:
- Panel data: country x year structure
- Fixed Effects vs Random Effects
- Two-Way Fixed Effects
- Hausman test for model selection
- Odds Ratios and Marginal Effects
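The intuition behind Fixed Effects can be shown by hand: demean `y` and `x` within each entity (the "within" transformation), then run OLS on the demeaned data. `linearmodels`' `PanelOLS(..., entity_effects=True)` automates exactly this; the sketch below uses synthetic data with a known coefficient of 2.0:

```python
import numpy as np
import pandas as pd

# Synthetic panel: 20 countries x 10 years, true beta = 2.0,
# plus a country-specific effect that pooled OLS would confound
rng = np.random.default_rng(1)
n_countries, n_years, beta = 20, 10, 2.0
df = pd.DataFrame({
    "country": np.repeat(np.arange(n_countries), n_years),
    "x": rng.normal(size=n_countries * n_years),
})
country_effect = rng.normal(scale=5, size=n_countries)[df["country"]]
df["y"] = beta * df["x"] + country_effect + rng.normal(size=len(df))

# Within transformation: subtract each country's mean, killing the fixed effect
demeaned = df.groupby("country")[["y", "x"]].transform(lambda s: s - s.mean())

# OLS slope on demeaned data recovers beta despite the country effects
beta_hat = (demeaned["x"] @ demeaned["y"]) / (demeaned["x"] @ demeaned["x"])
```

Two-Way Fixed Effects adds the same demeaning over years; the Hausman test then arbitrates between this estimator and Random Effects.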

## MODULE 7: Big Data Infrastructure

### Big Data Infrastructure: Docker and Spark

Details
- Level: Intermediate-Advanced
- Type: Theoretical-Conceptual with practical examples
- Technologies: Docker, Docker Compose, Apache Spark
What you'll learn:
- Docker: containers, images, Dockerfile, orchestration with Compose
- Networks, volumes, healthchecks, production patterns
- Apache Spark: Master-Worker architecture, cluster with Docker
- SparkSession, Lazy Evaluation, DAG, Catalyst optimizer
- Spark + PostgreSQL via JDBC
- From Standalone to production (Kubernetes, EMR, Dataproc)
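As a rough sketch of the orchestration piece, a standalone Spark cluster in Compose comes down to one master service and one or more workers pointed at it. Image, service names, and ports below are illustrative; the exercise's own compose file is the reference:

```yaml
# Minimal sketch of a Spark standalone cluster (illustrative names/versions)
services:
  spark-master:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=master
    ports:
      - "8080:8080"   # master web UI
  spark-worker:
    image: bitnami/spark:3.5
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
    depends_on:
      - spark-master
```

Scaling out is then `docker compose up --scale spark-worker=3`, which is the conceptual bridge to the Kubernetes/EMR/Dataproc deployments mentioned above.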

## MODULE 8: Streaming with Kafka

### Streaming with Apache Kafka

Details
- Level: Advanced
- Technologies: Apache Kafka (KRaft), Python, Spark Streaming
- API: USGS Earthquakes (real-time)
What you'll learn:
- Kafka architecture: Brokers, Topics, Partitions
- KRaft mode (no ZooKeeper)
- Producers and Consumers in Python
- Spark Structured Streaming
- Real-time alert system
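The alert logic itself is independent of the transport: a consumer pulls USGS GeoJSON messages off the topic and flags quakes above a magnitude threshold. Below, one hard-coded message stands in for the stream, and the threshold and message shape are illustrative (the real USGS feed nests `mag` and `place` under `properties`, as here):

```python
import json

ALERT_THRESHOLD = 5.0  # illustrative magnitude cutoff

# Stand-in for a record consumed from the Kafka topic
message = json.dumps({"properties": {"mag": 6.1, "place": "off the coast"}})

def check_alert(raw: str, threshold: float = ALERT_THRESHOLD):
    # Parse the GeoJSON feature and flag strong earthquakes
    quake = json.loads(raw)["properties"]
    if quake["mag"] >= threshold:
        return f"ALERT: M{quake['mag']} {quake['place']}"
    return None

alert = check_alert(message)
```

In the exercise, this function body sits inside the consumer's poll loop (or inside a Spark Structured Streaming `foreachBatch`), with Kafka delivering the messages.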

## MODULE 9: Cloud with LocalStack

### Cloud with LocalStack and Terraform

Details
- Level: Advanced
- Technologies: LocalStack, Terraform, AWS (S3, Lambda, DynamoDB)
- API: ISS Tracker (real-time)
What you'll learn:
- Cloud Computing: IaaS, PaaS, SaaS
- Simulate AWS locally with LocalStack
- Infrastructure as Code with Terraform
- Serverless Lambda functions
- Data Lake architecture (Medallion)
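The Infrastructure-as-Code step against LocalStack boils down to pointing the AWS provider at the local endpoint and declaring resources as usual. A sketch for one S3 bucket (the Medallion bronze layer); bucket name and credentials are illustrative, and port 4566 is LocalStack's default edge port:

```hcl
# Terraform provider aimed at LocalStack instead of real AWS (illustrative names)
provider "aws" {
  region                      = "us-east-1"
  access_key                  = "test"
  secret_key                  = "test"
  skip_credentials_validation = true
  s3_use_path_style           = true
  endpoints {
    s3 = "http://localhost:4566"   # LocalStack edge port
  }
}

resource "aws_s3_bucket" "bronze" {
  bucket = "iss-tracker-bronze"
}
```

The same `terraform apply` workflow then carries over unchanged to real AWS by dropping the `endpoints` block.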

## CAPSTONE PROJECT

### Capstone Project: Big Data Pipeline with Docker

Integrative Project
- Level: Advanced
- Technologies: Docker, Apache Spark, PostgreSQL, QoG
- Evaluation: Infrastructure 30% + ETL 25% + Analysis 25% + AI Reflection 20%
What you'll do:
- Build Docker infrastructure (Spark + PostgreSQL)
- Design and execute an ETL pipeline with Apache Spark
- Analyze QoG data with your own research question
- Document your learning process with AI

## Datasets Used

### NYC Taxi & Limousine Commission (TLC)

- Source: NYC Open Data
- Period: 2021
- Records: 10M+ trips

### Quality of Government (QoG)

- Source: University of Gothenburg
- Variables: 1289 institutional quality indicators
- Countries: 194+ with data since 1946

### AirPassengers

- Source: Box & Jenkins (1976)
- Period: 1949-1960 (144 monthly observations)
- Use: ARIMA/SARIMA time series

## How to Work Through Exercises

### Recommended Workflow

1. Read the full assignment - Do not start coding without reading everything
2. Understand the objectives - What are you expected to achieve?
3. Create a working branch - `git checkout -b your-lastname-exercise-XX`
4. Work in small steps - Do not try to do everything at once
5. Test frequently - Run your code each time you complete a section
6. Make regular commits - Save your progress frequently
7. Push with `git push` - When you finish, the system evaluates your `PROMPTS.md`
---

## Next Steps

Start with the first exercise: Exercise 1.1: Introduction to SQLite

Or jump to the capstone project: Capstone Project: Big Data Pipeline with Docker