Your First Exercise‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‍‌‌‌‍‌‌‌‌‍‌‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‌¶

This guide will take you step by step through the process of completing and submitting your first exercise.‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‍‌‌‌‍‌‌‌‌‍‌‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‌

General Workflow¶

graph LR
    A[Fork Repo] --> B[Clone to your PC]
    B --> C[Work on Exercise]
    C --> D[Document PROMPTS.md]
    D --> E[Commit Changes]
    E --> F[Push to your Fork]
    F --> G[Automatic Evaluation]

PROMPTS-Based Evaluation System

Pull Requests are NOT used. The system evaluates your PROMPTS.md file directly in your fork. You only need to do git push.

Step 1: Open the Project in PyCharm¶

Open PyCharm
File → Open...
Select the ejercicios-bigdata/ folder
Click "OK"

First time in PyCharm?

PyCharm will ask if you trust the project. Click "Trust Project".

Step 2: Configure the Python Interpreter¶

PyCharm should automatically detect the virtual environment. If it doesn't:

File → Settings (Windows/Linux) or PyCharm → Preferences (macOS)
Project: ejercicios-bigdata → Python Interpreter
Click the gear icon → Add
Select "Existing environment"
Browse to .venv/Scripts/python.exe (Windows) or .venv/bin/python (macOS/Linux)
Click "OK"

Step 3: Navigate to Your First Exercise¶

In PyCharm's file explorer:

ejercicios-bigdata/
└── ejercicios/
    └── 01_cargar_sqlite.py  ← Open this file

Exercise Structure

Each exercise has:

Base code: .py file with instructions
Data: datos/ folder with datasets
README: Detailed explanation of the exercise

Step 4: Read the Problem Statement¶

IMPORTANT: Read the ENTIRE file before you start coding.

The exercise will have sections like:

"""
Exercise 01: Data Loading with SQLite

OBJECTIVE:
Learn to load data from CSV into an SQLite database

DATASET:
- File: datos/muestra_taxi.csv
- Size: ~10MB
- Records: ~100,000

TASKS:
1. Load CSV in chunks into SQLite
2. Create indexes to optimize queries
3. Run analysis queries
4. Export results

ESTIMATED TIME: 2-3 hours
"""

Step 5: Create a Working Branch¶

NEVER work directly on main. Always create a branch:

# Make sure you are on main and up to date
git checkout main
git pull origin main

# Create a branch with your last name and exercise number
git checkout -b garcia-ejercicio-01

# Verify you are on the correct branch
git branch
# Should show: * garcia-ejercicio-01

Branch Naming Convention

Use the format: your-lastname-exercise-XX

Examples: - garcia-ejercicio-01 - martinez-ejercicio-02

Step 6: Work on the Exercise¶

Edit the Code¶

Open ejercicios/01_cargar_sqlite.py and start working.

Code Example

import sqlite3
import pandas as pd

# Task 1: Load CSV in chunks
def cargar_datos_sqlite(csv_path, db_path, chunksize=10000):
    """
    Loads a large CSV into SQLite in chunks to avoid memory issues
    """
    conn = sqlite3.connect(db_path)

    # Read CSV in parts
    chunks = pd.read_csv(csv_path, chunksize=chunksize)

    for i, chunk in enumerate(chunks):
        chunk.to_sql('trips', conn, if_exists='append', index=False)
        print(f"Chunk {i+1} loaded ({len(chunk)} records)")

    conn.close()
    print("Loading complete!")

# Execute
if __name__ == "__main__":
    cargar_datos_sqlite(
        csv_path='datos/muestra_taxi.csv',
        db_path='datos/taxi.db'
    )

Test Your Code¶

Run your code frequently to verify it works:

From PyCharmFrom Terminal

Right-click on the file
Run 'ejercicio_01'
Or press Shift + F10

# Make sure the virtual environment is activated
python ejercicios/01_cargar_sqlite.py

Debug Frequently

Don't write all the code at once. Write a function, test it, and continue.

Step 7: Save Your Work with Git¶

When you have significant progress (for example, you completed a task):

# See which files you changed
git status

# Add the modified files
git add ejercicios/01_cargar_sqlite.py

# Commit with a descriptive message
git commit -m "Implement CSV to SQLite chunk loading"

# Continue working...

Good Commit Messages

GOOD: - "Implement CSV to SQLite chunk loading" - "Add indexes to optimize queries" - "Complete revenue analysis by hour"

BAD: - "update" - "fix" - "asdfasdf"

Step 8: Push to GitHub¶

When you have completed the exercise:

# Make a final commit
git add .
git commit -m "Complete exercise 01: SQLite data loading"

# Push your branch to GitHub
git push origin garcia-ejercicio-01

First time pushing?

Git will ask for authentication. Use your GitHub username and password, or configure SSH keys.

Step 9: Verify Your Submission¶

Go to your fork on GitHub: https://github.com/YOUR_USERNAME/ejercicios-bigdata
Navigate to your submission folder
Verify that all your files are there, especially PROMPTS.md

Submission Completed

You don't need to do anything else. The system evaluates your PROMPTS.md automatically.

Step 10: The PROMPTS.md File¶

This is the most important file of your submission.

Document your AI prompts as you work:

# AI Prompts - Exercise 01

## Prompt A: Load data into SQLite

**AI used:** ChatGPT / Claude / etc.

**Exact prompt:**
> how do i load a large csv into sqlite using python with chunks

---

## Prompt B: Optimize queries

[Same format...]

---

## Final Blueprint

[When finished, ask the AI for a summary of what you built]

DO NOT clean your prompts

Paste your prompts EXACTLY as you wrote them, with errors and all. The system detects if they were "cleaned".

Best Practices¶

Clean Code¶

# ✅ GOOD - Readable code with comments
def calcular_promedio_tarifas(db_path):
    """
    Calculates the average fare by hour of day

    Args:
        db_path: Path to the SQLite database

    Returns:
        DataFrame with average fares by hour
    """
    conn = sqlite3.connect(db_path)

    query = """
        SELECT
            strftime('%H', pickup_datetime) as hora,
            AVG(total_amount) as promedio_tarifa
        FROM trips
        GROUP BY hora
        ORDER BY hora
    """

    resultado = pd.read_sql_query(query, conn)
    conn.close()

    return resultado

# ❌ BAD - No documentation, confusing names
def calc(p):
    c = sqlite3.connect(p)
    r = pd.read_sql_query("SELECT strftime('%H', pickup_datetime) as h, AVG(total_amount) as t FROM trips GROUP BY h", c)
    c.close()
    return r

Atomic Commits¶

Make small and specific commits:

# ✅ GOOD - Small and descriptive commits
git commit -m "Add data loading function"
git commit -m "Implement index creation"
git commit -m "Add analysis queries"

# ❌ BAD - One giant commit
git commit -m "Entire exercise"

Test Before Uploading¶

# Always verify it works before pushing
python ejercicios/01_cargar_sqlite.py

# If it works, then push
git push origin garcia-ejercicio-01

Exercise Checklist¶

Before uploading your work (git push), verify:

The code runs without errors
All exercise tasks are complete
The code is documented (comments, docstrings)
Commits have descriptive messages
The code follows Python best practices
You tested with the complete dataset

Common Problems¶

Error: ModuleNotFoundError: No module named 'pandas'

Cause: Virtual environment not activated or dependencies not installed.

Solution:

# Activate virtual environment
source .venv/bin/activate  # macOS/Linux
.venv\Scripts\activate      # Windows

# Install dependencies
pip install -r requirements.txt

Git says: 'Your branch is behind origin/main'

Cause: Your local main branch is outdated.

Solution:

git checkout main
git pull origin main
git checkout garcia-ejercicio-01
git merge main

Cannot push: 'Permission denied'

Cause: Authentication issues with GitHub.

Solution: Configure SSH keys or use a Personal Access Token.

See: GitHub Authentication

PyCharm can't find the data

Cause: Incorrect relative path.

Solution: Use relative paths from the project root:

# ✅ CORRECT
csv_path = 'datos/muestra_taxi.csv'

# ❌ WRONG
csv_path = '../datos/muestra_taxi.csv'

‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‍‌‌‌‍‌‌‌‌‍‌‌‍‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‍‌‌‌‌‍‌‌‌‌---

Next Steps¶

Once you have completed your first exercise:

Sync Fork - Keep your fork up to date
Course Roadmap - See all available exercises
Useful Commands - Git Cheatsheet