Module 09: Cloud Engineering with LocalStack¶
Introduction¶
Cloud Computing has revolutionized the way we deploy and scale applications. However, learning AWS, Azure, or GCP has a barrier: cost. A mistake in production can generate unexpected bills.
LocalStack solves this problem by simulating AWS services locally. You can learn S3, Lambda, DynamoDB, Kinesis, and more without spending a cent. And because LocalStack mimics the real AWS APIs, code that works on LocalStack will usually run unchanged on real AWS.
Terraform is the standard tool for "Infrastructure as Code" (IaC). Instead of clicking through web consoles, you define your infrastructure in text files that you can version, review, and reuse.
Fundamental Concepts¶
Cloud Computing: The 3 Models¶
| Model | Description | Example |
|---|---|---|
| IaaS | Infrastructure as a Service | EC2, VMs |
| PaaS | Platform as a Service | Elastic Beanstalk, Heroku |
| SaaS | Software as a Service | Gmail, Salesforce |
Key AWS Services for Big Data¶
| Service | Purpose | Open Source Equivalent |
|---|---|---|
| S3 | Object storage (Data Lake) | MinIO |
| Lambda | Serverless functions | OpenFaaS |
| Kinesis | Data streaming | Kafka |
| DynamoDB | NoSQL database | MongoDB |
| EventBridge | Event orchestration | Cron + Kafka |
| IAM | Access control | - |
Data Lake Architecture (Medallion)¶
┌─────────────────────────────────────────────────────────────┐
│ DATA LAKE (S3) │
├───────────────┬───────────────────┬─────────────────────────┤
│ BRONZE │ SILVER │ GOLD │
│ (Raw Data) │ (Cleaned Data) │ (Business-Ready) │
├───────────────┼───────────────────┼─────────────────────────┤
│ - Raw data    │ - Clean data      │ - Aggregations          │
│ - JSON/CSV    │ - Parquet         │ - KPIs                  │
│ - No schema   │ - Schema enforced │ - Dashboards            │
│ - Append-only │ - Deduplicated    │ - ML-ready              │
└───────────────┴───────────────────┴─────────────────────────┘
Required Tools¶
- Docker and Docker Compose: For LocalStack
- Python 3.9+: Main language
- Terraform: Infrastructure as code
- awscli-local: AWS CLI for LocalStack
- boto3: AWS SDK for Python
Dependency Installation¶
```bash
# Python
pip install boto3 requests

# Terraform (Windows with Chocolatey)
choco install terraform

# Terraform (Linux/Mac)
brew install terraform

# AWS CLI Local
pip install awscli-local
```
Challenge 1: Set Up LocalStack¶
Objective: Create a functional LocalStack environment with Docker.
Difficulty: Basic
Instructions¶
1. Create a directory for the project.
2. Create a `docker-compose.yml` file with LocalStack. The service must:
   - Use image `localstack/localstack:latest`
   - Expose port 4566 (unified gateway)
   - Enable services: s3, lambda, dynamodb, events
3. Mount a volume for persistence.
4. Start and verify.
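The requirements above can be sketched as a minimal `docker-compose.yml`; the service name and volume path are assumptions, so adapt them to your project layout:

```yaml
# Minimal sketch of a LocalStack compose file (names are assumptions)
services:
  localstack:
    image: localstack/localstack:latest
    ports:
      - "4566:4566"            # unified gateway for all services
    environment:
      - SERVICES=s3,lambda,dynamodb,events
      - LOCALSTACK_HOST=localhost
    volumes:
      - "./volume:/var/lib/localstack"   # persistence between restarts
```

Then start with `docker compose up -d` and check the health endpoint, e.g. `curl http://localhost:4566/_localstack/health`.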
Success Criteria¶
- LocalStack container running
- Health endpoint responds with active services
- You can list S3 buckets (empty): `awslocal s3 ls`
Hints¶
- The `SERVICES` variable defines which services to activate
- `LOCALSTACK_HOST=localhost` for local connections
- Port 4566 is the gateway for all services
Challenge 2: Create an S3 Bucket with Terraform¶
Objective: Define an S3 bucket using Terraform and deploy it to LocalStack.
Difficulty: Basic
Instructions¶
1. Create a `main.tf` file.
2. Configure the AWS provider for LocalStack.
3. Define an S3 bucket.
4. Run Terraform.
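The steps above can be sketched in a single `main.tf`. This is a hypothetical configuration, not the course solution: the resource name `data_lake` and the region are assumptions, and in LocalStack the credentials can be any value:

```hcl
# Hypothetical main.tf for LocalStack (names and region are assumptions)
provider "aws" {
  region                      = "us-east-1"
  access_key                  = "test"   # any value works in LocalStack
  secret_key                  = "test"
  skip_credentials_validation = true
  skip_metadata_api_check     = true
  skip_requesting_account_id  = true
  s3_use_path_style           = true

  endpoints {
    s3 = "http://localhost:4566"
  }
}

resource "aws_s3_bucket" "data_lake" {
  bucket = "mi-data-lake"
}
```

After `terraform init` and `terraform apply`, the bucket should appear in `awslocal s3 ls`.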
Success Criteria¶
- `terraform init` downloads the provider
- `terraform plan` shows the bucket to be created
- `terraform apply` creates the bucket
- `awslocal s3 ls` shows "mi-data-lake"
Hints¶
- Endpoints must point to `http://localhost:4566`
- `access_key` and `secret_key` can be any value in LocalStack
- Use `terraform destroy` to clean up resources
Challenge 3: Your First Lambda (Hello World)¶
Objective: Create a Lambda function and deploy it with Terraform.
Difficulty: Intermediate
Instructions¶
1. Create a `lambdas/` folder with a `hello.py` file.
2. Package the Lambda into a ZIP.
3. Add the Lambda resource to `main.tf`.
4. Apply and test.
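A minimal `hello.py` consistent with this challenge might look like the sketch below; the message text is an assumption:

```python
import json


def handler(event, context):
    """Lambda entry point -- referenced in Terraform as hello.handler."""
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Hello from LocalStack!"}),
    }
```

Package it with something like `zip lambdas/hello.zip -j lambdas/hello.py` before applying Terraform.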
Success Criteria¶
- Lambda created in LocalStack
- Invocation returns statusCode 200
- The body contains your message
Hints¶
- In LocalStack, the role can be a fictitious ARN
- The handler is `filename.function_name`
- Use `source_code_hash` to detect changes in the ZIP
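Putting the hints together, the Terraform resource might be sketched as follows; the function and file names are assumptions:

```hcl
# Hypothetical Lambda resource for main.tf (names are assumptions)
resource "aws_lambda_function" "hello" {
  function_name    = "hello"
  filename         = "lambdas/hello.zip"
  source_code_hash = filebase64sha256("lambdas/hello.zip")  # redeploy on ZIP change
  handler          = "hello.handler"                        # filename.function_name
  runtime          = "python3.9"
  role             = "arn:aws:iam::000000000000:role/lambda-role"  # fictitious ARN is fine in LocalStack
}
```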
Challenge 4: Lambda that Consumes an External API¶
Objective: Create a Lambda that captures ISS data in real time.
Difficulty: Intermediate
Instructions¶
1. Create `lambdas/capturar_iss.py`:

   ```python
   import json
   import urllib.request

   ISS_API = "https://api.wheretheiss.at/v1/satellites/25544"

   def handler(event, context):
       """Capture the current position of the ISS"""
       # Implement:
       # 1. Make a request to the API
       # 2. Parse the JSON
       # 3. Extract: latitude, longitude, altitude, velocity
       # 4. Return the formatted data
       pass
   ```

2. Note: the Lambda runtime does not include `requests`; use `urllib.request` from the standard library.
3. Deploy and test the Lambda.
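One possible stdlib-only approach is to separate the network call from the field extraction, which also makes the extraction easy to test offline. This is a sketch, not the official solution; the helper names are assumptions:

```python
import json
import urllib.request

ISS_API = "https://api.wheretheiss.at/v1/satellites/25544"


def parse_position(raw: dict) -> dict:
    """Keep only the fields the challenge asks for."""
    return {k: raw[k] for k in ("latitude", "longitude", "altitude", "velocity")}


def fetch_iss_position(url: str = ISS_API, timeout: float = 10.0) -> dict:
    """Fetch the current ISS position using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_position(json.loads(resp.read().decode("utf-8")))
```

Passing an explicit `timeout` matters because, as noted in the hints, the Lambda's own default timeout is short.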
Success Criteria¶
- Lambda deploys correctly
- Returns the current ISS position
- Data includes lat, lon, alt, velocity
Hints¶
- Lambda has library limitations - use the standard library
- The default timeout is 3 seconds; this Lambda may need more
- Handle network exceptions
Challenge 5: Save Data to S3¶
Objective: Modify the Lambda to save ISS data to S3.
Difficulty: Intermediate
Instructions¶
1. Modify `capturar_iss.py` to:
   - Connect to S3 using boto3
   - Save each capture as JSON in the bucket
   - Use the path `raw/iss/{date}/{timestamp}.json`
2. Define the structure of the saved file.
3. Configure boto3 for LocalStack.
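A possible shape for the save step is sketched below. The function names and the default bucket are assumptions; note the LocalStack endpoint uses `host.docker.internal` because the Lambda runs inside Docker:

```python
import datetime
import json


def build_key(prefix: str = "raw/iss") -> str:
    """Build the S3 key raw/iss/{date}/{timestamp}.json."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return f"{prefix}/{now:%Y-%m-%d}/{int(now.timestamp())}.json"


def save_capture(data: dict, bucket: str = "mi-data-lake",
                 endpoint: str = "http://host.docker.internal:4566") -> str:
    """Write one ISS capture to S3 on LocalStack; returns the object key."""
    import boto3  # boto3 is preinstalled in the Lambda runtime

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,        # LocalStack gateway as seen from inside Docker
        aws_access_key_id="test",     # any value works against LocalStack
        aws_secret_access_key="test",
        region_name="us-east-1",
    )
    key = build_key()
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(data))  # Body must be str/bytes
    return key
```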
Success Criteria¶
- Lambda saves file to S3
- Path includes date and timestamp
- File contains valid ISS data
- You can list files: `awslocal s3 ls s3://bucket/raw/iss/ --recursive`
Hints¶
- In Docker, use `host.docker.internal` to access LocalStack
- `s3.put_object(Bucket, Key, Body)` to save
- `Body` must be a string or bytes
Challenge 6: Scheduling with EventBridge¶
Objective: Schedule the Lambda to run automatically every minute.
Difficulty: Advanced
Instructions¶
1. Add an EventBridge rule to `main.tf`.
2. Connect the rule to the Lambda.
3. Add permissions so that EventBridge can invoke the Lambda.
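The three steps map onto three Terraform resources. The sketch below assumes the Lambda from Challenge 4 is declared as `aws_lambda_function.capturar_iss`; the rule and permission names are assumptions:

```hcl
# Hypothetical EventBridge wiring (resource names are assumptions)
resource "aws_cloudwatch_event_rule" "cada_minuto" {
  name                = "capturar-iss-cada-minuto"
  schedule_expression = "rate(1 minute)"
}

resource "aws_cloudwatch_event_target" "iss_target" {
  rule = aws_cloudwatch_event_rule.cada_minuto.name
  arn  = aws_lambda_function.capturar_iss.arn
}

resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.capturar_iss.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.cada_minuto.arn
}
```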
Success Criteria¶
- EventBridge rule created
- Lambda runs automatically
- Files appear in S3 every minute
Hints¶
- Use `schedule_expression = "rate(1 minute)"` or a cron expression
- The permission requires `statement_id`, `action`, `function_name`, `principal`
- Check logs: `awslocal logs tail /aws/lambda/capturar-iss`
Challenge 7: DynamoDB for Metadata¶
Objective: Create a DynamoDB table to store capture metadata.
Difficulty: Advanced
Instructions¶
1. Add a DynamoDB table in Terraform.
2. Modify the Lambda to save metadata:

   ```python
   dynamodb = boto3.resource(
       'dynamodb',
       endpoint_url='http://host.docker.internal:4566',
       # ...
   )
   table = dynamodb.Table('capturas-iss')
   table.put_item(Item={
       'capture_id': timestamp,
       's3_path': f's3://bucket/raw/iss/{fecha}/{timestamp}.json',
       'latitude': data['latitude'],
       'longitude': data['longitude'],
       # ...
   })
   ```
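The table from step 1 could be declared like this; the billing mode and resource name are assumptions, while the table name and key match the Lambda snippet:

```hcl
# Hypothetical DynamoDB table for main.tf (billing mode is an assumption)
resource "aws_dynamodb_table" "capturas_iss" {
  name         = "capturas-iss"
  billing_mode = "PAY_PER_REQUEST"  # no capacity planning needed locally
  hash_key     = "capture_id"

  attribute {
    name = "capture_id"
    type = "S"
  }
}
```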
Success Criteria¶
- DynamoDB table created
- Lambda saves metadata on each execution
- You can query items: `awslocal dynamodb scan --table-name capturas-iss`
FINAL Challenge: Tracker Dashboard¶
Objective: Create a web visualization that shows the ISS position in real time.
Difficulty: Advanced
Evaluation Criteria¶
| Criterion | Points |
|---|---|
| Map with current ISS position | 20 |
| Custom ISS icon | 10 |
| Automatic update (every 5-10 sec) | 20 |
| Trajectory/movement trail | 15 |
| Live data (lat, lon, alt, vel) | 15 |
| Flyover predictor for a city | 10 |
| Professional design | 10 |
| Total | 100 |
Technical Requirements¶
- HTML5 + vanilla JavaScript
- Leaflet.js for maps
- Fetch API for live data
- No Jupyter/Colab dependencies
Suggestions¶
- Use the API directly: `https://api.wheretheiss.at/v1/satellites/25544`
- Nominatim for geocoding cities
- SVG for the ISS icon
Submission¶
- Self-contained HTML file
- Screenshot showing it working
- Brief documentation
Reference: You can see an example of a professional tracker at ISS Tracker, but the challenge is to create your own version.
From LocalStack to Real AWS¶
When you are ready for production:
Required Changes¶
1. Use real credentials.
2. Remove the local endpoints.
3. Real IAM roles:
   - Create roles with minimum required permissions
   - Use managed policies when possible
4. Cost considerations:
   - S3: $0.023/GB/month
   - Lambda: 1M invocations free, then $0.20/1M
   - DynamoDB: Pay-per-request or provisioned
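For contrast with the LocalStack provider, a production provider block is much shorter; this sketch assumes credentials come from the environment or an AWS profile rather than being hard-coded:

```hcl
# Hypothetical production provider: no endpoints block, no hard-coded keys
provider "aws" {
  region = "us-east-1"
  # Credentials are resolved from the environment, shared config,
  # or an instance role -- never committed to version control.
}
```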
Resources and References¶
Data APIs¶
- ISS Position: `https://api.wheretheiss.at/v1/satellites/25544`
- ISS Astronauts: `http://api.open-notify.org/astros.json`
- Geocoding: `https://nominatim.openstreetmap.org/search`
---
Course: Big Data with Python - From Zero to Production
Instructor: Juan Marcelo Gutierrez Miranda | @TodoEconometria
Hash ID: 4e8d9b1a5f6e7c3d2b1a0f9e8d7c6b5a4f3e2d1c0b9a8f7e6d5c4b3a2f1e0d9c
Methodology: Progressive exercises with real data and professional tools
Academic References:
- Wittig, M., & Wittig, A. (2019). Amazon Web Services in Action (2nd ed.). Manning Publications. ISBN: 978-1617295119.
- Brikman, Y. (2019). Terraform: Up & Running (2nd ed.). O'Reilly Media. ISBN: 978-1492046905.
- Chauhan, A. (2020). Infrastructure as Code (IaC) for Beginners. Medium - Towards Data Science.
- Jonas, E., et al. (2019). Cloud programming simplified: A Berkeley view on serverless computing. arXiv preprint arXiv:1902.03383.
- LocalStack Team (2024). LocalStack Documentation. https://docs.localstack.cloud/