🚀 Deployment Guide
Prerequisites
System Requirements
- OS: Linux (Ubuntu 20.04+), macOS, or Windows with WSL2
- RAM: Minimum 8GB, Recommended 16GB+
- CPU: Minimum 4 cores, Recommended 8+ cores
- Storage: Minimum 50GB free space
- Network: Internet connection for Docker image downloads
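Software Requirements
- Docker: Version 20.10+
- Docker Compose: Version 2.0+ (this guide writes commands as docker-compose; if only the Compose v2 plugin is installed, run them as docker compose)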
- Git: For cloning the repository
- curl: For health checks
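The software requirements above can be checked before installing anything else. The following is a minimal sketch, assuming a sort that supports -V (GNU coreutils or recent BSD) and Compose v2 reachable as docker compose; version_ge, require_cmd and check_prerequisites are hypothetical helper names, not part of the project.

```shell
# Hypothetical prerequisite check (not shipped with the project).

# version_ge A B: succeed if version A >= version B.
# sort -V orders version strings; if B sorts first (or ties), then A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# require_cmd NAME: fail with a message if NAME is not on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing command: $1" >&2; return 1; }
}

# check_prerequisites: verify the tools and the Docker/Compose minimum versions.
check_prerequisites() {
  local status=0 cmd docker_ver compose_ver
  for cmd in docker git curl; do
    require_cmd "$cmd" || status=1
  done
  # Reporting the server version requires a running Docker daemon.
  docker_ver=$(docker version --format '{{.Server.Version}}' 2>/dev/null)
  if ! version_ge "${docker_ver:-0}" 20.10; then
    echo "Docker 20.10+ required (found: ${docker_ver:-none})" >&2
    status=1
  fi
  # Assumes the Compose v2 plugin; strip a leading "v" if one is printed.
  compose_ver=$(docker compose version --short 2>/dev/null)
  compose_ver=${compose_ver#v}
  if ! version_ge "${compose_ver:-0}" 2.0; then
    echo "Docker Compose 2.0+ required (found: ${compose_ver:-none})" >&2
    status=1
  fi
  return "$status"
}

# Usage: check_prerequisites && echo "prerequisites OK"
```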
Installation Commands
Ubuntu/Debian
# Update package list
sudo apt update
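# Install Docker and the Compose v2 plugin
# (docker-compose-plugin is published in Docker's official apt repository,
# download.docker.com; add that repository first if apt cannot find the package)
sudo apt install -y docker.io docker-compose-plugin
# Install Git and curl
sudo apt install -y git curl
# Start Docker service
sudo systemctl start docker
sudo systemctl enable docker
# Add user to docker group (log out and back in for the change to take effect)
sudo usermod -aG docker "$USER"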
macOS
# Install Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install Docker Desktop (launch it once afterwards so the Docker engine starts)
brew install --cask docker
# Install Git and curl
brew install git curl
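Windows (WSL2)
# Install Docker Desktop for Windows and enable WSL 2 integration for your distribution
# Download from: https://www.docker.com/products/docker-desktop
# Install Git (the winget commands run in PowerShell on the Windows host)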
winget install Git.Git
# Install curl
winget install cURL.cURL
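Quick Start Deployment
1. Clone Repository
# Clone the repository and change into it (substitute the project's actual URL and directory)
git clone <repository-url>
cd <repository-directory>
2. Environment Setup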
# Create environment file (optional)
cp .env.example .env
# Edit environment variables if needed
nano .env
3. Start Services
# Start all services in the background
docker-compose up -d
4. Verify Deployment
# Check service health
curl -f http://localhost:8080/v1/info # Trino
curl -f http://localhost:3030/health # Dagster
curl -f http://localhost:9000/minio/health/live # MinIO
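Each curl above probes its endpoint once, so it fails while a service is still starting. A small retry loop can wrap the same URLs; this is a sketch assuming that 30 attempts 10 seconds apart is enough for this stack, and wait_for_http is a hypothetical helper name.

```shell
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl gets a non-error HTTP response or attempts run out.
wait_for_http() {
  local url=$1 attempts=${2:-30} delay=${3:-10} i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP >= 400; -sS: quiet but keep errors; --max-time caps each try.
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}

# Usage with the endpoints above:
# for url in http://localhost:8080/v1/info http://localhost:3030/health \
#            http://localhost:9000/minio/health/live; do
#   wait_for_http "$url" 30 10 || exit 1
# done
```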
Detailed Deployment Steps
Step 1: Infrastructure Services
PostgreSQL Database
# Start PostgreSQL
docker-compose up -d postgres
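# Wait for database initialization (this follows the log; press Ctrl+C once it reports "database system is ready to accept connections")
docker-compose logs -f postgres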
# Verify connection
docker exec -it postgres psql -U postgres -c "SELECT version();"
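Following the log waits until you interrupt it. For scripted deployments, pg_isready (also used under Troubleshooting) exits successfully once the server accepts connections, so a generic retry wrapper can poll it instead. This is a minimal sketch; retry is a hypothetical helper name, and -T disables TTY allocation for non-interactive use.

```shell
# retry ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it succeeds or ATTEMPTS runs are used up, sleeping DELAY seconds between runs.
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage: wait up to about a minute for PostgreSQL to accept connections.
# retry 30 2 docker-compose exec -T postgres pg_isready -U postgres
```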
MinIO Object Storage
# Start MinIO
docker-compose up -d minio minio-setup
# Wait for bucket creation
docker-compose logs -f minio-setup
# Verify buckets
docker exec -it minio-setup-bucket /usr/bin/mc ls minio/
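To confirm that every bucket named in S3_BUCKET_LIST (see Core Configuration) was created, each one can be listed individually. This sketch assumes, as the command above does, that the minio-setup-bucket container is still running with an mc alias named minio, and that mc ls exits non-zero for a missing bucket; bucket_names and check_buckets are hypothetical names.

```shell
# bucket_names LIST: split a comma-separated S3_BUCKET_LIST into one name per line,
# dropping empty entries (e.g. from a trailing comma).
bucket_names() {
  printf '%s\n' "$1" | tr ',' '\n' | sed '/^[[:space:]]*$/d'
}

# check_buckets LIST: list each bucket through the setup container's mc alias.
check_buckets() {
  local missing=0 b
  for b in $(bucket_names "$1"); do
    if docker exec minio-setup-bucket /usr/bin/mc ls "minio/$b" >/dev/null 2>&1; then
      echo "ok: $b"
    else
      echo "missing: $b" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_buckets "${S3_BUCKET_LIST:-datalake,logger,warehouse}"
```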
Step 2: Metadata Services
Hive Metastore
# Start Hive Metastore
docker-compose up -d hive-metastore
# Wait for initialization
docker-compose logs -f hive-metastore
# Verify metastore
docker exec -it hive-metastore hive --service metastore --version
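The metastore serves Thrift over TCP port 9083 (see Service Ports), so curl cannot probe it the way it probes the HTTP services. A bare TCP connect is a rough liveness check. This sketch assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command; port_open is a hypothetical name. An open port only shows that the process is listening, not that its backing database is healthy.

```shell
# port_open HOST PORT [TIMEOUT_SECONDS]
# Succeeds if a TCP connection to HOST:PORT can be opened within the timeout.
# HOST and PORT are passed as positional arguments to avoid shell injection.
port_open() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage:
# port_open localhost 9083 && echo "Hive Metastore is accepting connections"
```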
Apache Ranger
# Start Ranger
docker-compose up -d ranger
# Wait for initialization (may take 5-10 minutes)
docker-compose logs -f ranger
# Verify Ranger UI
curl -f http://localhost:6080
Step 3: Processing Services
Apache Spark
# Start Spark cluster
docker-compose up -d spark-master spark-worker spark-worker-b spark-worker-c
# Verify Spark cluster
docker exec -it spark-driver spark-submit --version
# Check Spark UI
curl -f http://localhost:8081
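A successful curl against the master UI only shows that the master is up. To confirm that the three workers started above have registered, the master's JSON status page can be inspected. This sketch assumes the master UI is published on localhost:8081 (per Service Ports) and that its /json/ endpoint reports an aliveworkers field, as recent Spark standalone releases do; check this against your Spark version. alive_workers and expect_workers are hypothetical names.

```shell
# alive_workers: read the Spark master's JSON status on stdin and print the
# "aliveworkers" count (works for compact or pretty-printed JSON).
alive_workers() {
  sed -n 's/.*"aliveworkers"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# expect_workers N: fail unless the master reports at least N alive workers.
expect_workers() {
  local n
  n=$(curl -fsS --max-time 5 http://localhost:8081/json/ 2>/dev/null | alive_workers)
  [ "${n:-0}" -ge "$1" ]
}

# Usage: expect_workers 3 || echo "not all Spark workers have registered"
```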
Trino Query Engine
# Start Trino
docker-compose up -d trino
# Wait for initialization
docker-compose logs -f trino
# Verify Trino
curl -f http://localhost:8080/v1/info
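The /v1/info endpoint can answer with HTTP 200 while the coordinator is still starting; its JSON body includes a "starting" flag that turns false once Trino is ready to accept queries. This minimal sketch waits on that flag, assuming the coordinator is published on localhost:8080; trino_ready and wait_for_trino are hypothetical names.

```shell
# trino_ready: succeed if the /v1/info JSON on stdin reports "starting": false.
trino_ready() {
  grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# wait_for_trino [ATTEMPTS]: poll /v1/info every 5 seconds until Trino is ready.
wait_for_trino() {
  local attempts=${1:-30} i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 5 http://localhost:8080/v1/info 2>/dev/null | trino_ready; then
      echo "Trino coordinator is ready"
      return 0
    fi
    sleep 5
    i=$((i + 1))
  done
  return 1
}

# Usage: wait_for_trino 60 || echo "Trino did not finish starting"
```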
Step 4: Orchestration and Analytics
Dagster Orchestration
# Start Dagster
docker-compose up -d dagster
# Wait for initialization
docker-compose logs -f dagster
# Verify Dagster UI
curl -f http://localhost:3030
Apache Superset
# Start Superset
docker-compose up -d superset-bi
# Wait for initialization (may take 5-10 minutes)
docker-compose logs -f superset-bi
# Verify Superset UI
curl -f http://localhost:8088
Hue SQL Interface
# Start Hue
docker-compose up -d hue
# Wait for initialization
docker-compose logs -f hue
# Verify Hue UI
curl -f http://localhost:8888
Configuration
Environment Variables
Core Configuration
# MinIO Configuration
MINIO_ENDPOINT=http://minio:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
S3_BUCKET_LIST=datalake,logger,warehouse
# Database Configuration
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgrespassword
HIVE_DB_USER=hiveuser
HIVE_DB_PASSWORD=hivepassword
RANGER_DB_PASS=rangerpassword
HUE_DB_PASS=huepassword
# Warehouse Configuration
WAREHOUSE_DIR=s3a://warehouse/
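A missing or empty variable in .env usually surfaces later as an obscure container failure. The sketch below reports which of the core variables are unset or empty after loading .env into the current shell; it is POSIX sh, check_env_vars is a hypothetical name, and loading .env with "." assumes simple KEY=VALUE lines without spaces in the values.

```shell
# check_env_vars NAME...: report variables that are unset or empty; returns 1 if any are.
check_env_vars() {
  local name value missing=0
  for name in "$@"; do
    # Only accept shell identifiers, because the name is passed to eval.
    case $name in
      ''|[0-9]*|*[!A-Za-z0-9_]*) echo "invalid name: $name" >&2; missing=1; continue ;;
    esac
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: load .env, then check the Core Configuration variables.
# set -a; . ./.env; set +a
# check_env_vars MINIO_ENDPOINT MINIO_ACCESS_KEY MINIO_SECRET_KEY S3_BUCKET_LIST \
#   POSTGRES_USER POSTGRES_PASSWORD HIVE_DB_USER HIVE_DB_PASSWORD \
#   RANGER_DB_PASS HUE_DB_PASS WAREHOUSE_DIR
```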
Service-Specific Configuration
# Spark Configuration
SPARK_MODE=master
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=1G
SPARK_EXECUTOR_MEMORY=1G
SPARK_DRIVER_MEMORY=1G
# Ranger Configuration
RANGER_SERVICE_NAME=hivedev
RANGER_POLICY_REST_URL=http://ranger:6080
RANGER_POLICY_CACHE_DIR=/tmp/ranger
# Policy poll interval in milliseconds (60000 ms = 60 s)
RANGER_POLICY_POLL_INTERVAL=60000
Service Ports
| Service | Port | Protocol | Purpose |
|---|---|---|---|
| Trino | 8080 | HTTP | SQL query interface |
| Dagster | 3030 | HTTP | Workflow orchestration |
| Superset | 8088 | HTTP | BI dashboard |
| MinIO | 9000 | HTTP | Object storage API |
| MinIO Console | 9001 | HTTP | Storage management |
| Hue | 8888 | HTTP | SQL query interface |
| Ranger | 6080 | HTTP | Security management |
| Spark Master | 8081 | HTTP | Spark cluster UI |
| Spark Worker 1 | 8082 | HTTP | Worker UI |
| Spark Worker 2 | 8083 | HTTP | Worker UI |
| Spark Worker 3 | 8084 | HTTP | Worker UI |
| PostgreSQL | 5432 | TCP | Database |
| Hive Metastore | 9083 | TCP | Metadata service |
Service Management
Starting Services
# Start all services
docker-compose up -d
# Start specific service
docker-compose up -d trino
# Start in the foreground with logs attached (Ctrl+C stops the service)
docker-compose up trino
Stopping Services
# Stop all services
docker-compose down
# Stop specific service
docker-compose stop trino
# Stop and remove volumes (WARNING: deletes all persisted data)
docker-compose down -v
Restarting Services
# Restart all services
docker-compose restart
# Restart specific service
docker-compose restart trino
# Force recreate service
docker-compose up -d --force-recreate trino
Monitoring Services
# View service status
docker-compose ps
# View service logs
docker-compose logs -f trino
# View resource usage
docker stats
# Check service health
docker-compose exec trino curl -f http://localhost:8080/v1/info
Data Initialization
1. Create Initial Schemas
# Connect to Trino
docker exec -it trino trino --server http://localhost:8080
-- Create schemas (enter these statements at the trino> prompt)
CREATE SCHEMA IF NOT EXISTS iceberg.asset_property;
CREATE SCHEMA IF NOT EXISTS iceberg.flight_radar;
CREATE SCHEMA IF NOT EXISTS iceberg.ecommerce;
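The same schemas can be created without an interactive session, which is convenient in setup scripts. This sketch generates the statements and passes them to the Trino CLI's --execute option in one call; schema_sql is a hypothetical helper, and the container name trino follows the command above.

```shell
# schema_sql CATALOG SCHEMA...: print one CREATE SCHEMA IF NOT EXISTS statement per schema.
schema_sql() {
  local catalog=$1 s
  shift
  for s in "$@"; do
    printf 'CREATE SCHEMA IF NOT EXISTS %s.%s;\n' "$catalog" "$s"
  done
}

# Usage: create the three schemas above non-interactively.
# docker exec trino trino --server http://localhost:8080 \
#   --execute "$(schema_sql iceberg asset_property flight_radar ecommerce)"
```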
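2. Run Initial Data Pipeline
# Trigger a Dagster run through the GraphQL API
# (<code-location> and <repository> are placeholders for your Dagster code location and repository names;
# launchRun returns a union type, so the result fields are selected per outcome)
curl -X POST http://localhost:3030/graphql \
-H "Content-Type: application/json" \
-d '{"query": "mutation { launchRun(executionParams: {selector: {repositoryLocationName: \"<code-location>\", repositoryName: \"<repository>\", pipelineName: \"asset_property_pipeline\"}}) { __typename ... on LaunchRunSuccess { run { runId } } ... on PythonError { message } } }"}'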
3. Verify Data
# Check MinIO Bronze layer
docker exec -it minio-setup-bucket /usr/bin/mc ls minio/datalake/bronze/
# Check MinIO Prepare layer
docker exec -it minio-setup-bucket /usr/bin/mc ls minio/datalake/prepare/
# Check Delta tables
docker exec -it trino trino --server http://localhost:8080 --execute "SHOW TABLES FROM spark_catalog.flight_radar_prepared;"
Troubleshooting
Common Issues
Service Won't Start
# Check logs
docker-compose logs <service-name>
# Check resource usage
docker stats
# Restart service
docker-compose restart <service-name>
Database Connection Issues
# Check PostgreSQL status
docker-compose exec postgres pg_isready -U postgres
# Check database exists
docker-compose exec postgres psql -U postgres -c "\l"
# Reset database (WARNING: down -v deletes the volumes of every service, not just PostgreSQL)
docker-compose down -v
docker-compose up -d postgres init-postgres
MinIO Connection Issues
# Check MinIO status
docker-compose exec minio mc admin info minio
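# Check Bronze layer objects (minio-setup-bucket is a container name, so use docker exec)
docker exec -it minio-setup-bucket /usr/bin/mc ls minio/datalake/bronze/
# Check Prepare layer objects
docker exec -it minio-setup-bucket /usr/bin/mc ls minio/datalake/prepare/
# Reset MinIO (WARNING: down -v deletes the volumes of every service, not just MinIO)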
docker-compose down -v
docker-compose up -d minio minio-setup
Spark Cluster Issues
# Check Spark master
docker-compose exec spark-driver spark-submit --version
# Check that a worker container responds (registered workers are listed in the Spark master UI on port 8081)
docker-compose exec spark-worker spark-submit --version
# Restart Spark cluster
docker-compose restart spark-master spark-worker spark-worker-b spark-worker-c
Performance Tuning
Memory Configuration
# Increase Spark memory
SPARK_WORKER_MEMORY=2G
SPARK_EXECUTOR_MEMORY=2G
SPARK_DRIVER_MEMORY=2G
# Increase Trino memory
TRINO_JVM_HEAP_SIZE=2G
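Before raising these values, it helps to add up what the JVM services may claim against the host's RAM (8 GB minimum, 16 GB+ recommended). This sketch converts JVM-style sizes to megabytes and sums them; it assumes bare numbers mean megabytes and ignores JVM overhead and non-JVM services such as PostgreSQL and MinIO, so treat the result as a lower bound. to_mb and sum_mb are hypothetical names.

```shell
# to_mb SIZE: convert a size such as 512M, 2G or 1T to megabytes (bare numbers = MB).
to_mb() {
  local num unit
  num=${1%[KkMmGgTt]}
  unit=${1#"$num"}
  # Reject anything that is not digits plus an optional single unit letter.
  case $num in ''|*[!0-9]*) return 1 ;; esac
  case $unit in
    ''|[Mm]) echo "$num" ;;
    [Kk]) echo $((num / 1024)) ;;
    [Gg]) echo $((num * 1024)) ;;
    [Tt]) echo $((num * 1024 * 1024)) ;;
  esac
}

# sum_mb SIZE...: total of all sizes in megabytes.
sum_mb() {
  local total=0 mb s
  for s in "$@"; do
    mb=$(to_mb "$s") || return 1
    total=$((total + mb))
  done
  echo "$total"
}

# Three workers at SPARK_WORKER_MEMORY=2G, a 2G driver and a 2G Trino heap:
# sum_mb 2G 2G 2G 2G 2G   # → 10240
```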
Storage Configuration
Log Management
# View all logs
docker-compose logs
# View specific service logs
docker-compose logs -f trino
# Save logs to file
docker-compose logs > deployment.log
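# Remove stopped containers, unused networks, dangling images and build cache (logs of removed containers are deleted with them)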
docker system prune -f
Production Deployment
Security Considerations
# Change default passwords
POSTGRES_PASSWORD=<secure-password>
MINIO_ACCESS_KEY=<secure-access-key>
MINIO_SECRET_KEY=<secure-secret-key>
# Enable SSL/TLS
TRINO_HTTPS_ENABLED=true
SUPERSET_HTTPS_ENABLED=true
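To catch credentials that were never changed, .env can be compared against the default values shown under Core Configuration. This sketch reads simple KEY=VALUE lines without sourcing the file (inline comments and quotes are not handled); env_value and find_default_secrets are hypothetical names.

```shell
# env_value FILE KEY: print the value of KEY from a dotenv-style FILE (last assignment wins).
env_value() {
  sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1
}

# find_default_secrets FILE: warn about credentials still set to the documented
# defaults; returns 1 if any are found.
find_default_secrets() {
  local file=$1 found=0 pair key default
  for pair in MINIO_ACCESS_KEY=minioadmin MINIO_SECRET_KEY=minioadmin \
              POSTGRES_PASSWORD=postgrespassword HIVE_DB_PASSWORD=hivepassword \
              RANGER_DB_PASS=rangerpassword HUE_DB_PASS=huepassword; do
    key=${pair%%=*}
    default=${pair#*=}
    if [ "$(env_value "$file" "$key")" = "$default" ]; then
      echo "default value still in use: $key" >&2
      found=1
    fi
  done
  return "$found"
}

# Usage: find_default_secrets .env || echo "rotate the credentials above before production use"
```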
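Scaling Configuration
# Add more Spark workers (these services must first be defined in docker-compose.yml)
docker-compose up -d spark-worker-d spark-worker-e
# Scale Trino workers (likewise, define the worker services first)
docker-compose up -d trino-worker-1 trino-worker-2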
# Increase MinIO storage
MINIO_STORAGE_SIZE=1T
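Backup Strategy
# Backup PostgreSQL (pg_dumpall covers every database in the instance, not only the default one; -T keeps a TTY from corrupting the output)
docker-compose exec -T postgres pg_dumpall -U postgres > backup.sql
# Backup MinIO data (/backup/ is inside the container; mount it as a volume so the copy reaches the host)
docker exec minio-setup-bucket /usr/bin/mc mirror minio/ /backup/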
# Backup configurations
tar -czf config-backup.tar.gz docker-compose.yml .env trino/etc/ hive/conf/
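Writing every dump to backup.sql overwrites the previous one. A minimal sketch that keeps timestamped dumps and pairs them with a restore step follows; backup_file, backup_postgres and restore_postgres are hypothetical names, and the restore is intended for an empty PostgreSQL instance, since replaying a pg_dumpall file over existing databases reports errors for objects that already exist.

```shell
# backup_file PREFIX EXT: build a timestamped name such as postgres-YYYYmmdd-HHMMSS.sql.
backup_file() {
  printf '%s-%s.%s\n' "$1" "$(date +%Y%m%d-%H%M%S)" "$2"
}

# backup_postgres DIR: dump all databases of the postgres service into DIR.
# -T disables the pseudo-TTY so the dump reaches the host file unmodified.
backup_postgres() {
  local out
  mkdir -p "$1" || return 1
  out="$1/$(backup_file postgres sql)"
  docker-compose exec -T postgres pg_dumpall -U postgres > "$out" && echo "$out"
}

# restore_postgres FILE: replay a pg_dumpall file through psql.
restore_postgres() {
  docker-compose exec -T postgres psql -U postgres < "$1"
}

# Usage:
# backup_postgres ./backups
# restore_postgres ./backups/postgres-<timestamp>.sql
```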
Maintenance
Regular Tasks
# Update Docker images
docker-compose pull
docker-compose up -d
# Clean up unused resources
docker system prune -f
# Monitor disk usage
docker system df
# Check service health
./scripts/health-check.sh
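The contents of ./scripts/health-check.sh are not shown in this guide. As a hypothetical stand-in, the sketch below probes the HTTP endpoints used earlier once each, prints one status line per service, and returns non-zero if any check fails, which makes it usable from cron or CI; health_check is an illustrative name.

```shell
# health_check NAME=URL...: probe each endpoint once and print a status line;
# returns 1 if any endpoint fails.
health_check() {
  local failed=0 entry name url
  for entry in "$@"; do
    name=${entry%%=*}
    url=${entry#*=}
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      printf '%-10s OK    %s\n' "$name" "$url"
    else
      printf '%-10s FAIL  %s\n' "$name" "$url"
      failed=1
    fi
  done
  return "$failed"
}

# Usage with the HTTP endpoints from this guide:
# health_check trino=http://localhost:8080/v1/info dagster=http://localhost:3030 \
#   superset=http://localhost:8088 minio=http://localhost:9000/minio/health/live \
#   ranger=http://localhost:6080 hue=http://localhost:8888 spark=http://localhost:8081
```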
Update Procedures
# Stop services
docker-compose down
# Pull latest images
docker-compose pull
# Start services
docker-compose up -d
# Verify deployment
./scripts/verify-deployment.sh
Last update: October 3, 2025
Created: October 3, 2025