ML Pipeline Architecture
An end-to-end machine learning pipeline covering data ingestion, feature engineering, model training, and real-time prediction serving.
Services
- API Gateway (api_gateway) — Exposes prediction and data ingestion endpoints — FastAPI, Python
- Data Ingestion (service) — Collects and validates incoming data — Python, Pandas
- Feature Store (database) — Stores computed features for training and serving — Redis, Feast
- Model Training (service) — Trains and evaluates ML models — Python, PyTorch
- SageMaker (external) — Manages training jobs and model hosting — AWS SageMaker
- Prediction Service (service) — Serves real-time model predictions — Python, FastAPI
- PostgreSQL (database) — Stores experiment metadata and results — PostgreSQL
- S3 (external) — Stores training data and model artifacts — AWS S3
- SQS (queue) — Queues training jobs and batch predictions — AWS SQS
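The catalog above can also be kept as data, which makes it easy to group or validate entries. The following is a minimal sketch only: the Service class, the SERVICES list, and the group_by_kind helper are hypothetical names introduced for illustration, and the field values are copied from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Service:
    """One catalog entry (hypothetical structure, not part of the pipeline itself)."""
    name: str
    kind: str  # api_gateway, service, database, external, or queue
    role: str
    tech: Tuple[str, ...]


# Entries copied from the Services list above.
SERVICES = [
    Service("API Gateway", "api_gateway",
            "Exposes prediction and data ingestion endpoints", ("FastAPI", "Python")),
    Service("Data Ingestion", "service",
            "Collects and validates incoming data", ("Python", "Pandas")),
    Service("Feature Store", "database",
            "Stores computed features for training and serving", ("Redis", "Feast")),
    Service("Model Training", "service",
            "Trains and evaluates ML models", ("Python", "PyTorch")),
    Service("SageMaker", "external",
            "Manages training jobs and model hosting", ("AWS SageMaker",)),
    Service("Prediction Service", "service",
            "Serves real-time model predictions", ("Python", "FastAPI")),
    Service("PostgreSQL", "database",
            "Stores experiment metadata and results", ("PostgreSQL",)),
    Service("S3", "external",
            "Stores training data and model artifacts", ("AWS S3",)),
    Service("SQS", "queue",
            "Queues training jobs and batch predictions", ("AWS SQS",)),
]


def group_by_kind(services: List[Service]) -> Dict[str, List[str]]:
    """Map each kind to the names of its services, preserving list order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for svc in services:
        groups[svc.kind].append(svc.name)
    return dict(groups)


if __name__ == "__main__":
    for kind, names in group_by_kind(SERVICES).items():
        print(f"{kind}: {', '.join(names)}")
```

Grouping by kind gives a quick view of which components are owned services and which are managed dependencies (the database, external, and queue entries).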
Connections
- API Gateway → Data Ingestion (sync_http)
- API Gateway → Prediction Service (sync_http)
- Data Ingestion → S3 (sync_http)
- Data Ingestion → SQS (async_event)
- SQS → Model Training (async_event)
- Model Training → SageMaker (sync_http)
- Model Training → S3 (sync_http)
- Model Training → Feature Store (db_access)
- Prediction Service → Feature Store (db_access)
- Model Training → PostgreSQL (db_access)
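The connections above form a directed graph, and a short traversal shows which components sit on the blocking request path and which are reached only through the queue. This is a sketch under stated assumptions: the CONNECTIONS list, the BLOCKING set, and the reachable helper are hypothetical names, and treating db_access as a blocking call is an assumption rather than something the connection list states.

```python
from collections import deque

# (source, target, connection type), copied from the Connections list above.
CONNECTIONS = [
    ("API Gateway", "Data Ingestion", "sync_http"),
    ("API Gateway", "Prediction Service", "sync_http"),
    ("Data Ingestion", "S3", "sync_http"),
    ("Data Ingestion", "SQS", "async_event"),
    ("SQS", "Model Training", "async_event"),
    ("Model Training", "SageMaker", "sync_http"),
    ("Model Training", "S3", "sync_http"),
    ("Model Training", "Feature Store", "db_access"),
    ("Prediction Service", "Feature Store", "db_access"),
    ("Model Training", "PostgreSQL", "db_access"),
]

# Assumption: both sync_http and db_access block the caller until they return.
BLOCKING = {"sync_http", "db_access"}


def reachable(start, connections, allowed=None):
    """Breadth-first search from `start`.

    Follows only edges whose type is in `allowed`; follows every edge
    when `allowed` is None. Returns the set of components reached.
    """
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, dst, kind in connections:
            if src != node or dst in seen:
                continue
            if allowed is not None and kind not in allowed:
                continue
            seen.add(dst)
            queue.append(dst)
    return seen


if __name__ == "__main__":
    everything = reachable("API Gateway", CONNECTIONS)
    blocking = reachable("API Gateway", CONNECTIONS, BLOCKING)
    print("All downstream components:", sorted(everything))
    print("On the blocking path:", sorted(blocking))
    print("Reached only via async events:", sorted(everything - blocking))
```

Under that assumption, a request entering the API Gateway blocks only on Data Ingestion, Prediction Service, S3, and the Feature Store. SQS, Model Training, SageMaker, and PostgreSQL are reached only across an async_event edge, so training work stays off the request path.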