End-to-End Machine Learning Pipeline Using Kubernetes
This tutorial builds an end-to-end machine learning pipeline on Kubernetes, from a raw dataset to a deployed model serving predictions.
Here's the workflow:
Setup Overview
We’ll use Kubernetes to:
- Prepare a dataset.
- Train a model using train.py.
- Save the trained model.
- Deploy the trained model as an API for predictions.
Prerequisites
- Install Kubernetes on your Mac:
- Use Docker Desktop with Kubernetes enabled, or install Kubernetes via Minikube.
- Install kubectl.
- Verify Kubernetes is running:
kubectl get nodes
- Install Python (if needed) and ML libraries such as scikit-learn or TensorFlow.
- Install Helm (optional), for managing Kubernetes packages.
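Before building anything, it can help to confirm the Python side of the toolchain is in place. A minimal sketch (assuming you installed pandas, scikit-learn, and Flask with pip, as this tutorial uses all three):
# check_env.py -- quick sanity check of the libraries used in this tutorial
import pandas
import sklearn
import flask

print("pandas:", pandas.__version__)
print("scikit-learn:", sklearn.__version__)
print("flask:", flask.__version__)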
Step 1: Dataset Preparation
We’ll use a simple CSV dataset for house prices:
# Save this as dataset.csv
square_footage,bedrooms,bathrooms,price
1400,3,2,300000
1600,4,2,350000
1700,4,3,400000
1200,2,1,200000
1500,3,2,320000
Place this dataset in a directory, for example /Users/yourname/k8s-ml-pipeline.
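To confirm the file parses the way the training script expects, a quick pandas check can be useful (a minimal sketch, run from the same directory):
# inspect_dataset.py -- sanity-check the CSV before training
import pandas as pd

data = pd.read_csv("dataset.csv")
print(data.head())    # first rows: square_footage, bedrooms, bathrooms, price
print(data.dtypes)    # all four columns should parse as integers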
Step 2: Create a train.py Script
Here's a basic training script using scikit-learn:
# train.py
import pandas as pd
from sklearn.linear_model import LinearRegression
import pickle
# Load the dataset
data = pd.read_csv("dataset.csv")
# Features and target variable
X = data[["square_footage", "bedrooms", "bathrooms"]]
y = data["price"]
# Train the model
model = LinearRegression()
model.fit(X, y)
# Save the model
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
print("Model trained and saved as model.pkl")
Step 3: Dockerize train.py
- Create a Dockerfile:
FROM python:3.9-slim

# Copy files into the container
COPY train.py /app/train.py
COPY dataset.csv /app/dataset.csv

# Set the working directory
WORKDIR /app

# Install dependencies
RUN pip install pandas scikit-learn

# Default command
CMD ["python", "train.py"]
- Build the Docker image:
docker build -t train-ml:latest .
If you use Minikube, run eval $(minikube docker-env) first so the image is built inside Minikube's Docker daemon, where the cluster can find it.
Step 4: Create a Kubernetes Job for Training
- Job YAML (train-job.yaml):
apiVersion: batch/v1
kind: Job
metadata:
  name: train-job
spec:
  template:
    spec:
      containers:
      - name: train-container
        image: train-ml:latest
        imagePullPolicy: Never  # use the locally built image instead of pulling from a registry
        volumeMounts:
        - mountPath: /app
          name: model-volume
      restartPolicy: Never
      volumes:
      - name: model-volume
        hostPath:
          path: /Users/yourname/k8s-ml-pipeline
Because the hostPath volume is mounted over /app, the container reads train.py and dataset.csv from the host directory and writes model.pkl back to it. With Minikube, hostPath refers to the Minikube VM's filesystem, so share the directory first with minikube mount /Users/yourname/k8s-ml-pipeline:/Users/yourname/k8s-ml-pipeline.
- Run the Job:
kubectl apply -f train-job.yaml
- Check Logs:
kubectl logs job/train-job
This will output:
Model trained and saved as model.pkl
The model.pkl file will be saved locally in /Users/yourname/k8s-ml-pipeline.
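If you'd rather watch the Job from Python than poll kubectl, the official Kubernetes client (pip install kubernetes) can read its status. A minimal sketch, assuming your kubeconfig points at the cluster and the Job runs in the default namespace:
# watch_job.py -- poll the training Job until it finishes
import time

from kubernetes import client, config

config.load_kube_config()  # uses ~/.kube/config, same as kubectl
batch = client.BatchV1Api()

while True:
    job = batch.read_namespaced_job(name="train-job", namespace="default")
    if job.status.succeeded:
        print("Job succeeded")
        break
    if job.status.failed:
        print("Job failed")
        break
    time.sleep(5)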
Step 5: Deploy the Trained Model as an API
- Create a predict.py script:
# predict.py
import pickle

from flask import Flask, request, jsonify

# Load the trained model
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    X = [[data["square_footage"], data["bedrooms"], data["bathrooms"]]]
    prediction = model.predict(X)
    # Cast the NumPy scalar to float so jsonify can serialize it
    return jsonify({"predicted_price": float(prediction[0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
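Before containerizing, you can exercise the endpoint in-process with Flask's built-in test client. A minimal sketch, assuming model.pkl sits in the working directory and predict.py is importable as a module:
# test_predict.py -- hit the /predict route without starting a server
from predict import app

client = app.test_client()
resp = client.post(
    "/predict",
    json={"square_footage": 1600, "bedrooms": 3, "bathrooms": 2},
)
print(resp.status_code)  # expect 200
print(resp.get_json())   # e.g. {"predicted_price": ...}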
- Dockerize predict.py:
FROM python:3.9-slim

# Copy files
COPY predict.py /app/predict.py
COPY model.pkl /app/model.pkl

# Set working directory
WORKDIR /app

# Install dependencies
RUN pip install flask scikit-learn

# Default command
CMD ["python", "predict.py"]
- Build the API Docker Image:
docker build -t predict-ml:latest .
- Deployment YAML (predict-deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: predict-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: predict-api
  template:
    metadata:
      labels:
        app: predict-api
    spec:
      containers:
      - name: predict-container
        image: predict-ml:latest
        imagePullPolicy: Never  # use the locally built image
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: predict-service
spec:
  selector:
    app: predict-api
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5000
  type: LoadBalancer
- Deploy the API:
kubectl apply -f predict-deployment.yaml
- Access the API:
- Find the service's external IP:
kubectl get services
On Docker Desktop, the LoadBalancer service is published on localhost; on Minikube, run minikube tunnel in a separate terminal so the service receives an external IP.
- Test the API:
curl -X POST -H "Content-Type: application/json" \
  -d '{"square_footage": 1600, "bedrooms": 3, "bathrooms": 2}' \
  http://<EXTERNAL-IP>/predict
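The same request can be issued from Python with the requests library (pip install requests). A minimal sketch, with <EXTERNAL-IP> standing in for whatever kubectl get services reports:
# call_api.py -- query the deployed prediction service
import requests

# Replace with the EXTERNAL-IP shown by `kubectl get services`
BASE_URL = "http://<EXTERNAL-IP>"

payload = {"square_footage": 1600, "bedrooms": 3, "bathrooms": 2}
resp = requests.post(f"{BASE_URL}/predict", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # {"predicted_price": ...}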
Step 6: Clean Up
To clean up resources:
kubectl delete -f train-job.yaml
kubectl delete -f predict-deployment.yaml
Summary
- Dataset: Prepared and mounted into the container.
- Training: A Kubernetes Job ran train.py and saved the model.
- API Deployment: The trained model was deployed as a REST API using a Kubernetes Deployment and Service.
This pipeline can scale as needed and is fully containerized for portability and reproducibility.