To provide a detailed step-by-step explanation of Kubernetes in the context of the files you’ve uploaded, I’ll use your project as an example. Here’s an overview and breakdown:
1. What is Kubernetes?
Kubernetes is an open-source container orchestration platform for managing containerized applications across a cluster of machines. It automates deployment, scaling, and operations of application containers.
Key Concepts:
- Pods: Smallest deployable units, encapsulating one or more containers.
- Services: Expose a set of Pods as a network service.
- Deployments: Define how Pods are created and managed.
- Jobs: Ensure specific tasks are completed (e.g., model training).
- Volumes: Provide persistent storage for containers.
2. Overview of Your Project
Your project involves:
- Flask API: `predict.py` serves predictions.
- Streamlit App: `app.py` interacts with users and sends requests to the API.
- Model Training: `train.py` trains and saves a linear regression model.
- Kubernetes Deployment: Managed using YAML files (`deployment.yaml`, `service.yaml`, `train-job.yaml`) and `run_pipeline.sh`.
3. Steps to Deploy with Kubernetes
Step 1: Containerize the Application
Kubernetes runs your application in containers built from your Docker image. Your Dockerfile ensures:
- The environment is consistent.
- Dependencies for `predict.py` are installed.
- The application is runnable.
Example Dockerfile (assumed from context):

```dockerfile
FROM python:3.8-slim
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "predict.py"]
```
Step 2: Kubernetes Job for Training
Your `run_pipeline.sh` creates a Kubernetes Job to train the model.
Key Steps in the Training Job:
- Volume mounts provide the dataset (`dataset.csv`) and a path to save `model.pkl`.
- The Job manifest is applied inline from the script and runs the training logic in `train.py`.
Snippet from `run_pipeline.sh`:
```bash
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: train-job
spec:
  template:
    spec:
      containers:
        - name: train-job
          image: $DOCKER_IMAGE
          command: ["python", "train.py"]
          # Mount the dataset volume so train.py can read
          # dataset.csv and write model.pkl.
          volumeMounts:
            - name: dataset-volume
              mountPath: /mnt/data
      volumes:
        - name: dataset-volume
          hostPath:
            path: /mnt/data
      # Jobs require a restart policy of Never or OnFailure.
      restartPolicy: Never
EOF
```
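For context, here is a minimal sketch of what `train.py` might look like, assuming scikit-learn and the `/mnt/data` mount above. The feature and target column names are hypothetical placeholders, not taken from your actual dataset:

```python
# Minimal sketch of a training script, assuming scikit-learn; the
# column names ("size", "bedrooms", "price") are hypothetical.
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression

# Load the dataset provided to the Job via the volume mount.
df = pd.read_csv("/mnt/data/dataset.csv")

# Hypothetical feature/target split.
X = df[["size", "bedrooms"]]
y = df["price"]

# Fit a simple linear regression model.
model = LinearRegression()
model.fit(X, y)

# Persist the trained model so the API container can load it later.
with open("/mnt/data/model.pkl", "wb") as f:
    pickle.dump(model, f)
```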
Step 3: API Deployment
After training, the Flask API (`predict.py`) is deployed. The Kubernetes Deployment YAML defines:
- Number of replicas.
- Image to use (from Docker Hub).
- Port configuration.
Deployment YAML Example:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-api-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flask-api
  template:
    metadata:
      labels:
        app: flask-api
    spec:
      containers:
        - name: flask-api
          image: modeha/flask-api:latest
          ports:
            - containerPort: 5000
```
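The container in this Deployment runs `predict.py`. As a rough sketch (not your actual code), a minimal Flask prediction service matching this setup might look like the following; the `/predict` route, the `features` key, and the `model.pkl` location are assumptions:

```python
# Minimal sketch of a Flask prediction API, assuming a model pickled
# by the training Job; route and payload shape are hypothetical.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the model produced by the training Job.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [1200, 3]}.
    features = request.get_json()["features"]
    prediction = model.predict([features])
    return jsonify({"prediction": float(prediction[0])})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so containerPort 5000 is reachable from outside the Pod.
    app.run(host="0.0.0.0", port=5000)
```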
Step 4: Exposing the API
A Kubernetes Service exposes the API internally or externally (e.g., via NodePort).
Service YAML Example:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: flask-api-service
spec:
  selector:
    app: flask-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
  type: NodePort
```
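Once the Service is up, you can smoke-test the API directly. Here is a hypothetical client call, assuming the `/predict` endpoint sketched above; replace the placeholders with the node address and NodePort reported by `kubectl get service flask-api-service`:

```python
# Hypothetical smoke test for the deployed API; <node-ip> and
# <node-port> are placeholders, not real values from your cluster.
import requests

resp = requests.post(
    "http://<node-ip>:<node-port>/predict",
    json={"features": [1200, 3]},
)
print(resp.json())
```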
Step 5: Using the Streamlit Interface
Your Streamlit app (`app.py`) sends requests to the API to predict house prices based on user inputs.
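A minimal sketch of what `app.py` might contain, assuming the `/predict` endpoint above; the input fields and the API URL are illustrative assumptions:

```python
# Minimal sketch of a Streamlit front end; field names and the API URL
# (a NodePort placeholder) are hypothetical.
import requests
import streamlit as st

st.title("House Price Prediction")

# Collect feature values from the user.
size = st.number_input("Size (sq ft)", min_value=0, value=1200)
bedrooms = st.number_input("Bedrooms", min_value=0, value=3)

if st.button("Predict"):
    # Forward the inputs to the Flask API exposed by the Service.
    resp = requests.post(
        "http://<node-ip>:<node-port>/predict",
        json={"features": [size, bedrooms]},
    )
    st.write("Predicted price:", resp.json()["prediction"])
```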
4. Running the Pipeline
- Build and Push the Docker Image:

  ```bash
  docker build -t modeha/my-app:latest .
  docker push modeha/my-app:latest
  ```

- Run the Pipeline Script:

  ```bash
  ./run_pipeline.sh my-app
  ```

  This:
  - Kills processes blocking the required port.
  - Trains the model (`train.py`) using a Kubernetes Job.
  - Deploys the API and exposes it.
- Access the API via Streamlit:
  - Launch `app.py` with `streamlit run app.py`.
  - Input house features and get predictions.
5. Next Steps
- Scaling: Adjust replicas in your Deployment YAML to scale the API.
- Monitoring: Use Kubernetes tools like `kubectl logs`, Prometheus, or Grafana.
- CI/CD Integration: Automate deployments with Jenkins, GitHub Actions, or other CI/CD tools.