In this post we will deploy a simple dockerized Machine learning application written with the Flask micro-framework to the Heroku (PaaS), using one of the leading CI/CD tools, which is Gitlab!, all used tools are free and no fees in achieving this tutorial.
Tools to know
This the list of tools that we need to achieve the aim of this tutorial alongside some bash scripting knowledge
Tools | Definition |
Flask | is a micro web framework written in Python. It is classified as a micro-framework because it does not require specific tools or libraries. It has no database abstraction layer, form validation, or any other existing third-party library components that provide common functions. Go farther with flask |
Heroku | is a container-based cloud Platform as a Service (PaaS). Developers use Heroku to deploy, manage, and scale modern apps. The platform is elegant, flexible, and easy to use, offering developers the simplest path to getting their apps to market. Go farther with heroku |
GitLab | is a web-based DevOps lifecycle tool that provides a Git-repository manager providing wiki, issue-tracking, and continuous integration and deployment pipeline features, using an open-source license. Explore Gitlab documentation |
Code base
Before starting let's understand what we will dockerize here! This application is a simple machine learning that exposes two routes :
/kmeans: This route accepts a JSON data format with two entries. The first one is the number of clusters you want the classifier to consider, the second one is the text that we will group into the specified number of cluster, each group will contain words that are similar on the meaning or has the same area like people, building, civilization and so on.
/ads : This route accepts also JSON data format that contains also two entries which are the age and the annual salary, the result will be if this person can buy a product on which he saw it on a social media.
The app used Two ML algorithms associated with routes :
- KMEANS : Learn more
- Liniar SVC: learn more
Here is the code :
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
import json
import os
# Importing the libraries
import numpy as np
import pandas as pd
from flask import Flask,jsonify,request,render_template
app = Flask(__name__)
@app.route('/')
def main_page():
return render_template('index.html')
@app.route('/kmeans', methods=['POST'])
def kmeans_pred():
posted_data = request.get_json()
true_k = posted_data['num_cluster']
documents = posted_data['data']
feeds = dict()
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents.split('.'))
model = KMeans(n_clusters=true_k, init='k-means++', max_iter=100, n_init=1)
model.fit(X)
order_centroids = model.cluster_centers_.argsort()[:, ::-1]
terms = vectorizer.get_feature_names()
print(terms)
for i in range(true_k):
feeds.update({"cluster_"+str(i)+"": [terms[ind] for ind in order_centroids[i, :10]]})
return jsonify(feeds)
@app.route("/ads", methods=['POST'])
def purshase():
posted_data = request.get_json()
age = int(posted_data['age'])
salary = int(posted_data['salary'])
new_data = [[age,salary]]
# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
new_data_pred = sc.transform(new_data)
# Fitting SVM to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(new_data_pred)
response = {
"result" : str(y_pred)
}
return jsonify(response)
if __name__ == '__main__':
app.run(host="0.0.0.0", port=os.environ.get('PORT'))
I will not explain the code, it's not the aim of this tutorial. But take a look a the last line :
app.run(host="0.0.0.0", port=os.environ.get('PORT'))
The app takes an environment variable called PORT ... but Why?
When you log in to your Heroku pannel you will create a simple server with a custom name, that name will be used later to access the app via the internet, in-depth it is a subdomain linked to your app, it's much like an nginx reverse proxy that will point the subdomain to you deployed app, How?
Simply, When you hit the subdomain, Heroku will route your request to that port and return a response. If you choose your own port, your app won't work, because he doesn't know where to access the container.
Ship the app
It's time to ship the app into an isolated container to be shippable on any Server!
FROM frolvlad/alpine-python-machinelearning
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
EXPOSE 3000
ENV ENVIRONMENT dev
COPY . /app
CMD python main.py
This image frolvlad/alpine-python-machinelearning
is ready to go image that contains what we need like the sklearn
so we don't install it as a dependency, then we set our work directory to /app
and copy the requirements.txt
file to be used by the pip
command to install Flask
. By default, the app will run on port 3000
if we set the port on the app.run(host='',port='')
. besides we need to copy the rest of the app code and run it with python main.py
.
Test it locally
To test locally follow these steps :
# Clone the project
git clone https://gitlab.com/hatemBT/REST-API-KMEANS
# Run it locally
pip install -r requirements.txt
# Run the code
python main.py
#To build the docker image (change the port=os.environ.get('PORT') => port=3000 (you can choose yor custom port))
docker build -t kmeans_app .
#Run with docker
docker run -p 3000:3000 kmeans_app
Make sure to install python3 and docker engine.
Prepare your Server
Log in to your Heroku account and then create a new app and name it hello-devops-python
and choose the region from US
or Europe
, choose the nearest to you, hit create!
Notice that your app URL will be hello-devops-python.herokuapp.com
.
We need the API key now, go to settings and grab that key.
We need to create a special file called .netrc
that will allow us to connect to Heroku API
and it's Git
:
machine api.heroku.com
login <mail>
password <apiKey>
machine git.heroku.com
login <mail>
password <apiKey>
Gitlab pipeline
Make sure to have a GitLab account and a repository
We need to put the apiKey and the Heroku registry URL in GitLab as secrets.
go to settings->CI/CD->variables
and add HEROKU_APP=registry.heroku.com/hello-devops-python/web
, note that web
is the container name. and HEROKU_TOKEN
that hols the API key.
The pipeline has a single-stage development
that will deploy the container to the "hello-devops-python" app. Let's write the CI/CD configuration (.gitlab-ci.yml) :
image : docker:latest
services:
- docker:dind
variables:
DOCKER_DRIVER: overlay
stages:
- development
devloppement:
stage : development
script:
- docker build -t $HEROKU_APP .
- docker login --username=_ -p $HEROKU_TOKEN registry.heroku.com
- docker push $HEROKU_APP
- docker run --rm -e HEROKU_API_KEY=$HEROKU_TOKEN wingrunr21/alpine-heroku-cli container:release web --app hello-devops-python
Simply this pipeline follows these steps :
- Build the container:
docker build -t $HEROKU_APP .
- login to Heroku registry:
docker login --username=_ -p $HEROKU_TOKEN registry.heroku.com
- Push the container:
docker push $HEROKU_APP
- Release the container:
docker run --rm -e HEROKU_API_KEY=$HEROKU_TOKEN wingrunr21/alpine-heroku-cli container:release web --app hello-devops-python
, we used a temporary docker image that contains theheroku-cli
pre-installed, to deploy theweb
container tohello-devops-python
.
Push the code Wait unit Green
Test The App
You can access your app now via the browser, let's test the app API :
Test the /kmeans route :
curl -X POST -H "Content-Type: application/json" "http://hello-devops-python.herokuapp.com/kmeans" -d @data.json | jq "."
Example data :
{
"num_cluster":2,
"data":"
The brothers wanted to build their own kingdom where people could live without fear.
They collected a band of young men and trained them in warfare
They lived in a forest hideout on the banks of the river Tungabhadra in South India.
One day, the brothers were out on a hunt. Ferocious dogs accompanied them.
They crossed the river and rode on
A couple of frightened rabbits ran out of the bushes
The dogs gave them chase with the two brothers closely behind on their horses.
"
}
Output:
{
"cluster_0": [
"accompanied",
"ferocious",
"dogs",
"gave",
"forest",
"fear",
"day",
"crossed",
"couple",
"collected"
],
"cluster_1": [
"brothers",
"hunt",
"day",
"river",
"wanted",
"fear",
"people",
"build",
"live",
"kingdom"
]
}
Test the /ads route :
curl -X POST -H "Content-Type: application/json" "http://hello-devops-python.herokuapp.com/ads" -d "{\"age\":80,\"salary\":190000}" | jq "."
Output :
{
"result": "[1]"
}
The presentation version :