Heroku: Deploy a dockerized Flask ML app

In this post we will deploy a simple dockerized machine learning application, written with the Flask micro-framework, to Heroku (a PaaS), using one of the leading CI/CD tools: GitLab. All of the tools used are free, so completing this tutorial costs nothing.

Tools to know

This is the list of tools we need to reach the goal of this tutorial, alongside some basic bash scripting knowledge.

  • Flask is a micro web framework written in Python. It is classified as a micro-framework because it does not require specific tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions. Go further with Flask
  • Heroku is a container-based cloud Platform as a Service (PaaS). Developers use Heroku to deploy, manage, and scale modern apps. The platform is elegant, flexible, and easy to use, offering developers the simplest path to getting their apps to market. Go further with Heroku
  • GitLab is a web-based DevOps lifecycle tool that provides a Git repository manager with wiki, issue-tracking, and continuous integration and deployment pipeline features, under an open-source license. Explore the GitLab documentation

Code base

Before starting, let's understand what we will dockerize here! This application is a simple machine learning service that exposes two routes:

  • /kmeans: This route accepts JSON data with two entries. The first one is the number of clusters you want the model to build; the second one is the text that will be grouped into that number of clusters. Each cluster will contain words that are close in meaning or belong to the same topic, such as people, buildings, civilization, and so on.

  • /ads: This route also accepts JSON data with two entries, the age and the annual salary. The result tells you whether this person is likely to buy a product they saw on social media (example payloads for both routes are sketched right below).
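
To make the expected inputs concrete, here is a minimal sketch of what the two request bodies might look like. The field names num_cluster, data, age, and salary come from the application code shown further down; the values are purely illustrative:

# Hypothetical example payloads (values are made up for illustration)
kmeans_payload = {
    "num_cluster": 2,                                        # how many clusters KMeans should build
    "data": "Dogs chase rabbits. Brothers build a kingdom."  # the text to cluster, split on '.'
}

ads_payload = {
    "age": 35,         # the person's age
    "salary": 40000    # the person's annual salary
}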

The app uses two ML algorithms, one per route:

  1. KMeans: Learn more
  2. Linear SVC: Learn more

Here is the code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
import json
import os
# Importing the libraries
import numpy as np
import pandas as pd
from flask import Flask, jsonify, request, render_template
app = Flask(__name__)


@app.route('/')
def main_page():
    return render_template('index.html')

@app.route('/kmeans', methods=['POST'])
def kmeans_pred():
    posted_data = request.get_json()
    true_k = posted_data['num_cluster']
    documents = posted_data['data']
    feeds = dict()
    # Turn the text into TF-IDF features, one document per sentence
    vectorizer = TfidfVectorizer(stop_words='english')
    X = vectorizer.fit_transform(documents.split('.'))
    # Cluster the sentences into the requested number of groups
    model = KMeans(n_clusters=true_k, init='k-means++', max_iter=100, n_init=1)
    model.fit(X)
    # For each cluster, keep the 10 terms closest to its centroid
    order_centroids = model.cluster_centers_.argsort()[:, ::-1]
    terms = vectorizer.get_feature_names()
    print(terms)
    for i in range(true_k):
        feeds.update({"cluster_" + str(i): [terms[ind] for ind in order_centroids[i, :10]]})
    return jsonify(feeds)

@app.route("/ads", methods=['POST'])
def purshase():
    posted_data = request.get_json()
    age = int(posted_data['age'])
    salary = int(posted_data['salary'])
    new_data = [[age,salary]]
    # Importing the dataset
    dataset = pd.read_csv('Social_Network_Ads.csv')
    X = dataset.iloc[:, [2, 3]].values
    y = dataset.iloc[:, 4].values
    # Splitting the dataset into the Training set and Test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
    # Feature Scaling
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    new_data_pred = sc.transform(new_data)
    # Fitting SVM to the Training set
    from sklearn.svm import SVC
    classifier = SVC(kernel = 'linear', random_state = 0)
    classifier.fit(X_train, y_train)
    # Predicting the Test set results
    y_pred = classifier.predict(new_data_pred)
    response = {
        "result" : str(y_pred)
    }
    return jsonify(response)

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=os.environ.get('PORT'))

I will not explain the code in detail, as that's not the aim of this tutorial, but take a look at the last line:

    app.run(host="0.0.0.0", port=os.environ.get('PORT'))

The app reads an environment variable called PORT ... but why?

When you log in to your Heroku panel you will create an app with a custom name. That name will later be used to access the app over the internet: under the hood it is a subdomain linked to your app, much like an nginx reverse proxy that points the subdomain to your deployed app. How?

Simply put, Heroku assigns a port to your container at runtime and exposes it through the PORT environment variable. When you hit the subdomain, Heroku routes the request to that port and returns the response. If you hardcode a port of your own, your app won't work, because Heroku doesn't know where to reach the container.
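
If you want the same code to also run locally without setting PORT by hand, a small fallback does the trick. This is only a sketch of the idea, not part of the original code:

# In main.py (os is already imported there): use Heroku's PORT when set, else fall back to 3000 locally
port = int(os.environ.get('PORT', 3000))
app.run(host="0.0.0.0", port=port)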

Ship the app

It's time to package the app into an isolated container so it can be shipped to any server!

FROM frolvlad/alpine-python-machinelearning
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
EXPOSE 3000
ENV ENVIRONMENT dev
COPY . /app
CMD python main.py

The frolvlad/alpine-python-machinelearning image is a ready-to-go image that already contains what we need, such as scikit-learn, so we don't have to install it as a dependency. We then set our working directory to /app and copy the requirements.txt file, which the pip command uses to install Flask. The app will run on port 3000 if we hardcode that port in app.run(host='', port=3000), which is what EXPOSE 3000 refers to. Finally, we copy the rest of the app code and run it with python main.py.

Test it locally

To test locally, follow these steps:

# Clone the project
git clone https://gitlab.com/hatemBT/REST-API-KMEANS
# Install the dependencies
pip install -r requirements.txt
# Run the code
python main.py
# To build the docker image (change port=os.environ.get('PORT') => port=3000, or any custom port you like)
docker build -t kmeans_app .
# Run with docker
docker run -p 3000:3000 kmeans_app

Make sure you have python3 and the Docker engine installed.
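
Once the app is running (locally on port 3000 in this example), you can also exercise both routes from Python. The snippet below is just a sketch using the requests library (install it separately with pip if needed); adjust BASE_URL if you picked another port:

import requests

BASE_URL = "http://localhost:3000"  # assumes the docker run port mapping above

# Cluster a small piece of text into 2 groups
kmeans_resp = requests.post(
    BASE_URL + "/kmeans",
    json={"num_cluster": 2, "data": "Dogs chase rabbits. Brothers build a kingdom. Rivers cross forests."},
)
print(kmeans_resp.json())

# Ask whether a 35-year-old earning 40000 would buy the product
ads_resp = requests.post(BASE_URL + "/ads", json={"age": 35, "salary": 40000})
print(ads_resp.json())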

Prepare your Server

Log in to your Heroku account, create a new app, name it hello-devops-python, and choose the region (US or Europe, whichever is nearest to you), then hit create!

Notice that your app URL will be hello-devops-python.herokuapp.com.

We need the API key now: go to your account settings and grab that key.

We need to create a special file called .netrc (in your home directory, i.e. ~/.netrc) that allows us to authenticate against the Heroku API and its Git server:

machine api.heroku.com
  login <mail>
  password <apiKey>
machine git.heroku.com
  login <mail>
  password <apiKey>

GitLab pipeline

Make sure you have a GitLab account and a repository.

We need to put the API key and the Heroku registry URL in GitLab as secrets. Go to Settings -> CI/CD -> Variables and add HEROKU_APP=registry.heroku.com/hello-devops-python/web (note that web is the container name) and HEROKU_TOKEN, which holds the API key.

(Screenshot: the GitLab CI/CD settings page with the two variables added.)

The pipeline has a single stage, development, which deploys the container to the "hello-devops-python" app. Let's write the CI/CD configuration (.gitlab-ci.yml):

image: docker:latest
services:
  - docker:dind
variables:
  DOCKER_DRIVER: overlay
stages:
  - development
development:
  stage: development
  script:
    - docker build -t $HEROKU_APP .
    - docker login --username=_ -p $HEROKU_TOKEN registry.heroku.com
    - docker push $HEROKU_APP
    - docker run --rm -e HEROKU_API_KEY=$HEROKU_TOKEN wingrunr21/alpine-heroku-cli container:release web --app hello-devops-python

Simply put, this pipeline follows these steps:

  • Build the container: docker build -t $HEROKU_APP .
  • Log in to the Heroku registry: docker login --username=_ -p $HEROKU_TOKEN registry.heroku.com
  • Push the container: docker push $HEROKU_APP
  • Release the container: docker run --rm -e HEROKU_API_KEY=$HEROKU_TOKEN wingrunr21/alpine-heroku-cli container:release web --app hello-devops-python. Here we use a throwaway docker image with the heroku-cli pre-installed to release the web container of hello-devops-python.

Push the code and wait until the pipeline turns green.

(Screenshot: the pipeline run completing successfully.)

Test The App

You can now access your app via the browser. Let's test the app's API:

Test the /kmeans route:

curl -X POST -H "Content-Type: application/json" "http://hello-devops-python.herokuapp.com/kmeans" -d @data.json | jq "."

Example data:

{
    "num_cluster": 2,
    "data": "The brothers wanted to build their own kingdom where people could live without fear. They collected a band of young men and trained them in warfare They lived in a forest hideout on the banks of the river Tungabhadra in South India. One day, the brothers were out on a hunt. Ferocious dogs accompanied them. They crossed the river and rode on A couple of frightened rabbits ran out of the bushes The dogs gave them chase with the two brothers closely behind on their horses."
}

Output:

{
  "cluster_0": [
    "accompanied",
    "ferocious",
    "dogs",
    "gave",
    "forest",
    "fear",
    "day",
    "crossed",
    "couple",
    "collected"
  ],
  "cluster_1": [
    "brothers",
    "hunt",
    "day",
    "river",
    "wanted",
    "fear",
    "people",
    "build",
    "live",
    "kingdom"
  ]
}

Test the /ads route:

curl -X POST -H "Content-Type: application/json" "http://hello-devops-python.herokuapp.com/ads" -d "{\"age\":80,\"salary\":190000}" | jq "."

Output:

{
  "result": "[1]"
}

Here [1] means the classifier predicts that this person would buy the product; [0] would mean the opposite.

The presentation version: