Performance Testing ML Service APIs with Locust


“Just as athletes cannot win without a sophisticated combination of strategy, form, attitude, tactics and speed, performance engineering requires a good collection of metrics and tools to deliver the desired business results.” – Todd DeCapua

Introduction:

Over the years, the adoption of machine learning to drive business decisions has increased exponentially. According to Forbes, the ML market is projected to grow to $30.6 billion by 2024, so it is not surprising to see a myriad of custom ML solutions entering the market to address specific business needs. The easy availability of computing power, cloud infrastructure, and automation has accelerated this even further.

The current trend of harnessing the power of ML in business has led data scientists and engineers to design innovative solutions and services, and one of those services is Model as a Service (MaaS). We have used many of these services without knowing how they were built or served on the web; some examples include data visualization, facial recognition, natural language processing, and predictive analytics. In short, MaaS encapsulates all the complex data handling, model training and evaluation, deployment, etc., and lets customers consume it for their own purpose.

As simple as these services may seem to use, there are many challenges in creating one, for instance: How do we maintain the service? How do we ensure that the accuracy of our model does not degrade over time? As with any service or application, an important factor to consider is the load or traffic that a service / API can handle, to ensure uptime. The best feature an API can have is great performance, and the only way to verify this is by stressing the API and seeing how it responds. That is load testing.

In this blog, we will see not only how such a service is built, but also how to load test the service in order to plan hardware / infrastructure requirements for the production environment. We will do it in the following order:

  1. Create a simple API with FastAPI
  2. Build a classification model in Python
  3. Wrap the model with FastAPI
  4. Test the API with the Postman client
  5. Load test with Locust

Let us begin !!

Creating a simple web API using FastAPI:

The following code shows a basic FastAPI implementation. It creates a simple web API that, upon receiving a particular request, produces a specific output. Here is how the code is organized:

  1. Load the libraries
  2. Create an application object
  3. Create a route with @app.get()
  4. Write a main driver function that runs the app on a defined host and port number
from fastapi import FastAPI, Request
from typing import Dict
from pydantic import BaseModel
import uvicorn
import numpy as np
import pickle
import pandas as pd
import json

# Create the application object
app = FastAPI()

# Create a route for the root URL
@app.get("/")
async def root():
    return {"message": "Built with FastAPI"}

# Run the app on the defined host and port
if __name__ == '__main__':
    uvicorn.run(app, host="127.0.0.1", port=8000)

Once executed, you can navigate to the browser at the URL http://localhost:8000 and observe the result, which in this case will be ‘Built with FastAPI’.
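Assuming you saved the snippet as main.py (the filename is just an assumption here), you can start the service either by running the script directly or through the uvicorn command line, and then check the root endpoint from a terminal:

python main.py
# or equivalently, let the uvicorn CLI serve the app object
uvicorn main:app --host 127.0.0.1 --port 8000
# quick sanity check of the root endpoint
curl http://127.0.0.1:8000/
# expected: {"message":"Built with FastAPI"}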

Creating an API from an ML model using FastAPI:

Now that you have a clear idea of FastAPI, let's see how you can wrap a machine learning model (developed in Python) into an API in Python. I will use the Wisconsin Breast Cancer (Diagnostic) dataset. The goal of this ML project is to predict whether a person has a benign or malignant tumor. I will use VSCode as my editor, and note that we will test our service with the Postman client. These are the steps we will follow.

  1. We will first build our classification model: KNeighborsClassifier()
  2. Build our server file, which will have the logic for the API in the FastAPI framework.
  3. Finally, we will test our service with Postman.

Step 1: Classification model

A simple classification model with the standard process of loading data, splitting it into train / test sets, building the model, and saving the model in pickle format to disk. I will not go into the details of model building, since this article is about load testing.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import joblib, pickle
import os
import yaml
# folder to load config file
CONFIG_PATH = "../Configs"
# Function to load yaml configuration file
def load_config(config_name):
    """[The function takes the yaml config file as input and loads the the config]
    Args:
        config_name ([yaml]): [The function takes yaml config as input]
    Returns:
        [string]: [Returns the config]
    """
    with open(os.path.join(CONFIG_PATH, config_name)) as file:
        config = yaml.safe_load(file)
    return config
config = load_config("config.yaml")
#path to the dataset
filename = "../../Data/breast-cancer-wisconsin.csv"
#load data 
data = pd.read_csv(filename)
#replace "?" with -99999
data = data.replace('?', -99999)
# drop id column
data = data.drop(config["drop_columns"], axis=1)
# Define X (independent variables) and y (target variable)
X = np.array(data.drop(config["target_name"], axis=1))
y = np.array(data[config["target_name"]])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=config["test_size"], random_state=config["random_state"]
)
# call our classifier and fit to our data
classifier = KNeighborsClassifier(
    n_neighbors=config["n_neighbors"],
    weights=config["weights"],
    algorithm=config["algorithm"],
    leaf_size=config["leaf_size"],
    p=config["p"],
    metric=config["metric"],
    n_jobs=config["n_jobs"],
)
# training the classifier
classifier.fit(X_train, y_train)
# test our classifier
result = classifier.score(X_test, y_test)
print("Accuracy score is. {:.1f}".format(result))
# Saving model to disk
pickle.dump(classifier, open('../../FastAPI/Models/KNN_model.pkl', 'wb'))
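
The script above reads its settings from config.yaml in the Configs folder. The repository holds the real file; the sketch below only illustrates the keys the code expects, with values that largely mirror scikit-learn's KNeighborsClassifier defaults and are therefore assumptions, not the project's actual configuration:

# config.yaml (illustrative values only)
drop_columns: "id"        # column(s) dropped before training
target_name: "class"      # name of the target column in the CSV
test_size: 0.2
random_state: 42
n_neighbors: 5
weights: "uniform"
algorithm: "auto"
leaf_size: 30
p: 2
metric: "minkowski"
n_jobs: 1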

You can access the full code from Github

Step 2: Build the API with FastAPI:

We will build on the basic example we did in a previous section.

Load the libraries:

from fastapi import FastAPI, Request
from typing import Dict
from pydantic import BaseModel
import uvicorn
import numpy as np
import pickle
import pandas as pd
import json

Load the saved KNN model and write a routing function that returns JSON:

app = FastAPI()
@app.get("/")
async def root():
    return {"message": "Hello World"}
# Load the saved KNN model from disk
model = pickle.load(open('../Models/KNN_model.pkl','rb'))
@app.post('/predict')
def pred(body: dict):
    """[summary]
    Args:
        body (dict): [The pred method takes the request body in JSON format and returns the predicted value from the saved model.]
    Returns:
        [Json]: [The pred function returns the predicted value]
    """
    # Get the data from the POST request.
    data = body   
    varList = []
    for val in data.values():
        varList.append(val)
    # Make prediction from the saved model
    prediction = model.predict([varList])
    # Extract the value (cast numpy scalars to native Python types so the
    # response serializes cleanly to JSON)
    output = prediction[0].item() if isinstance(prediction[0], np.generic) else prediction[0]
    #return the output in the json format
    return {'The prediction is ': output}
# 5. Run the API with uvicorn
#    Will run on http://127.0.0.1:8000
if __name__ == '__main__':
    """[The API will run on the localhost on port 8000]    
    """
    uvicorn.run(app, host="127.0.0.1", port=8000)

You can access the full code from Github.

Using Postman Client:

In the previous section, we created a simple API where hitting http://localhost:8000 in the browser returned the message “Built with FastAPI”. This is fine as long as the output is simple and only a user or system input is expected. But we are building a model as a service, where we send data as input for the model to predict. In that case, we need a better and easier way to test it, so we will use Postman to test our API.

  1. Run the server.py file.
  2. Open the Postman client, fill in the relevant details highlighted below, and hit the Send button.
  3. See the result in the response section below.
[Screenshot: Postman POST request to the /predict endpoint and the returned prediction]
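
If you prefer to script the same check rather than use the Postman UI, a minimal sketch with the requests library is shown below. The feature names are borrowed from the load-test payload in the next section and are only placeholders; the keys and their count must match the columns the model was actually trained on.

import requests

# Hypothetical sample payload; replace with your model's real feature names/values
sample = {
    "radius_mean": 13.54,
    "texture_mean": 14.36,
    "fractal_dimension_worst": 0.07259,
}

# Send the JSON body to the /predict endpoint and print the result
response = requests.post("http://127.0.0.1:8000/predict", json=sample)
print(response.status_code)
print(response.json())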

Are your applications and services stable under maximum load?

Time to load test:

We will explore the Locust library for load testing. The easiest way to install Locust is:

pip install locust

Let's create a perf.py file with the following code. I have referred to the code from the Locust Quickstart documentation.

import time
import json
from locust import HttpUser, task, between
class QuickstartUser(HttpUser):
    wait_time = between(1, 3)
    @task(1)
    def testFlask(self):
        load = {
            "radius_mean": 13.54,
            "texture_mean": 14.36,
            ......
            ......
            "fractal_dimension_worst": 0.07259}
        myheaders = {'Content-Type': 'application/json', 'Accept': 'application/json'}
        self.client.post("/predict", data=json.dumps(load), headers=myheaders)

Access the full code file from Github

Start Locust: navigate to the directory containing perf.py and run the following command.

locust -f perf.py

Locust web interface:

Once you have started Locust with the above command, navigate to a browser and point it to http://localhost:8089. You should see the following page:

[Screenshot: Locust start page where you set the number of users, spawn rate, and host]

Let's try with 100 users, a spawn rate of 3, and host http://127.0.0.1:8000, where our API is running. You can see the load increase over time along with the response time, and a graphical representation shows the median response time and other metrics.

Note: make sure server.py is running.

[Screenshots: Locust statistics table and charts while the test is running]
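
If you want to run the same test without the web UI (for example, as part of a CI pipeline), recent Locust versions can also run headlessly from the command line; a sketch with the same parameters (100 users, spawn rate 3, running for one minute) would be:

locust -f perf.py --headless -u 100 -r 3 --run-time 1m --host http://127.0.0.1:8000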

Conclusion:

We covered a lot in this blog: building a model, wrapping it with FastAPI, testing the service with Postman, and finally performing a load test with 100 simulated users accessing our service under a gradually increasing load. We were able to monitor how the service responds.

Most of the time there are enterprise-level SLAs that must be met, i.e., keeping response times within a certain threshold such as 30 ms or 20 ms. If SLAs are not met, there are potential financial implications depending on the contract, or the loss of customers because they did not receive the service quickly enough.

A load test helps us understand the maximum load and the potential points of failure. We can then plan proactive action, either by increasing our hardware capacity or, if the service is deployed in a Kubernetes-style setup, by configuring it to increase the number of pods as the load grows.
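
As a rough illustration of that last point, Kubernetes can scale the pods behind a service automatically with a HorizontalPodAutoscaler. The sketch below is purely hypothetical: it assumes the API is deployed as a Deployment named maas-api and scales on CPU utilization; the names and thresholds are placeholders, not part of this project's actual setup.

# hypothetical HPA sketch; names and thresholds are placeholders
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: maas-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: maas-api            # assumed Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU exceeds ~70%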

Happy learning !!!!

You can connect with me – Linkedin

You can find the code for reference: Github

References

https://docs.locust.io/en/stable/quickstart.html

https://fastapi.tiangolo.com/

https://unsplash.com/

The media shown in this article is not the property of DataPeaker and is used at the author's discretion.
