Jupyter Notebooks as RESTful Microservices

“Data science enables the creation of data products.” – Mike Loukides in What is data science?

Data products take on many forms. Web articles, dashboard applications, and cloud services are all common vehicles for delivering value from data. Tools that help produce artifacts such as these are a necessary part of any data mining methodology.

In the Project Jupyter ecosystem, many tools exist to aid the creation of data products from notebooks.

Today, I’d like to introduce a new tool and capability. The Jupyter kernel gateway can turn notebooks into RESTful web APIs, giving notebook authors a way of making their work immediately useful to software developers, particularly in microservice architectures. Consider the following use cases:

  • A data scientist prototypes a model for suggesting last-minute purchases to users at checkout. She writes a notebook with code for invoking the model and saving scoring results. She publishes her notebook as a simple API for her web dev team to fit into their checkout system and invoke on a trial basis. Later, she pulls click-through metrics from her service in order to improve her model.
  • A software developer writes a bit of “glue code” in a notebook to accept a search string, query a data source, pass results to a concept tagging API, and respond with the tagged data. The developer deploys this service to the cloud for use in various departmental applications. He makes improvements to the original notebook over time and redeploys it as a web service.

To explain the mechanics of turning notebooks into APIs, let’s walk through a simple example: the scotch recommendation engine from our prior dashboards post. This time, instead of a visual dashboard, we’ll create and deploy a web API from the notebook that exposes three resources:

  • GET /scotches – returns a list of all known scotches as JSON
  • GET /scotches/:scotch – returns the features of a particular scotch as JSON
  • GET /scotches/:scotch/similar?count=N – returns the names and scores of N similar scotches as JSON

If you’d like to cut to the chase, the complete example is available on GitHub in three different languages (Python, R, Julia).

Create a Jupyter Notebook

To start, we need to create a Jupyter notebook (.ipynb file). Within this notebook, we write the necessary code to load the scotch feature set, load the similarity model, and implement the logic for the three API requests we want to support. We’re free to pick whatever language we want to use, as long as it can serialize and deserialize JSON strings (a requirement that we’ll see in more detail below).

Using Python, our notebook starts in the usual fashion with some Markdown, module imports, calls to Pandas to load DataFrames, and so on. All of this is ordinary notebook content, as shown in the screenshot below (or in the completed notebook on GitHub).

Screenshot of the top of the scotch notebook

Annotate the API Handlers

After loading the data, we move on to implementing the handlers for our desired API. To implement the GET /scotches resource in our Python notebook, we annotate a cell with # GET /scotches. Then we write the code to output the list of scotches in our desired format, either on stdout or as display data in the notebook.
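A minimal sketch of such a cell might look like the following. The `FEATURES` dictionary, its made-up values, the `list_scotches` helper, and the response shape are all hypothetical stand-ins for the data the notebook loads with Pandas; they are here only so the cell runs on its own.

```python
# GET /scotches
import json

# Hypothetical stand-in for the scotch feature set loaded earlier in the
# notebook. The values are invented for illustration.
FEATURES = {
    "Aberfeldy": {"Body": 2, "Sweetness": 2, "Smoky": 2},
    "Talisker": {"Body": 4, "Sweetness": 2, "Smoky": 3},
}


def list_scotches(features):
    """Return the known scotch names as a JSON string."""
    return json.dumps({"scotches": sorted(features)})


# The cell writes its result to stdout, one of the two output channels
# described above.
print(list_scotches(FEATURES))
```

Keeping the logic in a small function makes it easy to exercise from other cells while debugging, before the notebook is ever served as an API.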

Likewise, to implement the GET /scotches/:scotch resource handler, we write the following in a notebook cell:
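A minimal sketch of this kind of cell, under a few assumptions: `FEATURES` and `get_scotch` are hypothetical names, the feature values are made up, and the fallback value for `REQUEST` exists only so the cell also runs outside the gateway.

```python
# GET /scotches/:scotch
import json

# Hypothetical stand-in for the feature data loaded earlier in the notebook.
FEATURES = {
    "Aberfeldy": {"Body": 2, "Sweetness": 2, "Smoky": 2},
    "Talisker": {"Body": 4, "Sweetness": 2, "Smoky": 3},
}


def get_scotch(request_json, features):
    """Parse the request JSON and return one scotch's features as JSON."""
    request = json.loads(request_json)
    name = request["path"].get("scotch")
    # An unknown name yields "features": null rather than raising an error.
    return json.dumps({"name": name, "features": features.get(name)})


# The kernel gateway defines REQUEST before running the cell. The default
# below is only used when developing in an ordinary notebook session.
REQUEST = globals().get("REQUEST", json.dumps({"path": {"scotch": "Talisker"}}))

print(get_scotch(REQUEST, FEATURES))
```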

Here, the code reads a global variable named REQUEST. The kernel gateway populates this variable with a JSON string containing the request headers, URL arguments, path arguments, and body before running the code in the cell. Our handler parses this JSON string, then uses the scotch name specified in the path to look up its features. Finally, it responds with the features in a JSON object.
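The same request structure carries URL arguments, which is what the third resource, GET /scotches/:scotch/similar?count=N, needs. The sketch below is one way to read them. It assumes each URL argument arrives as a list of strings (a parameter can appear more than once in a URL), and it uses a made-up `SIMILARITY` table and a hypothetical `similar_scotches` helper in place of the notebook's real similarity model.

```python
# GET /scotches/:scotch/similar
import json

# Invented similarity scores standing in for the notebook's similarity model.
SIMILARITY = {
    "Talisker": {"Lagavulin": 0.91, "Caol Ila": 0.88, "Aberfeldy": 0.35},
}


def similar_scotches(request_json, similarity, default_count=5):
    """Return the top-N most similar scotches for the requested name."""
    request = json.loads(request_json)
    name = request["path"].get("scotch")
    # Assumption: query arguments are lists of strings, e.g. {"count": ["2"]}.
    raw_count = request.get("args", {}).get("count", [str(default_count)])[0]
    count = int(raw_count)
    scores = similarity.get(name, {})
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top = [{"name": n, "score": s} for n, s in ranked[:count]]
    return json.dumps({"similar": top})


# Simulated request for trying the handler outside the gateway.
example = json.dumps({"path": {"scotch": "Talisker"}, "args": {"count": ["2"]}})
print(similar_scotches(example, SIMILARITY))
```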

In general, annotations are single-line comments containing an HTTP verb (GET, POST, PUT, DELETE, etc.) followed by a parameterized URL path. The kernel gateway executes the code in annotated cells whenever it receives an HTTP request with a matching verb and resource.
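To make the matching idea concrete, here is a small illustration of how an annotation could be split into a verb and a path template, and how a request path could be checked against that template. This is not the kernel gateway's own code; `parse_annotation` and `match_path` are hypothetical helpers, and only four verbs are recognized to keep the sketch short.

```python
import re


def parse_annotation(comment):
    """Split an annotation like '# GET /scotches/:scotch' into (verb, path)."""
    match = re.match(r"^#\s*(GET|POST|PUT|DELETE)\s+(\S+)\s*$", comment.strip())
    return (match.group(1), match.group(2)) if match else None


def match_path(template, path):
    """Return the path arguments if `path` fits `template`, otherwise None."""
    template_parts = template.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(template_parts) != len(path_parts):
        return None
    params = {}
    for expected, actual in zip(template_parts, path_parts):
        if expected.startswith(":"):
            # A ':name' segment captures whatever appears in that position.
            params[expected[1:]] = actual
        elif expected != actual:
            return None
    return params


verb, template = parse_annotation("# GET /scotches/:scotch")
print(match_path(template, "/scotches/Talisker"))  # prints {'scotch': 'Talisker'}
```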

Deploy the Service

After developing and debugging our logic in the notebook, we move on to deploying it as a standalone service. To do so, we need to:

  1. Install the kernel gateway
  2. Install whatever dependencies the notebook code needs
  3. Make the notebook file accessible to the gateway
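As a rough sketch of what these steps amount to outside of Docker, the snippet below assembles the install and launch commands without running them. The option names follow the kernel gateway documentation as we understand it, and the value accepted by the `api` option has changed between releases, so check the documentation for the installed version. The notebook path and the `gateway_command` helper are hypothetical.

```python
import shlex

# Step 1: the package name used to install the kernel gateway with pip.
INSTALL_COMMAND = ["pip", "install", "jupyter_kernel_gateway"]


def gateway_command(notebook_path, port=8888):
    """Build a command that serves the notebook's annotated cells over HTTP."""
    return [
        "jupyter",
        "kernelgateway",
        # Assumption: this value selects the notebook-as-API mode.
        "--KernelGatewayApp.api=kernel_gateway.notebook_http",
        # Step 3: point the gateway at the notebook file.
        f"--KernelGatewayApp.seed_uri={notebook_path}",
        "--KernelGatewayApp.ip=0.0.0.0",
        f"--KernelGatewayApp.port={port}",
    ]


# Hypothetical location of the notebook file.
command = gateway_command("/srv/notebooks/scotch_api.ipynb")
print(shlex.join(INSTALL_COMMAND))
print(shlex.join(command))
```

Step 2, installing the notebook's own dependencies, depends entirely on what the notebook imports, so it is left out here.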

One convenient way to put all of these pieces in place is to use Docker. For the purposes of this post, we’ll build and run a Docker container manually. (We can automate these steps if we have a known cloud provider, known dependencies, and a way to one-click deploy a container to our provider.)

First, we write a Dockerfile that packages the kernel gateway, our notebook, and its dependencies together.

Next, we build a Docker container image from the Dockerfile.

Finally, we run a container from the image on our Docker host of choice.

We can see our API working by pointing our browser at it and viewing the JSON results. More importantly, we can call our API from new and existing applications.
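Calling the service from another application can be sketched with nothing but the Python standard library. The base URL below is an assumption (a container reachable on localhost port 8888), and `similar_url` and `fetch_json` are hypothetical helper names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the container is reachable at this address.
BASE_URL = "http://localhost:8888"


def similar_url(base_url, scotch, count):
    """Build the URL for GET /scotches/:scotch/similar?count=N."""
    # quote() escapes characters such as spaces in scotch names.
    return f"{base_url}/scotches/{quote(scotch)}/similar?{urlencode({'count': count})}"


def fetch_json(url, timeout=10):
    """Issue a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is printed here; fetch_json needs a running service.
print(similar_url(BASE_URL, "Caol Ila", 3))
```

With the container running, `fetch_json(similar_url(BASE_URL, "Talisker", 5))` would return the decoded response for the five most similar scotches.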

Screenshot of the API output

Learn More

This post scratches the surface of what’s possible with the kernel gateway and the transformation of notebooks into APIs. For more information, see the Jupyter kernel gateway documentation and the complete example notebooks on GitHub.