Python meets microservices – useful tips based on experience

Karol Wybraniec

It is impossible nowadays not to hear about microservices. The word is such a buzzword that everyone – both developers and managers – is talking, writing and thinking about it. In this blog post I’d like to focus on some useful cases that bring Python and microservices together.

To begin with, it is worth explaining why Python gets along well with microservices:

  • easy to start with – fast prototyping gives you a working API quickly
  • great microframeworks ready to use, like Flask
  • asynchronous services with Tornado or Twisted
  • a lot of useful packages: requests, uritemplate, rfc3339, flask-restless
  • clients for popular services like RabbitMQ, Redis and MongoDB
  • Python’s advantages, like how closely JSON (the most popular REST data format) maps to Python’s dicts (see the snippet below)
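
To illustrate the last point, here is a quick sketch (with made-up data) showing how a Python dict maps to a JSON document almost one-to-one, using nothing but the standard json module:

import json

product = {'id': 1, 'name': 'example', 'tags': ['rest', 'json']}

# a dict serializes straight into the body of a REST response...
body = json.dumps(product)

# ...and a JSON payload parses straight back into a dict
assert json.loads(body) == product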

Synchronous WSGI – help yourself with the asynchrony of gevent or asyncio

The WSGI (Web Server Gateway Interface) standard is defined in PEP 3333 and was inspired by CGI (Common Gateway Interface). It is worth mentioning a few key features that come with WSGI and are great for creating microservices: starting with flexibility, which lets us change the web server without changing the application’s code (e.g. switching from Nginx to Apache), through scalability handled by the WSGI server itself (which lets us add instances when the request load is heavy). No less important are the reusable middleware components for dealing with caches, sessions, authorization and so on.
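
To make this a bit more concrete, a WSGI application is just a callable that takes the request environment and a start_response function, and a middleware is simply another callable wrapping it. Below is a minimal sketch (the logging middleware is purely illustrative):

# wsgi_sketch.py
def application(environ, start_response):
    # the application itself: receive the request environ, return the body
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [b'{"status": "ok"}']


def logging_middleware(app):
    # a reusable middleware component: wraps any WSGI app without changing it
    def wrapper(environ, start_response):
        print(environ['PATH_INFO'])  # this is where caching, sessions or auth could live
        return app(environ, start_response)
    return wrapper


application = logging_middleware(application)

Any WSGI server (Gunicorn, uWSGI, mod_wsgi) can serve such a callable, which is exactly the flexibility mentioned above.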

Encouraging as this short introduction to WSGI may sound, microservices are ultimately about sending a request and getting a response back. The time needed to get the data back can sometimes be a bottleneck for synchronous frameworks.

That’s why Tornado and Twisted, mentioned in the introduction, keep gaining popularity: being based on callbacks, they handle asynchrony pretty well. I am not saying that callbacks are a remedy; it is important to highlight that they may help in some cases. Nevertheless, there’s nothing wrong with implementing a microservice based on the WSGI standard, as long as the app works in the manner: 1 request = 1 thread. If a synchronous service is still a deal breaker, there is a trick to speed up our app by using gevent, or asyncio (which I will come back to at the end of this chapter) might be the answer.

Before I proceed to some code samples, I’d like to remind you (one more time) that concurrent code is not always the answer. I’d go for the bold statement that it should be used only when other solutions have failed. Why? Because it complicates the code and makes it harder to debug, not to mention the synchronization between concurrent fragments of code that share data. So it doesn’t come for free (by the way, have you ever heard the term “callback hell” in the JavaScript world?) and should be used only when there are clear indications for it – a lot of time spent waiting for response data while CPU usage stays low.

Gevent is a concurrency library that provides an API for numerous concurrency- and network-related jobs. It is based on greenlets (a coroutine module written as a C extension) and may reduce the time needed to handle multiple calls to our endpoints.

Let’s create two Python files, example_without_gevent.py and example_with_gevent.py, and time them. Convince yourself.

# example_without_gevent.py
import requests


def run():
    urls = [
        'http://www.google.com',
        'http://www.python.org',
        'http://www.wikipedia.org',
        'http://www.github.com',
    ]
    # fetch the pages one by one – each call blocks until the response arrives
    responses = [requests.get(url) for url in urls]
    return responses

$ python -mtimeit -n 3 'import example_without_gevent' 'example_without_gevent.run()'
3 loops, best of 3: 1.52 sec per loop

# example_with_gevent.py
import gevent
from gevent import monkey
monkey.patch_all()
import requests


def run():
    urls = [
        'http://www.google.com',
        'http://www.python.org',
        'http://www.wikipedia.org',
        'http://www.github.com',
    ]
    # spawn one greenlet per URL – monkey-patched requests yields between them
    jobs = [gevent.spawn(requests.get, url) for url in urls]
    gevent.joinall(jobs, timeout=5)

    # collect the already-fetched responses from the finished greenlets
    responses = [job.value for job in jobs]
    return responses

$ python -mtimeit -n 3 'import example_with_gevent' 'example_with_gevent.run()'
3 loops, best of 3: 647 msec per loop

Not everything is that rosy, though. To have gevent working properly, all the code that uses it has to be compatible with it. That’s the reason why some packages developed by the community sometimes block each other (especially C extensions, whose blocking calls gevent cannot monkey-patch). Anyway, in most cases you’re not going to face this yourself.

The other, and arguably the prettiest and most modern, way is asyncio. Introduced in Python 3.4, asyncio allows writing concurrent code by providing a high-level API (coroutines, synchronization of concurrent code, subprocess control), a low-level API (event loops) and, since Python 3.5, the new async/await keywords. If your project allows you to use recent Python releases, it is probably the best way of dealing with concurrency. Let me point to the official documentation, where more detailed information is available: https://docs.python.org/3/library/asyncio.html
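
Just to give a flavour, here is a minimal sketch of the same URL-fetching job from the previous examples, written with asyncio. It assumes the third-party aiohttp package (requests is blocking, so it does not fit an event loop) and Python 3.7+ for asyncio.run():

# example_with_asyncio.py
import asyncio

import aiohttp


async def fetch(session, url):
    # each coroutine awaits its own HTTP request without blocking the event loop
    async with session.get(url) as response:
        return await response.text()


async def run():
    urls = [
        'http://www.google.com',
        'http://www.python.org',
        'http://www.wikipedia.org',
        'http://www.github.com',
    ]
    async with aiohttp.ClientSession() as session:
        # schedule all requests concurrently and wait for every result
        return await asyncio.gather(*(fetch(session, url) for url in urls))


if __name__ == '__main__':
    asyncio.run(run())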

Analysis of security vulnerabilities using Bandit

The OpenStack community designed and created a tool called Bandit to find common security weaknesses (e.g. SQL injection). As a result of the scan, the user gets a clear console output pointing out the cases that failed during the run.

Let’s intentionally create an example file, example.py, with security issues (the problematic lines are marked with comments).

# 1st issue related to subprocess
import subprocess
import yaml
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker


def read_config(file_name):
    with open(file_name) as config:
        # 2nd issue – unsafe yaml load
        data = yaml.load(config.read())


def run_command(cmd):
    # 3rd issue – shell=True
    return subprocess.check_call(cmd, shell=True)


db = create_engine('sqlite://')  # in-memory SQLite is enough for this example
Session = sessionmaker(bind=db)


def get_product(id):
    session = Session()
    # 4th issue – SQL injection
    query = "select * from products where id='%s'" % id
    return session.execute(query)

Run the following command to execute Bandit on the file:

$ bandit example.py

This is the result:

Test results:
>> Issue: [B404:blacklist] Consider possible security implications associated with subprocess module.
   Severity: Low   Confidence: High
   Location: example.py:1
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_imports.html#b404-import-subprocess
1       import subprocess
2       import yaml
3       from sqlalchemy import create_engine
 
--------------------------------------------------
>> Issue: [B506:yaml_load] Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load().
   Severity: Medium   Confidence: High
   Location: example.py:9
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b506_yaml_load.html
8           with open(file_name) as config:
9               data = yaml.load(config.read())
10
 
--------------------------------------------------
>> Issue: [B602:subprocess_popen_with_shell_equals_true] subprocess call with shell=True identified, security issue.
   Severity: High   Confidence: High
   Location: example.py:13
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b602_subprocess_popen_with_shell_equals_true.html
12      def run_command(cmd):
13          return subprocess.check_call(cmd, shell=True)
14
 
--------------------------------------------------
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   Location: example.py:22
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b608_hardcoded_sql_expressions.html
21          session = Session()
22          query = "select * from products where id='%s'" % id
23          return session.execute(query)
--------------------------------------------------
Code scanned:
        Total lines of code: 15
        Total lines skipped (#nosec): 0
 
Run metrics:
        Total issues (by severity):
                Undefined: 0.0
                Low: 1.0
                Medium: 2.0
                High: 1.0
        Total issues (by confidence):
                Undefined: 0.0
                Low: 1.0
                Medium: 0.0
                High: 3.0
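
For completeness, the flagged fragments could be rewritten along these lines (just a sketch, not the only possible fix; Session comes from the earlier snippet, while cmd_args and product_id are my own names):

import subprocess

import yaml
from sqlalchemy import text


def read_config(file_name):
    with open(file_name) as config:
        # safe_load() refuses to instantiate arbitrary Python objects
        return yaml.safe_load(config.read())


def run_command(cmd_args):
    # pass the command as a list and keep shell=False (the default)
    return subprocess.check_call(cmd_args)


def get_product(product_id):
    session = Session()
    # bind the parameter instead of formatting the value into the SQL string
    query = text("select * from products where id = :id")
    return session.execute(query, {"id": product_id})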

Bandit ships with dozens of tests. For the Flask framework, one test is worth mentioning: it checks whether debug is set to True, which is fine in development instances, but not in production ones. Since creating a Flask application is shorter than a single System.out.println() in Java, I don’t hesitate to place a basic example below:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'Nothing is here'

if __name__ == '__main__':
    app.run(debug=True)

Running Bandit on that code produces the following result:

Test results:
>> Issue: [B201:flask_debug_true] A Flask app appears to be run with debug=True, which exposes the Werkzeug debugger and allows the execution of arbitrary code.
   Severity: High   Confidence: Medium
   Location: app.py:13
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b201_flask_debug_true.html
12      if __name__ == '__main__':
13          app.run(debug=True)

Running such a test on the production branch before deploying to the production environment is definitely good practice. Note that automated tools like this shouldn’t be treated as an oracle; they should be used in addition to serious tests of all kinds. Moreover, since it’s a third-party library, it’s also a matter of trusting its authors, isn’t it?

Bandit is configurable through an INI file (e.g. a .bandit file in the project directory):

[bandit]
skips: B201
exclude: tests
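
Individual findings can also be silenced directly in the code with a # nosec comment, which is what the “Total lines skipped (#nosec)” metric in the report refers to. A sketch:

import subprocess


def run_reviewed_command(cmd):
    # the command is hard-coded and reviewed, so the shell=True finding is accepted
    return subprocess.check_call(cmd, shell=True)  # nosec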

Managing an API using Flask-Restless

Flask-Restless basically provides a mapping between the database and the model, simplifying API generation for the model without writing routes by hand, since accessing database tables is pretty much the same for all entities. As a result of a GET request on a specific model, Flask-Restless returns JSON.

We’re going to use the sample SQLite database available at http://www.sqlitetutorial.net/sqlite-sample-database/ as a local resource. The file is called chinook.db. Here’s the tree of my working directory (in case you need a reference):

flask_restless_with_sqlite_example/
    config.cfg
    database
      chinook.db
    requirements.txt
    src
      my_app.py
    venv

It’s handy to place the database connection string in config.cfg, as shown below:

SQLALCHEMY_DATABASE_URI = 'sqlite:///../database/chinook.db' 
DEBUG = True

chinook.db contains more than enough tables, although for the sake of this blog post we’re going to focus on the Albums and Artists tables.

my_app.py looks as follows:

from pathlib import Path

import sqlalchemy as db

from flask import Flask
from flask_restless import APIManager
from sqlalchemy.ext.declarative import declarative_base

app = Flask(__name__)
app.config.from_pyfile(Path(Path(__file__).parent, '..', 'config.cfg'))
engine = db.create_engine(app.config['SQLALCHEMY_DATABASE_URI'])
session = db.orm.sessionmaker(bind=engine)()

Base = declarative_base()


class Albums(Base):
    __tablename__ = 'Albums'
    album_id = db.Column('AlbumId', db.Integer, primary_key=True)
    title = db.Column(db.String(160))
    artist_id = db.Column(
        'ArtistId', db.Integer, db.ForeignKey('Artists.ArtistId')
    )


class Artists(Base):
    __tablename__ = 'Artists'
    artist_id = db.Column('ArtistId', db.Integer, primary_key=True)
    name = db.Column(db.String(160))
    albums = db.orm.relationship('Albums', backref='Artists')


# expose REST endpoints for both models (by default GET /api/Albums and /api/Artists)
manager = APIManager(app, session=session)
manager.create_api(Albums)
manager.create_api(Artists)

if __name__ == '__main__':
    app.run()

By default the application serves data at http://localhost:5000/. Our API is accessed by sending a GET request to http://localhost:5000/api/Albums. As a result we receive the JSON shown below, and that’s it – we serve data from our database.

{
  "num_results": 347, 
  "objects": [
    {
      "Artists": {
        "artist_id": 1, 
        "name": "AC/DC"
      }, 
      "album_id": 1, 
      "artist_id": 1, 
      "title": "For Those About To Rock We Salute You"
    }, 
    {
      "Artists": {
        "artist_id": 2, 
        "name": "Accept"
      }, 
      "album_id": 2, 
      "artist_id": 2, 
      "title": "Balls to the Wall"
    }, 

(…most of this JSON has been cut...)

    {
      "Artists": {
        "artist_id": 8, 
        "name": "Audioslave"
      }, 
      "album_id": 10, 
      "artist_id": 8, 
      "title": "Audioslave"
    }
  ], 
  "page": 1, 
  "total_pages": 35
}
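
Assuming the app is running locally and keeps the default /api prefix, the endpoints can be consumed with plain requests – a quick sketch:

import requests

# a single album addressed by its primary key
album = requests.get('http://localhost:5000/api/Albums/1').json()

# collection results are paginated – the next page is one query parameter away
page_2 = requests.get('http://localhost:5000/api/Albums', params={'page': 2}).json()

print(album['title'], page_2['page'])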

Conclusion

There is enormous hype around microservices nowadays. Combine it with the popularity of the Python programming language (according to Stack Overflow’s statistics, Python is one of the most asked-about and fastest-growing tags on their site) and we get a pretty duo that is capable of handling microservices very well, with libraries ready to be used. In my opinion, it’s a fantastic time to learn about the catchy microservices in such a relevant language as Python. Take a look at the biggest players on the market – they’ve already spotted the advantages of Python + microservices!
