A Whistlestop Tour of Python Requests

Returning to our regularly scheduled technical blogging, I’m going to give a
quick overview of one of the software libraries I know best:
Kenneth Reitz’s
Requests library for
the Python programming language. Since I started making minor (and I mean
really minor)
contributions to the library, I’ve become increasingly familiar with its use
and utility, and I now firmly believe that it is the best third-party library
available for Python.

The documentation is generally good: the bad parts tend to be the parts I
wrote. Nevertheless, it’s worth having an informal introduction to the library
somewhere on the internet. I’m actually pretty sure other sources exist, but
this is one of the very few things I can write about with even a modicum of
legitimate authority, so damnit I’m going to write it anyway.

Be warned: Requests provides a great many features that I’m not going to go
into in this post. The library has developed from being a useful wrapper over
some of Python’s less Pythonic
libraries (oh, hello there urllib2
and httplib) into the most
intuitive method of interacting with HTTP and HTTPS available in any
programming language I have ever seen. The intention of this blog post is to
quickly highlight the utility of Requests, not its depth. As always, the
best place to look is the official docs.

What is Requests?

tl;dr: Requests is a Python library intended to make it as easy as
possible to work with HTTP.

More fully, Requests is intended to provide a simple and intuitive API that
allows the creation of complex workflows from Python code. This library is,
in my opinion, the holy grail of
API-Oriented Design. For
the maintainer, any addition to Requests that makes the API less clear is
a bad change.

This means that Requests has the ability to scale naturally to the complexity
of your problem. If all you want to do is grab some
JSON from a single web URL, Requests will
make that a one-line program. If you want to manage a complicated workflow
using OAuth to grab data from multiple
sources, Requests provides you with all the tools you need to handle the data
exchange.
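
To make the first of those concrete: here is that one-line JSON grab,
sketched against httpbin.org (a simple request-echoing service) rather than a
real API; the json attribute gets a proper introduction later in this post.

>>> import requests
>>> requests.get('http://httpbin.org/get').json
...dict snip...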

Any Python developer who has to interact with the web as a client will
eventually find their way to Requests, and when they do they’ll have stumbled
onto the single best Python library around.

Let’s Start!

Ok, let’s get going. Firstly, you want to create a sandbox in which you can
play. It is rarely a good idea to pollute your system Python installation with
any package installation, so you should probably create a useful
virtualenv. If there were
any package worth installing into your system site-packages, Requests would
be it, but it then becomes something you need to maintain. As a result, I
recommend you create a virtualenv for following along with this little tour.
Do this like so:

$ virtualenv venv
$ source venv/bin/activate

You should get a little (venv) at the start of your prompt. This lets you
know that you’re using the virtualenv. When that’s done, install Requests
into the virtualenv:

$ pip install requests

Requests is now installed. Grab yourself an interactive Python shell (python at
the command prompt). Now you can begin to see some of the glory of Requests.
Let’s see what we need to do to get Google’s homepage.

>>> import requests
>>> r = requests.get('http://www.google.com/')

And we’re done.

Seriously, that’s it. The variable r now contains a Response object, which
contains everything associated with the response from the web server. We can
take a look at what we got.

>>> print r.status_code
200
>>> print r.reason
OK
>>> print r.text
<!doctype html><html>...
...snip...
...</html>

As you can see, you can find out the HTTP status code returned by the server,
along with the human-readable reason phrase that accompanies it. In this
case, because the URL I used points to a real resource, the server has
returned a 200 OK response.

We can make other forms of HTTP requests too. POST, DELETE, PUT, OPTIONS,
HEAD and PATCH can all be made using the exact same syntax: requests.post(),
requests.delete(), requests.put(), requests.options(),
requests.head(), requests.patch().
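
As a quick sketch of one of these, here’s a POST with a form-encoded body,
aimed at httpbin.org so that nothing real gets modified:

>>> payload = {'key': 'value'}
>>> p = requests.post('http://httpbin.org/post', data=payload)
>>> print p.status_code
200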

Knowing that the resource I wanted has been returned, I can access it using
the text attribute of the Response object.

Getting Data From a Response

The text attribute is one of a few ways of getting at the content of a
Response. The most effective way to demonstrate the others is to use a
resource that returns JSON, for which Requests has built-in support.

A good resource of this type is Twitter. Let’s find the top 20 trending topics
for each hour of today. To do that, run:

>>> r = requests.get('http://api.twitter.com/1/trends/daily.json')
>>> print r.status_code
200

In Requests you can get the content of a Response in four ways. For now,
we’ll focus on three. The first is to get the content as a stream of bytes,
which in Python is represented by a standard string (Python 2.x) or a Python
bytestring (Python 3.x). This is the content attribute:

>>> print r.content
...snip...

The next option, and the preferred one for non-JSON responses, is the
text attribute. This is because
encoding text is hard.
It’s highly likely that your web server has returned data that is not encoded
in ASCII. This means the text needs to be decoded first. Requests will look at
the headers of the response and use them to determine the encoding used. We
can see this clearly in our response from Twitter. First, take a look at the
response headers.

>>> print r.headers['Content-Type']
application/json; charset=utf-8

Requests looks for that Content-Type header, in particular for the charset
parameter. If that is present, Requests assumes that the encoding specified is
the
one being used. We can see that:

>>> print r.encoding
utf-8

If the header isn’t present (and
a specific other header isn’t there either),
Requests uses chardet to try to guess
the encoding. If it turns out that Requests gets it wrong, you can also set
the encoding property to the correct value.
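
If, say, a server mislabelled its UTF-8 content as Latin-1 (a purely
hypothetical situation here), the fix is a single assignment before you read
the text:

>>> r.encoding = 'utf-8'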

Regardless, when you access Response.text, you will get the content of the
response decoded using that codec. This happens behind the scenes, so that
in 99.99% of cases everything will be seamless.

>>> print r.text
...unicode snip...
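
If the difference between content and text feels abstract, comparing their
types under Python 2 makes it concrete: the raw bytes arrive as a plain str,
while the decoded text is unicode.

>>> type(r.content)
<type 'str'>
>>> type(r.text)
<type 'unicode'>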

The final option works well when you have JSON, and that is to use the json
attribute. This attribute gives you a Python object (usually a dict) built
from the JSON in the response, exactly like you’d get if you called
json.loads() on the content of
the response. This dict can then be immediately interacted with like any other
Python dictionary.

>>> print r.json
...dict snip...
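
Because the result is an ordinary Python structure, you can index straight
into it. Here’s an illustrative sketch against httpbin.org, whose /get
endpoint echoes the request back as JSON:

>>> j = requests.get('http://httpbin.org/get')
>>> print j.json['url']
http://httpbin.org/get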

Cookies and Headers and Such!

Requests makes it really easy to get at the headers and cookies sent in a
Response. The headers are stored in objects that behave almost exactly like
Python dictionaries. This means you can print the entire thing, like this:

>>> print r.headers
...dict snip...

Both the headers and cookies support access like Python dictionaries. This
means you can do this:

>>> print r.headers['cache-control']
max-age=300, must-revalidate
>>> print r.cookies['_twitter_sess']
...long alphanumeric string here...

You can also send custom headers and cookies by using Python dictionaries. For
instance, you can do this:

>>> headers = {'User-Agent': 'not-a-real-browser (like Gecko)'}
>>> cookies = {'Tasty': 'Chocolate_chip_cookies'}
>>> t = requests.get('http://httpbin.org/get', headers=headers, cookies=cookies)
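
Since httpbin.org echoes the request back as JSON, you can sanity-check that
the header really was sent (the shape of the echoed data is httpbin’s, not
part of Requests itself):

>>> print t.json['headers']['User-Agent']
not-a-real-browser (like Gecko)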

Of course, cookies aren’t really of much use when you only send them on one
request. The whole point of cookies is to let you maintain state, and the
requests.get() style of API appears to be stateless. Happily, Requests makes
it easy to maintain state.

State in Requests

To maintain state, Requests provides you with the Session class. Once you
create an instance of the class, you can make all of your requests using
methods on the Session instance. It looks like this:

>>> s = requests.Session()
>>> s.get('http://www.google.com/')
>>> s.close()

There are two main reasons to use sessions. The first is to maintain state
over a series of connections. If you make subsequent GETs or POSTs or
what-have-you, Requests will continue to pass the cookies around, which is
exactly what you want to have happen.
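
You can watch the cookies persist using httpbin.org’s cookie endpoints,
sketched here on the assumption that /cookies/set sets a cookie and /cookies
echoes back whatever the client sent:

>>> s = requests.Session()
>>> s.get('http://httpbin.org/cookies/set?flavour=oatmeal')
>>> print s.get('http://httpbin.org/cookies').text
{"cookies": {"flavour": "oatmeal"}}
>>> s.close()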

The next reason is that the Session class allows you to set some
parameters that will be used for all subsequent requests. This includes
headers and cookies, as well as some of the other options I won’t be going
into this time around.

As you can see in the code example above, sessions need to be instantiated
and cleaned up. It’s good form, then, to use sessions as context managers,
like so:

>>> with requests.Session() as s:
...    s.headers.update(headers)
...    s.get('http://www.google.com/')
...    s.get('http://www.example.com/')
>>> do_some_other_stuff()

This saves you having to call Session.close() on your session, and ensures
that you don’t leak sockets. It’s also very Pythonic, and that’s always a good
thing.

Query Parameters

A common feature of web APIs and URLs in general is the use of query
parameters. These are the portions of the URL that follow the leading ‘?’
character. Requests makes it easy to add these into the URL without having
to form it correctly yourself:

>>> params = {'hi': 'there'}
>>> r = requests.get('http://httpbin.org/get', params=params)
>>> print r.url
http://httpbin.org/get?hi=there

These parameters are placed into the URL and correctly percent-encoded.
Passing them this way means Requests handles the encoding for you, so
reserved characters in your values won’t mangle the URL you send upstream.
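
The encoding matters as soon as your values contain spaces or other reserved
characters. A small sketch (the ordering of parameters in the final URL may
vary):

>>> params = {'q': 'python requests', 'lang': 'en'}
>>> r = requests.get('http://httpbin.org/get', params=params)
>>> print r.url
http://httpbin.org/get?q=python+requests&lang=en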

So much more!

This was only the briefest of looks at the features available in Requests.
There are so many others: proxies, automatic HTTP authentication, automatic
redirect handling, multipart file uploads, SSL certificate verification and
more. If I can find a good example, I’ll walk through some of the more
advanced features of the library in the future.

In the meantime, if you have any need to programmatically access web
services, whether through an API or via some web-scraping, I thoroughly
suggest you try out Python’s Requests library. It’s truly an exemplar in the
world of programming libraries.