Programming link clearance 2015: Python edition

I have a Coding bookmarks folder which is stuffed full of loads of interesting articles that I’ve never shared with anyone because they don’t really fit into any of my posts. So, taking an idea from The Old New Thing, I’m going to run a few ‘Link Clearance’ posts. This is the Python-focused one (there will be more soon, including a general programming one).

(Yes, I know it is now the middle February 2016, but things got delayed a bit! Most of these links are from 2015 – with a few more recent ones added too)

General Python:

Scientific Python

  • Python for Computational Science and Engineering: Book-length introduction to scientific Python programming, including basic Python, plus numpy, matplotlib, SymPy and more.
  • Bayesian Methods for Hackers: An introduction to Bayesian methods from a programming-perspective – also book-length and definitely worth a read.
  • Think Bayes: If you didn’t like the previous book relying on the PyMC module then you might prefer this one – it teaches similar concepts but with pure Python (with a bit of numpy later on). It gave me a far better understanding of probability in general – not just Bayesian thinking.
  • Kalman and Bayesian filters in Python: Yup, yet another book – but I promise this is the last one. It covers some of what has been covered in the two previous books, but goes into a lot of depth about Kalman filters, in a very easy-to-understand way.
  • 100 numpy exercises: This link is actually far more interesting than it sounds – it’s amazing what can be done in numpy in very few lines of code. I’d recommend starting at the top and seeing how many of the exercises you can complete…and then looking at the answers which will probably teach you a lot!
  • PSA: Consider using NumPy if you need to parse a large binary data file with a fairly simple format: This was very useful to me once, and I had no idea about it before I read this article – again, numpy is great!
  • Pandas and Python: Top 10: A great introduction to useful pandas features, I often use this as a reference for functions that confuse me slightly (like map, apply and applymap
  • Python GDAL/OGR Cookbook!: Some good ‘cookbook’-style examples of using the Python interface to GDAL/OGR (for reading/writing geographic data). Particularly useful as the main GDAL docs are focused on the C++ interface
  • Fitting models using R-style formulas:Have you ever wished for R-style formulas for fitting models in Python? Well, look no further – it can be done easily using a combination of statsmodels and patsy
  • Probability distributions in SciPy: A great brief summary of probability distributions included in scipy, and how to use the various methods available on them
  • Overview of Python Visualization: Visualisation options for Python were a lot less confusing when the only option was matplotlib! This should help you navigate the range of options now available
  • What is your Jupyter workflow like?: As with many Reddit discussions, there is some gold buried amongst the less-useful comments. I definitely learnt some new ways of working.
  • Getting the Best Performance out of NumPy: A good guide to increasing the performance of your numpy code.

Python packages

These are all packages that didn’t quite fit in to my Top 5 Python Packages of 2015 post, but are still great

  • pypath-magic: A handy command-line tool and IPython magic to allow you to easily change your PYTHONPATH – very useful!
  • MoviePy: Lovely simple interface to make animations/videos in Python – using whatever libraries/functions you want to create the actual images
  • SWAPY: A simple GUI to allow you to interactively generate pywinauto scripts to automate functions on Windows. Even better is that you can then edit the resulting Python code if you want – far nicer than switching to something like AutoHotKey
  • Glue: A great Python-based GUI for exploring data relationships, principally based on ‘linked displays’. All functionality is available through the Python API too – and the documentation is great.
  • Gloo: I really loved the ProjectTemplate library for R, but somehow never quite got as comfortable with this port of the library to Python. I really should try again – as the idea of a standardised structure for all analysis projects is very appealing.
  • pudb: Interactive, curses-style debugger, even accessible remotely and through IPython. I must remember to use this more!/li>
  • pony: An interesting new Object-Relational Model, a potential competitor to SQLAlchemy. I like its pythonic-nature
  • pyserial: Simple and easy-to-use library for serial communications in Python. I’ve used this for connecting to scientific instruments as well as for home automation.
  • xmltodict: This makes working with XML feel like you are working with JSON, by parsing XML data to a dict. You wouldn’t want to use it on enormous XML files, but for quick scripts it’s great!
  • uncertainties: A very easy-to-use package that lets you do calculations with uncertain numbers (eg. 3 +/- 0.3) – even in numpy arrays
  • pathlib: Do you hate os.path.join as much as I do? How does dir / output_folder / filename seem instead? A great pythonic path-handling package, which is a part of the standard library since Python 3.4. This package allows you to get the same functionality in previous versions.
  • geocoder: Very easy-to-use geocoding module
  • fuzzywuzzy: Simple but comprehensive fuzzy string matching library
  • blessings: The easiest way to introduce colour, font styles and positioning to your terminal programs
  • PrettyPandas: Handy API for making nicely-formatted Pandas tables
  • pandas-profiling: I think this is slightly misleadingly named: it doesn’t do profiling in a ‘speed’ sense, but in a ‘summary’ sense. Basically it’ll produce a lovely HTML summary of your Pandas DataFrame, with a huge amount of detail
  • PyDataset Do you envy R programmers with their handy access to various nice test datasets as data(cars) and so on? Well, this does the same for Python – with an even larger range of data
  • pyq Allows you to search Python code using jQuery-like selectors, such as class:extends(#IntegerField) for all classes that extend the IntegerField class. Fascinating, and I can see all sorts of interesting uses for this…if only I had the time!

Conda

I use the Anaconda scientific Python distribution to get a standard, easily-configurable Python set up on all of my machines. I’m not going to give full details for each of these links, as they are fairly self-explanatory – but definitely very useful for those using Anaconda.

Code Architecture

The most difficult part of programming is designing and structuring your code: the actual ‘getting the computer to do what you want’ bit is often relatively easy. This becomes particularly difficult with larger projects. The links below are all interesting discussions of software architecture with a Python focus. I find the 500 Lines or Less posts to be particularly interesting: they all implement challenging programs in relatively short pieces of code. They’ll all be released in book form eventually – and I’m definitely going to buy a copy!