Data Science Things Roundup #7

This week’s edition of the Data Science Things Roundup is pretty Python-heavy, as opposed to previous editions that leaned more toward machine learning and dataviz.  At the end of the day, some kind of software backs most of data science, so getting a bit lower level can be useful sometimes.  This week we look at a couple of ways to increase performance in Python codebases and one way to generalize them a little better.

Intel Python Distribution

Libraries like numpy generally hand their number crunching off to a third-party backend (like ATLAS, BLAS or LAPACK).  One such backend is Intel’s Math Kernel Library (MKL).  With their custom backends and their own distribution of Python, Intel claims some pretty impressive speedups over vanilla Python, presumably without much work on the developer side.  They have an open beta, so if you have a bunch of Intel hardware and a need for speed, maybe sign up and try it out. Check it out here.
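
If you want to see which backend your current numpy build is linked against before trying a different distribution, here is a minimal sketch. It assumes numpy is installed; `time_matmul` is a hypothetical helper name of my own, and the timing is a rough sanity check, not a proper benchmark.

```python
import time


def time_matmul(n=200, repeats=3):
    """Return the best wall-clock time (seconds) for an n x n matrix product.

    Returns None if numpy is not installed. Hypothetical helper for a rough
    comparison between Python distributions, not a rigorous benchmark.
    """
    try:
        import numpy as np
    except ImportError:
        return None

    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # dense matrix product, dispatched to the linked BLAS backend
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    try:
        import numpy as np
        np.show_config()  # prints the BLAS/LAPACK libraries numpy was built with
    except ImportError:
        print("numpy is not installed")
    print("best matmul time (s):", time_matmul())
```

Running the same script under two different Python distributions on the same machine should give a rough feel for whether the backend makes a difference for your workload.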

Go-Python

Go-Python lets you create CPython extensions out of Go packages.  I think that’s pretty cool: Go is (arguably) nicer to write than C, and you would presumably end up with similar speedups.  I’ve played around with Go some, but haven’t found a great use case for this yet.  If you have one, or you’ve used this in production, leave a comment below and let me know what you thought. Check it out here.

PyFilesystem

When dealing with data you tend to have to go get it from a ton of different places: some FTP server, S3, the local filesystem, HDFS or wherever.  PyFilesystem is a useful abstraction for writing generalized code that interacts with these diverse filesystems. Check it out here.
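
To make the idea concrete, here is a minimal sketch of code that doesn't care where the files live. It assumes the PyFilesystem2 API (the `fs` package, 2.x) with its `open_fs` function and the in-memory `mem://` filesystem; `count_text_files` and `demo` are hypothetical names of my own, and the file names are made up for illustration.

```python
def count_text_files(filesystem):
    """Count *.txt files anywhere under the root of an opened filesystem.

    Works on any PyFilesystem2 FS object, whatever storage backs it.
    """
    return sum(1 for _ in filesystem.walk.files(filter=["*.txt"]))


def demo():
    """Build a small in-memory filesystem and count its text files.

    Returns None if the `fs` package (PyFilesystem2, assumed) is missing.
    """
    try:
        from fs import open_fs
    except ImportError:
        return None

    # "mem://" opens a throwaway in-memory filesystem; made-up sample files.
    with open_fs("mem://") as mem:
        mem.makedirs("raw/2016")
        mem.writetext("raw/2016/a.txt", "hello")
        mem.writetext("raw/b.txt", "world")
        mem.writetext("raw/c.csv", "x,y")
        return count_text_files(mem)


if __name__ == "__main__":
    print(demo())
```

The point of the design is that `count_text_files` only talks to the filesystem object, so pointing it at a local directory or a remote store should just mean opening a different URL, provided the matching backend is installed.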

The post Data Science Things Roundup #7 appeared first on Will’s Noise.