Data Science Things Roundup #1

This is the first in a new series of posts, tailored more towards the newsletter subscribers (join it here).  There are a few of these around the internet that I like, notably:

Mine will probably be way less consistent, so if you like this, then for sure subscribe to those as well.  Anyway, here are a few things related to data science or Python that I’ve stumbled across recently and that you may also find interesting.

Kaggle Past Solutions

I’ve spoken a bit critically of Kaggle in the past (read it here), but there is no denying that it is a huge and bright community, with some great ideas coming out of it.  While it isn’t fully up to date, this post on Garbled Notes collects solution whitepapers, code, and comments from tons of Kaggle competitors: not just high-rankers, but winners.  Check it out here.

Gooey

Over time, I’ve accumulated a ridiculous number of little one-off Python CLIs for munging data.  I’m sure I’m not in the minority on that.  When it comes time to send one over to someone else so they can use it, it’s often poorly documented (if at all), and I’ve forgotten half of how it worked.  Gooey is a simple GUI framework for Python CLIs that builds a graphical interface from your existing argument definitions, rather than forcing users onto the command line.  Especially for less-technical users or coworkers, this can be a huge value-add without very much work at all.  Check it out here.
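To give a sense of how little work that is, here’s a rough sketch of wrapping an argparse script with Gooey.  The script itself is hypothetical (a tiny CSV row counter, its arguments, and the build_parser helper are all made up for illustration); the Gooey decorator and its program_name option are how I understand Gooey’s documented usage.  The try/except fallback is my own addition so the file still works as a plain CLI when Gooey isn’t installed.

```python
import argparse
import csv

try:
    from gooey import Gooey  # third-party: pip install Gooey
except ImportError:
    # Assumption for this sketch: fall back to a do-nothing decorator so the
    # script still runs as an ordinary command-line tool without Gooey.
    def Gooey(func=None, **_options):
        if func is None:
            return lambda wrapped: wrapped
        return func


def build_parser():
    """Plain argparse parser; Gooey builds its form from these arguments."""
    parser = argparse.ArgumentParser(
        description="Count the non-empty rows in a CSV file (hypothetical example)."
    )
    parser.add_argument("input_csv", help="path to the CSV file to read")
    parser.add_argument("--delimiter", default=",", help="field delimiter")
    return parser


@Gooey(program_name="CSV Row Counter")
def main():
    args = build_parser().parse_args()
    with open(args.input_csv, newline="") as handle:
        reader = csv.reader(handle, delimiter=args.delimiter)
        # A blank line comes back as an empty list, so any(row) skips it.
        rows = [row for row in reader if any(row)]
    print(f"{len(rows)} non-empty rows in {args.input_csv}")
```

In a real script you’d call main() under the usual `if __name__ == "__main__":` guard.  With Gooey installed, running the file should pop up a form with a field per argument instead of expecting command-line flags.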

Metric-Learn

Metric-Learn is a library out of the University of Massachusetts for learning distance metrics from data, with a scikit-learn style interface.  It includes a bunch of different algorithms, such as Least Squares Metric Learning (LSML) and Information Theoretic Metric Learning (ITML).  It will not only transform your data into the relevant metric space by means of the transform() method, but will also give you the transformation matrix directly via transformer(), which is great.  This sort of thing can be really helpful as a pre-processing step for classification, clustering or recommendation systems.  Check it out here.
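If it isn’t obvious why having the transformation matrix is handy, here’s a small sketch in plain Python.  It does not call Metric-Learn at all: the matrix L below is hand-picked and hypothetical, standing in for whatever a learner would hand back, and I’m assuming the learned metric is a Mahalanobis-style one defined by a linear transformation, which is my understanding of what most of these algorithms produce.  The point is that ordinary Euclidean distance after transforming the data matches the distance under M = LᵀL on the original data.

```python
import math

# Hypothetical 2x2 transformation matrix, NOT learned from data; it stands in
# for the matrix a metric learner would give you.
L = [[2.0, 0.0],
     [0.0, 0.5]]


def transform(matrix, x):
    """Map a point into the transformed space: x -> matrix @ x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in matrix]


def euclidean(a, b):
    """Ordinary Euclidean distance between two points."""
    return math.sqrt(sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b)))


def mahalanobis(matrix, x, y):
    """Distance under M = matrix^T matrix, without transforming the data."""
    diff = [x_i - y_i for x_i, y_i in zip(x, y)]
    n = len(diff)
    # M[i][j] = sum over k of matrix[k][i] * matrix[k][j]
    M = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
         for i in range(n)]
    return math.sqrt(sum(diff[i] * M[i][j] * diff[j]
                         for i in range(n) for j in range(n)))


x, y = [1.0, 2.0], [3.0, 6.0]
d_transformed = euclidean(transform(L, x), transform(L, y))
d_metric = mahalanobis(L, x, y)
print(math.isclose(d_transformed, d_metric))
```

So anything downstream that only knows about Euclidean distance (k-means, nearest neighbours, and so on) can use the learned metric simply by being fed the transformed data.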

The post Data Science Things Roundup #1 appeared first on Will’s Noise.