Data Science Things Roundup #2

This is the second edition of the now-regular series of posts, Data Science Things Roundup, where I round up data science things (as you’d probably guessed). Last week we had a scikit-learn extension, a GUI framework for Python CLIs, and some writing about how Kaggle winners won their competitions. This week is a bit more data-science-y, so dig in.

Lifelines

If you haven’t already checked out lifelines, or Probabilistic Programming and Bayesian Methods for Hackers, the book by its author Cam Davidson-Pilon, you really should. I used lifelines in a previous post, where we used survival analysis to try to estimate the quality of different chunks of code. The library itself is really great, so read the book, use the library, and let me know what you do with it. Check it out here.
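If survival analysis is new to you, here is a minimal sketch of the Kaplan-Meier estimator, which lifelines provides through its KaplanMeierFitter class. To be clear, this is plain Python written from scratch for illustration, not lifelines code, and the function name and the data (days a chunk of code survived before being rewritten) are made up.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    durations: how long each subject was followed.
    observed:  True if the event happened, False if the subject was censored
               (still "alive" when we stopped watching).
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk subjects that survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical data: days each code chunk lasted before someone rewrote it.
durations = [5, 8, 8, 12, 20, 20, 30]
observed = [True, True, False, True, True, False, False]

for t, s in kaplan_meier(durations, observed):
    print(f"day {t}: estimated survival {s:.3f}")
```

The censored chunks still count toward the at-risk total up to the day we stopped watching them, which is the whole reason to use survival analysis instead of just averaging the durations.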

Patsy Learn

Patsy-learn is an experimental scikit-learn extension written by Andreas Mueller, a core committer to scikit-learn and probably some other machine learning libraries you use. Patsy-learn is a wrapper around patsy that lets you use its R-style model description syntax in scikit-learn, so if you’re halfway between R and Python, this is probably the sort of thing you’d like. Check it out here.
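To give a feel for the R-style formula syntax, here is a rough sketch that uses patsy directly and hands the result to scikit-learn by hand. It does not use patsy-learn's own API (I'm not reproducing that here), it assumes patsy, numpy and scikit-learn are installed, and the column names and data are invented for the example.

```python
import numpy as np
from patsy import dmatrices
from sklearn.linear_model import LinearRegression

# Made-up, noiseless data so the fitted coefficients are easy to check.
rng = np.random.default_rng(0)
data = {"x1": rng.normal(size=50), "x2": rng.normal(size=50)}
data["y"] = 2.0 * data["x1"] - 1.0 * data["x2"] + 3.0

# R-style formula: response on the left, predictors on the right.
# "- 1" drops patsy's intercept column, since LinearRegression fits its own.
y, X = dmatrices("y ~ x1 + x2 - 1", data)
column_names = X.design_info.column_names

model = LinearRegression().fit(np.asarray(X), np.asarray(y).ravel())
print(dict(zip(column_names, model.coef_)), model.intercept_)
```

The manual conversion from formula to design matrix to estimator is the glue that a wrapper like patsy-learn is meant to save you from writing.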

HDBSCAN

DBSCAN is a pretty great clustering algorithm. From the practitioner/hacker perspective, HDBSCAN is that but better. This particular implementation is very fast, and HDBSCAN’s general ease of configuration and robustness to parameter selection make it a pretty great first stab at any clustering problem you may be working on. Check it out here.

The post Data Science Things Roundup #2 appeared first on Will’s Noise.