Data Science Things Roundup #4

Time for another edition of the data science things roundup, where I round up some data science things for y’all. Today’s collection is uncharacteristically R heavy. It’s usually pretty Python and machine learning heavy, so if you find something you like here, be sure to check out previous editions as well. Without further ado:

Scikit-Learn Groups

Scikit-learn groups (skl-groups) is a Python library for operating on sets of features (aka “groups”). It can be particularly useful for less-structured data where the interesting unit is a subset, like “galaxy clusters that are made up of individual galaxies” or “a set of tweets from a given area and time”. Rather than constructing one long feature vector for the whole set, you can use a bag of feature vectors, one per member of the group. Super interesting library. Check it out here.
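To make the “bag of features” idea concrete, here’s a minimal sketch in plain NumPy. It does not use the skl-groups API; the helper names (`make_bags`, `mean_embedding`, `mean_map_kernel`) and the random data are my own assumptions for illustration. Each group is an array with one row per member, groups can have different sizes, and two common ways to compare them are a per-bag mean vector or an average kernel value over all member pairs.

```python
import numpy as np

def make_bags(sizes, dim, seed=0):
    """Hypothetical helper: one (n_members, dim) array of random features per group."""
    rng = np.random.default_rng(seed)
    return [rng.normal(size=(n, dim)) for n in sizes]

def mean_embedding(bags):
    """Summarize each bag by the mean of its members' feature vectors."""
    return np.vstack([bag.mean(axis=0) for bag in bags])

def rbf(a, b, gamma):
    """Pairwise RBF kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def mean_map_kernel(bags, gamma=0.5):
    """Bag-to-bag similarity: average RBF value over all pairs of members."""
    n = len(bags)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = rbf(bags[i], bags[j], gamma).mean()
    return K

# Three groups with 5, 12, and 8 members, each member described by 4 features.
bags = make_bags(sizes=[5, 12, 8], dim=4)
summary = mean_embedding(bags)   # one row per group
K = mean_map_kernel(bags)        # one similarity per pair of groups
```

The matrix `K` could be handed to any estimator that accepts a precomputed kernel, which is roughly the appeal of working with bags instead of one long vector.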

Detecting Events with Markov Modulated Poisson Processes

This tutorial goes through the usage of Markov-modulated Poisson processes for unsupervised anomaly detection in noisy time series data. It’s got some nice-looking plots and seems to perform well on noisy data with rare anomalies, which is promising. It also comes with the corresponding R code. Check it out here.
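The tutorial’s code is in R, but the basic idea is easy to sketch in Python. What follows is my own simplified, discrete-time version, not the tutorial’s method: a hidden two-state Markov chain (“normal” vs. “event”) picks the Poisson rate for each time bin, and a forward filter estimates the probability of being in the event state. The rates, transition matrix, and function names are all assumptions, and the parameters are treated as known here, whereas an unsupervised approach would have to learn them.

```python
import numpy as np
from math import lgamma

def simulate_mmpp(n_steps, rates, trans, seed=0):
    """Simulate hidden states and Poisson counts, starting in state 0."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    trans = np.asarray(trans, dtype=float)
    states = np.zeros(n_steps, dtype=int)
    for t in range(1, n_steps):
        states[t] = rng.choice(len(rates), p=trans[states[t - 1]])
    counts = rng.poisson(rates[states])
    return states, counts

def poisson_pmf(k, lam):
    """Poisson probability of count k under each rate in lam."""
    return np.exp(k * np.log(lam) - lam - lgamma(k + 1))

def filter_states(counts, rates, trans, init):
    """Forward filter: P(state at t | counts up to t) for every t."""
    rates = np.asarray(rates, dtype=float)
    trans = np.asarray(trans, dtype=float)
    belief = np.asarray(init, dtype=float)
    post = np.empty((len(counts), len(rates)))
    for t, k in enumerate(counts):
        if t > 0:
            belief = belief @ trans          # predict one step ahead
        belief = belief * poisson_pmf(k, rates)  # weigh by the observed count
        belief = belief / belief.sum()       # renormalize
        post[t] = belief
    return post

# Assumed parameters: events are rare, short-lived, and much busier than normal.
rates = [2.0, 20.0]
trans = [[0.98, 0.02],
         [0.20, 0.80]]
states, counts = simulate_mmpp(200, rates, trans)
post = filter_states(counts, rates, trans, init=[1.0, 0.0])
flagged = post[:, 1] > 0.5   # time bins the filter considers anomalous
```

Plotting `counts` with `flagged` overlaid gives the same kind of picture as the tutorial’s plots, just on simulated data.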

Stochastic Dummy Boosting

DBoost is an implementation of stochastic dummy boosting, in which the weak learner encodes categorical variables by hash as it learns, adding an extra layer of randomness to the process. It’s a pretty short piece of code and is well commented, so even if you don’t have an immediate use for it, it’s worth a look. Check it out here.
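I haven’t reproduced DBoost’s code here, but the hashing idea can be sketched on its own. In this toy version (my assumptions throughout: the function names, the MD5-plus-salt hash, the per-bucket-mean weak learner, and the data), each boosting round draws a fresh salt, hashes the categories into a few dummy columns, and fits the current residuals on those columns. Different salts collide different categories, which is where the extra randomness comes from.

```python
import hashlib
import numpy as np

def hashed_dummies(values, n_buckets, salt):
    """One-hot encode each value into the bucket chosen by a salted hash."""
    out = np.zeros((len(values), n_buckets))
    for row, value in enumerate(values):
        digest = hashlib.md5(f"{salt}:{value}".encode()).hexdigest()
        out[row, int(digest, 16) % n_buckets] = 1.0
    return out

def boost(values, y, n_rounds=20, n_buckets=4, learning_rate=0.5, seed=0):
    """Residual-fitting loop with a freshly salted hash encoding each round."""
    rng = np.random.default_rng(seed)
    pred = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        salt = int(rng.integers(1_000_000))
        X = hashed_dummies(values, n_buckets, salt)
        residual = y - pred
        # Weak learner: mean residual per bucket (least squares on the dummies).
        bucket_sizes = X.sum(axis=0)
        bucket_sums = X.T @ residual
        bucket_means = np.divide(
            bucket_sums, bucket_sizes,
            out=np.zeros(n_buckets), where=bucket_sizes > 0,
        )
        pred = pred + learning_rate * (X @ bucket_means)
    return pred

# Made-up data: one categorical column and a numeric target.
values = ["red", "green", "blue", "red", "teal", "green", "blue", "pink", "teal", "red"]
y = np.array([1.0, 3.0, 5.0, 1.2, 7.0, 2.8, 5.1, 9.0, 6.9, 0.9])
pred = boost(values, y)
baseline_mse = np.var(y)              # error of always predicting the mean
boosted_mse = np.mean((y - pred) ** 2)
```

With more categories than buckets, any single round lumps some categories together, but averaging over many salts lets the ensemble pull them apart again.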

The post Data Science Things Roundup #4 appeared first on Will’s Noise.