Data Science Things Roundup #10

Hey all, I haven’t done one of these in quite a while, but thought I’d share a few more articles I’ve found interesting recently.

An analysis of twitter influencers in the field of data science & big data

This is a pretty in-depth Medium article that walks through some of the concepts in network analysis through the lens of Twitter data. It’s not an area I know a ton about, but I found it approachable and really interesting. Check it out here.
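To give a flavor of what "network analysis" means here, this is a minimal sketch of one of the simplest influence measures, in-degree centrality, on a toy follower graph. The account names and the `in_degree_centrality` helper are made up for illustration; this is not code or data from the article.

```python
from collections import defaultdict


def in_degree_centrality(edges):
    """Score each account by the fraction of other accounts that follow it.

    `edges` is an iterable of (follower, followed) pairs.
    """
    nodes = set()
    in_degree = defaultdict(int)
    for follower, followed in edges:
        nodes.update((follower, followed))
        in_degree[followed] += 1
    # Normalize by the number of *other* accounts in the graph.
    denom = max(len(nodes) - 1, 1)
    return {node: in_degree[node] / denom for node in nodes}


# Hypothetical follower relationships: (follower, followed).
edges = [("alice", "carol"), ("bob", "carol"), ("carol", "alice")]
scores = in_degree_centrality(edges)
```

In this toy graph carol is followed by both other accounts, so she scores 1.0, alice scores 0.5, and bob scores 0.0. Real analyses usually lean on richer measures, but the idea of ranking nodes by their position in the graph is the same.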

StashPy

I am a pretty heavy user of the Elasticsearch ecosystem, and have found it to be a really powerful tool. I also, as you probably know if you read this blog, work a lot in Python. StashPy is a Python 3 project that does more or less the same thing as a minimal Logstash: it takes a config, listens on a TCP port, and pipes log data through a processing pipeline before indexing it into Elasticsearch. Super cool. Check it out here.
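If the parse, process, index flow is unfamiliar, here is a rough sketch of the idea in plain Python. To be clear, this is not StashPy's API: the log format, the `parse`, `add_severity`, `run_pipeline`, and `to_bulk_lines` helpers, and the `logs` index name are all assumptions for illustration, and the TCP listener and the actual HTTP call to Elasticsearch are left out.

```python
import json
import re

# Assumed log format: "<timestamp> <LEVEL> <message>".
LINE_RE = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")

SEVERITY = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}


def parse(raw):
    """Turn a raw log line into a dict, tagging lines that don't match."""
    match = LINE_RE.match(raw.strip())
    if match is None:
        return {"message": raw.strip(), "tags": ["parse_failure"]}
    return match.groupdict()


def add_severity(event):
    """Example processing step: attach a numeric severity to the event."""
    if "level" in event:
        event["severity"] = SEVERITY.get(event["level"], 0)
    return event


def run_pipeline(raw, steps):
    """Parse a line, then pass the event through each step in order."""
    event = parse(raw)
    for step in steps:
        event = step(event)
    return event


def to_bulk_lines(event, index):
    """Format one event as the action/source line pair the bulk API expects."""
    action = {"index": {"_index": index}}
    return json.dumps(action) + "\n" + json.dumps(event) + "\n"


event = run_pipeline("2017-01-01T00:00:00Z ERROR disk full", [add_severity])
payload = to_bulk_lines(event, "logs")
```

Each step is just a function from event to event, which is what makes this style of pipeline easy to configure: the config only has to name the steps and their order.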

Bayesian Survival Analysis with python and pymc3

Survival analysis is a really powerful branch of statistics concerned with predicting the time until some event happens. It comes up a lot in the medical field in particular (predicting time to death for different cases, as an example). I’ve used it lightly in a past post to try to predict the time until a programmer’s code would be replaced or deleted; you can check that out here. In this article, Austin walks through the math backing some of the more common algorithms, and then how to translate that into Python. Check it out here.
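For a quick feel for the problem, here is a minimal sketch of the classical Kaplan-Meier estimator in plain Python. Note that this is the standard non-Bayesian estimator, not the pymc3 model from the article, and the durations below are made-up toy data. The key wrinkle it handles is censoring: subjects we stopped observing before the event happened still count as "at risk" up to the point they drop out.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Estimate the survival curve S(t) at each distinct event time.

    durations: time each subject was followed.
    observed:  True if the event happened at that time, False if censored.
    Returns a list of (time, survival_probability) pairs.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk subjects surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data: five subjects, two of them censored.
durations = [1, 2, 2, 3, 4]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

On this toy data the estimated survival probability drops to roughly 0.8 after time 1, 0.6 after time 2, and 0.3 after time 3. The subject censored at time 4 never triggers a drop, but it does keep the risk set larger for the earlier steps.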