Growing Pedal Wrencher

In the past I’ve posted a few times about the creation and stagnation of a project of mine, pedal wrencher. In recent months I’ve taken a more structured approach to iteratively improving the app, and I plan to share the lessons learned along the way more freely going forward. This is a […]

Data Science Things Roundup #5

Time again for the 5th edition of the data science things roundup, named suspiciously similarly to the much more established Data Science Roundup by RJMetrics (but we won’t worry about that this week). In previous weeks we’ve seen some pretty cool ML and data science libraries, mostly in Python; this week we branch out […]

Data Science Things Roundup #4

Time for another edition of the data science things roundup, where I round up some data science things for y’all. Today’s collection is uncharacteristically R heavy. It’s usually pretty Python and machine learning heavy, so if you find something you like here, be sure to check out previous editions as well. Without further ado: Scikit-Learn […]

When do I work on what?

In past posts, I’ve shown that it’s pretty easy to create organization-wide punchcards with git-pandas. Today, I put together a little twist on that particular visualization to split my projects into two cohorts: open and closed source. My work at Predikto is, as work tends to be, mostly closed source (though we try to […]

Data Science Things Roundup #3

Time again for the 3rd edition of the data science things roundup, where I share a few data science things I’ve come across recently. Check out previous editions here and here. Self-Organizing Maps with TensorFlow: Google’s open sourcing of TensorFlow late last year caused a pretty big splash in the machine learning and data […]

Data Science Things Roundup #2

This is the second edition of the now-regular series of posts, Data Science Things Roundup, where I round up data science things (as you’ve probably guessed). Last week we had a scikit-learn extension, a GUI framework for Python CLIs, and some writing about how Kaggle winners won their competitions. This week is a bit more […]

Data Science Things Roundup #1

This is the first in a new series of posts, tailored more towards the newsletter subscribers (join it here). There are a few of these around the internet that I like, notably ds_ldn’s Data Machina, RJMetrics’ Data Science Roundup, and Jeremy Singer-Vine’s Data is Plural. Mine will probably be way less consistent, so if you like […]

Testing with Apache Spark and Python

Apache Spark, and PySpark in particular, is a fantastically powerful framework for large-scale data processing and analytics. In the past I’ve written about Flink’s Python API a couple of times, but my day-to-day work is in PySpark, not Flink. With any data processing pipeline, thorough testing is critical to ensuring the veracity of the end result, so […]
