I recently took a stab at a Kaggle competition. The premise was simple: given some information about insurance quotes, predict whether or not the customer who requested the quote will follow through and buy the insurance. It was a straightforward classification problem, with data already clean and in one place and a clear scoring metric (area under the ROC curve). […]
Author: Will McGinnis
Weighing options with python and petersburg
In the past few weeks I’ve posted a number of examples of applications for petersburg. You can check them out here, here and here. It is, in short, an extension of probabilistic decision graphs and Bayesian networks that allows for some interesting analysis on decision theoretic problems. The interesting thing that arises out of this […]
Git-pandas v1.0.0, or how to check for a stable release
In the process of making the v1.0.0 release of git-pandas, I had one primary goal: to simplify and solidify the interface to git-pandas objects (the ProjectDirectory and the Repository). At the end of the day, the usefulness of a project like git-pandas versus one-off analysis or rolling your own interface is consistent and predictable […]
Github.com cumulative blame in 5 lines of python
Git-pandas has gotten to be pretty capable. Currently in the master branch and soon to be in the v1.0.0 release, we’ve included a github.com interface to git-pandas via the GitHubProfile class. With this, in just a few lines of code, you can see how your profile has grown over time: from gitpandas.utilities.plotting import plot_cumulative_blame from […]
Decision strategies: beyond expected value
Oftentimes when making some kind of uncertain decision, the decision maker will use a measure such as expected value to make that decision. Imagine the case of a single coin flip where the bettor pays 5 dollars to play, and gets 2 dollars for heads and 10 dollars for tails. The expected value of this […]
Introducing the pygeohash stats module
Pygeohash version 1.1.0 is now live on pypi! It includes the first release of the stats module, intended to provide high-level stats and manipulations for lists of geohashes. The initial functions are: mean: provides the mean position from a list of geohashes; northern: the northernmost geohash in the list; southern: the southernmost geohash in […]
Solving the two envelopes problem with python and petersburg
In a couple of previous posts (here, here, and here) we’ve explored how petersburg represents uncertain decisions as a directed acyclic graph with weighted random decisions for which edge to take out of a given node. It turns out this is very similar to Bayesian networks, which will be the subject of a post in the […]
Open Source Projects In Atlanta
A few times recently, I’ve been working with an open source project of some kind and it turned out that the maintainer was another Atlanta local. We have great resources for keeping track of what entrepreneurs in town are working on (see my post on that here), and what local VCs are up to (here), […]
Bayesian Networks vs. Petersburg
A couple of weeks ago we walked through how petersburg represents complex decisions (check it out here). Some of you may have recognized a familiar concept in that description: Bayesian networks (or bayesnet). Just like petersburg’s structure, a Bayesian network is at its core a Directed Acyclic Graph (DAG). So let’s first discuss what a […]
Thinking like a graph to make decisions: petersburg
A while back I posted an example outlining how to use petersburg to simulate the St. Petersburg Paradox. In this post, I’d like to dig a little deeper into what petersburg is and what it does. Petersburg is a minimal python library for representing complex decisions (as in decision theory decisions) as probabilistic graphs. With […]