Will’s Noise http://www.willmcginnis.com Data Science, Technology, Atlanta Mon, 25 Dec 2017 00:52:27 […]
Author: Will McGinnis
Bowl Game Pick ’em Results
If you haven’t read my previous post on picking bowl game winners with elote, this may not make a whole lot of sense, but basically I wrote a rating system, trained it on the college football season thus far, and used it to predict winners for every bowl game. In this post, I’m tracking how […]
On taking things too seriously: holiday edition
For some reason Atlanta got a pretty significant amount of snow yesterday, and because of that I’ve been mostly stuck at home. When faced with that kind of time on hand, sometimes I spend too much time on things that don’t really matter all that much. Recently, I’ve been fascinated with rating systems (see a […]
Elote: a python package of rating systems
Recently I’ve been interested in rating systems. Around here the application most front of mind for those is college football rankings. In general, imagine any case where you have a large population of things you want to rank, and only a limited set of head-to-head matchups between those things to use for building your ratings. Without […]
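To make the idea of building ratings from a limited set of head-to-head matchups concrete, here is a minimal sketch of a plain Elo update in standard-library Python. It is not elote's API: the function names, the starting rating of 1500, the K-factor of 32, and the team names are all assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(winner_rating, loser_rating, k=32.0):
    """Return the new (winner, loser) ratings after a single matchup."""
    # The winner gains more when the win was unexpected.
    delta = k * (1.0 - expected_score(winner_rating, loser_rating))
    return winner_rating + delta, loser_rating - delta


def rate(matchups, initial=1500.0):
    """Build ratings from an ordered list of (winner, loser) pairs."""
    ratings = {}
    for winner, loser in matchups:
        w = ratings.get(winner, initial)
        l = ratings.get(loser, initial)
        ratings[winner], ratings[loser] = update(w, l)
    return ratings


# Hypothetical results: each tuple is (winner, loser).
games = [("A", "B"), ("B", "C"), ("A", "C")]
ratings = rate(games)
ranking = sorted(ratings, key=ratings.get, reverse=True)
```

Each game moves the same number of points from loser to winner, so the total rating in the population stays constant, and teams that never played each other can still be compared through their shared opponents.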
Ripyr: sampled metrics on datasets using python’s asyncio
Today I’d like to introduce a little python library I’ve toyed around with here and there for the past year or so, ripyr. Originally it was written just as an excuse to try out some newer features in modern python: asyncio and type hinting. The whole package is type hinted, which turned out to be […]
Category Encoders v1.2.5 Release
This release was actually cut a couple of weeks ago, but I forgot to put a post here. It’s been a release of mainly incremental changes, but also one of increased contributions from the community, so while not a huge feature-packed release, it’s one I’m particularly proud of. Here’s to more like this. It was […]
Standing Peachtree Park
I find it very easy to forget the world we live in. I grew up in North Atlanta, near the Chattahoochee, and spent a huge portion of it riding bikes to the river, going to some park on the river, or otherwise being around it. I’ve lived intown now as an adult for some years, […]
Data Science Things Roundup #11
Once again time for the data science things roundup, a few links of articles or projects I’ve stumbled across and found interesting. This is the 11th one in the extremely irregular series, so if you think it’s cool, check out some of the others: This time we’ve got quite a diverse set of links, so […]
Modernizing Pedalwrencher: whatever that means.
I’ve got a side project that I’ve maintained (badly) for the past couple of years, pedalwrencher.com. It’s a pretty simple idea: if you ride bikes and use strava.com, you can sign up with pedalwrencher and set up mileage-based alerts. So if you want to replace your chain every 2000 miles, you can get an […]
Git-pandas caching for faster analysis
Git-pandas is a python library I wrote to help make analysis of git data easier when dealing with collections of repositories. It makes a ton of cool stuff easier, like cumulative blame plots, but they can be kind of slow, especially with many large repositories. In the past we’ve made that work with running analyses […]
Category Encoders v1.2.4 Release
I’ve just cut a fresh release of the scikit-learn-contrib library, category_encoders. This one included a lot of great contributions from the broader community, which has been really great. A few selected features now available: Leave-one-out encoding: a new encoder, based on a popular Kaggle post by Owen Zhang, detailed here and here. (proposal) Maintenance fixes […]
Data Science Things Roundup #10
Hey all, I haven’t done one of these in quite a while, but thought I’d share a few more articles I’ve found interesting recently. An analysis of twitter influencers in the field of data science & big data This is a pretty in depth medium article that goes through some of the concepts in network […]
Garden shed and woodpile
For the past couple of years this website has been pretty much all about software or the business of writing/selling software. But as it turns out, that is not all a person should really do, so here is a totally unrelated post. My house has an approximately 5 ft wide alleyway sort of thing around […]
BaseN Encoding and Grid Search in category_encoders
In the past I’ve posted about the various categorical encoding methods one can use for machine learning tasks, like one-hot encoding, ordinal or binary. In my OSS package, category_encoders, I’ve added a single scikit-learn compatible encoder called BaseNEncoder, which allows the user to pick a base (2 for binary, N for ordinal, 1 for one-hot, […]
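As a rough illustration of how the choice of base changes the encoding, the sketch below maps each category to an ordinal index and writes that index as digits in the chosen base. It is plain Python, not the BaseNEncoder implementation: the helper names, the zero-based ordinal mapping, and the sample colors are assumptions, and the base-1 (one-hot) case is deliberately left out because it is not a positional number system.

```python
def base_n_digits(index, base, width):
    """Write a non-negative integer as `width` digits in `base`, most significant first."""
    if base < 2:
        raise ValueError("this sketch only handles base >= 2")
    digits = []
    for _ in range(width):
        index, remainder = divmod(index, base)
        digits.append(remainder)
    if index:
        raise ValueError("width is too small for this index")
    return digits[::-1]


def encode_column(values, base):
    """Encode a list of categories as rows of base-N digits."""
    categories = sorted(set(values))
    ordinal = {category: i for i, category in enumerate(categories)}
    # Smallest number of digits that can represent every ordinal index.
    width = 1
    while base ** width < len(categories):
        width += 1
    return [base_n_digits(ordinal[v], base, width) for v in values]


colors = ["red", "green", "blue", "red"]
binary_rows = encode_column(colors, base=2)   # two columns per row
ordinal_rows = encode_column(colors, base=3)  # base N = 3 categories: one column
```

With three categories, base 2 needs two output columns, while base 3 collapses to a single ordinal column, which is the trade-off between column count and cardinality per column that a grid search over the base would explore.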
Category Encoders accepted into scikit-learn-contrib
In the past I’ve posted a few times about a library I’m working on called category encoders. The idea of it is to provide a complete toolbox of scikit-learn compatible transformers for the encoding of categorical variables in different ways. If that sounds interesting, you can check out much more in-depth posts here and here. […]