In basketball, we typically talk about five positions: point guard, shooting guard, small forward, power forward, and center. Based on this, one might expect NBA players to fall into five distinct groups: point guards perform similarly to other point guards, shooting guards perform similarly to other shooting guards, and so on. Is this the case? Do NBA […]
Using survival analysis and git-pandas to estimate code quality
Survival analysis is a statistical technique for estimating how likely events are to happen over a timeline. It originated largely in the medical and actuarial professions, where it answers questions like: given this set of conditions, how likely is a person to survive X years? In previous posts, we’ve seen that we can tap […]
Updated Django Website
One year later Last year I wrote about developing our company website with Django: http://blog.aclark.net/2015/01/11/new-django-website/index.html This year, I updated the site and am again very happy with the results. Here’s an overview of the interesting aspects. Makefile I’ve continued to develop Python projects using a Makefile. So much so I’m now attempting to genericize the […]
Using Support Vector Machines for Digit Recognition
I have been sitting on the MNIST data set for a while now. The MNIST database is a large database of handwritten digits, which is provided in the Kaggle Knowledge Competition Digit Recognizer. I have been sitting on this data set for so long, in fact, that the last thing I wrote for it was […]
Interview with a Data Scientist Tool Developer
About Peadar: Peadar Coyle is a data scientist, author and math geek who specializes in applying robust statistical or machine learning models to data to extract business value. His academic interests range from quantum computing to time series forecasting. Peadar has worked or consulted for Amazon, Vodafone, Import.io and JobTODAY, to name a few. He […]
Overriding Default Arguments in Python
Sometimes you want to change the behavior of a function call in a Python test. Let’s assume you have the following code:
    # a.py
    from b import subfunc
    def func():
        # do something
        subfunc(1, 2)
        # do something else
    # b.py
    def subfunc(a, b=1):
        # step1
        # step2
        # step3
You are testing the func […]
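The excerpt above is cut off before it shows the post's own approach, so what follows is only a minimal sketch of one way to override a default argument during a test, not the author's method. It assumes a single-file stand-in for `a.py` and `b.py`, a `func` that relies on the default for `b` (unlike the call in the excerpt, which passes both arguments), and a hypothetical helper named `override_defaults`. It relies on the fact that Python stores positional defaults in the function's `__defaults__` tuple.

```python
from contextlib import contextmanager


def subfunc(a, b=1):
    # Stand-in for b.subfunc; returns its arguments so the effect is visible.
    return (a, b)


def func():
    # Assumed caller that relies on subfunc's default value for b.
    return subfunc(1)


@contextmanager
def override_defaults(function, *defaults):
    """Temporarily replace a function's positional default values.

    Hypothetical helper for illustration; not part of any library.
    """
    original = function.__defaults__
    function.__defaults__ = defaults
    try:
        yield
    finally:
        # Always restore the original defaults, even if the test body raises.
        function.__defaults__ = original


with override_defaults(subfunc, 5):
    inside = func()   # subfunc sees b=5 here

after = func()        # the original default is back in place
```

Because the defaults are changed on the function object itself, every caller sees the new value while the context manager is active, which is why the sketch restores them in a `finally` clause.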
psutil 4.0.0 and how to get “real” process memory and environ in Python
New psutil 4.0.0 is out, with some interesting news about process memory metrics. I’ll just get straight to the point and describe what’s new. “Real” process memory info Determining how much memory a process really uses is not an easy matter (see this and this). RSS (Resident Set Size), which is what most people usually […]
Repeated measures ANOVA using Python
Within-subjects designs are a common method in experimental psychology. One way to analyze the data collected with within-subjects designs is repeated measures ANOVA. I recently wrote a post on how to conduct a repeated measures ANOVA using Python and rpy2. I wrote that post because the great Python package statsmodels does not include repeated […]
Journalism and the perfect pitch deck
Having been pretty immersed in the VC-funded startup experience at Predikto for a couple of years now, I have (contrary to what I would ever have expected) developed an intellectual curiosity about pitch decks. As I come across them, I put up notable pitch decks on a page here, including the fantastically annotated LinkedIn […]
Coming to Bay Area in April
Despite my visa blues (see more at https://todayilearnedinamerica.wordpress.com/2016/02/15/night-13-make-epic-shit/ ), I am still hanging on and traveling on in the United States of America. I am also going to TWO of the best conferences I have never attended, despite having been a blog partner for the past three years. Predictive Analytics World San Francisco – April 3-7, 2016…
Early Bird Tickets for EARL R Conference London 2016
We are pleased to announce that early bird tickets for the EARL (Effective Application of the R Language) Conference have been released and are now available to purchase. The conference will be held on the 13th-15th September 2016 at the Tower Hotel in London. The Full Conference Pass includes: 2 pre-conference workshops (13th September) both full conference […]
Python Design Patterns: For Sleek And Fashionable Code
Python is a powerful, object-based, high-level programming language with dynamic typing and binding. Due to its flexibility and power, developers often employ certain rules, or Python design patterns. What makes them so important, and what does this mean for the average Python developer? In this post, Toptal Senior Software Engineer Andrei Boyanov explains why […]
Dynaconf – Let your settings to be Dynamic
Dynaconf dynaconf – The dynamic configurator for your Python Project dynaconf is an OSM (Object Settings Mapper). It can read settings variables from a set of different data stores such as Python settings files, environment variables, Redis, memcached, INI files, JSON files, and YAML files, and you can customize dynaconf loaders to read from wherever you […]
How to always execute exit functions in Python
…or why atexit.register() and signal.signal() are evil UPDATE (2016-02-13): this recipe no longer handles SIGINT, SIGQUIT and SIGABRT as aliases for “application exit” because it was a bad idea. It only handles SIGTERM. It also no longer supports Windows because the signal.signal() implementation there is too different from POSIX. Many people erroneously think that any function registered […]
Secured Communication for Hacker Activists and Liberals
Does the NSA track Git requests? I mean, couldn’t terrorists just be talking to each other through visual cryptography of Arabic in Git repo requests? That would basically increase the cost of decryption. This is visual cryptography. Now imagine using a one-time pad codebook of just emojis and talking through mobile and Kik. Etherpad is…