Data Normalization in Python

Opening Day: Well, it’s that time of the year again in the United States. The 162-game marathon MLB season is officially underway. In honor of the opening of another season of America’s Pastime, I was working on a post that uses data from the MLB. What I realized was that as I was writing […]

Electron Release Manager

Recently we released a new version of Rodeo, our data science IDE. In the past, this meant our users would have to go to our homepage, click on the Rodeo page, download Rodeo again, and then reinstall it. But luckily this is no longer the case! As of the v1.1 release, we’re officially supporting […]

Summarizing Data in SQL

About Matt: Matt DeLand is Co-Founder and Data Scientist at Wagon. His team is building a collaborative SQL editor for analysts and engineers. He studied algebraic geometry at Columbia University, taught at the University of Michigan, and now enjoys applied machine learning. His mom is very proud! Introduction: How quickly can you understand data from […]

What is Model-Based Machine Learning?

About Tom: Tom Diethe is a research fellow on the SPHERE project at the University of Bristol. His research interests include probabilistic machine learning, computational statistics, learning theory, and data fusion. He has a PhD in machine learning applied to multivariate signal processing from University College London. Contact him at tom.diethe@bristol.ac.uk. Introduction: If you haven’t […]

How we built Rodeo with Electron

Last week we announced the release of Rodeo v1.0. The big deal was that we’d taken Rodeo from a command-line Python app built using Flask to a more legitimate-looking desktop app. There were comments on Reddit and Twitter mentioning that it seemed like Rodeo was running its own browser behind the scenes, and these […]

ScienceCluster Meets Spark

Getting Started: When did all the ‘big data’ hoopla start? By the very first definition, in a 1997 paper by scientists at NASA, a data set that is too big to fit on a local disk has officially graduated to big-data-dom. Whether you’re working with large Excel files or processing the “10 terabytes generated by […]
