Kaggle is a site where people create algorithms and compete against machine learning practitioners around the world. Your algorithm wins the competition if it’s the most accurate on a particular data set. Kaggle is a fun way to practice your machine learning skills. This tutorial is based on part of our free, four-part course: Kaggle […]

# Category: NumPy

## Text Analytics and Visualization

For this post, I want to describe a text analytics and visualization technique built on a basic keyword extraction mechanism: nothing but a word counter, used to find the top 3 keywords in a corpus of articles that I’ve created from my blog at http://ericbrown.com. To create this corpus, I downloaded all of my blog posts […]
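As a rough illustration of the word-counter idea described above, here is a minimal sketch using only the standard library. The stop-word list, the `top_keywords` helper, and the sample text are all assumptions made for this example; they are not taken from the original post.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def top_keywords(text, n=3):
    """Return the n most frequent words in text, ignoring stop words."""
    # Lowercase the text and split it into runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


# Made-up sample text standing in for one article of the corpus.
sample = (
    "Data strategy needs data governance. "
    "Good governance of data is strategy; data strategy wins."
)
keywords = top_keywords(sample)
print(keywords)  # ['data', 'strategy', 'governance']
```

Running the same helper over each article in a corpus would give one short keyword list per post, which is the kind of output a later visualization step could work from.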

## Explore Happiness Data Using Python Pivot Tables

One of the biggest challenges when facing a new data set is knowing where to start and what to focus on. Being able to quickly summarize hundreds of rows and columns can save you a lot of time and frustration. A simple tool you can use to achieve this is a pivot table, which helps […]
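To make the pivot table idea concrete, the sketch below summarizes a small table with `pandas.pivot_table`. The column names and values are invented for illustration and are not the data set used in the post.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "region": ["Europe", "Europe", "Asia", "Asia"],
    "year": [2015, 2016, 2015, 2016],
    "score": [6.0, 7.0, 5.0, 5.5],
})

# One row per region, one column per year, each cell the mean score.
summary = pd.pivot_table(
    df, index="region", columns="year", values="score", aggfunc="mean"
)
print(summary)
```

Changing `aggfunc` (for example to `"min"` or `"max"`) gives a different summary of the same grouping without restructuring the data.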

## Machine Learning Fundamentals: Predicting Airbnb Prices

Machine learning is easily one of the biggest buzzwords in tech right now. Over the past three years Google searches for “machine learning” have increased by over 350%. But understanding machine learning can be difficult — you either use pre-built packages that act like ‘black boxes’ where you pass in data and magic comes out […]

## Using pandas with large data

Tips for reducing memory usage by up to 90%. When working with small data in pandas (under 100 megabytes), performance is rarely a problem. When we move to larger data (100 megabytes to multiple gigabytes), performance issues can make run times much longer and cause code to fail entirely due to insufficient memory. While tools […]
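The excerpt does not say which techniques the post uses, so the following is only a sketch of two common ways to shrink a DataFrame: downcasting integer columns and storing repetitive strings as categoricals. The column names and data are assumptions, and the savings on real data will vary.

```python
import pandas as pd

# Made-up data: an integer column and a low-cardinality string column.
df = pd.DataFrame({
    "count": list(range(1000)),
    "team": ["red", "blue"] * 500,
})
before = df.memory_usage(deep=True).sum()

optimized = df.copy()
# Use the smallest unsigned integer type that can hold the values.
optimized["count"] = pd.to_numeric(optimized["count"], downcast="unsigned")
# Store each distinct string once and keep small integer codes per row.
optimized["team"] = optimized["team"].astype("category")
after = optimized.memory_usage(deep=True).sum()

print(f"before: {before} bytes, after: {after} bytes")
```

Passing `deep=True` matters here: without it, pandas may report only the pointer size for object columns rather than the memory held by the strings themselves.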

## Should I learn Python 2 or 3?

Image credit: DigitalOcean. One of the biggest sources of confusion and misinformation for people wanting to learn Python is which version they should learn. Should I learn Python 2.x or Python 3.x? Indeed, this is one of the questions we are asked most often at Dataquest, where we teach Python as part of our Data […]

## Understanding SettingWithCopyWarning in pandas

SettingWithCopyWarning is one of the most common hurdles people run into when learning pandas. A quick web search will reveal scores of Stack Overflow questions, GitHub issues and forum posts from programmers trying to wrap their heads around what this warning means in their particular situation. It’s no surprise that many struggle with this; there […]

## Step-by-step guide for solving the Pyvttbl Float and NoneType error

In this short post I will show you a quick fix for the error “unsupported operand type(s) for +: ‘float’ and ‘NoneType’” with Pyvttbl. In earlier posts I have shown how to carry out ANOVA using Pyvttbl, among other packages (see posts 1, 2, 3, and 3 for ANOVA using Pyvttbl). However, Pyvttbl is not […]

## Setting up a Python development environment

Setting up Python is usually simple, but there are some places where newcomers (and experienced users) need to be careful. What versions are there? What’s the difference between Python, CPython, Anaconda, PyPy? Those and many other questions may stump new developers, or people wanting to use Python. Note: this guide is opinionated. Contents Glossary and […]

## Forecasting Time-Series data with Prophet – Part 2

In Forecasting Time-Series data with Prophet – Part 1, I introduced Facebook’s Prophet library for time-series forecasting. In this article, I wanted to take some time to share how I work with the data after the forecasts. Specifically, I wanted to share some tips on how I visualize the Prophet forecasts using matplotlib rather than […]

## Visualizing data – overlaying charts in python

Visualizing data is vital to analyzing data. If you can’t see your data – and see it in multiple ways – you’ll have a hard time analyzing that data. There are quite a few ways to visualize data and, thankfully, with pandas, matplotlib and/or seaborn, you can make some pretty powerful visualizations during analysis. One of […]

## Forecasting Time-Series data with Prophet – Part 1

This is part 1 of a series where I look at using Prophet for Time-Series forecasting in Python. A lot of what I do in my data analytics work is understanding time series data, modeling that data and trying to forecast what might come next in that data. Over the years I’ve used many different […]

## Getting Started with Kaggle: House Prices Competition

Founded in 2010, Kaggle is a Data Science platform where users can share, collaborate, and compete. One key feature of Kaggle is “Competitions”, which offers users the ability to practice on real world data and to test their skills with, and against, an international community. This guide will teach you how to approach and enter […]

## NumPy Cheat Sheet – Python for Data Science

NumPy is the library that gives Python its ability to work with data at speed. Originally launched in 1995 as ‘Numeric’, NumPy is the foundation on which many important Python data science libraries are built, including pandas, SciPy and scikit-learn. It’s common when first learning NumPy to have […]

## How to do Descriptive Statistics in Python using NumPy

In this short post we are going to revisit how to carry out summary/descriptive statistics in Python. In the previous post, I used pandas (but also SciPy and NumPy; see Descriptive Statistics Using Python), but now we are only going to use NumPy. The descriptive statistics we are going to calculate are […]
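The excerpt is cut off before it lists which statistics the post covers, so the sketch below simply shows a few common ones computed with NumPy alone. The choice of statistics and the sample values are assumptions made for illustration.

```python
import numpy as np

# Made-up sample values.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(data)
median = np.median(data)
# Population standard deviation (divides by n).
std_pop = np.std(data)
# Sample standard deviation (divides by n - 1).
std_sample = np.std(data, ddof=1)
minimum, maximum = np.min(data), np.max(data)

print(mean, median, std_pop, std_sample, minimum, maximum)
```

Note that `np.std` defaults to the population formula; `ddof=1` is needed to match the sample standard deviation reported by many statistics packages.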