Libraries – Page 4

Python Plotting With Matplotlib (Guide)

February 28, 2018 Real Python

A picture says a thousand words, and with Python’s matplotlib library, it fortunately takes far fewer than a thousand words of code to create a production-quality graphic. However, matplotlib is also a massive library, and getting a plot to look “just right” is often a matter of trial and error. Using one-liners to generate basic plots […]

Simplifying Offline Python Deployments With Docker

January 24, 2018 Real Python

In cases when a production server does not have access to the Internet or to the internal network, you will need to bundle up the Python dependencies (as wheel files) and interpreter along with the source code. This post looks at how to package up a Python project for distribution internally on a machine cut […]

Local Interpretable Model-agnostic Explanations – LIME in Python

January 20, 2018 Python Data

When working with classification and/or regression techniques, it’s always good to have the ability to ‘explain’ what your model is doing. Using Local Interpretable Model-agnostic Explanations (LIME), you now have the ability to quickly provide visual explanations of your model(s). It’s quite easy to throw numbers or content into an algorithm and get a result […]

Introduction to Python Ensembles

January 11, 2018 Sebastian Flennerhag

Stacking models in Python efficiently. Ensembles have rapidly become one of the hottest and most popular methods in applied machine learning. Virtually every winning Kaggle solution features them, and many data science pipelines have ensembles in them. Put simply, ensembles combine predictions from different models to generate a final prediction, and the more models we […]
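As a rough illustration of "combining predictions from different models", here is a minimal sketch that averages predicted probabilities. Note that this is plain averaging, not the stacking approach the post covers; the function name `average_ensemble` and the three sets of model outputs are hypothetical values made up for the example.

```python
from statistics import mean


def average_ensemble(predictions):
    """Average per-sample predictions from several models.

    predictions: one list of predicted probabilities per model,
    all of the same length (one entry per sample).
    """
    return [mean(sample_preds) for sample_preds in zip(*predictions)]


# Hypothetical predicted probabilities from three models for four samples.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.8, 0.3, 0.4, 0.5]
model_c = [0.7, 0.1, 0.8, 0.3]

ensemble = average_ensemble([model_a, model_b, model_c])

# Turn the averaged probabilities into class labels with a 0.5 threshold.
labels = [int(p >= 0.5) for p in ensemble]
print(labels)
```

Averaging tends to smooth out the individual models' errors; stacking goes further by training another model on top of these predictions.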

Adding Axis Labels to Plots With pandas

December 20, 2017 Josh Devlin

Pandas plotting methods provide an easy way to plot pandas objects. Often though, you’d like to add axis labels, which involves understanding the intricacies of Matplotlib syntax. Thankfully, there’s a way to do this entirely using pandas. Let’s start by importing the required libraries: import pandas as pd import numpy as np import matplotlib.pyplot as […]

Pandas Concatenation Tutorial

December 13, 2017 Sunishchal Dev

You’d be hard pressed to find a data science project that doesn’t require multiple data sources to be combined. Oftentimes, data analysis calls for appending new rows to a table, pulling additional columns in, or in more complex cases, merging distinct tables on a common key. All of these tricks are handy to […]

Using Excel with pandas

December 8, 2017 Harish Garg

Excel is one of the most popular and widely used data tools; it’s hard to find an organization that doesn’t work with it in some way. From analysts to sales VPs to CEOs, various professionals use Excel for both quick stats and serious data crunching. With Excel being so pervasive, data professionals must be familiar with […]

Setting Up the PyData Stack on Windows

November 22, 2017 Vik Paruchuri

The speed of modern electronic devices allows us to crunch large amounts of data at home. However, these devices require the right software in order to reach peak performance. Luckily, it’s now easier than ever to set up your own data science environment. One of the most popular stacks for data science is PyData, a […]

Kaggle Fundamentals: The Titanic Competition

October 25, 2017 Vik Paruchuri

Kaggle is a site where people create algorithms and compete against machine learning practitioners around the world. Your algorithm wins the competition if it’s the most accurate on a particular data set. Kaggle is a fun way to practice your machine learning skills. This tutorial is based on part of our free, four-part course: Kaggle […]

Text Analytics and Visualization

October 9, 2017 Python Data

For this post, I want to describe a text analytics and visualization technique built on a basic keyword extraction mechanism: nothing but a word counter, used to find the top 3 keywords from a corpus of articles that I’ve created from my blog at http://ericbrown.com. To create this corpus, I downloaded all of my blog posts […]
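The word-counter idea can be sketched with the standard library's `collections.Counter`. This is only an illustration under assumptions, not the post's actual code: the stop-word list, the `top_keywords` helper, and the two-sentence sample corpus are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def top_keywords(text, n=3):
    """Return the n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


# Hypothetical stand-in for a corpus of blog posts.
corpus = [
    "Data strategy is the foundation of data analytics.",
    "Analytics teams need a data strategy and data governance.",
]

keywords = top_keywords(" ".join(corpus))
print(keywords)
```

A real corpus would need a fuller stop-word list and probably some normalization (for example, treating "strategy" and "strategies" as one term) before the counts become meaningful.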

Explore Happiness Data Using Python Pivot Tables

September 25, 2017 Vik Paruchuri

One of the biggest challenges when facing a new data set is knowing where to start and what to focus on. Being able to quickly summarize hundreds of rows and columns can save you a lot of time and frustration. A simple tool you can use to achieve this is a pivot table, which helps […]

Machine Learning Fundamentals: Predicting Airbnb Prices

August 31, 2017 Vik Paruchuri

Machine learning is easily one of the biggest buzzwords in tech right now. Over the past three years Google searches for “machine learning” have increased by over 350%. But understanding machine learning can be difficult — you either use pre-built packages that act like ‘black boxes’ where you pass in data and magic comes out […]

Using pandas with large data

August 5, 2017 (updated August 8, 2017) Vik Paruchuri

Tips for reducing memory usage by up to 90%. When working with pandas on small data (under 100 megabytes), performance is rarely a problem. When we move to larger data (100 megabytes to multiple gigabytes), performance issues can make run times much longer, and cause code to fail entirely due to insufficient memory. While tools […]

Git-pandas caching for faster analysis

July 26, 2017 Will McGinnis

Git-pandas is a Python library I wrote to help make analysis of Git data easier when dealing with collections of repositories. It makes a ton of cool stuff easier, like cumulative blame plots, but those analyses can be kind of slow, especially with many large repositories. In the past we’ve made that work with running analyses […]

Should I learn Python 2 or 3?

July 13, 2017 Vik Paruchuri

One of the biggest sources of confusion and misinformation for people wanting to learn Python is which version they should learn. Should I learn Python 2.x or Python 3.x? Indeed, this is one of the questions we are asked most often at Dataquest, where we teach Python as part of our Data […]