Libraries – Page 5

Understanding SettingWithCopyWarning in pandas

July 5, 2017 Vik Paruchuri

SettingWithCopyWarning is one of the most common hurdles people run into when learning pandas. A quick web search will reveal scores of Stack Overflow questions, GitHub issues and forum posts from programmers trying to wrap their heads around what this warning means in their particular situation. It’s no surprise that many struggle with this; there […]

Step-by-step guide for solving the Pyvttbl Float and NoneType error

July 3, 2017 Erik Marsja

In this short post I will show you a quick fix for the error “unsupported operand type(s) for +: ‘float’ and ‘NoneType’” with Pyvttbl. In earlier posts I have shown how to carry out ANOVA using Pyvttbl (among other packages; see posts 1, 2, 3, and 3 for ANOVA using pyvttbl). However, Pyvttbl is not […]

Setting up a Python development environment

July 3, 2017 Chris Warrick

Setting up Python is usually simple, but there are some places where newcomers (and experienced users) need to be careful. What versions are there? What’s the difference between Python, CPython, Anaconda, PyPy? Those and many other questions may stump new developers, or people wanting to use Python. Note: this guide is opinionated. Contents Glossary and […]

Forecasting Time-Series data with Prophet – Part 2

June 16, 2017 Python Data

In Forecasting Time-Series data with Prophet – Part 1, I introduced Facebook’s Prophet library for time-series forecasting. In this article, I wanted to take some time to share how I work with the data after the forecasts. Specifically, I wanted to share some tips on how I visualize the Prophet forecasts using matplotlib rather than […]

Visualizing data – overlaying charts in python

June 2, 2017 Python Data

Visualizing data is vital to analyzing data. If you can’t see your data – and see it in multiple ways – you’ll have a hard time analyzing that data. There are quite a few ways to visualize data and, thankfully, with pandas, matplotlib and/or seaborn, you can make some pretty powerful visualizations during analysis. One of […]

Forecasting Time-Series data with Prophet – Part 1

June 1, 2017 Python Data

This is part 1 of a series where I look at using Prophet for Time-Series forecasting in Python. A lot of what I do in my data analytics work is understanding time series data, modeling that data and trying to forecast what might come next in that data. Over the years I’ve used many different […]

Getting Started with Kaggle: House Prices Competition

May 5, 2017 Vik Paruchuri

Founded in 2010, Kaggle is a Data Science platform where users can share, collaborate, and compete. One key feature of Kaggle is “Competitions”, which offers users the ability to practice on real world data and to test their skills with, and against, an international community. This guide will teach you how to approach and enter […]

NumPy Cheat Sheet – Python for Data Science

April 18, 2017 Vik Paruchuri

NumPy is the library that gives Python its ability to work with data at speed. Originally launched in 1995 as ‘Numeric,’ NumPy is the foundation on which many important Python data science libraries are built, including Pandas, SciPy and scikit-learn. The printable version of this cheat sheet It’s common when first learning NumPy to have […]

How to do Descriptive Statistics in Python using Numpy

March 26, 2017 (updated August 23, 2017) Erik Marsja

In this short post we are going to revisit the topic of how to carry out summary/descriptive statistics in Python. In the previous post, I used Pandas (but also SciPy and Numpy, see Descriptive Statistics Using Python) but now we are only going to use Numpy. The descriptive statistics we are going to calculate are […]

Self-Organising Maps: In Depth

March 16, 2017 yhat

About David: David Asboth is a Data Scientist with a software development background. He’s had many different job titles over the years, with a common theme: he solves human problems with computers and data. This post originally appeared on his blog, davidasboth.com. Introduction In Part 1, I introduced the concept of Self-Organising Maps (SOMs). Now […]

The Current State of Automated Machine Learning

March 7, 2017 yhat

About Matthew: Matthew Mayo is a Data Scientist and the Deputy Editor of KDnuggets, as well as a machine learning aficionado and an all-around data enthusiast. Matthew holds a Master’s degree in Computer Science and a graduate diploma in Data Mining. This post originally appeared on the KDnuggets blog. Background What is automated machine learning […]

Diagnosing and Fixing Memory Leaks in Python

March 6, 2017 (updated March 9, 2017) mike

Fugue uses Python extensively throughout the Conductor and in our support tools, due to its ease-of-use, extensive package library, and powerful language tools. One thing we’ve learned from building complex software for the cloud is that a language is only as good as its debugging and profiling tools. Logic errors, CPU spikes, and memory leaks […]

A Simple Trending Products Recommendation Engine in Python

February 28, 2017 yhat

by Chris Clark | February 28, 2017 This blogpost originally appeared on Chris Clark’s blog. Chris is the cofounder of Grove Collaborative, a certified B-corp that delivers amazing, affordable and effective natural products to your doorstep. We’re fans. Background Our product recommendations at Grove.co were boring. I knew that because our customers told us. When […]

Pandas Cheat Sheet – Python for Data Science

February 21, 2017 (updated February 22, 2017) Vik Paruchuri

Pandas is arguably the most important Python package for data science. Not only does it give you lots of methods and functions that make working with data easier, but it has also been optimized for speed, which gives you a significant advantage compared with working with numeric data using Python’s built-in functions. The printable version of […]