This is the third in a series of posts about using Prophet to forecast time series data. The other parts can be found here: Forecasting Time Series data with Prophet – Part 1 and Forecasting Time Series data with Prophet – Part 2. In those previous posts, I looked at forecasting monthly sales data 24 months […]
Category: Data Analytics
Forecasting Time Series data with Prophet – Jupyter Notebook
In previous posts, I described how I use Prophet to forecast time series data. There were some questions in the comments about the code not working, so I wanted to publish a new post with a link to a Jupyter Notebook that will hopefully provide a full, correct working example. The original posts are: Forecasting […]
Modernizing Pedalwrencher: whatever that means.
I’ve got a side project that I’ve maintained (badly) for the past couple of years, pedalwrencher.com. It’s a pretty simple idea: if you ride bikes and use strava.com, you can sign up with pedalwrencher and set up mileage-based alerts. So if you want to replace your chain every 2000 miles, you can get an […]
Using pandas with large data
Tips for reducing memory usage by up to 90%. When using pandas with small data (under 100 megabytes), performance is rarely a problem. When we move to larger data (100 megabytes to multiple gigabytes), performance issues can make run times much longer and cause code to fail entirely due to insufficient memory. While tools […]
Git-pandas caching for faster analysis
Git-pandas is a Python library I wrote to help make analysis of git data easier when dealing with collections of repositories. It makes a ton of cool stuff easier, like cumulative blame plots, but these analyses can be kind of slow, especially with many large repositories. In the past we’ve made that work with running analyses […]
Python Cheat Sheet for Data Science
The printable version of this cheat sheet. It’s common when first learning Python for Data Science to have trouble remembering all the syntax that you need. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help […]
Revisiting Unit Testing and Mocking in Python
My previous blog post, Python Mocking 101: Fake It Before You Make It, discussed the basic mechanics of mocking and unit testing in Python. This post covers some higher-level software engineering principles demonstrated in my experience with Python testing over the past year and a half. In particular, I want to revisit the idea of patching […]
PyCharm vs Spyder: a quick comparison of two Python IDEs
If you have followed my blog you may have noticed that a lot of focus has been put on how to learn programming (particularly in Python). I have also written about Integrated Development Environments (IDEs). IDEs may, in fact, be very useful when learning how to code. When it comes to Python IDEs it may […]
Should I learn Python 2 or 3?
Image Credit: DigitalOcean. One of the biggest sources of confusion and misinformation for people wanting to learn Python is which version they should learn. Should I learn Python 2.x or Python 3.x? Indeed, this is one of the questions we are asked most often at Dataquest, where we teach Python as part of our Data […]
Category Encoders v1.2.4 Release
I’ve just cut a fresh release of the scikit-learn-contrib library, category_encoders. This one includes a lot of contributions from the broader community, which has been really great. A few selected features now available: Leave-one-out encoding: a new encoder, based on a popular Kaggle post by Owen Zhang, detailed here and here. (proposal) Maintenance fixes […]
Understanding SettingWithCopyWarning in pandas
SettingWithCopyWarning is one of the most common hurdles people run into when learning pandas. A quick web search will reveal scores of Stack Overflow questions, GitHub issues and forum posts from programmers trying to wrap their heads around what this warning means in their particular situation. It’s no surprise that many struggle with this; there […]
Step-by-step guide for solving the Pyvttbl Float and NoneType error
In this short post I will show you a quick fix for the error “unsupported operand type(s) for +: ‘float’ and ‘NoneType’” with Pyvttbl. In earlier posts I have shown how to carry out ANOVA using Pyvttbl (among other packages; see posts 1, 2, 3, and 3 for ANOVA using Pyvttbl). However, Pyvttbl is not […]
Setting up a Python development environment
Setting up Python is usually simple, but there are some places where newcomers (and experienced users) need to be careful. What versions are there? What’s the difference between Python, CPython, Anaconda, and PyPy? Those and many other questions may stump new developers, or people wanting to use Python. Note: this guide is opinionated. Contents Glossary and […]
PyCharm vs Spyder: a quick comparison of two Python IDEs
This post compares PyCharm and Spyder. If you have followed my blog you may have noticed that a lot of focus has been put on how to learn programming (particularly in Python). I have also written about Integrated Development Environments (IDEs). I think that an IDE may, in fact, be very useful […]
Web Scraping with Python and BeautifulSoup
To source data for data science projects, you’ll often rely on SQL and NoSQL databases, APIs, or ready-made CSV data sets. The problem is that you can’t always find a data set on your topic, databases are not kept current and APIs are either expensive or have usage limits. If the data you’re looking for […]