Using pandas with large data

Tips for reducing memory usage by up to 90% When working using pandas with small data (under 100 megabytes), performance is rarely a problem. When we move to larger data (100 megabytes to multiple gigabytes), performance issues can make run times much longer, and cause code to fail entirely due to insufficient memory. While tools […]

Read More

Python Cheat Sheet for Data Science

The printable version of this cheat sheet It’s common when first learning Python for Data Science to have trouble remembering all the syntax that you need. While at Dataquest we advocate getting used to consulting the Python documentation, sometimes it’s nice to have a handy reference, so we’ve put together this cheat sheet to help […]

Read More

Revisiting Unit Testing and Mocking in Python

My previous blog post, Python Mocking 101: Fake It Before You Make It, discussed the basic mechanics of mocking and unit testing in Python. This post covers some higher-level software engineering principles demonstrated in my experience with Python testing over the past year and half. In particular, I want to revisit the idea of patching […]

Read More

Should I learn Python 2 or 3?

Image Credit: DigitalOcean One of the biggest sources of confusion and misinformation for people wanting to learn Python is which version they should learn. Should I learn Python 2.x or Python 3.x? Indeed, this is one of the questions we are asked most often at Dataquest, where we teach Python as part of our Data […]

Read More

Category Encoders v1.2.4 Release

I’ve just cut a fresh release of the scikit-learn-contrib library, category_encoders.  This one included a lot of great contributions from the broader community, which has been really great. A few selected features now available: Leave-one-out encoding: a new encoder, based on a popular Kaggle post by Owen Zhang, detailed here and here. (proposal) Maintenance fixes […]

Read More

Setting up a Python development environment

Setting up Python is usually simple, but there are some places where newcomers (and experienced users) need to be careful. What versions are there? What’s the difference between Python, CPython, Anaconda, PyPy? Those and many other questions may stump new developers, or people wanting to use Python. Note: this guide is opinionated. Contents Glossary and […]

Read More