Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and makes importing and analyzing data much easier. Pandas builds on packages like NumPy and matplotlib to give you a single, convenient place to do most of your data analysis […]
Author: Vik Paruchuri
NumPy Tutorial: Data analysis with Python
NumPy is a commonly used Python data analysis package. By using NumPy, you can speed up your workflow and interface with other packages in the Python ecosystem, like scikit-learn, that use NumPy under the hood. NumPy was originally developed in the mid-2000s, and arose from an even older package called Numeric. This longevity means […]
28 Jupyter Notebook tips, tricks and shortcuts
This post is based on a post that originally appeared on Alex Rogozhnikov’s blog, ‘Brilliantly Wrong’. We have expanded the post and will continue to do so over time – if you have a suggestion, please let us know in the comments. Thanks to Alex for graciously letting us republish his work here. Jupyter Notebook […]
Working with SQLite Databases using Python and Pandas
SQLite is a database engine that makes it simple to store and work with relational data. Much like the CSV format, SQLite stores data in a single file that can be easily shared with others. Most programming languages and environments have good support for working with SQLite databases. Python is no exception, and a library […]
Learn Python the right way in 5 steps
Python is an amazingly versatile programming language. You can use it to build websites, machine learning algorithms, and even autonomous drones. A huge percentage of programmers in the world use Python, and for good reason. It gives you the power to create almost anything. But – and this is a big but – you have […]
17 places to find datasets for data science projects
This is the fifth post in a series of posts on how to build a Data Science Portfolio. If you like this and want to know when the next post in the series is released, you can subscribe at the bottom of the page. If you’ve ever worked on a personal data science project, you’ve […]
Working with streaming data: Using the Twitter API to capture tweets
If you’ve done any data science or data analysis work, you’ve probably read in a CSV file or connected to a database and queried rows. A typical data analysis workflow involves retrieving stored data, loading it into an analysis tool, and then exploring it. This works well when you’re dealing with historical data such as […]
The key to building a data science portfolio that will get you a job
This is the fourth post in a series of posts on how to build a Data Science Portfolio. If you like this and want to know when the next post in the series is released, you can subscribe at the bottom of the page. In the past few posts in this series, we’ve talked about […]
How I built a Slack bot to help me find an apartment in San Francisco
I moved from Boston to the Bay Area a few months ago. Priya (my girlfriend) and I heard all sorts of horror stories about the rental market. The fact that searching for “How to find an apartment in San Francisco” on Google yields dozens of pages of advice is a good indicator that apartment hunting […]
Building a data science portfolio: Machine learning project
This is the third in a series of posts on how to build a Data Science Portfolio. If you like this and want to know when the next post in the series is released, you can subscribe at the bottom of the page. Data science companies are increasingly looking at portfolios when making hiring decisions. […]
Building a data science portfolio: Making a data science blog
This is the second in a series of posts on how to build a Data Science Portfolio. If you like this and want to know when the next post in the series is released, you can subscribe at the bottom of the page. You can read the first post in this series here: Building a […]
Building a data science portfolio: Storytelling with data
This is the first in a series of posts on how to build a Data Science Portfolio. If you like this and want to know when the next post in the series is released, you can subscribe at the bottom of the page. Data science companies are increasingly looking at portfolios when making hiring decisions. […]
Matplotlib tutorial: Plotting tweets mentioning Trump, Clinton & Sanders
Analyzing Tweets with Pandas and Matplotlib. Python has a variety of visualization libraries, including seaborn, networkx, and vispy. Most Python visualization libraries are based wholly or partially on matplotlib, which often makes it the first resort for making simple plots, and the last resort for making plots too complex to create in other libraries. In […]
How to get into the top 15 of a Kaggle competition using Python
Kaggle competitions are a fantastic way to learn data science and build your portfolio. I personally used Kaggle to learn many data science concepts. I started out with Kaggle a few months after learning programming, and later won several competitions. Doing well in a Kaggle competition requires more than just knowing machine learning algorithms. It […]
Python & JSON: Working with large datasets using Pandas
Working with large JSON datasets can be a pain, particularly when they are too large to fit into memory. In cases like this, a combination of command line tools and Python can make for an efficient way to explore and analyze the data. In this post, we’ll look at how to leverage tools like Pandas […]