Founded in 2010, Kaggle is a Data Science platform where users can share, collaborate, and compete. One key feature of Kaggle is “Competitions”, which offers users the ability to practice on real world data and to test their skills with, and against, an international community. This guide will teach you how to approach and enter […]
Author: Vik Paruchuri
How to become a data scientist
Data science is one of the most buzzed about fields right now, and data scientists are in extreme demand. And with good reason – data scientists are doing everything from creating self-driving cars to automatically captioning images. Given all the interesting applications, it makes sense that data science is a very sought-after career. Data science […]
NumPy Cheat Sheet – Python for Data Science
NumPy is the library that gives Python its ability to work with data at speed. Originally, launched in 1995 as ‘Numeric,’ NumPy is the foundation on which man importany Python data science libraries are built, including Pandas, SciPy and scikit-learn. The printable version of this cheat sheet It’s common when first learning NumPy to have […]
Turbocharge Your Data Acquisition using the data.world Python Library
When working with data, a key part of your workflow is finding and importing data sets. Being able to quickly locate data, understand it and combine it with other sources can be difficult. One tool to help with this is data.world, where you can search for, copy, analyze, and download data sets. In addition, you […]
Building An Analytics Data Pipeline In Python
If you’ve ever wanted to work with streaming data, or data that changes quickly, you may be familiar with the concept of a data pipeline. Data pipelines allow you transform data from one representation to another through a series of steps. Data pipelines are a key part of data engineering, which we teach in our […]
Pandas Cheat Sheet – Python for Data Science
Pandas is arguably the most important Python package for data science. Not only does it give you lots of methods and functions that make working with data easier, but it has been optimized for speed which gives you a significant advantage compared with working with numeric data using Python’s built-in functions. The printable version of […]
1 tip for effective data visualization in Python
Yes, you read correctly – this post will only give you 1 tip. I know most posts like this have 5 or more tips. I once saw a post with 15 tips, but I may have been daydreaming at the time. You’re probably wondering what makes this 1 tip so special. “Vik”, you may ask, […]
What is Data Engineering?
This is the first in a series of posts on Data Engineering. If you like this and want to know when the next post in the series is released, you can subscribe at the bottom of the page. From helping cars drive themselves to helping Facebook tag you in photos, data science has attracted a […]
How to present your data science portfolio on Github
This is the fifth and final post in a series of posts on how to build a Data Science Portfolio. In the previous posts in our portfolio series, we talked about how to build a storytelling project, how to create a data science blog, how to create a machine learning project, and how to construct […]
Why You Should Use Dataquest To Learn Data Science
When I launched Dataquest a little under two years ago, one of the first things I did was write a blog post about why. At the time, if you wanted to become a data scientist, you were confronted with dozens of courses on sites like edX or Coursera with no easy path to getting a […]
The Six Elements of the Perfect Data Science Learning Tool
When I launched Dataquest a little under two years ago, one of the first things I did was write a blog post about why. At the time, if you wanted to become a data scientist, you were confronted with dozens of courses on sites like edX or Coursera with no easy path to getting a […]
Machine Learning Walkthrough Part One: Preparing the Data
Cleaning and preparing data is a critical first step in any machine learning project. In this blog post, Dataquest student Daniel Osei’s takes us through examining a dataset, selecting columns for features, exploring the data visually and then encoding the features for machine learning. This post is based on a Dataquest ‘Monthly Challenge’, where our […]
How to get a data science job
You’ve done it. You just spent months learning how to analyze data and make predictions. You’re now able to go from raw data to well structured insights in a matter of hours. After all that effort, you feel like it’s time to take the next step, and get your first data science job. Unfortunately for […]
Pandas Tutorial: Data analysis with Python: Part 2
We covered a lot of ground in Part 1 of our pandas tutorial. We went from the basics of pandas DataFrames to indexing and computations. If you’re still not confident with Pandas, you might want to check out the Dataquest pandas Course. In this tutorial, we’ll dive into one of the most powerful aspects of […]
Python Web Scraping Tutorial using BeautifulSoup
When performing data science tasks, it’s common to want to use data found on the internet. You’ll usually be able to access this data in csv format, or via an Application Programming Interface(API). However, there are times when the data you want can only be accessed as part of a web page. In cases like […]