Last October we challenged our PyBites audience to make a web app to better navigate the Daily Python Tip feed. In this article, I’ll share what I built and learned along the way. In this article you will learn: How to clone the project repo and set up the app. How to use the Twitter […]
Category: Statistics
Using Excel with pandas
Excel is one of the most popular and widely used data tools; it’s hard to find an organization that doesn’t work with it in some way. From analysts to sales VPs to CEOs, various professionals use Excel for both quick stats and serious data crunching. With Excel being so pervasive, data professionals must be familiar with […]
Regular Expressions for Data Scientists
As data scientists, diving headlong into huge heaps of data is part of the mission. Sometimes, this includes massive corpuses of text. For instance, suppose we were asked to figure out who’s been emailing whom in the scandal of the Panama Papers — we’d be sifting through 11.5 million documents! We could do that manually […]
Setting Up the PyData Stack on Windows
The speed of modern electronic devices allows us to crunch large amounts of data at home. However, these devices require the right software in order to reach peak performance. Luckily, it’s now easier than ever to set up your own data science environment. One of the most popular stacks for data science is PyData, a […]
Change Python Version for Jupyter Notebook
Three ways to do it. Sometimes package dependencies force analysts and developers to use older versions of Python. 1. Use conda to downgrade the Python version (if Anaconda is already installed): conda install python=3.5.0 Hat tip: http://chris35wills.github.io/conda_python_version/ https://docs.anaconda.com/anaconda/faq#how-do-i-get-the-latest-anaconda-with-python-3-5 2. Download the latest version of Anaconda and then make a Python 3.5 environment. To create the new environment for Python 3.6, […]
SQL Fundamentals
The pandas workflow is a common favorite among data analysts and data scientists. The workflow looks something like this: The pandas workflow works well when: the data fits in memory (a few gigabytes but not terabytes); the data is relatively static (doesn’t need to be loaded into memory every minute because the data has changed) […]
Data Science Things Roundup #11
Once again it’s time for the data science things roundup, a few links to articles or projects I’ve stumbled across and found interesting. This is the 11th one in the extremely irregular series, so if you think it’s cool, check out some of the others: This time we’ve got quite a diverse set of links, so […]
Loading Data into Postgres using Python and CSVs
An introduction to Postgres with Python. Data storage is one of (if not the) most integral parts of a data system. You will find hundreds of articles online detailing how to write insane SQL analysis queries, how to run complex machine learning algorithms on petabytes of training data, and how to build statistical models on […]
How to Generate FiveThirtyEight Graphs in Python
If you read data science articles, you may have already stumbled upon FiveThirtyEight’s content. Naturally, you were impressed by their awesome visualizations. You wanted to make your own awesome visualizations and so asked Quora and Reddit how to do it. You received some answers, but they were rather vague. You still can’t get the graphs […]
Importing data from csv file using PySpark
There are two ways to import the CSV file: one as an RDD and the other as a Spark DataFrame (preferred).
!pip install pyspark
from pyspark import SparkContext, SparkConf
sc = SparkContext()
A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs and broadcast variables on that cluster. https://spark.apache.org/docs/latest/rdd-programming-guide.html#overview To create a […]
Machine Learning Fundamentals: Predicting Airbnb Prices
Machine learning is easily one of the biggest buzzwords in tech right now. Over the past three years Google searches for “machine learning” have increased by over 350%. But understanding machine learning can be difficult — you either use pre-built packages that act like ‘black boxes’ where you pass in data and magic comes out […]
Web Scraping with Python and BeautifulSoup
To source data for data science projects, you’ll often rely on SQL and NoSQL databases, APIs, or ready-made CSV data sets. The problem is that you can’t always find a data set on your topic, databases are not kept current and APIs are either expensive or have usage limits. If the data you’re looking for […]
Forecasting Time-Series data with Prophet – Part 1
This is part 1 of a series where I look at using Prophet for time-series forecasting in Python. A lot of what I do in my data analytics work is understanding time series data, modeling that data and trying to forecast what might come next in that data. Over the years I’ve used many different […]
Getting Started with Kaggle: House Prices Competition
Founded in 2010, Kaggle is a Data Science platform where users can share, collaborate, and compete. One key feature of Kaggle is “Competitions”, which offers users the ability to practice on real-world data and to test their skills with, and against, an international community. This guide will teach you how to approach and enter […]
A Dramatic Tour through Python’s Data Visualization Landscape (including ggpy and Altair)
by Dan Saber | April 19, 2017. This post originally appeared on Dan Saber’s blog. We thought it was hilarious, so we asked him if we could repost it. He generously agreed! About Dan: My name is Dan Saber. I’m a UCLA math grad, and I do Data Science at Coursera. (Before that, I worked […]