Using Excel with pandas

Excel is one of the most popular and widely-used data tools; it’s hard to find an organization that doesn’t work with it in some way. From analysts, to sales VPs, to CEOs, various professionals use Excel for both quick stats and serious data crunching. With Excel being so pervasive, data professionals must be familiar with […]

Read More

Change Python Version for Jupyter Notebook

Three ways to do it- sometimes package dependencies force analysts and developers to require older versions of Python use conda to downgrade Python version (if Anaconda installed already) conda install python=3.5.0 Hat tip- http://chris35wills.github.io/conda_python_version/ https://docs.anaconda.com/anaconda/faq#how-do-i-get-the-latest-anaconda-with-python-3-5 2. you download the latest version of Anaconda and then make a Python 3.5 environment. To create the new environment for Python 3.6, […]

Read More

SQL Fundamentals

The pandas workflow is a common favorite among data analysts and data scientists. The workflow looks something like this: The pandas workflow works well when: the data fits in memory (a few gigabytes but not terabytes) the data is relatively static (doesn’t need to be loaded into memory every minute because the data has changed) […]

Read More

Importing data from csv file using PySpark

There are two ways to import the csv file, one as a RDD and the other as Spark Dataframe(preferred) !pip install pyspark from pyspark import SparkContext, SparkConf sc =SparkContext() A SparkContext represents the connection to a Spark cluster, and can be used to create RDD and broadcast variables on that cluster.  https://spark.apache.org/docs/latest/rdd-programming-guide.html#overview To create a […]

Read More