Change Python Version for Jupyter Notebook

Three ways to do it. Sometimes package dependencies force analysts and developers to use an older version of Python.

1. Use conda to downgrade the Python version (if Anaconda is already installed):

    conda install python=3.5.0

   Hat tip: http://chris35wills.github.io/conda_python_version/ and https://docs.anaconda.com/anaconda/faq#how-do-i-get-the-latest-anaconda-with-python-3-5

2. Download the latest version of Anaconda and then make a Python 3.5 environment. To create the new environment for Python 3.6, […]
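Whichever approach is used, it can help to confirm from inside the notebook which interpreter the kernel is actually running. The snippet below is a minimal sketch for that check; the helper name `python_version_matches` is made up for illustration and is not part of conda or Jupyter.

```python
import sys


def python_version_matches(major, minor, version_info=sys.version_info):
    """Return True if version_info has the given major.minor version.

    Hypothetical helper for illustration; it only compares the first
    two components of the version tuple.
    """
    return (version_info[0], version_info[1]) == (major, minor)


# Path of the interpreter behind the current kernel.
print(sys.executable)
# Full version string of that interpreter.
print(sys.version)
# Whether the kernel is on the Python 3.5 line used in the excerpt above.
print(python_version_matches(3, 5))
```

If `sys.executable` does not point into the expected conda environment, the notebook is probably still attached to a different kernel.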


SQL Fundamentals

The pandas workflow is a common favorite among data analysts and data scientists. The workflow looks something like this:

The pandas workflow works well when:

- the data fits in memory (a few gigabytes but not terabytes)
- the data is relatively static (doesn’t need to be loaded into memory every minute because the data has changed)

[…]
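As a rough illustration of an in-memory pandas workflow of the kind described here, the sketch below loads a small CSV, derives a column, and aggregates it. The data, the column names, and the `summarize_revenue` helper are all invented for the example and are not taken from the post.

```python
import io

import pandas as pd

# Made-up sample data standing in for a CSV file on disk.
CSV_TEXT = """region,units,price
north,10,2.5
south,4,3.0
north,6,2.0
"""


def summarize_revenue(csv_source):
    """Load a CSV fully into memory and total revenue per region."""
    df = pd.read_csv(csv_source)               # whole file is read into RAM
    df["revenue"] = df["units"] * df["price"]  # derive a new column
    return df.groupby("region")["revenue"].sum()


summary = summarize_revenue(io.StringIO(CSV_TEXT))
print(summary)
```

Every step here runs on data already held in memory, which is why the approach depends on the conditions listed above.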


Importing data from csv file using PySpark

There are two ways to import the csv file: one as an RDD and the other as a Spark DataFrame (preferred).

    !pip install pyspark
    from pyspark import SparkContext, SparkConf
    sc = SparkContext()

A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs and broadcast variables on that cluster. https://spark.apache.org/docs/latest/rdd-programming-guide.html#overview

To create a […]
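The two import routes can be sketched roughly as follows. This is an illustration under assumptions, not necessarily the code the full post goes on to show: the function names are hypothetical, the RDD version splits lines naively on commas (so it would mishandle quoted fields), and the DataFrame version assumes the file has a header row.

```python
def load_as_rdd(sc: "SparkContext", path: str):
    """Read a CSV as an RDD of lists of strings.

    Assumption: fields contain no quoted commas, so a plain split works.
    """
    return sc.textFile(path).map(lambda line: line.split(","))


def load_as_dataframe(spark: "SparkSession", path: str):
    """Read a CSV as a Spark DataFrame.

    Assumption: the first row is a header; column types are inferred.
    """
    return spark.read.csv(path, header=True, inferSchema=True)
```

Here `sc` would be a `SparkContext` like the one created above, and `spark` a `SparkSession`, which can be obtained with `SparkSession.builder.getOrCreate()` from `pyspark.sql`. The DataFrame route does the parsing and type inference itself, which is one reason it is usually preferred.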
