Importing data from csv file using PySpark

There are two ways to import the csv file, one as a RDD and the other as Spark Dataframe(preferred) !pip install pyspark from pyspark import SparkContext, SparkConf sc =SparkContext() A SparkContext represents the connection to a Spark cluster, and can be used to create RDD and broadcast variables on that cluster.  https://spark.apache.org/docs/latest/rdd-programming-guide.html#overview To create a … Continue reading “Importing data from csv file using PySpark”