| Category | Command | Description / What It Does | Example |
|---|---|---|---|
| Creation | sc.parallelize(data) | Create an RDD from a Python list (or other local collection) | rdd = sc.parallelize([1,2,3]) |
| | sc.textFile(path) | Read a text file (each line = 1 element) | rdd = sc.textFile("data.txt") |
| | sc.wholeTextFiles(path) | Read the files in a directory as (filename, content) pairs | rdd = sc.wholeTextFiles("folder/") |
| Basic Transformations | map(func) | Apply a function to each element | rdd.map(lambda x: x*2) |
| | flatMap(func) | Apply a function to each element and flatten the results | rdd.flatMap(lambda x: x.split(" ")) |
| | filter(func) | Keep only the elements that satisfy a condition | rdd.filter(lambda x: x>10) |
| | distinct() | Remove duplicate elements | rdd.distinct() |
| | union(rdd2) | Combine two RDDs (duplicates are kept) | rdd1.union(rdd2) |
| | intersection(rdd2) | Keep elements present in both RDDs (output has no duplicates) | rdd1.intersection(rdd2) |
| | subtract(rdd2) | Keep elements in the first RDD that are not in the second | rdd1.subtract(rdd2) |
| | cartesian(rdd2) | Produce all (a, b) pairs between two RDDs | rdd1.cartesian(rdd2) |
| | sample(withRepla... | | |