docker pull apache/spark
docker run -it --user root apache/spark bash

Choose the "root" user above so that you can install vim and other tools into the container. If the container gets stopped, re-start it with `docker start -i <container-name>` (note that `docker start` does not accept a --user option; the container keeps the user chosen at `docker run` time).

The commands below are to be issued within the container's bash terminal.

apt update
apt install vim

The two steps above are optional, and only needed if you want the vi editor inside the container. If you don't install any editor in the container, you will have to edit any Spark programs you want to run on your host, then copy them into the container using `docker cp`. Editors other than vi, such as gedit, may also be available; you can try installing them if you wish.

cd /opt/spark/bin
./pyspark

You will get the pyspark terminal. Type, e.g., the following statements within the pyspark terminal:

from pyspark.sql import functions
data_df = spark.createDataFrame([("Brooke", 20), ("Denny", 31), ("Jules", 30), ("TD", 35), ("Brooke", 25)], ["name", "age"])
avg_df = data_df.groupBy("name").agg(functions.avg("age"))
avg_df.show()

Type Ctrl-D to close the pyspark terminal. Alternatively, save the statements above into a file, say ageAvg.py, and run the following command within the pyspark terminal:

exec(open("ageAvg.py").read())
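As a sanity check, the groupBy("name") / avg("age") aggregation shown above can be reproduced in plain Python, without Spark. This is only a sketch of what the DataFrame pipeline computes on the same five rows, not how Spark executes it:

```python
from collections import defaultdict

# Same rows as in the data_df example above.
rows = [("Brooke", 20), ("Denny", 31), ("Jules", 30), ("TD", 35), ("Brooke", 25)]

# Group ages by name, then average each group -- the plain-Python
# equivalent of data_df.groupBy("name").agg(functions.avg("age")).
ages = defaultdict(list)
for name, age in rows:
    ages[name].append(age)

avg_age = {name: sum(vals) / len(vals) for name, vals in ages.items()}
print(avg_age)  # Brooke appears twice, so her average is (20 + 25) / 2 = 22.5
```

Comparing this output against avg_df.show() is a quick way to confirm the pyspark session is working as expected.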