Students earning this badge have attended the Data Science for All seminar on Spark and Jupyter notebooks and successfully completed the post-seminar assignment/quiz. This badge attests to the skills students learned in the seminar and then demonstrated in the post-seminar assignment, using Spark in a notebook environment on Databricks.

The course was initially developed and presented by Dr. Scott Jensen as part of the Data Science for All seminar series at San Jose State University in the Spring 2019 semester.

For a description of the seminar content, see the seminar page. In a nutshell, the skills learned (and covered by this digital badge) include:

  • Creating and working with DataFrames and temporary views in Apache Spark
  • Understanding the importance of documenting their work and using markdown
  • Writing basic PySpark and Spark SQL queries using DataFrames
  • Visualizing DataFrames as bar, line, and area charts

Students who have earned this badge have demonstrated an understanding of the basics of Jupyter notebooks, including creating markdown and code cells and sharing and publishing their notebooks. In addition, they have demonstrated the use of Apache Spark in a notebook environment, including working with DataFrames and temporary views and writing basic queries in PySpark and Spark SQL. They have also shown that they can select an appropriate chart type, and they have created visualizations from their DataFrames, including line charts, bar charts, stacked bar charts, and area charts, to communicate their results.