Taught by: Dr. Scott Jensen
Why we hope to see YOU at the seminar!
No prior knowledge of graphs or programming is required; just curiosity! If you have ever wondered how companies explore your data to suggest new social media connections or products you may be interested in, this seminar is for you! If you have ever been fascinated by journalists using data to connect people and events, this seminar is for you!
Data science is about exploring the patterns and relationships in data, and graph databases are the key to exploring relationships in networks – such as the tsunami of data from social networks. In this seminar we will be using the Neo4j graph database to explore relationships in a social network. Graphs are composed of “nodes” and the relationships (edges) between those nodes. For example, in a social network, the nodes could be you and your friends (you would each be a node), and the relationships between you would be “FRIEND”. Other people in your social network would also be nodes, but connected through other types of relationships, such as “PARENT”, or “SIGNIFICANT OTHER”. Relationships are directional (you would have a PARENT relationship to each of your parents, but they would not have a PARENT relationship to you).
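To make the idea of nodes and directional relationships concrete, here is a minimal sketch in plain Python. It is not Neo4j or Cypher, and the names ("You", "Alex", "Mom") and the helper has_relationship are hypothetical, chosen only for illustration.

```python
from collections import namedtuple

# A relationship is a directed, typed edge: it points from one node to another.
Relationship = namedtuple("Relationship", ["start", "type", "end"])

# Hypothetical people in a tiny social network; each name is a node.
nodes = {"You", "Alex", "Mom"}

# Two relationships that start at "You".
relationships = [
    Relationship("You", "FRIEND", "Alex"),
    Relationship("You", "PARENT", "Mom"),
]

def has_relationship(start, rel_type, end):
    """Return True if a relationship of rel_type points from start to end."""
    return Relationship(start, rel_type, end) in relationships

# Direction matters: you have a PARENT relationship to Mom,
# but Mom does not have a PARENT relationship to you.
print(has_relationship("You", "PARENT", "Mom"))
print(has_relationship("Mom", "PARENT", "You"))
```

The first check prints True and the second prints False, because the relationship is stored only in the direction from "You" to "Mom".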
In business, this means discovering relationships between customers, their purchases, and their behaviors. Graphs enable features such as “people you may know” or recommending other products to purchase, songs to listen to, or people to date. But graphs aren’t only for businesses. The International Consortium of Investigative Journalists used Neo4j to enable investigative journalists across the globe to discover previously hidden relationships between politicians and offshore tax havens. So whether your interest is in tracking the relationships between customers, between politicians and tax havens, detecting financial fraud, or tracking the spread of infectious diseases, this seminar will enable you to discover the relationships of interest to you!
After completing the pre-seminar exercises, you should be able to:
- Install Neo4j on your computer
- Configure the basic computer settings for Neo4j
- Load an existing Neo4j database
After participating in the seminar and completing the post-seminar assessment, you will be able to:
- Describe why graph databases are used to explore social networks
- Describe relationships in graphs
- Write basic Cypher queries
- Load data into a graph database
- Generate visualizations of network relationships
What you will be doing during the seminar:
You will be working with a dataset made available by Yelp. We will look at restaurant and bar reviews: who is reviewing which businesses, the cuisines or entertainment the businesses provide, where businesses are located, and the friend relationships between users. Although this is only a small segment of Yelp’s data, the graph you will be working with contains approximately 20 million relationships! We will explore patterns in users reviewing restaurants and also explore using the database for recommendations. For example, starting with a user who has a lot of fans, we can ask: “Can we use their network of friends to make pizza recommendations based on reviews by friends of their friends (who are not direct friends) who have been similarly critical of a restaurant that the user has also reviewed?”
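The friends-of-friends recommendation question can be sketched in plain Python on a toy dataset. This is only an illustration under stated assumptions: the users, businesses, and star ratings are made up (not Yelp data), "similarly critical" is assumed to mean star ratings within one star of each other on a shared business, and a recommendation is assumed to be a pizza place rated four stars or higher. In the seminar itself this kind of question is asked of the Neo4j graph rather than of Python dictionaries.

```python
# Hypothetical friendships: each user maps to the set of their direct friends.
friends = {
    "ana": {"ben"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ben"},
    "dee": {"ben"},
}

# Hypothetical reviews: reviews[user][business] = star rating (1 to 5).
reviews = {
    "ana": {"Taco Stop": 2},
    "cy": {"Taco Stop": 1, "Slice House": 5},
    "dee": {"Taco Stop": 5, "Pie Town": 5},
}

# Hypothetical set of businesses that serve pizza.
pizza_places = {"Slice House", "Pie Town"}

def friends_of_friends(user):
    """Return friends of the user's friends who are not direct friends."""
    direct = friends.get(user, set())
    found = set()
    for friend in direct:
        found |= friends.get(friend, set())
    return found - direct - {user}

def recommend_pizza(user, max_gap=1, min_stars=4):
    """Recommend pizza places liked by friends of friends with similar taste."""
    mine = reviews.get(user, {})
    picks = set()
    for other in friends_of_friends(user):
        theirs = reviews.get(other, {})
        shared = set(mine) & set(theirs)
        # Assumption: "similarly critical" means ratings within max_gap stars
        # on at least one business both users have reviewed.
        similar = any(abs(mine[b] - theirs[b]) <= max_gap for b in shared)
        if similar:
            picks |= {
                b for b, stars in theirs.items()
                if b in pizza_places and stars >= min_stars and b not in mine
            }
    return sorted(picks)

print(recommend_pizza("ana"))
```

Here "cy" and "dee" are both friends of a friend of "ana", but only "cy" rated Taco Stop about as low as "ana" did, so only the pizza place reviewed by "cy" is suggested.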
The database we will be using will be installed on the lab computers, but instructions for creating and installing the database are included below if you wish to play with it afterwards on your own computer. No prior experience is needed, but to get the most out of the seminar, please do the following:
How to get started:
- Register for the seminar – it’s 100% free, but registering for the seminar will get you access to a Canvas course with all of the seminar materials, optional pre-seminar exercises, and additional materials (some of these are included below, but are more convenient in Canvas).
- Try out the pre-seminar exercise in Canvas. This includes using web-based, pre-populated databases that Neo4j makes available for you to play with – all you need is a browser!
Seminar materials (additional materials are available in Canvas after you register):
- Dataset agreement. At the start of the seminar you will be required to access this link and accept the agreement to access the Yelp data. Since we have already created the graph database, you will not need to download the dataset.
- Pre-seminar exercise. In addition to optional videos and websites with examples using graphs, there is an exercise in the Canvas module based on the Neo4j Recommendation sandbox you can sign up for here. This is a database Neo4j has created for movie recommendations and you only need a web browser to access it. The exercise in Canvas walks you through signing up for the sandbox and doing some initial queries to generate visualizations.
- Seminar slides. This is a PDF of the slides from the seminar. Feel free to look at them beforehand, but if you don’t understand them before the seminar, that’s fine! We will be walking through learning about the topics covered in the slides.
- Creating the database and installing Neo4j (optional). As stated, this is totally optional – the database will be installed on the computers we will be using in the seminar, so you do not need to install anything (or even own a computer) to participate in the seminar.
- Faculty: If you are a faculty member at SJSU or any university or community college, and you would like to host a seminar at your school or use the materials in your course, please see the faculty page for additional materials. If you are a Dean or faculty member at a Bay Area community college, we would like to hear from you! We are working with community college faculty in the Bay Area and provide small stipends to attend the seminar and assist you in presenting it at your school.