If you are registered for the seminar, these instructions are also included in Canvas, and as noted there, installing Neo4j on your computer and building the database is totally optional. For the seminar the graph will be installed already – you will just need to click the download link to agree to Yelp’s Dataset Agreement.
If you decide to install Neo4j and build the graph on your own computer, be sure the computer has enough memory, or reduce the size of the dataset by selecting a subset of the 10 metropolitan areas included in the Yelp data. The computers we will be using during the seminar have 16GB of memory, and we allocate up to 12GB to Neo4j (when it’s running).
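One way to shrink the dataset before building the graph is to keep only the businesses from a few areas. The sketch below is a rough illustration, not part of the seminar materials. It assumes the business file is named yelp_academic_dataset_business.json, that it holds one JSON object per line, and that each object has a "state" field that can stand in for a metropolitan area. The function name filter_json_lines and the output file name are made up for this example.

```python
import json
from pathlib import Path


def filter_json_lines(src, dst, keep_states):
    """Copy only records whose 'state' field is in keep_states; return the kept count."""
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if record.get("state") in keep_states:
                fout.write(json.dumps(record) + "\n")
                kept += 1
    return kept


if __name__ == "__main__":
    # Assumed file name and example state codes; adjust to your download.
    src = Path("yelp_academic_dataset_business.json")
    if src.exists():
        n = filter_json_lines(src, "business_subset.json", {"PA", "FL"})
        print(f"kept {n} businesses")
```

If you subset the businesses this way, the reviews and other files would also need to be filtered to the business IDs you kept. That step is not shown here.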
To install Neo4j and load the data, perform the following steps:
- Download the Yelp Data. Use a good Internet connection and be patient. The download page offers the JSON files and the pictures as separate downloads; you only need the JSON files. When the download has completed, you will have a file named yelp_dataset.tar.gz that is roughly 3.58GB. If it is substantially smaller, the download failed. If you are on Windows and have file extensions hidden, the file may be listed as yelp_dataset.tar (the .gz is hidden), so you may want to show file extensions. If you are on a Mac, the file may be decompressed automatically as it downloads, so you will also see “yelp_dataset.tar”, but in that case it really is the tar file. Put the downloaded file in its own directory.
- Unzip the Yelp data you downloaded as described here for a PC or here for a Mac.
- Install Anaconda (a free tool for working in Python). The Anaconda installation will include Jupyter notebooks – a very popular tool in data science. You will be using a Jupyter notebook to generate your data files.
- Download the notebook in this file and unzip it. The zip file contains a notebook named GenerateGraphDatabaseFiles.
- Create the data files as described in this documentation.
- Install Neo4j as described in this documentation (the free developer license).
- Configure Neo4j and load the data as described in this documentation. When loading the data, you can copy/paste the commands from this text file.
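If you prefer to check and unpack the download from Python rather than with a desktop tool, the following sketch covers the first two steps above. It looks for the archive under either name mentioned in the download step (yelp_dataset.tar.gz or yelp_dataset.tar), prints its size so you can compare it with the expected size, and extracts it. The helper names find_archive and extract_archive are hypothetical, and the script assumes you run it from the directory that holds the download.

```python
import tarfile
from pathlib import Path


def find_archive(directory):
    """Return the downloaded archive, whether it is named .tar.gz or .tar, else None."""
    for name in ("yelp_dataset.tar.gz", "yelp_dataset.tar"):
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def extract_archive(archive, dest):
    """Extract the archive into dest and return the names of its members."""
    # Mode "r:*" lets tarfile detect on its own whether the file is gzip-compressed.
    with tarfile.open(archive, "r:*") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


if __name__ == "__main__":
    archive = find_archive(".")
    if archive is None:
        print("No yelp_dataset archive found in the current directory")
    else:
        size_gb = archive.stat().st_size / 1e9
        print(f"{archive.name}: {size_gb:.2f} GB")
        for name in extract_archive(archive, "."):
            print("extracted", name)
```

An already decompressed .tar file will be larger than the compressed download, so the size comparison only applies to the .tar.gz file.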