Working with Neo4j – Some practical tips

As part of a data science project that involved working with a Neo4j database (Community Edition), I came across some issues, which I'm summarizing here as tips in the hope that they will save others time and headaches.

Tip #1: Manage your databases

Since (as far as I know) Neo4j serves only one database at a time and you cannot switch between databases from within a running session, I recommend creating all the databases you need beforehand, so you can simply point Neo4j at the right one when needed.

This is also a great way to share work: instead of reloading nodes and relationships, you can share the whole DB (i.e. the DB folder and all its subfolders) with another person, who can then connect to it directly.
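Sharing a database can be as simple as zipping its folder. Here is a minimal Python sketch. It assumes the DB is stopped first, because copying store files from a running database can give an inconsistent copy. The function names and paths are hypothetical.

```python
import shutil
from pathlib import Path


def archive_database(db_dir: str, archive_base: str) -> str:
    """Zip a stopped Neo4j database folder, subfolders included.

    db_dir and archive_base are hypothetical paths. Returns the path of the
    .zip file that make_archive creates (archive_base + ".zip").
    """
    src = Path(db_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"database folder not found: {src}")
    # root_dir/base_dir make the DB folder itself the top-level entry in the zip
    return shutil.make_archive(
        archive_base, "zip", root_dir=str(src.parent), base_dir=src.name
    )


def restore_database(archive_path: str, target_dir: str) -> Path:
    """Unpack a shared archive; the DB folder ends up inside target_dir."""
    shutil.unpack_archive(archive_path, target_dir)
    return Path(target_dir)
```

The recipient unpacks the archive and points Neo4j at the extracted folder as its database location.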

Tip #2: Loading files from any location on your machine

By default, the Neo4j configuration file (neo4j.conf) only lets you load files from a predefined 'import' folder.

To be able to load files from any location, just comment out the 'dbms.directories.import=import' line:

#dbms.directories.import=import
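The edit can also be scripted, which helps if you apply it to several databases. Here is a minimal Python sketch. It assumes the settings live in the standard neo4j.conf file. The helper names are hypothetical, and the script keeps a .bak copy before rewriting the file.

```python
import re
from pathlib import Path

# An active (not yet commented) dbms.directories.import setting on its own line.
_IMPORT_LINE = re.compile(
    r"^([ \t]*)(dbms\.directories\.import[ \t]*=.*)$", re.MULTILINE
)


def comment_out_import_dir(conf_text: str) -> str:
    """Return conf_text with the dbms.directories.import line commented out.

    Lines that are already commented are left alone, so this is safe to rerun.
    """
    return _IMPORT_LINE.sub(r"\1#\2", conf_text)


def patch_conf_file(conf_path: str) -> None:
    """Apply the change to a neo4j.conf file in place, keeping a .bak copy."""
    path = Path(conf_path)
    original = path.read_text(encoding="utf-8")
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(comment_out_import_dir(original), encoding="utf-8")
```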

Below are the three steps you need to follow:

Step 1 – Click Options

Step 2 – Click the first Edit button

Step 3 – Comment out the line

BTW, if you make the change while the DB is running, you will need to stop and restart the DB for it to take effect.
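Once the line is commented out and the DB restarted, LOAD CSV accepts absolute file:/// URLs. The hypothetical Python helpers below turn a local path into such a URL. They then build a query that only previews the first rows, which is a cheap way to check that Neo4j can read the file before running the real import. The helpers cover only simple paths; they reject single quotes because those would break the Cypher string literal.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_file_url(csv_path: str) -> str:
    """Turn an absolute local path into a file:/// URL for LOAD CSV."""
    # Treat "C:\..." or "C:/..." as a Windows path, anything else as POSIX.
    is_windows = len(csv_path) > 1 and csv_path[1] == ":"
    path = PureWindowsPath(csv_path) if is_windows else PurePosixPath(csv_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {csv_path!r}")
    if "'" in csv_path:
        raise ValueError("single quotes would break the Cypher string literal")
    return "file:///" + path.as_posix().lstrip("/")


def preview_csv_query(csv_path: str, limit: int = 5) -> str:
    """Cypher that returns the first rows of a CSV, to check it can be read."""
    return (
        f"LOAD CSV WITH HEADERS FROM '{to_file_url(csv_path)}' AS row\n"
        f"RETURN row LIMIT {int(limit)}"
    )


print(preview_csv_query(r"C:\data\people.csv"))
```

For the hypothetical path above, the printed query reads from 'file:///C:/data/people.csv'. You can paste it into the Neo4j Browser as-is.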

Tip #3: As soon as you create or load a new type of node, create an index

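Several times I rushed to create nodes and then load the relationships between them and other nodes. It took me a while to figure out that the process was sluggish because I had forgotten to create the corresponding index. Loading relationships usually means looking up both end nodes by a property (e.g. an id). Without an index on that property, each lookup has to scan every node with that label.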
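Here is a minimal sketch of the fix. It assumes the Neo4j 3.x syntax CREATE INDEX ON :Label(property), which the APOC version mentioned in Tip #4 targets; Neo4j 4.x and later use CREATE INDEX FOR (n:Label) ON (n.property) instead. The Python helper only generates the statements, one per label and lookup property, and the labels and properties shown are hypothetical. You would run the output in the Neo4j Browser before loading relationships.

```python
import re

# Plain identifiers only; anything else would need backtick quoting in Cypher.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_index_statement(label: str, prop: str) -> str:
    """Return the Neo4j 3.x Cypher statement that indexes :label(prop)."""
    for name in (label, prop):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return f"CREATE INDEX ON :{label}({prop})"


def index_statements(lookup_keys: dict) -> list:
    """One CREATE INDEX statement per label -> lookup property pair."""
    return [create_index_statement(lbl, prop) for lbl, prop in lookup_keys.items()]


# Hypothetical labels and the property used to MATCH them when loading relationships
for stmt in index_statements({"Person": "id", "Company": "id"}):
    print(stmt)
```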

You can check which indexes exist for the relevant nodes by typing :schema in the Neo4j Browser command line. It lists the existing indexes and constraints.

Tip #4: Use the 'apoc' plugin (version at the time of writing: 'apoc-3.0.4.1-all')

This collection of utilities can help you with many tasks. See https://neo4j-contrib.github.io/neo4j-apoc-procedures for in-depth documentation.

You will need to copy this .jar file into the 'plugins' folder of each DB you work with. Then restart the DB so the procedures get loaded.
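With several databases, the copy step can be scripted. Here is a minimal Python sketch. The function name and paths are hypothetical, and the script follows the per-DB 'plugins' folder layout described above. It creates the folder if it is missing, and each DB still needs a restart afterwards.

```python
import shutil
from pathlib import Path


def install_plugin(jar_path: str, db_dirs: list) -> list:
    """Copy a plugin .jar into the 'plugins' folder of each database folder."""
    jar = Path(jar_path)
    if not jar.is_file():
        raise FileNotFoundError(f"plugin jar not found: {jar}")
    installed = []
    for db_dir in db_dirs:
        plugins = Path(db_dir) / "plugins"
        plugins.mkdir(parents=True, exist_ok=True)  # create the folder if missing
        target = plugins / jar.name
        shutil.copy2(jar, target)  # copy2 also preserves file timestamps
        installed.append(target)
    return installed


# Hypothetical usage:
# install_plugin("C:/downloads/apoc-3.0.4.1-all.jar", ["C:/neo4j/dbs/a.db", "C:/neo4j/dbs/b.db"])
```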

My favorite utility is CALL apoc.meta.graph(), which visualizes the schema of the loaded data (i.e. the node labels and the relationship types between them).