[NOTES] Neo4J Crash Course by Laith Academy
Neo4J: A Graph Database using Cypher - A Declarative Query Language.
These notes are based on an excellent Intro to Neo4J crash course by Laith Academy. I wrote them whilst following along, so feel free to read them, but I highly recommend watching the course too, as these notes are provided as is. You can find the sample data for the course here.
What is a graph database?
A graph database is one type of database, alongside relational databases (e.g. MySQL/PostgreSQL), document databases (e.g. MongoDB), and key-value stores (e.g. Redis). It tends to be a better fit where the connections between records matter as much as the records themselves, for example a social media network, a recommendation engine, or a semantic web engine.
The Neo4J Graph Database
A Neo4J graph database is made up of nodes and relationships, each of which can have properties. Nodes can also have one or more labels, which you can think of as types, e.g. a node could have the label PLAYER and also the label MIDFIELDER. Relationships have a type and a direction. For example, a PLAYER node (with a name property of “LeBron James”) would have a PLAYS_FOR relationship to a TEAM node (with a name property of “LA Lakers”). The direction runs from the PLAYER node to the TEAM node, and it has to be specified when the relationship is created (examples later). If the relationship goes both ways, it should be modelled as two individual relationships. Additionally, the PLAYS_FOR relationship could have its own salary property to hold information about the relationship, e.g. that his salary is $40m.
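To make the "two individual relationships" point concrete, here is a minimal sketch. It assumes an empty scratch database (it creates new nodes, so running it against the course data would add duplicates) and borrows the TEAMMATES relationship type that appears later in these notes.

```cypher
// Sketch only: run against an empty scratch database, as it creates new nodes.
CREATE (lebron:PLAYER {name: "LeBron James"}),
       (anthony:PLAYER {name: "Anthony Davis"})
// A two-way connection is stored as two directed relationships.
CREATE (lebron)-[:TEAMMATES]->(anthony),
       (anthony)-[:TEAMMATES]->(lebron)
RETURN lebron, anthony
```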
Example Queries
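Let’s say, given a Neo4J database of basketball players, we want to find the 3rd and 4th shortest players among those with a height of at least 2. We sort by height in ascending order, skip the two shortest with SKIP 2, and keep the next two with LIMIT 2…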
MATCH (player:PLAYER)
WHERE player.height >= 2
RETURN player
ORDER BY player.height ASC
SKIP 2
LIMIT 2
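Or maybe we want to see the players whose height is greater than or equal to 2 alongside the coaches and teams in the database. Note that the comma-separated patterns in this query are not connected to each other, so it returns every combination of matching player, coach, and team (a Cartesian product). Any connections that show up in the graph visualisation are drawn between the returned nodes by the Neo4J Browser, which I believe it does by default…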
MATCH (player:PLAYER), (coach:COACH), (team:TEAM)
WHERE player.height >= 2
RETURN player, coach, team
A node pattern in MATCH has the form (variable:LABEL). The variable name is arbitrary, i.e. team could be teem:TEAM, and as long as you return teem and not team it will work fine.
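If what we actually want is each tall player together with whatever they are connected to, the relationship has to be part of the pattern. A rough sketch, where rel and other are variable names chosen for illustration:

```cypher
// No relationship type or label is given, so any relationship
// (in either direction) to any kind of node will match.
MATCH (player:PLAYER)-[rel]-(other)
WHERE player.height >= 2
RETURN player, rel, other
```

Leaving the arrowhead off matches relationships regardless of direction.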
Querying Relationships
When querying relationships the query is a little different.
MATCH (player:PLAYER)-[:PLAYS_FOR]->(team:TEAM)
WHERE team.name = "LA Lakers"
RETURN player, team
In this example we are querying for all players who play for the LA Lakers, i.e. who have a PLAYS_FOR relationship to that team. Typically we wouldn’t also return the team if we are just interested in the players but the team looks good on the visualisation.
Where we have MATCH (player:PLAYER) -[:PLAYS_FOR]-> (team:TEAM), the arrow can point either way, depending on the direction of the relationship between the nodes. We can also give the PLAYS_FOR relationship a variable name if we want to refer to it elsewhere in our query. This would look like:
-[relationshipVariable:PLAYS_FOR]->
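As a small illustration of the arrow going the other way, the Lakers query can be written starting from the team. This is just a sketch of an equivalent pattern; the relationship stored in the database still points from the player to the team.

```cypher
// Same relationship as before, read right-to-left in the pattern.
MATCH (team:TEAM)<-[:PLAYS_FOR]-(player:PLAYER)
WHERE team.name = "LA Lakers"
RETURN player, team
```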
Querying Properties of Relationships
Relationships themselves can have properties, such as salary (for the PLAYS_FOR relationship between a player and a team). Example:
MATCH (player:PLAYER) -[contract:PLAYS_FOR]-> (team:TEAM)
WHERE team.name = "LA Lakers" AND contract.salary >= 38000000
RETURN player, team
More Complex Query
Get all of LeBron’s teammates who have a salary of at least 35000000. First get LeBron’s teammates…
MATCH (lebron:PLAYER {name: "LeBron James"}) -[:TEAMMATES]-> (teammate:PLAYER)
RETURN lebron, teammate
Notice we are selecting only LeBron using the {name: "LeBron James"} addition to the first node in our MATCH.
Now we can add another MATCH and use that match to filter.
MATCH (lebron:PLAYER {name: "LeBron James"}) -[:TEAMMATES]-> (teammate:PLAYER)
MATCH (teammate) -[contract:PLAYS_FOR]-> (team:TEAM)
WHERE contract.salary >= 35000000
RETURN teammate
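The two MATCH clauses can also be folded into a single chained pattern. This is a sketch of an equivalent query, which additionally returns the salary so the filter is easy to check by eye:

```cypher
// One path: LeBron -> teammate -> the teammate's team.
MATCH (:PLAYER {name: "LeBron James"})-[:TEAMMATES]->(teammate:PLAYER)-[contract:PLAYS_FOR]->(:TEAM)
WHERE contract.salary >= 35000000
RETURN teammate.name, contract.salary
```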
Aggregation in Queries
Suppose we want to count the number of games played by each player, given that each player has a PLAYED_AGAINST relationship to a team for every game played. To do this we would simply do…
MATCH (player:PLAYER) -[gamePlayed:PLAYED_AGAINST]->(:TEAM)
RETURN player.name, COUNT(gamePlayed)
Alternatively, we can get the average points per game using AVG(gamePlayed.points). Should we then wish to order by this aggregated value, we can use AS blah to alias it and then ORDER BY blah ASC. Filtering on it needs a WITH clause in place of RETURN, since WHERE cannot follow RETURN.
MATCH (player:PLAYER) -[gamePlayed:PLAYED_AGAINST]->(:TEAM)
RETURN player.name, AVG(gamePlayed.points) AS ppg
ORDER BY ppg ASC
We could even add LIMIT 1 which would give us the player with the lowest average points in the games played.
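Filtering on an aggregated value works slightly differently from ordering, because WHERE cannot come after RETURN. A minimal sketch using WITH to compute the average first; the threshold of 25 is an arbitrary number picked for illustration:

```cypher
MATCH (player:PLAYER)-[gamePlayed:PLAYED_AGAINST]->(:TEAM)
// WITH computes the aggregate per player so WHERE can filter on it.
WITH player, AVG(gamePlayed.points) AS ppg
WHERE ppg >= 25
RETURN player.name, ppg
ORDER BY ppg DESC
```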
Deleting Nodes
We have to be careful when deleting nodes: using the DELETE keyword on its own will return an error if the node still has relationships. To avoid this we can use the DETACH keyword before DELETE, which deletes the node’s relationships along with the node…
MATCH (ja {name: "Ja Morant"})
DETACH DELETE ja
But what if we just want to delete a relationship…
MATCH (joel {name: "Joel Embiid"}) -[rel:PLAYS_FOR]-> (:TEAM)
DELETE rel
If we want to delete everything.
MATCH (n)
DETACH DELETE n
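Since DETACH DELETE removes relationships as well, it can be reassuring to check how many would go before running it. A rough sketch, reusing the Ja Morant node from above; relationshipCount is just an alias chosen here:

```cypher
MATCH (ja {name: "Ja Morant"})
// OPTIONAL MATCH keeps the node in the result even if it has no relationships.
OPTIONAL MATCH (ja)-[rel]-()
RETURN ja.name AS name, count(rel) AS relationshipCount
```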
Creating Nodes and Relationships
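CREATE (lebron:PLAYER:COACH:GENERAL_MANAGER {name: "LeBron James", height: 2.01})
RETURN lebron
In this example we are giving the node multiple labels (types), written in caps throughout these notes. The height is stored as a number rather than a string so that comparisons such as player.height >= 2 work as expected.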
CREATE (:PLAYER) -[:PLAYS_FOR {salary: 34000000}]-> (:TEAM {name: "LA Lakers"})
(Note: we forgot to add the name “Anthony Davis” when we created this player, but don’t worry, we will come back to this soon. We should have done CREATE (:PLAYER {name: "Anthony Davis"}) -[:PLAYS....)
But first let’s add just a relationship…
MATCH (lebron:PLAYER {name: "LeBron James"}), (lakers:TEAM {name: "LA Lakers"})
CREATE (lebron) -[:PLAYS_FOR {salary: 40000000}]-> (lakers)
Notice we actually forgot to give Anthony a name property. So, how do we update a node if we can’t select it by name…
MATCH (anthony:PLAYER)
WHERE ID(anthony) = 0
SET anthony.name = "Anthony Davis"
RETURN anthony
We can update any node using the SET keyword. In our case we couldn’t just do MATCH (anthony:PLAYER {name: "Anthony Davis"}) because we didn’t set the name property… Instead we had to select the player by its internal ID. The ID of 0 here is simply what the database assigned in this case, so yours may differ. Unfortunately, to find the ID we have to look through the nodes returned by MATCH (n) RETURN n and read the ID off manually.
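One way to narrow the search, rather than scanning every node, is to look only for PLAYER nodes that have no name property. A minimal sketch, assuming the unnamed player is the only PLAYER without a name; note that newer Neo4J versions (5 onwards) deprecate id() in favour of elementId(), which returns a string instead of a number.

```cypher
MATCH (p:PLAYER)
// A missing property evaluates to null, so this finds players without a name.
WHERE p.name IS NULL
RETURN id(p) AS nodeId, p
```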
Updating and Adding Properties
MATCH (lebron:PLAYER)
WHERE ID(lebron) = 3
SET lebron.height = 2.02, lebron.age = 36
or
MATCH (lebron:PLAYER {name: "LeBron James"})
SET lebron.height = 2.02, lebron.age = 36
RETURN lebron
The RETURN keyword is optional for both of these, but it is handy to get the updated node back to verify our changes.
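When several properties change at once, SET also accepts a map with the += operator. A short sketch of the same update written that way; properties not mentioned in the map are left untouched:

```cypher
MATCH (lebron:PLAYER {name: "LeBron James"})
// += adds or updates only the listed properties.
SET lebron += {height: 2.02, age: 36}
RETURN lebron
```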
We can also update/add node labels (types)…
MATCH (lebron:PLAYER {name: "LeBron James"})
SET lebron:REF
RETURN lebron
Updating and Adding Relationships
First let’s grab both LeBron and the Team.
MATCH (lebron {name: "LeBron James"}) -[contract:PLAYS_FOR]-> (team:TEAM)
RETURN lebron, team
Let’s say we want to update the PLAYS_FOR salary to be 60m instead of 40m.
MATCH (lebron {name: "LeBron James"}) -[contract:PLAYS_FOR]-> (team:TEAM)
SET contract.salary = 60000000
RETURN lebron, team
Let’s say we want to remove the REF label and the age property from LeBron.
MATCH (lebron {name: "LeBron James"}) -[contract:PLAYS_FOR]-> (team:TEAM)
REMOVE lebron:REF, lebron.age
RETURN lebron, team
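To confirm the removal, the node’s labels and the removed property can be returned explicitly. A small sketch using the built-in labels() function; after the REMOVE above, REF should no longer be listed and age should come back as null:

```cypher
MATCH (lebron {name: "LeBron James"})
// A removed (or never set) property is returned as null.
RETURN labels(lebron) AS labels, lebron.age AS age
```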
That should be most of the basics covered. I will try to update these notes as I find newer info or handy shortcuts :). Once again thank you to Laith Academy for the crash course.