Add notebook for visualizing projections with PyVis #766
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,201 @@ | ||
// DO NOT EDIT - AsciiDoc file generated automatically | ||
|
||
= GDS Projection Visualization with PyVis | ||
|
||
|
||
|
||
https://colab.research.google.com/github/neo4j/graph-data-science-client/blob/main/examples/visualize-with-pyvis.ipynb[image:https://colab.research.google.com/assets/colab-badge.svg[Open | |
In Colab]] | |
|
||
|
||
This Jupyter notebook is hosted | ||
https://github.com/neo4j/graph-data-science-client/blob/main/examples/visualize-with-pyvis.ipynb[here] | ||
in the Neo4j Graph Data Science Client GitHub repository. | |
|
||
The notebook exemplifies how to visualize a graph projection in the GDS | ||
Graph Catalog using the `graphdatascience` | ||
(https://neo4j.com/docs/graph-data-science-client/current/[docs]) and | ||
`pyvis` (https://pyvis.readthedocs.io/en/latest/index.html[docs]) | ||
libraries. | ||
|
||
== Prerequisites | ||
|
||
Running this notebook requires a Neo4j server with GDS installed. We | ||
recommend using Neo4j Desktop with GDS, or AuraDS. | ||
|
||
The Python libraries `graphdatascience` and `pyvis` are also | |
required: | |
|
||
[source, python, role=no-test] | ||
---- | ||
%pip install graphdatascience pyvis | ||
---- | ||
|
||
== Setup | ||
|
||
We start by importing our dependencies and setting up our GDS client | ||
connection to the database. | ||
|
||
[source, python, role=no-test] | ||
---- | ||
from graphdatascience import GraphDataScience | ||
import os | ||
from pyvis.network import Network | ||
---- | ||
|
||
[source, python, role=no-test] | ||
---- | ||
# Get Neo4j DB URI, credentials and name from environment if applicable | ||
NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687") | ||
NEO4J_AUTH = None | ||
NEO4J_DB = os.environ.get("NEO4J_DB", "neo4j") | ||
if os.environ.get("NEO4J_USER") and os.environ.get("NEO4J_PASSWORD"): | |
    NEO4J_AUTH = ( | |
        os.environ.get("NEO4J_USER"), | |
        os.environ.get("NEO4J_PASSWORD"), | |
    ) | |
gds = GraphDataScience(NEO4J_URI, auth=NEO4J_AUTH, database=NEO4J_DB) | ||
---- | ||
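The environment lookup above can also be wrapped in a small function, which makes the fallback behavior easy to check without a running database. This is only a sketch: the helper name `neo4j_settings` is hypothetical and not part of `graphdatascience`; the defaults are the ones used in the cell above.

```python
import os
from typing import Mapping, Optional, Tuple


def neo4j_settings(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[Tuple[str, str]], str]:
    """Return (uri, auth, database) with the same defaults as the cell above.

    Hypothetical helper for illustration; not part of graphdatascience.
    """
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    database = env.get("NEO4J_DB", "neo4j")
    user, password = env.get("NEO4J_USER"), env.get("NEO4J_PASSWORD")
    # Only build credentials when both parts are set and non-empty
    auth = (user, password) if user and password else None
    return uri, auth, database


# Example with an explicit mapping instead of the real environment
uri, auth, database = neo4j_settings({"NEO4J_USER": "neo4j", "NEO4J_PASSWORD": "secret"})
print(uri, auth, database)
```

The returned values could then be passed on as `GraphDataScience(uri, auth=auth, database=database)`, exactly as in the cell above.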
|
||
== Sampling Cora | ||
|
||
Next we use the | ||
https://neo4j.com/docs/graph-data-science-client/current/common-datasets/#_cora[built-in | ||
Cora loader] to get the data into GDS. The nodes in the Cora dataset | |
represent academic papers, and the relationships connecting them are | |
citations. | |
|
||
We will then sample a smaller representative subgraph from it that is | ||
more suitable for visualization. | ||
|
||
[source, python, role=no-test] | ||
---- | ||
G = gds.graph.load_cora() | ||
---- | ||
|
||
Let’s make sure we constructed the correct graph. | ||
|
||
[source, python, role=no-test] | ||
---- | ||
print(f"Metadata for our loaded Cora graph `G`: {G}") | ||
print(f"Node labels present in `G`: {G.node_labels()}") | ||
---- | ||
|
||
It looks correct! Now let’s go ahead and sample the graph. | |
|
||
We use the random walk with restarts sampling algorithm to get a smaller | ||
graph that structurally represents the full graph. In this example we | ||
will use the algorithm’s default parameters, but check out | ||
https://neo4j.com/docs/graph-data-science/current/management-ops/graph-creation/sampling/rwr/[the | ||
algorithm’s docs] to see how you can, for example, specify the size of | |
the subgraph and choose the start node around which the subgraph will | |
be sampled. | |
|
||
[source, python, role=no-test] | ||
---- | ||
G_sample, _ = gds.alpha.graph.sample.rwr("cora_sample", G, randomSeed=42, concurrency=1) | ||
---- | ||
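To give an intuition for random walk with restarts sampling, here is a minimal pure-Python sketch on a toy graph. It is not the GDS implementation: the function `rwr_sample`, its parameter names and its default values are assumptions chosen for illustration, and the real algorithm has more options than shown here.

```python
import random
from typing import Dict, Hashable, List, Set


def rwr_sample(
    adjacency: Dict[Hashable, List[Hashable]],
    start: Hashable,
    sampling_ratio: float = 0.15,
    restart_probability: float = 0.1,
    seed: int = 42,
    max_steps: int = 1_000_000,
) -> Set[Hashable]:
    """Collect nodes visited by a random walk that occasionally jumps back to `start`.

    Illustrative sketch only; not the GDS implementation.
    """
    rng = random.Random(seed)
    target_size = max(1, round(sampling_ratio * len(adjacency)))
    visited = {start}
    current = start
    for _ in range(max_steps):
        if len(visited) >= target_size:
            break
        neighbors = adjacency.get(current, [])
        # Restart on a dead end, or with the given probability
        if not neighbors or rng.random() < restart_probability:
            current = start
        else:
            current = rng.choice(neighbors)
            visited.add(current)
    return visited


# Toy graph: a ring of 20 nodes, each linked to its two neighbours
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
sample = rwr_sample(ring, start=0, sampling_ratio=0.25)
print(sorted(sample))
```

Because the walk keeps returning to the start node, the sampled nodes stay close to it, which is why the sample tends to preserve local structure.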
|
||
With the default sampling ratio of 0.15, we should have around | |
0.15 * 2708 ≈ 406 nodes in our sample. Let’s also see how many | |
relationships we got. | |
|
||
[source, python, role=no-test] | ||
---- | ||
print(f"Number of nodes in our sample: {G_sample.node_count()}") | ||
print(f"Number of relationships in our sample: {G_sample.relationship_count()}") | ||
---- | ||
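The size estimate above is simple arithmetic, and since the sampled size is only expected to land near that estimate, a loose tolerance check can be more useful than an exact comparison. The helper `close_to_expected` and its 5% default tolerance are assumptions made for this sketch.

```python
CORA_NODE_COUNT = 2708  # size of the full Cora graph, as stated above
SAMPLING_RATIO = 0.15   # ratio used in the estimate above

expected_nodes = round(CORA_NODE_COUNT * SAMPLING_RATIO)


def close_to_expected(actual: int, expected: int, tolerance: float = 0.05) -> bool:
    """Return True if `actual` is within a relative `tolerance` of `expected`."""
    return abs(actual - expected) <= tolerance * expected


print(expected_nodes)
print(close_to_expected(410, expected_nodes))
```

In the notebook, `G_sample.node_count()` could be passed as `actual`.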
|
||
Let’s also compute | ||
https://neo4j.com/docs/graph-data-science/current/algorithms/page-rank/[PageRank] | ||
on our sample graph, in order to get an importance score that we call | ||
``rank'' for each node. It will be interesting for context when we | ||
visualize the graph. | ||
|
||
[source, python, role=no-test] | ||
---- | ||
gds.pageRank.mutate(G_sample, mutateProperty="rank") | ||
---- | ||
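For readers unfamiliar with PageRank, the following is a textbook power-iteration sketch on a toy citation graph. It is not how GDS computes the score, and GDS may scale its values differently; the function name, the damping factor of 0.85 and the iteration count are assumptions for illustration.

```python
from typing import Dict, Hashable, List


def pagerank(
    out_links: Dict[Hashable, List[Hashable]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[Hashable, float]:
    """Textbook power iteration; every link target must also be a key of `out_links`."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Toy citation graph: papers "a", "b" and "c" all cite "d"
citations = {"a": ["d"], "b": ["d"], "c": ["d"], "d": []}
scores = pagerank(citations)
print(max(scores, key=scores.get))
```

A paper that is cited by many others accumulates rank from each of them, which is why heavily cited papers show up as larger nodes later on.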
|
||
== Exporting the sampled Cora graph | ||
|
||
We can now export the topology and node properties of our sampled graph | ||
that we want to visualize. | ||
|
||
Let’s start by fetching the relationships. | ||
|
||
[source, python, role=no-test] | ||
---- | ||
sample_topology_df = gds.beta.graph.relationships.stream(G_sample) | ||
display(sample_topology_df) | ||
---- | ||
|
||
We get the right number of rows, one for each expected relationship, so | |
that looks good. | |
|
||
Next we fetch the node properties we are interested in. Each node | |
has a ``subject'' property, an integer 0,…,6 that indicates which of | |
seven academic subjects the paper represented by the node belongs to. | |
We will also fetch the PageRank property ``rank'' that we computed | |
above. | |
|
||
[source, python, role=no-test] | ||
---- | ||
sample_node_properties_df = gds.graph.nodeProperties.stream( | ||
    G_sample, | |
    ["subject", "rank"], | |
    separate_property_columns=True, | |
) | ||
display(sample_node_properties_df) | ||
---- | ||
|
||
Now that we have all the data we want to visualize, we can create a | ||
network with PyVis. We color each node according to its ``subject'', and | ||
size it according to its ``rank''. | ||
|
||
[source, python, role=no-test] | ||
---- | ||
net = Network(notebook = True, | |
    cdn_resources="remote", | |
    # Review comment: I thought | |
    # Review comment: should be able to format notebooks according to https://docs.astral.sh/ruff/configuration/#jupyter-notebook-discovery | |
    # Review comment: issue with our setup .. see #768 | |
    # Review comment: I tried after rebasing on main, but the formatting issue remains the same after running | |
    # Review comment: likely wrong pandoc version installed locally. | |
    bgcolor = "#222222", | |
    font_color = "white", | |
    height = "750px", # Modify according to your screen size | |
    width = "100%", | |
) | |
|
||
# Seven suitable light colors, one for each "subject" | ||
subject_to_color = ["#80cce9", "#fbd266", "#a9eebc", "#e53145", "#d2a6e2", "#f3f3f3", "#ff91af"] | ||
|
||
# Add all the nodes | ||
for _, node in sample_node_properties_df.iterrows(): | ||
    net.add_node(int(node["nodeId"]), color=subject_to_color[int(node["subject"])], value=node["rank"]) | |
|
||
# Add all the relationships | ||
net.add_edges(zip(sample_topology_df["sourceNodeId"], sample_topology_df["targetNodeId"])) | ||
|
||
net.show("cora-sample.html") | ||
---- | ||
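The mapping from node properties to visual attributes can be factored into a small helper, which also makes it easy to add a hover tooltip naming the subject and rank. This is a sketch under assumptions: `node_style` is a hypothetical helper, and it relies on PyVis treating `value` as a size hint and `title` as hover text.

```python
from typing import Any, Dict

# Same palette as above, one color per "subject" value 0..6
SUBJECT_COLORS = ["#80cce9", "#fbd266", "#a9eebc", "#e53145", "#d2a6e2", "#f3f3f3", "#ff91af"]


def node_style(subject: int, rank: float) -> Dict[str, Any]:
    """Build keyword arguments for `net.add_node` (hypothetical helper)."""
    return {
        "color": SUBJECT_COLORS[subject],
        "value": rank,  # used by PyVis to scale the node size
        "title": f"subject: {subject}, rank: {rank:.3f}",  # shown on hover
    }


style = node_style(subject=3, rank=0.42)
print(style)
# In the notebook this could be used as:
#   net.add_node(int(node["nodeId"]), **node_style(int(node["subject"]), node["rank"]))
```

Keeping the styling separate from the loop makes it straightforward to try a different palette or tooltip text without touching the graph-building code.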
|
||
Unsurprisingly, we can see that papers largely seem clustered by academic | |
subject. We also note that some nodes appear larger in size, indicating | |
that they have a higher centrality score according to PageRank. | |
// Review comment: just the visualization does not explain to us what the academic subjects are right? | |
|
||
We can scroll over the graphic to zoom in/out, and ``click and drag'' | ||
the background to navigate to different parts of the network. If we | ||
click on a node, it will be highlighted along with the relationships | ||
connected to it. And if we ``click and drag'' a node, we can move it. | ||
|
||
Additionally, one could enable more sophisticated navigational features | |
for searching and filtering by passing `select_menu = True` and | |
`filter_menu = True`, respectively, to the PyVis `Network` constructor | |
above. See the | |
https://pyvis.readthedocs.io/en/latest/index.html[PyVis documentation] | |
for details. | |
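As a sketch of what enabling those two menus could look like, the constructor arguments from the visualization cell are collected in a dictionary with the two flags added. This assumes a PyVis version whose `Network` constructor accepts `select_menu` and `filter_menu`; the import is guarded so that the options can be inspected even where PyVis is not installed.

```python
# Constructor arguments from the visualization cell, plus the two menu flags
network_options = {
    "notebook": True,
    "cdn_resources": "remote",
    "bgcolor": "#222222",
    "font_color": "white",
    "height": "750px",  # Modify according to your screen size
    "width": "100%",
    "select_menu": True,  # menu for searching and selecting nodes
    "filter_menu": True,  # menu for filtering what is shown
}

try:
    from pyvis.network import Network
except ImportError:  # PyVis not installed; the options above are still valid data
    Network = None

if Network is not None:
    net = Network(**network_options)
```

Nodes and relationships would then be added to `net` exactly as before.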
|
||
== Cleanup | ||
|
||
We remove the Cora graphs from the GDS graph catalog to free up memory. | ||
|
||
[source, python, role=no-test] | ||
---- | ||
_ = G_sample.drop() | ||
_ = G.drop() | ||
---- |
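If an earlier cell fails, the cleanup cell may never run and the projections stay in memory. One way to guard against that in a script is a context manager that drops the graphs on exit. This is a sketch: `dropping` is a hypothetical helper, and `FakeGraph` is a stand-in so the example runs without a database; the only assumption about real graph objects is that they have a `drop()` method, as used above.

```python
from contextlib import contextmanager


@contextmanager
def dropping(*graphs):
    """Yield the given graph objects and call `drop()` on each one afterwards."""
    try:
        yield graphs
    finally:
        for graph in graphs:
            graph.drop()


class FakeGraph:
    """Stand-in with a `drop()` method, so the sketch runs without a database."""

    def __init__(self):
        self.dropped = False

    def drop(self):
        self.dropped = True


fake = FakeGraph()
with dropping(fake):
    pass  # sample, export and visualize here
print(fake.dropped)
```

With real graph objects this could be written as `with dropping(G_sample, G): ...`, so both projections are removed even if the body raises.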