
Commit ba34d23

Merge pull request #236 from neo4j/rework-integrations-api
Slim down integrations api for pandas and gds
2 parents 7c6ff4a + 6d26d39 commit ba34d23

12 files changed, +243 -446 lines changed

changelog.md

Lines changed: 11 additions & 2 deletions
@@ -3,20 +3,29 @@
 ## Breaking changes

 - Do not automatically derive size and caption for `from_neo4j` and `from_gql_create`. Use the `size_property` and `node_caption` parameters to explicitly configure them.
+- Change API of integrations to only provide basic parameters. Any further configuration should happen on the Visualization Graph object:
+  - `from_gds`
+    - Drop parameters size_property, node_radius_min_max. Use `VG.resize_nodes(property=...)` instead
+    - rename additional_node_properties to node_properties
+    - Don't derive fields from properties. Use `VG.map_properties_to_fields` instead
+  - `from_pandas`
+    - Drop `node_radius_min_max` parameter. Use `VG.resize_nodes(...)` instead

 ## New features

-- Allow to include db node properties in addition to the properties in the GDS Graph. Specify `additional_db_node_properties` in `from_gds`.
-
+- Allow to include db node properties in addition to the properties in the GDS Graph. Specify `db_node_properties` in `from_gds`.

 ## Bug fixes

 - fixed a bug in `from_neo4j`, where the node size would always be set to the `size` property.
 - fixed a bug in `from_neo4j`, where the node caption would always be set to the `caption` property.
+- Color nodes in `from_snowflake` only if there are less than 13 node tables used. This avoids reuse of colors for different tables.

 ## Improvements

 - Validate fields of a node and relationship not only at construction but also on assignment.
 - Allow resizing per node property such as `VG.resize_nodes(property="score")`.
+- Color nodes by label in `from_gds`.
+- Add `table` property to nodes and relationships created by `from_snowflake`. This is used as a default caption.

 ## Other changes

docs/source/conf.py

Lines changed: 3 additions & 0 deletions
@@ -33,7 +33,10 @@

 # -- Options for autodoc extension -------------------------------------------
 autodoc_typehints = "description"
+autoclass_content = "both"

+# -- Options for napoleon extension -------------------------------------------
+napoleon_use_admonition_for_examples = True

 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
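
Note on the two new settings: with `autoclass_content = "both"`, autodoc concatenates the class docstring and the `__init__` docstring, and with `napoleon_use_admonition_for_examples = True`, napoleon renders `Example` and `Examples` sections with the `.. admonition::` directive instead of `.. rubric::`. The class below is a hypothetical illustration of a docstring that both settings would affect; it is not part of neo4j_viz.

```python
class Clamp:
    """Clamp numbers into a closed interval.

    Example
    -------
    >>> Clamp(0, 10).apply(42)
    10
    """

    def __init__(self, low: float, high: float) -> None:
        """Create a clamp for the interval from ``low`` to ``high``."""
        # With autoclass_content = "both", this docstring is rendered
        # together with the class docstring above.
        self.low = low
        self.high = high

    def apply(self, value: float) -> float:
        """Return ``value`` limited to the configured interval."""
        return max(self.low, min(self.high, value))
```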

docs/source/integration.rst

Lines changed: 10 additions & 32 deletions
@@ -35,28 +35,17 @@ The ``from_dfs`` method takes two mandatory positional parameters:
 * A Pandas ``DataFrame``, or iterable (eg. list) of DataFrames representing the nodes of the graph.
   The rows of the DataFrame(s) should represent the individual nodes, and the columns should represent the node
   IDs and attributes.
-  If a column shares the name with a field of :doc:`Node <./api-reference/node>`, the values it contains will be set
-  on corresponding nodes under that field name.
-  Otherwise, the column name will be a key in each node's `properties` dictionary, that maps to the node's corresponding
+  The node ID will be set on the :doc:`Node <./api-reference/node>`.
+  Other columns will be a key in each node's `properties` dictionary, that maps to the node's corresponding
   value in the column.
   If the graph has no node properties, the nodes can be derived from the relationships DataFrame alone.
 * A Pandas ``DataFrame``, or iterable (eg. list) of DataFrames representing the relationships of the graph.
   The rows of the DataFrame(s) should represent the individual relationships, and the columns should represent the
   relationship IDs and attributes.
-  If a column shares the name with a field of :doc:`Relationship <./api-reference/relationship>`, the values it contains
-  will be set on corresponding relationships under that field name.
-  Otherwise, the column name will be a key in each node's `properties` dictionary, that maps to the node's corresponding
+  The relationship ID, source and target node IDs will be set on the :doc:`Relationship <./api-reference/relationship>`.
+  Other columns will be a key in each relationship's `properties` dictionary, that maps to the relationship's corresponding
   value in the column.

-``from_dfs`` also takes an optional property, ``node_radius_min_max``, that can be used (and is used by default) to
-scale the node sizes for the visualization.
-It is a tuple of two numbers, representing the radii (sizes) in pixels of the smallest and largest nodes respectively in
-the visualization.
-The node sizes will be scaled such that the smallest node will have the size of the first value, and the largest node
-will have the size of the second value.
-The other nodes will be scaled linearly between these two values according to their relative size.
-This can be useful if node sizes vary a lot, or are all very small or very big.
-

 Example
 ~~~~~~~
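
Note: the mapping rule described in the added lines (the ID column goes on the node, every other column goes into `properties`) can be sketched without pandas. The snippet uses plain dictionaries in place of DataFrames and of the real `Node` class; `rows_to_nodes` and the `id` column name are assumptions for illustration, not the neo4j_viz implementation.

```python
from typing import Any


def rows_to_nodes(rows: list[dict[str, Any]], id_column: str = "id") -> list[dict[str, Any]]:
    """Turn row dicts into node dicts with an id and a properties map."""
    nodes = []
    for row in rows:
        if id_column not in row:
            raise ValueError(f"Row is missing the mandatory '{id_column}' column: {row}")
        # Everything except the ID is kept as a plain property, including
        # columns such as "size" that share a name with a node field.
        properties = {key: value for key, value in row.items() if key != id_column}
        nodes.append({"id": row[id_column], "properties": properties})
    return nodes


rows = [
    {"id": 0, "size": 12, "caption": "Person"},
    {"id": 1, "size": 3, "caption": "Product"},
]
nodes = rows_to_nodes(rows)
```

Under this rule a `size` column no longer sets the node size by itself; per the changelog, that mapping is now an explicit step on the returned graph.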
@@ -111,33 +100,21 @@ If you want to have more control of the sampling, such as choosing a specific st
 a `sampling <https://neo4j.com/docs/graph-data-science/current/management-ops/graph-creation/sampling/>`_
 method yourself and passing the resulting projection to ``from_gds``.

-We can also provide an optional ``size_property`` parameter, which should refer to a node property of the projection,
-and will be used to determine the sizes of the nodes in the visualization.
-
-The ``additional_node_properties`` parameter is also optional, and should be a list of additional node properties of the
+The ``node_properties`` parameter is also optional, and should be a list of additional node properties of the
 projection that you want to include in the visualization.
 The default is ``None``, which means that all properties of the nodes in the projection will be included.
 Apart from being visible through on-hover tooltips, these properties could be used to color the nodes, or give captions
 to them in the visualization, or simply included in the nodes' ``Node.properties`` maps without directly impacting the
 visualization.
-If you want to include node properties stored at the Neo4j database, you can include them in the visualization by using the `additional_db_node_properties` parameter.
-
-The last optional property, ``node_radius_min_max``, can be used (and is used by default) to scale the node sizes for
-the visualization.
-It is a tuple of two numbers, representing the radii (sizes) in pixels of the smallest and largest nodes respectively in
-the visualization.
-The node sizes will be scaled such that the smallest node will have the size of the first value, and the largest node
-will have the size of the second value.
-The other nodes will be scaled linearly between these two values according to their relative size.
-This can be useful if node sizes vary a lot, or are all very small or very big.
+If you want to include node properties stored at the Neo4j database, you can include them in the visualization by using the `db_node_properties` parameter.


 Example
 ~~~~~~~

 In this small example, we import a graph projection from the GDS library, that has the node properties "pagerank" and
 "componentId".
-We use the "pagerank" property to determine the size of the nodes, and the "componentId" property to color the nodes.
+We use the "pagerank" property to compute the size of the nodes, and the "componentId" property to color the nodes.

 .. code-block:: python

@@ -156,9 +133,10 @@ We use the "pagerank" property to determine the size of the nodes, and the "comp
     VG = from_gds(
         gds,
         G,
-        size_property="pagerank",
-        additional_node_properties=["componentId"],
+        node_properties=["componentId"],
     )
+    # Size the nodes by the `pagerank` property
+    VG.resize_nodes(property="pagerank")

     # Color the nodes by the `componentId` property, so that the nodes are
     # colored by the connected component they belong to
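
Note: the paragraphs removed from `integration.rst` described how `node_radius_min_max` scaled node sizes linearly between a minimum and a maximum radius; per the changelog, sizing is now done by calling `VG.resize_nodes(...)` on the returned graph. The scaling rule itself can be sketched in plain Python. `scale_radii` is a hypothetical helper, and mapping equal sizes to the midpoint is an assumption; the library may handle that case differently.

```python
def scale_radii(
    sizes: list[float], min_max: tuple[float, float] = (3, 60)
) -> list[float]:
    """Linearly rescale sizes so the smallest maps to min_max[0] and the largest to min_max[1]."""
    if not sizes:
        return []
    low, high = min_max
    smallest, largest = min(sizes), max(sizes)
    if largest == smallest:
        # Assumption: with no spread, place every node at the midpoint.
        return [(low + high) / 2 for _ in sizes]
    span = largest - smallest
    return [low + (size - smallest) / span * (high - low) for size in sizes]


radii = scale_radii([1.0, 2.0, 5.0])
```

Here the smallest input maps to 3, the largest to 60, and the value a quarter of the way between them maps to a quarter of the way through the radius range.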

python-wrapper/src/neo4j_viz/gds.py

Lines changed: 30 additions & 46 deletions
@@ -8,6 +8,8 @@
 import pandas as pd
 from graphdatascience import Graph, GraphDataScience

+from neo4j_viz.colors import NEO4J_COLORS_DISCRETE, ColorSpace
+
 from .pandas import _from_dfs
 from .visualization_graph import VisualizationGraph

@@ -55,18 +57,20 @@ def _fetch_rel_dfs(gds: GraphDataScience, G: Graph) -> list[pd.DataFrame]:
 def from_gds(
     gds: GraphDataScience,
     G: Graph,
-    size_property: Optional[str] = None,
-    additional_node_properties: Optional[list[str]] = None,
-    additional_db_node_properties: Optional[list[str]] = None,
-    node_radius_min_max: Optional[tuple[float, float]] = (3, 60),
+    node_properties: Optional[list[str]] = None,
+    db_node_properties: Optional[list[str]] = None,
     max_node_count: int = 10_000,
 ) -> VisualizationGraph:
     """
     Create a VisualizationGraph from a GraphDataScience object and a Graph object.

-    All `additional_node_properties` will be included in the visualization graph.
-    If the properties are named as the fields of the `Node` class, they will be included as top level fields of the
-    created `Node` objects. Otherwise, they will be included in the `properties` dictionary.
+    By default:
+
+    * the caption of a node will be based on its `labels`.
+    * the caption of a relationship will be based on its `relationshipType`.
+    * the color of nodes will be set based on their label, unless there are more than 12 unique labels.
+
+    All `node_properties` and `db_node_properties` will be included in the visualization graph under the `properties` field.
     Additionally, a new "labels" node property will be added, containing the node labels of the node.
     Similarly for relationships, a new "relationshipType" property will be added.

@@ -76,49 +80,36 @@ def from_gds(
         GraphDataScience object.
     G : Graph
         Graph object.
-    size_property : str, optional
-        Property to use for node size, by default None.
-    additional_node_properties : list[str], optional
+    node_properties : list[str], optional
         Additional properties to include in the visualization node, by default None which means that all node
         properties from the Graph will be fetched.
-    additional_db_node_properties : list[str], optional
+    db_node_properties : list[str], optional
         Additional node properties to fetch from the database, by default None. Only works if the graph was projected from the database.
-    node_radius_min_max : tuple[float, float], optional
-        Minimum and maximum node radius, by default (3, 60).
-        To avoid tiny or huge nodes in the visualization, the node sizes are scaled to fit in the given range.
     max_node_count : int, optional
         The maximum number of nodes to fetch from the graph. The graph will be sampled using random walk with restarts
         if its node count exceeds this number.
     """
-    if additional_db_node_properties is None:
-        additional_db_node_properties = []
+    if db_node_properties is None:
+        db_node_properties = []

     node_properties_from_gds = G.node_properties()
     assert isinstance(node_properties_from_gds, pd.Series)
     actual_node_properties: dict[str, list[str]] = cast(dict[str, list[str]], node_properties_from_gds.to_dict())
     all_actual_node_properties = list(chain.from_iterable(actual_node_properties.values()))

-    if size_property is not None:
-        if size_property not in all_actual_node_properties:
-            raise ValueError(f"There is no node property '{size_property}' in graph '{G.name()}'")
-
     node_properties_by_label_sets: dict[str, set[str]] = dict()
-    if additional_node_properties is None:
+    if node_properties is None:
         node_properties_by_label_sets = {k: set(v) for k, v in actual_node_properties.items()}
     else:
-        for prop in additional_node_properties:
+        for prop in node_properties:
             if prop not in all_actual_node_properties:
                 raise ValueError(f"There is no node property '{prop}' in graph '{G.name()}'")

         for label, props in actual_node_properties.items():
             node_properties_by_label_sets[label] = {
-                prop for prop in actual_node_properties[label] if prop in additional_node_properties
+                prop for prop in actual_node_properties[label] if prop in node_properties
             }

-    if size_property is not None:
-        for label, label_props in node_properties_by_label_sets.items():
-            label_props.add(size_property)
-
     node_properties_by_label = {k: list(v) for k, v in node_properties_by_label_sets.items()}

     node_count = G.node_count()
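
Note: the property selection in this hunk can be exercised without a GDS connection. The sketch below mirrors it on a plain dictionary standing in for `G.node_properties()`. `select_properties` is a hypothetical helper name, the sample labels and properties are made up, and the results are sorted only to make the output deterministic (the code in the diff uses unordered sets).

```python
from itertools import chain
from typing import Optional


def select_properties(
    actual: dict[str, list[str]], requested: Optional[list[str]] = None
) -> dict[str, list[str]]:
    """Pick, per label, the node properties to fetch.

    ``None`` keeps every property. Otherwise each requested property must
    exist on at least one label, and each label keeps only the requested ones.
    """
    if requested is None:
        return {label: sorted(set(props)) for label, props in actual.items()}

    known = set(chain.from_iterable(actual.values()))
    for prop in requested:
        if prop not in known:
            raise ValueError(f"There is no node property '{prop}'")

    return {
        label: sorted(prop for prop in set(props) if prop in requested)
        for label, props in actual.items()
    }


actual = {"Person": ["pagerank", "componentId"], "Product": ["pagerank"]}
selected = select_properties(actual, ["componentId"])
```

With `size_property` gone, a property used for sizing is no longer added to every label implicitly; it has to be part of the selection if it should be available to `VG.resize_nodes(property=...)` later.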
@@ -143,7 +134,7 @@ def from_gds(
                 props.append(property_name)

         node_dfs = _fetch_node_dfs(
-            gds, G_fetched, node_properties_by_label, G_fetched.node_labels(), additional_db_node_properties
+            gds, G_fetched, node_properties_by_label, G_fetched.node_labels(), db_node_properties
         )
         if property_name is not None:
             for df in node_dfs.values():
@@ -161,13 +152,6 @@ def from_gds(
             df.drop(columns=[property_name], inplace=True)

     node_props_df = pd.concat(node_dfs.values(), ignore_index=True, axis=0).drop_duplicates()
-    if size_property is not None:
-        if "size" in all_actual_node_properties and size_property != "size":
-            node_props_df.rename(columns={"size": "__size"}, inplace=True)
-        if additional_node_properties is not None and size_property not in additional_node_properties:
-            node_props_df.rename(columns={size_property: "size"}, inplace=True)
-        else:
-            node_props_df["size"] = node_props_df[size_property]

     for lbl, df in node_dfs.items():
         if "labels" in all_actual_node_properties:
@@ -179,22 +163,22 @@ def from_gds(

     node_df = node_props_df.merge(node_labels_df, on="nodeId")

-    if "caption" not in all_actual_node_properties:
-        node_df["caption"] = node_df["labels"].astype(str)
+    try:
+        VG = _from_dfs(node_df, rel_dfs, dropna=True)

-    for rel_df in rel_dfs:
-        if "caption" not in rel_df.columns:
-            rel_df["caption"] = rel_df["relationshipType"]
+        for node in VG.nodes:
+            node.caption = str(node.properties.get("labels"))
+        for rel in VG.relationships:
+            rel.caption = rel.properties.get("relationshipType")

-    try:
-        return _from_dfs(
-            node_df, rel_dfs, node_radius_min_max=node_radius_min_max, rename_properties={"__size": "size"}, dropna=True
-        )
+        number_of_colors = node_df["labels"].drop_duplicates().count()
+        if number_of_colors <= len(NEO4J_COLORS_DISCRETE):
+            VG.color_nodes(property="labels", color_space=ColorSpace.DISCRETE)
+
+        return VG
     except ValueError as e:
         err_msg = str(e)
         if "column" in err_msg:
             err_msg = err_msg.replace("column", "property")
-            if ("'size'" in err_msg) and (size_property is not None):
-                err_msg = err_msg.replace("'size'", f"'{size_property}'")
             raise ValueError(err_msg)
         raise e
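
Note: the new default coloring only applies when every distinct label can get its own color. A sketch of that guard in plain Python follows. `PALETTE` is a hypothetical stand-in for `NEO4J_COLORS_DISCRETE` (the docstring above implies 12 entries; three made-up colors are used here), and `color_by_label` is not part of neo4j_viz.

```python
from typing import Optional

# Hypothetical palette; the real NEO4J_COLORS_DISCRETE values are not shown in this diff.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c"]


def color_by_label(
    labels: list[str], palette: list[str] = PALETTE
) -> Optional[dict[str, str]]:
    """Map each distinct label to its own color, or return None if the palette is too small."""
    distinct = list(dict.fromkeys(labels))  # de-duplicate, keeping first-seen order
    if len(distinct) > len(palette):
        # Coloring would force two labels to share a color, so skip it entirely.
        return None
    return dict(zip(distinct, palette))


few = color_by_label(["Person", "Product", "Person"])
many = color_by_label(["A", "B", "C", "D"])
```

Skipping the coloring when there are too many labels matches the rationale the changelog gives for `from_snowflake`: it avoids reusing one color for different groups.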

python-wrapper/src/neo4j_viz/node.py

Lines changed: 7 additions & 0 deletions
@@ -98,3 +98,10 @@ def all_validation_aliases(exempted_fields: Optional[list[str]] = None) -> set[s
         by_field = [v.validation_alias.choices for k, v in Node.model_fields.items() if k not in exempted_fields]  # type: ignore

         return {str(alias) for aliases in by_field for alias in aliases}
+
+    @staticmethod
+    def basic_fields_validation_aliases() -> set[str]:
+        mandatory_fields = ["id"]
+        by_field = [v.validation_alias.choices for k, v in Node.model_fields.items() if k in mandatory_fields]  # type: ignore
+
+        return {str(alias) for aliases in by_field for alias in aliases}
