Merge pull request #60 from neo4j/snowpark-example

adamnsch · web-flow · commit 628a87cf264c · 2025-01-16T16:29:10.000+01:00
Add Snowpark notebook example
diff --git a/examples/snowpark-nvl-example.ipynb b/examples/snowpark-nvl-example.ipynb
@@ -0,0 +1,276 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0d3ffc27",
+   "metadata": {},
+   "source": [
+    "# Visualizing Snowflake Tables"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6b83277d",
+   "metadata": {},
+   "source": [
+    "\n",
+    "This is a brief but complete example of how to visualize graphs represented by tables in Snowflake, using the Graph Visualization for Python library for Neo4j."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "168b2f0ec9520f4a",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "We will start by installing the necessary Python library requirements."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "39e8a71b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install snowflake-snowpark-python # Requires Python version <= 3.11\n",
+    "%pip install neo4j-viz"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c91214441edff2d",
+   "metadata": {},
+   "source": [
+    "We can now proceed to set up our connection to Snowflake by initializing a new session.\n",
+    "Please note that you may need more or fewer connection parameters depending on your Snowflake configuration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "801d0bed",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "from snowflake.snowpark import Session\n",
+    "\n",
+    "# Configure according to your own setup\n",
+    "connection_parameters = {\n",
+    "    \"account\": os.environ.get(\"SNOWFLAKE_ACCOUNT\"),\n",
+    "    \"user\": os.environ.get(\"SNOWFLAKE_USER\"),\n",
+    "    \"password\": os.environ.get(\"SNOWFLAKE_PASSWORD\"),\n",
+    "    \"role\": os.environ.get(\"SNOWFLAKE_ROLE\"),\n",
+    "    \"warehouse\": os.environ.get(\"SNOWFLAKE_WAREHOUSE\"),\n",
+    "}\n",
+    "\n",
+    "session = Session.builder.configs(connection_parameters).create()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5ff57d28a917c569",
+   "metadata": {},
+   "source": [
+    "Now we can create a new Snowflake database to hold our small example tables.\n",
+    "If you already have a database you want to use, you can skip this step."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "41ad4289420a9b36",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "session.sql(\n",
+    "    \"CREATE DATABASE IF NOT EXISTS nvl_example DATA_RETENTION_TIME_IN_DAYS = 1\"\n",
+    ").collect()\n",
+    "session.sql(\"USE DATABASE nvl_example\").collect()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "365a1c31",
+   "metadata": {},
+   "source": [
+    "## Creating tables\n",
+    "\n",
+    "Next, we create a table for the nodes of our graph, which represent products of various categories."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d935b3d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "session.sql(\n",
+    "    \"CREATE OR REPLACE TABLE products (id INT, name VARCHAR, category INT)\"\n",
+    ").collect()\n",
+    "\n",
+    "session.sql(\"\"\"\n",
+    "INSERT INTO products VALUES\n",
+    "(1,  'Product 1',  1),\n",
+    "(2,  'Product 1A', 1),\n",
+    "(3,  'Product 1B', 1),\n",
+    "(4,  'Product 2',  2),\n",
+    "(5,  'Product 2A', 2),\n",
+    "(6,  'Product 2B', 2),\n",
+    "(7,  'Product 3',  3),\n",
+    "(8,  'Product 3A', 3),\n",
+    "(9,  'Product 3B', 3),\n",
+    "(10, 'Product 4',  4),\n",
+    "(11, 'Product 4A', 4),\n",
+    "(12, 'Product 4B', 4)\n",
+    "\"\"\").collect()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cf08716eb4275659",
+   "metadata": {},
+   "source": [
+    "Some of the products are \"subproducts\" of certain parent products.\n",
+    "We now create a table that encodes these \"PARENT\" relationships between the products."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "be2ac16d3bd41e6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "session.sql(\n",
+    "    \"CREATE OR REPLACE TABLE parents (source INT, target INT, type VARCHAR)\"\n",
+    ").collect()\n",
+    "\n",
+    "session.sql(\"\"\"\n",
+    "INSERT INTO parents VALUES\n",
+    "(2,  1,  'PARENT'),\n",
+    "(3,  1,  'PARENT'),\n",
+    "(5,  4,  'PARENT'),\n",
+    "(6,  4,  'PARENT'),\n",
+    "(8,  7,  'PARENT'),\n",
+    "(9,  7,  'PARENT'),\n",
+    "(11, 10, 'PARENT'),\n",
+    "(12, 10, 'PARENT')\n",
+    "\"\"\").collect()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a28bd5aa",
+   "metadata": {},
+   "source": [
+    "## Fetching the data\n",
+    "\n",
+    "Next we fetch our tables from Snowflake and convert them to pandas DataFrames.\n",
+    "Additionally, we rename most of the table columns to match the names expected by the `neo4j-viz` API."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "deb6353193e2338b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "products_df = (\n",
+    "    session.table(\"products\")\n",
+    "    .to_pandas()\n",
+    "    .rename(columns={\"ID\": \"id\", \"NAME\": \"caption\"})\n",
+    ")\n",
+    "parents_df = (\n",
+    "    session.table(\"parents\")\n",
+    "    .to_pandas()\n",
+    "    .rename(columns={\"SOURCE\": \"source\", \"TARGET\": \"target\", \"TYPE\": \"caption\"})\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "950e0e76cfcaf3d6",
+   "metadata": {},
+   "source": [
+    "## Rendering the visualization\n",
+    "With only one command we can now create a `VisualizationGraph` from these tables representing nodes and relationships.\n",
+    "To enhance the visualization, we also use the `color_nodes` method, which assigns a distinct color to each product category."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2322065c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from neo4j_viz.pandas import from_dfs\n",
+    "\n",
+    "VG = from_dfs(products_df, parents_df)\n",
+    "\n",
+    "# Using the default Neo4j color scheme\n",
+    "VG.color_nodes(\"CATEGORY\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "da39f29deb1569e2",
+   "metadata": {},
+   "source": [
+    "Let us now render our graph, using only default render options."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e8b0f4c6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "VG.render()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ac4c5e35a602ede2",
+   "metadata": {},
+   "source": [
+    "You can scroll to zoom in and out in the visualization, and click-and-drag nodes to move them."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8aa5576fa3b25383",
+   "metadata": {},
+   "source": [
+    "## Cleanup\n",
+    "\n",
+    "Lastly, we clean up the example database we created."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c6eb218e892d420e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "session.sql(\"DROP DATABASE IF EXISTS nvl_example\").collect()\n",
+    "session.close()"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}