
Commit 87473ed

Initial commit

0 parents, commit 87473ed

11 files changed: +736, -0 lines changed

.github/workflows/main.yml

+17
@@ -0,0 +1,17 @@
name: Main
on: push
jobs:
  jupyter-lite-build-release:
    runs-on: ubuntu-latest
    name: Package and release the Jupyter Lite app
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Jupyter Lite Build
        uses: ./
        id: jupyter-lite-build
      - name: Release
        if: startsWith(github.ref, 'refs/tags/')
        uses: softprops/action-gh-release@v1
        with:
          files: jupyter-lite-build.tgz

.gitignore

+28
@@ -0,0 +1,28 @@
# VIM swp
*.swp

# Project files
.idea/

# Mac
.DS_Store

# Ignore all logfiles and tempfiles.
/log/*
!/log/.keep
/tmp/*
!/tmp/pids
/tmp/pids/*
!/tmp/pids/.keep

# Ignore local env file
/.envrc
.env

# Jupyterlite
public/notebooks
public/notebooks-*
.jupyterlite.doit.db

# Built artifact
jupyter-lite-build.tgz

Dockerfile

+20
@@ -0,0 +1,20 @@
FROM python:3.11
RUN pip install --no-cache-dir --upgrade pip

COPY jupyterlite /build/

RUN pip install --no-cache-dir -r /build/requirements.txt && rm -f /build/requirements.txt

# Use the target dir, i.e. a mounted path from which the built artifact can be retrieved.
# Use the GitHub workspace as the default.
# For a local run use e.g. /dist with `docker run -it --rm -e TARGET_DIR=/dist -v "$(pwd)":/dist $(docker build -q .)`
ENV TARGET_DIR=${GITHUB_WORKSPACE:-/github/workspace}

# Build the JupyterLite web app. The --pyodide option is passed for offline availability
# (see https://jupyterlite.readthedocs.io/en/latest/howto/configure/advanced/offline.html); the
# Pyodide version is taken from https://github.com/jupyterlite/pyodide-kernel/blob/main/packages/pyodide-kernel/package.json
CMD jupyter lite build \
    --lite-dir /build \
    --pyodide https://github.com/pyodide/pyodide/releases/download/0.23.2/pyodide-0.23.2.tar.bz2 \
    --output-dir notebooks \
    && tar -czf ${TARGET_DIR}/jupyter-lite-build.tgz notebooks
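A quick way to inspect the artifact this CMD produces is to unpack it and serve the static site locally. The following is a minimal sketch only, assuming the container was run with TARGET_DIR mounted to the current directory so jupyter-lite-build.tgz is present, and that the "notebooks" folder name matches the --output-dir above:

# sketch: unpack jupyter-lite-build.tgz and serve the built JupyterLite site locally
import tarfile
import http.server
import socketserver
import os

# extract the tarball produced by the Dockerfile's CMD; creates ./notebooks
with tarfile.open("jupyter-lite-build.tgz", "r:gz") as tar:
    tar.extractall(".")

# serve the static site at http://localhost:8000/
os.chdir("notebooks")
with socketserver.TCPServer(("", 8000), http.server.SimpleHTTPRequestHandler) as httpd:
    print("Serving JupyterLite at http://localhost:8000/")
    httpd.serve_forever()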

action.yml

+5
@@ -0,0 +1,5 @@
name: 'Jupyter Lite Docker Action'
description: 'Build Jupyter Lite App'
runs:
  using: 'docker'
  image: 'Dockerfile'

jupyterlite/files/README.md

+5
@@ -0,0 +1,5 @@
# Welcome to the JupyterLite-based notebooks

Learn more at https://jupyterlite.readthedocs.io/en/latest/

* ./examples/Snapshot Compare.ipynb is an example of measuring baseline relevancy over time.

jupyterlite/files/examples/Snapshot Compare With Charts.ipynb

+121
Large diffs are not rendered by default.

jupyterlite/files/examples/Snapshot Compare.ipynb

+142
Large diffs are not rendered by default.
@@ -0,0 +1,199 @@
{
  "metadata": {
    "language_info": {
      "codemirror_mode": {
        "name": "python",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8"
    },
    "kernelspec": {
      "name": "python",
      "display_name": "Python (Pyodide)",
      "language": "python"
    }
  },
  "nbformat_minor": 4,
  "nbformat": 4,
  "cells": [
    {
      "cell_type": "markdown",
      "source": "# Snapshot Jaccard Similarity\n\nTo understand the impact of changes, you can compare the Jaccard Similarity of snapshots.\n\nPlease copy this example and customize it for your own purposes!",
      "metadata": {}
    },
    {
      "cell_type": "markdown",
      "source": "### Imports",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": "import pandas as pd\nimport io\nfrom js import fetch",
      "metadata": {
        "trusted": true
      },
      "execution_count": 1,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": "## Define the Data You Want",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": "CASE_ID = 6 # Your Case\nSNAPSHOT_IDS = [1,2] # Your Snapshots. Use the Compare Snapshot function in Quepid to see what the specific ID's are of your snapshots.",
      "metadata": {
        "trusted": true
      },
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": "### Jaccard Subroutines",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": "## Calculation of Jaccard Similarity of List 1 and 2\n\ndef jaccard_similarity(list1, list2):\n    print(list1, list2)\n    if list1 == list2:\n        print('the lists are same')\n        return float(1.0)\n    intersection = len(list(set(list1).intersection(list2)))\n    union = (len(set(list1)) + len(set(list2))) - intersection\n    return float(intersection) / union",
      "metadata": {
        "trusted": true
      },
      "execution_count": 3,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "## Construction of a comparable list from Snapshot blob\n\ndef construct_comparable_list_from_snapshot_blob(snapshot):\n    for data in snapshot:\n        record = data.split(\"\\n\")\n        #print(record)\n        df = pd.DataFrame(record)\n        df[['query','docid','rating']] = df[0].str.split(',',expand=True)\n        ratings_df = df[['query','docid','rating']]\n\n        # Drop first row as it's just column names\n        ratings_mod_df = ratings_df.drop(index=ratings_df.index[0])\n\n        # Remove '?' if using ispy else the next step can be ignored\n        ratings_mod_df['docid'] = ratings_mod_df['docid'].str.split('?').str.get(0)\n        #print(ratings_mod_df.head(10))\n\n    return ratings_mod_df",
      "metadata": {
        "trusted": true
      },
      "execution_count": 4,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "## Subroutine for calculating Jaccard Similarity between 2 Snapshots\n\ndef jaccard_similarity(A, B):\n    # Compute Jaccard Similarity\n    nominator = set(A).intersection(set(B))\n    denominator = set(A).union(set(B))\n    Jacc_similarity = len(nominator)/len(denominator)\n    #print(Jacc_similarity)\n    return (Jacc_similarity)",
      "metadata": {
        "trusted": true
      },
      "execution_count": 5,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": "### Pull data directly from Quepid's snapshot repository to calculate Jaccard Similarity",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": "# Retrieve from Quepid API from Case id - 6 and Snapshot id - 1\nrating_snapshot_1 = []\nres = await fetch(f'/api/export/ratings/{CASE_ID}.csv?file_format=basic_snapshot&snapshot_id={SNAPSHOT_IDS[0]}')\nrating_snapshot_1.append(await res.text())\n#print(rating_snapshot_1)\n\n# Retrieve from Quepid API from Case id - 6 and Snapshot id - 2\nrating_snapshot_2 = []\nres = await fetch(f'/api/export/ratings/{CASE_ID}.csv?file_format=basic_snapshot&snapshot_id={SNAPSHOT_IDS[1]}')\nrating_snapshot_2.append(await res.text())\n#print(rating_snapshot_2)",
      "metadata": {
        "tags": [],
        "trusted": true
      },
      "execution_count": 6,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": "### Read and transform data in a dataframe",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": "df1 = construct_comparable_list_from_snapshot_blob(rating_snapshot_1)\ndf2 = construct_comparable_list_from_snapshot_blob(rating_snapshot_2)\ndf1 = df1.groupby('query')['docid'].apply(list).reset_index(name=\"results\")\ndf2 = df2.groupby('query')['docid'].apply(list).reset_index(name=\"results\")\n\ndf_jaccard = df1[['query']].copy()\ndf_jaccard['baseline_results'] = df1['results']\ndf_jaccard['comparison_results'] = df2['results']\ndf_jaccard['baseline_count'] = df_jaccard.apply(lambda row: len(row.baseline_results), axis = 1)\ndf_jaccard['comparison_count'] = df_jaccard.apply(lambda row: len(row.comparison_results), axis = 1)",
      "metadata": {
        "tags": [],
        "trusted": true
      },
      "execution_count": 7,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": "### Add column with jaccard similarity",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": "df_jaccard['jaccard_similarity'] = df_jaccard.apply(lambda row: jaccard_similarity(row.baseline_results, row.comparison_results), axis = 1)",
      "metadata": {
        "trusted": true
      },
      "execution_count": 8,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "df_jaccard.head(10)",
      "metadata": {
        "trusted": true
      },
      "execution_count": 9,
      "outputs": [
        {
          "execution_count": 9,
          "output_type": "execute_result",
          "data": {
            "text/plain": " query \\\n0 \n1 movie about a boxer who climbs \n2 star trek \n3 star wars \n\n baseline_results \\\n0 [None] \n1 [45317, 826, 46838, 683716, 769, 570731, 680, ... \n2 [193, 199, 188927, 200, 13475, 152, 201, 154, ... \n3 [11, 12180, 181808, 330459, 348350, 140607, 18... \n\n comparison_results baseline_count \\\n0 [None] 1 \n1 [45317, 826, 46838, 683716, 769, 570731, 680, ... 10 \n2 [13363, 193, 199, 154, 152, 174, 157, 168, 188... 10 \n3 [12180, 322506, 85, 1895, 18046, 11, 330459, 1... 10 \n\n comparison_count jaccard_similarity \n0 1 1.000000 \n1 10 1.000000 \n2 10 0.666667 \n3 10 0.538462 ",
            "text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>query</th>\n <th>baseline_results</th>\n <th>comparison_results</th>\n <th>baseline_count</th>\n <th>comparison_count</th>\n <th>jaccard_similarity</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>[None]</td>\n <td>[None]</td>\n <td>1</td>\n <td>1</td>\n <td>1.000000</td>\n </tr>\n <tr>\n <th>1</th>\n <td>movie about a boxer who climbs</td>\n <td>[45317, 826, 46838, 683716, 769, 570731, 680, ...</td>\n <td>[45317, 826, 46838, 683716, 769, 570731, 680, ...</td>\n <td>10</td>\n <td>10</td>\n <td>1.000000</td>\n </tr>\n <tr>\n <th>2</th>\n <td>star trek</td>\n <td>[193, 199, 188927, 200, 13475, 152, 201, 154, ...</td>\n <td>[13363, 193, 199, 154, 152, 174, 157, 168, 188...</td>\n <td>10</td>\n <td>10</td>\n <td>0.666667</td>\n </tr>\n <tr>\n <th>3</th>\n <td>star wars</td>\n <td>[11, 12180, 181808, 330459, 348350, 140607, 18...</td>\n <td>[12180, 322506, 85, 1895, 18046, 11, 330459, 1...</td>\n <td>10</td>\n <td>10</td>\n <td>0.538462</td>\n </tr>\n </tbody>\n</table>\n</div>"
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": "### Export data as CSV for reporting and sharing purpose",
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": "df_jaccard.to_csv('jaccard_similarity_results.csv', encoding='utf-8', index=False)",
      "metadata": {
        "trusted": true
      },
      "execution_count": 10,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "df_jaccard['jaccard_similarity'].mean()",
      "metadata": {
        "trusted": true
      },
      "execution_count": 11,
      "outputs": [
        {
          "execution_count": 11,
          "output_type": "execute_result",
          "data": {
            "text/plain": "0.8012820512820512"
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "code",
      "source": "",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": "",
      "metadata": {},
      "execution_count": null,
      "outputs": []
    }
  ]
}
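For readers skimming the diff, the notebook above fetches two snapshot exports of the same case from Quepid's /api/export/ratings endpoint (which requires Pyodide's js.fetch) and then computes a per-query Jaccard similarity between the baseline and comparison result lists. The following is a minimal standalone sketch of just that per-query calculation; the sample queries and document ids are invented for illustration, not taken from the commit:

# sketch: per-query Jaccard similarity between a baseline and a comparison result set
import pandas as pd

def jaccard_similarity(a, b):
    # |A intersect B| / |A union B| over the sets of document ids returned for one query
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# hypothetical result lists standing in for two snapshot exports of the same case
baseline = pd.DataFrame({
    "query": ["star trek", "star wars"],
    "results": [[193, 199, 200], [11, 12180, 181808]],
})
comparison = pd.DataFrame({
    "query": ["star trek", "star wars"],
    "results": [[193, 199, 154], [12180, 11, 330459]],
})

# line the two snapshots up by query, then score each pair of result lists
df = baseline.rename(columns={"results": "baseline_results"}).merge(
    comparison.rename(columns={"results": "comparison_results"}), on="query"
)
df["jaccard_similarity"] = df.apply(
    lambda row: jaccard_similarity(row.baseline_results, row.comparison_results), axis=1
)
print(df[["query", "jaccard_similarity"]])
print("mean:", df["jaccard_similarity"].mean())

Averaging the per-query scores, as the last cell of the notebook does, gives a single number for how much the result sets drifted between the two snapshots.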

0 commit comments
