"source": "# Snapshot Jaccard Similarity\n\nTo understand the impact of a change, you can compare two snapshots by the Jaccard similarity of the documents they return.\n\nPlease copy this example and customize it for your own purposes!",
"metadata": {}
},
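For reference, the Jaccard similarity of two sets $A$ and $B$ (for example, the distinct doc IDs each snapshot returns) is the size of their intersection divided by the size of their union. The second form below, which gets the union by inclusion-exclusion, is the one the `jaccard_similarity` subroutine in this notebook computes. A score of 1 means identical sets and 0 means no overlap; the second line is a small worked case with made-up doc IDs.

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
$$

$$
A = \{d_1, d_2\},\; B = \{d_2, d_3\} \;\Rightarrow\; J(A, B) = \frac{1}{2 + 2 - 1} = \frac{1}{3}
$$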
{
"cell_type": "markdown",
"source": "### Imports",
"metadata": {}
},
{
"cell_type": "code",
"source": "import pandas as pd\nimport io\nfrom js import fetch",
"metadata": {
"trusted": true
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "markdown",
"source": "## Define the Data You Want",
"metadata": {}
},
{
"cell_type": "code",
"source": "CASE_ID = 6  # Your case ID\nSNAPSHOT_IDS = [1, 2]  # Your two snapshot IDs. Use the Compare Snapshot function in Quepid to look up the IDs of your snapshots.",
"metadata": {
"trusted": true
},
"execution_count": 2,
"outputs": []
},
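Both IDs end up in the export URL that the fetch cell requests. As a minimal sketch, assuming you want that URL for any number of snapshots rather than exactly two, the hypothetical helper `snapshot_export_url` below rebuilds it with the standard library's `urlencode`. The path and the `file_format=basic_snapshot` and `snapshot_id` parameters are copied from the fetch cell; nothing else about the Quepid API is assumed.

```python
from urllib.parse import urlencode


def snapshot_export_url(case_id: int, snapshot_id: int) -> str:
    """Hypothetical helper: relative Quepid export URL for one snapshot.

    The path and query parameters mirror the fetch cell in this notebook.
    """
    query = urlencode({"file_format": "basic_snapshot", "snapshot_id": snapshot_id})
    return f"/api/export/ratings/{case_id}.csv?{query}"


CASE_ID = 6
SNAPSHOT_IDS = [1, 2]

# One URL per snapshot, in the same order as SNAPSHOT_IDS
urls = [snapshot_export_url(CASE_ID, sid) for sid in SNAPSHOT_IDS]
for url in urls:
    print(url)
```

Inside the notebook, each of these URLs could be passed to `fetch` in a loop instead of repeating the request code once per snapshot.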
{
"cell_type": "markdown",
"source": "### Jaccard Subroutines",
"metadata": {}
},
{
"cell_type": "code",
"source": "## Calculation of Jaccard Similarity of List 1 and 2\n\ndef jaccard_similarity(list1, list2):\n    # Identical lists (including two empty lists) are a perfect match\n    if list1 == list2:\n        return 1.0\n    # Jaccard similarity = size of intersection / size of union of the distinct items\n    set1, set2 = set(list1), set(list2)\n    intersection = len(set1 & set2)\n    union = len(set1) + len(set2) - intersection\n    return intersection / union",
"metadata": {
"trusted": true
},
"execution_count": 3,
"outputs": []
},
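A quick way to sanity-check the subroutine is to run the same formula on small hand-made lists. The sketch below is a set-based restatement, named `jaccard_from_lists` here (a hypothetical name) so it doesn't shadow the notebook's function, and the doc IDs are made up. It shows a property worth knowing before comparing snapshots: Jaccard similarity ignores rank order and duplicates, so two snapshots that return the same documents in a different order still score 1.0.

```python
def jaccard_from_lists(list1, list2):
    """Jaccard similarity of the distinct items in two lists."""
    set1, set2 = set(list1), set(list2)
    union = set1 | set2
    if not union:
        # Two empty lists are treated as identical, like the notebook's version
        return 1.0
    return len(set1 & set2) / len(union)


print(jaccard_from_lists(["d1", "d2", "d3"], ["d2", "d3", "d4"]))  # 0.5: 2 shared of 4 distinct
print(jaccard_from_lists(["d1", "d2"], ["d2", "d1"]))  # 1.0: same docs, different order
print(jaccard_from_lists(["d1"], ["d2"]))  # 0.0: nothing in common
```

If rank position matters for your evaluation, a reordering will not show up in this score; that is a property of the metric itself.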
{
"cell_type": "code",
"source": "## Construction of a comparable list from Snapshot blob\n\ndef construct_comparable_list_from_snapshot_blob(snapshot):\n    frames = []\n    for data in snapshot:\n        # One CSV line per record; skip blank lines such as a trailing newline\n        record = [line for line in data.splitlines() if line.strip()]\n        df = pd.DataFrame(record)\n        df[['query','docid','rating']] = df[0].str.split(',', expand=True)\n        ratings_df = df[['query','docid','rating']]\n\n        # Drop the first row, as it's just the column names\n        ratings_mod_df = ratings_df.drop(index=ratings_df.index[0])\n\n        # Remove anything after '?' in the doc ID if using ispy; otherwise this step can be skipped\n        ratings_mod_df['docid'] = ratings_mod_df['docid'].str.split('?').str.get(0)\n        frames.append(ratings_mod_df)\n\n    # Combine the ratings from every blob in the list\n    return pd.concat(frames)",
"source": "### Pull data directly from Quepid's snapshot repository to calculate Jaccard Similarity",
"metadata": {}
},
{
"cell_type": "code",
"source": "# Retrieve the first snapshot (SNAPSHOT_IDS[0]) of case CASE_ID from the Quepid export API\nrating_snapshot_1 = []\nres = await fetch(f'/api/export/ratings/{CASE_ID}.csv?file_format=basic_snapshot&snapshot_id={SNAPSHOT_IDS[0]}')\nrating_snapshot_1.append(await res.text())\n#print(rating_snapshot_1)\n\n# Retrieve the second snapshot (SNAPSHOT_IDS[1]) of the same case\nrating_snapshot_2 = []\nres = await fetch(f'/api/export/ratings/{CASE_ID}.csv?file_format=basic_snapshot&snapshot_id={SNAPSHOT_IDS[1]}')\nrating_snapshot_2.append(await res.text())\n#print(rating_snapshot_2)",
"metadata": {
"tags": [],
"trusted": true
},
"execution_count": 6,
"outputs": []
},
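Once both exports are loaded, the comparison is most informative per query. The following is a minimal sketch, assuming each export is CSV text with a header row followed by `query,docid,rating` records (the layout the parsing subroutine above expects). It uses only the standard library's `csv` module so it runs outside the notebook. The two sample exports and the helper names `docs_per_query` and `per_query_jaccard` are hypothetical.

```python
import csv
import io

# Made-up stand-ins for two basic_snapshot exports: header row, then query,docid,rating
SNAPSHOT_A = "query,docid,rating\nshoes,d1,3\nshoes,d2,\nboots,d7,1\n"
SNAPSHOT_B = "query,docid,rating\nshoes,d2,2\nshoes,d3,\nboots,d7,1\n"


def docs_per_query(csv_text):
    """Map each query to the set of doc IDs listed for it."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    result = {}
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        # Drop anything after '?' in the doc ID, as the notebook's parser does
        query, docid = row[0], row[1].split("?")[0]
        result.setdefault(query, set()).add(docid)
    return result


def per_query_jaccard(csv_a, csv_b):
    """Jaccard similarity of the doc IDs for every query seen in either export."""
    a, b = docs_per_query(csv_a), docs_per_query(csv_b)
    scores = {}
    for query in sorted(set(a) | set(b)):
        docs_a, docs_b = a.get(query, set()), b.get(query, set())
        union = docs_a | docs_b
        scores[query] = len(docs_a & docs_b) / len(union) if union else 1.0
    return scores


scores = per_query_jaccard(SNAPSHOT_A, SNAPSHOT_B)
print(scores)
```

A query that appears in only one of the two exports scores 0.0, which makes it easy to spot when you sort the results.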
{
"cell_type": "markdown",
"source": "### Read and transform data in a dataframe",