@@ -1065,11 +1065,6 @@ What is P(S) for $$S=w_3(A)w_2(C)r_1(A)w_1(B)r_1(C)w_2(A)r_4(A)w_4(D)
- option 1: request exclusive lock
- option 2: upgrade (need to read, but unsure about write)

-
-
-
-
-
## Lecture 13 - Concurrency Control Cont'd
- Concurrency control is judged on a few aspects
@@ -1272,4 +1267,66 @@ GROUP BY G
- caches may or may not pay off as they incur a maintenance cost.

+ ## Lecture 15 - Query Processing ...Again
+
+ - Wong-Youssefi algorithm (INGRES)
+ - optimizes queries with lots of joins
+ - smart exhaustive algorithm for plans
+ - textbook sec 16.6
+ - the INGRES approach is a heuristic for plan enumeration
+ - not typically used in modern databases
+ - an exponential algorithm is OK in practice as long as the exponent is low
+
+
1281
+ - basic DP approach to enumerating plans
1282
+ - for each sub expression $$ op(e_1 e_2\dots e_n $$
1283
+ - recursively compute the best plan and cost for each subexpression $$ e_i $$
1284
+ - for each physical operator $$ op^p $$ of each operator ($$ op $$ )
1285
+ - evaluate the cost of computing the operator abd bite best plan for each subexpression
1286
+ - memo the best physical operator $$ op^4 $$
1287
+
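The steps above can be sketched in a few lines of Python. This is only an illustration, not the textbook's algorithm verbatim: the row counts, the two physical join operators, their cost formulas, and the `rows` size estimate are all made-up assumptions, and `best_plan` is a hypothetical name.

```python
from functools import lru_cache

# Assumed row counts for the base relations (illustrative only).
ROWS = {"R": 1_000, "S": 100, "T": 10}

# Hypothetical physical operators for the logical "join" operator.
# Each maps the input row counts to a made-up cost.
PHYSICAL = {
    "join": {
        "hash_join": lambda l, r: 3 * (l + r),
        "nested_loop_join": lambda l, r: l * r,
    }
}

def rows(expr):
    """Crude size estimate: a join outputs as many rows as its larger input."""
    if isinstance(expr, str):
        return ROWS[expr]
    _, left, right = expr
    return max(rows(left), rows(right))

@lru_cache(maxsize=None)  # the memo: one best (cost, plan) per subexpression
def best_plan(expr):
    if isinstance(expr, str):
        return ROWS[expr], f"scan({expr})"  # base case: a table scan
    op, left, right = expr
    # Recursively compute the best plan and cost for each subexpression.
    lcost, lplan = best_plan(left)
    rcost, rplan = best_plan(right)
    # Try every physical operator on top of those best sub-plans.
    candidates = []
    for name, cost_fn in PHYSICAL[op].items():
        cost = lcost + rcost + cost_fn(rows(left), rows(right))
        candidates.append((cost, f"{name}({lplan}, {rplan})"))
    return min(candidates)  # keep only the cheapest physical operator

cost, plan = best_plan(("join", ("join", "R", "S"), "T"))
print(cost, plan)
```

Expressions are nested tuples so that `lru_cache` can hash them; a shared subexpression is costed once and then reused.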
+ Example
+
+ Given a query
+
+ ``` sql
+ SELECT *
+ FROM R, S, T, U
+ WHERE R.A = S.A AND R.B = S.B AND T.C = U.C
+ ```

+ The algorithm would
+ - enumerate all logical plans
+ - eliminate plans that contain a cartesian product, keeping those built only from joins
+ - physical plans emerge from the above logical plans
+ - join types:
+ - Hash Join
+ - Merge Join
+ - SSM
+ - S_M
+ - _ SM
+ - each logical plan will have a number of physical plans because each
+ operator can have different physical implementations which run at
+ different speeds due to table layouts (partitions, indices, storage,
+ etc.)
+ - the number of plans grows large because of the many possible implementations
+ - memoizing can reduce the need to recalculate costs
+ - solving 3-way subproblems
+ - e.g. $$ R\bowtie S \bowtie T $$
+ - split into smaller subproblems by parenthesizing, e.g.
+ $$ (R\bowtie S) \bowtie T $$ or $$ R \bowtie (S\bowtie T) $$
+
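To get a feel for how quickly the plan space grows, the sketch below enumerates every way to parenthesize a fixed left-to-right sequence of relations. It is an illustration under simplifying assumptions: the relation order is fixed (no commutativity), cartesian products are not filtered out, and `join_trees` and `show` are hypothetical helper names.

```python
def join_trees(relations):
    """Yield every binary join tree over the relations in their given order."""
    if len(relations) == 1:
        yield relations[0]
        return
    # Choose where to split, then combine every left tree with every right tree.
    for split in range(1, len(relations)):
        for left in join_trees(relations[:split]):
            for right in join_trees(relations[split:]):
                yield (left, right)

def show(tree):
    """Render a join tree as a parenthesized string."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"({show(left)} JOIN {show(right)})"

three_way = [show(t) for t in join_trees(["R", "S", "T"])]
four_way = list(join_trees(["R", "S", "T", "U"]))
print(three_way)
print(len(four_way))
```

For three relations this yields the two parenthesizations from the notes. Each tree over n relations has n - 1 joins, so with k physical join implementations per join there are k^(n-1) physical plans per logical tree, which is why memoizing sub-plan costs matters.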
+ Local suboptimality of the basic approach, and the Selinger improvement
+
+ - basic dynamic programming may lead to globally suboptimal solutions
+ - a solution can be good for one operation, but not for the whole query
+ - a suboptimal plan for $$ e_1 $$ may lead to the optimal plan for the entire expression
+ $$ op(e_1 e_2\dots e_n) $$
+ - consider $$ e_1 \bowtie e_2 $$
+ - the optimal computation of $$ e_1 $$ produces an unsorted result
+ - the optimal join is a sort-merge join on A
+ - it could have paid off to consider the suboptimal computation of
+ $$ e_1 $$ that produces results sorted on A
+ - Selinger improvement
+ - memoize any plan that also produces an ordering of the results which may be of
+ use to ancestor operators
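
The difference between the two memo strategies can be sketched with a toy example. Every number here is invented for illustration, and the names (`basic_memo`, `selinger_memo`, `merge_join_cost`, `SORT_COST`, `MERGE_COST`) are hypothetical. The basic memo keeps one cheapest plan per subexpression. The Selinger-style memo keeps the cheapest plan per output ordering.

```python
# Candidate plans as (description, cost, output sort order); all values assumed.
E1_PLANS = [
    ("e1, cheapest, unsorted output", 100, None),
    ("e1, pricier, output sorted on A", 130, "A"),
]
E2_PLANS = [("e2, output sorted on A", 80, "A")]

SORT_COST = 60   # assumed cost of sorting one input on A
MERGE_COST = 20  # assumed cost of the merge step itself

def merge_join_cost(left, right):
    """Sort-merge join on A: pay SORT_COST for each input not already sorted on A."""
    cost = left[1] + right[1] + MERGE_COST
    for plan in (left, right):
        if plan[2] != "A":
            cost += SORT_COST
    return cost

def basic_memo(plans):
    """Basic DP: remember only the single cheapest plan."""
    return [min(plans, key=lambda p: p[1])]

def selinger_memo(plans):
    """Selinger-style: remember the cheapest plan for each output ordering."""
    best = {}
    for plan in plans:
        order = plan[2]
        if order not in best or plan[1] < best[order][1]:
            best[order] = plan
    return list(best.values())

def best_join(e1_memo, e2_memo):
    """Cheapest sort-merge join over all remembered sub-plan combinations."""
    return min(merge_join_cost(l, r) for l in e1_memo for r in e2_memo)

basic_cost = best_join(basic_memo(E1_PLANS), basic_memo(E2_PLANS))
selinger_cost = best_join(selinger_memo(E1_PLANS), selinger_memo(E2_PLANS))
print(basic_cost, selinger_cost)
```

With these assumed costs, the basic memo discards the sorted plan for `e1` and has to pay for a sort later, while the Selinger-style memo retains it and finds the cheaper overall join.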