Commit 2077719

add old cse232 db notes from final lecture
1 parent 1f039e9 commit 2077719

1 file changed: +62 -5 lines changed

education/grad/cse232/index.md

@@ -1065,11 +1065,6 @@ What is P(S) for $$S=w_3(A)w_2(C)r_1(A)w_1(B)r_1(C)w_2(A)r_4(A)w_4(D)
- option 1: request exclusive lock
- option 2: upgrade (need to read, but unsure about write)

## Lecture 13 - Concurrency Control Cont'd

- Concurrency Control judged on a few aspects
@@ -1272,4 +1267,66 @@ GROUP BY G
- caches may or may not pay off as they incur a maintenance cost.

## Lecture 15 - Query Processing ...Again

- Wong-Youssefi algorithm (INGRES)
- optimizes queries with lots of joins
- smart exhaustive algorithm for plans
- textbook sec 16.6
- the INGRES algorithm is a heuristic for plan enumeration
- not typically in use in modern databases
- an exponential algorithm is OK in practice as long as the exponent is low
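
To make the "exponent is low" remark concrete, here is a small sketch that counts how many join trees an exhaustive enumerator would face. The function names are illustrative, not from the lecture; the formulas are the standard counts of left-deep orders ($$n!$$) and ordered bushy binary trees ($$(2(n-1))!/(n-1)!$$) over $$n$$ relations.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Number of left-deep join orders over n relations: n!."""
    return factorial(n)


def bushy_trees(n: int) -> int:
    """Number of ordered binary join trees over n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


# The counts stay manageable for a handful of relations and explode soon after.
for n in (2, 4, 8, 12):
    print(n, left_deep_orders(n), bushy_trees(n))
```

For the four-relation query in these notes, that is 24 left-deep orders and 120 bushy trees before physical operators are even considered.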

- basic DP approach to enumerating plans
  - for each subexpression $$op(e_1 e_2 \dots e_n)$$
    - recursively compute the best plan and cost for each subexpression $$e_i$$
    - for each physical operator $$op^p$$ of the logical operator $$op$$
      - evaluate the cost of computing the operator plus the best plan for each subexpression
    - memo the best physical operator $$op^p$$
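
The recursion above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the `Expr` class, the `PHYSICAL` table, and its fixed per-operator costs are all hypothetical (a real optimizer derives costs from statistics), and `functools.lru_cache` stands in for the memo.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Expr:
    op: str                # logical operator, e.g. "scan" or "join"
    children: tuple = ()   # subexpressions e_1 ... e_n
    name: str = ""         # table name, used by scans only


# Hypothetical physical operators with made-up local costs.
PHYSICAL = {
    "scan": {"seq_scan": 10.0},
    "join": {"hash_join": 5.0, "merge_join": 8.0, "nested_loop": 20.0},
}


@lru_cache(maxsize=None)  # the memo: each subexpression is costed once
def best_plan(e: Expr):
    """Return (cost, plan) for the cheapest physical plan of e."""
    # Recursively compute the best plan and cost for each subexpression.
    child_results = [best_plan(c) for c in e.children]
    child_cost = sum(cost for cost, _ in child_results)
    child_plans = tuple(plan for _, plan in child_results)
    best = None
    # Try every physical operator of the logical operator and keep the cheapest.
    for phys, local_cost in PHYSICAL[e.op].items():
        total = local_cost + child_cost
        if best is None or total < best[0]:
            best = (total, (phys, e.name, child_plans))
    return best


R, S, T = (Expr("scan", name=n) for n in "RST")
query = Expr("join", (Expr("join", (R, S)), T))
cost, plan = best_plan(query)
print(cost, plan[0])
```

Because each subexpression keeps only its single cheapest plan, this sketch has exactly the local-optimality weakness discussed later in these notes.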

Example

Given a query

```sql
SELECT *
FROM R, S, T, U
WHERE R.A = S.A AND R.B = S.B AND T.C = U.C
```

The algorithm would

- enumerate all logical plans
- eliminate all plans with a cartesian product, keeping plans with only joins
- derive physical plans from the remaining logical plans
  - join types:
    - Hash join
    - Merge join
    - `SSM`
    - `S_M`
    - `_SM`
  - each logical plan will have a number of physical plans, because each operator can have different physical implementations that run at different speeds depending on table layout (partitions, indices, storage, etc.)
  - the number of plans grows large because of the many implementations
  - memoizing reduces the need to recalculate costs
- solving 3-way subproblems
  - e.g. $$R\bowtie S \bowtie T$$
  - split into smaller problems by parenthesizing: $$(R\bowtie S) \bowtie T$$ or $$R \bowtie (S\bowtie T)$$
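
As a rough illustration of the 3-way subproblem, the sketch below enumerates every parenthesization of $$R\bowtie S \bowtie T$$, memoizes the plans for each subset of relations, and drops any split that would be a cartesian product. The join graph in `EDGES` (R joins S, S joins T) is an assumption made for the example, and `connected` and `plans` are hypothetical helper names.

```python
from itertools import combinations

# Assumed join graph (hypothetical): predicates exist between R-S and S-T only.
EDGES = {frozenset("RS"), frozenset("ST")}


def connected(left, right):
    """True if some join predicate links a relation in left to one in right."""
    return any(frozenset((a, b)) in EDGES for a in left for b in right)


def plans(rels, memo=None):
    """All join trees over rels (a frozenset) that never form a cartesian product."""
    memo = {} if memo is None else memo
    if rels in memo:                      # reuse the answer for a repeated subproblem
        return memo[rels]
    if len(rels) == 1:
        result = [next(iter(rels))]       # a single relation is its own plan
    else:
        result = []
        items = sorted(rels)
        for k in range(1, len(items)):
            for left in combinations(items, k):
                left = frozenset(left)
                right = rels - left
                if not connected(left, right):
                    continue              # this split would be a cartesian product
                for lp in plans(left, memo):
                    for rp in plans(right, memo):
                        result.append((lp, rp))
    memo[rels] = result
    return result


for p in plans(frozenset("RST")):
    print(p)
```

Under this assumed join graph, 8 of the 12 possible trees survive; the four that start from `R` and `T` together are pruned because no predicate links them.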

Local suboptimality of the basic approach, and the Selinger improvement

- basic dynamic programming may lead to globally suboptimal solutions
  - a solution may be good for one operation, but not for the whole query
- a suboptimal plan for $$e_1$$ may lead to the optimal plan for the entire operation $$op(e_1 e_2 \dots e_n)$$
  - consider $$e_1 \bowtie e_2$$
    - the optimal computation of $$e_1$$ produces an unsorted result
    - the optimal join is a sort-merge join on A
    - it could have paid off to consider the suboptimal computation of $$e_1$$ that produces results sorted on A
- Selinger improvement
  - also memo any plan that produces an ordering of the results that may be of use to ancestor operators
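
A toy comparison of the two pruning rules, assuming made-up costs: the plan names, `SORT_COST`, `MERGE_COST`, and `HASH_COST` below are all hypothetical and chosen only so that the effect is visible. The basic rule keeps one plan per subexpression; the Selinger-style rule keeps the cheapest plan per output ordering.

```python
# Hypothetical candidate plans: (name, cost, output order or None if unsorted).
E1_PLANS = [("e1_unsorted", 100, None), ("e1_sorted_on_A", 120, "A")]
E2_PLANS = [("e2_sorted_on_A", 50, "A")]

SORT_COST, MERGE_COST, HASH_COST = 40, 10, 35   # made-up operator costs


def prune_basic(plans):
    """Basic DP: keep only the single cheapest plan."""
    return [min(plans, key=lambda p: p[1])]


def prune_selinger(plans):
    """Selinger-style: keep the cheapest plan for each output ordering."""
    best = {}
    for p in plans:
        if p[2] not in best or p[1] < best[p[2]][1]:
            best[p[2]] = p
    return list(best.values())


def best_join(left_plans, right_plans):
    """Cheapest join on A over the retained subplans, as (cost, description)."""
    candidates = []
    for ln, lc, lo in left_plans:
        for rn, rc, ro in right_plans:
            candidates.append((lc + rc + HASH_COST, f"hash_join({ln}, {rn})"))
            # A merge join must first sort any input not already ordered on A.
            sorts = (lo != "A") * SORT_COST + (ro != "A") * SORT_COST
            candidates.append((lc + rc + sorts + MERGE_COST, f"merge_join({ln}, {rn})"))
    return min(candidates)


print(best_join(prune_basic(E1_PLANS), prune_basic(E2_PLANS)))
print(best_join(prune_selinger(E1_PLANS), prune_selinger(E2_PLANS)))
```

With these numbers the basic rule settles on a hash join at cost 185, while keeping the sorted (locally more expensive) plan for $$e_1$$ allows a merge join at cost 180.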
