(Final Submission): 987 mnc sketch implemenation by justaprog · Pull Request #1000 · daphne-project/daphne

justaprog · 2026-01-18T21:46:46Z

Issues?
#987
What?
Based on the paper: Johanna Sommer, Matthias Boehm, Alexandre V. Evfimievski, Berthold Reinwald, Peter J. Haas (2019). "MNC: Structure-Exploiting Sparsity Estimation for Matrix Expressions". SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data.

MNC Sparsity Estimator Project (Large Scale Data Engineering)
This section documents the environment and results for the MNC Sparsity Estimator.

1. Experimental Setting
All experiments were conducted using a Docker container to ensure a consistent environment for library linking and reproducibility.
data.zip

Hardware: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz with 16.0 GB RAM.
OS/Software: Windows 11 Home running through WSL (Windows Subsystem for Linux).
System Type: 64-bit operating system, x64-based processor.
2. Metrics
Dataset: We used real matrices from the SuiteSparse Matrix Collection (e.g., gre__115.mtx) instead of synthetic random data to better reflect real-world distributions.
Metrics: We measured Accuracy Error (the difference between the MNC estimate and actual sparsity) and Decision Correctness.
Variance: Because MNC is a probabilistic sketching method, small variations in results between runs are expected and were managed by using mean values for reporting.
3. Key Results
Decision Logic: Our implementation uses a 0.25 density threshold. It correctly triggers SPARSE storage for workloads measured below this value and DENSE for those above.
Reliability: The prototype passed 2,452 unit tests in the DAPHNE Catch2 suite, confirming it integrates safely with the existing system kernels.
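
The two metrics and the threshold rule above can be sketched in a few lines of C++. The names `Repr`, `chooseRepresentation`, `accuracyError`, and `decisionCorrect` are hypothetical and not part of DAPHNE. Two assumptions are made: the accuracy error is taken as an absolute difference, and a density exactly equal to 0.25 is treated as DENSE, since the description only covers values below and above the threshold.

```cpp
#include <cmath>

// Hypothetical names for illustration; not DAPHNE's actual API.
enum class Repr { Sparse, Dense };

// Threshold rule: density below the threshold -> SPARSE, otherwise DENSE.
// Assumption: a density exactly at the threshold counts as DENSE.
inline Repr chooseRepresentation(double density, double threshold = 0.25) {
    return density < threshold ? Repr::Sparse : Repr::Dense;
}

// Accuracy error, assumed here to be the absolute difference between the
// estimated and the actual sparsity.
inline double accuracyError(double estimated, double actual) {
    return std::fabs(estimated - actual);
}

// Decision correctness: the estimate selects the same representation
// that the actual sparsity would have selected.
inline bool decisionCorrect(double estimated, double actual, double threshold = 0.25) {
    return chooseRepresentation(estimated, threshold) == chooseRepresentation(actual, threshold);
}
```

Under this reading, an estimate of 0.20 for an actual sparsity of 0.24 is a correct decision despite its error, while 0.20 versus 0.30 is not, because the two values fall on different sides of the threshold.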

This PR contains:

feat: implementation of the MNC sketch data structure
feat: builder functions to build the sketch from dense and CSR matrices
feat: sparsity estimator for the result of a matrix multiplication
feat: sketch propagation for matrix multiplication
feat: integrate the MNC sketch into the DAPHNE compiler as a data property of the matrix data type
feat: support MatMulOp, FillOp, RandMatrixOp, SeqOp, binary elementwise arithmetic, TransposeOp, ReshapeOp, MatrixConstantOp, and ReadOp to infer sparsity during compilation
test: integration tests via DaphneDSL scripts/test to ensure that all ops work properly within the DAPHNE compiler
test: unit test cases
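
For readers unfamiliar with the data structure, the following is a minimal sketch of the core of an MNC sketch and a builder for dense input. It is a simplified illustration, not the code in this PR: `SimpleMncSketch` and `buildFromDense` are hypothetical names, the input is assumed to be a row-major buffer, and the extended histograms and the diagonal flag are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified, hypothetical sketch type: dimensions, per-row and per-column
// non-zero counts, and a few summary statistics derived from them.
struct SimpleMncSketch {
    std::size_t m = 0, n = 0;              // number of rows / columns
    std::vector<std::size_t> hr, hc;       // non-zeros per row / per column
    std::size_t maxHr = 0, maxHc = 0;      // largest per-row / per-column count
    std::size_t nnzRows = 0, nnzCols = 0;  // rows / columns with at least one non-zero
};

// Builds the sketch from a row-major dense buffer holding m * n values.
inline SimpleMncSketch buildFromDense(const std::vector<double> &vals, std::size_t m, std::size_t n) {
    SimpleMncSketch h;
    h.m = m;
    h.n = n;
    h.hr.assign(m, 0);
    h.hc.assign(n, 0);
    // One pass over the data fills both count vectors.
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (vals[i * n + j] != 0.0) {
                ++h.hr[i];
                ++h.hc[j];
            }
    // Summary statistics are derived from the count vectors.
    for (std::size_t c : h.hr) {
        h.maxHr = std::max(h.maxHr, c);
        if (c > 0)
            ++h.nnzRows;
    }
    for (std::size_t c : h.hc) {
        h.maxHc = std::max(h.maxHc, c);
        if (c > 0)
            ++h.nnzCols;
    }
    return h;
}
```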

Update after final submission:

Current status of the full test run:

- Problems: statuscode and --select-matrix-repr

Update Features

feat: support rbind, cbind, and diagMatrix ops
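
Propagating row and column counts through rbind and cbind is exact, which the following minimal sketch illustrates. It only covers the count vectors; `Counts`, `propagateRbindSketch`, and `propagateCbindSketch` are hypothetical names and do not mirror the signatures of the functions added in this PR.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal sketch: dimensions plus per-row / per-column counts.
struct Counts {
    std::size_t m, n;
    std::vector<std::size_t> hr, hc;
};

// rbind stacks B below A: row counts are concatenated, column counts add up.
inline Counts propagateRbindSketch(const Counts &a, const Counts &b) {
    if (a.n != b.n)
        throw std::invalid_argument("rbind requires the same number of columns");
    Counts c{a.m + b.m, a.n, a.hr, std::vector<std::size_t>(a.n, 0)};
    c.hr.insert(c.hr.end(), b.hr.begin(), b.hr.end());
    for (std::size_t j = 0; j < c.n; ++j)
        c.hc[j] = a.hc[j] + b.hc[j];
    return c;
}

// cbind places B to the right of A: row counts add up, column counts are concatenated.
inline Counts propagateCbindSketch(const Counts &a, const Counts &b) {
    if (a.m != b.m)
        throw std::invalid_argument("cbind requires the same number of rows");
    Counts c{a.m, a.n + b.n, std::vector<std::size_t>(a.m, 0), a.hc};
    c.hc.insert(c.hc.end(), b.hc.begin(), b.hc.end());
    for (std::size_t i = 0; i < c.m; ++i)
        c.hr[i] = a.hr[i] + b.hr[i];
    return c;
}
```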

Experiments

Visualizations: https://github.com/justaprog/daphne-experiment-visualizations
Datasets: bcsstk02 (66,66), bp__1000 (822,822), bp_1200 (822,822), football (35,35), G10 (800,800),
G11 (800,800), G14 (800,800), G22 (2000,2000), G27 (2000,2000), gre__115 (115,115), gre__185 (185,185),
gre__343 (343,343), gre__512 (512,512), str__200 (363,363), str__400 (363,363), lp_scagr7 (129,185),
football__115 (115,115), dw256A (512,512), dwt__66 (66,66)
Scripts:
mncsketch_propa folder: scripts to ensure inferring sparsity via mncsketch works well
experiment folder: scripts to run all experiments
experiment/result folder: experiments results saved as txt files
data folder: data.zip

pdamme

Thanks for this PR, @justaprog, @laakhdher, et al. Adding support for MNC sketches would make a great contribution to sparsity estimation in DAPHNE. Overall, your code is already in a good initial state; the stand-alone implementation of the MNC sketch and related algorithms as described in the paper looks good to me. Nevertheless, substantial work is still required before we can merge this PR. Most importantly, the current MNC implementation needs to be integrated with the rest of the DAPHNE system, so that we can benefit from MNC when running a DaphneDSL script.

Required points: (need to be addressed before we can merge this PR)

Finish the implementation of the remaining parts of MNC as described in the paper.
1. Implement the density map estimator over an MNC sketch (your function Edm() is currently a stub).
2. Implement the propagation of the MNC sketch over reorganization ops and elementwise binary ops as described in Section 4.2 of the paper.
Integrate your implementation of MNC with the rest of the DAPHNE system.
1. Make an MNC sketch an optional data property of a matrix (see the developer docs on adding new data properties).
2. Invoke the MNC-based sparsity estimation and MNC sketch propagation during the InferencePass (implement the necessary MLIR interfaces or traits, see the developer docs on data properties again).
3. Ensure that the MNC sketch of inputs of the data flow graph can be known. Such inputs could be:
  - Matrices read from files (most important).
    - MNC sketch as an optional part of the file meta data, plus some way of creating that file meta data automatically.
    - MNC sketch creation on-the-fly (optional).
  - Matrices created by various source ops (e.g., fill()/FillOp, seq()/SeqOp, rand()/RandMatrixOp, …); implement the inference interface for the MNC sketch for those ops.
  - Matrix literals (MatrixConstantOp).
Please fix the code style issues to make the CI checks pass (see the contribution guidelines).
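
For source ops, the counts of the result are often known without looking at any data. As a minimal sketch for the fill() case, assuming the result is an m x n matrix in which every cell holds the same value: `FillCounts` and `sketchForFill` are hypothetical names and do not represent the DAPHNE inference interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal sketch: dimensions plus per-row / per-column counts.
struct FillCounts {
    std::size_t m, n;
    std::vector<std::size_t> hr, hc;
};

// For a matrix filled with one value, the counts follow from the value alone:
// a zero fill has no non-zeros, any other fill makes every cell a non-zero.
inline FillCounts sketchForFill(double value, std::size_t m, std::size_t n) {
    const bool nonZero = (value != 0.0);
    const std::size_t perRow = nonZero ? n : 0; // non-zeros in each row
    const std::size_t perCol = nonZero ? m : 0; // non-zeros in each column
    return FillCounts{m, n, std::vector<std::size_t>(m, perRow), std::vector<std::size_t>(n, perCol)};
}
```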

Optional points: (recommended, but not critical for merging)

Apply some little code improvements (see my comments on individual code lines).
Use the helper function genGivenVals() to shorten the code of your unit test cases. Currently, you initialize CSR matrices by hardcoding the values, colIdxs, and rowOffsets arrays, which leads to long and hard-to-read test code. Instead, you could simply pass the (dense) values array to genGivenVals() and get its representation as a CSRMatrix or DenseMatrix. (For small test matrices, it doesn't hurt if you explicitly write the zeros in the code.) You can take inspiration from many existing unit test cases.
Use the MNC sketch for more accurate, structure-aware sparsity estimation of additional ops, e.g., SliceRowOp/SliceColOp, ExtractRowOp/ExtractColOp, etc.
Implement the MNC sketch propagation for additional ops, e.g., elementwise unary ops, etc.
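
The readability argument for generating test inputs from dense values can be illustrated with a small stand-alone helper. This is not DAPHNE's genGivenVals(), whose exact signature should be taken from the existing unit tests; `CsrArrays` and `denseToCsr` are hypothetical names used only to show that the three CSR arrays follow mechanically from the dense values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the three CSR arrays.
struct CsrArrays {
    std::vector<double> values;
    std::vector<std::size_t> colIdxs;
    std::vector<std::size_t> rowOffsets;
};

// Converts row-major dense values into CSR arrays; the number of columns is
// derived from the total number of values and the number of rows.
inline CsrArrays denseToCsr(const std::vector<double> &dense, std::size_t numRows) {
    const std::size_t numCols = numRows ? dense.size() / numRows : 0;
    CsrArrays csr;
    csr.rowOffsets.push_back(0);
    for (std::size_t i = 0; i < numRows; ++i) {
        for (std::size_t j = 0; j < numCols; ++j) {
            const double v = dense[i * numCols + j];
            if (v != 0.0) {
                csr.values.push_back(v);
                csr.colIdxs.push_back(j);
            }
        }
        // Each row ends where the next one starts.
        csr.rowOffsets.push_back(csr.values.size());
    }
    return csr;
}
```

With such a helper, a test only states the nine values of a 3 x 3 matrix, zeros included, instead of three hand-maintained arrays that must stay consistent with each other.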

pdamme · 2026-01-29T22:00:32Z

+    h.hc.assign(h.n, 0);
+
+    const std::size_t *rowOffsets = A.getRowOffsets();
+    const std::size_t *colIdxs    = A.getColIdxs();


Please use A.getColIdxs(0); otherwise, you may run into errors if A is a view into a larger CSRMatrix.
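
A view into a larger CSR matrix shares the parent's arrays, so its first row offset is generally not zero. The sketch below counts non-zeros per column in a way that stays correct for such views. It assumes that the column-index pointer refers to the first non-zero of the view's first row and that the row offsets are absolute positions in the parent's arrays; whether this matches the accessors of DAPHNE's CSRMatrix should be checked against its sources. `columnCounts` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Counts non-zeros per column for a CSR matrix or a row-range view of one.
// rowOffsets:  numRows + 1 entries, absolute positions in the parent's arrays.
// firstColIdx: points at the column index of the first non-zero of row 0 of
//              this matrix or view (an assumption of this sketch).
inline std::vector<std::size_t> columnCounts(std::size_t numRows, std::size_t numCols,
                                             const std::size_t *rowOffsets,
                                             const std::size_t *firstColIdx) {
    std::vector<std::size_t> hc(numCols, 0);
    // For a view, rowOffsets[0] may be greater than zero, so the offsets are
    // rebased before indexing through the view-local pointer.
    const std::size_t base = rowOffsets[0];
    for (std::size_t r = 0; r < numRows; ++r)
        for (std::size_t k = rowOffsets[r] - base; k < rowOffsets[r + 1] - base; ++k)
            ++hc[firstColIdx[k]];
    return hc;
}
```

Mixing the two conventions, that is, absolute offsets with a view-local pointer or rebased offsets with the parent's base pointer, reads the wrong entries or runs past the end of the array.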

pdamme · 2026-01-29T22:03:18Z

+
+    // --- 3) isDiagonal ---
+    // We call a matrix "diagonal" if it is square and every non-zero lies on i == j.
+    if(h.m == h.n && nnzEnd > nnzBegin) {


You could also skip the check for a diagonal matrix if h.maxHr > 1 (or h.maxHc > 1), because a diagonal matrix can have at most one non-zero per row/column.
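
The suggested early exit can be sketched as follows. `isDiagonalSketch` is a hypothetical stand-alone function over plain CSR arrays with offsets starting at zero, not the code under review; it mirrors the square and non-empty conditions from the snippet above and only scans the column indices when the maxima do not already rule out a diagonal matrix.

```cpp
#include <cstddef>
#include <vector>

// Returns true if the matrix is square, has at least one non-zero, and every
// non-zero lies on the main diagonal (row index == column index).
inline bool isDiagonalSketch(std::size_t m, std::size_t n, std::size_t maxHr, std::size_t maxHc,
                             const std::vector<std::size_t> &rowOffsets,
                             const std::vector<std::size_t> &colIdxs) {
    if (m != n || colIdxs.empty())
        return false;
    // Cheap early exit: a diagonal matrix has at most one non-zero per row
    // and per column, so larger maxima make the scan unnecessary.
    if (maxHr > 1 || maxHc > 1)
        return false;
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t k = rowOffsets[i]; k < rowOffsets[i + 1]; ++k)
            if (colIdxs[k] != i)
                return false;
    return true;
}
```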

pdamme · 2026-01-29T22:06:51Z

+    const std::size_t m = hA.m;
+    const std::size_t l = hB.n;
+
+    double nnz = 0.0;


I recommend not using double for the number of non-zeros, because the number of non-zeros is conceptually an integer (e.g., size_t). Using double could lead to round-off errors (and vastly off numbers when cast to an integer) when adding up many small per-row non-zero counts in a large matrix.

This hint also applies to several more such uses of double below.
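
A minimal sketch of integer accumulation, assuming the exact case of the estimator is the dot product of A's column counts and B's row counts over the common dimension. `exactProductNnz` and `sparsityOf` are hypothetical names; the conversion to double happens only once, when the final ratio is formed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sums hcA[j] * hrB[j] over the common dimension, keeping the count integral.
// Note: for extremely large matrices the product itself could overflow size_t;
// this sketch does not guard against that.
inline std::size_t exactProductNnz(const std::vector<std::size_t> &hcA,
                                   const std::vector<std::size_t> &hrB) {
    if (hcA.size() != hrB.size())
        throw std::invalid_argument("common dimension mismatch");
    std::size_t nnz = 0;
    for (std::size_t j = 0; j < hcA.size(); ++j)
        nnz += hcA[j] * hrB[j];
    return nnz;
}

// Converts the integer count into a sparsity value for an m x l result.
inline double sparsityOf(std::size_t nnz, std::size_t m, std::size_t l) {
    if (m == 0 || l == 0)
        return 0.0;
    return static_cast<double>(nnz) / (static_cast<double>(m) * static_cast<double>(l));
}
```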

pdamme · 2026-01-29T22:10:29Z

+    }
+
+    // Case 2: Extended count
+    else if(!hA.her.empty() || !hB.her.empty()) {


As the code in the then-branch uses both the extended histograms of hA and hB, I think it should be && instead of || (in contrast to Algorithm 1 in the paper).

pdamme · 2026-01-29T22:12:22Z

+    else if(!hA.her.empty() || !hB.her.empty()) {
+
+        // Exact part
+        for(std::size_t j = 0; j < hA.n; ++j)


These two loops both iterate over the same range (as hA.n is the same as hB.m). If you fuse these loops, you need to scan hA.hec only once, which should be more efficient.
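
The fused loop could look like the sketch below. It assumes that the exact part of the extended case has the form dot(hecA, hrB) + dot(hcA - hecA, herB), which is an interpretation of Algorithm 1 that should be checked against the paper, and that all four vectors are present and have the length of the common dimension. `exactPartFused` is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Exact part of the extended case, computed in one pass over the common dimension.
// hcA:  non-zeros per column of A
// hecA: the part of hcA that comes from rows of A with a single non-zero
// hrB:  non-zeros per row of B
// herB: the part of hrB that comes from columns of B with a single non-zero
inline std::size_t exactPartFused(const std::vector<std::size_t> &hcA,
                                  const std::vector<std::size_t> &hecA,
                                  const std::vector<std::size_t> &hrB,
                                  const std::vector<std::size_t> &herB) {
    const std::size_t n = hcA.size();
    if (hecA.size() != n || hrB.size() != n || herB.size() != n)
        throw std::invalid_argument("all histograms must have the length of the common dimension");
    std::size_t nnz = 0;
    for (std::size_t j = 0; j < n; ++j) {
        const std::size_t ec = hecA[j]; // hecA[j] is read once and used in both terms
        nnz += ec * hrB[j] + (hcA[j] - ec) * herB[j];
    }
    return nnz;
}
```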

pdamme · 2026-01-29T22:13:25Z

+        std::size_t p = (hA.nnzRows - hA.rowsEq1) * (hB.nnzCols - hB.colsEq1);
+
+        if(p > 0) {
+            double dens = Edm(hA.hc, hB.hr, p);


Don't forget to subtract the extended histograms from hA.hc and hB.hr as in Algorithm 1 in the paper.
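
A minimal sketch of the subtraction and the subsequent density estimate, under two assumptions that should be verified against the paper: the residual histograms are the element-wise differences hc - hec and hr - her, and the density-map style estimate over p candidate output cells has the form 1 - prod_j (1 - hc[j] * hr[j] / p). `residualCounts` and `edmDensitySketch` are hypothetical names and not the functions in this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Residual histogram: base counts minus the part already handled exactly.
inline std::vector<std::size_t> residualCounts(const std::vector<std::size_t> &h,
                                               const std::vector<std::size_t> &ext) {
    if (h.size() != ext.size())
        throw std::invalid_argument("histogram length mismatch");
    std::vector<std::size_t> out(h.size());
    for (std::size_t j = 0; j < h.size(); ++j)
        out[j] = h[j] >= ext[j] ? h[j] - ext[j] : 0; // guard against unsigned underflow
    return out;
}

// Assumed estimator form: each index j of the common dimension contributes an
// outer product with hc[j] * hr[j] non-zeros spread over p cells; the
// contributions are combined as a probabilistic union.
inline double edmDensitySketch(const std::vector<std::size_t> &hc,
                               const std::vector<std::size_t> &hr, std::size_t p) {
    if (p == 0 || hc.size() != hr.size())
        return 0.0;
    double probZero = 1.0; // probability that a cell stays zero
    for (std::size_t j = 0; j < hc.size(); ++j) {
        const double v = static_cast<double>(hc[j]) * static_cast<double>(hr[j]) / static_cast<double>(p);
        probZero *= 1.0 - std::min(1.0, v); // clamp to keep the factor in [0, 1]
    }
    return 1.0 - probZero;
}
```

In this reading, the estimator would be called on the residuals, for example `edmDensitySketch(residualCounts(hcA, hecA), residualCounts(hrB, herB), p)`, so that non-zeros already counted in the exact part are not counted a second time.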

…periments
- Created new result files for matrix multiplication experiments with real data:
  - run_matmul_only_real_data.txt
  - run_matmul_self_transpose.txt
- Updated run_matmul_w.txt with additional results and improved formatting.
- Added new script run_elementwise_mul.sh to execute element-wise multiplication experiments across various datasets.
- Modified run_matmul_only_real_data.sh to include additional datasets and improved execution permissions.
- Created run_matmul_self_transpose.sh for self-transpose matrix multiplication experiments.
- Updated run_write_mnc_md.sh to include more datasets and improved formatting.

Update after final submission

Added a Project Spotlight section to the README to document the Matrix Non-zero Count (MNC) sketch implementation. This includes a breakdown of our key technical contributions (compiler integration, sketch propagation, and extended operations) and a code navigation guide for our code files in src/ and test/.

justaprog and others added 20 commits December 5, 2025 19:16

feat: mnc sketch struct and buildfromCsr

d19c47b

feat: enhance MncSketch class with additional methods and include tests

022117e

feat: add MncSketch unit tests for MNC sketch

2f00a95

refactor: change MncSketch from class to struct and clean up member functions

8894fae

refactor: update MncSketchTest to use public member variables instead of getters

eaa46cc

feat: enhance MncSketch with diagonal check and extended counts calculations

d81fe50

feat: add MNC Sketch test case based on example from paper

a2c8543

docs: add execution instruction comment for MNC sketch test

d9bde44

Sparsity Estimator

3fdc65f

docs: add note for MNC attribute usage in DiagMatrixOp sparsity inference

1019c13

build: add comment for parallel build option in build script

162f024

refactor: remove MNCSparsityEstimation and integrate functionality into MncSketch

7571728

refactor: rename estimateSparsity function to estimateSparsity_product

531cc1d

test: update MncSketch tests to use buildMncFromCsrMatrix and add dense matrix example

8f624da

feat: add buildMncFromDenseMatrix function to create MncSketch from DenseMatrix

5a0beb3

feat: add propagation functions for MncSketch and tests

a43777e

Merge branch 'daphne-eu:main' into 987-mnc-sketch-implemenation

641a707

docs: add docstrings to mnc sketch implementation and tests

9c6dab0

fix: pass vectors as arguments for Edm

7639425

feat: implement EdmDensity function for improved density estimation and update tests

16dbf6c

pdamme self-requested a review January 28, 2026 13:53

pdamme added labels Jan 28, 2026:
- student project: Suitable for a bachelor/master student's programming project.
- LDE winter 2025/26: Student project in the course Large-scale Data Engineering at TU Berlin (winter 2025/26).

pdamme requested changes Jan 29, 2026

pdamme reviewed Jan 29, 2026

justaprog and others added 4 commits February 16, 2026 20:57

feat: improve inferMncSketchId implementations to handle unknown sketch IDs consistently

e2cf73e

refactor: refactor scripts

0004d4d

Merge pull request #1 from justaprog/feat/add_mnc_to_matrix_property

bc3b7bd

- Integrated Mncsketch into DaphneIR to support RandMatrixOp, FillOp, SeqOp, MatMulOp, TransposeOp, ReshapeOp, MatrixConstantOp, EwAddOp, EwSubOp, EwMulOp, EwDivOp, ReadOp ops

fixed comments and changed test cases to use genGivenVals

cd6cefa

justaprog changed the title ~~(Intial Prototype): 987 mnc sketch implemenation~~ (Final Submission): 987 mnc sketch implemenation Feb 16, 2026

justaprog and others added 24 commits February 21, 2026 17:43

feat: add propagateRbind and propagateCbind functions to MncSketchPropagate

f531b89

fix: reduce parallel jobs in build command to improve stability

487ece7

feat: add propagateMncFromDiagMatrix function to MncSketchPropagate

6f87e17

feat: implement inferMncSketchId and inferSparsity methods for RowBindOp and ColBindOp

e079d02

feat: add cbind and rbind operations with sparsity calculations in reog_op

aa089a8

feat: add MNC to matrix property and update related scripts

c2d4ff6

chore: remove outC.csv and its metadata file

cb0af4c

feat: add various experimental scripts for MNC sparsity estimation and comparison

c4f9afb

fix: update script paths in corner_cases and real_data, remove unused for_19 script

6b689c8

refactor: rename scripts/mncsketch folder to scripts/mncsketch_propa

5422e50

feat: add new experimental scripts for matrix multiplication with MNC

fb43f16

fix: remove unused writeMatrix calls for m1_without_mnc and m2_without_mnc

4c58cdc

feat: add new scripts for matrix multiplication experiments and metadata writing

2a60a13

fix: correct matrix initialization in diagonal propagation test case

bee7467

feat: fix element-wise multiplication propagation with probabilistic rounding

82eb023

feat: add element-wise addition experiment scripts and results

3507fb0

refactor: move irrelevant experiment to experiment_old folder

388f571

feat: add matmul_self experiment script and results

355e36f

feat: add run_matmul_w_reshape

9f829c7

feat: add some more scripts

4945dd1

feat: update matmul_w_reshape experiment results and script paths

5cbe89c

Merge pull request #3 from justaprog/987-experiments

1b51af6

Update after final submission

abdnadeemm deleted the 987-mnc-sketch-implemenation branch April 2, 2026 19:04

justaprog wants to merge 88 commits into daphne-project:main from justaprog:987-mnc-sketch-implemenation

4 participants
