Now that we have the core logic for an eager backend, a lazy engine, and an optimizer, it's the perfect time to structure this into a proper framework. This involves organizing the code into logical files and planning for future growth.
With this structure, you can now plan for new features. Here is a roadmap:
A. More Operations
- Linear Algebra:
  - Matrix Multiplication (@): Fully implement the MultiplyOp to call multiply_eager.
  - Transpose (.T): Add a TransposeOp. An interesting optimization is that a transpose can often be "free" by just changing how tiles are read, without writing a new matrix.
  - Element-wise Functions: Add exp, log, sqrt, etc. These are simple to fuse.
- Reductions:
  - sum(), mean(): These are fundamental. They are also interesting because they change the output shape from a matrix to a vector or scalar.
  - max(), min(): Implement these along different axes.
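To make the "free" transpose and the shape-changing reductions concrete, here is a minimal sketch in plain Python. It assumes a nested list stands in for the on-disk matrix and uses a tiny tile size; `TILE`, `read_tile`, and `tiled_sum` are hypothetical names for illustration, not part of the existing code.

```python
TILE = 2  # tile edge length (assumed); tiny so the example is easy to trace


def read_tile(matrix, tile_row, tile_col, transposed=False):
    """Return one tile of `matrix`, or of its transpose, without building A.T."""
    if transposed:
        # Tile (i, j) of A.T is the transpose of tile (j, i) of A.
        tile_row, tile_col = tile_col, tile_row
    r0, c0 = tile_row * TILE, tile_col * TILE
    tile = [row[c0:c0 + TILE] for row in matrix[r0:r0 + TILE]]
    if transposed:
        tile = [list(col) for col in zip(*tile)]
    return tile


def tiled_sum(matrix, axis=None):
    """Sum tile by tile.

    axis=None -> scalar, axis=0 -> one value per column, axis=1 -> one per row.
    """
    rows, cols = len(matrix), len(matrix[0])
    row_sums = [0] * rows
    col_sums = [0] * cols
    for r0 in range(0, rows, TILE):
        for c0 in range(0, cols, TILE):
            tile = [row[c0:c0 + TILE] for row in matrix[r0:r0 + TILE]]
            # Fold this tile into the running partial sums.
            for i, tile_values in enumerate(tile):
                for j, value in enumerate(tile_values):
                    row_sums[r0 + i] += value
                    col_sums[c0 + j] += value
    if axis == 0:
        return col_sums
    if axis == 1:
        return row_sums
    return sum(row_sums)
```

For `A = [[1, 2, 3], [4, 5, 6]]`, `tiled_sum(A, axis=0)` collapses the 2x3 matrix to a 3-element vector, and `read_tile(A, 1, 0, transposed=True)` reads a tile of the transpose without ever materializing it.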
B. A Smarter Optimizer ✨
- Rule-Based Optimizer: Instead of hard-coding the if isinstance(...) check inside one operation, create a list of optimization rules. Your optimizer would have a registry like:
  - Rule 1: If the plan matches Multiply(Add(A, B), Scalar), execute with fused_add_multiply_kernel.
  - Rule 2: If the plan matches Transpose(Transpose(A)), rewrite the plan to just A.
- Cost-Based Optimizer: For truly advanced systems, you would implement a cost model. For an operation like A @ B @ C, the optimizer would calculate the estimated I/O cost of both (A @ B) @ C and A @ (B @ C) and choose the cheaper execution path based on the matrix shapes.
- (A + B) * 2 pattern #7
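The two optimizer styles above could be sketched roughly as follows. This is an illustration under assumptions, not the project's actual design: the plan node classes, the `RULES` list, `optimize`, and the I/O estimate (elements read plus elements written) are all hypothetical stand-ins.

```python
from dataclasses import dataclass, fields, is_dataclass, replace


# Hypothetical logical-plan nodes; the real project classes may differ.
@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class Transpose:
    child: object


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Multiply:  # matrix times scalar
    child: object
    scalar: float


@dataclass(frozen=True)
class FusedAddMultiply:  # an executor would map this to fused_add_multiply_kernel
    left: object
    right: object
    scalar: float


def fuse_add_multiply(node):
    """Rule 1: Multiply(Add(A, B), Scalar) -> one fused node."""
    if isinstance(node, Multiply) and isinstance(node.child, Add):
        return FusedAddMultiply(node.child.left, node.child.right, node.scalar)
    return None


def drop_double_transpose(node):
    """Rule 2: Transpose(Transpose(A)) -> A."""
    if isinstance(node, Transpose) and isinstance(node.child, Transpose):
        return node.child.child
    return None


RULES = [fuse_add_multiply, drop_double_transpose]  # the rule registry


def optimize(node):
    """Rewrite children first, then try each rule until none applies."""
    children = {f.name: optimize(getattr(node, f.name))
                for f in fields(node) if is_dataclass(getattr(node, f.name))}
    node = replace(node, **children)
    for rule in RULES:
        rewritten = rule(node)
        if rewritten is not None:
            return optimize(rewritten)
    return node


def matmul_io(left_shape, right_shape):
    """Crude I/O estimate (assumed model): elements read plus elements written."""
    (m, k), (k2, n) = left_shape, right_shape
    assert k == k2, "inner dimensions must match"
    return m * k + k * n + m * n, (m, n)


def cheapest_order(a, b, c):
    """Compare (A @ B) @ C with A @ (B @ C) using only the matrix shapes."""
    ab_cost, ab_shape = matmul_io(a, b)
    left_total = ab_cost + matmul_io(ab_shape, c)[0]
    bc_cost, bc_shape = matmul_io(b, c)
    right_total = bc_cost + matmul_io(a, bc_shape)[0]
    order = "(A @ B) @ C" if left_total <= right_total else "A @ (B @ C)"
    return order, left_total, right_total
```

With shapes 1000x10, 10x1000, and 1000x10, `cheapest_order` returns `("A @ (B @ C)", 2040000, 40200)`: the left-first order materializes a 1000x1000 intermediate, while the right-first order only needs a 10x10 one. Adding a rule means appending one function to `RULES`, with no change to `optimize`.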
C. An Enhanced Execution Engine
- Parallelism: Use Python's concurrent.futures.ThreadPoolExecutor to process tiles in parallel and saturate a fast NVMe drive, or ProcessPoolExecutor to use multiple CPU cores for computation-heavy tiles.
- Memory Management: Implement a simple LRU (Least Recently Used) cache to keep a few frequently accessed data tiles in memory, avoiding re-reading them from disk. #13
- Add option to visualize the cache access #19
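As a rough sketch of the engine items above, the following combines a small LRU tile cache with thread-based tile processing. Everything here is an assumption for illustration: `TileCache`, `fake_load_tile`, and `parallel_tile_sums` are hypothetical names, and the fake loader stands in for a real disk read. The `access_log` list is one possible hook for visualizing cache accesses later.

```python
import threading
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class TileCache:
    """Tiny thread-safe LRU cache keyed by tile id (illustrative only)."""

    def __init__(self, load_tile, capacity=2):
        self._load_tile = load_tile
        self._capacity = capacity
        self._tiles = OrderedDict()  # least recently used entry comes first
        self._lock = threading.Lock()
        self.access_log = []  # ("hit" | "miss", key) pairs, in access order

    def get(self, key):
        with self._lock:
            if key in self._tiles:
                self._tiles.move_to_end(key)  # mark as most recently used
                self.access_log.append(("hit", key))
                return self._tiles[key]
            self.access_log.append(("miss", key))
        # Load outside the lock so slow reads can overlap. Two threads that
        # miss on the same key at the same time may both load it.
        tile = self._load_tile(key)
        with self._lock:
            self._tiles[key] = tile
            self._tiles.move_to_end(key)
            while len(self._tiles) > self._capacity:
                self._tiles.popitem(last=False)  # evict least recently used
        return tile


def fake_load_tile(key):
    """Stand-in for reading one tile from disk."""
    return [key] * 3


def parallel_tile_sums(load_tile, keys, workers=4):
    """Load and reduce tiles on a thread pool; results keep the order of `keys`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda key: sum(load_tile(key)), keys))
```

Threads suit this case because tile loading is I/O-bound. For computation-heavy tiles, `ProcessPoolExecutor` offers the same `map` interface, but the mapped function must then be picklable (a module-level function rather than a lambda).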
This roadmap transforms the project from a simple script into a conceptual blueprint for a real computational framework. The process of thinking about this architecture is just as valuable as your work on SystemDS, as it deals with the same fundamental design patterns of separating the logical plan from the physical execution.