Method call analysis based `testQuick` command #4731

HollandDM · 2025-03-15T03:59:50Z

Add testQuick command that leverages CallGraphAnalysis.

This pull request introduces the testQuick command, which utilizes CallGraphAnalysis. The following changes have been implemented:

The outputs of CallGraphAnalysis now include invalidClassNames, which contains the names of test classes extracted from the spanningInvalidationTree. This list indicates the classes that are affected by code changes.
A callGraphAnalysis task has been added to JavaModule. This is a persistent task that, when executed, generates a callGraphAnalysis folder containing data related to the CallGraphAnalysis feature.
The codeSignatures of MillBuildRootModule have been updated to incorporate the results from callGraphAnalysis.
The testQuick command has been added to JavaTests. It uses the invalidClassNames output from callGraphAnalysis and a dedicated log file, quickTestFailedClasses.log, which stores the failed test classes from previous runs. testQuick combines the information from these two sources to determine the set of test classes to run.
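
The selection rule described in the last item can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `TestQuickSelection` and `select` are hypothetical names, and modelling the first run as `None` is an assumption about how a missing previous analysis is represented.

```scala
object TestQuickSelection {

  /**
   * Hypothetical helper: decide which test classes to run.
   *
   * @param discovered       all test classes found in the module
   * @param invalidated      class names affected by code changes, or None
   *                         when there is no previous analysis (first run)
   * @param previouslyFailed classes recorded as failed by the previous run
   */
  def select(
      discovered: Seq[String],
      invalidated: Option[Set[String]],
      previouslyFailed: Set[String]
  ): Seq[String] = invalidated match {
    // No previous analysis to compare against: run everything.
    case None => discovered
    // Otherwise run only classes that changed or failed last time.
    case Some(changed) =>
      discovered.filter(c => changed.contains(c) || previouslyFailed.contains(c))
  }
}
```

Filtering `discovered` rather than returning the union directly keeps non-test classes out of the result, so the selection can never be larger than the full test run.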

/claim #4109

see also #4787 for Zinc based design

lihaoyi · 2025-03-17T03:41:07Z

scalalib/src/mill/scalalib/JavaModule.scala

+          def isForwarderCallsiteOrLambda =
+            callSiteOpt.nonEmpty && {
+              val callSiteSig = callSiteOpt.get.sig
+
+              (callSiteSig.name == (calledSig.name + "$") &&
+                callSiteSig.static &&
+                callSiteSig.desc.args.size == 1)
+              || (
+                // In Scala 3, lambdas are implemented by private instance methods,
+                // not static methods, so they fall through the crack of "isSimpleTarget".
+                // Here make the assumption that a zero-arg lambda called from a simpleTarget,
+                // should in fact be tracked. e.g. see `integration.invalidation[codesig-hello]`,
+                // where the body of the `def foo` target is a zero-arg lambda i.e. the argument
+                // of `Cacher.cachedTarget`.
+                // To be more precise I think ideally we should capture more information in the signature
+                isSimpleTarget(callSiteSig.desc) && calledSig.name.contains("$anonfun")
+              )
+            }


All the other special cases except this one are heuristics unique to MillBuildRootModule, and so probably should live there. We can have a def codeSigIgnoreCall method that gets fed into CodeSig.compute, with a default implementation for JavaModule and a more detailed override for MillBuildRootModule

lihaoyi · 2025-03-17T03:41:57Z

One limitation of the current implementation is that after the first time testQuick is run (testing all classes), callGraphAnalysis will only create the current folder. If we run testQuick again, because callGraphAnalysis's inputs haven't changed, it will not run and will not provide the previous folder. In that case testQuick will not have enough information and will have to run all the test classes again.

I don't understand the problem you are describing here, could you go into more detail?

lihaoyi · 2025-03-17T03:42:58Z

scalalib/src/mill/scalalib/JavaModule.scala

@@ -92,6 +95,95 @@ trait JavaModule
          case _: ClassNotFoundException => // if we can't find the classes, we certainly are not in a ScalaJSModule
        }
    }
+
+    private def quickTest(args: Seq[String]): Task[(String, Seq[TestResult])] =
+      Task(persistent = true) {


Task(persistent = true) is probably the wrong thing here, since def quickTest takes arguments. We should add a Command(persistent = true) flag and keep it as a command similar to the other test tasks

lihaoyi · 2025-03-17T03:45:33Z

The outputs of CallGraphAnalysis now include invalidClassNames, which contains the names of test classes extracted from the spanningInvalidationTree. This list indicates the classes that are affected by code changes.

This seems like a post-processing step, could we put it in a separate method that takes the output of spanningInvalidationTree and parses out the parts we need? Or could it be a different helper method that works directly on prevTransitiveCallGraphHashes and transitiveCallGraphHashes0, so we don't need to construct a spanningInvalidationTree at all?

HollandDM · 2025-03-17T04:43:03Z

One limitation of the current implementation is that after the first time testQuick is run (testing all classes), callGraphAnalysis will only create the current folder. If we run testQuick again, because callGraphAnalysis's inputs haven't changed, it will not run and will not provide the previous folder. In that case testQuick will not have enough information and will have to run all the test classes again.

I don't understand the problem you are describing here, could you go into more detail?

Here are the steps to reproduce:

rm -rf out to clean the output folder.
run scalalib.callGraphAnalysis for the first time.
run scalalib.callGraphAnalysis again.

In this case, the first callGraphAnalysis will run normally and produce the current folder, but because we didn't modify any source code, the second callGraphAnalysis will do nothing (I guess because its input, compile(), doesn't change). Therefore the previous folder will not be produced.
The problem is that this PR checks for the previous folder, and treats testQuick as a first-time run when it doesn't find one. So in this case specifically, invoking testQuick consecutively will run all test cases, just like a normal testForked.

=================================

The outputs of CallGraphAnalysis now include invalidClassNames, which contains the names of test classes extracted from the spanningInvalidationTree. This list indicates the classes that are affected by code changes.

This seems like a post-processing step, could we put it in a separate method that takes the output of spanningInvalidationTree and parses out the parts we need? Or could it be a different helper method that works directly on prevTransitiveCallGraphHashes and transitiveCallGraphHashes0, so we don't need to construct a spanningInvalidationTree at all?

I also considered post-processing this at first, but the pre-jsonified spanningInvalidationTree is so easy to work with that I went with this approach. I will update this into a separate post-processing step

lihaoyi · 2025-03-17T05:41:23Z

@HollandDM the problem with callGraphAnalysis caching should be fixed by making testQuick a command. That way it will run every time regardless of changes, and can then decide what to do based on whether or not the upstream tasks changed

lihaoyi · 2025-03-17T05:44:36Z

I also considered post-processing this at first, but the pre-jsonified spanningInvalidationTree is so easy to work with that I went with this approach. I will update this into a separate post-processing step

Let's split it into three methods then: one shared one, one that uses the shared helper to generate the tree, and one that uses the shared helper to generate a list of affected classes
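
A rough sketch of the shared helper and the affected-classes method, working on typed hash maps instead of JSON strings (the tree-building method is omitted). All names here are hypothetical, and the signature format `def <class>#<method>(<args>)<ret>` is an assumption based on the signature strings that appear elsewhere in this thread.

```scala
object InvalidationSketch {

  // Shared helper: signatures that are new, or whose transitive hash
  // differs from the one recorded in the previous run.
  def changedSignatures(
      prev: Map[String, Int],
      curr: Map[String, Int]
  ): Set[String] =
    curr.collect {
      case (sig, hash) if !prev.get(sig).contains(hash) => sig
    }.toSet

  // Assumed format: "def <class>#<method>(<args>)<ret>".
  // The class name is everything between the "def " prefix and '#'.
  def classNameOf(sig: String): String =
    sig.stripPrefix("def ").takeWhile(_ != '#')

  // Second consumer of the shared helper: the affected class names.
  def invalidatedClassNames(
      prev: Map[String, Int],
      curr: Map[String, Int]
  ): Set[String] =
    changedSignatures(prev, curr).map(classNameOf)
}
```

Methods that were removed since the previous run are not reported here; whether they should be is a design question this sketch leaves open.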

HollandDM · 2025-03-17T06:43:16Z

scalalib/src/mill/scalalib/JavaModule.scala

+        val jsonValueQueue = mutable.ArrayDeque[(String, ujson.Value)]()
+        val prefixTrie = new PrefixTrie[String]()
+        jsonValueQueue.appendAll(spanningInvalidationTreeObj.value)
+        while (jsonValueQueue.nonEmpty) {


I also considered post-processing this at first, but the pre-jsonified spanningInvalidationTree is so easy to work with that I went with this approach. I will update this into a separate post-processing step

Let's split it into three methods then: one shared one, one that uses the shared helper to generate the tree, and one that uses the shared helper to generate a list of affected classes

I went with this approach first, as I think leaving the callGraphAnalysis folder as it is right now is quite good. This pre-processing seems quick enough when I run scalalib.test.testQuick, so I think it is fine. WDYT?

Looking at the code, I think splitting it into three methods is better. The various string operations necessary to do the post-processing look pretty fragile, and while it no doubt works, it probably will be harder to maintain and refactor in future compared to working with typed data structures

HollandDM · 2025-03-17T06:44:04Z

core/define/src/mill/define/Task.scala

+  /**
+   * This version allow [[Command]] to be persistent
+   */
+  inline def Command[T](inline persistent: Boolean)(inline t: Result[T])(implicit


Added a Command with a persistent option.

lihaoyi · 2025-03-17T10:49:06Z

I think the approach looks reasonable. Questions I have, more on the semantics and expected behavior rather than the code:

I assume you tested that this works for changes in the test code, but does this work for code changes in the non-test module code? Or upstream modules? The callgraph analysis we do for MillBuildRootModule is focused on one module, though it should in theory work for multi-module codebases too if given all the necessary classfiles
What kind of performance cost does this incur when used on various .test modules in the Mill codebase? You can run ./mill dist.installLocal && ci/patch-mill-bootstrap.sh to get an executable you can try out locally
Are there any major edge cases we should be aware of where this doesn't work well?

Eventually all this should probably make its way into a unit/integration test suite, but for now manual testing is fine just to figure out what the boundaries and limitations of this feature are

HollandDM · 2025-03-18T04:30:19Z

It seems that changes in upstream modules do not trigger the tests. The main reason is that the call graph analysis does not compute the necessary data for them.
The performance cost is very small; the computation of which tests to run is small compared to the test running time.
Aside from the first point, I'm not currently aware of any case where this does not work.

lihaoyi · 2025-03-18T05:18:16Z

The callgraph analysis should be able to handle cross-module analysis if all the classfiles are available on disk; you should be able to aggregate the transitiveLocalClasspath and pass it to CodeSig and it should just work (in theory)

HollandDM · 2025-03-20T09:20:38Z

I've updated the code to include all upstream modules when running the testQuick command. I also cleaned up some shared logic.
I've also run some local tests, e.g. if I update UnitTester.scala and then run scalalib.test.testQuick, all test cases that use UnitTester are re-run. This confirms that cross-module invalidation is working.

lihaoyi · 2025-03-22T02:41:57Z

core/codesig/src/mill/codesig/ExternalSummary.scala

@@ -47,7 +48,8 @@ object ExternalSummary {

    def load(cls: JCls): Unit = methodsPerCls.getOrElse(cls, load0(cls))

-    def load0(cls: JCls): Unit = {
+    // Some macros implementations will fail the ClassReader, we can skip them


What kind of error do those macro implementations produce? We should try and be specific about what errors we catch here, to avoid silencing unexpected errors that may indicate real issues

If you run core.define.compile and then check class/mill/define/Cross$Factory$.class in out folder, you will see something like this

// Source code is decompiled from a .class file using FernFlower decompiler.
package mill.define;

import java.io.Serializable;
import mill.define.internal.CrossMacros;
import mill.define.internal.CrossMacros.;
import scala.runtime.ModuleSerializationProxy;

public final class Cross$Factory$ implements Serializable {
   public static final Cross$Factory$ MODULE$ = new Cross$Factory$();

   public Cross$Factory$() {
   }

   private Object writeReplace() {
      return new ModuleSerializationProxy(Cross$Factory$.class);
   }

   public CrossMacros inline$CrossMacros$i1(final internal x$0) {
      return .MODULE$;
   }
}

The call retrieved from asm yields something like this:
def mill.define.Cross$Factory$#inline$CrossMacros$i1(mill.define.internal)mill.define.internal

As you can see, mill/define/internal is not a valid class, so the class loader will throw when trying to load it.

I don't know why the code is like this, I will try to reproduce this and look around for clues
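
One way to stay specific about what gets caught is sketched below. It assumes the failure surfaces as a `ClassNotFoundException`, which this thread does not confirm; `SafeLoad` and `tryLoad` are hypothetical names and not part of `ExternalSummary`.

```scala
object SafeLoad {

  // Hypothetical stand-in for the loading step: resolve a class by name
  // and treat only a missing class as skippable. Any other failure
  // (e.g. a LinkageError) still propagates to the caller.
  def tryLoad(name: String, loader: ClassLoader): Option[Class[?]] =
    try Some(Class.forName(name, false, loader))
    catch {
      // Assumption: bogus names such as "mill.define.internal" fail here.
      case _: ClassNotFoundException => None
    }
}
```

Catching only the one expected exception type means an unexpected error still fails loudly instead of being silenced.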

lihaoyi · 2025-03-22T02:42:49Z

core/codesig/src/mill/codesig/ReachabilityAnalysis.scala

  }

-  logger.mandatoryLog(spanningInvalidationTree)
+  def calculateInvalidClassName(


Probably should call this calculateInvalidatedClassName to avoid confusion

lihaoyi · 2025-03-22T02:43:44Z

core/codesig/src/mill/codesig/CodeSig.scala

+    logger.mandatoryLog(callAnalysis.methodCodeHashes)
+    logger.mandatoryLog(callAnalysis.prettyCallGraph)


Are these two mandatoryLogs necessary? It seems we only use transitiveCallGraphHashes and spanningInvalidationTree, at least as far as I can tell

I have no idea either. I put them here to mimic the original code, which logs everything like this

lihaoyi · 2025-03-22T02:43:57Z

core/codesig/src/mill/codesig/ReachabilityAnalysis.scala

+  /**
+   * Get all class names that have their hashcode changed compared to prevTransitiveCallGraphHashes
+   */
+  def invalidClassNames(


invalidatedClassNames

lihaoyi · 2025-03-22T02:47:24Z

scalalib/src/mill/scalalib/PrefixTrie.scala

+
+import scala.collection.mutable
+
+private[scalalib] final class PrefixTrie[A](using CanEqual[A, A]) {


This seems to be unused?

True, I must have missed it

lihaoyi · 2025-03-22T02:48:51Z

scalalib/src/mill/scalalib/JavaModule.scala

+    // We can ignore all calls to methods that look like Targets when traversing
+    // the call graph. We can do this because we assume `def` Targets are pure,
+    // and so any changes in their behavior will be picked up by the runtime build
+    // graph evaluator without needing to be accounted for in the post-compile
+    // bytecode callgraph analysis.
+    def isSimpleTarget(desc: mill.codesig.JvmModel.Desc) =
+      (desc.ret.pretty == classOf[mill.define.Target[?]].getName ||
+        desc.ret.pretty == classOf[mill.define.Worker[?]].getName) &&
+        desc.args.isEmpty
+
+    // We avoid ignoring method calls that are simple trait forwarders, because
+    // we need the trait forwarders calls to be counted in order to wire up the
+    // method definition that a Target is associated with during evaluation
+    // (e.g. `myModuleObject.myTarget`) with its implementation that may be defined
+    // somewhere else (e.g. `trait MyModuleTrait{ def myTarget }`). Only that one
+    // step is necessary, after that the runtime build graph invalidation logic can
+    // take over
+    def isForwarderCallsiteOrLambda =
+      callSiteOpt.nonEmpty && {
+        val callSiteSig = callSiteOpt.get.sig
+
+        (callSiteSig.name == (calledSig.name + "$") &&
+          callSiteSig.static &&
+          callSiteSig.desc.args.size == 1)
+        || (
+          // In Scala 3, lambdas are implemented by private instance methods,
+          // not static methods, so they fall through the crack of "isSimpleTarget".
+          // Here make the assumption that a zero-arg lambda called from a simpleTarget,
+          // should in fact be tracked. e.g. see `integration.invalidation[codesig-hello]`,
+          // where the body of the `def foo` target is a zero-arg lambda i.e. the argument
+          // of `Cacher.cachedTarget`.
+          // To be more precise I think ideally we should capture more information in the signature
+          isSimpleTarget(callSiteSig.desc) && calledSig.name.contains("$anonfun")
+        )
+      }


These should be in the MillBuildRootModule override method I think; they look specific to Mill's own "ignore Task-returning methods" heuristic that won't apply to normal code

If we allow the analysis to go through Task-like classes/objects, then by design it will spread into all tasks. In the mill project there are tests that create a custom module of Test Module (TestModuleTestUtils, etc...). So in this case it will trigger every task.
Should we keep it like this, or allow testQuick to trigger tasks like that?

The issue is that this problem exists for other interfaces as well. For Mill and Mill plugins the heuristic is to ignore Task, but for other projects with other base abstractions the heuristic would need to be something different to be useful. Let's think if we can come up with a more general heuristic that isn't hardcoded to depend on Mill-specific types like Task, using the data we already have

What if rather than using a method-based callgraph, we did the analysis on a more coarse grained class-based dependency graph? This is something we can't really do for invalidating tasks since tasks are individual methods, but since test selection is at a granularity of classes that might work. Presumably if ClassA never directly or transitively references ClassB, and ClassA doesn't take any constructor parameters (since it's a test suite class and those usually have zero parameters) then changes to ClassB should never affect ClassA? This feels like an approach that could either replace the existing method-level approach, or be stacked on top as an additional heuristic (e.g. only invalidate a test or task if both the method-level codesig and the class-level codesig both change)

I think we still need to analyze method calls to identify which classes depend on which classes, and given the nature of virtual invocation, we will still come back to the case where calling a method on a trait is translated into calling every concrete class of that trait, right?
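
For reference, the class-level idea discussed above can be sketched as a plain reachability check. This is an illustration under assumptions, not code from this PR: `ClassGraphSketch` and its methods are hypothetical, the `deps` map (class name to directly referenced classes) is assumed to be available from some earlier analysis, and the virtual-dispatch concern is not addressed unless `deps` already contains edges to concrete implementations.

```scala
import scala.annotation.tailrec

object ClassGraphSketch {

  // All classes reachable from `start` (including `start` itself) by
  // following direct references in `deps`.
  def reachable(start: String, deps: Map[String, Set[String]]): Set[String] = {
    @tailrec
    def loop(frontier: List[String], seen: Set[String]): Set[String] =
      frontier match {
        case Nil => seen
        case head :: tail =>
          // Only visit classes we have not seen yet, so cycles terminate.
          val next = deps.getOrElse(head, Set.empty[String]).filterNot(seen)
          loop(next.toList ::: tail, seen ++ next)
      }
    loop(List(start), Set(start))
  }

  // A test class is affected if it transitively references a changed class.
  def affectedTests(
      testClasses: Seq[String],
      deps: Map[String, Set[String]],
      changed: Set[String]
  ): Seq[String] =
    testClasses.filter(t => reachable(t, deps).exists(changed))
}
```

Recomputing reachability per test class is simple but quadratic in the worst case; walking the reversed graph once from the changed classes would be the obvious optimization.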

lihaoyi · 2025-03-22T02:52:25Z

Left some comments. Let's also add some unit tests in scalalib.test to exercise this logic, next to the various TestRunnerTests that exercise the rest of the testing logic:

single-module Java project
single-module Scala project
multi-module Java project
multi-module Scala project

Since we don't need to make any changes to the build.mill to exercise this logic, unit tests are fine and we don't need an integration test.

HollandDM · 2025-03-24T15:37:26Z

@lihaoyi I've updated the code to address the comments and answered some questions.

As for using a class-based graph instead of method calls: I tried using the class dependency graph produced by Zinc as a reference. While it works, I still need ReachabilityAnalysis to check the hash codes (I'm not sure whether Zinc can do that itself). After some thought: while letting method calls pass the Task boundary results in a bigger call graph, in the end we only check whether the method's class name is among the module's test classes discovered by discoverTestClasses, so the set of test classes to run will never be larger than with the normal testForked. So I guess leaving it as it is for now is also good enough.

If the code is good to go, I'll proceed with writing test cases

lihaoyi · 2025-03-24T15:51:58Z

Thanks @HollandDM ! will take a look

lihaoyi · 2025-03-25T20:34:43Z

I think the code looks reasonable, at least as I would expect.

However I would like to push a bit more on the design of the feature. How does Zinc implement testQuick? Presumably they must have made similar tradeoffs as we are considering here. We should find out what approach they chose and why they chose it, so we can make a better decision rather than just arbitrarily picking a design on our own.

HollandDM · 2025-03-26T08:46:10Z

I'm not sure what Zinc's testQuick is. I don't think Zinc has that, but sbt does have a testQuick command.
If we go the Zinc way, I think we can take the output analysis of the previous compile and the current compile, compare their stamps to extract the changed classes (products), then use the current analysis again to get all relations, and do something similar to the spanning forest on that graph.
Comparing Zinc stamps means we'll mostly be comparing hashes of whole file contents.

add testQuick command

b292fb3

lihaoyi reviewed Mar 17, 2025

HollandDM force-pushed the test-quick-command branch from e32c777 to 353efb6 Compare March 17, 2025 06:40

HollandDM closed this Mar 17, 2025

HollandDM reopened this Mar 17, 2025

HollandDM commented Mar 17, 2025

HollandDM requested a review from lihaoyi March 17, 2025 06:44

update

d5e5433

HollandDM force-pushed the test-quick-command branch from 8e48d40 to d5e5433 Compare March 17, 2025 08:57

update 2

cce94ef

HollandDM force-pushed the test-quick-command branch from 19ecabf to cce94ef Compare March 20, 2025 09:17

lihaoyi reviewed Mar 22, 2025

update 3

e8b5640

HollandDM force-pushed the test-quick-command branch from 91ca3f2 to e8b5640 Compare March 24, 2025 15:29

HollandDM requested a review from lihaoyi March 24, 2025 15:30

[autofix.ci] apply automated fixes

d80d8ea

HollandDM mentioned this pull request Mar 27, 2025

Zinc based testQuick command #4787

Draft

HollandDM changed the title ~~Add testQuick command~~ Method call analysis based testQuick command Mar 27, 2025
