Method call analysis based testQuick command #4731

Draft
wants to merge 5 commits into
base: main
from

Conversation

HollandDM
Contributor

@HollandDM HollandDM commented Mar 15, 2025

Add a testQuick command that leverages CallGraphAnalysis.

This pull request introduces the testQuick command, which utilizes CallGraphAnalysis. The following changes have been implemented:

  • The outputs of CallGraphAnalysis now include invalidClassNames, which contains the names of test classes extracted from the spanningInvalidationTree. This list indicates the classes that are affected by code changes.
  • A callGraphAnalysis task has been added to JavaModule. This is a persistent task that, when executed, generates a callGraphAnalysis folder containing data related to the CallGraphAnalysis feature.
  • The codeSignatures of MillBuildRootModule have been updated to incorporate the results from callGraphAnalysis.
  • The testQuick command has been added to JavaTests. It uses the invalidClassNames output from callGraphAnalysis and a dedicated log file, quickTestFailedClasses.log, which stores the failed test classes from previous runs. testQuick combines the information from these two sources to determine the set of test classes to run.
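
As a rough illustration of the selection rule in the last bullet, the sketch below unions the invalidated test classes with the previously failed ones. The names `TestQuickSelection`, `parseFailedClassesLog`, and `selectTestClasses`, and the one-class-name-per-line log format, are assumptions made for this example and are not taken from the PR's code.

```scala
object TestQuickSelection {
  // Hypothetical helper: parse a log holding one failed test class name per line.
  def parseFailedClassesLog(content: String): Set[String] =
    content.linesIterator.map(_.trim).filter(_.nonEmpty).toSet

  // Hypothetical helper: choose which discovered test classes to run.
  // - None means there is no previous analysis (first run): run everything.
  // - Otherwise run classes invalidated by code changes plus classes that
  //   failed in the previous run, keeping only classes that still exist.
  def selectTestClasses(
      discovered: Seq[String],
      invalidatedClassNames: Option[Set[String]],
      previouslyFailed: Set[String]
  ): Seq[String] =
    invalidatedClassNames match {
      case None => discovered
      case Some(invalidated) =>
        discovered.filter(c => invalidated.contains(c) || previouslyFailed.contains(c))
    }
}
```

Filtering `discovered` rather than unioning the two sets keeps the original discovery order and drops stale names left in the log by deleted test classes.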

/claim #4109

see also #4787 for Zinc based design

Comment on lines 1504 to 1521
def isForwarderCallsiteOrLambda =
  callSiteOpt.nonEmpty && {
    val callSiteSig = callSiteOpt.get.sig

    (callSiteSig.name == (calledSig.name + "$") &&
      callSiteSig.static &&
      callSiteSig.desc.args.size == 1)
    || (
      // In Scala 3, lambdas are implemented by private instance methods,
      // not static methods, so they fall through the crack of "isSimpleTarget".
      // Here make the assumption that a zero-arg lambda called from a simpleTarget,
      // should in fact be tracked. e.g. see `integration.invalidation[codesig-hello]`,
      // where the body of the `def foo` target is a zero-arg lambda i.e. the argument
      // of `Cacher.cachedTarget`.
      // To be more precise I think ideally we should capture more information in the signature
      isSimpleTarget(callSiteSig.desc) && calledSig.name.contains("$anonfun")
    )
  }
Member

All the other special cases except this one are heuristics unique to MillBuildRootModule, and so probably should live there. We can have a def codeSigIgnoreCall method that gets fed into CodeSig.compute, with a default implementation for JavaModule and a more detailed override for MillBuildRootModule

@lihaoyi
Member

lihaoyi commented Mar 17, 2025

One limitation of the current implementation is that after the first time testQuick is run (testing all classes), callGraphAnalysis will only create the current folder. If we run testQuick again, because callGraphAnalysis's inputs haven't changed, it will not run and so will not produce the previous folder. In that case testQuick will not have enough information and will have to run all the test classes again.

I don't understand the problem you are describing here. Could you go into more detail?

@@ -92,6 +95,95 @@ trait JavaModule
case _: ClassNotFoundException => // if we can't find the classes, we certainly are not in a ScalaJSModule
}
}

private def quickTest(args: Seq[String]): Task[(String, Seq[TestResult])] =
Task(persistent = true) {
Member

Task(persistent = true) is probably the wrong thing here, since def quickTest takes arguments. We should add a Command(persistent = true) flag and keep it as a command similar to the other test tasks

@lihaoyi
Member

lihaoyi commented Mar 17, 2025

The outputs of CallGraphAnalysis now include invalidClassNames, which contains the names of test classes extracted from the spanningInvalidationTree. This list indicates the classes that are affected by code changes.

This seems like a post-processing step, could we put it in a separate method that takes the output of spanningInvalidationTree and parses out the parts we need? Or could it be a different helper method that works directly on prevTransitiveCallGraphHashes and transitiveCallGraphHashes0, so we don't need to construct a spanningInvalidationTree at all?

@HollandDM
Contributor Author

HollandDM commented Mar 17, 2025

One limitation of the current implementation is that after the first time testQuick is run (testing all classes), callGraphAnalysis will only create the current folder. If we run testQuick again, because callGraphAnalysis's inputs haven't changed, it will not run and so will not produce the previous folder. In that case testQuick will not have enough information and will have to run all the test classes again.

I don't understand the problem you are describing here. Could you go into more detail?

Here are the steps to reproduce:

  • rm -rf out to clean the output folder.
  • run scalalib.callGraphAnalysis for the first time.
  • run scalalib.callGraphAnalysis again.

In this case, the first callGraphAnalysis runs normally and produces the current folder, but because we didn't modify any source code, the second callGraphAnalysis does nothing (I guess because its input, compile(), doesn't change). Therefore the previous folder is not produced.
The problem is that this PR checks for the previous folder, and treats testQuick as a first-time run when it doesn't find one. So in this case especially, invoking testQuick consecutively will run all test cases, just like a normal testForked.

=================================

The outputs of CallGraphAnalysis now include invalidClassNames, which contains the names of test classes extracted from the spanningInvalidationTree. This list indicates the classes that are affected by code changes.

This seems like a post-processing step, could we put it in a separate method that takes the output of spanningInvalidationTree and parses out the parts we need? Or could it be a different helper method that works directly on prevTransitiveCallGraphHashes and transitiveCallGraphHashes0, so we don't need to construct a spanningInvalidationTree at all?

I also considered post-processing this at first, but the pre-jsonified spanningInvalidationTree output is very easy to work with, so I went with this approach. I will move this into a separate post-processing step.

@lihaoyi
Member

lihaoyi commented Mar 17, 2025

@HollandDM the problem with callGraphAnalysis caching should be fixed by making testQuick a command. That way it will run every time regardless of changes, and can then decide what to do based on whether or not the upstream tasks changed.

@lihaoyi
Member

lihaoyi commented Mar 17, 2025

I also considered post-processing this at first, but the pre-jsonified spanningInvalidationTree output is very easy to work with, so I went with this approach. I will move this into a separate post-processing step.

Let's split it into three methods then: one shared one, one that uses the shared helper to generate the tree, and one that uses the shared helper to generate a list of affected classes
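
A minimal sketch of what the shared helper and the class-list method might look like, working directly on the previous and current hash maps. Everything here is an assumption for illustration: the object and method names, the `Map[String, Int]` shape of the hashes, and the `def pkg.Cls#method(args)ret` signature format. The tree-building method is omitted; it would consume the same `changedSignatures` result.

```scala
object InvalidationSketch {
  // Shared helper (hypothetical): signatures whose transitive call graph hash
  // is new or differs from the one recorded in the previous run.
  def changedSignatures(prev: Map[String, Int], curr: Map[String, Int]): Set[String] =
    curr.collect { case (sig, hash) if !prev.get(sig).contains(hash) => sig }.toSet

  // Extract the class name from a signature of the assumed form
  // "def pkg.Cls#method(args)ret": drop the "def " prefix, keep the part before '#'.
  def classNameOf(sig: String): String =
    sig.stripPrefix("def ").takeWhile(_ != '#')

  // Second consumer of the shared helper: the set of affected class names.
  def invalidatedClassNames(prev: Map[String, Int], curr: Map[String, Int]): Set[String] =
    changedSignatures(prev, curr).map(classNameOf)
}
```

Working on typed maps like this avoids parsing class names back out of the JSON tree, which is the fragility the review comments point at.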

@HollandDM HollandDM force-pushed the test-quick-command branch from e32c777 to 353efb6 Compare March 17, 2025 06:40
@HollandDM HollandDM closed this Mar 17, 2025
@HollandDM HollandDM reopened this Mar 17, 2025
val jsonValueQueue = mutable.ArrayDeque[(String, ujson.Value)]()
val prefixTrie = new PrefixTrie[String]()
jsonValueQueue.appendAll(spanningInvalidationTreeObj.value)
while (jsonValueQueue.nonEmpty) {
Contributor Author

I also considered post-processing this at first, but the pre-jsonified spanningInvalidationTree output is very easy to work with, so I went with this approach. I will move this into a separate post-processing step.

Let's split it into three methods then: one shared one, one that uses the shared helper to generate the tree, and one that uses the shared helper to generate a list of affected classes

I went with this approach first, as I think leaving the callGraphAnalysis folder as it is right now is quite good. This pre-processing seems quick enough when I run scalalib.test.testQuick, so I think it is fine. WDYT?

Member

Looking at the code, I think splitting it into three methods is better. The various string operations necessary to do the post-processing look pretty fragile, and while it no doubt works, it probably will be harder to maintain and refactor in future compared to working with typed data structures

/**
* This version allow [[Command]] to be persistent
*/
inline def Command[T](inline persistent: Boolean)(inline t: Result[T])(implicit
Contributor Author

add a Command with persistent option.

@HollandDM HollandDM requested a review from lihaoyi March 17, 2025 06:44
@HollandDM HollandDM force-pushed the test-quick-command branch from 8e48d40 to d5e5433 Compare March 17, 2025 08:57
@lihaoyi
Member

lihaoyi commented Mar 17, 2025

I think the approach looks reasonable. Questions I have, more on the semantics and expected behavior rather than the code:

  1. I assume you tested that this works for changes in the test code, but does this work for code changes in the non-test module code? Or upstream modules? The callgraph analysis we do for MillBuildRootModule is focused on one module, though it should in theory work for multi-module codebases too if given all the necessary classfiles

  2. What kind of performance cost does this incur when used on various .test modules in the Mill codebase? You can run ./mill dist.installLocal && ci/patch-mill-bootstrap.sh to get an executable you can try out locally

  3. Are there any major edge cases that we should be aware of that this doesn't work well?

Eventually all this should probably make its way into a unit/integration test suite, but for now manual testing is fine just to figure out the boundaries and limitations of this feature.

@HollandDM
Contributor Author

HollandDM commented Mar 18, 2025

  1. It seems that changes in upstream modules do not trigger the tests. The main reason is that the call graph analysis does not compute the necessary data for them.
  2. The performance cost is very small; computing which tests to run is cheap compared to the test running time.
  3. Aside from 1, I'm not aware of any case where this does not work yet.

@lihaoyi
Member

lihaoyi commented Mar 18, 2025

The callgraph analysis should be able to handle cross-module analysis if all the classfiles are available on disk; you should be able to aggregate the transitiveLocalClasspath and pass it to CodeSig and it should just work (in theory)

@HollandDM HollandDM force-pushed the test-quick-command branch from 19ecabf to cce94ef Compare March 20, 2025 09:17
@HollandDM
Contributor Author

HollandDM commented Mar 20, 2025

I've updated the code to include all upstream modules when running the testQuick command, and cleaned up some shared logic.
I've also run some local tests, e.g. if I update UnitTester.scala and then run scalalib.test.testQuick, all test cases that use UnitTester are re-run. This confirms that cross-module analysis is working.

@@ -47,7 +48,8 @@ object ExternalSummary {

def load(cls: JCls): Unit = methodsPerCls.getOrElse(cls, load0(cls))

def load0(cls: JCls): Unit = {
// Some macros implementations will fail the ClassReader, we can skip them
Member

What kind of error do those macro implementations produce? We should try and be specific about what errors we catch here, to avoid silencing unexpected errors that may indicate real issues

Contributor Author

If you run core.define.compile and then check class/mill/define/Cross$Factory$.class in the out folder, you will see something like this:

// Source code is decompiled from a .class file using FernFlower decompiler.
package mill.define;

import java.io.Serializable;
import mill.define.internal.CrossMacros;
import mill.define.internal.CrossMacros.;
import scala.runtime.ModuleSerializationProxy;

public final class Cross$Factory$ implements Serializable {
   public static final Cross$Factory$ MODULE$ = new Cross$Factory$();

   public Cross$Factory$() {
   }

   private Object writeReplace() {
      return new ModuleSerializationProxy(Cross$Factory$.class);
   }

   public CrossMacros inline$CrossMacros$i1(final internal x$0) {
      return .MODULE$;
   }
}

The call retrieved from asm yields something like this:
def mill.define.Cross$Factory$#inline$CrossMacros$i1(mill.define.internal)mill.define.internal

As you can see, mill/define/internal is not a valid class, so the class loader will throw when trying to load it.

I don't know why the code is like this; I will try to reproduce this and look around for clues.

}

logger.mandatoryLog(spanningInvalidationTree)
def calculateInvalidClassName(
Member

Probably should call this calculateInvalidatedClassName to avoid confusion

Comment on lines +51 to +52
logger.mandatoryLog(callAnalysis.methodCodeHashes)
logger.mandatoryLog(callAnalysis.prettyCallGraph)
Member

Are these two mandatoryLogs necessary? It seems we only use transitiveCallGraphHashes and spanningInvalidationTree, at least as far as I can tell

Contributor Author

I have no idea either; I put them here to mimic the original code, which logs everything like this.

/**
* Get all class names that have their hashcode changed compared to prevTransitiveCallGraphHashes
*/
def invalidClassNames(
Member

invalidatedClassNames


import scala.collection.mutable

private[scalalib] final class PrefixTrie[A](using CanEqual[A, A]) {
Member

This seems to be unused?

Contributor Author

True, I must have missed it

Comment on lines 1499 to 1533
// We can ignore all calls to methods that look like Targets when traversing
// the call graph. We can do this because we assume `def` Targets are pure,
// and so any changes in their behavior will be picked up by the runtime build
// graph evaluator without needing to be accounted for in the post-compile
// bytecode callgraph analysis.
def isSimpleTarget(desc: mill.codesig.JvmModel.Desc) =
  (desc.ret.pretty == classOf[mill.define.Target[?]].getName ||
    desc.ret.pretty == classOf[mill.define.Worker[?]].getName) &&
    desc.args.isEmpty

// We avoid ignoring method calls that are simple trait forwarders, because
// we need the trait forwarders calls to be counted in order to wire up the
// method definition that a Target is associated with during evaluation
// (e.g. `myModuleObject.myTarget`) with its implementation that may be defined
// somewhere else (e.g. `trait MyModuleTrait{ def myTarget }`). Only that one
// step is necessary, after that the runtime build graph invalidation logic can
// take over
def isForwarderCallsiteOrLambda =
  callSiteOpt.nonEmpty && {
    val callSiteSig = callSiteOpt.get.sig

    (callSiteSig.name == (calledSig.name + "$") &&
      callSiteSig.static &&
      callSiteSig.desc.args.size == 1)
    || (
      // In Scala 3, lambdas are implemented by private instance methods,
      // not static methods, so they fall through the crack of "isSimpleTarget".
      // Here make the assumption that a zero-arg lambda called from a simpleTarget,
      // should in fact be tracked. e.g. see `integration.invalidation[codesig-hello]`,
      // where the body of the `def foo` target is a zero-arg lambda i.e. the argument
      // of `Cacher.cachedTarget`.
      // To be more precise I think ideally we should capture more information in the signature
      isSimpleTarget(callSiteSig.desc) && calledSig.name.contains("$anonfun")
    )
  }
Member

These should be in the MillBuildRootModule override method I think; they look specific to Mill's own "ignore Task-returning methods" heuristic that won't apply to normal code

Contributor Author

If we allow the analysis to go through Task-like classes/objects, then by design it will spread into all tasks. In the Mill project there are tests that create a custom module of a test module (TestModuleTestUtils, etc.), so in this case it will trigger every task.
Should we keep it like this, or allow testQuick to trigger tasks like that?

Member

The issue is that this problem exists for other interfaces as well. For Mill and Mill plugins the heuristic is to ignore Task, but for other projects with other base abstractions the heuristic would need to be something different to be useful. Let's think if we can come up with a more general heuristic that isn't hardcoded to depend on Mill-specific types like Task, using the data we already have

What if rather than using a method-based callgraph, we did the analysis on a more coarse grained class-based dependency graph? This is something we can't really do for invalidating tasks since tasks are individual methods, but since test selection is at a granularity of classes that might work. Presumably if ClassA never directly or transitively references ClassB, and ClassA doesn't take any constructor parameters (since it's a test suite class and those usually have zero parameters) then changes to ClassB should never affect ClassA? This feels like an approach that could either replace the existing method-level approach, or be stacked on top as an additional heuristic (e.g. only invalidate a test or task if both the method-level codesig and the class-level codesig both change)
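
To make the class-level idea concrete, here is a minimal sketch under stated assumptions: the `references` map (class name to directly referenced class names) is a hypothetical input that would have to come from bytecode or compiler analysis, and `ClassGraphSketch` and its method names are invented for this example. A test class is selected if it changed itself or transitively references a changed class.

```scala
import scala.collection.mutable

object ClassGraphSketch {
  // Breadth-first traversal: every class reachable from `start` via direct
  // references, including `start` itself.
  def reachableFrom(start: String, references: Map[String, Set[String]]): Set[String] = {
    val seen = mutable.Set(start)
    val queue = mutable.Queue(start)
    while (queue.nonEmpty) {
      val cls = queue.dequeue()
      // `seen.add` returns true only for classes not visited before.
      for (dep <- references.getOrElse(cls, Set.empty[String]) if seen.add(dep))
        queue.enqueue(dep)
    }
    seen.toSet
  }

  // A test class is affected if anything it can reach (or itself) has changed.
  def affectedTestClasses(
      testClasses: Seq[String],
      changed: Set[String],
      references: Map[String, Set[String]]
  ): Seq[String] =
    testClasses.filter(t => reachableFrom(t, references).exists(changed.contains))
}
```

This sketch does not address how `references` is built; the virtual-dispatch concern raised in the reply would show up there, since a reference to a trait may need edges to its implementations.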

Contributor Author

I think we still need to analyze method calls to identify which classes depend on which, and given the nature of virtual invocation, we will still come back to the case where calling a method on a trait is translated into calling every concrete class of that trait, right?

@lihaoyi
Member

lihaoyi commented Mar 22, 2025

Left some comments. Let's also add some unit tests in scalalib.test to exercise this logic, next to the various TestRunnerTests that exercise the rest of the testing logic:

  1. single-module Java project
  2. single-module Scala project
  3. multi-module Java project
  4. multi-module Scala project

Since we don't need to make any changes to the build.mill to exercise this logic, unit tests are fine and we don't need an integration test.

@HollandDM HollandDM force-pushed the test-quick-command branch from 91ca3f2 to e8b5640 Compare March 24, 2025 15:29
@HollandDM HollandDM requested a review from lihaoyi March 24, 2025 15:30
@HollandDM
Contributor Author

@lihaoyi I've updated the code to address the comments and answered some questions.

For using a class-based graph instead of method calls, I tried using the class dependency graph produced by Zinc as a reference. While it works, I still need to use the ReachabilityAnalysis to check the hash codes (I'm not sure if Zinc can do that itself). After some thought: while letting method calls pass the Task boundary results in a bigger call graph, in the end we only check whether the method's class name matches one of the module's test classes discovered by discoverTestClasses, so the set of test classes that need to run will never be larger than with a normal testForked. So I guess leaving it as it is for now is also good enough.

If the code is good to go, I'll proceed with writing test cases

@lihaoyi
Member

lihaoyi commented Mar 24, 2025

Thanks @HollandDM ! will take a look

@lihaoyi
Member

lihaoyi commented Mar 25, 2025

I think the code looks reasonable, at least as I would expect.

However I would like to push a bit more on the design of the feature. How does Zinc implement testQuick? Presumably they must have made similar tradeoffs as we are considering here. We should find out what approach they chose and why they chose it, so we can make a better decision rather than just arbitrarily picking a design on our own.

@HollandDM
Contributor Author

I'm not sure what Zinc's testQuick is; I don't think Zinc has that, but sbt does have a testQuick command.
If we go the Zinc way, I think we can take the output analyses of the previous and current compiles, compare their stamps to extract the changed classes (products), then use the current analysis again to get all relations, and do something similar to a spanning forest on the graph.
Comparing Zinc stamps means we'll mostly be comparing hashes of whole file contents.
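
The stamp-comparison idea can be sketched roughly as follows. This does not use Zinc's real analysis API: the stamp maps (class name to content hash) and the `dependsOn` relation are plain-data stand-ins assumed for illustration, and `StampDiffSketch` is a made-up name. Changed classes are found by diffing stamps, then invalidation is propagated to everything that depends on them.

```scala
object StampDiffSketch {
  // Classes whose stamp is new or differs between the previous and current compile.
  def changedClasses(
      prevStamps: Map[String, String],
      currStamps: Map[String, String]
  ): Set[String] =
    currStamps.collect { case (cls, s) if !prevStamps.get(cls).contains(s) => cls }.toSet

  // Propagate from the changed classes to all direct and transitive dependents.
  // `dependsOn` maps a class to the classes it depends on (assumed input).
  def invalidated(changed: Set[String], dependsOn: Map[String, Set[String]]): Set[String] = {
    // Invert the edges so we can walk from a class to the classes that use it.
    val dependents: Map[String, Set[String]] =
      dependsOn.toSeq
        .flatMap { case (from, tos) => tos.toSeq.map(to => to -> from) }
        .groupMap(_._1)(_._2)
        .view.mapValues(_.toSet).toMap

    @scala.annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] =
      if (frontier.isEmpty) seen
      else {
        val next = frontier.flatMap(c => dependents.getOrElse(c, Set.empty[String])) -- seen
        loop(next, seen ++ next)
      }

    loop(changed, changed)
  }
}
```

Because a stamp covers a whole file, any edit to a class invalidates all of its dependents, which is coarser than the method-level hashes used elsewhere in this PR.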

@HollandDM HollandDM changed the title from "Add testQuick command" to "Method call analysis based testQuick command" Mar 27, 2025