Skip to content

Compute Partial module graph fingerprints #4594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jun 3, 2025
Merged

Compute Partial module graph fingerprints #4594

merged 12 commits into from
Jun 3, 2025

Conversation

soulomoon
Copy link
Collaborator

@soulomoon soulomoon commented May 19, 2025

Just follow implementation from #4443 (comment) @guibou very good bisect and findings

@soulomoon soulomoon added the performance Issues about memory consumption, responsiveness, etc. label May 19, 2025
@guibou
Copy link
Collaborator

guibou commented May 20, 2025

Thank you for working on that topic.

I applied the patch and did not observe much improvement in my work codebase (e.g. any added import triggers a long, 30s-1minute rebuild).

I will have a look a bit later during the day with the "simple" examples that I had provided in #4443, maybe this actually fix / improves them and maybe there is another issue with my codebase.

@guibou
Copy link
Collaborator

guibou commented May 24, 2025

@soulomoon Thank you for this code.

I've tested it with the replication depicted in #4443 (comment)

(after rebasing the replication code on the commit just before this MR).

What I can observe:

With commit before this MR:

  • Opening Lib2.hs: diagnostics are available after 10s
  • Adding/removing the module: diagnostics are there immediately
  • Then, in the same editor session, opening MyLib.hs: rebuilds take 10s
  • Adding/removing the module: diagnostics takes 10s to come back

With this commit from this MR:

  • Opening Lib2.hs: diagnostics are available immediately! I'm actually kinda surprised by that. Is there some sort of cache that HLS may exploit when restarting? I would have assumed that in order to provide diagnostics for Lib2.hs, the initial startup would have required to typecheck MyLib.hs at least once. Anyway, that's great!
  • Adding/removing the module: diagnostics are there immediately (as before)
  • Then, in the same editor session, opening MyLib.hs: rebuild for MyLib.hs takes 10s. This is expected and interesting, it means that to typecheck / diag Lib2.hs (which depends on MyLib.hs) it does not need to fully typecheck MyLib.hs (somethig is in cache, or typecheck is partial). But in cannot use "the cache" when typechecking MyLib.hs. Anyway, I'm ok with this result.
  • Adding/removing the module in Lib2.hs: diagnostics are back immediately.

This is an awesome success!

I unfortunately do not understand the implementation so I cannot comment more than that, but this is awesome and should be merged ASAP ;) @fendor maybe you can have a look.

EDIT: I've cleaned my .cache/ghcide AND when reopening Lib2.hs it initially took like 15s to typecheck. However after closing and reopening, it is back to instant. So it means that there is indeed a cache and that this cache is working great and that your fix actually fix the cache usage too when starting hls. I'm really excited because on all project I had worked with HLS, I had never ever imagined that there was a cache at startup (I always observe startup time of something like many minutes) and now I'm starting to dream about super fast HLS startup time.

@soulomoon
Copy link
Collaborator Author

soulomoon commented May 24, 2025

Thank you so much for your detailed feedback. I am so happy that the fix worked !

For this PR, all the compilation rules that use GetModuleGraph rule switched to use GetFileModuleGraph file for each file, now they only depend on ModSummary fingerprints for the file and its depencies. (Originally they depend on the whole build graph's ModSummaries' fingerprints)

your fix actually fix the cache usage too when starting hls

Interesting, it seems to fix some thing more than I expected.

I just discovered that I forget to update the rule needsCompilationRule to use the new GetFileModuleGraph in my previous commits , now I changed that one too. Perhaps it would help with your @guibou codebase.

  • One concern for this PR is that we have GetFileModuleGraph for each file, it would increase the memory usage since their results are all cached in shake. One improvement could be, do not return DependencyInformation directly, but use a IOref or something to share DependencyInformation for all the GetFileModuleGraph's result. But we might want to measure how much more mem the current approach would take to see if it worth changing it to IOref.
  • The other concern is that ReverseModuleDeps might not be up to date. Possible fix, single out a new rule to compute ReverseModuleDeps for a file, so our rule system can keep it up to date.

@guibou
Copy link
Collaborator

guibou commented May 25, 2025

@soulomoon regarding your concern about memory usage, I've discussed that a bit in #4598 and I'm sure we can shrink the memory usage of the dependency graph by a factor 10 at a minimum.

I'll try to work on that asap.

Interesting, it seems to fix some thing more than I expected.

It does make sense, I suppose (and hope) that modules with template haskell do have a ModSummary which is always invalidated. So your change does not consider irrelevant changes in these modules.

Copy link
Collaborator

@wz1000 wz1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look very promising, thank you!

getModuleGraphSingleFileRule recorder =
defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetFileModuleGraph file -> do
di <- useNoFile_ GetModuleGraph
return (fingerprintToBS <$> lookupFingerprint file di, ([], Just di))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for correctness reasons we might need to restrict di to only include keys in the downward dependency closure. To avoid blowing up memory usage by destroying sharing, perhaps the valid keys (downward dependency closure) should be stored in a seperate field that can be updated independent of the full Module Graph structure, and lookup operations should consult this field to check if the key to lookup is valid.

@soulomoon
Copy link
Collaborator Author

soulomoon commented May 28, 2025

Thank you for the review @wz1000.

To avoid destroying sharing, following your idea, we can do the following.
That is we rebuild on the fingerKey but query the GetModuleGraph for the actual module graph.
Where I build three fingerKeys (GetModuleGraphTransDepsFingerprints, GetModuleGraphTransReverseDepsFingerprints, GetModuleGraphImmediateReverseDepsFingerprints) and use only one of them in each location depending on what we care about at the point.

useWithSeparateFingerprintRule
    :: (IdeRule k v, IdeRule k1 Fingerprint)
    => k1 -> k -> NormalizedFilePath -> Action (Maybe v)
useWithSeparateFingerprintRule fingerKey key file = do
    _ <- use_ fingerKey file
    useWithoutDependency key emptyFilePath

@soulomoon soulomoon marked this pull request as ready for review May 28, 2025 19:40
@soulomoon
Copy link
Collaborator Author

soulomoon commented Jun 1, 2025

I think for correctness reasons we might need to restrict di to only include keys in the downward dependency closure.

Since we are not restrict it before, it should fine if we are not doing it right away. But indeed it might be a good idea to implement the restriction to ensure the correctness. Also I am a bit worried about the performance impact doing the restriction. Givem the above, I'll open another PR to do it.

@soulomoon soulomoon merged commit 997a426 into master Jun 3, 2025
22 checks passed
Comment on lines +1254 to +1265
defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetModuleGraphTransDepsFingerprints file -> do
di <- useNoFile_ GetModuleGraph
let finger = lookupFingerprint file di (depTransDepsFingerprints di)
return (fingerprintToBS <$> finger, ([], finger))
defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetModuleGraphTransReverseDepsFingerprints file -> do
di <- useNoFile_ GetModuleGraph
let finger = lookupFingerprint file di (depTransReverseDepsFingerprints di)
return (fingerprintToBS <$> finger, ([], finger))
defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetModuleGraphImmediateReverseDepsFingerprints file -> do
di <- useNoFile_ GetModuleGraph
let finger = lookupFingerprint file di (depImmediateReverseDepsFingerprints di)
return (fingerprintToBS <$> finger, ([], finger))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the future, I think this should be a function moduleGraphRules

@soulomoon soulomoon changed the title compute deps fingerprint only Compute Partial module graph fingerprints Jun 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues about memory consumption, responsiveness, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants