refactor: WIP. Module name to filepath optimisation
This is related to #4598.
This changes the file-to-module association logic done during dependency
graph building.
Previously, each time a module `Foo.Bar` was encountered, HLS tested every
import path for the existence of a relevant file. This means that for
`i` import paths and `m` modules to locate, `m * i` filesystem
operations are done. Note also that this involves a lot of string
concatenation to build each candidate `FilePath`.
A module is tested once per `import` in each file of the
project. We also test for `boot` files, doubling the number of tests.
In #4598 we have a project with `1100` modules, more than `250` import
paths, and more than `17000` `import` statements, resulting in over 6
million file existence tests. This project blocked for more than 3
minutes during HLS startup.
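For reference, the old behaviour roughly corresponds to the following sketch (simplified and illustrative, not the actual HLS code; `locateModule` and its signature are made up for the example):

```haskell
-- Simplified sketch of the previous behaviour: every module lookup
-- probes every import path on disk, including the -boot variants.
import Control.Monad (filterM)
import System.Directory (doesFileExist)
import System.FilePath ((</>), (<.>))

-- | A module name such as "Foo.Bar", already converted to "Foo/Bar".
type ModuleSlashes = FilePath

-- For each of the @m@ imports we test @i@ candidate directories (times
-- the extensions), i.e. O(m * i) filesystem operations overall.
locateModule :: [FilePath] -> ModuleSlashes -> IO [FilePath]
locateModule importPaths modSlashes =
  filterM doesFileExist
    [ dir </> modSlashes <.> ext
    | dir <- importPaths
    , ext <- ["hs", "lhs", "hs-boot", "lhs-boot"]
    ]
```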
This commit changes the way this is computed:
- At startup, a `Map ModuleName FilePath` (the real type is a bit more
  involved, for performance and to handle multiple units and boot files)
  is built by scanning all the import paths for files representing the
  different modules.
- Directory scanning is efficient: if the import paths only contain
  Haskell modules, this never does more work than listing the files of
  the project.
- The lookup is now simply a `Map` lookup (see the sketch after this list).
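
A minimal sketch of the new scheme, assuming a plain `Map` keyed by the dotted module name (the real structure is richer, to handle multiple units and boot files; `buildIndex` and `ModuleIndex` are illustrative names):

```haskell
import Control.Monad (forM)
import qualified Data.Map.Strict as Map
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>), dropExtension, takeExtension)

-- | Index from dotted module name (e.g. "Foo.Bar") to its source file.
type ModuleIndex = Map.Map String FilePath

-- | Scan every import path once, recording each Haskell source file
-- under its dotted module name.
buildIndex :: [FilePath] -> IO ModuleIndex
buildIndex importPaths = Map.unions <$> mapM (\d -> go d "") importPaths
  where
    go dir prefix = do
      entries <- listDirectory dir
      maps <- forM entries $ \e -> do
        let full = dir </> e
        isDir <- doesDirectoryExist full
        if isDir
          then go full (prefix ++ e ++ ".")
          else pure $
            if takeExtension e `elem` [".hs", ".lhs"]
              then Map.singleton (prefix ++ dropExtension e) full
              else Map.empty
      pure (Map.unions maps)

-- | Locating a module is now a pure Map lookup.
lookupModule :: ModuleIndex -> String -> Maybe FilePath
lookupModule = flip Map.lookup
```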
The performance improvement is as follows:
- The number of IO operations is dramatically reduced, from multiple
  millions to a few recursive directory listings.
- A lot of the path conversion boilerplate has been removed.
- TODO: add before / after RTS stats with the number of allocations
- On my project, the graph building time is reduced from a few minutes
to 3s.
Limitations:
- How to rebuild the `Map` if the content of one directory changes?
- If a directory is filled with millions of files which are not of
  interest, performance can suffer. TODO: add a diagnostic during
  this phase so the user can learn about this issue.
Code status:
- The `lookup` is not fully restored; in particular it does not handle
  home units or re-exports.
- The result of the initialisation phase is cached inside a `TVar`
  stored as a top-level identifier using `unsafePerformIO` (see the
  sketch after this list). This is to be improved.
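
A minimal sketch of that caching idiom (names are illustrative; `ModuleIndex` and `buildIndex` refer to the previous sketch):

```haskell
import Control.Concurrent.STM
  (TVar, atomically, newTVarIO, readTVarIO, writeTVar)
import System.IO.Unsafe (unsafePerformIO)

-- The NOINLINE pragma is required so the top-level TVar is created
-- exactly once and not duplicated by inlining.
{-# NOINLINE moduleIndexCache #-}
moduleIndexCache :: TVar (Maybe ModuleIndex)
moduleIndexCache = unsafePerformIO (newTVarIO Nothing)

-- | Build the index on first use, then reuse the cached value.
getModuleIndex :: [FilePath] -> IO ModuleIndex
getModuleIndex importPaths = do
  cached <- readTVarIO moduleIndexCache
  case cached of
    Just idx -> pure idx
    Nothing  -> do
      idx <- buildIndex importPaths
      atomically $ writeTVar moduleIndexCache (Just idx)
      pure idx
```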