Skip to content

Commit ab06e51

Browse files
committed
refactor: WIP. Module name to filepath optimisation
This is related to #4598. This changes the file to module associating logic done during dependency graph building. Before, each time a module `Foo.Bar` is found, HLS is testing inside all the import path for the existence of a relevant fiel.. It means that for `i` import paths and `m` modules to locate, `m * n` filesystem operations are done. Note also that this involves a lot of complex string concatenation primitive to build the `FilePath`. A module is tested for each `import` for each of the file of the project. We also test for `boot` files, doubling the number of test. In #4598 we have a project with `1100` modules, in more than 250 import paths and we count more than `17000` `import` statments, resulting on over 6 millions test for file existences. This project was blocking for more than 3 minutes during HLS startup. This commit changes the way this is computed: - At startup, a `Map ModuleName FilePath` (the real type is a bit more involved for performance, multiples unit and boot files handling) is built by scanning all the import paths for files representing the different modules. - Directory scanning is efficient and if import path only contains haskell module, this will never do more job that listing the files of the project. - The lookup is now simplify a `Map` lookup. The performance improvement is as follows: - The number of IO operation is dramatically reduced, from multiples millions to a few recursive directories listing. - A lot of the boilerplate of converting path had be removed. - TODO: add an RTS stats before / after with number of allocations - On my project, the graph building time is reduced from a few minutes to 3s. Limitations: - How to rebuild the `Map` if the content of one directory change? - If one directory is filled with millions of files which are not of interested, performance can be damaged. TODO: add a diagnostic during this phase so the user can learn about this issue. Code status: - The `lookup` is not fully restored, especially it does not include the handling of home unit as well as reexport. - The initialisation phase is cached inside a `TVar` stored as a top level identifier using `unsafePerformIO`. This is to be improved.
1 parent 321e12e commit ab06e51

File tree

5 files changed

+96
-16
lines changed

5 files changed

+96
-16
lines changed

ghcide/ghcide.cabal

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,7 @@ library
107107
, unliftio-core
108108
, unordered-containers >=0.2.10.0
109109
, vector
110+
, pretty-simple
110111

111112
if os(windows)
112113
build-depends: Win32

ghcide/src/Development/IDE/Core/RuleTypes.hs

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -392,6 +392,9 @@ type instance RuleResult GetModSummary = ModSummaryResult
392392
-- | Generate a ModSummary with the timestamps and preprocessed content elided, for more successful early cutoff
393393
type instance RuleResult GetModSummaryWithoutTimestamps = ModSummaryResult
394394

395+
type instance RuleResult GetModulesPaths = (M.Map ModuleName (UnitId, NormalizedFilePath),
396+
M.Map ModuleName (UnitId, NormalizedFilePath))
397+
395398
data GetParsedModule = GetParsedModule
396399
deriving (Eq, Show, Generic)
397400
instance Hashable GetParsedModule
@@ -494,6 +497,13 @@ data GetModSummaryWithoutTimestamps = GetModSummaryWithoutTimestamps
494497
instance Hashable GetModSummaryWithoutTimestamps
495498
instance NFData GetModSummaryWithoutTimestamps
496499

500+
-- | Scan all the import directory for existing modules and build a map from
501+
-- module name to paths
502+
data GetModulesPaths = GetModulesPaths
503+
deriving (Eq, Show, Generic)
504+
instance Hashable GetModulesPaths
505+
instance NFData GetModulesPaths
506+
497507
data GetModSummary = GetModSummary
498508
deriving (Eq, Show, Generic)
499509
instance Hashable GetModSummary

ghcide/src/Development/IDE/Core/Rules.hs

Lines changed: 55 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44
{-# LANGUAGE CPP #-}
55
{-# LANGUAGE DuplicateRecordFields #-}
66
{-# LANGUAGE TypeFamilies #-}
7+
{-# LANGUAGE PartialTypeSignatures #-}
78

89
-- | A Shake implementation of the compiler service, built
910
-- using the "Shaker" abstraction layer for in-memory use.
@@ -93,7 +94,7 @@ import Data.Proxy
9394
import qualified Data.Text as T
9495
import qualified Data.Text.Encoding as T
9596
import qualified Data.Text.Utf16.Rope.Mixed as Rope
96-
import Data.Time (UTCTime (..))
97+
import Data.Time (UTCTime (..), getCurrentTime, diffUTCTime)
9798
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
9899
import Data.Tuple.Extra
99100
import Data.Typeable (cast)
@@ -173,6 +174,12 @@ import System.Info.Extra (isWindows)
173174

174175
import qualified Data.IntMap as IM
175176
import GHC.Fingerprint
177+
import Text.Pretty.Simple
178+
import qualified Data.Map.Strict as Map
179+
import System.FilePath (takeExtension, takeFileName, normalise, dropTrailingPathSeparator, dropExtension, splitDirectories)
180+
import Data.Char (isUpper)
181+
import System.Directory.Extra (listFilesRecursive, listFilesInside)
182+
import System.IO.Unsafe
176183

177184
data Log
178185
= LogShake Shake.Log
@@ -311,6 +318,7 @@ getParsedModuleDefinition packageState opt file ms = do
311318
getLocatedImportsRule :: Recorder (WithPriority Log) -> Rules ()
312319
getLocatedImportsRule recorder =
313320
define (cmapWithPrio LogShake recorder) $ \GetLocatedImports file -> do
321+
314322
ModSummaryResult{msrModSummary = ms} <- use_ GetModSummaryWithoutTimestamps file
315323
(KnownTargets targets targetsMap) <- useNoFile_ GetKnownTargets
316324
let imports = [(False, imp) | imp <- ms_textual_imps ms] ++ [(True, imp) | imp <- ms_srcimps ms]
@@ -333,8 +341,11 @@ getLocatedImportsRule recorder =
333341
| otherwise = do
334342
itExists <- getFileExists nfp
335343
return $ if itExists then Just nfp else Nothing
344+
345+
moduleMaps <- use_ GetModulesPaths file
336346
(diags, imports') <- fmap unzip $ forM imports $ \(isSource, (mbPkgName, modName)) -> do
337-
diagOrImp <- locateModule (hscSetFlags dflags env) import_dirs (optExtensions opt) getTargetFor modName mbPkgName isSource
347+
348+
diagOrImp <- locateModule moduleMaps (hscSetFlags dflags env) import_dirs (optExtensions opt) getTargetFor modName mbPkgName isSource
338349
case diagOrImp of
339350
Left diags -> pure (diags, Just (modName, Nothing))
340351
Right (FileImport path) -> pure ([], Just (modName, Just path))
@@ -624,6 +635,43 @@ getModuleGraphRule recorder = defineEarlyCutOffNoFile (cmapWithPrio LogShake rec
624635
fs <- toKnownFiles <$> useNoFile_ GetKnownTargets
625636
dependencyInfoForFiles (HashSet.toList fs)
626637

638+
{-# NOINLINE cacheVar #-}
639+
cacheVar = unsafePerformIO (newTVarIO mempty)
640+
641+
getModulesPathsRule :: Recorder (WithPriority Log) -> Rules ()
642+
getModulesPathsRule recorder = defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetModulesPaths file -> do
643+
env_eq <- use_ GhcSession file
644+
645+
cache <- liftIO (readTVarIO cacheVar)
646+
case Map.lookup (envUnique env_eq) cache of
647+
Just res -> pure (mempty, ([], Just res))
648+
Nothing -> do
649+
let env = hscEnv env_eq
650+
let import_dirs = map (second homeUnitEnv_dflags) $ hugElts $ hsc_HUG env
651+
opt <- getIdeOptions
652+
let exts = (optExtensions opt)
653+
let acceptedExtensions = concatMap (\x -> ['.':x, '.':x <> "-boot"]) exts
654+
655+
(unzip -> (a, b)) <- flip mapM import_dirs $ \(u, dyn) -> do
656+
(unzip -> (a, b)) <- flip mapM (importPaths dyn) $ \dir' -> do
657+
let dir = dropTrailingPathSeparator dir'
658+
let predicate path = pure (path == dir || isUpper (head (takeFileName path)))
659+
let dir_number_directories = length (splitDirectories dir)
660+
let toModule file = mkModuleName (intercalate "." $ drop dir_number_directories (splitDirectories (dropExtension file)))
661+
662+
-- TODO: we are taking/droping extension, this could be factorized to save a few cpu cycles ;)
663+
-- TODO: do acceptedextensions needs to be a set ? or a vector?
664+
modules <- fmap (\path -> (toModule path, toNormalizedFilePath' path)) . filter (\y -> takeExtension y `elem` acceptedExtensions) <$> liftIO (listFilesInside predicate dir)
665+
let isSourceModule (_, path) = "-boot" `isSuffixOf` fromNormalizedFilePath path
666+
let (sourceModules, notSourceModules) = partition isSourceModule modules
667+
pure $ (Map.fromList notSourceModules, Map.fromList sourceModules)
668+
pure (fmap (u,) $ mconcat a, fmap (u, ) $ mconcat b)
669+
670+
let res = (mconcat a, mconcat b)
671+
liftIO $ atomically $ modifyTVar' cacheVar (Map.insert (envUnique env_eq) res)
672+
673+
pure (mempty, ([], Just $ (mconcat a, mconcat b)))
674+
627675
getModuleGraphSingleFileRule :: Recorder (WithPriority Log) -> Rules ()
628676
getModuleGraphSingleFileRule recorder =
629677
defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetFileModuleGraph file -> do
@@ -632,8 +680,12 @@ getModuleGraphSingleFileRule recorder =
632680

633681
dependencyInfoForFiles :: [NormalizedFilePath] -> Action (BS.ByteString, DependencyInformation)
634682
dependencyInfoForFiles fs = do
683+
-- liftIO $ print ("fs length", length fs)
635684
(rawDepInfo, bm) <- rawDependencyInformation fs
685+
-- liftIO $ print ("ok with raw deps")
686+
-- liftIO $ pPrint rawDepInfo
636687
let (all_fs, _all_ids) = unzip $ HM.toList $ pathToIdMap $ rawPathIdMap rawDepInfo
688+
-- liftIO $ print ("all_fs length", length all_fs)
637689
msrs <- uses GetModSummaryWithoutTimestamps all_fs
638690
let mss = map (fmap msrModSummary) msrs
639691
let deps = map (\i -> IM.lookup (getFilePathId i) (rawImports rawDepInfo)) _all_ids
@@ -1232,6 +1284,7 @@ mainRule recorder RulesConfig{..} = do
12321284
getModIfaceRule recorder
12331285
getModSummaryRule templateHaskellWarning recorder
12341286
getModuleGraphRule recorder
1287+
getModulesPathsRule recorder
12351288
getModuleGraphSingleFileRule recorder
12361289
getFileHashRule recorder
12371290
knownFilesRule recorder

ghcide/src/Development/IDE/Import/FindImports.hs

Lines changed: 29 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,6 @@
55

66
module Development.IDE.Import.FindImports
77
( locateModule
8-
, locateModuleFile
98
, Import(..)
109
, ArtifactsLocation(..)
1110
, modSummaryToArtifactsLocation
@@ -14,9 +13,8 @@ module Development.IDE.Import.FindImports
1413
) where
1514

1615
import Control.DeepSeq
17-
import Control.Monad.Extra
1816
import Control.Monad.IO.Class
19-
import Data.List (find, isSuffixOf)
17+
import Data.List (isSuffixOf)
2018
import Data.Maybe
2119
import qualified Data.Set as S
2220
import Development.IDE.GHC.Compat as Compat
@@ -26,7 +24,8 @@ import Development.IDE.Types.Diagnostics
2624
import Development.IDE.Types.Location
2725
import GHC.Types.PkgQual
2826
import GHC.Unit.State
29-
import System.FilePath
27+
import Data.Map.Strict (Map)
28+
import qualified Data.Map.Strict as Map
3029

3130

3231
#if MIN_VERSION_ghc(9,11,0)
@@ -70,6 +69,7 @@ data LocateResult
7069
| LocateFoundReexport UnitId
7170
| LocateFoundFile UnitId NormalizedFilePath
7271

72+
{-
7373
-- | locate a module in the file system. Where we go from *daml to Haskell
7474
locateModuleFile :: MonadIO m
7575
=> [(UnitId, [FilePath], S.Set ModuleName)]
@@ -94,6 +94,7 @@ locateModuleFile import_dirss exts targetFor isSource modName = do
9494
maybeBoot ext
9595
| isSource = ext ++ "-boot"
9696
| otherwise = ext
97+
-}
9798

9899
-- | This function is used to map a package name to a set of import paths.
99100
-- It only returns Just for unit-ids which are possible to import into the
@@ -110,36 +111,47 @@ mkImportDirs _env (i, flags) = Just (i, (importPaths flags, reexportedModules fl
110111
-- Haskell
111112
locateModule
112113
:: MonadIO m
113-
=> HscEnv
114+
=> (Map ModuleName (UnitId, NormalizedFilePath),Map ModuleName (UnitId, NormalizedFilePath))
115+
-> HscEnv
114116
-> [(UnitId, DynFlags)] -- ^ Import directories
115117
-> [String] -- ^ File extensions
116118
-> (ModuleName -> NormalizedFilePath -> m (Maybe NormalizedFilePath)) -- ^ does file exist predicate
117119
-> Located ModuleName -- ^ Module name
118120
-> PkgQual -- ^ Package name
119121
-> Bool -- ^ Is boot module
120122
-> m (Either [FileDiagnostic] Import)
121-
locateModule env comp_info exts targetFor modName mbPkgName isSource = do
123+
locateModule moduleMaps@(moduleMap, moduleMapSource) env comp_info exts targetFor modName mbPkgName isSource = do
122124
case mbPkgName of
123125
-- 'ThisPkg' just means some home module, not the current unit
124126
ThisPkg uid
127+
-- TODO: there are MANY lookup on import_paths, which is a problem considering that it can be large.
125128
| Just (dirs, reexports) <- lookup uid import_paths
126-
-> lookupLocal uid dirs reexports
129+
-> lookupLocal moduleMaps uid dirs reexports
127130
| otherwise -> return $ Left $ notFoundErr env modName $ LookupNotFound []
128131
-- if a package name is given we only go look for a package
129132
OtherPkg uid
130133
| Just (dirs, reexports) <- lookup uid import_paths
131-
-> lookupLocal uid dirs reexports
134+
-> lookupLocal moduleMaps uid dirs reexports
132135
| otherwise -> lookupInPackageDB
133136
NoPkgQual -> do
134137

135138
-- Reexports for current unit have to be empty because they only apply to other units depending on the
136139
-- current unit. If we set the reexports to be the actual reexports then we risk looping forever trying
137140
-- to find the module from the perspective of the current unit.
138-
mbFile <- locateModuleFile ((homeUnitId_ dflags, importPaths dflags, S.empty) : other_imports) exts targetFor isSource $ unLoc modName
141+
---- locateModuleFile ((homeUnitId_ dflags, importPaths dflags, S.empty) : other_imports) exts targetFor isSource $ unLoc modName
142+
--
143+
-- TODO: handle the other imports, the unit id, ..., reexport.
144+
-- - Previous implementation was using homeUnitId dflags
145+
-- - Handle the -boot
146+
-- - Have a look at "targetFor"
147+
--
148+
let mbFile = case Map.lookup (unLoc modName) (if isSource then moduleMapSource else moduleMap) of
149+
Nothing -> LocateNotFound
150+
Just (uid, file) -> LocateFoundFile uid file
139151
case mbFile of
140152
LocateNotFound -> lookupInPackageDB
141153
-- Lookup again with the perspective of the unit reexporting the file
142-
LocateFoundReexport uid -> locateModule (hscSetActiveUnitId uid env) comp_info exts targetFor modName noPkgQual isSource
154+
LocateFoundReexport uid -> locateModule moduleMaps (hscSetActiveUnitId uid env) comp_info exts targetFor modName noPkgQual isSource
143155
LocateFoundFile uid file -> toModLocation uid file
144156
where
145157
dflags = hsc_dflags env
@@ -180,12 +192,16 @@ locateModule env comp_info exts targetFor modName mbPkgName isSource = do
180192
let genMod = mkModule (RealUnit $ Definite uid) (unLoc modName) -- TODO support backpack holes
181193
return $ Right $ FileImport $ ArtifactsLocation file (Just loc) (not isSource) (Just genMod)
182194

183-
lookupLocal uid dirs reexports = do
184-
mbFile <- locateModuleFile [(uid, dirs, reexports)] exts targetFor isSource $ unLoc modName
195+
lookupLocal moduleMaps@(moduleMapSource, moduleMap) uid dirs reexports = do
196+
error "MOXOOO"
197+
-- mbFile <- locateModuleFile [(uid, dirs, reexports)] exts targetFor isSource $ unLoc modName
198+
let mbFile = case Map.lookup (unLoc modName) (if isSource then moduleMapSource else moduleMap) of
199+
Nothing -> LocateNotFound
200+
Just (uid, file) -> LocateFoundFile uid file
185201
case mbFile of
186202
LocateNotFound -> return $ Left $ notFoundErr env modName $ LookupNotFound []
187203
-- Lookup again with the perspective of the unit reexporting the file
188-
LocateFoundReexport uid' -> locateModule (hscSetActiveUnitId uid' env) comp_info exts targetFor modName noPkgQual isSource
204+
LocateFoundReexport uid' -> locateModule moduleMaps (hscSetActiveUnitId uid' env) comp_info exts targetFor modName noPkgQual isSource
189205
LocateFoundFile uid' file -> toModLocation uid' file
190206

191207
lookupInPackageDB = do

ghcide/src/Development/IDE/Types/HscEnvEq.hs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
{-# LANGUAGE CPP #-}
22
module Development.IDE.Types.HscEnvEq
33
( HscEnvEq,
4-
hscEnv, newHscEnvEq,
4+
hscEnv, newHscEnvEq, envUnique,
55
updateHscEnvEq,
66
envPackageExports,
77
envVisibleModuleNames,

0 commit comments

Comments
 (0)