
[POC][RFC] Metals BSP third-party code navigation #21143

Draft · wants to merge 1 commit into base: main

Conversation

jgranstrom (Contributor) commented Jul 8, 2024

Note: this is not an intended implementation, simply a PoC of one path to achieving it.

A PoC for extending Metals (VS Code) support to third-party code navigation.

The underlying issue is that Metals requires source jars. IntelliJ works around missing source jars by decompiling the class files; Metals does not.

  • Enable coursier to download source jars for dependencies, and include them in lockfiles
  • Map between classifiers to find source jars
  • Provide dependencySources in addition to dependencyModules via BSP (required by Metals)

The implementation would probably be better centered around classifiers in general rather than sources specifically, but I simplified it as much as possible here to get it to a working state.
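The classifier mapping described above can be sketched as follows. This is a minimal illustration, not the PR's code: `Coordinate` here is a hypothetical stand-in for Pants' real coordinate type, and `to_sources_coordinate` is a name invented for this sketch.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Coordinate:
    """Hypothetical, simplified stand-in for Pants' Maven coordinate type."""
    group: str
    artifact: str
    version: str
    classifier: Optional[str] = None

def to_sources_coordinate(coord: Coordinate) -> Coordinate:
    """Map a binary-jar coordinate to its `sources` classifier variant."""
    return replace(coord, classifier="sources")

jar = Coordinate("com.example", "lib", "1.0.0")
src = to_sources_coordinate(jar)
print(src.classifier)  # sources
```

Generalizing this over a set of classifiers (sources, javadoc, ...) rather than hard-coding "sources" is the direction the description hints at.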

Comment on lines +175 to +199
digests = []
for cpe in classpath_entries:
    digests.append(cpe.digest)
for alse in applicable_lockfile_source_entries:
    new_file = FileEntry(alse.file_name, alse.file_digest)
    digest = await Get(Digest, CreateDigest([new_file]))
    digests.append(digest)

    for dep in alse.dependencies:
        coord = Coordinate.from_coord_str(dep.to_coord_str())
        dep_artifact_requirement = ArtifactRequirement(coord)
        dep_entry = get_entry_for_coord(lockfile, dep_artifact_requirement.coordinate)
        dep_new_file = FileEntry(dep_entry.file_name, dep_entry.file_digest)
        dep_digest = await Get(Digest, CreateDigest([dep_new_file]))
        digests.append(dep_digest)
        src_ent = applicable_lockfile_source_entries.get(alse)
        applicable_lockfile_source_entries_inverse.get(src_ent).append(dep_entry)

resolve_digest = await Get(Digest, MergeDigests(digests))
inverse = dict(zip(classpath_entries, applicable_lockfile_entries))

s = map(
    lambda x: (x, applicable_lockfile_source_entries_inverse.get(inverse.get(x))),
    classpath_entries,
)
Contributor Author:

This whole thing is convoluted, just to see that it works; I'm mashing it in here to avoid changing too much elsewhere.

Contributor:

This can also just be a list comprehension.
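The `map`/`lambda` expression from the hunk could indeed be a list comprehension. A runnable sketch with toy data (the dict contents here are illustrative only):

```python
# Toy stand-ins for the real values in the rule.
classpath_entries = ["a", "b"]
inverse = {"a": "entry-a", "b": "entry-b"}
applicable_lockfile_source_entries_inverse = {"entry-a": ["src-a"], "entry-b": ["src-b"]}

# Equivalent to the map(lambda ...) form in the hunk, as a list comprehension:
s = [
    (x, applicable_lockfile_source_entries_inverse.get(inverse.get(x)))
    for x in classpath_entries
]
print(s)  # [('a', ['src-a']), ('b', ['src-b'])]
```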

Comment on lines +541 to +543
dependency_modules = await Get(
    DependencyModulesResult, DependencyModulesParams, DependencyModulesParams(request.targets)
)
Contributor Author:

This would have its own backend-specific rule, I suppose. For now it just pulls everything from dependency_modules, fully expecting it to be a Maven module, again to try the approach out without implementing a whole rule indirection for it.

Contributor:

Understood. Maybe add a TODO comment as a reminder to create the union hook which makes this backend-agnostic?

Contributor:

> This would have its own backend-specific rule I suppose

Yes, because the core BSP logic should not be aware of the per-language specifics.

Before:
    entries = {(i.coord.group, i.coord.artifact): i for i in self.entries}
    entry = entries.get((coord.group, coord.artifact))
After:
    entries = {(i.coord.group, i.coord.artifact, i.coord.classifier): i for i in self.entries}
    entry = entries.get((coord.group, coord.artifact, coord.classifier))
Contributor Author:

Had to make sure everything includes the classifier, since (group, artifact) might not be unique anymore.
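The collision is easy to see with toy data: once a `sources` jar is in the lockfile, the binary jar and its source jar share both group and artifact, so a `(group, artifact)` key silently drops one of them.

```python
# Toy entries: (group, artifact, classifier). A binary jar and its
# sources jar differ only in the classifier.
entries = [
    ("com.example", "lib", None),
    ("com.example", "lib", "sources"),
]

by_ga = {(g, a): c for g, a, c in entries}        # later entry overwrites earlier
by_gac = {(g, a, c): c for g, a, c in entries}    # classifier keeps keys unique

print(len(by_ga), len(by_gac))  # 1 2
```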

Comment on lines +47 to +68
if ":jar:sources:" in dep['coord']:
    path = os.path.join(os.getcwd(), sys.argv[1] + ".empty")
    f = open(path, "w")
    f.write("")
    f.close()
    source = PurePath(path)
    dest_name = dep['coord'].replace(":", "_")
    classpath_dest = f"classpath/{dest_name}{ext}"
    existing_source = classpath.get(classpath_dest)
    if existing_source:
        if existing_source == source:
            # We've already captured this file.
            continue
        raise Exception(
            f"Duplicate jar name {classpath_dest} with incompatible source:\\n"
            f"  {source}\\n"
            f"  {existing_source}\\n"
        )
    classpath[classpath_dest] = source
    copyfile(source, classpath_dest)
    continue
else:
Contributor Author:

This is literally just a copy of the normal action, but passing an empty file for missing source jars (super hacky, yes). I noticed things upstream expect results to match one-to-one and in order, so I did this hack instead of finding every place where that was assumed. Since source jars aren't always published, they'd have to be allowed to be missing.

Contributor:

For my education, do you have an example of how this setup script fails without this change?

Comment on lines +105 to +106
--default=true \
--sources \
Contributor Author:

Here's where I'm thinking it should be configurable to have coursier fetch source jars, javadoc jars, etc. Currently I just add source jars on top of the normal ones in this PoC.
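One way to make this configurable is a small mapping from requested extra jar types to coursier flags (coursier's fetch supports `--sources` and `--javadoc` alongside `--default=true`). A sketch, assuming a hypothetical `extra_jar_types` option that does not exist in Pants today:

```python
from typing import Iterable, List

def coursier_classifier_flags(extra_jar_types: Iterable[str]) -> List[str]:
    """Translate a hypothetical `extra_jar_types` option into coursier flags.

    Sorted for deterministic command lines; unknown types are ignored here,
    though a real implementation would likely error on them.
    """
    known = {"sources": "--sources", "javadoc": "--javadoc"}
    return [known[t] for t in sorted(set(extra_jar_types)) if t in known]

print(coursier_classifier_flags({"sources"}))  # ['--sources']
```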

Contributor:

So javadoc could be an additional type of jar added to the resolve?

Comment on lines +80 to +94
reqs = list(requirements)
req_sources = [
    dataclasses.replace(
        i, coordinate=dataclasses.replace(i.coordinate, classifier="sources")
    )
    for i in reqs
]

extended = []
for i in reqs:
    extended.append(i)
for i in req_sources:
    extended.append(i)

return cls(FrozenOrderedSet(sorted(i.to_metadata_str() for i in extended)))
Contributor Author:

Again, if source jars are enabled they would be added to the lockfile and the lockfile metadata as well.

    digests.append(cpe.digest)
for alse in applicable_lockfile_source_entries:
    new_file = FileEntry(alse.file_name, alse.file_digest)
    digest = await Get(Digest, CreateDigest([new_file]))
Contributor Author:

I'm aware all these individual CreateDigest calls would not be done one by one in a for loop; I just left everything as simple as I could to present the overall approach.

Contributor:

Then I suggest a TODO comment here as a reminder to just convert this to a single CreateDigest with a list of FileEntry.

@@ -114,7 +115,7 @@ class ThirdpartyModulesRequest:
@dataclass(frozen=True)
class ThirdpartyModules:
    resolve: CoursierResolveKey
    entries: dict[CoursierLockfileEntry, ClasspathEntry]
    entries: dict[CoursierLockfileEntry, tuple[ClasspathEntry, list[CoursierLockfileEntry]]]
Contributor:

The inner list should be a tuple instead since "frozen" dataclasses need to be hashable and list is not hashable (since it is mutable).
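The hashability point can be demonstrated directly: a frozen dataclass generates a `__hash__` that hashes its fields as a tuple, so a `list` field makes instances unhashable at runtime even though the class definition itself is accepted.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class WithTuple:
    deps: Tuple[str, ...]  # hashable field -> hashable instance

@dataclass(frozen=True)
class WithList:
    deps: list  # definition is accepted, but hash() will fail

tuple_hashable = isinstance(hash(WithTuple(("a",))), int)

try:
    hash(WithList(["a"]))
    list_hashable = True
except TypeError:
    list_hashable = False

print(tuple_hashable, list_hashable)  # True False
```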

Contributor:

Also, maybe it would be better to just use a new dataclass instead of a 2-tuple? Then you could give a name to each component which will help with readability.
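Such a dataclass might look like the sketch below. `ResolvedClasspathEntry` is a hypothetical name, and the two entry classes are minimal stubs, not the real Pants types; the tuple-typed field also addresses the hashability concern above.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ClasspathEntry:  # minimal stub for illustration
    path: str

@dataclass(frozen=True)
class CoursierLockfileEntry:  # minimal stub for illustration
    coord: str

# Hypothetical replacement for the anonymous (ClasspathEntry, [deps]) 2-tuple:
# each component gets a name, which helps readability at call sites.
@dataclass(frozen=True)
class ResolvedClasspathEntry:
    entry: ClasspathEntry
    dependencies: Tuple[CoursierLockfileEntry, ...]  # tuple keeps it hashable

r = ResolvedClasspathEntry(ClasspathEntry("lib.jar"), (CoursierLockfileEntry("g:a:1"),))
print(r.entry.path)  # lib.jar
```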

Contributor:

And is there any particular ordering of these CoursierLockfileEntry instances?

artifact_requirement_source = dataclasses.replace(
    artifact_requirement,
    coordinate=dataclasses.replace(
        artifact_requirement.coordinate, classifier="sources"
Contributor:

I recall from the PR description, that you had some concern about hard-coding to "sources" classifier. Seems fine to me. I suggest defining a set of constants for well-known classifier names though and using a constant here instead of the open-coded "sources" string.
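A sketch of the suggested constants (the constant names and the `Coord` stub are illustrative, not existing Pants identifiers):

```python
import dataclasses

# Hypothetical constants for well-known Maven classifiers, replacing
# open-coded strings at each call site.
SOURCES_CLASSIFIER = "sources"
JAVADOC_CLASSIFIER = "javadoc"

@dataclasses.dataclass(frozen=True)
class Coord:  # minimal stand-in for the real Coordinate type
    group: str
    artifact: str
    classifier: str = ""

src = dataclasses.replace(Coord("com.example", "lib"), classifier=SOURCES_CLASSIFIER)
print(src.classifier)  # sources
```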


coord = Coordinate.from_coord_str(dep.to_coord_str())
dep_artifact_requirement = ArtifactRequirement(coord)
dep_entry = get_entry_for_coord(lockfile, dep_artifact_requirement.coordinate)
dep_new_file = FileEntry(dep_entry.file_name, dep_entry.file_digest)
Contributor:

Conversion to FileEntry seems to occur more than once. Maybe define a to_file_entry method on the dep_entry?
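The suggested method might look like this; `LockfileEntry` and `FileEntry` are minimal stubs here, and `to_file_entry` is the hypothetical name the reviewer proposes, not an existing method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileEntry:  # stand-in for Pants' FileEntry
    path: str
    digest: str

@dataclass(frozen=True)
class LockfileEntry:  # stand-in for the lockfile-entry type
    file_name: str
    file_digest: str

    def to_file_entry(self) -> FileEntry:
        # Centralizes the repeated FileEntry(entry.file_name, entry.file_digest)
        # construction seen at several call sites in the hunk.
        return FileEntry(self.file_name, self.file_digest)

entry = LockfileEntry("lib.jar", "abc123")
print(entry.to_file_entry())  # FileEntry(path='lib.jar', digest='abc123')
```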


)
if ":jar:sources:" in dep['coord']:
    path = os.path.join(os.getcwd(), sys.argv[1] + ".empty")
    f = open(path, "w")
Contributor:

More idiomatic is to use with open(path, "w") as f: syntax instead.
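Applied to the snippet above, the context-manager form looks like this (using a temp directory here so the sketch is self-contained, rather than the script's `os.getcwd()`/`sys.argv` path):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.empty")

# `with` closes the file even if the write raises, replacing the
# open()/write()/close() triple from the PoC script.
with open(path, "w") as f:
    f.write("")

print(os.path.getsize(path))  # 0
```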

