Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Jupyter Notebook #264

Merged
merged 4 commits into from
Nov 8, 2023
Merged

Add support for Jupyter Notebook #264

merged 4 commits into from
Nov 8, 2023

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Oct 5, 2023

Summary

This PR adds support for Jupyter Notebook. It requires client support for LSP 3.17 which contains the Notebook support.

Implementation

Context

  • Document: LSP type representing a text file (Python file for Ruff).
  • TextDocument: pygls representation of the LSP Document. This is an abstraction created from a Document which provides some useful methods like getting the file path, source code, etc.
  • New in 3.17: NotebookDocument type was added to represent a Notebook which consists of a list of cells (NotebookCell). Note that these are all LSP types coming from lsprotocol.
  • In pygls, a Notebook cell is represented as a text document (TextDocument).

There are methods provided by pygls to get the object:

  • get_text_document - Returns a TextDocument which either represents a Python file or a Notebook cell
  • get_notebook_document - Returns a NotebookDocument either using the Notebook URI or a cell URI. For cell URI, it returns the NotebookDocument containing the cell.

Document

A new Document type was created to facilitate the implementation. This represents either a Python file, a Notebook or a Notebook cell. There are various constructor methods which should be used to create this type:

  • For a URI representing a Python file, use either from_uri or from_text_document.
  • For a URI representing a Notebook file, use either from_uri or from_notebook_document.
  • For a URI representing a Notebook cell, use either from_cell_or_text_uri or from_notebook_cell.

Notebook JSON

Ruff expects the source content of a Notebook file to be in JSON format following the [Notebook format specification] but the protocol uses it's own abstraction and doesn't store the JSON format. This means that we need to create a JSON string representing the Notebook from the available information. This doesn't need all the information as Ruff only uses the cell source and version information. So, we create a minimal JSON string representing the Notebook document and pass it to Ruff.

An example JSON string representing a Notebook Document:

{
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5,
  "cells": [
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "import random\nimport math"
    },
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "try:\n    print()\nexcept ValueError:\n    pass"
    },
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "import random\nimport pprint\n\nrandom.randint(10, 20)"
    },
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "foo = 1\nif foo is 1:\n    msg = f\"Invalid foo: {foo}\"\n    raise ValueError(msg)"
    }
  ]
}

We need to pass in every cell including the markdown cell to get an accurate information like the cell number.

For the cell document kind, the source value is a JSON string containing just a single code cell. This is required as code actions and formatting work at both cell and notebook level.

Configuration

For VSCode users, the notebook.* configuration is used to run the formatter or code actions on save:

{
  // Enable formatting the entire Notebook on save
  "notebook.formatOnSave.enabled": true,
  // Run the enabled code actions on the entire Notebook on save
  "notebook.codeActionsOnSave": {
    "source.fixAll": true,
    "source.organizeImports.ruff": true
  },
}

The way the above settings work in VSCode is that the editor runs the actions in parallel for every cell. This has the illusion that it was run on the entire Notebook. The commands defined by us (Ruff: Organize imports and Ruff: Fix all auto-fixable problems) are run on the entire Notebook at once. This is important because in the latter case the ruff command is invoked n number of times where n is the number of cells while for the former it's run only once.

Commands

Builtin

  • Ruff: Organize Imports: Works at Notebook level
  • Ruff: Fix all auto-fixable problems: Works at Notebook level

VSCode specifc

  • Format Cell: Formats the current cell
  • Notebook: Format Notebook: Formats the entire Notebook by running the formatter for every cell
  • Organize Imports: Runs the source.organizeImports code action on every cell in parallel
  • Fix All: Runs the source.fixAll code action on every cell in parallel

Feature checklist

  • Code actions
    • Organize imports
    • Fix all
    • Each fixable diagnostics
    • Disable rule comment
  • Code action resolve
  • Commands
    • ruff.applyAutofix
    • ruff.applyOrganizeImports
    • ruff.applyFormat
  • Diagnostics
    • On open
    • On close
    • On save
    • On change
  • Formatting
  • Hover

Test Plan

Manually testing for all the features mentioned above.

How to run this locally?

  1. Clone https://github.com/astral-sh/ruff-lsp and https://github.com/astral-sh/ruff-vscode in the same directory
  2. Checkout this branch git checkout dhruv/notebook in the ruff-lsp repository
  3. Install the requirements for both repositories
  4. For ruff-vscode, uninstall ruff-lsp (pip uninstall --yes ruff-lsp) as we'd want to use the local version. To install the local ruff-lsp version in ruff-vscode, follow Modifying the LSP.
  5. Open VSCode from ruff-vscode directory -> "Run and Debug" section from the sidebar -> "Debug Extension and Python" config.

This will then open a VSCode development session which can be used to test out the notebook features.

Test notebooks:

Requires: astral-sh/ruff#7664 which was released in v0.1.0

fixes: #267
closes: astral-sh/ruff-vscode#256
closes: astral-sh/ruff-vscode#314
closes: astral-sh/ruff-vscode#51

@dhruvmanila
Copy link
Member Author

dhruvmanila commented Oct 5, 2023

Base automatically changed from dhruv/rename-deprecated to main October 5, 2023 17:08
@dhruvmanila dhruvmanila force-pushed the dhruv/notebook branch 3 times, most recently from 485b941 to 63d0a83 Compare October 10, 2023 07:59
ruff_lsp/server.py Show resolved Hide resolved

settings = _get_settings_by_document(document.path)

if utils.is_stdlib_file(document.path):
if document.is_stdlib_file():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also provide notebook specific code-actions. These are prefixed by namespace notebook. in code action kind.

@dhruvmanila dhruvmanila changed the base branch from main to dhruv/fix-test October 17, 2023 04:27
Base automatically changed from dhruv/fix-test to main October 17, 2023 13:55
ruff_lsp/server.py Show resolved Hide resolved
ruff_lsp/server.py Show resolved Hide resolved
Comment on lines 272 to 280
notebook_document = LSP_SERVER.workspace.get_notebook_document(cell_uri=uri)
if notebook_document is None:
notebook_document = LSP_SERVER.workspace.get_notebook_document(
notebook_uri=uri
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use:

LSP_SERVER.workspace.get_notebook_document(notebook_uri=uri, cell_uri=uri)

Because the method uses either notebook_uri or cell_uri where notebook_uri is preferred over cell_uri. While our implementation will first try the cell_uri and then the notebook_uri.

ruff_lsp/server.py Show resolved Hide resolved
ruff_lsp/server.py Show resolved Hide resolved

This works for both Python files and Notebook Documents. For Notebook Documents,
the hover works at the cell level.
"""
document = LSP_SERVER.workspace.get_text_document(params.text_document.uri)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is either a Python file or a Notebook cell.

Comment on lines 813 to 867
document = Document.from_text_document(
LSP_SERVER.workspace.get_text_document(params.text_document.uri)
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we creating the document again here when it's already constructed above at the start of the function?

This function is a generic implementation to create code actions at both source and line level. The document created above can only be used to create source level code actions as it represents either a Python file or an entire Notebook.

Here, we need a document which represents a Python file or an individual cell because we need to get the line level code action. Here, we already have the diagnostics information which is why this works as otherwise we would've been running Ruff on an individual cell.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind adding something to this effect as a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've simplified this part such that each branch uses the document through our custom abstraction or the one from pygls.

Comment on lines -760 to +1138
def _result_to_edits(
document: workspace.TextDocument,
result: RunResult | None,
) -> list[TextEdit]:
def _result_to_workspace_edit(
document: Document, result: RunResult | None
) -> WorkspaceEdit | None:
"""Converts a run result to a WorkspaceEdit."""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source level code actions should apply the fix for every cell in a Notebook. Earlier the API was to create a single text edit to replace the entire content of a text document with the fixed source, create a workspace edit corresponding the text edit and send it back. This behavior is preserved when the document kind is Python.

But, for a Notebook, we need to create text edits for each cell where the fix exists which is why this was changed.

ruff_lsp/server.py Outdated Show resolved Hide resolved
Comment on lines 1130 to 1203
if new_source.endswith("\r\n"):
new_source = new_source[:-2]
elif new_source.endswith("\n"):
new_source = new_source[:-1]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a TODO for me to check if this is actually required now. Ruff adds and removes the newline at the end of every cell but I need to verify that behavior. I'm not sure what kind of example can be used here. I might have to check with a Ruff version when Jupyter Notebook support was not present.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Would love to get a second look at this because I'm unfamiliar with our Python LSP implementation

ruff_lsp/settings.py Show resolved Hide resolved
cells: list[CodeCell | MarkdownCell]


def _create_notebook_json(notebook_document: NotebookDocument) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, this feels kind of expensive. Do we run this on every keystroke or only on save? How does it perform for large notebooks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good questions. What about this function feels expensive? I guess the json.dumps part?

Do we run this on every keystroke or only on save?

It depends on user settings but it could run on every keystroke when "run": "onType".

How does it perform for large notebooks?

I'll test it out.

We could potentially create our own layer of notebook synchronization but it feels not worth the effort if we're going to move the LSP to Rust.

Context: openlawlibrary/pygls#394

Comment on lines 480 to 515
# Publish diagnostics for every code cell, replacing the previous diagnostics.
for cell_idx, cell in enumerate(notebook_document.cells):
if cell.kind is not NotebookCellKind.Code:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding this explanation to the inline comment

ruff_lsp/server.py Show resolved Hide resolved
ruff_lsp/server.py Show resolved Hide resolved

def is_notebook_file(self) -> bool:
"""Return True if the document belongs to a Notebook or a cell in a Notebook."""
return self.kind is DocumentKind.Notebook or self.path.endswith(".ipynb")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so the latter case is a cell? We use DocumentKind.Text?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although I'm thinking of adding it's own variant (#264 (comment))

Comment on lines 813 to 867
document = Document.from_text_document(
LSP_SERVER.workspace.get_text_document(params.text_document.uri)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind adding something to this effect as a comment?

ruff_lsp/server.py Outdated Show resolved Hide resolved
document = Document.from_uri(uri)
workspace_edit = await _fix_document_impl(document)
if workspace_edit is None:
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a behavior change? Like does this action now not show up in some cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a behavior change?

No, it's not.

Like does this action now not show up in some cases?

No, this action is registered and always shown on the client side using the VSCode extension. When invoked through VSCode, the code executes this command registered in the LSP.

ruff_lsp/server.py Outdated Show resolved Hide resolved
@@ -1192,13 +1539,9 @@ async def _run_check_on_document(
)


async def _run_format_on_document(document: workspace.TextDocument) -> RunResult | None:
async def _run_format_on_document(document: Document) -> RunResult | None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, so this works interchangeably with Python files and notebooks now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the Document object creates an abstraction to store all the information required in the format (specifically the source) it is required.

workspace_edit = _result_to_workspace_edit(document, results)
if workspace_edit is None:
return
LSP_SERVER.apply_edit(workspace_edit, "Ruff: Format document")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you walk me through what happens if we receive a notebook cell here? How does that interact with _run_format_on_document?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The applyFormat command works at the Notebook level.

For a Notebook the arguments is going to contain a single entry which will be a TextDocument for the current cell (active cell or the cell containing the cursor). We'll construct a Document which represents either a Python file or a Notebook.

After the formatting is done, the _result_to_workspace_edit will generate a single TextDocumentEdit for the Python file or a list of TextDocumentEdit for each cell in a Notebook. These are then a part of WorkspaceEdit which are then applied.

@dhruvmanila
Copy link
Member Author

Thanks for the reviews! I just realized while playing around with the Format Cell capability that the cell formatting doesn't work as it should've. The reason is that each cell is represented as a text document with the cell source and our abstraction will not construct a JSON representation for it.

There are 2 ways to go about solving this:

  1. Implement the CLI flag as described here Break in compatibility with jupyterlab-lsp + python-lsp-ruff since 0.0.285 ruff#6847 (--extension ipynb=py)
  2. Add a new dedicated DocumentKind.Cell variant in the Document abstraction where the source will be the JSON representation containing a single cell. This will require some changes in translating the result into TextEdit to account for this new variant.

(Sorry for the churn, I must've missed this in my earlier testing and I wasn't aware of the Format Cell action 😓)

Comment on lines 296 to 298
def is_notebook_cell(self) -> bool:
"""Return True if the document belongs to a cell in a Notebook."""
return self.kind is DocumentKind.Text and self.path.endswith(".ipynb")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to is_notebook_file. We represent a cell as Text because that's what pygls does as well using TextDocument.

Comment on lines 795 to 812
workspace_edit = await _fix_document_impl(
Document.from_uri(params.text_document.uri), only="I001"
)
if workspace_edit:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Document creation is done only when required as it's eager to read the source text or construct the JSON notebook format.

workspace_edit = _result_to_workspace_edit(document, results)
if workspace_edit is None:
return
LSP_SERVER.apply_edit(workspace_edit, "Ruff: Format document")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The applyFormat command works at the Notebook level.

For a Notebook the arguments is going to contain a single entry which will be a TextDocument for the current cell (active cell or the cell containing the cursor). We'll construct a Document which represents either a Python file or a Notebook.

After the formatting is done, the _result_to_workspace_edit will generate a single TextDocumentEdit for the Python file or a list of TextDocumentEdit for each cell in a Notebook. These are then a part of WorkspaceEdit which are then applied.

Comment on lines 1566 to 1572
class DocumentSource(NamedTuple):
"""The source of a document."""

path: str
"""The path to the document."""

text: str
"""The text of the document."""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that we can pass a different source to Ruff than the one in the document. For example, in Format Cell document, we'll pass the notebook representation containing a single cell and then compare against the original source. Refer to format_document function.

@dhruvmanila
Copy link
Member Author

Demo

Formatting

  • Using Format Cell
  • Using Ruff: Format Document
notebook_formatting.mov

Code Actions

  • Fix individual code action
  • Disable rule comment
  • Fix all
  • Format imports
notebook_code_actions.mov

Diagnostics

  • OnType
diagnostics_on_type.mov
  • OnSave
diagnostics_on_save.mov

Hover

notebook_hover.mov

@charliermarsh
Copy link
Member

(@dhruvmanila - FYI looks like a CI failure here.)

@dhruvmanila
Copy link
Member Author

(@dhruvmanila - FYI looks like a CI failure here.)

Sorry, it's fixed.

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work, and the thorough testing -- gives me a lot of confidence.

@DonnyVerduijn
Copy link

LGTM

Comment on lines +232 to +233
Cell = enum.auto()
"""A cell in a Notebook Document."""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to introduce the cell specific document kind now that they're being used in multiple places.

Comment on lines +281 to +288
@classmethod
def from_cell_or_text_uri(cls, uri: str) -> Self:
"""Create a `Document` representing either a Python file or a Notebook cell from
the given URI.

The function will try to get the Notebook cell first, and if there's no cell
with the given URI, it will fallback to the text document.
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there's any better name for this.

Comment on lines +833 to +837
workspace_edit = await _fix_document_impl(
Document.from_cell_or_text_uri(params.text_document.uri),
only="I001",
)
if workspace_edit:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code actions now work at cell level for a Notebook. This is consistent with everything else in the VS Code ecosystem and a user can use builtin commands (Ruff:) to run an action on the entire notebook.

Comment on lines +1202 to +1208
def _result_single_cell_notebook_to_edits(
document: Document, result: RunResult
) -> list[TextEdit] | None:
"""Converts a run result to a list of TextEdits.

if new_source == document.source:
The result is expected to be a single cell Notebook Document.
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function exists because for formatting provider, we need to return a list of TextEdit while for the command we need to apply a WorkspaceEdit.

@dhruvmanila
Copy link
Member Author

(@charliermarsh sorry for asking a review after approving, this is the final commit :))

I've pushed it as a separate commit to facilitate the review process: 244a246

Updates since the last review

The major update is that now the code actions work at cell level. Earlier, they were applied at the Notebook level. My thinking before was that source level code actions should be applied on the entire Notebook but it doesn't work well with VS Code settings, specifically the following:

{
  // Enable formatting the entire Notebook on save
  "notebook.formatOnSave.enabled": true,
  // Run the enabled code actions on the entire Notebook on save
  "notebook.codeActionsOnSave": {
    "source.fixAll": true,
    "source.organizeImports.ruff": true
  },
}

This also means that the generic Organize Imports / Fix All actions in VS Code works only for the current cell which is again consistent with the isort VS Code extension.

@charliermarsh
Copy link
Member

I'll review and approve this today, but before doing so I'll also check it out and poke around locally.

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, tested this out myself too.

@dhruvmanila dhruvmanila merged commit 31f3836 into main Nov 8, 2023
20 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/notebook branch November 8, 2023 02:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants