Skip to content

gh-131591: Allow pdb to attach to a running process #132451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Apr 25, 2025

Conversation

godlygeek
Copy link
Contributor

@godlygeek godlygeek commented Apr 12, 2025

Initial proof-of-concept and design. This is lacking tests and documentation, but otherwise works well, and it would be very helpful if anyone interested can try it out!

It has been tested on macOS and Linux, though not yet on Windows.

@godlygeek
Copy link
Contributor Author

godlygeek commented Apr 12, 2025

Oh, one missing feature: this doesn't currently send a SIGINT to the PID when the client gets a ctrl-c.

That shouldn't be tough to add, at least for Linux and macOS - I'm not sure if something equivalent is doable on Windows or not?

EDIT: This has been implemented in a way that works for all platforms.

@pablogsal
Copy link
Member

CC @gaogaotiantian

@pablogsal
Copy link
Member

pablogsal commented Apr 12, 2025

@gaogaotiantian as promised this is the remote PDB initial implementation. Let’s try to get it ready for beta freeze 🤘

I suggest to try to get the basics landed first so we can iterate in parallel between all of us instead of dying on this PR

@pablogsal pablogsal requested a review from Copilot April 14, 2025 13:15
Copilot

This comment was marked as off-topic.

@gaogaotiantian
Copy link
Member

I agree that we should merge the basic stuff in first. It's a relatively long PR so I'll need some time :)

@pablogsal
Copy link
Member

I agree that we should merge the basic stuff in first. It's a relatively long PR so I'll need some time :)

Feel free to push to the PR if you have small nits or thinks like that to avoid review roundtripping

Copy link
Member

@gaogaotiantian gaogaotiantian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think the approach is good. I'm not the socket expert but it seems solid to me. I would like the pdb part to be even simpler - it would be much better to provide some basic version then add features when requested, than to promise a lot and realize some of them are too complicated to make it correct. The users would already be thrilled to have a basic attachable debugger that functions, even without some rarely used commands.

It would be a nice chance to understand what the users are actually using as well. Unsupported features are much better than buggy ones.

Lib/pdb.py Outdated
if opts.pid:
# If attaching to a remote pid, unrecognized arguments are not allowed.
# This will raise an error if there are extra unrecognized arguments.
parser.parse_args()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think attach is so different than the rest, we should only have one entry to parse and do the attach. parser.parse_args() is also a bit magical here. I think we should combine this piece and the one below and just check if -m or args exists. If not, print some easy to understand message. Then do attach. Put the attaching code together and as a short-cut.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also considered putting -p and -m in a ArgumentParser.add_mutually_exclusive_group but decided to err on the side of simplicity. That said, if you have a strong opinion on how best to do this, I'm happy to follow your lead. Want to push something that you're happy with?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of combining the two parts together:

    if opts.pid:
        # If attaching to a remote pid, unrecognized arguments are not allowed.
        # This will raise an error if there are extra unrecognized arguments.
        opts = parser.parse_args()
        if opts.module:
            parser.error("argument -m: not allowed with argument --pid")
        attach(opts.pid, opts.commands)
        return

    # Existing code
    opts, args = parser.parse_known_args()

Lib/pdb.py Outdated
ns: dict[str, typing.Any]


class _RemotePdb(Pdb):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we call the client _PdbClient, why not _PdbServer for this? I think it would clearer that they are a pair.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think _RemotePdb makes it clearer that it is-a Pdb in the OOP sense.

_PdbServer doesn't imply anything about whether it uses inheritance or composition, but _RemotePdb does, I think.

That said, it's a private class, so there's not much need to bikeshed on the name. If you prefer _PdbServer we can switch it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we really do this kind of OOP naming in stdlib. _RemotePdb is easily confused with a pdb that can remote to other processes. I still prefer _PdbServer as it's a clear pair with _PdbClient and it's obvious that it runs in the same process.

Lib/pdb.py Outdated
]
# fmt: on

self._send(commands_entry={"bpnum": bnum, "terminators": end_cmds})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to send these end commands back and forth? They are constants. Also end is a very important one (arguably the most important one as it's in the example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're constants, but they're not necessarily the same constants for the client and the server, since they can be running different versions of Python.

If the attaching client is running 3.14.9 and the remote PDB server is running 3.14.0, it's possible that the client will think that some command should terminate the command list, and send a command list to the server that the server thinks is incomplete because it doesn't recognize that terminator. If that happened, we'd be out of sync - the server would still be waiting for more lines, and the client would think that it's done sending lines.

That said, perhaps I'm being overly defensive, and we shouldn't expect any new command to ever be added after 3.X final. But the risk of getting it wrong is leaving you with a broken debugger session that might only be able to be fixed by killing the client.

Another possible option is to be defensive in the:

        for line in commands:
            if self.handle_command_def(line):
                break

loop below. We could say that loop must break, and that if it doesn't, it can only mean that we got a list that was unterminated, and so we should roll back to the old set of commands. If you prefer that over sending list of early-command-terminators back over the wire, I'm fine with that.

That'd just be something like (pseudocode):

        saved = old_commands_for_breakpoint(bnum)
        for line in commands:
            if self.handle_command_def(line):
                break
        else:
            self.error("Incomplete command list!")
            set_commands_for_breakpoint(bnum, saved)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the server would still be waiting for more lines, and the client would think that it's done sending lines.

That won't happen. When the client sent the commands as a list, the server will just finish the list and consider it done for the commands. Nothing worse would happen.

Also, I don't expect the list of resuming commands to change. And as discussed elsewhere, I don't think we should support cross version debugging.

I just tested it and CTRL+D does not work either (to terminate the commands).

Copy link
Contributor Author

@godlygeek godlygeek Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't happen. When the client sent the commands as a list, the server will just finish the list and consider it done for the commands. Nothing worse would happen.

Ah, you're right - I misremembered and thought that _RemotePdb sets self.commands_defining = True, but it doesn't (it used to in one of my earlier iterations).

OK, in that case, the only reason I can think of to send the list of resuming commands from the remote to the client is aliases. Currently I'm not handling that, and I'm just hardcoding, but in regular PDB you can:

(Pdb) alias blah cont
(Pdb) import functools
(Pdb) b functools.cache
Breakpoint 1 at /opt/bb/lib/python3.13/functools.py:676
(Pdb) commands
(com) p "hello from breakpoint"
(com) blah
(Pdb)

Note that blah ended the command list. The only way we can support that for remote PDB is to send the list of resuming commands from the server to the client. But, I also don't care much if we support that. As you say, if we don't, the only impact is that the user needs to a) use the regular command instead of the alias, or b) add an end command.

That seems fine to me. I think that having the resuming commands end the command list is a weird feature anyway, but I was aiming for feature parity.

So: if you care about aliases for resuming commands ending the commands list, I'll keep this code and update it to also send the list of aliases that expand to one of the commands in the list.

If you don't care about aliases for resuming commands ending the command list, I'm happy to hardcode the list client-side instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested it and CTRL+D does not work either (to terminate the commands).

Not for regular PDB either, so that's fine.

But ctrl-c just kills the whole remote pdb client, that's wrong.

Also end is a very important one

lol oops, that's what I get for focusing on edge cases 🤣

Fixed 'end' and ctrl-c in e44a670

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the only "correct" way to do this is to send the commands to the server one by one - you'll never get the correct result with client parsing. For example, x = 1;;continue should end the command prompt, and the current client does not do that. Unless you parse everything exactly as the server does, you won't get the same result.

For your blah example - the current code does not handle it, and it would be complicated to make it work properly. If we want to make it right, we need to simulate the send/receive line by line.

The worse case in the current implementation is that the user needs to end the command with an end - that's not too bad. We can do it for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, x = 1;;continue should end the command prompt, and the current client does not do that.

Oof, yeah. I didn't think about ;;

If we want to make it right, we need to simulate the send/receive line by line.

I can do that. It wouldn't be too tough, and it would wind up looking a lot like interact - set a flag for the "mode", and make it be possible for the client to return to command mode (on ctrl-c or ctrl-d) or for the server to (when the command list is terminated).

I understand how to do this, and it wouldn't be difficult, but it would be almost exactly as complex as interact, for better or worse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 600aa05

These were removed from the Pdb base class in b577460
The only thing _RemotePdb needs to customize is how the recursive
debugger is constructed. Have the base class provide a customization
point that we can use.
This makes us robust against changes in the temporary script.
@Zheaoli
Copy link
Contributor

Zheaoli commented Apr 16, 2025

To be honest, I think the TCP socket here is not good for the user. Here's the detail

For now, we use closing(socket.create_server(("localhost", 0))) to allocate a random port from the system. This means that the port can be a common port used by other services, like 3000, 9432, 9092, etc. So the service may can not bootup when the debug process has existed and the people may don't know why(I think we don't have documentation for this behavior yet)

So I think we may need to allow people setup a port allocate range to avoid the common port usage range.

But here's a better way to do this. Maybe we can let people choose to use the Unix Domain Socket(aka UDS) or the TCP Socket. For now, most of the OS platform has supported the UDS (even if Windows since 17063), So we can use UDS first and fallback to TCP if the OS platform not support UDS yet)

cc @gaogaotiantian

@godlygeek
Copy link
Contributor Author

For now, we use closing(socket.create_server(("localhost", 0))) to allocate a random port from the system. This means that the port can be a common port used by other services, like 3000, 9432, 9092, etc.

So I think we may need to allow people setup a port allocate range to avoid the common port usage range.

The common port range is avoided automatically on every operating system I know of.

POSIX says:

If the sin_port value passed to bind() is zero, the port number bound to the socket shall be one chosen by the implementation from an implementation-defined port range to produce an unused local address.

On Linux that implementation-defined port range is configurable via /proc/sys/net/ipv4/ip_local_port_range but the default range is 32768 to 60999.

On Windows the port range is 49152 to 65535.

I'm having trouble finding official docs for macOS, but StackOverflow says that it's configurable like on Linux, with a default range of 49152 to 65535.

So the service may can not bootup when the debug process has existed and the people may don't know why

Statically configuring a service to start on a port that's allocated to the pool of ephemeral ports is incorrect. The configuration should be corrected, rather than assuming that no other software on the machine ever uses the widely portable feature of supplying bind() with 0 as the port.

@Zheaoli
Copy link
Contributor

Zheaoli commented Apr 16, 2025

Statically configuring a service to start on a port that's allocated to the pool of ephemeral ports is incorrect. The configuration should be corrected, rather than assuming that no other software on the machine ever uses the widely portable feature of supplying bind() with 0 as the port.

This is correct, I agree with your idea. But my concern is the different Linux distro may have different default local port range out of box. If we still use TCP rather than the UDS, I think we may need mention this behavior on the documentation.

But in my side, I think the random server port is not a common usage for socket programming. When we try to communicate between the processes, UDS should be a better way to reach the target. It's more effective and nor more port conflict

@gaogaotiantian
Copy link
Member

Okay I tried to read the protocol for interact and breakpoint commands. Here are my thoughts.

I think the protocol could be simpler and easier to maintain if we reuse more of the existing structure of pdb. We can't fully mimic a stdin/stdout because that could be difficult to parse, but I don't think we need to make the state of pdb controllable from both client and server. I don't think the client needs to send different commands for "interact", "commands" and "breakpoint commands" (which is very unfortunate because it's called commands in pdb but commands is used in the protocol for normal pdb commands).

I think RemotePdb should be the only object that maintains the state, and it tells PdbClient what state it's in - either "normal"? (there might be a better word for it), "interact" or "commands". That should be an enum string that's sent through prompt protocol - it's possible we have more in the future, they'll all be a kind of states and we have a single source of truth.

Then, with the prompt protocol, the server sends this information to the client, and the client display prompts based on the mode (I don't even know if prompt itself is needed, might be). The client should only listen to the server about what the current state is, and behave solely based on that.

In order to do this, we need to give the client the capability to send "control signals" to server, mainly EOF and KeyboardInterrupt - we need a separate prototol keyword for it. This could unify the bahavior on remote pdb. For example, the commands command actually stops from Ctrl+D - it's not obvious, but try it out.

I believe with this protocol, we can do minimum changes to make commands work - not on the client side, but on server side. We shouldn't need to duplicate all the parsing loops on the client side. The client side should almost only do - listen, display, send what user types (or control signal), and listen again.

@pablogsal
Copy link
Member

pablogsal commented Apr 23, 2025

Of course not. I'm just wondering if we should fix this in this PR.

I don't think so. This probably requires some discussion on how to handle it and we are already on the limit here. I would prefer to target this fix with a different PR where the discussion is ONLY around this problem. AS it is a bug, we also technically have after feature freeze if we want but we probably want to build more things on top of this to polish the experience and that MUST be done before freeze

@gaogaotiantian
Copy link
Member

Okay sure. I'm good with creating an issue for the bug so we won't forget it and working on landing this PR.

@gaogaotiantian
Copy link
Member

Oh, we definitely need a whatsnew entry for it.

@pablogsal
Copy link
Member

Oh, we definitely need a whatsnew entry for it.

I am working on more tests, will be pushing them soon

@pablogsal
Copy link
Member

@gaogaotiantian what's new entry added and added some tests that cover most of what is not super complicated to test. Do you mind formally approving so I know you are fine to land and then we can open new issues for the different things we want to improve.

@godlygeek will open a issue as soon as we land for the infinite loop problem.

def test_protocol_version(self):
"""Test that incompatible protocol versions are properly detected."""
# Create a script using an incompatible protocol version
script = f"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do textwrap.dedent() for all the scripts? So it's visually indented properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaogaotiantian
Copy link
Member

Is the test flaky? That's something I really want to avoid because everyone is going to be disrupted by flaky tests. I'll run the tests a few times and see if it's stable.

@gaogaotiantian
Copy link
Member

It failed for a second time, that is worth checking. It's the keyboard interrupt one.

@pablogsal
Copy link
Member

It failed for a second time, that is worth checking. It's the keyboard interrupt one.

Will check tomorrow

@pablogsal pablogsal force-pushed the remote_pdb branch 3 times, most recently from 28e5992 to 659556f Compare April 24, 2025 16:20
@pablogsal
Copy link
Member

@gaogaotiantian Ok the test is fixed. I plan to land this tomorrow unless you have any other concern.

@gaogaotiantian
Copy link
Member

Sure let's land this first and polish it later. We still have some work to do but we don't have to do them all together immediately. Thanks for the work, this is a very cool feature for pdb.

@pablogsal
Copy link
Member

Awesome! Thanks for the thorough review @gaogaotiantian! This was not easy (it's 1,330 of very tricky code and more than a hundred comments) and you have been not only very thorough, patient and understanding and I am happy we have been able to navigate a plan. pdb is certainly in good hands :)

Let's iterate together on polishing the feature! ❤️

@pablogsal pablogsal merged commit 797b29b into python:main Apr 25, 2025
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants