fix: get correct source from redefined class #704

Open
Seraphli wants to merge 4 commits into base: master

Seraphli

Summary

Fixes #703

Checklist

Documentation and Tests

  • Added relevant tests that run with python tests/__main__.py, and pass.

Release Management

  • Added "Fixes #NNN" in the PR body, referencing the issue (#NNN) it closes.
  • Added a comment to issue #NNN, linking back to this PR.
  • Added rationale for any breakage of backwards compatibility.
  • Requested a review.

@Seraphli
Author

I ran the tests against the current main branch on Python 3.9.9 on my local Linux machine and got the same errors. That confirms these errors are not introduced by this PR.

@mmckerns
Member

Here's a bit of an out-of-the-box thought regarding this PR. What if you were to pickle the class, then get the source from the pickled instance? dill isolates the (class/function) context and pickles all of that, so the correct instance could, at the very least, be confirmed by what gets pickled.

@Seraphli
Author

Seraphli commented Mar 1, 2025

I don’t quite understand the scenario you're describing. Do you have any test cases? What exactly do you mean by 'get the source from the pickled instance'? I've never performed such an operation before, so I'm not entirely clear on the situation you're referring to.

@Seraphli
Author

Seraphli commented Mar 2, 2025

I found that wrapping the getsource call in a function causes a new problem. I fixed that problem in the new commit.

@mmckerns
Member

mmckerns commented Mar 3, 2025

See #243 for an example of serializing the class definition with the instance.

@Seraphli
Author

Seraphli commented Mar 7, 2025

I looked into the issue you mentioned and tested a small example:

import dill

class Foo(object):
    y = 1

_Foo_bits = dill.dumps(Foo)

class Foo(object):
    z = 1

_Foo = dill.loads(_Foo_bits)
print(dill.source.getsource(_Foo))
print(_Foo.y)

I observed that the current code returns:

class Foo(object):
    z = 1
1  

Are you expecting it to return:

class Foo(object):
    y = 1
1  

instead?

@Seraphli
Author

Seraphli commented Mar 7, 2025

import dill
import inspect


class Foo(object):
    def foo(self):
        return self.x

    x = 1


class Foo(object):
    def bar(self):
        return self.y

    y = 1


_Foo_bits = dill.dumps(Foo)


class Foo(object):
    def foo(self):
        return self.z

    z = 1


_Foo = dill.loads(_Foo_bits)
print(dill.source.getsource(_Foo))
print(inspect.getsource(_Foo))
print(_Foo.y)
f = _Foo()
print(f.bar())

The result is

class Foo(object):
    def foo(self):
        return self.z

    z = 1

class Foo(object):
    def foo(self):
        return self.x

    x = 1

1
1

I guess in this situation, the only solution is to save the code when pickling the class object. I can't think of any other approaches.

There might be another, more complex solution: we could first locate the invocation points and analyze the source file's AST to trace the origin of class variables. However, this would inevitably lead to another issue: if the variable is imported from another module, there is no way to trace its origin by analyzing just one file's AST.
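The AST idea above can be illustrated with a small standard-library sketch. It is not the code from this PR: the helper name `class_source_by_method_line` and the in-memory `SOURCE` string are hypothetical, chosen only for the illustration. It assumes the class has at least one method, so that the method's `co_firstlineno` can identify which of several same-named `ClassDef` nodes the class object came from.

```python
import ast

# Hypothetical module text with a redefined class, mirroring the example above.
SOURCE = '''\
class Foo(object):
    def foo(self):
        return self.x

    x = 1


class Foo(object):
    def bar(self):
        return self.y

    y = 1
'''


def class_source_by_method_line(source, cls):
    """Return the text of the ClassDef that encloses one of cls's methods."""
    # First line numbers of the plain functions defined directly on the class.
    method_lines = [
        member.__code__.co_firstlineno
        for member in vars(cls).values()
        if hasattr(member, "__code__")
    ]
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == cls.__name__:
            # Pick the definition whose line span contains a method of cls.
            if any(node.lineno <= line <= node.end_lineno for line in method_lines):
                return ast.get_source_segment(source, node)
    return None  # nothing to anchor on, e.g. a class with only attributes


namespace = {}
exec(compile(SOURCE, "<demo>", "exec"), namespace)
second_foo = namespace["Foo"]  # the name is bound to the last definition

found = class_source_by_method_line(SOURCE, second_foo)
print(found)
```

A class that only defines attributes (such as `y = 1` alone) has no code object to match against, so this sketch returns `None` for it. That is the case where saving the source at pickling time, or inspecting the serialized object, would still be needed.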

@mmckerns
Member

mmckerns commented Mar 7, 2025

instead?

What I was alluding to from that issue is that dill serializes the class definition, and you can control whether the serialized class definition is used with a setting (keyword) on load. However, getsource always uses the buffer, and I'm wondering if, instead of trying the buffer first, it could first inspect the serialized object. The serialized object contains all of the necessary code and dependencies.

@Seraphli
Author

Seraphli commented Mar 7, 2025

Is there a way to extract the code from a serialized class object? I didn't find any API for it in dill. In my case, I use a decorator that modifies the code of a class using its AST, then returns the result to replace the original class definition. So I need dill.source.getsource to work while Python is still going through the class definitions. If there is a way to extract the code from a serialized class object in the current main branch, I would like to use that. It would solve my case.

@mmckerns
Member

mmckerns commented Mar 9, 2025

No, there isn't a way to do it currently, or I would have pointed you at it. My point was, if it can be done, you don't have to rely on searching the entire buffer.

If you use dill.source.getsource(obj), where obj is a serialized object, you get this: "import dill\ndill.loads(...pickled...)" where ...pickled... is the pickled string.

Successfully merging this pull request may close these issues.

getsource not get correct source code
2 participants