Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SourceFileLoader does not (fully) accept path-like objects #87005

Closed
favonia mannequin opened this issue Jan 6, 2021 · 17 comments
Closed

SourceFileLoader does not (fully) accept path-like objects #87005

favonia mannequin opened this issue Jan 6, 2021 · 17 comments
Labels
3.9 only security fixes stdlib Python modules in the Lib dir topic-importlib type-feature A feature request or enhancement

Comments

@favonia
Copy link
Mannequin

favonia mannequin commented Jan 6, 2021

BPO 42839
Nosy @gvanrossum, @brettcannon, @encukou, @FFY00, @favonia

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2021-01-06.14:32:12.762>
labels = ['type-feature', 'library', '3.9']
title = 'SourceFileLoader does not (fully) accept path-like objects'
updated_at = <Date 2021-09-07.14:10:09.707>
user = 'https://github.com/favonia'

bugs.python.org fields:

activity = <Date 2021-09-07.14:10:09.707>
actor = 'petr.viktorin'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2021-01-06.14:32:12.762>
creator = 'favonia'
dependencies = []
files = []
hgrepos = []
issue_num = 42839
keywords = []
message_count = 5.0
messages = ['384504', '384524', '384531', '384532', '401280']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'brett.cannon', 'petr.viktorin', 'FFY00', 'favonia']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue42839'
versions = ['Python 3.9']

Linked PRs

@favonia
Copy link
Mannequin Author

favonia mannequin commented Jan 6, 2021

If one uses the loader created by importlib.machinery.SourceFileLoader(module_name, path) in a meta path finder, where path is not a str, then the following error would happen at the moment the module is being imported. Note that, the error would not occur if the corresponding bytecode cache (pycache/*.pyc) is present.

File "/usr/lib/python3.9/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 786, in exec_module
File "<frozen importlib._bootstrap_external>", line 932, in get_code
File "<frozen importlib._bootstrap_external>", line 604, in _code_to_timestamp_pyc
ValueError: unmarshallable object

Here is an example that could trigger the error. The package unipath is not very important except that unipath.Path is not marshallable. (Any library with a fancy, unmarshallable path-like object would work.) The choice of 'local' and the use of 'os.getcwd()' are also irrelevant. The actual production code is quite complicated, using different path-like objects and convoluted logic. I made up this example from scratch.

import os
import sys
import importlib
import importlib.abc
import importlib.util
from unipath import Path # just an example

class DummyFinder(importlib.abc.MetaPathFinder):
    def __init__(self):
        self.cwd = os.getcwd()
    def find_spec(self, fullname, path, target=None):
        if fullname == 'local':
            initpath = os.path.join(self.cwd, '__init__.py')
            if os.path.isfile(initpath):
                loader = importlib.machinery.SourceFileLoader(fullname, Path(initpath)) # some non-str Path-like object here
                return importlib.util.spec_from_file_location(fullname, initpath,
                        loader = loader, submodule_search_locations=[self.cwd])
        else:
            return None

sys.meta_path.append(DummyFinder())
importlib.import_module("local.test2")

If this is indeed a bug, there might be other classes and functions in importlib that share the same problem.

@favonia favonia mannequin added 3.9 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 6, 2021
@gvanrossum
Copy link
Member

Maybe Brett can help?

@brettcannon
Copy link
Member

importlib is probably not os.PathLike-clean due to its bootstrapping restrictions of not getting to use anything implemented in Python from 'os' (i.e. if it isn't being managed in posixmodule.c then it probably won't work).

If you follow the traceback it's trying to marshal a code object for the eventual .pyc file and failing (

data.extend(marshal.dumps(code))
). The real question is why is any unmarshallable object getting passed in the first place since that object is the code object that compile() returned.

Best guess? The compile() function is being given the path-like object (via

def source_to_code(self, data, path, *, _optimize=-1):
) and it's blindly setting it on the code object itself, and then marhsal fails since it can't handle pathlib.Path. If my hunch is right, then the possible solutions are:

  • Don't do that 😉
  • Make compile() aware of path-like objects
  • Have importlib explicitly work around compile()'s shortcoming by doing the right thing for path-like objects before passing in the the 'path' argument.

@gvanrossum
Copy link
Member

If Brett's theory is right, compile() has a bug though -- it shouldn't plug a non-string into the code object. Or possibly the code object constructor should reject such objects (or extract the string).

@encukou
Copy link
Member

encukou commented Sep 7, 2021

I just filed the slightly more general bpo-45127 ("Code objects can contain unmarshallable objects").

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@ChrisBarker-NOAA
Copy link
Contributor

ChrisBarker-NOAA commented Jun 14, 2024

This hasn't been touched for a while, but I just ran into the issue.

I'm trying to dynamically load a module from a file on disk with a path provided by code.

using:

config = SourceFileLoader("config", config_file_path).load_module()

if config_file_path is a Path object, I get:

  File "<frozen importlib._bootstrap_external>", line 605, in _check_name_wrapper
  File "<frozen importlib._bootstrap_external>", line 1120, in load_module
  File "<frozen importlib._bootstrap_external>", line 945, in load_module
  File "<frozen importlib._bootstrap>", line 284, in _load_module_shim
  File "<frozen importlib._bootstrap>", line 433, in spec_from_loader
  File "<frozen importlib._bootstrap_external>", line 833, in spec_from_file_location
  File "<frozen importlib._bootstrap_external>", line 926, in is_package
  File "<frozen importlib._bootstrap_external>", line 134, in _path_split
  File "<frozen importlib._bootstrap_external>", line 134, in <genexpr>
AttributeError: 'PosixPath' object has no attribute 'rfind'

which is all as expected, and why this issue exists.

So: would it be possible to simply call: os.fspath() up front? (it's been a couple years, maybe something has changed?)

Or, as Brett suggested: "due to its bootstrapping restrictions of not getting to use anything implemented in Python from 'os' " is still the case.

Is there a simpler option: Could you do str() on it? -- not as robust, but would work for Path objects. Or even something specific to Path objects -- poke into the object itself?

If It's simply not possible, then it could be clearly documented that it only takes a string path, and this issue can be closed as won't fix.

@brettcannon
Copy link
Member

Or, as Brett suggested: "due to its bootstrapping restrictions of not getting to use anything implemented in Python from 'os' " is still the case.

That's still the case.

Is there a simpler option: Could you do str() on it? -- not as robust, but would work for Path objects.

That exact sort of shortcut is why __fspath__ exists, so I don't think it's a good solution.

it could be clearly documented that it only takes a string path, and this issue can be closed as won't fix.

I would be fine with that unless someone wants to put in the effort to do the work to make importlib support path-like objects.

@ChrisBarker-NOAA
Copy link
Contributor

That exact sort of shortcut is why fspath exists, so I don't think it's a good solution.

sure -- which is what I meant by "not robust" -- but I think supporting the PathLike in the stdlib would be a win, even if no others would work.

NOTE: I see in FileLoader.get_data

    def get_data(self, path):
        """Return the data from path as raw bytes."""
        if isinstance(self, (SourceLoader, ExtensionFileLoader)):
            with _io.open_code(str(path)) as file:
                return file.read()
        else:
            with _io.FileIO(path, 'r') as file:
                return file.read()

It is wrapping the path in str() -- but it failed ( I think) in _check_name

Looking closer, making all of importlib PathLike-safe would be a big (and maybe not practical at all) lift, but making SourceFileLoader accept a PathLike could be as simple as something like:

class FileLoader:

    """Base file loader class which implements the loader protocol methods that
    require file system usage."""

    def __init__(self, fullname, path):
        """Cache the module name and the path to the file found by the
        finder."""
        self.name = fullname
        try:
            self.path = path.__fspath__()
        except AttributeError:
            self.path = path

which would then accept anything with an __fspath__, and anything else would fail in the same way it currently does.

However, thinking now -- importlib is the low-level machinery for importing -- it makes sense that it's working with simple string paths.

So really, the only reason that folks might want to use SourceFileLoader with a PathLike is that they are trying to load a module directly from their own code. In which case,

from importlib.machinery import SourceFileLoader
config = SourceFileLoader("config", config_file_path).load_module()

Is not a very user-friendly way to do it -- it sure feels like poking into internals in a way I shouldn't be. And if I'm reading the current source correctly load_module() is deprecated, and I should be using .exec_module() instead (though that's not a direct replacement). But the above is the advice that's scattered over the internet.

So, maybe what we should do is add a function to importlib along the lines of:

def load_module_from_path(name, path):
    """
    load a module from a file on disk
    
    :param name: name of the module
    :type name: str

    :param path: path to the file
    :type path: PathLike

    :returns: instantiated module object
    """
    import os  # import here, so that it only gets imported if needed.
    path = os.fspath(path)
    module = SourceFileLoader(name, path).load_module()
    return module

Which, in fact, is a lot like the function I put in my code for this very reason.

NOTE: It seems the non-deprecated way to do this is to use .exec_module() -- but I couldn't in a hour or so figure out how to do that [*]-- exec_module requires a module already, and create_module seem so be a no-op in the FileLoader class -- so ??

Which makes a utility function like the above all the more a good idea.

[*] importlib is not well documented for end-users -- again, a reason that a utility function would be a good idea

@brettcannon
Copy link
Member

importlib is not well documented for end-users

If you feel there are improvements to be made to the documentation, then please consider opening PRs to improve it. But I would also ask that you please not state your opinion as fact as those of us who have put significant time into the module may not agree w/ your view and feel demotivated by your phrasing.

@ChrisBarker-NOAA
Copy link
Contributor

I apologize for the harsh wording.

I should have said: "I have found the module hard to understand"

When I was struggling to understand it, I assumed the docs had not been written with someone like me in mind -- i.e. someone trying use the SourceFileLoader, rather than, say, someone trying to write a new Loader.

I honestly couldn't begin to write a PR, as I don't understand the module. If it would be helpful, I could comment on at least some of what I found confusing.

In any case, what I am trying to say is that importlib (for good reason) is a complex system -- so providing a easy and obvious way to load a module from a user-specified file could be helpful.

And maybe there are other utilities that could be helpful as well -- in the spirit of reload(). Though I can't think of any others at the moment.

Thanks, Chris

@brettcannon
Copy link
Member

I honestly couldn't begin to write a PR, as I don't understand the module. If it would be helpful, I could comment on at least some of what I found confusing.

Yep, separate issue(s) outlining what you found confusing would be helpful.

In any case, what I am trying to say is that importlib (for good reason) is a complex system -- so providing a easy and obvious way to load a module from a user-specified file could be helpful.

It's a balancing act. We don't necessarily want to make it too easy to circumvent the import system as that's a big hammer w/ subtlety that a lot of people (accidentally) ignore but still end up getting bit by. That's why we have e.g. https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly to tell people what they can basically do for common cases. This is also often a situation where people expect different things to be handled for them, so coming up w/ an encompassing API is tough. As such, we have typically encouraged people to look to PyPI for higher-level stuff that doesn't directly tie into how import functions.

@ChrisBarker-NOAA
Copy link
Contributor

https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly

Ahh -- thanks -- I hadn't found that.

That looks to me like it's the non-deprecated way to do:

module = SourceFileLoader(name, path).load_module()

Is that correct -- will it do the same thing?

In which case, I propose that that recipe be added to importlib as a utility function along the lines of:

def load_module_from_path(name, path):
    """
    load a module from a file on disk

    :param name: name of the module
    :type name: str

    :param path: path to the file
    :type path: PathLike

    :returns: instantiated module object

    Note: The module is also added to sys.modules, and will behave the same as though
              the file were on sys.path and imported the usual way.
    """
    import os  # import here, so that it only gets imported if needed.
    path = os.fspath(path)

    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)

    return module

I understand that it was a deliberate decision to put the recipe in the docs, rather than a utility function, though unless it's really something that is "not recommended" (in which case perhaps it should say so in the docs) -- then this is just complex enough that a utility function is called for -- particularly since SourceFileLoader(name, path).load_module() (now deprecated?) is advice readily found on the internet.

Does this need another issue or PR?

Meanwhile, for the original issue:

I suggest that checking for __fspath__ be added to the: FileLoader.__init__() and perhaps a couple other places where file paths are initialized -- that is the "outer" parts of the importlib API.

Though for my use-case the proposed utility function would solve the problem at hand.

@brettcannon
Copy link
Member

brettcannon commented Jul 10, 2024

That looks to me like it's the non-deprecated way to do:

module = SourceFileLoader(name, path).load_module()

Is that correct -- will it do the same thing?

Roughly.

though unless it's really something that is "not recommended" (in which case perhaps it should say so in the docs)

This is being discussed in your PR, but yes, the recipe needs a warning on it.

Does this need another issue or PR?

Yes as I'm going to close this issue.

@brettcannon
Copy link
Member

Closing as making this change is a lot of work for something that people in general should be discouraged from doing (circumventing the import system to import something).

@brettcannon brettcannon closed this as not planned Won't fix, can't repro, duplicate, stale Jul 10, 2024
@brettcannon
Copy link
Member

SourceFileLoader(name, path).load_module() (now deprecated?) is advice readily found on the internet.

Someone working on #121604 said this should be raising a deprecation warning and a quick check of the source that should be happening. Was that not happening for you?

@ChrisBarker-NOAA
Copy link
Contributor

It was not -- but I was (likely) testing in an older version.

Let's assume it is, and I'll re-post if I do find a problem when I check 3.12.

@ChrisBarker-NOAA
Copy link
Contributor

Confirmed -- deprecation warning is there -- I didn't see it because I didn't run with -Wd -- which is a issue that is unrelated to this one. All good here.

-Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes stdlib Python modules in the Lib dir topic-importlib type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants