[Windows] process_iter() is 10x slower when running from non-admin account #2366
Comments
On my Windows 11 machine, I get 5 to 8 seconds to iterate over 427 processes.

```python
import time
from typing import Tuple

import psutil


def proc_by_name(process_name: str) -> Tuple[bool, int]:
    start = time.time()
    for proc in psutil.process_iter(attrs=["name"]):
        if process_name in proc.name():
            print("On success took:", time.time() - start, "seconds")
            return True, proc.pid
    print("On failure took:", time.time() - start, "seconds")
    return False, 0


if __name__ == "__main__":
    running, pid = proc_by_name("unknown.exe")
    print(f"unknown.exe: {running=}, {pid=}")
    running, pid = proc_by_name("chrome.exe")
    print(f"chrome.exe: {running=}, {pid=}")
    print(len(list(psutil.process_iter(attrs=["name"]))))
```

An iteration on the terminal:

```
On failure took: 5.50777530670166 seconds
unknown.exe: running=False, pid=0
On success took: 0.7929043769836426 seconds
chrome.exe: running=True, pid=3024
427
```
|
This is known. Certain APIs have 2 implementations, a fast one and a slow one. The fast one is attempted first, but requires more privileges, and hence often fail with Some examples of "dual" APIs are:
|
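The fast-then-slow pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: `fast_create_time`, `slow_create_time`, `PRIVILEGED_PIDS` and `CREATE_TIMES` are hypothetical stand-ins, not psutil functions, and the "slow" path merely simulates a system-wide query.

```python
# Hypothetical stand-ins: none of these names are psutil APIs.
PRIVILEGED_PIDS = {4, 88}  # pretend these PIDs belong to ADMIN/SYSTEM
CREATE_TIMES = {4: 100.0, 88: 101.5, 3024: 250.25}

slow_calls = 0


def fast_create_time(pid):
    """Cheap per-process query that needs extra privileges."""
    if pid in PRIVILEGED_PIDS:
        raise PermissionError(f"access denied for pid {pid}")
    return CREATE_TIMES[pid]


def slow_create_time(pid):
    """Expensive query: snapshots every process, then keeps a single entry."""
    global slow_calls
    slow_calls += 1
    snapshot = dict(CREATE_TIMES)  # stands in for a system-wide query
    return snapshot[pid]


def create_time(pid):
    """Try the fast path first; fall back only on a permission error."""
    try:
        return fast_create_time(pid)
    except PermissionError:
        return slow_create_time(pid)


times = {pid: create_time(pid) for pid in sorted(CREATE_TIMES)}
print(times, "slow calls:", slow_calls)
```

In this toy model the slow path runs once per privileged PID, which is why a non-admin caller pays more as the number of processes it cannot open grows.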
Thank you, @giampaolo . I wasn't aware of a dual implementation driving parts of the Now, assuming that the process I wish to test the existence for (and getting additional info from, such as virtual mem usage metrics, or say Is there a way to check solely for such process, and get info about it in a faster way than As @tduarte-dspc just exemplified very concisely (and even when running under an admin account, which presumably engages the faster API even if such account is not Thank you. |
It depends on what criteria you use to identify the process you're looking for. Is it based on E.g.

```python
import psutil, os

myuser = os.getlogin()
mycmdline = ["C:\\python310\\python.exe", "foo.py"]
for p in psutil.process_iter():
    try:
        if p.username() == myuser and p.cmdline() == mycmdline:
            print(f"found {p}")
    except psutil.Error:
        pass
```

With that said (mostly note to self): it would make sense to debug-log APIs which use the dual implementation, so one can identify performance bottlenecks by running psutil in |
Well, my question was mostly about finding a way to avoid iterating through the list of all processes, i.e. how to avoid Probably it's not supported in the current With everyone's agreement, I'll close this issue, since it's proven to work as designed, and it's not a defect. Thanks again for everybody's time, and all the best. |
Hi @giampaolo, I am not 100% sure if it is related, but "simple" process enumeration However, it is not clear how the flow gets there. I can see from the code, and you confirm it above, that collecting additional attributes may trigger that when the process is not high-privileged, but simple enumeration does not appear to be doing that, yet in the debugger I can see it. Because I do not have debug symbols, I cannot say what calls that function. The problem is that on a machine with 400 processes, under a non-admin service, process iteration is 200x slower. If I switch from psutil 5.9.0 to 6.0.0 (I have seen the release note which stated a 20x improvement), I actually see it, but again I see ONLY a 20x improvement; a 10x overhead is still there. So it is not clear what and why in Thank you. |
If never used before, Process creation time on Windows uses a dual-implementation (see my previous comment). If the first (fast) method fails due to insufficient permissions, a second (much slower) method is attempted (source). If you iterate over

```python
for p in psutil.process_iter():
    if p.name() == "myapp.exe":
        print("found it!")
        break
```

With that said, I recently bumped into a comment on X, which made me realize that this is more serious than I thought:
The speedup you're seeing is due to #2396 (merged in 6.0.0). Basically |
I ran some tests to confirm your theory.

```python
@wrap_exceptions
def create_time(self):
    import time
    # Note: proc_times() not put under oneshot() 'cause create_time()
    # is already cached by the main Process class.
    try:
        start = time.time()
        user, system, created = cext.proc_times(self.pid)
        print("NOT USING FALLBACK FASST")
        print(time.time() - start)
        return created
    except OSError as err:
        if is_permission_err(err):
            start = time.time()
            print("USING FALLBACK SLOWW")
            x = self._proc_info()[pinfo_map['create_time']]
            print(time.time() - start)
            return x
        raise
```
So it seems that for some processes it uses the fallback method and for others it doesn't. When it uses the fallback method it does indeed take way longer. Presumably this happens when the processes are higher-privileged ones (not sure)?

@giampaolo Do you have an idea why the fallback method is so slow? I understand your concern about the is_running method depending on the creation time of the process and therefore needing this information. So one idea would be to make the fallback method faster or to replace it with a different (faster) implementation. Alternatively you could create a

If you're curious, here is the ctypes implementation I'm using right now. https://gist.github.com/ThoenigAdrian/b12bb7e6c438fd4f7a7e56c67a294484

```python
import ctypes
import ctypes.wintypes

# Load the required libraries. use_last_error=True is needed so that
# ctypes.get_last_error() below reports the real Win32 error code.
psapi = ctypes.WinDLL('Psapi.dll', use_last_error=True)
kernel32 = ctypes.WinDLL('kernel32.dll', use_last_error=True)

# Define constants
PROCESS_QUERY_INFORMATION = 0x0400
PROCESS_VM_READ = 0x0010
MAX_PATH = 260


def get_pids_by_name_fast(process_name):
    process_name = process_name.encode('utf-8')
    pids = []
    # Allocate an array for the process IDs
    array_size = 1024
    pid_array = (ctypes.wintypes.DWORD * array_size)()
    bytes_returned = ctypes.wintypes.DWORD()
    # Call EnumProcesses to get the list of process IDs
    if not psapi.EnumProcesses(ctypes.byref(pid_array), ctypes.sizeof(pid_array), ctypes.byref(bytes_returned)):
        raise ctypes.WinError(ctypes.get_last_error())
    # Calculate the number of processes
    num_pids = bytes_returned.value // ctypes.sizeof(ctypes.wintypes.DWORD)
    # Iterate over all the process IDs
    for pid in pid_array[:num_pids]:
        # Open the process with necessary privileges
        h_process = kernel32.OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, False, pid)
        if h_process:
            exe_name = (ctypes.c_char * MAX_PATH)()
            h_module = ctypes.wintypes.HMODULE()
            needed = ctypes.wintypes.DWORD()
            # Get the first module, which is the executable
            if psapi.EnumProcessModules(h_process, ctypes.byref(h_module), ctypes.sizeof(h_module), ctypes.byref(needed)):
                psapi.GetModuleBaseNameA(h_process, h_module, ctypes.byref(exe_name), ctypes.sizeof(exe_name))
                if exe_name.value.lower() == process_name.lower():
                    pids.append(pid)
            # Always release the handle, whether or not the name matched
            kernel32.CloseHandle(h_process)
    return pids


# Example usage:
process_name = "python.exe"
matching_pids = get_pids_by_name_fast(process_name)
print(f"PIDs for processes named '{process_name}': {matching_pids}")
```
|
@ThoenigAdrian the fallback method is very slow because it effectively asks the kernel to retrieve ALL processes running on the system, with many details. Some of it, as @giampaolo said, is cached. In 6.0.0 I do see the very first iteration take 30 seconds and subsequent iterations take "only" 1.5 seconds. For my code, the logic of getting process times for every process is unfortunate since
The bottom line: for our kind of logic, the current implementation of process iteration under a lower-privileged process is a huge waste. I guess changing that iteration is now impossible because it would break old code and a lot of existing logic. Perhaps there could be an alternative method to get a pid->name map? We could of course call Windows APIs directly, but that kind of defeats the purpose and convenience of psutil. |
And thank you very much for the reply :bow: |
@ThoenigAdrian I replied before reading your whole comment. Regarding of your Also @ThoenigAdrian your
It would be interesting to hear @giampaolo's thoughts on whether that style of process data collection can be naturally and organically added to the existing API styles 🙇 |
@giampaolo I want to mention one more thing which I do not know if it is relevant to the way you think about these issues. Per your comment above
This approach indeed could help in many cases but in cases when you want to collect information for all chrome.exe processes, e.g., it would not work unfortunately. |
-1 Adding a function returning a pid->name mapping would probably cover the most common use case, but it wouldn't be a generic enough solution. E.g. one may want to filter for To clarify: we fetch
If Perhaps a possible solution would be adding a new Perhaps the new parameter may even default to The doc may clarify this by stating: <<If you plan on using I see 3 downsides of this solution, not really real blockers, just mentioning them for completeness and personal brainstorming:
Any comment is welcome =) |
@giampaolo thank you for your openness and invitation for a dialog. I probably will need to sleep on it to have a more sensible reply but here are my few cents (please forgive me I am thinking aloud).
Thank you and best regards 🙏. |
Interesting. This message seems to confirm what you say https://superuser.com/a/937134, but it's unclear how he tested this. Also it's unclear after how much time (if any) Windows can re-assign the same PID, which is the key point here. To clarify: on Linux the creation time has a 2-digits precision (

```python
>>> import psutil
>>> psutil.Process().create_time()
1727641709.66
>>>
```

That means that if PID disappears at psutil makes the same assumption on Windows, but right now I can't check what's the time precision there, nor we know how PIDs are assigned exactly (couldn't find any useful info). It must be noted that the number of maximum PIDs also matters here: when the OS runs out of PIDs it will restart from 0. Therefore the smaller the max-PID, the more likely it is to hit the
So yes, psutil algo is technically racy, but practically speaking it should be "good enough" in most cases, and "better than nothing" in the worst case.
Very good point, and I agree with you. For the time being I think the quicker way to solve this issue on Windows is to only use the "fast" create time method in
Excellent, thanks for letting me know. Since on Windows it's less clear how PIDs are assigned, this API looks particularly useful. We may determine API existence at runtime and do (in pseudo code):

```python
def unique_process_ident(pid):
    if WIN_VER >= 10:
        return (pid, ProcessStartKey(pid))
    else:
        return (pid, fast_creation_time(pid))
```

Are you proposing something like this?

```python
>>> psutil.pids_names_map()
{1: "foo.exe", 2: "bar.exe", ...}
```

Please note that if we use the fast create time method in |
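To make the identity argument concrete, here is a minimal sketch of how a `(pid, create_time)` pair can detect PID reuse. It runs against a simulated process table; `ProcIdent`, `table`, `current_ident` and `is_same_process` are hypothetical names for illustration and say nothing about how psutil implements this internally.

```python
from typing import NamedTuple, Optional


class ProcIdent(NamedTuple):
    """Identity of a process: the PID alone is not enough."""
    pid: int
    create_time: float


# Hypothetical process table: pid -> creation time (seconds since epoch).
table = {3024: 1727641709.66}


def current_ident(pid: int) -> Optional[ProcIdent]:
    """Return the identity of whatever currently owns this PID, if anything."""
    ctime = table.get(pid)
    return None if ctime is None else ProcIdent(pid, ctime)


def is_same_process(saved: ProcIdent) -> bool:
    """True only if the PID is still owned by the process we saved."""
    return current_ident(saved.pid) == saved


saved = current_ident(3024)
still_same = is_same_process(saved)

# The original process exits and the OS hands PID 3024 to a new process.
table[3024] = 1727641800.12
reuse_detected = not is_same_process(saved)

print(still_same, reuse_detected)
```

The check is only as strong as the timestamp resolution: two processes that reuse a PID within the same timestamp granule would compare equal.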
I have created PR #2444 which implements exactly this. This should solve the severe performance issue described in here. @ThoenigAdrian if you have a chance to test this PR please report back here, but you'll need Visual Studio installed in order to compile psutil, I believe Github CI also stores the binary wheels somewhere but can't remember where. |
Indeed. In a previous cybersecurity company, PID reuse triggered an internal kernel process cache bug, and my coworker investigated it by creating a tight loop of process creation, where each process exited immediately (this was a while ago). Within minutes, he encountered multiple collisions, if my memory serves me right. Without consulting Microsoft or reverse-engineering Windows, it is hard to determine the likelihood or chances of PID reuse per unit of time under different load levels. One thing is certain, though: we cannot infer the minimum granularity of time resolution, which, if I understand your reply correctly, could be done. Windows time-related APIs in user mode (outside of a few C-runtime wrappers) and 100% in the kernel use the FILETIME structure, which provides time in 100-nanosecond intervals (and this is documented here: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-filetime). I don't believe this is guaranteed under all circumstances, but it's the theoretical foundation. Additionally, in systems where numerically based resources are allocated and released in a stack-like manner, rapid reuse of numbers when things are allocated and released quickly could result in giving back the same resource/number, unless explicit measures are taken to avoid it. I wouldn't necessarily say this is uncommon. I think it's a rather pesky problem for any software trying to cache process information, possibly even including Microsoft's own cybersecurity products. I suspect this is why they extended one of the crucial process kernel structures to include the truly unique ID I mentioned earlier.
On Windows, the kernel uses 64-bit process IDs (PIDs), while the user mode API uses 32-bit PIDs. Technically, this allows for around 4 billion PIDs. However, in my experience, I have never seen a PID with more than six digits. Without PID reuse, I doubt even systems with terabytes of RAM could handle even a tiny fraction of that number. For example, 2,000 small processes on my machine consume around 20 GB of memory. I believe many things would break before Windows could handle 100,000 processes. Over time, processes are created and destroyed continuously, and if a computer runs for a long time, the total number of processes can add up. However, since PIDs are reused, I don’t think we’ll ever see PID numbers roll over on Windows
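As a small worked example of the FILETIME resolution mentioned above: FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC, so converting to Unix time means subtracting the 11,644,473,600-second epoch offset. The sketch below uses made-up tick values, and the two-decimal rounding only mirrors the Linux value quoted earlier in the thread; it is not a claim about what psutil does on Windows.

```python
# FILETIME: 100-ns ticks since 1601-01-01 UTC.
TICKS_PER_SECOND = 10_000_000
EPOCH_DIFF_TICKS = 11_644_473_600 * TICKS_PER_SECOND  # 1601 -> 1970 offset


def filetime_to_unix(ticks: int) -> float:
    """Convert FILETIME ticks to seconds since the Unix epoch."""
    return (ticks - EPOCH_DIFF_TICKS) / TICKS_PER_SECOND


# Two made-up creation times, 3 ms apart (...709.661 s and ...709.664 s).
ft_a = EPOCH_DIFF_TICKS + 1727641709 * TICKS_PER_SECOND + 6_610_000
ft_b = EPOCH_DIFF_TICKS + 1727641709 * TICKS_PER_SECOND + 6_640_000

distinct_as_ticks = ft_a != ft_b
same_at_two_decimals = round(filetime_to_unix(ft_a), 2) == round(filetime_to_unix(ft_b), 2)

print(distinct_as_ticks, same_at_two_decimals)
```

So an identity built on raw ticks can tell these two processes apart, while one built on a timestamp rounded to two decimals cannot.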
👍
Yes and yes. Indeed my suggestion is moot if process enumeration does not automatically and always implicitly call the "slow" function when GetProcessTimes fails. I have a few more, somewhat related thoughts and ideas. I could be wrong in my assumptions to begin with. Perhaps they are more questions than suggestions.

First, let's consider a scenario during a process iteration where a particular attribute triggers the invocation of a 'slow' function that collects detailed process information. It appears that the 'slow' function, psutil_get_proc_info, retrieves information for all processes but ignores any 'expensive' data except for the specified process. I suspect that in the next iteration, if the 'slow' function is the only way to obtain the data, it would repeat the slow call, even though the 'expensive' data was just collected a microsecond earlier. If this is the case, it could be resolved by retaining the 'expensive' process data until the entire process enumeration is complete. It is almost like a oneshot call, except that in this case oneshot is not needed, since process iteration defines a perfect and tight scope.

Second, the same also applies if you want to collect process information for a batch of PIDs in rapid succession. If I want to collect all process details for all chrome.exe processes, I do not want the "slow" function, which acquires "expensive" data applicable to all processes, to be called more than once. I do not know how to set up the scope though. Oneshot would not work here. Perhaps the third point below would address it, but only if it is part of the process enumeration and not a standalone construct.

Third, I believe that a filtered process iteration could be very useful, especially in cases where I need a large set of attributes for a subset of processes (at least based on name). It allows me to retrieve and set up 'expensive/slow' attributes in a single call when I only care about a subset of processes, rather than relying on 'fast' attributes to collect the process ID and name, and then manually filtering and opening them separately (which is what we're doing now, though process enumeration is not very fast yet). Thank you again for this useful discussion. 🙏 |
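The idea of retaining the expensive data for the duration of an enumeration can be sketched as a small snapshot cache. Everything here is an assumption for illustration: `SnapshotCache` and `fetch_all_processes` are hypothetical, the system-wide query is simulated, and the 0.25-second expiry is an arbitrary choice rather than anything psutil does.

```python
import time


class SnapshotCache:
    """Serve per-PID lookups from one system-wide snapshot until it expires."""

    def __init__(self, fetch_all, max_age=0.25, clock=time.monotonic):
        self._fetch_all = fetch_all  # expensive call returning {pid: info}
        self._max_age = max_age      # seconds a snapshot stays valid
        self._clock = clock
        self._snapshot = None
        self._taken_at = None
        self.fetches = 0             # how many expensive calls were made

    def get(self, pid):
        now = self._clock()
        if self._snapshot is None or now - self._taken_at > self._max_age:
            self._snapshot = self._fetch_all()
            self._taken_at = now
            self.fetches += 1
        return self._snapshot.get(pid)


def fetch_all_processes():
    # Stand-in for a query that returns details about every process.
    return {pid: {"name": f"proc{pid}.exe", "create_time": 1000.0 + pid}
            for pid in range(1, 401)}


fake_now = [0.0]  # controllable clock so the example is deterministic
cache = SnapshotCache(fetch_all_processes, max_age=0.25, clock=lambda: fake_now[0])

infos = [cache.get(pid) for pid in range(1, 401)]  # one pass over 400 PIDs
fetches_first_pass = cache.fetches

fake_now[0] = 1.0  # a second later the snapshot is stale
cache.get(1)
fetches_after_expiry = cache.fetches

print(fetches_first_pass, fetches_after_expiry)
```

With this shape, a full pass over 400 PIDs costs one expensive query instead of up to 400, at the price of serving data that may be up to `max_age` seconds old.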
This is correct. Internally This is easier said than done though. Windows is the only platform offering an API like this. As such psutil code and API evolved with the assumption that such a thing (retrieve info about all PIDs) couldn't be done. Also, I guess a separate brand new function could be provided, something like:

```python
psutil.multi_proc_info()
{
    1: {"user_time": ..., "system_time": ...},
    2: {"user_time": ..., "system_time": ...},
    3: {"user_time": ..., "system_time": ...},
    ...
}
```

...but it would be Windows only and sort of different than the rest of the API. Maybe it could live under a new |
I agree totally with all your points.
Very good points. A naive implementation would probably satisfy quick enumeration, but a generator would not be a good fit. I am not sure about Python generator semantics: is there a way to see that the complete enumeration has been done, so we can drop the cached information? Perhaps we can also rely on time: if the cached information is 1/4 of a second old, get a new one?
Indeed, conceptually it is a cross-platform oddity. By the way, can you provide a bit more detail on what a multi_proc_info call would look like in terms of the API? I was thinking more of oneshot-like semantics, where the scope is defined outside but internally influences regular, existing Process method calls. What if, in addition to Process.oneshot(), we add procutil.Oneshot(process iterator or list), which would keep a cross-process context AND a process-private context (without their explicit definition), with overall semantics similar to oneshot (I am thinking aloud)? Over time it could hold global, per-scope knobs and caches which could be useful on other OSes beyond automating per-process oneshot. |
Modeling, and especially retrofitting, an API is not easy. However, I want to share some of the reasons why I am eager to discuss various approaches. This is not to justify a particular implementation technique, but rather to provide additional context and perspective. In some cases, we've observed that for customers with many running processes, process enumeration and data collection at regular intervals (every 15 seconds or every few minutes) can consume more than 50% of CPU usage, even on high-performance servers—far more than the rest of the large, busy application. When this feature is disabled, CPU usage drops to negligible levels. The root cause, which is now more apparent, is repeatedly calling for the same expensive data and discarding most of it over and over for each of the many processes. |
Yeah, indeed. It probably means
You use
Hmm. Something like this?

```python
with psutil.oneshot():
    for proc in psutil.process_iter():
        ...
```

Maybe. This would have the extra advantage to work with
Not bad. It's something I've been pondering for a while actually. The oddity though, is that it requires relying on a global var ( Another possible idea could be Quite a brainstorming... :)
Definitely. Long ago I blogged about it: https://gmpy.dev/blog/2013/making-constants-part-of-your-api-is-evil. Back then it was easier to fix mistakes and break compatibility. Today we can't. If something gets in, it's not like "it's forever" but... almost. I guess sometimes I probably look excessively cautious up here, mostly because of this.
Interesting. #2444 should alleviate some pain, but it depends on what you're doing in your code really. If your code calls one of these methods for multiple ADMIN processes then you experience the slowdown, else you won't. Question is: do you really need to do that? If not, you may filter out those processes by using |
True
I understand now, that is good
Right, introducing implicit globals is not good. I was hoping, not knowing internal details, that In my opinion, we are effectively discussing the introduction of an implicit context to avoid passing it explicitly to the Process objects and ending up with an odd interface and explanation, especially if it would be primarily useful only on Windows. Maybe there is no elegant way of avoiding signature changes or strange constructs. I did find Context Managers (contextlib) which work
Indeed :)
Yes, this is an interesting topic in general and I can say even more about it since there are a few interesting aspects. Perhaps my points and how I am thinking about them even though specific for our case still could be useful for overall discussion as an extra context. Please bear with me.
|
@giampaolo I would not normally say this in a GitHub issue, but if you reply next week I probably will not be able to reply back quickly, since I will be on PTO. But I certainly will afterwards 🙏 |
This is based on #2366 (comment). On Windows, we now determine process unique identity by using process' fast create time method. This has more chances to fail with `AccessDenied` for ADMIN owned processes, but it shouldn't matter because if we have no rights to get process ctime we'll also have no rights to accidentally `kill()` the wrong process PID anyway. This should drastically speedup `process_iter()` when used for retrieving process info one time instead of in a loop (e.g. htop like apps).
@giampaolo, issue #2454 is related to batching/speeding up Linux API calls only, right? What about implicit batching of process information collection on Windows when the collector's privilege level is not high, as I mentioned in my latest comment? Are you still considering changes to address that in some way, or are you transferring this discussion to issue #2454? Your last comment suggests that, but the issue's title and scope seem different. |
Thank you, I appreciate that 🙏 |
Summary
Description
Running the following Python code from a non-admin Windows user account takes about 400ms.
And when running the same code from the same Windows user account, but through an ADMIN/elevated cmd.exe command prompt, it takes about 38-40ms, which is 10x faster.
The total exec time does not seem to be influenced by whether the process searched for is currently running or not.
Also, when the process searched for is running, it's always only 1 instance of it, so no multiple processes of the same name.
The code above is invoked indirectly, via Uvicorn ASGI web application server (configured with only 1 worker) and FastAPI web api framework.
Same machine, same Virtual Mem usage, same list of processes between the two cases.
About 220 processes showing in TaskMgr / tasklist, in both cases.
There is something that makes psutil's Process item generator go 10x slower in non-admin mode, than in admin mode.
Thank you.