-
-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nested signal handling leads to excessive slowdown of sig_occurred()
#215
Comments
I guess recursive That said, is there any cleaner way than the current (fragile) method of using reference counting and run the garbage collector? It looks like it will only be a matter of time until someone comes across a case where the garbage collector is run too many times consecutively which leads to unexplained slowdown. |
Actually nested signal handling is not needed, you might as well do something more innocuous such as
But |
How about we throttle the garbage collector to run at most e.g. 10% of the (wall) time? That way, worst case something heavy is done in a place where |
Here's a way to reproduce something what appears to be the root cause of this in python, using cysignals 1.11.4:
Clearly, by the time I typed This also seems to be the cause of sagemath/sage#38749 (comment) Using python 3.13.1 gives the same. |
No nesting of exceptions is necessary:
Indeed, Does not happen with pure python, but it does happen with ipython:
I suppose sage / ipython is storing the exception somewhere, which keeps the refcount from reaching 1. |
Here is a PoC fix: --- a/src/cysignals/signals.pyx
+++ b/src/cysignals/signals.pyx
@@ -28,6 +28,7 @@ from libc.stdio cimport freopen, stdin
from cpython.ref cimport Py_XINCREF, Py_CLEAR
from cpython.exc cimport (PyErr_Occurred, PyErr_NormalizeException,
PyErr_Fetch, PyErr_Restore)
+from cpython.exc cimport PyErr_GetRaisedException, PyErr_SetRaisedException
from cpython.version cimport PY_MAJOR_VERSION
cimport cython
@@ -360,6 +361,15 @@ cdef void verify_exc_value() noexcept:
Check that ``cysigs.exc_value`` is still the exception being raised.
Clear ``cysigs.exc_value`` if not.
"""
+ cdef PyObject * raised = PyErr_GetRaisedException()
+
+ if cysigs.exc_value is not raised:
+ Py_CLEAR(cysigs.exc_value)
+
+ PyErr_SetRaisedException(raised)
+
+ return
+
if cysigs.exc_value.ob_refcnt == 1:
# No other references => exception is certainly gone
Py_CLEAR(cysigs.exc_value) (note: as is, this requires python 3.12+ and a few tests fail) BEFORE:
AFTER:
|
The problem with your fix is it negates the whole point of The reason why it's introduced is the following: (test with an environment with %load_ext cython
##
%%cython
from cysignals.signals cimport sig_on, sig_off, sig_occurred
from cysignals.tests import print_sig_occurred
from libc.stdlib cimport abort
cdef class Integer:
def __dealloc__(self):
print_sig_occurred()
print("deleted")
#class Integer:
# def __del__(self):
# print_sig_occurred()
# print("deleted")
def mutate(a):
abort()
def f1():
a = Integer()
sig_on()
mutate(a)
sig_off()
def g1():
print("before call f1")
f1()
print("after call f1")
# CANNOT print "no current exception" here
g1() in SageMath, if Of course, the solution implemented in SageMath is suboptimal in that the optimal way is to do for example change def f1():
a = Integer()
try:
sig_on()
mutate(a.value)
sig_off()
except:
a.value = None # avoid calling mpz_clear(a.value) when a is deleted
raise but since there's a lot of usages of Of course in the long term we prefer this solution… but it is really hard to make sure it is bug-free while doing the migration. Alternatively (potentially faster than the above solution because it avoids exception handler, and it's also shorter): def f1():
a = Integer()
sig_on()
cdef mpz a1 = a.value
a.value = None # if interrupted here, a.value is None so destructor is harmless
mutate(a1)
a.value = a1
sig_off() or put into helper function (then there's only 2 extra lines): cdef take_mpz(a: Integer):
"""
This function must be used before calling a mpz function that might be interrupted
"""
cdef mpz a1 = a.value
a.value = None
return a1
def f1():
a = Integer()
sig_on()
cdef mpz a1 = take_mpz(a)
mutate(a1)
a.value = a1
sig_off() |
I understand what you mean. But how are we sure that every integer that was used since def f1():
a = Integer()
sig_on()
mutate(a)
sig_off() In this case, I think cython will ensure
|
tl;dr… I guess we can just go with implementing the proper solution if you want. But:
|
The first bottleneck is there are so many occurrences of it. This is a conservative estimate (though in some cases the function call does not actually mutate the first argument):
|
I'm trying to find ways to reproduce the issue by e.g. adding delay at random places in the memory allocation function. I was thinking if it's possible to at least reproduce the issues on other systems, it is more likely that an incorrect implementation will raise an error. proof of concept: apply the patch, then run
strangely enough neither cdef extern from "<malloc.h>":
int mallopt(int param, int value) nogil
const int M_CHECK_ACTION
mallopt(M_CHECK_ACTION, 0) disables the check. But I guess it's good either way. |
Looks like something goes wrong in Python 3.12?
Or it could be issue with only that pull request, I don't know. Anyway it is evident that something has changed between Python 3.11 and Python 3.12 related to the reference counting. |
It's worse than I thought…
— https://docs.python.org/3/library/sys.html →
— https://docs.python.org/3/library/sys.html#sys.last_exc so if you catch the exception (without ever re-raising it) then that mechanism fails. Looks like it somehow gets worse in Python 3.12 sagemath/sage#39142 (coincidentally sys.last_exc is added at exactly this version) I wonder if ipython/ipython#14684 is the cause. I wonder why flint functions seem to be immune to this. ##
%%cython
from libc.stdint cimport uintptr_t
from libc.stdio cimport printf
from posix.unistd cimport usleep, useconds_t
cdef extern from "flint/flint.h":
void __flint_set_memory_functions (
void *(*alloc_func) (size_t),
void *(*calloc_func) (size_t, size_t),
void *(*realloc_func) (void *, size_t),
void (*free_func) (void *)
)
void __flint_get_memory_functions (
void *(**alloc_func) (size_t),
void *(**calloc_func) (size_t, size_t),
void *(**realloc_func) (void *, size_t),
void (**free_func) (void *)
)
cdef class FlintMemoryFunctions:
cdef void *(*alloc_func) (size_t) noexcept
cdef void *(*calloc_func) (size_t, size_t) noexcept
cdef void *(*realloc_func) (void *, size_t) noexcept
cdef void (*free_func) (void *) noexcept
def set_from_flint_current(self):
__flint_get_memory_functions(&self.alloc_func, &self.calloc_func, &self.realloc_func, &self.free_func)
def set_to_flint(self):
__flint_set_memory_functions(self.alloc_func, self.calloc_func, self.realloc_func, self.free_func)
def __str__(self):
return f"<FlintMemoryFunctions: alloc_func={<uintptr_t>self.alloc_func:#x}, calloc_func={<uintptr_t>self.calloc_func:#x}, realloc_func={<uintptr_t>self.realloc_func:#x}, free_func={<uintptr_t>self.free_func:#x}>"
def __repr__(self):
return str(self)
# ======== wrapper alloc functions to print debug ========
cdef FlintMemoryFunctions backup = None
cdef useconds_t alloc_sleep = 0, calloc_sleep = 0, realloc_sleep = 0, free_sleep = 0
cdef void *new_alloc_func(size_t size) noexcept:
printf("alloc %d\n", <int>size)
usleep(alloc_sleep)
return backup.alloc_func(size)
cdef void *new_calloc_func(size_t nmemb, size_t size) noexcept:
printf("calloc %d %d\n", <int>nmemb, <int>size)
usleep(calloc_sleep)
return backup.calloc_func(nmemb, size)
cdef void *new_realloc_func(void *ptr, size_t size) noexcept:
printf("realloc %p %d\n", ptr, <int>size)
usleep(realloc_sleep)
return backup.realloc_func(ptr, size)
cdef void new_free_func(void *ptr) noexcept:
printf("free %p\n", ptr)
usleep(free_sleep)
backup.free_func(ptr)
def enable_debug():
global backup
assert backup is None
backup = FlintMemoryFunctions()
backup.set_from_flint_current()
__flint_set_memory_functions(new_alloc_func, new_calloc_func, new_realloc_func, new_free_func)
def set_sleep(useconds_t alloc=0, useconds_t calloc=0, useconds_t realloc=0, useconds_t free=0):
global alloc_sleep, calloc_sleep, realloc_sleep, free_sleep
alloc_sleep = alloc
calloc_sleep = calloc
realloc_sleep = realloc
free_sleep = free
def disable_debug():
global backup
assert backup is not None
backup.set_to_flint()
backup = None
##
enable_debug()
disable_debug()
##
set_sleep(calloc=1000000)
set_sleep()
##
R.<x> = QQ[]
a = R([1, 2, 3])
a = R([1, 2, 3, 4, 5])
del a
##
def f():
set_sleep()
a = matrix.zero(ZZ, 100)
b = matrix.zero(ZZ, 100)
set_sleep(calloc=1000000)
alarm(0.5)
c = a + b
cancel_alarm()
f()
import gc
gc.collect()
## ======== test ========
from sage.libs.flint.flint cimport flint_malloc, flint_realloc, flint_calloc, flint_free
from libc.stdio cimport printf
printf("%p %p %p %p\n", flint_malloc, flint_calloc, flint_realloc, flint_free)
printf("%p %p %p %p\n", alloc_func, calloc_func, realloc_func, free_func)
from libc.stdlib cimport malloc, calloc, realloc, free
printf("%p %p %p %p\n", malloc, calloc, realloc, free) |
From sagemath/sage#39224
SageMath session:
Note the invocation of
g()
that is marked# slow
.The text was updated successfully, but these errors were encountered: