Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rcpp exception types are not correctly recognized when thrown across translation units #972

Closed
wch opened this issue Jun 20, 2019 · 26 comments

Comments

@wch
Copy link

wch commented Jun 20, 2019

If there are Rcpp exceptions that are thrown from one translation unit and caught in another, they are sometimes not recognized with the correct type.

In this example, there are two functions that throw different kinds of exceptions, Rcpp::exception and Rcpp::internal::InterruptedException, and another function that catches exceptions and returns a string describing which type it caught.

library(Rcpp)

cppFunction('
  void rcpp_exception_thrower() {
    throw Rcpp::exception("This is a Rcpp::exception");
  }
')
cppFunction('
  void rcpp_interrupt_thrower() {
    throw Rcpp::internal::InterruptedException();
  }
')

cppFunction('
  std::string catcher(Function f) {
    try {
      f();
    }
    catch(Rcpp::exception &e) {
      return "Caught Rcpp::exception";
    }
    catch(Rcpp::internal::InterruptedException &e) {
      return "Caught Rcpp::internal::InterruptedException";
    }
    catch(std::exception &e) {
      return "Caught std::exception";
    }
    catch( ... ) {
      return "Caught unknown object";
    }
    return "Caught nothing";
  }
')

catcher(rcpp_exception_thrower)
#> [1] "Caught std::exception"

catcher(rcpp_interrupt_thrower)
#> [1] "Caught Rcpp::internal::InterruptedException"

When the Rcpp::exception is thrown, the catcher recognizes it as a std::exception, which is the parent class.

When the Rcpp::internal::InterruptedException is thrown, it is recogized correctly. (This is actually a little surprising to me, since we have other cases where a Rcpp::internal::InterruptedException is not recognized, and this is the problem in practice for us.)

In our use case, we want to know when an interrupt occurs during the execution of an Rcpp::Function and act accordingly, but since the Rcpp::internal::InterruptedException is not properly recognized, we aren't able to handle the interrupt correctly.

Here is a package that demonstrates the issue with Rcpp::internal::InterruptedException not being recognized: https://github.com/jcheng5/exceptiontest. And here is a related issue: r-lib/later#55

cc: @kevinushey

@eddelbuettel
Copy link
Member

Hmpf. Patches welcome. Smells a litle like Tomas Kalibera telling us not to longjmp (I simplified somewhat here...)

Any ideas as to how do make this more solid?

@kevinushey
Copy link
Contributor

I think the problem is that different 'versions' of the various Rcpp objects are getting compiled into the different shared objects generated by cppFunction(), and for some reason they're not considered compatible / the same for the purposes of exception handling within the different modules.

Here's what I see with some debug logging. Note that there's some complications from all the nested try-catch C++ contexts generated by attributes, as well as the R-level errors getting thrown around...

> catcher(rcpp_exception_thrower)
BEGIN_RCPP                  // from catcher
BEGIN_RCPP                  // from rcpp_exception_thrower
END_RCPP: Rcpp::exception   // Rcpp::exception caught in Attributes wrapper
Emitting R error!           // converted to R error
Throwing eval_error()       // R-level error is caught in Rcpp_eval; forwarded on
[1] "Caught std::exception"

So the Rcpp::eval_error thrown by rcpp_exception_thrower() is not considered to be an Rcpp::exception in the context where it's caught (within catcher()). Presumedly this is because they're "technically" different objects, even though they're otherwise ABI-compatible?

A similar story for the Rcpp interrupt exception:

> catcher(rcpp_interrupt_thrower)
BEGIN_RCPP                     // from catcher
BEGIN_RCPP                     // from rcpp_interrupt_thrower
END_RCPP: InterruptedException // Rcpp interrupt caught in Attributes wrapper
Calling Rf_onintr()!           // going back to R
Throwing interrupt exception   // interrupt exception caught in Rcpp_eval; forwarded on

... but I'm not sure why the interrupt exception gets properly caught this time.

@lionel-
Copy link
Contributor

lionel- commented Jun 20, 2019

This seems relevant, from the GCC manual:

-shared-libgcc
-static-libgcc

On systems that provide libgcc as a shared library, these options force the use of either the shared or static version, respectively. If no shared version of libgcc was built when the compiler was configured, these options have no effect.

There are several situations in which an application should use the shared libgcc instead of the static version. The most common of these is when the application wishes to throw and catch exceptions across different shared libraries. In that case, each of the libraries as well as the application itself should use the shared libgcc.

Therefore, the G++ driver automatically adds -shared-libgcc whenever you build a shared library or a main executable, because C++ programs typically use exceptions, so this is the right thing to do.

Tentative list of potential issues:

  • It doesn't seem sufficient to build packages as shared libraries, the main executable also needs to link dynamically to its runtime.

  • R is compiled as a C program. Even it's linked dynamically to its runtime, does this include support for C++ features?

  • The compilers for R and the packages might need to be congruent. Even if the runtime is dynamically linked, and the compilers are congruent, their versions might not be the same.

@wch
Copy link
Author

wch commented Jun 20, 2019

Now that I've done some more exploration, I think that my example might not be a good test of what I had described. This is because the functions are wrapped in R code that calls Rcpp code, and, as Kevin noted, the exceptions are caught by the Rcpp code that wraps the thrower functions, converted to R errors or interrupts, and then Rcpp converts those to C++ exceptions in the catcher function. So in short, the object that's thrown is not the same object that's caught.

The example in https://github.com/jcheng5/exceptiontest is a better test, since the catcher function calls the thrower function's C++ code directly, without all the wrapping and unwrapping.

Another observation: I think that in the vast majority of cases, a C++ function in one package won't directly call a C++ function (which throws a Rcpp exception) in another package. If C++ code in one package calls C++ code in another package, it's much more common for the callee to be wrapped in an R function, and so the caller will go through R, and the exception wrapping and unwrapping that I've described. So in practice, this issue may be a very rare one.

@eddelbuettel
Copy link
Member

R itself has a configure-time switch --enable-shared -- meaning we cannot assume it is on. 😞

@kevinushey
Copy link
Contributor

kevinushey commented Jun 20, 2019

I'm fairly sure R dynamically links to the runtime by default, and that's true in all the cases we're testing here. (Although I guess that's not true on Windows?)

I think the problem is that different versions of Rcpp objects are ending up in different translations units (and so the compiled shared libraries generated by cppFunction() are getting their own versions of these objects) but for some reason they're not considered identical for these cross-module exceptions. But I've had trouble distilling this into a reproducible example independent of Rcpp to validate that claim.

@eddelbuettel
Copy link
Member

Maybe it is an unreasonable assumption? The foreign function interface is in C and .Call() really is all we have. We can dive from R into a direct, reasonably well-defined context. Calling something from there is likely iffy and fraught with "surprises". Different compilation units is one, different shared library, possibly built with different settings, compilers, etc pp sounds like something that can go astray easily.

@lionel-
Copy link
Contributor

lionel- commented Jun 20, 2019

I just tried with a locally compiled R. Having R, jcheng5/exceptiontest, and the sourceCpp() modules all compiled with the same compiler, the exceptions are caught correctly. If I recompile the package or the module with a different compiler, the exception slips through.

I did not add -shared-libgcc flag, so that seems to confirm @kevinushey comment about it being the default. Furthermore when I compile R with -static-libgcc, the interrupt is still caught across modules, and I get this warning when compiling a package/module:

ld: warning: direct access in function 'operator new(unsigned long, std::nothrow_t const&) [clone .cold.0]' from file '/usr/local/Cellar/gcc/8.3.0/lib/gcc/8/gcc/x86_64-apple-darwin18.2.0/8.3.0/../../../libstdc++.a(new_opnt.o)' to global weak symbol 'operator new(unsigned long, std::nothrow_t const&)' from file '/usr/local/Cellar/gcc/8.3.0/lib/gcc/8/gcc/x86_64-apple-darwin18.2.0/8.3.0/../../../libstdc++.a(new_opnt.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings.

@kevinushey
Copy link
Contributor

On macOS, I still see this behavior when using the CRAN binary of R, regardless of whether I compile with the system toolchain or the LLVM toolchain. I also see it with a locally-compiled version of R, even with R and all packages built from sources with the same toolchain. So (at least on macOS) it seems there could be something deeper going on.

I still think the root cause of the issue is that different 'versions' of these exception objects are entering each shared library, and those versions aren't considered compatible for some reason.

As an aside, I also see this behavior on Linux, when using the CRAN binary of R and the default gcc compiler (ie: in theory, all the same toolchain). But I haven't tried compiling R myself there yet.

https://learning.oreilly.com/library/view/c-coding-standards/0321113586/ch63.html seems to suggest what I think we probably all agree on: throwing exceptions across modules isn't a good idea...

@lionel-
Copy link
Contributor

lionel- commented Jun 20, 2019

Ah when I got a success it was with everything compiled with gcc. Maybe clang is the issue?

Edit: And probably small differences in the gcc toolchain, as suggests your Linux experiment. In any case it seems this pattern should simply be avoided.

@wch
Copy link
Author

wch commented Jun 20, 2019

From Kevin's link (C++ Coding Standards by Sutter and Alexandrescu, item 62):

Typically, this boils down to: Don't let exceptions propagate across module/subsystem boundaries.

The C++ Standard does not specify the way exception propagation has to be implemented, and there isn't even a de facto standard respected by most systems. The mechanics of exception propagation vary not only with the operating system and compiler used, but also with the compiler options used to build each module of your application with a given compiler on a given operating system. Therefore, an application must prevent exception handling incompatibilities by shielding the boundaries of each of its major modules, meaning each unit for which the developer can ensure that the same compiler and options are consistently used.

At a minimum, your application must have catch-all catch(...) seals in the following places, most of which apply directly to modules:
...

  • Around callbacks from code you don't control: Operating systems and libraries offer frameworks in which you pass a pointer to a function to be invoked later (e.g., when an asynchronous event occurs). Don't allow exceptions to propagate out of your callback, because it's very possible that the code invoking your callback does not use the same exception handling mechanism. For that matter, it might not even have been written in C++.

If I understand it correctly, then it is not possible to be 100% sure that it is safe to use the C++ exception system for cases like this.

Also from that chapter:

A good solution is to define central functions that translate between exceptions and error codes returned by a subsystem. This way, you can easily translate incoming errors from peer modules into your internally used exceptions and ease integration.

@bgoodri
Copy link
Contributor

bgoodri commented Nov 8, 2019

I encountered a similar error with Stan (which often loads a DLL with a C++ function compiled by inline::cxxfunction) that I was able to eventually work around by setting SHLIB_CXXLD (or SHLIB_CXX11LD if you are requiring C++11) in ~/.R/Makevars to be the same executable that was used to compile the code. For example, if you use clang7 on a Mac, exceptiontest may work if you put

SHLIB_CXXLD = /usr/local/clang7/bin/clang++

or you may need to specify additionally

SHLIB_CXXLD = /usr/local/clang7/bin/clang++ -L/usr/local/clang7/lib

or perhaps

SHLIB_CXXLD = /usr/local/clang7/bin/clang++ -L$(R_HOME)/lib

I don't have a Mac and am not getting the unexpected behavior on Linux, so can someone else confirm?

@eddelbuettel
Copy link
Member

I think we all agree -- did you see what I put in the Rcpp FAQ vignette a few days about this?

## Can we use exceptions and stop() across shared libraries?


Within limits, yes. Code that is generated via Rcpp Attributes (see
\citet{CRAN:Rcpp:Attributes} and Section~\ref{using-attributes}) generally
handles this correctly and gracefully via the `try-catch` layer it adds
shielding the exception from propagating to another, separate dynamic
library.


However, this mechanism relies on dynamic linking with the (system library)
`libgcc` providing the C++ standard library (as well as on using the same C++
standard library across all compiled components). But this library is linked
statically on Windows putting a limitation on the use of `stop()` from within
Rcpp Modules \citep{CRAN:Rcpp:Modules}.  Some more background on the linking
requirement is [in this SO
question](https://stackoverflow.com/questions/2424836/exceptions-are-not-caught-in-gcc-program).

@bgoodri
Copy link
Contributor

bgoodri commented Nov 8, 2019

I had not seen that until now, but I agree.

@kevinushey
Copy link
Contributor

I think also ensuring that /usr/local/clang7/bin is on the PATH will fix these issues as well (so that any references to clang in the various Makefile routines resolve to the LLVM toolchain)

asardaes added a commit to asardaes/wiserow that referenced this issue Mar 1, 2020
@asardaes
Copy link

asardaes commented Mar 2, 2020

I'm not sure if this is an instance of the same core issue, but I'm seeing the same behavior in one of my packages, even though the exception is thrown and catched by the code in my shared library:

  • One class throws std::exception.
  • Another class catches and saves with std::current_exception.
  • The RcppExported function re-throws with std::rethrow_exception.

I have an R-side test that checks whether the thrown exception's message is the expected one, which fails some times because only the generic "c++ exception (unknown reason)" is thrown. What's more puzzling is that this is only failing on the following CRAN systems:

  • OSX release
  • OSX oldrel
  • Fedora clang devel

Debian clang devel works fine, and the OSX build on Travis doesn't show the issue either.

I can of course adjust the test to ignore the message, but that still means the user will only get the generic one if a problem occurs.

EDIT: turns out it's not only a problem with exception types, a segmentation fault actually occurs in the aforementioned CRAN systems (free(): invalid pointer).

@eddelbuettel
Copy link
Member

It would be nicer if all this worked, but alas. You documented this well, but this is to me a restatement of what was already said aboce. Eg a quote (from a C++ authority)/

Typically, this boils down to: Don't let exceptions propagate across module/subsystem boundaries.

and

Maybe clang is the issue? [...] In any case it seems this pattern should simply be avoided.

riccardoporreca added a commit to miraisolutions/rTRNG that referenced this issue Jan 26, 2021
* This works around the fact that, in certain circumstances, we cannot rely on errors propagating with proper `std::invalid_argument` class / error message. See also RcppCore/Rcpp#972
@boennecd
Copy link
Contributor

boennecd commented Feb 25, 2021

I'm not sure if this is an instance of the same core issue, but I'm seeing the same behavior in one of my packages, even though the exception is thrown and catched by the code in my shared library:

  • One class throws std::exception.
  • Another class catches and saves with std::current_exception.
  • The RcppExported function re-throws with std::rethrow_exception.

I have an R-side test that checks whether the thrown exception's message is the expected one, which fails some times because only the generic "c++ exception (unknown reason)" is thrown. What's more puzzling is that this is only failing on the following CRAN systems:

  • OSX release
  • OSX oldrel
  • Fedora clang devel

Debian clang devel works fine, and the OSX build on Travis doesn't show the issue either.

I can of course adjust the test to ignore the message, but that still means the user will only get the generic one if a problem occurs.

EDIT: turns out it's not only a problem with exception types, a segmentation fault actually occurs in the aforementioned CRAN systems (free(): invalid pointer).

I found the a very similar if not identical issue. See this SO question. In some cases, I also get a free(): invalid pointer error or something similar (consistent with the output from valgrind). The error only happens for me on Fedora with clang-11 using libc++ (it works without libc++).

The output from Valgrind though seems to suggest that the error happens quite earlier. Here is the example. Take the following C++ file:

// openmp-exception-issue.cpp
#include <exception>
#include <stdexcept>
#include <Rcpp.h>

// [[Rcpp::export()]]
double that_cpp_func(int const n_it){
  std::exception_ptr Ptr = nullptr;
  bool is_set = false;
  double out(0.);

  for(int i = 0; i < n_it; ++i)
    try
    {
      if(i > -1)
        throw std::runtime_error("boh :(");
      out += i;
    }
    catch (...)
    {
      if(!is_set){
        Ptr = std::current_exception();
        is_set = true;
      }
    }

  if(Ptr)
    std::rethrow_exception(Ptr);

  return out;
}

Setup R on Docker with Fedora as described in the SO question and run (you likely need to change the path):

/root/R-devel/bin/R -d valgrind -e "Rcpp::sourceCpp('/sdir/openmp-exception-issue.cpp'); that_cpp_func(100)"

The output can be found here. The first error shows that there are invalid reads at the end of the catch block which is from memory which is free'd at Ptr = std::current_exception();. The free(): invalid pointer or similar messages I have seen can likely be explained by the Invalid free() / delete / delete[] / realloc() messages at the end of the valgrind output.

@eddelbuettel
Copy link
Member

eddelbuettel commented Feb 25, 2021

I saw your (by now two) SO questions, and this here is likely the relevant context. It is not clear to me that what you desire can be done, but I also have not really dig into the particulars of your questions. But as first sight it appears to be neither a new nor solved problem.

(Edit: Missed a 'not')

@boennecd
Copy link
Contributor

Thank you for the quick answer.

It is not clear to me that what you desire can be done, but I also have really dig into the particulars of your questions.

This is also my conclusion.

But as first sight it appears to be neither a new nor solved problem.

I think an important point is that using std::current_exception should be discouraged when using Rcpp as it can cause a segmentation fault it seems?

@eddelbuettel
Copy link
Member

I am not sure we can generalize to that conclusion. Don't all your example involve MPI and added complications? Exceptions work in general and can be caught, trouble arises (really simplified summary) due to a mix of C and C++ differences, R's setup and use of different compilation units.

@boennecd
Copy link
Contributor

boennecd commented Feb 25, 2021

Don't all your example involve MPI and added complications?

There is no parallel computation in the example I provide here yet the issue persists on the setup? Yes, I do see your point that it works in a lot settings! But it will fail it seems on CRAN's r-devel-linux-x86_64-fedora-clang regardless of whether any parallel computation is done (in some maybe all cases?). This is a big deal for any R package.

@eddelbuettel
Copy link
Member

That it does work or fail on a given system is also known; some of this behaviour is AFAIK listed as "implementation dependent". Which is what you see here: works in some places, but not others. Sadly success in one place cannot be interpolated to all others.

Exceptions are useful, and difficult to get right at scale. Some "guidelines" (including this one) just outlaw them completely.

@kevinushey
Copy link
Contributor

I found the a very similar if not identical issue. See this SO question. In some cases, I also get a free(): invalid pointer error or something similar (consistent with the output from valgrind). The error only happens for me on Fedora with clang-11 using libc++ (it works without libc++).

1 warning generated.
==15727== Invalid read of size 8
==15727==    at 0x71EC2DF: __cxa_end_catch (in /usr/lib64/libstdc++.so.6.0.28)
==15727==    by 0x10B0FB25: that_cpp_func(int) (openmp-exception-issue.cpp:25)

Note the usage of libstdc++: this most likely implies that you're somehow getting both libc++ and libstdc++ mixed into the same process.

@boennecd
Copy link
Contributor

boennecd commented Feb 26, 2021

Note the usage of libstdc++: this most likely implies that you're somehow getting both libc++ and libstdc++ mixed into the same process.

I see, thank you! I wonder if it is the same issue I am having on CRAN's r-devel-linux-x86_64-fedora-clang in my mdgc package (version 0.1.2). CRAN note that

clang was built to use libc++: for a version built to default to libstdc++
(as shipped by Fedora/Debian/Ubuntu), add -stdlib=libc++ to CXX
and install the libcxx-devel/libc++-dev package.

which is what I am doing in the example. One of many errors I have gotten with my mdgc package only on CRAN's r-devel-linux-x86_64-fedora-clang is:

*** caught segfault ***
address 0xffffffff, cause 'memory not mapped'

which is consistent with the Valgrind output.

@github-actions
Copy link

github-actions bot commented Mar 5, 2022

This issue is stale (365 days without activity) and will be closed in 31 days unless new activity is seen. Please feel free to re-open it is still a concern, possibly with additional data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants