Skip to content

Add string_view API for c++17 #410

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

roligugus
Copy link

@roligugus roligugus commented Nov 3, 2020

Follow-up of WIP: #80 to add std::string_view support.

std::string_view is enabled if cxx is compiled with c++17 or later. One will need to set the c++17 (or later) feature on the cxx crate for cargo-based builds. Also added a "CXX Features" section to the cargo build info if that helps.

Notes:

  • This only replaces the std::string ctor, not the ptr and ptr/len ctors as in the WIP
    • We could replace the latter ctors as well, but it can break client code. It would require changes for rust::str and rust::String parameters and return values in the client code. Not sure you'd want to break the API.
  • Tests are only enabled for c++17 (or later)
    • One needs to set the c++17 (or later) feature on cxx in tests/ffi/Cargo.toml at this time. Not ideal, but a start.
  • Added a "CXX Features" section to the documentation if that helps

#error "string_view gets auto-detected"
#endif

// clang-format off
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabling clang-format as settings will flatten everything which makes this hard to follow.

Using normal feature detection for compilers supporting it. Feature detection was also noted in the WIP PR.
Tested paths on godbolt.org as I don't have access to all the compiler versions.

Idea is that if string_view is supported, we should support it.

src/cxx.cc Outdated
@@ -65,7 +65,11 @@ static void initString(String *self, const char *s, size_t len) {
}
}

#ifdef CXXBRIDGE_HAS_STRING_VIEW
String::String(std::string_view sv) : String(sv.data(), sv.length()) {}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forwarding it to ptr/len ctor that does the nice asserts.

FYI, empty string_view has nullptr and 0 size.

@@ -14,6 +14,45 @@
#include <basetsd.h>
#endif

#ifdef CXXBRIDGE_HAS_STRING_VIEW
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using CXXBRIDGE_.. naming to align more with other defines. WIP used CXX_... naming

Bikeshedding time?

@@ -598,9 +598,26 @@ extern "C" const char *cxx_run_test() noexcept {
r_take_unique_ptr(std::unique_ptr<C>(new C{2020}));
r_take_ref_c(C{2020});
r_take_str(rust::Str("2020"));
r_take_empty_str(rust::Str(""));
Copy link
Author

@roligugus roligugus Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added "empty" tests for std::string_view and figured we might as well add tests for rust::str and rust::String

#else
// no __has_include
#if _MSC_VER >= 1910L && _MSVC_LANG > 201402L
// msvc: 19.10 requires c++latest (!)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Older msvc compilers that supported string_view. Not sure what compiler versions you want to support, but I've included the ones I could get my hands on through godbolt.

There could be a discussion if it would make sense to support "c++latest" in cxx as a feature (for msvc).

Copy link
Owner

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This is great. I am going to defer reviewing this for a little bit while I get the https://cxx.rs site wrapped up and published. That will also provide a better place to document the features situation -- I think it's not good for a readme to go into this level of detail.

@roligugus roligugus force-pushed the string_view branch 2 times, most recently from d3eb663 to 18d9c71 Compare November 24, 2020 20:36
@roligugus
Copy link
Author

That will also provide a better place to document the features situation -- I think it's not good for a readme to go into this level of detail.

No pressure to look at this. Just had some downtime at work to quickly run after this.

Rebased changes and I've moved the "features" documentation into the book/build/cargo.md. Not sure if you want to have it there or somewhere else. Just a placeholder, really.

@@ -144,6 +144,46 @@ manifest key][links]* in the Cargo reference.

[links]: https://doc.rust-lang.org/cargo/reference/build-scripts.html#the-links-manifest-key

## Features
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that's the place where you'd want to document features if at all. Just a placeholder.

Nicer to look at page: https://github.com/roligugus/cxx/blob/string_view/book/src/build/cargo.md#features

r_take_rust_string(std::string_view("2020"));
r_take_empty_rust_string(std::string_view{});
r_take_empty_rust_string(std::string_view(""));
const auto rustString = rust::String("2020"); // avoid lifetime issue
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What lifetime issue is this comment referring to?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Lifetime issue" was referring to lifetime of (temporary) rust::String("2020") needing to exceed lifetime of string_view in ASSERT() below.

Was easier to use defensive programming than trying to figure it out exactly. Anyhow, checked and the temporary rust::String("2020") is only destructed after the comparison.

Will remove this var and merge code into ASSERT().

Copy link
Owner

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned that in this implementation, simply adding a new dependency to a Cargo build graph can cause existing working code to fail to link. If the newly added crates somewhere contain a dependency on "cxx/c++17", that leads to removing various symbols from the cxx rlib without doing anything to eliminate use of those symbols in elsewhere in the build graph.

Relatedly it seems like this implementation violates ODR which is UB (https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule), because the class definitions of Str and String don't necessarily "consists of the same sequence of tokens" each place they appear.

Thoughts? It seems like more design work is still required on how to integrate this from the build side.

@roligugus
Copy link
Author

roligugus commented Nov 26, 2020

I am concerned that in this implementation, simply adding a new dependency to a Cargo build graph can cause existing working code to fail to link. If the newly added crates somewhere contain a dependency on "cxx/c++17", that leads to removing various symbols from the cxx rlib without doing anything to eliminate use of those symbols in elsewhere in the build graph.

Yes, that is a downside of tying that functionality to a C++ standard switch and automatically enabling it in the header if it's available while at the same time removing existing functionality. The nice thing is that you get link errors due to missing symbols that your particular client compilation requires.

UPDATE: Removed paragraph as I think I understood wrongly what you meant. The other stuff here still applies, imho.

The current approach would not work for these cases as it breaks the cxx lib ABI as well as the API.

In order to always be able to handle this case, one must never remove current functionality to be backwards compatible from an ABI point-of-view. One could be somewhat flexible with APIs depending on the change - more on that later. Yeah, there's also the "let's version APIs and lib" approach that one would do in C++/C land (not sure about rust), but I don't think that worth considering as that's always a pain and I typically like avoiding that like the pest.

Possible approach to follow...

@roligugus
Copy link
Author

Relatedly it seems like this implementation violates ODR which is UB (https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule), because the class definitions of Str and String don't necessarily "consists of the same sequence of tokens" each place they appear.

Don't think this is an ODR problem in this instance. The cxx rlib itself is fine if compiled e.g. with C++17. No ODR there. Your client code is also fine even though your included cxx header might declare different things. No ODR there. However, you will run into link errors when trying to link the cxx rlib into your client lib/exec conveniently avoiding ODR.

It only would become an ODR issue if you were compiling cxx and your client TUs directly into the same executable or lib without going through a separate cxx lib. However, in that case you will be using your own build system and you better be compiling everything linked together into the same lib/exec using the same compilation settings.

@roligugus
Copy link
Author

roligugus commented Nov 27, 2020

Thoughts? It seems like more design work is still required on how to integrate this from the build side.

Agree. So what can we do to avoid that?

The problem is that this PR breaks the API and the ABI by adding a new string_view ctor API (with implementation) and removing another string ctor API (with implementation). In addition it does this silently depending on how the client using cxx is compiled. Such an API and ABI change is only workable if you control all code and are willing to set the same compilation options used to toggle APIs in cxx.

A way of handling that is not to automatically enable the string_view APIs
Using a #define to enable the string_view API on demand while using it in your client code has the same issue as automatically enabling it upon a certain c++ standard. Different clients might have different defines resulting in the same issues we try to avoid.

The only way to handle that would be to do something similar to what e.g. autotools do. The idea is to bake the functionality into cxx. E.g. have the cxx crate write a config.h equivalent that #defines what features are enabled during the cxx lib build and then have cxx.h include that generated header. This way, the functionality that cxx supports (defined by its features) is defined by cxx itself and not by the dependencies using it.

However, you still would run into the same issue for breaking changes where client A needs cxx features X and client B needs cxx feature Y. So that would not be solved as the cxx lib compilation would define the supported functionality depending on the features defined.

A better way to fix that for this change is to "only add new APIs".
Essentially chosing the approach of not breaking the cxx rlib ABI. APIs still can be changed in ways that don't break the existing ABI. In this case, we can add the string_view APIs, while keeping the string APIs around.

Since we're currently keeping the char*-parameter constructors, we should be able to simply add the string_view-parameter constructor a la:

#ifdef CXXBRIDGE_HAS_STRING_VIEW
  String(std::string_view);
#endif
  String(const std::string &);
  String(const char *);
  String(const char *, size_t);

Ditto with the conversion operator.

This will keep the ABI backwards-compatible, i.e. client code compiled with e.g. c++14 will be able to compile and link against a cxx rlib compiled with c++17. This allows to compile the cxx lib with whatever highest c++ standard required to support both c++14 (using string ctor) clients and c++17 clients (using string_view ctor).

We can go further than that. By inlining the new string_view APIs in the cxx.h header, we can even support the "other direction", i.e. C++17 clients using the string_view APIs compiled against a cxx lib compiled with c++14. The trick is in the inlined "C++17 string_view API" code that essentially gets injected into the client compilation via cxx.h due to inlining and internally only calling existing C++14/11 cxx APIs.

We can keep the automatic enabling of the string_view ctor and conversion operators to support string_view APIs.

Sounds like a better approach? I can update this PR to give a better picture.

<tr><td>c++17</td><td>-std=c++17 (gcc, clang), /std:c++17 (msvc)</td><td>Compile CXX lib with c++17</td></tr>
</table>

## Additional C++ standard features
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential documentation for enabling additional C++ APIs.

Nicer to look at page:
https://github.com/roligugus/cxx/blob/string_view/book/src/build/cargo.md#additional-c-standard-features

@@ -34,6 +73,9 @@ class String final {
String(String &&) noexcept;
~String() noexcept;

#ifdef CXXBRIDGE_HAS_STRING_VIEW
String(std::string_view sv) : String(sv.data(), sv.length()) {}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add "always inline", e.g. __attribute__((always_inline)) (gcc/clang) and __forceinline (msvc), to make sure it gets inlined even if e.g. people disable inlining, etc.

@hallfox
Copy link

hallfox commented Jul 8, 2023

Hi there! I went looking for this feature earlier today, seems like it's stalled out. Is there any interest in revisiting this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants