Canonicalize away bit width and embed small integers into IntIds #4487

Open
wants to merge 11 commits into
base: trunk

chandlerc
Contributor

@chandlerc chandlerc commented Nov 5, 2024

The first change here is to canonicalize away bit width when tracking
integers in our shared value store. This lets us have a more definitive
model of "what is the mathematical value". It also frees us to use more
efficient bit widths when available, such as bits inside the ID itself.

For canonicalizing, we try to minimize the width adjustments and
maximize the use of the SSO in APInt, so we never shrink below
64 bits and grow in multiples of the word bit width in the
implementation. We also canonicalize to the signed two's complement
representation so we can represent negative numbers in an intuitive way.
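
As a rough illustration of the width rule described above (not the PR's actual code), here is a minimal sketch that only handles values that start out as `int64_t`. The names `WordBits`, `MinSignedBits`, and `CanonicalWidth` are hypothetical and chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed word size: llvm::APInt keeps values of up to 64 bits inline.
constexpr unsigned WordBits = 64;

// Hypothetical helper: the fewest bits that hold `value` in signed two's
// complement, including the sign bit.
constexpr auto MinSignedBits(int64_t value) -> unsigned {
  // For negative values, measure the complement so that -1 needs 1 bit.
  uint64_t magnitude = value < 0 ? ~static_cast<uint64_t>(value)
                                 : static_cast<uint64_t>(value);
  unsigned bits = 1;  // The sign bit.
  while (magnitude != 0) {
    ++bits;
    magnitude >>= 1;
  }
  return bits;
}

// Hypothetical helper: rounds a required signed width up to a whole number
// of words, so the result is never below one word (64 bits).
constexpr auto CanonicalWidth(unsigned min_signed_bits) -> unsigned {
  assert(min_signed_bits >= 1);
  return ((min_signed_bits + WordBits - 1) / WordBits) * WordBits;
}
```

Under these assumptions, every value that fits in 64 signed bits shares one width, and anything needing 65 to 128 bits would share the next one.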

The canonicalizing requires getting the bit width out of the type and
adjusting to it within the toolchain when doing any kind of math, and
this PR updates various places to do that, as well as adding some
convenience APIs to assist.

Then we take advantage of the canonical form and embed small integers
into the ID itself rather than allocating storage for them and
referencing them with an index. This is especially helpful for the
pervasive small integers such as the sizes of types, arrays, etc. Those
no longer require indirection at all. Various short-cut APIs to take
advantage of this have also been added.
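
To make the embedding idea concrete, here is a hedged sketch of a 32-bit ID that either holds a small value directly or encodes an index into separate storage. `SketchIntId` and its range constants are assumptions for illustration. Only `ZeroIndexId`, `is_value`, and the `ZeroIndexId - index` encoding appear in the PR excerpts, and the real bounds may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical 32-bit ID. IDs above `ZeroIndexId` hold the integer value
// itself; IDs at or below it encode an index into out-of-line storage.
class SketchIntId {
 public:
  // Assumed embedded range; the real bounds may differ.
  static constexpr int32_t MaxValue = (1 << 22) - 1;
  static constexpr int32_t MinValue = -(1 << 22);
  // First index ID; indices count downwards from here.
  static constexpr int32_t ZeroIndexId = MinValue - 1;

  // Embeds `value` directly when it is small enough, otherwise returns
  // nullopt so the caller can fall back to out-of-line storage.
  static constexpr auto TryMakeValue(int64_t value)
      -> std::optional<SketchIntId> {
    if (value < MinValue || value > MaxValue) {
      return std::nullopt;
    }
    return SketchIntId(static_cast<int32_t>(value));
  }

  // Encodes a storage index as an ID by counting down from `ZeroIndexId`.
  static constexpr auto MakeIndex(int32_t index) -> SketchIntId {
    assert(index >= 0 &&
           index <= ZeroIndexId - std::numeric_limits<int32_t>::min());
    return SketchIntId(ZeroIndexId - index);
  }

  // A single comparison distinguishes embedded values from indices.
  constexpr auto is_value() const -> bool { return id_ > ZeroIndexId; }

  constexpr auto AsValue() const -> int32_t {
    assert(is_value());
    return id_;
  }

  constexpr auto AsIndex() const -> int32_t {
    assert(!is_value());
    return ZeroIndexId - id_;
  }

 private:
  explicit constexpr SketchIntId(int32_t id) : id_(id) {}
  int32_t id_;
};
```

With this shape, reading a small integer such as a type size needs no lookup at all: the ID is the value.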

This PR improves lexing by about 5% when there are lots of `i32` types.

@chandlerc chandlerc force-pushed the fast-ints2 branch 2 times, most recently from 043e620 to 833c177 Compare November 6, 2024 00:42
@chandlerc chandlerc marked this pull request as ready for review November 6, 2024 01:05
@github-actions github-actions bot requested a review from jonmeow November 6, 2024 01:06
Contributor

@danakj danakj left a comment

Reading through to try to wrap my head around everything, noticed a few inconsequential things along the way.

toolchain/base/int_store.h (outdated, resolved)
toolchain/base/int_store.h (outdated, resolved)
toolchain/base/int_store.cpp (outdated, resolved)
toolchain/base/value_ids.h (outdated, resolved)
toolchain/check/eval.cpp (outdated, resolved)
@chandlerc chandlerc changed the title from "WIP: Canonicalize ints across bitwidth and optimize" to "Canonicalize away bit width and embed small integers into IntIds" Nov 7, 2024
@jonmeow
Contributor

jonmeow commented Nov 7, 2024

> This PR improves lexing by about 5% when there are lots of i32 types.

What percentage of tokens/bytes being i32 results in 5% lex improvement? Can you give a little more context for this?

Contributor

@jonmeow jonmeow left a comment

Generally LG, sorry about my usual spread of comments. High level I think the IntId and IntStore changes look pretty much like what I'd expected after discussions, I'm glad for the noted performance improvements.

toolchain/sem_ir/type.h (resolved)
toolchain/sem_ir/file.h (outdated, resolved)
toolchain/sem_ir/file.h (outdated, resolved)
toolchain/sem_ir/file.h (outdated, resolved)
toolchain/sem_ir/file.h (outdated, resolved)
toolchain/base/int_store.h (outdated, resolved)

static auto MakeIndexOrInvalid(int index) -> IntId {
CARBON_DCHECK(index >= 0 && index <= InvalidIndex);
return IntId(ZeroIndexId - index);
Contributor

Is there validation that this doesn't produce incorrect values? Is it possible to have a unit test that tries making too many unique integers, to check for graceful failure?

Contributor Author

Hmm...

I don't think the unit test is easy to do here, as we don't even have the token payload size limitation, and so we can have a lot of unique integers. Should be 2 billion - 8 million or something, and each needs its own APInt.

But one thing that made me happy about the logic here is that we actually compute the ID from InvalidIndex (which is the largest value of index allowed) in a constexpr context below. And that should ensure that this subtraction doesn't hit UB provided the assert above it holds, and produces the expected ID value even for the largest value. And for the smallest of 0, it's pretty easy to analyze.
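
A minimal sketch of that constexpr check, using assumed constant values rather than the real ones: signed overflow is not permitted during constant evaluation, so the `static_assert`s below would fail to compile if the subtraction overflowed at either extreme. `MakeIndexOrInvalidId` is a hypothetical stand-in that returns the raw 32-bit ID.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed value for the sketch; the real constant may differ.
constexpr int32_t ZeroIndexId = -(1 << 22) - 1;

// Largest index whose ID still fits in int32_t. Assumed here to double as
// the invalid marker.
constexpr int32_t InvalidIndex =
    ZeroIndexId - std::numeric_limits<int32_t>::min();

// Hypothetical stand-in for the ID computation: indices count downwards
// from `ZeroIndexId`.
constexpr auto MakeIndexOrInvalidId(int32_t index) -> int32_t {
  assert(index >= 0 && index <= InvalidIndex);
  return ZeroIndexId - index;
}

// Both extremes are evaluated at compile time, where overflow is an error.
static_assert(MakeIndexOrInvalidId(0) == ZeroIndexId, "smallest index");
static_assert(MakeIndexOrInvalidId(InvalidIndex) ==
                  std::numeric_limits<int32_t>::min(),
              "largest index");
```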

Contributor

More focused on lex, that's a lower limit of 2 million right? Is that feasible to test, like with a string of long integers one after the other?

I think 2B may be infeasible to reach until we get metaprogramming.

Note, fine to not address this in this PR, but I do lean towards testing lex thresholds given the low-ish limits.


// Tries to make a signed APInt into an embedded value in the ID, and if
// unable to do that returns the `Invalid` ID.
static auto TryMakeSignedValue(llvm::APInt value) -> IntId {
Contributor

FWIW, since you'd asked for organizational comments, it might be worth moving these Make functions to IntStore (if the result is more compact)... for example, I'm having to flip back and forth between files in order to understand how IntStore::AddSigned works, and that might've been something that could be in one spot.

Contributor Author

As discussed, merged into one file.

Once there, I moved these all to be private helper functions in IntStore.

I actually tried inlining most of them, but it felt slightly awkward. We end up wanting both Add... and Lookup... code paths in the store I think, at least for generality. And these helpers are useful to extract and make common between those.

I actually added another Lookup to simplify one of the places where we unnecessarily were forming an APInt. Currently there aren't a lot of Lookup calls, but it seems like an important API from a library design perspective so I didn't want to fully remove them.

That said, happy to revisit or discuss if there is a cleaner way to structure this... not super confident in the exact result I ended up with.

toolchain/base/int_store_test.cpp (resolved)
toolchain/base/int_store_test.cpp (outdated, resolved)
Co-authored-by: Jon Ross-Perkins <[email protected]>
toolchain/base/int_store.h (outdated, resolved)
toolchain/base/int_store.h (outdated, resolved)
toolchain/sem_ir/file.h (resolved)
toolchain/sem_ir/file.h (outdated, resolved)
Contributor Author

@chandlerc chandlerc left a comment

Thanks for the detailed comments, I think I've gotten to them all, but let me know if I missed anything!

// This will always be a signed `APInt` with a canonical bit width for the
// specific integer value in question.
auto Get(IntId id) const -> llvm::APInt {
if (id.is_value()) [[likely]] {
Contributor Author

I just noticed that we have standard attributes now. Happy to either switch to LLVM ones until we can move the rest of the code, or move the rest of the code in a follow-up.

toolchain/base/int_store.h (outdated, resolved)
toolchain/base/value_ids.h (outdated, resolved)
toolchain/base/int_store.h (outdated, resolved)
toolchain/base/int_store.h (outdated, resolved)
@@ -46,7 +46,7 @@ static auto MakeI32Literal(Context& context, Parse::NodeId node_id,
  return context.AddInst<SemIR::IntValue>(
  node_id,
  {.type_id = context.GetBuiltinType(SemIR::BuiltinInstKind::IntType),
- .int_id = context.ints().Add(i32_val)});
+ .int_id = context.ints().AddUnsigned(i32_val)});
Contributor Author

This code path didn't get updated enough, all of this should have been simplified with this PR to just pass through the ID after verifying that the value fits into an i32. The extending and creating a new ID all stemmed from when there was implicit bit width in the integer IDs themselves. The new code should be more clear.

That said, I have thought about removing AddUnsigned and forcing the lexer to form the unsigned APInt, but I'm worried that would add cost due to needing a wider APInt earlier in the process.

Because we want to canonicalize the bit width inside the store, I didn't want clients to do any unnecessary resizing if possible, and the cleanest way I see to do that is to let them directly add an unsigned APInt if that's what they have.
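
One way to see the widening concern, as a sketch with hypothetical helper names and plain `uint64_t` instead of `APInt`: an unsigned value with its top bit set fits in 64 bits as unsigned, but needs 65 bits once it has to be stored as a non-negative signed number.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bits needed to hold an unsigned `value` as a
// non-negative signed two's complement number (active bits plus a sign bit).
constexpr auto SignedBitsForUnsigned(uint64_t value) -> unsigned {
  unsigned bits = 1;  // The sign bit.
  while (value != 0) {
    ++bits;
    value >>= 1;
  }
  return bits;
}

// Whether the value still fits in one 64-bit word once treated as signed.
constexpr auto FitsInOneSignedWord(uint64_t value) -> bool {
  return SignedBitsForUnsigned(value) <= 64;
}

// The largest positive int64_t fits; one past it no longer does.
static_assert(FitsInOneSignedWord(0x7FFFFFFFFFFFFFFFull), "fits in 64 bits");
static_assert(!FitsInOneSignedWord(0x8000000000000000ull), "needs 65 bits");
```

Letting the store accept the unsigned form directly would mean only the store has to decide when the extra word is really needed.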

toolchain/sem_ir/type.h (resolved)
toolchain/lower/constant.cpp (resolved)
// Exceptions. See /LICENSE for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

#ifndef CARBON_TOOLCHAIN_BASE_INT_STORE_H_
Contributor Author

SGTM. I'll do the rename from int_store.h to int.h last to preserve review threads as much as I can.

toolchain/base/value_ids.h (outdated, resolved)
Contributor Author

@chandlerc chandlerc left a comment

Doh, missed replying to one thread it seems, but found it now and replied below. (The code change was already in, just lost the thread.)

toolchain/sem_ir/file.h (outdated, resolved)
@chandlerc
Contributor Author

> > This PR improves lexing by about 5% when there are lots of i32 types.
>
> What percentage of tokens/bytes being i32 results in 5% lex improvement? Can you give a little more context for this?

This is just in the compile_benchmark for the lex phase, using the generated source there:

BM_CompileAPIFileDenseDecls<Phase::Lex>/256       36.2µs ± 2%  35.8µs ± 3%  -1.09%  (p=0.003 n=20+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/1024       163µs ± 1%   159µs ± 1%  -2.48%  (p=0.000 n=19+18)
BM_CompileAPIFileDenseDecls<Phase::Lex>/4096       660µs ± 1%   640µs ± 1%  -3.13%  (p=0.000 n=20+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/16384     2.97ms ± 2%  2.82ms ± 1%  -5.07%  (p=0.000 n=20+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/65536     12.8ms ± 1%  12.2ms ± 1%  -4.42%  (p=0.000 n=20+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/262144    58.8ms ± 1%  57.2ms ± 2%  -2.73%  (p=0.000 n=19+20)

Seems to fluctuate a bit between 2% and 5%. The 1% for the smallest file is because we spend more time in setup/teardown.

The % of tokens that are i32 in this test is 4.6% -- not tiny, but also not huge.

Contributor

@jonmeow jonmeow left a comment

This looks good. I think my comments are pretty small, except for the one "lex file with 2M ints" test suggestion which I'm happy to split out. So feel free to merge when you've had a chance to go through remaining stuff.

@@ -70,6 +77,25 @@ class File : public Printable<File> {
return types().GetAs<PointerType>(pointer_id).pointee_id;
}

// Returns integer type information from a type ID. Abstracts away the
// difference between an `IntType` instruction defined type and a builtin
// instruction defined type.
Contributor

Suggested change
- // instruction defined type.
+ // instruction defined type. Uses IntId::Invalid for types that have an
+ // invalid width.

auto LookupLarge(int64_t value) const -> IntId;
auto LookupSignedLarge(llvm::APInt value) const -> IntId;

CanonicalValueStore<APIntId> values_;
Contributor

Suggested change
- CanonicalValueStore<APIntId> values_;
+ // Stores values which don't fit in an IntId. These are always signed.
+ CanonicalValueStore<APIntId> values_;

private:
friend struct Testing::IntStoreTestPeer;

struct APIntId : IdBase, Printable<APIntId> {
Contributor

Suggested change
- struct APIntId : IdBase, Printable<APIntId> {
+ // Used for `values_`; tracked using `IntId`'s index range.
+ struct APIntId : IdBase, Printable<APIntId> {

return ZeroIndexId - id_;
}

constexpr auto AsTokenPayload() const -> uint32_t {
Contributor

Suggested change
- constexpr auto AsTokenPayload() const -> uint32_t {
+ // Returns the ID formatted as a lex token payload.
+ constexpr auto AsTokenPayload() const -> uint32_t {

// This will always be a signed `APInt` with a canonical bit width for the
// specific integer value in question.
auto Get(IntId id) const -> llvm::APInt {
if (id.is_value()) [[likely]] {
Contributor

My thought is we've generally agreed to use C++ attribute forms so that seems the better choice. I don't think it makes sense to switch this code if the rest changes.

// Used to return information about an integer type in `GetIntTypeInfo`.
struct IntTypeInfo {
bool is_signed;
IntId bit_width = IntId::Invalid;
Contributor

Suggested change
- IntId bit_width = IntId::Invalid;
+ IntId bit_width;

Looks like this default is unused, suggesting removal.

Comment on lines +158 to +161
// Because this is the first index ID, and we encoded indices as successive
// negative numbers counting downwards, we can both use a comparison with
// this ID to distinguish value and index IDs, and to compute the actual index
// from the ID. The computation of an index in fact is just a subtraction:
Contributor

I'm having trouble reading this due to the commas. What do you think of:

Suggested change
- // Because this is the first index ID, and we encoded indices as successive
- // negative numbers counting downwards, we can both use a comparison with
- // this ID to distinguish value and index IDs, and to compute the actual index
- // from the ID. The computation of an index in fact is just a subtraction:
+ // ZeroIndexId is the first index ID, and we encode indices as successive
+ // negative numbers counting downwards. The setup allows us to both use a comparison with
+ // this ID to distinguish value and index IDs, and to compute the actual index
+ // from the ID.
+ //
+ // The computation of an index in fact is just a subtraction:

Comment on lines +406 to +407
// only a few lines of code, but it ends up expensive and a lot of code so we
// move these out-of-line.
Contributor

Suggested change
- // only a few lines of code, but it ends up expensive and a lot of code so we
- // move these out-of-line.
+ // only a few lines of code, but we move these out-of-line because the generated code is big and harms performance for the non-`Large` common case.

Suggesting a slightly different comment due to discussion.

toolchain/base/int_store_test.cpp (outdated, resolved)
Comment on lines +187 to +192
// Each bit is either `T` for part of the token or `P` as part
// of the available payload that we use for the ID:
//
// clang-format off: visualizing bit positions
//
// 0bTTTT'TTTT'TPPP'PPPP'PPPP'PPPP'PPPP'PPPP
Contributor

Nuanced thing, the clang-format throws me a little, maybe one of:

Suggested change
- // Each bit is either `T` for part of the token or `P` as part
- // of the available payload that we use for the ID:
- //
- // clang-format off: visualizing bit positions
- //
- // 0bTTTT'TTTT'TPPP'PPPP'PPPP'PPPP'PPPP'PPPP
+ // clang-format off: visualizing bit positions
+ //
+ // Each bit is either `T` for part of the token or `P` as part
+ // of the available payload that we use for the ID:
+ //
+ // 0bTTTT'TTTT'TPPP'PPPP'PPPP'PPPP'PPPP'PPPP
Suggested change
- // Each bit is either `T` for part of the token or `P` as part
- // of the available payload that we use for the ID:
- //
- // clang-format off: visualizing bit positions
- //
- // 0bTTTT'TTTT'TPPP'PPPP'PPPP'PPPP'PPPP'PPPP
+ // Each bit is tagged either `T` for part of the token or `P` as part
+ // of the available payload that we use for the ID.
+ //
+ // clang-format off: visualizing bit positions
+ //
+ // 0bTTTT'TTTT'TPPP'PPPP'PPPP'PPPP'PPPP'PPPP
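
For readers following the bit diagram, here is a hedged sketch of packing under that layout: 9 high `T` bits and 23 low `P` bits in one 32-bit word. All names are hypothetical, and the sign-extension round trip for the ID is an assumption about how a signed ID might travel through the payload, not something the excerpts confirm.

```cpp
#include <cassert>
#include <cstdint>

// Bit counts read off the diagram: 9 `T` bits followed by 23 `P` bits.
constexpr int PayloadBits = 23;
constexpr int TokenBits = 32 - PayloadBits;
constexpr uint32_t PayloadMask = (uint32_t{1} << PayloadBits) - 1;

// Combines the token bits (high) and the payload (low) into one word.
constexpr auto Pack(uint32_t token_bits, uint32_t payload) -> uint32_t {
  assert(token_bits < (uint32_t{1} << TokenBits));
  assert(payload <= PayloadMask);
  return (token_bits << PayloadBits) | payload;
}

constexpr auto TokenBitsOf(uint32_t word) -> uint32_t {
  return word >> PayloadBits;
}

constexpr auto PayloadOf(uint32_t word) -> uint32_t {
  return word & PayloadMask;
}

// Assumption: a signed ID is truncated to its low 23 bits going in...
constexpr auto IdToPayload(int32_t id) -> uint32_t {
  return static_cast<uint32_t>(id) & PayloadMask;
}

// ...and sign-extended coming out, without relying on signed shifts.
constexpr auto PayloadToId(uint32_t payload) -> int32_t {
  constexpr uint32_t SignBit = uint32_t{1} << (PayloadBits - 1);
  return static_cast<int32_t>(payload ^ SignBit) -
         static_cast<int32_t>(SignBit);
}
```

The round trip only preserves IDs that fit in 23 signed bits, which is why the lex-time limits discussed earlier in the thread are lower than the overall ID space.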

4 participants