Reduce size of buffer, stringBuffer and tape. #42

Open: wants to merge 1 commit into main

7 changes: 4 additions & 3 deletions src/main/java/org/simdjson/SimdJsonParser.java
@@ -11,7 +11,7 @@ public class SimdJsonParser {
private final StructuralIndexer indexer;
private final BitIndexes bitIndexes;
private final JsonIterator jsonIterator;
- private final byte[] paddedBuffer;
+ private final int capacity;

public SimdJsonParser() {
this(DEFAULT_CAPACITY, DEFAULT_MAX_DEPTH);
@@ -20,7 +20,7 @@ public SimdJsonParser() {
public SimdJsonParser(int capacity, int maxDepth) {
bitIndexes = new BitIndexes(capacity);
jsonIterator = new JsonIterator(bitIndexes, capacity, maxDepth, PADDING);
- paddedBuffer = new byte[capacity];
+ this.capacity = capacity;
reader = new BlockReader(STEP_SIZE);
indexer = new StructuralIndexer(bitIndexes);
}
@@ -34,7 +34,8 @@ public JsonValue parse(byte[] buffer, int len) {
}

private byte[] padIfNeeded(byte[] buffer, int len) {
- if (buffer.length - len < PADDING) {
+ if (buffer.length - len < PADDING && len < capacity) {
+ byte[] paddedBuffer = new byte[len + PADDING];
Member

The reason for maintaining the paddedBuffer all the time, regardless of whether it is necessary or not, is to avoid allocations on hot paths. However, I see at least two issues with padding in general. Firstly, it requires adding this extra branch. Secondly, it complicates the API: on one hand, the user doesn't need to be aware of it, but on the other hand, if they want to achieve the best performance, they should pad the input. Therefore, I've been considering removing the need for padding altogether. It should be possible, although I haven't thoroughly researched this topic.

To summarize: I'd start by verifying if removing the padding is possible. If so, I'd remove it and test the performance of the parser. If there is no regression compared to the current version with padding, we have a win-win situation.
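To make the trade-off concrete, here is a minimal, self-contained sketch of the padding check as it appears in `padIfNeeded` (the `len < capacity` guard from this PR is left out). `PaddingDemo` is a hypothetical name and `PADDING = 64` is assumed from this thread rather than read from the library. It shows that unpadded input forces an allocation and copy on every call, while a caller that leaves at least `PADDING` spare bytes after `len` gets its own array back untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified stand-in for the padding step discussed above. The class name is
// hypothetical and PADDING = 64 is assumed from this thread.
final class PaddingDemo {
    static final int PADDING = 64;

    // Returns the caller's array when at least PADDING bytes follow len;
    // otherwise copies the first len bytes into a new array whose extra
    // PADDING bytes are zero.
    static byte[] padIfNeeded(byte[] buffer, int len) {
        if (buffer.length - len < PADDING) {
            byte[] padded = new byte[len + PADDING];
            System.arraycopy(buffer, 0, padded, 0, len);
            return padded;
        }
        return buffer;
    }

    public static void main(String[] args) {
        byte[] json = "{\"a\":1}".getBytes(StandardCharsets.UTF_8);

        // Unpadded input: a copy is made on every call.
        System.out.println(padIfNeeded(json, json.length) != json); // true

        // Input padded by the caller: returned as-is, no allocation.
        byte[] prePadded = Arrays.copyOf(json, json.length + PADDING);
        System.out.println(padIfNeeded(prePadded, json.length) == prePadded); // true
    }
}
```

This is the API burden described above: the fast path exists, but only callers who know about `PADDING` can reach it.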

Author

I think that, by design, the padding only needs to be 64 bytes. However, the 'paddedBuffer' is 34 MB, which is a huge waste. So I changed it to add just the 64 bytes of padding to the input.

Member

I agree with you that this is a waste. However, in your approach, you are potentially allocating a new array on every call of the parse method, which can be costly.

I've been working on removing the padding entirely. It's a bit complicated, but we will see if it is feasible. I'll report back.
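One possible middle ground between a fixed 34 MB buffer and a fresh allocation per call is a padded buffer that is reused and grows only when a larger input arrives. The sketch below is an assumption, not something proposed in this thread or present in simdjson-java; `GrowablePadBuffer`, `pad` and `capacity` are hypothetical names, and `PADDING = 64` is taken from the discussion above.

```java
import java.util.Arrays;

// Hypothetical helper (not part of simdjson-java): keeps one padded buffer
// alive across parse calls and reallocates only when a larger input arrives.
final class GrowablePadBuffer {
    // Assumed padding size, taken from this discussion, not from the library.
    static final int PADDING = 64;

    private byte[] buf = new byte[0];

    byte[] pad(byte[] input, int len) {
        if (input.length - len >= PADDING) {
            return input; // caller already left enough room; no copy
        }
        if (buf.length < len + PADDING) {
            // Grow geometrically so steadily increasing inputs do not
            // reallocate on every call; Math.max falls back to the exact
            // size if doubling overflows or is still too small.
            buf = new byte[Math.max(len + PADDING, buf.length * 2)];
        }
        System.arraycopy(input, 0, buf, 0, len);
        // Zero the padding window so bytes left over from an earlier,
        // longer input are not visible right after len.
        Arrays.fill(buf, len, len + PADDING, (byte) 0);
        return buf;
    }

    int capacity() {
        return buf.length;
    }

    public static void main(String[] args) {
        GrowablePadBuffer pads = new GrowablePadBuffer();
        byte[] big = pads.pad(new byte[1000], 1000); // allocates once
        byte[] small = pads.pad(new byte[10], 10);   // reuses the same array
        System.out.println((big == small) + " " + pads.capacity()); // true 1064
    }
}
```

The costs are that memory stays at the peak input size, and, like the current preallocated paddedBuffer, the returned array is overwritten by the next call, so anything that keeps a reference to it (for example a JsonValue built from the padded input) sees the next input's bytes.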

System.arraycopy(buffer, 0, paddedBuffer, 0, len);
return paddedBuffer;
}
4 changes: 4 additions & 0 deletions src/main/java/org/simdjson/StringParser.java
@@ -26,6 +26,10 @@ class StringParser {
this.stringBuffer = stringBuffer;
}

+ int getStringBufferIdx() {
+ return stringBufferIdx;
+ }
+
void parseString(byte[] buffer, int idx) {
tape.append(stringBufferIdx, STRING);
int src = idx + 1;
6 changes: 6 additions & 0 deletions src/main/java/org/simdjson/Tape.java
@@ -25,6 +25,12 @@ class Tape {
tape = new long[capacity];
}

+ Tape(Tape other) {
+ this.tape = new long[other.tapeIdx];
+ System.arraycopy(other.tape, 0, this.tape, 0, other.tapeIdx);
+ this.tapeIdx = other.tapeIdx;
+ }
+
void append(long val, char type) {
tape[tapeIdx] = val | (((long) type) << 56);
tapeIdx++;
8 changes: 7 additions & 1 deletion src/main/java/org/simdjson/TapeBuilder.java
@@ -206,7 +206,13 @@ void reset() {
}

JsonValue createJsonValue(byte[] buffer) {
- return new JsonValue(tape, 1, stringBuffer, buffer);
+ Tape newTape = new Tape(tape);
+
+ int stringBufferLen = stringParser.getStringBufferIdx();
+ byte[] newStringBuffer = new byte[stringBufferLen];
+ System.arraycopy(stringBuffer, 0, newStringBuffer, 0, stringBufferLen);
Member

Is this change related to #36? I'm asking because I'm a bit concerned that we need another allocation on the parsing path.

Author

ZhaiMo15, Apr 29, 2024

Yes. I think there has to be an allocation somewhere if we want to preserve the "old" data.

Author

ZhaiMo15, Apr 29, 2024

And, as I mentioned above, the default size of buffer and stringBuffer, as well as of the long[] tape in class Tape, is 34M. If we allocated 3 × 34M for each returned JsonValue, the cost would be far too high.
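Why some copy is needed, and why copying only the used prefix keeps it cheap, can be shown with a heavily simplified model of a parser that reuses one preallocated array across calls. `ReusingParser`, `parseShared` and `parseSnapshot` are hypothetical names for illustration only; the "tape" here is just a `long[]` with one entry per input value, not the real tape format.

```java
import java.util.Arrays;

// Hypothetical, heavily simplified model of the parser's reusable state:
// one preallocated array that every parse call overwrites from index 0.
final class ReusingParser {
    private final long[] tape;
    private int tapeIdx;

    ReusingParser(int capacity) {
        tape = new long[capacity];
    }

    // Returns the shared array itself: the next call overwrites it.
    long[] parseShared(long... values) {
        write(values);
        return tape;
    }

    // Copies only the used prefix, as the Tape(Tape other) constructor in
    // this PR does, so the cost scales with tapeIdx rather than capacity.
    long[] parseSnapshot(long... values) {
        write(values);
        return Arrays.copyOf(tape, tapeIdx);
    }

    private void write(long[] values) {
        tapeIdx = 0;
        for (long v : values) {
            tape[tapeIdx++] = v;
        }
    }

    public static void main(String[] args) {
        ReusingParser p = new ReusingParser(1024);

        long[] first = p.parseShared(1, 2, 3);
        p.parseShared(9, 9);
        System.out.println(first[0]); // 9: the first result was overwritten

        long[] snap = p.parseSnapshot(1, 2, 3);
        p.parseSnapshot(9, 9);
        System.out.println(snap[0] + " " + snap.length); // 1 3
    }
}
```

The snapshot survives later parses, and its size depends on the parsed document rather than on the parser's capacity; the price is the extra allocation per parse that the reviewer is concerned about.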


+ return new JsonValue(newTape, 1, newStringBuffer, buffer);
}

private static class OpenContainer {