Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test inferred schema for json array #454

Merged
merged 1 commit into from
Jul 17, 2023

Conversation

trueleo
Copy link
Contributor

@trueleo trueleo commented Jul 17, 2023

Description

Arrow's infer_json_schema internally uses IndexMap to keep track of inferred schema while it is being updated as it goes through array of json objects. Thus the order in which insertion happens to this map determines the output order for schema fields as well.

Since the json object in serde_json uses BtreeMap, the iteration is always ascending by key. This leads to infer_schema producing schema fields in ascending order when we infer using only one json object. In case of json array any new fields encountered are added to the end of indexmap. Thus output order of infer schema did not provide good enough guarantee.

This is solved in #450. This PR just adds a test for it.


This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Arrow infer schema internally uses IndexMap to keep track of inferred schema
while updating it through given json objects. Thus the order in which
insertion happens to this map determines the output order for schema fields
as well.

Since the json object in serde_json uses BtreeMap, the iteration is always
ascending by key. This leads to infer_schema producing schema fields in
ascending order when we infer using only one json object.
In case of json array any new fields encountered are added to the last.
Thus output order of infer schema did not provide good enough guarantee.
This is solved in parseablehq#450. This commit just adds a test for it.
@nitisht nitisht merged commit 294a887 into parseablehq:main Jul 17, 2023
6 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 17, 2023
@trueleo trueleo deleted the add_test_infer branch August 7, 2023 09:45
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants