Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update arrow example for columnar data #279

Merged
merged 21 commits into from
Oct 25, 2024
Merged

Conversation

nitbharambe
Copy link
Member

@nitbharambe nitbharambe commented Oct 20, 2024

Changes proposed in this PR include:

  • Added columnar data example (WIP based on decisions)
  • Schema creation

Could you please pay extra attention to the points below when reviewing the PR:

  • Table creation is changed to record batch

Decisions

  • Keep or remove row based examples?
    • Decided to only keep columnar
  • Table creation should be individually or via the schema? Check #TODO Decisions: Create from schema
  • Asymmetric attributes should be fixed list array or individual phases or both? Check # TODO Decisions: Asymmetric attributes should be fixed list array or individual phases or both?
  • Should we leave update data creation upto users again or provide an example for that as well?

Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Copy link
Member

@mgovers mgovers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we leave update data creation up to users again or provide an example for that as well?

since this is only an example notebook, not a full feature, i don't think we need to add that. the only difference would be DatasetType.update instead of DatasetType.input

sidenote: i see you add a lot of new cells. is there a need for that? or is it because you haven't removed the old ones yet?

docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
@nitbharambe
Copy link
Member Author

Should we leave update data creation up to users again or provide an example for that as well?

since this is only an example notebook, not a full feature, i don't think we need to add that. the only difference would be DatasetType.update instead of DatasetType.input

sidenote: i see you add a lot of new cells. is there a need for that? or is it because you haven't removed the old ones yet?

Cells regarding decisions will be removed. The final example will be similar size as original.

Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
@nitbharambe nitbharambe marked this pull request as ready for review October 21, 2024 14:40
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
@mgovers
Copy link
Member

mgovers commented Oct 22, 2024

CI is failing on code quality

nitbharambe and others added 7 commits October 22, 2024 12:37
Co-authored-by: Martijn Govers <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
docs/examples/arrow_example.ipynb Show resolved Hide resolved
docs/examples/arrow_example.ipynb Show resolved Hide resolved
"id: int32\n",
"u_rated: double\n",
"-------asym load combined asym scehma-------\n",
"-------asym load scehma-------\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"-------asym load scehma-------\n",
"-------asym load schema-------\n",

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did this end up in production? please do a follow-up PR to fix this. also double-check that there are no other typos like this remaining

docs/examples/arrow_example.ipynb Show resolved Hide resolved
docs/examples/arrow_example.ipynb Show resolved Hide resolved
docs/examples/arrow_example.ipynb Show resolved Hide resolved
docs/examples/arrow_example.ipynb Show resolved Hide resolved
docs/examples/arrow_example.ipynb Show resolved Hide resolved
nitbharambe and others added 4 commits October 23, 2024 13:09
Co-authored-by: Martijn Govers <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Copy link
Member

@mgovers mgovers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of it looks good

docs/examples/arrow_example.ipynb Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Signed-off-by: Nitish Bharambe <[email protected]>
Copy link
Contributor

@figueroa1395 figueroa1395 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional nitpick: Although the section for batch data was not updated, I would still add a sentence reminding that it should be provided in columnar format to avoid zero copy (right?) and that it can also be validated. Just to be more self contained, but it is fine as is if you don't deem it necessary.

docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Outdated Show resolved Hide resolved
docs/examples/arrow_example.ipynb Show resolved Hide resolved
@nitbharambe
Copy link
Member Author

Additional nitpick: Although the section for batch data was not updated, I would still add a sentence reminding that it should be provided in columnar format to avoid zero copy (right?) and that it can also be validated. Just to be more self contained, but it is fine as is if you don't deem it necessary.

A lot more details can be mentioned about batch data, hence I chose not to mention any of them as it might appear incomplete :D
This point about columnar can indeed be inferred by the user.

In batch case, there would be multiple things to think about:
Different data sources, different batches with multiple scenarios etc. An advanced user would definitely needs to dive deeper there.

Signed-off-by: Nitish Bharambe <[email protected]>
Copy link

sonarcloud bot commented Oct 24, 2024

@nitbharambe nitbharambe added this pull request to the merge queue Oct 25, 2024
Merged via the queue into main with commit 5593945 Oct 25, 2024
25 checks passed
@nitbharambe nitbharambe deleted the docs/updated-arrow-example branch October 25, 2024 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants