Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-45394: [C++] Handle Single-Line JSON Without Line Ending #45443

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

JOBIN-SABU
Copy link

@JOBIN-SABU JOBIN-SABU commented Feb 6, 2025

Fixes issue #45394: Handle single-line JSON without line ending

  • Previously, if a file contained only one line of JSON and lacked a newline character at the end,
    pyarrow.dataset.dataset would correctly infer the schema but fail to load the values from that row.
  • This fix ensures that such cases are handled properly.
  • Additionally, I have fixed some other minor issues in the code.
  • Tests are still not passing, but they were already failing before these changes.
  • I believe the issue might be on the upstream side, but please review.

Looking forward to feedback!

Copy link

github-actions bot commented Feb 6, 2025

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@JOBIN-SABU JOBIN-SABU changed the title Fix issue #45394: Handle single-line JSON without line ending GH-45394: [C++] Handle Single-Line JSON Without Line Ending Feb 6, 2025
Copy link

github-actions bot commented Feb 6, 2025

⚠️ GitHub issue #45394 has been automatically assigned in GitHub to PR creator.

@@ -0,0 +1,81 @@
#include <gtest/gtest.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason you're not using the existing files_json_test.cc file and infrastructure and you're adding all this new CMake?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @raulcd,

I sincerely apologize for overlooking the existing file_json_test.cc file and the associated infrastructure. I should have checked and integrated my changes into the existing test framework instead of introducing new CMake configurations.

I’ll make the necessary corrections to align with the existing structure and update the PR accordingly. Thanks for pointing this out!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries at all! Thanks for your contributions!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @raulcd a

I hope you're doing well.

I've made progress on fixing issue #45394, but I've encountered some challenges with running the tests due to hardware and internet limitations. While I believe I've addressed the core of the problem, the tests are still not passing, and I think this might be an upstream issue.

I've pushed my changes to the fix-45394 branch. Could you please take it from here and help with running the tests and finalizing any additional changes?

I appreciate your assistance and look forward to your feedback.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants