
Orc hdfs #1674

Open: wants to merge 3 commits into master

372046933 (Contributor) commented Apr 27, 2022

Related tasks: #1372 (comment)
This PR uses the TensorFlow Filesystem API to access HDFS, instead of relying on libhdfspp, which is not included in the current build setup.
By the way, libhdfspp is not just another wrapper around the C libhdfs; it is a separate implementation of the HDFS RPC protocol, which is quite complex, and some of its code does not seem well maintained.
IMHO, we can rely on TensorFlow's modular filesystem HDFS support, which is based on libhdfs and quite stable. libtensorflow_io_plugins.so is loaded when import tensorflow_io is executed in Python, so the following C++ code

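std::unique_ptr<tensorflow::RandomAccessFile> file_;
tensorflow::Status status = tensorflow::Env::Default()->NewRandomAccessFile("hdfs:///xxx/yyy/z", &file_);
// status.ok() is true and file_ is usable once the HDFS filesystem plugin has been registered.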

returns an OK status and a usable RandomAccessFile. In this way, we can support reading ORC files from HDFS.
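To make the ORC side more concrete, here is a minimal sketch of how a file opened through the TensorFlow filesystem layer could be handed to an ORC reader. It assumes the reader consumes an orc::InputStream-style interface (getLength, getNaturalReadSize, read, getName). So that it compiles without TensorFlow or ORC headers, both that interface and tensorflow::RandomAccessFile are replaced by simplified, hypothetical stand-ins (InputStreamLike, RandomAccessFileLike); TfFileInputStream and MemoryFile are likewise illustrative names, not code from this PR. In a real build the length would presumably come from tensorflow::Env::GetFileSize, and Read would return a tensorflow::Status rather than a byte count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for tensorflow::RandomAccessFile.
// ASSUMPTION: the real class returns tensorflow::Status and reports data via
// a StringPiece; here Read returns the number of bytes copied into scratch.
class RandomAccessFileLike {
 public:
  virtual ~RandomAccessFileLike() = default;
  virtual size_t Read(uint64_t offset, size_t n, char* scratch) const = 0;
};

// Mirrors the pure-virtual methods of orc::InputStream, redeclared here so
// the sketch builds without the ORC headers.
class InputStreamLike {
 public:
  virtual ~InputStreamLike() = default;
  virtual uint64_t getLength() const = 0;
  virtual uint64_t getNaturalReadSize() const = 0;
  virtual void read(void* buf, uint64_t length, uint64_t offset) = 0;
  virtual const std::string& getName() const = 0;
};

// Hypothetical adapter: exposes a filesystem-layer file (e.g. an hdfs:// path
// opened via Env::NewRandomAccessFile) as an ORC-style input stream.
class TfFileInputStream : public InputStreamLike {
 public:
  TfFileInputStream(std::string name, std::unique_ptr<RandomAccessFileLike> file,
                    uint64_t length)
      : name_(std::move(name)), file_(std::move(file)), length_(length) {
    assert(file_ != nullptr);  // the caller must pass an opened file
  }

  uint64_t getLength() const override { return length_; }

  // Hypothetical read-size hint of 1 MiB.
  uint64_t getNaturalReadSize() const override { return 1024 * 1024; }

  // Copies exactly `length` bytes starting at `offset`; a short read is
  // reported as an error instead of silently returning partial data.
  void read(void* buf, uint64_t length, uint64_t offset) override {
    size_t got = file_->Read(offset, static_cast<size_t>(length),
                             static_cast<char*>(buf));
    if (got != length) {
      throw std::runtime_error("short read from " + name_);
    }
  }

  const std::string& getName() const override { return name_; }

 private:
  std::string name_;
  std::unique_ptr<RandomAccessFileLike> file_;
  uint64_t length_;
};

// In-memory fake file, used only to exercise the adapter without HDFS.
class MemoryFile : public RandomAccessFileLike {
 public:
  explicit MemoryFile(std::string data) : data_(std::move(data)) {}

  size_t Read(uint64_t offset, size_t n, char* scratch) const override {
    if (offset >= data_.size()) return 0;
    size_t count = std::min<size_t>(n, data_.size() - offset);
    std::memcpy(scratch, data_.data() + offset, count);
    return count;
  }

 private:
  std::string data_;
};
```

Usage, with the in-memory fake standing in for an hdfs:// file: construct TfFileInputStream with a MemoryFile and its size, then call read(buf, n, offset). Throwing on a short read follows the assumption that ORC's read is meant to deliver exactly length bytes; how such errors should surface in tensorflow-io is left open here.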

372046933 (Contributor, Author) commented:

By the way, Kerberos support is provided by libhdfs together with libgssapi-krb5-2 and related libraries, which must be installed in the environment.
I have tested libhdfspp and found that it does not support Kerberos.
