How to create code2vec input #186

messiGao · 2023-11-16T11:21:01Z

I ran a command like `java -cp JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir test.java > file.txt`, then ran `python3 code2vec.py --load models/java14_model/saved_model_iter8.release --test file.txt`, but got this error:

return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 201 fields but have 4 in record
[[{{node IteratorGetNext}}]]

urialon · 2023-11-16T11:56:07Z

Hi @messiGao ,
Thank you for your interest in our work.

I think there is some confusion here: the exception is raised by TensorFlow, while the Java command you mentioned does not involve TensorFlow at all.

May I also ask what kind of task you are working on?
Maybe I can recommend a newer model.

Best,
Uri

messiGao · 2023-11-16T12:04:05Z

I want to use the `--test` option to export <TEST_FILE>.vectors, but I don't know what kind of TEST_FILE is correct. When I asked GPT-4, the answer was to use the JavaExtractor to convert my test.java to test.txt.

messiGao · 2023-11-16T12:07:13Z

Additionally, my aim is to store a Java codebase in a vector database so that I can run similarity searches and retrieve the code files relevant to my query.

urialon · 2023-11-16T15:19:06Z

Hi @messiGao ,

Please see https://github.com/neulab/code-bert-score
You don't need the approach itself, but it contains Huggingface models, including one specifically for Java called neulab/codebert-java.

This will allow you to use the Huggingface library with that model and a BERT-like framework.

Best,
Uri
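
As a rough illustration of the suggestion above, the following sketch embeds Java snippets with neulab/codebert-java through the Huggingface `transformers` library and compares two vectors with cosine similarity. It is not taken from the code-bert-score repository: the helper names `embed_snippets` and `cosine_similarity` are hypothetical, mean pooling over the last hidden state is an assumed (common but not prescribed) way to get one vector per snippet, and `transformers` and `torch` are assumed to be installed.

```python
import math


def embed_snippets(snippets, model_name="neulab/codebert-java"):
    """Return one embedding (list of floats) per code snippet.

    Assumption: mean pooling over non-padding tokens of the last hidden
    state. The model is downloaded on first use.
    """
    # Imported lazily so the pure-Python helper below works without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(
        snippets,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (batch, tokens, dim)

    # Zero out padding positions, then average the remaining token vectors.
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.tolist()


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A possible usage would be `vecs = embed_snippets([java_source_a, java_source_b])` followed by `cosine_similarity(vecs[0], vecs[1])`; the resulting vectors could also be written to a vector database instead of being compared directly.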

asyed79gatech · 2024-02-22T09:38:09Z

I have a similar dilemma with regard to creating embeddings of C# code using a code2vec model I have trained. As @messiGao mentioned, I want to use the `--test` option to create a .vectors file as described in the repo, but when I execute the command, it gives the following error:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 201 fields but have 2 in record
         [[node IteratorGetNext (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]]
```

urialon · 2024-02-22T13:33:41Z

Hi @asyed79gatech ,
Thank you for your interest in our work.

I believe that you haven't run the preprocess.sh script on the data.

However, in general, I recommend using the newer https://github.com/neulab/code-bert-score project. It is based on Huggingface, which is actively maintained.

Best,
Uri
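
To illustrate why the missing preprocessing step would produce "Expect 201 fields but have N in record": the model appears to read each example as one space-separated record with a fixed number of fields (one label plus 200 contexts, assuming the default context limit), while the raw extractor output has only as many fields as the method has paths. The sketch below pads a single raw line to that fixed width. It is a simplified stand-in, not the repository's preprocessing code, which does more (for example, vocabulary handling and sampling of contexts). The function name `pad_record` and the sample line are hypothetical.

```python
MAX_CONTEXTS = 200  # assumed default; 1 label + 200 contexts = 201 fields


def pad_record(raw_line, max_contexts=MAX_CONTEXTS):
    """Pad (or truncate) one extractor line to 1 + max_contexts fields.

    Input:  "<label> <context> <context> ..." with a variable context count.
    Output: the same record with empty fields appended so that splitting on
    a single space always yields exactly 1 + max_contexts fields.
    """
    parts = raw_line.split()
    if not parts:
        raise ValueError("empty record")
    label, contexts = parts[0], parts[1:]
    # Simplification: keep the first max_contexts contexts, no sampling.
    contexts = contexts[:max_contexts]
    fields = [label] + contexts + [""] * (max_contexts - len(contexts))
    return " ".join(fields)


# Hypothetical raw line with one label and three contexts (4 fields),
# the same shape as the record in the first error message of this thread.
raw = "get|name a,1,b c,2,d e,3,f"
padded = pad_record(raw)
```

Here `len(raw.split(" "))` is 4 and `len(padded.split(" "))` is 201, which matches the two numbers in the error message.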

asyed79gatech · 2024-02-22T13:37:54Z

Hi @urialon

Thanks for your prompt response. I thought we only needed to run the preprocess.sh script when training the code2vec model. I already have a trained, released model and want it to generate embeddings for a vector store.

XuPing1234 · 2024-06-29T04:11:51Z

I ran a command like `java -cp JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir test.java > file.txt`, then ran `python3 code2vec.py --load models/java14_model/saved_model_iter8.release --test file.txt`, but got the error "return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict, tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 201 fields but have 4 in record [[{{node IteratorGetNext}}]]".

Hello, have you resolved your issue? How can Java source code be converted into the input format required by code2vec?

zhaojialinnn · 2024-08-20T09:41:08Z

I ran a command like `java -cp JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir test.java > file.txt`, then ran `python3 code2vec.py --load models/java14_model/saved_model_iter8.release --test file.txt`, but got the error "return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict, tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 201 fields but have 4 in record [[{{node IteratorGetNext}}]]".

Hello, have you resolved your issue? How can Java source code be converted into the input format required by code2vec?

Hello, I encountered the same issue. Have you resolved it?
