Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alabulei1 add llama2 inference #182

Closed

Conversation

alabulei1
Copy link
Collaborator

Add docs for

  1. llama2 inference with WasmEdge and Rust
  2. install instruction for ggml backend
  3. fix the typos in mediapipe

Copy link
Collaborator Author

alabulei1 commented Oct 30, 2023

Hello, I am a code review bot on flows.network. Here are my reviews of code commits in this PR.


Overall, the pull request titled "Alabulei1 add llama2 inference" seems to be focused on adding support and documentation for running llama2 inference using the ggml backend in WasmEdge and Rust. The changes include adding a new section for the WASI-NN plug-in with ggml backend, providing installation instructions for WasmEdge with the ggml backend, and adding a link to the chapter on running AI inference with llama2 models.

Most of the individual summaries did not identify any potential problems or issues with the patch. However, there are a few findings that should be addressed:

  1. Inconsistency in naming convention: In one of the summaries, there is a mention of inconsistency in the naming convention for wasm files and model files. It would be better to ensure consistency in the naming throughout the codebase.

  2. Lack of code context: In one of the summaries, it was mentioned that the patch only includes changes to the documentation file and does not provide any details about the code changes. It would be helpful to have more context or information about the changes made in the code.

  3. Typo and formatting errors: In a couple of summaries, there were mentions of typos and formatting errors in the code or documentation files. These should be fixed to maintain code clarity and readability.

In summary, the pull request adds valuable documentation and functionality for running llama2 inference using the ggml backend. It is important to address the identified issues and errors to ensure the overall quality of the patch.

Details

Commit ce0d25ca6ea021db50702ac0a78fe710cfeda44b

Key changes:

  • Added a new section for WASI-NN plug-in with ggml backend.
  • Provided instructions on how to install WasmEdge with ggml backend for llama2 inference.
  • Added a link to the chapter on running AI inference with llama2 series of models.

Potential problems:

  • No potential problems were identified in this patch.

Overall, the changes seem to be straightforward and focused on adding support for llama2 inference using the ggml backend.

Commit 7e32ab6e6b8d7d597ae25bb0086d712a9fc64a1f

Key changes in the pull request:

  • Added a new Markdown file llm-inference.md that provides instructions and examples for running llama2 models in WasmEdge and Rust.
  • The Markdown file contains information about the prerequisites, quick start instructions, build and run instructions, optional configurations, performance improvements, and code explanation.

Potential problems:

  • No potential problems were identified in the patch.

Additional comments:

  • The instructions and examples provided in the Markdown file seem clear and comprehensive.
  • The code explanation section provides a good overview of the main.rs file and its functionality.
  • Overall, the changes in this pull request are valuable additions to the project's documentation.

Commit 7d95466766594169c7e6deb131e2efb23ef8dc00

Key changes:

  • The sidebar_position in the mediapipe.md file has been changed from 1 to 2.

Potential problems:

  • There is a typo in the comment on line 19. "dection" should be "detection".

Overall, the changes seem to be small and mostly cosmetic. There are no major issues with the patch.

Commit ccc52f29a298fceae0583183e5e5a9f2be4533c6

Key changes in the patch:

  • The file llm-inference.md in the docs/develop/rust/wasinn directory has been modified.
  • The names of the wasm files have been changed from chat.wasm to llama-chat.wasm.
  • The model names have been changed from llama-2-7b-chat.Q5_K_M.gguf to llama-2-13b-chat.Q5_K_M.gguf.

Potential problems:

  • It seems that there is some inconsistency in the naming convention used for the wasm files and model files. It would be better to ensure consistency in the naming throughout the codebase.
  • The patch only includes changes to the documentation file and does not provide any details about the code changes. It would be helpful to have more context or information about the changes made in the code.

Commit b05a011aaa958338c362455be61a215840400505

Key changes:

  • Updated the link for the ggml backend in the install.md file.

Potential problems:

  • No potential problems found in this patch.

Commit 5c047dc9d6df5c0ac4c5bfc21f17c0646b24a1e0

Key changes:

  • Added a new file llm-inference.md under the directory i18n/zh/docusaurus-plugin-content-docs/current/develop/rust/wasinn.
  • Added content to the llm-inference.md file, including instructions for running inferences with the llama2 model in WasmEdge and Rust.

Potential problems:

  • The patch does not include any code changes, so it is difficult to identify potential problems. However, there could be issues in the newly added llm-inference.md file, such as incorrect or incomplete instructions or missing prerequisites.

Overall, the patch seems to add documentation for running llama2 inferences using WasmEdge and Rust. It would be important to review the llm-inference.md file for accuracy and completeness.

Commit 106c7505e592c17b7ff728549d42e91f12110bf1

Key changes in the patch:

  1. The sidebar_position in the metadata of the mediapipe.md file has been changed from 1 to 2.

  2. A typo has been fixed in the code comment. "dection" has been corrected to "detection".

Potential problems:

  1. It is unclear why the sidebar_position value has been changed. The significance of this change should be explained in the pull request or in the commit message.

  2. The typo "dection" in the comment has been fixed, but it is possible that there are other typos or mistakes in the file as well. A thorough review of the entire file is recommended to ensure all necessary changes have been made.

Overall, the changes in this patch seem relatively minor and do not raise any major concerns.

Commit e54976134fb741332b37a43e12139b00c261b972

Key changes in the patch:

  • Added a new section called "WASI-NN plug-in with ggml backend" to the install.md file.
  • Provided instructions on how to install WasmEdge with the WASI-NN ggml backend.
  • Added a reference to the "Llama2 inference in Rust" chapter.

Potential problems:

  • There are no apparent problems in this patch. It seems to be a straightforward addition of documentation for the ggml backend installation instructions.

Commit 6d118bcb271db98433267ded3272303a33d60a1f

Key changes in the patch:

  • The model download links have been updated to new URLs for the llama-2-7b and llama-2-13b chat models.

Potential problems:

  • In both instances, the git clone command is incorrect. The curl -LO command should be used instead to download the files.
  • The new model download links may not be correct. It's important to verify that the URLs are valid and accessible.
  • The patch does not provide any explanation or context for why the model download links are being updated. It would be helpful to include a commit message or comment explaining the reason for these changes.

Commit 1461be0f00429f76af5529e7506fa1ea3a500a58

Key changes:

  • The code has been simplified in the code walkthrough part of the llm-inference.md documentation file.
  • There have been 76 insertions and 145 deletions in the file.

Potential problems:

  • The code patch does not introduce any new functionality or fix any bugs. It only simplifies the existing code.
  • The documentation file may be incomplete or incorrect, as the changes seem to be focused on code readability rather than correctness or new features.

Commit 4c5cd77c01e519d2f4db3d79f1bb391b48cd3898

Key changes:

  • Updated the instructions on how to load the llama-2-13b model.
  • Modified the command to use the correct model file name in the --nn-preload parameter.
  • Modified the handling of command line arguments in the main function by adding a new argument for the prompt.
  • Removed the prompt_template parameter from the command line arguments and updated the code accordingly.
  • Created a new execution context from the loaded Graph.
  • Modified the code for parsing and creating the prompt template based on the input prompt.

Potential problems:

  • The patch does not provide a clear explanation of why the changes are being made. It would be helpful to have a more detailed commit message or pull request description explaining the motivation behind the changes.
  • There are multiple instances where the code formatting is inconsistent, such as extra spaces, missing line breaks, and inconsistent indentation. These should be cleaned up for better readability.
  • The patch removes the prompt_template parameter from the command line arguments, but it is not clear if this was intentional or an oversight. If it was intentional, the code associated with it should be removed as well.
  • The patch introduces a new argument for the prompt in the main function, but there is no code that actually uses this argument. This should be implemented or removed if not needed.
  • The patch includes changes to the Rust code, but it is not clear how these changes relate to the rest of the changes in the patch. It would be helpful to have a more detailed explanation of the overall changes and how the code changes fit into the larger context.

Commit 112214db0d07090bc177877348fb116e4175efa7

Key Changes:

  • The patch fixes the formatting of the "LLAMA_LOG" option in the table.
  • The patch fixes the formatting of the "LLAMA_N_CTX" option in the table.

Potential Problems:

  • No potential problems were identified in the patch.

Signed-off-by: alabulei1 <[email protected]>
Signed-off-by: alabulei1 <[email protected]>
@juntao
Copy link
Member

juntao commented Oct 30, 2023

1 We should download models from our HF repo now:

https://huggingface.co/wasmedge/llama2

2 I think it is easier to walkthrough the code for the simple example. And then explain / demo the chat and api-server examples.

No need for code walkthrough for the latter two examples.

Signed-off-by: alabulei1 <[email protected]>
Signed-off-by: alabulei1 <[email protected]>
Signed-off-by: alabulei1 <[email protected]>
@alabulei1
Copy link
Collaborator Author

@juntao
Copy link
Member

juntao commented Oct 30, 2023

The simple example should not have chat template. It should not support conversation history either. It should just let the user manually assemble a promote and it will output a string and exit. We will need to simplify it.

The other example will be removed later. There cannot be two sources of truth.

@alabulei1
Copy link
Collaborator Author

The simple example should not have chat template. It should not support conversation history either. It should just let the user manually assemble a promote and it will output a string and exit. We will need to simplify it.

The other example will be removed later. There cannot be two sources of truth.

@apepkuss Could you please check out the comments above -- simplify the simple example? Thanks.

@apepkuss
Copy link
Contributor

Will update simple. Thanks a lot!

@alabulei1 alabulei1 requested review from juntao and hydai November 1, 2023 07:02
docs/develop/rust/wasinn/llm-inference.md Outdated Show resolved Hide resolved
@alabulei1 alabulei1 closed this Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants