[New] Add a LP on accelerating a Voice Assistant with KleidiAI and SME2. #1883

Merged
@@ -0,0 +1,27 @@
---
title: Prerequisites
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Install software required for this Learning Path

In this Learning Path, you will compile an Android application, so you first need to download and install the latest version of [Android Studio](https://developer.android.com/studio) on your computer.

You then need to ensure you have the following tools:
- `cmake`, the software build system
- `git`, the version control system for cloning the Voice Assistant codebase
- `adb`, the Android Debug Bridge, a command-line tool used to communicate with a connected device and run commands on it

Install these tools by running the command for your operating system:

{{< tabpane code=true >}}
{{< tab header="Linux/Ubuntu" language="bash">}}
sudo apt install git adb cmake
{{< /tab >}}
{{< tab header="macOS" language="bash">}}
brew install git android-platform-tools cmake
{{< /tab >}}
{{< /tabpane >}}
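
To confirm that the tools are installed and available on your `PATH`, you can print their versions:

```bash
# Each command prints a version string when the tool is installed correctly.
cmake --version
git --version
adb version
```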
@@ -0,0 +1,40 @@
---
title: Overview
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

The Voice Assistant is an example application that demonstrates a complete voice interaction pipeline for Android.

It generates intelligent responses by utilizing:
1. Speech-to-Text (STT) to transform the user's audio prompt into a text representation,
2. a Large Language Model (LLM) to answer the user's prompt in text form,
3. the Android Text-to-Speech (TTS) API to produce a voice response.

![example image alt-text#center](overview.png "Figure 1: Overview")

These three steps correspond to specific components used in the Voice Assistant application. A more detailed description of each one follows.

## Speech to Text Library

Speech-to-Text is also known as Automatic Speech Recognition. This part of the pipeline focuses on converting spoken language into written text.

Speech recognition is done in the following stages:
- The device's microphone captures spoken language as an audio waveform,
- The audio waveform is broken into small time frames, and features are extracted to represent sound,
- A neural network is used to predict the most likely transcription of the audio, based on grammar and context,
- The final recognized text is generated for the next stage of the pipeline.

## Large Language Models Library

Large Language Models (LLMs) are designed for natural language understanding, and in this application, they are used for question-answering.

The text transcription from the previous part of the pipeline is used as input to the neural model. During initialization, the application assigns a persona to the LLM to ensure a friendly and informative voice assistant experience. By default, the application uses an asynchronous flow for this part of the pipeline, meaning that parts of the response are collected as they become available. The application UI is updated with each new token, and these are also used for the final stage of the pipeline.

## Text to Speech Component

Currently, this part of the application pipeline uses the Android Text-to-Speech API with some extra functionality to ensure smooth and natural speech output.

In synchronous mode, speech is only generated after the full response from the LLM is received. By default, the application operates in asynchronous mode, where speech synthesis starts as soon as a sufficient portion of the response (such as a half or full sentence) is available. Any additional responses are queued for processing by the Android Text-to-Speech engine.
@@ -0,0 +1,29 @@
---
title: Build the Voice Assistant
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Download the Voice Assistant

```bash
git clone https://git.gitlab.arm.com/kleidi/kleidi-examples/real-time-voice-assistant.git voice-assistant.git
```

## Build the Voice Assistant

Launch Android Studio and open the project you downloaded in the preceding step:

![example image alt-text#center](open_project.png "Figure 2: Open the project in Android Studio.")

Build the application with its default settings by clicking the little hammer
"Make Module 'VoiceAssistant.app'" button in the upper right corner:

![example image alt-text#center](build_project.png "Figure 3: Build the project.")

Android Studio will start the build, which may take some time if it needs to
download some dependencies of the Voice Assistant app:

![example image alt-text#center](build_success.png "Figure 4: Successful build!")
@@ -0,0 +1,49 @@
---

title: Run the Voice Assistant
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In the previous section, you built the Voice Assistant app. Now you need to install it on your phone. The easiest way to do this is to put the Android phone in developer mode and use a USB cable to upload the application.

## Switch your phone to developer mode

By default, developer mode is not active on Android phones. You will need to activate it by following [these instructions](https://developer.android.com/studio/debug/dev-options).

## Upload the Voice Assistant to your phone

Once your phone is in developer mode, connect it with a USB cable: it should appear as a running device in the top bar. Select it and then press the run button (the small red circle in Figure 5 below). This will transfer the app to the phone and launch it.

In the picture below, a Pixel 6a phone has been connected over USB:
![example image alt-text#center](upload.png "Figure 5: Upload the Voice App")
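
If you prefer the command line, you can also install the app with `adb` once the build has finished. The APK path below is only an assumption based on a typical Android Studio project layout; adjust it to match the actual build output of the Voice Assistant project:

```bash
# List connected devices to confirm that the phone is visible over USB.
adb devices

# Install (or reinstall, with -r) the debug APK produced by the build.
# The path shown is a common Gradle output location and may differ for this project.
adb install -r app/build/outputs/apk/debug/app-debug.apk
```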

## Run the Voice Assistant

The Voice Assistant will welcome you with this screen:

![example image alt-text#center](voice_assistant_view1.png "Figure 6: Welcome Screen")

You can now press the area at the bottom of the screen and speak your request.

## Voice Assistant Controls

### Performance Counters

By clicking on the element circled in red in the upper left, you can switch the display of performance counters on or off, including:
- Speech recognition time
- LLM encode tokens/s
- LLM decode tokens/s
- Speech generation time

![example image alt-text#center](voice_assistant_view2.png "Figure 7: Performance Counters")

### Reset the Voice Assistant's Context

By clicking on the icon circled in red in the upper right corner, you can reset the assistant's context.

![example image alt-text#center](voice_assistant_view3.png "Figure 8: Reset the Voice Assistant's Context")
@@ -0,0 +1,37 @@
---

title: KleidiAI

weight: 7

### FIXED, DO NOT MODIFY

layout: learningpathall

---

The LLM part of the Voice Assistant uses [Llama.cpp](https://github.com/ggml-org/llama.cpp). LLM inference is a highly computation-intensive task and has been heavily optimized within Llama.cpp for various platforms, including Arm.

Speech recognition is also a computation-intensive task and has been optimized for Arm processors as well.

## KleidiAI

This application uses the [KleidiAI library](https://gitlab.arm.com/kleidi/kleidiai) by default for optimized performance on Arm processors.

[KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) is an open-source library that provides optimized performance-critical routines, also known as micro-kernels, for artificial intelligence (AI) workloads tailored for Arm CPUs.

These routines are tuned to exploit the capabilities of specific Arm hardware architectures, aiming to maximize performance.

The KleidiAI library has been designed for easy adoption into C or C++ machine learning (ML) and AI frameworks. Developers looking to incorporate specific micro-kernels into their projects can simply include the corresponding `.c` and `.h` files associated with those micro-kernels and a common header file.

### Compare the performance without KleidiAI

By default, the Voice Assistant is built with KleidiAI support on Arm platforms, but this can be disabled if you want to compare the performance to a raw implementation.

You can disable KleidiAI support at build time in Android Studio by adding `-PkleidiAI=false` to the Gradle invocation. You can also edit the top-level `gradle.properties` file and add `kleidiAI=false` at the end of it.
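
For example, assuming the project uses the standard Gradle wrapper and a debug build variant (both assumptions, as the exact task names may differ), you could build without KleidiAI from the command line, or make the setting persistent:

```bash
# Build with KleidiAI disabled for this invocation only (the task name is illustrative).
./gradlew assembleDebug -PkleidiAI=false

# Or append the setting to the top-level gradle.properties to make it the default.
echo 'kleidiAI=false' >> gradle.properties
```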

### Why use KleidiAI?

A significant benefit of using KleidiAI is that it enables the developer to work at a relatively high level, leaving the KleidiAI library to select the best implementation at runtime to perform the computation in the most efficient way on the current target. This is a great advantage because a significant amount of work has gone into optimizing those micro-kernels.

It becomes even more powerful when newer versions of the architecture become available: simply updating the KleidiAI library used by the Voice Assistant automatically gives it access to new hardware features. One such deployment is happening now with SME2, which means that in the near future the Voice Assistant will benefit from improved performance on devices that implement SME2, with no further effort required from the developer.
@@ -0,0 +1,61 @@
---
title: Accelerate a Voice Assistant with KleidiAI and SME2

minutes_to_complete: 30

who_is_this_for: This Learning Path is an introductory topic for developers who want to improve the performance of a voice assistant by using KleidiAI and SME2.

learning_objectives:
- Compile an Android application
- Use KleidiAI and SME2 to improve the performance of the voice assistant

prerequisites:
- an Android phone
- Android Studio
- CMake
- adb
- git

author: Arnaud de Grandmaison

test_images:
- ubuntu:latest
test_link: null
test_maintenance: true

### Tags
skilllevels: Introductory
subjects: Performance and Architecture
armips:
- Cortex-A
tools_software_languages:
- Java
- Kotlin
operatingsystems:
- Linux
- macOS
- Windows

further_reading:

- resource:
title: Accelerate Generative AI workloads using KleidiAI
link: https://learn.arm.com/learning-paths/cross-platform/kleidiai-explainer
type: website

- resource:
title: LLM inference on Android with KleidiAI, MediaPipe, and XNNPACK
link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/kleidiai-on-android-with-mediapipe-and-xnnpack/
type: website

- resource:
title: Vision LLM inference on Android with KleidiAI and MNN
link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/
type: website

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps" # Always the same
layout: "learningpathall" # All files under learning paths have this same wrapper
---
@@ -0,0 +1,35 @@
---
review:
- questions:
question: >
What is KleidiAI?
answers:
- An anime about a little AI lost in a giant world.
- A software library
correct_answer: 2
explanation: >
KleidiAI is an open-source software library that provides optimized
performance-critical micro-kernels for artificial intelligence (AI)
workloads tailored for Arm processors.

- questions:
question: >
How does KleidiAI optimize performance?
answers:
- Lots of magic, and let's be honest, a bit of hard work
- It takes advantage of different available Arm processor architectural features.
correct_answer: 2
explanation: >
Processor architectural features, e.g., ``FEAT_DotProd``, when implemented, enable
the software to use specific instructions dedicated to efficiently performing some
tasks or computations. For example, when implemented, ``FEAT_DotProd`` adds the
``UDOT`` and ``SDOT`` 8-bit dot product instructions, which are critical for
improving the performance of dot product computations.

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
title: "Review" # Always the same title
weight: 20 # Set to always be larger than the content in this path
layout: "learningpathall" # All files under learning paths have this same wrapper
---
6 changes: 5 additions & 1 deletion data/stats_current_test_info.yml
@@ -177,7 +177,11 @@ sw_categories:
       tests_and_status: []
   iot: {}
   laptops-and-desktops: {}
-  mobile-graphics-and-gaming: {}
+  mobile-graphics-and-gaming:
+    voice-assistant:
+      readable_title: Accelerate a Voice Assistant with KleidiAI and SME2
+      tests_and_status:
+      - ubuntu:latest: passed
   servers-and-cloud-computing:
     clickhouse:
       readable_title: Measure performance of ClickHouse on Arm servers