Commit f77a535

[New] Add a LP on accelerating a Voice Assistant with KleidiAI and SME2.
1 parent 59f4931 commit f77a535

16 files changed

+281
-0
lines changed
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
---
2+
title: Prerequisites
3+
weight: 3
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Install software required for this Learning Path
10+
11+
In this Learning Path, you will compile an Android application, so you first need to download and install the latest version of [Android Studio](https://developer.android.com/studio) on your computer.
12+
13+
You then need to ensure you have the following tools:
14+
- `cmake`, the software build system
15+
- `git`, the version control system for cloning the Voice Assistant codebase
16+
- `adb`, the Android Debug Bridge, a command-line tool to communicate with a device and perform various commands on it
17+
18+
These tools can be installed by running the following command (depending on your machine's OS):
19+
20+
{{< tabpane code=true >}}
21+
{{< tab header="Linux/Ubuntu" language="bash">}}
22+
sudo apt install git adb cmake
23+
{{< /tab >}}
24+
{{< tab header="macOS" language="bash">}}
25+
brew install git android-platform-tools cmake
26+
{{< /tab >}}
27+
{{< /tabpane >}}
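
Once the install command has finished, it can be useful to confirm that the three tools are actually on your `PATH`. The snippet below is a minimal sketch: `check_tools` is a hypothetical helper name, not part of the Voice Assistant project, and it only reports presence, not versions.

```bash
# Hypothetical helper: report whether each named tool can be found on the PATH.
check_tools() {
  local missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Return non-zero if at least one tool was not found.
  return "$missing"
}

check_tools cmake git adb || echo "Install the missing tools before continuing."
```

If every tool is reported as `found`, you are ready to move on to the next section.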
Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
---
2+
title: Overview
3+
weight: 4
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
The Voice Assistant is an example application that demonstrates a complete voice interaction pipeline for Android.
10+
11+
It generates intelligent responses by utilizing:
12+
1. Speech-to-Text (STT) to transform the user's audio prompt into a text representation,
13+
2. a Large Language Model (LLM) to answer the user's prompt in text form,
14+
3. the Android Text-to-Speech (TTS) API to produce a voice response.
15+
16+
![example image alt-text#center](overview.png "Figure 1: Overview")
17+
18+
These three steps correspond to specific components used in the Voice Assistant application. A more detailed description of each one follows.
19+
20+
## Speech to Text Library
21+
22+
Speech-to-Text is also known as Automatic Speech Recognition. This part of the pipeline focuses on converting spoken language into written text.
23+
24+
Speech recognition is done in the following stages:
25+
- The device's microphone captures spoken language as an audio waveform,
26+
- The audio waveform is broken into small time frames, and features are extracted to represent sound,
27+
- A neural network is used to predict the most likely transcription of audio based on grammar and context,
28+
- The final recognized text is generated for the next stage of the pipeline.
29+
30+
## Large Language Models Library
31+
32+
Large Language Models (LLMs) are designed for natural language understanding, and in this application, they are used for question-answering.
33+
34+
The text transcription from the previous part of the pipeline is used as input to the neural model. During initialization, the application assigns a persona to the LLM to ensure a friendly and informative voice assistant experience. By default, the application uses an asynchronous flow for this part of the pipeline, meaning that parts of the response are collected as they become available. The application UI is updated with each new token, and the tokens are also passed to the final stage of the pipeline.
35+
36+
## Text to Speech Component
37+
38+
Currently, this part of the application pipeline uses the Android Text-to-Speech API with some extra functionality to ensure smooth and natural speech output.
39+
40+
In synchronous mode, speech is only generated after the full response from the LLM is received. By default, the application operates in asynchronous mode, where speech synthesis starts as soon as a sufficient portion of the response (such as a half or full sentence) is available. Any additional responses are queued for processing by the Android Text-to-Speech engine.
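
The asynchronous behaviour described above can be pictured with a small conceptual sketch. This is not the application's code, which lives in the Android project. `chunk_sentences` is a hypothetical helper, and it assumes that tokens arrive one per line and that a sentence ends at `.`, `!`, or `?`.

```bash
# Hypothetical sketch: group a stream of tokens (one per line on stdin) into
# sentence-sized chunks, emitting each chunk as soon as it is complete.
chunk_sentences() {
  awk '
    # Append every incoming token to the current buffer.
    { buf = buf $0 }
    # A token ending in sentence punctuation completes the chunk.
    /[.!?][[:space:]]*$/ {
      sub(/^[[:space:]]+/, "", buf)
      print buf
      buf = ""
    }
    # Flush whatever is left when the token stream ends.
    END {
      sub(/^[[:space:]]+/, "", buf)
      if (buf != "") print buf
    }
  '
}

printf 'Hello\n there\n.\n How\n are\n you\n?\n' | chunk_sentences
```

In this picture, each printed line stands for one unit that would be queued for speech synthesis while the LLM is still producing the rest of the response.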
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
---
2+
title: Build the Voice Assistant
3+
weight: 5
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
## Download the Voice Assistant
10+
11+
```bash
git clone https://git.gitlab.arm.com/kleidi/kleidi-examples/real-time-voice-assistant.git voice-assistant.git
```
14+
15+
## Build the Voice Assistant
16+
17+
Open Android Studio and open the project that you just downloaded in the preceding step:
18+
19+
![example image alt-text#center](open_project.png "Figure 2: Open the project in Android Studio.")
20+
21+
Build the application with its default settings by clicking the hammer-shaped "Make Module 'VoiceAssistant.app'" button in the upper-right corner:
23+
24+
![example image alt-text#center](build_project.png "Figure 3: Build the project.")
25+
26+
Android Studio will start the build, which may take some time if it needs to download dependencies of the Voice Assistant app:
28+
29+
![example image alt-text#center](build_success.png "Figure 4: Successful build!")
Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
1+
---
2+
3+
title: Run the Voice Assistant
4+
weight: 6
5+
6+
### FIXED, DO NOT MODIFY
7+
layout: learningpathall
8+
---
9+
10+
In the previous section, you built the Voice Assistant app. Now you need to install it on your phone. The easiest way to do this is to put the Android phone in developer mode and use a USB cable to transfer the application.
11+
12+
## Switch your phone to developer mode
13+
14+
By default, developer mode is not active on Android phones. You will need to activate it by following [these instructions](https://developer.android.com/studio/debug/dev-options).
15+
16+
## Upload the Voice Assistant to your phone
17+
18+
Once your phone is in developer mode, connect it to your computer with a USB cable. It should appear as a running device in the top bar of Android Studio. Select it, then press the run button (circled in red in Figure 5 below). This transfers the app to the phone and launches it.
19+
20+
In the picture below, a Pixel 6a phone has been connected to the USB cable:
21+
![example image alt-text#center](upload.png "Figure 5: Upload the Voice App")
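
If the phone does not show up in Android Studio, `adb devices` can help tell whether the computer sees it at all. The sketch below filters that command's output down to the devices in the ready `device` state. `ready_devices` is a hypothetical helper name, and the snippet assumes `adb` is on your `PATH`.

```bash
# Hypothetical helper: read `adb devices` output on stdin and print the serial
# numbers of devices that are ready (state "device"). Devices listed as
# "unauthorized" or "offline" are skipped, as is the header line.
ready_devices() {
  awk 'NF == 2 && $2 == "device" { print $1 }'
}

# Only query adb when it is installed.
if command -v adb >/dev/null 2>&1; then
  adb devices | ready_devices
fi
```

An empty result usually means the cable, the USB debugging setting, or the authorization prompt on the phone needs checking.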
22+
23+
## Run the Voice Assistant
24+
25+
The Voice Assistant will welcome you with this screen:
26+
27+
![example image alt-text#center](voice_assistant_view1.png "Figure 6: Welcome Screen")
28+
29+
You can now press the area at the bottom of the screen and make your request!
30+
31+
## Voice Assistant Controls
32+
33+
### Performance Counters
34+
35+
You can toggle the display of performance counters by clicking the element circled in red in the upper-left corner. The counters include:
- Speech recognition time
- LLM encode tokens/s
- LLM decode tokens/s
- Speech generation time
42+
43+
![example image alt-text#center](voice_assistant_view2.png "Figure 7: Performance Counters")
44+
45+
### Reset the Voice Assistant's Context
46+
47+
By clicking on the icon circled in red in the upper right corner, you can reset the assistant's context.
48+
49+
![example image alt-text#center](voice_assistant_view3.png "Figure 8: Reset the Voice Assistant's Context")
Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
---
2+
3+
title: KleidiAI
4+
5+
weight: 7
6+
7+
### FIXED, DO NOT MODIFY
8+
9+
layout: learningpathall
10+
11+
---
12+
13+
The LLM part of the Voice Assistant uses [Llama.cpp](https://github.com/ggml-org/llama.cpp). LLM inference is a highly computation-intensive task and has been heavily optimized within Llama.cpp for various platforms, including Arm.
14+
15+
Speech recognition is also a computation-intensive task and has been optimized for Arm processors as well.
16+
17+
## KleidiAI
18+
19+
This application uses the [KleidiAI library](https://gitlab.arm.com/kleidi/kleidiai) by default for optimized performance on Arm processors.
20+
21+
[KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) is an open-source library that provides optimized performance-critical routines, also known as micro-kernels, for artificial intelligence (AI) workloads tailored for Arm CPUs.
22+
23+
These routines are tuned to exploit the capabilities of specific Arm hardware architectures, aiming to maximize performance.
24+
25+
The KleidiAI library has been designed for easy adoption into C or C++ machine learning (ML) and AI frameworks. Developers looking to incorporate specific micro-kernels into their projects can simply include the corresponding `.c` and `.h` files associated with those micro-kernels and a common header file.
26+
27+
### Compare the performance without KleidiAI
28+
29+
By default, the Voice Assistant is built with KleidiAI support on Arm platforms, but this can be disabled if you want to compare the performance against an implementation that does not use KleidiAI.
30+
31+
You can disable KleidiAI support at build time in Android Studio by adding `-PkleidiAI=false` to the Gradle invocation. You can also edit the top-level `gradle.properties` file and add `kleidiAI=false` at the end of it.
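
If you switch between the two configurations often, editing `gradle.properties` by hand can be scripted. The function below is a minimal sketch under that assumption: `set_gradle_property` is a hypothetical helper, not something shipped with the Voice Assistant, and it simply replaces an existing `key=value` line or appends a new one.

```bash
# Hypothetical helper: set key=value in a Gradle properties file.
# Usage: set_gradle_property <file> <key> <value>
set_gradle_property() {
  local file=$1 key=$2 value=$3
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    # Replace the existing entry, going through a temporary file so the
    # command behaves the same with GNU and BSD sed.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c1 "$file")" ]; then
      echo >> "$file"
    fi
    printf '%s\n' "${key}=${value}" >> "$file"
  fi
}
```

From the top-level directory of the project, `set_gradle_property gradle.properties kleidiAI false` disables KleidiAI for the next build, and calling it again with `true` is assumed to restore the default.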
32+
33+
### Why use KleidiAI?
34+
35+
A significant benefit of using KleidiAI is that it enables the developer to work at a relatively high level, leaving the KleidiAI library to select the best implementation at runtime to perform the computation in the most efficient way on the current target. This is a great advantage because a significant amount of work has gone into optimizing those micro-kernels.
36+
37+
This becomes even more powerful when newer versions of the architecture become available: a simple update of the KleidiAI library used by the Voice Assistant automatically gives it access to newer hardware features. SME2 is an example of such a feature: in the near future, the Voice Assistant will be able to benefit from improved performance on devices that implement SME2, with no further effort required from the developer.
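
To get a rough idea of which architectural features your phone exposes, you can inspect the `Features` line that Linux-based kernels publish in `/proc/cpuinfo`. The sketch below is illustrative only: `has_cpu_feature` is a hypothetical helper, and it assumes the kernel uses the usual arm64 flag names, such as `asimddp` for the dot product instructions and `sme2` for SME2.

```bash
# Hypothetical helper: succeed if the named flag appears on a "Features" line
# read from stdin (for example, the contents of /proc/cpuinfo).
has_cpu_feature() {
  awk -v want="$1" '
    # Compare every whitespace-separated field against the requested flag.
    /^Features/ { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

With a single phone connected, `adb shell cat /proc/cpuinfo | has_cpu_feature sme2 && echo "SME2 reported"` prints a message only when the flag is present. An absent flag can also mean that the kernel is too old to report it, so treat the result as a hint rather than a definitive answer.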
Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
---
2+
title: Accelerate a Voice Assistant with KleidiAI and SME2
3+
4+
minutes_to_complete: 30
5+
6+
who_is_this_for: This Learning Path is an introductory topic on improving the performance of a voice assistant by using KleidiAI and SME2.
7+
8+
learning_objectives:
9+
- Compile an Android application
10+
    - Use KleidiAI and SME2 to improve the performance of the voice assistant
11+
12+
prerequisites:
13+
- an Android phone
14+
- Android Studio
15+
- CMake
16+
- adb
17+
- git
18+
19+
author: Arnaud de Grandmaison
20+
21+
### Tags
22+
skilllevels: Introductory
23+
subjects: Performance and Architecture
24+
armips:
25+
- Cortex-A
26+
tools_software_languages:
27+
- Java
28+
- Kotlin
29+
operatingsystems:
30+
- Linux
31+
- macOS
32+
- Windows
33+
34+
further_reading:
35+
36+
- resource:
37+
title: Accelerate Generative AI workloads using KleidiAI
38+
link: https://learn.arm.com/learning-paths/cross-platform/kleidiai-explainer
39+
type: website
40+
41+
- resource:
42+
title: LLM inference on Android with KleidiAI, MediaPipe, and XNNPACK
43+
link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/kleidiai-on-android-with-mediapipe-and-xnnpack/
44+
type: website
45+
46+
- resource:
47+
title: Vision LLM inference on Android with KleidiAI and MNN
48+
link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/
49+
type: website
50+
51+
### FIXED, DO NOT MODIFY
52+
# ================================================================================
53+
weight: 1 # _index.md always has weight of 1 to order correctly
54+
layout: "learningpathall" # All files under learning paths have this same wrapper
55+
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
56+
---
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
---
2+
# ================================================================================
3+
# FIXED, DO NOT MODIFY
4+
# ================================================================================
5+
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
6+
title: "Next Steps" # Always the same
7+
layout: "learningpathall" # All files under learning paths have this same wrapper
8+
---
Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
---
2+
review:
3+
- questions:
4+
question: >
5+
What is KleidiAI?
6+
answers:
7+
- An anime about a little AI lost in a giant world.
8+
- A software library
9+
correct_answer: 2
10+
explanation: >
11+
KleidiAI is an open-source software library that provides optimized
12+
performance-critical micro-kernels for artificial intelligence (AI)
13+
workloads tailored for Arm processors.
14+
15+
- questions:
16+
question: >
17+
How does KleidiAI optimize performance?
18+
answers:
19+
- Lots of magic, and let's be honest, a bit of hard work
20+
- It takes advantage of different available Arm processor architectural features.
21+
correct_answer: 2
22+
explanation: >
23+
Processor architectural features, e.g., ``FEAT_DotProd``, when implemented, enable
24+
the software to use specific instructions dedicated to efficiently performing some
25+
tasks or computations. For example, when implemented, ``FEAT_DotProd`` adds the
26+
``UDOT`` and ``SDOT`` 8-bit dot product instructions, which are critical for
27+
improving the performance of dot product computations.
28+
29+
# ================================================================================
30+
# FIXED, DO NOT MODIFY
31+
# ================================================================================
32+
title: "Review" # Always the same title
33+
weight: 20 # Set to always be larger than the content in this path
34+
layout: "learningpathall" # All files under learning paths have this same wrapper
35+
---