Commit 2577f0b

quic-jhugo, quic-aashwins, quic-amitraj, anujgupt-github, and quic-mamta committed

Initial commit

Initial commit of the Efficient Transformers library.

Co-authored-by: Aashwin Sreenivasan <[email protected]>
Signed-off-by: Aashwin Sreenivasan <[email protected]>
Co-authored-by: Amit Raj <[email protected]>
Signed-off-by: Amit Raj <[email protected]>
Co-authored-by: Anuj Gupta <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
Co-authored-by: Mamta Singh <[email protected]>
Signed-off-by: Mamta Singh <[email protected]>
Co-authored-by: Onkar Chougule <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Co-authored-by: Vinayak Baddi <[email protected]>
Signed-off-by: Vinayak Baddi <[email protected]>
Signed-off-by: Jeffrey Hugo <[email protected]>

0 parents  commit 2577f0b


70 files changed: 7,634 additions, 0 deletions

.pre-commit-config.yaml

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
repos:
2+
- repo: https://github.com/astral-sh/ruff-pre-commit
3+
# Ruff version.
4+
rev: v0.3.4
5+
hooks:
6+
# Run the linter.
7+
- id: ruff
8+
types_or: [ python, pyi, jupyter ]
9+
args: [ --fix ]
10+
# Run the formatter.
11+
- id: ruff-format
12+
types_or: [ python, pyi, jupyter ]

CODE-OF-CONDUCT.md

Lines changed: 73 additions & 0 deletions
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

CONTRIBUTING.md

Lines changed: 27 additions & 0 deletions
## Contributing to PROJECT

Hi there! We’re thrilled that you’d like to contribute to this project. Your help is essential for keeping this project great and for making it better.

## Branching Strategy

In general, contributors should develop on branches based off of `master`, and pull requests should be made against `master`.

## Submitting a pull request

1. Please read our [code of conduct](CODE-OF-CONDUCT.md) and [license](LICENSE).
1. Fork and clone the repository.
1. Create a new branch based on `master`: `git checkout -b <my-branch-name> master`.
1. Make your changes, add tests, and make sure the tests still pass.
1. Commit your changes using the [DCO](http://developercertificate.org/). You can attest to the DCO by committing with the **-s** or **--signoff** option, or by manually adding the "Signed-off-by" trailer.
1. Push to your fork and submit a pull request from your branch to `master`.
1. Pat yourself on the back and wait for your pull request to be reviewed.

Here are a few things you can do that will increase the likelihood of your pull request being accepted:

- Follow the existing style where possible.
- Write tests.
- Keep your change as focused as possible. If you want to make multiple independent changes, please consider submitting them as separate pull requests.
- Write a [good commit message](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).
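The DCO sign-off step above can be tried out locally. The sketch below (assuming `git` is installed; the repository, name, and email are throwaway examples, not part of this commit) shows that committing with `-s` appends the `Signed-off-by` trailer automatically:

```python
import subprocess
import tempfile

# Create a throwaway repository so the demo does not touch any real work.
repo = tempfile.mkdtemp()

def git(*args):
    # Run a git command inside the throwaway repository and return its stdout.
    return subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    ).stdout

git("init", "-q")
git("config", "user.name", "Jane Developer")
git("config", "user.email", "jane@example.com")

# An empty commit is enough to see the trailer; -s appends Signed-off-by
# derived from user.name and user.email.
git("commit", "-q", "--allow-empty", "-s", "-m", "DCO demo")

message = git("log", "-1", "--format=%B")
print(message)
```

The printed commit message ends with a `Signed-off-by: Jane Developer <jane@example.com>` line, which is exactly what the DCO check looks for.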

Dockerfile

Lines changed: 58 additions & 0 deletions
```dockerfile
# Use Ubuntu 20.04 as the base image
# Create a temp image that has build tools that we can use to build wheel
# files for dependencies only available as source.
FROM docker-registry.qualcomm.com/library/ubuntu:20.04

# Update the package lists and install required packages
RUN apt-get update && apt-get install -y \
    git \
    tmux \
    python3.8 \
    python3.8-venv \
    python3-pip

# pip recognizes this variable
ENV PIP_CACHE_DIR=/var/cache/pip
WORKDIR /app

# Sample command to register and clone the repository
# Clone the GitHub repository
RUN git config --global user.email [email protected] && \
    git config --global user.name none

RUN mkdir -p /app/qefficient-library
COPY . /app/qefficient-library

# Create a virtual env for the docker image.
# Note: sourcing the activate script in its own RUN layer does not persist
# across layers; putting the venv's bin directory on PATH has the intended
# effect for all later layers.
RUN python3.8 -m venv /app/llm_env
ENV PATH="/app/llm_env/bin:$PATH"
WORKDIR /app/qefficient-library

# Install the required Python packages
RUN pip install torch==2.0.0+cpu --extra-index-url https://download.pytorch.org/whl/cpu --no-deps
RUN pip install datasets==2.17.0 fsspec==2023.10.0 multidict==6.0.5 sentencepiece --no-deps

RUN python3.8 -m pip install .
WORKDIR /app/qefficient-library

# Set the environment variables for the model card name and token ID.
# Note: "ENV KEY = value" (with spaces) does not assign what it looks like;
# the KEY=value form is required.
ENV HF_HOME="/app/qefficient-library/docs"
ENV MODEL_NAME=""
ENV CACHE_DIR=""
ENV TOKEN_ID=""

# Only the last CMD in a Dockerfile takes effect, so the status messages are
# folded into the single command below.
CMD echo "qefficient-transformers repository cloned and setup installed inside Docker image." && \
    echo "Starting the Model Download and Export to Onnx Stage for QEff." && \
    python3.8 -m QEfficient.cloud.export --model-name "$MODEL_NAME"

# Example usage:
# docker build -t qefficient-library .

# Minimum system requirements before running docker containers:
# 1. Clear the tmp space.
# 2. For smaller models, 32 GiB RAM is sufficient, but larger LLMs require more
#    CPU/RAM (for context, a 7B model needs at least 64 GiB).
# 3. The exact minimum system configuration is hard to pin down, since it is a
#    function of the model parameters.

# docker run -e MODEL_NAME=gpt2 -e TOKEN_ID=<your-token-id> qefficient-library
```

LICENSE

Lines changed: 33 additions & 0 deletions
Copyright (c) 2023, Qualcomm Innovation Center, Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted (subject to the limitations in the disclaimer below) provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* Neither the name of Qualcomm Innovation Center, Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

SPDX-License-Identifier: BSD-3-Clause

MANIFEST.in

Lines changed: 2 additions & 0 deletions
```
include README.md
include LICENSE
```

Makefile

Lines changed: 3 additions & 0 deletions
```makefile
format:
	isort --profile black -l 140 QEfficient/
	black -l 140 QEfficient/
```

QEfficient/__init__.py

Lines changed: 20 additions & 0 deletions
```python
# -----------------------------------------------------------------------------
#
# Copyright (c) 2023-2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# -----------------------------------------------------------------------------

import torch.nn as nn

from QEfficient.transformers.modeling_utils import transform as transform_hf


def transform(model: nn.Module, type="Transformers", form_factor="cloud"):
    """Low-level API of the library.

    model: instance of nn.Module
    type: "Transformers" | "Diffusers", default: "Transformers"
    form_factor: target form factor, default: "cloud"
    """
    if type == "Transformers":
        return transform_hf(model, form_factor)
    else:
        raise NotImplementedError
```
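The `transform` entry point is a thin dispatcher over the string `type` argument. A minimal standalone sketch of that pattern (using a stand-in `transform_hf` rather than the real QEfficient import, so it runs anywhere) behaves like this:

```python
def transform_hf(model, form_factor):
    # Stand-in for QEfficient.transformers.modeling_utils.transform;
    # the real function rewrites the model's attention modules.
    return f"transformed({model}, form_factor={form_factor})"


def transform(model, type="Transformers", form_factor="cloud"):
    # Dispatch on the library family; only Transformers is implemented so far,
    # every other value raises NotImplementedError.
    if type == "Transformers":
        return transform_hf(model, form_factor)
    else:
        raise NotImplementedError


print(transform("my_model"))  # transformed(my_model, form_factor=cloud)
```

The string-dispatch design keeps the public surface to a single function while leaving room for a future "Diffusers" branch.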

QEfficient/cloud/__init__.py

Lines changed: 7 additions & 0 deletions
```python
# -----------------------------------------------------------------------------
#
# Copyright (c) 2023-2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# -----------------------------------------------------------------------------
```

QEfficient/cloud/compile.py

Lines changed: 134 additions & 0 deletions
```python
# -----------------------------------------------------------------------------
#
# Copyright (c) 2023-2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# -----------------------------------------------------------------------------

import argparse
import json
import os
from typing import List

from QEfficient.exporter.export_utils import compile_kv_model_on_cloud_ai_100
from QEfficient.utils.logging_utils import logger


def create_and_dump_specializations(batch_size: int, prompt_len: int, ctx_len: int, path: str):
    # Create the two specializations: prefill (seq_len == prompt_len) and
    # decode (seq_len == 1).
    specializations = {
        "specializations": [
            {
                "batch_size": str(batch_size),
                "seq_len": str(prompt_len),
                "ctx_len": str(ctx_len),
            },
            {"batch_size": str(batch_size), "seq_len": "1", "ctx_len": str(ctx_len)},
        ]
    }
    # Dump to JSON
    with open(path, "w") as file:
        json.dump(specializations, file, indent=4)


def main(
    onnx_path: str,
    qpc_path: str,
    num_cores: int,
    device_group: List[int],
    aic_enable_depth_first: bool = False,
    mos: int = -1,
    batch_size: int = 1,
    prompt_len: int = 32,
    ctx_len: int = 128,
    mxfp6: bool = True,
) -> str:
    """
    API to compile the ONNX model on the Cloud AI 100 platform with the given config.

    :param onnx_path: str. Path of the generated ONNX model.
    :param qpc_path: str. Path under which the compiled qpc binaries are stored.
    :param num_cores: int. Number of cores to compile the model on. Available options: [1 to 16].
    :param device_group: List[int]. Cloud AI 100 device ids.
    :param batch_size: int. Batch size to compile the model for.
    :param prompt_len: int. Prompt length to compile the model for.
    :param ctx_len: int. Maximum context length to compile the model for.
    :param mxfp6: bool. Enable compilation for MXFP6 precision.
    """
    # Dynamically create the specializations JSON
    os.makedirs(qpc_path, exist_ok=True)
    specialization_json_path = os.path.join(qpc_path, "specializations.json")
    create_and_dump_specializations(
        batch_size=batch_size, prompt_len=prompt_len, ctx_len=ctx_len, path=specialization_json_path
    )
    custom_io_file_path = os.path.join(os.path.dirname(onnx_path), "custom_io.yaml")

    if not os.path.isfile(custom_io_file_path):
        raise FileNotFoundError(f"file {custom_io_file_path} needs to exist in the same directory as onnx model files.")

    _, qpc_path = compile_kv_model_on_cloud_ai_100(
        onnx_path=onnx_path,
        specializations_json=specialization_json_path,
        num_cores=num_cores,
        custom_io_path=custom_io_file_path,
        base_path=qpc_path,
        mxfp6=mxfp6,
        aic_enable_depth_first=aic_enable_depth_first,
        mos=mos,
        device_group=device_group,
    )

    logger.info(f"Compiled QPC files can be found here: {qpc_path}")
    return qpc_path


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Compilation script.")
    parser.add_argument("--onnx_path", "--onnx-path", required=True, help="ONNX model path")
    parser.add_argument(
        "--qpc_path",
        "--qpc-path",
        required=True,
        help="Compiled qpc binaries will be stored under this folder",
    )
    parser.add_argument("--batch_size", "--batch-size", type=int, default=1, help="Batch size for text generation")
    parser.add_argument(
        "--prompt_len",
        "--prompt-len",
        default=32,
        type=int,
        help="Sequence length for text generation.",
    )
    parser.add_argument("--ctx_len", "--ctx-len", default=128, type=int, help="Context length for text generation.")
    parser.add_argument(
        "--mxfp6",
        action="store_true",
        help="Compress constant MatMul weights to MXFP6 E2M3; default is no compression",
    )
    parser.add_argument(
        "--num_cores",
        "--num-cores",
        required=True,
        type=int,
        help="Number of cores to compile the model on",
    )
    parser.add_argument(
        "--device_group",
        "--device-group",
        required=True,
        type=lambda device_ids: [int(x) for x in device_ids.strip("[]").split(",")],
        help="Cloud AI 100 device ids (comma-separated), e.g. [0]",
    )
    parser.add_argument(
        "--aic_enable_depth_first",
        "--aic-enable-depth-first",
        action="store_true",
        help="If passed, this option will be enabled during compilation; disabled by default",
    )
    parser.add_argument(
        "--mos",
        type=int,
        default=-1,
        help="Effort level to reduce the on-chip memory",
    )
    args = parser.parse_args()
    main(**vars(args))
```
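Two small pieces of this script are easy to sanity-check in isolation: the `--device_group` lambda, which strips brackets and splits on commas, and the specializations dictionary, which always contains a prefill entry (`seq_len` equal to the prompt length) and a decode entry (`seq_len` of 1). The standalone sketch below reproduces both without touching the filesystem (the helper names `parse_device_group` and `build_specializations` are illustrative, not part of the library):

```python
import json


def parse_device_group(device_ids: str) -> list:
    # Same logic as the --device_group argparse lambda above:
    # "[0,1]" -> strip the brackets, split on commas, convert to int.
    return [int(x) for x in device_ids.strip("[]").split(",")]


def build_specializations(batch_size: int, prompt_len: int, ctx_len: int) -> dict:
    # Same shape as create_and_dump_specializations produces, minus the
    # file write: one prefill specialization and one decode specialization.
    return {
        "specializations": [
            {"batch_size": str(batch_size), "seq_len": str(prompt_len), "ctx_len": str(ctx_len)},
            {"batch_size": str(batch_size), "seq_len": "1", "ctx_len": str(ctx_len)},
        ]
    }


print(parse_device_group("[0,1]"))  # [0, 1]
print(json.dumps(build_specializations(1, 32, 128), indent=4))
```

The two-entry JSON reflects how KV-cache decoding works on the device: the first specialization compiles the graph for the full prompt, and the second compiles it for one-token-at-a-time generation within the same context window.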
