         num_cores: int = 16,  # FIXME: Make this mandatory arg
         mxfp6_matmul: bool = False,
+        mxint8_kv_cache: bool = False,
+        num_speculative_tokens: Optional[int] = None,
+        enable_qnn: bool = False,
+        qnn_config: Optional[str] = None,
         **compiler_options,
     ) -> str:
         """
@@ -1751,19 +1759,41 @@ def compile(
         ``Optional`` Args:
             :onnx_path (str, optional): Path to pre-exported onnx model.
             :compile_dir (str, optional): Path for saving the qpc generated.
-            :seq_len (int, optional): The length of the prompt should be less that ``seq_len``. ``Defaults to 32``.
+            :encoder_ctx_len (int, optional): The maximum length of context for encoder, based on the AutoProcessor output. ``Defaults to checking config, if None in config then 1500``
+            :ctx_len (int, optional): The maximum length of context to keep for decoding. ``Defaults to 150``.
             :batch_size (int, optional): Batch size. ``Defaults to 1``.
             :num_devices (int): Number of devices the model needs to be compiled for. Defaults to 1.
             :num_cores (int): Number of cores used to compile the model.
             :mxfp6_matmul (bool, optional): Whether to use ``mxfp6`` compression for weights. ``Defaults to False``.
             :aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
-            :allow_mxint8_mdp_io (bool, optional): Allows MXINT8 compression of MDP IO traffic. ``Defaults to False.``
+
+            Other args are not yet implemented for AutoModelForSpeechSeq2Seq
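
The docstring says the remaining arguments are not yet implemented for AutoModelForSpeechSeq2Seq, but the diff does not show what happens when a caller sets one of them. Below is a minimal sketch of one possible guard. It is not the code in this PR. It assumes the "other args" are the four newly added parameters that have no docstring entry (`mxint8_kv_cache`, `num_speculative_tokens`, `enable_qnn`, `qnn_config`), and the helper names `find_unimplemented_args` and `check_compile_args` are hypothetical.

```python
from typing import Any, Dict, List

# Assumption: these are the "other args" the docstring refers to, paired with
# the defaults shown in the diff. The PR does not list them explicitly.
_UNIMPLEMENTED_DEFAULTS: Dict[str, Any] = {
    "mxint8_kv_cache": False,
    "num_speculative_tokens": None,
    "enable_qnn": False,
    "qnn_config": None,
}


def find_unimplemented_args(**kwargs: Any) -> List[str]:
    """Return the names of not-yet-implemented args set to a non-default value."""
    return sorted(
        name
        for name, default in _UNIMPLEMENTED_DEFAULTS.items()
        if name in kwargs and kwargs[name] != default
    )


def check_compile_args(**kwargs: Any) -> None:
    """Hypothetical guard: fail early rather than silently ignore an argument."""
    offending = find_unimplemented_args(**kwargs)
    if offending:
        raise NotImplementedError(
            "Not yet implemented for AutoModelForSpeechSeq2Seq: "
            + ", ".join(offending)
        )
```

A guard like this could be called at the top of `compile()` with the relevant keyword arguments, so that a caller who passes `enable_qnn=True` gets an explicit error instead of a compile that quietly ignores the flag. Whether to raise or only warn is a design choice for the PR author.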