Add Support for Templating to GAP #283

Open

wants to merge 5 commits into base: config_file_support
Conversation

@nv-braf (Contributor) commented Feb 12, 2025

Adds logic to create templates and adds a new subcommand that creates both verbose (with comments) and non-verbose (options only) templates. The default template filename is `genai_perf_config.yaml`; the user can override this by specifying a filename.

I have run both of these templates successfully in live testing.
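The verbose/non-verbose split can be sketched roughly as follows. This is an illustrative sketch only, not the PR's actual implementation: the `FIELDS` list, `render_template`, and `write_template` names are hypothetical, and only two fields from the real template are shown.

```python
# Illustrative sketch of emitting verbose vs. non-verbose templates.
# FIELDS pairs each option line with its explanatory comment.
FIELDS = [
    ("# The name of the model(s) to benchmark.", "model_names: "),
    ("# An option to enable the use of the streaming API.", "streaming: False"),
]

def render_template(verbose: bool) -> str:
    """Render the template, including comments only in verbose mode."""
    lines = []
    for comment, option in FIELDS:
        if verbose:
            lines.append(comment)
        lines.append(option)
    return "\n".join(lines) + "\n"

def write_template(filename: str = "genai_perf_config.yaml",
                   verbose: bool = False) -> None:
    """Write the rendered template to the (user-overridable) default filename."""
    with open(filename, "w") as f:
        f.write(render_template(verbose))
```

The non-verbose output is then just the verbose output with the comment lines dropped, which keeps the two templates trivially in sync.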

`genai-perf create-template`:

  model_names: 

  analyze:
    # Uncomment the lines below to enable the analyze subcommand
    # For further details see analyze.md
    # sweep_parameters:
    #   concurrency:
    #     start: 1
    #     stop: 1024

  endpoint:
    model_selection_strategy: round_robin
    backend: tensorrtllm
    custom: 
    type: 
    service_kind: triton
    streaming: False
    server_metrics_urls: ['http://localhost:8002/metrics']
    url: localhost:8001

  perf_analyzer:
    path: perf_analyzer
    verbose: False
    stimulus: {'concurrency': 1}
    stability_percentage: 999
    measurement_interval: 10000

  input:
    batch_size: 1
    extra: 
    goodput: 
    header: 
    file: 
    num_dataset_entries: 100
    random_seed: 0

    image:
      batch_size: 1
      width_mean: 100
      width_stddev: 0
      height_mean: 100
      height_stddev: 0
      format: png

    output_tokens:
      mean: 0
      deterministic: False
      stddev: 0

    synthetic_tokens:
      mean: 550
      stddev: 0

    prefix_prompt:
      num: 0
      length: 100

    request_count:
      warmup: 0

  output:
    artifact_directory: artifacts
    checkpoint_directory: checkpoint
    profile_export_file: profile_export.json
    generate_plots: False

  tokenizer:
    name: hf-internal-testing/llama-tokenizer
    revision: main
    trust_remote_code: False
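For reference, the two `model_selection_strategy` values documented in the verbose template behave roughly as follows. A minimal sketch; `assign_model` is a hypothetical helper for illustration, not part of GenAI-Perf.

```python
import random

def assign_model(strategy: str, models: list[str], prompt_index: int) -> str:
    """Pick the model for a given prompt index under the chosen strategy."""
    if strategy == "round_robin":
        # nth prompt is assigned to model n mod len(models)
        return models[prompt_index % len(models)]
    if strategy == "random":
        # assignment is uniformly random
        return random.choice(models)
    raise ValueError(f"unknown strategy: {strategy}")
```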

`genai-perf create-template -v`:

  # The name of the model(s) to benchmark.
  model_names: 


  analyze:
    # Uncomment the lines below to enable the analyze subcommand
    # For further details see analyze.md
    # sweep_parameters:
    #   concurrency:
    #     start: 1
    #     stop: 1024

  endpoint:
    # When multiple models are specified, this determines how a specific model is assigned to a prompt.
    # round_robin: the nth prompt in the list is assigned to model n mod len(models).
    # random: assignment is uniformly random
    model_selection_strategy: round_robin

    # When using the "triton" service-kind, this is the backend of the model.
    # For the TENSORRT-LLM backend, you currently must set 'exclude_input_in_output' to true
    # in the model config to not echo the input tokens.
    backend: tensorrtllm

    # Set a custom endpoint that differs from the OpenAI defaults.
    custom: 

    # The endpoint type to send requests to on the server.
    type: 

    # The kind of service Perf Analyzer will generate load for.
    # In order to use "openai", you must specify an API via the "type" field.
    service_kind: triton

    # An option to enable the use of the streaming API.
    streaming: False

    # The list of Triton server metrics URLs.                
    # These are used for Telemetry metric reporting with the "triton" service-kind.
    server_metrics_urls: ['http://localhost:8002/metrics']

    # URL of the endpoint to target for benchmarking.
    url: localhost:8001


  perf_analyzer:
    # Path to Perf Analyzer binary
    path: perf_analyzer

    # Enables verbose output from Perf Analyzer
    verbose: False

    # The type and value of stimulus to benchmark
    stimulus: {'concurrency': 1}

    # The allowed variation in latency measurements when determining if a result is stable.
    # The measurement is considered stable if the ratio of max to min
    # across the 3 most recent measurements is within the stability percentage,
    # in terms of both inferences per second and latency.
    stability_percentage: 999

    # The time interval used for each measurement, in milliseconds.
    # Perf Analyzer will sample the specified time interval and take measurements
    # over the requests completed within that interval.
    measurement_interval: 10000


  input:
    # The batch size of text requests GenAI-Perf should send.            
    # This is currently supported with the embeddings and rankings endpoint types
    batch_size: 1

    # Provide additional inputs to include with every request.                
    # Inputs should be in an 'input_name:value' format.
    extra: 

    # An option to provide constraints in order to compute goodput.                
    # Specify goodput constraints as 'key:value' pairs,                
    # where the key is a valid metric name, and the value is a number representing                
    # either milliseconds or a throughput value per second.                
    # For example:                
    #   request_latency:300                
    #   output_token_throughput_per_request:600
    goodput: 

    # Adds a custom header to the requests.                
    # Headers must be specified as 'Header:Value' pairs.
    header: 

    # The file or directory containing the content to use for profiling.                
    # Example:                
    #   text: "Your prompt here"                
    # 
    # To use synthetic files for a converter that needs multiple files,                
    # prefix the path with 'synthetic:' followed by a comma-separated list of file names.                
    # The synthetic filenames should not have extensions.                
    # Example:                
    #   synthetic: queries,passages
    file: 

    # The number of unique payloads to sample from.                
    # These will be reused until benchmarking is complete.
    num_dataset_entries: 100

    # The seed used to generate random values.
    random_seed: 0


    image:
      # The image batch size of the requests GenAI-Perf should send.                
      # This is currently supported with the image retrieval endpoint type.
      batch_size: 1

      # The mean width of the images when generating synthetic image data.
      width_mean: 100

      # The standard deviation of width of images when generating synthetic image data.
      width_stddev: 0

      # The mean height of images when generating synthetic image data.
      height_mean: 100

      # The standard deviation of height of images when generating synthetic image data.
      height_stddev: 0

      # The compression format of the images.
      format: png


    output_tokens:
      # The mean number of tokens in each output.
      mean: 0

      # This can be set to improve the precision of the mean by setting the            
      # minimum number of tokens equal to the requested number of tokens.            
      # This is currently supported with the Triton service-kind.
      deterministic: False

      # The standard deviation of the number of tokens in each output.
      stddev: 0


    synthetic_tokens:
      # The mean number of tokens in the generated prompts when using synthetic data.
      mean: 550

      # The standard deviation of the number of tokens in the generated prompts when using synthetic data.
      stddev: 0


    prefix_prompt:
      # The number of prefix prompts to select from.            
      # If this value is not zero, these are prompts that are prepended to input prompts.            
      # This is useful for benchmarking models that use a K-V cache.
      num: 0

      # The number of tokens in each prefix prompt.            
      # This is only used if "num" is greater than zero.            
      # Note that due to the prefix and user prompts being concatenated,            
      # the number of tokens in the final prompt may be off by one.
      length: 100


    request_count:
      # The number of warmup requests to send before benchmarking.
      warmup: 0

  output:
    # The directory to store all the (output) artifacts generated by                    
    # GenAI-Perf and Perf Analyzer.
    artifact_directory: artifacts

    # The directory to store/restore the checkpoint generated by GenAI-Perf.
    checkpoint_directory: checkpoint

    # The path where Perf Analyzer profiling data will be exported.                
    # By default, the profile export will be to profile_export.json.                
    # By default, the GenAI-Perf export will be to <profile_export_file>_genai_perf.csv
    profile_export_file: profile_export.json

    # Enables the generation of plots
    generate_plots: False


  tokenizer:
    # The HuggingFace tokenizer to use to interpret token metrics                
    # from prompts and responses. The value can be the                
    # name of a tokenizer or the filepath of the tokenizer.
    name: hf-internal-testing/llama-tokenizer

    # The specific model version to use.                                             
    # It can be a branch name, tag name, or commit ID.
    revision: main

    # Allows custom tokenizer to be downloaded and executed.                
    # This carries security risks and should only be used for repositories you trust.                
    # This is only necessary for custom tokenizers stored in HuggingFace Hub.
    trust_remote_code: False
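As an aside, the `'key:value'` goodput constraints described in the template could be parsed along these lines. This is a hypothetical sketch for illustration; GenAI-Perf's actual parsing may differ.

```python
def parse_goodput(entries: list[str]) -> dict[str, float]:
    """Parse 'metric_name:number' constraint strings into a dict.

    The value is interpreted as milliseconds for latency metrics or as a
    per-second throughput for throughput metrics, matching the template docs.
    """
    constraints = {}
    for entry in entries:
        key, _, value = entry.partition(":")
        constraints[key.strip()] = float(value)
    return constraints
```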

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@nv-braf marked this pull request as ready for review February 13, 2025 17:03
@nv-braf requested a review from debermudez February 13, 2025 17:03