Add Support for Templating to GAP #283

Open

wants to merge 5 commits into base: config_file_support
Conversation

@nv-braf (Contributor) commented Feb 12, 2025

Adds logic to create templates and adds a new subcommand that creates both verbose (with comments) and non-verbose (options only) templates. The default template filename is `genai_perf_config.yaml`; the user can override this by specifying a filename.

I have run both of these templates successfully in live testing.
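The verbose/non-verbose split can be sketched roughly as follows. This is an illustrative sketch only, not the PR's actual implementation: the `FIELDS` list, `render_template`, and `write_template` names are hypothetical, and only two fields from the real template are shown.

```python
# Illustrative sketch of emitting verbose vs. non-verbose templates.
# FIELDS pairs each option line with its explanatory comment.
FIELDS = [
    ("# The name of the model(s) to benchmark.", "model_names: "),
    ("# An option to enable the use of the streaming API.", "streaming: False"),
]

def render_template(verbose: bool) -> str:
    """Render the template, including comments only in verbose mode."""
    lines = []
    for comment, option in FIELDS:
        if verbose:
            lines.append(comment)
        lines.append(option)
    return "\n".join(lines) + "\n"

def write_template(filename: str = "genai_perf_config.yaml",
                   verbose: bool = False) -> None:
    """Write the rendered template to the (user-overridable) default filename."""
    with open(filename, "w") as f:
        f.write(render_template(verbose))
```

The non-verbose output is then just the verbose output with the comment lines dropped, which keeps the two templates trivially in sync.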

`genai-perf create-template`:

  model_names: 

  analyze:
    # Uncomment the lines below to enable the analyze subcommand
    # For further details see analyze.md
    # sweep_parameters:
    #   concurrency:
    #     start: 1
    #     stop: 1024

  endpoint:
    model_selection_strategy: round_robin
    backend: tensorrtllm
    custom: 
    type: 
    service_kind: triton
    streaming: False
    server_metrics_urls: ['http://localhost:8002/metrics']
    url: localhost:8001

  perf_analyzer:
    path: perf_analyzer
    verbose: False
    stimulus: {'concurrency': 1}
    stability_percentage: 999
    measurement_interval: 10000

  input:
    batch_size: 1
    extra: 
    goodput: 
    header: 
    file: 
    num_dataset_entries: 100
    random_seed: 0

    image:
      batch_size: 1
      width_mean: 100
      width_stddev: 0
      height_mean: 100
      height_stddev: 0
      format: png

    output_tokens:
      mean: 0
      deterministic: False
      stddev: 0

    synthetic_tokens:
      mean: 550
      stddev: 0

    prefix_prompt:
      num: 0
      length: 100

    request_count:
      warmup: 0

  output:
    artifact_directory: artifacts
    checkpoint_directory: checkpoint
    profile_export_file: profile_export.json
    generate_plots: False

  tokenizer:
    name: hf-internal-testing/llama-tokenizer
    revision: main
    trust_remote_code: False
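For reference, the two `model_selection_strategy` values documented in the verbose template behave roughly as follows. A minimal sketch; `assign_model` is a hypothetical helper for illustration, not part of GenAI-Perf.

```python
import random

def assign_model(strategy: str, models: list[str], prompt_index: int) -> str:
    """Pick the model for a given prompt index under the chosen strategy."""
    if strategy == "round_robin":
        # nth prompt is assigned to model n mod len(models)
        return models[prompt_index % len(models)]
    if strategy == "random":
        # assignment is uniformly random
        return random.choice(models)
    raise ValueError(f"unknown strategy: {strategy}")
```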

`genai-perf create-template -v`:

  # The name of the model(s) to benchmark.
  model_names: 


  analyze:
    # Uncomment the lines below to enable the analyze subcommand
    # For further details see analyze.md
    # sweep_parameters:
    #   concurrency:
    #     start: 1
    #     stop: 1024

  endpoint:
    # When multiple models are specified, this determines how a specific model is assigned to a prompt.
    # round_robin: the nth prompt in the list is assigned to model n mod len(models).
    # random: assignment is uniformly random
    model_selection_strategy: round_robin

    # When using the "triton" service-kind, this is the backend of the model.
    # For the TENSORRT-LLM backend, you currently must set 'exclude_input_in_output' to true
    # in the model config to not echo the input tokens.
    backend: tensorrtllm

    # Set a custom endpoint that differs from the OpenAI defaults.
    custom: 

    # The endpoint type to send requests to on the server.
    type: 

    # The kind of service Perf Analyzer will generate load for.
    # In order to use "openai", you must specify an API via the "type" field.
    service_kind: triton

    # An option to enable the use of the streaming API.
    streaming: False

    # The list of Triton server metrics URLs.                
    # These are used for Telemetry metric reporting with the "triton" service-kind.
    server_metrics_urls: ['http://localhost:8002/metrics']

    # URL of the endpoint to target for benchmarking.
    url: localhost:8001


  perf_analyzer:
    # Path to Perf Analyzer binary
    path: perf_analyzer

    # Enables verbose output from Perf Analyzer
    verbose: False

    # The type and value of stimulus to benchmark
    stimulus: {'concurrency': 1}

    # The allowed variation in latency measurements when determining if a result is stable.
    # The measurement is considered stable if the ratio of max to min
    # across the 3 most recent measurements is within the stability percentage,
    # in terms of both inferences per second and latency.
    stability_percentage: 999

    # The time interval used for each measurement, in milliseconds.
    # Perf Analyzer will sample the specified time interval and take measurements
    # over the requests completed within that interval.
    measurement_interval: 10000


  input:
    # The batch size of text requests GenAI-Perf should send.            
    # This is currently supported with the embeddings and rankings endpoint types
    batch_size: 1

    # Provide additional inputs to include with every request.                
    # Inputs should be in an 'input_name:value' format.
    extra: 

    # An option to provide constraints in order to compute goodput.                
    # Specify goodput constraints as 'key:value' pairs,                
    # where the key is a valid metric name, and the value is a number representing                
    # either milliseconds or a throughput value per second.                
    # For example:                
    #   request_latency:300                
    #   output_token_throughput_per_request:600
    goodput: 

    # Adds a custom header to the requests.                
    # Headers must be specified as 'Header:Value' pairs.
    header: 

    # The file or directory containing the content to use for profiling.                
    # Example:                
    #   text: "Your prompt here"                
    # 
    # To use synthetic files for a converter that needs multiple files,                
    # prefix the path with 'synthetic:' followed by a comma-separated list of file names.                
    # The synthetic filenames should not have extensions.                
    # Example:                
    #   synthetic: queries,passages
    file: 

    # The number of unique payloads to sample from.                
    # These will be reused until benchmarking is complete.
    num_dataset_entries: 100

    # The seed used to generate random values.
    random_seed: 0


    image:
      # The image batch size of the requests GenAI-Perf should send.                
      # This is currently supported with the image retrieval endpoint type.
      batch_size: 1

      # The mean width of the images when generating synthetic image data.
      width_mean: 100

      # The standard deviation of width of images when generating synthetic image data.
      width_stddev: 0

      # The mean height of images when generating synthetic image data.
      height_mean: 100

      # The standard deviation of height of images when generating synthetic image data.
      height_stddev: 0

      # The compression format of the images.
      format: png


    output_tokens:
      # The mean number of tokens in each output.
      mean: 0

      # This can be set to improve the precision of the mean by setting the            
      # minimum number of tokens equal to the requested number of tokens.            
      # This is currently supported with the Triton service-kind.
      deterministic: False

      # The standard deviation of the number of tokens in each output.
      stddev: 0


    synthetic_tokens:
      # The mean number of tokens in the generated prompts when using synthetic data.
      mean: 550

      # The standard deviation of the number of tokens in the generated prompts when using synthetic data.
      stddev: 0


    prefix_prompt:
      # The number of prefix prompts to select from.            
      # If this value is not zero, these are prompts that are prepended to input prompts.            
      # This is useful for benchmarking models that use a K-V cache.
      num: 0

      # The number of tokens in each prefix prompt.            
      # This is only used if "num" is greater than zero.            
      # Note that due to the prefix and user prompts being concatenated,            
      # the number of tokens in the final prompt may be off by one.
      length: 100


    request_count:
      # The number of warmup requests to send before benchmarking.
      warmup: 0

  output:
    # The directory to store all the (output) artifacts generated by                    
    # GenAI-Perf and Perf Analyzer.
    artifact_directory: artifacts

    # The directory to store/restore the checkpoint generated by GenAI-Perf.
    checkpoint_directory: checkpoint

    # The path where Perf Analyzer profiling data will be exported.                
    # By default, the profile export will be to profile_export.json.                
    # By default, the GenAI-Perf export will be to <profile_export_file>_genai_perf.csv
    profile_export_file: profile_export.json

    # Enables the generation of plots
    generate_plots: False


  tokenizer:
    # The HuggingFace tokenizer to use to interpret token metrics                
    # from prompts and responses. The value can be the                
    # name of a tokenizer or the filepath of the tokenizer.
    name: hf-internal-testing/llama-tokenizer

    # The specific model version to use.                                             
    # It can be a branch name, tag name, or commit ID.
    revision: main

    # Allows custom tokenizer to be downloaded and executed.                
    # This carries security risks and should only be used for repositories you trust.                
    # This is only necessary for custom tokenizers stored in HuggingFace Hub.
    trust_remote_code: False
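As an aside, the `'key:value'` goodput constraints described in the template could be parsed along these lines. This is a hypothetical sketch for illustration; GenAI-Perf's actual parsing may differ.

```python
def parse_goodput(entries: list[str]) -> dict[str, float]:
    """Parse 'metric_name:number' constraint strings into a dict.

    The value is interpreted as milliseconds for latency metrics or as a
    per-second throughput for throughput metrics, matching the template docs.
    """
    constraints = {}
    for entry in entries:
        key, _, value = entry.partition(":")
        constraints[key.strip()] = float(value)
    return constraints
```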

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@nv-braf marked this pull request as ready for review February 13, 2025 17:03
@nv-braf requested a review from debermudez February 13, 2025 17:03