feat: show inference metrics from Endpoint and Routing queries #3133

Draft: wants to merge 5 commits into base: main


@kyujin-cho (Member) commented Nov 25, 2024

This PR brings the following updates:

  • Hugging Face TGI (Text Generation Inference) runtime variant support
  • Show inference metrics from GraphQL queries on Endpoint and Routing
    • This feature only takes effect with Enterprise AppProxy; there are no plans to support it on OSS AppProxy

Inference metric format

{
  # ...
  "vllm_request_params_n": {
    "__type": "HISTOGRAM",
    "current": {
      "1.0": "3832319",
      "2.0": "3832319",
      "5.0": "3832319",
      "10.0": "3832319",
      "20.0": "3832319",
      "+Inf": "3832319"
    },
    "threshold_unit": "le",
    "count": 3832319,
    "sum": "3832319"
  },
  "vllm_request_params_best_of": {
    "__type": "HISTOGRAM",
    "current": {
      "1.0": "3832319",
      "2.0": "3832319",
      "5.0": "3832319",
      "10.0": "3832319",
      "20.0": "3832319",
      "+Inf": "3832319"
    },
    "threshold_unit": "le",
    "count": 3832319,
    "sum": "3832319"
  },
  "vllm_cache_config_info": {
    "__type": "GAUGE",
    "current": "1",
    "capacity": null,
    "pct": "0.00",
    "unit_hint": "count"
  },
  # ...
}

Metrics from inference frameworks fall into three types: GAUGE, COUNTER, and HISTOGRAM. The __type key at the root of every metric object identifies its type. GAUGE and COUNTER follow Backend.AI's session metrics format, whereas HISTOGRAM has its own new definition.
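A minimal sketch of how a client might dispatch on the __type key. The helper summarize_metric and the constant KNOWN_TYPES are hypothetical, not part of Backend.AI. The sample dict is a trimmed copy of the example above. Numbers are parsed with Decimal because the example encodes most values as strings. It also assumes COUNTER metrics carry the same current/unit_hint keys that GAUGE shows in the example, and that histograms include a "+Inf" bucket holding the total.

```python
from decimal import Decimal
from typing import Any

# Trimmed copy of the example payload above (most values are string-encoded).
SAMPLE: dict[str, dict[str, Any]] = {
    "vllm_request_params_n": {
        "__type": "HISTOGRAM",
        "current": {"1.0": "3832319", "2.0": "3832319", "+Inf": "3832319"},
        "threshold_unit": "le",
        "count": 3832319,
        "sum": "3832319",
    },
    "vllm_cache_config_info": {
        "__type": "GAUGE",
        "current": "1",
        "capacity": None,
        "pct": "0.00",
        "unit_hint": "count",
    },
}

KNOWN_TYPES = {"GAUGE", "COUNTER", "HISTOGRAM"}  # hypothetical constant


def summarize_metric(name: str, metric: dict[str, Any]) -> str:
    """Hypothetical helper: render one metric as a short string based on its __type."""
    kind = metric.get("__type")
    if kind not in KNOWN_TYPES:
        raise ValueError(f"{name}: unknown metric type {kind!r}")
    if kind == "HISTOGRAM":
        # Assumption: cumulative "le" buckets, so "+Inf" holds the total observation count.
        total = Decimal(metric["current"]["+Inf"])
        return f"{name}: histogram with {len(metric['current'])} buckets, {total} observations"
    # Assumption: GAUGE and COUNTER share the session-metric keys (current, unit_hint, ...).
    current = Decimal(metric["current"])
    unit = metric.get("unit_hint") or ""
    return f"{name}: {kind.lower()} = {current} {unit}".rstrip()


for metric_name, payload in SAMPLE.items():
    print(summarize_metric(metric_name, payload))
```

Branching on a single discriminator key keeps the client tolerant of new metric names: only an unknown __type value is rejected, not an unknown metric.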

HISTOGRAM metric format

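For the HISTOGRAM type, the distribution is given under the current key, keyed by bucket threshold. A threshold_unit of "le" ("less than or equal") marks each key as an inclusive upper bound, with "+Inf" as the last bucket. count and sum are optional and are supplied only if the inference framework reports them.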
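A minimal sketch of reading such a histogram on the client side. It assumes the bucket values are cumulative, as in Prometheus "le" histograms. The example is consistent with this: every bucket, including "+Inf", holds the same count. Under that assumption, per-bucket counts come from differencing adjacent buckets. Because count and sum are optional, the mean is only computed when both are present. The function names per_bucket_counts and mean are hypothetical.

```python
from decimal import Decimal
from typing import Any, Optional


def bucket_bound(key: str) -> float:
    # Bucket keys are strings such as "1.0" or "+Inf"; float() parses both.
    return float(key)


def per_bucket_counts(histogram: dict[str, Any]) -> list[tuple[str, Decimal]]:
    """Hypothetical helper: turn cumulative 'le' buckets into per-bucket counts."""
    if histogram.get("threshold_unit") != "le":
        raise ValueError("this sketch only handles 'le' histograms")
    ordered = sorted(histogram["current"].items(), key=lambda kv: bucket_bound(kv[0]))
    result: list[tuple[str, Decimal]] = []
    previous = Decimal(0)
    for key, raw in ordered:
        cumulative = Decimal(raw)
        result.append((key, cumulative - previous))  # observations in (previous bound, key]
        previous = cumulative
    return result


def mean(histogram: dict[str, Any]) -> Optional[Decimal]:
    """Hypothetical helper: count and sum are optional, so return None if either is missing."""
    if histogram.get("count") is None or histogram.get("sum") is None:
        return None
    count = Decimal(histogram["count"])
    if count == 0:
        return None
    return Decimal(histogram["sum"]) / count


example = {
    "__type": "HISTOGRAM",
    "current": {"1.0": "3832319", "2.0": "3832319", "5.0": "3832319", "+Inf": "3832319"},
    "threshold_unit": "le",
    "count": 3832319,
    "sum": "3832319",
}
print(per_bucket_counts(example))
print(mean(example))
```

Under the cumulative reading, the example puts all 3832319 observations in the 1.0 bucket and 0 in every later bucket, and the mean is 1.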

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version
  • Mention of the original issue
  • Installer updates including:
    • Fixtures for db schema changes
    • New mandatory config options
  • Update of end-to-end CLI integration tests in ai.backend.test
  • API server-client counterparts (e.g., manager API -> client SDK)
  • Test case(s) to:
    • Demonstrate the difference of before/after
    • Demonstrate the flow of abstract/conceptual models with a concrete implementation
  • Documentation
    • Contents in the docs directory
    • docstrings in public interfaces and type annotations

github-actions bot added the comp:manager (Related to Manager component), comp:agent (Related to Agent component), comp:common (Related to Common component), and size:M (30~100 LoC) labels on Nov 25, 2024