Telemetry Contract
This contract standardizes telemetry through canonical agent naming, checkpoint normalization, and bounded attribute sets, so that observability stays stable as agents evolve.
Context
Prior telemetry patterns allowed drift: agent names appeared with multiple aliases in metrics, checkpoint labels were inconsistent, unbounded attributes created noisy dimensions, and dashboards broke after innocuous naming changes. Operations needed to trust trend data and alerting over time.
Core Decisions
1. Canonical Agent and Checkpoint Naming
Metrics are emitted with normalized identifiers: canonical agent names and normalized checkpoints (CP-1 to CP-6). Legacy aliases are mapped to their canonical forms for backward compatibility.
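As an illustration, canonicalization of this kind could look like the minimal sketch below. The agent names, the alias table, the accepted checkpoint spellings, and the "unknown" fallbacks are all assumptions made for the example; the document only fixes the canonical checkpoint range CP-1 to CP-6.

```typescript
// Hypothetical registry: canonical agent names map to themselves,
// legacy aliases map to their canonical form.
const AGENT_NAMES = new Map<string, string>([
  ["planner", "planner"],
  ["reviewer", "reviewer"],
  ["plan-agent", "planner"],     // assumed legacy alias
  ["code_reviewer", "reviewer"], // assumed legacy alias
]);

// Returns the canonical agent name, or "unknown" so that an unregistered
// name cannot silently create a new metric series.
function canonicalAgent(raw: string): string {
  return AGENT_NAMES.get(raw.trim().toLowerCase()) ?? "unknown";
}

// Accepts assumed spellings such as "CP-3", "cp3", or "checkpoint_3" and
// returns the canonical "CP-n" form for n in 1..6, else "CP-UNKNOWN".
function normalizeCheckpoint(raw: string): string {
  const match = /^(?:cp|checkpoint)[\s_-]*([1-6])$/i.exec(raw.trim());
  return match ? `CP-${match[1]}` : "CP-UNKNOWN";
}
```

Under these assumptions, `canonicalAgent("Plan-Agent")` yields `"planner"` and `normalizeCheckpoint("cp3")` yields `"CP-3"`, while an out-of-range label such as `"CP-7"` collapses to `"CP-UNKNOWN"` instead of becoming a new dimension value.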
2. Bounded and Sanitized Attribute Set
Telemetry helper functions sanitize attribute values and enforce bounded key sets to prevent cardinality blowups and malformed dimensions.
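A minimal sketch of such sanitization follows. The allow-listed keys, the 64-character limit, and the character whitelist are assumptions chosen for the example, not values taken from the contract.

```typescript
type Attributes = Record<string, string>;

// Hypothetical bounded key set; keys outside this list are dropped.
const ALLOWED_KEYS: readonly string[] = ["agent", "checkpoint", "status"];
const MAX_VALUE_LENGTH = 64; // assumed limit

// Lowercases the value, replaces runs of characters outside [a-z0-9_.-]
// with "_", truncates it, and falls back to "unknown" if nothing remains.
function sanitizeValue(value: unknown): string {
  const cleaned = String(value)
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9_.-]+/g, "_");
  return cleaned.slice(0, MAX_VALUE_LENGTH) || "unknown";
}

// Iterates over the allow-list rather than the input, so the output can
// never contain more keys than the contract permits.
function sanitizeAttributes(raw: Record<string, unknown>): Attributes {
  const out: Attributes = {};
  for (const key of ALLOWED_KEYS) {
    const value = raw[key];
    if (value !== undefined && value !== null) {
      out[key] = sanitizeValue(value);
    }
  }
  return out;
}
```

With these assumed rules, an input of `{ agent: " Planner ", status: "OK!", requestId: "abc-123" }` comes out as `{ agent: "planner", status: "ok_" }`: the unlisted `requestId` key is dropped, which is what keeps per-request identifiers from turning into metric dimensions.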
3. Contract Lives in Shared Helper Layer
All emitters use a shared helper for contract-safe metric output instead of ad hoc metric assembly. This ensures every telemetry emission follows the same normalization rules.
4. Backward Compatibility Is Explicit
Legacy aliases are translated to canonical forms so historical dashboards continue to operate while migration completes.
5. Contract Regressions Fail CI
Telemetry contract tests validate canonicalization, alias mapping, and attribute safety before merge.
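One way to express such a contract test is a table of input and expected pairs checked against the canonicalizer. The sketch below is illustrative only: `aliasToCanonical` is a stand-in for the shared helper so the example runs on its own, and the agent names, the alias, and the `findViolations` function are hypothetical.

```typescript
// Stand-in for the shared helper's agent canonicalizer (hypothetical data).
const ALIAS_FIXTURE = new Map<string, string>([
  ["reviewer", "reviewer"],
  ["code_reviewer", "reviewer"], // assumed legacy alias
]);

function aliasToCanonical(raw: string): string {
  return ALIAS_FIXTURE.get(raw.trim().toLowerCase()) ?? "unknown";
}

interface ContractCase {
  input: string;
  expected: string;
}

const AGENT_CASES: ContractCase[] = [
  { input: "reviewer", expected: "reviewer" },      // canonical name passes through
  { input: "Code_Reviewer", expected: "reviewer" }, // legacy alias maps to canonical
  { input: "reviewer-v2", expected: "unknown" },    // unregistered name is not emitted as-is
];

// Returns one message per violated case; an empty array means the contract holds.
function findViolations(
  cases: ContractCase[],
  normalize: (raw: string) => string,
): string[] {
  const violations: string[] = [];
  for (const { input, expected } of cases) {
    const actual = normalize(input);
    if (actual !== expected) {
      violations.push(`${input}: expected ${expected}, got ${actual}`);
    }
  }
  return violations;
}
```

A CI job built on this idea would fail the build whenever `findViolations` returns a non-empty list, for example after a change that removes an alias still used by historical dashboards.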
Telemetry Pipeline
Workflow/Agent Event
An event is triggered during workflow execution or agent operation.
agentMetrics Helper
The shared helper receives the raw event data.
Canonicalization
Agent names and checkpoints are normalized to canonical forms.
Attribute Sanitization
Attribute values are sanitized and attribute keys are restricted to a bounded set to prevent cardinality issues.
Metric Emission
Clean, contract-safe metrics are emitted to the telemetry backend.
Dashboards & Alerts
Stable semantics ensure dashboards and alerts remain trustworthy over time.
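The stages above can be sketched end to end as a single helper call. Only the `agentMetrics` helper name comes from this document; `emitAgentMetric`, `RawAgentEvent`, `MetricSink`, the alias table, and the extra attribute key are hypothetical, and the telemetry backend is abstracted as a callback because no backend is named here.

```typescript
interface RawAgentEvent {
  metric: string;
  agent: string;
  checkpoint: string;
  attributes?: Record<string, unknown>;
}

interface MetricPoint {
  name: string;
  value: number;
  attributes: Record<string, string>;
}

// Stand-in for the telemetry backend client.
type MetricSink = (point: MetricPoint) => void;

const AGENT_ALIAS_TABLE = new Map<string, string>([
  ["planner", "planner"],
  ["plan-agent", "planner"], // assumed legacy alias
]);
const EXTRA_KEYS: readonly string[] = ["status"]; // assumed bounded key set

function toCanonicalAgent(raw: string): string {
  return AGENT_ALIAS_TABLE.get(raw.trim().toLowerCase()) ?? "unknown";
}

function toCanonicalCheckpoint(raw: string): string {
  const match = /^(?:cp|checkpoint)[\s_-]*([1-6])$/i.exec(raw.trim());
  return match ? `CP-${match[1]}` : "CP-UNKNOWN";
}

function cleanValue(value: unknown): string {
  const cleaned = String(value).trim().toLowerCase().replace(/[^a-z0-9_.-]+/g, "_");
  return cleaned.slice(0, 64) || "unknown";
}

// Event -> canonicalization -> attribute sanitization -> emission.
function emitAgentMetric(event: RawAgentEvent, sink: MetricSink, value = 1): void {
  const attributes: Record<string, string> = {
    agent: toCanonicalAgent(event.agent),
    checkpoint: toCanonicalCheckpoint(event.checkpoint),
  };
  for (const key of EXTRA_KEYS) {
    const rawValue = event.attributes?.[key];
    if (rawValue !== undefined && rawValue !== null) {
      attributes[key] = cleanValue(rawValue);
    }
  }
  sink({ name: event.metric, value, attributes });
}
```

Because every emitter goes through one function, a dashboard query keyed on `agent` and `checkpoint` would see the same values regardless of which spelling the calling code used, and any attribute outside the bounded key set never reaches the backend.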