Calculator¶
The prompt_versioner.metrics.calculator module provides utilities for calculating various metrics and derived scores for LLM calls.
MetricsCalculator¶
Class for calculating additional metrics and enriching LLM call data.
Constructor¶
Parameters:
- pricing_manager (Optional[PricingManager]): Custom pricing manager; if omitted, a default PricingManager is created
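A minimal construction sketch. The MetricsCalculator import path is taken from the examples below; the PricingManager import path is an assumption (see the Pricing page linked at the end):
from prompt_versioner.metrics.calculator import MetricsCalculator

# Default: the calculator creates its own PricingManager
calculator = MetricsCalculator()

# With a custom pricing manager (import path assumed, not confirmed):
# from prompt_versioner.metrics.pricing import PricingManager
# calculator = MetricsCalculator(pricing_manager=PricingManager())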
Methods¶
calculate_cost()¶
Calculates the cost in EUR for a model call.
Parameters:
- model_name (str): Model name
- input_tokens (int): Number of input tokens
- output_tokens (int): Number of output tokens
Returns:
- float: Cost in EUR
Example:
from prompt_versioner.metrics.calculator import MetricsCalculator
calculator = MetricsCalculator()
# Calculate cost for GPT-4
cost = calculator.calculate_cost("gpt-4", input_tokens=100, output_tokens=50)
print(f"Cost: €{cost:.4f}")
# Calculate cost for different models
models_test = [
    ("gpt-3.5-turbo", 200, 100),
    ("gpt-4", 200, 100),
    ("claude-3-haiku", 200, 100),
]
for model, input_tokens, output_tokens in models_test:
    cost = calculator.calculate_cost(model, input_tokens, output_tokens)
    print(f"{model}: €{cost:.4f} per {input_tokens + output_tokens} tokens")
enrich_metrics()¶
Enriches metrics with automatically calculated values.
Features:
- Calculates total_tokens if missing
- Calculates cost_eur if missing
- Adds derived metrics to metadata:
- cost_per_token: Cost per token
- tokens_per_second: Throughput in tokens/second
- cost_per_second: Cost per second
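The derivations of these metadata fields are not spelled out above; a minimal sketch of the presumed arithmetic, assuming latency_ms is measured in milliseconds (illustrative values, not library code):
# Presumed derivations (an assumption, not confirmed by the source)
input_tokens, output_tokens = 150, 75   # illustrative values
cost_eur, latency_ms = 0.0105, 1200     # illustrative values

total_tokens = input_tokens + output_tokens                 # 225
cost_per_token = cost_eur / total_tokens                    # EUR per token
tokens_per_second = total_tokens / (latency_ms / 1000)      # 187.5 tokens/s
cost_per_second = cost_eur / (latency_ms / 1000)            # EUR per second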
Parameters:
- metrics (ModelMetrics): ModelMetrics object to enrich
Returns:
- ModelMetrics: The enriched object (the input is modified in place and returned)
Example:
from prompt_versioner.metrics.models import ModelMetrics
from prompt_versioner.metrics.calculator import MetricsCalculator
# Create base metrics
metrics = ModelMetrics(
    model_name="gpt-4",
    input_tokens=150,
    output_tokens=75,
    latency_ms=1200,
    quality_score=0.92,
    # cost_eur not specified - will be calculated
    # total_tokens not specified - will be calculated
)
calculator = MetricsCalculator()
enriched = calculator.enrich_metrics(metrics)
print(f"Total tokens: {enriched.total_tokens}") # 225
print(f"Cost: €{enriched.cost_eur:.4f}") # Automatically calculated
print(f"Cost per token: €{enriched.metadata['cost_per_token']:.6f}")
print(f"Tokens/sec: {enriched.metadata['tokens_per_second']:.1f}")
print(f"Cost/sec: €{enriched.metadata['cost_per_second']:.4f}")
calculate_efficiency_score()¶
Calculates an efficiency score: quality per unit cost, scaled to a 0-100 range.
Parameters:
- metrics (ModelMetrics): Object with quality_score and cost_eur
Returns:
- float: Efficiency score (0-100)
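The underlying formula is not shown above; as a rough illustration only, here is one plausible shape for a quality-per-cost score capped at 100. The scaling constant is an assumption, not the library's confirmed implementation:
def efficiency_sketch(quality_score: float, cost_eur: float) -> float:
    # Hypothetical reconstruction, not the library's confirmed formula:
    # quality per euro, scaled so typical per-call costs land in 0-100.
    if cost_eur <= 0:
        return 0.0
    return min(100.0, (quality_score / cost_eur) * 0.1)

# e.g. quality 0.85 at €0.002 -> min(100, 425 * 0.1) = 42.5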
Example:
# Compare efficiency of different versions
versions = [
    ModelMetrics(quality_score=0.85, cost_eur=0.002),  # Baseline
    ModelMetrics(quality_score=0.90, cost_eur=0.003),  # Higher quality, more expensive
    ModelMetrics(quality_score=0.82, cost_eur=0.001),  # Lower quality, cheaper
]
for i, metrics in enumerate(versions):
    efficiency = calculator.calculate_efficiency_score(metrics)
    print(f"Version {i+1}: Efficiency = {efficiency:.1f}")
    print(f"  Quality: {metrics.quality_score:.2%}")
    print(f"  Cost: €{metrics.cost_eur:.4f}")
calculate_value_score()¶
def calculate_value_score(
    self,
    metrics: ModelMetrics,
    quality_weight: float = 0.5,
    cost_weight: float = 0.3,
    latency_weight: float = 0.2,
) -> float
Calculates an overall value score combining quality, cost, and latency.
Formula:
value_score = (quality * quality_weight +
               cost_score * cost_weight +
               latency_score * latency_weight) / total_weight
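cost_score, latency_score, and total_weight are not defined above; a plausible reading, assuming lower cost and latency map to higher 0-1 sub-scores and total_weight is the sum of the three weights (an assumption, not the library's confirmed logic):
# Hypothetical normalization (an assumption, not confirmed by the source):
# lower cost and latency map to higher sub-scores on a 0-1 scale.
cost_eur, latency_ms, quality = 0.0025, 850, 0.88    # illustrative values
max_cost_eur, max_latency_ms = 0.01, 5000            # assumed reference ceilings

cost_score = max(0.0, 1.0 - cost_eur / max_cost_eur)         # 0.75
latency_score = max(0.0, 1.0 - latency_ms / max_latency_ms)  # 0.83
total_weight = 0.5 + 0.3 + 0.2                               # default weights
value_score = (quality * 0.5 + cost_score * 0.3 +
               latency_score * 0.2) / total_weight * 100     # 83.1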
Parameters:
- metrics (ModelMetrics): Object with metrics to evaluate
- quality_weight (float): Weight for quality (default: 0.5)
- cost_weight (float): Weight for cost (default: 0.3)
- latency_weight (float): Weight for latency (default: 0.2)
Returns:
- float: Value score (0-100)
Example:
# Multi-dimensional analysis
metrics = ModelMetrics(
    quality_score=0.88,
    cost_eur=0.0025,
    latency_ms=850,
)
# Different weighting profiles
profiles = {
    "balanced": {"quality_weight": 0.5, "cost_weight": 0.3, "latency_weight": 0.2},
    "quality_focused": {"quality_weight": 0.7, "cost_weight": 0.2, "latency_weight": 0.1},
    "cost_focused": {"quality_weight": 0.3, "cost_weight": 0.5, "latency_weight": 0.2},
    "speed_focused": {"quality_weight": 0.3, "cost_weight": 0.2, "latency_weight": 0.5},
}
for profile_name, weights in profiles.items():
    score = calculator.calculate_value_score(metrics, **weights)
    print(f"{profile_name.title()}: {score:.1f}/100")
See Also¶
- Aggregator - Functionality to aggregate metrics across multiple test runs
- Analyzer - Functionality for analyzing and comparing metrics between versions
- Models - Data models for metrics and comparison structures
- Pricing - Manages model pricing and calculates LLM call costs
- Tracker - Functionality for tracking and statistical analysis of metrics