Analyzing results
After your evaluation completes, Karini AI provides comprehensive analytics for reviewing performance and identifying improvement opportunities.
Results Dashboard
Overall Performance Summary
The dashboard provides a high-level view of your evaluation (see the sketch after this list):
Status: Current state of evaluation (completed, processing, failed)
Evaluation Date: When the evaluation was run
Dataset Size: Number of test cases evaluated
Prompt Version: Which version was evaluated
Evaluation Strategy: Which metrics were used (default, custom, or both)
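
If you export evaluation results for offline review, these summary fields map naturally onto a small record type. The sketch below is illustrative only: the field names and the shape of the exported record are assumptions, not Karini AI's actual schema.

```python
from dataclasses import dataclass

# Illustrative only: these field names and the record shape are assumptions,
# not Karini AI's actual export schema.
@dataclass
class EvaluationSummary:
    status: str               # "completed", "processing", or "failed"
    evaluation_date: str      # when the evaluation was run
    dataset_size: int         # number of test cases evaluated
    prompt_version: str       # which prompt version was evaluated
    evaluation_strategy: str  # "default", "custom", or "both"

def load_summary(record: dict) -> EvaluationSummary:
    """Build a summary object from one exported result record."""
    return EvaluationSummary(
        status=record["status"],
        evaluation_date=record["evaluation_date"],
        dataset_size=int(record["dataset_size"]),
        prompt_version=record["prompt_version"],
        evaluation_strategy=record["evaluation_strategy"],
    )

summary = load_summary({
    "status": "completed",
    "evaluation_date": "2024-05-01",
    "dataset_size": 250,
    "prompt_version": "v3",
    "evaluation_strategy": "default",
})
print(summary)
```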

Metric Performance Cards
Each metric is shown as a performance card, providing a quick snapshot of behavior across the dataset:
Average Score: Mean score for the metric
Score Distribution: Visual representation of the spread of scores (e.g., histogram)
Confidence Level: Average evaluator confidence for that metric
These cards help you quickly identify which dimensions (e.g., relevancy, faithfulness, tool accuracy) are strong or weak.
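
To make the card contents concrete, here is a minimal sketch of how one card's numbers could be computed from raw per-test-case results. It assumes scores and evaluator confidences are floats in [0, 1]; the function name and input shapes are hypothetical, not part of Karini AI's API.

```python
from collections import Counter

def metric_card(scores, confidences, bins=5):
    """Summarize one metric the way a performance card does: average score,
    a coarse score distribution, and average evaluator confidence.
    Assumes non-empty lists of floats in [0, 1]."""
    avg_score = sum(scores) / len(scores)
    avg_confidence = sum(confidences) / len(confidences)
    # Bucket scores into equal-width bins for a simple histogram.
    width = 1.0 / bins
    counts = Counter(min(int(s / width), bins - 1) for s in scores)
    distribution = {
        f"{i * width:.1f}-{(i + 1) * width:.1f}": counts.get(i, 0)
        for i in range(bins)
    }
    return {
        "average_score": round(avg_score, 3),
        "score_distribution": distribution,
        "confidence_level": round(avg_confidence, 3),
    }

# Example: a card for a hypothetical "faithfulness" metric over five test cases.
print(metric_card(
    scores=[0.92, 0.85, 0.40, 0.77, 0.88],
    confidences=[0.90, 0.80, 0.60, 0.85, 0.90],
))
```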

Statistical Analysis
Understand performance variability (see the worked example after this list):
Mean: Average score across dataset
Median: Middle value, showing central tendency
Standard Deviation: Consistency measure (lower = more consistent)
Min/Max Values: Range showing best and worst performance
Percentiles: Distribution of scores across the dataset
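
All of these statistics can be reproduced from exported per-case scores with Python's standard library. A minimal sketch follows; the score list is made up for illustration.

```python
import statistics

# Hypothetical per-test-case scores for one metric, in [0, 1].
scores = [0.95, 0.88, 0.91, 0.45, 0.83, 0.90, 0.72, 0.86, 0.94, 0.60]

mean = statistics.mean(scores)      # average score across the dataset
median = statistics.median(scores)  # central tendency, robust to outliers
stdev = statistics.stdev(scores)    # lower = more consistent
lo, hi = min(scores), max(scores)   # worst and best performance
# quantiles(n=100) returns the 1st..99th percentiles; index k-1 is the
# k-th percentile.
pct = statistics.quantiles(scores, n=100)
p25, p75, p90 = pct[24], pct[74], pct[89]

print(f"mean={mean:.3f}  median={median:.3f}  stdev={stdev:.3f}")
print(f"min={lo:.2f}  max={hi:.2f}  p25={p25:.3f}  p75={p75:.3f}  p90={p90:.3f}")
```

A large gap between mean and median, or a high standard deviation, usually signals a subset of test cases dragging the score down; use the percentiles and the min value to find them.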
