Analyzing results

After your evaluation completes, Karini AI provides comprehensive analytics to review performance and identify improvement opportunities.

Results Dashboard

Overall Performance Summary

The dashboard provides a high-level view of your evaluation; a sketch after this list shows one way to model these fields:

  • Status: Current state of the evaluation (completed, processing, or failed)

  • Evaluation Date: When the evaluation was run

  • Dataset Size: Number of test cases evaluated

  • Prompt Version: Which version was evaluated

  • Evaluation Strategy: Which metrics were used (default, custom, or both)
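
If you export or post-process evaluation results, these summary fields map naturally onto a small record type. The sketch below is illustrative only; the field names and types are assumptions, not the Karini AI API:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class EvaluationSummary:
    """Hypothetical shape of the dashboard's high-level summary."""
    status: str                # "completed", "processing", or "failed"
    evaluation_date: datetime  # when the evaluation was run
    dataset_size: int          # number of test cases evaluated
    prompt_version: str        # which prompt version was evaluated
    evaluation_strategy: str   # "default", "custom", or "both"
```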

Metric Performance Cards

Each metric is shown as a performance card, providing a quick snapshot of behavior across the dataset:

  • Average Score: Mean score for the metric

  • Score Distribution: Visual representation of the spread of scores (e.g., a histogram)

  • Confidence Level: Average evaluator confidence for that metric

These cards help you quickly identify which dimensions (e.g., relevancy, faithfulness, tool accuracy) are strong or weak.
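To make the three card values concrete, here is a minimal sketch of computing them from raw per-test-case results. It assumes scores and evaluator confidences are normalized to a 0-1 scale (an assumption; adapt the binning if your metrics use a different range):

```python
from collections import Counter

def metric_card(records):
    """Summarize one metric across the dataset: average score,
    a coarse score distribution, and average evaluator confidence.
    `records` is a list of (score, confidence) pairs on a 0-1 scale."""
    scores = [score for score, _ in records]
    confidences = [confidence for _, confidence in records]
    average_score = sum(scores) / len(scores)
    average_confidence = sum(confidences) / len(confidences)
    # Bucket scores into ten bins, mirroring the card's
    # score-distribution chart.
    histogram = Counter(min(int(score * 10), 9) for score in scores)
    return average_score, average_confidence, dict(sorted(histogram.items()))
```

For example, `metric_card([(0.9, 0.8), (0.7, 0.95), (0.85, 0.9)])` yields the average score, average confidence, and per-bin counts for three test cases.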

Statistical Analysis

Use summary statistics to understand performance variability (see the sketch after this list):

  • Mean: Average score across the dataset

  • Median: Middle value, showing central tendency

  • Standard Deviation: Consistency measure (lower = more consistent)

  • Min/Max Values: Range showing best and worst performance

  • Percentiles: Distribution of scores across the dataset
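
These values can be reproduced from the raw scores with Python's standard library. The sketch below is a minimal illustration, not Karini AI's implementation, and it assumes at least two scores per metric:

```python
import statistics

def score_statistics(scores):
    """Compute variability statistics for one metric's scores."""
    cut_points = statistics.quantiles(scores, n=100)  # 99 percentile cut points
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),  # lower = more consistent
        "min": min(scores),
        "max": max(scores),
        "p25": cut_points[24],
        "p75": cut_points[74],
        "p95": cut_points[94],
    }
```

For example, `score_statistics([0.6, 0.7, 0.8, 0.85, 0.9, 0.95])` returns the mean, median, spread, and the 25th/75th/95th percentile cut points for that metric.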
