Analyzing results

After your evaluation completes, Karini AI provides comprehensive analytics to review performance and identify improvement opportunities.

Results Dashboard

Overall Performance Summary

The dashboard provides a high-level view of your evaluation; a sketch after this list shows one way to model these fields:

  • Status: Current state of the evaluation (completed, processing, or failed)

  • Evaluation Date: When the evaluation was run

  • Dataset Size: Number of test cases evaluated

  • Prompt Version: Which version was evaluated

  • Evaluation Strategy: Which metrics were used (default, custom, or both)
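
If you export or post-process evaluation results, these summary fields map naturally onto a small record type. The sketch below is illustrative only; the field names and types are assumptions, not the Karini AI API:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class EvaluationSummary:
    """Hypothetical shape of the dashboard's high-level summary."""
    status: str                # "completed", "processing", or "failed"
    evaluation_date: datetime  # when the evaluation was run
    dataset_size: int          # number of test cases evaluated
    prompt_version: str        # which prompt version was evaluated
    evaluation_strategy: str   # "default", "custom", or "both"
```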

Metric Performance Cards

Each metric is shown as a performance card, providing a quick snapshot of behavior across the dataset:

  • Average Score: Mean score for the metric

  • Score Distribution: Visual representation of the spread of scores (e.g., a histogram)

  • Confidence Level: Average evaluator confidence for that metric

These cards help you quickly identify which dimensions (e.g., relevancy, faithfulness, tool accuracy) are strong or weak.
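To make the three card values concrete, here is a minimal sketch of computing them from raw per-test-case results. It assumes scores and evaluator confidences are normalized to a 0-1 scale (an assumption; adapt the binning if your metrics use a different range):

```python
from collections import Counter

def metric_card(records):
    """Summarize one metric across the dataset: average score,
    a coarse score distribution, and average evaluator confidence.
    `records` is a list of (score, confidence) pairs on a 0-1 scale."""
    scores = [score for score, _ in records]
    confidences = [confidence for _, confidence in records]
    average_score = sum(scores) / len(scores)
    average_confidence = sum(confidences) / len(confidences)
    # Bucket scores into ten bins, mirroring the card's
    # score-distribution chart.
    histogram = Counter(min(int(score * 10), 9) for score in scores)
    return average_score, average_confidence, dict(sorted(histogram.items()))
```

For example, `metric_card([(0.9, 0.8), (0.7, 0.95), (0.85, 0.9)])` yields the average score, average confidence, and per-bin counts for three test cases.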

Statistical Analysis

Use summary statistics to understand performance variability (see the sketch after this list):

  • Mean: Average score across the dataset

  • Median: Middle value, showing central tendency

  • Standard Deviation: Consistency measure (lower = more consistent)

  • Min/Max Values: Range showing best and worst performance

  • Percentiles: Distribution of scores across the dataset
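
These values can be reproduced from the raw scores with Python's standard library. The sketch below is a minimal illustration, not Karini AI's implementation, and it assumes at least two scores per metric:

```python
import statistics

def score_statistics(scores):
    """Compute variability statistics for one metric's scores."""
    cut_points = statistics.quantiles(scores, n=100)  # 99 percentile cut points
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),  # lower = more consistent
        "min": min(scores),
        "max": max(scores),
        "p25": cut_points[24],
        "p75": cut_points[74],
        "p95": cut_points[94],
    }
```

For example, `score_statistics([0.6, 0.7, 0.8, 0.85, 0.9, 0.95])` returns the mean, median, spread, and the 25th/75th/95th percentile cut points for that metric.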
