8 April 2026 – Stories from a Software Tester

AI and Testing: Recall, Relevancy, and Richer Evaluation

written by Jeff Nyman

In the previous posts we looked at the Faithfulness and Contextual Precision metrics with DeepEval, and started building an intuition for how retrieval failures cascade into generation failures. Those two metrics told us what went wrong and where in the pipeline. In this post, we’ll add three more tools to the diagnostic kit: Contextual Recall, Contextual Relevancy, and G-Eval.

