In the previous post, we set up a test experiment around DeepEval and used DeepEval’s evaluation function to establish a quality baseline. That post ended with the need to run experiments against that baseline, and that’s what we’ll do in this post.