AI and Testing: Improving Retrieval Quality, Part 1

In the previous post on Contextual Precision, we diagnosed a critical problem in our RAG system: poor retrieval quality was causing failures that we also observed in the Faithfulness post. In this first of three related posts, we’re going to dig into that problem. This will be our first extended example of what testing a generative AI really looks like.

AI and Testing: Evaluation and DeepEval

In previous posts in this series, I’ve largely been talking about how to use local LLMs by writing scripts and, along the way, I’ve been able to shoehorn in some testing ideas. We even wrote a bespoke test script together. In this post, I’m going to focus more specifically on testing by considering the idea of evaluation.

AI and Testing: Personal Marketability

In the posts in this series, I’ve been taking you through a lot of concepts and tooling. That’s going to continue but, for this post, it felt prudent to take a little break and talk about why doing all this can matter. That gets into interviewing and potentially being hired.

AI and Testing: Evaluating the Future

As our technocracy continues to grow and as (at least some) technologists continue to push us toward a potentially dehumanized and dehumanizing future, I want to focus on how we can work from within this technocracy to make sure that human experimentation is front and center.

Navigating the AI Shift: A Tester’s Mandate

It’s very clear that artificial intelligence has become more democratized than at any other time in history. It’s also fairly clear that this democratization will not only continue but likely accelerate. What is the mandate for quality and test specialists in this context?

AI-Powered Testing: Exploring and Exploiting with Reinforcement

There’s a lot of talk out there about using large language models to help testers write tests, such as coming up with scenarios. There’s also talk about AI-based tools actually doing the testing. Writing tests and executing tests are both forms of performing testing. So let’s talk about what this means in a human and an AI context.

Text Trek: Navigating Classifications, Part 6

In this final post of the series, we’ll look at training our learning model on our Emotions dataset. This post is the culmination of everything we learned in the first three posts and then implemented in the previous two. So let’s dig in for the final stretch!

Text Trek: Navigating Classifications, Part 5

This post and the next will bring together everything we’ve learned in the previous four posts in this text classification series. Here we’re going to use the Emotions dataset we looked at in the last post and feed it to a model.
