AI and Testing: LangChain Messages

In the previous post, we got familiar with LangChain templates and dipped our toes into messages. In this post, I’m going to focus a bit more on those messages since these are the key to communicating with AI.

The Basis of Messages

LangChain treats interaction with a language model not as “sending a string,” but as participating in a conversation with structure. At the heart of this idea is the Messages subsystem. Instead of collapsing everything into raw text, LangChain represents each part of an interaction as a message with a role: human, system, AI, or tool. This mirrors how modern chat-based models actually reason: they don’t just read text; they interpret who is speaking, in what order, and with what authority. By making those distinctions explicit in code, LangChain preserves conversational intent instead of burying it inside prompt text.

This design creates a clean separation between what is said and how what is said participates in the conversation. A prompt template formats content, but a message defines meaning within an interaction. So, for example, a “HumanMessage” is not just text: it’s a user utterance. A “SystemMessage” is not just instructions: it’s context that shapes all downstream reasoning. The ChatPromptTemplate then acts as an assembler, composing these role-aware messages into a coherent conversational starting state.

The result is a system that models dialogue the way the model itself experiences it: as a sequence of role-bound messages, not as a single monolithic prompt. This makes conversations more explicit, more testable, and easier to extend as interactions grow beyond simple question-and-answer exchanges.
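To make that contrast concrete, here’s a small sketch using only the Python standard library. This is not LangChain code: the `Message` class and the `flatten` helper are hypothetical stand-ins I’m using to show what gets lost when role-bound messages are collapsed into a single string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a role-aware message (not a LangChain class)."""
    role: str      # "system", "human", "ai", or "tool"
    content: str


conversation = [
    Message("system", "You explain relativity to non-specialists."),
    Message("human", "What is time dilation?"),
    Message("ai", "Moving clocks tick more slowly."),
    Message("human", "Why?"),
]


def flatten(messages):
    """Collapse the conversation into one monolithic prompt string."""
    return "\n".join(m.content for m in messages)


# With structure intact, "who said what" is a simple query.
human_turns = [m.content for m in conversation if m.role == "human"]
print(human_turns)

# Once flattened, the roles are gone and only the text remains.
print(flatten(conversation))
```

The structured form can always be flattened, but the flattened form can’t be reliably turned back into roles. That one-way loss is what the message types guard against.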

Holding Our Messages

Let’s consider a script that uses a particular aspect of messages directly.

from langchain_ollama import ChatOllama
from langchain_core.prompts import (
  ChatPromptTemplate,
  MessagesPlaceholder
)
from langchain_core.messages import HumanMessage

MODEL = "qwen3:latest"

model = ChatOllama(
  model = MODEL,
  base_url = "http://localhost:11434",
)

systemPrompt = """
  You are an expert in explaining Einstein's relativity to
  non-specialist audiences.
  """

prompt_template = ChatPromptTemplate([
  ("system", systemPrompt),
  MessagesPlaceholder("message")
])

prompt = prompt_template.invoke(
  {
    "message": [
      HumanMessage("How would you explain time dilation?")
    ]
  }
)

response = model.invoke(prompt).content

print(response)

This script introduces MessagesPlaceholder, which creates a named slot in your template where you can insert messages dynamically. The key connection is that the string you pass to MessagesPlaceholder must match the key in your invoke call. When we write:

MessagesPlaceholder("message")

We’re saying: “Hey, there’s a placeholder here called message.” Then when we invoke the template:

prompt_template.invoke({"message": [HumanMessage("...")]})

We’re saying: “Fill that message placeholder with this following list of messages.” The name “message” is arbitrary. I could have called it “conversation” or “user_input” or anything else, as long as it matches on both sides. It’s just a label connecting the placeholder to the data you provide.

Notice something important: we’re not passing a simple string to MessagesPlaceholder. We’re not doing this:

prompt_template.invoke({"message": "How would you explain time dilation?"})

Why? Because MessagesPlaceholder expects a list of message objects, not a plain string. It’s designed to handle conversation structure, so it needs to know the role of each message: human, AI, or system. A plain string doesn’t carry that information.
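Here’s a simplified model of those two rules: the name has to match, and the value has to be a list. To be clear, this is not how LangChain implements it. The `Slot` class, the `fill` function, and the exception types are all hypothetical choices for this sketch, and I’m using plain (role, text) tuples in place of real message objects.

```python
class Slot:
    """Hypothetical stand-in for a named placeholder (not LangChain's class)."""

    def __init__(self, name):
        self.name = name


def fill(template, values):
    """Expand each Slot with the list stored under its name in `values`."""
    result = []
    for part in template:
        if isinstance(part, Slot):
            supplied = values[part.name]  # KeyError if the names differ
            if not isinstance(supplied, list):
                raise TypeError(
                    f"slot '{part.name}' needs a list of messages, "
                    f"got {type(supplied).__name__}"
                )
            result.extend(supplied)
        else:
            result.append(part)
    return result


template = [("system", "You explain relativity."), Slot("message")]

# Matching name and a list of role-tagged messages: this works.
filled = fill(template, {"message": [("human", "Explain time dilation.")]})
print(filled)

# Mismatched name: the slot cannot find its data.
try:
    fill(template, {"conversation": [("human", "Explain time dilation.")]})
except KeyError as err:
    print("missing key:", err)

# A plain string instead of a list: rejected, since it carries no role.
try:
    fill(template, {"message": "Explain time dilation."})
except TypeError as err:
    print("wrong type:", err)
```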

The Flexibility of Messages

This code I just showed you might seem like extra work for a single message (and it is!), but it makes sense when you realize MessagesPlaceholder is built for flexibility. You might pass in one message, or ten messages of conversation history. Either way, those are handled the same because the MessagesPlaceholder always expects a list of message objects.

So why might this approach be used? One is the ability to actually maintain a conversation history. The idea is you can pass in a list of back-and-forth messages from an ongoing chat session.

A second use would be working with stored messages. You might want to use message objects you’ve retrieved from a database, log files, or just from a previous chat session. Yet a third use would be dynamic content injection. You might add variable numbers of examples, context, or what are called few-shot demonstrations based on runtime conditions.

Each one of those highlights a different aspect of why MessagesPlaceholder exists:

temporal dimension (ongoing conversations)
persistence/retrieval (data from elsewhere)
programmatic flexibility (runtime decisions)

Going with my above examples, the most common use case is maintaining conversation history. Imagine you’re building a chatbot that needs to remember previous exchanges. Let me show you a rough example of what this would look like (you don’t have to type this in):

# Reuses prompt_template from the script above
from langchain_core.messages import AIMessage, HumanMessage

# Messages from an ongoing conversation
history = [
  HumanMessage("What is relativity?"),
  AIMessage("Relativity describes how space and time..."),
  HumanMessage("Can you explain time dilation?"),
  AIMessage("Time dilation occurs when...")
]

# Now add a new question while keeping the history
prompt_template.invoke({
  "message": history + [HumanMessage("How does this affect GPS satellites?")]
})

This is exactly what MessagesPlaceholder was primarily designed for: flexibly handling conversation context of any length.

Let me also explore that idea of “few-shot” prompting that I mentioned. This is where you include example interactions to guide the model’s behavior. Again, here’s some general code to show the idea.

# Reuses prompt_template from the script above
from langchain_core.messages import AIMessage, HumanMessage

# Example interactions used to guide the model
examples = [
  HumanMessage("What is kinetic energy?"),
  AIMessage("Kinetic energy is the energy an object has due to its motion."),
  HumanMessage("What is potential energy?"),
  AIMessage("Potential energy is stored energy based on position or state.")
]

# The actual question comes after the examples
prompt_template.invoke({
  "message": examples + [HumanMessage("What is thermal energy?")]
})

With this example, the earlier exchanges aren’t conversation history at all. They’re demonstrations. The model infers the pattern from the examples and applies it to the new question.

Notice something important here. As written, the two examples above are structurally identical. The difference lives entirely in intent, which is invisible unless you highlight it. Keep that in mind if you treat these examples as test cases, which they effectively are.

In this context, “few-shot” refers to providing the model with a small number of examples that demonstrate the kind of input–output behavior you want before asking it to perform a new task. Rather than training the model or giving lengthy instructions, you show it a handful of representative interactions, often framed as prior messages in the conversation. The model then infers the pattern from those examples and applies it to the new request.

“Few-shot” sits between “zero-shot” (no examples, just a question) and “fine-tuning” (changing the model itself), offering a lightweight way to guide behavior using concrete demonstrations.
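The zero-shot versus few-shot distinction can be sketched as a single helper where the only difference is whether any examples are supplied. The `build_messages` function is a hypothetical name of my own, and the (role, text) tuples are a plain-Python stand-in for message objects.

```python
def build_messages(question, examples=()):
    """Return role-tagged messages: demonstrations first, then the question.

    `examples` is a sequence of (input, output) pairs. Leaving it empty
    gives the zero-shot case.
    """
    messages = []
    for example_input, example_output in examples:
        messages.append(("human", example_input))
        messages.append(("ai", example_output))
    messages.append(("human", question))
    return messages


zero_shot = build_messages("What is thermal energy?")
few_shot = build_messages(
    "What is thermal energy?",
    examples=[
        ("What is kinetic energy?",
         "Kinetic energy is the energy an object has due to its motion."),
        ("What is potential energy?",
         "Potential energy is stored energy based on position or state."),
    ],
)

print(len(zero_shot))
print([role for role, _ in few_shot])
```

In both cases the final message is the same question. Only the demonstrations in front of it change, and the model itself is untouched, which is what separates this from fine-tuning.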

You can simplify our earlier script with one small change:

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import HumanMessage

MODEL = "qwen3:latest"

model = ChatOllama(
  model = MODEL,
  base_url = "http://localhost:11434",
)

systemPrompt = """
  You are an expert in explaining Einstein's relativity to
  non-specialist audiences.
  """

prompt_template = ChatPromptTemplate([
  ("system", systemPrompt),
  ("placeholder", "{message}")
])

prompt = prompt_template.invoke(
  {
    "message": [
      HumanMessage("How would you explain time dilation?")
    ]
  }
)

response = model.invoke(prompt).content

print(response)

Here “placeholder” is yet another element you can add as part of the messages sequence. The placeholder is saying: “Reserve this spot in the message sequence, and I’ll fill it with whatever gets passed in under this variable name later.”

Augmented Example: Conversation History

Let’s consider a slightly more augmented example that uses the history idea I talked about above.

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import HumanMessage, AIMessage

MODEL = "qwen3:latest"

model = ChatOllama(
  model = MODEL,
  base_url = "http://localhost:11434",
)

systemPrompt = """
  You are an expert in explaining Einstein's relativity to
  non-specialist audiences.
  """

prompt_template = ChatPromptTemplate([
  ("system", systemPrompt),
  ("placeholder", "{conversation_history}"),
  ("human", "{current_question}")
])

prompt1 = prompt_template.invoke({
  "conversation_history": [],
  "current_question": "How would you explain time dilation?"
})

response1 = model.invoke(prompt1).content

print("FIRST RESPONSE:")
print(response1)

prompt2 = prompt_template.invoke({
  "conversation_history": [
    HumanMessage("How would you explain time dilation?"),
    AIMessage(response1)
  ],
  "current_question": "Can you give me a concrete example involving twins?"
})

response2 = model.invoke(prompt2).content

print("\nSECOND RESPONSE:")
print(response2)

What I hope you’ve been seeing through these posts is that the prompt template defines the “shape” of every conversation. What’s changing, as we move along in these posts, is that we can actually better see a shape. In this case, prompt1 is the first interaction with the model and thus there’s no history yet. With prompt2, however, we have the second interaction. Now we have history.
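One way to see that “shape” is to look only at the sequence of roles each prompt produces. The sketch below doesn’t call LangChain or a model. The `render` function is a hypothetical stand-in for the template, and the AI text is a placeholder string, not real model output.

```python
def render(history, current_question, system_text="You explain relativity."):
    """Hypothetical stand-in for the template: system, history, then question."""
    return [("system", system_text), *history, ("human", current_question)]


def roles(messages):
    """Reduce a list of (role, text) messages to just its roles."""
    return [role for role, _ in messages]


first = render([], "How would you explain time dilation?")
second = render(
    [
        ("human", "How would you explain time dilation?"),
        ("ai", "(the model's first answer would go here)"),
    ],
    "Can you give me a concrete example involving twins?",
)

print(roles(first))
print(roles(second))
```

The template is the same both times. An empty history simply contributes nothing, while a populated one slots in between the system message and the current question.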

Notice how I’m building the conversation history for the second prompt. I’m creating message objects: HumanMessage for what the user asked, and AIMessage for what the AI responded.

Needing these message wrapper objects is a key part of what I’m trying to show in this post. This goes back to what I’ve been reinforcing a lot: LangChain uses message types to maintain conversation structure. The template needs to know who said what: was this the human speaking or the AI? That’s what the message types communicate.

With the above code, consider that we did this for the first prompt:

model.invoke(prompt1).content

We’re extracting just the text string from the model’s response. However, for conversation history, we need to put that text back into a message object with the proper role label. Thus, we do this:

AIMessage(response1)

This says: “Here’s some text, and it came from the AI.” Without these role labels, the model wouldn’t understand the conversational structure. Meaning, it wouldn’t know which messages were questions versus answers.

Wait, Something is Fishy Here …

In this augmented example, I’m manually constructing the conversation history after the fact. The first exchange happens, then I’m building a simulated “memory” of that exchange for the second prompt.

It looks like I’m faking it, right? After all, aren’t I just creating a fictional history based on a fictional conversation? Yes, I am, but here’s what’s actually going on: I’m showing the mechanics of how conversation state gets maintained, step by step, so you can see what needs to happen under the hood.

In a production application, you wouldn’t manually construct history like this, of course. Instead, you would have a loop that automatically builds the history as the conversation unfolds from your actual system. Something like this:

# Reuses prompt_template, model, and the message imports from the script above
conversation_history = []

while True:
  user_input = input("You: ")
  if user_input.lower() == "quit":
    break

  # Build the prompt with current history
  prompt = prompt_template.invoke({
    "conversation_history": conversation_history,
    "current_question": user_input
  })

  # Get response
  response = model.invoke(prompt)
  print(f"AI: {response.content}")

  # Update history for next turn
  conversation_history.append(HumanMessage(user_input))
  conversation_history.append(AIMessage(response.content))

I’m showing how all this gets built because when you’re learning (or teaching) how this works, breaking it into discrete steps helps you see a few things:

What data structure conversation history actually is (a list of message objects)
How you extract and repackage model responses
Why the template pattern matters when state needs to persist

Once you understand the mechanics, wrapping it in a loop becomes obvious. If I had started with the loop, the question from a tester would likely be: “Wait, what’s actually in that conversation_history list, and how did it get there?”
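If you want to answer that tester’s question without a live model, one option is to pull a single turn out of the loop and hand it a fake model. This is a sketch under my own assumptions: `run_turn` and `stub_model` are hypothetical names, and (role, text) tuples stand in for message objects.

```python
def run_turn(history, user_input, respond):
    """Run one conversational turn and return (reply, updated_history).

    `respond` is any callable that takes a list of role-tagged messages
    and returns reply text. In the real script it would wrap the
    template and model calls.
    """
    messages = [*history, ("human", user_input)]
    reply = respond(messages)
    # Repackage both sides of the exchange with their role labels.
    updated = [*messages, ("ai", reply)]
    return reply, updated


def stub_model(messages):
    """Hypothetical fake model: reports how many messages it was given."""
    return f"I received {len(messages)} message(s)."


history = []
for question in ["What is relativity?", "Can you explain time dilation?"]:
    reply, history = run_turn(history, question, stub_model)
    print("AI:", reply)

print(len(history))
```

Because the stub’s reply depends on how many messages it saw, you can check that the second turn really did receive the first exchange as history.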

Test Fixtures in AI

Here’s where the “testing an AI” part comes into focus: when you’re evaluating model behavior, you often do need to construct specific conversational scenarios to test edge cases, consistency across turns, or how well context is maintained.

In this test-focused situation, manually building conversation histories isn’t “fake.” It’s test fixture construction! You’re setting up known states to validate behavior, just like you would seed a database with test data before running integration tests.
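As a rough illustration of that fixture idea, here’s a sketch with a helper that seeds a known conversational state and a check that the state is well formed. The names `make_history`, `roles_alternate`, and the test functions are hypothetical, and the checks are plain assertions, not tied to any particular test framework.

```python
def make_history(*exchanges):
    """Build a role-tagged history from (question, answer) pairs."""
    history = []
    for question, answer in exchanges:
        history.append(("human", question))
        history.append(("ai", answer))
    return history


def roles_alternate(history):
    """True when the roles go human, ai, human, ai, ... from the start."""
    expected = ["human", "ai"]
    return all(role == expected[i % 2] for i, (role, _) in enumerate(history))


def seeded_history():
    """A known state, like seeding a database before an integration test."""
    return make_history(
        ("What is relativity?", "Relativity describes how space and time..."),
        ("Can you explain time dilation?", "Time dilation occurs when..."),
    )


def test_fixture_is_well_formed():
    history = seeded_history()
    assert len(history) == 4
    assert roles_alternate(history)


def test_malformed_history_is_detected():
    broken = seeded_history() + [("ai", "An answer nobody asked for.")]
    assert not roles_alternate(broken)


test_fixture_is_well_formed()
test_malformed_history_is_detected()
print("fixture checks passed")
```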

I hope you can see that the “fictional” history example isn’t wasted effort: it’s teaching both the pattern for real conversation and the foundation for systematic testing.

That last point is a key thing for most of these posts. It may seem like I’m not considering any testing at all. In fact, the test thinking is inherent in the understanding and construction of the logic that has to be tested in the first place.

Exploring Patterns

So far, we’ve been working with LangChain components in a pretty manual way: create a prompt template, format it, pass it to the model, get a response. But notice the pattern here. We’re essentially creating a pipeline.

  prompt → model → extract content

LangChain provides a powerful abstraction called Runnables (which I mentioned in the previous post) that lets you chain these components together a bit more elegantly. Instead of manually passing outputs from one step to the next, you can compose them into a single executable pipeline.

Here’s the key insight for what we’ve been doing: both ChatPromptTemplate and ChatOllama are Runnables. This means they share a common interface. In fact, we’ve used one part of that interface regularly: the .invoke() method. What’s important to understand is that all these things can be chained together, which really gets into why the tool is called LangChain!
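To show why a shared interface makes chaining possible, here’s a toy version of the idea. This is not LangChain’s implementation: the `Step` class is a hypothetical stand-in for a Runnable, and the three stages are fake functions that only mimic the prompt, model, and extraction steps. I’m using the `|` operator for composition because it mirrors the pipe syntax LangChain uses.

```python
class Step:
    """Toy stand-in for a Runnable: wraps a function behind .invoke()."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Compose two steps: the output of self feeds the input of other.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages mirroring: prompt -> model -> extract content
prompt = Step(lambda question: [("system", "You explain relativity."),
                                ("human", question)])
model = Step(lambda messages: {"content": f"Answering: {messages[-1][1]}"})
extract = Step(lambda response: response["content"])

pipeline = prompt | model | extract
print(pipeline.invoke("How would you explain time dilation?"))
```

Since every piece exposes the same `.invoke()` method, the composed pipeline is itself just another step, which is what lets chains grow without changing how you call them.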

Next Steps!

What I just described starts to take us toward the concept of orchestrating behavior, and that’s what I’ll dig into in the next post.

Stories from a Software Tester

Twice upon a time, in another space, no distance in any direction from here …
