What happens if you give two AIs opposing beliefs and make them argue on a given topic? This is exactly the question that became the starting point for Agent Chat — an experiment where two agents with different personalities engage in a real-time dialogue, backing their arguments with real facts from Wikipedia, Tavily, NewsAPI, ArXiv, and Alpha Vantage.
⚠️ Important about the architecture: Agent Chat is an experimental project, intentionally kept simple to quickly test how agents with different personalities behave. Polling instead of WebSocket, synchronous DB reads in a loop, no message queues: these are deliberate compromises for the sake of easy launch and readable code. A production multi-agent system would need a more robust architecture.
💡 Recommendation: run it locally with Ollama — it's completely free.
No API keys, no costs. qwen3:8b is enough to see a live dialogue
of agents with real facts from Wikipedia and ArXiv.
GitHub: github.com/VadimKharovyuk/Agent_Chat — MIT license, full code, README with launch instructions.
Contents
- The Idea and How It Works in Action
- Stack: Spring Boot 4, Spring AI 2.0, Ollama, and OpenRouter
- Architecture in 5 Minutes — Entity, Layers, Flow from Request to Dialogue
- AiProviderConfig: How to Switch Ollama and OpenRouter via @Profile
- AgentConversationService — Service Layer: What's Here and Why Dialogue Logic Isn't Here
- generateTopic() — How an Agent Invents a Topic from Real News
- AgentConversationRunner — The Heart of the Project: @Async Loop, Stop Phrases, HISTORY_SIZE
- ask() — How Context is Built and Why Role Order Matters
- Five Tools: Wikipedia, Tavily, NewsAPI, Alpha Vantage, ArXiv
- How to Write a Prompt So Agents Actually Argue
- Deployment: Ollama Locally and Railway in Production
- Conclusions + GitHub
The Idea and How It Works in Action
The classic problem with AI agents is that they are too polite. Ask GPT to argue, and it will agree within two messages. Agent Chat solves this through system prompts with strict prohibitions and clear beliefs for each agent.
The flow looks like this:
Topic → Agent A Responds → Agent B Responds → Agent A ...
The user specifies:
- Topic — for example, "Should AI be regulated by the state?"
- System Prompt for Agent A — role, beliefs, prohibitions, format
- System Prompt for Agent B — opposing position
- Number of Rounds — 1 round = 2 messages (A + B)
Agents conduct the dialogue automatically. Each message is stored in PostgreSQL. The conversation can be stopped at any moment. An agent can also end the dialogue independently — if its response contains a stop phrase like "goodbye" or "farewell."
The most interesting part: agents can access real sources to back up their arguments — Wikipedia, current news, scientific articles, stock prices. This makes the dialogue more lively and less prone to hallucinations.
If you already have other models installed in Ollama — you can try them. But from experience: not all models follow system prompt instructions well. Some ignore prohibitions and start agreeing with the opponent after 2-3 rounds, others don't call tools at all. For more details on which models actually work on local hardware — read the article about Ollama on 8GB RAM.
Stack: Spring Boot 4, Spring AI 2.0, Ollama, and OpenRouter
| Component | Technology | Purpose |
|-----------|------------|---------|
| Backend | Java 21, Spring Boot 4.0.6 | Main framework |
| AI Framework | Spring AI 2.0.0-M5 | Abstraction over LLM providers |
| LLM (local) | Ollama (qwen3:8b / llama3.1:8b) | Local development without costs |
| LLM (prod) | OpenRouter (deepseek/deepseek-chat) | Cloud provider for Railway |
| Database | PostgreSQL | Storing conversations and messages |
| Frontend | Thymeleaf + Bootstrap 5 | Simple UI without a separate SPA |
| Tools | Wikipedia, Tavily, NewsAPI, Alpha Vantage, ArXiv | Real facts for arguments |
The key stack decision is Spring AI as an abstraction layer. All code in Runner and Service works with the ChatModel interface — it doesn't know whether it's Ollama or OpenRouter under the hood. Switching happens exclusively through Spring Profiles.
Architecture in 5 Minutes — Entity, Layers, Flow from Request to Dialogue
The project structure is classic for Spring Boot — a clear division of responsibility between layers:
src/main/java/com/example/agent_chat/
├── config/
│ └── AiProviderConfig.java # Ollama / OpenRouter providers
├── controller/
│ ├── HomeController.java
│ └── AgentConversationController.java
├── entity/
│ ├── AgentConversation.java # Conversation: topic, prompts, status
│ ├── AgentMessage.java # Message: sender, content, round
│ ├── AgentSender.java # Enum: AGENT_A / AGENT_B
│ └── ConversationStatus.java # Enum: RUNNING / STOPPED / FINISHED
├── repository/
│ ├── AgentConversationRepository.java
│ └── AgentMessageRepository.java
├── service/
│ ├── AgentConversationService.java # Thin service layer
│ ├── AgentConversationRunner.java # @Async dialogue loop
│ └── WikipediaSearchTool.java # + other tools
└── dto/
├── StartConversationRequest.java
├── ConversationResponse.java
└── ExperimentMapper.java
Flow from request to the first agent message:
HTTP POST /start
↓
AgentConversationController
↓
AgentConversationService.start()
↓ saves AgentConversation to DB with RUNNING status
↓
AgentConversationRunner.run() ← @Async (separate thread)
↓
[rounds loop]
↓
ask() → Spring AI → Ollama / OpenRouter → response
↓
saveMessage() → AgentMessage to DB
↓
HTTP GET /conversation/{id} ← frontend polling
Important: The controller returns the conversationId immediately, without waiting for the agents to finish the dialogue. The frontend polls the conversation status itself. This is a standard pattern for @Async operations.
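Outside of Spring, the same "start async, return an ID, poll a status store" pattern can be sketched in plain Java. Everything below, class and method names included, is illustrative and not project code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the pattern: the caller gets an ID immediately,
// the long job runs in a background thread, the client polls a status store.
public class AsyncStartPollDemo {
    enum Status { RUNNING, FINISHED }

    static final Map<Long, Status> store = new ConcurrentHashMap<>();
    static final AtomicLong ids = new AtomicLong();

    // Analogue of POST /start: schedule the long job, return the ID at once
    static long start() {
        long id = ids.incrementAndGet();
        store.put(id, Status.RUNNING);
        Thread worker = new Thread(() -> {
            // ... the long "dialogue" would run here ...
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            store.put(id, Status.FINISHED);
        });
        worker.start();
        return id;
    }

    // Analogue of GET /conversation/{id}: the frontend polls this
    static Status poll(long id) { return store.get(id); }

    public static void main(String[] args) throws InterruptedException {
        long id = start(); // returns immediately, work continues in background
        while (poll(id) != Status.FINISHED) {
            Thread.sleep(50); // frontend-style polling
        }
        System.out.println("conversation " + id + " finished");
    }
}
```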
AiProviderConfig: How to Switch Ollama and OpenRouter via @Profile
One of the most elegant solutions in the project is configuring providers via Spring Profiles. All Runner and Service code depends on the ChatModel interface — the specific implementation is injected via DI depending on the active profile:
@Configuration
public class AiProviderConfig {
// ── LOCAL: Ollama ─────────────────────────────────────────────
@Configuration
@Profile("local")
static class OllamaConfig {
@Bean
@Primary
public ChatModel primaryChatModel(OllamaChatModel ollamaChatModel) {
return ollamaChatModel;
}
@Bean("agentChatModel")
public ChatModel agentChatModel(OllamaChatModel ollamaChatModel) {
return ollamaChatModel;
}
}
// ── PROD: OpenRouter ──────────────────────────────────────────
@Configuration
@Profile("openai")
static class OpenAiConfig {
@Bean
@Primary
public ChatModel primaryChatModel(OpenAiChatModel openAiChatModel) {
return openAiChatModel;
}
@Bean("agentChatModel")
public ChatModel agentChatModel(OpenAiChatModel openAiChatModel) {
return openAiChatModel;
}
}
}
Note the two beans: primaryChatModel and agentChatModel. This is not duplication — these are two different roles:
- primaryChatModel — used in AgentConversationService.generateTopic() to generate a topic. This is a quick request without tools.
- agentChatModel — used in AgentConversationRunner for the main dialogue. This is the bean injected via @Qualifier("agentChatModel").
Switching between profiles:
# Locally — application-local.properties
spring.ai.ollama.chat.model=qwen3:8b
# Production — environment variable
SPRING_PROFILES_ACTIVE=openai
OPENAI_API_KEY=your_openrouter_key
Note the @ConditionalOnProperty(name = "app.agent.experiment.enabled", havingValue = "true") on the Runner and Service. This means that all agent functionality is disabled by default and is enabled explicitly. This is useful when you want to deploy the application without agents or add new features gradually.
AgentConversationService — Service Layer: What's Here and Why Dialogue Logic Isn't Here
The service layer in this project is intentionally thin. It does not contain dialogue logic — all of it is in the Runner. The responsibilities of AgentConversationService:
- Create AgentConversation in the DB and delegate the start to the Runner
- Stop the conversation by changing the status
- Provide CRUD for reading conversations
- Generate a topic via generateTopic()
@Service
@ConditionalOnProperty(name = "app.agent.experiment.enabled",
havingValue = "true", matchIfMissing = false)
public class AgentConversationService {
private final AgentConversationRepository conversationRepository;
private final AgentMessageRepository messageRepository;
private final AgentConversationRunner runner;
private final ChatModel primaryChatModel;
private final NewsApiSearchTool newsApiSearchTool;
public Long start(StartConversationRequest request) {
// Save conversation to DB
AgentConversation conversation = new AgentConversation();
conversation.setTopic(request.topic());
conversation.setSystemPromptA(request.systemPromptA());
conversation.setSystemPromptB(request.systemPromptB());
conversation.setTotalRounds(0);
conversationRepository.save(conversation);
int maxRounds = request.maxRounds() > 0 ? request.maxRounds() : 100;
// Delegate to Runner — it will go to an @Async thread
runner.run(
conversation.getId(),
request.systemPromptA(),
request.systemPromptB(),
request.topic(),
maxRounds
);
// Return ID immediately — don't wait for completion
return conversation.getId();
}
public void stop(Long conversationId) {
AgentConversation conversation = conversationRepository
.findById(conversationId)
.orElseThrow(() -> new IllegalArgumentException("Not found: " + conversationId));
conversation.setStatus(ConversationStatus.STOPPED);
conversation.setFinishedAt(LocalDateTime.now());
conversationRepository.save(conversation);
}
@Transactional
public void deleteById(Long id) {
messageRepository.deleteByConversationId(id);
conversationRepository.deleteById(id);
}
}
Note the stop() method: it simply changes the status in the DB to STOPPED. It does not stop the thread directly — the Runner checks the status at the beginning of each round and between A and B. This is a safer approach than interrupting the thread.
Why is the loop logic not in Service? Single Responsibility Principle. Service manages the conversation state in the DB and provides an API for the controller. Runner executes the dialogue itself. If tomorrow you need to add WebSockets instead of polling or change the agent alternation logic — these changes will only affect the Runner, not the Service.
generateTopic() — How an Agent Invents a Topic from Real News
A separate interesting feature: if the user doesn't want to invent a topic themselves — the system generates it automatically from real news. Here's the complete method:
public String generateTopic() {
List<String> queries = List.of(
"technology AI society",
"economy inflation future",
"climate environment crisis",
"politics democracy freedom",
"science space exploration",
"healthcare medicine future",
"education technology students",
"cryptocurrency bitcoin finance"
);
// Select a random category
String randomQuery = queries.get(
(int) (Math.random() * queries.size())
);
// Get fresh news for the category
String news = newsApiSearchTool.searchNews(randomQuery);
// Ask the LLM to formulate a provocative topic
String prompt = """
Based on this news, come up with one provocative topic
for a philosophical discussion.
The topic should be controversial — so that two agents with opposing
views can argue.
Respond ONLY with the topic — one sentence, no explanations, no quotes.
News: %s
""".formatted(news);
return primaryChatModel.call(prompt).trim();
}
Three steps: random category → NewsAPI → LLM formulates the topic. If something goes wrong (NewsAPI is unavailable, LLM doesn't respond) — fallback to a default topic: "Will artificial intelligence change the future of humanity?".
Important detail: primaryChatModel is used here, not agentChatModel. Generating a topic doesn't require tools and complex context — it's a simple text-in text-out request. The separation of beans is justified.
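The fallback itself is not shown inside generateTopic() above; one plausible way to wire it is a thin try/catch wrapper. The wrapper below is a sketch of my own, only the default topic string comes from the article:

```java
import java.util.function.Supplier;

// Sketch of the fallback described above: if topic generation fails
// (NewsAPI down, LLM not responding), return the default topic instead.
// The wrapper is illustrative, not the repository code.
public class TopicFallback {
    static final String DEFAULT_TOPIC =
            "Will artificial intelligence change the future of humanity?";

    static String generateTopicSafely(Supplier<String> generator) {
        try {
            String topic = generator.get();
            // Also guard against an empty LLM response
            return (topic == null || topic.isBlank()) ? DEFAULT_TOPIC : topic.trim();
        } catch (Exception e) {
            // NewsAPI unavailable, LLM timeout, etc.
            return DEFAULT_TOPIC;
        }
    }

    public static void main(String[] args) {
        // Simulate a failing NewsAPI call
        System.out.println(generateTopicSafely(() -> {
            throw new RuntimeException("NewsAPI is unavailable");
        }));
    }
}
```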
AgentConversationRunner — The Heart of the Project: @Async Loop, Stop Phrases, HISTORY_SIZE
AgentConversationRunner is a component that executes the dialogue itself in a separate thread. Let's break down the key parts:
Main loop
@Async
public void run(Long conversationId, String systemPromptA,
String systemPromptB, String topic, int maxRounds) {
String message = topic; // The first message is the topic of the conversation
for (int round = 1; round <= maxRounds; round++) {
// Check if manually stopped
AgentConversation conversation = conversationRepository
.findById(conversationId).orElseThrow();
if (conversation.getStatus() == ConversationStatus.STOPPED) return;
// Agent A responds
List<AgentMessage> historyA = messageRepository
.findByConversationIdOrderByRoundNumberAsc(conversationId);
String replyA = ask(systemPromptA, historyA, message, AgentSender.AGENT_A);
saveMessage(conversation, AgentSender.AGENT_A, replyA, round);
// Check for stop phrase and manual stop between A and B
if (containsStopPhrase(replyA)) { finish(conversation, round); return; }
conversation = conversationRepository.findById(conversationId).orElseThrow();
if (conversation.getStatus() == ConversationStatus.STOPPED) return;
// Agent B responds
List<AgentMessage> historyB = messageRepository
.findByConversationIdOrderByRoundNumberAsc(conversationId);
String replyB = ask(systemPromptB, historyB, replyA, AgentSender.AGENT_B);
saveMessage(conversation, AgentSender.AGENT_B, replyB, round);
if (containsStopPhrase(replyB)) { finish(conversation, round); return; }
message = replyB; // B's reply becomes the input message for A
sleep(500); // Small pause between rounds
}
finish(conversation, maxRounds);
}
Stop phrases
private static final List<String> STOP_PHRASES = List.of(
    "до свидания", "прощай", "на этом всё",   // Russian: "goodbye", "farewell", "that's all"
    "goodbye", "farewell", "конец разговора"  // "конец разговора" = "end of the conversation"
);
If an agent decides to end the conversation naturally, the system recognizes it and stops the loop. Phrases are checked after each response — both after A and after B.
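containsStopPhrase() itself is not shown in the article; a plausible implementation is a case-insensitive substring check against the same list. The method body below is an assumption:

```java
import java.util.List;
import java.util.Locale;

// Plausible implementation of containsStopPhrase(): lowercase the reply
// and look for any stop phrase as a substring. The phrase list matches
// the Runner; the method body is a sketch, not the repository code.
public class StopPhraseCheck {
    static final List<String> STOP_PHRASES = List.of(
            "до свидания", "прощай", "на этом всё",
            "goodbye", "farewell", "конец разговора"
    );

    static boolean containsStopPhrase(String reply) {
        if (reply == null) return false;
        String lower = reply.toLowerCase(Locale.ROOT);
        return STOP_PHRASES.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        System.out.println(containsStopPhrase("On that note, goodbye!")); // true
        System.out.println(containsStopPhrase("I disagree completely.")); // false
    }
}
```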
HISTORY_SIZE = 8
Not the entire conversation history is passed in each request — only the last 8 messages. This is a critical limitation: without it, the context window overflows in long conversations, and the cost of the request increases proportionally to the number of rounds. 8 messages = 4 rounds back — enough for dialogue coherence.
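The windowing itself reduces to one line. A self-contained sketch, equivalent to the skip-based version used later in ask():

```java
import java.util.List;
import java.util.stream.IntStream;

// Equivalent of the HISTORY_SIZE window: keep only the last 8 entries
// of the history, however long the conversation gets.
public class HistoryWindow {
    static final int HISTORY_SIZE = 8;

    static <T> List<T> lastN(List<T> history) {
        return history.subList(Math.max(0, history.size() - HISTORY_SIZE), history.size());
    }

    public static void main(String[] args) {
        List<Integer> history = IntStream.rangeClosed(1, 20).boxed().toList();
        System.out.println(lastN(history)); // prints [13, 14, 15, 16, 17, 18, 19, 20]
    }
}
```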
Pay attention to the double read from the DB. Before Agent A's request and before Agent B's request — separate queries to the repository to get the latest history. This is not a bug — it's a conscious decision: between A and B, A's response is already saved in the DB, so B must see the updated history.
ask() — How Context is Built and Why Role Order Matters
The ask() method is the most technical part of the project. Let's break it down in detail:
private String ask(String systemPrompt, List<AgentMessage> history,
String lastMessage, AgentSender currentSender) {
List<Message> messages = new ArrayList<>();
// 1. System prompt — agent's role and beliefs
messages.add(new SystemMessage(systemPrompt));
// 2. Last HISTORY_SIZE messages with correct roles
history.stream()
.skip(Math.max(0, history.size() - HISTORY_SIZE))
.forEach(m -> {
if (m.getSender() == currentSender) {
// Own utterance → AssistantMessage
messages.add(new AssistantMessage(m.getContent()));
} else {
// Opponent's utterance → UserMessage
messages.add(new UserMessage(m.getContent()));
}
});
// 3. Last message from the opponent
messages.add(new UserMessage(lastMessage));
// 4. Request with all 5 tools
ToolCallback[] tools = ToolCallbacks.from(
wikipediaSearchTool, tavilySearchTool,
alphaVantageTool, arxivSearchTool, newsApiSearchTool
);
return agentChatModel.call(
new Prompt(messages,
ToolCallingChatOptions.builder()
.toolCallbacks(tools)
.build()))
.getResult().getOutput().getText();
}
The key point is mapping roles in history. The LLM expects AssistantMessage to be what it said itself, and UserMessage to be what the user (in our case, the opponent) said. If you mix them up, the model will "forget" its position and start agreeing with the opponent.
Therefore, for each agent, the same DB record can be either an AssistantMessage or a UserMessage — depending on which agent is responding at the moment.
Commented out removeThinkingBlock() code. In the repository, there is a commented-out version of ask() with the removal of <think>...</think> blocks. Some models (qwen3 in particular) return internal thoughts in think tags — and if they are not removed, they will end up in the response. For production use with qwen3, I recommend uncommenting this logic.
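That removal likely reduces to a regex strip. A sketch of what removeThinkingBlock() plausibly does; the regex is my reconstruction of the described behavior, not the repository code:

```java
import java.util.regex.Pattern;

// Sketch of removeThinkingBlock(): strips the <think>...</think> blocks
// that qwen3 emits before the actual answer. The regex is a reconstruction,
// not the commented-out code from the repository.
public class ThinkBlockStripper {
    // (?s) makes '.' match newlines, since think blocks span multiple lines
    private static final Pattern THINK_BLOCK =
            Pattern.compile("(?s)<think>.*?</think>\\s*");

    static String removeThinkingBlock(String reply) {
        return reply == null ? "" : THINK_BLOCK.matcher(reply).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String raw = "<think>The user wants an argument...</think>Free markets win. Why?";
        System.out.println(removeThinkingBlock(raw)); // prints: Free markets win. Why?
    }
}
```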
Five Tools: Wikipedia, Tavily, NewsAPI, Alpha Vantage, ArXiv
Each tool is a Spring component with a method annotated with @Tool. Spring AI automatically registers them and passes their descriptions to the LLM. The model itself decides which tool to call based on the context of the question.
WikipediaSearchTool — facts and definitions
@Tool(description = """
Searches for information on Wikipedia.
Use for definitions, facts, history, biographies.
Use ONLY one or two words for searching.
""")
public String searchWikipedia(String query) {
// Wikipedia's search handles long phrases poorly,
// so shorten the query to its first word
String shortQuery = query.trim().split("\\s+")[0];
WikiSearchResponse response = restClient.get()
.uri("https://ru.wikipedia.org/w/api.php", uriBuilder -> uriBuilder
.queryParam("action", "query")
.queryParam("list", "search")
.queryParam("srsearch", shortQuery)
.queryParam("format", "json")
.queryParam("srlimit", "1")
.build())
.retrieve()
.body(WikiSearchResponse.class);
// Guard against an empty result — get(0) on an empty list would throw
if (response == null || response.query().search().isEmpty()) {
return "Nothing found on Wikipedia for: " + shortQuery;
}
String title = response.query().search().get(0).title();
String snippet = response.query().search().get(0).snippet()
.replaceAll("<[^>]+>", "").trim(); // Remove HTML tags
return "Article: " + title + "\n" + snippet;
}
An important detail: Wikipedia returns snippets with HTML tags (<span class="searchmatch">, etc.) — these need to be removed before passing to the LLM.
AlphaVantageTool — stock prices for economic discussions
@Tool(description = """
Retrieves the current stock price or financial data of a company.
Query is the stock ticker: AAPL, GOOGL, TSLA, AMZN.
""")
public String getStockPrice(String symbol) {
Map response = restClient.get()
.uri(uriBuilder -> uriBuilder
.path("/query")
.queryParam("function", "GLOBAL_QUOTE")
.queryParam("symbol", symbol.toUpperCase())
.queryParam("apikey", apiKey)
.build())
.retrieve()
.body(Map.class);
Map<String, String> quote = (Map<String, String>) response.get("Global Quote");
// Alpha Vantage returns an empty/odd payload when the daily limit is hit
if (quote == null || quote.get("05. price") == null) {
return "No data for " + symbol + " (the free limit of 25 requests/day may be exhausted)";
}
return String.format("Stock %s: $%s | Change: %s | High: $%s | Low: $%s",
symbol, quote.get("05. price"), quote.get("10. change percent"),
quote.get("03. high"), quote.get("04. low"));
}
Table of all tools
| Tool | Usage | Free limit |
|------|-------|------------|
| Wikipedia | Definitions, facts, biographies, scientific concepts | ✅ Unlimited |
| Tavily Search | Current news, fresh statistics, web search | 1,000 / month |
| NewsAPI | Fresh news on a topic as an argument | 100 / day |
| Alpha Vantage | Stock prices, financial data for economic discussions | 25 / day |
| ArXiv | Scientific articles and research | ✅ Unlimited |
Practical advice on @Tool description: the tool's description is a system prompt for the LLM that explains when and how to use it. The more precise the description, the less often the model will call the tool inappropriately or with incorrect parameters. Pay attention to "Use ONLY one or two words" in the Wikipedia tool — without it, the model would send long sentences and get empty results.
How to Write a Prompt So Agents Actually Argue
The quality of the dialogue almost entirely depends on the system prompt. Here's a structure that works:
Mandatory prompt elements:
- Role — who this agent is, their character and beliefs
- Position — what they advocate for and what they believe
- Prohibitions — what they NEVER agree with (most important!)
- Format — concise, with facts, a question at the end
- Language — specify explicitly
Example of a strong prompt (Agent A — capitalist):
You are a tough capitalist, a billionaire, a corporate owner.
You believe that the free market is the only path to prosperity.
Before responding, search Wikipedia for facts about GDP, standard of living.
NEVER agree with communist ideas.
Speak with numbers and facts. You despise planned economies.
Respond concisely — 2-3 sentences.
End with a provocative question.
Respond ONLY in Ukrainian.
Example of a weak prompt (don't do this):
You are a supporter of capitalism. Defend your position.
The difference between a strong and a weak prompt is in the detail of the prohibitions. Without a clear "NEVER agree," the model will find a compromise after 2-3 messages, and the dialogue will become boring.
Tips for a lively dialogue:
- Specify concrete sources for searching: "search Wikipedia", "check stock prices"
- Demand a question at the end of each utterance — this provokes a response
- Limit length: 2-3 sentences is optimal, more becomes boring
- Specify the language explicitly — without it, the model might switch
- For qwen3:8b add: "do not use think tags in your response"
Deployment: Ollama Locally and Railway in Production
Local launch with Ollama
# 1. Install and launch the model
ollama pull qwen3:8b # quality testing (~5GB)
# or
ollama pull llama3.1:8b # fast testing (~4.7GB)
ollama serve
# 2. Create DB
psql -c "CREATE DATABASE Agent_Chat;"
# 3. application-local.properties
spring.ai.ollama.chat.model=qwen3:8b
spring.datasource.url=jdbc:postgresql://localhost:5432/Agent_Chat
spring.datasource.username=postgres
spring.datasource.password=your_password
spring.profiles.active=local
# 4. Run
mvn spring-boot:run
Open: http://localhost:1024
Comparison of models for local testing:
| Model | Response time | Tool calling quality | RAM |
|-------|---------------|----------------------|-----|
| qwen3:8b | 2+ min | ⭐⭐⭐ Follows instructions better | ~8GB |
| llama3.1:8b | 20–30 sec | ⭐⭐ Faster but weaker | ~6GB |
Production on Railway via OpenRouter
# Environment variables on Railway
SPRING_PROFILES_ACTIVE=openai
OPENAI_API_KEY=your_openrouter_key # OpenRouter key
DB_URL=jdbc:postgresql://...
DB_USERNAME=postgres
DB_PASSWORD=your_password
APP_AGENT_EXPERIMENT_ENABLED=true # enable agents
Why Railway and not Heroku or Fly.io? Railway provides PostgreSQL for free and simple deployment via GitHub. For an experimental project, this is optimal — no need to set up a separate DB service. OpenRouter with deepseek/deepseek-chat costs significantly less than direct OpenAI — and for a test project, this is the right choice.
Conclusions
Agent Chat is not a production-ready product, but a live experiment demonstrating several important architectural approaches:
- @Async loop with DB status check — a simple and reliable way to manage long-running background operations without message queues
- Spring Profiles for switching providers — all code depends on an interface, the specific implementation is injected via DI
- Thin service layer + separate Runner — proper separation of responsibilities when the logic is complex and asynchronous
- @Tool with detailed description — the quality of the tool's description directly affects the correctness of its invocation by the model
- HISTORY_SIZE limitation — an essential element for controlling costs and context size in long conversations
The full code is available on GitHub — MIT license, can be used as a basis for your own agent projects:
→ github.com/VadimKharovyuk/Agent_Chat
Read also:
→ Which Ollama model to choose for an agent with tool calling: comparison and benchmarks — if you want to delve deeper into choosing a local model for tool calling.
→ GPT-Realtime-2 vs Gemini Live API: what to choose in 2026 — if you are considering voice agents instead of text-based ones.
Sources: Agent Chat GitHub repository, Spring AI Documentation, Ollama Model Library, OpenRouter API Docs