I am developing my own platform for communicating with AI characters: an analogue of Character.ai, but with its own memory architecture, model routing, and character categories. One of the first practical questions was which LLM to use and whether one model suits all types of characters.
The short answer is no. A model that is excellent at writing code or analyzing documents may turn out to be a mediocre conversationalist: it drops the character's role, responds too formally, and ignores details from previous messages. This article covers my specific experience: which models I chose for different character categories, how much they cost, and when it is worth switching to another provider.
Why not every LLM is suitable for an AI character
Most modern language models were created as universal assistants. They were trained to respond accurately, safely, and helpfully. For a support chatbot or a code assistant, this is exactly what is needed. For an AI character, it is often an obstacle.
Typical problems when using "assistant" models for characters:
- The model breaks character after a few dozen messages
- Responses sound too formal even when the character is supposed to be informal
- Constant reminders that the user is interacting with an AI, not a real person
- Weak emotional engagement in the conversation
- Refusals in harmless role-playing scenarios
For an AI companion, completely different characteristics are important: maintaining the character's role throughout a long dialogue, natural conversational style, emotional responses, and the ability to "remember" details from previous conversations.
Therefore, it is worth using different models for different character categories — and this fundamentally affects the quality of the final product.
How I organized model routing by character categories
In my platform, characters are divided into categories: EDUCATION, SUPPORT, ENTERTAINMENT, COMPANION, ROMANTIC, FINANCE, CAREER, FITNESS, LANGUAGE, KIDS, CREATIVE. Each category has its own requirements — accuracy is important for an educational character, role retention for a romantic one.
The architectural solution I chose is to store model settings in the configuration and build a separate ChatClient for each category. Changing a model then takes a single line in application.properties, with no code changes:
# application.properties
ai.models.education=openai/gpt-4o-mini
ai.models.support=openai/gpt-4o-mini
ai.models.entertainment=deepseek/deepseek-v4-flash
ai.models.romantic=sao10k/l3.3-euryale-70b
ai.models.finance=openai/gpt-4o-mini
ai.models.summary=deepseek/deepseek-v4-flash
At the Spring Boot level, this is implemented through a Map<CharacterCategory, ChatClient> bean that maps each category to a client. Categories without a separate model are mapped to the closest one in meaning:
@Bean
public Map<CharacterCategory, ChatClient> chatClientsByCategory(AiModelsProperties props) {
    // One client per configured model; related categories share a client.
    ChatClient educationClient = buildChatClient(props.getEducation());
    ChatClient entertainmentClient = buildChatClient(props.getEntertainment());
    ChatClient romanticClient = buildChatClient(props.getRomantic());

    return Map.ofEntries(
            Map.entry(CharacterCategory.EDUCATION, educationClient),
            Map.entry(CharacterCategory.ENTERTAINMENT, entertainmentClient),
            Map.entry(CharacterCategory.ROMANTIC, romanticClient),
            // COMPANION reuses the RP client built for ROMANTIC
            Map.entry(CharacterCategory.COMPANION, romanticClient),
            // FINANCE reuses the EDUCATION client
            Map.entry(CharacterCategory.FINANCE, educationClient)
    );
}
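The fallback rule ("categories without a separate model are mapped to the closest one in meaning") can also be sketched in plain Java, independent of Spring. This is only an illustration, not the platform's actual code: the `ModelResolver` class, the `FALLBACKS` table, and the fallback targets chosen for KIDS, CAREER, LANGUAGE, FITNESS, and CREATIVE are assumptions made for the example. Only the category names and the `ai.models.*` key format come from the article.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

class ModelResolver {

    enum CharacterCategory {
        EDUCATION, SUPPORT, ENTERTAINMENT, COMPANION, ROMANTIC,
        FINANCE, CAREER, FITNESS, LANGUAGE, KIDS, CREATIVE
    }

    // Hypothetical fallback table: a category without its own
    // ai.models.<name> key borrows the model of a related category.
    private static final Map<CharacterCategory, CharacterCategory> FALLBACKS = Map.of(
            CharacterCategory.COMPANION, CharacterCategory.ROMANTIC,
            CharacterCategory.KIDS, CharacterCategory.EDUCATION,
            CharacterCategory.CAREER, CharacterCategory.EDUCATION,
            CharacterCategory.LANGUAGE, CharacterCategory.EDUCATION,
            CharacterCategory.FITNESS, CharacterCategory.ENTERTAINMENT,
            CharacterCategory.CREATIVE, CharacterCategory.ENTERTAINMENT);

    /** Resolves a model id for every category and fails fast if one is left unmapped. */
    static Map<CharacterCategory, String> resolve(Properties props) {
        Map<CharacterCategory, String> result = new EnumMap<>(CharacterCategory.class);
        for (CharacterCategory category : CharacterCategory.values()) {
            // Prefer the category's own key, then the fallback category's key.
            String model = props.getProperty(key(category));
            if (model == null && FALLBACKS.containsKey(category)) {
                model = props.getProperty(key(FALLBACKS.get(category)));
            }
            if (model == null) {
                throw new IllegalStateException("No model configured for " + category);
            }
            result.put(category, model);
        }
        return result;
    }

    private static String key(CharacterCategory category) {
        return "ai.models." + category.name().toLowerCase(Locale.ROOT);
    }
}
```

Iterating over `CharacterCategory.values()` means that a newly added category without a model or a fallback fails at startup instead of producing a missing map entry at request time.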
I have also implemented agent routing separately: if a message requires up-to-date data (weather, news, stock prices), the request is passed to the SearchAgent with a set of tools: Wikipedia, Tavily, NewsAPI, AlphaVantage. For RP categories (ROMANTIC, COMPANION), this routing is disabled, because the specialized RP model I use for them does not support function calling.
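The routing rule described above might look roughly like the following plain-Java sketch. The `AgentRouter` class, the `Route` enum, and the keyword list used to guess whether a message needs fresh data are all hypothetical. The article does not say how that detection is actually done, so the keyword match here is only a stand-in.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AgentRouter {

    enum CharacterCategory {
        EDUCATION, SUPPORT, ENTERTAINMENT, COMPANION, ROMANTIC,
        FINANCE, CAREER, FITNESS, LANGUAGE, KIDS, CREATIVE
    }

    enum Route { SEARCH_AGENT, PLAIN_CHAT }

    // Categories served by an RP model without function calling.
    private static final Set<CharacterCategory> NO_TOOLS =
            EnumSet.of(CharacterCategory.ROMANTIC, CharacterCategory.COMPANION);

    // Naive stand-in for real intent detection (assumption for this example).
    private static final List<String> FRESH_DATA_HINTS =
            List.of("weather", "news", "stock price", "exchange rate");

    static Route route(CharacterCategory category, String message) {
        if (NO_TOOLS.contains(category)) {
            // Never send tools to a model that rejects them.
            return Route.PLAIN_CHAT;
        }
        String text = message.toLowerCase(Locale.ROOT);
        boolean needsFreshData = FRESH_DATA_HINTS.stream().anyMatch(text::contains);
        return needsFreshData ? Route.SEARCH_AGENT : Route.PLAIN_CHAT;
    }
}
```

Checking the category before inspecting the message keeps the "no tools" guarantee for RP characters independent of how good the detection step is.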
Model overview: DeepSeek, GPT-4o mini, Euryale, MiniMax M2-Her
DeepSeek V4 Flash
DeepSeek V4 Flash is my main model for most categories. It uses a Mixture-of-Experts architecture: 284B total parameters, but only 13B active per token. This is why it is so cheap while keeping acceptable response quality.
Current price via OpenRouter: $0.10/M input, $0.20/M output. Context window — 1M tokens. Supports tool calling and structured output.
I chose it as the base for several reasons: stable operation without 429 errors (unlike the free version), full tool calling for SearchAgent, and a lower cost than GPT-4o mini (about 1.5 times cheaper on input and 3 times cheaper on output at the prices listed in this article) with comparable quality for entertainment content.
Not suitable when: the character has a complex personality with subtle emotional reactions — the model sometimes "slips" into a neutral assistant tone after 20–30 messages. For ROMANTIC and COMPANION categories, this is noticeable and spoils the experience.
GPT-4o mini
I use GPT-4o mini for categories where content accuracy and safety are important. Current price: $0.15/M input, $0.60/M output. Context window: 128K tokens.
Why this model for EDUCATION, SUPPORT, FINANCE, and KIDS: in my experience it follows the system prompt and content restrictions best. This is crucial for characters aimed at children: other models sometimes go beyond the allowed limits even with clearly defined prohibitions in the prompt. It also offers stable tool calling for SearchAgent when the character needs to provide up-to-date answers.
Not suitable when: high emotional engagement and long RP dialogue are required. GPT-4o mini is too "polite": even a sarcastic character comes out softer than the prompt intends.
Sao10K Euryale 70B
Llama 3.3 Euryale 70B is a specialized RP model from independent developer Sao10K, popular in the SillyTavern community. It is trained specifically on role-playing scenarios and long dialogues with characters. Current price: $0.65/M input, $0.75/M output. Context window: 131K tokens.
I connected it for the ROMANTIC category after noticing that both DeepSeek and GPT-4o mini "soften" characters — even with a detailed system prompt, the responses were too neutral for a romantic companion.
An important limitation I discovered in practice right away: the model does not support function calling. When I pass it a request with tools through SearchAgent, I get a 404. Therefore, agent routing is disabled for this category.
Not suitable when: the character needs to provide up-to-date information (news, rates, weather) or high accuracy of factual answers is required. This is purely an RP model.
MiniMax M2-Her
MiniMax M2-Her is a model I am considering as an intermediate option between DeepSeek and Euryale. It is trained specifically for AI companions. Current price: $0.30/M input, $1.20/M output.
It is interesting because it maintains character role better than DeepSeek and costs less than half as much as Euryale on input ($0.30 vs $0.65 per million tokens), although its output price is higher ($1.20 vs $0.75). I am currently testing it for the COMPANION category. If the results are confirmed, I will switch to it as the main model for RP without strict romantic scenarios.
Not suitable when: maximum immersion in the role and long dialogue are required. Euryale still wins here. It is also worth checking tool calling support before using it with agent routing.
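To compare the prices quoted in this overview on equal terms, it helps to compute the cost of a single exchange. The sketch below does that arithmetic. The `CostComparison` class, the `ModelPricing` record, and the example token counts (2,000 input and 300 output tokens per message) are assumptions for illustration. The per-million prices are the ones listed for each model above.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

class CostComparison {

    /** Prices in USD per one million tokens. */
    record ModelPricing(BigDecimal inputPerMillion, BigDecimal outputPerMillion) {

        static ModelPricing of(String input, String output) {
            return new ModelPricing(new BigDecimal(input), new BigDecimal(output));
        }

        /** Cost in USD of one request with the given token counts. */
        BigDecimal costOf(long inputTokens, long outputTokens) {
            BigDecimal input = inputPerMillion.multiply(BigDecimal.valueOf(inputTokens));
            BigDecimal output = outputPerMillion.multiply(BigDecimal.valueOf(outputTokens));
            // Dividing by a power of ten is always exact, so no rounding mode is needed.
            return input.add(output).divide(BigDecimal.valueOf(1_000_000));
        }
    }

    // Prices as quoted in the article; insertion order is kept for printing.
    static final Map<String, ModelPricing> PRICES = new LinkedHashMap<>();
    static {
        PRICES.put("DeepSeek V4 Flash", ModelPricing.of("0.10", "0.20"));
        PRICES.put("GPT-4o mini", ModelPricing.of("0.15", "0.60"));
        PRICES.put("Euryale 70B", ModelPricing.of("0.65", "0.75"));
        PRICES.put("MiniMax M2-Her", ModelPricing.of("0.30", "1.20"));
    }
}
```

With the assumed 2,000 input and 300 output tokens, `costOf(2000, 300)` gives $0.00026 per exchange for DeepSeek V4 Flash, $0.00048 for GPT-4o mini, $0.00096 for MiniMax M2-Her, and $0.001525 for Euryale. The ranking between the last two depends on the input-to-output ratio: at these prices M2-Her stays cheaper than Euryale only while output tokens are below roughly 78% of input tokens (0.35 / 0.45), which is usually the case in chats where the prompt carries the character description and history.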