AI Models for Characters 2026: DeepSeek, GPT-4o mini, and Euryale — What I Chose


I am developing my own platform for communicating with AI characters — an analogue of Character.ai, but with its own memory architecture, model routing, and character categories. One of the first practical questions that arose was: which LLM to use and whether one model is suitable for all types of characters.

The short answer is no. A model that is excellent at writing code or analyzing documents may turn out to be a mediocre conversationalist: it loses the character's role, responds too formally, and ignores details from previous messages. This article covers my hands-on experience: which models I chose for different character categories, what they cost, and when it's worth switching to another provider.

Why not every LLM is suitable for an AI character

Most modern language models were created as universal assistants, trained to respond accurately, safely, and helpfully. For a support chatbot or a code assistant, this is exactly what is needed. For an AI character, it is often an obstacle.

Typical problems when using "assistant" models for characters:

  • The model breaks character after a few dozen messages
  • Responses sound too formal even when the character is supposed to be informal
  • Constant reminders that the user is interacting with an AI, not a real person
  • Weak emotional engagement in the conversation
  • Refusals in harmless role-playing scenarios

For an AI companion, completely different characteristics are important: maintaining the character's role throughout a long dialogue, natural conversational style, emotional responses, and the ability to "remember" details from previous conversations.

Therefore, it is worth using different models for different character categories — and this fundamentally affects the quality of the final product.

How I organized model routing by character categories

In my platform, characters are divided into categories: EDUCATION, SUPPORT, ENTERTAINMENT, COMPANION, ROMANTIC, FINANCE, CAREER, FITNESS, LANGUAGE, KIDS, CREATIVE. Each category has its own requirements — accuracy is important for an educational character, role retention for a romantic one.

The architectural solution I chose is to store model settings in the configuration and build a separate ChatClient for each category. Changing a model takes a single line in application.properties, with no code changes:

# application.properties
ai.models.education=openai/gpt-4o-mini
ai.models.support=openai/gpt-4o-mini
ai.models.entertainment=deepseek/deepseek-v4-flash
ai.models.romantic=sao10k/l3.3-euryale-70b
ai.models.finance=openai/gpt-4o-mini
ai.models.summary=deepseek/deepseek-v4-flash

At the Spring Boot level, this is implemented through Map<CharacterCategory, ChatClient> where each category gets its own client. Categories without a separate model are mapped to the closest one in meaning:

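import java.util.Map;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;

@Bean
public Map<CharacterCategory, ChatClient> chatClientsByCategory(AiModelsProperties props) {
    // One client per ai.models.* property, so editing a property really swaps the model
    ChatClient educationClient     = buildChatClient(props.getEducation());
    ChatClient supportClient       = buildChatClient(props.getSupport());
    ChatClient entertainmentClient = buildChatClient(props.getEntertainment());
    ChatClient romanticClient      = buildChatClient(props.getRomantic());
    ChatClient financeClient       = buildChatClient(props.getFinance());

    return Map.ofEntries(
        Map.entry(CharacterCategory.EDUCATION,     educationClient),
        Map.entry(CharacterCategory.SUPPORT,       supportClient),
        Map.entry(CharacterCategory.ENTERTAINMENT, entertainmentClient),
        Map.entry(CharacterCategory.ROMANTIC,      romanticClient),
        Map.entry(CharacterCategory.FINANCE,       financeClient),
        // Categories without their own property reuse the closest client in meaning
        Map.entry(CharacterCategory.COMPANION,     romanticClient),
        Map.entry(CharacterCategory.CAREER,        educationClient),
        Map.entry(CharacterCategory.KIDS,          supportClient),
        Map.entry(CharacterCategory.CREATIVE,      entertainmentClient),
        Map.entry(CharacterCategory.FITNESS,       entertainmentClient)
    );
}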
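Stripped of Spring, the same idea reduces to a lookup with a fallback. The sketch below is a plain-Java illustration, not the platform's code: it assumes the ai.models.* keys shown above, uses a hypothetical ModelRouter class, and resolves a category to an OpenRouter model ID instead of a ChatClient. The FALLBACK table is an assumption that mirrors the "closest in meaning" rule.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical plain-Java illustration of category -> model ID routing.
class ModelRouter {
    // Assumed fallback table: categories without their own ai.models.* key
    // borrow the model of the closest category in meaning.
    private static final Map<String, String> FALLBACK = Map.of(
        "companion", "romantic",
        "career", "education",
        "kids", "support",
        "creative", "entertainment",
        "fitness", "entertainment"
    );

    private final Properties props;

    ModelRouter(Properties props) {
        this.props = props;
    }

    // Loads entries from application.properties-style text.
    static ModelRouter fromText(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ModelRouter(p);
    }

    // Resolves a category such as "ROMANTIC" to a model ID, using the fallback if needed.
    String modelFor(String category) {
        String key = category.toLowerCase(Locale.ROOT);
        String model = props.getProperty("ai.models." + key);
        if (model == null && FALLBACK.containsKey(key)) {
            model = props.getProperty("ai.models." + FALLBACK.get(key));
        }
        if (model == null) {
            throw new IllegalArgumentException("No model configured for " + category);
        }
        return model;
    }
}
```

With this shape, pointing ai.models.romantic at a different model also moves COMPANION, because COMPANION has no key of its own and resolves through the fallback.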

I also implemented agent routing separately: if a message requires up-to-date data (weather, news, stock prices), the request is passed to the SearchAgent with a set of tools: Wikipedia, Tavily, NewsAPI, AlphaVantage. For the RP categories (ROMANTIC, COMPANION), this routing is disabled, because the specialized RP model serving them does not support function calling.
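The routing decision can be sketched as a single predicate. The article doesn't describe how a message is detected as needing fresh data, so the keyword list below is a placeholder assumption (a real router could use a cheap classifier model instead); the part taken from the text is that ROMANTIC and COMPANION always bypass SearchAgent. The AgentRouting class is hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: decide whether a message should go to SearchAgent.
class AgentRouting {
    // RP categories never get tools: their model cannot do function calling.
    private static final Set<String> RP_CATEGORIES = Set.of("ROMANTIC", "COMPANION");

    // Placeholder heuristic (an assumption, not the platform's real detection logic).
    private static final List<String> FRESH_DATA_HINTS =
        List.of("weather", "news", "stock", "exchange rate", "today");

    static boolean useSearchAgent(String category, String message) {
        if (RP_CATEGORIES.contains(category)) {
            return false; // skip tools entirely for RP models
        }
        String text = message.toLowerCase(Locale.ROOT);
        for (String hint : FRESH_DATA_HINTS) {
            if (text.contains(hint)) {
                return true;
            }
        }
        return false;
    }
}
```

Checking the category first means an RP character never even attempts a tool call, which avoids the 404 described for Euryale below.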

Model overview: DeepSeek, GPT-4o mini, Euryale, MiniMax M2-Her

DeepSeek V4 Flash

DeepSeek V4 Flash is my main model for most categories. It uses a Mixture-of-Experts architecture: 284B total parameters, but only 13B active per token. That is why it is so cheap while still giving acceptable response quality.
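A rough back-of-envelope for why the MoE design matters, assuming the common approximation that a forward pass costs about 2N FLOPs per generated token for N active parameters. This ignores attention cost and expert-routing overhead, and API prices also depend on the provider's margins:

```latex
\frac{N_{\text{active}}}{N_{\text{total}}} = \frac{13\,\text{B}}{284\,\text{B}} \approx 0.046,
\qquad
\frac{C_{\text{dense}}}{C_{\text{MoE}}} \approx \frac{2 \cdot 284 \times 10^{9}}{2 \cdot 13 \times 10^{9}} \approx 22
```

So each token touches under 5% of the weights, and compute per token is roughly 22 times lower than for a dense model of the same total size.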

Current price via OpenRouter: $0.10/M input, $0.20/M output. Context window — 1M tokens. Supports tool calling and structured output.

I chose it as the base for several reasons: stable operation without 429 errors (unlike the free version), full tool calling for SearchAgent, and about 45% lower cost per message exchange than GPT-4o mini (three times cheaper on output tokens) with comparable quality for entertainment content.

Not suitable when: the character has a complex personality with subtle emotional reactions — the model sometimes "slips" into a neutral assistant tone after 20–30 messages. For ROMANTIC and COMPANION categories, this is noticeable and spoils the experience.

GPT-4o mini

GPT-4o mini — I use it for categories where content accuracy and safety are important. Current price: $0.15/M input, $0.60/M output. Context window — 128K tokens.

Why this model for EDUCATION, SUPPORT, FINANCE, and KIDS: in my testing, it followed the system prompt and content restrictions most reliably. This is crucial for child characters, since other models sometimes go beyond the allowed limits even with clearly defined prohibitions in the prompt. It also offers stable tool calling for SearchAgent when the character needs to give up-to-date answers.

Not suitable when: high emotional engagement and long RP dialogue are required. GPT-4o mini is too "polite": even a sarcastic character comes out softer than the prompt intends.

Sao10K Euryale 70B

Llama 3.3 Euryale 70B — a specialized RP model from independent developer Sao10K, popular among the SillyTavern community. Trained specifically on role-playing scenarios and long dialogues with characters. Current price: $0.65/M input, $0.75/M output. Context window — 131K tokens.

I connected it for the ROMANTIC category after noticing that both DeepSeek and GPT-4o mini "soften" characters — even with a detailed system prompt, the responses were too neutral for a romantic companion.

An important limitation I discovered immediately in practice: the model does not support function calling. Passing it a request through SearchAgent with tools returns a 404. Therefore, agent routing is disabled for this category.

Not suitable when: the character needs to provide up-to-date information (news, rates, weather) or high accuracy of factual answers is required. This is purely an RP model.

MiniMax M2-Her

MiniMax M2-Her — a model I am considering as an intermediate option between DeepSeek and Euryale. Trained specifically for AI companions. Current price: $0.30/M input, $1.20/M output.

It is interesting because it stays in character better than DeepSeek while costing less than half as much as Euryale for input. I am currently testing it for the COMPANION category; if the results hold up, I will make it the main model for RP categories without explicitly romantic scenarios.

Not suitable when: maximum immersion in the role and long dialogue are required — Euryale still wins here. It is also worth checking tool calling support before using with agent routing.
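Checking tool support up front is cheaper than discovering it through a 404, as happened with Euryale. OpenRouter's model listing reports which request parameters each model supports; the sketch below assumes that list has already been fetched and parsed into a Set of strings per model ID, and that the presence of "tools" signals function-calling support. The ToolSupport class and the capability data are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Sketch: attach tools only when the chosen model advertises tool support.
class ToolSupport {
    private final Map<String, Set<String>> supportedParamsByModel;

    ToolSupport(Map<String, Set<String>> supportedParamsByModel) {
        this.supportedParamsByModel = supportedParamsByModel;
    }

    // Unknown models are treated as tool-less, so they never receive a tools payload.
    boolean supportsTools(String modelId) {
        return supportedParamsByModel.getOrDefault(modelId, Set.of()).contains("tools");
    }

    // Agent routing runs only if the message needs fresh data AND the model can call tools.
    boolean canRouteToAgent(String modelId, boolean messageNeedsFreshData) {
        return messageNeedsFreshData && supportsTools(modelId);
    }
}
```

Keying the check on the model rather than the category means a category that moves from Euryale to MiniMax M2-Her picks up (or loses) agent routing automatically.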


How much do 1000 messages cost: comparing model costs

The calculations use real figures from my platform. A typical model request during an active dialogue includes:

  • System prompt of the character: ~500 tokens
  • Memory (criticalFacts + summary): ~300 tokens
  • Dialogue history (last 25 messages): ~1500 tokens
  • User message: ~100 tokens
  • Model response: ~300 tokens (output)

Total: ~2400 input tokens + ~300 output tokens per message exchange.

Model | Input, $/M tokens | Output, $/M tokens | 1,000 msgs | 10,000 msgs | 100,000 msgs
DeepSeek V4 Flash | $0.10 | $0.20 | $0.30 | $3.00 | $30
GPT-4o mini | $0.15 | $0.60 | $0.54 | $5.40 | $54
MiniMax M2-Her | $0.30 | $1.20 | $1.08 | $10.80 | $108
Euryale 70B | $0.65 | $0.75 | $1.79 | $17.85 | $178.50
Grok 4.3 | $1.25 | $2.50 | $3.75 | $37.50 | $375

Calculation: (2400 × input price + 300 × output price) / 1,000,000 × number of messages. Prices checked on OpenRouter, June 2026.
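The table's arithmetic can be reproduced with a small helper. This is a sketch, not code from the platform: the MessageCost class and its method names are hypothetical, and the token budget is the per-exchange breakdown listed above.

```java
// Sketch of the per-exchange cost arithmetic behind the table above.
class MessageCost {
    // Token budget per exchange, from the breakdown above.
    static final int SYSTEM_PROMPT = 500;
    static final int MEMORY = 300;       // criticalFacts + summary
    static final int HISTORY = 1500;     // last 25 messages
    static final int USER_MESSAGE = 100;
    static final int OUTPUT = 300;

    static int inputTokens() {
        return SYSTEM_PROMPT + MEMORY + HISTORY + USER_MESSAGE; // 2400
    }

    // Prices are in USD per million tokens, as listed on OpenRouter.
    static double costUsd(long messages, double inputPerM, double outputPerM) {
        double perExchange = (inputTokens() * inputPerM + OUTPUT * outputPerM) / 1_000_000.0;
        return perExchange * messages;
    }
}
```

For example, (MessageCost.costUsd(10_000, 0.65, 0.75) - MessageCost.costUsd(10_000, 0.10, 0.20)) * 30 gives about $445.50, which is the ~$450/month gap between Euryale and DeepSeek at 10,000 messages per day.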

Conclusion from the table: for a project with 10,000 messages per day, the difference between DeepSeek and Euryale is ~$15 per day or ~$450 per month. At 100,000 messages per day, it's already $4,500 per month. This significantly impacts the product's unit economics.

My conclusion after testing: there's no point in paying for an expensive model everywhere. A user interacting with a financial advisor won't feel the difference between DeepSeek and Euryale — the accuracy of the response is important to them. Conversely, a user of a romantic companion will immediately notice that the character is "not alive" even if the response is technically correct. Therefore, I use cheap models where character quality is less critical, and specialized RP models only where it's truly noticeable.

When to switch to another provider

The question isn't "which model is the best," but rather "which model is justified at the current stage of product development." Here are guidelines for decision-making:

Messages per day | DeepSeek, cost/month | GPT-4o mini, cost/month | Recommendation
Up to 1,000 | ~$9 | ~$16 | Any model; focus on character quality
1,000 to 10,000 | $9 to $90 | $16 to $162 | Hybrid approach: different models by category
10,000 to 100,000 | $90 to $900 | $162 to $1,620 | Analyze by category, optimize context
100,000+ | $900+ | $1,620+ | Consider direct contracts with providers

Specific signals that it's time to change the model or provider:

  • 429 errors on more than 1% of requests: the provider cannot handle the load. Free models are limited to roughly 200 requests per day, after which rejections begin.
  • Average response time exceeds 8 seconds — users start to feel the delay. It's time to check alternative providers for the same model.
  • API costs exceed 20% of revenue — time to optimize or reconsider model choices by category.
  • Users complain that the character "forgets" or responds out of character — the model cannot maintain the role, it's worth considering specialized RP models.
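The first three signals are numeric and easy to automate. A minimal sketch, assuming you already aggregate daily request counts, 429 responses, average latency, API spend, and revenue per provider; the ProviderStats record and SwitchSignals class are hypothetical, and the thresholds are the rules of thumb listed above. The fourth signal, users complaining that the character breaks role, stays a manual review item.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical daily metrics snapshot for one model/provider pair.
record ProviderStats(long requests, long http429, double avgLatencySeconds,
                     double apiCostUsd, double revenueUsd) {}

// Applies the numeric switching signals; returns the ones that fired.
class SwitchSignals {
    static List<String> check(ProviderStats s) {
        List<String> signals = new ArrayList<>();
        if (s.requests() > 0 && (double) s.http429() / s.requests() > 0.01) {
            signals.add("429 rate above 1%");
        }
        if (s.avgLatencySeconds() > 8.0) {
            signals.add("average latency above 8 s");
        }
        if (s.revenueUsd() > 0 && s.apiCostUsd() / s.revenueUsd() > 0.20) {
            signals.add("API cost above 20% of revenue");
        }
        return signals;
    }
}
```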

The convenience of working through OpenRouter is that changing the provider or model takes one line of configuration, which keeps vendor lock-in to a minimum.

What is more important — the model or the memory system

Many developers initially focus solely on model selection. My practical experience shows that the quality of the dialogue depends on the model and the memory system approximately equally.

Even the most expensive LLM will not create the impression of a living character without:

  • Critical facts about the user: name, interests, job, emotional state. In my platform, this is criticalFacts in JSONB format with three layers: semantic (facts), emotional (emotional profile), and episodic (important events). More on AI agent memory types in "In-context, Episodic, RAG, and Semantic: When to Use What".
  • Summarization of long dialogues: when the conversation exceeds the context window, old messages are compressed into a structured summary. How this works and how to avoid losing important details is covered in "Sliding window, summarization, and compression with examples".
  • Semantic search through memory — using pgvector, I find relevant fragments from previous conversations. For example, if the user writes "tomorrow is an interview" — the system finds a summary where they talked about fear of failure.
  • Sliding context window — in my case, the last 25 messages plus 3 pinned messages (character's openingMessage).
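The window selection itself can be sketched in a few lines. This assumes history is stored oldest-first and that the three pinned messages are simply its first three entries; both are simplifications, and memory (criticalFacts and summary) would be prepended to the prompt separately. The ContextWindow class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the sliding window: pinned messages plus the most recent N.
class ContextWindow {
    static final int PINNED = 3;
    static final int RECENT = 25;

    // history is ordered oldest-first; the first PINNED entries are treated as pinned.
    static List<String> select(List<String> history) {
        List<String> window = new ArrayList<>();
        int pinned = Math.min(PINNED, history.size());
        window.addAll(history.subList(0, pinned));
        // Start the recent slice after the pinned block so nothing is included twice.
        int start = Math.max(pinned, history.size() - RECENT);
        window.addAll(history.subList(start, history.size()));
        return window;
    }
}
```

For a 40-message dialogue this yields 28 messages: the 3 pinned ones followed by messages 15 through 39.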

A specific example from development: I tested the same character on DeepSeek V4 Flash with a full memory system and on GPT-4o mini without memory. The version with memory on the cheaper model was perceived as a more lively character — simply because it remembered details from previous conversations.

Conclusion: first invest in the memory system, then optimize model selection. The argument is simple: memory is a quality multiplier for any model. DeepSeek with memory looks better than GPT-4o mini without it, and GPT-4o mini with memory looks better than either model without it. Investing in memory improves whichever model you choose, while an expensive model without memory just makes the same mistake at a higher price: it forgets the user after 10–15 messages.

Another practical aspect to consider is scaling. When the audience starts to grow and you switch to a more powerful model, a pre-built memory system lets you immediately use accumulated user data and improve dialogue quality without losing context.

If you do the opposite — first use an expensive model without proper memory, and then add it later, you risk losing part of your audience. Users simply won't see the character's "evolution": it won't remember their early interactions, and the engagement level will decrease.

Conclusion: which model to choose in 2026

A brief recommendation by category based on practical experience:

Character category | Recommended model | Reason
EDUCATION, FINANCE, CAREER | GPT-4o mini | Accuracy, tool calling, content safety
SUPPORT, KIDS | GPT-4o mini | Best adherence to constraints, empathetic responses
ENTERTAINMENT, CREATIVE, FITNESS | DeepSeek V4 Flash | Optimal price-quality ratio
COMPANION, ROMANTIC | Euryale 70B or MiniMax M2-Her | Specialized RP models, better role retention
Summarization | DeepSeek V4 Flash | Called frequently; sufficient quality at minimal cost

In short, here are three scenarios with clear recommendations:

  • Cheap start: DeepSeek V4 Flash. $0.10/M input, tool calling, 1M context. For most categories the quality is sufficient and costs are minimal. This is what I started with and still use for ENTERTAINMENT and summarization.
  • Balance of price and character quality: MiniMax M2-Her. $0.30/M input. Trained for AI companions, with better role retention than DeepSeek and less than half of Euryale's input price (its output price is higher, though). A good choice for COMPANION if you don't want to overpay.
  • Maximum RP quality: Euryale 70B or Claude Sonnet. Euryale gives the best role retention for ROMANTIC scenarios. Claude is for when you need both dialogue quality and factual accuracy in one model, at a significantly higher price.

My approach: start with DeepSeek for all categories, monitor analytics, and selectively replace models where users are most active. Changing the model for one category is a single line in the configuration.

If you plan to connect agent search to characters, I recommend first reading about choosing a Search API: "Search API for AI Agents: What Developers Choose and Where They Make Mistakes".

Current prices for all models can always be checked at openrouter.ai/models.
