When building an AI chat, the first problem you encounter is that the model has no memory between requests.
Each message for it is all that exists.
Solving this is not as simple as it seems.
Why AI doesn't remember you — and it's not a bug
When people first interact with an AI character, they intuitively
expect the bot to remember them. That it knows what they talked about yesterday.
That it takes into account what you told it about yourself an hour ago.
This seems obvious — but it's not.
A language model has no memory between requests. Each message you
send is a separate, isolated call. The model doesn't "hear"
previous replies unless you explicitly provide them. There is no session
that is saved somewhere in the background. There is no internal conversation log.
There is only what is in the current request — and nothing more.
In practice, it looks like this: a user spends an hour telling a character
about their life — work, relationships, plans. And then starts a new conversation
and gets: "Hello! Tell me a little about yourself."
The bot is not pretending. It genuinely doesn't know. For it, that hour doesn't exist.
This is not a model bug or a limitation that will be fixed someday.
This is the fundamental architecture of LLMs. And the only person who can solve it is
the developer building a product on top of the model.
Passing the entire history is possible. But you will pay for it
The first solution that comes to mind is to simply pass the entire history
of messages in each request. The logic is simple: if the model sees the entire dialogue
from beginning to end — it remembers everything. The bot knows the user's name,
what they talked about, what they discussed five minutes ago.
The problem is solved. You can move on.
For the first ten messages, it works well. For twenty —
you start to notice that requests become heavier. For fifty —
you realize that each message carries the entire previous dialogue.
The model doesn't process one message — it rereads the entire conversation from scratch every time.
The response slows down. Token costs increase exponentially with each reply.
But the most unpleasant thing is not even that. Every model has a context limit —
the maximum number of tokens it can accept in a single request.
When the conversation reaches this limit, the model starts discarding
the oldest messages to accommodate new ones. These are the messages
where the user talked about themselves — they disappear first.
This creates a paradox: you pay more with each message —
and the memory still degrades over time. The simple solution doesn't scale.
A different one is needed.
How to make a bot remember the main thing — not everything
Sending everything is expensive and doesn't scale. Sending nothing makes the bot amnesiac. I decided that selective sending was necessary — but first, I had to figure out what exactly. After several attempts, I divided the context into three layers — each covering a different type of memory.
The first layer — pinned messages.
The very beginning of the conversation — the first few replies — is always present in the context, regardless of how many messages have accumulated since. Usually, it's the character's first reply: it sets the tone, character, and setting. Without it, the bot gradually "drifts" — starts responding out of character, loses its style, and becomes generic. This layer costs minimal tokens but maintains the character's personality throughout the conversation.
The second layer — sliding window.
The last N messages are always transmitted verbatim — this is the fresh thread of the conversation, what was just discussed. The window moves with the dialogue: a new message arrives, and the oldest one from the window leaves. This gives the bot a sense of the current context without having to pull the entire dialogue from the beginning.
The third layer — compressed memory.
Messages that leave the sliding window don't disappear without a trace. When the conversation reaches a certain length, old messages are compressed into a short, structured summary. Not verbatim — but the essence: what was discussed, what important things the user said, what topics were raised. The bot receives this summary along with the current window — and can rely on the past without paying for every old word.
When I put it all together, the bot stopped losing the thread even in very long conversations. Not because it remembers every word. But because it remembers the right things.
The sliding window remembers the conversation. But not the person
The three layers I described solve one specific problem — preserving the thread of the current conversation without transmitting the entire dialogue verbatim. They do this well. But there's another problem they don't solve at all.
Imagine a user interacting with a character for several weeks. During this time, they've revealed that they work as a developer, recently changed jobs, fear public speaking, and are learning Spanish to go to Barcelona. All of this was said in different conversations — and it all disappeared. Because the sliding window lives only within a single conversation. A new conversation is a clean slate.
The character greets you again as a stranger. Asks what you do again. Doesn't know your name again. Weeks of conversations — and zero accumulated knowledge about the person.
And this is important. Because it's this feeling — that you are remembered — that is the reason people return to a character again and again. Not for the features. Not for the accurate answers. But for the feeling that there is someone who knows who you are and what is important to you. Without this — any character remains just a chatbot with a nice avatar.
The sliding window and the summary are conversation memory. But human memory is also needed. Something that is stored not between messages — but between sessions. Forever.
Facts, emotions, events — three things a bot should know about you
I've added a separate layer of long-term memory that lives outside the conversation. It accumulates over time — and is transmitted to the character with every new dialogue. Regardless of when the last conversation took place — a week ago or a month. This layer consists of three types of memory that cover different needs.
Facts about the person.
Name, age, job, interests — everything the person has told about themselves and that doesn't change from conversation to conversation. This is the basic level: the character knows who they are talking to even before the conversation begins. They don't ask unnecessary questions. They don't introduce themselves as if to a stranger.
Emotional profile.
Fears, needs, communication style. This is the most important type of memory from the perspective of dialogue quality. If the character knows that the person fears criticism and needs support rather than advice — they will react differently. Not like a neutral assistant — but like someone who understands you.
Events.
Important things the person has mentioned in conversations: moved, argued with a friend, started learning a new language, changed jobs. These are not dry facts or emotions — they are life context. It's what gives the character the ability to ask a week later: "How did that interview you told me about go?" And this is no longer a bot — it's someone who truly listened.
In practice, this memory looks like a structured JSON stored in a database and transmitted to the character at the beginning of each new dialogue. Here's how it looks after a few conversations with a user:
// Long-term user memory
// accumulates automatically from each conversation
{
"semantic": {
// basic facts — who is this person
"name": "Andriy",
"occupation": "developer",
"interests": "AI, travel, Spanish language"
},
"emotional": {
// emotional profile — how to talk to them
"fears": "public speaking, criticism at work",
"needs": "support, not advice",
"communication_style": "direct, no unnecessary words"
},
"episodic": {
// life events — what's happening now
"recent_events": "changed jobs, preparing for a presentation, learning Spanish"
}
}
It is this object that the character receives at the beginning of each conversation — even before the user has typed the first word. There's no need to explain anything anew. No need to remind them who you are. The character already knows.
Why these specific attributes? I chose not for completeness — but for the ratio of value to token weight. Every field you add to this object is transmitted to the model with every request — forever. Name, job, fears, communication style — these are things that truly change the quality of the character's response. It's felt in the dialogue.
You can add more: favorite movies, political views, book lists. But there's a trap here. The more attributes there are — the more tokens are transmitted in each request. And tokens are not just money — they are also speed. The model processes longer contexts longer. The user waits for a response longer. And at some point, you pay for remembering a person's favorite color — which the model doesn't use in the dialogue anyway.
Therefore, the rule is simple: store only what the character can use in the next reply. Everything else is excess weight that slows down the response and eats up the budget without benefiting the user.
Together, these three types of memory give the character something that no sliding window can provide — the feeling that they know you. Not because they remember every word. But because they understand who you are, what is important to you, and what is happening in your life.
Does it work in practice
Yes. And the difference is noticeable after just a few conversations.
Previously, it looked like this: a user tells a character they are afraid of public speaking,
that it's hard for them to speak in front of an audience, that there's a big presentation at work tomorrow.
The next day, they open a new conversation — and receive a neutral greeting.
No mention. No "how are you?" with contextual understanding.
The character knows nothing. The conversation starts from scratch as always.
Now, the same situation looks different. The user opens a new conversation —
and the character already knows. Doesn't ask unnecessary questions. Can mention it themselves:
"How did that presentation you were so worried about go?"
Even if it was three conversations ago. Even if a week has passed.
This changes not only the quality of the dialogue — it changes the reason why people return.
The user doesn't return because the character is smart or responds beautifully.
They return because they feel remembered here.
And that's a different category of product.
But it's important to understand what's behind it. It's not magic or AGI.
A language model by itself remembers nothing — and will never remember
without external architecture. All it does well is use
what is passed to it. The developer's task is to decide what exactly to pass,
when, and in what form.
If your bot is still amnesiac — it's not the model's fault.
It's about what you're not passing to it.