Phase 2 · Intermediate · ⏱ 90 minutes

Memory & Context Management

This intermediate-level Python tutorial teaches you how to transform stateless AI applications into intelligent chatbots with memory. You'll master conversation history and context management, and build applications that remember past interactions using LangChain's memory systems.

🎯 Learning Objectives

  • Understand why memory is crucial for conversational AI
  • Implement different types of memory (Buffer, Summary, Window)
  • Manage token limits and optimize memory usage
  • Build context-aware applications with persistent memory

📚 Prerequisites for This Intermediate Chatbot Tutorial

This tutorial is perfect for developers building chatbots and conversational AI. You should have:

  • Completed Lessons 5-7 (LangChain basics, templates, and chains)
  • Intermediate Python programming skills
  • Basic understanding of chat applications
  • Familiarity with async programming (helpful)
🧠 Why Memory Matters

Without memory, each interaction with an AI is isolated. The AI doesn't remember previous questions, context, or your preferences. This leads to frustrating experiences where you need to repeat information.

❌ Without Memory:

User: My name is Alice

AI: Nice to meet you!

User: What's my name?

AI: I don't have that information.

✅ With Memory:

User: My name is Alice

AI: Nice to meet you, Alice!

User: What's my name?

AI: Your name is Alice.

🎯 Key Use Cases for Memory:

  • Customer Support: Remember customer details and previous issues
  • Personal Assistants: Learn user preferences and habits
  • Educational Tutors: Track learning progress and adapt to student needs
  • Multi-step Tasks: Maintain context through complex workflows
💾 Basic Memory Implementation

Memory with RunnableWithMessageHistory

The modern approach to memory in LangChain uses RunnableWithMessageHistory for better integration with the LCEL (LangChain Expression Language) system:

🔍 Understanding the Code:

  • InMemoryChatMessageHistory: Stores messages in memory
  • RunnableWithMessageHistory: Automatically manages conversation history
  • Session Management: Supports multiple concurrent conversations
  • Automatic History: No manual message management needed
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Store for managing multiple conversation sessions
store = {}

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    """Get or create a chat history for a session"""
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# Create a prompt with memory placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

# Initialize the model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

# Create a chain
chain = prompt | llm

# Wrap with message history
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Have a conversation (note the session_id in config)
config = {"configurable": {"session_id": "alice_session"}}

response1 = chain_with_history.invoke(
    {"input": "Hi, my name is Alice"}, 
    config=config
)
print("AI:", response1.content)

# Continue conversation - history is automatically managed!
response2 = chain_with_history.invoke(
    {"input": "What's my name?"}, 
    config=config
)
print("AI:", response2.content)

# Check the conversation history
history = get_session_history("alice_session")
print("\nConversation history:")
for message in history.messages:
    print(f"{message.__class__.__name__}: {message.content}")

💡 Expected Output:

AI: Hello Alice! Nice to meet you.
AI: Your name is Alice.

Conversation history:
HumanMessage: Hi, my name is Alice
AIMessage: Hello Alice! Nice to meet you.
HumanMessage: What's my name?
AIMessage: Your name is Alice.

✨ Advantages:

  • Automatic history management - no manual message tracking
  • Session support - handle multiple users/conversations
  • Clean integration with LCEL chains
  • Future-proof - this is the recommended approach

Managing Multiple Conversations

One of the key advantages of the modern approach is easy management of multiple concurrent conversations:

🔍 Key Features:

  • Session Isolation: Each session has its own memory
  • Easy Switching: Just change the session_id
  • Concurrent Users: Handle multiple users simultaneously
  • Memory Persistence: Can easily save/load sessions
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Store for managing multiple conversation sessions
store = {}

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

# Set up the chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

chain = prompt | llm

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Simulate multiple users
users = [
    ("alice_session", "Hi, I'm Alice and I love Python"),
    ("bob_session", "Hello, I'm Bob and I prefer JavaScript"),
    ("alice_session", "What's my favorite language?"),
    ("bob_session", "What language do I prefer?"),
]

for session_id, message in users:
    config = {"configurable": {"session_id": session_id}}
    response = chain_with_history.invoke({"input": message}, config=config)
    print(f"[{session_id}] User: {message}")
    print(f"[{session_id}] AI: {response.content}")
    print()

# Show that memories are separate
print("\n=== Session Histories ===")
for session_id in ["alice_session", "bob_session"]:
    history = get_session_history(session_id)
    print(f"\n{session_id}:")
    for msg in history.messages:
        print(f"  {msg.__class__.__name__}: {msg.content[:50]}...")

💡 Expected Output:

[alice_session] User: Hi, I'm Alice and I love Python
[alice_session] AI: Hello Alice! It's great to meet a Python enthusiast...

[bob_session] User: Hello, I'm Bob and I prefer JavaScript
[bob_session] AI: Hi Bob! JavaScript is a great language...

[alice_session] User: What's my favorite language?
[alice_session] AI: Your favorite language is Python!

[bob_session] User: What language do I prefer?
[bob_session] AI: You prefer JavaScript!

Note: Each session maintains its own separate conversation history, allowing the AI to remember different users' preferences.

🔄 Advanced LangChain Memory Types: Window, Summary, and Custom Solutions

Window Memory Pattern

Keep only the last K interactions to manage token usage with a custom implementation:

🔍 How Window Memory Works:

  • k=3: Keeps only the last 3 conversation exchanges (6 messages total)
  • FIFO Queue: Oldest messages are dropped when limit is reached
  • Token Efficient: Prevents context from growing indefinitely
  • Trade-off: May lose important context from earlier in conversation
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.schema import HumanMessage, AIMessage
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

class WindowedChatMessageHistory(InMemoryChatMessageHistory):
    """Chat message history that only keeps last k exchanges"""
    
    def __init__(self, k: int = 3):
        super().__init__()
        self._k = k  # Number of exchanges to keep (1 exchange = 2 messages)
        
    def add_messages(self, messages):
        """Add messages and maintain window size"""
        super().add_messages(messages)
        # Keep only last k exchanges (2k messages)
        if len(self.messages) > self._k * 2:
            self.messages = self.messages[-(self._k * 2):]

# Store for managing sessions with windowed memory
store = {}

def get_windowed_session_history(session_id: str) -> WindowedChatMessageHistory:
    if session_id not in store:
        store[session_id] = WindowedChatMessageHistory(k=3)  # Keep last 3 exchanges
    return store[session_id]

# Set up the chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

chain = prompt | llm

# Wrap with windowed message history
chain_with_window = RunnableWithMessageHistory(
    chain,
    get_windowed_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Simulate a longer conversation
config = {"configurable": {"session_id": "test_window"}}
user_inputs = [
    "Tell me about Python",
    "What are decorators?",
    "Give me an example",
    "What about generators?",
    "How do they save memory?",
    "What did we discuss first?"  # This won't remember Python intro
]

for user_input in user_inputs:
    print(f"\nUser: {user_input}")
    response = chain_with_window.invoke({"input": user_input}, config=config)
    print(f"AI: {response.content[:100]}...")  # Show first 100 chars

# Show what's in the windowed memory
print("\nWindow Memory (last 3 exchanges):")
history = get_windowed_session_history("test_window")
for msg in history.messages:
    print(f"{msg.__class__.__name__}: {msg.content[:50]}...")

💡 Expected Behavior:

The window memory correctly keeps only the last 3 exchanges (6 messages). When asking "What did we discuss first?", the memory doesn't contain the initial Python discussion.

Important: The AI might still attempt to answer based on context clues in the recent messages, but this answer may be incorrect since the actual first topic is no longer in memory. Check the printed "Window Memory" to see what's actually stored.

Note: This example shows how to implement window memory using the modern RunnableWithMessageHistory approach. By extending InMemoryChatMessageHistory, we can create custom memory behaviors while staying compatible with the latest LangChain architecture.

Summary Memory Pattern

Implement conversation summarization to preserve context while reducing tokens:

🔍 How Summary Memory Works:

  • LLM-Powered: Uses an LLM to create concise summaries
  • Token Limit: Triggers summarization when exceeding threshold
  • Progressive Summarization: Summarizes summary + new messages
  • Information Loss: Details may be lost in summarization
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.schema import SystemMessage, HumanMessage, AIMessage
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

class SummarizedChatMessageHistory(InMemoryChatMessageHistory):
    """Chat message history that summarizes old messages"""
    
    def __init__(self, llm, max_messages: int = 10):
        super().__init__()
        self._llm = llm
        self._max_messages = max_messages
        self._summary = ""
        
    def add_messages(self, messages):
        """Add messages and summarize if needed"""
        super().add_messages(messages)
        
        # If we exceed the limit, summarize older messages
        if len(self.messages) > self._max_messages:
            # Get messages to summarize (all but the last 4)
            messages_to_summarize = self.messages[:-4]
            
            # Create a summary prompt
            summary_prompt = f"""Previous summary: {self._summary}
            
New messages to add to summary:
{self._format_messages(messages_to_summarize)}

Create a concise summary of the conversation so far, preserving key information."""
            
            # Get summary from LLM
            summary_response = self._llm.invoke(summary_prompt)
            self._summary = summary_response.content
            
            # Keep only the summary and recent messages
            self.messages = [
                SystemMessage(content=f"Conversation summary: {self._summary}")
            ] + self.messages[-4:]
    
    def _format_messages(self, messages):
        """Format messages for summarization"""
        formatted = []
        for msg in messages:
            role = "Human" if isinstance(msg, HumanMessage) else "AI"
            formatted.append(f"{role}: {msg.content}")
        return "\n".join(formatted)

# Initialize model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

# Store for managing sessions with summary memory
store = {}

def get_summary_session_history(session_id: str) -> SummarizedChatMessageHistory:
    if session_id not in store:
        store[session_id] = SummarizedChatMessageHistory(llm, max_messages=10)
    return store[session_id]

# Set up the chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | llm

# Wrap with summary message history
chain_with_summary = RunnableWithMessageHistory(
    chain,
    get_summary_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Have a detailed conversation
config = {"configurable": {"session_id": "test_summary"}}
conversations = [
    "I want to build a web scraper for job listings. I need to scrape Indeed and LinkedIn.",
    "It should save data to PostgreSQL and send email alerts for new matches.",
    "The tech stack will be Python with BeautifulSoup and Selenium.",
    "Add scheduling with cron jobs to run daily.",
    "Include error handling and logging.",
    "What database did I mention earlier?"  # Should remember PostgreSQL
]

for msg in conversations:
    response = chain_with_summary.invoke({"input": msg}, config=config)
    print(f"User: {msg}")
    print(f"AI: {response.content[:150]}...")
    print()

# Show the summarized memory
print("\nMemory State:")
history = get_summary_session_history("test_summary")
for msg in history.messages:
    if isinstance(msg, SystemMessage):
        print(f"Summary: {msg.content}")
    else:
        print(f"{msg.__class__.__name__}: {msg.content[:50]}...")

💡 Example Summary Output:

The user wants to build a web scraper for job listings from 
Indeed and LinkedIn. The data will be stored in PostgreSQL with 
email alerts for new matches. Technology stack: Python with 
BeautifulSoup and Selenium.

🎯 When to Use Each:

Buffer: Short conversations, need exact history

Window: Ongoing conversations, recent context matters most

Summary: Long conversations, key points matter more than exact words
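A quick way to apply this guidance in a single application is a small factory that picks the history type per session. The sketch below is a minimal illustration: it assumes the WindowedChatMessageHistory and SummarizedChatMessageHistory classes defined above (plus the llm instance) are already in scope, and the "profile" labels are purely illustrative, not a LangChain concept.

from langchain_core.chat_history import InMemoryChatMessageHistory

store = {}

def get_history_for_profile(session_id: str, profile: str = "short"):
    """Pick a memory implementation based on the expected conversation length."""
    if session_id not in store:
        if profile == "short":
            # Buffer: short conversations, exact history needed
            store[session_id] = InMemoryChatMessageHistory()
        elif profile == "ongoing":
            # Window: recent context matters most
            store[session_id] = WindowedChatMessageHistory(k=3)
        else:
            # Long conversations: key points matter more than exact words
            store[session_id] = SummarizedChatMessageHistory(llm, max_messages=10)
    return store[session_id]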

Hybrid Memory Pattern (Summary + Buffer)

Combine the best of both approaches - keep recent messages in full detail while summarizing older ones:

🔍 Hybrid Memory Strategy:

  • Recent Messages: Kept in full detail for immediate context
  • Older Messages: Automatically summarized to save tokens
  • Smart Threshold: Summarizes when token limit is exceeded
  • Best Use Case: Long technical discussions needing both detail and history
# Note: ConversationSummaryBufferMemory is deprecated, but the concept
# can be implemented using the modern approach by combining our
# WindowedChatMessageHistory and SummarizedChatMessageHistory patterns

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.schema import SystemMessage, HumanMessage, AIMessage
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

class HybridChatMessageHistory(InMemoryChatMessageHistory):
    """Hybrid memory: summarize old messages, keep recent ones in full"""
    
    def __init__(self, llm, buffer_size: int = 4, max_messages: int = 6):
        super().__init__()
        self._llm = llm
        self._buffer_size = buffer_size  # Messages to keep in full
        self._max_messages = max_messages  # Total messages before summarizing
        self._summary = ""
        
    def add_messages(self, messages):
        """Add messages and apply hybrid strategy"""
        super().add_messages(messages)
        
        # If we exceed the limit, summarize older messages
        if len(self.messages) > self._max_messages:
            # Messages to summarize (all except the buffer)
            messages_to_summarize = self.messages[:-self._buffer_size]
            
            # Create summary including previous summary
            summary_prompt = f"""Previous summary: {self._summary}
            
New messages to add to summary:
{self._format_messages(messages_to_summarize)}

Create a concise summary preserving key technical details and decisions."""
            
            # Get summary
            summary_response = self._llm.invoke(summary_prompt)
            self._summary = summary_response.content
            
            # Keep summary + recent buffer
            self.messages = [
                SystemMessage(content=f"Previous conversation summary: {self._summary}")
            ] + self.messages[-self._buffer_size:]
    
    def _format_messages(self, messages):
        """Format messages for summarization"""
        formatted = []
        for msg in messages:
            role = "Human" if isinstance(msg, HumanMessage) else "AI"
            formatted.append(f"{role}: {msg.content}")
        return "\n".join(formatted)

# Initialize model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

# Store for sessions
store = {}

def get_hybrid_session_history(session_id: str) -> HybridChatMessageHistory:
    if session_id not in store:
        store[session_id] = HybridChatMessageHistory(llm, buffer_size=4, max_messages=6)
    return store[session_id]

# Set up chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful technical assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

chain = prompt | llm

chain_with_hybrid = RunnableWithMessageHistory(
    chain,
    get_hybrid_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

# Example: Technical discussion that evolves
config = {"configurable": {"session_id": "tech_discussion"}}
topics = [
    "I'm building a REST API with FastAPI",
    "I need authentication with JWT tokens",
    "The database will be PostgreSQL with SQLAlchemy",
    "How should I structure the project folders?",
    "What about testing strategies?",
    "Should I use Docker for deployment?"
]

for topic in topics:
    print(f"\nUser: {topic}")
    response = chain_with_hybrid.invoke({"input": topic}, config=config)
    print(f"AI: {response.content[:150]}...")

# Show hybrid memory state
print("\nHybrid Memory State:")
history = get_hybrid_session_history("tech_discussion")
for i, msg in enumerate(history.messages):
    if isinstance(msg, SystemMessage):
        print(f"[Summary] {msg.content[:100]}...")
    else:
        role = "Human" if isinstance(msg, HumanMessage) else "AI"
        print(f"[{role}] {msg.content[:50]}...")

💡 Expected Output:

User: I'm building a REST API with FastAPI
AI: Okay, that's great! FastAPI is an excellent choice...

[... 3 more exchanges ...]

User: What about testing strategies?
AI: Testing is absolutely critical for building reliable...

User: Should I use Docker for deployment?
AI: Yes, using Docker for deployment is highly recommended...

Hybrid Memory State:
[Summary] Previous conversation summary: User is building a FastAPI 
REST API with JWT authentication and PostgreSQL/SQLAlchemy database...
[Human] What about testing strategies?...
[AI] Testing is absolutely critical for building reliable...
[Human] Should I use Docker for deployment?...
[AI] Yes, using Docker for deployment is highly recommended...

🔍 How It Works:

  • Threshold Reached: Once the history grows beyond max_messages (6), summarization triggers
  • Summary Creation: Everything except the most recent buffer_size messages is summarized by the LLM
  • Buffer Retention: The last 4 messages (2 exchanges) are kept in full detail
  • Continuous Process: As new messages arrive, the oldest buffered messages roll into the summary

🎯 Perfect For:

  • Technical Discussions: Preserve decisions while keeping recent details
  • Project Planning: Remember requirements while discussing implementation
  • Customer Support: Maintain issue history while handling current problem
  • Long Conversations: Balance between context preservation and token efficiency

⚡ Implementation Note:

This custom implementation shows how to extend InMemoryChatMessageHistory to create sophisticated memory behaviors. By combining summarization (for history) with buffering (for recent context), we get the best of both worlds: comprehensive history without token explosion.

📐 Context Management Strategies

Smart Context Selection

Not all context is equally important. Here's how to prioritize:

🔍 Understanding Smart Context Selection:

  • Token Counting: Uses a simple character-based estimate (≈4 characters per token)
  • Relevance Scoring: Prioritizes messages by recency and keyword matching
  • Smart Selection: Keeps most relevant messages within token budget
  • Maintains Order: Preserves chronological order for coherent context
from typing import List
from langchain.schema import BaseMessage, HumanMessage, AIMessage

class SmartMemoryManager:
    def __init__(self, max_tokens: int = 2000):
        self.max_tokens = max_tokens
        
    def count_tokens(self, text: str) -> int:
        """Estimate token count (rough approximation)
        
        For English text:
        - 1 token ≈ 4 characters
        - 1 token ≈ 0.75 words
        
        This is a simplified estimation. For production use with Gemini,
        use the API's built-in token counting.
        """
        # Simple character-based estimation
        return len(text) // 4
    
    def select_relevant_context(
        self, 
        messages: List[BaseMessage], 
        current_query: str
    ) -> List[BaseMessage]:
        """Select most relevant messages within token limit"""
        
        selected = []
        token_count = 0
        
        # Priority order:
        # 1. Recent messages (last 3) get a recency bonus
        # 2. Messages containing key terms from the current query get a keyword bonus
        
        # Get key terms from current query
        key_terms = set(current_query.lower().split())
        
        # Score messages by relevance
        scored_messages = []
        for i, msg in enumerate(messages):
            score = 0
            msg_text = msg.content.lower()
            
            # Recent messages get higher score
            if i >= len(messages) - 3:
                score += 10
                
            # Messages with key terms get bonus
            matching_terms = sum(1 for term in key_terms if term in msg_text)
            score += matching_terms * 5
            
            scored_messages.append((score, i, msg))
        
        # Sort by score and select within token limit
        scored_messages.sort(reverse=True, key=lambda x: x[0])
        
        for score, idx, msg in scored_messages:
            msg_tokens = self.count_tokens(msg.content)
            if token_count + msg_tokens <= self.max_tokens:
                selected.append(msg)
                token_count += msg_tokens
                
        return sorted(selected, key=lambda x: messages.index(x))

# Usage example with lower token limit for demonstration
memory_manager = SmartMemoryManager(max_tokens=100)  # Very low limit to force selection

# In your chain
def get_relevant_history(query: str, full_history: List[BaseMessage]):
    return memory_manager.select_relevant_context(full_history, query)

# Example with sample messages
from langchain.schema import HumanMessage, AIMessage

# Sample conversation history (longer messages to consume more tokens)
messages = [
    HumanMessage(content="Hi, I'm learning Python and want to understand programming fundamentals"),
    AIMessage(content="Great! Python is a wonderful language to learn. It's known for its simplicity and readability."),
    HumanMessage(content="I want to build a web scraper to extract data from websites"),
    AIMessage(content="Web scraping is a useful skill. You can use libraries like BeautifulSoup for parsing HTML and Selenium for dynamic sites."),
    HumanMessage(content="Tell me about databases and how they work with applications"),
    AIMessage(content="Databases store and organize data efficiently. Popular ones include PostgreSQL for relational data and MongoDB for documents."),
    HumanMessage(content="How do I connect Python to a database for my projects?"),
    AIMessage(content="You can use libraries like psycopg2 for PostgreSQL, pymongo for MongoDB, or SQLAlchemy as an ORM for multiple databases."),
]

# Select relevant context for a new query
current_query = "What libraries do I need for web scraping?"
relevant_messages = memory_manager.select_relevant_context(messages, current_query)

print(f"Query: {current_query}")
print(f"Token limit: {memory_manager.max_tokens}")
print(f"Selected {len(relevant_messages)} out of {len(messages)} messages")
print(f"\nSelected messages (most relevant within token limit):")

# Calculate approximate tokens for each selected message
total_tokens = 0
for msg in relevant_messages:
    tokens = memory_manager.count_tokens(msg.content)
    total_tokens += tokens
    print(f"- {msg.type} ({tokens} tokens): {msg.content[:50]}...")

print(f"\nTotal tokens used: {total_tokens}/{memory_manager.max_tokens}")
print(f"\nNote: Messages about 'web scraping' and 'libraries' received higher scores")

📏 Token Counting Note:

This example uses a simple character-based token estimation (1 token ≈ 4 characters). While this is less accurate than using a proper tokenizer, it demonstrates the concept without requiring additional dependencies.

For production: Use Gemini's built-in model.count_tokens() method for accurate token counting.
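If you want a more accurate count without changing the selection logic, one option is to override count_tokens. The sketch below routes counting through LangChain's get_num_tokens() on the Gemini chat model; whether that delegates to Gemini's own tokenizer depends on your langchain-google-genai version, so treat it as an assumption to verify rather than a guarantee.

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

class ModelTokenMemoryManager(SmartMemoryManager):
    """SmartMemoryManager variant that asks the model wrapper for token counts."""
    def count_tokens(self, text: str) -> int:
        # Assumption: get_num_tokens() returns a count close to Gemini's tokenizer
        return llm.get_num_tokens(text)

memory_manager = ModelTokenMemoryManager(max_tokens=100)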

💡 How It Works:

With a 100-token limit, the system must be selective. It prioritizes messages containing keywords from the query ("libraries", "web", "scraping") and recent messages. The algorithm:

  1. Scores each message based on keyword matches and recency
  2. Sorts messages by score (highest first)
  3. Adds messages until the token limit is reached
  4. Returns selected messages in chronological order

🎯 Context Optimization Tips:

  • Prioritize recent messages for continuity
  • Keep messages with entities (names, dates, numbers)
  • Preserve messages that establish context or goals
  • Consider semantic similarity to current query

🚀 Limitations & Next Steps:

This example uses simple keyword matching, which has limitations:

  • May miss semantically related content (e.g., "web scraping" vs "data extraction")
  • Can be fooled by keyword presence without actual relevance
  • Doesn't understand context or meaning

For production systems, consider: Semantic search with embeddings, vector databases (Pinecone, Weaviate), or LangChain's built-in retrieval methods for more intelligent context selection.
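As a taste of the embeddings route, here is a minimal sketch of semantic relevance scoring. It assumes GoogleGenerativeAIEmbeddings from langchain_google_genai and the "models/text-embedding-004" model name are available in your environment; a production system would cache embeddings rather than recompute them on every query.

import math
from typing import List
from langchain.schema import BaseMessage
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors"""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_top_k(messages: List[BaseMessage], query: str, k: int = 4) -> List[BaseMessage]:
    """Return the k messages most semantically similar to the query, in chronological order."""
    query_vec = embeddings.embed_query(query)
    msg_vecs = embeddings.embed_documents([m.content for m in messages])
    ranked = sorted(zip(messages, msg_vecs), key=lambda mv: cosine(query_vec, mv[1]), reverse=True)
    top = [m for m, _ in ranked[:k]]
    return sorted(top, key=lambda m: messages.index(m))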

💿 Persistent Memory Storage

Saving Conversations Across Sessions

For production applications, you need to persist memory between sessions:

🔍 Key Features of This Implementation:

  • File-Based Storage: Saves conversations as JSON files per user
  • Automatic Loading: Restores previous conversation on startup
  • Error Handling: Gracefully handles missing or corrupted files
  • Clear Function: Allows users to reset their conversation history
import json
import os
from datetime import datetime
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.schema import messages_from_dict, messages_to_dict, HumanMessage, AIMessage
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

class PersistentChatMessageHistory(InMemoryChatMessageHistory):
    """Custom chat history that persists to disk"""
    def __init__(self, user_id: str, storage_dir: str = "./memory_store"):
        super().__init__()
        # Use private attributes to avoid Pydantic validation
        # InMemoryChatMessageHistory uses Pydantic, which doesn't allow arbitrary attributes
        self._user_id = user_id
        self._storage_dir = storage_dir
        self._memory_file = os.path.join(storage_dir, f"{user_id}_memory.json")
        
        # Create storage directory if it doesn't exist
        os.makedirs(storage_dir, exist_ok=True)
        
        # Load existing messages
        self.load_messages()
    
    def load_messages(self):
        """Load messages from disk"""
        if os.path.exists(self._memory_file):
            try:
                with open(self._memory_file, 'r') as f:
                    data = json.load(f)
                    messages = messages_from_dict(data['messages'])
                    self.messages.extend(messages)
                print(f"Loaded {len(messages)} messages from memory")
            except Exception as e:
                print(f"Error loading memory: {e}")
    
    def save_messages(self):
        """Save messages to disk"""
        try:
            messages_dict = messages_to_dict(self.messages)
            data = {
                'user_id': self._user_id,
                'last_updated': datetime.now().isoformat(),
                'messages': messages_dict
            }
            with open(self._memory_file, 'w') as f:
                json.dump(data, f, indent=2)
        except Exception as e:
            print(f"Error saving memory: {e}")
    
    def add_messages(self, messages):
        """Override to save after new messages are added (this is what RunnableWithMessageHistory calls)"""
        super().add_messages(messages)
        self.save_messages()
    
    def clear(self):
        """Clear all messages and delete file"""
        super().clear()
        if os.path.exists(self._memory_file):
            os.remove(self._memory_file)
        print(f"Cleared memory for user {self._user_id}")

# Store for managing multiple user sessions
session_stores = {}

def get_persistent_session_history(user_id: str):
    """Get or create persistent session history for a user"""
    if user_id not in session_stores:
        session_stores[user_id] = PersistentChatMessageHistory(user_id)
    return session_stores[user_id]

# Usage example
def chat_with_persistent_memory(user_id: str):
    # Initialize LLM
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
    
    # Create prompt template
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant. Remember our conversation history."),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}")
    ])
    
    # Create chain
    chain = prompt | llm
    
    # Wrap with message history
    chain_with_history = RunnableWithMessageHistory(
        chain,
        get_persistent_session_history,  # called with the session_id from config (we pass user_id as session_id)
        input_messages_key="input",
        history_messages_key="history",
    )
    
    # Chat loop
    print(f"\nChatting as user: {user_id}")
    print("Type 'quit' to exit, 'clear' to clear memory\n")
    
    while True:
        user_input = input("You: ")
        
        if user_input.lower() == 'quit':
            break
        elif user_input.lower() == 'clear':
            get_persistent_session_history(user_id).clear()
            print("Memory cleared!")
            continue
            
        # Get response
        response = chain_with_history.invoke(
            {"input": user_input},
            config={"configurable": {"session_id": user_id}}
        )
        print(f"AI: {response.content}")

# Try it with different users
chat_with_persistent_memory("alice")
# Run again - memory will be loaded!
# chat_with_persistent_memory("alice")

💡 How to Use:

# First session
>>> chat_with_persistent_memory("alice")
You: Hi, I'm learning Python
AI: That's great! Python is a wonderful language...
You: quit

# Later session - memory is restored!
>>> chat_with_persistent_memory("alice")
You: What was I learning?
AI: You mentioned you were learning Python...

🏢 Production Considerations:

  • Use a database (PostgreSQL, MongoDB) instead of JSON files (see the SQLite sketch below)
  • Implement user authentication and access control
  • Add encryption for sensitive conversations
  • Set up automatic cleanup for old conversations
  • Consider GDPR compliance for user data
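To make the database point concrete, here is a minimal sketch that swaps the JSON file for SQLite from the standard library, reusing the same "extend InMemoryChatMessageHistory" pattern shown above. The table and column names are illustrative assumptions; a real deployment would use PostgreSQL or MongoDB with proper connection management.

import json
import sqlite3
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain.schema import messages_from_dict, messages_to_dict

class SQLiteChatMessageHistory(InMemoryChatMessageHistory):
    """Chat history persisted to a SQLite table, one row per session"""

    def __init__(self, session_id: str, db_path: str = "memory.db"):
        super().__init__()
        self._session_id = session_id
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS chat_history (session_id TEXT PRIMARY KEY, messages TEXT)"
        )
        # Restore any previously saved conversation for this session
        row = self._conn.execute(
            "SELECT messages FROM chat_history WHERE session_id = ?", (session_id,)
        ).fetchone()
        if row:
            self.messages.extend(messages_from_dict(json.loads(row[0])))

    def add_messages(self, messages):
        """Persist the full history after each new batch of messages"""
        super().add_messages(messages)
        self._conn.execute(
            "INSERT OR REPLACE INTO chat_history (session_id, messages) VALUES (?, ?)",
            (self._session_id, json.dumps(messages_to_dict(self.messages))),
        )
        self._conn.commit()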

✨ Memory Best Practices

Design Decisions

  • Choose memory type based on conversation length
  • Consider token costs vs. context quality
  • Implement memory limits to prevent abuse
  • Plan for memory migration as you scale

Performance Tips

  • Cache frequently accessed memories
  • Use async operations for memory I/O (see the sketch below)
  • Implement memory compression for storage
  • Monitor memory usage and costs
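Because RunnableWithMessageHistory is a standard Runnable, async calls come for free via ainvoke(). A minimal sketch, assuming the chain_with_history object from the earlier example is in scope:

import asyncio

async def chat_async(user_input: str, session_id: str) -> str:
    """Non-blocking chat turn; memory updates happen as part of the call"""
    config = {"configurable": {"session_id": session_id}}
    response = await chain_with_history.ainvoke({"input": user_input}, config=config)
    return response.content

print(asyncio.run(chat_async("Hi, my name is Alice", "alice_session")))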

💡 Common Memory Patterns

Customer Support Bot

Buffer (current issue) + Summary (customer history) + Entity extraction (order IDs, etc.)
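A minimal sketch of the entity-extraction piece of this pattern: a history class that also collects order IDs as they appear in the conversation. The ORD-1234 format and regex are illustrative assumptions; combine this with the buffer and summary classes above for the full pattern.

import re
from langchain_core.chat_history import InMemoryChatMessageHistory

class SupportChatMessageHistory(InMemoryChatMessageHistory):
    """Chat history that also tracks order IDs mentioned in the conversation"""

    def __init__(self):
        super().__init__()
        self._order_ids = set()

    def add_messages(self, messages):
        super().add_messages(messages)
        for msg in messages:
            # Assumption: order IDs look like ORD-1234
            self._order_ids.update(re.findall(r"ORD-\d+", msg.content))

    @property
    def order_ids(self):
        return sorted(self._order_ids)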

Personal Assistant

Window (recent context) + Long-term memory (preferences) + Calendar integration

Code Assistant

Project context + Current file + Recent edits + Error history

🎉 Next Step: From Chatbots to Intelligent Agents

Excellent work! You've mastered LangChain memory management - a crucial skill for building intelligent chatbots and conversational AI. You can now create applications that remember context, manage conversation history, and provide personalized experiences.

Ready to take your AI to the next level? In the next lesson, you'll learn about Agents and Tools - how to create AI systems that can take actions, use external tools, and solve complex problems autonomously.