Phase 3 · Advanced · ⏱ 90 minutes

Vector Databases & Storage Solutions

Learn how to use vector databases with LangChain for retrieval augmented generation. This tutorial covers ChromaDB, Pinecone, Weaviate, Qdrant, and FAISS implementations for production RAG applications.

🎯 Learning Objectives

  • Compare ChromaDB, Pinecone, Weaviate, Qdrant, and FAISS for LangChain RAG
  • Implement vector search with LangChain vector stores
  • Optimize LangChain vector stores for speed, cost, and scale
  • Build production-ready RAG vector database systems

🗄️ How to Use Vector Databases with LangChain for RAG

🔍 Understanding Your Options

Learn how to implement vector search for retrieval augmented generation using LangChain vector stores. Compare ChromaDB, Pinecone, Weaviate, Qdrant, Milvus, and FAISS for your RAG applications.

🌲Pinecone

Popular LangChain vector store for cloud-based retrieval augmented generation.

  • Serverless, no infrastructure
  • Real-time updates
  • Metadata filtering
  • Requires API key

🎨ChromaDB

Most popular open-source LangChain vector store for RAG tutorials.

  • Run locally or cloud
  • Simple API
  • Persistent storage
  • Limited scaling

FAISS

Meta's (Facebook AI Research) open-source similarity search library for large-scale RAG applications with LangChain.

  • Extremely fast
  • GPU support
  • Multiple algorithms
  • No built-in persistence

🌟 Other Popular Vector Databases for RAG:

  • Weaviate: Open-source vector database with hybrid search for LangChain RAG
  • Qdrant: High-performance vector similarity search for retrieval augmented generation (see the quick-start sketch after this list)
  • Milvus: Cloud-native embedding database supporting billion-scale RAG applications
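
Weaviate and Qdrant plug into LangChain through the same vector store interface as the databases covered below. As a minimal, hedged sketch of what that looks like (the collection name and sample texts are illustrative, and an in-memory Qdrant instance is assumed):

from langchain.vectorstores import Qdrant
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# In-memory Qdrant instance; pass url="http://localhost:6333" for a real server
vectorstore = Qdrant.from_texts(
    texts=[
        "Qdrant offers high-performance vector similarity search.",
        "Weaviate adds hybrid vector + keyword search out of the box.",
    ],
    embedding=embeddings,
    location=":memory:",
    collection_name="rag_demo",  # illustrative collection name
)

print(vectorstore.similarity_search("Which engine is built for hybrid search?", k=1))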

Feature Comparison

Feature            | Pinecone | ChromaDB    | FAISS
Deployment         | Cloud    | Local/Cloud | Local
Scale              | Billions | Millions    | Billions
Real-time Updates  | Yes      | Yes         | Limited
Cost               | $$       | Free        | Free

💻 LangChain Vector Store Implementation Tutorial

🌲How to Implement Pinecone Vector Search for RAG

from langchain.vectorstores import Pinecone
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import pinecone  # classic pinecone-client (v2-style init/environment API)
import os

# Initialize Pinecone
pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment=os.getenv("PINECONE_ENV")
)

# Create or connect to index
index_name = "my-rag-index"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=768,  # Gemini embedding dimension
        metric="cosine",
        pod_type="p1.x1"
    )

# Initialize embeddings
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001"
)

# Create vector store
vectorstore = Pinecone.from_existing_index(
    index_name=index_name,
    embedding=embeddings,
    namespace="documents"  # Optional namespace
)

# Add documents with metadata
documents = [
    {
        "page_content": "LangChain is a framework for building LLM applications.",
        "metadata": {"source": "docs", "topic": "introduction"}
    },
    {
        "page_content": "Vector databases store embeddings for similarity search.",
        "metadata": {"source": "tutorial", "topic": "databases"}
    }
]

# Add to index
vectorstore.add_texts(
    texts=[doc["page_content"] for doc in documents],
    metadatas=[doc["metadata"] for doc in documents]
)

# Search with metadata filtering
results = vectorstore.similarity_search(
    "What is LangChain?",
    k=3,
    filter={"source": "docs"}  # Metadata filter
)

📖 Code Explanation:

  • pinecone.init(): Connects to Pinecone cloud service using API credentials
  • create_index(): Creates new index with 768 dimensions (matching Gemini embeddings)
  • metric="cosine": Uses cosine similarity for vector comparisons
  • namespace: Logical partitions within an index for organizing data
  • metadata filtering: Query only documents matching specific criteria

🎯 Pinecone Key Concepts:

Index = Database:

Each index is like a database table optimized for vector search. Choose a pod type based on your needs: s1 (storage-optimized, lowest cost), p1 (performance-optimized), p2 (lowest latency, highest throughput).

Namespaces = Partitions:

Use namespaces to separate data logically (e.g., by user, project, or document type) without creating multiple indexes.

💡 Expected Behavior:

After running this code, you'll have documents indexed in Pinecone cloud. The search returns the most semantically similar documents, filtered by metadata.

⚠️ Production Tips:

  • Batch operations: Use upsert() with batches of 100 vectors (see the batching sketch below)
  • Error handling: Implement exponential backoff for rate limits
  • Cost optimization: Delete unused indexes, use appropriate pod types
  • Monitoring: Track index stats via Pinecone console
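
To make the batching and retry tips concrete, here is a hedged sketch (not an official Pinecone pattern) that pushes texts through the LangChain vector store in batches and backs off exponentially on transient errors. The batch size, retry count, and helper name are illustrative, and the broad exception handler should be narrowed to your client's rate-limit error in real code:

import time

def add_texts_in_batches(vectorstore, texts, metadatas, batch_size=100, max_retries=5):
    """Illustrative helper: batched adds with exponential backoff."""
    for start in range(0, len(texts), batch_size):
        batch_texts = texts[start:start + batch_size]
        batch_meta = metadatas[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                # add_texts() upserts this batch into the Pinecone index
                vectorstore.add_texts(texts=batch_texts, metadatas=batch_meta)
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...

# Usage with the documents list defined above
add_texts_in_batches(
    vectorstore,
    texts=[doc["page_content"] for doc in documents],
    metadatas=[doc["metadata"] for doc in documents],
)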

🎨ChromaDB Tutorial: Local Vector Store for Retrieval Augmented Generation

from langchain.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader

# Initialize embeddings
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001"
)

# Load and split documents
loader = DirectoryLoader("./documents", glob="**/*.txt")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# Create persistent Chroma database
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db",
    collection_name="my_collection"
)

# Persist to disk
vectorstore.persist()

# Load existing database
loaded_vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
    collection_name="my_collection"
)

# Advanced search with scores
results_with_scores = loaded_vectorstore.similarity_search_with_score(
    "How to build RAG applications?",
    k=5
)

for doc, score in results_with_scores:
    print(f"Score: {score:.3f}")
    print(f"Content: {doc.page_content[:100]}...")
    print(f"Metadata: {doc.metadata}\n")

📖 Code Explanation:

  • DirectoryLoader: Loads all .txt files from a directory recursively
  • RecursiveCharacterTextSplitter: Splits documents into overlapping chunks
  • persist_directory: Local folder where Chroma stores the vector database
  • collection_name: Named collection within the database (like a table)
  • similarity_search_with_score(): Returns documents with similarity scores

🎯 Chroma Architecture:

Storage Structure:
  • SQLite for metadata
  • Parquet files for embeddings
  • DuckDB for queries
  • Automatic persistence

Collection Features:
  • Multiple collections per DB
  • Metadata filtering
  • Full-text search support
  • Automatic ID generation

💡 Expected Output:

Score: 0.234
Content: RAG applications combine retrieval with generation for accurate responses...
Metadata: {'source': 'documents/rag_guide.txt', 'page': 1}

Score: 0.356
Content: Building RAG systems requires vector databases and embedding models...
Metadata: {'source': 'documents/tutorial.txt', 'page': 3}

⚠️ Chroma Best Practices:

  • Persistence: Always call persist() after updates
  • IDs: Let Chroma auto-generate IDs unless you need specific ones
  • Batch size: Add documents in batches of 5000 for optimal performance (see the sketch below)
  • Client/Server: Use server mode for production deployments
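
As a sketch of the batch-size tip, new chunks can be indexed in slices after the initial build. This assumes the loader, text splitter, and loaded_vectorstore from the code above; the 5000 figure follows this tutorial's guideline rather than a hard Chroma limit:

BATCH_SIZE = 5000

# Re-split any newly added files and index them in slices
new_splits = text_splitter.split_documents(loader.load())

for start in range(0, len(new_splits), BATCH_SIZE):
    batch = new_splits[start:start + BATCH_SIZE]
    loaded_vectorstore.add_documents(batch)

loaded_vectorstore.persist()  # flush to ./chroma_db, matching the example above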

FAISS Vector Database: Scalable RAG with LangChain

from langchain.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Initialize embeddings
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001"
)

# Create FAISS index from texts
texts = [
    "FAISS is optimized for similarity search.",
    "It supports both CPU and GPU operations.",
    "FAISS can handle billions of vectors efficiently.",
    "Multiple index types are available for different use cases."
]

metadata = [
    {"id": 1, "category": "intro"},
    {"id": 2, "category": "features"},
    {"id": 3, "category": "scale"},
    {"id": 4, "category": "types"}
]

# Create FAISS vector store
vectorstore = FAISS.from_texts(
    texts=texts,
    embedding=embeddings,
    metadatas=metadata
)

# Save index to disk
vectorstore.save_local("faiss_index")

# Load index from disk
loaded_vectorstore = FAISS.load_local(
    "faiss_index", 
    embeddings,
    allow_dangerous_deserialization=True
)

# Different search methods
# 1. Basic similarity search
results = loaded_vectorstore.similarity_search(
    "How many vectors can FAISS handle?",
    k=2
)

# 2. Search with distance scores
results_with_scores = loaded_vectorstore.similarity_search_with_score(
    "GPU support",
    k=2
)

# 3. Maximum marginal relevance search (diversity)
diverse_results = loaded_vectorstore.max_marginal_relevance_search(
    "FAISS features",
    k=3,
    fetch_k=10,  # Fetch more candidates
    lambda_mult=0.5  # Balance relevance and diversity
)

print("Standard Search Results:")
for doc in results:
    print(f"- {doc.page_content}")

print("\nSearch with Scores:")
for doc, score in results_with_scores:
    print(f"- [{score:.3f}] {doc.page_content}")

📖 Code Explanation:

  • FAISS.from_texts(): Creates index from text directly (handles embedding internally)
  • save_local()/load_local(): Serializes index to disk for persistence
  • allow_dangerous_deserialization: Required when loading pickled data
  • max_marginal_relevance_search(): Balances relevance with diversity in results
  • fetch_k: Number of candidates to consider before applying MMR algorithm

🎯 FAISS Index Types:

Flat (Default):

Exact search, no approximation. Best accuracy but O(n) search time. Good for <100k vectors.

IVF (Inverted File):

Clusters vectors, searches nearby clusters. Faster but approximate. Good for millions of vectors.

HNSW (Hierarchical NSW):

Graph-based index, excellent recall/speed trade-off. Best for production use cases.
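
The LangChain wrapper above builds a flat index by default; the other index types come from the underlying faiss library. A minimal sketch using raw faiss (the dimension, nlist, and connectivity values are illustrative):

import numpy as np
import faiss

d = 768  # embedding dimension (matches models/embedding-001)
xb = np.random.random((10_000, d)).astype("float32")  # stand-in vectors

# Flat: exact search, no training required
flat_index = faiss.IndexFlatL2(d)
flat_index.add(xb)

# IVF: cluster vectors into nlist cells, probe nprobe cells per query
nlist = 100
quantizer = faiss.IndexFlatL2(d)
ivf_index = faiss.IndexIVFFlat(quantizer, d, nlist)
ivf_index.train(xb)      # IVF indices must be trained before adding vectors
ivf_index.add(xb)
ivf_index.nprobe = 10    # higher nprobe = better recall, slower queries

# HNSW: graph-based, strong recall/speed trade-off, no training step
hnsw_index = faiss.IndexHNSWFlat(d, 32)  # 32 = graph connectivity (M)
hnsw_index.add(xb)

# All three expose the same search API
distances, ids = ivf_index.search(xb[:1], 5)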

💡 Search Method Comparison:

Standard Search Results:
- FAISS can handle billions of vectors efficiently.
- FAISS is optimized for similarity search.

Search with Scores:
- [0.123] It supports both CPU and GPU operations.
- [0.234] Multiple index types are available for different use cases.

MMR (Diverse) Results:
- FAISS is optimized for similarity search.
- It supports both CPU and GPU operations.
- Multiple index types are available for different use cases.

⚠️ FAISS Optimization:

  • GPU acceleration: Use faiss.index_cpu_to_gpu() for 10-100x speedup (see the sketch below)
  • Index training: Train IVF indices on representative data subset
  • Quantization: Use PQ (Product Quantization) to reduce memory 8-32x
  • Batch search: Search multiple queries simultaneously for efficiency
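
A hedged sketch of the GPU and batch-search tips using raw faiss (requires the faiss-gpu build and a CUDA device; sizes are illustrative):

import numpy as np
import faiss

d = 768
cpu_index = faiss.IndexFlatL2(d)
cpu_index.add(np.random.random((100_000, d)).astype("float32"))

# Move the index to GPU 0 for much faster search
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)

# Batch search: pass all queries as one matrix instead of looping one by one
queries = np.random.random((64, d)).astype("float32")
distances, ids = gpu_index.search(queries, 5)  # one call answers 64 queries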

🚀 Optimizing LangChain Vector Stores for Production RAG

📊 Indexing Strategies

  • Use appropriate index types (IVF, HNSW, LSH)
  • Configure nlist and nprobe parameters
  • Pre-train indices on representative data
  • Implement sharding for large datasets

⚡ Query Optimization

  • Batch queries when possible
  • Use metadata pre-filtering
  • Implement caching strategies (see the embedding-cache sketch below)
  • Optimize embedding dimensions
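
One way to implement the caching tip is LangChain's CacheBackedEmbeddings, which memoizes embedding calls in a byte store so repeated texts are never re-embedded. A hedged sketch (the cache directory is illustrative):

from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_google_genai import GoogleGenerativeAIEmbeddings

underlying = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
store = LocalFileStore("./embedding_cache")  # illustrative cache location

cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    underlying,
    store,
    namespace=underlying.model,  # keep caches for different models separate
)

# Drop-in replacement wherever an embeddings object is expected, e.g.:
# Chroma(persist_directory="./chroma_db", embedding_function=cached_embeddings)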

💾 Storage Optimization

  • Compress embeddings with PQ/OPQ
  • Use appropriate data types
  • Implement efficient serialization
  • Clean up outdated vectors

🔧 System Optimization

  • Use GPU acceleration when available
  • Optimize memory allocation
  • Implement connection pooling
  • Monitor and profile performance

🔬 Advanced Techniques

Hybrid Search Implementation

🔍 Understanding Hybrid Search:

Hybrid search combines semantic search (embeddings) with keyword search (BM25) to get the best of both worlds. Semantic search understands meaning, while keyword search excels at exact matches and rare terms.

from langchain.retrievers import EnsembleRetriever
from langchain.retrievers import BM25Retriever
from langchain.vectorstores import Chroma
from langchain.schema import Document

# Create vector store for semantic search (reuses the embeddings object from the earlier examples)
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings
)
vector_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5}
)

# Create BM25 retriever for keyword search (requires the rank_bm25 package);
# Chroma's get() returns plain dicts, so rebuild Document objects first
raw = vectorstore.get()
keyword_docs = [
    Document(page_content=text, metadata=meta or {})
    for text, meta in zip(raw["documents"], raw["metadatas"])
]
bm25_retriever = BM25Retriever.from_documents(keyword_docs)
bm25_retriever.k = 5

# Combine retrievers with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    weights=[0.7, 0.3]  # 70% semantic, 30% keyword
)

# Use hybrid search
results = ensemble_retriever.get_relevant_documents(
    "What are the benefits of vector databases?"
)

# Metadata filtering with complex queries

def advanced_search(query, filters):
    # Semantic search with pre-filtering
    vector_results = vectorstore.similarity_search(
        query,
        k=10,
        filter=filters
    )
    
    # Post-process results
    processed_results = []
    for doc in vector_results:
        # Custom scoring logic (calculate_custom_score and extract_highlights
        # are placeholders for your own ranking and highlighting helpers)
        relevance_score = calculate_custom_score(doc, query)
        if relevance_score > 0.7:
            processed_results.append({
                "document": doc,
                "score": relevance_score,
                "highlights": extract_highlights(doc, query)
            })
    
    return sorted(processed_results, key=lambda x: x["score"], reverse=True)

# Example usage
results = advanced_search(
    "machine learning applications",
    filters={
        "$and": [
            {"category": {"$in": ["ml", "ai"]}},
            {"date": {"$gte": "2023-01-01"}},
            {"quality_score": {"$gt": 0.8}}
        ]
    }
)

📖 Code Explanation:

  • EnsembleRetriever: Combines multiple retrievers with weighted scoring
  • BM25Retriever: Traditional keyword-based search algorithm (Best Match 25)
  • weights=[0.7, 0.3]: 70% weight to semantic search, 30% to keyword search
  • Complex filters: MongoDB-style query syntax for advanced filtering
  • Post-processing: Custom scoring and highlight extraction for better UX

🎯 When to Use Hybrid Search:

✅ Good Use Cases:
  • Technical documentation
  • Legal documents
  • Product catalogs
  • Mixed query types

🤔 Consider Pure Semantic:
  • Conversational queries
  • Conceptual search
  • Multi-language content
  • Synonym-heavy domains

⚠️ Advanced Filtering Tips:

  • Pre-filter vs Post-filter: Pre-filtering is faster but may miss relevant results
  • Index metadata: Only index fields you'll actually filter on
  • Denormalize data: Store computed values rather than calculating at query time
  • Test filter performance: Complex filters can significantly impact speed

✨ Best Practices for Production

Data Management

  • Version your embeddings and models
  • Implement incremental updates
  • Monitor index quality metrics
  • Plan for data growth and scaling

Security & Reliability

  • Implement access controls
  • Regular backups and disaster recovery
  • Monitor performance and availability
  • Use connection retry logic

⚠️ Common Pitfalls to Avoid

Embedding Dimension Mismatch

Problem: Using different embedding models for indexing and querying
Solution: Always use the same embedding model and dimension
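
A quick sanity check before indexing or querying, sketched under the assumptions used throughout this tutorial (768 is the dimension of models/embedding-001):

# Confirm the query-time embedding length matches the index dimension
expected_dim = 768
actual_dim = len(embeddings.embed_query("dimension check"))
assert actual_dim == expected_dim, (
    f"Embedding dimension {actual_dim} does not match index dimension {expected_dim}"
)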

Ignoring Metadata

Problem: Not utilizing metadata for filtering and context
Solution: Design comprehensive metadata schema from the start

Poor Chunk Strategy

Problem: Chunks too large or too small for effective retrieval
Solution: Experiment with chunk sizes and overlap for your use case
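
A small sketch for comparing chunking configurations before committing to one (it reuses the documents loaded with DirectoryLoader in the ChromaDB example; the size/overlap pairs are illustrative):

from langchain.text_splitter import RecursiveCharacterTextSplitter

for chunk_size, overlap in [(500, 50), (1000, 200), (2000, 200)]:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
    )
    chunks = splitter.split_documents(documents)
    print(f"chunk_size={chunk_size}, overlap={overlap} -> {len(chunks)} chunks")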

🎉 Next Steps

Excellent work mastering LangChain vector stores for retrieval augmented generation! You now know how to implement ChromaDB, Pinecone, and FAISS with LangChain, and how they compare with Weaviate, Qdrant, and Milvus. Next, you'll build your first complete retrieval augmented generation system from scratch.