Initial commit: multimodal RAG guide with Claude Code
Prompt-driven guide for building multimodal search using Gemini Embedding 2 + Pinecone + Claude Code. Includes example data (NASA public domain), step-by-step prompts, concepts explainer, cost breakdown, and troubleshooting guide.
# Concepts: What You Need to Know (and Nothing More)

This page explains the key ideas behind multimodal search. You do not need to understand these concepts to follow the guide, but if you are curious about what is happening behind the scenes, this is for you.
## What is an embedding?

Think of it as a fingerprint for meaning.

When you read the sentence "Jupiter is the largest planet," your brain understands what it means. An embedding is a way for a computer to do something similar. It converts text (or an image) into a long list of numbers that captures the meaning of that content.

The key insight: content with similar meaning gets similar numbers. So "Jupiter is massive" and "Jupiter is the biggest planet" would have very similar embeddings, even though the words are different.

You never see these numbers. They work behind the scenes.
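
As a toy illustration of "similar meaning, similar numbers": the standard way to compare two embeddings is cosine similarity. The three-number vectors below are made up for this example; real embeddings from a model like Gemini have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two embeddings: near 1.0 = very similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-number "embeddings" (invented for illustration).
jupiter_massive = [0.9, 0.8, 0.1]     # "Jupiter is massive"
jupiter_biggest = [0.88, 0.82, 0.12]  # "Jupiter is the biggest planet"
cake_recipe = [0.1, 0.2, 0.95]        # "How to bake a cake"

print(cosine_similarity(jupiter_massive, jupiter_biggest))  # close to 1.0
print(cosine_similarity(jupiter_massive, cake_recipe))      # much lower
```

The two Jupiter sentences score close to 1.0 despite sharing almost no words; the cake recipe scores far lower. That is the entire trick behind semantic search.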

## What is a vector database?

A place to store embeddings so you can search through them quickly.

Imagine a library where books are not organized by author or title, but by what they are about. You walk in and say "I want something about storms on other planets" and the librarian immediately hands you the right book. That is what a vector database does, but with your files.

We use Pinecone in this guide because it has a free tier and works well. There are other options (Chroma, Weaviate, Qdrant), but Pinecone requires the least setup.
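
To make the librarian analogy concrete, here is a tiny in-memory sketch of what a vector database does. This is not Pinecone's actual API, just an illustration of the idea: store vectors under IDs, then return the stored items closest to a query vector.

```python
import math

class TinyVectorStore:
    """In-memory stand-in for a vector database (illustration only)."""

    def __init__(self):
        self.items = {}  # id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata=None):
        """Insert or update a vector under an ID."""
        self.items[item_id] = (vector, metadata or {})

    def query(self, vector, top_k=3):
        """Return the top_k stored items most similar to the query vector."""
        def sim(v):
            dot = sum(x * y for x, y in zip(vector, v))
            na = math.sqrt(sum(x * x for x in vector))
            nb = math.sqrt(sum(x * x for x in v))
            return dot / (na * nb)

        scored = [(item_id, sim(v), meta) for item_id, (v, meta) in self.items.items()]
        scored.sort(key=lambda t: t[1], reverse=True)
        return scored[:top_k]

store = TinyVectorStore()
store.upsert("jupiter-facts", [0.9, 0.8, 0.1], {"file": "jupiter.pdf"})
store.upsert("cake-recipe", [0.1, 0.2, 0.95], {"file": "cake.md"})

results = store.query([0.88, 0.82, 0.12], top_k=1)
print(results[0][0])  # the most similar stored item
```

A real vector database adds the parts that matter at scale (indexes that avoid comparing against every vector, persistence, filtering), but the store-and-query shape is the same.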

## What is RAG?

RAG stands for Retrieval-Augmented Generation. Big name, simple idea.

Normally, when you ask an AI a question, it answers from its training data. It might know general facts, but it does not know about YOUR files. RAG changes that.

With RAG, the AI first searches through your documents to find relevant information, then uses what it found to answer your question. It is like giving the AI a cheat sheet of your own content before it answers.

Without RAG: "What do we know about Jupiter's atmosphere?" The AI answers from general knowledge.

With RAG: "What do we know about Jupiter's atmosphere?" The AI searches your PDFs and images, finds the Jupiter fact sheet and the Voyager photo, and answers based on YOUR specific collection.
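
The "cheat sheet" step can be sketched in a few lines. Assuming retrieval has already happened (the chunks below are hard-coded stand-ins, with made-up source names), a RAG system simply pastes the retrieved text into the prompt ahead of the question:

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble the 'cheat sheet': retrieved text goes into the prompt before the question."""
    context = "\n\n".join(
        f"[Source: {c['source']}]\n{c['text']}" for c in retrieved_chunks
    )
    return (
        "Answer the question using ONLY the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Pretend the vector database already returned these as the best matches.
chunks = [
    {"source": "jupiter_fact_sheet.pdf",
     "text": "Jupiter's atmosphere is mostly hydrogen and helium."},
    {"source": "voyager_photo.txt",
     "text": "Voyager 1 photographed the Great Red Spot in 1979."},
]

prompt = build_rag_prompt("What do we know about Jupiter's atmosphere?", chunks)
print(prompt)
```

The assembled prompt is what gets sent to the AI, which is why its answer ends up grounded in your files rather than in general training data.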

## What is chunking?

Your documents might be long. A 50-page PDF cannot be processed as one piece. Chunking means splitting it into smaller sections that the AI can work with.

Think of it like cutting a book into chapters. Each chapter gets its own embedding. When you search, the system finds the right chapter, not the whole book.

Claude Code handles chunking automatically. You do not need to do anything.
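
Claude Code's actual chunking logic may differ, but a minimal version looks like this: cut the text into fixed-size pieces with a little overlap, so a sentence split at a boundary still appears whole in at least one chunk.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping fixed-size chunks."""
    chunks = []
    step = chunk_size - overlap  # advance by less than a full chunk
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end
    return chunks

# Stand-in for the extracted text of a long PDF.
pages = "Jupiter is the largest planet. " * 100
chunks = chunk_text(pages, chunk_size=500, overlap=50)
print(len(chunks), len(chunks[0]))
```

Each chunk then gets its own embedding, so a search can land on the one relevant passage instead of a whole document.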

## What does "multimodal" mean?

"Multi" means many. "Modal" refers to modes, or types of content.

Regular search works with text only. Multimodal search works with text AND images AND PDFs AND videos. You can search across all of them at once.

This is what makes this project interesting. You ask a question in plain English, and the system searches through your PDFs, images, and their descriptions to find the best answer, regardless of what format the information is in.

## How does it all fit together?

1. You put files in a folder (PDFs, images with descriptions)
2. Claude Code builds a system that reads each file
3. Each piece of content gets converted to an embedding (a fingerprint)
4. The embeddings are stored in Pinecone (the vector database)
5. When you search, your question also gets converted to an embedding
6. Pinecone finds the stored embeddings most similar to your question
7. The matching content is shown to you (or fed to an AI for a detailed answer)

That is it. The rest is implementation details, and Claude Code handles those for you.
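
The steps above can be sketched end to end. Everything here is a toy stand-in: `toy_embed` counts a few keywords instead of calling an embedding model like Gemini, and a plain dict plays the role of Pinecone, but the flow (embed the files, embed the question, return the closest match) is the same.

```python
import math
from collections import Counter

def toy_embed(text):
    """Toy stand-in for an embedding model: keyword counts as a vector.
    A real system would call an embedding API here instead."""
    counts = Counter(text.lower().split())
    vocab = ["jupiter", "storm", "planet", "cake", "recipe", "atmosphere"]
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Steps 1-4: each "file" is embedded and stored (dict standing in for Pinecone).
documents = {
    "jupiter.txt": "jupiter is the largest planet with a giant storm in its atmosphere",
    "baking.txt": "a cake recipe with chocolate",
}
index = {name: toy_embed(text) for name, text in documents.items()}

# Steps 5-7: the question is embedded, and the closest document wins.
q_vec = toy_embed("tell me about the storm on a distant planet")
best = max(index, key=lambda name: cosine(q_vec, index[name]))
print(best)  # → jupiter.txt
```

Swap `toy_embed` for a real embedding model and the dict for Pinecone, and you have the system this guide builds.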