# Build Multimodal Search with Claude Code
Search across your PDFs, images, and documents using plain English.
No coding required. Claude Code builds everything for you.
## What you will build
A local search app that lets you ask questions like:
- "What is the largest planet in our solar system?"
- "Show me photos from the first Moon landing"
- "Which moon has active volcanoes?"
The app searches through your PDFs and images simultaneously and
gives you answers with sources. You talk to it in plain English.
## How is this different from a Google search?
Google searches the internet. This searches YOUR files.
Imagine you have 500 PDFs, research papers, photos, and notes
scattered across folders. Normal file search only matches exact
words. This system understands meaning. You ask "what do we know
about storms on other planets?" and it finds the Jupiter fact sheet
mentioning wind speeds, the Jupiter photograph showing cloud bands,
and the solar system overview describing atmospheric composition.
It connects information across files and formats. That is what
makes it powerful.
## What you need
1. **Claude Code** (comes with Claude Pro at $20/month or Claude Max)
2. **A Google AI Studio account** (free) for Gemini embeddings
3. **A Pinecone account** (free tier) for the vector database
4. **30-45 minutes** for your first time
No programming knowledge required. You will copy prompts into
Claude Code, and it will build everything.
## How it works (the simple version)
```
Your files ──> Embeddings (Gemini) ──> Vector database (Pinecone)
Your question ──> Embedding (Gemini) ──> Search ──> Claude answers
```
1. Your files get converted into "embeddings" (numerical fingerprints
that capture meaning)
2. When you ask a question, it gets the same treatment
3. The system finds fingerprints that match
4. Claude reads the matching content and answers your question
For a deeper explanation, see [concepts.md](concepts.md).
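The "fingerprint matching" in step 3 is usually cosine similarity: two embeddings that point in nearly the same direction carry nearly the same meaning. Here is a toy sketch with made-up three-dimensional fingerprints (real Gemini embeddings have 3072 dimensions, and the numbers below are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-D fingerprints; a real system would get these from Gemini.
question = [0.9, 0.1, 0.2]       # "What is the largest planet?"
jupiter_chunk = [0.8, 0.2, 0.1]  # a chunk of the Jupiter fact sheet
moon_photo = [0.1, 0.9, 0.3]     # description of a Moon photograph

print(cosine_similarity(question, jupiter_chunk))  # high: strong match
print(cosine_similarity(question, moon_photo))     # low: weak match
```

The search step simply returns the stored fingerprints with the highest similarity to your question's fingerprint.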
## Step 0: Get your accounts (10 minutes)
### Google AI Studio (for embeddings)
Embeddings convert your content into searchable vectors. We use
Google's Gemini Embedding 2 for this because it handles text,
images, and video.
1. Go to [aistudio.google.com](https://aistudio.google.com/)
2. Sign in with a Google account
3. Click "Get API key" in the left sidebar
4. Click "Create API key"
5. Copy the key somewhere safe
**What is an API key?** It is like a password that lets your app
talk to Google's embedding service. You will paste it into a local
configuration file later. It is only sent to Google when your app
makes requests, so keep it out of anything you share or publish.
### Pinecone (for storing embeddings)
A vector database stores embeddings so you can search through
them. Think of it as a smart filing cabinet.
1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
2. Once in the dashboard, click "Create Index"
3. Name it `space-search` (or whatever you like)
4. Set dimensions to `3072` (this matches Gemini Embedding 2)
5. Choose the `cosine` metric
6. Select the free "Starter" plan
7. Copy your API key from the "API Keys" section
### Verify you have Claude Code
Open your terminal and type `claude`. If Claude Code starts,
you are ready. If not, install it:
```
npm install -g @anthropic-ai/claude-code
```
You need a Claude Pro or Max subscription for this to work.
## Step 1: Get the example files
Clone or download this repository. The `example-data/` folder
contains everything you need to get started:
**PDFs:**
- `solar-system-overview.pdf` - Overview of our solar system (NASA)
- `jupiter-fact-sheet.pdf` - Detailed data about Jupiter (NASA)
- `solar-system-moons.pdf` - Guide to planetary moons (NASA)
**Images:**
- `earthrise.jpg` - Earth seen from lunar orbit, Apollo 8 (1968)
- `aldrin-moon.jpg` - Buzz Aldrin on the Moon, Apollo 11 (1969)
- `jupiter-great-red-spot.jpg` - Jupiter photographed by Voyager 1 (1979)
- `iss-over-earth.jpg` - The Moon seen from the ISS
**Descriptions:**
- `descriptions.md` - Detailed text descriptions of each image.
This is the most important file for image search quality.
See the section below on why descriptions matter.
All files are NASA public domain. No copyright restrictions.
## Step 2: Start Claude Code (5 minutes)
Open your terminal, navigate to this folder, and start Claude Code:
```
claude
```
Then copy the prompt from [prompts/01-setup.md](prompts/01-setup.md)
and paste it into Claude Code.
Claude Code will create the project structure and install
dependencies. When it is done, copy `.env.template` to `.env`
and fill in your API keys.
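Your `.env` will look something like this. The exact variable names are whatever `.env.template` contains; the names below are illustrative:

```
GOOGLE_API_KEY=your-google-ai-studio-key
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX=space-search
```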
## Step 3: Ingest your files (10 minutes)
Copy the prompt from [prompts/02-ingest.md](prompts/02-ingest.md)
into Claude Code.
Claude Code will read each file, split it into chunks, generate
embeddings, and store everything in Pinecone. You will see a
summary of what was processed.
This is the step where your files become searchable.
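"Splitting into chunks" just means cutting long documents into overlapping pieces small enough to embed and retrieve individually. Claude Code will write its own version; this is a minimal sketch of the idea, with arbitrary sizes:

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into overlapping chunks so facts are not cut off at boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# A 1200-character document becomes 3 overlapping chunks.
chunks = chunk_text("A" * 1200, chunk_size=500, overlap=100)
print(len(chunks))  # 3
```

Each chunk gets its own embedding, so a question can match one relevant paragraph instead of a whole PDF.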
## Step 4: Search (5 minutes)
Copy the prompt from [prompts/03-search.md](prompts/03-search.md)
into Claude Code.
Claude Code will build a web interface and start it. Open the URL
it gives you (usually `http://localhost:3333`) in your browser.
Try these searches:
| Search query | What should come back |
|---|---|
| "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
| "First Moon landing" | Aldrin image + solar system overview |
| "Which moon has volcanoes?" | Moons PDF (mentioning Io) |
| "How far is Jupiter from Earth?" | Jupiter fact sheet (588.5 to 968.1 million km) |
| "What do astronauts see from orbit?" | ISS image description |
Notice how a single question can pull results from both PDFs and
images. That is multimodal search.
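Under the hood, each search is one vector query plus some formatting of the returned matches. Here is a hedged sketch of the formatting half, using a hand-written stand-in for a query response (the field names are an assumption for illustration, not the exact Pinecone API shape):

```python
def format_hits(matches):
    """Turn raw match dicts into readable 'source (score)' lines, best first."""
    ranked = sorted(matches, key=lambda m: m["score"], reverse=True)
    return [f'{m["metadata"]["source"]} (score {m["score"]:.2f})' for m in ranked]

# Hand-written stand-in; in the real app these would come back
# from the vector database query.
fake_matches = [
    {"score": 0.71, "metadata": {"source": "jupiter-great-red-spot.jpg"}},
    {"score": 0.89, "metadata": {"source": "jupiter-fact-sheet.pdf"}},
]
for line in format_hits(fake_matches):
    print(line)
```

Notice the PDF and the image compete in the same ranked list; that is what lets one question pull answers from both formats.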
## Step 5: Make it your own
Now that you have seen it work with NASA files, try it with
your own content:
1. Add your own PDFs, images, or documents to the `example-data/` folder
2. Write descriptions for any images (see the tips in `descriptions.md`)
3. Use [prompts/04-improve.md](prompts/04-improve.md) to re-index
Ideas for what to search:
- Your company's internal documents
- Research papers for a project
- Travel photos with descriptions
- Recipe collections
- Course notes and textbook screenshots
## Why image descriptions matter
The search system cannot "see" your images directly. It finds
images through their text descriptions. This means:
**Bad description:** "Photo of a planet" gives the system almost
nothing to anchor on, so it will only surface for the vaguest
planet-related searches.
**Good description:** "Full-disk portrait of Jupiter captured by
Voyager 1 in 1979, showing horizontal cloud bands and the Great
Red Spot, a massive storm larger than Earth" will match searches
about Jupiter, Voyager missions, storms, cloud patterns, and more.
The `descriptions.md` file in `example-data/` shows side-by-side
examples of bad versus good descriptions. Spending five minutes
on better descriptions will dramatically improve your search
results.
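The real system compares embeddings, not words, but even a crude word-overlap score shows why richer descriptions win. This toy scores a description by the fraction of query words it contains:

```python
def overlap_score(query, description):
    """Toy relevance score: fraction of query words found in the description."""
    q_words = set(query.lower().split())
    d_words = set(description.lower().split())
    return len(q_words & d_words) / len(q_words)

bad = "Photo of a planet"
good = ("Full-disk portrait of Jupiter captured by Voyager 1 in 1979, "
        "showing horizontal cloud bands and the Great Red Spot, "
        "a massive storm larger than Earth")

query = "storm on jupiter"
print(overlap_score(query, bad))   # 0.0: no shared words at all
print(overlap_score(query, good))  # higher: 'storm' and 'jupiter' both appear
```

Embeddings are far more forgiving than exact word overlap, but the principle holds: the more meaning a description carries, the more questions it can answer.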
## What this costs
$0 extra if you already have a Claude subscription.
Both Gemini embeddings and Pinecone have generous free tiers.
See [costs.md](costs.md) for details.
## If you get stuck
See [troubleshooting.md](troubleshooting.md) for the 10 most
common problems and their solutions.
The most effective fix for almost anything: copy the exact error
message and paste it into Claude Code. It is very good at
diagnosing its own work.
## How it works (the deeper version)
Read [concepts.md](concepts.md) for plain-English explanations of:
- What are embeddings?
- What is a vector database?
- What is RAG?
- What is chunking?
- What does "multimodal" mean?
## Credits
Example data: All PDFs and images are from NASA and are in the
public domain (U.S. Government works, no copyright restrictions).
Built with:
- [Claude Code](https://claude.ai) by Anthropic (app building + AI answers)
- [Gemini Embedding 2](https://ai.google.dev/) by Google (multimodal embeddings)
- [Pinecone](https://www.pinecone.io/) (vector database)
---
*Part of [The Dharma Lab](https://thedharmalab.com). Read the
[full article](https://thedharmalab.com/) for the story behind this project.*