Initial commit: multimodal RAG guide with Claude Code

Prompt-driven guide for building multimodal search using Gemini Embedding 2 + Pinecone + Claude Code. Includes example data (NASA public domain), step-by-step prompts, concepts explainer, cost breakdown, and troubleshooting guide.

commit edcd1721df (19 changed files with 4446 additions and 0 deletions)
README.md (241 lines, new file)

# Build Multimodal Search with Claude Code

Search across your PDFs, images, and documents using plain English.
No coding required. Claude Code builds everything for you.

## What you will build

A local search app that lets you ask questions like:

- "What is the largest planet in our solar system?"
- "Show me photos from the first Moon landing"
- "Which moon has active volcanoes?"

The app searches through your PDFs and images simultaneously and
gives you answers with sources. You talk to it in plain English.

## How is this different from a Google search?

Google searches the internet. This searches YOUR files.

Imagine you have 500 PDFs, research papers, photos, and notes
scattered across folders. Normal file search only matches exact
words. This system understands meaning. You ask "what do we know
about storms on other planets?" and it finds the Jupiter fact sheet
mentioning wind speeds, the Jupiter photograph showing cloud bands,
and the solar system overview describing atmospheric composition.

It connects information across files and formats. That is what
makes it powerful.

## What you need

1. **Claude Code** (comes with Claude Pro at $20/month or Claude Max)
2. **A Google AI Studio account** (free) for Gemini embeddings
3. **A Pinecone account** (free tier) for the vector database
4. **30-45 minutes** for your first time

No programming knowledge required. You will copy prompts into
Claude Code, and it will build everything.

## How it works (the simple version)

```
Your files ──> Embeddings (Gemini) ──> Vector database (Pinecone)
                                                   │
Your question ──> Embedding (Gemini) ──> Search ──> Claude answers
```

1. Your files get converted into "embeddings" (numerical fingerprints
   that capture meaning)
2. When you ask a question, it gets the same treatment
3. The system finds fingerprints that match
4. Claude reads the matching content and answers your question

For a deeper explanation, see [concepts.md](concepts.md).
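
The "fingerprint matching" in step 3 is just a similarity score between two lists of numbers. Here is a toy sketch with made-up 3-number fingerprints (real Gemini embeddings have 3072 numbers, but the mechanics are the same):

```python
import math

def cosine_similarity(a, b):
    # Higher score = closer in meaning. This is the "cosine" metric
    # you selected when creating the Pinecone index.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical fingerprints: similar content gets similar numbers.
query = [0.9, 0.1, 0.3]           # "What is the biggest planet?"
jupiter_sheet = [0.8, 0.2, 0.35]  # a Jupiter fact sheet chunk
moon_photo = [0.1, 0.9, 0.6]      # the Aldrin Moon photo description

print(cosine_similarity(query, jupiter_sheet))  # high (about 0.99)
print(cosine_similarity(query, moon_photo))     # low (about 0.35)
```

The search simply returns the stored fingerprints with the highest scores against your question's fingerprint.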

## Step 0: Get your accounts (10 minutes)

### Google AI Studio (for embeddings)

Embeddings convert your content into searchable vectors. We use
Google's Gemini Embedding 2 for this because it handles text,
images, and video.

1. Go to [aistudio.google.com](https://aistudio.google.com/)
2. Sign in with a Google account
3. Click "Get API key" in the left sidebar
4. Click "Create API key"
5. Copy the key somewhere safe

**What is an API key?** It is like a password that lets your app
talk to Google's embedding service. You will paste it into a
configuration file later. Treat it like a password: keep it out
of anything you share or publish.

### Pinecone (for storing embeddings)

A vector database stores embeddings so you can search through
them. Think of it as a smart filing cabinet.

1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
2. Once in the dashboard, click "Create Index"
3. Name it `space-search` (or whatever you like)
4. Set dimensions to `3072` (this matches Gemini Embedding 2)
5. Choose the `cosine` metric
6. Select the free "Starter" plan
7. Copy your API key from the "API Keys" section

### Verify you have Claude Code

Open your terminal and type `claude`. If Claude Code starts,
you are ready. If not, install it:

```
npm install -g @anthropic-ai/claude-code
```

You need a Claude Pro or Max subscription for this to work.

## Step 1: Get the example files

Clone or download this repository. The `example-data/` folder
contains everything you need to get started:

**PDFs:**

- `solar-system-overview.pdf` - Overview of our solar system (NASA)
- `jupiter-fact-sheet.pdf` - Detailed data about Jupiter (NASA)
- `solar-system-moons.pdf` - Guide to planetary moons (NASA)

**Images:**

- `earthrise.jpg` - Earth seen from lunar orbit, Apollo 8 (1968)
- `aldrin-moon.jpg` - Buzz Aldrin on the Moon, Apollo 11 (1969)
- `jupiter-great-red-spot.jpg` - Jupiter photographed by Voyager 1 (1979)
- `iss-over-earth.jpg` - The Moon seen from the ISS

**Descriptions:**

- `descriptions.md` - Detailed text descriptions of each image.
  This is the most important file for image search quality.
  See the section below on why descriptions matter.

All files are NASA public domain. No copyright restrictions.

## Step 2: Start Claude Code (5 minutes)

Open your terminal, navigate to this folder, and start Claude Code:

```
claude
```

Then copy the prompt from [prompts/01-setup.md](prompts/01-setup.md)
and paste it into Claude Code.

Claude Code will create the project structure and install
dependencies. When it is done, copy `.env.template` to `.env`
and fill in your API keys.
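
The filled-in `.env` will look something like this. The variable names here are illustrative; match whatever names your generated `.env.template` actually contains:

```shell
# .env -- keep this file private and never commit it to git.
# Key names below are a guess at the template's; use the real ones.
GEMINI_API_KEY=your-google-ai-studio-key-here
PINECONE_API_KEY=your-pinecone-key-here
PINECONE_INDEX=space-search
```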

## Step 3: Ingest your files (10 minutes)

Copy the prompt from [prompts/02-ingest.md](prompts/02-ingest.md)
into Claude Code.

Claude Code will read each file, split it into chunks, generate
embeddings, and store everything in Pinecone. You will see a
summary of what was processed.

This is the step where your files become searchable.
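
"Splitting into chunks" means cutting long documents into overlapping pieces small enough to embed and retrieve precisely. A rough sketch of the idea (the actual ingest script Claude Code writes may use different sizes or token-based splitting):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-based chunks.

    The overlap keeps a sentence that straddles a chunk boundary
    findable from either side. Illustrative only.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = "word " * 450          # a 450-word stand-in document
pieces = chunk_text(doc)
print(len(pieces))           # 3 overlapping chunks
```

Each chunk is then embedded separately, so a search can point you at the exact passage rather than a whole PDF.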

## Step 4: Search (5 minutes)

Copy the prompt from [prompts/03-search.md](prompts/03-search.md)
into Claude Code.

Claude Code will build a web interface and start it. Open the URL
it gives you (usually `http://localhost:3333`) in your browser.

Try these searches:

| Search query | What should come back |
|---|---|
| "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
| "First Moon landing" | Aldrin image + solar system overview |
| "Which moon has volcanoes?" | Moons PDF (mentioning Io) |
| "How far is Jupiter from Earth?" | Jupiter fact sheet (588.5 to 968.1 million km) |
| "What do astronauts see from orbit?" | ISS image description |

Notice how a single question can pull results from both PDFs and
images. That is multimodal search.

## Step 5: Make it your own

Now that you have seen it work with NASA files, try it with
your own content:

1. Add your own PDFs, images, or documents to the `example-data/` folder
2. Write descriptions for any images (see the tips in `descriptions.md`)
3. Use [prompts/04-improve.md](prompts/04-improve.md) to re-index

Ideas for what to search:

- Your company's internal documents
- Research papers for a project
- Travel photos with descriptions
- Recipe collections
- Course notes and textbook screenshots

## Why image descriptions matter

The search system cannot "see" your images directly. It finds
images through their text descriptions. This means:

**Bad description:** "Photo of a planet" will only match
searches containing "photo" or "planet."

**Good description:** "Full-disk portrait of Jupiter captured by
Voyager 1 in 1979, showing horizontal cloud bands and the Great
Red Spot, a massive storm larger than Earth" will match searches
about Jupiter, Voyager missions, storms, cloud patterns, and more.

The `descriptions.md` file in `example-data/` shows side-by-side
examples of bad versus good descriptions. Spending five minutes
on better descriptions will dramatically improve your search
results.
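
You can get a feel for this with plain keyword overlap. It is a crude stand-in for embedding similarity (real embeddings also match synonyms and paraphrases), but the effect points the same way:

```python
def overlap(query, description):
    # Count how many query words also appear in the description.
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d)

bad = "Photo of a planet"
good = ("Full-disk portrait of Jupiter captured by Voyager 1 in 1979, "
        "showing horizontal cloud bands and the Great Red Spot, "
        "a massive storm larger than Earth")

query = "storms on jupiter seen by voyager"
print(overlap(query, bad))   # 0 -- nothing in common
print(overlap(query, good))  # 3 -- "jupiter", "voyager", "by"
```

The richer description gives the search engine far more to latch onto, for any phrasing of the question.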

## What this costs

$0 extra if you already have a Claude subscription.
Both Gemini embeddings and Pinecone have generous free tiers.

See [costs.md](costs.md) for details.

## If you get stuck

See [troubleshooting.md](troubleshooting.md) for the 10 most
common problems and their solutions.

The most effective fix for almost anything: copy the exact error
message and paste it into Claude Code. It is very good at
diagnosing its own work.

## How it works (the deeper version)

Read [concepts.md](concepts.md) for plain-English explanations of:

- What are embeddings?
- What is a vector database?
- What is RAG?
- What is chunking?
- What does "multimodal" mean?

## Credits

Example data: All PDFs and images are from NASA and are in the
public domain (U.S. Government works, no copyright restrictions).

Built with:

- [Claude Code](https://claude.ai) by Anthropic (app building + AI answers)
- [Gemini Embedding 2](https://ai.google.dev/) by Google (multimodal embeddings)
- [Pinecone](https://www.pinecone.io/) (vector database)

---

*Part of [The Dharma Lab](https://thedharmalab.com). Read the
[full article](https://thedharmalab.com/) for the story behind this project.*