Initial commit: multimodal RAG guide with Claude Code

Prompt-driven guide for building multimodal search using Gemini Embedding 2 + Pinecone + Claude Code. Includes example data (NASA public domain), step-by-step prompts, concepts explainer, cost breakdown, and troubleshooting guide.

commit edcd1721df (19 changed files with 4446 additions and 0 deletions)
README.md (241 lines, new file)

# Build Multimodal Search with Claude Code

Search across your PDFs, images, and documents using plain English.
No coding required. Claude Code builds everything for you.

## What you will build

A local search app that lets you ask questions like:

- "What is the largest planet in our solar system?"
- "Show me photos from the first Moon landing"
- "Which moon has active volcanoes?"

The app searches through your PDFs and images simultaneously and
gives you answers with sources. You talk to it in plain English.

## How is this different from a Google search?

Google searches the internet. This searches YOUR files.

Imagine you have 500 PDFs, research papers, photos, and notes
scattered across folders. Normal file search only matches exact
words. This system understands meaning. You ask "what do we know
about storms on other planets?" and it finds the Jupiter fact sheet
mentioning wind speeds, the Jupiter photograph showing cloud bands,
and the solar system overview describing atmospheric composition.

It connects information across files and formats. That is what
makes it powerful.

## What you need

1. **Claude Code** (comes with Claude Pro at $20/month or Claude Max)
2. **A Google AI Studio account** (free) for Gemini embeddings
3. **A Pinecone account** (free tier) for the vector database
4. **30-45 minutes** for your first time

No programming knowledge required. You will copy prompts into
Claude Code, and it will build everything.

## How it works (the simple version)

```
Your files ──> Embeddings (Gemini) ──> Vector database (Pinecone)
                                                   │
Your question ──> Embedding (Gemini) ──> Search ──> Claude answers
```

1. Your files get converted into "embeddings" (numerical fingerprints
   that capture meaning)
2. When you ask a question, it gets the same treatment
3. The system finds fingerprints that match
4. Claude reads the matching content and answers your question

For a deeper explanation, see [concepts.md](concepts.md).
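
The "fingerprint matching" in step 3 is just a similarity score between two lists of numbers. Here is a toy sketch with made-up 3-number fingerprints (real Gemini embeddings have 3072 numbers, but the mechanics are the same):

```python
import math

def cosine_similarity(a, b):
    # Higher score = closer in meaning. This is the "cosine" metric
    # you selected when creating the Pinecone index.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical fingerprints: similar content gets similar numbers.
query = [0.9, 0.1, 0.3]           # "What is the biggest planet?"
jupiter_sheet = [0.8, 0.2, 0.35]  # a Jupiter fact sheet chunk
moon_photo = [0.1, 0.9, 0.6]      # the Aldrin Moon photo description

print(cosine_similarity(query, jupiter_sheet))  # high (about 0.99)
print(cosine_similarity(query, moon_photo))     # low (about 0.35)
```

The search simply returns the stored fingerprints with the highest scores against your question's fingerprint.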

## Step 0: Get your accounts (10 minutes)

### Google AI Studio (for embeddings)

Embeddings convert your content into searchable vectors. We use
Google's Gemini Embedding 2 for this because it handles text,
images, and video.

1. Go to [aistudio.google.com](https://aistudio.google.com/)
2. Sign in with a Google account
3. Click "Get API key" in the left sidebar
4. Click "Create API key"
5. Copy the key somewhere safe

**What is an API key?** It is like a password that lets your app
talk to Google's embedding service. You will paste it into a
configuration file later. Treat it like a password: keep it out
of anything you share or publish.

### Pinecone (for storing embeddings)

A vector database stores embeddings so you can search through
them. Think of it as a smart filing cabinet.

1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
2. Once in the dashboard, click "Create Index"
3. Name it `space-search` (or whatever you like)
4. Set dimensions to `3072` (this matches Gemini Embedding 2)
5. Choose the `cosine` metric
6. Select the free "Starter" plan
7. Copy your API key from the "API Keys" section

### Verify you have Claude Code

Open your terminal and type `claude`. If Claude Code starts,
you are ready. If not, install it:

```
npm install -g @anthropic-ai/claude-code
```

You need a Claude Pro or Max subscription for this to work.

## Step 1: Get the example files

Clone or download this repository. The `example-data/` folder
contains everything you need to get started:

**PDFs:**

- `solar-system-overview.pdf` - Overview of our solar system (NASA)
- `jupiter-fact-sheet.pdf` - Detailed data about Jupiter (NASA)
- `solar-system-moons.pdf` - Guide to planetary moons (NASA)

**Images:**

- `earthrise.jpg` - Earth seen from lunar orbit, Apollo 8 (1968)
- `aldrin-moon.jpg` - Buzz Aldrin on the Moon, Apollo 11 (1969)
- `jupiter-great-red-spot.jpg` - Jupiter photographed by Voyager 1 (1979)
- `iss-over-earth.jpg` - The Moon seen from the ISS

**Descriptions:**

- `descriptions.md` - Detailed text descriptions of each image.
  This is the most important file for image search quality.
  See the section below on why descriptions matter.

All files are NASA public domain. No copyright restrictions.

## Step 2: Start Claude Code (5 minutes)

Open your terminal, navigate to this folder, and start Claude Code:

```
claude
```

Then copy the prompt from [prompts/01-setup.md](prompts/01-setup.md)
and paste it into Claude Code.

Claude Code will create the project structure and install
dependencies. When it is done, copy `.env.template` to `.env`
and fill in your API keys.
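
The filled-in `.env` will look something like this. The variable names here are illustrative; match whatever names your generated `.env.template` actually contains:

```shell
# .env -- keep this file private and never commit it to git.
# Key names below are a guess at the template's; use the real ones.
GEMINI_API_KEY=your-google-ai-studio-key-here
PINECONE_API_KEY=your-pinecone-key-here
PINECONE_INDEX=space-search
```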

## Step 3: Ingest your files (10 minutes)

Copy the prompt from [prompts/02-ingest.md](prompts/02-ingest.md)
into Claude Code.

Claude Code will read each file, split it into chunks, generate
embeddings, and store everything in Pinecone. You will see a
summary of what was processed.

This is the step where your files become searchable.
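
"Splitting into chunks" means cutting long documents into overlapping pieces small enough to embed and retrieve precisely. A rough sketch of the idea (the actual ingest script Claude Code writes may use different sizes or token-based splitting):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-based chunks.

    The overlap keeps a sentence that straddles a chunk boundary
    findable from either side. Illustrative only.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = "word " * 450          # a 450-word stand-in document
pieces = chunk_text(doc)
print(len(pieces))           # 3 overlapping chunks
```

Each chunk is then embedded separately, so a search can point you at the exact passage rather than a whole PDF.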

## Step 4: Search (5 minutes)

Copy the prompt from [prompts/03-search.md](prompts/03-search.md)
into Claude Code.

Claude Code will build a web interface and start it. Open the URL
it gives you (usually `http://localhost:3333`) in your browser.

Try these searches:

| Search query | What should come back |
|---|---|
| "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
| "First Moon landing" | Aldrin image + solar system overview |
| "Which moon has volcanoes?" | Moons PDF (mentioning Io) |
| "How far is Jupiter from Earth?" | Jupiter fact sheet (588.5 to 968.1 million km) |
| "What do astronauts see from orbit?" | ISS image description |

Notice how a single question can pull results from both PDFs and
images. That is multimodal search.

## Step 5: Make it your own

Now that you have seen it work with NASA files, try it with
your own content:

1. Add your own PDFs, images, or documents to the `example-data/` folder
2. Write descriptions for any images (see the tips in `descriptions.md`)
3. Use [prompts/04-improve.md](prompts/04-improve.md) to re-index

Ideas for what to search:

- Your company's internal documents
- Research papers for a project
- Travel photos with descriptions
- Recipe collections
- Course notes and textbook screenshots

## Why image descriptions matter

The search system cannot "see" your images directly. It finds
images through their text descriptions. This means:

**Bad description:** "Photo of a planet" will only match
searches containing "photo" or "planet."

**Good description:** "Full-disk portrait of Jupiter captured by
Voyager 1 in 1979, showing horizontal cloud bands and the Great
Red Spot, a massive storm larger than Earth" will match searches
about Jupiter, Voyager missions, storms, cloud patterns, and more.

The `descriptions.md` file in `example-data/` shows side-by-side
examples of bad versus good descriptions. Spending five minutes
on better descriptions will dramatically improve your search
results.
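
You can get a feel for this with plain keyword overlap. It is a crude stand-in for embedding similarity (real embeddings also match synonyms and paraphrases), but the effect points the same way:

```python
def overlap(query, description):
    # Count how many query words also appear in the description.
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d)

bad = "Photo of a planet"
good = ("Full-disk portrait of Jupiter captured by Voyager 1 in 1979, "
        "showing horizontal cloud bands and the Great Red Spot, "
        "a massive storm larger than Earth")

query = "storms on jupiter seen by voyager"
print(overlap(query, bad))   # 0 -- nothing in common
print(overlap(query, good))  # 3 -- "jupiter", "voyager", "by"
```

The richer description gives the search engine far more to latch onto, for any phrasing of the question.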

## What this costs

$0 extra if you already have a Claude subscription.
Both Gemini embeddings and Pinecone have generous free tiers.

See [costs.md](costs.md) for details.

## If you get stuck

See [troubleshooting.md](troubleshooting.md) for the 10 most
common problems and their solutions.

The most effective fix for almost anything: copy the exact error
message and paste it into Claude Code. It is very good at
diagnosing its own work.

## How it works (the deeper version)

Read [concepts.md](concepts.md) for plain-English explanations of:

- What are embeddings?
- What is a vector database?
- What is RAG?
- What is chunking?
- What does "multimodal" mean?

## Credits

Example data: All PDFs and images are from NASA and are in the
public domain (U.S. Government works, no copyright restrictions).

Built with:

- [Claude Code](https://claude.ai) by Anthropic (app building + AI answers)
- [Gemini Embedding 2](https://ai.google.dev/) by Google (multimodal embeddings)
- [Pinecone](https://www.pinecone.io/) (vector database)

---

*Part of [The Dharma Lab](https://thedharmalab.com). Read the
[full article](https://thedharmalab.com/) for the story behind this project.*