# Multimodal RAG with Gemini Embedding 2 and Claude Code

**Search across PDFs, images, and documents using plain English.**
No coding required. Claude Code builds everything from four prompts.

`MIT License` · `Last updated: March 2026` · `Status: Complete, maintained`

> **Gemini Embedding 2** converts text, images, and video into the same
> searchable space. **Claude Code** builds the app. **Pinecone** stores the
> vectors. You just copy four prompts.

## Table of Contents

- [Download](#download)
- [What This Does](#what-this-does)
- [Prerequisites](#prerequisites)
- [Step-by-Step Guide](#step-by-step-guide)
- [Example Data](#example-data)
- [Why Image Descriptions Matter](#why-image-descriptions-matter)
- [Costs](#costs)
- [Troubleshooting](#troubleshooting)
- [How It Works](#how-it-works)
- [Tested On](#tested-on)
- [License](#license)

## Download

**Option A: Download ZIP (no Git required)**

1. [Click here to download the ZIP file](https://git.thedharmalab.com/ktg/multimodal-rag-guide/archive/main.zip)
2. Unzip the folder
3. Open a terminal in that folder

**Option B: Git clone**

```bash
git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
cd multimodal-rag-guide
```

Then start Claude Code by typing `claude` in the folder, and paste the
first prompt from [`prompts/01-setup.md`](prompts/01-setup.md).

## What This Does

One search box that understands PDFs, images, and text at the same time.

Ask "What is the largest planet in our solar system?" and the system
returns the Jupiter fact sheet from a PDF, the Voyager photograph of
the Great Red Spot from a JPG, and a confidence score for each result.
One question, multiple formats, ranked by meaning.

This pattern is called Retrieval-Augmented Generation (RAG): the system
first retrieves the content closest in meaning to your question, then a
language model answers from what was retrieved. Google's
Gemini Embedding 2 handles the multimodal part: it converts different
content types into the same numerical format so they become searchable
together. Claude Code handles the building part: it reads your prompts
and writes all the code. You handle neither.

## Prerequisites

| Requirement | Cost | What it does |
|---|---|---|
| [Claude Code](https://claude.ai) | Part of Claude Pro ($20/mo) or Max | Builds the app and answers questions |
| [Google AI Studio](https://aistudio.google.com/) | Free tier | Gemini Embedding 2 API key |
| [Pinecone](https://www.pinecone.io/) | Free tier | Vector database for storing embeddings |

No programming knowledge required.

## Step-by-Step Guide

### Step 0: Get your API keys (10 minutes)

**Google AI Studio** (for Gemini Embedding 2):

1. Go to [aistudio.google.com](https://aistudio.google.com/)
2. Sign in with a Google account
3. Click "Get API key" in the left sidebar
4. Click "Create API key" and copy it

**Pinecone** (for the vector database):

1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
2. In the dashboard, click "Create Index"
3. Name it `space-search`, set dimensions to `3072`, choose `cosine` metric
4. Select the free "Starter" plan
5. Copy your API key from "API Keys"

### Step 1: Get the files and start Claude Code (5 minutes)

Download and unzip (see [Download](#download) above), or:

```bash
git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
cd multimodal-rag-guide
```

Open your terminal in the folder and type:

```bash
claude
```

Paste the prompt from [`prompts/01-setup.md`](prompts/01-setup.md).
Claude Code creates the project structure and installs dependencies.

When done, copy `env.template` to `.env` and fill in your API keys.
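
If you want to confirm that `.env` is filled in before moving on, a small check along these lines can help. This is only a sketch: the variable names `GEMINI_API_KEY` and `PINECONE_API_KEY` are assumptions, so substitute whatever names `env.template` actually lists.

```python
from pathlib import Path

# Hypothetical variable names; use the keys that env.template actually lists.
REQUIRED_KEYS = ("GEMINI_API_KEY", "PINECONE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or still empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found; copy env.template to .env first.")
    else:
        missing = missing_keys(env_path.read_text())
        print("Missing keys:", missing if missing else "none")
```

Run it from the project folder. An empty value counts as missing, which catches the common case of copying the template without pasting the keys.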

### Step 2: Ingest your files (10 minutes)

Paste the prompt from [`prompts/02-ingest.md`](prompts/02-ingest.md).

Claude Code reads each file, splits it into chunks, generates
embeddings via Gemini Embedding 2, and stores everything in Pinecone.
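
To give a feel for the "splits it into chunks" step, here is a minimal sketch of word-based chunking with overlap. The function name `chunk_text` and the sizes (200 words, 40 words of overlap) are illustrative assumptions; the code Claude Code generates for you may chunk differently.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks.

    Each chunk holds up to `chunk_size` words and repeats the last
    `overlap` words of the previous chunk, so a sentence cut at a
    boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "one two three four five six"
    for chunk in chunk_text(sample, chunk_size=4, overlap=2):
        print(chunk)
```

Each chunk would then be sent to the embedding model and stored as one vector. Smaller chunks give more precise matches; larger chunks keep more context per result.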

### Step 3: Search (5 minutes)

Paste the prompt from [`prompts/03-search.md`](prompts/03-search.md).

Claude Code builds a web interface. Open `http://localhost:3333`
in your browser and try these searches:

| Query | Expected results |
|---|---|
| "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
| "First Moon landing" | Aldrin image + solar system overview |
| "Which moon has volcanoes?" | Moons PDF mentioning Io |
| "How far is Jupiter from Earth?" | Jupiter fact sheet with exact distance |

A single question pulls results from both PDFs and images.

### Step 4: Make it your own

Replace the NASA example files with your own content:

1. Add PDFs, images, or documents to `example-data/`
2. Write descriptions for images (see [`example-data/descriptions.md`](example-data/descriptions.md))
3. Paste [`prompts/04-improve.md`](prompts/04-improve.md) to re-index

Ideas: company documents, research papers, travel photos,
recipe collections, course notes.

## Example Data

The `example-data/` folder contains NASA public domain files
(no copyright restrictions):

| File | Description |
|---|---|
| `solar-system-overview.pdf` | Overview of our solar system |
| `jupiter-fact-sheet.pdf` | Detailed data about Jupiter |
| `solar-system-moons.pdf` | Guide to planetary moons |
| `earthrise.jpg` | Earth from lunar orbit, Apollo 8 (1968) |
| `aldrin-moon.jpg` | Buzz Aldrin on the Moon, Apollo 11 (1969) |
| `jupiter-great-red-spot.jpg` | Jupiter by Voyager 1 (1979) |
| `iss-over-earth.jpg` | The Moon seen from the ISS |
| `descriptions.md` | Image descriptions for search quality |

## Why Image Descriptions Matter

The search system finds images through their text descriptions,
not by "seeing" them. A description like "Photo of a planet" only
matches searches containing those exact concepts. A description
like "Full-disk portrait of Jupiter captured by Voyager 1 in 1979,
showing horizontal cloud bands and the Great Red Spot" matches
searches about Jupiter, Voyager missions, storms, and cloud patterns.

See [`example-data/descriptions.md`](example-data/descriptions.md)
for side-by-side examples.
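
The effect of a richer description can be illustrated with a toy score. The sketch below uses plain word overlap as a crude stand-in for embedding similarity. That is an assumption made purely for illustration: real embeddings compare meaning, not exact words. The direction of the effect is the same, though: a description that mentions more concepts gives a query more to match against. The query string is made up for this example.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, description: str) -> float:
    """Fraction of query words that also occur in the description.

    A crude stand-in for embedding similarity, for illustration only.
    """
    query_words = words(query)
    if not query_words:
        return 0.0
    return len(query_words & words(description)) / len(query_words)


vague = "Photo of a planet"
detailed = (
    "Full-disk portrait of Jupiter captured by Voyager 1 in 1979, "
    "showing horizontal cloud bands and the Great Red Spot"
)
query = "Voyager image of the Great Red Spot"  # made-up example query

if __name__ == "__main__":
    print(f"vague:    {overlap_score(query, vague):.2f}")
    print(f"detailed: {overlap_score(query, detailed):.2f}")
```

The detailed description scores far higher for this query because it names the spacecraft and the feature the query asks about.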

## Costs

$0 extra if you already have a Claude subscription. Both Gemini
Embedding 2 and Pinecone have free tiers that cover this guide
and well beyond.

See [costs.md](costs.md) for the full breakdown.

## Troubleshooting

See [troubleshooting.md](troubleshooting.md) for the 10 most common
problems. The most effective fix for almost anything: copy the exact
error message and paste it into Claude Code.

## How It Works

```
Your files --> Chunking --> Gemini Embedding 2 --> Pinecone (vector DB)
                                                      |
          Your question --> Gemini Embedding 2 --> Search --> Claude answers
```

Gemini Embedding 2 converts all content types (text, images, video,
audio) into numerical vectors in one shared space. Pinecone stores
and searches those vectors. Claude reads the matching content and
generates answers.

For plain-English explanations of embeddings, vector databases, RAG,
and chunking, see [concepts.md](concepts.md).
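
The search step can be sketched in a few lines. The example below ranks stored vectors by cosine similarity to a query vector, the same metric chosen for the Pinecone index in Step 0. The three-dimensional vectors are made-up stand-ins for real 3072-dimensional embeddings, and the `search` helper is hypothetical; in the actual app Pinecone performs this ranking for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Return the top_k (score, name) pairs, best match first."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in index.items()
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional vectors standing in for real embeddings.
toy_index = {
    "jupiter-fact-sheet.pdf": [0.9, 0.1, 0.0],
    "jupiter-great-red-spot.jpg": [0.8, 0.2, 0.1],
    "aldrin-moon.jpg": [0.1, 0.9, 0.2],
}
query_vector = [1.0, 0.0, 0.0]  # pretend embedding of a Jupiter question
results = search(query_vector, toy_index)

if __name__ == "__main__":
    for score, name in results:
        print(f"{score:.3f}  {name}")
```

Because a PDF chunk and an image live in the same vector space, one ranking covers both, which is why a single question can return the fact sheet and the photograph together.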

## Tested On

- macOS (Apple Silicon and Intel)
- Claude Code with Claude Pro subscription
- Gemini Embedding 2 free tier
- Pinecone free tier (Starter plan)

Should work on any system that runs Claude Code (macOS, Linux, Windows via WSL).

## Built With

- [Claude Code](https://claude.ai) by Anthropic
- [Gemini Embedding 2](https://ai.google.dev/) by Google
- [Pinecone](https://www.pinecone.io/)

## License

This project is licensed under the [MIT License](LICENSE). You are free
to use, modify, and distribute it.

Example data (NASA images and PDFs) is in the public domain.

---

Part of [The Dharma Lab](https://thedharmalab.com).