
Improve README: professional format, demo screenshot, Gemini Embedding 2 focus

Restructured for clarity: table of contents, prerequisites table,
quick start section, and embedded screenshot showing actual search
results. Title now clearly states Gemini Embedding 2 + Claude Code.
Kjell Tore Guttormsen 2026-03-12 16:44:19 +01:00
commit f959b53cac
3 changed files with 135 additions and 175 deletions

1
.gitignore vendored

@@ -5,3 +5,4 @@ src/
package.json package.json
package-lock.json package-lock.json
tsconfig.json tsconfig.json
harness-events.jsonl

321
README.md

@@ -1,241 +1,200 @@
# Build Multimodal Search with Claude Code # Multimodal RAG with Gemini Embedding 2 and Claude Code
Search across your PDFs, images, and documents using plain English. Search across PDFs, images, and documents using plain English.
No coding required. Claude Code builds everything for you. No coding required. Claude Code builds everything from prompts.
## What you will build ![Search for "What is the largest planet?" returns both the Jupiter photograph and the PDF fact sheet](docs/demo-screenshot.png)
A local search app that lets you ask questions like: > **Gemini Embedding 2** converts text, images, and video into the same
> searchable space. **Claude Code** builds the app. **Pinecone** stores the
> vectors. You just copy four prompts.
- "What is the largest planet in our solar system?" ## Table of Contents
- "Show me photos from the first Moon landing"
- "Which moon has active volcanoes?"
The app searches through your PDFs and images simultaneously and - [Quick Start](#quick-start)
gives you answers with sources. You talk to it in plain English. - [What This Does](#what-this-does)
- [Prerequisites](#prerequisites)
- [Step-by-Step Guide](#step-by-step-guide)
- [Example Data](#example-data)
- [Why Image Descriptions Matter](#why-image-descriptions-matter)
- [Costs](#costs)
- [Troubleshooting](#troubleshooting)
- [How It Works](#how-it-works)
- [License](#license)
## How is this different from a Google search? ## Quick Start
Google searches the internet. This searches YOUR files. ```bash
git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
Imagine you have 500 PDFs, research papers, photos, and notes cd multimodal-rag-guide
scattered across folders. Normal file search only matches exact claude
words. This system understands meaning. You ask "what do we know
about storms on other planets?" and it finds the Jupiter fact sheet
mentioning wind speeds, the Jupiter photograph showing cloud bands,
and the solar system overview describing atmospheric composition.
It connects information across files and formats. That is what
makes it powerful.
## What you need
1. **Claude Code** (comes with Claude Pro at $20/month or Claude Max)
2. **A Google AI Studio account** (free) for Gemini embeddings
3. **A Pinecone account** (free tier) for the vector database
4. **30-45 minutes** for your first time
No programming knowledge required. You will copy prompts into
Claude Code, and it will build everything.
## How it works (the simple version)
```
Your files ──> Embeddings (Gemini) ──> Vector database (Pinecone)
Your question ──> Embedding (Gemini) ──> Search ──> Claude answers
``` ```
1. Your files get converted into "embeddings" (numerical fingerprints Then paste the prompt from [`prompts/01-setup.md`](prompts/01-setup.md) into Claude Code.
that capture meaning)
2. When you ask a question, it gets the same treatment
3. The system finds fingerprints that match
4. Claude reads the matching content and answers your question
For a deeper explanation, see [concepts.md](concepts.md). Four prompts, 30 minutes, working multimodal search.
## Step 0: Get your accounts (10 minutes) ## What This Does
### Google AI Studio (for embeddings) One search box that understands PDFs, images, and text at the same time.
Embeddings convert your content into searchable vectors. We use Ask "What is the largest planet in our solar system?" and the system
Google's Gemini Embedding 2 for this because it handles text, returns the Jupiter fact sheet from a PDF, the Voyager photograph of
images, and video. the Great Red Spot from a JPG, and a confidence score for each result.
One question, multiple formats, ranked by meaning.
This is called Retrieval-Augmented Generation (RAG). Google's
Gemini Embedding 2 handles the multimodal part: it converts different
content types into the same numerical format so they become searchable
together. Claude Code handles the building part: it reads your prompts
and writes all the code. You handle neither.
## Prerequisites
| Requirement | Cost | What it does |
|---|---|---|
| [Claude Code](https://claude.ai) | Part of Claude Pro ($20/mo) or Max | Builds the app and answers questions |
| [Google AI Studio](https://aistudio.google.com/) | Free tier | Gemini Embedding 2 API key |
| [Pinecone](https://www.pinecone.io/) | Free tier | Vector database for storing embeddings |
No programming knowledge required.
## Step-by-Step Guide
### Step 0: Get your API keys (10 minutes)
**Google AI Studio** (for Gemini Embedding 2):
1. Go to [aistudio.google.com](https://aistudio.google.com/) 1. Go to [aistudio.google.com](https://aistudio.google.com/)
2. Sign in with a Google account 2. Sign in with a Google account
3. Click "Get API key" in the left sidebar 3. Click "Get API key" in the left sidebar
4. Click "Create API key" 4. Click "Create API key" and copy it
5. Copy the key somewhere safe
**What is an API key?** It is like a password that lets your app **Pinecone** (for the vector database):
talk to Google's embedding service. You will paste it into a
configuration file later. It never leaves your computer.
### Pinecone (for storing embeddings)
A vector database stores embeddings so you can search through
them. Think of it as a smart filing cabinet.
1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account 1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
2. Once in the dashboard, click "Create Index" 2. In the dashboard, click "Create Index"
3. Name it `space-search` (or whatever you like) 3. Name it `space-search`, set dimensions to `3072`, choose `cosine` metric
4. Set dimensions to `3072` (this matches Gemini Embedding 2) 4. Select the free "Starter" plan
5. Choose the `cosine` metric 5. Copy your API key from "API Keys"
6. Select the free "Starter" plan
7. Copy your API key from the "API Keys" section
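
The `3072` setting matters because a vector index only accepts vectors of the length it was created with. The following is a minimal Python sketch of that constraint, independent of either service. `check_dimensions` is a hypothetical helper for illustration, not part of this repository or of the Pinecone client.

```python
# Hypothetical illustration: the index dimension chosen in the Pinecone step.
EXPECTED_DIMENSIONS = 3072


def check_dimensions(vector, expected=EXPECTED_DIMENSIONS):
    """Return the vector unchanged, or raise ValueError on a length mismatch."""
    if len(vector) != expected:
        raise ValueError(
            f"vector has {len(vector)} dimensions, index expects {expected}"
        )
    return vector


# A placeholder vector of the right length passes the check.
check_dimensions([0.0] * EXPECTED_DIMENSIONS)
```

If the index were created with a different dimension than the embeddings, uploads would be expected to fail in a similar way, which is why the two numbers should match.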
### Verify you have Claude Code ### Step 1: Clone and start Claude Code (5 minutes)
Open your terminal and type `claude`. If Claude Code starts, ```bash
you are ready. If not, install it: git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
cd multimodal-rag-guide
```
npm install -g @anthropic-ai/claude-code
```
You need a Claude Pro or Max subscription for this to work.
## Step 1: Get the example files
Clone or download this repository. The `example-data/` folder
contains everything you need to get started:
**PDFs:**
- `solar-system-overview.pdf` - Overview of our solar system (NASA)
- `jupiter-fact-sheet.pdf` - Detailed data about Jupiter (NASA)
- `solar-system-moons.pdf` - Guide to planetary moons (NASA)
**Images:**
- `earthrise.jpg` - Earth seen from lunar orbit, Apollo 8 (1968)
- `aldrin-moon.jpg` - Buzz Aldrin on the Moon, Apollo 11 (1969)
- `jupiter-great-red-spot.jpg` - Jupiter photographed by Voyager 1 (1979)
- `iss-over-earth.jpg` - The Moon seen from the ISS
**Descriptions:**
- `descriptions.md` - Detailed text descriptions of each image.
This is the most important file for image search quality.
See the section below on why descriptions matter.
All files are NASA public domain. No copyright restrictions.
## Step 2: Start Claude Code (5 minutes)
Open your terminal, navigate to this folder, and start Claude Code:
```
claude claude
``` ```
Then copy the prompt from [prompts/01-setup.md](prompts/01-setup.md) Paste the prompt from [`prompts/01-setup.md`](prompts/01-setup.md).
and paste it into Claude Code. Claude Code creates the project structure and installs dependencies.
Claude Code will create the project structure and install When done, copy `env.template` to `.env` and fill in your API keys.
dependencies. When it is done, copy `.env.template` to `.env`
and fill in your API keys.
## Step 3: Ingest your files (10 minutes) ### Step 2: Ingest your files (10 minutes)
Copy the prompt from [prompts/02-ingest.md](prompts/02-ingest.md) Paste the prompt from [`prompts/02-ingest.md`](prompts/02-ingest.md).
into Claude Code.
Claude Code will read each file, split it into chunks, generate Claude Code reads each file, splits it into chunks, generates
embeddings, and store everything in Pinecone. You will see a embeddings via Gemini Embedding 2, and stores everything in Pinecone.
summary of what was processed.
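
To make "splits it into chunks" concrete, here is a minimal Python sketch of fixed-size chunking with overlap. It is an assumption about one common approach, not the code Claude Code actually generates; the function name `chunk_text` and the sizes are illustrative only.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of chunk_size characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at
    a chunk boundary still appears whole in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "Jupiter is the largest planet in our solar system. " * 10
for number, chunk in enumerate(chunk_text(sample), start=1):
    print(number, len(chunk))
```

Each chunk would then be embedded and stored as its own vector, so a search can point to the specific passage rather than the whole file.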
This is the step where your files become searchable. ### Step 3: Search (5 minutes)
## Step 4: Search (5 minutes) Paste the prompt from [`prompts/03-search.md`](prompts/03-search.md).
Copy the prompt from [prompts/03-search.md](prompts/03-search.md) Claude Code builds a web interface. Open `http://localhost:3333`
into Claude Code. in your browser and try these searches:
Claude Code will build a web interface and start it. Open the URL | Query | Expected results |
it gives you (usually `http://localhost:3333`) in your browser.
Try these searches:
| Search query | What should come back |
|---|---| |---|---|
| "What is the largest planet?" | Jupiter fact sheet + Jupiter image | | "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
| "First Moon landing" | Aldrin image + solar system overview | | "First Moon landing" | Aldrin image + solar system overview |
| "Which moon has volcanoes?" | Moons PDF (mentioning Io) | | "Which moon has volcanoes?" | Moons PDF mentioning Io |
| "How far is Jupiter from Earth?" | Jupiter fact sheet (588.5 to 968.1 million km) | | "How far is Jupiter from Earth?" | Jupiter fact sheet with exact distance |
| "What do astronauts see from orbit?" | ISS image description |
Notice how a single question can pull results from both PDFs and A single question pulls results from both PDFs and images.
images. That is multimodal search.
## Step 5: Make it your own ### Step 4: Make it your own
Now that you have seen it work with NASA files, try it with Replace the NASA example files with your own content:
your own content:
1. Add your own PDFs, images, or documents to the `example-data/` folder 1. Add PDFs, images, or documents to `example-data/`
2. Write descriptions for any images (see the tips in `descriptions.md`) 2. Write descriptions for images (see [`example-data/descriptions.md`](example-data/descriptions.md))
3. Use [prompts/04-improve.md](prompts/04-improve.md) to re-index 3. Paste [`prompts/04-improve.md`](prompts/04-improve.md) to re-index
Ideas for what to search: Ideas: company documents, research papers, travel photos,
- Your company's internal documents recipe collections, course notes.
- Research papers for a project
- Travel photos with descriptions
- Recipe collections
- Course notes and textbook screenshots
## Why image descriptions matter ## Example Data
The search system cannot "see" your images directly. It finds The `example-data/` folder contains NASA public domain files
images through their text descriptions. This means: (no copyright restrictions):
**Bad description:** "Photo of a planet" will only match | File | Description |
searches containing "photo" or "planet." |---|---|
| `solar-system-overview.pdf` | Overview of our solar system |
| `jupiter-fact-sheet.pdf` | Detailed data about Jupiter |
| `solar-system-moons.pdf` | Guide to planetary moons |
| `earthrise.jpg` | Earth from lunar orbit, Apollo 8 (1968) |
| `aldrin-moon.jpg` | Buzz Aldrin on the Moon, Apollo 11 (1969) |
| `jupiter-great-red-spot.jpg` | Jupiter by Voyager 1 (1979) |
| `iss-over-earth.jpg` | The Moon seen from the ISS |
| `descriptions.md` | Image descriptions for search quality |
**Good description:** "Full-disk portrait of Jupiter captured by ## Why Image Descriptions Matter
Voyager 1 in 1979, showing horizontal cloud bands and the Great
Red Spot, a massive storm larger than Earth" will match searches
about Jupiter, Voyager missions, storms, cloud patterns, and more.
The `descriptions.md` file in `example-data/` shows side-by-side The search system finds images through their text descriptions,
examples of bad versus good descriptions. Spending five minutes not by "seeing" them. A description like "Photo of a planet" only
on better descriptions will dramatically improve your search matches searches containing those exact concepts. A description
results. like "Full-disk portrait of Jupiter captured by Voyager 1 in 1979,
showing horizontal cloud bands and the Great Red Spot" matches
searches about Jupiter, Voyager missions, storms, and cloud patterns.
## What this costs See [`example-data/descriptions.md`](example-data/descriptions.md)
for side-by-side examples.
$0 extra if you already have a Claude subscription. ## Costs
Both Gemini embeddings and Pinecone have generous free tiers.
See [costs.md](costs.md) for details. $0 extra if you already have a Claude subscription. Both Gemini
Embedding 2 and Pinecone have free tiers that cover this guide
and well beyond.
## If you get stuck See [costs.md](costs.md) for the full breakdown.
See [troubleshooting.md](troubleshooting.md) for the 10 most ## Troubleshooting
common problems and their solutions.
The most effective fix for almost anything: copy the exact error See [troubleshooting.md](troubleshooting.md) for the 10 most common
message and paste it into Claude Code. It is very good at problems. The most effective fix for almost anything: copy the exact
diagnosing its own work. error message and paste it into Claude Code.
## How it works (the deeper version) ## How It Works
Read [concepts.md](concepts.md) for plain-English explanations of: ```
- What are embeddings? Your files --> Chunking --> Gemini Embedding 2 --> Pinecone (vector DB)
- What is a vector database? |
- What is RAG? Your question --> Gemini Embedding 2 --> Search --> Claude answers
- What is chunking? ```
- What does "multimodal" mean?
## Credits Gemini Embedding 2 converts all content types (text, images, video,
audio) into numerical vectors in one shared space. Pinecone stores
and searches those vectors. Claude reads the matching content and
generates answers.
Example data: All PDFs and images are from NASA and are in the For plain-English explanations of embeddings, vector databases, RAG,
public domain (U.S. Government works, no copyright restrictions). and chunking, see [concepts.md](concepts.md).
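
The search step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the three-number vectors are made up (real vectors here have 3072 dimensions), and `cosine_similarity` and `rank` are hypothetical helpers, not the Pinecone API. It only shows the idea behind the `cosine` metric chosen for the index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vector, index, top_k=2):
    """Return the top_k (score, name) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in index.items()
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


# Made-up vectors: the two Jupiter files point in a similar direction.
toy_index = {
    "jupiter-fact-sheet.pdf": [0.9, 0.1, 0.0],
    "jupiter-great-red-spot.jpg": [0.8, 0.2, 0.1],
    "aldrin-moon.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stands in for an embedded question about Jupiter

for score, name in rank(query, toy_index):
    print(f"{score:.3f}  {name}")
```

Because the PDF and the image live in the same vector space, one query can rank both, which is the multimodal behavior the guide demonstrates.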
Built with: ## Built With
- [Claude Code](https://claude.ai) by Anthropic (app building + AI answers)
- [Gemini Embedding 2](https://ai.google.dev/) by Google (multimodal embeddings) - [Claude Code](https://claude.ai) by Anthropic
- [Pinecone](https://www.pinecone.io/) (vector database) - [Gemini Embedding 2](https://ai.google.dev/) by Google
- [Pinecone](https://www.pinecone.io/)
## License
[MIT](LICENSE)
--- ---
*Part of [The Dharma Lab](https://thedharmalab.com). Read the Part of [The Dharma Lab](https://thedharmalab.com). Read the
[full article](https://thedharmalab.com/) for the story behind this project.* [full article](https://thedharmalab.com/) for the story behind
this project.

BIN
docs/demo-screenshot.png Normal file

Binary file not shown.

Size: 298 KiB