
Improve README: professional format, demo screenshot, Gemini Embedding 2 focus

Restructured for clarity: table of contents, prerequisites table,
quick start section, and embedded screenshot showing actual search
results. Title now clearly states Gemini Embedding 2 + Claude Code.
commit f959b53cac — Kjell Tore Guttormsen, 2026-03-12 16:44:19 +01:00
3 changed files with 135 additions and 175 deletions

.gitignore vendored

@@ -5,3 +5,4 @@ src/
package.json
package-lock.json
tsconfig.json
harness-events.jsonl

README.md

@@ -1,241 +1,200 @@
# Multimodal RAG with Gemini Embedding 2 and Claude Code

Search across PDFs, images, and documents using plain English.
No coding required. Claude Code builds everything from prompts.

![Search for "What is the largest planet?" returns both the Jupiter photograph and the PDF fact sheet](docs/demo-screenshot.png)

> **Gemini Embedding 2** converts text, images, and video into the same
> searchable space. **Claude Code** builds the app. **Pinecone** stores the
> vectors. You just copy four prompts.
## Table of Contents

- [Quick Start](#quick-start)
- [What This Does](#what-this-does)
- [Prerequisites](#prerequisites)
- [Step-by-Step Guide](#step-by-step-guide)
- [Example Data](#example-data)
- [Why Image Descriptions Matter](#why-image-descriptions-matter)
- [Costs](#costs)
- [Troubleshooting](#troubleshooting)
- [How It Works](#how-it-works)
- [License](#license)
## Quick Start

```bash
git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
cd multimodal-rag-guide
claude
```

Then paste the prompt from [`prompts/01-setup.md`](prompts/01-setup.md) into Claude Code.

Four prompts, 30 minutes, working multimodal search.
## What This Does

One search box that understands PDFs, images, and text at the same time.

Ask "What is the largest planet in our solar system?" and the system
returns the Jupiter fact sheet from a PDF, the Voyager photograph of
the Great Red Spot from a JPG, and a confidence score for each result.
One question, multiple formats, ranked by meaning.

This is called Retrieval-Augmented Generation (RAG). Google's
Gemini Embedding 2 handles the multimodal part: it converts different
content types into the same numerical format so they become searchable
together. Claude Code handles the building part: it reads your prompts
and writes all the code. You handle neither.
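To make "ranked by meaning" concrete: every file and every question becomes a vector, and results are ordered by how closely their vectors point in the same direction. This is a toy sketch of that ranking idea, not code from the generated app:

```typescript
// Illustrative only: results are ranked by cosine similarity between
// the query vector and each stored vector. This is also where the
// per-result confidence score comes from.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions score 1, unrelated (orthogonal) directions score 0, so a question about "the largest planet" lands near both the Jupiter PDF chunks and the Jupiter image description.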
## Prerequisites

| Requirement | Cost | What it does |
|---|---|---|
| [Claude Code](https://claude.ai) | Part of Claude Pro ($20/mo) or Max | Builds the app and answers questions |
| [Google AI Studio](https://aistudio.google.com/) | Free tier | Gemini Embedding 2 API key |
| [Pinecone](https://www.pinecone.io/) | Free tier | Vector database for storing embeddings |

No programming knowledge required.

## Step-by-Step Guide

### Step 0: Get your API keys (10 minutes)

**Google AI Studio** (for Gemini Embedding 2):

1. Go to [aistudio.google.com](https://aistudio.google.com/)
2. Sign in with a Google account
3. Click "Get API key" in the left sidebar
4. Click "Create API key" and copy it

**Pinecone** (for the vector database):
1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
2. In the dashboard, click "Create Index"
3. Name it `space-search`, set dimensions to `3072`, choose the `cosine` metric
4. Select the free "Starter" plan
5. Copy your API key from "API Keys"
### Step 1: Clone and start Claude Code (5 minutes)

Open your terminal and type `claude`. If Claude Code starts, you are
ready. If not, install it (requires a Claude Pro or Max subscription):

```
npm install -g @anthropic-ai/claude-code
```

Then clone the repository and start Claude Code inside it:

```bash
git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
cd multimodal-rag-guide
claude
```

Paste the prompt from [`prompts/01-setup.md`](prompts/01-setup.md).
Claude Code creates the project structure and installs dependencies.

When done, copy `env.template` to `.env` and fill in your API keys.
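Your `.env` file holds the two keys from Step 0. The exact variable names come from the repository's `env.template`; the names below are only illustrative:

```ini
# Illustrative .env contents — use the variable names from env.template
GEMINI_API_KEY=your-google-ai-studio-key
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX=space-search
```

The file stays on your machine; `.env` is excluded from version control.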
### Step 2: Ingest your files (10 minutes)

Paste the prompt from [`prompts/02-ingest.md`](prompts/02-ingest.md).

Claude Code reads each file, splits it into chunks, generates
embeddings via Gemini Embedding 2, and stores everything in Pinecone.
This is the step where your files become searchable.
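"Splits it into chunks" typically means fixed-size windows with some overlap, so a sentence that straddles a boundary still appears whole in at least one chunk. An illustrative sketch (the function name and sizes are assumptions, not the generated app's code):

```typescript
// Illustrative chunker: fixed-size character windows with overlap.
// The window and overlap sizes here are arbitrary examples.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk is then embedded separately, so search can point at the exact passage that answers a question instead of a whole PDF.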
### Step 3: Search (5 minutes)

Paste the prompt from [`prompts/03-search.md`](prompts/03-search.md).

Claude Code builds a web interface. Open `http://localhost:3333`
in your browser and try these searches:

| Query | Expected results |
|---|---|
| "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
| "First Moon landing" | Aldrin image + solar system overview |
| "Which moon has volcanoes?" | Moons PDF mentioning Io |
| "How far is Jupiter from Earth?" | Jupiter fact sheet with exact distance |
| "What do astronauts see from orbit?" | ISS image description |

A single question pulls results from both PDFs and images.
### Step 4: Make it your own

Replace the NASA example files with your own content:

1. Add PDFs, images, or documents to `example-data/`
2. Write descriptions for images (see [`example-data/descriptions.md`](example-data/descriptions.md))
3. Paste [`prompts/04-improve.md`](prompts/04-improve.md) to re-index

Ideas: company documents, research papers, travel photos,
recipe collections, course notes.
## Example Data

The `example-data/` folder contains NASA public domain files
(no copyright restrictions):

| File | Description |
|---|---|
| `solar-system-overview.pdf` | Overview of our solar system |
| `jupiter-fact-sheet.pdf` | Detailed data about Jupiter |
| `solar-system-moons.pdf` | Guide to planetary moons |
| `earthrise.jpg` | Earth from lunar orbit, Apollo 8 (1968) |
| `aldrin-moon.jpg` | Buzz Aldrin on the Moon, Apollo 11 (1969) |
| `jupiter-great-red-spot.jpg` | Jupiter by Voyager 1 (1979) |
| `iss-over-earth.jpg` | The Moon seen from the ISS |
| `descriptions.md` | Image descriptions for search quality |
## Why Image Descriptions Matter

The search system finds images through their text descriptions,
not by "seeing" them. A description like "Photo of a planet" only
matches searches containing those exact concepts. A description
like "Full-disk portrait of Jupiter captured by Voyager 1 in 1979,
showing horizontal cloud bands and the Great Red Spot" matches
searches about Jupiter, Voyager missions, storms, and cloud patterns.

See [`example-data/descriptions.md`](example-data/descriptions.md)
for side-by-side examples.
## Costs

$0 extra if you already have a Claude subscription. Both Gemini
Embedding 2 and Pinecone have free tiers that cover this guide
and well beyond.

See [costs.md](costs.md) for the full breakdown.
## Troubleshooting

See [troubleshooting.md](troubleshooting.md) for the 10 most common
problems. The most effective fix for almost anything: copy the exact
error message and paste it into Claude Code. It is very good at
diagnosing its own work.
## How It Works

```
Your files --> Chunking --> Gemini Embedding 2 --> Pinecone (vector DB)
                                                        |
Your question --> Gemini Embedding 2 --> Search --> Claude answers
```

Gemini Embedding 2 converts all content types (text, images, video,
audio) into numerical vectors in one shared space. Pinecone stores
and searches those vectors. Claude reads the matching content and
generates answers.
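Conceptually, the "Search" step is nearest-neighbor ranking: score every stored vector against the question's vector and keep the best matches. A toy in-memory stand-in for what Pinecone does at scale (illustrative names only, not the generated app's code):

```typescript
// Toy in-memory stand-in for a vector database query (illustrative only).
type Doc = { id: string; vector: number[] };

// Dot product stands in for cosine similarity here; the two are
// equivalent when vectors are normalized to unit length.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

// Return the ids of the k stored vectors most similar to the query.
function topK(query: number[], docs: Doc[], k: number): string[] {
  return [...docs]
    .sort((x, y) => dot(query, y.vector) - dot(query, x.vector))
    .slice(0, k)
    .map((d) => d.id);
}
```

A real vector database uses approximate-nearest-neighbor indexes instead of scanning every vector, but the ranking idea is the same.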
For plain-English explanations of embeddings, vector databases, RAG,
and chunking, see [concepts.md](concepts.md).
## Built With

- [Claude Code](https://claude.ai) by Anthropic
- [Gemini Embedding 2](https://ai.google.dev/) by Google
- [Pinecone](https://www.pinecone.io/)
## License

[MIT](LICENSE)

---
Part of [The Dharma Lab](https://thedharmalab.com). Read the
[full article](https://thedharmalab.com/) for the story behind this project.

docs/demo-screenshot.png (new binary file, 298 KiB)