Improve README: professional format, demo screenshot, Gemini Embedding 2 focus

Restructured for clarity: table of contents, prerequisites table, quick start section, and embedded screenshot showing actual search results. Title now clearly states Gemini Embedding 2 + Claude Code.
2026-03-12 16:44:19 +01:00
commit f959b53cac
parent edcd1721df
3 changed files with 135 additions and 175 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -5,3 +5,4 @@ src/
 package.json
 package-lock.json
 tsconfig.json
+harness-events.jsonl
--- a/README.md
+++ b/README.md
@@ -1,241 +1,200 @@
-# Build Multimodal Search with Claude Code
+# Multimodal RAG with Gemini Embedding 2 and Claude Code

-Search across your PDFs, images, and documents using plain English.
-No coding required. Claude Code builds everything for you.
+Search across PDFs, images, and documents using plain English.
+No coding required. Claude Code builds everything from prompts.

-## What you will build
+![Search for "What is the largest planet?" returns both the Jupiter photograph and the PDF fact sheet](docs/demo-screenshot.png)

-A local search app that lets you ask questions like:
+> **Gemini Embedding 2** converts text, images, and video into the same
+> searchable space. **Claude Code** builds the app. **Pinecone** stores the
+> vectors. You just copy four prompts.

-- "What is the largest planet in our solar system?"
-- "Show me photos from the first Moon landing"
-- "Which moon has active volcanoes?"
+## Table of Contents

-The app searches through your PDFs and images simultaneously and
-gives you answers with sources. You talk to it in plain English.
+- [Quick Start](#quick-start)
+- [What This Does](#what-this-does)
+- [Prerequisites](#prerequisites)
+- [Step-by-Step Guide](#step-by-step-guide)
+- [Example Data](#example-data)
+- [Why Image Descriptions Matter](#why-image-descriptions-matter)
+- [Costs](#costs)
+- [Troubleshooting](#troubleshooting)
+- [How It Works](#how-it-works)
+- [License](#license)

-## How is this different from a Google search?
+## Quick Start

-Google searches the internet. This searches YOUR files.
-
-Imagine you have 500 PDFs, research papers, photos, and notes
-scattered across folders. Normal file search only matches exact
-words. This system understands meaning. You ask "what do we know
-about storms on other planets?" and it finds the Jupiter fact sheet
-mentioning wind speeds, the Jupiter photograph showing cloud bands,
-and the solar system overview describing atmospheric composition.
-
-It connects information across files and formats. That is what
-makes it powerful.
-
-## What you need
-
-1. **Claude Code** (comes with Claude Pro at $20/month or Claude Max)
-2. **A Google AI Studio account** (free) for Gemini embeddings
-3. **A Pinecone account** (free tier) for the vector database
-4. **30-45 minutes** for your first time
-
-No programming knowledge required. You will copy prompts into
-Claude Code, and it will build everything.
-
-## How it works (the simple version)
-
-```
-Your files ──> Embeddings (Gemini) ──> Vector database (Pinecone)
-                                              │
-Your question ──> Embedding (Gemini) ──> Search ──> Claude answers
+```bash
+git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
+cd multimodal-rag-guide
+claude
 ```

-1. Your files get converted into "embeddings" (numerical fingerprints
-   that capture meaning)
-2. When you ask a question, it gets the same treatment
-3. The system finds fingerprints that match
-4. Claude reads the matching content and answers your question
+Then paste the prompt from [`prompts/01-setup.md`](prompts/01-setup.md) into Claude Code.

-For a deeper explanation, see [concepts.md](concepts.md).
+Four prompts, 30 minutes, working multimodal search.

-## Step 0: Get your accounts (10 minutes)
+## What This Does

-### Google AI Studio (for embeddings)
+One search box that understands PDFs, images, and text at the same time.

-Embeddings convert your content into searchable vectors. We use
-Google's Gemini Embedding 2 for this because it handles text,
-images, and video.
+Ask "What is the largest planet in our solar system?" and the system
+returns the Jupiter fact sheet from a PDF, the Voyager photograph of
+the Great Red Spot from a JPG, and a confidence score for each result.
+One question, multiple formats, ranked by meaning.
+
+This is called Retrieval-Augmented Generation (RAG). Google's
+Gemini Embedding 2 handles the multimodal part: it converts different
+content types into the same numerical format so they become searchable
+together. Claude Code handles the building part: it reads your prompts
+and writes all the code. You handle neither.
+
+## Prerequisites
+
+| Requirement | Cost | What it does |
+|---|---|---|
+| [Claude Code](https://claude.ai) | Part of Claude Pro ($20/mo) or Max | Builds the app and answers questions |
+| [Google AI Studio](https://aistudio.google.com/) | Free tier | Gemini Embedding 2 API key |
+| [Pinecone](https://www.pinecone.io/) | Free tier | Vector database for storing embeddings |
+
+No programming knowledge required.
+
+## Step-by-Step Guide
+
+### Step 0: Get your API keys (10 minutes)
+
+**Google AI Studio** (for Gemini Embedding 2):

 1. Go to [aistudio.google.com](https://aistudio.google.com/)
 2. Sign in with a Google account
 3. Click "Get API key" in the left sidebar
-4. Click "Create API key"
-5. Copy the key somewhere safe
+4. Click "Create API key" and copy it

-**What is an API key?** It is like a password that lets your app
-talk to Google's embedding service. You will paste it into a
-configuration file later. It never leaves your computer.
-
-### Pinecone (for storing embeddings)
-
-A vector database stores embeddings so you can search through
-them. Think of it as a smart filing cabinet.
+**Pinecone** (for the vector database):

 1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
-2. Once in the dashboard, click "Create Index"
-3. Name it `space-search` (or whatever you like)
-4. Set dimensions to `3072` (this matches Gemini Embedding 2)
-5. Choose the `cosine` metric
-6. Select the free "Starter" plan
-7. Copy your API key from the "API Keys" section
+2. In the dashboard, click "Create Index"
+3. Name it `space-search`, set dimensions to `3072`, choose `cosine` metric
+4. Select the free "Starter" plan
+5. Copy your API key from "API Keys"

-### Verify you have Claude Code
+### Step 1: Clone and start Claude Code (5 minutes)

-Open your terminal and type `claude`. If Claude Code starts,
-you are ready. If not, install it:
-
-```
-npm install -g @anthropic-ai/claude-code
-```
-
-You need a Claude Pro or Max subscription for this to work.
-
-## Step 1: Get the example files
-
-Clone or download this repository. The `example-data/` folder
-contains everything you need to get started:
-
-**PDFs:**
-- `solar-system-overview.pdf` - Overview of our solar system (NASA)
-- `jupiter-fact-sheet.pdf` - Detailed data about Jupiter (NASA)
-- `solar-system-moons.pdf` - Guide to planetary moons (NASA)
-
-**Images:**
-- `earthrise.jpg` - Earth seen from lunar orbit, Apollo 8 (1968)
-- `aldrin-moon.jpg` - Buzz Aldrin on the Moon, Apollo 11 (1969)
-- `jupiter-great-red-spot.jpg` - Jupiter photographed by Voyager 1 (1979)
-- `iss-over-earth.jpg` - The Moon seen from the ISS
-
-**Descriptions:**
-- `descriptions.md` - Detailed text descriptions of each image.
-  This is the most important file for image search quality.
-  See the section below on why descriptions matter.
-
-All files are NASA public domain. No copyright restrictions.
-
-## Step 2: Start Claude Code (5 minutes)
-
-Open your terminal, navigate to this folder, and start Claude Code:
-
-```
+```bash
+git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
+cd multimodal-rag-guide
 claude
 ```

-Then copy the prompt from [prompts/01-setup.md](prompts/01-setup.md)
-and paste it into Claude Code.
+Paste the prompt from [`prompts/01-setup.md`](prompts/01-setup.md).
+Claude Code creates the project structure and installs dependencies.

-Claude Code will create the project structure and install
-dependencies. When it is done, copy `.env.template` to `.env`
-and fill in your API keys.
+When done, copy `env.template` to `.env` and fill in your API keys.

-## Step 3: Ingest your files (10 minutes)
+### Step 2: Ingest your files (10 minutes)

-Copy the prompt from [prompts/02-ingest.md](prompts/02-ingest.md)
-into Claude Code.
+Paste the prompt from [`prompts/02-ingest.md`](prompts/02-ingest.md).

-Claude Code will read each file, split it into chunks, generate
-embeddings, and store everything in Pinecone. You will see a
-summary of what was processed.
+Claude Code reads each file, splits it into chunks, generates
+embeddings via Gemini Embedding 2, and stores everything in Pinecone.

-This is the step where your files become searchable.
+### Step 3: Search (5 minutes)

-## Step 4: Search (5 minutes)
+Paste the prompt from [`prompts/03-search.md`](prompts/03-search.md).

-Copy the prompt from [prompts/03-search.md](prompts/03-search.md)
-into Claude Code.
+Claude Code builds a web interface. Open `http://localhost:3333`
+in your browser and try these searches:

-Claude Code will build a web interface and start it. Open the URL
-it gives you (usually `http://localhost:3333`) in your browser.
-
-Try these searches:
-
-| Search query | What should come back |
+| Query | Expected results |
 |---|---|
 | "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
 | "First Moon landing" | Aldrin image + solar system overview |
-| "Which moon has volcanoes?" | Moons PDF (mentioning Io) |
-| "How far is Jupiter from Earth?" | Jupiter fact sheet (588.5 to 968.1 million km) |
-| "What do astronauts see from orbit?" | ISS image description |
+| "Which moon has volcanoes?" | Moons PDF mentioning Io |
+| "How far is Jupiter from Earth?" | Jupiter fact sheet with exact distance |

-Notice how a single question can pull results from both PDFs and
-images. That is multimodal search.
+A single question pulls results from both PDFs and images.

-## Step 5: Make it your own
+### Step 4: Make it your own

-Now that you have seen it work with NASA files, try it with
-your own content:
+Replace the NASA example files with your own content:

-1. Add your own PDFs, images, or documents to the `example-data/` folder
-2. Write descriptions for any images (see the tips in `descriptions.md`)
-3. Use [prompts/04-improve.md](prompts/04-improve.md) to re-index
+1. Add PDFs, images, or documents to `example-data/`
+2. Write descriptions for images (see [`example-data/descriptions.md`](example-data/descriptions.md))
+3. Paste [`prompts/04-improve.md`](prompts/04-improve.md) to re-index

-Ideas for what to search:
-- Your company's internal documents
-- Research papers for a project
-- Travel photos with descriptions
-- Recipe collections
-- Course notes and textbook screenshots
+Ideas: company documents, research papers, travel photos,
+recipe collections, course notes.

-## Why image descriptions matter
+## Example Data

-The search system cannot "see" your images directly. It finds
-images through their text descriptions. This means:
+The `example-data/` folder contains NASA public domain files
+(no copyright restrictions):

-**Bad description:** "Photo of a planet" will only match
-searches containing "photo" or "planet."
+| File | Description |
+|---|---|
+| `solar-system-overview.pdf` | Overview of our solar system |
+| `jupiter-fact-sheet.pdf` | Detailed data about Jupiter |
+| `solar-system-moons.pdf` | Guide to planetary moons |
+| `earthrise.jpg` | Earth from lunar orbit, Apollo 8 (1968) |
+| `aldrin-moon.jpg` | Buzz Aldrin on the Moon, Apollo 11 (1969) |
+| `jupiter-great-red-spot.jpg` | Jupiter by Voyager 1 (1979) |
+| `iss-over-earth.jpg` | The Moon seen from the ISS |
+| `descriptions.md` | Image descriptions for search quality |

-**Good description:** "Full-disk portrait of Jupiter captured by
-Voyager 1 in 1979, showing horizontal cloud bands and the Great
-Red Spot, a massive storm larger than Earth" will match searches
-about Jupiter, Voyager missions, storms, cloud patterns, and more.
+## Why Image Descriptions Matter

-The `descriptions.md` file in `example-data/` shows side-by-side
-examples of bad versus good descriptions. Spending five minutes
-on better descriptions will dramatically improve your search
-results.
+The search system finds images through their text descriptions,
+not by "seeing" them. A description like "Photo of a planet" only
+matches searches containing those exact concepts. A description
+like "Full-disk portrait of Jupiter captured by Voyager 1 in 1979,
+showing horizontal cloud bands and the Great Red Spot" matches
+searches about Jupiter, Voyager missions, storms, and cloud patterns.

-## What this costs
+See [`example-data/descriptions.md`](example-data/descriptions.md)
+for side-by-side examples.

-$0 extra if you already have a Claude subscription.
-Both Gemini embeddings and Pinecone have generous free tiers.
+## Costs

-See [costs.md](costs.md) for details.
+$0 extra if you already have a Claude subscription. Both Gemini
+Embedding 2 and Pinecone have free tiers that cover this guide
+and well beyond.

-## If you get stuck
+See [costs.md](costs.md) for the full breakdown.

-See [troubleshooting.md](troubleshooting.md) for the 10 most
-common problems and their solutions.
+## Troubleshooting

-The most effective fix for almost anything: copy the exact error
-message and paste it into Claude Code. It is very good at
-diagnosing its own work.
+See [troubleshooting.md](troubleshooting.md) for the 10 most common
+problems. The most effective fix for almost anything: copy the exact
+error message and paste it into Claude Code.

-## How it works (the deeper version)
+## How It Works

-Read [concepts.md](concepts.md) for plain-English explanations of:
-- What are embeddings?
-- What is a vector database?
-- What is RAG?
-- What is chunking?
-- What does "multimodal" mean?
+```
+Your files --> Chunking --> Gemini Embedding 2 --> Pinecone (vector DB)
+                                                        |
+Your question --> Gemini Embedding 2 --> Search --> Claude answers
+```

-## Credits
+Gemini Embedding 2 converts all content types (text, images, video,
+audio) into numerical vectors in one shared space. Pinecone stores
+and searches those vectors. Claude reads the matching content and
+generates answers.

-Example data: All PDFs and images are from NASA and are in the
-public domain (U.S. Government works, no copyright restrictions).
+For plain-English explanations of embeddings, vector databases, RAG,
+and chunking, see [concepts.md](concepts.md).

-Built with:
-- [Claude Code](https://claude.ai) by Anthropic (app building + AI answers)
-- [Gemini Embedding 2](https://ai.google.dev/) by Google (multimodal embeddings)
-- [Pinecone](https://www.pinecone.io/) (vector database)
+## Built With
+
+- [Claude Code](https://claude.ai) by Anthropic
+- [Gemini Embedding 2](https://ai.google.dev/) by Google
+- [Pinecone](https://www.pinecone.io/)
+
+## License
+
+[MIT](LICENSE)

 ---

-*Part of [The Dharma Lab](https://thedharmalab.com). Read the
-[full article](https://thedharmalab.com/) for the story behind this project.*
+Part of [The Dharma Lab](https://thedharmalab.com). Read the
+[full article](https://thedharmalab.com/) for the story behind
+this project.
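
Review note on README.md, Step 2: the new text says Claude Code "splits it into chunks" but does not show what a chunk looks like. The sketch below is one possible illustration in TypeScript (the repo ignores `tsconfig.json`, so TypeScript is assumed as the project language). The function name `chunkText` and the `size` and `overlap` parameters are hypothetical and are not taken from the prompts or from any code Claude Code generates. It shows plain fixed-size character chunking with overlap; the real ingest step may well split on pages, sentences, or tokens instead.

```typescript
// Hypothetical helper, not part of this repository.
// Splits text into fixed-size character windows that overlap, so a sentence
// cut at a chunk boundary still appears whole in a neighbouring chunk.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("size must be positive and overlap must be in [0, size)");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the current window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` yields three chunks, `"abcd"`, `"defg"`, and `"ghij"`, each sharing one character with its neighbour. Each chunk would then be embedded and stored separately.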
--- a/docs/demo-screenshot.png
+++ b/docs/demo-screenshot.png
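
Review note on the "How It Works" section: the README asks for a `cosine` index and says results are "ranked by meaning", which may be opaque to the target audience. The sketch below is an in-memory illustration of cosine-similarity ranking over vectors from mixed sources. It is an assumption-laden toy: the `IndexedItem` type, the `topK` function, and the 3-dimensional vectors are hypothetical stand-ins (the guide's index uses 3072 dimensions), and in the actual app Pinecone performs this search rather than local code.

```typescript
// Hypothetical types and functions for illustration only.
type IndexedItem = { id: string; kind: "pdf" | "image"; vector: number[] };
type Scored = { id: string; kind: "pdf" | "image"; score: number };

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query and keep the k best, highest first.
// PDFs and images are ranked together because they share one vector space.
function topK(query: number[], items: IndexedItem[], k: number): Scored[] {
  return items
    .map((item) => ({
      id: item.id,
      kind: item.kind,
      score: cosineSimilarity(query, item.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: made-up 3-dimensional vectors, not real embeddings.
const items: IndexedItem[] = [
  { id: "solar-system-moons.pdf", kind: "pdf", vector: [0, 1, 0] },
  { id: "jupiter-great-red-spot.jpg", kind: "image", vector: [0.8, 0.2, 0] },
  { id: "jupiter-fact-sheet.pdf", kind: "pdf", vector: [0.9, 0.1, 0] },
];
const results = topK([1, 0, 0], items, 2);
```

With this toy data, `results` holds the Jupiter fact sheet first and the Jupiter photograph second, which mirrors the README's claim that one question can return both a PDF and an image.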