Improve README: professional format, demo screenshot, Gemini Embedding 2 focus
Restructured for clarity: table of contents, prerequisites table, quick start section, and embedded screenshot showing actual search results. Title now clearly states Gemini Embedding 2 + Claude Code.
parent edcd1721df
commit f959b53cac
3 changed files with 135 additions and 175 deletions
1
.gitignore
vendored
|
|
@@ -5,3 +5,4 @@ src/
|
|||
package.json
|
||||
package-lock.json
|
||||
tsconfig.json
|
||||
harness-events.jsonl
|
||||
|
|
|
|||
321
README.md
|
|
@@ -1,241 +1,200 @@
|
|||
# Build Multimodal Search with Claude Code
|
||||
# Multimodal RAG with Gemini Embedding 2 and Claude Code
|
||||
|
||||
Search across your PDFs, images, and documents using plain English.
|
||||
No coding required. Claude Code builds everything for you.
|
||||
Search across PDFs, images, and documents using plain English.
|
||||
No coding required. Claude Code builds everything from prompts.
|
||||
|
||||
## What you will build
|
||||

|
||||
|
||||
A local search app that lets you ask questions like:
|
||||
> **Gemini Embedding 2** converts text, images, and video into the same
|
||||
> searchable space. **Claude Code** builds the app. **Pinecone** stores the
|
||||
> vectors. You just copy four prompts.
|
||||
|
||||
- "What is the largest planet in our solar system?"
|
||||
- "Show me photos from the first Moon landing"
|
||||
- "Which moon has active volcanoes?"
|
||||
## Table of Contents
|
||||
|
||||
The app searches through your PDFs and images simultaneously and
|
||||
gives you answers with sources. You talk to it in plain English.
|
||||
- [Quick Start](#quick-start)
|
||||
- [What This Does](#what-this-does)
|
||||
- [Prerequisites](#prerequisites)
|
||||
- [Step-by-Step Guide](#step-by-step-guide)
|
||||
- [Example Data](#example-data)
|
||||
- [Why Image Descriptions Matter](#why-image-descriptions-matter)
|
||||
- [Costs](#costs)
|
||||
- [Troubleshooting](#troubleshooting)
|
||||
- [How It Works](#how-it-works)
|
||||
- [License](#license)
|
||||
|
||||
## How is this different from a Google search?
|
||||
## Quick Start
|
||||
|
||||
Google searches the internet. This searches YOUR files.
|
||||
|
||||
Imagine you have 500 PDFs, research papers, photos, and notes
|
||||
scattered across folders. Normal file search only matches exact
|
||||
words. This system understands meaning. You ask "what do we know
|
||||
about storms on other planets?" and it finds the Jupiter fact sheet
|
||||
mentioning wind speeds, the Jupiter photograph showing cloud bands,
|
||||
and the solar system overview describing atmospheric composition.
|
||||
|
||||
It connects information across files and formats. That is what
|
||||
makes it powerful.
|
||||
|
||||
## What you need
|
||||
|
||||
1. **Claude Code** (comes with Claude Pro at $20/month or Claude Max)
|
||||
2. **A Google AI Studio account** (free) for Gemini embeddings
|
||||
3. **A Pinecone account** (free tier) for the vector database
|
||||
4. **30-45 minutes** for your first time
|
||||
|
||||
No programming knowledge required. You will copy prompts into
|
||||
Claude Code, and it will build everything.
|
||||
|
||||
## How it works (the simple version)
|
||||
|
||||
```
|
||||
Your files ──> Embeddings (Gemini) ──> Vector database (Pinecone)
|
||||
│
|
||||
Your question ──> Embedding (Gemini) ──> Search ──> Claude answers
|
||||
```bash
|
||||
git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
|
||||
cd multimodal-rag-guide
|
||||
claude
|
||||
```
|
||||
|
||||
1. Your files get converted into "embeddings" (numerical fingerprints
|
||||
that capture meaning)
|
||||
2. When you ask a question, it gets the same treatment
|
||||
3. The system finds fingerprints that match
|
||||
4. Claude reads the matching content and answers your question
|
||||
Then paste the prompt from [`prompts/01-setup.md`](prompts/01-setup.md) into Claude Code.
|
||||
|
||||
For a deeper explanation, see [concepts.md](concepts.md).
|
||||
Four prompts, 30 minutes, working multimodal search.
|
||||
|
||||
## Step 0: Get your accounts (10 minutes)
|
||||
## What This Does
|
||||
|
||||
### Google AI Studio (for embeddings)
|
||||
One search box that understands PDFs, images, and text at the same time.
|
||||
|
||||
Embeddings convert your content into searchable vectors. We use
|
||||
Google's Gemini Embedding 2 for this because it handles text,
|
||||
images, and video.
|
||||
Ask "What is the largest planet in our solar system?" and the system
|
||||
returns the Jupiter fact sheet from a PDF, the Voyager photograph of
|
||||
the Great Red Spot from a JPG, and a confidence score for each result.
|
||||
One question, multiple formats, ranked by meaning.
|
||||
|
||||
This is called Retrieval-Augmented Generation (RAG). Google's
|
||||
Gemini Embedding 2 handles the multimodal part: it converts different
|
||||
content types into the same numerical format so they become searchable
|
||||
together. Claude Code handles the building part: it reads your prompts
|
||||
and writes all the code. You handle neither.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
| Requirement | Cost | What it does |
|---|---|---|
| [Claude Code](https://claude.ai) | Part of Claude Pro ($20/mo) or Max | Builds the app and answers questions |
| [Google AI Studio](https://aistudio.google.com/) | Free tier | Gemini Embedding 2 API key |
| [Pinecone](https://www.pinecone.io/) | Free tier | Vector database for storing embeddings |
|
||||
|
||||
No programming knowledge required.
|
||||
|
||||
## Step-by-Step Guide
|
||||
|
||||
### Step 0: Get your API keys (10 minutes)
|
||||
|
||||
**Google AI Studio** (for Gemini Embedding 2):
|
||||
|
||||
1. Go to [aistudio.google.com](https://aistudio.google.com/)
|
||||
2. Sign in with a Google account
|
||||
3. Click "Get API key" in the left sidebar
|
||||
4. Click "Create API key"
|
||||
5. Copy the key somewhere safe
|
||||
4. Click "Create API key" and copy it
|
||||
|
||||
**What is an API key?** It is like a password that lets your app
|
||||
talk to Google's embedding service. You will paste it into a
|
||||
configuration file later. It never leaves your computer.
|
||||
|
||||
### Pinecone (for storing embeddings)
|
||||
|
||||
A vector database stores embeddings so you can search through
|
||||
them. Think of it as a smart filing cabinet.
|
||||
**Pinecone** (for the vector database):
|
||||
|
||||
1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
|
||||
2. Once in the dashboard, click "Create Index"
|
||||
3. Name it `space-search` (or whatever you like)
|
||||
4. Set dimensions to `3072` (this matches Gemini Embedding 2)
|
||||
5. Choose the `cosine` metric
|
||||
6. Select the free "Starter" plan
|
||||
7. Copy your API key from the "API Keys" section
|
||||
2. In the dashboard, click "Create Index"
|
||||
3. Name it `space-search`, set dimensions to `3072`, choose `cosine` metric
|
||||
4. Select the free "Starter" plan
|
||||
5. Copy your API key from "API Keys"
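The `cosine` metric and the `3072` dimension setting above can be illustrated with a small sketch. This is not code from the repository or from the Pinecone client; `cosine_similarity` is a hypothetical helper, and the two-dimensional vectors in the usage lines are made-up stand-ins for real 3072-dimensional embeddings. It shows why every vector must have the same number of dimensions as the index, and that cosine similarity compares direction rather than length.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    # An index created with 3072 dimensions only accepts 3072-number vectors;
    # the same constraint is enforced here for any pair of vectors.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (illustrative values only, not real embeddings).
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
unrelated = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

In this toy case `same_direction` is 1 even though the second vector is twice as long, while `unrelated` is 0. That length-independence is a common reason to pick cosine for text and image embeddings.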
|
||||
|
||||
### Verify you have Claude Code
|
||||
### Step 1: Clone and start Claude Code (5 minutes)
|
||||
|
||||
Open your terminal and type `claude`. If Claude Code starts,
|
||||
you are ready. If not, install it:
|
||||
|
||||
```
|
||||
npm install -g @anthropic-ai/claude-code
|
||||
```
|
||||
|
||||
You need a Claude Pro or Max subscription for this to work.
|
||||
|
||||
## Step 1: Get the example files
|
||||
|
||||
Clone or download this repository. The `example-data/` folder
|
||||
contains everything you need to get started:
|
||||
|
||||
**PDFs:**
|
||||
- `solar-system-overview.pdf` - Overview of our solar system (NASA)
|
||||
- `jupiter-fact-sheet.pdf` - Detailed data about Jupiter (NASA)
|
||||
- `solar-system-moons.pdf` - Guide to planetary moons (NASA)
|
||||
|
||||
**Images:**
|
||||
- `earthrise.jpg` - Earth seen from lunar orbit, Apollo 8 (1968)
|
||||
- `aldrin-moon.jpg` - Buzz Aldrin on the Moon, Apollo 11 (1969)
|
||||
- `jupiter-great-red-spot.jpg` - Jupiter photographed by Voyager 1 (1979)
|
||||
- `iss-over-earth.jpg` - The Moon seen from the ISS
|
||||
|
||||
**Descriptions:**
|
||||
- `descriptions.md` - Detailed text descriptions of each image.
|
||||
This is the most important file for image search quality.
|
||||
See the section below on why descriptions matter.
|
||||
|
||||
All files are NASA public domain. No copyright restrictions.
|
||||
|
||||
## Step 2: Start Claude Code (5 minutes)
|
||||
|
||||
Open your terminal, navigate to this folder, and start Claude Code:
|
||||
|
||||
```
|
||||
```bash
|
||||
git clone https://git.thedharmalab.com/ktg/multimodal-rag-guide.git
|
||||
cd multimodal-rag-guide
|
||||
claude
|
||||
```
|
||||
|
||||
Then copy the prompt from [prompts/01-setup.md](prompts/01-setup.md)
|
||||
and paste it into Claude Code.
|
||||
Paste the prompt from [`prompts/01-setup.md`](prompts/01-setup.md).
|
||||
Claude Code creates the project structure and installs dependencies.
|
||||
|
||||
Claude Code will create the project structure and install
|
||||
dependencies. When it is done, copy `.env.template` to `.env`
|
||||
and fill in your API keys.
|
||||
When done, copy `env.template` to `.env` and fill in your API keys.
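Before moving on, it can help to confirm that both keys actually made it into `.env`. The sketch below is an assumption-heavy illustration: the variable names `GEMINI_API_KEY` and `PINECONE_API_KEY` are hypothetical (use whatever names your `env.template` defines), and `parse_env` / `missing_keys` are helpers written for this example, not part of the generated project.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: tuple[str, ...]) -> list[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


# Hypothetical key names; check env.template for the real ones.
REQUIRED = ("GEMINI_API_KEY", "PINECONE_API_KEY")

example = "GEMINI_API_KEY=abc123\n# Pinecone key not filled in yet\nPINECONE_API_KEY=\n"
print(missing_keys(parse_env(example), REQUIRED))
```

To check a real file, read it with `open(".env").read()` and pass the text to `parse_env`.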
|
||||
|
||||
## Step 3: Ingest your files (10 minutes)
|
||||
### Step 2: Ingest your files (10 minutes)
|
||||
|
||||
Copy the prompt from [prompts/02-ingest.md](prompts/02-ingest.md)
|
||||
into Claude Code.
|
||||
Paste the prompt from [`prompts/02-ingest.md`](prompts/02-ingest.md).
|
||||
|
||||
Claude Code will read each file, split it into chunks, generate
|
||||
embeddings, and store everything in Pinecone. You will see a
|
||||
summary of what was processed.
|
||||
Claude Code reads each file, splits it into chunks, generates
|
||||
embeddings via Gemini Embedding 2, and stores everything in Pinecone.
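As a rough illustration of the "splits it into chunks" step, here is a minimal sketch of fixed-size chunking with overlap. The code Claude Code generates may chunk differently (for example by page or paragraph); `chunk_text` and its `size` and `overlap` parameters are assumptions made for this example.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")

    step = size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks


pieces = chunk_text("abcdefghij", size=4, overlap=2)
print(pieces)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.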
|
||||
|
||||
This is the step where your files become searchable.
|
||||
### Step 3: Search (5 minutes)
|
||||
|
||||
## Step 4: Search (5 minutes)
|
||||
Paste the prompt from [`prompts/03-search.md`](prompts/03-search.md).
|
||||
|
||||
Copy the prompt from [prompts/03-search.md](prompts/03-search.md)
|
||||
into Claude Code.
|
||||
Claude Code builds a web interface. Open `http://localhost:3333`
|
||||
in your browser and try these searches:
|
||||
|
||||
Claude Code will build a web interface and start it. Open the URL
|
||||
it gives you (usually `http://localhost:3333`) in your browser.
|
||||
|
||||
Try these searches:
|
||||
|
||||
| Search query | What should come back |
|
||||
| Query | Expected results |
|
||||
|---|---|
|
||||
| "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
|
||||
| "First Moon landing" | Aldrin image + solar system overview |
|
||||
| "Which moon has volcanoes?" | Moons PDF (mentioning Io) |
|
||||
| "How far is Jupiter from Earth?" | Jupiter fact sheet (588.5 to 968.1 million km) |
|
||||
| "What do astronauts see from orbit?" | ISS image description |
|
||||
| "Which moon has volcanoes?" | Moons PDF mentioning Io |
|
||||
| "How far is Jupiter from Earth?" | Jupiter fact sheet with exact distance |
|
||||
|
||||
Notice how a single question can pull results from both PDFs and
|
||||
images. That is multimodal search.
|
||||
A single question pulls results from both PDFs and images.
|
||||
|
||||
## Step 5: Make it your own
|
||||
### Step 4: Make it your own
|
||||
|
||||
Now that you have seen it work with NASA files, try it with
|
||||
your own content:
|
||||
Replace the NASA example files with your own content:
|
||||
|
||||
1. Add your own PDFs, images, or documents to the `example-data/` folder
|
||||
2. Write descriptions for any images (see the tips in `descriptions.md`)
|
||||
3. Use [prompts/04-improve.md](prompts/04-improve.md) to re-index
|
||||
1. Add PDFs, images, or documents to `example-data/`
|
||||
2. Write descriptions for images (see [`example-data/descriptions.md`](example-data/descriptions.md))
|
||||
3. Paste [`prompts/04-improve.md`](prompts/04-improve.md) to re-index
|
||||
|
||||
Ideas for what to search:
|
||||
- Your company's internal documents
|
||||
- Research papers for a project
|
||||
- Travel photos with descriptions
|
||||
- Recipe collections
|
||||
- Course notes and textbook screenshots
|
||||
Ideas: company documents, research papers, travel photos,
|
||||
recipe collections, course notes.
|
||||
|
||||
## Why image descriptions matter
|
||||
## Example Data
|
||||
|
||||
The search system cannot "see" your images directly. It finds
|
||||
images through their text descriptions. This means:
|
||||
The `example-data/` folder contains NASA public domain files
|
||||
(no copyright restrictions):
|
||||
|
||||
**Bad description:** "Photo of a planet" will only match
|
||||
searches containing "photo" or "planet."
|
||||
| File | Description |
|---|---|
| `solar-system-overview.pdf` | Overview of our solar system |
| `jupiter-fact-sheet.pdf` | Detailed data about Jupiter |
| `solar-system-moons.pdf` | Guide to planetary moons |
| `earthrise.jpg` | Earth from lunar orbit, Apollo 8 (1968) |
| `aldrin-moon.jpg` | Buzz Aldrin on the Moon, Apollo 11 (1969) |
| `jupiter-great-red-spot.jpg` | Jupiter by Voyager 1 (1979) |
| `iss-over-earth.jpg` | The Moon seen from the ISS |
| `descriptions.md` | Image descriptions for search quality |
|
||||
|
||||
**Good description:** "Full-disk portrait of Jupiter captured by
|
||||
Voyager 1 in 1979, showing horizontal cloud bands and the Great
|
||||
Red Spot, a massive storm larger than Earth" will match searches
|
||||
about Jupiter, Voyager missions, storms, cloud patterns, and more.
|
||||
## Why Image Descriptions Matter
|
||||
|
||||
The `descriptions.md` file in `example-data/` shows side-by-side
|
||||
examples of bad versus good descriptions. Spending five minutes
|
||||
on better descriptions will dramatically improve your search
|
||||
results.
|
||||
The search system finds images through their text descriptions,
|
||||
not by "seeing" them. A description like "Photo of a planet" only
|
||||
matches searches containing those exact concepts. A description
|
||||
like "Full-disk portrait of Jupiter captured by Voyager 1 in 1979,
|
||||
showing horizontal cloud bands and the Great Red Spot" matches
|
||||
searches about Jupiter, Voyager missions, storms, and cloud patterns.
|
||||
|
||||
## What this costs
|
||||
See [`example-data/descriptions.md`](example-data/descriptions.md)
|
||||
for side-by-side examples.
|
||||
|
||||
$0 extra if you already have a Claude subscription.
|
||||
Both Gemini embeddings and Pinecone have generous free tiers.
|
||||
## Costs
|
||||
|
||||
See [costs.md](costs.md) for details.
|
||||
$0 extra if you already have a Claude subscription. Both Gemini
|
||||
Embedding 2 and Pinecone have free tiers that cover this guide
|
||||
and well beyond.
|
||||
|
||||
## If you get stuck
|
||||
See [costs.md](costs.md) for the full breakdown.
|
||||
|
||||
See [troubleshooting.md](troubleshooting.md) for the 10 most
|
||||
common problems and their solutions.
|
||||
## Troubleshooting
|
||||
|
||||
The most effective fix for almost anything: copy the exact error
|
||||
message and paste it into Claude Code. It is very good at
|
||||
diagnosing its own work.
|
||||
See [troubleshooting.md](troubleshooting.md) for the 10 most common
|
||||
problems. The most effective fix for almost anything: copy the exact
|
||||
error message and paste it into Claude Code.
|
||||
|
||||
## How it works (the deeper version)
|
||||
## How It Works
|
||||
|
||||
Read [concepts.md](concepts.md) for plain-English explanations of:
|
||||
- What are embeddings?
|
||||
- What is a vector database?
|
||||
- What is RAG?
|
||||
- What is chunking?
|
||||
- What does "multimodal" mean?
|
||||
```
Your files --> Chunking --> Gemini Embedding 2 --> Pinecone (vector DB)
                                                       |
Your question --> Gemini Embedding 2 --> Search --> Claude answers
```
|
||||
|
||||
## Credits
|
||||
Gemini Embedding 2 converts all content types (text, images, video,
|
||||
audio) into numerical vectors in one shared space. Pinecone stores
|
||||
and searches those vectors. Claude reads the matching content and
|
||||
generates answers.
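The store-then-search flow described above can be sketched with a tiny in-memory stand-in for the vector database. This is not the Pinecone API: `TinyVectorIndex` is a hypothetical class, and the three-number vectors are invented placeholders for real embeddings. The sketch only shows the idea of ranking stored items by similarity to the query vector.

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class TinyVectorIndex:
    """Toy in-memory index: stores unit vectors and returns the closest matches."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self._items: dict[str, list[float]] = {}

    def upsert(self, item_id: str, vector: list[float]) -> None:
        if len(vector) != self.dimension:
            raise ValueError("vector does not match the index dimension")
        self._items[item_id] = normalize(vector)

    def query(self, vector: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        if len(vector) != self.dimension:
            raise ValueError("vector does not match the index dimension")
        q = normalize(vector)
        scored = [
            (sum(a * b for a, b in zip(q, stored)), item_id)
            for item_id, stored in self._items.items()
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [(item_id, score) for score, item_id in scored[:top_k]]


# Made-up 3-dimensional "embeddings" for two of the example files.
index = TinyVectorIndex(dimension=3)
index.upsert("jupiter-fact-sheet.pdf", [0.9, 0.1, 0.0])
index.upsert("aldrin-moon.jpg", [0.0, 0.2, 0.9])

results = index.query([1.0, 0.0, 0.0], top_k=1)
```

With these toy values the top result is `jupiter-fact-sheet.pdf`, because its vector points in nearly the same direction as the query. A real setup would replace the made-up vectors with Gemini Embedding 2 output and the toy class with the Pinecone index.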
|
||||
|
||||
Example data: All PDFs and images are from NASA and are in the
|
||||
public domain (U.S. Government works, no copyright restrictions).
|
||||
For plain-English explanations of embeddings, vector databases, RAG,
|
||||
and chunking, see [concepts.md](concepts.md).
|
||||
|
||||
Built with:
|
||||
- [Claude Code](https://claude.ai) by Anthropic (app building + AI answers)
|
||||
- [Gemini Embedding 2](https://ai.google.dev/) by Google (multimodal embeddings)
|
||||
- [Pinecone](https://www.pinecone.io/) (vector database)
|
||||
## Built With
|
||||
|
||||
- [Claude Code](https://claude.ai) by Anthropic
|
||||
- [Gemini Embedding 2](https://ai.google.dev/) by Google
|
||||
- [Pinecone](https://www.pinecone.io/)
|
||||
|
||||
## License
|
||||
|
||||
[MIT](LICENSE)
|
||||
|
||||
---
|
||||
|
||||
*Part of [The Dharma Lab](https://thedharmalab.com). Read the
|
||||
[full article](https://thedharmalab.com/) for the story behind this project.*
|
||||
Part of [The Dharma Lab](https://thedharmalab.com). Read the
|
||||
[full article](https://thedharmalab.com/) for the story behind
|
||||
this project.
|
||||
|
|
|
|||
BIN
docs/demo-screenshot.png
Normal file
Binary file not shown.
Size: 298 KiB