Initial commit: multimodal RAG guide with Claude Code
Prompt-driven guide for building multimodal search using Gemini Embedding 2 + Pinecone + Claude Code. Includes example data (NASA public domain), step-by-step prompts, concepts explainer, cost breakdown, and troubleshooting guide.
commit edcd1721df
19 changed files with 4446 additions and 0 deletions

.gitignore
@@ -0,0 +1,7 @@
node_modules/
.env
dist/
src/
package.json
package-lock.json
tsconfig.json

LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 The Dharma Lab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md
@@ -0,0 +1,241 @@
# Build Multimodal Search with Claude Code

Search across your PDFs, images, and documents using plain English.
No coding required. Claude Code builds everything for you.

## What you will build

A local search app that lets you ask questions like:

- "What is the largest planet in our solar system?"
- "Show me photos from the first Moon landing"
- "Which moon has active volcanoes?"

The app searches through your PDFs and images simultaneously and
gives you answers with sources. You talk to it in plain English.

## How is this different from a Google search?

Google searches the internet. This searches YOUR files.

Imagine you have 500 PDFs, research papers, photos, and notes
scattered across folders. Normal file search only matches exact
words. This system understands meaning. You ask "what do we know
about storms on other planets?" and it finds the Jupiter fact sheet
mentioning wind speeds, the Jupiter photograph showing cloud bands,
and the solar system overview describing atmospheric composition.

It connects information across files and formats. That is what
makes it powerful.

## What you need

1. **Claude Code** (comes with Claude Pro at $20/month or Claude Max)
2. **A Google AI Studio account** (free) for Gemini embeddings
3. **A Pinecone account** (free tier) for the vector database
4. **30-45 minutes** for your first time

No programming knowledge required. You will copy prompts into
Claude Code, and it will build everything.

## How it works (the simple version)

```
Your files ──> Embeddings (Gemini) ──> Vector database (Pinecone)
                                           │
Your question ──> Embedding (Gemini) ──> Search ──> Claude answers
```

1. Your files get converted into "embeddings" (numerical fingerprints
   that capture meaning)
2. When you ask a question, it gets the same treatment
3. The system finds the stored fingerprints most similar to your question's
4. Claude reads the matching content and answers your question

For a deeper explanation, see [concepts.md](concepts.md).

## Step 0: Get your accounts (10 minutes)

### Google AI Studio (for embeddings)

Embeddings convert your content into searchable vectors. We use
Google's Gemini Embedding 2 for this because it handles text,
images, and video.

1. Go to [aistudio.google.com](https://aistudio.google.com/)
2. Sign in with a Google account
3. Click "Get API key" in the left sidebar
4. Click "Create API key"
5. Copy the key somewhere safe

**What is an API key?** It is like a password that lets your app
talk to Google's embedding service. You will paste it into a
configuration file (`.env`) later. The key stays in that file on your
computer and is sent only to Google with each embedding request, so
do not share the file (the included `.gitignore` keeps it out of Git).
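
You never have to write this yourself, but if you are curious what the app does with the key, here is a minimal TypeScript sketch of one request to the Gemini API's `embedContent` REST endpoint. It only builds the request and parses a response; it does not send anything. The model ID `gemini-embedding-001` is an assumption, so use whichever ID Google lists for the embedding model you pick.

```typescript
// Sketch: build (but do not send) a Gemini API embedContent request.
// ASSUMPTION: default model ID "gemini-embedding-001"; swap in the ID
// Google lists for the embedding model you actually use.

type TaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

interface EmbedRequest {
  url: string;
  headers: Record<string, string>;
  body: string; // JSON payload
}

const API_BASE = "https://generativelanguage.googleapis.com/v1beta";

export function buildEmbedRequest(
  apiKey: string,
  text: string,
  taskType: TaskType,
  model = "gemini-embedding-001",
): EmbedRequest {
  if (!apiKey) throw new Error("GOOGLE_API_KEY is empty");
  return {
    url: `${API_BASE}/models/${model}:embedContent`,
    // The key goes in a request header, not in the URL.
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({
      model: `models/${model}`,
      content: { parts: [{ text }] },
      taskType,
    }),
  };
}

// The response JSON has the shape { embedding: { values: number[] } }.
export function parseEmbedResponse(json: unknown): number[] {
  const values = (json as { embedding?: { values?: unknown } }).embedding?.values;
  if (!Array.isArray(values) || !values.every((v) => typeof v === "number")) {
    throw new Error("unexpected embedContent response shape");
  }
  return values as number[];
}
```

To send it, you would pass the pieces to `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Prompt 2 uses `RETRIEVAL_DOCUMENT` for your files and Prompt 3 uses `RETRIEVAL_QUERY` for your questions.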

### Pinecone (for storing embeddings)

A vector database stores embeddings so you can search through
them. Think of it as a smart filing cabinet.

1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
2. Once in the dashboard, click "Create Index"
3. Name it `space-search` (or whatever you like; if you pick another
   name, use it for `PINECONE_INDEX` in your `.env` file)
4. Set dimensions to `3072` (this matches Gemini Embedding 2)
5. Choose the `cosine` metric
6. Select the free "Starter" plan
7. Copy your API key from the "API Keys" section
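
The dimension you set in step 4 is fixed once the index exists, and a mismatch between it and the length of your embeddings is a common reason uploads fail. A small guard like the one below (a sketch; `INDEX_DIMENSION` simply mirrors the 3072 from step 4) catches the problem before anything is sent to Pinecone.

```typescript
// Sketch: fail fast if an embedding will not fit the Pinecone index.
// INDEX_DIMENSION mirrors the value chosen in the Pinecone dashboard (step 4).
const INDEX_DIMENSION = 3072;

export function assertFitsIndex(values: number[], id: string, dimension = INDEX_DIMENSION): void {
  if (values.length !== dimension) {
    throw new Error(
      `Vector ${id} has ${values.length} dimensions but the index expects ${dimension}. ` +
        `Recreate the index or change the embedding output size so they match.`,
    );
  }
  // Reject NaN or Infinity, which usually means a broken embedding response.
  const bad = values.findIndex((v) => !Number.isFinite(v));
  if (bad !== -1) throw new Error(`Vector ${id} has a non-finite value at position ${bad}`);
}
```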

### Verify you have Claude Code

Open your terminal and type `claude`. If Claude Code starts,
you are ready. If not, install it:

```
npm install -g @anthropic-ai/claude-code
```

You need a Claude Pro or Max subscription for this to work.

## Step 1: Get the example files

Clone or download this repository. The `example-data/` folder
contains everything you need to get started:

**PDFs:**
- `solar-system-overview.pdf` - Overview of our solar system (NASA)
- `jupiter-fact-sheet.pdf` - Detailed data about Jupiter (NASA)
- `solar-system-moons.pdf` - Guide to planetary moons (NASA)

**Images:**
- `earthrise.jpg` - Earth seen from lunar orbit, Apollo 8 (1968)
- `aldrin-moon.jpg` - Buzz Aldrin on the Moon, Apollo 11 (1969)
- `jupiter-great-red-spot.jpg` - Jupiter photographed by Voyager 1 (1979)
- `iss-over-earth.jpg` - The Moon seen from the ISS

**Descriptions:**
- `descriptions.md` - Detailed text descriptions of each image.
  This is the most important file for image search quality.
  See the section below on why descriptions matter.

All files are NASA public domain. No copyright restrictions.

## Step 2: Start Claude Code (5 minutes)

Open your terminal, navigate to this folder, and start Claude Code:

```
claude
```

Then copy the prompt from [prompts/01-setup.md](prompts/01-setup.md)
and paste it into Claude Code.

Claude Code will create the project structure and install
dependencies. When it is done, copy `.env.template` to `.env`
and fill in your API keys.

## Step 3: Ingest your files (10 minutes)

Copy the prompt from [prompts/02-ingest.md](prompts/02-ingest.md)
into Claude Code.

Claude Code will read each file, split it into chunks, generate
embeddings, and store everything in Pinecone. You will see a
summary of what was processed.

This is the step where your files become searchable.

## Step 4: Search (5 minutes)

Copy the prompt from [prompts/03-search.md](prompts/03-search.md)
into Claude Code.

Claude Code will build a web interface and start it. Open the URL
it gives you (usually `http://localhost:3333`) in your browser.

Try these searches:

| Search query | What should come back |
|---|---|
| "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
| "First Moon landing" | Aldrin image + solar system overview |
| "Which moon has volcanoes?" | Moons PDF (mentioning Io) |
| "How far is Jupiter from Earth?" | Jupiter fact sheet (588.5 to 968.1 million km) |
| "What do astronauts see from orbit?" | ISS image description |

Notice how a single question can pull results from both PDFs and
images. That is multimodal search.
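
Once you have tried the table by hand, you can turn it into a repeatable check to run after every re-index. The sketch below assumes a hypothetical `search(query)` function in your app that returns matches carrying a `source_file` field (the metadata name used in Prompt 2). The expected files are copied from the table above.

```typescript
// Sketch: re-run the example searches and report expected sources that were missed.
// `SearchFn` is a hypothetical signature for your app's search function.
interface Match {
  source_file: string;
  score: number;
}
type SearchFn = (query: string) => Promise<Match[]>;

// Expected sources, taken from the "Try these searches" table.
const EXPECTATIONS: Array<{ query: string; expected: string[] }> = [
  { query: "What is the largest planet?", expected: ["jupiter-fact-sheet.pdf", "jupiter-great-red-spot.jpg"] },
  { query: "First Moon landing", expected: ["aldrin-moon.jpg", "solar-system-overview.pdf"] },
  { query: "Which moon has volcanoes?", expected: ["solar-system-moons.pdf"] },
  { query: "How far is Jupiter from Earth?", expected: ["jupiter-fact-sheet.pdf"] },
  { query: "What do astronauts see from orbit?", expected: ["iss-over-earth.jpg"] },
];

export async function runExampleSearches(search: SearchFn, topK = 5): Promise<string[]> {
  const failures: string[] = [];
  for (const { query, expected } of EXPECTATIONS) {
    const found = new Set((await search(query)).slice(0, topK).map((m) => m.source_file));
    for (const file of expected) {
      if (!found.has(file)) failures.push(`"${query}" did not return ${file}`);
    }
  }
  return failures; // an empty array means every expected source appeared in the top results
}
```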

## Step 5: Make it your own

Now that you have seen it work with NASA files, try it with
your own content:

1. Add your own PDFs, images, or documents to the `example-data/` folder
2. Write descriptions for any images (see the tips in `descriptions.md`)
3. Use [prompts/04-improve.md](prompts/04-improve.md) to re-index

Ideas for what to search:
- Your company's internal documents
- Research papers for a project
- Travel photos with descriptions
- Recipe collections
- Course notes and textbook screenshots

## Why image descriptions matter

In this setup, the search system does not look at the pixels of your
images. Prompt 2 embeds each image's text description, so images are
found through those descriptions. This means:

**Bad description:** "Photo of a planet" gives the search almost
nothing to work with, so the image only surfaces for vague searches
about photos or planets.

**Good description:** "Full-disk portrait of Jupiter captured by
Voyager 1 in 1979, showing horizontal cloud bands and the Great
Red Spot, a massive storm larger than Earth" will match searches
about Jupiter, Voyager missions, storms, cloud patterns, and more.

The `descriptions.md` file in `example-data/` shows side-by-side
examples of bad versus good descriptions. Spending five minutes
on better descriptions will dramatically improve your search
results.

## What this costs

$0 extra if you already have a Claude subscription.
Both Gemini embeddings and Pinecone have generous free tiers.

See [costs.md](costs.md) for details.

## If you get stuck

See [troubleshooting.md](troubleshooting.md) for the 10 most
common problems and their solutions.

The most effective fix for almost anything: copy the exact error
message and paste it into Claude Code. It is very good at
diagnosing its own work.

## How it works (the deeper version)

Read [concepts.md](concepts.md) for plain-English explanations of:
- What are embeddings?
- What is a vector database?
- What is RAG?
- What is chunking?
- What does "multimodal" mean?

## Credits

Example data: All PDFs and images are from NASA and are in the
public domain (U.S. Government works, no copyright restrictions).

Built with:
- [Claude Code](https://claude.ai) by Anthropic (app building + AI answers)
- [Gemini Embedding 2](https://ai.google.dev/) by Google (multimodal embeddings)
- [Pinecone](https://www.pinecone.io/) (vector database)

---

*Part of [The Dharma Lab](https://thedharmalab.com). Read the
[full article](https://thedharmalab.com/) for the story behind this project.*

concepts.md
@@ -0,0 +1,94 @@
# Concepts: What You Need to Know (and Nothing More)

This page explains the key ideas behind multimodal search.
You do not need to understand these concepts to follow the guide.
But if you are curious about what is happening behind the scenes,
this is for you.

## What is an embedding?

Think of it as a fingerprint for meaning.

When you read the sentence "Jupiter is the largest planet," your brain
understands what it means. An embedding is a way for a computer to do
something similar. It converts text (or an image) into a long list of
numbers that captures the meaning of that content.

The key insight: content with similar meaning gets similar numbers.
So "Jupiter is massive" and "Jupiter is the biggest planet" would have
very similar embeddings, even though the words are different.

You never see these numbers. They work behind the scenes.
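
If you do want to peek behind the scenes: "similar numbers" is usually measured with cosine similarity, the same `cosine` metric you choose when creating the Pinecone index. Here is a minimal sketch. The three-number vectors are invented toy values for illustration; real Gemini embeddings have thousands of numbers.

```typescript
// Cosine similarity: 1 means the vectors point the same way (similar meaning),
// values near 0 mean unrelated.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors (made up for this example, not real embeddings).
const jupiterIsMassive = [0.9, 0.1, 0.3];
const jupiterIsBiggest = [0.8, 0.2, 0.3];
const recipeForBread = [0.1, 0.9, 0.0];

console.log(cosineSimilarity(jupiterIsMassive, jupiterIsBiggest).toFixed(2)); // → 0.99
console.log(cosineSimilarity(jupiterIsMassive, recipeForBread).toFixed(2)); // → 0.21
```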

## What is a vector database?

A place to store embeddings so you can search through them quickly.

Imagine a library where books are not organized by author or title,
but by what they are about. You walk in and say "I want something
about storms on other planets" and the librarian immediately hands
you the right book. That is what a vector database does, but with
your files.

We use Pinecone in this guide because it has a free tier and works
well. There are other options (Chroma, Weaviate, Qdrant), but
Pinecone is a hosted service, so there is no database to install or
run on your own computer.
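
At its core, a vector database answers one question: which stored vectors are closest to this one? The brute-force, in-memory sketch below shows that idea. `TinyVectorStore` is illustrative only and is not Pinecone's API; Pinecone does the same job with approximate nearest-neighbour indexes so it stays fast on large collections.

```typescript
// Illustrative in-memory vector store (NOT Pinecone's API): compares the query
// against every stored vector and returns the closest ones.
interface StoredVector {
  id: string;
  values: number[];
  metadata: { source_file: string; text: string };
}

// Cosine similarity; assumes both vectors have the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

export class TinyVectorStore {
  private items: StoredVector[] = [];

  upsert(item: StoredVector): void {
    // Replace an existing vector with the same id, otherwise append it.
    const i = this.items.findIndex((x) => x.id === item.id);
    if (i >= 0) this.items[i] = item;
    else this.items.push(item);
  }

  query(vector: number[], topK: number): Array<StoredVector & { score: number }> {
    return this.items
      .map((item) => ({ ...item, score: cosine(vector, item.values) }))
      .sort((a, b) => b.score - a.score) // highest similarity first
      .slice(0, topK);
  }
}
```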

## What is RAG?

RAG stands for Retrieval-Augmented Generation. Big name, simple idea.

Normally, when you ask an AI a question, it answers from its training
data. It might know general facts, but it does not know about YOUR
files. RAG changes that.

With RAG, the app first searches through your documents to find
relevant information, then hands what it found to the AI, which uses
it to answer your question. It is like giving the AI a cheat sheet of
your own content before it answers.

Without RAG: "What do we know about Jupiter's atmosphere?"
The AI answers from general knowledge.

With RAG: "What do we know about Jupiter's atmosphere?"
The app searches your PDFs and image descriptions, finds the Jupiter
fact sheet and the Voyager photo, and the AI answers based on YOUR
specific collection.
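
In code, the "cheat sheet" is just text: the retrieved chunks are pasted into the prompt, each labelled with its source file so the answer can cite it. The sketch below shows one way to assemble that prompt; the instruction wording and the `[n] (source: ...)` labels are choices made for this example, not a required format.

```typescript
// Sketch: assemble a RAG prompt from retrieved chunks.
interface RetrievedChunk {
  source_file: string;
  text: string;
  score: number;
}

export function buildRagPrompt(question: string, chunks: RetrievedChunk[]): string {
  // Number each chunk and label it with its source so the model can cite files.
  const context = chunks
    .map((c, i) => `[${i + 1}] (source: ${c.source_file})\n${c.text.trim()}`)
    .join("\n\n");
  return [
    "Answer the question using only the context below.",
    "Cite the sources you use by file name. If the context does not contain the answer, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```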

## What is chunking?

Your documents might be long. A 50-page PDF is too much to embed as
one piece: embedding models accept only a limited amount of text per
request, and a single fingerprint for 50 pages would blur many topics
together. Chunking means splitting it into smaller sections that can
each be embedded and searched on their own.

Think of it like cutting a book into chapters. Each chapter gets
its own embedding. When you search, the system finds the right
chapter, not the whole book.

Claude Code handles chunking automatically. You do not need to
do anything.
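
"Automatically" here means something like the function below: a word-count splitter that cuts text into pieces of roughly 500 words, the size Prompt 2 asks for. The optional overlap (repeating a few words between neighbouring chunks so a sentence cut in half still appears whole in one of them) is a common refinement and an assumption of this sketch; Claude Code may split differently, for example on paragraphs or headings.

```typescript
// Sketch: split text into chunks of about `chunkSize` words, with optional overlap.
export function chunkByWords(text: string, chunkSize = 500, overlap = 0): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each new chunk starts after the previous one
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // this chunk already reached the end
  }
  return chunks;
}
```

For a 1,200-word document this gives three chunks of 500, 500, and 200 words; with `overlap = 50`, the second chunk starts at word 450 instead of word 500.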

## What does "multimodal" mean?

"Multi" means many. "Modal" means types.

Regular search works with text only. Multimodal search works with
text AND images AND PDFs AND videos. You can search across all
of them at once.

This is what makes this project interesting. You ask a question
in plain English, and the system searches through your PDFs,
images, and their descriptions to find the best answer, regardless
of what format the information is in.

## How does it all fit together?

1. You put files in a folder (PDFs, images with descriptions)
2. Claude Code builds a system that reads each file
3. Each piece of content gets converted to an embedding (a fingerprint)
4. The embeddings are stored in Pinecone (the vector database)
5. When you search, your question also gets converted to an embedding
6. Pinecone finds the stored embeddings most similar to your question
7. The matching content is shown to you (or fed to an AI for a detailed answer)

That is it. The rest is implementation details, and Claude Code
handles those for you.

costs.md
@@ -0,0 +1,67 @@
# What Does This Cost?

Short answer: you can do this entire guide for free or near-free.

## Claude Code

You need a Claude subscription that includes Claude Code.

- **Claude Pro ($20/month):** Includes Claude Code with usage limits.
- **Claude Max ($100/month or $200/month):** Higher limits.

If you already have a Claude subscription, there is no extra cost.
Claude Code handles building the app AND answering questions
based on your search results.

## Google Gemini Embedding 2 (for embeddings only)

We use Google's Gemini Embedding 2 to convert your content into
searchable embeddings. We do NOT use Gemini as a language model.
Claude does the thinking. Gemini just creates the fingerprints.

- **Free tier:** Available through Google AI Studio with rate limits
  (1,500 requests per day)
- **Paid:** $0.20 per million tokens

For this guide with 7 example files, ingestion takes roughly 10-20
requests (about one per chunk), and each search adds one more.
The free tier is more than enough.

Why Gemini Embedding 2 and not Voyage AI (the embedding provider
Anthropic recommends)? Because Gemini Embedding 2 supports text,
images, AND video natively. Voyage supports text and images but
not video. For a multimodal guide, the broadest format support wins.

## Pinecone Vector Database (free tier)

Pinecone's free tier includes:

- **Free:** 1 project, 5 indexes, 100,000 vectors
- **No credit card required**

For this guide, we store about 15-30 vectors (one per chunk of
content). You could store thousands of documents and still stay
within the free tier.

## Total cost for this guide

| Service | What it does | Cost |
|---------|-------------|------|
| Claude Code | Builds the app, answers questions | Part of your subscription |
| Gemini Embedding 2 | Converts content to searchable vectors | Free (well within free tier) |
| Pinecone | Stores and searches vectors | Free (well within free tier) |
| **Total** | | **$0 extra** |

## What if you want to scale up?

If you move beyond the example and want to index thousands of
documents, here is what the costs look like:

| Scale | Gemini embeddings | Pinecone | Monthly total |
|-------|-------------------|----------|---------------|
| 100 documents | Free | Free | $0 |
| 1,000 documents | ~$0.50 | Free | ~$0.50 |
| 10,000 documents | ~$5 | Free | ~$5 |
| 100,000 documents | ~$50 | $70+ (paid plan) | ~$120 |

For most personal and small business use cases, you will stay
comfortably in the free tier.
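
The Gemini column above is plain arithmetic at the $0.20-per-million-token rate. The sketch below reproduces it, assuming about 2,500 tokens per document (the figure the table implies: $0.50 for 1,000 documents). Your documents may be shorter or longer, free-tier usage is ignored, and the cost is incurred whenever documents are embedded, so re-indexing everything pays it again.

```typescript
// Sketch: estimate Gemini embedding cost for a collection.
// ASSUMPTION: ~2,500 tokens per document, as implied by the scale table.
const PRICE_PER_MILLION_TOKENS = 0.2; // USD, paid rate quoted above

export function estimateEmbeddingCost(documents: number, tokensPerDocument = 2_500): number {
  const totalTokens = documents * tokensPerDocument;
  return (totalTokens / 1_000_000) * PRICE_PER_MILLION_TOKENS;
}

for (const docs of [1_000, 10_000, 100_000]) {
  console.log(`${docs} documents ≈ $${estimateEmbeddingCost(docs).toFixed(2)}`);
}
// 1000 documents ≈ $0.50
// 10000 documents ≈ $5.00
// 100000 documents ≈ $50.00
```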

env.template
@@ -0,0 +1,10 @@
# Gemini Embedding 2 (from Google AI Studio)
# Get your key at: https://aistudio.google.com/ > "Get API key"
GOOGLE_API_KEY=

# Pinecone Vector Database
# Get your key at: https://app.pinecone.io/ > "API Keys"
PINECONE_API_KEY=

# Pinecone index name (the one you created in the dashboard)
PINECONE_INDEX=space-search

example-data/aldrin-moon.jpg
Binary file not shown (image, 228 KiB).

example-data/descriptions.md
@@ -0,0 +1,100 @@
# Image Descriptions

Good descriptions are the most important part of making images searchable.
The AI uses these descriptions to understand what each image shows.
Better descriptions lead to better search results.

Below are descriptions for each image in this folder. We include both
a "bad" and a "good" version so you can see the difference.

---

## earthrise.jpg

**Bad description:** "Photo of Earth from space."

**Good description:** "Earthrise, photographed by astronaut William Anders
during the Apollo 8 mission on December 24, 1968. Shows planet Earth
rising above the lunar horizon, with the grey, cratered surface of the
Moon in the foreground and the blackness of space behind. Earth appears
as a blue and white marble, partly in shadow, with visible cloud patterns
and ocean. This was the first photograph of Earth taken by a human from
lunar orbit. It became one of the most influential environmental
photographs ever taken."

**Why the good version works:** It includes the mission name (Apollo 8),
the date, the photographer, what is visible in the image, and why the
photo matters historically. A search for "first photo of Earth from the
Moon" or "Apollo 8" or "William Anders" will all find this image.

---

## aldrin-moon.jpg

**Bad description:** "Astronaut on the Moon."

**Good description:** "Buzz Aldrin working beside the Apollo 11 Lunar Module
Eagle on the surface of the Moon, July 20, 1969. Aldrin is wearing a white
spacesuit (A7L Extravehicular Mobility Unit) and is positioned near
scientific equipment deployed on the lunar surface. The Lunar Module is
visible behind him with its gold and silver thermal protection layers. The
grey lunar soil shows footprints and equipment tracks. Photographed by
mission commander Neil Armstrong. This was the first crewed Moon landing
in history."

**Why the good version works:** It names both astronauts, the mission,
the spacecraft, and the equipment visible. A search for "first Moon
landing equipment" or "Apollo 11 Lunar Module" or "Buzz Aldrin" will
find this image.

---

## jupiter-great-red-spot.jpg

**Bad description:** "Planet Jupiter."

**Good description:** "Full-disk color portrait of Jupiter captured by the
Voyager 1 spacecraft in 1979 during its flyby of the planet. Shows
Jupiter's distinctive horizontal cloud bands in shades of orange, brown,
and cream. The Great Red Spot, a massive storm larger than Earth that has
been raging for hundreds of years, is visible in the southern hemisphere.
Jupiter is the largest planet in our solar system, with a mass 318 times
that of Earth. It is a gas giant composed primarily of hydrogen and helium."

**Why the good version works:** It mentions the spacecraft (Voyager 1),
the Great Red Spot, the cloud bands, and key facts about Jupiter.
A search for "largest storm in the solar system" or "gas giant cloud
bands" or "Voyager Jupiter photos" will all find this image.

---

## iss-over-earth.jpg

**Bad description:** "Moon and Earth from space."

**Good description:** "The Moon photographed from the International Space
Station (ISS), showing a gibbous Moon suspended above Earth's atmosphere.
Earth's surface is visible in the lower portion of the image, covered with
clouds and showing the thin blue line of the atmosphere at the horizon.
The photo demonstrates the perspective astronauts have from the ISS,
orbiting approximately 400 kilometers (250 miles) above Earth's surface.
The ISS has been continuously occupied since November 2000 and serves
as a microgravity research laboratory."

**Why the good version works:** It describes what is actually in the frame
(the Moon seen from ISS, not the ISS itself), includes the orbital
altitude, and mentions the ISS as a research laboratory. A search for
"view from the space station" or "Moon from orbit" or "Earth's atmosphere
from space" will find this image.

---

## Tips for writing your own descriptions

1. **Name what you see.** People, places, objects, colors, positions.
2. **Add context.** When was it taken? By whom? Why does it matter?
3. **Include facts.** Numbers, dates, names. These make searches precise.
4. **Think about how someone would search.** What question would lead
   to this image? Make sure your description contains those words.
5. **Be specific.** "A 12-mile-high cliff on Miranda, a moon of Uranus"
   beats "a cliff on a moon" every time.
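
If you write many descriptions, a rough automatic check can flag the "bad" kind before you ingest them. The thresholds in this sketch (at least 40 words, at least one number, at least one capitalised name after the first word) are arbitrary assumptions loosely based on the tips above, not rules the search system enforces.

```typescript
// Sketch: heuristic review of an image description.
// ASSUMPTION: thresholds are arbitrary; tune them for your own collection.
export function reviewDescription(description: string): string[] {
  const warnings: string[] = [];
  const words = description.split(/\s+/).filter(Boolean);
  if (words.length < 40) warnings.push(`Only ${words.length} words; add context and specifics.`);
  if (!/\d/.test(description)) warnings.push("No numbers or dates (tip 3).");
  // Capitalised words after the first word are a crude stand-in for names and places.
  const properNouns = words.slice(1).filter((w) => /^[A-Z][a-z]/.test(w));
  if (properNouns.length === 0) warnings.push("No names, places, or missions (tips 1 and 5).");
  return warnings;
}
```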

example-data/earthrise.jpg
Binary file not shown (image, 66 KiB).

example-data/iss-over-earth.jpg
Binary file not shown (image, 63 KiB).

example-data/jupiter-fact-sheet.pdf
Binary file not shown.

example-data/jupiter-great-red-spot.jpg
Binary file not shown (image, 46 KiB).

example-data/solar-system-moons.pdf
File diff suppressed because one or more lines are too long.

example-data/solar-system-overview.pdf
File diff suppressed because one or more lines are too long.

prompts/01-setup.md
@@ -0,0 +1,48 @@
# Prompt 1: Set Up the Project

Copy this into Claude Code after you have your API keys ready.

---

```
I want to build a multimodal search app. I have a folder of files
(PDFs, images with text descriptions) that I want to make searchable
using natural language.

Here is the tech stack I want:
- Google Gemini Embedding 2 for converting content to embeddings
  (I have a Google AI Studio API key)
- Pinecone for storing the embeddings
  (I have a Pinecone API key and an index called "space-search")
- A simple local web interface where I can type questions and
  get results from my files
- Use Claude for answering questions based on the search results
  (use my Claude Code subscription, not a separate API key)

My example files are in the folder: example-data/
That folder contains:
- 3 PDF files about the solar system, Jupiter, and planetary moons
- 4 JPG images (Earthrise, Moon landing, Jupiter, ISS)
- A file called descriptions.md with detailed text descriptions
  of each image

Please set up the project structure, install dependencies, and
create a .env.template file for the API keys. Use Node.js with
TypeScript. Do not start building the search logic yet, just
the project skeleton.
```

---

## What Claude Code will do

1. Create a new project folder with `package.json`
2. Install libraries for Gemini embeddings, Pinecone, and a web server
3. Create a `.env.template` with placeholders for your API keys
4. Set up TypeScript configuration

## What you do next

1. Copy `.env.template` to `.env`
2. Fill in your actual API keys
3. Move to Prompt 2
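
Filling in `.env` is where most setup problems start, so it can help to see what the app does with that file. This is a minimal sketch of reading `.env` text and checking that the three variables from the template are set. In practice a package such as `dotenv` usually does the parsing; this hand-rolled parser deliberately handles only simple `KEY=value` lines.

```typescript
// Sketch: parse simple KEY=value lines from .env text and check required keys.
// Only handles plain lines, comments, and optional surrounding quotes.
const REQUIRED = ["GOOGLE_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"] as const;

export function parseEnv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=value line
    const key = line.slice(0, eq).trim();
    // Strip optional surrounding quotes: KEY="value" or KEY='value'.
    const value = line.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, "$2");
    env[key] = value;
  }
  return env;
}

export function missingKeys(env: Record<string, string>): string[] {
  return REQUIRED.filter((k) => !env[k]); // empty values count as missing
}
```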

prompts/02-ingest.md
@@ -0,0 +1,67 @@
# Prompt 2: Ingest Your Files

Copy this into Claude Code after the project is set up and your
.env file has your API keys.

---

```
Now build the ingestion pipeline. I need a script that:

1. Reads each PDF in example-data/ and extracts the text content.
   Split long documents into chunks of roughly 500 words each.
   Keep track of which file and which section each chunk came from.

2. Reads the image descriptions from example-data/descriptions.md.
   Use the "Good description" for each image (ignore the "Bad" ones).
   Each image description becomes one chunk, linked to its image file.

3. For each chunk, generate an embedding using Google Gemini
   Embedding 2 (model: gemini-embedding-exp-03-07 or the latest
   available). Use task_type "RETRIEVAL_DOCUMENT" for all chunks.

4. Store each embedding in Pinecone along with metadata:
   - source_file: the original filename
   - content_type: "pdf" or "image"
   - text: the actual text content of the chunk
   - chunk_index: which chunk number within the file

5. After ingestion, print a summary: how many chunks were created,
   how many embeddings stored, and any errors.

Run the ingestion script after building it. Show me the output.
```
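
Step 2 of this prompt relies on the layout of `descriptions.md`: one `## filename` heading per image, with the good version in double quotes after `**Good description:**`. A minimal sketch of that parsing, assuming the example file's layout (and no double quotes inside a description), could look like this. The field names match the metadata listed in step 4.

```typescript
// Sketch: extract the "Good description" for each image from descriptions.md.
// ASSUMPTION: the layout matches the example file in example-data/.
export interface ImageChunk {
  source_file: string;
  content_type: "image";
  text: string;
  chunk_index: number;
}

export function parseGoodDescriptions(markdown: string): ImageChunk[] {
  const chunks: ImageChunk[] = [];
  // Split at each "## " heading; the first piece is the preamble before any heading.
  for (const section of markdown.split(/^## /m).slice(1)) {
    const [heading, ...rest] = section.split("\n");
    const file = heading.trim();
    if (!/\.(jpe?g|png|gif|webp)$/i.test(file)) continue; // e.g. skips "Tips for writing..."
    const match = rest.join("\n").match(/\*\*Good description:\*\*\s*"([\s\S]*?)"/);
    if (!match) continue;
    chunks.push({
      source_file: file,
      content_type: "image",
      text: match[1].replace(/\s+/g, " ").trim(), // join wrapped lines into one paragraph
      chunk_index: 0, // one chunk per image
    });
  }
  return chunks;
}
```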

---

## What Claude Code will do

1. Build a script that reads PDFs and extracts text
2. Parse the descriptions.md file for image descriptions
3. Send each chunk to Google Gemini for embedding
4. Store everything in Pinecone with metadata
5. Run the script and show results
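
Steps 3 and 4 come down to pairing each chunk with its embedding and sending the pairs to Pinecone as records with an `id`, `values`, and `metadata`. The sketch below builds those records in batches. The `file#index` ID scheme and the batch size of 100 are assumptions of this sketch, not something the prompt specifies.

```typescript
// Sketch: turn chunks + embeddings into Pinecone-style upsert batches.
// ASSUMPTIONS: "file#index" ids and batches of 100 records.
interface Chunk {
  source_file: string;
  content_type: "pdf" | "image";
  text: string;
  chunk_index: number;
}

interface PineconeVector {
  id: string;
  values: number[];
  metadata: Chunk;
}

// Deterministic ids: re-running ingestion overwrites the same records
// instead of creating duplicates.
export function vectorId(chunk: Chunk): string {
  return `${chunk.source_file}#${chunk.chunk_index}`;
}

export function toUpsertBatches(
  chunks: Chunk[],
  embeddings: number[][],
  batchSize = 100,
): { vectors: PineconeVector[] }[] {
  if (chunks.length !== embeddings.length) throw new Error("one embedding per chunk required");
  const vectors = chunks.map((chunk, i) => ({ id: vectorId(chunk), values: embeddings[i], metadata: chunk }));
  const batches: { vectors: PineconeVector[] }[] = [];
  for (let i = 0; i < vectors.length; i += batchSize) {
    batches.push({ vectors: vectors.slice(i, i + batchSize) });
  }
  return batches;
}
```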

## What to expect

You should see output like:

```
Processing solar-system-overview.pdf... 3 chunks
Processing jupiter-fact-sheet.pdf... 4 chunks
Processing solar-system-moons.pdf... 3 chunks
Processing earthrise.jpg (from descriptions)... 1 chunk
Processing aldrin-moon.jpg (from descriptions)... 1 chunk
Processing jupiter-great-red-spot.jpg (from descriptions)... 1 chunk
Processing iss-over-earth.jpg (from descriptions)... 1 chunk

Total: 14 chunks ingested, 14 embeddings stored in Pinecone.
```

The exact numbers may vary depending on how Claude Code splits the PDFs.

## If something goes wrong

- "API key invalid": check your .env file
- "Index not found": make sure `PINECONE_INDEX` in .env matches the
  index name in your Pinecone dashboard
- "Rate limit": wait a minute and run the script again
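
"Wait a minute and run it again" can also be automated. The sketch below retries a failing call with exponential backoff. Deciding whether an error is a rate limit is left to a caller-supplied check; looking for HTTP status 429 is a common choice, but that detail is an assumption here, not something this guide specifies.

```typescript
// Sketch: retry an async call with exponential backoff (1s, 2s, 4s, ... by default).
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 5,
  baseDelayMs = 1_000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on the last attempt or on errors a retry cannot fix (e.g. a bad key).
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```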

prompts/03-search.md
@@ -0,0 +1,67 @@
# Prompt 3: Build the Search Interface

Copy this into Claude Code after ingestion is complete.

---

```
Now build a simple web interface for searching my content.
I want a local web app (localhost) with:

1. A search box where I type a question in plain English.

2. When I search, the app should:
   a. Convert my question to an embedding using Gemini Embedding 2
      with task_type "RETRIEVAL_QUERY"
   b. Search Pinecone for the 5 most similar chunks
   c. Show me the matching results with:
      - The source file name
      - The relevant text snippet
      - A similarity score
      - If the result is from an image, show the image too

3. Below the search results, add an "Ask Claude" button that:
   a. Takes the search results as context
   b. Sends them to Claude (use the claude CLI command, since I
      have Claude Code installed) with my question
   c. Shows Claude's answer, which should reference the specific
      files it used

Keep the design simple and clean. Dark background, readable text.
No frameworks needed, just HTML + CSS + a small server.

Start the app after building it.
```
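
Steps 2b and 2c of this prompt map onto two small pieces: the body of a Pinecone query (`vector`, `topK`, `includeMetadata`) and a function that turns the returned matches into rows for the page. In this sketch the `/files/` image route and the 300-character snippet length are assumptions; the metadata fields are the ones stored in Prompt 2.

```typescript
// Sketch: build a Pinecone query body and format matches for display.
// ASSUMPTIONS: images are served under /files/, snippets are cut at 300 characters.
interface Metadata {
  source_file: string;
  content_type: "pdf" | "image";
  text: string;
  chunk_index: number;
}
interface Match {
  id: string;
  score: number;
  metadata?: Metadata;
}

export function buildQueryBody(queryVector: number[], topK = 5) {
  return { vector: queryVector, topK, includeMetadata: true };
}

export interface ResultRow {
  file: string;
  snippet: string;
  score: string;
  imageUrl?: string;
}

export function toResultRows(matches: Match[], snippetLength = 300): ResultRow[] {
  return matches
    .filter((m): m is Match & { metadata: Metadata } => m.metadata !== undefined) // skip matches without metadata
    .map((m) => ({
      file: m.metadata.source_file,
      snippet:
        m.metadata.text.length > snippetLength ? m.metadata.text.slice(0, snippetLength) + "…" : m.metadata.text,
      score: m.score.toFixed(3),
      // Hypothetical route: assumes the server exposes example-data/ under /files/.
      imageUrl:
        m.metadata.content_type === "image" ? `/files/${encodeURIComponent(m.metadata.source_file)}` : undefined,
    }));
}
```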

---

## What Claude Code will do

1. Build a small web server (Express or similar)
2. Create an HTML page with a search box
3. Wire up the search to Gemini embeddings + Pinecone
4. Add Claude integration for AI-powered answers
5. Start the server
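
For step 4, a common pattern is to number each search result, label it with its source file, and ask the model to cite those files. That is what lets the answer "reference the specific files it used". The sketch below covers only the prompt assembly. The `SearchResult` shape and the prompt wording are assumptions; the resulting string is what the server would pass to the `claude` CLI, for example as the argument to its non-interactive `-p` (print) mode.

```typescript
// Hypothetical shape of one search hit returned by the search endpoint.
interface SearchResult {
  file: string;  // source file name
  text: string;  // chunk text or image description
  score: number; // similarity score from the vector search
}

// Build one prompt string: numbered sources first, then the question,
// plus instructions to cite file names so answers are traceable.
function buildAskClaudePrompt(question: string, results: SearchResult[]): string {
  const sources = results
    .map((r, i) => `[${i + 1}] ${r.file} (score ${r.score.toFixed(3)})\n${r.text.trim()}`)
    .join("\n\n");
  return [
    "Answer the question using only the sources below.",
    "Cite the file name of every source you use.",
    "If the sources do not contain the answer, say so.",
    "",
    "Sources:",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```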

## What to try

Once the app is running, try these searches:

- "What is the largest planet in our solar system?"
  (Should find the Jupiter fact sheet AND the Jupiter image)

- "Tell me about the first Moon landing"
  (Should find the Aldrin image description)

- "Which moon has active volcanoes?"
  (Should find the moons PDF mentioning Io)

- "How far is Jupiter from Earth?"
  (Should find specific numbers from the Jupiter fact sheet)

- "What can astronauts see from the space station?"
  (Should find the ISS image description)

These examples demonstrate the power of multimodal search:
a question about "the largest planet" finds both text data
AND a photograph, because both are semantically related.

prompts/04-improve.md

# Prompt 4: Improve and Iterate

These are prompts for when you want to make the search better.
Use whichever ones are relevant to you.

---

## If search results are not relevant enough

```
The search results are not matching my questions well.
Can you add a re-ranking step? After getting the top 10 results
from Pinecone, use Claude to re-rank them by relevance to my
actual question, then show the top 5.
```
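
The re-ranking request describes a two-stage pattern: a fast vector search over-fetches candidates (10), a slower but more careful judge re-scores them, and only the best 5 are shown. In the sketch below the judge is a plain function parameter. In the app it would be a call asking Claude to rate each candidate; `keywordOverlap` is a deliberately crude, hypothetical stand-in so the example runs without any API.

```typescript
interface Candidate {
  id: string;
  text: string;
  vectorScore: number; // score from the first-stage vector search
}

// A judge returns a relevance number; higher means more relevant.
type RelevanceJudge = (question: string, text: string) => number;

// Re-score first-stage candidates with the judge and keep the top `keep`.
// Ties fall back to the original vector score.
function rerank(
  question: string,
  candidates: Candidate[],
  judge: RelevanceJudge,
  keep = 5,
): Candidate[] {
  return candidates
    .map((c) => ({ c, relevance: judge(question, c.text) }))
    .sort((a, b) => b.relevance - a.relevance || b.c.vectorScore - a.c.vectorScore)
    .slice(0, keep)
    .map((x) => x.c);
}

// Hypothetical stand-in judge: fraction of question words that appear
// (as substrings) in the text. In the real app, Claude would rate relevance.
function keywordOverlap(question: string, text: string): number {
  const words = question.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  if (words.length === 0) return 0;
  const haystack = text.toLowerCase();
  return words.filter((w) => haystack.includes(w)).length / words.length;
}
```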

## If you want to add your own files

```
I want to add more files to the search index. I have put new
files in example-data/. Please:
1. Check which files are new (not already in Pinecone)
2. Process only the new files
3. Add their embeddings to the existing index
4. Show me what was added
```
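
Step 1 boils down to a set difference: files in `example-data/` minus files already in the index. The sketch assumes the ingestion script stored each chunk's source file name in its metadata (a hypothetical convention; check what yours actually stores), so the indexed names can be gathered into a list first.

```typescript
// Files on disk that are not yet in the index, de-duplicated and sorted.
// Assumption: `indexedFiles` comes from the source-file metadata of stored vectors.
function findNewFiles(localFiles: string[], indexedFiles: Iterable<string>): string[] {
  const indexed = new Set(indexedFiles);
  return Array.from(new Set(localFiles))
    .filter((file) => !indexed.has(file))
    .sort();
}
```

Comparing names alone will skip a file that was edited in place without being renamed. Storing a content hash next to the file name and comparing hashes would catch that case.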

## If you want better image descriptions

```
Look at the images in example-data/ and suggest improved
descriptions for any that could be more detailed. Show me
the current description and your suggested improvement
side by side. Do not update descriptions.md until I approve.
```

## If you want to add a video clip

```
I have added a short video file (MP4, under 2 minutes) to
example-data/. Please:
1. Extract key frames from the video
2. Generate descriptions for each key frame
3. Add these descriptions as searchable chunks in Pinecone
4. Link them back to the video file with timestamps
```
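
One simple reading of "extract key frames" is to sample a frame at a fixed interval and keep the timestamp with each frame, so a search hit can link back to that moment in the video. The sketch below only plans the work: the 10-second interval, the file naming, and the `FrameChunk` fields are assumptions, and it builds (without running) an `ffmpeg` argument list per frame using the standard `-ss` (seek), `-i` (input), and `-frames:v` (number of video frames to write) options.

```typescript
// Assumed record for one sampled frame; field names are hypothetical.
interface FrameChunk {
  videoFile: string;
  timestampSec: number;
  frameImage: string;   // path the extracted frame would be written to
  ffmpegArgs: string[]; // arguments for: ffmpeg <args>
}

// Plan one frame every `intervalSec` seconds (assumed default: 10),
// starting at 0 and staying strictly before the end of the clip.
function planKeyFrames(videoFile: string, durationSec: number, intervalSec = 10): FrameChunk[] {
  if (intervalSec <= 0) throw new Error("intervalSec must be positive");
  const base = videoFile.replace(/\.[^./]+$/, ""); // strip the extension
  const frames: FrameChunk[] = [];
  for (let t = 0; t < durationSec; t += intervalSec) {
    const frameImage = `${base}-${String(t).padStart(4, "0")}s.jpg`;
    frames.push({
      videoFile,
      timestampSec: t,
      frameImage,
      // -ss before -i seeks in the input; -frames:v 1 writes one frame.
      ffmpegArgs: ["-ss", String(t), "-i", videoFile, "-frames:v", "1", frameImage],
    });
  }
  return frames;
}

// Format seconds as m:ss for display next to a search result.
function formatTimestamp(sec: number): string {
  const m = Math.floor(sec / 60);
  const s = Math.floor(sec % 60);
  return `${m}:${String(s).padStart(2, "0")}`;
}
```

Fixed-interval sampling can miss short scenes or produce near-identical frames; detecting scene changes is the usual refinement.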

## If you want to export or share

```
I want to share this search app with someone else. Can you:
1. Create a README with setup instructions
2. Make sure the .env.template has all required variables
3. Add a "first run" script that handles ingestion automatically
4. Package it so someone can clone the repo and get started
   with just their own API keys
```

## If something is broken

```
[Paste the exact error message here]

This happened when I tried to [describe what you did].
Can you diagnose and fix the issue?
```

The key to good results with Claude Code: be specific about
what you see and what you expected to see instead.

troubleshooting.md

# Troubleshooting

Common problems and how to fix them.

## 1. "API key not found" or "Invalid API key"

**What happened:** The .env file is missing, in the wrong place,
or the key was copied incorrectly.

**Fix:** Open the .env file in your project folder. Check that:
- The file is named exactly `.env` (not `env.txt` or `.env.txt`)
- There are no spaces around the `=` sign
- The key is complete (no missing characters from copy-paste)

Tell Claude Code: "Check if my .env file is set up correctly."
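
The three checks above are mechanical, so they are easy to script. The sketch below reads the text of a `.env` file and reports missing keys, empty values, and spaces around `=`. The variable names in `REQUIRED_KEYS` are hypothetical; use the names listed in your project's `.env.template`.

```typescript
// Hypothetical variable names; replace them with the ones your code reads.
const REQUIRED_KEYS = ["GEMINI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"];

// Check the raw text of a .env file and return human-readable problems.
function checkEnvFile(contents: string, required: string[] = REQUIRED_KEYS): string[] {
  const problems: string[] = [];
  const values = new Map<string, string>();
  contents.split(/\r?\n/).forEach((line, i) => {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) return; // blank or comment
    const eq = line.indexOf("=");
    if (eq === -1) {
      problems.push(`Line ${i + 1}: no "=" found`);
      return;
    }
    const rawKey = line.slice(0, eq);
    const rawValue = line.slice(eq + 1);
    if (rawKey !== rawKey.trimEnd() || rawValue !== rawValue.trimStart()) {
      problems.push(`Line ${i + 1}: remove the spaces around "="`);
    }
    values.set(rawKey.trim(), rawValue.trim());
  });
  for (const key of required) {
    if (!values.has(key)) problems.push(`Missing ${key}`);
    else if (values.get(key) === "") problems.push(`${key} is empty`);
  }
  return problems;
}
```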

## 2. "Module not found" or dependency errors

**What happened:** The project dependencies were not installed.

**Fix:** Tell Claude Code: "Install the project dependencies."
It will run the right install command for you.

## 3. Image search returns bad results

**What happened:** The image descriptions are too vague.

**Fix:** Open `descriptions.md` and improve the descriptions.
Compare your descriptions to the "good" examples in that file.
Then tell Claude Code: "Re-index the images with the updated
descriptions."

## 4. PDF content is not showing up in search

**What happened:** The PDF might be scanned (image-based)
rather than text-based, or it might be too large.

**Fix:** Tell Claude Code: "Check if the PDFs contain extractable
text." If a PDF is image-based, Claude Code can use OCR to
extract the text.
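
A quick heuristic for spotting a scanned PDF: a text-based page usually yields plenty of extracted text, while an image-only page yields little or none. The sketch assumes you already have one extracted string per page from whichever PDF library the project uses, and flags pages below a cutoff. The 20-character threshold is an arbitrary assumption to tune.

```typescript
interface PdfTextReport {
  pagesNeedingOcr: number[]; // 1-based page numbers with (almost) no text
  likelyScanned: boolean;    // true when more than half the pages lack text
}

// Flag pages whose extracted text, ignoring whitespace, is shorter than
// `minChars` (assumed default: 20). Such pages probably need OCR.
function checkExtractableText(pageTexts: string[], minChars = 20): PdfTextReport {
  const pagesNeedingOcr: number[] = [];
  pageTexts.forEach((text, i) => {
    if (text.replace(/\s+/g, "").length < minChars) pagesNeedingOcr.push(i + 1);
  });
  return {
    pagesNeedingOcr,
    likelyScanned: pageTexts.length > 0 && pagesNeedingOcr.length > pageTexts.length / 2,
  };
}
```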

## 5. Search is slow

**What happened:** This is normal for the first search after
starting the app, which typically has to finish one-time setup
(such as connecting to the Gemini and Pinecone APIs).

**Fix:** Wait a few seconds. Subsequent searches should be faster.
If it stays slow, tell Claude Code: "The search is slow. Can you
optimize the query?"
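
A common cause of a slow first request is one-time setup, such as creating API clients, that runs lazily on the first search. Doing that setup once and reusing the result keeps later searches fast. The sketch below wraps an async initializer so it runs at most once, even when several requests arrive together; it is a generic pattern, not code from this guide.

```typescript
// Wrap an async initializer so it runs at most once; every caller,
// including concurrent ones, receives the same promise.
function once<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = init().catch((err) => {
        pending = undefined; // allow a retry if setup failed
        throw err;
      });
    }
    return pending;
  };
}
```

In a server you might write `const getClients = once(createClients)` and call `await getClients()` inside each request handler, where `createClients` is a hypothetical function that builds the Gemini and Pinecone clients.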

## 6. "Rate limit exceeded" from Google

**What happened:** You sent too many requests too quickly.

**Fix:** Wait one minute and try again. At the time of writing, the free
tier allowed 1,500 requests per day, with a separate per-minute limit.
The example files in this guide are small enough that you should not
normally hit either limit.
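
If larger ingestion runs do hit the per-minute limit, the usual remedy is to retry with exponential backoff instead of aborting. The sketch below retries an async call on rate-limit errors, doubling the wait each time. How a rate-limit error is recognized is an assumption here: it checks for an HTTP 429 (Too Many Requests) `status` field, so adapt `isRateLimited` to whatever your client library actually throws.

```typescript
// Assumption: the client library attaches an HTTP status code to its errors.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `fn` on rate-limit errors, waiting baseDelayMs, then 2x, 4x, ...
// Other errors, or running out of attempts, are rethrown unchanged.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 1000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRateLimited(err) || attempt >= maxAttempts) throw err;
      await wait(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```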

## 7. Pinecone index not found

**What happened:** The index name in your code does not match
what you created in Pinecone.

**Fix:** Log in to pinecone.io, check your index name, and tell
Claude Code: "My Pinecone index is named [your-index-name].
Update the configuration."
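
Pinecone looks indexes up by exact name, so a one-character typo is enough to trigger this error. The hypothetical helper below (not part of any Pinecone SDK) compares the configured name against the index names you see in the Pinecone console and suggests the closest one by edit distance.

```typescript
// Classic Levenshtein edit distance (insertions, deletions, substitutions).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: report whether `configured` exists among `existing`
// and, if not, suggest the closest existing name.
function checkIndexName(configured: string, existing: string[]): string {
  if (existing.includes(configured)) return `OK: index "${configured}" exists`;
  if (existing.length === 0) return "No indexes found in this Pinecone project";
  const closest = existing.reduce((best, name) =>
    levenshtein(configured, name) < levenshtein(configured, best) ? name : best,
  );
  return `Index "${configured}" not found. Did you mean "${closest}"?`;
}
```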

## 8. Claude Code asks me something I do not understand

**What happened:** Claude Code sometimes asks technical questions
about implementation choices.

**Fix:** You can safely answer with: "Use the simplest option"
or "You decide, I trust your judgment." Claude Code will pick
sensible defaults.

## 9. The app works but results are not great

**What happened:** Search quality depends on the descriptions
and the chunking strategy.

**Fix:** Three things to try:
1. Improve your image descriptions (this makes the biggest difference)
2. Tell Claude Code: "The search results are not relevant enough.
   Can you adjust the chunking or add re-ranking?"
3. Try rephrasing your search query in different ways
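
The "chunking strategy" mostly comes down to two numbers: how long each chunk is and how much neighboring chunks overlap. The sketch below splits text into fixed-size word windows with overlap. The defaults (200 words, 40 overlap) are assumptions for illustration, not values from this guide; tokens or characters are common alternative units.

```typescript
// Split text into chunks of at most `size` words. Each chunk repeats the
// last `overlap` words of the previous one, so text near a boundary keeps
// some surrounding context in both chunks. Defaults are assumptions.
function chunkWords(text: string, size = 200, overlap = 40): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("Need size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window reached the end
  }
  return chunks;
}
```

Smaller chunks give more precise matches but less context per result. After changing these values, re-run ingestion, because the vectors already in Pinecone were built from the old chunks.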

## 10. Something else is broken

Tell Claude Code exactly what you see. Copy-paste the error message.
Claude Code is very good at diagnosing and fixing problems when
you give it the exact error text.

If you are stuck, you can also start fresh: tell Claude Code
"Let's start over. Delete the current search index and rebuild
everything from scratch."