multimodal-rag-guide/README.md
Kjell Tore Guttormsen edcd1721df Initial commit: multimodal RAG guide with Claude Code
Prompt-driven guide for building multimodal search using
Gemini Embedding 2 + Pinecone + Claude Code. Includes example
data (NASA public domain), step-by-step prompts, concepts
explainer, cost breakdown, and troubleshooting guide.
2026-03-12 16:36:22 +01:00


Build Multimodal Search with Claude Code

Search across your PDFs, images, and documents using plain English. No coding required. Claude Code builds everything for you.

What you will build

A local search app that lets you ask questions like:

  • "What is the largest planet in our solar system?"
  • "Show me photos from the first Moon landing"
  • "Which moon has active volcanoes?"

The app searches through your PDFs and images simultaneously and gives you answers with sources. You talk to it in plain English.

Google searches the internet. This searches YOUR files.

Imagine you have 500 PDFs, research papers, photos, and notes scattered across folders. Normal file search only matches exact words. This system understands meaning. You ask "what do we know about storms on other planets?" and it finds the Jupiter fact sheet mentioning wind speeds, the Jupiter photograph showing cloud bands, and the solar system overview describing atmospheric composition.

It connects information across files and formats. That is what makes it powerful.

What you need

  1. Claude Code (comes with Claude Pro at $20/month or Claude Max)
  2. A Google AI Studio account (free) for Gemini embeddings
  3. A Pinecone account (free tier) for the vector database
  4. 30-45 minutes for your first time

No programming knowledge required. You will copy prompts into Claude Code, and it will build everything.

How it works (the simple version)

Your files ──> Embeddings (Gemini) ──> Vector database (Pinecone)
                                              │
Your question ──> Embedding (Gemini) ──> Search ──> Claude answers

  1. Your files get converted into "embeddings" (numerical fingerprints that capture meaning)
  2. When you ask a question, it gets the same treatment
  3. The system finds fingerprints that match
  4. Claude reads the matching content and answers your question
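The matching in steps 2 and 3 is just vector math: content and question both become lists of numbers, and the closest lists win. Here is a toy sketch of the idea in plain Python, using made-up word-count "embeddings" instead of a real model (a real system calls Gemini, but the matching logic is the same shape):

```python
from collections import Counter
from math import sqrt

def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity: closer to 1.0 means more similar in meaning."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Step 1: "index" some documents by embedding them
docs = {
    "jupiter-fact-sheet": "jupiter is the largest planet in the solar system",
    "moons-guide": "io is a moon with active volcanoes",
}
index = {name: toy_embed(text) for name, text in docs.items()}

# Steps 2-3: embed the question the same way, find the closest fingerprint
question = toy_embed("which planet is the largest")
best = max(index, key=lambda name: cosine(question, index[name]))
print(best)  # the Jupiter fact sheet scores highest
```

In the real app, Gemini produces the vectors and Pinecone does the closest-match search at scale; the principle does not change.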

For a deeper explanation, see concepts.md.

Step 0: Get your accounts (10 minutes)

Google AI Studio (for embeddings)

Embeddings convert your content into searchable vectors. We use Google's Gemini Embedding 2 for this because it handles text, images, and video.

  1. Go to aistudio.google.com
  2. Sign in with a Google account
  3. Click "Get API key" in the left sidebar
  4. Click "Create API key"
  5. Copy the key somewhere safe

What is an API key? It is like a password that lets your app talk to Google's embedding service. You will paste it into a local configuration file later; it is only ever sent to Google to authenticate your requests.

Pinecone (for storing embeddings)

A vector database stores embeddings so you can search through them. Think of it as a smart filing cabinet.

  1. Go to pinecone.io and create a free account
  2. Once in the dashboard, click "Create Index"
  3. Name it space-search (or whatever you like)
  4. Set dimensions to 3072 (this matches Gemini Embedding 2)
  5. Choose the cosine metric
  6. Select the free "Starter" plan
  7. Copy your API key from the "API Keys" section
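For the curious: the same index settings can be expressed through Pinecone's Python client. This is a hedged sketch only (the dashboard steps above are all you actually need, and Claude Code writes this kind of code for you); the cloud and region values are assumptions about the free tier.

```python
def index_config():
    # The settings chosen in the dashboard steps above.
    return {"name": "space-search", "dimension": 3072, "metric": "cosine"}

def create_search_index(api_key):
    # Sketch only, not run here; requires the `pinecone` package.
    from pinecone import Pinecone, ServerlessSpec
    pc = Pinecone(api_key=api_key)
    cfg = index_config()
    pc.create_index(
        name=cfg["name"],
        dimension=cfg["dimension"],  # must match the embedding model's output size
        metric=cfg["metric"],
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # assumed free-tier region
    )
```

The dimension is the one setting that genuinely matters: if it does not match the embedding model's output size (3072 here), uploads will be rejected.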

Verify you have Claude Code

Open your terminal and type claude. If Claude Code starts, you are ready. If not, install it:

npm install -g @anthropic-ai/claude-code

You need a Claude Pro or Max subscription for this to work.

Step 1: Get the example files

Clone or download this repository. The example-data/ folder contains everything you need to get started:

PDFs:

  • solar-system-overview.pdf - Overview of our solar system (NASA)
  • jupiter-fact-sheet.pdf - Detailed data about Jupiter (NASA)
  • solar-system-moons.pdf - Guide to planetary moons (NASA)

Images:

  • earthrise.jpg - Earth seen from lunar orbit, Apollo 8 (1968)
  • aldrin-moon.jpg - Buzz Aldrin on the Moon, Apollo 11 (1969)
  • jupiter-great-red-spot.jpg - Jupiter photographed by Voyager 1 (1979)
  • iss-over-earth.jpg - The Moon seen from the ISS

Descriptions:

  • descriptions.md - Detailed text descriptions of each image. This is the most important file for image search quality. See the section below on why descriptions matter.

All files are NASA public domain. No copyright restrictions.

Step 2: Start Claude Code (5 minutes)

Open your terminal, navigate to this folder, and start Claude Code:

claude

Then copy the prompt from prompts/01-setup.md and paste it into Claude Code.

Claude Code will create the project structure and install dependencies. When it is done, copy .env.template to .env and fill in your API keys.
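The filled-in .env will look roughly like this (the variable names here are assumptions; use whatever names .env.template actually contains):

```shell
# .env — keep this file private and out of version control
GOOGLE_API_KEY=your-google-ai-studio-key
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX=space-search
```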

Step 3: Ingest your files (10 minutes)

Copy the prompt from prompts/02-ingest.md into Claude Code.

Claude Code will read each file, split it into chunks, generate embeddings, and store everything in Pinecone. You will see a summary of what was processed.

This is the step where your files become searchable.
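"Split it into chunks" just means cutting long documents into overlapping pieces, so each embedding captures one coherent passage instead of a whole PDF. A minimal sketch of the idea (sizes are illustrative; Claude Code will pick its own):

```python
def chunk(text, size=500, overlap=100):
    """Split text into overlapping character windows.

    The overlap keeps a sentence that straddles a boundary
    searchable from both neighbouring chunks.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

pieces = chunk("a" * 1200, size=500, overlap=100)
print(len(pieces), [len(p) for p in pieces])  # 3 [500, 500, 400]
```

Each chunk then gets its own embedding and its own entry in Pinecone, tagged with which file it came from so search results can cite sources.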

Step 4: Search (5 minutes)

Copy the prompt from prompts/03-search.md into Claude Code.

Claude Code will build a web interface and start it. Open the URL it gives you (usually http://localhost:3333) in your browser.

Try these searches:

Search query                         | What should come back
"What is the largest planet?"        | Jupiter fact sheet + Jupiter image
"First Moon landing"                 | Aldrin image + solar system overview
"Which moon has volcanoes?"          | Moons PDF (mentioning Io)
"How far is Jupiter from Earth?"     | Jupiter fact sheet (588.5 to 968.1 million km)
"What do astronauts see from orbit?" | ISS image description

Notice how a single question can pull results from both PDFs and images. That is multimodal search.
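Under the hood, "multimodal" here simply means PDF chunks and image descriptions live in the same index and compete on the same similarity score. A toy sketch of merging ranked hits (the field names and scores are made up for illustration):

```python
def top_hits(matches, k=3):
    """Sort search matches from any modality by similarity score, best first."""
    return sorted(matches, key=lambda m: m["score"], reverse=True)[:k]

matches = [
    {"source": "jupiter-fact-sheet.pdf", "kind": "pdf", "score": 0.82},
    {"source": "jupiter-great-red-spot.jpg", "kind": "image", "score": 0.79},
    {"source": "solar-system-moons.pdf", "kind": "pdf", "score": 0.41},
]
for hit in top_hits(matches, k=2):
    print(hit["source"], hit["score"])
```

Because a PDF chunk and an image description are just two vectors in the same space, a single question can rank them against each other directly.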

Step 5: Make it your own

Now that you have seen it work with NASA files, try it with your own content:

  1. Add your own PDFs, images, or documents to the example-data/ folder
  2. Write descriptions for any images (see the tips in descriptions.md)
  3. Use prompts/04-improve.md to re-index

Ideas for what to search:

  • Your company's internal documents
  • Research papers for a project
  • Travel photos with descriptions
  • Recipe collections
  • Course notes and textbook screenshots

Why image descriptions matter

The search system cannot "see" your images directly. It finds images through their text descriptions. This means:

Bad description: "Photo of a planet" will only match searches containing "photo" or "planet."

Good description: "Full-disk portrait of Jupiter captured by Voyager 1 in 1979, showing horizontal cloud bands and the Great Red Spot, a massive storm larger than Earth" will match searches about Jupiter, Voyager missions, storms, cloud patterns, and more.

The descriptions.md file in example-data/ shows side-by-side examples of bad versus good descriptions. Spending five minutes on better descriptions will dramatically improve your search results.

What this costs

$0 extra if you already have a Claude subscription. Both Gemini embeddings and Pinecone have generous free tiers.

See costs.md for details.

If you get stuck

See troubleshooting.md for the 10 most common problems and their solutions.

The most effective fix for almost anything: copy the exact error message and paste it into Claude Code. It is very good at diagnosing its own work.

How it works (the deeper version)

Read concepts.md for plain-English explanations of:

  • What are embeddings?
  • What is a vector database?
  • What is RAG?
  • What is chunking?
  • What does "multimodal" mean?

Credits

Example data: All PDFs and images are from NASA and are in the public domain (U.S. Government works, no copyright restrictions).

Part of The Dharma Lab. Read the full article for the story behind this project.