
Initial commit: multimodal RAG guide with Claude Code

Prompt-driven guide for building multimodal search using
Gemini Embedding 2 + Pinecone + Claude Code. Includes example
data (NASA public domain), step-by-step prompts, concepts
explainer, cost breakdown, and troubleshooting guide.
Kjell Tore Guttormsen 2026-03-12 16:36:22 +01:00
commit edcd1721df
19 changed files with 4446 additions and 0 deletions

prompts/01-setup.md

# Prompt 1: Set Up the Project
Copy this into Claude Code after you have your API keys ready.

---
```
I want to build a multimodal search app. I have a folder of files
(PDFs, images with text descriptions) that I want to make searchable
using natural language.

Here is the tech stack I want:
- Google Gemini Embedding 2 for converting content to embeddings
  (I have a Google AI Studio API key)
- Pinecone for storing the embeddings
  (I have a Pinecone API key and an index called "space-search")
- A simple local web interface where I can type questions and
  get results from my files
- Claude for answering questions based on the search results
  (use my Claude Code subscription, not a separate API key)

My example files are in the folder: example-data/
That folder contains:
- 3 PDF files about the solar system, Jupiter, and planetary moons
- 4 JPG images (Earthrise, Moon landing, Jupiter, ISS)
- A file called descriptions.md with detailed text descriptions
  of each image

Please set up the project structure, install dependencies, and
create a .env.template file for the API keys. Use Node.js with
TypeScript. Do not start building the search logic yet; just
create the project skeleton.
```
---
## What Claude Code will do
1. Create a new project folder with `package.json`
2. Install libraries for Gemini embeddings, Pinecone, and a web server
3. Create a `.env.template` with placeholders for your API keys
4. Set up TypeScript configuration
## What you do next
1. Copy `.env.template` to `.env`
2. Fill in your actual API keys
3. Move to Prompt 2
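
Before moving on, it can help to confirm that every key in `.env` is actually filled in. A minimal sketch, assuming the variable names `GEMINI_API_KEY`, `PINECONE_API_KEY` and `PINECONE_INDEX` (hypothetical; use whichever names Claude Code put in your `.env.template`):

```typescript
// Hypothetical variable names: match them to the ones in your .env.template.
const REQUIRED_VARS: string[] = ["GEMINI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"];

// Returns the names of required variables that are missing or empty.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

After loading `.env` (for example with the `dotenv` package), calling `missingEnvVars(process.env)` and stopping with a clear message when the result is non-empty catches a forgotten key before any API call fails.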

prompts/02-ingest.md

# Prompt 2: Ingest Your Files
Copy this into Claude Code after the project is set up and your
`.env` file has your API keys.

---
```
Now build the ingestion pipeline. I need a script that:

1. Reads each PDF in example-data/ and extracts the text content.
   Split long documents into chunks of roughly 500 words each.
   Keep track of which file and which section each chunk came from.
2. Reads the image descriptions from example-data/descriptions.md.
   Use the "Good description" for each image (ignore the "Bad" ones).
   Each image description becomes one chunk, linked to its image file.
3. For each chunk, generate an embedding using Google Gemini
   Embedding 2 (model: gemini-embedding-exp-03-07 or the latest
   available). Use task_type "RETRIEVAL_DOCUMENT" for all chunks.
4. Store each embedding in Pinecone along with metadata:
   - source_file: the original filename
   - content_type: "pdf" or "image"
   - text: the actual text content of the chunk
   - chunk_index: which chunk number within the file
5. After ingestion, print a summary: how many chunks were created,
   how many embeddings were stored, and any errors.

Run the ingestion script after building it. Show me the output.
```
---
## What Claude Code will do
1. Build a script that reads PDFs and extracts text
2. Parse the descriptions.md file for image descriptions
3. Send each chunk to Google Gemini for embedding
4. Store everything in Pinecone with metadata
5. Run the script and show results
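
Steps 3 and 4 come down to one embedding request per chunk and one Pinecone record per chunk. The sketch below only builds those payloads, so it runs without network access. It assumes the Gemini API's REST method `models/{model}:embedContent` (authenticated with an `x-goog-api-key` header) and Pinecone's `{ id, values, metadata }` record shape; the helper names and the `file#chunk` ID scheme are hypothetical, not necessarily what Claude Code will generate.

```typescript
type TaskType = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

interface Chunk {
  sourceFile: string; // e.g. "jupiter-fact-sheet.pdf"
  contentType: "pdf" | "image";
  text: string;
  chunkIndex: number;
}

interface EmbedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a request for the Gemini embedContent REST method.
function buildEmbedRequest(text: string, model: string, apiKey: string, taskType: TaskType): EmbedRequest {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:embedContent`,
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({
      model: `models/${model}`,
      content: { parts: [{ text }] },
      taskType,
    }),
  };
}

// Hypothetical deterministic ID: file name plus chunk number.
function chunkId(chunk: Chunk): string {
  return `${chunk.sourceFile}#${chunk.chunkIndex}`;
}

// Pinecone record carrying the metadata fields requested in the prompt.
function toPineconeRecord(chunk: Chunk, values: number[]) {
  return {
    id: chunkId(chunk),
    values,
    metadata: {
      source_file: chunk.sourceFile,
      content_type: chunk.contentType,
      text: chunk.text,
      chunk_index: chunk.chunkIndex,
    },
  };
}
```

To send the request, pass `url` and the other fields to `fetch`; the vector is in `embedding.values` of the JSON response. Two design points: Pinecone upserts by ID, so a deterministic ID means re-running the script overwrites records instead of duplicating them, and the index's dimension must equal the length of these vectors or the upsert is rejected.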
## What to expect
You should see output like:
```
Processing solar-system-overview.pdf... 3 chunks
Processing jupiter-fact-sheet.pdf... 4 chunks
Processing solar-system-moons.pdf... 3 chunks
Processing earthrise.jpg (from descriptions)... 1 chunk
Processing aldrin-moon.jpg (from descriptions)... 1 chunk
Processing jupiter-great-red-spot.jpg (from descriptions)... 1 chunk
Processing iss-over-earth.jpg (from descriptions)... 1 chunk
Total: 14 chunks ingested, 14 embeddings stored in Pinecone.
```
The exact numbers may vary depending on how Claude Code splits the PDFs.
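
The counts depend on the splitting rule. A minimal sketch of the "roughly 500 words" rule from step 1, using fixed-size word windows split on whitespace (one of several reasonable strategies; splitting on paragraphs or sentences is another). `chunkWords` and the 100-word tail threshold are hypothetical choices.

```typescript
// Splits text into chunks of about `maxWords` words.
// A final chunk shorter than `minTailWords` is merged into the previous one,
// so a chunk can end up slightly longer than maxWords.
function chunkWords(text: string, maxWords = 500, minTailWords = 100): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[][] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords));
  }
  // Merge a very short last chunk into the one before it.
  if (chunks.length > 1 && chunks[chunks.length - 1].length < minTailWords) {
    const tail = chunks.pop()!;
    chunks[chunks.length - 1].push(...tail);
  }
  return chunks.map((c) => c.join(" "));
}
```

With this rule a 1,250-word PDF gives chunks of 500, 500 and 250 words, while a 1,050-word PDF gives 500 and 550, because the 50-word tail is merged.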
## If something goes wrong
- "API key invalid": check your .env file
- "Index not found": make sure the index name in your code or .env matches the index you created in Pinecone ("space-search")
- "Rate limit": wait a minute and run the script again
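
Instead of re-running the whole script after a rate-limit error, the embedding calls can be retried automatically with exponential backoff. A minimal sketch; `withRetry` is a hypothetical helper, and deciding which errors count as a rate limit (for example an HTTP 429 status) is left to the `shouldRetry` predicate you pass in.

```typescript
// Retries an async operation, waiting baseDelayMs, then 2x, 4x, ... between attempts.
// `sleep` is injectable so the logic can be tested without real waiting.
async function withRetry<T>(
  fn: () => Promise<T>,
  {
    retries = 5,
    baseDelayMs = 1000,
    shouldRetry = (_err: unknown) => true,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  }: {
    retries?: number;
    baseDelayMs?: number;
    shouldRetry?: (err: unknown) => boolean;
    sleep?: (ms: number) => Promise<void>;
  } = {},
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      // Give up when out of attempts or when the error is not worth retrying.
      if (attempt >= retries || !shouldRetry(err)) throw err;
      await sleep(baseDelayMs * Math.pow(2, attempt));
      attempt++;
    }
  }
}
```

An invalid API key should not be retried, which is why the predicate matters: `withRetry(() => embed(chunk), { shouldRetry: isRateLimit })`, where `embed` and `isRateLimit` stand for whatever your script uses.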

prompts/03-search.md

# Prompt 3: Build the Search Interface
Copy this into Claude Code after ingestion is complete.

---
```
Now build a simple web interface for searching my content.
I want a local web app (localhost) with:

1. A search box where I type a question in plain English.
2. When I search, the app should:
   a. Convert my question to an embedding using Gemini Embedding 2
      with task_type "RETRIEVAL_QUERY"
   b. Search Pinecone for the 5 most similar chunks
   c. Show me the matching results with:
      - The source file name
      - The relevant text snippet
      - A similarity score
      - The image itself, if the result came from an image
3. Below the search results, add an "Ask Claude" button that:
   a. Takes the search results as context
   b. Sends them to Claude (use the claude CLI command, since I
      have Claude Code installed) with my question
   c. Shows Claude's answer, which should reference the specific
      files it used

Keep the design simple and clean. Dark background, readable text.
No frameworks needed, just HTML + CSS + a small server.
Start the app after building it.
```
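
Step 3 of this prompt comes down to packing the question and the search results into one prompt, with each snippet labelled by its source file so Claude can cite it. A minimal sketch; the `SearchResult` shape, the helper name and the prompt wording are assumptions, not what Claude Code will necessarily generate.

```typescript
interface SearchResult {
  sourceFile: string;
  text: string;
  score: number;
}

// Numbers each snippet, names its source file, and asks Claude to answer
// only from those snippets and to cite the files it used.
function buildContextPrompt(question: string, results: SearchResult[]): string {
  const context = results
    .map((r, i) => `[${i + 1}] (${r.sourceFile}, score ${r.score.toFixed(3)})\n${r.text}`)
    .join("\n\n");
  return [
    "Answer the question using only the context below.",
    "Cite the source file name for every fact you use.",
    "If the context does not contain the answer, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

The server could then run the CLI in non-interactive print mode, for example `execFile("claude", ["-p", prompt], callback)` from Node's `child_process` module. Passing the prompt as an argument array, rather than splicing it into a shell command string, keeps quotes or `$` in a user's question from being interpreted by the shell.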
---
## What Claude Code will do
1. Build a small web server (Express or similar)
2. Create an HTML page with a search box
3. Wire up the search to Gemini embeddings + Pinecone
4. Add Claude integration for AI-powered answers
5. Start the server
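
The similarity score shown next to each result comes from comparing the query vector with each stored chunk vector. Assuming the Pinecone index uses the cosine metric (the guide does not say which metric it was created with), the score is the cosine similarity below. A minimal brute-force sketch of that score and of a top-k search; Pinecone produces the same kind of ranking at scale using approximate search instead of scoring every vector.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), between -1 and 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k: score every stored vector against the query, keep the best k.
function topK<T extends { values: number[] }>(query: number[], items: T[], k = 5) {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Real embeddings have hundreds or thousands of dimensions, but the arithmetic is the same as for these toy two-dimensional vectors.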
## What to try
Once the app is running, try these searches:
- "What is the largest planet in our solar system?"
(Should find the Jupiter fact sheet AND the Jupiter image)
- "Tell me about the first Moon landing"
(Should find the Aldrin image description)
- "Which moon has active volcanoes?"
(Should find the moons PDF mentioning Io)
- "How far is Jupiter from Earth?"
(Should find specific numbers from the Jupiter fact sheet)
- "What can astronauts see from the space station?"
(Should find the ISS image description)
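These examples demonstrate the power of multimodal search:
a question about "the largest planet" finds both text data
AND a photograph, because the fact sheet and the photograph's
description both sit close to the question in embedding space.
The photograph is matched through its text description, so a
better description makes an image easier to find.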

prompts/04-improve.md

# Prompt 4: Improve and Iterate
These are prompts for when you want to make the search better.
Use whichever ones are relevant to you.

---
## If search results are not relevant enough
```
The search results are not matching my questions well.
Can you add a re-ranking step? After getting the top 10 results
from Pinecone, use Claude to re-rank them by relevance to my
actual question, then show the top 5.
```
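
One way to implement this: show Claude the 10 candidates numbered 1 to 10, ask it to reply with the numbers in order of relevance (for example `[3, 1, 7]`), then reorder locally. A minimal sketch of the reordering step; the reply format is an assumption you would spell out in the prompt to Claude, and `applyRerank` is a hypothetical helper that falls back to Pinecone's original order when the reply is unusable.

```typescript
// Reorders candidates by a ranking reply such as "[3, 1, 7]" (1-based positions).
// Invalid or duplicate numbers are ignored; candidates Claude did not mention
// follow in their original Pinecone order.
function applyRerank<T>(candidates: T[], reply: string, k = 5): T[] {
  const match = reply.match(/\[[\d,\s]*\]/);
  let order: number[] = [];
  if (match) {
    try {
      order = JSON.parse(match[0]) as number[];
    } catch {
      order = []; // malformed list such as "[1,,2]"
    }
  }
  const seen = new Set<number>();
  const ranked: T[] = [];
  for (const n of order) {
    const idx = n - 1;
    if (Number.isInteger(idx) && idx >= 0 && idx < candidates.length && !seen.has(idx)) {
      seen.add(idx);
      ranked.push(candidates[idx]);
    }
  }
  candidates.forEach((c, i) => {
    if (!seen.has(i)) ranked.push(c);
  });
  return ranked.slice(0, k);
}
```

Keeping the fallback means a vague or malformed reply from Claude degrades to the plain Pinecone ranking instead of an empty result list.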
## If you want to add your own files
```
I want to add more files to the search index. I have put new
files in example-data/. Please:
1. Check which files are new (not already in Pinecone)
2. Process only the new files
3. Add their embeddings to the existing index
4. Show me what was added
```
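
Step 1 of this prompt is a set difference between the files on disk and the `source_file` values already ingested. A minimal sketch, assuming you keep that list yourself (for example in a small local manifest written after each ingestion run), since reading it back out of Pinecone depends on how the IDs and metadata were set up. `findNewFiles` and the default extensions are hypothetical.

```typescript
// Returns files that are not yet ingested, limited to the extensions
// the pipeline knows how to process, in sorted order.
function findNewFiles(
  folderFiles: string[],
  ingested: string[],
  extensions: string[] = [".pdf", ".jpg"],
): string[] {
  const done = new Set(ingested);
  return folderFiles
    .filter((f) => extensions.some((ext) => f.toLowerCase().endsWith(ext)))
    .filter((f) => !done.has(f))
    .sort();
}
```

In a real script `folderFiles` would come from something like `fs.readdirSync("example-data")`. A new image only becomes searchable once its description has also been added to descriptions.md, because images are ingested through their descriptions.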
## If you want better image descriptions
```
Look at the images in example-data/ and suggest improved
descriptions for any that could be more detailed. Show me
the current description and your suggested improvement
side by side. Do not update descriptions.md until I approve.
```
## If you want to add a video clip
```
I have added a short video file (MP4, under 2 minutes) to
example-data/. Please:
1. Extract key frames from the video
2. Generate descriptions for each key frame
3. Add these descriptions as searchable chunks in Pinecone
4. Link them back to the video file with timestamps
```
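
A simple way to pick key frames from a short clip is to sample one frame at a fixed interval, which also gives every chunk a timestamp to link back to. A minimal sketch, assuming `ffmpeg` is installed (the guide does not say which tool Claude Code will use) and a hypothetical 10-second interval; scene-change detection is a more elaborate alternative.

```typescript
interface FramePlan {
  seconds: number;
  label: string; // "mm:ss", for linking a chunk back to the video
  ffmpegArgs: string[];
}

// Formats seconds as mm:ss.
function toTimestamp(totalSeconds: number): string {
  const m = Math.floor(totalSeconds / 60);
  const s = Math.floor(totalSeconds % 60);
  return `${String(m).padStart(2, "0")}:${String(s).padStart(2, "0")}`;
}

// Plans one frame every `intervalSec` seconds. Each args array extracts a single
// JPEG, equivalent to: ffmpeg -ss <t> -i <video> -frames:v 1 <output>
function planKeyFrames(video: string, durationSec: number, intervalSec = 10): FramePlan[] {
  const plans: FramePlan[] = [];
  for (let t = 0; t < durationSec; t += intervalSec) {
    const label = toTimestamp(t);
    const output = `${video.replace(/\.mp4$/i, "")}-${label.replace(":", "m")}s.jpg`;
    plans.push({ seconds: t, label, ffmpegArgs: ["-ss", String(t), "-i", video, "-frames:v", "1", output] });
  }
  return plans;
}
```

Each `ffmpegArgs` array can be passed to `execFile("ffmpeg", args)`. The extracted frame is then described and stored like the other image chunks, with the video file name and `label` in its metadata.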
## If you want to export or share
```
I want to share this search app with someone else. Can you:
1. Create a README with setup instructions
2. Make sure the .env.template has all required variables
3. Add a "first run" script that handles ingestion automatically
4. Package it so someone can clone the repo and get started
with just their own API keys
```
## If something is broken
```
[Paste the exact error message here]
This happened when I tried to [describe what you did].
Can you diagnose and fix the issue?
```
The key to good results with Claude Code: be specific about
what you see and what you expected to see instead.