
Initial commit: multimodal RAG guide with Claude Code

Prompt-driven guide for building multimodal search using
Gemini Embedding 2 + Pinecone + Claude Code. Includes example
data (NASA public domain), step-by-step prompts, concepts
explainer, cost breakdown, and troubleshooting guide.
Kjell Tore Guttormsen 2026-03-12 16:36:22 +01:00
commit edcd1721df
19 changed files with 4446 additions and 0 deletions

concepts.md
# Concepts: What You Need to Know (and Nothing More)
This page explains the key ideas behind multimodal search.
You do not need to understand these concepts to follow the guide.
But if you are curious about what is happening behind the scenes,
this is for you.
## What is an embedding?
Think of it as a fingerprint for meaning.
When you read the sentence "Jupiter is the largest planet," your brain
understands what it means. An embedding is a way for a computer to do
something similar. It converts text (or an image) into a long list of
numbers that captures the meaning of that content.
The key insight: content with similar meaning gets similar numbers.
So "Jupiter is massive" and "Jupiter is the biggest planet" would have
very similar embeddings, even though the words are different.
You never see these numbers. They work behind the scenes.
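If you are curious what "similar numbers" means in practice, here is a toy illustration. The four-number vectors are made up (real embeddings have hundreds or thousands of dimensions), but cosine similarity is the standard way to compare two embeddings:

```python
import math

def cosine_similarity(a, b):
    """How alike two embeddings are: near 1.0 = same meaning, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy 4-number "embeddings" (real ones are far longer).
jupiter_massive = [0.9, 0.8, 0.1, 0.0]
jupiter_biggest = [0.85, 0.9, 0.05, 0.1]
cookie_recipe   = [0.0, 0.1, 0.9, 0.8]

print(cosine_similarity(jupiter_massive, jupiter_biggest))  # close to 1
print(cosine_similarity(jupiter_massive, cookie_recipe))    # close to 0
```

The two Jupiter sentences score close to 1.0 even though they share almost no words; the cookie recipe scores near 0.
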
## What is a vector database?
A place to store embeddings so you can search through them quickly.
Imagine a library where books are not organized by author or title,
but by what they are about. You walk in and say "I want something
about storms on other planets" and the librarian immediately hands
you the right book. That is what a vector database does, but with
your files.
We use Pinecone in this guide because it has a free tier and works
well. There are other options (Chroma, Weaviate, Qdrant), but
Pinecone requires the least setup.
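Under the hood, a vector database boils down to "store vectors, find the nearest ones." Here is a minimal in-memory stand-in (the `ToyVectorDB` name and brute-force search are ours, for illustration only; Pinecone does the same job at scale, with indexing tricks so it stays fast over millions of vectors):

```python
import math

class ToyVectorDB:
    """In-memory stand-in for what Pinecone does: store vectors, find the nearest."""
    def __init__(self):
        self.items = []  # list of (id, vector) pairs

    def upsert(self, item_id, vector):
        self.items.append((item_id, vector))

    def query(self, vector, top_k=1):
        def score(v):
            # Cosine similarity between the query and a stored vector.
            dot = sum(x * y for x, y in zip(vector, v))
            return dot / (math.hypot(*vector) * math.hypot(*v))
        ranked = sorted(self.items, key=lambda item: score(item[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:top_k]]

db = ToyVectorDB()
db.upsert("storms-on-jupiter", [0.9, 0.1])
db.upsert("cookie-recipe", [0.1, 0.9])
print(db.query([0.8, 0.2]))  # the "storms" entry is the closest match
```
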
## What is RAG?
RAG stands for Retrieval-Augmented Generation. Big name, simple idea.
Normally, when you ask an AI a question, it answers from its training
data. It might know general facts, but it does not know about YOUR
files. RAG changes that.
With RAG, the AI first searches through your documents to find
relevant information, then uses what it found to answer your question.
It is like giving the AI a cheat sheet of your own content before
it answers.
Without RAG: "What do we know about Jupiter's atmosphere?"
The AI answers from general knowledge.
With RAG: "What do we know about Jupiter's atmosphere?"
The AI searches your PDFs and images, finds the Jupiter fact sheet
and the Voyager photo, and answers based on YOUR specific collection.
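The RAG flow can be sketched in a few lines. The `retrieve` function below uses plain word overlap as a stand-in for embedding search (the real guide uses embeddings), but the shape is the same: search your documents first, then paste what was found into the prompt as the cheat sheet:

```python
def retrieve(question, documents, top_k=1):
    """Stand-in retrieval: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_rag_prompt(question, documents):
    """The 'augmented' part of RAG: retrieved text goes into the prompt."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Jupiter's atmosphere is mostly hydrogen and helium.",
    "Chocolate chip cookies need butter and brown sugar.",
]
prompt = build_rag_prompt("What do we know about Jupiter's atmosphere?", docs)
print(prompt)
```

Only the Jupiter document ends up in the prompt, so the AI answers from your content instead of general knowledge.
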
## What is chunking?
Your documents might be long. A 50-page PDF cannot be processed
as one piece. Chunking means splitting it into smaller sections
that the AI can work with.
Think of it like cutting a book into chapters. Each chapter gets
its own embedding. When you search, the system finds the right
chapter, not the whole book.
Claude Code handles chunking automatically. You do not need to
do anything.
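For the curious, here is a minimal sketch of what a chunker does. The sizes and overlap are made-up defaults, and real chunkers often split on sentences or tokens instead of raw word counts, but the idea is the same. The overlap means a sentence that straddles a boundary is still findable in at least one chunk:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping windows of chunk_size words."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end
    return chunks
```
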
## What does "multimodal" mean?
"Multi" means many. "Modal" means types.
Regular search works with text only. Multimodal search works with
text AND images AND PDFs AND videos. You can search across all
of them at once.
This is what makes this project interesting. You ask a question
in plain English, and the system searches through your PDFs,
images, and their descriptions to find the best answer, regardless
of what format the information is in.
## How does it all fit together?
1. You put files in a folder (PDFs, images with descriptions)
2. Claude Code builds a system that reads each file
3. Each piece of content gets converted to an embedding (a fingerprint)
4. The embeddings are stored in Pinecone (the vector database)
5. When you search, your question also gets converted to an embedding
6. Pinecone finds the stored embeddings most similar to your question
7. The matching content is shown to you (or fed to an AI for a detailed answer)
That is it. The rest is implementation details, and Claude Code
handles those for you.
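Those seven steps, compressed into one toy script. Here `fake_embed` is a made-up stand-in for Gemini Embedding 2 and a plain Python list stands in for Pinecone, but the flow (embed the files, store the vectors, embed the question, return the nearest match) is exactly the pipeline described above:

```python
import math

def fake_embed(text):
    """Made-up stand-in for a real embedding model: counts a few topic words."""
    topics = ["jupiter", "storm", "cookie", "recipe"]
    words = text.lower().split()
    return [float(sum(w.startswith(t) for w in words)) for t in topics]

def similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.hypot(*a) * math.hypot(*b)
    return dot / norms if norms else 0.0

# Steps 1-4: read files (here, just strings), embed each, store the vectors.
store = [(text, fake_embed(text)) for text in [
    "Jupiter's Great Red Spot is a giant storm.",
    "A cookie recipe with oats and raisins.",
]]

# Steps 5-7: embed the question, find the closest stored vector, show it.
question = fake_embed("storms on other planets like Jupiter")
best = max(store, key=lambda item: similarity(question, item[1]))
print(best[0])  # the storm document, not the cookie recipe
```
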