# Build Multimodal Search with Claude Code

Search across your PDFs, images, and documents using plain English. No coding required. Claude Code builds everything for you.

## What you will build

A local search app that lets you ask questions like:

- "What is the largest planet in our solar system?"
- "Show me photos from the first Moon landing"
- "Which moon has active volcanoes?"

The app searches through your PDFs and images simultaneously and gives you answers with sources. You talk to it in plain English.

## How is this different from a Google search?

Google searches the internet. This searches YOUR files.

Imagine you have 500 PDFs, research papers, photos, and notes scattered across folders. Normal file search only matches exact words. This system understands meaning. You ask "what do we know about storms on other planets?" and it finds the Jupiter fact sheet mentioning wind speeds, the Jupiter photograph showing cloud bands, and the solar system overview describing atmospheric composition.

It connects information across files and formats. That is what makes it powerful.

## What you need

1. **Claude Code** (comes with Claude Pro at $20/month or Claude Max)
2. **A Google AI Studio account** (free) for Gemini embeddings
3. **A Pinecone account** (free tier) for the vector database
4. **30-45 minutes** for your first time

No programming knowledge required. You will copy prompts into Claude Code, and it will build everything.

## How it works (the simple version)

```
Your files ──> Embeddings (Gemini) ──> Vector database (Pinecone)
                                              │
Your question ──> Embedding (Gemini) ──> Search ──> Claude answers
```

1. Your files get converted into "embeddings" (numerical fingerprints that capture meaning)
2. When you ask a question, it gets the same treatment
3. The system finds fingerprints that match
4. Claude reads the matching content and answers your question

For a deeper explanation, see [concepts.md](concepts.md).
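Curious what "finding fingerprints that match" means in practice? It is just comparing lists of numbers. Here is a toy sketch in Python, with 3-number vectors standing in for real 3072-number Gemini embeddings (the vector values are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Higher score = closer in meaning. Result ranges from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- real ones from Gemini have 3072 numbers each
jupiter_doc = [0.9, 0.1, 0.3]   # the Jupiter fact sheet
moon_photo  = [0.1, 0.8, 0.2]   # the Apollo 11 image description
question    = [0.8, 0.2, 0.3]   # "What is the largest planet?"

# The content whose fingerprint is closest to the question wins
print(cosine_similarity(question, jupiter_doc) > cosine_similarity(question, moon_photo))  # True
```

This is the `cosine` metric you will select when creating the Pinecone index in Step 0; Pinecone just runs this comparison against millions of vectors very quickly.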
## Step 0: Get your accounts (10 minutes)

### Google AI Studio (for embeddings)

Embeddings convert your content into searchable vectors. We use Google's Gemini Embedding 2 for this because it handles text, images, and video.

1. Go to [aistudio.google.com](https://aistudio.google.com/)
2. Sign in with a Google account
3. Click "Get API key" in the left sidebar
4. Click "Create API key"
5. Copy the key somewhere safe

**What is an API key?** It is like a password that lets your app talk to Google's embedding service. You will paste it into a configuration file later. It never leaves your computer.

### Pinecone (for storing embeddings)

A vector database stores embeddings so you can search through them. Think of it as a smart filing cabinet.

1. Go to [pinecone.io](https://www.pinecone.io/) and create a free account
2. Once in the dashboard, click "Create Index"
3. Name it `space-search` (or whatever you like)
4. Set dimensions to `3072` (this matches Gemini Embedding 2)
5. Choose the `cosine` metric
6. Select the free "Starter" plan
7. Copy your API key from the "API Keys" section

### Verify you have Claude Code

Open your terminal and type `claude`. If Claude Code starts, you are ready. If not, install it:

```
npm install -g @anthropic-ai/claude-code
```

You need a Claude Pro or Max subscription for this to work.

## Step 1: Get the example files

Clone or download this repository.
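If the dashboard clicks feel opaque, the same index settings can be expressed as code. This is a hedged sketch, not something you need to run: the commented-out call assumes the official `pinecone` Python client and a serverless project on AWS `us-east-1`, which may differ from your account.

```python
# The index settings from the steps above, as code.
INDEX_SETTINGS = {
    "name": "space-search",
    "dimension": 3072,   # must match the Gemini embedding size exactly
    "metric": "cosine",  # how similarity between vectors is measured
}

# Creating the index programmatically would look roughly like this
# (assumes the official `pinecone` client; adjust cloud/region to your project):
#
# from pinecone import Pinecone, ServerlessSpec
# pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
# pc.create_index(
#     name=INDEX_SETTINGS["name"],
#     dimension=INDEX_SETTINGS["dimension"],
#     metric=INDEX_SETTINGS["metric"],
#     spec=ServerlessSpec(cloud="aws", region="us-east-1"),
# )

print(INDEX_SETTINGS["name"], INDEX_SETTINGS["dimension"], INDEX_SETTINGS["metric"])
```

The one setting that will bite you if it is wrong is `dimension`: if it does not match the embedding model's output size, every upload to Pinecone will be rejected.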
The `example-data/` folder contains everything you need to get started:

**PDFs:**

- `solar-system-overview.pdf` - Overview of our solar system (NASA)
- `jupiter-fact-sheet.pdf` - Detailed data about Jupiter (NASA)
- `solar-system-moons.pdf` - Guide to planetary moons (NASA)

**Images:**

- `earthrise.jpg` - Earth seen from lunar orbit, Apollo 8 (1968)
- `aldrin-moon.jpg` - Buzz Aldrin on the Moon, Apollo 11 (1969)
- `jupiter-great-red-spot.jpg` - Jupiter photographed by Voyager 1 (1979)
- `iss-over-earth.jpg` - Earth seen from the International Space Station

**Descriptions:**

- `descriptions.md` - Detailed text descriptions of each image. This is the most important file for image search quality. See the section below on why descriptions matter.

All files are NASA public domain. No copyright restrictions.

## Step 2: Start Claude Code (5 minutes)

Open your terminal, navigate to this folder, and start Claude Code:

```
claude
```

Then copy the prompt from [prompts/01-setup.md](prompts/01-setup.md) and paste it into Claude Code. Claude Code will create the project structure and install dependencies. When it is done, copy `.env.template` to `.env` and fill in your API keys.

## Step 3: Ingest your files (10 minutes)

Copy the prompt from [prompts/02-ingest.md](prompts/02-ingest.md) into Claude Code. Claude Code will read each file, split it into chunks, generate embeddings, and store everything in Pinecone. You will see a summary of what was processed.

This is the step where your files become searchable.

## Step 4: Search (5 minutes)

Copy the prompt from [prompts/03-search.md](prompts/03-search.md) into Claude Code. Claude Code will build a web interface and start it. Open the URL it gives you (usually `http://localhost:3333`) in your browser.

Try these searches:

| Search query | What should come back |
|---|---|
| "What is the largest planet?" | Jupiter fact sheet + Jupiter image |
| "First Moon landing" | Aldrin image + solar system overview |
| "Which moon has volcanoes?" | Moons PDF (mentioning Io) |
| "How far is Jupiter from Earth?" | Jupiter fact sheet (588.5 to 968.1 million km) |
| "What do astronauts see from orbit?" | ISS image description |

Notice how a single question can pull results from both PDFs and images. That is multimodal search.

## Step 5: Make it your own

Now that you have seen it work with NASA files, try it with your own content:

1. Add your own PDFs, images, or documents to the `example-data/` folder
2. Write descriptions for any images (see the tips in `descriptions.md`)
3. Use [prompts/04-improve.md](prompts/04-improve.md) to re-index

Ideas for what to search:

- Your company's internal documents
- Research papers for a project
- Travel photos with descriptions
- Recipe collections
- Course notes and textbook screenshots

## Why image descriptions matter

The search system cannot "see" your images directly. It finds images through their text descriptions. This means:

**Bad description:** "Photo of a planet" will only match searches containing "photo" or "planet."

**Good description:** "Full-disk portrait of Jupiter captured by Voyager 1 in 1979, showing horizontal cloud bands and the Great Red Spot, a massive storm larger than Earth" will match searches about Jupiter, Voyager missions, storms, cloud patterns, and more.

The `descriptions.md` file in `example-data/` shows side-by-side examples of bad versus good descriptions. Spending five minutes on better descriptions will dramatically improve your search results.

## What this costs

$0 extra if you already have a Claude subscription. Both Gemini embeddings and Pinecone have generous free tiers. See [costs.md](costs.md) for details.

## If you get stuck

See [troubleshooting.md](troubleshooting.md) for the 10 most common problems and their solutions.

The most effective fix for almost anything: copy the exact error message and paste it into Claude Code. It is very good at diagnosing its own work.
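Step 3's "split it into chunks" is worth a peek under the hood. Here is a simplified Python sketch of what a chunker does; it is a stand-in for whatever code Claude Code actually generates for you, and the embedding/upload calls are shown only as comments with hypothetical names:

```python
def chunk_text(text, max_chars=500, overlap=50):
    """Split text into overlapping pieces so a fact straddling a
    boundary still appears whole in at least one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

document = "Jupiter is the largest planet in the solar system. " * 30
chunks = chunk_text(document)

# Each chunk would then be embedded and stored, roughly like:
#   vec = embed_with_gemini(chunk)   # hypothetical helper calling the Gemini API
#   index.upsert([(chunk_id, vec, {"source": "jupiter-fact-sheet.pdf"})])
# The metadata ("source") is what lets the app cite which file an answer came from.

print(len(chunks), all(len(c) <= 500 for c in chunks))
```

The overlap is the key design choice: without it, a sentence cut in half at a chunk boundary would embed poorly in both halves and might never be found.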
## How it works (the deeper version)

Read [concepts.md](concepts.md) for plain-English explanations of:

- What are embeddings?
- What is a vector database?
- What is RAG?
- What is chunking?
- What does "multimodal" mean?

## Credits

Example data: All PDFs and images are from NASA and are in the public domain (U.S. Government works, no copyright restrictions).

Built with:

- [Claude Code](https://claude.ai) by Anthropic (app building + AI answers)
- [Gemini Embedding 2](https://ai.google.dev/) by Google (multimodal embeddings)
- [Pinecone](https://www.pinecone.io/) (vector database)

---

*Part of [The Dharma Lab](https://thedharmalab.com). Read the [full article](https://thedharmalab.com/) for the story behind this project.*