Prompt-driven guide for building multimodal search using Gemini Embedding 2 + Pinecone + Claude Code. Includes example data (NASA public domain), step-by-step prompts, concepts explainer, cost breakdown, and troubleshooting guide.
Prompt 3: Build the Search Interface
Copy this into Claude Code after ingestion is complete.
Now build a simple web interface for searching my content.
I want a local web app (localhost) with:
1. A search box where I type a question in plain English.
2. When I search, the app should:
   a. Convert my question to an embedding using Gemini Embedding 2 with task_type "RETRIEVAL_QUERY"
   b. Search Pinecone for the 5 most similar chunks
   c. Show me the matching results with:
      - The source file name
      - The relevant text snippet
      - A similarity score
      - If the result is from an image, show the image too
3. Below the search results, add an "Ask Claude" button that:
   a. Takes the search results as context
   b. Sends them to Claude (use the claude CLI command, since I have Claude Code installed) with my question
   c. Shows Claude's answer, which should reference the specific files it used
Keep the design simple and clean. Dark background, readable text.
No frameworks needed, just HTML + CSS + a small server.
Start the app after building it.
What Claude Code will do
- Build a small web server (Express or similar)
- Create an HTML page with a search box
- Wire up the search to Gemini embeddings + Pinecone
- Add Claude integration for AI-powered answers
- Start the server
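For orientation, the request flow described above can be sketched in a few lines. This is a minimal illustration, not the code Claude Code will generate: `embed` and `query_index` are hypothetical callables standing in for the Gemini embedding call and the Pinecone query, and the metadata keys (`source`, `text`, `image_path`) are assumed names that depend on how ingestion stored the chunks.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical shapes: `embed` would wrap the Gemini embedding call
# (task type RETRIEVAL_QUERY) and `query_index` would wrap the Pinecone query.
Embed = Callable[[str], Sequence[float]]
QueryIndex = Callable[[Sequence[float], int], List[Dict]]


def search(question: str, embed: Embed, query_index: QueryIndex,
           top_k: int = 5) -> List[Dict]:
    """Embed the question, fetch the nearest chunks, shape them for display."""
    vector = embed(question)
    matches = query_index(vector, top_k)
    results = []
    for match in matches:
        meta = match.get("metadata", {})  # assumed metadata layout
        results.append({
            "source": meta.get("source", "unknown"),
            "snippet": meta.get("text", ""),
            "score": round(match.get("score", 0.0), 3),
            # Only image results carry a path the page can render.
            "image": meta.get("image_path"),
        })
    return results


def build_claude_prompt(question: str, results: List[Dict]) -> str:
    """Turn search results into a context block followed by the question."""
    context = "\n\n".join(f"[{r['source']}]\n{r['snippet']}" for r in results)
    return (
        "Answer the question using only the context below. "
        "Name the source files you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Passing the two services in as plain callables keeps the sketch independent of any particular SDK version. The string returned by `build_claude_prompt` is what the "Ask Claude" button would hand to the claude CLI from the server process.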
What to try
Once the app is running, try these searches:
- "What is the largest planet in our solar system?" (Should find the Jupiter fact sheet AND the Jupiter image)
- "Tell me about the first Moon landing" (Should find the Aldrin image description)
- "Which moon has active volcanoes?" (Should find the moons PDF mentioning Io)
- "How far is Jupiter from Earth?" (Should find specific numbers from the Jupiter fact sheet)
- "What can astronauts see from the space station?" (Should find the ISS image description)
These examples show the point of multimodal search: a question about "the largest planet" returns both text data and a photograph, because both are semantically related to the query.
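To make "semantically related" concrete, here is a toy ranking by cosine similarity, a common similarity measure for vector search. The three-dimensional vectors and item names are invented for illustration. They are not real Gemini embeddings, and the index metric configured in Pinecone may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: Sequence[float],
         items: List[Tuple[str, Sequence[float]]],
         top_k: int = 5) -> List[Tuple[str, float]]:
    """Score every item against the query and keep the top_k best."""
    scored = [(name, cosine(query, vec)) for name, vec in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up vectors: the two Jupiter items point in nearly the same direction.
items = [
    ("Jupiter fact sheet (text)", [0.9, 0.1, 0.0]),
    ("Jupiter photo (image)", [0.8, 0.2, 0.1]),
    ("ISS photo (image)", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.1, 0.0]  # stands in for "the largest planet"

ranked = rank(query, items, top_k=2)
for name, score in ranked:
    print(f"{score:.3f}  {name}")
```

With these vectors the fact sheet and the Jupiter photo take the top two places, while the ISS photo scores close to zero and drops out. The real app sees the same effect at a much higher dimensionality.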