🛒 AI-Powered Grocery Search: Building a Neo4j Graph App with Streamlit & Magic! ✨
🚀 Introduction
So, you’ve got a ton of product data, and you want to search it semantically — like a human, not a robot that only understands exact matches. But traditional databases don’t cut it, and vector databases sound too complicated. Enter Neo4j, the graph database that makes relationships as easy to handle as your Netflix recommendations.
In this blog, we’ll go from zero to hero, setting up Neo4j, uploading product data, and integrating it into a Streamlit-powered AI app for natural language search. And yes, we’re throwing in some LLM magic! ✨
🏗️ Step 1: Setting Up Neo4j & Uploading Data
1. Install Dependencies
Before diving in, install the necessary Python libraries:
pip install pandas neo4j streamlit langchain langchain-community langchain-groq
2. Connect to Neo4j & Upload Data
We’re using Neo4j Aura (a cloud-based instance) for simplicity. Here’s how we insert our product data:
Download the data from Kaggle and save it as BigBasket_Products.csv, the file name the script expects.
import pandas as pd
from neo4j import GraphDatabase
# Load CSV dataset
file_path = "BigBasket_Products.csv"
df = pd.read_csv(file_path)
# Neo4j Connection Details
uri = "neo4j+s://your-database-url"
username = "neo4j"
password = "your-password"
# Create Neo4j Driver
driver = GraphDatabase.driver(uri, auth=(username, password))
# Cypher Query for Insertion
def insert_product(tx, product, sale_price, market_price, p_type, rating, brand, category, sub_category):
    if pd.isna(product) or pd.isna(brand) or pd.isna(category) or pd.isna(sub_category):
        return  # Skip rows with essential missing values
    query = """
    MERGE (p:Product {name: $product})
    SET p.sale_price = $sale_price,
        p.market_price = $market_price,
        p.type = $p_type,
        p.rating = $rating
    MERGE (b:Brand {name: $brand})
    MERGE (c:Category {name: $category})
    MERGE (sc:SubCategory {name: $sub_category})
    MERGE (p)-[:BELONGS_TO]->(b)
    MERGE (p)-[:IN_CATEGORY]->(c)
    MERGE (p)-[:IN_SUBCATEGORY]->(sc)
    """
    tx.run(query, product=product, sale_price=sale_price, market_price=market_price,
           p_type=p_type, rating=rating, brand=brand, category=category, sub_category=sub_category)
# Insert Data Row by Row
with driver.session() as session:
    for _, row in df.iterrows():
        # execute_write replaces write_transaction, which is deprecated in the 5.x driver
        session.execute_write(insert_product, row["product"], row["sale_price"],
                              row["market_price"], row["type"], row["rating"],
                              row["brand"], row["category"], row["sub_category"])
print("Data inserted successfully!")
# Close the connection
driver.close()
Boom! 💥 Your product data is now in Neo4j.
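One thing to watch: the loop above opens a transaction per product, which can get slow on a dataset this size. A possible alternative is to send rows in batches and let Cypher's UNWIND do the looping. This is only a sketch under assumptions: the helper names (`clean_row`, `is_complete`, `chunked`, `upload_in_batches`) and the batch size of 500 are made up for illustration, and the Cypher simply mirrors the MERGE statements used earlier.

```python
import math
from typing import Any, Dict, Iterator, List

# Keys that the MERGE clauses match on; Cypher cannot MERGE on a null property.
REQUIRED = ("product", "brand", "category", "sub_category")

BATCH_QUERY = """
UNWIND $rows AS row
MERGE (p:Product {name: row.product})
SET p.sale_price = row.sale_price,
    p.market_price = row.market_price,
    p.type = row.type,
    p.rating = row.rating
MERGE (b:Brand {name: row.brand})
MERGE (c:Category {name: row.category})
MERGE (sc:SubCategory {name: row.sub_category})
MERGE (p)-[:BELONGS_TO]->(b)
MERGE (p)-[:IN_CATEGORY]->(c)
MERGE (p)-[:IN_SUBCATEGORY]->(sc)
"""


def clean_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Replace float NaN (pandas' missing-value marker) with None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in row.items()
    }


def is_complete(row: Dict[str, Any]) -> bool:
    """True when every key used in a MERGE pattern has a value."""
    return all(row.get(key) is not None for key in REQUIRED)


def chunked(rows: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def upload_in_batches(session, rows: List[Dict[str, Any]], batch_size: int = 500) -> int:
    """Send usable rows to Neo4j one batch per query; return how many were sent."""
    usable = [r for r in (clean_row(row) for row in rows) if is_complete(r)]
    for batch in chunked(usable, batch_size):
        # One round trip per batch instead of one per product.
        session.run(BATCH_QUERY, rows=batch).consume()
    return len(usable)
```

To try it with the DataFrame from the script, something like `upload_in_batches(session, df.to_dict("records"))` inside a `with driver.session() as session:` block should do the job. Rows missing a product, brand, category, or sub-category are skipped, just like in the row-by-row version.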
🤖 Step 2: Building a Streamlit App for AI-Powered Search
Now let’s make a cool Streamlit app that queries this Neo4j database using LangChain + LLMs (via Groq’s API).
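A quick aside before the code: the snippets in this post use placeholder strings for the Neo4j password and the Groq API key. If you plan to share or deploy the app, one option is to read them from environment variables instead. The sketch below is an assumption, not part of the original app: the variable names (`NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, `GROQ_API_KEY`) and the `load_settings` helper are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical variable names; pick whatever fits your deployment.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GROQ_API_KEY")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, or raise an error naming every missing one."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in REQUIRED_VARS}
```

With that in place, the app could call `settings = load_settings()` once at startup and pass `settings["NEO4J_URI"]`, `settings["NEO4J_PASSWORD"]`, and `settings["GROQ_API_KEY"]` wherever the placeholders appear.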
1. Install Additional Dependencies
pip install streamlit langchain-groq langchain-experimental
2. Create the Streamlit App
import streamlit as st
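# These classes live in langchain-community; the langchain.* import paths are deprecated re-exports
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph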
import pandas as pd
import re
from langchain_groq import ChatGroq
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain.prompts import PromptTemplate
# Connect to Neo4j
graph = Neo4jGraph(url="neo4j+s://your-database-url", username="neo4j", password="your-password")
# Initialize LLM (Gemma 2 9B served via the Groq API)
llm = ChatGroq(groq_api_key="your-groq-api-key", model_name="Gemma2-9b-It")
llm_transformer = LLMGraphTransformer(llm=llm)
# Define Custom Cypher Prompt
custom_cypher_prompt = PromptTemplate(
    input_variables=["query"],
    template="""
Given the following user query:
"{query}"
Generate a Cypher query that:
- Finds products linked to relevant categories.
- Uses `CONTAINS` for flexible matching.
- Ensures relationships are respected.
- Returns structured data with product name, sale price, and category.
Example:
User Query: "Show me Snacks products"
Cypher Query:
MATCH (p:Product)-[:IN_CATEGORY]->(c:Category)
WHERE toLower(c.name) CONTAINS "snacks"
RETURN p.name AS product_name, p.sale_price AS sale_price, c.name AS category
LIMIT 10
"""
)
# Create LangChain GraphCypherQAChain
chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
    cypher_prompt=custom_cypher_prompt,
    return_intermediate_steps=True
)
st.title("🔍 LLM-Powered Neo4j Graph Explorer")
st.write("Ask questions about your product database using natural language!")
query = st.text_input("Enter your query (e.g., 'Show me products with price > 100'):")
def display_results(results):
    if not results:
        st.warning("No products found!")
        return
    # Convert results into a Pandas DataFrame
    df = pd.DataFrame(results)
    # Rename columns for better readability
    df.columns = ["Product Name", "Sale Price", "Category"]
    # Display DataFrame in Streamlit
    st.dataframe(df.style.format({"Sale Price": "₹{:.2f}"}))  # Format price properly
def display_product_cards(products):
    """Display products in a structured and attractive way using Streamlit."""
    for product in products:
        print(f"Product: {product}")
        product_name = product.get("product_name", "Unknown Product")
        sale_price = product.get("sale_price", "N/A")  # Ensure we handle missing prices
        category = product.get("category", "Unknown Category")
        print(f"Product: {product_name}, Price: {sale_price}, Category: {category}")
        # Ensure sale_price is displayed correctly
        price_display = f"₹{sale_price}" if isinstance(sale_price, (int, float)) else sale_price
        # HTML is kept flush-left so Markdown does not treat it as an indented code block
        st.markdown(f"""
<div style="border: 1px solid #ddd; border-radius: 10px; padding: 10px; margin: 10px 0; background-color: #f9f9f9;">
<h4 style="color: #333;">{product_name}</h4>
<p><strong>Category:</strong> {category}</p>
<p><strong>Price:</strong> {price_display}</p>
</div>
""", unsafe_allow_html=True)
def parse_response(products):
    """Normalize the raw rows returned by Neo4j into product dictionaries."""
    structured_data = []
    for product in products:
        # Fall back to placeholders when a key is missing or its value is null
        product_name = product.get("product_name") or "Unknown Product"
        sale_price = product.get("sale_price", "N/A")
        category = product.get("category") or "Unknown Category"
        structured_data.append({
            "product_name": str(product_name).strip(),
            "sale_price": sale_price,
            "category": str(category).strip(),
        })
    return structured_data
if st.button("Search"):
    # processed_query = preprocess_query(user_query)
    response = chain.invoke({"query": query})
    full_response = response.copy()
    if isinstance(response, dict) and "intermediate_steps" in response and "I don't know the answer" not in full_response["result"]:
        response = response["intermediate_steps"][-1]  # Last step contains full context
        results = parse_response(response["context"])
        display_product_cards(results)
    else:
        st.warning("No products found!")
🎬 Final Step: Running the App
Fire up the terminal and run:
streamlit run app.py
Boom! 🎉 Now you can ask things like:
- “Show me all snacks under ₹100”
- “show dairy products”
And get smart, AI-powered results directly from your Neo4j graph!
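If a question like "Show me all snacks under ₹100" comes back empty, it helps to look at the Cypher the LLM actually wrote. Because the chain is created with return_intermediate_steps=True, the response should carry both the generated query and the raw rows. The sketch below assumes the response shape the app already relies on (a list of step dictionaries, with the rows under a "context" key) plus a "query" key holding the generated Cypher. `split_steps` is a hypothetical helper name, so check the keys against your installed LangChain version.

```python
from typing import Any, Dict, List, Tuple


def split_steps(response: Dict[str, Any]) -> Tuple[str, List[Dict[str, Any]]]:
    """Return (generated_cypher, rows) from a chain response, tolerating gaps."""
    cypher = ""
    rows: List[Dict[str, Any]] = []
    for step in response.get("intermediate_steps", []):
        if not isinstance(step, dict):
            continue  # Ignore anything that is not a step dictionary
        if "query" in step:
            cypher = str(step["query"])
        if "context" in step:
            rows = list(step["context"] or [])
    return cypher, rows
```

In the app, you could call `cypher, rows = split_steps(response)` after `chain.invoke(...)` and show the query with `st.code(cypher)` above the product cards. That makes it much easier to tell whether an empty result came from a bad query or from the data itself.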
With just a few lines of code, we’ve built a Neo4j-powered graph search engine that understands natural language queries and returns structured data. The power of LLMs + Graph Databases makes search smarter and more intuitive! 🚀
What’s next? Try integrating semantic search using embeddings or expanding the app to handle user behavior analytics. The possibilities are endless!
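To give a feel for the embeddings idea, here is a toy sketch of ranking products by cosine similarity. Everything in it is an assumption for illustration: the three-dimensional vectors are invented, and in a real setup they would come from an embedding model, with the similarity search possibly handled by a vector index rather than plain Python.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_products(
    query_vec: Sequence[float],
    product_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in product_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "Potato Chips": [0.9, 0.1, 0.0],
    "Salted Peanuts": [0.8, 0.2, 0.1],
    "Full Cream Milk": [0.0, 0.1, 0.9],
}

for name, score in rank_products([1.0, 0.0, 0.0], toy_vectors, top_k=2):
    print(f"{name}: {score:.3f}")
```

The point is that a "snack-like" query vector lands closer to chips and peanuts than to milk, even though no keyword is shared. That is the behavior `CONTAINS` matching cannot give you.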
Let me know how it goes — happy coding! 😎