🛒 AI-Powered Grocery Search: Building a Neo4j Graph App with Streamlit & Magic! ✨

5 min read · Mar 13, 2025

🚀 Introduction

So, you’ve got a ton of product data, and you want to search it semantically — like a human, not a robot that only understands exact matches. But traditional databases don’t cut it, and vector databases sound too complicated. Enter Neo4j, the graph database that makes relationships as easy to handle as your Netflix recommendations.

In this blog, we’ll go from zero to hero, setting up Neo4j, uploading product data, and integrating it into a Streamlit-powered AI app for natural language search. And yes, we’re throwing in some LLM magic! ✨

🏗️ Step 1: Setting Up Neo4j & Uploading Data

1. Install Dependencies

Before diving in, install the necessary Python libraries:

pip install pandas neo4j streamlit langchain langchain-community langchain-groq

2. Connect to Neo4j & Upload Data

We’re using Neo4j Aura (a cloud-based instance) for simplicity. Download the product dataset from Kaggle and save it as BigBasket_Products.csv. Here’s how we insert it:

import pandas as pd
from neo4j import GraphDatabase

# Load CSV dataset
file_path = "BigBasket_Products.csv"
df = pd.read_csv(file_path)

# Neo4j connection details
uri = "neo4j+s://your-database-url"
username = "neo4j"
password = "your-password"

# Create Neo4j driver
driver = GraphDatabase.driver(uri, auth=(username, password))

# Transaction function: upsert one product and link it to its brand and categories
def insert_product(tx, product, sale_price, market_price, p_type, rating, brand, category, sub_category):
    if pd.isna(product) or pd.isna(brand) or pd.isna(category) or pd.isna(sub_category):
        return  # Skip rows with essential missing values
    query = """
    MERGE (p:Product {name: $product})
    SET p.sale_price = $sale_price,
        p.market_price = $market_price,
        p.type = $p_type,
        p.rating = $rating
    MERGE (b:Brand {name: $brand})
    MERGE (c:Category {name: $category})
    MERGE (sc:SubCategory {name: $sub_category})
    MERGE (p)-[:BELONGS_TO]->(b)
    MERGE (p)-[:IN_CATEGORY]->(c)
    MERGE (p)-[:IN_SUBCATEGORY]->(sc)
    """
    tx.run(query, product=product, sale_price=sale_price, market_price=market_price,
           p_type=p_type, rating=rating, brand=brand, category=category, sub_category=sub_category)

# Insert data row by row (one transaction per product)
with driver.session() as session:
    for _, row in df.iterrows():
        # execute_write is the neo4j 5.x name; older drivers call it write_transaction
        session.execute_write(insert_product, row["product"], row["sale_price"],
                              row["market_price"], row["type"], row["rating"],
                              row["brand"], row["category"], row["sub_category"])
print("Data inserted successfully!")

# Close the connection
driver.close()

Boom! 💥 Your product data is now in Neo4j.
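
Row-by-row inserts open one transaction per product, which can get slow as the CSV grows. A common alternative is to send rows in batches and let Cypher's `UNWIND` expand each batch. What follows is a minimal sketch, not the approach used above. The helper names (`clean_rows`, `chunked`, `insert_batches`) and the batch size of 500 are assumptions, and `records` is assumed to come from something like `df.to_dict("records")`.

```python
import math
from typing import Any, Dict, Iterator, List

# Columns that must be present for a row to be inserted (same rule as insert_product)
REQUIRED = ("product", "brand", "category", "sub_category")

# Same graph shape as the single-row query, applied to a whole batch via UNWIND
BATCH_QUERY = """
UNWIND $rows AS row
MERGE (p:Product {name: row.product})
SET p.sale_price = row.sale_price,
    p.market_price = row.market_price,
    p.type = row.type,
    p.rating = row.rating
MERGE (b:Brand {name: row.brand})
MERGE (c:Category {name: row.category})
MERGE (sc:SubCategory {name: row.sub_category})
MERGE (p)-[:BELONGS_TO]->(b)
MERGE (p)-[:IN_CATEGORY]->(c)
MERGE (p)-[:IN_SUBCATEGORY]->(sc)
"""


def _is_missing(value: Any) -> bool:
    """True for None and for float NaN (how pandas marks empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_rows(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop rows missing a required column; turn remaining NaN values into None."""
    cleaned = []
    for record in records:
        if any(_is_missing(record.get(key)) for key in REQUIRED):
            continue
        cleaned.append({k: (None if _is_missing(v) else v) for k, v in record.items()})
    return cleaned


def chunked(rows: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def _write_batch(tx, rows):
    # consume() makes sure the query has finished before the transaction commits
    tx.run(BATCH_QUERY, rows=rows).consume()


def insert_batches(driver, records: List[Dict[str, Any]], batch_size: int = 500) -> int:
    """Insert cleaned rows in batches; `driver` is a neo4j driver (5.x API assumed)."""
    rows = clean_rows(records)
    with driver.session() as session:
        for batch in chunked(rows, batch_size):
            session.execute_write(_write_batch, batch)
    return len(rows)
```

With the driver from the upload script, a call might look like `insert_batches(driver, df.to_dict("records"))`. Setting a property to `None` (null) in Cypher simply leaves it unset, so missing ratings do not end up stored as NaN.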

Neo4j dashboard

🤖 Step 2: Building a Streamlit App for AI-Powered Search

Now let’s make a cool Streamlit app that queries this Neo4j database using LangChain + LLMs (via Groq’s API).

1. Install Additional Dependencies

pip install streamlit langchain-groq langchain-experimental

2. Create the Streamlit App

import streamlit as st
import pandas as pd
# Community package paths; the older langchain.chains / langchain.graphs imports are deprecated
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_core.prompts import PromptTemplate
from langchain_groq import ChatGroq
from langchain_experimental.graph_transformers import LLMGraphTransformer

# Connect to Neo4j
graph = Neo4jGraph(url="neo4j+s://your-database-url", username="neo4j", password="your-password")

# Initialize LLM (Gemma 2 9B served via the Groq API)
llm = ChatGroq(groq_api_key="your-groq-api-key", model_name="Gemma2-9b-It")

# Graph transformer (created here but not used by the search flow below)
llm_transformer = LLMGraphTransformer(llm=llm)
# Define Custom Cypher Prompt
custom_cypher_prompt = PromptTemplate(
    input_variables=["query"],
    template="""
Given the following user query:
"{query}"

Generate a Cypher query that:
- Finds products linked to relevant categories.
- Uses `CONTAINS` for flexible matching.
- Ensures relationships are respected.
- Returns structured data with product name, sale price, and category.

Example:
User Query: "Show me Snacks products"
Cypher Query:
MATCH (p:Product)-[:IN_CATEGORY]->(c:Category)
WHERE toLower(c.name) CONTAINS "snacks"
RETURN p.name AS product_name, p.sale_price AS sale_price, c.name AS category
LIMIT 10
"""
)

# Create LangChain GraphCypherQAChain
chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
    cypher_prompt=custom_cypher_prompt,
    return_intermediate_steps=True
)
st.title("🔍 LLM-Powered Neo4j Graph Explorer")
st.write("Ask questions about your product database using natural language!")
query = st.text_input("Enter your query (e.g., 'Show me products with price > 100'):")

def display_results(results):
    if not results:
        st.warning("No products found!")
        return

    # Convert results into a Pandas DataFrame
    df = pd.DataFrame(results)

    # Rename columns for better readability
    df.columns = ["Product Name", "Sale Price", "Category"]

    # Display DataFrame in Streamlit
    st.dataframe(df.style.format({"Sale Price": "₹{:.2f}"}))  # Format price properly

import html

def display_product_cards(products):
    """Display products in a structured and attractive way using Streamlit."""
    for product in products:
        print(f"Product: {product}")  # Debug output in the terminal
        product_name = product.get("product_name", "Unknown Product")
        sale_price = product.get("sale_price", "N/A")  # Ensure we handle missing prices
        category = product.get("category", "Unknown Category")
        print(f"Product: {product_name}, Price: {sale_price}, Category: {category}")
        # Ensure sale_price is displayed correctly
        price_display = f"₹{sale_price}" if isinstance(sale_price, (int, float)) else sale_price

        # Escape values before embedding them in raw HTML (names may contain & or <)
        st.markdown(f"""
        <div style="border: 1px solid #ddd; border-radius: 10px; padding: 10px; margin: 10px 0; background-color: #f9f9f9;">
        <h4 style="color: #333;">{html.escape(str(product_name))}</h4>
        <p><strong>Category:</strong> {html.escape(str(category))}</p>
        <p><strong>Price:</strong> {html.escape(str(price_display))}</p>
        </div>
        """, unsafe_allow_html=True)

def parse_response(products):
    """Normalize the raw records returned by the chain into product dicts."""
    structured_data = []
    for product in products:
        print(f"Product: {product}")  # Debug output in the terminal
        # `or` also covers keys that are present but hold None
        product_name = product.get("product_name") or "Unknown Product"
        sale_price = product.get("sale_price", "N/A")
        category = product.get("category") or "Unknown Category"
        structured_data.append({
            "product_name": str(product_name).strip(),
            "sale_price": sale_price,
            "category": str(category).strip(),
        })
    return structured_data

if st.button("Search"):
    response = chain.invoke({"query": query})
    has_steps = isinstance(response, dict) and "intermediate_steps" in response
    if has_steps and "I don't know the answer" not in response["result"]:
        # The last intermediate step holds the raw rows returned by Neo4j
        context = response["intermediate_steps"][-1]["context"]
        results = parse_response(context)
        display_product_cards(results)
    else:
        st.warning("No products found!")
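
One thing to keep in mind: with `allow_dangerous_requests=True`, whatever Cypher the LLM generates runs with the permissions of the connected database user. Connecting with a read-only user is the sturdier safeguard. As an additional coarse check, the generated query could be scanned for write clauses. The sketch below is only an illustration: `is_read_only` and its keyword list are hypothetical, and a keyword scan is not a real Cypher parser. Also note that `chain.invoke` has already executed the query by the time `intermediate_steps` is available, so used this way the check flags and logs rather than prevents.

```python
import re

# Matches single- or double-quoted string literals so their contents are ignored
_STRING_LITERAL = re.compile(r"'(?:\\.|[^'\\])*'|\"(?:\\.|[^\"\\])*\"")

# Clauses that can modify the graph (CALL is included conservatively,
# since procedures may write); this list is an assumption, not exhaustive
_WRITE_CLAUSE = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|CALL|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def is_read_only(cypher: str) -> bool:
    """Return True if no write-style keyword appears outside string literals."""
    without_strings = _STRING_LITERAL.sub("''", cypher)
    return _WRITE_CLAUSE.search(without_strings) is None
```

In the app, the generated query could be read from the first intermediate step, for example `response["intermediate_steps"][0]["query"]`, and a warning shown when `is_read_only` returns `False`.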

🎬 Final Step: Running the App

Save the Streamlit code above as app.py, then fire up the terminal and run:

streamlit run app.py
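
Before sharing the app, it may be worth moving the hard-coded Neo4j URL, password, and Groq API key out of the source. One option is to read them from environment variables set in the shell that launches Streamlit. This is a small sketch under assumptions: the variable names and the `load_settings` helper are made up for illustration and are not something the code above requires.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names; pick whatever fits your deployment
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GROQ_API_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the required settings, failing early with a clear message."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The values could then replace the literals in app.py, for example `Neo4jGraph(url=settings["NEO4J_URI"], username=settings["NEO4J_USERNAME"], password=settings["NEO4J_PASSWORD"])` and `ChatGroq(groq_api_key=settings["GROQ_API_KEY"], model_name="Gemma2-9b-It")`.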

Boom! 🎉 Now you can ask things like:

  • “Show me all snacks under ₹100”
  • “Show dairy products”

And get smart, AI-powered results directly from your Neo4j graph!
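
For a question like “Show me all snacks under ₹100”, the LLM has to produce Cypher that filters on both category and price. What it actually generates depends on the model and the prompt. A hand-written equivalent is handy for sanity-checking the results. The sketch below assumes `sale_price` was stored as a number during the upload, and the function name `category_under_price` is hypothetical.

```python
from typing import Any, Dict, Tuple


def category_under_price(category: str, max_price: float, limit: int = 10) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized Cypher query for products in a category below a price."""
    cypher = (
        "MATCH (p:Product)-[:IN_CATEGORY]->(c:Category)\n"
        "WHERE toLower(c.name) CONTAINS toLower($category)\n"
        "  AND p.sale_price < $max_price\n"
        "RETURN p.name AS product_name, p.sale_price AS sale_price, c.name AS category\n"
        "ORDER BY p.sale_price\n"
        "LIMIT $limit"
    )
    # Values travel as parameters instead of being pasted into the query text
    params = {"category": category, "max_price": float(max_price), "limit": int(limit)}
    return cypher, params
```

With the `graph` object from the app, `graph.query(*category_under_price("snacks", 100))` should return rows with the same `product_name`, `sale_price`, and `category` keys that `display_product_cards` expects.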

With just a few lines of code, we’ve built a Neo4j-powered graph search engine that understands natural language queries and returns structured data. The power of LLMs + Graph Databases makes search smarter and more intuitive! 🚀

What’s next? Try integrating semantic search using embeddings or expanding the app to handle user behavior analytics. The possibilities are endless!

Let me know how it goes — happy coding! 😎

Written by Aditya Mangal

Tech enthusiast weaving stories of code and life. Writing about innovation, reflection, and the timeless dance between mind and heart.
