Sitemap

Turn Any Website into a Smart Chatbot with AI-Powered Content Parsing!

4 min readFeb 21, 2025
Press enter or click to view image in full size

Alright, class! Today, we are going to discuss something far more useful than just scrolling endlessly through the internet. It’s called Content Parsing and Q&A, and if you pay attention, you might just build something cooler than your last failed attempt at a to-do list app. So, sit straight, open your brains, and let’s get into it!

What Is This Magical Thing?

This project is a Flask application that allows users to scrape content from URLs, store it in a vector database (ChromaDB, if you must know), and interact with the content using a chatbox and also use Cross Encoder. Think of it as your personal AI librarian — except it doesn’t give you judgmental looks for asking weird questions.

It uses:

  • ChromaDB for storing and retrieving the scraped content
  • Bedrock API to generate answers based on the content
  • Flask for serving this masterpiece to the world

Features That Make This Project Cool

Scrapes content from one or more URLs (Yes, even that shady-looking news site you love)

Stores scraped content in ChromaDB (because your brain can’t remember everything)

It lets you query the stored content using a chatbox (finally, a chatbot that makes sense!)

Uses the Bedrock API to generate smart answers (unlike your roommate)

Requirements (Because Nothing Works by Magic)

To run this beast, you need:

Python 3.7+ (Don't even try Python 2.7, I will find you.)
Flask
requests
beautifulsoup4
chromadb
sentence-transformers
boto3

If you don’t have these installed, don’t complain when things don’t work!

Installation (Follow This or Suffer!)

  1. Clone the repository like a true hacker:
git clone https://github.com/adityamangal1998/UI-Content-Parsing.git cd UI-Content-Parsing
  1. Install the required dependencies:
pip install -r requirements.txt
  1. Configure your AWS credentials in config.py (or summon an AWS wizard to do it for you):
# AWS Configuration 
AWS_ACCESS_KEY_ID = 'your-aws-access-key-id'
AWS_SECRET_ACCESS_KEY = 'your-aws-secret-access-key'
AWS_REGION_NAME = 'your-aws-region'
# Bedrock Model Configuration
BEDROCK_MODEL_ID = 'your-bedrock-model-id'
# ChromaDB Configuration
CHROMA_DB_PATH = './chroma_db'
CHROMA_COLLECTION_NAME = 'web_scraped_data'
# Cross-Encoder Model
CROSS_ENCODER_MODEL = 'cross-encoder/ms-marco-MiniLM-L-6-v2'
# Embedding Model
EMBEDDING_MODEL = 'BAAI/bge-base-en'

If you mess this up, don’t come crying. Just read carefully!

Usage (A.K.A. How Not to Break It)

  1. Run the Flask application:
python app.py
  1. Open your browser and go to: 👉 http://localhost:5000
  2. Paste one or more URLs in the input box and click “Scrape Content”.
  3. After scraping, enter a question in the chatbox and hit “Get Answer”.
Press enter or click to view image in full size

Be amazed as your AI assistant gives you an answer (or be disappointed if you picked a bad website).

Project Structure (Because Organization Matters)

UI-Content-Parsing
├── app.py # Main Flask application
├── config.py # Secret keys and settings (guard it with your life)
├── model.py # Function to invoke Bedrock API
├── requirements.txt # Dependencies list (Do NOT ignore this!)
├── static/
│ ├── scripts.js # Frontend magic
│ └── styles.css # Make it pretty
├── templates/
│ └── index.html # Where the frontend action happens
├── utils.py # Scraping helper functions
└── vectordb.py # Handles ChromaDB storage and retrieval

If you randomly delete files, don’t blame me when things stop working.

Final Words

If you made it this far, congratulations! You now know how to scrape, store, and chat with web content. If you didn’t, well… go back and read again! This isn’t a bedtime story; it’s a real project that can make your life easier.

Now go forth, install it, break it (I mean, test it), and maybe even improve it! Just don’t ask it embarrassing questions — it remembers everything. 😏

Happy coding! 🚀

--

--

Aditya Mangal
Aditya Mangal

Written by Aditya Mangal

Tech enthusiast weaving stories of code and life. Writing about innovation, reflection, and the timeless dance between mind and heart.

No responses yet