Comparing Two Images: From Classic Vision to Deep Learning
In the grand tapestry of computer vision, the task of comparing two images is a thread that appears in diverse applications — be it in facial recognition, forgery detection, duplicate image detection, or image retrieval systems. But like any good story, there’s more than one way to tell it.
In this post, we shall explore and compare three approaches:
- Feature-based methods using SIFT, SURF, and ORB
- Deep learning-based comparison
- Hash-based comparison using MD5
Let’s peel back the layers of each method, from the classical to the cutting-edge.
🌟 1. Classical Feature-Based Comparison
These methods are the Sherlock Holmes of image processing — painstakingly inspecting corners and edges to piece together similarities.
🔍 SIFT (Scale-Invariant Feature Transform)
- Inventor: David Lowe
- Strengths: Invariant to scale, rotation, and partially to affine transformations.
How it works:
- Detects keypoints in both images.
- Extracts descriptors from local image patches.
- Matches descriptors using distance metrics (like Euclidean).
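import cv2
# Load both images as grayscale (flag 0)
img1 = cv2.imread('image1.jpg', 0)
img2 = cv2.imread('image2.jpg', 0)
# Detect keypoints and compute their descriptors
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
# For each descriptor in image 1, find its two nearest neighbours in image 2
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)
# Apply Lowe's ratio test
good = []
for pair in matches:
    # knnMatch can return fewer than two neighbours when image 2 has very few descriptors
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print(f"Number of good matches: {len(good)}")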
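To see what the matcher and the ratio test are doing under the hood, here is a minimal pure-Python sketch. The names `euclidean` and `ratio_test_matches` and the toy 2-D descriptors are made up for illustration (real SIFT descriptors have 128 dimensions); this is not how OpenCV implements matching internally, only the same idea written out by hand.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(des1, des2, ratio=0.75):
    """Return (index_in_des1, index_in_des2) pairs that pass Lowe's ratio test."""
    good = []
    if len(des2) < 2:
        return good  # the test needs a second-nearest neighbour to compare against
    for i, d in enumerate(des1):
        # Rank every candidate in des2 by its distance to d; keep the two closest.
        ranked = sorted((euclidean(d, c), j) for j, c in enumerate(des2))
        (best_dist, best_j), (second_dist, _) = ranked[0], ranked[1]
        # Accept only when the best match is clearly better than the runner-up.
        if best_dist < ratio * second_dist:
            good.append((i, best_j))
    return good

# Toy descriptors, hypothetical values chosen for illustration only.
des_a = [[0.0, 0.0], [5.0, 5.0]]
des_b = [[0.1, 0.0], [5.0, 5.2], [10.0, 10.0]]
matches = ratio_test_matches(des_a, des_b)
print(matches)  # [(0, 0), (1, 1)]
```

The ratio test throws away ambiguous matches: if a descriptor sits equally close to two candidates, neither is trusted, which is why repetitive textures tend to yield few good matches.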
⚡ SURF (Speeded-Up Robust Features)
- Inventor: Bay, Tuytelaars, and Van Gool
- Pros: Faster than SIFT, robust to similar transformations
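- Cons: Patented, so it is only available in opencv-contrib builds compiled with the non-free modules enabled (OPENCV_ENABLE_NONFREE); standard pip wheels typically ship without it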
# 400 is the Hessian threshold: higher values keep fewer, stronger keypoints
surf = cv2.xfeatures2d.SURF_create(400)
kp1, des1 = surf.detectAndCompute(img1, None)
kp2, des2 = surf.detectAndCompute(img2, None)
🚀 ORB (Oriented FAST and Rotated BRIEF)
- Open-source, royalty-free
- Pros: Fast and efficient for real-time systems
- Cons: Less accurate than SIFT/SURF for complex scenes
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)
print(f"Number of matches: {len(matches)}")
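ORB descriptors are binary strings (32 bytes by default), which is why the matcher above uses `cv2.NORM_HAMMING` rather than Euclidean distance. As a rough sketch of what that metric computes, assuming descriptors stored as Python `bytes` (the helper name `hamming_distance` and the sample values are illustrative, not part of OpenCV):

```python
def hamming_distance(d1: bytes, d2: bytes) -> int:
    """Count the bits that differ between two equal-length binary descriptors."""
    if len(d1) != len(d2):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the two bytes disagree; count those bits.
    return sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))

# Two tiny 2-byte "descriptors" that differ in exactly two bit positions.
a = bytes([0b10110000, 0b00001111])
b = bytes([0b10010000, 0b00001011])
print(hamming_distance(a, b))  # 2
```

A smaller distance means more similar descriptors. Counting differing bits is cheap, which is a large part of why ORB suits real-time use.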
🤖 2. Deep Learning-Based Comparison
Ah, the modern sorcery. Here, we let neural networks do the thinking.
💡 Method: Embedding-based Comparison using Pretrained CNN
- Concept: Convert images to embeddings using models like ResNet or VGG. Then, compute similarity using cosine or Euclidean distance.
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
from scipy.spatial.distance import cosine
# Load pre-trained model (torchvision >= 0.13; older versions use pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model = torch.nn.Sequential(*(list(model.children())[:-1]))  # Remove classifier
model.eval()
# Preprocess the way the ImageNet-trained weights expect
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
def get_embedding(img_path):
    img = Image.open(img_path).convert('RGB')
    img_t = transform(img).unsqueeze(0)  # Add a batch dimension
    with torch.no_grad():
        emb = model(img_t)
    return emb.view(-1).numpy()  # Flatten (1, 512, 1, 1) to a 512-d vector
emb1 = get_embedding('image1.jpg')
emb2 = get_embedding('image2.jpg')
# scipy's cosine() returns a distance, so subtract it from 1 to get similarity
similarity = 1 - cosine(emb1, emb2)
print(f"Cosine Similarity: {similarity:.4f}")
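🧠 Higher similarity = more visually alike. Embeddings usually hold up well when the images have been slightly cropped, compressed, or resized, but the score that separates "similar" from "different" depends on the model and your data, so calibrate a threshold on your own images.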
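For readers curious about what the cosine score actually measures, here is a small sketch that computes it by hand on plain Python lists. The function name `cosine_similarity` and the sample vectors are illustrative assumptions; in practice you would pass in the 512-dimensional embeddings from the model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Same direction, different length: similarity is 1.
print(f"{cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]):.4f}")  # 1.0000
# Perpendicular vectors: similarity is 0.
print(f"{cosine_similarity([1.0, 0.0], [0.0, 1.0]):.4f}")  # 0.0000
```

Because the score depends only on direction, not magnitude, two embeddings that differ by a constant scale factor still count as identical.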
🔐 3. MD5 Hashing for Byte-Level Equality
Like comparing two poems letter by letter, MD5 tells us if two images are identical at the binary level.
- Use case: Detecting exact duplicates
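- Limitations: Any change to the file's bytes results in a totally different hash. That includes visible edits (like brightness) and invisible ones, such as re-saving the file or editing its metadata, even when the pixels look the same.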
import hashlib
def get_md5_hash(filename):
    # Hash the raw bytes of the file, not the decoded pixels
    with open(filename, 'rb') as f:
        data = f.read()
    return hashlib.md5(data).hexdigest()
hash1 = get_md5_hash('image1.jpg')
hash2 = get_md5_hash('image2.jpg')
print("Identical" if hash1 == hash2 else "Different")
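The same idea scales from two files to a whole folder. The sketch below reads each file in chunks so that large images do not have to fit in memory, then groups files that share a hash. The helper names `md5_of_file` and `find_exact_duplicates`, the chunk size, and the file names in the demo are all assumptions made for illustration.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file's bytes in 1 MiB chunks instead of loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_exact_duplicates(folder):
    """Return lists of file names in `folder` whose contents are byte-identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            groups[md5_of_file(path)].append(path.name)
    return [names for names in groups.values() if len(names) > 1]

# Demo with throwaway files standing in for images (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.jpg").write_bytes(b"same bytes")
    Path(tmp, "b.jpg").write_bytes(b"same bytes")
    Path(tmp, "c.jpg").write_bytes(b"same bytes!")
    duplicates = find_exact_duplicates(tmp)
print(duplicates)  # [['a.jpg', 'b.jpg']]
```

Note that `c.jpg` differs by a single byte and lands in its own group, which is exactly the all-or-nothing behaviour described above.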
🎯 Which One to Choose?
- 🧬 Need perceptual similarity (e.g., face comparison, style match)? Use deep learning.
- 🕵️ Need to match object parts despite rotation or scale? Choose SIFT or SURF.
- ⚡ Real-time, low-power app? Go for ORB.
- 🔐 Exact, byte-for-byte duplicate detection? MD5 will serve you well.
🎁 Final Thoughts
In a world awash with images, the art of comparison is both a science and a craft. Whether you favor the classical wisdom of SIFT or the neural magic of deep embeddings, the key is to choose based on your application’s soul.
May your pixels always align and your features never falter. 🌌