Backend Fingerprinting

Content-based fingerprinting using machine learning models.

Overview

Three complementary fingerprinting approaches:

  • CLIP: Semantic content fingerprinting (768-dim)
  • DINO: Structural fingerprinting (1024-dim)
  • PDQ: Perceptual hash (256-bit)

CLIP Fingerprinting

OpenAI CLIP model for semantic content embedding.

Characteristics:

  • 768-dimensional embeddings
  • Semantic understanding of content
  • Good for content-based matching
  • Robust to minor edits
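Matching in CLIP's embedding space is typically done with cosine similarity over L2-normalized vectors. A minimal sketch using synthetic 768-dim vectors as stand-ins for real CLIP embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Synthetic stand-ins for CLIP embeddings (real ones come from the model)
rng = np.random.default_rng(0)
emb_a = rng.standard_normal(768)
emb_b = emb_a + 0.01 * rng.standard_normal(768)  # lightly perturbed copy

print(cosine_similarity(emb_a, emb_a))  # ~1.0 (identical content)
print(cosine_similarity(emb_a, emb_b))  # close to 1.0 (minor edit)
```

The small perturbation models a minor edit: the similarity stays near 1.0, which is the robustness property listed above.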

DINO Fingerprinting

Meta's DINOv2 model for structural analysis.

Characteristics:

  • 1024-dimensional embeddings
  • Captures structural/layout information
  • Robust to significant transformations
  • Foundation model approach
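Because CLIP and DINO embeddings live in different spaces (768 vs. 1024 dimensions), each backend needs its own index; their per-backend similarity scores can then be combined. A hypothetical weighted fusion (the function and weights are illustrative, not from this codebase):

```python
def fuse_scores(clip_sim: float, dino_sim: float,
                w_clip: float = 0.5, w_dino: float = 0.5) -> float:
    """Weighted average of per-backend similarity scores (illustrative)."""
    return w_clip * clip_sim + w_dino * dino_sim

# A semantic match that DINO also confirms structurally scores higher
print(fuse_scores(0.95, 0.90))  # 0.925
print(fuse_scores(0.95, 0.40))  # 0.675
```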

PDQ Hash

Facebook's perceptual hash algorithm.

Characteristics:

  • 256-bit binary hash
  • Fast approximate matching
  • Robust to compression
  • Used for rapid filtering
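PDQ hashes are compared by Hamming distance; the reference material suggests a match threshold of roughly 31 differing bits out of 256, though the exact cutoff is deployment-specific. A sketch over synthetic 256-bit hashes (real ones would come from a PDQ implementation such as the pdqhash library):

```python
import numpy as np

def hamming_distance(h1: np.ndarray, h2: np.ndarray) -> int:
    """Hamming distance between two 256-bit hashes stored as bit arrays."""
    return int(np.count_nonzero(h1 != h2))

rng = np.random.default_rng(0)
hash_a = rng.integers(0, 2, size=256, dtype=np.uint8)
hash_b = hash_a.copy()
hash_b[:10] ^= 1  # flip 10 bits, e.g. mild recompression

print(hamming_distance(hash_a, hash_b))  # 10 -> within a ~31-bit threshold
print(hamming_distance(hash_a, hash_a))  # 0  -> identical
```

This cheap comparison is what makes PDQ suitable for the rapid filtering step before the more expensive embedding search.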

Similarity Matching

FAISS index for efficient similarity search:

import numpy as np

# Query embedding (768-dim CLIP vector)
query = extract_clip_fingerprint(image)

# FAISS expects a 2-D float32 array of shape (n_queries, dim)
query = np.asarray(query, dtype="float32").reshape(1, -1)

# Search for the 10 nearest stored fingerprints
distances, indices = faiss_index.search(query, 10)

# For L2-normalized embeddings in an IndexFlatL2 index, the squared
# distance is 2 - 2*cos, so cosine similarity = 1 - distance / 2
similarities = 1 - (distances / 2)
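The `1 - distances / 2` conversion assumes the embeddings are L2-normalized and the index returns squared L2 distances (as FAISS's `IndexFlatL2` does): for unit vectors, d^2 = 2 - 2*cos, so cos = 1 - d^2 / 2. A numpy check of that identity:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(768)
b = rng.standard_normal(768)
a /= np.linalg.norm(a)  # unit vectors, as the conversion requires
b /= np.linalg.norm(b)

cos_sim = float(np.dot(a, b))
sq_l2 = float(np.sum((a - b) ** 2))  # what IndexFlatL2 would return

# cos = 1 - d^2 / 2 holds for unit vectors
print(abs((1 - sq_l2 / 2) - cos_sim) < 1e-9)  # True
```

If the embeddings were not normalized before indexing, this conversion would not yield a cosine similarity.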

Implementation

See Services Documentation for detailed code.