Backend Architecture
System Architecture
The Certana backend follows a layered architecture pattern with clear separation of concerns:
┌──────────────────────────────────────────────┐
│ API Layer (FastAPI)                          │
│ - Request validation                         │
│ - OpenAPI documentation                      │
│ - Rate limiting & CORS                       │
└──────────────────────┬───────────────────────┘
                       │
┌──────────────────────▼───────────────────────┐
│ Service Layer                                │
│ - Business logic                             │
│ - ML processing                              │
│ - Blockchain operations                      │
│ - Storage orchestration                      │
└──────────────────────┬───────────────────────┘
                       │
┌──────────────────────▼───────────────────────┐
│ Data Layer (SQLAlchemy)                      │
│ - ORM models                                 │
│ - Transactions                               │
│ - Query optimization                         │
└──────────────────────┬───────────────────────┘
                       │
┌──────────────────────▼───────────────────────┐
│ Database & External Services                 │
│ - PostgreSQL (Primary)                       │
│ - Redis (Caching)                            │
│ - IPFS (Storage)                             │
│ - Solana RPC (Blockchain)                    │
└──────────────────────────────────────────────┘
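The separation between layers can be kept honest with constructor injection. In this minimal sketch the service depends only on a repository interface, and the API layer only translates outcomes into HTTP status codes.

All names here are hypothetical: `AssetRepository`, `InMemoryAssetRepository`, `AssetService`, and `upload_handler`. In the real stack the repository would wrap a SQLAlchemy session and the handler would be a FastAPI route.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Protocol, Tuple


class AssetRepository(Protocol):
    """Data-layer contract (hypothetical); a real one would wrap a SQLAlchemy session."""

    def get(self, asset_id: str) -> Optional[dict]: ...

    def add(self, asset: dict) -> None: ...


@dataclass
class InMemoryAssetRepository:
    """Stand-in data layer used here instead of PostgreSQL."""

    rows: Dict[str, dict] = field(default_factory=dict)

    def get(self, asset_id: str) -> Optional[dict]:
        return self.rows.get(asset_id)

    def add(self, asset: dict) -> None:
        self.rows[asset["id"]] = asset


class AssetService:
    """Service layer: business rules only, no HTTP or SQL details."""

    def __init__(self, repo: AssetRepository) -> None:
        self.repo = repo

    def register(self, asset_id: str, filename: str) -> dict:
        if self.repo.get(asset_id) is not None:
            raise ValueError(f"asset {asset_id} already exists")
        asset = {"id": asset_id, "original_filename": filename, "processing_status": "pending"}
        self.repo.add(asset)
        return asset


def upload_handler(service: AssetService, asset_id: str, filename: str) -> Tuple[int, dict]:
    """API-layer stand-in: map service outcomes to HTTP status codes."""
    try:
        return 201, service.register(asset_id, filename)
    except ValueError as exc:
        return 400, {"detail": str(exc)}
```

Because the service only sees the interface, tests can swap in the in-memory repository without a database.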
Request Flow
Asset Upload Flow
1. Client uploads image
│
├─► API validation (file type, size)
│
├─► User/API key authentication
│
├─► Quota check (usage tracking)
│
├─► Store original file → IPFS
│
├─► Extract metadata (EXIF, TIFF, C2PA)
│
├─► Generate fingerprint (CLIP, DINO)
│ └─► Store embedding in pgvector
│
├─► Apply watermarking (Track A/B/C)
│
├─► Store watermarked image → IPFS/S3
│
├─► Create blockchain commitment → Solana
│
└─► Return AssetResponse with IDs
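The upload flow above can be expressed as an ordered list of steps that share a context dict. Any exception aborts the upload before later side effects such as IPFS writes or Solana transactions happen.

This is a sketch under several assumptions:
- The step names, the quota fields, and the `placeholder` stubs standing in for IPFS, the ML models, and Solana are all hypothetical.
- Authentication and metadata extraction are omitted for brevity.

```python
import hashlib
from typing import Callable, List, Tuple

Step = Callable[[dict], None]


def run_pipeline(steps: List[Tuple[str, Step]], ctx: dict) -> dict:
    """Run steps in order; an exception stops the upload before any later side effect."""
    trace: list = []
    ctx["trace"] = trace
    for name, step in steps:
        step(ctx)
        trace.append(name)
    return ctx


def validate(ctx: dict) -> None:
    # 100 MB limit taken from the Input Validation section.
    if not ctx["data"] or len(ctx["data"]) > 100 * 1024 * 1024:
        raise ValueError("empty or oversized upload")


def check_quota(ctx: dict) -> None:
    if ctx["uploads_this_month"] >= ctx["monthly_quota"]:
        raise PermissionError("quota exceeded")


def hash_content(ctx: dict) -> None:
    ctx["content_hash"] = hashlib.sha256(ctx["data"]).hexdigest()


def placeholder(key: str) -> Step:
    """Hypothetical stand-in for an external call (IPFS, ML model, Solana) that yields an ID."""
    def step(ctx: dict) -> None:
        ctx[key] = f"{key}:{ctx['content_hash'][:8]}"
    return step


UPLOAD_STEPS: List[Tuple[str, Step]] = [
    ("validate", validate),
    ("check_quota", check_quota),
    ("hash_content", hash_content),
    ("store_original", placeholder("ipfs_cid")),
    ("fingerprint", placeholder("fingerprint_id")),
    ("watermark", placeholder("watermarked_cid")),
    ("commit", placeholder("tx_hash")),
]
```

Keeping the cheap checks (validation, quota) first means a rejected request never touches IPFS or the chain.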
Verification Flow
1. Client uploads image for verification
│
├─► API validation
│
├─► Check quota (if authenticated)
│
├─► Extract watermark (all tracks)
│
├─► Compare with original via fingerprinting
│ ├─► Generate query embedding (CLIP)
│ ├─► Search similar images in FAISS index
│ └─► Compare with stored fingerprints
│
├─► Verify blockchain commitment (if exists)
│
├─► Check metadata integrity
│
├─► Generate tamper map (if enabled)
│
├─► Log verification result
│
└─► Return VerificationResult
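Three steps in the verification flow are conditional: quota, blockchain, and tamper map. Planning the step list up front makes those branches explicit.

This is a sketch that assumes the request carries three boolean flags. The field names and step names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VerificationRequest:
    authenticated: bool       # caller presented a JWT or API key
    has_commitment: bool      # a stored blockchain commitment exists for the matched asset
    tamper_map_enabled: bool  # caller asked for a tamper map


def plan_verification(req: VerificationRequest) -> List[str]:
    """Ordered step names for one verification request, skipping the conditional ones."""
    steps = ["validate"]
    if req.authenticated:
        steps.append("check_quota")
    steps += ["extract_watermark", "compare_fingerprints"]
    if req.has_commitment:
        steps.append("verify_commitment")
    steps.append("check_metadata")
    if req.tamper_map_enabled:
        steps.append("tamper_map")
    steps.append("log_result")
    return steps
```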
Data Models
Core Entity Relationships
User (1) ──────────► (many) Organization
│                           │
│                           ├─► (many) Asset
│                           │     ├─► Watermark
│                           │     ├─► Fingerprint
│                           │     └─► Commitment
│                           │
│                           └─► (many) OrganizationMember
│
└──────────► (many) ApiKey
                    │
                    └──────────► (many) VerificationLog
Key Models
User
User
├── id: UUID
├── email: str (unique)
├── full_name: str
├── password_hash: str
├── is_verified: bool
├── created_at: datetime
└── tier: str (free|pro|enterprise)
Asset
Asset
├── id: UUID
├── organization_id: UUID (FK)
├── content_hash: str (SHA-256)
├── original_filename: str
├── mime_type: str
├── file_size_bytes: int
├── width: int
├── height: int
├── ipfs_cid: str
├── watermarked_cid: str
├── processing_status: str
├── created_at: datetime
└── metadata: JSONB
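The `content_hash` column is the hex SHA-256 of the original bytes, so identical uploads always produce the same value.

This plain-Python mirror of a subset of the Asset columns shows how a record might be built. It is not the real SQLAlchemy model, and `AssetRecord`, `new_asset`, and the `"pending"` status value are hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AssetRecord:
    """Subset of the Asset columns as a dataclass (illustrative, not the ORM model)."""

    organization_id: uuid.UUID
    content_hash: str
    original_filename: str
    mime_type: str
    file_size_bytes: int
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    processing_status: str = "pending"  # hypothetical initial status
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    metadata: dict = field(default_factory=dict)


def new_asset(org_id: uuid.UUID, filename: str, mime_type: str, data: bytes) -> AssetRecord:
    # Same bytes -> same content_hash, which also makes the column usable for deduplication.
    return AssetRecord(
        organization_id=org_id,
        content_hash=hashlib.sha256(data).hexdigest(),
        original_filename=filename,
        mime_type=mime_type,
        file_size_bytes=len(data),
    )
```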
Watermark
Watermark
├── id: UUID
├── asset_id: UUID (FK)
├── track_a_payload: bytes
├── track_b_payload: bytes
├── track_c_payload: bytes
├── embedding_strength: float
├── robustness_score: float
└── created_at: datetime
Fingerprint
Fingerprint
├── id: UUID
├── asset_id: UUID (FK)
├── embedding_type: str (clip|dino|pdq)
├── embedding_vector: Vector (pgvector)
├── hash_value: str
├── similarity_threshold: float
└── created_at: datetime
ML Pipeline
Watermarking (Track C - Neural Watermarking)
Input Image (RGB)
│
├─► Preprocess (normalize, resize)
│
├─► Generate watermark payload (128-bit commitment hash)
│
├─► Encode with invisible-watermark library
│ └─► Applies imperceptible perturbations
│
├─► Optional: Apply compression (JPEG quality 85-95)
│
└─► Output: Watermarked Image
Key Parameters:
- Strength: 0.4 (configurable)
- Payload: Commitment hash (32-64 bytes)
- Robustness: Survives JPEG compression and minor crops
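A watermark encoder generally consumes its payload as a fixed-length bit sequence. Robustness is then measured as the fraction of bits recovered after an attack such as JPEG re-encoding.

This sketch makes two assumptions:
- It uses the 128-bit payload from the diagram.
- It truncates a 32-byte SHA-256 commitment to its first 16 bytes.

It does not call the invisible-watermark library itself. `payload_bits` and `bit_accuracy` are hypothetical helpers.

```python
import hashlib
from typing import List

PAYLOAD_BITS = 128  # 128-bit payload as in the diagram; an assumption for this sketch


def payload_bits(commitment: bytes, n_bits: int = PAYLOAD_BITS) -> List[int]:
    """Truncate a commitment digest to n_bits and expand it MSB-first into 0/1 bits."""
    if n_bits % 8 or n_bits > len(commitment) * 8:
        raise ValueError("n_bits must be a multiple of 8 and fit in the commitment")
    bits: List[int] = []
    for byte in commitment[: n_bits // 8]:
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def bit_accuracy(expected: List[int], decoded: List[int]) -> float:
    """Fraction of payload bits recovered correctly after extraction."""
    if len(expected) != len(decoded):
        raise ValueError("length mismatch")
    return sum(e == d for e, d in zip(expected, decoded)) / len(expected)
```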
Fingerprinting (Content-Based)
CLIP Fingerprinting
Image
│
├─► Resize to 224x224
│
├─► Normalize (CLIP per-channel mean/std)
│
├─► CLIP Vision Encoder (ViT-L/14)
│ └─► Output: 768-dim embedding
│
└─► Store in FAISS index + pgvector
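If embeddings are L2-normalized before they are stored (an assumption; the pipeline does not say so), cosine similarity reduces to a plain dot product. That is what a flat inner-product FAISS index computes.

This pure-Python sketch of exact search uses a 2-D toy index in place of 768-dim CLIP vectors. `l2_normalize` and `top_k` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple


def l2_normalize(vec: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("zero vector cannot be normalized")
    return [x / norm for x in vec]


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Exact inner-product search over unit vectors; scores equal cosine similarity."""
    q = l2_normalize(query)
    scores = [(asset_id, sum(a * b for a, b in zip(q, vec))) for asset_id, vec in index.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]
```

Approximate indexes (FAISS IVF/HNSW, pgvector HNSW) return roughly the same ranking far faster on large collections.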
DINO Fingerprinting
Image
│
├─► Resize to 518x518
│
├─► DINOv2-Large encoder
│ └─► Output: 1024-dim embedding
│
└─► Store for structural analysis
PDQ Hashing
Image
│
├─► Apply PDQ algorithm (Facebook)
│
├─► Generate 256-bit binary hash
│
└─► Use for fast similarity matching
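PDQ matching is a Hamming-distance comparison between two 256-bit hashes, which is why it is fast. This sketch assumes hashes are stored as 64-character hex strings.

The default `max_distance` of 31 is an assumption to be tuned on labelled near-duplicate pairs. Both helper names are hypothetical.

```python
def pdq_hamming(hash_a: str, hash_b: str) -> int:
    """Hamming distance between two 256-bit PDQ hashes given as 64-char hex strings."""
    if len(hash_a) != 64 or len(hash_b) != 64:
        raise ValueError("PDQ hashes are 256 bits (64 hex characters)")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def pdq_match(hash_a: str, hash_b: str, max_distance: int = 31) -> bool:
    # max_distance is an assumed threshold, not a value from this document.
    return pdq_hamming(hash_a, hash_b) <= max_distance
```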
Authentication & Authorization
JWT Token Structure
{
"sub": "user_id",
"email": "user@example.com",
"tier": "pro",
"iat": 1698768000,
"exp": 1698854400,
"scopes": ["read", "write", "verify"]
}
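The signing algorithm is not stated. This stdlib-only sketch assumes HS256 (HMAC-SHA256, per RFC 7515/7519) to show how the signature and the `exp` claim above are checked.

A production service would normally use a maintained library such as PyJWT rather than this hand-rolled code. `encode_jwt` and `decode_jwt` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_jwt(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def decode_jwt(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Verify the signature (constant-time) and expiry, then return the claims."""
    header_b64, claims_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

Pinning the expected `alg` rather than trusting the header avoids algorithm-confusion attacks.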
API Key Scopes
- read: Read assets and verification history
- write: Create/modify assets
- verify: Verify images
- admin: Organization admin operations
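Scope enforcement amounts to a subset check between the scopes an endpoint requires and the scopes a token or API key carries. In this sketch the endpoint-to-scope table is hypothetical.

Each scope is treated independently (whether `admin` implies the others is not stated), and routes without a declared scope are denied by default.

```python
from typing import Dict, List, Set, Tuple

VALID_SCOPES = {"read", "write", "verify", "admin"}

# Hypothetical route table; the real paths may differ.
REQUIRED_SCOPES: Dict[Tuple[str, str], Set[str]] = {
    ("GET", "/assets"): {"read"},
    ("POST", "/assets"): {"write"},
    ("POST", "/verify"): {"verify"},
    ("DELETE", "/organizations/members"): {"admin"},
}


def authorize(granted: List[str], method: str, path: str) -> bool:
    unknown = set(granted) - VALID_SCOPES
    if unknown:
        raise ValueError(f"unknown scopes: {sorted(unknown)}")
    required = REQUIRED_SCOPES.get((method, path))
    if required is None:
        return False  # deny by default for routes without a declared scope
    return required <= set(granted)
```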
OAuth2 & SSO Providers
- Google OAuth2
- GitHub OAuth2
- Custom SAML (enterprise)
Storage Architecture
Multi-Tier Storage Strategy
┌─ Hot Storage (S3 with CloudFront CDN)
│   ├─ Recently accessed images
│   └─ TTL: 30 days
│
├─ Warm Storage (IPFS)
│   ├─ All processed assets
│   └─ Content-addressed
│
└─ Cold Storage (Filecoin via Lighthouse)
    ├─ Long-term archival
    └─ Verifiable storage proofs
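One way to read the tiers is as a routing decision for each read:
- Assets touched within the 30-day TTL are served from hot storage.
- Older assets are served from IPFS.
- Assets that survive only as archives come from Filecoin.

This policy is a hypothetical sketch, including the `archived_only` flag and the `Tier` values.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

HOT_TTL = timedelta(days=30)  # TTL from the hot tier above


class Tier(str, Enum):
    HOT = "s3+cloudfront"
    WARM = "ipfs"
    COLD = "filecoin"


def serving_tier(last_accessed: datetime, archived_only: bool, now: datetime) -> Tier:
    """Pick where a read is served from (hypothetical policy)."""
    if archived_only:
        return Tier.COLD
    if now - last_accessed <= HOT_TTL:
        return Tier.HOT
    return Tier.WARM
```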
Content Addressing
Original Image
│
├─► SHA-256 hash = content_hash
│
├─► IPFS CID (v1) = ipfs_cid
│ └─► Used for retrieval
│
└─► Filecoin CID (for backup)
└─► Long-term storage proof
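Both identifiers derive from the file bytes. This sketch computes the SHA-256 `content_hash` and the CIDv1 of the bytes as a single raw block: version 0x01, raw codec 0x55, sha2-256 multihash 0x12 with length 0x20, then the digest, encoded as base32 with the `b` multibase prefix.

Caveat: this should match what `ipfs add --cid-version=1` reports only for files small enough to fit in one block. Larger files are chunked into a DAG whose root CID differs and cannot be computed this way. The helper names are hypothetical.

```python
import base64
import hashlib


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def cid_v1_raw(data: bytes) -> str:
    """CIDv1 of a single raw block: <0x01 version><0x55 raw><0x12 sha2-256><0x20 len><digest>."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest  # all varints fit in one byte here
    # Multibase 'b' = lowercase RFC 4648 base32 without padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").rstrip("=").lower()
```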
Blockchain Integration
Commitment Creation Flow
Asset Watermark Data
│
├─► Hash commitment (SHA-256(watermark || metadata))
│
├─► Sign with organization key
│
├─► Create Solana transaction
│ ├─► Program: SOLANA_PROGRAM_ID
│ ├─► Instruction: CreateCommitment
│ ├─► Data: commitment_hash
│ └─► Signer: user keypair
│
├─► Submit to Solana RPC
│
├─► Wait for confirmation (finalized)
│
└─► Store tx_hash in database
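The commitment step is `SHA-256(watermark || metadata)`. For the hash to be reproducible at verification time, the metadata has to be serialized the same way every time.

This sketch assumes canonical JSON (sorted keys, no whitespace) for the metadata. `commitment_hash` is a hypothetical name, and signing is out of scope here.

```python
import hashlib
import json


def commitment_hash(watermark_payload: bytes, metadata: dict) -> bytes:
    """SHA-256(watermark || metadata), metadata as canonical JSON; returns 32 bytes."""
    # Sorted keys and fixed separators keep the digest stable across dict orderings.
    meta_bytes = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(watermark_payload + meta_bytes).digest()
```

Plain concatenation is ambiguous at the boundary, because different splits of the same bytes hash identically. Prefixing the watermark with its length would remove that ambiguity if it matters for the threat model.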
Verification Against Blockchain
Reported Asset
│
├─► Fetch commitment from database
│
├─► Fetch the transaction from Solana by tx_hash
│
├─► Verify transaction:
│ ├─► TX is finalized
│ ├─► Data matches stored commitment
│ └─► Signature is valid
│
└─► Return VerificationResult
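The transaction lookup can use Solana's `getTransaction` JSON-RPC method at `finalized` commitment. It returns a null `result` when the signature is unknown at that commitment level; a finalized transaction's signatures were already checked by the cluster.

In this sketch the RPC transport is left out, the helper names are hypothetical, and `onchain_data` stands for the commitment bytes decoded from the program's instruction (that layout is program-specific and not shown).

```python
import hmac


def get_transaction_request(tx_signature: str, request_id: int = 1) -> dict:
    """JSON-RPC body for getTransaction at finalized commitment."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "getTransaction",
        "params": [
            tx_signature,
            {"commitment": "finalized", "encoding": "json", "maxSupportedTransactionVersion": 0},
        ],
    }


def is_finalized(rpc_response: dict) -> bool:
    # result is null when not found at this commitment; meta.err is null on success.
    result = rpc_response.get("result")
    return result is not None and (result.get("meta") or {}).get("err") is None


def commitment_matches(onchain_data: bytes, stored_commitment_hex: str) -> bool:
    """Constant-time comparison of the on-chain commitment with the database copy."""
    return hmac.compare_digest(onchain_data, bytes.fromhex(stored_commitment_hex))
```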
Performance Considerations
Database Optimization
- Indexes: On frequently queried columns (user_id, asset_id, content_hash)
- Partitioning: Assets by creation date for large tables
- pgvector: HNSW index for embedding similarity search
- Connection Pooling: pool size 20, max overflow 10 (up to 30 connections per process)
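Reading the pool figures as SQLAlchemy's `pool_size=20` and `max_overflow=10` (an interpretation), each worker process can hold up to 30 connections. That total has to fit under PostgreSQL's `max_connections`.

This arithmetic sketch also includes pgvector HNSW DDL. The table and index names are hypothetical; `hnsw` and `vector_cosine_ops` are pgvector syntax.

```python
POOL_SIZE = 20     # persistent connections per process (SQLAlchemy pool_size)
MAX_OVERFLOW = 10  # extra connections under burst load (SQLAlchemy max_overflow)

# Hypothetical table/index names for the Fingerprint model.
HNSW_DDL = (
    "CREATE INDEX IF NOT EXISTS fingerprints_embedding_hnsw "
    "ON fingerprints USING hnsw (embedding_vector vector_cosine_ops);"
)


def peak_connections(processes: int, pool_size: int = POOL_SIZE, max_overflow: int = MAX_OVERFLOW) -> int:
    """Worst-case PostgreSQL connections when every process saturates its pool."""
    return processes * (pool_size + max_overflow)


def fits(processes: int, max_connections: int, reserved: int = 3) -> bool:
    # PostgreSQL keeps superuser_reserved_connections (default 3) out of the general pool.
    return peak_connections(processes) <= max_connections - reserved
```

With PostgreSQL's default `max_connections` of 100, four workers (120 connections) would already overflow. That is a common reason to put PgBouncer in front of the database.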
ML Model Optimization
- Model Caching: Loaded once at startup
- Batch Processing: Process multiple images in batches
- GPU Support: Optional CUDA/Metal for faster inference
- Reduced Precision: FP16 inference for memory efficiency
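Model caching and batching map onto two small stdlib patterns:
- A memoized loader, so weights are read once per process.
- A chunking generator, so inference runs on batches instead of single images.

The loader body here is a placeholder, and `load_model`, `LOAD_CALLS`, and `batched` are hypothetical names. Python 3.12 also ships `itertools.batched`.

```python
from functools import lru_cache
from itertools import islice
from typing import Iterable, Iterator, List

LOAD_CALLS: List[str] = []  # records real loads so the caching is observable


@lru_cache(maxsize=None)
def load_model(name: str) -> dict:
    """Load a model once per process; repeat calls return the cached object."""
    LOAD_CALLS.append(name)
    # Placeholder for real weight loading (e.g. CLIP ViT-L/14 or DINOv2-Large in FP16).
    return {"name": name}


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of up to `size` items for batched inference."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```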
Caching Strategy
- Redis: Cache API responses (5-minute TTL)
- In-Memory: Embeddings of recently queried images, to skip repeated inference
- HTTP: ETag headers for asset retrieval
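An asset's bytes never change for a given `content_hash`, so that hash is a natural strong ETag. Deriving ETags from it is an assumption, not something stated here.

This sketch handles `If-None-Match` using the weak comparison that RFC 9110 prescribes for that header, so `W/` prefixes are ignored. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def etag_for(content_hash: str) -> str:
    """Strong ETag from the asset's SHA-256 content hash."""
    return f'"{content_hash}"'


def conditional_get(content_hash: str, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers): 304 when the client's cached copy is still current."""
    etag = etag_for(content_hash)
    headers = {"ETag": etag}
    if if_none_match:
        for tag in if_none_match.split(","):
            tag = tag.strip()
            if tag.startswith("W/"):
                tag = tag[2:]  # weak comparison: ignore the W/ prefix
            if tag in (etag, "*"):
                return 304, headers
    return 200, headers
```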
Security Measures
Encryption
- Master Key: Stored in AWS KMS / Vault
- Data Encryption: AES-256-GCM for sensitive files
- Transport: TLS 1.3 for all communications
- API Keys: Hashed with bcrypt (cost: 12)
Rate Limiting
- Per-User: 60 req/min (adjustable by tier)
- Per-IP: 100 req/min (public endpoints)
- Sliding Window: Request timestamps are kept in Redis, so limits hold across all API instances
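A sliding-window log keeps one timestamp per request and counts only those inside the last window. A Redis version commonly does the same with a sorted set per key: trim old scores, count, then add.

This in-process sketch uses an injectable clock so it can be tested. `SlidingWindowLimiter` is a hypothetical name, and the Redis wiring is not shown.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window-log limiter; an in-process stand-in for the Redis-backed version."""

    def __init__(self, limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        q = self.hits[key]
        while q and q[0] <= now - self.window_s:
            q.popleft()  # drop requests that have left the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

Keys such as `user:<id>` and `ip:<addr>` let the same class enforce both the per-user and per-IP limits.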
Input Validation
- Pydantic Schemas: Automatic validation
- File Type Check: MIME type + magic bytes
- Size Limits: 100MB max for uploads
- Image Dimensions: 1 to 16384 px per side
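The MIME-plus-magic-bytes check compares the declared type with the file's signature. For PNG, the dimensions can be read straight from the IHDR chunk without decoding the image.

This sketch makes several assumptions:
- Accepted formats are limited to PNG, JPEG, and WebP.
- Dimensions are checked only for PNG.
- "100MB" is read as 100 MiB.

The helper names are hypothetical.

```python
import struct
from typing import Optional, Tuple

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # "100MB" read as MiB; an assumption
MAX_SIDE = 16384


def sniff_image_type(data: bytes) -> Optional[str]:
    """Identify the format from magic bytes instead of trusting the declared MIME type."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Width and height from IHDR, which must directly follow the 8-byte signature."""
    if len(data) < 24 or data[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    return struct.unpack(">II", data[16:24])


def validate_upload(data: bytes, declared_mime: str) -> None:
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds size limit")
    actual = sniff_image_type(data)
    if actual is None or actual != declared_mime:
        raise ValueError(f"declared {declared_mime!r} but content looks like {actual!r}")
    if actual == "image/png":
        width, height = png_dimensions(data)
        if not (1 <= width <= MAX_SIDE and 1 <= height <= MAX_SIDE):
            raise ValueError("image dimensions out of range")
```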
Error Handling
Standardized Error Responses
{
"detail": "String error message",
"error_code": "ASSET_NOT_FOUND",
"timestamp": "2024-01-15T10:30:00Z",
"request_id": "req_xxx"
}
HTTP Status Codes
- 400: Validation error
- 401: Authentication required
- 403: Forbidden (quota exceeded)
- 404: Resource not found
- 429: Rate limited
- 500: Server error
- 503: Service unavailable
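The standardized body and the status table can come from one exception type, so every error carries both a stable `error_code` and the right HTTP status.

In this sketch only `ASSET_NOT_FOUND` comes from the example above. The other error codes, `ApiError`, and `error_body` are hypothetical. In FastAPI the body would typically be returned from a handler registered with `app.exception_handler`.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

STATUS_FOR_CODE = {
    "VALIDATION_ERROR": 400,
    "AUTHENTICATION_REQUIRED": 401,
    "QUOTA_EXCEEDED": 403,
    "ASSET_NOT_FOUND": 404,
    "RATE_LIMITED": 429,
    "INTERNAL_ERROR": 500,
    "SERVICE_UNAVAILABLE": 503,
}


class ApiError(Exception):
    def __init__(self, error_code: str, detail: str) -> None:
        super().__init__(detail)
        self.error_code = error_code
        self.detail = detail
        self.status_code = STATUS_FOR_CODE.get(error_code, 500)


def error_body(exc: ApiError, request_id: Optional[str] = None,
               now: Optional[datetime] = None) -> dict:
    """Build the standardized error response shown above."""
    now = now or datetime.now(timezone.utc)
    return {
        "detail": exc.detail,
        "error_code": exc.error_code,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request_id": request_id or f"req_{uuid.uuid4().hex[:12]}",
    }
```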
Monitoring & Logging
Structured Logging
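# Assumes structlog, whose loggers accept an event name plus keyword fields.
import structlog
logger = structlog.get_logger()
# `asset` is the stored Asset row; `elapsed` is the processing time in ms.
logger.info("asset_created",
            asset_id=asset.id,
            size_mb=asset.file_size_bytes / 1e6,
            processing_time_ms=elapsed)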
Metrics to Track
- Request latency (p50, p95, p99)
- Database query times
- ML model inference times
- Storage operation durations
- Blockchain commitment creation time
- Quota usage per tier
- API error rates
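The latency percentiles can be computed with the nearest-rank method: the p-th percentile is the smallest sample with at least p% of samples at or below it.

This sketch works over raw samples. In practice a metrics backend such as Prometheus usually estimates percentiles from histograms instead. The function names are hypothetical.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms: List[float]) -> Dict[str, float]:
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```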