Backend Architecture¶

System Architecture¶

The Certana backend follows a layered architecture pattern with clear separation of concerns:

┌─────────────────────────────────────────────┐
│         API Layer (FastAPI)                 │
│  - Request validation                       │
│  - OpenAPI documentation                    │
│  - Rate limiting & CORS                     │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│       Service Layer                         │
│  - Business logic                           │
│  - ML processing                            │
│  - Blockchain operations                    │
│  - Storage orchestration                    │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│       Data Layer (SQLAlchemy)               │
│  - ORM models                               │
│  - Transactions                             │
│  - Query optimization                       │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│      Database & External Services           │
│  - PostgreSQL (Primary)                     │
│  - Redis (Caching)                          │
│  - IPFS (Storage)                           │
│  - Solana RPC (Blockchain)                  │
└─────────────────────────────────────────────┘
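
As a rough illustration of the layering above, the sketch below wires an API-style handler, a service, and a repository together in plain Python. All names here (`AssetRecord`, `AssetRepository`, `AssetService`, `get_asset_handler`) are hypothetical; the in-memory dict stands in for PostgreSQL and the handler stands in for a FastAPI route.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssetRecord:
    asset_id: str
    content_hash: str


class AssetRepository:
    """Data layer stand-in: an in-memory dict instead of PostgreSQL."""

    def __init__(self) -> None:
        self._rows: dict[str, AssetRecord] = {}

    def save(self, record: AssetRecord) -> None:
        self._rows[record.asset_id] = record

    def get(self, asset_id: str) -> Optional[AssetRecord]:
        return self._rows.get(asset_id)


class AssetService:
    """Service layer: business rules only, no HTTP or SQL details."""

    def __init__(self, repo: AssetRepository) -> None:
        self._repo = repo

    def register(self, asset_id: str, content_hash: str) -> AssetRecord:
        if self._repo.get(asset_id) is not None:
            raise ValueError(f"asset {asset_id} already exists")
        record = AssetRecord(asset_id, content_hash)
        self._repo.save(record)
        return record

    def find(self, asset_id: str) -> Optional[AssetRecord]:
        return self._repo.get(asset_id)


def get_asset_handler(service: AssetService, asset_id: str) -> tuple[int, dict]:
    """API layer stand-in: translates service results into (status, body)."""
    record = service.find(asset_id)
    if record is None:
        return 404, {"detail": "Asset not found", "error_code": "ASSET_NOT_FOUND"}
    return 200, {"id": record.asset_id, "content_hash": record.content_hash}
```

Because each layer only talks to the one directly beneath it, the service can be tested with a fake repository and the handler with a fake service.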

Request Flow¶

Asset Upload Flow¶

1. Client uploads image
   │
   ├─► API validation (file type, size)
   │
   ├─► User/API key authentication
   │
   ├─► Quota check (usage tracking)
   │
   ├─► Store original file → IPFS
   │
   ├─► Extract metadata (EXIF, TIFF, C2PA)
   │
   ├─► Generate fingerprint (CLIP, DINO)
   │   └─► Store embedding in pgvector
   │
   ├─► Apply watermarking (Track A/B/C)
   │
   ├─► Store watermarked image → IPFS/S3
   │
   ├─► Create blockchain commitment → Solana
   │
   └─► Return AssetResponse with IDs
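
The first steps of the upload flow can be sketched as an ordered pipeline in which any step may abort the request. This is a minimal sketch, not the actual service code: `PipelineError`, the step functions, and the context keys (`mime_type`, `used`, `quota`, `data`) are all assumptions made for illustration, and the later steps (IPFS, watermarking, Solana) are omitted.

```python
import hashlib
from typing import Callable, Iterable


class PipelineError(Exception):
    """Raised by a step to abort the upload."""


def validate(ctx: dict) -> dict:
    # Hypothetical allow-list; the real service may accept more types.
    if ctx["mime_type"] not in {"image/png", "image/jpeg"}:
        raise PipelineError("unsupported file type")
    return ctx


def check_quota(ctx: dict) -> dict:
    if ctx["used"] >= ctx["quota"]:
        raise PipelineError("quota exceeded")
    return ctx


def add_content_hash(ctx: dict) -> dict:
    # SHA-256 of the original bytes, hex-encoded.
    return {**ctx, "content_hash": hashlib.sha256(ctx["data"]).hexdigest()}


def run_pipeline(ctx: dict, steps: Iterable[Callable[[dict], dict]]) -> dict:
    """Run steps in order; the first PipelineError stops everything after it."""
    completed = []
    for step in steps:
        ctx = step(ctx)
        completed.append(step.__name__)
    return {**ctx, "completed": completed}
```

Putting the cheap checks (validation, quota) first means a rejected request never reaches storage, ML inference, or the blockchain.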

Verification Flow¶

1. Client uploads image for verification
   │
   ├─► API validation
   │
   ├─► Check quota (if authenticated)
   │
   ├─► Extract watermark (all tracks)
   │
   ├─► Compare with original via fingerprinting
   │   ├─► Generate query embedding (CLIP)
   │   ├─► Search similar images in FAISS index
   │   └─► Compare with stored fingerprints
   │
   ├─► Verify blockchain commitment (if exists)
   │
   ├─► Check metadata integrity
   │
   ├─► Generate tamper map (if enabled)
   │
   ├─► Log verification result
   │
   └─► Return VerificationResult
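
The fingerprint comparison step amounts to a nearest-neighbour search over stored embeddings. The sketch below uses a brute-force cosine similarity scan in pure Python as a stand-in for the FAISS index; the function names and the 0.9 default threshold are assumptions, not values taken from the backend.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    query: Sequence[float],
    stored: dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed default; tune per embedding type
) -> Optional[tuple[str, float]]:
    """Return (asset_id, score) of the closest stored embedding, or None."""
    best_id, best_score = None, -1.0
    for asset_id, vector in stored.items():
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_id, best_score = asset_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None
```

A linear scan like this is only practical for small collections; an approximate index is what makes the same comparison scale.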

Data Models¶

Core Entity Relationships¶

User (1) ──────────► (many) Organization
   │                        │
   │                        ├─► (many) Asset
   │                        │              ├─► Watermark
   │                        │              ├─► Fingerprint
   │                        │              └─► Commitment
   │                        │
   │                        └─► (many) OrganizationMember
   │
   ├──────────► (many) ApiKey
   │
   └──────────► (many) VerificationLog

Key Models¶

User¶

User
├── id: UUID
├── email: str (unique)
├── full_name: str
├── password_hash: str
├── is_verified: bool
├── created_at: datetime
└── tier: str (free|pro|enterprise)

Asset¶

Asset
├── id: UUID
├── organization_id: UUID (FK)
├── content_hash: str (SHA-256)
├── original_filename: str
├── mime_type: str
├── file_size_bytes: int
├── width: int
├── height: int
├── ipfs_cid: str
├── watermarked_cid: str
├── processing_status: str
├── created_at: datetime
└── metadata: JSONB

Watermark¶

Watermark
├── id: UUID
├── asset_id: UUID (FK)
├── track_a_payload: bytes
├── track_b_payload: bytes
├── track_c_payload: bytes
├── embedding_strength: float
├── robustness_score: float
└── created_at: datetime

Fingerprint¶

Fingerprint
├── id: UUID
├── asset_id: UUID (FK)
├── embedding_type: str (clip|dino|pdq)
├── embedding_vector: Vector (pgvector)
├── hash_value: str
├── similarity_threshold: float
└── created_at: datetime
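
To make the field listings above concrete, here is a minimal sketch of two of the models as plain dataclasses. It is not the SQLAlchemy mapping itself: the defaults (`Tier.FREE`, `is_verified=False`, a 0.9 `similarity_threshold`) and the validation in `__post_init__` are assumptions added for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Tier(str, Enum):
    FREE = "free"
    PRO = "pro"
    ENTERPRISE = "enterprise"


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class User:
    email: str
    full_name: str
    password_hash: str
    tier: Tier = Tier.FREE          # assumed default
    is_verified: bool = False       # assumed default
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: datetime = field(default_factory=_utcnow)


@dataclass
class Fingerprint:
    asset_id: uuid.UUID
    embedding_type: str
    embedding_vector: list[float]
    hash_value: str = ""
    similarity_threshold: float = 0.9  # assumed default
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    created_at: datetime = field(default_factory=_utcnow)

    def __post_init__(self) -> None:
        # Mirrors the (clip|dino|pdq) constraint from the listing.
        if self.embedding_type not in {"clip", "dino", "pdq"}:
            raise ValueError(f"unknown embedding_type: {self.embedding_type}")
```

One detail worth keeping in mind when mapping `Asset` with SQLAlchemy's declarative API: the attribute name `metadata` is reserved there, so the `metadata` column would need to be exposed under a different Python attribute name.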

ML Pipeline¶

Watermarking (Track C - Neural Watermarking)¶

Input Image (RGB)
    │
    ├─► Preprocess (normalize, resize)
    │
    ├─► Generate watermark payload (128-bit commitment hash)
    │
    ├─► Encode with invisible-watermark library
    │   └─► Applies imperceptible perturbations
    │
    ├─► Optional: Apply compression (JPEG quality 85-95)
    │
    └─► Output: Watermarked Image

Key Parameters:

Strength: 0.4 (configurable)
Payload: Commitment hash (32-64 bytes)
Robustness: Survives JPEG compression, minor crops
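
One way to obtain a fixed-size bit payload from a commitment is to hash it and keep a prefix of the digest. The sketch below assumes SHA-256 and simple truncation to 128 bits; whether the backend derives its payload this way is not stated here, and `payload_from_commitment` is a hypothetical helper.

```python
import hashlib


def payload_from_commitment(commitment: bytes, bits: int = 128) -> list[int]:
    """Derive a fixed-length bit payload from a commitment (assumed scheme)."""
    if bits % 8 or not 0 < bits <= 256:
        raise ValueError("bits must be a multiple of 8 between 8 and 256")
    # Truncate the SHA-256 digest to the requested number of bytes.
    digest = hashlib.sha256(commitment).digest()[: bits // 8]
    # Expand each byte into its bits, most significant bit first.
    return [(byte >> shift) & 1 for byte in digest for shift in range(7, -1, -1)]
```

Shorter payloads are generally easier to embed robustly, at the cost of a higher collision probability than the full digest.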

Fingerprinting (Content-Based)¶

CLIP Fingerprinting¶

Image
  │
  ├─► Resize to 224x224
  │
  ├─► Normalize (CLIP preprocessing mean/std)
  │
  ├─► CLIP Vision Encoder (ViT-L/14)
  │   └─► Output: 768-dim embedding
  │
  └─► Store in FAISS index + pgvector

DINO Fingerprinting¶

Image
  │
  ├─► Resize to 518x518
  │
  ├─► DINOv2-Large encoder
  │   └─► Output: 1024-dim embedding
  │
  └─► Store for structural analysis

PDQ Hashing¶

Image
  │
  ├─► Apply PDQ algorithm (Facebook)
  │
  ├─► Generate 256-bit binary hash
  │
  └─► Use for fast similarity matching
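
Matching two PDQ hashes comes down to counting differing bits (Hamming distance). The sketch below assumes hashes are stored as 64-character hex strings; `hamming_distance`, `is_match`, and the default threshold of 31 bits are illustrative choices that should be tuned against real data.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two 256-bit hashes in hex form."""
    if len(hash_a) != 64 or len(hash_b) != 64:
        raise ValueError("expected 64 hex characters (256 bits) per hash")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_match(hash_a: str, hash_b: str, max_distance: int = 31) -> bool:
    # max_distance is an assumed default, not a value from this backend.
    return hamming_distance(hash_a, hash_b) <= max_distance
```

Because the comparison is a single XOR and bit count, it can serve as a cheap pre-filter before the more expensive embedding search.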

Authentication & Authorization¶

JWT Token Structure¶

{
  "sub": "user_id",
  "email": "user@example.com",
  "tier": "pro",
  "iat": 1698768000,
  "exp": 1698854400,
  "scopes": ["read", "write", "verify"]
}
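
For illustration, the sketch below signs and verifies a token carrying these claims using only the standard library. It assumes HS256 with a shared secret, which this document does not specify; a production service would normally rely on a maintained JWT library rather than hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_jwt(claims: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}  # algorithm is an assumption
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_jwt(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Verify signature and expiry; raises ValueError on any failure."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

In the example token, `exp` is exactly 86400 seconds after `iat`, so the token is valid for 24 hours.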

API Key Scopes¶

read - Read assets and verification history
write - Create/modify assets
verify - Verify images
admin - Organization admin operations
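
A scope check can be as simple as a subset test. In the sketch below, `has_scopes` is a hypothetical helper, and it deliberately does not assume that `admin` implies the other scopes, since that rule is not stated here.

```python
from typing import Iterable

VALID_SCOPES = frozenset({"read", "write", "verify", "admin"})


def has_scopes(granted: Iterable[str], required: Iterable[str]) -> bool:
    """True if every required scope was granted to the key."""
    granted_set = set(granted)
    unknown = granted_set - VALID_SCOPES
    if unknown:
        raise ValueError(f"unknown scopes: {sorted(unknown)}")
    return set(required) <= granted_set
```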

OAuth2 Providers¶

Google OAuth2
GitHub OAuth2
Custom SAML (enterprise)

Storage Architecture¶

Multi-Tier Storage Strategy¶

┌─ Hot Storage (S3 with CloudFront CDN)
│  └─ Recently accessed images
│     └─ TTL: 30 days
│
├─ Warm Storage (IPFS)
│  └─ All processed assets
│     └─ Content-addressed
│
└─ Cold Storage (Filecoin via Lighthouse)
   └─ Long-term archival
      └─ Verifiable storage proofs
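
One reading of this strategy is a read-through lookup: try the hot tier only while the asset is within its 30-day TTL, then fall back to warm and cold storage. The sketch below encodes that interpretation; `read_order` and the tier labels are hypothetical.

```python
from datetime import datetime, timedelta

HOT_TTL = timedelta(days=30)  # TTL taken from the hot-storage tier above


def read_order(last_accessed: datetime, now: datetime) -> list[str]:
    """Tiers to try, in order, when fetching an asset (assumed policy)."""
    tiers = ["warm", "cold"]
    if now - last_accessed <= HOT_TTL:
        tiers.insert(0, "hot")
    return tiers
```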

Content Addressing¶

Original Image
    │
    ├─► SHA-256 hash = content_hash
    │
    ├─► IPFS CID (v1) = ipfs_cid
    │   └─► Used for retrieval
    │
    └─► Filecoin CID (for backup)
        └─► Long-term storage proof
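
The `content_hash` step can be sketched with `hashlib`, reading the upload in chunks so that large files never have to be held in memory at once. The helper name and the 1 MiB chunk size are illustrative choices; computing the IPFS CID is left out because it depends on the node's chunking and codec settings.

```python
import hashlib
from typing import BinaryIO


def content_hash(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex-encoded SHA-256 of a binary stream, read chunk by chunk."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

Since the hash depends only on the bytes, uploading the same file twice yields the same `content_hash`, which makes it usable for duplicate detection.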

Blockchain Integration¶

Commitment Creation Flow¶

Asset Watermark Data
    │
    ├─► Hash commitment (SHA-256(watermark || metadata))
    │
    ├─► Sign with organization key
    │
    ├─► Create Solana transaction
    │   ├─► Program: SOLANA_PROGRAM_ID
    │   ├─► Instruction: CreateCommitment
    │   ├─► Data: commitment_hash
    │   └─► Signer: user keypair
    │
    ├─► Submit to Solana RPC
    │
    ├─► Wait for confirmation (finalized)
    │
    └─► Store tx_hash in database
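
The hashing step, SHA-256(watermark || metadata), needs a deterministic byte encoding of the metadata, otherwise the same asset could produce different commitments. The sketch below assumes canonical JSON (sorted keys, compact separators) for that encoding; the actual serialization used by the backend is not specified here.

```python
import hashlib
import json


def commitment_hash(watermark: bytes, metadata: dict) -> bytes:
    """SHA-256 over watermark bytes followed by canonical JSON metadata."""
    canonical = json.dumps(
        metadata, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hashlib.sha256(watermark + canonical).digest()
```

Whatever encoding is chosen, the verifier has to reproduce it byte for byte, so it is worth fixing in one shared helper.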

Verification Against Blockchain¶

Reported Asset
    │
    ├─► Fetch commitment from database
    │
    ├─► Look up transaction on Solana by stored tx_hash
    │
    ├─► Verify transaction:
    │   ├─► TX is finalized
    │   ├─► Data matches stored commitment
    │   └─► Signature is valid
    │
    └─► Return VerificationResult
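
The three transaction checks can be combined into a single result as sketched below. `OnChainRecord` is a hypothetical summary of what an RPC lookup would return; fetching the transaction and validating its signature are assumed to happen elsewhere.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnChainRecord:
    """Hypothetical summary of a transaction fetched from Solana RPC."""
    finalized: bool
    data: bytes
    signature_valid: bool


def verify_commitment(stored: bytes, record: Optional[OnChainRecord]) -> dict:
    """Evaluate every check and report them individually."""
    checks = {
        "found": record is not None,
        "finalized": bool(record and record.finalized),
        "data_matches": bool(record and hmac.compare_digest(record.data, stored)),
        "signature_valid": bool(record and record.signature_valid),
    }
    return {"verified": all(checks.values()), "checks": checks}
```

Reporting each check separately makes it possible to tell a missing transaction apart from one that exists but does not match.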

Performance Considerations¶

Database Optimization¶

Indexes: On frequently queried columns (user_id, asset_id, content_hash)
Partitioning: Assets by creation date for large tables
pgvector: HNSW index for embedding similarity search
Connection Pooling: Pool size 20, max overflow 10

ML Model Optimization¶

Model Caching: Loaded once at startup
Batch Processing: Process multiple images in batches
GPU Support: Optional CUDA/Metal for faster inference
Quantization: FP16 precision for memory efficiency

Caching Strategy¶

Redis: Cache API responses (5min TTL)
In-Memory: Model embeddings for repeated queries
HTTP: ETag headers for asset retrieval
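
The 5-minute response cache behaves like a key-value store with per-entry expiry. The sketch below is an in-process stand-in for Redis, written so that the clock can be injected for testing; `TTLCache` is a hypothetical class, not part of the backend.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Minimal expiring cache; an in-memory stand-in for Redis."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes, as in the Redis entry above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are removed lazily on read.
            del self._items[key]
            return None
        return value
```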

Security Measures¶

Encryption¶

Master Key: Stored in AWS KMS / Vault
Data Encryption: AES-256-GCM for sensitive files
Transport: TLS 1.3 for all communications
API Keys: Hashed with bcrypt (cost: 12)

Rate Limiting¶

Per-User: 60 req/min (adjustable by tier)
Per-IP: 100 req/min (public endpoints)
Sliding Window: Uses Redis for distributed limiting
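
A sliding-window limiter keeps the timestamps of recent requests per key and rejects a request once the window is full. The sketch below holds that state in process memory; in the distributed setup described above the same bookkeeping would live in Redis. The class name and the injectable clock are assumptions for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window_seconds`."""

    def __init__(
        self,
        limit: int = 60,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

Using the user ID as the key gives per-user limiting, and using the client address gives per-IP limiting, each with its own `limit`.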

Input Validation¶

Pydantic Schemas: Automatic validation
File Type Check: MIME type + magic bytes
Size Limits: 100MB max for uploads
Image Dimensions: 1px - 16384px
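
The file-level checks can be sketched without any image library by comparing the leading bytes against known signatures. The sketch covers only PNG and JPEG, interprets 100MB as 100 MiB, and returns a list of error strings; all three are assumptions, as is the `validate_upload` name.

```python
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # assumes "100MB" means 100 MiB
MAX_DIMENSION = 16384

# Leading "magic" bytes for the formats covered by this sketch.
MAGIC_BYTES = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}


def validate_upload(data: bytes, declared_mime: str, width: int, height: int) -> list[str]:
    """Return a list of validation errors; an empty list means the upload passes."""
    errors = []
    if len(data) > MAX_UPLOAD_BYTES:
        errors.append("file too large")
    magic = MAGIC_BYTES.get(declared_mime)
    if magic is None:
        errors.append("unsupported MIME type")
    elif not data.startswith(magic):
        errors.append("content does not match declared MIME type")
    if not (1 <= width <= MAX_DIMENSION and 1 <= height <= MAX_DIMENSION):
        errors.append("image dimensions out of range")
    return errors
```

Checking the magic bytes in addition to the declared MIME type catches files whose extension or header was simply renamed.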

Error Handling¶

Standardized Error Responses¶

{
  "detail": "String error message",
  "error_code": "ASSET_NOT_FOUND",
  "timestamp": "2024-01-15T10:30:00Z",
  "request_id": "req_xxx"
}
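
A small helper keeps every error body in this shape. The sketch below is illustrative: `error_response` and its parameters are hypothetical, and the `req_` prefix with a random hex suffix is just one way to fill `request_id`.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def error_response(
    detail: str,
    error_code: str,
    request_id: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build an error body with a UTC timestamp in ISO 8601 'Z' form."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "detail": detail,
        "error_code": error_code,
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request_id": request_id or f"req_{uuid.uuid4().hex}",
    }
```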

HTTP Status Codes¶

400 - Validation error
401 - Authentication required
403 - Forbidden (quota exceeded)
404 - Resource not found
429 - Rate limited
500 - Server error
503 - Service unavailable

Monitoring & Logging¶

Structured Logging¶

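import structlog  # assumption: keyword-style events imply structlog

logger = structlog.get_logger()
# Event name first, then structured fields as keyword arguments.
logger.info("asset_created",
    asset_id=asset.id,
    size_mb=asset.file_size_bytes / 1e6,
    processing_time_ms=elapsed)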

Metrics to Track¶

Request latency (p50, p95, p99)
Database query times
ML model inference times
Storage operation durations
Blockchain commitment creation time
Quota usage per tier
API error rates
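
Latency percentiles such as p50, p95, and p99 can be computed from raw samples with the nearest-rank method, sketched below. In production these figures would more likely come from a metrics system's histograms; `percentile` and `latency_summary` are hypothetical helpers.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> dict:
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

Tail percentiles are tracked alongside the median because a small share of slow requests, for example those waiting on blockchain confirmation, can dominate the user experience without moving the average much.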