
Backend Architecture

System Architecture

The Certana backend follows a layered architecture pattern with clear separation of concerns:

┌─────────────────────────────────────────────┐
│         API Layer (FastAPI)                 │
│  - Request validation                       │
│  - OpenAPI documentation                    │
│  - Rate limiting & CORS                     │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│       Service Layer                         │
│  - Business logic                           │
│  - ML processing                            │
│  - Blockchain operations                    │
│  - Storage orchestration                    │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│       Data Layer (SQLAlchemy)               │
│  - ORM models                               │
│  - Transactions                             │
│  - Query optimization                       │
└────────────────┬────────────────────────────┘
                 │
┌────────────────▼────────────────────────────┐
│      Database & External Services           │
│  - PostgreSQL (Primary)                     │
│  - Redis (Caching)                          │
│  - IPFS (Storage)                           │
│  - Solana RPC (Blockchain)                  │
└─────────────────────────────────────────────┘

Request Flow

Asset Upload Flow

1. Client uploads image
   │
   ├─► API validation (file type, size)
   │
   ├─► User/API key authentication
   │
   ├─► Quota check (usage tracking)
   │
   ├─► Store original file → IPFS
   │
   ├─► Extract metadata (EXIF, TIFF, C2PA)
   │
   ├─► Generate fingerprint (CLIP, DINO)
   │   └─► Store embedding in pgvector
   │
   ├─► Apply watermarking (Track A/B/C)
   │
   ├─► Store watermarked image → IPFS/S3
   │
   ├─► Create blockchain commitment → Solana
   │
   └─► Return AssetResponse with IDs

Verification Flow

1. Client uploads image for verification
   │
   ├─► API validation
   │
   ├─► Check quota (if authenticated)
   │
   ├─► Extract watermark (all tracks)
   │
   ├─► Compare with original via fingerprinting
   │   ├─► Generate query embedding (CLIP)
   │   ├─► Search similar images in FAISS index
   │   └─► Compare with stored fingerprints
   │
   ├─► Verify blockchain commitment (if one exists)
   │
   ├─► Check metadata integrity
   │
   ├─► Generate tamper map (if enabled)
   │
   ├─► Log verification result
   │
   └─► Return VerificationResult
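The fingerprint comparison step can be illustrated with a brute-force cosine-similarity search. This is a minimal stand-in for the FAISS/pgvector lookup, not the backend's code: the helper names `cosine_similarity` and `find_matches` are hypothetical, and the 0.9 default threshold is illustrative (the Fingerprint model stores its own `similarity_threshold`).

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_matches(
    query: List[float],
    stored: Dict[str, List[float]],
    threshold: float = 0.9,  # hypothetical cut-off for illustration
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return up to top_k (asset_id, score) pairs at or above the threshold, best first."""
    scored = [(asset_id, cosine_similarity(query, vec)) for asset_id, vec in stored.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_k]
```

An index such as FAISS or a pgvector HNSW index replaces the linear scan with approximate nearest-neighbour search; the thresholding and ranking logic stays the same.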

Data Models

Core Entity Relationships

User (1) ──────────► (many) Organization
   │                        │
   │                        ├─► (many) Asset
   │                        │              ├─► Watermark
   │                        │              ├─► Fingerprint
   │                        │              └─► Commitment
   │                        │
   │                        └─► (many) OrganizationMember
   │
   ├──────────► (many) ApiKey
   │
   └──────────► (many) VerificationLog

Key Models

User

User
├── id: UUID
├── email: str (unique)
├── full_name: str
├── password_hash: str
├── is_verified: bool
├── created_at: datetime
└── tier: str (free|pro|enterprise)

Asset

Asset
├── id: UUID
├── organization_id: UUID (FK)
├── content_hash: str (SHA-256)
├── original_filename: str
├── mime_type: str
├── file_size_bytes: int
├── width: int
├── height: int
├── ipfs_cid: str
├── watermarked_cid: str
├── processing_status: str
├── created_at: datetime
└── metadata: JSONB

Watermark

Watermark
├── id: UUID
├── asset_id: UUID (FK)
├── track_a_payload: bytes
├── track_b_payload: bytes
├── track_c_payload: bytes
├── embedding_strength: float
├── robustness_score: float
└── created_at: datetime

Fingerprint

Fingerprint
├── id: UUID
├── asset_id: UUID (FK)
├── embedding_type: str (clip|dino|pdq)
├── embedding_vector: Vector (pgvector)
├── hash_value: str
├── similarity_threshold: float
└── created_at: datetime

ML Pipeline

Watermarking (Track C - Neural Watermarking)

Input Image (RGB)
    │
    ├─► Preprocess (normalize, resize)
    │
    ├─► Generate watermark payload (128-bit commitment hash)
    │
    ├─► Encode with invisible-watermark library
    │   └─► Applies imperceptible perturbations
    │
    ├─► Optional: Apply compression (JPEG quality 85-95)
    │
    └─► Output: Watermarked Image

Key Parameters:

  • Strength: 0.4 (configurable)
  • Payload: Commitment hash (32-64 bytes)
  • Robustness: Survives JPEG compression and minor crops

Fingerprinting (Content-Based)

CLIP Fingerprinting

Image
  │
  ├─► Resize to 224x224
  │
  ├─► Normalize (CLIP mean/std)
  │
  ├─► CLIP Vision Encoder (ViT-L/14)
  │   └─► Output: 768-dim embedding
  │
  └─► Store in FAISS index + pgvector

DINO Fingerprinting

Image
  │
  ├─► Resize to 518x518
  │
  ├─► DINOv2-Large encoder
  │   └─► Output: 1024-dim embedding
  │
  └─► Store for structural analysis

PDQ Hashing

Image
  │
  ├─► Apply PDQ algorithm (Facebook)
  │
  ├─► Generate 256-bit binary hash
  │
  └─► Use for fast similarity matching
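Fast matching on PDQ hashes reduces to a Hamming distance between two 256-bit values. A minimal sketch, assuming hashes are stored in `hash_value` as 64-character hex strings; the helper names and the `max_distance=31` cut-off are assumptions for illustration and should be tuned on real data.

```python
def pdq_hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count differing bits between two 256-bit PDQ hashes given as 64-char hex strings."""
    if len(hash_a) != 64 or len(hash_b) != 64:
        raise ValueError("PDQ hashes must be 64 hex characters (256 bits)")
    # XOR leaves a 1 wherever the hashes disagree; count those bits
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_pdq_match(hash_a: str, hash_b: str, max_distance: int = 31) -> bool:
    """Treat two images as near-duplicates when few bits differ (threshold is an assumption)."""
    return pdq_hamming_distance(hash_a, hash_b) <= max_distance
```

Because the comparison is a single XOR and popcount, it is cheap enough to run as a pre-filter before the heavier CLIP/DINO embedding comparison.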

Authentication & Authorization

JWT Token Structure

{
  "sub": "user_id",
  "email": "user@example.com",
  "tier": "pro",
  "iat": 1698768000,
  "exp": 1698854400,
  "scopes": ["read", "write", "verify"]
}
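To make the token structure concrete, the claims above can be issued and checked with the standard library alone. This is a minimal sketch assuming HMAC-SHA256 (HS256) signing with a shared secret; the backend may use a JWT library and a different algorithm, and `encode_hs256` / `decode_hs256` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    """Base64url without '=' padding, as used in JWS compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _json_segment(obj: Dict[str, Any]) -> str:
    return _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def encode_hs256(claims: Dict[str, Any], secret: bytes) -> str:
    """Issue header.payload.signature signed with HMAC-SHA256."""
    signing_input = _json_segment({"alg": "HS256", "typ": "JWT"}) + "." + _json_segment(claims)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def decode_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Dict[str, Any]:
    """Check algorithm, signature and expiry, then return the claims."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

With the example values, `exp - iat` is 86400 seconds, so a token issued this way is valid for 24 hours.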

API Key Scopes

  • read - Read assets and verification history
  • write - Create/modify assets
  • verify - Verify images
  • admin - Organization admin operations
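An endpoint can enforce these scopes with a simple subset test. A minimal sketch using a hypothetical `has_scopes` helper; it assumes scopes are independent (no implicit hierarchy in which `admin` grants the others), which the list above does not specify.

```python
from typing import Iterable, Set

# Scope names taken from the list above
KNOWN_SCOPES: Set[str] = {"read", "write", "verify", "admin"}


def has_scopes(granted: Iterable[str], required: Iterable[str]) -> bool:
    """True when every required scope was granted to the API key or token."""
    required_set = set(required)
    unknown = required_set - KNOWN_SCOPES
    if unknown:
        # Catch typos in endpoint declarations early rather than silently denying
        raise ValueError(f"unknown scopes: {sorted(unknown)}")
    return required_set <= set(granted)
```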

OAuth2 / SSO Providers

  • Google OAuth2
  • GitHub OAuth2
  • Custom SAML (enterprise)

Storage Architecture

Multi-Tier Storage Strategy

┌─ Hot Storage (S3 with CloudFront CDN)
│  └─ Recently accessed images
│     └─ TTL: 30 days
│
├─ Warm Storage (IPFS)
│  └─ All processed assets
│     └─ Content-addressed
│
└─ Cold Storage (Filecoin via Lighthouse)
   └─ Long-term archival
      └─ Verifiable storage proofs

Content Addressing

Original Image
    │
    ├─► SHA-256 hash = content_hash
    │
    ├─► IPFS CID (v1) = ipfs_cid
    │   └─► Used for retrieval
    │
    └─► Filecoin CID (for backup)
        └─► Long-term storage proof
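The `content_hash` step is a plain SHA-256 over the original bytes. A minimal sketch, assuming the hash is stored as lowercase hex (the encoding is not stated above); the helper names are hypothetical, and the IPFS CID is produced separately by the IPFS node, so it is not reproduced here.

```python
import hashlib
import io
from typing import BinaryIO


def compute_content_hash(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of the original bytes, read in 1 MiB chunks so large uploads stream through."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hash_bytes(data: bytes) -> str:
    """Convenience wrapper for data already held in memory."""
    return compute_content_hash(io.BytesIO(data))
```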

Blockchain Integration

Commitment Creation Flow

Asset Watermark Data
    │
    ├─► Hash commitment (SHA-256(watermark || metadata))
    │
    ├─► Sign with organization key
    │
    ├─► Create Solana transaction
    │   ├─► Program: SOLANA_PROGRAM_ID
    │   ├─► Instruction: CreateCommitment
    │   ├─► Data: commitment_hash
    │   └─► Signer: user keypair
    │
    ├─► Submit to Solana RPC
    │
    ├─► Wait for confirmation (finalized)
    │
    └─► Store tx_hash in database
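The hashing step needs a byte-exact encoding of the metadata, otherwise the same asset could produce different commitments. A minimal sketch of `SHA-256(watermark || metadata)`, assuming the metadata is canonicalized as sorted-key compact JSON (an assumption; the encoding is not specified above). Signing and building the Solana transaction are omitted, and the function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def canonical_metadata(metadata: Dict[str, Any]) -> bytes:
    """Deterministic JSON encoding: sorted keys, no whitespace, UTF-8."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def commitment_hash(watermark_payload: bytes, metadata: Dict[str, Any]) -> bytes:
    """SHA-256(watermark || metadata), returned as 32 raw bytes."""
    return hashlib.sha256(watermark_payload + canonical_metadata(metadata)).digest()
```

Plain concatenation has no marker for where the payload ends, so prefixing the payload with its length is worth considering if payload sizes vary.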

Verification Against Blockchain

Reported Asset
    │
    ├─► Fetch commitment from database
    │
    ├─► Look up the transaction on Solana by tx_hash
    │
    ├─► Verify transaction:
    │   ├─► TX is finalized
    │   ├─► Data matches stored commitment
    │   └─► Signature is valid
    │
    └─► Return VerificationResult
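The three transaction checks can be combined into per-check flags for the VerificationResult. A minimal sketch: `OnChainCommitment` is a hypothetical container for what the RPC lookup returns, and fetching the transaction and verifying its signature are left to the Solana client.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class OnChainCommitment:
    """Hypothetical view of a fetched Solana transaction; field names are illustrative."""
    finalized: bool        # commitment level reported as finalized
    data: bytes            # commitment hash carried in the instruction data
    signature_valid: bool  # outcome of the signer check performed by the RPC client


def verify_commitment(stored_commitment: bytes, onchain: OnChainCommitment) -> Dict[str, bool]:
    """Evaluate the three checks listed above plus an overall verdict."""
    checks = {
        "finalized": onchain.finalized,
        "data_matches": stored_commitment == onchain.data,
        "signature_valid": onchain.signature_valid,
    }
    # Evaluated before the key is inserted, so only the three checks are combined
    checks["verified"] = all(checks.values())
    return checks
```

Returning each flag separately lets the API report why verification failed (for example, a transaction that is not yet finalized) instead of a bare false.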

Performance Considerations

Database Optimization

  • Indexes: On frequently queried columns (user_id, asset_id, content_hash)
  • Partitioning: Assets by creation date for large tables
  • pgvector: HNSW index for embedding similarity search
  • Connection Pooling: pool size 20, max overflow 10 (up to 30 connections)

ML Model Optimization

  • Model Caching: Loaded once at startup
  • Batch Processing: Process multiple images in batches
  • GPU Support: Optional CUDA/Metal for faster inference
  • Quantization: FP16 precision for memory efficiency

Caching Strategy

  • Redis: Cache API responses (5min TTL)
  • In-Memory: Model embeddings for repeated queries
  • HTTP: ETag headers for asset retrieval
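Because assets are content-addressed, the SHA-256 `content_hash` already makes a natural strong ETag. A minimal sketch, assuming the ETag is the quoted content hash (the actual ETag format is not stated above); the helper names are hypothetical.

```python
from typing import Optional, Tuple


def make_etag(content_hash: str) -> str:
    """Strong ETag built from the asset's content hash (entity tags are quoted strings)."""
    return f'"{content_hash}"'


def conditional_status(content_hash: str, if_none_match: Optional[str]) -> Tuple[int, str]:
    """Return (status, etag): 304 when the client's cached copy is current, else 200."""
    etag = make_etag(content_hash)
    if if_none_match:
        # Naive comma split: fine for hex ETags, which never contain commas
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so W/"x" also matches "x"
        if "*" in candidates or etag in candidates or f"W/{etag}" in candidates:
            return 304, etag
    return 200, etag
```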

Security Measures

Encryption

  • Master Key: Stored in AWS KMS / Vault
  • Data Encryption: AES-256-GCM for sensitive files
  • Transport: TLS 1.3 for all communications
  • API Keys: Hashed with bcrypt (cost: 12)

Rate Limiting

  • Per-User: 60 req/min (adjustable by tier)
  • Per-IP: 100 req/min (public endpoints)
  • Sliding Window: Uses Redis for distributed limiting
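The sliding-window behaviour can be shown without Redis. A minimal single-process sketch of a sliding-window log (the class name is hypothetical); a Redis-backed limiter would keep the same per-key timestamps in shared storage, commonly a sorted set, so every API worker sees one count.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Sliding-window-log limiter; an in-process stand-in for the Redis-backed version.

    With Redis, each per-key deque would typically become a sorted set
    (ZADD the timestamp, ZREMRANGEBYSCORE expired entries, ZCARD to count).
    """

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window
        while hits and hits[0] <= current - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not recorded
        hits.append(current)
        return True
```

For the per-user limit above, the key would be something like `user:<id>` with `limit=60`; the per-IP limit would use a second limiter keyed by address with `limit=100`.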

Input Validation

  • Pydantic Schemas: Automatic validation
  • File Type Check: MIME type + magic bytes
  • Size Limits: 100MB max for uploads
  • Image Dimensions: 1px - 16384px
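The MIME and magic-byte check guards against a client that declares one type and uploads another. A minimal sketch of the limits above; the accepted formats (JPEG, PNG, WebP), the reading of "100MB" as 100 MiB, and the helper names are assumptions, and width/height are taken as already parsed from the image header.

```python
from typing import Optional

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # "100MB" read as MiB here (assumption)
MAX_DIMENSION = 16384

# Leading-byte signatures for an assumed subset of accepted formats
_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_mime(head: bytes) -> Optional[str]:
    """Detect the format from magic bytes instead of trusting the client's Content-Type."""
    for magic, mime in _SIGNATURES.items():
        if head.startswith(magic):
            return mime
    # WebP: "RIFF" + 4-byte little-endian size + "WEBP"
    if len(head) >= 12 and head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_upload(head: bytes, declared_mime: str, size_bytes: int, width: int, height: int) -> None:
    """Raise ValueError on any violation of the upload limits."""
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds 100MB limit")
    if sniff_mime(head) != declared_mime:
        raise ValueError("MIME type does not match file contents")
    if not (1 <= width <= MAX_DIMENSION and 1 <= height <= MAX_DIMENSION):
        raise ValueError("image dimensions out of range")
```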

Error Handling

Standardized Error Responses

{
  "detail": "String error message",
  "error_code": "ASSET_NOT_FOUND",
  "timestamp": "2024-01-15T10:30:00Z",
  "request_id": "req_xxx"
}

HTTP Status Codes

  • 400 - Validation error
  • 401 - Authentication required
  • 403 - Forbidden (quota exceeded)
  • 404 - Resource not found
  • 429 - Rate limited
  • 500 - Server error
  • 503 - Service unavailable
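A small helper can keep every error in the standardized shape and pair it with a status code from the list above. A minimal sketch: only `ASSET_NOT_FOUND` appears in the documented example, so the other error codes, the `error_response` name, and the generated `req_…` format are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple

# Illustrative error codes mapped to the status codes listed above
ERROR_STATUS = {
    "VALIDATION_ERROR": 400,
    "AUTHENTICATION_REQUIRED": 401,
    "QUOTA_EXCEEDED": 403,
    "ASSET_NOT_FOUND": 404,
    "RATE_LIMITED": 429,
    "INTERNAL_ERROR": 500,
    "SERVICE_UNAVAILABLE": 503,
}


def error_response(error_code: str, detail: str, request_id: Optional[str] = None) -> Tuple[int, Dict[str, Any]]:
    """Build (status, body) in the standardized error format; unknown codes map to 500."""
    body = {
        "detail": detail,
        "error_code": error_code,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "request_id": request_id or f"req_{uuid.uuid4().hex[:12]}",
    }
    return ERROR_STATUS.get(error_code, 500), body
```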

Monitoring & Logging

Structured Logging

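import structlog
logger = structlog.get_logger()  # assumes structlog: keyword args become event fields

# asset and elapsed come from the surrounding upload handler
logger.info("asset_created",
    asset_id=str(asset.id),
    size_mb=round(asset.file_size_bytes / 1e6, 2),
    processing_time_ms=elapsed)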

Metrics to Track

  • Request latency (p50, p95, p99)
  • Database query times
  • ML model inference times
  • Storage operation durations
  • Blockchain commitment creation time
  • Quota usage per tier
  • API error rates
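Computing the latency percentiles from raw samples is a short calculation. A minimal sketch using the standard library's `statistics.quantiles` with linear interpolation (the helper name is hypothetical); in production a metrics backend would normally compute these from histograms.

```python
import statistics
from typing import Dict, List


def latency_percentiles(samples_ms: List[float]) -> Dict[str, float]:
    """p50/p95/p99 from raw latency samples, interpolating linearly between data points."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    # cuts[k - 1] is the k-th percentile
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

For latencies of 1 to 100 ms this yields p50 = 50.5, p95 = 95.05 and p99 = 99.01.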

Next Steps