Skip to content

Backend Database Schema

Comprehensive PostgreSQL database schema for Certana.

Database Overview

  • Engine: PostgreSQL 15+
  • ORM: SQLAlchemy 2.0
  • Vector Support: pgvector for embeddings
  • Async Driver: AsyncPG

Entity Relationship Diagram

User (1) ──────────(N)──────────► Organization
 │                                    │
 │                                    ├─(N)──► Asset ─(1)─► Watermark
 │                                    │         │
 │                                    │         ├─(1)─► Fingerprint
 │                                    │         │
 │                                    │         ├─(1)─► Commitment
 │                                    │         │
 │                                    │         └─(1)─► Provenance
 │                                    │
 │                                    └─(N)──► OrganizationMember
 │
 ├─(1)──► ApiKey
 │
 ├─(N)──► VerificationLog
 │
 └─(1)──► EmailVerification

Users Table

Stores user account information.

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    full_name VARCHAR(255),
    password_hash VARCHAR(255) NOT NULL,
    is_verified BOOLEAN DEFAULT FALSE,
    verification_token VARCHAR(255),
    tier VARCHAR(50) DEFAULT 'free',  -- free, pro, enterprise
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE,
    last_login_at TIMESTAMP WITH TIME ZONE,

    INDEX idx_email (email),
    INDEX idx_tier (tier),
    INDEX idx_created_at (created_at)
);

Fields

Field Type Constraints Description
id UUID PK, Default Unique user identifier
email VARCHAR(255) UNIQUE, NOT NULL User email (login)
full_name VARCHAR(255) - User display name
password_hash VARCHAR(255) NOT NULL Bcrypt hash (12 rounds)
is_verified BOOLEAN Default false Email verification status
verification_token VARCHAR(255) - Email verification token
tier VARCHAR(50) Default 'free' Subscription tier
created_at TIMESTAMP Default now() Account creation time
updated_at TIMESTAMP - Last update time
last_login_at TIMESTAMP - Last login timestamp

Organizations Table

Stores organization/team information.

CREATE TABLE organizations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    owner_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    logo_url VARCHAR(500),
    plan VARCHAR(50) DEFAULT 'free',  -- free, pro, enterprise
    master_key_ref VARCHAR(255) NOT NULL,
    max_users INTEGER DEFAULT 1,
    max_assets INTEGER DEFAULT 1000,
    max_verifications_per_day INTEGER DEFAULT 100,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE,

    INDEX idx_owner_id (owner_id),
    INDEX idx_slug (slug),
    INDEX idx_plan (plan)
);

Assets Table

Stores uploaded image assets.

CREATE TABLE assets (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    created_by UUID NOT NULL REFERENCES users(id) ON DELETE SET NULL,

    -- Content identifiers
    cid VARCHAR(100) UNIQUE NOT NULL,  -- IPFS content identifier
    content_hash VARCHAR(64) NOT NULL,  -- SHA-256 hash

    -- File info
    original_filename VARCHAR(500),
    mime_type VARCHAR(100) NOT NULL,
    file_size_bytes BIGINT NOT NULL,
    width INTEGER NOT NULL,
    height INTEGER NOT NULL,

    -- Storage locations
    ipfs_cid VARCHAR(100),
    filecoin_cid VARCHAR(100),
    watermarked_cid VARCHAR(100),
    watermarked_storage_url VARCHAR(1024),
    preview_url VARCHAR(1024),

    -- Encryption
    is_encrypted BOOLEAN DEFAULT FALSE,
    encryption_key BYTEA,
    encryption_salt BYTEA,

    -- Metadata
    title VARCHAR(500),
    description TEXT,
    tags TEXT[],
    exif_metadata JSONB,
    tiff_metadata JSONB,
    c2pa_metadata JSONB,

    -- Processing
    processing_status VARCHAR(50) DEFAULT 'pending',  -- pending, processing, completed, failed
    processing_error TEXT,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE,

    INDEX idx_organization_id (organization_id),
    INDEX idx_created_by (created_by),
    INDEX idx_content_hash (content_hash),
    INDEX idx_cid (cid),
    INDEX idx_processing_status (processing_status),
    INDEX idx_created_at (created_at)
);

Watermarks Table

Stores watermark information for each asset.

CREATE TABLE watermarks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    asset_id UUID NOT NULL UNIQUE REFERENCES assets(id) ON DELETE CASCADE,

    -- Watermark payloads (one per track)
    track_a_payload BYTEA,  -- Spatial domain watermark
    track_b_payload BYTEA,  -- Frequency domain watermark
    track_c_payload BYTEA,  -- Neural watermark (invisible)

    -- Detection metrics
    track_a_detected BOOLEAN DEFAULT FALSE,
    track_b_detected BOOLEAN DEFAULT FALSE,
    track_c_detected BOOLEAN DEFAULT FALSE,

    -- Quality metrics
    embedding_strength FLOAT,
    robustness_score FLOAT,

    -- Recovery info
    recovered_payload BYTEA,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE,

    INDEX idx_asset_id (asset_id)
);

Fingerprints Table

Stores content fingerprints using pgvector for embeddings.

CREATE TABLE fingerprints (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    asset_id UUID NOT NULL UNIQUE REFERENCES assets(id) ON DELETE CASCADE,

    -- CLIP fingerprint (768 dimensions)
    clip_embedding vector(768),
    clip_hash VARCHAR(256),

    -- DINO fingerprint (1024 dimensions)
    dino_embedding vector(1024),
    dino_hash VARCHAR(256),

    -- PDQ hash (Facebook perceptual hash)
    pdq_hash VARCHAR(256),
    pdq_hash_int BIGINT,  -- For Hamming distance queries

    -- Index metadata
    embedding_model VARCHAR(100),
    similarity_threshold FLOAT DEFAULT 0.8,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE,

    INDEX idx_asset_id (asset_id),
    -- HNSW index for fast similarity search
    INDEX idx_clip_embedding ON fingerprints USING hnsw(clip_embedding vector_cosine_ops),
    INDEX idx_dino_embedding ON fingerprints USING hnsw(dino_embedding vector_cosine_ops)
);

Commitments Table

Stores blockchain commitments.

CREATE TABLE commitments (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    asset_id UUID NOT NULL REFERENCES assets(id) ON DELETE CASCADE,
    organization_id UUID NOT NULL REFERENCES organizations(id),

    -- Commitment data
    commitment_hash VARCHAR(256) NOT NULL,
    commitment_payload BYTEA NOT NULL,

    -- Blockchain info
    blockchain_network VARCHAR(50),  -- mainnet, devnet, testnet
    tx_hash VARCHAR(256) UNIQUE,
    signature BYTEA NOT NULL,

    -- Status
    confirmation_status VARCHAR(50) DEFAULT 'pending',  -- pending, confirmed, finalized
    block_height BIGINT,
    confirmed_at TIMESTAMP WITH TIME ZONE,

    -- Verification
    verified BOOLEAN DEFAULT FALSE,
    verified_at TIMESTAMP WITH TIME ZONE,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_asset_id (asset_id),
    INDEX idx_tx_hash (tx_hash),
    INDEX idx_commitment_hash (commitment_hash),
    INDEX idx_confirmation_status (confirmation_status)
);

Fingerprint Similarity Cache

For faster fingerprint matching without recalculating:

CREATE TABLE fingerprint_matches (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_fingerprint_id UUID NOT NULL REFERENCES fingerprints(id) ON DELETE CASCADE,
    target_fingerprint_id UUID NOT NULL REFERENCES fingerprints(id) ON DELETE CASCADE,
    target_asset_id UUID NOT NULL REFERENCES assets(id) ON DELETE CASCADE,

    -- Similarity scores
    clip_similarity FLOAT,
    dino_similarity FLOAT,
    pdq_similarity FLOAT,

    -- Average similarity
    average_similarity FLOAT,

    is_match BOOLEAN,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    UNIQUE (source_fingerprint_id, target_fingerprint_id),
    INDEX idx_source (source_fingerprint_id),
    INDEX idx_target (target_fingerprint_id),
    INDEX idx_average_similarity (average_similarity DESC)
);

Verification Logs Table

Tracks all verification attempts.

CREATE TABLE verification_logs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id) ON DELETE SET NULL,
    api_key_id UUID REFERENCES api_keys(id) ON DELETE SET NULL,

    -- Verification target
    verified_asset_id UUID REFERENCES assets(id) ON DELETE SET NULL,
    source_image_hash VARCHAR(256),
    source_image_size_bytes BIGINT,

    -- Results
    verdict VARCHAR(50),  -- authentic, forged, unknown
    confidence FLOAT,

    -- Detection results
    watermark_detected BOOLEAN,
    watermark_confidence FLOAT,
    fingerprint_match_found BOOLEAN,
    fingerprint_similarity FLOAT,
    blockchain_verified BOOLEAN,

    -- Processing
    processing_time_ms INTEGER,
    model_version VARCHAR(100),

    -- Metadata
    requester_ip VARCHAR(45),
    user_agent TEXT,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_user_id (user_id),
    INDEX idx_verdict (verdict),
    INDEX idx_verified_asset_id (verified_asset_id),
    INDEX idx_created_at (created_at)
);

API Keys Table

Stores API keys for programmatic access.

CREATE TABLE api_keys (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,

    -- Key info
    key_prefix VARCHAR(32) NOT NULL,  -- cert_pk_xxx
    key_hash VARCHAR(256) UNIQUE NOT NULL,  -- Hashed (bcrypt)
    name VARCHAR(255) NOT NULL,

    -- Permissions
    scopes TEXT[] DEFAULT ARRAY['read', 'verify'],

    -- Rate limits
    rate_limit_per_hour INTEGER,
    rate_limit_per_day INTEGER,

    -- Status
    is_active BOOLEAN DEFAULT TRUE,
    expires_at TIMESTAMP WITH TIME ZONE,
    last_used_at TIMESTAMP WITH TIME ZONE,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,

    INDEX idx_organization_id (organization_id),
    INDEX idx_user_id (user_id),
    INDEX idx_key_prefix (key_prefix),
    INDEX idx_is_active (is_active)
);

Quota Limits Table

Tracks usage limits per organization.

CREATE TABLE quota_limits (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL UNIQUE REFERENCES organizations(id) ON DELETE CASCADE,

    -- Current period (monthly)
    period_start DATE NOT NULL,
    period_end DATE NOT NULL,

    -- Usage tracking
    uploads_count INTEGER DEFAULT 0,
    verifications_count INTEGER DEFAULT 0,
    storage_used_bytes BIGINT DEFAULT 0,

    -- Limits
    uploads_limit INTEGER NOT NULL,
    verifications_limit INTEGER NOT NULL,
    storage_limit_bytes BIGINT NOT NULL,

    -- Status
    quota_exceeded BOOLEAN DEFAULT FALSE,

    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE,

    INDEX idx_organization_id (organization_id),
    INDEX idx_period (period_start, period_end)
);

Indexes Strategy

Primary Indexes

  • User authentication: idx_email
  • Asset lookup: idx_organization_id, idx_content_hash
  • Verification: idx_verified_asset_id, idx_created_at
  • Blockchain: idx_tx_hash, idx_confirmation_status

Vector Indexes

  • HNSW indexes on fingerprint embeddings for fast similarity search
  • Cosine distance metric for semantic similarity

Time-based Indexes

  • idx_created_at on assets, verification_logs for range queries
  • Partition assets by month for large tables

Migrations

Use Alembic for version control:

# Create new migration
alembic revision --autogenerate -m "Add new field"

# Apply migrations
alembic upgrade head

# Rollback
alembic downgrade -1

Performance Tuning

Connection Pooling

DATABASE_POOL_SIZE = 20
DATABASE_MAX_OVERFLOW = 10

Query Optimization

# Use join loading
await session.execute(
    select(Asset).options(
        joinedload(Asset.watermark),
        joinedload(Asset.fingerprint)
    )
)

Index Maintenance

-- Analyze table for query planner
ANALYZE assets;

-- Reindex if needed
REINDEX INDEX idx_asset_id;

Backup & Recovery

Backup

pg_dump -U certana -d certana > certana_backup.sql

Restore

psql -U certana -d certana < certana_backup.sql

Next Steps