Backend Database Schema¶
Comprehensive PostgreSQL database schema for Certana.
Database Overview¶
- Engine: PostgreSQL 15+
- ORM: SQLAlchemy 2.0
- Vector Support: pgvector for embeddings
- Async Driver: AsyncPG
Entity Relationship Diagram¶
User (1) ──────────(N)──────────► Organization
│ │
│ ├─(N)──► Asset ─(1)─► Watermark
│ │ │
│ │ ├─(1)─► Fingerprint
│ │ │
│ │ ├─(1)─► Commitment
│ │ │
│ │ └─(1)─► Provenance
│ │
│ └─(N)──► OrganizationMember
│
├─(1)──► ApiKey
│
├─(N)──► VerificationLog
│
└─(1)──► EmailVerification
Users Table¶
Stores user account information.
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
full_name VARCHAR(255),
password_hash VARCHAR(255) NOT NULL,
is_verified BOOLEAN DEFAULT FALSE,
verification_token VARCHAR(255),
tier VARCHAR(50) DEFAULT 'free', -- free, pro, enterprise
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE,
last_login_at TIMESTAMP WITH TIME ZONE,
INDEX idx_email (email),
INDEX idx_tier (tier),
INDEX idx_created_at (created_at)
);
Fields¶
| Field | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PK, Default | Unique user identifier |
| VARCHAR(255) | UNIQUE, NOT NULL | User email (login) | |
| full_name | VARCHAR(255) | - | User display name |
| password_hash | VARCHAR(255) | NOT NULL | Bcrypt hash (12 rounds) |
| is_verified | BOOLEAN | Default false | Email verification status |
| verification_token | VARCHAR(255) | - | Email verification token |
| tier | VARCHAR(50) | Default 'free' | Subscription tier |
| created_at | TIMESTAMP | Default now() | Account creation time |
| updated_at | TIMESTAMP | - | Last update time |
| last_login_at | TIMESTAMP | - | Last login timestamp |
Organizations Table¶
Stores organization/team information.
CREATE TABLE organizations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
owner_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE NOT NULL,
description TEXT,
logo_url VARCHAR(500),
plan VARCHAR(50) DEFAULT 'free', -- free, pro, enterprise
master_key_ref VARCHAR(255) NOT NULL,
max_users INTEGER DEFAULT 1,
max_assets INTEGER DEFAULT 1000,
max_verifications_per_day INTEGER DEFAULT 100,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE,
INDEX idx_owner_id (owner_id),
INDEX idx_slug (slug),
INDEX idx_plan (plan)
);
Assets Table¶
Stores uploaded image assets.
CREATE TABLE assets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
created_by UUID NOT NULL REFERENCES users(id) ON DELETE SET NULL,
-- Content identifiers
cid VARCHAR(100) UNIQUE NOT NULL, -- IPFS content identifier
content_hash VARCHAR(64) NOT NULL, -- SHA-256 hash
-- File info
original_filename VARCHAR(500),
mime_type VARCHAR(100) NOT NULL,
file_size_bytes BIGINT NOT NULL,
width INTEGER NOT NULL,
height INTEGER NOT NULL,
-- Storage locations
ipfs_cid VARCHAR(100),
filecoin_cid VARCHAR(100),
watermarked_cid VARCHAR(100),
watermarked_storage_url VARCHAR(1024),
preview_url VARCHAR(1024),
-- Encryption
is_encrypted BOOLEAN DEFAULT FALSE,
encryption_key BYTEA,
encryption_salt BYTEA,
-- Metadata
title VARCHAR(500),
description TEXT,
tags TEXT[],
exif_metadata JSONB,
tiff_metadata JSONB,
c2pa_metadata JSONB,
-- Processing
processing_status VARCHAR(50) DEFAULT 'pending', -- pending, processing, completed, failed
processing_error TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE,
INDEX idx_organization_id (organization_id),
INDEX idx_created_by (created_by),
INDEX idx_content_hash (content_hash),
INDEX idx_cid (cid),
INDEX idx_processing_status (processing_status),
INDEX idx_created_at (created_at)
);
Watermarks Table¶
Stores watermark information for each asset.
CREATE TABLE watermarks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
asset_id UUID NOT NULL UNIQUE REFERENCES assets(id) ON DELETE CASCADE,
-- Watermark payloads (one per track)
track_a_payload BYTEA, -- Spatial domain watermark
track_b_payload BYTEA, -- Frequency domain watermark
track_c_payload BYTEA, -- Neural watermark (invisible)
-- Detection metrics
track_a_detected BOOLEAN DEFAULT FALSE,
track_b_detected BOOLEAN DEFAULT FALSE,
track_c_detected BOOLEAN DEFAULT FALSE,
-- Quality metrics
embedding_strength FLOAT,
robustness_score FLOAT,
-- Recovery info
recovered_payload BYTEA,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE,
INDEX idx_asset_id (asset_id)
);
Fingerprints Table¶
Stores content fingerprints using pgvector for embeddings.
CREATE TABLE fingerprints (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
asset_id UUID NOT NULL UNIQUE REFERENCES assets(id) ON DELETE CASCADE,
-- CLIP fingerprint (768 dimensions)
clip_embedding vector(768),
clip_hash VARCHAR(256),
-- DINO fingerprint (1024 dimensions)
dino_embedding vector(1024),
dino_hash VARCHAR(256),
-- PDQ hash (Facebook perceptual hash)
pdq_hash VARCHAR(256),
pdq_hash_int BIGINT, -- For Hamming distance queries
-- Index metadata
embedding_model VARCHAR(100),
similarity_threshold FLOAT DEFAULT 0.8,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE,
INDEX idx_asset_id (asset_id),
-- HNSW index for fast similarity search
INDEX idx_clip_embedding ON fingerprints USING hnsw(clip_embedding vector_cosine_ops),
INDEX idx_dino_embedding ON fingerprints USING hnsw(dino_embedding vector_cosine_ops)
);
Commitments Table¶
Stores blockchain commitments.
CREATE TABLE commitments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
asset_id UUID NOT NULL REFERENCES assets(id) ON DELETE CASCADE,
organization_id UUID NOT NULL REFERENCES organizations(id),
-- Commitment data
commitment_hash VARCHAR(256) NOT NULL,
commitment_payload BYTEA NOT NULL,
-- Blockchain info
blockchain_network VARCHAR(50), -- mainnet, devnet, testnet
tx_hash VARCHAR(256) UNIQUE,
signature BYTEA NOT NULL,
-- Status
confirmation_status VARCHAR(50) DEFAULT 'pending', -- pending, confirmed, finalized
block_height BIGINT,
confirmed_at TIMESTAMP WITH TIME ZONE,
-- Verification
verified BOOLEAN DEFAULT FALSE,
verified_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
INDEX idx_asset_id (asset_id),
INDEX idx_tx_hash (tx_hash),
INDEX idx_commitment_hash (commitment_hash),
INDEX idx_confirmation_status (confirmation_status)
);
Fingerprint Similarity Cache¶
For faster fingerprint matching without recalculating:
CREATE TABLE fingerprint_matches (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_fingerprint_id UUID NOT NULL REFERENCES fingerprints(id) ON DELETE CASCADE,
target_fingerprint_id UUID NOT NULL REFERENCES fingerprints(id) ON DELETE CASCADE,
target_asset_id UUID NOT NULL REFERENCES assets(id) ON DELETE CASCADE,
-- Similarity scores
clip_similarity FLOAT,
dino_similarity FLOAT,
pdq_similarity FLOAT,
-- Average similarity
average_similarity FLOAT,
is_match BOOLEAN,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
UNIQUE (source_fingerprint_id, target_fingerprint_id),
INDEX idx_source (source_fingerprint_id),
INDEX idx_target (target_fingerprint_id),
INDEX idx_average_similarity (average_similarity DESC)
);
Verification Logs Table¶
Tracks all verification attempts.
CREATE TABLE verification_logs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id) ON DELETE SET NULL,
api_key_id UUID REFERENCES api_keys(id) ON DELETE SET NULL,
-- Verification target
verified_asset_id UUID REFERENCES assets(id) ON DELETE SET NULL,
source_image_hash VARCHAR(256),
source_image_size_bytes BIGINT,
-- Results
verdict VARCHAR(50), -- authentic, forged, unknown
confidence FLOAT,
-- Detection results
watermark_detected BOOLEAN,
watermark_confidence FLOAT,
fingerprint_match_found BOOLEAN,
fingerprint_similarity FLOAT,
blockchain_verified BOOLEAN,
-- Processing
processing_time_ms INTEGER,
model_version VARCHAR(100),
-- Metadata
requester_ip VARCHAR(45),
user_agent TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
INDEX idx_user_id (user_id),
INDEX idx_verdict (verdict),
INDEX idx_verified_asset_id (verified_asset_id),
INDEX idx_created_at (created_at)
);
API Keys Table¶
Stores API keys for programmatic access.
CREATE TABLE api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
-- Key info
key_prefix VARCHAR(32) NOT NULL, -- cert_pk_xxx
key_hash VARCHAR(256) UNIQUE NOT NULL, -- Hashed (bcrypt)
name VARCHAR(255) NOT NULL,
-- Permissions
scopes TEXT[] DEFAULT ARRAY['read', 'verify'],
-- Rate limits
rate_limit_per_hour INTEGER,
rate_limit_per_day INTEGER,
-- Status
is_active BOOLEAN DEFAULT TRUE,
expires_at TIMESTAMP WITH TIME ZONE,
last_used_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
INDEX idx_organization_id (organization_id),
INDEX idx_user_id (user_id),
INDEX idx_key_prefix (key_prefix),
INDEX idx_is_active (is_active)
);
Quota Limits Table¶
Tracks usage limits per organization.
CREATE TABLE quota_limits (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL UNIQUE REFERENCES organizations(id) ON DELETE CASCADE,
-- Current period (monthly)
period_start DATE NOT NULL,
period_end DATE NOT NULL,
-- Usage tracking
uploads_count INTEGER DEFAULT 0,
verifications_count INTEGER DEFAULT 0,
storage_used_bytes BIGINT DEFAULT 0,
-- Limits
uploads_limit INTEGER NOT NULL,
verifications_limit INTEGER NOT NULL,
storage_limit_bytes BIGINT NOT NULL,
-- Status
quota_exceeded BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE,
INDEX idx_organization_id (organization_id),
INDEX idx_period (period_start, period_end)
);
Indexes Strategy¶
Primary Indexes¶
- User authentication:
idx_email - Asset lookup:
idx_organization_id,idx_content_hash - Verification:
idx_verified_asset_id,idx_created_at - Blockchain:
idx_tx_hash,idx_confirmation_status
Vector Indexes¶
- HNSW indexes on fingerprint embeddings for fast similarity search
- Cosine distance metric for semantic similarity
Time-based Indexes¶
idx_created_aton assets, verification_logs for range queries- Partition assets by month for large tables
Migrations¶
Use Alembic for version control:
# Create new migration
alembic revision --autogenerate -m "Add new field"
# Apply migrations
alembic upgrade head
# Rollback
alembic downgrade -1
Performance Tuning¶
Connection Pooling¶
DATABASE_POOL_SIZE = 20
DATABASE_MAX_OVERFLOW = 10
Query Optimization¶
# Use join loading
await session.execute(
select(Asset).options(
joinedload(Asset.watermark),
joinedload(Asset.fingerprint)
)
)
Index Maintenance¶
-- Analyze table for query planner
ANALYZE assets;
-- Reindex if needed
REINDEX INDEX idx_asset_id;
Backup & Recovery¶
Backup¶
pg_dump -U certana -d certana > certana_backup.sql
Restore¶
psql -U certana -d certana < certana_backup.sql