Architecture

How the Findings Generator works end-to-end.

[Architecture diagram: the Docker Compose stack. External data sources (LDG CSV exports in data/*.csv, OpenAPI JSON specs in data/*.json, the GitHub REST API for MicrosoftDocs/defender-docs) feed an ingest layer (Python 3.12 + uv: ingest_ldg.py, ingest_openapi.py, api_doc_scraper.py with httpx + BeautifulSoup). PostgreSQL 15 (db:5432 on the internal network, 17 tables, postgres_data volume) sits at the center. Around it: the Flask web app (webapp:5001, 30+ routes, background jobs via threading.Thread and SSE streaming), the MCP server (mcp_server.py, FastMCP over stdio JSON-RPC 2.0), the Claude Code CLI (@anthropic-ai/claude-code spawned via subprocess.Popen with --mcp-config .mcp.json), browser clients (EventSource for SSE, localStorage for job state), the dedup engine inside findings_save(), volume mounts (~/.claude for Claude auth, ./tasks for live-reloaded task prompts), and the Lucidum export engine (lucidum_export.py, field_map + sourcetype routing). 557 findings across 8 vendors.]

How It Works

1. Data Ingestion

Multiple ingestion methods populate the database with vendor-specific data:

- ingest_ldg.py: imports LDG CSV exports mounted at data/*.csv into ldg_fields
- ingest_openapi.py: imports OpenAPI JSON specs (the S1 spec in data/*.json) into openapi_endpoints
- api_doc_scraper.py: scrapes vendor API documentation from GitHub using httpx and BeautifulSoup
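A minimal sketch of the CSV path, assuming the export's column headers match the ldg_fields columns (vendor_id, field, type) and an illustrative connection string; the real ingest_ldg.py may map more columns:

```python
# ingest_ldg.py-style loader: read an LDG CSV export and insert rows into ldg_fields.
# The DSN and the ON CONFLICT handling are assumptions for illustration.
import csv
import sys

import psycopg

DSN = "postgresql://postgres:postgres@db:5432/findings"  # assumed connection string

def ingest(csv_path: str) -> int:
    rows = 0
    with open(csv_path, newline="") as fh, psycopg.connect(DSN) as conn:
        for row in csv.DictReader(fh):
            conn.execute(
                """
                INSERT INTO ldg_fields (vendor_id, field, type)
                VALUES (%s, %s, %s)
                ON CONFLICT DO NOTHING
                """,
                (row["vendor_id"], row["field"], row["type"]),
            )
            rows += 1
        conn.commit()
    return rows

if __name__ == "__main__":
    print(f"ingested {ingest(sys.argv[1])} rows")
```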

2. PostgreSQL Database

Central store for all data (PostgreSQL 15 at db:5432 on the internal Docker network, 17 tables), including:

- ldg_fields: vendor_id, field, type
- openapi_endpoints: operation_id, path, method
- findings: generated findings, with a UNIQUE checks_hash for deduplication
- field_map: 290 fields across 74 confirmed vendors
- export_config: sourcetypes for 9 vendors
- field_operators, field_aliases, api_sync_config: supporting tables

Data persists in the postgres_data volume (/var/lib/postgresql/data); all services connect with psycopg3.
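A sketch of the dedup-relevant slice of the schema. Only checks_hash and its UNIQUE constraint are confirmed by the diagram; the other columns and the DSN are assumptions:

```python
# Sketch of the findings table's UNIQUE dedup constraint (psycopg3 against db:5432).
# Treat the column set as illustrative; the real schema has more tables and columns.
import psycopg

DDL = """
CREATE TABLE IF NOT EXISTS findings (
    id          bigserial   PRIMARY KEY,
    vendor      text        NOT NULL,
    payload     jsonb       NOT NULL,
    checks_hash text        NOT NULL UNIQUE,  -- SHA-256 of the normalized checks
    created_at  timestamptz NOT NULL DEFAULT now()
);
"""

with psycopg.connect("postgresql://postgres:postgres@db:5432/findings") as conn:
    conn.execute(DDL)
```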

3. MCP Server

mcp_server.py (FastMCP) exposes the database as MCP tools that Claude can call:

- ldg_search_fields(): full-text search over LDG fields
- ldg_list_fields(): list all fields for a vendor
- openapi_search_endpoints(): full-text search over API endpoints
- openapi_get_endpoint(): look up an endpoint by operation_id
- findings_save() / findings_list(): persist and list findings, with deduplication
- field_map_lookup() / field_map_search() / field_map_validate(): export-mapping helpers

The server communicates with Claude over stdio using the Model Context Protocol (JSON-RPC 2.0).
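A minimal sketch of how one such tool might be registered with FastMCP; the table columns come from the diagram, while the full-text query, DSN, and return shape are assumptions rather than the real mcp_server.py implementation:

```python
# Sketch of an MCP tool exposing ldg_fields over stdio via FastMCP.
import psycopg
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("findings-generator")
DSN = "postgresql://postgres:postgres@db:5432/findings"  # assumed connection string

@mcp.tool()
def ldg_search_fields(vendor_id: str, query: str, limit: int = 20) -> list[dict]:
    """Full-text search over LDG fields for one vendor."""
    sql = """
        SELECT field, type
        FROM ldg_fields
        WHERE vendor_id = %s
          AND to_tsvector('english', field) @@ plainto_tsquery('english', %s)
        LIMIT %s
    """
    with psycopg.connect(DSN) as conn:
        rows = conn.execute(sql, (vendor_id, query, limit)).fetchall()
    return [{"field": field, "type": ftype} for field, ftype in rows]

if __name__ == "__main__":
    mcp.run(transport="stdio")  # Claude Code talks to this over stdio JSON-RPC
```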

4. Claude + Task Prompts

Claude generates findings by executing structured task prompts. Each of the 8 supported vendors has its own task folder under tasks/{vendor}/: s1, defender, crowdstrike, wiz, orca, okta, tenable, and entra_id.

Categories are vendor-specific (6-7 tasks per vendor). Common categories include:

Task prompts include strict anti-duplication rules. Claude self-verifies every field via ldg_search_fields and field_map_lookup before saving.

5. Generator UI / Task Runner

Multiple ways to run the generation pipeline:

- the /generator Web UI: a background job executor (threading.Thread) spawns the Claude Code CLI with subprocess.Popen and streams progress back to the browser
- the Claude Code CLI directly: claude -p "$(cat tasks/*/task_*)" --mcp-config .mcp.json

The Web UI features:

- live progress streamed over Server-Sent Events (the browser listens with EventSource)
- job state persisted in the browser's localStorage
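A minimal sketch of the background job executor pattern (threading.Thread + subprocess.Popen + SSE), assuming one output queue per job; the route paths, the JOBS dict, and run_generation() are illustrative names, not the real webapp.py API:

```python
# Background job executor sketch: spawn the Claude Code CLI and stream its output as SSE.
import queue
import subprocess
import threading
import uuid

from flask import Flask, Response, jsonify

app = Flask(__name__)
JOBS: dict[str, queue.Queue] = {}

def run_generation(job_id: str, task_file: str) -> None:
    """Run one task prompt through the Claude Code CLI and push output to the job queue."""
    q = JOBS[job_id]
    proc = subprocess.Popen(
        ["claude", "-p", open(task_file).read(), "--mcp-config", ".mcp.json"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    for line in proc.stdout:
        q.put(line.rstrip())
    q.put(None)  # sentinel: job finished

@app.post("/generator/run/<path:task_file>")
def start(task_file: str):
    job_id = uuid.uuid4().hex
    JOBS[job_id] = queue.Queue()
    threading.Thread(target=run_generation, args=(job_id, task_file), daemon=True).start()
    return jsonify({"job_id": job_id})

@app.get("/generator/stream/<job_id>")
def stream(job_id: str):
    def events():
        q = JOBS[job_id]
        while (line := q.get()) is not None:
            yield f"data: {line}\n\n"  # SSE frame consumed by EventSource in the browser
        yield "event: done\ndata: finished\n\n"
    return Response(events(), mimetype="text/event-stream")
```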

6. API Documentation Sync

Automated API documentation fetching for cloud-based vendors: api_doc_scraper.py lists documentation files through the GitHub REST API (api.github.com/repos/...), downloads them with httpx, parses them with BeautifulSoup, and stores the results in the database. Syncs are triggered from the /config page and tracked in api_sync_config.

Currently supports: Microsoft Defender for Endpoint (from MicrosoftDocs/defender-docs)
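A minimal sketch of the fetch step using the GitHub REST contents API; the docs path inside the repo and the parsing details are assumptions, not necessarily how api_doc_scraper.py walks the repo:

```python
# Doc-scraping sketch: list files in the defender-docs repo and strip markup from a page.
import httpx
from bs4 import BeautifulSoup

REPO = "MicrosoftDocs/defender-docs"
DOCS_PATH = "defender-endpoint/api"  # assumed path inside the repo

def list_doc_files() -> list[dict]:
    """List documentation files via the GitHub REST contents API."""
    url = f"https://api.github.com/repos/{REPO}/contents/{DOCS_PATH}"
    resp = httpx.get(url, headers={"Accept": "application/vnd.github+json"})
    resp.raise_for_status()
    return [f for f in resp.json() if f["name"].endswith(".md")]

def extract_text(download_url: str) -> str:
    """Fetch one page and strip any markup with BeautifulSoup."""
    page = httpx.get(download_url).text
    return BeautifulSoup(page, "html.parser").get_text(" ", strip=True)
```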

7. Deduplication Engine

Prevents duplicate findings at multiple levels:

- Task prompts carry strict anti-duplication rules (see section 4), so Claude avoids saving findings it has already produced.
- findings_save() normalizes each finding's checks (sorted, lower-cased), hashes them with SHA-256 over json.dumps(checks), and inserts with ON CONFLICT DO NOTHING against the UNIQUE checks_hash column, so an identical set of checks is silently dropped.
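A minimal sketch of the database-level path, following the three steps from the diagram (normalize, hash, insert). The table and column names match the diagram; exactly what gets lower-cased and how the payload is serialized are assumptions:

```python
# findings_save()-style dedup: normalized checks -> SHA-256 -> ON CONFLICT DO NOTHING.
import hashlib
import json

import psycopg
from psycopg.types.json import Json

def checks_hash(checks: list[dict]) -> str:
    """Step 1: normalize (lower-case values, stable key order). Step 2: SHA-256 the result."""
    normalized = sorted(
        json.dumps({k: str(v).lower() for k, v in check.items()}, sort_keys=True)
        for check in checks
    )
    return hashlib.sha256(json.dumps(normalized).encode()).hexdigest()

def save_finding(conn: psycopg.Connection, vendor: str, payload: dict) -> bool:
    """Step 3: insert with ON CONFLICT DO NOTHING; return False when the checks already exist."""
    cur = conn.execute(
        """
        INSERT INTO findings (vendor, payload, checks_hash)
        VALUES (%s, %s, %s)
        ON CONFLICT (checks_hash) DO NOTHING
        """,
        (vendor, Json(payload), checks_hash(payload["checks"])),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows inserted means the hash already existed
```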

8. Web Application

webapp.py (Flask, with background threads for long-running work) provides 30+ routes, including:

- /: home page
- /generator: task runner with live SSE progress
- /config: configuration and API documentation sync
- /statistics: findings statistics
- /lucidum-reverse: Lucidum rule reverse engineering
- /architecture: this architecture overview
- /export-lucidum.json: Smart Label export download

9. Finding Payload Structure

Each finding is stored as a single JSON payload.
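As a purely illustrative sketch (only the checks array, which feeds the dedup hash, is confirmed elsewhere in this document; every other key below is hypothetical), a payload passed to findings_save() might look like:

```python
# Hypothetical finding payload; apart from "checks", the key names are illustrative only.
finding = {
    "vendor": "s1",
    "title": "Agents running an outdated agent version",
    "description": "Flags endpoints whose agent is below the minimum supported release.",
    "checks": [
        {"field": "agent_version", "operator": "<", "value": "23.1"},
    ],
}
```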

10. Lucidum Export Engine

lucidum_export.py transforms findings into Lucidum Smart Label import format, translating each check through the field_map and routing it to the correct sourcetype; the result is served at /export-lucidum.json.

The field_map is DB-backed (290 fields, 7 types) with a fallback to hardcoded base fields, and it is self-learning: uploaded Lucidum exports extend it.
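An illustrative sketch of the field_map + sourcetype routing idea. The table names field_map, export_config, and findings come from the diagram; the column names, the check structure, and the output shape are assumptions and do not reproduce the real Smart Label schema:

```python
# Sketch of field_map + sourcetype routing for the Smart Label export.
import json
import psycopg

DSN = "postgresql://postgres:postgres@db:5432/findings"  # assumed connection string

def export_smart_labels() -> list[dict]:
    with psycopg.connect(DSN) as conn:
        field_map = dict(conn.execute(
            "SELECT source_field, lucidum_field FROM field_map").fetchall())
        sourcetypes = dict(conn.execute(
            "SELECT vendor, sourcetype FROM export_config").fetchall())
        findings = conn.execute("SELECT vendor, payload FROM findings").fetchall()

    labels = []
    for vendor, payload in findings:
        checks = payload["checks"]  # psycopg3 returns jsonb columns as Python objects
        labels.append({
            "sourcetype": sourcetypes.get(vendor),
            "criteria": [
                {
                    "field": field_map.get(c["field"], c["field"]),  # fall back to the raw name
                    "operator": c["operator"],
                    "value": c["value"],
                }
                for c in checks
            ],
        })
    return labels

if __name__ == "__main__":
    print(json.dumps(export_smart_labels(), indent=2))
```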

11. Lucidum Reverse Engineering

The /lucidum-reverse page and reverse_engineer.py are used to learn Lucidum's internal field and operator rules from real Lucidum exports.

12. Supported Vendors

8 integrated security platforms (557 findings total): s1 (SentinelOne), defender (Microsoft Defender for Endpoint), crowdstrike, wiz, orca, okta, tenable, and entra_id (Microsoft Entra ID).

The architecture is vendor-agnostic: new vendors can be added by importing LDG fields, providing OpenAPI docs, and creating task prompts in tasks/{vendor}/.