Architecture

How the Findings Generator works end-to-end.

[Architecture diagram: the Docker Compose stack. External data sources (LDG CSV exports in data/*.csv, OpenAPI JSON specs in data/*.json, the GitHub REST API for MicrosoftDocs/defender-docs) feed an ingest layer (Python 3.12 + uv: ingest_ldg.py, ingest_openapi.py, api_doc_scraper.py with httpx + BeautifulSoup). PostgreSQL 15 (db:5432 on the internal network, 17 tables, postgres_data volume) sits at the center. Around it: the Flask web app (webapp:5001, 30+ routes, background jobs via threading.Thread and SSE streaming), the MCP server (mcp_server.py, FastMCP over stdio JSON-RPC 2.0), the Claude Code CLI (@anthropic-ai/claude-code spawned via subprocess.Popen with --mcp-config .mcp.json), browser clients (EventSource for SSE, localStorage for job state), the dedup engine inside findings_save(), volume mounts (~/.claude for Claude auth, ./tasks for live-reloaded task prompts), and the Lucidum export engine (lucidum_export.py, field_map + sourcetype routing). 557 findings across 8 vendors.]

How It Works

1. Data Ingestion

Multiple ingestion methods populate the database with vendor-specific data:

- ingest_ldg.py: imports LDG CSV exports mounted at data/*.csv into ldg_fields
- ingest_openapi.py: imports OpenAPI JSON specs (the S1 spec in data/*.json) into openapi_endpoints
- api_doc_scraper.py: scrapes vendor API documentation from GitHub using httpx and BeautifulSoup
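A minimal sketch of the CSV path, assuming the export's column headers match the ldg_fields columns (vendor_id, field, type) and an illustrative connection string; the real ingest_ldg.py may map more columns:

```python
# ingest_ldg.py-style loader: read an LDG CSV export and insert rows into ldg_fields.
# The DSN and the ON CONFLICT handling are assumptions for illustration.
import csv
import sys

import psycopg

DSN = "postgresql://postgres:postgres@db:5432/findings"  # assumed connection string

def ingest(csv_path: str) -> int:
    rows = 0
    with open(csv_path, newline="") as fh, psycopg.connect(DSN) as conn:
        for row in csv.DictReader(fh):
            conn.execute(
                """
                INSERT INTO ldg_fields (vendor_id, field, type)
                VALUES (%s, %s, %s)
                ON CONFLICT DO NOTHING
                """,
                (row["vendor_id"], row["field"], row["type"]),
            )
            rows += 1
        conn.commit()
    return rows

if __name__ == "__main__":
    print(f"ingested {ingest(sys.argv[1])} rows")
```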

2. PostgreSQL Database

Central store for all data (PostgreSQL 15 at db:5432 on the internal Docker network, 17 tables), including:

- ldg_fields: vendor_id, field, type
- openapi_endpoints: operation_id, path, method
- findings: generated findings, with a UNIQUE checks_hash for deduplication
- field_map: 290 fields across 74 confirmed vendors
- export_config: sourcetypes for 9 vendors
- field_operators, field_aliases, api_sync_config: supporting tables

Data persists in the postgres_data volume (/var/lib/postgresql/data); all services connect with psycopg3.
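A sketch of the dedup-relevant slice of the schema. Only checks_hash and its UNIQUE constraint are confirmed by the diagram; the other columns and the DSN are assumptions:

```python
# Sketch of the findings table's UNIQUE dedup constraint (psycopg3 against db:5432).
# Treat the column set as illustrative; the real schema has more tables and columns.
import psycopg

DDL = """
CREATE TABLE IF NOT EXISTS findings (
    id          bigserial   PRIMARY KEY,
    vendor      text        NOT NULL,
    payload     jsonb       NOT NULL,
    checks_hash text        NOT NULL UNIQUE,  -- SHA-256 of the normalized checks
    created_at  timestamptz NOT NULL DEFAULT now()
);
"""

with psycopg.connect("postgresql://postgres:postgres@db:5432/findings") as conn:
    conn.execute(DDL)
```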

3. MCP Server

mcp_server.py (FastMCP) exposes the database as MCP tools that Claude can call:

- ldg_search_fields(): full-text search over LDG fields
- ldg_list_fields(): list all fields for a vendor
- openapi_search_endpoints(): full-text search over API endpoints
- openapi_get_endpoint(): look up an endpoint by operation_id
- findings_save() / findings_list(): persist and list findings, with deduplication
- field_map_lookup() / field_map_search() / field_map_validate(): export-mapping helpers

The server communicates with Claude over stdio using the Model Context Protocol (JSON-RPC 2.0).
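A minimal sketch of how one such tool might be registered with FastMCP; the table columns come from the diagram, while the full-text query, DSN, and return shape are assumptions rather than the real mcp_server.py implementation:

```python
# Sketch of an MCP tool exposing ldg_fields over stdio via FastMCP.
import psycopg
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("findings-generator")
DSN = "postgresql://postgres:postgres@db:5432/findings"  # assumed connection string

@mcp.tool()
def ldg_search_fields(vendor_id: str, query: str, limit: int = 20) -> list[dict]:
    """Full-text search over LDG fields for one vendor."""
    sql = """
        SELECT field, type
        FROM ldg_fields
        WHERE vendor_id = %s
          AND to_tsvector('english', field) @@ plainto_tsquery('english', %s)
        LIMIT %s
    """
    with psycopg.connect(DSN) as conn:
        rows = conn.execute(sql, (vendor_id, query, limit)).fetchall()
    return [{"field": field, "type": ftype} for field, ftype in rows]

if __name__ == "__main__":
    mcp.run(transport="stdio")  # Claude Code talks to this over stdio JSON-RPC
```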

4. Claude + Task Prompts

Claude generates findings by executing structured task prompts. Each of the 8 supported vendors has its own task folder under tasks/{vendor}/: s1, defender, crowdstrike, wiz, orca, okta, tenable, and entra_id.

Categories are vendor-specific (6-7 tasks per vendor). Common categories include:

Task prompts include strict anti-duplication rules. Claude self-verifies every field via ldg_search_fields and field_map_lookup before saving.

5. Generator UI / Task Runner

Multiple ways to run the generation pipeline:

- the /generator Web UI: a background job executor (threading.Thread) spawns the Claude Code CLI with subprocess.Popen and streams progress back to the browser
- the Claude Code CLI directly: claude -p "$(cat tasks/*/task_*)" --mcp-config .mcp.json

The Web UI features:

- live progress streamed over Server-Sent Events (the browser listens with EventSource)
- job state persisted in the browser's localStorage
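A minimal sketch of the background job executor pattern (threading.Thread + subprocess.Popen + SSE), assuming one output queue per job; the route paths, the JOBS dict, and run_generation() are illustrative names, not the real webapp.py API:

```python
# Background job executor sketch: spawn the Claude Code CLI and stream its output as SSE.
import queue
import subprocess
import threading
import uuid

from flask import Flask, Response, jsonify

app = Flask(__name__)
JOBS: dict[str, queue.Queue] = {}

def run_generation(job_id: str, task_file: str) -> None:
    """Run one task prompt through the Claude Code CLI and push output to the job queue."""
    q = JOBS[job_id]
    proc = subprocess.Popen(
        ["claude", "-p", open(task_file).read(), "--mcp-config", ".mcp.json"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
    )
    for line in proc.stdout:
        q.put(line.rstrip())
    q.put(None)  # sentinel: job finished

@app.post("/generator/run/<path:task_file>")
def start(task_file: str):
    job_id = uuid.uuid4().hex
    JOBS[job_id] = queue.Queue()
    threading.Thread(target=run_generation, args=(job_id, task_file), daemon=True).start()
    return jsonify({"job_id": job_id})

@app.get("/generator/stream/<job_id>")
def stream(job_id: str):
    def events():
        q = JOBS[job_id]
        while (line := q.get()) is not None:
            yield f"data: {line}\n\n"  # SSE frame consumed by EventSource in the browser
        yield "event: done\ndata: finished\n\n"
    return Response(events(), mimetype="text/event-stream")
```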

6. API Documentation Sync

Automated API documentation fetching for cloud-based vendors: api_doc_scraper.py lists documentation files through the GitHub REST API (api.github.com/repos/...), downloads them with httpx, parses them with BeautifulSoup, and stores the results in the database. Syncs are triggered from the /config page and tracked in api_sync_config.

Currently supports: Microsoft Defender for Endpoint (from MicrosoftDocs/defender-docs)
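A minimal sketch of the fetch step using the GitHub REST contents API; the docs path inside the repo and the parsing details are assumptions, not necessarily how api_doc_scraper.py walks the repo:

```python
# Doc-scraping sketch: list files in the defender-docs repo and strip markup from a page.
import httpx
from bs4 import BeautifulSoup

REPO = "MicrosoftDocs/defender-docs"
DOCS_PATH = "defender-endpoint/api"  # assumed path inside the repo

def list_doc_files() -> list[dict]:
    """List documentation files via the GitHub REST contents API."""
    url = f"https://api.github.com/repos/{REPO}/contents/{DOCS_PATH}"
    resp = httpx.get(url, headers={"Accept": "application/vnd.github+json"})
    resp.raise_for_status()
    return [f for f in resp.json() if f["name"].endswith(".md")]

def extract_text(download_url: str) -> str:
    """Fetch one page and strip any markup with BeautifulSoup."""
    page = httpx.get(download_url).text
    return BeautifulSoup(page, "html.parser").get_text(" ", strip=True)
```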

7. Deduplication Engine

Prevents duplicate findings at multiple levels:

- Task prompts carry strict anti-duplication rules (see section 4), so Claude avoids saving findings it has already produced.
- findings_save() normalizes each finding's checks (sorted, lower-cased), hashes them with SHA-256 over json.dumps(checks), and inserts with ON CONFLICT DO NOTHING against the UNIQUE checks_hash column, so an identical set of checks is silently dropped.
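A minimal sketch of the database-level path, following the three steps from the diagram (normalize, hash, insert). The table and column names match the diagram; exactly what gets lower-cased and how the payload is serialized are assumptions:

```python
# findings_save()-style dedup: normalized checks -> SHA-256 -> ON CONFLICT DO NOTHING.
import hashlib
import json

import psycopg
from psycopg.types.json import Json

def checks_hash(checks: list[dict]) -> str:
    """Step 1: normalize (lower-case values, stable key order). Step 2: SHA-256 the result."""
    normalized = sorted(
        json.dumps({k: str(v).lower() for k, v in check.items()}, sort_keys=True)
        for check in checks
    )
    return hashlib.sha256(json.dumps(normalized).encode()).hexdigest()

def save_finding(conn: psycopg.Connection, vendor: str, payload: dict) -> bool:
    """Step 3: insert with ON CONFLICT DO NOTHING; return False when the checks already exist."""
    cur = conn.execute(
        """
        INSERT INTO findings (vendor, payload, checks_hash)
        VALUES (%s, %s, %s)
        ON CONFLICT (checks_hash) DO NOTHING
        """,
        (vendor, Json(payload), checks_hash(payload["checks"])),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows inserted means the hash already existed
```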

8. Web Application

webapp.py (Flask, with background threads for long-running work) provides 30+ routes, including:

- /: home page
- /generator: task runner with live SSE progress
- /config: configuration and API documentation sync
- /statistics: findings statistics
- /lucidum-reverse: Lucidum rule reverse engineering
- /architecture: this architecture overview
- /export-lucidum.json: Smart Label export download

9. Finding Payload Structure

Each finding is stored as a single JSON payload.
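As a purely illustrative sketch (only the checks array, which feeds the dedup hash, is confirmed elsewhere in this document; every other key below is hypothetical), a payload passed to findings_save() might look like:

```python
# Hypothetical finding payload; apart from "checks", the key names are illustrative only.
finding = {
    "vendor": "s1",
    "title": "Agents running an outdated agent version",
    "description": "Flags endpoints whose agent is below the minimum supported release.",
    "checks": [
        {"field": "agent_version", "operator": "<", "value": "23.1"},
    ],
}
```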

10. Lucidum Export Engine

lucidum_export.py transforms findings into Lucidum Smart Label import format, translating each check through the field_map and routing it to the correct sourcetype; the result is served at /export-lucidum.json.

The field_map is DB-backed (290 fields, 7 types) with a fallback to hardcoded base fields, and it is self-learning: uploaded Lucidum exports extend it.
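An illustrative sketch of the field_map + sourcetype routing idea. The table names field_map, export_config, and findings come from the diagram; the column names, the check structure, and the output shape are assumptions and do not reproduce the real Smart Label schema:

```python
# Sketch of field_map + sourcetype routing for the Smart Label export.
import json
import psycopg

DSN = "postgresql://postgres:postgres@db:5432/findings"  # assumed connection string

def export_smart_labels() -> list[dict]:
    with psycopg.connect(DSN) as conn:
        field_map = dict(conn.execute(
            "SELECT source_field, lucidum_field FROM field_map").fetchall())
        sourcetypes = dict(conn.execute(
            "SELECT vendor, sourcetype FROM export_config").fetchall())
        findings = conn.execute("SELECT vendor, payload FROM findings").fetchall()

    labels = []
    for vendor, payload in findings:
        checks = payload["checks"]  # psycopg3 returns jsonb columns as Python objects
        labels.append({
            "sourcetype": sourcetypes.get(vendor),
            "criteria": [
                {
                    "field": field_map.get(c["field"], c["field"]),  # fall back to the raw name
                    "operator": c["operator"],
                    "value": c["value"],
                }
                for c in checks
            ],
        })
    return labels

if __name__ == "__main__":
    print(json.dumps(export_smart_labels(), indent=2))
```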

11. Lucidum Reverse Engineering

The /lucidum-reverse page and reverse_engineer.py are used to learn Lucidum's internal field and operator rules from real Lucidum exports.

12. Supported Vendors

8 integrated security platforms (557 findings total): s1 (SentinelOne), defender (Microsoft Defender for Endpoint), crowdstrike, wiz, orca, okta, tenable, and entra_id (Microsoft Entra ID).

The architecture is vendor-agnostic: new vendors can be added by importing LDG fields, providing OpenAPI docs, and creating task prompts in tasks/{vendor}/.