A multi-agent reinforcement learning system in which an orchestrator learns to detect deception, assign trust, and optimize decisions in real time within adversarial environments.
Each module operates as an independent inference layer within the trust-calibration pipeline. All components communicate via the orchestration bus.
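The trust-calibration idea can be sketched minimally: the orchestrator aggregates per-agent reports from the bus, weighting each by that agent's current trust score. All names here (`AgentReport`, `TrustCalibratedOrchestrator`, the neutral starting trust of 0.5) are illustrative assumptions, not the project's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentReport:
    """One agent's proposed value, as published on the orchestration bus (hypothetical)."""
    agent_id: str
    value: float

@dataclass
class TrustCalibratedOrchestrator:
    """Aggregates agent reports, weighting each by its current trust score."""
    trust: dict = field(default_factory=dict)  # agent_id -> trust in [0, 1]

    def aggregate(self, reports):
        # Trust-weighted mean; unseen agents start at a neutral trust of 0.5.
        weights = [self.trust.get(r.agent_id, 0.5) for r in reports]
        total = sum(weights)
        if total == 0:
            return 0.0
        return sum(w * r.value for w, r in zip(weights, reports)) / total

orch = TrustCalibratedOrchestrator(trust={"a": 1.0, "b": 0.0})
print(orch.aggregate([AgentReport("a", 2.0), AgentReport("b", 10.0)]))  # 2.0
```

A fully distrusted agent ("b" above) contributes nothing to the decision, which is the behaviour the calibration pipeline is trying to learn.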
Real-time orchestrator view. Agent trust scores update per step. Red indicates flagged adversarial behaviour.
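One simple way to realise a per-step trust update with flagging is an exponential moving average toward an agreement signal, with a threshold below which an agent is marked adversarial. The update rule, the `alpha` and `flag_threshold` values, and the function name are all assumptions for illustration, not the system's documented mechanism.

```python
def update_trust(trust: float, agreement: float,
                 alpha: float = 0.1, flag_threshold: float = 0.3):
    """Exponential-moving-average trust update (hypothetical rule).

    `agreement` in [0, 1] measures how well the agent's last report matched
    the orchestrator's consensus. Returns (new_trust, flagged); flagged
    agents are the ones shown in red in the orchestrator view.
    """
    new_trust = (1 - alpha) * trust + alpha * agreement
    return new_trust, new_trust < flag_threshold

# A consistently disagreeing agent is flagged within a few steps.
t, flagged = 0.5, False
for _ in range(5):
    t, flagged = update_trust(t, agreement=0.0)
print(round(t, 6), flagged)  # 0.295245 True
```

The EMA keeps the score responsive per step while smoothing out one-off disagreements, so a single noisy report does not trigger a flag.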
Data flows unidirectionally through the trust-calibrated RL loop. Each stage emits telemetry to the monitoring bus.
Results averaged across evaluation episodes. Adversarial injection ratio fixed at 20%. Baseline: naive-averaging orchestrator without trust calibration.
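The gap between the baseline and the calibrated orchestrator can be reproduced in a toy simulation: honest agents report the truth plus small noise, 20% of agents report a large biased value, and the calibrated orchestrator down-weights agents whose reports disagree with the consensus. Everything here (reward model, thresholds, update rule) is a simplified stand-in for the real evaluation protocol, not a reproduction of it.

```python
import random

def evaluate(n_agents=10, adv_ratio=0.2, steps=100, seed=0):
    """Toy comparison: mean absolute error of naive averaging vs.
    trust-calibrated aggregation under a fixed adversarial injection ratio."""
    rng = random.Random(seed)
    n_adv = int(n_agents * adv_ratio)  # first n_adv agents are adversarial
    trust = [0.5] * n_agents
    naive_err = calib_err = 0.0
    for _ in range(steps):
        truth = rng.uniform(-1, 1)
        reports = [truth + 5.0 if i < n_adv else truth + rng.gauss(0, 0.05)
                   for i in range(n_agents)]
        naive = sum(reports) / n_agents
        total = sum(trust)
        calib = sum(t * r for t, r in zip(trust, reports)) / total
        # Trust update: binary agreement with the trust-weighted consensus.
        for i, r in enumerate(reports):
            agreement = 1.0 if abs(r - calib) < 2.0 else 0.0
            trust[i] = 0.9 * trust[i] + 0.1 * agreement
        naive_err += abs(naive - truth)
        calib_err += abs(calib - truth)
    return naive_err / steps, calib_err / steps

naive_mae, calib_mae = evaluate()
print(naive_mae > calib_mae)  # True
```

With all weights equal, the naive baseline is pulled toward the adversaries' bias every step; the calibrated orchestrator drives their trust toward zero and its error shrinks accordingly.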