TrustGuard: Evaluating Trustworthiness in AI Systems


TrustGuard is an in-isolation AI risk assessment system designed to evaluate risks arising from the compromise of key trustworthiness characteristics—such as safety, security and resilience—across both model and data assets throughout the entire AI lifecycle, from design and development to deployment. In addition to identifying risks, the tool provides asset-level controls to support targeted mitigation. Built as a checklist-based solution, TrustGuard operationalises a six-phase, risk-based methodology defined within the FAITH trustworthiness assessment framework and is aligned with ISO 27005 and ISO 42001 standards.

The tool assesses trustworthiness dimensions derived from leading European and international best practices and standards, including guidance from the EU High-Level Expert Group on AI (HLEG), ENISA, NIST’s AI Risk Management Framework, and relevant ISO/IEC initiatives. Together, these references form a comprehensive framework for evaluating and promoting trustworthy AI systems.

TrustGuard begins with the cartography phase, which defines the assessment boundaries, such as the AI system assets, the lifecycle stage under evaluation, the relevant stakeholders, and the system’s sector-specific criticality, in line with the AI Act.
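To make the cartography step concrete, the following is a minimal sketch of how an assessment scope could be recorded. It is not TrustGuard's actual data model: the class names, field names, and example values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class LifecycleStage(Enum):
    # Stages named in the article; the enum itself is hypothetical.
    DESIGN = "design"
    DEVELOPMENT = "development"
    DEPLOYMENT = "deployment"


@dataclass
class AssessmentScope:
    """Hypothetical container for the boundaries set during cartography."""
    assets: list[str]                 # model and data assets under assessment
    lifecycle_stage: LifecycleStage   # stage being evaluated
    stakeholders: list[str]           # parties relevant to the assessment
    sector: str                       # sector the system operates in
    criticality: str                  # sector-specific criticality, e.g. "high"


# Example values are invented for illustration only.
scope = AssessmentScope(
    assets=["training dataset", "classification model"],
    lifecycle_stage=LifecycleStage.DEVELOPMENT,
    stakeholders=["data provider", "model developer"],
    sector="healthcare",
    criticality="high",
)
print(scope.lifecycle_stage.value, len(scope.assets))
```

Fixing these boundaries in one record up front means every later phase can refer to the same assets and sector context.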

 

The second phase is the AI trustworthiness threat assessment, in which the user rates potential technical threats to the AI system (e.g., data poisoning, evasion) by their sector-specific likelihood.

The third phase assesses the sector-specific severity of the business, ethical, or legal consequences that could follow if specific technical threats materialise and compromise certain trustworthiness properties.

In phase four, the analysis shifts to system vulnerabilities. Here, the user answers a set of technical control questions (e.g., “Is the training dataset enriched with adversarial examples?”) to determine how vulnerable the AI system is to each threat.
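One way such a checklist could be turned into a vulnerability indicator is sketched below. The article does not describe TrustGuard's scoring, so the rule used here (the fraction of control questions answered "no" per threat) is an assumption. The `ControlAnswer` type, the function name, and every question except the adversarial-examples one quoted above are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlAnswer:
    threat: str        # threat the control addresses, e.g. "evasion"
    question: str      # control question shown to the user
    implemented: bool  # True if the user answered "yes"


def vulnerability_by_threat(answers):
    """Return, per threat, the fraction of control questions answered 'no'.

    Assumed scoring rule for illustration; 0.0 means every listed control
    is in place, 1.0 means none is.
    """
    totals = {}
    missing = {}
    for a in answers:
        totals[a.threat] = totals.get(a.threat, 0) + 1
        if not a.implemented:
            missing[a.threat] = missing.get(a.threat, 0) + 1
    return {t: missing.get(t, 0) / totals[t] for t in totals}


answers = [
    ControlAnswer("evasion", "Is the training dataset enriched with adversarial examples?", False),
    ControlAnswer("evasion", "Are model inputs validated before inference?", True),  # hypothetical question
    ControlAnswer("data poisoning", "Is the provenance of training data verified?", True),  # hypothetical question
]
scores = vulnerability_by_threat(answers)
print(scores)
```

Grouping answers by threat keeps the result at the same granularity as the likelihood and severity ratings collected in the earlier phases.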

Phase five calculates risk levels by combining the likelihood and impact of the identified threats, based on the user’s context-specific inputs.

Finally, phase six presents the threatened trustworthiness properties, the associated technical threats and risks, and a corresponding set of mitigation actions. These include technical safeguards, governance mechanisms, and behavioural or social interventions.
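The phase-five combination of likelihood and impact can be illustrated with a conventional risk matrix. This is a sketch under stated assumptions, not TrustGuard's actual calculation: the three-level ordinal scale, the multiplicative score, and the thresholds are all chosen for illustration.

```python
# Assumed three-level ordinal scale; the real tool may use a different one.
LEVELS = ("low", "medium", "high")


def risk_level(likelihood: str, impact: str) -> str:
    """Combine ordinal likelihood and impact into a risk level.

    Each level maps to 1..3 and the two values are multiplied (range 1..9).
    The thresholds below are illustrative assumptions.
    """
    score = (LEVELS.index(likelihood) + 1) * (LEVELS.index(impact) + 1)
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# Hypothetical user inputs: (threat, likelihood, impact).
inputs = [
    ("data poisoning", "medium", "high"),
    ("evasion", "low", "medium"),
]
risks = {threat: risk_level(lik, imp) for threat, lik, imp in inputs}
print(risks)
```

A lookup of this kind keeps the result traceable: each risk level can be explained by the two ratings the user supplied.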

This end-to-end approach captures the complexity and interdependence of trust-related risks, so that each assessment is context-specific and aligned with the characteristics of the AI system under test. The current TrustGuard prototype is being tested by the project’s Large-Scale Pilots (LSPs). The feedback collected during this phase will directly inform the development of a second, more mature version of the system.

Author(s)

Eleni Tsalapati