Redwood Shores, CA-based startup Tumeryk has launched its AI Trust Scores, designed to give organizations a deeper understanding of the security issues surrounding different gen-AI systems. It has also announced the availability of its AI Trust Score Manager.
The former gives CISOs greater visibility into the strengths, weaknesses and risks associated with different gen-AI foundational models, while the latter can operationalize quantitative controls on the gen-AI responses of in-house deployments.
The AI Trust Score evaluates and quantifies risk based on nine critical factors: prompt injection, hallucinations, insecure output handling, security, toxicity, sensitive information disclosure, supply chain vulnerability, psychological safety, and fairness.
The results can be surprising in specific areas. For example, China's DeepSeek performs very well on sensitive information disclosure: DeepSeek-AI-DeepSeek-R1 scores 910 in this category, while Claude Sonnet 3.5 scores 687 and Meta Llama 3.1 405B scores 557.
“CISOs implementing gen-AI solutions are at a loss as to how to measure their risk. Gen-AI responses are non-deterministic,” explains Tumeryk CEO Rohit Valia. “By using the AI Trust Score they can implement safeguards to ensure the systems are operating within an established Trust Zone. They can integrate the Trust Score into their current security fabrics to get alerts or have incidents logged when these policy-based thresholds are violated.”
Organizations can use the comparative Trust Scores to help choose the foundational model best suited to their own requirements. GPT-4o remains the strongest overall security performer. Meta-Llama-3.2-1B-In offers open-source security but shows variability in risk handling (‘variance’ in AI is a tendency to produce different outputs for the same input). DeepSeek is risky in key areas (prompt injection and hallucinations) but performs well in logical reasoning.
(As an aside, it is worth watching the evolution of DeepSeek. It is younger than its western rivals and still catching up. Susceptibility to prompt injection and hallucination were early problems for all gen-AI models – and DeepSeek is likely to improve in these areas as it evolves.)
However, the true value of the Trust Scores comes with the AI Trust Score Manager platform. “This tool provides real-time insights into AI system performance, identifies vulnerabilities, and recommends actionable steps to enhance security and compliance,” said Valia. “By integrating the AI Trust Manager, organizations can proactively manage risks and ensure their AI deployments align with regulatory standards and ethical guidelines.”
He further explained, “When a user or AI Agent calls an LLM protected by the AI Trust Score Manager control layer, a real time AI Trust Score is generated for the response. Based on policy (built using the AI Trust Score thresholds defined or written in Nvidia Conversational Language (Colang)), the user is or is not allowed to receive a response from the LLM. This is similar to how, at the Fair Isaac Corporation, credit card fraud is detected for billions of transactions by scoring each transaction (the FICO score) on a variety of factors.”
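The control flow Valia describes can be sketched as: score each LLM response in real time, then release it only if the score clears a policy threshold. The class and function names below are hypothetical stand-ins, not Tumeryk's API, and the scoring heuristic is a toy; a real deployment would express the policy in Colang and compute a genuine Trust Score.

```python
# Illustrative sketch of the score-then-gate control layer: the user only
# receives the LLM response if its real-time trust score falls inside the
# policy's Trust Zone. All names and the scoring logic are assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Policy:
    min_trust_score: int  # responses scoring below this threshold are blocked

def score_response(response: str) -> int:
    # Stand-in for the real-time AI Trust Score computation:
    # a toy heuristic that penalizes a leaked-secret marker.
    return 300 if "SECRET" in response else 850

def guarded_reply(response: str, policy: Policy) -> Optional[str]:
    """Return the response if it clears the threshold, else None."""
    if score_response(response) >= policy.min_trust_score:
        return response
    return None  # blocked; an incident could be logged or alerted here

policy = Policy(min_trust_score=700)
print(guarded_reply("Here is the weather forecast.", policy))  # allowed
print(guarded_reply("The SECRET key is ...", policy))          # prints None
```

The analogy to FICO fraud scoring holds in this shape: every transaction (here, every response) is scored against multiple factors, and a policy threshold decides what passes through.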