
Microsoft Rolls Out LLM-Powered Tools To Strengthen AI Chatbot's Security Against Manipulation

Two more features for directing models toward safe outputs and tracking prompts to flag potentially problematic users will be introduced soon.

Microsoft has designed a number of new safety features intended to be easy to use for Azure customers who do not hire groups of red teamers to test the AI services they build. These LLM-powered tools can detect potential vulnerabilities, monitor for hallucinations that are plausible but unsupported, and block malicious prompts in real time. They apply to Azure AI customers working with any model hosted on the platform.

In an interview with The Verge, Sarah Bird, Microsoft’s chief product officer of responsible AI, said, “We know that customers don’t all have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks. Customers can then get a score and see the outcomes.”

This should help avoid generative AI controversies caused by undesirable or unintended responses. Recent examples include explicit fakes of celebrities (Microsoft’s Designer image generator), historically inaccurate images (Google Gemini), and Mario piloting a plane toward the Twin Towers (Bing).

Three features are now available in preview on Azure AI:

  • Prompt Shields: Blocks prompt injections or malicious prompts from external documents that instruct models to go against their training.
  • Groundedness Detection: Finds and blocks hallucinations.
  • Safety Evaluations: Assesses model vulnerabilities.

Two additional functionalities, aimed at guiding models towards secure outputs and monitoring prompts to identify potentially problematic users, are set to be introduced soon.

How Will This System Work?

The monitoring system evaluates inputs, whether entered directly by the user or generated from third-party data, to detect prohibited words or hidden prompts before sending them to the model for processing. Subsequently, it analyzes the model's response to identify any instances of hallucinated information not present in the input.
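The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Microsoft's implementation or the Azure AI API: the function names (`screen_input`, `ungrounded_sentences`, `guarded_call`), the banned-term list, the regular expressions, and the word-overlap threshold are all hypothetical, and a production system would likely rely on trained classifiers rather than simple pattern matching.

```python
import re

# Hypothetical placeholders; a real deployment would use curated lists or classifiers.
BANNED_TERMS = {"forbidden"}
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?system prompt", re.IGNORECASE),
]


def screen_input(text):
    """Stage 1: return the reasons to block this text (empty list means it passes)."""
    reasons = []
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = words & BANNED_TERMS
    if hits:
        reasons.append("banned term: " + ", ".join(sorted(hits)))
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            reasons.append("possible hidden prompt: " + pattern.pattern)
    return reasons


def ungrounded_sentences(response, source, threshold=0.5):
    """Stage 2: flag response sentences whose words mostly do not appear in the source.

    Word overlap is a crude stand-in for a real groundedness check.
    """
    source_words = set(re.findall(r"[a-z0-9']+", source.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        if not words:
            continue
        overlap = sum(w in source_words for w in words) / len(words)
        if overlap < threshold:
            flagged.append(sentence)
    return flagged


def guarded_call(prompt, documents, model):
    """Screen the user prompt and third-party documents, call the model, then check the answer."""
    for text in [prompt, *documents]:
        reasons = screen_input(text)
        if reasons:
            # Blocked before the model ever sees the input.
            return {"blocked": True, "reasons": reasons,
                    "response": None, "ungrounded": []}
    response = model(prompt, documents)
    source = " ".join([prompt, *documents])
    return {"blocked": False, "reasons": [], "response": response,
            "ungrounded": ungrounded_sentences(response, source)}
```

Here `model` is any callable that takes the prompt and the list of documents and returns a string. Screening both the prompt and the documents mirrors the article's point that hidden instructions can arrive through third-party data as well as through direct user input.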

In contrast to the Google Gemini case, where filters aimed at reducing bias had unintended consequences in generated images, Microsoft's Azure AI tools give customers more control. Bird acknowledges concerns about companies deciding what is suitable for AI models, so her team has added a way for Azure customers to toggle hate speech or violence filtering.

In the future, Azure users will have access to a report detailing users attempting to trigger unsafe outputs. This functionality enables system administrators to differentiate between their team's red teamers and individuals with potentially malicious intentions.

Bird mentions that the safety features are automatically integrated with GPT-4 and other widely used models such as Llama 2. However, since Azure's model repository includes numerous AI models, users of less popular open-source systems might need to manually configure the safety features for those models.
