Tools To Assess Your Agent’s Work

Congratulations! You deployed your first AI agent and it’s out there, doing its job, streamlining your workflows and helping your employees work smarter. You’re monitoring the engagement metrics and escalation rate KPIs. But you still might wake up in the middle of the night, wondering, “Do I have enough data to know if my agent is doing a good job?”

The more insights you have, the more quickly you can make improvements, which is why a lot of people are asking the same question. “We’re still in very early days, measuring these agents,” said Jesse Luke, senior manager, data enablement, web, at Salesforce. “It’s a process everyone goes through.”

But there are ways to measure the quality and effectiveness of your AI agents’ work, starting with the KPIs you put in place at deployment. There are also Salesforce tools, including some on the horizon, to help you assess your agent’s performance.

What does an effective AI agent look like?

A good agent doesn’t just answer customers’ or employees’ questions. It solves people’s problems. The best agents do this seamlessly.

“How do you know you’re working with a good AI agent vs. a mediocre one?” Mike Murchison, CEO of Ada, asked on LinkedIn. “Good AI should feel like the best server at your favorite restaurant.”

Like a great server, he said, a great agent anticipates your needs even before you do. “They remember your preferences, spot any problems before they happen, and fix them without fanfare,” he added.

That’s the ideal. But first, you may simply want to know whether your agent is meeting its basic KPIs. “If you have a good idea of your KPIs and can identify how the agent impacts those, you’re off to the races,” Luke said.

On the Salesforce Help site, for example, the customer service agent’s job is to help people quickly find the information they need and reduce the caseload of human agents. The company posts the agent’s performance metrics on a weekly basis.

The numbers? One week in September, Agentforce, the Salesforce platform for building and deploying AI agents, handled over 61,000 support requests and resolved more than 39,000 of them. Roughly 17,000 requests were handed off to humans.

These are the kind of KPIs that show your agent is doing its job.
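To make those weekly counts easier to compare from one week to the next, they can be turned into rates. The sketch below is only an illustration: it uses the rounded figures quoted above (the published counts are "over", "more than", and "roughly"), and the function and variable names are assumptions, not part of any Salesforce tool.

```python
def rate(part: int, total: int) -> float:
    """Return part/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


# Rounded weekly figures from the article; treat them as approximations.
handled = 61_000    # support requests handled by the agent
resolved = 39_000   # requests the agent resolved on its own
escalated = 17_000  # requests handed off to humans

resolution_rate = rate(resolved, handled)
escalation_rate = rate(escalated, handled)

print(f"Resolution rate: {resolution_rate}%")
print(f"Escalation rate: {escalation_rate}%")
```

With these rounded inputs, the agent resolved a little under two-thirds of requests and escalated a little over a quarter.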

You can measure only what you can see

One of the biggest challenges companies have with AI agents is visibility: being able to see what their agent is doing and make sure it’s acting as they want. Salesforce’s Agentforce Observability offers a unified dashboard that tracks an agent’s error rates, escalation rates, latency, and more. It sits within Agentforce Studio, a new suite of tools to evaluate an agent’s performance. The dashboard can answer questions such as “How is adoption and usage trending?” and “Are my agents following legal and regulatory requirements?”

It can also categorize your agent’s conversations into topics so you can see how customers are using the agent. For example, 40% of agent sessions might be about payment problems; another 20% could be cancellation requests.
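A topic breakdown like that is, at its simplest, a count of sessions per topic divided by the total. Here is a minimal sketch, assuming each session has already been given a single topic label; the labels and sample data are made up for illustration and do not come from the dashboard.

```python
from collections import Counter


def topic_shares(session_topics: list[str]) -> dict[str, float]:
    """Map each topic to its share of sessions, as a percentage."""
    counts = Counter(session_topics)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders topics from most to least frequent.
    return {topic: round(100 * n / total, 1) for topic, n in counts.most_common()}


# Hypothetical labels for ten sessions.
sessions = ["payment"] * 4 + ["shipping"] * 3 + ["cancellation"] * 2 + ["other"]
shares = topic_shares(sessions)

for topic, share in shares.items():
    print(f"{topic}: {share}%")
```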

How Salesforce measures performance

Salesforce conducts its own AI agent evaluation in several ways. The company’s Digital Success team runs synthetic tests twice a month to see how agents perform in hypothetical situations. To do this, they use an in-house tool similar to the Agentforce Testing Center, which lets customers test agents in secure sandboxes before they’re deployed.

Earlier this year, the team ran a test that resulted in low answer-quality scores, with the Salesforce Help agent scoring 59% against a baseline of 60%. When the team looked more closely, they discovered the agent was hallucinating URLs. The solution? “We shipped a fix, ran another test, and improved our answer quality to 67%,” said Zachary Stauber, senior director, digital success, AI, at Salesforce.
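The article does not describe how the team detected the hallucinated URLs, so the following is only one possible approach, sketched under assumptions: it extracts links from an answer and flags any that are not in a list of known pages, then compares a test score with the baseline. The function names, the regular expression, and the example.com URLs are all hypothetical.

```python
import re

# Simple pattern for http(s) links; real answers may need stricter parsing.
URL_PATTERN = re.compile(r"https?://[^\s)>]+")


def unknown_urls(answer: str, known_urls: set[str]) -> list[str]:
    """Return links in the answer that are not in the known set."""
    found = [url.rstrip(".,;") for url in URL_PATTERN.findall(answer)]
    return [url for url in found if url not in known_urls]


def meets_baseline(score: float, baseline: float) -> bool:
    """True when a test run scores at or above the baseline."""
    return score >= baseline


# Hypothetical list of real help pages and a hypothetical agent answer.
known = {"https://example.com/help/reset-password"}
answer = (
    "See https://example.com/help/reset-password "
    "or https://example.com/help/made-up-page."
)

suspect = unknown_urls(answer, known)
print("Suspect links:", suspect)
print("59% vs. 60% baseline:", meets_baseline(59, 60))
print("67% vs. 60% baseline:", meets_baseline(67, 60))
```

Running the same check before and after a fix mirrors the fix-and-retest loop the team describes: the first run falls below the baseline, the second clears it.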

The answer-quality score was useful information. But Salesforce also wanted to know how Agentforce was interacting with users in the real world, and at scale. And they wanted to give those conversations a score.

So, the company’s Data Enablement team started looking at the session level, which is the entire conversation between a user and agent. “But we found that it wasn’t logical to do it that way,” said Manoj Arora, principal member of the technical staff, software engineering, at Salesforce. “There might be some questions where the agent did a good job, and in the same session, a question where the agent didn’t do a good job.”

The Data Enablement team next looked at individual questions to see how an agent answered each one. But that didn’t make sense either; when they reviewed a single question and answer, the back-and-forth lacked context. Finally, they used a data science model that classifies and clusters related topics into groups, or moments. Those are what the team decided to focus on.
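The idea of a moment, a unit larger than a single question but smaller than a whole session, can be illustrated with a simplified stand-in. The article does not describe the team's model, so this sketch assumes each turn already carries a topic label and simply groups consecutive turns that share one. The `Turn` class and the sample session are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Turn:
    question: str
    answer: str
    topic: str  # assumed label; in practice a model would assign this


def split_into_moments(session: list[Turn]) -> list[list[Turn]]:
    """Group consecutive turns that share a topic into one moment."""
    return [list(group) for _, group in groupby(session, key=lambda t: t.topic)]


# One hypothetical session that covers two topics.
session = [
    Turn("Why was I charged twice?", "Let me check your invoice.", "billing"),
    Turn("Can I get a refund?", "Yes, I have started the refund.", "billing"),
    Turn("How do I return an item?", "Here are the return steps.", "returns"),
]

moments = split_into_moments(session)
for moment in moments:
    print(moment[0].topic, "-", len(moment), "turn(s)")
```

Each moment keeps the surrounding back-and-forth that a single question lacks, while still letting one session receive different scores for different topics.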

The team then used Agentforce to test these agentic moments, scoring them on a scale of 1 to 5. They did this using an internal tool similar to Agentforce Optimization, which is in beta now, and will be available at the end of October as part of Agentforce Observability.

More tools coming soon

Agentforce Optimization will help you look more closely at your agent’s conversations. It will retrieve your agent’s chat data and pass it to a prompt within Agentforce, which rates the interaction. A score of 1 to 2 means your agent needs improvement. A 5 is reason for celebration.

“What you’re looking for is to see if the right action is being taken by the agent in those conversations, and whether those actions are in line with your business goals,” said Luke.

To get a better idea of how Agentforce Optimization will score responses, here are two samples of conversations on salesforce.com that the team has already tested:

The interaction: A customer asked Agentforce to “show some customer success stories that have helped IT consultants improve business.” The agent replied, “I couldn’t find any specific customer stories related to IT consultants improving business. Would you like to search for something else or do you need information on a specific Salesforce product?”
The score: 1-2, because the agent couldn’t answer the customer’s question.
The interaction: A customer asked, “Can I already build a semantic model in Tableau Cloud that will be used by Tableau Agent?” The agent replied, “Yes you can build a semantic model in Tableau Cloud that will be used by Tableau Agent. Tableau Semantics is a powerful AI-infused semantic layer integrated into Data Cloud that translates your data into your business language.” The agent then described the features of Tableau Semantics in detail and provided a link to its web page.
The score: 5, because the customer got the information they needed.

Agentforce Optimization will cluster multiple interactions like these into meaningful categories so you can evaluate an agent’s performance at scale. You might, for example, want to look at how your agent is handling a specific topic, such as requests for product information. Or you might want to look at clusters by score. Where is your agent routinely getting scores of 1 or 2? Where is the agent doing well? All that will be possible with Agentforce Optimization.
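Looking at clusters by score can be pictured as a small aggregation over scored moments. The sketch below is not how Agentforce Optimization works internally; it assumes each moment is a (topic, score) pair on the 1-to-5 scale, and it treats "routinely" as "at least half of a topic's moments score 1 or 2," a threshold chosen only for illustration.

```python
from collections import defaultdict


def scores_by_topic(moments: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Collect the 1-to-5 scores of all moments under each topic."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for topic, score in moments:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        grouped[topic].append(score)
    return dict(grouped)


def routinely_low(moments: list[tuple[str, int]], min_share: float = 0.5) -> list[str]:
    """Topics where at least min_share of moments scored 1 or 2 (assumed rule)."""
    flagged = []
    for topic, scores in scores_by_topic(moments).items():
        low_share = sum(1 for s in scores if s <= 2) / len(scores)
        if low_share >= min_share:
            flagged.append(topic)
    return sorted(flagged)


# Hypothetical scored moments.
scored = [
    ("product info", 5), ("product info", 4),
    ("customer stories", 1), ("customer stories", 2),
    ("returns", 3), ("returns", 4), ("returns", 2),
]

low = routinely_low(scored)
print("Topics needing attention:", low)
```

A topic that lands on this list is a natural place to start checking for missing content or unclear agent instructions.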

Companies will be able to customize the tool to suit their business needs. A large retailer, for example, might want to see how their agent handles returns; another company might want to see how the agent manages tech support.

But Agentforce Optimization isn’t the only new tool on the horizon. Agentforce Analytics 2.0, a more advanced version of the existing Agentforce Observability dashboard, is also in beta. The beefed-up dashboard will offer a higher-level view, showing how many conversations have taken place and which topics are being covered, as well as latency and escalation rates. It, too, will be available at the end of October.

Why AI agent evaluation is so important

Companies need to assess their agent’s performance for a simple reason: to understand what’s working and what should be improved. With metrics in hand, you might see that you need to update your content, for example, or that your agent needs more detailed instructions. “The first thing we usually find is bad data,” said Stauber.

Bad or mislabeled data, data from unknown sources, or data that’s scattered over multiple systems can all be a problem. But once you’ve identified the issue, you can take action. That’s what Salesforce’s Digital Success team does when it finds an error like the URL hallucinations mentioned earlier. “We can do a fix, go back to the baseline program, run a test again, and see how things have changed,” Stauber said.

Relax, your agent is hard at work

With all these new tools to evaluate your AI agent’s performance, company leaders should be able to breathe a sigh of relief. So, the next time you startle awake wondering how your agent is doing, go back to sleep. Let your agent work at that hour instead.
