Better LLM Agents for CRM Tasks: Tips and Tricks

Key Takeaways

CRM tasks are difficult for current LLM agents to complete because of limited training data coverage and LLMs' unfamiliarity with business context.
Providing more domain-specific knowledge (in prompts or as tools) can greatly help LLM agents.
Telling agents how to solve a task, instead of what task to solve, is usually more helpful, even without equipping them with function calling abilities.

Background

LLM agents are seeing more and more real-world applications, from serving as personal assistants to helping software engineers write code and even working side by side with scientists on their research. With Agentforce, Salesforce's trusted platform, we pioneer LLM agents for CRM applications such as helping customers with their return and refund requests, coming up with the best pitch for sales representatives tailored to their clients, and generating insights about employee productivity and roadblocks for managers.

While models such as GPT, Claude, and Gemini show impressive general abilities, CRM tasks are a different story. Their specialized nature and limited data coverage make it hard for LLMs to perform reliably. Moreover, many of the failures are "noob mistakes" caused by an insufficient understanding of the business context and specialized domain knowledge. (See our blog post Why Generic LLM Agents Fall Short in Enterprise Environments for more details.)

To solve this problem and bridge the gap between the high general capability and low specialized capability of LLMs, and to understand the associated human-in-the-loop effort and trade-offs, we conducted a series of investigations and identified various tips and tricks to better unleash their performance on a range of realistic CRM tasks.

Agentic Simulation Environment with CRMArena-Pro

Our benchmark of choice is the newly released CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions, developed by researchers at Salesforce AI Research. It consists of 22 tasks and 2,140 task instances spanning diverse categories such as workflow execution, policy compliance, and information retrieval. The main finding is the suboptimal performance of even state-of-the-art LLMs with tried-and-true agentic implementation frameworks such as ReAct. For example, GPT-4o is only able to solve less than 30% of all tasks, while its reasoning model counterpart, o1, still fails at just over 50% of all tasks. The best-performing model out of nine flagship models from various providers, Gemini-2.5-pro, struggles to reach a completion rate of 60%. (See our blog post on how to Evaluate LLM Agents for Enterprise Applications with CRMArena-Pro.)

After a thorough analysis of the agents' executions, we identified several potential causes for the lack of performance.

Query Syntax Limitations: Manipulating data on the Salesforce platform requires writing queries in the SOQL and SOSL languages. While they are similar to SQL, there are certain key differences. As a result, the agent sometimes produces queries with invalid syntax. While the agent can correct some errors after observing the error message, for others the attempted correction results in further errors.
Data Model/Schema Confusion: A hallmark feature of CRMArena-Pro is its intricate and interconnected schemas, which represent real-life business entities such as account manager, pricebook, order, lead, and voice call transcript. Agents often confuse related concepts, such as an order item vs. a pricebook entry, or a lead vs. an opportunity. As a result, they frequently look up information in the wrong table, leading to failed executions or wrong results.
Ambiguity in Underspecified Tasks: Some details of a task are underspecified, such as whether a case that has been transferred from one customer service representative to another should count for either of them (e.g., when calculating the average handling time), both, or neither. Agents often directly assume a particular answer, failing to realize that there is ambiguity to be clarified.
Unfamiliarity with Business Workflow: Finally, even when the agent is clear on the data schema and task specification, it may still fail on the task because it is unfamiliar with the business workflow. For example, while SOQL has fuzzy search ability, most search tasks are better implemented with SOSL. Because the agent is generally unfamiliar with such fine-grained differences, it sometimes fails to use the right tools, leading to excessively long outputs and very inefficient executions.
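
To make the first failure mode concrete, here is a minimal sketch of a pre-flight check that flags two SQL habits SOQL does not accept. The two rules are general SOQL facts rather than findings from the benchmark analysis, the list is deliberately incomplete, and the helper name find_sql_isms is hypothetical.

```python
import re

# Illustrative, non-exhaustive checks for SQL habits that SOQL rejects.
SQL_ISMS = {
    r"\bSELECT\s+\*": "SOQL has no SELECT *; list the fields explicitly",
    r"\bJOIN\b": "SOQL has no JOIN keyword; use relationship queries instead",
}


def find_sql_isms(query: str) -> list[str]:
    """Return one warning for each SQL-only construct found in the query."""
    return [
        message
        for pattern, message in SQL_ISMS.items()
        if re.search(pattern, query, flags=re.IGNORECASE)
    ]


# A query written with SQL habits triggers both warnings.
print(find_sql_isms("SELECT * FROM Contract JOIN Opportunity ON Id = ContractId__c"))
# A plain SOQL query triggers none.
print(find_sql_isms("SELECT Id, CompanySignedDate FROM Contract"))
```

A check like this could run before a query is sent, so that the agent sees a targeted hint instead of a raw syntax error.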

In the next few sections, we describe various ideas that we explored for augmenting the agent with additional information and tools. We focus on the skill group of "Structured Data Querying & Numerical Computation" in CRMArena-Pro, as its tasks embody agentic procedural execution the most. The table below summarizes our main findings, which are explained in detail in the following sections.

Setup | 3 Unseen Tasks | Human Effort
SOQL/SOSL Only (Original CRMArena-Pro Setup) | 0.31 | None
+ Function Header Only: Task-Specific Functions (TSF) | 0.31 | High
+ Function Header Only: TSF + Refactored Subroutines | 0.32 | High
+ Full Function Implementation: Task-Specific Functions (TSF) | 0.48 | High
+ Full Function Implementation: TSF + Refactored Subroutines | 0.34 | High
+ Ground Truth Workflow: Technical Description | 0.72 | Medium
+ Ground Truth Workflow: Non-Technical Description | 0.54 | Low

5 Tasks: 0.33 with SOQL/SOSL only; ~91% with task-specific functions; Did Not Evaluate otherwise.

Beyond Raw SOQL and SOSL

In the CRMArena-Pro benchmark, LLM agents are by default restricted to using only two functions: SOQL and SOSL. Despite the versatility of these query languages in theory, the agents must handle tasks completely autonomously, starting from scratch and relying solely on them. Human setup time is intentionally kept close to zero, simulating a hands-off, fully self-reasoning agent.

By comparison, in the real world, teams can provide LLM agents with additional custom actions tailored to the tasks they care about. These can include domain-specific tools, scripts, or workflows. Teams may even ask LLMs to generate new actions on the fly, although today this often still requires human validation or expert-level coding to make them reliable.
With platforms like Agentforce, developers can accelerate this process by leveraging default action libraries and accessing existing metadata from their org. However, there is an important tradeoff:

How much human effort is required to define and refine these actions (e.g., prompt engineering, code writing, integration)?
How well do these actions perform on the core in-domain tasks that the agent was built for?
And critically, how well do they generalize to out-of-domain tasks that were not anticipated but may still be requested by users?

Finding the right balance between autonomy and setup effort is key to making LLM agents practical, scalable, and trustworthy in enterprise settings. To investigate the best ways to improve agent performance, we carefully study the characteristics of five tasks (handle time, transfer count, top issue identification, best region identification, and conversion rate comprehension), while leaving three others (monthly trend analysis, sales amount understanding, and sales cycle understanding) as challenge tasks to test the agent's generalization. As the first setup in the table shows, with raw SOQL/SOSL access (i.e., the original CRMArena-Pro setup), the agent achieves a performance of XX% on the former five tasks and 31% on the latter three tasks.

Higher-Level Functions Can Help, but With a Caveat

Our first exploration is to provide task-specific functions for agents to call. Writing these functions is time-consuming and requires expert programming knowledge, so we expect that in most situations they are provided for only a few tasks. At the same time, however, we want the model to understand the high-level goals through these functions. Therefore, our main evaluation is on a set of tasks that are not directly covered by these functions.

The most natural way of providing functions to agents is by showing the function headers, with an example below. This function finds the agent with the minimum or maximum average handle time of their assigned cases in a time period.

def find_agent_with_handle_time(start_date, end_date, min_cases, find_min=True):
    """
    Finds the agent with the specified handle time criteria.

    Parameters:
        start_date (str): Start date in 'YYYY-MM-DD' format.
        end_date (str): End date in 'YYYY-MM-DD' format.
        min_cases (int): Minimum number of cases the agent must have managed. All agents who handle (min_cases - 1) or fewer non-transferred cases will be excluded.
        find_min (bool): If True, find the agent with the minimum handle time. If False, find the maximum.

    Returns:
        str: The Id of the agent.
    """

We write one function for each of the five tasks that we studied, and the agent using them achieves a very high performance of 91%.

Things are quite different, however, on the three unseen tasks. When we provide only the function headers of these task-specific functions (TSF), the agent achieves a performance of 31%. This is the same performance as the agent with only raw SOQL/SOSL access in the original CRMArena-Pro setup, suggesting that directly exposing the function headers of these highly specialized functions is not helpful.

Given the monolithic nature of these functions, we hypothesize that providing more atomic subroutines may be beneficial. Thus, we ask GPT-4o (the LLM underlying our agent) to generate reusable subroutines for these high-level functions (with analogous header documentation). Then, we provide the headers of both the high-level functions and the subroutines for the agent to use. An example of such a subroutine is provided below.
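
For intuition about what sits behind such a header, here is a minimal sketch of how a handle-time function might work. It is not the implementation used in the experiments: it runs over in-memory records instead of Salesforce queries, it takes an extra cases argument, and the record keys (agent_id, closed_date, handle_minutes, transferred) are hypothetical.

```python
from collections import defaultdict
from datetime import date


def find_agent_with_handle_time(cases, start_date, end_date, min_cases, find_min=True):
    """Sketch over in-memory case records (hypothetical keys, not Salesforce fields)."""
    start, end = date.fromisoformat(start_date), date.fromisoformat(end_date)
    times_by_agent = defaultdict(list)
    for case in cases:
        closed = date.fromisoformat(case["closed_date"])
        # Assumption: transferred cases are ignored for every agent.
        if start <= closed <= end and not case["transferred"]:
            times_by_agent[case["agent_id"]].append(case["handle_minutes"])
    # Keep only agents with at least min_cases non-transferred cases.
    averages = {
        agent: sum(times) / len(times)
        for agent, times in times_by_agent.items()
        if len(times) >= min_cases
    }
    if not averages:
        return None
    pick = min if find_min else max
    return pick(averages, key=averages.get)


# Made-up sample data for illustration only.
cases = [
    {"agent_id": "A1", "closed_date": "2021-04-02", "handle_minutes": 30, "transferred": False},
    {"agent_id": "A1", "closed_date": "2021-04-10", "handle_minutes": 50, "transferred": False},
    {"agent_id": "A2", "closed_date": "2021-04-05", "handle_minutes": 20, "transferred": False},
    {"agent_id": "A2", "closed_date": "2021-04-12", "handle_minutes": 90, "transferred": False},
    {"agent_id": "A3", "closed_date": "2021-04-07", "handle_minutes": 5, "transferred": False},
    {"agent_id": "A3", "closed_date": "2021-04-08", "handle_minutes": 5, "transferred": True},
]
print(find_agent_with_handle_time(cases, "2021-04-01", "2021-04-30", min_cases=2))
```

With min_cases=2, agent A3 is excluded because only one of its cases is non-transferred. This is the kind of detail a header alone conveys only through its documentation.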

def query_accounts_by_ids(account_ids):
    """
    Fetches account details for a list of account IDs.

    Parameters:
        account_ids (list): A list of account IDs.

    Returns:
        dict: A dictionary mapping account IDs to account details.
    """

Interestingly, we observe only a very slight increase in performance, to 32%, when giving both types of function headers. After further analysis, we found that while the agent sometimes uses these subroutines correctly, the implementations of the subroutines (which are generated by GPT-4o) may be problematic, resulting in incorrect results or program crashes. Furthermore, since the source code is not exposed to the agent, the agent has extremely limited insight into the reasons for these errors and ways to correct them. Thus, we conclude that providing subroutines via header documentation only does not improve agent performance.

Motivated by the findings above, we next hypothesize that showing the full source code implementation could be beneficial, since the source code tells the agent not only what the functions do, but also how they work. Note that the agent is still not allowed to execute arbitrary code: it can only call the provided (high-level or subroutine) functions and raw SOQL/SOSL.

This turns out to be very helpful: the agent reaches 48% accuracy when it is provided with the full implementation of the high-level TSF functions. By contrast, because of bugs introduced in the refactoring process, the performance of the agent when provided with the (buggy) refactored function implementations, and taking them as the source of truth, regresses to 34%, though this is still slightly higher than the two setups with function headers only. The significantly higher performance with correct implementations suggests the utility of providing correct, detailed, and actionable guidance to agents, especially outside of their "natural habitats", i.e., in unfamiliar domains.
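
The difference between the "header only" and "full implementation" setups can be sketched as two ways of rendering the same source text for the prompt. This is an illustrative sketch, not the benchmark's tooling: the helper names header_only and tool_description are hypothetical, the function body in FULL_SOURCE is made up, and ast.unparse requires Python 3.9 or later.

```python
import ast

# Source text of a tool; the body is a made-up placeholder for illustration.
FULL_SOURCE = '''
def query_accounts_by_ids(account_ids):
    """Fetches account details for a list of account IDs."""
    # Hypothetical body, for illustration only.
    return {account_id: {"Id": account_id} for account_id in account_ids}
'''


def header_only(source: str) -> str:
    """Render each top-level function as signature plus docstring, without the body."""
    rendered = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            signature = f"def {node.name}({ast.unparse(node.args)}):"
            doc = ast.get_docstring(node) or ""
            rendered.append(f'{signature}\n    """{doc}"""')
    return "\n\n".join(rendered)


def tool_description(source: str, show_implementation: bool) -> str:
    """Choose what the agent sees: the full source or only the headers."""
    return source.strip() if show_implementation else header_only(source)


print(tool_description(FULL_SOURCE, show_implementation=False))
```

In the header-only rendering the agent learns what the function does; only the full rendering also shows how it does it, which is the information the 48% setup relies on.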

Workflow Description Is Very Helpful

Can we further improve the performance on the unseen tasks? A natural idea, motivated by how new human employees are trained for a job, is to let the agent observe the workflow of a specific, representative task and ask it to extrapolate and generalize. We experiment with two types of workflows. The first is a technical workflow, in which we fully describe the procedure required for a task. Below is the beginning of an example workflow.

Suppose that we want to answer the following query: Today's date: 2021-05-09. Determine the agent with the quickest average time to close opportunities in the last 6 weeks.

We use the following workflow to answer this query:

Today's date is 2021-05-09, so six weeks ago is 2021-03-28. When we talk about the time it takes to close or sign an opportunity, we are interested in all opportunities whose corresponding contract has a company signed date falling within the period of interest. Therefore, we first get all contracts with a company signed date within this time period. We want to retrieve the company signed date and the contract ID (which will be linked to the opportunity). So we execute the following SOQL query:

SELECT Id, CompanySignedDate FROM Contract WHERE CompanySignedDate != NULL AND CompanySignedDate >= 2021-03-28 AND CompanySignedDate

This query results in the following records:

{'Id': '800Wt00000DDfifIAD', 'CompanySignedDate': '2021-04-27'}
{'Id': '800Wt00000DE1T0IAL', 'CompanySignedDate': '2021-04-15'}
{'Id': '800Wt00000DE42gIAD', 'CompanySignedDate': '2021-04-29'}

Then, for each contract ID, we need to find the corresponding opportunity with this ContractId__c. We need to retrieve the OwnerId (which corresponds to the agent) and the created date of the opportunity. We use the following SOQL query:

(more text omitted)

Writing this workflow requires a human user to first study the task, write the SOQL/SOSL queries, and analyze the results. Naturally, the writer needs to have working knowledge of the database query language. Nevertheless, compared to providing the full task-specific functions, this is still much easier, since the human only needs to perform a demonstration for a concrete example, rather than laboriously coming up with a fully general function that covers all possible cases.

By comparison, the second workflow type that we provide is non-technical. For the same task, the excerpt below gives the entire workflow description in this non-technical manner.
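
The first step of the technical workflow, turning "the last 6 weeks" into a concrete date and a query filter, can be sketched as follows. The helper names are hypothetical, and only the lower bound of the date filter is built here. Note that SOQL date literals are written without quotes.

```python
from datetime import date, timedelta


def window_start(today: str, weeks: int) -> str:
    """Return the ISO date that lies `weeks` weeks before `today`."""
    return (date.fromisoformat(today) - timedelta(weeks=weeks)).isoformat()


def contracts_signed_since(start: str) -> str:
    """Build a SOQL query with a lower-bound date filter (unquoted date literal)."""
    return (
        "SELECT Id, CompanySignedDate FROM Contract "
        f"WHERE CompanySignedDate != NULL AND CompanySignedDate >= {start}"
    )


start = window_start("2021-05-09", weeks=6)
print(start)  # 2021-03-28
print(contracts_signed_since(start))
```

Six weeks is 42 days, and 42 days before 2021-05-09 is 2021-03-28, matching the date stated in the workflow.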

Suppose that we want to answer the following query: Today's date: 2021-05-09. Determine the agent with the quickest average time to close opportunities in the last 6 weeks.

We use the following workflow to answer this query:

Today's date is 2021-05-09, so six weeks ago is 2021-03-28. When we talk about the time it takes to close or sign an opportunity, we are interested in all opportunities whose corresponding contract has a company signed date falling within the period of interest. Therefore, we first get all contracts with a company signed date within this time period. We want to retrieve the company signed date and the contract ID (which will be linked to the opportunity).

Then, for each contract ID, we need to find the corresponding opportunity with this ContractId__c. We need to retrieve the OwnerId (which corresponds to the agent) and the created date of the opportunity.

By combining the two results, we can calculate the average closing time for each agent as the difference between the contract's company signed date and the opportunity's created date. In the end, we return the agent with the shortest average closing time.

As we can see, there is no SOQL/SOSL query and no presentation of the actual query result. Instead, only the high-level procedure is given. This description should be very easy to write for anyone with a working knowledge of the system, even if they are not familiar with the specific database query language.

With these two workflow formats, the agent achieves 72% accuracy when given technical workflows and 54% when given non-technical workflows, suggesting a strong ability to learn and generalize from just one instance.
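
The final combining step of this workflow can be sketched over in-memory records. The three contracts are the ones shown in the technical workflow; the opportunity records (owner IDs and created dates) are made up for illustration, dates are simplified to plain 'YYYY-MM-DD' strings, and the function name fastest_closer is hypothetical.

```python
from collections import defaultdict
from datetime import date

# The three contract records shown in the technical workflow.
contracts = [
    {"Id": "800Wt00000DDfifIAD", "CompanySignedDate": "2021-04-27"},
    {"Id": "800Wt00000DE1T0IAL", "CompanySignedDate": "2021-04-15"},
    {"Id": "800Wt00000DE42gIAD", "CompanySignedDate": "2021-04-29"},
]
# Hypothetical opportunity records; owner IDs and created dates are made up.
opportunities = [
    {"ContractId__c": "800Wt00000DDfifIAD", "OwnerId": "agent_a", "CreatedDate": "2021-04-07"},
    {"ContractId__c": "800Wt00000DE1T0IAL", "OwnerId": "agent_b", "CreatedDate": "2021-04-05"},
    {"ContractId__c": "800Wt00000DE42gIAD", "OwnerId": "agent_a", "CreatedDate": "2021-04-19"},
]


def fastest_closer(contracts, opportunities):
    """Return (agent with the shortest average closing time, all averages in days)."""
    signed = {c["Id"]: date.fromisoformat(c["CompanySignedDate"]) for c in contracts}
    days_by_agent = defaultdict(list)
    for opp in opportunities:
        signed_date = signed.get(opp["ContractId__c"])
        if signed_date is None:
            continue  # opportunity is not linked to a contract in the window
        created = date.fromisoformat(opp["CreatedDate"])
        days_by_agent[opp["OwnerId"]].append((signed_date - created).days)
    averages = {agent: sum(d) / len(d) for agent, d in days_by_agent.items()}
    if not averages:
        return None, averages
    return min(averages, key=averages.get), averages


best, averages = fastest_closer(contracts, opportunities)
print(best, averages)
```

On this sample data, agent_a averages 15 days (20 and 10) and agent_b averages 10 days, so agent_b is returned.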

Conclusion

Large Language Models (LLMs) struggle with specialized CRM tasks due to limited domain training data and insufficient business context, leading to errors in query syntax, schema confusion, poor ambiguity handling, and unfamiliarity with workflows. At Salesforce AI Research, we strive to make LLM agents better at CRM tasks, and to do so, we explore various ways to supplement LLM agents with domain-specific tools, technical workflow descriptions, or function implementations. We find that telling agents how to perform tasks, not just what to do, makes a significant difference, even without sophisticated function calling abilities. While raw SOQL/SOSL access yields low task accuracy (~31%), providing full function implementations or detailed workflows raises it, to as much as 72% with technical workflow descriptions. Even without technical workflows, their non-technical counterparts are still effective, as are, to a lesser extent, the implementation details of human-written functions. For future work, we will explore more ways for agents to learn passively from humans, or with a reasonable amount of human effort, as well as making them better at improving themselves by learning from their past mistakes.
