Think about it… What would it be like to have a customer service agent who works without sleep, understands everything you say, and handles 70% of your complaints on their own? That’s what every quality assurance (QA) team dreams of.
Now let’s look at the other side. Suppose that same agent sent the wrong refund policy to 10,000 customers overnight, without anyone’s approval. No test script caught the error. By the time the office opens the next working day, the damage is already done.
This is the biggest challenge every team using Salesforce might come across.
At Dreamforce 2024, Salesforce CEO Marc Benioff made this clear:
“Agentforce is the Third Wave of AI, advancing beyond copilots to a new era of intelligent, low-hallucination agents that set a new standard for accuracy and relevance, boosting productivity and customer satisfaction to unprecedented levels.”
It wasn’t just an announcement, but a clear indication of the changes to come. Agentforce is completely different from the chatbots or co-pilots of the past. It can find the necessary information on its own, prepare precise plans, and get the tasks done quickly, without waiting for every human instruction.
This is the beginning of a big change in the field of quality assurance (QA). Until now, we have been doing Salesforce testing in a time when software responded to humans. But in this new era of agents, software is acting like humans. There is a big gap between our testing methods and this new technology. It is through that gap that mistakes often occur.
To understand why your quality assurance (QA) practices need to change, you first need to understand what you are actually testing.
Agentforce agents can work autonomously on web portals and messaging platforms 24 hours a day, within the limits we set. While handling tasks on their own, they can also escalate complex issues to a human promptly. Their specialty is efficiency and accuracy.
All of this is controlled by the Atlas Reasoning Engine. This engine understands what a user wants, gathers the necessary information, and finds a solution to that problem on its own. This is not something that is limited to a few questions and answers like a regular chatbot. Rather, it is a system that can make its own decisions according to the situation at each stage.
Salesforce has developed this technology in four major phases over the past year, working with tens of thousands of customers:
These changes are happening at a rapid pace. With each new release, the way we test changes dramatically.
There’s a truth that many QA teams are reluctant to admit: your old test scripts are not effective in these new systems.
Older software was deterministic: it worked according to the exact rules we provided. Agentforce is different. Its Atlas Reasoning Engine reasons over each request and decides its own next step, so even if you give the same instruction, the result can change slightly each time. It doesn’t follow a fixed pattern the way typical Flows or Apex classes do, so you cannot predict exactly when and how an agent will respond. That is why accurate monitoring is required even after the system goes live.
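One way to test output that varies from run to run is to send the same prompt many times and assert on how often the reply meets a requirement, rather than comparing against one fixed string. The sketch below is a minimal illustration of that idea, not an Agentforce API: `fake_agent`, `mentions_30_days`, the 30-day figure, and the 95% threshold are all hypothetical stand-ins for a real agent call and a real business rule.

```python
import random
from typing import Callable


def pass_rate(agent: Callable[[str], str], prompt: str,
              check: Callable[[str], bool], runs: int = 20) -> float:
    """Send the same prompt `runs` times; return the fraction of replies that pass `check`."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check(agent(prompt)))
    return passed / runs


# Hypothetical stand-in for a real agent call: the wording varies between
# runs, but the underlying fact (30 days) stays the same.
_rng = random.Random(0)


def fake_agent(prompt: str) -> str:
    return _rng.choice([
        "You can return items within 30 days of delivery.",
        "Our return window is 30 days from the delivery date.",
        "Returns are accepted for 30 days after delivery.",
    ])


def mentions_30_days(reply: str) -> bool:
    # Check the fact the reply must contain, not its exact wording.
    return "30 days" in reply.lower()


rate = pass_rate(fake_agent, "What is your return policy?", mentions_30_days)
assert rate >= 0.95, f"only {rate:.0%} of replies stated the policy correctly"
```

The design choice here is that a test passes on meaning and frequency: different phrasings are acceptable as long as the required fact appears in nearly every reply.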
In short, the old routine of writing a script, running it, and checking that everything is fine does not work with agents.
In the past, if something went wrong somewhere, customers would complain, and we would have time to fix the bug. With autonomous agents, that comfort may not be there. No matter how accurate our instructions seem, in some cases these agents are likely to give false information (hallucination). If this is not caught in time, it can lead to a loss of customer trust and huge financial losses.
The safety net we have trusted for so long, traditional QA, no longer holds. The risk is very high. The scripts you have spent months preparing have become useless in this new era.
This new agentic change in Salesforce can lead to errors that we don’t see in our usual test plans. Here are the main areas your team should pay attention to from now on:
When testing Agentforce, you need to ensure two factors at once.
This is a completely different field. It requires new tools and new ways of thinking. The very meaning of a passing test is changing here.
What exactly is a QA strategy for modern agents?
Salesforce itself has launched the Agentforce Testing Center (ATC) with these changes in mind. It can generate hundreds of synthetic customer interactions from instructions written in plain, everyday language. For example, it can automatically generate a variety of questions that people are likely to ask a customer service agent and, at the same time, test how well the agent handles them.
Scenario-based tests, API mock-ups, and guardrails can all be tested through ATC. Whether you’re building an internal co-pilot or a large agent workflow, ATC can help you test with accuracy.
But Salesforce’s own tool also has some limitations. While it is great for initial batch testing, it may not be able to accurately simulate many complex behaviors in a production environment.
This is where AI-based testing platforms come in. Salesforce testing tools like testRigor are built for these non-deterministic and complex scenarios. They help teams test without the fear of scripts breaking frequently and without the need for constant manual work.
A complete agentic QA strategy should also include the following:
Let’s take a company that uses Agentforce service agents to help its customers around the world.
Just before the weekend, they made a small change to their product return policy: the return period was extended by three days. Treating it as a minor change that wouldn’t make a big difference, they updated the information in Salesforce. “This is a small change. It doesn’t need any special testing,” everyone thought, and went off to enjoy the weekend.
But by Sunday, the agent was still telling 40% of the customers the old policy. Why? Even with the new information in place, some old data and instructions were still left somewhere in the agent’s system. So when it came to specific questions, the agent kept going back to the old information.
So, what happened? Hundreds of customers were given incorrect information. When the office opened on Monday morning, there was a huge backlog of complaints. This led to a loss of trust in the company among customers.
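A small regression check run right after the policy update could have caught this before the weekend. The sketch below asks the same question in several phrasings and flags any reply that still states the old return window. Everything in it is an assumption for illustration: the 30-day and 33-day values, the question list, and `fake_agent`, which simulates an agent that falls back to stale content for one phrasing. It is not an Agentforce API.

```python
import re
from typing import Callable, Iterable, List, Set

OLD_WINDOW_DAYS = 30  # hypothetical old policy
NEW_WINDOW_DAYS = 33  # hypothetical new policy (old window plus three days)


def stated_windows(reply: str) -> Set[int]:
    """Extract every 'N days' or 'N-day' figure mentioned in a reply."""
    return {int(n) for n in re.findall(r"(\d+)\s*-?\s*days?\b", reply.lower())}


def find_stale_answers(agent: Callable[[str], str],
                       questions: Iterable[str]) -> List[str]:
    """Return the questions whose replies cite the old window or omit the new one."""
    stale = []
    for question in questions:
        windows = stated_windows(agent(question))
        if OLD_WINDOW_DAYS in windows or NEW_WINDOW_DAYS not in windows:
            stale.append(question)
    return stale


def fake_agent(question: str) -> str:
    # Hypothetical agent: one phrasing still triggers the outdated answer.
    if "refund" in question.lower():
        return "Refunds are available within 30 days of purchase."
    return "You can return any item within 33 days of purchase."


questions = [
    "How long do I have to return an item?",
    "What is the return window?",
    "Can I still get a refund after a month?",
]

stale = find_stale_answers(fake_agent, questions)
print(stale)
```

In this simulated run, only the refund phrasing is flagged, which mirrors the story above: the agent answers correctly for most wordings and falls back to old information for specific questions.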



