This study adds to a growing body of research warning about the risks of deploying AI agents in real-world financial decision-making. Earlier this month, a group of researchers from multiple universities argued that LLM agents should be evaluated primarily on their risk profiles, not just their peak performance. Current benchmarks, they say, emphasize accuracy and returns-based metrics, which measure how well an agent performs at its best but ignore how safely it can fail. Their research also found that even top-performing models are likely to break under adverse conditions.
The team suggests that in real-world finance, even a small weakness, such as a 1% failure rate, could expose the system to systemic risk. They recommend that AI agents be "tested" before being put to practical use.
Hancheng Cao, an incoming assistant professor at Emory University, noted that the price-negotiation study has limitations. "The experiments were conducted in simulated environments that may not fully capture the complexity of real-world negotiations or user behavior," Cao says.
Pei says that researchers and industry practitioners are experimenting with various strategies to reduce these risks. These include refining the prompts given to AI agents, enabling agents to use external tools or code, coordinating multiple models to double-check one another's work, and fine-tuning models on domain-specific financial data, which has shown promise in improving performance.
Many major AI shopping tools are currently limited to product recommendations. In April, for example, Amazon launched "Buy for Me," an AI agent that helps customers find and buy products from other brands' sites if Amazon doesn't sell them directly.
While price negotiation is rare in consumer e-commerce, it is more common in business-to-business transactions. Alibaba.com has rolled out a sourcing assistant called Accio, built on its open-source Qwen models, which helps businesses find suppliers and research products. The company told MIT Technology Review it has no plans to automate price bargaining, citing the high risk.
That may be a wise move. For now, Pei advises consumers to treat AI shopping assistants as aids, not as stand-ins for humans in decision-making.
"I don't think we are fully ready to hand over decisions to AI shopping agents," they say. "So perhaps use them as information tools, not negotiators."
Correction: We removed a line about agent fines.