Rethinking Tool Calling: Towards a Scalable Standard
The utility of a large language model (LLM) is directly tied to its ability to perform actions and access external information. This process, known as tool calling, enables agents to interact with services, read files, and access data beyond their training corpus. Early implementations of tool calling were often bespoke, requiring an agent to manually support each new tool. This led to a fragmented and unscalable ecosystem.
The Model Context Protocol (MCP) was proposed to address these challenges by introducing a standardized layer for agent-tool communication. MCP’s architecture relies on a client-server model in which all tool requests are proxied through an MCP server. While this approach brought much-needed standardization to the field, it also introduced new complexities and potential limitations. This article explores a different paradigm for agent-tool interaction, as proposed in the Universal Tool Calling Protocol (UTCP), which seeks to remove the mandatory middleman and allow for more direct and efficient communication.
The Model Context Protocol’s Architectural Hurdles
The MCP architecture is built around a centralized server that acts as a proxy between the agent and the tool. The agent communicates with the MCP client, which then sends the request to the MCP server. This server, in turn, translates and forwards the request to the actual tool. This design choice, while standardizing communication, creates a number of technical challenges:
- Wrapper Tax: The MCP’s reliance on a server requires a tool provider to create a custom “wrapper” server for every tool, even for those with existing, well-defined APIs like a command-line interface (CLI) or a simple HTTP endpoint. This introduces additional development effort and maintenance overhead for tool providers [1].
- Security Reinvention: When a tool is placed behind an MCP server, the tool provider is often forced to reimplement core services like authentication, permissions, and rate limiting. The existing, battle-tested security infrastructure of the native API is bypassed, and the agent must now trust an intermediary wrapper server, which may not be vetted, with sensitive credentials.
- Performance Inefficiency: The mandatory extra hop through the MCP server adds latency. In addition, complex data structures returned by the native tool are often flattened into strings by the MCP protocol, which obscures the response and loses rich data context.
- Scaling Challenges: As the number of tools grows, their definitions can fill the agent’s context window, requiring additional logic on the agent’s side to select only the most relevant tools. While solvable, this issue highlights the limitations of a system that passes all tool metadata through a single, context-constrained protocol.
These issues create a high cost of change for the MCP ecosystem. Any protocol updates require servers and clients to be rebuilt, which can make providers reluctant to adopt the protocol or spend resources on development.
Introducing the Universal Tool Calling Protocol (UTCP)
The Universal Tool Calling Protocol (UTCP) presents a different approach by re-evaluating the role of the middleman. Instead of a mandatory server, UTCP proposes a manual for tools. This manual, which is often a simple JSON file, contains the necessary instructions for the agent client to directly call the native endpoint of a tool.
The UTCP architecture works as follows:
- Discovery: The UTCP client obtains a “manual” for a tool (see the sketch after this list). This manual can be provided by the tool itself, a community member, or even generated automatically by an LLM from an existing standard like OpenAPI documentation.
- Direct Communication: The UTCP client uses the instructions in the manual to call the tool’s native API endpoint directly. The UTCP protocol effectively “gets out of the way” after the initial discovery phase.
- Leveraging Existing Infrastructure: This direct-calling model allows agents to leverage the tool’s existing, battle-tested security, authentication, billing, and rate-limiting systems. There is no need for tool providers to reinvent these core services.
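As a toy illustration of the discovery step, the snippet below fetches a manual from wherever a provider chooses to publish it and lists the tools it describes. The URL is purely hypothetical and the field names follow the example manual shown later in this article; this is not the official UTCP client library, just the idea that discovery amounts to reading a JSON document rather than standing up a server.

import requests  # assumes the requests package is available

# Hypothetical location where a provider publishes its UTCP-style manual.
MANUAL_URL = "https://example.com/utcp-manual.json"

manual = requests.get(MANUAL_URL, timeout=10).json()

# The manual is plain data: the client can inspect the available tools
# without any intermediary server being involved.
for tool in manual["tools"]:
    print(tool["name"], "-", tool["description"])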
This approach addresses the core limitations of the MCP:
- No Wrapper Tax: Tool providers do not need to build and maintain a separate server just to expose their existing API to an agent.
- Enhanced Security: The protocol defaults to using the native security of the tool’s API, eliminating the risk of a malicious or poorly secured intermediary server.
- Improved Efficiency: By removing the extra network hop, UTCP reduces latency and allows for the seamless transfer of rich, structured data.
UTCP is not an exclusive protocol; it is designed to be fully compatible with MCP. A UTCP client can be configured to call an MCP server, allowing for a hybrid approach in which MCP’s strengths (such as complex bi-directional communication for elicitation) can be leveraged when necessary.
Behind the Scenes: Architectural Contrasts
A direct comparison of the implementations shows the core philosophical difference between MCP and UTCP. The client-side code for both protocols is functionally similar, as both abstract the tool-calling process. The most significant difference lies in the tool provider’s implementation.
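To make that client-side similarity concrete, here is a rough sketch of a tool call through the Python MCP SDK; the server command and tool name are placeholders for the jobs example below. A UTCP client exposes a comparably shaped call, with the difference that it resolves to the tool’s native endpoint rather than to a wrapper server.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Placeholder: launch the provider's MCP wrapper server over stdio.
    params = StdioServerParameters(command="python", args=["my_job_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Every call is proxied through the wrapper server process.
            result = await session.call_tool("search_jobs", {"query": "data engineer"})
            print(result)

asyncio.run(main())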
MCP Provider Implementation:
To expose an API via MCP, a provider must implement a server that acts as a proxy. This involves writing code to receive MCP requests, translate them into the native API calls, and then format the native response back into the MCP protocol. For a simple jobs API, this requires a stateful process running on a server; a simplified sketch using the Python MCP SDK might look like this:
# A simplified example of an MCP server implementation, sketched here with
# the Python MCP SDK's FastMCP helper; my_job_api is a placeholder for the
# provider's existing API client.
from mcp.server.fastmcp import FastMCP

import my_job_api  # the native API that this server merely proxies

mcp = FastMCP("jobs")

@mcp.tool()
def search_jobs(query: str) -> str:
    """Proxy the agent's request to the native jobs API."""
    # Authentication, permissions, and rate limiting must be re-handled here,
    # in front of infrastructure the native API already provides.
    native_response = my_job_api.call_endpoint(query)
    # Rich response structures are typically flattened to text for the protocol.
    return str(native_response)

if __name__ == "__main__":
    mcp.run()
This server must be maintained, secured, and scaled.
UTCP Provider Implementation:
A UTCP provider, in contrast, simply provides a manual—a JSON file that describes how to call the existing API. This manual can be hosted statically or provided directly to the agent. It contains all the information needed for the client to make the call itself.
{
  "name": "OpenLibraryAPI",
  "description": "API for Open Library services",
  "tools": [
    {
      "name": "get_author_by_name",
      "description": "Finds an author by their name.",
      "protocol": "http",
      "endpoint": "https://openlibrary.org/search/authors",
      "method": "GET",
      "parameters": {
        "q": {
          "type": "string",
          "description": "The author's name"
        }
      }
    }
  ]
}
This approach shifts the burden from the tool provider to the client, which can be seen as a way of “getting out of the way” and allowing native APIs to be consumed directly, without extra infrastructure.
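For illustration, here is a minimal sketch of what a client does with such a manual. It is not the official UTCP client library; the manual file name is hypothetical, and only query-string parameters are handled. The client reads the manual, looks up the tool, and issues the HTTP request against the native endpoint itself.

import json

import requests  # assumes the requests package is available

def call_tool(manual_path: str, tool_name: str, arguments: dict) -> str:
    """Look up a tool in a UTCP-style manual and call its native endpoint directly."""
    with open(manual_path) as f:
        manual = json.load(f)
    tool = next(t for t in manual["tools"] if t["name"] == tool_name)
    # No intermediary server: the client talks to the tool's own API,
    # inheriting its existing authentication and rate limiting.
    response = requests.request(
        method=tool["method"],
        url=tool["endpoint"],
        params=arguments,  # query-string parameters, as described in the manual
        timeout=10,
    )
    response.raise_for_status()
    return response.text

# Example: resolve the agent's request using the manual shown above
# (the file name is a stand-in for wherever the manual is stored).
result = call_tool("open_library_manual.json", "get_author_by_name", {"q": "Ursula K. Le Guin"})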
My Thoughts
The debate between an opinionated, centralized protocol like MCP and a flexible, direct-calling approach like UTCP reflects a classic tension in software architecture. MCP’s standardized, server-based model offers a clear, secure boundary for enterprises. It provides a single point for implementing enterprise-wide guardrails, security policies, and auditing, which can be highly valuable in a corporate environment [1]. This opinionated approach can be a feature, not a bug, much like the rigid but highly functional framework provided by Kubernetes.
However, UTCP’s flexibility is a compelling proposition, particularly for integrating the vast ecosystem of existing APIs. The core premise—that an agent should interact with a tool in the same way a developer does—is both intuitive and pragmatic. The protocol’s ability to leverage existing security infrastructure for authentication and billing is a significant advantage, removing a major hurdle for adoption.
A key challenge for both protocols remains scalability. While the speaker claims UTCP addresses this with native search functionality based on tags and embeddings, the effectiveness of this approach in the wild is still being explored. The success of any tool-calling standard will ultimately depend on its ability to handle hundreds or thousands of tools efficiently without overwhelming the agent’s context window. The fact that a similar OpenAPI-based approach did not gain significant traction with ChatGPT suggests that a standard alone may not be enough; the protocol must be tightly integrated with the underlying agentic system to be truly useful.
Acknowledgements
Thank you to Razvan Ion Radulescu for the insightful talk [1] and for sharing his work with the community. Special thanks to the host, Darren, and the organization, Bevel, for hosting the discussion. The insights provided are invaluable to the broader MCP and AI community.
References