The core KPIs of LLM performance (and how to track them)
A few months ago, I built an MCP server for Toronto’s Open Data portal so an agent could fetch datasets relevant to a user’s question. I threw the first version together, skimmed the code, and everything looked fine. Then I asked Claude: “What are all the traffic-related data sources for the city of Toronto?” The tool call fired. I got relevant results. And then I hit an error: “Conversation is too long, please start a new conversation.” I had only asked one question — a single tool call had returned enough data to fill the entire context window.