Pope Leo got hung up on. Your metrics would have said the call was fine.


By every measure most contact centers still report on, the call that turned a Chicago bank into international news was a success. That tells you something about the scorecard.

When Father Tom McCarthy told a roomful of parishioners in Naperville that Pope Leo XIV had been hung up on by his Chicago bank, nobody in the audience expected the anecdote to go viral. Within a week, CBS had run the story, wire services that cover the Vatican had picked it up, and it was circulating in WhatsApp groups well outside the usual Catholic-media orbit.

The version most people heard goes like this: About two months after his election, the former Cardinal Robert F. Prevost rang his Chicago bank to update his phone number. He passed every security question. When he was told the change required an in-person visit, he explained he was living out of town. When pressed for more detail, he said, “Would it matter if I tell you I’m Pope Leo?” The agent hung up on him.

For contact center leaders, this is a story to laugh at and then contemplate for a long time. So forgive us, Father, for we have sinned: by the metrics most operations teams are still measured against, that call was a success.

What the standard scorecard would have said

Authentication completion: passed. The Pope answered the security questions correctly.

Adherence to policy: 100%. The bank’s identity-verification policy required an in-branch change. The agent followed it.

Average handle time: presumably within target. There is no indication that the call ran long.

Compliance: clean. The agent didn’t deviate from the script.

Quality assurance: any QA reviewer working off a standard scorecard would tick every box: greeting, verification, policy enforcement, professional sign-off. The QA team would have no reason to pull this call for review, and if they did, they would probably reach the same conclusion the agent presumably did: this was a prank call.

First-call resolution: this is where it gets interesting. By strict definition, the customer’s stated reason for calling was not resolved on the first call. Many operations, however, define FCR by whether the agent followed the correct path, or by whether the customer called back, not by whether the customer’s problem went away. And because the Pope “knew a guy” who could get it sorted, he never called back. By that looser definition, FCR holds.

What the standard scorecard missed

Customer effort was extreme. The Pope went through the verification process, was rebuffed by policy, escalated by invoking his identity and was hung up on. Then, he had to involve a third party to reach the bank’s president. By the time the issue was resolved, the matter had travelled up several layers of the bank’s hierarchy.

Reputational impact was substantial. The bank was unnamed in the original story, which arguably saved it. The customer was the Pope. The mainstream coverage gave retail banking another example of out-of-touch policy enforcement, on the front page, free of charge.

The signal to every other customer reading the story was clear: even if you are who you say you are, the call center may not be able to help you with a situation outside the script.

A second scorecard, alongside the first

The fix here is not to throw out compliance and handle-time metrics. Those exist for good reasons. I believe the fix is to add a second set of metrics alongside them, ones that capture the experiential dimension of a call. Here’s a short list of where I would start:

Customer effort tied to outcome. Effort scoring has been around for over a decade, but in many operations, it is still treated as a survey artefact rather than an operational metric. Effort should be inferred from call signals (repeats, escalations, transfers, channel switches) and reviewed alongside resolution status. A call with high effort and a policy-compliant ending is an amber flag, not a green one.
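To make “inferred from call signals” concrete, here is a minimal Python sketch. The four signals come from the paragraph above; the weights, the threshold, and the `CallSignals` and `review_flag` names are hypothetical and would need calibrating against real call data. It shows one way to stop a compliant ending from turning a high-effort call green.

```python
from dataclasses import dataclass

# Hypothetical weights and threshold: calibrate against your own call data.
EFFORT_WEIGHTS = {
    "repeats": 1.0,           # customer restates the same request
    "escalations": 2.0,       # customer asks for, or forces, a higher level
    "transfers": 1.5,         # call handed between agents or queues
    "channel_switches": 1.5,  # customer moves channel to finish the job
}
HIGH_EFFORT_THRESHOLD = 3.0


@dataclass
class CallSignals:
    repeats: int = 0
    escalations: int = 0
    transfers: int = 0
    channel_switches: int = 0
    policy_compliant: bool = True


def effort_score(call: CallSignals) -> float:
    """Weighted sum of the effort signals observed on one call."""
    return sum(weight * getattr(call, name) for name, weight in EFFORT_WEIGHTS.items())


def review_flag(call: CallSignals) -> str:
    """Flag a call on effort, so a compliant ending cannot make it green on its own."""
    if effort_score(call) < HIGH_EFFORT_THRESHOLD:
        return "green"
    if call.policy_compliant:
        return "amber"  # followed policy, but the customer paid for it in effort
    return "red"        # hypothetical extra tier: high effort and off-policy


# A Pope-style call, scored illustratively as one repeat and one escalation.
print(review_flag(CallSignals(repeats=1, escalations=1)))  # amber
```

The red tier for high-effort, off-policy calls is an assumption added for completeness; the scorecard described here only insists that the compliant, high-effort case lands on amber.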

Sentiment trajectory rather than sentiment score. A single sentiment score for a call tells you very little. The shape of sentiment over the call is more useful. A call that opens neutral, dips when the customer hits policy friction, and ends sharply negative is a different beast from a call where sentiment recovers after the friction is resolved.
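A rough sketch of reading the shape rather than the score, assuming the transcript has already been split into segments and each segment scored on a -1 to 1 scale by whatever sentiment model you use. The `trajectory` function and its 0.3 `dip` threshold are hypothetical; what matters is that it compares the opening, the low point and the ending instead of averaging them away.

```python
from typing import Sequence


def trajectory(scores: Sequence[float], dip: float = 0.3) -> str:
    """Classify the shape of per-segment sentiment scores (assumed to lie in -1..1).

    `dip` is a hypothetical threshold for how far sentiment must fall, or recover,
    before the change counts.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scored segments to describe a trajectory")
    opening, lowest, ending = scores[0], min(scores), scores[-1]
    if opening - lowest < dip:
        return "no dip"          # sentiment never fell meaningfully below the opening
    if ending - lowest >= dip:
        return "recovered"       # dipped at the friction point, then came back
    return "ended negative"      # dipped and never recovered


recovers = [0.0, -0.6, 0.3]
ends_badly = [0.3, 0.0, -0.6]
print(sum(recovers) == sum(ends_badly))              # True: same average score
print(trajectory(recovers), trajectory(ends_badly))  # recovered ended negative
```

The two example calls average out identically, yet one recovers after the friction and the other ends on its lowest point, which is exactly the difference a single score hides.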

Unresolved by customer intent. Most contact centers measure “unresolved” against the system’s view of the case. The customer’s view is usually different. Did the customer get what they came for, or did they leave the call still needing to solve their problem? In the Pope’s case, the answer is unambiguous. He left the call still needing to update his phone number. The system’s view of the call as “handled per policy” is irrelevant to that outcome.
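One way to put the customer’s view next to the system’s, as a minimal sketch. It assumes each call carries a hypothetical `intent_met` flag (captured, say, by a post-call question or a transcript classifier) alongside the case status; the status labels are placeholders. The metric is the share of calls the system counts as closed where the customer still left needing to solve their problem.

```python
from dataclasses import dataclass
from typing import Iterable

# Placeholder status labels for cases the system treats as closed.
SYSTEM_CLOSED = {"resolved", "handled_per_policy"}


@dataclass
class CallOutcome:
    system_status: str  # the case system's view, e.g. "handled_per_policy"
    intent_met: bool    # the customer's view: did they get what they called for?


def hidden_unresolved_rate(calls: Iterable[CallOutcome]) -> float:
    """Share of system-closed calls where the customer's intent was not met."""
    closed = [c for c in calls if c.system_status in SYSTEM_CLOSED]
    if not closed:
        return 0.0
    return sum(not c.intent_met for c in closed) / len(closed)


calls = [
    CallOutcome("handled_per_policy", intent_met=False),  # the Pope's call
    CallOutcome("resolved", intent_met=True),
    CallOutcome("resolved", intent_met=True),
    CallOutcome("open", intent_met=False),                # already visible as unresolved
]
print(round(hidden_unresolved_rate(calls), 2))  # 0.33
```

Open cases are left out on purpose: they already show up as unresolved, so counting them would blur the gap this metric is meant to expose.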

Policy-edge frequency. How often is policy applied to situations the policy was not designed for? This is the metric I see most operations teams miss entirely. Edge cases, by definition, fall outside the bell curve. They are statistically rare and individually expensive. A simple count of calls where the agent had to make a judgment call about whether the policy applied tells you where your policy needs work.
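A minimal sketch of that count, assuming agents or QA can tag a call whenever the agent had to judge whether a policy applied. The `PolicyEvent` record and the policy names are hypothetical. Reporting the raw count next to the share per policy shows which policies generate the most edge cases.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class PolicyEvent:
    policy: str          # hypothetical policy identifier
    judgment_call: bool  # agent had to judge whether the policy applied at all


def policy_edge_frequency(events: Iterable[PolicyEvent]) -> Dict[str, Tuple[int, float]]:
    """Per policy: (count of judgment calls, share of applications that were judgment calls)."""
    events = list(events)
    applied = Counter(e.policy for e in events)
    edges = Counter(e.policy for e in events if e.judgment_call)
    return {policy: (edges[policy], edges[policy] / n) for policy, n in applied.items()}


events = [
    PolicyEvent("in_branch_contact_change", True),
    PolicyEvent("in_branch_contact_change", False),
    PolicyEvent("in_branch_contact_change", True),
    PolicyEvent("in_branch_contact_change", False),
    PolicyEvent("card_replacement", False),
]
# Sort so the policies generating the most edge cases come first.
for policy, (count, share) in sorted(policy_edge_frequency(events).items(), key=lambda kv: -kv[1][0]):
    print(policy, count, share)
```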

Time to escalation. When an agent does decide to escalate, how long does it take, and how often does the customer have to ask twice? The Chicago bank story turned in part on whether the bank’s processes made escalation easy. The answer, evidently, was no, because the resolution involved a different party calling the bank president directly.
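Both questions can be answered from call events. The sketch below assumes you log when the customer first asked for escalation, when (if ever) it happened, and how many times they asked; the `EscalationRecord` layout is hypothetical. Calls that never escalated, like the Pope’s, are counted separately rather than dropped, so they do not quietly flatter the median.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class EscalationRecord:
    first_request_s: float        # seconds into the call when the customer first asked
    escalated_s: Optional[float]  # when the escalation actually happened; None if never
    requests: int                 # how many times the customer asked


def escalation_metrics(records: Sequence[EscalationRecord]) -> dict:
    """Median time to escalation, ask-twice rate and never-escalated rate."""
    if not records:
        raise ValueError("no escalation requests to measure")
    waits = [r.escalated_s - r.first_request_s for r in records if r.escalated_s is not None]
    return {
        "median_time_to_escalation_s": median(waits) if waits else None,
        "asked_twice_rate": sum(r.requests >= 2 for r in records) / len(records),
        "never_escalated_rate": sum(r.escalated_s is None for r in records) / len(records),
    }


records = [
    EscalationRecord(60.0, 120.0, requests=1),
    EscalationRecord(30.0, 330.0, requests=2),
    EscalationRecord(90.0, None, requests=1),  # asked once, call ended with no escalation
]
print(escalation_metrics(records)["median_time_to_escalation_s"])  # 180.0
```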

Why the scorecard ended up this way

There is a structural reason metrics look the way they do. The ones that survived are the ones you could measure cheaply with the tools available in the 2000s. Handle time, hold time, abandonment, occupancy. These came from telephony switches. Quality scoring came from sampling and human review. Customer effort and sentiment came along later but were treated as periodic survey questions rather than continuous signals.

What has changed is the data. Many contact centers now have call recordings with transcripts, sentiment scoring, journey data from the IVR and the agent desktop, and a customer record with previous interactions. The data is there. The metrics that surface from that data have not kept pace.

Pope Leo’s call is a small story and a useful one. It works as an example because the customer was, briefly, unambiguous. He was who he said he was, his situation was unusual but legitimate, and the bank’s process failed him while looking, from the inside, as if everything had gone correctly.

My child, the lesson for you is to ask what your scorecard would have said about that call. Now go, say ten Hail Marys, and design a new one that would catch what the old one missed.

Until next time and as always, hooroo.
