How I Learned to Speak MOS

Unpacking the meaning behind MOS

I have a confession. I first came across the term MOS (Mean Opinion Score) when I did my first Amazon Connect implementation.

Truthfully, I only knew that it was a way to measure voice quality. I even built part of the business case to shift to the cloud around how Amazon Connect provided best-in-class MOS. At the time, that’s all I needed to know. I didn’t need to know the technicalities behind it, but here at Operata it’s one of the things we measure with our CX Observability platform, so my curiosity about how it is measured drove me to dig into MOS in much greater depth.

Forewarning: I get a little technical in this one, but if you are in contact center ops or IT, I think it is worth getting your head around (like I probably should have). At the very least, it will help with your BS detector when someone tries to dazzle you with jargon.

When someone asks you to rate voice quality on a scale of 1 to 5, they are really asking for the Mean Opinion Score, or MOS. That somewhat familiar question hides a world of behind-the-scenes data, audio processing, and network hiccups. This has nothing to do with quality monitoring of a contact center agent's interaction, which is based on process, compliance and tone. MOS is purely a technical measure of the clarity of the voice data. 

In this article, I will take you inside the ITU-T E-Model behind modern MOS measurement (which is what Operata uses) to translate jitter, latency, and packet loss into a single, human-friendly MOS value. I’ll also explain the R-factor at the heart of the model, compare alternative MOS methods, and show you how observability uses MOS to inform your contact center and IT operations about voice quality.

What is R-Factor?

Before I explain MOS, I want to explain the R-factor, which is an objective measure that takes into account many technical aspects, including signal-to-noise ratio, latency, and packet loss. The E-Model (ITU-T G.107) turns what the network is doing into a single quality index called R.

It is used within the MOS calculation of voice quality, quantifying call quality on a scale from 0 to 100, with higher values indicating better quality.

How R-Factor is Calculated:

R = Ro - Is - Id - Ie + A

Where:

  • Ro: Represents the basic signal-to-noise ratio.
  • Is: Accounts for impairments that occur simultaneously with the voice signal, such as background noise.
  • Id: Represents impairments due to delay, including latency and echo.
  • Ie: Reflects impairments caused by the codec and network equipment.
  • A: The advantage factor, which compensates for user expectations in certain scenarios, like mobile usage.
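
To make that formula concrete, here is a minimal Python sketch that composes R from its impairment terms and then converts R to an estimated MOS using the standard ITU-T G.107 mapping. The component values below are made up purely for illustration; they are not defaults or values Operata uses.

    def r_factor(ro, i_s, i_d, i_e, a):
        # R = Ro - Is - Id - Ie + A
        return ro - i_s - i_d - i_e + a

    def r_to_mos(r):
        # Standard ITU-T G.107 conversion from R to estimated MOS.
        if r < 0:
            return 1.0
        if r > 100:
            return 4.5
        return 1.0 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

    # Illustrative numbers only, not measured values:
    r = r_factor(ro=93.2, i_s=1.4, i_d=7.1, i_e=11.0, a=0.0)
    print(round(r, 1), round(r_to_mos(r), 2))  # R = 73.7, MOS ≈ 3.77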

What is MOS?

MOS, or Mean Opinion Score, is a subjective measure of how good or bad a voice sounds under different data network conditions.

Early researchers recruited panels of listeners to rate call quality on a 1 to 5 scale (1 being unusable and 5 being perfect) as the data network used to carry voice was gradually degraded.

Subjective listening tests remain the ground truth, but they are slow and expensive. You cannot scale up hundreds of participants every time you want to spot network issues. That is why we calculate an estimated MOS in real time by using network metrics recorded on the agent’s computer. We approximate what a human listener would report. This automated MOS gives you continuous observability without interrupting callers or agents.

The estimated MOS is derived from three key data network performance metrics (there is a short sketch of the calculation after this list):

  • Round-Trip Time (RTT): how long it takes for data to go out and come back

  • Jitter: how much the timing of those data packets wiggles around

  • Packet Loss: how many pieces of the conversation get dropped
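
Here is a small sketch of how those three metrics can be folded into an estimated MOS. It uses a widely circulated simplified approximation of the E-Model, not the full G.107 computation, and not necessarily the exact formula Operata implements; the constants are the commonly quoted ones.

    def estimate_mos(rtt_ms, jitter_ms, packet_loss_pct):
        # One-way delay plus a jitter-buffer allowance and a small processing overhead.
        effective_latency = rtt_ms / 2 + jitter_ms * 2 + 10.0

        # Delay impairment: the penalty grows much faster past ~160 ms.
        if effective_latency < 160:
            r = 93.2 - effective_latency / 40
        else:
            r = 93.2 - (effective_latency - 120) / 10

        # Loss impairment: roughly 2.5 R points per 1% of packet loss.
        r -= packet_loss_pct * 2.5

        # Same R-to-MOS mapping as in the earlier sketch.
        r = max(0.0, min(100.0, r))
        return 1.0 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

    print(round(estimate_mos(rtt_ms=60, jitter_ms=5, packet_loss_pct=0.5), 2))    # ≈ 4.36, sounds good
    print(round(estimate_mos(rtt_ms=500, jitter_ms=60, packet_loss_pct=5.0), 2))  # ≈ 2.82, callers will notice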

MOS Score:

To put this into context, here are some call samples of what different MOS values sound like:

Other Ways to Measure Voice Quality on a Data Network

While the E-Model is the go-to for live monitoring, several other standards exist to predict or measure voice quality:

Subjective Listening Tests (ITU-T P.800)
These are controlled lab experiments where human listeners rate call quality on a 1–5 scale. They remain the gold standard for accuracy but are slow, costly, and do not scale to production environments.

Full-Reference Objective Algorithms
These require both the original clean signal and the degraded signal to compute a perceptual difference. Two leading ITU-T standards are:

  • PESQ (P.862)
    Perceptual Evaluation of Speech Quality was standardized in 2001 for narrowband telephone networks and codecs. It aligns closely with subjective tests by temporally aligning reference and test signals, applying a perceptual model, and mapping the result to a MOS-like score. 
  • POLQA (P.863)
    Perceptual Objective Listening Quality Assessment entered into force in 2011 and was updated in 2018. It extends PESQ’s approach to wideband (up to 14 kHz) and full-band (up to 24 kHz) audio, handling HD-Voice codecs and complex delay variations. POLQA remains a full-reference model, offering highly accurate MOS predictions for next-generation voice services.

    (We have a full article explaining the difference here)

No-Reference Objective Algorithms
These require only the degraded signal without a clean reference.

  • P.563 is an ITU-T standard for single-ended narrowband speech quality estimation. It trades some accuracy for ease of deployment when you cannot capture or synchronize a reference stream.

Non-Intrusive Network Models (E-Model G.107 and extensions)
I learned that Operata uses the E-Model for a few good reasons. Its wideband/fullband extension estimates MOS solely from packet metrics, which means you get real-time, continuous quality scores without impacting live calls, making it perfect for CX Observability in your contact center.

In most contact center observability scenarios, the E-Model strikes the best balance of accuracy, scalability, and simplicity. You can, however, schedule periodic PESQ or POLQA tests for deeper validation in more controlled testing scenarios.

Why CX Observability matters

Our CX observability platform tracks MOS alongside the metrics that can impact voice quality, such as CPU, memory and network throughput.

These metrics tell you how your systems perform, but not how your callers experience voice quality. Operata CX observability bridges the gap.

  1. Early detection of caller pain
    While exploring this, I observed in one scenario MOS dropping from 4.5 to 3.2 over ten minutes. By correlating that with a surge in jitter, the team identified a misconfigured network switch, and this was even before agents started filing support tickets.

  2. Prioritization by user impact
    A CPU spike is a technical alert. A MOS below 3.5 is a customer crisis. Ranking issues by MOS ensures you fix what hurts callers first.

  3. Focused troubleshooting
    Because the E-Model penalty terms are separate, you can see whether delay, jitter, or packet loss drives the quality loss. That tells you whether to dig into routing, firewall rules, or your VoIP gateway (there is a small triage sketch after this list).

  4. Shared language across teams
    When contact-center ops and network teams use MOS as the common metric, you avoid the blame game. Everyone speaks the same quality language.
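
In practice, the triage mentioned in point 3 can be automated. Here is a minimal, hypothetical Python sketch of the idea: flag calls whose MOS falls below a threshold, then point the responder at whichever impairment term contributes most. The record format and field names are made up for illustration and are not Operata's data model.

    # Hypothetical call records; field names are illustrative only.
    calls = [
        {"call_id": "a1", "mos": 4.3, "delay_penalty": 1.0, "jitter_penalty": 0.5, "loss_penalty": 0.2},
        {"call_id": "b2", "mos": 3.1, "delay_penalty": 2.0, "jitter_penalty": 9.5, "loss_penalty": 1.0},
        {"call_id": "c3", "mos": 2.6, "delay_penalty": 1.5, "jitter_penalty": 2.0, "loss_penalty": 14.0},
    ]

    MOS_ALERT_THRESHOLD = 3.5  # below this, treat the call as caller-impacting

    for call in sorted(calls, key=lambda c: c["mos"]):  # worst experience first
        if call["mos"] >= MOS_ALERT_THRESHOLD:
            continue
        penalties = {k: call[k] for k in ("delay_penalty", "jitter_penalty", "loss_penalty")}
        dominant = max(penalties, key=penalties.get)  # which impairment hurts most
        print(f"{call['call_id']}: MOS {call['mos']} - investigate {dominant}")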

Pair MOS with call volume, agent status, and trunk utilization to get a 360-degree view. You’re unlikely to ever miss a silent quality degradation again.

The Wrap Up

I warned you I was getting technical on this one, but for good reason. Understanding the performance of your data network, for every second of each call, is the foundation of understanding how good your customer and agent audio will sound.

MOS is the voice quality KPI that both contact center and IT teams can rally around, and the E-Model’s R-factor turns packet data into a number everyone understands.

You can use MOS to prioritize incidents, verify fixes, and trend improvements. When you need gold-standard checks, run PESQ or POLQA.

Ready to start speaking MOS? Try Operata’s free trial today. See MOS live, set intelligent alerts, and solve voice quality issues before your callers notice.

Until next time and always,

Hooroo

Luke Jamieson brings over twenty years’ experience in contact centre leadership, having overseen some of Australia’s most successful operations and guided major CCaaS cloud migrations for leading organisations. Despite his track record, he remains committed to learning and is keen to share his discoveries each week at Operata. See more discoveries here.

Article by Luke Jamieson, published August 20, 2025, in Learnings from Operata.