POLQA Vs PESQ

Objective quality scoring explained.

The differences between ITU objective quality scoring standards and why you should care.

PESQ and POLQA are widely adopted ITU standards for scoring audio quality across telephony networks.

They have their origins in ITU-T’s family of full-reference objective voice quality measurements, which started in 1997 with P.861 (PSQM); this was superseded by P.862 (PESQ) in 2001 [Wikipedia]. PESQ was originally developed to test narrowband networks. For WebRTC and IP-based calls, ITU-T P.862 PESQ was effectively superseded in 2010 by ITU-T P.863 POLQA.

Fig. 1: MOS as a function of packet loss and jitter w/ PCM codec (PESQ algorithm [L], POLQA algorithm [R]), Slavata & Holub, 2013.

For modern test requirements, POLQA is strongly recommended because it measures wideband and super-wideband audio accurately and is suited to advanced IP-based networks.

When testing modern IP networks, the only reason to test with PESQ is to compare new results with earlier ones. Even then, be aware that PESQ scores measured over IP services may be unreliable. For modern stacks, use POLQA for objective measurement; it is appropriate for WebRTC quality scoring.

Let's explain a little more:

Both PESQ and POLQA use an algorithm that predicts the score a human listener would give in a subjective test. However, the listening conditions for narrowband (NB) and wideband (WB) codecs are not comparable: PESQ uses the NB listening conditions, whereas POLQA uses WB listening conditions.
POLQA is tuned to respect modern codec behavior, including error correction, whereas PESQ is not, nor is it designed to be used on IP-based networks.
PESQ cannot evaluate speech above 7 kHz (popular codecs such as Opus have an 8 kHz audio bandwidth in wideband mode).
PESQ cannot resolve 'time-warping' (variable playback speed used for error correction) correctly and therefore tends to give pessimistic scores for WB codecs. POLQA tracks time-warping and gives realistic scores when it occurs.
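
To make the bandwidth point concrete, here is a minimal Python sketch of the decision described above. It is an illustration only, not part of either ITU standard or of any tool: the helper names (pesq_covers, recommend_algorithm) and the legacy_comparison flag are hypothetical, the 7 kHz ceiling is the figure quoted in this article, and the Opus audio bandwidths are taken from RFC 6716.

```python
# Upper limit of speech that PESQ can evaluate, as quoted in this article.
PESQ_MAX_BANDWIDTH_HZ = 7_000

# Audio bandwidth of each Opus mode, per RFC 6716.
OPUS_BANDWIDTH_HZ = {
    "NB": 4_000,    # narrowband
    "MB": 6_000,    # medium-band
    "WB": 8_000,    # wideband
    "SWB": 12_000,  # super-wideband
    "FB": 20_000,   # fullband
}


def pesq_covers(audio_bandwidth_hz: int) -> bool:
    """Return True if the codec's audio bandwidth fits under PESQ's ceiling."""
    return audio_bandwidth_hz <= PESQ_MAX_BANDWIDTH_HZ


def recommend_algorithm(audio_bandwidth_hz: int, legacy_comparison: bool = False) -> str:
    """Hypothetical helper: pick a scoring algorithm for a test run.

    POLQA is the default. PESQ is suggested only when the goal is to
    compare against earlier PESQ results AND the codec's bandwidth is
    within what PESQ can evaluate.
    """
    if legacy_comparison and pesq_covers(audio_bandwidth_hz):
        return "PESQ"
    return "POLQA"


if __name__ == "__main__":
    for mode, bandwidth in OPUS_BANDWIDTH_HZ.items():
        print(mode, bandwidth, recommend_algorithm(bandwidth, legacy_comparison=True))
```

Even with legacy_comparison set, the sketch falls back to POLQA for Opus in wideband mode and above, because an 8 kHz audio bandwidth already exceeds what PESQ can evaluate.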

"POLQA is the correct method of testing when working with wideband(WB) and super wideband(SWB) codecs such as Opus."
— John Mitchem, Co-founder Operata.

Simply put, for VoIP and WebRTC testing, stick with POLQA. PESQ can cause false positives, and its scores should be considered erroneous when used on wideband networks. When comparing quality monitoring and testing tools, make sure they support POLQA for objective quality scoring.

To learn more, take a deep dive into the research that compares the algorithms.
