Consistency Counts: Why 3UK Did So Well in OFCOM’s Tests and EE is Not Impressed

In this week’s news, we noted the row between EE and OFCOM over the regulator’s recent Mobile Broadband Performance Report, an exercise to benchmark the data rates, latencies, and Web page load times on the UK MNOs’ 3G and 4G networks. EE doesn’t agree with OFCOM’s results, and is angry enough that it threatened legal action. But what are lawyers doing in this conversation? This ought to be a technical problem that we can solve with data!

The results are interesting. EE is the fastest operator, by a distance, on the straightforward large-file speed tests, and 3UK is the slowest. That may seem surprising if you’ve read this Telco 2.0 Executive Briefing. However, 3UK won by an impressive margin on the Web page test (Figure 22 below) and by a lesser margin on the ping test. EE did quite poorly on the Web page test.

Average 4G and 3G web browsing speed
But what really got EE’s goat was this chart:

Distribution of 4G web browsing speeds by network
EE had a startlingly high rate of time-outs. And you might wonder how 3UK could possibly have won the Web page test given that they are well behind everyone else on the pure speed test.

Average 4G and 3G HTTP download speeds by provider
Here’s the problem, though. The pure speed test is actually quite an artificial measure – how often do you want to download, or upload, 2GB of digits pulled from a pseudo-random number generator as fast as you possibly can? One of the most artificial things about it is that it assumes the data you want is a single file, transferred as one bulk download rather than as streaming media or something interactive, so peak speed dominates and latency doesn’t matter much. Essentially, it measures the performance of a single long-running TCP connection.

But Web pages, or apps’ interactions with servers, or VoIP calls, aren’t much like that. This blog post makes the point with vigour. Modern Web pages consist of a large number of requests for server resources, often from several different servers. As the post makes clear, this means that averages are of only limited use.

If there are 69 or more GETs in the page, it is statistically more likely than not that any given page load will encounter the 99th percentile of performance – that is, assuming request times are independent, an actual majority of page loads will include at least one GET that falls in the worst 1%. Further, Murphy’s law suggests that the request that takes forever to respond will be something important, like a JavaScript library that’s required for the page to work.
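
To check that arithmetic, here is a minimal sketch in Python, assuming each GET independently has a 1% chance of landing in the worst percentile:

```python
# Probability that a page load of n independent GETs hits at least one
# request from the worst 1% of the latency distribution.
from math import ceil, log

def p_bad_request(n_gets: int, tail: float = 0.01) -> float:
    return 1 - (1 - tail) ** n_gets

# Smallest page for which a worst-1% request is more likely than not:
n = ceil(log(0.5) / log(1 - 0.01))
print(n, round(p_bad_request(n), 4))    # 69 -> 0.5002
print(75, round(p_bad_request(75), 4))  # a full Kepler page -> ~0.5294
```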

ETSI has developed a standard page to test this sort of thing, called Kepler, that contains a typical mix of text files (like JavaScript libraries, or just…text), CSS styles, images of various sizes, and incompressible binary data (like a Flash object or HTML5 video). It doesn’t do anything clever: once extracted into a public directory on a web server, it simply provides a representative target for requests. You can get it here. The full ETSI Kepler page includes…75 requests, and the lighter Mobile Web mKepler includes 22.
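
For a feel of what such a test involves, here is a hedged sketch of a Kepler-style measurement – the host name and resource list are invented, since in practice you would unpack the ETSI page onto your own server and enumerate its contents:

```python
# Fetch each resource of a Kepler-style page and report the total time and
# the slowest individual GET. A real browser parallelises these requests,
# so sequential timings overstate the total, but the worst single GET is
# exactly the tail effect discussed above.
import time
import urllib.request

BASE = "http://testserver.example/kepler/"           # hypothetical server
RESOURCES = ["index.html", "style.css", "lib.js",
             "img01.jpg", "video.bin"]               # ...Kepler has 75 in all

def timed_get(url: str) -> float:
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.monotonic() - start

timings = {path: timed_get(BASE + path) for path in RESOURCES}
print("total time:", round(sum(timings.values()), 3), "s")
print("slowest GET:", max(timings, key=timings.get))
```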

This isn’t some weird caprice on ETSI’s part. The Google homepage includes 31 requests. Facebook is 178. Amazon.com is 190. It is what it is. If anything, this suggests mKepler is probably too light these days and needs to be closer to the mainline Kepler.

And it’s not only latency that is affected, but also speed. OFCOM reports that the differences in its speed data between operators are statistically significant at the 95% confidence level. Loosely speaking, only the slowest 1 in 20 requests on EE overlapped the fastest 1 in 20 on Vodafone. But if there are 75 requests in a page, then on average 3.75 of them (i.e. 75/20 – call it four) will be no faster, and possibly slower, than the competition, and an actual majority of users won’t perceive any effective speed boost – unless, that is, they take it into their heads to stop everything else and download 2GB of random numbers in a single file.
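
Extending the earlier sketch to the 5% tail shows why, under the same independence assumption:

```python
# With 75 GETs per page and the slowest 5% of requests showing no speed
# advantage, almost every page load contains at least one such request.
n, tail = 75, 0.05
print(n * tail)                # 3.75 slow requests per page, on average
print(1 - (1 - tail) ** n)     # ~0.98: nearly all page loads hit at least one
```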

The way TCP works also means that a connection only reaches maximum performance after some time: the congestion window starts small and ramps up, a mechanism known as slow start. This is why pure speed tests use large files. With small data transfers, it’s unlikely that the peak data rate will be reached before the transfer completes. As a result, with lots of small requests, latency accounts for a bigger proportion of the total load time, and jitter – the variation in latency – becomes very important, as we have just seen. 3UK had lower latency on 3G than any other carrier and was only just pipped by EE on 4G. But where it really shone was in the consistency of its performance. In fact, it looks like 3UK tuned its network deliberately to achieve consistent performance rather than trying to beat the speed-gun.
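
A toy slow-start model makes the point, assuming a typical 10-segment initial window and 1460-byte segments, and ignoring loss and the congestion-avoidance phase:

```python
# Round trips needed to deliver an object when the congestion window starts
# at 10 segments and doubles each RTT. Small objects finish long before the
# window -- and therefore the throughput -- reaches its peak.
SEGMENT_BYTES = 1460
INITIAL_WINDOW = 10  # segments

def rtts_to_transfer(size_bytes: int) -> int:
    sent, window, rtts = 0, INITIAL_WINDOW, 0
    while sent < size_bytes:
        sent += window * SEGMENT_BYTES
        window *= 2
        rtts += 1
    return rtts

for size in (15_000, 100_000, 2_000_000_000):      # small JS, an image, 2GB
    print(f"{size:>13,} bytes: {rtts_to_transfer(size)} RTTs")
```

In reality the window stops growing once it fills the link, so a 2GB transfer spends nearly all its time at peak rate, while the small objects above complete in two or three round trips and their load time is almost pure latency – which is exactly why the two kinds of test reward different things.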

Distribution of 4G latency by network Q4 2014
EE’s response, as reported by Total Telecom, is as follows:

According to our EE insider, this suggests that the anomalies were caused by either an issue with Ofcom’s test software, the server hosting the mKepler Website, or the interoperability between that server and EE’s network.

So, let’s look at the methodology OFCOM used. They used identical Samsung Galaxy Note 3 devices, running the stock firmware and operating system, with uncapped data allowances (this was surprisingly difficult to achieve), and a test-and-measurement tool from Anite, called Datum.

The test sites were selected to generate 50 observations for each metric, distributed pseudorandomly within a 4km radius of the centre of various British cities. If either 3G or 4G coverage for the carrier in question wasn’t available at a selected test site, the test engineer would pick another. This all seems fair enough. The test tool was programmed to carry out download and upload speed tests using HTTP GET and POST requests over 2GB of random data, standard ICMP pings, and requests for the Kepler page.
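
OFCOM doesn’t publish its placement algorithm at this level of detail, but a typical way to scatter sites uniformly within a 4km radius looks like this – the square root on the radius is what keeps points from clustering at the centre:

```python
# Draw a pseudorandom test site uniformly over a disc of given radius
# around a city centre. Coordinates use the rough 111 km-per-degree
# approximation, which is plenty for a 4 km circle.
import math
import random

def random_site(centre_lat: float, centre_lon: float, radius_km: float = 4.0):
    r = radius_km * math.sqrt(random.random())   # sqrt -> uniform over area
    theta = random.uniform(0.0, 2.0 * math.pi)
    dlat = (r * math.cos(theta)) / 111.0
    dlon = (r * math.sin(theta)) / (111.0 * math.cos(math.radians(centre_lat)))
    return centre_lat + dlat, centre_lon + dlon

sites = [random_site(51.5074, -0.1278) for _ in range(50)]   # e.g. London
```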

This strikes us as sound. All the tests targeted the same server, the random data could not be systematically different by definition, the hardware and software involved were identical, the site placement was designed to be fair, and the Web page – and the server hosting it – was the same for everyone. Tests on the different networks were carried out at the same time, so we can rule out the possibility that some sort of event hit the server while EE was being tested.

What does EE mean by “interoperability” between the server and their network? One thing that might explain high latency and high variation in latency would be a problem in the Internet path between EE’s Gi interface and the OFCOM web server: a congested link, sub-optimal routing, or unstable routing. But this isn’t something you can shrug off like the weather – fixing these issues is precisely the business of ISP engineers.
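
Such problems would show up plainly in the latency data. As a hedged illustration – the numbers below are invented – a couple of congested-path outliers are enough to wreck the mean and the jitter while barely moving the median:

```python
# Summarise latency and jitter from a list of ping round-trip times (ms).
# Two outliers out of eight samples dominate both the mean and the spread.
from statistics import mean, median, stdev

rtts_ms = [38.2, 41.0, 39.5, 120.4, 40.1, 39.8, 310.0, 41.3]  # hypothetical

print("median:", median(rtts_ms), "ms")           # ~40 ms: the typical case
print("mean:  ", round(mean(rtts_ms), 1), "ms")   # dragged up by the tail
print("jitter:", round(stdev(rtts_ms), 1), "ms (stdev)")
```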

In our Differentiated Mobile Data Executive Briefing, we identified 3UK’s investment in fibre-to-the-cell backhaul as a major reason for its commercial success. We also found that 3UK, and other highly successful mobile data operators, had a small but noticeable lead on some core Internet metrics. In the following chart, 3UK, Telenor Sweden, and Free are all in our highly successful group, and they appear to sit between one and two networks closer to the Internet core, as measured by AS_PATH length.

Average AS path length
We didn’t think this was a very strong result at the time, but the OFCOM report makes us wonder. However, most EE address space is in Orange’s AS12576, for which we get an average path length of 3.9, just like 3UK. EE gets most of its upstream connectivity from either Orange (France Telecom Opentransit) or DTAG, but it also has a presence at the LINX. We’d therefore tend to rule out an Internet routing issue.
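
For the record, here is roughly how such a metric can be computed – a sketch assuming you have already extracted AS_PATH strings for routes towards the operator’s prefixes from a looking glass or a RouteViews table dump:

```python
# Average AS_PATH length towards a network. Consecutive duplicate ASNs
# (path prepending, common for traffic engineering) are collapsed so the
# figure reflects actual inter-network hops.
def path_length(as_path: str) -> int:
    hops = []
    for asn in as_path.split():
        if not hops or hops[-1] != asn:
            hops.append(asn)
    return len(hops)

paths = [                      # hypothetical AS_PATHs, space-separated ASNs
    "2914 3257 12576",
    "6939 5511 12576",
    "3356 3356 3356 12576",    # prepending counts once
]
print(sum(map(path_length, paths)) / len(paths))   # -> ~2.67
```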