Telco Cloud: Why it hasn’t delivered, and what must change for 5G

Related Webinar – 5G Telco Clouds: Where we are and where we are headed

This research report will be expanded upon in our upcoming webinar, 5G Telco Clouds: Where we are and where we are headed. In this webinar we will argue that 5G will only pay off if telcos find a way to make telco clouds work. We will address the following key questions:

  • Why have telcos struggled to realise the telco cloud promise?
  • What do telcos need to do to unlock the key benefits?
  • Why is now the time for telcos to try again?

Join us on April 8th 16:00 – 17:00 GMT by using this registration link.

Telco cloud: big promises, undelivered

A network running in the cloud

Back in the early 2010s, the idea that a telecoms operator could run its network in the cloud was earth-shattering. Telecoms networks were complicated and highly bespoke, and therefore expensive to build and operate. What if we could find a way to run networks on common, shared resources – like the cloud computing companies do with IT applications? This would be beneficial in a whole host of ways, mostly related to flexibility and efficiency. The industry was sold.

In 2012, ETSI started the ball rolling when it unveiled the Network Functions Virtualisation (NFV) whitepaper, which borrowed the IT world’s concept of server-virtualisation and gave it a networking spin. Network functions would cease to be tied to dedicated pieces of equipment, and instead would run inside “virtual machines” (VMs) hosted on generic computing equipment. In essence, network functions would become software apps, known as virtual network functions (VNFs).

Because the software (the VNF) is not tied to hardware, operators would have much more flexibility over how their network is deployed. As long as we figure out a suitable way to control and configure the apps, we should be able to scale deployments up and down to meet requirements at a given time. And as long as we have enough high-volume servers, switches and storage devices connected together, it’s as simple as spinning up a new instance of the VNF – much simpler than before, when we needed to procure and deploy dedicated pieces of equipment with hefty price tags attached.
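
To make that operational shift concrete, here is a minimal sketch, in Python, of the kind of closed-loop scaling logic an orchestrator might apply to a VNF. All of the names and thresholds (the per-instance capacity, the 80%/30% triggers) are illustrative assumptions rather than any real orchestrator’s API; the point is simply that adding capacity becomes a software decision rather than a procurement cycle.

```python
# Illustrative sketch only: closed-loop scaling of a VNF.
# Thresholds and the per-instance capacity figure are assumptions.

SCALE_OUT = 0.80                 # add an instance above 80% utilisation
SCALE_IN = 0.30                  # remove one below 30% utilisation
CAPACITY_PER_INSTANCE = 100_000  # sessions one VNF instance is assumed to handle
MIN_INSTANCES = 2                # keep a floor for redundancy

def rebalance(active_sessions: int, instances: int) -> int:
    """Return the new instance count for the current load."""
    utilisation = active_sessions / (instances * CAPACITY_PER_INSTANCE)
    if utilisation > SCALE_OUT:
        return instances + 1     # spin up another VM from the same software image
    if utilisation < SCALE_IN and instances > MIN_INSTANCES:
        return instances - 1     # release shared resources back to the pool
    return instances

# Example: an evening traffic peak followed by an overnight trough
instances = 2
for sessions in (120_000, 190_000, 260_000, 80_000, 40_000):
    instances = rebalance(sessions, instances)
    print(f"{sessions:>7} sessions -> {instances} instances")
```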

An additional benefit of moving to a software model is that operators have a far greater degree of control than before over where network functions physically reside. NFV infrastructure can directly replace old-school networking equipment in the operator’s central offices and points of presence, but the software can in theory run anywhere – in the operator’s private centralised data centre, in a data centre managed by someone else, or even in a public hyperscale cloud. With a bit of re-engineering, it would be possible to distribute resources throughout a network, perhaps placing traffic-intensive user functions in a hub closer to the user, so that less traffic needs to go back and forth to the central control point. The key is that operators are free to choose, and to shift workloads around, depending on what they need to achieve.

The telco cloud promise

Somewhere along the way, we began talking about the telco cloud. This is a term that means many things to many people. At its most basic level, it refers specifically to the data centre resources supporting a carrier-grade telecoms network: hardware and software infrastructure, with NFV as the underlying technology. But over time, the term has also come to be associated with cloud business practices – that is to say, the innovation-focussed business model of successful cloud computing companies.

Figure 2: Telco cloud defined: New technology and new ways of working

Telco cloud: Virtualised & programmable infrastructure together with cloud business practices

Source: STL Partners

In this model, telco infrastructure becomes a flexible technology platform which can be leveraged to enable new ways of working across an operator’s business. Operations become easier to automate. Product development and testing becomes more straightforward – and can happen more quickly than before. With less need for high capital spend on equipment, there is more potential for shorter, success-based funding cycles which promote innovation.

Much has been written about the vast potential of such a telco cloud, by analysts and marketers alike. Indeed, STL Partners has been partial to the same. For this reason, we will avoid a thorough investigation here. Instead, we will use a simplified framework which covers the four major buckets of value which telco cloud is supposed to help us unlock:

Figure 3: The telco cloud promise: Major buckets of value to be unlocked

Four buckets of value from telco cloud: Openness; Flexibility, visibility & control; Performance at scale; Agile service introduction

Source: STL Partners

These four buckets cover the most commonly cited expectations of telcos moving to the cloud. Folded into all of them, to some extent, is a fifth expectation: cost savings, long promised as a side-effect. These expectations have their origin in what the analyst and vendor community has promised – and so, in theory, they should be realistic and achievable.

The less-exciting reality

At STL Partners, we track the progress of telco cloud primarily through our NFV Deployment Tracker, a comprehensive database of live deployments of telco cloud technologies (NFV, SDN and beyond) in telecoms networks across the planet. The emphasis is on live deployments rather than those running in testbeds or as proofs of concept, since we believe this is a fairer reflection of how mature the industry really is in this regard.

What we find is that, after a slow start, telcos have really taken to telco cloud since 2017, when we have seen a surge in deployments:

Figure 4: Total live deployments of telco cloud technology, 2015-2019
Includes NFVi, VNF, SDN deployments running in live production networks, globally

Telco cloud deployments have risen substantially over the past few years

Source: STL Partners NFV Deployment Tracker

All of the major operator groups around the world are now running telco clouds, as well as a significant long tail of smaller players. As we have explained previously, the primary driving force in that surge has been the move to virtualise mobile core networks in response to data traffic growth, and in preparation for roll-out of 5G networks. To date, most of it is based on NFV: taking existing physical core network functions (components of the Evolved Packet Core or the IP Multimedia Subsystem, in most cases) and running them in virtual machines. No operator has completely decommissioned legacy network infrastructure, but in many cases these deployments are already very ambitious, supporting 50% or more of a mobile operator’s total network traffic.

Yet, despite the surge in deployments, operators we work with are increasingly frustrated by the results. The technology works, but we are a long way from unlocking the value promised in Figure 3. Solutions to date are far from open and vendor-neutral. The ability to monitor, optimise and modify systems is far from ubiquitous. Performance is acceptable, but nothing to write home about, and not yet proven at mass scale. Examples of truly innovative services built on telco cloud platforms are few and far between.

We are continually asked: will telco cloud really deliver? And what needs to change for that to happen?

The problem: flawed approaches to deployment

Learning from those on the front line

The STL Partners hypothesis is that telco cloud, in and of itself, is not the problem. From a theoretical standpoint, there is no reason that virtualised and programmable network and IT infrastructure cannot be a platform for delivering the telco cloud promise. Instead, we believe that the reason it has not yet delivered is linked to how the technology has been deployed, both in terms of the technical architecture, and how the telco has organised itself to operate it.

To test this hypothesis, we conducted primary research with fifteen telecoms operators at different stages in their telco cloud journey. We asked them about their deployments to date, how they have been delivered, the challenges encountered, how successful they have been, and how they see things unfolding in the future.

Our sample includes individuals leading telco cloud deployment at a range of mobile, fixed and converged network operators of all shapes and sizes, and in all regions of the world. Titles vary widely, but include Chief Technology Officers, Heads of Technology Exploration and Chief Network Architects. Our criterion was that individuals needed to be knee-deep in their organisation’s NFV deployments: not just involved from a strategic standpoint, but also close to the operational complexities of making it happen.

What we found is that most telco cloud deployments to date fall into two categories, driven by the operator’s starting point in making the decision to proceed:

Figure 5: Two starting points for deploying telco cloud

Function-first "we need to virtualise XYZ" vs platform-first "we want to build a cloud platform"

Source: STL Partners

The operators we spoke to were split between these two camps. The starting point, we found, greatly affects how the technology is deployed. In the coming pages, we will explain both approaches in more detail.

Table of contents

  • Executive Summary
  • Telco cloud: big promises, undelivered
    • A network running in the cloud
    • The telco cloud promise
    • The less-exciting reality
  • The problem: flawed approaches to deployment
    • Learning from those on the front line
    • A function-first approach to telco cloud
    • A platform-first approach to telco cloud
  • The solution: change, collaboration and integration
    • Multi-vendor telco cloud is preferred
    • The internal transformation problem
    • The need to foster collaboration and integration
    • Standards versus blueprints
    • Insufficient management and orchestration solutions
    • Vendor partnerships and pre-integration
  • Conclusions: A better telco cloud is possible, and 5G makes it an urgent priority

How 5G is Disrupting Cloud and Network Strategy Today

5G – cutting through the hype

As with 3G and 4G, the approach of 5G has been heralded by vast quantities of debate and hyperbole. We contemplated reviewing some of the more outlandish statements we’ve seen and heard, but for the sake of brevity we’ll concentrate in this report on the genuine progress that has been made.

A stronger definition: a collection of related technologies

Let’s start by defining terms. For us, 5G is a collection of related technologies that will eventually be incorporated in a 3GPP standard replacing the current LTE-A. NGMN, the forum that is meant to coordinate the mobile operators’ requirements vis-à-vis the vendors, recently issued a useful document setting out the technologies it wants to see in the eventual solution, or at least considered in the standards process.

Incremental progress: ‘4.5G’

For a start, NGMN includes a variety of incremental improvements that promise substantially more capacity. These include higher-order modulation, developing the carrier-aggregation features in LTE-A to share spectrum between cells as well as within them, and improving interference coordination between cells. These are uncontroversial and are very likely to be deployed as incremental upgrades to existing LTE networks long before 5G is rolled out or even finished. This is what some vendors, notably Huawei, refer to as 4.5G.

Better antennas, beamforming, etc.

More excitingly, NGMN envisages some advanced radio features. These include beamforming, in which the shape of the radio beam between a base station and a mobile station is adjusted, taking advantage of the diversity of users in space to re-use the available radio spectrum more intensively, as well as multi-user and massive MIMO (Multiple Input/Multiple Output). Massive MIMO simply means using many more antennas – at the moment the latest equipment uses 8 transmitter and 8 receiver antennas (8T*8R), whereas 5G might use 64. Multi-user MIMO uses the variety of antennas to serve more users concurrently, rather than just serving them faster individually. These promise quite dramatic capacity gains, at the cost of more computationally intensive software-defined radio systems and more complex antenna designs. Although they are cutting-edge, it’s worth pointing out that 802.11ac Wave 2 WiFi devices shipping now have these features, and it is likely that the WiFi ecosystem will hold a lead in them for some considerable time.
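
To give a feel for why more antennas translate into more capacity, the toy calculation below applies the standard first-order rule that each spatial stream an array can separate contributes roughly log2(1 + SNR) bit/s/Hz of aggregate spectral efficiency. The stream counts and the 20 dB SNR are our own assumptions for illustration, not NGMN figures, and real-world gains depend heavily on channel conditions and device capabilities.

```python
from math import log2

def cell_spectral_efficiency(spatial_streams: int, snr_db: float) -> float:
    """Crude first-order model: aggregate spectral efficiency grows roughly
    linearly with the number of spatial streams the antenna array can
    separate, each contributing ~log2(1 + SNR) bit/s/Hz."""
    snr = 10 ** (snr_db / 10)
    return spatial_streams * log2(1 + snr)

# 8T*8R LTE-A (up to 8 streams) vs a 64-antenna massive MIMO array assumed
# to separate 16 concurrent user streams, both at an assumed 20 dB SNR.
print(round(cell_spectral_efficiency(8, 20), 1))   # ~53.3 bit/s/Hz
print(round(cell_spectral_efficiency(16, 20), 1))  # ~106.5 bit/s/Hz
```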

New spectrum

NGMN also sees evolution towards 5G in terms of spectrum. We can divide this into a conservative and a radical phase – in the first, conservative phase, 5G is expected to start using bands below 6GHz, while in the second, radical phase, the centimetre/millimetre-wave bands up to and above 30GHz are in discussion. These promise vastly more bandwidth, but as usual will demand a higher density of smaller cells and lower transmitter power levels. It’s worth pointing out that it’s still unclear whether 6GHz will make the agenda for this year’s WRC-15 conference, and 60GHz may or may not be taken up in 2019 at WRC-19, so spectrum policy is a critical path for the whole project of 5G.

Full duplex radio – doubling capacity in one stroke

Moving on, we come to some much more radical proposals and exotic technologies. 5G may use the emerging technology of full-duplex radio, which leverages advances in hardware signal processing to cancel self-interference and make it possible for radio devices to send and receive at the same time on the same frequency – something hitherto thought impossible, and a fundamental constraint in radio engineering. This area has seen a lot of progress recently and is moving from an academic research project towards industrial status. If it works, it promises to double the capacity provided by all the other technologies put together.

A new, flatter network architecture?

A major redesign of the network architecture is being studied. This is highly controversial. A new architecture would likely be much “flatter”, with fewer levels of abstraction (such as the encapsulation of Internet traffic in the GTP protocol) and fewer centralised functions. This would be a very radical break with the GSM-inspired practice that worked in 2G and 3G, and in an adapted form in 4G. However, the very demanding latency targets we will discuss in a moment will be very difficult to satisfy with a centralised architecture.

Content-centric networking

Finally, serious consideration is being given to what the NGMN calls information-based networking, better known to the wider community as name-based networking, named-data networking, or content-centric networking, as TCP-Reno inventor Van Jacobson called it when he introduced the concept in a now-classic lecture. The idea here is that the Internet currently works by mapping content to domain names to machines. In content-centric networking, users request an item of content, uniquely identified by a name, and the network finds the nearest source for it, thus keeping traffic localised and facilitating scalable, distributed systems. This would represent a radical break with both GSM-inspired and most Internet practice, and is currently very much a research project. However, code does exist and has even been implemented using the OpenFlow SDN platform, and IETF standardisation is under way.
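
As a purely conceptual illustration of the name-based lookup idea, the sketch below resolves a content name to the nearest node holding a copy. The content names, node names and hop counts are invented; real implementations work at the packet-forwarding level, not with an application-layer dictionary like this.

```python
# Toy illustration of name-based content lookup; all data is invented.
# Each entry maps a content name to nodes holding a copy, with hop counts.
CONTENT_LOCATIONS = {
    "/news/video/clip-42": [("core-dc", 9), ("metro-cache-3", 3), ("cell-site-17", 1)],
}

def fetch(name: str):
    """Return the nearest node holding the named content, or None."""
    copies = CONTENT_LOCATIONS.get(name, [])
    if not copies:
        return None                            # would fall back to the origin server
    return min(copies, key=lambda c: c[1])     # nearest copy keeps traffic local

print(fetch("/news/video/clip-42"))            # ('cell-site-17', 1)
```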

The mother of all stretch targets

5G is already a term associated with implausibly grand theoretical maxima, like every G before it. However, the NGMN has the advantage of being a body that serves first of all the interests of the operators, the vendors’ customers, rather than the vendors themselves. Its expectations are therefore substantially more interesting than some of the vendors’ propaganda material. It has also recently started to reach out to other stakeholders, such as manufacturing companies involved in the Internet of Things.

Reading the NGMN document raises some interesting issues about the definition of 5G. Rather than set targets in an absolute sense, it puts forward parameters for a wide range of different use cases. A common criticism of the 5G project is that it is over-ambitious in trying to serve, for example, low-bandwidth, ultra-low-power M2M monitoring networks and ultra-HD multicast video streaming with the same network. The use cases and performance requirements NGMN has defined are so diverse that they might indeed be served by different radio interfaces within a 5G infrastructure, or even by fully independent radio networks. Whether 5G ends up as “one radio network to rule them all”, an interconnection standard for several radically different systems, or something in between (for example, a radio standard with options, or a common core network and specialised radios) is very much up for debate.

In terms of speed, NGMN is looking for 50Mbps user throughput “everywhere”, with half that speed available on the uplink. Success is defined here at the 95th percentile, so this means 50Mbps to 95% geographical coverage, 95% of the time. This should support handoff at speeds up to 120km/h. In terms of density, this should support 100 users/square kilometre in rural areas and 400 in suburban areas, with 10 and 20 Gbps/square km capacity respectively. This seems to be intended as the baseline cellular service in the 5G context.

In the urban core, downlink of 300Mbps and uplink of 50Mbps is required, with handoff at up to 100km/h, and up to 2,500 concurrent users per square kilometre. Note that the density targets are per-operator, so that would be 10,000 concurrent users/sq km when four MNOs are present. Capacity of 750Gbps/sq km downlink and 125Gbps/sq km uplink is required.
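
A quick back-of-the-envelope check, shown below in Python, confirms that the urban-core figures hang together: the areal capacity targets are simply the per-user rates multiplied by the concurrent-user density, and the four-operator density follows directly.

```python
# Sanity-checking the NGMN urban-core targets quoted above.
users_per_sq_km = 2_500    # concurrent users, per operator
downlink_mbps = 300
uplink_mbps = 50

print(users_per_sq_km * downlink_mbps / 1_000)  # 750.0 Gbps/sq km downlink
print(users_per_sq_km * uplink_mbps / 1_000)    # 125.0 Gbps/sq km uplink
print(users_per_sq_km * 4)                      # 10,000 users/sq km with four MNOs
```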

An extreme high-density scenario is included as “broadband in a crowd”. This requires the same speeds as the “50Mbps anywhere” scenario, with vastly greater density (150,000 concurrent users/sq km or 30,000 “per stadium”) and commensurately higher capacity. However, the capacity planning assumes that this use case is uplink-heavy – 7.5Tbps/sq km uplink compared to 3.75Tbps downlink. That’s a lot of selfies, even in 4K! The fast handoff requirement, though, is relaxed to support only pedestrian speeds.

There is also a femtocell/WLAN-like scenario for indoor and enterprise networks, which pushes speed and capacity to their limits, with 1Gbps downlink and 500Mbps uplink, 75,000 concurrent users/sq km or 75 users per 1,000 square metres of floor space, and no significant mobility. Finally, there is an “ultra-low cost broadband” requirement with 10Mbps symmetrical, 16 concurrent users/sq km and 16Mbps/sq km capacity, and 50km/h handoff. (There are also some niche cases, such as broadcast, in-car, and aeronautical applications, which we propose to gloss over for now.)

Clearly, the solution will have to either be very flexible, or else be a federation of very different networks with dramatically different radio properties. It would, for example, probably be possible to aggregate the 50Mbps everywhere and ultra-low cost solutions – arguably the low-cost option is just the 50Mbps option done on the cheap, with fewer sites and low-band spectrum. The “broadband in a crowd” option might be an alternative operating mode for the “urban core” option, turning off handoff, pulling in more aggregated spectrum, and reallocating downlink and uplink channels or timeslots. But this does begin to look like at least three networks.

Latency: the X factor

Another big stretch, and perhaps the most controversial issue here, is the latency requirement. NGMN draws a clear distinction between what it calls end-to-end latency, aka the familiar round-trip time measurement from the Internet, and user-plane latency, defined thus:

Measures the time it takes to transfer a small data packet from user terminal to the Layer 2 / Layer 3 interface of the 5G system destination node, plus the equivalent time needed to carry the response back.

That is to say, the user-plane latency is a measurement of how long it takes the 5G network, strictly speaking, to respond to user requests, and how long it takes for packets to traverse it. NGMN points out that the two metrics are equivalent if the target server is located within the 5G network. NGMN defines both using small packets, and therefore negligible serialisation delay, and assuming zero processing delay at the target server. The target is 10ms end-to-end, 1ms for special use cases requiring low latency, or 50ms end-to-end for the “ultra-low cost broadband” use case. The low-latency use cases tend to be things like communication between connected cars, which will probably fall under the direct device-to-device (D2D) element of 5G, but nevertheless some vendors seem to think it refers to infrastructure as well as D2D. Therefore, this requirement should be read as one for which the 5G user plane latency is the relevant metric.

This last target is arguably the biggest stretch of all, but also perhaps the most valuable.

The lower bound on any measurement of latency is very simple – it’s the time it takes to physically reach the target server at the speed of light. Latency is therefore intimately connected with distance. Latency is also intimately connected with speed – protocols like TCP use it to determine how many bytes they can risk having “in flight” before getting an acknowledgement, and hence how much useful throughput can be derived from a given theoretical bandwidth. Also, at faster data rates, more of the total time it takes to deliver something is taken up by latency rather than transfer.
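
The window/RTT relationship is easy to make concrete. The sketch below uses the classic un-scaled 64 KB TCP window purely for illustration (modern stacks use window scaling, so treat the absolute numbers as indicative): whatever the nominal link speed, throughput is capped at roughly the window size divided by the round-trip time.

```python
# Illustrative only: throughput ceiling imposed by a fixed TCP window.
WINDOW_BYTES = 64 * 1024   # classic un-scaled TCP receive window

def max_throughput_mbps(rtt_ms: float) -> float:
    """Upper bound on throughput: one window per round trip."""
    return WINDOW_BYTES * 8 / (rtt_ms / 1000) / 1e6

for rtt in (100, 50, 10, 1):
    print(f"{rtt:>3} ms RTT -> {max_throughput_mbps(rtt):7.1f} Mbps ceiling")
```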

And the way we build applications now tends to make latency, and especially the variance in latency known as jitter, more important. In order to handle the scale demanded by the global Internet, it is usually necessary to scale out by breaking up the load across many, many servers. In order to make this work, it is usually also necessary to disaggregate the application itself into numerous, specialised, and independent microservices. (We strongly recommend Mary Poppendieck’s presentation at the link.)

The result of this is that a popular app or web page might involve calls to dozens or hundreds of different services. Google.com includes 31 HTTP requests these days and Amazon.com 190. If the variation in latency is not carefully controlled, it becomes statistically more likely than not that a typical user will encounter at least one server’s 99th-percentile performance. (eBay tries to identify users getting slow service and serves them a deliberately cut-down version of the site – see slide 17 here.)
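
The arithmetic behind “more likely than not” is simple, assuming each sub-request independently has a 1% chance of hitting its server’s 99th-percentile latency. Using the request counts quoted above:

```python
# Probability that at least one of N sub-requests hits a server's
# 99th-percentile (slow) response, assuming independent requests.
def p_tail_hit(n_requests: int, slow_fraction: float = 0.01) -> float:
    return 1 - (1 - slow_fraction) ** n_requests

for n in (31, 100, 190):   # Google.com, a mid-sized page, Amazon.com
    print(f"{n:>3} requests -> {p_tail_hit(n):.0%} chance of at least one slow response")
```

On these assumptions, a page with around 70 or more sub-requests is already more likely than not to include at least one 99th-percentile response.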

We discuss this in depth in a Telco 2.0 Blog entry here.

Latency: the challenge of distance

It’s worth pointing out here that the 5G targets can literally be translated into kilometres. The rule of thumb for speed-of-light delay is 4.9 microseconds for each kilometre of fibre with a refractive index of 1.47. 1ms – 1000 microseconds – equals about 204km in a straight line, assuming no routing delay. A response back is needed too, so divide that distance in half. As a result, in order to be compliant with the NGMN 5G requirements, all the network functions required to process a data call must be physically located within 100km, i.e. 1ms, of the user. And if the end-to-end requirement is taken seriously, the applications or content the user wants must also be hosted within 1000km, i.e. 10ms, of the user. (In practice, there will be some delay contributed by serialisation, routing, and processing at the target server, so this would actually be somewhat more demanding.)
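
Those distance contours come straight out of the arithmetic. A minimal worked example, using the 4.9 microseconds-per-kilometre rule of thumb quoted above and ignoring routing and processing delay (so these are upper bounds):

```python
# Converting round-trip latency budgets into fibre distance.
US_PER_KM = 4.9   # one-way propagation delay per km of fibre (refractive index ~1.47)

def reach_km(round_trip_ms: float) -> float:
    """Maximum one-way fibre distance for a given round-trip budget."""
    one_way_us = round_trip_ms * 1000 / 2
    return one_way_us / US_PER_KM

for budget_ms in (1, 10):
    print(f"{budget_ms:>2} ms round trip -> functions within ~{reach_km(budget_ms):.0f} km")
```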

To achieve this, the architecture of 5G networks will need to change quite dramatically. Centralisation suddenly looks like the enemy, and middleboxes providing video optimisation, deep packet inspection, policy enforcement, and the like will have no place. At the same time, protocol designers will have to think seriously about localising traffic – this is where the content-centric networking concept comes in. Given the number of interested parties in the subject overall, it is likely that there will be a significant period of ‘horse-trading’ over the detail.

It will also need nothing more or less than a CDN and data-centre revolution. Content, apps, or commerce hosted within this 1000km contour will have a very substantial competitive advantage over sites that don’t adapt their hosting strategy to take advantage of lower latency. Telecoms operators, by the same token, will have to radically decentralise their networks to get their systems within the 100km contour. Sites that move closer still, to the 5ms/500km contour or within it, will benefit even more. The idea of centralising everything into shared services and global cloud platforms suddenly looks dated. So might the enormous hyperscale data centres one day look like the IT equivalent of sprawling, gas-guzzling suburbia? And will mobile operators become a key actor in the data-centre economy?

  • Executive Summary
  • Introduction
  • 5G – cutting through the hype
  • A stronger definition: a collection of related technologies
  • The mother of all stretch targets
  • Latency: the X factor
  • Latency: the challenge of distance
  • The economic value of snappier networks
  • Only Half The Application Latency Comes from the Network
  • Disrupt the cloud
  • The cloud is the data centre
  • Have the biggest data centres stopped getting bigger?
  • Mobile Edge Computing: moving the servers to the people
  • Conclusions and recommendations
  • Regulatory and political impact: the Opportunity and the Threat
  • Telco-Cloud or Multi-Cloud?
  • 5G vs C-RAN
  • Shaping the 5G backhaul network
  • Gigabit WiFi: the bear may blow first
  • Distributed systems: it’s everyone’s future


  • Figure 1: Latency = money in search
  • Figure 2: Latency = money in retailing
  • Figure 3: Latency = money in financial services
  • Figure 4: Networking accounts for 40-60 per cent of Facebook’s load times
  • Figure 5: A data centre module
  • Figure 6: Hyperscale data centre evolution, 1999-2015
  • Figure 7: Hyperscale data centre evolution 2. Power density
  • Figure 8: Only Facebook is pushing on with ever bigger data centres
  • Figure 9: Equinix – satisfied with 40k sq ft
  • Figure 10: ETSI architecture for Mobile Edge Computing