For those who have followed my recent postings about extending the cloud from the data center to the edge of the operator’s network, I am delighted to report that BT announced its Network Virtualization (NV) initiative at last week’s Carrier Cloud Summit. An initial Proof-of-Concept—involving a collaboration among BT, HP, Intel, Tail-f, Verivue, and WindRiver—demonstrated the viability of replacing hardware-based appliances with software running on virtualized commodity servers. Other operators are engaged in similar evaluations, so more news like this should be expected in the coming months.
Several things are noteworthy about the BT announcement. On the technical side, the team demonstrated a BRAS service running side-by-side on a commodity server with Verivue’s scalable caching service, known as HyperCache. Additional work is needed, but consolidating and interconnecting two services as diverse as BRAS and CDN on a single virtualized platform is a significant accomplishment. On the business side, the team conducted an in-depth analysis showing the potential to reduce total cost of ownership by a third to a half. What’s harder to quantify, but most exciting about the potential of NV, is the opportunity it provides operators to be more agile in deploying new services.
Perhaps the most important outcome of the Proof-of-Concept is the set of lessons learned by the participants, which are now being applied to a follow-on prototyping effort. For my part, several things have come into sharper focus. I’ll summarize three of them.
First, because it’s unlikely the industry will settle on a standard virtualization technology any time soon, service developers will need to work with multiple virtualization systems, including at least VMware, KVM, and Xen. The challenge isn’t so much getting a service to run on different flavors of VM, as it is integrating service management with the operator’s VM orchestration system of choice. From the service’s perspective, this requires a clean separation of VM provisioning from service configuration. The next step will be to look for opportunities to unify service configuration across multiple independent services, which on the surface sounds daunting, but I am convinced there is an opportunity for significant consolidation in the management plane. I’ll leave that discussion for a future post.
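To make the separation concrete, here is a minimal sketch of the idea that VM provisioning and service configuration live behind different interfaces, so the same service-configuration layer can sit on top of any of several hypervisor back ends. All class and method names here are invented for illustration; a real system would call out to libvirt, vSphere, or an orchestration API rather than return synthetic identifiers.

```python
# Hypothetical sketch: VM provisioning (hypervisor-specific) kept separate
# from service configuration (hypervisor-agnostic). Names are illustrative,
# not a real API.

from abc import ABC, abstractmethod


class VmProvisioner(ABC):
    """Hypervisor-specific: knows only how to create VMs."""

    @abstractmethod
    def create_vm(self, image: str, cpus: int, mem_gb: int) -> str:
        """Return an opaque VM identifier."""


class KvmProvisioner(VmProvisioner):
    def create_vm(self, image, cpus, mem_gb):
        # A real implementation would call libvirt or an orchestrator here.
        return f"kvm-{image}-{cpus}c{mem_gb}g"


class XenProvisioner(VmProvisioner):
    def create_vm(self, image, cpus, mem_gb):
        return f"xen-{image}-{cpus}c{mem_gb}g"


class ServiceConfigurator:
    """Service-specific: configures the service once its VM exists.
    Deliberately knows nothing about which hypervisor was used."""

    def __init__(self, provisioner: VmProvisioner):
        self.provisioner = provisioner

    def deploy(self, image, cpus, mem_gb, service_params: dict) -> dict:
        vm_id = self.provisioner.create_vm(image, cpus, mem_gb)
        # Service configuration would be pushed over a management channel
        # (NETCONF, REST, ...); represented here by the returned record.
        return {"vm": vm_id, "config": service_params}


# Swapping KvmProvisioner for XenProvisioner changes nothing above this line.
bras = ServiceConfigurator(KvmProvisioner())
print(bras.deploy("bras-image", 4, 8, {"sessions": 32000}))
```

The point of the sketch is that porting the service to another operator’s VM orchestration system touches only the provisioner class, never the service-configuration logic.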
Second, more attention needs to be given to how storage is virtualized. This is because clouds typically support a self-contained storage service—e.g., SAN (block access) or NAS (file access)—rather than depending on each VM to directly manage local storage on its own. By decoupling compute and storage, the cloud takes responsibility for transparently migrating VMs from one server to another. This frees legacy services—such as those being ported to the cloud from hardware appliances—from having to be cloud-aware. In contrast, a CDN service like Verivue’s HyperCache, which was originally designed to run in a virtualized environment, differs in two important ways. First, it is designed to directly manage storage rather than depend on an external storage service. Second, it largely involves ephemeral state, which means the VMs that host a cache are more likely to be spun up and down as demand and maintenance schedules dictate than to be transparently migrated from one server to another.
This suggests a hybrid solution, where on-server SSDs are viewed as a “CDN accelerator” and mapped directly to the VM hosting the resident caching service (e.g., HP’s BL460 blades each have two SSD slots), and an off-server SAN/NAS service provides a general-purpose storage facility that is available to all VMs running across a set of servers (including the CDN service, which uses it as a large-capacity parent cache).
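The hybrid above amounts to a two-tier cache: a small, fast local tier (the on-server SSDs mapped into the caching VM) backed by a large, slower tier (the SAN/NAS-backed parent cache). The toy sketch below illustrates that structure only; it is not Verivue’s implementation, and the eviction policy is deliberately naive.

```python
# Illustrative two-tier cache: `ssd` stands in for the on-server SSD tier,
# `parent` for the off-server SAN/NAS-backed parent cache. Not a real CDN.

class TwoTierCache:
    def __init__(self, ssd_capacity: int):
        self.ssd = {}                # bounded local tier
        self.parent = {}             # large-capacity parent tier
        self.ssd_capacity = ssd_capacity

    def get(self, key):
        if key in self.ssd:          # fast path: local SSD hit
            return self.ssd[key]
        if key in self.parent:       # slower path: parent-cache hit
            self._promote(key, self.parent[key])
            return self.parent[key]
        return None                  # miss: would fetch from origin

    def put(self, key, value):
        self._promote(key, value)
        self.parent[key] = value     # parent holds the long tail

    def _promote(self, key, value):
        if len(self.ssd) >= self.ssd_capacity:
            # naive FIFO-style eviction, purely for illustration
            self.ssd.pop(next(iter(self.ssd)))
        self.ssd[key] = value


cache = TwoTierCache(ssd_capacity=2)
cache.put("/video/a", b"...")
cache.put("/video/b", b"...")
cache.put("/video/c", b"...")    # evicts /video/a from the SSD tier
print(cache.get("/video/a"))     # still served from the parent tier
```

Because the SSD tier holds only ephemeral, re-fetchable state, losing a cache VM costs nothing but warm-up time, which is exactly why spin-up/spin-down is a better fit than transparent migration.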
Third, the trial brought a bit more clarity to the relationship between Software-Defined Networks (SDN) and Network Virtualization (NV). SDN has come to be a generic term, but narrowly it has to do with managing the network(s) used to interconnect the set of VMs that implement some scalable service, creating the illusion that the VMs are connected to a “nicely behaving” logical network and isolating such logical networks from each other. On the other hand, NV—as defined by the BT initiative—has to do with deploying network services as software modules running in VMs on commodity processors and deployed at the edge of the access network (as well as in the data center).
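The narrow reading of SDN above can be reduced to a simple invariant: every VM attaches to exactly one logical network, and the fabric delivers traffic only between endpoints on the same logical network. The following toy sketch captures just that isolation rule; the identifiers (and the idea of a single `attach` table) are invented for illustration, standing in for what a real controller does with VLANs or VXLAN network identifiers.

```python
# Toy model of logical-network isolation: VMs attach to a virtual network
# (think of the integer as a VXLAN VNI), and the fabric refuses to deliver
# traffic across virtual networks. Purely illustrative.

class VirtualFabric:
    def __init__(self):
        self.attachment = {}     # vm_id -> virtual network id

    def attach(self, vm_id: str, vnet: int):
        self.attachment[vm_id] = vnet

    def can_deliver(self, src_vm: str, dst_vm: str) -> bool:
        # isolation invariant: same virtual network, or no delivery at all
        return (src_vm in self.attachment
                and self.attachment[src_vm] == self.attachment.get(dst_vm))


fabric = VirtualFabric()
fabric.attach("bras-vm", 100)
fabric.attach("cache-vm", 100)
fabric.attach("other-tenant-vm", 200)
print(fabric.can_deliver("bras-vm", "cache-vm"))         # True
print(fabric.can_deliver("bras-vm", "other-tenant-vm"))  # False
```

NV then asks a different question: not how to maintain this illusion, but where the VMs themselves run—in the data center or out at the edge of the access network.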
This is a clear enough distinction, but I think it’s helpful to distinguish among three scenarios that are likely to play out over the next few years. The first, call it 1st generation SDN, is the current state-of-the-art: using SDN to manage the interconnection of VMs within a single data center. The second, call it 2nd generation SDN, is starting to be deployed in green field environments and involves using SDN to manage the interconnection of VMs distributed across multiple data centers. This means both the switches in the individual data centers and the switches/routers that interconnect data centers participate in a common SDN management domain. The third scenario is the one that results when the fruits of the NV project start to be widely deployed. Let’s call it 3rd generation SDN, which I claim is distinct from the 2nd generation because the problem of managing VMs distributed throughout an access network is qualitatively different from the problem of managing a set of VMs distributed across a handful of data centers. I briefly talked about the unique properties of this third scenario in a recent post, but this strikes me as a topic that deserves much more attention.