Cloud computing: Addressing on a global scale
My interest in virtualization solutions includes cloud computing. Cloud computing is the new buzzword. Users, system administrators and investors hope that cloud computing will solve all the redundancy and administration problems that currently exist. I am currently not that optimistic, especially when I think about all the new problems that cloud computing introduces.
Today’s topic is addressing, and I have to admit that I have not done any extensive research in this area. VMware’s primary concern is how to migrate virtual networks within data centers. That is also a major issue, but it will not solve the issue I have in mind. The thought behind cloud computing is that your application can be operated somewhere on the network, and you do not care where it runs or who operates it as long as it runs. Moving the application or service from one cloud provider to another should happen without users noticing it. That vision puts some constraints on how applications and services can be addressed on the Internet.
Now, why would one change cloud computing providers often if the provider is taking care of redundancy, backup and disaster recovery? Consider a scenario where a framework makes it possible to switch provider based on cost. The framework always selects the provider with the lowest cost and makes a switch if a provider is cheaper than the existing one.
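To make the scenario concrete, here is a minimal sketch of the selection rule such a framework might use. The function name, the provider names and the prices are all made up for illustration; a real framework would also have to weigh the cost of the move itself.

```python
def choose_provider(current, prices):
    """Return the provider to run on, given a price per provider.

    `current` is the provider in use now (or None), `prices` maps a
    hypothetical provider name to its current cost per hour.
    """
    cheapest = min(prices, key=prices.get)
    # Switch only if another provider is strictly cheaper than the current one.
    if current not in prices or prices[cheapest] < prices[current]:
        return cheapest
    return current


# Hypothetical price list, not real quotes.
prices = {"provider_a": 0.12, "provider_b": 0.10, "provider_c": 0.15}
selected = choose_provider("provider_a", prices)
print(selected)  # provider_b
```

Every time this rule returns something other than the current provider, the service has to move, and that is where the addressing problem starts.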
For large enough services, this is not a huge problem. If the application owner has its own IP network delegated by one of the Regional Internet Registries (RIRs), for example RIPE, the transition could be done by updating the BGP announcements. The change will not be instant because the new routes take time to propagate, but it might be better than the alternative.
The brute force solution is to use DNS zones with a TTL of 1 second, making changes almost instant because resolvers can cache the records for at most a second (this also applies to round robin DNS entries). The disadvantages are huge: if the service is popular, the DNS servers hosting the zone files will be under a lot of stress. A move between data centers or cloud computing providers will also terminate all existing sessions for users of the service. This is not acceptable.
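A back-of-the-envelope estimate shows why the low TTL hurts. The sketch below assumes a very simple model in which every caching resolver asks the authoritative servers at most once per TTL; the resolver count and lookup rate are invented numbers, so treat the result as a rough upper bound and not a measurement.

```python
def authoritative_qps(resolvers, lookups_per_resolver_per_s, ttl_s):
    """Rough upper bound on queries per second at the authoritative servers."""
    # A caching resolver refreshes a record at most once per TTL,
    # and never more often than its own clients ask for it.
    per_resolver = min(lookups_per_resolver_per_s, 1.0 / ttl_s)
    return resolvers * per_resolver


# Hypothetical popular service: 50,000 resolvers, 2 lookups per second each.
print(authoritative_qps(50_000, 2.0, ttl_s=1))     # 50000.0
print(authoritative_qps(50_000, 2.0, ttl_s=3600))  # roughly 13.9
```

Under these assumptions, going from a one hour TTL to a one second TTL multiplies the load on the zone's servers by several thousand.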
One friend of mine suggested that one could mirror the data and services during the transition in the same way as you do live migrations in VMware vSphere. This is probably a necessary feature of the framework since some old data will be updated during the switch, but it will not solve the addressing problem. Since a transition could take hours or days with the regular scheme, the amount of data that has to be synchronized could be overwhelming.
Another idea is to utilize Mobile IP, described in RFC 3344 for IPv4 and RFC 3775 for IPv6. Updates are added in RFC 4721. It is also described on Wikipedia. If I remember correctly from my course on Internet addressing and routing, Mobile IP works in the following way:
- A home node (the RFCs call this the home agent) has the announced address for the service. The home node stays in the same place at all times and should be placed somewhere on the Internet. For redundancy, a few home nodes should be added to a DNS round robin (RR) entry so that a single point of failure does not bring down the service.
- A foreign agent sets up a virtual tunnel to the home node. The foreign agent is supposed to be an edge router for the network where the service currently resides, but for simplicity, it could be the service itself.
The service has an address that is delegated to it from the network it currently resides on. The home node(s), which all the clients use to connect to the service, forward all the packets either to the foreign agent, which finds the actual service in its local network, or to the service itself. When the service is moved, the foreign agent or the service recreates its tunnel to the home node. Everything that happens behind the home node is hidden from the clients, making it impossible for the clients to notice that the service was actually moved. Data synchronization between the old location and the new location could be done with a REDO log in the same way as live migration.
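The forwarding idea can be sketched as a toy model. This is not an implementation of the Mobile IP RFCs: the class, its methods and the dictionary "packets" are my own illustration, and the addresses come from the ranges reserved for documentation.

```python
class HomeNode:
    """Toy model of the fixed node that owns the announced service address."""

    def __init__(self, announced_address):
        self.announced_address = announced_address
        self.care_of_address = None  # current tunnel endpoint, unknown at start

    def register(self, care_of_address):
        """Called by the foreign agent (or the service itself) after a move."""
        self.care_of_address = care_of_address

    def forward(self, packet):
        """Wrap a client packet and address it to the current tunnel endpoint."""
        if packet["dst"] != self.announced_address:
            raise ValueError("packet is not addressed to this service")
        if self.care_of_address is None:
            raise RuntimeError("no tunnel registered yet")
        # The client's packet travels unchanged inside the outer packet.
        return {"outer_dst": self.care_of_address, "inner": packet}


home = HomeNode("192.0.2.10")
packet = {"src": "203.0.113.5", "dst": "192.0.2.10", "payload": b"hello"}

home.register("198.51.100.7")   # service runs at the first provider
before = home.forward(packet)

home.register("198.51.100.99")  # service moved, tunnel recreated
after = home.forward(packet)

print(before["outer_dst"], after["outer_dst"])  # 198.51.100.7 198.51.100.99
```

The client sends to 192.0.2.10 both before and after the move; only the outer destination chosen by the home node changes.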
This solution is not free of problems. The first one is selecting the home node. It must be fairly stable and be placed at one or more trusted locations. The next problem is that all the traffic to the home node must be forwarded over the Internet to the foreign agent. This adds to the network load since traffic must go through a few extra hops to reach its destination. For latency-sensitive applications, this is a killer.
Addressing on the Internet has been the focus of many researchers for many years. Basically, in its pure form, it is quite simple. When the Internet was small, this was a pretty easy task compared to the complex world we have created after years of exploring its limits. Dynamic networks, transitions between service providers and redundant network paths have made addressing a major issue. BGP is the routing protocol of choice today, but that might change when we move from a world where applications, services and networks are stationary to a world where everything could be moved with a few clicks. Some days, abstractions are a bitch.
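The latency cost of the detour is simple arithmetic. The function and the millisecond figures below are hypothetical and only there to show the shape of the problem.

```python
def detour_penalty_ms(client_to_home, home_to_service, client_to_service):
    """Extra one-way latency from going through the home node, in milliseconds."""
    return (client_to_home + home_to_service) - client_to_service


# Invented example: the client is close to the service but far from the home node.
print(detour_penalty_ms(client_to_home=20, home_to_service=90, client_to_service=30))  # 80
```

The penalty is paid by every packet for as long as the service stays behind the home node.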
Now, any comments? Research papers? Leave me a note!
Posted: June 18th, 2009 under Cloud computing, Random thoughts by Frode.