Choosing A Cloud Software Partner

Jay Judkowitz
Friday, August 12th 2011

This is the third in a series of four articles discussing infrastructure as a service (IaaS) clouds. The series began with basic level-setting, and each article goes progressively deeper. The topics for the series are:

  1. Cloud 101
    • What is cloud
    • What value should cloud provide
    • Public, private, and hybrid cloud
    • Starting on a cloud project
  2. Application taxonomy, what belongs in the cloud, and why
  3. What you should look for in cloud infrastructure software
  4. Evaluating different approaches to cloud infrastructure software

The concepts section describes architectural design points to raise with vendors, to make sure that they are thinking like a true cloud provider and not simply cloud-washing older technology to stay relevant in a new world. Keep in mind that these concepts apply to infrastructure management in general, not just compute: think about the storage, network, and power aspects of your cloud in the same way.

The specific functionality section lists cloud features to check for. If too many of these are missing, the cloud will not deliver its value proposition.

Core Concepts and Philosophy


In a cloud, scale is the key to long-term success. The number of nodes and instances, simultaneous connections to the management system, the networking and security features, etc. all need to scale. For each and every exciting and valuable feature a cloud vendor touts, you need to ask, “Can I have tens of thousands of those? What is the experience at that scale? When, if ever, does the scale impact the end user and how they do their work?”

While one can deploy a small-scale cloud, a successful cloud will grow into a single pool of capacity for an entire organization or even multiple organizations. In fact, the larger a cloud grows, the more cost savings and value it generates, because you start to see the benefits of “the law of large numbers”: the more independent workloads share a pool, the more their peaks and troughs tend to offset one another. If you do not build for scale from the very beginning, you will hit a wall and need to create separately managed clouds. End users will then have to decide which workloads go on which clouds, which breaks the frictionless self-service model. Furthermore, capex benefits will be lost, because you are forced to overprovision each cloud fragment rather than benefiting from a single pool hosting many applications with offsetting resource consumption curves.
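
The capex argument above can be checked with a little arithmetic. The Python sketch below uses made-up hourly demand figures for three hypothetical applications whose busy periods fall at different times. It compares sizing a separate cloud fragment for each application's own peak with sizing one shared pool for the peak of the combined demand. The application names and numbers are illustrative assumptions, not measurements.

```python
# Illustrative only: hypothetical hourly core demand for three applications
# whose busy periods fall at different times of day (six sample hours).
demand = {
    "batch_reports": [80, 80, 20, 10, 10, 10],   # busy overnight
    "web_storefront": [10, 10, 30, 80, 90, 40],  # busy during business hours
    "analytics": [20, 30, 70, 30, 10, 60],       # busy at mixed times
}


def fragmented_capacity(curves: dict[str, list[int]]) -> int:
    """Cores needed if each application gets its own cloud fragment,
    each sized for that application's own peak."""
    return sum(max(curve) for curve in curves.values())


def pooled_capacity(curves: dict[str, list[int]]) -> int:
    """Cores needed if all applications share one pool, sized for the
    peak of their combined demand in any single hour."""
    combined = [sum(hour) for hour in zip(*curves.values())]
    return max(combined)


if __name__ == "__main__":
    print("fragmented:", fragmented_capacity(demand))  # fragmented: 240
    print("pooled:", pooled_capacity(demand))          # pooled: 120
```

With these illustrative numbers, the fragmented approach needs 240 cores while the shared pool needs 120, because no two applications hit their peaks in the same hour. Real savings depend on how well actual workloads offset one another.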

For example, in a non-cloud deployment, the datacenter management system is used only by datacenter admins. If it can handle only tens of simultaneous connections and is limited to one or two nodes, that is not a problem, since the administrative team is relatively small. In a cloud, however, a large number of end users drive the management system directly through self-service workflows, so the management system requires a whole new level of scale.


Automation is the key for allowing end users to do their own work and also for lowering datacenter operation costs. Make sure there is a proper degree of automation for both application lifecycle operations and infrastructure operations.

The core principle for end user operations is that no end user task should ever trigger work on the datacenter administrator side, not even a single mouse-click approval. A high degree of control and protection is still required, but these controls must be implemented as up-front policies through which the right groups of people delegate the right privileges to the right consumers. Furthermore, there need to be audit trails showing that the policies constrained people to the proper activities. None of this changes the fact that manual approval by central admins for regular, daily end user operations cannot work in a cloud model.
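
One way to picture up-front policies replacing manual approvals is a policy check that runs automatically on every self-service request and records each decision. The sketch below is a hypothetical, in-memory model: the `Policy` and `PolicyEngine` names, the action strings, and the per-group instance quota are assumptions made for illustration, not any vendor's API. Administrators define delegated actions and quotas once; end user requests are then granted or denied immediately with no admin in the loop, and every decision lands in an audit log.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Policy:
    """Up-front delegation defined once by administrators (hypothetical model)."""
    allowed_actions: set[str]
    max_instances: int


@dataclass
class PolicyEngine:
    """Grants or denies self-service requests automatically and audits each one."""
    policies: dict[str, Policy]  # keyed by user group
    usage: dict[str, int] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def request(self, user: str, group: str, action: str, count: int = 1) -> bool:
        policy = self.policies.get(group)
        in_use = self.usage.get(group, 0)
        if policy is None:
            allowed, reason = False, "no policy for group"
        elif action not in policy.allowed_actions:
            allowed, reason = False, f"action {action!r} not delegated to group"
        elif action == "launch" and in_use + count > policy.max_instances:
            allowed, reason = False, "group instance quota exceeded"
        else:
            allowed, reason = True, "within policy"
            # Track the group's running instance count against its quota.
            if action == "launch":
                self.usage[group] = in_use + count
            elif action == "terminate":
                self.usage[group] = max(0, in_use - count)
        # Every decision is recorded, so compliance can be shown after the fact.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "group": group,
            "action": action,
            "count": count,
            "allowed": allowed,
            "reason": reason,
        })
        return allowed


if __name__ == "__main__":
    engine = PolicyEngine({"dev": Policy({"launch", "terminate"}, max_instances=10)})
    print(engine.request("alice", "dev", "launch", 8))   # True
    print(engine.request("bob", "dev", "launch", 5))     # False: 13 exceeds quota of 10
    print(engine.request("bob", "dev", "edit_network"))  # False: not delegated
    print(len(engine.audit_log))                         # 3
```

Denied requests are answered instantly rather than queued for a human, which keeps the self-service loop frictionless while the audit log preserves accountability.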

The core principle for datacenter operations is that the cloud should be self-discovering, self-organizing, self-monitoring, and self-healing. Anyone who claims to sell you a completely zero-touch datacenter today is certainly exaggerating, but you should check the features they offer to make sure this philosophy is followed wherever technically feasible. Where manual intervention is needed, make sure it is required only for infrequent, up-front tasks and never for routine operations that happen frequently.

As an example, initial cloud configuration and network setup may require significant up-front planning and hours to days of setup, but regularly growing the cloud by adding nodes must take no more time and effort than racking the systems and plugging them in.
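
The self-discovering and self-healing behavior described above can be sketched as a simple control loop. In this hypothetical Python model (the `Controller` class, the heartbeat mechanism, and the 30-second timeout are all assumptions for illustration), a newly racked node joins the pool simply by sending its first heartbeat, and a periodic reconcile pass removes nodes that have gone silent and re-places their instances on healthy nodes without an administrator stepping in.

```python
from __future__ import annotations

from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 30.0  # seconds of silence before a node is presumed dead (assumed value)


@dataclass
class Node:
    name: str
    capacity: int     # instance slots on this node
    last_seen: float  # time of the most recent heartbeat
    instances: list[str] = field(default_factory=list)


@dataclass
class Controller:
    nodes: dict[str, Node] = field(default_factory=dict)

    def heartbeat(self, name: str, capacity: int, now: float) -> None:
        """Self-discovery: an unknown node joins the pool on its first heartbeat."""
        node = self.nodes.get(name)
        if node is None:
            self.nodes[name] = Node(name, capacity, now)
        else:
            node.last_seen = now

    def _is_live(self, node: Node, now: float) -> bool:
        return now - node.last_seen <= HEARTBEAT_TIMEOUT

    def place(self, instance: str, now: float) -> str:
        """Put an instance on the live node with the most free slots."""
        candidates = [n for n in self.nodes.values()
                      if self._is_live(n, now) and len(n.instances) < n.capacity]
        if not candidates:
            raise RuntimeError("no live capacity available")
        target = max(candidates, key=lambda n: n.capacity - len(n.instances))
        target.instances.append(instance)
        return target.name

    def reconcile(self, now: float) -> list[tuple[str, str]]:
        """Self-healing pass: drop silent nodes and re-place their instances."""
        dead = [n for n in self.nodes.values() if not self._is_live(n, now)]
        moved = []
        for node in dead:
            del self.nodes[node.name]
            for instance in node.instances:
                moved.append((instance, self.place(instance, now)))
        return moved


if __name__ == "__main__":
    ctl = Controller()
    ctl.heartbeat("node-a", capacity=4, now=0.0)   # racked and powered on
    ctl.heartbeat("node-b", capacity=4, now=0.0)
    ctl.place("vm-1", now=0.0)                     # lands on node-a
    ctl.place("vm-2", now=0.0)                     # lands on node-b
    ctl.heartbeat("node-b", capacity=4, now=40.0)  # node-a has gone silent
    ctl.heartbeat("node-c", capacity=4, now=40.0)  # new node joins with no admin step
    print(ctl.reconcile(now=45.0))                 # [('vm-1', 'node-c')]
```

A production system would also need to fence failed nodes, respect placement constraints, and handle running out of capacity more gracefully; this sketch simply raises an error in that last case.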