The Effects of Infrastructure Workload Provisioning on Capacity Planning

Infrastructure workload provisioning is the most basic consumption method of a cloud, public or private. Over time, most clouds come to be consumed through repeatable methods, such as a continuous integration pipeline. Each repeatable method can be thought of as a profile. I refer to these as infrastructure workload provisioning profiles; they are a concept that many cloud operators subconsciously deal with on a daily basis but don’t always think about. They play into capacity planning, resource scheduling, and service-level objectives, but most importantly they are becoming a very visible focal point for cloud customers. If a customer cannot provision, that’s an outage, especially in an automated cloud.

Definitions

Let’s start with some definitions:

Workload

Any process or group of processes that consumes resources and provides a service. In the virtual infrastructure world this has historically been represented as a virtual machine; however, containers, physical servers, and storage volumes can also be (and should be) viewed as workloads.

Infrastructure workload provisioning profile (IWPP)

A representation of the velocity, magnitude, concurrency, and churn of workload lifecycles within a resource-constrained system.

Each property of the IWPP can be described as:

Velocity

Represents how fast provisioning must occur over X amount of time. An example is a service-level objective (SLO) for provisioning: each workload must be provisioned and available for use within 5 minutes of the original request.

Magnitude

Represents the number of workloads to execute a lifecycle task against at one time. An example is a CI/CD pipeline that uses a template to deploy a testing environment consisting of 40 workloads.

Concurrency

Represents how many parallel lifecycle tasks are running at the same time. An example is ten CI/CD pipelines, each using a template to deploy a testing environment consisting of 40 workloads.

Churn

Represents how often lifecycle tasks occur over X period of time. Churn ties velocity, magnitude, and concurrency together. An example is provisioning 100 workloads at 9 AM and deprovisioning them at 10 AM.
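
To make these four properties concrete, here is a minimal sketch in Python of what an IWPP could look like as a data structure. The class and field names are my own illustration rather than anything from a specific cloud platform, and the churn figure in the example is an assumed value.

```python
from dataclasses import dataclass


@dataclass
class ProvisioningProfile:
    """Illustrative container for the four IWPP properties (names are my own)."""
    velocity_slo_seconds: int  # each workload must be usable within this many seconds
    magnitude: int             # workloads touched by a single lifecycle task
    concurrency: int           # lifecycle tasks running in parallel
    churn_per_day: int         # lifecycle tasks (provision or deprovision) per day

    def peak_burst(self) -> int:
        """Workloads the cloud must handle in a single provisioning burst."""
        return self.magnitude * self.concurrency


# The CI example from the definitions: ten pipelines, each deploying a
# 40-workload test environment, with a 5-minute provisioning SLO.
# The churn figure is an assumed value for illustration.
ci = ProvisioningProfile(velocity_slo_seconds=300, magnitude=40,
                         concurrency=10, churn_per_day=20)
print(ci.peak_burst())  # 400 workloads in flight at once
```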

What do IWPPs describe?

IWPPs provide a usage model constructed from how your cloud is consumed. Consumption in this case should be defined as the usage of resources, whether that be facilities (power, cabling, cooling), compute, networking, or storage. Understanding your cloud’s IWPPs helps you properly gauge how often or when you need to allocate or rebalance resources across your cloud to ensure availability and capacity.

An additional point to recognize about an IWPP is that it does not describe the types of workloads (e.g. physical server, virtual machine, container, object store, etc.); it describes the periodic interaction with workloads. Every interaction between the management system and a workload has a performance cost associated with it, especially during initial provisioning (object creation) and final deprovisioning (object destruction). This means that a cloud can have multiple profiles. A good example is a cloud that hosts both continuous integration and continuously delivered production services: the provisioning of workloads supporting continuous integration services is normally fully automated, without any manual steps, while the provisioning of continuously delivered production services is only partially automated because code pushes to production are still manually triggered.

How does workload lifespan affect IWPPs?

Different lifespans of workloads can require unique profiles. One such scenario is a cloud with multiple service-level agreements (SLAs) or service-level objectives (SLOs) tied to provisioning. Using the example above, the lifetime of a continuous integration workload can be measured in hours, minutes, or even seconds, which results in a higher churn rate. The lifetime of continuously delivered production workloads will, in most cases, be longer, measured in days, weeks, or months (hopefully not years), ultimately resulting in a much lower churn rate.
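
As a rough back-of-the-envelope illustration, the sketch below compares the monthly churn of a steady population of workloads when the average lifespan is one hour (the CI case) versus roughly a month (the production case). The population size and lifespans are assumptions chosen only to show the order-of-magnitude gap.

```python
HOURS_PER_MONTH = 30 * 24  # ~720


def monthly_churn(population: int, lifespan_hours: float) -> float:
    """Approximate create + destroy events per month for a steady population."""
    replacements = population * (HOURS_PER_MONTH / lifespan_hours)
    return 2 * replacements  # every replacement is one create and one destroy


# Assumed: a steady population of 100 workloads in each case.
print(monthly_churn(100, lifespan_hours=1))        # CI workloads: 144,000 events
print(monthly_churn(100, lifespan_hours=30 * 24))  # month-long prod workloads: 200 events
```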

IWPP Types

There are four general types of IWPPs:

Low churn/low retention

Less frequent provisioning of short-lived workloads. An example is the lifecycle of a virtual machine hosting an actuarial table calculation application that is provisioned quarterly to build an insurance company’s actuarial tables and is destroyed within a day of its provisioning.

Low churn/high retention

Less frequent provisioning of long-lived workloads. Historically, this is what the IWPP in an enterprise most looks like. Generally, workloads are either manually provisioned by an infrastructure team or leverage a semi-automated method that includes checkpoints.

High churn/low retention

Continual provisioning of short-lived workloads. Continuous integration (CI) environments normally match this profile: workload provisioning is fully automated by testing frameworks, and workload life expectancy is measured in minutes or hours.

High churn/high retention

Continual provisioning of long-lived workloads. An excellent example of this is AWS: customers can continuously deploy AWS EC2 instances without degradation and will keep them around.

Each IWPP type listed above can be represented graphically to better understand what the profile means. The following images are visual representations describing the number of running workloads over time vs. capacity thresholds. One thing you will notice is that I am using the number of workloads as a unit descriptor instead of CPU or memory usage. There’s a reason behind that: these graphs should not be viewed as tactical information; they represent trends over time. Remember that a workload is an analogue for resource consumption.

A few things to pay attention to on these graphs:

  • The churn rate is described by the Average provisioning deviation data points.
  • “Provisioning” includes both the initial deployment of the workload and the eventual destruction of the workload.
  • The “workload” represents the instance itself (e.g. a virtual machine), but it can also represent other objects (e.g. a network port, a firewall rule, a storage volume, etc.) required by the workload; the instance by itself is only one dimension. The other required objects can be the kicker: in an OpenStack system, each virtual instance generally has three separately managed objects: the instance, a network port, and a firewall ruleset. The cloud management system is not contending with a single object, it is dealing with three, each with a management cost that plays into the churn rates (see the sketch after this list).
  • Many clouds use general resource utilization thresholds to trigger new hardware purchases. These thresholds don’t necessarily work as well under the duress of automated provisioning as they have in the past. In each graph, data points have been added to represent 80% and 90% resource utilization.
  • The actual numbers on the Y axis describing the number of workloads should be ignored; the differences are due to the different methods used to develop the graphs. With that said, the numbers do matter in your own cloud; for example, it would be difficult to support the high churn/low retention profile if there aren’t enough resources.
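
To illustrate the multiplier effect of those extra objects, here is a small sketch. The three-objects-per-instance figure comes from the OpenStack example above; the event counts are assumed values for illustration.

```python
OBJECTS_PER_INSTANCE = 3  # OpenStack example: instance + network port + firewall ruleset


def management_object_churn(workload_lifecycle_events: int,
                            objects_per_workload: int = OBJECTS_PER_INSTANCE) -> int:
    """Lifecycle operations the management system actually performs."""
    return workload_lifecycle_events * objects_per_workload


# Assumed: 400 workloads provisioned and later destroyed (800 workload-level events).
print(management_object_churn(800))  # 2400 object-level operations
```
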
Low churn/low retention

In the low churn/low retention profile, workloads are provisioned infrequently and only exist for a short period of time. There’s a low probability that the cloud’s capacity usage thresholds will be violated without enough time to procure additional resources.

Low churn/high retention

In the low churn/high retention profile, workloads are still provisioned infrequently but exist for a much longer period of time. The probability of threshold violations is higher, but a capacity planner should see it coming.

High churn/low retention

In the high churn/low retention profile, things can get challenging. Clouds can get themselves into trouble during the shift from low churn/high retention to high churn/low retention; capacity planning efforts do not always have a full understanding of the daily up and down of this provisioning profile. For example, in this graphic the 80% and 90% capacity usage thresholds are surpassed several times, but the average number of workloads running over the month stays the same. The idea of what’s acceptable for available capacity vs. what isn’t needs to be rethought in a lot of cases.
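
A small sketch of why averages hide this problem: the hourly workload counts below are invented, as is the capacity figure, but they show how a day can look fine on average while repeatedly blowing through the 80% and 90% thresholds.

```python
# Invented hourly workload counts for one business day of a high churn/low
# retention cloud, against an assumed capacity of 500 workloads.
hourly_workloads = [120, 130, 150, 480, 520, 490, 140, 130]
capacity = 500
warn = 0.8 * capacity  # 400
crit = 0.9 * capacity  # 450

average = sum(hourly_workloads) / len(hourly_workloads)
over_warn = [w for w in hourly_workloads if w > warn]
over_crit = [w for w in hourly_workloads if w > crit]

print(f"average running workloads: {average:.0f}")      # 270, well under the 80% line
print(f"samples over 80% threshold: {len(over_warn)}")  # 3
print(f"samples over 90% threshold: {len(over_crit)}")  # 3
```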

High churn/high retention

Finally, there is the high churn/high retention profile. Honestly, if a cloud is using this profile, the operators probably already know what they are doing. If not, the cloud will very quickly become a nightmare to manage and plan for. In a lot of cases I have experienced, the high retention is due to a lack of consumer education: consumers may be used to grabbing as many resources as possible if doing so was historically difficult or arduous. If the cloud is pay-to-play this may not be a big deal; however, private clouds don’t normally have the mature chargeback and accounting functions in place to account for this profile. I have also seen this profile in use within companies that are using AWS like it’s a private cloud; it “works” until the bill is received.
