Building a Private Cloud at Los Alamos
By Drew Robb
September 22, 2010
That was the approach taken by Los Alamos National Laboratory as it seeks to create an infrastructure on demand (IOD) architecture to simplify the rollout of new technology projects and to eliminate delays in storage, server and network provisioning.
Anil Karmel, IT manager at Los Alamos National Lab, noted four tenets that played a major role in the private cloud decision:
"As we deploy more virtual servers, we consume far less power and also reduce electronic waste," said Karmel. "We estimate eventual savings of $1.3 million annually due to IOD."
Server capacity on demand is now achievable in a few clicks. Instead of 30 days to provision a server, it now takes less than 30 minutes.
The organization uses HP c7000 blade enclosures along with HP Virtual Connect Fibre Channel/Flex-10 Ethernet. HP BL460c and BL490c blades are used, with each blade containing multiple quad-core and six-core chips.
A NetApp SAN was brought in to add storage capacity. This is based on the NetApp V-Series with 2 PB of Tier 2 SATA storage. Tier 1 is provided by existing HP arrays.
The cloud itself consists of four elements: a web portal at the front end; Microsoft SharePoint as the automation engine for cloud workflows, and also as the integration point for functions such as chargeback; VMware vCloud Director to manage and operate the cloud; and VMware vShield to provide security at both the application level and at the user device level.
"Any virtual environment has to be cost-effective, so that means it has to be simple while being aware of any and all changes in real time," said Karmel.
This is especially important in the security arena. Traditional security operates at the hardware or software layer. But the addition of a virtualization layer, said Karmel, creates too many gray areas for such security tools to operate effectively. Hence security itself is now being virtualized to head off yet another wave of security holes in corporate networks.
Using Infrastructure on Demand, the National Lab is creating virtual security enclaves with vShield that prevent one desktop or client from infecting others and keep virtual machines (VMs) out of harm's way. Rules are set indicating access rights, as well as security protocols based on threat detection. Traditional security tools interface with this virtual security layer to keep servers and devices better protected. Any time a threat is detected, the offending virtual computer is sent to a remediation area, which has no network connectivity with which to propagate malware.
"This all occurs automatically based on preset policy," said Karmel. "If a VM is moved from one host to another, the security policy given to it moves with it."
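The two behaviors Karmel describes, quarantine on detection and policy that travels with the VM, can be illustrated with a small model. This is only a sketch in plain Python, not the vShield API: the `VM` class, the `REMEDIATION` name, and the rule that only VMs in the same enclave may talk to each other are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical name for the isolated remediation area described in the article.
REMEDIATION = "remediation"


@dataclass
class VM:
    name: str
    host: str      # physical host the VM currently runs on
    enclave: str   # virtual security enclave the VM belongs to
    policy: str    # security policy attached to the VM itself


def migrate(vm: VM, new_host: str) -> VM:
    """Move a VM to another host. Policy and enclave are VM attributes,
    so they are untouched by the move."""
    vm.host = new_host
    return vm


def on_threat_detected(vm: VM) -> VM:
    """Send the offending VM to the remediation area."""
    vm.enclave = REMEDIATION
    return vm


def can_communicate(a: VM, b: VM) -> bool:
    """Assumed rule: quarantined VMs have no connectivity at all, and
    other VMs may only reach peers in the same enclave."""
    if REMEDIATION in (a.enclave, b.enclave):
        return False
    return a.enclave == b.enclave


if __name__ == "__main__":
    web = VM("web01", "blade-a", "research", "policy-research")
    db = VM("db01", "blade-b", "research", "policy-research")
    migrate(web, "blade-c")
    print(web.policy, can_communicate(web, db))
    on_threat_detected(web)
    print(can_communicate(web, db))
```

Keeping the policy on the VM record rather than on the host is what makes the move safe in this model: nothing about the destination host has to be configured first.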
To prevent VM sprawl, VMs are given an expiry date. This is one year by default, though that can be adjusted. Thirty days before the due date, an email is automatically generated asking the VM owner about renewal.
A similar email is sent with 10 days left and then again the day before expiry. As soon as the VM is turned off, the user is informed of the fact and asked if he or she wants it back online. Even then, 29 days later, the user is told that the VM is scheduled for deletion. The next day it is deleted.
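The notification schedule above amounts to a fixed set of offsets around the expiry date. The following sketch lays them out for a single VM. It assumes that "one year" means 365 days and that the VM is powered off on the expiry date itself; the function name and action strings are illustrative, not taken from the Los Alamos system.

```python
from datetime import date, timedelta

# Assumption: the default "one year" lifetime is modeled as 365 days.
DEFAULT_LIFETIME_DAYS = 365


def lifecycle_events(created: date, lifetime_days: int = DEFAULT_LIFETIME_DAYS):
    """Return (date, action) pairs for one VM, in chronological order."""
    expiry = created + timedelta(days=lifetime_days)
    return [
        (expiry - timedelta(days=30), "email owner: renew?"),
        (expiry - timedelta(days=10), "email owner: renew?"),
        (expiry - timedelta(days=1), "email owner: renew?"),
        # Assumption: the VM is turned off on the expiry date.
        (expiry, "power off VM, ask owner whether to bring it back online"),
        (expiry + timedelta(days=29), "email owner: deletion scheduled"),
        (expiry + timedelta(days=30), "delete VM"),
    ]


if __name__ == "__main__":
    for when, action in lifecycle_events(date(2010, 9, 22)):
        print(when.isoformat(), action)
```

A renewal would simply call the function again with a new lifetime, which is why the lifetime is a parameter rather than a constant baked into the offsets.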
However, a backup is retained for seven years just in case. The NetApp storage is used to create snapshots of VMs before they are retired to tape. For now, restores are not automated. But in the next version of Infrastructure on Demand, users will be able to restore the VMs they want in a few clicks. "Lifecycle management of VMs is very important," said Karmel.
The organization has set up a chargeback structure. Cloud resources are priced according to CPU, RAM and disk. Users can see the total cost before submitting a request for IT resources. Following a request, the line manager has to approve it and accept the charges to that unit.
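A chargeback model of this kind reduces to a price per unit of CPU, RAM and disk, a quote shown before submission, and an approval step. The sketch below shows that shape only. The rates are invented placeholders, not Los Alamos pricing, and the `Rates`, `Request`, `quote_cents` and `approve` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical monthly rates in cents; NOT actual Los Alamos figures."""
    per_cpu: int = 2000      # per virtual CPU
    per_gb_ram: int = 500    # per GB of RAM
    per_gb_disk: int = 10    # per GB of disk


@dataclass
class Request:
    cpus: int
    ram_gb: int
    disk_gb: int
    approved: bool = False


def quote_cents(req: Request, rates: Rates = Rates()) -> int:
    """Total monthly cost, shown to the user before the request is submitted."""
    return (req.cpus * rates.per_cpu
            + req.ram_gb * rates.per_gb_ram
            + req.disk_gb * rates.per_gb_disk)


def approve(req: Request, manager_accepts_charges: bool) -> Request:
    """The line manager approves the request by accepting the charges."""
    req.approved = manager_accepts_charges
    return req


if __name__ == "__main__":
    req = Request(cpus=2, ram_gb=4, disk_gb=100)
    print(f"Quoted monthly cost: ${quote_cents(req) / 100:.2f}")
    print("Approved:", approve(req, manager_accepts_charges=True).approved)
```

Amounts are kept as integer cents so that quotes add up exactly, which matters once charges are rolled up per unit.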
"You have to build best practices around our workloads," said Karmel. Service Level Agreements (SLAs) are set at four 9s (99.99 percent availability). If some hardware goes down and Infrastructure on Demand doesn't meet the SLA, it doesn't charge for that resource for that month. In addition, uptime and availability metrics are regularly published so users are fully informed.
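To put four 9s in perspective, 99.99 percent availability leaves roughly 4.3 minutes of downtime in a 30-day month. The sketch below works through that arithmetic and the no-charge rule. It assumes availability is measured per calendar month and per resource, which the article implies but does not state; the function names are illustrative.

```python
SLA_TARGET = 0.9999  # "four 9s"


def minutes_in_month(days_in_month: int) -> int:
    return days_in_month * 24 * 60


def allowed_downtime_minutes(days_in_month: int) -> float:
    """Downtime budget for one month at the SLA target."""
    return minutes_in_month(days_in_month) * (1 - SLA_TARGET)


def monthly_charge_cents(base_charge_cents: int,
                         downtime_minutes: float,
                         days_in_month: int) -> int:
    """Assumed rule: if measured availability for the month falls below
    the SLA target, the resource is not charged for that month."""
    availability = 1 - downtime_minutes / minutes_in_month(days_in_month)
    return base_charge_cents if availability >= SLA_TARGET else 0


if __name__ == "__main__":
    print(f"Budget in a 30-day month: {allowed_downtime_minutes(30):.2f} minutes")
    print(monthly_charge_cents(7000, downtime_minutes=4, days_in_month=30))
    print(monthly_charge_cents(7000, downtime_minutes=5, days_in_month=30))
```

With a budget this small, a single unplanned reboot is enough to waive the month's charge, which is part of why the published uptime metrics matter to users.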
At the moment, separate network, security and virtual server teams are being maintained to monitor the infrastructure. Over time, this may be streamlined to one centralized unit.