Following on from my previous entry, I’ve recently read that Google have lifted the veil of secrecy on their data centres and are now showing the world how they make them highly efficient. They too have opted for the decentralised Uninterruptible Power Supply. Well, actually it’s more than that: they’ve incorporated the UPS into the server itself. (See this article)

What they do is basically refit the power supply with a battery, an intelligent charger and some DC/DC conversion. This makes perfect sense, as you gain efficiency by removing the DC-AC inverter stage required by conventional UPS systems, raising the efficiency from the low 90s to over 99%.
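To put that efficiency gap into energy terms, here is a minimal sketch. The 300 W server load is a made-up figure for illustration, and 92% simply stands in for “low 90s”; only the 99% figure comes from the article.

```python
HOURS_PER_YEAR = 365 * 24


def input_power_w(load_w: float, efficiency: float) -> float:
    """Power drawn from the utility to deliver load_w at a given efficiency (0-1)."""
    return load_w / efficiency


def annual_loss_kwh(load_w: float, efficiency: float) -> float:
    """Energy lost in the power conversion over one year of continuous running."""
    loss_w = input_power_w(load_w, efficiency) - load_w
    return loss_w * HOURS_PER_YEAR / 1000


# Hypothetical 300 W server, running all year.
SERVER_LOAD_W = 300
conventional = annual_loss_kwh(SERVER_LOAD_W, 0.92)  # assumed "low 90s" UPS
integrated = annual_loss_kwh(SERVER_LOAD_W, 0.99)    # battery-in-server approach

print(f"Conventional UPS loss: {conventional:.1f} kWh per year")
print(f"Integrated UPS loss:   {integrated:.1f} kWh per year")
```

Multiply the difference by the number of servers in a large data centre and you can see why the inverter stage is worth removing, at least on efficiency grounds.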

The trouble with this, however, is that you can’t retrofit it. Your server needs to be built with this technology incorporated from the beginning, and Google custom build their own servers in any case.

There is one big drawback, however: they’ve completely ignored power quality. It’s all very well making systems more efficient, but to do so at the expense of power quality seems a false economy to me.

An Uninterruptible Power Supply does more than provide battery backup. It should condition the utility power so that transients, surges, harmonics and other power quality problems are dealt with before they hit your server. Google’s approach seems to ignore this, and they may be leaving themselves open to power problems as a result.

I agree with the decentralised approach, however. You put the UPS in as and when needed, saving the upfront costs. Keeping the UPS close to capacity also has an efficiency benefit. Any problems with a UPS can be easily rectified (and will only affect the server it’s attached to) and, probably more importantly, the UPS makes sure that only clean power enters the server.

I read this article about an Australian university that is set to save £200k through a variety of energy saving schemes, one of which was to use local UPS systems instead of a central UPS system.

There are, of course, pros and cons for each approach, but I hadn’t considered the efficiency angle before. Looked at as a whole, it seems there is no way lots of individual UPS systems can be more efficient than one big one. However, most centralised UPS systems will operate at no more than 50% load (due to redundancy, so that if one UPS fails the other can support 100% of the load), and this is where a lot of efficiency is lost. Most UPS systems are more efficient at full load than at half load (or less).

With point of use UPS systems, if you wanted to maintain redundancy, the same effect would occur: you would have two systems running at half load. Since each UPS system also requires its own onboard controller, you would think that these losses would add up throughout the data centre, making it less, rather than more, efficient.

However, the real gain with using local systems is that you can size them exactly. With a centralised system you need to define what the maximum power consumption will be, now and at any time in the future, and put in a UPS to match (or opt for a modular system, but that is another blog entry). It is therefore likely that in most early data centres the centralised UPS are running nowhere near their 50% loading, whereas with local point of use systems you can just add systems as and when needed, so you’re not wasting power by running the UPS away from its sweet spot.
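The effect of loading on efficiency can be sketched with a very simple loss model: a fixed overhead (controller, fans, magnetising losses) that is there regardless of load, plus a loss proportional to the power delivered. The 2% and 4% figures below are assumptions chosen purely for illustration, not data for any real UPS, so check the manufacturer’s efficiency curve before drawing conclusions.

```python
def ups_efficiency(load_fraction: float,
                   fixed_loss: float = 0.02,
                   proportional_loss: float = 0.04) -> float:
    """Efficiency of a UPS at a given fraction of its rated load.

    fixed_loss: no-load loss as a fraction of the UPS rating (assumed).
    proportional_loss: loss per unit of output power (assumed).
    """
    if load_fraction <= 0:
        return 0.0
    losses = fixed_loss + proportional_loss * load_fraction
    return load_fraction / (load_fraction + losses)


# Compare a right-sized unit with an oversized, lightly loaded one.
for fraction in (1.0, 0.5, 0.25, 0.1):
    print(f"{fraction:>4.0%} load -> {ups_efficiency(fraction):.1%} efficient")
```

Because the fixed overhead does not shrink with the load, the efficiency falls away faster and faster as the loading drops, which is exactly the penalty an oversized central UPS pays.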

We’ve actually used this approach for a customer recently. He has a small computer room that has been built up over the years and has no overall UPS support. We’ve gone in to help and look at the options. The simplest approach seemed to be to put in a 10 kVA UPS and wire this in to the existing infrastructure. This would give him the UPS support he needed. However, as his data suite was powered by several circuits, we would need to run in a new power feed. We would need to add PDUs at the output of the UPS. We would then need to wire these into the existing circuits. All of a sudden, the actual cost of the UPS started paling into insignificance next to the added installation costs.

As a result, we looked at individual UPS to fit into each rack and power the server and associated equipment individually. All of a sudden the numbers started to make sense. The KR1000J is more than enough for his servers, and occupies only 2U of rack space. So the customer has opted for individual UPS systems, saving an astounding £5,000 compared with a centralised system!

Well, here we are, the pinnacle of data centre design: the Tier IV centre. What does this do that the others don’t? Like Tier III, it requires computer systems that have dual power supplies (or a local STS if dual supplies are not available), but unlike Tier III it has two fully redundant power paths. This means one path can go down without affecting the load, and even a UPS failure on the other path will still leave the load powered.

Tier IV Data Centre Power Paths

This infrastructure allows planned maintenance to be carried out with the absolute minimum of downtime. A whole power path can be taken out of circuit, and the load is still supplied with redundant power, effectively becoming a Tier II centre during this outage. Something fairly major would have to happen to lose power to a well designed Tier IV centre (and occasionally it does!).

The availability of a Tier IV centre is 99.995%, equating to under half an hour (about 26 minutes) of downtime per year, practically guaranteeing 100% availability in any one year (as this downtime is planned and taken every few years rather than every year).
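Converting an availability percentage into downtime is simple arithmetic. A minimal sketch, using the availability figures quoted for each tier in this series and assuming a 365-day year:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

# Availability figures as quoted for each tier in this series of posts.
TIER_AVAILABILITY_PERCENT = {
    "Tier I": 99.671,
    "Tier II": 99.741,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}


def downtime_hours_per_year(availability_percent: float) -> float:
    """Expected downtime per year for a given availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for tier, availability in TIER_AVAILABILITY_PERCENT.items():
    hours = downtime_hours_per_year(availability)
    print(f"{tier}: {hours:.2f} hours ({hours * 60:.0f} minutes) per year")
```

Bear in mind these are long-run averages. A site can go several years with no downtime at all and then take its allowance in one go.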

So how can we implement this best practice in the SME computer room? Well, for a start remember that even in the world of data centres, Tier IV is rare. This is not just because of the power requirements but because of other considerations which are beyond the scope of this article. However, one idea may be to separate the redundancy element.

For example, let us suppose you had a computer room protected by 2 x 100 kVA UPS systems in a parallel redundant configuration with gen set support, i.e. a Tier II config. If a separate source was available (or you could even parallel the existing source), you could improve the reliability by taking one of the UPS and using it to supply a new ‘B’ power path. The load is still supported by 2 x 100 kVA UPS, and if either one should fail then the other can support the full load. The UPS are independent of each other, so there is no risk of a communications failure and, more importantly, the load is not at risk from a single switchgear or panel failure. I’ll see if any statisticians can work out the maths for this, but common sense dictates it is a more reliable configuration.
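In the meantime, here is a rough sketch of how that maths might look using the standard series and parallel availability formulas. The availability figures for the UPS and the panel are invented for illustration, and the model assumes failures are independent, the load has ideal dual inputs and the upstream sources never fail, so treat it as a back-of-envelope comparison rather than a proper reliability study.

```python
def series(*availabilities: float) -> float:
    """All elements must work: multiply the availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Any one element is enough: 1 minus the chance that all fail."""
    all_fail = 1.0
    for a in availabilities:
        all_fail *= (1 - a)
    return 1 - all_fail


# Hypothetical figures, not taken from any datasheet.
A_UPS = 0.999     # one UPS module
A_PANEL = 0.9999  # output switchgear / distribution panel

# Option 1: two UPS in parallel redundant, feeding one shared panel.
shared_panel = series(parallel(A_UPS, A_UPS), A_PANEL)

# Option 2: two independent paths (A and B), each a UPS with its own panel.
independent_paths = parallel(series(A_UPS, A_PANEL), series(A_UPS, A_PANEL))

print(f"Parallel redundant, shared panel: {shared_panel:.7f}")
print(f"Independent A and B paths:        {independent_paths:.7f}")
```

With the shared panel, the panel’s own availability caps the whole system however good the UPS are. Splitting into two paths removes that cap, which is the common sense argument expressed in numbers.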

Anyway, that concludes our data centre overview. Any comments or feedback appreciated.

The decision to create a Tier III Data Centre is a strategic one, usually as a result of the need for an extremely fault tolerant system, either for customer demand (website hosting for example) or for business continuity reasons (credit card processing, military, financial etc.).

The Tier III data centre was conceived when computer systems were introduced that had dual power supplies. The basis of the Tier III centre is that there is one power path with redundancy, and an alternate power path:

Tier III Data Center Power Flow

Note that essential cooling is now added to the UPS output, whereas with Tier I and II it was expected that the site could cope with short downtime of cooling during which time the generators would power up and restart the system. In Tier III this is not allowed and cooling is continuous. This needs to be borne in mind when selecting the type and size of UPS. It is for this reason that you will often find rotary UPS systems (with their ability to handle mechanical loads better than static systems) used for such applications, although this is by no means a requirement.

The computer systems are dual powered (commonly referred to as A & B inputs). If a computer system is utilised without dual inputs it is expected that a local Static Transfer Switch (STS) is utilised.

In the diagram above we will assume that the primary power path is A, and the secondary power path is B. The Static Transfer Switch (STS) also has primary and secondary inputs, and its primary input is also taken from Path A. The computers are therefore supplied by two power inputs, both of which are provided by the UPS systems. Should the primary power fail anywhere along Path A, the Static Transfer Switch will switch to its secondary input and continue to supply the load from Path B.

NOTE: Static Transfer Switches are capable of switching between their A & B inputs within microseconds, provided the two sources are synchronised. If they are not, there will be a switching delay. This is particularly important when you consider the issue of selectivity, i.e. the ability of the source to clear a fault. To achieve this selectivity, UPS are synchronised with their bypass input. Should a fault occur they can switch to bypass instantaneously (quickly anyway), which allows a greater fault current so that the fault is cleared quickly (i.e. pops a fuse or trips a breaker) without disrupting other equipment on parallel circuits. This means that the primary and secondary sources should be synchronised. Make sure this can be done! Another consequence of unsynchronised inputs is the potential for 400V (not 230V) AC within the computer room cabinet.
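To see where that higher voltage comes from, consider the voltage between the live conductors of two 230V sources that share a neutral but are out of step with each other. A minimal sketch, assuming both sources are pure sine waves of the same magnitude and frequency (the function name is my own):

```python
import math


def voltage_between_sources(v_rms: float, phase_offset_deg: float) -> float:
    """RMS voltage between two equal sinusoidal sources with a phase offset.

    This is the magnitude of the phasor difference: 2 * V * sin(offset / 2).
    """
    offset_rad = math.radians(phase_offset_deg)
    return 2 * v_rms * math.sin(offset_rad / 2)


for offset in (0, 30, 120, 180):
    volts = voltage_between_sources(230, offset)
    print(f"{offset:>3} degrees apart -> {volts:.0f} V between the two lives")
```

At 120 degrees apart, the same displacement as two phases of a three-phase supply, you get the familiar 400V or so, and in the worst case of 180 degrees it is double the nominal voltage. Synchronised sources keep the offset, and therefore the voltage between them, close to zero.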

As you can see, a Tier III centre is inherently more robust as it will tolerate a failure anywhere along the primary path without power being lost. This is what classifies a Tier III system: it is basically a Tier II system with an alternate power path, derived from a separate source. A Tier III centre has an availability of 99.982%, which equates to 1.6 hours of downtime per year.

So how does this help the average computer room user? Well, Tier III is probably way over the top for an SME computer room. I have known small financial companies that require the fault tolerance of the Tier III infrastructure; however, Tier III is more strategic and therefore the site is designed from the beginning with Tier III in mind. It is difficult to retrofit a Tier III system without severe disruption to the existing business. However, if you require just a little more protection against unplanned outages, it may not be too difficult to install a secondary power path and an STS to feed your computers. Or another alternative may be to look at a halfway house for Tier IV...

Tier II centres encompass all the features of Tier I centres with the addition of redundant critical power and cooling components.

Tier II Data Centre Power Path

Each component must be capable of operating if the other component fails. This is typically achieved with n+1 redundancy. What this means is that if ‘n’ modules, e.g. 2, are required to support the load, then you install ‘n+1’, i.e. 3. There is a lot of debate as to what determines true redundancy. For example, some manufacturers have separate UPS modules controlled by a single controller. If the controller fails then the system fails, so this is not true redundancy, although they may argue that the controller is designed with redundancy built in.
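The n+1 sizing rule is easy to express as a calculation. A minimal sketch, where the load and module ratings are made-up example figures:

```python
import math


def modules_required(load_kva: float, module_kva: float, spare: int = 1):
    """Return (n, n + spare): modules needed for the load, and modules to install."""
    if load_kva <= 0 or module_kva <= 0:
        raise ValueError("load and module rating must be positive")
    n = math.ceil(load_kva / module_kva)
    return n, n + spare


# Hypothetical example: a 180 kVA load served by 100 kVA modules.
n, installed = modules_required(180, 100)
print(f"n = {n} modules carry the load, so install n+1 = {installed}")
```

The same function covers n+2 or higher by changing the spare argument, although every extra module lowers the loading on each unit.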

Some UPS modules don’t have an internal static bypass and instead rely upon a wrap around static bypass. The argument here is that one large wrap around is more robust than several smaller ones, as smaller systems may blow up one by one due to a race condition in the event of a fault. My belief is that the latter is unlikely, as static switching can occur in microseconds, probably about a thousand times quicker than the time needed to damage a single static switch. In any case, the static switch is usually rated many times higher than the nominal load current to accommodate fault currents. However, the wrap around is now a single point of failure in the system, although if it fails this will only cause a problem if the system needs to bypass, so is this a problem? The debate will continue to rage on.

Historically, UPS were configured in what was known as a “hot standby” configuration to achieve redundancy. In this instance two UPS are fed from the utility, but the output of the standby UPS is fed to the bypass input of the primary UPS. The primary UPS provides power and if it fails (and bypasses), the standby unit will then provide UPS power to the load via the bypass of the primary. This works in principle, and can be used with UPS systems of different manufacturers and ratings; however, the primary UPS output is a single point of failure. In addition, should the primary UPS fail, the secondary UPS will be expected to go from 0 to 100% load instantaneously. Shouldn’t be a problem, but sometimes it is!

The modern method of achieving redundancy is to share the load equally amongst the UPS modules (this is how all our Kehua Parallel Systems operate). The UPS talk to each other through redundant communications ports and no one UPS is master over the others. If any UPS should fail, the UPS is isolated from the others automatically.
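To illustrate what equal load sharing means for each module, here is a small sketch. The 100 kVA load on two 100 kVA modules is an example figure, and the function is my own illustration rather than anything from a UPS controller.

```python
def load_per_module(load_kva: float, module_kva: float, modules_online: int):
    """Return (kVA per module, fraction of module rating) with equal sharing."""
    if modules_online < 1:
        raise ValueError("at least one module must be online")
    share_kva = load_kva / modules_online
    return share_kva, share_kva / module_kva


# Example: 100 kVA load on two 100 kVA modules, then on one after a failure.
for online in (2, 1):
    share, fraction = load_per_module(100, 100, online)
    print(f"{online} module(s) online: {share:.0f} kVA each, {fraction:.0%} loaded")
```

When a module is isolated, the survivors simply pick up a larger share, which is why in a redundant pair neither unit should normally be loaded above 50%.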

Enough about redundancy. The Tier II centre is more robust than the Tier I centre; however, it still has only one power path, and therefore there are times during faults or planned maintenance when the computers have to be powered down. As a result a Tier II centre has an availability of 99.741%, equating to 22.7 hours of downtime per year. Not much better than Tier I on the face of it (28.8 hours); however, Tier II is more robust against unplanned outages.

So how does this impact the normal computer room? As said under Tier I, it depends upon the financial impact of downtime on your business. For SMEs that rely on computer systems, but will all go to the pub if the power is off, there is perhaps no need to keep the computers running for hours on end: simply shut down gracefully and that’s that. Where you don’t want to have to shut down the system to perform maintenance on the UPS, and don’t want to leave the system vulnerable to power cuts or surges when using an external bypass switch, then you will need redundancy. Like all things in life, it’s a choice based upon your needs and wants.

It’s worth stating that Tier Standards are only a guide to the robustness of a site against outages. There is no standard or law dictating that this is the way it should be. Use the information as a guide to what is best for your business.

Tier I data centres provide a dedicated site infrastructure to support IT Systems and include:

  • A dedicated space for IT systems
  • A UPS to filter power spikes, sags and protect against momentary outages
  • Dedicated Cooling Equipment
  • A Backup Generator to protect against prolonged outages

There is a single power path delivering power to the load and redundancy is not required. As a result any component or distribution path failure will impact the computer systems.

Power Flow Schematic for a Tier I Data Center

Standby Power Flow Schematic for a Tier I Data Center

During normal operation the UPS is providing clean power and protecting the load. A short term power outage will see the UPS continue to provide power to the computers but the cooling system will be shut down. During extended outages the generators will activate allowing continuing operation of the computers and the cooling system will restart. Any planned work will more than likely require the computer systems to be shut down.

Tier I data centres have an availability of 99.671%, which equates to about 28.8 hours of downtime per year (planned and unplanned).

For a typical computer room, a Tier I set up is more than likely adequate, with the addition perhaps of a redundant UPS module (see Tier II). The use of a generator is optional and dependent upon the impact of downtime on the business. The IT equipment can be configured to shut down gracefully in the event of an extended power failure, and the fact that lost data has been avoided is probably acceptable for many businesses.

Over the next week we’ll be blogging about how UPS are required and configured to achieve Tier I, Tier II, Tier III and Tier IV levels.

The Tier system has been developed by The Uptime Institute as an objective basis for comparing the capabilities of one particular design topology with another, or for comparing groups of sites.

We will be showing (at least as far as power is concerned) how we can move from one Tier level to the next depending upon the requirements of the data centre, and how best to apply these classifications to everyday computer rooms. After all, not every business will need the stability of a modern data centre.

Watch this space.

© 2012 The Power Protection Blog Uninterruptible Power Supply