Hyperconverged Infrastructure: Preparing for the Next Failure

Posted on Wednesday, October 16, 2019

Hyperconverged Infrastructure (HCI) simplifies many of the traditional administrative tasks associated with expanding or managing your data storage capabilities. The fully integrated, modular approach allows you to add capacity quickly and easily, and to configure failover systems that take over in the event of an outage.

But like any other storage array, HCI systems such as the Dell EMC VxBlock range are also prone to electromechanical failure. Here’s what you need to monitor.

Physical storage

The actual ‘disk’ element of any HCI array is the most likely point of failure. Wear and tear within a hard disk drive can cause bearing or servo failure – hardly surprising when you consider that the platters in a 7,200 RPM drive complete more than ten million revolutions every day.
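
To put that mechanical workload in numbers, here is a quick back-of-the-envelope calculation in Python. The spindle speeds listed are common drive classes chosen for illustration, not figures taken from any particular HCI product.

```python
# Revolutions per day for a continuously spinning drive.
# The RPM values below are illustrative drive classes (an assumption),
# not specifications of a specific array.

MINUTES_PER_DAY = 24 * 60


def revolutions_per_day(rpm: int) -> int:
    """Return how many times a platter turns in 24 hours at a given RPM."""
    return rpm * MINUTES_PER_DAY


if __name__ == "__main__":
    for rpm in (5400, 7200, 10000, 15000):
        print(f"{rpm:>6} RPM -> {revolutions_per_day(rpm):,} revolutions/day")
    # 7,200 RPM works out to 10,368,000 revolutions per day.
```

Even the slowest class in this list turns its platters several million times a day, which is why bearings and servos eventually wear out.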

SSDs may not contain moving parts, but they too will inevitably fail over time. Flash cells tolerate only a finite number of write (program/erase) cycles, so SSDs must be replaced once they approach their rated endurance.
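
One way to plan SSD replacement is to compare the data already written against the drive's rated endurance. The sketch below assumes you know the manufacturer's terabytes-written (TBW) rating and can read the total written so far from the drive's health data. The function name and the sample figures are hypothetical, and the result is a rough planning estimate rather than a prediction of when a drive will fail.

```python
# Rough SSD replacement estimate based on rated write endurance.
# All names and sample values here are illustrative assumptions.


def estimated_days_remaining(rated_tbw: float,
                             written_tb: float,
                             daily_write_tb: float) -> float:
    """Estimate days until the rated endurance is used up.

    rated_tbw      -- manufacturer's endurance rating, in terabytes written
    written_tb     -- terabytes written to the drive so far
    daily_write_tb -- average terabytes written per day
    """
    if daily_write_tb <= 0:
        raise ValueError("daily_write_tb must be greater than zero")
    remaining_tb = max(rated_tbw - written_tb, 0.0)
    return remaining_tb / daily_write_tb


if __name__ == "__main__":
    # Hypothetical drive: rated for 600 TBW, 450 TB written, 0.5 TB/day.
    days = estimated_days_remaining(600, 450, 0.5)
    print(f"Approximately {days:.0f} days of rated endurance left")
```

A drive that has reached its rating does not necessarily fail that day, but scheduling the swap before that point avoids an unplanned outage.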

PSUs

The presence of redundant units is an indication that PSUs can, and do, fail. In most cases, PSU breakdowns are caused by overheating following bearing failure in the cooling fan. This is especially true of HCI systems that contain separate PSUs for array controllers, network switches and other associated components.

Fans

Heat remains the number one cause of component failure in the data center – and that’s why fans are so important. Fans inside the array chassis and CPU/GPU coolers play a vital role in preventing HCI system failure.

Monitoring and maintenance are essential

To prevent HCI failure, your data center team needs to closely monitor the components mentioned above. Spotting the early signs of imminent failure allows you to replace components before they fail unexpectedly and take elements of your storage system offline.
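
As a rough illustration of what spotting early signs can look like in practice, the Python sketch below compares component readings against warning thresholds. Everything in it is an assumption made for the example: the metric names, the threshold values and the sample readings are hypothetical, and a real deployment would take its readings and limits from the array's own management tools and the vendor's guidance.

```python
# Minimal threshold check for disk, PSU and fan readings.
# Metric names, limits and sample data are hypothetical placeholders.

from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Reading:
    component: str  # e.g. "disk-03", "psu-1", "fan-2"
    metric: str     # must match a key in THRESHOLDS to be checked
    value: float


# ("max", x): warn when the value rises above x.
# ("min", x): warn when the value falls below x.
THRESHOLDS: Dict[str, Tuple[str, float]] = {
    "reallocated_sectors": ("max", 0),
    "ssd_wear_percent": ("max", 80),
    "temperature_c": ("max", 45),
    "fan_rpm": ("min", 2000),
}


def find_warnings(readings: List[Reading]) -> List[str]:
    """Return one message per reading that breaches its threshold."""
    warnings = []
    for r in readings:
        rule = THRESHOLDS.get(r.metric)
        if rule is None:
            continue  # no threshold defined for this metric
        kind, limit = rule
        breached = r.value > limit if kind == "max" else r.value < limit
        if breached:
            warnings.append(
                f"{r.component}: {r.metric} = {r.value} ({kind} {limit})"
            )
    return warnings


if __name__ == "__main__":
    sample = [
        Reading("disk-03", "reallocated_sectors", 12),
        Reading("ssd-01", "ssd_wear_percent", 64),
        Reading("psu-1", "temperature_c", 51),
        Reading("fan-2", "fan_rpm", 1400),
    ]
    for message in find_warnings(sample):
        print("WARNING:", message)
```

In this sample, the disk, the PSU and the fan are flagged while the SSD is still within its limit, which gives the team time to order and fit replacements before anything goes offline.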

For more help and advice about proactive monitoring, or to learn how CDS can help, please get in touch.
