The “Keep Everything” Conundrum: Is It Time to Re-evaluate Your Big Data Storage Strategy?


Posted on Wednesday, November 4, 2015

Modern Big Data analytics strategies are built on one fundamental principle: never discard anything. According to Big Data gurus, it is impossible to unlock hidden insights without a complete understanding of the customer. By that logic, businesses need to store every detail of every interaction to ensure that their analysis is not based on an incomplete dataset.

Linking otherwise unrelated datasets to provide additional context is thought to reveal further actionable insights that can be used to boost sales and profit margins. In some industries, such as grocery retail, this is true: weather conditions, geography, demographics, health concerns and trends all affect consumption of certain goods, so tracking these and other data points can support timely business decisions.

Consequently, businesses are rushing to implement Big Data systems that store everything indefinitely. Big Data analytics is forcing businesses to re-evaluate their information storage provisions, with many choosing to migrate much of their data to the Cloud rather than make costly capital investments in additional onsite capacity.

But is there really a need to keep everything?

Accuracy and relevance

Any data scientist will tell you that their analysis is only as good as the information it is based on. Research suggests that between 1% and 5% of corporate data is inaccurate at any given point in time, while other estimates put the rate as high as 20%.

So it is entirely possible that as much as one-fifth of the data stored in a Big Data system is inaccurate, skewing the insights drawn from it. And that is before considering data decay: information that was accurate at the time of entry but has since gone out of date.

Unfortunately, these statistics do not reflect the fact that much of today’s “corporate data” is actually personal data that happens to reside on corporate storage assets. The rise of BYOD (bring your own device) computing has blurred the lines between personal and corporate data, with many employees using company resources as an additional backup for their own files. These files are often duplicated several times, unnecessarily increasing the amount of paid-for Cloud storage required to back up and protect “corporate” data.
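One rough way to gauge how much duplication is inflating a storage bill is to group files by a hash of their contents: identical files produce identical hashes, so every copy beyond the first is capacity you are paying for twice. The Python sketch that follows is illustrative only. It assumes the files have already been read into an in-memory mapping of path to bytes (a hypothetical stand-in for a real storage inventory), and the function name `duplicate_report` is our own invention rather than part of any product.

```python
import hashlib
from collections import defaultdict


def duplicate_report(files):
    """Group byte-identical files and estimate reclaimable space.

    `files` is a hypothetical mapping of path -> file contents (bytes).
    Returns (total_bytes, reclaimable_bytes, duplicate_groups).
    """
    groups = defaultdict(list)
    for path, content in files.items():
        # Identical contents always produce the same SHA-256 digest.
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append((path, len(content)))

    # Every stored byte counts towards the bill.
    total = sum(size for members in groups.values() for _, size in members)
    # Keep the first copy in each group; the remaining copies are redundant.
    reclaimable = sum(size for members in groups.values() for _, size in members[1:])
    # Only report hashes shared by more than one path.
    duplicates = {d: [p for p, _ in m] for d, m in groups.items() if len(m) > 1}
    return total, reclaimable, duplicates


# Example: one personal photo backed up three times alongside a genuine report.
files = {
    "backup/photos/a.jpg": b"x" * 1000,
    "home/jo/a.jpg": b"x" * 1000,
    "sync/a_copy.jpg": b"x" * 1000,
    "reports/q3.xlsx": b"y" * 500,
}
total, reclaimable, duplicates = duplicate_report(files)
print(total, reclaimable)  # prints "3500 2000"
```

In this toy example, more than half of the stored bytes are redundant copies of a single personal file. A real audit would stream file contents from disk or object storage rather than hold them in memory, but the principle of comparing content hashes is the same.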

Regulatory issues

The European Court of Justice’s decision to invalidate the Safe Harbor framework has once again raised the issue of privacy rights, yet few people are asking whether a “store everything” strategy is compatible with privacy legislation.

Every sovereign state has its own rules about how personal data is to be handled, but most include provisions demanding that the data be accurate. The UK Data Protection Act enshrines one such principle: “Personal data shall be accurate and, where necessary, kept up to date.” The implication is that data which does not meet this standard, or which will not be updated, should be removed or deleted. Nor is UK legislation unique: the EU’s Data Protection Directive requires all member states to put in place their own national frameworks for handling personal data properly.

Time to re-evaluate your data storage strategy?

In many respects, the rush for Big Data could be costing businesses more than they realise, particularly if the above concerns are not being addressed. Unnecessary Cloud resource usage, faulty insights and potential legal action for storing inaccurate data all eat into profit margins, so much so that CIOs need to reconsider their “keep everything” policies.

They may find that shaping and managing data at the point of collection reduces their capacity requirements enough that local upgrades become more cost-effective, efficient and faster than archiving to the Cloud.

Interested in learning more? Give the CDS team a call today on +1 866 237 8008 and we’ll talk you through your options.