Are You Ready For Synthetic Data?
Posted on Wednesday, November 7, 2018
Much of the concern surrounding explosive data growth has been in relation to Internet of Things (IoT) sensors and their management systems. But there is another source that many CTOs are not yet aware of.
Machine Learning (ML) promises to make managing and actioning IoT data easier and less resource intensive – but the underlying algorithms require training first. And to do that, your data scientists will need raw data sets.
Introducing synthetic data
Rather than unleash their ML engine on live data, it is considered best practice to use a sanitized set of records. Often this is a copy of your live databases that has been anonymized to remove personally identifiable data. Alternatively, your data scientists may generate a completely new dataset that exhibits the same basic details and behaviors as your live system.
But these are not small subsets of data. Machine Learning algorithms need very large data stores on which to train, which means that synthetic data could grow your data estate faster than expected.
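As a rough illustration of the second approach, the sketch below generates synthetic sensor readings that share the mean and spread of a small live sample. The function name, the sample values and the assumption that readings are normally distributed are all hypothetical; a real project would fit a model appropriate to its own data.

```python
import random
import statistics


def synthesize_readings(live_values, count, seed=0):
    """Draw `count` synthetic values from a normal distribution
    fitted to the mean and standard deviation of `live_values`.

    Assumption: the live readings are roughly normally distributed.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    mean = statistics.mean(live_values)
    stdev = statistics.stdev(live_values)
    return [rng.gauss(mean, stdev) for _ in range(count)]


# Hypothetical temperature readings taken from a live system
live_sample = [20.1, 20.4, 19.8, 21.0, 20.6, 19.9, 20.3]

# Seven live values become ten thousand synthetic ones
synthetic = synthesize_readings(live_sample, count=10_000)
```

Note how quickly the volume grows: a handful of live values is enough to seed a training set of any size, which is exactly why synthetic data can expand a data estate so fast.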
Choosing storage for your synthetic data
In an ideal world, the training phase of your Machine Learning project would be carried out on production-quality, high-speed storage. But the reality is that there is no need to reserve your best storage for training data.
You could also choose to use a cloud platform for development and testing – but this involves a significant pause while your large synthetic dataset is uploaded. Then there is the issue of pay-as-you-use billing, which could make the whole project more expensive than expected.
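To get a feel for the pause and the bill, a back-of-the-envelope calculation can help. The sketch below is illustrative only: the dataset size, link speed, efficiency factor and price per gigabyte are hypothetical figures, not quotes from any provider.

```python
def upload_hours(dataset_tb, link_mbps, efficiency=0.8):
    """Estimate hours needed to upload `dataset_tb` terabytes over a
    link of `link_mbps` megabits per second.

    `efficiency` is an assumed fraction of the link actually usable.
    """
    bits = dataset_tb * 8 * 10**12  # decimal terabytes to bits
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 3600


def monthly_storage_cost(dataset_tb, price_per_gb_month):
    """Estimate the monthly charge for keeping the dataset in the cloud."""
    return dataset_tb * 1000 * price_per_gb_month


# Hypothetical project: 50 TB of synthetic data over a 1 Gbps link
hours = upload_hours(50, 1000)
cost = monthly_storage_cost(50, 0.02)  # assumed 0.02 per GB per month
print(f"Upload: about {hours:.0f} hours, storage: about {cost:.0f} per month")
```

Plugging in your own figures gives a quick comparison against the cost of putting existing arrays back to work.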
More expedient – and potentially more cost-effective – is the re-use of existing storage assets. By pressing redundant or post-warranty arrays back into service, you can quickly add the capacity required for your Machine Learning project, reducing lead time and overall project cost.
Even for businesses that do not use IoT sensors, Machine Learning will become a vital aspect of their digital transformation projects. So, it makes sense to consider how you will deal with the issue of synthetic data sooner rather than later.
To learn more about pushing post-warranty storage and servers back into service, and how CDS can help, please get in touch.