Dark Data Digging in 2018
By: Matt Schuster
The world of data analytics is about to see a boom not unlike the gold rush of 1848. But instead of uncovering large veins of gold, organizations are about to recognize and refine a large body of previously shapeless data that has been accumulating for years in our physical and virtual file cabinets. The world of Dark Data, or collected but unused data, is about to be turned inside out as corporations come to appreciate the sheer scope of the data being collected from their various processes.
Dark Data, put briefly, is any data that has been collected but used only for the primary purpose for which it was assembled. By its nature, Dark Data has not been aggregated into information, even though it may be relevant to other areas of an organization. A customer mailing list containing physical addresses, for example, could be used to extrapolate geographical statistics, but because it was gathered solely to provide mailing destinations, it has not been studied for further insight into consumer trends.
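To make the mailing list example concrete, here is a minimal Python sketch of how such a list might be turned into a simple geographical statistic. The records, field names, and the `customers_by_state` helper are all hypothetical and invented for illustration; a real list would need address cleaning before it could be counted this way.

```python
from collections import Counter

# Hypothetical mailing list: names and addresses are invented for illustration.
mailing_list = [
    {"name": "A. Rivera", "city": "Denver", "state": "CO"},
    {"name": "B. Chen", "city": "Boulder", "state": "CO"},
    {"name": "C. Okafor", "city": "Austin", "state": "TX"},
    {"name": "D. Larsen", "city": "Unknown", "state": ""},  # incomplete record
]


def customers_by_state(records):
    """Count records per state, skipping entries with no state on file."""
    return Counter(r["state"] for r in records if r.get("state"))


counts = customers_by_state(mailing_list)

# List states from most to fewest customers.
for state, total in counts.most_common():
    print(state, total)
```

The count uses only the state field, so the same data that served as a mailing destination now answers a different question: where the customers are concentrated.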
According to Kayla Matthews of SmartDataCollective.com, organizations are finding better ways to utilize data that is already being captured but has not been structured or put to a use other than its original purpose. With new regulations such as the General Data Protection Regulation (GDPR) coming into force, organizations are going to discover that controlling and removing their information about consumers will be onerous and expensive unless they find ways to structure and regulate their dark data.
Sony Shetty of Gartner.com recommends a security-minded approach: reining in the vast amounts of hoarded data from across the organization, securing it, making it accessible, and deleting it as necessary. He endorses gathering and analyzing unstructured data to further the business agenda. Together with Alan Dayley, research director at Gartner, he cautions against the blind collection of data, both because it leads to increased storage expenses and because too much unstructured data means lost business prospects when the data is not in a format that provides information or insight.
As organizations continue to gather data from websites and customer interactions, groups like Deloitte Consulting’s Analytics & Information Management encourage the use of external demographics to strengthen internal analytics. Nitin Mittal of Deloitte Consulting LLP anticipates that by 2020 there will be 44 zettabytes (44 billion terabytes) of data available, but that 90 percent of it will be unstructured data gathered from non-traditional sources. Mittal envisions that the next step will be to use data that has already been collected but could not previously be analyzed for lack of advanced pattern recognition algorithms. He posits that still images, audio, and video files already gathered may contain valuable information once the data is extracted using the latest technologies.
While Dark Data itself has been collected for centuries, ever since people first began making marks to note down anything of importance, a surge in Dark Data mining is anticipated in May of 2018, tied almost directly to the GDPR. With stringent legal liabilities on the way for breaches of consumer information, Rebecca Harper, a freelance writer for SmartDataCollective.com, advises data scientists that we don’t have to stop collecting data; we just have to alter the way we store it. She proposes anonymizing the data to protect the individual without detracting from the results of the analytics.
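As a rough illustration of what altering the way data is stored could look like, the Python sketch below replaces a direct identifier with a keyed hash before a record is kept for analytics. This is an assumption about one possible approach, not a method described by Harper. Strictly speaking it is pseudonymization, which is weaker than full anonymization because anyone holding the key can link tokens back to the same person. The field names, the `SECRET_KEY` value, and both helper functions are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would be stored separately
# from the data and never hard-coded.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for an identifier using HMAC-SHA256."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identity(record: dict) -> dict:
    """Keep only the fields needed for analytics, plus an opaque token."""
    return {
        "customer_token": pseudonymize(record["email"]),
        "state": record["state"],
        "order_total": record["order_total"],
    }


# Invented sample record.
raw = {"name": "A. Rivera", "email": "A.Rivera@example.com",
       "state": "CO", "order_total": 42.50}
stored = strip_identity(raw)
print(stored)
```

Because the token is stable, purchases by the same customer can still be grouped for trend analysis, while the name and email address never reach the analytics store.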
Even American-based organizations should be looking for ways to round up and control their Dark Data, and not just for compliance with the European Union General Data Protection Regulation. We should be able to consolidate our various repositories of data and employ new data mining techniques to unearth the nuggets of information buried deep within the layers of unstructured data, just as the early miners excavated rock to find deposits of valuable gold. These new examinations of Dark Data hold the promise of riches in the form of statistical data about our consumer trends, as well as compliance with global data privacy regulations.
Harper, R. (2018, January 30). What Do Big Data Professionals Need to Know About GDPR. Retrieved from SmartDataCollective.com: https://www.smartdatacollective.com/what-big-data-professionals-need-know-about-gdpr/
Matthews, K. (2018, March 29). 5 Ways Dark Data Is Changing Data Analytics. Retrieved from SmartDataCollective.com: https://www.smartdatacollective.com/5-ways-dark-data-changing-data-analytics/
McNulty, K. (2017, June 30). Expanding the Definition of Dark Data and Mining It with Dark Analytics. Retrieved from ProwessCorp.com: http://www.prowesscorp.com/expanding-the-definition-of-dark-data-and-mining-it-with-dark-analytics/
Mittal, N. (2017, June 14). Dark analytics: Shedding light on a new business asset. Retrieved from Analytics-Magazine.org: http://analytics-magazine.org/dark-analytics-shedding-light-new-business-asset/
Shetty, S. (2017, September 28). How to Tackle Dark Data. Retrieved from Gartner.com: https://www.gartner.com/smarterwithgartner/how-to-tackle-dark-data/