What is Dark Data and Why Should We Care About It?
By: Linda Rosland
Mining dark data is expected to be a trend in the coming year (Some, n.d.). The term “dark data” may conjure up images of nefarious websites and activities, but in reality it is information that has been gathered through various computer networks but is not used to derive insights for decision making (Wikipedia, n.d.).
Dark data is like the detritus in an unorganized desk. There might be items of interest in there, but without categorization and evaluation, those items don’t serve a useful purpose.
Dark data is unstructured, and its formats are not easy to analyze. Often there is so much of it that it’s hard to access. Many companies and organizations keep dark data “in case” they find a use for it (Tittel, 2014). Dark data can include text messages, log files, presentations, account information, documents, email, audio and video files, and images (Kambries, Roma, Mittal, & Sharma, n.d.). It can also include service tickets, customer complaints, paper documents, and the deep web, which is hundreds of times bigger than the surface web (Chandler, n.d.). All of this data could be useful, and it has the potential to put organizations at the head of the pack in data-driven customer relationships.
Organizations keep dark data for different reasons. Storage is usually neither expensive nor difficult, and dark data might be useful in the future. However, although storing dark data may be cheap, analyzing it can be costly and time-consuming, which is why many organizations don’t actively examine their dark data (Wikipedia, n.d.).
Failing to inspect dark data, or failing to inspect it quickly enough, has consequences. Dark data may contain sensitive information and needs to be safeguarded to prevent security breaches. Security risks can be prevented or mitigated by examining this data and determining whether it is sensitive.
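As a rough illustration of what “looking at this data” might involve, here is a minimal Python sketch that scans a piece of text for patterns that often indicate sensitive information. The pattern names, the two regular expressions, and the `find_sensitive` function are assumptions made for this example, not part of any tool mentioned in this article, and a real audit would need far broader rules.

```python
import re

# Hypothetical, deliberately simple patterns chosen for illustration only.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return a dict mapping each pattern name to the matches found in text.

    Pattern names with no matches are left out of the result.
    """
    hits = {}
    for name, pattern in SENSITIVE_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


if __name__ == "__main__":
    sample = "Ticket 881: customer jane@example.com gave SSN 123-45-6789."
    print(find_sensitive(sample))
```

Running something like this over log files or service tickets would only produce a first-pass list of items to review; deciding whether each hit is truly sensitive still takes human judgment.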
Dark data will likely continue to exist in some amount and in some form. However, organizations should have processes in place for regularly auditing their stored information. These processes should include purging unnecessary data and should provide a structure for organizing data so that dark data can be made useful in the future. If an entity decides not to purge old data, it should find a secure way of storing and backing up that data. Organizations should assess the potential value of their data, and data should be encrypted when stored (Tittel, 2014).
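To make the idea of a regular audit a little more concrete, the following Python sketch walks a folder, groups files by extension, and flags files that have not been modified within a chosen number of days as candidates for review or purging. The function name `audit_directory`, the one-year default, and the use of modification time as a stand-in for “unused” are all assumptions for this example; it only reports files and never deletes anything.

```python
import os
import time
from collections import defaultdict


def audit_directory(root, max_age_days=365, now=None):
    """Inventory files under root and flag stale ones.

    Returns (by_type, stale):
      by_type maps a lowercase file extension to the list of file paths,
      stale is a sorted list of paths last modified more than max_age_days ago.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds in a day
    by_type = defaultdict(list)
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_type[ext].append(path)
            # Modification time is only a rough proxy for "no longer used".
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return dict(by_type), stale if not stale else sorted(stale)


if __name__ == "__main__":
    inventory, candidates = audit_directory(".", max_age_days=365)
    for ext, paths in sorted(inventory.items()):
        print(f"{ext}: {len(paths)} file(s)")
    print(f"{len(candidates)} file(s) untouched for over a year")
```

An inventory like this gives an organization a starting point for deciding what to purge, what to organize, and what to encrypt and back up.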
Some companies focus on retrieving dark data. Deep Web Technologies is one (Kambries, Roma, Mittal, & Sharma, n.d.). Federal scientific agencies and several academic and corporate organizations currently use Deep Web Technologies’ search tool (Kambries, Roma, Mittal, & Sharma, n.d.). The company’s website explains that it creates custom search solutions. Its search tool, Explorit Everywhere!, searches, retrieves, ranks, categorizes, and analyzes data from deep databases that are not accessible using general search engines (Explorit Everywhere!, n.d.). Deep Web Technologies’ customers include Intel Corporation, BASF, Boeing, the National Library of Energy, George Mason University, and Stanford University.
DeepDive is a project led by Christopher Ré of Stanford University. The project is no longer under active development, but it still has an active community. DeepDive is a data management system that extracts value from dark data (DeepDive, n.d.). DeepDive has spawned Lattice.io, a private, for-profit company. Lattice.io, founded by Christopher Ré and Mike Cafarella, builds on top of DeepDive. It converts unstructured data to formats that can be used by current data analysis tools (lattice.io, n.d.).
As we place more emphasis on data-driven customer experiences, parsing dark data will become vital. Organizations that are mining and consolidating dark data now will be ahead of the curve.
Chandler, N. (n.d.). How the Deep Web Works. Retrieved from howstuffworks.com: https://computer.howstuffworks.com/internet/basics/how-the-deep-web-works.htm
DeepDive. (n.d.). Retrieved from deepdive.stanford.edu: http://deepdive.stanford.edu/#who-develops-deepdive
Explorit Everywhere! (n.d.). Retrieved from deepwebtech.com: https://www.deepwebtech.com/products/explorit-everywhere/
Kambries, T., Roma, P., Mittal, N., & Sharma, S. K. (n.d.). Dark analytics: Illuminating opportunities hidden within unstructured data. Retrieved from www2.deloitte.com: https://www2.deloitte.com/insights/us/en/focus/tech-trends/2017/dark-data-analyzing-unstructured-data.html
lattice.io. (n.d.). Lattice io. Retrieved from LinkedIn: https://www.linkedin.com/company/lattice?trk=ppro_cprof
Some, K. (n.d.). Top 7 Big Data Analytics Trends For 2019. Retrieved from analyticsinsight.net: https://www.analyticsinsight.net/top-7-big-data-analytics-trends-for-2019/
Tittel, E. (2014). The Dangers of Dark Data and How to Minimize Your Exposure. Retrieved from cio.com: https://www.cio.com/article/2686755/data-analytics/the-dangers-of-dark-data-and-how-to-minimize-your-exposure.html
Wikipedia. (n.d.). Dark data. Retrieved from en.wikipedia.org: https://en.wikipedia.org/wiki/Dark_data