Hidden treasures

Editorial Type: Strategy     Date: 01-2013








Using technology to discover and extract usable information from unstructured data is helping to realise tangible benefits for businesses across all sectors, explains Dr. Vijay Magon of CCube Solutions

Unstructured data (or unstructured information) refers to information that is not held in a spreadsheet or database, or that does not fit within any recognised template or model. It can be textual or non-textual, comprising dates, numbers, e-mail messages, instant messages, etc. It is typically held on paper and in electronic files such as Word documents, PowerPoint presentations, e-mail, images, audio and video files, and increasingly in social media feeds. This makes unstructured data difficult to interpret using traditional computer programs when compared to data stored in defined fields in database tables or in tags within documents.
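The contrast between defined fields and free text can be sketched in a few lines of Python. This is a minimal illustration only: the record, the sample note, and the `extract_fields` helper are hypothetical, and simple pattern matching stands in for the far richer recognition and analytics tools discussed in this article.

```python
import re

# Structured: the value sits in a named field and is retrieved directly.
record = {"patient_id": "P-1001", "appointment_date": "14/03/2012"}
appointment = record["appointment_date"]

# Unstructured: the same kind of value is buried in free text and has to be
# found by pattern matching. Both patterns are simplified assumptions.
DATE_RE = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def extract_fields(text: str) -> dict:
    """Pull date-like and e-mail-like strings out of free text."""
    return {
        "dates": DATE_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }

note = ("Seen on 14/03/2012. Follow-up booked for 02-05-2012; "
        "contact records@example.org.")
fields = extract_fields(note)
```

Pattern matching of this kind finds strings that look like dates or addresses, but it does not know which date is the appointment and which is the follow-up. That gap in meaning is what makes unstructured data hard for traditional programs.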

Businesses across all industries are gathering and storing more and more data every day, and most of the business information in use today does not reside in a standard relational database. An often-cited statistic is that 80% of business data is unstructured. More recently, analysts have estimated that data will grow 800% over the next five years. Unstructured information accounts for 70-80% of all data in organisations and is growing 10-50 times faster than structured data, especially with the explosion in the use of social media. In terms of estimated volumes, the numbers are staggering: recent estimates show that this year the Digital Universe, meaning every electronically stored piece of data or file out there, will reach 1.2 million petabytes. That's up from a measly 800,000 petabytes in 2009. Every day, we create 2,500 petabytes of data, so much that 90% of the data in the world today has been created in the last two years alone!

RECOGNISING THE PROBLEM
If left unmanaged, the sheer volume of unstructured data generated each year within an enterprise can be costly in terms of storage, potential liability, access, and inefficiencies that multiply because the data cannot be analysed (e.g. for relationship management) or shared between users and between systems.

Unstructured data held in electronic files can have some imposed structure, at least for filing purposes: filenames, folder and sub-folder names, and so on. These assigned filing structures provide some degree of management and control over document collections, but they say little about what the documents contain, just as tags within HTML serve to render information in a browser but do not directly convey the semantic meaning of the tagged content.
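As a rough sketch of how a filing structure can double as metadata, the Python below reads index fields straight from a file path. The folder convention (department/year/type_reference.ext) and the `metadata_from_path` helper are assumptions made for illustration, not a scheme described in this article.

```python
from pathlib import PurePosixPath

def metadata_from_path(path: str) -> dict:
    """Derive filing metadata from a folder and filename convention.

    Assumes a hypothetical layout:
        <department>/<year>/<doc_type>_<reference>.<ext>
    """
    p = PurePosixPath(path)
    if len(p.parts) < 3:
        raise ValueError(f"path does not follow the assumed layout: {path}")
    department, year = p.parts[-3], p.parts[-2]
    # Split the filename stem at the first underscore only.
    doc_type, _, reference = p.stem.partition("_")
    return {
        "department": department,
        "year": year,
        "doc_type": doc_type,
        "reference": reference,
        "format": p.suffix.lstrip("."),
    }

meta = metadata_from_path("finance/2012/invoice_00417.pdf")
```

The result describes where and how the document was filed. Nothing in it reflects what the invoice actually says, which is the limitation noted above.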

Paper-based unstructured data poses the biggest problems. Some organisations manage paper records internally using technologies such as imaging and document management, which apply pre-defined indexing rules to provide some degree of management and control. In both cases, the assigned indexing or metadata provides the means to impose structure on collections of documents held on servers or managed using document management technologies.

Recognition technologies have, of course, been around for some time and are continually getting better. They provide text-based content extracted from unstructured data sources, particularly from paper-based records. Recent advances in text and content analytics, Natural Language Processing (NLP), and predictive analysis are offering opportunities for software applications to understand the extracted text (concepts, context, and meaning) and to help unlock and use information buried on paper and in files. Can these technologies help practitioners unlock and use potentially life-saving information held in health records, for example?

WHAT'S IT WORTH?
It is worth looking at the health record problem as the potential benefits can be life-saving. The majority of document management solutions in use in hospitals in the UK provide facilities for capturing, managing, and delivering patient records.

A key requirement at most sites is to capture the legacy paper records. These have typically been collated and managed over the years with few, if any, guidelines on how to manage paper records, so there is a large variation in the way hospitals file them, ranging from random storage within paper folders (worst case) to organised filing within tabs or sections held in such folders. Consequently, the high investment required to sort, prepare, and digitise such records for use by practitioners is difficult to justify.

As a result, scanning processes are put in place to digitise the patient records using the quickest and cheapest options - i.e. scan the records as they are found!

It is worth stating at the outset that new (or ongoing) records captured within document management systems, and information created within such systems, do not fall into the same trap: classification of new records is much more granular and, furthermore, automated to a large degree. Consequently, access to and use of these records within an electronic system is more acceptable to, and welcomed by, practitioners.


