Can't see the wood for the trees?

Editorial Type: Interview     Date: 09-2013

DM Editor David Tyler talks to CCube Solutions' MD Vijay Magon about the inexorable rise of unstructured data in business, and how technology is helping to address the issues that come with it

David Tyler: There is a long-standing statistic that suggests that around 80% of business data is unstructured - given the massive growth rates in data volumes, that means a lot of information that can't be readily exploited by current technologies, doesn't it?
Vijay Magon: Businesses across all industries are gathering and storing more and more data every day, and most of the business information in use today does not reside in a standard relational database. Analysts have estimated recently that data will grow 800% over the next five years. Unstructured information accounts for some 70%-80% of all data in organisations and is growing 10-50x faster than structured data, especially with the explosion in the use of social media. In terms of estimated volumes, the numbers are staggering: this year the Digital Universe - meaning every electronically stored piece of data or file out there - will reach 1.2 million petabytes. That's up from a measly 800,000 petabytes in 2009. Every day, we create 2,500 petabytes of data - so much that 90% of the data in the world today has been created in the last two years alone!

If left unmanaged, the sheer volume of unstructured data that's generated each year within an enterprise can be costly in terms of storage, potential liability, access, and inefficiencies that multiply because data cannot be analysed (e.g. for relationship management) or shared between users and between systems. Unstructured data held in electronic files can have some imposed structure, at least for filing purposes: filenames, folder and sub-folder names, and so on. The assigned filing structures provide some degree of management and control over document collections, just as tags within HTML serve to render information in a browser but do not directly convey the semantic meaning of the tagged content.

Paper-based unstructured data poses the biggest problems - some organisations manage paper records internally using technologies such as imaging and document management, which apply pre-defined indexing rules to provide some degree of management and control. In both cases, the assigned indexing or metadata provides the means to impose structure on collections of documents held on servers or managed using document management technologies.
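
As a rough illustration of how a filing structure can act as metadata, the sketch below derives index fields from a file path. The department/year/filename layout, the field names, and the function `metadata_from_path` are all assumptions made for this example; they are not taken from any CCube Solutions product or from the interview.

```python
from pathlib import PurePosixPath


def metadata_from_path(path: str) -> dict:
    """Derive simple index fields from a path.

    Assumes a hypothetical layout of <department>/<year>/<filename>;
    real filing schemes vary widely between organisations.
    """
    p = PurePosixPath(path)
    if len(p.parts) < 3:
        raise ValueError("expected <department>/<year>/<filename>")
    department, year = p.parts[-3], p.parts[-2]
    return {
        "department": department,
        # Keep the year only if the folder name is purely numeric.
        "year": int(year) if year.isdigit() else None,
        # The filename (minus extension) stands in for a document title.
        "title": p.stem.replace("_", " "),
        "format": p.suffix.lstrip(".").lower(),
    }


record = metadata_from_path("cardiology/2012/discharge_summary.PDF")
print(record)
```

Fields recovered this way describe where a document was filed, not what it says, which is the limitation the HTML-tag comparison points to.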

DT: Your focus at CCube Solutions over recent years has been very much on the healthcare sector, as our readers will know: is this an area that suffers more than others from issues around unstructured data?
VM: It is worth looking in more detail at the health record issues as the potential benefits there can be life-saving. The majority of document management solutions in use in hospitals in the UK provide facilities for capturing, managing, and delivering patient records. A key requirement at most sites is to capture the legacy paper records - records which have typically been collated and managed over the years with few, if any, guidelines on how to manage them. There is a large variation in the way hospitals file paper records, ranging from random storage within paper folders (worst case) to organised filing within tabs or sections held in such folders. Consequently, the high investment required to sort, prepare, and digitise such records for use by practitioners is difficult to justify. As a result, scanning processes are put in place to digitise the patient records using the quickest and cheapest options - which usually means scanning the records as they are found!

The usual cost models for scanning paper records to alleviate storage space are based on scanning these as they are found. These have not changed. Consequently, given the poor and variable paper filing practices, the digitised records add little value in delivering information, and the digitisation exercises do not adequately compensate for the loss of the universal convenience of paper!

While clever facilities within the viewing software might help users to navigate through the electronic records, these are far from an ideal solution and, at worst, lead to "IT failures" due to poor user acceptance. So then, if the time-consuming and costly processes necessary to sort, prepare, and in many cases re-structure existing paper records cannot be justified, can technology help to unlock this vital information?

DT: What sort of technologies are you thinking of here - and how do they differ from solutions already in place?
VM: Recognition technologies have been around for a while, of course, and are getting better all the time. These provide text-based content extracted from unstructured data sources, particularly from paper-based records. In addition, recent advances in text and content analytics, Natural Language Processing (NLP), and predictive analysis are offering opportunities for software applications to understand the extracted text (concepts, context, and meaning) and help unlock and use information buried on paper and in files. Can these technologies help practitioners unlock and use potentially life-saving information held in health records? That is the key question, as there is a clear need to make this information accessible and actionable.


