Mass migration: how not to get stung

Editorial Type: Strategy     Date: 09-2013





Jeff Mills, VP International for Actuate's Content Services Group (formerly Xenos Group), breaks down the steps and processes involved in a content migration project

Modern businesses deal with a large number of disparate data sources. It is not unusual for a single organisation to use relational databases, content repositories, email stores and file servers. Managing this environment is a daily challenge, yet the true complexity emerges only when you factor in mergers and acquisitions, regulatory legislation, information governance, and mandates to reduce operational cost through vendor and infrastructure consolidation.

Enterprise content management (ECM) repositories have felt the force of these challenges more than anything else. ECM repositories hold a broad array of content types and rely on metadata associated with the individual items for discovery, validation, storage, organisation, retrieval, distribution, delivery and deletion.

As organisations acquire or merge with other businesses, consolidate departmental silos of content into corporate archives, and address regulatory compliance, two major challenges arise: de-duplicating key metadata naming standards, and connecting business applications with siloed and disparate content to create a global customer view.

ECM migration of this type is usually viewed with trepidation, but it does in fact offer an opportunity to carry out the enhancement and augmentation work these issues require. So what are the challenges of migrating content to and from ECM systems, and what is the best way to go about it?

DISCOVERY
The first task in any data migration project is to study the source of the data to ensure the documents themselves are well understood. The ECM system(s) in which the documents are currently stored need to be analysed, and the business environment in which the ECM system operates needs to be understood.

There will undoubtedly be many different types of document stored, so the first question is what metadata or indexes are used to describe each type. Individual document types typically possess a unique set of indexes that describe their contents, and understanding these differences is necessary to recreate relationships between documents, the target ECM system and the connected business applications.
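As an illustration of this kind of index inventory, the following Python sketch counts which index fields are actually populated for each document type. It is a minimal example under stated assumptions: the record layout (`doc_type`, `indexes`) and the sample field names are hypothetical and do not reflect any particular ECM product's export format.

```python
from collections import defaultdict


def inventory_indexes(records):
    """Count, per document type, how often each index field is populated."""
    inventory = defaultdict(lambda: defaultdict(int))
    for record in records:
        for field, value in record["indexes"].items():
            # Treat missing and empty values as "not populated".
            if value not in (None, ""):
                inventory[record["doc_type"]][field] += 1
    # Convert nested defaultdicts to plain dicts for easy reporting.
    return {doc_type: dict(fields) for doc_type, fields in inventory.items()}


# Hypothetical sample of records exported from a source repository.
sample = [
    {"doc_type": "statement", "indexes": {"account_no": "1001", "period": "2013-08"}},
    {"doc_type": "statement", "indexes": {"account_no": "1002", "period": ""}},
    {"doc_type": "invoice", "indexes": {"invoice_no": "INV-7", "customer_id": "C42"}},
]

report = inventory_indexes(sample)
for doc_type, fields in report.items():
    print(doc_type, fields)
```

A report like this makes it easier to see which fields are unique to a document type and which are only sparsely filled, before any mapping to the target system is attempted.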

There is also a need to understand the source system in order to determine the most efficient means of accessing the data for migration. Indexes and other information that describe the documents stored in a system can be held in databases or control files, or appended to the document contents themselves within the source system. This metadata is crucial for the retrieval process and needs to be maintained and migrated to the target system.

EXTRACTION
Extracting all document data and associated metadata from an ECM system is undoubtedly challenging. Since extraction can put a large strain on the ECM system, care must be taken over when the procedure runs. ECM systems typically provide a mechanism for extracting individual documents one at a time for viewing or editing. Extracting every document individually in this way is not normally an effective approach to migration, but a number of alternatives exist, such as batch tools, APIs that retrieve individual or multiple documents, and engaging experienced vendors or consultants to help manage and facilitate moves between ECM systems.
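One way to limit the strain on the source system is to extract in batches and only during a quiet period. The Python sketch below shows the two building blocks. The window times (22:00 to 05:00) and the batch size are assumptions chosen for illustration, and the actual retrieval call would depend on whatever batch tool or API the source ECM system provides.

```python
from datetime import time
from itertools import islice


def in_off_peak(now, start=time(22, 0), end=time(5, 0)):
    """Return True when `now` falls inside a window that may wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def batches(doc_ids, size):
    """Yield successive lists of at most `size` document IDs."""
    iterator = iter(doc_ids)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical usage: group five document IDs into batches of two.
for batch in batches(["D1", "D2", "D3", "D4", "D5"], size=2):
    print(batch)
```

A migration job built on these helpers would check `in_off_peak` before requesting each batch and pause when the window closes.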

Many of the methods used for extraction are time-consuming and, without the relevant expertise, can take years to perform on large datasets. Once extracted, the content itself may not yet be in a usable state. To save space, content management systems employ schemes such as data compression using common or even proprietary algorithms. Data is also often encrypted as a security measure. Throughout, it is imperative to maintain any existing metadata associations.
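The following Python sketch illustrates restoring compressed content while keeping its metadata attached. It assumes the content was compressed with zlib, which stands in here for whatever common or proprietary scheme the source system actually uses. The record layout and the added checksum are illustrative choices, not a description of any specific product.

```python
import hashlib
import zlib


def unpack(raw, metadata):
    """Decompress stored content and keep its metadata alongside it."""
    content = zlib.decompress(raw)
    return {
        "content": content,
        # Copy the metadata so later edits do not alter the source record.
        "metadata": dict(metadata),
        # A checksum lets the load step verify the content arrived intact.
        "sha256": hashlib.sha256(content).hexdigest(),
    }


# Hypothetical stored item: compressed bytes plus the indexes that describe it.
stored = zlib.compress(b"Statement for account 1001")
doc = unpack(stored, {"doc_type": "statement", "account_no": "1001"})
print(doc["metadata"], len(doc["content"]))
```

Carrying the metadata and checksum in the same record as the content means the association cannot be lost between the extraction and load steps.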

TRANSFORMATION
Once content has been identified in existing ECM systems, it may be necessary to convert or re-purpose that content from one format into another before loading it into the target ECM system. ECM systems generally require document content streams, associated resources and metadata to be in a specific format prior to loading. For example, some ECM systems require documents to be in a stacked file with associated indexes structured in a specific format. In other systems, where document transformation is not possible at retrieval time, documents may need to be converted during the migration process and stored in PDF format, ready for presentation.
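To make the stacked-file idea concrete, the Python sketch below concatenates several documents into one byte stream and records, for each document, its indexes together with its offset and length within the stack. The layout is hypothetical: each ECM system defines its own load format, so the `offset` and `length` fields here are assumptions for illustration only.

```python
def build_stack(documents):
    """Concatenate document contents and build one index row per document."""
    stack = bytearray()
    rows = []
    for doc in documents:
        # Record where this document starts and how long it is.
        rows.append({**doc["indexes"], "offset": len(stack), "length": len(doc["content"])})
        stack.extend(doc["content"])
    return bytes(stack), rows


# Hypothetical documents with their index values.
docs = [
    {"content": b"AAAA", "indexes": {"account_no": "1001"}},
    {"content": b"BBBBBB", "indexes": {"account_no": "1002"}},
]

stack, rows = build_stack(docs)
for row in rows:
    # Each document can be recovered from the stack using its index row.
    start, end = row["offset"], row["offset"] + row["length"]
    print(row["account_no"], stack[start:end])
```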


