by Joseph Brady, Manager of Marketing and Technical Documentation at Treehouse Software, Inc.
Hadoop and Big Data are revolutionizing data processing. Driven by increasing digitalization, the Internet, the rising importance of social media, and the emergence of the “Internet of Things”, data diversity is growing in dimensions that did not exist before.
To process and maintain large and diverse data sets in a meaningful way, new technologies such as Hadoop have been developed. What is Hadoop? Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation.
Enterprises with heterogeneous IT infrastructures, especially larger corporations across all industry sectors and public institutions, very often include mainframe technology. These enterprises now face the challenge of integrating existing mainframe data into a Hadoop platform – in real time.
Data integration technology has also evolved considerably over the past decades. Today, a standard ETL solution is not sufficient, and the understanding of data integration must now include the entire data exchange process, including replication and synchronization. Data exchange is now a time-critical process. As mainframe and Hadoop technologies increasingly co-exist, near real-time exchange is becoming the only accepted way to meet demanding requirements for up-to-date data.
The tcVISION Solution
An important part of the added value of modern IT systems is the latency-free integration of data and processes across transactional and analytical areas. The cross-system integration platform from Treehouse Software, tcVISION, is unique, efficient, and reliable. With tcVISION, mainframe data can be integrated quickly and easily, in near real-time, into Hadoop-based operational applications or into Business Intelligence and analytics.
The tcVISION solution is proven and mature, and is under continuous development to meet the requirements of new technologies, including support for Hadoop in Version 6.
The main focus of the tcVISION integration platform is real-time synchronization that integrates mainframe data into Hadoop-based solutions.
The tcVISION Technology Components
The tcVISION integration platform consists of a variety of state-of-the-art technology components, which cover much more than simply an ETL process.
- Data exchange, in the sense of real-time synchronization, becomes a single-step operation with tcVISION.
- No additional middleware is required.
- Modern Change Data Capture (CDC) technologies efficiently select the required data from the source system, focusing on the data that has changed. The data exchange process is reduced to the necessary minimum, which lowers the costs of cross-system data integration.
- tcVISION also supports the fast and efficient load of large volumes of mainframe data into Hadoop. In this context, the mainframe processor costs are negligible.
- An integrated Data Repository ensures transparent, cross-platform data management. Mainframe knowledge is not required.
- tcVISION includes a rule engine to transform data into a target-compliant format, and allows user-specific processing via supplied APIs.
- The integrated staging concept supports the offload of changed data in “Raw Format” to less expensive processor systems. This reduces the use of mainframe processor resources to a minimum. The preparation of the data for the target system can be performed on a less expensive platform (Linux, UNIX, or MS-Windows).
- The transfer to and feeding of data into Hadoop is part of the tcVISION data exchange process. No intermediate files are required.
- The exchange of large volumes of data between a production mainframe environment and Hadoop can run in parallel processes to reduce latencies to a minimum.
- The tcVISION integration platform contains comprehensive control mechanisms and monitoring functions for an automated data exchange.
- tcVISION has been designed so that Hadoop-based projects can be deployed with total project autonomy and minimal use of mainframe resources.
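To make the staging idea above more concrete, here is a minimal Python sketch of the kind of work that can be moved off the mainframe: decoding a raw record into a target-friendly format. It is not tcVISION code and does not use any tcVISION API. The record layout (a 10-byte EBCDIC name followed by a 4-byte packed-decimal balance) and the function names `unpack_comp3` and `decode_record` are assumptions made purely for illustration.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field.

    Every byte holds two decimal digits, except the last byte,
    whose low nibble carries the sign (0xB or 0xD means negative).
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)          # last byte: one digit ...
    sign_nibble = raw[-1] & 0x0F         # ... plus the sign nibble
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):
        value = -value
    # Shift the decimal point left by `scale` places.
    return Decimal(value).scaleb(-scale)


def decode_record(raw: bytes) -> dict:
    """Decode one record of the hypothetical layout described above."""
    name = raw[0:10].decode("cp037").rstrip()   # cp037 = EBCDIC (US/Canada)
    balance = unpack_comp3(raw[10:14], scale=2)
    return {"name": name, "balance": balance}


# Build a sample raw record: "SMITH" padded to 10 bytes in EBCDIC,
# followed by the packed-decimal value +123.45.
sample = "SMITH".ljust(10).encode("cp037") + bytes([0x00, 0x12, 0x34, 0x5C])
record = decode_record(sample)
print(record)
```

Because the raw bytes are only copied on the source side, the character-set conversion and numeric unpacking shown here run entirely on the less expensive platform.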
With tcVISION, data synchronization between mainframe and Hadoop pays off…
- Near real-time replication of mainframe data to Hadoop enables real-time analytics, or the relocation of mainframe applications (e.g., Internet applications such as online banking, e-Government, etc.) to Hadoop with synchronized data on both platforms.
- Because only changed data is exchanged, the costs of the data exchange are greatly reduced.
- The utilization of mainframe resources is reduced to a level that minimizes costs for mainframe know-how and mainframe MIPS.
- Data exchange processes can be deployed and maintained with tcVISION without mainframe knowledge, so costs can be saved and Hadoop projects can be developed and put into production faster.
- The near real-time replication of tcVISION from mainframe to Hadoop allows the relocation of BI reporting and analytic applications to the more cost-efficient and – for these applications – more powerful Hadoop platform.
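The cost argument behind concentrating on changed data can be illustrated with a small Python sketch of the general change data capture idea: only the change records travel to the target, and the replica is brought up to date by applying them. This is a generic illustration, not tcVISION's implementation. The `Change` record, the `apply_changes` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One captured change from the source system (hypothetical format)."""
    op: str                      # "insert", "update", or "delete"
    key: str                     # primary key of the affected row
    row: Optional[dict] = None   # new row image; not needed for deletes


def apply_changes(target: dict, changes: list) -> int:
    """Apply change records to the target replica in order.

    Returns the number of records applied, which is also the number
    of records that had to be transferred.
    """
    applied = 0
    for change in changes:
        if change.op in ("insert", "update"):
            target[change.key] = change.row
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
        applied += 1
    return applied


# A replica holding 1,000 rows that already match the source.
replica = {f"cust{i}": {"balance": 100} for i in range(1000)}

# Only three rows changed on the source since the last synchronization.
changes = [
    Change("update", "cust7", {"balance": 250}),
    Change("insert", "cust1000", {"balance": 10}),
    Change("delete", "cust3"),
]

transferred = apply_changes(replica, changes)
print(f"{transferred} change records applied; replica now holds {len(replica)} rows")
```

In this example, three records are moved instead of the 1,000 rows a full reload would copy, and the replica still ends up consistent with the source.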