Does your data science team want to accelerate insights and bring advanced ML/AI capabilities to your mainframe data with Amazon Redshift? Sure they do—and Treehouse Software enables that…

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

We are beginning to see a welcome trend among Treehouse customers who are looking to modernize their valuable mainframe legacy data on the Cloud: they are including their data science teams in the important planning phase of architecting new Cloud environments and targets. This is especially vital for customers who want to incorporate advanced analytics and ML/AI in their strategic data usage plans on the Cloud. Who can contribute a better understanding of ultimate data usage than your resident data scientists?


We have heard from many of these data scientists that a primary item on their “wish lists” is a fully managed, AI-powered, massively parallel processing (MPP) architecture to extract maximum value and insights. They specifically mention Amazon Redshift as the Cloud data warehouse (which is much more than a data warehouse) of choice for driving digitization across the enterprise, as well as helping to personalize customer experiences. Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the highest performance at any scale. To this request, we can answer with a resounding, “Yes, Treehouse Software has got you covered with Redshift connectivity!”

The Treehouse Software solution…

Enterprise customers have come to Treehouse Software because we bring not only proven mainframe data replication tools, but also deep subject matter expertise in mainframe technologies, as well as the know-how to target relevant AWS offerings, such as Redshift and S3 (including S3 Express One Zone – see our recent blog on S3 Express One Zone).

The Rocket Data Replicate and Sync (RDRS) solution allows customers’ legacy mainframe environments to operate normally while replicating data to AWS. The technology focuses on changed data capture (CDC) when transferring information between mainframe data sources and Cloud-based databases and applications. Through an innovative set of technologies, changes occurring in any mainframe datastore are tracked and captured, and ultimately published to Redshift.

How does it work?

  1. We start at the source – the mainframe – where an agent (with a very small footprint) extracts data (in the context of either bulk-load or CDC processing).
  2. The raw data is securely passed from the mainframe to RDRS, which speedily transforms mainframe-formatted data into Unicode/JSON and publishes the results to a Kafka topic.
  3. Our efficient, autoscaling microservices take it from there. Treehouse Dataflow Toolkit functions consume the data from Kafka and land it in S3 buckets, where Treehouse’s proprietary crawler technology is used to automatically prepare landing tables, views, and additional infrastructure in Redshift. Then the mainframe data is loaded into Redshift (all the while adhering to AWS’ recommended “best practices” for massive data loading, thus assuring the shortest and surest loads). The inherent reliability and scalability of the entire pipeline infrastructure assure near-real-time synchronization between mainframe sources and Redshift target tables.
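
To give a feel for what "transforming mainframe-formatted data into Unicode/JSON" in step 2 involves, here is a minimal Python sketch. It is not the RDRS implementation. The record layout (a 10-byte EBCDIC name followed by a 3-byte packed-decimal balance), the field names, and the choice of code page `cp037` are all assumptions made purely for illustration.

```python
import json
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two decimal digits, except the last byte, whose low
    nibble is the sign (0xD means negative; anything else is treated as positive here).
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


def record_to_json(record: bytes) -> str:
    """Convert one fixed-layout record (hypothetical layout) to a JSON string."""
    name = record[0:10].decode("cp037").rstrip()    # EBCDIC text -> Unicode
    balance = unpack_comp3(record[10:13], scale=2)  # packed decimal, 2 decimal places
    return json.dumps({"name": name, "balance": str(balance)})


# Build a sample record: "SMITH" padded to 10 bytes in EBCDIC, then +123.45 packed.
sample = "SMITH".ljust(10).encode("cp037") + bytes([0x12, 0x34, 0x5C])
print(record_to_json(sample))
```

In a real pipeline the layout would come from copybook or catalog metadata rather than hard-coded offsets, and the resulting JSON string would be published to a Kafka topic instead of printed.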

Redshift tables and views: something for everybody

Within this framework, the Redshift staging tables (often referred to as “delta tables”) are constantly accruing historical data, ideally suited for data scientists looking to do trend analysis, predictive analytics, ML, and AI work.  For business analysts and others who prefer structured data representations of potentially complex hierarchical data, the Treehouse framework also automatically provides structured user-views, providing the look and feel of a SQL database.
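
The relationship between an ever-growing delta table and a current-state user-view can be sketched in a few lines of Python. This only illustrates the idea and is not how the Treehouse framework builds its Redshift views. The row shape (`seq`, `op`, `key`, `data`) is a hypothetical one chosen for the example.

```python
from typing import Any


def current_view(delta_rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Collapse a change history into the latest image per key.

    Rows are replayed in sequence order; a delete removes the key,
    while an insert or update overwrites the stored image.
    """
    latest: dict[str, dict[str, Any]] = {}
    for row in sorted(delta_rows, key=lambda r: r["seq"]):
        if row["op"] == "delete":
            latest.pop(row["key"], None)
        else:
            latest[row["key"]] = row["data"]
    return latest


# The full history stays available for trend analysis...
history = [
    {"seq": 1, "op": "insert", "key": "C001", "data": {"city": "Boston"}},
    {"seq": 2, "op": "insert", "key": "C002", "data": {"city": "Denver"}},
    {"seq": 3, "op": "update", "key": "C001", "data": {"city": "Austin"}},
    {"seq": 4, "op": "delete", "key": "C002", "data": None},
]

# ...while the view shows only the current state of each record.
print(len(history), "history rows ->", current_view(history))
```

The data scientist works with all four history rows, while the business analyst sees a single current row for customer `C001`.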

…as innovations move faster along the timeline, keep your options open!

Publishing both bulk-load and CDC data to a reliable and scalable framework like Kafka allows you to maintain a broad array of options to ultimately feed your legacy data to any number of JSON-friendly ETL tools, target datastores, and data analytics packages (some of which may not even have been invented yet!).  In addition to Redshift, the Treehouse Dataflow Toolkit also currently targets Snowflake, Amazon DynamoDB, and Amazon Athena/S3.

Video – Introduction to Data Warehousing on AWS with Amazon Redshift…



Contact Treehouse Software today to discuss your project, or to schedule a demo of our Mainframe-to-AWS real-time and bi-directional data replication solution. 

AWS Services Provide Advanced Monitoring and Analytics of tcVISION’s Mainframe CDC Processing

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software, Inc.


Many Treehouse Software mainframe modernization customers have requirements for continuous near-real-time replication of mainframe data in order to keep a copy of the data synchronized on the Cloud. These customers are using tcVISION from Treehouse Software for changed data capture (CDC) for this synchronization, which allows changes occurring in any mainframe application data to be tracked and captured, and then published to a variety of AWS targets, including Amazon Simple Storage Service (S3). Some of these customers are also now asking us to recommend the best Cloud-based tools and methods to monitor and gain insights into these complex data processes. Coincidentally, while working with a current tcVISION customer, our technicians are testing out two particularly good, fully managed AWS services that can work hand-in-hand to address this need:

Amazon Athena

Since tcVISION supports Amazon S3 as a target, customers modernizing their mainframe systems on AWS can use Amazon Athena for monitoring and analysis of CDC processing from an S3 bucket.

Amazon Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats. Athena provides a simplified, flexible way to analyze data from an S3 Bucket, as well as many other data sources, including on-premises data sources or other Cloud systems. Athena is built on open-source Trino and Presto engines and Apache Spark frameworks, with no provisioning or configuration effort required.

Figure 1: Example of an Athena query showing bulk-load statistics per table

____01_Amazon_Athena_Query

Amazon QuickSight


Once Athena is set up for monitoring an S3 Bucket, users can easily view their CDC processing and analytics with Amazon QuickSight. QuickSight utilizes advanced machine learning-powered insights and intuitive dashboards, so end users can make the best and quickest data-driven business decisions.

Figure 2: Example of Amazon QuickSight monitoring the throughput of our data to Snowflake

____01_Amazon_QuickSight02

Figure 3: Example of Amazon QuickSight pie chart showing the resulting rows loaded for each Snowflake table:

____01_Amazon_QuickSight03

Figure 4: Example of Amazon QuickSight chart showing statistics for our data bulk-load into Snowflake:

____01_Amazon_QuickSight04

Figure 5: Example of Amazon QuickSight chart showing our load time into Snowflake per table:

____01_Amazon_QuickSight05

View the Amazon QuickSight video here…



Interested in seeing a live, online demo of tcVISION?

Just fill out the Treehouse Software tcVISION Demonstration Request Form and a Treehouse representative will contact you to set up a time for your online tcVISION demonstration.


Providing a High Availability Framework for Mainframe-to-AWS Data Replication

by Dan Vimont, Cloud Solutions Architect at Treehouse Software, Inc.


Treehouse Software customers are using tcVISION to enable mission-critical mainframe-to-AWS data replication pipelines.  Some of these production pipelines are providing vital near-real-time synchronization between source and target, and thus can’t afford any significant downtime in the event of failure.  So it’s only natural that a number of our customers have been asking for advice in setting up a high availability configuration for their tcVISION components that run on AWS EC2 instances.  The High Availability Framework discussed here provides for a Failover EC2 instance to automatically pick up tcVISION processing should the Primary instance (running in another Availability Zone) go down.

The Core Components:  Primary Instance & Failover Instance

The core components of a tcVISION high availability framework consist of two EC2 instances running in different Availability Zones:  a Primary EC2 instance and a Failover EC2 instance.  Both identically-configured EC2 instances are attached to a shared working-storage file system (either an EFS or FSx volume), which allows the Failover instance to seamlessly and quickly pick up tcVISION processing should the Primary instance suddenly become unavailable.


Use a Step Function to Automate the Failover Process

In the event of failure of the Primary instance, the recommended framework calls for automatic triggering of a Step Function for reliable failover processing, with steps that include the following:

  • verify that the Primary instance is unavailable (The tcVISION service cannot be active on both instances simultaneously, so this verification is vital.)
  • redirect all network traffic from the Primary instance to the Failover instance (via Route 53)
  • start tcVISION processing on the Failover instance
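
The ordering of these steps matters, and a small Python sketch can make that explicit. It is a plain-Python stand-in with no AWS calls. The `Cluster` class, its fields, and `run_failover` are hypothetical names used only to show the control flow, in which the health verification guards everything that follows.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of the Primary/Failover pair (hypothetical, for illustration)."""
    primary_healthy: bool
    active_target: str = "primary"        # where network traffic is routed
    service_running_on: str = "primary"   # where the replication service runs
    log: list[str] = field(default_factory=list)


def run_failover(cluster: Cluster) -> bool:
    """Run the three failover steps in order; abort if the Primary is still up."""
    # Step 1: verify the Primary is really unavailable. The service must never
    # be active on both instances at once, so this check comes first.
    if cluster.primary_healthy:
        cluster.log.append("abort: primary still available")
        return False
    cluster.log.append("verified: primary unavailable")

    # Step 2: redirect traffic (a DNS record update in the real framework).
    cluster.active_target = "failover"
    cluster.log.append("traffic redirected to failover")

    # Step 3: start processing on the Failover instance.
    cluster.service_running_on = "failover"
    cluster.log.append("service started on failover")
    return True


outage = Cluster(primary_healthy=False)
print(run_failover(outage), outage.log)

false_alarm = Cluster(primary_healthy=True)
print(run_failover(false_alarm), false_alarm.log)
```

In the second call the sequence stops at the verification step, so a false alarm never results in two active instances.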


When Ready, Use a Step Function to Automate the Restoration Process

After operations personnel have completed recovery of the Primary EC2 instance, another Step Function may be manually triggered to reliably transfer tcVISION processing back to the Primary instance.


Many More Details are Available Upon Request to Treehouse Customers

Full details regarding our recommended High Availability Framework for tcVISION are available upon request to Treehouse customers.  AWS services utilized in the complete recommended framework include Step Functions, Lambda Functions, EventBridge rules, CloudWatch alarms, SNS topics, a Route 53 Private Hosted Zone, and more.  The following diagram is a partial visual inventory of the recommended framework components.

HA5




How to Synchronize Data in Real Time Between the Mainframe and AWS with Treehouse Software’s Enterprise CDC Tool

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software, Inc.


Many mainframe integration scenarios require continuous near-real-time replication of relational data to keep a copy of the data synched in the Cloud. Change Data Capture (CDC) is used for this near-real-time transactional replication by capturing change log activity to drive changes in the target dataset.

Just what is CDC anyway?

Simply put, and in relation to Mainframe-to-Cloud and open systems data replication, CDC is the use of processes to identify when data has been changed in a source system, so the replicated upstream or downstream (depending on how you look at it) target can be kept in sync with the changes.

In a recent AWS Architecture Blog, readers learn about integration using mainframe data to build Cloud native services with AWS, including transactional replication-based integration via CDC.

____AWS_Mainframe_CDC_Diagram

As mentioned in the blog, AWS Partner CDC Tools are available for connecting data center mainframes to the various data targets, and Treehouse Software’s tcVISION is one of those tools available in the AWS Marketplace.

tcVISION allows changes occurring in any mainframe application data to be tracked and captured, and then published to a variety of target AWS databases and applications. tcVISION provides an easy and fast approach for Hybrid Cloud projects, enabling real-time and bi-directional data replication between the mainframe and AWS.

Example of Db2-to-AWS CDC using tcVISION Mainframe Manager:

tcVISION_Db2_To_AWS_CDC

tcVISION supports several CDC methods, depending on each customer’s use case:

Bulk Transfer

  • Efficient transfer of entire databases
  • Analysis for data consistency (verification)
  • Initial load (ETL) and periodic mass data transfer
  • One-step data transfer

Log Processing

  • Transfer of changed data in near-real time or on a scheduled time frame
  • Reads both active logs and archived logs

Batch Compare

  • Comparison of data snapshots using checksums
  • Efficient transfer of changed data since last processing
  • Flexible processing options (SORT etc.)
  • Automatic creation of deltas by tcVISION
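
As a rough illustration of the Batch Compare idea, the Python sketch below checksums every row of two snapshots and derives the deltas from the checksums alone. It does not reflect tcVISION internals. The SHA-256 hash, the JSON canonicalization, and the function names are assumptions made for the example.

```python
import hashlib
import json


def row_checksum(row: dict) -> str:
    """Checksum a row via a canonical (key-sorted) JSON encoding."""
    canonical = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def snapshot_checksums(rows: dict[str, dict]) -> dict[str, str]:
    """Map each primary key to the checksum of its row."""
    return {key: row_checksum(row) for key, row in rows.items()}


def compute_deltas(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two checksum maps and classify keys as inserted, deleted, or updated."""
    return {
        "inserted": sorted(k for k in new if k not in old),
        "deleted": sorted(k for k in old if k not in new),
        "updated": sorted(k for k in new if k in old and new[k] != old[k]),
    }


previous = {"A1": {"qty": 5}, "A2": {"qty": 7}, "A3": {"qty": 9}}
current = {"A1": {"qty": 5}, "A2": {"qty": 8}, "A4": {"qty": 1}}

deltas = compute_deltas(snapshot_checksums(previous), snapshot_checksums(current))
print(deltas)
```

Only the checksum map from the previous run needs to be retained between runs, not the full previous snapshot, which is what makes this approach economical for large datasets.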

DBMS Extension

  • Real-time capture of changed data directly from the DBMS
  • Secure data storage even across DBMS restart
  • Flexible propagation methods

Interested in seeing a live, online demo of tcVISION CDC?

Just fill out the Treehouse Software tcVISION Demonstration Request Form and a Treehouse representative will contact you to set up a time for your online tcVISION demonstration.



Treehouse Software helps Mainframe Customers Replicate Data on the World’s Fastest Cloud Data Warehouse — Amazon Redshift

by Joseph Brady, Director of Business Development / AWS and Cloud Alliance Lead at Treehouse Software, Inc.

Treehouse Software is an AWS Technology Partner that has been a mainframe software company since 1982 and has worked in the data replication and migration market space since the mid-1990s. Today, Treehouse Software is helping enterprise mainframe customers successfully move their data to the Cloud in order to utilize the most advanced Cloud services and tools available. Our tcVISION product can help customers replicate their mainframe data, in real time and bi-directionally, between a vast array of source databases and many AWS Cloud technologies, including Amazon Redshift, the world’s fastest cloud data warehouse today.

Highly concurrent workloads? Not a problem. Redshift can handle virtually unlimited concurrency. Amazon Redshift’s hybrid architecture enables unmatched performance…


Shared storage provides the ability to scale to unlimited concurrency, while Redshift’s on-instance storage provides low latency access to data that can’t be achieved any other way. The unique combination of both strategies provides Redshift’s best-in-class performance today and leaves room for continued performance improvements tomorrow.

Scale quickly and automatically to handle unpredictable workloads…

Start small at $0.25 per hour and scale up to terabytes or petabytes for under $1,000 per terabyte per year. Pay only for what you use and know how much you’ll spend with predictable monthly costs. That’s 75% less expensive than the #2 cloud data warehouse provider.


Further reading: tcVISION Mainframe-to-AWS data replication is featured on the AWS Partner Network Blog…


AWS recently published a blog about tcVISION’s Mainframe-to-AWS data replication capabilities, including a technical overview, security, high availability, scalability, and a step-by-step example of the creation of tcVISION metadata and scripts for replicating mainframe Db2 z/OS data to Amazon Aurora. Read the blog here: AWS Partner Network (APN) Blog: Real-Time Mainframe Data Replication to AWS with tcVISION from Treehouse Software.



Contact Treehouse Software for a Demo Today…

No matter where you want your mainframe data to go – the cloud, open systems, or any LUW target – tcVISION from Treehouse Software is your answer.

tcVISION_Overall_Diagram

Just fill out the Treehouse Software Product Demonstration Request Form and a Treehouse representative will contact you to set up a time for your online tcVISION demonstration.