TDT: Much more than a mere “data connector” for Snowflake

Posted on April 16, 2024 by Treehouse Software

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

____0_TDT_Snowflake_Splash

Over the past few months, we have been rolling out information on Treehouse Dataflow Toolkit (TDT), a state-of-the-art, fully automated offering for data transfer from Kafka pipes to Analytics/ML/AI frameworks. TDT is a set of proprietary microservices that assures highly-available, auto-scalable, and event-driven data transfers to your data science teams’ favorite analytics frameworks, such as Snowflake, Amazon Redshift, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL, all the while adhering to AWS’s and Snowflake’s recommended best practices for massive data loading. Make no mistake, TDT is MUCH more than merely a “connector”.

In this blog, we will focus on how TDT handles data transfers to perhaps the most complex environment: Snowflake. Out of all TDT functions and features, our Snowflake connectivity offers the biggest “value added” to customers, because Snowflake has quickly become a top choice for enterprises looking for a Cloud platform onto which they can mobilize data at near-unlimited scale and performance, and bring advanced ML/AI capabilities.

Snowflake overview video…

Connectivity using Snowflake’s best practices vs. traditional ODBC…

TDT’s innovative Lambda-based (microservices) approach enables faster data flow than any conceivable ODBC-based solution, which is the standard tool used for most “roll your own” approaches, or “we have a connector for that” offerings.

To load massive quantities of data to a target, TDT uses Snowflake’s (hugely scalable) bulk load utilities—not ODBC. It is vital to note that Snowflake is NOT a relational (OLTP) database, so doing CDC transfers to these targets via ODBC (with update, insert, delete transactions) goes directly against “best practices” advice from Snowflake, and would almost assuredly result in unwieldy bottlenecks.

____0_TDT_Snowflake01

TDT loads data into Snowflake’s “delta tables”, which inherently retain the entire history of source data ever since the source-to-target synchronization began (perfect for time-based trend/predictive/prescriptive analytics). Again, TDT adheres to Snowflake’s best practices recommendation for pulling data from S3 for bulk loading massive quantities of data…

____0_TDT_Snowflake02

Publishing both bulk-load and CDC data to a reliable and scalable framework like Kafka allows you to maintain a broad array of options to ultimately feed your legacy data to any number of JSON-friendly ETL tools, target data stores, and data analytics packages (some of which have not even been invented yet!).

The “build vs buy” question is put to rest…

The Snowflake-proprietary target DDL/metadata/resources that TDT automatically produces for the staging of data in Snowflake are of such complexity that it is easy to justify the “buy” option in the “build vs buy” conversations customers have. A decision by an enterprise not to use TDT, but instead to build its own Kafka-to-Snowflake solution, could result in any or all of the following:

accumulation of technical debt
extensive/unpredictable time to production
ongoing resource planning to maintain home-grown technologies
potential vendor lock for maintenance of custom-made technologies designed and developed by consultants
managing a mix of manual and automated functions
tracking cobbled together components created by multiple staff and consultants
limited agility for future customization and innovation
problems adhering to evolving best practices over time
higher costs for future growth/scaling
potential lack of proper security/ongoing security updates
your organization has now become an enterprise software development company, whether or not you intended that, and whether or not you realized that!

Simply put, TDT is a self-contained, turn-key solution that can eliminate months, or years, of research and development time and costs. With TDT, high-speed and massive data movement to Snowflake takes minutes to ramp up.

Download the TDT AWS Partner Solution Brief to share with your team…

DOWNLOAD…

Treehouse Dataflow Toolkit (TDT) is Copyright © 2024 Treehouse Software, Inc. All rights reserved.

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

So, You’ve Managed to Start Streaming Your Legacy Data into Kafka Pipelines… Now What?

Posted on April 3, 2024 by Treehouse Software

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

Treehouse_Dataflow_Toolkit_Splash

Treehouse Software is helping customers modernize their valuable enterprise data on Cloud and Hybrid Cloud environments without disrupting the existing critical work on their legacy systems. However, a new strategic imperative has been added to the modernization game—the requirement to utilize today’s advanced Analytics/AI/ML-friendly platforms, such as Amazon Redshift, Snowflake, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL, where an ever-expanding array of AI/ML tools are available to generate vital insights from the customer’s data. Many of these customers are already using software tools provided by Treehouse, or other vendors to replicate their data into various target data stores, but also more crucially into Kafka pipelines (i.e., Amazon MSK, Confluent, etc.). Kafka is now the top choice for high-speed streaming of massive volumes of mission critical data, providing stable performance under extreme loads. This is especially valuable for enterprises that require up-to-the-second data delivery for use cases that include e-commerce, financial services, logistics, telecommunications, and government IT.

Traditionally, Treehouse customers utilized our data replication technologies to load legacy data into Kafka pipelines, and that was where our involvement generally ended…

____0_Traditional_Mainframe_To_Kafka

However, once Kafka is designated as a target in the customer’s architecture, we have increasingly become involved in two questions: “What now?”, and/or “What is the best mechanism for us to rapidly transfer data from Kafka to advanced analytics platforms?” Our answer: Look no further than Treehouse Software!

Treehouse Software brings a state-of-the-art, fully automated offering for data transfer from Kafka pipes to Analytics/ML/AI frameworks: the Treehouse Dataflow Toolkit (TDT). TDT is a set of proprietary microservices that assures highly-available, auto-scalable, and event-driven data transfers to your data science teams’ favorite analytics frameworks, all the while adhering to AWS’s and Snowflake’s recommended best practices for massive data loading, thus assuring shortest and surest loads. Additionally, TDT provides a frictionless and instant implementation, accelerating your path to deep data insights for optimizing business processes.

Why do AWS’s and Snowflake’s best practices recommend against using ODBC?

Your data science teams need large quantities of the very latest data in near-real-time, and ODBC doesn’t really do the job, offering only single-threaded, difficult to scale pipes. By contrast, TDT’s approach not only keeps things up-to-date faster than any conceivable ODBC-based solution, but the “delta tables” into which it loads data also inherently retain the entire history of source data ever since the source-to-target synchronization began (perfect for time-based trend/predictive/prescriptive analytics). To load massive quantities of data to a target, TDT uses the target vendors’ (massively scalable) bulk load utilities—not ODBC. It’s vital to note that Snowflake and Redshift are NOT relational (OLTP) databases, so doing CDC transfers to these targets via ODBC (with update, insert, delete transactions) goes directly against “best practices” advice from the vendors, and would almost assuredly result in unwieldy bottlenecks.

What if my data is not on a mainframe?

No worries. Treehouse Software’s messaging is primarily mainframe-centric, since that has been our area of expertise and bread-and-butter for over 40 years. However, data movement is data movement, and if your mainframe, or non-mainframe, data is being pumped to a Kafka pipeline, TDT will take it from there. When a data replication tool publishes both bulk-load and CDC data in JSON format to a reliable and scalable framework like Kafka, it sets the stage for TDT to feed legacy data to any number of JSON-friendly ETL tools, target data stores, and the latest (or yet to be invented) data analytics packages. TDT is the turn-key solution for the easiest and fastest implementation of Kafka data transfer…

TDT allows you to quickly ramp up your data analytics game by providing a rapid flow of data fresh off your enterprise data systems.

Download: TDT AWS Partner Solution Brief to share with your team…

DOWNLOAD…

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Quick Read: AWS Partner Solution Brief – Treehouse Dataflow Toolkit

Posted on March 11, 2024 by Treehouse Software

by Joseph Brady, Director of Business Development at Treehouse Software, Inc.

____0_TDT_Generic01

Treehouse Software and AWS are collaborating on several AWS-centric initiatives in the coming months. The focus of these efforts is to market our new Treehouse Dataflow Toolkit (TDT), a set of microservices that provides the turn-key solution for transferring data from Kafka into advanced Analytics/AI/ML-friendly targets, such as Amazon Redshift, Snowflake, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL. We have worked with an AWS Marketing Manager to create the following TDT AWS Partner Solution Brief downloadable PDF that provides a one-minute overview of TDT, its benefits, and resource links for your team…

DOWNLOAD…

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Just what is the new Treehouse Dataflow Toolkit, and why is it the perfect tool for transferring mainframe data to Cloud-based data analytics and AI/ML frameworks?

Posted on March 5, 2024 by Treehouse Software

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

____0_TDT_Splash

Introducing Treehouse Dataflow Toolkit…

Many enterprise customers and Cloud platform partners have been coming to Treehouse Software seeking the know-how and technology that enables state-of-the-art transfer of mainframe data to advanced analytics and ML/AI frameworks. In response to this demand, we have designed the Treehouse Dataflow Toolkit (TDT), a set of proprietary microservices that assures highly-available, auto-scalable, and event-driven data transfers to your data science teams’ favorite analytics frameworks.

These customers either already have, or are in the process of acquiring, software tools that replicate their data into Kafka pipelines (i.e., Amazon MSK, Confluent, etc.). Our new and innovative offering, TDT, provides the turn-key solution for getting this data from Kafka into advanced Analytics/AI/ML-friendly targets, such as Amazon Redshift, Snowflake, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL, all the while adhering to AWS’s and Snowflake’s recommended best practices for massive data loading, thus assuring shortest and surest loads.

Market snapshot…

For years, Snowflake and Redshift have been providing “old school” data analytics functionality, and now they are both ramping up their support for ML and GenAI functionality. They are generating the demand (and are doing a good job of it!).

As we’ve been hearing from our customers, it is not a question of, for example: getting their data to either Snowflake OR PostgreSQL OR Redshift, but instead to ALL OF THEM! Each target environment has its own business justifications and reasoning. Many sites will want to do this—send data not only to various RDBMS targets, but also to various Data Analytics targets. The justification for TDT is in a customer’s desire to ramp up its Data Analytics game, quickly and easily with data fresh off the mainframe; and achieving business goals and results faster and at a much lower cost than building a solution themselves.

How does TDT Work?

When a mainframe data replication tool (provided by one of Treehouse’s partners) publishes both bulk-load and CDC data in JSON format to a reliable and scalable framework like Kafka, it sets the stage for TDT to feed legacy data from Kafka to any number of JSON-friendly ETL tools, target datastores, and data analytics packages (some of which may not even have been invented yet!).

____0_TDT_Generic

We start at the source – the mainframe – where an agent (with a very small footprint) extracts data (in the context of either bulk-load or CDC processing).
The raw data is securely passed from the mainframe by one of our partner’s data replication tools that transforms the data into Unicode/JSON and publishes the results to a Kafka topic (in our example above, a topic in an Amazon MSK cluster).
TDT microservices consume the data from MSK/Kafka and land it in S3 buckets, where TDT’s proprietary crawler technology is used to automatically prepare landing tables, views, and additional infrastructure for various analytics friendly targets. Then the mainframe data is loaded into Redshift, Snowflake, S3, or PostgreSQL (all the while adhering to AWS’s and Snowflake’s recommended “best practices” for massive data loading, thus assuring shortest and surest loads). The inherent reliability and scalability of the entire pipeline infrastructure assures near-real-time synchronization between mainframe sources and the target tables, even with huge bulk-loads or transaction-heavy CDC processing.

History is enterprise GOLD…

TDT not only keeps things up to date faster than any conceivable ODBC-based solution, but the “delta tables” into which it loads data also inherently retain the entire history of source data ever since mainframe-to-target synchronization began. So, for example, after TDT has been syncing a target table for 5 years, a data scientist now has 5 years’ worth of historical data to work with for trend analysis, predictive analytics, prescriptive analytics, ML, etc.

…but you also need the very latest data in near-real-time.

While TDT’s unique “delta-tables” approach offers comprehensive “history” for advanced analytics, the traditional need for up-to-the-second, current snapshots of mainframe datastores is also completely provided for. Adhering once again to target vendors’ “best practices”, self-materializing views are provided to work with current data, not only in the JSON format in which it is stored, but also in fully-structured views which provide the more traditional look and feel of a SQL database.

Competitive differentiators between TDT and the “connectors”

TDT provides massive scalability, thanks to the AWS Lambda infrastructure.
TDT’s delta-table approach means unbeatable throughput (everything is just an INSERT, and it’s all going through the target vendors’ “best-practices” bulk-load utilities).
TDT’s advanced crawler automatically provides JSON-manipulating VIEWs (often awkward to develop in a SQL context) and other target infrastructure.
TDT adheres to AWS’s and Snowflake’s recommended best practices for connectivity.
Other data replication tools that attempt to target Redshift and Snowflake use only generic ODBC connections for data transmission.
- To load massive quantities of data to a target, TDT uses the target vendors’ (massively scalable) bulk load utilities—not ODBC. (Transaction-based ODBC transmissions afford a single, inherently difficult-to-scale pipe.)
- Snowflake and Redshift are NOT relational (OLTP) databases, so doing CDC transfers to these targets via ODBC (with update, insert, delete transactions) goes directly against “best practices” advice from the vendors, and will almost assuredly result in unwieldy bottlenecks.
- For Snowflake’s bulk-load functions to work, the development of additional Snowflake-proprietary objects (in addition to just target tables and views) is required; TDT’s crawler (DDL generator) function for Snowflake automatically generates statements to create these unique objects, along with the standard “create table”, “create view” statements.
Loading hierarchical data in JSON format (to JSON-friendly environments like Snowflake, Athena/S3, Redshift, and PostgreSQL) is the best methodology for many situations, because it avoids having to split hierarchies out into parent/child/grandchild tables, which have to subsequently be pulled back together again via cumbersome SQL queries in order for the data to be effectively worked with. NOTE that one of our customers has become so frustrated with working with “split apart” parent/child/grandchild structures in PostgreSQL that they want the ability to send their hierarchical data in JSON format TO POSTGRESQL (hence our recent addition of TDT support for PostgreSQL as a target).
For users who still want to work with data in structured parent/child/grandchild format (yes, many people still may be reluctant to work with JSON in the context of SQL queries), TDT’s crawler (DDL generator) functions provide user-views that exactly emulate those old-school parent/child/grandchild structures.
Production environment with TDT can be up and running in 2-4 weeks.
TDT’s SaaS model advantages include: ease of implementation, shorter time to move into production, reliable uptime, instantaneous upgrades, pay-as-you-go billing based on usage metrics, and ease of integration with other SaaS offerings.

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Treehouse Software Provides a Fast Path for Mainframe Data to Microsoft Azure’s Data Services

Posted on February 16, 2024 by Treehouse Software

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

Customers who want to modernize mainframe data by leveraging Microsoft Azure without disrupting existing critical work on their legacy systems are finding Rocket Data Replicate and Sync (RDRS) from Treehouse Software to be the ideal solution. In addition to replicating data to a variety of Azure database targets, RDRS can stream data (in near-real-time) directly to Event Hubs (in JSON, CSV, or Avro formats), from which customers can either directly consume the data using their own microservices, or transfer the data to Azure Streaming Analytics, which then automatically feeds it to Azure Data Lake Storage, or Azure Cosmos DB, as seen in this high-level overview…

____0_Mainframe_To_Azure

RDRS focuses on changed data capture (CDC) when transferring information between mainframe data sources and Cloud targets. Through an innovative technology, changes occurring in any mainframe application data are tracked and captured, and then published to a variety of RDBMS and other targets.

RDRS utilizes a Windows-based GUI Dashboard, which is ideal for non-mainframe programmers. The RDRS Dashboard acts as a single point of administration, data modeling and mapping, script generation, and monitoring. Comprehensive monitoring and logging of all data movements ensure transparency across all data exchange processes.

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Treehouse Software and Confluent offer High-Speed Mainframe Dataflow for Cloud-based Advanced Analytics

Posted on January 31, 2024 by Treehouse Software

by Joseph Brady, Director of Business Development at Treehouse Software, Inc.; Dan Vimont, Director of Innovation at Treehouse Software, Inc.; and Ram Dhakne, Staff Solutions Engineer at Confluent

____0_Treehouse_and_Confluent01

The message is clear from our customers—They want to modernize mainframe data on Cloud and Hybrid Cloud environments without disrupting the existing critical work on their legacy systems. They also want to tap into today’s advanced data analytics platforms such as Amazon Redshift, Snowflake, and Amazon Athena/S3, where an ever-expanding array of machine learning and artificial intelligence (ML/AI) tools are available to generate vital insights from their enterprise’s data. Your data science teams are eagerly awaiting the arrival of critical data from your mainframes to supercharge their predictive analytics and generative AI frameworks.

Treehouse Software and Confluent: Two companies providing a reliable and scalable solution…

Confluent Cloud Data Streaming Service

As stated on the Confluent website, “Your team has better things to do than fight Kafka fires.” That is why Confluent Cloud was built as a 10x better, fully managed, and truly Cloud-native service for Apache Kafka, powered by Kora engine. Customers can take data streaming to the next level—sans the Kafka management and operational woes.

Confluent Cloud offers enhanced productivity, improved scalability, minimized downtime, and much more—all while reducing total cost of ownership. Confluent Cloud offers:

Elastic scaling: Scale up and down quickly to meet fluctuating customer demand, without the ops burden that comes with scaling your data infrastructure
Infinite Storage: Enable powerful use cases by never having to worry about Kafka retention limits again, while only paying for the storage used
Built-in Resiliency: Ensure high availability and offload Kafka ops with 99.99% uptime SLA, multi-AZ clusters, and no-touch Kafka patches

Treehouse Software Mainframe CDC Data Replication

Enterprise customers have come to Treehouse Software, because the company brings not only proven mainframe data replication tools, but also deep subject matter expertise in mainframe technologies, as well as the know-how to target relevant offerings especially designed for ingesting data for advanced analytics and ML/AI.

The Rocket Data Replicate and Sync (formerly tcVISION) solution from Treehouse allows customers’ legacy mainframe environment to operate normally while replicating data on Cloud and Hybrid Cloud environments. The technology focuses on changed data capture (CDC) when transferring information between mainframe data sources and Cloud-based databases and applications. Through an innovative set of technologies, changes occurring in any mainframe datastore are tracked and captured, and ultimately published to various Cloud targets. Additionally, the Treehouse Dataflow Toolkit (TDT) set of microservices greatly enhances the architecture’s connectivity to high performance, non-relational, massive parallel processing datastores (Amazon Redshift, Snowflake, Amazon Athena/S3) that are primed to supply the most advanced ML/AI tools to data science teams.

Figure 1: In the longer-term picture, an enterprise can now keep its options open by propagating data to the highly reliable, very scalable Confluent Cloud that can be “subscribed to” by any number of current or yet-to-be-invented ETL toolsets and target datastores.

____0_Confluent01

How does it work?

We start at the source – the mainframe – where an agent (with a very small footprint) extracts data (in the context of either bulk-load or CDC processing).
The raw data is securely passed from the mainframe to Rocket Data Replicate and Sync (RDRS) which speedily transforms mainframe-formatted data into Unicode/JSON and publishes the results to a Kafka topic in Confluent Cloud.
The Treehouse Dataflow Toolkit functions consume the data from Confluent and land it in S3 buckets, where Treehouse’s proprietary crawler technology is used to automatically prepare landing tables, views, and additional infrastructure for various analytics friendly targets. Then the mainframe data is loaded into Redshift, Snowflake, or S3 (all the while adhering to AWS’ and Snowflake’s recommended “best practices” for massive data loading, thus assuring shortest and surest loads). The inherent reliability and scalability of the entire pipeline infrastructure assure near-real-time synchronization between mainframe sources and the target tables.

The very latest data—delivered!

Figure 2: RDRS, Confluent, and TDT work in tandem to easily replicate mainframe data and create target Snowflake resources for a wide variety of end use.

____0_Confluent02

Figure 3: TDT adheres to Snowflake’s recommended “best practices” for bulk loading of mainframe data by using its COPY function to load data from S3

____0_Confluent03c

This Treehouse/Confluent framework allows data in staging tables to be constantly accruing the most current data, ideally suited for data scientists looking to do trend analysis, predictive analytics, ML, and AI work. For business analysts and others who prefer structured data representations of potentially complex hierarchical data, this framework also automatically provides structured user-views, providing the look and feel of a SQL database.

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Treehouse Software – 40 Years and Still Moving Forward (Part 1)

Posted on January 16, 2023 by Treehouse Software

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software, Inc.

__TSI_LOGO_40th_Transp

Introduction

Many readers know that Treehouse Software has been around since 1983, serving enterprises worldwide with industry-leading software products and outstanding technical support. However, this blog series will dig a little deeper into Treehouse Software’s origins and explore how founder and president, George Szakach blazed a trail from being a systems programmer in the early 60s, to creating and growing his own software company from the early 80s up to the present.

The beginnings… 1960’s. Moon Landing, Flower Power, the Righteous Brothers, and Punched Cards

George is a Vietnam-era veteran and started working with mainframes in 1960 while in the Army.

After programming school in Fort Monmouth, NJ, George was assigned to Fort Huachuca, Arizona where he wrote army related applications on the IBM 709.

Before leaving the army in 1963, George had many job offers. Three years of programming experience was unheard-of back then, so his skillset was very valuable. He was even offered a job by the president of Informatics, working in Houston at the NASA Johnson Space Center to “put a man on the moon.” He declined.

Throughout the rest of the 60s, George worked at Burroughs, Univac, and Leasco. During the 70s through 1982, George worked for Ocean Data Systems, Data General, Optical Recognition Systems, Software AG (his longest stint – 7 years), and Superior Oil.

Punched card…

01_2023_Treehouse_Chronology_Blog_Punched_Card

IBM 709 Computer System…

IBM709

George’s archeological finds from his time at Univac…

____01_Chron_UNIVAC_Card

With all of this foundational mainframe experience combined with his skills honed at Software AG, the seeds were planted for George’s future: take those roots, move to the trees, and build a house…

Coming soon… Part 2: Treehouse Software’s first generations.

About Treehouse Software

Since 1983, Treehouse Software has been serving enterprises worldwide with industry-leading mainframe software products and outstanding technical support. Today, Treehouse Software is a global leader in providing data replication, and integration solutions for the most complex and demanding heterogeneous environments, as well as feature-rich, accelerated-ROI offerings for information delivery, and application modernization.

Contact Treehouse Software

Treehouse Software Salutes Franco Harris

Posted on January 3, 2023 by Treehouse Software

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software, Inc.

With the recent passing of Pittsburgh Steelers great running back and Pro Football Hall of Famer, Franco Harris, we would like to revisit April of 1993, when Treehouse Software held an international consultant’s symposium. The symposium brought together attendees and speakers from many consulting and technology companies, and schools from around the world. Since Treehouse Software is located in the greater Pittsburgh area, company president George Szakach was acquainted with Franco and invited him to deliver a fascinating and entertaining address, where he spoke about his career, several business ventures he was pursuing, as well as his budding interest in computer technology.

____01_Chron_Franco

Franco Harris at Treehouse Software’s Consultant’s Symposium (April 1993)

A few years ago, George reminded Franco about his visit to Treehouse back in 1993. He remembered and they shared some laughs and memories.

We would also like to mention Franco’s well-known sense of community and accessibility in Pittsburgh. Many staff members have met Franco over the years and have fond memories of his friendliness and willingness to spend time engaging in conversations. Those who come in to the Pittsburgh International airpot can see a sculpture depicting Franco’s famous “Immaculate Reception” from 1972. Thousands of people, especially recently, have selfies taken with the sculpture. Franco will be missed by his many friends and the community.

____01_Chron_Franco_Airport_Sculpt02

Franco Harris sculpture at Pittsburgh International Airport.

About Treehouse Software

Since 1982, Treehouse Software has been serving enterprises worldwide with industry-leading mainframe software products and outstanding technical support. Today, Treehouse Software is a global leader in providing data replication, and integration solutions for the most complex and demanding heterogeneous environments, as well as feature-rich, accelerated-ROI offerings for information delivery, and application modernization.

Contact Treehouse Software

Enterprise Mainframe Change Data Capture (CDC) to Apache Kafka with tcVISION and Confluent

Posted on May 3, 2022 by Treehouse Software

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software, Inc. and Ram Dhakne, Solutions Engineer at Confluent

___Mainframe_To_Kafka_Confluent

This blog focuses on using Treehouse Software’s tcVISION to replicate data in real time between mainframes and Confluent, allowing for new use cases and truly setting data in motion.

Why mainframe modernization? Benefits and use cases

Mainframe data stores often hold large amounts of complex and critical data in proprietary legacy formats, making this data difficult to extract and incompatible with modern databases, data types, and data tools.

Enterprises are looking to take advantage of the latest cloud services, such as analytics, artificial intelligence (AI) and machine learning, scalable storage, security, high availability, etc., or move data to a variety of newer databases. Additionally, many customers want to modernize their application on a cloud or open systems platform without disrupting the existing critical work on the legacy system.

How tcVISION syncs legacy data for the cloud

tcVISION is a data replication software product that performs real-time synchronization of mainframe data sources and cloud and open systems, allowing critical mainframe data to be consumed by a variety of leading cloud services.

tcVISION supports many mainframe data sources for both online and offline scenarios. Data can be replicated from IBM Db2 z/OS, Db2 z/VSE, VSAM, IMS/DB, CA IDMS, CA Datacom, or Software AG ADABAS. tcVISION can replicate data to many targets including Confluent Platform, Apache Kafka^®, AWS, Google Cloud, Microsoft Azure, PostgreSQL, Snowflake, etc. To learn more, see the complete list of supported tcVISION sources and targets.

tcvision-mainframe-to-confluent-cloud-data-replication-1536x1042

tcVISION focuses on CDC (change data capture) when transferring information between mainframe data sources and cloud and open systems databases and applications. Through innovative technology, changes occurring in any mainframe application data are tracked and captured, and then published to a variety of cloud and open systems targets.

tcVISION stores metadata in a relational database and the tcVISION manager components are administered by the tcVISION control board, a Windows GUI interface, which can be installed on premises or in the cloud. This allows tcVISION users to create metadata, create and control replication scripts, and control database interactions. tcVISION’s architecture is designed to minimize mainframe resource utilization.

Using the tcVISION control board, the most complex transformations can be specified, and it facilitates the mapping of the mainframe copybooks, redefines, data dictionaries, data catalogs, codepages, data type mapping, and more via the user-friendly interface. The repository editor allows users to control data transformations.

What is Confluent?

Confluent Cloud is a real-time data in motion platform that can be deployed in any public cloud, in any region of your choice. It comes with an SLA and uptime of 99.95%, and fully managed components like ZooKeeper, Kafka brokers, 120+ Kafka connectors, Schema Registry, and ksqlDB so you can leverage it on any cloud without having to worry about how it runs and scales.

Kafka Connect, Connect API, connectors, and tcVISION IBM Db2 connector

Kafka comes with three core APIs:

Kafka producer/Consumer API
Connect API
KStreams API

Kafka Connect is a tool for scalably and reliably streaming data between Kafka and other data systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. Kafka Connect connects APIs under the hood with fully managed connector support in Confluent Cloud.

Step-by-step guide on how to use tcVISION and Confluent

This example discusses the integration of tcVISION replication of data from Db2 to Confluent Cloud.

Set up tcVISION access to Confluent

Create an account with Confluent to make a Confluent user ID/password; the user ID is generally your email address. To sign on to Confluent, go to the Confluent Cloud login and enter your user ID:

Confluent Cloud welcome page

Then, enter your password:

Enter your password

When you log in, you’ll be in a Confluent environment called “default”:

Confluent environment called “default”

A Confluent environment is a type of container that holds clusters which in turn hold topics. If you are familiar with messaging systems, Confluent/Kafka will seem familiar. A cluster will need to be created to serve as a target for the data produced by tcVISION. The first attribute to be selected is the type of cluster. Confluent offers three types: Basic, Standard, and Dedicated. For the purposes of this demonstration, Basic will be used. A Basic cluster does not incur charges for simply existing, but does for data transmission and data storage.

Select "Basic cluster" and begin configuration

Select Begin configuration.

Select a cloud provider

Here, a cloud provider can be chosen—AWS, Google Cloud, or Microsoft Azure. For this example, AWS is used. Select Continue and the characteristics of the new cluster are displayed, which we’ve named “tcVISION_cluster_0”:

Cluster characteristics

After entering your payment information (not shown), you can click on the cluster name to launch the cluster overview.

Cluster overview

In order to use Confluent with tcVISION, the user must provide tcVISION with information about the cluster they intend to use. Specifically, the user must supply the hostname and port of the Confluent AWS virtual machine, and the credentials needed to access the cluster.

Confluent refers to the hostname and port as a bootstrap server. There can be multiple bootstrap servers for the purpose of load balancing, but a single server is used for this demonstration.

To find bootstrap server information, click Cluster Settings on the left-hand side:

Cluster settings

The bootstrap server will be listed under “Identification,” and includes both the AWS hostname and the port.

Credentials in Confluent consist of an API Key and an API Secret. These are generated for the cluster and take the place of the Confluent user ID and password used to log in. To generate a key/secret pair, click API Access on the left:

API Keys page

Followed by Create Key:

Select API Key scope

For this example, we use “Global Access” here, so click Next:

API Key and secret

Pay particular attention to the tip about saving the key and secret somewhere safe, because once this panel is exited, there is no way to display the secret again. A descriptive string for this key/secret pair can be filled in. The key or secret text to be copied can be selected, or use the convenient icons at the end of the field to copy. Once the key/secret has been safely stored, check the box that says it has been done, and click Save. You will return to the “API Keys” panel, and the key is now displayed:

API Key displayed

Set up Confluent and define the topic

The last thing to do is define a topic within the cluster. Confluent producers have the capability to define their own topics within a cluster, but this capability can be disabled by a Confluent configuration and is disabled in the configuration used here.

Go back to the cluster Overview:

Cluster Overview

On the left sidebar, click Topics:

Topics

Then Create Topic:

Create a topic

The topic name is filled in (“CONFLUENT_CLOUD_TOPIC1”), overriding the number of partitions from 6 to 1, since that is what the Confluent demo uses. Click Create with defaults:

Cloud topic

A topic is now available, which can be populated with Db2 data.

Set up tcVISION and run a bulk load of Db2 data

tcVISION’s control board is a Windows graphical user interface (GUI) that allows users to configure the replication stream between various database platforms, including the IBM mainframe and Confluent. Using the control board and built-in wizards, users can define the metadata and the mappings between the mainframe and target.

The following sequence of screens shows the steps required to create the tcVISION metadata and scripts for replicating mainframe Db2 z/OS data to Confluent.

Access the tcVISION control board:

tcVISION control board

Log on to Db2 z/OS:

Db2 z/OS

Create metadata that is specific to the input (Db2) and output (Kafka) and the replication definition. In this example, the Db2 table is mapped to the Confluent Cloud Kafka topic using JSON:

Import of structure definitions

The tcVISION metadata wizard asks for the information required for the replication of the mainframe database to Confluent Cloud. For Db2 z/OS, it asks for the mainframe Db2 subsystem:

Source type for structure definition import

Db2 subsystem

tcVISION presents the tables contained in the Db2 z/OS catalog on the mainframe. Select the schemas and associated tables for replication:

Select the schemas and associated tables for replication

Once the required tcVISION wizard-based screens are completed, the tool automatically defines the mappings between the source and target. tcVISION’s metadata import wizard creates a default mapping that handles data type conversion issues, such as EBCDIC to ASCII, Endianness conversion, codepages, redefines data types, and more:

Default mapping

tcVISION data scripts are created through wizards. Data scripts control the replication of data from the source (Db2 z/OS) to the target (Confluent Cloud Kafka JSON). tcVISION bulk load scripts are a type of data script that performs the initial load of the Kafka topic. The following script shows data being accessed directly from the mainframe Db2 z/OS database. Another alternative to reduce MIPS consumption is to read the data from a Db2 image copy.

Data script

Bulk load script running:

Bulk load script running

After execution of the bulk load script, replication statistics of the Db2 bulk load into the Confluent Cloud Kafka topic can be viewed:

Replication statistics of the Db2 bulk load

Now that the topic has been loaded with data from Db2, it can be displayed in Confluent. To do this, navigate to the topics panel again:

Notice that there are now statistics indicating that the tcVISION producer uploaded some data to the topic. On the horizontal menu, switch from “Overview” to “Messages” to display the messages (data records) that the tcVISION bulk load placed in the topic. The display can be filtered in various ways, but for this example, the default is used: “Jump to Offset,” which says “start displaying sequentially from this offset.” Here, an offset of 0 (start at the beginning) is specified, since we just want to verify that the Db2 data uploaded by tcVISION was actually delivered:

Messages (data records) from tcVISION bulk load

Run a change script in tcVISION to show the changes in Confluent

To capture ongoing changes to Db2 in real time, a Db2 z/OS CDC replication script is created.

This script captures the changes on the Db2 z/OS side and applies them into the repository where the output target is Confluent Cloud topic.

Replication script

Target database Confluent Cloud topic

The CDC replication is initiated from the tcVISION control board. The tcVISION control board shows a graphical representation of the replication:

Graphical representation of the replication

The CDC replication is now actively capturing and replicating data changes whenever they occur on the Db2 z/OS side. You can test it by making a change in the Db2 z/OS table:

********************************* Top of Data **********************************
---------+---------+---------+---------+---------+---------+---------+---------+
UPDATE SXE1.TVKFKATB                                                    00010004
SET DEPT = '696969'                                                     00040029
WHERE PERS_ID = 5;                                                      00050004
---------+---------+---------+---------+---------+---------+---------+---------+
DSNE615I NUMBER OF ROWS AFFECTED IS 1                                           
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0                       
---------+---------+---------+---------+---------+---------+---------+---------+
--COMMIT;                                                               00060019
---------+---------+---------+---------+---------+---------+---------+---------+
DSNE617I COMMIT PERFORMED, SQLCODE IS 0                                         
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0                       
---------+---------+---------+---------+---------+---------+---------+---------+
DSNE601I SQL STATEMENTS ASSUMED TO BE BETWEEN COLUMNS 1 AND 72                  
DSNE620I NUMBER OF SQL STATEMENTS PROCESSED IS 1                                
DSNE621I NUMBER OF INPUT RECORDS READ IS 4                                      
DSNE622I NUMBER OF OUTPUT RECORDS WRITTEN IS 17                                 
******************************** Bottom of Data ********************************

This change is processed and replicated by tcVISION. The tcVISION control board shows the statistics highlighting that one update was performed:

Display of extended statistics

Checking in Confluent, the Db2 z/OS change has successfully been propagated to the Confluent Cloud topic:

Db2 z/OS change successfully propagated to Confluent Cloud topic

tcVISION and Confluent are better together

With tcVISION’s groundbreaking Db2 CDC connector and Confluent’s ability to serve as the multi-tenant data hub, this combination creates a very powerful solution to aggregate data from multiple sources and have data published into various Kafka topics. Sourcing events from any kind of Db2 via a connector into Confluent will set data in motion for the entire organization. Simplicity and agility are key elements of the tcVISION and Confluent “better together” story.

__001_TSI_LOGO

Video: tcVISION Demonstration…

In this video, we show a tcVISION overview, then a demonstration of replication of mainframe data on AWS RDS for PostgreSQL:

Contact Treehouse Software for a tcVISION Demo Today!

No matter where you want your mainframe data to go – the Cloud, open systems, or any LUW target – tcVISION from Treehouse Software is your answer.

Just fill out the Treehouse Software tcVISION Demonstration Request Form and a Treehouse representative will contact you to set up a time for your online tcVISION demonstration.

How to Replicate Mainframe Data on Azure SQL with tcVISION from Treehouse Software

Posted on March 2, 2022 by Treehouse Software

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software, Inc.

tcVISION allows enterprise customers to replicate data between mainframe, Cloud, or Hybrid Cloud while maintaining their legacy environments.

We are currently working with a government site to architect bi-directional mainframe data replication on Azure SQL. One of the customer’s requirements is for tcVISION to provide real-time data synchronization of changes on either platform reflected on the other platform (e.g., a change to an Azure SQL table is reflected back on mainframe). This way, the customer can modernize their application on the Azure Cloud without disrupting the existing critical work on their legacy system.

tcVISION_Azure_Architecture

VIDEO: See how tcVISION easily connects mainframe systems to Azure SQL…

The tcVISION solution focuses on changed data capture (CDC) when transferring information between mainframe data sources and modern databases and applications. Through an innovative technology, changes occurring in any mainframe application data are tracked and captured, and then published to a variety of targets.

Azure SQL is a supported target in tcVISION, and in this instructional video, tcVISION is shown synchronizing data in real-time between Db2 on z/OS and Azure SQL:

Contact Treehouse Software Today…

No matter where you want your mainframe data to go – the Cloud, Open Systems, or any LUW target – tcVISION from Treehouse Software is your answer.

Just fill out the Treehouse Software Product Demonstration Request Form and a Treehouse representative will contact you to set up a time for your online tcVISION demonstration.

	TDT: Much more than… on Treetip: Treehouse Software ca…
	So, You’ve Man… on Treetip: Treehouse Software ca…
	Quick Read: AWS Part… on Treetip: Treehouse Software ca…
	Just what is the new… on Treetip: Treehouse Software ca…
	Accelerate insights… on Treetip: Treehouse Software ca…

Treehouse Software's Blog

Welcome to "The Branches" — Treehouse Software's Blog

Tag Archives: Confluent Platform