TDT: Much more than a mere “data connector” for Snowflake

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

____0_TDT_Snowflake_Splash

Over the past few months, we have been rolling out information on Treehouse Dataflow Toolkit (TDT), a state-of-the-art, fully automated offering for data transfer from Kafka pipes to Analytics/ML/AI frameworks.  TDT is a set of proprietary microservices that assures highly-available, auto-scalable, and event-driven data transfers to your data science teams’ favorite analytics frameworks, such as Snowflake, Amazon Redshift, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL, all the while adhering to AWS’s and Snowflake’s recommended best practices for massive data loading. Make no mistake, TDT is MUCH more than merely a “connector”.

In this blog, we will focus on how TDT handles data transfers to perhaps the most complex environment: Snowflake.  Of all TDT functions and features, our Snowflake connectivity offers the biggest “value added” to customers, because Snowflake has quickly become a top choice for enterprises looking for a Cloud platform onto which they can mobilize data at near-unlimited scale and performance, and to which they can bring advanced ML/AI capabilities.

Snowflake overview video…

Connectivity using Snowflake’s best practices vs. traditional ODBC…

TDT’s innovative Lambda-based (microservices) approach enables faster data flow than any conceivable ODBC-based solution; ODBC is the standard tool behind most “roll your own” approaches and “we have a connector for that” offerings.

To load massive quantities of data to a target, TDT uses Snowflake’s (hugely scalable) bulk load utilities—not ODBC. It is vital to note that Snowflake is NOT a relational (OLTP) database, so doing CDC transfers to this target via ODBC (with update, insert, and delete transactions) goes directly against “best practices” advice from Snowflake, and would almost assuredly result in unwieldy bottlenecks.
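To make the contrast concrete, here is a minimal sketch (not TDT’s actual implementation) of the bulk-load pattern just described: a whole batch of staged files in S3 is ingested with a single, set-based COPY command rather than row-by-row ODBC DML. The connection parameters, stage, and table names are hypothetical, and the example assumes the snowflake-connector-python package.

import os
import snowflake.connector

# Hypothetical connection details; in practice these would come from secure configuration.
conn = snowflake.connector.connect(
    account="my_account",
    user="tdt_loader",
    password=os.environ["SNOWFLAKE_PWD"],
    warehouse="LOAD_WH",
    database="LEGACY_DATA",
    schema="STAGING",
)

# One COPY statement ingests an entire batch of JSON files landed in an external
# S3 stage; Snowflake parallelizes the load internally.
conn.cursor().execute("""
    COPY INTO CUSTOMER_DELTA              -- hypothetical append-only delta table
    FROM @S3_LANDING_STAGE/customer/      -- hypothetical external stage over S3
    FILE_FORMAT = (TYPE = 'JSON')
    ON_ERROR = 'ABORT_STATEMENT'
""")
conn.close()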

____0_TDT_Snowflake01

TDT loads data into Snowflake’s “delta tables”, which inherently retain the entire history of source data ever since the source-to-target synchronization began (perfect for time-based trend/predictive/prescriptive analytics). Again, TDT adheres to Snowflake’s best-practices recommendation to bulk load massive quantities of data by pulling it from S3…

____0_TDT_Snowflake02
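For readers unfamiliar with the delta-table concept, the following is a purely conceptual sketch (hypothetical DDL and column names, not TDT’s generated schema) of how an append-only delta table accumulates history: every captured change arrives as a new row, so nothing is ever updated or deleted in place.

# Hypothetical delta-table DDL: one row per captured change, with the full
# record image stored as JSON in a VARIANT column.
DELTA_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS CUSTOMER_DELTA (
    CHANGE_OP   STRING,          -- 'I', 'U', or 'D' on the source system
    CHANGE_TS   TIMESTAMP_NTZ,   -- when the change was captured
    SOURCE_KEY  STRING,          -- business key of the source record
    PAYLOAD     VARIANT          -- full record image as JSON
)
"""

# Three changes to the same source record become three rows, preserving history.
example_history = [
    ("I", "2019-06-01 09:00:00", "000123", '{"NAME": "ACME", "BALANCE": 0}'),
    ("U", "2021-02-14 11:30:00", "000123", '{"NAME": "ACME", "BALANCE": 950.25}'),
    ("U", "2024-05-01 14:03:22", "000123", '{"NAME": "ACME INDUSTRIES", "BALANCE": 1042.17}'),
]

print(DELTA_TABLE_DDL)
for row in example_history:
    print(row)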

Publishing both bulk-load and CDC data to a reliable and scalable framework like Kafka allows you to maintain a broad array of options to ultimately feed your legacy data to any number of JSON-friendly ETL tools, target data stores, and data analytics packages (some of which have not even been invented yet!). 

The “build vs buy” question is put to rest…

The Snowflake-proprietary target DDL/metadata/resources that TDT automatically produces for staging data in Snowflake are complex enough that the “buy” option is easy to justify in the “build vs. buy” conversations customers have. A decision by an enterprise not to use TDT, but instead to build its own Kafka-to-Snowflake solution, could result in any or all of the following:

  • accumulation of technical debt
  • extensive/unpredictable time to production
  • ongoing resource planning to maintain home-grown technologies
  • potential vendor lock-in for maintenance of custom-made technologies designed and developed by consultants
  • managing a mix of manual and automated functions
  • tracking cobbled-together components created by multiple staff and consultants
  • limited agility for future customization and innovation
  • problems adhering to evolving best practices over time
  • higher costs for future growth/scaling
  • potential lack of proper security/ongoing security updates
  • your organization has now become an enterprise software development company, whether or not you intended it, and whether or not you realize it!

Simply put, TDT is a self-contained, turn-key solution that can eliminate months, or years, of research and development time and costs. With TDT, high-speed and massive data movement to Snowflake takes minutes to ramp up.

Download the TDT AWS Partner Solution Brief to share with your team…

DOWNLOAD…AWS_TDT_Product_Brief_Thumb01

Treehouse Dataflow Toolkit (TDT) is Copyright © 2024 Treehouse Software, Inc. All rights reserved.

____Treehouse_AWS_Badges 

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

So, You’ve Managed to Start Streaming Your Legacy Data into Kafka Pipelines… Now What?

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

Treehouse_Dataflow_Toolkit_Splash

Treehouse Software is helping customers modernize their valuable enterprise data on Cloud and Hybrid Cloud environments without disrupting the existing critical work on their legacy systems. However, a new strategic imperative has been added to the modernization game—the requirement to utilize today’s advanced Analytics/AI/ML-friendly platforms, such as Amazon Redshift, Snowflake, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL, where an ever-expanding array of AI/ML tools are available to generate vital insights from the customer’s data. Many of these customers are already using software tools provided by Treehouse or other vendors to replicate their data into various target data stores, but, more crucially, also into Kafka pipelines (i.e., Amazon MSK, Confluent, etc.). Kafka is now the top choice for high-speed streaming of massive volumes of mission-critical data, providing stable performance under extreme loads. This is especially valuable for enterprises that require up-to-the-second data delivery for use cases that include e-commerce, financial services, logistics, telecommunications, and government IT.

Traditionally, Treehouse customers utilized our data replication technologies to load legacy data into Kafka pipelines, and that was where our involvement generally ended…

____0_Traditional_Mainframe_To_Kafka

However, once Kafka is designated as a target in the customer’s architecture, we are increasingly asked two questions: “What now?” and “What is the best mechanism for us to rapidly transfer data from Kafka to advanced analytics platforms?” Our answer: Look no further than Treehouse Software!

Treehouse Software brings a state-of-the-art, fully automated offering for data transfer from Kafka pipes to Analytics/ML/AI frameworks: the Treehouse Dataflow Toolkit (TDT).  TDT is a set of proprietary microservices that assures highly-available, auto-scalable, and event-driven data transfers to your data science teams’ favorite analytics frameworks, all the while adhering to AWS’s and Snowflake’s recommended best practices for massive data loading, thus assuring shortest and surest loads. Additionally, TDT provides a frictionless and instant implementation, accelerating your path to deep data insights for optimizing business processes.

Why do AWS’s and Snowflake’s best practices recommend against using ODBC?

Your data science teams need large quantities of the very latest data in near-real-time, and ODBC doesn’t really do the job, offering only single-threaded, difficult-to-scale pipes. By contrast, TDT’s approach not only keeps things up to date faster than any conceivable ODBC-based solution, but the “delta tables” into which it loads data also inherently retain the entire history of source data ever since the source-to-target synchronization began (perfect for time-based trend/predictive/prescriptive analytics).  To load massive quantities of data to a target, TDT uses the target vendors’ (massively scalable) bulk load utilities—not ODBC. It’s vital to note that Snowflake and Redshift are NOT relational (OLTP) databases, so doing CDC transfers to these targets via ODBC (with update, insert, and delete transactions) goes directly against “best practices” advice from the vendors, and would almost assuredly result in unwieldy bottlenecks.

What if my data is not on a mainframe?

No worries. Treehouse Software’s messaging is primarily mainframe-centric, since that has been our area of expertise and bread-and-butter for over 40 years. However, data movement is data movement, and if your mainframe, or non-mainframe, data is being pumped to a Kafka pipeline, TDT will take it from there. When a data replication tool publishes both bulk-load and CDC data in JSON format to a reliable and scalable framework like Kafka, it sets the stage for TDT to feed legacy data to any number of JSON-friendly ETL tools, target data stores, and the latest (or yet to be invented) data analytics packages. TDT is the turn-key solution for the easiest and fastest implementation of Kafka data transfer…

Treehouse_Dataflow_Toolkit03
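To make the JSON part of this tangible, here is a purely illustrative example of the kind of CDC message a replication tool might publish to a Kafka topic. The field names and structure are hypothetical and do not represent the exact format of any particular product.

import json

cdc_message = {
    "source": {"system": "ADABAS", "file": "CUSTOMER"},   # where the change originated
    "operation": "UPDATE",                                 # INSERT / UPDATE / DELETE
    "captured_at": "2024-05-01T14:03:22.117Z",             # capture timestamp
    "key": {"CUSTOMER_ID": "000123"},
    "after": {                                             # record image after the change
        "CUSTOMER_ID": "000123",
        "NAME": "ACME INDUSTRIES",
        "BALANCE": 1042.17,
    },
}

# Serialized to UTF-8 JSON, a record like this travels through the Kafka pipeline
# and can land, unchanged, in an S3 bucket or a VARIANT column downstream.
print(json.dumps(cdc_message, indent=2))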

TDT allows you to quickly ramp up your data analytics game by providing a rapid flow of data fresh off your enterprise data systems.

Download: TDT AWS Partner Solution Brief to share with your team…

DOWNLOAD…AWS_TDT_Product_Brief_Thumb01

Treehouse Dataflow Toolkit (TDT) is Copyright © 2024 Treehouse Software, Inc. All rights reserved.


____Treehouse_AWS_Badges 

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Quick Read: AWS Partner Solution Brief – Treehouse Dataflow Toolkit

by Joseph Brady, Director of Business Development at Treehouse Software, Inc.

____0_TDT_Generic01

Treehouse Software and AWS are collaborating on several AWS-centric initiatives in the coming months. The focus of these efforts is to market our new Treehouse Dataflow Toolkit (TDT), a set of microservices that provides the turn-key solution for transferring data from Kafka into advanced Analytics/AI/ML-friendly targets, such as Amazon Redshift, Snowflake, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL. We have worked with an AWS Marketing Manager to create the following TDT AWS Partner Solution Brief downloadable PDF that provides a one-minute overview of TDT, its benefits, and resource links for your team…

DOWNLOAD…AWS_TDT_Product_Brief_Thumb01

Treehouse Dataflow Toolkit (TDT) is Copyright © 2024 Treehouse Software, Inc. All rights reserved.


____Treehouse_AWS_Badges 

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Just what is the new Treehouse Dataflow Toolkit, and why is it the perfect tool for transferring mainframe data to Cloud-based data analytics and AI/ML frameworks?

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

____0_TDT_Splash

Introducing Treehouse Dataflow Toolkit…

Many enterprise customers and Cloud platform partners have been coming to Treehouse Software seeking the know-how and technology that enables state-of-the-art transfer of mainframe data to advanced analytics and ML/AI frameworks.  In response to this demand, we have designed the Treehouse Dataflow Toolkit (TDT), a set of proprietary microservices that assures highly-available, auto-scalable, and event-driven data transfers to your data science teams’ favorite analytics frameworks.

These customers either already have, or are in the process of acquiring, software tools that replicate their data into Kafka pipelines (i.e., Amazon MSK, Confluent, etc.). Our new and innovative offering, TDT, provides the turn-key solution for getting this data from Kafka into advanced Analytics/AI/ML-friendly targets, such as Amazon Redshift, Snowflake, Amazon Athena/S3, Amazon S3 Express One Zone Buckets, as well as Amazon Aurora PostgreSQL, all the while adhering to AWS’s and Snowflake’s recommended best practices for massive data loading, thus assuring shortest and surest loads.

Market snapshot… 

For years, Snowflake and Redshift have provided “old school” data analytics functionality, and now both are ramping up their support for ML and GenAI.  They are generating the demand (and are doing a good job of it!).

As we’ve been hearing from our customers, it is not a question of getting their data to either Snowflake OR PostgreSQL OR Redshift; they want to get it to ALL OF THEM!  Each target environment has its own business justifications and reasoning.  Many sites will want to do exactly that: send data not only to various RDBMS targets, but also to various data analytics targets.  The justification for TDT lies in a customer’s desire to ramp up its data analytics game quickly and easily, with data fresh off the mainframe, and to achieve business goals and results faster and at a much lower cost than building a solution themselves.

How does TDT Work?

When a mainframe data replication tool (provided by one of Treehouse’s partners) publishes both bulk-load and CDC data in JSON format to a reliable and scalable framework like Kafka, it sets the stage for TDT to feed legacy data from Kafka to any number of JSON-friendly ETL tools, target datastores, and data analytics packages (some of which may not even have been invented yet!).

____0_TDT_Generic

  1. We start at the source – the mainframe – where an agent (with a very small footprint) extracts data (in the context of either bulk-load or CDC processing).
  2. The raw data is securely passed from the mainframe by one of our partners’ data replication tools, which transforms the data into Unicode/JSON and publishes the results to a Kafka topic (in our example above, a topic in an Amazon MSK cluster).
  3. TDT microservices consume the data from MSK/Kafka and land it in S3 buckets, where TDT’s proprietary crawler technology automatically prepares landing tables, views, and additional infrastructure for various analytics-friendly targets.  Then the mainframe data is loaded into Redshift, Snowflake, S3, or PostgreSQL (all the while adhering to AWS’s and Snowflake’s recommended “best practices” for massive data loading, thus assuring the shortest and surest loads).  The inherent reliability and scalability of the entire pipeline infrastructure assures near-real-time synchronization between mainframe sources and the target tables, even with huge bulk loads or transaction-heavy CDC processing.  (A simplified sketch of the consume-and-land step follows this list.)
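The following is a simplified sketch (assumptions noted in the comments, not TDT’s actual code) of the consume-and-land portion of step 3: an AWS Lambda function triggered by an Amazon MSK event source reads a batch of Kafka records, whose values arrive base64-encoded, and writes them as a JSON-lines object to S3 for a downstream bulk load. The bucket name and key layout are hypothetical.

import base64
import time
import boto3

s3 = boto3.client("s3")
LANDING_BUCKET = "tdt-landing-bucket"   # hypothetical bucket name

def handler(event, context):
    lines = []
    # Lambda's MSK event groups records by "topic-partition".
    for topic_partition, records in event.get("records", {}).items():
        for record in records:
            payload = base64.b64decode(record["value"]).decode("utf-8")
            lines.append(payload)

    if lines:
        key = f"landing/cdc/{int(time.time() * 1000)}.json"
        s3.put_object(
            Bucket=LANDING_BUCKET,
            Key=key,
            Body="\n".join(lines).encode("utf-8"),
        )
    return {"landed_records": len(lines)}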

History is enterprise GOLD…

TDT not only keeps things up to date faster than any conceivable ODBC-based solution, but the “delta tables” into which it loads data also inherently retain the entire history of source data ever since mainframe-to-target synchronization began.  So, for example, after TDT has been syncing a target table for 5 years, a data scientist now has 5 years’ worth of historical data to work with for trend analysis, predictive analytics, prescriptive analytics, ML, etc.
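As a simple illustration of what that history makes possible, the following hedged sketch (hypothetical table and column names, assuming the snowflake-connector-python package) reconstructs a month-by-month trend from five years of accumulated changes in a delta table.

import os
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="analyst",
    password=os.environ["SNOWFLAKE_PWD"],
    warehouse="ANALYTICS_WH",
    database="LEGACY_DATA",
    schema="STAGING",
)

# Every captured change is a row, so five years of history is one GROUP BY away.
for row in conn.cursor().execute("""
    SELECT DATE_TRUNC('month', CHANGE_TS)   AS month,
           COUNT(*)                         AS changes_captured,
           AVG(PAYLOAD:BALANCE::FLOAT)      AS avg_balance
    FROM   CUSTOMER_DELTA
    WHERE  CHANGE_TS >= DATEADD(year, -5, CURRENT_TIMESTAMP())
    GROUP BY 1
    ORDER BY 1
"""):
    print(row)
conn.close()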

…but you also need the very latest data in near-real-time.

While TDT’s unique “delta-tables” approach offers comprehensive “history” for advanced analytics, the traditional need for up-to-the-second, current snapshots of mainframe datastores is also fully met.  Adhering once again to target vendors’ “best practices”, TDT provides self-materializing views for working with current data, not only in the JSON format in which it is stored, but also in fully-structured views that provide the more traditional look and feel of a SQL database.
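The two kinds of views described above might look something like the following sketch (hypothetical names and columns, not TDT’s generated DDL): a structured view that flattens the JSON payload into ordinary columns, and a current-snapshot view that keeps only the latest captured image of each source record.

import os
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="tdt_loader",
    password=os.environ["SNOWFLAKE_PWD"],
    warehouse="LOAD_WH",
    database="LEGACY_DATA",
    schema="STAGING",
)
cur = conn.cursor()

# Structured view: gives the JSON payload the look and feel of a SQL table.
cur.execute("""
    CREATE OR REPLACE VIEW CUSTOMER_STRUCTURED AS
    SELECT SOURCE_KEY,
           PAYLOAD:NAME::STRING    AS NAME,
           PAYLOAD:BALANCE::FLOAT  AS BALANCE,
           CHANGE_OP,
           CHANGE_TS
    FROM   CUSTOMER_DELTA
""")

# Current-snapshot view: the most recent captured image of each source record.
cur.execute("""
    CREATE OR REPLACE VIEW CUSTOMER_CURRENT AS
    SELECT *
    FROM   CUSTOMER_STRUCTURED
    QUALIFY ROW_NUMBER() OVER (PARTITION BY SOURCE_KEY ORDER BY CHANGE_TS DESC) = 1
""")
conn.close()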

Competitive differentiators between TDT and the “connectors”

  • TDT provides massive scalability, thanks to the AWS Lambda infrastructure.
  • TDT’s delta-table approach means unbeatable throughput (everything is just an INSERT, and it’s all going through the target vendors’ “best-practices” bulk-load utilities).
  • TDT’s advanced crawler automatically provides JSON-manipulating VIEWs (often awkward to develop in a SQL context) and other target infrastructure.
  • TDT adheres to AWS’s and Snowflake’s recommended best practices for connectivity.
  • Other data replication tools that attempt to target Redshift and Snowflake use only generic ODBC connections for data transmission.
    • To load massive quantities of data to a target, TDT uses the target vendors’ (massively scalable) bulk load utilities—not ODBC. (Transaction-based ODBC transmissions afford a single, inherently difficult-to-scale pipe.)
    • Snowflake and Redshift are NOT relational (OLTP) databases, so doing CDC transfers to these targets via ODBC (with update, insert, delete transactions) goes directly against “best practices” advice from the vendors, and will almost assuredly result in unwieldy bottlenecks.
    • For Snowflake’s bulk-load functions to work, additional Snowflake-proprietary objects (beyond just target tables and views) must be created; TDT’s crawler (DDL generator) function for Snowflake automatically generates statements to create these unique objects, along with the standard “create table” and “create view” statements.  (A hypothetical sketch of this kind of DDL appears after this list.)
  • Loading hierarchical data in JSON format (to JSON-friendly environments like Snowflake, Athena/S3, Redshift, and PostgreSQL) is the best methodology for many situations, because it avoids having to split hierarchies out into parent/child/grandchild tables, which subsequently have to be pulled back together via cumbersome SQL queries before the data can be effectively worked with.  NOTE that one of our customers became so frustrated with working with “split apart” parent/child/grandchild structures in PostgreSQL that they asked for the ability to send their hierarchical data in JSON format TO POSTGRESQL (hence our recent addition of TDT support for PostgreSQL as a target).
  • For users who still want to work with data in structured parent/child/grandchild format (yes, many people still may be reluctant to work with JSON in the context of SQL queries), TDT’s crawler (DDL generator) functions provide user-views that exactly emulate those old-school parent/child/grandchild structures.
  • A production environment with TDT can be up and running in 2-4 weeks.
  • TDT’s SaaS model advantages include: ease of implementation, shorter time to move into production, reliable uptime, instantaneous upgrades, pay-as-you-go billing based on usage metrics, and ease of integration with other SaaS offerings.
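To give a feel for the Snowflake-proprietary objects mentioned in the list above, here is a hypothetical sketch of the kinds of statements a DDL generator might emit in addition to the table and view DDL: a named file format, an external stage over the S3 landing area, and a pipe for automatic bulk ingestion. Every name, the S3 URL, and the storage integration are placeholders; this is illustrative only and is not the DDL that TDT’s crawler actually produces.

GENERATED_DDL = [
    # A named file format for the JSON documents arriving from the Kafka pipeline.
    """CREATE FILE FORMAT IF NOT EXISTS TDT_JSON_FORMAT TYPE = 'JSON'""",

    # An external stage pointing at the S3 landing area.
    """CREATE STAGE IF NOT EXISTS S3_LANDING_STAGE
         URL = 's3://tdt-landing-bucket/landing/cdc/'
         STORAGE_INTEGRATION = TDT_S3_INTEGRATION
         FILE_FORMAT = (FORMAT_NAME = 'TDT_JSON_FORMAT')""",

    # A pipe so newly landed files are bulk-loaded automatically.
    """CREATE PIPE IF NOT EXISTS CUSTOMER_DELTA_PIPE AUTO_INGEST = TRUE AS
         COPY INTO CUSTOMER_DELTA
         FROM @S3_LANDING_STAGE
         FILE_FORMAT = (FORMAT_NAME = 'TDT_JSON_FORMAT')""",
]

for ddl in GENERATED_DDL:
    print(ddl + ";\n")   # in practice these statements would be executed against Snowflake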

Treehouse Dataflow Toolkit (TDT) is Copyright © 2023 Treehouse Software, Inc. All rights reserved.


__tsi_logo_400x200

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Treehouse Software Provides a Fast Path for Mainframe Data to Microsoft Azure’s Data Services

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

Customers who want to modernize mainframe data by leveraging Microsoft Azure without disrupting existing critical work on their legacy systems are finding Rocket Data Replicate and Sync (RDRS) from Treehouse Software to be the ideal solution.  In addition to replicating data to a variety of Azure database targets, RDRS can stream data (in near-real-time) directly to Event Hubs (in JSON, CSV, or Avro formats), from which customers can either directly consume the data using their own microservices, or transfer the data to Azure Streaming Analytics, which then automatically feeds it to Azure Data Lake Storage, or Azure Cosmos DB, as seen in this high-level overview…

____0_Mainframe_To_Azure
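For customers who choose the “directly consume the data using their own microservices” option, a minimal consumer might look like the sketch below, which assumes the azure-eventhub Python SDK; the connection string, hub name, and consumer group are hypothetical placeholders.

import os
from azure.eventhub import EventHubConsumerClient

def on_event(partition_context, event):
    # Each event body is a JSON, CSV, or Avro payload published by RDRS.
    print(f"partition {partition_context.partition_id}: {event.body_as_str()}")

client = EventHubConsumerClient.from_connection_string(
    conn_str=os.environ["EVENTHUB_CONN_STR"],   # hypothetical environment variable
    consumer_group="$Default",
    eventhub_name="mainframe-cdc",              # hypothetical Event Hub name
)

with client:
    # Read from the beginning of each partition; blocks until interrupted.
    client.receive(on_event=on_event, starting_position="-1")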

RDRS focuses on changed data capture (CDC) when transferring information between mainframe data sources and Cloud targets. Through an innovative technology, changes occurring in any mainframe application data are tracked and captured, and then published to a variety of RDBMS and other targets.

RDRS utilizes a Windows-based GUI Dashboard, which is ideal for non-mainframe programmers. The RDRS Dashboard acts as a single point of administration, data modeling and mapping, script generation, and monitoring. Comprehensive monitoring and logging of all data movements ensure transparency across all data exchange processes.


__tsi_logo_400x200

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

Treehouse Software and Confluent offer High-Speed Mainframe Dataflow for Cloud-based Advanced Analytics

by Joseph Brady, Director of Business Development at Treehouse Software, Inc.; Dan Vimont, Director of Innovation at Treehouse Software, Inc.; and Ram Dhakne, Staff Solutions Engineer at Confluent

____0_Treehouse_and_Confluent01

The message from our customers is clear: they want to modernize mainframe data on Cloud and Hybrid Cloud environments without disrupting the existing critical work on their legacy systems. They also want to tap into today’s advanced data analytics platforms such as Amazon Redshift, Snowflake, and Amazon Athena/S3, where an ever-expanding array of machine learning and artificial intelligence (ML/AI) tools are available to generate vital insights from their enterprise’s data.  Your data science teams are eagerly awaiting the arrival of critical data from your mainframes to supercharge their predictive analytics and generative AI frameworks.

Treehouse Software and Confluent: Two companies providing a reliable and scalable solution…

Confluent Cloud Data Streaming Service

As stated on the Confluent website, “Your team has better things to do than fight Kafka fires.” That is why Confluent Cloud was built as a 10x better, fully managed, and truly Cloud-native service for Apache Kafka, powered by the Kora engine. Customers can take data streaming to the next level—sans the Kafka management and operational woes.

Confluent Cloud offers enhanced productivity, improved scalability, minimized downtime, and much more—all while reducing total cost of ownership. Confluent Cloud offers:

  • Elastic scaling: Scale up and down quickly to meet fluctuating customer demand, without the ops burden that comes with scaling your data infrastructure
  • Infinite Storage: Enable powerful use cases by never having to worry about Kafka retention limits again, while only paying for the storage used
  • Built-in Resiliency: Ensure high availability and offload Kafka ops with 99.99% uptime SLA, multi-AZ clusters, and no-touch Kafka patches

Treehouse Software Mainframe CDC Data Replication

Enterprise customers have come to Treehouse Software, because the company brings not only proven mainframe data replication tools, but also deep subject matter expertise in mainframe technologies, as well as the know-how to target relevant offerings especially designed for ingesting data for advanced analytics and ML/AI.

The Rocket Data Replicate and Sync (formerly tcVISION) solution from Treehouse allows customers’ legacy mainframe environment to operate normally while replicating data on Cloud and Hybrid Cloud environments. The technology focuses on changed data capture (CDC) when transferring information between mainframe data sources and Cloud-based databases and applications. Through an innovative set of technologies, changes occurring in any mainframe datastore are tracked and captured, and ultimately published to various Cloud targets. Additionally, the Treehouse Dataflow Toolkit (TDT) set of microservices greatly enhances the architecture’s connectivity to high performance, non-relational, massive parallel processing datastores (Amazon Redshift, Snowflake, Amazon Athena/S3) that are primed to supply the most advanced ML/AI tools to data science teams.

Figure 1: In the longer-term picture, an enterprise can now keep its options open by propagating data to the highly reliable, very scalable Confluent Cloud that can be “subscribed to” by any number of current or yet-to-be-invented ETL toolsets and target datastores.

____0_Confluent01

How does it work?

  1. We start at the source – the mainframe – where an agent (with a very small footprint) extracts data (in the context of either bulk-load or CDC processing).
  2. The raw data is securely passed from the mainframe to Rocket Data Replicate and Sync (RDRS) which speedily transforms mainframe-formatted data into Unicode/JSON and publishes the results to a Kafka topic in Confluent Cloud.
  3. The Treehouse Dataflow Toolkit functions consume the data from Confluent and land it in S3 buckets, where Treehouse’s proprietary crawler technology automatically prepares landing tables, views, and additional infrastructure for various analytics-friendly targets.  Then the mainframe data is loaded into Redshift, Snowflake, or S3 (all the while adhering to AWS’s and Snowflake’s recommended “best practices” for massive data loading, thus assuring the shortest and surest loads).  The inherent reliability and scalability of the entire pipeline infrastructure assures near-real-time synchronization between mainframe sources and the target tables.  (A minimal consumer sketch follows this list.)
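As a minimal illustration of the consumption side of step 3, the sketch below reads the RDRS-published JSON records from a Kafka topic in Confluent Cloud using the confluent-kafka Python client. TDT performs this role with managed microservices; the credentials and topic name here are hypothetical.

import os
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": os.environ["CONFLUENT_BOOTSTRAP"],   # Confluent Cloud endpoint
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": os.environ["CONFLUENT_API_KEY"],
    "sasl.password": os.environ["CONFLUENT_API_SECRET"],
    "group.id": "tdt-demo-consumer",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["mainframe.cdc"])   # hypothetical topic name

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        # Each message value is a UTF-8 JSON document, ready to be landed in S3.
        print(msg.value().decode("utf-8"))
finally:
    consumer.close()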

The very latest data—delivered!

Figure 2: RDRS, Confluent, and TDT work in tandem to easily replicate mainframe data and create target Snowflake resources for a wide variety of end use.

____0_Confluent02

Figure 3: TDT adheres to Snowflake’s recommended “best practices” for bulk loading of mainframe data by using Snowflake’s COPY function to load data from S3.

____0_Confluent03c

This Treehouse/Confluent framework allows data in staging tables to be constantly accruing the most current data, ideally suited for data scientists looking to do trend analysis, predictive analytics, ML, and AI work.  For business analysts and others who prefer structured data representations of potentially complex hierarchical data, this framework also automatically provides structured user-views, providing the look and feel of a SQL database.


__tsi_logo_400x200

Contact Treehouse Software for a Demo Today!

Contact Treehouse Software today for more information or to schedule a product demonstration.

A Treehouse Software Proof of Concept is the low-risk approach to testing mainframe data replication on Cloud and Hybrid Cloud environments

by Joseph Brady, Director of Business Development / Cloud Alliance Leader at Treehouse Software, Inc.

____0_Mainframe_To_Cloud

Many Treehouse Software customers have discovered the value of saving weeks or months in their mainframe modernization initiatives by engaging in a Rocket Data Replicate and Sync (RDRS) Proof of Concept (POC) for Mainframe-to-Cloud data replication. Depending on the complexity of the customer’s project, an RDRS POC can take as little as 10 business days after the product is installed and all connectivity is set up between the mainframe and Cloud environments.

How does it work?

  1. Treehouse Software provides documentation beforehand that outlines all of the requirements and agenda for the POC, and Treehouse technicians assist in downloading and installing RDRS.
  2. The customer provides a representative subset of z/OS or z/VSE mainframe data (e.g., Db2, Adabas, VSAM, IMS/DB, CA IDMS, CA DATACOM, etc.), use case, and goals for the POC, and the Treehouse team mentors the customer’s technical team via remote screen sharing sessions.
  3. The application is executed in the customer’s facilities, in a non-production environment, and a limited-scope implementation of RDRS is conducted to prove that the product meets the customer’s desired use case.

By the end of the POC, customers will have replicated mainframe data on their Cloud target, tested out product capabilities, and demonstrated a successful, repeatable data replication process, with documented results. After the POC, the customer has all the connectivity and processes in place to begin setting up the production phase of their mainframe data modernization project. The minimal cost and resource requirements make an RDRS POC a high-ROI step in the customer’s mainframe modernization journey.

About RDRS…

Many Cloud and Systems Integration partners are recommending RDRS for mainframe data modernization projects. RDRS focuses on changed data capture (CDC) when transferring information between mainframe data sources and Cloud targets. Through an innovative technology, changes occurring in any mainframe application data are tracked and captured, and then published to a variety of RDBMS and other targets.

RDRS utilizes a Windows-based GUI Control Board, which is ideal for non-mainframe programmers. While mainframe experts are required in the design/architecture phase during the POC and occasionally during implementation, the requirement for their involvement is limited. The RDRS Control Board acts as a single point of administration, data modeling and mapping, script generation, and monitoring. Comprehensive monitoring and logging of all data movements ensure transparency across all data exchange processes.

Additionally, once RDRS is up and running, the customer’s legacy mainframe environment can continue as long as needed, while they replicate data – in real time and bi-directionally – on the new Cloud platform. Now the enterprise can quickly take advantage of the latest Cloud services, such as advanced analytics, ML/AI, etc., as well as move data to a variety of highly available and secure databases and data stores.


__TSI_LOGO

Contact Treehouse Software Today…

Contact us to discuss how a Treehouse Software POC can accelerate your mainframe Cloud and hybrid Cloud data modernization journey.

Treehouse Software – 40 Years and Still Moving Forward (Part 1)

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software, Inc. 

__TSI_LOGO_40th_Transp

Introduction

Many readers know that Treehouse Software has been around since 1983, serving enterprises worldwide with industry-leading software products and outstanding technical support. However, this blog series will dig a little deeper into Treehouse Software’s origins and explore how founder and president George Szakach blazed a trail from being a systems programmer in the early ’60s to creating and growing his own software company from the early ’80s up to the present.

The beginnings… the 1960s: Moon Landing, Flower Power, the Righteous Brothers, and Punched Cards

George is a Vietnam-era veteran and started working with mainframes in 1960 while in the Army. 

After programming school in Fort Monmouth, NJ, George was assigned to Fort Huachuca, Arizona, where he wrote Army-related applications on the IBM 709.

Before leaving the army in 1963, George had many job offers.  Three years of programming experience was unheard-of back then, so his skillset was very valuable.  He was even offered a job by the president of Informatics, working in Houston at the NASA Johnson Space Center to “put a man on the moon.”  He declined.

Throughout the rest of the 60s, George worked at Burroughs, Univac, and Leasco. During the 70s through 1982, George worked for Ocean Data Systems, Data General, Optical Recognition Systems, Software AG (his longest stint – 7 years), and Superior Oil.

Punched card…

01_2023_Treehouse_Chronology_Blog_Punched_Card

IBM 709 Computer System…

IBM709

George’s archeological finds from his time at Univac…

____01_Chron_UNIVAC_Card

With all of this foundational mainframe experience combined with his skills honed at Software AG, the seeds were planted for George’s future: take those roots, move to the trees, and build a house… 

Coming soon… Part 2: Treehouse Software’s first generations.    


__TSI_LOGO_40th_Transp

About Treehouse Software

Since 1983, Treehouse Software has been serving enterprises worldwide with industry-leading mainframe software products and outstanding technical support. Today, Treehouse Software is a global leader in providing data replication and integration solutions for the most complex and demanding heterogeneous environments, as well as feature-rich, accelerated-ROI offerings for information delivery and application modernization.

Contact Treehouse Software

Treehouse Software Salutes Franco Harris

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software, Inc. 

With the recent passing of Pittsburgh Steelers great running back and Pro Football Hall of Famer Franco Harris, we would like to revisit April of 1993, when Treehouse Software held an international consultants’ symposium. The symposium brought together attendees and speakers from many consulting and technology companies and schools from around the world. Since Treehouse Software is located in the greater Pittsburgh area, company president George Szakach was acquainted with Franco and invited him to deliver a fascinating and entertaining address, in which he spoke about his career, several business ventures he was pursuing, and his budding interest in computer technology.

____01_Chron_Franco

Franco Harris at Treehouse Software’s Consultant’s Symposium (April 1993)

A few years ago, George reminded Franco about his visit to Treehouse back in 1993. He remembered and they shared some laughs and memories. 

We would also like to mention Franco’s well-known sense of community and accessibility in Pittsburgh. Many staff members have met Franco over the years and have fond memories of his friendliness and willingness to spend time engaging in conversations. Those who come into Pittsburgh International Airport can see a sculpture depicting Franco’s famous “Immaculate Reception” from 1972.  Thousands of people, especially recently, have taken selfies with the sculpture. Franco will be missed by his many friends and the community.

____01_Chron_Franco_Airport_Sculpt02

Franco Harris sculpture at Pittsburgh International Airport.


__001_TSI_LOGO

About Treehouse Software

Since 1983, Treehouse Software has been serving enterprises worldwide with industry-leading mainframe software products and outstanding technical support. Today, Treehouse Software is a global leader in providing data replication and integration solutions for the most complex and demanding heterogeneous environments, as well as feature-rich, accelerated-ROI offerings for information delivery and application modernization.

Contact Treehouse Software

Mainframe-to-Cloud Data Replication with tcVISION: Recommendations for Roadmapping Your Deployment on a Cloud Environment

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software

Mainframe_To_Cloud_Roadmap

Careful planning must occur for a Mainframe-to-Cloud data modernization project, including how a customer’s desired Cloud environment will look. This blog serves as a general guide for organizations planning to replicate their mainframe data on Cloud platforms using Treehouse Software‘s tcVISION.

A successful move to the Cloud requires a number of migration and post-migration considerations and solutions in order to modernize an application on the Cloud.  Some examples of these considerations and solutions include:

Personnel Resource Considerations

Staffing for Mainframe-to-Cloud data replication projects depends on the scale and requirements of your replication project (e.g., bi-directional data replication projects will require more staffing).  

Most customers deploy a data replication product with Windows- and Linux-knowledgeable staff at varying levels of seniority.  For the architecture and setup tasks, we recommend senior technical staff to deal with complex requirements around the mainframe, Cloud architecture, networking, security, complex data requirements, and high availability.  Less senior staff are effective for the more repeatable deployment tasks, such as mapping new database/file deployments.  Business staff and systems staff are rarely required, but can be necessary for more complex deployment tasks.  For example, bi-directional replication requires matching keys on both platforms, and their input might be needed.  Other activities include PII considerations, the specifics of data transformation, and data verification requirements.

An example of staffing for a very large deployment might be one part-time project manager, a part-time mainframe DBA/systems programmer, 1-2 staff to set up and deploy the environment, and an additional 1-2 staff to manage the existing replication processes.

Environment Considerations

As part of the architecture planning, your team needs to decide how many tiers of deployment are needed for your replication project.  Much as with applications, you may want Dev, QA, and Prod tiers.  For each of these tiers, you will need to decide the level of separation.  For example, you might combine Dev and QA, but not Prod.  Many customers will keep production as a distinct environment.  Each environment will have its own set of resources, including mainframe managers (possibly on separate LPARs), Cloud VMs (e.g., EC2) for replication processing, and managed Cloud RDBMSs (such as AWS RDS).

After the required QA testing, changes are deployed to the production environment.  Object promotion and test procedures should be detailed and documented, allowing less experienced personnel to work on some testing tasks.  Adherence to details, processes, and extended testing is most important when deploying bi-directional replication, due to the high impact of errors and the difficulty of remediation.

Rollout Planning

A data replication product is typically deployed using Agile methods with sprints.  This allows for incrementally realized business value.  The first phase is typically a planning/architecture phase, during which the technical architecture and deployment process are defined.  Files for replication are deployed in groups during sprint planning.  Initial sprint deployments might be low-value file replications to shield the business from any interruptions due to process issues.  Once the team is satisfied that the process is effective, replication is working correctly, and data is verified on the source and targets, wide-scale deployments can start.  The number of files to deploy in a sprint will depend on the customer’s requirements.  An example would be to deploy 20 mainframe files per 2–3-week sprint.  Technical personnel and business users need to work together to determine which files and what deployment order will have the greatest business benefit.

Security

For security, both on-premises and to the major Cloud environments, there are several considerations:

  • Data will be replicated between a source and a target, so the security of any PII data must be considered.  In addition, regulations and standards such as HIPAA, FIPS, etc. will govern specific security requirements.
  • The path of the data must be considered, whether it is a private path or whether the data traverses the internet. For example, when going from on-premises to the Cloud, the major Cloud providers offer a VPN option that encrypts data going over the internet.  More secure options are also available, such as AWS Direct Connect and Azure ExpressRoute.  With these options, the on-premises network is connected directly to the Cloud provider edge location via a telecom provider, and the data goes over a private route rather than the internet.
  • Additionally, Cloud services such as S3, Azure Blob Storage, and GCP buckets default to routing service connections over the internet. Creating a private endpoint (e.g., AWS PrivateLink) allows for a private network connection within the Cloud provider’s network.  Private connections that do not traverse the internet provide better security and privacy.
  • Protecting data at rest is important for both the source and target environments. The modern z/OS mainframe has advanced pervasive encryption capabilities: https://www.redbooks.ibm.com/redbooks/pdfs/sg248410.pdf.  The major Cloud providers all provide extensive at-rest encryption capabilities.  Turning on encryption for Cloud storage and databases is often just a parameter setting, and the Cloud provider takes care of the encryption, keys, and certificates automatically (see the short boto3 sketch after this list).
  • Protecting data in transit is equally important. There are often multiple transit points to encrypt and protect.  The first is the transit from the mainframe, on-premises, to the Cloud VM instance.  A mainframe data replication product should protect this hop by employing TLS 1.2, with keys and certificates on both the mainframe and the Cloud side.  The second is the hop from the Cloud VM to the Cloud target database or service.  Encryption here may be less important, since these services are often in a private environment; however, encryption can be enabled as required.
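As a small example of the “often just a parameter setting” point above, the following boto3 sketch enables default at-rest encryption (SSE-KMS) on an S3 landing bucket; the bucket name and KMS key alias are hypothetical.

import boto3

s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="tdt-landing-bucket",                     # hypothetical bucket
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/tdt-landing-key",   # hypothetical KMS key alias
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)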

High Availability

  • During CDC processing, high availability must be maintained in the Cloud environment. The data replication product should keep track of its processing position in two places.  The first can be a Restart file, which keeps track of the mainframe log position, the target processing position, and uncommitted transactions.  The second can be a container stored on Linux or Windows that holds committed, unprocessed transactions.  Both need to be on highly available storage, with a preference for storage that spans Availability Zones (AZs), such as Amazon Elastic File System (Amazon EFS) or Amazon FSx for Windows File Server.
  • The Amazon EC2 instance (or other Cloud instance) can be part of an Auto Scaling Group spread across AZs, with a minimum and maximum of one Amazon EC2 instance.
  • Upon failure, the replacement Amazon EC2 instance of the replication product’s administrator function is launched and communicates its IP address to the product’s mainframe administrator function. The mainframe then starts communication with the replacement Amazon EC2 instance.
  • Once the Amazon EC2 instance is restarted, it continues processing at the next logical restart point, using a combination of the LUW and Restart files.
  • For production workloads, Treehouse Software recommends turning on Multi-AZ target and metadata databases.

Scalable Storage

  • With the scalable storage provided on most Cloud platforms, the customer pays only for what is used. The data replication product should require file-based storage for its working files, which can grow in size if target processing stops for an unexpected reason.  For example, Amazon EFS and Amazon FSx provide elastic, fully managed file systems that let the customer share file data without provisioning or managing storage.

Analytics

  • All top Cloud platform providers give customers a broad and deep portfolio of purpose-built analytics services optimized for a wide range of analytics use cases. Cloud analytics services allow customers to analyze data on demand, and help streamline the business intelligence process of gathering, integrating, analyzing, and presenting insights to enhance business decision making.
  • A data replication product should replicate data to several targets that can easily be consumed by various Cloud-based analytics services. For example, mainframe database data can be replicated to the various Cloud “buckets” in JSON, CSV, or Avro format, which allows for consumption by the various Cloud analytics services.  Bucket types include AWS S3, Azure Blob Storage, Azure Data Lake Storage, and GCP Cloud Storage.  Several other analytics-oriented targets are also supported, including Kafka, Elasticsearch, Hadoop, and AWS Kinesis.
  • Kafka has become a common target and can serve as a central data repository. Most customers target Kafka using JSON-formatted replicated mainframe data.  Kafka can be installed on-premises, or a managed Kafka service can be used, such as Confluent Cloud, Amazon MSK, or Azure Event Hubs.

Monitoring

  • Monitoring is a critical part of any data replication process. There are several levels of monitoring at various points in a data replication project.  For example, each node of the replication including the mainframe, network communication, Cloud VM instances (such as EC2) and the target Cloud database service all can require a level of monitoring.  The monitoring process will also be different in development or QA vs. a full production deployment.
  • A data replication product should also have its own monitoring features. One important area to measure is performance, and it is important to determine where any performance bottleneck is located.  Sometimes it could be the mainframe process, the network, the transformation computation process, or the target database.  A performance monitor helps detect where the bottleneck is occurring so that the customer can drill down into specifics.  For example, if the bottleneck is the input data, areas to examine are the mainframe replication product component’s performance or the network connection.  The next step is to monitor the area where the bottleneck is occurring, using the data replication product’s statistics, mainframe monitoring tools, or Cloud monitoring such as AWS CloudWatch.
  • A data replication product should also allow the customer to monitor processing functions during the replication process. The data replication product should also have extensive logs and traces that allow for detailed monitoring of the data replication process, and produce detailed replication statistics that include a numeric breakdown of processing by table, type of operation (insert, update, delete), and where these operations occurred (mainframe or target database).
  • CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing customers with a unified view of AWS resources, applications, and services that run on AWS, and on-premises servers. You can use CloudWatch to set high resolution alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, discover insights to optimize your applications, and ensure they are running smoothly.
  • Some customers are satisfied with basic monitoring that polls every five minutes, while others need more detailed monitoring and can choose polls that occur every minute.
  • CloudWatch allows customers to record metrics for EC2 and other Amazon Cloud Services and display them in a graph on a monitoring dashboard. This provides visual notifications of what is going on, such as CPU per server, query time, number of transactions, and network usage.
  • Given the dynamic nature of AWS resources, proactive measures, including the dynamic re-sizing of infrastructure resources, can be automatically initiated. Amazon CloudWatch alarms can be sent to the customer, such as a warning that CPU usage is too high, and as a result, an auto scale trigger can be set up to launch another EC2 instance to address the load (see the sketch after this list). Additionally, customers can set alarms to recover, reboot, or shut down EC2 instances if something out of the ordinary happens.
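The alarm pattern described above can be set up with a few lines of boto3, as in the sketch below: a CloudWatch alarm fires when a replication EC2 instance’s CPU stays high and publishes to an SNS topic, which could in turn drive notification or scaling actions. The instance ID, topic ARN, and thresholds are hypothetical.

import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="tdt-replication-cpu-high",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical instance
    Statistic="Average",
    Period=300,                      # five-minute (basic monitoring) granularity
    EvaluationPeriods=3,
    Threshold=80.0,                  # percent CPU
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:tdt-ops-alerts"],   # hypothetical SNS topic
)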

Disaster Recovery

  • IT disasters such as data center failures, or cyber attacks can not only disrupt business, but also cause data loss, and impact revenue. Most Cloud platforms offer disaster recovery solutions that minimize downtime and data loss by providing extremely fast recovery of physical, virtual, and Cloud-based servers.
  • A disaster recovery solution must continuously replicate machines (including operating system, system state configuration, databases, applications, and files) into a low-cost staging area in a target Cloud account and preferred region.
  • Unlike snapshot-based solutions that update target locations at distinct, infrequent intervals, a Cloud based disaster recovery solution should provide continuous and asynchronous replication.
  • Consult with your Cloud platform provider to make sure you are adhering to their respective best practices.
  • Example: https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/introduction.html

Artificial Intelligence and Machine Learning

  • Many organizations lack the internal resources to support AI and machine learning initiatives, but fortunately the leading Cloud platforms offer broad sets of machine learning services that put machine learning in the hands of every developer and data scientist. For example, AWS offers SageMaker, GCP has AI Platform, and Microsoft Azure provides Azure AI.
  • Applications that are good candidates for AI or ML are those that need to determine and assign meaning to patterns (e.g., systems used in factories that govern product quality using image recognition and automation, or fraud detection programs in financial organizations that examine transaction data and patterns).

The list goes on…

  • Treehouse Software and our Cloud platform and migration partners can advise and assist customers in designing their roadmaps into the future, taking advantage of the most advanced technologies in the world.
  • Successful customer goals are top priority for all of us, and we can continue to work with our customers on a consulting basis even after they are in production.

Of course, each project will have unique environments, goals, and desired use cases. It is important that specific use cases are determined and documented prior to the start of a project and a tcVISION POC. This planning will allow the Treehouse Software team and the customer to develop a more accurate project timeline, have the required resources available, and realize a successful project.

Your Mainframe-to-Cloud Data Migration Partner…

Treehouse Software is a global technology company and Technology Partner with AWS, Google Cloud, and Microsoft. The company assists organizations with migrating critical workloads of mainframe data to the Cloud.

Further reading on tcVISION from AWS, Google Cloud, and Confluent:

More About tcVISION from Treehouse Software…

__Plans_To_Reality

tcVISION supports a vast array of integration scenarios throughout the enterprise, providing easy and fast data migration for mainframe application modernization projects. This innovative technology offers comprehensive abilities to identify and capture changes occurring in mainframe and relational databases, then publish the required information to an impressive variety of targets, both Cloud and on-premises.

tcVISION acquires data in bulk or via CDC methods from virtually any IBM mainframe data source (Software AG Adabas, IBM Db2, IBM VSAM, CA IDMS, CA Datacom, and sequential files), and transforms and delivers it to a wide array of Cloud and Open Systems targets, including AWS, Google Cloud, Microsoft Azure, Confluent, Kafka, PostgreSQL, MongoDB, etc. In addition, tcVISION can extract and replicate data from a variety of non-mainframe sources, including Adabas LUW, Oracle Database, Microsoft SQL Server, IBM Db2 LUW and Db2 BLU, IBM Informix, and PostgreSQL.


__TSI_LOGO

Contact Treehouse Software for a tcVISION Demo Today…

Simply fill out our tcVISION Demonstration Request Form and a Treehouse representative will be contacting you to set up a time for your requested demonstration.