Does your data science team want to accelerate insights and bring advanced ML/AI capabilities to your mainframe data with Amazon Redshift? Sure they do—and Treehouse Software enables that…

by Joseph Brady, Director of Business Development at Treehouse Software, Inc. and Dan Vimont, Director of Innovation at Treehouse Software, Inc.

We are beginning to see a welcome trend among Treehouse customers who are looking to modernize their valuable mainframe legacy data on the Cloud: they are including their data science teams in the important planning phase of architecting new Cloud environments and targets. This is especially vital for customers who want to incorporate advanced analytics and ML/AI in their strategic data usage plans on the Cloud. Who can contribute a better understanding of how the data will ultimately be used than your resident data scientists?


We have heard from many of these data scientists that a primary item on their “wish lists” is a fully managed, AI-powered, massively parallel processing (MPP) architecture to extract maximum value and insights. They specifically mention Amazon Redshift as the Cloud data warehouse (which is much more than a data warehouse) of choice for driving digitization across the enterprise, as well as for helping to personalize customer experiences. Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the highest performance at any scale. To this wish, we can answer with a resounding, “Yes, Treehouse Software has got you covered with Redshift connectivity!”

The Treehouse Software solution…

Enterprise customers have come to Treehouse Software because we bring not only proven mainframe data replication tools, but also deep subject matter expertise in mainframe technologies and the know-how to target relevant AWS offerings, such as Redshift and S3 (including S3 Express One Zone – see our recent blog on S3 Express One Zone).

The Rocket Data Replicate and Sync (RDRS) solution allows a customer’s legacy mainframe environment to operate normally while its data is replicated on AWS. The technology focuses on change data capture (CDC) when transferring information between mainframe data sources and Cloud-based databases and applications. Through an innovative set of technologies, changes occurring in any mainframe datastore are tracked, captured, and ultimately published to Redshift.

How does it work?

  1. We start at the source – the mainframe – where an agent (with a very small footprint) extracts data (in the context of either bulk-load or CDC processing).
  2. The raw data is securely passed from the mainframe to RDRS, which speedily transforms mainframe-formatted data into Unicode/JSON and publishes the results to a Kafka topic.
  3. Our efficient, autoscaling microservices take it from there. Treehouse Dataflow Toolkit functions consume the data from Kafka and land it in S3 buckets, where Treehouse’s proprietary crawler technology is used to automatically prepare landing tables, views, and additional infrastructure in Redshift. Then the mainframe data is loaded into Redshift (all the while adhering to AWS’ recommended “best practices” for massive data loading, thus assuring the shortest and surest loads). The inherent reliability and scalability of the entire pipeline infrastructure assure near-real-time synchronization between mainframe sources and Redshift target tables.
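To make step 2 a little more concrete, here is a minimal Python sketch of what turning a mainframe-formatted record into Unicode/JSON can involve. It is not RDRS code. The record layout (a 6-byte customer ID, a 10-byte name, and a 3-byte packed-decimal balance) and the names `unpack_comp3` and `record_to_json` are assumptions made up for illustration; the only real pieces are Python's built-in `cp037` EBCDIC codec and the standard packed-decimal (COMP-3) encoding.

```python
import json
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field into a Decimal.

    Each byte holds two 4-bit nibbles; the final nibble is the sign
    (0xD or 0xB means negative, anything else is treated as positive).
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()
    digits = "".join(str(n) for n in nibbles)
    value = Decimal(int(digits)).scaleb(-scale)
    return -value if sign_nibble in (0x0B, 0x0D) else value


def record_to_json(record: bytes) -> str:
    """Convert one fixed-width EBCDIC record to a JSON document.

    Hypothetical layout: bytes 0-5 customer ID, bytes 6-15 name,
    bytes 16-18 balance as COMP-3 with two decimal places.
    """
    doc = {
        "cust_id": record[0:6].decode("cp037").strip(),
        "name": record[6:16].decode("cp037").strip(),
        # Keep the exact decimal value by emitting it as a string.
        "balance": str(unpack_comp3(record[16:19], scale=2)),
    }
    return json.dumps(doc)


# A sample record built by hand: EBCDIC text plus packed decimal -123.45.
SAMPLE = (
    "000042".encode("cp037")
    + "ADA".ljust(10).encode("cp037")
    + bytes([0x12, 0x34, 0x5D])
)
print(record_to_json(SAMPLE))
```

A JSON document like this is the kind of payload that would then be published to a Kafka topic for the downstream steps to consume.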

Redshift tables and views: something for everybody

Within this framework, the Redshift staging tables (often referred to as “delta tables”) are constantly accruing historical data, ideally suited for data scientists looking to do trend analysis, predictive analytics, ML, and AI work.  For business analysts and others who prefer structured data representations of potentially complex hierarchical data, the Treehouse framework also automatically provides structured user-views, providing the look and feel of a SQL database.
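As a rough illustration of what a structured user-view does with hierarchical data, the Python sketch below flattens a nested JSON document into SQL-style rows, one row per element of a repeating group. This is only a sketch of the general idea, not the logic Treehouse uses to generate its views; the function names `flatten` and `explode` and the sample field names are hypothetical.

```python
from typing import Any


def flatten(doc: dict, prefix: str = "") -> dict:
    """Collapse nested objects into a single level of column-like keys."""
    flat: dict = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def explode(doc: dict, group: str) -> list:
    """Return one flat row per element of the repeating group `group`."""
    parent = flatten({k: v for k, v in doc.items() if k != group})
    return [
        {**parent, **flatten(item, prefix=f"{group}_")}
        for item in doc.get(group, [])
    ]


# Hypothetical customer record with a nested group and a repeating group.
customer: dict = {
    "cust_id": "000042",
    "address": {"city": "Reston", "state": "VA"},
    "phones": [
        {"type": "home", "number": "555-0100"},
        {"type": "work", "number": "555-0199"},
    ],
}

for row in explode(customer, "phones"):
    print(row)
```

Each resulting row has plain columns such as `address_city` and `phones_number`, which is the "look and feel of a SQL database" that business analysts tend to prefer, while the untouched delta rows remain available for historical analysis.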

…as innovations move faster along the timeline, keep your options open!

Publishing both bulk-load and CDC data to a reliable and scalable framework like Kafka allows you to maintain a broad array of options to ultimately feed your legacy data to any number of JSON-friendly ETL tools, target datastores, and data analytics packages (some of which may not even have been invented yet!).  In addition to Redshift, the Treehouse Dataflow Toolkit also currently targets Snowflake, Amazon DynamoDB, and Amazon Athena/S3.

Video – Introduction to Data Warehousing on AWS with Amazon Redshift…



Contact Treehouse Software today to discuss your project, or to schedule a demo of our Mainframe-to-AWS real-time and bi-directional data replication solution. 

Treetip: Treehouse Software can help enterprise mainframe customers accelerate their data analytics, machine learning, and AI journeys by targeting the new Amazon S3 Express One Zone

by Joseph Brady, Director of Business Development and Cloud Alliance Leader at Treehouse Software, Inc.

Treehouse Software specializes in helping enterprise customers with Mainframe-to-Cloud, Multi-Cloud, and Hybrid Cloud data modernization projects. Many times, our customers discuss not only strategies for replicating their mainframe data, but also their plans for what they want to do with that data on the Cloud side. This makes it important for our team to stay current on the latest Cloud offerings that can benefit our customers’ enterprise modernization planning. Consequently, a very exciting announcement caught our attention during the 2023 AWS re:Invent conference: the general availability of a new type of S3 storage service referred to as the Amazon S3 Express One Zone storage class.

For those unfamiliar, Amazon S3 (“Simple Storage Service”) is the basic file storage service of AWS, and as such it forms a foundational pillar of the entire AWS world. Amazon S3 Express One Zone uses a new type of S3 bucket called a “directory bucket”, which is purpose-built to deliver consistent, single-digit millisecond data access for an enterprise’s most frequently used data and latency-sensitive applications. The new S3 directory buckets allow customers to store data in a single Availability Zone (AZ) that they specifically select, as opposed to the default of three AZs for standard S3. This eliminates the latency associated with spreading data across multiple AZs, providing applications with lower-latency storage. S3 directory buckets also follow a different request scaling model compared to traditional buckets, and their authentication is based on sessions rather than on a per-request basis. Bottom line: less compute time spent waiting on storage means lower cost.
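One visible consequence of choosing a single AZ is that the zone becomes part of the bucket's name: directory bucket names take the form `base-name--az-id--x-s3`. The short Python sketch below builds and parses names in that form. The helper names are hypothetical, the base name and AZ ID are sample values, and the checks are deliberately loose, so treat it as an illustration rather than a full validator.

```python
import re

# Directory bucket names end in "--<az-id>--x-s3", e.g. "--usw2-az1--x-s3".
_DIRECTORY_BUCKET = re.compile(
    r"(?P<base>[a-z0-9][a-z0-9-]*)--(?P<az_id>[a-z0-9]+-az\d+)--x-s3"
)


def directory_bucket_name(base: str, az_id: str) -> str:
    """Build a directory bucket name from a base name and an AZ ID."""
    name = f"{base}--{az_id}--x-s3"
    if not _DIRECTORY_BUCKET.fullmatch(name):
        raise ValueError(f"not a well-formed directory bucket name: {name}")
    return name


def split_directory_bucket_name(name: str) -> tuple:
    """Return (base, az_id) for a directory bucket name."""
    match = _DIRECTORY_BUCKET.fullmatch(name)
    if match is None:
        raise ValueError(f"not a directory bucket name: {name}")
    return match.group("base"), match.group("az_id")


# Sample values only; pick the AZ where your compute actually runs.
bucket = directory_bucket_name("mainframe-landing", "usw2-az1")
print(bucket)
print(split_directory_bucket_name(bucket))
```

Because the AZ is fixed at creation time, it is worth choosing the same zone as the compute that will read the data, since co-location is where the latency benefit comes from.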

S3 Express One Zone is ideally suited for services such as Amazon SageMaker Model Training, Amazon Athena, Amazon EMR, and AWS Glue Data Catalog to accelerate Machine Learning (ML) and interactive analytics workloads. With S3 Express One Zone, storage automatically scales up or down based on consumption and need, and customers no longer need to manage multiple storage systems for low-latency workloads.

So, why is S3 Express One Zone important to Treehouse mainframe modernization customers?


Amazon S3 Express One Zone just made the Amazon S3 targeting in the Treehouse Dataflow Toolkit (TDT) potentially much more potent and valuable to our enterprise mainframe customers. When an enterprise uses TDT to land its mission-critical data in Express One Zone-flavored Athena/S3 buckets, that data becomes more directly accessible to, and easier to manipulate with, the various AWS ML and AI tools. In short, if customers choose, Express One Zone Athena/S3 becomes an intermediate data store for big data processing workloads and advanced analytics.

So, when we are asked, “What should Treehouse Software be doing to respond to the burgeoning interest in ML, Generative AI, etc.?”, the answer is: we are doing exactly what we need to be doing. AI and ML frameworks are the newest incentive for people to use RDRS (Rocket Data Replicate and Sync, formerly called tcVISION) and TDT from Treehouse Software to replicate their mainframe data to advanced data analytics frameworks, or possibly into super-charged S3 Express One Zone buckets.

Video – Deep Dive Introduction to Amazon S3 Express One Zone Storage Class:


