by Dan Vimont, Cloud Solutions Architect at Treehouse Software, Inc.
Treehouse Software customers are using tcVISION to enable mission-critical mainframe-to-AWS data replication pipelines. Some of these production pipelines provide vital near-real-time synchronization between source and target, and thus cannot afford any significant downtime in the event of a failure. Naturally, a number of our customers have asked for advice on setting up a high availability configuration for the tcVISION components that run on Amazon EC2 instances. The High Availability Framework discussed here provides for a Failover EC2 instance to automatically pick up tcVISION processing should the Primary instance (running in another Availability Zone) go down.
The Core Components: Primary Instance & Failover Instance
The core of a tcVISION high availability framework is a pair of EC2 instances running in different Availability Zones: a Primary EC2 instance and a Failover EC2 instance. The two instances are identically configured, and both are attached to a shared working-storage file system (either Amazon EFS or Amazon FSx). This shared storage allows the Failover instance to quickly and seamlessly pick up tcVISION processing should the Primary instance suddenly become unavailable.
Use a Step Function to Automate the Failover Process
In the event of failure of the Primary instance, the recommended framework calls for automatic triggering of a Step Function for reliable failover processing, with steps that include the following:
- verify that the Primary instance is unavailable (the tcVISION service cannot be active on both instances simultaneously, so this verification is vital)
- redirect all network traffic from the Primary instance to the Failover instance (via Route 53)
- start tcVISION processing on the Failover instance
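To make the three steps above more concrete, here is a minimal sketch of how such a workflow might be expressed in Amazon States Language, built as a Python dictionary. This is not the actual Treehouse state machine: the state names, the Lambda function ARNs, the account number, and the `primaryDown` result field are all hypothetical placeholders chosen for illustration.

```python
import json

# Hypothetical ARN prefix; substitute Lambda functions from your own account.
LAMBDA_ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def build_failover_definition() -> dict:
    """Return an illustrative Amazon States Language definition for failover."""
    return {
        "Comment": "Illustrative tcVISION failover workflow (sketch only)",
        "StartAt": "VerifyPrimaryUnavailable",
        "States": {
            # Step 1: a (hypothetical) Lambda reports whether the Primary is down.
            "VerifyPrimaryUnavailable": {
                "Type": "Task",
                "Resource": LAMBDA_ARN_PREFIX + "verify-primary-down",
                "ResultPath": "$.check",
                "Next": "IsPrimaryDown",
            },
            # Proceed only if the check confirmed the Primary is unavailable,
            # so tcVISION is never active on both instances at once.
            "IsPrimaryDown": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.check.primaryDown",
                        "BooleanEquals": True,
                        "Next": "RedirectTraffic",
                    }
                ],
                "Default": "AbortFailover",
            },
            # Step 2: repoint the Route 53 record at the Failover instance.
            "RedirectTraffic": {
                "Type": "Task",
                "Resource": LAMBDA_ARN_PREFIX + "redirect-dns-to-failover",
                "ResultPath": "$.dns",
                "Next": "StartTcVisionOnFailover",
            },
            # Step 3: start tcVISION processing on the Failover instance.
            "StartTcVisionOnFailover": {
                "Type": "Task",
                "Resource": LAMBDA_ARN_PREFIX + "start-tcvision-on-failover",
                "End": True,
            },
            "AbortFailover": {
                "Type": "Fail",
                "Error": "PrimaryStillActive",
                "Cause": "Primary instance still appears available; failover skipped.",
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON form that a state machine definition would take.
    print(json.dumps(build_failover_definition(), indent=2))
```

The Choice state reflects the point made in the first step: if the verification does not confirm that the Primary instance is down, the workflow stops rather than risk running tcVISION on both instances.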
When Ready, Use a Step Function to Automate the Restoration Process
After operations personnel have completed recovery of the Primary EC2 instance, another Step Function may be manually triggered to reliably transfer tcVISION processing back to the Primary instance.
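Assuming that restoration reverses the Route 53 redirect made during failover (the full framework details may differ), the DNS change in either direction could be sketched as follows. The helper name `build_dns_switch`, the record name, and the IP addresses are hypothetical; the returned dictionary follows the shape of a Route 53 change batch.

```python
def build_dns_switch(record_name: str, target_ip: str, ttl: int = 60) -> dict:
    """Return a Route 53 change batch that points record_name at target_ip."""
    return {
        "Comment": f"Point {record_name} at {target_ip}",
        "Changes": [
            {
                # UPSERT creates the record if absent, or updates it in place.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target_ip}],
                },
            }
        ],
    }


# Hypothetical values for illustration only.
RECORD_NAME = "tcvision.internal.example.com."
PRIMARY_IP = "10.0.1.10"
FAILOVER_IP = "10.0.2.10"

# Failover points the record at the Failover instance ...
failover_batch = build_dns_switch(RECORD_NAME, FAILOVER_IP)
# ... and restoration points it back at the recovered Primary instance.
restore_batch = build_dns_switch(RECORD_NAME, PRIMARY_IP)
```

A dictionary of this shape could be passed as the `ChangeBatch` argument to the boto3 Route 53 client's `change_resource_record_sets` call, together with the `HostedZoneId` of the Private Hosted Zone. A short TTL keeps clients from caching the old address for long after the switch.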
Many More Details are Available Upon Request to Treehouse Customers
Full details regarding our recommended High Availability Framework for tcVISION are available upon request to Treehouse customers. AWS services utilized in the complete recommended framework include Step Functions, Lambda Functions, EventBridge rules, CloudWatch alarms, SNS topics, a Route 53 Private Hosted Zone, and more. The following diagram is a partial visual inventory of the recommended framework components.
Interested in seeing a live, online demo of tcVISION?
Just fill out the Treehouse Software tcVISION Demonstration Request Form and a Treehouse representative will contact you to set up a time for your online tcVISION demonstration.