L10n in Heterogeneous Data Replication

by Wayne Lashley, Chief Business Development Officer for Treehouse Software

Most software vendors whose product markets extend beyond their own home country are familiar with the concepts of “i18n” and “L10n”, which are numeronyms for “internationalization” and “localization” respectively. i18n is the process of making a software product capable of adaptation to different languages and cultures, while L10n is the specific adaptation process for a given local market.

These terms take on special significance in the context of data replication software products such as Treehouse’s DPSync, which provides real-time replication of mainframe ADABAS data to relational database (RDBMS) targets like DB2, Microsoft SQL Server and Oracle on various platforms. The very purpose of these products is to take data from a source and apply appropriate L10n to make it usable at the target, whose technical environment generally differs from the source’s in several respects.

Perhaps the simplest form of L10n, having nothing to do with language or locale, is to transform database-specific field/column datatypes. Alphanumeric (A) fields in ADABAS are often mapped to CHAR or VARCHAR datatypes in an RDBMS, which are conceptually quite similar. Packed (P) fields may be expressed in an RDBMS as NUMBER, INTEGER, NUMERIC, DECIMAL, etc., depending on the vendor implementation and the intended use.
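To make the Packed (P) case concrete, here is a minimal Python sketch of decoding a packed-decimal value in the conventional mainframe layout: two decimal digits per byte, with the final half-byte holding the sign. The function name unpack_packed_decimal and its scale parameter are illustrative assumptions, not part of DPSync or ADABAS; a real mapping would take the number of decimal places from the field definition.

```python
from decimal import Decimal

def unpack_packed_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field into a Decimal.

    `scale` is the assumed number of digits after the decimal point.
    """
    if not raw:
        raise ValueError("empty packed field")

    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)      # high nibble: one decimal digit
        digits.append(byte & 0x0F)    # low nibble: the next decimal digit
    digits.append(raw[-1] >> 4)       # last byte: high nibble is a digit...
    sign_nibble = raw[-1] & 0x0F      # ...and the low nibble is the sign

    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble in packed field")

    if sign_nibble in (0x0B, 0x0D):                # negative sign codes
        sign = 1
    elif sign_nibble in (0x0A, 0x0C, 0x0E, 0x0F):  # positive or unsigned
        sign = 0
    else:
        raise ValueError("invalid sign nibble in packed field")

    # Decimal accepts a (sign, digits, exponent) tuple.
    return Decimal((sign, tuple(digits), -scale))

price = unpack_packed_decimal(bytes.fromhex("12345C"), scale=2)
debit = unpack_packed_decimal(bytes.fromhex("12345D"))
print(price, debit)
```

Returning a Decimal rather than a float keeps the value exact, which matters when the target column is NUMERIC or DECIMAL.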

When it comes to Binary (B) format, things get tricky. An array of bits in an ADABAS field usually can’t be mapped directly to a binary representation in an RDBMS column, because the platforms represent data differently.

Decades ago, when I was earning my stripes as a novice mainframe programmer, the rules seemed simple: 8 bits made up a byte, and characters were expressed in single bytes encoded in EBCDIC.

(True story: During a university Assembler class many years ago, one of my classmates was muttering to himself, and the professor queried him about the subject of the “conversation”. The student replied “Just practicing my EBCDIC, sir!”)

Later on, I learned about that ASCII column of the “CODE TRANSLATION TABLE” in my indispensable System/370 Reference Summary GX20-1850-3, and I realized there was a whole world of computers beyond mainframes.

But in fact things can be much more complex than simply EBCDIC and ASCII. L10n of data has to take into account the multitude of code pages and conventions that customers may use—and the customizations and exceptions to these.
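As a small illustration of why the code page matters, the following Python sketch decodes EBCDIC bytes using two of the EBCDIC code pages that ship with Python's standard codecs (cp037 and cp500). The helper name decode_under is hypothetical, and the choice of code pages is only an assumption for the example; a customer site may well use a different or customized code page.

```python
# EBCDIC bytes for the word HELLO, as a mainframe would store them.
ebcdic_hello = bytes.fromhex("C8C5D3D3D6")

# Decode with an EBCDIC code page, then re-encode for an ASCII-based target.
text = ebcdic_hello.decode("cp037")
ascii_hello = text.encode("ascii")

def decode_under(code_pages, raw):
    """Decode the same raw bytes under each of several code pages."""
    return {cp: raw.decode(cp) for cp in code_pages}

# The letters agree across these two code pages, but some punctuation
# positions do not: the single byte X'4A' is one such position.
variants = decode_under(["cp037", "cp500"], b"\x4a")

print(text, ascii_hello.hex().upper())
print(variants)
```

The same stored byte yields different characters under the two code pages, so a replication process has to know which code page the source data was written with rather than assuming a single "EBCDIC".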

Our European Technical Representative, Hans-Peter Will, has had to become something of an expert in this over the past few years as he has worked with various customers in the Middle East on DPSync implementations.

Take the way one site’s applications handle the Arabic language. Arabic is normally read right-to-left, but depending on system configuration, the Arabic characters in a given field may be stored either left-to-right or right-to-left. Certain characters are represented in one byte, others in two. The cursive form of certain characters changes depending on whether they appear in the middle of a word or at one end. In some of this customer’s applications, the same screen may show both Arabic and English. Even on screens where all of the words are in Arabic and displayed right-to-left, there may be embedded numbers (e.g., telephone numbers) that need to be displayed left-to-right.
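Two of these points can be sketched with Python's standard unicodedata module, assuming the data has already been converted to Unicode. The first function reports the directional class Unicode assigns to each character, which is what lets display software run Arabic letters right-to-left while keeping embedded digits left-to-right. The second folds position-specific "presentation forms" of a letter back to the single base letter. Both function names are hypothetical, and this is not a description of how DPSync handles Arabic. Note also that NFKC normalization folds other compatibility characters as well, so it may be too broad for some data.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]

def fold_presentation_forms(text):
    """Map contextual presentation forms to their base letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# The Arabic word for "telephone" followed by European digits.
# AL = Arabic letter, WS = whitespace, EN = European number.
mixed = "\u0647\u0627\u062A\u0641 555"
classes = bidi_classes(mixed)

# Isolated, final, initial and medial forms of the letter BEH.
beh_forms = "\uFE8F\uFE90\uFE91\uFE92"
folded = fold_presentation_forms(beh_forms)

print(classes)
print([hex(ord(ch)) for ch in folded])
```

All four shapes of BEH fold to the same base letter, U+0628, which is usually the form wanted for storage and comparison; the contextual shape can then be chosen again at display time.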

Now take all these complexities and factor in different database management systems (ADABAS vs. Oracle) running on different platforms (mainframe vs. Unix), each of which has its own configuration settings that affect the way characters are stored and displayed. Add to that the question of endianness (big-endian vs. little-endian) of the processing architecture.
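The endianness point is easy to demonstrate. The mainframe architecture is big-endian, while many target servers are little-endian, so the same four bytes read as an integer give different answers depending on the byte order assumed. The sample value below is purely illustrative.

```python
import struct

# A 4-byte binary field as a big-endian system would store the value 256.
raw = bytes.fromhex("00000100")

as_big = int.from_bytes(raw, byteorder="big")        # read in source order
as_little = int.from_bytes(raw, byteorder="little")  # misread on the target

# Re-pack the correctly interpreted value as a little-endian unsigned int.
repacked = struct.pack("<I", as_big)

print(as_big, as_little, repacked.hex())
```

Interpreting the bytes in the source's byte order first, and only then writing them in the target's order, avoids silently turning 256 into 65536.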

The first time that Hans-Peter visited the customer in question, Treehouse software engineers had to figure out how to handle all these issues to ensure that ADABAS data would be replicated accurately and appropriately for use in Oracle-based applications. Fortunately, the combination of great product maturity (DPSync and its key underlying components tRelational/DPS having been battle-tested at countless sites over many years) and product extensibility (the ability to plug in complex custom transformations) enabled DPSync to be readily configured to accomplish the task at hand.

Having learned from that initial experience, Hans-Peter is now on familiar ground when assisting new Arabic-language sites implementing DPSync. Recently he was back in the Middle East visiting one of these new customers, and only hours after product installation he was able to confirm the accuracy of the SQL Server representation of data materialized (initially loaded via what is commonly called ETL, Extract-Transform-Load) from ADABAS using DPSync. The customer was also impressed with the speed of the process, both in terms of configuring the materialization (taking advantage of the tRelational schema auto-generation feature) and executing it (using an ADASAV backup as source, avoiding any workload on ADABAS). That customer is now in production with real-time ADABAS-to-SQL Server replication.

What’s your L10n challenge? Contact Treehouse and learn how DPSync and our other products are able to meet it.

TREEHOUSE CUSTOMER UPDATE:

by Chris Rudolph, Senior Technical Representative for Treehouse Software and Joseph Brady, Marketing and Documentation Manager for Treehouse Software

This is a follow-up to our recent Treehouse Software Blog entry “Treehouse Software is Setting Sights on Many New Data Replication Projects”, in which we described a typical customer visit to implement data replication.

Treehouse representatives were on-site at a state government agency to configure tcVISION, set up bulk transfer and change data capture, and train the State employees on using and managing tcVISION. In a subsequent discussion, our contacts at the site reported that they had met their deadline for delivering a reporting database in Microsoft SQL Server replicated from 63 ADABAS files. They also happily noted that, using tcVISION, the bulk transfer of 60 million ADABAS records into SQL Server completed in only 20 minutes, an average of roughly 50,000 records per second.

We are very pleased to have yet another satisfied customer benefitting from one of Treehouse Software’s mature and proven enterprise software solutions.

tcVISION provides real-time data replication through change data capture, and allows easy and fast data migration for mainframe application modernization projects. Enterprises looking for a product that enables bi-directional heterogeneous data replication between mainframe, Linux, Unix, and Windows platforms need look no further than tcVISION from Treehouse Software.

To learn more about tcVISION, or to request a demonstration, contact Treehouse Software today.