L10n in Heterogeneous Data Replication

by Wayne Lashley, Chief Business Development Officer for Treehouse Software

Most software vendors whose product markets extend beyond their own home country are familiar with the concepts of “i18n” and “L10n”, which are numeronyms for “internationalization” and “localization” respectively. i18n is the process of making a software product capable of adaptation to different languages and cultures, while L10n is the specific adaptation process for a given local market.

These terms take on special significance in the context of data replication software products—such as Treehouse’s DPSync, which provides real-time replication of mainframe ADABAS data to relational database (RDBMS) targets like DB2, Microsoft SQL Server and Oracle on various platforms. The very purpose of these products is to take data from a source and apply appropriate L10n to make it usable at the target, which is generally dissimilar in various aspects of the technical environment.

Perhaps the simplest form of L10n, having nothing to do with language or locale, is to transform database-specific field/column datatypes. Alphanumeric (A) fields in ADABAS are often mapped to CHAR or VARCHAR datatypes in an RDBMS, which are conceptually quite similar. Packed (P) fields may be expressed in an RDBMS as NUMBER, INTEGER, NUMERIC, DECIMAL, etc., depending on the vendor implementation and the intended use of the data.
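To make the Packed (P) case concrete, here is a minimal Python sketch of turning a packed-decimal field into a value suitable for a DECIMAL column. It assumes the conventional packed-decimal layout (two digits per byte, with the sign in the final half-byte); the function name, the scale parameter and the small type table are hypothetical illustrations, not part of DPSync or ADABAS.

```python
from decimal import Decimal

# Hypothetical default mapping of ADABAS formats to RDBMS column types.
# Real mappings depend on the target vendor and on how the data will be used.
DEFAULT_TYPE_MAP = {"A": "VARCHAR", "P": "DECIMAL"}

NEGATIVE_SIGNS = {0xB, 0xD}
VALID_SIGNS = {0xA, 0xB, 0xC, 0xD, 0xE, 0xF}


def unpack_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field, assuming the sign is in the last nibble."""
    if not raw:
        raise ValueError("empty packed field")
    # Split every byte into its high and low half-bytes (nibbles).
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()
    if sign not in VALID_SIGNS:
        raise ValueError(f"invalid sign nibble: {sign:#x}")
    if any(digit > 9 for digit in nibbles):
        raise ValueError("invalid digit nibble")
    value = int("".join(str(digit) for digit in nibbles) or "0")
    if sign in NEGATIVE_SIGNS:
        value = -value
    # Shift the decimal point left by `scale` places.
    return Decimal(value).scaleb(-scale)


if __name__ == "__main__":
    print(unpack_decimal(bytes([0x12, 0x34, 0x5C]), scale=2))  # 123.45
    print(unpack_decimal(bytes([0x12, 0x3D])))                 # -123
```

The number of decimal places is not stored in the packed bytes themselves, which is why the sketch takes it as a separate argument; in practice it would come from the field definition.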

When it comes to Binary (B) format, things get tricky. An array of bits in an ADABAS field can’t usually be mapped directly to a binary representation in an RDBMS column, due to the differences in the way data are represented between the platforms.

Decades ago, when I was earning my stripes as a novice mainframe programmer, the rules seemed simple: 8 bits made up a byte, and characters were expressed in single bytes encoded in EBCDIC.

(True story: During a university Assembler class many years ago, one of my classmates was muttering to himself, and the professor queried him about the subject of the “conversation”. The student replied “Just practicing my EBCDIC, sir!”)

Later on, I learned about that ASCII column of the “CODE TRANSLATION TABLE” in my indispensable System/370 Reference Summary GX20-1850-3, and I realized there was a whole world of computers beyond mainframes.


But in fact things can be much more complex than simply EBCDIC and ASCII. L10n of data has to take into account the multitude of code pages and conventions that customers may use—and the customizations and exceptions to these.
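A small Python sketch shows why the code page matters. It uses two EBCDIC code pages that ship with Python's standard codecs, cp037 and cp500, purely as examples; the transcode helper is a hypothetical name, and a real site may well use a different or customized code page that needs its own translation table.

```python
def transcode(raw: bytes, source_codepage: str, target_encoding: str = "utf-8") -> bytes:
    """Decode bytes from a source code page and re-encode them for the target."""
    return raw.decode(source_codepage).encode(target_encoding)


if __name__ == "__main__":
    # "HELLO" as stored in EBCDIC.
    ebcdic_hello = bytes.fromhex("C8C5D3D3D6")
    print(transcode(ebcdic_hello, "cp037"))  # b'HELLO'

    # The same byte can mean different characters in different EBCDIC code pages,
    # so assuming the wrong one silently corrupts the data.
    ambiguous = b"\x4a"
    for codepage in ("cp037", "cp500"):
        print(codepage, repr(ambiguous.decode(codepage)))
```

Letters and digits tend to agree across the common EBCDIC variants; it is the punctuation and national characters that differ, which is exactly where site-specific customizations usually appear.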

Our European Technical Representative, Hans-Peter Will, has had to become something of an expert in this over the past few years as he has worked with various customers in the Middle East on DPSync implementations.

Take the case of the way the Arabic language is handled in the context of applications at one site. Arabic is normally read right-to-left. But depending on system configuration, Arabic characters in a given field may be stored either left-to-right or right-to-left. Certain characters are represented in one byte, others in two. The cursive appearance of certain characters must be altered if they appear in the middle of a word rather than on an end. And in certain of this customer’s applications, the same screen display may show both Arabic and English. Even on screens where all of the words are in Arabic, and displayed right-to-left, there may be embedded numbers (e.g., telephone numbers) that need to be displayed left-to-right.
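Two of these points, mixed display direction and position-dependent letter shapes, can be illustrated with Python's standard unicodedata module. This is only a rough sketch that assumes the data has already been converted to Unicode: the direction_runs and base_letters helpers are hypothetical, and the grouping is far simpler than the full Unicode bidirectional algorithm.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}        # Hebrew-style and Arabic letters
LTR_CLASSES = {"L", "EN", "AN"}  # Latin letters and numbers


def direction_runs(text: str) -> list:
    """Group characters into coarse runs of the same display direction."""
    runs = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            direction = "RTL"
        elif bidi_class in LTR_CLASSES:
            direction = "LTR"
        else:
            # Spaces and punctuation simply join the run they follow.
            direction = runs[-1][0] if runs else "NEUTRAL"
        if runs and runs[-1][0] == direction:
            runs[-1] = (direction, runs[-1][1] + ch)
        else:
            runs.append((direction, ch))
    return runs


def base_letters(text: str) -> str:
    """Map position-specific Arabic presentation forms back to base letters."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # The Arabic word for "telephone" followed by a phone number.
    sample = "\u0647\u0627\u062a\u0641 555-1234"
    for direction, run in direction_runs(sample):
        print(direction, [f"U+{ord(c):04X}" for c in run])

    # U+FE91 is the initial form of BEH; its base letter is U+0628.
    print(f"U+{ord(base_letters(chr(0xFE91))):04X}")  # U+0628
```

Storing base letters and leaving shaping and ordering to the display layer is one common convention, but as the site described here shows, legacy applications do not always follow it.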

Now take all these complexities and factor in different database management systems (ADABAS vs. Oracle) running on different platforms (mainframe vs. Unix), each of which has its own configuration settings that affect the way characters are stored and displayed. Add to that the question of endianness (big-endian vs. little-endian) of the processing architecture.
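The endianness question is easy to demonstrate in a few lines of Python. The sketch below interprets the same four bytes both ways; the binary_field_to_int helper is a hypothetical name used for illustration and says nothing about how DPSync itself handles byte order.

```python
import struct


def binary_field_to_int(raw: bytes, byteorder: str = "big", signed: bool = False) -> int:
    """Interpret a fixed-length binary field as an integer in the given byte order."""
    return int.from_bytes(raw, byteorder, signed=signed)


if __name__ == "__main__":
    raw = bytes([0x00, 0x00, 0x01, 0x02])

    # Mainframes store the most significant byte first (big-endian).
    print(binary_field_to_int(raw, "big"))     # 258
    # Read as little-endian, the same bytes give a very different number.
    print(binary_field_to_int(raw, "little"))  # 33619968

    # Re-encoding the value explicitly for a little-endian target.
    value = struct.unpack(">i", raw)[0]
    print(struct.pack("<i", value).hex())      # 02010000
```

Copying the bytes unchanged between architectures therefore preserves the bits but not the meaning, so the byte order has to be stated explicitly at each end.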

The first time that Hans-Peter visited the customer in question, Treehouse software engineers had to figure out how to handle all these issues to ensure that ADABAS data would be replicated accurately and appropriately for use in Oracle-based applications. Fortunately, the combination of great product maturity (DPSync and its key underlying components tRelational/DPS having been battle-tested at countless sites over many years) and product extensibility (the ability to plug in complex custom transformations) enabled DPSync to be readily configured to accomplish the task at hand.

Having learned from that initial experience, Hans-Peter is now on familiar ground when assisting new Arabic-language sites implementing DPSync. Recently he was back in the Middle East visiting one of these new customers, and only hours after product installation he was able to confirm the accuracy of the SQL Server representation of data materialized (initially loaded via what is commonly called ETL, Extract-Transform-Load) from ADABAS using DPSync. The customer was also impressed with the speed of the process, both in terms of configuring the materialization (taking advantage of the tRelational schema auto-generation feature) and executing it (using an ADASAV backup as source, avoiding any workload on ADABAS). That customer is now in production with real-time ADABAS-to-SQL Server replication.

What’s your L10n challenge? Contact Treehouse and learn how DPSync and our other products are able to meet it.