Step 1:
Parsing the source code
and building the object hierarchy

First, the entire source code is parsed and broken down into objects. Then an object model tree is built dynamically from the lowest-level representation of the input, using the contextual patterns from the Parsaur definition library.

Because the Parsaur logic is built on the lowest element, a primitive byte object, we have complete flexibility to add new context-aware patterns and to support complex requirements. In short, we have defined what a letter is, for example what “s” or “S” is. On top of that we have defined constructs such as “SET” or “SELECT”, and then the whole hierarchy of objects, from the one that knows that SELECT is a command up to the final “Query” object that holds the logic of a SELECT statement.
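The layering described above can be illustrated with a small Python sketch. It is not Parsaur code and does not use MorphQL; the names `Node`, `KEYWORDS`, `letters`, `words` and `build_query` are hypothetical and exist only to show how a tree can grow from bytes to letters, from letters to keywords, and from keywords to a "Query" object.

```python
from dataclasses import dataclass, field

# Hypothetical keyword set; a real definition library would be far larger.
KEYWORDS = {"SELECT", "SET", "FROM"}


@dataclass
class Node:
    kind: str                      # e.g. "Letter", "Keyword", "Query"
    text: str                      # source text covered by this node
    children: list = field(default_factory=list)


def letters(source: bytes) -> list:
    """Lowest level: classify every byte as a letter or another symbol."""
    return [Node("Letter" if chr(b).isalpha() else "Symbol", chr(b))
            for b in source]


def words(nodes: list) -> list:
    """Group runs of letters into Keyword or Identifier nodes."""
    out, run = [], []
    # A trailing sentinel symbol flushes the last run of letters.
    for node in nodes + [Node("Symbol", " ")]:
        if node.kind == "Letter":
            run.append(node)
        elif run:
            text = "".join(child.text for child in run)
            kind = "Keyword" if text.upper() in KEYWORDS else "Identifier"
            out.append(Node(kind, text, run))
            run = []
    return out


def build_query(tokens: list) -> Node:
    """Top level: a token list starting with SELECT becomes a Query node."""
    starts_with_select = (
        bool(tokens)
        and tokens[0].kind == "Keyword"
        and tokens[0].text.upper() == "SELECT"
    )
    if not starts_with_select:
        raise ValueError("not a SELECT statement")
    return Node("Query", " ".join(t.text for t in tokens), tokens)


tree = build_query(words(letters(b"select id from users")))
```

In this sketch every node keeps its children, so the "Query" node can still be traced down to the individual letters it was built from. Symbols such as spaces only separate words here and are not kept in the tree.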

The Parsaur library and the transformation engine are separate, so data engineers can easily extend the content with new patterns. The library has a graphical view of the definitions and is managed with the high-level meta-language MorphQL. Since the entire definition library is kept in a global Git repository, best practices can be shared between projects.

Step 2:
Transforming the parsed structure

Since the context is embedded in the object model tree, any source object can be easily transformed into any target language and syntax using the conversion rules from the Parsaur translation library. The same high-level language, MorphQL, is used to define the conversion rules, again so that data engineers can easily extend them, because every project contains some code that is a little different and more complex.

The conversion rules range from simple changes, such as converting a data type in a SQL DDL statement from Microsoft BIT to Oracle NUMBER(1) or removing parts of the code that are no longer needed (e.g., the distribution key from a Netezza DDL statement), to more complex contextual modifications, such as converting a Netezza stored procedure to a Snowflake stored procedure or generating completely differently structured output code, as in the case of the Oracle UPDATE capability.
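The two simple rules mentioned above can be sketched in Python as functions that rewrite a parsed object rather than raw text. This is only an illustration under stated assumptions: the classes `Column` and `CreateTable` and the functions `bit_to_number`, `drop_distribution_key`, `apply_rules` and `render` are hypothetical, real rules would be written in MorphQL, and the sample table combines a BIT column with a distribution key purely to exercise both rules at once.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str


@dataclass(frozen=True)
class CreateTable:
    # Hypothetical parsed form of a CREATE TABLE statement.
    name: str
    columns: tuple
    distribute_on: Optional[tuple] = None   # Netezza DISTRIBUTE ON columns


def bit_to_number(table: CreateTable) -> CreateTable:
    """Rule 1: map the BIT data type to NUMBER(1)."""
    columns = tuple(
        replace(col, data_type="NUMBER(1)")
        if col.data_type.upper() == "BIT" else col
        for col in table.columns
    )
    return replace(table, columns=columns)


def drop_distribution_key(table: CreateTable) -> CreateTable:
    """Rule 2: remove the distribution key, which the target does not need."""
    return replace(table, distribute_on=None)


def apply_rules(table: CreateTable, rules: list) -> CreateTable:
    """Apply each rule in order; every rule returns a new object."""
    for rule in rules:
        table = rule(table)
    return table


def render(table: CreateTable) -> str:
    """Write the object back out as DDL text."""
    cols = ", ".join(f"{c.name} {c.data_type}" for c in table.columns)
    ddl = f"CREATE TABLE {table.name} ({cols})"
    if table.distribute_on:
        ddl += f" DISTRIBUTE ON ({', '.join(table.distribute_on)})"
    return ddl


source = CreateTable(
    name="orders",
    columns=(Column("id", "INTEGER"), Column("is_paid", "BIT")),
    distribute_on=("id",),
)
target = apply_rules(source, [bit_to_number, drop_distribution_key])
print(render(source))
print(render(target))
```

Because each rule takes an object and returns a new one, rules can be added, removed or reordered per project without touching the parser, which is the kind of extensibility the text describes.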