Initial Data Extract and Load

Oracle to Snowflake Migration Series, part 3

There are many options for extracting data from Oracle and loading it into Snowflake. Common approaches include:

  • Legacy tools such as the UTL_FILE package, SQL*Plus, or the Export utility, which have been available almost since the very beginning of Oracle. These tools do not perform well, especially on large volumes of data, because they extract data into text files that must then be loaded into Snowflake.
  • Creating external tables in Oracle, although you are still limited to text file formats written to a local storage location, which must then be loaded into Snowflake.
  • Commercial migration tools, which incur additional licensing costs, installation and configuration effort, and training on how to use the tool.
  • Replication-based approaches that copy data into an object store, such as an AWS S3 bucket. We then create a storage integration in Snowflake, which holds the IAM entity authorized to access the S3 location. Using bucket event notifications and SQS queues, data is delivered in near real time to Snowflake pipes, which in turn copy it into the target tables.
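As a sketch of that replication pattern, the Snowflake side consists of a storage integration, a stage, and an auto-ingesting pipe. All object names, the bucket path, and the role ARN below are placeholders to adapt to your environment:

```sql
-- Storage integration holding the IAM entity authorized to read the bucket
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/snowflake_access'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/oracle-extracts/');

-- External stage that reads from the bucket through the integration
CREATE STAGE oracle_stage
  URL = 's3://my-bucket/oracle-extracts/'
  STORAGE_INTEGRATION = s3_int
  FILE_FORMAT = (TYPE = CSV);

-- Pipe that ingests new files as S3 event notifications arrive via SQS
CREATE PIPE oracle_pipe AUTO_INGEST = TRUE AS
  COPY INTO customers FROM @oracle_stage;
```

With AUTO_INGEST enabled, the SQS queue that the bucket notifications must target is reported in the pipe's notification channel property.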

Lift ‘n’ shift with antFarm

For our projects, we use our in-house tool antFarm, so none of the above options are needed. antFarm is a lift ‘n’ shift data migration solution that supports both cloud and on-premises data sources. It automatically:

  • Retrieves the list of tables and their definitions from the data source catalogue.
  • Creates a metadata repository that stores the definitions of application sources, table lists, and optimization rules (such as partitioning). With this information, the Queen Ant can successfully manage the worker ants.
  • Creates target tables according to the source table definitions.
  • Converts data types based on the source and target databases, if necessary.
  • Generates SQL queries to lift ‘n’ shift the data.
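antFarm's internals are proprietary, but as an illustration of the first step, table and column definitions can be retrieved from Oracle's data dictionary along these lines (the schema name is a placeholder):

```sql
-- Read table and column definitions for one schema from the Oracle
-- data dictionary; this is the raw material for generating target DDL
SELECT table_name,
       column_name,
       data_type,
       data_precision,
       data_scale,
       nullable
FROM   all_tab_columns
WHERE  owner = 'SALES'            -- placeholder schema name
ORDER  BY table_name, column_id;
```

From this result set, a migration tool can map each Oracle data type to its Snowflake counterpart and generate the corresponding CREATE TABLE statements.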

We use the above to extract the data and copy it across. Since Snowflake was designed for the cloud, it is also highly optimized for loading data in the cloud. antFarm runs in parallel processes and has proven very efficient in all of our data migration projects. It moves the data from the source to a Snowflake stage, from which we then load it into the target tables. antFarm uses the standard PUT and COPY commands that Snowflake provides for such tasks.
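The PUT and COPY pair looks roughly like this (file path, table name, and file format options are placeholders):

```sql
-- Upload a local extract file to the table's internal stage
PUT file:///tmp/customers.csv.gz @%customers;

-- Load the staged file into the target table
COPY INTO customers
FROM @%customers
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
PURGE = TRUE;   -- remove staged files after a successful load
```

PUT compresses and uploads the file to the stage; COPY INTO then performs the parallel bulk load into the table.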

Of course, the initial load is rarely the end of the story; we often need to load data incrementally afterwards as well. For this purpose, we design an appropriate CDC (Change Data Capture) strategy, whether based on timestamps, partitioning, triggers, ORA_ROWSCN, or some other way to identify data that has changed since the last load. We can also use Snowflake streams as a convenient CDC mechanism: Snowflake detects the changes and keeps track of what has already been loaded and what will be loaded in the next batch.
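Two of those options can be sketched as follows; all table, column, and stream names are illustrative:

```sql
-- Oracle side: select only rows changed since the SCN recorded at the last load
SELECT * FROM customers WHERE ora_rowscn > :last_loaded_scn;

-- Snowflake side: a stream records changes on a landing table...
CREATE STREAM customers_stream ON TABLE customers_landing;

-- ...and consuming the stream in a DML statement advances its offset,
-- so the next batch automatically sees only newer changes
INSERT INTO customers_target
SELECT id, name, updated_at
FROM customers_stream
WHERE metadata$action = 'INSERT';
```

Note that ORA_ROWSCN is tracked per block unless the Oracle table was created with ROWDEPENDENCIES, so it may over-select rows; the stream approach delegates all change bookkeeping to Snowflake.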

antFarm architecture

Resolving Data Quality Issues

It would be easiest if we could simply take the data as-is and copy it across, but there may be some technical issues that we need to deal with first.

Here are some examples we’ve encountered in our Oracle to Snowflake migration projects:

  • Numeric data types without precision. A common issue when migrating from Oracle to Snowflake is values of the NUMBER data type that, as mentioned earlier, do not specify scale and precision. We must therefore be careful when copying NUMBER data: either specify a suitable precision in Snowflake or – if possible – define the correct scale and precision on the NUMBER column in Oracle before migrating.
  • Invalid dates. In source systems, we have encountered dates with the year 0000. Oracle allowed these in the past, but Snowflake does not support them. Such data cannot simply be copied across, so a different solution is required. Ideally, users would correct the values in the source system; if that isn’t possible, we substitute a default value during migration, such as replacing year 0 with year 1.
  • Infinity values. Another interesting problem in Oracle source systems is infinity values in numeric columns. Snowflake supports infinity only in floating-point data types, not in fixed-point data types. When migrating, we can either convert the column to a float data type or replace infinity with a default high value when writing into Snowflake.
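All three fixes can be applied while loading, because COPY INTO accepts a transformation subquery over the staged files. The sketch below assumes a CSV extract with three columns; the stage, table, positional column mapping, precisions, and the chosen default values are all placeholders:

```sql
COPY INTO customers
FROM (
  SELECT
    -- pin an explicit precision/scale instead of an unbounded NUMBER
    $1::NUMBER(38,10),
    -- map the unsupported year 0000 to year 1
    IFF($2 = '0000-01-01', '0001-01-01', $2)::DATE,
    -- replace infinity markers with a default high value in a fixed-point column
    IFF($3 IN ('Inf', '-Inf'), '99999999', $3)::NUMBER(18,2)
  FROM @oracle_stage/customers.csv.gz
)
FILE_FORMAT = (TYPE = CSV);
```

Doing the cleanup inside the COPY keeps the staged extracts untouched, so a rerun with different default values requires no re-extraction from Oracle.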

Maja Ferle
Senior Consultant and Snowflake Data Superhero