Create Data Structures

Oracle to Snowflake Migration Series, part 2

Before we can perform an initial data extract and load, we have to create data structures for data warehouse tables and other objects that will be migrated. These may include tables, views, sequences, functions, and so on. To create all of these objects in Snowflake we will run DDL scripts, just like in Oracle.

Snowflake supports standard ANSI SQL as well as analytic extensions such as window functions. This means that most database objects and code will migrate from Oracle to Snowflake seamlessly. When preparing DDL statements to create database objects, there are a few considerations to keep in mind as explained below.

Prepare DDL Scripts for Migration

If you already have DDL scripts that create the objects in Oracle, you can use the same scripts to create the objects in Snowflake. You will probably have to adjust a few data types to their Snowflake-optimized equivalents.

If you don’t already have DDL scripts to create objects in Oracle, or the scripts you have are outdated, you can use a data modelling tool such as Oracle SQL Developer Data Modeler or SqlDBM to reverse engineer the existing Oracle database and then generate DDL scripts in ANSI SQL. Most of the generated scripts will run in Snowflake, although some modification may be required for Oracle specifics that don’t translate into ANSI SQL.

If you have neither DDL scripts nor access to a data modelling tool, you can generate the DDL yourself by reading metadata from the Oracle data dictionary. You only need the object definitions, without any storage-related information. If you need to convert any data types, you can do so while building the DDL scripts.
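As a sketch of this approach, Oracle's built-in DBMS_METADATA package can generate the DDL for you, and its transform parameters let you suppress the storage-related clauses that Snowflake doesn't need (run this in the source Oracle database):

```sql
-- Oracle: suppress storage-related clauses in the generated DDL
BEGIN
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE', FALSE);
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'TABLESPACE', FALSE);
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'SEGMENT_ATTRIBUTES', FALSE);
  DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM, 'SQLTERMINATOR', TRUE);
END;
/

-- Generate the DDL for every table in the current schema
SELECT DBMS_METADATA.GET_DDL('TABLE', table_name) FROM user_tables;
```

The output is close to ANSI SQL, but review it for Oracle-specific data types and syntax before running it in Snowflake.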

Here are some additional tips to be aware of while building DDL scripts for creating tables and other objects:

  • Object Naming. Snowflake uses a different object naming scheme, Database.Schema.Object, as compared to Oracle’s Schema.Object@dblink, where the @dblink notation refers to a remote database accessed through a database link in cross-database queries.
  • Synonyms. Snowflake does not provide synonyms, which are frequently used in Oracle. One common reason to use synonyms in Oracle is to hide the database link; since this will not work in Snowflake, you will have to write out the full object name, including the database name.
  • Storage Clauses. Snowflake takes care of everything related to the physical storage of data, so as a user you don’t define indexes, partitions, or storage clauses such as the tablespace name. You need only the basic DDL: CREATE TABLE, CREATE VIEW, and so on. Snowflake stores data in micro-partitions without data files, table partitions, or sub-partitions, and storage grows and shrinks as you add or delete data without any intervention from users. Snowflake also has no concept of a tablespace, so you can execute your DDL statements for creating tables without storage clauses.
  • Indexes. Traditional relational databases such as Oracle use indexes for better performance. Snowflake’s MPP architecture performs parallel processing behind the scenes, so there is no need for parallel-processing hints as in Oracle. Because Snowflake doesn’t use indexes, leave index definitions out of your DDL scripts.
  • Constraints. Oracle enforces primary key and foreign key constraints. Snowflake supports the syntax to define primary and foreign keys, but does not enforce them. If you want to verify that the data is consistent with the constraints, you will need to write additional checks. The primary key and foreign key syntax exists in Snowflake mainly so that the constraints are recognizable when you reverse engineer a data model.
  • Sequences. Snowflake supports sequences, so these will migrate seamlessly. Remember to set the starting value of each sequence after migration if you wish to continue from the next value. As in Oracle, you can use sequences as default values on table columns, and identity columns are supported as well.
  • Materialized views. Materialized views are supported in Snowflake and work similarly to Oracle’s, with similar limitations: a materialized view can be based only on a relatively simple query, so you can’t create one if the query uses joins, certain aggregations or functions, or is otherwise too complex.
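Several of the points above can be illustrated with a minimal Snowflake DDL sketch (the database, table, and column names are hypothetical):

```sql
-- Snowflake: no tablespaces, indexes, or partitioning clauses are needed
CREATE TABLE sales.public.orders (
    order_id    NUMBER(38,0) NOT NULL,
    customer_id NUMBER(38,0),
    order_date  TIMESTAMP_NTZ,
    CONSTRAINT pk_orders PRIMARY KEY (order_id),           -- documented, not enforced
    CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id)
        REFERENCES sales.public.customers (customer_id)    -- documented, not enforced
);

-- Restart a migrated sequence just above the highest value used in Oracle
CREATE OR REPLACE SEQUENCE sales.public.order_id_seq START = 1000001;

-- Because primary keys are not enforced, check uniqueness with your own query
SELECT order_id, COUNT(*)
FROM sales.public.orders
GROUP BY order_id
HAVING COUNT(*) > 1;
```

NOT NULL is the one constraint Snowflake does enforce; checks like the duplicate query above can be scheduled as part of data quality validation after each load.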


Mapping Data Types

As previously mentioned, Snowflake supports ANSI SQL, but there are certain specifics. Snowflake supports most basic SQL data types for use in columns, local variables, expressions, parameters and any other relevant locations. Data types are automatically coerced whenever necessary and possible.

A comprehensive list of data types supported by Snowflake is available in the Snowflake documentation. Most data types will convert seamlessly from Oracle to Snowflake. However, you may have to make some adjustments:

  • Numeric Data Types. Snowflake supports all common numeric data types to store numeric values, including integer, fixed-point, and floating-point. One important consideration when converting from Oracle is that Oracle allows the NUMBER data type to be defined without specifying a precision and scale. In such cases, Oracle stores each value as given, preserving any decimal places. In Snowflake, if precision and scale aren’t specified, NUMBER defaults to NUMBER(38, 0), that is, a scale of 0. The consequence is that data from an unconstrained Oracle NUMBER column would be stored in Snowflake as whole numbers, without decimals. Some up-front data analysis is required to verify the precision and scale of the data in Oracle and to define the NUMBER data types in the Snowflake DDL scripts accordingly.
  • String Data Types. Snowflake supports the usual string and binary data types, and all character string data types are stored as variable-length strings. The frequently used VARCHAR2 data type in Oracle is also recognized in Snowflake and stored internally as VARCHAR, so DDL scripts can remain unchanged with respect to string data types.
  • Date and Time Data Types. The DATE data type in Snowflake holds only a date, while Oracle’s DATE includes a time component as well. When migrating DATE values from Oracle, it is recommended to use TIMESTAMP_NTZ as the Snowflake data type to preserve the time component; otherwise you risk losing the time values. Again, update the DDL scripts accordingly. The RR date format model, which Oracle uses to infer the century from a two-digit year, is not supported in Snowflake. Instead, the YY format model behaves similarly in Snowflake: when a two-digit year between 00 and 69 is supplied, it translates to 2000 through 2069; otherwise 19 is used as the first two digits of the year.
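Putting these adjustments together, here is a sketch of how a hypothetical Oracle table definition might be mapped to Snowflake, with the precision and scale choices assumed to come from profiling the source data:

```sql
-- Oracle definition (source):
-- CREATE TABLE measurements (
--     reading_id   NUMBER,            -- unconstrained: stores decimals as given
--     reading_val  NUMBER,            -- profiling showed values like 12.345
--     description  VARCHAR2(200),
--     measured_at  DATE               -- Oracle DATE includes a time component
-- );

-- Snowflake equivalent (target):
CREATE TABLE measurements (
    reading_id   NUMBER(38,0),         -- identifiers are whole numbers; (38,0) is also the default
    reading_val  NUMBER(18,6),         -- precision and scale chosen from data profiling
    description  VARCHAR(200),         -- VARCHAR2 would also be accepted and stored as VARCHAR
    measured_at  TIMESTAMP_NTZ         -- preserves the time portion of Oracle DATE
);
```

The key decision is the one that can silently lose data: leaving reading_val as a bare NUMBER would truncate its decimals to whole numbers in Snowflake.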

Maja Ferle
Senior Consultant and Snowflake Data Superhero