Recipe for a Successful On-Prem to Snowflake Migration
5.01.2021
Migration to Snowflake from A to Z series, part 1
About the series
As businesses began collecting massive amounts of data, rigid and costly legacy data warehouse appliances could no longer support all their analytical requirements. Anyone who wants to keep up with today's evolving data world is left with almost no choice but to modernize their existing environment. The most common decision is to move to the cloud, which is often also the most sensible one.
There is no such thing as a simple on-premises-to-cloud migration. According to Gartner, 83% of all data migration projects either fail or exceed their budgets and timelines. Regardless, data migrations are a fact of life and can no longer be avoided, but luckily, we can be well prepared for them.
With this in mind, we have written a series of articles on the subject to help companies understand the different aspects of a migration project and set realistic expectations. We begin by summarizing the key topics you should take into consideration during the preparation phase to achieve the best outcome. Each topic is explained in more detail in a related article.
What you’ll read in this series can be applied to any migration to the cloud, though we will focus on migration to the Snowflake platform. We have written extensively about its many benefits here, here and here, and to this day we continue to be impressed by Snowflake's constant evolution and the exciting new features of its platform. It started off as a data warehouse built for the cloud and grew into a powerful Cloud Data Platform that never ceases to impress with its speed, elasticity, ease of use, a wide range of capabilities (such as analysing structured and semi-structured data together), secure data sharing and data exchange, the recently launched data marketplace and, to top it off, an effective pricing model.
Recipe for a Successful Data Warehouse Migration
Data warehouse migrations can be risky, costly, time-consuming, and challenging. To help avoid an unwanted outcome, here are four key topics you should thoroughly consider before signing a contract.
Migration Goals and Strategy
There are many compelling reasons why organizations around the world are moving to Snowflake, from gaining limitless scalability and high performance to lowering the infrastructure cost and taking advantage of new big data capabilities.
Be aware that not all pathways will get you to the point where you can harness all these benefits. There is no one-size-fits-all migration strategy; the most suitable approach depends on the primary goals you would like to achieve. For example, if you are running out of storage and have a tight deadline because you don't want to invest in costly new hardware, then a lift-and-shift of your existing data warehouse as-is would be a better starting point than a complete re-engineering.
However, if your main goal is to take advantage of the cloud-native capabilities that Snowflake has to offer, a redesign will be hard to avoid. For example, using an existing on-prem data integration tool for data transformation will not exploit the potential of cloud computing power as fully as it could.
Furthermore, since you are moving to Snowflake, you will want to start taking advantage of its virtually unlimited computing power and storage right away. For this reason, an ELT approach is far more beneficial than traditional ETL, since it allows you to quickly load all your data into the Snowflake data lake and perform all the transformations there. You can also take advantage of solutions provided by the Snowflake community, or of Snowpipe, Snowflake's continuous data ingestion service. In short, Snowpipe provides a "pipeline" for loading fresh data in micro-batches as soon as it's available and, as such, makes near-real-time reporting a breeze.
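The ELT pattern described above can be sketched in a few lines. This is a minimal illustration, not a production setup: the pipe, table, and stage names are hypothetical, and the helper functions simply compose DDL strings following Snowflake's documented CREATE PIPE / COPY INTO syntax — the "EL" lands raw files in Snowflake, and the "T" then runs inside Snowflake on the loaded data.

```python
# Sketch of the ELT pattern with Snowpipe. All object names
# (sales_pipe, raw_sales, sales_stage, clean_sales) are hypothetical
# placeholders; substitute your own stages and tables.

def build_snowpipe_ddl(pipe: str, table: str, stage: str) -> str:
    """Compose a CREATE PIPE statement that continuously loads raw
    files from a stage into a landing table (the 'EL' of ELT)."""
    return (
        f"CREATE PIPE {pipe} AUTO_INGEST = TRUE AS "
        f"COPY INTO {table} FROM @{stage} "
        f"FILE_FORMAT = (TYPE = 'JSON')"
    )

def build_transform_sql(target: str, source: str) -> str:
    """Compose the 'T' of ELT: a transformation that runs entirely
    inside Snowflake on the already-loaded semi-structured data."""
    return (
        f"CREATE OR REPLACE TABLE {target} AS "
        f"SELECT raw:customer_id::STRING AS customer_id, "
        f"raw:amount::NUMBER AS amount "
        f"FROM {source}"
    )

if __name__ == "__main__":
    print(build_snowpipe_ddl("sales_pipe", "raw_sales", "sales_stage"))
    print(build_transform_sql("clean_sales", "raw_sales"))
```

In a real migration these statements would be executed once against your Snowflake account; with AUTO_INGEST enabled, Snowpipe then picks up new files from the stage without any on-prem scheduler involved.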
Proof of Concept
Before you choose your future data platform, make sure you have run a POC use case on your own data. This will allow you to grasp the value a new solution can bring, and tangible results will build confidence among stakeholders and users. Nevertheless, choose the use case carefully, so that it assures you the chosen platform is capable of supporting your requirements. Snowflake offers a free trial on which you can test your most complex workloads, including those that could not be processed on traditional infrastructure due to limited compute resources, as well as capabilities such as semi-structured data support, zero-copy cloning and workload isolation to avoid resource contention, to name a few.
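Two of the POC-friendly capabilities mentioned above can be shown concretely. The sketch below composes the corresponding DDL as strings, with hypothetical object names: a zero-copy clone gives you an instant, writable copy of a table without duplicating storage (ideal for POC experiments), and a dedicated virtual warehouse isolates POC queries from other workloads so they never compete for compute.

```python
# Sketch of two Snowflake POC features as DDL strings.
# Object names (clean_sales_poc, clean_sales, poc_wh) are hypothetical.

def clone_table_ddl(clone: str, source: str) -> str:
    """Zero-copy clone: a writable copy created instantly, sharing
    storage with the source until either side is modified."""
    return f"CREATE TABLE {clone} CLONE {source}"

def isolated_warehouse_ddl(name: str, size: str = "XSMALL") -> str:
    """A dedicated virtual warehouse for the POC, so its queries are
    isolated from other workloads; it suspends itself when idle."""
    return (
        f"CREATE WAREHOUSE {name} WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = 60 AUTO_RESUME = TRUE"
    )

if __name__ == "__main__":
    print(clone_table_ddl("clean_sales_poc", "clean_sales"))
    print(isolated_warehouse_ddl("poc_wh"))
```

Both statements can be run as-is on a free-trial account, which makes them convenient first steps when validating a POC use case.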
Data Migration Plan
The complexity of data migration projects varies, but a large-scale project can take up to several years to complete. Devoting enough time to preparing a good data migration plan that minimizes disruptions and mitigates risks is crucial. Here is a short checklist of the most important things to keep in mind; the topic will be covered in more detail later in the series.
It is not only the data that needs to be migrated: there are multiple sources and DW staging areas, ETL processes in a variety of forms, metadata, security schemas and authorization privileges, orchestration processes, BI tools, users, and other data consumers. None of them should be forgotten.
If possible, avoid a Big Bang approach. Instead of a single, late go-live, you can mitigate risk by breaking the scope into several independent parts that can be migrated and released in phases.
In the case of large projects, a two-phased architecture should be taken into consideration, as it may bring better value by letting you reap cloud-based benefits sooner.
Following a selected approach, the migration process should be clearly defined with a detailed checklist and go-live strategy, supported by strong project management.
Different skill sets are required during a migration project, from specialists who know all the technical details of the existing ETL workload and data sources, to business users who know the data itself and will perform user acceptance testing as the final step before production deployment.
Common Migration Challenges
Regardless of the project size, the challenges are very similar, so it is important to know what may come along the way.
Unrealistic scope, timeline, and budget.
The preconditions for a well-made data migration plan and a realistic scope, timeline and budget are a detailed analysis and a full assessment of your existing system. Back-of-the-envelope estimates can create a completely distorted picture of the required effort, so be aware of the consequences that may result from neglecting this step.
As mentioned previously, the migration will touch everything from data sources and ETL procedures to BI applications and user permissions, and everything in between. The more customized the existing system is and the longer it has been in use, the more attention it will need. Be warned: this is exactly when all the skeletons come out of the closet.
Lack of human resources or knowledge.
Missing knowledge about any component involved in the migration, legacy application developers who are no longer available, or busy project team members occupied with other activities can all have a major effect on meeting deadlines. Don't underestimate the effort that will be required on their part.
Poor data quality.
If the selected approach includes data warehouse modernization, be prepared for a parallel project dedicated to data quality.
Subject matter expertise.
Data migration is not something a company does regularly, so it is important to be informed about modern data migration approaches when hiring external consultants.
Katja Jurkovšek, Senior Consultant
Curious to learn more? Don’t miss our upcoming webinar Your Ultimate Guide to Migration to Snowflake.