
Maintaining a Mapping Table in the Snowflake Native App

7.03.2024

Forecasting with Snowflake Cortex ML-based Functions

Predicting sales demand based on retail site search terms

Photo by Andrea Schettino on Pexels

According to the Snowflake documentation, Cortex ML-based functions use machine learning to detect patterns in your data. You don’t have to be a machine learning developer to take advantage of them.

I thought it would be fun to explore the forecasting functions directly in SQL, which can be done quickly and easily from Snowsight worksheets. To get some sample data, I browsed the Snowflake Marketplace, where I found the Onsite Search — Product demand analysis on retail sites and marketplaces dataset provided by Similarweb Ltd. They offer a free version of their dataset, which I added to my Snowflake account under the database name ONSITE_SEARCH. It contains a schema named DATAFEEDS and a table named ON_SITE_SEARCH.

The description of this data set says that it can be used to understand which brands and products consumers are searching for to keep up with market demand. Based on this data, I can use the Cortex ML-based forecasting functions in Snowflake to predict future demand.

Preliminary data analysis

With the sample data ready, I did some preliminary data analysis. Using a few simple SQL queries, I discovered that the free dataset covers keyword searches on Amazon.com in the US from January 1, 2021 to June 30, 2022.
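For example, a quick query along these lines shows the date range and size of the free dataset (a sketch using only the DATE column; the columns describing the site and country can be inspected the same way):

-- inspect the date range and overall size of the free dataset
select min(date) as first_date,
       max(date) as last_date,
       count(*)  as total_rows
from ONSITE_SEARCH.DATAFEEDS.ON_SITE_SEARCH;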

Here is an overall representation of all keyword searches by day for the entire dataset using the Chart feature in Snowsight worksheets to visualize the results:

-- all keyword searches by day
select date, count(*) as cnt 
from ONSITE_SEARCH.DATAFEEDS.ON_SITE_SEARCH 
group by all;
All keyword searches by day

It is obvious from the diagram that there is an outlier on June 21, 2021. After some digging I found out that this was Amazon Prime Day, an annual shopping holiday sponsored by Amazon. Another outlier can be spotted between November 20 and 25, 2021, the period leading up to the Black Friday sales.

The keyword search volume also appears to be larger than usual during the month of December, when shoppers typically look for Christmas gifts. There is a noticeable dip on December 24, 2021, which is Christmas Eve, when the gifts have already been bought and consumers probably spend more time with their families and less time shopping online. There also appears to be a general downward trend in keyword searches since the beginning of 2021, presumably because the Covid-era surge in online shopping has been gradually subsiding.
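One way to do that digging is to rank the days by total search volume; a sketch like the following surfaces the Prime Day and Black Friday peaks at the top:

-- top 10 days by total keyword searches, to spot the outliers
select date, count(*) as cnt
from ONSITE_SEARCH.DATAFEEDS.ON_SITE_SEARCH
group by all
order by cnt desc
limit 10;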

It was fun also looking at a few individual keyword searches, for example toys:

-- keyword searches for toys by day
select date, sum(calibrated_visits) as cnt_toys
from ONSITE_SEARCH.DATAFEEDS.ON_SITE_SEARCH
where oss_keyword ilike '%toys%'
group by all;
Keyword searches for toys by day

As we can see from the chart, in addition to the two outliers related to Amazon Prime Day and the Black Friday sales, there is also a significant peak in keyword searches during the month of December, presumably due to Christmas gift shopping, when toys are bought for children more than most other items.

Some other keyword searches display different trends, for example backpacks are in highest demand during August for back-to-school shopping. Cat litter, on the other hand, doesn’t show peaks in keyword search, because cats require litter regularly, regardless of holiday seasons or special sales events.
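A single query can put such contrasting keywords side by side; here is a sketch using conditional aggregation for the two examples above:

-- compare daily search volumes for backpacks and cat litter
select date,
       sum(iff(oss_keyword ilike '%backpack%', calibrated_visits, 0)) as cnt_backpacks,
       sum(iff(oss_keyword ilike '%cat litter%', calibrated_visits, 0)) as cnt_cat_litter
from ONSITE_SEARCH.DATAFEEDS.ON_SITE_SEARCH
group by all;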

But enough exploring; let's move on to forecasting. I will forecast future demand for toys based on historical keyword searches.

Forecasting toy demand

Let's start with a basic time-series forecasting model. To create a forecast, the model requires historical data from which it produces univariate predictions of future values.

The historical data must include:

  • a timestamp column, in our case the date of the search
  • a value column, in our case the number of keyword searches

Before we can start forecasting, let’s do some prep work:

-- create a role that will do the forecasting
use role useradmin;
create role data_scientist;

-- grant the role to myself so I can test it
grant role data_scientist to user <my current user>;

-- grant privileges on the historical data database to the new role
use role accountadmin;
grant imported privileges on database onsite_search to role data_scientist;

-- create a warehouse, database, and schema and grant privileges
use role sysadmin;
create database forecast_db;
grant usage on database forecast_db to role data_scientist;
create schema time_series;
grant all on schema time_series to role data_scientist;
create warehouse forecast_wh with warehouse_size = 'xsmall';
grant usage on warehouse forecast_wh to role data_scientist;

-- continue working with the newly created role
use role data_scientist;
use warehouse forecast_wh;
use database forecast_db;
use schema time_series;

The Python code of the Streamlit application first retrieves the Snowflake session so that it can issue commands to Snowflake. It then builds a Streamlit form containing a data frame editor and a Submit button. The data frame editor is initially populated from the SIMPLE_MAP table that was created in the setup script.

The user can add and update data in the data frame editor and, once they are done, press the Submit button. The contents of the edited data frame are then written back to the SIMPLE_MAP table with the overwrite=True option, which means that the existing data in the table is replaced with the new data.

This part of the code was not working initially, until I figured out (using debug mode and looking at Query History) that when a data frame is overwritten, the data is first written to a temporary table, then the target table is dropped and the temporary table is renamed to the target table, causing the grant to the APP_PUBLIC application role to disappear. For this reason, the grant must be issued again each time the data is written to the SIMPLE_MAP table.
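As a sketch, the statement that has to be re-issued after every overwrite looks roughly like this (assuming the table lives in the application's CORE schema and that SELECT is the privilege the consumer needs):

-- re-grant access to the application role after the overwrite recreates the table
grant select on table core.simple_map to application role app_public;

With the code files uploaded to the stage, the application itself is created from the application package: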

CREATE APPLICATION simple_mapping_app
  FROM APPLICATION PACKAGE simple_mapping_app_pck
  USING '@simple_mapping_app_pck.STAGE_CONTENT.simple_mapping_app_stage';

I tested the application in the same account where I developed it and after many retries, tweaks, and bug fixes, it was finally working! Here it is:

The user can edit the data in the Streamlit data frame editor and save the data to a table in Snowflake. The data can then be accessed using a SQL SELECT statement:

SELECT FROM_CODE, TO_CODE
FROM SIMPLE_MAPPING_APP.CORE.SIMPLE_MAP;

Giving the following result:

In a real situation, this data could then be accessed by an ETL process and loaded into the data warehouse. Of course, this application is just a prototype and could use some tweaking before it is production-ready, particularly by adding more functionality such as allowing users to delete records (or mark them as deleted), checking codes for uniqueness so that users don't enter duplicates, possibly validating target codes against a preexisting table, and so on.

Recommendations for working with Native Apps

While developing my application, I found it most convenient to have four windows open:

  1. Visual Studio Code where I maintained all of my code files
  2. Command line window with SnowSQL running, where I executed the PUT commands to upload files to the Snowflake internal stage (hopefully Snowflake Git integration will be available soon, which should eliminate this step); a sketch of these commands follows this list
  3. A Snowsight window with a worksheet where I created the application package and the application, and dropped and recreated the application each time I changed the code. I also used this window and additional worksheets to debug my application by viewing objects and Query History.
  4. Another Snowsight window where I tested the application by executing it each time I recreated it. It was useful to have this additional Snowsight window so that I didn’t have to jump from the application to the worksheets and back again within a single window.
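For reference, the PUT commands mentioned in step 2 looked roughly like the sketch below (the local paths and file names are hypothetical placeholders for a typical Native App layout with a manifest, a setup script, and the Streamlit code; the stage name is the one used when creating the application):

-- upload the application files to the application package stage (hypothetical local paths)
put file:///my_project/manifest.yml @simple_mapping_app_pck.stage_content.simple_mapping_app_stage overwrite=true auto_compress=false;
put file:///my_project/setup.sql @simple_mapping_app_pck.stage_content.simple_mapping_app_stage overwrite=true auto_compress=false;
put file:///my_project/streamlit/mapping_app.py @simple_mapping_app_pck.stage_content.simple_mapping_app_stage/streamlit overwrite=true auto_compress=false;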

Since a Native App has many moving parts and many possible points of failure, testing can be difficult. A very useful feature is the ability to run the application in debug mode. This can be activated using the following command:

ALTER APPLICATION simple_mapping_app SET DEBUG_MODE = TRUE;

When the application is executed in debug mode, it will show you all objects in the application, even those that would otherwise not be visible to the consumer. This is very helpful in troubleshooting grants on objects. Additionally, you will be able to see all the commands that were executed by the application in the Query History window (normally, an application hides these commands from the consumer), including any failed commands which is again helpful in troubleshooting your application’s code logic.

When developing my application, I found it easier not to create any versions, because versioning would just add complexity while I was still developing and debugging my code. Each time I changed my code, I simply dropped and recreated the application. Once I had the application up and running, that became my initial version 1.0.
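In practice, the development loop was a simple drop-and-recreate cycle, and once the code was stable, a version could be added to the application package. A sketch of both steps (the version label v1_0 is just an example):

-- development loop: drop and recreate the application after each code change
drop application if exists simple_mapping_app;
create application simple_mapping_app
  from application package simple_mapping_app_pck
  using '@simple_mapping_app_pck.STAGE_CONTENT.simple_mapping_app_stage';

-- once stable, register the code on the stage as a named version of the package
alter application package simple_mapping_app_pck
  add version v1_0 using '@simple_mapping_app_pck.STAGE_CONTENT.simple_mapping_app_stage';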

While testing the application, I executed it in the same account where I was developing. Just like I don’t deploy any untested code, I didn’t publish the application to the Marketplace to be shared with other accounts until I was sure it was completely done and working as expected.


Maja Ferle
Senior Consultant and Snowflake Data Superhero