Seamlessly connect Dropbox to Snowflake with Openflow
17.06.2025
Snowflake’s Openflow simplifies the cumbersome job of accessing and uploading files into Snowflake for various AI use cases, such as building RAG applications
Many AI and data applications in Snowflake start with accessing files, such as PDF files, Word documents, or plain text. The first step is usually to upload the files into a Snowflake stage, which can be a cumbersome and manual process.
One example of an AI application is a RAG (retrieval augmented generation) system, where users interact with their own documents by asking questions in natural language. I described such a solution in one of my previous posts titled Using Snowflake Cortex with Anthropic Claude to Find Information in Home Appliance Manuals. In that example, I uploaded the files manually to a Snowflake internal stage.
Now, with Openflow, Snowflake’s newest integration service, we can retrieve files directly from platforms like Dropbox, SharePoint, Google Drive, Box, and many more. Openflow connects any data source to any destination with hundreds of processors supporting structured and unstructured text, images, audio, video, and sensor data, and it lets us fully automate this process for continuous ingestion.
Let’s walk through the steps required to build an Openflow data flow that takes files from a designated folder in Dropbox and saves them in a Snowflake internal stage for later use in a RAG application.
Introducing Openflow
The Openflow service is built on Apache NiFi and currently available in Snowflake AWS commercial regions. Some of the key features and benefits of Openflow are:
- It serves as a data integration platform that enables data engineers to build ETL processes
- It is extensible, allowing you to build and customize processors from any data source to any destination
- It ensures security, compliance, and observability for data integration
- It lets you ingest structured and unstructured data, in both batch and streaming modes
- It supports continuous ingestion of multimodal data for AI processing
The main components of Openflow are:
- Deployment: this is where data flows execute in individual runtime environments
- Runtime environment: these host your data flows. You can have multiple runtime environments within a deployment to isolate different projects or teams.
- Control plane: this layer contains the Openflow service and API
To set up Openflow, a cloud administrator creates a deployment and a runtime environment. Additional information on how to set up Openflow is available in the Snowflake documentation.
Once the setup is complete, you will have a deployment and a runtime environment where you can start building your data flows.
We will now walk through creating and running a flow that accesses files in a Dropbox folder and saves them in a Snowflake stage. But first, let’s set up our Dropbox.
Set up Dropbox
In your Dropbox account (you can sign up for free if you don’t have one already), create a folder that will store your files and upload some sample files. In this example, I named my folder manuals and uploaded a few PDF files.
Then go to the App Console in your Dropbox account to set up a scoped app. Create a new app with the following permissions: account_info.read, files.metadata.read, and files.content.read.
Then generate an access token. While you are there, also take note of the app key and app secret because you will need them later.
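Before moving on, you can optionally confirm that the access token and scopes work. This is not part of the Openflow setup itself, just a quick sanity check using the official Dropbox Python SDK; the token value is a placeholder and the folder name matches the manuals folder created above.

```python
# Optional sanity check: list the Dropbox folder with the scoped app's access token.
# Requires the official SDK: pip install dropbox
import dropbox

ACCESS_TOKEN = "<your-dropbox-access-token>"  # placeholder, generated in the App Console

dbx = dropbox.Dropbox(ACCESS_TOKEN)

# Confirms the account_info.read scope works
account = dbx.users_get_current_account()
print(f"Connected as: {account.name.display_name}")

# Confirms the files.metadata.read scope works on the /manuals folder
for entry in dbx.files_list_folder("/manuals").entries:
    print(entry.name)
```

If this prints your account name and the uploaded files, the app is ready to be used from Openflow.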
Create a data flow in Openflow
In the Snowsight UI, click Openflow which you will find under the Data option in the left navigation menu. Click the Launch Openflow button at the top right and sign in. You should have a deployment and a runtime environment available.
Click the name of your runtime environment. This will open your Apache NiFi canvas where you will configure your data flow. Before you begin, it helps if you know the basic Apache NiFi building blocks, such as process groups, processors, and controllers. You can find many Apache NiFi tutorials on the internet, or start with the Getting Started with Apache NiFi guide.
To better organize the data flows, we will contain each flow inside a process group. Create one by dragging and dropping the Process Group icon onto the canvas, and give it a name, for example Dropbox_to_Snowflake.

Double-click this process group to open another empty canvas where we will build the flow.
Our flow will consist of the following processors:
- ListDropbox — to list files in a Dropbox folder
- FetchDropbox — to fetch files from Dropbox
- PutSnowflakeInternalStageFile — to put the fetched files into a Snowflake internal stage
You can drag and drop each of these processors onto your canvas and connect them in sequence, adding a funnel at the end to terminate the flow. Since we haven’t implemented any error handling yet, route the failure relationships to the same funnel for now. Your flow should resemble the one shown below:

After creating this flow, you will notice that all processors have a yellow triangle, indicating that they have not been configured yet. Let’s now proceed with the configuration.
Configure the processors
Starting with the ListDropbox processor, double-click it and open the Properties pane. Then configure these properties:
- Dropbox Credential Service: click the three dots, choose the + Create new service option, and create the StandardDropboxCredentialService service (we will configure this service later)
- Folder: enter /manuals because we will be listing files from this folder in Dropbox
- Listing Strategy: you can leave the default Tracking Timestamps option or change it, depending on how you want the listing to behave. Once the data flow is in production, you will want to list only files newly added to the Dropbox folder, to avoid re-uploading files that were already there. But while you are developing and testing, you will run the process many times, so you might prefer the No Tracking option, which lists all files in the folder every time the process runs.
Leave the remaining values with their default settings:

Next, double-click the FetchDropbox processor and configure these properties:
- Dropbox Credential Service: you can reuse the same credential service you created in the previous processor by clicking the Value field and selecting it from the drop-down list.
- File: the ID of the Dropbox file to fetch. Here we use the ${dropbox.id} attribute returned by the ListDropbox processor.
Leave the remaining values with their default settings:

Finally, double-click the PutSnowflakeInternalStageFile processor and configure these properties:
- Snowflake Connection Service: click the three dots, choose the + Create new service option, and create the SnowflakeConnectionService service (we will configure this service later)
- Internal Stage Type: choose the Named stage type
- Database: enter the name of the Snowflake database where you will put the files
- Schema: enter the name of the schema in your Snowflake database where you will put the files
- Stage: enter the name of the internal stage where you will put the files
Leave the remaining values with their default settings:
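Note that the database, schema, and internal stage must already exist before the processor can write to them. If you still need to create them, here is a minimal sketch using the Snowflake Python connector; the names DROPBOX_DB and FILES are illustrative (only the stage name DROPBOX_STAGE appears later in this post), and the connection parameters are placeholders.

```python
# Minimal sketch: create the database, schema, and named internal stage that the
# PutSnowflakeInternalStageFile processor will write to. Object names are examples.
# Requires: pip install snowflake-connector-python
import snowflake.connector

conn = snowflake.connector.connect(
    account="<your-account-identifier>",  # placeholder
    user="<your-user>",                   # placeholder
    password="<your-password>",           # or configure key pair authentication
    role="<your-role>",                   # placeholder
)

ddl = [
    "CREATE DATABASE IF NOT EXISTS DROPBOX_DB",
    "CREATE SCHEMA IF NOT EXISTS DROPBOX_DB.FILES",
    # A directory table and server-side encryption keep the stage usable by
    # downstream document processing (for example Cortex document functions).
    """CREATE STAGE IF NOT EXISTS DROPBOX_DB.FILES.DROPBOX_STAGE
         DIRECTORY = (ENABLE = TRUE)
         ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')""",
]

with conn.cursor() as cur:
    for statement in ddl:
        cur.execute(statement)
conn.close()
```

You can of course run the same CREATE statements directly in a Snowsight worksheet instead.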

Before we can run the flow, we must configure the services.
Configure the controller services
Right-click on your canvas and choose Controller Services. This will take you to the list of your controller services, where you should see the two services you created earlier: StandardDropboxCredentialService and SnowflakeConnectionService. Because they have not been configured yet, you will see each of them in an Invalid state.
Let’s start with the StandardDropboxCredentialService. Click the three dots next to it and choose Edit. Then enter the values for the following properties:
- App Key: the app key from the Dropbox app you created earlier
- App Secret: the app secret from the Dropbox app you created earlier
- Access Token: the access token from the Dropbox app you generated earlier
- Refresh Token: follow the instructions under Additional Details of the StandardDropboxCredentialService documentation to generate a refresh token (a sketch of the token exchange is shown after this list). Tip: if you are configuring the data flow just for testing, you don’t need a refresh token, so you can skip this step for now.
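For reference, the refresh token comes from Dropbox’s standard OAuth 2.0 code flow: request an authorization code with token_access_type=offline, then exchange it for tokens. The sketch below illustrates that exchange; treat the StandardDropboxCredentialService documentation as the authoritative source, and note that the app key, app secret, and authorization code are placeholders.

```python
# Sketch of Dropbox's OAuth 2.0 code flow to obtain a refresh token.
# 1) Open this URL in a browser, approve the app, and copy the authorization code:
#    https://www.dropbox.com/oauth2/authorize?client_id=<APP_KEY>&response_type=code&token_access_type=offline
# 2) Exchange the code for tokens:
import requests

APP_KEY = "<your-app-key>"        # placeholder
APP_SECRET = "<your-app-secret>"  # placeholder
AUTH_CODE = "<code-from-step-1>"  # placeholder

resp = requests.post(
    "https://api.dropboxapi.com/oauth2/token",
    data={"code": AUTH_CODE, "grant_type": "authorization_code"},
    auth=(APP_KEY, APP_SECRET),
)
resp.raise_for_status()
print(resp.json()["refresh_token"])
```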
Save the values, then click the three dots next to the StandardDropboxCredentialService in the controller service list and Enable the service.
Next, configure the SnowflakeConnectionService with Key Pair authentication. Click the three dots next to the service name and choose Edit. Then enter the values for the following properties:
- Authentication Strategy: Key Pair
- Account: your Snowflake account identifier
- User: your Snowflake username
- Private Key Service: create a new StandardPrivateKeyService by clicking the three dots and selecting + Create new service
Leave the remaining values with their default settings. You will now see a third service added in your Controller Services list:

To configure the StandardPrivateKeyService, click the three dots next to the service name and choose Edit. Enter the value for the following property:
- Key: paste your private key, i.e. the contents of the private key file you use for key pair authentication
Save the value and go back to the services list. Now that you have everything configured, you should be able to Enable the StandardPrivateKeyService and the SnowflakeConnectionService.
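If you haven’t set up key pair authentication for your Snowflake user yet, you need to generate a key pair and register the public key before these services will validate. A minimal sketch using the Python cryptography library follows; the output file name is an example, and you can equally well use the openssl commands from the Snowflake documentation.

```python
# Sketch: generate an unencrypted RSA key pair for Snowflake key pair authentication.
# The private key (PKCS#8 PEM) is what you paste into the StandardPrivateKeyService;
# the public key body is registered on your Snowflake user with ALTER USER.
# Requires: pip install cryptography
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

private_pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)
public_pem = key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

with open("rsa_key.p8", "wb") as f:  # example file name
    f.write(private_pem)

# Snowflake expects only the base64 body of the public key:
#   ALTER USER <your-user> SET RSA_PUBLIC_KEY = '<base64 body>';
public_body = "".join(public_pem.decode().strip().splitlines()[1:-1])
print(public_body)
```

Run the ALTER USER statement in Snowsight with sufficient privileges, then paste the contents of rsa_key.p8 into the Key property above.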
With all processors and controller services configured and enabled, you can now proceed to run the data flow.
Run the data flow
Navigate back to the canvas. Your processors should no longer show the yellow triangle because everything is configured. Right-click on the canvas and choose Start to start your data flow. You should see the processors executing and files queuing between them. If everything went well, your files from Dropbox have been copied to your Snowflake internal stage. In case of errors or misconfigurations, you will see error messages describing the issues so you can address them.
After I ran my data flow with the following files in the /manuals folder in my Dropbox:

I checked the contents of my Snowflake internal stage by executing the list @DROPBOX_STAGE command and saw that the files were indeed copied to the stage:

With the data flow still running, I can now add another file to my Dropbox folder, wait a little while, and see the file appear in my internal Snowflake stage. As long as the process is running, it will continue to pick up new files as they appear in Dropbox and copy them to the Snowflake stage. The files then seamlessly become part of the RAG application or can be used for any other purpose.
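As an end-to-end check, you could also verify the stage programmatically and, if you are continuing toward the RAG use case, extract text from one of the staged PDFs. The sketch below reuses the example object names from earlier (DROPBOX_DB.FILES.DROPBOX_STAGE), uses a placeholder file name, and assumes Cortex PARSE_DOCUMENT is available in your region and that the stage uses server-side encryption as in the earlier CREATE STAGE sketch.

```python
# Sketch: confirm the files arrived and extract text from one staged PDF.
# Object names and the file name are examples, not requirements.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<your-account-identifier>",  # placeholder
    user="<your-user>",                   # placeholder
    password="<your-password>",           # placeholder
)

with conn.cursor() as cur:
    # Same check as running LIST @DROPBOX_STAGE in Snowsight
    cur.execute("LIST @DROPBOX_DB.FILES.DROPBOX_STAGE")
    for name, size, *_ in cur.fetchall():
        print(name, size)

    # Hedged: assumes SNOWFLAKE.CORTEX.PARSE_DOCUMENT is available to your account
    cur.execute(
        "SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT("
        "@DROPBOX_DB.FILES.DROPBOX_STAGE, 'sample_manual.pdf', {'mode': 'LAYOUT'})"
    )
    print(cur.fetchone()[0])

conn.close()
```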
Conclusion
With Openflow, Snowflake enables developers to connect any data source and any destination, as demonstrated in this blog post where we integrated an external file source like Dropbox with our Snowflake internal stage.
In addition to Dropbox, Openflow supports a wide and growing range of data sources and targets, including:
- Cloud file storage platforms like Google Drive, Box, SharePoint
- Enterprise content systems such as Confluence, Jira, Slack, Workday
- Many popular databases, including SQL Server, PostgreSQL, MySQL
- Streaming platforms like Kafka, Kinesis
- IoT and sensor networks for multimodal AI use cases involving video, audio, images, telemetry
Whether you’re building a RAG application or uploading documents for other AI and machine learning use cases, Openflow helps you get your data and make it available where it is needed.

Senior Consultant and Snowflake Data Superhero