Seamlessly connect Dropbox to Snowflake with Openflow
17.06.2025
Snowflake’s Openflow simplifies the cumbersome job of accessing and uploading files into Snowflake for various AI use cases, such as building RAG applications
Many AI and data applications in Snowflake start with accessing files, such as PDF files, Word documents, or plain text. The first step is usually to upload the files into a Snowflake stage, which can be a cumbersome and manual process.
One example of an AI application is a RAG (retrieval augmented generation) system, where users interact with their own documents by asking questions in natural language. I described such a solution in one of my previous posts titled Using Snowflake Cortex with Anthropic Claude to Find Information in Home Appliance Manuals. In that example, I uploaded the files manually to a Snowflake internal stage.
Now, with Openflow, Snowflake’s newest integration service, we can retrieve files directly from platforms like Dropbox, SharePoint, Google Drive, Box, and many more. Openflow connects any data source to any destination with hundreds of processors supporting structured and unstructured text, images, audio, video, and sensor data, and it lets us fully automate this process for continuous ingestion.
Let’s walk through the steps required to build an Openflow data flow that takes files from a designated folder in Dropbox and saves them in a Snowflake internal stage for later use in a RAG application.
Introducing Openflow
The Openflow service is built on Apache NiFi and currently available in Snowflake AWS commercial regions. Some of the key features and benefits of Openflow are:
- It serves as a data integration platform that enables data engineers to build ETL processes
- It is extensible, allowing you to build and customize processors from any data source to any destination
- It ensures security, compliance, and observability for data integration
- It lets you ingest structured and unstructured data, in both batch and streaming modes
- It supports continuous ingestion of multimodal data for AI processing
The main components of Openflow are:
- Deployment: this is where data flows execute in individual runtime environments
- Runtime environment: these host your data flows. You can have multiple runtime environments within a deployment to isolate different projects or teams.
- Control plane: this layer contains the Openflow service and API
To set up Openflow, a cloud administrator creates a deployment and a runtime environment. Additional information on how to set up Openflow is available in the Snowflake documentation.
Once the setup is complete, you will have a deployment and a runtime environment where you can start building your data flows.
We will now walk through creating and running a flow that accesses files in a Dropbox folder and saves them in a Snowflake stage. But first, let’s set up our Dropbox.
Set up Dropbox
In your Dropbox account (you can sign up for free if you don’t have one already), create a folder that will store your files and upload some sample files. In this example, I named my folder manuals and uploaded a few PDF files.
Then go to the App Console in your Dropbox account to set up a scoped app. Create a new app with the following permissions: account_info.read, files.metadata.read, and files.content.read.
Then generate an access token. While you are there, also take note of the app key and app secret because you will need them later.
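Before moving on, you can optionally confirm that the access token and scopes work. This is not part of the Openflow setup itself, just a quick sanity check using the official Dropbox Python SDK; the token value is a placeholder and the folder name matches the manuals folder created above.

```python
# Optional sanity check: list the Dropbox folder with the scoped app's access token.
# Requires the official SDK: pip install dropbox
import dropbox

ACCESS_TOKEN = "<your-dropbox-access-token>"  # placeholder, generated in the App Console

dbx = dropbox.Dropbox(ACCESS_TOKEN)

# Confirms the account_info.read scope works
account = dbx.users_get_current_account()
print(f"Connected as: {account.name.display_name}")

# Confirms the files.metadata.read scope works on the /manuals folder
for entry in dbx.files_list_folder("/manuals").entries:
    print(entry.name)
```

If this prints your account name and the uploaded files, the app is ready to be used from Openflow.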
Create a data flow in Openflow
In the Snowsight UI, click Openflow which you will find under the Data option in the left navigation menu. Click the Launch Openflow button at the top right and sign in. You should have a deployment and a runtime environment available.
Click the name of your runtime environment. This will open your Apache NiFi canvas where you will configure your data flow. Before you begin, it helps if you know the basic Apache NiFi building blocks, such as process groups, processors, and controllers. You can find many Apache NiFi tutorials on the internet, or start with the Getting Started with Apache NiFi guide.
To better organize the data flows, we will contain each flow inside a process group. Create one by dragging and dropping the Process Group icon onto the canvas, and give it a name, for example Dropbox_to_Snowflake.

Double-click this process group to open another empty canvas where we will build the flow.
Our flow will consist of the following processors:
- ListDropbox — to list files in a Dropbox folder
- FetchDropbox — to fetch files from Dropbox
- PutSnowflakeInternalStageFile — to put the fetched files into a Snowflake internal stage
You can drag and drop each of these processors onto your canvas and connect them in sequence, adding a funnel at the end to terminate the flow. Since we haven’t implemented any error handling yet, route the failure relationships to the same funnel for now. Your flow should resemble the one shown below:

After creating this flow, you will notice that all processors have a yellow triangle, indicating that they have not been configured yet. Let’s now proceed with the configuration.
Configure the processors
Starting with the ListDropbox processor, double-click it and open the Properties pane. Then configure these properties:
- Dropbox Credential Service: click the three dots, choose the + Create new service option, and create the StandardDropboxCredentialService service (we will configure this service later)
- Folder: enter /manuals because we will be listing files from this folder in Dropbox
- Listing Strategy: you can leave the default Tracking Timestamps option or change it, depending on how you want the listing to behave. Once the data flow is in production, you will want to list only files newly added to the Dropbox folder, to avoid re-uploading files that were already there. But while you are developing and testing, you will run the process many times, so you might prefer the No Tracking option, which lists all files in the folder every time the process runs.
Leave the remaining values with their default settings:

Next, double-click the FetchDropbox processor and configure these properties:
- Dropbox Credential Service: you can reuse the same credential service you created in the previous processor by clicking the Value field and selecting it from the drop-down list.
- File: the ID of the Dropbox file to fetch. Here we use the ${dropbox.id} attribute returned by the ListDropbox processor.
Leave the remaining values with their default settings:

Finally, double-click the PutSnowflakeInternalStageFile processor and configure these properties:
- Snowflake Connection Service: click the three dots, choose the + Create new service option, and create the SnowflakeConnectionService service (we will configure this service later)
- Internal Stage Type: choose the Named stage type
- Database: enter the name of the Snowflake database where you will put the files
- Schema: enter the name of the schema in your Snowflake database where you will put the files
- Stage: enter the name of the internal stage where you will put the files
Leave the remaining values with their default settings:
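Note that the database, schema, and internal stage must already exist before the processor can write to them. If you still need to create them, here is a minimal sketch using the Snowflake Python connector; the names DROPBOX_DB and FILES are illustrative (only the stage name DROPBOX_STAGE appears later in this post), and the connection parameters are placeholders.

```python
# Minimal sketch: create the database, schema, and named internal stage that the
# PutSnowflakeInternalStageFile processor will write to. Object names are examples.
# Requires: pip install snowflake-connector-python
import snowflake.connector

conn = snowflake.connector.connect(
    account="<your-account-identifier>",  # placeholder
    user="<your-user>",                   # placeholder
    password="<your-password>",           # or configure key pair authentication
    role="<your-role>",                   # placeholder
)

ddl = [
    "CREATE DATABASE IF NOT EXISTS DROPBOX_DB",
    "CREATE SCHEMA IF NOT EXISTS DROPBOX_DB.FILES",
    # A directory table and server-side encryption keep the stage usable by
    # downstream document processing (for example Cortex document functions).
    """CREATE STAGE IF NOT EXISTS DROPBOX_DB.FILES.DROPBOX_STAGE
         DIRECTORY = (ENABLE = TRUE)
         ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')""",
]

with conn.cursor() as cur:
    for statement in ddl:
        cur.execute(statement)
conn.close()
```

You can of course run the same CREATE statements directly in a Snowsight worksheet instead.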

Before we can run the flow, we must configure the services.
Configure the controller services
Right-click on your canvas and choose Controller Services. This will take you to the list of your controller services, where you should see the two services you created earlier: StandardDropboxCredentialService and SnowflakeConnectionService. Because they have not been configured yet, you will see each of them in an Invalid state.
Let’s start with the StandardDropboxCredentialService. Click the three dots next to it and choose Edit. Then enter the values for the following properties:
- App Key: the app key from the Dropbox app you created earlier
- App Secret: the app secret from the Dropbox app you created earlier
- Access Token: the access token from the Dropbox app you generated earlier
- Refresh Token: follow the instructions under Additional Details of the StandardDropboxCredentialService documentation to generate a refresh token (a sketch of the token exchange is shown after this list). Tip: if you are configuring the data flow just for testing, you don’t need a refresh token, so you can skip this step for now.
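For reference, the refresh token comes from Dropbox’s standard OAuth 2.0 code flow: request an authorization code with token_access_type=offline, then exchange it for tokens. The sketch below illustrates that exchange; treat the StandardDropboxCredentialService documentation as the authoritative source, and note that the app key, app secret, and authorization code are placeholders.

```python
# Sketch of Dropbox's OAuth 2.0 code flow to obtain a refresh token.
# 1) Open this URL in a browser, approve the app, and copy the authorization code:
#    https://www.dropbox.com/oauth2/authorize?client_id=<APP_KEY>&response_type=code&token_access_type=offline
# 2) Exchange the code for tokens:
import requests

APP_KEY = "<your-app-key>"        # placeholder
APP_SECRET = "<your-app-secret>"  # placeholder
AUTH_CODE = "<code-from-step-1>"  # placeholder

resp = requests.post(
    "https://api.dropboxapi.com/oauth2/token",
    data={"code": AUTH_CODE, "grant_type": "authorization_code"},
    auth=(APP_KEY, APP_SECRET),
)
resp.raise_for_status()
print(resp.json()["refresh_token"])
```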
Save the values, then click the three dots next to the StandardDropboxCredentialService in the controller service list and Enable the service.
Next, configure the SnowflakeConnectionService with Key Pair authentication. Click the three dots next to the service name and choose Edit. Then enter the values for the following properties:
- Authentication Strategy: Key Pair
- Account: your Snowflake account identifier
- User: your Snowflake username
- Private Key Service: create a new StandardPrivateKeyService by clicking the three dots and selecting + Create new service
Leave the remaining values with their default settings. You will now see a third service added in your Controller Services list:

To configure the StandardPrivateKeyService, click the three dots next to the service name and choose Edit. Enter the value for the following property:
- Key: paste your private key, i.e. the contents of the private key file you use for key pair authentication
Save the value and go back to the services list. Now that you have everything configured, you should be able to Enable the StandardPrivateKeyService and the SnowflakeConnectionService.
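If you haven’t set up key pair authentication for your Snowflake user yet, you need to generate a key pair and register the public key before these services will validate. A minimal sketch using the Python cryptography library follows; the output file name is an example, and you can equally well use the openssl commands from the Snowflake documentation.

```python
# Sketch: generate an unencrypted RSA key pair for Snowflake key pair authentication.
# The private key (PKCS#8 PEM) is what you paste into the StandardPrivateKeyService;
# the public key body is registered on your Snowflake user with ALTER USER.
# Requires: pip install cryptography
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

private_pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)
public_pem = key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

with open("rsa_key.p8", "wb") as f:  # example file name
    f.write(private_pem)

# Snowflake expects only the base64 body of the public key:
#   ALTER USER <your-user> SET RSA_PUBLIC_KEY = '<base64 body>';
public_body = "".join(public_pem.decode().strip().splitlines()[1:-1])
print(public_body)
```

Run the ALTER USER statement in Snowsight with sufficient privileges, then paste the contents of rsa_key.p8 into the Key property above.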
With all processors and controller services configured and enabled, you can now proceed to run the data flow.
Run the data flow
Navigate back to the canvas. Your processors should no longer show the yellow triangle because everything is configured. Right-click on the canvas and choose Start to start your data flow. You should see the processors executing and files queuing between them. If everything went well, your files from Dropbox have been copied to your Snowflake internal stage. In case of errors or misconfigurations, you will see error messages describing the issues so you can address them.
After I ran my data flow with the following files in the /manuals folder in my Dropbox:

I checked the contents of my Snowflake internal stage by executing the list @DROPBOX_STAGE command and saw that the files were indeed copied to the stage:

With the data flow still running, I can now add another file to my Dropbox folder, wait a little while, and see the file appear in my internal Snowflake stage. As long as the process is running, it will continue to pick up new files as they appear in Dropbox and copy them to the Snowflake stage. The files then seamlessly become part of the RAG application or can be used for any other purpose.
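As an end-to-end check, you could also verify the stage programmatically and, if you are continuing toward the RAG use case, extract text from one of the staged PDFs. The sketch below reuses the example object names from earlier (DROPBOX_DB.FILES.DROPBOX_STAGE), uses a placeholder file name, and assumes Cortex PARSE_DOCUMENT is available in your region and that the stage uses server-side encryption as in the earlier CREATE STAGE sketch.

```python
# Sketch: confirm the files arrived and extract text from one staged PDF.
# Object names and the file name are examples, not requirements.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<your-account-identifier>",  # placeholder
    user="<your-user>",                   # placeholder
    password="<your-password>",           # placeholder
)

with conn.cursor() as cur:
    # Same check as running LIST @DROPBOX_STAGE in Snowsight
    cur.execute("LIST @DROPBOX_DB.FILES.DROPBOX_STAGE")
    for name, size, *_ in cur.fetchall():
        print(name, size)

    # Hedged: assumes SNOWFLAKE.CORTEX.PARSE_DOCUMENT is available to your account
    cur.execute(
        "SELECT SNOWFLAKE.CORTEX.PARSE_DOCUMENT("
        "@DROPBOX_DB.FILES.DROPBOX_STAGE, 'sample_manual.pdf', {'mode': 'LAYOUT'})"
    )
    print(cur.fetchone()[0])

conn.close()
```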
Conclusion
With Openflow, Snowflake enables developers to connect any data source and any destination, as demonstrated in this blog post where we integrated an external file source like Dropbox with our Snowflake internal stage.
In addition to Dropbox, Openflow supports a wide and growing range of data sources and targets, including:
- Cloud file storage platforms like Google Drive, Box, SharePoint
- Enterprise content systems such as Confluence, Jira, Slack, Workday
- Many popular databases, including SQL Server, PostgreSQL, MySQL
- Streaming platforms like Kafka, Kinesis
- IoT and sensor networks for multimodal AI use cases involving video, audio, images, telemetry
Whether you’re building a RAG application or uploading documents for other AI and machine learning use cases, Openflow helps you get your data and make it available where it is needed.

Senior Consultant and Snowflake Data Superhero