antFarm is composed of two main components:

An execution engine installed in the local environment. This is where the ant colony resides and works to process the data as fast as possible.

A central metadata repository created on the target destination. We call this the Queen Ant's residence: it holds everything she needs to manage the working ants.

High-level architecture

[Diagram: the Cloud Environment and the Local Environment]

antFarm supports different target destinations

For larger enterprises or groups of companies, several distributed execution engines can share a single central repository.

Scalable parallel execution

How does antFarm work?

Environment setup

After you have installed antFarm, the first step is to establish connections to your data sources and to the target destination. antFarm uses named connection strings to databases and filesystems.
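As a rough illustration of what "named connections" could look like, here is a minimal Python sketch. It is not antFarm's actual configuration format: the `Connection` class, the `CONNECTIONS` registry, the `resolve` helper, and all names and URLs are hypothetical and only show the idea of referring to sources and targets by name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    name: str  # the name other settings refer to
    kind: str  # "database" or "filesystem"
    url: str   # connection string (placeholder values below)


# Hypothetical registry: one entry per data source, buffer, and target.
CONNECTIONS = {
    c.name: c
    for c in [
        Connection("erp_source", "database", "db://user@erp-host/erp"),
        Connection("csv_buffer", "filesystem", "file:///var/antfarm/buffer"),
        Connection("dwh_target", "database", "db://user@dwh-host/dwh"),
    ]
}


def resolve(name: str) -> Connection:
    """Look up a connection by name, failing loudly on typos."""
    try:
        return CONNECTIONS[name]
    except KeyError:
        raise KeyError(f"unknown connection name: {name!r}") from None
```

Referring to connections by name keeps credentials and hosts in one place, so the rest of the setup never repeats a connection string.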

Data load preparation

antFarm automatically:

  • Retrieves the list of tables and their definitions from the data source catalogue.
  • Creates a metadata repository that stores the definitions of application sources, table lists, and optimization rules such as partitions. With this information the Queen Ant can manage the working ants.
  • Creates target tables according to the source table definitions.
  • Converts data types based on the source and target databases, if necessary.
  • Generates SQL queries to shift 'n' lift the data.
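The preparation steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not antFarm's own code: the `TYPE_MAP` entries and the function names `convert_type`, `create_table_sql`, and `extract_sql` are hypothetical, and a real type mapping depends on the specific source and target databases.

```python
# Hypothetical source-to-target type mapping; real mappings depend on
# the source and target databases involved.
TYPE_MAP = {"NUMBER": "NUMERIC", "VARCHAR2": "VARCHAR", "DATE": "TIMESTAMP"}


def convert_type(source_type: str) -> str:
    """Convert a source type, leaving unmapped types unchanged."""
    upper = source_type.upper()
    return TYPE_MAP.get(upper, upper)


def create_table_sql(table: str, columns: list[tuple[str, str]]) -> str:
    """Build the target CREATE TABLE statement from (name, type) pairs."""
    cols = ", ".join(f"{name} {convert_type(typ)}" for name, typ in columns)
    return f"CREATE TABLE {table} ({cols})"


def extract_sql(table: str, columns: list[tuple[str, str]]) -> str:
    """Build the query that reads the table from the source."""
    names = ", ".join(name for name, _ in columns)
    return f"SELECT {names} FROM {table}"


# Column definitions as they might come from a source catalogue.
catalogue = {"customers": [("id", "NUMBER"), ("name", "VARCHAR2")]}
for table, columns in catalogue.items():
    print(create_table_sql(table, columns))
    print(extract_sql(table, columns))
```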

Configurable execution

To define the data load execution logic, you first configure the data processing flow. A data processing flow is composed of steps, e.g. extract, truncate, load and process. You can define as many steps as you like.

In addition, a parametrized workflow standardizes the process and dictates how the data processing flow behaves.

Within the workflow you assign various operations to each step. In general, there are two types of operations:

  • the extract operation, which reads data from the source and writes it to a buffer (a CSV file)
  • database processing operations, such as truncate, delete, copy, put and process.

For example, a load step could be composed of put and copy operations.
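One way to picture the step-to-operation assignment is a plain mapping, sketched below in Python. The `WORKFLOW` structure and the `validate` helper are hypothetical and do not reflect antFarm's real workflow format. Only the operation names and the "load = put + copy" example come from the text above.

```python
# The two operation types described above.
EXTRACT_OPS = {"extract"}
DATABASE_OPS = {"truncate", "delete", "copy", "put", "process"}

# Hypothetical workflow: each step maps to an ordered list of operations.
WORKFLOW = {
    "extract": ["extract"],
    "truncate": ["truncate"],
    "load": ["put", "copy"],
    "process": ["process"],
}


def validate(workflow: dict[str, list[str]]) -> None:
    """Reject steps that are empty or reference an unknown operation."""
    known = EXTRACT_OPS | DATABASE_OPS
    for step, operations in workflow.items():
        if not operations:
            raise ValueError(f"step {step!r} has no operations")
        unknown = [op for op in operations if op not in known]
        if unknown:
            raise ValueError(f"step {step!r} uses unknown operations: {unknown}")


validate(WORKFLOW)
```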

During the data processing flow, each step is assigned one or more workers. Workers, which we call ants, are tied to hardware resources: the more resources you have, the more workers can be activated, and the faster the data is processed.

Based on the metadata defined in the central repository, a separate queue is populated with tasks for each data processing flow step. Each task is then processed by a working ant according to the workflow settings.
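The queue-and-worker idea can be sketched with Python's standard library. This is a simplified illustration, not antFarm's execution engine: `run_step` and its `handler` callback are hypothetical names, and threads stand in for whatever worker mechanism the product actually uses.

```python
import queue
import threading


def run_step(step, tasks, workers, handler):
    """Fill one queue for a step and let `workers` ants drain it."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = []
    lock = threading.Lock()

    def ant():
        # Each ant keeps taking tasks until the queue is empty.
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return
            outcome = handler(step, task)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=ant) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Usage: two ants process three extract tasks; completion order may vary.
done = run_step("extract", ["customers", "orders", "items"], 2,
                lambda step, table: f"{step}:{table}")
print(sorted(done))
```

Raising `workers` lets more tasks run at the same time, which mirrors the point that more hardware resources allow more ants.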

Optimization and table partitioning

antFarm records the start and end time of every queued task. The gathered data is available in predefined reports, one of which shows load execution times.
If you are not satisfied with the execution times, one thing you can do is partition tables. antFarm then generates multiple tasks and multiple extracted CSV files for a single table, so that its parts are loaded in parallel.
An additional option is, as mentioned, to scale the hardware.
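To show how partitioning turns one table into several independent tasks, here is a minimal Python sketch. It assumes partitioning by contiguous ranges of an integer key, which is only one possible rule. The `partition_tasks` function, the task fields, and the CSV file naming are hypothetical.

```python
def partition_tasks(table, key, low, high, parts):
    """Split the key range [low, high] into up to `parts` extract tasks."""
    if parts < 1 or high < low:
        raise ValueError("need parts >= 1 and low <= high")
    total = high - low + 1
    parts = min(parts, total)  # never create empty partitions
    size, extra = divmod(total, parts)

    tasks = []
    start = low
    for index in range(parts):
        # The first `extra` partitions take one additional key each.
        end = start + size - 1 + (1 if index < extra else 0)
        tasks.append({
            "query": f"SELECT * FROM {table} "
                     f"WHERE {key} BETWEEN {start} AND {end}",
            "csv_file": f"{table}_{index + 1:03d}.csv",
        })
        start = end + 1
    return tasks


# Usage: one table becomes three tasks that can run in parallel.
for task in partition_tasks("orders", "order_id", 1, 10, 3):
    print(task["csv_file"], "<-", task["query"])
```

Each task produces its own CSV file, so separate ants can extract and load the parts at the same time.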

Out-of-the-box support

We’re constantly growing the list of supported data sources and target destinations. If you need to access data from a source which isn’t currently supported, please get in touch. antFarm can be easily extended.

Data Sources

Data Warehouse Destinations