antFarm is composed of two main components:
An execution engine installed in the local environment. This is where the ant colony resides and works to process the data as fast as possible.
A central metadata repository created on the target destination. We call this the Queen Ant's residence, where she finds everything needed to manage the working ants.
antFarm supports different target destinations
For larger enterprises or groups of companies, distributed execution engines can share a single central repository.
Scalable parallel execution
After you have installed antFarm, the first step is to establish connections to your data sources and to the target destination. antFarm uses named connection strings to databases and filesystems.
To define the data load execution logic, you first need to configure the data processing flow. A data processing flow is composed of different steps, e.g. extract, truncate, load, and process. You can define as many steps as you like.
In addition, a parametrized workflow standardizes the process and dictates how the data processing flow behaves.
Within the workflow you assign various operations to each step. In general, there are two types of operations:
For example, a load step could be composed of put and copy operations.
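To make the relationship between steps and operations concrete, a data processing flow can be pictured as an ordered list of steps, each holding its assigned operations. The sketch below is plain Python, not antFarm's actual configuration format. The `Step` class, the `operations_for` helper, and every operation name other than put and copy are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a data processing flow (hypothetical structure)."""
    name: str
    operations: list[str] = field(default_factory=list)


# Hypothetical flow. Step names follow the text; only "put" and "copy"
# are operation names taken from the text, the others are placeholders.
flow = [
    Step("extract", ["export_csv"]),
    Step("truncate", ["truncate_table"]),
    Step("load", ["put", "copy"]),
    Step("process", ["run_sql"]),
]


def operations_for(flow: list[Step], step_name: str) -> list[str]:
    """Return the operations assigned to the named step."""
    for step in flow:
        if step.name == step_name:
            return step.operations
    raise KeyError(step_name)
```

In this sketch, `operations_for(flow, "load")` returns the put and copy operations in the order they would run.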
During the data processing flow, each step is assigned one or more workers. Workers, which we call ants, are tied to hardware resources. The more resources you have, the more workers can be activated, resulting in faster data processing.
Based on the metadata defined in the central repository for each data processing flow step, separate queues are populated with tasks. Each task is then processed by a working ant according to the workflow settings.
antFarm updates queued tasks with start and end times. The gathered data is available in predefined reports. One of them displays load execution times.
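The queue-and-worker model described above, including the start and end timestamps recorded per task, can be sketched with Python's standard `queue` and `threading` modules. This is an illustrative approximation and not antFarm's implementation. The `run_step` function, the task dictionary layout, and the `handler` callback are hypothetical names.

```python
import queue
import threading
import time


def run_step(tasks, worker_count, handler):
    """Process all tasks of one step with a fixed number of worker threads."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    finished = []
    lock = threading.Lock()

    def ant():
        # Each worker ("ant") pulls tasks until the queue is empty.
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return
            task["start"] = time.time()  # start time recorded on the task
            handler(task)
            task["end"] = time.time()    # end time recorded on the task
            with lock:
                finished.append(task)

    workers = [threading.Thread(target=ant) for _ in range(worker_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return finished
```

A report of execution times could then be derived from `task["end"] - task["start"]` for each finished task. Raising `worker_count` only helps while the hardware has spare capacity, which matches the point that workers are tied to hardware resources.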
If you are not satisfied with execution times, one option is table partitioning. With partitioning, antFarm generates multiple tasks and extracted CSV files for a single table, so the load can run in parallel and achieve the best performance.
An additional option is, as mentioned, to scale the hardware.
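The table partitioning idea above can be illustrated by splitting one table into several independent tasks, each producing its own CSV file. The text does not say how antFarm chooses partitions, so the sketch below assumes partitioning by a contiguous numeric key range. The function names, the task fields, and the file naming pattern are all hypothetical.

```python
def partition_ranges(min_id, max_id, parts):
    """Split the inclusive range [min_id, max_id] into contiguous sub-ranges."""
    total = max_id - min_id + 1
    if total <= 0 or parts <= 0:
        raise ValueError("range and parts must be positive")
    parts = min(parts, total)  # never create empty partitions
    base, extra = divmod(total, parts)
    ranges = []
    start = min_id
    for i in range(parts):
        # The first `extra` partitions take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def partition_tasks(table, min_id, max_id, parts):
    """Build one extract task per partition (assumed task layout)."""
    return [
        {"table": table, "from_id": lo, "to_id": hi, "file": f"{table}_{i:03d}.csv"}
        for i, (lo, hi) in enumerate(partition_ranges(min_id, max_id, parts), start=1)
    ]
```

For example, `partition_ranges(1, 10, 3)` yields `[(1, 4), (5, 7), (8, 10)]`, so a single table would produce three tasks that separate workers can process at the same time.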
We’re constantly growing the list of supported data sources and target destinations. If you need to access data from a source which isn’t currently supported, please get in touch. antFarm can be easily extended.
Data Warehouse Destinations