This article describes the architecture of the forecasting platform, as shown in the diagram below.
As shown on the left of the diagram above, the platform includes a data collector module that continuously ingests data. Users define the timescale and horizon they wish to forecast, along with the target measures and possible influencing factors.
When a new account is created in Anodot Cost, the platform uses the Forecast API to generate a new Forecast task containing the details required to fetch that account's data from Anodot Cost.
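To make the task-creation step concrete, here is a minimal sketch of what such a Forecast task payload might look like. The field names, endpoint, and helper function are illustrative assumptions, not the documented Anodot Forecast API schema:

```python
import json

def build_forecast_task(account_id, timescale, horizon, measures, factors=None):
    # Hypothetical task payload: timescale/horizon plus target measures
    # and optional influencing factors, as described in the text.
    return {
        "accountId": account_id,
        "timescale": timescale,          # e.g. "daily" or "monthly"
        "horizon": horizon,              # number of future periods to forecast
        "targetMeasures": measures,      # cost metrics to forecast
        "influencingFactors": factors or [],
    }

task = build_forecast_task("acct-123", "daily", 30, ["unblended_cost"])
# In production this payload would be sent to the Forecast API, e.g.:
# requests.post("https://<forecast-host>/api/v1/tasks", json=task)
print(json.dumps(task))
```

The payload captures everything the pipeline needs to fetch and forecast the account's data without further user input.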
Then the platform automatically preprocesses the data by cleaning it, removing anomalies, extracting features, and preparing it for the Forecast model. The platform trains a large number of machine learning models of various types and hyperparameter configurations, such as XGBoost, linear models, and Prophet, to achieve the most accurate results. It then evaluates each model's performance and ensembles them to produce the most accurate forecast per cost metric.
The platform generates forecasts continuously and provides several ways to consume them. In Anodot Cost, the forecasts are retrieved every day through the Forecast API, and their accuracy is automatically reviewed. Models are periodically retrained to adjust them to the latest data.
The Forecast architecture can be implemented at large scale, mainly by using an orchestration engine to schedule and run the pipelines, and by distributing jobs to run in parallel on AWS Batch.
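The fan-out pattern can be sketched locally with a thread pool standing in for the job queue; in production each call would instead be submitted as a container job (e.g. via boto3's `batch.submit_job`) and tracked by the orchestration engine. The function and account IDs below are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

def train_pipeline(account_id):
    # Stand-in for one account's training pipeline; on AWS Batch this
    # would run as an independent container job.
    return account_id, f"model-{account_id}"

def run_pipelines(account_ids, max_workers=4):
    # Fan out independent training jobs in parallel, mirroring how the
    # orchestration engine dispatches them to AWS Batch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(train_pipeline, account_ids))

results = run_pipelines(["acct-1", "acct-2", "acct-3"])
```

Because each account's pipeline is independent, scaling out is simply a matter of adding compute capacity to the batch environment.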