Finding the most accurate model is a particularly time-consuming task that typically requires teams of experienced data scientists. However, even data science is not immune to the trend towards automating tasks with algorithms, and new technologies known as “AutoML” (automated machine learning) are now emerging. With Verteego, we're applying this trend to the field of decision-making.
Choosing the right algorithmic model is the key to achieving high accuracy in operational decision-making. Hundreds of models are available, many of them in open-source libraries. In its native version, Verteego offers the most frequently used algorithms (decision trees, regressions, time series, neural networks, etc.) and makes it easy to “plug in” other libraries. When training models, Verteego automatically compares the models enabled in the parameter files according to user-defined prioritization criteria.
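To make the idea of automatic model comparison concrete, here is a minimal sketch in plain Python. It is not Verteego's implementation: the function names (`fit_mean`, `fit_linear`, `compare_models`), the choice of mean absolute error as the prioritization criterion, and the toy data are all assumptions made for illustration.

```python
from statistics import fmean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    avg = fmean(ys)
    return lambda x: avg

def fit_linear(xs, ys):
    """Candidate: one-variable least-squares regression."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mae(model, xs, ys):
    """Mean absolute error, used here as the (assumed) ranking criterion."""
    return fmean(abs(model(x) - y) for x, y in zip(xs, ys))

def compare_models(candidates, train, holdout, metric=mae):
    """Fit every candidate on `train` and score it on `holdout`."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = metric(model, *holdout)
    best = min(scores, key=scores.get)  # lower error is better
    return best, scores

# Toy data: a perfectly linear relationship.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
train, holdout = (xs[:7], ys[:7]), (xs[7:], ys[7:])

best, scores = compare_models({"mean": fit_mean, "linear": fit_linear}, train, holdout)
print(best, scores)
```

Swapping the `metric` argument is one simple way to express a different prioritization criterion without touching the candidates themselves.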
Model hyperparameters (set before training, typically by the user, as opposed to parameters, which the model learns from the data) play a key role in prediction performance. The same model can therefore produce very different results depending on the set of hyperparameters selected. The user can “guide” Synapp® in the choice of hyperparameters by indicating specific ranges in the configuration file. Nevertheless, Verteego Brain natively integrates methods for selecting the most effective hyperparameters, so that good accuracy can be obtained even without manual selection.
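As an illustration of searching a user-specified range, the sketch below runs a simple grid search over the smoothing factor of an exponential smoothing forecast. The `search_space` dictionary stands in for a range given in a configuration file; its layout, the function names, and the use of grid search itself are assumptions, since the selection methods actually used by the product are not described here.

```python
def one_step_error(series, alpha):
    """Mean absolute one-step-ahead error of simple exponential smoothing."""
    level = series[0]
    total = 0.0
    for value in series[1:]:
        total += abs(value - level)  # the forecast for this step is the current level
        level = alpha * value + (1 - alpha) * level
    return total / (len(series) - 1)

def grid_search(series, alphas):
    """Score every candidate value and return the one with the lowest error."""
    scores = {alpha: one_step_error(series, alpha) for alpha in alphas}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical range, as a user might indicate it in a configuration file.
search_space = {"alpha": [0.1, 0.3, 0.5, 0.7, 0.9]}
series = [10, 12, 14, 16, 18, 20, 22, 24]

best_alpha, scores = grid_search(series, search_space["alpha"])
print(best_alpha)
```

On a steadily trending series like this one, the search favors a high smoothing factor, which shows how the same model can perform very differently depending on a single hyperparameter.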
Selecting the right explanatory variables (also known as features) for a model is one of the most time-consuming tasks in a data scientist's work. It is important to include as many relevant explanatory variables as possible, since these can increase the accuracy of predictions, without adding superfluous variables that create unnecessary “noise” and degrade the quality of predictions. Verteego Brain uses the most efficient techniques for assessing the relevance of variables, to relieve the user of this sometimes complex task.
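One simple way to assess the relevance of a variable is to measure how strongly it correlates with the target and to drop variables below a threshold. The sketch below shows this filter approach; it is only one of many possible techniques and is not claimed to be the one used by Verteego Brain. The function names, the threshold value, and the sample columns are assumptions.

```python
from math import sqrt
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def select_features(features, target, threshold=0.5):
    """Keep the features whose absolute correlation with the target reaches the threshold."""
    relevance = {name: abs(pearson(column, target)) for name, column in features.items()}
    kept = [name for name, score in relevance.items() if score >= threshold]
    return kept, relevance

# Illustrative data: "price" drives the target, "noise" does not.
features = {
    "price": [10, 12, 14, 16, 18, 20],
    "noise": [3, 1, 4, 1, 5, 1],
}
target = [100, 90, 80, 70, 60, 50]

kept, relevance = select_features(features, target)
print(kept)
```

A correlation filter only captures linear, one-variable-at-a-time relationships, so in practice it is usually combined with other relevance measures.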
Verteego Brain is not a data preparation solution (such as Talend, Trifacta or Dataiku). Nevertheless, it can sometimes be convenient to modify input data “on the fly”, without having to regenerate the underlying datasets entirely.
For this purpose, Verteego Brain makes it straightforward to set up pre-processing rules via the parameter file. For example, it is possible to generate additional variables calculated from other variables, define rules for replacing certain values, exclude outliers according to well-defined criteria, and so on.
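The three kinds of rules mentioned above (derived variables, value replacement, outlier exclusion) can be sketched as a small declarative rule set applied to each row. The `rules` structure below is purely hypothetical and does not reflect the actual format of the Verteego parameter file; the column names and bounds are invented for the example.

```python
import operator

# Arithmetic operators allowed in derived-variable rules.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def preprocess(rows, rules):
    """Apply replacement, outlier-exclusion and derivation rules to a list of row dicts."""
    cleaned = []
    for row in rows:
        row = dict(row)  # work on a copy so the input data stays untouched
        for column, mapping in rules.get("replace", {}).items():
            row[column] = mapping.get(row[column], row[column])
        in_range = all(
            low <= row[column] <= high
            for column, (low, high) in rules.get("exclude_outside", {}).items()
        )
        if not in_range:
            continue  # drop the whole row when a value is out of bounds
        for new_column, (left, op, right) in rules.get("derive", {}).items():
            row[new_column] = OPS[op](row[left], row[right])
        cleaned.append(row)
    return cleaned

# Hypothetical rule set, standing in for entries in a parameter file.
rules = {
    "replace": {"store": {"N/A": "unknown"}},
    "exclude_outside": {"quantity": (0, 1000)},
    "derive": {"revenue": ("price", "*", "quantity")},
}
rows = [
    {"store": "A", "price": 2.0, "quantity": 10},
    {"store": "N/A", "price": 3.0, "quantity": 5},
    {"store": "B", "price": 1.0, "quantity": 50000},  # outlier, excluded
]

cleaned = preprocess(rows, rules)
print(cleaned)
```

Because the rules are data rather than code, they can be changed without regenerating the underlying dataset, which is the point of modifying inputs on the fly.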
In some cases, prediction results can be anomalous, particularly when the input data is of poor quality. It then becomes necessary to correct the results by applying different types of business rules (e.g. correction of outliers, replacement of null values, etc.). Verteego Brain makes it easy to define these rules via its parameter file.
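A minimal sketch of such post-processing rules might look as follows. The function name, the fallback value, and the bounds are assumptions chosen for the example; they illustrate null replacement and outlier correction, not the product's actual rule syntax.

```python
import math

def apply_business_rules(predictions, fallback=0.0, lower=0.0, upper=None):
    """Replace missing predictions with a fallback, then clamp values to the allowed range."""
    corrected = []
    for value in predictions:
        if value is None or math.isnan(value):
            value = fallback  # rule 1: replace null or NaN results
        if lower is not None:
            value = max(value, lower)  # rule 2: no prediction below the lower bound
        if upper is not None:
            value = min(value, upper)  # rule 3: no prediction above the upper bound
        corrected.append(value)
    return corrected

raw = [12.5, None, -3.0, float("nan"), 480.0]
corrected = apply_business_rules(raw, fallback=10.0, lower=0.0, upper=100.0)
print(corrected)
```

Clamping is the bluntest form of outlier correction; a real rule might instead replace an out-of-range prediction with a historical average or flag it for review.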
Your datasets can be very heterogeneous. Depending on the input datasets used for training, one algorithm or another may be more effective. However, using different algorithms in combination is technically complex.
With Verteego Brain, this constraint is a thing of the past, as you no longer need to choose between different approaches. Depending on the type of data, Verteego Brain will combine the most efficient modeling approaches for each subset of data to achieve the best overall accuracy.
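The principle of choosing a different approach for each subset of data can be sketched as follows: every candidate is backtested on each segment, and the best-scoring one is kept for that segment. The two candidate forecasters, the segment names, and the backtest window are assumptions made for illustration and say nothing about how Verteego Brain combines models internally.

```python
from statistics import fmean

# Two deliberately simple candidate forecasters (illustrative only).
CANDIDATES = {
    "mean": lambda history: fmean(history),
    "last_value": lambda history: history[-1],
}

def backtest_error(forecast, series, start=3):
    """Mean absolute error of one-step-ahead forecasts from position `start` onwards."""
    errors = [abs(forecast(series[:t]) - series[t]) for t in range(start, len(series))]
    return fmean(errors)

def pick_model_per_segment(segments):
    """For each data subset, keep the candidate with the lowest backtest error."""
    return {
        name: min(CANDIDATES, key=lambda model: backtest_error(CANDIDATES[model], series))
        for name, series in segments.items()
    }

# Hypothetical segments with different behavior.
segments = {
    "trending": [10, 12, 14, 16, 18, 20, 22],
    "stable": [10, 14, 9, 13, 10, 14, 9, 13],
}

chosen = pick_model_per_segment(segments)
print(chosen)
```

Here the trending segment is better served by the last observed value, while the stable but noisy segment is better served by the historical mean, so a single global choice would be worse on one of the two.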