Nonium

Nonium (from the term nonius, a device that improves measurement accuracy by means of an auxiliary scale) is our smart analytics framework, covering the whole predictive modeling cycle. It is structured in smart infocells: isolated steps that analyze data, interact with other infocells and make decisions about the features to be used during the modeling process and about the process itself.

VE

Variable Exploration

This infocell decides whether a feature is nominal, ordinal or numeric when the feature type is unknown (blind datasets).

It also explores each variable's distribution, detecting singular values such as error codes or placeholder representations of missing values.
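As an illustration only (not Nonium's actual implementation), a type-inference and sentinel-detection step could look like the following Python sketch, assuming pandas Series as input; the max_levels threshold and the list of candidate sentinels are invented defaults:

    import pandas as pd

    def infer_feature_type(s: pd.Series, max_levels: int = 20) -> str:
        """Classify a column as 'numeric', 'ordinal' or 'nominal' with simple heuristics."""
        if pd.api.types.is_numeric_dtype(s):
            # Few distinct whole-number values often behave like an ordinal code.
            if s.nunique(dropna=True) <= max_levels and (s.dropna() % 1 == 0).all():
                return "ordinal"
            return "numeric"
        return "nominal"

    def flag_sentinels(s: pd.Series, candidates=(-1, 0, 9999, -9999)) -> dict:
        """Count typical placeholder values; they may encode errors or missing data."""
        return {v: int((s == v).sum()) for v in candidates if (s == v).any()}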

VR

Variable Representation

This infocell answers questions like:
  • How many observations should a level of a nominal feature have to be kept in the model (or otherwise be merged into an 'others' level)? A minimal sketch of this merging step follows below.
  • When is it necessary to reorganize the levels within a nominal feature, and how?
  • When is it better to use a one-hot representation of an ordinal or nominal variable in tree-based models?

Our experience is that good variable representation can contribute an average improvement of 5-10% in model performance.
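The rare-level merging decision can be sketched as follows; the min_count threshold and the 'others' label are illustrative defaults, not Nonium's actual rules:

    import pandas as pd

    def merge_rare_levels(s: pd.Series, min_count: int = 30, other_label: str = "others") -> pd.Series:
        """Keep levels with at least min_count observations; merge the rest into 'others'."""
        counts = s.value_counts(dropna=False)
        rare = counts[counts < min_count].index
        return s.where(~s.isin(rare), other_label)

    # One-hot representation of the cleaned feature (hypothetical column name), e.g. for tree-based models:
    # pd.get_dummies(merge_rare_levels(df["channel"]), prefix="channel")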

NS

Normalization and Scaling

This infocell transforms and re-scales numeric variables.

Depending on the model we are fitting and the nature of the variables we are working with, it might be necessary to transform some of them (like counters, quantities or elapsed times) in order to improve model performance.
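For example, heavily skewed counters or elapsed times are often log-transformed before scaling; a minimal sketch, assuming non-negative numeric inputs (the exact transformations Nonium applies depend on the model and the variable):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    def transform_and_scale(X_counts):
        """X_counts: 2-D array of non-negative counters, quantities or elapsed times."""
        X_log = np.log1p(X_counts)                     # compress heavy right tails
        return StandardScaler().fit_transform(X_log)   # re-scale to zero mean, unit variance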

FS

Feature Selection

This infocell performs variable selection for both linear and non-linear models.

The results obtained are far better than those of standard approaches to the problem (Recursive Feature Elimination, mRMR, Boruta, Lasso, ...).

We usually reach double-digit improvements in several metrics (AUC, RMSE, log loss, normalized discounted cumulative gain, mean average precision, ...) on a fresh test set.
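Nonium's selector itself is not shown here, but the comparison protocol can be sketched: run a standard baseline such as Recursive Feature Elimination and score it on a held-out test set with the same metric (the n_features value below is an arbitrary illustration, and X is assumed to be a numeric array):

    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    def rfe_baseline_auc(X, y, n_features: int = 20) -> float:
        """AUC of an RFE baseline on a fresh test set, to compare against other selectors."""
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_features)
        selector.fit(X_tr, y_tr)
        model = LogisticRegression(max_iter=1000).fit(X_tr[:, selector.support_], y_tr)
        return roc_auc_score(y_te, model.predict_proba(X_te[:, selector.support_])[:, 1])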

PT

Parameter Tuning

This infocell provides smart and fast parameter tuning, using state-of-the-art adaptive optimization approaches.

This step is critical for most of the existing models and interacts with other infocells, such as Variable Representation or Feature Selection.
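As one concrete example of adaptive optimization (not necessarily the optimizer Nonium uses), hyperparameters can be tuned with Optuna; the search ranges are illustrative and toy data stands in for a real dataset:

    import optuna
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # toy stand-in

    def objective(trial):
        params = {
            "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
            "max_depth": trial.suggest_int("max_depth", 2, 8),
            "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        }
        model = GradientBoostingClassifier(**params)
        return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

    study = optuna.create_study(direction="maximize")  # adaptive sampler proposes new trials
    study.optimize(objective, n_trials=50)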

MB

Model Benchmarking

This infocell accurately fits several models on each dataset and compares their performance (a minimal benchmarking sketch follows the list below).
  • Non-linear models:
    • Support vector machine (non-linear kernel)
    • Random Forest
    • Gradient Boosting (first-order derivatives)
    • Extreme Gradient Boosting (second-order derivatives)
    • Neural Networks
  • Linear models:
    • Support vector machine (linear kernel)
    • Elastic net (Ridge and Lasso regularization, coordinate descent)
    • Linear Extreme Gradient Boosting (second-order derivatives)
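A minimal benchmarking sketch with scikit-learn equivalents of some of the listed families (toy data stands in for a real dataset; the hyperparameters are defaults, not tuned values):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # toy stand-in

    models = {
        "svm_rbf": SVC(kernel="rbf"),
        "random_forest": RandomForestClassifier(n_estimators=500),
        "gradient_boosting": GradientBoostingClassifier(),
        "svm_linear": SVC(kernel="linear"),
        "elastic_net_logit": LogisticRegression(penalty="elasticnet", solver="saga",
                                                l1_ratio=0.5, max_iter=5000),
    }
    # Same data, same metric, same folds for every model family.
    scores = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
              for name, m in models.items()}
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))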

FI

Feature Importance

This infocell computes, for each feature, unbiased measures of its contribution to the model.

Most out-of-the-box procedures for assessing the relative importance of variables are biased and overfitted. Our method is unbiased and works with one-hot feature representations (not only with individual levels).
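Permutation importance on a held-out set is one standard, less biased alternative to impurity-based importances and illustrates the idea; it is not Nonium's exact method, and grouping the one-hot columns of a single feature would need extra handling:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # toy stand-in
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    model = RandomForestClassifier(n_estimators=500).fit(X_tr, y_tr)
    result = permutation_importance(model, X_te, y_te, scoring="roc_auc", n_repeats=10)
    # result.importances_mean holds one importance score per feature, computed on unseen data.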

MD

Marginal Dependence

This infocell plots, for each feature, its marginal contribution to the predicted response over the feature's range of variation.

This is very useful for understanding the relationship between the variables and the response in black-box models.
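Partial dependence plots are the closest off-the-shelf analogue; a minimal scikit-learn sketch on toy data (Nonium's marginal-dependence computation may differ):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import PartialDependenceDisplay

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # toy stand-in
    model = GradientBoostingClassifier().fit(X, y)

    # Marginal effect of the first two features over their ranges of variation.
    PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])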

ME

Model Ensembling

This infocell provides advanced model ensembling using adaptive optimization.

This gives us the gold-standard model and lets us compare the relative performance of each individual model against it.
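A minimal sketch of one way to do this, assuming out-of-fold probability predictions from each model: optimize convex combination weights by minimizing log loss (the optimizer and loss here are illustrative choices, not necessarily Nonium's):

    import numpy as np
    from scipy.optimize import minimize
    from sklearn.metrics import log_loss

    def ensemble_weights(pred_list, y_true):
        """pred_list: list of 1-D arrays of predicted probabilities, one array per model."""
        P = np.column_stack(pred_list)
        n_models = P.shape[1]

        def loss(w):
            return log_loss(y_true, P @ w)  # log loss of the weighted-average prediction

        constraints = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)  # weights sum to 1
        res = minimize(loss, np.full(n_models, 1.0 / n_models),
                       bounds=[(0.0, 1.0)] * n_models,
                       constraints=constraints, method="SLSQP")
        return res.x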

ML

Meta Learning

This infocell is the next step towards automatic machine learning.

We currently use a rule-based system to select initial values, options and parameter ranges for each model. In the near future this will change:

We are currently working on a meta-learning model that uses topological descriptors of the dataset to select the most suitable model family, the parameter ranges where the model will be optimal, and the variable transformations that will perform best. This will be done without the need to deploy a huge number of models.
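To make the current rule-based stage concrete, a toy sketch of how dataset descriptors could map to initial parameter ranges; the thresholds and ranges are invented for illustration, not Nonium's actual rules:

    def default_search_space(n_rows: int, n_features: int) -> dict:
        """Pick initial hyperparameter ranges from simple dataset descriptors."""
        space = {"max_depth": (2, 6), "learning_rate": (0.05, 0.3)}
        if n_rows > 100_000:
            space["learning_rate"] = (0.01, 0.1)    # more data: smaller steps, more trees
            space["n_estimators"] = (500, 3000)
        else:
            space["n_estimators"] = (100, 1000)
        if n_features > 200:
            space["colsample_bytree"] = (0.3, 0.7)  # subsample columns in wide datasets
        return space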


Contact

Write us: