
YAML reference

Continual feature sets and models can be declaratively defined as YAML.

This document describes the options available when creating YAML files.

Feature set YAML reference

For reference, the general format of the file is as follows:

type: FeatureSet
name: <feature_set_name>
entity: [entity_name]
index: <column_name>
time_index: [column_name]
description: [description]
url: [url]
owners:
  - [owner@domain.com]
  ...
documentation:
    [documentation]
columns:
  - name: <feature_1_name>
    description: [feature_1_description]
    type: <feature_1_type>
        ...
exclude_columns:
  - [list_of_column_names]
profile:
  schedule: [schedule]
  ...
query: <sql_query>
table: <table_name>

type [Required]

Must be FeatureSet.

name [Required]

The name of this feature set. Allowed characters in the name are all alphanumeric characters and '_'.

See Feature Sets & Entities for more information on naming feature sets.

entity [Optional]

The name of the entity that the feature set is a part of. If not provided, this defaults to the name of the feature set. Allowed characters in the name are all alphanumeric characters and '_'.

See Feature Sets & Entities for more information on naming entities.

index [Required]

The name of the column which identifies a member of the entity (i.e. the ID). If not using time_index this should uniquely identify records in your feature set. If using a time_index this will uniquely identify records in your feature set along with the time_index.

time_index [Optional]

If using a temporal feature set, this is the name of the column that corresponds to the timestamp which, along with the index, uniquely identifies records in your feature set.
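
As an illustration, a temporal feature set might declare both keys as shown here. This is a minimal sketch: the feature set, entity, column, and table names are all hypothetical.

```yaml
# Hypothetical temporal feature set: one row per customer per day.
type: FeatureSet
name: customer_daily_activity
entity: customer
index: customer_id          # identifies the member of the entity
time_index: activity_date   # together with index, uniquely identifies a record
table: analytics.customer_daily_activity
```

Without time_index, customer_id alone would have to be unique across all rows.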

description [Optional]

Short description of the feature set.

documentation [Optional]

Free-text documentation of the feature set. Used to help others understand how to use the feature set, the features contained therein, etc.

url [Optional]

URL related to the feature set.

owners [Optional]

Email addresses of owners. These should match their Continual accounts.

columns [Optional]

Columns are optional. If not provided, Continual infers types from the underlying data warehouse. If you wish to override types or provide descriptions for columns, you may include them in your YAML file.

name [Required]

The column name.

type [Optional]

The column data type. Supported options are:

  • Number
  • Text
  • Categorical
  • Boolean
  • Timestamp

description [Optional]

Short description of the column.
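
A columns section that overrides inferred types and adds descriptions could look like the following fragment. The column names are hypothetical; the type values come from the supported list above.

```yaml
# Fragment of a feature set file; column names are hypothetical.
columns:
  - name: plan_tier
    type: Categorical
    description: Subscription tier chosen at signup.
  - name: signup_ts
    type: Timestamp
```

Columns that are not listed keep the types that Continual infers.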

exclude_columns [Optional]

This is a list of columns in the resulting query or table to explicitly exclude from the feature set. This is mainly used if you wish to not explicitly list columns and instead let Continual infer them, but you also want to exclude a few columns from being used.
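
For instance, to let Continual infer every column except two bookkeeping fields, a fragment along these lines should suffice. The column names are hypothetical.

```yaml
# Fragment of a feature set file; column names are hypothetical.
exclude_columns:
  - internal_notes
  - etl_loaded_at
```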

profile [Optional]

schedule [Optional]

Cron-based schedule which determines how often to profile the feature set. See, for example, Cron Manual Page.
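
A daily profiling schedule might be written as shown here. This sketch assumes the standard five-field cron syntax (minute, hour, day of month, month, day of week); whether Continual accepts other cron variants is not stated here.

```yaml
profile:
  # Five-field cron expression: every day at 02:00.
  # Quoted so that YAML treats the expression as a plain string.
  schedule: "0 2 * * *"
```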

query [Optional]

A valid SQL query that will generate the feature set.

NOTE

Either query or table is required in your YAML file.
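
A multi-line SQL query is easiest to write as a YAML literal block. The sketch below assumes a hypothetical orders table and hypothetical column names.

```yaml
# Hypothetical feature set built from a query instead of a table.
type: FeatureSet
name: customer_orders
index: customer_id
query: |
  SELECT customer_id,
         COUNT(*) AS order_count,
         SUM(amount) AS total_spend
  FROM orders
  GROUP BY customer_id
```

Everything indented under the `|` indicator is passed through as the query text, so YAML comments should be kept outside the block.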

table [Optional]

A valid table name in your data warehouse that will be used as the feature set.

NOTE

Either query or table is required in your YAML file.

Quoting table names

When using quoted table names, use a YAML literal block, or wrap the whole quoted name in single quotes. Double quotes also work if the inner quotes are escaped with a backslash.

Example:

Invalid: table: "database"."schema"."table"

Valid: table: '"database"."schema"."table"'

Valid:

# The inner quotes need no escaping inside a YAML literal block.
table: |
  "database"."schema"."table"

Model YAML reference

For reference, the general format of the file is as follows:

type: Model
name: <model_name>
index: <column_name>
time_index: [column_name]
target: <column_name>
split: <column_name>
description: [description]
url: [url]
owners:
  - [owner@domain.com]
  ...
documentation:
    [documentation]
columns:
  - name: <feature_1_name>
    description: [feature_1_description]
    type: <feature_1_type>
    entity: [entity_name]
exclude_columns:
  - [list_of_column_names]
train:
  schedule: [schedule]
  metric: [metric]
  included_model_types: [list_of_model_types]
  excluded_model_types: [list_of_model_types]
  size: [size]
  plots: [list_of_plots]
  exclude_ensemble: [True/False]
  optimization: [optimization]
  disable_data_checks: [True/False]
  time_limit: [number_of_seconds]
  size_limit: [number_of_bytes]
predict:
  schedule: [schedule]
  incremental: [True/False]
promote:
  policy: [policy]
  ...
query: [sql]
table: [table_name]

type [Required]

Must be Model.

name [Required]

The name of this model. Allowed characters in the name are all alphanumeric characters and '_'.

See Models & Model Versions for more information on naming models.

index [Required]

The name of the column which identifies a member of the entity (i.e. the ID). If not using time_index this should uniquely identify records in your model spine. If using a time_index this will uniquely identify records in your model spine along with the time_index.

time_index [Optional]

If using a temporal model, this is the name of the column that corresponds to the timestamp which, along with the index, uniquely identifies records in your model.

target [Required]

The name of the column that contains the value you wish to predict.

split [Required]

The name of the column that contains the user-defined split. Each value must be one of TRAIN, TEST, or VALI. Rows with any other value are ignored and not used for training.
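
Putting the required model fields together, a minimal model file might look like the following sketch. The model, column, and table names are hypothetical.

```yaml
# Hypothetical minimal model definition.
type: Model
name: churn_model
index: customer_id
target: churned       # column holding the value to predict
split: data_split     # column holding TRAIN, TEST, or VALI for each row
table: analytics.churn_spine
```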

description [Optional]

Short description of the model.

documentation [Optional]

Free-text documentation of the model. Used to help others understand how to use the model, the features it relies on, etc.

url [Optional]

URL related to the model.

owners [Optional]

Email addresses of owners. These should match their Continual accounts.

columns [Optional]

Columns are optional. If not provided, Continual infers types from the underlying data warehouse. If you wish to link to external entities, override types, or provide descriptions for columns, you may include them in your YAML file.

name [Required]

The column name.

type [Optional]

The column data type. Supported options are:

  • Number
  • Text
  • Categorical
  • Boolean
  • Timestamp

entity [Optional]

The name of the entity that this column links to. That is, values in this column are mapped to the index column of the target entity.

description [Optional]

Short description of the column.
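
As a sketch of entity linking, the fragment below ties a model column to another entity. The column and entity names are hypothetical.

```yaml
# Fragment of a model file; names are hypothetical.
columns:
  - name: account_id
    entity: account     # values are mapped to the index column of the account entity
    description: Account that owns the customer record.
  - name: churned
    type: Boolean
```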

exclude_columns [Optional]

This is a list of columns in the resulting query or table to explicitly exclude from the model. This is mainly used if you wish to not explicitly list columns and instead let Continual infer them, but you also want to exclude a few columns from being used.

train [Optional]

The training section allows you to configure Continual's AutoML Engine. This is optional configuration and Continual will use smart defaults if not provided.

schedule [Optional]

Cron-based schedule which determines how often to train the model. See, for example, Cron Manual Page.

metric [Optional]

The performance metric that the AutoML engine will optimize. Current options are:

included_model_types [Optional]

The list of model types to include when training a model. Available types are:

  • Supported models: XGBoost, GBM, Catboost, FastAI, ExtremeTree, Linear, RandomForest, NeuralNet, KNN

  • Experimental models: FASTTEXT, AG_TEXT_NN, TRANSF, LightGBMLarge

If neither list is set, Continual runs the supported model types by default. You may optionally list model types to include (whitelist) or exclude (blacklist). The include list takes precedence: if included_model_types is set, only those model types are run. Otherwise, Continual runs every supported model type that is not on the excluded list.

excluded_model_types [Optional]

A list of model types to exclude from training. The available types are the same as for included_model_types.
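
The two alternatives can be sketched as follows. The type names are taken from the lists above and are assumed to be written exactly as shown there. The `---` separator is used here only to show two alternative files side by side; a real model file would contain a single train section.

```yaml
# Option A: run only these model types.
train:
  included_model_types:
    - XGBoost
    - RandomForest
---
# Option B: run every supported model type except these.
train:
  excluded_model_types:
    - KNN
    - NeuralNet
```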

size [Optional]

The size of the trainer pod.

Available options are:

  • small
  • medium (default)
  • large
  • xlarge
  • xxlarge

If you receive out-of-memory errors during training, increase the size of the trainer pod and run the training again.
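
For example, after an out-of-memory failure at the default size, the next attempt might step up one size:

```yaml
train:
  size: large   # one step up from the default, medium
```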

plots [Optional]

A list of optional plots to generate when training a model version. Some plots may be resource- or time-intensive to generate, so they are not generated unless requested.

Available options are:

exclude_ensemble [Optional]

Whether or not to generate ensemble models during model training. Default: True.

optimization [Optional]

Types of optimization to run during model training.

Available options are:

disable_data_checks [Optional]

If set to True, Continual skips basic dataset validity checks, such as checks for duplicate or null-valued indices. See Data Checks for full details on the checks performed. Default: False.

time_limit [Optional]

The wall-clock time, in seconds, allowed for model training. Training stops once this limit is reached, regardless of model performance. Default: 6 hours (21600 seconds).

size_limit [Optional]

The size limit in bytes for the dataset used for training, testing or validation. A warning is thrown if the memory usage of the data exceeds this value. See the Data Checks page for more details. Default value is 10 GB.
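
A train section that tightens both limits might look like the following sketch. The values are illustrative only, and the byte count assumes 1 GB = 10^9 bytes.

```yaml
train:
  time_limit: 7200          # 2 hours, expressed in seconds
  size_limit: 5000000000    # 5 GB, expressed in bytes (assuming 1 GB = 10^9 bytes)
  disable_data_checks: False
```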

predict [Optional]

schedule [Optional]

Cron-based schedule which determines how often to run a prediction job. See, for example, Cron Manual Page.

incremental [Optional]

Whether to run a full (False) or incremental (True) prediction. Default: False.
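
A weekly incremental prediction job could be configured as in this sketch, which assumes the standard five-field cron syntax:

```yaml
predict:
  # Five-field cron expression: every Monday at 06:00.
  schedule: "0 6 * * 1"
  incremental: True   # incremental prediction instead of a full one
```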

promote [Optional]

policy [Optional]

Instructs Continual how to handle new model versions that are trained.

Available options are:

  • latest (default)
  • best
  • manual

latest always promotes a new model version, whereas best promotes a new model version only if its specified performance metric is better than that of the currently deployed version. If manual is specified, no automatic promotion occurs and users must promote model versions manually after review.
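
For example, to promote a new model version only when it outperforms the deployed one:

```yaml
promote:
  policy: best   # other options: latest (default), manual
```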

query [Optional]

A valid SQL query that will generate the model spine. This should contain, at minimum, a column for index and target.

NOTE

Either query or table is required in your YAML file.
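
A model spine defined by a query might look like the following sketch. The table and column names are hypothetical; the query returns the index and target columns plus the split column.

```yaml
# Hypothetical model whose spine comes from a query.
type: Model
name: churn_model
index: customer_id
target: churned
split: data_split
query: |
  SELECT customer_id, churned, data_split
  FROM analytics.churn_labels
```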

table [Optional]

A valid table name in your data warehouse that will be used as the model spine. This should contain, at minimum, a column for index and target.

NOTE

Either query or table is required in your YAML file.

Quoting table names

When using quoted table names, use a YAML literal block, or wrap the whole quoted name in single quotes. Double quotes also work if the inner quotes are escaped with a backslash.

Example:

Invalid: table: "database"."schema"."table"

Valid: table: '"database"."schema"."table"'

Valid:

# The inner quotes need no escaping inside a YAML literal block.
table: |
  "database"."schema"."table"
