Using Continual with dbt¶
Overview¶
Using Continual on a dbt project is easy:
- Define Continual feature sets and models in your dbt project
- Execute dbt run to run your dbt project and build your data models
- Execute continual run to run your Continual project and build your predictive models
For an overview of how this works, refer to dbt integration.
Getting started¶
In order to use Continual on a dbt project, you'll need to have:
- Registered an account in Continual
- Installed the Continual CLI
- Created a Continual project and connected a data warehouse to it
Refer to the quickstart for a walkthrough of these steps.
Choosing an integration method¶
If you have an existing dbt project, there are two approaches to adding Continual:
- Create Continual feature sets and models by annotating dbt files with Continual metadata
- Create Continual feature sets and models by adding Continual YAML definition files to your repository
Annotating dbt files with Continual metadata¶
Continual feature sets and models can be defined by annotating dbt files with metadata.
This approach uses dbt's model config mechanisms.
dbt meta fields can be defined in four places:
- The models block in the dbt_project.yml project file
- Model schema files (schema.yml) in your models/ directory
- Model property files (properties.yml) in your models/ directory
- As inline config blocks directly in your dbt model SQL files
In schema or property YAML files, e.g. models/schema.yml, Continual feature sets and models are defined using YAML syntax:
models:
  - name: customer_churn
    description: Probability a customer churns in the next 30 days.
    meta:
      continual:
        type: Model
        index: ID
        target: churn
In dbt model files, e.g. models/customer_churn.sql, Continual feature sets and models are defined using Python/Jinja syntax:
{{
  config(
    meta = {
      "continual": {
        "type": "Model",
        "index": "ID",
        "target": "churn",
      }
    }
  )
}}
SELECT ...
Annotating dbt projects with Continual YAML¶
Continual feature sets and models can be defined side-by-side with your dbt models by adding Continual YAML definition files to your repository.
For example, if you have an existing dbt project:
my_dbt_project/
    dbt_project.yml
    models/
        customers.sql
        customer_churn.sql
you can add Continual YAML definition files to your repository by creating a side-by-side folder:
my_dbt_project/
    dbt_project.yml
    models/
        customers.sql
    continual/
        featuresets/
            customers.yml
        models/
            customer_churn.yml
In this setup you run your dbt workflow as usual, and then run the Continual workflow via the CLI as a separate step.
For more information on working with Continual YAML and the CLI, refer to Working with the CLI.
Using dbt metadata¶
How you define Continual feature sets and models will depend on your chosen integration method.
If using dbt metadata in dbt model files, you'll annotate your dbt model SQL files with Python/Jinja syntax:
{{
  config(
    materialized = "table",
    meta = {
      "continual": {
        "type": "FeatureSet",
        "entity": "Customer",
        "index": "customer_id"
      }
    }
  )
}}
SELECT
  customer_id,
  ...
For all other integration methods, you'll define dbt model metadata with YAML syntax.
If using dbt config metadata in the dbt project file dbt_project.yml:
name: my_dbt_project
models:
  my_dbt_project:
    customers: # applies to models/customers.sql
      config:
        +meta:
          continual:
            type: FeatureSet
            entity: Customer
            index: customer_id
If using dbt config metadata in dbt model property files, e.g. models/properties.yml:
version: 2
models:
  - name: customers
    config:
      meta:
        continual:
          type: FeatureSet
          entity: Customer
          index: customer_id
If using dbt metadata in dbt schema files, e.g. models/schema.yml:
version: 2
models:
  - name: customers
    config:
      meta:
        continual:
          type: FeatureSet
          entity: Customer
          index: customer_id
Defining Continual resources¶
Defining feature sets¶
To define a Continual feature set based on an existing dbt model, use type = FeatureSet:
{{
  config(
    materialized = "table",
    meta = {
      "continual": {
        "type": "FeatureSet",
        "index": "customer_id"
      }
    }
  )
}}
SELECT
  customer_id,
  ...
The only required fields are:
- type
- index
Note: The feature set name is also a required field, but when annotating an existing dbt model, Continual uses the name of the model itself.
All fields available in Continual's declarative YAML format can also be used here. For a full list of options, refer to YAML reference.
In addition to these fields, there are also dbt integration specific fields that can be defined:
- enabled [Optional]: Whether or not to enable Continual integration for this feature set. The default value is True. Any dbt model which has this set to False will be skipped for Continual integration, even if the other parameters are set.
Defining predictive models¶
To define a Continual predictive model, use type = Model:
{{
  config(
    materialized = "view",
    meta = {
      "continual": {
        "type": "Model",
        "index": "customer_id",
        "target": "is_active"
      }
    }
  )
}}
SELECT
  customer_id,
  is_active,
  ...
The only required fields are:
- type
- index
- target
Note: The predictive model name is also a required field, but when annotating an existing dbt model, Continual uses the name of the model itself.
All fields available in Continual's declarative YAML format can also be used here. For a full list of options, refer to YAML reference.
In addition to these fields, there are also dbt integration specific fields that can be defined:
- enabled [Optional]: Whether or not to enable Continual integration for this model. The default value is True. Any dbt model which has this set to False will be skipped for Continual integration, even if the other parameters are set.
- create_exposures [Optional]: Whether or not to create dbt exposure files for Continual predictive models. If used, Continual will create exposures in your dbt workflow that link dependent tables to the Continual model. Default is False.
- create_sources [Optional]: Whether or not to create dbt source files that link to Continual prediction tables. This creates a source file in your dbt project pointing to all the prediction tables created by Continual, making them easier to consume downstream. Default is False.
- create_stub [Optional]: Whether or not to create dbt model stub files for Continual prediction tables. If used, Continual will create files in your dbt project that reference the prediction tables built by Continual, which makes it easy to start incorporating your predictions downstream in your dbt project. Default is False.
The following example combines these fields with fields from the YAML format:
{{
  config(
    meta = {
      "continual": {
        "type": "Model",
        "index": "customer_id",
        "time_index": "timestamp",
        "target": "churn",
        "split": "split",
        "description": "description",
        "columns": [
          {"name": "customer_id", "entity": "customers"},
          {"name": "product_id", "entity": "products"},
          {"name": "created_at", "type": "Timestamp"},
        ],
        "exclude_columns": ["address", "SSN"],
        "train": {"metric": "roc_auc", "schedule": "@daily"},
        "promote": {"policy": "best"},
        "predict": {"incremental": True, "schedule": "@daily"},
        "create_stub": True,
        "create_exposures": True,
        "create_sources": True,
        "enabled": True,
      }
    }
  )
}}
SELECT ...
Running¶
Running Continual with dbt is a two-step process:
- Execute dbt run
- Execute continual run
dbt run¶
Execute dbt run as you would in your normal dbt workflow.
dbt run needs to be done first because:
- dbt compiles model metadata into a manifest, which Continual then uses to determine which Continual resources to create and manage
- dbt executes on your data warehouse, creating any model tables that Continual will query as inputs to feature sets and predictive models
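The ordering above can be sketched as a small shell helper. This is a minimal sketch, not part of either CLI: the function name run_dbt_then_continual is hypothetical, and it assumes both dbt and continual are on your PATH and that the same target name is valid for both.

```shell
# run_dbt_then_continual TARGET
# Hypothetical helper: runs dbt first, then Continual, against one dbt target.
run_dbt_then_continual() {
  # dbt must succeed first: it writes the manifest and builds the input tables.
  dbt run --target "$1" || return 1
  # Only reached when dbt run succeeded.
  continual run --target "$1"
}
```

Calling run_dbt_then_continual dev stops before continual run if dbt run fails, so Continual never reads a stale manifest or missing tables.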
continual run¶
continual run:
- Parses the dbt compiled manifest for Continual metadata
- Generates Continual YAML representations of any defined feature sets and models
- Pushes those feature set and model definitions to Continual, which results in the execution of a change plan in Continual
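Before running continual run, it can be useful to confirm that a compiled manifest exists and mentions Continual metadata at all. The sketch below is only a crude text search, not how the Continual CLI parses the manifest. The function name is hypothetical, and it assumes dbt's default target path, so the manifest is expected at target/manifest.json inside the project directory.

```shell
# manifest_has_continual_meta PROJECT_DIR
# Hypothetical pre-flight check. Exit status:
#   0 = manifest found and it contains a "continual" key
#   1 = manifest found but no "continual" key
#   2 = no manifest (dbt run or dbt compile has not been executed yet)
manifest_has_continual_meta() {
  manifest="$1/target/manifest.json"   # assumes the default dbt target path
  if [ ! -f "$manifest" ]; then
    echo "no manifest at $manifest; run 'dbt run' or 'dbt compile' first" >&2
    return 2
  fi
  # Plain text match on the JSON key; good enough for a quick sanity check.
  grep -q '"continual"' "$manifest"
}
```

For example, manifest_has_continual_meta . && continual run skips the Continual step when no model carries Continual metadata.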
continual run functions much like dbt run. It reads profile and target information directly from the dbt project:
- --profiles-dir: The directory containing your profiles.yml file.
- --project-dir: The dbt project directory, i.e. the directory containing your dbt_project.yml file.
- --profile: Overrides the default profile found in dbt_project.yml.
- --target: Overrides the default target found in profiles.yml.
Continual-specific information can also be passed in to override project defaults:
- --project: The Continual project to use. Overrides the currently set project.
- --continual-dir: The subdirectory in your dbt project in which to save Continual YAML files. By default, this is the target-path set in the dbt_project.yml file. As a result, Continual YAML files will not be saved or versioned in git with the default dbt settings. You can modify this behavior by explicitly selecting a different directory.
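As an illustration of the options described in this section, the sketch below spells them out explicitly instead of relying on defaults. The wrapper name continual_run_explicit and all argument values are placeholders; only the flags themselves come from this page.

```shell
# continual_run_explicit PROJECT_DIR PROFILES_DIR TARGET CONTINUAL_PROJECT CONTINUAL_DIR
# Hypothetical wrapper that passes every documented option explicitly.
continual_run_explicit() {
  continual run \
    --project-dir "$1" \
    --profiles-dir "$2" \
    --target "$3" \
    --project "$4" \
    --continual-dir "$5"
}

# Example call (placeholder values):
#   continual_run_explicit . ~/.dbt dev my_project continual
```

Passing a --continual-dir outside the dbt target path, such as the continual folder in the example call, is one way to keep the generated YAML files under version control.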
Making changes¶
Continual configuration changes¶
When you make changes to the Continual meta fields in your dbt models, you'll need to tell Continual about them:
- Execute dbt run to regenerate the dbt manifest.json file
  - If you don't want to issue a run on your data warehouse, you can execute dbt compile instead.
- Execute continual run. The Continual CLI will parse the dbt compiled manifest, generate updated Continual YAML definitions, and push them to Continual. This results in a change plan being executed.
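The choice between dbt run and dbt compile can be captured in one helper. This is a sketch under stated assumptions: refresh_continual is a hypothetical name, and the second argument is a convention invented here to select which dbt command regenerates the manifest.

```shell
# refresh_continual TARGET [MODE]
# Hypothetical helper. MODE is "run" (default, rebuilds tables in the warehouse)
# or "compile" (only regenerates the manifest, no warehouse run).
refresh_continual() {
  target="$1"
  mode="${2:-run}"
  case "$mode" in
    run|compile) ;;
    *) echo "mode must be 'run' or 'compile'" >&2; return 2 ;;
  esac
  # Regenerate the manifest; stop here if dbt fails.
  dbt "$mode" --target "$target" || return 1
  # Push the updated definitions to Continual.
  continual run --target "$target"
}
```

For a metadata-only change, refresh_continual dev compile avoids rebuilding the dbt models; after a schema change, refresh_continual dev rebuilds them first.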
dbt model changes¶
When you make schema changes to dbt models which Continual relies on, you'll want to re-run your Continual feature sets and models on the new tables:
- Execute dbt run to run the dbt models in your data warehouse
- Execute continual run
Working with projects¶
dbt projects¶
A dbt project is a collection of dbt files associated together with a dbt_project.yml configuration file. You typically have a separate git repository for each dbt project.
A dbt project is identified by the name key in dbt_project.yml.
Continual projects¶
A Continual project is a collection of Continual feature sets and models. We recommend creating a separate Continual project for each dbt project you have.
A Continual project is identified by the project name you pass when you log in to the Continual CLI:
continual login --email my@email.com --password secret --project my_project
If your Continual user is associated with more than one Continual project, you can specify the desired project:
- By setting the default project for the Continual CLI:
  continual config set-project my_other_project
- By setting the project on each continual run call:
  continual run --project my_project_associated_with_dbt
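If you follow the one-Continual-project-per-dbt-project recommendation, the --project value can be derived from the dbt project itself. The sketch below assumes, purely for illustration, that the Continual project has the same name as the name key in dbt_project.yml. Nothing on this page requires that naming, and both function names are hypothetical.

```shell
# dbt_project_name DIR
# Prints the top-level `name:` value from DIR/dbt_project.yml (quotes stripped).
dbt_project_name() {
  sed -n "s/^name:[[:space:]]*['\"]\{0,1\}\([^'\"#[:space:]]*\).*/\1/p" \
    "$1/dbt_project.yml" | head -n 1
}

# continual_run_for_dbt_project DIR
# Assumption: the Continual project is named exactly like the dbt project.
continual_run_for_dbt_project() {
  project="$(dbt_project_name "$1")"
  if [ -z "$project" ]; then
    echo "no name found in $1/dbt_project.yml" >&2
    return 1
  fi
  continual run --project-dir "$1" --project "$project"
}
```

The sed expression only matches a name key at the start of a line, so indented name entries further down the file are ignored.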
For more information on creating Continual projects, refer to Projects and environments.
Working with targets and environments¶
dbt profiles and targets¶
In a typical dbt setup, you'll have:
- One dbt project, defined in dbt_project.yml
- The project is associated with a profile name via the profile key in dbt_project.yml
- The profile is defined in ~/.dbt/profiles.yml
  - Note: when using dbt Cloud IDE, you do not configure profiles, but you can configure targets.
- The profile definition has one or more targets
- Each target uses a different data warehouse connection configuration
  - In some cases, you may have separate targets pointing to different schemas in the same database
  - In other cases, you may have separate targets pointing to the same schema names but in different databases
Continual environments¶
In Continual:
- You have one or more Continual projects
- Each Continual project has one or more environments configured
- Each environment is associated with a data store configuration
Mapping targets to environments¶
For a Continual project used in conjunction with a dbt project, the recommended setup is:
- Each dbt project is associated with one Continual project
- Each dbt target is associated with its own Continual environment
For the second point, when executing continual run, the Continual CLI determines the active dbt target by:
- reading the default dbt target based on the dbt configuration in dbt_project.yml and ~/.dbt/profiles.yml
- using the target provided with the continual run --target dbt_target_name flag (this matches the behaviour of dbt run: dbt run --target dbt_target_name)
It will then use the Continual environment with the same name and configuration (and will create a Continual environment with that name and configuration if it does not already exist).
The implication of this setup is that Continual will read and write to the same database and schema as dbt.
Some users may wish to keep their Continual tables in a separate schema from their dbt tables. In this case, it is recommended to create dbt targets specifically for Continual and then execute the following:
dbt run --target dev
continual run --target continual-dev
Environment workflows¶
Below are some sample workflows that may be common to dbt users and how continual run would be integrated into them.
Development¶
git clone <my-dbt-project>
cd <my-dbt-project>
git checkout -b <my-new-branch>
<modify dbt files>
dbt run --target dev
continual run --target dev
git add <modified dbt files>
git commit -a
git push
Separate production environment, no CI/CD¶
git clone <my-dbt-project>
cd <my-dbt-project>
git checkout -b <my-new-branch>
<modify dbt files>
dbt run --target dev
continual run --target dev
git add <modified dbt files>
git commit -a
git push
dbt run --target prod
continual run --target prod
Separate production environment, with CI/CD¶
git clone <my-dbt-project>
cd <my-dbt-project>
git checkout -b <my-new-branch>
<modify dbt files>
dbt run --target dev
continual run --target dev
git add <modified dbt files>
git commit -a
git push
<create pull request>
In this example, your CI/CD system will be tasked with building models and predictions in production. See our CI/CD guide for more information.