Skip to content

Predictive models with dbt

In this example, we'll look at using Continual's dbt integration to quickly leverage our dbt projects to build predictive models.

Scenario

We're operating a business that offers a service with a monthly subscription fee. We've been asked to come up with a customer churn model and make predictions on whether a given customer is at risk of churning. Let's dive in!

Setup

We'll be using dbt's mrr-playbook example project.

Note

This example is only compatible with Snowflake!

Note

This example uses a small amount of synthetic data. It's not ideal from an ML perspective, but the point of this exercise is to demonstrate the dbt integration workflow with a small example.

Before you start, ensure that you have git, dbt, and continual CLI installed.

Next, clone the repository:

git clone https://github.com/dbt-labs/mrr-playbook

Next, ensure you have a profile named playbook, or change the profile key in dbt_project.yml to point to an existing dbt profile.

Now, execute dbt:

dbt deps
dbt seed
dbt run
dbt test

And you will have successfully built your tables in Snowflake.

dbt Models

We now have five models in our warehouse:

customer_churn_month
customer_revenue_by_month
mrr
subscription_periods
util_months

The mrr table is the main culmination of the work here. In this table we have a monthly tally for each customer to see if they were active and what their monthly spend is.

Customer churn model

Now let's add a model that will predict if each customer is going to continue using our service for the following month.

Specifically, given everything we know about the customer up to now, will they be active next month? If not, then that is an example of churn.

The mrr table has a column for previous_month_is_active but we'll actually want next_month_is_active.

Additionally, we'll also want to filter out any customer/month combinations where the customer is not active. That is, if the customer is already churned, we want to exclude them from our analysis (Note: there are some instances of customers leaving and then coming back later. We could also build a predictive model to try to detect that behavior as well, but we'll exclude it from the current example.)

We can quickly create a new table, customer_churn with the following sql. Create a file models/customer_churn.sql and copy this into it:

select customer_id
    , date_month
    , mrr
    , mrr_change
    , is_first_month
    , next_month_is_active
from (
    select *
        , lead(is_active) over (partition by customer_id order by date_month) as next_month_is_active
    from  {{ ref('mrr') }}
)
where is_active = True

Next up, we'll just need to annotate this model so Continual knows about it. To do so, place the following configuration at the top of the sql file you just created:

{{
    config(
        meta = {
            'continual': {
                'type':'Model',
                'index':'customer_id',
                'time_index': 'date_month',
                'target': 'next_month_is_active',
            }
        }
    )
}}

Now we'll just need to build our table. To do this, you can execute dbt run, which will create the customer_churn table. You can inspect it in Snowflake, as desired.

Your dbt run output should look something like this.

Now we can build the model in Continual. To do this, make sure you have:

  1. Created a Project in Continual. Let's call it 'continual_example_mrr'.
  2. Installed the Continual CLI.
  3. Logged into Continual via the CLI and set continual_example_mrr as your default project.

Once those have been completed, you can execute the following:

continual run
Your continual run output will look similar to the above.

The link at the bottom will navigate to the change in Continual. This will update with the model progress and provides a link the model version, which you can inspect when it is finished. From there you can review the performance of the model version, as well as see the predictions being manifested into Continual.

This concludes the dbt example!

Back to top