Tutorial: Restricting Document Access for Copilot Users

This tutorial walks through an example of authenticating copilot users and filtering document retrievals based on the authenticated user’s role. The main tasks of this tutorial are:

  1. Write files to S3 with tags for user roles
  2. Create a sync between S3 files and a Continual Knowledge Base
  3. Authenticate a user and their role with Continual
  4. Filter knowledge base documents based on the copilot user’s role

Getting Started

Basic knowledge of Continual Copilots, JavaScript, JSON Web Tokens (JWT), and AWS S3 is recommended for this tutorial. The tutorial assumes a copilot is already embedded in an application. If you haven’t done so already, sign up for Continual, then create a copilot and embed it in your application.

If you encounter any issues, please contact support@continual.ai.

Introduction

Suppose you’re an ecommerce platform, and at the end of each month PDF reports are generated so customers can analyze their business performance. A subdirectory in S3 is created for each customer account, and a backend service generates each report and writes it to the appropriate subdirectory. Each customer has multiple users, and each user has a role that dictates which reports the user can access. For example, one customer of the ecommerce platform is an apparel company called Ocean Threads. It has two types of users: sales managers and operations managers. The sales managers need to view the Sales Report, and the operations managers need to view the Shipping and Fulfillment Report.

Let’s step through how the ecommerce platform uses metadata filtering to ensure Ocean Threads’ copilot users can’t view reports they don’t have access to.

Load files into S3 and apply tags

First, add the Sales Report and the Operations Report (the Shipping and Fulfillment Report from the example above) to an S3 bucket. Then add an S3 tag to each file with the role that is allowed to view it. For example, tag the Sales Report with role: sales and the Operations Report with role: operations.
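
If the reports are uploaded by a backend service rather than through the AWS console, the tag can be attached at upload time. The following is a minimal sketch, not part of the original tutorial: the bucket name, object keys, and the buildTaggedUploadParams helper are hypothetical. It relies on the fact that S3's PutObject request accepts object tags as a URL-encoded query string in its Tagging parameter.

```javascript
// Sketch: build PutObject parameters that carry a role tag.
// S3 expects object tags as a URL-encoded query string, e.g. "role=sales".
// buildTaggedUploadParams is a hypothetical helper, not an AWS or Continual API.
const buildTaggedUploadParams = (bucket, key, body, tags) => ({
  Bucket: bucket,
  Key: key,
  Body: body,
  Tagging: new URLSearchParams(tags).toString(),
});

// Hypothetical bucket and key names, for illustration only.
const salesParams = buildTaggedUploadParams(
  "example-reports-bucket",
  "ocean-threads/2024-01/sales-report.pdf",
  Buffer.from("placeholder PDF bytes"),
  { role: "sales" }
);
const operationsParams = buildTaggedUploadParams(
  "example-reports-bucket",
  "ocean-threads/2024-01/operations-report.pdf",
  Buffer.from("placeholder PDF bytes"),
  { role: "operations" }
);

console.log(salesParams.Tagging);
console.log(operationsParams.Tagging);

// With the AWS SDK for JavaScript v3 installed, the upload itself would look like:
//   const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");
//   const client = new S3Client({ region: "us-east-1" }); // region is an assumption
//   await client.send(new PutObjectCommand(salesParams));
```

Building the parameters separately from the upload call keeps the tagging logic easy to check before any file reaches the bucket.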

Create a data sync

In the Continual console, click Knowledge Bases, then Create Knowledge Base. Choose Managed Knowledge Base and enter the AWS access key ID, secret key, bucket region, bucket name, and sync schedule. Click Connect.

After about a minute, you should see the files from your bucket listed as documents in Continual.

Clicking on a document reveals the document’s contents and associated metadata. Metadata is created automatically and depends on the data source. Files synced from S3 have the file path, bucket name, bucket tags, and file tags as metadata.

Connect the Knowledge Base to a Copilot

Because we created a new managed Knowledge Base from our S3 bucket, we need to connect it to a copilot. Go to the copilot of your choice in the Continual console and select Integrations. Connect the new Knowledge Base (named Customer Financial Reports in this example). The copilot can now reference the documents in this Knowledge Base.

Make sure the selected copilot is already embedded in an application. If you haven’t created a copilot or embedded one into an application yet, please see the Copilot section or the Next.js tutorial.

Authenticate application users with Continual

So far we’ve loaded two business reports into S3, applied tags on each report with different roles, created a data sync between our S3 bucket and Continual, and connected our newly created Knowledge Base with a copilot.

The next step is to know and trust the identity of copilot users. This is where Continual Access Tokens come in. Continual Access Tokens are JWTs that require:

  1. the Copilot ID: a unique ID that can be found in the upper right-hand corner of the copilot page.
  2. the copilot secret key: used to sign the Continual Access Token and verify that it’s associated with the copilot. It can be found in the settings tab for your copilot.
  3. an identity grant: the identity of the user who is accessing this copilot.

Authorize users to access Continual Knowledge Bases and Documents

In addition to authenticating users, we can authorize users to access specific knowledge bases and documents by setting a datasets grant in the Continual Access Token. The datasets grant is specified in the token payload, inside the grants property. It is an object that maps dataset identifiers to their corresponding metadata filters.

We can use the metadata set on our sales and operations reports as filters by setting key-value pairs in the datasets grant.

index.js
datasets: {
  my_dataset_id: { metadata_key: metadata_value },
  another_dataset_id: { another_metadata_key: another_metadata_value }
},
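
To build intuition for what a metadata filter does, the following sketch applies a datasets grant to a small in-memory list of documents. It is an illustration only: the document objects and the filterByMetadata helper are hypothetical, and it assumes a filter matches when every key-value pair equals the document’s metadata exactly. This tutorial does not specify how Continual evaluates filters internally.

```javascript
// Hypothetical documents, shaped like synced S3 files with file tags as metadata.
const documents = [
  { name: "sales-report.pdf", metadata: { role: "sales" } },
  { name: "operations-report.pdf", metadata: { role: "operations" } },
];

// Assumption: a document is visible only if its metadata matches
// every key-value pair in the filter exactly.
const filterByMetadata = (docs, filter) =>
  docs.filter((doc) =>
    Object.entries(filter).every(([key, value]) => doc.metadata[key] === value)
  );

// A datasets grant for a user with the sales role.
const datasetsGrant = { my_dataset_id: { role: "sales" } };

const visibleToSales = filterByMetadata(documents, datasetsGrant.my_dataset_id);
console.log(visibleToSales.map((doc) => doc.name));
```

Under these assumptions, a user with the sales role sees only the document tagged role: sales, which is the behavior the rest of this tutorial sets up.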

Let’s see how we can generate Continual Access Tokens to authenticate and authorize copilot users.

Generating Continual Access Tokens

You can generate a Continual Access Token using one of the many JWT libraries available for most programming languages. As mentioned above, the critical information to include is:

  • Copilot ID set as the issuer
  • User ID set as the identity grant
  • User role set as the datasets grant

An example Node.js function:

index.js
 
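const jwt = require("jsonwebtoken");

// Builds a signed Continual Access Token for one application user.
const generateUserAccessToken = (
  userId,
  userRole,
  copilotId,
  copilotSecret
) => {
  const algorithm = "HS256";
  const ttl = 86400; // token lifetime in seconds (24 hours)

  const payload = {
    grants: {
      // Identity grant: the user accessing the copilot.
      identity: userId,
      // Datasets grant: filter documents in my_dataset_id by the user's role.
      datasets: { my_dataset_id: { role: userRole } },
    },
  };
  const header = {
    typ: "JWT",
    alg: algorithm,
  };
  const signOptions = {
    header: header,
    issuer: copilotId, // the Copilot ID is set as the issuer
    expiresIn: ttl,
  };

  // Sign the token with the copilot secret key.
  return jwt.sign(payload, copilotSecret, signOptions);
};

const signed = generateUserAccessToken("a_user_id", "user_role", "a_copilot_id", "copilot_secret");
console.log({ signed });

// Verify the signature with the same secret and decode the payload.
const decoded = jwt.verify(signed, "copilot_secret");
console.log({ decoded });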
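
When debugging, it can help to see what an HS256-signed token actually contains. The sketch below is an addition to the tutorial, not Continual’s implementation: it assembles and decodes a token using only Node’s built-in crypto module, and it assumes a Node.js version that supports the "base64url" encoding. The helper names signHs256 and decodePayload are hypothetical, and the claim values mirror the placeholders used above.

```javascript
const crypto = require("node:crypto");

const base64url = (input) => Buffer.from(input).toString("base64url");

// An HS256 JWT is base64url(header) + "." + base64url(payload) + "." + signature.
const signHs256 = (payload, secret) => {
  const header = { alg: "HS256", typ: "JWT" };
  const signingInput =
    `${base64url(JSON.stringify(header))}.${base64url(JSON.stringify(payload))}`;
  const signature = crypto
    .createHmac("sha256", secret)
    .update(signingInput)
    .digest("base64url");
  return `${signingInput}.${signature}`;
};

// Decodes the payload segment WITHOUT verifying the signature.
// Use this for inspection only, never for access decisions.
const decodePayload = (token) =>
  JSON.parse(Buffer.from(token.split(".")[1], "base64url").toString("utf8"));

const now = Math.floor(Date.now() / 1000);
const token = signHs256(
  {
    grants: {
      identity: "a_user_id",
      datasets: { my_dataset_id: { role: "sales" } },
    },
    iss: "a_copilot_id", // Copilot ID as the issuer
    iat: now,
    exp: now + 86400, // expires 24 hours after issue
  },
  "copilot_secret"
);

const claims = decodePayload(token);
console.log(claims.iss);
console.log(claims.grants);
```

Because the payload is only encoded, not encrypted, anyone holding the token can read the grants. The signature is what prevents a user from editing their own role.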

Testing the metadata filter

Log in to your app as a user with the sales role and ask the copilot “What was the total revenue in January 2024?” The copilot should return the correct answer of $45,000. Then ask “What were the total shipping costs?” Because the sales role has no access to the Operations Report, the copilot should politely respond that it couldn’t find the answer.