How Character.AI Prioritizes Teen Safety
Millions of people visit Character.AI every month to create original Characters or write their own interactive stories — using our technology as a tool to supercharge their creativity and imagination. Our goal is to provide a space that is both engaging and safe for our community.
That’s why we’ve rolled out a suite of new safety features across nearly every aspect of our platform, designed especially with teens in mind. These features include modifications to our Large Language Model (LLM), improvements to our systems for detecting and intervening on both user behavior and model responses, and additional features that empower teens and their parents. Together, these changes give teens a different experience from the one available to adults, with safety features that place more conservative limits on the model’s responses, particularly when it comes to romantic content.
Model Training and Guidance
The Character.AI experience begins with the LLM that powers so many of our user and Character interactions. Conversations with Characters are currently driven by a proprietary model that we continuously update and refine. This ongoing refinement is one of the primary ways we guide the model’s behavior and improve the user experience.
Over the past month, we have been developing a separate model specifically for our teen users. The goal is to guide the model away from certain responses or interactions, reducing the likelihood that users encounter, or prompt the model to return, sensitive or suggestive content. This initiative has resulted in two distinct models and user experiences on the Character.AI platform: one for teens and one for adults.
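To make the idea of age-bracketed models concrete, here is a minimal sketch of how requests might be routed to one of two model variants. The model names, the User structure, and the routing function are hypothetical illustrations, not a description of Character.AI’s actual serving architecture.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; the real model names and serving
# setup are not public.
TEEN_MODEL = "chat-model-teen"
ADULT_MODEL = "chat-model-adult"

@dataclass
class User:
    user_id: str
    age: int

def select_model(user: User) -> str:
    """Route a request to the model variant for the user's age bracket.

    Teen users get the more conservatively tuned model; all other
    users get the default model.
    """
    return TEEN_MODEL if user.age < 18 else ADULT_MODEL

# A 15-year-old is routed to the teen model; a 32-year-old is not.
assert select_model(User("u1", 15)) == TEEN_MODEL
assert select_model(User("u2", 32)) == ADULT_MODEL
```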
Evolving the Model Experience
Often, when an LLM generates sensitive or inappropriate content, it does so because a user has prompted it to elicit that kind of response. To reduce the likelihood of such responses, we use two tactics: technical measures that block inappropriate model outputs, and separate technical measures that block inappropriate user inputs.
Model Outputs: We employ classifiers to identify specific types of content in the model’s responses. These classifiers help us enforce policies that filter out sensitive content. As part of our safety-related changes, we have added new classifiers, and strengthened existing ones, for users under 18.
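As an illustration of classifier-gated filtering, the sketch below screens a model response against per-category thresholds, with stricter thresholds for teen users. The category labels, scores, and threshold values are invented for this example; they do not reflect Character.AI’s actual classifiers or policies.

```python
# Stricter thresholds for teens: a lower score is enough to filter.
THRESHOLDS = {
    "adult": {"violence": 0.9, "sexual": 0.8},
    "teen": {"violence": 0.7, "sexual": 0.3},
}

def classify(text: str) -> dict[str, float]:
    """Stand-in for a trained content classifier that returns a score
    per policy category (higher means more likely to violate)."""
    return {"violence": 0.1, "sexual": 0.05}  # stubbed for illustration

def filter_response(text: str, audience: str) -> str | None:
    """Return the response if it passes every category threshold for
    the given audience, or None if any classifier trips."""
    scores = classify(text)
    for category, limit in THRESHOLDS[audience].items():
        if scores[category] >= limit:
            return None  # blocked by this category's classifier
    return text
```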
User Inputs: While much of our focus is on the model’s output, we are also taking significant steps to improve detection, response, and intervention related to inputs from all users. This is critical because inappropriate user inputs are often what leads a language model to generate inappropriate outputs. For example, if we detect that a user has submitted content that violates our Terms of Service or Community Guidelines, that content will be blocked from their conversation with the Character. In certain cases where we detect language referencing suicide or self-harm, we will also surface a pop-up directing the user to the National Suicide Prevention Lifeline.
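A simplified sketch of this input-side handling might look like the following. The keyword list, the stubbed policy check, and the action names are placeholders for illustration; a production system would rely on trained classifiers rather than simple term matching.

```python
from enum import Enum, auto

class InputAction(Enum):
    ALLOW = auto()
    BLOCK = auto()                 # violates Terms of Service or Guidelines
    BLOCK_WITH_RESOURCES = auto()  # also surface crisis resources

# Illustrative term list; real detection would use a trained model.
SELF_HARM_TERMS = {"suicide", "self-harm"}

def violates_policy(message: str) -> bool:
    """Stand-in for Terms of Service / Community Guidelines checks."""
    return False  # stubbed for illustration

def moderate_input(message: str) -> InputAction:
    """Decide what to do with a user message before it reaches the model."""
    lowered = message.lower()
    if any(term in lowered for term in SELF_HARM_TERMS):
        # Block the message and show a pop-up directing the user to
        # the National Suicide Prevention Lifeline.
        return InputAction.BLOCK_WITH_RESOURCES
    if violates_policy(message):
        return InputAction.BLOCK
    return InputAction.ALLOW
```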
New Features
We are also in the process of rolling out several new features, including:
- Parental Controls: This feature will give parents insight into their child's experience on Character.AI, including time spent on the platform and the Characters they interact with most frequently. We aim to launch this feature across the platform in Q1. This is the first iteration of this feature, and we plan to continue evolving these controls to provide parents with additional tools.
- Time Spent Notification: Users will receive a notification after completing an hour-long session on the platform. While adult users will eventually be able to customize this feature, users under 18 will have a more limited ability to modify it; a simplified sketch of this session logic appears after this list.
- Prominent Disclaimers: Engaging with Characters on our site should be interactive and entertaining, but it’s important for our users to remember that Characters are not real people. We have evolved our disclaimer, which appears in every chat, to remind users that the chatbot is not a real person and that what the model says should be treated as fiction. Additionally, for any Characters created by users with the words “psychologist,” “therapist,” “doctor,” or other similar terms in their names, we have included language making it clear that users should not rely on these Characters for any type of professional advice.
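The session-timing logic behind the Time Spent Notification could work along the lines of the sketch below. The one-hour default comes from the description above; the customization rule for minors is an assumption made for illustration.

```python
from datetime import timedelta

# Notify after one hour of use, per the feature description.
DEFAULT_THRESHOLD = timedelta(hours=1)

def notification_threshold(age: int, requested: timedelta | None) -> timedelta:
    """Return the session length after which a time-spent notice fires.

    Adults may eventually customize the threshold freely; here we assume
    users under 18 cannot relax it beyond the default.
    """
    if requested is None:
        return DEFAULT_THRESHOLD
    if age < 18:
        return min(requested, DEFAULT_THRESHOLD)
    return requested

def should_notify(session_length: timedelta, threshold: timedelta) -> bool:
    """Fire the notification once the session reaches the threshold."""
    return session_length >= threshold
```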
Safety Partners
We are also collaborating with several teen online safety experts to help ensure that the under-18 experience is designed with safety as a top priority. These experts include ConnectSafely, an organization with nearly twenty years of experience educating people about online safety, privacy, security, and digital wellness. We’ll consult our partner organizations as part of our safety-by-design process as we develop new features, and they will also provide their perspective on our existing product experience.
At Character.AI, we are committed to fostering a safe environment for all our users. To meet that commitment, we recognize that our approach to safety must evolve alongside the technology that drives our product, creating a platform where creativity and exploration can thrive without compromising safety. To get this right, safety must be infused into everything we do at Character.AI. This suite of changes is part of our long-term commitment to continuously improving our policies and our product.