Code evaluators in the LangSmith UI allow you to write custom evaluation logic using Python or TypeScript code directly in the interface. Unlike LLM-as-a-judge evaluators that use a model to evaluate outputs, code evaluators use deterministic logic you define.
To define code evaluators programmatically using the SDK, refer to How to define a code evaluator (SDK).

Step 1. Create the evaluator

  1. Create an evaluator from one of the following pages in the LangSmith UI:
    • In the playground or from a dataset: Select the + Evaluator button.
    • From a tracing project: Select Add rules, configure your rule and select Apply evaluator.
  2. Select Create custom code evaluator from the evaluator type options.

Step 2. Write your evaluator code

On the Add Custom Code Evaluator page, define your evaluation logic in Python or TypeScript. Your evaluator function must be named perform_eval and should:
  1. Accept run and example parameters.
  2. Access data via run['inputs'], run['outputs'], and example['outputs'].
  3. Return a dictionary with your metric name as the key.

Function signature

def perform_eval(run, example):
    # Access the data
    inputs = run['inputs']
    outputs = run['outputs']
    reference_outputs = example['outputs']  # Optional: reference/expected outputs

    # Your evaluation logic here
    score = ...

    # Return a dict with your metric name
    return {"metric_name": score}

Example: Exact match evaluator

def perform_eval(run, example):
    """Check if the answer exactly matches the expected answer."""
    actual = run['outputs']['answer']
    expected = example['outputs']['answer']

    is_correct = actual == expected
    return {"exact_match": is_correct}

Example: Concision evaluator

def perform_eval(run, example):
    """Score how concise the answer is. 1 is most concise, 5 is least concise."""
    answer = run['outputs']['answer']
    # One point per 1,000 characters in the answer, capped at a maximum score of 5
    score = min(len(answer) // 1000, 4) + 1

    return {"concision_score": score}

Example: Input-based evaluator

def perform_eval(run, example):
    """Check if the input text contains toxic language."""
    text = run['inputs'].get('text', '').lower()
    toxic_words = ["idiot", "stupid", "hate", "awful"]

    is_toxic = any(word in text for word in toxic_words)
    return {"is_toxic": is_toxic}

Step 3. Configure the evaluator

Name and description

Give your evaluator a clear name that describes what it measures (e.g., “Exact Match”, “Concision Score”).

Feedback configuration

Configure how the score should be interpreted:
  • Boolean: True/false feedback
  • Categorical: String values representing categories
  • Continuous: Numerical scoring within a range
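
Whichever type you choose, the value returned by perform_eval should match it. A minimal sketch (the metric names, field names, and thresholds below are illustrative, not part of LangSmith):

def perform_eval(run, example):
    """Illustrative sketch: return a value matching the configured feedback type."""
    answer = run['outputs'].get('answer', '')

    # Continuous: a number within your configured range (here 0.0 to 1.0)
    relative_length = min(len(answer) / 1000, 1.0)
    return {"relative_length": relative_length}

    # Boolean alternative: return {"has_answer": len(answer) > 0}
    # Categorical alternative:
    # return {"length_bucket": "short" if len(answer) < 100 else "long"}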

Step 4. Test and save

  1. Preview your evaluator on example data to ensure it works as expected.
  2. Click Save to make the evaluator available for use.
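
Because perform_eval is plain Python, you can also sanity-check it locally before pasting it into the UI. A minimal sketch using the exact match evaluator from Step 2, with hypothetical run and example payloads that mirror the dictionary shape described there:

def perform_eval(run, example):
    """Check if the answer exactly matches the expected answer."""
    actual = run['outputs']['answer']
    expected = example['outputs']['answer']
    return {"exact_match": actual == expected}

# Hypothetical payloads mirroring the run/example shape used above
sample_run = {
    "inputs": {"question": "What is the capital of France?"},
    "outputs": {"answer": "Paris"},
}
sample_example = {"outputs": {"answer": "Paris"}}

print(perform_eval(sample_run, sample_example))  # {'exact_match': True}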

Use your code evaluator

Once created, you can use your code evaluator in the playground, in experiments over a dataset, or as part of automation rules in a tracing project.