To define code evaluators programmatically using the SDK, refer to How to define a code evaluator (SDK).
Step 1. Create the evaluator
Create an evaluator from one of the following pages in the LangSmith UI:
- In the playground or from a dataset: Select the + Evaluator button.
- From a tracing project: Select Add rules, configure your rule and select Apply evaluator.
- Select Create custom code evaluator from the evaluator type options.
Step 2. Write your evaluator code
In the Add Custom Code Evaluator page, define your evaluation logic using Python or TypeScript. Your evaluator function must be named `perform_eval` and should:
- Accept `run` and `example` parameters.
- Access data via `run['inputs']`, `run['outputs']`, and `example['outputs']`.
- Return a dictionary with your metric name as the key.
Function signature
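A minimal sketch of the required shape; the metric key and placeholder score are yours to define, and the exact fields inside `run` and `example` depend on your application's input and output schema:

```python
def perform_eval(run, example):
    # run: dict-like access to the traced run's 'inputs' and 'outputs'
    # example: dict-like access to the dataset example, including reference 'outputs'
    # Return a dict keyed by the metric name you want to record.
    return {"metric_name": 1}
```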
Example: Exact match evaluator
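A hedged sketch of an exact-match check; the `output` keys used inside `run['outputs']` and `example['outputs']` are placeholders for whatever your chain and dataset actually store:

```python
def perform_eval(run, example):
    # Compare the run's generated text to the reference output from the dataset example.
    prediction = run["outputs"].get("output", "")     # assumed output key
    reference = example["outputs"].get("output", "")  # assumed reference key
    # Boolean feedback: True when the strings match exactly.
    return {"exact_match": prediction == reference}
```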
Example: Concision evaluator
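One way to sketch a concision score, assuming the response text lives under an `output` key and using an arbitrary 100-word budget:

```python
def perform_eval(run, example):
    # Reward shorter responses: full score under the word budget, scaled down above it.
    response = run["outputs"].get("output", "")  # assumed output key
    max_words = 100                              # hypothetical word budget
    word_count = len(response.split())
    score = min(1.0, max_words / max(word_count, 1))
    # Continuous feedback between 0 and 1.
    return {"concision_score": score}
```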
Example: Input-based evaluator
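An illustrative sketch that scores against the run's inputs rather than a reference; the `topic` and `output` keys are assumptions about your schema:

```python
def perform_eval(run, example):
    # Use the run's inputs when scoring, e.g. check that the response
    # mentions the topic supplied in the input.
    topic = run["inputs"].get("topic", "")       # assumed input key
    response = run["outputs"].get("output", "")  # assumed output key
    mentions_topic = bool(topic) and topic.lower() in response.lower()
    return {"mentions_topic": mentions_topic}
```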
Step 3. Configure the evaluator
Name and description
Give your evaluator a clear name that describes what it measures (e.g., “Exact Match”, “Concision Score”).
Feedback configuration
Configure how the score should be interpreted (see the sketch after this list):
- Boolean: True/false feedback
- Categorical: String values representing categories
- Continuous: Numerical scoring within a range
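To make the mapping concrete, a placeholder sketch of the return value each feedback type implies (metric names are illustrative, not required keys):

```python
def perform_eval(run, example):
    # Boolean feedback: a true/false result.
    return {"exact_match": True}
    # Categorical feedback would instead return a string label, e.g.
    # return {"tone": "formal"}
    # Continuous feedback would return a number within a range, e.g.
    # return {"concision_score": 0.85}
```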
Step 4. Test and save
- Preview your evaluator on example data to ensure it works as expected
- Click Save to make the evaluator available for use
Use your code evaluator
Once created, you can use your code evaluator:
- When running evaluations from the playground
- As part of a dataset to automatically run evaluations on experiments
- When running online evaluations
Related
- LLM-as-a-judge evaluator (UI): Use an LLM to evaluate outputs
- Composite evaluators: Combine multiple evaluator scores