Create your own LLM-as-a-judge evaluator
For complete control of evaluator logic, create your own LLM-as-a-judge evaluator and run it using the LangSmith SDK (Python / TypeScript). Requires langsmith>=0.2.0.
An LLM-as-a-judge evaluator consists of three key components:
- Evaluator function: A function that receives the example inputs and application outputs, then uses an LLM to score the quality. The function should return a boolean, number, string, or dictionary with score information.
- Target function: Your application logic being evaluated (wrapped with @traceable for observability).
- Dataset and evaluation: A dataset of test examples and the evaluate() function that runs your target function on each example and applies your evaluators.
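The three components above can be sketched in Python as follows. This is a minimal illustration, not the SDK's reference implementation: `stub_judge` is a hypothetical stand-in for a real LLM call, the `question` and `answer` field names are assumed, and the dataset name passed to `run_experiment` is whatever dataset you have already created in LangSmith. The `evaluate()` call is wrapped in a function that is not invoked here, since it needs an API key and an existing dataset.

```python
from typing import Callable

try:
    from langsmith import traceable  # real decorator when the SDK is installed
except ImportError:
    # Fallback so this sketch still runs without the SDK (assumption for illustration).
    def traceable(func):
        return func


def stub_judge(prompt: str) -> str:
    """Hypothetical judge. Replace with a call to your LLM provider."""
    answer_part = prompt.split("Answer:")[-1]
    return "CORRECT" if "Paris" in answer_part else "INCORRECT"


def make_correctness_evaluator(judge: Callable[[str], str]):
    """Build an evaluator function that asks `judge` to grade each output."""

    def correctness(inputs: dict, outputs: dict) -> dict:
        prompt = (
            f"Question: {inputs['question']}\n"
            f"Answer: {outputs['answer']}\n"
            "Reply with CORRECT or INCORRECT."
        )
        verdict = judge(prompt).strip().upper()
        # Return a dictionary with score information.
        return {"key": "correctness", "score": verdict == "CORRECT"}

    return correctness


@traceable
def target(inputs: dict) -> dict:
    """Target function: placeholder application logic under evaluation."""
    return {"answer": "Paris is the capital of France."}


def run_experiment(dataset_name: str):
    """Run the target on every example in an existing dataset and apply the evaluator."""
    from langsmith import evaluate

    return evaluate(
        target,
        data=dataset_name,
        evaluators=[make_correctness_evaluator(stub_judge)],
    )
```

Passing the judge in as a callable keeps the evaluator testable offline; in practice you would swap `stub_judge` for a function that sends the prompt to a model and returns its reply.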