ai evals claudecode claude tech
I ran an eval that called Claude Code on a bunch of different tasks, but I ran into an issue: because Claude Code was running as a subprocess, its traces weren't actually showing up in my eval. This image of my trace shows the problem. It says "run Claude agent", but I don't actually see any of what the Claude agent did.
So first I grabbed my current Braintrust span and got the span ID, root span ID, and experiment ID. I passed those three values as environment variables to my Claude Code subprocess. Inside Claude Code, a hook reads those environment variables. When it creates the root span for the Claude Code session, it sets the root span ID to the parent's root span. Basically it's telling Braintrust that this trace belongs under that trace.
And this is what it looks like with the change. This way we can see all the LLM calls with their input and output, as well as the command line execution.
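The parent side of this handoff can be sketched in a few lines. This is a minimal illustration, not the code from the video: `TraceContext`, the `PARENT_*` variable names, and `ECHO_CMD` are all hypothetical stand-ins, and reading the three IDs off the real Braintrust span is left out because the video does not show that call.

```python
import os
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class TraceContext:
    """Hypothetical stand-in for the three IDs read off the current span."""
    span_id: str
    root_span_id: str
    experiment_id: str


# Assumed variable names; the real hook may expect different ones.
ENV_NAMES = {
    "span_id": "PARENT_SPAN_ID",
    "root_span_id": "PARENT_ROOT_SPAN_ID",
    "experiment_id": "PARENT_EXPERIMENT_ID",
}


def child_env(ctx: TraceContext) -> dict[str, str]:
    """Copy the current environment and add the three trace IDs to it."""
    env = dict(os.environ)
    for field, name in ENV_NAMES.items():
        env[name] = getattr(ctx, field)
    return env


def run_child(cmd: list[str], ctx: TraceContext) -> str:
    """Run a subprocess that inherits the trace IDs; return its stdout."""
    result = subprocess.run(
        cmd, env=child_env(ctx), capture_output=True, text=True, check=True
    )
    return result.stdout


# Stand-in for the agent subprocess: it only echoes the inherited root span ID.
ECHO_CMD = [
    sys.executable,
    "-c",
    "import os; print(os.environ['PARENT_ROOT_SPAN_ID'])",
]
```

Calling `run_child(ECHO_CMD, ctx)` with a real agent command in place of `ECHO_CMD` would give the subprocess everything it needs to attach its trace to the eval's trace. Copying `os.environ` first keeps the child's normal environment (PATH, API keys) intact.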
Summary
The video shows how to fix missing traces when an eval runs Claude Code as a subprocess: the span IDs are passed to the subprocess through environment variables, so its LLM calls and command executions appear under the eval's trace.
Key Points
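- Claude Code was evaluated on various tasks, but its activity was missing from the traces.
- Because Claude Code ran as a subprocess, its traces were not visible in the eval.
- The span ID, root span ID, and experiment ID were passed to the subprocess as environment variables.
- A hook inside Claude Code reads these variables when it creates the session's root span.
- The hook sets the root span ID to the parent's root span, which tells Braintrust the session belongs under the eval's trace.
- With this change, all LLM calls (with inputs and outputs) and command line executions are visible.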
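The hook side can be sketched in the same spirit. This is an assumption-laden illustration rather than the actual hook: the `PARENT_*` variable names and the `session_root_span` function are hypothetical, and the returned dictionary only mirrors the idea of a span record that carries a root span ID and a list of parent span IDs.

```python
import os
import uuid
from typing import Mapping, Optional


def session_root_span(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the session's top-level span record.

    If the (assumed) PARENT_* variables are all present, the session is
    attached under the parent's trace; otherwise it starts its own trace.
    """
    env = os.environ if env is None else env
    span_id = uuid.uuid4().hex

    parent_span_id = env.get("PARENT_SPAN_ID")
    parent_root_id = env.get("PARENT_ROOT_SPAN_ID")
    experiment_id = env.get("PARENT_EXPERIMENT_ID")

    if parent_span_id and parent_root_id and experiment_id:
        # Nested case: reuse the parent's root so both share one trace.
        return {
            "span_id": span_id,
            "root_span_id": parent_root_id,
            "span_parents": [parent_span_id],
            "experiment_id": experiment_id,
        }

    # Standalone case: this span is the root of a new trace.
    return {
        "span_id": span_id,
        "root_span_id": span_id,
        "span_parents": [],
        "experiment_id": None,
    }
```

Falling back to a standalone trace when any variable is missing means the same hook still works when the agent is run by hand outside an eval.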
Repurpose Ideas
- LinkedIn post: Steps to manage tracing in AI evaluations
- Tweet: How to pass environment variables for better tracing
- Checklist: Ensure visibility of LLM calls in evaluations