Promptfoo Tutorials - Search News

AI memory is really a database problem

If we want to avoid making AI agents a huge new attack surface, we’ve got to treat agent memory the way we treat databases: with firewalls, audits, and access privileges. The pace at which large ...

GitHub

[Feature] F1 score metric for tool calling evaluations

My use case requires a straightforward way to measure tool calling accuracy using F1 score, it would be great to have promptfoo support this. When testing LLMs that call tools/functions, I'd like to ...

GitHub

[Discussion Request] Tool calling as a more reliable grading output mechanic

I tried to search if this has already come up because I thought this is a rather intuitive use case for function / tool calling since we require an exact output JSON and not free form text, but I ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results

AI memory is really a database problem

[Feature] F1 score metric for tool calling evaluations

[Discussion Request] Tool calling as a more reliable grading output mechanic

Trending now