If we want to avoid making AI agents a huge new attack surface, we’ve got to treat agent memory the way we treat databases: with firewalls, audits, and access privileges. The pace at which large ...
My use case requires a straightforward way to measure tool calling accuracy using F1 score, it would be great to have promptfoo support this. When testing LLMs that call tools/functions, I'd like to ...
I tried to search if this has already come up because I thought this is a rather intuitive use case for function / tool calling since we require an exact output JSON and not free form text, but I ...