Compare Agents

You have a real dataset and a real question, and little patience for benchmarks that miss your use case. So how do you pick the right agent when you are busy?

Meet Greyhound: run the same question on data you select across a lineup of agents, then compare the results side by side. It is that simple: a clean CLI, soon to be open source.

Your question, your data, your choice

There are quite a few general-purpose chatbots and agents, such as Gemini, GPT, Claude, and Manus. Specialized ones include Biomni, Edison Scientific, K-Dense, and ours from 20minds.

Greyhound lets you compare and score different agents side by side, using an LLM as a judge.
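To make the idea of judged, side-by-side scoring concrete, here is a minimal Python sketch. It is an illustration only and not Greyhound's actual code: the `Judge` signature, the `Result` type, `rank_agents`, and the agent names are all hypothetical, and a simple substring check stands in for the LLM judge so the example runs offline. In practice the judge would be a function that asks a model to grade each response against the expected one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical judge signature: (prompt, expected, response) -> score in [0, 1].
Judge = Callable[[str, str, str], float]


def contains_expected_judge(prompt: str, expected: str, response: str) -> float:
    """Offline stand-in for an LLM judge: 1.0 if the expected answer appears in the response."""
    return 1.0 if expected.strip().lower() in response.lower() else 0.0


@dataclass
class Result:
    agent: str
    response: str
    score: float


def rank_agents(
    prompt: str,
    expected: str,
    responses: Dict[str, str],
    judge: Judge = contains_expected_judge,
) -> List[Result]:
    """Score every agent's response with the same judge and sort best-first."""
    results = [
        Result(agent, text, judge(prompt, expected, text))
        for agent, text in responses.items()
    ]
    # sorted() is stable, so agents with equal scores keep their input order.
    return sorted(results, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Made-up responses from two placeholder agents.
    responses = {
        "agent_a": "I believe the answer is 41.",
        "agent_b": "The answer is 42.",
    }
    for r in rank_agents("Answer to the ultimate question", "42", responses):
        print(f"{r.agent}: score={r.score} response={r.response!r}")
```

Keeping the judge as a plain function means the same ranking logic works whether the score comes from a string check, as here, or from a model-based grader.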

Get started with Greyhound

git clone https://github.com/20minds/greyhound.git && cd greyhound

uv run init  # store your credentials locally once

uv run eval \
  --agents 20minds biomni edison k-dense gemini openai claude manus \
  --prompt "Answer to the ultimate question of life, the universe, and everything" \
  --expected-response "42" \
  --files bespoke_data.xlsx

uv run view --browser  # view the results

You do not have to guess. Have Greyhound race your agents and pick your winner.

https://github.com/20minds/greyhound