Arena

Can your models count carefully?

Run the same tiny precision test across local and API models.

New Evaluation
User input
Submitted from the Arena composer

Model outputs