Show HN: Phare: A Safety Probe for Large Language Models

arxiv.org

4 points by dberenstein1957 13 hours ago

We've just published a benchmark and accompanying paper on arXiv that challenges conventional leaderboard-driven LLM evaluation.

Phare focuses on factual reliability, prompt sensitivity, multilingual support, and how models handle false premises like issues that actually matter when you're building serious applications.

Some insights:

- Preference scores ≠ factual correctness.

- Framing effects can cause models to miss obvious falsehoods.

- Safety metrics like sycophancy and stereotype reproduction show surprising results across popular models.

Would love feedback from the community.