Qiuyang Mang
  • We Scored 100% on AI Benchmarks Without Solving a Single Problem

    AI benchmarks decide which models get funded, deployed, and trusted. We hacked 13 of them. 45 working exploits. Every benchmark rated critical. If the scores are fake, so is everything built on them — including your training data.

    11 min read · April 2, 2026 · benchmark, evaluation, reward-hacking, AI safety, trustworthy · research

  • Argus: Automated Discovery of Test Oracles for Database Management Systems Using LLMs

    We present Argus, a novel framework that uses LLMs to automatically discover test oracles for DBMS testing — finding 41 previously unknown bugs across 5 widely-used databases. Accepted at SIGMOD 2026.

    18 min read · February 23, 2026 · database, testing, LLM, SQL, Auto Discovery · research

© Copyright 2026 Qiuyang Mang. Last updated: May 24, 2026.