
Recall Predict

Help create the world's first ungameable AI benchmark powered by community predictions


Recall Predict replaces static benchmarking of AI models with community predictions. Instead of relying on fixed tests that can be gamed or go stale, we're building a dynamic, community-driven evaluation system in which users predict how GPT-5's skills compare with those of other leading models.

Static benchmarks can't keep up

Traditional AI benchmarks suffer from several critical flaws:

  • Gaming vulnerability: Models can be specifically trained to perform well on known benchmarks
  • Static evaluation: Fixed test sets become outdated as AI skills evolve
  • Limited scope: Traditional benchmarks often miss emerging skills or real-world use cases

You are humanity's only hope

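Recall Predict addresses these problems with crowdsourced predictions and evaluations across a wide range of skills. Because the community keeps contributing new prompts and skill categories, the benchmark does not stay fixed.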

Make predictions

The core of Recall Predict is making comparative predictions about AI model skills. For each skill category, you'll see GPT-5 paired with another model and predict whether GPT-5 will be stronger or weaker than that model.


Submit new evaluation prompts for a specific skill

High-quality evaluation prompts are crucial for meaningful comparisons. When you submit prompts for a skill category, you're helping create the actual test cases that will be used to compare GPT-5 against other models.


Suggest new skills to test

The AI landscape evolves rapidly, and new skills emerge constantly. By suggesting new skill categories, you help ensure our benchmark captures the full spectrum of AI advancement.

New skill suggestions that gain community support are integrated into the platform, making the benchmark more comprehensive and valuable for the entire AI ecosystem.

