Epistemic status: early notes. Much more to follow in the coming months, I hope.
Before reading the notes below, read "Which model should I ask?"
For critique that requires extensive web search: use "Deep Research" mode, and always ask all three of Gemini, Claude and ChatGPT.
Current models can provide useful critique, but they can also generate a lot of slop.
To make results easier to engage with, tell them to write plainly. For example:
### Writing guidelines
* Use simple, active sentences
* Avoid jargon unless essential
* Choose concrete examples over abstract descriptions
* Brevity: write “the author assumes X” not “it appears that there may be an implicit presupposition regarding X”
* Think Paul Graham, not academic journal
Also, specify the output format. For example:
### Main critiques (3-5 total, 300-500 words each)
For each critique:
**Title:** [Give the critique a short, memorable title]
**One-sentence summary:** [State the problem in the simplest possible terms]
**The issue:** Explain precisely what’s wrong and where it appears in the paper. Use specific quotes or page references. Use bullet points for clarity.
**Why it matters:** Show how this weakness undermines the paper’s central argument. Be concrete about the consequences. Use bullet points for clarity.
**What fixing it requires:** Briefly outline what the author would need to do to address this critique adequately. Use bullet points for clarity.
I wrote these in minutes, not hours, with minimal testing. Please tell me about your own experiments.
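If you reuse these fragments across several prompts, it can help to keep them as constants and assemble the prompt in code. The sketch below is one way to do that; `build_prompt`, the constant names and the section order are my own assumptions, not a tested recipe.

```python
# Reusable prompt fragments, copied from the examples above.
WRITING_GUIDELINES = """\
### Writing guidelines
* Use simple, active sentences
* Avoid jargon unless essential
* Choose concrete examples over abstract descriptions
* Brevity: write "the author assumes X" not "it appears that there may be an implicit presupposition regarding X"
* Think Paul Graham, not academic journal"""

OUTPUT_FORMAT = """\
### Main critiques (3-5 total, 300-500 words each)
For each critique:
**Title:** [Give the critique a short, memorable title]
**One-sentence summary:** [State the problem in the simplest possible terms]
**The issue:** Explain precisely what's wrong and where it appears in the paper. Use specific quotes or page references. Use bullet points for clarity.
**Why it matters:** Show how this weakness undermines the paper's central argument. Be concrete about the consequences. Use bullet points for clarity.
**What fixing it requires:** Briefly outline what the author would need to do to address this critique adequately. Use bullet points for clarity."""


def build_prompt(task: str, paper_text: str) -> str:
    """Join the task, output format, writing guidelines and paper into one prompt.

    The order (task, format, guidelines, paper) is an arbitrary choice.
    """
    sections = [task.strip(), OUTPUT_FORMAT, WRITING_GUIDELINES, paper_text.strip()]
    return "\n\n".join(sections)
```

For example, `build_prompt("You are critiquing a philosophy paper.", paper_text)` returns a single string you can paste into a chat window or send through an API.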
Try the following prompt with GPT-5 Pro and Gemini 2.5 Pro with "Deep Research" enabled:
## Fact-check request for academic paper
Please conduct a comprehensive fact-check of the attached academic paper. Your task is to verify all factual claims systematically.
### Scope of review
Identify and verify:
* Empirical claims (statistics, dates, figures, percentages)
* Citations and references (check if sources actually say what's claimed)
* Historical facts and chronologies
* Scientific or technical assertions
* Institutional facts (names, roles, affiliations)
* Quotations and paraphrases
* Methodological claims about cited studies
### For each claim requiring verification
1. **Quote the specific claim** from the paper with page/section reference
2. **Assess verifiability** - can this claim be checked against authoritative sources?
3. **Verify the claim** using:
   * Primary sources where possible
   * Multiple independent authoritative sources for important claims
   * Academic databases and peer-reviewed literature
   * Official statistics and institutional records
4. **Report your finding** as one of:
   * Verified (with sources)
   * Incorrect (explain the error and provide correct information)
   * Partially correct (specify what's right and wrong)
   * Unverifiable (explain why)
   * Misleading (technically correct but presented in a way that could mislead)
### Output format
Organise your findings by:
* Critical errors that affect the paper's main arguments
* Minor factual errors
* Unverifiable but plausible claims
* Claims that are technically correct but potentially misleading
### Special attention areas
* Check whether cited papers actually support the claims made about them
* Verify that quotations are accurate and not taken out of context
* Confirm statistical claims match original sources
* Flag any claims that seem implausible even if you cannot definitively disprove them
### Writing guidelines
* Use simple, active sentences
* Avoid philosophical jargon unless essential (and define it when used)
* Choose concrete examples over abstract descriptions
* Brevity: write “the author assumes X” not “it appears that there may be an implicit presupposition regarding X”
* Think Paul Graham, not academic journal
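If you want to post-process the fact-check results, the five verdicts and four output sections in the prompt above map naturally onto a small data structure. The following is a rough sketch only: the `Finding` fields, the `affects_main_argument` flag and the verdict-to-section mapping are my assumptions, and the prompt itself does not ask the model for structured output.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    # The five findings listed in the prompt.
    VERIFIED = "Verified"
    INCORRECT = "Incorrect"
    PARTIALLY_CORRECT = "Partially correct"
    UNVERIFIABLE = "Unverifiable"
    MISLEADING = "Misleading"


@dataclass
class Finding:
    claim: str     # the claim, quoted from the paper
    location: str  # page or section reference
    verdict: Verdict
    affects_main_argument: bool = False  # hypothetical flag, not in the prompt


def section_for(finding: Finding) -> Optional[str]:
    """Map a finding to one of the prompt's four output sections (assumed mapping)."""
    if finding.verdict in (Verdict.INCORRECT, Verdict.PARTIALLY_CORRECT):
        if finding.affects_main_argument:
            return "Critical errors"
        return "Minor factual errors"
    if finding.verdict is Verdict.UNVERIFIABLE:
        return "Unverifiable but plausible claims"
    if finding.verdict is Verdict.MISLEADING:
        return "Technically correct but potentially misleading"
    return None  # verified claims have no section in the output format


def group_findings(findings: list[Finding]) -> dict[str, list[Finding]]:
    """Group findings by output section, dropping those with no section."""
    grouped: dict[str, list[Finding]] = defaultdict(list)
    for finding in findings:
        section = section_for(finding)
        if section is not None:
            grouped[section].append(finding)
    return dict(grouped)
```

Note that this mapping puts every unverifiable claim in the "plausible" section; judging plausibility would still need a human or a second model pass.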
Here's an okay-ish "critique my paper" prompt:
You are critiquing a philosophy paper. Your goal is to identify the most valuable objections that would genuinely improve the work.
## Output structure
### Executive summary (100-150 words)
Start with a numbered list summary of the 3-5 most important critiques. Write this like you're explaining the key problems to a colleague over coffee—direct, clear, no jargon. Each critique gets a short 1-sentence title in bold, then a 1-2 sentence elaboration.
### Main critiques (3-5 total, 300-500 words each)
For each critique:
**Title:** [Give the critique a short, memorable title]
**One-sentence summary:** [State the problem in the simplest possible terms]
**The issue:** Explain precisely what's wrong and where it appears in the paper. Use specific quotes or page references. Use bullet points for clarity.
**Why it matters:** Show how this weakness undermines the paper's central argument. Be concrete about the consequences. Use bullet points for clarity.
**What fixing it requires:** Briefly outline what the author would need to do to address this critique adequately. Use bullet points for clarity.
## Selection criteria
Focus on critiques that are:
* Central to the paper's thesis (not peripheral nitpicks)
* Genuinely difficult to resolve (not easily patched)
* Clear and specific (not vague methodological complaints)
Common critique types to consider:
* Invalid inferences or logical gaps
* False or questionable empirical premises
* Failure to address obvious objections
* Internal contradictions
* Concepts used ambiguously at crucial points
* Overgeneralisation from limited cases
## Writing guidelines
* Use simple, active sentences
* Avoid philosophical jargon unless essential (and define it when used)
* Choose concrete examples over abstract descriptions
* Brevity: write "the author assumes X" not "it appears that there may be an implicit presupposition regarding X"
* Think Paul Graham, not academic journal
[PAPER TEXT]
Some example critiques of notable philosophy papers:
This particular prompt rarely surfaces novel or surprising critiques. But the critiques it produces are often reasonable, and sometimes helpful, if only because they help you communicate better (e.g. by clarifying your writing, or by adding a response to an objection that a reader might raise).



Example critiques of more technical papers by Forethought:
Some public projects I'm following:
Many groups are experimenting internally. For example:
- GiveWell has explored red-teaming their research with AI (public write-up coming soon). They're now exploring custom software for fact-checking and literature review.
- I did a brief, exploratory project for Forethought Research, which included experimentation with OpenAI Agent Builder and a day of vibe-coding a custom interface for GPT-5 Pro critique (see below).
The current generation of AI models can provide useful critique, but it also generates a lot of slop. Some key challenges:
1. Find the diamonds in the slop, to avoid overwhelming researchers.
2. Engineer prompts to get the best out of the models.
3. Make the affordances easy, with well-designed interfaces.
To do (1) and (2), you need good automated evaluation of LLM outputs. That's the hard part, right now.
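A cheap first filter, well short of real quality evaluation, is to check that each critique follows the requested format before a researcher sees it. The sketch below assumes the critique format from the "critique my paper" prompt above; `check_critique` and its whitespace-based word count are hypothetical and only catch structural failures, not bad arguments.

```python
# Field labels and length limits taken from the critique prompt's output format.
REQUIRED_FIELDS = [
    "**Title:**",
    "**One-sentence summary:**",
    "**The issue:**",
    "**Why it matters:**",
    "**What fixing it requires:**",
]
MIN_WORDS, MAX_WORDS = 300, 500


def check_critique(text: str) -> list:
    """Return a list of format problems; an empty list means the critique passes."""
    problems = [
        f"missing field {field}" for field in REQUIRED_FIELDS if field not in text
    ]
    # Rough word count: splits on whitespace and includes the field labels.
    n_words = len(text.split())
    if not MIN_WORDS <= n_words <= MAX_WORDS:
        problems.append(f"{n_words} words, expected {MIN_WORDS}-{MAX_WORDS}")
    return problems
```

Critiques that fail can be regenerated or dropped automatically. Deciding which of the passing critiques are actually worth a researcher's time is the part that remains hard.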
Here's a prototype I made for Forethought Research: