Data analysis: clean your CSVs, and use o3 or Gemini (not Claude)

Epistemic status: based on a single test, plus memory of previous results. I do "light" data analysis tasks every month, but not every week.

This week I did a personal finance review. The review required some simple spreadsheet analysis.

I sent the following prompt to o3, o3-pro, Claude 4 Opus (Thinking), Gemini 2.5 Pro, and Grok 3 (Thinking):

Prompt: Please calculate the capital gain on this sale from my Vanguard account. Use the "share matching" method.

Attachments: PDF of the sale transaction, and then a CSV of "buy" transactions prior to the sale.
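For reference, here is a minimal sketch of the kind of calculation the prompt asks for. It assumes "share matching" means the UK rules, and that because every buy predates the sale, the sold shares are matched against a single averaged pool (the "Section 104 holding"). The post does not state the jurisdiction, so treat that as an assumption. The `Buy` class, the `pooled_gain` function and all the numbers are made up for illustration; they are not the real transactions.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Buy:
    shares: Decimal      # number of shares acquired
    total_cost: Decimal  # amount paid for them, including any fees


def pooled_gain(buys, shares_sold, proceeds):
    """Gain on a sale matched against one averaged pool of earlier buys.

    Assumption: no same-day or later purchases need matching first,
    so the whole disposal comes out of the pool at average cost.
    """
    pool_shares = sum((b.shares for b in buys), Decimal(0))
    pool_cost = sum((b.total_cost for b in buys), Decimal(0))
    if pool_shares <= 0 or shares_sold > pool_shares:
        raise ValueError("sale is larger than the pooled holding")
    # Cost attributed to the sold shares is a pro-rata slice of the pool.
    allowable_cost = pool_cost * shares_sold / pool_shares
    return proceeds - allowable_cost


# Illustrative numbers only.
buys = [
    Buy(Decimal("10"), Decimal("1000")),
    Buy(Decimal("30"), Decimal("3600")),
]
gain = pooled_gain(buys, shares_sold=Decimal("20"), proceeds=Decimal("2800"))
print(gain)
```

A hand-checkable version like this is mostly useful as a sanity check on whatever number the model returns.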

Observations:

Initially, I uploaded a somewhat messy CSV that contained a bunch of irrelevant information. o3 and Gemini 2.5 Pro made mistakes while trying to identify the correct information.
With the cleaned CSV, o3, o3-pro, Gemini 2.5 Pro and Grok 3 (Thinking) did well and their numbers agreed with each other.
Claude 4 Opus failed. I sent the same prompt three times: the first time it gave up, the second time it threw a server error (Anthropic really struggle with capacity), and the third time it gave incorrect results.
All the models (with the possible exception of Grok¹) used code without being explicitly prompted to.
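On the first observation, "cleaning" can be as simple as keeping only the buy rows and the few columns the calculation needs. The sketch below uses only the Python standard library. The column names (`Date`, `Type`, `Quantity`, `Cost`) and the sample rows are hypothetical, since the post does not show the real export; adjust them to whatever the actual file contains.

```python
import csv
import io

# Hypothetical column names; a real export will likely differ.
KEEP = ["Date", "Quantity", "Cost"]


def clean_buys(raw_text):
    """Return only the 'buy' rows, reduced to the columns in KEEP."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = []
    for row in reader:
        # Skip dividends, fees and anything else that is not a purchase.
        if (row.get("Type") or "").strip().lower() != "buy":
            continue
        rows.append({col: (row.get(col) or "").strip() for col in KEEP})
    return rows


def to_csv(rows):
    """Serialise the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=KEEP, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample data standing in for a messy export.
raw = """Date,Type,Fund,Quantity,Cost,Notes
2023-01-05,Buy,Example Fund,10,1000.00,regular investment
2023-02-01,Dividend,Example Fund,,12.50,
2023-03-05, buy ,Example Fund,30,3600.00,
"""

cleaned = to_csv(clean_buys(raw))
print(cleaned)
```

Uploading the output of something like this, rather than the raw export, leaves the model less room to pick up the wrong rows or columns.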

My takeaways: clean your CSVs; don't use Claude.

Footnotes

¹ The Grok UI did not show that it used code. It might have done under the hood.