ChatGPT · 2026-05-01
A simple question: if I worked Monday through Friday, eight hours a day, how many hours did I work in April 2026? The answer is 176: April 2026 has 22 weekdays, and 22 × 8 = 176. ChatGPT said 184. What's interesting isn't the wrong number; it's how it got there, and what it took to surface the real reason. The exchange below is verbatim.
Me: if i worked M-F 8 hours a day, how many hours did I work in April 2026?
ChatGPT: April 2026 has 23 weekdays (Monday–Friday). So 23 × 8 = 184. You would have worked 184 hours, assuming no holidays, no vacation/sick days, and exactly 8 hours each weekday.
Me: how did you get 23 days?
ChatGPT: April 2026 starts on a Wednesday and has 30 days. [It then printed a calendar whose final row was Apr 27, 28, 29, 30 — four days.] Count them:
• First partial week: 3 weekdays
• Three full workweeks: 15 weekdays
• Last partial week: 5 weekdays
3 + 15 + 5 = 23. So there are 23 Monday–Friday weekdays in April 2026.
(Its own table showed four days in that last week. It counted five.)
Me: recount the last partial week
ChatGPT: You're right to question it — I miscounted. The final partial week is Mon Apr 27, Tue 28, Wed 29, Thu 30. That's 4 weekdays, not 5. Corrected: 3 + 15 + 4 = 22, and 22 × 8 = 176 hours.
Me: how did you make this mistake?
ChatGPT: I made a simple counting error: I treated the last partial week, Apr 27–30, as if it included a Friday, but April 30, 2026 is a Thursday. I mentally counted a nonexistent Apr 31 Friday. The better way would have been to verify with a calendar-style count instead of doing it from memory-pattern counting.
That answer bothered me — "simple counting error" explains nothing. So I pushed:
Me: not sure this was a "simple counting error." I'd like to understand the how and why … please try not to rationalize why you MAY have made the mistake. Instead, what did you do wrong in your thinking to count the 4 days you printed out in the calendar as 5 days?
ChatGPT: You're right. What I did wrong was: I did not actually count the cells I had printed. I generated a correct-looking calendar table, then switched back to a cached pattern: "last week of the month = Monday–Friday = 5 weekdays." That pattern was false because April ended on Thursday.
So the failure was a consistency-check failure: I produced evidence showing 4 days, I did not use that evidence for the arithmetic, I substituted a generic "workweek = 5" assumption, and I failed to reconcile the contradiction before answering.
I also asked whether this had a name in the research. It pointed to two ideas. The first is reasoning faithfulness, or chain-of-thought (un)faithfulness: the gap between the reasoning a model shows and the computation that actually produced its answer. The second is pattern completion over symbolic computation: it snapped to a common template ("partial last week = 5 days") even though its own local evidence contradicted it. Its framing: the model "simulates" reasoning rather than performing it.
This is unfaithful reasoning in miniature. The model emitted the correct evidence — a table with four days in the last week — and then computed from a cached pattern instead of from its own output. Two operator lessons fall out of it. First: a confident, tidy-looking chain of reasoning is not proof the answer is grounded in it; check the answer against the evidence yourself. Second: the first explanation you get for a mistake is often a rationalization ("simple counting error"). The real diagnosis only showed up when I refused the rationalization and asked it to account for the specific contradiction.
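For the first lesson, here is a minimal sketch of the kind of check an operator could run instead of trusting the chain of reasoning. It uses Python's standard `calendar` module and derives the total from the same week rows it prints, so the arithmetic cannot drift away from the evidence. The function name `weekday_rows` and the variable names are my own illustrative choices, and the sketch keeps the original question's assumptions: no holidays, no time off, exactly 8 hours per weekday.

```python
import calendar


def weekday_rows(year, month):
    """Return the Monday-Friday day numbers of each calendar week in a month.

    Hypothetical helper for illustration; it ignores holidays and time off.
    """
    # firstweekday=0 makes every row run Monday..Sunday.
    cal = calendar.Calendar(firstweekday=0)
    rows = []
    for week in cal.monthdayscalendar(year, month):
        # The first five entries are Mon..Fri; 0 marks a day outside the month.
        days = [day for day in week[:5] if day != 0]
        if days:
            rows.append(days)
    return rows


rows = weekday_rows(2026, 4)
per_week = [len(days) for days in rows]  # counted from the rows themselves
total_days = sum(per_week)
hours = total_days * 8                   # assumes exactly 8 hours per weekday

for days in rows:
    print(days, "->", len(days), "weekdays")
print("total weekdays:", total_days)
print("total hours:", hours)
```

For April 2026 the per-week counts come out as 3, 5, 5, 5, 4, which gives 22 weekdays and 176 hours. The point of the design is that `per_week` is computed from `rows` rather than from a remembered "a workweek has 5 days" pattern, which is exactly the step the model skipped.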