AI agents wrong ~70% of time: Carnegie Mellon study
(www.theregister.com)
It doesn't matter if you need a human to review. AI has no way of distinguishing between success and failure. Either way, a human will have to review 100% of those tasks.
Right, so this is really only useful in cases where it's vastly easier to verify an answer than to produce one, or where a conventional program can verify the AI's output.
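To make the "conventional program verifies the output" case concrete, here is a minimal sketch. Everything in it is made up for illustration: `make_flaky_proposer` is a hypothetical stand-in for an unreliable AI agent, and factoring is just a convenient task where checking an answer (one multiplication) is far cheaper than finding it. It is not anything from the study.

```python
from typing import Callable, Optional, Tuple

Factors = Tuple[int, int]


def make_flaky_proposer(answers) -> Callable[[int], Factors]:
    """Hypothetical stand-in for an unreliable agent: replays canned answers."""
    it = iter(answers)

    def propose(n: int) -> Factors:
        return next(it)

    return propose


def is_valid_factorization(n: int, p: int, q: int) -> bool:
    """Cheap deterministic check: both factors nontrivial and p * q == n."""
    return 1 < p < n and 1 < q < n and p * q == n


def verified_factors(
    propose: Callable[[int], Factors], n: int, attempts: int = 3
) -> Optional[Factors]:
    """Accept a proposal only if the checker passes; give up after `attempts`."""
    for _ in range(attempts):
        p, q = propose(n)
        if is_valid_factorization(n, p, q):
            return (p, q)
    return None  # nothing verifiable came back; hand off to a human


if __name__ == "__main__":
    # Two wrong answers, then a right one: only the right one gets through.
    agent = make_flaky_proposer([(3, 5), (7, 13), (7, 11)])
    print(verified_factors(agent, 77))
```

The point is that no human reads the rejected answers here. That only works because the check is exact and cheap, which is not true of most of the office-style tasks being discussed.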
A human can review something that is close to correct a lot more easily than they can start the task from zero.
It is a lot harder to notice incorrect information in review than to make sure it is correct while writing it.