The counterargument is GPT-4. For the domains this machine has been trained on, it has a large amount of generality; it captures a lot of that real-world complexity and dirtiness. Reinforcement learning can make it better.
Or in essence: if you collect colossal amounts of information (yes, pirated from humans) and then choose what to do next by asking 'what would a human do', this does seem to solve the generality problem. You then fix your mistakes with RL updates when the machine fails on a real-world task.
I am too dumb/autistic to know what you're conveying here.