Trying to detect poisoned images is the wrong approach. Include them in the training set and the training process itself will eventually correct for them.
I think if you build more robust features
Diffusion approaches etc. do not involve any conscious "building" of features in the first place. The features are learned by training the net to match images with text features correctly, and then to "just" repeatedly predict how to denoise an image to get closer to a match with the text features. If the input includes poisoned images, so what? It's no different from e.g. compression artifacts or noise.
These tools all try to counter models whose training sets did not include images processed by the tool, with at most some fine-tuning. All that shows is that a model which has not seen many images processed by that particular tool will struggle with them.
But in reality, the massive problem with this is that we'd expect any such tool that becomes widespread to be self-defeating: it becomes a source of images that work their way into the training data at sufficient volume for the model to learn them. In doing so it makes the models more robust against noise and artifacts, and so makes the job harder for the next generation of these tools.
In other words, these tools basically act like a manual adversarial training source, and in the long run the main benefit coming out of them will be that they'll prod and probe at failure modes of the models and help remove them.
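To make the "adversarial training source" point concrete, here is a toy sketch, not a diffusion model and not any real poisoning tool. Everything in it is an assumption for illustration: a two-feature logistic regression where one feature is a strong shortcut that a hypothetical "poisoning" step flips, and one feature is noisy but robust. A model that never saw poisoned samples leans on the shortcut and falls over; a model trained with poisoned samples mixed in learns to ignore it.

```python
import math
import random


def make_samples(n, poisoned_fraction, rng):
    """Toy data: each sample is ((x_robust, x_brittle), label).

    x_robust is noisy but points the right way on average.
    x_brittle is a strong shortcut on clean samples; the hypothetical
    poisoning step (an assumption of this sketch) flips its sign.
    """
    samples = []
    for _ in range(n):
        y = rng.choice((-1.0, 1.0))
        x_robust = y + rng.gauss(0.0, 0.5)
        poisoned = rng.random() < poisoned_fraction
        x_brittle = -3.0 * y if poisoned else 3.0 * y
        samples.append(((x_robust, x_brittle), y))
    return samples


def train(samples, steps=200, lr=0.1):
    """Full-batch gradient descent on the logistic loss, no bias term."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in samples:
            margin = y * (w[0] * x[0] + w[1] * x[1])
            # d/dw log(1 + exp(-margin)) = -sigmoid(-margin) * y * x
            # (margin is clamped only to keep exp() from overflowing)
            g = 1.0 / (1.0 + math.exp(max(-30.0, min(30.0, margin))))
            grad[0] -= g * y * x[0]
            grad[1] -= g * y * x[1]
        w[0] -= lr * grad[0] / len(samples)
        w[1] -= lr * grad[1] / len(samples)
    return w


def accuracy(w, samples):
    """Fraction of samples whose predicted sign matches the label."""
    correct = sum(1 for x, y in samples if y * (w[0] * x[0] + w[1] * x[1]) > 0)
    return correct / len(samples)


rng = random.Random(0)
clean_train = make_samples(1000, 0.0, rng)    # never saw the tool's output
mixed_train = make_samples(1000, 0.5, rng)    # tool became widespread
poisoned_test = make_samples(500, 1.0, rng)

w_naive = train(clean_train)
w_exposed = train(mixed_train)

acc_naive = accuracy(w_naive, poisoned_test)
acc_exposed = accuracy(w_exposed, poisoned_test)

print("weights, trained on clean only:", w_naive)
print("weights, trained on mixed data:", w_exposed)
print("accuracy on poisoned test, clean-only model:", acc_naive)
print("accuracy on poisoned test, mixed-data model:", acc_exposed)
```

The mixed-data model ends up with a near-zero weight on the brittle feature, because once poisoned samples show up in volume that feature stops being predictive. That is the sense in which the tool is doing the adversarial training for you.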
The age matters less than the power dynamics of her being his nanny.