323
just p-hack it.
(mander.xyz)
A place for majestic STEMLORD peacocking, as well as memes about the realities of working in a lab.
Rules
This is a science community. We use the Dawkins definition of meme.
if they existed they'd be killer for RL. RL is insanely unstable when the distribution shifts as the policy starts exploring different parts of the state space. you'd think there'd be some clean approach to learning P(Xs|Ys) that can handle continuous shift of the Ys distribution in the training data, but there doesn't seem to be. just replay buffers and other kludges.