I've designed similar devices.
The epoxy blob likely plays factory-set chiptunes out the piezo element.
It's unlikely to play voices, just tones. The ones that record arbitrary audio use specific ICs from China, none of which I see there. Those also drive small but standard speakers, not generally piezo elements.
As a small window into how these are designed, a lot of the engineering goes into reducing power consumption and parts count. The button might, for example, pull an MCU pin low to trigger an interrupt that wakes the chip from deep sleep. Then it plays a tune stored in EEPROM (more common) or internal flash (less common). Usually the chips are one-time-programmable and cost under $0.10!
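
To make the wake-and-play flow a bit more concrete, here is a rough host-side Python model of it. This is not real firmware and not how any particular chip does it: the 1 MHz timer clock, the two-bytes-per-note tune format, the 0xFF terminator, and every name below are assumptions made up for illustration. It only shows the general idea of reading a stored tune and turning each note into a toggle interval for the piezo pin.

```python
# Host-side model only; real parts would do this in a few bytes of firmware.
# TIMER_HZ, the tune format, and all names here are hypothetical.

TIMER_HZ = 1_000_000  # assumed timer clock in Hz

# Note index -> frequency in Hz (C4 to C5, rounded to whole Hz)
NOTE_HZ = [262, 294, 330, 349, 392, 440, 494, 523]

# Assumed storage format: (note index, duration in 1/16 s) pairs, 0xFF ends the tune
TUNE_ROM = bytes([0, 4, 2, 4, 4, 4, 7, 8, 0xFF])


def half_period_ticks(freq_hz):
    """Timer ticks between pin toggles for a square wave at freq_hz."""
    return TIMER_HZ // (2 * freq_hz)


def decode_tune(rom):
    """Yield (frequency_hz, duration_ms) pairs until the 0xFF terminator."""
    i = 0
    while i < len(rom) and rom[i] != 0xFF:
        note, sixteenths = rom[i], rom[i + 1]
        yield NOTE_HZ[note], sixteenths * 1000 // 16
        i += 2


def on_button_wake(rom=TUNE_ROM):
    """Model of the wake interrupt: work out the pin toggles for each note."""
    schedule = []
    for freq_hz, duration_ms in decode_tune(rom):
        # A square wave toggles the pin twice per cycle
        n_toggles = 2 * freq_hz * duration_ms // 1000
        schedule.append((half_period_ticks(freq_hz), n_toggles))
    return schedule  # real firmware would drop back into deep sleep here
```

Calling `on_button_wake()` returns one `(ticks_between_toggles, toggle_count)` pair per note. A square wave is enough here because a piezo element is loudest near its resonant frequency anyway, which is part of why these devices stick to tones.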