The methodology sounds bizarrely complex to me for the purpose of establishing a comparative information transfer rate.
Wouldn't just timing how long it takes to communicate a controlled set of information answer that?
I'm confused by the concept of establishing an average "bitrate per syllable" and multiplying that through. Is this trying to address cases where a language's constructs DEMAND that additional information be encoded in speech? Couldn't one construct a set of information to be communicated that accounts for those quirks, say by finding some "lowest common denominator" sentences?
I feel like I'm missing something, and I'm very curious what my faulty assumption is.