In this project, I'm trying three different methods of generating text-to-speech. Click "See inside" and compare how each method matches the text on the stage with the spoken words. What are the differences?