(04-06-2020, 07:57 AM) jim_chapman Wrote:
(04-05-2020, 07:30 PM) jwhitten Wrote: Just out of curiosity, how are you handling the 'read aloud' part now? Do you 'chunk it up' into discrete amounts, or just keep shoving text until it blocks?
It's a bit intricate - but basically the algorithm is:
1) select the next paragraph of text
2) construct an SSML string for the selected paragraph of text, including 'marker' elements to show the location of any page-breaks in the text
3) submit the string to the speech engine
4) whenever the speech engine raises a 'marker reached' event, switch the page as necessary, and re-apply the 'selected text' overlay
5) when the speech engine raises an 'end of utterance' event, figure out whether I am meant to keep reading; if so, select the next paragraph and go to (2)
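For illustration, the five steps above could be sketched roughly as follows in C#. This is only a sketch under stated assumptions: ISpeechEngine, ReadAloudController, and the event names are hypothetical stand-ins, not the UWP or Android API and not FREDA's actual code, and page breaks are assumed to arrive as ascending character offsets within each paragraph.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical abstraction over a platform speech engine (not a real API).
public interface ISpeechEngine
{
    event Action<string> MarkerReached; // argument: name of the SSML mark
    event Action UtteranceEnded;
    void Speak(string ssml);
}

public sealed class ReadAloudController
{
    private static readonly XNamespace Ns = "http://www.w3.org/2001/10/synthesis";
    private readonly ISpeechEngine _engine;
    private readonly IReadOnlyList<(string Text, int[] PageBreaks)> _paragraphs;
    private int _index = -1;

    public bool KeepReading { get; set; } = true;

    // Step 4: the UI subscribes here to switch page and re-apply the overlay.
    public event Action PageBreakReached;

    public ReadAloudController(ISpeechEngine engine,
        IReadOnlyList<(string Text, int[] PageBreaks)> paragraphs)
    {
        _engine = engine;
        _paragraphs = paragraphs;
        _engine.MarkerReached += _ => PageBreakReached?.Invoke();
        // Step 5: on end of utterance, continue only if still meant to read.
        _engine.UtteranceEnded += () => { if (KeepReading) SpeakNext(); };
    }

    public void Start() => SpeakNext();

    private void SpeakNext()
    {
        if (++_index >= _paragraphs.Count) return;   // step 1: next paragraph
        var (text, breaks) = _paragraphs[_index];
        _engine.Speak(BuildSsml(text, breaks));      // steps 2 and 3
    }

    // Step 2: wrap the paragraph in <speak> and insert a <mark/> at each
    // page-break offset. LINQ to XML escapes the text content for us.
    public static string BuildSsml(string text, IEnumerable<int> pageBreaks)
    {
        var speak = new XElement(Ns + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"));
        int pos = 0, n = 0;
        foreach (int offset in pageBreaks)
        {
            speak.Add(new XText(text.Substring(pos, offset - pos)));
            speak.Add(new XElement(Ns + "mark",
                new XAttribute("name", "page-break-" + n++)));
            pos = offset;
        }
        speak.Add(new XText(text.Substring(pos)));
        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

A caller would implement ISpeechEngine on top of the platform's engine, subscribe to PageBreakReached from the UI layer, and call Start(); setting KeepReading to false stops the loop after the current paragraph.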
Complicating factors include the facts that:
- In UWP, you need to set up a MediaPlayer separately from the speech engine to play the speech sound stream. Some of those events come from the MediaPlayer object, not the speech engine; a MediaPlayer will sometimes freeze if fed two speech streams immediately one after another (so I build a round-robin pool of MediaPlayers); and some of these actions must be done on the UI thread, while others must not be.
- In Android, there are many different speech engines that might be installed, and they have different features. None of them provides 'marker' functionality, but some of them provide a 'progress' event, which triggers repeatedly during the utterance (typically at the end of each phoneme) - so if the speech engine provides that feature, I can use that to do logic like step (4). Of course, there is no reliable declarative mechanism to determine whether a particular Android speech engine does or does not support the feature, so I have to test it dynamically. If the feature isn't supported, then I need to use a work-around (basically, I read one sentence at a time, rather than one paragraph at a time).
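The sentence-at-a-time work-around described for Android might be sketched like this in C#. Everything here is an assumption for illustration: UtteranceChunker, ProgressProbe, and the regex-based sentence splitter are hypothetical, and a real probe would be wired to whatever progress callback the installed engine actually exposes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Records whether the engine ever raised a progress callback during a
// test utterance. Hypothetical helper, not part of any speech API.
public sealed class ProgressProbe
{
    public bool ProgressSeen { get; private set; }

    // Call this from the engine's progress callback, if it has one.
    public void OnProgress() => ProgressSeen = true;
}

public static class UtteranceChunker
{
    // Naive splitter: breaks after '.', '!' or '?' followed by whitespace.
    // Abbreviations such as "e.g." will be split incorrectly.
    public static IReadOnlyList<string> SplitSentences(string paragraph) =>
        Regex.Split(paragraph.Trim(), @"(?<=[.!?])\s+")
             .Where(s => s.Length > 0)
             .ToList();

    // With progress events, speak the whole paragraph in one utterance;
    // without them, fall back to one utterance per sentence.
    public static IReadOnlyList<string> ChunksFor(string paragraph, bool engineReportsProgress) =>
        engineReportsProgress ? new[] { paragraph } : SplitSentences(paragraph);
}
```

The idea would be to speak a short test utterance once per engine, check ProgressProbe.ProgressSeen afterwards, and pass that flag to ChunksFor for every paragraph that follows.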
Issues like these would lead me to believe that adding another speech engine (and moreover one that is in the cloud) might not be entirely straightforward ;-)
Hi Jim,
I did do some digging into the Windows UWP / SAPI stuff and you are 100% correct-- what a cluster-F*** that is. I have an inkling of the issues that you face with the Windows TTS subsystem. No question there.
BTW, if you'll look below the message, I've rounded up a number of .NET-related examples and stuff that might help you.
Bear in mind that I wouldn't be bugging you about this except for the fact that FREDA is frickin' awesome!
I'm not a rich guy, but would a $100 bounty sweeten the pot?
Also, could you tell me more about your .NET setup? Are you using Mono, by chance? And if you could pull out enough of your code to show me the problem and give me something I can compile and noodle with, I don't mind helping with testing insofar as I can. I recognize that you might not want to do that, and I understand if you don't. I've never done .NET programming specifically, but I've got lots of years in other languages.
I also get your "issues" in terms of chunking it up and dealing with potentially multiple audio streams, and I presume one of your biggest issues is getting the thing to simply shut up if the user cuts it off mid-stream?
What do you think about the idea of a companion app that could offload these types of jobs via some sort of local interface? That would have minimal impact on your existing FREDA application while permitting experimentation / augmentation on an as-needed basis via the companion app.
Just spit-balling here. Trying to see if there's any road that could lead to glory...
Also, FWIW, I'm not married to the Amazon Polly voices, I just like them best. If IBM Watson, GoogleTTS, or any of the others would work better-- works for me. Just looking for anything that isn't MS David, Mark or Zira ;-) (Or any of their other crappy voices)
And one last thing-- I've investigated the possibility of using commercial TTS voices, but I haven't found any that solve the problem, that is, any that FREDA can find and use. I still have inquiries out to a couple of vendors, but I'm not holding my breath based on my findings thus far.
Thanks
John Whitten
I found these documents detailing .NET access to Amazon Polly.
AmazonPollyClient - (.NET) -
https://docs.aws.amazon.com/sdkfornet/v3...lient.html
And a number of .NET examples and libraries:
"Top 20 NuGet Polly Packages" -
https://nugetmusthaves.com/Tag/polly
This link has a complete example (see snippet below): "Using Amazon Polly From .NET / C#, Get MP3 File" -
https://chrisbitting.com/2017/04/07/usin...-mp3-file/
(A snippet from the above document)
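using System.IO;
using Amazon.Polly;
using Amazon.Polly.Model;

// Picks up credentials and region from the default AWS SDK configuration.
AmazonPollyClient pc = new AmazonPollyClient();

SynthesizeSpeechRequest sreq = new SynthesizeSpeechRequest();
sreq.Text = "Your Sample Text Here";
sreq.OutputFormat = OutputFormat.Mp3;
sreq.VoiceId = VoiceId.Amy;

// Synchronous call; on .NET Core builds of the SDK, use SynthesizeSpeechAsync instead.
SynthesizeSpeechResponse sres = pc.SynthesizeSpeech(sreq);

// Copy the returned audio to an MP3 file; 'using' flushes and closes the stream.
using (var fileStream = File.Create(@"c:\yourfile.mp3"))
{
    sres.AudioStream.CopyTo(fileStream);
}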
Another .NET Example:
Another .NET / C# example I found on GitHub: "boriz/AmazonPollyTester" -
https://github.com/boriz/AmazonPollyTester
Other Resources:
I found this, which looks interesting. It mentions .NET here and there, but the examples are all in Python. Otherwise it's a good read for interacting with AWS / Polly in general -
https://catalogimages.wiley.com/images/d...xcerpt.pdf
This one appears to be the formal document for developing .NET applications in general, of which Polly would certainly be a subset: "Developing and Deploying .NET Applications on AWS" (from Amazon) -
https://d1.awsstatic.com/whitepapers/dev...on_aws.pdf