Turnipsoft Forums - Add Amazon Polly Support **PUH-LEEEEZZE** !!! :-)

Hello,

First of all, I **LOVE** Freda. (Heart) It is a very nice program. I have tried a zillion different e-readers and keep coming back to Freda; IMO it has the best mix of features and is the easiest to use. I use it to write books with.

My one big request: would you PLEEEEEEEEASE consider adding support for Amazon Polly? I know that Freda already supports the built-in (and incredibly monotonous) Windows voices, but Amazon Polly voices are so much nicer. They're technically not free, though Amazon does give away the first year for free, but the cost is so low it's trivial-- and if you use the read-aloud feature, the voices are well worth it.

Here is their Info Page:
https://aws.amazon.com/polly/

Here's basic info about their API:
https://aws.amazon.com/polly/features/?nc=sn&loc=3

Here's their full API reference:
https://docs.aws.amazon.com/polly/latest...rence.html

I've looked over the API and I'm guessing that it wouldn't be too terribly difficult to bolt it on. You'd have to add some variables to store the user's AWS Polly ID and probably some voice setup stuff. :-D

I would be delighted (and ecstatic) to be your BETA TESTER for this feature!!! :-D

I hope you will consider it; the Amazon Polly voices rock!!

Thanks

JWhitten

Thank you for the suggestion, and for looking into the implementation details. It is certainly something that I would want to keep an eye on. But just now, I think the commercial aspect would be the big problem. If I understand Amazon's pricing model, I would need to pay several dollars whenever a Freda user got the program to read one book aloud using a Polly voice. So if, in the course of one month, all my users between them read a hundred books aloud in Freda, I am facing a cost of several thousand dollars a year.

I can't just swallow that cost, but it would be tricky to find a way to pass it on to the people who are actually using the Polly voices. It would mean adding a new pay-per-use feature to the app, and then trying to balance it so that it did actually recover the right amount of money. Then there is also the work involved in API integration. My past experience of integrating text-to-speech APIs is that even the simple ones take hours to get right, and with Amazon Polly I have the added complication of a dependency on an AWS cloud service. That brings in all sorts of error cases (e.g. any API call might fail because the internet connection is unavailable or interrupted).
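
To make the error-case point concrete, here is a minimal Python sketch of falling back to a local voice when a cloud call fails. It is an illustration only: `synthesize_with_fallback` and the two callables are hypothetical names, and treating `OSError` as "the network is down" is an assumption, since a real SDK may raise its own exception types.

```python
from typing import Callable, Tuple

# A synthesizer is anything that turns text into audio bytes (hypothetical type).
Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, cloud: Synthesizer, local: Synthesizer
) -> Tuple[bytes, str]:
    """Try the cloud voice; on a network-style failure, use the local voice.

    Returns the audio bytes plus a label saying which engine produced them.
    """
    try:
        return cloud(text), "cloud"
    except OSError:
        # OSError covers ConnectionError and TimeoutError in Python. A real
        # SDK may raise its own exception types, which would need adding here.
        return local(text), "local"
```

Even this small wrapper leaves open what to do when the connection drops halfway through an utterance, which is part of why the cloud dependency adds work.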

That adds up to quite a lot of work, and at the end of the day I am not sure that many people would actually use it, once they realised they had to pay for it (I think that the cost of the feature would come out at around ten cents per hour of reading).
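
As a rough check on these figures, here is a back-of-envelope calculation in Python. Every input is an assumption rather than a number from this thread: a price of $4 per million characters (believed to be the list price for Polly's standard voices at the time), a 500,000-character book, and a read-aloud speed of about 150 words per minute at roughly 6 characters per word.

```python
# Back-of-envelope Polly cost estimate. Every constant here is an assumption
# for illustration, not a figure quoted in this thread.
PRICE_PER_MILLION_CHARS = 4.00   # assumed USD price for standard voices
CHARS_PER_BOOK = 500_000         # assumed length of a typical novel
WORDS_PER_MINUTE = 150           # assumed read-aloud speed
CHARS_PER_WORD = 6               # assumed average, including the space


def cost_for_chars(chars: int, price_per_million: float = PRICE_PER_MILLION_CHARS) -> float:
    """Return the synthesis cost in USD for a given number of characters."""
    return chars / 1_000_000 * price_per_million


def cost_per_book() -> float:
    """Cost of reading one assumed-length book aloud."""
    return cost_for_chars(CHARS_PER_BOOK)


def cost_per_hour() -> float:
    """Cost of one hour of speech at the assumed reading speed."""
    chars_per_hour = WORDS_PER_MINUTE * 60 * CHARS_PER_WORD
    return cost_for_chars(chars_per_hour)


def cost_per_year(books_per_month: int) -> float:
    """Yearly cost if all users together read this many books per month."""
    return cost_per_book() * books_per_month * 12


if __name__ == "__main__":
    print(f"per book: ${cost_per_book():.2f}")
    print(f"per hour: ${cost_per_hour():.2f}")
    print(f"per year at 100 books/month: ${cost_per_year(100):.2f}")
```

With these assumed inputs a book comes to about two dollars, a hundred books a month to a few thousand dollars a year, and an hour of listening to roughly twenty cents, which is the same order of magnitude as the estimate above. Changing the constants to match current pricing changes the totals accordingly.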

So right now, I am afraid it doesn't look like something I'll be doing soon. But please do feel free to get back to me if you see some solution that I am missing - and thanks, in any case, for your engagement!

(04-02-2020, 07:51 AM)jim_chapman Wrote: Thank you for the suggestion, and for looking into the implementation details. [...]

I think you could simply embed the feature into the program and then let the readers themselves buy their own subscriptions and link their own account ID to the program. If they buy a subscription, they can use it; if they don't, they can't.

I'll double-check, but I think that model would work okay. I don't see how that's any different than doing it from the command line, for instance. When I signed up for Polly I had to register, give my payment details, create my account ID, etc. And then when I use the command line, I include my key and it processes the data.
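
A minimal Python sketch of that bring-your-own-account model, for illustration only. It assumes the `boto3` SDK for the real call (`boto3.client("polly", ...)` and `synthesize_speech` are existing boto3 calls). The `UserPollySettings` class and the `speak_with_user_account` helper are hypothetical names invented here, not anything in Freda.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UserPollySettings:
    """Hypothetical per-user settings the reader app would have to store."""
    access_key_id: str
    secret_access_key: str
    region: str = "us-east-1"
    voice_id: str = "Amy"


def make_polly_client(settings: UserPollySettings) -> Any:
    """Create a Polly client billed to the user's own AWS account."""
    import boto3  # third-party; imported lazily so the sketch loads without it

    return boto3.client(
        "polly",
        aws_access_key_id=settings.access_key_id,
        aws_secret_access_key=settings.secret_access_key,
        region_name=settings.region,
    )


def speak_with_user_account(
    client: Any, settings: Optional[UserPollySettings], text: str
) -> Optional[bytes]:
    """Return MP3 bytes, or None when the user has not linked an account."""
    if settings is None:
        return None  # no subscription linked: the feature is simply unavailable
    response = client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=settings.voice_id,
    )
    return response["AudioStream"].read()
```

The client is passed in rather than created inside the helper, so the "no account, no feature" rule can be exercised without touching the network.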

I see what you have in mind - but it does look like it might be quite difficult to do. From a quick look at the AWS and Polly documentation, I didn't find any easily digestible instructions for how a .NET (UWP and Xamarin) app should carry out authentication and construct the endpoint for invoking the Polly APIs. It is obviously possible, but the only proper sample code I could find was for Python and Java, and even that seemed to be missing some of the important steps. So making it work in .NET might be a bit of a research project. The Freda file-browsers for DropBox, GoogleDrive, and Synology were all research projects of similar complexity, and they all required some days or weeks of work. To that we add the work to deal with all the tricky asynchronous logic for launching, tracking and interrupting speech playback, and dealing with all the various edge cases and error cases.

I think you can tell that I'm not keen ;-)

But honestly it sounds like a fair amount of work to provide a feature that's not going to be useful to many of my users (until/unless a lot of people start setting up personal AWS accounts and subscribing to Polly voices for their personal use). So, given my current knowledge of what would be involved, I am not going to prioritise this feature.

Sorry not to be more helpful.

(04-05-2020, 06:00 PM)jim_chapman Wrote: I see what you have in mind - but it does look like it might be quite difficult to do. [...]

Just out of curiosity, how are you handling the 'read aloud' part now? Do you 'chunk it up' into discrete amounts, or just keep shoving text until it blocks?

(04-05-2020, 07:30 PM)jwhitten Wrote: Just out of curiosity, how are you handling the 'read aloud' part now? Do you 'chunk it up' into discrete amounts, or just keep shoving text until it blocks?

It's a bit intricate - but basically the algorithm is:
1) select the next paragraph of text
2) construct an SSML string for the selected paragraph of text, including 'marker' elements to show the location of any page-breaks in the text
3) submit the string to the speech engine
4) whenever the speech engine raises a 'marker reached' event, switch the page as necessary, and re-apply the 'selected text' overlay
5) when the speech engine raises an 'end of utterance' event, figure out whether I am meant to keep reading; if so, select the next paragraph and go to (2)
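
Step (2) above can be sketched in Python as follows. This is a simplified illustration, not Freda's code: the `build_ssml` name, the character-offset input and the `page-N` mark names are all assumptions, and a real engine may require extra attributes (version, namespace, language) on the `<speak>` element.

```python
from typing import Sequence
from xml.sax.saxutils import escape


def build_ssml(paragraph: str, page_break_offsets: Sequence[int]) -> str:
    """Wrap a paragraph in SSML, inserting a <mark/> at each page-break offset.

    Offsets are character positions within `paragraph`; the mark names
    ("page-1", "page-2", ...) are an arbitrary convention for this sketch.
    """
    parts = []
    previous = 0
    for index, offset in enumerate(sorted(page_break_offsets), start=1):
        # Escape &, < and > so the book text cannot break the SSML markup.
        parts.append(escape(paragraph[previous:offset]))
        parts.append(f'<mark name="page-{index}"/>')
        previous = offset
    parts.append(escape(paragraph[previous:]))
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(build_ssml("Tom & Jerry ran. Then they stopped.", [17]))
```

When the engine later reports that it reached `page-1`, the app knows the spoken text has crossed the page break and can turn the page, as in step (4).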

Complicating factors include:

In UWP, you need to set up a MediaPlayer separately from the speech engine to play the speech sound stream. Some of those events come from the MediaPlayer object, not the speech engine. A MediaPlayer will sometimes freeze if fed two speech streams immediately one after another (so I build a round-robin pool of MediaPlayers). And some of these actions must be done on the UI thread, while others must not be.

In Android, there are many different speech engines that might be installed, and they have different features. None of them provides 'marker' functionality, but some of them provide a 'progress' event, which triggers repeatedly during the utterance (typically at the end of each phoneme) - so if the speech engine provides that feature, I can use that to do logic like step (4). Of course, there is no reliable declarative mechanism to determine whether a particular Android speech engine does or does not support the feature, so I have to test it dynamically. If the feature isn't supported, then I need to use a work-around (basically, I read one sentence at a time, rather than one paragraph at a time).
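
The round-robin pool mentioned for UWP can be illustrated with a short Python sketch. The `RoundRobinPool` class is a hypothetical stand-in, the pool size of three is an assumption, and the factory would create real MediaPlayer objects in the actual app.

```python
from itertools import cycle
from typing import Callable, Generic, Iterator, List, TypeVar

T = TypeVar("T")


class RoundRobinPool(Generic[T]):
    """Hand out pre-built players in rotation so consecutive streams never share one."""

    def __init__(self, factory: Callable[[], T], size: int = 3) -> None:
        if size < 2:
            raise ValueError("need at least two players to alternate between")
        self._players: List[T] = [factory() for _ in range(size)]
        self._rotation: Iterator[T] = cycle(self._players)

    def next_player(self) -> T:
        """Return the next player in the rotation, wrapping around at the end."""
        return next(self._rotation)
```

Each new speech stream goes to `pool.next_player()`, so a player is only reused after every other player in the pool has had a turn.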

Issues like these would lead me to believe that adding another speech engine (and moreover one that is in the cloud) might not be entirely straightforward ;-)
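
The Android work-around described above (one sentence at a time when the engine gives no progress events) might look like this in outline. It is a Python sketch under stated assumptions: `utterances_for` is a hypothetical name, the capability flag is assumed to come from a dynamic test made elsewhere, and the sentence splitter is deliberately naive.

```python
import re
from typing import List

# Naive sentence boundary: '.', '!' or '?' followed by whitespace. Real text
# (abbreviations, quotations, ellipses) needs something more careful.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def utterances_for(paragraph: str, engine_reports_progress: bool) -> List[str]:
    """Choose how much text to hand to the speech engine in one call.

    With progress events the whole paragraph can go in one utterance; without
    them, shorter utterances keep the highlighted text and the page in step.
    """
    if engine_reports_progress:
        return [paragraph]
    return [s for s in _SENTENCE_END.split(paragraph.strip()) if s]
```

For example, `utterances_for("One. Two! Three?", False)` yields three separate utterances, while passing `True` returns the paragraph unchanged.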

Revisiting this question briefly-- when I originally asked it, I was thinking about the possibility of adding the Amazon Polly API into Freda to interact with Amazon Polly directly, via Amazon.

BUT -- in thinking it over... When you install Amazon Polly on your system, the voices get installed into the Windows Text-to-Speech system, meaning that if you use the drop-down, the Amazon Polly voices are listed (see screenshots).

But when I go into Freda, all I see are the three default voices that Microsoft delivered. Why isn't Freda showing the Amazon Polly voices? Did you just hard-code the MS default voices?

I guess my question is, Why isn't Freda working with the Amazon Polly voices as installed??

Thanks!!

John Whitten

[attachment=15]

[attachment=17]

(08-06-2020, 06:55 PM)jwhitten Wrote: Revisiting this question briefly-- when I originally asked it, I was thinking about the possibility of adding the Amazon Polly API into Freda to interact with Amazon Polly directly, via Amazon. [...]

Text-to-speech in Windows is a messy business - and not all installed voices are available to all applications. Moreover, there are specific limitations that apply to UWP (Universal Windows Platform) apps like Freda. (If you want to explore this mess, you can Google terms like UWP, SAPI 5 Voice, Core Voice and the like - and look into the distinction between 32-bit and 64-bit voices ... and find various dangerous suggestions for messing with your machine's registry. But the bottom line is: just because some installer utility says that a certain voice is 'installed', that doesn't mean that Freda can use it.)

Freda gets its list of available text-to-speech voices by doing: "from voice in SpeechSynthesizer.AllVoices select voice". So it's working from a list of all the voices that the Windows 10 Operating System has decided to make available to UWP applications. I don't know why the Amazon Polly voices are not on that list but (per the 'messy business' point) there are many possible reasons for it - and only Amazon could tell us what is going on. I did a quick search of their on-line documentation, and did not find anything informative.

Thanks for checking. I really like the Amazon Polly voices and it's the one thing I wish Freda had / could support that would "make my life complete" (I have simple needs ;-) ) I know it's not your "fault" in any sense, I just wish there was a way to figure it out.

Alternatively-- any of the better voices from any of the TTS services would be better than nothing. I can keep looking. I was exploring Watson voices the other day. I might sign up and see if they can do any better-- I'm guessing not though, or else they would likely already be out there in other readers.

Thanks again

John

(04-06-2020, 07:57 AM)jim_chapman Wrote: It's a bit intricate - but basically the algorithm is: [...]

Hi Jim,

I did do some digging into the Windows UWP / SAPI stuff and you are 100% correct-- what a cluster-F*** that is. I have an inkling of the issues that you face with the Windows TTS subsystem. No question there.

BTW, if you'll look below the message, I've rounded up a number of .NET-related examples and stuff that might help you.

Bear in mind that I wouldn't be bugging you about this except for the fact that FREDA is ficken' awesome! :-D

I'm not a rich guy, but would a $100 bounty sweeten the pot?

Also, if you would tell me more about your .NET setup (are you using Mono, by chance?) and/or could pull out enough of your code to show me the problem, something I can compile and noodle with, I don't mind helping with some testing insofar as I can. I recognize that you might not want to do that, and I understand if you don't. I've never done .NET programming specifically, but I've got lots of years in other languages.

I also get your "issues" in terms of chunking it up, dealing with the potentially multiple audio streams, and I presume one of your biggest issues is to get the thing to simply shut up if the user cuts it off mid-stream??

What do you think about the idea of a companion app that could offload these types of jobs via some sort of local interface? That would have minimal impact on your existing FREDA application while permitting experimentation / augmentation on an as-needed basis via the companion app?
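
To make the companion-app idea a bit more concrete, here is a minimal Python sketch of a local service that accepts text and returns audio bytes. It is purely illustrative: `make_tts_server`, the plain-text POST body and the injected `synthesize` callable are assumptions made for this example, and it says nothing about how Freda would actually talk to such a helper.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def make_tts_server(synthesize: Callable[[str], bytes], port: int = 0) -> HTTPServer:
    """Build a localhost-only HTTP server that turns POSTed text into audio bytes.

    `synthesize` is whatever backend the companion app wraps (Polly, Watson,
    ...); port 0 asks the operating system for any free port.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            # The request body is the UTF-8 text to be spoken.
            length = int(self.headers.get("Content-Length", "0"))
            text = self.rfile.read(length).decode("utf-8")
            audio = synthesize(text)
            self.send_response(200)
            self.send_header("Content-Type", "audio/mpeg")
            self.send_header("Content-Length", str(len(audio)))
            self.end_headers()
            self.wfile.write(audio)

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep the sketch quiet

    # Bind to loopback only, so nothing outside this machine can reach it.
    return HTTPServer(("127.0.0.1", port), Handler)
```

A caller would start it with `server.serve_forever()` on a background thread and POST each paragraph to it. The reader app would still have to play the returned audio and handle interruptions, so this moves the cloud integration out of the app rather than removing the playback logic.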

Just spit-balling here. Trying to see if there's any road that could lead to glory...

Also, FWIW, I'm not married to the Amazon Polly voices, I just like them best. If IBM Watson, GoogleTTS, or any of the others would work better-- works for me. Just looking for anything that isn't MS David, Mark or Zira ;-) (Or any of their other crappy voices)

And one last thing-- I've investigated the possibility of using commercial TTS voices, but I haven't found any that solve the problem, i.e. any that FREDA can find and use. I still have inquiries out to a couple of vendors, but I'm not holding my breath based on my findings thus far.

Thanks

John Whitten

I found these documents detailing .NET access to Amazon Polly.

AmazonPollyClient - (.NET) - https://docs.aws.amazon.com/sdkfornet/v3...lient.html

And a number of .NET examples and libraries:

"Top 20 NuGet Polly Packages" - https://nugetmusthaves.com/Tag/polly

This link has a complete example (see snippet below) "Using Amazon Polly From .NET / C#, Get MP3 File" - https://chrisbitting.com/2017/04/07/usin...-mp3-file/

(A snippet from above document)

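// Needs the AWSSDK.Polly NuGet package.
using System.IO;
using Amazon.Polly;
using Amazon.Polly.Model;

// With no arguments, the client takes credentials and region from the default AWS profile or environment.
AmazonPollyClient pc = new AmazonPollyClient();

SynthesizeSpeechRequest sreq = new SynthesizeSpeechRequest();
sreq.Text = "Your Sample Text Here";
sreq.OutputFormat = OutputFormat.Mp3;
sreq.VoiceId = VoiceId.Amy;
// Synchronous call; as far as I can tell, the .NET Core / .NET Standard builds of the SDK only offer SynthesizeSpeechAsync.
SynthesizeSpeechResponse sres = pc.SynthesizeSpeech(sreq);

// Copy the returned MP3 stream to disk; the using block flushes and closes the file.
using (var fileStream = File.Create(@"c:\yourfile.mp3"))
{
    sres.AudioStream.CopyTo(fileStream);
}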

Another .NET / C# example I found on GitHub: "boriz/AmazonPollyTester" - https://github.com/boriz/AmazonPollyTester

Other Resources:

I found this which looks interesting, but while it mentions .NET here and there, the examples are all in Python. But otherwise a good read for interacting with AWS / Polly in general - https://catalogimages.wiley.com/images/d...xcerpt.pdf

This one appears to be the formal document for developing .NET applications, in general, though certainly, Polly would be a subset - "Developing and Deploying .NET Applications on AWS" (From Amazon) - https://d1.awsstatic.com/whitepapers/dev...on_aws.pdf