Add Amazon Polly Support **PUH-LEEEEZZE** !!! :-)
#1
Hello,

First of all, I **LOVE** Freda. It is a very nice program. I have tried a zillion different e-readers and keep coming back to Freda; IMO it has the best mix of features and is the easiest to use. I use it to write books with.

My one big request: would you PLEEEEEEEEASE consider adding support for Amazon Polly? I know that Freda already supports the built-in (and incredibly monotonous) Windows voices, but the Amazon Polly voices are so much nicer. They're technically not free (though the first year is free), but the cost is so low it's trivial -- and if you use the read-aloud feature, the voices are well worth it.

Here is their Info Page: 
https://aws.amazon.com/polly/

Here's basic info about their API:
https://aws.amazon.com/polly/features/?nc=sn&loc=3

Here's their full API reference:
https://docs.aws.amazon.com/polly/latest...rence.html


I've looked over the API and I'm guessing that it wouldn't be too terribly difficult to bolt it on. You'd have to add some variables to store the user's AWS Polly ID and probably some voice setup stuff.

I would be delighted (and ecstatic) to be your BETA TESTER for this feature!!!

I hope you will consider it, the Amazon Polly voices rock!!

Thanks

JWhitten
#2
Thank you for the suggestion, and for looking into the implementation details.  It is certainly something that I would want to keep an eye on.  But just now, I think it's the commercial aspect that would be the big problem.  If I understand Amazon's pricing model, I would need to pay several dollars whenever a Freda user got the program to read one book aloud using a Polly voice.  So imagining that, in the course of one month, all my users might between them read a hundred books aloud in Freda, I am facing a cost of several thousand dollars a year.

I can't just swallow that cost, but it would be tricky to find a way to pass it on to the people who are actually using the Polly voices.  It would mean adding a new pay-per-use feature to the app, and then trying to balance it so that it did actually recover the right amount of money.  Then there is also the work involved in API integration.  My past experience of integrating text-to-speech APIs is that even the simple ones take hours to get right, and with Amazon Polly I have the added complication of a dependency on an AWS cloud service.  That brings in all sorts of error cases (e.g. any API call might fail because the internet connection is unavailable or interrupted).

That adds up to quite a lot of work, and at the end of the day I am not sure that many people would actually use it, once they realised they had to pay for it (I think that the cost of the feature would come out at around ten cents per hour of reading).

So right now, I am afraid it doesn't look like something I'll be doing soon.  But please do feel free to get back to me if you see some solution that I am missing - and thanks, in any case, for your engagement!
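
For what it's worth, the back-of-envelope sum above can be written out as a tiny Python sketch. The hundred books a month is the estimate from this post; the 3.00 dollars per book is only an assumed stand-in for "several dollars", and the function name is made up for illustration.

```python
def annual_cost(cost_per_book: float, books_per_month: int) -> float:
    """Yearly spend if the app developer pays for every book read aloud."""
    return cost_per_book * books_per_month * 12


# Assumption: "several dollars" per book is taken as 3.00 for illustration.
estimate = annual_cost(cost_per_book=3.00, books_per_month=100)
print(f"Estimated yearly cost: ${estimate:,.2f}")
```

Under those assumed inputs the total lands in the "several thousand dollars a year" range; a different per-book figure scales it linearly.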
#3
(04-02-2020, 07:51 AM)jim_chapman Wrote: [...] I can't just swallow that cost, but it would be tricky to find a way to pass it on to the people who are actually using the Polly voices. [...]

I think you could simply embed the feature (ability) into the program and then let the readers themselves buy their own subscriptions and simply marry their ID to the program. If they buy a license (subscription), they can use it; if they don't, they can't.

I'll double-check, but I think that model would work okay. I don't see how that's any different than doing it from the command line, for instance. When I signed up for Polly I had to register, give my payment details, create my account ID, etc. And then when I use the command line, I include my key and it processes the data.
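
Here is a minimal sketch of that bring-your-own-account idea. It is written in Python with the boto3 AWS SDK purely for illustration (Freda itself is a .NET app), and the `PollySettings` class, its field names, and the default region and voice are all assumptions rather than anything Freda has. The point is only that the feature stays locked until the user has entered their own keys, so usage is billed to the user's account.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PollySettings:
    """User-supplied AWS details; every field here is a hypothetical app setting."""
    access_key_id: Optional[str] = None
    secret_access_key: Optional[str] = None
    region: str = "us-east-1"   # assumed default region
    voice_id: str = "Joanna"    # assumed default voice

    def is_configured(self) -> bool:
        return bool(self.access_key_id and self.secret_access_key)


def synthesize(settings: PollySettings, text: str) -> bytes:
    """Return MP3 audio for `text`, billed to the user's own AWS account."""
    if not settings.is_configured():
        raise PermissionError("No Polly credentials entered; feature unavailable.")

    # Third-party SDK, imported lazily so the credential gate works without it.
    import boto3

    client = boto3.client(
        "polly",
        aws_access_key_id=settings.access_key_id,
        aws_secret_access_key=settings.secret_access_key,
        region_name=settings.region,
    )
    response = client.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=settings.voice_id
    )
    return response["AudioStream"].read()
```

Calling `synthesize(PollySettings(), "hello")` raises `PermissionError` before any network traffic happens, which is the "if they don't, they can't" half of the idea.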
#4
I see what you have in mind - but it does look like it might be quite difficult to do.  From a quick look at the AWS and Polly documentation, I didn't find any easily digestible instructions for how a .NET (UWP and Xamarin) app should carry out authentication and construct the endpoint for invoking the Polly APIs.  It is obviously possible, but the only proper sample code I could find was for Python and Java, and even that seemed to be missing some of the important steps.  So making it work in .NET might be a bit of a research project.  The Freda file-browsers for DropBox, GoogleDrive, and Synology were all research projects of similar complexity, and they all required some days or weeks of work.  To that we add the work to deal with all the tricky asynchronous logic for launching, tracking and interrupting speech playback, and dealing with all the various edge cases and error cases.

I think you can tell that I'm not keen ;-)

But honestly it sounds like a fair amount of work to provide a feature that's not going to be useful to many of my users (until/unless a lot of people start setting up personal AWS accounts and subscribing to Polly voices for their personal use).  So, given my current knowledge of what would be involved, I am not going to prioritise this feature.

Sorry not to be more helpful.
#5
(04-05-2020, 06:00 PM)jim_chapman Wrote: [...] To that we add the work to deal with all the tricky asynchronous logic for launching, tracking and interrupting speech playback, and dealing with all the various edge cases and error cases. [...]

Just out of curiosity, how are you handling the 'read aloud' part now? Do you 'chunk it up' into discrete amounts, or just keep shoving text until it blocks?
#6
(04-05-2020, 07:30 PM)jwhitten Wrote: Just out of curiosity, how are you handling the 'read aloud' part now? Do you 'chunk it up' into discrete amounts, or just keep shoving text until it blocks?

It's a bit intricate - but basically the algorithm is:
1) select the next paragraph of text
2) construct an SSML string for the selected paragraph of text, including 'marker' elements to show the location of any page-breaks in the text
3) submit the string to the speech engine
4) whenever the speech engine raises a 'marker reached' event, switch the page as necessary, and re-apply the 'selected text' overlay
5) when the speech engine raises an 'end of utterance' event, figure out whether I am meant to keep reading; if so, select the next paragraph and go to (2)
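
Step (2) can be sketched roughly as follows. This is a Python illustration, not Freda's actual code: the function name, the `page-break-N` mark names, and the idea of passing page breaks as character offsets are all assumptions. Only the SSML `<speak>` and `<mark name="..."/>` elements are standard, and a real engine may require extra attributes (such as a version and language) on `<speak>`.

```python
from typing import Iterable
from xml.sax.saxutils import escape


def build_ssml(paragraph: str, page_break_offsets: Iterable[int]) -> str:
    """Wrap a paragraph in <speak>, inserting a <mark/> at each page-break offset."""
    parts = []
    previous = 0
    for index, offset in enumerate(sorted(page_break_offsets)):
        # Escape the text so characters like '&' do not break the XML.
        parts.append(escape(paragraph[previous:offset]))
        parts.append(f'<mark name="page-break-{index}"/>')
        previous = offset
    parts.append(escape(paragraph[previous:]))
    return "<speak>" + "".join(parts) + "</speak>"


print(build_ssml("Tom & Jerry ran. Then they stopped.", [17]))
# prints: <speak>Tom &amp; Jerry ran. <mark name="page-break-0"/>Then they stopped.</speak>
```

When the engine reports that a mark such as `page-break-0` has been reached, the handler for step (4) would know which page turn it corresponds to from the mark's name.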

Complicating factors include the facts that:
  • In UWP, you need to set up a MediaPlayer separately from the speech engine to play the speech sound stream, and some of those events come from the MediaPlayer object, not the speech engine.  Sometimes a MediaPlayer will freeze if fed two speech streams immediately one after another (so I build a round-robin pool of MediaPlayers).  And some of these actions must be done on the UI thread, while others must not be.
  • In Android, there are many different speech engines that might be installed, and they have different features.  None of them provides 'marker' functionality, but some of them provide a 'progress' event, which triggers repeatedly during the utterance (typically at the end of each phoneme) - so if the speech engine provides that feature, I can use that to do logic like step (4).  Of course, there is no reliable declarative mechanism to determine whether a particular Android speech engine does or does not support the feature, so I have to test it dynamically.  If the feature isn't supported, then I need to use a work-around (basically, I read one sentence at a time, rather than one paragraph at a time).
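
The Android work-around in the second point can be pictured like this. It is a rough Python sketch under stated assumptions: `engine_reports_progress` stands for the result of the dynamic test described above, both function names are hypothetical, and the regex is a deliberately naive sentence splitter that will mis-split abbreviations such as "e.g.".

```python
import re
from typing import List


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [piece for piece in pieces if piece]


def utterances(paragraph: str, engine_reports_progress: bool) -> List[str]:
    """One utterance per paragraph if progress events work, else one per sentence."""
    if engine_reports_progress:
        return [paragraph.strip()]
    return split_sentences(paragraph)


text = "It was late. Was anyone home? She knocked anyway!"
print(utterances(text, engine_reports_progress=True))
print(utterances(text, engine_reports_progress=False))
```

With shorter utterances, the 'end of utterance' event arrives often enough to stand in for the missing progress events, at the cost of more frequent hand-offs to the engine.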

Issues like these would lead me to believe that adding another speech engine (and moreover one that is in the cloud) might not be entirely straightforward ;-)
#7
Revisiting this question briefly-- when I originally asked it, I was thinking about the possibility of adding the Amazon Polly API into Freda to interact with Amazon Polly directly, via Amazon.

BUT -- in thinking it over... When you install Amazon Polly on your system, the voices get installed into the Windows Text-to-Speech system, meaning that if you use the drop-down, the Amazon Polly voices are listed (see screenshots).

But when I go into Freda, all I see are the three default voices that Microsoft delivered. Why isn't Freda showing the Amazon Polly voices? Did you just hard-code the MS default voices?

I guess my question is, Why isn't Freda working with the Amazon Polly voices as installed??

Thanks!!

John Whitten
#8
(08-06-2020, 06:55 PM)jwhitten Wrote: [...] But when I go into Freda, all I see are the three default voices that Microsoft delivered. Why isn't Freda showing the Amazon Polly voices? Did you just hard-code the MS default voices? [...]

Text-to-speech in Windows is a messy business - and not all installed voices are available to all applications. Moreover, there are specific limitations that apply to UWP (Universal Windows Platform) apps like Freda. (If you want to explore this mess, you can Google terms like UWP, SAPI 5 Voice, Core Voice and the like - and look into the distinction between 32-bit and 64-bit voices ... and find various dangerous suggestions for messing with your machine's registry. But the bottom line is: just because some installer utility says that a certain voice is 'installed', that doesn't mean that Freda can use it.)

Freda gets its list of available text-to-speech voices by doing: "from voice in SpeechSynthesizer.AllVoices select voice". So it's working from a list of all the voices that the Windows 10 Operating System has decided to make available to UWP applications. I don't know why the Amazon Polly voices are not on that list but (per the 'messy business' point) there are many possible reasons for it - and only Amazon could tell us what is going on. I did a quick search of their on-line documentation, and did not find anything informative.
#9
Thanks for checking. I really like the Amazon Polly voices and it's the one thing I wish Freda had / could support that would "make my life complete" (I have simple needs ;-) ) I know it's not your "fault" in any sense, I just wish there was a way to figure it out.

Alternatively, any of the better voices from any of the TTS services would be better than nothing. I can keep looking. I was exploring Watson voices the other day. I might sign up and see if they can do any better-- I'm guessing not though, or else they would likely already be out there in other readers.

Thanks again

John