fameklion.blogg.se - Cost of google text to speech api

Languages: Google supports over 125 languages and variants, whereas Amazon Transcribe supports about 30 different languages and variants. Rev.ai currently only supports English, but this automatically includes variants of English (e.g. UK vs. US).

Custom Vocabulary, Speech adaptation: All 3 services allow you to specify a custom vocabulary list, which aids in the transcription of technical or domain-specific words/phrases as well as the spelling of names and other special words. Google and Amazon go a step further by offering several extra options that make this feature more flexible and powerful. Google lets you specify contexts with fields like phone number, address, currency, and dates to help with formatting those values (for example, transcribing the words "twenty twenty" as 2020). Amazon Transcribe not only lets you specify the custom vocabulary to expect, but also how it should be formatted in the transcript and what it will sound like. This can be especially useful for names of people or places that are not necessarily spelled the way they are pronounced.

Content redacting and filtering: All 3 APIs offer the option of automatically filtering out profanity or inappropriate words from the transcription. In addition, Amazon also has the option to filter out personally identifiable information (PII). This can be extremely helpful when transcribing audio with sensitive data, such as certain customer service conversations or recordings in the medical field.

Multichannel recognition & Speaker Diarization: Multichannel recognition is the ability for ASR to distinguish between different sources of audio (e.g. a Zoom conference call); speaker diarization is the ability to determine which speaker is saying what when there are multiple speakers in the audio. All 3 services offer this feature, which in turn allows them to generate time-stamped transcripts separated by speaker/channel.

Punctuation: Although for Google this feature is only available in Beta, all 3 APIs have the ability to automatically add punctuation to transcribed text. As one can imagine, this is a daunting task, because punctuation is sometimes subjective/ambiguous, and even humans can listen to the same audio and punctuate it slightly differently.

Models: Google has a few different models for different use cases: phone call, video, command and default. For my testing I used the video model because it seemed to be the most accurate one of the bunch, even though it's a little bit more expensive than their default model. Amazon has a default model (which I used) and a niche medical model.

The second method I used for measuring accuracy was to check text similarity. The idea here is that rather than counting the number of errors, you check how similar the transcribed text is to the original transcript according to a chosen criterion. To do this I decided to use a paid API that analyses 2 text sources and uses AI/ML to output a similarity score. These kinds of APIs are often used in plagiarism detection software.
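
The text-similarity check described in this post can be approximated locally to get a feel for the idea. The sketch below is not the paid AI/ML API I used (that service works differently under the hood); it is a simple stand-in built on Python's standard-library difflib, and the function names normalize and similarity_score are made up for illustration. It lowercases both texts, strips punctuation, and compares them word by word, returning a score between 0.0 (nothing in common) and 1.0 (identical word sequences).

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity_score(reference: str, hypothesis: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two texts, compared word by word.

    Illustrative stand-in only: a real similarity API would likely use a
    semantic (ML-based) comparison rather than this sequence-matching ratio.
    """
    ref_words = normalize(reference)
    hyp_words = normalize(hypothesis)
    # autojunk=False keeps frequent words (e.g. "the") from being ignored
    # when the transcripts are long.
    return SequenceMatcher(None, ref_words, hyp_words, autojunk=False).ratio()


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog."
    transcribed = "the quick brown fox jumped over the lazy dog"
    print(f"similarity: {similarity_score(original, transcribed):.3f}")
```

Because punctuation and casing are removed before the comparison, a transcript is not penalised for punctuating a sentence differently from the original, which matters given how subjective automatic punctuation can be.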