With ChatGPT continuing to make headlines across the world, OpenAI has quietly released the second generation of its open-source speech recognition model, Whisper.

Whisper

EPOCHs

This new model is better than the old one but uses the same architecture. They plan to update their research papers soon.

Translation

OpenAI released its new language translation app, Whisper, in October. It can now translate and transcribed speech from 97 different language pairs.

Whisper was trained on over 680,00hrs of multilingual data gathered from the web. However, the training dataset for Whisperer had been kept private.

To find out more Click Here.

LibriSpeech Performance Benchmark

Since Whisper’s first version was trained by a comparatively larger and more diversified dataset.

It was not fine-tuned to a specific dataset, so it did not surpass other models that were specialized around the LibriSpeeche Performance Benchmarks, one of the most notable parameters to evaluate speech recognition.

Robust Speech Processing

OpenAI hopes that Whisper will be used as a basis for developing useful applications and for furthering research into robust text recognition.

Offering

At present, the company is exploring various options for its future. These include

  • DALL.E 2, which can create art from the text;
  • the latest ChatGPT;
  • or even the much-anticipated GPT 4.

However, using Whisper only to transcribe audio is underusing the scope to do much better.Read more News like this.

Challenges

One of the biggest challenges is that most people don’t have laptops powerful enough to run the models. Second, setting up the models isn’t very easy. And another problem is that the predictions are often biased towards integer timestamps.

Some studies suggest that these predictions are less accurate than others. Blurring the prediction may improve accuracy, but there is not enough evidence to say for sure.

We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing…We hope Whisper’s high accuracy and ease of use will allow developers to add voice interfaces to a much wider set of applications.

Open AI

Potential Risk

There are both advantages and disadvantages to the use of the model.

Under the Broader Implications section of the model card on GitHub, OpenAI warns that its technology could be misapplied by governments or companies looking to spy on people. However, the company says it wants to use the technology primarily for good.

Conclusion

The potential risks are real, but they are outweighed by the benefits to society. The model is already being used to help with humanitarian work and disaster relief.

In addition, the model is also being used to train AI assistants like Google Assistant, Alexa, Siri, Cortana, etc.

Author

  • Victor is the Editor in Chief at Techtyche. He tests the performance and quality of new VR boxes, headsets, pedals, etc. He got promoted to the Senior Game Tester position in 2021. His past experience makes him very qualified to review gadgets, speakers, VR, games, Xbox, laptops, and more. Feel free to check out his posts.

Share.

Victor is the Editor in Chief at Techtyche. He tests the performance and quality of new VR boxes, headsets, pedals, etc. He got promoted to the Senior Game Tester position in 2021. His past experience makes him very qualified to review gadgets, speakers, VR, games, Xbox, laptops, and more. Feel free to check out his posts.

Leave A Reply

Exit mobile version