Microsoft India releases ‘Speech Corpus’ for three Indian languages
Microsoft India on Thursday announced the availability of its Microsoft Indian Language ‘Speech Corpus’. Through the Speech Corpus, Microsoft will provide speech training and test data in three Indian languages: Telugu, Tamil and Gujarati. Microsoft India aims to help researchers and academia build Indian language speech recognition for all applications where speech is used.
This is one of the largest publicly available Indian language speech datasets and includes audio along with the corresponding transcripts, Microsoft said in a statement.
The Indian language “Speech Corpus” is provided through the Microsoft Research Open Data initiative, a collection of free datasets from Microsoft Research intended to advance research in areas such as natural language processing, computer vision, and domain-specific sciences.
“We believe India’s increasing digital literacy needs to be supported by a multi-lingual digital world. Microsoft Indian Language Speech Corpus is an extension of our on-going efforts to reduce language barriers and empower Indians to harness the full potential of the Internet,” said Sundar Srinivasan, General Manager, Artificial Intelligence and Research, Microsoft India.
“Using our technology expertise, we want to accelerate innovation in voice-based computing for India by supporting researchers and academia,” Srinivasan was quoted as saying.
Microsoft’s Indian Language Speech Corpus was put to the test at the Interspeech 2018 conference in Hyderabad this month. In the Low Resource Speech Recognition Challenge, participants used data from the corpus to build Automatic Speech Recognition (ASR) systems.
Microsoft India was able to create high-quality speech recognition models using this data, validating the efficacy of the Speech Corpus, the company said. Microsoft has been working with Indian languages for two decades, since it launched Project Bhasha in 1998, which lets users input localised text easily and quickly using the Indian Language Input Tool.