The talk explores in depth the need for accessible multimedia content, such as video, for WCAG compliance and inclusion. It describes the lack of accessibility on online video and live-streaming platforms in terms of transcription and video description, which makes digital content harder to access for people with hearing, speech, and visual impairments.
It explores the journey of development of TranscribeIt, touching on aspects such as:
Performance optimization and timestamping with faster-whisper, enabling transcription on CPUs and low-spec machines.
Extending the platform to support transcription of generic audio and video content.
Adding customizable speaker diarization using pyannote.
Extending video accessibility with video descriptions via FrameStory, correlating them with transcriptions for people with visual impairments.
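The timestamped segments produced during transcription can be rendered as captions for accessibility. As an illustrative sketch only (the `(start, end, text)` tuple shape and helper names are assumptions, not TranscribeIt's actual code), here is how such segments could be turned into an SRT caption file:

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render (start, end, text) segments as SRT caption blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)

# Hypothetical segments, shaped like a transcriber's (start, end, text) output.
captions = segments_to_srt([
    (0.0, 2.5, "Hello, welcome to the talk."),
    (2.5, 5.0, "Today we discuss accessibility."),
])
print(captions)
```

SRT is one of the simplest widely supported caption formats; real pipelines often emit WebVTT as well, which uses a dot instead of a comma in timestamps.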
It also covers challenges encountered while developing the application:
Word error rate (WER) for non-Latin languages, with emphasis on Indian languages: the limitations of OpenAI Whisper, the scope for improvement, and the use of fine-tuned models over whisperx (such as Whisper-Hindi-v2 for Indian content).
Striking a balance between audio quality and transcription accuracy when processing real-world audio with noise-reduction techniques.
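WER is conventionally computed as the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. A minimal pure-Python sketch of that definition (real evaluations usually use a library such as jiwer, and non-Latin scripts need careful tokenization and normalization before splitting on whitespace is meaningful):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein dynamic programming over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown dog"))  # → 0.25
```

One substitution out of four reference words gives a WER of 0.25; scores above 1.0 are possible when the hypothesis inserts many extra words.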
Finally, the talk looks at future work and the scope for improving accessibility on online platforms, positioning open-source accessibility software as a pioneer of digital inclusion, with local LLMs enabling better privacy.
Understanding the importance of multimedia accessibility and how it relates to WCAG compliance
The process of developing TranscribeIt and its technical details, while staying local-first and privacy-respecting
Understanding the challenges of building multimedia accessibility systems, leaving room for discussion of future development
This seems like a helpful FOSS project people should know about