Siddalingappa, Rashmi (ORCID: https://orcid.org/0000-0001-9786-8436), S, Deepa, Savitha, Margaret, P, Kalpana, Stella Mary I, Priya, Gornale, Shivanand (ORCID: https://orcid.org/0000-0001-5373-4049), B A, Lakshmi, Li, Kefeng and Wen Goh, Khang (2026) Adaptive Phoneme State Learning Architecture for Enhanced Speech Recognition Using Backpropagation Neural Network and Hidden Markov Model. F1000Research, 15. p. 338.
Text: 30ac2ef1-fed4-4e3d-a4d8-21a35cf462d9_f1000res177414.pdf - Published Version. Available under License Creative Commons Attribution.
Abstract
Speech remains a primary mode of human communication; however, automated speech recognition (ASR) systems face challenges from accent variability, temporal fluctuations, noise, and data privacy concerns. This paper proposes an enhanced ASR architecture incorporating an Adaptive Phoneme State Learning (APSL) algorithm with a Backpropagation Neural Network (BPNN) and Hidden Markov Model (HMM). APSL dynamically adjusts HMM state probabilities using phoneme confidence scores derived from the BPNN, thereby improving phoneme transition modeling and alignment. The multi-stage ASR pipeline includes noise reduction, speech-pause detection, and feature extraction via framing and windowing. APSL’s adaptive mechanism reduces ambiguities in phoneme transitions, resulting in a more accurate speech-to-text conversion. A comparative evaluation framework assesses the baseline HMM, standalone BPNN, and integrated APSL-BPNN-HMM model. Experiments were conducted using a custom-built dataset of 2000 audio files alongside five benchmark corpora: BNC, ANC, COCA, Buckeye, and Emu. Key evaluation metrics—recall, precision, F-score, and Word Error Rate (WER)—demonstrate that the APSL-enhanced model significantly outperforms baseline systems, achieving 95.7% recall, 92.95% precision, 94.53% F-score, and 96% overall accuracy. Notably, APSL-BPNN-HMM consistently yielded the lowest WER across all datasets, validating its effectiveness. This work highlights the benefits of adaptive learning in probabilistic frameworks for achieving robust and accurate speech recognition.
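The abstract reports Word Error Rate (WER) as a key metric. As a minimal sketch of the standard definition only (not the authors' evaluation code), WER can be computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical and chosen for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference word count.

    Illustrative helper; the name and whitespace tokenization are assumptions,
    not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the")
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A lower value means fewer word-level errors. Because insertions count against the hypothesis, WER can exceed 1.0 when the output is much longer than the reference.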
| Item Type: | Article |
|---|---|
| Status: | Published |
| DOI: | 10.12688/f1000research.177414.1 |
| School/Department: | York Business School |
| URI: | https://ray.yorksj.ac.uk/id/eprint/14817 |