Adaptive Phoneme State Learning Architecture for Enhanced Speech Recognition Using Backpropagation Neural Network and Hidden Markov Model

Siddalingappa, Rashmi; S, Deepa; Savitha, Margaret; P, Kalpana; Stella Mary I, Priya; Gornale, Shivanand; B A, Lakshmi; Li, Kefeng; Wen Goh, Khang

Siddalingappa, Rashmi ORCID: https://orcid.org/0000-0001-9786-8436, S, Deepa, Savitha, Margaret, P, Kalpana, Stella Mary I, Priya, Gornale, Shivanand ORCID: https://orcid.org/0000-0001-5373-4049, B A, Lakshmi, Li, Kefeng and Wen Goh, Khang (2026) Adaptive Phoneme State Learning Architecture for Enhanced Speech Recognition Using Backpropagation Neural Network and Hidden Markov Model. F1000Research, 15. p. 338.

30ac2ef1-fed4-4e3d-a4d8-21a35cf462d9_f1000res177414.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL: https://doi.org/10.12688/f1000research.177414.1

Abstract

Speech remains a primary mode of human communication; however, automated speech recognition (ASR) systems face challenges from accent variability, temporal fluctuations, noise, and data privacy concerns. This paper proposes an enhanced ASR architecture incorporating an Adaptive Phoneme State Learning (APSL) algorithm with a Backpropagation Neural Network (BPNN) and Hidden Markov Model (HMM). APSL dynamically adjusts HMM state probabilities using phoneme confidence scores derived from the BPNN, thereby improving phoneme transition modeling and alignment. The multi-stage ASR pipeline includes noise reduction, speech-pause detection, and feature extraction via framing and windowing. APSL’s adaptive mechanism reduces ambiguities in phoneme transitions, resulting in a more accurate speech-to-text conversion. A comparative evaluation framework assesses the baseline HMM, standalone BPNN, and integrated APSL-BPNN-HMM model. Experiments were conducted using a custom-built dataset of 2000 audio files alongside five benchmark corpora: BNC, ANC, COCA, Buckeye, and Emu. Key evaluation metrics—recall, precision, F-score, and Word Error Rate (WER)—demonstrate that the APSL-enhanced model significantly outperforms baseline systems, achieving 95.7% recall, 92.95% precision, 94.53% F-score, and 96% overall accuracy. Notably, APSL-BPNN-HMM consistently yielded the lowest WER across all datasets, validating its effectiveness. This work highlights the benefits of adaptive learning in probabilistic frameworks for achieving robust and accurate speech recognition.
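The abstract reports Word Error Rate (WER) and F-score without defining them. As an illustration only, the following minimal sketch computes both using their standard textbook definitions: WER as the word-level edit distance divided by the reference length, and F-score as the harmonic mean of precision and recall. The function names `word_error_rate` and `f_score` are hypothetical, and whitespace tokenization is an assumption; the paper's own scoring procedure is not described in this record and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization; the paper's tokenization is not stated here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.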

Item Type:	Article
Status:	Published
DOI:	10.12688/f1000research.177414.1
School/Department:	York Business School
URI:	https://ray.yorksj.ac.uk/id/eprint/14817

Deposit and Record Details

ID Code:	14817
Depositing User:	Siddalingappa, Rashmi
Deposited On:	19 May 2026 08:58
Last Modified:	20 Jun 2026 09:45