Pokhrel, Sangita ORCID: https://orcid.org/0009-0008-2092-7029, K C, Bina and Shah, Prashant Bikram (2025) A Practical Application of Retrieval-Augmented Generation for Website-Based Chatbots: Combining Web Scraping, Vectorization, and Semantic Search. Journal of Trends in Computer Science and Smart Technology, 6 (4). pp. 424-442.
Text: 07.pdf - Published Version. Available under License Creative Commons Attribution Non-commercial.
Abstract
Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by integrating information retrieval with text generation, which is particularly relevant for applications requiring context-aware responses based on dynamic data sources. This study presents a practical implementation of a RAG model tailored for a chatbot that answers user inquiries using content from specific websites. The methodology comprises three key steps: web scraping using BeautifulSoup to extract relevant content, text processing to segment this content into manageable chunks, and vectorization to create embeddings for efficient semantic search. Using semantic search, the system retrieves the document segments most relevant to each user query. The OpenAI API is then used to generate contextually appropriate responses from the retrieved information. Key results highlight the system's effectiveness in providing accurate and relevant answers, with evaluation metrics centered on response quality, retrieval efficiency, and user satisfaction. This research contributes an integration of scraping, vectorization, and semantic search technologies into a cohesive chatbot application, offering insights into the practical implementation of RAG models.
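As a rough illustration of the chunking and semantic-search steps named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the page text has already been scraped, and it swaps in a toy bag-of-words vector where the paper uses learned embeddings. All function names (`chunk_text`, `embed`, `cosine`, `retrieve`, `build_prompt`) and parameters (`max_words`, `overlap`, `k`) are hypothetical, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping word-window chunks (assumes overlap < max_words)."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble a prompt that would be sent to a language model (call not shown)."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Hypothetical sample chunks standing in for scraped website content.
    docs = [
        "Admissions open in September for all courses.",
        "The library is open from 9am to 5pm on weekdays.",
        "Parking permits are available at reception.",
    ]
    question = "When is the library open?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

In a real system the `embed` function would be replaced by calls to an embedding model and the chunk vectors would be stored in a vector index, so that retrieval does not recompute every vector for each query.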
Item Type: | Article |
---|---|
Status: | Published |
DOI: | 10.36548/jtcsst.2024.4.007 |
School/Department: | London Campus |
URI: | https://ray.yorksj.ac.uk/id/eprint/11412 |