(In Peer Review, ID 1598979, Frontiers in Medicine) Automatic Extraction of SmPC Document for IDMP Data Model Construction using Open-Source Foundation Model LLM RAG: A Preliminary Experiment for Pharmaceutical Regulatory Affairs.
This paper describes the development and preliminary testing of an automated method for extracting data from Summary of Product Characteristics (SmPC) documents. Such extraction is a critical step in building and populating the Identification of Medicinal Products (IDMP) data model, a standardized data structure that regulatory agencies require for managing and exchanging medicinal product information.
The study uses open-source Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to automatically extract, organize, and format essential information from SmPC documents. By building on open-source foundation models, the research aims to give pharmaceutical companies and regulatory bodies a scalable and accessible solution that supports regulatory compliance and reduces the manual effort needed to align data with IDMP standards. The work is a preliminary exploration of how LLMs can be used in pharmaceutical regulatory affairs, particularly for constructing structured data models for regulatory submissions.
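To make the retrieve-then-generate idea concrete, here is a minimal Python sketch of one way a RAG extraction step could be organized. It is not the pipeline from the paper. The sample SmPC text, the product name "Examplin", the field names in `FIELD_QUERIES`, and the `llm` callable are all assumptions made for illustration. A simple bag-of-words cosine similarity stands in for the embedding-based retriever a real system would more likely use.

```python
import math
import re
from collections import Counter

# Hypothetical SmPC excerpts keyed by section heading (invented product, illustrative text).
SMPC_SECTIONS = {
    "1. NAME OF THE MEDICINAL PRODUCT": "Examplin 500 mg film-coated tablets",
    "2. QUALITATIVE AND QUANTITATIVE COMPOSITION": "Each film-coated tablet contains 500 mg of examplin hydrochloride.",
    "3. PHARMACEUTICAL FORM": "Film-coated tablet. White, oval tablet.",
    "4.2 Posology and method of administration": "For oral use. The recommended dose is one tablet twice daily.",
}

# Hypothetical target fields and the retrieval query used for each one.
# These keys are illustrative labels, not official IDMP attribute names.
FIELD_QUERIES = {
    "product_name": "name of the medicinal product",
    "pharmaceutical_form": "pharmaceutical form",
    "route_of_administration": "method of administration",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, sections, k=1):
    """Return the k (heading, body) pairs most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        sections.items(),
        key=lambda item: cosine(q, Counter(tokenize(item[0] + " " + item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(field, passages):
    """Assemble a prompt that grounds the model in the retrieved passages."""
    context = "\n\n".join(f"[{heading}]\n{body}" for heading, body in passages)
    return (
        f"Context:\n{context}\n\n"
        f"Task: extract the value of '{field}' from the context. "
        "Answer with the value only, or 'NOT FOUND' if it is absent."
    )


def extract_field(field, sections, llm):
    """Retrieve context for one field, then ask the supplied LLM callable."""
    passages = retrieve(FIELD_QUERIES[field], sections)
    return llm(build_prompt(field, passages)).strip()


if __name__ == "__main__":
    # Stand-in for a real model call: it only reports the prompt length.
    def placeholder_llm(prompt):
        return f"<model answer for a {len(prompt)}-character prompt>"

    for name in FIELD_QUERIES:
        print(name, "->", extract_field(name, SMPC_SECTIONS, placeholder_llm))
```

Passing the model in as a plain callable keeps the retrieval and prompt-building logic independent of any particular open-source LLM runtime, so the same functions can be exercised with a stub during testing and with a locally hosted model later.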
Recommended citation: EU ISO IDMP IG Chapter 2: Data elements for the electronic submission of information on medicinal products for human use. European Medicines Agency, v2.1.1.