Computational text analysis on unstructured police data: a scoping review
Wilson Lukmanjaya, Christina Halmich, Tony Butler, Darren Cook & George Karystianis (2026): Computational text analysis on unstructured police data: a scoping review. In: Crime Science
Introduction: Police reports made following attendance at various events (e.g., crashes, domestic violence, theft) often contain rich contextual details, including indicators of mental health issues or abuse types, and the persons/entities involved and their relationships, which are not typically captured in structured administrative data, interviews or official statistics. However, the sheer volume of information, together with strict data access protocols, renders manual analysis impractical. Computational text analysis methods offer a feasible and effective approach to automatically process this underutilized data source.
Aim: This article provides an overview of studies using computational text analysis (e.g., text mining, natural language processing (NLP)) on unstructured police data, serving as a guide for researchers interested in employing similar methodologies.
Methods: This scoping review was conducted in accordance with the PRISMA-ScR guidelines, using a pre-defined protocol and a two-stage screening process (title/abstract screening followed by full-text screening). A search was conducted across seven electronic databases (ProQuest, IEEE Xplore, Scopus, PubMed, Web of Science, Criminal Justice Abstracts, Google Scholar) covering the past 20 years.
Results: A total of 5426 records were identified. After removing duplicate entries and screening titles/abstracts and full-text publications, 61 studies met the inclusion criteria. Included studies were published between 2004 and 2024, with most from the United States, Australia and the Netherlands. Most studies used open-source tools to analyze unstructured police data: Bidirectional Encoder Representations from Transformers (BERT), the Natural Language Toolkit (NLTK), scikit-learn, or General Architecture for Text Engineering (GATE). Our review indicates that applications of computational text analysis to unstructured police data achieve moderate to high performance. Common limitations included variable data quality, with reliability depending on the level of detail provided by the police report’s author, and failure to report ethical implications or methodological limitations.
Conclusions: Computational text analysis can extract key information from unstructured police data. However, future research should clearly report ethics approvals, ethical implications, and methodological limitations. Establishing a structured data-sharing framework between law enforcement and researchers is also crucial to facilitate access and support high-quality, impactful research in this field.