What the study found
The study found that a human-supervised pipeline using multiple large language models, combined with a consensus scheme, can reduce manual effort in filtering papers for systematic literature reviews. The authors report that the approach achieved lower error rates than single human annotators.
Why the authors say this matters
The authors say this matters because systematic literature reviews require analyzing large research fields, and the initial retrieval and filtering of papers is time-consuming and labor-intensive. They conclude that responsible human-AI collaboration can accelerate and improve systematic literature reviews, and that modern open-source models may make the method accessible and cost-effective.
What the researchers tested
The researchers proposed a pipeline that classifies papers using descriptive prompts and then makes the final include/exclude decision by consensus across multiple large language models. The process was human-supervised through LLMSurver, an open-source visual analytics web interface that allows real-time inspection and correction of model outputs.
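The abstract does not specify how the consensus is computed, so the following is only a minimal sketch of one plausible scheme: each model labels a paper "include" or "exclude", a strict majority decides, and ties (or, optionally, any disagreement) are escalated to a human reviewer. The function and model names are hypothetical, not the authors' implementation.

```python
from collections import Counter
from typing import Callable

def consensus_filter(
    paper: str,
    models: list[Callable[[str], str]],
    unanimity_required: bool = False,
) -> str:
    """Combine per-model labels; disagreements go to a human reviewer."""
    votes = Counter(model(paper) for model in models)
    label, count = votes.most_common(1)[0]
    if unanimity_required and count < len(models):
        return "human_review"   # any disagreement -> human decides
    if count > len(models) // 2:
        return label            # strict majority wins
    return "human_review"       # tie -> human decides

# Toy stand-ins for three LLM annotators (pure assumptions; real calls
# to open-source models would replace these keyword checks).
m1 = lambda p: "include" if "visualization" in p else "exclude"
m2 = lambda p: "include" if "LLM" in p else "exclude"
m3 = lambda p: "include"

print(consensus_filter("LLM visualization survey", [m1, m2, m3]))
print(consensus_filter("protein folding study", [m1, m2, m3]))
```

In this sketch the `unanimity_required` flag trades throughput for safety: requiring unanimous agreement routes more borderline papers to the human, which matches the paper's emphasis on keeping a human in the loop.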
What worked and what didn't
According to the abstract, the pipeline significantly reduced manual effort, achieved lower error rates than single human annotators, and performed well with modern open-source models. The abstract does not report specific failure cases or detailed comparisons beyond these points.
What to keep in mind
The available summary does not describe detailed limitations, error modes, or boundary conditions. The evaluation used ground-truth data from one recent systematic literature review with 8,323 candidate papers, so the abstract only supports conclusions within that setting.
Key points
- A multi-LLM, human-supervised consensus pipeline was proposed for filtering papers in systematic literature reviews.
- The authors report lower error rates than single human annotators.
- The approach significantly reduced manual effort in the review-filtering process.
- Modern open-source models were reported as sufficient, suggesting the method may be cost-effective.
- The evaluation used ground-truth data from a recent review with 8,323 candidate papers.
Disclosure
- Research title: LLM consensus pipeline reduced review filtering effort
- Authors: Lucas Joos, Daniel A. Keim, Maximilian T. Fischer
- Institution: University of Konstanz
- Publication date: 2026-02-16
- OpenAlex record: available