AI Summary of Peer-Reviewed Research

This page presents an AI-generated summary of a published research paper. The original authors did not write or review this article. [See full disclosure ↓]

Publishing process signals: STRONG, reflecting the venue and review process.

LLM consensus pipeline reduced review filtering effort

Research areas: Information retrieval, Meta-analysis and systematic reviews, Pipeline (software)

What the study found

The study found that a human-supervised pipeline using multiple large language models, combined with a consensus scheme, can reduce manual effort in filtering papers for systematic literature reviews. The authors report that the approach achieved lower error rates than single human annotators.

Why the authors say this matters

The authors say this matters because systematic literature reviews require analyzing large research fields, and the initial retrieval and filtering of papers is time-consuming and labor-intensive. They conclude that responsible human-AI collaboration can accelerate and improve systematic literature reviews, and that modern open-source models may make the method accessible and cost-effective.

What the researchers tested

The researchers proposed a pipeline that classifies papers using descriptive prompts and then decides jointly through consensus across multiple large language models. The process was human-supervised and controlled through an open-source visual analytics web interface called LLMSurver, which allowed real-time inspection and modification of model outputs.
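The abstract does not specify how the consensus scheme combines model outputs. As a minimal illustration of the general idea, the sketch below uses simple majority voting over per-model include/exclude labels, with low-agreement papers routed to a human reviewer; the function names, the voting rule, and the agreement threshold are assumptions for illustration, not the authors' method.

```python
from collections import Counter

def consensus_decision(votes):
    """Majority vote over per-model screening labels.

    votes: list of "include"/"exclude" strings, one per LLM.
    Returns (label, agreement), where agreement is the fraction
    of models backing the winning label.
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)

def screen_paper(votes, review_threshold=1.0):
    """Route a paper based on model consensus (illustrative only).

    Papers where models disagree (agreement below the threshold)
    are flagged for human review instead of being auto-decided.
    """
    label, agreement = consensus_decision(votes)
    if agreement < review_threshold:
        return "human_review", agreement
    return label, agreement

# Hypothetical votes from three models on two candidate papers
unanimous = screen_paper(["exclude", "exclude", "exclude"])
split = screen_paper(["include", "include", "exclude"])
```

In this sketch a unanimous vote is decided automatically, while any disagreement is escalated, which mirrors the human-supervised character of the pipeline described above.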

What worked and what didn't

According to the abstract, the pipeline significantly reduced manual effort. It also showed lower error rates than single human annotators, and modern open-source models were sufficient for the task. The abstract does not report specific failure cases or detailed comparisons beyond these points.

What to keep in mind

The available summary does not describe detailed limitations, error modes, or boundary conditions. The evaluation used ground-truth data from one recent systematic literature review with 8,323 candidate papers, so the abstract only supports conclusions within that setting.

Key points

  • A multi-LLM, human-supervised consensus pipeline was proposed for filtering papers in systematic literature reviews.
  • The authors report lower error rates than single human annotators.
  • The approach significantly reduced manual effort in the review-filtering process.
  • Modern open-source models were reported as sufficient, suggesting the method may be cost-effective.
  • The evaluation used ground-truth data from a recent review with 8,323 candidate papers.

Disclosure

Research title:
LLM consensus pipeline reduced review filtering effort
Authors:
Lucas Joos, Daniel A. Keim, Maximilian T. Fischer
Institutions:
University of Konstanz
Publication date:
2026-02-16
OpenAlex record:
View
AI provenance: This post was generated by OpenAI. The original authors did not write or review this post.