Comparing Domain-Adapted Small LLMs and Large Zero-Shot Models for Efficient Hypothesis Graph Construction in Neurodegenerative Disease Research
Abstract
The rapid expansion of biomedical literature has made it increasingly difficult to connect findings across studies and uncover novel hypotheses, particularly within neurodegenerative disease research. This work presents a domain-adapted approach to hypothesis extraction and synthesis using compact large language models (Qwen3, 0.6B–8B parameters). Through parameter-efficient fine-tuning on a curated corpus of open-access biomedical papers, these models are trained to identify, relate, and visualize research hypotheses as interconnected knowledge graphs. Their performance is evaluated against larger, general-purpose models under zero-shot conditions, demonstrating that smaller, specialized models can achieve comparable or superior interpretability and relevance. The study highlights the potential of lightweight, domain-focused LLMs as practical tools for accelerating discovery and improving transparency in biomedical research.
Keywords
Large language models (LLMs), biomedical research, hypothesis extraction, parameter-efficient fine-tuning, LoRA, small language models, neurodegenerative diseases
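To make the fine-tuning setup named in the abstract concrete, the following is a minimal sketch of LoRA-based parameter-efficient fine-tuning on the smallest Qwen3 model, assuming the Hugging Face transformers and peft libraries; the checkpoint name, LoRA rank, and target modules shown are illustrative assumptions, not the paper's exact configuration.

# Minimal sketch of LoRA parameter-efficient fine-tuning (assumed setup,
# not the paper's exact configuration).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen3-0.6B"  # smallest model in the 0.6B-8B range studied
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA trains small low-rank update matrices instead of all model weights,
# which keeps fine-tuning on a curated corpus cheap enough for compact models.
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices (assumed)
    lora_alpha=16,                        # scaling factor for the LoRA updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

The wrapped model can then be passed to a standard training loop or trainer over the curated corpus; only the adapter weights are updated, so the resulting checkpoint remains small and easy to distribute alongside the base model.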