AI-Assisted Research Pipeline

The Goal

The point is not the pipeline — it's what happens when you sit down to write. I wanted to be able to ask a question about my field and get back a real answer — grounded in my own papers, cross-referenced against the broader literature, with actual citations I could verify. Not "the literature suggests," but a synthesis I could trust enough to put in a grant.

That's what this system does. A Claude Code session can search semantically across ~1,000 full-text scientific papers, query the entire PubMed database, cross-reference findings, identify gaps in the evidence, and recommend specific papers to add. It turns grant writing from a memory exercise into a conversation with an AI that has actually read your papers.

What a Research Session Looks Like

Say I'm writing a specific aims page and I need to know: "What's the evidence that FGF21 acts in the hindbrain vs. the hypothalamus?" The system searches my vault semantically, finds the relevant papers, reads their key findings, checks PubMed for recent work I might not have, synthesizes across sources with specific citations, flags where the evidence is contested, and tells me which papers I should add to strengthen the argument.

The output has passage-level citations — not "the literature suggests" but "Laeger et al. (2014, JCI) showed that..." I use this workflow for R01 grant writing, manuscript drafting, and literature review. It's the closest thing I've found to having a collaborator who has read everything and remembers all of it.

The Infrastructure

PDFs go in, searchable knowledge comes out. The pipeline runs automatically — conversion, enrichment, indexing, mapping — so the knowledge base stays current without manual effort.
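In spirit, that's just a loop over a few stages. A minimal Python sketch of the shape (every function name and field here is illustrative, stubbed for the example; the real pipeline's internals differ):

```python
from pathlib import Path

# Hypothetical pipeline stages -- stubs for illustration, not the real system.
def convert(pdf: Path) -> str:
    """Turn a PDF into plain text (stubbed)."""
    return f"full text of {pdf.name}"

def enrich(text: str) -> dict:
    """Ask an LLM for key findings and tags (stubbed)."""
    return {"text": text, "findings": [], "tags": []}

def index(note: dict, store: list) -> None:
    """Embed and store the note for semantic search (stubbed)."""
    store.append(note)

def run_pipeline(inbox: list[Path], store: list) -> int:
    """Push every new PDF through convert -> enrich -> index."""
    for pdf in inbox:
        index(enrich(convert(pdf)), store)
    return len(store)
```

The point of the shape is that each stage is independent, so any one of them can be swapped out or rerun without touching the others.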

How Papers Are Understood

Each paper gets more than keyword indexing. An LLM reads the full text and extracts specific key findings — stated as precise experimental results, not the vague summaries you get from abstracts — along with factual tags and relationships to other papers in the collection. The goal is that when I ask a question, the system can point to the actual result in the actual paper, not a paraphrase.
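One way to picture what enrichment produces is a small structured record per paper. A hypothetical schema, just to make the idea concrete (the field names are mine, not the system's):

```python
from dataclasses import dataclass, field

# Hypothetical per-paper record produced by the enrichment step.
@dataclass
class PaperRecord:
    citation: str                                           # e.g. "Laeger et al. (2014, JCI)"
    key_findings: list[str] = field(default_factory=list)   # precise experimental results
    tags: list[str] = field(default_factory=list)           # factual, vocabulary-reusing tags
    related: list[str] = field(default_factory=list)        # other papers in the vault

def format_citation(rec: PaperRecord, i: int) -> str:
    """Render one finding as a passage-level citation line."""
    return f"{rec.citation} showed that {rec.key_findings[i]}"
```

Storing findings as discrete statements is what lets a synthesis point to the actual result rather than paraphrase an abstract.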

Tags aren't imposed from a fixed taxonomy. They emerge from the papers themselves and reuse existing vocabulary when appropriate. Over 35 topic clusters evolve automatically — a nightly review process proposes splits, merges, and new clusters, with human approval before changes take effect. The knowledge map grows with the collection.
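The approval gate is the important part: proposals are cheap, but nothing touches the knowledge map until a human signs off. A toy sketch of that gate (proposal IDs and fields are invented for illustration):

```python
# Hypothetical nightly review gate: the LLM proposes splits, merges, and new
# clusters, and only human-approved proposals are applied.
def review_clusters(proposals: list[dict], approved_ids: set[str]) -> list[dict]:
    """Keep only the proposals a human has explicitly approved."""
    return [p for p in proposals if p["id"] in approved_ids]
```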

Where New Papers Come From

The vault is fed from two sources: my EndNote library (a career's worth of papers, bulk-imported) and an autonomous agent that surfaces new papers weekly via the Research Digest. When something on the digest catches my eye, I can select it for full-text retrieval — and once it's in the vault, it's automatically converted, enriched, indexed, and available in the next research session.
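The selection step is simple in principle: the digest surfaces candidates, and only the ones I mark move on to full-text retrieval. A tiny illustrative filter (DOIs and field names are hypothetical):

```python
# Hypothetical digest-selection step: the weekly digest is a list of candidate
# papers, and only the ones I explicitly pick are queued for retrieval.
def select_for_retrieval(digest: list[dict], picked_dois: set[str]) -> list[dict]:
    """Return the digest entries I marked for full-text retrieval."""
    return [p for p in digest if p["doi"] in picked_dois]
```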

The goal is to close this loop completely: the agent finds a paper, I approve it, and full text flows straight into the vault. Today there's still a manual export step, and full-text access is unreliable for paywalled journals (publishers really don't want you automating this). But the architecture is designed so that when reliable full-text access exists, papers flow from discovery to indexed knowledge without intervention.

The Inbox Pipeline

A separate but related system for the ideas that hit at inconvenient times. I capture a quick thought on my phone — a project idea, a connection between papers, something I overheard at a seminar — and it lands in a staging area. Every other day, an LLM picks up each capture, researches it (competitive landscape, technical feasibility, connections to things I'm already working on), and writes a review document.
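The review document is just each capture paired with what the LLM found about it. A stub sketch of that step (the research function here is a stand-in; the real one calls out to an LLM and the web):

```python
from datetime import date

# Hypothetical inbox-review step: research each staged capture, then write
# one document covering all of them. The research call is stubbed.
def research(capture: str) -> dict:
    """Stand-in for the LLM research step (landscape, feasibility, connections)."""
    return {"capture": capture, "feasibility": "unknown", "connections": []}

def write_review(captures: list[str]) -> str:
    """Render one review document covering every staged capture."""
    lines = [f"Inbox review, {date.today().isoformat()}"]
    for c in captures:
        r = research(c)
        lines.append(f"- {r['capture']} (feasibility: {r['feasibility']})")
    return "\n".join(lines)
```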

I read the review, mark which items to act on, and the pipeline executes. Same security compartmentalization as the agent architecture: the stage that reads the internet can't modify my files, and the stage that writes to the knowledge base has no internet access. Paranoid? Maybe. But it means I can let the system research freely without worrying about what it might overwrite.
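That compartmentalization can be pictured as a capability table: each stage declares what it may do, and everything else is denied by default. A hypothetical sketch (stage names and flags are mine, not the actual configuration):

```python
# Hypothetical capability table: the stage that reads the internet cannot
# write to the vault, and the stage that writes has no network access.
STAGES = {
    "researcher": {"network": True,  "write_vault": False},
    "writer":     {"network": False, "write_vault": True},
}

def allowed(stage: str, action: str) -> bool:
    """True only if the stage's declared capabilities permit the action."""
    return STAGES.get(stage, {}).get(action, False)
```

Deny-by-default is what makes the freedom safe: an unknown stage or an undeclared action gets nothing.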