AI-Powered Data Extraction Interface for Clinical Researchers at Mayo Clinic
We built an Airtable-style “smart spreadsheet” that lets doctors pull structured clinical trial data from PDFs—fast, transparently, and with full control. The tool blends AI-assisted extraction with human validation, streamlining a previously manual and error-prone workflow into a focused, fluid experience.
TL;DR
- Problem: 100+ hrs/review, error-prone manual data extraction.
- Solution: AI-augmented, spreadsheet-style UI with inline evidence and conflict alerts.
- Impact: 4× faster extraction, error rate ↓66%, 80% clinician adoption, SUS 85+.
- Next: React MVP, explainability dashboard.
The Human Cost of Manual Review
Systematic reviews demand endless PDF scrubbing: copy, paste, alt‑tab, over and over. We shadowed Dr. Khan and Dr. Umar as they juggled trial tables and spreadsheets; midway through the tenth paper, Dr. Khan sighed in frustration.
Multiply that by a hundred papers, and you see the issue: a process designed for machines, executed by humans.
Existing tools such as Covidence, Rayyan, and DistillerSR fell short on usability. Doctors either abandoned them or hacked together makeshift systems of their own.
Challenges Faced
We initially understood the high-level issues through informal discussions and workshops; deeper challenges, such as specific error rates and distrust of AI, became clear through subsequent research. These insights directly informed our detailed design and iterative approach:
- High Manual Workload: Extracting data from 100+ PDFs per review cycle.
- Error-Prone Process: Copy-paste fatigue led to a 15% transcription error rate.
- Fragmented Tools: Switching between PDF readers, spreadsheets, and search interfaces disrupted workflow.
- AI Skepticism: Without transparent sourcing, clinicians distrusted automated suggestions.
Our UX Approach
Our design process unfolded across iterative phases, each building on user insights and technical feasibility:
Understand & Empathize
We began with informal conversations to understand clinicians' current workflows and deeper pain points. One clinician clearly articulated their hesitation to trust AI output without visible sourcing.
Research
To genuinely experience doctors' workflows, we shadowed three researchers through full review sessions, logging issues from version chaos to hidden table rows.
Concurrently, we performed heuristic and competitive audits of leading LSR tools (Covidence, Rayyan, DistillerSR), pinpointing gaps in evidence transparency and human–AI collaboration.
Brainstorming
We held targeted brainstorming sessions with the core team, applying the MoSCoW framework to categorize and prioritize features into Must‑haves, Should‑haves, Could‑haves, and Wonʼt‑haves.
With rich insights, prioritized features, and detailed task flows in hand, we set a clear vision.
Goals & Success Metrics
We aimed to build an intuitive platform that automates data extraction with AI/LLMs while keeping the model's reasoning visible and human oversight easy. Although stakeholders proposed a cautious 50% automation goal, we set our sights higher.
Prototype, Validate, Iterate
We ran four two-week Agile sprints.
Low-Fidelity Wireframe Roadblocks
During these conversations, we still faced resistance from clinicians. It became clear that the solution had to feel instantly familiar and require minimal adaptation, so we pivoted to a spreadsheet-style interface that mirrored usersʼ mental models and reduced resistance.
From there, we layered on explainability: exploring side-panel interactions that surfaced AI reasoning, dual-model comparisons, and highlighted source excerpts.
Technical Feasibility
The final sprint addressed technical implementation risks, specifically OCR reliability and the latency of asynchronous model calls. Through close collaboration with developers, we validated practical mitigations, such as OCR fallbacks, visual loading placeholders, and optimized asynchronous AI calls, improving overall reliability and user experience.
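As a rough illustration of how these mitigations compose, here is a minimal TypeScript sketch. The helpers `extractEmbeddedText`, `runOcr`, and `callModel` are assumed stand-ins for a PDF parser, an OCR service, and an LLM client; none of these names come from the production codebase, and the timeout value is illustrative.

```typescript
// Sketch: resilient extraction with an OCR fallback and a timeout around
// asynchronous model calls. Names and thresholds are illustrative.

interface ExtractionResult {
  value: string | null;
  confidence: number; // 0..1, as reported by the model wrapper
}

async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms),
    ),
  ]);
}

async function getDocumentText(pdf: ArrayBuffer): Promise<string> {
  const embedded = await extractEmbeddedText(pdf); // native PDF text layer
  if (embedded.trim().length > 0) return embedded;
  // Fall back to OCR for scanned pages with no text layer.
  return runOcr(pdf);
}

async function extractField(pdf: ArrayBuffer, prompt: string): Promise<ExtractionResult> {
  const text = await getDocumentText(pdf);
  try {
    // The UI shows a loading placeholder while this call is in flight.
    return await withTimeout(callModel(prompt, text), 30_000);
  } catch {
    // Surface the failure as an empty, low-confidence cell instead of blocking the grid.
    return { value: null, confidence: 0 };
  }
}

// Assumed helper signatures (not real library calls):
declare function extractEmbeddedText(pdf: ArrayBuffer): Promise<string>;
declare function runOcr(pdf: ArrayBuffer): Promise<string>;
declare function callModel(prompt: string, context: string): Promise<ExtractionResult>;
```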
Final Solution
- Smart Suggestion Grid: Inline dual-model AI fills cells; low-confidence fields flagged for review.
- Evidence Panel: Contextual PDF excerpts displayed beside suggestions, ensuring provenance.
- Confidence Alerts: A voting-based check between the two model suggestions highlights conflicts, involving humans only when needed (see the sketch after this list).
- Continuous Learning Loop: User edits feed back into model retraining for improved future suggestions.
- Audit & Export: Compliance-ready logs and CSV/API export maintain regulatory transparency.
The interface is designed to be intuitive and efficient, streamlining data extraction with minimal user effort. Users can configure extraction settings while their files upload, saving time and eliminating unnecessary steps.
Unlike many traditional medical tools, the interface adopts a clean, minimal design. Advanced or infrequently used settings are tucked away behind a toggle, allowing users to stay focused on their core tasks without being overwhelmed by clutter or irrelevant options.
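In the planned React MVP, this progressive-disclosure pattern could look roughly like the sketch below; the component, field names, and default values are purely illustrative.

```tsx
// Advanced settings stay behind a toggle so the default view remains minimal.
import { useState } from "react";

export function ExtractionSettings() {
  const [showAdvanced, setShowAdvanced] = useState(false);

  return (
    <section>
      {/* Core settings always visible */}
      <label>
        Columns to extract
        <input name="columns" placeholder="e.g. sample size, primary outcome" />
      </label>

      <button type="button" onClick={() => setShowAdvanced((v) => !v)}>
        {showAdvanced ? "Hide advanced settings" : "Show advanced settings"}
      </button>

      {/* Infrequently used options rendered only on demand */}
      {showAdvanced && (
        <fieldset>
          <label>
            Confidence threshold
            <input name="threshold" type="number" min={0} max={1} step={0.05} defaultValue={0.8} />
          </label>
          <label>
            OCR fallback
            <input name="ocr" type="checkbox" defaultChecked />
          </label>
        </fieldset>
      )}
    </section>
  );
}
```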
If a user decides to add a new column, it's as simple as clicking the "+" button. Users can either type what they want to extract or select from a list of common extraction types; the system then automatically generates the necessary prompt for the AI model, streamlining the process.
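A hedged sketch of how a new column definition might be turned into an extraction prompt; the preset names and template text are assumptions, not the actual prompt library.

```typescript
// Turn a user-defined column into a prompt for the extraction model.

interface ColumnSpec {
  label: string;                // e.g. "Sample size"
  preset?: "sample-size" | "primary-outcome" | "intervention" | "adverse-events";
  freeTextDescription?: string; // used when the user types a custom request
}

const PRESET_PROMPTS: Record<string, string> = {
  "sample-size": "Report the total number of randomized participants as an integer.",
  "primary-outcome": "State the primary outcome measure exactly as defined by the authors.",
  "intervention": "Describe the intervention arm, including dose and duration if given.",
  "adverse-events": "Summarize reported adverse events and their frequencies.",
};

function buildPrompt(column: ColumnSpec): string {
  const task =
    (column.preset && PRESET_PROMPTS[column.preset]) ??
    column.freeTextDescription ??
    `Extract the value for "${column.label}".`;

  return [
    "You are extracting structured data from a clinical trial PDF.",
    `Column: ${column.label}`,
    `Task: ${task}`,
    "Quote the exact source sentence you relied on, and answer \"not reported\" if absent.",
  ].join("\n");
}

// Example: buildPrompt({ label: "Sample size", preset: "sample-size" })
```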
The system highlights cells with low confidence scores, allowing users to prioritize their review efforts. This feature is particularly useful when dealing with large datasets, as it helps users quickly identify areas that require their attention.
To quickly reference the original data, users can hover over a cell to view the corresponding PDF excerpt. This feature is especially helpful when users need to verify the accuracy of the extracted information or understand the context in which it was presented.
The system also provides a side panel that displays the AI's reasoning for its suggestions. This transparency helps users understand how the AI arrived at its conclusions and fosters trust in the system.
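Conceptually, the hover excerpt and the reasoning panel can read from a single provenance record per cell, so the excerpt a clinician verifies is exactly the one the model cited. The sketch below assumes illustrative field names rather than the real data model.

```typescript
// One provenance record ties an extracted cell to the PDF excerpt shown on
// hover and the reasoning shown in the side panel.

interface SourceSpan {
  documentId: string;
  page: number;
  excerpt: string;          // verbatim text highlighted in the PDF viewer
  boundingBox?: [number, number, number, number]; // optional highlight coordinates
}

interface CellProvenance {
  cellId: string;
  value: string;
  confidence: number;       // drives the low-confidence highlighting
  evidence: SourceSpan[];   // rendered in the hover preview
  reasoning: string;        // model explanation rendered in the side panel
}

function hoverPreview(cell: CellProvenance): string {
  const top = cell.evidence[0];
  return top
    ? `p.${top.page}: "${top.excerpt}"`
    : "No source excerpt available";
}
```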
Reflections
Through this project, we learned critical lessons:
- Adaptability: It's easier to adapt to users' working styles than to expect them to learn new systems.
- Transparency: Clear evidence and explanations are crucial for AI acceptance.
- Early Technical Alignment: Collaborating with developers early helped mitigate late-stage surprises.
- Balance: Users preferred augmenting their judgment rather than full automation.
- Familiarity: Leveraging familiar mental models (e.g., spreadsheets) greatly reduced user resistance.
Learnings
- Know your client: Doctors don’t design. Wireframes needed to feel real to get useful feedback. It’s easier to adapt to their context than expect them to adapt to ours.
- Trust comes from transparency: The side panel did more than explain—it turned AI into a collaborator.
- Embrace constraints early: Parsing issues and latency could have derailed the project. Early collaboration with engineers helped us design around limitations, not despite them.
- Doctors value control over speed: They didn’t want full automation—they wanted AI to augment their judgment, not override it.
- Familiarity is your Trojan Horse: By mimicking Excel and Airtable, we lowered the learning curve and framed AI as just a smarter cell.