Introducing CLERK: A New Knowledgebase for Training Healthcare Admin AI
Care Lifecycle Events and Records Knowledgebase (CLERK)
We’re thrilled to announce the launch of CLERK — a new data product from Protege, built for AI developers working on some of healthcare’s most complex administrative challenges: billing, coding, prior authorization, and beyond.
Why CLERK?
Despite the hype, scaling AI in healthcare remains a challenge. Most models for admin tasks rely on fragmented, customer-specific datasets, leading to long deployment times, limited generalizability, and high compute costs.
CLERK changes the game.
CLERK is a connected EHR × claims dataset, built from tens of millions of real-world patient encounters and meticulously validated for accuracy and bias mitigation. This is the infrastructure AI builders need to move fast and deploy with confidence, without reinventing the wheel for every customer.
What’s Inside:
Encounter-level linked EHR + open claims
Specialty-specific CPT + ICD coverage
Bias mitigation to ensure real-world generalizability
Built for Builders
Whether you're building your own foundation model or fine-tuning an LLM, CLERK provides the data backbone you need to develop smarter, faster, and more scalable healthcare admin tools.
This is just the beginning. We’re continuing to expand CLERK’s coverage and fidelity — and we can’t wait to see what you build with it.
Read our full white paper here.