Before a nurse can be submitted to a hospital, someone has to qualify her against the job. Does her license cover this state? Is her ACLS current and from an issuing body the client accepts? Does she have 18 months of trauma experience on a Level II unit? Has it been long enough since her last placement at this facility? Is her skills-checklist score above the threshold for the unit? Multiply that by twenty or thirty constraints per job, and dozens of jobs per clinician per day, and you have what used to be the most expensive cognitive task at our company.
For most of Trusted’s history, that work was a human reading a long-form job description in one tab and clicking through clinician credential tabs in another. An advocate spent the better part of a workday qualifying a single clean application, and longer when the clinician was missing something. The work was careful and almost entirely unaided by software.
We rebuilt the layer. Every client requirement is now a typed JobRule. Every clinician credential is a structured field on a model. A rules engine evaluates them pair-by-pair, returns a verdict for each, and assembles a per-application Qualification record that knows its own state. For the items rules alone can’t decide, an AI qualification layer runs through a structured prompt at temperature 0.0 and returns a verdict and a reason. Every AI execution captures a feedback record from the human who reviews it. That feedback compounds.
Automated where certain, human where it matters.
The problem was prose
Job requirements lived as prose. “Candidate must have a minimum of 18 months recent experience in a Level II trauma center, ACLS current within the last two years, and no placement at this facility within the prior 13 weeks.” A human reading that sentence has to remember it long enough to compare against credential records that live on a different screen, in a different shape, with different vocabularies. Recency windows live in the requirement. Credentials live on the profile. The comparison lives in the advocate’s head.
Two consequences fell out of that. On speed: the clinician experience we’re aiming at is one tap to apply, and submission-ready within hours, not days. At an advocate-day of cognitive work per application, that target is structurally impossible. On consistency: a new advocate took longer than an experienced one because the knowledge of how to read each client’s prose was learned, not encoded. The same job could get qualified two different ways by two different advocates, and the discrepancy was invisible until something went wrong.
Information-architecture problems get fixed by structuring the information. Turn every requirement into an object with a known shape, turn every credential into a typed field on a model, and the comparison becomes mechanical. Mechanical comparison runs in milliseconds and never disagrees with itself.
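To make the shape concrete, here is the trauma-center sentence from the previous section re-expressed as typed requirements. The hash fields are illustrative only, not the production JobRule schema:

```ruby
# The prose requirement, decomposed into three typed rules with parameters.
# Field names here are assumptions for illustration, not the real schema.
rules = [
  { type: "TraumaLevelExperience", level: 2, min_months: 18 },
  { type: "Certification", name: "ACLS", issued_within_years: 2 },
  { type: "MinimumPlacementGap", weeks: 13 }
]
```

Once the requirement lives in this shape, comparing it against a structured profile is a loop, not a judgment call.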
Every requirement, structured
The rules engine starts with a taxonomy. Every requirement we’ve ever seen from a client is encoded as one of a fixed set of JobRule types, each with its own validator. A validator takes the rule, with its parameters, and the clinician’s profile data, and returns a pass/fail with a structured reason.
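The validator contract can be sketched as follows. The class and field names are illustrative, not the production code:

```ruby
# Minimal sketch of the validator contract: rule + profile in,
# pass/fail with a structured reason out. Names are hypothetical.
Verdict = Struct.new(:passed, :reason, keyword_init: true)

class StateLicenseValidator
  # rule: { state: "CA" }; profile: { licenses: [{ state:, active: }] }
  def call(rule, profile)
    license = profile[:licenses].find { |l| l[:state] == rule[:state] && l[:active] }
    if license
      Verdict.new(passed: true, reason: "Active #{rule[:state]} license on file")
    else
      Verdict.new(passed: false, reason: "No active license for #{rule[:state]}")
    end
  end
end
```

The structured reason matters as much as the boolean: it is what the advocate (and the clinician) sees when a rule fails.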
The categories that have stabilized:
Identity and licensing. StateLicense (active license for the work state), LicenseVerification (verification through the appropriate external source required), ProofOfIdentification (government-issued ID required).
Clinical experience. RoleExperience (minimum months in the specialty), UnitExperience (minimum months on this unit type within a recency window), TraumaLevelExperience (trauma-level certification or experience required). Each one carries its own recency rules, because three years on a med-surg floor seven years ago is not the same credential as three years that ended last quarter.
Credentials. Certification (ACLS, BLS, PALS, and the long tail, with alternative-certification support and issuing-body requirements), SkillsChecklist (unit-specific self-assessment scored above a configured threshold), Education (degree requirements with accreditation rules).
Health and compliance. Vaccines, Boosters, FluVaccine, each with configuration for which kinds of declinations the client accepts and which they don’t.
Logistics. StartDate (maximum weeks to start), MaximumTimeOff (with holiday restrictions, consecutive-day caps, and contract-length-adjusted allowances), MinimumPlacementGap (required gap since last placement at this facility or client), Radius (travel vs. local rules, distance, excluded geographies).
Client-specific. ClientForm (a custom attestation form the client requires), BlockSchedule (clinician must commit to a fixed weekly block), TechnologyExperience (specific EMR or technology familiarity).
Each validation type encodes years of patterns as executable logic. MaximumTimeOff knows about holiday restrictions, consecutive-day limits, per-week caps, and contract-length-adjusted allowances. Certification knows about alternative certifications, issuing bodies, and expiration windows. Radius knows about travel vs. local distinctions, excluded states and counties, and pre-close requirements. None of this is a lookup table. It is the accumulated knowledge of healthcare staffing compiled into code, with a single owner per rule type who keeps the validator honest.
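A hedged sketch of how several interacting constraints might live inside one validator, using MaximumTimeOff as the example. The parameter names and the pro-rating formula are assumptions, not the production logic:

```ruby
# Illustrative MaximumTimeOff-style validator. Assumes the allowance is
# configured per 13 weeks and pro-rated by contract length; the real
# validator also handles holiday restrictions and per-week caps.
class MaximumTimeOffValidator
  # rule: { days_per_13_weeks:, max_consecutive_days: }
  # requested: array of integers, each a consecutive-day block of time off
  def call(rule, requested, contract_weeks:)
    total = requested.sum
    allowance = (rule[:days_per_13_weeks] * contract_weeks / 13.0).floor
    if total > allowance
      return fail_with("#{total} days exceeds the #{allowance}-day allowance")
    end
    longest = requested.max || 0
    if longest > rule[:max_consecutive_days]
      return fail_with("a #{longest}-day block exceeds the " \
                       "#{rule[:max_consecutive_days]}-day consecutive cap")
    end
    { passed: true, reason: "#{total} days within allowance" }
  end

  private

  def fail_with(reason)
    { passed: false, reason: reason }
  end
end
```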
Operations teams configure rules through the same admin surface that runs them, so the shape clients negotiate and the shape advocates work against are the same shape: a list of typed requirements with parameters. The system of record for “what does this client require” is the rule set itself, not the prose in a contract addendum.
A qualification is a state machine
When a clinician applies, the system creates a Qualification record immediately. Each applicable rule produces a QualificationItem. Each item runs through a state machine.
The states:
auto_qualified: the rule passed automatically; no human needed.
auto_disqualified: the rule failed automatically; the application can’t proceed without an override.
auto_bypassed: the system waived the rule, because it’s a flexible requirement the client accepts alternatives for.
auto_pending_clinician: the system identified a gap and automatically requested the missing artifact from the clinician.
needs_review: the system couldn’t make the determination on its own.
qualified/disqualified/bypassed: a human made the call.
The Qualification itself rolls up into one of four outcomes: qualified, disqualified, auto_qualified, auto_disqualified. The distinction between qualified and auto_qualified matters because it tells us how much of the work the system did on its own and where the human attention went. Once all items reach a terminal state and the overall outcome is positive, the next step downstream fires on its own. No one has to push a button to start packet assembly; the work moves itself.
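The rollup can be sketched as a small derivation: wait for every item to reach a terminal state, then split on whether any item failed and whether any human touched the work. The state names come from the list above; the derivation logic itself is an assumption for illustration:

```ruby
# Illustrative rollup. Non-terminal states (needs_review,
# auto_pending_clinician) block the rollup entirely.
TERMINAL = %w[auto_qualified auto_disqualified auto_bypassed
              qualified disqualified bypassed].freeze

def rollup(item_states)
  return nil unless item_states.all? { |s| TERMINAL.include?(s) } # still in flight
  failed = item_states.any? { |s| s.end_with?("disqualified") }
  human  = item_states.any? { |s| !s.start_with?("auto_") }
  if failed
    human ? "disqualified" : "auto_disqualified"
  else
    human ? "qualified" : "auto_qualified"
  end
end
```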
I kept the state machine small on purpose. Richer state graphs I tried early on burned operations time on transitions that didn’t correspond to anything an advocate actually does. These eight states are the ones that survived contact with production.
Where rules end, AI begins
Rules are fast and consistent, and they have limits.
A rule can check that a clinician has a skills-checklist record; it can’t evaluate whether her self-reported scores are credible given her work history. A rule can verify that a license exists; it can’t interpret an ambiguous expiration date scrawled on a scanned wallet card she uploaded from her phone. A rule can flag that a required certification is missing; it can’t decide whether an alternative certification the client hasn’t explicitly listed would still be acceptable.
For those cases we have QualificationAIExecution. Each QualificationItem can trigger an evaluation via QualificationPrompt, a structured prompt running through our internal LLM layer. The prompt receives the rule, the clinician’s relevant credentials, and the job context. It returns a structured verdict, qualified or disqualified, with a reason field that explains how it got there.
class QualificationPrompt < Glados::Prompt
  temperature 0.0
  model "gpt-5.4"

  schema do
    string :verdict, enum: %w[qualified disqualified]
    string :reason
  end
end
Temperature 0.0 is a deliberate determinism decision, not a default I forgot to change. We want the same inputs to produce the same verdict every time. Drift between two evaluations of the same credential is a bug, not a feature. The model still has room to surface judgment in the reason field, but the verdict is committed alongside it on the same structured response. When we want to explore alternative interpretations of an ambiguous case, we do it with a different prompt, never by turning the knob on the production one.
The AI execution is a first-class record:
class QualificationAIExecution < ApplicationRecord
  belongs_to :qualification_item
  has_one :feedback, class_name: "QualificationAIFeedback"

  enum status: { pending: 0, completed: 1, failed: 2, no_changes: 3 }
  enum outcome: { qualified: 0, disqualified: 1 }

  # checksum of the inputs; re-run only when underlying data changes
end

class QualificationAIFeedback < ApplicationRecord
  belongs_to :qualification_ai_execution
  belongs_to :reviewer, class_name: "User"

  enum rating: { positive: 0, negative: 1 }
  validates :note, presence: true, if: :negative?
end
Every execution carries a feedback slot. When an advocate reviews an AI verdict, she rates it positive or negative, and a negative rating requires a note explaining what was wrong. That record is attached to the execution forever. When the prompt changes, those notes become the eval set we run against. When the model changes, those notes are how we tell whether the new model is better or worse for our specific surface, not just better on a generic benchmark.
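Turning that feedback into an eval set is mostly a projection over the stored records. A sketch, using illustrative hash shapes rather than the production models (the `corrected_outcome` field is an assumption about how an override is recorded):

```ruby
# Build eval cases from executions that received human feedback.
# Positive feedback confirms the model's verdict as the expected answer;
# negative feedback substitutes the human's correction.
def build_eval_set(executions)
  executions.filter_map do |ex|
    fb = ex[:feedback] or next
    {
      inputs:   ex[:inputs],   # what the prompt saw
      expected: fb[:rating] == :positive ? ex[:outcome] : fb[:corrected_outcome],
      note:     fb[:note]      # the advocate's explanation, if any
    }
  end
end
```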
We don’t re-run AI on inputs that haven’t changed. If a clinician’s relevant credentials are byte-for-byte the same as the last time we evaluated this rule, we reuse the prior verdict. We pay for the model exactly once per real decision.
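The gate can be sketched as a checksum over canonicalized inputs. The class and call shape are illustrative; production stores the checksum on the execution record rather than in an in-memory hash:

```ruby
require "digest"
require "json"

# Illustrative checksum gate: hash the canonicalized inputs and reuse
# the prior verdict when nothing underlying has changed.
class AIGate
  def initialize(&model_call)
    @model_call = model_call
    @cache = {}
  end

  def evaluate(rule:, credentials:)
    # Sort top-level keys so logically identical inputs hash identically.
    canonical = JSON.generate({ rule: rule, credentials: credentials }.sort.to_h)
    checksum = Digest::SHA256.hexdigest(canonical)
    @cache[checksum] ||= @model_call.call(rule, credentials)
  end
end
```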
Humans where it matters
The AI layer isn’t fully autonomous, and we don’t want it to be. It exists to handle the determinations rules alone can’t make, with human review on the edge cases and a feedback mechanism that improves the system over time.
Our design rule is concrete. Auto-qualify everything that can be determined with certainty. Auto-pend the clinician when something is missing, so the system requests it from her directly. Escalate to the AI layer when the answer requires interpretation. Surface to an advocate only what genuinely requires human judgment. An advocate’s job shifts from “evaluate every requirement on every application” to “review the few items the system flagged.”
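The ladder above can be sketched as a fall-through: each item takes the cheapest resolution available and reaches a human only as the last resort. The predicate names are illustrative:

```ruby
# Illustrative triage ladder. Predicates like :certain_pass stand in for
# the real rule-engine outputs; the ordering is the point.
def triage(item)
  if item[:certain_pass]        then :auto_qualified
  elsif item[:certain_fail]     then :auto_disqualified
  elsif item[:missing_artifact] then :auto_pending_clinician  # ask the clinician
  elsif item[:interpretable]    then :ai_evaluation           # structured prompt
  else                               :needs_review            # advocate
  end
end
```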
When the system gets a qualification right and the advocate confirms it, that is signal. When the system gets it wrong and the advocate overrides with a note, that is also signal, and the more valuable kind. The loop is built into the workflow, not bolted on afterward. Advocates aren’t filling out a separate quality-rating form at the end of their day; they are marking thumbs up or thumbs down on the same screen where they make the decision.
The shape this gives us is familiar from the rest of the platform. Our job autocuration system, described in Agents that fill out forms, uses the same posture: agents act, but only inside structured surfaces that the same humans review, with feedback compounding into prompts and evals. The infrastructure underneath both is shared on purpose: the same pattern keeps showing up wherever we replace cognitive work with software.
Applications clearing qualification end-to-end without a human touching them is now routine, not a special case. We don’t publish a current rate as an operational metric because the slope matters more than the slice, and which clients are in scope changes the denominator week to week. We’d rather describe the architecture honestly than wave a number around opportunistically. The trend is up, and every advocate override teaches the system something the next release inherits.
What makes one tap real
Everything this post describes is what sits behind a clinician’s tap on Apply.
When she taps, the system doesn’t just create a job application. It creates the qualification immediately, runs every applicable rule against her structured profile, auto-qualifies the items it can, auto-bypasses the ones the client allows, asks her for anything that is genuinely missing, fires AI evaluations on the ambiguous remainder, and surfaces only the residue to an advocate. By the time a human is involved, most of the qualification work is done. For the happy-path clinician with a complete profile and current credentials, the qualification completes without an advocate ever opening the application. The system then moves the work forward on its own: packet assembly, submission-ready checks, and out the door.
--- Thaynã, Engineering