Yield-math walkthrough: how Celia estimates enrollment probability for a single student
Step-by-step walkthrough of how a single student's yield probability gets computed — baseline, signals, weights, and the factors shipped with every score.
Enrollment leaders who understand the math trust the number. Enrollment leaders who do not understand the math either over-trust the number (and get burned when it moves unexpectedly) or ignore it entirely (and lose the benefit of having it). Neither is acceptable, so we want to demystify how Celia produces a yield probability for a single student.
This is a walk-through, not a spec. We are not going to drop the weight matrix — the weights are calibrated per institution and tuned quarterly against outcome data, so any specific number we printed would be misleading. What we are going to do is take one student, show you the inputs, show you how the baseline moves, and show you the final number. Once you have seen the pattern, the rest of it is tuning.
Start with a baseline, not a guess
A yield score for a student that starts from zero is pretending to know nothing. We always know something. The student is in a cohort. The cohort has a historical yield rate.
So the first input to Celia’s yield math is the institution’s own historical baseline for a student matching the cohort this student belongs to. For a worked example, take a first-gen, in-state, engineering-program student at the deposit-paid stage at a regional public four-year. Pull three years of historical data on students who matched that cohort at the same stage. The institutional baseline yield-to-matriculation for that cohort might be sixty-four percent. That is the starting point.
Baseline P(enrollment) = 0.64
A student in that cohort, with zero other signals, sits at 0.64 before Celia looks at anything else. This number is already more useful than most dashboards will tell you, because it is grounded in the institution’s own outcomes, not a generic industry average.
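The baseline step is simple enough to sketch. This is a minimal illustration, not Celia's implementation: the `HistoricalStudent` record shape and cohort key are hypothetical, but the arithmetic is exactly what the text describes, matriculations divided by cohort size over the historical window.

```python
from dataclasses import dataclass

# Hypothetical record shape; Celia's actual schema is not public.
@dataclass
class HistoricalStudent:
    cohort: str          # e.g. "first-gen/in-state/engineering/deposit-paid"
    matriculated: bool

def cohort_baseline(history: list[HistoricalStudent], cohort: str) -> float:
    """Historical yield-to-matriculation for one cohort: enrolled / total."""
    matched = [s for s in history if s.cohort == cohort]
    if not matched:
        raise ValueError(f"no historical data for cohort {cohort!r}")
    return sum(s.matriculated for s in matched) / len(matched)

# 100 historical students in the worked example's cohort, 64 of whom
# matriculated, gives the 0.64 baseline used throughout this walkthrough.
history = [HistoricalStudent("first-gen/in-state/eng", i < 64) for i in range(100)]
baseline = cohort_baseline(history, "first-gen/in-state/eng")
```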
Layer the signals
The student is not a generic member of their cohort. They have a behavioral profile accumulated over months. Celia reads that profile and converts each relevant signal into a directional modifier on the baseline.
Take our example student. Pull their signals as of today:
- Application complete: yes. This is a lift; students with complete applications at the deposit stage matriculate at meaningfully higher rates than those with gaps.
- FAFSA verification: requested fourteen days ago, not responded to. This is a drag. FAFSA-verification stalls are among the strongest negative signals in the pre-matriculation window.
- Portal logins: zero in the last twenty-one days. Drag. Portal silence is correlated with family-side hesitation.
- Deposit: paid, on time. Lift. Deposits on or before the deadline are modestly predictive relative to deposits submitted in extension.
- Orientation attendance: attended the virtual orientation session last week. Lift. Orientation attendance is one of the strongest lift signals we see.
- Housing application: not yet submitted, twelve days past the institution’s soft deadline. Drag. Housing-application completion is a classic melt predictor.
Each of these signals maps to a weight derived from the institution's own historical data: how much students who fired this signal over- or under-performed the baseline cohort yield. The weights are not pulled from a textbook. They come from running the institution's last three years of outcomes through a calibration job that Celia refreshes quarterly.
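In its simplest form, that over/under-performance is a difference of rates. The sketch below assumes that form; the data shape and the example numbers are illustrative, not production values.

```python
def signal_weight(outcomes: list[tuple[bool, bool]], baseline: float) -> float:
    """Weight for one signal: the yield rate among students who fired it,
    minus the cohort baseline. outcomes is (fired_signal, matriculated)."""
    fired = [matric for sig, matric in outcomes if sig]
    if not fired:
        return 0.0  # no evidence for this signal -> no modifier
    return sum(fired) / len(fired) - baseline

# Illustrative: 50 orientation attendees, 36 of whom matriculated,
# yield 0.72 against a 0.64 baseline -> a +0.08 modifier, matching
# the orientation lift in the worked example.
attendee_outcomes = [(True, i < 36) for i in range(50)]
orientation_w = signal_weight(attendee_outcomes, 0.64)
```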
Apply the modifiers
For illustration — and these are illustrative, not production values — imagine the modifier set for our example student looks like this:
Baseline: 0.64
Application complete: +0.06
FAFSA verification stalled 14d: -0.09
No portal login in 21d: -0.07
Deposit paid on time: +0.03
Orientation attended: +0.08
Housing application open 12d past soft due: -0.05
----------------------------------------------------
Raw adjusted P(enrollment): 0.60
The modifiers do not simply add. In production, Celia uses a sigmoidal combining function so that a student cannot drift above 0.99 or below 0.01, and so that multiple strong negative signals compound slightly rather than canceling out. For this walk-through, a linear-additive view is close enough to show the intuition.
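One common way to get that behavior is to move the baseline in logit space and clamp the result, which keeps the score strictly inside the 0.01 to 0.99 band the text describes and makes stacked negatives compound rather than cancel. The sketch below assumes that form; the `gain` constant is invented for illustration and is not a production value.

```python
import math

def combine(baseline: float, modifiers: list[float], gain: float = 5.0) -> float:
    """Shift the baseline in logit space by the summed modifiers, then
    squash back through a sigmoid and clamp to [0.01, 0.99]."""
    logit = math.log(baseline / (1 - baseline)) + gain * sum(modifiers)
    p = 1 / (1 + math.exp(-logit))
    return min(0.99, max(0.01, p))

# The worked example's modifiers sum to -0.04, pulling 0.64 down to
# roughly the 0.60 shown in the linear-additive view above.
mods = [0.06, -0.09, -0.07, 0.03, 0.08, -0.05]
p = combine(0.64, mods)
```

Note that an extreme pile-up of signals saturates at the clamp rather than running off to 0 or 1, which is the property the walkthrough is pointing at.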
The student started at a cohort baseline of 0.64 and, after signals, sits at roughly 0.60. A slight drag overall — the FAFSA and portal-silence negatives outweigh the orientation and application-complete positives. The interpretation is not “this student probably will not enroll.” It is “this student is slightly below cohort expectation, and here is specifically why.”
Ship the top contributing factors
A probability without explanation is a black box. Celia always publishes the score alongside the two or three signals that moved it most, ranked by absolute contribution.
For the student above, the published output looks like:
ss_celia_yield: 0.60
ss_celia_yield_trend: down 0.04 in 14d
ss_celia_yield_drivers: FAFSA stalled 14d · no recent portal login · strong deposit+orientation signal
That final field is what the counselor reads. The number tells them the magnitude. The drivers tell them the action. “FAFSA stalled” is not a conclusion; it is a to-do. “No recent portal login” is a conversation opener. “Strong deposit+orientation signal” is a reassurance that the top-of-funnel commitment is still there.
Three pieces of information. One field each. That is the whole output surface.
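Assembling that surface is mostly bookkeeping: rank the per-signal contributions by absolute value and format three fields. The field names below come from the example above; the `publish` helper and the driver labels are hypothetical.

```python
def publish(contributions: dict[str, float], score: float,
            prev_score: float, window_days: int) -> dict[str, str]:
    """Build the three published fields: score, trend over the window,
    and the top drivers ranked by absolute contribution."""
    ranked = sorted(contributions, key=lambda k: abs(contributions[k]), reverse=True)
    drift = score - prev_score
    return {
        "ss_celia_yield": f"{score:.2f}",
        "ss_celia_yield_trend": f"{'up' if drift >= 0 else 'down'} {abs(drift):.2f} in {window_days}d",
        "ss_celia_yield_drivers": " · ".join(ranked[:3]),
    }

# Worked-example contributions, keyed by illustrative driver labels.
contributions = {
    "FAFSA stalled 14d": -0.09,
    "orientation attended": +0.08,
    "no recent portal login": -0.07,
    "application complete": +0.06,
    "housing app open 12d": -0.05,
    "deposit paid on time": +0.03,
}
fields = publish(contributions, score=0.60, prev_score=0.64, window_days=14)
```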
Honest caveats about what this is
We want to be direct about what Celia’s yield math is and is not.
It is not a regression model in the classical sense. There is no single fitted equation with coefficients. Celia’s scoring is signal-weighted with cohort-level priors, blended through a calibration layer that uses your institution’s actual outcomes to adjust weights over time. It is closer in spirit to a Bayesian updating model than to a logistic regression.
It is not a black-box neural network. There are no opaque deep-learning activations deciding a student’s fate. Every signal that moves the number is nameable. Every modifier is auditable. If a counselor asks “why is this student at 0.60?” we can answer, specifically and in natural language, every time.
It is right often enough to be useful and transparent enough to be auditable, which matters more than chasing an extra two points of AUC. A model that scores three points better on a validation set but cannot be explained to a VP has no business in an enrollment office. The floor for trust in this domain is explainability, not accuracy. Accuracy is necessary; it is not sufficient.
What not to do with the number
A yield probability is not a binary prediction. A student at 0.45 is not "not enrolling." They are close to a coin flip, asking for the intervention that tips them.
We have watched institutions misread this. They treat the yield number as a filter — keep the 0.80s, triage out the 0.40s. That is exactly backward. A 0.95 student is on autopilot; they need little. A 0.25 student is almost certainly lost; your time on them will not move the number much. The 0.45 to 0.65 band is where the work is. That is where a counselor conversation, a financial aid reconfirmation, a housing nudge actually changes the outcome. Yield probability is a prioritization tool for intervention, not a sorting tool for triage.
Celia’s recommendation engine combines the yield number, inversely weighted, with signal-actionability to produce the daily counselor to-do list. A student at 0.55 with three stalled signals ranks higher for outreach than a student at 0.20 with nothing actionable in their profile. The math serves the intervention, not the other way around.
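A toy version of that prioritization makes the shape concrete. This is an illustrative heuristic, not Celia's production formula: urgency peaks in the middle of the yield band and scales with how many stalled-but-fixable signals a counselor can actually act on.

```python
def outreach_priority(yield_p: float, actionable_signals: int) -> float:
    """Illustrative ranking heuristic: highest near a 0.50 yield
    probability, zero at the extremes, scaled by actionable signals."""
    band = 1 - abs(yield_p - 0.5) * 2   # 1.0 at 0.50, 0.0 at 0.0 or 1.0
    return band * actionable_signals

# The 0.55 student with three stalled signals outranks the 0.20
# student with nothing actionable, as described above.
mid_band = outreach_priority(0.55, 3)
long_shot = outreach_priority(0.20, 0)
```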
Recalibration and drift
Yield models drift. The signals that predicted enrollment in 2022 are not the same signals that predict enrollment in 2026. Generational behavior changes. Platforms change (students open fewer emails now than five years ago; they respond to texts at different rates than they did in 2020). Economic conditions change the weight of financial-aid signals.
Celia recalibrates weights quarterly against your institution’s actual outcome data. Students who enrolled and students who did not are fed back into the weight calibration, and the next quarter’s scores reflect what actually mattered last quarter. This is the unsexy maintenance work that separates a yield model that stays useful from a yield model that drifts into uselessness after eighteen months.
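The recalibration loop can be sketched as smoothing each signal's weight toward its most recently observed lift. The smoothing rate and the data shapes here are assumptions for illustration; the actual calibration job is described only at the level above.

```python
def recalibrate(weights: dict[str, float],
                quarter_outcomes: dict[str, list[tuple[bool, bool]]],
                baseline: float, rate: float = 0.5) -> dict[str, float]:
    """Blend last quarter's observed lift for each signal into its
    stored weight (exponential smoothing; `rate` is an assumed
    constant, not a production value)."""
    new = {}
    for signal, old_w in weights.items():
        fired = [matric for f, matric in quarter_outcomes.get(signal, []) if f]
        observed = (sum(fired) / len(fired) - baseline) if fired else old_w
        new[signal] = (1 - rate) * old_w + rate * observed
    return new

# Orientation carried a +0.08 weight; this quarter only 34 of 50
# attendees matriculated (a +0.04 lift over the 0.64 baseline),
# so the weight eases partway toward the new evidence.
outcomes = {"orientation attended": [(True, i < 34) for i in range(50)]}
new_weights = recalibrate({"orientation attended": 0.08}, outcomes, 0.64)
```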
Trust comes from explainability
The single most important property of a yield number, from a counselor and VP point of view, is whether it can be argued with. A number that cannot be argued with cannot be trusted. A number that can be argued with — whose drivers can be verified against the student record, whose movement can be traced to a specific signal change — earns trust over time.
Every yield score Celia publishes ships with its drivers. Every drift in the number can be explained by a change in a named signal. Every counselor override is captured and feeds back into the next recalibration.
That is how yield math earns its way into a counselor’s morning routine. Not by being marginally more accurate than the last tool. By being honest about what it knows and why.
See the three analyses Celia runs every day →