
EPA SDWIS Tier-1 violations × public-notice required — pre-walk findings

Walked: 2026-05-17 08:15-09:00 UTC (~45 min)
Status: KILLED AT GATE.

Single decisive kill-gate: GAO-documented data unreliability on the precise variable the anti-join needs, with no replacement empirical measurement since EPA discontinued the audits in 2010.

What the pre-walk asked

The anti-join from the wider-survey memo (anti-join-survey-2026-05-16, ranked #4): "Health-based drinking-water violation; required customer notice not issued." EPA's SDWIS Federal publishes both violation rows and (separately) public-notice rows; the question was whether the negative-space cut between the two is clean.

Memo-predicted kill-gate: "Public notice issued but not entered into SDWIS is a known compliance gap that's a documentation issue, not a non-notice. Need to read EPA's PN compliance memo."

Memo-predicted cost: ~1 hour to walk the regulatory framework + decide.

The framework — clean

40 CFR 141 Subpart Q ("Public Notification of Drinking Water Violations") establishes a three-tier system based on health-severity. Tier 1 requires public notice as soon as practical but no later than 24 hours after the system learns of the violation (§141.202). Within 10 days of completing notification, the public water system must submit certification to its primacy agency that it has complied (§141.31). Appendix A to Subpart Q assigns each NPDWR violation type to a tier.

The regulatory anchor is unambiguous: Tier-1 violation → 24-hour clock → PN issued → certification filed. The framework would support a clean anti-join — if the data layer captured what we'd need.
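To make the clock concrete, here is a minimal sketch of the two deadlines described above. It assumes a record carries both a "system learned of the violation" timestamp and a "notice completed" timestamp. Both inputs and the function name `tier1_deadlines` are hypothetical illustrations, not fields from any SDWIS table.

```python
from datetime import datetime, timedelta, timezone

# Windows as summarized in this memo (Tier 1 notice: 24 hours;
# certification: 10 days after completing notification).
TIER1_NOTICE_WINDOW = timedelta(hours=24)
CERTIFICATION_WINDOW = timedelta(days=10)


def tier1_deadlines(learned_at, notice_completed_at=None):
    """Compute Tier-1 deadlines from hypothetical timestamps.

    learned_at:          when the system learned of the violation
    notice_completed_at: when public notification was completed, if known
    """
    notice_due = learned_at + TIER1_NOTICE_WINDOW
    result = {
        "notice_due": notice_due,
        "notice_on_time": None,      # unknown until a completion time exists
        "certification_due": None,   # clock starts when notification completes
    }
    if notice_completed_at is not None:
        result["notice_on_time"] = notice_completed_at <= notice_due
        result["certification_due"] = notice_completed_at + CERTIFICATION_WINDOW
    return result


# Hypothetical example: notice completed 30 hours after the system learned.
learned = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
example = tier1_deadlines(learned, learned + timedelta(hours=30))
print(example["notice_on_time"], example["certification_due"].isoformat())
```

The check is trivial once both timestamps exist; the rest of the walk turns on whether anything like `notice_completed_at` is available at the federal layer.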

The data architecture — partially clean

The relevant SDWIS Federal tables (per ECHO's SDWA Data Download Summary):

What's structurally absent: no PN-issued-on date, no PN-deadline, no PN-compliance flag. The PN_VIOLATION_ASSOC table records that a PN-related violation/association exists, not when the PN was actually delivered to consumers. The 24-hour Tier-1 clock can't be evaluated against federal data; the certification with copy-of-notice is filed to the primacy state and doesn't propagate up.

Two anti-join shapes survive at this layer:

Both look workable on first read. The kill comes from a separate axis.

The kill: GAO has quantified the data layer's unreliability, and the audits stopped

GAO-11-381 (Drinking Water: Unreliable State Data Limit EPA's Ability to Target Enforcement Priorities and Communicate Water Systems' Performance, published June 17 2011 / publicly released July 19 2011) audited SDWIS/Fed data quality using EPA's own 2007-2009 state audits as the underlying empirical base.

The headline findings:

Using data from the 14 states EPA audited in 2009, GAO estimates that those 14 states did not report or inaccurately reported 26 percent of the health-based violations that should have been reported and 84 percent of the monitoring violations that should have been reported.

Monitoring violations, as the GAO defines them, include situations in which a water system did not issue public notice of a health-based violation. Public-notice compliance is structurally part of the monitoring-violation category, not a separate signal. The 84% unreliability figure lands directly on top of the variable the anti-join would key on.

A second GAO finding sharpens the timing:

EPA conducted audits to assess the quality of state violation data in SDWIS/Fed and developed recommendations for improving data quality. EPA discontinued these audits in 2010 because of funding constraints.

The 2009 audit cycle that GAO-11-381 analyzes is therefore the most recent empirical measurement of SDWIS/Fed reliability that exists. As of GAO's 2022 follow-up:

EPA indicated that it was not resuming data verification audits, instead taking other actions to improve the agency's ability to oversee the quality of drinking water data that states provide to EPA. The agency told us it was evaluating data quality through a three-pronged approach of electronic reporting through the Compliance Monitoring Data Portal (CMDP), automated data quality assurance tools, and state file reviews.

CMDP is a process shift, not a measurement. Automated QA tools and file reviews don't produce a new statistic comparable to "26% / 84%." EPA has, by its own description, replaced the audit-based reliability-measurement regime with one that produces no empirical reliability figure at all. The 2009 numbers remain the most current empirical estimate of SDWIS/Fed reliability that the federal apparatus has produced, and the apparatus has explicitly chosen not to produce a replacement.

Why this kills both anti-join shapes

Shape A: "No PN row exists for this Tier-1 violation" can mean either that the PN wasn't issued or that the state didn't transmit the PN data to SDWIS/Fed. GAO's 84% figure implies the second explanation dominates the negative space. Whatever cohort the SQL produces is overwhelmingly state-reporting failure, not actual PN-issuance failure. The story-shape "PWS failed to notify consumers of a serious health hazard" can't be told from a data substrate where, on GAO's estimate, 84% of the monitoring violations that should have been reported are missing or wrong, so an empty PN slot most plausibly means "Oklahoma's quarterly upload was incomplete."
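A toy version of the Shape A query shows the ambiguity directly. This is a sketch using Python's built-in `sqlite3`; the table names `violations` and `pn_assoc`, every column name, the PWS IDs, and the `truth` column are hypothetical stand-ins, not the real SDWIS schema. The `truth` column exists only so the toy data can show what the query cannot see.

```python
import sqlite3


def shape_a_cohort(conn):
    """Anti-join: Tier-1 violations with no matching public-notice row."""
    sql = """
        SELECT v.pwsid, v.violation_id
        FROM violations AS v
        LEFT JOIN pn_assoc AS p
          ON p.pwsid = v.pwsid
         AND p.related_violation_id = v.violation_id
        WHERE v.tier = 1
          AND p.related_violation_id IS NULL   -- keep only unmatched rows
        ORDER BY v.pwsid, v.violation_id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE violations (pwsid TEXT, violation_id TEXT, tier INTEGER, truth TEXT);
    CREATE TABLE pn_assoc (pwsid TEXT, related_violation_id TEXT);
""")
conn.executemany(
    "INSERT INTO violations VALUES (?, ?, ?, ?)",
    [
        ("XX0000001", "V1", 1, "notice issued, PN row transmitted"),
        ("XX0000002", "V2", 1, "notice issued, PN row never transmitted"),
        ("XX0000003", "V3", 1, "notice never issued"),
        ("XX0000004", "V4", 2, "not a Tier-1 violation"),
    ],
)
conn.execute("INSERT INTO pn_assoc VALUES ('XX0000001', 'V1')")

cohort = shape_a_cohort(conn)
print(cohort)  # V2 and V3 both land in the cohort
```

V2 (a reporting gap) and V3 (a real notification failure) are indistinguishable in the result. Only the made-up `truth` column separates them, and nothing equivalent exists in the federal data.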

Shape B: "State A cites PN failures aggressively; State B rarely" is a comparison that requires comparable underlying populations of actually-reported violations. State A reporting 90% of monitoring violations completely and State B reporting 30% will produce wildly different Code-75 citation rates even if their underlying PN-enforcement diligence is identical. The Discretion Map (OSHA SIR) survived because OSHA federal-state data flows are tighter and the citation cohort was both small enough to verify by name and stable enough to compare. SDWIS state-by-state PN-citation analysis collapses into "we're measuring state reporting completeness, not state enforcement variance."
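The Shape B confound is plain arithmetic. A minimal sketch, assuming (as a simplification) that reporting loss hits PN citations at the same rate as other monitoring violations; the citation count of 100 is made up, and the 90% / 30% completeness figures are the illustrative ones from the paragraph above.

```python
def observed_citations(true_citations, pct_reported):
    """Citations visible in federal data if only pct_reported percent arrive."""
    if not 0 <= pct_reported <= 100:
        raise ValueError("pct_reported must be between 0 and 100")
    return true_citations * pct_reported // 100


# Hypothetical: both states actually issue the same number of PN citations.
TRUE_CITATIONS = 100

state_a = observed_citations(TRUE_CITATIONS, 90)  # reports 90% of rows
state_b = observed_citations(TRUE_CITATIONS, 30)  # reports 30% of rows
apparent_ratio = state_a / state_b

print(state_a, state_b, apparent_ratio)
```

Under these assumptions State A appears to cite three times as often as State B, with identical enforcement behavior on the ground.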

The sixth failure mode

The first five /anti-join-failure-modes are about the regulatory framework absorbing the cohort (LEIE WAIVERDATE / OFAC × SAM / HUD conciliation / chronology / OSHA grouping-key). This is a different beast: the substrate's measured unreliability exceeds the signal-to-noise threshold the anti-join needs.

The shape generalizes: any time a dataset is the subject of a published GAO or agency-OIG audit that quantifies reporting-inaccuracy, that audit's percentage has to be incorporated into the cohort-sanity gate. If reporting-inaccuracy ≥ the size of the negative space we'd be naming, the anti-join cannot survive — what we'd be measuring is the reporting noise, not the regulatory gap.
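The rule can be written down as a small gate. This is a sketch, not an established method: the function name `reliability_gate` is hypothetical, and it assumes both quantities are expressed as fractions of the same base population so that they are comparable.

```python
def reliability_gate(misreport_rate, negative_space_share):
    """Return True if an anti-join survives the published-audit gate.

    misreport_rate:       audited share of rows unreported or misreported
    negative_space_share: share of rows the anti-join would name as gaps
    Both are fractions in [0, 1] over the same base (an assumption).
    """
    for name, value in (("misreport_rate", misreport_rate),
                        ("negative_space_share", negative_space_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # The memo's rule: inaccuracy >= negative space means the cut is noise.
    return misreport_rate < negative_space_share


# 0.84 is GAO's monitoring-violation figure; 0.10 is a made-up cohort share.
print(reliability_gate(0.84, 0.10))  # gate fails
print(reliability_gate(0.02, 0.10))  # hypothetical cleaner dataset passes
```

Passing the gate is necessary, not sufficient; the other pre-walk checks still apply.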

The pre-walk methodology gains a fourth axis: regulatory framework (1) + data architecture (2) + sanity-check top of cohort (3) + published reliability audits of the dataset (4). The fourth was implicit in earlier walks (we knew the data we used was relatively clean) but never named as a gate. SDWIS makes it explicit — and the gate is "search GAO and agency-IG audits for the dataset's name; if a quantified unreliability finding exists, the cohort math has to be discounted against it."

What this kills

What survives

The framework walk wasn't wasted. The Tier-1 24-hour rule and the certification-to-primacy-agency flow are clean regulatory anchors that could support a different shape — for example, a state-level investigation in a primacy state that publishes its own PN certification data, where the federal SDWIS gap is sidestepped. The substrate-noise problem is federal-data-specific; state primacy programs that publish their own certification logs would be a different cohort entirely. Filed as a parking-lot idea, not a tier-2 anti-join candidate.

What this means for the remaining tier-2 candidates

Per the wider-survey memo: FDA Warning Letters × DRLS (#5), OSHA citations × federal contractor awards (#6), SEC bad-actor disqualifications × Form D (#7).

The new fourth pre-walk axis applies to each:

Suggested order: FDA next, OSHA after, SEC last. The fourth-axis check on FDA is a 15-minute scan before any further investment.

Provenance