Event Logging to EL3: Practical Telemetry, Retention & Cost Controls
- Harshil Shah
- Sep 1
- 4 min read

A federal CISO playbook for turning policy into outcomes: reach EL3 under OMB M-21-31 with the minimum viable telemetry, a tiered retention model, defensible cost controls, and evidence an Authorizing Official will accept.
Audience: Federal CISOs, Deputies, SOC leaders, PMO • Time to implement: 60–90 days • Dependencies: SOC, privacy, legal, finance, procurement
“If your logging strategy can’t survive budget review, it won’t survive an incident review. Engineer for both.”
— Harshil Shah
What you’ll achieve
Clear scope: which systems and log types must reach EL3 now vs. next quarter.
Telemetry blueprint: identity, endpoint, network, cloud, SaaS, and application logs that actually drive detection and investigations.
Retention you can fund: hot/warm/cold tiers mapped to mission criticality.
Spend control: routing, filtering, and storage tactics that cut cost without losing signal.
Evidence: artifacts to satisfy ATO reviewers and internal audit.
EL3 in one minute
Goal: Logging requirements met across all criticality levels, with centralized access for investigations and response.
Core capabilities: event coverage by category, integrity (tamper-resistant), timely availability, queryability, retention, and cross-system correlation.
Outcome Metric: % of in-scope systems at EL3 × % of required event types collected × % of events searchable within target time.
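The outcome metric above is a product of three ratios, so one weak factor drags the whole score down. A minimal sketch of the arithmetic follows; the function name, parameter names, and sample counts are illustrative assumptions, not part of M-21-31.

```python
def el3_outcome_score(systems_at_el3, systems_in_scope,
                      event_types_collected, event_types_required,
                      events_searchable_in_target, events_checked):
    """Return the composite outcome metric as a fraction between 0 and 1."""
    def ratio(numerator, denominator):
        if denominator <= 0:
            raise ValueError("denominator must be positive")
        return numerator / denominator

    return (ratio(systems_at_el3, systems_in_scope)
            * ratio(event_types_collected, event_types_required)
            * ratio(events_searchable_in_target, events_checked))


# Made-up counts: 90% of systems, 95% of event types, 98% searchable in time.
score = el3_outcome_score(45, 50, 190, 200, 980, 1000)
print(f"{score:.1%}")
```

Three factors that each look healthy on their own (90%, 95%, 98%) multiply to roughly 84%, which is why the product is a stricter signal than any single percentage.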
Minimum viable telemetry (by domain)
Identity & Access
AuthN/AuthZ events: success/fail, MFA state, risk score, device binding.
Privilege changes: role grants, elevation, break-glass usage, dormant accounts becoming active.
Federation: token issuance, unusual claims, session anomalies.
Endpoints & Servers
EDR detections, process starts, driver loads, script engines, lateral movement.
Patching status deltas; kernel and audit logs for high-value assets.
Network & Edge
DNS queries (resolver and egress), HTTP(S) metadata, TLS fingerprint changes.
Zero Trust gateways: policy decisions, deny reasons, inline malware verdicts.
Cloud, SaaS & Apps
Cloud control-plane: IAM changes, key/secret use, policy updates, resource create/delete.
SaaS admin: sharing/permission changes, external app grants, bulk downloads.
Application logs: auth flows, business-critical transactions, error rates, admin actions.
Tiered retention that balances cost & readiness
Tier | Purpose | Typical Window | Storage | Notes |
Hot | Active detection & investigations | 30–90 days | Primary SIEM/search | Fast query SLA; index high-value fields only |
Warm | Case expansion, threat hunting | 6–12 months | Lower-cost searchable store | Columnar/object storage with late-binding schema |
Cold | Compliance & rare look-backs | 12–24 months+ | Object/archive with on-demand restore | Use lifecycle policies; encrypt & verify integrity |
Tip: Define retention by mission criticality and investigation value, not by product default. Document exceptions up front.
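One way to make the tip concrete is to key the tier boundaries on mission criticality rather than on a single product default. The sketch below assumes two hypothetical criticality labels and picks day counts from within the windows in the table; the exact numbers and the "expired" cutoff are assumptions an agency would set for itself, and legal holds would override them.

```python
from datetime import timedelta

# Hypothetical policy: upper bound (in days) of each tier per criticality.
RETENTION_DAYS = {
    "high": {"hot": 90, "warm": 365, "cold": 730},
    "standard": {"hot": 30, "warm": 180, "cold": 365},
}


def tier_for(age: timedelta, criticality: str = "standard") -> str:
    """Return the storage tier an event of the given age belongs in."""
    limits = RETENTION_DAYS[criticality]
    for tier in ("hot", "warm", "cold"):
        if age <= timedelta(days=limits[tier]):
            return tier
    return "expired"  # past the cold window; eligible for scheduled deletion


print(tier_for(timedelta(days=45)))           # standard system
print(tier_for(timedelta(days=45), "high"))   # high-value asset
```

The same 45-day-old event lands in warm storage for a standard system and stays hot for a high-value asset, which is the behavior the retention matrix should document.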
Cost controls that don’t break investigations
Reduce ingest safely
Filter noise at source: drop known-benign health checks and verbose DEBUG output unless an active case needs it.
De-duplicate: collapse repetitive events with counters and first/last timestamps.
Field hygiene: exclude high-cardinality payloads (e.g., full request bodies) from hot paths.
Sampling with guardrails: full capture for security-critical events; sample low-risk telemetry.
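The filtering and de-duplication tactics above can be sketched as a small pre-ingest step. The event fields (`source`, `action`, `user`, `level`, `path`, `ts`, `security_critical`) and the health-check paths are hypothetical; a real collector would use its own schema. Sampling is left out to keep the sketch short.

```python
BENIGN_PATHS = {"/healthz", "/ping"}  # hypothetical health-check endpoints


def reduce_events(events):
    """Drop known noise, then collapse repeats into one record with a counter."""
    collapsed = {}
    for ev in events:
        critical = ev.get("security_critical", False)
        noisy = ev.get("level") == "DEBUG" or ev.get("path") in BENIGN_PATHS
        if noisy and not critical:
            continue  # guardrail: security-critical events are never filtered
        key = (ev["source"], ev["action"], ev.get("user"))
        rec = collapsed.get(key)
        if rec is None:
            collapsed[key] = {
                "source": ev["source"], "action": ev["action"],
                "user": ev.get("user"), "count": 1,
                "first_seen": ev["ts"], "last_seen": ev["ts"],
            }
        else:
            rec["count"] += 1
            rec["first_seen"] = min(rec["first_seen"], ev["ts"])
            rec["last_seen"] = max(rec["last_seen"], ev["ts"])
    return list(collapsed.values())


events = [
    {"source": "lb", "action": "GET", "path": "/healthz", "ts": 1},
    {"source": "idp", "action": "login_fail", "user": "alice", "ts": 2,
     "security_critical": True},
    {"source": "idp", "action": "login_fail", "user": "alice", "ts": 5,
     "security_critical": True},
    {"source": "app", "action": "trace", "level": "DEBUG", "ts": 6},
]
reduced = reduce_events(events)
print(reduced)
```

Four raw events become one record that still carries the count and the first and last timestamps, so an investigator can see that the failed logins repeated and over what window.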
Store smarter
Triage routing: route only high-value events to SIEM; send the rest to object storage.
Lifecycle policies: automatic down-tiering (hot → warm → cold) and deletion on schedule.
Compression & partitioning: time- and tenant-based partitions; enforce gzip/zstd.
Schema-on-read: keep raw + minimal enriched versions to avoid re-ingest.
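Triage routing and time- and tenant-based partitioning can be expressed as two small functions in the broker layer. This is a sketch under assumptions: the category names, the destination labels, and the `tenant=/dt=/hour=` path layout are illustrative choices, not a prescribed standard.

```python
from datetime import datetime, timezone

# Hypothetical set of categories considered high-value for detection.
HIGH_VALUE_CATEGORIES = {
    "identity", "privilege", "edr_detection", "cloud_control_plane",
}


def route(event: dict) -> str:
    """Send high-value events to the SIEM and everything else to object storage."""
    return "siem" if event["category"] in HIGH_VALUE_CATEGORIES else "object_store"


def partition_path(event: dict) -> str:
    """Build a tenant- and time-based partition prefix from a UTC epoch timestamp."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"tenant={event['tenant']}/dt={ts:%Y-%m-%d}/hour={ts:%H}"


event = {
    "category": "dns",
    "tenant": "bureau-a",
    "ts": datetime(2024, 5, 1, 13, 30, tzinfo=timezone.utc).timestamp(),
}
print(route(event), partition_path(event))
```

Partitioning by tenant and hour keeps later look-backs cheap, because a query scoped to one bureau and one afternoon only has to read the matching prefixes.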
Evidence pack for ATO & audit
Inventory: authoritative list of log sources, category, owner, system boundary, data sensitivity.
Coverage map: required vs. collected event types per system, with gaps and target dates.
Data path diagram: source → collector → broker → storage tiers → SIEM/SOAR, including integrity controls.
Retention matrix: system × tier × duration × encryption × access control.
Operational runbooks: onboarding, schema changes, incident hold/legal preservation.
Quarterly attestation: % systems at EL3, % events searchable within target time, spot-check queries.
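The coverage map in the evidence pack is a set difference per system: required event types minus collected ones. A minimal sketch follows, with made-up system names and event types standing in for the authoritative inventory.

```python
def coverage_map(required, collected):
    """Return (overall coverage fraction, missing event types per system)."""
    gaps = {}
    total_required = 0
    total_covered = 0
    for system, needed in required.items():
        have = collected.get(system, set())
        missing = needed - have
        total_required += len(needed)
        total_covered += len(needed) - len(missing)
        if missing:
            gaps[system] = sorted(missing)
    coverage = total_covered / total_required if total_required else 0.0
    return coverage, gaps


# Hypothetical inventory: required vs. collected event types per system.
required = {
    "hr-portal": {"authn", "admin_action", "privilege_change"},
    "dns-resolver": {"dns_query"},
}
collected = {
    "hr-portal": {"authn", "admin_action"},
    "dns-resolver": {"dns_query", "health"},
}
coverage, gaps = coverage_map(required, collected)
print(f"{coverage:.0%}", gaps)
```

Extra event types that are collected but not required (such as `health` here) do not inflate the score; only required types count toward coverage.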
KPIs that matter
KPI | Target Pattern | How it’s used |
% in-scope systems at EL3 | >90% within 2 quarters | Program health |
Time to search new events | <5 minutes to SIEM/searchable store | Investigation readiness |
Coverage of required event types | >95% for HVA, >85% portfolio-wide | Signal quality |
Cost per ingested GB (hot) | Quarter-over-quarter ↓ with stable MTTD | Financial control |
MTTD / MTTI (investigation) | Month-over-month ↓ | Outcome metric |
90-day implementation plan
Phase | Weeks | Deliverables |
Scope & governance | 1–2 | In-scope systems, required events per category, owners, initial coverage map, exception process. |
Data plumbing | 3–6 | Collectors/brokers deployed, routing rules, hot/warm/cold stores, integrity controls, schema catalog. |
Use-cases & tuning | 7–9 | Detection use-cases, saved queries, dashboards, noise filters, investigation runbooks. |
Attest & operate | 10–12 | Quarterly metrics, gap closure plan, cost report, evidence pack for AO/internal audit. |
Control economics: fund what reduces risk
Control | Primary Benefit | Cost Signal | Decision |
Centralized log routing/broker | One policy plane; reliable delivery; integrity | ↓ Duplicate ingest, cheaper down-tiering | Fund |
Phishing-resistant MFA logs & admin trails | Identity attack visibility | Modest ingest; high investigation value | Fund |
Verbose DEBUG in hot storage | Occasional deep dives | ↑ Cost; low daily value | Defer to warm/cold |
High-cardinality payload capture | For rare forensic needs | ↑↑ Cost; privacy risk | Scope tightly |
FAQ
What about “EL4”?
M-21-31 defines maturity tiers EL0 through EL3; there is no EL4 in the memorandum. Some programs define an internal “EL4/Optimized” tier to drive continuous improvement beyond EL3 (for example, stronger integrity controls or longer searchable retention). It’s optional, so treat it as an agency standard, not a new policy requirement.
How do I avoid runaway costs?
Filter at the source, route by value, down-tier aggressively with lifecycle policies, and publish a quarterly cost per GB vs. MTTD trend. If MTTD stays flat while cost drops, you’re cutting waste—not signal.
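The waste-versus-signal test can be turned into a simple quarterly check. The sketch below assumes hypothetical field names and a 5% tolerance for what counts as "flat" MTTD; the dollar and volume figures are invented purely to exercise the function.

```python
def classify_quarter(prev, curr, mttd_tolerance=0.05):
    """Compare hot-tier unit cost and MTTD between two quarters."""
    prev_unit = prev["hot_cost_usd"] / prev["hot_gb"]
    curr_unit = curr["hot_cost_usd"] / curr["hot_gb"]
    cost_down = curr_unit < prev_unit
    # "Flat" means MTTD rose by no more than the tolerance (an assumption).
    mttd_stable = curr["mttd_hours"] <= prev["mttd_hours"] * (1 + mttd_tolerance)
    if cost_down and mttd_stable:
        return "cutting waste"
    if cost_down:
        return "possibly losing signal"
    return "no unit-cost reduction"


# Invented figures for illustration only.
q1 = {"hot_cost_usd": 120_000, "hot_gb": 400_000, "mttd_hours": 6.0}
q2 = {"hot_cost_usd": 90_000, "hot_gb": 360_000, "mttd_hours": 6.1}
verdict = classify_quarter(q1, q2)
print(verdict)
```

A "possibly losing signal" result is a prompt to review which filters or routing rules changed that quarter, not proof that a specific rule is at fault.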
What’s the quickest win?
Close identity and admin-action logging first. It unlocks the largest set of high-value detections and accelerates investigations across apps, cloud, and SaaS.
How do I show progress to leadership?
Use a simple scorecard: % systems at EL3, coverage by required event type, search latency, and unit cost. Pair the scorecard with two case studies of investigations that improved as a result.



