Event Logging to EL3: Practical Telemetry, Retention & Cost Controls

  • Writer: Harshil Shah
  • Sep 1
  • 4 min read

 


A federal CISO playbook for turning policy into outcomes: reach Event Logging tier 3 (EL3) under OMB Memorandum M-21-31 with the minimum viable telemetry, a tiered retention model, defensible cost controls, and evidence an Authorizing Official will accept.

Audience: Federal CISOs, Deputies, SOC leaders, PMO • Time to implement: 60–90 days • Dependencies: SOC, privacy, legal, finance, procurement

“If your logging strategy can’t survive budget review, it won’t survive an incident review. Engineer for both.”

Harshil Shah

What you’ll achieve

  • Clear scope: which systems and log types must reach EL3 now vs. next quarter.

  • Telemetry blueprint: identity, endpoint, network, cloud, SaaS, and application logs that actually drive detection and investigations.

  • Retention you can fund: hot/warm/cold tiers mapped to mission criticality.

  • Spend control: routing, filtering, and storage tactics that cut cost without losing signal.

  • Evidence: artifacts to satisfy ATO reviewers and internal audit.

EL3 in one minute

Goal: Logging requirements met across all criticality levels, with centralized access for investigations and response.

Core capabilities: event coverage by category, integrity (tamper-resistant), timely availability, queryability, retention, and cross-system correlation.

Outcome Metric: % of in-scope systems at EL3 × % of required event types collected × % of events searchable within target time.
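The outcome metric is a product of three ratios, so a shortfall in any one of them drags the whole score down. A minimal sketch of the arithmetic follows; the function name, parameter names, and sample counts are illustrative assumptions, not figures from any agency.

```python
def el3_outcome_score(systems_at_el3, systems_in_scope,
                      event_types_collected, event_types_required,
                      events_searchable_in_target, events_total):
    """Multiply the three coverage ratios into a single 0-1 score."""
    denominators = (systems_in_scope, event_types_required, events_total)
    if any(d <= 0 for d in denominators):
        raise ValueError("denominators must be positive")
    system_ratio = systems_at_el3 / systems_in_scope
    event_type_ratio = event_types_collected / event_types_required
    searchable_ratio = events_searchable_in_target / events_total
    return system_ratio * event_type_ratio * searchable_ratio


# Hypothetical inputs: 45 of 50 systems, 38 of 40 event types,
# 970 of 1000 sampled events searchable within the target time.
score = el3_outcome_score(45, 50, 38, 40, 970, 1000)
print(f"{score:.1%}")
```

Because the ratios multiply, three individually healthy numbers (90%, 95%, 97%) still yield a combined score in the low 80s, which is the point of reporting them as one metric.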

Minimum viable telemetry (by domain)

Identity & Access

  • AuthN/AuthZ events: success/fail, MFA state, risk score, device binding.

  • Privilege changes: role grants, elevation, break-glass usage, dormant accounts becoming active.

  • Federation: token issuance, unusual claims, session anomalies.

Endpoints & Servers

  • EDR detections, process starts, driver loads, script-engine activity, lateral-movement indicators.

  • Patching status deltas; kernel and audit logs for high-value assets.

Network & Edge

  • DNS queries (resolver and egress), HTTP(S) metadata, TLS fingerprint changes.

  • Zero Trust gateways: policy decisions, deny reasons, inline malware verdicts.

Cloud, SaaS & Apps

  • Cloud control-plane: IAM changes, key/secret use, policy updates, resource create/delete.

  • SaaS admin: sharing/permission changes, external app grants, bulk downloads.

  • Application logs: auth flows, business-critical transactions, error rates, admin actions.

Tiered retention that balances cost & readiness

| Tier | Purpose | Typical Window | Storage | Notes |
|---|---|---|---|---|
| Hot | Active detection & investigations | 30–90 days | Primary SIEM/search | Fast query SLA; index high-value fields only |
| Warm | Case expansion, threat hunting | 6–12 months | Lower-cost searchable store | Columnar/object storage with late-binding schema |
| Cold | Compliance & rare look-backs | 12–24 months+ | Object/archive with on-demand restore | Use lifecycle policies; encrypt & verify integrity |

Tip: Define retention by mission criticality and investigation value, not by product default. Document exceptions up front.
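One way to make the tiers and the tip concrete is a small lookup that maps an event's age and its system's criticality to a tier. This is a sketch under stated assumptions: the criticality labels, the specific day counts (picked from inside the typical windows above), and the legal-hold flag are all hypothetical and should be replaced by the agency's documented retention matrix.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cumulative age limits in days: (hot, warm, cold).
RETENTION_DAYS = {
    "high": (90, 365, 730),
    "standard": (30, 180, 365),
}


def retention_tier(event_time, now, criticality, legal_hold=False):
    """Return 'hot', 'warm', 'cold', or 'delete' for an event of a given age."""
    hot, warm, cold = RETENTION_DAYS[criticality]
    age = now - event_time
    if age <= timedelta(days=hot):
        return "hot"
    if age <= timedelta(days=warm):
        return "warm"
    # Events under a legal or incident hold are never deleted on schedule.
    if age <= timedelta(days=cold) or legal_hold:
        return "cold"
    return "delete"


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(retention_tier(now - timedelta(days=60), now, "high"))      # hot
print(retention_tier(now - timedelta(days=60), now, "standard"))  # warm
```

Keeping the windows in one table-like structure makes exceptions visible: any system that needs a different window gets its own documented row rather than a silent product default.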

Cost controls that don’t break investigations

Reduce ingest safely

  • Filter noise at source: drop known-benign health checks, and drop verbose DEBUG output unless a case is open.

  • De-duplicate: collapse repetitive events with counters and first/last timestamps.

  • Field hygiene: exclude high-cardinality payloads (e.g., full request bodies) from hot paths.

  • Sampling with guardrails: full capture for security-critical events; sample low-risk telemetry.
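De-duplication and guarded sampling from the list above can be sketched in a few lines. The event shape (`source`, `event_type`, `message`, `ts`, `event_id`), the set of security-critical types, and the 10% default rate are assumptions for illustration; a real pipeline would apply the same idea inside its collector or broker.

```python
import hashlib

# Hypothetical set of event types that must always be captured in full.
SECURITY_CRITICAL = {"auth_failure", "privilege_change", "edr_detection"}


def collapse_duplicates(events, key_fields=("source", "event_type", "message")):
    """Collapse repeated events into one record with a counter and first/last timestamps."""
    collapsed = {}
    for ev in events:
        key = tuple(ev[f] for f in key_fields)
        rec = collapsed.get(key)
        if rec is None:
            rec = {f: ev[f] for f in key_fields}
            rec.update(count=1, first_seen=ev["ts"], last_seen=ev["ts"])
            collapsed[key] = rec
        else:
            rec["count"] += 1
            rec["first_seen"] = min(rec["first_seen"], ev["ts"])
            rec["last_seen"] = max(rec["last_seen"], ev["ts"])
    return list(collapsed.values())


def keep_event(ev, sample_rate=0.1):
    """Keep every security-critical event; sample the rest deterministically by event ID."""
    if ev["event_type"] in SECURITY_CRITICAL:
        return True
    digest = hashlib.sha256(ev["event_id"].encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(sample_rate * 2**64)


events = [
    {"source": "lb1", "event_type": "health_check", "message": "ok", "ts": 100},
    {"source": "lb1", "event_type": "health_check", "message": "ok", "ts": 160},
    {"source": "idp", "event_type": "auth_failure", "message": "bad password", "ts": 130},
    {"source": "lb1", "event_type": "health_check", "message": "ok", "ts": 220},
]
for record in collapse_duplicates(events):
    print(record)
```

Hashing the event ID instead of calling a random generator means the same event gets the same keep/drop decision on every collector, so sampled data stays consistent across redundant paths.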

Store smarter

  • Triage routing: route only high-value events to SIEM; send the rest to object storage.

  • Lifecycle policies: automatic down-tiering (hot → warm → cold) and deletion on schedule.

  • Compression & partitioning: time- and tenant-based partitions; enforce gzip/zstd.

  • Schema-on-read: keep raw + minimal enriched versions to avoid re-ingest.
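The routing, partitioning, and compression tactics above fit together roughly as follows. This is a sketch, not a product configuration: the high-value type list, the `tenant` and `ts` fields, and the `tenant=/date=/hour=` path layout are assumptions chosen for illustration.

```python
import gzip
import json
from datetime import datetime, timezone

# Hypothetical list of event types worth paying SIEM prices for.
HIGH_VALUE_TYPES = {"auth_failure", "privilege_change", "iam_policy_update"}


def destination(event):
    """Route high-value events to the SIEM; everything else goes to object storage."""
    return "siem" if event["event_type"] in HIGH_VALUE_TYPES else "object_store"


def partition_key(event):
    """Build a tenant- and time-based partition prefix from an epoch timestamp."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"tenant={event['tenant']}/date={ts:%Y-%m-%d}/hour={ts:%H}"


def encode_batch(events):
    """Serialize a batch as newline-delimited JSON and gzip it before writing."""
    lines = "\n".join(json.dumps(e, sort_keys=True) for e in events)
    return gzip.compress(lines.encode("utf-8"))


event = {"tenant": "agency-a", "event_type": "dns_query", "ts": 1700000000}
print(destination(event), partition_key(event))
```

Storing the raw batch untouched in object storage is what makes schema-on-read workable: fields can be parsed later at query time without re-ingesting from the source.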

Evidence pack for ATO & audit

  • Inventory: authoritative list of log sources, category, owner, system boundary, data sensitivity.

  • Coverage map: required vs. collected event types per system, with gaps and target dates.

  • Data path diagram: source → collector → broker → storage tiers → SIEM/SOAR, including integrity controls.

  • Retention matrix: system × tier × duration × encryption × access control.

  • Operational runbooks: onboarding, schema changes, incident hold/legal preservation.

  • Quarterly attestation: % systems at EL3, % events searchable within target time, spot-check queries.
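The coverage map in the list above is essentially a set difference per system. A minimal sketch, assuming the required and collected event types are available as sets keyed by system name (the system names and event types below are made up):

```python
def coverage_map(required, collected):
    """Compare required vs. collected event types per system and list the gaps."""
    report = {}
    for system, req in required.items():
        got = collected.get(system, set()) & req
        report[system] = {
            "coverage": len(got) / len(req) if req else 1.0,
            "gaps": sorted(req - got),
        }
    return report


def portfolio_coverage(required, collected):
    """Share of all required (system, event type) pairs that are collected."""
    total = sum(len(req) for req in required.values())
    met = sum(len(collected.get(s, set()) & req) for s, req in required.items())
    return met / total if total else 1.0


required = {
    "idp": {"authn", "authz", "privilege_change"},
    "vpn": {"session_start", "session_end"},
}
collected = {"idp": {"authn", "privilege_change", "debug"}}

for system, row in coverage_map(required, collected).items():
    print(system, f"{row['coverage']:.0%}", row["gaps"])
```

The `gaps` list is the part reviewers tend to ask about; pairing each gap with an owner and a target date turns this output into the coverage artifact described above.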

KPIs that matter

| KPI | Target Pattern | How it’s used |
|---|---|---|
| % in-scope systems at EL3 | >90% within 2 quarters | Program health |
| Time to search new events | <5 minutes to SIEM/searchable store | Investigation readiness |
| Coverage of required event types | >95% for HVA, >85% portfolio-wide | Signal quality |
| Cost per ingested GB (hot) | Quarter-over-quarter ↓ with stable MTTD | Financial control |
| MTTD / MTTI (investigation) | Month-over-month ↓ | Outcome metric |

90-day implementation plan

| Phase | Weeks | Deliverables |
|---|---|---|
| Scope & governance | 1–2 | In-scope systems, required events per category, owners, initial coverage map, exception process. |
| Data plumbing | 3–6 | Collectors/brokers deployed, routing rules, hot/warm/cold stores, integrity controls, schema catalog. |
| Use-cases & tuning | 7–9 | Detection use-cases, saved queries, dashboards, noise filters, investigation runbooks. |
| Attest & operate | 10–12 | Quarterly metrics, gap closure plan, cost report, evidence pack for AO/internal audit. |

Control economics: fund what reduces risk

| Control | Primary Benefit | Cost Signal | Decision |
|---|---|---|---|
| Centralized log routing/broker | One policy plane; reliable delivery; integrity | ↓ Duplicate ingest, cheaper down-tiering | Fund |
| Phishing-resistant MFA logs & admin trails | Identity attack visibility | Modest ingest; high investigation value | Fund |
| Verbose DEBUG in hot storage | Occasional deep dives | ↑ Cost; low daily value | Defer to warm/cold |
| High-cardinality payload capture | For rare forensic needs | ↑↑ Cost; privacy risk | Scope tightly |

FAQ

What about “EL4”?

M-21-31 itself defines maturity tiers EL0 through EL3 only. Some programs add an internal “EL4/Optimized” tier to drive continuous improvement beyond EL3 (for example, stronger integrity controls or longer searchable retention). It is optional: treat it as an agency standard, not a new policy requirement.

How do I avoid runaway costs?

Filter at the source, route by value, down-tier aggressively with lifecycle policies, and publish a quarterly cost per GB vs. MTTD trend. If MTTD stays flat while cost drops, you’re cutting waste—not signal.
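The quarterly check described here can be expressed as a simple comparison. This is a sketch only: the field names, the sample figures, and the 5% tolerance for calling MTTD "flat" are assumptions, and the verdict strings are labels for discussion rather than a formal test.

```python
def cost_per_gb(quarter):
    """Unit cost of hot-tier ingest for one quarter."""
    return quarter["hot_spend_usd"] / quarter["hot_ingest_gb"]


def trend_verdict(previous, current, mttd_tolerance=0.05):
    """Compare two quarters: did unit cost drop, and did MTTD stay roughly flat?"""
    cost_change = cost_per_gb(current) / cost_per_gb(previous) - 1
    mttd_change = current["mttd_hours"] / previous["mttd_hours"] - 1
    if cost_change >= 0:
        return "no unit-cost reduction"
    if mttd_change <= mttd_tolerance:
        return "cutting waste"
    return "possible signal loss"


# Hypothetical quarterly figures.
q1 = {"hot_spend_usd": 100_000, "hot_ingest_gb": 50_000, "mttd_hours": 4.0}
q2 = {"hot_spend_usd": 80_000, "hot_ingest_gb": 50_000, "mttd_hours": 4.1}
print(trend_verdict(q1, q2))
```

A "possible signal loss" result is a prompt to review which filters or sampling rules changed that quarter, not proof that a specific cut caused the slowdown.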

What’s the quickest win?

Close identity and admin-action logging first. It unlocks the largest set of high-value detections and accelerates investigations across apps, cloud, and SaaS.

How do I show progress to leadership?

Use a simple scorecard: % systems at EL3, coverage by required event type, search latency, and unit cost. Pair the scorecard with two improved investigation case studies.

 

 

 
 
 
