Job Description: SOC Incident Management Analyst (L3/L4) | AWS, Cloudflare & CrowdStrike (MSSP-led L1/L2)
About the role
We are hiring a SOC Incident Management Analyst (L3/L4) to own advanced incident response, investigation, and long-term risk reduction across our AWS, Cloudflare, and CrowdStrike security stack. L1/L2 monitoring and initial triage are handled by a CrowdStrike MSSP; this role is the escalation point for complex, high-severity, or ambiguous alerts, and is accountable for driving incidents to closure with strong technical depth, clear communications, and measurable improvements (tuning, automation, and hardening).
This is a regular, non-shift role (business hours) with exception-based support for critical incidents.
Key responsibilities
1) L3/L4 Incident Response & Technical Investigations
- Act as the internal escalation point for MSSP-raised incidents and complex detections across endpoint, cloud, and edge.
- Perform deep-dive investigations: correlate signals across CrowdStrike detections, AWS control-plane activity (e.g., CloudTrail), and Cloudflare edge/WAF telemetry to determine scope, impact, and root cause.
- Lead containment and eradication planning for high-risk events (credential compromise, suspicious IAM activity, malware outbreaks, data exposure risks, abuse at the edge).
- Define and execute forensically sound evidence capture (log preservation, snapshots, and chain-of-custody-aligned handling where applicable).
2) Incident Management, Coordination & Communication
- Own incident severity assessment, timelines, and executive-ready updates; ensure stakeholders understand risk, status, and next steps.
- Coordinate across Security Engineering, Cloud/Platform, IT, and Application teams to drive remediation and validate recovery.
- Maintain high-quality incident documentation: technical narrative, IOCs, affected assets, decisions taken, and residual risk.
3) MSSP Governance & Escalation Quality
- Define when an alert warrants escalation to L3/L4 and set clear acceptance criteria for escalations (minimum artifacts, logs, context, reproduction steps).
- Review MSSP outputs for quality and completeness; provide feedback loops to improve triage accuracy and reduce false positives.
- Maintain escalation runbooks and ensure smooth handoffs (RACI, SLAs, contact paths, and standard update cadence).
4) Detection Engineering, Tuning & Threat-Informed Improvements
- Improve detection fidelity and response outcomes:
  - Tune Cloudflare WAF rules, firewall rules, rate limits, and bot controls based on observed attack patterns and business traffic baselines.
  - Improve CrowdStrike detection handling (workflows, response policies, exclusions with guardrails).
  - Strengthen AWS detections around IAM abuse, abnormal API patterns, risky configurations, and sensitive resource access.
- Map incidents and detections to MITRE ATT&CK to drive coverage and prioritization.
- Partner with engineering to implement preventive controls (IAM hardening, least privilege, segmentation, secure-by-default patterns).
5) Automation & Operational Efficiency (SRE mindset)
- Build or guide automation to reduce time-to-contain and time-to-recover:
  - Enrichment pipelines (asset inventory, identity context, ownership, blast radius).
  - Standardized response actions with approvals/guardrails.
  - Dashboards/metrics for incident trends, recurring causes, and MSSP performance.
6) Post-Incident Reviews & Risk Reduction
- Run or contribute to post-incident reviews and ensure corrective actions are tracked to closure.
- Identify systemic issues and create a prioritized backlog (controls, hardening, logging gaps, detection gaps, playbooks).
What success looks like (first 60-90 days)
- Establishes crisp L3/L4 escalation criteria and improves MSSP handoff quality.
- Reduces MTTR for escalated incidents through consistent investigation workflows and better cross-team coordination.
- Delivers measurable improvements in detection fidelity and alert-to-incident conversion (fewer false positives, faster scoping).
- Produces or upgrades runbooks/playbooks for the top incident categories and delivers at least a few high-impact hardening or tuning wins.
Required qualifications
- 3-5 years in Security Operations / Incident Response / Detection Engineering / Cloud Security with demonstrated L3/L4 ownership.
- Strong investigation experience across:
  - CrowdStrike (detections, containment policies, host telemetry interpretation, response workflows).
  - AWS security (IAM, CloudTrail audit logging, STS, security groups/NACLs, common cloud attack paths and mitigations).
  - Cloudflare security (WAF rules, firewall events, DDoS/bot mitigation, traffic analysis and tuning).
- Strong Linux fundamentals; ability to reason about process/network behavior and log evidence.
- Scripting and automation skills in Python and/or Bash; ability to use APIs and build repeatable workflows.
- Excellent incident communications and stakeholder management (technical + non-technical audiences).
Nice to have
- Experience with AWS security services (e.g., GuardDuty, Security Hub, Detective) and centralized logging architectures.
- Kubernetes/EKS exposure and container/workload investigation.
- Practical forensics experience (disk/memory concepts, evidence preservation, legal/compliance awareness).
- Infrastructure-as-Code (Terraform/CloudFormation) and CI/CD security troubleshooting.
- Experience in FinTech / regulated environments and security control frameworks.
Soft skills
- High ownership and calm execution under ambiguity.
- Strong written documentation habits (clear timelines, hypotheses, decisions, evidence).
- Ability to balance short-term containment with long-term fixes (controls, tuning, automation).
Work model
- Regular business-hours role; not shift-based.
- May support critical incidents outside business hours on an exception basis as part of an agreed escalation policy.