millionlogics

Security Expert - AI Benchmark & Vulnerability Annotation

7-9 Years
  • Posted 5 hours ago

Job Description

Company Description

MillionLogics, a trusted Oracle Partner, is a global IT solutions leader with offices in London, UK, and a development hub in Hyderabad, India. We specialise in driving digital transformation through scalable and innovative solutions, focusing on Data & AI, Cloud, IT consulting, security, and Oracle technologies. Backed by a team of over 55 AI experts, we prioritise delivering tailored, results-driven solutions that empower enterprises to excel. Committed to innovation and excellence, MillionLogics is at the forefront of helping businesses adapt and thrive in a rapidly evolving digital landscape. For more insights into our services and leadership, visit our website: MillionLogics.

Role Description

We are looking for a Senior Security Expert (LLM Benchmark & AI Safety) to help design, build, and validate a high-difficulty cybersecurity benchmark targeting frontier AI model evaluation.

This is a dual-mandate role: you will both architect challenging, real-world security scenarios for the benchmark and serve as a human annotator, verifying that included vulnerabilities are technically sound, genuinely exploitable, and represent a high-value signal for leading AI labs such as Anthropic, Google DeepMind, and OpenAI.

This role sits at the intersection of offensive security, security research, and AI safety evaluation. You will work closely with ML and data teams to ensure the benchmark reflects the complexity, nuance, and adversarial depth required for evaluating frontier models.

Offer Details:

Mode of work: Fully Remote

Pay: INR 2 lakhs to 2.25 lakhs per month (net/take-home)

Duration of Contract: 12 months

Number of positions: 5

Experience: 7+ years

What the day-to-day looks like

Benchmark Design & Example Creation

  • Design and develop complex, multi-step security challenges across:
      • Application security
      • Cloud misconfigurations
      • Binary exploitation
      • Cryptographic weaknesses
      • API abuse
      • Supply chain attacks
  • Map scenarios to frameworks such as MITRE ATT&CK, OWASP Top 10, and OWASP ASVS
  • Create challenges that distinguish surface-level pattern matching from deep security reasoning in LLMs
  • Develop grading rubrics and ground-truth solutions, including partial-credit logic
  • Ensure coverage across multiple difficulty tiers to benchmark model capability progression

Human Annotation & Vulnerability Verification

  • Review and annotate benchmark samples across:
      • Validity – Is the vulnerability technically correct?
      • Reachability – Can it realistically be triggered?
      • Exploitability – What effort and primitives are required?
      • Clarity – Is the challenge unambiguous yet non-trivial?
  • Flag unrealistic assumptions, inaccuracies, or oversimplifications
  • Evaluate whether samples provide a high-value signal for AI safety and capability evaluations

Quality & Safety Review for Lab Submission

  • Apply dual-use risk filtering to prevent real-world misuse while maintaining technical depth
  • Produce structured metadata:
      • Difficulty rating
      • Domain category
      • Required attacker knowledge
      • Recommended use (capability eval, red-teaming, safety eval, fine-tuning)
  • Collaborate with AI lab evaluation teams to refine benchmark quality and coverage

Security Architecture Input

  • Advise on secure infrastructure for:
      • Benchmark hosting
      • Sample storage
      • Model evaluation pipelines
  • Review tooling and agentic evaluation frameworks from a security perspective

Required Skills and Experience

  • 7+ years in offensive or applied security roles (penetration testing, red teaming, vulnerability research, application security)
  • Proven ability to identify, reproduce, and document real vulnerabilities across:
      • Web applications
      • Cloud environments
      • APIs
      • Systems-level software
  • Strong knowledge of MITRE ATT&CK, CVE/CVSS methodologies, and exploit development fundamentals
  • Deep understanding of what makes a security challenge genuinely difficult rather than superficially complex
  • Experience writing structured technical documentation (vulnerability reports, threat models, risk assessments)
  • Ability to work in high-volume annotation/review pipelines with consistent judgment

Additional Details

  • Commitment required: 40 hours per week, including a 4-hour overlap with PST
  • Engagement Type: Contractor assignment (no medical/paid leave)
  • Duration of contract: 12 months (expected start date is next week)

Evaluation Process

  • Technical Interview (60 mins)

How to Apply

Please send your updated CV to [Confidential Information] with the email subject: Security Expert - AI

Job ID: 147498295