Company Description
MillionLogics, a trusted Oracle Partner, is a global IT solutions leader with offices in London, UK, and a development hub in Hyderabad, India. We specialise in driving digital transformation through scalable and innovative solutions, focusing on Data & AI, Cloud, IT consulting, security, and Oracle technologies. Backed by a team of over 55 AI experts, we prioritise delivering tailored, results-driven solutions that empower enterprises to excel. Committed to innovation and excellence, MillionLogics is at the forefront of helping businesses adapt and thrive in a rapidly evolving digital landscape. For more insights into our services and leadership, visit our website: MillionLogics.
Role Description
We are looking for a Senior Security Expert (LLM Benchmark & AI Safety) to help design, build, and validate a high-difficulty cybersecurity benchmark targeting frontier AI model evaluation.
This is a dual-mandate role: you will both architect challenging, real-world security scenarios for the benchmark and serve as a human annotator, verifying that included vulnerabilities are technically sound, genuinely exploitable, and represent a high-value signal for leading AI labs such as Anthropic, Google DeepMind, and OpenAI.
This role sits at the intersection of offensive security, security research, and AI safety evaluation. You will work closely with ML and data teams to ensure the benchmark reflects the complexity, nuance, and adversarial depth required for evaluating frontier models.
Offer Details:
Mode of work: Fully Remote
Pay: INR 2 to 2.25 lakhs per month (net/take-home)
Duration of Contract: 12 months
Number of positions: 5
Experience: 7+ years
What does the day-to-day look like?
Benchmark Design & Example Creation
- Design and develop complex, multi-step security challenges across:
  - Application security
  - Cloud misconfigurations
  - Binary exploitation
  - Cryptographic weaknesses
  - API abuse
  - Supply chain attacks
- Map scenarios to frameworks such as MITRE ATT&CK, OWASP Top 10, and OWASP ASVS
- Create challenges that distinguish surface-level pattern matching from deep security reasoning in LLMs
- Develop grading rubrics and ground-truth solutions, including partial-credit logic
- Ensure coverage across multiple difficulty tiers to benchmark model capability progression
Human Annotation & Vulnerability Verification
- Review and annotate benchmark samples across:
  - Validity – Is the vulnerability technically correct?
  - Reachability – Can it realistically be triggered?
  - Exploitability – What effort and primitives are required?
  - Clarity – Is the challenge unambiguous yet non-trivial?
- Flag unrealistic assumptions, inaccuracies, or oversimplifications
- Evaluate whether samples provide a high-value signal for AI safety and capability evaluations
Quality & Safety Review for Lab Submission
- Apply dual-use risk filtering to prevent real-world misuse while maintaining technical depth
- Produce structured metadata:
  - Difficulty rating
  - Domain category
  - Required attacker knowledge
  - Recommended use (capability eval, red-teaming, safety eval, fine-tuning)
- Collaborate with AI lab evaluation teams to refine benchmark quality and coverage
Security Architecture Input
- Advise on secure infrastructure for:
  - Benchmark hosting
  - Sample storage
  - Model evaluation pipelines
- Review tooling and agentic evaluation frameworks from a security perspective
Required Skills and Experience
- 7+ years in offensive or applied security roles (penetration testing, red teaming, vulnerability research, application security)
- Proven ability to identify, reproduce, and document real vulnerabilities across:
  - Web applications
  - Cloud environments
  - APIs
  - Systems-level software
- Strong knowledge of:
  - MITRE ATT&CK
  - CVE/CVSS methodologies
  - Exploit development fundamentals
- Deep understanding of what makes a security challenge genuinely difficult versus superficially complex
- Experience writing structured technical documentation (vulnerability reports, threat models, risk assessments)
- Ability to work in high-volume annotation/review pipelines with consistent judgment
Additional Details
- Commitment required: 40 hours per week, with at least 4 hours of daily overlap with PST
- Engagement Type: Contractor assignment (no medical/paid leave)
- Expected start date: next week
Evaluation Process
- Technical Interview (60 mins)
How to Apply
Please send your updated CV to [Confidential Information] with the email subject: Security Expert - AI