Company Description
MillionLogics, a trusted Oracle Partner, is a global IT solutions leader with offices in London, UK, and a development hub in Hyderabad, India. We specialise in driving digital transformation through scalable and innovative solutions, focusing on Data & AI, Cloud, IT consulting, security, and Oracle technologies. Backed by a team of over 55 AI experts, we prioritise delivering tailored, results-driven solutions that empower enterprises to excel. Committed to innovation and excellence, MillionLogics is at the forefront of helping businesses adapt and thrive in a rapidly evolving digital landscape. For more insights into our services and leadership, visit our website: MillionLogics.
Role Description
We are looking for a Senior Security Expert (LLM Benchmark & AI Safety) to help design, build, and validate a high-difficulty cybersecurity benchmark targeting frontier AI model evaluation.
This is a dual-mandate role: you will both architect challenging, real-world security scenarios for the benchmark and serve as a human annotator, verifying that included vulnerabilities are technically sound, genuinely exploitable, and represent a high-value signal for leading AI labs such as Anthropic, Google DeepMind, and OpenAI.
This role sits at the intersection of offensive security, security research, and AI safety evaluation. You will work closely with ML and data teams to ensure the benchmark reflects the complexity, nuance, and adversarial depth required for evaluating frontier models.
Offer Details:
Mode of work: Fully Remote
Pay: INR 2 to 2.25 lakhs per month (net/take-home)
Duration of Contract: 12 months
Number of positions: 5
Experience: 7+ years
What does the day-to-day look like?
Benchmark Design & Example Creation
- Design and develop complex, multi-step security challenges across:
  - Application security
  - Cloud misconfigurations
  - Binary exploitation
  - Cryptographic weaknesses
  - API abuse
  - Supply chain attacks
- Map scenarios to frameworks such as MITRE ATT&CK, OWASP Top 10, and OWASP ASVS
- Create challenges that distinguish surface-level pattern matching from deep security reasoning in LLMs
- Develop grading rubrics and ground-truth solutions, including partial-credit logic
- Ensure coverage across multiple difficulty tiers to benchmark model capability progression
Human Annotation & Vulnerability Verification
- Review and annotate benchmark samples across:
  - Validity – Is the vulnerability technically correct?
  - Reachability – Can it realistically be triggered?
  - Exploitability – What effort and primitives are required?
  - Clarity – Is the challenge unambiguous yet non-trivial?
- Flag unrealistic assumptions, inaccuracies, or oversimplifications
- Evaluate whether samples provide a high-value signal for AI safety and capability evaluations
Quality & Safety Review for Lab Submission
- Apply dual-use risk filtering to prevent real-world misuse while maintaining technical depth
- Produce structured metadata:
  - Difficulty rating
  - Domain category
  - Required attacker knowledge
  - Recommended use (capability eval, red-teaming, safety eval, fine-tuning)
- Collaborate with AI lab evaluation teams to refine benchmark quality and coverage
Security Architecture Input
- Advise on secure infrastructure for:
  - Benchmark hosting
  - Sample storage
  - Model evaluation pipelines
- Review tooling and agentic evaluation frameworks from a security perspective
Required Skills and Experience
- 7+ years in offensive or applied security roles (penetration testing, red teaming, vulnerability research, application security)
- Proven ability to identify, reproduce, and document real vulnerabilities across:
  - Web applications
  - Cloud environments
  - APIs
  - Systems-level software
- Strong knowledge of:
  - MITRE ATT&CK
  - CVE/CVSS methodologies
  - Exploit development fundamentals
- Deep understanding of what makes a security challenge genuinely difficult versus superficially complex
- Experience writing structured technical documentation (vulnerability reports, threat models, risk assessments)
- Ability to work in high-volume annotation/review pipelines with consistent judgment
Additional Details
- Commitment required: 40 hours per week, with at least 4 hours of daily overlap with PST
- Engagement Type: Contractor assignment (no medical/paid leave)
- Expected start date: next week
Evaluation Process
- Technical Interview (60 mins)
How to Apply
Please send your updated CV to [Confidential Information] with the email subject: Security Expert - AI