The Opportunity
Evaluate cutting-edge AI-generated code through real-world engineering challenges. Work with major open-source projects — data orchestration engines, ML/AI model libraries, LLM tooling frameworks, and workflow automation platforms. Your structured, evidence-based evaluations help train and benchmark the next generation of AI coding assistants.
About Biz-Tech Analytics
Biz-Tech Analytics (BTA) is a data services and AI solutions company specializing in high-quality labeled datasets, training data, and AI model evaluation. We partner with leading AI research teams and enterprises to deliver the data infrastructure that powers responsible AI development.
What You'll Do
- Analyze Real Engineering Challenges: Work with complex engineering tasks sourced from major open-source projects (e.g., implementing new schedulers, adding model architectures, building tool integrations).
- Write Engineering Prompts: Craft clear, self-contained prompts that describe engineering tasks for AI coding models to attempt.
- Set Up Development Environments: Prepare local development setups using Git, VS Code, tmux, and Python. Clone repos, create branches, and configure dependencies.
- Run Evaluation Workflows: Use provided tooling to generate AI model outputs on engineering tasks and capture the resulting code changes.
- Review Code Changes: Thoroughly examine every file change in diffs. Validate correctness, code quality, adherence to best practices, and alignment with original requirements.
- Complete Structured Evaluations: Write 500–800-word evaluations assessing AI-generated code across multiple dimensions, including correctness, code quality, and engineering judgment. Provide evidence-backed ratings using structured rubrics.
You Must Have
- 3+ years of professional software development experience
- Strong proficiency in Python AND at least one of: JavaScript/TypeScript, Go, Rust, Java, or C++
- Experience reading and reasoning about large, unfamiliar codebases — including tracing cross-file dependencies across 10–50+ file changes
- Solid understanding of software testing: unit/integration tests, test frameworks (pytest, Jest, Mocha), and the ability to assess test completeness and coverage gaps
- Git fluency: branching, PRs, diffs, cherry-picks, and merge conflict resolution
- Ability to set up and troubleshoot development environments (Docker, tmux, VS Code, terminal/CLI workflows)
- Strong written English: ability to compose detailed, structured evaluation reports (500+ words, clear logic, professional tone)
- Code review experience: ability to read code critically, trace architectural impact across files, assess backward compatibility, and identify correctness, performance, and maintainability issues
Bonus Points
- Contributions to open-source projects (data orchestration, ML/AI libraries, LLM tooling, workflow automation, or similar)
- Experience with ML/AI frameworks (PyTorch, TensorFlow, JAX) — strongly preferred as 40% of tasks involve ML-related codebases
- Familiarity with AI code generation tools (GitHub Copilot, Claude, ChatGPT)
- Background in QA/testing roles, formal code review, or experience across diverse domains (backend APIs, data pipelines, ML systems, developer tooling)
Please Don't Apply If
- You've only done data engineering/ETL without application development
- You struggle to read and reason about code you didn't write
- You're uncomfortable with terminal/CLI workflows
- Your written English isn't strong enough for structured 500+ word evaluations
- You're looking for a pure coding/development role (this is evaluation-focused)