Job Description
- Design and build data-centric GenAI methods for synthetic data generation, multimodal data curation, data augmentation, filtering, deduplication, and quality assessment.
- Develop and evaluate synthetic data pipelines for text, speech, vision, and multimodal GenAI use cases, including controllable generation, provenance tracking, safety checks, and domain adaptation.
- Build evaluation frameworks that connect data quality to downstream GenAI model performance, including benchmark design, ablation studies, error analysis, and model-feedback loops.
- Research and implement modern generative AI techniques, including LLM/VLM-based data generation, fine-tuning, instruction tuning, preference optimization, and model-based data labeling.
- Build scalable data and ML pipelines for acquisition, cleaning, transformation, metadata extraction, embedding generation, labeling, training, and evaluation.
- Develop production-quality code for batch and real-time ML workflows, including model inference, feature processing, data validation, monitoring, and operational automation.
- Translate research papers and emerging GenAI techniques into practical systems that improve data quality, model quality, and customer-facing AI outcomes.
- Partner with modeling, product, infrastructure, and domain teams to define GenAI data requirements, quality bars, evaluation criteria, and delivery plans.
- Operate across the full lifecycle: research, prototyping, experimentation, productionization, testing, CI/CD, monitoring, runbooks, and production support.
Responsibilities
- Ph.D. degree, Master's degree, or equivalent experience in computer science, artificial intelligence, machine learning, operations research, statistics, or a related technical field.
- 5+ years of experience (with a Master's degree) or 3+ years (with a Ph.D.) applying machine learning to real-world problems.
- Strong Python programming skills and experience building production-quality ML, GenAI, or data systems.
- Hands-on experience with PyTorch and modern deep learning stacks; experience with Hugging Face, LLMs, VLMs, diffusion models, or multimodal models is strongly preferred.
- Experience with data-centric AI or GenAI methods such as synthetic data generation, data quality measurement, dataset curation, weak supervision, model-based labeling, active learning, deduplication, or data augmentation.
- Experience designing experiments and interpreting results through statistical analysis, ablation studies, benchmark evaluation, and error analysis.
- Strong understanding of model training, inference, evaluation, and production monitoring.
- Ability to read research papers, identify practical value, and implement useful techniques in real systems.
- Strong written and verbal communication skills, including technical proposals, design documents, experiment reports, and stakeholder presentations.
- Experience building scalable data or ML pipelines using distributed compute, cloud storage, batch processing, or workflow orchestration.
Qualifications
Career Level - IC3
About Us
Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.
True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We're committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [Confidential Information] or by calling 1-888-404-2494 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.