Key Skills: AI Concepts: Machine Learning, NLP, and Deep Learning; AI/ML Technologies: LLM APIs, Prompt Engineering, GenAI, Advanced ML and Deep Learning Fundamentals (ANN, CNN, RNN)
Roles and Responsibilities:
- Design, build, and maintain end-to-end machine learning pipelines for batch and large-scale data processing.
- Deploy, manage, and scale machine learning models in production on AWS using services such as SageMaker, Bedrock, or custom ML infrastructure.
- Implement and manage MLflow for experiment tracking, model versioning, and model registry management.
- Architect batch and real-time inference systems optimized for performance, scalability, and cost efficiency.
- Work with structured, unstructured, and geospatial data, including satellite and aerial imagery where applicable.
- Collaborate with data scientists to transition models from experimentation to robust production systems.
- Partner with platform engineering teams to design and optimize compute infrastructure, GPU clusters, and storage solutions.
- Build and maintain model monitoring systems to detect performance degradation, bias, and data drift.
- Design and execute canary deployments and A/B testing strategies for safe and reliable model rollouts.
- Develop active learning pipelines to continuously improve model accuracy while minimizing labeling effort.
- Establish standardized model evaluation frameworks and benchmarking processes.
- Implement observability, logging, and alerting mechanisms for production ML workloads.
- Mentor junior ML engineers and data scientists on best practices for scalable and production-ready ML systems.
- Drive technical decisions related to ML architecture, tooling, and long-term platform strategy.
- Contribute to engineering standards, documentation, and architectural roadmaps.
Skills Required:
- Strong understanding of core AI concepts, including Machine Learning, Natural Language Processing (NLP), and Deep Learning.
- Hands-on experience with GenAI technologies, including LLM APIs and prompt engineering.
- Solid experience designing, training, and deploying machine learning models in production environments.
- Proficiency in deep learning frameworks and techniques, including CNNs, RNNs, and advanced neural network architectures.
- Experience with MLOps practices, including model deployment, monitoring, versioning, and lifecycle management.
- Strong exposure to AWS cloud services for ML workloads.
- Experience with ML experiment tracking and model management tools such as MLflow.
- Ability to design scalable, cost-efficient inference pipelines.
- Familiarity with data drift detection, model performance monitoring, and observability.
- Strong problem-solving skills and the ability to work on complex, large-scale ML systems.
- Excellent collaboration and communication skills for working with cross-functional engineering and product teams.
Education: Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related technical field.