Staff AI/ML Full Stack Engineer & Lead
Role Summary
We are seeking a Staff AI/ML Solution Lead to drive the architecture, design, and delivery of high-performance, enterprise-grade applications. This hybrid role combines deep hands-on software engineering and DevOps with high-level architectural decision-making. You will work across frontend, backend, cloud infrastructure, database selection, and integration layers, ensuring our systems are secure, scalable, and maintainable while enabling long-term technical growth and the delivery of robust, innovative AI systems.
Key Responsibilities:
- Architecture Leadership: Define system architecture, integration patterns, and technology standards for large-scale web and enterprise applications.
- Full Stack Development: Build and maintain robust, responsive applications using modern frontend frameworks (React, Vue, Streamlit, or Angular) and backend services in Python, Golang, or Rust.
- Cloud & Infrastructure: Architect cloud-native solutions leveraging AWS with a focus on scalability, security, and performance. Implement containerized services with Docker and orchestrate deployments using Kubernetes (K8s).
- API & Service Design: Develop RESTful and GraphQL APIs for internal and external integrations.
- DevOps & CI/CD: Establish best practices for deployment pipelines, automated testing, and infrastructure-as-code (Terraform, Pulumi).
- Performance Optimization: Drive system performance tuning, load balancing, and efficient code design.
- Technical Mentorship: Coach and mentor engineers, conduct design/code reviews, and uphold engineering best practices.
- Cross-Functional Collaboration: Partner with product, design, and business teams to deliver impactful solutions aligned with company objectives.
- Databases: Select and deploy databases (strong DevOps experience required).
- ML: Experience with both ML and LLM stack design (model hubs, vector DBs, embedding pipelines). The role requires the knowledge to deploy end-to-end architectures for ML applications, both traditional and RAG, and to design MLOps architectures on Databricks, AWS, and Google Cloud.
- MLOps: Strong understanding of agentic AI, its frameworks, and best practices.
- Cloud platforms: Databricks and AWS are mandatory.
- End-to-end, production-level AI/ML product deployment experience is required.
Qualifications
Must Have (Required Qualifications):
- At least a bachelor's degree in Computer Science is mandatory.
- 10+ years of experience deploying enterprise-grade cloud systems and 5+ years in software development.
- 5+ years of experience with Databricks and AWS MLOps deployment.
- This role is primarily that of a software lead and developer, requiring strong cloud experience to develop infrastructure software.
- Architect end-to-end agentic pipelines and tools that others on the team can contribute to.
- Knowledge required to deploy end-to-end architectures for ML applications, both traditional and RAG.
- Architect end-to-end AI/ML systems from data ingestion to model deployment.
- Define best practices for model serving, data pipelines, and MLOps strategies.
- engineering, including hands-on model development and architectural design.
- Expertise in traditional ML, deep learning, LLMs, embeddings, and RAG frameworks.
- Strong software engineering skills: Python, API development, microservices, database design, and version control (Git).
- Experience with cloud platforms (AWS, Databricks, Google) and containerized deployments (Docker, Kubernetes).
- Knowledge of MLOps, CI/CD for AI, and production model monitoring.
- Strong understanding of software architecture patterns, distributed systems, and scalable data pipelines.
- Databases: Experience with database selection and deployment (strong DevOps experience required).
Preferred:
- Experience with event-driven architectures and messaging systems (NATS, Kafka, RabbitMQ).
- Familiarity with authentication and authorization frameworks (OAuth2, JWT, SSO).
- Knowledge of observability and monitoring tools (Prometheus, Grafana, OpenTelemetry).
- Background in designing large-scale enterprise or SaaS platforms.
- Python, Golang, and Rust development experience is preferred.
- Experience in manufacturing and predictive maintenance is a plus.
- Background in controls engineering is a plus.
Soft Skills:
- Strong decision-making and problem-solving skills in high-stakes technical environments.
- Ability to lead and influence architectural direction across teams.
- Excellent communication with both technical and non-technical stakeholders.