Founding AIOps Engineer
AIOps + DevOps + CloudOps — Where Intelligence Meets Infrastructure
Equity Only | Pre-Seed Stage Startup | India Only
About Us
We are a US-based startup with a registered office in India. We are building an industry-leading FinTech mobile app that brings hedge-fund-grade trading intelligence to everyday investors. Think Robinhood, but powered by AI-driven insights, ultra-low-latency systems, and a radically transparent user experience.
This is a part-time role that requires a minimum of 20 hours per week in exchange for a significant equity stake. You will continue on this part-time basis while we secure funding, and we plan to onboard you as a full-time employee within the next 6–8 months. You will partner with our data science, AI engineering, quantitative research, frontend, and backend teams.
As part of our founding technical team, you will inherit, optimize, and implement the operational (DevOps/CloudOps) foundation that powers our entire platform, from AI model infrastructure and trading services to real-time observability and self-healing systems.
Role Overview
An AI-driven Cloud/DevOps Engineer (GitHub CI/CD + Cloud Provisioning/Operations) is a rare hybrid who combines deep knowledge of cloud infrastructure with AIOps principles. You will bridge development and cloud operations and use AI to automate, optimize, and proactively manage our complex, distributed FinTech environment, reducing manual toil and catching issues before they become production incidents.
Experience
- Minimum 10 years of hands-on work experience in DevOps, CloudOps, AIOps, or a closely related engineering discipline.
Key Responsibilities
Cloud Operations (CloudOps)
- Provision and manage cloud resources on AWS and Azure using both portal-based workflows and Infrastructure as Code (IaC) via Terraform.
- Design and maintain cloud IAM policies in JSON and YAML for both AWS and Azure, ensuring least-privilege access controls.
- Architect scalable, cost-efficient cloud environments aligned with FinTech compliance and security requirements.
Development Operations (DevOps)
- Develop, manage, and optimize GitHub Actions workflows for CI/CD pipelines across all services.
- Implement and maintain container orchestration using Docker and Kubernetes (EKS) for reliable deployments.
- Manage the deployment lifecycle, including rollbacks, blue-green deployments, and feature flag strategies.
AIOps & Intelligent Monitoring
- Build ML-powered predictive alerting systems to identify anomalies and forecast incidents before they impact production.
- Integrate AIOps platforms such as Splunk and Datadog into the existing IT infrastructure and administer them.
- Configure observability stacks using Prometheus and Grafana for real-time dashboards and SLO tracking.
Automated Incident Response
- Design and implement self-healing mechanisms and automated runbooks to resolve recurring IT issues without manual intervention.
- Develop intelligent playbooks that leverage AI to triage, categorize, and route incidents automatically.
- Define and own Mean Time to Resolution (MTTR) SLAs; continuously improve incident response workflows.
Log, Data & Root Cause Analysis
- Analyze high-volume logs, metrics, and distributed traces for root cause analysis (RCA) using ML-assisted tooling.
- Build pipelines that ingest operational telemetry into analytics platforms for continuous insight generation.
- Correlate signals across systems to surface hidden dependencies and failure patterns.
AI Model Lifecycle Management
- Maintain, scale, and monitor AI/ML models in production to ensure consistent performance and reliability.
- Implement model versioning, A/B deployment strategies, and automated rollback on degraded performance.
- Collaborate with ML engineers to operationalize new models with minimal downtime.
Essential Skills & Qualifications
Programming & Scripting
- High proficiency in Python for automation scripts, data pipelines, CLI tooling, and ML integrations.
- Proficiency in Bash/Shell scripting for system-level automation and operational tasks.
- Familiarity with YAML and JSON for configuration management and IAM policy authoring.
Cloud & Infrastructure
- Hands-on experience with AWS services: EC2, Lambda, S3, EKS, IAM, CloudWatch, VPC.
- Working knowledge of Microsoft Azure services and Azure IAM (RBAC, Managed Identities).
- Proficiency with Terraform for Infrastructure as Code across multi-cloud environments.
- Experience with Docker and Kubernetes for containerized workloads and microservices.
AIOps & Observability
- Experience with AIOps platforms: Splunk, Datadog, Dynatrace, or equivalent.
- Proficiency in Prometheus and Grafana for metrics collection, alerting, and dashboards.
- Familiarity with distributed tracing tools (Jaeger, OpenTelemetry) and log aggregation (ELK Stack).
AI / ML Knowledge
- Working understanding of ML algorithms, anomaly detection techniques, and NLP.
- Exposure to Generative AI and Large Language Models (LLMs, GPT-based systems) in operational contexts.
- Ability to interpret model performance metrics and identify drift or degradation in production.
DevOps & CI/CD
- Proficiency with GitHub Actions for CI/CD pipeline development and management.
- Understanding of GitOps principles, branching strategies, and release management.
- Experience with secrets management tools (e.g., AWS Secrets Manager).
Analytical & Debugging Skills
- Strong ability to debug complex, distributed systems at scale.
- Systematic approach to root cause analysis using data-driven methodologies.
- Comfortable operating in high-ambiguity, early-stage environments with shifting priorities.
Why Join Us
- Founding equity stake: You will own a meaningful share of what we are building.
- Greenfield architecture: No legacy systems; design the stack from scratch, the right way.
- High-impact role at the intersection of AI and financial technology.
- Direct collaboration with founders; your decisions shape the product and company direction.
- A mission that democratizes institutional-grade trading intelligence for everyday investors.