[Remote] DevOps Engineer - Senior Vice President
Note: This is a remote position open to candidates in the USA. iCapital is focused on keeping its production and development environments running smoothly and securely. The company is seeking a Senior Vice President DevOps Engineer to leverage advanced cloud capabilities, support MLOps pipelines, and partner with various teams to deliver automated platforms for AI and machine learning workloads.
Responsibilities
- Design, build, and operate MLOps pipelines supporting the full ML lifecycle (training, validation, deployment, monitoring)
- Enable production workloads for AI/ML and Generative AI systems, including LLM-based services
- Develop and maintain CI/CD pipelines for AI/ML services and supporting infrastructure
- Build and manage cloud-native infrastructure on AWS, with heavy use of Kubernetes and containerized workloads
- Automate infrastructure provisioning and configuration using Infrastructure as Code (Terraform)
- Implement model versioning, experiment tracking, and artifact management across environments
- Ensure reliability, scalability, observability, and cost efficiency of AI platforms
- Partner with AI/ML engineers to operationalize models and standardize deployment patterns
- Implement monitoring and alerting for system health, model performance, and drift
- Enforce security, compliance, and governance requirements for AI workloads
- Participate in incident response, root cause analysis, and continuous improvement initiatives
- Document standards, best practices, and reference architectures for MLOps and AI infrastructure
Skills
- 15+ years of experience in DevOps, SRE, or Platform Engineering, with AWS as the primary cloud platform
- Experience supporting machine learning systems in production, including deployment and monitoring concerns
- Hands-on experience with machine learning platforms, particularly AWS SageMaker (required)
- Strong hands-on experience with Kubernetes, containerized workloads, and cloud networking
- Proven experience building and operating CI/CD pipelines (e.g., GitLab CI, ArgoCD)
- Strong proficiency with Terraform and scripting/programming in Python or similar languages
- Solid Linux, systems, and troubleshooting fundamentals
- Excellent communication skills and ability to work across teams
- Direct experience with MLOps platforms and tooling (model registries, experiment tracking, feature stores)
- Exposure to Generative AI / LLM workloads in production environments
- Familiarity with data stores commonly used in ML systems (e.g., Postgres, DynamoDB, object storage)
- Experience operating in regulated or fintech environments
- Background in cost optimization for compute-intensive workloads
- Strong written and verbal communication skills
- AWS certifications are a plus
Benefits
- Equity for all full-time employees
- An annual performance bonus
- A comprehensive benefits package that includes an employer-matched retirement plan
- Generously subsidized healthcare with 100% employer-paid dental, vision, telemedicine, and virtual mental health counseling
- Parental leave
- Unlimited paid time off (PTO)
- Employees in this role will work in the office Monday-Thursday, with the flexibility to work remotely on Friday