[Remote] DevOps/Observability Engineer
Note: This is a remote position open to candidates in the USA. Quantiphi is an award-winning, AI-first global digital engineering company that helps leading Fortune 1000 organizations transform ideas into measurable business impact. The company is seeking a highly experienced Senior DevOps/Observability Engineer to lead the design and implementation of its next-generation observability platform, with a focus on architecting sophisticated observability pipelines using modern technologies on AWS.
Responsibilities
- Lead the design and implementation of a next-generation, unified observability platform
- Architect a sophisticated observability pipeline leveraging a modern, open-source-centric stack on AWS
- Deploy, configure, and integrate a suite of tools, including Prometheus, Grafana, and Splunk, to provide insight into distributed systems
Skills
- Unified Pipeline Architecture: Proven ability to design and implement end-to-end observability pipelines using OpenTelemetry, Prometheus, and Grafana on centralized infrastructure
- Cross-Account AWS Observability: Deep expertise in centralizing AWS telemetry, including multi-account CloudTrail organization trails, cross-account CloudWatch metrics/logs, and VPC Flow Logs
- Log Aggregation & Routing: Strong experience designing log aggregation strategies, implementing noise reduction/filtering at the collector level, and configuring Splunk HTTP Event Collector (HEC) integrations
- Advanced Alerting & Dashboarding: Hands-on experience building comprehensive alerting frameworks using Alertmanager and CloudWatch Alarms, coupled with advanced dashboard engineering in Grafana (using PromQL)
- Infrastructure as Code (IaC): Advanced proficiency in writing Terraform modules specifically for deploying and managing observability stacks and EC2 infrastructure
- Enterprise Scale Log Management: Demonstrated experience managing, routing, and optimizing log pipelines at massive scale (TB/day)
- Kubernetes/Container Observability: Experience deploying Prometheus and OpenTelemetry (OTel) in Kubernetes (Amazon EKS) or container (Amazon ECS) environments
- Cost Optimization: Proven track record of reducing observability spend through strategic metric dropping, log filtering, and efficient storage tiering
Benefits
- Join one of the world's fastest-growing AI-first digital engineering companies and make a real impact at scale.
- Lead and collaborate with a high-energy team of talented, driven individuals solving complex, meaningful challenges.
- Work with Fortune 500 companies and disruptive innovators in a research-driven environment with 60+ patents.
- Stay ahead of the curve by gaining hands-on experience with cutting-edge AI, ML, data, and cloud technologies while continuously upskilling.
Company Overview