[Remote] Cloud Operations Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. O'Reilly Media is dedicated to sharing the knowledge of innovators and helping professionals develop expertise. As a Cloud Operations Engineer, you will work on the systems and tooling that power the learning platform, focusing on infrastructure-as-code and Kubernetes maintenance while collaborating with product engineering teams.

Responsibilities

  • Maintain and update our Kubernetes cluster to ensure steady-state operations
  • Write or extend Terraform modules to provision and manage cloud infrastructure
  • Contribute features to the Python CLI tooling we use to manage infrastructure workflows
  • Design, build, and maintain cloud infrastructure using infrastructure-as-code (Terraform) on GCP
  • Manage and evolve our Kubernetes platform, including cluster operations, workload configuration, and service mesh (Istio)
  • Develop and improve internal tooling that abstracts cloud complexity and improves the developer experience
  • Collaborate with product engineering teams to understand service deployment needs and deliver infrastructure solutions
  • Monitor platform health using Datadog; proactively identify and resolve performance, availability, and security issues
  • Participate in on-call rotation and incident response; drive blameless post-mortems and eliminate recurring issues at their root cause
  • Define and track service-level indicators and objectives (SLIs/SLOs) for critical platform components
  • Implement and refine alerting, dashboards, and runbooks that reduce mean time to resolution
  • Embed security best practices into infrastructure workflows (DevSecOps) — not as an afterthought, but as a design principle
  • Help maintain cloud security posture, IAM hygiene, and policy guardrails across our cloud environment
  • Stay current with cloud security developments and proactively surface risks to the team
  • Execute and maintain our automated disaster recovery processes
  • Work closely with product engineering teams to understand their needs and remove infrastructure friction
  • Document systems, processes, and architectural decisions clearly so knowledge is shared, not siloed
  • Recommend improvements to tooling, architecture, and processes — and help drive them to completion
  • Keep current with the evolving cloud-native ecosystem and bring relevant knowledge back to the team

Skills

  • Bachelor's degree in Computer Science or a related field
  • In lieu of a degree, equivalent education and/or experience may be considered
  • 5+ years of experience working in cloud infrastructure, platform engineering, or a related discipline
  • Hands-on experience with Kubernetes in production environments (cluster management, workloads, networking)
  • Proficiency with infrastructure-as-code tools, particularly Terraform
  • Experience with at least one major cloud provider (GCP, AWS, or Azure)
  • Solid scripting and automation skills in Python, Bash, or a comparable language
  • Experience with modern observability platforms (Datadog, Grafana, or similar)
  • Strong understanding of Linux systems administration
  • Working knowledge of CI/CD concepts and tools (GitHub Actions, ArgoCD, Jenkins, or similar)
  • Excellent communication skills — you write clearly, ask good questions, and explain complex systems accessibly
  • AI-augmented development: demonstrated ability to use AI-enabled development tools (e.g., Claude Code, Cursor) to streamline coding, debugging, and infrastructure-as-code authoring
  • Experience with service mesh technologies such as Istio or Linkerd
  • Familiarity with GitOps workflows and tools (ArgoCD, Flux)
  • Experience with DevSecOps practices and tooling (Snyk, Trivy, OPA, or similar)
  • Working knowledge of SQL databases (PostgreSQL or MySQL)
  • Familiarity with FinOps practices and cloud cost optimization
  • Experience building or consuming internal developer platforms (IDPs)
  • Configuration management experience (Ansible, Chef, or similar)
  • Relevant certifications (CKA, CKAD, AWS/GCP Professional, or similar)

Company Overview

  • Inspiring the future for more than 45 years. We share the knowledge and teach the skills people need to change their world. O'Reilly Media was founded in 1978 and is headquartered in Seattle, Washington, USA, with a workforce of 201-500 employees. Its website is http://dankaminsky.com.