[Remote] Software Engineer - Site Reliability Engineering

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. Noctua Technology is a software engineering and consulting corporation focused on data engineering, machine learning, and cloud technologies. The company is seeking a motivated Site Reliability Engineer (SRE) to apply software engineering principles to operations, ensuring the reliability, scalability, and performance of production systems.

Responsibilities

Define, measure, and report on Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure system reliability and uptime
Develop and deploy Infrastructure as Code (IaC) using Terraform, CloudFormation, or similar tools, with an emphasis on repeatability and change management
Implement and manage containerized and serverless architectures using Docker, Kubernetes, and cloud-native services, focusing on performance and error budgets
Build and maintain reliable and self-healing CI/CD pipelines to automate deployments and improve development workflows
Implement and refine comprehensive monitoring, alerting, and logging to detect and address performance and availability issues proactively
Eliminate toil by extensively automating operational tasks, including provisioning, patching, and deployments, using scripting and configuration management tools such as Python, Bash, or Ansible
Conduct post-incident reviews (blameless postmortems) to drive continuous improvement in system reliability and operational processes
Implement cloud security best practices, including identity and access management (IAM), encryption, and compliance controls
Proactively identify and address system weaknesses and ensure performance under stress
Support disaster recovery and high availability strategies through backup and failover planning
Collaborate with development teams to improve the operability and production readiness of applications from design through deployment
Create and maintain documentation for cloud architectures, deployment processes, and best practices
Contribute to internal knowledge-sharing initiatives, ensuring continuous learning within the team
Provide technical guidance and support to clients and internal teams on cloud infrastructure and reliability best practices, with a focus on defining Service Level Agreements (SLAs)
Act on client feedback to refine and enhance cloud solutions
Conduct training and knowledge-sharing sessions to help clients manage their cloud environments effectively
Stay updated on the latest developments in cloud infrastructure and technology trends
Drive innovation by proposing and implementing new techniques and technologies

Skills

1-5 years of experience in site reliability engineering, cloud engineering, or related fields
Strong software engineering skills with an emphasis on writing clean, modular, and maintainable code, specifically for automation and system management
Proficiency in Infrastructure as Code (IaC) tools like Terraform or CloudFormation
Experience with containerization and orchestration tools like Docker and Kubernetes
Knowledge of networking concepts, cloud security best practices, and identity management
Experience with programming or scripting languages such as Python, Bash, or Go
Familiarity with CI/CD pipelines and DevOps methodologies
Strong problem-solving skills and the ability to troubleshoot complex cloud environments
Effective communication skills and a willingness to learn and collaborate
Bachelor's or advanced degree in Computer Science or a related field
Google Cloud Professional Cloud Architect
Google Cloud Professional Cloud DevOps Engineer
AWS Certified Solutions Architect
AWS Certified Developer
AWS Certified SysOps Administrator
Azure Solutions Architect Expert
CompTIA Security+ certification or an equivalent DoD 8140/8570 IAT Level II baseline certification

Company Overview

Noctua Technology, Inc. was founded in 2024 and is headquartered in San Diego, California, US, with a workforce of 2-10 employees. Its website is https://noctuatech.com.

