[Remote] Site Reliability Engineer (Dynatrace, AWS and Kubernetes)
Note: This is a remote position open to candidates in the USA. Solugenix is assisting a client in its search for a Site Reliability Engineer specializing in Dynatrace, AWS, and Kubernetes. The role covers application performance monitoring, automation development, and high-availability design to improve system reliability and performance.
Responsibilities
- Dynatrace platform administration and advanced configuration aligned to best practices
- Application and infrastructure onboarding, including APM, RUM, tracing, and dependency mapping
- Alerting, events, and problem management design to reduce noise and improve signal quality
- Development of automation using Dynatrace APIs, Terraform, Ansible, and scripting (Python, PowerShell, shell)
- Standardized monitoring intake and lifecycle processes for new systems and applications
- Dashboarding and reporting across applications, infrastructure, cloud, and key platforms (including database, storage, network, and SAP workloads)
- High‑availability design and monitoring for Dynatrace agents, extensions, and synthetic tests
- Proactive application and performance analysis including stack traces, RUM insights, and network flows
- Definition of observability standards, access models, and operating procedures
- Enablement of client teams through documentation, working sessions, and periodic value reviews
Skills
- Minimum 10 years of IT experience
- Willingness to work Eastern Time (EST) hours
Company Overview