Senior Site Reliability Engineer - Azure Red Hat OpenShift
Remote type: Hybrid
Locations: Barcelona - Colonial; Remote Spain
Time type: Full time
Posted: 7 days ago
Job requisition ID: R-039991
The Red Hat OpenShift Dedicated Site Reliability Engineering (SRE) team is looking for a Senior Software Engineer to join our global team. In this role, you will work on Red Hat OpenShift, which is enterprise Kubernetes, as part of a team that develops and operates Red Hat OpenShift Dedicated, a public cloud service based on Red Hat OpenShift for large enterprise customers. You’ll play a key role in contributing to solutions that make Red Hat OpenShift Dedicated scalable, feature-rich, resilient, and secure while maintaining a balance between development and operations work. You’ll contribute to the design and development of automation software to provision, upgrade, monitor, and heal a large global fleet of Red Hat OpenShift clusters deployed across multiple public clouds. You'll participate in a global on-call rotation and help lead incident management, root cause analysis, and continuous improvement activities, managing engineering efforts against a service-level agreement (SLA) and error budget. OpenShift SRE is a sophisticated, global, fast-paced team inside the world's open source leader, with constant opportunities to learn new skills and develop new solutions to meet our customers' demands. As a Software Engineer on this team, you will directly contribute to Red Hat's success in the rapidly growing Kubernetes as a Service (KaaS) market.
What you will do:
* Design and write automation software to provision, upgrade, monitor, and heal a large global fleet of Red Hat OpenShift clusters deployed across multiple public clouds
* Identify single points of failure and other high-risk architecture issues; propose and implement more resilient solutions
* Participate in the release cycles of our offerings, deploying code to integration, staging, and production environments, integrating with continuous integration (CI) and continuous delivery (CD) tooling, monitoring, and change management
* Perform software updates, peer code reviews, testing, and Common Vulnerabilities and Exposures (CVE) analysis; respond to security threats
* Interact with automated monitoring and healing infrastructure to ensure healthy environments
* Provide engineering support to Red Hat's global technical support team to resolve customer issues
* Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in our environment
* Participate in a global on-call rotation, including periodic weekend and holiday on-call duties
What you will bring:
* 3+ years of software engineering experience using object-oriented languages; Golang and Python are preferred
* Experience managing Linux-based systems in a public cloud like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure
* Commercial experience with enterprise system monitoring; knowledge of Prometheus is a plus
* Experience with container technology, Kubernetes, OpenShift, and configuration management tools (Red Hat Ansible Automation, Puppet, or Chef) is a big plus
* Demonstrated ability to quickly and accurately troubleshoot systems issues
* Solid written and verbal communication skills in English
About Red Hat:
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact.