Site Reliability Engineer
What you will do
As a Site Reliability Engineer (SRE), you will solve exciting technical challenges by defining, designing, deploying, and troubleshooting key Oracle Cloud services, platforms, and infrastructure, always thinking about reliability, scalability, resilience, security, and performance.
You will be part of a team of SREs whose mission is the shared, full-stack ownership of a collection of cloud services and technology areas integral to supporting medical institutions around the world.
* Ensure System Reliability: Monitor and maintain the health of our production environments, implementing strategies to achieve high availability and minimal downtime.
* Incident Management: Quickly identify and resolve incidents, conducting thorough post-mortem analyses to prevent recurrences and improve our response processes.
* Performance Optimization: Analyze system performance metrics to identify bottlenecks and develop solutions that improve system efficiency and scalability.
* Automation and Tooling: Create and maintain automated processes for deployment, monitoring, and incident response to streamline operations and reduce manual intervention.
* Infrastructure as Code: Use tools such as Terraform or CloudFormation to define and manage infrastructure, ensuring consistency and repeatability across environments.
* Capacity Planning: Collaborate with teams to forecast future system demand and implement scalable solutions that meet growing user needs.
* Collaboration and Communication: Work closely with development and product teams to integrate reliability into the software development lifecycle, advocating for best practices and sharing insights.
* Security and Compliance: Implement security best practices and ensure compliance with industry standards to protect our systems and data.
* Continuous Improvement: Contribute to a culture of continuous improvement by identifying areas for enhancement, sharing knowledge, and mentoring junior engineers.
* On-Call Duties: Participate in an on-call rotation to provide after-hours support and ensure operational excellence around the clock.
Required Experience
* 2+ years of experience managing complex IT systems
* Scripting languages such as Python, Ruby, or Bash
* Configuration management tools such as Chef or Ansible
* Monitoring and instrumentation
* DevOps toolchain (general understanding)
* IT security and compliance
* Methodical approach to troubleshooting complex problems
* Fluent in English (C1)
* Willingness to take part in on-call duty
* Hybrid working environment: 1-2 days per week in the Madrid office
* Knowledge of:
  * Server hardware configuration
  * Linux internals
  * Networking and TCP/IP
  * Standard Internet services (DNS, HTTP, etc.)
Preferred Experience
* 5+ years of experience running large-scale, customer-facing web services
* Cloud infrastructure knowledge (AWS/OCI)
* Kubernetes experience
What we will offer you
* Learning and development opportunities to advance your career
* An Employee Assistance Program to support your well-being
* Flexible and hybrid working so you can do your best work
* Employee resource groups that champion our diverse communities
* Core benefits such as life insurance and access to meal vouchers
* An inclusive culture that celebrates what makes you unique
About Us
As a world leader in cloud solutions, Oracle uses tomorrow's technology to tackle today's problems. True innovation starts with diverse perspectives, abilities, and backgrounds.
When everyone's voice is heard, we're inspired to go beyond what's been done before. That's why we're committed to expanding an inclusive workforce that promotes diverse insights and perspectives.
We've partnered with industry leaders in almost every sector and continue to thrive after 40+ years of change by operating with integrity.