Site Reliability Engineer - DevOps
Must have a mix of software development (Java, Python, NodeJS or Scala) and network engineering or systems administration experience
You will be working for a leading technology organisation that architects and builds ground-breaking solutions. Site Reliability Engineering (SRE) is what you get when you treat operations as a software problem.
You will be part of an organisation that lives and breathes technology. They don't do off-the-shelf and don't do easy. Because they are writing, building and running all their own software systems, they are constantly pushing the limits of what technology can do.
The new SRE team will be doing work that has historically been done by an operations team, but with engineers who have software expertise.
Candidates will have good software skills and expertise in network engineering or system administration. Typically, they look for about a 50/50 mix of people with more of a software background and people with more of a systems engineering background.
As a Software Engineer on the SRE team, you will have the opportunity to tackle complex problems using your expertise in coding, algorithms, complexity analysis and large-scale system design.
What you will be doing
- Design, write and deliver software to improve the availability, scalability, latency, and efficiency of the client's services.
- Solve problems relating to mission-critical services and build automation to prevent their recurrence, with the goal of automating the response to all non-exceptional service conditions.
- Influence and create new designs, architectures, standards and methods for large-scale distributed systems.
- Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
What you need to know
- Experience with algorithms, data structures, complexity analysis and software design.
- Experience in one or more of: Java, Python, NodeJS, Scala.
- Expertise in designing, analysing and troubleshooting large-scale distributed systems.
- Familiarity with running web services at scale; understanding of Unix systems internals and networking.
- Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems and client-server protocols along the way.
- Networking: knowledge and understanding of network theory, such as the different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers and load balancing.
- A systematic problem-solving approach, coupled with a strong sense of ownership and drive.