Site Reliability Engineer

Date: 16-Jan-2022

Location: Macquarie Park, Australia

Company: Singtel Group


Position Summary


This position reports to the SRE Manager within the Applications Engineering (AE) Business Unit and Group IT division.

The Site Reliability Engineer is an essential role in the Application Engineering Operations team, working with stakeholders in both business and engineering teams to ensure the effective delivery of supporting operational tasks, including Incident, Problem and Change Management. The Site Reliability Engineer will have a passion for testing, architecture, and observability, and will be able to design and develop new ops tools and explore new concepts.


Key Responsibilities


  • Improve infrastructure performance and operational efficiency by reducing service downtime and maintaining or reducing transaction duration times.
  • Maintain supporting processes and tools.
  • Complete all required training courses within the allocated timeframes.
  • Liaise with stakeholders such as internal and external development teams and relevant IT teams, ensuring all stakeholders are considered and are informed of issues and risks in production and test environments. This will be measured by the number of escalations pertaining to lack of communication of production issues and risks.
  • Develop Subject Matter Expertise for the managed AE Products and Platforms. Ensure that all standard operating procedures, known issues and workarounds are documented in the knowledge support systems (i.e., wiki).
  • Follow Change Management processes and procedures ensuring all necessary documentation is completed in necessary timeframes.
  • Maintain AE systems and make recommendations for improvements to those systems. Write code and change configuration to facilitate the improvements.
  • Actively drive incident resolution and system restoration within SLAs. This will be measured by the IT fault management reporting systems.
  • Play a key role within the Delivery Team, assisting the project delivery team to remove all roadblocks within system, platform, security, network, and infrastructure areas.
  • Address all platform security compliance requirements.


Experience and Qualifications


  • A minimum of 5 years’ experience in the operation and management of large-scale systems in a mission-critical environment, in positions requiring the exercise of good technical judgment and proactive solution development, complemented by a high level of customer service focus.
  • Deep knowledge and hands-on experience of working in a 24x7 operations environment supporting mission-critical systems for external customers.
  • Experience exercising service management processes such as incident management, problem management, and change management.
  • Demonstrated experience in Java development and building Ops monitoring tools.
  • Experience designing, debugging, and running fault-tolerant large-scale distributed systems.
  • Spring Boot and JBoss Fuse framework experience.
  • Demonstrated experience with software repositories such as Git and Bitbucket.