DevOps Engineer-GPU Cloud

Date: 8 Apr 2024

Location: Singapore, Singapore

Company: Singtel Group

“At Singtel, our mission is to Empower Every Generation. We are dedicated to fostering an equitable and forward-thinking work environment where our employees experience a strong sense of Belonging, make meaningful Impact and Grow both personally and professionally. By joining us, you will be part of a caring, inclusive and diverse workforce that creates positive impact and a sustainable future for all.

 

Be a Part of Something BIG!

 

We are expanding our team at Singtel’s Digital InfraCo as we prepare to launch GPU-as-a-Service (GPUaaS) in Singapore and Southeast Asia in the third quarter of this year, providing enterprises with access to NVIDIA’s AI computing power to drive greater efficiencies and accelerate growth and innovation.

 

At launch, Singtel’s GPUaaS will be powered by NVIDIA H100 Tensor Core GPU clusters operated in existing, upgraded data centres in Singapore. In addition to NVIDIA H100 GPUs, we will be among the world's first to deploy NVIDIA’s next-generation GB200 Grace Blackwell Superchips, which deliver 30X faster real-time large language model inference than their predecessor. We will be one of the first NVIDIA Partner Network Cloud Partners to receive them early next year, giving enterprise customers a choice of accelerator types for their advanced computing and AI needs.”

 

Make an Impact by

 

  • Architect and automate the CI/CD production, staging and development pipelines
  • Define new best practices and DevOps standards as required.
  • Look for opportunities to optimize and enable consistent automated deployments.
  • Work hands-on as an automation engineer: create Infrastructure as Code, automate application deployments, and work with vendor and hyperscaler APIs to automate deployments
  • Create tools and scripts that help automate deployments
  • Direct project teams toward solutions that align with agreed guiding principles, strategy, architecture, and standards
  • Guide multiple teams on how to automate application and infrastructure deployment
  • Serve as a leader and mentor for a team of engineers with a primary focus on automation
  • Embed security controls, implementation, and testing into the DevOps practices
  • Drive improvements for the design, development, and delivery of applications
  • Drive systems engineering design and recovery by eliminating manual involvement and leading continuous improvements that create an operating environment with dynamic monitoring, alerting, and automated self-healing and recovery
  • Work with an automation-first mindset and work to instill it in others
  • Utilize agile practices to ensure consistent and transparent execution.
  • Provide mentoring and knowledge transfer to others, and promote open culture and DevOps.
  • Manage and maintain the DevOps pipeline, and work with dev teams on a combined pipeline.
  • Lead technology evaluations and implementations to fill gaps in the Technology Architecture for software build, testing, deployment and scalability.
  • Monitor standards/policy compliance by developing and executing governance processes and tools.

 

Skills for Success

 

  • Strong hands-on working experience with Ubuntu and other Linux operating systems; RHCE/RHCSA certification is good to have.
  • Solid hands-on experience in cloud technologies with at least one of the public cloud providers (AWS, GCP, Azure).
  • Strong understanding of computer networking (VPC, subnets, VPN) and network connectivity (TCP, UDP, ICMP), etc.
  • Experience deploying infrastructure as code (IaC) with Terraform.
  • Deep knowledge of and experience with containers and container orchestration and deployment tools such as Docker Swarm, Kubernetes, Helm, etc.
  • Aptitude and ability to build and maintain continuous integration (CI) and continuous deployment/delivery (CD) systems for complex, distributed applications, using tools such as GitHub Actions, Jenkins, etc.
  • Working experience with at least one configuration management tool such as Ansible, Chef or Puppet.
  • Advanced experience diagnosing and debugging applications in complex, distributed, heterogeneous computing environments.
  • Mastery of essential development tools such as Git and familiarity with collaboration tools such as Jira and Confluence or similar.
  • Skills in API usage, command-line interfaces, and SDKs for writing applications.
  • Networking experience and an understanding of network protocols, DNS, VPN, and load balancing.
  • API gateway experience (Nginx, Kong, Apigee, etc.).
  • Extensive scripting experience in Shell (bash, zsh, csh, ksh), Python, Perl, etc.
  • Experience in logging, monitoring and tracing with tools such as Azure Monitor, CloudWatch, Zabbix, Elasticsearch/Kibana (ELK), Prometheus/Grafana, New Relic, Datadog, Dynatrace, etc.
  • Good understanding of SQL and NoSQL database technologies such as MySQL, PostgreSQL, MongoDB and DynamoDB.

 

 

Preferred Skills:

 

  • Good understanding of technologies such as pub/sub (Kafka), service mesh (Istio, Envoy), design patterns (REST API, GraphQL, microservice architecture), security (Vault), and service discovery (Consul, ZooKeeper, etcd).

 

Rewards that Go Beyond

  • Flexible work arrangements
  • Full suite of health and wellness benefits
  • Ongoing training and development programs
  • Internal mobility opportunities

 

Your Career Growth Starts Here. Apply Now!

 

We are committed to a safe and healthy environment for our employees & customers and will require all prospective employees to be fully vaccinated.