ZAFIN SOFTWARE CENTRE OF EXCELLENCE PVT LTD

G4, THEJASWINI, TECHNOPARK, TRIVANDRUM, 695581

Cloud Site Reliability Engineer 2

Closing Date: 26 Apr 2025
Job Published: 12 Mar 2025
Contact Email: careers@zafin.com

Brief Description

  • Lead and manage the resolution of complex technical issues involving Zafin’s products and Azure cloud environment.
  • Design and implement strategic, operational enhancements to improve resiliency and system reliability.
  • Conduct in-depth Root Cause Analysis (RCA) for high-severity incidents and drive initiatives to reduce error recurrence.
  • Represent the organisation in external client escalation calls, providing expert guidance and solutions.
  • Architect and optimise cloud infrastructure for high performance, scalability, and cost-effectiveness.
  • Provide thought leadership in managing and scaling container orchestration platforms such as AKS and OpenShift.
  • Oversee the implementation of advanced monitoring solutions and integrate predictive analytics for proactive issue resolution.
  • Develop and execute automation strategies to streamline operational workflows and incident responses.
  • Create and maintain comprehensive documentation of cloud architectures, processes, and incident management strategies.
  • Mentor and coach junior engineers, fostering a culture of continuous learning and innovation.
  • Drive strategic initiatives, collaborating with cross-functional teams to achieve organisational objectives.

Preferred Skills

  • Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s degree preferred).
  • 12+ years of experience in cloud support, operations, or a related role.
  • Advanced expertise in Microsoft Azure (preferred) or equivalent cloud platforms.
  • Demonstrated experience in designing and scaling container orchestration systems like AKS or OpenShift.
  • Proven leadership in managing automated deployment pipelines, including Azure DevOps.
  • Mastery of enterprise monitoring platforms (e.g., Azure Insights, Grafana) and predictive analytics tools.
  • Advanced scripting skills with PowerShell, Python, or similar languages.
  • Extensive experience in incident management and defining SLAs for global production environments.
  • In-depth knowledge of database management, particularly Postgres.
  • Preferred Qualifications
    • Advanced certifications in cloud platforms (e.g., Azure Solutions Architect Expert).
    • Experience with ITSM tools and processes (e.g., ServiceNow).
    • Comprehensive understanding of security and compliance in cloud environments.
  • Soft Skills
    • Exceptional analytical and problem-solving abilities.
    • Strong leadership and mentoring skills.
    • Advanced communication and collaboration capabilities.
    • Visionary approach to operational innovation and strategic planning.