Microsoft

Senior Reliability Engineer

Job Description

Posted on: 2025-10-18

Responsibilities

  • Build and apply specialized knowledge across multiple aspects of production (monitoring, release engineering, testing, live site excellence, buildout, performance optimization, capacity management).
  • Analyze large-scale telemetry and operational data to uncover insights and drive data-informed decisions.
  • Use principles and practices such as safe deployment, testing for reliability, and disaster recovery.
  • Respond to alerts and incidents.
  • Build and follow playbooks to drive root cause analysis and reviews.
  • Partner with hardware and firmware teams to understand system behavior and identify predictive analytics opportunities.
  • Participate in an on-call rotation and contribute to service reliability and incident resolution.

Job Requirements

  • Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration OR Bachelor's Degree AND 4+ years of experience.
  • 3+ years of experience in software engineering or operations for large-scale distributed systems.
  • Ability to support a 24x7 data center environment and participate in on-call rotations.
  • Proficiency in programming languages (C#, Python, Go, etc.).
  • Understanding of cloud infrastructure (Azure preferred), networking, and system design.
  • Familiarity with monitoring tools, incident management frameworks, and DevOps practices.
  • Must meet Microsoft security screening requirements.