

Site Reliability Engineer
Location
Redmond, WA
Level
Mid-Level
Department
Consumer Electronics
Type
Salary
$84,000 - $165,000
Job Description
Posted on:
2026-02-14
Responsibilities
- Own end-to-end reliability for Azure Storage hardware in lab environments.
- Partner with silicon, firmware, BIOS, networking, and OS teams for DPU hardware validation.
- Define and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Lead incident response and mitigation for hardware- and firmware-related issues.
- Build automation for provisioning and recovery of DPU-enabled Azure Storage systems.
- Develop reliability validation strategies, including stress and fault-injection testing.
- Create and maintain operational runbooks and diagnostics for DPU platforms.
Job Requirements
- Associate's or bachelor's degree in Computer Science, IT, or a related field.
- 2+ years of technical experience in software engineering, network engineering, or systems administration.
- Experience validating large-scale distributed systems.
- Proficiency in programming or scripting languages (C++, C#, Python, etc.).
- Hands-on experience with Microsoft Azure lab infrastructure and live-site operations.
- Understanding of networking and performance characteristics of I/O-intensive systems.
- Familiarity with firmware lifecycles and hardware validation processes.

