

Site Reliability Engineer, HPC and LSF
Location
Seattle, WA
Level
Mid-Level
Department
Semiconductors
Type
Salary
$120,000 - $190,000
Job Description
Posted on:
2025-11-06
Responsibilities
- Troubleshoot incoming support requests in a large-scale HPC environment.
- Enhance deployment automation, configuration management, observability, and operational monitoring.
- Ensure compute servers are configured correctly with the appropriate Operating System.
- Perform comprehensive troubleshooting from bare metal to application level.
- Collaborate with specialist teams to resolve issues.
- Work with domain experts to improve chip development processes utilizing infrastructure.
- Contribute to overall quality and improve time to market for next-generation chips.
Job Requirements
- Proficient in administering Centos/RHEL Linux distributions.
- Understanding of container technologies like Docker.
- Proficiency in Python and UNIX scripting languages such as bash.
- Excellent problem-solving skills for complex systems analysis.
- Strong communication and teamwork skills.
- BS in Computer Science or equivalent experience with 2+ years of relevant post-degree experience.
- Solid understanding of cluster configuration management tools such as Ansible.




