AMD

Senior Cluster Performance Engineer

Job Description

Posted on: 2025-11-29

Responsibilities

  • Collaborate with hardware and software teams to enhance GPU cluster performance, focusing on RDMA throughput and latency.
  • Develop and execute benchmarking strategies to assess performance and identify bottlenecks.
  • Conduct scalability testing of GPU clusters under various workloads.
  • Use profiling tools to analyze performance bottlenecks and recommend improvements.
  • Implement optimization strategies including protocol enhancements and load balancing techniques.
  • Document performance analysis and tuning efforts, providing clear reports for stakeholders.
  • Work closely with cross-functional teams to integrate performance improvements into GPU cluster architecture.

Job Requirements

  • Proven experience in optimizing GPU cluster performance.
  • Understanding of RDMA network drivers and GPU architectures.
  • Proficiency in scripting languages for automation and performance analysis.
  • Experience with system-level performance analysis tools.
  • Strong analytical mindset with problem-solving skills.
  • Familiarity with cluster management tools and Linux kernel networking.
  • Bachelor's or Master's degree in computer science, or equivalent experience.