Job Information

Sandia National Laboratories System Administrator - HPC and Data Analytics Testbeds (Experienced) in Albuquerque, New Mexico

Posting Duration:

This posting will be open for application submissions for a minimum of seven (7) calendar days, including the ‘posting date’. Sandia reserves the right to extend the posting date at any time.

Salary Range:

$86,300 - $166,800 *Salary range is estimated, and actual salary will be determined after consideration of the selected candidate’s experience and qualifications, and application of any approved geographic salary differential.

What Your Job Will Be Like:

Sandia's Advanced Architecture Platforms Administration Team develops, deploys, and manages testbed hardware in support of high performance computing (HPC) application research and development. The team manages groundbreaking, one-off, and experimental hardware clusters, as well as standard systems that use unique software stacks. Members of the team work with local colleagues at Sandia, as well as collaborators at other Department of Energy laboratories and industry partners.

In this role, you will apply your existing knowledge of Linux systems on scientific or data analysis clusters to maintain and improve existing and future scientific computing testbed resources. You will work with all aspects of cluster management, from cabling new clusters, to compiling and testing kernel drivers for new hardware, to creating automation that keeps the testbeds running efficiently. This is an outstanding opportunity to partner with researchers and vendors to explore and influence technical directions and drive transformation in how scientific and national security applications use computing in the coming decade and beyond.

This position is primarily on-site at Sandia’s Albuquerque, New Mexico site, with some telecommuting permitted. Sandia provides generous relocation benefits for successful candidates. This job is posted at the level of Information Systems Architect (Experienced).

Every day will be different in our team, but typical activities include:

  • Collaborate with research and development staff, colleagues, and vendors to build and maintain testbed resources for testing novel networking, accelerators, scientific software workflows, and other technologies

  • Develop new operational methodologies and infrastructure designs that enable efficient operation of multiple concurrent emerging-technology and prototype HPC clusters

  • With the full testbeds team, maintain all system aspects of security, networks, filesystems, system software installation, and user support

  • Participate in all aspects of the HPC system lifecycle including facility integration, standup, acceptance testing, performance benchmarking, operational support, and reclamation.

Qualifications We Require:

  • Bachelor’s degree in Computer Science, Computer Engineering, Information Systems Engineering (CIS/MIS), or relevant STEM field plus five or more years of relevant IT experience

  • Minimum of 5 years’ experience managing Linux/Unix clusters dedicated to scientific computing, data analysis, or similar workflows

  • Experience with one or more of the following: parallel filesystems, high speed networking, accelerators, application code optimization for HPC, HPC resource and job management

  • Ability to acquire and maintain a DOE Q level clearance

Qualifications We Desire:

  • Experience with automation tools for configuration management (e.g., Ansible, Puppet, Chef) and revision control systems (e.g., Git, Mercurial)

  • Experience administering Linux/Unix cluster systems at scale (hundreds to tens of thousands of nodes)

  • Experience administering GPU, ARM, and/or FPGA-based clusters

  • Experience with complex programming environments typical in HPC platforms, including use of MPI

  • Experience with container runtimes such as Docker or Podman

  • Experience designing and implementing infrastructure supporting multiple HPC systems

  • Experience leading a team in complex design, standup, or similar activities of HPC systems

About Our Team:

The HPC Development Department (9328) partners with different scientific and computing disciplines at Sandia, and externally, to advance high performance computing (HPC) and operations. The Advanced Architecture Platforms are acquired and operated as part of a multi-center collaboration to explore, evaluate, and influence next-generation computing. Team members enjoy exploration of cutting-edge technologies and the ability to drive change in the field of computing.

About Sandia:

Sandia National Laboratories is the nation’s premier science and engineering lab for national security and technology innovation, with teams of specialists focused on cutting-edge work in a broad array of areas. Some of the main reasons we love our jobs:

  • Challenging work with amazing impact that contributes to security, peace, and freedom worldwide

  • Extraordinary co-workers

  • Some of the best tools, equipment, and research facilities in the world

  • Career advancement and enrichment opportunities

  • Flexible work arrangements for many positions include 9/80 (work 80 hours every two weeks, with every other Friday off) and 4/10 (work 4 ten-hour days each week) compressed workweeks, part-time work, and telecommuting (a mix of onsite work and working from home)

  • Generous vacations, strong medical and other benefits, competitive 401k, learning opportunities, relocation assistance and amenities aimed at creating a solid work/life balance*

World-changing technologies. Life-changing careers. Learn more about Sandia at:

*These benefits vary by job classification.

Security Clearance:

This position does not currently require a Department of Energy (DOE) security clearance.

Sandia will conduct a pre-employment drug test and background review that includes checks of personal references, credit, law enforcement records, and employment/education verifications. Furthermore, employees in New Mexico need to pass a U.S. Air Force background screen for access to Kirtland Air Force Base. Substance abuse or illegal drug use, falsification of information, criminal activity, serious misconduct or other indicators of untrustworthiness can cause access to be denied or terminated, resulting in the inability to perform the duties assigned and subsequent termination of employment.

If hired without a clearance and it subsequently becomes necessary to obtain and maintain one for the position, or you bid on positions that require a clearance, a pre-processing background review may be conducted prior to a required federal background investigation. Applicants for a DOE security clearance need to be U.S. citizens. If you hold more than one citizenship (i.e., of the U.S. and another country), your ability to obtain a security clearance may be impacted.

Members of the workforce (MOWs) hired at Sandia who require uncleared access for greater than 179 days during their employment are required to go through the Uncleared Personal Identity Verification (UPIV) process. Access includes physical and/or cyber (logical) access, as well as remote access to any NNSA information technology (IT) systems. UPIV requirements are not applicable to individuals who require a DOE personnel security clearance for the performance of their SNL employment or to foreign nationals. The UPIV process will include the completion of a USAccess Enrollment, SF-85 (Questionnaire for Non-Sensitive Positions) and OF-306 (Declaration for Federal Employment). An unfavorable UPIV determination will result in immediate retrieval of the SNL-issued badge, removal of cyber (logical) access and/or removal from SNL subcontract. All MOWs may appeal the unfavorable UPIV determination to DOE/NNSA immediately. If the appeal is unsuccessful, the MOW may try to go through the UPIV process one year after the decision date.


All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, or veteran status and any other protected class under state or federal law.

Job ID: 686651