HPC Engineer - Hybrid
Caris Life Sciences
Position Summary
The HPC (High Performance Computing) Engineer is responsible for implementing and maintaining HPC systems that run primarily on Linux. The role involves installing, configuring, optimizing, and troubleshooting hardware and software components within a complex cluster environment, and often requires expertise in parallel processing, network architecture, and job scheduling tools like LSF. The engineer also ensures optimal system performance and provides user support.
Job Responsibilities
Installing and configuring Linux operating systems on HPC clusters, including network settings, storage systems, and parallel file systems like GPFS.
Monitoring system performance, identifying bottlenecks, and tuning system parameters to maximize computational efficiency.
Managing user job submissions and queues using tools like LSF or Slurm, ensuring fair allocation of computing resources.
Implementing security measures to protect HPC systems and data from unauthorized access.
Diagnosing and resolving hardware and software issues, applying updates and patches, and performing routine system maintenance.
Providing technical assistance to researchers and other users on the HPC system, including account management and application support.
Forecasting future computing needs and planning for system upgrades or expansions.
Required Qualifications
4 years of experience managing Linux servers; direct experience managing HPC clusters preferred.
Technical experience with system configuration, implementation, management and user support.
Strong understanding of Linux system administration
Expertise in parallel computing concepts and programming paradigms.
Knowledge of high-performance networking technologies
Familiarity with cluster management tools (e.g., LSF, Slurm, PBS)
Experience with distributed file systems (Lustre, Ceph, GPFS)
Proficiency in scripting languages such as Python and shell (e.g., Bash, ksh) for automation
Understanding of computer architecture and performance optimization techniques
Strong Linux system administration skills: Expertise in Linux commands, system configuration, and troubleshooting.
HPC cluster knowledge: Understanding of cluster architectures, network topologies (like InfiniBand), and parallel processing concepts.
Job scheduling tools: Proficiency with job scheduling systems like LSF or Slurm
Performance analysis tools: Familiarity with tools to monitor and analyze system performance
Scripting languages: Ability to write scripts (e.g., Bash, Python) for automation and system management
Networking expertise: Understanding of network protocols, network troubleshooting, and high-speed networking technologies
Storage management: Knowledge of parallel file systems and data management strategies
Preferred Qualifications
Experience with HPC schedulers and resource managers
Experience writing user documentation
Experience developing and delivering training for users
Strong technical and analytical skills
Strong verbal and written communication skills
Always maintains the highest level of professionalism when interacting with internal and external customers
Demonstrates a high-energy, positive attitude and commitment to quality customer service
Contributes to a positive team environment within the center by demonstrating a strong work ethic, effectively communicating with others, and proactively anticipating center and user needs
Experience coordinating and running support teams
Physical Demands
Ability to lift, move and install HPC data center hardware and supplies.
Standing for extended periods while performing data center related tasks.
Training
All job-specific, safety, and compliance training is assigned based on the job functions associated with this employee.
Other
This position requires periodic travel and some evenings, weekends, and/or holidays.
Job may require after-hours response to emergency issues.
Periodically scheduled on-call duty may require after-hours response to technical emergencies not explicitly related to assigned job responsibilities.
Conditions of Employment: Individual must successfully complete the pre-employment process, which includes a criminal background check, drug screening, credit check (applicable for certain positions), and reference verification.
This job description reflects management’s assignment of essential functions. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities to this job at any time.
Caris Life Sciences is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, gender identity, sexual orientation, age, status as a protected veteran, among other things, or status as a qualified individual with disability.