GCP HPC DevOps Engineer (Praca zdalna)

Xebia sp. z o.o.

Rzeszów +5 więcej
28500 zł/mth.
Zdalna, Hybrydowa
GCP
🐍 Python
Ansible
High-Performance Computing
Bash
Terraform
🌐 Zdalna
Hybrydowa

Requirements

Expected technologies

GCP

Python

Ansible

High-Performance Computing

Bash

Terraform

Operating system

Windows

macOS

Linux

Our requirements

  • 5+ years of experience with HPC (High-Performance Computing) environments, including SLURM workload manager, MPI, and other HPC-related software,
  • extensive hands-on experience managing Linux-based systems, including performance tuning and troubleshooting in an HPC context,
  • proven experience migrating and managing SLURM clusters in cloud environments, preferably GCP,
  • proficiency with automation tools such as Ansible and Terraform for cluster deployment and management,
  • experience with Spack for managing and deploying HPC software stacks,
  • strong scripting skills in Python, Bash, or similar languages for automating cluster operations,
  • in-depth knowledge of GCP services relevant to HPC, such as Compute Engine (GCE), Cloud Storage, and VPC networking,
  • strong problem-solving skills with a focus on optimizing HPC workloads and resource utilization,
  • work from the European Union region and a work permit are required.

Optional

  • Google Cloud Professional DevOps Engineer or similar GCP certifications,
  • familiarity with GCP’s HPC-specific offerings, such as Preemptible VMs, HPC VM images, and other cost-optimization strategies,
  • experience with performance profiling and debugging tools for HPC applications,
  • advanced knowledge of HPC data management strategies, including parallel file systems and data transfer tools,
  • understanding of container technologies (e.g., Singularity, Docker) specifically within HPC contexts,
  • experience with Spark or other big data tools in an HPC environment.

Your responsibilities

  • leading the migration of on-premises SLURM-based HPC (High-Performance Computing) clusters to Google Cloud Platform,
  • designing, implementing, and managing scalable and secure HPC infrastructure solutions on GCP,
  • optimizing SLURM configurations and workflows to ensure efficient use of cloud resources,
  • managing and optimizing HPC environments, focusing on workload scheduling, job efficiency, and scaling SLURM clusters,
  • automating cluster deployment, configuration, and maintenance tasks using scripting languages (Python, Bash) and automation tools (Ansible, Terraform),
  • integrating HPC software stacks using tools like Spack for dependency management and easy installation of HPC libraries and applications,
  • deploying, managing, and troubleshooting applications using MPI, OpenMP, and other parallel computing frameworks on GCP instances,
  • collaborating with engineering, support teams, and stakeholders to ensure smooth migration and ongoing operation of HPC workloads,
  • providing expert-level support for performance tuning, job scheduling, and cluster resource optimization,
  • staying current with emerging HPC technologies and GCP services to continually improve HPC cluster performance and cost efficiency.
Wyświetlenia: 9
Opublikowana30 dni temu
Wygasaza 14 dni
Tryb pracyZdalna, Hybrydowa
Źródło
Logo
Logo
Logo
Logo

Podobne oferty, które mogą Cię zainteresować

Na podstawie "GCP HPC DevOps Engineer"