Senior Site Reliability Engineer

EPAM Systems (Poland) sp. z o.o.

Kraków, Grzegórzki +1 more
Remote
🚢 Kubernetes
🐳 Docker
🌐 Remote
🐍 Python
JavaScript
Java
ServiceNow
Splunk
Git
GitLab
☁️ Microsoft Azure
☁️ Azure Kubernetes Service

Requirements

Expected technologies

Kubernetes

Docker

Optional technologies

Google Cloud Platform

AWS

Microsoft Azure

Operating system

Linux

Our requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • Proven experience with at least one cloud platform (AWS/GCP/Azure)
  • Experience implementing SRE practices such as SLOs/SLIs, error budgets, postmortems, toil reduction, capacity planning, and incident management
  • Proficiency in Python or another scripting/programming language
  • Strong background in monitoring tools
  • Proficiency with CI/CD tools, infrastructure as code, and configuration management
  • Solid knowledge of container orchestration technologies (Kubernetes, Docker)

Optional

  • Expertise in deploying and managing LLMs, including techniques such as RAG
  • Certification in Kubernetes, AWS/GCP/Azure, or similar technologies
  • Proven experience in DevOps
  • Knowledge of managing and optimizing AI/ML models in production environments, including basic deployment, monitoring, and maintenance

Your responsibilities

  • Collaborate with development, security, quality, and operations teams to implement SRE practices and ensure system reliability
  • Define and support the required level of reliability, availability, and performance for services and applications
  • Troubleshoot and mitigate infrastructure and application issues, and support their resolution in a timely manner
  • Implement a monitoring system for infrastructure and application reliability

Published: 6 days ago
Expires: in 13 days
Work mode: Remote