Data Engineer

CODILIME SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ

Warszawa, Śródmieście
27,500 PLN/month
Remote
Snowflake
DBT
Apache Spark
Azure Databricks
Apache Airflow
Azure Data Factory
SQL
Python
Git

Requirements

Expected technologies

Snowflake

DBT

Apache Spark

Azure Databricks

Apache Airflow

Azure Data Factory

SQL

Python

Git

Optional technologies

PostgreSQL

GitHub Actions

API Gateway

FastAPI

Azure AI Search

AWS OpenSearch

Our requirements

  • Strong experience with Snowflake and DBT (must-have)
  • Experience with data processing frameworks, such as Apache Spark (preferably on Azure Databricks)
  • Experience with orchestration tools like Apache Airflow, Azure Data Factory (ADF), or similar
  • Experience with Docker, Kubernetes, and CI/CD practices for data workflows
  • Strong SQL skills, including experience with query optimization
  • Experience with large-scale datasets
  • Very good understanding of data pipeline design concepts and approaches
  • Experience with data lake architectures for large-scale data processing and analytics
  • Strong Python coding skills
  • Writing clean, scalable, and testable code (unit testing)
  • Understanding and applying object-oriented programming (OOP)
  • Experience with version control systems: Git
  • Good knowledge of English (minimum C1 level)

Optional

  • Experience with PostgreSQL (ideally Azure Database for PostgreSQL)
  • Experience with GitHub Actions for CI/CD workflows
  • Experience with API Gateway, FastAPI (REST, async)
  • Experience with Azure AI Search or AWS OpenSearch
  • Familiarity with developing ETL/ELT processes (a plus)
  • Familiarity with LLMs, Azure OpenAI, or agentic AI systems (valuable)

Your responsibilities

Data Pipeline Development:

  • Designing, building, and maintaining scalable, end-to-end data pipelines for ingesting, cleaning, transforming, and integrating large structured and semi-structured datasets
  • Optimizing data collection, processing, and storage workflows
  • Conducting periodic data refresh processes via data pipelines
  • Building a robust ETL infrastructure using SQL technologies
  • Assisting with data migration to a new platform
  • Automating manual workflows and optimizing data delivery
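
For illustration, a minimal sketch of one ingestion-and-cleaning step as it might look in PySpark on Azure Databricks; the storage paths, container names, and columns (event_id, event_ts) are hypothetical, not taken from the actual project:

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical ADLS paths; real containers and layouts will differ.
RAW_PATH = "abfss://raw@example.dfs.core.windows.net/events/"
CURATED_PATH = "abfss://curated@example.dfs.core.windows.net/events/"

spark = SparkSession.builder.appName("ingest_events").getOrCreate()

raw = spark.read.json(RAW_PATH)  # semi-structured source data

cleaned = (
    raw
    .filter(F.col("event_id").isNotNull())            # drop malformed records
    .dropDuplicates(["event_id"])                     # basic dedup on a business key
    .withColumn("event_date", F.to_date("event_ts"))  # derive a partition column
)

# Partitioned Parquet keeps downstream reads and periodic refreshes cheap.
cleaned.write.mode("overwrite").partitionBy("event_date").parquet(CURATED_PATH)
```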

Data Transformation & Modeling:

  • Developing data transformation logic using SQL and DBT for Snowflake
  • Designing and implementing scalable, high-performance data models
  • Creating matching logic to deduplicate and connect entities across multiple sources
  • Ensuring data quality, consistency, and performance to support downstream applications
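
A rough sketch of what the entity-matching step could look like in PySpark; the input path, columns (name, country, updated_at), and the naive normalized-name blocking key are illustrative assumptions, not the project's actual matching rules:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("entity_matching").getOrCreate()

# Hypothetical input: company records collected from several sources.
companies = spark.read.parquet("abfss://curated@example.dfs.core.windows.net/companies/")

# Naive blocking key: normalized name + country. Real matching logic would be
# more elaborate (fuzzy matching, scoring), but the overall shape is the same.
keyed = companies.withColumn(
    "match_key",
    F.concat_ws("|", F.lower(F.trim(F.col("name"))), F.col("country")),
)

# Within each match group, keep the most recently updated record as canonical;
# the shared match_key keeps the duplicates connected to it.
w = Window.partitionBy("match_key").orderBy(F.col("updated_at").desc())
canonical = (
    keyed
    .withColumn("rank", F.row_number().over(w))
    .filter(F.col("rank") == 1)
    .drop("rank")
)
```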

Workflow Orchestration:

  • Orchestrating data workflows using Apache Airflow running on Kubernetes
  • Monitoring and troubleshooting data pipeline performance and operations
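
As a sketch of this layer, a minimal Airflow DAG wiring an extract step to a DBT run; the dag_id, script paths, and schedule are hypothetical, and the Airflow 2.4+ schedule argument is assumed:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily-refresh DAG. On a Kubernetes deployment the tasks would
# typically run via the KubernetesExecutor or KubernetesPodOperator;
# BashOperator keeps the sketch short.
with DAG(
    dag_id="daily_refresh",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="python /opt/pipelines/extract.py",  # hypothetical script
    )
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt",    # DBT models on Snowflake
    )
    extract >> dbt_run
```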

Data Platform & Integration:

  • Enabling integration of third-party and pre-cleaned data into a unified schema with rich metadata and hierarchical relationships
  • Working with relational (Snowflake, PostgreSQL) and non-relational (Elasticsearch) databases
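
A minimal sketch of the non-relational side, indexing one canonical entity with the elasticsearch-py 8.x client; the cluster URL, index name, and document shape are hypothetical:

```python
from elasticsearch import Elasticsearch

# Hypothetical cluster URL and index name.
es = Elasticsearch("http://localhost:9200")

# A canonical entity from the unified schema, with hierarchical metadata.
doc = {
    "entity_id": "c-1",
    "name": "Example Corp",
    "parent_id": None,                  # hierarchical relationship
    "sources": ["crm", "vendor_feed"],  # provenance metadata
}

es.index(index="entities", id=doc["entity_id"], document=doc)

# Downstream applications can then search across the unified schema.
hits = es.search(index="entities", query={"match": {"name": "example"}})
```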

Software Engineering & DevOps:

  • Writing data processing logic in Python
  • Applying software engineering best practices: version control (Git), CI/CD pipelines (GitHub Actions), and DevOps workflows
  • Ensuring code quality using tools like SonarQube
  • Documenting data processes and workflows
  • Participating in code reviews
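
To illustrate the unit-testing expectation, a small pytest sketch against a local Spark session; clean_events is a hypothetical function standing in for real transformation logic:

```python
import pytest
from pyspark.sql import SparkSession, functions as F


def clean_events(df):
    """Hypothetical unit under test: dedup on event_id and drop null keys."""
    return df.dropDuplicates(["event_id"]).filter(F.col("event_id").isNotNull())


@pytest.fixture(scope="session")
def spark():
    # A local Spark session is enough for transformation-level unit tests.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_clean_events_removes_duplicates_and_null_keys(spark):
    df = spark.createDataFrame(
        [("e1", "a"), ("e1", "a"), (None, "b")],
        ["event_id", "payload"],
    )
    result = clean_events(df)
    assert result.count() == 1
    assert result.first()["event_id"] == "e1"
```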

Future-Readiness & Integration:

  • Preparing the platform for future integrations (e.g., REST APIs, LLM/agentic AI)
  • Leveraging Azure-native tools for secure and scalable data operations
  • Being proactive and motivated to deliver high-quality work
  • Communicating and collaborating effectively with other developers
  • Maintaining project documentation in Confluence
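
As a sketch of the kind of REST integration the platform should be ready for, a minimal FastAPI read endpoint; the Entity model and in-memory store are stand-ins for a real Snowflake/PostgreSQL-backed service:

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class Entity(BaseModel):
    entity_id: str
    name: str


# Hypothetical in-memory stand-in; a real service would query the
# Snowflake/PostgreSQL-backed unified schema instead.
STORE = {"c-1": Entity(entity_id="c-1", name="Example Corp")}


@app.get("/entities/{entity_id}", response_model=Entity)
async def get_entity(entity_id: str) -> Entity:
    entity = STORE.get(entity_id)
    if entity is None:
        raise HTTPException(status_code=404, detail="entity not found")
    return entity
```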

Published: 16 days ago
Expires: in 27 days
Work mode: Remote