Data Engineer (Remote)

CODILIME SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ

Warszawa, Śródmieście
16 500–27 500 zł net (+ VAT) / month
Remote work
B2B contract
Full-time
Snowflake
DBT
Apache Spark
Azure Databricks
Apache Airflow
Azure Data Factory
SQL
Python
Git

About the project

  • The goal of this project is to build a centralized, large-scale business data platform for one of the biggest global consulting firms. The final dataset must be enterprise-level, providing consultants with reliable, easily accessible information to help them quickly and effectively analyze company profiles during Mergers & Acquisitions (M&A) projects.
  • You will be involved in building data pipelines that ingest, clean, transform, and integrate large datasets from over 10 different data sources, creating a unified database of over 300 million company records. The data must be accurate, well-structured, and optimized for low-latency queries. The platform will support multiple internal applications, enabling efficient search across massive datasets and ensuring that your work has a direct impact on the entire organization.
  • The data will provide company- and site-level information, including firmographics, technographics, and hierarchical relationships (e.g., GU, DU, subsidiary, site). This platform will serve as a key data backbone for consultants, providing critical metrics such as revenue, CAGR, EBITDA, number of employees, acquisitions, divestitures, competitors, industry classification, web traffic, related brands, and more.
  • Technology stack:
    • Languages: Python, SQL
    • Data Stack: Snowflake + DBT, PostgreSQL, Elasticsearch
    • Processing: Apache Spark on Azure Databricks
    • Workflow Orchestration: Apache Airflow (a minimal orchestration sketch follows this list)
    • Cloud Platform: Microsoft Azure
      - Compute / Orchestration: Azure Databricks (Spark clusters), Azure Kubernetes Service (AKS), Azure Functions, Azure API Management
      - Database & Storage: Azure Database for PostgreSQL, Azure Cosmos DB, Azure Blob Storage
      - Security & Configuration: Azure Key Vault, Azure App Configuration, Azure Container Registry (ACR)
      - Search & Indexing: Azure AI Search
    • CI/CD: GitHub Actions
    • Static Code Analysis: SonarQube
    • AI Integration (Future Phase): Azure OpenAI
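For orientation only, here is a minimal sketch of how a periodic refresh could be wired together with Apache Airflow and DBT on Snowflake. It assumes Airflow 2.4+ and a dbt project available on the worker; the DAG id, task names, paths, and commands are illustrative and not part of the actual project.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Illustrative DAG: ingest raw sources, then run and test dbt models on Snowflake.
    with DAG(
        dag_id="company_data_refresh",      # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                  # Airflow 2.4+ argument
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest_sources",
            bash_command="python /opt/pipelines/ingest.py",  # assumed ingestion entry point
        )
        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command="dbt run --project-dir /opt/dbt",
        )
        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command="dbt test --project-dir /opt/dbt",
        )

        ingest >> dbt_run >> dbt_test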

Your responsibilities

  • Data Pipeline Development:
    • Designing, building, and maintaining scalable, end-to-end data pipelines for ingesting, cleaning, transforming, and integrating large structured and semi-structured datasets.
    • Optimizing data collection, processing, and storage workflows.
    • Conducting periodic data refresh processes (via data pipelines).
    • Building a robust ETL infrastructure using SQL technologies.
    • Assisting with data migration to a new platform.
    • Automating manual workflows and optimizing data delivery.
  • Data Transformation & Modeling:
    • Developing data transformation logic using SQL and DBT for Snowflake.
    • Designing and implementing scalable, high-performance data models.
    • Creating matching logic to deduplicate and connect entities across multiple sources (see the sketch after this list).
    • Ensuring data quality, consistency, and performance to support downstream applications.
  • Workflow Orchestration:
    • Orchestrating data workflows using Apache Airflow running on Kubernetes.
    • Monitoring and troubleshooting data pipeline performance and operations.
  • Data Platform & Integration:
    • Enabling integration of 3rd-party and pre-cleaned data into a unified schema with rich metadata and hierarchical relationships.
    • Working with relational (Snowflake, PostgreSQL) and non-relational (Elasticsearch) databases.
  • Software Engineering & DevOps:
    • Writing data processing logic in Python.
    • Applying software engineering best practices: version control (Git), CI/CD pipelines (GitHub Actions), and DevOps workflows.
    • Ensuring code quality using tools such as SonarQube.
    • Documenting data processes and workflows.
    • Participating in code reviews.
  • Future-Readiness & Integration:
    • Preparing the platform for future integrations (e.g., REST APIs, LLM/agentic AI).
    • Leveraging Azure-native tools for secure and scalable data operations.
    • Being proactive and motivated to deliver high-quality work.
    • Communicating and collaborating effectively with other developers.
    • Maintaining project documentation in Confluence.
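As a rough illustration of the matching and deduplication work mentioned above, the following PySpark sketch (Spark on Azure Databricks is part of the stack) shows one possible shape of that logic. The source paths, column names (company_name, country_code), and the simple blocking key are assumptions for the example; real matching logic would use much richer signals.

    from pyspark.sql import SparkSession, Window, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical inputs: company records from two of the 10+ sources,
    # assumed to share the same schema after pre-cleaning.
    source_a = spark.read.parquet("/mnt/raw/source_a").withColumn("source_rank", F.lit(1))
    source_b = spark.read.parquet("/mnt/raw/source_b").withColumn("source_rank", F.lit(2))

    def with_match_key(df):
        # Crude blocking key: normalized company name + country code.
        # Real matching would add fuzzy scoring, addresses, domains, and hierarchy signals.
        return df.withColumn(
            "match_key",
            F.concat_ws(
                "|",
                F.regexp_replace(F.lower(F.col("company_name")), "[^a-z0-9]", ""),
                F.lower(F.col("country_code")),
            ),
        )

    candidates = with_match_key(source_a).unionByName(with_match_key(source_b))

    # Keep one record per match key, preferring the more trusted source.
    best_first = Window.partitionBy("match_key").orderBy(F.col("source_rank"))
    deduped = (
        candidates
        .withColumn("rn", F.row_number().over(best_first))
        .filter(F.col("rn") == 1)
        .drop("rn", "source_rank")
    )

    deduped.write.mode("overwrite").parquet("/mnt/curated/companies")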

Our requirements

  • Strong experience with Snowflake and DBT (must-have)
  • Experience with data processing frameworks, such as Apache Spark (preferably on Azure Databricks)
  • Experience with orchestration tools like Apache Airflow, Azure Data Factory (ADF), or similar
  • Experience with Docker, Kubernetes, and CI/CD practices for data workflows
  • Strong SQL skills, including experience with query optimization
  • Experience with large-scale datasets
  • Very good understanding of data pipeline design concepts and approaches
  • Experience with data lake architectures for large-scale data processing and analytics
  • Strong Python coding skills
  • Writing clean, scalable, and testable code (unit testing)
  • Understanding and applying object-oriented programming (OOP)
  • Experience with version control systems: Git
  • Good knowledge of English (minimum C1 level)

Optional

  • Experience with PostgreSQL (ideally Azure Database for PostgreSQL)
  • Experience with GitHub Actions for CI/CD workflows
  • Experience with API Gateway, FastAPI (REST, async)
  • Experience with Azure AI Search or AWS OpenSearch
  • Familiarity with developing ETL/ELT processes
  • Familiarity with LLMs, Azure OpenAI, or agentic AI systems

What we offer

  • Flexible working hours and work mode: fully remote, in the office, or hybrid

  • Professional growth supported by internal training sessions and a training budget

  • Solid onboarding with a hands-on approach to give you an easy start

  • A great atmosphere among professionals who are passionate about their work

  • The ability to change the project you work on
