Data Engineer (Remote)

CODILIME SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ

Warszawa, Śródmieście
16 500–27 500 zł net (+ VAT) / month
Remote work
B2B contract
Full-time
Snowflake
DBT
Apache Spark
Azure Databricks
Apache Airflow
Azure Data Factory
SQL
Python
Git

About the project

  • The goal of this project is to build a centralized, large-scale business data platform for one of the biggest global consulting firms. The final dataset must be enterprise-level, providing consultants with reliable, easily accessible information to help them quickly and effectively analyze company profiles during Mergers & Acquisitions (M&A) projects.
  • You will be involved in building data pipelines that ingest, clean, transform, and integrate large datasets from over 10 different data sources, creating a unified database of over 300 million company records. The data must be accurate, well-structured, and optimized for low-latency queries. The platform will support multiple internal applications, enabling efficient search across massive datasets and ensuring that your work has a direct impact on the entire organization.
  • The data will provide company- and site-level information, including firmographics, technographics, and hierarchical relationships (e.g., GU, DU, subsidiary, site). This platform will serve as a key data backbone for consultants, providing critical metrics such as revenue, CAGR, EBITDA, number of employees, acquisitions, divestitures, competitors, industry classification, web traffic, related brands, and more.
  • Technology stack:
    • Languages: Python, SQL
    • Data Stack: Snowflake + DBT, PostgreSQL, Elasticsearch
    • Processing: Apache Spark on Azure Databricks
    • Workflow Orchestration: Apache Airflow (a minimal orchestration sketch follows this list)
    • Cloud Platform: Microsoft Azure
      - Compute / Orchestration: Azure Databricks (Spark clusters), Azure Kubernetes Service (AKS), Azure Functions, Azure API Management
      - Database & Storage: Azure Database for PostgreSQL, Azure Cosmos DB, Azure Blob Storage
      - Security & Configuration: Azure Key Vault, Azure App Configuration, Azure Container Registry (ACR)
      - Search & Indexing: Azure AI Search
    • CI/CD: GitHub Actions
    • Static Code Analysis: SonarQube
    • AI Integration (Future Phase): Azure OpenAI
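For orientation only, here is a minimal sketch of how a periodic refresh could be wired together with Apache Airflow and DBT on Snowflake. It assumes Airflow 2.4+ and a dbt project available on the worker; the DAG id, task names, paths, and commands are illustrative and not part of the actual project.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Illustrative DAG: ingest raw sources, then run and test dbt models on Snowflake.
    with DAG(
        dag_id="company_data_refresh",      # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                  # Airflow 2.4+ argument
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest_sources",
            bash_command="python /opt/pipelines/ingest.py",  # assumed ingestion entry point
        )
        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command="dbt run --project-dir /opt/dbt",
        )
        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command="dbt test --project-dir /opt/dbt",
        )

        ingest >> dbt_run >> dbt_test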

Your responsibilities

  • Data Pipeline Development:
    • Designing, building, and maintaining scalable, end-to-end data pipelines for ingesting, cleaning, transforming, and integrating large structured and semi-structured datasets.
    • Optimizing data collection, processing, and storage workflows.
    • Conducting periodic data refresh processes (via data pipelines).
    • Building a robust ETL infrastructure using SQL technologies.
    • Assisting with data migration to a new platform.
    • Automating manual workflows and optimizing data delivery.
  • Data Transformation & Modeling:
    • Developing data transformation logic using SQL and DBT for Snowflake.
    • Designing and implementing scalable, high-performance data models.
    • Creating matching logic to deduplicate and connect entities across multiple sources (see the sketch after this list).
    • Ensuring data quality, consistency, and performance to support downstream applications.
  • Workflow Orchestration:
    • Orchestrating data workflows using Apache Airflow running on Kubernetes.
    • Monitoring and troubleshooting data pipeline performance and operations.
  • Data Platform & Integration:
    • Enabling integration of 3rd-party and pre-cleaned data into a unified schema with rich metadata and hierarchical relationships.
    • Working with relational (Snowflake, PostgreSQL) and non-relational (Elasticsearch) databases.
  • Software Engineering & DevOps:
    • Writing data processing logic in Python.
    • Applying software engineering best practices: version control (Git), CI/CD pipelines (GitHub Actions), and DevOps workflows.
    • Ensuring code quality using tools such as SonarQube.
    • Documenting data processes and workflows.
    • Participating in code reviews.
  • Future-Readiness & Integration:
    • Preparing the platform for future integrations (e.g., REST APIs, LLM/agentic AI).
    • Leveraging Azure-native tools for secure and scalable data operations.
    • Being proactive and motivated to deliver high-quality work.
    • Communicating and collaborating effectively with other developers.
    • Maintaining project documentation in Confluence.
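As a rough illustration of the matching and deduplication work mentioned above, the following PySpark sketch (Spark on Azure Databricks is part of the stack) shows one possible shape of that logic. The source paths, column names (company_name, country_code), and the simple blocking key are assumptions for the example; real matching logic would use much richer signals.

    from pyspark.sql import SparkSession, Window, functions as F

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical inputs: company records from two of the 10+ sources,
    # assumed to share the same schema after pre-cleaning.
    source_a = spark.read.parquet("/mnt/raw/source_a").withColumn("source_rank", F.lit(1))
    source_b = spark.read.parquet("/mnt/raw/source_b").withColumn("source_rank", F.lit(2))

    def with_match_key(df):
        # Crude blocking key: normalized company name + country code.
        # Real matching would add fuzzy scoring, addresses, domains, and hierarchy signals.
        return df.withColumn(
            "match_key",
            F.concat_ws(
                "|",
                F.regexp_replace(F.lower(F.col("company_name")), "[^a-z0-9]", ""),
                F.lower(F.col("country_code")),
            ),
        )

    candidates = with_match_key(source_a).unionByName(with_match_key(source_b))

    # Keep one record per match key, preferring the more trusted source.
    best_first = Window.partitionBy("match_key").orderBy(F.col("source_rank"))
    deduped = (
        candidates
        .withColumn("rn", F.row_number().over(best_first))
        .filter(F.col("rn") == 1)
        .drop("rn", "source_rank")
    )

    deduped.write.mode("overwrite").parquet("/mnt/curated/companies")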

Our requirements

  • Strong experience with Snowflake and DBT (must-have)
  • Experience with data processing frameworks, such as Apache Spark (preferably on Azure Databricks)
  • Experience with orchestration tools like Apache Airflow, Azure Data Factory (ADF), or similar
  • Experience with Docker, Kubernetes, and CI/CD practices for data workflows
  • Strong SQL skills, including experience with query optimization
  • Experience with large-scale datasets
  • Very good understanding of data pipeline design concepts and approaches
  • Experience with data lake architectures for large-scale data processing and analytics
  • Strong Python coding skills
  • Writing clean, scalable, and testable code (unit testing)
  • Understanding and applying object-oriented programming (OOP)
  • Experience with version control systems: Git
  • Good knowledge of English (minimum C1 level)

Optional

  • Experience with PostgreSQL (ideally Azure Database for PostgreSQL)
  • Experience with GitHub Actions for CI/CD workflows
  • Experience with API Gateway, FastAPI (REST, async)
  • Experience with Azure AI Search or AWS OpenSearch
  • Familiarity with developing ETL/ELT processes
  • Familiarity with LLMs, Azure OpenAI, or agentic AI systems

What we offer

  • Flexible working hours and work mode: fully remote, in the office, or hybrid

  • Professional growth supported by internal training sessions and a training budget

  • Solid onboarding with a hands-on approach to give you an easy start

  • A great atmosphere among professionals who are passionate about their work

  • The ability to change the project you work on
