Data Engineer

5 days ago

Awin

Iaşi, RO · Full-time · RON 120,000 – RON 200,000

About this role

Data sits at the heart of the company. This role ensures Awin leverages data across the group to build reporting for better commercial decisions and best-in-class campaign management supporting client-facing departments. Design, build, and optimize scalable data pipelines, curated datasets, and analytical data models in Azure, AWS, and Databricks environments.

Work with large-scale datasets to improve performance and reliability, and translate business logic into well-structured tables, metrics, and transformation rules. Build and maintain ETL/ELT pipelines using Databricks, PySpark, Spark SQL, and Delta Lake. Develop production-ready notebooks, workflows, and data lake integrations while applying Spark optimization best practices.

Design curated datasets, semantic layers, and data marts that power analytics and reporting. Partner with business stakeholders, product owners, and analysts to align datasets with business processes and decision-making needs. Document data models, lineage, logic, and dataset behavior clearly.

Optimize transformations and storage for scale and cost; tune PostgreSQL databases and implement robust data validation and quality checks. Collaborate with engineers, analysts, and stakeholders to deliver reliable solutions. Communicate effectively with technical and non-technical audiences in Agile environments.

Requirements

Bachelor’s degree in Computer Science, Data Engineering, or related field
Hands-on experience with Databricks (PySpark, Spark SQL, Delta Lake, Lakebase), PostgreSQL
Background working with large, distributed datasets
Proficiency in Python, PySpark, and SQL
Experience with data modeling, curated datasets, semantic layers, and medallion architecture
Experience with AWS (Lambda, CloudWatch, and Step Functions)
Competence in using Datadog or similar observability/monitoring platforms
Strong debugging and problem-solving skills, comfort working in Agile environments, and a commitment to thorough documentation

Responsibilities

Build and maintain ETL/ELT pipelines using Databricks, PySpark, Spark SQL, and Delta Lake
Develop production-ready notebooks, workflows, and data lake integrations
Apply best practices for Spark optimization, including partitioning, caching, minimizing shuffles, and file compaction
Design curated datasets, semantic layers, and data marts that power analytics and reporting
Partner with business stakeholders to understand requirements and align datasets with business processes
Convert business requirements into data models defining tables, metrics, KPIs, and transformation rules
Optimize transformations and storage for large datasets; tune PostgreSQL databases
Implement robust data validation, schema enforcement, and quality checks across pipelines