
About this role
As a Senior Data Engineer, you will be responsible for breaking down data silos. This role focuses on building a unified, high-performance data layer using data federation techniques. You will architect a Data Lakehouse environment in which disparate sources feel like a single, cohesive database to analytics and AI teams.
Design and implement federated query layers using Starburst/Trino for high-speed analytics across distributed data sources without unnecessary data movement. Build scalable, distributed ETL/ELT pipelines with Python and Apache Spark (PySpark). Manage modern table formats like Delta Lake, Apache Iceberg, or Hudi for ACID transactions in the data lake.
Optimize Spark jobs and SQL queries across the federation layer to minimize latency and manage compute costs. Implement fine-grained access control and data masking within the federation engine. Ensure data privacy across all connected platforms for analytics and AI teams.
Leverage cloud platforms such as AWS (EMR, S3, Glue), Azure (Databricks, ADLS), or GCP. Apply expert SQL skills and data modeling proficiency in Star/Snowflake schemas and the Medallion Architecture. Experience with IaC, dbt, Kubernetes, Data Mesh, or Data Fabric is a plus.
Requirements
- 5+ years of experience with Python and deep expertise in Apache Spark tuning (partitioning, shuffling, caching)
- Hands-on experience with Starburst Enterprise, Trino (Presto), or Dremio
- Proven track record working with Delta Lake or Iceberg architectures
- Extensive experience with AWS (EMR, S3, Glue), Azure (Databricks, ADLS), or GCP
- Expert-level SQL skills for complex analytical queries and query plan analysis
- Proficiency in designing Star/Snowflake schemas and understanding Medallion Architecture (Bronze, Silver, Gold layers)
Responsibilities
- Design and implement federated query layers (e.g., Starburst/Trino) to allow high-speed analytics across distributed data sources without unnecessary data movement
- Build scalable, distributed data processing pipelines using Python and Apache Spark (PySpark)
- Manage and optimize modern table formats like Delta Lake, Apache Iceberg, or Hudi to bring ACID transactions to the data lake
- Optimize Spark jobs and SQL queries across the federation layer to minimize latency and manage compute costs
- Implement fine-grained access control and data masking within the federation engine to ensure data privacy across all connected platforms
Benefits
- Medical, vision, and dental benefits
- 401k retirement plan
- Variable pay/incentives
- Paid time off
- Paid holidays
- Compensation range: USD 40,000 - 140,000 based on experience and qualifications