Senior Data Engineer - Apache Spark
What You'll Do:
• Design and build ETL/ELT pipelines that move data across our platform.
• Model data for analytics and downstream consumption: clean, well-documented, and built to last.
• Review your teammates' code and pipelines, and have your own work reviewed in return.
• Work directly with analysts, clinicians, and business teams to understand what they actually need (often different from what they first ask for).
• Keep production stable. Investigate issues. Fix root causes, not symptoms.
Must-Haves:
• Hands-on experience with Apache Spark: you've written, tuned, and debugged Spark jobs in production.
• Hands-on experience with dbt: you've built and maintained dbt projects of meaningful size, written tests, and dealt with the messy parts (incremental models, macros, dependency hell).
• Strong data modeling skills: dimensional modeling, normalization vs. denormalization tradeoffs, slowly changing dimensions, and the judgment to know what to use when. We'll ask you to walk us through real models you've designed.
Nice-to-Haves:
• Experience with Databricks.
• Healthcare data background.
• Experience with Airflow.
• Experience with CI/CD pipelines.
• Familiarity with open table formats (Iceberg, Delta Lake, Hudi).
• Experience with data catalogs and data discovery.
What We Care About in Candidates:
We're less interested in certifications, course lists, and tool checklists, and more interested in:
• What you've actually built, and what broke along the way.
• How you think when something goes wrong in production.
• Whether you can explain a complex technical decision to a non-technical stakeholder without losing them.
• Whether you take ownership of your work end-to-end.