Senior ETL Pipeline Engineer
Caris Life Sciences
At Caris, we understand that cancer is an ugly word—a word no one wants to hear, but one that connects us all. That’s why we’re not just transforming cancer care—we’re changing lives.
We introduced precision medicine to the world and built an industry around the idea that every patient deserves answers as unique as their DNA. Backed by cutting-edge molecular science and AI, we ask ourselves every day: “What would I do if this patient were my mom?” That question drives everything we do.
But our mission doesn’t stop with cancer. We're pushing the frontiers of medicine and leading a revolution in healthcare—driven by innovation, compassion, and purpose.
Join us in our mission to improve the human condition across multiple diseases. If you're passionate about meaningful work and want to be part of something bigger than yourself, Caris is where your impact begins.
Position Summary
The Senior ETL Pipeline Engineer is a senior-level role at Caris Life Sciences, responsible for productionizing and maintaining existing pipelines, designing and implementing new ones, automating data quality checks, and enabling smooth handoffs to downstream teams. You will work with large, complex datasets and collaborate closely with upstream data providers and downstream users to ensure our pipelines meet scientific and operational needs. This role requires a self-starter who can independently drive projects from concept through deployment.
Job Responsibilities
Deliver high-quality, maintainable, and reproducible pipelines that meet production standards, including logging, error handling, and modular design, to support genomics research.
Productionize existing ETL pipelines to enable consistent, repeatable execution.
Create new data pipelines using tools like AWS Step Functions and Metaflow to support evolving research and analysis goals.
Implement automated QC/QA steps to ensure data accuracy, completeness, and reproducibility.
Monitor pipeline performance and proactively address data anomalies or failures.
Work closely with upstream data providers (e.g., lab systems, sequencing platforms) to understand data formats and delivery schedules.
Partner with downstream consumers (e.g., data scientists, bioinformaticians, clinical teams) to ensure data usability and accessibility.
Integrate pipelines with AWS-based infrastructure, primarily writing outputs to S3.
Lead technical decision-making, advocate for best practices, and see projects through to completion.
Meet all assigned targets and goals set by management.
Perform other related duties as assigned.
Stay current with emerging technologies in data engineering, genomics, and cloud computing.
Take ownership of assigned projects through deployment and monitoring.
Communicate progress, risks, and blockers to ensure timely delivery of milestones.
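As a minimal sketch of the kind of automated QC/QA step described in the responsibilities above (the field names, record shapes, and helper function are hypothetical, not Caris's actual schema):

```python
def qc_check(records, required_fields):
    """Flag records missing required fields.

    Returns (passed, failures), where failures is a list of
    (record_index, missing_field_names) pairs.
    """
    failures = []
    for i, rec in enumerate(records):
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        if missing:
            failures.append((i, missing))
    return len(failures) == 0, failures


# Hypothetical variant records; the second is missing its sample identifier.
records = [
    {"sample_id": "S1", "gene": "TP53", "vaf": 0.12},
    {"sample_id": "",   "gene": "KRAS", "vaf": 0.30},
]
passed, failures = qc_check(records, ["sample_id", "gene", "vaf"])
# passed is False; failures is [(1, ["sample_id"])]
```

In a production pipeline, a check like this would typically run as its own step (e.g., a Metaflow `@step` or a Step Functions state), with failures logged and routed to a quarantine path rather than silently dropped.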
Required Qualifications
Bachelor's degree from an accredited university or equivalent work experience in a related field.
6+ years in data engineering, with a track record of owning and delivering robust, production-grade ETL systems end-to-end, not just scripts or prototypes.
Proficient in Python for data processing and workflow development (e.g., using pandas, boto3, etc.).
Comfortable navigating and contributing to large, modular codebases that use object-oriented design principles to promote reuse and maintainability.
Strong software engineering fundamentals: version control, testing, code review, and documentation are second nature.
Solid SQL skills — able to read from and query SQL databases effectively.
Experience with AWS, particularly S3, Step Functions, and general cloud-based data workflows.
Experience working with large, complex datasets in a production environment.
Familiarity with workflow management tools such as Metaflow.
Demonstrated ability to work independently, prioritize tasks, and deliver results with minimal supervision.
Proven ability to collaborate with upstream data providers and downstream users to troubleshoot issues and ensure reliable data delivery.
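As a toy illustration of the SQL fluency described above, using Python's built-in sqlite3 module (the table, columns, and threshold are hypothetical examples, not an actual Caris dataset):

```python
import sqlite3

# In-memory database with a small, hypothetical variants table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (sample_id TEXT, gene TEXT, vaf REAL)")
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?)",
    [("S1", "TP53", 0.12), ("S1", "KRAS", 0.30), ("S2", "TP53", 0.05)],
)

# Samples with any variant above an illustrative 0.10 allele-fraction threshold.
rows = conn.execute(
    "SELECT DISTINCT sample_id FROM variants "
    "WHERE vaf > 0.10 ORDER BY sample_id"
).fetchall()
# rows is [("S1",)]
```

In practice the same query pattern applies whether the backend is SQLite, a production relational database, or an AWS query layer such as Athena.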
Preferred Qualifications
Exposure to genomics, molecular biology, or biomedical datasets (e.g., VCF, gene expression matrices, variant annotation tables) is a strong plus.
Experience with Athena, Glue, or similar AWS data catalog/query tools.
Knowledge of data lineage, reproducibility, and scientific computing best practices.
Familiarity with infrastructure-as-code or containerization tools (e.g., Terraform, Docker, CDK).
Training
All job-specific, safety, and compliance training is assigned based on the job functions associated with this role.
Conditions of Employment: Individuals must successfully complete the pre-employment process, which includes a criminal background check, drug screening, credit check (applicable for certain positions), and reference verification.
This job description reflects management’s assignment of essential functions. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities to this job at any time.
Caris Life Sciences is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, gender identity, sexual orientation, age, status as a protected veteran, or status as a qualified individual with a disability, among other protected characteristics.