Job Description

This role involves engaging directly with the Northwell Genomics Health Initiative (NGHI) team to develop and maintain bioinformatics pipelines for next-generation sequencing (NGS) data processing and analysis, specifically including Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES), and RNA-Seq. The individual will define and build the next iteration of features for the NGHI team and will be responsible for implementing, modifying, expanding, and optimizing our pipelines and warehousing to incorporate feature stores, big data, and cloud technologies. Additionally, this position requires collaborative work with members from Information Technology, Diagnostic Labs, and Clinical teams to support clinical/research scientists, systems, and initiatives at both the departmental and enterprise levels.

Job Responsibility
- Design, develop, and maintain robust and scalable bioinformatics pipelines for WGS, WES, and RNA-Seq data analysis using Nextflow or other workflow languages.
- Optimize pipeline performance and efficiency in cloud or HPC environments.
- Collaborate with other scientists and engineers to integrate pipelines into larger data analysis workflows.
- Contribute to the development of novel bioinformatics methods and algorithms using Machine Learning and AI.
- Builds and maintains infrastructure for data management and assembles datasets from disparate sources to meet team requirements.
- Maintains data acquisition processes and data pipelines; verifies data quality and/or ensures it via data cleaning and processing. Develops and optimizes ETL processes, implements transformations, and quality-checks results.
- Designs, develops, and maintains data pipelines between servers, databases, and other sources to support genomic research and development and production pipelines.
- Identifies, designs, and implements process improvements for optimization, efficiency, greater scalability, and automation.
- Develops and shares best practices for code quality, versioning, repository management, documentation, and data flows among the Genomic team.
- Assists data scientists, engineers, cloud architects, and subject matter advisors in testing, deploying, and maintaining artificial intelligence and machine learning algorithms.
- Facilitates deployment of machine learning and other models to production; monitors performance and health; and updates/retrains models.
- Works collaboratively to develop, construct, test, and maintain large-scale data processing systems and databases.
- Maintains development and production environments, both on-premises and cloud-based.
- Participates in projects to architect (research, recommend, design, develop, and deploy) advanced systems for the collection, aggregation, and analysis of data in alignment with business objectives.
- Provides big data technology assessments, strategies, and roadmaps in several technical domains and acts as a subject matter advisor on big data.
- Works with cross-functional research leadership and technical and analytical teams to understand current and future enterprise-wide big data analytics goals spanning disparate platforms and data types.
- Assists in ensuring that systems are implemented to support Health System initiatives and goals to improve the quality of patient care, to maximize patient safety, and to provide operational efficiencies.
- Demonstrates familiarity with current health system information systems.
- Operates under limited guidance; work assignments involve moderately complex to complex issues where the analysis of situations or data requires in-depth evaluation of variable factors.
- Makes decisions on moderately complex to complex issues regarding technical approach and completion of own tasks/responsibilities of substantial complexity.
- Performs related duties as required. All responsibilities noted here are considered essential functions of the job under the Americans with Disabilities Act. Duties not mentioned here but considered related are not essential functions.
Job Qualification
- Bachelor's Degree in Computer Science, Informatics, Statistics, Engineering, Data Science, or a related quantitative field required, or equivalent combination of education and related experience. Master's Degree preferred.
- 3-5 years of experience with enterprise level design and implementation of relational databases, big data pipelines, cloud computing, and other advanced data science and big data technologies, required.
- Advanced working knowledge and experience with SQL and relational databases, required. Strong knowledge and experience using the following software and tools, required: SQL/NoSQL, Python, R, Spark, Git, data pipeline tools (e.g., Airflow), cloud infrastructure (Microsoft Azure preferred).
- Experience in building and optimizing data pipelines; deploying and maintaining production machine learning algorithms; and building, testing, and deploying code on cloud infrastructure, required.
- Experience with managing healthcare data, preferred.
- Experience with managing unstructured data and streaming data, preferred.
- Experience in architecting data warehouses and/or data lakes with traditional database enterprise-class RDBMS technologies, preferred.
- Strong knowledge of Business Intelligence & Analytics concepts and platforms, inclusive of data virtualization, data preparation, data visualization and advanced analytics technologies, preferred.
Additional Salary Detail

The salary range and/or hourly rate listed is a good faith determination of potential base compensation that may be offered to a successful applicant for this position at the time of this job advertisement and may be modified in the future. When determining a team member's base salary and/or rate, several factors may be considered as applicable (e.g., location, specialty, service line, years of relevant experience, education, credentials, negotiated contracts, budget, and internal equity).