
Data Science and Engineering
Case studies

Harness Tens of Thousands of Longitudinal Patient EHRs Across a Dozen Diseases in Research

Challenge

Our client integrates the latest advances in computational and experimental biology to discover novel therapeutics for a broad range of diseases. They needed to determine whether definable, genetically distinct sub-populations of patients are hidden in their longitudinal EHR data (procedures, lab values, medications) and to verify these populations using genomic data. However, their analytical systems did not perform efficiently.

Solution

To replace the inefficient analytical system, Quantori implemented a custom high-performance pipeline that groups patients and then verifies the groups as genomically distinct sub-populations. The data was digitized through the lens of pleiotropy. The Quantori team included data scientists with expertise in the relevant disease areas. The project's biggest challenge was working with heterogeneous real-world data. The team built tools for extracting, cleaning, and consolidating data in different formats: the client’s private data in OMOP, national biobank data, surveys and examinations, and typical EHR data. Another challenge was achieving meaningful clustering. For this, the team used multiple methods: mixture learning, the HDBSCAN hierarchical clustering algorithm, t-SNE and UMAP dimensionality reduction for visualization, and other dimensionality reduction methods, such as NMF and PCA, for feature creation.
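
As a rough illustration of the mixture-learning idea mentioned above, the following minimal sketch fits a two-component Gaussian mixture to a single numeric feature with a hand-written EM loop. It is not the project's pipeline: the function names, the single feature, and the sample values are all assumptions made for illustration, and a production system would more likely rely on a library implementation applied to many features.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_component_mixture(xs, n_iter=50):
    """Fit a 2-component 1-D Gaussian mixture with EM (illustrative only).

    Returns the component means and a hard label (0 or 1) per observation.
    """
    # Deterministic start: one component at each end of the observed range.
    means = [min(xs), max(xs)]
    overall = sum(xs) / len(xs)
    var0 = sum((x - overall) ** 2 for x in xs) / len(xs) or 1.0
    variances = [var0, var0]
    weights = [0.5, 0.5]
    resp = []

    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])

        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsed component

    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, labels


# Hypothetical lab values for eight patients (made-up numbers).
values = [5.0, 5.2, 4.9, 5.1, 9.0, 9.3, 8.8, 9.1]
means, labels = fit_two_component_mixture(values)
```

Each patient ends up with a soft membership in every component, which is what distinguishes a mixture model from a hard partition; the hard labels here are taken only at the end for readability.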

Outcome

Our client gained streamlined and standardized ETL pipelines that parse datasets of tens of thousands of longitudinal patient EHRs across multiple diseases.
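
To make the idea of a standardized ETL step concrete, here is a minimal sketch of how records from two differently shaped sources might be normalized into one longitudinal timeline per patient. The `person_id`, `measurement_date`, `measurement_concept_id`, and `value_as_number` keys follow the OMOP measurement table; the biobank row layout, the `Event` class, the function names, and all values (including the concept ID) are hypothetical placeholders, not the client's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One observation in a common, source-independent shape (hypothetical)."""
    patient_id: str
    day: date
    kind: str
    code: str
    value: Optional[float] = None


def from_omop_measurement(row):
    """Map a dict shaped like an OMOP measurement row to an Event."""
    return Event(
        patient_id=str(row["person_id"]),
        day=date.fromisoformat(row["measurement_date"]),
        kind="lab",
        code=str(row["measurement_concept_id"]),
        value=row.get("value_as_number"),
    )


def from_biobank_row(row):
    """Map a row from an assumed biobank export (DD/MM/YYYY dates) to an Event."""
    d, m, y = row["visit"].split("/")
    return Event(
        patient_id=str(row["participant"]),
        day=date(int(y), int(m), int(d)),
        kind="lab",
        code=row["test"],
        value=float(row["result"]),
    )


def build_timelines(events):
    """Group events by patient and sort each patient's events by date."""
    timelines = defaultdict(list)
    for ev in events:
        timelines[ev.patient_id].append(ev)
    for evs in timelines.values():
        evs.sort(key=lambda e: e.day)
    return dict(timelines)


events = [
    from_omop_measurement({
        "person_id": 1,
        "measurement_date": "2020-03-01",
        "measurement_concept_id": 12345,  # placeholder concept ID
        "value_as_number": 6.1,
    }),
    from_biobank_row({"participant": "1", "visit": "15/01/2020",
                      "test": "hba1c", "result": "5.9"}),
]
timelines = build_timelines(events)
```

A real pipeline would also need to map source-specific codes onto a shared vocabulary before any cross-source analysis; that step is omitted here.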


Design, Implement and Support Life Science IT Infrastructure, Oligo Design and Ordering

Challenge

Our client is a leading US biotech company that provides products and services for the entire genomics workflow. They needed a powerful new wet-laboratory application to design and evaluate primers and probes for polymerase chain reaction (PCR) and quantitative PCR (qPCR). The existing tool was narrowly focused, which created a scientific backlog for the client’s customers.

Solution

The Quantori team built a back-end service that acts as a bioinformatic constructor for the reaction components and chemical conditions that control PCR and qPCR. Our project team consisted of DevOps specialists, software engineers, chemists, bioinformaticians, mathematicians, and analysts. The application is implemented in C#, packaged in Docker, and hosted on Kubernetes. It accepts Excel, FASTA, and CSV files as well as manual input, and sequences can also be entered through the NCBI database. The sequences are analyzed in Quantori’s own open-source application to design and evaluate the required oligo sequences. The team implemented an Oligo DWH (data warehouse) that stores oligo assays for later reuse and redesign. For this project, the team conducted in-depth scientific research at the intersection of genetics, biochemistry, bioinformatics, thermodynamics, complex computations, and algorithms. The team also ran parallel tests against golden cases and market counterparts, such as Primer3 and other custom tools, to arrive at the most precise algorithms and advance the field. Within this ongoing project, the team created a UI that unifies the application, the DWH, and other scaling systems.
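
For readers unfamiliar with the FASTA input mentioned above, the following minimal sketch shows how such a file can be read into named sequences. It is written in Python purely for illustration (the application itself is described as a C# service), and the function name and sample records are assumptions.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {record_name: sequence} dict.

    A header line starts with '>'; the first whitespace-delimited token is
    used as the record name. Sequence lines are upper-cased and concatenated.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a generated name if the header is empty.
            name = header[0] if header else f"record_{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


sample = ">seq1 example target\nACGT\nacgt\n\n>seq2\nTTAA\n"
sequences = parse_fasta(sample)
```

Lines that appear before the first header are ignored, which is a simplification; a stricter reader might reject such input instead.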

Outcome

Our client received a custom-developed, unprecedented large-scale multi-attribute application, as well as redefined parameters for probes, denaturation, nucleotide lengths of DNA and primers, and GC content, covering PCR/qPCR singleplex and multiplex regimens. We implemented thermodynamic algorithms for oligo design based on Gibbs free energy minimization and introduced secondary-structure analysis for probe, primer, and genotyping design to avoid undesirable bonding. Our new solution outperformed the client’s existing tool and market equivalents in range of attributes, flexibility, and accuracy.
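
The kinds of checks described above can be illustrated with a deliberately simplified sketch. It computes GC content, estimates melting temperature with the Wallace rule of thumb (2 °C per A/T and 4 °C per G/C, suitable only for short oligos), and flags a possible hairpin by searching for a short stem that can pair with a downstream stretch of the same oligo. This is far cruder than the Gibbs free energy minimization the team implemented, and it is not their algorithm; the function names, the stem and loop thresholds, and the example sequences are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_hairpin_stem(seq, stem=4, min_loop=3):
    """Return True if a stem-length window can pair with a downstream window.

    At least min_loop unpaired bases must separate the two windows. This is a
    purely string-based screen and ignores the energetics of the structure.
    """
    seq = seq.upper()
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        if seq.find(target, i + stem + min_loop) != -1:
            return True
    return False


primer = "GGGGAAACCCC"  # made-up oligo whose ends can fold onto each other
report = {
    "gc": gc_content(primer),
    "tm": wallace_tm(primer),
    "hairpin": has_hairpin_stem(primer),
}
```

A thermodynamic approach would instead score candidate structures by their free energy and keep the most stable one, which lets it rank weak and strong hairpins rather than only detect them.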