Article Clustering Tool to Improve Bibliography Search
Challenge
The customer approached Quantori to enhance their bibliography search by developing a tool for article clustering. They had a basic prototype built on Streamlit but needed a high-performance production system, since no existing bibliographic tool offered clustering of publications. The goals were to build a complex clustering feature, optimize how graph data is recorded, and reduce the time and memory usage of the search program, which initially handled only one request at a time.
Solution
The Quantori team built a high-performance bibliographic tool using the optimized datasets provided by the customer. We developed a front-end platform with enhanced UX and built the back-end infrastructure on GCP.
Outcome
The article clustering functionality helps identify the most relevant references and core articles, with metrics for ranking authors and publications within clusters. It supports exporting search results to CSV, including separate clusters, 'seed papers,' and search parameters. The system features a user-friendly interface and provides more precise bibliographic search and filtering, delivering unique data quickly.
Calculation of 15,000 Human Genomes using GCP
Challenge
The R&D company needed experts in GCP and genotyping to process terabytes of human genome data and compare it with reference sequences. This would help speed up the discovery phase.
Solution
To support the project, the Quantori team set up the GCP infrastructure and the Terra Baer portal environment. Next, we evaluated cutting-edge bioinformatics technologies, including DRAGEN, DeepVariant, and GATK. We then established a pipeline that retrieves data from Answer ALS and the 1000 Genomes Project (1KGP) and feeds it into GATK in GVCF format, streamlining the data processing workflow.
Outcome
We implemented a high-performance joint genotyping pipeline, resulting in the successful genotyping of approximately 5,000 genomes from Answer ALS and the 1000 Genomes Project (1KGP).