Boosting Bioinformatics Productivity with Coretex: QIIME2 Use Case

Đurđina Bjelan
6.6.2024
Bioinformatics
User Experience
Machine learning techniques have transformed the field of bioinformatics. The integration of Coretex and QIIME2 opens up exciting possibilities, as recognized by the Institute of Forensics in Zurich. This article showcases the built-in features of Coretex, such as sample quality visualization for FASTQ files, parameter optimization, and data encryption.

AI in Bioinformatics

Artificial intelligence (AI) has made significant contributions to the field of bioinformatics, revolutionizing the way biological data is analyzed, interpreted, and applied in various domains.    

AI algorithms, particularly machine learning (ML) and deep learning (DL) techniques, are employed to analyze biological sequences such as DNA, RNA, and protein sequences. These algorithms can predict gene function, identify regulatory elements, annotate genetic variations, and classify sequences into functional categories.  

TCGA strands

Bioinformatics workflows involve a series of interconnected data processing steps that filter, transform, and visualize biological data.

DNA sequencing is the process of determining the precise order of nucleotides (adenine, thymine, cytosine, and guanine) in a segment of DNA. DNA consists of two complementary strands that form a double helix, and a sequence is written using the letters A, T, C, and G.
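To make the idea of complementary strands concrete, here is a minimal Python sketch. It relies only on the standard base-pairing rule (A pairs with T, C pairs with G); the function name `reverse_complement` is chosen for illustration and is not part of Coretex or QIIME2.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in its own 5'->3' direction."""
    # Complement every base, then reverse, because the two strands
    # of the double helix run in opposite directions.
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check that the pairing table is symmetric.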

Bioinformatics workflows are used across diagnostics, clinical studies, drug effect analysis, and drug discovery, with applications ranging from disease diagnostics to drug target identification.

Unlocking the mysteries of life at its most fundamental level requires more than just curiosity—it demands the innovative fusion of biology and data science. Bioinformatics, the dynamic marriage of these disciplines, dives deep into the intricate complexities of biological data, harnessing the power of cutting-edge tools to unravel nature's secrets.  

DNA structure

QIIME2, Bioconductor and Nextflow

Researchers and industry leaders can leverage tools like QIIME2, Bioconductor and Nextflow for a wide range of complex and impactful projects across various fields. For example, they can use these tools to support personalized medicine by analyzing genomic data to identify biomarkers for personalized therapies, or to streamline whole-genome sequencing workflows in cancer genomics research and pinpoint the mutations driving cancer progression.

QIIME2 is instrumental in microbial ecology studies, aiding the analysis of community structure in soil and water ecosystems, while also facilitating metagenomics research for public health to assess microbial communities in urban environments. Bioconductor proves valuable in neurogenomics research to investigate brain tissue samples, as well as in drug discovery and development to analyze pharmacogenomics data for potential new drug targets. Meanwhile, Nextflow supports biotechnology product development through the design and optimization of processes like enzyme production. Marine biology research also benefits from QIIME2, as it helps scientists study the complex interactions of microbial communities in coral reefs.

These tools exemplify the indispensable role of data science in bioinformatics and highlight the importance of utilizing specialized software to tackle the unique challenges posed by biological datasets.

But what is the significance of Coretex in all this? By integrating Coretex with QIIME2 workflows, researchers can enhance the analysis, interpretation, and prediction capabilities in microbiome research, ultimately leading to deeper insights into the structure and function of microbial communities.

The significance of Coretex

Enter Coretex, a platform designed to simplify data management, streamline experiment execution, and ensure that valuable time is spent on the research itself. The central feature here is parameter optimization. It's not just about automation; it's about precision. Coretex allows researchers to set the parameters just right, minimizing the need for manual intervention. The result? Multiple experiments can be executed with a single click, accelerating the pace of research, boosting efficiency, and fostering a culture of innovation.

In addition to building each step of the QIIME2 workflow individually, you can use ready-made Coretex workflow templates. Choose the desired template, and parameter optimization can begin.

Workflow templates

Visual and Advanced Workflow Preview

The visual representation of the workflow shows how tasks connect, displaying the graph of inter-task dependencies used by the Coretex task scheduler. The scheduler determines the execution order of tasks to make good use of the compute infrastructure. Coretex keeps track of nodes connecting and disconnecting and of tasks completing, and it starts a task only when all the tasks it depends on have completed and their outputs are ready.
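The dependency-driven ordering described above can be illustrated with a small Python sketch. This is not the Coretex scheduler; it is a generic topological sort using the standard library's `graphlib`, and the task names in `workflow` are hypothetical stand-ins for the steps of a microbiome workflow.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
workflow = {
    "import_reads": set(),
    "demultiplex": {"import_reads"},
    "denoise": {"demultiplex"},
    "taxonomy": {"denoise"},
    "diversity": {"denoise"},
    "report": {"taxonomy", "diversity"},
}


def execution_waves(graph):
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        # get_ready() returns only tasks whose dependencies are all done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Marking tasks as done unlocks the tasks that depend on them.
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(workflow), start=1):
    print(number, wave)
```

In this sketch, `taxonomy` and `diversity` land in the same wave because both depend only on `denoise`, while `report` waits until both have finished.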

Visual Workflow designer

While the Advanced tab offers an alternative list view for those who prefer a non-visual approach, parameters should ideally be set directly within the visual workflow designer. This intuitive interface allows users to easily manage and adjust parameters within the context of the entire workflow.

Coretex Advanced workflow setup view

When the optimal value of a parameter is not known in advance, Coretex simplifies the process by enabling users to run a grid search across a range of candidate values. By specifying a list of potential values for each parameter, users can preview the Workflow Execution Plan, which visually outlines the number of Task Runs Coretex will generate. This visual preview is helpful for validating the execution logic just before scheduling all runs.
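As a rough illustration of how a grid search expands into individual runs, the following Python sketch enumerates every combination of candidate values. The parameter names and values in `param_grid` are hypothetical and chosen only for the example; the way Coretex builds its Workflow Execution Plan internally may differ.

```python
from itertools import product

# Hypothetical candidate values; the parameter names are illustrative only.
param_grid = {
    "trunc_len_forward": [200, 220, 240],
    "trunc_len_reverse": [160, 180],
    "trim_left": [0, 10],
}


def expand_grid(grid):
    """Return one dict of parameter values per run (Cartesian product)."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


runs = expand_grid(param_grid)
print(len(runs))  # 3 * 2 * 2 = 12 runs
print(runs[0])
```

The number of runs is the product of the list lengths, so it grows quickly as parameters are added. This is why previewing the plan before scheduling is useful.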

DNA read quality plots used in filtering pipeline

R&D director Igor Peric and our senior ML engineer Dusko Mirkovic discussed the integration of QIIME2 into Coretex. In a video tutorial, they go through the entire process step by step, answering potential questions and concerns.

Our partners at the Institute of Forensics in Zurich report that they spend 80% of their time in this phase, preprocessing reads and testing parameter values, and that they find this feature particularly beneficial for their work.

Data Encryption

Coretex supports commonly used formats in molecular biology and bioinformatics research, such as .fastq, .fasta, and .qza files. But what if you work with sensitive data?

To guarantee compliance with the highest EU standards, Coretex implemented robust encryption techniques and security protocols to protect personal data from unauthorized access, disclosure, and alteration.

Your data is secure on our server thanks to advanced encryption. How does it work? Only you have access to the encryption keys, meaning your data remains completely unreadable to us. Your privacy and security are our top priorities.

If you have any doubts or questions, please reach out on our Discord channel to get support from our engineers. Additionally, you can read our documentation on encryption in Coretex for more information.

Encryption Key

Coretex in practice

The importance of Coretex in the field of microbiome analysis was recognized by the Institute of Forensics in Zurich. Their challenge is managing the complexities of DNA sequencing analysis. These experts conduct numerous experiments, each with its own set of specific parameters. Prior to the integration of Coretex, this was a tough task, demanding manual configuration for each individual experiment. The massive amount of time and effort required for this part of the job was overwhelming, hindering progress and stifling creativity.

Dr Natasha Arora, a research associate at the Zurich Institute of Forensic Medicine, found Coretex very helpful in her project studies.

"As a researcher working on microbiome sequencing applications, and a mentor of PhD and MSc students at the University of Zurich, this platform has simplified our project studies. When working with microbial DNA sequences, i.e. with .fastq files, our group spends most of their time preprocessing reads and testing various parameter values. It used to be extremely time-consuming, requiring manual and repetitive process. With Coretex's parameter optimisation feature we can now run multiple experiments with a single click. This streamlined process has significantly improved our efficiency and productivity, allowing me to focus more on the management. Furthermore, as an educator, Coretex's transparency in code and results is invaluable. I can easily track and assess my students' progress, ensuring that their work aligns with the highest standards of quality and accuracy."

Pricing

You might be wondering about the cost. Since researchers manage everything on the university's Node, the only cost they need to pay is the monthly Coretex subscription. This setup keeps things straightforward and cost-effective, making it an attractive option for your projects. Sounds good, right?

By leveraging Coretex's comprehensive features, researchers can maximize productivity while minimizing overhead costs. As a result, they can spend more time discovering new insights and pushing the boundaries of their fields.

Ready to see how Coretex can optimize your workflows? Start using Coretex for free today!
