African Society for Bioinformatics and Computational Biology

ASBCB Omics Codeathon – October 2022

The omics codeathon is an established event in which life scientists work together on research projects. It is held twice a year and is led by Olaitan I. Awe, the current training officer for the African Society for Bioinformatics and Computational Biology (ASBCB).

The codeathons have been organized through a collaboration between ASBCB and the National Institutes of Health, Office of Data Science Strategy (NIH ODSS).

The primary aim of the omics codeathon is to use omics sequence data and bioinformatics to advance our understanding of the biology of model organisms and pathogens and, ultimately, to improve human health. Our research projects typically use human, cellular, cancer and pathogen genomics to investigate the molecular mechanisms of Mendelian disorders and complex traits. Using different omics approaches, we aim to understand how diseases work at the molecular level.

Codeathon participants and applicants come from across the globe, including South Africa, Nigeria, Algeria, Zimbabwe, Kenya, Morocco, Tunisia, Egypt, Senegal, Mali, Ghana, Brazil, Uganda, Poland, Tanzania, Mozambique, Bangladesh, Botswana, the United States, the United Kingdom, Mexico, Ireland, Burkina Faso, The Gambia, Benin Republic, Sweden, China, Ethiopia, South Korea, Canada and India.

The October 2022 codeathon was held virtually and was our first codeathon led by African women. The projects fell into the following categories: Human Genomic Variation, Bulk Transcriptomics, Metagenomics, Cheminformatics, Clinical Applications, Farming Systems, Antimicrobial Resistance, Population Genomics, Oncology, Plant Genomics, Machine Learning and Single-cell Transcriptomics.

Projects

Title
Team
Project Description

ExpVar: A Package for Data Analysis and Visualization of Gene Expression and Genetic Variants

Hiba Ben Aribi, Imraan Dixon and Najla Abassi 

RNA-seq data analysis workflows, from raw FASTQ/BAM files to biologically significant information, are complex and require a range of skills and tools. These analyses are straightforward for bioinformaticians but not for biologists. This creates a need for biologist-friendly genomic data analysis and visualization tools.

In this study, we developed a novel R package, named ExpVar, to analyze gene expression and genetic variant calling data from FASTQ/BAM files. It includes three R Shiny apps, EXPviz, SNPviz, and CNVviz, for visualizing differential gene expression, single nucleotide polymorphism, and copy number variant data, respectively, all integrated as functions.

ExpVar provides a unique combination of analysis and visualization features for biologists with limited programming skills. The user can perform multiple analyses and visualize biological data without the need for any third-party program.

Transcriptomic Profiling of Epilepsy Patients Using Bioinformatics Analysis

Fatma EL ABED, Marion Nyaboke, Careen Naitore and Ghada BARAKET

Epilepsy is a chronic neurological disease characterized by recurrent and spontaneous seizures that result from an imbalance in the discharges of neuronal cells. Diagnosis is based on the examination of manifestations, the medical history, electroencephalograms and neuroimaging. Currently, there is no biomarker that can support clinical diagnosis, which is quite complicated in some cases.
Recently, miRNAs have been used as potential biomarkers for many diseases, and they have been implicated in the development of multiple disorders, including seizures. In this omics codeathon project, we extracted data from the Gene Expression Omnibus (GEO) database to analyze the genes and miRNAs differentially expressed between patients and controls or between different disease conditions. We also sought to identify new miRNAs that are associated with epilepsy and that target mRNAs involved in the development of seizures. Our aim was to find miRNAs that can be used as diagnostic biomarkers in epilepsy patients.
These analyses were based on genomes from different populations and on blood and brain tissue samples, and they used bioinformatics tools such as FastQC, Cutadapt, HISAT2, DESeq2 and GEO2R.
Such miRNAs could serve as new diagnostic tools that allow an accurate clinical diagnosis and facilitate appropriate treatment.

Shotgun Sequencing Analysis of the Gut Microbiome in Colorectal Cancer Patients

Nouhaila En Najih, Latifah Benta Mukanga, Edward Jenner Tettevi, Olaitan I. Awe, Ruvarashe Joylyne Madzime and Omolanke Temitope Oyedemi

Colorectal cancer (CRC) is the third most common malignancy and the second leading cause of cancer-related death in the world, claiming almost a million lives each year. Recent findings suggest that dysbiosis of the gut microbiota alters host physiology, contributing to the pathogenic processes of various diseases, including CRC. Changes in the microbiome can be used as biomarkers for the early detection of CRC and for the improvement of screening strategies. In this study, we analyzed shotgun metagenomes with the goal of determining the taxonomic abundance and functional potential of the microbial communities in the gut of CRC patients. The analysis of the shotgun metagenomic sequence data provided some insights into the taxonomic composition and functional potential of the gut microbiome and its association with CRC.
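As a rough illustration of what "taxonomic abundance" refers to, the sketch below converts per-taxon read counts into relative abundances. It is not the team's pipeline: the function name, taxon labels and counts are all hypothetical and chosen only for illustration.

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any taxon")
    return {taxon: count / total for taxon, count in read_counts.items()}


# Hypothetical read counts for one sample (placeholder taxon labels).
example_counts = {"Taxon_A": 600, "Taxon_B": 300, "Taxon_C": 100}
example_abundance = relative_abundance(example_counts)

# Print taxa from most to least abundant.
for taxon, fraction in sorted(
    example_abundance.items(), key=lambda item: item[1], reverse=True
):
    print(f"{taxon}\t{fraction:.1%}")
```

Working with fractions rather than raw counts makes samples with different sequencing depths easier to compare, although real analyses usually apply further normalization.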

Targeting cdk1 for Potential Inhibitors in Colorectal Cancer: A Computational Approach

Ojochenemi A. Enejoh, Pauline Gachanja, Pranavathiyavani Gnanasekar, Shamim Osata, Chinelo H. Okonkwo, Halimat C. Atanda and Uchechukwu C. Ogbodo

Colorectal cancer (CRC) is a major public health concern with serious consequences for those who are affected. Important recurring events in the development of CRC include the disruption of the cell cycle and the over-expression of a few regulators and checkpoint activators. Cyclin-Dependent Kinase 1 (CDK1), a key regulator essential for preserving cell cycle efficiency, has reportedly been linked to CRC. In this study, we set out to identify CDK1 inhibitors with promise for CRC clinical therapy and to evaluate their feasibility using in silico methods. Using molecular docking and molecular dynamics simulation techniques, we examined natural compounds retrieved from the PubChem database for their inhibitory efficacy against CDK1. Binding free energy calculations for the protein-ligand complexes were performed using molecular mechanics with generalized Born and surface area (MMGBSA) solvation. The drug-likeness and ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) characteristics of the successfully screened lead candidates were also profiled.

Multiple Sclerosis Stages and their Differentially Expressed Genes: A Bioinformatics Analysis

Faten ALAYA, Mark T. Kivumbi, Daniel A. Adediran, Nikita Sitharam, Katelyn Cuttler and Itunu Ajiboye

Multiple sclerosis (MS) is a chronic inflammatory and demyelinating disease of the central nervous system. Different courses of MS are distinguished: relapsing-remitting MS (RRMS), secondary progressive MS (SPMS) and primary progressive MS (PPMS). These stages affect the progression and therefore the treatment of the disease. Hence, finding key genes and microRNAs (miRNAs) associated with MS stages and analyzing their interactions is important for better understanding the molecular mechanisms underlying the occurrence and evolution of MS.
In this study, we analyzed public datasets of mRNA and miRNA expression to determine differentially expressed genes (DEGs) and miRNAs (DEMs) between patients with different stages of MS and healthy controls, and between the relapsing and remitting phases of RRMS. We also analyzed miRNA-mRNA regulatory interactions and gene ontology for the DEGs. The findings highlight some key genes and miRNAs as potential biomarkers of RRMS and SPMS that may be involved in the occurrence and evolution of MS.
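To give a sense of what flagging a gene or miRNA as differentially expressed involves, here is a deliberately simplified sketch that compares mean expression between two groups by log2 fold change. It is an illustration under stated assumptions, not the team's method: the function names, gene labels, values, pseudocount and threshold are all hypothetical, and a real analysis would also apply statistical testing with multiple-testing correction.

```python
import math


def log2_fold_change(case_mean, control_mean, pseudocount=1.0):
    """Log2 ratio of mean expression; the pseudocount avoids division by zero."""
    return math.log2((case_mean + pseudocount) / (control_mean + pseudocount))


def flag_candidates(case_means, control_means, threshold=1.0):
    """Return genes whose absolute log2 fold change meets the threshold."""
    flagged = {}
    # Only compare genes measured in both groups.
    for gene in case_means.keys() & control_means.keys():
        lfc = log2_fold_change(case_means[gene], control_means[gene])
        if abs(lfc) >= threshold:
            flagged[gene] = lfc
    return flagged


# Hypothetical mean expression values per group (placeholder gene labels).
case = {"GENE_A": 7.0, "GENE_B": 3.0, "GENE_C": 0.0}
control = {"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 3.0}

candidates = flag_candidates(case, control)
for gene, lfc in sorted(candidates.items()):
    direction = "up" if lfc > 0 else "down"
    print(f"{gene}\t{lfc:+.2f}\t{direction}")
```

A fold-change filter alone ignores variability between samples, which is why it is normally combined with a significance test.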

Investigation of the Impact of Long-term Push-pull Technology on Soil Microbiome Using Shotgun Sequencing

Aneth Bella David, Stanley Onyango, Fredrick Kebaso and Chikodili G. Anaukwu

Push-pull technology is an affordable, effective and environmentally friendly approach to pest management in smallholder cereal farming.
The technology harnesses the dynamics of intercropping to draw insect pests away from the main crop and trap them so that they do not damage it. Research on push-pull technology has focused on plant-plant and plant-insect interactions, ignoring below-ground communities and any role they may play in the functioning of the technology. In a previous study, we used targeted sequencing (16S rRNA gene and ITS region sequencing) to uncover the composition and differential abundance of soil microorganisms in long-term maize push-pull plots relative to monoculture. In addition, diversity patterns of the soil microbes were highlighted.
However, the resolution of the soil microbial taxa was limited to the genus level, at which most bacterial taxa were still unclassified. In addition, functional annotation could not be inferred from targeted sequencing.
In this study, we used shotgun sequencing to improve the resolution of the taxonomic classification of soil microbial communities in long-term push-pull systems and to infer their functions.

Identification of Biomarkers for Colorectal Cancer Using Machine Learning

Brenda Kamau, Rudolph A. Serage, James Wachira, Bonface Onyango and Henry Ndugwa

Colorectal cancer (CRC) is one of the leading causes of mortality worldwide. Early diagnosis of cancer is challenging because the disease is highly heterogeneous. Cancer-related biomarkers play an important role in the early diagnosis and treatment of cancer. Traditional methods of cancer diagnosis are prone to bias or tend to miss key CRC biomarkers. Recent advances in next-generation sequencing (NGS) technologies have been useful in exploring novel gene expression in colon cancer pathogenesis, and these NGS methods have produced very large datasets whose analysis depends heavily on machine learning. Machine learning algorithms have proven to be both effective and computationally efficient for identifying disease biomarkers in high-dimensional datasets. In this project, we used single-cell RNA-seq datasets coupled with LightGBM and XGBoost models to determine CRC biomarkers. We developed a Nextflow pipeline that predicts CRC status from RNA-seq data and extracts biomarkers from CRC single-cell RNA-seq data.

Investigating Antimicrobial Resistance Genes in Kenya, Uganda, and Tanzania Cattle Using Metagenomics

Kauthar M. Omar, George L. Kitundu, Dorcus N. Namikelwa, Felix M. Lisso, Seun E. Olufemi, Adijat O. Jimoh and Abiola A. Babajide

Despite the significant reduction in disease burden brought about by the use of antimicrobial drugs, there are growing global concerns about microbial resistance to these drugs, particularly antibiotics, amid fears that Antimicrobial Resistance (AMR) will reverse previous gains. The dairy and meat sectors, among others, will be heavily affected by AMR. Under the East African Community (EAC) integration agenda, the movement of dairy and meat products is expected to greatly improve trade among member states, but this is hampered by AMR.
Surveillance of the antimicrobial resistance genes present in cattle populations in EAC member states will help guide the formation of policies and standards that govern the movement of these products, as well as proper animal husbandry management, without harming the sector. In this study, we therefore leveraged public metagenomics data and analyzed cattle rumen microbiomes from Kenya, Uganda, and Tanzania to determine the AMR genes currently circulating within the three countries, assess their diversity and infer the reasons for their emergence.

Differential Gene Expression Analysis of Common Bean Lineages During Development

Brenda Kiage and Olaitan I. Awe

In this study, we investigated three common bean varieties selected based on their culinary quality (cooking times). We used public datasets from a previous study in which Rosecoco (fast-cooking), Pinto (slow-cooking) and Canadian Wonder (intermediate-cooking) common bean varieties were grown, collected at different early development stages and sequenced using next-generation technologies at the Vrije Universiteit in Brussels. The aim of this study is to identify, through transcriptomic profiling, genes that are differentially expressed during seed development and that may be relevant to the development of the hard-to-cook defect. Using RNA-seq bioinformatics techniques, we performed data analysis involving traditional next-generation sequencing pipelines and a differential gene expression analysis to determine which genes were expressed in these common bean varieties during development.