BIOINFORMATICS 2022 Abstracts

Full Papers

Paper Nr:	3
Title:	In Silico De Novo Synthesis, Screening, and ADME/T Profiling of DNA-pA104R Inhibitors as Potential African Swine Fever Therapeutics
Authors:	Kim Rafaelle E. Reyes, Timothy Jen R. Roxas, Marineil C. Gomez and Lemmuel L. Tayo
Abstract:	African Swine Fever Virus (ASFV) is a dsDNA virus that causes African Swine Fever (ASF) in wild and domestic hogs. ASF is characterized by hemorrhagic fever, high mortality, and high transmissibility. The binding of DNA to the pA104R protein mediates viral replication and genome packaging. In the present study, we generated nine (9) reference compounds that exhibited high docking affinities through de novo computer-aided drug design (CADD). These compounds were then used as query molecules to find commercially available drug-like compounds using ligand-based virtual screening (VS). We retrieved 900 hit compounds that exhibited the same pharmacophoric activities. These hit compounds were then subjected to drug-likeness filtration to identify those most likely to be developed as commercial drugs based on established parameters. We identified sixty-two (62) drug-like molecules. Molecular docking was then performed to determine the top five compounds with the highest binding affinities against the target protein. ADME/T profiling was done on these compounds to assess their pharmacokinetic properties. Compound 8.45 performed best based on our devised scoring system. This paper may serve as a reference for the discovery and development of anti-ASFV therapeutics.

Paper Nr:	4
Title:	Generative Adversarial Network for the Segmentation of Ground Glass Opacities and Consolidations from Lung CT Images
Authors:	Xiaochen Wang and Natalia Khuri
Abstract:	The coronavirus disease 2019 is a global pandemic that threatens the lives of many people and poses a significant burden for healthcare systems worldwide. Computerized tomography can detect lung infections, especially in asymptomatic cases, and the detection process can be aided by deep learning. Most recent research has focused on the segmentation of the entire infected region in a lung. To automate a more fine-grained analysis, a generative adversarial network, comprising two convolutional neural networks, was developed for the segmentation of ground glass opacities and consolidations from tomographic images. The first convolutional neural network acts as a generator of segmented masks, and the second as a discriminator between real and artificially segmented objects. Experimental results demonstrate that the proposed network outperforms the baseline U-Net segmentation model on a benchmark data set of 929 publicly available images. The Dice similarity coefficients for segmenting ground glass opacities and consolidations are 0.664 and 0.625, respectively.
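For readers unfamiliar with the metric reported above, the following is a minimal sketch of the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), on two binary masks. The function name, the flat 0/1 mask representation, and the toy masks are assumptions made for illustration; they are not taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks.

    Both masks are flat sequences of 0/1 values of equal length
    (an illustrative representation, not the paper's data format).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled positive in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total positive pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks (hypothetical): three positives each, two of them shared.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 0, 1, 1, 1, 0]
print(dice_coefficient(predicted, reference))
```

A value of 1 means the masks coincide and 0 means they do not overlap at all.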

Paper Nr:	11
Title:	Lossy Compressor Preserving Variant Calling through Extended BWT
Authors:	Veronica Guerrini, Felipe A. Louza and Giovanna Rosone
Abstract:	A standard format used for storing the output of high-throughput sequencing experiments is the FASTQ format. It comprises three main components: (i) headers, (ii) bases (nucleotide sequences), and (iii) quality scores. FASTQ files are widely used for variant calling, where sequencing data are mapped to a reference genome to discover variants that may be used for further analysis. There are many specialized compressors that exploit redundancy in FASTQ data, focusing on either the bases or the quality scores component alone. In this paper we consider the novel problem of lossy compression of FASTQ data, in a reference-free way, by modifying both components at the same time while preserving the important information of the original FASTQ. We introduce a general strategy, based on the Extended Burrows-Wheeler Transform (EBWT) and positional clustering, and we present implementations in both internal memory and external memory. Experimental results show that the lossy compression performed by our tool achieves good compression while preserving information relevant to variant calling better than competing tools. Availability: the software is freely available at https://github.com/veronicaguerrini/BFQzip.
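The three FASTQ components named above can be illustrated with a small parser. This is only a sketch of the common four-line record layout (header starting with "@", bases, a "+" separator line, quality string); it is not the BFQzip code, and the names FastqRecord, parse_fastq, and phred_scores, the Phred+33 offset, and the sample record are assumptions for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    header: str      # read identifier, without the leading "@"
    bases: str       # nucleotide sequence
    qualities: str   # one quality character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (no multi-line sequences)."""
    it = iter(lines)
    for raw in it:
        header = raw.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        rest = [next(it, None) for _ in range(3)]
        if any(part is None for part in rest):
            raise ValueError("truncated FASTQ record")
        bases, separator, qualities = (part.rstrip("\n") for part in rest)
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(bases) != len(qualities):
            raise ValueError("bases and qualities differ in length")
        yield FastqRecord(header[1:], bases, qualities)


def phred_scores(qualities: str, offset: int = 33) -> List[int]:
    """Decode quality characters, assuming the Phred+33 encoding."""
    return [ord(ch) - offset for ch in qualities]


# Hypothetical single-record input.
sample = ["@read1\n", "ACGTN\n", "+\n", "IIII#\n"]
records = list(parse_fastq(sample))
print(records[0].header, records[0].bases, phred_scores(records[0].qualities))
```

A lossy compressor of the kind described would modify the bases and qualities fields of such records; how it does so is specific to the paper and is not sketched here.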

Paper Nr:	16
Title:	Aggregating Statistically Correlated Metabolic Pathways Into Groups to Improve Prediction Performance
Authors:	Abdur Rahman M. A. Basher and Steven J. Hallam
Abstract:	Metabolic pathway prediction from genomic sequence information is an essential step in determining the capacity of living things to transform matter and energy at different levels of biological organization. A detailed and accurate pathway map enables researchers to interpret and engineer the flow of biological information from genotype to phenotype in both organismal and multi-organismal contexts. In this paper, we propose two novel hierarchical mixture models, SOAP (sparse correlated pathway group) and SPREAT (distributed sparse correlated pathway group), to improve pathway prediction outcomes. Both models leverage pathway abundance to represent an organismal genome as a mixed distribution of groups, and each group, in turn, is a mixture of pathways. Moreover, both models deal with missing potential pathways in the training set by provisioning supplementary pathways into the learning framework as part of noise reduction efforts. Because the introduction of supplementary pathways may lead to overestimation of some pathways, dual sparseness is applied. The resulting pathway group dataset is then used to train multi-label learning algorithms. Model effectiveness was evaluated on metabolic pathway prediction, where the correlated models, SOAP in particular, were able to equal or exceed the performance of previous pathway prediction algorithms on organismal genomes.

Paper Nr:	22
Title:	Efficient k-mer Indexing with Application to Mapping-free SNP Genotyping
Authors:	Mattia Marcolin, Francesco Andreace and Matteo Comin
Abstract:	Advances in sequencing technologies and computational methods have enabled rapid and accurate identification of genetic variants. Accurate genotype calls and allele frequency estimations are crucial for population genomics analyses. One of the most demanding steps in the genotyping pipeline is mapping reads to the human reference genome. Recently, mapping-free methods, like Lava and VarGeno, have been proposed for the genotyping problem. They are reported to perform 30 times faster than a standard alignment-based genotyping pipeline while achieving comparable accuracy. Moreover, these methods are able to include known genomic variants in the reference, making read mapping and genotyping variant-aware. However, in order to run they require a large k-mer database, of about 60 GB, to be loaded in memory. In this paper we study the problem of genotyping using new efficient data structures based on k-mer set compression, and we present a fast mapping-free genotyping tool, named GenoLight. GenoLight reports accuracy results similar to the standard pipeline, but it is up to 8 times faster. Also, GenoLight uses between 5 and 10 times less memory than the other mapping-free tools, and it can be run on a laptop. Availability: https://github.com/CominLab/GenoLight.
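The general idea behind mapping-free genotyping with k-mers can be sketched as follows: collect the k-mers that overlap each allele of a known SNP, then count how many read k-mers match each set. This is a toy illustration only, not the GenoLight, Lava, or VarGeno algorithm; the function names, the flanking sequences, the reads, and the simplification that all reads come from the forward strand are assumptions.

```python
def kmers(sequence, k):
    """Return every length-k substring of sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def allele_kmers(left_flank, allele, right_flank, k):
    """k-mers overlapping a single-base allele placed between two flanks."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Keep only the k-1 bases on each side that can share a k-mer with the allele.
    window = left_flank[-(k - 1):] + allele + right_flank[:k - 1]
    return set(kmers(window, k))


def count_support(reads, kmer_set, k):
    """Number of read k-mers found in kmer_set (forward strand only)."""
    return sum(1 for read in reads for km in kmers(read, k) if km in kmer_set)


# Hypothetical SNP: reference allele A, alternate allele G.
K = 4
LEFT, RIGHT = "GATTAC", "CCTGAA"
ref_set = allele_kmers(LEFT, "A", RIGHT, K)
alt_set = allele_kmers(LEFT, "G", RIGHT, K)

# Hypothetical reads: one carries the reference allele, two the alternate.
reads = ["TTACACCTG", "ATTACGCCT", "TACGCCTGA"]
ref_support = count_support(reads, ref_set, K)
alt_support = count_support(reads, alt_set, K)
print(ref_support, alt_support)
```

In practice the k-mer sets for all known variants form the large database mentioned above, which is why compact k-mer set representations matter for memory use.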

Short Papers

Paper Nr:	5
Title:	Novel Role of Transcriptional Factor Kaiso in HIV Infection
Authors:	Zainab H. Alwan, Gopal Reddy and Balasubramanyam Karanam
Abstract:	The role of Kaiso, a POZ-ZF transcriptional factor, in HIV infection, which has disproportionately affected African Natives as well as African Americans, has not been well studied. This research therefore aimed to investigate the level of expression and the role of Kaiso in HIV-1 infected African Natives compared with patients in the United States. In silico data from 185 whole blood samples were analyzed to study array-based gene expression in a Gene Expression Omnibus (GEO) dataset from the National Center for Biotechnology Information (NCBI). Different bioinformatics approaches were used to analyze the data. Two or more groups of samples were compared using GEO2R to identify differentially expressed genes across experimental conditions. Pathways that were significantly associated with specific gene sets were determined by Gene Set Enrichment Analysis (GSEA). Results showed a higher level of Kaiso expression in HIV-1 patients compared to healthy controls (p = 3.89e-10), and it was significantly higher in African Natives compared to United States patients (p = 0.002). Importantly, this study revealed a negative correlation between Kaiso expression and CD4+ T cell count in HIV-1 infected African Native patients (p = 0.003). This negative correlation between Kaiso and CD4+ T cell count was accompanied by increased viral load in African Natives, with a higher viral set point compared to US HIV-1 patients. These data may at least partly explain the faster progression to AIDS in African Natives than in US patients. Kaiso-associated enrichment pathways showed that Kaiso upregulation may contribute to CD4 depletion in HIV-1 infection, and may upregulate marker genes of HIV-associated neurological impairment. The results also showed that Kaiso expression may be associated with increased Wnt/β-catenin signaling by downregulating GSK3β, MAPK1 and MAPK3 through different downregulated pathways in African Native patients.
This study suggests that Kaiso may play a role in the crosstalk between different pathways in HIV-1 infection. In conclusion, the present study suggests, for the first time, that Kaiso expression levels may play a role in the faster progression of HIV-1 infection towards AIDS in patients of African ancestry, possibly through the involvement of the Wnt/β-catenin signaling pathway. The data also suggest that Kaiso expression level may contribute to increased crosstalk between different pathways in HIV-1 pathogenesis. Further studies are needed to fully delineate the role of Kaiso in different HIV-1 infected ethnic groups through the involvement of different intermediary pathways.

Paper Nr:	6
Title:	A Python SDK for Authoring and using Computer-interpretable Guidelines
Authors:	Marcus Barann, Stefan Heldmann, Jan Klein and Stefan Kraß
Abstract:	In this paper we describe a Python SDK that we developed and used to create a decision support system (DSS) for determining and presenting clinical practice guideline (CPG) recommendations for individual patients. Computer-interpretable guidelines (CIGs) are formalisms that represent CPG knowledge. Our Python SDK implements a model and an engine for a CIG formalism that can be easily integrated into any Python-based application. We describe important aspects of creating a guideline model with our CIG and present a web application that interacts with our guideline engine through a REST API. The web application implements generic components to manage and display the current input needs, recommendations, and statements. In comparison to PROforma, we added predicate components, which facilitate the reuse of logical expressions. Arguments refer to predicates instead of including expressions, which allows the same expression to be reused in multiple arguments. We also allow predicates to be used in other expressions, such as the expressions of other predicates and task preconditions. To facilitate the integration of our CIG in decision support systems, we added properties to all PROforma components that represent a code from a terminology system.

Paper Nr:	8
Title:	VAEResTL: A Novel Generative Model for Designing Complementarity Determining Region of Antibody for SARS-CoV-2
Authors:	Saeed Khalilian, Zahra Moti, Arian Baloochestani, Yeganeh Hallaj, Alireza Chavosh and Zahra Hemmatian
Abstract:	The global impact of the COVID-19 pandemic underlines the importance of developing a competent machine learning (ML) approach that can rapidly design therapeutics and prophylactics such as antibodies/nanobodies against novel viral infections despite data shortage problems and sequence complexity. Here, we propose a novel end-to-end deep generative model based on a convolutional Variational Autoencoder (VAE), a Residual Neural Network (ResNet), and Transfer Learning (TL), named VAEResTL, that can competently generate CDR-H3 sequences for a novel target lacking sufficient training data. We further demonstrate that our proposed method generates sequences for the third complementarity-determining region of the heavy chain (CDR-H3) for designing and developing therapeutic antibodies/nanobodies that can bind to different variants of SARS-CoV-2 despite the shortage of SARS-CoV-2 training data. The predicted CDR-H3 sequences are then screened and filtered for their developability parameters, namely viscosity, clearance, solubility, stability, and immunogenicity, through several in silico steps, resulting in a list of highly optimized lead candidates.

Paper Nr:	10
Title:	Unsupervised Learning to Understand Patterns of Comorbidity in 633,330 Patients Diagnosed with Osteoarthritis
Authors:	Marta Pineda-Moncusi, Victoria Y. Strauss, Danielle E. Robinson, Daniel Prieto-Alhambra and Sara Khalid
Abstract:	With the advent of big data in healthcare, machine learning has rapidly gained popularity due to its potential to analyse large volumes of complex data from a variety of sources. Unsupervised learning can be used to mine data and discover patterns such as sub-groups within large patient populations. However, challenges remain with implementation in large-scale datasets and with the interpretability of solutions in a real-world context. This work presents an application of unsupervised clustering techniques for discovering patterns of comorbidities in a large dataset of osteoarthritis patients, with a view to discovering interpretable and clinically meaningful patterns.

Paper Nr:	13
Title:	Virtual Planning and Simulation of Coarctation Repair in Hypoplastic Aortic Arches: Is Fixing the Coarctation Alone Enough?
Authors:	Seda Aslan, Xiaolong Liu, Qiyuan Wu, Paige Mass, Yue-Hin Loke, Narutoshi Hibino, Laura Olivieri and Axel Krieger
Abstract:	Coarctation of the aorta (CoA) is a congenital heart disease that may coexist with transverse arch hypoplasia (TAH). Infants who suffer from both conditions are often treated only for CoA at the initial repair if the degree of TAH is diagnosed as mild. In this study, we investigated the effect of virtually repairing the CoA of three patients (n=3) who also suffer from TAH. We repaired the CoA by using virtual stents that were modeled based on the descending aorta diameters of the patients. Using computational fluid dynamics (CFD) simulations, we investigated the changes in time-averaged wall shear stress (TAWSS) after the virtual repair and calculated the peak systolic pressure drop (PSPD), which is an indicator of the performance of the repair. The magnitude of TAWSS was reduced in the repaired CoA regions in all the patients. The PSPD was improved in two patients, remaining above 20 mmHg in one of them. There was no significant change in PSPD for one patient after the virtual repair. The results may help clinicians gain better insights into whether CoA repair alone is sufficient in patients with existing TAH.

Paper Nr:	14
Title:	Adjustive Linear Regression and Its Application to the Inverse QSAR
Authors:	Jianshen Zhu, Kazuya Haraguchi, Hiroshi Nagamochi and Tatsuya Akutsu
Abstract:	In this paper, we propose a new machine learning method, called adjustive linear regression, which can be regarded as an ANN on an architecture with an input layer and an output layer of a single node, wherein an error function is minimized by choosing not only weights of the arcs but also an activation function at each node in the two layers simultaneously. Under some conditions, such a minimization can be formulated as a linear program (LP) and a prediction function with adjustive linear regression is obtained as an optimal solution to the LP. We apply the new machine learning method to a framework of inferring a chemical compound with a desired property. From the results of our computational experiments, we observe that a prediction function constructed by adjustive linear regression for some chemical properties drastically outperforms that by Lasso linear regression.

Paper Nr:	27
Title:	A Comprehensive and Scientifically Accurate Pharmaceutical Knowledge Ontology based on Multi-source Data
Authors:	Pengfei Wang, Yiqing Mao, Wei Song, Wenting Jiang, Yang Liu, Liumeng Zheng, Bin Ma, Qingqing Sun and Sheng Liu
Abstract:	Recently, knowledge graphs have been applied by large pharmaceutical companies to improve the efficiency of drug discovery. Specifically, knowledge graphs based on drug ontology have been used for many purposes. Current drug ontologies have different scopes, but mainly focus on the description of basic drug information. Here, we describe a comprehensive pharmaceutical knowledge ontology, including information on active ingredients, indications, inactive ingredients, drugs, clinical trials, organs and tissues, literature, patents, targets, therapeutics, and biomolecules. Using multiple data sources, we apply a seven-step method for ontology modelling using Protégé software. A comprehensive pharmaceutical knowledge ontology model is established to complete the knowledge representation of drug information. By means of ontology theory, the pharmaceutical knowledge is modelled, standardized and networked, so as to clarify the knowledge structure and quickly acquire related knowledge and logical relationships. In the future, knowledge graphs based on this ontology could help to deal with the dispersion, heterogeneity, redundancy and fragmentation of medical big data, to share and integrate pharmaceutical data, and to provide a set of solutions for the networked development of pharmaceutical knowledge.

Paper Nr:	1
Title:	Bioinformatics Analysis of Gene Targets for Birt-Hogg-Dube Syndrome Associated with Renal Cell Cancer using NetworkAnalyst
Authors:	Mariemme Keilsy D. Martos and Marineil C. Gomez
Abstract:	CrRCC (chromophobe renal cell cancer) belongs to the group of non-clear cell cancers, which accounts for 4%-5% of RCC. Birt-Hogg-Dube Syndrome (BHDS), a subtype of crRCC, occurs due to the germline mutation of Folliculin (FLCN). Each disease has a designated treatment and a contrasting prognosis, but the histological features of this syndrome may overlap with those of the other subtypes of RCC, which makes it difficult to differentiate. Because the syndrome is uncommon, only a limited amount of information is available. This study aims to differentiate the pathways and genes involved in BHDS using NetworkAnalyst. The dataset was gathered from ArrayExpress and yielded 395 significant DEGs in BHDS, which were then used to produce a pathway enrichment network and a protein-protein interaction (PPI) network. Cytoskeletal protein binding correlating with hub genes KIT, RHOB, and UBC in BHDS indicates that this disease has a high risk for cell metastasis. This study suggests a promising new therapeutic target for the disease.

Paper Nr:	2
Title:	Cancer Detec-Lung Cancer Diagnosis Support System: First Insights
Authors:	Nelson Faria, Sofia Campelos and Vítor Carvalho
Abstract:	Lung cancer is the type of cancer that causes the most deaths worldwide, and the sooner it is discovered, the more options there are for the patient to be treated. An accurate histological classification of tumours is essential for lung cancer diagnosis and adequate patient management. Whole-slide images (WSI) generated from tissue samples can be analysed using Deep Learning techniques to assist pathologists. This study gives an overview of lung cancer and explores the different types of implementations undertaken to date. These methods follow a two-step implementation in which the tasks consist first of the detection of the tumour and then of the histologic classification of the tumour. To detect the neoplastic cells, the WSI is split into patches, and then a convolutional neural network is applied to identify the tumour regions and generate a heatmap highlighting them. In the next step, features are extracted from the neoplastic regions and submitted to a classifier to determine the histologic type of tumour present in each patch. Moreover, based on the literature review, this paper proposes a possible approach to overcome the limitations found in current models, with better performance and accuracy, that could be used as an aid in the pathological diagnosis of lung cancer.

Paper Nr:	9
Title:	Brain MRI Images Pre-processing of Heterogeneous Data-sets for Deep Learning Applications
Authors:	S. Ostellino, A. Benso and G. Politano
Abstract:	Automatic segmentation of tissues and lesions is a very important step in any Artificial Intelligence pipeline designed to analyze medical images (especially MRI). This is particularly true for brain MRI images of patients affected by neurological pathologies like Multiple Sclerosis (MS). To perform well, cutting-edge Artificial Intelligence approaches like Deep Learning need a huge amount of training data. Unfortunately, available data-sets of MRI medical images often lack annotations, standardized acquisition protocols, formats and dimensions. This heterogeneity often makes it very difficult to use and integrate different data-sets in the same pipeline. Available image pre-processing tools have specific requirements and might not be adequate for extensive usage with heterogeneous data-sets. This paper presents ongoing work on a comprehensive and consistent pre-processing pipeline for brain MRI images, intended for Deep Learning applications and enabling the creation of a congruous data-set. The pipeline was tested with the publicly available ISBI2015 data-set.

Paper Nr:	12
Title:	Hybrid Gene Regulation Models of Mammalian Circadian Cycles
Authors:	Lelde Lace, Karlis Cerans, Karlis Freivalds, Gatis Melkus and Juris Viksna
Abstract:	We present hybrid-system-based gene regulation models of the mammalian circadian cycle and the results of model behaviour analysis. The models cover genes of two recently proposed biological models with 5- and 3-gene ‘core oscillators’. The advantage of the HSM framework used is limited model dependence on parameter values, which are described only at a qualitative level, to the extent that they affect the models’ observable behaviour. The models represent gene regulatory networks in terms of genes, proteins, binding sites, regulatory functions, and constraints on growth rates and binding affinities. Although such models provide limited accuracy, they are less dependent on parameter fitting and can provide predictions on some biological aspects of gene regulation that do not depend on the choice of particular parameter values. The models can provide biologically feasible predictions about synchronised oscillation of the involved genes, and about functions that regulate gene activity, on the basis of regulatory network topology alone. The work also includes the development of new analysis methods, in particular for the analysis of available trajectories in HSM state spaces and the derivation of constraints that are needed for state transition trajectories to satisfy the required specific properties.

Paper Nr:	15
Title:	Separation of Concerns in Extended Epidemiological Compartmental Models
Authors:	A Yvan Guifo Fodjo, Mikal Ziane, Serge Stinckwich, Bui Thi Mai Anh and Samuel Bowong
Abstract:	Epidemiological models become more and more complex as new concerns are taken into account (age, sex, spatial heterogeneity, containment or vaccination policies, etc.). This is problematic because these aspects are typically intertwined, which makes models difficult to extend, change or reuse. The Kendrick approach has shown promising results in separating epidemiological concerns but is restricted to homogeneous compartmental models. In this paper, we report on an attempt to generalize the Kendrick approach to support some aspects of contact networks, thereby improving the predictive quality of models with significant heterogeneity in the structure of contacts, while keeping the simplicity of compartmental models. This approach has been validated on two different techniques to generalize compartmental models.
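As background for the term "compartmental model", here is a minimal sketch of the classic homogeneous SIR model integrated with forward Euler steps. It is a textbook illustration, not the Kendrick approach or the generalization described in the abstract; the function name, parameter values, and step size are assumptions.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the homogeneous SIR model.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, constant in this model
    history = [(s, i, r)]
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: 1000 people, 10 initially infected.
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=100)
final_s, final_i, final_r = history[-1]
print(round(final_s), round(final_i), round(final_r))
```

Adding concerns such as age groups or spatial patches multiplies the number of compartments and transition terms, which is the entanglement the abstract refers to.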

Paper Nr:	19
Title:	Replicability of Differentially Expressed Genes Versus Biological Pathways Biomarkers in Diagnosing Sepsis
Authors:	Kelsey Winkeler and Carly A. Bobak
Abstract:	It is generally believed that biological pathways representing curated gene sets are not only more interpretable, but also more replicable and reproducible than gene signatures. With the falling costs of next generation sequencing, we are approaching a point where the cost of fully sequencing the transcriptome is competitive with quantifying a targeted gene expression signature, which opens up the possibility of pathway signatures for infectious disease. In this work, we evaluated whether pathway-based signatures are really more reproducible than gene signatures (improvement between 0.83 and over 1 million fold), and amended a meta-analysis framework known for generating highly reproducible gene signatures to instead produce pathway signatures (AUC improves from 0.854 to 0.964 and 0.556 to 0.677 between gene and pathway signatures in independent validation data). We conclude that pathway-based signatures show clinical promise for the diagnosis of infectious disease, and there is a growing need for methods considering such signatures.
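The AUC values quoted above are areas under the ROC curve. As a minimal sketch, the AUC of a signature score can be computed as the fraction of (case, control) pairs in which the case scores higher, counting ties as half. The function name and the toy scores are assumptions for illustration and do not come from the paper.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical signature scores for cases and controls.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(auc(cases, controls))
```

An AUC of 0.5 corresponds to a signature that ranks cases and controls no better than chance, and 1.0 to perfect separation.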