BIOINFORMATICS 2023 Abstracts

Full Papers

Paper Nr:	4
Title:	Condition for Sustained Oscillations in Repressilator Based on a Hybrid Modeling of Gene Regulatory Networks
Authors:	Honglu Sun, Jean-Paul Comet, Maxime Folschette and Morgan Magnin
Abstract:	In this work, we study the existence of sustained oscillations in the “canonical” repressilator, a basic synthetic circuit of 3 genes leading to sustained oscillations. Previous works mostly used differential equations to study the repressilator. In our work, a pre-existing hybrid modeling framework of gene regulatory networks called HGRN is used to model this system. Compared to differential equations, dynamical properties of HGRNs are easier to prove theoretically because of their lower dynamical complexity. The objective of this work is to find conditions for the existence of sustained oscillations described by separable constraints on parameters. With such separable constraints, each parameter is constrained individually by an interval, which can provide useful information for the design of synthetic circuits. Our two major contributions are the following: firstly, we develop, by using the Poincaré map, a necessary and sufficient condition for the existence of sustained oscillations; then, based on this condition, we give a method using the range enclosure property of Bernstein coefficients to compute compatible separable constraints. By applying this method, we successfully obtain sets of conditions for the existence of sustained oscillations described as separable constraints.
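The range enclosure property mentioned in this abstract can be illustrated on a toy case. The sketch below is not the authors' method: it assumes a univariate polynomial on [0, 1] given by its power-basis coefficients, and the function names are hypothetical. It converts the polynomial to the Bernstein basis and uses the smallest and largest Bernstein coefficients as bounds on its range.

```python
from fractions import Fraction
from math import comb

def bernstein_coefficients(power_coeffs):
    """Bernstein coefficients on [0, 1] of p(x) = sum_j power_coeffs[j] * x**j."""
    n = len(power_coeffs) - 1
    return [
        sum(Fraction(comb(i, j), comb(n, j)) * power_coeffs[j] for j in range(i + 1))
        for i in range(n + 1)
    ]

def range_enclosure(power_coeffs):
    """Interval guaranteed to contain p(x) for every x in [0, 1]."""
    b = bernstein_coefficients(power_coeffs)
    return min(b), max(b)

# Toy polynomial p(x) = x**2 - x (an assumption for illustration only).
lower, upper = range_enclosure([0, -1, 1])
print(lower, upper)
```

For this toy polynomial the enclosure is [-1/2, 0], which contains the true range [-1/4, 0]. When an enclosure of this kind lies entirely on one side of zero, a sign condition on the polynomial holds over the whole interval, which is the sort of guarantee an interval constraint on a parameter needs.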

Paper Nr:	9
Title:	Local Forward-Motion Panoramic Views for Localization and Lesion Detection for Multi-Camera Wireless Capsule Endoscopy Videos
Authors:	Marina Oliveira and Helder Araujo
Abstract:	Understanding Wireless Capsule Endoscopy videos is a challenging process since it demands a substantial amount of time and expertise to recognise and accurately interpret findings. The low lesion detection rate with this technology is mainly attributed to the poor image quality of the retrieved frames, the large sets of image data information to process and the time constraints. To overcome these limitations, in this paper, we explore a methodology for constructing local forward-motion panoramic overviews to condense valuable information for lesion detection and localization procedures.

Paper Nr:	11
Title:	Hemodynamics of Convergent Cavopulmonary Connection with Ventricular Assist Device for Fontan Surgery: A Computational and Experimental Study
Authors:	Qiyuan Wu, Vincent Cleveland, Seda Aslan, Xiaolong Liu, Jacqueline Contento, Paige Mass, Byeol Kim, Catherine Pollard, Pranava Sinha, Yue-Hin Loke, Laura Olivieri and Axel Krieger
Abstract:	Fontan surgery is the clinical standard for single ventricle heart disease, with total cavopulmonary connection (TCPC) as the current preferred configuration. Mechanical circulatory support (MCS) is often desired to improve hemodynamics and reduce post-surgical complications. Convergent cavopulmonary connection (CCPC) was recently proposed to solve the difficulty of integrating MCS in TCPC. In this study, we investigated the hemodynamics of the CCPC conduit with a ventricular assist device (VAD) integrated and explored indexed power jump (iPJ) and time-averaged wall shear stress (TAWSS) by computational fluid dynamics (CFD) with assistance from flow loop experiments. Positive time-averaged iPJ was observed in the cases with limited cardiac output, and regions with non-physiologic low TAWSS were significantly reduced for all cases. These results could strengthen the feasibility of this novel CCPC Fontan configuration as a solution for MCS integration.

Paper Nr:	12
Title:	Efficient Representation of Biochemical Structures for Supervised and Unsupervised Machine Learning Models Using Multi-Sensoric Embeddings
Authors:	Katrin S. Bohnsack, Alexander Engelsberger, Marika Kaden and Thomas Villmann
Abstract:	We present an approach to efficiently embed complex data objects from the chem- and bioinformatics domain, such as graph structures, into Euclidean vector spaces so that these data can be handled by machine learning models. The method is denoted as sensoric response principle (SRP). It uses a small subset of objects serving as so-called sensors. Only for these sensors do the computationally demanding dissimilarity calculations, e.g. graph kernel computations, have to be executed, and the resulting response values are used to generate the object embedding into a Euclidean representation space. Thus, the SRP avoids calculating all object dissimilarities for embedding, which usually is computationally costly due to the complex proximity measures in use. In particular, we consider strategies to determine the number of sensors for an appropriate embedding as well as selection strategies for SRP. Finally, the quality of the embedding is evaluated w.r.t. the preservation of the original object relations in the embedding space. The SRP can be used for unsupervised and supervised machine learning. We demonstrate the ability of the approach for classification learning in the context of an interpretable machine learning classifier.
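One plausible reading of the sensor idea is sketched below. It is an assumption rather than the authors' implementation: each object is represented by the vector of its dissimilarities to a few sensors, the function names are hypothetical, and a cheap mismatch count stands in for an expensive graph kernel.

```python
def srp_embed(objects, sensors, dissimilarity):
    """Map each object to the vector of its dissimilarities to the sensors."""
    return [[dissimilarity(obj, s) for s in sensors] for obj in objects]

def mismatch_distance(a, b):
    # Toy dissimilarity on strings (assumption); a real setting would use
    # a costly measure such as a graph kernel here.
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

objects = ["ACGT", "ACGA", "TTGT", "ACCT", "TTTT"]
sensors = objects[:2]  # naive choice; the paper studies selection strategies
embedding = srp_embed(objects, sensors, mismatch_distance)
print(embedding)
```

Each object becomes a point whose dimension equals the number of sensors, so the number of dissimilarity evaluations grows with the number of objects times the number of sensors rather than with all object pairs.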

Paper Nr:	13
Title:	Fast Exact String to D-Texts Alignments
Authors:	Njagi M. Mwaniki, Erik Garrison and Nadia Pisanti
Abstract:	A D-string is a degenerate string representing similar and aligned strings by collapsing common fragments and highlighting variants. D-strings can represent an MSA or a pan-genome. In this paper we propose a new, fast and exact method to align a string to a D-string. In recent years, aligning a sequence to a pangenome has become a central problem in computational genomics and pangenomics. A fast and accurate solution to this problem can serve as a toolkit for many crucial tasks such as read-correction, Multiple Sequence Alignment (MSA), genome assemblies, and variant calling, just to name a few. An implementation of our tool is publicly available on GitHub at https://github.com/urbanslug/dsa.

Paper Nr:	17
Title:	Ontology-Powered Boosting for Improved Recognition of Ontology Concepts from Biological Literature
Authors:	Pratik Devkota, Somya D. Mohanty and Prashanti Manda
Abstract:	Automated ontology curation involves developing machine learning models that can learn patterns from scientific literature to predict ontology concepts for pieces of text. Deep learning has been used in this area with promising results. However, these models often ignore the semantically rich information that’s embedded in the ontologies and treat ontology concepts as independent entities. Here, we present a novel approach called Ontology Boosting for improving prediction accuracy of automated curation techniques powered by deep learning. We evaluate the performance of our models using Jaccard semantic similarity – a metric designed to assess similarity between ontology concepts. Semantic similarity metrics have the capability to estimate partial similarity between ontology concepts thereby making them ideal for evaluating the performance of annotation systems such as deep learning where the goal is to get as close as possible to human performance. We use the CRAFT gold standard corpus for training our architectures and show that the Ontology Boosting approach results in substantial improvements in the performance of these architectures.
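The partial-credit behaviour of a Jaccard-style semantic similarity can be seen on a toy ontology. The sketch below assumes the common definition (shared ancestors divided by the union of ancestors, each concept counted among its own ancestors); the ontology and function names are hypothetical and are not taken from the paper or from CRAFT.

```python
def ancestors(concept, parents):
    """Return the concept together with all of its ancestors."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def jaccard_similarity(a, b, parents):
    """Shared ancestors over the union of ancestors of two concepts."""
    sa, sb = ancestors(a, parents), ancestors(b, parents)
    return len(sa & sb) / len(sa | sb)

# Hypothetical toy ontology: child -> list of parents.
toy_parents = {
    "cell": ["entity"],
    "neuron": ["cell"],
    "motor neuron": ["neuron"],
    "astrocyte": ["cell"],
}
print(jaccard_similarity("motor neuron", "neuron", toy_parents))
print(jaccard_similarity("neuron", "astrocyte", toy_parents))
```

A prediction of "neuron" for a true label of "motor neuron" scores 0.75 here instead of being counted as a plain miss, while the more distant "astrocyte" scores 0.5 against "neuron".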

Paper Nr:	18
Title:	Skin Lesion Segmentation Using Attention-Based DenseUNet
Authors:	Anwar Jimi, Hind Abouche, Nabila Zrira and Ibtissam Benmiloud
Abstract:	Skin lesion segmentation in dermoscopic images is still a challenging problem due to the blurry borders and low contrast of the lesions. Deep learning networks, like U-Net, have been successfully used to segment medical images over the past few years, and their performance has improved in terms of time and accuracy. This paper proposes an automated method for segmenting lesion boundaries that combines two architectures (i.e., the U-Net and the DenseNet as backbone) as well as the attention mechanism. Moreover, we also used adaptive gamma correction to enhance the contrast of the image, which considerably enhanced the segmentation results. Furthermore, we trained our model on the ISIC 2016, the ISIC 2017, and the ISIC 2018 datasets. Finally, the qualitative and quantitative experimental results of the skin lesion segmentation are very promising.

Paper Nr:	24
Title:	PUMP: An Underspecification Analysis Tool
Authors:	Jonathan Tang, McClain Kressman, Harsha Lakshmankumar, Belle Aduaka, Ava Jakusovszky, Paul Anderson and Jean Davidson
Abstract:	In fields such as biomedicine, neural networks may encounter a problem known as underspecification, in which models learn a solution that performs poorly and inconsistently when deployed in more generalized real-world scenarios. A current barrier to studying this problem in biomedical research is a lack of tools engineered to uncover and measure the degree of underspecification. For this reason, we have developed Predicting Underspecification Monitoring Pipeline or PUMP. We demonstrate the utility of PUMP in predictive modeling of breast cancer subtypes. In addition to providing methods to measure, monitor, and predict underspecification, we explore methods to minimize the production of underspecified models by incorporating biological insight that aims to rank potential models.

Paper Nr:	37
Title:	Unsupervised Cardiac Differentiation Stage Portraying and Pseudotime Mapping Based on Gene Expression Data
Authors:	Sofia P. Agostinho, Joaquim S. Cabral, Ana N. Fred and Carlos V. Rodrigues
Abstract:	This paper presents a reanalysis of a previously published RNA-seq dataset, using several unsupervised learning algorithms to study, from a whole transcriptome point of view, the changes occurring during stem cell cardiac differentiation. The main objectives of this work were to highlight differences in gene expression patterns between differentiation stages and to create a strategy to map bulk RNA-seq samples onto a pseudotime axis to analyse, quantitatively, how the transcriptome is evolving in comparison to the real culture time. The method proposed here effectively portrayed the transcriptomic changes that occurred throughout the differentiation processes, with a visual representation of the entire transcriptome. The portraits revealed over-expressed genes correlated with different biological processes and gene sets for each stage of the differentiation. The time mapping results highlighted not only the abrupt changes in the transcriptome due to the activation and inhibition of the Wnt signalling pathway, but also the fact that upon the effect of the Wnt inhibitor, and despite the additional culture days, the transcriptome is not changing as fast as previously, posing questions regarding maturation strategies. Taken together, the proposed workflow was considered promising as a tool to compare different differentiation protocols and maturation strategies.

Short Papers

Paper Nr:	5
Title:	MRQPMS: Design of a Map Reduce Bioinspired Model for Solving Quorum Planted Motif Search for High-Speed Deployments
Authors:	Aditi R. Durge and Deepti D. Shrimankar
Abstract:	Quorum Planted Motif Search (qPMS) is a specialized field of PMS which provides matching outputs only if the search motif appears in q% of the results. Designing qPMS models is a multidomain task that involves collection of application-specific datasets, pre-processing of these datasets for identification of frequent patterns, matching of these patterns, and contextual post-processing operations. Due to large-length sequences, the search process is highly complex and requires dataset-specific optimizations. To perform these optimizations, a wide variety of tools have been developed by researchers, and each of them varies in terms of its qualitative & quantitative characteristics. Most of these models are non-reconfigurable and can be used only for specific datasets, while others present highly complex search mechanisms, which limits their applicability. To overcome these limitations, this text proposes the design of a Map Reduce Model for solving Quorum Planted Motif Search for high-speed deployments. The proposed model initially stores input genomic sequences via a Map Reduce framework, which assists in faster search via use of unique entity-level keys for different sequence types. These keys are stored via the Apache Hadoop framework, which assists in improving search performance under large dataset scenarios. Due to the use of Map Reduce, the model is capable of higher scalability, better flexibility, low delay, and security via parallel processing operations. This was possible due to pre-processing of input DNA sequences and reducing them into index-based searchable formats. The model also deploys a Genetic Algorithm (GA) for identification of optimum Q values for enhanced accuracy under different use cases. It was tested for protein & DNA sequences, and its performance was evaluated in terms of accuracy, retrieval delay, precision, & throughput parameters, and compared with various state-of-the-art models under different use case scenarios. Based on this comparison, it was observed that the proposed model was capable of achieving 3.5% higher accuracy, 9.4% lower delay, 2.9% higher precision, and 8.5% higher throughput under different scenarios. Due to these advantages, the proposed model is capable of deployment for a wide variety of real-time use cases.
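The quorum condition itself can be stated in a few lines. The sketch below is a brute-force illustration only, not the Map Reduce model described in the abstract: it assumes the usual qPMS formulation in which a motif counts as present in a sequence if some window matches it with at most d mismatches, and it takes the quorum q as a fraction. All names and the toy data are hypothetical.

```python
def occurs(motif, seq, d):
    """True if some window of seq matches motif with at most d mismatches."""
    width = len(motif)
    return any(
        sum(a != b for a, b in zip(motif, seq[i:i + width])) <= d
        for i in range(len(seq) - width + 1)
    )

def satisfies_quorum(motif, sequences, d, q):
    """True if motif occurs in at least a fraction q of the sequences."""
    hits = sum(occurs(motif, s, d) for s in sequences)
    return hits >= q * len(sequences)

# Toy input (assumption for illustration only).
sequences = ["ACGTAC", "TTGTAA", "CCCCCC", "AGTTAC"]
print(satisfies_quorum("GTA", sequences, d=0, q=0.5))
print(satisfies_quorum("GTA", sequences, d=1, q=0.75))
```

With exact matching the motif is found in two of the four sequences, so it meets a 50% quorum but not a 75% one; allowing one mismatch adds a third sequence and the 75% quorum is met.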

Paper Nr:	6
Title:	Intelligence-Based Recommendation System for Critical Stroke Management in Intensive Care Units
Authors:	Luis García Terriza, José L. Risco-Martín, José L. Ayala and Gemma R. Roselló
Abstract:	This work presents an integrated recommendation system capable of providing support in healthcare critical environments such as Intensive Care Units or Stroke Care Units using Machine Learning techniques. The system can manage several patients by reading monitoring hemodynamic data in real-time, presenting current death risk probability, and showing recommendations that would reduce such probability and, in some cases, avoid death. This system introduces a novel method to produce recommendations based on genetic models and supervised machine learning. The interface is built upon a web application where clinicians can evaluate recommendations and straightforwardly provide feedback.

Paper Nr:	7
Title:	Building a DNA Methylation Aging Clock Model on Less Labelled Data Using Item Response Theory
Authors:	Keiji Yasuda, Miyuki Nakamura, Masatoshi Nagata and Masaru Honjo
Abstract:	A method is proposed for DNA methylation analysis using item response theory. The analysis method consists of two steps: a cytosine phosphate guanine (CpG) site selection step and a parameter estimation step. Experiments are carried out to compare several CpG site selection conditions and evaluate whether item response theory (IRT) can be applied to methylation analysis. According to the results of an experiment on public data measured by the Infinium HumanMethylation450 BeadChip, even under the condition of less age-labelled epigenetic data, CpG site filtering works well and the subsequent IRT-based epigenetic clock model achieved precise performance.

Paper Nr:	8
Title:	A Formal Probabilistic Model of the Inhibitory Control Circuit in the Brain
Authors:	Elisabetta De Maria, Benjamin Lapijover, Thibaud L’Yvonnet, Sabine Moisan and Jean-Paul Rigault
Abstract:	The decline of inhibitory control efficiency in aging subjects with neurodegenerative diseases is due to anatomical and functional changes in (pre)frontal regions of the brain. Among these regions, the basal ganglia play a central role in the inhibitory control loop. We propose a probabilistic formal model of the biological neural network governing the inhibitory control function and we study some of its relevant dynamic properties. We also explore how parameter variations influence the probability for the model to display some key behaviors. We model the different structures of the inhibitory control loop thanks to discrete Markov chains representing Leaky Integrate and Fire neurons. The model is implemented and verified using the PRISM framework. The final aim is to detect sources of pathological behaviors in the neural network responsible for inhibitory control.

Paper Nr:	15
Title:	SAT-Based Method for Finding Attractors in Asynchronous Multi-Valued Networks
Authors:	Takehide Soh, Morgan Magnin, Daniel Le Berre, Mutsunori Banbara and Naoyuki Tamura
Abstract:	In this paper, we propose a SAT-based method for finding attractors of bounded size in asynchronous automata networks. The automata network is a multi-valued mathematical model which has been studied for the qualitative modeling of biological regulatory networks. An attractor is a minimal set of states in automata networks that cannot be escaped and thus loops indefinitely. Attractors are crucial to validate the initial design of a biological model and predict possible asymptotic behaviors, e.g., how cells may result through maturation in differentiated cell types. Developing an efficient computational method to find attractors is thus an important research topic. Our contribution is a translation of the problem of finding attractors of automata networks into a sequence of propositional satisfiability (SAT) problems. We also propose to add two optional constraints to improve the computation time of attractors. Experiments are carried out using 30 automata networks, 8 coming from real biological case studies and 22 crafted ones with controlled attractor size. The experimental results show that our method scales better than the state-of-the-art ASP method when the size of the attractors increases.
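The definition of an attractor given in this abstract (a minimal set of states that cannot be escaped) corresponds to a terminal strongly connected component of the state transition graph. The sketch below enumerates such components by brute-force reachability on an explicit toy graph. It is not the SAT encoding proposed in the paper, and the graph and function names are hypothetical.

```python
def reachable(start, successors):
    """All states reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for nxt in successors.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def attractors(successors):
    """Terminal strongly connected components of a transition graph."""
    states = set(successors) | {t for ts in successors.values() for t in ts}
    reach = {s: reachable(s, successors) for s in states}
    found = set()
    for s in states:
        # s belongs to an attractor iff every state it can reach can return to s.
        if all(s in reach[t] for t in reach[s]):
            found.add(frozenset(reach[s]))
    return found

# Toy transition graph (assumption): a -> b -> {c, d}, c <-> d, e -> e.
toy_graph = {"a": ["b"], "b": ["c", "d"], "c": ["d"], "d": ["c"], "e": ["e"]}
print(attractors(toy_graph))
```

The toy graph has two attractors, the cycle {c, d} and the fixed point {e}. Explicit enumeration like this needs the whole state graph in memory, which is the scaling problem that symbolic approaches such as SAT or ASP encodings aim to avoid.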

Paper Nr:	16
Title:	NP-BERT: A Two-Staged BERT Based Nucleosome Positioning Prediction Architecture for Multiple Species
Authors:	Ahtisham Fazeel, Areeb Agha, Andreas Dengel and Sheraz Ahmed
Abstract:	Nucleosomes are complexes of histone and DNA base pairs in which DNA is wrapped around histone proteins to achieve compactness. Nucleosome positioning is associated with various biological processes such as DNA replication, gene regulation, and DNA repair, and its dysregulation can lead to various diseases such as sepsis and tumors. Since nucleosome positioning can be determined only to a limited extent in wet lab experiments, various artificial intelligence-based methods have been proposed to identify nucleosome positioning. Existing predictors/tools do not provide consistent performance, especially when evaluated on 12 publicly available benchmark datasets. Given this limitation, this study proposes a nucleosome positioning predictor, namely NP-BERT. NP-BERT is extensively evaluated in different settings on 12 publicly available datasets from 4 different species. Evaluation results reveal that NP-BERT achieves significant performance on all datasets, beats state-of-the-art methods on 8/12 datasets, and achieves equivalent performance on 2 datasets. The code and datasets used in this study are provided at https://github.com/FAhtisham/Nucleosome-position-prediction.

Paper Nr:	21
Title:	GediNETPro: Discovering Patterns of Disease Groups
Authors:	Emma Qumsiyeh, Miray U. Yazıcı and Malik Yousef
Abstract:	The GediNET tool is based on the Grouping, Scoring, Modeling (G-S-M) approach for detecting disease-disease association (DDA). In this study, we have developed the pro version, GediNETPro, that utilizes the Cross-Validation (CV) information to detect patterns of disease groups association by applying clustering approaches, such as K-means, extracted from the groups’ ranks over the CV iterations. Additionally, a cluster score is computed to measure its significance to provide a deep analysis of the output of GediNET, yielding new biological knowledge that GediNET did not detect. Further, GediNETPro utilizes a visualization approach, such as a heatmap, to get novel insights and in-depth analysis of the disease groups clusters revealing the relationship between diseases that might be used for developing effective interventions for diagnosing. We have tested GediNETPro on the Breast cancer dataset downloaded from the TCGA database. Results showed deeper insight into the interaction and collective behavior of the DDA, facilitating the identification and association of potential biomarkers.

Paper Nr:	22
Title:	DT-ML: Drug-Target Metric Learning
Authors:	Domonkos Pogány and Péter Antal
Abstract:	The challenges of modern drug discovery motivate the use of machine learning-based methods, such as predicting drug-target interactions or novel indications for already approved drugs to speed up the early discovery or repositioning process. Publication bias has resulted in a shortage of known negative data points in large-scale repositioning datasets. However, training a good predictor requires both positive and negative samples. The problem of negative sampling has also recently been addressed in subfields of machine learning with utmost importance, namely in representation and metric learning. Although these novel negative sampling approaches proved to be efficient solutions for learning from imbalanced data sets, they have not yet been used in repositioning in such a way that the learned similarities give the predicted interactions. In this paper, we adapt representation learning-inspired methods in pairwise drug-target/drug-disease predictors and propose a modification to one of the loss functions to better manage the uncertainty of negative samples. We evaluate the methods using benchmark drug discovery and repositioning data sets. Results indicate that interaction prediction with metric learning is superior to previous approaches in highly imbalanced scenarios, such as drug repositioning.

Paper Nr:	23
Title:	GPTree: Generator of Phylogenetic Trees with Overlapping and Biological Events for Supertree Inference
Authors:	Aleksandr Koshkarov and Nadia Tahiri
Abstract:	Summary: More and more evolutionary and molecular biologists are interested in building alternative supertrees. Often, developing new approaches or testing new metrics requires relevant datasets that are not always easy to obtain. In order to solve this problem of lack of data, we propose a new approach and developed a program in Python to generate overlapping phylogenetic trees with biological events to simplify the process of obtaining this type of data. The new tool takes the number of phylogenetic trees the user wants to generate, the maximum number of leaves per tree to generate, and the average level of leaf overlap between phylogenetic trees as input parameters. The program returns to the user a set of phylogenetic trees in Newick format, respecting the parameters given as input, in order to use them to infer a supertree (or supertrees). This data can be an important resource for research; the user can download the generated data and use it later in their relevant application tasks. Availability and implementation: The generator is freely and publicly available to the entire scientific community on the GitHub platform, without any registration, https://github.com/tahiri-lab/gptree under the MIT licence. The pipeline is written in Python 3.7.

Paper Nr:	25
Title:	System Modeling and Machine Learning in Prediction of Metastases in Lung Cancer
Authors:	Andrzej Swierniak, Emilia Kozłowska, Krzysztof Fujarewicz, Damian Borys, Agata Wilk, Jaroslaw Smieja and Rafal Suwinski
Abstract:	The aim of this paper is to present the goals and preliminary results of our project devoted to a system engineering approach to the prediction of metastases in lung cancer. More specifically, we consider existing and develop new methods of system modeling, machine learning, signal processing and intelligent control to find biomarkers enabling prediction of the risk of tumor spread and colonization of distant organs in non-small-cell lung carcinoma based on clinical data and medical images. The results could bring us knowledge about the dynamics and origin of metastatic dissemination of lung cancer. By dynamics, we understand when and where a tumor will disseminate, and by origin we mean dissemination path (directly from original tumor or through lymphatic nodes). This information is very valuable for clinicians, as it could guide the personalized treatment of lung cancer patients. The results will elucidate important issues concerning prediction of individual progress of cancer and treatment outcome in oncology. They will provide both theoretical and simulation tools to support decision making and diagnostics in oncology, on the basis of individual patient state.

Paper Nr:	26
Title:	Patient Similarity Networks Integration for Partial Multimodal Datasets
Authors:	Jessica Gliozzo, Alex Patak, Antonio Puertas-Gallardo, Elena Casiraghi and Giorgio Valentini
Abstract:	Integration of partial samples in Patient Similarity Networks, i.e. the combination of multiple data sources when some of them are completely missing in some samples, is a largely overlooked problem in the multi-omics data integration literature for Precision Medicine. Nevertheless, in clinical practice it is quite usual that one or more types of data are missing for a subset of patients. We present an algorithm able to combine multiple sources of data in Patient Similarity Networks when data of one or more sources are completely missing for a subset of patients. The proposed approach relies on a message-passing learning strategy to recover and combine completely missing data leveraging the Similarity Network Fusion algorithm. Preliminary results on TCGA breast cancer data show the effectiveness of the proposed approach.

Paper Nr:	28
Title:	Intrinsic-Dimension Analysis for Guiding Dimensionality Reduction in Multi-Omics Data
Authors:	Valentina Guarino, Jessica Gliozzo, Ferdinando Clarelli, Béatrice Pignolet, Kaalindi Misra, Elisabetta Mascia, Giordano Antonino, Silvia Santoro, Laura Ferré, Miryam Cannizzaro, Melissa Sorosina, Roland Liblau, Massimo Filippi, Ettore Mosca, Federica Esposito, Giorgio Valentini and Elena Casiraghi
Abstract:	Multi-omics data are of paramount importance in biomedicine, providing a comprehensive view of processes underlying disease. They are characterized by high dimensions and are hence affected by the so-called “curse of dimensionality”, ultimately leading to unreliable estimates. This calls for effective Dimensionality Reduction (DR) techniques to embed the high-dimensional data into a lower-dimensional space. Though effective DR methods have been proposed so far, given the high dimension of the initial dataset, unsupervised Feature Selection (FS) techniques are often needed prior to their application. Unfortunately, both unsupervised FS and DR techniques require the dimension of the lower-dimensional space to be provided. This is a crucial choice, for which a well-accepted solution has not been defined yet. The Intrinsic Dimension (ID) of a dataset is defined as the minimum number of dimensions that allow representing the data without information loss. Therefore, the ID of a dataset is related to its informativeness and complexity. In this paper, after proposing a blocking ID estimation to leverage state-of-the-art (SOTA) ID estimation methods, we present our DR pipeline, whose subsequent FS and DR steps are guided by the ID estimate.

Paper Nr:	30
Title:	MOT: A Multi-Omics Transformer for Multiclass Classification Tumour Types Predictions
Authors:	Mazid A. Osseni, Prudencio Tossou, François Laviolette and Jacques Corbeil
Abstract:	Motivation: Breakthroughs in high-throughput technologies and machine learning methods have enabled the shift towards multi-omics modelling as the preferred means to understand the mechanisms underlying biological processes. Machine learning enables and improves complex disease prognosis in clinical settings. However, most multi-omic studies primarily use transcriptomics and epigenomics due to their over-representation in databases and their early technical maturity compared to other omics. For complex phenotypes and mechanisms, not leveraging all the omics despite their varying degree of availability can lead to a failure to understand the underlying biological mechanisms and to less robust classifications and predictions. Results: We proposed MOT (Multi-Omic Transformer), a deep learning based model using the transformer architecture, that discriminates complex phenotypes (herein cancer types) based on five omics data types: transcriptomics (mRNA and miRNA), epigenomics (DNA methylation), copy number variations (CNVs), and proteomics. This model achieves an F1-score of 98.37% among 33 tumour types on a test set without missing omics views and an F1-score of 96.74% on a test set with missing omics views. It also identifies the required omic type for the best prediction for each phenotype and therefore could guide clinical decision-making when acquiring data to confirm a diagnosis. The newly introduced model can integrate and analyze five or more omics data types even with missing omics views and can also identify the essential omics data for the tumour multiclass classification tasks. It confirms the importance of each omic view. Combined, omics views allow a better differentiation rate between most cancer diseases. Our study emphasized the importance of multi-omic data to obtain a better multiclass cancer classification. Availability and implementation: MOT source code is available at https://github.com/dizam92/multiomic_predictions.

Paper Nr:	31
Title:	Separation of Concerns in an Edge-Based Compartmental Modeling Framework
Authors:	A. Y. Guifo Fodjo, Jerry L. Zeutouo and Samuel Bowong
Abstract:	A well-known framework with strong potential for epidemic prediction and the ability to incorporate realistic contact structures is edge-based compartmental modeling (EBCM). However, models built from this framework lead to a multiplication of ordinary differential equations and many parameters to be estimated, which make the models complex and difficult to extend or to reuse. The Kendrick approach has shown promising results in generalizing compartmental models to take into account aspects of contact networks while preserving the separation of concerns, thus allowing the definition of modular, extensible and reusable models. But this generalization of compartmental models to contact network aspects is still limited to a few contact networks. In this paper, we present an attempt to extend Kendrick’s approach from an approximation of EBCM models to further support aspects of contact networks, thereby improving the predictive quality of models with significant heterogeneity in contact structure, while maintaining the simplicity of compartmental models. This extension consists of an integration of the basic reproductive number R0 into the compartmental SIR framework. This attempt is validated using Miller’s mass action and the approximation of the EBCM configuration model.

Paper Nr:	32
Title:	Predicting Moonlighting Proteins from Protein Sequence
Authors:	Jing Hu and Yihang Du
Abstract:	High-throughput proteomics projects have resulted in a rapid accumulation of protein sequences in public databases. For the majority of these proteins, limited functional information has been known so far. Moonlighting proteins (MPs) are a class of proteins which perform at least two physiologically relevant distinct biochemical or biophysical functions. These proteins play important functional roles in enzymatic catalysis process, signal transduction, cellular regulation, and biological pathways. However, it has been proven to be difficult, time-consuming, and expensive to identify MPs experimentally. Therefore, computational approaches which can predict MPs are needed. In this study, we present MPKNN, a K-nearest neighbors method which can identify MPs with high efficiency and accuracy. The method is based on the bit-score weighted Euclidean distance, which is calculated from selected features derived from protein sequence. On a benchmark dataset, our method achieved 83% overall accuracy, 0.64 MCC, 0.87 F-measure, and 0.86 AUC.

Paper Nr:	36
Title:	Semi-Automated Workflow for Computer-Generated Scoring of Ki67 Positive Cells from HE Stained Slides
Authors:	Dominika Petríková, Ivan Cimrák, Katarína Tobiášová and Lukáš Plank
Abstract:	The Ki67 positive cell score assessed by immunohistochemistry (IHC) is considered a good biomarker of cell proliferation in determining therapeutic protocols. Manual estimation of Ki67 scores has several limitations, as it is time-consuming and subject to inter-rater variability. Moreover, IHC staining is not always available. This could potentially be addressed by using neural network models to predict Ki67 scores directly from hematoxylin and eosin (HE) stained tissue. However, neural networks require large, well-annotated datasets, the creation of which is often a laborious process requiring the work of experienced pathologists. Such a database containing images of HE stained tissue with Ki67 labels is currently not available. In this paper, we propose a semi-automated dataset generation approach to predict Ki67 scores from pairs of HE and IHC slides with minimal assistance from experts. Using a sample of 15 pairs of whole slide images stained by HE and IHC methods, we propose a workflow for generating HE patches with Ki67 labels using image analysis methods such as clustering and tissue registration. From the IHC images processed by the aforementioned methods, we estimate the percentage of Ki67 positive cells in each patch. To verify the validity of the proposed approach, we automatically assigned Ki67 labels to HE patches from manually annotated HE - Ki67 pairs. To illustrate the potential of neural networks for assigning the Ki67 label to HE patches, we trained a neural network model on a sample of three whole slide images, which was able to classify the Ki67 positivity ratio of tissue from HE patches into two Ki67 labels.

Paper Nr:	10
Title:	Efficient Hashing of Multiple Spaced Seeds with Application
Authors:	Eleonora Mian, Enrico Petrucci, Cinzia Pizzi and Matteo Comin
Abstract:	Alignment-free analysis of sequences has enabled high-throughput processing of sequencing data in many bioinformatics pipelines. Hashing k-mers is a common function across many alignment-free applications and it is widely used for indexing, querying and rapid similarity search. Spaced seeds, a special type of pattern that accounts for errors or mutations, are now routinely used instead of k-mers. Spaced seeds improve sensitivity with respect to k-mers in many applications; however, hashing spaced seeds substantially increases the computational time. Moreover, if multiple spaced seeds are used, the accuracy can further increase at the cost of running time. In this paper we address the problem of efficient multiple spaced seed hashing. The proposed algorithms exploit the similarity of adjacent spaced seed hash values in an input sequence in order to efficiently compute the next hashes. We report the results of several tests which show that our methods significantly outperform the previously proposed algorithms, with a speedup that can reach 20x. We also apply these efficient spaced seed hashing algorithms to an application in the field of metagenomics, the classification of reads performed by Clark-S (Ounit and Lonardi, 2016), and we show that a significant speedup can be obtained, thus resolving the slowdown introduced by the use of multiple spaced seeds. Code available at: https://github.com/CominLab/MISSH.

Paper Nr:	19
Title:	Prediction of Antimicrobial Peptides Using Deep Neural Networks
Authors:	Ümmü G. Söylemez, Malik Yousef and Burcu Bakir-Gungor
Abstract:	Antimicrobial peptides (AMPs) are crucial elements of the innate immune system, and they are effective against bacteria that cause several diseases. These peptides are investigated as potential alternatives to antibiotics to treat infections. Since wet lab experiments are expensive and time-consuming, computational methods become crucial in this field. In this study, we suggest a computational technique for AMP prediction using deep neural networks (DNN). We trained a DNN classifier using physicochemical features that include a sequential model, and evaluated the model with 10-fold cross-validation on a benchmark dataset. We compared our method with other machine learning approaches and demonstrated that the method we developed achieves higher performance (accuracy: 92%, precision: 92%, recall: 93%, f1: 93%, AUC: 98%). In our experiments, we observed a strong positive correlation between the ‘Normalized Hydrophobic Moment’ feature and the ‘Angle Subtended by the Hydrophobic Residues’ feature, and strong negative correlations between the ‘Normalized Hydrophobicity’ and ‘Disordered Conformation Propensity’ features, and between the ‘Amphilicity Index’ and ‘Disordered Conformation Propensity’ features. We believe that the approach we proposed could guide further experimental studies and could facilitate the prediction of other types of AMPs having anticancer, antiviral, or antiparasitic activities.

Paper Nr:	27
Title:	Formal Analysis of Rewriting System Representing RNA Folding
Authors:	Krishnendu Ghosh and Julia Goldman
Abstract:	Prediction of RNA structure is an important problem in understanding biological processes in living organisms. Computational models have been created to study these processes with the aim of unravelling the RNA structure. In this work, a novel formalism for formal analysis of RNA structure prediction is described. A graph rewriting system is formalized to represent the structural dynamics of RNA under uncertainty. Probabilistic model checking is performed on queries seeking structural properties in RNA. Experiments were conducted to evaluate the computational feasibility of the model.

Paper Nr:	33
Title:	Ongoing Work to Study the Underlying Statistical Patterns of Oesophageal Chromothripsis
Authors:	Jack Fraser-Govil and Zemin Ning
Abstract:	In this position paper we demonstrate our ongoing efforts to develop and test a number of statistical tools and methodologies which allow us to study the underlying statistical properties of a genetic sequence which has undergone chromothripsis, and hence provide some novel probes into the mechanisms which cause such catastrophic genomic rearrangement. Using these tools, we study an oesophageal cancer sample showing more than 1000 rearrangements, with 800 of these on chromosome 6. By studying this chromosome, we challenge a prevalent idea within the literature: that chromothripsis breakpoints are non-random, finding instead that despite a high degree of clustering, the clusters themselves are uniformly distributed across the chromosome. We also show that although 3-dimensional proximity is a tempting explanation for the rearrangement pattern, the statistical evidence does not favour it at the current time. In addition, we attempt to disambiguate some of the terminology surrounding chromothripsis.

Paper Nr:	34
Title:	Computational Study of Particle Separation Based on Inertial Effects in Rectangular Serpentine Channels with Different Aspect Ratios
Authors:	Alžbeta Bugáňová and Ivan Cimrák
Abstract:	Inertial effects in straight and curved microfluidic channels have great potential for separation of particles of different sizes. The geometry of the channels influences the separation. In this work we consider a serpentine channel with rectangular cross sections of different sizes to explore the influence of aspect ratio on focusing performance and particle separation possibilities. Trajectories of particles of different sizes are studied by means of computational simulations. We show that a low aspect ratio offers more possibilities for separation in terms of particle sizes as well as in terms of higher throughput.

Paper Nr:	39
Title:	GenExViz: Effective Visualizations of Bioinformatics Data - An Analysis Studies on Cancer Prevention
Authors:	Tommy Dang
Abstract:	Data visualization plays an essential role in analyzing bioinformatics data, as it can provide a holistic view of the data, facilitate high-dimensional biological data analysis, and uncover the latent relations between proteins. However, current methods cannot deal with large and complex multidimensional bioinformatics data. This paper explores the novel marriage of data visualization and user interface for analyzing large gene expression data generated under different tested conditions. In particular, we focus on analyzing and visualizing the gene networks of cancer pathways. Although our work focuses on analyzing cancer datasets, our methodology has more general implications for other bioinformatics data sets in a similar setup.