BIOINFORMATICS 2025 Abstracts


Full Papers
Paper Nr: 53
Title:

Enhanced Body Composition Estimation from 3D Body Scans

Authors:

Boyuan Feng, Yijiang Zheng, Ruting Cheng, Shuya Feng, Khashayar Vaziri and James Hahn

Abstract: Accurate body composition assessment is essential for evaluating health and diagnosing conditions like sarcopenia and cardiovascular disease. Approaches for accurately measuring body composition, such as Dual-Energy X-ray Absorptiometry (DXA) and Magnetic Resonance Imaging (MRI), are precise but costly and limited in accessibility. Some studies have explored predicting body composition by using shapes since 3D scanning techniques allow for precise and efficient digital measurements of body shape. This study introduces an enhanced method using 3D body scanning integrated with a part-to-global Multilayer Perceptron (MLP) network that incorporates predefined high-level features for body composition prediction. For lean mass estimation, our method achieved a root mean square error (RMSE) of 2.85 kg. For fat mass estimation, the RMSE was 2.50 kg, and for bone mineral content (BMC), the RMSE was 193.50 g. These results represent substantial improvements over existing methods, highlighting the effectiveness and reliability of our approach in accurately predicting body composition metrics.
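For readers unfamiliar with the reported metric, the following is a minimal sketch of how a root mean square error is computed from paired predictions and reference measurements. The function name `rmse` and the sample values are hypothetical illustrations, not code or data from the paper.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between paired predictions and reference values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical lean-mass values in kg (illustrative only, not from the paper).
predicted_kg = [50.0, 62.0, 47.0, 71.0]
measured_kg = [52.0, 60.0, 47.0, 75.0]
error_kg = rmse(predicted_kg, measured_kg)
```

Because the errors are squared before averaging, a single large miss (the 4 kg error in the last pair) dominates the result more than it would in a mean absolute error.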

Paper Nr: 117
Title:

Impact of Biased Data Injection on Model Integrity in Federated Learning

Authors:

Manuel Lengl, Marc Benesch, Stefan Röhrl, Simon Schumann, Martin Knopp, Oliver Hayden and Klaus Diepold

Abstract: Federated Learning (FL) has emerged as a promising solution in the medical domain to overcome challenges related to data privacy and learning efficiency. However, its federated nature exposes it to privacy attacks and model degradation risks posed by individual clients. The primary objective of this work is to analyze how different data biases (introduced by a single client) influence the overall model’s performance in a Cross-Silo FL environment and whether these biases can be exploited to extract information about other clients. We demonstrate, using two datasets, that bias injection can significantly affect model integrity, with the impact varying considerably across different datasets. Furthermore, we show that minimal effort is sufficient to infer the number of training samples contributed by other clients. Our findings highlight the critical need for robust data security mechanisms in FL, as even a single compromised client can pose serious risks to the entire system.

Paper Nr: 121
Title:

Feasibility of Inferring Spatial Transcriptomics from Single-Cell Histological Patterns for Studying Colon Cancer Tumor Heterogeneity

Authors:

Michael Y. Fatemi, Yunrui Lu, Zarif L. Azher, Cyril Sharma, Eric Feng, Alos B. Diallo, Gokul Srinivasan, Grace M. Rosner, Kelli B. Pointer, Brock C. Christensen, Lucas A. Salas, Gregory J. Tsongalis, Scott M. Palisoul, Laurent Perreard, Fred W. Kolling IV, Louis J. Vaickus and Joshua J. Levy

Abstract: Spatial transcriptomics (ST) enables studying spatial organization of gene expression within tissues, offering insights into the molecular diversity of tumors. Recent methods have demonstrated the capability to disaggregate this information at subspot resolution by leveraging both expression and histological patterns. Elucidating such information from histology alone presents a significant challenge, but if solved can enable spatial molecular analysis at cellular resolution even where ST data is not available, reducing study costs. This study explores integrating single-cell histological and transcriptomic data to infer spatial mRNA expression patterns in colorectal cancer whole slide images. A cell-graph neural network algorithm was developed to align histological information extracted from detected cells with single cell RNA, facilitating the analysis of cellular groupings and gene relationships. We demonstrate that single-cell transcriptional heterogeneity within a spot could be predicted from histological markers extracted from cells detected within it. Our model exhibited proficiency in delineating overarching gene expression patterns across whole-slide images. This approach compared favorably to traditional computer vision methods which did not incorporate single cell expression during the model training. This innovative approach augments the resolution of spatial molecular assays utilizing histology as sole input through co-mapping of histological and transcriptomic datasets at the single-cell level.

Paper Nr: 134
Title:

Generating Multiple Alignments of Genomes of the Same Species

Authors:

Jannik Olbrich, Thomas Büchler and Enno Ohlebusch

Abstract: In this paper, we tackle the problem of generating a multiple alignment of assembled genomes of individuals of the same species. Of course, a (colinear) multiple alignment cannot capture structural variants such as inversions or transpositions, but if these are relatively rare (as, for instance, in human or mouse genomes), it makes sense to generate such a multiple alignment. In the following, it is assumed that each assembled genome is composed of contigs. We will show that the combination of a well-known anchor-based method with the technique of prefix-free parsing yields an approach that is able to generate multiple alignments on a pangenomic scale, provided that large structural variants are rare. Furthermore, experiments with real-world data show that our software tool PANAMA (PANgenomic Anchor-based Multiple Alignment) significantly outperforms current state-of-the-art programs.

Paper Nr: 141
Title:

GenomeCruzer, a 3D Interactive Environment for Genomic Data Visualization and Analysis

Authors:

Cassisa Anna, Jamal Elhasnaoui, Uliveto Chiara, Riccardo Corsi, Elena Grassi, Dalibor Stuchlík, Livio Trusolino, Aleš Křenek, Luca Vezzadini, Andrea Bertotti, Claudio Isella and Enzo Medico

Abstract: The development of high-throughput sequencing technologies has generated vast amounts of multi-layered molecular data from human tumours, but effectively visualizing and analysing these complex datasets remains a significant challenge for researchers. We introduce GenomeCruzer, a software designed to enable real-time, interactive visualization and analysis of large, multi-layer genomic and clinical data. GenomeCruzer uses graphical metaphors to represent continuous variables like gene expression, DNA methylation, and copy number alterations (CNA) through 3D objects with varying colour, size, and transparency, while discrete variables are represented by highlighting or blinking. We applied GenomeCruzer to DNA methylation and DNA/RNA sequencing data from colorectal cancer (CRC) samples from The Cancer Genome Atlas (TCGA) and CRC Patient-Derived Xenografts (PDXs). The software successfully generated 3D landscapes, allowing intuitive exploration of associations between omic profiles and clinical features. GenomeCruzer demonstrates its utility in highlighting subgroup differences, selecting representative cases, annotating samples, and identifying relationships between sample groups and gene signatures. Its intuitive interface and ability to visualize complex data make it a valuable tool for biomedical research.

Paper Nr: 177
Title:

Enhancing the Efficiency of the Grouping-Scoring-Modeling Framework with Statistical Pre-Scoring Component for Transcriptomic Data Analysis

Authors:

Maham Khokhar, Burcu Bakir-Gungor and Malik Yousef

Abstract: The advent of high-throughput transcriptomic technologies has generated vast transcriptomic datasets, challenging current analytical methodologies with their sheer volume and complexity. The Grouping-Scoring-Modeling (G-S-M) approach is one of the recent approaches that treat groups of genes (or clusters of genes) by embedding prior biological knowledge with machine learning in order to detect the most significant groups for classification tasks. The G-S-M approach may need to score thousands or even tens of thousands of groups, which can affect the speed and performance of the algorithm. In response, this study introduces the Pre-Scoring G-S-M model, an enhancement of the established G-S-M framework. This approach incorporates a Pre-Scoring component that leverages the Limma package for its empirical Bayes methods to optimize initial transcriptomic data evaluation through a percentage-based selection of statistically significant gene groups. Aimed at reducing computational demand and streamlining feature selection, the model also addresses data redundancy by eliminating duplicate gene-disease associations. Application to nine human gene expression datasets from the GEO database showed promising results. It demonstrated improvements in computational efficiency and analytical precision while reducing the number of features selected per dataset compared to the traditional G-S-M approach, without compromising accuracy. These initial findings highlight the Pre-Scoring G-S-M model's potential to enhance transcriptomic data analysis, indicating a promising direction for future bioinformatics research.

Paper Nr: 234
Title:

Leveraging Large Language Models and RNNs for Accurate Ontology-Based Text Annotation

Authors:

Pratik Devkota, Somya D. Mohanty and Prashanti Manda

Abstract: This study investigates the performance of large language models (LLMs) and RNN-based architectures for automated ontology annotation, focusing on Gene Ontology (GO) concepts. Using the Colorado Richly Annotated Full-Text (CRAFT) dataset, we evaluated models across metrics such as F1 score and semantic similarity to measure their precision and understanding of ontological relationships. The Boosted Bi-GRU, a lightweight model with only 38M parameters, achieved the highest performance, with an F1 score of 0.850 and semantic similarity of 0.900, demonstrating exceptional accuracy and computational efficiency. In comparison, LLMs like Phi (1.5B) performed competitively, balancing moderate GPU usage with strong annotation accuracy. Larger models, including Mistral, Meditron, and Llama 2 (7B), delivered comparable results but required significantly higher computational resources for fine-tuning and inference, with GPU usage exceeding 125 GB during fine-tuning. Fine-tuned ChatGPT 3.5 Turbo underperformed relative to other models, while ChatGPT 4 showed limited applicability for this domain-specific task. To enhance model performance, techniques such as prompt tuning and full fine-tuning were employed, incorporating hierarchical ontology information and domain-specific prompts. These findings highlight the trade-offs between model size, resource efficiency, and accuracy in specialized tasks. This work provides insights into optimizing ontology annotation workflows and advancing domain-specific natural language processing in biomedical research.

Paper Nr: 249
Title:

Lower Leg Joint Strategies in the Outside Pass in Soccer

Authors:

Yudai Yamamoto, Viktor Kozák and Ikuo Mizuuchi

Abstract: We study the leg motion for an outside pass in soccer, observing four different movement strategies. The aim of this research is to validate the presence of these four strategies by training an agent with a higher reward for kicking a faster ball. Additionally, we aim to explore the role of the collateral ligaments’ stiffness in the outside pass. We built two leg models: (a) a two-degree-of-freedom leg model that applies torque around the hip joint, and (b) a three-degree-of-freedom leg model that applies pitch-roll-yaw torque around the hip joint and pitch torque around the knee joint. We trained a Deep Deterministic Policy Gradient (DDPG) agent using these models and analyzed the torques around the hip and knee joints, as well as the ball velocity after the leg loses contact with the ball. We observed three strategies similar to human behavior throughout agent learning.

Paper Nr: 268
Title:

Assessing the Influence of scRNA-Seq Data Normalization on Dimensionality Reduction Outcomes

Authors:

Marcel Ochocki, Michal Marczyk and Joanna Zyla

Abstract: Over the decades, improvements in high-throughput molecular biology techniques have made it possible to sequence transcripts from single cells (scRNA-Seq) instead of bulk material. Implementing these new techniques requires innovative analytical methods and knowledge about their performance. Data normalization is a crucial step in the bioinformatics pipeline applied in scRNA-Seq analysis. We evaluated the impact of six commonly used normalization methods on two dimensionality reduction methods, namely tSNE and UMAP, using three real scRNA-Seq datasets. We tested dispersion and clustering efficiency using three clustering algorithms after dimensionality reduction. Our results demonstrated that simple normalization methods, such as log2 or Freeman-Tukey, as well as scran normalization, consistently outperformed other scRNA-Seq-dedicated techniques, yielding superior dimensionality reduction and clustering efficiency for small and medium-sized datasets. Although no dimensionality reduction method or clustering technique yielded a statistically significant improvement in results, the Louvain clustering method consistently showed lower performance. We conclude that the choice of normalization technique should be carefully tailored to the dataset’s size and characteristics, since it may affect the final within-pipeline processing results.
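The two "simple" normalizations named in the abstract can be illustrated with a short sketch. It assumes the common forms log2(x + 1) and the Freeman-Tukey transform sqrt(x) + sqrt(x + 1); the abstract does not state the exact variants or pseudocount used, so both the function names and the pseudocount of 1 are assumptions, and the count matrix is made up for illustration.

```python
import math


def log2_normalize(counts, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every entry (pseudocount is an assumption)."""
    return [[math.log2(c + pseudocount) for c in cell] for cell in counts]


def freeman_tukey(counts):
    """Apply the Freeman-Tukey transform sqrt(x) + sqrt(x + 1) to every entry."""
    return [[math.sqrt(c) + math.sqrt(c + 1) for c in cell] for cell in counts]


# Hypothetical raw counts: rows are cells, columns are genes.
raw_counts = [[0, 3, 15], [1, 0, 8]]
log_counts = log2_normalize(raw_counts)
ft_counts = freeman_tukey(raw_counts)
```

Both transforms compress large counts, which reduces the influence of a few highly expressed genes on the distances that tSNE and UMAP work with.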

Short Papers
Paper Nr: 23
Title:

System DietadHoc: A Fusion of Human-Centered Design and Agile Development for the Explainability of AI Techniques Based on Clinical and Nutritional Data

Authors:

Michelangelo Sofo, Giuseppe Labianca, Giancarlo Mauri and Francesco Combierati

Abstract: In recent years, the scientific community's interest in the exploratory analysis of biomedical data has increased exponentially. In the research field of nutritional biologists, the curative process, based on the analysis of clinical data, is a very delicate operation because there are multiple solutions for managing food-related pathologies (for example, intolerances and allergies, cholesterol metabolism, diabetic pathologies, arterial hypertension, obesity, and breathing and sleep problems). In this research work, a system was created that is capable of evaluating various dietary regimes for the aforementioned pathologies of a specific patient. The system is based on a mathematical-numerical model and is tailored to the real working needs of experts in human nutrition, endocrinologists and cardiologists, using Human-Centered Design (HCD, ISO 9241-210). DietAdhoc is a decision support system for the aforementioned specialists, covering patients of both sexes (from 18 years of age), and was developed with an innovative agile methodology. The software draws up the biomedical and clinical profile of the specific patient by applying two implementation approaches to nutritional data.

Paper Nr: 45
Title:

Petri Net Modeling of Root Hair Response to Phosphate Starvation in Arabidopsis Thaliana

Authors:

Amber H. B. Fijn, Casper H. Stiekema, Stijn Boere, Marijan Višić and Lu Cao

Abstract: Limited availability of inorganic phosphate (Pi) in soil is an important constraint to plant growth. To better understand the underlying mechanism of plant response to Pi, the response to phosphate starvation in Arabidopsis thaliana was investigated through use of Petri Nets, a formal language suitable for bio-modeling. A. thaliana displays a range of responses to deal with Pi starvation, but special attention was paid to root hair elongation in this study. A central player in the root hair pathway is the transcription factor ROOT HAIR DEFECTIVE 6-LIKE 4 (RSL4), which has been found to be upregulated during Pi stress. A Petri Net was created which could simulate the gene regulatory networks responsible for the increase in root hair length, as well as the resulting increase in root hair length. Notably, discrepancies between the model and the literature suggested an important role for RSL2 in regulating RSL4. In the future, the net designed in the current study could be used as a platform to develop hypotheses about the interaction between RSL2 and RSL4.

Paper Nr: 46
Title:

Modeling HIF-ILK Interaction Using Continuous Petri Nets

Authors:

Viktor Gilin, Sanne Laauwen, Yuying Xia, Noria Yousufi and Lu Cao

Abstract: Oxygen concentration in the tumor microenvironment is a well-established signal that can induce aggressive cancer behaviour. In particular, low oxygen levels (hypoxia) activate the Hypoxia-Inducible Factor (HIF) pathway, which has an array of target systems. One of these systems is the Integrin-Linked Kinase (ILK) pathway, which influences key signaling pathways for cell survival, proliferation, and migration. Hence, this paper aimed to explore the interconnection between these two pathways. Using the Petri net modeling tool Snoopy, an established HIF network model was transformed into a continuous Petri net. Subsequently, the network was expanded to incorporate a feedback element from the ILK pathway to HIF, based on gene expression data. The resulting model conserved the oxygen switch response of the original HIF model and positively amplified HIF’s output. Therefore, this model provides a starting point for establishing a system reflecting a crucial effect on hypoxia-induced cancer behaviour, and could potentially serve as a basis for future drug development.

Paper Nr: 91
Title:

The Devil Lies in the Details for Species Coexistence Stability in Ablated and Unablated Five-Species Evolutionary Spatial Cyclic Games

Authors:

Dave Cliff

Abstract: I present an exploration of key results from the series of ablated five-species “Rock-Paper-Scissors-Lizard-Spock” minimal agent-based evolutionary models of biodiversity introduced by Zhong, Zhang, Li, Dai, & Yang in their 2022 paper “Species coexistence in spatial cyclic game of five species” (Chaos, Solitons and Fractals, 156: 111806). At the heart of Zhong et al.’s model of ecosystem coexistence is the Elementary Step (ES) algorithm in which one or two neighboring agents are chosen at random to engage in one or more interactions selected at random from the set {COMPETE, REPRODUCE, MOVE}. Minor revisions to the ES algorithm have recently been introduced to make it more computationally efficient in space and in time, and one contribution of this paper is to demonstrate that switching to this “Revised ES” (RES) has the unexpected effect of totally changing the outcomes of Zhong et al.’s simulation experiments. I present analysis of the RES-based experiments which shows that the key difference is that in RES the likelihood of an agent moving is decoupled from the likelihood of an agent reproducing or competing, whereas in the original ES the likelihoods of the three possible actions are interdependently coupled. The fact that such relatively minor changes to the ES algorithm result in such major changes in the experiment outcomes casts significant doubt on the extent to which results such as those from Zhong et al.’s original experiments can be trusted as truly representative of the real-world biological systems that they are supposedly intended to model. Python source-code available on GitHub can be used to replicate the results presented here.

Paper Nr: 95
Title:

Highly Interpretable Prediction Models for SNP Data

Authors:

Robin Nunkesser

Abstract: Binary prediction models for SNP data are often used in genetic association studies. The models should be highly interpretable to help understand possible underlying biological mechanisms. logicFS, GPAS, and logicDT can yield highly interpretable prediction models. The automatic prevention of overfitting requires improvement, however. We propose using GPAS as a black box and applying an external method for automatic model selection. We present an approach using the GPAS algorithm as a black box and show initial results on simulated data. The simulation is designed to motivate research to extend GPAS with automatic model selection. Additionally, we give an outlook on further extensions of GPAS.

Paper Nr: 146
Title:

Enhanced Graph Representations of Chromatin Interaction Networks

Authors:

Edgars Celms, Lelde Lace, Gatis Melkus, Peteris Rucevskis, Sandra Silina, Andrejs Sizovs and Juris Viksna

Abstract: We present a novel extension of graph representations of chromatin interaction networks incorporating edge directionality and vertex label assignments and focus on patterns defined by different types of 3-cliques that can occur under such assignments. 3-cliques are chosen for their simplicity and comparative ubiquity in chromatin interaction graphs; also, our previous work indicates a certain level of biological significance that can be assigned to them. Here we explore statistical distributions of different types of directionality- and strand-based 3-clique patterns in two well-curated promoter capture Hi-C data sets and observe that the pattern distributions strongly deviate from random if they are considered in the context of a number of additional features. These observations provide a good justification for further exploration of chromatin interaction data sets using network representations that include edge directionality and node label assignments and indicate a possibility that these annotation features could be related to some specific underlying biological mechanisms.

Paper Nr: 157
Title:

Clustering Single-Cell RNA-seq Data: Impact of Data Binarization on Algorithmic Performance

Authors:

Karolina Widzisz, Mateusz Kania, Joanna Zyla and Andrzej Polański

Abstract: The primary objective of this study was to test the hypothesis that the binary information on the presence or absence of gene expression can sufficiently capture the inherent heterogeneity within single-cell RNA sequencing (scRNA-seq) data. This hypothesis posits that even without detailed expression levels, valuable insights about cellular diversity can be obtained. Utilizing this method can be particularly advantageous when analyzing large datasets, a common scenario in the field of scRNA-seq. In this paper, we evaluate clustering performance and cluster separability of a variety of model-based algorithms and distance-based methods to analyze both expression level data and threshold-encoded binarized data. We examined the performance of the Bernoulli-mixture model and Gaussian-mixture model. These were compared against traditional clustering techniques such as hierarchical clustering, K-means, and the Louvain algorithm on a range of scRNA-seq datasets. Our findings reveal that mixture models exhibit a lower dependence on the specific dataset compared to distance-based methods. Mixture models, particularly, demonstrate greater efficacy in accurately estimating the number of clusters present within the data. Among analyzed algorithms, the Bernoulli-mixture model stands out, outperforming distance-based approaches in several key aspects. Binary data, presence/absence of gene expression, seem to be indeed adequate to capture the heterogeneity of scRNA-seq data when clustering with methods specifically designed for binary datasets. The implications of this finding are significant, as it opens up new possibilities for simplifying data analysis in scRNA-seq studies without compromising the accuracy of the results.
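The presence/absence encoding described above can be sketched in a few lines. This is an illustration only: the abstract says the data are threshold-encoded but does not give the threshold, so the default of 0 (any detected expression counts as present), the function name `binarize`, and the sample matrix are all assumptions.

```python
def binarize(expression, threshold=0.0):
    """Encode each value as 1 if it exceeds the threshold, else 0.

    The threshold of 0 is an assumed default, not a value from the paper.
    """
    return [[1 if value > threshold else 0 for value in cell] for cell in expression]


# Hypothetical expression matrix: rows are cells, columns are genes.
expression = [[0.0, 2.5, 0.0, 7.1], [1.2, 0.0, 0.0, 3.3]]
binary = binarize(expression)

# Number of detected genes per cell, a quantity available from binary data alone.
detected_per_cell = [sum(cell) for cell in binary]
```

A binary matrix of this kind is the natural input for a Bernoulli-mixture model, which models each gene as present or absent with a cluster-specific probability.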

Paper Nr: 173
Title:

Improving Antibody-Antigen Interaction Prediction Through Flexibility with ESMFold

Authors:

Sara Joubbi, Giuseppe Maccari, Giorgio Ciano, Alessio Micheli, Paolo Milazzo and Duccio Medini

Abstract: Antibodies are essential proteins in the immune system due to their capacity to bind to specific antigens. They also play a critical role in developing vaccines and treatments for infectious diseases. Their complex structure, with variable regions for antigen binding and flexible hinge regions, presents challenges for accurate computational modeling. Recent advancements in deep learning have revolutionized protein structure prediction. Despite these advancements, predicting interactions between antibodies and antigens remains challenging, mainly due to the flexibility of antibodies and the dynamic nature of binding events. This study uses fingerprint-based methodologies that incorporate ESMFold confidence scores as a flexibility feature to model antibody-antigen (Ab-Ag) interactions. Our results show that including flexibility improved Ab-Ag interaction prediction by 3%, achieving an AUC-ROC of 91%.

Paper Nr: 184
Title:

On the Detection of Retinal Image Synthesis Obtained Through Generative Adversarial Network

Authors:

Marcello Di Giammarco, Antonella Santone, Mario Cesarelli, Fabio Martinelli and Francesco Mercaldo

Abstract: Adversarial machine learning on medical imaging is one of the many applications for which the evaluation of Generative Adversarial Networks in the medical field has attracted remarkable interest. This paper proposes a method in which Convolutional Neural Networks are trained and tested on the binary classification of real and fake images, the latter generated through Generative Adversarial Networks. The experiments considered in this paper use RGB fundus retina images of the human eye. Results show that some networks achieve optimal performance and completely separate real from fake images; other networks, however, misclassify the images, raising security and reliability concerns.

Paper Nr: 197
Title:

Machine Learning-Based Prediction of Key Genes Correlated to the Subretinal Lesion Severity in a Mouse Model of Age-Related Macular Degeneration

Authors:

Kuan Yan, Yue Zeng, Dai Shi, Ting Zhang, Dmytro Matsypura, Mark C. Gillies, Ling Zhu and Junbin Gao

Abstract: Age-related macular degeneration (AMD) is a major cause of blindness in older adults, severely affecting vision and quality of life. Despite advances in understanding AMD, the molecular factors driving the severity of subretinal scarring (fibrosis) remain elusive, hampering the development of effective therapies. This study introduces a machine learning-based framework to predict key genes that are strongly correlated with lesion severity and to identify potential therapeutic targets to prevent subretinal fibrosis in AMD. Using an original RNA sequencing (RNA-seq) dataset from the diseased retinas of JR5558 mice, we developed a novel and specific feature engineering technique, including pathway-based dimensionality reduction and gene-based feature expansion, to enhance prediction accuracy. Two iterative experiments were conducted by leveraging Ridge and ElasticNet regression models to assess biological relevance and gene impact. The results highlight the biological significance of several key genes and demonstrate the framework’s effectiveness in identifying novel therapeutic targets. The key findings provide valuable insights for advancing drug discovery efforts and improving treatment strategies for AMD, with the potential to enhance patient outcomes by targeting the underlying genetic mechanisms of subretinal lesion development.

Paper Nr: 208
Title:

Electroencephalograph Based Emotion Estimation Using Multidimensional Directed Coherence and Neural Networks Under Noise

Authors:

Haruka Torii, Takamasa Shimada, Osamu Sakata and Tadanori Fukami

Abstract: In recent years, research focused on emotion based on brain activity has yielded significant insights into the mechanisms of information processing in the brain. Leveraging this knowledge, studies have increasingly examined the effects of various stimuli on human emotions, with applications progressing in fields such as neuromarketing. However, existing methods for emotion estimation from EEG—such as those using power spectra, correlations, or deep learning—face challenges in generalizability due to considerable individual differences. In this study, we applied multidimensional directed coherence analysis, which can analyze the flow of information in the brain, to the measured EEG data. Following this, we trained a neural network using data augmented with noise to simulate individual differences, proposing a method capable of generalizable emotion inference. As a result, we achieved an average accuracy rate of 99.91% on training data and 90.83% on test data.

Paper Nr: 209
Title:

From Polar Bears to People: The Role of Ethnic Genetic Variation in Thermoregulation and Heat-Related Health Risk

Authors:

Alexandra Baumann, Jakob Thiel, Nina Haffer, Shailendra Gupta and Markus Wolfien

Abstract: As climate change increases the frequency and severity of acute heat events, it is crucial to determine factors for appropriate healthcare strategies and predictive models. Previously, it was stated that socioeconomic factors primarily play a role in heat-related illness risk. Analogous to the polar bear’s unique adaptations to the cold, humans exhibit distinct genetic traits shaped by their migration to diverse climates. This position paper hypothesizes that genetic differences among human ethnic groups, in addition to socioeconomic and other factors, also contribute to variations in thermoregulation and influence susceptibility to heat-related diseases. To understand genetic adaptations across human ethnicities (initially European and African), we propose a genetic association analysis of single nucleotide polymorphisms (SNPs) in genes associated with thermoregulation. An assessment of changes in thermoregulation gene regulation networks will be possible by conducting a functional pathway analysis. Expected outcomes include identifying differences in SNP distributions of thermoregulation-associated genes across ethnicities. Challenges such as the underrepresentation of African populations in genomic databases must also be addressed. This research aims to provide a foundational understanding of genetic contributions to heat adaptation, guiding the development of personalized, equitable healthcare responses to climate-induced heat stress.

Paper Nr: 257
Title:

2.5D Deep Learning Model with Attention Mechanism for Pancreas Segmentation on CT Scans

Authors:

Idriss Cabrel Tsewalo Tondji, Francesca Lizzi, Camilla Scapicchio and Alessandra Retico

Abstract: The accurate segmentation of the irregularly shaped pancreas on Computed Tomography (CT) scans, which consist of 3D images, is a crucial but difficult part of the diagnostic evaluation of pancreatic cancer. Most current deep learning (DL) methods tend to focus on the pancreas or the tumor separately. However, these methods often struggle because the pancreas region is affected by the surrounding complex and low-contrast tissues. This study aims to develop a DL system for pancreas segmentation to improve early detection of tumors. 3D models perform well but are computationally demanding; 2D models are a lightweight, computationally cheaper alternative, but they disregard the inter-slice correlation, which degrades performance. To address this, we investigate the effect of data preparation on the pancreas segmentation model by using a multi-channel input image, an approach referred to as a 2.5D model. Our method is developed and evaluated on a widely used public dataset, the Medical Segmentation Decathlon (MSD) pancreas segmentation dataset. The 2.5D model demonstrates superior performance, reaching a Dice Similarity Coefficient of 75.1%, surpassing the 2D segmentation model, while remaining computationally efficient.
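One common way to build a multi-channel 2.5D input is to stack each slice with its neighbours along the channel axis; the sketch below shows that idea together with the Dice Similarity Coefficient used for evaluation. The abstract does not specify how many neighbouring slices are used or how volume borders are handled, so `context=1` and the border clamping are assumptions, and the function names and toy volume are hypothetical.

```python
def stack_neighbours(volume, context=1):
    """For each slice, return a list of 2*context+1 channels: the slice and its neighbours.

    Indices outside the volume are clamped to the first or last slice
    (an assumed border policy, not taken from the paper).
    """
    last = len(volume) - 1
    samples = []
    for i in range(len(volume)):
        channels = [
            volume[min(max(i + offset, 0), last)]
            for offset in range(-context, context + 1)
        ]
        samples.append(channels)
    return samples


def dice(pred_mask, true_mask):
    """Dice Similarity Coefficient between two binary 2D masks."""
    pred = [v for row in pred_mask for v in row]
    true = [v for row in true_mask for v in row]
    intersection = sum(p * t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy volume of four 1x1 slices whose single pixel equals the slice index.
volume = [[[z]] for z in range(4)]
samples = stack_neighbours(volume)
score = dice([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

Each sample can then be fed to an ordinary 2D network with three input channels, which gives the model some inter-slice context without the memory cost of 3D convolutions.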

Paper Nr: 317
Title:

PathDisGene: Discovering Informative Gene Groups for Disease Diagnosis Using Pathway-Disease Associations and a Grouping, Scoring, Modeling-Based Machine Learning Approach

Authors:

Emma Qumsiyeh, Burcu Bakir-Gungor and Malik Yousef

Abstract: Recently, machine learning and various feature selection techniques have become popular for understanding the relationship between genes, molecular pathways, and diseases. Integrating existing domain knowledge into biological data analysis has demonstrated considerable potential for finding new biomarkers with translational uses. This paper presents PathDisGene, an innovative machine-learning tool that integrates existing domain knowledge by utilizing a Grouping-Scoring-Modeling (G-S-M) approach to discover gene-pathway-disease associations. The first step in PathDisGene is the grouping component, which groups genes according to their biological associations with diseases and pathways. This component uses the Comparative Toxicogenomics Database (CTD). Subsequently, the scoring component scores each group, and the highest-ranked groups are then used to train the classifier. We test PathDisGene on ten GEO datasets covering various diseases, on most of which it achieves high accuracy, sensitivity, specificity, and AUC values. The tool's capacity to recognize new pathway-disease associations and uncover connections between pathways and diseases along with their associated genes underscores its potential as a significant asset in promoting precision medicine and systems biology.
Download

Paper Nr: 327
Title:

Towards a New Method for Perturbation Analysis in Biochemical Pathways Based on Network Propagation

Authors:

Niccolò De Paolis and Paolo Milazzo

Abstract: We introduce a preliminary definition of a network propagation approach for investigating the spread of mutation-induced perturbations in biochemical pathways. The approach relies on network topology alone, without the quantitative details, such as species concentrations and kinetic constants of reactions, that stochastic and deterministic algorithms require to model the trajectory of species concentrations. These details are not always available; hence, our goal is to provide insights into the impact of perturbations even when such information is lacking. We further describe the synthetic dataset on which the algorithm has been tested and report its accuracy in identifying the effect of the perturbation on each species. Finally, a real-world scenario is presented to show the potential of the proposed solution and to identify its possible limitations.
Download
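
The abstract above does not spell out the propagation rule. As a generic illustration of how a perturbation can be spread using topology alone, the sketch below runs a random walk with restart on a directed graph. This is a stand-in technique, not the authors' algorithm; the function name, the `restart` value, and the toy pathway in the usage note are all assumptions.

```python
def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart from the perturbed `seeds`.

    `adjacency` maps each node to a list of successor nodes. The returned
    dict gives a score per node; higher means closer (topologically) to
    the perturbation. No concentrations or kinetic constants are used.
    """
    nodes = sorted(set(adjacency) | {v for succ in adjacency.values() for v in succ})
    initial = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(initial)
    for _ in range(iterations):
        updated = {n: restart * initial[n] for n in nodes}
        for node in nodes:
            successors = adjacency.get(node, [])
            if not successors:
                continue  # score at sink nodes is not passed on
            share = (1.0 - restart) * score[node] / len(successors)
            for successor in successors:
                updated[successor] += share
        score = updated
    return score
```

For a toy chain `{"A": ["B"], "B": ["C"], "D": ["A"]}` perturbed at `A`, the scores decay downstream (`A` > `B` > `C`), while the upstream node `D` stays at zero.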

Paper Nr: 130
Title:

Metaheuristics Applied to Optimal Feature Selection for Accurate Predictive Models in Smart Health: A Case Study on Hypotension Prediction in Hemodialysis Patients

Authors:

María Santamera-Lastras, Felipe Cisternas Caneo, José Carlos Barrera García, Broderick Crawford, Alberto Garcéz-Jiménez, Diego Rodríguez Puyol and José Manuel Gómez Pulido

Abstract: Predicting potential hypotensive episodes in chronic kidney disease patients before dialysis is crucial for preventing complications and ensuring effective treatment. This study explores the use of metaheuristic algorithms to optimize the complex task of selecting the feature set needed to develop a highly accurate predictive machine learning model for detecting hypotension, based on clinical parameters from the dialyzer and analytical data from blood tests. Metaheuristic algorithms offer a robust approach to optimal variable selection and subsequent dimensionality reduction, leading to more accurate machine learning predictor models. In this context, two relevant metaheuristic algorithms were employed: Particle Swarm Optimization (PSO) and Grey Wolf Optimizer (GWO), along with the supervised machine learning algorithm XGBoost. The results demonstrate that the application of metaheuristic techniques not only reduces the feature count from 67 to 36 variables but also improves classifier performance, thereby enhancing the prediction of hypotensive events. Specifically, the optimized model achieved an Area Under the Curve (AUC) of 0.76 and a recall of 0.764 for the minority class (hypotensive episodes) in chronic kidney disease patients prior to hemodialysis procedures.
Download

Paper Nr: 142
Title:

Optimized Machine Learning Models for Accurate Detection of Candida spp. in Gram-Stained Microscopy Images

Authors:

Daniella Peña-Pedraza, Manuel Linares-Rufo, Francisco-Javier Bueno-Guillén, Carlos García-Bertolín, Harold Bermúdez-Marval, Alberto Garcéz-Jiménez and José-Manuel Gómez-Pulido

Abstract: Image interpretation is crucial for clinical microbiological diagnosis. Manual reading of Gram-stained slides is time-consuming and complex. The use of artificial vision systems based on machine learning (ML) models can speed up the detection of microorganisms of interest, ensuring that irrelevant images are discarded and those relevant for the diagnosis are considered. This automated pre-diagnosis process significantly reduces the burden on microbiologists and their subjectivity. The morphological study of Gram-stained samples can be automated through the identification of yeast-like cells or filamentous structures indicative of Candida spp. Several multiclass ML models (XGBoost, Artificial Neural Networks, and K-Nearest Neighbors) have been implemented, using the relevant morphological characteristics extracted from the images. The dataset dimensionality is optimized with innovative metaheuristic algorithms using objective functions for the specific detection of yeasts and hyphae. The best-optimized model achieved an accuracy of 0.821, a macro precision of 0.827, a macro recall of 0.790, and a macro F1 of 0.806.
Download
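
The macro-averaged metrics reported in the abstract above are per-class scores averaged with equal weight per class. The sketch below shows one way to compute them, taking macro F1 as the mean of the per-class F1 values, which is an assumed convention since the abstract does not state it. The function name and the class labels in the usage note are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) for multiclass labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    # Every class counts equally, regardless of how many samples it has.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

For example, `macro_scores(["yeast", "yeast", "hypha", "none"], ["yeast", "hypha", "hypha", "none"])` averages the scores of three classes. Because each class has the same weight, a rare class influences the result as much as a frequent one.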

Paper Nr: 156
Title:

Identifying Inflammatory Bowel Disease-Associated Gene Ontology Groups Using Biological Knowledge-Based Machine Learning

Authors:

Nur Sebnem Ersoz, Burcu Bakir-Gungor and Malik Yousef

Abstract: Inflammatory bowel disease (IBD) is a chronic inflammatory disease. The complex pathogenesis behind disease formation and progression necessitates new approaches to identify disease-related genes and affected gene ontology (GO) terms. In this study, using the GeNetOntology method, we have reanalysed a gene expression dataset that includes Crohn’s Disease (CD) and Ulcerative Colitis (UC) patients and controls. In order to identify IBD-related genes and affected GO terms, GeNetOntology uses the GO hierarchy as biological domain knowledge while performing gene expression data analysis based on machine learning (ML). In the training part of GeNetOntology, genes annotated with selected ontology terms are used to perform a two-class classification task, which outputs a set of important ontology terms. IBD data samples were obtained from peripheral blood and colon tissue. In order to investigate the effect of different collection sites, the IBD data have been analysed under different scenarios, i.e., all samples, only tissue samples, and only blood samples. Experimental findings indicate that GeNetOntology can successfully determine significant disease-related ontology terms. The performance of the model differs slightly according to the sample source. By analysing the differences and commonalities between affected gene ontology terms under the different scenarios, we attempt to shed light on IBD development mechanisms.
Download

Paper Nr: 193
Title:

Algorithms for Fast and Efficient Sequence Alignment

Authors:

Valeriy Titarenko and Sofya Titarenko

Abstract: Aligning short sequences against long reference genomes is a challenging task in bioinformatics, particularly when working with the human reference genome. The difficulty increases further when addressing metagenomic problems or dealing with damaged sequences. One way to enhance efficiency in this process is by using spaced seeds to identify potential alignment locations. Hashing is a foundational technique in many sequence alignment software applications, and improving the speed of hashing can significantly boost the computational efficiency of sequence alignment. Many hashing strategies were developed decades ago, and with recent advances in hardware, it is necessary to reevaluate these approaches. Our research aims to develop optimal tools for sequence alignment that leverage the latest hardware advancements. In this work, we will introduce a new fast hashing strategy focused on optimal data storage, which minimizes the number of logical and bit-shifting SIMD operations required. We will also profile these algorithms against existing sequence alignment tools.
Download
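
To make the spaced-seed idea in the abstract above concrete, the sketch below hashes every window of a sequence using only the positions marked `1` in a seed pattern, packing each kept base into two bits. It is a scalar illustration of spaced-seed hashing in general, not the SIMD storage strategy the authors introduce. The function name and the seed pattern are hypothetical, and the input is assumed to contain only A, C, G and T.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit encoding per base


def spaced_seed_hashes(sequence, seed):
    """Yield (position, hash) for every window of len(seed) in `sequence`.

    `seed` is a string of '1' (position contributes to the hash) and
    '0' (position is ignored), e.g. "1101".
    """
    kept = [i for i, flag in enumerate(seed) if flag == "1"]
    for start in range(len(sequence) - len(seed) + 1):
        value = 0
        for offset in kept:
            # Shift in two bits per kept base.
            value = (value << 2) | CODES[sequence[start + offset]]
        yield start, value
```

With the seed `"101"`, the windows `ACG` and `ATG` hash to the same value because the middle base is ignored, which is what lets a spaced seed tolerate a mismatch at that position.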

Paper Nr: 200
Title:

Machine Learning Methods for Phenotype Prediction from High-Dimensional, Low Population Aquaculture Data

Authors:

Giovanni Faldani, Enrico Rossignolo, Eleonora Signor, Alessio Longo, Sara Faggion, Luca Bargelloni, Matteo Comin and Cinzia Pizzi

Abstract: Recent research has increasingly focused on classification rules within the big data framework, yet many bioinformatics applications still address prediction problems that involve small-sample, high-dimensional data. In phenotype prediction, especially with the rise of large-scale genomic data, a central challenge arises from handling high-dimensional datasets where the number of genetic features (such as SNPs) far exceeds the sample size. A significant example of such high-dimensional, low-sample datasets is found in aquaculture, a rapidly growing sector within global food production and a crucial source of high-quality protein. This study uses data from an experiment performed on European seabass as a test case, focusing on predicting resistance to Viral Nervous Necrosis (VNN) as a specific phenotype of interest. We explore a range of machine learning techniques to address the complexities of high-dimensional data, from established methods like gradient boosting, SVM, and deep learning to newer approaches. This paper evaluates various methods for associating SNPs with phenotypic traits, benchmarking their performance on challenging aquaculture genomic data to provide insight into the effectiveness of these techniques.
Download

Paper Nr: 243
Title:

A Framework for Reproducible Parallel DNA String Matching

Authors:

Ricardo Regis Cavalcante Chaves and Alba Cristina Magalhaes Alves de Melo

Abstract: In this paper, we propose an output-reproducible framework that executes parallel sequence comparison algorithms that compute the edit distance. The framework generates tables, graphics, and linear regressions that can be used to predict execution times. We also propose parallel OpenMP versions of serial algorithms (DP and UK) used to compute the edit distance. Our parallel DP is antidiagonal block-based: the blocks that belong to the same set of antidiagonals are assigned to different threads, which compute them simultaneously. Due to the data dependencies of UK, we opted to compute each antidiagonal in parallel. Our results with synthetic and real sequences show that the parallel UK version presents the best execution times in most cases. We also show that the linear regressions generated by our tool have errors below 10%, on average.
Download
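
The antidiagonal ordering mentioned in the abstract above can be shown with a serial sketch of the edit-distance dynamic program. Cells are visited one antidiagonal at a time. Each cell depends only on the two previous antidiagonals, so all cells on one antidiagonal are independent of each other, which is the property a parallel version exploits. This is a cell-level illustration with unit costs (an assumption), not the authors' block-based OpenMP implementation.

```python
def edit_distance_antidiagonal(a, b):
    """Edit distance (unit costs) computed in antidiagonal order."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of b[:j]
    # Antidiagonal k holds the cells with i + j == k.
    for k in range(2, n + m + 1):
        # Iterations of this inner loop do not depend on one another;
        # a parallel version could distribute them across threads.
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[n][m]
```

For instance, `edit_distance_antidiagonal("kitten", "sitting")` returns 3.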