Publications | David Guijo-Rubio

April 2026 Pattern Recognition

Splitting criteria for ordinal decision trees: an experimental study

Rafael Ayllón-Gavilán, Francisco José Martínez-Estudillo, David Guijo-Rubio, César Hervás-Martínez, Pedro Antonio Gutiérrez

Ordinal Classification (OC) addresses those classification tasks where the labels exhibit a natural order. Unlike nominal classification, which treats all classes as mutually exclusive and unordered, OC takes the ordinal relationship into account, producing more accurate and relevant results. This is particularly critical in applications where the magnitude of classification errors has significant consequences. Despite this, OC problems are often tackled using nominal methods, leading to suboptimal solutions. Although decision trees are among the most popular classification approaches, ordinal tree-based approaches have received less attention when compared to other classifiers. This work provides a comprehensive survey of ordinal splitting criteria, standardising the notations used in the literature to enhance clarity and consistency. Three ordinal splitting criteria, Ordinal Gini (OGini), Weighted Information Gain (WIG), and Ranking Impurity (RI), are compared to the nominal counterparts of the first two (Gini and information gain), by incorporating them into a decision tree classifier. An extensive repository considering 45 publicly available OC datasets is presented, supporting the first experimental comparison of ordinal and nominal splitting criteria using well-known OC evaluation metrics. The results have been statistically analysed, highlighting that OGini stands out as the best ordinal splitting criterion to date, reducing the mean absolute error achieved by Gini by more than 3.02 %. To promote reproducibility, all source code developed, a detailed guide for reproducing the results, the 45 OC datasets, and the individual results for all the evaluated methodologies are provided.
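
As a rough illustration of why an ordinal splitting criterion can rank candidate nodes differently from a nominal one, the sketch below compares the standard Gini impurity with one cumulative-distribution-based ordinal impurity. The formula in `ordinal_gini` is an assumption chosen for illustration and may differ from the OGini definition used in the paper; all function names are hypothetical.

```python
from collections import Counter


def class_proportions(labels, n_classes):
    """Return the proportion of each class 0..n_classes-1 in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_classes)]


def gini(labels, n_classes):
    """Nominal Gini impurity: 1 - sum_k p_k^2 (ignores class order)."""
    p = class_proportions(labels, n_classes)
    return 1.0 - sum(pk * pk for pk in p)


def ordinal_gini(labels, n_classes):
    """Illustrative ordinal impurity (assumed form): sum of F_k * (1 - F_k)
    over the cumulative proportions F_k of the first n_classes - 1 classes."""
    p = class_proportions(labels, n_classes)
    impurity, cumulative = 0.0, 0.0
    for pk in p[:-1]:
        cumulative += pk
        impurity += cumulative * (1.0 - cumulative)
    return impurity


# Two nodes that look identical to the nominal criterion.
adjacent = [0, 0, 1, 1]  # mixes the neighbouring classes 0 and 1
distant = [0, 0, 3, 3]   # mixes the two extreme classes 0 and 3

print(gini(adjacent, 4), gini(distant, 4))
print(ordinal_gini(adjacent, 4), ordinal_gini(distant, 4))
```

Under this assumed formulation, both nodes receive the same nominal impurity, while the node mixing the two extreme classes is penalised more heavily by the ordinal variant.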

January 2026 Liver International

Excluding Ascites From the GEMA-Na Score Does Not Impact Outcome Predictions in Liver Transplant Candidates

Manuel Luis Rodríguez-Perálvarez, Antonio Manuel Gómez-Orellana, Avik Majumdar, Geoffrey W. McCaughan, María Kalafateli, Rhiannon Taylor, Gloria De La Rosa, María Victoria Aguilera, Mikel Gastaca, Carmen Cepeda-Franco, María Luisa Ortiz, Jordi Colmenero, Alejandra Otero, Rocío González Grande, Alba Cachero, Esther Molina Pérez, Mónica Barreales, Rosa Martín Mateos, María Rodríguez-Soler, Mario Romero, Cristina Dopazo, Carmen Alonso Martín, Elena Otón, Luisa González Diéguez, María Dolores Espinosa, Ana Arias Milla, Gerardo Blanco Fernández, Sara Lorente, Antonio Cuadrado Lavín, Miguel Sogbe, David Guijo-Rubio, César Hervás-Martínez, Emmanuel Tsochatzis

Background and Aims: Although GEMA-Na outperforms MELD 3.0 for liver allocation, concerns about the subjectivity of its ascites component persist. We compared the performance of a GEMA-Na iteration that excludes ascites with other allocation scores. Approach and Results: A multinational cohort study was conducted, including adult candidates for elective liver transplantation in the UK (2010–2020), Australia (1998–2020), and Spain (2016–2021). The primary outcome was mortality or delisting for sickness within 90 days. The prognostic impact of ascites was evaluated using multivariable Cox’s regression. Discrimination was assessed using Harrell’s c-statistics (Hc). The study included 15391 patients (28.5% women). The prevalence of the primary outcome was 5.8% in the UK, 5.3% in Australia, and 4.7% in Spain. The presence and severity of ascites were associated with an incremental risk of the primary outcome: 3.3% without ascites, 5.8% with mild ascites, and 7.7% with moderate–severe ascites (p<0.001). Removal of ascites from the GEMA-Na score resulted in a one-point reduction in 18% of patients (52.4% of patients with moderate–severe ascites). GEMA-Na without ascites showed only a marginal decrease in discrimination (Hc=0.755 vs. Hc=0.753; p=0.007) but still significantly outperformed MELD 3.0 (Hc=0.734; p<0.001) and MELD-Na (Hc=0.737; p<0.001). In women, GEMA-Na with and without ascites demonstrated comparable discrimination (Hc=0.784 vs. Hc=0.783; p=0.61), both outperforming MELD 3.0 (Hc=0.750; p<0.001) and MELD-Na (Hc=0.749; p<0.001). Conclusions: Despite the prognostic impact of ascites among liver transplant candidates, GEMA-Na without ascites outperformed other scores in predicting wait-list outcomes and may be used wherever the inclusion of ascites is considered too subjective.
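
For readers unfamiliar with the discrimination measure quoted above, the following is a minimal sketch of Harrell's c-statistic for right-censored data. It is not the authors' implementation: the function name and toy data are hypothetical, and pairs with tied times are simply skipped, which is one of several conventions.

```python
def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had the event strictly before
    subject j's follow-up time. It is concordant when i also has the higher
    risk score; tied scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in days, event flag, risk score.
times = [5, 10, 15, 20]
events = [1, 0, 1, 0]
scores = [30, 12, 9, 10]
print(harrell_c(times, events, scores))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risk, which is why differences in the third decimal place, as reported above, are small in absolute terms.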

December 2025 Energy and AI

Enhancing wind speed prediction in wind farms through ordinal classification

Antonio Manuel Gómez-Orellana, Marta Vega-Bayo, David Guijo-Rubio, Jorge Pérez-Aracil, Víctor Manuel Vargas, Pedro Antonio Gutiérrez, Luis Prieto-Godino, Sancho Salcedo-Sanz, César Hervás-Martínez

This paper presents and evaluates two novel ordinal classification methods for wind speed prediction, considering three prediction time-horizons: 1h, 4h, and 8h. To address the problem, wind speed values are discretised into four classes, critical for wind farm management. Each class represents essential information for wind farm production, ranging from very low wind speeds to extreme wind speed events and the corresponding production conditions, facilitating operational decisions for wind farm operators. Ordinal classifiers are more suitable than nominal methods to tackle this problem. The study’s primary objective is to compare recently proposed ordinal classifiers for addressing the challenges of wind speed prediction with a focus on extreme wind conditions, which are responsible for many turbine shutdowns. Hourly wind speed measurements from a Spanish wind farm and predictor variables from the European Centre for Medium-Range Weather Forecasts Reanalysis v5 (ERA5 Reanalysis) model are used. The proposed methods include an Artificial Neural Network (ANN) model implementing the Cumulative Link Model as an ordinal output function (MLP-CLMO), which emphasises overall performance, and an ANN model optimised using a soft labelling technique based on triangular distributions (MLP-TO), which excels at handling extreme class performance. The results demonstrate the superiority of both approaches over other nominal and ordinal methods across performance metrics that account for the unbalanced nature and ordinality of the data. MLP-CLMO excels in overall and ordinal performance, while MLP-TO demonstrates superior handling of the extreme class predictions.
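
The cumulative link model mentioned above can be sketched as follows: a single latent score is compared against ordered thresholds, and class probabilities are obtained as differences of cumulative probabilities. This is a generic logistic-link sketch, not the MLP-CLMO implementation; the threshold values and function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def clm_probabilities(score, thresholds):
    """Class probabilities of a cumulative link model with a logit link.

    P(y <= k) = sigmoid(theta_k - score) for increasing thresholds theta_k;
    each class probability is the difference of consecutive cumulatives.
    """
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Three thresholds define four ordered classes (e.g. very low .. extreme).
thresholds = [-1.0, 0.0, 1.0]   # hypothetical values, must be increasing
low_score = clm_probabilities(-5.0, thresholds)
high_score = clm_probabilities(5.0, thresholds)
print(low_score)
print(high_score)
```

Because all classes share one latent score, the resulting probability vector is unimodal along the ordinal scale, which is the property that makes this output suitable for ordered wind speed classes.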

September 2025 Atmosphere

Artificial Intelligence-Based Methods and Algorithms in Fog and Atmospheric Low-Visibility Forecasting

Sancho Salcedo-Sanz, David Guijo-Rubio, Jorge Pérez-Aracil, César Peláez-Rodríguez, Antonio Manuel Gómez-Orellana, Pedro Antonio Gutiérrez-Peña

The accurate prediction of atmospheric low-visibility events due to fog, haze or atmospheric pollution is an extremely important problem, with major consequences for transportation systems, and with alternative applications in agriculture, forest ecology and ecosystems management. In this paper, we provide a comprehensive literature review and analysis of AI-based methods applied to the forecasting of fog and low-visibility events. We also discuss the main general issues which arise when dealing with AI-based techniques in this kind of problem, open research questions, novel AI approaches and data sources which can be exploited. Finally, the most important new AI-based methodologies which can improve atmospheric visibility forecasting are also reviewed, including computational experiments on the application of ordinal classification approaches to a problem of low-visibility event prediction at two Spanish airports from METAR data.

June 2025 International Work-Conference on Artificial Neural Networks

Knee Osteoarthritis Severity Grading Using Soft Labelling and Ordinal Classification

Francisco Bérchez-Moreno, Víctor M. Vargas, Antonio M. Gómez-Orellana, David Guijo-Rubio, Luca Romeo, Edoardo Conti, Pedro A. Gutiérrez, César Hervás-Martínez

Knee Osteoarthritis (KOA) is a progressive joint disease characterised by stiffness and pain, among other symptoms. It is generally diagnosed by evaluating physical symptoms, medical history, and screening techniques. However, conventional methods are often subjective, posing a significant challenge to the early grading of disease progression. To address this issue and support clinical decision-making, we propose an ordinal deep learning framework to study the optimal combination of loss functions and output methodologies with soft labelling approaches, for automatic KOA severity grading based on Kellgren and Lawrence scores from X-ray images. A total of 20 combinations (2 loss functions × 2 output methodologies × 5 soft labelling approaches) are compared in this study, using a public dataset. The optimal configuration uses the categorical cross entropy loss, a cumulative link model as output, and a beta distribution for soft labelling. The results achieved demonstrate the efficacy of these ordinal classification approaches.
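
To give an idea of what beta-distribution soft labelling looks like, the sketch below replaces a one-hot target with the probability mass that a Beta(a, b) density assigns to equal-width bins of [0, 1], one bin per class. The (a, b) values are hypothetical and chosen only for illustration; the paper's actual parameterisation per class is not reproduced here.

```python
def beta_soft_labels(a, b, n_classes, steps=1000):
    """Discretise a Beta(a, b) density into `n_classes` soft-label masses.

    The unnormalised density x^(a-1) * (1-x)^(b-1) is integrated over
    equal-width bins of [0, 1] with the midpoint rule, then normalised so
    that the masses sum to one.
    """
    width = 1.0 / n_classes
    h = width / steps
    masses = []
    for k in range(n_classes):
        lower = k * width
        mass = 0.0
        for s in range(steps):
            x = lower + (s + 0.5) * h  # midpoint, never exactly 0 or 1
            mass += x ** (a - 1) * (1 - x) ** (b - 1) * h
        masses.append(mass)
    total = sum(masses)
    return [m / total for m in masses]


# Five ordered grades (0..4); parameters below are illustrative assumptions.
label_for_grade_0 = beta_soft_labels(1, 6, 5)   # mass concentrated at grade 0
label_for_grade_2 = beta_soft_labels(4, 4, 5)   # symmetric around grade 2
print(label_for_grade_0)
print(label_for_grade_2)
```

Compared with a one-hot target, such a label still peaks at the true grade but assigns decreasing mass to increasingly distant grades, so larger ordinal errors are penalised more by a cross-entropy loss.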

June 2025 International Work-Conference on Artificial Neural Networks

Hybrid Dropout for Deep Ordinal Classification

Francisco Bérchez-Moreno, Francisco Moreno-Cano, David Guijo-Rubio, Víctor M. Vargas, Pedro A. Gutiérrez, César Hervás-Martínez

This paper presents a new application of a hybrid dropout technique for Ordinal Classification (OC), based on a novel regularisation method. Unlike standard dropout, which ignores class ordering, this hybrid dropout integrates ordinal information by adjusting neurons' dropout probabilities based on their correlation with target labels. We evaluate its effectiveness using a ResNet18 architecture over three new OC datasets and compare it with the standard dropout approach and with an architecture with no dropout. Results show that the hybrid dropout consistently achieves the best performance across multiple well-known metrics (1-off, QWK, MAE, AMAE, and RPS), while also reducing prediction variability. Statistical analysis using the Wilcoxon signed-rank test confirms its robustness, obtaining 21 significant wins out of 30 comparisons, with no losses. These results highlight the importance of designing regularisation strategies that consider the problem's ordinal structure, demonstrating that hybrid dropout effectively enhances generalisation and predictive accuracy.
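
Three of the ordinal metrics named above (MAE, 1-off accuracy and QWK) can be computed with a few lines of plain Python. The sketch below uses the standard textbook definitions and hypothetical function names; it is not taken from the paper's code, omits AMAE and RPS, and does not handle the degenerate case where the QWK denominator is zero.

```python
def mae(y_true, y_pred):
    """Mean absolute error between integer class indices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def one_off_accuracy(y_true, y_pred):
    """Fraction of predictions at most one class away from the target."""
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights."""
    n = len(y_true)
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_totals = [sum(row) for row in observed]
    pred_totals = [sum(observed[i][j] for i in range(n_classes))
                   for j in range(n_classes)]
    weighted_observed = weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of targets and predictions.
            weighted_expected += weight * true_totals[i] * pred_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Toy example with four ordered classes (hypothetical data).
y_true = [0, 1, 2, 3, 3]
y_pred = [0, 1, 1, 3, 1]
print(mae(y_true, y_pred))
print(one_off_accuracy(y_true, y_pred))
print(quadratic_weighted_kappa(y_true, y_pred, 4))
```

All three metrics depend on the distance between predicted and true classes, which is why they are preferred over plain accuracy when the labels are ordered.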

May 2025 EGU General Assembly 2025

Wind speed prediction using ordinal classification: an analysis of extreme values

David Guijo-Rubio, Antonio M. Gómez-Orellana, Víctor M. Vargas, Rafael Ayllón-Gavilán, Laura Cornejo-Bueno, Francisco Moreno-Cano, César Hervás-Martínez, Sancho Salcedo-Sanz, Pedro A. Gutiérrez

Wind speed forecasting represents a significant challenge in the global transition to sustainable energy systems. Wind energy, characterised by zero greenhouse gas emissions and relatively low cost, is a renewable resource that depends heavily on meteorological conditions, which are inherently variable and unpredictable. This variability and intermittency present substantial obstacles to ensuring a consistent power supply, underscoring the importance of accurate wind speed prediction as a critical area of research. Among the various approaches explored to address this challenge, machine learning (ML) has emerged as a prominent solution. ML includes methodologies such as regression (predicting continuous values of wind speed) and nominal classification (predicting discrete categories of wind speed). In nominal classification, wind speeds are discretised into classes to provide essential information for wind farm operations. In this study, wind speeds are categorised into four classes: 1) very low speeds, 2) moderate speeds, 3) high speeds, and 4) extreme wind speeds. While both very low and extreme speeds result in no power generation, this work focuses on the extreme wind speed class, as these events often necessitate turbine shutdowns to prevent structural damage. To address the challenges of wind speed forecasting with a focus on extreme wind events, we propose the use of ordinal classification, an ML paradigm specifically designed for tasks where output categories exhibit a natural order, as is the case in this work. This study evaluates hourly wind speed predictions for a wind farm in Spain, using data collected over more than 15 years. Additionally, input features include meteorological variables such as temperature, wind components (u and v), and sea level pressure, among others. Forecasts are generated for three time horizons (1h, 4h, and 8h) to provide sufficient lead time for mitigating risks associated with extreme wind conditions.
Two ordinal classification models based on artificial neural networks (ANNs) are analysed: 1) an ANN coupled with the cumulative link model (CLM), and 2) an ANN using a soft labelling optimisation technique. Additionally, other competitive ordinal and nominal classification methods are included for comparative analysis. The results demonstrate that the proposed models outperform a number of nominal and ordinal classification methods. The ANN coupled with CLM delivers superior overall performance across all four classes, while the ANN employing the soft labelling approach achieves higher accuracy in predicting extreme wind speed events. These findings underscore the potential of ordinal classification to enhance wind speed forecasting, contributing to more effective wind farm management and the broader integration of renewable energy sources.

May 2025 Integrated Computer-Aided Engineering

Simultaneous multi-step wind speed prediction on multiple farms using multi-task deep learning

Rafael Ayllón-Gavilán, Antonio Manuel Gómez-Orellana, Víctor Manuel Vargas, David Guijo-Rubio, Jorge Pérez-Aracil, Sancho Salcedo-Sanz, Pedro Antonio Gutiérrez, César Hervás-Martínez

In this paper, we present the MUSONet model, which leverages information from different sources (in this case, wind farms) to perform a multi-step wind speed prediction. The main goal of this approach is improving the global prediction accuracy, specifically at longer prediction horizons. Thus, the proposed model is able to simultaneously predict the wind speed at three different prediction horizons (6h, 12h, and 24h), across three different wind farms located in Spain. We also evaluate the performance of the presented methodology by considering three different activation functions for hidden neurons in the neural network: Sigmoid, ReLU, and ELUs+2L. The results show that the proposed multi-source approach improves the performance of the single-source counterpart for the longer prediction horizons (12h and 24h). In addition, the proposed multi-source method reduces by over 30 % the number of parameters compared to three single-source models (in this case, one model per wind farm), resulting in a simpler solution for the problem addressed and requiring much lower computational resources.

May 2025 EGU General Assembly 2025

Improving Resilience to Wind Extremes: An AI-Driven Approach

Laura Cornejo-Bueno, César Peláez-Rodríguez, David Guijo-Rubio, Cosmin Marina, Sancho Salcedo-Sanz

Wind extremes, encompassing both high-intensity wind events and periods of diminished wind activity, pose multifaceted challenges across sectors such as renewable energy production, infrastructure resilience, and environmental risk management. These phenomena, driven by complex interactions within atmospheric systems, demand innovative analytical and predictive approaches. This study explores the application of artificial intelligence (AI) to address these challenges, focusing on its potential to enhance the identification of patterns, improve forecasting accuracy, and integrate diverse meteorological datasets. By leveraging machine learning models and exploring their adaptability to wind-related datasets, this work aims to outline a framework for robust analysis and prediction of wind extremes. The versatility of AI techniques in handling the complexities of wind extremes positions them as pivotal tools for improving preparedness and resilience in various sectors.

April 2025 Transplantation

Machine Learning Algorithms in Controlled Donation After Circulatory Death Under Normothermic Regional Perfusion: A Graft Survival Prediction Model

Rafael Calleja, Marcos Rivera, David Guijo-Rubio, Amelia J Hessheimer, Gloria De La Rosa, Mikel Gastaca, Alejandra Otero, Pablo Ramírez, Andrea Boscà-Robledo, Julio Santoyo, et al.

Background: Several scores have been developed to stratify the risk of graft loss in controlled donation after circulatory death (cDCD). However, their performance is unsatisfactory in the Spanish population, where most cDCD livers are recovered using normothermic regional perfusion (NRP). Consequently, we explored the role of different machine learning-based classifiers as predictive models for graft survival. A risk stratification score integrated with the model of end-stage liver disease score in a donor-recipient (D-R) matching system was developed. Methods: This retrospective multicenter cohort study used 539 D-R pairs of cDCD livers recovered with NRP, including 20 donor, recipient, and NRP variables. The following machine learning-based classifiers were evaluated: logistic regression, ridge classifier, support vector classifier, multilayer perceptron, and random forest. The endpoints were the 3- and 12-mo graft survival rates. A 3- and 12-mo risk score was developed using the best model obtained. Results: Logistic regression yielded the best performance at 3 mo (area under the receiver operating characteristic curve = 0.82) and 12 mo (area under the receiver operating characteristic curve = 0.83). A D-R matching system was proposed on the basis of the current model of end-stage liver disease score and cDCD-NRP risk score. Conclusions: The satisfactory performance of the proposed score within the study population suggests a significant potential to support liver allocation in cDCD-NRP grafts. External validation is challenging, but this methodology may be explored in other regions.

March 2025 Clinical Gastroenterology and Hepatology

Gender-Equity Model for Liver Allocation using Artificial Intelligence (GEMA-AI) for waiting list liver transplant prioritization

Antonio Manuel Gómez-Orellana, Manuel Luis Rodríguez-Perálvarez, David Guijo-Rubio, Pedro Antonio Gutiérrez, Avik Majumdar, Geoffrey W McCaughan, Rhiannon Taylor, Emmanuel A Tsochatzis, César Hervás-Martínez

Background & Aims: We aimed to develop and validate an artificial intelligence score (GEMA-AI) to predict liver transplant (LT) waiting list outcomes using the same input variables contained in existing models. Methods: Cohort study including adult LT candidates enlisted in the United Kingdom (2010-2020) for model training and internal validation, and in Australia (1998-2020) for external validation. GEMA-AI combined international normalized ratio, bilirubin, sodium, and the Royal Free Glomerular Filtration Rate in an explainable Artificial Neural Network. GEMA-AI was compared with GEMA-Na, MELD 3.0, and MELD-Na for waiting list prioritization. Results: The study included 9,320 patients: training cohort n=5,762, internal validation cohort n=1,920, and external validation cohort n=1,638. The prevalence of 90-day mortality or delisting for sickness ranged from 5.3% to 6% across different cohorts. GEMA-AI showed better discrimination than GEMA-Na, MELD-Na and MELD 3.0 in the internal and external validation cohorts, with a more pronounced benefit in women and in patients showing at least one extreme analytical value. Accounting for identical input variables, the transition from a linear to a non-linear score (from GEMA-Na to GEMA-AI) resulted in a differential prioritization of 6.4% of patients within the first 90 days and would potentially save one in 59 deaths overall, and one in 13 deaths among women. Results did not substantially change when ascites was not included in the models. Conclusions: The use of explainable machine learning models may be preferred over conventional regression-based models for waiting list prioritization in LT. GEMA-AI made more accurate predictions of waiting list outcomes, particularly for the sickest patients.

March 2025 Neurocomputing

dlordinal: A Python package for deep ordinal classification

Francisco Bérchez-Moreno, Rafael Ayllón-Gavilán, Víctor Manuel Vargas, David Guijo-Rubio, César Hervás-Martínez, Juan Carlos Fernández, Pedro Antonio Gutiérrez

dlordinal is a new Python library that unifies many recent deep ordinal classification methodologies available in the literature. Developed using PyTorch as the underlying framework, it implements the top-performing state-of-the-art deep learning techniques for ordinal classification problems. Ordinal approaches are designed to leverage the ordering information present in the target variable. Specifically, it includes loss functions, various output layers, dropout techniques, soft labelling methodologies, and other classification strategies, all of which are appropriately designed to incorporate the ordinal information. Furthermore, as the performance metrics to assess novel proposals in ordinal classification depend on the distance between target and predicted classes in the ordinal scale, suitable ordinal evaluation metrics are also included. dlordinal is distributed under the BSD-3-Clause license and is available at https://github.com/ayrna/dlordinal

February 2025 IEEE Transactions on Cybernetics

Convolutional- and Deep Learning-Based Techniques for Time Series Ordinal Classification

Rafael Ayllón-Gavilán, David Guijo-Rubio, Pedro Antonio Gutiérrez, Anthony Bagnall, César Hervás-Martínez

Time-series classification (TSC) covers the supervised learning problem where input data is provided in the form of series of values observed through repeated measurements over time, and whose objective is to predict the category to which they belong. When the class values are ordinal, classifiers that take this into account can perform better than nominal classifiers. Time-series ordinal classification (TSOC) is the field bridging this gap, yet it remains unexplored in the literature. There is a wide range of time-series problems showing an ordered label structure, and TSC techniques that ignore the order relationship discard useful information. Hence, this article presents the first benchmarking of TSOC methodologies, exploiting the ordering of the target labels to boost the performance of current TSC state of the art. Both convolutional- and deep-learning-based methodologies (among the best performing alternatives for nominal TSC) are adapted for TSOC. For the experiments, a selection of 29 ordinal problems has been made. In this way, this article contributes to the establishment of the state of the art in TSOC. The results obtained by ordinal versions are found to be significantly better than current nominal TSC techniques in terms of ordinal performance metrics, outlining the importance of considering the ordering of the labels when dealing with this kind of problem.

December 2024 Applied Ocean Research

Fuzzy-based ensemble methodology for accurate long-term prediction and interpretation of extreme significant wave height events

César Peláez-Rodríguez, Jorge Pérez-Aracil, Antonio Manuel Gómez-Orellana, David Guijo-Rubio, Víctor Manuel Vargas, Pedro Antonio Gutiérrez, César Hervás-Martínez, Sancho Salcedo-Sanz

Providing an accurate prediction of Significant Wave Height (SWH), and especially of extreme SWH events, is crucial for coastal engineering activities and holds major implications in several sectors such as offshore renewable energy. With the aim of overcoming the challenge of skewness and imbalance associated with the prediction of these extreme SWH events, a fuzzy-based cascade ensemble of regression models is proposed. This methodology remarkably improves the predictive performance on the extreme SWH values by using different models specialised in different ranges of the target domain. The method’s explainability is enhanced by analysing the contribution of each model, aiding in identifying the predictor variables most characteristic of extreme SWH events. The methodology has been validated by tackling a long-term SWH prediction problem, considering two case studies over the southwest coast of the United States of America. Both reanalysis data, providing information on various meteorological factors, and SWH measurements, obtained from the nearby stations and the station under examination, have been considered. The goodness of the proposed approach has been validated by comparing its performance against several machine learning and deep learning regression techniques, leading to the conclusion that fuzzy ensemble models perform much better in the prediction of extreme events, at the cost of a slight deterioration in the rest of the samples. The study contributes to advancing the SWH prediction field and, especially, to understanding the behaviour behind extreme SWH events, which is critical for various sectors reliant on oceanic conditions.

September 2024 Knowledge-Based Systems

EBANO: A novel Ensemble BAsed on uNimodal Ordinal classifiers for the prediction of significant wave height

Víctor M Vargas, Antonio M Gómez-Orellana, Pedro A Gutiérrez, César Hervás-Martínez, David Guijo-Rubio

In this study, we present EBANO (Ensemble BAsed on uNimodal Ordinal classifiers), which is a novel ensemble approach of ordinal classifiers that includes four soft labelling approaches along with an ordinal logistic regression model. These models are integrated within the ensemble using a new aggregation methodology that automatically weights each individual classifier using a randomised search algorithm. In addition, the proposed EBANO methodology is applied to tackle short-term prediction of Significant Wave Height (SWH). Thus, we employ EBANO using a diverse set of eight datasets derived from reanalysis data and buoy-recorded SWH measurements. To approach the problem from an ordinal classification perspective, the SWH values are discretised into five ordered classes by applying hierarchical clustering. EBANO is compared with each of the individual classifiers integrated in the proposed ensemble along with a different ensemble technique termed HESCA. Both the average results and the ranks obtained show the superiority of EBANO over the compared methodologies, being more pronounced in the metrics that account for the imbalance present in the datasets considered. Finally, a statistical analysis is performed, confirming the statistical significance of the observed differences in all comparisons. This analysis underscores the effectiveness of EBANO in addressing the problem of SWH prediction, showcasing its excellence.
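
The idea of weighting ensemble members through a randomised search can be sketched as follows. This is a simplified illustration and not the EBANO aggregation procedure: the objective (validation MAE), the sampling scheme, the function names and the toy probabilities are all assumptions.

```python
import random


def combine(prob_sets, weights):
    """Weighted average of the class-probability vectors of each classifier."""
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    return [
        [sum(w * probs[s][k] for w, probs in zip(weights, prob_sets))
         for k in range(n_classes)]
        for s in range(n_samples)
    ]


def ensemble_mae(probs, y_true):
    """MAE of the arg-max predictions derived from `probs`."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return sum(abs(p - t) for p, t in zip(preds, y_true)) / len(y_true)


def random_search_weights(prob_sets, y_true, n_iter=200, seed=0):
    """Sample normalised weight vectors at random and keep the best one."""
    rng = random.Random(seed)
    n = len(prob_sets)
    best_weights = [1.0 / n] * n  # start from the uniform ensemble
    best_error = ensemble_mae(combine(prob_sets, best_weights), y_true)
    for _ in range(n_iter):
        raw = [rng.random() for _ in range(n)]
        total = sum(raw)
        weights = [r / total for r in raw]
        error = ensemble_mae(combine(prob_sets, weights), y_true)
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error


# Validation probabilities of two hypothetical classifiers on three samples.
clf_a = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5]]
clf_b = [[0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.6, 0.3, 0.1]]
y_val = [0, 1, 2]
weights, error = random_search_weights([clf_a, clf_b], y_val)
print(weights, error)
```

Starting from uniform weights guarantees that the search never returns an ensemble that is worse, on the validation data, than simple averaging.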

September 2024 Journal of Machine Learning Research

aeon: a Python toolkit for learning from time series

Matthew Middlehurst, Ali Ismail-Fawaz, Antoine Guillaume, Christopher Holder, David Guijo-Rubio, Guzal Bulatova, Leonidas Tsaprounis, Lukasz Mentel, Martin Walter, Patrick Schäfer, Tony Bagnall

aeon is a unified Python 3 library for all machine learning tasks involving time series. The package contains modules for time series forecasting, classification, extrinsic regression and clustering, as well as a variety of utilities, transformations and distance measures designed for time series data. aeon also has a number of experimental modules for tasks such as anomaly detection, similarity search and segmentation. aeon follows the scikit-learn API as much as possible to help new users and enable easy integration of aeon estimators with useful tools such as model selection and pipelines. It provides a broad library of time series algorithms, including efficient implementations of the very latest advances in research. Using a system of optional dependencies, aeon integrates a wide variety of packages into a single interface while keeping the core framework with minimal dependencies. The package is distributed under the 3-Clause BSD license and is available at https://github.com/aeon-toolkit/aeon.

August 2024 eClinicalMedicine

GEMA-Na and MELD 3.0 severity scores to address sex disparities for accessing liver transplantation: a nationwide retrospective cohort study

Manuel Luis Rodríguez-Perálvarez, Gloria De La Rosa, Antonio Manuel Gómez-Orellana, María Victoria Aguilera, Teresa Pascual Vicente, Sheila Pereira, María Luisa Ortiz, Giulia Pagano, Francisco Suarez, Rocío González Grande, et al.

Background: The Gender-Equity Model for liver Allocation corrected by serum sodium (GEMA-Na) and the Model for End-stage Liver Disease 3.0 (MELD 3.0) could amend sex disparities for accessing liver transplantation (LT). We aimed to assess these inequities in Spain and to compare the performance of GEMA-Na and MELD 3.0. Methods: Nationwide cohort study including adult patients listed for a first elective LT (January 2016–December 2021). The primary outcome was mortality or delisting for sickness within the first 90 days. Independent predictors of the primary outcome were evaluated using multivariate Cox’s regression with adjusted relative risks (RR) and 95% confidence intervals (95% CI). The discrimination of GEMA-Na and MELD 3.0 was assessed using Harrell c-statistics (Hc). Findings: The study included 6071 patients (4697 men and 1374 women). Mortality or delisting for clinical deterioration occurred in 286 patients at 90 days (4.7%). Women had reduced access to LT (83.7% vs. 85.9%; p = 0.037) and increased risk of mortality or delisting for sickness at 90 days (adjusted RR = 1.57 [95% CI 1.09–2.28]; p = 0.017). Female sex remained an independent risk factor when using MELD or MELD-Na but lost its significance in the presence of GEMA-Na or MELD 3.0. Among patients included for reasons other than tumours (n = 3606; 59.4%), GEMA-Na had Hc = 0.753 (95% CI 0.715–0.792), which was higher than MELD 3.0 (Hc = 0.726 [95% CI 0.686–0.767]; p = 0.001), with both models showing adequate calibration. Interpretation: GEMA-Na and MELD 3.0 might correct sex disparities for accessing LT, but GEMA-Na provides more accurate predictions of waiting list outcomes and could be considered the standard of care for waiting list prioritization.

August 2024 Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

A hands-on introduction to time series classification and regression

Anthony Bagnall, Matthew Middlehurst, Germain Forestier, Ali Ismail-Fawaz, Antoine Guillaume, David Guijo-Rubio, Chang Wei Tan, Angus Dempster, Geoffrey I Webb

Time series classification and regression are rapidly evolving fields that find areas of application in all domains of machine learning and data science. This hands-on tutorial will provide an accessible overview of the recent research in these fields, using code examples to introduce the process of implementing and evaluating an estimator. We will show how to easily reproduce published results and how to compare a new algorithm to the state of the art. Finally, we will work through real-world examples from the field of Electroencephalogram (EEG) classification and regression. EEG machine learning tasks arise in medicine, brain-computer interface research and psychology. We use these problems to show how to compare algorithms on problems from a single domain and how to deal with data with different characteristics, such as missing values, unequal length and high dimensionality. The latest advances in the fields of time series classification and regression are all available through the aeon toolkit, an open source, scikit-learn compatible framework for time series machine learning which we use to provide our code examples.

DOI URL

July 2024 Engineering Applications of Artificial Intelligence

ORFEO: Ordinal classifier and Regressor Fusion for Estimating an Ordinal categorical target

Antonio Manuel Gómez-Orellana, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez, Víctor Manuel Vargas

In this paper we present a novel methodology, referred to as ORFEO (Ordinal classifier and Regressor Fusion for Estimating an Ordinal categorical target), to enhance the performance in ordinal classification problems for which the latent variable is observable. ORFEO is an artificial neural network model incorporating two outputs, one for ordinal classification, using the cumulative link model, and one for regression, using a linear model. Both outputs are simultaneously optimised considering a loss function that linearly combines both classification and regression losses. The main motivation behind developing the proposed approach is to enhance the performance of a standard ordinal classifier. This improvement is facilitated by considering the regression output, which allows the model to differentiate between patterns within the same category. The ORFEO model is applied to two problems in the field of marine and ocean engineering: short-term prediction of both significant wave height and flux of energy. Both problems are addressed considering four different coastal zones of the United States of America, using 13 datasets formed by buoy measurements and reanalysis data. A comprehensive comparison against 20 methodologies, including regression and nominal/ordinal classification approaches, is performed using diverse nominal and ordinal performance metrics. Ranks achieved indicate that ORFEO outperforms all the compared methodologies in terms of all the performance measures, demonstrating the efficacy and robustness of the proposal. Finally, a statistical analysis is conducted, concluding that there are statistically significant differences across ordinal and nominal performance metrics in favour of the proposed ORFEO model.

DOI URL

June 2024 Journal of Hepatology

OS-024 The gender-equity model for liver allocation built on artificial intelligence (GEMA-AI) improves outcome predictions among liver transplant candidates

Manuel Rodríguez-Perálvarez, Antonio M Gómez-Orellana, David Guijo-Rubio, Pedro Gutierrez, Avik Majumdar, Geoff McCaughan, Rhiannon Taylor, César Hervás, Emmanuel Tsochatzis

Background and aims: Current prioritization models for liver transplantation (LT) are hampered by their linear nature, which does not fully capture the severity of patients with extreme analytical values. We aimed to develop and externally validate the Gender-Equity Model for Liver Allocation built on Artificial Intelligence (GEMA-AI) to predict waiting list outcomes in candidates for LT. Method: Cohort study including adult patients who qualified for elective LT in the United Kingdom (2010–2020, model training and internal validation) and in two Australian institutions (1998–2020, external validation). The Gender-Equity Model for Liver Allocation corrected by serum sodium (GEMA-Na) was compared with GEMA-AI, which was built on a shallow artificial neural network optimized by neuroevolution and hybridization using the same input variables. The primary outcome was mortality or delisting for sickness within the first 90 days. Discrimination was assessed by Harrell’s c-statistic (Hc). This study was funded by the Instituto de Salud Carlos III (Project no. PI22/00312) and co-funded by the European Union. Results: The study population comprised 9,320 patients: training cohort n = 5,762, internal validation cohort n = 1,920, and external validation cohort n = 1,638. The prevalence of 90-day mortality or delisting for sickness ranged from 5.3% to 6% in the different cohorts. The transition from a linear to a non-linear score (from GEMA-Na to GEMA-AI) resulted in improved discrimination in the internal and external validation cohorts (Hc = 0.766 vs Hc = 0.781; p = 0.035 and Hc = 0.774 vs Hc = 0.793; p = 0.003, respectively), these differences being more pronounced in women (Hc = 0.802 vs Hc = 0.826; p = 0.048 and Hc = 0.796 vs Hc = 0.836; p = 0.014, respectively). Among 1,403 patients (39.4% of the merged validation cohorts) who showed at least one extreme analytical value, GEMA-AI had Hc = 0.823 compared to Hc = 0.797 (p = 0.036).
In this subpopulation, GEMA-AI showed a good calibration (chi-square = 5.04; p = 0.66) whereas GEMA-Na did not (chi-square = 18.94; p = 0.015). A meaningful change ≥2 score prioritization points occurred in 27.8% of patients (11.4% upgraded, 16.4% downgraded). Differential prioritization would occur in 6.4% of the available organs within the first 90 days and would save one in 59 deaths overall, and one in 13 deaths among women. Conclusion: The use of non-linear explainable machine learning models may improve predictions of waiting list outcomes, particularly in the sickest patients showing extreme analytical values. Their use should be preferred over Cox’s regression-based models.

DOI URL

June 2024 Conference of the Spanish Association for Artificial Intelligence

O-Hydra: A Hybrid Convolutional and Dictionary-Based Approach to Time Series Ordinal Classification

Rafael Ayllón-Gavilán, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

Time Series Ordinal Classification (TSOC) is a still largely unexplored field that is expected to grow substantially in the coming years, given its applicability to numerous real-world problems and the possibility of obtaining more consistent predictions than nominal Time Series Classification (TSC). Specifically, TSOC involves time series data along with an ordinal categorical output. That is, there is a natural order relationship among the labels associated with the time series. TSOC is a subfield of nominal TSC, with the main distinction being that TSOC exploits the ordinality of the labels to boost the performance. Two categories within the TSC taxonomy are dictionary-based and convolution-based methodologies, each representing competing approaches presented in the literature. In this study, we adapt the Hybrid Dictionary-Rocket Architecture (Hydra) approach, which incorporates elements from the two previous categories, to TSOC, resulting in O-Hydra. For the experiments, we have included a collection of 21 ordinal problems sourced from two well-known archives. O-Hydra has been benchmarked against its nominal counterpart, Hydra, as well as against two state-of-the-art approaches in the two previous categories, TDE and ROCKET, including their ordinal counterparts, O-TDE and O-ROCKET, respectively. The results achieved by the ordinal versions significantly outperformed those of current nominal TSC techniques. This underscores the significance of incorporating the label ordering when addressing such problems.

DOI URL

June 2024 International Work-Conference on the Interplay Between Natural and Artificial Computation

Energy Flux Prediction Using an Ordinal Soft Labelling Strategy

Antonio M Gómez-Orellana, Víctor M Vargas, Pedro A Gutiérrez, Jorge Pérez-Aracil, Sancho Salcedo-Sanz, César Hervás-Martínez, David Guijo-Rubio

This paper addresses the problem of short-term energy flux prediction. For this purpose, we propose the use of an ordinal classification neural network model optimised using the triangular regularised categorical cross-entropy loss, termed MLP-T. This model is based on a soft labelling strategy that replaces the crisp 0/1 labels in the loss computation with soft versions encoding the ordinal information. This soft label encoding leverages the inherent ordering between categories to reduce the cost of ordinal classification errors and improve model generalisation performance. Specifically, the soft labels for each target class are derived from triangular probability distributions. To assess the performance of MLP-T, six datasets built from buoy measurements and reanalysis data have been used. MLP-T has been compared to nominal and ordinal classification techniques in terms of four performance metrics. MLP-T achieved an outstanding performance across all datasets and performance metrics, securing the best mean results. Despite the imbalanced nature of the problem, which makes the ordinal classification task notably difficult, MLP-T achieved good results in all classes across all datasets, including the underrepresented classes. Remarkably, MLP-T was the only approach that correctly classified at least one instance of the minority class in all datasets. Furthermore, MLP-T secured the top rank in all cases, confirming its suitability for the problem addressed.
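The soft labelling idea described in this abstract can be sketched as follows. This is a deliberately simplified illustration, not the paper's triangular parametrisation: the function names and the single smoothing parameter `alpha` (the probability mass moved from the true class to its immediate neighbours) are assumptions made here.

```python
import numpy as np


def triangular_soft_labels(num_classes: int, alpha: float = 0.1) -> np.ndarray:
    """Return a (num_classes, num_classes) matrix whose row q is the soft label of class q.

    Simplified scheme (an assumption of this sketch): the true class keeps 1 - alpha
    and the remaining alpha is shared equally among its adjacent classes.
    """
    if num_classes < 2:
        raise ValueError("at least two ordered classes are required")
    labels = np.zeros((num_classes, num_classes))
    for q in range(num_classes):
        neighbours = [k for k in (q - 1, q + 1) if 0 <= k < num_classes]
        labels[q, q] = 1.0 - alpha
        for k in neighbours:
            labels[q, k] = alpha / len(neighbours)
    return labels


def soft_cross_entropy(probs: np.ndarray, soft_targets: np.ndarray, eps: float = 1e-12) -> float:
    """Categorical cross-entropy where the targets are soft label vectors, not one-hot."""
    return float(-np.mean(np.sum(soft_targets * np.log(probs + eps), axis=1)))


# Hypothetical example: true class 0 out of 4 ordered classes.
soft = triangular_soft_labels(4, alpha=0.2)
target = soft[[0]]
near_miss = np.array([[0.05, 0.85, 0.05, 0.05]])  # mass on the adjacent class
far_miss = np.array([[0.05, 0.05, 0.05, 0.85]])   # mass on the most distant class
print(soft_cross_entropy(near_miss, target), soft_cross_entropy(far_miss, target))
```

With one-hot targets both predictions would receive the same loss; with the soft targets the prediction on the adjacent class is penalised less, which is how the ordering between categories enters the optimisation.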

DOI URL

June 2024 Conference of the Spanish Association for Artificial Intelligence

Age Estimation Using Soft Labelling Ordinal Classification Approaches

Víctor M Vargas, Antonio M Gómez-Orellana, David Guijo-Rubio, Francisco Bérchez-Moreno, Pedro Antonio Gutiérrez, César Hervás-Martínez

This work explores the use of diverse soft labelling approaches recently proposed in the literature to address four distinct problems in age estimation. This kind of challenge can be considered an ordinal classification problem in machine learning or deep learning areas, as it exhibits a natural order among categories, reflecting the underlying age ranges defining each category. Soft labelling represents a machine learning approach in which, instead of assigning a single label to each instance in the dataset, a probability distribution across a range of labels is allocated. Soft labelling approaches prove particularly effective for age estimation due to the inherent uncertainty and continuity in age progression, which makes accurate age estimation from physical appearance difficult. Unlike categorical labels, age is a continuous variable that evolves over time. Thus, unlike hard labelling, soft labelling more effectively acknowledges the continuity and uncertainty inherent in age estimation. The experiments conducted in this study facilitate the comparison of soft labelling approaches against the nominal baseline. Results demonstrate superior performance of soft labelling approaches. Moreover, the statistical analysis reveals that use of a beta distribution to define soft labels yields the best results.

DOI URL

May 2024 Data Mining and Knowledge Discovery

Unsupervised feature based algorithms for time series extrinsic regression

David Guijo-Rubio, Matthew Middlehurst, Guilherme Arcencio, Diego Furtado Silva, Anthony Bagnall

Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable that is not directly related to the regressor series. The TSER archive for comparing algorithms was released in 2022 with 19 problems. We increase the size of this archive to 63 problems and reproduce the previous comparison of baseline algorithms. We then extend the comparison to include a wider range of standard regressors and the latest versions of TSER models used in the previous study. We show that none of the previously evaluated regressors can outperform a regression adaptation of a standard classifier, rotation forest. We introduce two new TSER algorithms developed from related work in time series classification. FreshPRINCE is a pipeline estimator consisting of a transform into a wide range of summary features followed by a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics over random intervals. Our study demonstrates that both algorithms, along with InceptionTime, exhibit significantly better performance compared to the other 18 regressors tested. More importantly, DrCIF is the only one that significantly outperforms a standard rotation forest regressor.
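To make the phrase "features from summary statistics over random intervals" concrete, here is a minimal NumPy sketch of the interval-feature step. It is not the DrCIF implementation: the function name, the choice of three statistics (mean, standard deviation, slope) and the sampling scheme are assumptions, and DrCIF additionally uses other series representations and a tree ensemble on top of such features.

```python
import numpy as np


def random_interval_features(X: np.ndarray, n_intervals: int = 5,
                             min_length: int = 3, seed: int = 0) -> np.ndarray:
    """Summarise random intervals of each series by mean, standard deviation and slope.

    X has shape (n_cases, n_timepoints). The result has shape
    (n_cases, 3 * n_intervals), ordered as [mean, std, slope] per interval.
    """
    n_cases, n_timepoints = X.shape
    if not 2 <= min_length <= n_timepoints:
        raise ValueError("min_length must be between 2 and the series length")
    rng = np.random.default_rng(seed)
    features = []
    for _ in range(n_intervals):
        length = int(rng.integers(min_length, n_timepoints + 1))
        start = int(rng.integers(0, n_timepoints - length + 1))
        segment = X[:, start:start + length]
        # Least-squares slope of each segment against the time index.
        t_centred = np.arange(length) - (length - 1) / 2.0
        centred = segment - segment.mean(axis=1, keepdims=True)
        slope = centred @ t_centred / (t_centred @ t_centred)
        features.extend([segment.mean(axis=1), segment.std(axis=1), slope])
    return np.column_stack(features)
```

The resulting feature matrix can be passed to any standard tabular regressor, which is the general pipeline shape the abstract describes for interval-based and feature-based approaches.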

DOI URL

May 2024 XII Congreso Científico de Investigadores en Formación

Predicción a corto plazo de la energía undimotriz mediante un enfoque ordinal de etiquetado suave [Short-term wave energy prediction using an ordinal soft labelling approach]

Antonio Manuel Gómez-Orellana, David Guijo-Rubio, Pedro Antonio Gutiérrez

In this study, the problem of short-term prediction of wave energy is approached from an ordinal perspective. For this purpose, we propose the use of a soft labeling approach, which replaces the 0/1 encoding of the classes with soft labels. Specifically, such soft labels or probabilities are obtained from triangular probability distributions, which better distribute the probabilities: the target class receives higher probability than its adjacent classes. Therefore, integrating the soft labeling approach into the loss function modifies the computation of the error during model optimization, now taking into account the ordinal information encoded in the soft labels. For this purpose, an ordinal classification artificial neural network model, termed RNA-T, is implemented and optimized using a categorical cross-entropy loss function that integrates the proposed soft labeling approach. The performance of the RNA-T model is analyzed using two datasets built from reanalysis data and measurements recorded by marine buoys. The RNA-T model is compared, in terms of two ordinal performance metrics, with two standard ordinal classification techniques. The results confirm the superiority of the RNA-T model over the compared techniques.

May 2024 2024 Joint International Congress of ILTS, ELITA and LICAGE

Explainable artificial neural networks improve the performance of the Gender-Equity Model for liver Allocation (GEMA) to prioritize candidates for liver transplantation

Manuel Luis Rodríguez-Perálvarez, Antonio Manuel Gómez-Orellana, Avik Majumdar, Michael Bailey, Geoffrey W McCaughan, Paul Gow, Marta Guerrero, Rhiannon Taylor, David Guijo-Rubio, César Hervás-Martínez, Emmanuel A Tsochatzis

Background: Current prioritization models for liver transplantation (LT) are hampered by their linear nature, which does not fully capture the severity of patients with extreme analytical values. Methods: Cohort study including adult patients who qualified for elective LT in the United Kingdom (2010-2020, model training and internal validation) and in two Australian institutions (1998-2020, external validation). The Gender-Equity Model for Liver Allocation corrected by serum sodium (GEMA-Na) was compared with a shallow artificial neural network optimized by neuroevolution and hybridization (GEMA-AI) using the same input variables. The primary outcome was mortality or delisting for sickness within the first 90 days. Discrimination was assessed by Harrell’s c-statistic (Hc). Results: The study population comprised 9,320 patients: training cohort n=5,762, internal validation cohort n=1,920, and external validation cohort n=1,638. The prevalence of 90-day mortality or delisting for sickness ranged from 5.3% to 6% in the different cohorts. The transition from a linear to a non-linear score (from GEMA-Na to GEMA-AI) resulted in improved discrimination in the internal and external validation cohorts (Hc=0.766 vs Hc=0.781; p=0.035 and Hc=0.774 vs Hc=0.793; p=0.003, respectively), these differences being more pronounced in women (Hc=0.802 vs Hc=0.826; p=0.048 and Hc=0.796 vs Hc=0.836; p=0.002, respectively). Among 1,403 patients (39.4% of the merged validation cohorts) who showed at least one extreme analytical value, GEMA-AI had Hc=0.823 compared to Hc=0.797 (p=0.036). A meaningful change ≥2 score prioritization points occurred in 27.8% of patients (11.4% upgraded, 16.4% downgraded). Differential prioritization would occur in 6.4% of the available organs within the first 90 days and would save one in 59 deaths overall, and one in 13 deaths among women.
Conclusions: The use of non-linear explainable machine learning models may improve predictions of waiting list outcomes, particularly in the sickest patients showing extreme analytical values. Their use should be preferred over Cox’s regression-based models.

January 2024 International Work-Conference on the Interplay Between Natural and Artificial Computation

Medium-and Long-Term Wind Speed Prediction Using the Multi-task Learning Paradigm

Antonio M Gómez-Orellana, Víctor M Vargas, David Guijo-Rubio, Jorge Pérez-Aracil, Pedro A Gutiérrez, Sancho Salcedo-Sanz, César Hervás-Martínez

Renewable energies, particularly wind energy, have gained significant attention due to their clean and inexhaustible nature. Despite its commendable efficiency and minimal environmental impact, wind energy faces challenges such as stochasticity and intermittence. Machine learning methods offer a promising avenue for mitigating these challenges, particularly through wind speed prediction, which is crucial for optimising wind turbine performance. One important aspect to consider, regardless of the methodology employed and the approach used to tackle the wind speed prediction problem, is the prediction horizon. Most of the works in the literature have been designed to deal with a single prediction horizon. However, in this study, we propose a multi-task learning framework capable of simultaneously handling various prediction horizons. For this purpose, Artificial Neural Networks (ANNs) are considered, specifically a multilayer perceptron. Our study focuses on medium- and long-term prediction horizons (6 h, 12 h, and 24 h ahead), using wind speed data collected over ten years from a Spanish wind farm, along with ERA5 reanalysis variables that serve as input for the wind speed prediction. The results obtained indicate that the proposed multi-task model performing the three prediction horizons simultaneously can achieve comparable performance to corresponding single-task models while offering simplicity in terms of lower complexity, which includes the number of neurons and links, as well as computational resources.

DOI URL

January 2024 International Work-Conference on the Interplay Between Natural and Artificial Computation

Data Augmentation Techniques for Extreme Wind Prediction Improvement

Marta Vega-Bayo, Antonio Manuel Gómez-Orellana, Víctor Manuel Vargas Yun, David Guijo-Rubio, Laura Cornejo-Bueno, Jorge Pérez-Aracil, Sancho Salcedo-Sanz

Predicting extreme winds (i.e. wind speeds equal to or greater than 25 m/s) is essential to predict wind power and accomplish safe and efficient management of wind farms. Although feasible, predicting extreme wind with supervised classifiers and deep learning models is particularly difficult because of the low frequency of these events, which leads to highly unbalanced training datasets. To tackle this challenge, in this paper different traditional data augmentation techniques, such as random oversampling, SMOTE, time series data warping and multidimensional data warping, are used to generate synthetic samples of extreme wind and its predictors, such as previous samples of wind speed and meteorological variables of the surroundings. Results show that using data augmentation techniques with the right oversampling ratio leads to improvement in extreme wind prediction with most machine learning and deep learning models tested. In this paper, advanced data augmentation techniques, such as Variational Autoencoders (VAE), are also applied and evaluated when inputs are time series.

DOI URL

November 2023 Information Sciences

Generalised triangular distributions for ordinal deep learning: Novel proposal and optimisation

Víctor Manuel Vargas, Antonio Manuel Durán-Rosal, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

Deep learning techniques for ordinal classification have recently gained significant attention. Predicting an ordinal variable, that is, a variable that demonstrates a natural relationship between categories, is of relevance for a number of real-world problems in various fields of knowledge. For example, a medical diagnosis can occur at different stages of the disease. Applying standard classifiers to ordered labels can lead to errors in distant categories, when errors in an ordinal problem ideally tend to be produced in adjacent classes because of their similarity. To address this issue, we propose a soft labelling approach based on generalised triangular distributions, which are asymmetric and different for each class. The parameters of these distributions are determined using a metaheuristic and are specifically adapted to the given problem. Moreover, this approach enables the model to avoid errors in distant classes (e.g. classifying a patient with a severe disease as healthy). A comprehensive comparison was performed using eight datasets and five performance metrics. The main advantage of the proposed soft-labelling approach is that it adapts the distributions to each problem, resulting in greater flexibility and better performance. The results and statistical analysis show that the proposed methodology significantly outperforms all other methods.

DOI URL

November 2023 29º Congreso Sociedad Española de Trasplante Hepático

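Emparejamiento donante-receptor durante la donación en asistolia controlada con perfusión regional normotérmica: papel de los clasificadores de machine learning como modelos predictivos de la supervivencia del injerto [Donor-recipient matching in controlled donation after circulatory death with normothermic regional perfusion: the role of machine learning classifiers as predictive models of graft survival]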

Rafael Calleja, Marcos Rivera, Amelia J. Hessheimer, Beatriz Domínguez-Gil, David Guijo-Rubio, Constantino Fontdevila, Mikel Gastaca Mateo, Manuel Gómez, Pablo Ramírez-Romero, Rafael López-Andújar, Lánder-Atutxa, Julio Santoyo, Miguel Ángel Gómez-Bravo, Jesús María-Villar-Del-Moral, Carolina González-Abos, Bárbara Vidal, Laura Lladó-Garriga, José Roldán, Carlos Jiménez-Romero, Víctor Sánchez-Turrión, Gonzalo Rodríguez-Laiz, José Ángel López-Baena, Ramón Charco-Torra, Evaristo Varo, Fernando Rotellar, Manuel Barrera, Juan Carlos Rodríguez-San Juan, Gerardo Blanco Fernández, Javier Nuño, David Pacheco-Sánchez, Elisabeth Coll, Gloria De La Rosa, César Hervás-Martínez, Javier Briceño

Objective: We aim to explore the potential role of different machine learning (ML) classifiers as models for predicting graft survival in controlled donation after circulatory death under normothermic regional perfusion (NRP). Based on the best model obtained, a risk score will be established and integrated into an automatic donor-recipient (D-R) matching system. Method: A retrospective, multicentre Spanish cohort study was conducted using 543 D-R pairs under NRP. Seventeen donor and recipient variables were initially included. The classifiers evaluated were logistic regression (LR), the Ridge classifier (RC), support vector machines (SVC), the multilayer perceptron (MLP) and Random Forests (RF). The study end-point was graft survival at 3 and 12 months. A risk score for graft loss at 12 months was developed based on the best model obtained. This score was compared with the UK DCD score in our population. Results: Of the ML algorithms evaluated, LR (AUC 0.838) outperformed the other classifiers in predicting graft survival at both 3 and 12 months. Our score outperformed the UK DCD score (C-index 0.837 vs. 0.565; p<0.05) in predicting graft survival at 12 months. The variables with the greatest weight in the model were cold ischaemia and retransplantation. The score was integrated with the MELD system into a rule-based automatic matching system. Conclusion: In controlled donation after circulatory death under NRP, classifiers such as MLP or RF were outperformed by LR. These findings were conditioned by the characteristics of the dataset. Our score achieved satisfactory performance in our population. The D-R matching system could assist in graft allocation decisions.

November 2023 Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR

Barycentre Averaging for the Move-Split-Merge Time Series Distance Measure

Christopher Holder, David Guijo-Rubio, Anthony Bagnall

Distance functions play a core role in many time series machine learning algorithms for tasks such as clustering, classification and regression. Time series often require bespoke distance functions because small offsets in time can lead to large distances between series that are conceptually similar. Elastic distances compensate for misalignment by creating a path through a cost matrix by warping and/or editing time series. Time series are most commonly clustered with partitional algorithms such as k-means and k-medoids using elastic distance measures such as Dynamic Time Warping (DTW). The distance is used to assign cases to the closest cluster representative. k-means requires the averaging of time series to find these representative centroids. If DTW is used to assign membership, but the arithmetic mean is used to find centroids, k-means performance degrades significantly. An averaging technique specific to DTW, called DTW Barycentre Averaging (DBA), overcomes the averaging problem. However, it can only be used with DTW. As such, alternative distance functions such as Move-Split-Merge (MSM) are forced to use the arithmetic mean to compute new centroids and suffer similar degraded performance as k-means-DTW without DBA. To address this, we propose an averaging method for MSM distance, MSM Barycentre Averaging (MBA), and show that when used to find centroids it significantly improves MSM-based k-means and is better than commonly used alternatives.
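The notion of an elastic distance as a minimum-cost path through a cost matrix can be illustrated with a textbook DTW implementation. The sketch below uses a squared pointwise cost and no warping window; the function name is an assumption, and it does not cover MSM or the barycentre averaging methods the paper is about.

```python
import numpy as np


def dtw_distance(a, b) -> float:
    """Dynamic Time Warping cost between two univariate series (full window)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] is the cheapest alignment of a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pointwise = (a[i - 1] - b[j - 1]) ** 2
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both.
            cost[i, j] = pointwise + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```

For example, `dtw_distance([1, 2, 3], [1, 1, 2, 3])` is zero because the repeated first value is absorbed by warping, which is the kind of small time offset the abstract says pointwise distances handle poorly.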

DOI

October 2023 Applied Soft Computing

An Evolutionary Artificial Neural Network approach for spatio-temporal wave height time series reconstruction

David Guijo-Rubio, Antonio M. Durán-Rosal, Antonio M. Gómez-Orellana, Juan C. Fernández

This paper proposes a novel methodology for recovering missing time series data, a crucial task for subsequent Machine Learning (ML) analyses. The methodology is specifically applied to Significant Wave Height (SWH) time series in the field of marine engineering. The proposed approach involves two phases. Firstly, the SWH time series for each buoy is independently reconstructed using three transfer function models: regression-based, correlation-based, and distance-based. The distance-based transfer function exhibits the best overall performance. Secondly, Evolutionary Artificial Neural Networks (EANNs) are utilised for the final recovery of each time series, using as inputs highly correlated buoys that have been intermediately recovered. The EANNs are evolved considering two metrics, the novel squared error relevance area, which balances the importance of extreme and around-mean values, and the well-known mean squared error. The study considers SWH time series data from 15 buoys in two coastal zones in the United States. The results demonstrate that the distance-based transfer function is generally the best transfer function, and that EANNs outperform a range of state-of-the-art ML techniques in 12 out of the 15 buoys, with a number of connections comparable to linear models. Furthermore, the proposed methodology outperforms the two most popular approaches for time series reconstruction, BRITS and SAITS, for all buoys except one. Therefore, the proposed methodology provides a promising approach, which may be applied to time series from other fields, such as wind or solar energy farms in the field of green energy.

DOI URL

September 2023 International Workshop on Advanced Analytics and Learning on Temporal Data

Clustering Time Series with k-Medoids Based Algorithms

Christopher Holder, David Guijo-Rubio, Anthony Bagnall

Time Series Clustering (TSCL) involves grouping unlabelled time series into homogeneous groups. A popular approach to TSCL is to use the partitional clustering algorithms k-means or k-medoids in conjunction with an elastic distance function such as Dynamic Time Warping (DTW). We explore TSCL using nine different elastic distance measures. Both partitional algorithms characterise clusters with an exemplar series, but use different techniques to do so: k-means uses an averaging algorithm to find an exemplar, whereas k-medoids chooses a training case (medoid). Traditionally, the arithmetic mean of a collection of time series was used with k-means. However, this ignores any offset. In 2011, an averaging technique specific to DTW, called DTW Barycentre Averaging (DBA), was proposed. Since then, k-means with DBA has been the algorithm of choice for the majority of partition-based TSCL, and much of the research using medoids-based approaches for TSCL stopped. We revisit k-medoids based TSCL with a range of elastic distance measures. Our results show k-medoids approaches are significantly better than k-means on a standard test suite, independent of the elastic distance measure used. We also compare the most commonly used alternating k-medoids approach against the Partition Around Medoids (PAM) algorithm. PAM significantly outperforms the default k-medoids for all nine elastic measures used. Additionally, we evaluate six variants of PAM designed to speed up TSCL. Finally, we show PAM with the best elastic distance measure is significantly better than popular alternative TSCL algorithms, including the k-means DBA approach, and competitive with the best deep learning algorithms.
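The alternating k-medoids scheme mentioned in the abstract can be sketched on top of any precomputed pairwise distance matrix, elastic or otherwise. This is a minimal illustration with assumed names and a fixed, caller-supplied initialisation; it is not the PAM algorithm, which instead evaluates swaps between medoids and non-medoids.

```python
import numpy as np


def alternate_k_medoids(dist: np.ndarray, init_medoids, max_iter: int = 100):
    """Alternating k-medoids on a precomputed (n, n) distance matrix.

    Repeats two steps until the medoids stop changing:
    1. assign every case to its closest medoid;
    2. within each cluster, pick the case with the smallest total distance
       to the other members as the new medoid.
    Returns the medoid indices and the cluster label of every case.
    """
    medoids = [int(m) for m in init_medoids]
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = []
        for cluster, current in enumerate(medoids):
            members = np.flatnonzero(labels == cluster)
            if members.size == 0:
                new_medoids.append(current)  # keep the medoid of an empty cluster
                continue
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(within)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels
```

Because the exemplar is always an actual training case, no averaging of series is needed, so the same code works unchanged for any of the elastic distances.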

DOI

September 2023 Expert Systems with Applications

Cluster analysis and forecasting of viruses incidence growth curves: Application to SARS-CoV-2

Miguel Díaz-Lozano, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

The sanitary emergency caused by COVID-19 has compromised countries and generated a worldwide health and economic crisis. To provide support to the countries’ responses, numerous lines of research have been developed. The spotlight was put on effectively and rapidly diagnosing and predicting the evolution of the pandemic, one of the most challenging problems of the past months. This work contributes to the existing literature by developing a two-step methodology to analyze the transmission rate, designing models applied to territories with similar pandemic behavior characteristics. Virus transmission is considered as bacterial growth curves to understand the spread of the virus and to make predictions about its future evolution. Hence, an analytical clustering procedure is first applied to create groups of locations where the virus transmission rate behaved similarly in the different outbreaks. A curve decomposition process based on an iterative polynomial process is then applied, obtaining meaningful forecasting features. Information of the territories belonging to the same cluster is merged to build models capable of simultaneously predicting the 14-day incidence in several locations using Evolutionary Artificial Neural Networks. The methodology is applied to Andalusia (Spain), although it is applicable to any region across the world. Individual models trained for a specific territory are carried out for comparison purposes. The results demonstrate that this methodology achieves statistically similar, or even better, performance for most of the locations. In addition to being extremely competitive, the main advantage of the proposal lies in its complexity cost reduction. The total number of parameters to be estimated is reduced up to 93.51% for the short term and 93.31% for the mid-term forecasting, respectively. Moreover, the number of required models is reduced by 73.53% and 58.82% for the short- and mid-term forecasting horizons.

DOI

June 2023 International Work-Conference on Artificial Neural Networks

Ordinal Classification Approach for Donor-Recipient Matching in Liver Transplantation with Circulatory Death Donors

Marcos Rivera-Gavilán, Víctor Manuel Vargas, Pedro Antonio Gutiérrez, Javier Briceño, César Hervás-Martínez, David Guijo-Rubio

This paper tackles Donor-Recipient (D-R) matching for Liver Transplantation (LT). Typically, D-R matching is performed following the knowledge of a team of experts guided by the use of a prioritisation system. One of the most widely used, the Model for End-stage Liver Disease (MELD), aims to decrease the mortality in the waiting list. However, it does not take into account the result of the transplant. With the aim of developing a system that takes the survival benefit into account, we propose to treat the problem as an ordinal classification one. The organ survival will be predicted at four different thresholds. The results achieved demonstrate that ordinal classifiers are capable of outperforming nominal approaches in the state-of-the-art. Finally, this methodology can help experts make more informed decisions about the appropriateness of assigning a recipient for a specific donor, maximising the probability of post-transplant survival in LT.

DOI

June 2023 International Work-Conference on Artificial Neural Networks

Gramian Angular and Markov Transition Fields Applied to Time Series Ordinal Classification

Víctor Manuel Vargas, Rafael Ayllón-Gavilán, Antonio Manuel Durán-Rosal, Pedro Antonio Gutiérrez, César Hervás-Martínez, David Guijo-Rubio

This work presents a novel ordinal Deep Learning (DL) approach to the Time Series Ordinal Classification (TSOC) field. TSOC consists in classifying time series whose labels show a natural order between them. This particular property of the output variable should be exploited to boost the performance for a given problem. In the proposed approach, time series are encoded as 3-channel images using the Gramian Angular Field and the Markov Transition Field. A soft labelling approach, which uses the probabilities generated by a unimodal distribution to obtain soft labels that replace crisp labels in the loss function, is applied to a ResNet18 model. Specifically, beta and triangular distributions have been applied. They have been compared against three state-of-the-art deep learners in the Time Series Classification (TSC) field using 13 univariate and multivariate time series datasets. The approach considering the triangular distribution (O-GAMTFT) outperforms all the techniques benchmarked.
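As a rough illustration of the image encoding mentioned in this abstract, the sketch below builds a 3-channel image from a univariate series. It assumes the three channels are the Gramian Angular Summation Field, the Gramian Angular Difference Field and the Markov Transition Field, which the abstract does not state explicitly; the function names and the `n_bins` parameter are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def gramian_angular_fields(x):
    """Return (GASF, GADF) for a 1-D series, using the standard definitions."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("series must not be constant")
    # Rescale to [-1, 1] so that arccos is defined for every value.
    scaled = np.clip(2.0 * (x - lo) / (hi - lo) - 1.0, -1.0, 1.0)
    phi = np.arccos(scaled)
    gasf = np.cos(phi[:, None] + phi[None, :])  # summation field
    gadf = np.sin(phi[:, None] - phi[None, :])  # difference field
    return gasf, gadf

def markov_transition_field(x, n_bins=4):
    """Return the MTF: transition probability between the bins of x[i] and x[j]."""
    x = np.asarray(x, dtype=float)
    # Quantile bin edges (assumption: equal-frequency binning).
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)  # integers in 0 .. n_bins-1
    counts = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return probs[states[:, None], states[None, :]]

def encode_as_image(x, n_bins=4):
    """Stack GASF, GADF and MTF into a (3, n, n) array (assumed channel order)."""
    gasf, gadf = gramian_angular_fields(x)
    mtf = markov_transition_field(x, n_bins)
    return np.stack([gasf, gadf, mtf], axis=0)

series = np.sin(np.linspace(0, 2 * np.pi, 32))
image = encode_as_image(series)
print(image.shape)
```

An array of this shape can be fed to an image classifier that expects three input channels; how the channels are normalised before training is left open here.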

DOI

June 2023 International Work-Conference on Artificial Neural Networks

A Dictionary-based approach to Time Series Ordinal Classification

Rafael Ayllón-Gavilán, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

Time Series Classification (TSC) is an extensively researched field in which a broad range of real-world problems can be addressed with excellent results. One family of well-performing approaches is the so-called dictionary-based techniques. The Temporal Dictionary Ensemble (TDE) is the current state-of-the-art dictionary-based TSC approach. In many TSC problems we find a natural ordering in the labels associated with the time series. This characteristic is referred to as ordinality, and can be exploited to improve the methods’ performance. The area dealing with ordinal time series is the Time Series Ordinal Classification (TSOC) field, which is still unexplored. In this work, we present an ordinal adaptation of the TDE algorithm, known as ordinal TDE (O-TDE). To evaluate it, a comprehensive comparison using a set of 18 TSOC problems is performed. The experiments show the improvement achieved by the ordinal dictionary-based approach in comparison to four existing nominal dictionary-based techniques.

DOI

May 2023 2023 Joint International Congress of ILTS, ELITA and LICAGE

Performance of the gender-equity model for liver allocation (GEMA-Na) within the first 30 and 60 days of listing

Manuel Luis Rodríguez-Perálvarez, Antonio Manuel Gómez-Orellana, Avik Majumdar, Michael Bailey, Geoffrey W McCaughan, Paul Gow, Marta Guerrero, Rhiannon Taylor, David Guijo-Rubio, César Hervás-Martínez, Emmanuel A Tsochatzis

Background: Models for liver transplant (LT) allocation have been trained to predict mortality or delisting for sickness at 90 days. Their performance in a context of waiting list shortening is uncertain. Methods: Retrospective study of two cohorts of patients enlisted for LT in the UK (2010-2020) and Australia (1997-2020). Mortality or delisting for sickness within the first 30 and 60 days was evaluated. Harrell’s c statistics (Hc) were used to assess discrimination. Results: In all, 7,133 patients from the UK (33% women) and 1,638 patients from Australia (26.4% women) were included. Mortality or delisting for sickness at 30 and 60 days were 2.6% and 4.5% in the UK, and 2.7% and 4.6% in Australia. In the UK at 30 days, GEMA-Na obtained the highest discrimination (Hc=0.818; 95%CI 0.783-0.853), followed by MELD-Na (Hc=0.796; 95%CI 0.759-0.834; p=0.004), and MELD 3.0 (Hc=0.782; 95%CI 0.743-0.821; p<0.001). In Australia at 30 days, the most accurate forecasts came from GEMA-Na (Hc=0.832; 95%CI 0.770-0.894), followed by MELD 3.0 (Hc=0.805; 95%CI 0.737-0.872; p=0.030), and MELD-Na (Hc=0.789; 95%CI 0.717-0.861; p=0.018). The discrimination benefit of GEMA-Na at 30 days was more pronounced in women: Hc=0.824 (95%CI 0.767-0.882) in the UK, and Hc=0.856 (95%CI 0.768-0.944) in Australia. Results were similar at 60 days. Differential prioritization at 30 days occurred in 15% of patients when comparing GEMA-Na vs MELD-Na, and in 15.3% of patients when comparing GEMA-Na vs MELD 3.0. Patients differently prioritized by GEMA-Na over MELD-Na had higher risk of the outcome at 30 days (OR=2.5; p=0.043) and at 60 days (OR=4.2; p=0.005). Patients differently prioritized by GEMA-Na over MELD 3.0 had increased likelihood of the outcome at 30 days (OR=7.3; p=0.009) and at 60 days (OR=3.8; p=0.004). Conclusions: GEMA-Na outperformed MELD-Na and MELD 3.0 to predict mortality in a context of waiting list shortening and it could obviate gender-disparities for accessing LT.
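For readers unfamiliar with the discrimination measure reported in this abstract, the following is a minimal sketch of Harrell's concordance statistic under its usual textbook definition: among all pairs where the patient with the shorter follow-up had the event, it is the fraction in which that patient also received the higher risk score, with score ties counted as one half. The toy data and the function name `harrell_c` are illustrative assumptions and do not reproduce the study's analysis.

```python
def harrell_c(times, events, scores):
    """Harrell's c for right-censored data (higher score = higher predicted risk)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier patient had the event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): days on the list, event flag (1 = death or delisting), score.
times = [5, 10, 20, 30]
events = [1, 1, 0, 0]
perfect = harrell_c(times, events, [9, 7, 3, 4])
partial = harrell_c(times, events, [9, 3, 7, 4])
print(perfect, partial)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why differences of a few hundredths between allocation models are tested for significance.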

March 2023 48º Congreso Anual de la Asociación Española para el Estudio del Hígado

Utilidad del gender-equity model for liver allocation (GEMA) en un contexto de acortamiento de la lista de espera de trasplante hepático

Manuel Luis Rodríguez-Perálvarez, Antonio Manuel Gómez-Orellana, Avik Majumdar, María Dolores Ayllón, Pedro Antonio Gutiérrez, Pilar Barrera Baena, David Guijo-Rubio, César Hervás-Martínez, Manuel De La Mata, Emmanuel A Tsochatzis

Introduction: Prioritisation models for the liver transplantation (LT) waiting list have been trained to predict waiting list mortality at 90 days. However, many centres have shorter waiting lists, which could call their usefulness into question. Methods: Observational study in two population-based cohorts of adults listed for elective LT. The primary outcome was waiting list mortality or delisting due to clinical worsening within the first 30 or 60 days after listing. Harrell’s c statistic (Hc) was used to compare the discriminative capacity of the MELD-Na, MELD 3.0 and GEMA-Na models (Gender-Equity Model for Liver Allocation; Rodríguez-Perálvarez et al. Lancet Gastro Hepatol, in press). Results: In all, 7,133 patients from the United Kingdom (33% women) and 1,638 patients from Australia (26.4% women) were included. The probability of mortality or delisting due to worsening at 30 and 60 days was 2.6% and 4.5% in the British cohort, and 2.7% and 4.6% in the Australian cohort. In the British cohort at 30 days, the model with the best discrimination was GEMA-Na (Hc = 0.818; 95%CI 0.783-0.853), followed by MELD-Na (Hc = 0.796; 95%CI 0.759-0.834; p = 0.004) and MELD 3.0 (Hc = 0.782; 95%CI 0.743-0.821; p < 0.001). In the Australian cohort at 30 days, the model with the best predictions was GEMA-Na (Hc = 0.832; 95%CI 0.770-0.894), followed by MELD 3.0 (Hc = 0.805; 95%CI 0.737-0.872; p = 0.030) and MELD-Na (Hc = 0.789; 95%CI 0.717-0.861; p = 0.018). These results were comparable in the 60-day analysis. The advantage of GEMA-Na over the other models was more pronounced in the subgroup of women, where it obtained Hc = 0.824 (95%CI 0.767-0.882) at 30 days and Hc = 0.814 (95%CI 0.770-0.858) at 60 days in the British cohort, and Hc = 0.856 (95%CI 0.768-0.944) at 30 days and Hc = 0.784 (95%CI 0.680-0.889) at 60 days in the Australian cohort. In the whole cohort, 51.6% of patients would change their score by two or more points when comparing GEMA-Na with MELD-Na. A total of 1,977 transplants were performed within 30 days (22.5%) and 3,013 within 60 days (34.4%). Comparing GEMA-Na with MELD-Na, differential prioritisation occurred in 15% and 10.4% of patients at 30 and 60 days, respectively. Comparing GEMA-Na with MELD 3.0, differential prioritisation occurred in 15.3% and 12.8% of patients at 30 and 60 days, respectively. Patients differentially prioritised by GEMA-Na vs. MELD-Na had a higher risk of the primary outcome at 30 days (OR = 2.5; p = 0.043) and 60 days (OR = 4.2; p = 0.005). Likewise, patients differentially prioritised by GEMA-Na vs. MELD 3.0 had a higher risk of the primary outcome at 30 days (OR = 7.3; p = 0.009) and 60 days (OR = 3.8; p = 0.004). Conclusions: The GEMA model is superior to MELD-Na and MELD 3.0 in a context of LT waiting lists shorter than 90 days, particularly in women, offering the possibility of eliminating gender disparities in access to transplantation.

March 2023 Atmospheric Research

One month in advance prediction of air temperature from Reanalysis data with eXplainable Artificial Intelligence techniques

Antonio Manuel Gómez-Orellana, David Guijo-Rubio, Jorge Pérez-Aracil, Pedro Antonio Gutiérrez, Sancho Salcedo-Sanz, César Hervás-Martínez

In this paper we have tackled the problem of long-term air temperature prediction with eXplainable Artificial Intelligence (XAI) models. Specifically, we have evaluated the performance of an Artificial Neural Network (ANN) architecture with sigmoidal neurons in the hidden layer, trained by means of an evolutionary algorithm (Evolutionary ANNs, EANNs). This XAI model architecture (XAI-EANN) has been applied to long-term air temperature prediction in different sub-regions of the south of the Iberian Peninsula. In this case, the average August air temperature has been predicted from ERA5 Reanalysis data variables, obtaining good prediction skill and models that are explainable in terms of the input climatological variables considered. A cluster analysis has first been carried out in terms of the average air temperature in the zone, so that a number of sub-regions with different air temperature behaviour have been defined. The proposed XAI-EANN model architecture has been applied to each of the defined sub-regions in order to find significant differences among them, which can be explained with the XAI-EANN models obtained. Finally, a comprehensive comparison against some state-of-the-art techniques has also been carried out, concluding that there are statistically significant differences in terms of accuracy in favour of the proposed XAI-EANN model, which also benefits from being an XAI model.

DOI

March 2023 The Lancet Gastroenterology & Hepatology

Development and validation of the Gender-Equity Model for Liver Allocation (GEMA) to prioritise candidates for liver transplantation: a cohort study

Manuel Luis Rodríguez-Perálvarez, Antonio Manuel Gómez-Orellana, Avik Majumdar, Michael Bailey, Geoffrey W McCaughan, Paul Gow, Marta Guerrero, Rhiannon Taylor, David Guijo-Rubio, César Hervás-Martínez, Emmanuel A Tsochatzis

Background: The Model for End-stage Liver Disease (MELD) and its sodium-corrected variant (MELD-Na) have created gender disparities in accessing liver transplantation. We aimed to derive and validate the Gender-Equity Model for liver Allocation (GEMA) and its sodium-corrected variant (GEMA-Na) to amend such inequities. Methods: In this cohort study, the GEMA models were derived by replacing creatinine with the Royal Free Hospital glomerular filtration rate (RFH-GFR) within the MELD and MELD-Na formulas, with re-fitting and re-weighting of each component. The new models were trained and internally validated in adults listed for liver transplantation in the UK (2010–20; UK Transplant Registry) using generalised additive multivariable Cox regression, and externally validated in an Australian cohort (1998–2020; Royal Prince Alfred Hospital [Australian National Liver Transplant Unit] and Austin Hospital [Victorian Liver Transplant Unit]). The study comprised 9320 patients: 5762 patients for model training, 1920 patients for internal validation, and 1638 patients for external validation. The primary outcome was mortality or delisting due to clinical deterioration within the first 90 days from listing. Discrimination was assessed by Harrell’s concordance statistic. Findings: 449 (5·8%) of 7682 patients in the UK cohort and 87 (5·3%) of 1638 patients in the Australian cohort died or were delisted because of clinical deterioration within 90 days. GEMA showed improved discrimination in predicting mortality or delisting due to clinical deterioration within the first 90 days after waiting list inclusion compared with MELD (Harrell’s concordance statistic 0·752 [95% CI 0·700–0·804] vs 0·712 [0·656–0·769]; p=0·001 in the internal validation group and 0·761 [0·703–0·819] vs 0·739 [0·682–0·796]; p=0·036 in the external validation group), and GEMA-Na showed improved discrimination compared with MELD-Na (0·766 [0·715–0·818] vs 0·742 [0·686–0·797]; p=0·0058 in the internal validation group and 0·774 [0·720–0·827] vs 0·745 [0·690–0·800]; p=0·014 in the external validation group). The discrimination capacity of GEMA-Na was higher in women than in the overall population, both in the internal (0·802 [0·716–0·888]) and external validation cohorts (0·796 [0·698–0·895]). In the pooled validation cohorts, GEMA resulted in a score change of at least 2 points compared with MELD in 1878 (52·8%) of 3558 patients (25·0% upgraded and 27·8% downgraded). GEMA-Na resulted in a score change of at least 2 points compared with MELD-Na in 1836 (51·6%) of 3558 patients (32·3% upgraded and 19·3% downgraded). In the whole cohort, 3725 patients received a transplant within 90 days of being listed. Of these patients, 586 (15·7%) would have been differently prioritised by GEMA compared with MELD; 468 (12·6%) patients would have been differently prioritised by GEMA-Na compared with MELD-Na. One in 15 deaths could potentially be avoided by using GEMA instead of MELD and one in 21 deaths could potentially be avoided by using GEMA-Na instead of MELD-Na. Interpretation: GEMA and GEMA-Na showed improved discrimination and a significant re-classification benefit compared with existing scores, with consistent results in an external validation cohort. Their implementation could save a clinically meaningful number of lives, particularly among women, and could amend current gender inequities in accessing liver transplantation. Funding: Junta de Andalucía and EDRF.

DOI URL

February 2023 Machine Learning Algorithms and Applications in Engineering

Machine Learning Applications in Real-World Time Series Problems

Antonio Manuel Durán-Rosal, David Guijo-Rubio

This first section introduces the topic presented and the related state-of-the-art developments. Time series data mining (TSDM) mainly consists of the following tasks: anomaly detection (Blázquez-García et al., 2020), classification (Ismail-Fawaz et al., 2019), analysis and preprocessing (Hamilton, 1994), segmentation (Keogh et al., 2004), clustering (Liao, 2005) and prediction (Weigend, 2018). More concretely, this chapter focuses on the applications of time series preprocessing, segmentation and prediction to real-world problems.

DOI

January 2023 Ciencia violeta: I Encuentro Científico sobre Investigación con Perspectiva de Género

Corrección de la disparidad de género en el acceso al trasplante hepático

Antonio M Gómez-Orellana, Manuel Luis Rodríguez-Perálvarez, David Guijo-Rubio, Marta Guerrero, César Hervás-Martínez

Access to liver transplantation is based on the principle of urgency, which grants higher priority to the sickest patients. Specifically, the MELD-Na model [1] is used worldwide to prioritise waiting lists [2], and comprises four laboratory parameters: creatinine, bilirubin, INR and sodium. Because creatinine is influenced by muscle mass [3], this parameter underestimates the severity of liver disease in women, who receive a lower MELD-Na score and therefore lower priority. This gender disparity in access to transplantation has led to 30% higher mortality among women awaiting a liver transplant compared with men [4]. To try to correct this disparity, the GEMA-Na model (Gender-Equity Model for Liver Allocation) [5] was developed, which replaces creatinine with a more precise measure of renal function that is not influenced by muscle mass. The results obtained in almost 10,000 patients from two different countries indicate that the GEMA-Na model corrects gender disparities and reduces mortality on the liver transplant waiting list.

November 2022 Expert Systems with Applications

COVID-19 contagion forecasting framework based on curve decomposition and evolutionary artificial neural networks: A case study in Andalusia, Spain

Miguel Díaz-Lozano, David Guijo-Rubio, Pedro Antonio Gutiérrez, Antonio Manuel Gómez-Orellana, Isaac Túñez, Luis Ortigosa-Moreno, Armando Romanos-Rodríguez, Javier Padillo-Ruiz, César Hervás-Martínez

Many types of research have been carried out with the aim of combating the COVID-19 pandemic since the first outbreak was detected in Wuhan, China. Anticipating the evolution of an outbreak helps to devise suitable economic, social and health care strategies to mitigate the effects of the virus. For this reason, predicting the SARS-CoV-2 transmission rate has become one of the most important and challenging problems of the past months. In this paper, we apply a two-stage mid and long-term forecasting framework to the epidemic situation in eight districts of Andalusia, Spain. First, an analytical procedure is performed iteratively to fit polynomial curves to the cumulative curve of contagions. Then, the extracted information is used for estimating the parameters and structure of an evolutionary artificial neural network with hybrid architectures (i.e., with different basis functions for the hidden nodes) while considering single and simultaneous time horizon estimations. The results obtained demonstrate that including polynomial information extracted during the training stage significantly improves the mid- and long-term estimations in seven of the eight considered districts. The increase in average accuracy (for the joint mid- and long-term horizon forecasts) is 37.61% and 35.53% when considering the single and simultaneous forecast approaches, respectively.

DOI URL

September 2022 Computational Intelligence in Security for Information Systems Conference

Hackathon in Teaching: Applying Machine Learning to Life Sciences Tasks

David Guijo-Rubio, Víctor M Vargas, Javier Barbero-Gómez, Jose V Die, Pablo González-Moreno

Programming has traditionally been an engineering competence, but recently it is acquiring significant importance in several areas, such as Life Sciences, where it is considered to be essential for problem solving based on data analysis. Therefore, students in these areas need to improve their programming skills related to the data analysis process. Similarly, engineering students with proven technical ability may lack the biological background which is likewise fundamental for problem-solving. Using hackathon and teamwork-based tools, students from both disciplines were challenged with a series of problems in the area of Life Sciences. To solve these problems, we established work teams that were trained before the beginning of the competition. Their results were assessed in relation to their approach in obtaining the data, performing the analysis and finally interpreting and presenting the results to solve the challenges. The project succeeded, meaning students solved the proposed problems and achieved the goals of the activity. This would have been difficult to address with teams made from the same field of study. The hackathon succeeded in generating a shared learning and a multidisciplinary experience for their professional training, being highly rewarding for both students and faculty members.

DOI

September 2022 Revista de Innovación y Buenas Prácticas Docentes

Hackathon en docencia: aprendizaje automático aplicado a Ciencias de la Vida

David Guijo-Rubio, Victor Manuel Vargas, Javier Barbero-Gómez, Jose V Die, Pablo González-Moreno

Programming has traditionally been an engineering competence, but recently it is acquiring significant importance in several areas, such as Life Sciences, where it is considered essential for problem-solving based on data analysis. This work is a case study framed within the need to improve not only the data analysis skills of life science students, but also the biological background of engineering students concerning the given issue. Using hackathon and teamwork-based tools, students from both disciplines were brought together and challenged with a series of problems in the area of Life Sciences. To solve these problems, we established work teams that were trained before the competition began. Their results were assessed in terms of the approach used to obtain the data, perform the analysis, and finally interpret and present the results to solve the challenges. The project outcomes were assessed using structured surveys of the students and their overall perception. The project succeeded, meaning students solved the proposed problems and achieved the activity’s goals. These goals would have been difficult to address with teams composed of students from the same field of study. The hackathon succeeded in generating a shared learning and a multidisciplinary experience for their professional training, being highly rewarding for both students and faculty members.

DOI

July 2022 Journal of Hepatology

Development and validation of the gender-equity model for liver allocation (GEMA) to prioritize liver transplant candidates

Manuel Rodríguez-Perálvarez, Antonio M Gómez-Orellana, Avik Majumdar, Geoff McCaughan, Paul Gow, David Guijo-Rubio, César Hervás, Michael Bailey, Emmanuel Tsochatzis

Background and aims: The model for end stage liver disease (MELD) and its sodium-corrected variant (MELD-Na) have created gender disparities in accessing liver transplantation (LT). We derived and validated a new model that replaced creatinine with the Royal Free glomerular filtration rate (PMID: 27779785) within the MELD and MELD-Na formulas. Method: The “Gender-Equity Model for liver Allocation” (GEMA) and its sodium-corrected variant (GEMA-Na) were trained and internally validated in adults listed for LT in the United Kingdom (2010–2020) using generalized additive multivariate Cox regression. The models were externally validated in an Australian cohort (1998–2020). The primary outcome was mortality or delisting due to clinical deterioration at 90 days. The Greenwood-Nam-D’Agostino test was used to test calibration. Results: The study comprised 9,320 patients: 5,762 patients for model training, 1,920 patients for internal validation, and 1,638 patients for external validation. The prevalence of the primary outcome ranged from 5.3% to 6%. In the internal validation cohort, GEMA and GEMA-Na showed a Harrell’s c-statistic = 0.752 and 0.766, respectively, for the primary outcome, which were significantly higher than those of the MELD score (0.712) and the MELD-Na score (0.742). Results were consistent in the external validation cohort. Among women, these differences were more pronounced (see Harrell’s c-statistics in the table). GEMA and GEMA-Na were adequately calibrated and prioritized differently 43.9% and 41.8% of LT patients, respectively. Patients prioritized by GEMA-Na were more often women, had higher prevalence of ascites and showed triple risk of the primary outcome compared to patients prioritized by MELD-Na. One in 15 deaths would be avoided by using GEMA instead of MELD, and 1 in 21 deaths would be avoided by using GEMA-Na instead of MELD-Na. Among women, 1 in 8 deaths would be avoided in either situation.
Conclusion: GEMA-Na predicts mortality or delisting due to clinical deterioration in patients awaiting LT more accurately than MELD-Na and its implementation may amend gender disparities.

DOI

May 2022 Proceedings of the 9th International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC2022)

Clustering of COVID-19 Time Series Incidence Intensity in Andalusia, Spain

Miguel Díaz-Lozano, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

In this paper, an approach based on a time series clustering technique is presented by extracting relevant features from the original temporal data. A curve characterization is applied to the daily contagion rates of the 34 sanitary districts of Andalusia, Spain. By determining the maximum incidence instant and two inflection points for each wave, an outbreak curve can be described by six intensity features, defining its initial and final phases. These features are used to derive different groups using state-of-the-art clustering techniques. The experimentation carried out indicates that $k=3$ is the optimum number of descriptive groups of intensities. According to the resulting clusters for each wave, the pandemic behavior in Andalusia can be visualised over time, showing the most affected districts in the pandemic period considered. Additionally, in order to perform a pandemic overview of the whole period, the approach is also applied to joint information of all the considered periods.

DOI

January 2022 Renewable Energy

Simultaneous short-term significant wave height and energy flux prediction using zonal multi-task evolutionary artificial neural networks

Antonio Manuel Gómez-Orellana, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

The prediction of wave height and flux of energy is essential for most ocean engineering applications. To simultaneously predict both wave parameters, this paper presents a novel approach using short-term time prediction horizons (6h and 12h). Specifically, the methodology proposed presents a twofold simultaneity: 1) both parameters are predicted by a single model, applying the multi-task learning paradigm, and 2) the prediction tasks are tackled for several neighbouring ocean buoys with such single model by the development of a zonal strategy. Multi-Task Evolutionary Artificial Neural Network (MTEANN) models are applied to two different zones located in the United States, considering measurements collected by three buoys in each zone. Zonal MTEANN models have been compared in a two-phased procedure: 1) against the three individual MTEANN models specifically trained for each buoy of the zone, and 2) against some state-of-the-art regression techniques. Results achieved show that the proposed zonal methodology obtains not only better performance than the individual MTEANN models, but it also requires a lower number of connections. Besides, the zonal MTEANN methodology outperforms state-of-the-art regression techniques. Hence, the proposed approach results in an excellent method for predicting both significant wave height and flux of energy at short-term prediction time horizons.

PDF DOI

January 2022 Computational Intelligence in Security for Information Systems Conference

Gamifying the Classroom for the Acquisition of Skills Associated with Machine Learning: A Two-Year Case Study

Antonio M Durán-Rosal, David Guijo-Rubio, Víctor M Vargas, Antonio M Gómez-Orellana, Pedro A Gutiérrez, Juan C Fernández

Machine learning (ML) is the field of science that combines knowledge from artificial intelligence, statistics and mathematics with the aim of giving computers the ability to learn from data without being explicitly programmed to do so. It falls under the umbrella of Data Science and is usually developed by Computer Engineers, who become what are known as Data Scientists. Developing the necessary competences in this field is not a trivial task, and applying innovative methodologies such as gamification can smooth the initial learning curve. In this context, communities offering platforms for open competitions, such as Kaggle, can be used as a motivating element. The main objective of this work is to gamify the classroom with the idea of providing students with valuable hands-on experience by addressing a real problem, as well as the possibility to cooperate and compete simultaneously to acquire ML competences. The innovative teaching experience, carried out over two years, brought great motivation, an improvement in learning capacity and the continuous updating of knowledge that Computer Engineers face.

DOI

November 2021 IEEE Transactions on Cybernetics

Time series clustering based on the characterisation of segment typologies

David Guijo-Rubio, Antonio Manuel Durán-Rosal, Pedro Antonio Gutiérrez, Alicia Troncoso, César Hervás-Martínez

Time-series clustering is the process of grouping time series with respect to their similarity or characteristics. Previous approaches usually combine a specific distance measure for time series and a standard clustering method. However, these approaches do not take the similarity of the different subsequences of each time series into account, which can be used to better compare the time-series objects of the dataset. In this article, we propose a novel technique of time-series clustering consisting of two clustering stages. In a first step, a least-squares polynomial segmentation procedure is applied to each time series, which is based on a growing window technique that returns different-length segments. Then, all of the segments are projected into the same dimensional space, based on the coefficients of the model that approximates the segment and a set of statistical features. After mapping, a first hierarchical clustering phase is applied to all mapped segments, returning groups of segments for each time series. These clusters are used to represent all time series in the same dimensional space, after defining another specific mapping process. In a second and final clustering stage, all the time-series objects are grouped. We consider internal clustering quality to automatically adjust the main parameter of the algorithm, which is an error threshold for the segmentation. The results obtained on 84 datasets from the UCR Time Series Classification Archive have been compared against three state-of-the-art methods, showing that the performance of this methodology is very promising, especially on larger datasets.
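The growing-window segmentation step described in this abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: it assumes the window is extended one point at a time, that the fit error is measured as root mean squared error, and that a segment is closed as soon as the error exceeds the threshold. The names `growing_window_segmentation`, `max_error`, `degree` and `min_len` are hypothetical.

```python
import numpy as np

def growing_window_segmentation(series, max_error, degree=1, min_len=3):
    """Split a series into half-open segments [start, end) of varying length.

    A window grows while a least-squares polynomial of the given degree fits
    it with RMSE <= max_error (assumed error measure). min_len should exceed
    degree so the fit is well determined; the last segment may be shorter.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    segments = []
    start = 0
    while start < n:
        end = min(start + min_len, n)
        while end < n:
            # Tentatively include the point at index `end` and refit.
            t = np.arange(end + 1 - start)
            window = series[start:end + 1]
            coeffs = np.polyfit(t, window, degree)
            residual = window - np.polyval(coeffs, t)
            if np.sqrt(np.mean(residual ** 2)) > max_error:
                break  # the new point does not fit: close the segment before it
            end += 1
        segments.append((start, end))
        start = end
    return segments

# Toy series: a ramp followed by a plateau.
toy = np.concatenate([np.arange(20, dtype=float), np.full(20, 19.0)])
segments = growing_window_segmentation(toy, max_error=0.01)
print(segments)
```

Each resulting segment could then be described by its fitted coefficients plus a few summary statistics, which is the kind of fixed-length representation the abstract says is passed to the first clustering stage.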

DOI

October 2021 Revista Innovación y Buenas Prácticas Docentes

Potenciando el perfil profesional Científico de Datos mediante dinámicas de competición

David Guijo-Rubio, Victor Manuel Vargas, Antonio Manuel Durán-Rosal, Antonio Manuel Gómez-Orellana, Javier Barbero-Gómez, Juan Carlos Fernández, Pedro Antonio Gutiérrez

Data Science is the area that comprises the development of scientific methods, processes, and systems for extracting knowledge from previously collected data, aiming to analyse the procedures being carried out currently. The professional profile associated with this field is the Data Scientist, a role generally filled by Computer Engineers, as the skills and competencies acquired during their training are perfectly suited to what this job requires. Due to the need for training new Data Scientists, among other goals, there are different emerging platforms where they can acquire extensive experience, such as Kaggle. The main objective of this teaching experience is to provide students with practical experience on a real problem, as well as the possibility of cooperating and competing at the same time. Thus, the acquisition and development of the necessary competencies in Data Science are carried out in a highly motivating environment. The development of activities related to this profile has had a direct impact on the students, with motivation, learning capacity and the continuous updating of knowledge required of Computer Engineers being fundamental.

DOI

September 2021 Proceedings of the XIX Conference of the Spanish Association for Artificial Intelligence (CAEPIA)

Studying the effect of different Lp norms in the context of Time Series Ordinal Classification

David Guijo-Rubio, Víctor Manuel Vargas Yun, Pedro Antonio Gutiérrez, César Hervás-Martínez

Time Series Ordinal Classification (TSOC) is a still unexplored field of machine learning consisting in the classification of time series whose labels follow a natural order relationship. In this context, a well-known approach for nominal time series classification was previously used: the Shapelet Transform (ST). The ordinal information was exploited in two steps of the ST algorithm: 1) by using Pearson’s determination coefficient (R²) to compute the quality of the shapelets, which favours shapelets with better ordering, and 2) by applying an ordinal classifier instead of a nominal one to the transformed dataset. For this, the distance between labels was represented by the absolute value of the difference between the corresponding ranks, i.e. by the L1 norm. In this paper, we study the behaviour of different Lp norms for representing class distances in ordinal regression, evaluating 9 different Lp norms with 7 ordinal time series datasets from the UEA-UCR time series classification repository and 10 different ordinal classifiers. The results achieved demonstrate that Pearson’s determination coefficient using the L1.9 norm in the computation of the difference between the shapelet and the time series labels achieves a significantly better performance when compared to the rest of the approaches, in terms of both Correct Classification Rate (CCR) and Average Mean Absolute Error (AMAE).

DOI

September 2021 Proceedings of the XIX Conference of the Spanish Association for Artificial Intelligence (CAEPIA)

ReLU-based activations: analysis and experimental study for deep learning

Víctor Manuel Vargas Yun, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

Activation functions are used in neural networks as a tool to introduce non-linear transformations into the model and, thus, enhance its representation capabilities. They also determine the output range of the hidden layers and the final output. Traditionally, artificial neural networks mainly used the sigmoid activation function, as the depth of the network was limited. Nevertheless, this function tends to saturate the gradients when the number of hidden layers increases. For that reason, in recent years, most published works related to deep learning and convolutional networks use the Rectified Linear Unit (ReLU), given that it provides good convergence properties and speeds up the training process thanks to the simplicity of its derivative. However, this function has some known drawbacks that have given rise to new proposals of alternative activation functions based on ReLU. In this work, we describe, analyse and compare different recently proposed alternatives to test whether these functions improve the performance of deep learning models relative to the standard ReLU.
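As a minimal illustrative sketch of the saturation argument above, the following compares the derivative of the sigmoid with that of ReLU. The abstract does not name the ReLU-based alternatives it studies; Leaky ReLU and ELU are included here only as two well-known examples, and their default `alpha` values are assumptions.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    # Derivative s(x) * (1 - s(x)); it vanishes for large |x| (saturation).
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    # Constant derivative of 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0

def leaky_relu(x: float, alpha: float = 0.01) -> float:
    # Assumed example variant: small slope alpha for negative inputs.
    return x if x > 0 else alpha * x

def elu(x: float, alpha: float = 1.0) -> float:
    # Assumed example variant: smooth exponential branch for negative inputs.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

for x in (-10.0, -1.0, 1.0, 10.0):
    print(x, sigmoid_grad(x), relu_grad(x), leaky_relu(x), elu(x))
```

For an input of 10, the sigmoid derivative is already below 1e-4, whereas the ReLU derivative stays at 1; for negative inputs ReLU outputs exactly 0, which is the behaviour that variants such as the two above modify.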

DOI

July 2021 Proceedings of the IEEE International Conference on Fuzzy Systems

Enhancing the ORCA framework with a new Fuzzy Rule Base System implementation compatible with the JFML library

Francisco Javier Rodriguez-Lozano, David Guijo-Rubio, Pedro Antonio Gutiérrez, Jose Manuel Soto-Hidalgo, Juan Carlos Gámez-Granados

Classification and regression techniques are two of the main tasks considered in the Machine Learning area. They mainly depend on the target variable to predict. In this context, ordinal classification represents an intermediate task, focused on the prediction of nominal variables whose categories follow a specific intrinsic order given by the problem. Nevertheless, algorithms able to solve ordinal classification problems are often unavailable in most existing Machine Learning software, which hinders the use of new approaches. Therefore, this paper focuses on the incorporation of an ordinal classification algorithm (NSLVOrd) into one of the most complete ordinal regression frameworks, the “Ordinal Regression and Classification Algorithms framework (ORCA)”, by using both fuzzy rules and the JFML library. The use of NSLVOrd in the ORCA tool is shown, together with a case study on a real database in which the results obtained are promising.

DOI

May 2021 PLoS One

Statistical methods versus machine learning techniques for donor-recipient matching in liver transplantation

David Guijo-Rubio, Javier Briceño, Pedro Antonio Gutiérrez, Maria Dolores Ayllón, Rubén Ciria, César Hervás-Martínez

Donor-Recipient (D-R) matching is one of the main challenges to be addressed nowadays. Due to the increasing number of recipients and the small number of donors in liver transplantation, the allocation method is crucial. In this paper, to establish a fair comparison, the United Network for Organ Sharing database was used with 4 different end-points (3 months, and 1, 2 and 5 years), with a total of 39,189 D-R pairs and 28 donor and recipient variables. Modelling techniques were divided into two groups: 1) classical statistical methods, including Logistic Regression (LR) and Naïve Bayes (NB), and 2) standard machine learning techniques, including Multilayer Perceptron (MLP), Random Forest (RF), Gradient Boosting (GB) or Support Vector Machines (SVM), among others. The methods were compared with standard scores, MELD, SOFT and BAR. For the 5-year end-point, LR (AUC = 0.654) outperformed several machine learning techniques, such as MLP (AUC = 0.599), GB (AUC = 0.600), SVM (AUC = 0.624) or RF (AUC = 0.644), among others. Moreover, LR also outperformed standard scores. The same pattern was reproduced for the other 3 end-points. Complex machine learning methods were not able to improve the performance of liver allocation, probably due to the implicit limitations associated with the collection process of the database.

DOI

November 2020 Energy

Evolutionary artificial neural networks for accurate solar radiation prediction

David Guijo-Rubio, Antonio Manuel Durán-Rosal, Pedro Antonio Gutiérrez, Antonio Manuel Gómez-Orellana, Carlos Casanova-Mateo, Julia Sanz-Justo, Sancho Salcedo-Sanz, César Hervás-Martínez

This paper evaluates the performance of different evolutionary neural network models in a problem of solar radiation prediction at Toledo, Spain. The prediction problem has been tackled exclusively from satellite-based measurements and variables, which avoids the use of data from ground stations or atmospheric soundings. Specifically, three types of neural computation approaches are considered: neural networks with sigmoid-based neurons, radial basis function units and product units. In all cases these neural computation algorithms are trained by means of evolutionary algorithms, leading to robust and accurate models for solar radiation prediction. The results obtained in the solar radiation estimation at the radiometric station of Toledo show an excellent performance of evolutionary neural networks tested. The structure sigmoid unit-product unit with evolutionary training has been shown as the best model among all tested in this paper, able to obtain an extremely accurate prediction of the solar radiation from satellite images data, and outperforming all other evolutionary neural networks tested, and alternative Machine Learning approaches such as Support Vector Regressors or Extreme Learning Machines.

DOI

September 2020 Proceedings of the 5th Workshop on Advances Analytics and Learning on Temporal Data

Ordinal versus nominal time series classification

David Guijo-Rubio, Pedro Antonio Gutiérrez, Anthony Bagnall, César Hervás-Martínez

Time series ordinal classification is one of the less studied problems in time series data mining. This problem consists in classifying time series with labels that show a natural order between them. In this paper, an approach is proposed based on the Shapelet Transform (ST) specifically adapted to ordinal classification. ST consists of two different steps: 1) the shapelet extraction procedure and its evaluation; and 2) the classifier learning using the transformed dataset. In this way, regarding the first step, 3 ordinal shapelet quality measures are proposed to assess the shapelets extracted, and, for the second step, an ordinal classifier is applied once the transformed dataset has been constructed. An empirical evaluation is carried out, considering 7 ordinal datasets from the UEA & UCR Time Series Classification (TSC) repository. The results show that a support vector ordinal classifier applied to the ST using the Pearson’s correlation coefficient (R²) is the combination achieving the best results in terms of two evaluation metrics: accuracy and average mean absolute error. A final comparison against three of the most popular and competitive nominal TSC techniques is performed, demonstrating that ordinal approaches can achieve higher performances even in terms of accuracy.

PDF DOI

August 2020 Current Opinion in Organ Transplantation

Machine learning methods in organ transplantation

David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

Purpose of review: Machine Learning (ML) techniques play an important role in organ transplantation. Analysing the main tasks for which they are being applied, together with the advantages and disadvantages of their use, can be of crucial interest for clinical practitioners. Recent findings: In the last 10 years, there has been an explosion of interest in the application of ML techniques to organ transplantation. Several approaches have been proposed in the literature aiming to find universal models by considering multicenter cohorts or cohorts from different countries. Moreover, deep learning has recently also been applied, demonstrating a notable ability when dealing with a vast amount of information. Summary: Organ transplantation can benefit from ML by improving the current procedures for donor-recipient matching or by improving standard scores. However, correct preprocessing is needed to provide consistent and high quality databases for ML algorithms, with the aim of obtaining robust and fair approaches to support expert decision-making systems.

DOI

July 2020 Proceedings of the 2020 IEEE International Joint Conference on Neural Networks (IJCNN2020)

Time series ordinal classification via shapelets

David Guijo-Rubio, Pedro Antonio Gutiérrez, Anthony Bagnall, César Hervás-Martínez

Nominal time series classification has been widely developed over recent years. However, to the best of our knowledge, ordinal classification of time series is an unexplored field, and this paper proposes a first approach in the context of the shapelet transform (ST). For those time series datasets where there is a natural order between the labels and the number of classes is higher than 2, nominal classifiers are not capable of achieving the best results, because the models impose the same misclassification cost on all errors, regardless of the difference between the predicted and the ground-truth labels. In this sense, we consider four different measures for evaluating shapelet quality, three of them of an ordinal nature. The first one is the widely known Information Gain (IG), proved to be very competitive for ST methods, whereas the remaining three measures try to boost the order information by refining the quality measure. These three measures are a reformulation of the Fisher score, the Spearman’s correlation coefficient (ρ), and finally, the Pearson’s correlation coefficient (R²). An empirical evaluation is carried out, considering 7 ordinal datasets from the UEA & UCR time series classification repository, 4 classifiers (2 of them of nominal nature, whereas the other 2 are of ordinal nature) and 2 performance measures (correct classification rate, CCR, and average mean absolute error, AMAE). The results show that, for both performance metrics, the ST quality metric based on R² is able to obtain the best results, especially for AMAE, for which the differences are statistically significant in favour of R².
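The two performance measures named above can be illustrated with a short sketch. It assumes the usual definitions: CCR is the fraction of correctly classified patterns, and AMAE is the mean absolute error computed separately for each true class and then averaged over classes. Labels are assumed to be encoded as integer ranks; the function names and example data are illustrative only.

```python
def ccr(y_true, y_pred):
    # Fraction of patterns whose predicted label equals the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def amae(y_true, y_pred):
    # Group absolute rank errors by true class, then average the per-class MAEs,
    # so that every class contributes equally regardless of its size.
    errors_by_class = {}
    for t, p in zip(y_true, y_pred):
        errors_by_class.setdefault(t, []).append(abs(t - p))
    per_class_mae = [sum(e) / len(e) for e in errors_by_class.values()]
    return sum(per_class_mae) / len(per_class_mae)

# Imbalanced toy example: a classifier that always predicts the majority class 0.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0]
print(ccr(y_true, y_pred))   # 4 of 6 patterns are correct
print(amae(y_true, y_pred))  # per-class MAEs are 0, 1 and 2
```

In this toy case the CCR looks acceptable because most patterns belong to class 0, while AMAE penalises the errors on the two minority classes and weights the error on class 2 by its larger rank distance.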

DOI

May 2020 Atmospheric Research

Ordinal regression algorithms for the analysis of convective situations over Madrid-Barajas airport

David Guijo-Rubio, Carlos Casanova-Mateo, Julia Sanz-Justo, Pedro Antonio Gutiérrez, Sara Cornejo-Bueno, César Hervás-Martínez, Sancho Salcedo-Sanz

In this paper we tackle a problem of convective situations analysis at Adolfo-Suarez Madrid-Barajas International Airport (Spain), based on Ordinal Regression algorithms. The diagnosis of convective clouds is key in a large airport like Barajas, since these meteorological events are associated with strong winds and local precipitation, which may affect air and land operations at the airport. In this work, we deal with a 12-h time horizon in the analysis of convective clouds, using as input variables data from a radiosonde station and also from numerical weather models. The information about the objective variable (presence of convective clouds at the airport) has been obtained from the Madrid-Barajas METAR and SPECI aeronautical reports. We treat the problem as an ordinal regression task, where there exists a natural order among the classes. Moreover, the classification problem is highly imbalanced, since there are very few convective cloud events compared to clear days. Thus, a process of oversampling is applied to the database in order to obtain a better balance of the samples for this specific problem. A large number of ordinal regression methods are then tested in the experimental part of the work, showing that the best approach for this problem is the SVORIM algorithm, based on the Support Vector Machine strategy, but adapted for ordinal regression problems. The SVORIM algorithm shows good accuracy in the case of thunderstorms and Cumulonimbus clouds, which represent a real hazard for airport operations.

DOI

March 2020 Applied Acoustics

Validation of artificial neural networks to model the acoustic behaviour of induction motors

Francisco Javier Jiménez-Romero, David Guijo-Rubio, Francisco Ramón Lara-Raya, Antonio Ruiz-González, César Hervás-Martínez

In the last decade, the sound quality of electric induction motors has been a hot topic in the research field. In particular, due to their high number of applications, the population is exposed to physical and psychological discomfort caused by the noise emission. Therefore, it is necessary to minimise its psychological impact on the population. In this way, the main goal of this work is to evaluate the use of multitask artificial neural networks as a modelling technique for simultaneously predicting psychoacoustic parameters of induction motors. Several inputs are used, such as the electrical magnitudes of the motor power signal and the number of poles, instead of separating the noise of the electric motor from the environmental noise. Two different kinds of artificial neural networks are proposed to evaluate the acoustic quality of induction motors, by using the equivalent sound pressure, the loudness, the roughness and the sharpness as outputs. Concretely, two different topologies have been considered: simple models and more complex models. The former are more interpretable, while the latter lead to higher accuracy at the cost of hiding the cause-effect relationship. Focusing on the simple interpretable models, product unit neural networks achieved the best results: 38.77 for MSE and 13.11 for SEP. The main benefit of this product unit model is its simplicity, since only 10 input variables are used, outlining the effective transfer mechanism of multitask artificial neural networks to extract common features of multiple tasks. Finally, a deep analysis of the acoustic quality of induction motors is done using the best product unit neural networks.

DOI

February 2020 Neural Computing and Applications

Prediction of convective clouds formation using evolutionary neural computation techniques

David Guijo-Rubio, Pedro Antonio Gutiérrez, Carlos Casanova-Mateo, Juan Carlos Fernández, Antonio Manuel Gómez-Orellana, Pablo Salvador-González, Sancho Salcedo-Sanz, César Hervás-Martínez

The prediction of convective clouds formation is a very important problem in different areas such as agriculture, natural hazards prevention or transport-related facilities, among others. In this paper we evaluate the capacity of different types of evolutionary artificial neural networks to predict the formation of convective clouds, tackling the problem as a classification task. We use data from Madrid-Barajas airport, including variables and indices derived from the Madrid-Barajas airport radiosonde station. As objective variable, we use the cloud information contained in the METAR and SPECI meteorological reports from the same airport and we consider a prediction time-horizon of 12 hours. The performance of different types of evolutionary artificial neural networks has been discussed and analysed, including three types of basis functions (Sigmoidal Unit, Product Unit and Radial Basis Function), and two types of models, a mono-objective evolutionary algorithm with two objective functions and a multi-objective evolutionary algorithm optimised by the two objective functions simultaneously. We show that some of the developed neuro-evolutionary models obtain high quality solutions to this problem, due to its high unbalance characteristic.

DOI

January 2020 PLoS One

Using machine learning methods to determine a typology of patients with HIV-HCV infection to be treated with antivirals

Antonio Rivero-Juárez, David Guijo-Rubio, Francisco Téllez, Rosario Palacios, Dolores Merino, Juan Macías, Juan Carlos Fernández, Pedro Antonio Gutiérrez, Antonio Rivero, César Hervás-Martínez

Several European countries have established criteria for prioritising initiation of treatment in patients infected with the hepatitis C virus (HCV) by grouping patients according to clinical characteristics. Based on neural network techniques, our objective was to identify those factors for HIV/HCV co-infected patients (to which clinicians have given careful consideration before treatment uptake) that have not been included among the prioritisation criteria. This study was based on the Spanish HERACLES cohort (NCT02511496) (April-September 2015, 2940 patients) and involved application of different neural network models with different basis functions (product-unit, sigmoid unit and radial basis function neural networks) for automatic classification of patients for treatment. An evolutionary algorithm was used to determine the architecture and estimate the coefficients of the model. This machine learning methodology found that radial basis neural networks provided a very simple model in terms of the number of patient characteristics to be considered by the classifier (in this case, six), returning a good overall classification accuracy of 0.767 and a minimum sensitivity (for the classification of the minority class, untreated patients) of 0.550. Finally, the area under the ROC curve was 0.802, which proved to be exceptional. The parsimony of the model makes it especially attractive, using just eight connections. The independent variable “recent PWID” is compulsory due to its importance. The simplicity of the model means that it is possible to analyse the relationship between patient characteristics and the probability of belonging to the treated group.

DOI

January 2020 Ocean Engineering

Short- and long-term energy flux prediction using Multi-Task Evolutionary Artificial Neural Networks

David Guijo-Rubio, Antonio Manuel Gómez-Orellana, Pedro Antonio Gutiérrez, César Hervás-Martínez

This paper presents a novel approach to tackle simultaneously short- and long-term energy flux prediction (specifically, at 6h, 12h, 24h and 48h time horizons). The methodology proposed is based on the Multi-Task Learning paradigm in order to solve the four problems with a single model. We consider Multi-Task Evolutionary Artificial Neural Networks (MTEANN) with four outputs, one for each time prediction horizon. For this purpose, three buoys located at the Gulf of Alaska are considered. Measurements collected by these buoys are used to obtain the target values of energy flux, whereas only reanalysis data are used as input values, allowing the applicability to other locations. The performance of three different basis functions (Sigmoidal Unit, Radial Basis Function and Product Unit) is compared against some popular state-of-the-art approaches such as Extreme Learning Machines and Support Vector Regressors. The results show that the MTEANN methodology using Sigmoidal Units in the hidden layer and a linear output achieves the best performance. In this way, the multi-task methodology is an excellent and lower-complexity approach for energy flux prediction at both short- and long-term prediction time horizons. Furthermore, the results also confirm that reanalysis data are enough to describe the problem tackled well.

DOI

November 2019 Proceedings of the 20th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL2019)

Modelling survival by machine learning methods in liver transplantation: application to the UNOS dataset

David Guijo-Rubio, Pedro J. Villalón-Vaquero, Pedro Antonio Gutiérrez, María Dolores Ayllón, Javier Briceño, César Hervás-Martínez

The aim of this study is to develop and validate a machine learning (ML) model for predicting survival after liver transplantation based on pre-transplant donor and recipient characteristics. For this purpose, we consider a database from the United Network for Organ Sharing (UNOS), containing 29 variables and 39,095 donor-recipient pairs, describing liver transplantations performed in the United States of America from November 2004 until June 2015. The dataset contains more than 74% censoring, making it a challenging and difficult problem. Several methods including proportional-hazards regression models and ML methods such as Gradient Boosting were applied, using 10 donor characteristics, 15 recipient characteristics and 4 shared variables associated with the donor-recipient pair. In order to measure the performance of the seven state-of-the-art methodologies, three different evaluation metrics are used, the concordance index (ipcw) being the most suitable for this problem. The results achieved show that, for each measure, a different technique obtains the highest value, with all of them performing almost the same, but, if we focus on ipcw, Gradient Boosting outperforms the rest of the methods.

DOI

November 2019 Proceedings of the 20th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL2019)

A hybrid approach to time series classification with shapelets

David Guijo-Rubio, Pedro Antonio Gutiérrez, R. Tavenard, Anthony Bagnall

Shapelets are phase independent subseries that can be used to discriminate between time series. Shapelets have proved to be very effective primitives for time series classification. The two most prominent shapelet based classification algorithms are the shapelet transform (ST) and learned shapelets (LS). One significant difference between these approaches is that ST is data driven, whereas LS searches the entire shapelet space through stochastic gradient descent. The weakness of the former is that full enumeration of possible shapelets is very time consuming. The problem with the latter is that it is very dependent on the initialisation of the shapelets. We propose hybridising the two approaches through a pipeline that includes a time constrained data driven shapelet search which is then passed to a neural network architecture of learned shapelets for tuning. The tuned shapelets are extracted and formed into a transform, which is then classified with a rotation forest. We show that this hybrid approach is significantly better than either approach in isolation, and that the resulting classifier is not significantly worse than a full shapelet search.
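As a rough illustration of why shapelets are phase independent, the sketch below computes the commonly used shapelet-to-series distance: the minimum Euclidean distance between the shapelet and every subseries of the same length. It is not the pipeline proposed in the paper; the function names are hypothetical, and the z-normalisation of subseries that implementations often apply is omitted here for brevity.

```python
import math

def shapelet_distance(series, shapelet):
    # Slide the shapelet along the series and keep the smallest Euclidean distance.
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best

def shapelet_transform(dataset, shapelets):
    # Each series becomes a feature vector of its distances to every shapelet.
    return [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]

shapelet = [1.0, 2.0, 1.0]
early = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0]   # pattern at the start
late = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]    # same pattern at the end
flat = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # pattern absent
print(shapelet_transform([early, late, flat], [shapelet]))
```

The first two series obtain the same distance because the pattern is matched wherever it occurs, while the flat series is far from the shapelet; a standard classifier can then be trained on the transformed feature vectors.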

DOI

February 2019 VII Congreso Científico de Investigadores en Formación

Wave height prediction through distribution-based discretisation using ordinal classification (Predicción de altura de ola mediante discretización basada en distribuciones utilizando clasificación ordinal)

David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

Wave height prediction is an important task for ocean and marine resource management. Traditionally, regression techniques are used for this prediction, but estimating continuous changes in the corresponding time series can be very difficult. With the purpose of simplifying the prediction, wave height can be discretised in consecutive intervals, resulting in a set of ordinal categories. Although this discretisation could be performed using the criterion of an expert, the prediction could be biased towards the opinion of the expert, and the obtained categories could be unrepresentative of the data recorded. In this paper, we propose a novel automated method to categorise the wave height based on selecting the most appropriate distribution from a set of well-suited candidates. Moreover, given that the categories resulting from the discretisation show a clear natural order, we propose to use different ordinal classifiers. The methodology is tested on real wave height data collected from two buoys located in the Gulf of Alaska. We also incorporate reanalysis data in order to increase the accuracy of the predictors. The results confirm that this kind of discretisation is suitable for the time series considered and that the ordinal classifiers achieve outstanding results in comparison with nominal techniques.

January 2019 International Journal of Refrigeration

Validation of multitask artificial neural networks to model desiccant wheels activated at low temperature

Francisco Comino, David Guijo-Rubio, Manuel Ruiz De Adana, César Hervás-Martínez

Desiccant wheels (DW) could be a serious alternative to conventional dehumidification systems based on direct expansion units, which depend on electrical energy. The main objective of this work was to evaluate the use of multitask artificial neural networks (ANNs) as a modelling technique for DWs activated at low temperature with low computational load and good accuracy. Two different ANN models were developed to predict two output variables: outlet process air temperature and humidity ratio. The results show that a sigmoid unit neural network obtained 0.390 and 2.987 for MSE and SEP, respectively. These results outline the effective transfer mechanism of multitask ANNs to extract common features of multiple tasks, being useful for modelling a DW activated at low temperature. On the other hand, moisture removal capacity of the DW and its performance were analysed under several inlet air conditions, showing an increase under process air conditions close to saturation air.

DOI

December 2018 Atmospheric Research

Prediction of low-visibility events due to fog using ordinal classification

David Guijo-Rubio, Pedro Antonio Gutiérrez, Carlos Casanova-Mateo, Julia Sanz-Justo, Sancho Salcedo-Sanz, César Hervás-Martínez

The prediction of low-visibility events is very important in many human activities, and crucial in transportation facilities such as airports, where they can have a severe impact on flight scheduling and safety. The design of accurate predictors for low-visibility events can be approached by modelling future visibility conditions based on past values of different input variables, recorded at the airport. The use of autoregressive time series forecasters involves adjusting the order of the model (number of past series values or size of the sliding window), which usually depends on the dynamical nature of the time series. Moreover, the same window size is normally used for all the data, though it would be reasonable to use different sliding windows. In this paper, we propose a hybrid prediction model for daily low-visibility events, which combines fixed-size and dynamic windows, and adapts its size according to the dynamics of the time series. Moreover, visibility is labelled using three ordered categories (FOG, MIST and CLEAR), and the prediction is then carried out by means of ordinal classifiers, in order to take advantage of the ordinal nature of low-visibility events. We evaluate the model using a dataset from Valladolid airport (Spain), where radiation fog is very common in autumn and winter months. The considered data set includes five different meteorological input variables (wind speed and direction, temperature, relative humidity and QNH - pressure adjusted at mean sea level) and the Runway Visual Range (RVR), which is used to characterize the low-visibility events at the airport. The results show that the proposed hybrid window model with ordinal classification leads to very robust prediction performance at a daily time-horizon, improving the results obtained by the persistence model and alternative prediction schemes tested.
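The fixed-size sliding window mentioned above can be sketched as follows. This shows only the standard fixed-window construction of autoregressive inputs, not the hybrid fixed and dynamic window model proposed in the paper; the function name, the toy values and the single input variable are assumptions made for illustration.

```python
def sliding_windows(series, labels, window):
    # Build one training pattern per time step t >= window:
    # inputs are the `window` previous values, target is the label at time t.
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])
        y.append(labels[t])
    return X, y

# Toy daily values of one input variable and the ordered visibility category.
humidity = [70, 85, 95, 98, 80, 60]
category = ["CLEAR", "MIST", "FOG", "FOG", "MIST", "CLEAR"]

X, y = sliding_windows(humidity, category, window=2)
for inputs, target in zip(X, y):
    print(inputs, "->", target)
```

The window size plays the role of the order of the autoregressive model: a larger window gives the classifier more past values per pattern but yields fewer training patterns from the same series.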

DOI

November 2018 Proceedings of Third Bilbao Data Science Workshop (BiDAS 3)

Time series clustering based on the characterisation of segment typologies

David Guijo-Rubio, Antonio Manuel Durán-Rosal, Pedro Antonio Gutiérrez, Alicia Troncoso, César Hervás-Martínez

Time series clustering is the process of grouping time series with respect to their similarity or characteristics. Previous approaches usually combine a specific distance measure for time series and a standard clustering method. However, these approaches do not take the similarity of the different subsequences of each time series into account, which can be used to better compare the time series objects of the dataset. In this paper, we propose a novel technique of time series clustering based on two clustering stages. In a first step, a least squares polynomial segmentation procedure is applied to each time series, which is based on a growing window technique that returns different-length segments. Then, all the segments are projected into the same dimensional space, based on the coefficients of the model that approximates the segment and a set of statistical features. After mapping, a first hierarchical clustering phase is applied to all mapped segments, returning groups of segments for each time series. These clusters are used to represent all time series in the same dimensional space, after defining another specific mapping process. In a second and final clustering stage, all the time series objects are grouped. We consider internal clustering quality to automatically adjust the main parameter of the algorithm, which is an error threshold for the segmentation. The results obtained on 84 datasets from the UCR Time Series Classification Archive have been compared against two state-of-the-art methods, showing that the performance of this methodology is very promising.

November 2018 Proceedings of the 2018 International Conference on Intelligent Data Engineering and Automated Learning (IDEAL2018)

Distribution-Based Discretisation and Ordinal Classification Applied to Wave Height Prediction

David Guijo-Rubio, Antonio Manuel Durán-Rosal, Antonio Gómez-Orellana, Pedro Antonio Gutiérrez, César Hervás-Martínez

Wave height prediction is an important task for ocean and marine resource management. Traditionally, regression techniques are used for this prediction, but estimating continuous changes in the corresponding time series can be very difficult. With the purpose of simplifying the prediction, wave height can be discretised in consecutive intervals, resulting in a set of ordinal categories. Although this discretisation could be performed using the criterion of an expert, the prediction could be biased towards the opinion of the expert, and the obtained categories could be unrepresentative of the data recorded. In this paper, we propose a novel automated method to categorise the wave height based on selecting the most appropriate distribution from a set of well-suited candidates. Moreover, given that the categories resulting from the discretisation show a clear natural order, we propose to use different ordinal classifiers instead of nominal ones. The methodology is tested on real wave height data collected from two buoys located in the Gulf of Alaska and South Kodiak. We also incorporate reanalysis data in order to increase the accuracy of the predictors. The results confirm that this kind of discretisation is suitable for the time series considered and that the ordinal classifiers achieve outstanding results in comparison with nominal techniques.

DOI

October 2018 Proceedings of the 2018 Conference of the Spanish Association for Artificial Intelligence (CAEPIA2018)

Machine learning algorithms for fog level prediction using static and dynamic windows (Algoritmos de aprendizaje automático para predicción de niveles de niebla usando ventanas estáticas y dinámicas)

Miguel Diaz-Lozano, David Guijo-Rubio, Pedro Antonio Gutiérrez, Carlos Casanova-Mateo, Sancho Salcedo-Sanz, César Hervás-Martínez

Very low-visibility events caused by fog are a recurrent problem in certain areas close to rivers and large mountains, and they strongly affect human activity in different respects. Events of this type can entail very significant material and even human costs. One of the sectors most affected by very low-visibility conditions is transport, mainly air transport, whose activity is seriously reduced, causing delays, cancellations and, in the worst case, terrible accidents. At Valladolid airport, low-visibility situations due to fog are very frequent, especially in the months considered winter (November, December, January and February). This directly affects the way in which flights operate at this airport. It is therefore very important to know the possible fog conditions in the short term in order to apply safety and organisational procedures within the airport. This paper proposes the use of different dynamic and static window models, together with machine learning classifiers, for the prediction of fog levels. Instead of approaching the problem as a regression task, the variable of interest for characterising the visibility level at the airport (Runway Visual Range, RVR) is discretised into 3 categories, which makes the resulting classification models more robust. The results indicate that a combination of a dynamic window with a static window, together with classification models based on Gradient Boosted Trees, is the methodology that provides the best results.

PDF
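
As a rough illustration of the windowing and discretisation idea in the abstract above, the following Python sketch builds a small classification dataset from a visibility series. The thresholds (1000 m and 2000 m), the window width, the tolerance, and the definition of the dynamic window used here are all hypothetical choices made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
from statistics import mean

def discretize_rvr(rvr_m, low=1000.0, high=2000.0):
    """Map a visibility value to an ordinal class: 0 (low), 1 (medium), 2 (high).

    The thresholds `low` and `high` are hypothetical, not taken from the paper.
    """
    if rvr_m < low:
        return 0
    if rvr_m < high:
        return 1
    return 2

def static_window(series, t, width):
    """Return the last `width` observations strictly before time t."""
    return series[t - width:t]

def dynamic_window(series, t, tol):
    """Assumed definition: extend backwards from t-1 while values stay
    within `tol` of the most recent observation series[t-1]."""
    ref = series[t - 1]
    start = t - 1
    while start > 0 and abs(series[start - 1] - ref) <= tol:
        start -= 1
    return series[start:t]

def build_dataset(series, width=3, tol=200.0):
    """Combine static-window values with dynamic-window summaries
    (mean and length) and pair them with the discretised target."""
    rows = []
    for t in range(width, len(series)):
        dyn = dynamic_window(series, t, tol)
        features = static_window(series, t, width) + [mean(dyn), float(len(dyn))]
        rows.append((features, discretize_rvr(series[t])))
    return rows
```

The resulting feature rows and ordinal labels could then be passed to any classifier; the abstract reports Gradient Boosted Trees as the best-performing option.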

May 2018 Bioinspired Optimization Methods and their Applications (BIOMA2018)

Hybrid Weighted Barebones Exploiting Particle Swarm Optimization Algorithm for Time Series Representation

Antonio Manuel Durán-Rosal, David Guijo-Rubio, Pedro Antonio Gutiérrez, César Hervás-Martínez

The amount of data available in time series has recently been growing exponentially, making time series preprocessing and analysis difficult. This paper adapts different methods for time series representation, which are based on time series segmentation. Specifically, we consider a particle swarm optimization algorithm (PSO) and its barebones exploitation version (BBePSO). Moreover, a new variant of the BBePSO algorithm is proposed, which takes into account the positions of the particles throughout the generations, where those close in time are given more importance. This methodology is referred to as weighted BBePSO (WBBePSO). The solutions obtained by all the algorithms are finally hybridised with a local search algorithm, combining simple segmentation strategies (Top-Down and Bottom-Up). WBBePSO is tested on 13 time series and compared against the rest of the algorithms, showing that it leads to the best results and obtains consistent representations.

DOI
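
For readers unfamiliar with barebones PSO, the sketch below shows the position update commonly given for the "exploiting" variant in the bare-bones literature: each coordinate either reuses the particle's personal best or is sampled from a Gaussian centred between the personal and global bests. Whether the paper above uses exactly this form is an assumption, the weighted variant (WBBePSO) and the hybrid local search are not reproduced, and the search here is over a continuous toy function rather than segmentation cut points.

```python
import random

def bbexp_sample(pbest, gbest, rng):
    """One bare-bones 'exploiting' position update for a single particle."""
    new_pos = []
    for p, g in zip(pbest, gbest):
        if rng.random() < 0.5:
            # Exploit: reuse the personal-best coordinate unchanged.
            new_pos.append(p)
        else:
            # Explore: Gaussian centred midway, spread equal to the gap.
            new_pos.append(rng.gauss((p + g) / 2.0, abs(p - g)))
    return new_pos

def bbexp_minimise(f, dim, n_particles=20, iters=100, seed=0):
    """Minimise f over R^dim; the bounds and sizes are illustrative defaults."""
    rng = random.Random(seed)
    pbest = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    pfit = [f(x) for x in pbest]
    for _ in range(iters):
        # Global best is fixed at the start of each iteration.
        g = pbest[min(range(n_particles), key=pfit.__getitem__)]
        for i in range(n_particles):
            cand = bbexp_sample(pbest[i], g, rng)
            fc = f(cand)
            if fc < pfit[i]:  # keep the candidate only if it improves
                pbest[i], pfit[i] = cand, fc
    best = min(range(n_particles), key=pfit.__getitem__)
    return pbest[best], pfit[best]
```

Because a personal best is only replaced by a strictly better candidate, the best fitness never worsens from one iteration to the next.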

June 2017 14th International Work-Conference on Artificial and Natural Neural Networks (IWANN2017)

A coral reef optimization algorithm for wave height time series segmentation problems

Antonio Manuel Durán-Rosal, David Guijo-Rubio, Pedro Antonio Gutiérrez, Sancho Salcedo-Sanz, César Hervás-Martínez

Time series segmentation can be approached using metaheuristic procedures such as genetic algorithms (GAs), with the purpose of automatically finding segments and determining similarities in the time series with the lowest possible clustering error. In this way, segments belonging to the same cluster must have similar properties, and the dissimilarity between segments of different clusters should be as high as possible. In this paper we tackle a specific problem of significant wave height time series segmentation, with application in coastal and ocean engineering. The basic idea in this case is that similarity between segments can be used to characterise those segments with high significant wave heights, and thus to predict them. A recently proposed metaheuristic, the Coral Reef Optimization (CRO) algorithm, is applied to this task, and we analyze its performance by comparing it with that of a GA on three wave height time series collected by three real buoys (two of them in the Gulf of Alaska and another one in Puerto Rico). The results show that the CRO performs better than the GA in this time series segmentation problem, due to the better exploration of the search space obtained with the CRO.

DOI

November 2016 Actas del I Congreso de Investigadores Noveles de la Universidad de Córdoba

Clustering de Series Temporales basado en la Extracción de Tipologías de Segmentos

David Guijo-Rubio, Antonio Manuel Durán-Rosal, Pedro Antonio Gutiérrez, César Hervás-Martínez

In recent years there has been a large increase in data series distributed over time, that is, time series. This growth has brought with it an interest in their clustering, the process of grouping series so that those in the same group are very similar to one another and very different from those in other groups. When time series are very long or contain noise or missing values, many current methods yield unacceptable solutions. This paper presents a new clustering method for time series based on polynomial approximation, using a method known as Growing Window. In this way, the series is simplified to a set of linear coefficients of variable degree; the resulting segments are then clustered and the centroid of each cluster is computed. In the end, the series is reduced to n x d elements, where n is the number of clusters and d is the degree of the polynomial used in the approximation, and the final clustering is performed on this representation. The aim of this new methodology is to reduce the dimensionality of the time series, the sensitivity of the clustering, and the impact of missing data.
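
A minimal sketch of a growing-window segmentation, in the spirit of the abstract above, might look as follows. It assumes a degree-1 (linear) least-squares fit and a sum-of-squared-errors threshold `max_error`; the paper works with polynomials of variable degree and then clusters the segments, neither of which is reproduced here, and the function names are illustrative.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept, sse), where sse is the sum of squared residuals.
    """
    n = len(ys)
    xs = range(n)
    mx = (n - 1) / 2.0
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # a single point has no slope
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def growing_window(series, max_error, min_len=2):
    """Split `series` into segments [start, end) by growing each window
    until adding the next point would push the fit error above max_error."""
    segments = []
    start = 0
    while start < len(series):
        end = min(start + min_len, len(series))
        # Grow while the fit including the next point stays within bound.
        while end < len(series) and fit_line(series[start:end + 1])[2] <= max_error:
            end += 1
        slope, intercept, _ = fit_line(series[start:end])
        segments.append((start, end, slope, intercept))
        start = end
    return segments
```

Each segment is summarised by its fitted coefficients, so a long series collapses into a short list of (slope, intercept) pairs that a clustering step could then group.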

September 2016 Proceedings of the 17th Conference of the Spanish Association for Artificial Intelligence (CAEPIA 2016)

Multiclass Prediction of Wind Power Ramp Events Combining Reservoir Computing and Support Vector Machines

Manuel Dorado-Moreno, Antonio Manuel Durán-Rosal, David Guijo-Rubio, Pedro Antonio Gutiérrez, L. Prieto, Sancho Salcedo-Sanz, César Hervás-Martínez

This paper proposes a reservoir computing architecture for predicting wind power ramp events (WPREs), which are strong increases or decreases of wind speed in a short period of time. This is a problem of high interest, because WPREs increase the maintenance costs of wind farms and hinder energy production. The standard echo state network architecture is modified by replacing the linear regression used to compute the reservoir outputs with a nonlinear support vector machine, and past ramp function values are combined with reanalysis data to perform the prediction. Another novelty of the study is that we predict three types of events (negative ramps, non-ramps and positive ramps), instead of performing a binary classification of ramps, given that the type of ramp can be crucial for the correct maintenance of the farm. The proposed model obtains satisfactory results, correctly predicting around 70% of WPREs and outperforming other models.

DOI

January 2022

Assessing the Efficient Market Hypothesis for Cryptocurrencies with High-Frequency Data Using Time Series Classification

Rafael Ayllón-Gavilán, David Guijo-Rubio, Pedro A. Gutiérrez, César Hervás-Martínez

This work analyzes the performance of several state-of-the-art Time Series Classification (TSC) techniques in the cryptocurrency returns modeling field.