Original Article
ARTICLE IN PRESS
doi:
10.25259/ijn_411_23

Comparison of Different Classification Models to Predict Mortality Among Patients Diagnosed with Acute Kidney Injury

Department of Biostatistics, Jawaharlal Institute of Postgraduate Medical Education and Research (JIPMER), Puducherry, India
Department of Nephrology, Jawaharlal Institute of Postgraduate Medical Education and Research (JIPMER), Puducherry, India

Corresponding author: KT Harichandrakumar, Department of Biostatistics, Jawaharlal Institute of Postgraduate Medical Education and Research (JIPMER), Puducherry, India. E-mail: hckumar2001@gmail.com

Licence
This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, transform, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.

How to cite this article: Renukadevi S, Harichandrakumar KT, Ganapathy S, PS Priyamvada, Nair NS. Comparison of Different Classification Models to Predict Mortality Among Patients Diagnosed with Acute Kidney Injury. Indian J Nephrol. doi: 10.25259/ijn_411_23

Abstract

Background

Acute kidney injury (AKI) is characterized by an abrupt reduction in the kidney’s functioning, and it has long-term repercussions. Predictive models are used widely in predicting mortality, identifying patients who are at risk, and making diagnoses. This study was conducted to compare the predictive accuracy of machine learning models with logistic regression (LR) for mortality among patients with AKI.

Materials and Methods

This study used data from 994 patients who underwent treatment for AKI in a tertiary health care center in South India between 2013 and 2021. Univariate analysis was used to identify potential predictors of mortality. The predictive performance of multiple binary logistic regression (MBLR) and machine learning models was compared using accuracy rate, sensitivity, specificity, and area under the curve (AUC). The split-sample method was used for internal validation.

Results

In the training dataset, the Decision Tree (DT) and Random Forest (RF) achieved high AUCs of 0.87 and 0.86, respectively. However, in the testing dataset, their performance declined, suggesting potential overfitting. In contrast, LR and artificial neural network (ANN) demonstrated stable accuracy in both training and testing, with an AUC of 0.80, indicating better generalizability for clinical application.

Conclusion

While DT and RF showed strong predictive capabilities in training, their reduced performance in testing limits their clinical applicability. LR and ANN demonstrated consistent accuracy across datasets, making them more reliable for real-world mortality prediction in patients with AKI. These findings highlight the importance of carefully validating machine learning models before clinical implementation.

Keywords

Acute kidney injury
Artificial neural network
Decision tree
Logistic regression
Random forest

Introduction

The prevalence of acute kidney injury (AKI) is increasing worldwide but is still underestimated due to the non-referral of patients to hospitals.1 According to reports, 13.3 million people worldwide suffer AKI every year, with 85% of those living in developing countries. Furthermore, AKI is estimated to be responsible for up to 1.7 million deaths each year.2 AKI is often multifactorial, especially in the setting of hospitalization.3 Some of the precipitating factors known to cause AKI are sepsis, urinary tract obstruction, heart failure, liver disease, ischemia, major surgery, myonecrosis, and several nephrotoxins, primarily dominated by sepsis in nearly 50% of the cases.4 Due to the limited data available on AKI from India, there is widespread disparity in the reported prognostic factors that lead to adverse outcomes in AKI. Most studies use a binary logistic regression (LR) as the predictive tool to identify adverse outcomes.

Machine learning is actively enhancing healthcare by creating new medical techniques, managing patient data, and improving disease treatments. Physicians have been using predictive models for a long time, and now these predictive models are used in medical decision-making processes.5

Regression analysis typically aims to identify the linear relationship between the outcome variable and one or more independent variables. However, in real-world data, the relationships are often non-linear. To capture these non-linear relationships, machine learning models such as decision trees (DTs), random forests (RFs), and neural networks are employed. Traditional predictive models, like LR, have been widely used in medical literature. However, because data on adverse outcomes of AKI are limited, this study seeks to determine whether newer machine learning algorithms, which are now extensively tested in medical specialties, offer any advantage over LR. Existing machine learning studies in this area have mainly addressed the prediction of AKI itself, contributing to early intervention strategies.6-9 The current study instead focuses on predicting the mortality associated with AKI and compares the robustness of DT, RF, artificial neural network (ANN), and LR analysis for this purpose.

Materials and Methods

The study was a retrospective analysis of record-based data. It included de-identified records of 994 patients with AKI who underwent treatment in the Department of Nephrology at a tertiary healthcare center in South India from 2013 to 2021.

Patients with any condition undergo an initial examination. If AKI is suspected during this process, they are referred to the nephrology department. Patients with AKI who also had CKD were not included. AKI staging followed the KDIGO 2012 guideline,10 using criteria based on urine output and serum creatinine levels. Mortality outcomes were recorded during the hospital stay, and the final database included only patients under nephrology care until discharge.

In this study, four classification models (multiple binary logistic regression [MBLR], DT, RF, and ANN) were developed to predict mortality among patients with AKI. During data preprocessing, five individuals with a missing value for the outcome variable (mortality status) were excluded, while records with missing values in the independent variables were retained. Quantitative variables were converted to categorical variables where feasible for analysis in R. Variables for model development were selected through univariate analysis using chi-square and t-tests. Additionally, for the DT and RF models, the Gini impurity index and entropy were calculated to evaluate the importance of variables in predicting mortality among patients with AKI.

Statistical analysis

In data analysis, the distributions of all the categorical variables, such as demographic and clinical characteristics, were summarized as frequency with percentage. The quantitative variables were summarized as mean with standard deviation or median with interquartile range, depending on whether the normality assumption held. A chi-square test was performed to assess the association between categorical variables and outcomes. An independent Student’s t-test or Mann-Whitney U test was performed to compare the quantitative variables between the two groups. Further, the strength of the association between each variable and mortality status was assessed using Cramér’s V. The variables found to be significantly associated with mortality were included in the development of the predictive models. These models were compared using accuracy rate, area under the curve, sensitivity, and specificity. Data were split using the random split method, with 70% (663) of subjects in the training dataset and 30% (282) in the testing dataset. The data were analyzed using SPSS version 19 and RStudio. All statistical analyses were carried out at a 5% level of significance, and a p-value < 0.05 was considered statistically significant.
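A random 70/30 split of the kind described above can be sketched as follows. This is only an illustration in Python (the study itself reports using SPSS and R); the function name `split_train_test`, the seed value, and the integer record identifiers are assumptions made for the example, not details from the study.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=42):
    """Randomly split records into training and testing lists.

    `seed` is an arbitrary value chosen for reproducibility (an assumption,
    not a setting reported in the study).
    """
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # reproducible random ordering
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical record identifiers standing in for patient rows.
record_ids = list(range(1, 101))
train, test = split_train_test(record_ids)
print(len(train), len(test))
```

Every record lands in exactly one of the two sets, so performance on the testing set is measured on patients the model has not seen during fitting.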

Logistic regression analysis

LR is one of the generalized linear models (GLM) where the outcome variable is binary.

MBLR is used when the outcome is binary and there is more than one independent/predictor variable. The model can be written as

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

Where β’s are regression coefficients.

x’s are the k independent variables such as ICU admission, diabetes mellitus, cancer, AKI stage, and so on.

p is the probability of mortality.

The expected probability of the binary outcome is:

$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k)}}$$

Where Y is the mortality (mortality status). The significance of each predictor was assessed by the Wald test statistic, and the goodness of fit of the Binary LR model was assessed by the Hosmer-Lemeshow test.11
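The relationship between the linear predictor and the predicted probability in a binary LR model can be illustrated with a short Python sketch. The intercept and coefficients below are hypothetical values chosen for illustration; they are not estimates from this study, and the function names are likewise assumptions.

```python
import math

def predicted_probability(intercept, coefficients, x):
    """P(Y = 1 | x) from a binary logistic regression model."""
    linear_predictor = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def log_odds(p):
    """Logit transform: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for two binary predictors (illustrative only).
intercept = -1.0
coefficients = [1.5, 0.8]
patient = [1, 0]  # first predictor present, second absent

p = predicted_probability(intercept, coefficients, patient)
print(round(p, 3))            # predicted probability of the outcome
print(round(log_odds(p), 3))  # recovers the linear predictor: -1.0 + 1.5 = 0.5
```

Applying the logit to the predicted probability returns the linear predictor, which is why each coefficient can be read as a change in log-odds and its exponential as an odds ratio.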

Decision tree

A DT is a tree-like structure used to construct predictive models. The dataset is split into progressively smaller subgroups, and the corresponding DT is built up in tandem. In a DT, the outcomes or predictions are represented by the leaf nodes, while the decisions based on the input variables are represented by the decision nodes.

DT algorithms use the Gini impurity index, entropy, or information gain to split a node. In the initial split, the dataset is divided into branches based on the attribute with the maximum information gain or the lowest Gini index, and the same process is repeated on each branch.12 The current study used the classification and regression tree (CART) algorithm, based on the characteristics of the data. CART supports both classification and regression by recursively partitioning the dataset into binary subsets until further splitting is no longer possible or the maximum tree depth is reached.13

The CART algorithm uses the Gini index as a measure of impurity to construct DTs.

$$\text{Gini index} = 1 - \sum_{i=1}^{n} p_i^2$$

Where n represents the number of classes (survived/dead) in the label, and $p_i$ is the probability of randomly selecting an example in class i.

Gini impurity ranges from 0, which represents a pure node containing a single class, up to 1 − 1/n when the classes are equally mixed (0.5 for the two classes considered here).

Entropy is an information theory metric that also measures the impurity in the observations.
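The entropy formula is not printed in the text. For reference, the standard Shannon entropy of a node, written with the same class probabilities $p_i$ and number of classes n used for the Gini index, is:

$$\text{Entropy} = -\sum_{i=1}^{n} p_i \log_2 p_i$$

With two classes, this ranges from 0 for a pure node to 1 when both classes are equally likely.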

The amount of knowledge a feature imparts to a class is measured by its information gain. Finding the attribute with the most information gain is the key to building a DT.

$$\text{Gain}(T, X) = \text{Entropy}(T) - \text{Entropy}(T, X)$$

T represents the parent node

X represents the child node
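As a worked illustration of these split criteria, the Python sketch below computes the size-weighted Gini impurity, the size-weighted entropy, and the information gain for a split on ICU stay, using the survived/died counts reported for that variable in Table 2. The helper names are assumptions for the example, and computing the measures on the full dataset in this way is also an assumption about how such values are obtained, not a description of the authors' code.

```python
import math

def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy (base 2) of a node given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def weighted(measure, children):
    """Size-weighted average of an impurity measure over child nodes."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * measure(child) for child in children)

# (survived, died) counts for ICU stay = No / Yes, taken from Table 2.
icu_no, icu_yes = (233, 49), (241, 471)
parent = (icu_no[0] + icu_yes[0], icu_no[1] + icu_yes[1])  # (474, 520)

split_gini = weighted(gini, [icu_no, icu_yes])
split_entropy = weighted(entropy, [icu_no, icu_yes])
information_gain = entropy(parent) - split_entropy

print(round(split_gini, 3), round(split_entropy, 3), round(information_gain, 3))
```

The three printed values should be close to the Gini impurity index, entropy, and information gain listed for ICU stay in Table 4 (0.402, 0.850, and 0.148).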

Random forest

RF is an ensemble learning method: it combines the predictions of multiple individual models to achieve better accuracy and generalization performance.

It constructs multiple DTs, each trained on randomly selected data samples and features. During prediction, each tree produces its own output, and the outputs of all trees are aggregated, by majority vote for classification or by averaging for regression, to produce the final result.14-16
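The three ingredients described above (random data samples, random feature subsets, and vote aggregation) can be sketched in Python as follows. This is a schematic illustration rather than a full RF implementation: no trees are actually fitted, and the row indices, feature names, seed, and votes are all hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample per tree)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(features, k, rng):
    """Pick k candidate features at random."""
    return rng.sample(features, k)

def majority_vote(tree_predictions):
    """Aggregate class predictions from all trees into one label."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)  # arbitrary seed for reproducibility
rows = list(range(10))  # hypothetical row indices
features = ["icu_stay", "vasopressors", "ventilator", "dialysis"]  # illustrative names

sample = bootstrap_sample(rows, rng)
subset = random_feature_subset(features, 2, rng)

# Hypothetical votes from five trees for a single patient.
votes = ["died", "survived", "died", "died", "survived"]
print(majority_vote(votes))  # died
```

Because each tree sees a different sample and a different set of candidate features, the individual trees make partly independent errors, which the vote tends to average out.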

Artificial neural network

The design of ANNs is influenced by the architecture of biological neural networks. They are made up of interconnected neurons or nodes, and the weights on the connections between these nodes determine how well the network can predict the outcome. Each neuron computes the weighted total of its inputs and passes it through a transfer (activation) function, which determines whether the neuron is activated and what signal it outputs.17,18

The net input to a neuron is

$$Y = X_1 W_1 + X_2 W_2 + \dots + X_m W_m + \text{Bias}$$

i.e.,

$$Y = \sum_{i=1}^{m} X_i W_i + \text{Bias}$$

where W is the weight given to each predictor and X is the independent variable. The output of the neuron is

$$Z = AF(Y)$$

where AF is the activation function.

The current study used the sigmoid function, which is widely used:

$$f(y) = \frac{1}{1 + e^{-y}}$$

where f(y) is the sigmoid function, and e is Euler’s number.

The sigmoid function returns a value between 0 and 1. A value below 0.5 indicates no activation, and a value above 0.5 indicates activation.

The network has three types of layers: input, hidden (of which there may be more than one), and output. It operates as a feedforward model, passing data from one layer to the next. The prediction error is used to adjust the weights through backpropagation.19
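The forward computation of a single neuron (net input, sigmoid activation, and the 0.5 activation threshold) can be sketched in Python. The inputs, weights, and bias below are hypothetical values for illustration only; they are not the fitted weights of the network in this study.

```python
import math

def sigmoid(y):
    """Sigmoid activation: 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def neuron_output(inputs, weights, bias):
    """Net input (weighted sum plus bias) passed through the sigmoid."""
    net_input = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(net_input)

# Hypothetical binary inputs and weights, for illustration only.
inputs = [1, 0, 1]
weights = [0.9, -0.4, 0.3]
bias = -0.5

z = neuron_output(inputs, weights, bias)  # net input = 0.9 + 0.3 - 0.5 = 0.7
activated = z > 0.5                       # threshold described in the text
print(round(z, 3), activated)
```

In a feedforward network, the outputs of one layer computed this way become the inputs of the next layer.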

Results

The distribution of demographic and clinical parameters is presented in Table 1. The results of the univariate analysis in Table 2 show that sex, AKI type, comorbidities, hemoglobin, major surgery, total platelet count, and albumin were not associated with mortality (p>0.05). ICU stay, diabetic status, malignancy status, AKI stage, hypertensive status, vasopressor requirement, usage of a ventilator, dialysis, contrast intake, alkaline phosphatase (ALP), and etiology were found to be significantly associated (p<0.05) with mortality. None of the quantitative variables considered in the study differed significantly (p>0.05) between the outcome groups. Following that, the strength of the association of each variable with mortality was quantified using Cramér’s V. Among the variables considered in the study, vasopressor requirement was the most strongly associated with mortality, followed by ICU stay, usage of a ventilator, dialysis, contrast intake, hypertensive status, malignancy status, AKI stage, ALP, and diabetic status.

Table 1: Distribution of clinical characteristics and demographic characteristics among individuals with AKI
Variables  Frequency, n (%)
Sex
 Female 333 (33.6)
 Male 658 (66.4)
AKI type
 CAAKI 564 (56.8)
 HAAKI 429 (43.2)
ICU stay
 No 282 (28.4)
 Yes 712 (71.6)
Diabetic status
 No 831 (83.7)
 Yes 162 (16.3)
Malignancy status
 No 883 (88.8)
 Yes 111 (11.2)
AKI stage
 Stage 1, 2 359 (36.1)
 Stage 3 635 (63.9)
Comorbidities
 No 724 (72.9)
 Yes 269 (27.1)
Hypertensive status
 No 656 (66.1)
 Yes 337 (33.9)
Coronary artery disease
 No 928 (94.3)
 Yes 56 (5.7)
Vasopressors requirement
 No 525 (52.8)
 Yes 469 (47.2)
Usage of a ventilator
 No 477 (48)
 Yes 517 (52)
Dialysis
 No 496 (50.6)
 Yes 484 (49.4)
Contrast intake
 No 743 (74.7)
 Yes 249 (25.1)
Major surgery
 No 656 (79.2)
 Yes 172 (20.8)
 Others 69 (6.9)
Mortality
 Survived 474 (47.7)
 Death 520 (52.3)
Hemoglobin
 Anemic 193 (19.5)
 Non-anemic 796 (80.5)
Total platelet count
 Thrombocytopenia 989 (99.7)
 Normal 3 (0.3)
Albumin
 Non-normal 703 (72.5)
 Normal 267 (27.5)
No. of days in the hospital
 ≤1 month 969 (97.8)
 >1 month 22 (2.2)
ALP
 Normal 530 (55)
 High ALP 433 (45)

AKI: Acute kidney injury, CAAKI: Community-acquired AKI, HAAKI: Hospital-acquired AKI, ALP: Alkaline phosphatase

Table 2: Association of sociodemographic & clinical characteristics with mortality
Category  Survived, n (%)  Died, n (%)  p value
Sex
 Female 166 (49.8) 167 (50.2) 0.342
 Male 307 (46.7) 351 (53.3)
AKI type
 CAAKI 267 (47.3) 297 (52.7) 0.776
 HAAKI 207 (48.3) 222 (51.7)
ICU stay
 No 233 (82.6) 49 (17.4) <0.001
 Yes 241 (33.8) 471 (66.2)
Diabetic status
 No 408 (49.1) 423 (50.9) 0.036
 Yes 65 (40.1) 97 (59.9)
Malignancy status
 No 442 (50.1) 441 (49.9) <0.001
 Yes 32 (28.8) 79 (71.2)
AKI stage
 Stage 1,2 195 (54.3) 164 (45.7) 0.002
 Stage 3 279 (43.9) 356 (56.1)
Comorbidities
 No 342 (47.2) 382 (52.8) 0.682
 Yes 131 (48.7) 138 (51.3)
Hypertensive status
 No 351 (53.5) 305 (46.5) <0.001
 Yes 122 (36.2) 215 (63.8)
Coronary artery disease
 No 439 (47.3) 489 (52.7) 0.898
 Yes 26 (46.4) 30 (53.6)
Vasopressors requirement
 No 385 (73.3) 140 (26.7) <0.001
 Yes 89 (19.0) 380 (81.0)
Usage of ventilator
 No 326 (68.3) 151 (31.7) <0.001
 Yes 148 (28.6) 369 (71.4)
Dialysis
 No 296 (59.7) 200 (40.3) <0.001
 Yes 173 (35.7) 311 (64.3)
Contrast
 No 405 (54.5) 338 (45.5) <0.001
 Yes 69 (27.7) 180 (72.3)
Major surgery
 No 335 (51.1) 321 (48.9) 0.603
 Yes 84 (48.8) 88 (51.2)
Hemoglobin
 Anemic 85 (44) 108 (56) 0.253
 Non anemic 387 (48.6) 409 (51.4)
Total platelet count
 Thrombocytopenia 474 (47.9) 515 (52.1) 0.251
 Normal 0 (0) 3 (100)
Albumin
 Abnormal 325 (46.2) 378 (53.8) 0.256
 Normal 137 (51.3) 130 (48.7)
ALP
 Normal 266 (50.2) 264 (49.8) 0.03
 High 187 (43.2) 246 (56.8)
No. of days in the hospital
 ≤1 month 461 (47.6) 508 (52.4) 0.517
 >1 month 12 (54.5) 10 (45.5)

AKI: Acute kidney injury, ICU: Intensive care unit, ALP: Alkaline phosphatase, CAAKI: Community-acquired AKI, HAAKI: Hospital-acquired AKI
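To illustrate how the chi-square test and Cramér’s V relate to the counts in Table 2, the Python sketch below works through the vasopressor requirement row. It assumes the Pearson chi-square statistic without continuity correction; whether the original analysis applied a correction is not stated, so the numbers are indicative only. The function names are assumptions for the example.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table
    laid out as [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def cramers_v_2x2(a, b, c, d):
    """Cramer's V for a 2x2 table: sqrt(chi2 / n), since min(rows, cols) - 1 = 1."""
    n = a + b + c + d
    return math.sqrt(chi_square_2x2(a, b, c, d) / n)

# Vasopressors requirement in Table 2: (survived, died) for No and for Yes.
chi2 = chi_square_2x2(385, 140, 89, 380)
v = cramers_v_2x2(385, 140, 89, 380)

CRITICAL_VALUE_DF1 = 3.841  # chi-square critical value at the 5% level, 1 df
print(round(chi2, 1), round(v, 2), chi2 > CRITICAL_VALUE_DF1)
```

A statistic far above the critical value is consistent with the p<0.001 reported for this variable, and a V of roughly 0.5 indicates a strong association for a 2x2 table.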

The results of the MBLR model are given in Table 3. Among the variables incorporated into the multiple logistic regression model, ICU stay, vasopressor requirement, and contrast intake were the strongest independent predictors (p<0.01); malignancy status (p=0.031) and dialysis (p=0.020) also had p-values below 0.05 in Table 3.

Table 3: Multiple binary logistic regression analysis of the association between clinical characteristics and mortality
Variable  Adjusted odds ratio  95% CI lower  95% CI upper  Wald  p value
ICU stay 5.89 3.16 11.30 5.48 <0.01
Diabetic status 1.61 0.93 2.81 1.71 0.087
Malignancy status 0.46 0.23 0.92 -2.15 0.031
AKI stage 0.78 0.52 1.38 -0.9 0.36
Hypertensive status 1.12 0.63 2.01 0.39 0.69
Vasopressors requirement 8.10 0.52 1.48 9.44 <0.001
Usage of ventilator 0.87 0.52 1.48 -0.48 0.69
Dialysis 0.53 0.31 0.90 -2.30 0.020
Contrast intake 6.10 2.99 12.91 4.87 <0.001
Alkaline phosphatase 1.37 0.90 2.07 1.48 0.13
AKI type 0.84 0.52 1.38 -0.65 0.513

AKI: Acute kidney injury, ALP: Alkaline phosphatase, CI: Confidence interval
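The columns of Table 3 are linked: the regression coefficient is the natural log of the adjusted odds ratio, and the Wald statistic is the coefficient divided by its standard error. The Python sketch below illustrates this for the ICU stay row, under the assumption that the interval is a Wald-type 95% confidence interval (so that the standard error can be recovered from its width). The function names are assumptions for the example.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval

def coefficient_from_or(odds_ratio):
    """Regression coefficient (log-odds scale) from an odds ratio."""
    return math.log(odds_ratio)

def se_from_ci(lower, upper):
    """Standard error implied by a Wald-type 95% CI on the odds-ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)

# ICU stay row of Table 3: adjusted OR 5.89, 95% CI 3.16 to 11.30.
beta = coefficient_from_or(5.89)
se = se_from_ci(3.16, 11.30)
wald_z = beta / se

print(round(beta, 2), round(se, 2), round(wald_z, 2))
```

The resulting Wald statistic is roughly 5.5, in line with the 5.48 reported for ICU stay, with the small difference attributable to rounding of the published values.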

The DT was built using the CART algorithm. The nodes in the DT were split based on the Gini impurity index, entropy, and information gain. The details of the impurity measures are shown in Table 4.

Table 4: Gini impurity index, entropy & information gain
Variables Gini impurity index Entropy Information Gain
Vasopressors requirement 0.346 0.773 0.226
ICU stay 0.402 0.850 0.148
Usage of ventilator 0.421 0.881 0.117
Dialysis 0.421 0.962 0.037
Contrast intake 0.472 0.958 0.040
Hypertensive status 0.486 0.979 0.020
Malignancy status 0.490 0.985 0.013
AKI stage 0.494 0.991 0.007
ALP 0.496 0.994 0.004
Diabetes status 0.497 0.995 0.003
AKI type 0.499 0.998 <0.001

AKI: Acute kidney injury, ALP: Alkaline phosphatase

Table 4 shows the Gini impurity index, entropy, and information gain values for the candidate variables. Vasopressor requirement has the lowest Gini impurity index (0.346) and entropy (0.773), and the highest information gain (0.226), suggesting it is the most important variable for predicting the outcome. AKI type has the highest Gini impurity index (0.499) and entropy (0.998), indicating it has the least impact on the outcome. The DT structure for the training data is shown in Figure 1.

Figure 1: Decision tree structure for the training dataset.

The neural network architecture selected for the current study had four nodes in the hidden layer and two nodes in the output layer to code the dependent variable, mortality.

The network diagram used to predict mortality from the predictors is shown in Figure 2. The diagram shows the 11 input nodes, four hidden nodes, and two output nodes representing mortality.

Figure 2: Artificial Neural Network structure.

For building the RF model, the mean decrease in Gini and the mean decrease in accuracy were used to rank the input variables in the outcome prediction model. The number of DTs was set at 1000.

The feature selection results of the RF algorithm are based on the mean decrease in Gini. Vasopressor requirement had the highest mean decrease in Gini, followed by ICU stay, whereas alkaline phosphatase and malignancy status had the lowest.

The error plot for the RF model in Figure 3 shows the out-of-bag error rate, overall and for each class, over varying numbers of trees. It indicates that the lowest error occurs around 100 trees for the given data, with red indicating “Survived” and green indicating “Dead.” The predictive accuracy, sensitivity, specificity, and AUC of LR, DT, RF, and ANN for both the training and testing datasets are compared in Table 5.

Figure 3: Error plot for random forest model.

Table 5: Accuracy measures (with confidence intervals) for the training and testing datasets
Training dataset
Model  Accuracy  AUC  Sensitivity  Specificity
LR 0.79 (0.75,0.82) 0.80 (0.77,0.82) 0.77 (0.73,0.80) 0.82 (0.80,0.83)
DT 0.86 (0.83,0.88) 0.87 (0.85,0.89) 0.82 (0.80,0.83) 0.91 (0.90,0.92)
RF 0.86 (0.83,0.88) 0.86 (0.82,0.89) 0.86 (0.85,0.87) 0.86 (0.85,0.87)
ANN 0.80 (0.75,0.82) 0.80 (0.76,0.83) 0.78 (0.76,0.79) 0.82 (0.80,0.83)
Testing dataset
Model  Accuracy  AUC  Sensitivity  Specificity
LR 0.80 (0.75,0.84) 0.80 (0.75,0.84) 0.82 (0.80,0.83) 0.78 (0.76,0.79)
DT 0.77 (0.72,0.81) 0.77 (0.71,0.82) 0.79 (0.77,0.80) 0.75 (0.73,0.76)
RF 0.78 (0.74,0.82) 0.79 (0.73,0.84) 0.82 (0.80,0.83) 0.72 (0.70,0.73)
ANN 0.80 (0.77,0.83) 0.80 (0.75,0.84) 0.85 (0.83,0.86) 0.75 (0.73,0.76)

LR: Logistic regression, DT: Decision tree, RF: Random forest, ANN: Artificial neural network
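The four accuracy measures in Table 5 can be computed from observed outcomes and predicted scores as in the following Python sketch. The labels and scores are a small made-up example, not study data; the 0.5 classification cutoff and the function names are assumptions, and AUC is computed here by the pairwise (Mann-Whitney) definition.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity, treating 1 as death, 0 as survival."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # deaths correctly identified
        "specificity": tn / (tn + fp),  # survivors correctly identified
    }

def auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up outcomes and predicted probabilities for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

metrics = classification_metrics(y_true, y_pred)
print(metrics, auc(y_true, scores))
```

Computing the same measures separately on the training and testing sets, as in Table 5, is what exposes overfitting: a model whose training values are clearly higher than its testing values has learned patterns that do not carry over to new patients.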

Discussion

This study was conducted to compare the predictive accuracy of supervised machine learning models (DT, RF, ANN) with LR for mortality in patients diagnosed with AKI. The findings of the univariate analysis corroborate previous studies indicating that higher serum creatinine levels and the need for vasopressors are key contributors to mortality risk in patients with AKI.20,21 The study also shows that a straightforward yes/no assessment of ICU stay, vasopressor use, and contrast administration predicts mortality in patients with AKI as effectively as more complex methods.

Table 5 shows that, in the training dataset, the DT and RF models achieved notable accuracy (86%) and strong AUC values of 0.87 (CI 0.85-0.89) and 0.86 (CI 0.82-0.89), respectively. However, their performance declined in the testing dataset, with accuracy dropping to 77% and 78%, respectively, suggesting overfitting. LR and ANN demonstrated stable accuracy and AUC (0.80) across both datasets, with competitive sensitivity and specificity during training and testing, which highlights their generalizability and reliability in clinical applications.

In clinical practice, predicting the risk of mortality and its associated factors among patients diagnosed with AKI is critical for guiding treatment decisions and improving outcomes. MBLR models estimate this risk based on patient data. DTs make it easy to identify the factors increasing risk, such as needing vasopressors or prolonged ICU stay. RFs improve accuracy by combining multiple DTs, with results showing that using about 200 trees gives the best balance of performance and efficiency. ANNs analyze complex patterns in factors like diabetes, cancer, and AKI stage. Using ReLU activation and a sigmoid output, ANNs provide a probability score for death risk. These models help doctors identify high-risk patients and improve treatment plans.

This study has limitations such as the overfitting observed in DT and RF models, which poses a challenge in applying machine learning to clinical data. Additionally, key biomarkers such as TIMP-2 and IGFBP-7, now considered valuable for AKI prognosis, were not included, as they were not available during the study period (2013–2021). Future studies should integrate these biomarkers to enhance predictive accuracy and clinical decision-making.

This study mainly evaluated accuracy measures (AUC, sensitivity, specificity). Future work should include cross-validation, calibration, and larger multi-center datasets to make the models more reliable, generalizable, and useful for AKI risk stratification. Given the increasing role of machine learning in healthcare, future research should focus on optimizing feature selection, hyperparameter tuning, and ensemble techniques to further refine predictive models.

This study underscores the importance of selecting predictive models that balance accuracy, interpretability, and generalizability. While DT and RF excel in capturing complex patterns in training, their overfitting limits real-world applicability. LR and ANN, with their stable and consistent performance, remain the most reliable models for predicting mortality in patients with AKI.

Future research should focus on expanding datasets, integrating novel biomarkers, and validating models with external datasets. Until then, LR and ANNs remain the most dependable models for predicting AKI mortality, offering both interpretability and predictive accuracy.

In conclusion, this study demonstrates that while DT and RF models showed strong predictive performance in training data, their decline in testing accuracy suggests overfitting, which limits their clinical utility. LR and ANN, on the other hand, maintained consistent and reliable performance, making them more suitable for real-world clinical applications.

Conflicts of interest

There are no conflicts of interest.

References

  1. Acute kidney injury: Definition, pathophysiology and clinical phenotypes. Clin Biochem Rev. 2016;37:85-98.
  2. Incidence and epidemiology of acute kidney injury in a pediatric Malawian trauma cohort: A prospective observational study. BMC Nephrol. 2020;21:98.
  3. Acute kidney injury: Prevalence and outcomes in Southern Indian population. JCDR. 2018.
  4. AKI in the ICU: Definition, epidemiology, risk stratification, and outcomes. Kidney Int. 2012;81:819-25.
  5. The rise of artificial intelligence in healthcare applications. In: Artificial intelligence in healthcare. Elsevier. p. 25-60.
  6. Interpretable machine learning models for early prediction of acute kidney injury after cardiac surgery. BMC Nephrol. 2023;24:326.
  7. Machine learning model for predicting acute kidney injury progression in critically ill patients. BMC Med Inform Decis Mak. 2022;22:17.
  8. Machine learning for the prediction of acute kidney injury in patients with sepsis. J Transl Med. 2022;20:215.
  9. Predictive modeling for acute kidney injury after percutaneous coronary intervention in patients with acute coronary syndrome: A machine learning approach. European Journal of Medical Research. 2024;29:76.
  10. KDIGO 2012 clinical practice guideline for the evaluation and management of chronic kidney disease. Kidney Int. 2013;3:5-14.
  11. Applied logistic regression. John Wiley & Sons.
  12. Classification and regression trees. Routledge.
  13. Decision trees for business intelligence and data mining. Boca Raton, FL: CRC Press.
  14. Application of random forest algorithm in short-term wind power prediction. 2018 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC). 2018:1-4.
  15. A review on the applications of random forest in bioinformatics. Current Bioinformatics. 2018;13:3-12.
  16. Rashid S, Raza M, Farooq MU. Impact of data preprocessing techniques on the performance of random forest classifier. 2019 International Conference on Electrical, Communication, and Computer Engineering (ICECCE). 2019:1-5.
  17. Deep learning. MIT Press.
  18. The perceptron: A probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65:386-408.
  19. A review on artificial neural networks and its’ Applicability. BJMSR. 2020;2:48-51.
  20. Risk factors, clinical features and outcome of new-onset acute kidney injury among critically ill patients: A database analysis based on prospective cohort study. BMC Nephrol. 2021;22:289.
  21. Risk factors and 180-day mortality of acute kidney disease in critically ill patients: A multi-institutional study. Front Med (Lausanne). 2023;10:1153670.