
A Method for the Definition of Immunological Non-Response to Antiretroviral Therapy Based on Review Analysis and Supervised Classification Model
Journal of Antivirals & Antiretrovirals

Open Access

ISSN: 1948-5964

Research Article - (2022)


Yong Shuai1,2*, Hemeng Peng1,4, Xiaodong Wang1,3 and Xiaoqing Peng1,3
 
*Correspondence: Yong Shuai, Chongqing CEPREI Industrial Technology Research Institute Co., Ltd, Chongqing, 401332, China, Tel: +086 156 8362 2221, Email:


Abstract

Background: Immunological Non-Response (INR) accelerates the progression of AIDS and poses serious difficulties for the treatment of people infected with HIV-1. The current definition of INR lacks a credible consensus, which hampers the diagnosis, treatment and scientific research of INR.

Methods: We systematically analyzed openly available INR-related references and used visualization techniques and machine learning classification models to propose the features, models and criteria for defining INR.

Results: We summarized points of consensus on the definition of INR. Among the features used to define INR, the CD4+ T-cell absolute number and ART time were the best features. Supervised learning classification models defined INR with high accuracy, and the Support Vector Machine (SVM) had the highest accuracy among the commonly used supervised classification models. Based on the supervised learning model and visualization techniques, we proposed criteria that could help reach a consensus on the definition of INR.

Conclusion: This study provided a consensus, features, a model and criteria for defining INR.

Keywords

Immunological non-response; Definition; Review analysis; Visualization; Supervised learning classification model

Introduction

After Human Immunodeficiency Virus type 1 (HIV-1) enters the human body, it reduces the number of CD4+ T lymphocytes (abbreviated as CD4+ T-cells), gradually exhausts CD4+ T-cells, and destroys the body's immune function. With effective combined Antiretroviral Therapy (cART), most People Living with HIV (PLWH) achieve virological suppression, their CD4+ T-cell count increases significantly, and their immune function gradually recovers.

However, about 9%-45% of PLWH do not recover their CD4+ T-cell count even though they have reached the standard of virological suppression; that is, immunological non-response has occurred. These PLWH are called PLWH with poor immune reconstitution, or Immunological Non-Responders (INRs) [1]. In contrast, Immunological Responders (IRs) achieve both virological suppression and a return of the CD4+ T-cell count to normal. Because INR increases the morbidity and mortality of AIDS-Defining diseases (AD) and Non-AIDS-Defining diseases (NAD), INR has become a focus of current HIV-related research, covering the definition, mechanism and treatment of INR [2-6].

Because researchers understand INR differently, there is no consensus on its definition. The features used to define INR include the CD4+ T-cell count absolute number, CD4+ T-cell count increase, CD4+ T-cell count growth rate, CD4/CD8 ratio, time receiving effective cART, and Virologic Suppression (VS) time, and the thresholds applied to each feature vary widely. For example, thresholds for the CD4+ T-cell count absolute number include 200, 250, 300 and 500, and ART time thresholds include 6 months, 1 year, 2 years, 5 years, 10 years, etc. Some references also leave intervals undefined. For example, reference [7] defined INR as a CD4+ T-cell count absolute number <200 and IR as a CD4+ T-cell count absolute number >500, leaving patients with values between 200 and 500 undefined.

The lack of a consensus standard for the definition of INR adversely affects both scientific research and clinical diagnosis. In scientific research, differing definitions of INR make similar results hard to understand and compare, which undermines their credibility and reliability. In clinical diagnosis, most doctors judge whether a patient is an INR based on guidelines [8] and their own understanding of INR, using similar indicators or the nearest time point. For example, when a patient visits a doctor 13 months and 12 days after starting ART, the CD4+ T-cell count absolute number and the conversion rate given for the 12th month in the guideline are generally used to decide whether the patient is an INR. This approach to non-standard time points is not accurate enough and may lead to misdiagnosis or overtreatment.

To solve these problems, we first identified the best features for defining INR through a systematic analysis of INR-related references and visualization techniques. We then trained supervised learning classification models and obtained the optimal one. Finally, based on the best supervised learning classification model, we proposed reference criteria for defining INR to assist doctors and researchers in diagnosis, treatment and research.

Methodology

Study design

Data source and definition:

Reference data: Reference data were obtained from https://pubmed.ncbi.nlm.nih.gov/ and https://www.cnki.net/. The search keywords were taken from Table 1 of reference [2]. The search language on https://pubmed.ncbi.nlm.nih.gov was English, and the search languages on https://www.cnki.net/ were Chinese and English.

Characteristic | INRs (n=459) | IRs (n=192) | Total (N=651) | p
Age (years) | | | | 0.808
  Mean | 50.27 | 50.57 | 50.36 |
  Median (range) | 51.0 (39-60.6) | 51 (37-64) | 51 (38-62) |
Sex, n (%) | | | | <0.001
  Male | 368 (80.17%) | 159 (82.81%) | 527 (80.95%) |
  Female | 91 (19.83%) | 33 (17.19%) | 124 (19.05%) |
Last CD4+ T-cell absolute number (cells/µL) | | | | 0.435
  Mean | 275.61 | 566.17 | 361.31 |
  Median (range) | 273 (189.5-362) | 564 (427-660.5) | 329 (229-462.5) |
ART time (months) | | | | <0.001
  Mean | 48.61 | 38.48 | 45.62 |
  Median (range) | 46.9 (24.85-71.25) | 29.35 (16.3-61) | 41.9 (21.45-69.55) |

Table 1: Basic information of the data set.

Training data of the classification model: Among all the INR-definition-related references [1-3,6,7,9-136] retrieved in this paper, only reference [59] provided openly available original INR data. The original data of the other references could not be obtained because the data source website could not be opened, the corresponding author had to be contacted, or only statistically analyzed data were provided. To ensure the credibility of the supervised learning classification results, we also analyzed related data from the electronic medical record database of Chongqing Public Health Medical Center.

Definition of INR and IR in the data source: For the open source data of reference [59], we used the original data and its INR/IR assignments. For the data in the electronic medical record database, we assumed that INRs were patients who had received INR intervention methods (including Thymalfasin, Thymopentin, Recombinant Human Growth Hormone, Aikeqing Capsule, Peiyuan Capsule, Tang Herb Tablets, Mushroom Polysaccharides, etc.) [5] or who were recorded as INR in their case records; all other PLWH were defined as IRs. Both INRs and IRs had reached the standard of virological suppression. We combined the open source data and the electronic medical record data into one data set, whose basic information is shown in Table 1.
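The labeling rule for the electronic medical record data can be written as a small function. This is a minimal sketch, not the authors' code. The record fields (`virally_suppressed`, `medications`, `recorded_inr`) and the function name `label_patient` are hypothetical. The intervention set contains only the drugs named above, whereas the original list ends with "etc.". The rule never looks at the CD4+ T-cell count, which is consistent with the pessimistic-principle assumption stated in the Classification result section.

```python
# Interventions named in the text; the paper's list ends with "etc.", so this set is incomplete.
INR_INTERVENTIONS = {
    "Thymalfasin",
    "Thymopentin",
    "Recombinant Human Growth Hormone",
    "Aikeqing Capsule",
    "Peiyuan Capsule",
    "Tang Herb Tablets",
    "Mushroom Polysaccharides",
}


def label_patient(record):
    """Return "INR", "IR", or None for a hypothetical EMR record (a dict).

    Assumed keys: 'virally_suppressed' (bool), 'medications' (iterable of str),
    'recorded_inr' (bool). Patients without virological suppression are
    excluded (None), because both groups in the data set were suppressed.
    """
    if not record["virally_suppressed"]:
        return None
    received_intervention = bool(INR_INTERVENTIONS & set(record["medications"]))
    if received_intervention or record["recorded_inr"]:
        return "INR"
    return "IR"


# Usage with a made-up record (not study data):
example = {"virally_suppressed": True, "medications": ["Thymalfasin"], "recorded_inr": False}
print(label_patient(example))  # INR
```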

First, through reference analysis, we identified the features associated with the INR definition and visualized the relationship between these features and INR. Second, we proposed hypotheses and used supervised learning models to classify INR and IR, applying cross-validation and grid search during modeling to prevent overfitting and to obtain the best INR evaluation model and its parameters. Finally, we proposed criteria for the definition of INR based on the best supervised learning model. The flowchart of this paper is shown in Figure 1.
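Grid search scores every combination of candidate hyperparameters and keeps the best one. The sketch below is a generic pure-Python illustration, not the authors' code. In practice `score` would return the mean cross-validation score for one parameter combination; scikit-learn's `GridSearchCV` performs the same exhaustive search with cross-validation built in. The toy scoring function in the usage example is hypothetical.

```python
from itertools import product


def grid_search(param_grid, score):
    """Exhaustively evaluate every parameter combination in `param_grid`.

    param_grid: dict mapping parameter name -> list of candidate values.
    score: callable taking a dict of parameters and returning a number
           (higher is better), e.g. a mean cross-validation score.
    Returns (best_params, best_score); the first combination wins ties.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Usage with a toy score that peaks at C=100, gamma=10 (illustration only):
grid = {"C": [1, 10, 100], "gamma": [0.1, 1, 10]}
toy_score = lambda p: -abs(p["C"] - 100) - abs(p["gamma"] - 10)
print(grid_search(grid, toy_score))  # ({'C': 100, 'gamma': 10}, 0)
```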


Figure 1: Flowchart of this paper.

To help other researchers rebuild the models in this paper and carry out more in-depth research, we have made the entire modeling process and source code openly available. The source code is available in the supplement.

Results

Analysis of features related to the definition of INR

The references analyzed in this paper were all published INR-related literature. We used the method of reference [2] to systematically analyze them and extracted each paper's definition standard for INR. From this analysis, we summarized the following consensus on the definition of INR:

1. The HIV-1 antibody test of the patient was positive

2. The patient has received ART for more than 6 months

3. The patient has achieved virologic suppression or reached the virologic suppression standard in the patient's area

4. The CD4+ T-cell absolute number of the patient failed to return to normal level

We also extracted the features related to the definition of INR; their visualization map is shown in Figure 2. As the figure shows, the features related to the definition of INR included the CD4+ T-cell count absolute number, CD4+ T-cell count change number, CD4+ T-cell count growth rate, CD4/CD8 ratio, ART time and Virologic Suppression (VS) time.


Figure 2: Visualization map of the features related to INR definition.

Features selection

We grouped the six features found in the previous section into two categories: medical test features (CD4+ T-cell count absolute number, CD4+ T-cell count change number, CD4+ T-cell count growth rate and CD4/CD8 ratio) and time features (ART time and VS time). The frequency with which each feature was used to define INR is shown in Table 2.

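Category | Feature | Number of references (n) | Percentage (%)
All references | - | 133 | 100
Medical test feature | CD4+ T-cell count absolute number | 115 | 86.47
Medical test feature | CD4+ T-cell count change number | 25 | 18.80
Medical test feature | CD4+ T-cell count growth rate | 13 | 9.77
Medical test feature | CD4/CD8 ratio | 4 | 3.01
Time feature | ART time | 102 | 76.69
Time feature | VS time | 51 | 38.35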

Table 2: INR definition related features and the usage frequency of these features.

As Table 2 shows, the CD4+ T-cell count absolute number was the most frequently used medical test feature, and ART time was the most frequently used time feature.
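Each percentage in Table 2 is a feature's reference count divided by the 133 references [1-3,6,7,9-136]. The short check below recomputes the percentages. It also shows that the counts add up to more than 133, because many references combined several features in one definition. The helper `count_references` is introduced only for this check.

```python
def count_references(ranges):
    """Count citation numbers in a string such as "1-3,6,7,9-136"."""
    total = 0
    for part in ranges.split(","):
        start, _, end = part.partition("-")
        total += int(end or start) - int(start) + 1
    return total


TOTAL = count_references("1-3,6,7,9-136")  # 3 + 2 + 128 = 133

usage = {  # feature -> number of references using it (Table 2)
    "CD4+ T-cell count absolute number": 115,
    "CD4+ T-cell count change number": 25,
    "CD4+ T-cell count growth rate": 13,
    "CD4/CD8 ratio": 4,
    "ART time": 102,
    "VS time": 51,
}

for feature, n in usage.items():
    print(f"{feature}: {100 * n / TOTAL:.2f}%")

# Feature uses overlap: 310 uses across 133 references.
print(sum(usage.values()), TOTAL)
```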

Among the medical test features, the CD4+ T-cell count absolute number can be obtained at every hospital visit. It does not need to be compared with previous test values, such as the baseline CD4+ T-cell count. By contrast, the CD4+ T-cell count change number and growth rate both require a comparison value. Compared with the CD4/CD8 ratio, the CD4+ T-cell count absolute number has been used to define INR by far more researchers (115 versus 4 references, Table 2). Therefore, the CD4+ T-cell count absolute number was the optimal medical test feature for defining INR.

Among the time features, ART time can be calculated from the date on which the patient started ART; it is easy to calculate, and its calculation standard is uniform.
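ART time needs only the ART start date and the visit date, so it can be computed mechanically. The paper does not state how fractional months are counted. The sketch below assumes elapsed days divided by the average Gregorian month length (365.25 / 12 = 30.4375 days). The helper `art_time_months` is hypothetical. Under this convention, the visit described in the Introduction, 13 months and 12 days after starting ART, comes out at about 13.4 months.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # average Gregorian month (assumption)


def art_time_months(art_start, visit):
    """ART time in (fractional) months between the ART start date and a visit."""
    if visit < art_start:
        raise ValueError("visit date precedes ART start")
    return (visit - art_start).days / DAYS_PER_MONTH


# 13 months and 12 days after a hypothetical ART start on 1 January 2021:
months = art_time_months(date(2021, 1, 1), date(2022, 2, 13))
print(round(months, 2))  # 13.4
```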

Obtaining VS time, by contrast, requires measuring the HIV RNA viral load, and current standards for virologic suppression are not uniform, as shown in Table 3. It also requires the patient to return to the hospital for testing after starting ART. If a patient reaches the VS standard but is not tested at that time, the VS time cannot be determined accurately. Therefore, ART time was the optimal time feature for defining INR.

No | HIV-1 RNA viral load threshold (copies/mL) | References | Sum
1 | 20 | 13, 20, 50, 88 | 4
2 | 40 | 37, 83, 93, 106 | 4
3 | 40 to 75 | 102 | 1
4 | 48 | 100 | 1
5 | 50 | 7, 15, 19, 21, 22, 31, 34, 36, 39, 43-45, 48, 49, 52-55, 61, 69, 74, 77, 92, 95, 97, 98, 103, 105, 108, 130, 134, 136 | 32
6 | 75 | 29 | 1
7 | 200 | 94 | 1
8 | 400 | 14, 63, 135 | 3
9 | 500 | 136 | 1
10 | 1000 | 24, 60 | 2

Table 3: VS Standards of HIV-1 RNA Viral load.
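The sketch below checks one viral load against each distinct single-value threshold in Table 3; the "40 to 75" range is omitted. It illustrates why VS time is hard to standardize. It assumes suppression means a load strictly below the threshold, which the references may not all intend. The helper `suppressed_under` is hypothetical. A load of 100 copies/mL counts as suppressed under 4 of the 9 standards and as unsuppressed under the other 5. The same patient's VS time can therefore depend on which reference's standard is applied.

```python
# Distinct single-value VS thresholds from Table 3 (copies/mL).
VS_THRESHOLDS = [20, 40, 48, 50, 75, 200, 400, 500, 1000]


def suppressed_under(viral_load):
    """Map each threshold to whether `viral_load` meets it (assumes strict '<')."""
    return {threshold: viral_load < threshold for threshold in VS_THRESHOLDS}


result = suppressed_under(100)
met = [t for t, ok in result.items() if ok]
print(met)  # [200, 400, 500, 1000]
```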

Based on the above analysis, we chose the CD4+ T-cell count absolute number and ART time as the features for defining INR. By selecting the references that used only these two features, we aimed to uncover the relationship between the definitions of INR and Immunological Response (IR) through visualization. The definitions of INR and IR and their relationship are shown in Table 4 and Figure 3.


Figure 3: Relationship between INR and IR displayed by CD4+ T-cell count absolute number and ART time from References.
Note: In Figure 3, blue indicates overlapping points, where the same CD4+ T-cell count absolute number and ART time values were used to define both INR and IR. The larger the point, the more references used that definition.

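No | INR: CD4+ T-cell count (cells/µL) | INR: ART time (months) | INR: Sum | INR: References | IR: CD4+ T-cell count (cells/µL) | IR: ART time (months) | IR: Sum | IR: References | Number of overlaps
1 | 200 | 6 | 2 | 16, 61 | 200 | 6 | 1 | 61 | 3
2 | 200 | 12 | 4 | 6, 17, 64, 135 | 250 | 12 | 2 | 14, 64 | 6
3 | 200 | 24 | 10 | 7, 13, 15, 18-21, 68, 70, 88 | - | - | - | - | -
4 | 200 | 48 | 1 | 11 | - | - | - | - | -
5 | 200 | 60 | 1 | 67 | - | - | - | - | -
6 | 250 | 12 | 2 | 14, 22 | - | - | - | - | -
7 | 250 | 24 | 7 | 23-27, 35, 86 | 250 | 24 | 5 | 24, 25, 26, 27, 35 | 12
8 | 250 | 36 | 1 | 28 | 250 | 36 | 1 | 28 | -
9 | 350 | 6 | 1 | 29 | - | - | - | - | -
10 | 350 | 9 | 2 | 30, 31 | - | - | - | - | -
11 | 350 | 12 | 8 | 12, 32-34, 62, 81, 87, 109 | 350 | 12 | 2 | 17, 32 | 10
12 | 350 | 24 | 16 | 1, 9, 36-46, 51, 59, 69 | 350 | 24 | 6 | 9, 36, 37, 51, 59, 69 | 22
13 | 350 | 48 | 2 | 47, 60 | 350 | 48 | 2 | 11, 47 | 4
14 | 350 | 120 | 1 | 48 | 350 | 120 | 1 | 48 | 2
15 | 400 | 12 | 1 | 49 | 400 | 12 | 1 | 33 | 2
16 | 400 | 24 | 1 | 50 | 400 | 24 | 1 | 39 | 2
17 | 490 | 12 | 1 | 55 | 490 | 12 | 1 | 55 | 2
18 | 500 | 12 | 3 | 52, 53, 65 | 500 | 12 | 4 | 6, 22, 62, 135 | 7
19 | 500 | 48 | 3 | 10, 73, 80 | 500 | 48 | 2 | 73, 80 | 5
20 | 500 | 60 | 1 | 54 | 500 | 60 | 2 | 54, 67 | 3
- | - | - | - | - | 500 | 24 | 11 | 7, 15, 18, 19, 20, 38, 41, 42, 43, 44, 45 | -
- | - | - | - | - | 600 | 24 | 1 | 50 | -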

Table 4: Relationship to define INR and IR from references by CD4+ T-cell count absolute number and the ART time.

Figure 3 shows that, because different references understood INR differently, the definitions of INR and IR overlap. For example, at ART time = 24 months, the overlapping CD4+ T-cell count absolute numbers included 250, 350 and 400 cells/µL.

In order to show the relationship more clearly, we processed the definition of INR and IR in the references in the following way:

1. Deleted all symbols in the definition, including >, ≥, <, ≤.

2. For the definition of INR, if the same ART time corresponded to multiple CD4+ T-cell count absolute number, the lowest value was used.

3. For the definition of IR, if the same ART time corresponded to multiple CD4+ T-cell count absolute number, the highest value was used.

4. For a definition given as a range, the minimum value of the range was taken. For example, if the ART time range was 6-12 months, the ART time was taken as 6 months.

5. When INR and IR definitions gave CD4+ T-cell count absolute numbers at the same ART time, the INR value usually carried < or ≤ and the IR value usually carried > or ≥. To make the difference between INR and IR visible, we displayed the CD4+ T-cell count absolute number of INR shifted by -20 and that of IR shifted by +20.

6. If the CD4+ T-cell count absolute number corresponding to a certain ART time was less than the value at the previous time point but greater than the value at the next time point, this point would be deleted.
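Rules 1 to 5 can be sketched in a few functions. This is our reading, not the authors' code. In particular, we assume rule 5 applies when INR and IR share both the ART time and the CD4+ T-cell value. Rule 6 is omitted to keep the sketch short. The function names are hypothetical. The usage example feeds in the INR and IR definitions from Table 4 at 6, 12 and 24 months.

```python
import re


def parse_value(text):
    """Rules 1 and 4: drop >, ≥, <, ≤ and keep the lower bound of a range like "6-12"."""
    cleaned = re.sub(r"[<>≤≥]", "", text)
    return min(float(part) for part in cleaned.split("-") if part.strip())


def collapse(pairs, pick):
    """Rules 2 and 3: keep one CD4 value per ART time (pick=min for INR, max for IR)."""
    grouped = {}
    for art_time, cd4 in pairs:
        grouped.setdefault(art_time, []).append(cd4)
    return {t: pick(values) for t, values in sorted(grouped.items())}


def offset_for_display(inr, ir, shift=20):
    """Rule 5 (our reading): separate coinciding INR/IR points by -shift / +shift."""
    inr_plot, ir_plot = dict(inr), dict(ir)
    for t in inr.keys() & ir.keys():
        if inr[t] == ir[t]:
            inr_plot[t] = inr[t] - shift
            ir_plot[t] = ir[t] + shift
    return inr_plot, ir_plot


# (ART time in months, CD4+ T-cell count) definitions taken from Table 4.
inr_pairs = [(6, 200), (6, 350), (12, 200), (12, 250), (12, 350), (12, 400),
             (12, 490), (12, 500), (24, 200), (24, 250), (24, 350), (24, 400)]
ir_pairs = [(6, 200), (12, 250), (12, 350), (12, 400), (12, 490), (12, 500),
            (24, 250), (24, 350), (24, 400), (24, 500), (24, 600)]

inr, ir = collapse(inr_pairs, min), collapse(ir_pairs, max)
print(offset_for_display(inr, ir))
# ({6: 180, 12: 200, 24: 200}, {6: 220, 12: 500, 24: 600})
```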

After this processing, the relationship between INR and IR as displayed by CD4+ T-cell count absolute number and ART time is shown in Figure 4. Figure 4 suggests that a line drawn through the CD4+ T-cell count absolute number and ART time may separate INR from IR. This line may be straight (the red line in Figure 4) or curved (the black line in Figure 4).


Figure 4: Relationship between INR and IR displayed by CD4+ T-cell count absolute number and ART time.

Classification result

Following the previous section, we converted the distinction between INR and IR into a supervised binary classification problem. To simplify modeling, we made the following assumptions:

1. There was a certain mathematical relationship between CD4+ T-cell count absolute number and ART time. The model established by this mathematical relationship can be used to classify INR and IR.

2. Every doctor's diagnosis of INR and medication for INR were scientifically sound and credible.

3. Considering the serious harm of INR to the patient's health, our definition of INR followed the pessimistic principle in management [137]: when a patient received INR intervention treatment but their CD4+ T-cell count was within the normal range, we still defined the patient as an INR.

Based on these assumptions, we trained typical supervised learning classification algorithms to obtain a model that can accurately identify INR. The models included K-Nearest Neighbor (KNN), Least Absolute Shrinkage and Selection Operator (Lasso), Ridge Regression, Support Vector Machine (SVM), Decision Tree (DT), Gradient Boosting Classifier (GBC), Logistic Regression (LR) and Multilayer Perceptron (MLP). We used the cross-validation score (cross_val_score) to select the optimal classification model.

To avoid overfitting and obtain the optimal classification model, we adopted shuffle-split cross-validation, which controls the number of iterations independently of the sizes of the training and test sets. The training and validation sets were set to 50% and 30% of the data respectively, so that in each iteration part of the data did not participate in training. The detailed modeling process is shown in Part 2 of the supplement. The cross_val_score of each model and its optimal hyperparameters are shown in Table 5.
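Shuffle-split cross-validation draws a fresh random partition in every iteration, so the number of iterations is independent of the split proportions. The pure-Python sketch below is our illustration; scikit-learn's `ShuffleSplit` with `train_size=0.5, test_size=0.3` is the library counterpart. It shows that with 50% training and 30% validation, the remaining samples sit out each iteration entirely. The floor/ceil rounding and the fixed seed are assumptions.

```python
import math
import random


def shuffle_split(n_samples, n_splits=10, train_size=0.5, test_size=0.3, seed=0):
    """Yield (train_indices, validation_indices) for `n_splits` random splits.

    Indices in neither list are left out of that iteration entirely.
    """
    n_train = math.floor(train_size * n_samples)
    n_test = math.ceil(test_size * n_samples)
    if n_train + n_test > n_samples:
        raise ValueError("train_size + test_size exceeds the data set")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_splits):
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:n_train + n_test]


# With the 651 patients of Table 1, each split prints: 325 196 130
for train, valid in shuffle_split(651, n_splits=3):
    print(len(train), len(valid), 651 - len(train) - len(valid))
```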

No | Model name | Optimal hyperparameters | Cross_val_score
1 | KNN | 'algorithm': 'ball_tree', 'leaf_size': 10, 'n_neighbors': 2, 'metric': 'chebyshev', 'weights': 'distance' | 0.9855
2 | Lasso | 'alpha': 0.001, 'selection': 'cyclic', 'max_iter': 10000, 'tol': 0.0001 | 0.5949
3 | Ridge | 'alpha': 0.001, 'solver': 'cholesky', 'max_iter': 1000, 'tol': 1e-06 | 0.5986
4 | SVM | 'kernel': 'rbf', 'gamma': 10, 'C': 100, 'max_iter': 10000, 'tol': 0.0001 | 0.9911
5 | DT | 'criterion': 'entropy', 'splitter': 'best', 'max_depth': 5, 'min_samples_leaf': 5 | 0.9461
6 | GBC | 'learning_rate': 0.001, 'n_estimators': 90, 'max_depth': 5, 'min_samples_split': 800, 'min_samples_leaf': 60 | 0.7118
7 | LR | 'solver': 'liblinear', 'penalty': 'l1', 'C': 10, 'max_iter': 1000, 'tol': 0.01 | 0.9669
8 | MLP | 'hidden_layer_sizes': [20, 20], 'activation': 'identity', 'solver': 'lbfgs', 'alpha': 0.01, 'learning_rate': 'constant', 'max_iter': 10000, 'tol': 0.001 | 0.9675

Table 5: Cross-validation scores of the supervised learning models and the corresponding optimal hyperparameters.
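The best KNN configuration in Table 5 uses n_neighbors=2, metric='chebyshev' and weights='distance'. A minimal pure-Python version of that classifier makes these settings concrete. It is an illustration, not the authors' pipeline. The paper does not describe feature scaling, so min-max scaling of CD4+ T-cell count and ART time is our assumption; without it, the Chebyshev distance would be dominated by the CD4 values. The training points are invented for the example.

```python
from collections import defaultdict


def chebyshev(a, b):
    """Chebyshev (L-infinity) distance: the largest per-feature difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def min_max_bounds(X):
    """Per-feature (min, max) over the training rows (assumed preprocessing)."""
    return [(min(col), max(col)) for col in zip(*X)]


def scale(row, bounds):
    """Map each feature to [0, 1] using the training bounds."""
    return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                 for v, (lo, hi) in zip(row, bounds))


def knn_predict(train_X, train_y, query, k=2):
    """Distance-weighted k-NN vote with Chebyshev distance on min-max scaled features."""
    bounds = min_max_bounds(train_X)
    q = scale(query, bounds)
    nearest = sorted(
        ((chebyshev(scale(x, bounds), q), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        if dist == 0:
            return label  # query coincides with a training point: take its label
        votes[label] += 1.0 / dist  # closer neighbours get larger weights
    return max(votes, key=votes.get)


# Invented (CD4 cells/µL, ART months) points for illustration only, not study data.
train_X = [(150, 12), (200, 24), (250, 36), (500, 12), (550, 24), (600, 36)]
train_y = ["INR", "INR", "INR", "IR", "IR", "IR"]
print(knn_predict(train_X, train_y, (180, 20)))  # INR
print(knn_predict(train_X, train_y, (560, 30)))  # IR
```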

As Table 5 shows, SVM had the best cross-validation score (0.9911), so the SVM model with the optimal hyperparameters can be used to define INR. The result of classifying with SVM is shown in Figure 5.


Figure 5: Result by using SVM to classify INR and IR.

Discussion

Reliability analysis of results and recommended INR definition interval

Because the amount of data and outliers affect the credibility of the model, the trained models in this paper are valid only for the current data. For example, in the open source data of reference [59], a patient with a CD4+ T-cell absolute number of 600 cells/µL and an ART time of 106.8 months was defined as an INR. Normally, regardless of ART time, a patient with a CD4+ T-cell absolute number of 600 should be regarded as an IR, but the authors of that reference defined this patient as an INR. Such data may affect the credibility of the model.

Moreover, although the SVM model can assist in defining INR, as a supervised learning model it does not allow clinicians to quickly determine whether a patient is an INR or an IR. We also tried semi-supervised and unsupervised learning algorithms, but their classification accuracy and interpretability were not as good as those of the supervised learning algorithms. The code for the semi-supervised and unsupervised learning algorithms is included in Parts 3 and 4 of the appendix.

To help researchers and clinicians define INR quickly and accurately, we used the supervised learning classification model to derive recommended reference values of the CD4+ T-cell absolute number for each ART time, as shown in Table 6. In our judgment, if a patient's CD4+ T-cell absolute number at a given ART time is below the corresponding value in the table, the patient has a high probability of belonging to INRs.

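ART time (months) | CD4 (cells/µL) | ART time (months) | CD4 (cells/µL) | ART time (months) | CD4 (cells/µL) | ART time (months) | CD4 (cells/µL) | ART time (months) | CD4 (cells/µL)
6 | 199 | 17 | 338 | 28 | 404 | 39 | 448 | 50 | 482
7 | 219 | 18 | 345 | 29 | 409 | 40 | 452 | 51 | 484
8 | 237 | 19 | 353 | 30 | 413 | 41 | 455 | 52 | 487
9 | 253 | 20 | 359 | 31 | 418 | 42 | 458 | 53 | 489
10 | 267 | 21 | 366 | 32 | 422 | 43 | 461 | 54 | 492
11 | 280 | 22 | 372 | 33 | 426 | 44 | 465 | 55 | 494
12 | 291 | 23 | 378 | 34 | 430 | 45 | 468 | 56 | 497
13 | 302 | 24 | 384 | 35 | 434 | 46 | 470 | 57 | 499
14 | 312 | 25 | 389 | 36 | 438 | 47 | 473 | 58 | 501
15 | 321 | 26 | 394 | 37 | 441 | 48 | 476 | 59 | 504
16 | 330 | 27 | 399 | 38 | 445 | 49 | 479 | 60+ | 506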

Table 6: Criteria of CD4+ T-cell absolute number at each ART time for defining INR.
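Table 6 can be used directly as a lookup table. The sketch below assumes ART time is rounded down to completed months, because the paper does not say how to treat fractional months. It also assumes that 60 or more months share the "60+" row. The helpers `cd4_threshold` and `likely_inr` are hypothetical. A True result means only that the patient has a high probability of belonging to INRs under these criteria. The Introduction's example visit, 13 months and 12 days (about 13.4 months) after starting ART, uses the 13-month value of 302 cells/µL.

```python
import math

# Table 6 thresholds (cells/µL) for ART time 6, 7, ..., 60 months; 60 stands for "60+".
_CD4_THRESHOLDS = [
    199, 219, 237, 253, 267, 280, 291, 302, 312, 321, 330,  # 6-16
    338, 345, 353, 359, 366, 372, 378, 384, 389, 394, 399,  # 17-27
    404, 409, 413, 418, 422, 426, 430, 434, 438, 441, 445,  # 28-38
    448, 452, 455, 458, 461, 465, 468, 470, 473, 476, 479,  # 39-49
    482, 484, 487, 489, 492, 494, 497, 499, 501, 504, 506,  # 50-60+
]
TABLE6 = dict(zip(range(6, 61), _CD4_THRESHOLDS))


def cd4_threshold(art_months):
    """Table 6 threshold for an ART time in months (completed months, capped at 60)."""
    if art_months < 6:
        raise ValueError("Table 6 starts at 6 months of ART")
    return TABLE6[min(math.floor(art_months), 60)]


def likely_inr(cd4, art_months):
    """True if the CD4+ T-cell absolute number is below the Table 6 threshold."""
    return cd4 < cd4_threshold(art_months)


print(cd4_threshold(13.4), likely_inr(290, 13.4), likely_inr(320, 13.4))  # 302 True False
```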

Considering that the SVM model was affected by the amount and quality of data, we simplified Table 6 and proposed criteria for defining INR by using CD4+ T-cell count absolute number and ART Time, as shown in Table 7.

ART time (months) | CD4 (cells/µL) | ART time (months) | CD4 (cells/µL)
6 | 200 | 30 | 410
9 | 250 | 36 | 440
12 | 300 | 42 | 460
18 | 350 | 48 | 480
24 | 380 | 60 | 500

Table 7: Simplified criteria of CD4+ T-cell absolute number at each ART time for defining INR.
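Table 7 lists thresholds only at selected ART times, and the paper does not say how to read values between rows. The sketch below assumes the most recent row at or below the patient's ART time applies, which gives a step function. It also compares Table 7 with the Table 6 values at the same months. At these months the simplification moves the threshold by at most 9 cells/µL, at 12 months (300 versus 291).

```python
import bisect

# Table 7 rows: ART time (months) and simplified CD4+ threshold (cells/µL).
ANCHOR_MONTHS = [6, 9, 12, 18, 24, 30, 36, 42, 48, 60]
TABLE7_CD4 = [200, 250, 300, 350, 380, 410, 440, 460, 480, 500]
# Table 6 values at the same months, for comparison (60 = "60+").
TABLE6_AT_ANCHORS = [199, 253, 291, 345, 384, 413, 438, 458, 476, 506]


def simplified_threshold(art_months):
    """Step-function reading of Table 7 (assumption): use the latest row <= art_months."""
    if art_months < ANCHOR_MONTHS[0]:
        raise ValueError("Table 7 starts at 6 months of ART")
    return TABLE7_CD4[bisect.bisect_right(ANCHOR_MONTHS, art_months) - 1]


differences = {m: t7 - t6 for m, t7, t6 in zip(ANCHOR_MONTHS, TABLE7_CD4, TABLE6_AT_ANCHORS)}
worst_month = max(differences, key=lambda m: abs(differences[m]))
print(simplified_threshold(13.4))             # 300
print(worst_month, differences[worst_month])  # 12 9
```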

Availability of other relevant features for defining INR

In our supervised learning classification model, for the reasons given in the Features selection section, we did not use the CD4+ T-cell count change number, CD4+ T-cell count growth rate, CD4/CD8 ratio or VS time to define INR. This does not mean that we consider these features meaningless for defining INR. When such data are available in practice, their values can serve as adjuvant standards on top of the criteria recommended in this paper. For example, a CD4/CD8 ratio below 1 could help define INR.
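One way to use the CD4/CD8 ratio as an adjuvant standard, rather than a replacement, is to report it as supporting evidence next to the primary CD4+ T-cell criterion. This is a hypothetical sketch. The `threshold` argument would come from Table 6 or Table 7, the function name and return format are ours, and the ratio < 1 cut-off is the example given above.

```python
def assess_inr(cd4, threshold, cd4_cd8_ratio=None):
    """Primary criterion plus an optional adjuvant flag (hypothetical helper).

    cd4: CD4+ T-cell absolute number (cells/µL).
    threshold: CD4 criterion for the patient's ART time (e.g. from Table 6 or 7).
    cd4_cd8_ratio: optional; a ratio below 1 is treated as supporting evidence only
    and never overrides the primary CD4 criterion.
    """
    primary = cd4 < threshold
    supporting = cd4_cd8_ratio is not None and cd4_cd8_ratio < 1
    return {"likely_inr": primary, "cd4_cd8_supports_inr": supporting}


print(assess_inr(350, 384, cd4_cd8_ratio=0.6))
# {'likely_inr': True, 'cd4_cd8_supports_inr': True}
```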

Conclusion

INR remains an important research area on the road to overcoming AIDS. Finding features and methods that accurately define INR will help us understand INR better and discover its pathogenesis and interventions. Through systematic literature review, visualization analysis and machine learning modeling, we identified the consensus, features and supervised classification methods that could define INR. In the future, we will collect more INR-related data and introduce more features and classification methods to find better ways to define INR.

Ethics Approval and Consent to Participate

The study complied with the principles of the Declaration of Helsinki and was approved by the Human Science Ethics Committee of Chongqing Public Health Medical Center (ethics review approval number: 2021-005-01-KY). The committee waived informed consent because of the observational nature of this study. Data extracted from the electronic medical record database were analyzed anonymously.

Acknowledgements

We would like to express our gratitude to all participants involved in this study and the funder of the research.

Availability of Data and Material

The entire modeling process and source code can be obtained from the corresponding author. The datasets used or analyzed during the study are owned by Chongqing Public Health Medical Center. Because the data include sensitive patient information and involve patents under development, they can be shared only with authorization from the corresponding author and the organization.

Funding

This study was supported by the Chongqing Science and Technology Bureau Project (cstc2019jscx-fxyd0298, cstc2020jscx-cylhX0001).

Consent for Publication

All authors have provided consent.

Competing Interests

Yong Shuai and Hemeng Peng contributed equally to this work.

References

Author Info

1 Chongqing CEPREI Industrial Technology Research Institute Co., Ltd, Chongqing, 401332, China
2 Chongqing Public Health Medical Center, Chongqing, 400036, China
3 Chongqing Key Laboratory of Reliability Technologies for Smart Electronics, Chongqing, 401332, China
4 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, 400714, China
 

Citation: Shuai Y, Peng H, Wang X, Peng X (2022) A Method for the Definition of Immunological Non-Response to Antiretroviral Therapy Based on Review Analysis and Supervised Classification Model. J Antivir Antiretrovir. S24: 005.

Received: 08-Mar-2022, Manuscript No. JAA-22-16179; Editor assigned: 11-Mar-2022, Pre QC No. JAA-22-16179 (PQ); Reviewed: 25-Mar-2022, QC No. JAA-22-16179; Revised: 28-Mar-2022, Manuscript No. JAA-22-16179 (R); Published: 04-Apr-2022, DOI: 10.35248/1948-5964-22.14.005

Copyright: © 2022 Shuai Y, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
