Creation and Validation of an Algorithm Predicting Recurrence of Atrial Fibrillation Following Pulmonary Vein Isolation Utilizing Real-World Data and Ensemble Modeling Techniques
Clinical & Experimental Cardiology


ISSN: 2155-9880

Research Article - (2023) Volume 14, Issue 1


Adam E. Berman¹*, Deepak Nag Ayyala², Paul Maddux¹, Aaron Gopal¹ and William White¹
*Correspondence: Adam E. Berman, Department of Medicine, Medical College of Georgia, Augusta University, Augusta, USA, Email:



Objective: Catheter Ablation (CA) of Atrial Fibrillation (AF) is a mainstay of treatment for this increasingly prevalent arrhythmia. Prospective clinical trials investigating the efficacy of CA may poorly represent real-world patient populations. However, real-world clinical data sets often contain missing data, which can limit their usefulness in research. We therefore sought to use ensemble modeling to address missing data and to develop a model estimating the probability of AF recurrence following CA.

Methods: We retrospectively analyzed clinical variables in 476 patients who underwent an initial CA of AF. Univariate and multivariate logistic regression were performed to identify variables predictive of AF recurrence. A multivariate logistic model was created to estimate the probability of AF recurrence after CA. Missing data were addressed using ensemble modeling, and variable selection was performed using the aggregate of multiple models.

Results: After analysis, six variables remained in the model: AF during the post-procedural blanking period, coexistence of atrial flutter, end-stage renal disease, reduced left ventricular ejection fraction, prior failure of anti-arrhythmic drugs, and valvular heart disease. Predictive modeling was performed using these variables for 1000 randomly partitioned datasets (80% training, 20% testing) and 1000 random imputations for each partitioned dataset. The model predicted AF recurrence with an accuracy of 74.34% ± 3.99%.

Conclusion: We identified six clinical variables that, when modeled, predicted AF recurrence following CA with a classification accuracy of approximately 74%. Applying this model to patients undergoing CA of AF may help identify those at risk of post-procedural AF recurrence.


Electrophysiology; Atrial fibrillation; Real world data; Modeling; Catheter ablation


Catheter Ablation (CA) is an increasingly used interventional technique aimed at reducing or eliminating the frequency and duration of episodes of symptomatic Atrial Fibrillation (AF). While CA has been shown to reduce the burden of AF, it has not been demonstrated to reduce the risk of stroke or death [1]. Consequently, CA is typically recommended for patients experiencing symptomatic AF. Moreover, the recently published CABANA study reiterated that AF recurs in roughly half of patients undergoing AF ablation at 5 years of follow-up [2]. Although generally low risk, CA of AF has been associated with non-negligible complication rates: Arbelo et al. reported an overall complication rate approaching 7.8% for Pulmonary Vein Isolation (PVI) procedures, although the complication rate in the ablation arm of CABANA was lower [3]. Given the costs and risks associated with CA of AF, improved models for predicting ablation success represent an attractive tool for physicians and AF patients alike. Ablation success rates and procedural risks must therefore both be weighed in the shared decision-making process between a PVI candidate and their cardiac electrophysiologist.

Numerous studies have examined the relationship between patient characteristics and recurrence of atrial arrhythmias after PVI. Underlying cardiovascular disease, valvular heart disease, increased age, AF classification (i.e., persistent versus paroxysmal), left atrial dimension, and obstructive sleep apnea are patient characteristics associated with post-PVI atrial arrhythmia recurrence [4-12]. Early and late recurrences of AF and other post-ablation atrial arrhythmias also carry different clinical consequences [13]. In addition to clinical characteristics and recurrence monitoring, ablation technique and the use of antiarrhythmic drug therapy to supplement CA have been shown to predict procedural success [14-17] (Supplementary Table).

Real world challenges to AF patient data acquisition

Traditional prospective clinical studies rely on consistent access to pre-defined patient characteristics when formulating predictive models of procedural success and outcomes. In routine clinical practice, however, patient data are often incomplete or missing when population-level databases are reviewed. When part of a patient's clinical information cannot be identified or obtained, the record is frequently treated as incomplete, and otherwise meaningful data may go unused. A potential remedy to this widespread challenge is imputation, in which missing data points are estimated from the available data, allowing partial patient records to remain in the analysis and to contribute to qualified conclusions [18,19]. We hypothesized that a predictive algorithm built with ensemble modeling techniques on common clinical variables, including imputed missing data, could accurately predict the recurrence of atrial arrhythmias following CA of AF in a real-world setting.


Patient population

Participants included all patients (n=476) undergoing their first PVI ablation between June 2011 and December 2017 at a tertiary medical center who had at least one ECG or 24-hour Holter monitor performed after the 90-day blanking period but before 1 year post-ablation. After the procedure, patients were followed by either their referring provider or a clinical electrophysiologist, and recurrent symptomatic AF was managed by either the procedural electrophysiologist or the patient's referring physician. Patients were excluded if they had undergone a previous MAZE procedure. One patient was excluded because she died within a year of her ablation procedure. Data from these patients were collected and analyzed retrospectively after approval by the Institutional Review Board.

Data gathering procedure

Except for AF during the blanking period and post-ablation antiarrhythmic drug status, the information used to determine each clinical variable was charted before the CA. A retrospective chart review was performed, and data were collected on the following variables: age at the time of ablation, sex, body mass index, AF type, method of CA energy delivery (e.g., cryoballoon vs. radiofrequency), moderate or worse valvular heart disease, moderate or worse left ventricular concentric hypertrophy, coronary artery disease, history of myocardial infarction, evidence of prior reduced Left Ventricular Ejection Fraction (LVEF), heart failure with preserved EF (HFpEF), hypertension, prior transient ischemic attack, prior failure of antiarrhythmic drugs, prior cardiac surgery, end-stage renal disease, coexistence of atrial flutter, antiarrhythmic drugs prescribed prior to ablation, antiarrhythmic drugs prescribed for at least one year following ablation, AF during the post-procedural blanking period, and time since initial AF diagnosis. LVEF, Left Atrial Diameter (LAD), and Left Atrial Volume Index (LAVI) were included in the analysis only when an echocardiogram had been performed less than 6 months before the ablation. AF type was categorized as paroxysmal (intermittent AF episodes lasting less than 1 week), persistent (AF episodes lasting more than 1 week but less than 1 year), or long-standing persistent (AF episodes lasting more than 1 year). Clinical success was defined as the absence of any documented atrial arrhythmia lasting longer than 30 seconds between the end of the 90-day blanking period and 12 months after ablation.
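To illustrate how the AF-type categories and the binary outcome definition above could be applied to charted data, the following Python sketch encodes both rules. The `Episode` record, the function names, and the handling of boundary values (an episode of exactly 7 days is counted as persistent, exactly 365 days as long-standing persistent) are assumptions made for this example, not part of the study's data dictionary.

```python
from dataclasses import dataclass

BLANKING_DAYS = 90       # post-ablation blanking period
FOLLOW_UP_DAYS = 365     # 12-month assessment window
MIN_DURATION_S = 30      # episodes must last longer than 30 s to count


@dataclass
class Episode:
    """Hypothetical record of one documented atrial arrhythmia episode."""
    day_after_ablation: int
    duration_seconds: float


def classify_af_type(longest_episode_days: float) -> str:
    """Map the longest continuous AF episode (in days) to the study's categories.
    Boundary handling (exactly 7 or 365 days) is an assumption."""
    if longest_episode_days < 7:
        return "paroxysmal"
    if longest_episode_days < 365:
        return "persistent"
    return "long-standing persistent"


def clinical_success(episodes: list[Episode]) -> bool:
    """True if no episode longer than 30 s was documented after the blanking
    period and within 12 months of ablation."""
    return not any(
        BLANKING_DAYS < e.day_after_ablation <= FOLLOW_UP_DAYS
        and e.duration_seconds > MIN_DURATION_S
        for e in episodes
    )
```

For example, `clinical_success([Episode(30, 3600), Episode(200, 10)])` returns `True`, because the first episode falls inside the blanking period and the second is too short, whereas `clinical_success([Episode(120, 45)])` returns `False`.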

Model generation

For five variables with missing observations (LVEF, LAD, LAVI, LV concentric hypertrophy, and months since initial AF diagnosis), missing values were randomly imputed, with the imputation models chosen on the basis of the observed data. Variables with complete data were compared between patients with arrhythmia recurrence and those who remained free of arrhythmia. For categorical variables, we report relative frequencies, odds ratios (ORs) relative to a specified baseline category, 95% confidence intervals for the ORs, and p-values computed with Fisher's exact test. For continuous variables, we report means, 95% confidence intervals, and p-values computed with a t-test. All analyses were performed in R™ (Vienna, Austria), version 3.5.0. Statistical significance was assessed at α=0.05.
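The univariable comparison of categorical variables can be sketched in plain Python as shown below. This is an illustrative reimplementation, not the R code used in the study: it computes the sample odds ratio with a Woolf (log-scale) 95% confidence interval and a two-sided Fisher's exact p-value. If the study's intervals came from R's `fisher.test`, which reports a conditional maximum-likelihood odds ratio with an exact interval, they will differ slightly from the Woolf interval computed here.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.959964):
    """2x2 table with rows = exposure (yes, no) and columns =
    outcome (recurrence, no recurrence):
        [[a, b],
         [c, d]]
    Returns the sample odds ratio and a Woolf (log-scale) 95% CI."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lower, upper)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value: total hypergeometric probability of all
    tables with the observed margins that are no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)

    def prob(x: int) -> float:
        # probability that the top-left cell equals x, given the fixed margins
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / total

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(x_min, x_max + 1)]
    # small relative tolerance so ties are not lost to floating-point rounding
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-7)))
```

Applied to the valvular heart disease counts in Table 2 (59 patients with and 46 without recurrence among those with valvular disease; 140 and 231 among those without), `odds_ratio_woolf(59, 46, 140, 231)` gives an odds ratio of about 2.12 with a Woolf interval of roughly 1.36 to 3.28, and `fisher_exact_two_sided(59, 46, 140, 231)` gives a p-value close to the reported 0.001.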

Next, a logistic regression model was developed to identify factors associated with recurrence of arrhythmia. To avoid sampling artifacts in variables with imputed data, 1,000 imputed data sets were generated, and the regression model was fit to each of them. Starting from all available variables, the logistic regression model was fit using forward stepwise selection to determine which variables maximized the model's ability to correctly predict AF recurrence. Using bagging to combine the results of these ensemble models, six variables were selected for inclusion in the model estimating the probability of recurrence of atrial arrhythmia within 12 months of the procedure. The following equation gives the estimated model to predict the probability of recurrence of atrial arrhythmia:


Where $X_{\mathrm{AF}}$: AF documented during the blanking period; $X_{\mathrm{CTI}}$: coexistence of atrial flutter; $X_{\mathrm{ESRD}}$: end-stage renal disease; $X_{\mathrm{PriorEF}}$: prior reduced left ventricular ejection fraction; $X_{\mathrm{Failed.Drugs}}$: prior failure of antiarrhythmic drugs; and $X_{\mathrm{VHD}}$: presence of valvular heart disease.
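Because the model is a logistic regression in six binary indicators, its predicted probability is the inverse-logit of a linear predictor in those indicators. The sketch below shows that computation, the 50% classification rule used in validation, and one plausible way to combine forward-selection results across imputed data sets: keeping variables selected in more than half of the runs. The majority threshold is an assumption (the text describes the aggregation only as bagging), the function names are hypothetical, and coefficients are passed in as arguments rather than taken from the fitted model.

```python
import math
from collections import Counter
from typing import Iterable

# The six binary indicators of the model (1 = present, 0 = absent).
PREDICTORS = ("AF", "CTI", "ESRD", "PriorEF", "Failed.Drugs", "VHD")


def predicted_probability(x: dict, intercept: float, coefs: dict) -> float:
    """Inverse-logit of intercept + sum(beta_k * X_k); coefficients must come
    from a fitted model (none are assumed here)."""
    eta = intercept + sum(coefs[k] * x[k] for k in PREDICTORS)
    # numerically stable logistic function
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def classify(prob: float, cutoff: float = 0.5) -> bool:
    """Predict recurrence when the probability exceeds the cutoff."""
    return prob > cutoff


def majority_vote(selected_sets: Iterable[set], threshold: float = 0.5):
    """Hypothetical aggregation rule: keep variables chosen by forward selection
    in more than `threshold` of the imputed data sets; also return the counts."""
    sets = list(selected_sets)
    counts = Counter(v for s in sets for v in s)
    keep = sorted(v for v, n in counts.items() if n > threshold * len(sets))
    return keep, counts
```

Under this rule, a variable selected in 705 of 1000 runs, as prior failure of antiarrhythmic drugs was in Table 1, would be retained.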

Model validation

To assess the model's ability to predict recurrence of atrial arrhythmia, the patients were randomly divided into two groups: a training set of 80% of the patients (n=380), used to construct the model, and a testing set of the remaining 20% (n=96). The division into training and testing sets was performed after random imputation. To avoid sampling artifacts, we considered 1000 randomly imputed data sets, and for each imputed data set the samples were randomly divided into training and testing sets 1000 times. Only the variables selected through the forward selection procedure described above were used in these models. After the model coefficients were estimated from the training data, the probability of recurrence of atrial arrhythmia was predicted for the patients in the testing set. Patients with a predicted probability greater than 50% were classified as having recurrence of atrial arrhythmia. The accuracy of each prediction was calculated as the percentage of patients in the testing set whose predicted status matched their observed recurrence status. The accuracies of the ensemble models were combined using bagging, and the mean accuracy across the 1000 training/testing splits was recorded.
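The validation scheme (random imputation, then repeated 80/20 splits, a 50% probability cutoff, and mean accuracy across splits) can be sketched with NumPy as follows. The imputation model (a normal distribution fitted to each column's observed values), the Newton-Raphson fitting routine with a small ridge term, and the reduced default numbers of imputations and splits are all assumptions for illustration; the study used 1000 of each, and the article does not detail its imputation models.

```python
import numpy as np


def impute_random_normal(X: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Assumed imputation model: replace each NaN with a draw from a normal
    distribution fitted to the observed values in the same column."""
    X = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        if missing.any():
            observed = X[~missing, j]
            X[missing, j] = rng.normal(observed.mean(), observed.std(ddof=1),
                                       size=missing.sum())
    return X


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 25,
                 ridge: float = 1e-6) -> np.ndarray:
    """Logistic regression by Newton-Raphson; returns [intercept, coefs...].
    The tiny ridge term keeps the Hessian invertible on separable splits."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Xd @ beta, -30, 30)        # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        hessian = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hessian, Xd.T @ (y - p))
    return beta


def split_accuracy(X, y, rng, train_frac=0.8, cutoff=0.5) -> float:
    """One random train/test split; accuracy of the >cutoff rule on the test set."""
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = idx[:n_train], idx[n_train:]
    beta = fit_logistic(X[train], y[train])
    eta = np.clip(beta[0] + X[test] @ beta[1:], -30, 30)
    p_test = 1.0 / (1.0 + np.exp(-eta))
    return float(np.mean((p_test > cutoff) == (y[test] == 1)))


def repeated_validation(X_raw, y, n_imputations=10, n_splits=10, seed=0):
    """Impute first, then split repeatedly; return mean and SD of accuracy."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_imputations):
        X = impute_random_normal(X_raw, rng)
        accuracies.extend(split_accuracy(X, y, rng) for _ in range(n_splits))
    acc = np.asarray(accuracies)
    return float(acc.mean()), float(acc.std(ddof=1))
```

With n=476 and `train_frac=0.8`, `int(0.8 * 476)` yields the 380/96 split described above. Imputing once per data set before splitting, as described in the text, means the imputation model also sees the test rows.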


Using the model creation method described above, six variables were selected that maximized the predictive value of the model; adding any other variable did not increase the accuracy of outcome prediction. These variables were AF during the post-procedural blanking period, coexistence of atrial flutter, end-stage renal disease, prior reduced LVEF, prior failure of anti-arrhythmic drugs, and moderate or worse valvular heart disease. The number of models that selected each variable is shown in Table 1. In univariate analysis (Tables 2 and 3), valvular heart disease (p=0.0011), AF classification (p=0.0035), evidence of prior reduced EF (p=0.0124), remaining on antiarrhythmic drugs for 12 months post-procedure (p<0.0001), AF documented during the blanking period (p<0.0001), LAD (p=0.0125), months since initial AF diagnosis (p=0.0368), and LAVI (p=0.0492) all achieved statistical significance. In multivariate analysis, AF documented during the blanking period (p<0.0004) and valvular heart disease (p=0.0256) remained statistically significant individual predictors. Not all variables that were significant in univariate analysis strengthened the predictive value of the multivariate model. Coexistence of atrial flutter was the only variable in the model associated with a lower probability of AF recurrence.

| Clinical variable | Frequency of models selecting the variable¹ | Frequency of significance² | Mean p-value ± SD | p-value range (minimum, maximum) |
|---|---|---|---|---|
| AF documented during blanking period | 1000 | 1000 | <0.0001 ± <0.0001 | (<0.0001, <0.0001) |
| Coexistence of atrial flutter | 963 | 2 | 0.111 ± 0.016 | (0.043, 0.162) |
| End-stage renal disease | 893 | 0 | 0.137 ± 0.012 | (0.078, 0.170) |
| Prior reduced LVEF | 893 | 190 | 0.065 ± 0.018 | (0.020, 0.150) |
| Prior failure of anti-arrhythmic drugs | 705 | 5 | 0.086 ± 0.016 | (0.040, 0.146) |
| Moderate or worse valvular heart disease | 1000 | 906 | 0.026 ± 0.016 | (0.002, 0.098) |

Table 1: Results of multivariable analysis.

| Clinical variable | Total, # (%) | No AF recurrence¹, # (%) | AF recurrence, # (%) | p-value | Odds ratio 95% CI¹ |
|---|---|---|---|---|---|
| Valvular heart disease | n=476 | n=277 | n=199 | 0.001 | [1.333, 3.366] |
| No | 371 (78%) | 231 (83%) | 140 (70%) | | |
| Yes | 105 (22%) | 46 (17%) | 59 (30%) | | |
| Atrial fibrillation type | n=476 | n=277 | n=199 | 0.004 | |
| Paroxysmal | 322 (68%) | 203 (73%) | 119 (60%) | | |
| Persistent | 134 (28%) | 67 (24%) | 67 (34%) | | |
| Long-standing persistent | 20 (4%) | 7 (3%) | 13 (7%) | | |
| Evidence of prior reduced LVEF² | n=476 | n=277 | n=199 | 0.012 | [1.098, 2.586] |
| No | 347 (73%) | 214 (77%) | 133 (67%) | | |
| Yes | 129 (27%) | 63 (23%) | 66 (33%) | | |
| Antiarrhythmic drug post-procedure for 12 months | n=476 | n=277 | n=199 | <0.0001 | [1.549, 3.405] |
| No | 287 (60%) | 190 (69%) | 97 (49%) | | |
| Yes | 189 (40%) | 87 (31%) | 102 (51%) | | |
| AF documented during blanking period | n=476 | n=277 | n=199 | <0.0001 | [6.230, 17.433] |
| No | 344 (72%) | 250 (90%) | 94 (47%) | | |
| Yes | 132 (28%) | 27 (10%) | 105 (53%) | | |

Table 2: Results of univariable analysis: Significant categorical variables.

| Continuous variable | No recurrence¹, mean ± SD | Recurrence², mean ± SD | 95% CI of difference³ | p-value |
|---|---|---|---|---|
| Left atrial diameter (cm) | 4.09 ± 0.73 | 4.34 ± 0.74 | (-0.455, -0.056) | 0.013 |
| Months since initial AF diagnosis (mo) | 33.37 ± 47.61 | 46.69 ± 66.1 | (-25.759, -0.871) | 0.037 |
| LAVI³ (mL/m²) | 35.5 ± 13.5 | 39.81 ± 14.16 | (-8.611, -0.016) | 0.049 |

Table 3: Results of univariable analysis: significant continuous variables.

Using the model validation method described above, the accuracy of the model on the testing cohort across the 1000 randomly imputed data sets was 74.34%, with a standard deviation of 3.99%. That is, on average the model correctly predicted the AF recurrence status of 74.34% of patients in the testing cohort. When the analysis was performed using all of the variables, the accuracy over 1000 randomly imputed data sets was 71.05%, with a standard deviation of 4.21%. This indicates that the six variables selected by the stepwise procedure are sufficient to achieve prediction accuracy comparable to, and here slightly higher than, that of the full model.


We describe univariate and multivariate analyses of common clinical variables for predicting recurrence of AF following PVI ablation, using a retrospective analysis of our institutional database. Many of the variables used in this study have been described in prior literature [20-24]. Our analysis demonstrates that AF recurrence after CA can be predicted with respectable reliability, potentially allowing better identification of patients who derive greater clinical benefit from PVI procedures. To create a dataset that better reflects actual practice at a large regional referral center, we did not exclude patients from our analysis because of missing data. Rather, we sought to develop a model that accommodates the missing data frequently encountered in real-world practice through statistical imputation according to well-described techniques [18]. The use of ensemble modeling techniques helps to minimize the uncertainty associated with imputing missing data [25]. Furthermore, an ensemble of models can achieve higher predictive accuracy than a single model. The imputation methods used to build the ensemble models also allow us to retain variables with missing values, which would otherwise be discarded.

The two variables that remained statistically significant in multivariate analysis and were selected for inclusion in the model (moderate or greater valvular heart disease and AF during the blanking period) are variables that, we are confident, play a significant role in predicting diminished success of PVI procedures. These variables have also been identified in prior large retrospective reviews of ablation registries [22-24]. Recurrent AF during the blanking period was also addressed by the ADVICE trial [23], in which significant rates of atrial arrhythmia relapse were noted in patients with documented AF during the post-CA blanking period. A 2021 study by Kim et al. found that 69.6% of the 751 patients with AF recurrence in the first 90 days following CA also had late recurrence [26]. Yanagisawa and colleagues found that patients who underwent an early repeat CA for AF documented during the post-CA blanking period had significantly lower AF recurrence rates than those who did not undergo CA during the blanking period [27]. Additionally, a 2011 pilot study by Pokushalov et al. showed a significant benefit of early CA for patients with AF recurrence in the blanking period in whom AF was initiated by atrial tachycardia, atrial flutter, or premature atrial beats [28]. A similar randomized controlled trial comparing early intervention with deferral of repeat CA until AF is documented after the blanking period, in a high-risk subset of patients with early recurrence, may be warranted.

Intriguingly, AF type (paroxysmal, persistent, or long-standing persistent) was significant in univariate analysis, as in multiple prior studies; however, it did not remain in the multivariable model [22,24]. This may indicate that other factors associated with AF type play a role in determining post-CA AF recurrence. Additionally, there is evidence that the degree of structural remodeling of the left atrium influences post-CA AF recurrence [29,30]. We consider it more likely, however, that AF duration captures this risk more accurately than the standard classification and therefore has greater predictive value.

Study limitations

Our model was limited mainly by its retrospective design and single-center experience. Inherent differences in ablation technique, pre- and peri-procedural management, and patient selection may introduce some variability, although in our experience this was partially offset by the high procedural volume of our center's AF ablation specialists. The outcome of CA was reported as a binary variable, with clinical success defined as the absence of atrial arrhythmia following CA outside the blanking period. Defining the outcome of CA as a percentage reduction in AF burden might allow success to be assessed on a more clinically relevant spectrum. Current guidelines state that patients who have failed antiarrhythmic drug therapy for AF rhythm control have an indication for CA, which limits the clinical utility of prior antiarrhythmic drug failure as a predictor [31]. Finally, the vast majority of the patient population was either Caucasian or African American, so the results of this study may have limited applicability to populations outside these two groups.


Using ensemble modeling techniques and real-world clinical data from a large tertiary-referral center, we identified six clinical variables that predicted AF recurrence following CA with a classification accuracy of approximately 74%. We propose that this model may offer patients and physicians alike an additional tool when discussing and managing AF ablation procedural outcomes in conventional clinical practice. Further studies examining the prospective utility and accuracy of this model in patients undergoing CA of AF may clarify its applicability to routine and widespread clinical use.


We would like to thank our colleagues at Augusta University as well as the patients who underwent treatment for atrial fibrillation for providing us with the information necessary to make this project possible.

Sources of Funding

We have no sources of funding to report.


Author Info

¹Department of Medicine, Medical College of Georgia, Augusta University, Augusta, USA
²Department of Population Health Sciences, Medical College of Georgia, Augusta University, Augusta, USA

Citation: Berman AE, Ayyala DN, Maddux P, Gopal A, White W (2023) Creation and Validation of an Algorithm Predicting Recurrence of Atrial Fibrillation Following Pulmonary Vein Isolation Utilizing Real-World Data and Ensemble Modeling Techniques. J Clin Exp Cardiolog. 14:768.

Received: 06-Dec-2022, Manuscript No. JCEC-22-20680; Editor assigned: 08-Dec-2022, Pre QC No. JCEC-22-20680 (PQ); Reviewed: 22-Dec-2022, QC No. JCEC-22-20680; Revised: 29-Dec-2022, Manuscript No. JCEC-22-20680 (R); Published: 10-Jan-2023, DOI: 10.35248/2155-9880.23.14.768

Copyright: © 2023 Berman AE, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.