Journal of Drug Metabolism & Toxicology

Open Access

ISSN: 2157-7609

Research Article - (2023)Volume 14, Issue 1

Accurate, fast and easy-to-understand Density Functional Tight Binding and Machine Learning QSAR for the DPPH and ABTS antioxidant activity of phenolic compounds based on No-Code Freeware

Andrés Halabi1,2*, Millaray Hernández1 and Hans Lenes2
*Correspondence: Andrés Halabi, Departamento de Investigación y Desarrollo, Los Alamos National Laboratory, Valdivia, Chile, Email:



Abstract

In this study, we developed Quantitative Structure-Activity Relationship (QSAR) models for the experimental 1,1-diphenyl-2-picrylhydrazyl (DPPH) and 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulphonic acid) (ABTS) radical scavenging values of 55 phenolic antioxidants, based on Conceptual DFT descriptors calculated with the GFN1-xTB Density Functional Tight Binding (DFTB) method. Machine learning algorithms were used for feature selection and regression analysis, and Leave-One-Out Cross-Validation was used for both multiple linear regression (MLR) and sequential minimal optimization regression (SMOreg). For ABTS activity, two models were obtained with correlation coefficients of 0.943 (MLR) and 0.928 (SMOreg). For DPPH activity, two models were obtained with correlation coefficients of 0.936 (MLR) and 0.927 (SMOreg). The number of phenolic groups in the molecule, the Bond Dissociation Enthalpy, and the radical Fukui value of the most active phenolic oxygen were enough to properly predict the Radical Scavenging Activity (RSA) of phenols. Both QSAR models were developed following the Organisation for Economic Co-operation and Development (OECD) recommendations on QSAR models. Considering the importance of antioxidant activity in the medical, pharmaceutical, and food industries, this study proposes a valuable and inexpensive method. It is also easy to interpret, as it relies on only three quantities (combined into two model descriptors) that are directly related to the known chemistry of the substances.


Keywords

QSAR; DPPH; Oxidative Stress; Phenolic Structure


Introduction

An imbalance between the production of Free Radicals (FR) and the ability of an organism to quench and eliminate those species leads to a series of mostly unwanted effects (aging, cancer, peptic ulcers, and others). This process is called Oxidative Stress. It can be caused by a series of factors, of which endogenous and exogenous chemicals are the most common. Most of the endogenous and exogenous FR that humans are exposed to are Reactive Oxygen Species (ROS, mostly small inorganic or organic molecules derived from oxygen; very common examples are O2•− and H2O2) and Reactive Nitrogen Species (RNS, mostly small inorganic or organic molecules containing nitrogen). ROS are generally much more reactive than the other species; hence the study of Oxidative Stress and related topics is usually focused on those oxygen-containing species. Biological systems usually have Antioxidant (AOx) mechanisms to help them cope with these damaging species. Antioxidants can also be divided into endogenous and exogenous. Many endogenous antioxidants are enzymes (for example, superoxide dismutase); albumin and bilirubin are examples of non-enzymatic endogenous antioxidants. Organisms can use exogenous AOx obtained from nutrition to compensate for possible deficits in the ratio of AOx to FR, although today supplements and other pharmaceutical applications are also available. The most common exogenous AOx are phenolic compounds [1,2]. Predicting antioxidant activity is also extremely valuable for medicinal chemists and pharmaceutical scientists, as many drugs exert their therapeutic effects via antioxidant activity. This has been known for many years in the industry and the scientific community [3].

It is known that phenolic compounds tend to follow one of two reaction mechanisms when quenching FR. Here ArOH is the phenolic AOx, and R• is the free radical to be quenched. It is important to note that both mechanisms result in the same products. One mechanism will be preferred over the other depending on the structure of the AOx compound, the pH, and the polarity of the solvation medium [4,5].

1. Hydrogen Atom Transfer (HAT): R• + ArOH → RH + ArO•

2. Single Electron Transfer-Proton Transfer (SET-PT): R• + ArOH → R⁻ + ArOH•⁺ → RH + ArO•

Note that the Rate-Determining Step (RDS) of the HAT mechanism is the homolytic cleavage of the O-H bond, while for the SET mechanism the RDS is the transfer of a single electron from ArOH to R•. Hence, we focus on these two RDSs for the in silico prediction of the Radical Scavenging Activity (RSA). The 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulphonic acid) (ABTS) and 1,1-diphenyl-2-picrylhydrazyl (DPPH) assays are RSA tests very commonly used in the scientific community to assess AOx activity. Even though both tests were initially developed to probe the HAT mechanism, it has been shown that both ABTS and DPPH may proceed via HAT and SET mechanisms [6].

The present work is a QSAR study that develops predictive models for the RSA of phenolic compounds. The QSAR endpoints are the ABTS and DPPH test results reported in [7] for 55 phenolic compounds isolated from a Chinese herb (see Table 1 for experimental data and Supplementary Figure 1 for molecular structures). The Density Functional Tight Binding (DFTB) approach was used to calculate the physico-chemical properties of the compounds. Both global and local reactivity descriptors based on Conceptual DFT (CDFT) [8] and some reaction enthalpies were calculated. The number of phenolic groups in the structure was also considered as a structural descriptor (more details in Materials and Methods). Hydrogen transfer in the HAT mechanism was approached by calculating the Bond Dissociation Enthalpy (BDE) of the compound's phenolic hydrogens. The SET mechanism was approached through the energies of the frontier molecular orbitals (HOMO and LUMO) and the CDFT descriptors derived from them, described below, which relate to electronic charge and its transfer.

1 327-97-9 1.560 1.750 29 7770-78-7 0.104 0.085
2 583-17-5 0.930 0.840 30 528-43-8 0.209 0.197
3 588-30-7 0.820 0.750 31 81-54-9 1.930 2.000
4 69-72-7 0.037 0.052 32 72-48-0 1.070 1.050
5 99-06-9 0.025 0.069 33 81-64-1 0.548 0.035
6 99-96-7 0.028 0.059 34 478-43-3 0.076 0.056
7 121-34-6 0.092 0.056 35 481-74-3 0.069 0.042
8 1257-08-5 5.290 5.260 36 521-61-9 0.068 0.073
9 970-74-1 3.710 3.560 37 481-72-1 0.077 0.044
10 154-23-4 3.040 2.950 38 117-12-4 0.076 0.073
11 482-35-9 2.390 2.160 39 84-60-6 0.072 0.076
12 153-18-4 2.020 2.330 40 481-39-0 0.105 0.088
13 522-12-3 2.180 2.570 41 331-39-5 1.310 1.240
14 30311-61-6 1.560 1.630 42 99-50-3 1.150 1.290
15 487-52-5 2.420 2.270 43 989-51-5 5.950 6.090
16 94344-54-4 1.930 1.820 44 480-16-0 2.680 2.750
17 36338-96-2 1.430 1.360 45 520-18-3 1.590 1.320
18 491-70-3 2.180 2.240 46 548-83-4 1.120 0.705
19 480-40-0 0.081 0.053 47 60-82-2 1.790 1.240
20 5373-11-5 1.470 1.390 48 40957-83-3 0.097 0.020
21 578-74-5 0.083 0.050 49 529-59-9 0.077 0.026
22 10236-47-2 0.098 0.077 50 10083-24-6 2.530 2.350
23 520-26-3 0.104 0.075 51 92-61-5 0.383 0.210
24 486-66-8 0.101 0.033 52 29388-59-8 0.308 0.183
25 552-66-9 0.072 0.035 53 476-41-5 1.620 1.700
26 501-36-0 2.140 1.710 54 117-10-2 0.068 0.057
27 33171-05-0 1.180 1.020 55 152-84-1 0.073 0.070
28 580-72-3 0.253 0.207        

Table 1: Experimental data for the phenolic antioxidant molecules studied. The DPPH and ABTS values are reported in relative TEAC (mM).

Materials and Methods

Theoretical calculations of descriptors

As reported in [6], all experimental values are expressed in Trolox Equivalent Antioxidant Capacity (TEAC) units. All 3D structures used for the calculations were obtained directly from ChemSpider as SDF files. Each 3D phenolic structure was then uploaded to EntosAI, where it was optimized as part of the DFTB calculations. Global reactivity descriptors were obtained from DFT calculations [9-11]. ELUMO and EHOMO correspond to the energies of the Lowest Unoccupied Molecular Orbital and the Highest Occupied Molecular Orbital, respectively. The Hardness (η, the resistance of the system to a change of charge, Equation 1), the Softness (S, the inverse of the hardness, Equation 2), the Chemical Potential (μ, a measure of the intrinsic strength of a Lewis acid or base, Equation 3), and the Absolute Electrophilicity Index (ω, the energy stabilization upon electron transfer, Equation 4) were calculated [12,13]. We also included BDE × f*(O), the product of the BDE and the radical Fukui value of the most active phenolic oxygen. All structures were singlets with two electrons in the HOMO.
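Equations 1-4 are referenced in the text but are not reproduced in it. For orientation only, one widely used frontier-orbital formulation of these quantities is sketched below. The factor of 1/2 in the hardness is an assumption (some authors omit it), and the exact expressions used in this work may differ; the softness is written as the plain inverse of the hardness, as stated above.

```latex
\begin{align}
\eta   &= \tfrac{1}{2}\left(E_{\mathrm{LUMO}} - E_{\mathrm{HOMO}}\right) \tag{1}\\
S      &= \frac{1}{\eta} \tag{2}\\
\mu    &= \tfrac{1}{2}\left(E_{\mathrm{HOMO}} + E_{\mathrm{LUMO}}\right) \tag{3}\\
\omega &= \frac{\mu^{2}}{2\eta} \tag{4}
\end{align}
```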

All theoretical results were calculated at the DFTB [14] level of theory with the GFN1-xTB method [15], as available in the EntosAI software package with its default configuration. The BDE obtained from EntosAI is the lowest BDE [kJ/mol] of any O-H bond in the molecule. The oxygen atom of that bond was used as the basis for the f-(O), f+(O), and f*(O) descriptors, which correspond to the electrophilic, nucleophilic, and radical Fukui reactivity values of that oxygen atom. The f-(M), f+(M), and f*(M) descriptors correspond to the highest absolute electrophilic, nucleophilic, and radical Fukui values in the whole molecule. The absolute energy of the molecule was also calculated. The HOMO [eV] and LUMO [eV] values were extracted from the orbital calculations in EntosAI. From those HOMO and LUMO values, η [eV], μ [eV], ω [eV], and S [eV⁻¹] were calculated using Equations 1-4. The total number of OH groups in the molecule, n(OH), was also considered. Finally, the product of the BDE and the radical Fukui value of the most active phenolic oxygen, BDE × f*(O), was included as a descriptor.
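The descriptor bookkeeping described above can be sketched in a few lines of Python. This is an illustration only, not part of the no-code workflow used in this study: the function names are hypothetical, the hardness convention (half the HOMO-LUMO gap) is an assumption, and the numerical inputs are made-up placeholder values rather than EntosAI output.

```python
from dataclasses import dataclass


@dataclass
class GlobalDescriptors:
    hardness: float            # eta [eV]
    softness: float            # S [1/eV], taken as 1/eta
    chemical_potential: float  # mu [eV]
    electrophilicity: float    # omega [eV]


def global_descriptors(e_homo: float, e_lumo: float) -> GlobalDescriptors:
    """Frontier-orbital estimates of the global CDFT descriptors.

    Assumed convention (not confirmed by the source):
    eta = (E_LUMO - E_HOMO) / 2, mu = (E_HOMO + E_LUMO) / 2,
    omega = mu^2 / (2 * eta).
    """
    eta = (e_lumo - e_homo) / 2.0
    if eta <= 0:
        raise ValueError("expected E_LUMO > E_HOMO")
    mu = (e_homo + e_lumo) / 2.0
    return GlobalDescriptors(eta, 1.0 / eta, mu, mu ** 2 / (2.0 * eta))


def bde_times_fukui(bde_kj_mol: float, f_rad_o: float) -> float:
    """Combined descriptor BDE x f*(O) for the most active phenolic oxygen."""
    return bde_kj_mol * f_rad_o


if __name__ == "__main__":
    # Hypothetical placeholder inputs, for illustration only.
    print(global_descriptors(e_homo=-9.0, e_lumo=-5.0))
    print(bde_times_fukui(bde_kj_mol=350.0, f_rad_o=0.10))
```

Keeping the combined descriptor as a separate function makes it explicit that BDE × f*(O) is a simple product of two calculated quantities, not an independently computed property.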

QSAR and machine learning

Since the pharmaceutical industry set the best-practice standard for using predictive models, with companies like Pfizer investing heavily in predictive research and saving several billion dollars a year through virtual screening techniques, quantitative structure-property relationships have become valuable assets for both research and industrial use [16]. QSAR is a well-known computational approach in the chemical data sciences. It is used to develop predictive models for a vast array of chemical activities by combining empirical data with computational calculations and data science techniques [17]. The present work is a QSAR study for the predictive calculation of the ABTS and DPPH RSA values of phenolic compounds, based on theoretical calculations performed and statistically treated at almost zero cost. In the past, many theoretical studies have tried to predict antioxidant activities, with varying degrees of success [18,19].

Machine learning is a field of artificial intelligence and data science in which computational algorithms and complex mathematical models are used for data classification, clustering, and regression. Supervised machine learning algorithms must strike a balance between the error induced by bias and the variance of the model. Proper statistical validation is required to maximize the trustworthiness of the derived models. Cross-Validation is commonly recognized as a very efficient and reliable method for the statistical validation of machine learning and QSAR models [20]. The WEKA software [21,22] is a suite dedicated to implementing various machine learning methods (and other data science applications) for predictive classification or regression problems. As the goal of this study is to predict an RSA value, we used WEKA for Feature Selection (dimensionality reduction) and for the regression itself. For the regression, we used two methods: Multiple Linear Regression (LinearRegression in WEKA) and support vector regression trained by sequential minimal optimization (SMOreg in WEKA), each applied to the reduced feature set.

Feature selection was carried out using the Principal Component Analysis (PCA) and Correlation-based Feature Selection (CFS) subset evaluator options in WEKA, applied to each endpoint (ABTS and DPPH values) individually. All feature selections were carried out with the default configuration. For the CFS reductions, conservative forward selection was preferred.
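To illustrate the idea behind CFS, the sketch below scores a candidate descriptor subset with the standard CFS merit heuristic, k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean descriptor-endpoint correlation and r_ff is the mean descriptor-descriptor correlation. It is a simplified illustration using absolute Pearson correlations, not a reproduction of WEKA's evaluator or its search strategy; the function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(features, target):
    """CFS merit of a descriptor subset: high relevance, low redundancy."""
    k = len(features)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each descriptor and the endpoint.
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    # Mean absolute correlation between all pairs of descriptors.
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


if __name__ == "__main__":
    # Hypothetical toy columns, for illustration only.
    endpoint = [2.0, 4.0, 6.0, 8.0]
    relevant = [1.0, 2.0, 3.0, 4.0]
    noise = [1.0, -1.0, -1.0, 1.0]
    print(cfs_merit([relevant], endpoint))
    print(cfs_merit([relevant, noise], endpoint))
```

Adding a descriptor that is uncorrelated with the endpoint lowers the merit, which is why a forward search guided by this score tends to return small subsets.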

The final QSAR regression models were obtained for each endpoint by applying the Linear Regression and the SMOreg methods available in WEKA. The regression equation results and the corresponding statistical metrics for validation are described in the results section. All Regression models were validated with Leave-One-Out Cross Validation (LOOCV).
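For readers who want to see what Leave-One-Out Cross-Validation does for a linear model, a minimal pure-Python sketch follows. It is not the WEKA implementation used in this work: it fits plain ordinary least squares (WEKA's LinearRegression applies additional defaults such as attribute selection), the function names are hypothetical, and the data in the usage block are made-up placeholders rather than values from Table 1.

```python
import math


def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    Returns [intercept, coef_1, ..., coef_p].
    """
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(a[r] * a[c] for a in A) for c in range(p)]
        + [sum(a[r] * yi for a, yi in zip(A, y))]
        for r in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        tail = sum(M[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (M[r][p] - tail) / M[r][r]
    return beta


def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def loocv_predictions(X, y):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(X)):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(beta, X[i]))
    return preds


def error_metrics(y, preds):
    """Correlation, mean absolute error, and root mean squared error."""
    n = len(y)
    my, mp = sum(y) / n, sum(preds) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, preds))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in preds)
    return {
        "correlation": cov / math.sqrt(var_y * var_p),
        "mae": sum(abs(a - b) for a, b in zip(y, preds)) / n,
        "rmse": math.sqrt(sum((a - b) ** 2 for a, b in zip(y, preds)) / n),
    }


if __name__ == "__main__":
    # Hypothetical toy data (NOT from Table 1): columns are n(OH), BDE x f*(O).
    X = [[1, 40.0], [2, 35.0], [3, 30.0], [4, 33.0], [5, 25.0], [6, 28.0]]
    y = [0.1, 0.9, 1.8, 2.4, 3.6, 4.1]
    print(error_metrics(y, loocv_predictions(X, y)))
```

Because every prediction comes from a model that never saw the sample being predicted, the resulting metrics estimate predictive performance rather than goodness of fit.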

Results and Discussion

In this work, we performed a theoretical approach to study the relationship between the RSA and the electronic and thermodynamic properties of the systems. Theoretical and computational chemistry is a way of approaching molecules and their reactivities through computerized calculation methods. Many advantages arise from using these methods for predictions, including their low costs, high efficiency, and remarkable precision when studying the properties of electronic or thermodynamic systems. Computational chemistry is broadly used in areas such as pharmacy, inorganic chemistry, material sciences, spectroscopy, and many others as an alternative or previous step to performing expensive lab experiments [23-25].

Feature Selection

Principal components analysis: The PCA feature selection for both ABTS and DPPH endpoints yielded the following descriptors as results: ω, f+(O), f*(O), f-(O), f+(M), BDE, f-(M), f*(M) and BDE × f*(O). The n(OH) descriptor was manually added based on our chemical knowledge of the reaction mechanisms discussed above.

Interestingly, the Electrophilicity Index, followed by the Fukui values of the oxygen atom in the active O-H bond, was selected as the most important descriptor. The selection of the electrophilicity may reflect an electron transfer process, which makes sense from a chemical perspective, as ω directly addresses the flow of electrons. The Fukui functions associated with the oxygen of the most active phenolic moiety describe the reactivity of that part of the molecule towards electrophilic, nucleophilic, and radical attack. These CDFT descriptors relate to both the HAT and SET mechanisms, while the BDE gives insight into the HAT mechanism.

CFS subset evaluator: The CFS subset feature selection for both endpoints (ABTS and DPPH) yielded the following descriptors: n(OH), BDE, BDE × f*(O), f*(M), S, and f-(M). It is important to note that n(OH) and the BDE were selected as the most important descriptors for the RSA. These descriptors are directly related to the phenol moiety in the AOx molecule. It is known from the literature that some correlation can exist between the number of phenolic OH groups and the RSA of certain molecules. At the same time, the BDE has been widely studied and found to correlate with the AOx activity of certain compounds, especially under vacuum or non-polar conditions [26], because of the HAT mechanism. The radical susceptibility to electron transfer at molecular active sites has been studied before via the radical Fukui function [27]. All the electronic descriptors obtained here seem to be related to the electron transfer process; hence they are related to the SET AOx mechanism. BDE, n(OH), and BDE × f*(O) appear to be the most important descriptors across both feature selection procedures. This makes sense from a mechanistic perspective: the descriptors that carry the most information about the RSA of the compounds can all be linked directly to the HAT or SET mechanisms of antioxidant action discussed in the introduction.

The PCA and CFS selections were manually combined and reduced further without losing information. A final pruned descriptor set, consisting of n(OH) and BDE × f*(O), was used for the regression models of both the DPPH and ABTS experimental values.

Regression Results

Below are the regression equations obtained for the predictive calculation of the ABTS and DPPH values with their corresponding error metrics. Both algorithms were used with standard configuration, without normalizing data, and LOOCV was used for all models. Calculated and experimental comparisons were evaluated with regression through the origin.

As Table 2 shows, the ABTS endpoint was modelled accurately. Multiple Linear Regression seems slightly more reliable for this response and dataset, with Q2=0.943, although SMOreg also performed well, with Q2=0.928. Figure 2 shows the relationship between experimental and calculated ABTS values for both the MLR and SMOreg algorithms.

LinearRegression: ABTS = 0.8453 × n(OH) - 0.0238 × [BDE × f*(O)] - 1.5735
Correlation Coefficient (Q2): 0.9429
Mean absolute error: 0.3673
Root mean squared error: 0.433
Relative absolute error: 35.64%
Root relative squared error: 32.70%

SMOreg: ABTS = 0.8165 × n(OH) - 0.0281 × [BDE × f*(O)] - 1.6226
Correlation Coefficient (Q2): 0.9275
Mean absolute error: 0.4181
Root mean squared error: 0.492
Relative absolute error: 40.57%
Root relative squared error: 37.16%

Table 2: ABTS endpoint regression results of the LinearRegression and the SMOreg algorithms.


Figure 1: Flowchart of the modelling procedure.

As Figure 2 shows, the predicted values correlate strongly with the experimental data, with Q2 ≥ 0.92 for both ABTS models.


Figure 2: MLR and SMOreg results fitting between calculated and experimental ABTS values.

As Table 3 shows, the DPPH endpoint was also modelled accurately. Multiple Linear Regression seems slightly more reliable for this type of response and dataset, with Q2=0.936, although the SMOreg algorithm also performed well, with Q2=0.9273. Figure 3 shows the relationship between experimental and calculated DPPH values for both the MLR and SMOreg algorithms.

LinearRegression: DPPH = 0.8508 × n(OH) - 0.0265 × [BDE × f*(O)] - 1.6973
Correlation Coefficient (Q2): 0.936
Mean absolute error: 0.3827
Root mean squared error: 0.4627
Relative absolute error: 36.74%
Root relative squared error: 34.55%

SMOreg: DPPH = 0.86 × n(OH) + 0.0332 × [BDE × f*(O)] - 1.8887
Correlation Coefficient (Q2): 0.9273
Mean absolute error: 0.4217
Root mean squared error: 0.4981
Relative absolute error: 40.49%
Root relative squared error: 37.20%

Table 3: DPPH endpoint regression results of the LinearRegression and the SMOreg algorithms.


Figure 3: MLR and SMOreg results fitting between calculated and experimental DPPH values.
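The comparisons between calculated and experimental values are described above as regression through the origin. As a minimal sketch of that technique, assuming the usual least-squares definition for a line forced through zero (y = k·x, with k = Σxy / Σx²), the slope can be computed as follows. The function name and the numbers are hypothetical and are not taken from the figures.

```python
def slope_through_origin(x, y):
    """Least-squares slope k for the no-intercept model y = k * x."""
    sxx = sum(a * a for a in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    return sum(a * b for a, b in zip(x, y)) / sxx


if __name__ == "__main__":
    # Hypothetical experimental and calculated values, for illustration only.
    experimental = [0.5, 1.0, 2.0, 3.0]
    calculated = [0.6, 0.9, 2.1, 2.9]
    print(slope_through_origin(experimental, calculated))
```

A slope close to 1 indicates that the calculated values track the experimental ones without a systematic scale error.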

The final equations for both algorithms and both endpoints include n(OH), which is known from the literature to be related to the activity of certain compounds: a larger number of phenolic sites could make it easier to accommodate charge or an unpaired electron after a HAT or SET step, by giving the molecule more ways to delocalize it. The other key descriptor was [BDE × f*(O)], which relates to the reactivity and susceptibility of the phenolic oxygen atom to radical attack. It makes mechanistic sense to include this descriptor in the predictive regression equation, as we are studying the reaction between phenolic compounds and free radicals, which is known from the literature to proceed via radical hydrogen atom transfer (HAT mechanism). It is important to note that all models presented Q2 ≥ 0.90 under LOOCV.

The Linear Regression algorithms in WEKA seem to be more effective for both endpoints, showing better correlation coefficients (Q2) and better error metrics than the SMOreg algorithm across all models.
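The two LinearRegression equations reported in Tables 2 and 3 can be applied directly once n(OH) and BDE × f*(O) are known for a phenol. The sketch below only transcribes those coefficients; the function name is hypothetical, the input values are made-up placeholders, and the descriptor is assumed to be in the same units used to fit the models (BDE in kJ/mol).

```python
# Coefficients transcribed from the LinearRegression rows of Tables 2 and 3:
# (coefficient of n(OH), coefficient of BDE x f*(O), intercept).
MLR_MODELS = {
    "ABTS": (0.8453, -0.0238, -1.5735),
    "DPPH": (0.8508, -0.0265, -1.6973),
}


def predict_teac(endpoint: str, n_oh: int, bde_x_fukui: float) -> float:
    """Predicted relative TEAC value for the given endpoint ("ABTS" or "DPPH")."""
    a, b, c = MLR_MODELS[endpoint]
    return a * n_oh + b * bde_x_fukui + c


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only.
    for endpoint in MLR_MODELS:
        print(endpoint, predict_teac(endpoint, n_oh=3, bde_x_fukui=30.0))
```

Predictions are only meaningful inside the applicability domain of the models, that is, for phenolic compounds comparable to those in Table 1.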

Another important point is that this QSAR work and its results comply with the OECD principles for the validation, for regulatory purposes, of (quantitative) structure-activity relationship models [28]: the work has a defined endpoint (ABTS and DPPH), applies an unambiguous algorithm (as described in the methodology), has a defined domain of applicability (phenolic compounds), presents appropriate measures of goodness-of-fit and robustness (Q2, R2 and error metrics, obtained with Leave-One-Out Cross Validation), and offers a mechanistic interpretation of the obtained models (as made clear in the previous discussion) [29].


Conclusion

The results achieve our goal of producing a regression QSAR for the RSA (measured by ABTS and DPPH separately) of phenolic compounds, based on CDFT descriptors obtained via the Density-Functional Tight-Binding method. Two models were developed for the ABTS and the DPPH antioxidant capacity tests, with Q2 ≥ 0.90 and the error metrics reported in Tables 2 and 3.

These objectives were fulfilled with fast and inexpensive models for the RSA. In addition, the results show that DFTB provides an efficient and approachable calculation framework for this type of study. It would be interesting in future studies to compare models generated with traditional DFT against models generated via DFTB, including total computation time and other metrics.

The n(OH) and [BDE × f*(O)] have proven to be effective descriptors for theoretically approaching the RSA of phenols via the mechanisms of HAT expected to be involved in the ABTS and DPPH tests. Our QSAR model makes mechanistic sense and comes accompanied by a clear, reasonable, and logical mechanistic interpretation.

Lastly, this study adds value to the current body of knowledge in this area, as we were able to generate effective and inexpensive models using only two descriptors that are directly related to chemical behavior. In contrast, models in the literature use four or more descriptors, which are not always directly linked to a chemical mechanism. Another issue in the current literature is that some studies use too few compounds to reach statistical significance. We recommend that studies in this area keep the number of molecules above 40 and the number of descriptors as low as possible.

This study has generated two effective QSAR models, at practically zero cost, for predicting the DPPH and ABTS values of RSA. Furthermore, both models comply with all five OECD QSAR validation principles for regulatory purposes.


Author Info

Andrés Halabi1,2*, Millaray Hernández1 and Hans Lenes2
1Departamento de Investigación y Desarrollo, Los Alamos National Laboratory, Valdivia, Chile
2Departamento de Ciencias Químicas, Universidad Andrés Bello, Santiago, Chile

Citation: Halabi A, Hernández M, Lenes H (2023) Accurate, Fast and Easy-To-Understand Density Functional Tight Binding and Machine Learning QSAR for the DPPH and ABTS Antioxidant Activity of Phenolic Compounds Based on No-Code Freeware. J Drug Metab Toxicol. 14:286.

Received: 04-Mar-2023, Manuscript No. JDMT-23-22028; Editor assigned: 07-Mar-2023, Pre QC No. JDMT-23-22028 (PQ); Reviewed: 24-Mar-2023, QC No. JDMT-23-22028; Revised: 28-Mar-2023, Manuscript No. JDMT-23-22028 (R); Published: 05-Apr-2023, DOI: 10.35248/2157-7609.23.14.286

Copyright: © 2023 Halabi A, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.