Determinants of Customer Satisfaction at the San Francisco International Airport

This study attempts to identify the determinants of overall satisfaction among airline passengers at the San Francisco International Airport (SFO) using the random forest classification method. The analysis is based on the 2014 annual survey conducted by SFO, which collects data on passenger demographics and on satisfaction with airport facilities and services. The results indicate that some service attributes are more important than others for passengers' overall satisfaction at SFO. The findings are expected to provide practical insights for the airport industry. In addition, this study introduces the machine learning method of random forest to tourism research.


INTRODUCTION
Researchers in tourism have generally used (i) multiple linear regression [1,2], which ignores the fact that the response is ordinal rather than interval-scaled; (ii) multinomial or ordinal logistic regression [3,4]; or (iii) a transformation of a 5-point Likert-scale response into a binary response, which is then modeled by binary logistic regression [5], although such a transformation is not necessary. The method of random forest [6] is a machine learning tool for classification and regression problems; it combines many decision trees grown on bootstrap samples to predict a multinomial response (classification) or a continuous response (regression). This study attempts to identify the factors driving the overall satisfaction of airline passengers at the San Francisco International Airport (hereafter "SFO") using the method of random forest.

LITERATURE REVIEW
Airports are complex service settings in which passenger satisfaction is influenced by a variety of attributes [7]. Known factors that influence passenger satisfaction include security checks, art displays, accessibility, airport parking, baggage, cleanliness, information availability, restrooms, restaurants, shops, staff, signage, and Wi-Fi [8][9][10][11][12]. A study of service quality at Melbourne Airport [11] found significant discrepancies between passengers' expectations and their perceptions of service quality, indicating room for improvement at that airport. Another study [13] used observations, a focus group, and in-depth interviews to determine the reasons for delays in baggage access. Researchers in hospitality and tourism have also investigated the determinants of customer satisfaction [14][15][16].

METHODOLOGY
Data collection and description of variables
SFO conducts an annual survey that collects data on passenger demographics and satisfaction with airport facilities and services from stratified random samples [15]. This study uses secondary data from the 2014 SFO annual survey, which provides a random sample of 2820 responses to 95 questions; the number of missing responses per question ranges from 0 to 2820. A total of 23 variables are selected for the analysis based on the existing literature. Multivariate imputation by chained equations (MICE) yields a complete data set and results in estimates with smaller standard errors and narrower confidence intervals [16]; the R package mice [17] is therefore used to replace the missing values.
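The chained-equations idea behind MICE can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that differ from the R package mice: every variable is treated as continuous and imputed by linear regression on all other variables plus Gaussian noise (a stochastic regression step), the cycle over variables is repeated a fixed number of times, and a single completed data set is returned rather than multiple imputations. The names `chained_imputation` and `_solve` are hypothetical. For 5-point Likert items, predictive mean matching in mice, which imputes observed donor values, would keep imputed values on the 1 to 5 scale, whereas this sketch produces non-integer values.

```python
import random
from typing import List, Optional

def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def chained_imputation(data: List[List[Optional[float]]], n_cycles: int = 10,
                       seed: int = 0) -> List[List[float]]:
    """Return one completed copy of `data` (None marks a missing cell).

    Each incomplete column is regressed on all other columns (using their
    current filled-in values) and its gaps are redrawn as prediction + noise;
    this sweep over columns is repeated n_cycles times."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    miss = [[v is None for v in row] for row in data]
    filled = [list(row) for row in data]  # copy, so the input is not modified
    for j in range(p):  # start: fill gaps with random draws of observed values
        observed = [row[j] for row in data if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        for i in range(n):
            if miss[i][j]:
                filled[i][j] = rng.choice(observed)
    for _ in range(n_cycles):
        for j in range(p):
            gaps = [i for i in range(n) if miss[i][j]]
            if not gaps:
                continue
            obs = [i for i in range(n) if not miss[i][j]]

            def design(i: int) -> List[float]:
                # intercept plus every column except the one being imputed
                return [1.0] + [filled[i][k] for k in range(p) if k != j]

            # Normal equations X'X b = X'y, with a tiny ridge term for stability.
            XtX = [[0.0] * p for _ in range(p)]
            Xty = [0.0] * p
            for i in obs:
                x = design(i)
                for a in range(p):
                    Xty[a] += x[a] * filled[i][j]
                    for c in range(p):
                        XtX[a][c] += x[a] * x[c]
            for a in range(p):
                XtX[a][a] += 1e-8
            beta = _solve(XtX, Xty)

            def fit(i: int) -> float:
                return sum(b * x for b, x in zip(beta, design(i)))

            resid = [filled[i][j] - fit(i) for i in obs]
            sd = (sum(r * r for r in resid) / max(len(obs) - p, 1)) ** 0.5
            for i in gaps:  # stochastic draw: prediction plus residual-scale noise
                filled[i][j] = fit(i) + rng.gauss(0.0, sd)
    return filled
```

Only missing cells are overwritten; observed values pass through unchanged, which is what allows before-and-after summaries of the kind reported for the study's imputation to be compared directly.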
In this study, three types of predictor variables are selected to determine the key drivers of overall satisfaction at SFO: ratings, cleanliness, and demographics. Ratings comprise 15 items: artwork exhibitions; restaurants; retail shops and concessions; signs and directions inside SFO; escalators/elevators/moving walkways; information on screens/monitors; information booths (lower level, near baggage claim); information booths (upper level, departure area); accessing and using free Wi-Fi at SFO; signs and directions on SFO airport roadways; airport parking facilities; AirTrain; long-term parking lot shuttle (bus ride); airport rental car center; and SFO Airport as a whole, the last of which is the rating of overall satisfaction used as the response (RATE_ALL). Ratings are measured on a 5-point Likert scale ranging from 1 ("Unacceptable") to 5 ("Outstanding").
Cleanliness comprises 6 items: boarding areas, AirTrain, airport rental car center, airport restaurants, restrooms, and overall SFO cleanliness. Cleanliness is measured on a 5-point Likert scale ranging from 1 ("Dirty") through 3 ("Average") to 5 ("Clean"). Age, gender, and income are the demographic variables, with age categorized into 7 levels, gender into 3 levels, and income into 5 levels. Table 1 summarizes the variables selected in this study.

Method of random forest
The analyses are performed using the statistical software environment R [18]. The method of random forest is used to build a predictive model for overall satisfaction as a function of the 22 selected predictors. Random forest is an ensemble machine learning method for classification or regression that builds a large number of decision trees in the training step and, for classification, outputs the mode of the classes predicted by the individual trees [4,19,20].
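The training-and-voting mechanics just described can be sketched in plain Python. This is a toy illustration under stated assumptions, not how randomForest grows its trees: each "tree" here is a hypothetical one-level tree (`fit_stump`) that splits on a single randomly chosen categorical predictor, whereas randomForest grows full CART trees and samples candidate predictors at every split. Each tree is trained on a bootstrap sample, the forest predicts the mode of the trees' votes, and the out-of-bag (OOB) accuracy scores each observation using only the trees whose bootstrap samples excluded it. Vote ties are broken by first occurrence here, whereas randomForest breaks ties at random.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Hypothetical base learner: a one-level tree with a multiway split on one
    categorical predictor, predicting the majority class for each level."""
    counts = {}
    for row, y in zip(rows, labels):
        counts.setdefault(row[feature], Counter())[y] += 1
    fallback = Counter(labels).most_common(1)[0][0]  # for levels absent from the sample
    table = {level: c.most_common(1)[0][0] for level, c in counts.items()}
    return lambda row: table.get(row[feature], fallback)

def fit_forest(rows, labels, n_trees=100, seed=0):
    """Grow n_trees base learners, each on a bootstrap sample and a random predictor."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        feature = rng.randrange(p)
        tree = fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        out_of_bag = set(range(n)) - set(idx)  # rows this tree never saw
        forest.append((tree, out_of_bag))
    return forest

def predict(forest, row):
    """Forest prediction: the mode of the individual trees' predicted classes."""
    votes = Counter(tree(row) for tree, _ in forest)
    return votes.most_common(1)[0][0]

def oob_accuracy(forest, rows, labels):
    """Score each row using only the trees whose bootstrap sample excluded it."""
    correct = scored = 0
    for i, (row, y) in enumerate(zip(rows, labels)):
        votes = Counter(tree(row) for tree, oob in forest if i in oob)
        if votes:  # a row can, rarely, be in-bag for every tree
            scored += 1
            correct += votes.most_common(1)[0][0] == y
    return correct / scored
```

For instance, `rows` could hold each respondent's predictor codes and `labels` the corresponding RATE_ALL ratings; `oob_accuracy(fit_forest(rows, labels), rows, labels)` then plays the role of the OOB accuracy that randomForest reports.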
This study uses the R package randomForest [21] to perform the random forest analysis. The package reports out-of-bag (OOB) estimates of prediction accuracy, in which each observation is predicted only by the trees whose bootstrap samples did not include it, as well as measures of predictor importance, which can be plotted. The package is used iteratively, adding and dropping predictors until a final model with good prediction accuracy is obtained. The association between the response variable and each individual predictor is further tested by the chi-square test of independence; in the majority of cases, the expected frequencies of several cells are less than 5, so the chi-square approximation is unreliable and the p-values are evaluated by bootstrap [22].
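The iterative dropping of predictors can be sketched as importance-ordered backward elimination: repeatedly remove the least important predictor while accuracy stays close to that of the full model. Both callables are hypothetical stand-ins: `score(predictors)` would refit the forest on those predictors and return its OOB accuracy, and `importance(predictors)` would return the refitted model's importance for each predictor. The tolerance `tol` is an illustrative assumption, not a value used in this study.

```python
from typing import Callable, Dict, List, Sequence

def backward_by_importance(
    predictors: Sequence[str],
    score: Callable[[List[str]], float],
    importance: Callable[[List[str]], Dict[str, float]],
    tol: float = 0.01,
) -> List[str]:
    """Drop the least important predictor, one at a time, while the score of
    the reduced model stays within `tol` of the full model's score."""
    current = list(predictors)
    baseline = score(current)  # e.g. OOB accuracy of the full model
    while len(current) > 1:
        imp = importance(current)  # importance from the model refitted on `current`
        weakest = min(current, key=lambda p: imp[p])
        candidate = [p for p in current if p != weakest]
        if score(candidate) < baseline - tol:
            break  # dropping the weakest predictor costs too much accuracy
        current = candidate
    return current
```

Recomputing importance after each removal, rather than ranking once, mirrors refitting the forest at every step; a fixed ranking from the full model would be the cheaper alternative.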

Performance measures for prediction
A large number of performance measures for multi-level classifiers exist in the machine learning literature [23]. Accuracy, precision, recall, and F1 (the harmonic mean of precision and recall) are commonly used [24,25]. To compute these measures, the confusion matrix is first calculated. Since the response has five categories, the confusion matrix is a $5 \times 5$ matrix of cell frequencies $C_{i,j}$, where $C_{i,j}$ is the number of times a true response $j$ is predicted as $i$ ($i, j = 1, 2, \dots, 5$). There are examples in the literature in which a multi-level classification or prediction problem is transformed into a binary classification so that binary logistic regression can be used [3]; for this reason, the overall ratings are also transformed as follows: "Unacceptable (1)", "Below Average (2)", and "Average (3)" = 0; "Good (4)" and "Outstanding (5)" = 1. The performance measures are then recalculated on this binary response; these are referred to as binary accuracy, precision, recall, and F1 in this study.

Data imputation
Table 3 shows that multivariate imputation by chained equations (MICE) performed well for this data set: the five-number summaries of the data before and after imputation are very close to each other.
The stacked bar chart of the Wi-Fi service rating (RATE_WIFI) by gate (Figure 1) shows that the majority of SFO passengers give the Wi-Fi service at SFO a rating of 4 or 5. Figure 1 further suggests that the proportions of Wi-Fi ratings 1 through 5 are similar across the gates, i.e., that there is no association between Wi-Fi rating and gate; the chi-square test of association between gate and Wi-Fi rating (p = 0.18) is consistent with this, giving no evidence that Wi-Fi quality differs between gates. Figures 2 and 3 show stacked bar charts of eight of the rating predictors by the response variable, overall satisfaction with SFO (RATE_ALL).
All of the bar charts suggest an association between the response and the predictor, and the chi-square test of independence confirms this: Table 4 shows a statistically significant association between the response variable and each of the potential predictors. Figure 4 shows the stacked bar charts of age (AGE) and gender (GENDER) by the response variable, overall satisfaction with SFO (RATE_ALL), and suggests that overall satisfaction with SFO does not vary with age or gender. Table 5 shows the results of the chi-square test of independence between the response variable and these two demographic variables; the associations are not significant (p > 0.05).
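Because several expected cell counts in these tables are below 5, the p-values are obtained by resampling rather than from the asymptotic chi-square distribution. The sketch below shows one such approach, written as a Monte Carlo permutation test: shuffling one variable generates tables under independence with both margins fixed, and the p-value is (hits + 1) / (n_sim + 1), the convention used by R's `chisq.test` with `simulate.p.value = TRUE`. Whether the bootstrap in [22] resamples in exactly this way is an assumption, and the function names are hypothetical.

```python
import random
from collections import Counter

def chi_square_stat(xs, ys):
    """Pearson chi-square statistic for the contingency table of xs by ys."""
    n = len(xs)
    row, col = Counter(xs), Counter(ys)
    obs = Counter(zip(xs, ys))  # missing cells count as 0
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (obs[(a, b)] - expected) ** 2 / expected
    return stat

def monte_carlo_p_value(xs, ys, n_sim=2000, seed=0):
    """Share of independence-simulated tables at least as extreme as the data.

    Shuffling ys breaks any association while keeping both margins fixed."""
    rng = random.Random(seed)
    observed = chi_square_stat(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if chi_square_stat(xs, shuffled) >= observed - 1e-12:  # float tolerance
            hits += 1
    return (hits + 1) / (n_sim + 1)
```

Applied to, for example, gate and Wi-Fi rating codes, `monte_carlo_p_value(gates, wifi_ratings)` would give a p-value of the kind reported for Figure 1; the `+ 1` terms keep the estimate away from an impossible p = 0.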

The random forest model
The backward selection procedure is used to find the important predictors of the response variable, overall satisfaction with SFO (RATE_ALL). Table 6 shows the multi-level confusion matrix of the full random forest model, in which the response is modeled as a function of all 22 potential predictors, and Table 7 shows the binary confusion matrix obtained from Table 6. Tables 6 and 7 show that the full random forest model has an accuracy of 75% and a binary accuracy of 98.5%. Figure 5 shows the variable importance measures for the full random forest model; gender (GENDER), language (LANG), age (AGE), and income (INCOME) are the least important predictors in this model, and overall SFO cleanliness (CLEANLINESS_ALL), signs and directions inside SFO (RATE_SIGN), artwork exhibitions (RATE_ART), and restaurants (RATE_FOOD) are the most important ones. The key drivers of overall satisfaction were obtained by successively removing the least important predictors, starting from the bottom of Figure 5; the retained predictors are signs and directions inside SFO (RATE_SIGN), overall SFO cleanliness (CLEANLINESS_ALL), signs and directions on SFO airport roadways (RATE_ROADS), artwork exhibitions (RATE_ART), retail shops and concessions (RATE_STORE), restaurants (RATE_FOOD), airport rental car center (RATE_RENTAL), and accessing and using free Wi-Fi at SFO (RATE_WIFI). Table 8 shows the multi-level confusion matrix, and Table 9 the binary confusion matrix, for the final random forest model. The OOB accuracy of the final random forest model (74.6%) is very close to that of the full model (75.5%). Figure 6 shows the variable importance of the predictors in the final random forest model.
Table 3: Results of data imputation by MICE: number of missing values, and five-number summary of data before (B) and after (A) data imputation.
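The accuracy figures for Tables 6 to 9 follow from the measures defined under "Performance measures for prediction". The sketch below computes them from a confusion matrix laid out as in this study ($C_{i,j}$ counts true class $j$ predicted as class $i$, so rows are predictions and columns are true classes, indexed from 0 here). Precision uses row sums, recall uses column sums, F1 is their harmonic mean, and `collapse` merges rows and columns into the binary matrix. Returning 0.0 when a denominator is zero is an assumption, and the matrix in the test is made up, not taken from Table 6 or 8.

```python
from typing import Dict, List

Matrix = List[List[int]]  # C[i][j] = count of true class j predicted as class i

def accuracy(C: Matrix) -> float:
    """Share of all cases on the diagonal (predicted class == true class)."""
    total = sum(map(sum, C))
    return sum(C[k][k] for k in range(len(C))) / total

def precision(C: Matrix, k: int) -> float:
    predicted_k = sum(C[k])  # row k: everything predicted as class k
    return C[k][k] / predicted_k if predicted_k else 0.0

def recall(C: Matrix, k: int) -> float:
    true_k = sum(row[k] for row in C)  # column k: everything truly in class k
    return C[k][k] / true_k if true_k else 0.0

def f1(C: Matrix, k: int) -> float:
    p, r = precision(C, k), recall(C, k)
    return 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of p and r

def collapse(C: Matrix, group: Dict[int, int]) -> Matrix:
    """Merge rows and columns; group[k] is the binary class (0 or 1) of level index k."""
    B = [[0, 0], [0, 0]]
    for i, row in enumerate(C):
        for j, count in enumerate(row):
            B[group[i]][group[j]] += count
    return B
```

For the grouping described under "Performance measures for prediction" (levels 1 to 3 as 0, levels 4 and 5 as 1), pass `group = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}`, since the indices are 0-based; `accuracy(collapse(C, group))` is then the binary accuracy.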

DISCUSSIONS AND IMPLICATIONS
This study introduces the machine learning tool of random forest to the tourism literature and shows the applicability of this approach to determining the drivers of passenger satisfaction, using data from the 2014 SFO customer satisfaction survey. The methods used in this study (data imputation and a random forest predictive model) and the performance measures computed for the multi-level response (precision, recall, and F1) are taken from the machine learning literature and applied to the analysis of SFO customer satisfaction data. These methods can be applied to other modeling situations in which the response variable is multi-level, without transforming it into a binary response or using methods such as multiple linear regression, which are not appropriate for ordinal data.
Overall, this study suggests that the key drivers of overall satisfaction at SFO are artwork and exhibitions, restaurants, retail shops and concessions, signs and directions inside SFO, signs and directions on SFO airport roadways, the airport rental car center, accessing and using free Wi-Fi at SFO, and overall cleanliness of SFO. Among these key drivers, overall cleanliness of SFO, signs and directions inside SFO, artwork and exhibitions, and restaurants are the most important.
Several limitations exist in this study. The results cannot be generalized, as the data come from a single airport and from 2014 only. Moreover, there is no 'typical' airport in terms of the services and facilities provided [26][27][28][29]. Future research could therefore replicate this study for different years and different sizes of airports. Additionally, this study did not use the entire list of variables from the SFO survey. Future studies are encouraged to include a broader variety of predictor variables to determine the drivers of passengers' overall satisfaction.
Table 9: Confusion matrix of the random forest model for RATE_ALL using the final predictors for the binary response (Unacceptable and Below Average = 0; Average, Good, or Outstanding = 1), obtained by collapsing rows and columns of Table 8.