Keywords: 3D objects; Military vehicle; Graphic Training Aid (GTA)
The problem of misidentification
Perception and identification of military vehicles is an important function in both live combat and unmanned operations. According to the Unmanned Systems Integrated Roadmap Executive Summary, target identification and designation is the second highest priority for future unmanned systems. Identification has been defined as “a decision about an object’s unique identity requiring subjects to discriminate between similar objects that involves generalizations across some shape changes as well as physical translation and rotation”. Within a military context, it is often referred to as combat identification (CID) and has been defined as “the means to positively identify friendly, hostile, and neutral platforms to reduce fratricide due to misidentification, and to maximize effective use of weapons systems”. An individual’s inability to properly identify military vehicles can lead to costly errors. Sometimes referred to as “misidentification,” these errors occur when an object in the world is perceived as a different object. Within the scope of this article, misidentifying military vehicles in the real world can lead to many types of errors, including human injury or loss of life [4,5].
Misidentification within the military domain is often termed fratricide, or “blue-on-blue”. This refers to misidentification incidents in which allied combat forces fire upon one another. As an example, it has been suggested that overall U.S. and U.K. casualties/injuries due to friendly fire in the first Gulf War were as high as 25 to 30 percent [6,7]. Although many factors are involved in these types of mistakes (e.g., command decisions, communication, and fog of war), most often the errors are attributable to the individual war fighter. Therefore, it is pertinent to design training that leads to high levels of performance on identification tasks to best reduce misidentification on the battlefield.
One reason that misidentification is so prominent in military operations is the similarity in the visual appearance of military vehicles [9,10]. Military vehicles tend to share structural similarities regardless of their country of origin (Figure 1), and these similarities make decisions difficult, especially when individuals are forced to decide hastily in the midst of battlefield environments [3,4,9,11]. The majority of military vehicles are so similar in appearance that without knowledge of specific cues, it is nearly impossible to distinguish one from another. Especially from a frontal view, the lack of distinguishable cues can lead to quick and often incorrect decision making. It has also been demonstrated that these vehicles have highly similar shapes, sizes, and spatial relationships/locations between their components, making the process of telling them apart profoundly difficult. Therefore, this paper examined a novel method for training expertise, specifically by comparing three-dimensional (3D) objects to standardized methods of training (Figure 1).
Classification of objects and expertise
To design appropriate training for identifying similar vehicles, it is important to understand how the human mind categorizes objects. Insightful research on object categorization has been published in the works of Biederman and Shiffrar. Their work demonstrates how humans classify objects based on familiarity and expertise with the objects in question. One of the critical findings of their work has been that humans tend to group objects along three hierarchical levels: superordinate category (e.g., furniture), basic category (e.g., table), and subordinate category (e.g., coffee table). Most humans can classify at a basic level almost instantaneously [12,13]. What is significant about this previous research on object categorization is that, with expertise, the level at which objects are most readily categorized shifts from the basic level to the subordinate level. This indicates that training that strives for expertise should more readily allow individuals to arrive at correct identification of subordinate categories. Using the domain of military vehicle identification, one may arrive at a superordinate level (military vehicle), a basic level (tank), and a subordinate level (M1A1 Abrams). If individuals can be trained in an “expert-like” fashion, then their ability to quickly and effectively classify the vehicles they encounter will be enhanced. Three-dimensional (3D) objects may provide information that simply cannot be conveyed by other media.
Using 3D training materials
3D objects may be one way to effectively train novices to categorize at the subordinate level for military vehicles. It could be argued that 3D objects provide a wealth of visual information (e.g., depth and multiple views) compared to their two-dimensional (2D) counterparts [14,15]. Although research has not been conducted specifically on 3D objects, investigations in the domains of education, psychology, medicine, and military training have all found evidence demonstrating that 3D imagery, when used as a training medium, can produce high performance outcomes [15,16-18]. For example, Kim found that students who were trained using a stereoscopic 3D image of the earth scored significantly higher on a test of plate tectonics knowledge. Other studies have demonstrated that integration of 3D images can be promising in enhancing students’ learning of anatomy [19-21]. Additionally, Nicholson et al. (2008) found that training using 3D computer-generated models led to significant increases in students’ knowledge of 3D relationships within the ear. Moreover, Hu found that 3D visualizations during surgical planning led to significant decreases in workload and reductions in preparation time. Also, 3D stereoscopic images improve students’ abilities to visualize, giving even more promise to this type of media [19,22]. Previous work by the authors [3,14,23,24] examining the effects of objects for training military vehicle identification has been consistent in finding performance-enhancing effects for using physical, scaled objects (e.g., die cast scale models). Furthermore, others have found that using real-world objects as props leads to faster acquisition times for learning how to interact with simulation training media.
Based on the need to experimentally establish the effects of using 3D materials (namely, 1:35 scale models) to train military vehicle identification, current methods of training were used as a basis for comparison with the 1:35 scale model training. Specifically, military issued flashcards (GTA 17-2-013) showing multiple canonical 2D (line drawings) views and perspective 2D images from a virtual simulation, called the military deployable virtual training environment (DVTE), were used as comparison media.
Given the extant training literature supporting the use of 3D stereoscopic imagery and its effects in other highly visual domains (e.g., anatomy education), we hypothesized that the use of 3D objects to study a set of military vehicles would lead to powerful training outcomes. Specifically, our main hypothesis stated the following:
Hypothesis HR: Individuals trained to memorize a set of military vehicles using 1:35 scaled replicas were expected to significantly outperform individuals who were instead trained using (HRa) 2D canonical views (in the form of military issued flashcards (GTA 17-2-013)) or (HRb) perspective 2D images (from the military deployable virtual training environment (DVTE)). This effect should be consistent across multiple measures of performance (recognition, alliance categorization, and identification).
Fifty-five undergraduate students, 36 males and 19 females, were recruited from a large southeastern university. No effects due to gender were found in the sample. Participants’ ages ranged from 18 to 33 (M=19), and none of the participants had previous military experience. All participants had either 20/20 vision or corrected 20/20 vision. Participants received course credit as compensation for their time.
As described above, three applied training methods were used (Figure 2): (a) for the 2D canonical views, military issued training cards containing black and white line drawings were used; (b) 2D polygon-based, perspective, virtual views were obtained from a military issued training simulation; and (c) the 3D physical models were commercial, off-the-shelf (COTS) 1:35 die cast scale models of military vehicles.
Canonical 2D views: The military issued cards (GTA 17-2-013) each contained three images of a vehicle, drawn with black lines on a white background. Each image corresponded to one canonical view: front, side, and a perspective view from 45 degrees off centerline (between front and side views).
Perspective virtual 2D views: The 2D, perspective, virtual views were obtained from a military virtual simulation, called the deployable virtual training environment (DVTE). Because the vehicle models in the virtual simulation were pre-fabricated and we could not change the source code, the vehicles’ colors could not be altered for this study. The color of the vehicles differed somewhat from model to model, but was mostly a uniform sand or olive color.
Physical 3D models: We used 1:35 scale models of the same military vehicles as shown in the other conditions as physical 3D models. These were either die-cast metal models that were purchased pre-assembled or models that were built from 1:35 scale plastic model kits. In order to remove color as a possible confounding variable, the scale models were all painted with a matte white finish.
Participants were given a Dell Inspiron laptop with a 16” high-resolution screen. The computer was fitted with MediaLab experimental software, which was used to display the images in the final performance measure and to automatically enter participant responses into a Microsoft Excel workbook. Participants in the DVTE-based training condition also used this laptop to interact with a Microsoft PowerPoint presentation containing the military vehicle images.
The experiment tested the effects of the three training modalities on three dependent variables (DVs): individuals’ performance in (a) recognizing the vehicles (correctly specifying whether they had seen the vehicles before), (b) assessing the allegiance of the vehicles (knowing whether the vehicle was friendly or enemy), and (c) identifying each vehicle model by name (e.g., M1A1, T-80, Challenger). To measure performance accuracy, a total percentage correct score was derived from a questionnaire filled out by participants as they viewed 66 photographs of military vehicles presented in random order (Figure 3). After the ten-second viewing time for each photograph, participants indicated whether they had seen the vehicle before (recognition), whether it was friendly or enemy (allegiance), and what the vehicle’s name was (identification). Correct answers were summed, and percentages were derived from the total of sixty-six items. The measure consisted of six photographs of each of the seven vehicles studied by the participants and six photographs of each of four vehicles that served as distracters.
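The composition of the 66-item measure and the percentage-correct score can be illustrated with a minimal Python sketch. The function names, the integer vehicle identifiers, and the seeded shuffle are hypothetical choices made for illustration; the study itself used MediaLab for presentation, and nothing here reproduces that software.

```python
import random

STUDIED_VEHICLES = 7       # vehicles seen during training
DISTRACTER_VEHICLES = 4    # vehicles never shown during training
PHOTOS_PER_VEHICLE = 6     # photographs of each vehicle in the test


def build_test_items(seed=None):
    """Return a shuffled list of (vehicle_id, photo_index, studied) tuples.

    vehicle_id is a placeholder integer, not a real vehicle designation.
    """
    items = []
    for vehicle_id in range(STUDIED_VEHICLES + DISTRACTER_VEHICLES):
        studied = vehicle_id < STUDIED_VEHICLES
        for photo_index in range(PHOTOS_PER_VEHICLE):
            items.append((vehicle_id, photo_index, studied))
    random.Random(seed).shuffle(items)  # random presentation order
    return items


def percent_correct(n_correct, n_items=66):
    """Convert a count of correct answers into a percentage of all items."""
    return 100.0 * n_correct / n_items


items = build_test_items(seed=0)
print(len(items))                                 # total test items
print(sum(1 for _, _, studied in items if studied))  # photos of studied vehicles
```

Under these assumptions, 7 × 6 studied photographs plus 4 × 6 distracter photographs give the 66 items, and each of the three tasks is scored against that same total.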
The experiment used a one-way between-subjects ANOVA design. The between-subjects factor (IV) was training modality, with three levels: 2D canonical views (military issued cards), 2D perspective virtual views (DVTE images), and 3D physical objects (1:35 scale models).
Participants were greeted and situated at a workstation in a standard laboratory. They were then handed an informed consent form and a biographical data form asking age, gender, military experience, and visual acuity (corrected and non-corrected). Participants then began training according to their condition.
Training: At the beginning of training, participants were told that they would be studying a set of military vehicles and would later have to know whether they had seen the vehicles before, know the allegiance of the vehicles, and identify the vehicles by name. Each participant was then randomly assigned to one of three conditions (Figure 2): (a) 2D canonical views (training cards), (b) 2D perspective, virtual views (DVTE images), or (c) 3D physical models (1:35 scale models). Participants in the card training condition viewed seven military issued armored vehicle recognition cards from Graphic Training Aid (GTA) 17-02-013. Participants in the DVTE condition viewed polygon representations of the same seven vehicles, and those in the model condition viewed scaled models, also of the same seven vehicles.
The 3D models were presented on a table in a frontal view. Participants in both the card and model groups were allowed to touch and/or move the training media if needed, whereas those in the DVTE condition could move through the presentation freely during the allotted time.
In all three conditions, an 8” × 11” information sheet (Figure 4) accompanied each vehicle. These information sheets were adapted from military vehicle training developed by Traysys. Each sheet contained the name and allegiance (i.e., friend or enemy) of the vehicle, as well as questions associated with the physical characteristics of the vehicle. A portion of the physical characteristics were global features; that is, they were not unique to the individual vehicles (e.g., wheel count). Conversely, some of these characteristics were considered to be “critical cues”, which were distinct visual features of one particular vehicle. Emulating the question and answer format devised by Bramley, participants were asked to study each vehicle and attempt to answer the questions on the front of the sheet before looking at the back of the sheet. However, they were able to review each sheet as much as necessary throughout the twenty minutes. After the twenty-minute training, the materials were removed and the testing portion of the experiment began.
Testing: After training was complete, the Dell laptop was placed in front of the participants and the MediaLab software was started. The MediaLab software, once initiated, provided participants with directions and described the three testing tasks again. The participants then viewed 66 photographs of military vehicles presented in random order (Figure 3): six photographs of each of the seven vehicles studied by the participants and six photographs of each of four vehicles that served as distracters. Once the presentation began, participants had ten seconds to examine each vehicle photograph, followed by a screen prompting them to answer the questions on their response sheets. Participants were given as much time as needed to fill out the appropriate answers on their questionnaire before progressing to the next item. After all sixty-six photographs were presented and participants had finished filling out the performance measures, they received post-participation information on the experiment and were given course credit.
A one-way ANOVA with planned comparisons (physical models versus cards and simulated vehicle images) was conducted for each of the three performance measures (i.e., recognition, allegiance assessment, and identification). Figure 5 depicts a line graph of the effects of training modality on performance (i.e., number of correct items) by task type.
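For readers who want to see the mechanics of this analysis, the following is a minimal Python sketch of a one-way between-subjects ANOVA followed by a planned contrast that uses the pooled within-group error term. The contrast weights (1, -0.5, -0.5), the function names, and the toy scores are assumptions made for illustration; the article does not report the weights or the raw data, and the original analysis was presumably run in a statistics package rather than with code like this.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_within = ss_within / df_within
    f_value = (ss_between / df_between) / ms_within
    return f_value, df_between, df_within, ms_within


def planned_contrast(groups, weights):
    """Return (t, df) for a contrast tested against the pooled ANOVA error term."""
    _, _, df_within, ms_within = one_way_anova(groups)
    estimate = sum(w * mean(g) for w, g in zip(weights, groups))
    std_error = (ms_within * sum(w ** 2 / len(g)
                                 for w, g in zip(weights, groups))) ** 0.5
    return estimate / std_error, df_within


# Toy scores only (hypothetical, NOT the study's data): models, cards, images.
toy = [[4, 5, 6], [1, 2, 3], [1, 2, 3]]
f_value, df_b, df_w, _ = one_way_anova(toy)
t_value, t_df = planned_contrast(toy, weights=(1, -0.5, -0.5))
print(f_value, df_b, df_w)
print(t_value, t_df)
```

With 55 participants in three groups, the same functions would yield the degrees of freedom reported in the results, 2 and 52 for the omnibus test and 52 for the contrast.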
Recognition: Performance on the recognition task did not differ significantly between conditions, F(2, 52)=1.486, p=.24. Although the difference was not significant, the mean for the physical model condition was higher (M=43.65) than the means for the other two conditions (cards, M=41.61; images, M=40.47).
Allegiance: There was a significant effect of training modality on performance in assessing the allegiance of the vehicle (i.e., friend/enemy differentiation), F(2, 52)=3.715, p<.05, η²=.125. Planned comparisons demonstrated that model training produced higher scores (M=39.15, SD=7.37) than both card training (M=33.22, SD=8.37) and simulated vehicle training (M=32.884, SD=8.24), t(52)=2.725, p=.009.
Identification: Scale model training led to significantly higher identification scores than the other two training modalities. As predicted, there was a statistically significant difference between the three conditions, F(2, 52)=3.228, p<.05, η²=.11. The model condition (M=33.3, SD=7.95) had significantly higher identification scores than the simulated vehicle (M=25.3, SD=12.4) and card training (M=27.22, SD=9.8) conditions, t(52)=2.49, p=.016.
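The effect sizes reported for the allegiance and identification tests can be checked from the F statistics alone, because in a one-way ANOVA eta squared equals (F × df_between) / (F × df_between + df_within). The short Python sketch below applies that identity to the reported values, assuming degrees of freedom of 2 and 52 for both tests; the function name is illustrative.

```python
def eta_squared(f_value, df_between, df_within):
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom."""
    explained = f_value * df_between
    return explained / (explained + df_within)


allegiance = eta_squared(3.715, 2, 52)
identification = eta_squared(3.228, 2, 52)
print(round(allegiance, 3), round(identification, 2))  # prints: 0.125 0.11
```

Both values agree with the effect sizes given in the text, which is consistent with degrees of freedom of 2 and 52 for each test.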
The results suggest that there are training benefits when using physical 3D (1:35th scale) models, in agreement with previous research [3,14,23]. The models enhanced novice training on tasks of discriminating and identifying military vehicles above and beyond that of military issued cards and simulation vehicles. Participants who were trained on the models outperformed the rest of the sample on both the friend/enemy and identification tasks. This suggests that training with 1:35th scale 3D models is more effective than the other training methods. Yet, due to our limited sample size and design, further empirical research should aim to replicate these results.
There are multiple reasons that 3D physical models may be better training devices. As discussed in the introduction section, the differences could be related to many possible visual factors associated with 3D, and may include individual difference factors such as higher interest for those participants who interacted with physical models rather than pictures/images or presentation software. Given that the models provide more visual information, the argument could be made that training on models simply creates more powerful referent memories for later use upon testing. Yet, this is outside the scope of this study, and would need to be addressed in future research. Another possible reason, and a potential outlet for future research, is the fact that using the physical models may have led to multi-modal learning. Touching and/or picking up the models could have led to both kinesthetic and proprioceptive information that later helped participants remember the differing vehicles more readily.
Physical models may be a sufficient, cost-effective, and practical way to familiarize trainees with real-world objects, but more research is needed before this can be concluded definitively. In a short amount of time, trainees in the 1:35th scale model condition were able to distinguish vehicles at a subordinate level of classification, indicating performance that is closer to expertise than in the other conditions. Future research will have to focus on exactly how to integrate physical training devices into training protocols. Given some of the effects found in this study, and previous research by Biederman and Shiffrar, it is possible that models may be better suited to bring novices to expert levels quickly compared to current methods. Given the limited sample and design of this study, more research should be conducted to advance this notion of training with 1:35th scale models.
Another important outcome is the finding that relatively short training times (approximately 20 minutes) can lead to acceptable performance outcomes. Even though 1:35th scale models did lead to significantly stronger results, the means across all three groups demonstrated that the trainees could recognize at least 60% of the vehicles, know the allegiance of at least 50% of the vehicles, and identify at least 30% of the vehicles they were tested on. Overall, this implies that training is an important factor in learning military vehicles. Within the scope of this study, it appears that 1:35th scale models can lead to better performance outcomes, but this conclusion is greatly limited by our sample size and study design. Further development of this type of training could potentially lead to fast acquisition times for trainees studying objects in multiple domains.
This research is an incremental step in a new direction for training object identification tasks. From the results found here and from previous literature showing expertise development over very short time intervals (e.g., chicken sexing), we believe that training protocols could be developed using scale models to quickly train identification skills. The further development of training paradigms, creation of reliable and valid measures, and the further integration of 3D attributes into the science of simulation and training must all be considered for future work in this area. It is not enough to be able to simply recognize an object. For proper identification to occur, trainees must quickly, reliably, and efficiently match their perceptions to their memory.
Future research will have to further investigate which aspects of physical scale models produced the training outcomes. Investigations using modern stereoscopic 3D computing systems (e.g., NVIDIA GeForce) could examine whether stereoscopy alone has the same effect as actual objects. Future endeavors in this domain must examine the impact on short- and long-term memory when individuals physically handle an object. Future research will also need to focus on separating the haptic effects of using physical scaled models from the stereoscopic visual properties of said media.
Finally, future research should aim to measure a sample that is closer to the population of interest (i.e., infantry soldiers). This sample was limited in that undergraduates served as a proxy for soldiers. Future research will be much more generalizable if a sample from the relevant population is acquired.