Emotional Responses to Information and Warning Sounds
Journal of Ergonomics

Open Access

ISSN: 2165-7556

Research Article - (2012) Volume 2, Issue 3

Daniel Vastfjall1*, Penny Bergman2, Anders Sköld2, Ana Tajadura3 and Pontus Larsson3
1Department of Psychology, Linköping University, Sweden
2Department of Civil and Environmental Engineering Acoustics, Linköping University, Sweden
3Department of Civil and Environmental Engineering, Applied Acoustics, Room Acoustics, Linköping University, Sweden
*Corresponding Author: Daniel Vastfjall, Department of Psychology, Linköping University, Sweden


Two studies examined emotional reactions to warning and information sounds. Study 1 showed that warning sounds designed to convey four levels of warning could be differentiated with self-report measures of emotional reactions. Study 2 validated this finding with physiological measures of emotion. Results are discussed in relation to creation of warning and information sounds and methods for assessing the effectiveness of such sounds.

Keywords: Emotion reaction model; Skin conductance responses; SR lambda signature; Self-Assessment Manikin


Information and warning signals are becoming an integral part of many work environments. Today, commercial products such as cars and trucks, as well as handheld computing devices and mobile phones, are filled with various auditory alerts. There are good reasons for using the auditory modality to provide information: in visually complex environments, auditory information may be very beneficial [1]. Hearing is also omnidirectional, so auditory signals cannot easily be shut off or ignored [2]. Two broad classes of auditory alerts may be identified. 1) Auditory icons can be described as “caricatures of naturally occurring sounds such as bumps, scrapes, or even files hitting mailboxes” [3]. Because the sound itself carries its meaning, an auditory icon is supposed to provide the listener with information in an efficient way. 2) Earcons are abstract, musically based auditory alerts. The major difference between auditory icons and earcons is that, for an earcon, there is no obvious link between the sound (a beep, for instance) and the information about what the listener is supposed to do [3]. Earcons thus rely on the listener associating the auditory event with the intended action. While it may seem obvious that auditory icons are preferable, earcons have the distinct advantage of generalizability: the same or a similar warning can be used for several different events of the same importance.

In many modern systems with human-machine interaction the information load can be high, and auditory icons are thus a great support for focusing attention on different events. One example of such a situation is driving a vehicle. Here, signals of different urgency levels are needed: a sound drawing attention to some kind of display telling you that it is time to visit a garage for periodic maintenance, a somewhat more urgent signal telling you that the engine is hot or the oil level is low, and, at the highest level, driving support systems telling you that you are about to crash into another vehicle. In the latter situation immediate action is required, which places high demands on the clarity of the information carried as well as on the reaction time to the sound.

The development of earcons is today to a large extent based on basic psychoacoustics. Thorough research has been conducted on how to design abstract sounds [4], but the cognitive response linked to the sound is much less well understood [5]. It is therefore important to systematically investigate, and be able to measure, how the sound is comprehended. Edworthy and Hellier [5] suggest that abstract sounds can be interpreted very differently depending on the many possible meanings that can be linked to a sound, which largely depend on the surrounding environment and the listener. Designing sounds with unambiguous and appropriate meaning is perhaps the most important task in auditory warning design [5]. Another approach is to look at the urgency of the sound. Higher urgency indicates that a quicker reaction is needed, although it cannot be directly connected to the meaning. Research conducted by Haas and Casali [6] has also shown that warning sounds with higher perceived urgency produce faster response times, a measure relevant for many real-life actions in working environments (e.g., braking or pushing the correct button).

Urgency is both a cognitive and an emotional sensation with the function of motivating behavior [7]. Basic emotion research suggests that urgency is a form of cognitive preprocessing of information [8]. At its most basic level, a person in a specific environment (e.g., a truck) has a specific goal (e.g., driving the truck to its destination). When an event occurs (e.g., a warning sound is heard indicating that something is wrong), the person appraises the event on a number of psychological dimensions designed to evaluate the seriousness of the threat (unexpectedness, familiarity, and agency, including urgency). This appraisal process automatically leads to an emotion, with changes in experience, physiology, cognition, and behavior. The perception of a state of emotion is the internal event and, as a consequence of this experience, the person tries to cope with the situation by taking external or internal actions to improve the relationship between his or her goals and the environment. The more negative and the more arousing this experience is, the more serious the event is, thus calling for more immediate or urgent action. We have termed this theoretical framework the Emotion Reaction Model (ERM) [9]. A key component of the ERM is that earcons (as defined here) are low-level emotional stimuli that will engage relatively direct and automatic emotional responses (e.g., brain stem responses). Auditory icons (as defined here), on the other hand, will involve higher-level processing, with linkage to episodic memories etc., before emotional responses are elicited.

Much of the previous research on warning sounds has relied on urgency as a measure of the effectiveness of the design [2]. In the present research we take a more general approach and measure how various warning sounds elicit emotional responses. In our previous research, we have measured emotion as a combination of two principal dimensions: valence and activation [10,11]. Valence is a basic dimension of all emotional responses; it ranges from negative through neutral to positive. Activation (or arousal) is a second, orthogonal dimension of experience that relates to how active vs. passive the experience is [12]. The two dimensions are simply a parsimonious description of the basic building blocks of our emotional experiences. The actual experience (e.g., sadness) is a distinct feeling state that can be described as a combination of these two dimensions (negative valence and low activation). A feeling of high urgency can, in this framework, be seen as a state of high activation and negative valence. Low urgency would entail a sensation of low activation and positive valence (e.g., calmness).

In the present research we measured emotional responses to information and warning sounds using a self-report measure of valence and activation (Experiment 1) and physiological indicators of the same dimensions, Event-Related Facial EMG (valence) and Skin Conductance Responses (SCRs) (Experiment 2). We varied sounds according to previous research in urgency level ranging from very low (information), low (caution) to high (warning) and very high (severe warning). We expected that a shift from low to high urgency would be evident in both the valence and activation measures as indexed by both self-reports and physiology.

Materials and Methods

Experiment 1

In the first experiment a set of information and warning sounds was created. The sounds were all abstract earcons. A main design parameter was to achieve different levels of warning without using changes in sound level. Thus, we used a set of loudness-equalized sounds differing in character.

Participants: Twenty-six participants (12 men and 14 women) took part in the experiment. All participants were tested for normal hearing. The median age was 25 years (range 21 to 33 years).


Stimuli: The stimuli were earcons at 4 different warning levels in 3 different designs, referred to as Concept I, II and III (Table 1). The sounds were adjusted to have equal loudness and to be as similar in length as possible, given the periodicity of the looping featured in each design. Thus, 12 sounds in total were used for this study. Other sounds of similar type and length were also included in the experiment, but they are not part of this study.

| Concept | Sound | Description |
| --- | --- | --- |
| I | Info | A bell-like, dong-ding, sound. 2 tones, a fifth apart with an amplitude vibrato in the decay. Lowest tone (LT): 290 Hz. |
| I | Caution | Two beep-sounds followed by an echo, repeated twice. LT: 734 Hz, 14 dB lower than a tone at 1469 Hz. |
| I | Warning | 0.8 s long honking sound repeated thrice with a half-second silence in between. LT: 83.44 Hz. |
| I | Severe Warning | Similar to the warning sound, but 0.2 s long with 4 repetitions and 0.2 s silence in between. LT: 94.2 Hz. |
| II | Info | Clock-like (Toot-toot-toot) sound. 0.4 s sound, consisting of a mixture of two tones, 3 octaves from each other, with a short decay. Repeated thrice with no silence in between. LT: 309 Hz. |
| II | Caution | Similar to the info-sound, like an artificial bell. Lasts 0.75 s, repeated twice with a longer decay and 0.7 s silence in between. LT: 309 Hz. |
| II | Warning | A 0.6 s sound divided in two parts, both consisting of two tones, 3 octaves from each other. The first part has a longer decay and the second part a short decay. Repeated thrice with 0.5 s in between. LT: 277 Hz. |
| II | Severe Warning | 4 beeps of 0.1 s each, repeated four times with 0.1 s silence in between. LT: 297 Hz. |
| III | Info | A short knocking sound repeated twice in blocks of two knocks. LT: 1760 Hz. |
| III | Caution | A beep sound consisting of 2 tones, 3 octaves in between, repeated twice in blocks of two beeps. Each beep lasts about 0.1 s. LT: 985 Hz. |
| III | Warning | A rougher beep sound. Contains 3 strong tones in dissonant intervals, repeated 4 times. Each beep lasts about 0.3 s. LT: 99 Hz. |
| III | Severe Warning | A 0.1 s long beep consisting of two tones, a semitone apart, repeated quickly 4 times in a block. In total 4 blocks with 0.1 s silence in between. LT: 2470 Hz. |

Table 1: Sound descriptions.

The sounds were abstract, non-ecological earcons with 4 different warning levels. The lowest was on an informational level and the other 3 were of increasing warning levels. A short description of the sounds is given in Table 1. The different concepts were created by different independent sound designers. It should be emphasized that the goal of the present study is not to determine how physical parameters of the sound relate to experience ([13]; for a discussion of this, see [14]), but rather to test whether the qualitatively different warning levels would elicit the expected emotional reactions.

Set up: The listening test took place in a room with low background noise (< 20 dB) and without any identifiable sound sources. Sound-absorbing black screens were placed around the listening position, forming a booth. All apparatus was placed outside the room. The stimuli were presented through electrostatic headphones (STAX Earspeakers, SR lambda Signature) with a Lucid DA9624 D/A converter. The sounds were presented with Microsoft PowerPoint 2003, controlled by the participant.

The experiment included affective ratings and descriptive ratings of the 12 different sounds, the latter for another study. The affective ratings were made using the paper-and-pencil version of the Self-Assessment Manikin (SAM) scale (Figure 1) [15,16], which has recently been validated for valence and activation in a sample similar to the present one. The two types of ratings were made in separate blocks, with half of the participants beginning with the affective ratings and the other half with the descriptive ratings. Thus, half of the participants had already heard the stimuli before making the affective ratings.


Figure 1: The SAM scale for Valence (upper), and Activation (lower).

Procedure: The participants were tested individually. They were first welcomed and tested for normal hearing, and then completed a questionnaire with demographic questions. After that they were instructed in how to fill in the SAM scale and how to perform the test. The participants were allowed to listen to the sounds several times and at their own pace. They were, however, asked to go with their first feeling for each sound and not to overthink it.

Results: A 3 × 4 repeated measures analysis of variance, with the three designs and the four warning levels as factors, was conducted on valence and activation separately. Greenhouse-Geisser-corrected F values were used. All effects were significant (p < 0.05): the sound concepts, the warning levels, and the interaction between the two (Tables 2 and 3). Of main importance here is the fact that both the valence and the activation scales were sensitive to the urgency level. The interaction (Concept * Level) was also significant for both Valence and Activation, indicating that the different concepts showed different patterns over the warning levels. In Figure 2, it may be seen that the three concepts all move from the lower left quadrant (low activation and positive valence) to the upper right quadrant (higher activation and negative valence) with increasing warning level. This replicates the finding in previous studies that this intermediate dimension, ranging from low activation and positive valence to high activation and negative valence, corresponds to urgency, which is an important factor for the reactions to these sounds [17]. In contrast to previous research that used sounds varying in loudness, this study was able to show the effects with loudness-equalized sounds. Given that loudness was not a factor, what caused the emotional responses to be so systematic? It is evident from Table 1 that increasing urgency is associated with a lower fundamental frequency, more roughness, more dissonance, and higher repetition frequency (all parameters found to be linked to higher urgency in previous research [2,5,13]).


Figure 2: Results from self reports, with lines between the warning levels within the concepts, i = info, sw = severe warning.

| Factor | df | F | Sig. (p) |
| --- | --- | --- | --- |
| Concept | 1.998 | 15.911 | .000 |
| Level | 2.760 | 61.885 | .000 |
| Concept*Level | 4.444 | 5.396 | .000 |

Table 2: Factorial ANOVA for Activation ratings.

| Factor | df | F | Sig. (p) |
| --- | --- | --- | --- |
| Concept | 1.928 | 5.087 | .028 |
| Level | 2.496 | 113.123 | .000 |
| Concept*Level | 3.773 | 24.809 | .000 |

Table 3: Factorial ANOVA for Valence ratings.

Overall, these findings support the prediction that increasing urgency covaries with an increase in negative valence and activation.
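As a reading aid for Tables 2 and 3, the sketch below recovers the Greenhouse-Geisser epsilon implied by each reported df. It assumes that the tabled df is the numerator df after correction and that the design is the fully within-subject 3 (Concept) × 4 (Level) layout described above; the function and variable names are illustrative and do not come from the original analysis.

```python
# Illustrative sketch: Greenhouse-Geisser epsilon implied by a corrected df.
# Assumption: reported df = epsilon * uncorrected numerator df.

# Uncorrected numerator df for a 3 (Concept) x 4 (Level) within-subject design.
UNCORRECTED_DF = {"Concept": 2, "Level": 3, "Concept*Level": 6}

# Corrected df as reported in Table 2 (Activation) and Table 3 (Valence).
ACTIVATION_DF = {"Concept": 1.998, "Level": 2.760, "Concept*Level": 4.444}
VALENCE_DF = {"Concept": 1.928, "Level": 2.496, "Concept*Level": 3.773}


def implied_epsilons(corrected_df: dict) -> dict:
    """Return corrected df divided by uncorrected df for each effect."""
    return {
        effect: round(df / UNCORRECTED_DF[effect], 3)
        for effect, df in corrected_df.items()
    }


if __name__ == "__main__":
    print("Activation:", implied_epsilons(ACTIVATION_DF))
    print("Valence:   ", implied_epsilons(VALENCE_DF))
```

An epsilon close to 1 means that the sphericity assumption is nearly met and the correction changes little; smaller values, as for the Concept*Level interaction, indicate a stronger correction.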

Experiment 2

Self-reports have a number of limitations, including participants' limited insight into internal processes, self-presentation biases, and situational incentives to respond differently [18]. While ratings of affective reactions to simple sensory stimuli are probably free of most of these problems [19], the theoretical framework described earlier suggests that physiological changes are, along with the subjective experience of emotion, one important component of how individuals react to different levels of urgency.

Several studies have shown a systematic variation in physiological systems when exposed to objects differing in valence and activation. These studies also showed a correlation between physiological indicators and self-report. Primarily these studies have been conducted using visual stimuli [20]. Some recent studies have however shown similar patterns for emotional reactions to sounds [19].

When measuring physiological reactions it is common to measure in the somatic and the sympathetic systems of the body. The somatic system (voluntary movements of the muscles) can be represented by facial electromyographic (EMG) reactions (Figure 3). These include measures of the muscle activity in the zygomaticus major region and the corrugator supercilii region [19]. Activity in the zygomaticus results in a smile and corresponds to a positive reaction, whereas activity in the corrugator results in a frown and corresponds to a negative reaction (for a fuller description of the use of EMG, see [21]). The sympathetic system (which governs involuntary, automatic reactions) can be represented by, for example, Skin Conductance Responses (SCR), usually measured at the long and index fingers (Figure 3). An increase in SCR covaries with an increase in self-reported activation [22].


Figure 3: Electrode placements.

Studies of Facial EMG responses to sounds have found mixed results. Dimberg [23] found that a 95-dB tone evoked a “negative” reaction with increased corrugator activity and an autonomic response pattern that resembled a defense reaction. A 75-dB tone, however, elicited no Facial EMG response; instead it elicited an orienting response, indicated by a distinct heart rate deceleration and fast-habituating skin conductance responses with a relatively short recovery time. Further studies using the high-intensity stimuli showed that Facial EMG was indeed responsive to the unpleasantness of the sound [24] and that there were some sex differences, in that only females reacted with a significantly increased corrugator response to the high-intensity tone [25,26]. Sköldström found that corrugator EMG activity was reactive to sound level and that there was some sensitivity to the frequency of the tone. The same study also found that rated annoyance was related to EMG activity. Jäncke [27] investigated the effects of auditory stimuli (pure tones and environmental noise) of different intensities on EMG activity and found that tones and noises of high intensity (> 85 dB) evoked strong Facial EMG reactions over muscles of the upper face. The authors [27] interpreted their findings as support for the notion that Facial EMG activity of the muscles of the upper face could serve as an indicator of sensitivity (rather than valence) to external auditory stimuli. However, in a more recent study, even newborns assessed between 24 and 72 h after birth responded with EMG activity to noises of different intensity [28].

Studies of SCR responses to sound are more consistent. Both noise-like sounds varying in intensity and frequency [29,30] and environmental sounds [1,19,23] have been found to activate appetitive and aversive responses indexed by the SCR measure.

However, loudness-equalized sounds have very rarely been used. In Experiment 2 the sounds from Experiment 1 were used, and participants' Facial EMG and SCR were measured. Given that the self-report ratings showed an increase in negative valence and activation, we expected that the Facial EMG and SCR measures would respond similarly.

Participants: Eight participants (5 men and 3 women) took part in the experiment. All participants were tested for normal hearing. The median age was 25 years (range 19 to 49 years).

Set up

Stimuli: The same set of stimuli as in the former experiment was used: four different warning levels by three different sound designs, adjusted for equal loudness according to ISO 532B.

The stimuli were presented three times to each participant, in a blocked design with four different randomizations. The measurements for this study were made together with those for another study, using other sounds of similar length and type, which improved the effect of the randomization. In total, 55 sounds were each played three times for each participant.

Set up and physiological response measurements: The stimuli were presented through electrostatic headphones (STAX Earspeakers, SR lambda Signature) with a Lucid DA9624 D/A converter. Stimuli were presented with the software Presentation, version 9.90 by Neurobs, which presents the stimuli with sufficient timing accuracy and also delivers a log file of the stimuli presented.

The physiological data were recorded with a BIOPAC Systems Inc. MP150 together with the software AcqKnowledge 3.8.1 at a sample rate of 250 samples per second. Two modules measured the electromyographic (EMG) activity from the zygomaticus major and the corrugator supercilii muscle regions in the face, and one module measured Skin Conductance Responses (SCR) at the long and the index finger of the non-dominant hand. The participants' skin was first cleaned with alcohol where the electrodes were to be placed. The electrodes were then filled with Signa Gel (Parker Laboratories, Inc.). For the EMG measurements 4 mm reusable electrodes were used, and for the SCR measurements 8 mm reusable electrodes were used. The participant was then given a resting period of 10 minutes before the start of the test, to ensure sufficient time for the skin to absorb the gel.

Procedure: The participants were tested individually in a room with low background noise (< 20 dBA). The room contained no disturbing elements, and all apparatus was kept in another room. The participants were welcomed and instructed to sit in a relaxed position and, if possible, not to move. When the electrodes had been placed in the proper position and the skin had been given time to absorb the gel, the test began. Each stimulus lasted approximately 5 seconds and was followed by a recovery period of 10 seconds before the next stimulus was played. Between the second and the third repetition of the stimuli the participants were given a rest break of 5 minutes before continuing.
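The timing figures above imply an approximate session length, which the following short calculation makes explicit. It is only a rough estimate: the stimulus duration is given as approximate, and the calculation leaves out electrode placement and the 10-minute resting period before the test. The constant and function names are illustrative.

```python
# Rough session-length estimate from the figures given in the text.
N_SOUNDS = 55        # sounds played per repetition (both studies combined)
REPETITIONS = 3      # each sound was played three times
STIMULUS_S = 5       # approximate stimulus duration in seconds
RECOVERY_S = 10      # pause after each stimulus in seconds
BREAK_MIN = 5        # rest break between the second and third repetition


def session_estimate():
    """Return (number of trials, estimated listening time in minutes)."""
    trials = N_SOUNDS * REPETITIONS
    listening_s = trials * (STIMULUS_S + RECOVERY_S)
    return trials, listening_s / 60 + BREAK_MIN


if __name__ == "__main__":
    trials, minutes = session_estimate()
    print(f"{trials} trials, about {minutes:.2f} minutes including the break")
```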

Processing of recorded signals and score treatment: The principle for the analysis of the time signals was taken from Bradley and Lang [19]. The signals were recorded from 2 seconds before each stimulus until 13 seconds after onset. Due to an equipment failure, all data from the corrugator supercilii muscle region had to be removed. For the remaining signals, movements by the participants were first detected and the data from the affected measurements were removed. This led to the removal of one data point from the SCR measurements and one data point from the zygomaticus major measurements.

For the SCR measurements, a baseline was measured as the change of the signal in the second immediately preceding each stimulus. The signal during the stimulus was averaged in 500 ms bins over a window starting one second after stimulus onset and extending three seconds onwards. The raw score was then taken as the maximum change between two bins, and the final score was calculated as the difference between the raw score and the baseline. To normalize the data, a log transformation (Log[SCR+1]) was applied [21].
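The SCR scoring steps can be sketched as follows. The text leaves several details open, so this sketch makes explicit assumptions: the scoring window runs from 1 s to 4 s after onset, "change between two bins" means the largest absolute difference between consecutive 500 ms bin means, the baseline is the same quantity computed over the pre-stimulus second, and "Log" is the natural logarithm. All function names are illustrative and are not taken from the original analysis.

```python
import math
from statistics import fmean

FS = 250        # samples per second, as stated for the recording
BIN = FS // 2   # one 500 ms bin = 125 samples


def bin_means(samples, start, stop):
    """Mean of each consecutive 500 ms bin in samples[start:stop]."""
    return [fmean(samples[i:i + BIN]) for i in range(start, stop, BIN)]


def max_change(means):
    """Largest absolute change between consecutive bin means (assumption)."""
    return max(abs(b - a) for a, b in zip(means, means[1:]))


def scr_score(epoch, onset):
    """Score one SCR epoch; `onset` is the sample index of stimulus onset.

    Raises ValueError if the baseline-corrected score is <= -1,
    because the log transform is undefined there.
    """
    # Baseline: change within the second immediately before onset.
    baseline = max_change(bin_means(epoch, onset - FS, onset))
    # Raw score: change within the window 1 s to 4 s after onset (assumption).
    raw = max_change(bin_means(epoch, onset + FS, onset + 4 * FS))
    # Final score, then the normalizing transform Log[SCR + 1].
    return math.log((raw - baseline) + 1)


if __name__ == "__main__":
    # Synthetic epoch: 2 s before onset and 13 s after, flat at 2.0,
    # with a 0.5 step up two seconds after onset.
    epoch = [2.0] * 1000 + [2.5] * 2750
    print(scr_score(epoch, onset=2 * FS))
```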

Measurements of the zygomaticus major were treated offline with a digital High Pass filter at 90 Hz, integrated over 125 ms and rectified. Baseline was measured as the average signal in the second immediately preceding each stimulus. The raw score was taken as the average over three seconds following onset of stimuli. Then the final score was calculated as the difference between the raw score and the baseline.
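A minimal sketch of the EMG scoring is given below, under stated assumptions. The input is assumed to have been high-pass filtered at 90 Hz already (the filter itself is not reproduced here), rectification is applied before integration, and integration is approximated by a trailing 125 ms moving average, which is about 31 samples at 250 samples per second. The function names are illustrative.

```python
from statistics import fmean

FS = 250                     # samples per second, as stated for the recording
WINDOW = round(0.125 * FS)   # 125 ms integration window, about 31 samples


def rectify(signal):
    """Full-wave rectification: absolute value of every sample."""
    return [abs(x) for x in signal]


def integrate(signal, window=WINDOW):
    """Trailing moving average, used here as a simple stand-in for integration."""
    out = []
    running = 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= window:
            running -= signal[i - window]   # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def emg_score(filtered_epoch, onset):
    """Baseline-corrected EMG score for one epoch.

    `filtered_epoch` is assumed to be high-pass filtered already;
    `onset` is the sample index of stimulus onset.
    """
    envelope = integrate(rectify(filtered_epoch))
    baseline = fmean(envelope[onset - FS:onset])    # 1 s before onset
    raw = fmean(envelope[onset:onset + 3 * FS])     # 3 s after onset
    return raw - baseline


if __name__ == "__main__":
    # Synthetic epoch: alternating-sign signal whose amplitude rises
    # from 1 to 3 at stimulus onset (sample 500).
    epoch = [(1.0 if i % 2 == 0 else -1.0) * (1.0 if i < 500 else 3.0)
             for i in range(3750)]
    print(emg_score(epoch, onset=500))
```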

Data exceeding three standard deviations within each stimulus were considered as outliers and removed. In the SCR measurements 10 data points were removed due to this procedure. In the zygomaticus major measurements 3 data points were removed due to this procedure.
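The outlier criterion can be sketched as below. The text does not say which reference point or which standard deviation was used, so the sketch assumes deviations from the mean score of each stimulus and the sample standard deviation. The function name and the stimulus labels in the example are hypothetical.

```python
from statistics import fmean, stdev


def remove_outliers(scores_by_stimulus, k=3.0):
    """Drop scores more than k sample SDs from the mean of their stimulus.

    `scores_by_stimulus` maps a stimulus label to a list of scores
    (at least two scores per stimulus are needed to compute an SD).
    """
    cleaned = {}
    for stimulus, scores in scores_by_stimulus.items():
        mean = fmean(scores)
        sd = stdev(scores)
        cleaned[stimulus] = [s for s in scores if abs(s - mean) <= k * sd]
    return cleaned


if __name__ == "__main__":
    # Hypothetical data: 24 scores for one stimulus, one of them extreme.
    data = {"I-info": [1.0] * 23 + [100.0], "I-warning": [0.5, 0.6, 0.7]}
    cleaned = remove_outliers(data)
    print({stimulus: len(scores) for stimulus, scores in cleaned.items()})
```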

Results: The scores from the physiological measurements were analyzed in a similar way to the data from Experiment 1, with the difference that there was a third factor in the design: the three repetitions of the sounds. Thus, a 3 × 4 × 3 repeated measures analysis of variance of the three repetitions, four warning levels and three different designs was conducted on SCR and zygomaticus major separately. Greenhouse-Geisser-corrected F values were used.

For the SCR there was a significant difference between the warning levels (p = 0.032, partial η² = .388) and between the repetitions of the stimuli (p = 0.018), but not between the concepts (η² = .247) (Table 4). The difference between the repetitions showed that the first repetition gave a significantly higher response than repetitions 2 and 3, which had similar means. No significant interactions were found.

| Factor | df | F | Sig. (p) | Partial η² |
| --- | --- | --- | --- | --- |
| Repetition | 1.080 | 8.874 | .018 | .559 |
| Concept | 1.200 | 2.298 | .166 | .247 |
| Level | 1.992 | 4.441 | .032 | .388 |
| Repetition*Concept | 2.060 | .402 | .682 | .054 |
| Level*Repetition | 1.821 | 2.057 | .170 | .227 |
| Concept*Level | 2.480 | 2.535 | .099 | .266 |

Table 4: Factorial ANOVA for SCR.
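As a consistency check on Table 4, partial eta-squared can be recomputed from each F value as F·df_effect / (F·df_effect + df_error). The sketch below assumes a fully within-subject design with all 8 participants retained in the analysis, so that df_error = df_effect × (n − 1). Uncorrected df are used, which is harmless here because the Greenhouse-Geisser correction multiplies both df by the same epsilon and therefore leaves the ratio unchanged. The function names are illustrative.

```python
# Recompute partial eta-squared for Table 4 from the reported F values.
N_PARTICIPANTS = 8   # Experiment 2 (assumes no participant was dropped)
LEVELS = {"Repetition": 3, "Concept": 3, "Level": 4}

# F values as reported in Table 4 (SCR).
TABLE_4_F = {
    ("Repetition",): 8.874,
    ("Concept",): 2.298,
    ("Level",): 4.441,
    ("Repetition", "Concept"): 0.402,
    ("Level", "Repetition"): 2.057,
    ("Concept", "Level"): 2.535,
}


def degrees_of_freedom(factors, n=N_PARTICIPANTS):
    """Uncorrected (df_effect, df_error) for a within-subject effect."""
    df_effect = 1
    for factor in factors:
        df_effect *= LEVELS[factor] - 1
    return df_effect, df_effect * (n - 1)


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared implied by an F ratio and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def recompute_table_4():
    """Return the recomputed effect size for each effect, rounded to 3 places."""
    return {
        "*".join(factors): round(
            partial_eta_squared(f, *degrees_of_freedom(factors)), 3)
        for factors, f in TABLE_4_F.items()
    }


if __name__ == "__main__":
    for effect, eta in recompute_table_4().items():
        print(f"{effect}: {eta}")
```

Under these assumptions the recomputed values agree with the effect sizes listed in Table 4 to three decimal places.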

For the Facial EMG we found significant differences for both the concepts (p = 0.002, η² = .699) and the warning levels (p = 0.003, η² = .601), but not for the repetitions of the sounds (η² = .248) (Table 5). The only significant interaction for the Facial EMG was Concept*Level (p = 0.050, η² = .337).

| Factor | df | F | Sig. (p) | Partial η² |
| --- | --- | --- | --- | --- |
| Repetition | 1.512 | 2.312 | .153 | .248 |
| Concept | 1.329 | 16.289 | .002 | .699 |
| Level | 1.698 | 10.553 | .003 | .601 |
| Repetition*Concept | 2.107 | 2.415 | .122 | .257 |
| Level*Repetition | 3.218 | .755 | .539 | .097 |
| Concept*Level | 2.195 | 3.555 | .050 | .337 |

Table 5: Factorial ANOVA for Facial EMG.

Figures 4 and 5 show physiological and self-report data in the same graphs. As can be seen, both the SCR and the Facial EMG measures mimicked the self-report ratings, with more activity for higher warning levels.


Figure 4: Results from Self reports and Physiological measurements plotted above each other. The scaling and offset between the two is adjusted for clearness of the figure.


Figure 5: Results from Self reports and Physiological measurements plotted above each other. The scaling and offset between the two is adjusted for clearness of the figure.


The aim of this study was to test whether emotional responses would differ between sounds with different levels of intended urgency. The sounds comprised four warning levels, ranging from low (informational sound, low urgency) to very high (severe warning, high urgency), in three different designs. We expected that a shift from low to high urgency would result in more negative emotions and a higher activation level. The results of Experiment 1 supported this prediction. The self-ratings showed more positive valence and less activation for the informational sound with low urgency than for the sounds of higher urgency. These changes were significant in all cases. The concepts also differed from each other, but the reaction pattern was still the same. It should be noted that in an actual implementation the sounds would differ in loudness, and the differences between the warning/urgency levels would probably be larger.

The results of Experiment 2 provided further support. The measurements of autonomic nervous activity, as well as of the zygomaticus major region, differed significantly between the levels of warning. In the SCR measurements an additional pattern emerged: repeated exposure to a sound changed the reaction to it. The first presentation produced a significantly stronger reaction than the following two repetitions (i.e., habituation). This is in line with earlier findings [20]. It implies that in highly arousing environments one could expect a smaller reaction than in a less arousing context. Habituation is thus an important issue to consider when designing warning sounds.

Rather than designing and assessing the efficiency of warning sounds from the perspective of induced urgency, we relied on the more general concept of emotional reactions. With a starting point in the overarching framework of the Emotion Reaction Model (ERM), we were able to design warning sounds that induced different emotional reactions. It should, however, be noted that a certain emotional reaction will not necessarily lead to the desired action (e.g., that drivers brake when hearing a high-urgency sound). Future research should examine how emotional reactions drive behavior in specific situations (e.g., driving). The emotional reaction also depends on the context in which the stimulus is presented. Implemented in real life, the sounds will have different loudness and be of different length, which would improve the listeners’ differentiation of the sounds. There would also be stronger differences in level of emotion between less urgent and highly urgent sounds. A question is therefore: what is the appropriate level of emotion? An increase in arousal is, up to a certain level, beneficial for task performance. Beyond that point, however, performance drops. Future research should therefore find optimal levels of arousal for warning sounds. A further benefit of using emotions as guidelines for sound design is that the cognitive association (imagery or episodic memory) to the same sounds may show substantial inter-individual variation [31]. It is possible that, by creating the emotional response, the range of possible meanings that the sound could have for the listener could be narrowed down.


Citation: Vastfjall D, Bergman P, Sköld A, Tajadura A, Larsson P (2012) Emotional Responses to Information and Warning Sounds. J Ergonomics 1:106.

Copyright: © 2012 Vastfjall D, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.