The most widely discussed philosophical question concerning music and the emotions is that of how music can express emotions. (For a thorough treatment of this question, see S. Davies 1994 and part II of Gracyk & Kania 2011.) There is a second group of questions centered around listeners' emotional responses to music. These include questions about why and how we respond emotionally to music, the value of such responses, and why we choose to listen to music that elicits ‘negative’ responses from us, such as sadness. Theorists typically restrict themselves to ‘pure’ or ‘absolute’ music for simplicity, though it is surprising how many central examples fall outside this boundary through being program music or song. The reason given for the restriction is usually that it is easier to understand how music with an accompanying text, say, could express the emotions evident in the text. On the other hand, an important criterion for the evaluation of such music is how appropriately the composer has set her chosen text to music. So an accompanying text is clearly not sufficient for the musical expression of an emotion. Thus, a better reason for initially putting such music to one side is that the interrelation of music and text, or other elements, is likely to be highly complex, and best approached with as well-developed a theory of the more basic phenomenon in hand as possible.

3.1 Emotions in the Music
Pieces of music, or performances of them, are standardly said to be happy, sad, and so on. Music's emotional expressivity is a philosophical problem since the paradigm expressers of emotions are psychological agents, who have emotions to express. Neither pieces of music, nor performances of them, are psychological agents, thus it is puzzling that such things could be said to express emotions. One immediately helpful distinction is that between expression and expressivity, or expressiveness. Expression is something persons do, namely, the outward manifestation of their emotional states. Expressivity is something artworks, and possibly other things, possess. It is presumably related in some way to expression, and yet cannot simply be expression for the reason just given. Most theorists also distinguish between expressivity and representation, claiming that music is expressive of emotions, rather than representing them. To give a non-musical example, one might paint a person crying, yet do so in a clinical style such that the painting represents the person's sadness, yet is itself not a sad painting, that is, expressive of sadness, but rather a cool, detached one. The emotions in a piece of music are thus more closely tied to it than mere descriptions of emotional states, yet not so closely related as to count as expressions of emotion simpliciter.

An obvious way to connect expressivity with expression is to argue that pieces of music or performances of them are expressions of emotion – not the piece's or performance's emotions, but rather those of the composer or performer. There are two major problems with this ‘expression theory’. The first is that neither composers nor performers often experience the emotions their music is expressive of as it is produced. Nor does it seem unlikely that a composer could create, or a performer perform, a piece expressive of an emotion that she had never experienced. This is not to deny that a composer could write a piece expressive of her emotional state, but two things must be observed. The first is that for the expression theory to be an account of musical expressivity, at least all central cases of expressivity must follow this model, which is not the case. The second is that if a composer is to express her sadness, say, by writing a sad piece, she must write the right kind of piece. In other words, if she is a bad composer she might fail to express her emotion. This brings us to the second major problem for the expression theory. If a composer can fail to express her emotions in a piece, then the music she writes is expressive independently of the emotion she is experiencing. Thus music's expressivity cannot be explained in terms of direct expression.

Those usually cited as classic expression theorists include Tolstoy (1898), Dewey (1934), and Collingwood (1938). (One classic critique is Tormey 1971, 97–127.) These theorists have been defended in recent discussions, however, from accusations that they hold the simple view outlined above. See, for example, Ridley 2003b and Robinson 2005, 229–57. Jenefer Robinson has attempted to revive the expression theory, though she defends it as an interesting and valuable use of music's expressivity, rather than an account of expressivity itself (2005, 229–347; 2011).

A second way to link music's expressiveness with actual felt emotions is through the audience. The ‘arousal theory’ is, at its simplest, the claim that a passage of music's being expressive of an emotion amounts to its tendency to arouse that emotion in an understanding listener. Some problems with this simple version can be overcome. For instance, some emotions, such as fear, require a particular kind of intentional object (something threatening), yet there is no such object at hand when we hear fearful music. Thus it seems implausible to claim the music's fearfulness resides in its arousal of fear in us. But the arousalist can broaden the class of aroused emotions to include appropriate responses to the expressed emotion, such as pity. It can also be objected that many understanding listeners are not moved to respond emotionally to music. But the arousalist can simply restrict the class of listeners to which his theory appeals to those who are so moved. The main problem with the theory seems more intractable. Essentially it is that in order for a listener to respond appropriately to the music, she must discern the emotion expressed therein. This is most obvious when the response is a sympathetic, rather than empathetic, one. The listener's response depends upon the emotion expressed, and thus the expressivity of the music cannot depend upon that response. (A sophisticated defense of the arousal theory is to be found in Matravers 1998, 145–224, though see the second thoughts in Matravers 2011.)

Despite the problems of the arousal theory as the whole story of musical expressivity, there is a growing consensus, thanks largely to the work of Jenefer Robinson (1994, 2005), that our lower-level, less cognitive responses to music must play some role in the emotional expressivity we attribute to it. However, this role is likely to be a causal one, rather than part of an analysis of what it is for music to be emotionally expressive.

At the other end of the spectrum from the expression and arousal theories is ‘associationism’ – the theory that music's expressivity is a matter of conventional association of certain musical elements, such as slow tempi, with certain emotional states, such as sadness. Again, though associations must play some role in some cases of expression – for instance, cases of particular musical instruments, such as the snare drum, being associated with particular situations, such as war – this role is likely to be a peripheral one. The main reason is the logical-priority problem, already encountered by the arousal theory. The expressivity of music seems closely related to the resemblance between the dynamic character of both the music and the emotions it is expressive of. It is implausible that funeral marches might just as easily have been in quick-paced compound-time. Even in such cases as the snare drum, it seems possible that the instrument was chosen for the battlefield in part due to the expressive character of its sonic profile.

The cliché that music is ‘the language of the emotions’ is often considered as a possible starting point for a theory of musical expressivity. The idea combines the attractive simplicity of conventionality that associationism makes the basis of music's meaning with the idea that music's order is to be understood in terms of syntax. (See Lerdahl and Jackendoff 1983 for a theory along the latter lines.) However, although Deryck Cooke (1959) and Leonard Meyer (1956) are often cited as proponents, it is not clear that anyone holds a full-blown version of the theory. The central problem is the great disparity between language and music, in terms of the ways in which each is both syntactic and semantic (Jackendoff 2011). A serious subsidiary problem is that even if music were about the emotions in the way that language can be, that would not account for music's expressivity. The sentence ‘I am sad’ is about the emotions, but it is not expressive of sadness in the way a sad face is, though I could use either to express my sadness. Most people agree that music's relation to emotion is more like that of a sad face than that of a sentence. (This last criticism is also applicable to Susanne Langer's theory (1953) that music is about the emotions in a symbolic yet non-linguistic way.)

Several theorists have defended accounts of musical expressivity known variously as resemblance, contour, or appearance theories (e.g., Budd 1995, 133–54; S. Davies 1994, 221–67; Kivy 1989, though see Kivy 2002, 31–48 for recent qualms). The central idea is that music's expressiveness consists in the resemblance of its dynamic character to the dynamic character of various aspects of human beings undergoing emotions. The aspects appealed to include the phenomenology of the experience of the emotion, the emotion's typical facial expression, the contour of vocal expression typical of a person experiencing the emotion, and the contour of bodily behavior typical of such a person, including ‘gait, attitude, air, carriage, posture, and comportment’ (S. Davies 2006, 182). Stephen Davies argues that such theories hold music to be expressive in a literal albeit secondary sense of the term. We say that a piece of music is sad in the same sense in which we say that a weeping willow is sad (S. Davies 2006, 183). Such uses are no more metaphorical than a claim that a chair has arms.

Jerrold Levinson concedes that there is an important resemblance between the contour of music expressive of an emotion and the contour of typical behavioral expressions of that emotion. He objects, however, that such an account cannot be the whole, or even the most fundamental part of the story (Levinson 1996b, 2006c). He drives in a wedge precisely at the point where an appeal is made to the resemblance between the music and typical behavioral expressions. He asks what the manner and extent of the resemblance between the two must be, precisely, in order for the music to count as expressive of some emotion. After all, as is often said, everything resembles everything else in all sorts of ways, and so one could point out many resemblances between a funeral march and an expression of joy, or for that matter a cup of coffee and sadness. The resemblance theorist must give some account of why the funeral march, and not the cup of coffee, is expressive of sadness, and not joy. Levinson claims that the obvious suggestion here is that the funeral march is ‘readily-hearable-as’ an expression of sadness. If this is correct, then the resemblance the music bears to emotional behavior is logically secondary – a cause or ground of its expressivity. The expressivity itself resides in the music's disposition to elicit the imaginative response in us of hearing the music as a literal expression of emotion. As a logical consequence, the imaginative experience prompted must include some agent whose expression the music literally is.

In reply to this kind of objection, Stephen Davies has emphasized the role of the listener's response in resemblance theories. Such responses have always been appealed to by such theories, as evidenced by Malcolm Budd's talk of ‘hearing as’ (1995, 135–7), and Peter Kivy's discussion of our tendency to ‘animate’ that which we perceive (1980, 57–9). But Davies now makes the appeal quite explicit and central, devoting as much space to explication of the response-dependent nature of expressivity as to the role of resemblance (2006). To the extent that the response is one of imaginative animation, there will be agreement between Levinson and the resemblance theorist. But Davies, at least, continues to resist Levinson's theory of imagined literal expression on two fronts.

The first is in his refusal to accord a role to imagination in our response to expressive music. For Davies, the response of the appropriate listener upon which the expressivity of the music depends is one of an experience of resemblance (2006, 181–2). In other words, the answer to the question of the manner and extent to which music must resemble some behavioral expression in order to qualify as expressive of a particular emotion is simply ‘in whatever manner and to whatever extent leads us to experience the music as resembling the emotion’. No further attempt at analysis is given, presumably because Davies believes this is the end of the philosophical line. Further explanation of our tendency to respond in this way to music will be in some other domain, such as the psychology of music. Since Davies's theory posits at base a contour-recognition experience while Levinson's posits an imaginative experience of expression, the link between literal expression and musical expressivity looks closer in Levinson's theory than in Davies's. An empirical consequence seems to be that Davies's theory will predict weaker emotional responses to music than Levinson's. Whether this is an advantage or a disadvantage of the theory depends on the empirical facts about how we respond emotionally to music.

On the second front Davies is more aggressive. He attacks the idea that we imagine a persona inhabiting the music, or giving rise to it in some way as the literal expression of its emotional experiences (1997b; 2006, 189–90; see also Kivy 2002, 135–59; 2006; Ridley 2007). The simplest objection is that there is empirical evidence that understanding listeners do not engage in any such imaginative activity. This is decisive if true, but there is plenty of room to quibble about our ability to test for the right kinds of imaginative activity, the selection of the subjects, and so on. A different kind of objection is that if the persona theory were true, expressive music could not constrain our imaginative activity in such a way as to yield convergent judgments of expressiveness among understanding listeners. That is, different people, or the same person on different occasions, could imagine a single passage as the expression of an imagined agent's anger, excitement, passionate love, and so on. The problem is compounded when we consider the narrative(s) that we might imagine to explain the agent's (or agents') emotional progress in an extended work. It is not clear even how we might individuate one such agent from another, reidentify an agent over time, and so on. These criticisms seem a little uncharitable. As mentioned above, Levinson is open to a resemblance account's playing an important role in our identification of the particular emotions expressed in a passage. So Levinson can simply help himself to whatever level of specificity of emotions expressed the best resemblance account has to offer. As for the individuation and reidentification of the imagined agents, no mention has been made of such complexities so far, and as Levinson says, these ‘are matters on which the account of basic musical expressiveness can remain agnostic’ (2006c, 204).

Since both the resemblance and ready-hearability theories make music's expressiveness a matter of response-dependence, both must answer the question of whose responses are to be taken into account. Both appeal to listeners with understanding of the kind of music under discussion. This raises the question of what counts as understanding (a matter considered in section 4, below). One thing that cannot be appealed to in this connection, though, is an ability to hear the right emotional expressivity in music, for this would render any account circular. Levinson points out that one can appeal to everything but such understanding of expressivity, and thinks that sensitivity to expressivity will come along with the rest (1996b, 109). Aside from this, though, there is the fact that some apparently understanding listeners simply deny that music is expressive of emotion. Levinson thinks we can reasonably exclude such listeners from the class whose responses are appealed to in the analysis of expressiveness, since only those generally disposed to hear expressiveness are reasonably appealed to in determining the specific expressiveness of a particular passage, which are the terms in which he puts his theory.

This suggestion raises the specter of an ‘error theory’ of music's expressivity, that is, a theory that all claims of emotional expressivity in music are strictly false. A major burden of such a theory is to explain away the widespread tendency to describe music in emotional terms. This has been attempted by arguing that such descriptions are shorthand or metaphor for purely sonic features (Urmson 1973), basic dynamic features (Hanslick 1986), purely musical features (Sharpe 1982), or aesthetic properties (Zangwill 2007). There are many problems with such views. For one thing, they are committed to some sort of scheme for reduction of expressive predicates to other terms, such as sonic or musical ones, and such a scheme is difficult to imagine (Budd 1985a, 31–6). For another, anyone not drawn to this theory is likely to reject the claim that the paraphrase captures all that is of interest and value about the passage described, precisely since it omits the expressive predicates (Davies 1994, 153–4). However, the possibility that a musical culture might fall into the grip of anti-expressivist formalism raises questions about contextualism. Perhaps Levinson, Davies, et al., are right that most people hear most music as emotionally expressive. It is a nice question, however, whether, if our musical culture fell into the grip of anti-expressivist formalism – in the future or the past – it would be appropriate to exclude ourselves from the reference class of listeners appealed to by such theories as those of Davies and Levinson. If so, this would point to a kind of high-level contextualism, or cultural relativity about the expressivity of music, making it a more contingent matter than most theorists imply. On the other hand, such an occurrence may be unlikely if our tendency to ‘animate’ non-sentient objects is deeply rooted in our biology.

3.2 Emotions in the Listener
There are two main questions asked about our emotional responses to pure music. The first is analogous to the ‘paradox of fiction’. It is not clear why we should respond emotionally to expressive music when we know that no one is undergoing the emotions expressed. The second is a variant of the ‘paradox of tragedy’. If some music arouses ‘negative’ emotional responses in us, such as sadness, why do we seek out the experience of such music? These questions are addressed in turn.

One might simply deny that we respond emotionally to music. R. A. Sharpe (2000, 1–83), while stopping short of outright denial, suggests that our emotional responses to music are a much smaller component of our understanding experience of it than the philosophical literature on the topic would suggest. (See also Zangwill 2004.) Peter Kivy (1999) goes almost all the way, arguing that those who report emotional reactions to music are confusing the pleasure they take in the beauty of the music, in all its expressive individuality, with the feeling of the emotion expressed.

Though most philosophers appeal to typical experience and empirical data to reject the plausibility of Kivy's position, they admit the problem that motivates it, namely, the conceptual tension between the nature of music and the nature of the emotions we feel in response to it. To elaborate, there is some consensus that emotions are cognitive, in the sense that they take intentional objects – they are about things – and that the nature of a given emotion's intentional object is constrained. For instance, in order to feel fear, one must believe that there is something (the intentional object) that is threatening. When one listens to a sad piece of music, however, one knows there is nothing literally feeling an emotion of sadness, and thus it is puzzling that one should be made sad by the experience.

Part of the puzzle can be resolved by acknowledging that not all emotional responses (broadly construed) are cognitive (Robinson 1994; 2005, 387–400). For instance, it is no more puzzling that one could be startled by a fortissimo blow to a bass drum than that one could so respond to a thunderclap. Similarly, we might respond non-cognitively to basic musical elements such as tension and release just as we do to the tension we observe in a balloon being over-inflated, or to the release of doves into the air.

As for higher-order emotional responses, there are at least two possible explanations. One appeals to the phenomenon of ‘emotional contagion’ or ‘mirroring responses’ (Davies 1994, 279–307; 2006, 186–8). When surrounded by moping people, one tends to become sad. Moreover, such a ‘mood’ is not about some intentional object. One is not necessarily sad for the mopers, nor whatever they are sad about, if anything. Similarly, when ‘surrounded’ by music that presents an appearance of sadness, one might become sad, but not sad about the music, or anything else (Radford 1991). Jenefer Robinson has objected that such contagion is only well documented as a form of non-cognitive response, and in response to restricted cues: ‘After all, if living with a basset hound were like living with a depressed person, would normal folk choose a basset hound as their life's companion?’ (2005, 387–8). But it may be that music's dynamic character is enough for it to cross the threshold into the realm of resemblances which elicit mirroring responses from us.

The ready-hearability theorist is at a slight advantage in accounting for our emotional responses to music's expressivity, since according to that theory one imagines that the music is a literal expression of emotion. This means that emotional responses to music's expressivity are no more puzzling than emotional responses to other expressive imagined agents, such as fictional characters in novels. The advantage is only slight because the question of how and why we respond emotionally to fictions is itself a philosophical problem of some magnitude. Nonetheless, there are several theories available (though this is not the place to go into them). One difficulty with appealing to a solution to this ‘paradox of fiction’ is that it is not clear our emotional responses to the expressivity of music are the same as those to emotionally expressive characters. For instance, the standard example of an emotional response to music is being made sad by a funeral march, while the standard example of emotional response to fiction is (something like) to feel pity for a sad character. If the former is to be explained in the same way as the latter, we would expect listeners to feel pity in response to the funeral march (pity for the persona imagined to be expressing herself through it). However, it seems reasonable to ask for more detailed examples since, on the one hand, we surely do feel sad in response to tragedy and, on the other, it is not obvious that we do not feel pity (or imagined pity, or whatever one's preferred theory of emotional response to fiction posits) in response to tragic music.

Leaving behind our consideration of how and why we respond emotionally to pure music, we turn to the question of why we seek out music that arouses ‘negative’ emotions in us, such as sadness, assuming henceforth that we are in fact aroused to such emotions. (Since this problem is a close analog of the ‘paradox of tragedy’, some of the references below are to literature not explicitly about music, but the transposition of the arguments to music is not difficult to imagine.) One obvious suggestion is that our negative emotional response is a price we are willing to pay for the other benefits of engaging with the piece in question, such as (but not limited to) ‘positive’ emotional responses. While this sort of reasoning may play a role, it cannot be a complete solution, since for most pieces that elicit negative responses there are many others that elicit fewer, or less intense negative responses for the same positive payoff. More sophisticated versions of the same suggestion argue for a more intimate link between the negative emotional response and the payoff. One such is that we cannot understand the work we are engaging with without understanding its expressivity, which brings the negative response with it (Goodman 1968, 247–51; Davies 1994, 311–20; Goldman 1995, 68; Robinson 2005, 348–78). Closely related is the benefit of an aesthetic or artistic appreciation of the expressivity responsible for the negative response.

Shifting focus from benefits located in the expressive work to those located in the emotional listener, the oldest suggestion is Aristotle's theory of catharsis, according to which our negative emotional response to negatively expressive art results in a (positive) psychological purgation of the negative emotions (Aristotle 1987, 36–9 (ch. 6)). A less therapeutic approach is the suggestion that, since these emotions are without ‘life implications’ (that is, as discussed above, we are not sad about anything), we are able to take advantage of our responses to savor these emotions, gain an understanding of them, and be reassured that we have the capacity to feel them (Levinson 1982). A question that must be answered by any defender of this kind of response is the extent to which it explains, first, our persistence in seeking out music that elicits negative emotional experiences and, second, the enjoyment we seem to take in these negative responses, as opposed to putting up with them for their related benefits.

A different kind of solution to the problem argues that responses such as sadness that are evoked by expressive music are not really negative. Hume (1757) argues, with respect to tragedy, that the pleasure we take in the mode of presentation of the content of an artwork does not simply counterbalance the negative emotion evoked, but rather subsumes and transforms it into a pleasurable feeling. Kendall Walton argues (also with respect to tragedy) that sadness is not in itself negative. Rather, it is the situation to which sadness is the response that is negative. Thus, though we would not seek out the death of a loved one, given the death we ‘welcome’ the sorrow (Walton 1990, 255–9). Similarly, we cannot affect the sadness of a musical work by not listening to it, and so we welcome our sorrowful response to it as appropriate. Walton's approach has the advantage over Hume's of not positing a rather obscure psychological process. A difficulty for both, however, is the extent to which they accord with our emotional experience in rejecting the characterization of our sadness as negative.

Stephen Davies (1994, 316–20) argues that the kinds of solutions given above construe the problem too narrowly. Though he agrees that we accept the negative responses some music elicits because we are interested in understanding it, he points out that this gives rise to the further question of why we should be so interested in understanding something that brings us pain. His short answer is ‘We are just like that’ (1994, 317), and he begs off giving the long answer, since it seems to be the equivalent of giving an account of human nature, or the meaning of life. However, he points out that human life is suffused with activities that people willingly engage in despite, or indeed partially because of, the difficulties they bring about. Many things, from watching the news, through mountain-climbing, to raising children, are fraught with well-known difficulties, including negative emotional responses. Yet we enthusiastically engage in such activities because that is the kind of creature we are.

Finally, it must be noted that though we have spoken baldly of ‘emotions’ above, there is a consensus that we do not respond to music with fully-fledged emotions. Most obviously, such responses lack most of the behaviors characteristic of the supposedly felt emotion. Some take our responses instead to be weaker versions of ordinary emotions (Davies 1994, 299–307); others take them to share some aspects of ordinary emotions, such as their characteristic affective states, but to lack others, such as a specific intentional object (Levinson 1982, 319–22; Radford 1989). A third option is that we respond to expressive music with ‘quasi-emotions’, that is, the affective components of fully-fledged emotions that we imagine to be fully-fledged (Walton 1990, 240–55). Apart from debate over which of these proposals most closely matches our experience, there is the question of how well each of them fits with the various solutions discussed above to the problem of our negative responses to music, and with empirical work, in which there has been much recent interest (e.g. Robinson 2005; Bicknell 2007b, 2009; Cochrane 2010a, b; S. Davies 2011a, b).