Two ERP studies on Dutch temporal semantics. Giosuè Baggio


Contents
1 Introduction
2 Tense violations
2.1 Introduction
2.2 Method
2.3 Results
2.4 Discussion
3 Münte et al. 1998
3.1 Experiment
3.2 Semantics
3.3 Discussion
3.4 Conclusion
4 Temporal connectives
4.1 Introduction
4.2 Method
4.3 Results
4.4 Discussion
5 Conclusion
A Materials

Acknowledgements

This thesis is the result of an intense half year of work spent as a trainee at the F.C. Donders Center for Cognitive Neuroimaging in Nijmegen. During this period, and in the preceding months at ILLC, I benefited from the help of a number of people who ought to be mentioned here. I am sincerely grateful to my supervisors Peter Hagoort and Michiel van Lambalgen for making the traineeship possible and for giving a direction to the project. This thesis owes much to their ideas and work. I would also like to thank all the researchers gravitating around the EEG labs at FCDC, in particular Tineke Snijders and Nienke Weder. This thesis is dedicated to my parents, Roberto and Rinella.

1 Introduction

Event-related potentials (ERPs) have been used in cognitive psychology since the 1960s and in psycholinguistics since the late 1970s. What makes ERPs suitable for the study of cognition, in particular within the information processing paradigm, is their good temporal resolution: voltage variations occurring within a time interval of a few tens of milliseconds can be measured at the scalp. In addition, ERP data are relatively easy to obtain [10]. The electroencephalogram (EEG) can be recorded from a number of electrodes placed on the scalp while the subject performs a cognitive task, for instance reading a number of grammatically correct and ungrammatical sentences. The amplified EEG signal reflects not only the cognitive processes evoked by the stimuli, but also neuronal activity related to the execution of other, possibly non-cognitive, tasks. While the cognitive processes of interest are time-locked to the stimuli, the noise is assumed to be randomly distributed (in terms of polarity, latency and amplitude) in the EEG trace. Therefore, averaging over individual trials will tend to reduce the noise while preserving the time-locked signal, revealing ERP components. ERPs obtained after averaging over trials belonging to different experimental conditions, for example grammatically correct vs. ungrammatical sentences, can be overlaid and inspected for differential effects. Assuming that the conditions differ only with respect to the experimental manipulation (say, the introduction of a grammatical violation), the differences between waveforms can be attributed to the cognitive processes of interest, in this case the detection or the repair of the grammatical violation.¹ On the basis of information concerning the time course and the scalp distribution of the observed differential effects, inferences about the neural implementation of the processes of interest can be drawn.
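The effect of averaging described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "component" is a hypothetical Gaussian-shaped bump, the noise is independent Gaussian noise, and all names and parameter values are invented for illustration and are not taken from the experiments reported in this thesis.

```python
import math
import random


def simulate_trial(signal, noise_sd, rng):
    """One trial: the time-locked signal plus independent random noise."""
    return [s + rng.gauss(0.0, noise_sd) for s in signal]


def average_trials(trials):
    """Point-by-point average over trials (all trials have equal length)."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]


def rms_error(estimate, signal):
    """Root-mean-square deviation of an estimate from the true signal."""
    squared = [(e - s) ** 2 for e, s in zip(estimate, signal)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical time-locked component: a positive bump in mid-epoch.
signal = [5.0 * math.exp(-((t - 100) / 20.0) ** 2) for t in range(200)]

# 40 simulated trials with noise much larger than the signal itself.
trials = [simulate_trial(signal, noise_sd=10.0, rng=rng) for _ in range(40)]

single_trial_error = rms_error(trials[0], signal)
average_error = rms_error(average_trials(trials), signal)
print(single_trial_error, average_error)
```

Under these assumptions the residual noise in the average shrinks roughly with the square root of the number of trials, while the time-locked signal is left unchanged.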
As we remarked above, the temporal resolution of ERP data is excellent compared with other imaging techniques such as PET or fMRI. As a consequence, ERPs allow reliable inferences with respect to the relative time course of cognitive processes. This makes them suitable for testing psycholinguistic models of language comprehension. For instance, the issue of the relative time course of syntactic and lexical binding has been extensively discussed in the light of ERP data [6]. Several studies reported that morphosyntactic anomalies² elicit a positive deflection starting 500 ms after the onset of the incorrect item, peaking around 600 ms and continuing for 500 ms at least. Because of its polarity (positive) and latency (600 ms following the critical word), this syntax-related ERP component has been called P600 or SPS (syntactic positive shift) [19]. A negative deflection starting 250 ms and peaking around 400 ms after the onset of the critical word has been observed in response to lexical anomalies³ in several studies. As a consequence of its functional characteristics, this ERP component has been called N400. Given the different polarity and latency of the P600/SPS and N400, it has been suggested that lexical and syntactic binding operations are subserved by distinct neural circuits, activated at different stages of sentence processing.

¹ See [15], [41] for a critical examination of the principles underlying this methodology.
² The spoilt children is/are throwing the toy on the ground [19].
³ He spread his warm bread with butter/socks [19].

ERP research on sentence processing has produced a large amount of empirical evidence, supporting a number of cognitive models of language comprehension. However, it has been argued that a cognitive neuroscience approach to language has not as yet merged with linguistic and psycholinguistic research programmes [6]. One could mention formal semantics as an example of a linguistic research programme that might indeed be relevant for a cognitive approach to language. On the one hand, as research questions have been mostly drawn from a different theoretical background, studies on language comprehension have hardly been interested in testing the predictive/explanatory power of formal semantics against ERP data. On the other hand, perhaps because formal semantics is seen as empirical only to the extent that it deals with an empirically given language, concepts and methods of formal semantics have rarely been used to make sense of ERP data. We suggest that a closer and more lively interaction between semantics and ERP research would be profitable for both parties. Of course, such an integration can be carried out at different levels of depth. A proper account of semantic binding would involve all the levels of analysis singled out by Marr and Poggio [32], although this is far beyond what we can offer in this thesis.⁴ Rather, our purpose is to try to make sense of ERP data by sketching some analyses at the first level of description, inspired by some concepts and methods of formal semantics. First-level analyses are supposed to uncover the nature and the goal of the computations involved in the cognitive processes of interest [32]. In this sense, the goal of syntactic binding is to compute a structural (tree-like) representation of a sentence [17], [59], the goal of lexical binding is to compute the meaning of a sentence on the basis of the meanings of its constituents, and the goal of semantic binding is to compute the denotation of a sentence, that is, a model making the sentence true.
The hypothesis that syntactic, lexical and semantic binding operations are distinct, though not independent, is tested against ERP data in two studies on Dutch temporal semantics. The first, on tense violations, is presented in section 2. The second, on temporal connectives, is motivated in section 3 and presented in section 4.

⁴ See [57] for progress in this direction.

2 Tense violations

2.1 Introduction

The study of brain responses to linguistic violations has proved a valuable basis for testing hypotheses and models of language comprehension. This approach has been used to investigate several linguistic phenomena [28]. A particularly interesting case is tense. As is known from linguistic research, establishing the temporal location of an event is a semantic phenomenon, common to natural language, that requires the contribution of language-specific inflectional and syntactical devices [31]. In this perspective, the analysis of event-related potentials elicited by tense violations seems a promising approach to explore the interactions between morphology, syntax and semantics as involved in the phenomenon of temporal reference [51]. Tense violations have been the object of several ERP studies in the last two decades. In some cases, important research questions have been raised [52], [56]. However, as to the specific ERP correlates of tense violations, no clear picture has emerged. In our view, one reason is that the label "tense violation" has been used to present several different linguistic phenomena, resulting in a variety of effects that appear difficult to explain and compare across paradigms. In particular, these studies tended to overlook the distinction between morphology and syntax, regarding as syntactic violations of tense what in fact were violations of some morphological category. Although morphology and syntax interact (inflection is an example), we should emphasize the different roles that morphological and syntactical principles have in grammar: the former govern the internal structure of words, the latter the arrangement of words in a sentence. Finally, in these studies the consequences of inflectional anomalies on the semantic level have been largely ignored. The first results on tense violations have been reported by Kutas and Hillyard [27].
In this experiment, a group of English-speaking volunteers was presented with sentences like the following:

(1) Most of the earth's weather happens in the bottom layer of the atmosphere called/calls the troposphere.
(2) The eggs and meat of this turtle are considered/consider choice food by many people.
(3) This allows them to stay/stayed under water for a longer period.

Incorrect sentences elicited a positive peak 300 ms following the critical word (that is, the verb) approaching significance at parietal sites and a negative wave in the 300-400 ms window, suggestive of an N400-like effect. It is not clear to which feature of the stimuli the observed ERP effects should be attributed. On the one hand, sentences (1)-(3) contain morphological anomalies and not, as Kutas and Hillyard claim, syntactical ones. More precisely, the morphological category violated is finiteness (finite/infinitive/participle) and not tense (past/present/future). On the other hand, in this case morphological violations are realized in quite different ways: in (1) the present indicative "calls" replaces the past participle "called"; in (2) the infinitive "consider" replaces the past participle "considered"; in (3) the past participle "stayed" replaces the infinitive "stay".

Osterhout and Nicol [44] investigated the effects of tense violations in modal constructions, presenting to a group of English-speaking subjects sentences like:

(4) The cat won't eat/eating the food that Mary leaves them.
(5) The expensive ointment will cure/curing all known forms of skin disease.
(6) The new fighter planes can fly/flying faster than anyone had expected.

In the 300-500 ms window, violations were more positive at posterior sites of the midline. The LAN elicited by violations was not significantly different from the effect elicited by correct items. In the 500-800 ms window, violations were more positive at midline and posterior sites, consistent with the distribution of the P600. ERPs elicited by sentence-final words in the violation condition were more negative-going than those observed in correct sentences, beginning 200 ms after the final word and continuing throughout the epoch. Also in this case one should notice that sentences (4)-(6) contain morphological and not syntactical anomalies and that the category concerned is again finiteness and not tense: the present participle replaced in all cases the infinitive form of the verb. In the study conducted by Allen, Badecker and Osterhout [1], syntactic (tense) violations were investigated in sentences containing either high-frequency (HF) or low-frequency (LF) verbs:

(7) The man will work/worked on the platform. (HF)
(8) The man will sway/swayed on the platform. (LF)

Correct low-frequency sentences elicited a more negative N400, ungrammatical items a more positive P600 and low-frequency ungrammatical items a biphasic increase of N400 and P600. In the 500-900 ms window a significant main effect for grammaticality was found, maximal on posterior sites as is typical of the P600.
Again, the morphological category violated in (7) and (8) is finiteness and not tense, although the authors say that, for example, the two ungrammatical sentences "He will walked" and "He will swayed" are equally and unconditionally ill-formed with respect to tense [1]. In conclusion, the studies of Kutas and Hillyard [27], Osterhout and Nicol [44] and Allen et al. [1] did not bring into play genuine tense violations. Violations of the category of finiteness, together with other morphological violations, will most likely contribute to our understanding of the inflectional/derivational system. Even so, tense violations promise something more, that is, some insight into the involvement of the different components of the language system in establishing the temporal location of events described by sentences or verb phrases. There seem to be two ways to construct genuine tense violations. The first is to use sentences in which the temporal location of an event is determined with respect to the temporal location of another one, as is typically the case in contexts in which relational terms such as "after" and "before" occur. Commandeur [7] presented to a group of Dutch native speakers sentences like:

(9) Voordat Jan stopte met roken, proefde/proeft hij nauwelijks wat hij at.

No effect of tense violations was found in this study. Incidentally, the temporal relations involved introduce semantical complications that may in this case be undesirable.⁵ A better solution seems to be that of using sentences in which the temporal location of an event is determined with respect to a temporal frame specified by an adverb like "yesterday":

(10) Gisteren kookte/kookt Henk een Indiaas gerecht.

Again, Commandeur [7] reported no reliable main effect of grammaticality to critical words. A significant anterior negativity in the 400-600 ms window following sentence-final words was nevertheless found. Tense violations in temporal adverb constructions have been investigated in two other studies. Fonteneau, Frauenfelder and Rizzi [12] presented to a group of French-speaking volunteers sentences like the following:

(11) Demain l'étudiant lirait/lisait le livre.

The authors correctly classified these tense violations as morphological. The results were a biphasic wave in the 450-550 ms window, with a negative maximum at posterior sites and a positive anterior peak. Steinhauer and Ullman [52] investigated the effects of morphological tense violations for regular and irregular English verbs:

(12) Yesterday, I sailed/sail Diane's boat to Boston.
(13) Yesterday, we ate/eat Peter's cake in the kitchen.

In the 300-400 ms window, irregular verb tense violations elicited N400-like negativities in both male and female subjects while regular verb tense violations resulted in LANs in male and N400s in female subjects. In the 400-900 ms window, regular and irregular verb tense violations elicited a consecutive LAN (400-500 ms) and P600 (600-900 ms) in both sexes. The effects occurring in the 300-400 ms window were interpreted as indexes of the involvement of the morpho-phonological system while those occurring in the 400-900 ms window were associated with morphosyntactical processing. In the studies of Fonteneau et al. [12] and Steinhauer and Ullman [52] the linguistic phenomena considered were genuine tense violations. Still, there seems to be a problem with the interpretation of the LAN and P600 occurring in the 400-900 ms time interval. On the basis of the data available, this pattern of effects could alternatively be attributed either to the morphological violation itself (the detection of a grammatical anomaly) or to its consequences on the semantical level (the detection of an inconsistency in the temporal information given). Our concern with constructing genuine tense violations seems at this point justified. Consider for instance:

(14) Yesterday, I sailed Diane's boat to Boston.
(15) Yesterday, I sailing Diane's boat to Boston.
(16) Yesterday, I sail Diane's boat to Boston.

⁵ See sections 3 and 4 for some results and discussion in a different paradigm.

With respect to the correct item (14), (15) is a violation of the morphological category of finiteness while (16) is a violation of the category of tense. The finiteness violation makes the clause "I sailing Diane's boat to Boston" in (15) ungrammatical. As a result, it fails to provide any information about the location of the event of the speaker sailing Diane's boat to Boston on the temporal axis. In contrast, given that genuine tense violations can be realized only in constructions containing a second verb phrase or a temporal adverb, "I sail Diane's boat to Boston" in (16) is grammatically acceptable. The tense violation still allows the clause to have a denotation: the event is located in the present, that is, in the context of (16), outside the temporal frame specified by the adverb "yesterday". In this sense, we can say that the temporal information conveyed by (16) is inconsistent. We conducted an ERP study in which the paradigm used by Fonteneau et al. [12], Steinhauer and Ullman [52] and Commandeur [7] was slightly modified by manipulating the relative position of the temporal adverb and the main verb. The hypothesis that the distinction between grammatical and semantical processing has a neural correlate was tested against ERP data on a group of Dutch-speaking volunteers. Sections 2.2 and 2.3 present the method and the results of our study. In section 2.4 we will discuss our findings in the light of specific hypotheses on morphological and semantical processing and we will compare them with previous research.

2.2 Method

Subjects

We recruited 25 Dutch native speakers from our local subject database. One of them (male, age 22) was not included in the final analysis due to strong ocular and muscular artifacts. This left us with 24 subjects (14 female, mean age 24.6, age range 20-49).
None had neurological or psychiatric disorders, had experienced any neurological trauma, or reported the use of prescription medications, alcohol or other recreational drugs within the preceding 24 hours. All subjects had normal or corrected-to-normal vision. Informed consent was obtained before the beginning of each session. Subjects received €6 per hour for taking part in the study.

Materials

An initial set of 960 sentences was constructed, including 80 correct adverb-verb items (A), 80 incorrect adverb-verb items (B), 80 correct verb-adverb items (C), 80 incorrect verb-adverb items (D) and 640 fillers (see section 4.2). The following are examples of the sentences used (see appendix A for the complete set):

(17) Afgelopen lente won Julian een literatuur prijs in Frankrijk. (A)
(18) Afgelopen lente wint Julian een literatuur prijs in Frankrijk. (B)
(19) Pierre beëindigde afgelopen zomer de vertaling van Max Havelaar. (C)
(20) Pierre beëindigt afgelopen zomer de vertaling van Max Havelaar. (D)

The only difference between condition pairs (A)-(B) and (C)-(D) was the relative position of the verb and the adverbial modifier. Incorrect items were constructed by replacing the correct past-tensed form of the verb with its present-tensed form.

In order to prevent referential and semantical ambiguities, unanchored temporal adverbs [50], anaphoric pronouns, quantifying expressions and modal verbs were carefully avoided. To reduce ocular artifacts, we made sure that each sentence consisted of at most 11 words and each word had at most 13 letters. Dutch spelling conventions were checked in an up-to-date, on-line version of the Van Dale dictionary. From the initial set of 320 critical sentences, 2 versions were constructed. Each version contained 40 correct adverb-verb (A), 40 incorrect adverb-verb (B), 40 correct verb-adverb (C) and 40 incorrect verb-adverb (D) items. Each subject read 160 critical sentences and 160 fillers, presented in a pseudo-random fashion. To avoid repetition effects, only sequences of at most 3 items of the same condition were allowed. Each version was finally inspected to rule out sequences of items suggesting any form of thematic continuity.

Procedure

After electrode application, participants were asked to sit in a dimly lit, sound-attenuating room in front of a video monitor. Subjects were instructed not to blink or move while sentences were presented on the screen. Their only task was to read each sentence carefully for comprehension. Stimuli were presented word by word (duration 300 ms, with 600 ms between word onsets) in white in the middle of the screen. After 1000 ms of blank screen following the offset of the last word of each sentence, an asterisk appeared for 1500 ms, during which subjects were allowed to blink and move. Each session lasted for about two hours, including subject preparation. The experiment took approximately 50 minutes to complete and was divided into four blocks of 80 trials each. Between the blocks there were three short breaks to allow subjects to recuperate from the blinking regime. Participants' behavior during the task was monitored with the aid of a video camera and a microphone placed inside the experimental room.
Immediately after the last block, an informal debriefing interview was held. Subjects' reports were used as additional elements in a preliminary interpretation of ERP data [46].

Recording

Biosignals were recorded simultaneously from 32 electrode sites. Horizontal and vertical eye movements were monitored by two electrodes placed at the left- and right-eye outer canthi and one below the left eye. The EEG was recorded from 28 scalp locations: Fp1, Fp2, F7, F3, Fz, F4, F8, FC5, FC1, FCz, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, O1, O2 (figure 2.1). Sintered Ag/AgCl electrodes, fixed in an elastic cap, were used. EOG electrodes were applied with two-sided adhesive decals. Before electrode application, subjects' skin was cleaned with ethyl alcohol and abrasive gel to reduce the impedance at the scalp-electrode junction. Electrode impedance was kept below 5 kΩ and was checked before the beginning of each block and at the end of the session. The left mastoid electrode TP9 was used as reference. Subsequently, a linked mastoid reference was computed off-line by re-referencing all EEG channels to the right mastoid electrode TP10, which was excluded from further analysis.
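The off-line re-referencing step can be sketched as follows. The formula used here is an assumption, since the text above does not spell it out: with the left mastoid as on-line reference, a common way to obtain a linked-mastoid reference is to subtract half of the right mastoid signal from every channel. The function name and the data layout (a dictionary mapping channel names to lists of samples) are hypothetical.

```python
def rereference_linked_mastoids(channels, right_mastoid="TP10"):
    """Re-reference data recorded against the left mastoid to linked mastoids.

    Each recorded channel is (site - left mastoid) and the right mastoid
    channel is (right mastoid - left mastoid). Subtracting half of the
    latter gives site - (left + right) / 2. The right mastoid channel itself
    is dropped from the output, as it is excluded from further analysis.
    """
    reference = channels[right_mastoid]
    return {
        name: [v - 0.5 * r for v, r in zip(samples, reference)]
        for name, samples in channels.items()
        if name != right_mastoid
    }


# Toy data in microvolts, two samples per channel (illustrative values only).
recorded = {"Cz": [4.0, 2.0], "Pz": [1.0, 0.0], "TP10": [2.0, -2.0]}
linked = rereference_linked_mastoids(recorded)
print(linked)
```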

EEG and EOG were amplified by a multichannel BrainAmp™ DC system. Amplifier settings were: sampling rate 500 Hz, sampling interval 2000 µs, time constant 10 s, high cutoff 70 Hz. Several markers were sent out on-line to the EEG recorder: for conditions (A)-(D), a sentence-initial and a sentence-final marker, corresponding to the onset of the first and the last word respectively; for conditions (A)-(B) a critical word marker, corresponding to the onset of the verb; for conditions (C)-(D) a critical word marker, corresponding to the onset of the first word of the adverb ("vorige", "afgelopen", etc.).

Figure 2.1 Overview of the 28-electrode montage used.

Data analysis

The following transforms were applied to the raw EEG signal. The sampling rate was changed to 200 Hz (interval 5000 µs) and the high cutoff filter was set to 30 Hz. The signal was then segmented as follows: 40 segments per condition (A)-(D), starting with the onset of critical words and ending 1000 ms later; 40 segments per condition (A)-(D), starting with the onset of sentence-final words and ending 800 ms later. Baseline correction was computed over the 150 ms preceding the onset of critical or sentence-final words.
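The segmentation and baseline correction just described can be sketched as follows for a single channel. The sampling rate (200 Hz), the 150 ms baseline and the 1000 ms segment length are taken from the text; the function names and the representation of the signal as a plain list of samples are assumptions made for illustration, and this is not the analysis software actually used.

```python
SAMPLING_RATE_HZ = 200  # sampling rate after the change described above


def ms_to_samples(ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(ms * rate_hz / 1000))


def extract_epoch(eeg, onset_index, baseline_ms=150, length_ms=1000):
    """Cut a segment starting at word onset and subtract the mean of the
    pre-onset baseline interval from every sample in the segment."""
    n_base = ms_to_samples(baseline_ms)
    n_post = ms_to_samples(length_ms)
    if onset_index - n_base < 0 or onset_index + n_post > len(eeg):
        raise ValueError("epoch exceeds the bounds of the recording")
    baseline = eeg[onset_index - n_base:onset_index]
    baseline_mean = sum(baseline) / len(baseline)
    segment = eeg[onset_index:onset_index + n_post]
    return [v - baseline_mean for v in segment]


# Toy signal: a constant 2 µV before word onset, 5 µV afterwards.
eeg = [2.0] * 100 + [5.0] * 300
epoch = extract_epoch(eeg, onset_index=100)
print(len(epoch), epoch[0])
```

For sentence-final words the same sketch would be called with `length_ms=800`.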

Epochs contaminated by artifacts (eye blinks, horizontal eye movements, muscle activity, high-amplitude alpha and electrode drifts) were marked and excluded from further analysis. In order to guarantee an acceptable signal-to-noise ratio, subjects had to have at least 30 (out of 40) good trials to be kept in the final analysis. For critical word segments, the average number of rejected trials was: condition (A) 1.96, (B) 1.42, (C) 1.29, (D) 1.46. For sentence-final segments, the averages were: (A) 1.46, (B) 2.25, (C) 1.79, (D) 2.37. Average waveforms were computed for each participant in each condition. Overlay functions of grand average ERPs were calculated for condition pairs (A)-(B) and (C)-(D), both to critical and sentence-final words. Waveforms were inspected for differential effects at specific latencies. Several time windows were selected. For conditions (A)-(B): 500-600 ms and 750-850 ms to critical word; 250-350 ms to sentence-final word. For conditions (C)-(D): 300-500 ms, 550-900 ms, 550-650 ms, 750-850 ms and 800-1000 ms to critical word; 500-700 ms to sentence-final word. Subsequent repeated measures analyses of variance (ANOVAs) used mean amplitude values computed for each participant in each time window. Univariate F tests were adjusted by means of the Huynh-Feldt correction. Mauchly's sphericity test was used to calculate the value of ε. Multivariate ANOVAs for single dependent measures were performed on the same data set [46]. The results of univariate and multivariate tests will be compared whenever they differ. Initial analyses included the factors violation (2 levels) and site (28 levels), subsequently restricted to collections of electrodes (e.g. midline) or single sites. This allowed us to determine the scalp locations at which the main effect of tense violations was significant.
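For a single site, the main effect of a two-level within-subject factor such as violation can be sketched as follows. This is an illustration under simplifying assumptions, not the statistical software used for the analyses: with only two levels, the repeated measures F statistic equals the square of the paired t statistic, with degrees of freedom (1, n - 1), which gives (1, 23) for 24 subjects. The function names and the toy values are hypothetical.

```python
def mean_amplitude(epoch, start_ms, end_ms, rate_hz=200):
    """Mean amplitude of an epoch (sample 0 = word onset) in a time window."""
    first = int(start_ms * rate_hz / 1000)
    last = int(end_ms * rate_hz / 1000)
    window = epoch[first:last]
    return sum(window) / len(window)


def two_level_repeated_measures_f(correct, violation):
    """F statistic and degrees of freedom for a two-level within-subject
    factor, given one mean amplitude per subject and condition."""
    diffs = [v - c for c, v in zip(correct, violation)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    f_value = n * mean_d ** 2 / var_d  # equals the squared paired t statistic
    return f_value, (1, n - 1)


# Toy mean amplitudes (µV) for four hypothetical subjects.
correct = [1.0, 2.0, 3.0, 4.0]
violation = [2.0, 4.0, 3.0, 7.0]
f_value, dof = two_level_repeated_measures_f(correct, violation)
print(f_value, dof)
```

With more than two levels (for example the factor site), sphericity can be violated, which is where the Huynh-Feldt correction mentioned above comes in; that case is not covered by this sketch.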
Other topographically relevant factors included in further analyses were hemisphere (2 levels), anterior-posterior (2 levels) and quadrant (4 levels: left-anterior, right-anterior, right-posterior and left-posterior). The factor sentence type (2 levels: adverb-verb and verb-adverb) was used in a separate ANOVA in the 750-850 ms time window. Interaction effects of violation, electrode site, hemisphere, anterior-posterior and quadrant were tested in each time window.

2.3 Results

In this section we will first present the results of separate statistical analyses for adverb-verb and verb-adverb sentence types in all relevant time windows. This will allow us to look at the time course and topographical distribution of the particular ERPs observed. We will then compare these effects across time windows and sentence types. At that point we will hopefully have sufficient elements to start a discussion of our experiment.

A first inspection of adverb-verb epochs reveals a positive shift elicited by incorrect sentences in the 500-600 ms time window to critical word. The effect has relative maxima at right and midline anterior (Fp2, Fz, F4, FCz, FC2) sites (figure 2.2). However, the main effect of violation is not significant on an ANOVA including all 28 electrodes nor on an independent analysis conducted on right and midline anterior sites. A direct inspection of topographical maps (figure 2.3) reveals a clear positivity of incorrect sentences at right temporal (T8) sites. An independent ANOVA conducted on T8 shows that the main effect of violation is significant. Consistent with this result, we found a marginally significant interaction between the factors violation and hemisphere (p = .053). Results are reported in table 2.1.

Table 2.1
Violation (28 electrodes): F(1,23) = 1.497, p = .234, ε = 1
Violation (Fp2, Fz, F4, FCz, FC2): F(1,23) = 2.546, p = .124, ε = 1
Violation (T8): F(1,23) = 5.859, p = .024, ε = 1
Violation × hemisphere: F(1,23) = 4.177, p = .053, ε = 1

Figure 2.2 Grand average (n = 24) waveforms elicited by adverb-verb sentences at right and midline anterior sites and T8. The onset of critical word is at 0 ms.

Figure 2.3 Triangulation-linear interpolated topographic map displaying the mean difference between tense violation and correct adverb-verb sentences in the 500-600 ms time window to critical word. Incorrect sentences elicit a more positive response at right fronto-central sites.

The pattern of effects changes somewhat in the 750-850 ms time window. A first inspection of adverb-verb epochs in this interval reveals a clear positive deflection of incorrect items distributed along the midline. The effect has relative maxima at midline, right fronto-central (F4, FC2) and left centro-parietal (CP1, P3) sites (figures 2.4 and 2.5). The effect is not significant on an ANOVA including all 28 electrodes, nor is it on an independent analysis conducted on right fronto-central sites. The main effect of violation is only marginally significant at the midline, but it does reach significance at left centro-parietal sites. Despite the fact that the observed positive shift has a clear left-posterior distribution, we did not find reliable interactions between violation and hemisphere or quadrant. Results are reported in table 2.2.

Table 2.2
Violation (28 electrodes): F(1,23) = 1.749, p = .199, ε = 1
Violation (F4, FCz, FC2): F(1,23) = 1.981, p = .173, ε = 1
Violation (Fz, FCz, Cz, Pz): F(1,23) = 3.349, p = .080, ε = 1
Violation (CP1, P3, Pz): F(1,23) = 5.391, p = .029, ε = 1
Violation (T8): F(1,23) = 4.578, p = .043, ε = 1
Violation × hemisphere: F(1,23) = .289, p = .596, ε = 1
Violation × quadrant: F(3,69) = .559, p = .530, ε = .506

Figure 2.4 Grand average (n = 24) waveforms elicited by adverb-verb sentences at midline, right fronto-central and left centro-parietal sites. The onset of critical word is at 0 ms.

Figure 2.5 Triangulation-linear interpolated topographic maps displaying the mean difference between tense violation and correct adverb-verb sentences in the 750-850 ms time window to critical word. Incorrect sentences elicit a more positive response at midline, right fronto-central and left centro-parietal sites. Top, back and left views are shown.

These findings indicate that there might be a right-lateralized effect co-occurring with the one just described. For this reason, we conducted a separate ANOVA on right hemisphere sites, focusing on the same electrode (T8) at which we found a significant main effect of violation in the 500-600 ms time window. At T8 the main effect of tense violation is again significant (table 2.2). In particular, a comparison between topographical maps (figures 2.3 and 2.5, back view) suggests that a right temporal positive shift starts around 500 ms and continues throughout the epoch for 350 ms at least, reaching significance in the 500-600 ms and 750-850 ms intervals.

Let us now turn to sentence-final effects in adverb-verb constructions. In the 250-350 ms time window to sentence-final words incorrect sentences elicit a negative wave with relative maxima over parietal and occipital scalp sites (figures 2.6 and 2.7). The effect is not significant on an ANOVA including all 28 electrodes but it does reach significance on an analysis restricted to parietal and occipital sites (P7, P4, Pz, O1, O2). We did not find a significant interaction effect between the factors violation and anterior/posterior. Results are reported in table 2.3.

Figure 2.6 Triangulation-linear interpolated topographic map displaying the mean difference between tense violation and correct adverb-verb sentences in the 250-350 ms time window to sentence-final word. Violations elicit a more negative response at parietal and occipital sites.

Table 2.3 Violation (28 electrodes): F(1,23) = 1.426,p =.245,ǫ = 1 Violation (P7, P4, Pz, O1, O2): F(1,23) = 4.497,p =.045,ǫ = 1 Violation ant./post.: F(1,23) = 2.316,p =.142,ǫ = 1 Figure 2.7 Grand average (n = 24) waveforms elicited by adverb-verb sentences at parietal and occipital sites. The onset of sentence-final word is at 0 ms. We will now consider the ERPs elicited by tense violations in verb-adverb constructions. In the 300-500 ms time window to critical word, incorrect sentences elicit a negative wave larger over left frontal (F7, F3) and fronto-central (FC5, FC1) sites (figures 2.8 and 2.9). The main effect of violation fails to reach significance on an 17

analysis over all 28 electrodes. However, an ANOVA restricted to left anterior sites (F7, F3, FC5, FC1) shows that the effect is significant. In addition, we found reliable interactions of violation with the factors site, hemisphere, anterior/posterior and quadrant. Results are reported in table 2.4.

Table 2.4
Violation (28 electrodes): F(1,23) = 1.202, p = .284, ε = 1
Violation (F7, F3, FC5, FC1): F(1,23) = 4.794, p = .039, ε = 1
Violation × site: F(27,621) = 2.774, p = .019, ε = .199
Violation × hemisphere: F(1,23) = 6.935, p = .015, ε = 1
Violation × ant./post.: F(1,23) = 5.053, p = .029, ε = 1
Violation × ant./post.: F(1,23) = 5.053, p = .034 (MANOVA)
Violation × quadrant: F(3,69) = 2.774, p = .008, ε = .618
Violation × quadrant: F(3,21) = 2.622, p = .077 (MANOVA)

Figure 2.8 Grand average (n = 24) waveforms elicited by verb-adverb sentences at left frontal and fronto-central sites. The onset of critical word is at 0 ms.

Figure 2.9 Triangulation-linear interpolated topographic map displaying the mean difference between tense violation and correct verb-adverb sentences in the 300-500 ms time window to critical word. Violations elicit a more negative response at left frontal and fronto-central sites.

In order to look at later processing effects, we first selected a large time interval (550-900 ms), further on segmented into three smaller windows (550-650 ms, 750-850 ms and 800-1000 ms) to allow for a more fine-grained analysis. In the 550-900 ms interval, we found a significant main effect of violation at right posterior (CP6, CP2, P8, P4, O2) sites, in particular at centro-parietal (CP6) and parietal (P4) electrodes (figure 2.10). The main effect of violation is not significant on an ANOVA over all 28 electrodes, nor is it on an analysis restricted to left posterior (CP5, CP1, P7, P3, O1) and right temporal (T8) sites. The interactions of violation with the factors site, hemisphere, anterior/posterior and quadrant are all significant. We also found a marginally significant interaction between the factors violation, quadrant and site, most likely due to the strong impact of the positive shift on the electrodes CP6 and P4. Results are reported in table 2.5. These findings indicate that the observed positive wave has a right posterior focus. A marginal positive shift at right temporal sites seems to co-occur with this effect, starting around 550 ms and continuing throughout the epoch (figure 2.10). A look at the smaller time windows will help us to clarify this situation.

Figure 2.10 Grand average (n = 24) waveforms elicited by verb-adverb sentences at T8, left and right centro-parietal, parietal and occipital sites. The onset of critical word is at 0 ms.

Table 2.5
Violation (28 electrodes): F(1,23) = .667, p = .422, ε = 1
Violation (CP6, CP2, P8, P4, O2): F(1,23) = 3.953, p = .059, ε = 1
Violation (CP5, CP1, P7, P3, O1): F(1,23) = 1.630, p = .214, ε = 1
Violation (CP6, P4): F(1,23) = 5.254, p = .031, ε = 1
Violation (T8): F(1,23) = 1.823, p = .190, ε = 1
Violation × site: F(27,621) = 2.216, p = .030, ε = .287
Violation × hemisphere: F(1,23) = 4.891, p = .037, ε = 1
Violation × ant./post.: F(1,23) = 5.705, p = .026, ε = 1
Violation × quadrant: F(3,69) = 5.008, p = .006, ε = .834
Violation × quadrant: F(3,21) = 2.997, p = .054 (MANOVA)
Violation × quadrant × site: F(12,276) = 2.086, p = .081, ε = .369
Violation × quadrant × site: F(12,12) = .819, p = .632 (MANOVA)

In the 550-650 ms interval, the main effect of violation is not significant on an ANOVA over all 28 electrodes, nor is it on single analyses restricted to left and right posterior sites. An independent ANOVA on CP6 and P4 shows that the effect is only marginally significant at these two sites. The observed positivity equally fails to reach significance at T8. There are however reliable interactions of violation with the factors anterior/posterior and quadrant, even if the latter is not supported by a parallel multivariate test. The violation by site interaction is only marginally significant. We did not find a reliable interaction between the factors violation and hemisphere. Results are reported in table 2.6.

Table 2.6
Violation (28 electrodes): F(1,23) = .233, p = .634, ε = 1
Violation (CP6, CP2, P8, P4, O2): F(1,23) = 2.573, p = .122, ε = 1
Violation (CP5, CP1, P7, P3, O1): F(1,23) = 1.305, p = .265, ε = 1
Violation (CP6, P4): F(1,23) = 3.369, p = .079, ε = 1
Violation (T8): F(1,23) = 0.316, p = .580, ε = 1
Violation × site: F(1,23) = 1.843, p = .079, ε = 1
Violation × hemisphere: F(1,23) = 1.802, p = .193, ε = 1
Violation × ant./post.: F(1,23) = 5.733, p = .025, ε = 1
Violation × quadrant: F(3,69) = 4.606, p = .010, ε = .784
Violation × quadrant: F(3,21) = 2.31, p = .106 (MANOVA)

The results of statistical tests reported above indicate that the observed positive shift has a posterior distribution, although not strongly lateralized. There is nevertheless a slight dominance of the right posterior quadrant, again a consequence of the impact of the positive wave on CP6 and P4. We expect that all the observed ERPs will reach significance in the 750-850 ms window, in particular at right temporo-parietal sites.

Figure 2.11 Triangulation-linear interpolated topographic maps displaying the mean difference between tense violation and correct verb-adverb sentences in the 750-850 ms time window to critical word. Incorrect sentences elicit a more positive response at right temporo-parietal sites. Top, back and right views are shown.

In the 750-850 ms time window, the main effect of violation is not significant on an ANOVA including all 28 electrodes. The positive shift reaches significance at right temporo-parietal sites, in particular CP6 and P4 (figure 2.11). The effect is not significant on an ANOVA restricted to right temporal (T8) sites. The violation by site interaction is reliable. We also found marginally significant interactions of violation with the factors hemisphere, anterior/posterior and quadrant, the latter on a multivariate test only. Results are reported in table 2.7.

Table 2.7
Violation (28 electrodes): F(1,23) = 1.542, p = .227, ε = 1
Violation (CP6, CP2, P8, P4, O2): F(1,23) = 5.399, p = .029, ε = 1
Violation (CP6, P4): F(1,23) = 6.580, p = .017, ε = 1
Violation (T8): F(1,23) = 2.509, p = .127, ε = 1
Violation × site: F(27,621) = 2.066, p = .044, ε = 1
Violation × hemisphere: F(1,23) = 3.104, p = .091, ε = 1
Violation × ant./post.: F(1,23) = 3.324, p = .081, ε = 1
Violation × quadrant: F(3,69) = 1.323, p = .275, ε = .935
Violation × quadrant: F(3,21) = 2.635, p = .076 (MANOVA)

The violation by quadrant interaction appears difficult to explain: with a strong positive wave occurring at CP6 and P4 (see figures 2.10 and 2.11) we expected the effect to be statistically robust. However, this is only partially supported, and only by a multivariate test.

In the 800-1000 ms interval, the main effect of violation is not significant on an ANOVA over all 28 electrodes. However, the positive shift is again significant on an independent analysis conducted on right temporo-parietal sites, in particular CP6 and P4. The effect fails to reach significance at T8. Interaction effects are reliable between violation and site, violation and hemisphere, violation and anterior/posterior and violation and quadrant. In this case, the results of statistical analyses support the claim that the observed positive shift has a right posterior focus. Results are reported in table 2.8.

Table 2.8
Violation (28 electrodes): F(1,23) = 1.163, p = .292, ε = 1
Violation (CP6, CP2, P8, P4, O2): F(1,23) = 5.51, p = .028, ε = 1
Violation (CP6, P4): F(1,23) = 6.149, p = .021, ε = 1
Violation (T8): F(1,23) = 1.809, p = .192, ε = 1
Violation × site: F(27,621) = 3.063, p = .005, ε = .257
Violation × hemisphere: F(1,23) = 6.308, p = .019, ε = 1
Violation × ant./post.: F(1,23) = 10.331, p = .004, ε = 1
Violation × quadrant: F(3,69) = 6.995, p = .001, ε = 1
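The rows labelled "Violation" in tables 2.5 to 2.8 test a factor with two levels (violation vs. correct) within 24 subjects, hence the degrees of freedom (1,23). The following Python sketch illustrates how such a statistic could be obtained from per-subject mean amplitudes; it is not a description of the software actually used for the analyses. The function name and the amplitude values are hypothetical, and we assume that ε denotes a sphericity correction factor, which is necessarily 1 when the numerator has one degree of freedom.

```python
from statistics import mean, variance

def violation_main_effect(violation, correct):
    """Return (F, (df1, df2)) for a two-level within-subject factor.

    violation, correct: one list per subject, holding that subject's mean
    amplitude (in the chosen time window) at each electrode of interest.
    """
    # Collapse over electrodes, then take each subject's condition difference.
    diffs = [mean(v) - mean(c) for v, c in zip(violation, correct)]
    n = len(diffs)
    # With two levels, the repeated-measures F equals the squared paired t.
    f_value = n * mean(diffs) ** 2 / variance(diffs)
    return f_value, (1, n - 1)

# Made-up amplitudes (microvolts) for 3 hypothetical subjects, 2 electrodes.
violation = [[2.0, 2.0], [3.0, 5.0], [4.0, 6.0]]
correct = [[1.0, 1.0], [2.0, 2.0], [2.0, 2.0]]
f_value, df = violation_main_effect(violation, correct)
print(f_value, df)
```

Restricting the electrode lists to a region of interest (for instance CP6 and P4) changes only which amplitudes are averaged for each subject; the degrees of freedom remain (1, n - 1).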

Let us finally consider sentence-final effects in verb-adverb constructions. In the 500-700 ms time window, we found a negative wave distributed along the midline, with relative maxima over centro-parietal sites (figures 2.12 and 2.13). The main effect of violation is significant on a separate analysis conducted on the following 12 electrodes: FC1, FCz, FC2, C3, Cz, C4, CP1, CP2, CP6, P3, Pz, P4. We found a significant main effect of violation on an ANOVA including all 28 sites, but we did not find reliable interaction effects. Results are reported in table 2.9.

Table 2.9
Violation (28 electrodes): F(1,23) = 6.592, p = .017, ε = 1
Violation (12 electrodes): F(1,23) = 7.594, p = .011, ε = 1
Violation × site: F(27,621) = 1.527, p = .168, ε = .195
Violation × hemisphere: F(1,23) = .358, p = .555, ε = 1
Violation × ant./post.: F(1,23) = .321, p = .577, ε = 1

Figure 2.12 Grand average (n = 24) waveforms elicited by verb-adverb sentences at fronto-central, central, centro-parietal and parietal sites. The onset of sentence-final word is at 0 ms.

Figure 2.13 Triangulation-linear interpolated topographic map displaying the mean difference between tense violation and correct verb-adverb sentences in the 500-700 ms time window to sentence-final word. Violations elicit a more negative response at midline, left and right fronto-central, central, centro-parietal and parietal sites.

Let us summarize the results presented in this section. To adverb-verb violations, we found: (i) no early processing effects, in particular no left anterior negativity in the 300-500 ms time window to critical word; (ii) a positive shift starting around 500 ms to critical word and reaching significance in the 750-850 ms interval at left centro-parietal sites; (iii) a positive shift at right temporal sites, significant in both the 500-600 ms and 750-850 ms windows to critical word; (iv) a negative wave at parietal and occipital sites, significant in the 250-350 ms window to sentence-final word.

To verb-adverb violations, we found: (i) a left anterior negative effect, significant in the 300-500 ms time window to critical word; (ii) a positive shift starting around 550 ms to critical word and reaching significance in the 750-850 ms and 800-1000 ms intervals at right temporo-parietal sites; (iii) no significant positive shift at right temporal sites; (iv) a negative wave at midline and centro-parietal sites, significant in the 500-700 ms window to sentence-final word.

This summary depicts a rather complex scenario that will be discussed in detail in section 2.4. What can be easily noticed is the similar (P600-like) effect elicited by adverb-verb and verb-adverb constructions in the 750-850 ms time window. For this reason, we conducted a series of statistical tests including the factor sentence type in that time interval. This allowed us to assess the effects elicited by tense violations across sentence types. The results are reported in table 2.10.

Table 2.10
Violation (28 electrodes): F(1,23) = 3.293, p = .083, ε = 1
Violation (anterior electrodes): F(1,23) = .870, p = .361, ε = 1
Violation (posterior electrodes): F(1,23) = 8.302, p = .008, ε = 1
Violation (left hemisphere electrodes): F(1,23) = 1.961, p = .175, ε = 1
Violation (right hemisphere electrodes): F(1,23) = 4.115, p = .054, ε = 1
Violation (midline electrodes): F(1,23) = 3.929, p = .060, ε = 1
Violation (left anterior quadrant): F(1,23) = .133, p = .719, ε = 1
Violation (right anterior quadrant): F(1,23) = 1.098, p = .306, ε = 1
Violation (right posterior quadrant): F(1,23) = 8.126, p = .009, ε = 1
Violation (left posterior quadrant): F(1,23) = 9.352, p = .006, ε = 1
Violation (T8): F(1,23) = 4.418, p = .043, ε = 1
Violation (T7): F(1,23) = .323, p = .575, ε = 1
Sentence type × hemisphere: F(1,23) = 3.674, p = .068, ε = 1
Sentence type × quadrant: F(3,69) = 18.494, p < .001, ε = 1

We believe that the observed positive shift should be classified as a P600. In fact, although the main effect of violation does not reach significance on an ANOVA including all 28 electrodes, it is reliable at posterior sites and marginally significant at the midline, consistent with previous research on the P600/SPS [19], [26]. Above we remarked that the P600 elicited by adverb-verb sentences has relative maxima over left posterior sites whereas the P600 elicited by verb-adverb constructions has a right posterior focus (compare figures 2.5 and 2.11).
Indeed, the main effect of violation is significant at both left and right posterior quadrants. We also found a significant interaction between the factors sentence type and quadrant. The different lateralization of the P600 in adverb-verb and verb-adverb sentence types and the LAN observed in the latter case can explain these findings. Adverb-verb violations elicit a positive shift at right temporal sites (see figure 2.5, back view), although the P600 has a left posterior focus. Verb-adverb violations elicit a P600-like positivity at right temporo-parietal sites (see figure 2.11), although the slight deflection observed at T8 (see figure 2.10) is not significant on separate analyses. Still, the results reported in table 2.10 indicate that the main effect of violation is significant at T8 whereas it fails to reach significance on an ANOVA restricted to T7. This is consistent with an analysis including left or right hemisphere electrodes only: the main effect of violation is almost significant at right hemisphere sites but it is not reliable on an analysis restricted to left hemisphere electrodes.

2.4 Discussion

In our informal debriefing interviews we asked subjects whether there was something anomalous with the sentences they read. All of them, without exception, remarked that in some cases the tense of the verb did not match with the temporal adverb chosen. It is perhaps interesting to notice that subjects apparently shared the intuition that verb tense had to match with the adverb rather than the inverse. It thus seems reasonable to attribute the main sentence-internal effect we observed (P600) to the detection/reprocessing of verb-tense violations. None of the subjects noticed any difference between adverb-verb and verb-adverb constructions. In particular, none of them reported that verb-adverb sentences were in any sense more difficult to understand. Nonetheless, we are inclined to see the LAN elicited by verb-adverb violations as a consequence of sentence structure, that is of the relative position of the verb and the temporal adverb. Finally, we hypothesize that the positive shift observed at right temporal sites is related to semantic binding operations, more precisely to the detection/reprocessing of temporal inconsistencies. Sentence-final ERPs will be attributed to global processing factors.

P600

Several accounts of verb morphology have been proposed in the psycholinguistic literature. These can be roughly classified into three families. First, generative accounts that regard lexical storage as maximally economical and inflected verb forms (regular and irregular) as the result of computations on verb stems and suffixes. Second, models that view inflected verb forms (regular and irregular) as items stored in long-term memory, modeled by a connectionist network. Third, dual-route approaches in which regularly-inflected verb forms are obtained by combining verb stems and suffixes while irregularly-inflected verb forms are retrieved from long-term memory.
In a recent version of the latter account [56], procedural memory (left frontal cortex and basal ganglia) subserves regular verb inflection and declarative memory (left temporal lobe) stores affixes and irregular verb forms. The declarative/procedural model predicts LANs for regular and N400s for irregular verb tense violations [56]. In this perspective, generative accounts would predict LANs for both regular and irregular verb tense violations and connectionist models N400s in both cases. The results of the ERP studies on tense violations reviewed in section 2.1 are partially consistent with the predictions that can be derived from these models: the LAN observed by Steinhauer and Ullman [52] to regular and irregular verb tense violations offers more support to a generative account than to the declarative/procedural model; the N400-like negativity observed by Kutas and Hillyard [27] to finiteness violations is consistent with connectionist accounts of verb morphology; the N400 reported by Steinhauer and Ullman [52] to irregular verb tense violations supports the declarative/procedural model as well as connectionist approaches. However, the effect that has been found most often to violations of verb inflection (finiteness and tense), that is the P600 reported by Osterhout and Nicol [44], Allen et al. [1] and Steinhauer and Ullman [52], cannot be easily explained within any of these frameworks. Below we will try to find a possible interpretation of the P600 we also observed to morphological violations of tense.

There exist several functional accounts of the P600 [22]. Coulson et al. [8] observed that the amplitude of the positive shift varies with the frequency of ungrammatical items in an experimental block and concluded that the P600 is a domain-general component related to the detection of an unexpected event. By contrast, Friederici [13], Münte et al. [38], [39] and Osterhout et al. [43] argued that the P600 is language-specific and reflects reanalysis and repair processes. Several studies reported consecutive LAN/P600 effects to morpho-syntactic violations [8], [14], [37], [52]. A version of the latter account relates the LAN to the detection of the violation and the P600 to reprocessing costs [22]. As we shall see later, empirical data suggest other interpretations of the LAN. On the basis of our results, we are forced to reject the idea that the detection and reprocessing of a tense violation elicit consecutive and distinct ERP effects (LAN followed by P600). In fact, we observed a LAN only to tense violations in verb-adverb sentences. Since it seems unreasonable to hold that tense violations were detected in verb-adverb constructions only, but that reanalysis took place for both sentence types, the question is now whether the P600 we observed reflects only the detection or the detection and reprocessing of a tense violation. Following the picture proposed by Ullman [56], we assume that morphological processing is subserved either by left frontal (rule-based accounts) or by left temporal (connectionist models) brain areas or both (dual-route approaches). Accordingly, we assume that the reprocessing of morphological tense violations would activate either left frontal or left temporal brain areas. Since we did not find any left frontal or left temporal ERP effect to tense violations in both sentence types, we would tend to exclude that the P600 we observed is related to reanalysis or repair processes.
In conclusion, we believe that the P600 should be attributed to the detection of a verb-tense violation.

The different lateralization we reported for the P600 to adverb-verb and verb-adverb sentence types might be related to the fact that in the former case the anomaly is detected as soon as the verb is processed whereas in the latter the detection takes place when the first word of the adverb (vorige, afgelopen, etc.) is encountered. The anomalous word (i.e. the verb) coincides with the critical word (i.e. the lexical item at which the violation is detected) only in the case of adverb-verb constructions. This might be a starting point for explaining why the P600s elicited by different sentence types have different topographical distributions.

LAN

Left anterior negativities have been reported to word-category violations [37], morpho-syntactic anomalies [8], [14], [37] including tense mismatches [52], relative clause movement phenomena [36], [34], [23] (see section 4.1 for a discussion of these studies) and filler-gap dependencies [11], [23], [24]. There is no agreement upon the functional interpretation of the LAN [19], although working memory accounts seem promising for two reasons. Firstly, the connection between working memory and left anterior brain areas is supported by functional neuroimaging studies [45]. Secondly, several linguistic phenomena can be analyzed in terms of memory maintenance. Even so, some of the observed LANs remain difficult to explain within a working memory account [19].

Working memory might subserve the mechanism involved in the processing of tense violations as well. A possible suggestion is that the detection of a tense violation in verb-adverb constructions involves the reactivation of verb-form information when the adverb is encountered. Since reactivation occurs while processing goes on, this backward move of the parser has an impact on working memory load, resulting in a LAN. The detection of a morphological anomaly then elicits a P600, as was the case for adverb-verb constructions. However, since adverb-verb sentences do not elicit a similar LAN, the explanation just sketched implies that adverbial information is not reactivated when verbs are encountered. This is possibly a consequence of the linguistic fact that an adverb can modify more than one verb phrase. Accordingly, it might be more economical in terms of working memory resources to maintain adverbial information active while verb-forms are checked rather than reactivate it every time a verb is encountered.

Semantic binding

A number of imaging studies found right hemisphere activations to the processing of spoken or read narratives, paragraphs of random sentences, untitled paragraphs [54] and semantically ambiguous sentences [55]. Studies on patients with brain lesions suggest that the right hemisphere is responsible for maintaining the theme of a conversation or a discourse [16], revising inferences from discourse after context update and integrating temporally or semantically distant concepts, for instance in understanding metaphors and jokes. In particular, it has been suggested that right temporal and inferior frontal brain areas are responsible for global semantic integration, including the theme, actors, places and times of a narrative [54]. Such an approach to right hemisphere function might be relevant for our purposes since our incorrect stimuli were in a sense narratives lacking temporal coherence.
The results reported in table 2.10 indicate that the effect of tense violations in both sentence types is significant at right temporal (T8) sites. In the case of adverb-verb sentences, this is supported by separate ANOVAs in the 500-600 ms and 750-850 ms time windows. In the case of verb-adverb sentences however, we found no significant effect at T8 on an independent analysis. Above we attributed the observed P600 to the detection rather than the reprocessing of a morphological anomaly. For similar reasons, we are inclined to exclude that the observed temporal positivity is related to semantic reanalysis, for instance to the recomputation of the denotation of the clause Julian wint een literatuur prijs in Frankrijk in example (18). We presume that semantic reanalysis would be subserved by left temporal and frontal brain areas, involving the computation of a new verb form (won) for which affixes would be retrieved from declarative memory and combined by procedural memory [56]. However, we did not find any left temporal or frontal ERP effect co-occurring with the observed positive shift to both sentence types. In short, we suggest that the right temporal positivity is related to the detection of a semantic inconsistency, that is a mismatch between the values assigned to temporal variables by the verb and the adverbial modifier [31]. Notice again that, despite the results reported in table 2.10, separate statistical analyses support this conclusion only with respect to adverb-verb sentences.

Sentence-final effects

Post-offset N400-like negativities have been reported to syntactically and semantically unacceptable sentences [18], [42], possibly reflecting the engagement of specialized neural generators subserving global consistency processes [19]. A similar view can be adopted to explain the negative waves we observed at parietal and central scalp sites to adverb-verb and verb-adverb constructions respectively. Sentence closure (marked with a dot following the last word) indicates that no other verb phrase will be processed, and thus constrains the scope of the temporal adverb. At that stage, local consistency processes (like those described in the preceding paragraphs) terminate. Assuming that morphological tense violations and temporal inconsistencies are detected during processing, the observed sentence-final negativities might be related to the attempts of the processor at constructing a consistent interpretation of the sentence [42]. Again, a possible objection to this view is that we would expect left frontal or temporal brain areas to subserve these reinterpretation processes. However, the ERP effects we observed to sentence-final words were neither left-lateralized nor did they have an anterior focus.

To conclude, we want to emphasize that the conjectures presented here stand in need of further empirical support, possibly from different experimental paradigms. For instance, we might be interested in realizing consecutio temporum violations in sentences containing temporal connectives or in sentences in which the temporal adverbs modify more than one verb phrase. A final suggestion is to construct cross-sentence tense violations in complex narratives, avoiding the use of other temporal expressions. This would perhaps help us to clarify the differences between the ERPs elicited by adverb-verb and verb-verb tense violations and would also provide new insights into the cognitive processes involved in establishing temporal coherence at the discourse level.

3 Münte et al. 1998

The aim of this section is to discuss an ERP study conducted by Münte and colleagues [40] on English temporal connectives. This experiment offers us an excellent opportunity to discuss and possibly support the main idea behind this thesis, namely that the introduction of semantic techniques into the ERP paradigm can reinforce the criticism of existing hypotheses and accelerate the process of generation and testing of new ones. This will also give us some background and the main motivation for the study on Dutch temporal connectives presented in section 4.

3.1 Experiment

Twenty-four English-speaking volunteers participated in the experiment. Each subject read 120 critical sentences (60 after, 60 before) and 480 fillers presented one word at a time in the middle of a video monitor. Biosignals were recorded from 26 geodesically distributed scalp sites. ERPs were obtained for 6144 ms epochs starting 300 ms before sentence onset. Two time windows were selected for further analyses: 500-3000 ms (first clause) and 3000-5500 ms (second clause). Before and after versions of the same sentences were presented to different halves of the group of volunteers. Across subjects the same sentences occurred, differing only in their initial connective: Before p, q vs. After p, q. The tense of all verbs was simple past. Each clause described a distinct event which is neither logically nor causally related to the other [40], for instance:

(1) After the scientist submitted the article, the journal changed its policy.
(2) Before the scientist submitted the article, the journal changed its policy.

A comprehension probe followed each critical sentence. The clauses were presented separately in random order at the left and right sides of the screen. Subjects had to indicate which of the two events happened first by pressing either the left- or the right-hand button. An additional test assessing subjects' working memory span was conducted.
Participants read aloud increasingly larger (2 to 6) sequences of sentences. Immediately after each sequence, subjects attempted to recall the last word of each sentence in the order in which these were presented. Scores were then used to group subjects (high, medium and low working memory span).

The results were the following. Before sentences elicited a larger negativity over left compared with right hemisphere sites. At sites on the left frontal scalp the responses to before and after sentences diverged as early as 300 ms and the difference between sentence types was reliably larger during the second clause (3000-5500 ms). Besides, the mean amplitude difference of left frontal negativity between before and after sentences was larger in subjects with high working memory span scores.

According to the authors, these results can be explained as follows. As reported above, brain responses to different sentence types diverged within 300 ms after sentence onset, thus when the temporal connective was processed. After and before access almost immediately different bits of conceptual knowledge stored in

long-term memory: after signals that the events will be presented in their actual order whereas before signals that the order will be reversed. In other words, the initial lexical item ( after or before ) leads subjects to expect that the events will be presented in their actual or in reverse order. This expectation is based on real-world knowledge of time, and linguistic knowledge of how these notions map onto expressions of time [40]. The prolonged negativity at left frontal scalp sites is related to an increase of working memory load required to process sentences presenting the events out of chronological order (i.e. before sentences) [40], [58]. Similar responses have been observed contrasting spoken or written sentences differing in working memory demands [34], [36], [58] (see section 4.1). After sentences do not require extra working memory resources. As a consequence, representations of linguistic stimuli appear to be structured by real-world knowledge of the temporal sequence [40]. According to Münte and colleagues, this is the first demonstration that high-level, real-world knowledge has immediate and sustained consequences on neural processing during sentence comprehension [40]. Two assumptions underlie the proposed explanation. First, after and before name converse relations. The meaning of after in sentences of the form After p, q is that the event p precedes the event q. The meaning of before in sentences of the form Before p, q is that the event p follows the event q. In this sense, we can say that real-world knowledge of time maps onto the lexical meanings of temporal connectives. Linguistic knowledge is stored in long-term memory and retrieved when after and before are processed. Second, the only relevant difference between after and before sentences is the order in which they present the events. Notice that these two assumptions are not equivalent. 
In fact, one can consistently hold that after and before are converses and that nevertheless there is more to the semantics of temporal connectives than the order in which they present the events [4], [5]. Still, it is difficult to avoid the impression that this is not what Münte et al. assume throughout their paper.

3.2 Semantics

At first sight, it would seem natural to regard after and before as converses. The following definitions account for this intuition:

(3) $A_{p,q} \iff \exists t_1\,[\,q(t_1) \land \exists t_2\,[\,t_2 < t_1 \land p(t_2)\,]\,]$
(4) $B_{p,q} \iff \exists t_1\,[\,q(t_1) \land \exists t_2\,[\,t_2 > t_1 \land p(t_2)\,]\,]$

These definitions can be phrased as follows. A sentence of the form After p, q is true at the time of the utterance if and only if there is an interval t_1 at which the event q occurs and there is an interval t_2 earlier than t_1 at which the event p occurs. A sentence of the form Before p, q is true at the time of the utterance if and only if there is an interval t_1 at which the event q occurs and there is an interval t_2 later than t_1 at which the event p occurs. From definitions (3)-(4) it is easy to show that Before q, p can be derived from After p, q and vice versa. In this sense, there is a symmetry between the sentential meanings of temporal connectives given in definitions (3)-(4). Within this framework after and before are treated as converses.
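The derivation of Before q, p from After p, q can be spelled out in a few steps. The following lines are a sketch that assumes the existential reading of definitions (3)-(4), with the subscripted notation A_{p,q} and B_{p,q} standing for After p, q and Before p, q. The second step only reorders the two existential quantifiers and rewrites t_2 < t_1 as t_1 > t_2; the third step is definition (4) with p and q exchanged and the bound variables renamed.

```latex
\begin{align*}
A_{p,q} &\iff \exists t_1\,[\,q(t_1) \land \exists t_2\,[\,t_2 < t_1 \land p(t_2)\,]\,] \\
        &\iff \exists t_2\,[\,p(t_2) \land \exists t_1\,[\,t_1 > t_2 \land q(t_1)\,]\,] \\
        &\iff B_{q,p}
\end{align*}
```

Since every step is an equivalence, the same chain read from bottom to top derives After p, q from Before q, p.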

There are at least two problems with the proposed definitions. First, they lead to wrong predictions with respect to the inferences that can be drawn from after and before sentences [2], [20], [21]. Second, within this framework the different behavior of negative polarity items in after and before sentences cannot be accounted for [29], [48]. To focus on the first problem, consider the following sentences:

(5) After Doris finished her studies, she traveled all over the world.
(6) Before Doris traveled all over the world, she finished her studies.

Definitions (3)-(4) predict that (5) entails (6) and vice versa. According to (3), (5) is true if and only if there is an interval t_1 at which Doris traveled all over the world and an interval t_2 earlier than t_1 at which Doris finished her studies. This is compatible with the situation in which Doris traveled all over the world both before and after the moment at which she finished her studies. On the contrary, our intuitions suggest that (6) is true if and only if there is an interval t_1 at which Doris finished her studies and every interval t_2 at which Doris traveled all over the world is later than t_1. Hence, a proper semantics for after and before should predict that (5) does not entail (6). Accordingly, we reject (4) and propose the following definition of before [29]:

(7) $B_{p,q} \iff \exists t_1\,[\,q(t_1) \land \forall t_2\,[\,p(t_2) \rightarrow t_1 < t_2\,]\,]$

Notice that in (7) universal quantification has been built into the meaning of before. In this way, the problems raised by non-mutually entailing sentences like (5) and (6) are solved. However, within this new framework after and before are no longer converses. An important consequence of definition (7) is that a sentence of the form Before p, q can be true even if the event q occurs at some interval t_1 and, for every t_2 later than t_1, the event p fails to occur at t_2.
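The contrast between definitions (3), (4) and (7) can be checked mechanically on a toy model. The following Python sketch is only an illustration under simplifying assumptions: intervals are represented by integers, an event by the finite set of intervals at which it occurs, and the function names and the timeline for Doris are hypothetical.

```python
# Toy model (an assumption of this sketch): intervals are integers and an
# event is the finite set of intervals at which it occurs.

def after_def3(p, q):
    """Definition (3): some interval of q is preceded by some interval of p."""
    return any(t2 < t1 for t1 in q for t2 in p)

def before_def4(p, q):
    """Definition (4): some interval of q is followed by some interval of p."""
    return any(t2 > t1 for t1 in q for t2 in p)

def before_def7(p, q):
    """Definition (7): some interval of q precedes every interval of p."""
    return any(all(t1 < t2 for t2 in p) for t1 in q)

# Hypothetical timeline: Doris travels at 2, finishes her studies at 5,
# and travels again at 8.
studies = {5}
travel = {2, 8}

sentence_5 = after_def3(studies, travel)        # After studies, travel
sentence_6_def4 = before_def4(travel, studies)  # Before travel, studies
sentence_6_def7 = before_def7(travel, studies)

# A before-clause event that never occurs is modeled by the empty set.
never = set()
nonveridical_def4 = before_def4(never, {3})
nonveridical_def7 = before_def7(never, {3})

print(sentence_5, sentence_6_def4, sentence_6_def7)
print(nonveridical_def4, nonveridical_def7)
```

On this timeline (5) comes out true under (3), while (6) is true under (4) but false under (7), so (5) no longer entails (6). With an event p that never occurs, Before p, q is false under (4) but true under (7), because the universal condition holds vacuously.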
To see that this prediction is correct, consider the following sentences:

(8) John had lunch with Mary before he went to work.
(9) Paul died before he saw his grandchildren.
(10) The pianist left the orchestra before the concert started.

Here we have some examples of different uses of the temporal connective before. In (8), by accepting the truth of the whole sentence we are forced to accept the truth of both clauses, "John went to work" and "he had lunch with Mary". In (9), by accepting the truth of the whole sentence we are forced to accept the truth of the main clause "he died" and the falsity of the subordinate clause "Paul saw his grandchildren". Finally, accepting the truth of (10) commits us to accepting the truth of the main clause "the pianist left the orchestra", but it does not force us to accept the truth of the subordinate clause "the concert started". Consider now the after versions of (8), (9) and (10):

(11) John went to work after he had lunch with Mary.
(12) Paul saw his grandchildren after he died.
(13) The concert started after the pianist left the orchestra.

Notice that, while (8) and (11) are equivalent, (10) and (13) are not. In fact, (13) commits us to accepting the truth of both clauses, "the pianist left the orchestra" and "the concert started", whereas (10) leaves open the possibility that the subordinate clause is false, namely that the concert didn't start. Finally, (9) is turned into (12), which is false at least in a world in which the dead cannot see. A connective is veridical if it forces us to accept the truth of the clause it introduces [35]. In this sense, after is generally veridical while before is not. After p, q entails the truth of the subordinate clause p, as predicted by definition (3). In contrast, Before p, q does not entail the truth of the subordinate clause p, as predicted by definition (7). We will call veridical those before sentences like (8) that (contextually) entail the truth of the subordinate clause and non-veridical those that do not [48]. The class of non-veridical before sentences also includes those like (9) that entail the falsity of the subordinate clause and that might be called anti-veridical.

Let us now consider some cross-linguistic data. It has been argued that Romance counterparts of non-veridical before require the presence of a subjunctive in the clause they introduce [2], [48]. This is indeed true. Consider for instance the French translation of (9):

(14) Paul mourut avant qu'il ne vît ses petits-enfants.

However, a subjunctive in the subordinate clause can also be used when the occurrence of the connective is veridical. In Italian, (8) can be translated as follows:⁷

(15) Prima che Giovanni andasse al lavoro, pranzò con Maria.

where andasse is the past-tensed subjunctive form of the verb andare ("to go"). In Romance languages, knowing the mood of the verb in the subordinate clause is not sufficient to decide whether the connective is used veridically or non-veridically. In German, a subjunctive in the bevor clause is required in order to express non-veridicality.
Still, non-veridical readings of past-tensed indicative bevor sentences are in some cases accepted by native speakers. In Dutch too, past-tensed indicative voordat sentences allow non-veridical readings.⁸ The same holds for inainte in Romanian and pered in Russian. More extensive empirical research would reveal whether, for a given language, there exist grammatical indicators that force non-veridical or anti-veridical readings of a counterpart of before. For our purposes, on the basis of a small amount of data, we will assume the following. Firstly, before can be used both veridically and non-veridically whereas after allows veridical uses only. Secondly, the distinction between veridical and non-veridical occurrences of before is genuinely semantic. Accordingly, it is the context of an utterance or world knowledge, as in the case of (9), that gives a hearer the information required in order to decide whether the occurrence of before is veridical or non-veridical. Thirdly, non-veridical readings of Before p, q sentences arise when a specific causal relation between the events obtains: the occurrence of q prevents the occurrence of p later in time, as is the case in examples (9)-(10) and also, as we will argue below, in (2) reported by Münte and colleagues [40]. The meaning of prevents (P) is given by the following definition:

(16) $q \, P \, p$ iff $\forall t_1 [q(t_1) \to \forall t_2 [t_1 < t_2 \to \neg p(t_2)]]$

Information concerning causal relations between events (world knowledge) in the form of (16) is stored in declarative memory and retrieved when it is made salient by the context. It remains to show how non-veridical readings of Before p, q sentences arise from the interplay of discourse information and world knowledge. The purpose of the derivation below is to replace all variables with constants, resulting either in a formula in which only constants occur or in a negative existential statement:

(17) $\exists t_1 [q(t_1) \land \forall t_2 [p(t_2) \to t_1 < t_2]]$ discourse information, by (7);
(18) $\forall t_1 [q(t_1) \to \forall t_2 [t_1 < t_2 \to \neg p(t_2)]]$ world knowledge, by (16);

Replacing the variable $t_1$ with the constant $t_q$ in (17) and (18) yields:

(19) $q(t_q) \land \forall t_2 [p(t_2) \to t_q < t_2]$ and
(20) $q(t_q) \to \forall t_2 [t_q < t_2 \to \neg p(t_2)]$;
(21) $q(t_q)$ by (19);
(22) $\forall t_2 [t_q < t_2 \to \neg p(t_2)]$ by (20) and (21);
(23) $\neg \exists t_2 [t_q < t_2 \land p(t_2)]$ by (22).

Propositions (21) and (23) can be read as follows. The event q occurs at the interval $t_q$; however, there is no interval $t_2$ later than $t_q$ at which p occurs. The subordinate clause is false or, equivalently, the temporal connective is interpreted anti-veridically.

⁷ Note that we changed the position of the temporal connective in order to avoid the use of the infinitive construction: Giovanni andò al lavoro prima di pranzare con Maria.
⁸ See the behavioral data presented in section 4.2.

3.3 Discussion

Following the ideas presented above, we will try to sketch a cognitively plausible account of the processing of After p, q and Before p, q sentences. At a first stage, the meanings of temporal connectives and other lexical items are retrieved from declarative memory. The processor then computes the meaning of the sentence, that is, a representation containing only variables in the form of (3) or (7). These processes roughly correspond to what formal semanticists call compositional build-up and psycholinguists call syntactic and lexical binding.
At a second stage, the processor engages in a credulous interpretation of discourse [53]: working memory computes a denotation of the clauses p and q which makes the sentence true. This is done by replacing all variables with constants. Existentially quantified variables can be directly replaced with constants. The process terminates either with a formula in which only constants occur or with a negative existential statement. For instance, the formula (21) $q(t_q)$ indicates that q occurs at the interval we called $t_q$, that is, $t_q$ is the denotation of the clause q. In the case of After p, q sentences, a denotation for both clauses is computed. In the case of Before p, q sentences, a denotation is computed only for the main clause q. If world knowledge in the form of (16) is retrieved from declarative memory, the processor derives a proposition in the form of (23), stating that the subordinate clause has no denotation in the relevant time interval. Let us now take a closer look at examples (1) and (2).

Let s be the clause "the scientist submitted the paper" and j "the journal changed its policy". Syntactic and lexical binding result in the following representations for (1) After s, j and (2) Before s, j respectively:

(24) $\exists t_1 [j(t_1) \land \exists t_2 [t_2 < t_1 \land s(t_2)]]$
(25) $\exists t_1 [j(t_1) \land \forall t_2 [s(t_2) \to t_1 < t_2]]$

The processor computes the denotation of the clauses s and j in (1) After s, j by replacing the variables $t_1$, $t_2$ with the constants $t_j$, $t_s$ in (24):

(26) $j(t_j) \land t_s < t_j \land s(t_s)$

Proposition (26) means that the event j (the journal changed its policy) occurs at the interval $t_j$, the event s (the scientist submitted the paper) occurs at the interval $t_s$, and $t_s$ is earlier than $t_j$. If we now turn to example (2), we can see that the manipulation of the initial lexical item ("before" instead of "after") results in a sentence in which (i) events are presented in reverse order and (ii) a different sequence of events is described: in (1) the journal's policy change follows the scientist's submission of the paper whereas in (2) the former precedes the latter. Since the discourse context is changed, a different fragment of world knowledge is now made salient. Reading example (2), one might wonder whether the occurrence of the first event prevented the occurrence of the second later in time. That is, the scientist might not have been able to submit the paper precisely because of the journal's policy change. In this sense, (2) can be interpreted non-veridically. The fact that the choice of the temporal connective was manipulated without reversing the order of the clauses reveals an important limitation of the experimental design of Münte and colleagues, which might have had consequences for the ERP profiles. Sentences like (2),⁹ in which the manipulation activated causal knowledge in the form of (16) (e.g. "The journal's policy change prevented the scientist from submitting the paper"), might be more demanding in terms of working memory resources.
The computations are sketched below:

(25) $\exists t_1 [j(t_1) \land \forall t_2 [s(t_2) \to t_1 < t_2]]$ discourse information, by (7);
(27) $\forall t_1 [j(t_1) \to \forall t_2 [t_1 < t_2 \to \neg s(t_2)]]$ world knowledge, by (16);

Replacing the variable $t_1$ with the constant $t_j$ in (25) and (27) yields:

(28) $j(t_j) \land \forall t_2 [s(t_2) \to t_j < t_2]$ and
(29) $j(t_j) \to \forall t_2 [t_j < t_2 \to \neg s(t_2)]$;
(30) $j(t_j)$ by (28);
(31) $\forall t_2 [t_j < t_2 \to \neg s(t_2)]$ by (29) and (30);
(32) $\neg \exists t_2 [t_j < t_2 \land s(t_2)]$ by (31).

There is no interval $t_2$ later than $t_j$ at which s occurs. The subordinate clause is false or, equivalently, the occurrence of the temporal connective is interpreted as anti-veridical.

⁹ An inspection of the complete set of materials of [40] shows that (2) is not an isolated example. There are other before sentences that might have triggered non-veridical readings of the temporal connective: "Before the run exhausted the joggers, the vendors provided some drinks", "Before the ambassador met the king, the incident overshadowed the negotiations", etc.
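The outcome of this derivation can be cross-checked by brute force on small finite models. The sketch below is an illustration under stated assumptions, not the procedure described in the text: time is the integers 0 to n-1, events are sets of time points, and the "prevents" relation is rendered as "no occurrence of the blocked event is later than any occurrence of the blocking event". All function names are hypothetical.

```python
from itertools import product

def before_universal(p, q):
    """Definition (7): some q-time precedes every p-time."""
    return any(all(t1 < t2 for t2 in p) for t1 in q)

def prevents(q, p):
    """Reading of (16): no p-time is later than any q-time."""
    return all(not (t1 < t2) for t1 in q for t2 in p)

def all_events(n):
    """Every subset of {0, ..., n-1}, including the event that never occurs."""
    return [frozenset(t for t, bit in enumerate(bits) if bit)
            for bits in product([0, 1], repeat=n)]

def anti_veridical_entailed(n):
    """In every model of 'Before s, j' plus 'j prevents s', s never occurs."""
    events = all_events(n)
    return all(len(s) == 0
               for s in events for j in events
               if before_universal(s, j) and prevents(j, s))

def premises_satisfiable(n):
    """The two premises are jointly satisfiable, so the check is not vacuous."""
    events = all_events(n)
    return any(before_universal(s, j) and prevents(j, s)
               for s in events for j in events)

print(anti_veridical_entailed(4), premises_satisfiable(4))
```

On these toy models the discourse information and the causal knowledge together leave no occurrence of s at all, which matches the anti-veridical reading reached in the last step of the derivation.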

According to Münte and colleagues, the observed slow potential negativity at left frontal scalp sites during the second clause is related to the additional computations required to build a representation of before sentences in working memory. Before sentences are more complex to parse because they present the events out of chronological order. However, non-veridical readings of before, forced by the activation of world knowledge, could make a sentence like (2) more complex to process compared with (1). This might reflect the number of the computations resulting in (30) and (32) compared with those resulting in (26). Assuming that these processes are subserved by working memory [53], we suggest that sentences like (2) might require more resources compared with (1). The semantics of before introduced a confounding factor in the experiment of Münte et al. Notice however that while the experimental design, and in particular the manipulation of the initial item without switching the order of the clauses, suggests that event order and non-veridicality might have interacted during the processing of Before p, q sentences, the data set presented in [40] does not allow any conclusion with respect to the correctness of our argument. In sum, without further experimental work we cannot decide which of the two hypotheses (event order vs. non-veridicality explaining the observed ERP effect), if any, should be rejected.

3.4 Conclusion

We would like to conclude this section with some suggestions for a refinement of the experiment discussed so far. Sentences of the form p before q and q after p were not used as stimuli in [40]. Upon a closer look, these sentences seem to be the ideal candidates for our new experiment. Suppose that the ERP effects elicited by After p, q (in-order veridical) vs.
q after p (out-of-order veridical) sentences are contrasted: if the response to q after p constructions is more negative, given that the only relevant difference between sentence types is event order, we will consider the hypothesis proposed by Münte et al. supported also by the new data and thus, up to a certain degree, confirmed. Another comparison could be tested: After p, q (in-order veridical) vs. p before q (in-order (non-)veridical). If the response to p before q constructions is more negative, given that the only relevant difference between sentence types is the connective used, that is, the fact that only before sentences license non-veridical readings, the observed ERPs should be attributed to the cognitive processes involved in establishing whether the occurrence of the connective is veridical or not. Accordingly, the hypothesis proposed by Münte et al. should be corrected. The account of sentence processing sketched here predicts however that before sentences are more complex to process only if a specific causal relation between the events obtains, and knowledge of such a relation is retrieved from declarative memory and used in the processes described above. As a consequence, the comparison in which we might be interested is not After p, q vs. p before q. Rather, we should compare veridical vs. non-veridical p before q sentences, that is, in-order items differing only in the causal relation obtaining between the events, and thus in the computations assigning a denotation (or no denotation) to the subordinate clause. The new study on temporal connectives is presented in section 4.

4 Temporal connectives

4.1 Introduction

In section 2 we discussed a series of studies in which the ERP effects investigated were very fast voltage changes occurring at specific latencies to critical words, that is, discourse items that are assumed to elicit the processes of interest. In this section we will look at a different class of ERP effects: slow voltage changes that unfold as lexical items are bound together into a coherent interpretation of the sentence. Sentence-length ERPs have been reported by Müller et al. [36] to spoken relative clauses in English. The following are examples of the stimuli used:

(1) The fireman who __ speedily rescued the cop sued the city.... (SS)
(2) The fireman who the cop speedily rescued __ sued the city.... (SO)

Subject-object (SO) constructions elicited a larger negativity compared with subject-subject (SS) sentences, starting with the onset of the relative clause and increasing as the gap (indicated with __) was encountered. The effect was more pronounced on right hemisphere sites, with a frontal maximum. The observed slow-potential wave was more broadly distributed during the post-relative-clause region. Functional imaging studies established a connection between working memory structures and blood flow increase in dorsolateral prefrontal brain regions during the processing of SO sentences [36]. The topographical distribution of the effects reported by Müller and colleagues supports the hypothesis that additional working memory resources are required to process object-relative sentences. Similar ERP effects have been found in two other studies [34], [23]. Mecklinger et al. [34] contrasted the ERPs elicited by SS and SO constructions in German, investigating how semantic information conveyed by lexical items influences the choice of either reading (SS vs. SO).
In a subgroup of subjects ("fast comprehenders"), semantically neutral SS and SO sentences (that is, sentences in which the verb does not influence thematic role assignment) elicited slow negativities beginning around 400 ms after the verb onset and continuing throughout the epoch. The effect was larger at midline and right anterior electrodes. King and Kutas [23] compared ERPs to SS and SO structures in English. SO sentences elicited larger slow-potential negativities with relative maxima over left frontal sites. The difference between the ERPs elicited by SS and SO sentences was larger in a subgroup of subjects ("good comprehenders"). In [36], [34], [23], working memory has been invoked to explain the observed ERP effects. In examples (1) and (2) above, the relative pronoun who (serving as a wh-filler) is held in working memory until the position from which it was extracted (i.e. the gap) is encountered. The distance between filler and gap is larger in SO constructions. Therefore, the relative pronoun who has to be maintained in working memory for a longer period when an SO sentence like (2) is processed compared with an SS sentence like (1). The longer the moved constituent has to be kept available in working memory, the more resources have to be spent. This would explain the larger negativity elicited by SO compared with SS constructions and also the differences between subject groups [11].

The studies just reviewed tended to focus on syntactic or structural binding operations, presumably assuming that lexical and semantic binding had little or no consequence for working memory load. It is of course reasonable to expect that structural properties of stimulus sentences (including filler-gap dependencies, word order [11], etc.) would affect the profile of ERP responses, in particular when default or preferred structures like SS are manipulated. However, as the discussion in [34] seems to imply, lexical and semantic binding operations might well interact with structural processing. Therefore, confounding factors might be introduced in the experimental design whenever the relevant distinctions are not made. An interesting case in this regard is the experiment conducted by Münte and colleagues on English temporal connectives [40] discussed in section 3. In this study, subjects were presented with sentences like the following:

(3) After the scientist submitted the article, the journal changed its policy.
(4) Before the scientist submitted the article, the journal changed its policy.

A comprehension probe followed each sentence, in which the clauses were presented at the left and right sides of the screen. Participants had to indicate which of the two events happened first (as stated by the after or before sentence they read) by pressing either the left- or the right-hand button. The result was a larger negative wave to before sentences at left frontal scalp sites. The effect was larger during the second clause, in particular in subjects with high working memory span scores ("good comprehenders"). According to Münte and colleagues [40], and consistently with the findings of the studies reviewed above [23], [34], [36], the observed negativity at left frontal scalp sites is related to the additional computations required to build a representation of the content of before sentences in working memory [58].
After p, q sentences present the events in the right chronological order: the order in which the events are mentioned in discourse (p, q) coincides with the order in which they occur (first p, then q). In contrast, Before p, q sentences present the events in reverse order: the order in which the events are mentioned in discourse (p, q) reverses the order in which they occur (first q, then p). On the basis of the information conveyed by the initial lexical item (the temporal connective in Before p, q indicates that p follows q), working memory has to reverse the order of the events as they are mentioned in discourse. The mechanism might be quite similar to that described above for filler-gap dependencies. The first clause p is held in working memory until the last word of the second clause q is encountered. At that point, a new discourse representation is constructed: this time the order in which the events are mentioned in discourse (q, p) coincides with the temporal ordering of the events (first q, then p). This would explain the observed slow-potential negativity to before sentences and the differences between subject groups. The findings of Münte et al. indicate that discourse ordering relations between clauses or sentences are relevant for establishing temporal ordering relations between events. More precisely, the hypothesis is that discourse and temporal ordering relations between events are expected to coincide. In other words, readers and hearers expect that events are mentioned and described in the order in which they occur. This expectation might be active during sentence comprehension and might have consequences for ERP responses. Whenever a sentence presenting the events in reverse order is processed, as is the case for Before p, q constructions, working memory has to adjust the discourse ordering relation (p, q) with respect to the temporal ordering relation established by before (first q, then p), resulting in a new discourse representation (q, p). Borrowing the terminology from McGonigle and Chalmers [33], we will talk of a privileged temporal representation vector (forcing events to be encoded in working memory only in the order in which they occur) instead of an expectation that discourse (in general, representational) order and temporal order coincide. Evidence from behavioral studies conducted on humans and non-human primates supports the hypothesis that privileged temporal (and spatial) representation vectors are active during working memory tasks [33].

Still, some criticism can be addressed to the explanation adopted by Münte and colleagues. For instance, one could ask whether the observed variations of working memory load were related to the processing of before sentences or rather were the consequence of the additional task imposed by the comprehension probe. In the probe, subjects had to indicate by pressing a button which of the two events occurred first. A temporal representation vector might indeed improve subjects' performance on this task. Considerations of economy have received increasing attention in recent accounts of behavioral data from working memory tasks [33]. In this case, events might be stored in working memory in the right order precisely to ensure a fast and accurate retrieval during the probe. However, a recomputation of the discourse ordering relation might not be required when the only task is reading and understanding the stimuli.
Another, more important question is whether event order is the only relevant difference between After p, q and Before p, q constructions, and thus whether the observed ERP effects should be attributed to event order alone or rather to the interaction with some other feature of the stimuli.¹⁰ Cross-linguistic data suggest that the answer to this question is negative. In most European languages, after and its counterparts are generally veridical while before and its counterparts are not [2], [30]. A connective is veridical if it forces us to accept the truth of the clause it introduces [35]. Sentences of the form After p, q entail the truth of the subordinate clause p. In contrast, sentences of the form Before p, q do not entail the truth of the subordinate clause p. We will call veridical those before sentences that (contextually) entail the truth of the subordinate clause and non-veridical (NV) those that do not [48]. The class of non-veridical before sentences also includes those that entail the falsity of the subordinate clause and that we might call anti-veridical (AV). The following examples are taken from Google:

(5) The defendant escaped before the officer completed the criminal records review or the paperwork necessary to obtain felony warrants. (NV)
(6) Secularism died before it came in vogue. (AV)

Sentence (6) entails the falsity of the subordinate clause "(secularism) came in vogue", whereas in (5) we are not forced to accept the truth (nor the falsity) of the clause introduced by before. Upon a closer look, this is precisely what happens in the case of example (4) reported by Münte et al.: we are not forced to accept the truth of the subordinate clause because the journal's policy change might in fact have prevented the scientist from submitting the article. In contrast with what the authors claim ("each clause described a distinct event that is neither logically nor causally related to the other" [40]), we can hypothesize a causal relation between the events described by (4) and thus a specific logical relation between the sentence and the clause introduced by the temporal connective.

In our view, these remarks suggest a different interpretation of the results presented by Münte and colleagues. Before sentences might be more difficult to process not only because they present the events in reverse order, but also because discourse information might have to be checked against factual or world knowledge in order to establish whether the sentence is veridical or non-veridical. In particular, suppose that the proposition that the occurrence of the first event prevents the occurrence of the second later in time is retrieved from declarative memory as world knowledge. At that stage, the processor performs semantic binding operations, resulting in a failure to assign a denotation to the clause introduced by before. For instance, suppose that a sentence like the following is encountered:

(7) Max died before he saw his grandchildren. (AV)

As soon as relevant lexical information is available, the conditional "If Max dies, then he does not see his grandchildren later in time" is retrieved from declarative memory as (an instance of) world knowledge and is used in a reasoning process terminating with a failure to assign a denotation to the clause "Max saw his grandchildren". Accordingly, the occurrence of the temporal connective is interpreted as anti-veridical. Semantic binding operations of this kind might have interacted with event order in the experiment conducted by Münte and colleagues.

¹⁰ See sections 3.2 and 3.3 for discussion.
Notice that we are postulating a distinction between (i) syntactic, (ii) lexical and (iii) semantic binding operations. In other words, we assume that the processes involved in (i) computing a tree-like representation of the structure of the sentence [17], [59], (ii) computing the meaning of the sentence on the basis of the meanings of its constituents and (iii) computing a denotation of the clauses making the sentence true [53] are distinct, although not independent. More importantly, we assume that the standard ERP task of reading and understanding stimulus sentences is sufficient to elicit not only syntactic and lexical binding operations (as research on the N400 and P600 shows), but also semantic binding operations, that is (as we will argue below) reasoning processes. Against our hypothesis, one could observe that a standard task elicits only lexical and syntactic binding operations, since these are sufficient to understand a sentence. For instance, in the case of examples (4) and (5) the processor would compute the meaning of the sentence leaving undefined the truth-value of the clauses "the scientist submitted the article" and "the officer completed the criminal records review", etc. If this were the case, assuming that the lexical and syntactic binding operations are the same for after, veridical and non-veridical before sentences, we would see no difference between ERP responses. Finally, if the hypothesis proposed by Münte and colleagues is correct, we would expect a difference only between in-order ("After p, q") and out-of-order ("Before p, q" or "q after p") sentences.

In conclusion, the results presented by Münte and colleagues neither support nor refute the hypothesis of an interaction between the processes involved in recomputing discourse ordering relations and semantic binding operations. For this reason, we conducted an ERP study on Dutch temporal connectives in which the experimental paradigm used in [40] was refined by manipulating the choice of the connective ("after" vs. "before"), its position (sentence-initial vs. between clauses) and its preferred readings (veridical vs. non-veridical). The new experimental design will hopefully give us enough data to test different hypotheses concerning the processing of after and before sentences. Below we present the three main comparisons that will be considered and discussed in section 4.4.

(i) Suppose we compare After p, q vs. q after p sentences. The former present the events in the right order whereas in the latter event order is reversed. Both sentence types allow veridical readings only. If the response to q after p sentences is more negative, we can conclude that the hypothesis proposed by Münte et al. is supported by the new data set and, up to a certain degree, confirmed. On the other hand, if no difference is observed, we can attribute the left anterior negativity reported by Münte and colleagues either to veridicality checking (ii) or to the additional task imposed by the comprehension probe, resulting in a variation of working memory load.

(ii) Suppose we compare After p, q vs. p before q sentences. The latter also allow non-veridical readings (under suitable conditions) whereas the former license veridical readings only. Both sentence types present the events in the order in which they occur.
If the response to p before q sentences is more negative, we can hypothesize that before sentences activate a search procedure in declarative memory, terminating either with the activation of world knowledge, then used by working memory to derive an anti-veridical reading of the temporal connective or, when world knowledge is not activated, with a veridical reading of the connective. This reflects the intuition that veridicality itself might be more difficult to establish for before than for after sentences. Henceforth, we will call this procedure veridicality checking. In this case, the hypothesis proposed by Münte et al. would have to be revised, and possibly rejected if no difference is observed between After p, q vs. q after p sentences in comparison (i).

(iii) Suppose we compare veridical vs. non-veridical p before q sentences. Both sentence types present the events in the order in which they occurred. However, a specific causal relation obtains between the events described by the latter: the occurrence of p prevents the occurrence of q later in time. If the response to non-veridical p before q sentences is more negative, we can conclude that semantic binding operations are activated whenever a specific causal relation between the events is detected. According to our hypothesis, world knowledge is used in a reasoning process, terminating with a failure to assign a denotation to the subordinate clause or, equivalently, with an anti-veridical reading of the temporal connective. Henceforth, we will call this procedure semantic binding. Since veridical p before q sentences do not exhibit a causal relation between the events of the type described above, world knowledge is not retrieved from declarative memory and thus semantic binding operations are not initiated. We expect that a standard comprehension task will be sufficient to elicit semantic binding operations.
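The logic of comparisons (i)-(iii) can be summarised as a small decision table. The function below is only an illustrative sketch: its name, its boolean encoding of the outcomes and the labels it returns are assumptions made here, and it leaves aside the alternative explanation in terms of the comprehension probe.

```python
def supported_accounts(order_effect, connective_effect, causal_effect):
    """Map the outcomes of comparisons (i)-(iii) onto the accounts they support.

    Each argument is True when the second sentence type of the comparison
    elicits the more negative response, and False when no difference is seen.
    """
    accounts = []
    if order_effect:        # (i) After p, q vs. q after p
        accounts.append("event order")
    if connective_effect:   # (ii) After p, q vs. p before q
        accounts.append("veridicality checking")
    if causal_effect:       # (iii) veridical vs. non-veridical p before q
        accounts.append("semantic binding")
    # The event-order account is rejected only when (ii) shows an effect
    # and (i) does not; otherwise it is at most revised.
    event_order_rejected = connective_effect and not order_effect
    return accounts, event_order_rejected

print(supported_accounts(False, True, True))
```

For instance, an effect in (ii) and (iii) without an effect in (i) would support veridicality checking and semantic binding while counting against the event-order account.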

4.2 Method

Subjects

Twenty-four volunteers (14 women and 10 men, 20 to 49 years of age, mean age 24.6) participated in our experiment (see section 2.2). All were native speakers of Dutch with no history of neurological trauma or psychiatric disorders. None of them reported the use of alcohol or other recreational drugs within the preceding 24 hours. All had normal or corrected-to-normal vision. Subjects gave their informed consent before the beginning of each session. Participants received €6 per hour for taking part in the study.

Materials

The set of stimuli consisted of 640 critical sentences and 320 fillers (see section 2.2). Critical sentences were constructed from an initial set of 160 clause pairs. Each clause described a distinct event located in the past. Narratives representing sequences of events were obtained by adding temporal connectives to clause pairs. We manipulated the choice of the temporal connective ("nadat" vs. "voordat"), its position (sentence-initial vs. between clauses) and thus the order in which the events were presented (in-order vs. out-of-order), and the temporal relation between the events (unmarked vs. marked). Of the initial 160 clause pairs, 80 described a causally unmarked temporal relation between the events (i.e. the occurrence of the first normally does not prevent the occurrence of the second later in time) and 80 described a causally marked temporal relation between the events (i.e. the occurrence of the first normally prevents the occurrence of the second later in time). A voordat sentence describing an unmarked relation has a preferred veridical reading. If the temporal relation described is causally marked, the sentence has a preferred non-veridical reading. Since nadat licenses veridical readings only, a nadat sentence describing a marked temporal relation conveys the additional meaning that the occurrence of the first event failed to prevent the occurrence of the second later in time.
We used these intuitions to realize separate conditions for veridical and non-veridical voordat sentences and for unmarked and marked nadat sentences. Our materials included 160 sentences per type (A)-(D) or, equivalently, 80 sentences per condition (A1)-(D2). Conditions are listed below, followed by some examples of the stimuli used (see appendix A for the complete set):

(A1) 80 Unmarked Nadat p, q
(A2) 80 Marked Nadat p, q
(B1) 80 Unmarked q nadat p
(B2) 80 Marked q nadat p
(C1) 80 Veridical Voordat q, p
(C2) 80 Non-veridical Voordat q, p
(D1) 80 Veridical p voordat q
(D2) 80 Non-veridical p voordat q

(8) Nadat de man het lezen beëindigd had, sloot de vrouw het boek. (A1)
(9) Nadat de doelman de bal gegrepen had, maakte de aanvaller een goal. (A2)
(10) De vrouw sloot het boek nadat de man het lezen beëindigd had. (B1)
(11) De aanvaller maakte een goal nadat de doelman de bal gegrepen had. (B2)
(12) Voordat de vrouw het boek gesloten had, beëindigde de man het lezen. (C1)
(13) Voordat de aanvaller een goal gemaakt had, greep de doelman de bal. (C2)
(14) De man beëindigde het lezen voordat de vrouw het boek gesloten had. (D1)
(15) De doelman greep de bal voordat de aanvaller een goal gemaakt had. (D2)
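The factorial structure behind conditions (A1)-(D2) can be made explicit with a short sketch. It is only an illustration of the design as described in the text: the function and variable names are hypothetical, and the rule deriving event order from connective and position is our reading of the sentence schemas (Nadat p, q and p voordat q mention the earlier event first).

```python
from itertools import product

# Factor levels as described in the Materials section.
CONNECTIVES = ("nadat", "voordat")
POSITIONS = ("initial", "between")
RELATIONS = ("unmarked", "marked")

# Sentence-type labels used in the text for each connective/position pair.
TYPE_LABELS = {
    ("nadat", "initial"): "A",    # Nadat p, q
    ("nadat", "between"): "B",    # q nadat p
    ("voordat", "initial"): "C",  # Voordat q, p
    ("voordat", "between"): "D",  # p voordat q
}


def event_order(connective, position):
    """Return 'in-order' when the earlier event p is mentioned first."""
    # Assumption: p precedes q in time in all four sentence schemas.
    in_order = (connective == "nadat") == (position == "initial")
    return "in-order" if in_order else "out-of-order"


def build_conditions(items_per_condition=80):
    """Enumerate the eight conditions (A1)-(D2) with their factor levels."""
    table = {}
    for connective, position, relation in product(CONNECTIVES, POSITIONS, RELATIONS):
        suffix = "1" if relation == "unmarked" else "2"
        label = TYPE_LABELS[(connective, position)] + suffix
        table[label] = {
            "connective": connective,
            "position": position,
            "relation": relation,
            "order": event_order(connective, position),
            "n_items": items_per_condition,
        }
    return table


if __name__ == "__main__":
    for label, cond in sorted(build_conditions().items()):
        print(label, cond)
```

Summing n_items over the eight conditions gives the 640 critical sentences mentioned above.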

In order to prevent referential and semantic ambiguities, anaphoric pronouns and modal verbs were avoided. We avoided cross-clause anaphora so that the cognitive processes underlying co-reference would not affect working memory load. Each sentence consisted of 12 words and the position of the verbs was the same across items. To reduce ocular artifacts, we made sure that each word had at most 13 letters. Dutch spelling conventions were checked in an up-to-date, on-line version of the Van Dale dictionary. From the initial list of 640 critical sentences, we constructed 4 versions of the stimuli. Each version contained 20 items per condition (A1)-(D2). Thus, each subject read 160 critical sentences and 160 fillers, presented in a pseudo-random fashion. Only sequences of at most 3 items per condition were allowed in order to avoid repetition effects. We inspected each version to rule out sequences of sentences suggesting any form of thematic continuity.

Pre-test

In section 4.1 we remarked that in most European languages before and its counterparts are not generally veridical. Accordingly, we expect native speakers of Dutch to make a distinction between veridical and non-veridical occurrences of voordat or, in other words, to interpret non-veridically those voordat sentences that describe marked relations between events. This point is of great importance for our purposes: if subjects on average do not recognize such preferred non-veridical readings or, equivalently, do not make a distinction between veridical and non-veridical occurrences of voordat, then we cannot expect different ERP responses to the conditions listed above. For this reason, we administered a behavioral pre-test with a restricted set of nadat and voordat sentences.
The materials included 160 critical sentences (see appendix A), 20 per condition (A1)-(D2), and 40 fillers, similar in structure to critical sentences but containing the temporal connectives op het moment dat, zodra and terwijl instead. With these 200 sentences, we constructed 4 test versions consisting of 40 critical sentences (5 per condition) and 40 fillers each. The sentences were presented (4 per page) as in the example below:

1. Bekijk de volgende zin:
   De brandweer kwam aan voordat de brand het bos vernietigd had.
   Uit deze zin kan worden afgeleid dat:
   a. de brand het bos vernietigde;
   b. de brand het bos niet vernietigde;
   c. het niet duidelijk is of de brand het bos vernietigde.

The options (a)-(c) were meant to correspond to the conclusions that can be drawn from a voordat sentence when the temporal connective is interpreted veridically (a) or non-veridically (b)-(c), where (b) corresponds to the anti-veridical reading. Thirty-two native speakers of Dutch (19 women and 13 men, 19 to 43 years of age, mean age 26.9) from different educational backgrounds took part. An equal number of subjects was assigned to each test version. Participants were allowed to complete the test without time restrictions. Clear instructions in Dutch were given on the first page of the booklets, asking subjects to carefully read each of the 80 sentences presented and then circle one of the three possible conclusions.

We expected subjects to read all 20 nadat sentences and the 10 voordat sentences describing unmarked temporal relations as veridical, choosing option (a), and all 10 voordat sentences describing causally marked temporal relations as non-veridical (or anti-veridical), choosing options (b) or (c). The results are summarized in table 4.1 below.

Condition                          a       b+c     p       Fit to expect.
(A1) Unmarked Nadat p, q           158*    2       .161    98.75%
(A2) Marked Nadat p, q             158*    2       .161    98.75%
(B1) Unmarked q nadat p            156*    4       .044    97.50%
(B2) Marked q nadat p              159*    1       .325    99.38%
(C1) Veridical Voordat q, p        102*    58      .000    63.75%
(C2) Non-veridical Voordat q, p    31      129*    .001    80.63%
(D1) Veridical p voordat q         113*    47      .000    70.63%
(D2) Non-veridical p voordat q     32      128*    .000    80.00%

Table 4.1 Frequencies of subjects' choices per condition. Responses that match our expectations are marked with an asterisk. Significance values from a one-sample t-test (test value 5, i.e. the maximum number of answers per condition matching our expectations) are reported. Percentages of responses fitting expectations are given in the last column.

Our expectations for nadat sentences appear to be largely fulfilled. A prima facie positive result can also be reported for voordat sentences, although statistical testing reveals a significant departure of subjects' responses from the expected pattern. The average number of choices in conditions (C1)-(D2) that match our expectations is nevertheless reasonably high, in particular for non-veridical voordat sentences (C2) and (D2). This suggests that participants were more inclined to interpret non-veridically (what we intended to be read as) veridical voordat sentences than the inverse. We can thus conclude that subjects are on average sensitive to non-veridical uses of voordat.
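The last column of table 4.1 follows directly from the response counts. The sketch below recomputes it, assuming (as the counts suggest) that each condition received 160 responses in total, i.e. 32 subjects times 5 items; the names used are hypothetical and the t-tests are not reproduced here.

```python
# Counts from table 4.1: (condition, option a, options b+c, expected option).
PRETEST_COUNTS = [
    ("A1", 158, 2, "a"),
    ("A2", 158, 2, "a"),
    ("B1", 156, 4, "a"),
    ("B2", 159, 1, "a"),
    ("C1", 102, 58, "a"),
    ("C2", 31, 129, "b+c"),
    ("D1", 113, 47, "a"),
    ("D2", 32, 128, "b+c"),
]


def fit_to_expectation(a, b_c, expected):
    """Percentage of responses that match the expected option."""
    total = a + b_c
    matching = a if expected == "a" else b_c
    return 100.0 * matching / total


if __name__ == "__main__":
    for condition, a, b_c, expected in PRETEST_COUNTS:
        print(f"({condition}) {fit_to_expectation(a, b_c, expected):.2f}%")
```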
More precisely, as subjects' responses to conditions (C1)-(D2) indicate, non-veridical (and anti-veridical) readings of voordat are viable interpretive options whenever a marked relation between the events is detected (as was the case for most of our intended non-veridical sentences) or can be hypothesized (as might have been the case for some of our intended veridical ones). As to our ERP experiment, these behavioral findings look promising: since subjects distinguish between nadat, veridical voordat and non-veridical voordat sentences, it does not seem unreasonable to expect different ERP responses to conditions (A1)-(D2) listed above.

Procedure

After electrode application, participants were led into the experimental room and asked to sit in front of the video monitor. Subjects were instructed to avoid eye blinks or movements during sentence presentation. Their only task was to read each sentence carefully for comprehension. Stimulus sentences were presented one word at a time (duration 300 ms, with 600 ms between word

onsets) in white in the middle of the screen. After 1000 ms of blank screen following the offset of the last word of each sentence, an asterisk appeared for 1500 ms, during which subjects were allowed to blink or change their posture. There was no comprehension probe following critical sentences and no additional working memory span test was performed. Each session lasted about two hours, including subject preparation. The actual experiment took approximately 50 minutes to complete and was divided into four blocks of 80 trials each. Three short breaks between the blocks allowed subjects to recover from the blinking regime. We monitored participants' behavior during the task with a video camera and a microphone placed inside the experimental room. Immediately after the last block, we conducted an informal debriefing interview with subjects. Together with the results of our behavioral pre-test, participants' reports were used as additional elements in a preliminary interpretation of ERP data [46].

Recording

The EEG/EOG was sampled continuously from 32 electrode sites. Two electrodes placed at the left- and right-eye outer canthi and one below the left eye monitored horizontal and vertical eye movements. The remaining 28 electrodes were placed at the following scalp locations: Fp1, Fp2, F7, F3, Fz, F4, F8, FC5, FC1, FCz, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, O1, O2 (figure 2.1). Sintered Ag/AgCl electrodes, fixed in an elastic cap, were used. EOG electrodes were applied with two-sided adhesive decals. Before electrode application, subjects' skin was cleaned with ethyl alcohol and abrasive gel to reduce the impedance at the scalp-electrode junction. Electrode impedance was kept below 5 kΩ and was checked before the beginning of each block and at the end of the session. The left mastoid electrode TP9 was used as reference.
Subsequently, a linked mastoid reference was computed off-line by re-referencing all EEG channels to the right mastoid electrode TP10, which was excluded from further analysis. The EEG/EOG was amplified by a multichannel BrainAmp™ DC system. Amplifier settings were: sampling rate 500 Hz, sampling interval 2000 µs, time constant 10 s, high cutoff 70 Hz. Several markers were sent out on-line to the recording device: for all conditions (A1)-(D2), a sentence-initial and a sentence-final marker, corresponding to the onset of the first and the last word of each sentence respectively; for conditions (A1)-(A2) and (C1)-(C2), a second-clause marker, corresponding to the onset of the first word (i.e. the verb) of the second clause; for conditions (B1)-(B2) and (D1)-(D2), a temporal connective marker, corresponding to the onset of the words nadat and voordat.

Data analysis

Several transforms were applied to the raw EEG signal. The sampling rate was changed to 200 Hz (interval 5000 µs) and the high cutoff filter was set to 30 Hz. Twenty segments per condition (40 per sentence type) were obtained, starting with the onset of sentence-initial words and ending 7400 ms later. Baseline correction was computed over the 150 ms preceding the onset of sentence-initial words.
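The segmentation parameters above translate into sample counts as in the following sketch. The text does not say how baseline correction was implemented; subtracting the mean voltage of the pre-stimulus interval from every sample is the common procedure and is assumed here. Function names are hypothetical.

```python
def ms_to_samples(duration_ms, sampling_rate_hz):
    """Convert a duration in milliseconds to a number of samples."""
    # Integer arithmetic avoids floating-point rounding for these values.
    return duration_ms * sampling_rate_hz // 1000


def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the first n_baseline samples from every sample.

    Assumption: the epoch is a list of voltages for one channel whose
    first n_baseline samples cover the pre-stimulus interval.
    """
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [value - baseline for value in epoch]


if __name__ == "__main__":
    rate_hz = 200  # sampling rate after downsampling
    print("samples per 7400 ms segment:", ms_to_samples(7400, rate_hz))
    print("samples per 150 ms baseline:", ms_to_samples(150, rate_hz))
    # Toy epoch: two baseline samples followed by two post-onset samples.
    print(baseline_correct([1.0, 3.0, 5.0, 7.0], n_baseline=2))
```

At 200 Hz one sample spans 5 ms, which matches the 5000 µs interval reported above.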

Epochs contaminated by artifacts (eye blinks, horizontal eye movements, muscle activity, high-amplitude alpha and electrode drifts) were marked and excluded from further analysis. In order to guarantee an acceptable signal-to-noise ratio, subjects had to have at least 15 (out of 20) good trials per condition to be kept in the final analysis. The average number of rejected trials per condition was: (A1) 2.38, (A2) 2.58, (B1) 2.71, (B2) 2.83, (C1) 2.33, (C2) 2.25, (D1) 2.75, (D2) 2.42. Average waveforms were computed for each participant in each condition and in each sentence type. Overlay functions of grand average ERPs were calculated for sentence type pairs (A)-(B), (A)-(C), (A)-(D) and for condition pairs (A1)-(C1), (A1)-(A2), (D1)-(D2). Waveforms were inspected for differential effects at specific latencies. Three time windows relative to the onset of sentence-initial words were selected: 600-3000 ms (first clause), 4200-6600 ms (second clause) and 6600-7400 ms (sentence-final). We also included a sentence-final time window because we expected semantic binding operations to occur as soon as all relevant lexical information is available. This means that the ERP effects due to the manipulation of the temporal relation, if any, will most likely occur at the end of the second clause and in particular after the last verb phrase has been parsed: in p voordat q sentences, in the interval following the onset of the last two lexical items (i.e. the verb); in Nadat p, q sentences, a few hundred milliseconds sooner (because of the position of the main-clause verb), thus mostly in the second time window (4200-6600 ms). Repeated measures analyses of variance (ANOVAs) used mean amplitude values computed for each participant in each time window. Univariate F tests were adjusted by means of the Huynh-Feldt correction. Mauchly's sphericity test was used to calculate the value of ε. Statistical analyses were performed in two stages.
Initially, we conducted two parallel ANOVAs with the factors we manipulated while constructing the materials (see note 11):

ANOVA (1)
  Connective (2 levels: nadat/voordat)
  Event order (2 levels: in-order/out-of-order)
  Temporal relation (2 levels: marked/unmarked)
  Electrode site (28 levels)

ANOVA (2)
  Connective (2 levels: nadat/voordat)
  Position (2 levels: initial/between clauses)
  Temporal relation (2 levels: marked/unmarked)
  Electrode site (28 levels)

Subsequently, we conducted separate ANOVAs of comparisons between sentence types and conditions. The factors were: sentence type (2 levels), condition (2 levels), temporal relation (2 levels) and electrode site (28 levels), restricted where appropriate to collections of electrodes or single sites. The factor hemisphere (2 levels: left/right) was used to test for lateralizations of the observed ERPs.

11 Notice that the main effect of event order in ANOVA (1) coincides with the interaction effect of connective and position in ANOVA (2), and the main effect of position in ANOVA (2) coincides with the interaction effect of connective and event order in ANOVA (1). This reflects the properties of nadat and voordat sentences: e.g. to reverse event order, one should change either the temporal connective or its position, but not both.
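The coincidence of effects pointed out in the footnote can be verified with simple contrast arithmetic on the mean amplitudes of the four sentence types. The sketch below is only an illustration: the cell means are arbitrary numbers, the function names are hypothetical, and the assignment of sentence types to event order (A and D in-order, B and C out-of-order) follows the sentence schemas given in the Materials section.

```python
def order_main_effect(m):
    """In-order (A, D) minus out-of-order (B, C) sentence types."""
    return (m["A"] + m["D"]) - (m["B"] + m["C"])


def connective_by_position(m):
    """Position effect for nadat (A - B) minus position effect for voordat (C - D)."""
    return (m["A"] - m["B"]) - (m["C"] - m["D"])


def position_main_effect(m):
    """Sentence-initial (A, C) minus between-clauses (B, D) connectives."""
    return (m["A"] + m["C"]) - (m["B"] + m["D"])


def connective_by_order(m):
    """Order effect for nadat (A - B) minus order effect for voordat (D - C)."""
    return (m["A"] - m["B"]) - (m["D"] - m["C"])


if __name__ == "__main__":
    # Arbitrary illustrative cell means, not data from the experiment.
    means = {"A": 3, "B": -1, "C": 4, "D": 2}
    print(order_main_effect(means), connective_by_position(means))
    print(position_main_effect(means), connective_by_order(means))
```

Both pairs of contrasts reduce to the same linear combination of cell means (A - B - C + D and A - B + C - D respectively), so the corresponding F tests necessarily agree.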

4.3 Results Let us first look at the outcomes of factorial analyses in each time window separately. In the 600-3000 ms interval (first clause), we did not find significant main effects of the factors connective, event order and position. However, the main effect of temporal relation was marginally significant and the interaction between the factors event order and relation was reliable. We did not find significant connective by order and connective by relation interactions. The situation is slightly different in the 4200-6600 ms time window (second clause). Again, the main effects of connective, event order and position are not significant while the main effect of relation is marginally significant. Also in this case, there is no reliable interaction between the factors connective and event order. The event order by relation interaction is again significant. Interestingly, we also found a reliable interaction between the factors connective and relation. In the 6600-7400 ms interval (sentence-final), the only significant effect is the interaction between event order and relation. Results are reported in table 4.2. 
Table 4.2

600-3000 ms
  Connective:            F(1,23) = .160, p = .693, ε = 1
  Order:                 F(1,23) = .504, p = .485, ε = 1
  Position:              F(1,23) = .266, p = .611, ε = 1
  Relation:              F(1,23) = 3.084, p = .092, ε = 1
  Connective × order:    F(1,23) = .266, p = .611, ε = 1
  Connective × relation: F(1,23) = 2.822, p = .106, ε = 1
  Order × relation:      F(1,23) = 6.371, p = .019, ε = 1

4200-6600 ms
  Connective:            F(1,23) = .010, p = .921, ε = 1
  Order:                 F(1,23) = .707, p = .409, ε = 1
  Position:              F(1,23) = .001, p = .972, ε = 1
  Relation:              F(1,23) = 2.986, p = .097, ε = 1
  Connective × order:    F(1,23) = .001, p = .972, ε = 1
  Connective × relation: F(1,23) = 5.502, p = .028, ε = 1
  Order × relation:      F(1,23) = 11.500, p = .003, ε = 1

6600-7400 ms
  Connective:            F(1,23) = .148, p = .704, ε = 1
  Order:                 F(1,23) = .059, p = .810, ε = 1
  Position:              F(1,23) = .410, p = .528, ε = 1
  Relation:              F(1,23) = 1.810, p = .192, ε = 1
  Connective × order:    F(1,23) = .410, p = .528, ε = 1
  Connective × relation: F(1,23) = 1.132, p = .298, ε = 1
  Order × relation:      F(1,23) = 12.089, p = .002, ε = 1

In contrast with the results of Münte and colleagues, there is no effect of event order, that is no difference between in-order and out-of-order sentences (see point (i), section 4.1). In addition, there is no effect of temporal connective, that is no difference between nadat and voordat sentences and thus no effect of veridicality checking (see point (ii), section 4.1). What seems to be consistent with our initial assumptions is the interaction between the factors connective and relation during the second clause. This suggests that there might be processing differences between veridical vs. non-veridical voordat sentences and unmarked vs. marked nadat sentences (see point (iii), section 4.1).

Let us now look at sentence type comparisons, starting from (A) Nadat p, q vs. (C) Voordat q, p. The main effect of sentence type is not significant in any of the time windows examined. However, we found reliable interactions between the factors sentence type and temporal relation. Results are reported in table 4.3.

Table 4.3
  Type (600-3000 ms):                F(1,23) = 1.129, p = .281, ε = 1
  Type (4200-6600 ms):               F(1,23) = .452, p = .503, ε = 1
  Type (6600-7400 ms):               F(1,23) = .252, p = .621, ε = 1
  Type × rel. (600-3000 ms):         F(1,23) = 8.967, p = .006, ε = 1
  Type × rel. (4200-6600 ms):        F(1,23) = 21.182, p = .000, ε = 1
  Type × rel. (6600-7400 ms):        F(1,23) = 15.410, p = .001, ε = 1
  Type × rel. × site (600-3000 ms):  F(27,621) = 1.954, p = .068, ε = .246
  Type × rel. × site (4200-6600 ms): F(27,621) = 1.590, p = .123, ε = .327
  Type × rel. × site (6600-7400 ms): F(27,621) = 1.832, p = .091, ε = .240

Comparing (A) Nadat p, q vs. (B) q nadat p sentence types, the main effect of sentence type is significant at P8 in the 6600-7400 ms interval (figures 4.1 and 4.2). A direct inspection of topographical maps in the same time window reveals that out-of-order nadat sentences have a negative focus at left frontal sites (figure 4.2). However, the main effect of sentence type fails to reach significance in a separate ANOVA restricted to F3. Also in this case, the interaction between the factors sentence type and relation is significant, although in the 4200-6600 ms and 6600-7400 ms intervals only.
See table 4.4 for the results of statistical analyses.

Table 4.4
  Type (28 electrodes) (600-3000 ms):  F(1,23) = .017, p = .898, ε = 1
  Type (28 electrodes) (4200-6600 ms): F(1,23) = .294, p = .593, ε = 1
  Type (28 electrodes) (6600-7400 ms): F(1,23) = .058, p = .811, ε = 1
  Type (P8) (4200-6600 ms):            F(1,23) = 3.609, p = .070, ε = 1
  Type (P8) (6600-7400 ms):            F(1,23) = 5.313, p = .031, ε = 1
  Type (F3) (6600-7400 ms):            F(1,23) = 1.590, p = .220, ε = 1
  Type × rel. (4200-6600 ms):          F(1,23) = 7.750, p = .011, ε = 1
  Type × rel. (6600-7400 ms):          F(1,23) = 7.789, p = .010, ε = 1

Figure 4.1 Grand average (n = 24) waveforms elicited by Nadat p, q vs. q nadat p sentences at P8. The onset of the sentence-initial word is at 0 ms.

Figure 4.2 Triangulation-linear interpolated topographic map displaying the mean difference between Nadat p, q and q nadat p sentences in the 6600-7400 ms time window relative to the sentence-initial word. Out-of-order nadat sentences elicit a more negative response at right parietal (P8) and left frontal (F3) sites. Top and front views are shown.

Lastly, we contrasted (A) Nadat p, q vs. (D) p voordat q sentences. Again, the main effect of sentence type was not significant in any of the time windows considered. Voordat sentences are more negative at left prefrontal sites in the 6600-7400 ms interval (figure 4.3), but a separate ANOVA conducted on Fp1 shows that the main effect of sentence type is not statistically robust. The interaction between the factors sentence type and temporal relation is reliable in the 4200-6600 ms time window. We found significant interactions between the factors relation and site in all three time windows. Finally, the type by relation by site interaction was significant during the first (600-3000 ms) and the second (4200-6600 ms) clause. Results are reported in table 4.5.

Figure 4.3 Triangulation-linear interpolated topographic map displaying the mean difference between Nadat p, q and p voordat q sentences in the 6600-7400 ms time window relative to the sentence-initial word. Voordat sentences elicit a more negative response at left prefrontal (Fp1) sites. Top and front views are shown.