Large scale syntactic annotation for Dutch Gertjan van Noord University of Groningen
Context: Wide-coverage Parsing 1 Assign syntactic structure to a sentence Necessary step to determine the meaning
Context: Wide-coverage Parsing (2) 2 Met de verrekijker zie ik de man (With the binoculars I see the man) De man met de verrekijker zie ik (The man with the binoculars, I see)
3 [Dependency structure diagrams for the two readings: in one, the PP "met de verrekijker" modifies the verb "zie"; in the other it modifies the NP "de man"]
Context: Wide-coverage Parsing (3) 4 Dit is de vrouw die de mannen hebben gezien (This is the woman whom the men have seen) Dit is de vrouw die de mannen heeft gezien (This is the woman who has seen the men)
5 [Dependency structure diagrams for the two sentences: verb agreement determines whether "die" is the subject or the object of the relative clause]
Parsing: state of the art 6 Full parsing is fragile, slow, inaccurate. This is no longer true! Improvements: robustness, efficiency, disambiguation. Corpora!
Syntactic Annotation - past 7 Penn Treebank (1989) By linguists (students) Resource for NLP research/development: Train (statistical) models Evaluate (statistical) models Revolution in NLP
Syntactic Annotation - this talk 8 Treebanks for Dutch: manually corrected, by linguists (students) with the Alpino parser and related tools; and fully automatic, with the Alpino parser and related tools: huge, many more applications
Overview 9 Syntactically annotated corpora are great! Small manually corrected treebanks:... for disambiguation in a parser Huge automatically created treebanks:... for improved disambiguation in a parser... for corpus linguistics... for information extraction / question answering
Alpino 10 Parser for Dutch Characteristics: wide-coverage robust accurate Formalism: Stochastic Attribute Value Grammar Linguistic Sophistication Principled Account of Disambiguation Output: CGN Dependency Structures
CGN Dependency Structures 11 CGN: Corpus of Spoken Dutch abstract representation of syntactic analysis de-facto standard hierarchical information: which words belong together relational information: hd, su, obj1, obj2, pc,... categorial information: np, pp, smain,...
Vier jonge Rotterdammers willen deze zomer per auto naar Japan 12 (Four young Rotterdammers want to go to Japan by car this summer) [dependency structure diagram]
Er was een tijd dat Amerika met bossen overdekt was 13 (There was a time when America was covered with forests) [dependency structure diagram]
Extrinsic Motivation 14

corpus                                  #sentences   length   accuracy %   exact %
Alpino Treebank (newspaper)                   7136       20        89.1       41.5
CLEF questions (tuned for questions)          1745       11        96.3       82.1
D-Coi-Gr Treebank                             8857       15        88.4       48.3
D-Coi WR-P-E-E (newsletters)                    90       20        81.1       31.1
D-Coi WR-P-P-B (children book)                 276        7        93.5       79.0

Accuracy: in terms of named dependencies
Syntactic Analysis in Alpino 15 Lexicon: over 200,000 entries (including many named entities); extensive set of heuristics for unseen words and word sequences; mapped to attribute-value matrices; organized as an inheritance network. POS-tagger removes unlikely lexical categories. Grammar: rewrite rules where categories are attribute-value matrices; unification; rule set organized as an inheritance network. Parser: constructs a parse forest (compact representation of all possible parses); selects the best parse (disambiguation)
Ambiguity in Alpino 16 [Plot: average number of readings (0-15,000) against sentence length in words (5-15)]
Ambiguity 17 the expected lexical and structural ambiguities many, many, many unexpected, absurd ambiguities many don't-care ambiguities longer sentences have millions of parses
Er was een tijd dat Amerika met bossen overdekt was 18 [dependency structure diagram: the correct parse, repeated]
Er was een tijd dat Amerika met bossen overdekt was 19 [dependency structure diagram of an absurd alternative parse, in which "dat" is a determiner of "Amerika" and "was" a noun modified by the adjective "overdekt"]
Er was een tijd dat Amerika met bossen overdekt was 20 [dependency structure diagram of an alternative parse, in which "overdekt" is a predicative adjective rather than a participle]
Vier jonge Rotterdammers willen deze zomer per auto naar Japan 21 [dependency structure diagram of an absurd alternative parse, in which "Vier" is analysed as an imperative verb]
Door de overboeking vertrok een groep toeristen uit het hotel 22 [dependency structure diagram] Zempléni: unambiguously literal sentence Alpino: 13 parses
Door de overboeking vertrok een groep toeristen uit het hotel 23 [dependency structure diagram of an alternative parse, with "uit het hotel" attached to the noun phrase instead of the verb]
Disambiguation Model 24 Identify features for disambiguation: arbitrary characteristics of parses. Training the model: assign a weight to each feature, by increasing the weights of features in the correct parse and decreasing the weights of features in incorrect parses. Applying the model: for each parse, sum the weights of the features occurring in it; select the parse with the highest sum. Maximum Entropy
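The selection procedure above can be sketched in a few lines. Alpino itself fits a maximum entropy model; the following is only a perceptron-style illustration of the update rule described on the slide (raise weights of features in the correct parse, lower weights of features in the best-scoring incorrect parse), with toy feature names borrowed from later slides:

```python
from collections import defaultdict

def score(parse_features, weights):
    # A parse is scored by summing the weights of its features.
    return sum(weights[f] for f in parse_features)

def train(data, weights, lr=0.1, epochs=5):
    """Perceptron-style sketch of the update rule: raise the weights of
    features in the correct parse, lower the weights of features in the
    best-scoring incorrect parse.  Not the actual MaxEnt estimator."""
    for _ in range(epochs):
        for correct, candidates in data:
            best = max(candidates, key=lambda p: score(p, weights))
            if best is not correct:
                for f in correct:
                    weights[f] += lr
                for f in best:
                    weights[f] -= lr
    return weights

# One toy sentence with a correct and an incorrect candidate parse.
correct = ["s1(subj_topic)", "f2(en,vg)"]
wrong = ["s1(extra_from_topic)", "f2(en,vg)"]
weights = train([(correct, [wrong, correct])], defaultdict(float))
assert score(correct, weights) > score(wrong, weights)
```

Note that the shared feature f2(en,vg) ends up with weight zero: only features that discriminate between correct and incorrect parses accumulate weight.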
Training 25 Requires a corpus of correct and incorrect parses Alpino Treebank: newspaper part (cdbl) of the Eindhoven corpus 145,000 words manually checked syntactic annotations (Leonoor van der Beek,... ) CGN Dependency Structures Generate all parses with Alpino, and use the treebank to classify each parse
Features 26 Describe arbitrary properties of parses Need not be independent of each other Can encode a variety of linguistic (and other) preferences Linguistic Insights!
Feature templates 27

r1(rule)               Rule has been applied
r2(rule,n,subrule)     The N-th daughter of Rule is constructed by SubRule
r2_root(rule,n,word)   The N-th daughter of Rule is Word
r2_frame(rule,n,frame) The N-th daughter of Rule is a word with subcat frame Frame
r3(rule,n,word)        The N-th daughter of Rule is headed by Word
mf(cat1,cat2)          Cat1 precedes Cat2 in the mittelfeld
f1(pos)                POS-tag Pos occurs
f2(word,pos)           Word has POS-tag Pos
h(heur)                Unknown-word heuristic Heur has been applied
Dependency feature templates 28

dep35(sub,role,word)   Sub is the Role dependent of Word
dep34(sub,role,pos)    Sub is the Role dependent of a word with POS-tag Pos
dep23(subpos,role,pos) A word with POS-tag SubPos is the Role dependent of a word with POS-tag Pos
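To make the templates concrete, here is a small sketch of how they instantiate to features for a parse. The tuple format (head root, head POS, role, dependent root, dependent POS) is made up for illustration and is not Alpino's internal representation:

```python
def dep_features(dependencies):
    """Instantiate the three dependency feature templates for a parse
    given as (head, head_pos, role, dep, dep_pos) tuples."""
    feats = []
    for head, head_pos, role, dep, dep_pos in dependencies:
        feats.append(f"dep35({dep},{role},{head})")          # word / word
        feats.append(f"dep34({dep},{role},{head_pos})")      # word / POS-tag
        feats.append(f"dep23({dep_pos},{role},{head_pos})")  # POS-tag / POS-tag
    return feats

# "De vrouw drinkt bier": vrouw is the su dependent of drink,
# bier the obj1 dependent.
feats = dep_features([
    ("drink", "verb", "su", "vrouw", "noun"),
    ("drink", "verb", "obj1", "bier", "noun"),
])
```

Each dependency thus contributes three features of decreasing specificity, so the model can back off from word pairs to POS-tag pairs.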
Some non-local features 29 In coordinated structure, the conjuncts are parallel or not In extraction structure, the extraction is local or not In extraction structure, the extracted element is a subject Constituent ordering in mittelfeld pronoun precedes full np accusative pronoun precedes dative pronoun dative full np precedes accusative full np
Features indicating bad parses 30

-0.0707213 h1(long)
-0.0585366 f2(was,noun)
-0.0507852 f2(tot,vg)
-0.0497879 h1(decap(not_begin))
-0.0494901 s1(extra_from_topic)
-0.0411195 r3(np_det_n,2,was)
-0.0410466 f2(op,prep)
-0.0372584 f2(kan,noun)
-0.0337606 h1(skip)
Features indicating good parses 31

0.0741717 f2(en,vg)
0.064064 dep35(en,vg,/obj1,prep,tussen)
0.0549897 f2(word,verb(passive))
0.0461192 r2(non_wh_topicalization(np),1,np_pron_weak)
0.039418 s1(subj_topic)
0.0387447 dep23(pron(wkpro,nwh),/su,verb)
Results Parse Selection 32 Alpino treebank ten-fold cross-validation Model should select best parse for each sentence out of maximally 1000 parses per sentence accuracy: proportion of correct named dependencies
Results Parse Selection 33

                       accuracy %
baseline                    61.5
oracle                      89.2
model                       84.0
error reduction rate        81.5
exact                       55
Wrap up 34 So far: background about Alpino manually annotated treebank to train and test disambiguation component Next: applications of automatically constructed treebanks
Automatically constructed treebanks 35 Corpora automatically annotated with Alpino Parser Twente News Corpus (TwNC) (500M words, newspapers) D-Coi (55M words, including Dutch Wikipedia, Dutch Europarl) LASSY (450M words, to be decided) Interesting Applications...
TwNC 36

#sentences                            30,000,000   (100%)
#words                               500,000,000
#sentences without parse                 100,000   (0.2%)
#sentences with fragments              2,500,000   (8%)
#sentences with a single full parse   27,500,000   (92%)
Millions of dependency structures 37 Compressed archives of XML files Pseudo-random access (dictd, gzip) Storage requirements: 10% of original Mostly by Geert Kloosterman
Example 38

<?xml version="1.0" encoding="iso-8859-1"?>
<top>
  <node rel="top" cat="smain" begin="0" end="10">
    <node rel="su" frame="determiner(het,nwh,nmod,pro,nparg)" pos="det" begin="0" end="1" root="dat" word="dat"/>
    <node rel="hd" frame="verb(hebben,past(sg),transitive)" pos="verb" begin="1" end="2" root="wek" word="wekte"/>
    <node rel="obj1" cat="np" begin="2" end="10">
      <node rel="det" frame="determiner(de)" pos="det" begin="2" end="3" root="de" word="de" infl="de"/>
      <node rel="hd" frame="noun(de,both,sg)" pos="noun" begin="3" end="4" root="woede" word="woede" gen="de"/>
      <node rel="mod" cat="pp" begin="4" end="10">
        ...
      </node>
    </node>
  </node>
  <sentence>dat wekte de woede van Turkse inwoners van de wijk.</sentence>
  <comments>
    <comment>q#ad19940103-0125-776-2 Dat wekte de woede van Turkse inwoners van de wijk. 1 1-0.0396969573</comment>
  </comments>
</top>
Treebank Tools 39 DtView DtEdit DtSearch
DtView 40 [screenshot]
DtSearch 41 XPath standard Search queries: hierarchical relations, grammatical relations, syntactic category, surface order, lemma, other attributes Matches: display sentence, display sentence with brackets, display matching part of sentence, your own style-sheets
DtSearch Example 42

dtsearch -s -q '//node[../@cat="smain" and @rel="obj2" and not(@cat="pp") and ./@begin = ../@begin]'

[Haar] ging het goed af.
"[Ons] staat helemaal geen Big Brother-scenario voor ogen.
[Ook hun] past enige schroom.
[Zelfs de bloeddorstigste tegenstander] adviseerde hij nog zijn gedrag wat aan te passen.
[Die] geef ik voor de wedstrijd een zoen.
...
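Outside DtSearch, the same XML can be queried from a script. A minimal sketch with Python's standard xml.etree.ElementTree, which supports only a subset of XPath (the full DtSearch queries above need a complete XPath 1.0 engine such as lxml). The fragment follows the format of the Example slide, abbreviated by hand; rel="hd" for head nodes is an assumption where that slide's attributes were truncated:

```python
import xml.etree.ElementTree as ET

# Hand-abbreviated fragment in the format of the Example slide.
xml = """<top>
  <node rel="top" cat="smain" begin="0" end="10">
    <node rel="su" pos="det" begin="0" end="1" root="dat" word="dat"/>
    <node rel="hd" pos="verb" begin="1" end="2" root="wek" word="wekte"/>
    <node rel="obj1" cat="np" begin="2" end="10">
      <node rel="det" pos="det" begin="2" end="3" root="de" word="de"/>
      <node rel="hd" pos="noun" begin="3" end="4" root="woede" word="woede"/>
    </node>
  </node>
  <sentence>dat wekte de woede van Turkse inwoners van de wijk.</sentence>
</top>"""

top = ET.fromstring(xml)
# Find the head of every obj1 NP (restricted XPath subset of ElementTree).
heads = [n.get("root") for n in top.findall('.//node[@rel="obj1"]/node[@rel="hd"]')]
```

Here heads recovers the lemma of the object's head noun ("woede") from the dependency structure rather than from the surface string.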
Application: Selection Restrictions for Improved Disambiguation 43 Use automatically parsed corpus to learn selection restrictions Bier drinkt de vrouw Beer, the woman drinks Lexical features: dep35(woman,obj1,drink) dep35(beer,su,drink) dep35(woman,su,drink) dep35(beer,obj1,drink) Such features are too infrequent to be useful; the training corpus is too small to estimate weights for those features
Some Actually Occurring Bad Parses 44

(1) a. Campari moet u gedronken hebben
       Campari must have drunk you
       You must have drunk Campari
    b. De wijn die Elvis zou hebben gedronken als hij wijn zou hebben gedronken
       The wine Elvis would have drunk if he had drunk wine
       The wine that would have drunk Elvis if he had drunk wine
    c. De paus heeft tweehonderd daklozen te eten gehad
       The pope had two hundred homeless people for dinner
Extract lexical dependencies 45

Waar en wanneer drinkt Elvis wijn?

[dependency structure diagram]

crd/cnj(en, waar)      w/body(en, drink)   /obj1(drink, wijn)
crd/cnj(en, wanneer)   /mod(drink, en)     /su(drink, Elvis)
Number of lexical dependencies 46

tokens                       480,000,000
types                        100,000,000
types with frequency ≥ 20      2,000,000
Bilexical preference 47

Pointwise Mutual Information (Fano 1961, Church and Hanks 1990):

    I(r(w1, w2)) = log [ f(r(w1, w2)) / ( f(r(w1, *)) · f(*(*, w2)) ) ]

compare actual frequency with expected frequency (f a relative frequency; * a wildcard)

Example: I(/obj1(drink, melk))
f(/obj1(drink, melk)): 195
f(/obj1(drink, *)): 15713
f(*(*, melk)): 10172
expected: 15713 × 10172 / 480,000,000 ≈ 0.34
actual frequency is about 560 times as big; its log: 6.3
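The example computation, using the counts from this slide and the 480M dependency-token total from the previous one, can be reproduced directly:

```python
import math

# Counts for /obj1(drink, melk) from the slide.
f_pair = 195          # f(/obj1(drink, melk))
f_head = 15713        # f(/obj1(drink, *))
f_dep = 10172         # f(*(*, melk))
n = 480_000_000       # total number of dependency tokens

expected = f_head * f_dep / n       # expected frequency under independence
pmi = math.log(f_pair / expected)   # ~6.4 here, close to the slide's 6.3
```

With rounding in the slide's intermediate value (0.34) the figures differ slightly, but the order of magnitude is the same: "melk" occurs as the object of "drink" several hundred times more often than chance predicts.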
Examples of high bilexical preferences 48

bijltje gooi neer 13
duimschroef draai aan 13
peentje zweet 13
traantje pink weg 13
boontje dop 12
centje verdien bij 12
champagne fles ontkurk 12
dorst les 12
Examples of high scoring objects of drink 49

biertje (small glass of beer) 8
borreltje (strong alcoholic drink) 8
glaasje (small glass) 8
pilsje (small glass of beer) 8
pintje (small glass of beer) 8
pint (glass of beer) 8
wijntje (small glass of wine) 8
alcohol (alcohol) 7
bier (beer) 7
Lexical preferences between verbs and modifiers 50

overlangs snijd door 12
welig tier 12
dunnetjes doe over 11
stief moederlijk bedeel 11
on zedelijk betast 11
stierlijk verveel 11
cum laude studeer af 10
hermetisch grendel af 10
ingespannen tuur 10
instemmend knik 10
kostelijk amuseer 10
Lexical preferences between nouns and adjectives 51

endoplasmatisch reticulum
zelfrijzend bakmeel
waterbesparende douchekop
ongeblust kalk
onbevlekt ontvangenis
ingegroeid teennagel
knapperend haardvuur
geconsacreerde hostie
bezittelijk voornaamwoord
pientere pookje
afgescheurde kruisband
beklemtoond lettergreep
Can you guess? 52

bodemloze put
echtelijke sponde
eenarmige bandiet
exhibitionistische zelfverrijking
gebalde vuist
gefronst wenkbrauw
baarlijke, klinkklare nonsens
tiendaagse veldtocht
Using association scores as disambiguation features 53 New features z(p, r) for each POS-tag p and dependency relation r. If there is an r-dependency between word w1 (with POS-tag p) and word w2, the count of this feature is given by I(r(w1, w2)), only for positive I. NB: limited number of features; treebank large enough to estimate their weights
Example 54 Melk drinkt de baby niet Milk, the baby does not drink Analysis 1: z(verb,/obj1)=6 z(verb,/su)=3 Analysis 2: z(verb,/obj1)=0 z(verb,/su)=0 weight z(verb,/obj1): 0.0101179 weight z(verb,/su): 0.00877976
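Plugging the slide's numbers into the weighted sum shows how the association scores tip the balance; real-valued feature "counts" simply multiply their weights:

```python
# Weights quoted on the slide; feature values are the association
# scores I(r(w1, w2)) of the dependencies in each analysis.
weights = {"z(verb,/obj1)": 0.0101179, "z(verb,/su)": 0.00877976}

def score(values, weights):
    # Each real-valued feature contributes value times weight.
    return sum(v * weights[f] for f, v in values.items())

# Analysis 1 (melk as object, baby as subject): positive associations.
# Analysis 2 (melk as subject): no association support.
analysis1 = {"z(verb,/obj1)": 6, "z(verb,/su)": 3}
analysis2 = {"z(verb,/obj1)": 0, "z(verb,/su)": 0}
assert score(analysis1, weights) > score(analysis2, weights)
```

So the intended reading wins by roughly 0.087 from these two features alone (all other features being equal between the analyses).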
Experiment 1 55

ten-fold cross validation, Alpino Treebank

                 fscore %   err.red. %   exact %   CA %
standard           87.41       74.60       52.0    87.02
+self-training     87.91       77.38       54.8    87.51
Experiment 2 56

Full system, D-Coi Treebank (Trouw newspaper)

                 prec %   rec %   fscore %   CA %
standard          90.77   90.49      90.63   90.32
+self-training    91.19   90.89      91.01   90.73
Application: Extraposition of comparatives out of topic 57 Reviewer: extraposition of a comparative out of the topic is impossible: *Lager was de koers nog nooit dan bij opening. The Alpino grammar allows this. We can search for the relevant pattern.
Dependency Structure 58 [dependency structure for "Lager was de koers nog nooit dan bij opening": the obcomp "dan bij opening" is a dependent of the topicalized adjective "laag"]
DtSearch queries 59

//node[@cat="smain" and ./node[./node[@rel="obcomp"]]/@begin = @begin]

//node[@cat="smain" and ./node[./node[@rel="obcomp"]/@end > ../node[@rel="hd"]/@begin]/@begin = @begin]
Extraposed obcomp out of topic 60

Liever benadrukt hij die tegenstellingen dan de bedriegelijke harmonie
Nog eerder zal de machtige Mekong droogvallen dan dat de co-premier zijn macht uit handen geeft
Zo intens lelijk zijn mijn voeten in de loop van een decennium geworden dat ik de mensenmassa's op het strand er in de zomer niet mee wil lastigvallen
Eerder brengt men een hemel vol wolken in kaart dan dit oeuvre
Veel eerder vindt er een herschikking in het midden plaats dan dat er werkelijk massaal uit dat midden wordt gevlucht
Eerder is er sprake van het kabinet ondanks Kok dan het kabinet-Kok
Liever sluis ik honderden en honderden guldens door aan loodgieter, fietsenmaker en elektricien dan dat ik zelf ook maar één vinger uitsteek naar het fonteintje bij het toilet, een kapot achterlicht of een weigerende stofzuiger
liever waren ze onafhankelijk dan dat ze zich aan iemand bonden
Liever is Jim schuldig aan een sprong, dan de prooi van een aanvechting
eerder gaat zoo'n kameel door het oog van een naald, dan dat een rijke in zou gaan in het koninkrijk der hemelen 61
Application: Question Answering, and Similar Words 62 (2) By whom was John Lennon killed? (3) Where was he killed? (4) How often was he hit? (5) What are Google-bombs? (6) How high is the Dom-tower in Utrecht? (7) In what year did its construction start? (8) Who was the first architect?
Background 63 QA-system based on Alpino: JOOST Best result in CLEF2005 for Dutch; third result overall Best result in CLEF2006 for Dutch; Dutch was made more difficult than other languages No results known yet for CLEF2007
Strategy 64 Analyse the question into a dependency structure. Compare the dependency structure with the dependency structures of all potential answers. Potential answers are paragraphs returned by IR from newspaper texts and Dutch Wikipedia. Use many other techniques in addition, e.g. ontological information.
Ontological information for QA 65 (9) Who is Javier Solana? (10) Which soccer player won the Golden Ball in 1999? (11) In which American state is Iron Mountain? (12) Which French president opened the Channel Tunnel?
Discover Ontological Information 66 Similar words occur in similar contexts Dependency relations: more fine-grained notion of context subject-verb verb-object adjective-noun coordination apposition prepositional complement
Vectors describing contexts 67

Every word is represented by an n-dimensional vector. Every dimension is a context characteristic. Every cell is a (function of the corresponding) frequency.

         zie.obj   verf.obj   verzorg.obj   laat uit.obj   ...
bus         50         5           1             0         ...
hond        56         1           5             8         ...
truck       43         4           0             0         ...
Similarity Measure 68

Dice(v, w) = 2 Σ_i min(v_i, w_i) / Σ_i (v_i + w_i)

other possibilities...
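With the context-count vectors from the earlier slide, the Dice measure is a one-liner; the numbers already behave sensibly (a truck is more bus-like than a dog is):

```python
# Context-count vectors from the earlier slide (dimensions: zie.obj,
# verf.obj, verzorg.obj, laat uit.obj).
vectors = {
    "bus":   [50, 5, 1, 0],
    "hond":  [56, 1, 5, 8],
    "truck": [43, 4, 0, 0],
}

def dice(v, w):
    """Dice similarity: 2 * sum_i min(v_i, w_i) / sum_i (v_i + w_i)."""
    return 2 * sum(min(a, b) for a, b in zip(v, w)) \
             / sum(a + b for a, b in zip(v, w))

assert dice(vectors["bus"], vectors["truck"]) > dice(vectors["bus"], vectors["hond"])
```

The measure is 1 for identical vectors and 0 for vectors with disjoint contexts, so it can rank candidate similar words directly.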
Feature Weights 69 frequency mutual information other possibilities...
Data used 70

subject-verb               5,639,140
verb-object                2,642,356
adjective-noun             3,262,403
coordination                 965,296
apposition                   526,337
prepositional complement     770,631
Results for BMW 71 Volkswagen, Mercedes, Honda, Chrysler, Audi, Volvo, Ford, Toyota, Fiat, Peugeot, Opel, Mitsubishi, Renault, Mazda, Jaguar, General Motors, Rover, Nissan, VW, Porsche
Results for Sony 72 Matsushita, Toshiba, Time Warner, JVC, Hitachi, Nokia, Samsung, Motorola, Philips, Siemens, Apple, Canon, IBM, PolyGram, Thomson, Mitsubishi, Kodak, Pioneer, AT&T, Sharp
Hinault 73 Kübler, Vermandel, Bruyère, Depredomme, Mottiat, Merckx, Depoorter, De Bruyne, Argentin, Schepers, Criquielion, Dierickx, Van Steenbergen, Kint, Bartali, Ockers, Coppi, Fignon, Kelly, De Vlaeminck
Beatles 74 Rolling Stones, Stones, John Lennon, Jimi Hendrix, Tina Turner, Bob Dylan, Elvis Presley, Michael Jackson, The Beatles, David Bowie, Prince, Genesis, Mick Jagger, The Who, Elton John, Barbra Streisand, Led Zeppelin, Eric Clapton, Diana Ross, Janis Joplin
Paris 75 Londen, Brussel, Moskou, Washington, Berlijn, New York, Rome, Madrid, Bonn, Wenen, Peking, Frankfurt, Athene, Tokio, München, Barcelona, Praag, Antwerpen, Stockholm, Tokyo
Grenoble 76 Rouen, Saint Etienne, Pau, Saint-Etienne, Rennes, Marne-la-Vallée, Aix, Orléans, Toulouse, Montpellier, Amiens, Strasbourg, Lyon, Lens, Avignon, Clermont-Ferrand, Straatsburg, Caen, Bayonne, Limoges
Results for Wim Kok 77 Elco Brinkman, Frits Bolkestein, Hans van Mierlo, W. Kok, Kok, Ruud Lubbers, Den Uyl, John Major, Jacques Wallage, Wallage, Thijs Wöltgens, Hedy d'Ancona, Relus ter Beek, Klaus Kinkel, Balladur, Kinkel, Van Mierlo, Jacques Chirac, Kooijmans, Jan Pronk
huis (house) 78 woning, gebouw, pand, auto, straat, kantoor, kamer, boerderij, tuin, winkel, kerk, brug, huisje, appartement, hotel, flat, muur, boom, paleis, villa house, building, house, car, street, office, room, farm, garden, shop, church, bridge, small house, apartment, hotel, flat, wall, tree, palace, villa
verliefdheid (enamour, love) 79 jaloezie, verraad, afgunst, weerzin, romance, hartstocht, overspel, passie, erotiek, vriendschap, obsessie, schuldgevoelen, fascinatie, vergankelijkheid, seksualiteit, animositeit, seks, lust, verlangen, zeeroof jealousy, treason, envy, dislike, romance, passion, adultery, passion, erotics, friendship, obsession, feelings of guilt, fascination, transiency, sexuality, animosity, sex, lust, desire, piracy
witlof 80 broccoli, prei, spruitje, knolselderij, andijvie, courgette, sperzieboon, zuurkool, worteltje, bleekselderij, bloemkool, snijboon, aubergine, peen, zilveruitje, ijsbergsla, koolsoort, winterpeen, doperwtjes, komkommer broccoli, leek, sprout, celeriac, endive, zucchini, butter bean, sauerkraut, carrot, blanched celery, cauliflower, haricot, aubergine, carrot, onion, iceberg lettuce, cabbage, carrot, peas, cucumber
Conclusion 81 Syntactically annotated corpora are perhaps potentially somewhat useful
It's Free! 82 http://www.let.rug.nl/vannoord/alp/alpino/ http://www.let.rug.nl/vannoord/trees/ http://www.let.rug.nl/vdplas/sets/browse.php http://www.let.rug.nl/gosse/sets/