Learning Objectives

  • Describe how morphological analysers, deep syntactic parsers, and typological databases operationalise the linguistic knowledge from this course
  • Connect the 100 Bender concepts to recurring themes in NLP system design and failure analysis
  • Evaluate current AI systems using the linguistic framework developed across the course
  • Identify future directions in linguistically informed NLP and responsible multilingual AI development

Reading

Read Chapter 10 (Resources) of Bender, E. M. (2013). Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Morgan & Claypool. This final unit consolidates all 100 concepts and examines the NLP tools that operationalise them.

1

Core Input

Read through the sections below to frame the review and synthesis activities that follow.

The preceding nine units introduced 100 linguistic concepts. This final unit asks: how do NLP systems operationalise this knowledge — and what happens when they don't?

Three landmark tools represent the application of linguistic knowledge to computational text processing:

  • Morphological analysers (#98) — software systems that map surface word forms to their underlying morphological structure: roots, affixes, features. Examples include the two-level morphology tradition (Koskenniemi 1983), the XTAG system, and language-specific tools for Arabic (MADA), Turkish (ITU), and Finnish (Omorfi). These tools encode the knowledge from Units 2–4.
  • Deep syntactic parsers (#99) — parsers that map surface sentences to semantic representations including predicate-argument structure, semantic role assignments, and grammatical function labels. Examples include the LFG-based XLE parser, the HPSG-based English Resource Grammar, and semantic parsers like the AMR (Abstract Meaning Representation) parser. These tools build on the knowledge from Units 5–9.
  • Typological databases (#100) — structured databases that summarise properties of the world's languages at a high level. Examples include WALS (World Atlas of Language Structures), Glottolog, ASJP, and the AUTOTYP database. These tools make the cross-linguistic knowledge from Units 1–4 computationally accessible.

Together, these tools represent the practical infrastructure of linguistically informed NLP. They allow engineers to go beyond surface pattern matching to genuine structural analysis.

Looking across all 100 concepts, several recurring themes emerge:

  • Structure matters — from #1 (morphosyntax vs bag of words) to #94 (long-distance dependencies), the course has shown repeatedly that ignoring linguistic structure leads to systematic NLP failures. Surface fluency is not the same as structural correctness.
  • Language diversity is the norm — concepts #4, #5, #6, #20, #21, #42, #43, #49, #78–#81 all establish that the structural properties of English are not universal. Any NLP system designed with only English in mind is partial by design.
  • Form and meaning do not map simply — concepts #73, #74, #83–#97 show that the relationship between syntactic structure and semantic interpretation is mediated by a complex set of constructions. NLP systems that assume a one-to-one mapping will fail on passives, raising verbs, expletives, and argument drop.
  • Linguistic knowledge compounds — the levels build on each other. Morphology feeds syntax; syntax scaffolds semantics (#45). Understanding any level in isolation is insufficient.
  • Error analysis requires linguistic tools — #0 remains the framing concept: knowing about linguistic structure is what allows you to diagnose NLP failures as belonging to specific linguistic levels, not random noise.

The field is moving rapidly, but the linguistic foundations established in this course remain relevant regardless of which systems dominate at any given time. Several directions are particularly active:

  • Neuro-symbolic approaches — combining neural models with explicit symbolic linguistic knowledge. Rather than purely learning from data, these systems encode structural constraints — valency frames, agreement rules, case systems — as hard or soft constraints on the model's output.
  • Cross-linguistic and multilingual NLP — applying the typological knowledge in this course (#4, #6, #100) to build systems that handle diverse morphological types, word orders, and grammatical function-marking strategies. Universal Dependencies (Unit 6) and multilingual pre-trained models (mBERT, XLM-R) are current tools in this effort.
  • Low-resource language NLP — developing tools for the large majority of the world's approximately 7,000 languages (#5), which lack substantial digital text. Linguistic knowledge about morphological structure and typological relatedness can guide systems in the absence of large training corpora.
  • Linguistically informed evaluation — moving beyond general accuracy metrics to evaluate NLP systems on specific linguistic phenomena: passive interpretation, agreement consistency, semantic role assignment, long-distance dependency tracking. Concept #0 provides the rationale: targeted linguistic evaluation reveals systematic failure modes that aggregate metrics hide.
  • Responsible and ethical AI — including the linguistic dimensions: representation of diverse languages (#5, #6), avoidance of English-centric structural assumptions, transparency about what linguistic knowledge a model has and lacks.
2

Key Concepts: NLP Tools That Operationalise Linguistic Knowledge

Work through each concept below to understand what it means for NLP systems to operationalise linguistic knowledge.

A morphological analyser takes a word form — as it appears in text — and produces a structured representation of its internal morphological composition. This bridges the gap between the surface orthographic form (what we see) and the underlying morphological structure (what we need for NLP processing).

For the input walked, a morphological analyser might output:

walk+V+PAST

root: walk, category: Verb, feature: Past Tense

For a Turkish form like evlerinizden:

ev+N+PL+2PL.POSS+ABL

root: ev (house), Noun, Plural, 2nd-person-plural Possessive, Ablative case

Morphological analysers draw on all the knowledge from Units 2–4: morpheme structure (#7–#22), allomorphic variation (#23–#27), and morphosyntactic feature systems (#28–#43). They are prerequisites for parsing morphologically rich languages, and they enable lemmatisation, morphological disambiguation, and feature extraction.

NLP implication: without a morphological analyser, an NLP pipeline for Turkish, Arabic, or Finnish must rely on subword tokenisation as an approximation — and as Units 2–3 showed, this approximation is imperfect and introduces noise.
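The mapping from surface form to root and feature tags can be sketched in a few lines. This is a toy illustration only, not how production analysers such as those named above work: the lexicon `ROOTS`, the suffix table `SLOTS`, and the function `analyse` are hypothetical names, the inventory covers just the two examples from this section, and allomorphy (for example Turkish vowel harmony) is ignored entirely.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon: root -> category.
ROOTS: Dict[str, str] = {"walk": "V", "ev": "N"}

# Hypothetical ordered suffix slots per category (surface suffix -> feature tag).
# Slot order encodes a fixed morpheme order; every slot is optional.
SLOTS: Dict[str, List[Dict[str, str]]] = {
    "V": [{"ed": "PAST"}],
    "N": [{"ler": "PL"}, {"iniz": "2PL.POSS"}, {"den": "ABL"}],
}


def analyse(word: str) -> Optional[str]:
    """Return a root+CAT+FEATURE string, or None if the word cannot be analysed."""
    for root, category in ROOTS.items():
        if not word.startswith(root):
            continue
        rest = word[len(root):]
        tags = [root, category]
        for slot in SLOTS[category]:
            for suffix, tag in slot.items():
                if rest.startswith(suffix):
                    tags.append(tag)
                    rest = rest[len(suffix):]
                    break
        if rest == "":  # the whole surface form was consumed
            return "+".join(tags)
    return None


print(analyse("walked"))
print(analyse("evlerinizden"))
```

A real analyser would replace the hard-coded tables with a full lexicon and morphophonological rules, but the input/output contract is the same: surface string in, regularised morpheme or feature string out.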

A deep syntactic parser goes beyond surface phrase structure to produce representations that include predicate-argument structure, semantic role labels, and grammatical function assignments. The term "deep" refers to the depth of linguistic analysis, not to deep learning.

For the sentence The cat was chased by the dog, a deep parser might produce:

chase(Agent: dog, Patient: cat)

surface subject (cat) = semantic Patient; surface oblique (dog) = semantic Agent

This is exactly the kind of deep dependency that concept #74 identifies as crucial for most NLP applications. The deep parser must understand passive (#84), semantic roles (#68–#69), and the surface/deep distinction (#73).

Deep parsers draw on all the knowledge from Units 5–9: constituency and phrase structure (#44–#46), parts of speech (#47–#50), heads and arguments (#51–#67), grammatical functions (#68–#82), and syntactic-semantic mismatches (#83–#97).

NLP implication: shallow parsers (which only identify phrase boundaries without semantic role information) are insufficient for question answering, relation extraction, and knowledge base population. Deep parsing is the gateway to semantic NLP.
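The surface/deep distinction in the passive example can be made concrete with a small sketch. It assumes a surface analysis is already available (here typed in by hand) and that the verb is a simple transitive with Agent and Patient roles. `SurfaceClause` and `deep_roles` are hypothetical names for illustration, not the API of any parser mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SurfaceClause:
    """Hand-written surface analysis of one clause (illustrative assumption)."""
    predicate: str                    # lemma of the main verb
    voice: str                        # "active" or "passive"
    subject: str                      # surface subject
    obj: Optional[str] = None         # surface direct object, if any
    by_oblique: Optional[str] = None  # NP inside a by-phrase, if any


def deep_roles(clause: SurfaceClause) -> Dict[str, Optional[str]]:
    """Map surface grammatical functions to semantic roles for a transitive verb."""
    if clause.voice == "passive":
        # Passive: the surface subject is the Patient; the by-phrase holds the Agent.
        return {"predicate": clause.predicate,
                "Agent": clause.by_oblique,
                "Patient": clause.subject}
    # Active: subject is the Agent, direct object is the Patient.
    return {"predicate": clause.predicate,
            "Agent": clause.subject,
            "Patient": clause.obj}


passive = SurfaceClause("chase", "passive", subject="cat", by_oblique="dog")
active = SurfaceClause("chase", "active", subject="dog", obj="cat")
print(deep_roles(passive) == deep_roles(active))
```

Both clauses map to the same deep representation, which is the point of concept #74: downstream tasks should see `chase(Agent: dog, Patient: cat)` regardless of voice.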

A typological database provides structured, cross-linguistically comparable information about how the world's languages are structured. Key resources include:

  • WALS (World Atlas of Language Structures) — covers 2,676 languages across 192 structural features, including word order, case systems, tonal distinctions, and morphological complexity. Freely available online.
  • Glottolog — comprehensive catalogue of the world's languages and language families; the authoritative source for the ~7,000 languages and 128 families of concept #5.
  • Universal Dependencies (UD) — a treebank project that provides cross-linguistically consistent annotation of syntactic structure for 100+ languages, enabling cross-lingual parser training and evaluation.

Typological databases operationalise the cross-linguistic knowledge from Unit 1 (#4, #5, #6) and the typological patterns discussed in Units 2–8 (morphological type, word order, case systems, agreement patterns, alignment type).

NLP implication: typological databases allow NLP engineers to look up the structural properties of an unfamiliar language before building a system for it. Rather than applying English-centric assumptions and discovering failures in deployment, the engineer can anticipate that a target language is SOV, highly agglutinative, and head-marking — and design accordingly.
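The "look up first, then design" workflow described above can be sketched as a mapping from a typological profile to design notes. Everything here is an assumption for illustration: the profile values describe an unnamed hypothetical language, the keys are not real WALS feature identifiers, and `design_hints` is an invented helper.

```python
from typing import Dict, List

# Hypothetical typological profile for an unnamed target language.
# Keys and values are illustrative; they are not WALS feature IDs.
PROFILE: Dict[str, str] = {
    "word_order": "SOV",
    "morphological_type": "agglutinative",
    "locus_of_marking": "head",
}


def design_hints(profile: Dict[str, str]) -> List[str]:
    """Turn typological properties into design notes for an NLP pipeline."""
    hints: List[str] = []
    if profile.get("word_order") != "SVO":
        hints.append("Do not assume English-like SVO order when extracting arguments (#78).")
    if profile.get("morphological_type") == "agglutinative":
        hints.append("Plan for morphological analysis or morpheme-aware segmentation (#98).")
    if profile.get("locus_of_marking") == "head":
        hints.append("Expect grammatical functions to be marked on the head, e.g. by agreement (#79, #81).")
    return hints


for hint in design_hints(PROFILE):
    print("-", hint)
```

In practice the profile would be filled from a typological database rather than typed by hand; the sketch only shows how such properties can drive decisions before any data is collected.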

3

Connecting the 100 Concepts: Themes Across the Course

Review the complete set of 100 concepts, grouped below by unit in course order. Check your understanding of each group before the quiz.

  • #0 — Knowing about linguistic structure is important for feature design and error analysis in NLP.
  • #1 — Morphosyntax is the difference between a sentence and a bag of words.
  • #2 — The morphosyntax of a language is the constraints that it places on how words can be combined both in form and in the resulting meaning.
  • #3 — Languages use morphology and syntax to indicate who did what to whom, and make use of a range of strategies to do so.
  • #4 — Languages can be classified 'genetically', areally, or typologically.
  • #5 — There are approximately 7,000 known living languages distributed across 128 language families.
  • #6 — Incorporating information about linguistic structure and variation can make for more cross-linguistically portable NLP systems.

  • #7 — Morphemes are the smallest meaningful units of language, usually consisting of a sequence of phones paired with concrete meaning.
  • #8 — The phones making up a morpheme don't have to be contiguous.
  • #9 — The form of a morpheme doesn't have to consist of phones.
  • #10 — The form of a morpheme can be null.
  • #11 — Root morphemes convey core lexical meaning.
  • #12 — Derivational affixes can change lexical meaning.
  • #13 — Root+derivational affix combinations can have idiosyncratic meanings.
  • #14 — Inflectional affixes add syntactically or semantically relevant features.
  • #15 — Morphemes can be ambiguous and/or underspecified in their meaning.
  • #16 — The notion 'word' can be contentious in many languages.
  • #17 — Constraints on order operate differently between words than they do between morphemes.
  • #18 — The distinction between words and morphemes is blurred by processes of language change.
  • #19 — A clitic is a linguistic element which is syntactically independent but phonologically dependent.
  • #20 — Languages vary in how many morphemes they have per word (on average and maximally).
  • #21 — Languages vary in whether they are primarily prefixing or suffixing in their morphology.
  • #22 — Languages vary in how easy it is to find the boundaries between morphemes within a word.

  • #23 — The morphophonology of a language describes the way in which surface forms are related to underlying, abstract sequences of morphemes.
  • #24 — The form of a morpheme (root or affix) can be sensitive to its phonological context.
  • #25 — The form of a morpheme (root or affix) can be sensitive to its morphological context.
  • #26 — Suppletive forms replace a stem+affix combination with a wholly different word.
  • #27 — Alphabetic and syllabic writing systems tend to reflect some but not all phonological processes.

  • #28 — The morphosyntax of a language describes how the morphemes in a word affect its combinatoric potential.
  • #29 — Morphological features associated with verbs and adjectives (and sometimes nouns) can include information about tense, aspect and mood.
  • #30 — Morphological features associated with nouns can contribute information about person, number and gender.
  • #31 — Morphological features associated with nouns can contribute information about case.
  • #32 — Negation can be marked morphologically.
  • #33 — Evidentiality can be marked morphologically.
  • #34 — Definiteness can be marked morphologically.
  • #35 — Honorifics can be marked morphologically.
  • #36 — Possessives can be marked morphologically.
  • #37 — Yet more grammatical notions can be marked morphologically.
  • #38 — When an inflectional category is marked on multiple elements of sentence or phrase, it is usually considered to belong to one element and to express agreement on the others.
  • #39 — Verbs commonly agree in person/number/gender with one or more arguments.
  • #40 — Determiners and adjectives commonly agree with nouns in number, gender and case.
  • #41 — Agreement can be with a feature that is not overtly marked on the controller.
  • #42 — Languages vary in which kinds of information they mark morphologically.
  • #43 — Languages vary in how many distinctions they draw within each morphologically marked category.

  • #44 — Syntax places constraints on possible sentences.
  • #45 — Syntax provides scaffolding for semantic composition.
  • #46 — Constraints ruling out some strings as ungrammatical usually also constrain the range of possible semantic interpretations of other strings.
  • #47 — Parts of speech can be defined distributionally (in terms of morphology and syntax).
  • #48 — Parts of speech can also be defined functionally (but not metaphysically).
  • #49 — There is no one universal set of parts of speech, even among the major categories.
  • #50 — Part of speech extends to phrasal constituents.

  • #51 — Words within sentences form intermediate groupings called constituents.
  • #52 — A syntactic head determines the internal structure and external distribution of the constituent it projects.
  • #53 — Syntactic dependents can be classified as arguments and adjuncts.
  • #54 — The number of semantic arguments provided for by a head is a fundamental lexical property.
  • #55 — In many (perhaps all) languages, (some) arguments can be left unexpressed.
  • #56 — Words from different parts of speech can serve as heads selecting arguments.
  • #57 — Adjuncts are not required by heads and generally can iterate.
  • #58 — Adjuncts are syntactically dependents but semantically introduce predicates which take the syntactic head as an argument.
  • #59 — Obligatoriness can be used as a test to distinguish arguments from adjuncts.
  • #60 — Entailment can be used as a test to distinguish arguments from adjuncts.
  • #61 — Adjuncts can be single words, phrases, or clauses.
  • #62 — Adjuncts can modify nominal constituents.
  • #63 — Adjuncts can modify verbal constituents.
  • #64 — Adjuncts can modify other types of constituents.
  • #65 — Adjuncts express a wide range of meanings.
  • #66 — The potential to be a modifier is inherent to the syntax of a constituent.
  • #67 — Just about anything can be an argument, for some head.

  • #68 — There is no agreed upon universal set of semantic roles, even for one language; nonetheless, arguments can be roughly categorized semantically.
  • #69 — Arguments can also be categorized syntactically, though again there may not be universal syntactic argument types.
  • #70 — A subject is the distinguished argument of a predicate and may be the only one to display certain grammatical properties.
  • #71 — Arguments can generally be arranged in order of obliqueness.
  • #72 — Clauses, finite or non-finite, open or closed, can also be arguments.
  • #73 — Syntactic and semantic arguments aren't the same, though they often stand in regular relations to each other.
  • #74 — For many applications, it is not the surface (syntactic) relations, but the deep (semantic) dependencies that matter.
  • #75 — Lexical items map semantic roles to grammatical functions.
  • #76 — Syntactic phenomena are sensitive to grammatical functions.
  • #77 — Identifying the grammatical function of a constituent can help us understand its semantic role with respect to the head.
  • #78 — Some languages identify grammatical functions primarily through word order.
  • #79 — Some languages identify grammatical functions through agreement.
  • #80 — Some languages identify grammatical functions through case marking.
  • #81 — Marking of dependencies on heads is more common cross-linguistically than marking on dependents.
  • #82 — Some morphosyntactic phenomena rearrange the lexical mapping.

  • #83 — There are a variety of syntactic phenomena which obscure the relationship between syntactic and semantic arguments.
  • #84 — Passive is a grammatical process which demotes the subject to oblique status, making room for the next most prominent argument to appear as the subject.
  • #85 — Related constructions include anti-passives, impersonal passives, and middles.
  • #86 — English dative shift also affects the mapping between syntactic and semantic arguments.
  • #87 — Morphological causatives add an argument and change the expression of at least one other.
  • #88 — Many (all?) languages have semantically empty words which serve as syntactic glue.
  • #89 — Expletives are constituents that can fill syntactic argument positions that don't have any associated semantic role.
  • #90 — Raising verbs provide a syntactic argument position with no (local) semantic role, and relate it to a syntactic argument position of another predicate.
  • #91 — Control verbs provide a syntactic and semantic argument which is related to a syntactic argument position of another predicate.
  • #92 — In complex predicate constructions the arguments of a clause are licensed by multiple predicates working together.
  • #93 — Coordinated structures can lead to one-to-many and many-to-one dependency relations.
  • #94 — Long-distance dependencies separate arguments/adjuncts from their associated heads.
  • #95 — Some languages allow adnominal adjuncts to be separated from their head nouns.
  • #96 — Many (all?) languages can drop arguments, but permissible argument drop varies by word class and by language.
  • #97 — The referent of a dropped argument can be definite or indefinite, depending on the lexical item or construction licensing the argument drop.

  • #98 — Morphological analyzers map surface strings (words in standard orthography) to regularized strings of morphemes or morphological features.
  • #99 — 'Deep' syntactic parsers map surface strings (sentences) to semantic structures, including semantic dependencies.
  • #100 — Typological databases summarize properties of languages at a high level.
4

Worked Examples: Diagnosing NLP Failures Linguistically

Each example below presents a real NLP failure mode. Apply the concepts from this course to diagnose the failure and identify which linguistic knowledge would prevent it.

Failure: An information extraction system processes the sentence "The policy was approved by the committee." It outputs the relation: approve(subject=policy, object=committee).

Diagnosis:

  • The system has assigned the surface subject (policy) the role of Approver and the by-phrase (committee) the role of Approved — which is semantically reversed.
  • #84 — Passive demotes the subject to oblique; the committee is the semantic Agent even though it appears in the by-phrase.
  • #74 — For information extraction, it is the deep dependency (committee approved policy) that matters, not the surface relation.
  • #73 — Syntactic and semantic arguments are not the same; the system has confused syntactic subject with semantic agent.

Fix: Use a deep parser (#99) that assigns semantic roles regardless of surface position, or train a semantic role labeller that handles passive constructions.

Failure: A fixed-vocabulary neural MT system is applied to Turkish. The word yapamayacaktım (I was not going to be able to do it) appears as a single out-of-vocabulary token. The system backs off to a character-level representation and produces an incorrect translation.

Diagnosis:

  • #20 — Turkish is agglutinative; a single word encodes what English requires a full clause to express. A fixed vocabulary cannot represent the open-ended surface form space.
  • #22 — Morpheme boundaries are clear in Turkish; a morphological analyser (#98) could segment this as: yap-ama-yacak-tı-m (do-NEG.ABILITY-FUT-PAST-1SG).
  • #14 — The suffixes express inflectional features: negation of ability, future tense, past tense (counterfactual), first-person singular. Each must be represented correctly in the translation.

Fix: Apply a morphological analyser (#98) before NMT; or use a subword model with a vocabulary tuned to Turkish morpheme boundaries rather than frequency.
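The contrast between whole-word lookup and morpheme segmentation can be sketched as follows. The morpheme inventory and glosses are copied from the segmentation given in the diagnosis; the fixed vocabulary `FIXED_VOCAB` and the functions `segment` and `gloss` are hypothetical. The segmenter is a plain backtracking longest-match search that knows nothing about Turkish morpheme order or vowel harmony, so it is an illustration of the idea rather than a usable analyser.

```python
from typing import Dict, List, Optional

# Morphemes and glosses taken from the worked example; nothing else is covered.
GLOSSES: Dict[str, str] = {
    "yap": "do",
    "ama": "NEG.ABILITY",
    "yacak": "FUT",
    "tı": "PAST",
    "m": "1SG",
}

# Hypothetical fixed word-level vocabulary of an MT system.
FIXED_VOCAB = {"yap", "yaptım"}


def segment(word: str) -> Optional[List[str]]:
    """Split a word into known morphemes, trying longer pieces first."""
    if word == "":
        return []
    for end in range(len(word), 0, -1):
        piece = word[:end]
        if piece in GLOSSES:
            rest = segment(word[end:])
            if rest is not None:
                return [piece] + rest
    return None  # no complete segmentation found


def gloss(word: str) -> Optional[str]:
    """Return a hyphen-joined gloss line, or None if segmentation fails."""
    pieces = segment(word)
    if pieces is None:
        return None
    return "-".join(GLOSSES[p] for p in pieces)


word = "yapamayacaktım"
print(word in FIXED_VOCAB)  # whole-word lookup: out of vocabulary
print(segment(word))
print(gloss(word))
```

The whole word is absent from the fixed vocabulary, yet every piece of it is a known morpheme, which is why analysis before translation helps.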

Failure: An LLM generates the French sentence *"Les grandes problèmes sont résolus." The noun problèmes is masculine plural; the adjective should be grands (M PL), not grandes (F PL).

Diagnosis:

  • #40 — Determiners and adjectives agree with nouns in number, gender, and case. The system has assigned the wrong gender form to the adjective.
  • #30 — Gender is a morphological feature associated with nouns that must be stored lexically; problème is masculine in French, but the model has treated it as feminine (perhaps influenced by the -e ending, which in many French nouns signals feminine gender).
  • #38 — Agreement features belong to the noun (controller) and are copied to the adjective (target). The model has failed to track the gender feature from controller to target.

Fix: Explicit feature tracking in a structured prediction model; or lexicon-informed post-editing using a morphological analyser (#98) to check agreement.
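A lexicon-informed agreement check of the kind suggested in the fix might look like the sketch below. The lexicon is a hand-built toy covering only the forms in this example, and `agreement_errors` is a hypothetical helper; a real post-editor would obtain the features from a morphological analyser and would also need to decide which adjective modifies which noun.

```python
from typing import Dict, List

# Toy lexicon (assumption): surface form -> agreement features.
NOUNS: Dict[str, Dict[str, str]] = {
    "problèmes": {"gender": "M", "number": "PL"},
}
ADJECTIVES: Dict[str, Dict[str, str]] = {
    "grands": {"gender": "M", "number": "PL"},
    "grandes": {"gender": "F", "number": "PL"},
}


def agreement_errors(adjective: str, noun: str) -> List[str]:
    """List the features on which the adjective (target) disagrees with the noun (controller)."""
    controller = NOUNS[noun]
    target = ADJECTIVES[adjective]
    return [feature for feature in ("gender", "number")
            if target[feature] != controller[feature]]


print(agreement_errors("grandes", "problèmes"))
print(agreement_errors("grands", "problèmes"))
```

The check follows concept #38 directly: the noun owns the features, and the adjective is validated against them.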

Failure: A question-answering system is asked: "Who did the minister claim the committee had recommended for the position?" The system returns: the committee (the nearest NP before the verb).

Diagnosis:

  • #94 — Long-distance dependencies separate arguments from their associated heads. Who is the object of recommended, not of claim, but the dependency between who and its gap crosses a clause boundary, so the filler sits far from the verb that selects it.
  • #45 — Syntax provides scaffolding for semantic composition. Resolving the question requires tracking the wh-dependency through the embedded clause structure, not just matching the nearest plausible NP.
  • #72 — The embedded clause [the committee had recommended ___ for the position] is a clausal argument of claim; the gap inside it is the object of recommended.

Fix: Use a deep parser (#99) that explicitly represents wh-movement and gap-filling; or train on data containing complex embedded questions.
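Gap-filling can be sketched as a search through nested clauses for a predicate whose valency frame has an unfilled slot. The clause structure below is written by hand for the example question, the frames are simplified assumptions (the phrase "for the position" is left out), and `Clause` and `find_gap` are hypothetical names rather than part of any real parser.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Clause:
    """A predicate, the functions it requires, and the ones filled on the surface."""
    predicate: str
    frame: List[str]
    filled: Dict[str, Union[str, "Clause"]]


def find_gap(clause: Clause) -> Optional[Tuple[str, str]]:
    """Return (predicate, function) for the first unfilled required slot, searching downwards."""
    for function in clause.frame:
        if function not in clause.filled:
            return (clause.predicate, function)
    for value in clause.filled.values():
        if isinstance(value, Clause):
            gap = find_gap(value)
            if gap is not None:
                return gap
    return None


# "Who did the minister claim the committee had recommended ___ for the position?"
embedded = Clause("recommend", ["subject", "object"], {"subject": "the committee"})
question = Clause("claim", ["subject", "complement"],
                  {"subject": "the minister", "complement": embedded})

print(find_gap(question))
```

The search links the filler who to the object slot of recommend, so the committee is correctly excluded as an answer: it already fills the subject slot of the embedded clause.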

5

Check Your Understanding

Select the best answer for each question.

A researcher wants to build an NLP system for a language she has never worked with before. She consults a resource that tells her the language is SOV, uses postpositions rather than prepositions, has six noun cases, is predominantly suffixing, and has no grammatical gender. Which type of NLP tool (from concepts #98–#100) is she consulting?

Answer: a typological database (concept #100). Typological databases summarise properties of languages at a high level. A resource that provides cross-linguistically comparable structural features (word order, case systems, morphological type, adposition type) is a typological database, such as WALS (World Atlas of Language Structures). This resource operationalises the cross-linguistic knowledge from Units 1–4 (#4, #20, #21, #31, #42, #43) and allows the researcher to design her NLP system with the language's structural properties in mind before collecting data.

An NLP engineer says: 'Our system achieves 94% accuracy on English, so it should generalise well to other languages.' Which combination of Bender concepts provides the most direct counter-argument?

Answer: concepts #6, #42, and #49 collectively make the counter-argument. #6 states that incorporating linguistic structure and variation makes NLP systems more cross-linguistically portable, implying that a system that ignores cross-linguistic variation will not be portable. #42 states that languages vary in which kinds of information they mark morphologically, so features that predict accuracy in English may not be present in other languages. #49 states that there is no universal set of parts of speech, so category-based features learned from English data may not transfer. Together, these concepts show that high English accuracy is not evidence of cross-linguistic robustness; it may instead reflect over-fitting to English-specific structural patterns.
AI Dimension

Across this course, each unit's AI Dimension identified a specific failure mode of large language models that follows from a set of Bender concepts. In this final unit, we draw these together into a coherent picture.

  • Stochastic parrots and linguistic form — the course quote from Bender et al. (2021) captures the central problem: LLMs produce plausible-sounding text, leading humans to project understanding onto them. The concepts in this course describe exactly what understanding would require: knowledge of morpheme structure (#7–#22), phonological conditioning (#23–#27), grammatical feature systems (#28–#43), constituent structure (#44–#50), predicate-argument structure (#51–#67), semantic role mapping (#68–#82), and the full range of syntactic-semantic mismatches (#83–#97). LLMs learn statistical patterns over surface forms; they do not compute these structural properties.
  • English-centrism and scale — training on more data does not address the structural diversity of the world's languages (#5, #6, #42, #43). An LLM trained on a trillion tokens of predominantly English text has learned English morphosyntax thoroughly and the morphosyntax of other languages poorly. Scale amplifies the dominance of already-dominant languages; it does not resolve it.
  • Linguistically informed evaluation — concept #0 provides the programme for AI evaluation: design targeted tests based on specific linguistic phenomena. Passive interpretation (#84), long-distance dependency resolution (#94), agreement tracking (#38–#40), argument drop resolution (#96–#97), and cross-linguistic morphological analysis (#98) are all dimensions on which current LLMs can be evaluated systematically. Aggregate accuracy metrics hide systematic failures at specific linguistic levels.
  • Responsible multilingual AI — the 7,000 languages of #5 are not equally served by current AI systems. The concepts in this course provide both the motivation (linguistic diversity is real and structures differ fundamentally) and some of the tools (typological databases #100, morphological analysers #98, deep parsers #99) for building more equitable multilingual systems.
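The claim that aggregate metrics hide phenomenon-specific failures can be shown with a small scoring sketch. The result list is invented purely for illustration and does not report any real system's performance; `accuracy_by_phenomenon` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up evaluation results: (linguistic phenomenon, was the output correct?).
RESULTS: List[Tuple[str, bool]] = (
    [("active declarative", True)] * 8
    + [("passive", False)] * 2
)


def accuracy_by_phenomenon(results: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute accuracy separately for each tagged phenomenon."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for phenomenon, is_correct in results:
        totals[phenomenon] += 1
        correct[phenomenon] += int(is_correct)
    return {p: correct[p] / totals[p] for p in totals}


aggregate = sum(ok for _, ok in RESULTS) / len(RESULTS)
per_phenomenon = accuracy_by_phenomenon(RESULTS)
print(aggregate)
print(per_phenomenon)
```

With these invented numbers the aggregate score looks respectable while every passive item fails, which is exactly the pattern that targeted, phenomenon-tagged test sets are designed to expose.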

The course has given you the linguistic vocabulary to ask better questions of AI systems — and to design, evaluate, and critique them more rigorously.

6

Activities

Individual task — Concept mapping

Choose any five concepts from the course (from different units). For each concept:

  1. State the concept in your own words.
  2. Give one example from a language other than English that illustrates the concept.
  3. Describe one NLP task or system where this concept is directly relevant.
  4. Identify one way in which ignoring this concept leads to a predictable NLP failure.

Your five concepts should span at least three different units. This exercise prepares you for linguistically framed evaluation of NLP systems.

Pair task — Evaluating NLP failures linguistically

With a partner, collect five examples of errors made by an NLP system (a machine translation tool, a chatbot, a speech recogniser, or a summarisation system).

For each error:

  1. Describe the input and the incorrect output.
  2. Identify which linguistic level the failure involves — morphology, morphophonology, morphosyntax, syntax, POS, argument structure, grammatical function, or syntactic-semantic mismatch.
  3. Cite the specific Bender concept(s) that explain the failure.
  4. State what linguistic knowledge would be needed to handle the input correctly.

Compile your analysis into a short table: Input / Output / Failure type / Concept(s) / Fix.

Group task — Designing a linguistically informed NLP workflow

As a group, design a multilingual NLP pipeline for one of the following tasks:

  • Machine translation between English and a morphologically rich SOV language (Turkish, Japanese, or Hindi)
  • Named entity recognition for a low-resource language with a non-Latin script
  • Semantic role labelling across English, Arabic, and Mandarin Chinese

Your design should address:

  1. What preprocessing steps are needed (tokenisation, morphological analysis) — link to #98
  2. What structural differences between languages must the system handle — link to #4, #20, #42, #78–#81
  3. What typological resources you would consult before building the system — link to #100
  4. What deep parsing capabilities are required — link to #99, #74
  5. What linguistically grounded evaluation criteria you would use — link to #0

Present your design as a structured pipeline diagram with brief justifications for each component.

Review

  • #98 — Morphological analysers operationalise Units 2–4. They map surface word forms to morpheme sequences and feature bundles, enabling NLP systems to process morphologically rich languages correctly rather than relying on surface-form statistics.
  • #99 — Deep syntactic parsers operationalise Units 5–9. They map sentences to predicate-argument structures with semantic role labels, providing the deep dependency representation that concept #74 identifies as essential for most NLP applications.
  • #100 — Typological databases operationalise Units 1–4 and the cross-linguistic dimension throughout the course. They give NLP engineers structured access to the structural properties of languages they are building systems for, enabling linguistically informed design from the outset rather than trial-and-error deployment.

Together, these three tools represent the practical infrastructure through which the 100 concepts of this course are made computationally operational.

  1. Linguistic structure matters (#0, #1, #44, #45) — NLP systems that ignore linguistic structure make predictable errors that can be diagnosed and addressed using the concepts from this course.
  2. Language diversity is the norm, not the exception (#4, #5, #6, #20, #42, #43, #49) — English is one language among 7,000. Its structural properties are not universal; NLP systems designed around English alone will fail systematically on other languages.
  3. Form and meaning do not map simply (#73, #74, #83–#97) — passive constructions, raising verbs, expletives, long-distance dependencies, and argument drop all create gaps between surface syntactic form and deep semantic meaning. NLP systems must be able to bridge this gap.
  4. Levels of linguistic analysis compound — morphology feeds syntax (#28, #44); syntax scaffolds semantics (#45); semantics depends on grammatical functions (#77). Failure at any level propagates upward.
  5. Linguistic knowledge enables better AI (#98, #99, #100) — morphological analysers, deep parsers, and typological databases are the tools through which linguistic knowledge improves NLP systems. The concepts in this course provide the foundation for using and evaluating these tools critically.

Course Complete

You have completed Linguistics for Natural Language Processing and worked through all 100 concepts from Bender (2013). You now have the linguistic vocabulary to design, evaluate, and critically analyse NLP systems — and to understand why surface fluency is not the same as linguistic understanding.