Unit 5: Syntax

Learning Objectives

Explain how syntax constrains possible sentences and why this matters for NLP
Identify constituents in English sentences and describe their hierarchical organisation
Describe attachment ambiguity and explain why it challenges NLP parsers
Analyse the relationship between syntactic constraints and semantic interpretation

Reading

Read Chapter 5 (Syntax) of Bender, E. M. (2013). Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Morgan & Claypool. Use the course materials below to activate and consolidate the concepts from that chapter.

Core Input

Read through each tab. Take notes on the key ideas before moving to the activities below.

Syntax is the system of rules governing how words combine into phrases and sentences. Its core insight is that sentences are not simply linear strings of words — they have hierarchical structure.

Three foundational concepts underpin syntactic analysis:

Constituency — words do not combine with each other one-by-one in a chain; they group into intermediate units called constituents. A constituent is a group of words that behaves as a single unit in the grammar: the words can be moved together, replaced together, or questioned together.
Phrase structure — constituents are organised hierarchically, not just linearly. The sentence The big dog chased the frightened cat contains a noun phrase (The big dog) and a verb phrase (chased the frightened cat), and within the verb phrase there is another noun phrase (the frightened cat). The structure is nested, not flat.
Heads — every phrase has a head — one word that determines the category and core properties of the phrase. A noun phrase (NP) has a noun as its head; a verb phrase (VP) has a verb as its head; a prepositional phrase (PP) has a preposition as its head. The head determines what other elements the phrase can contain and what it can combine with.

A key property of syntax is recursion: phrases can embed inside phrases of the same or different type, potentially without limit. A noun phrase can contain a relative clause, which itself contains a noun phrase, which itself could contain a relative clause. This is why human language can express indefinitely complex thoughts in a single sentence.

NLP implication: phrase structure trees are the input to many semantic processing systems. Parsing — constructing the phrase structure of a sentence — is a fundamental NLP task. Without a correct parse, downstream tasks (semantic role labelling, question answering, coreference resolution) lack the structural scaffolding they need.
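
As a rough illustration of constituency, phrase structure, and heads, the sketch below encodes the example sentence as a nested tuple and looks up the head word of each phrase. The tuple encoding, the HEAD_CATEGORY table, and the function names are assumptions made for this illustration; they are not taken from any particular parser.

```python
# Toy phrase structure tree: (label, child, child, ...); leaves are (POS, word).
TREE = ("S",
        ("NP", ("Det", "The"), ("Adj", "big"), ("N", "dog")),
        ("VP", ("V", "chased"),
               ("NP", ("Det", "the"), ("Adj", "frightened"), ("N", "cat"))))

# Assumed mapping from a phrase label to the category of its head word.
HEAD_CATEGORY = {"NP": "N", "VP": "V", "PP": "P"}

def is_leaf(node):
    """A leaf is a (POS, word) pair whose second element is a string."""
    return len(node) == 2 and isinstance(node[1], str)

def words(node):
    """Return the words dominated by a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]

def phrases(node):
    """Yield (label, text, head_word) for every phrasal node, top-down."""
    if is_leaf(node):
        return
    label = node[0]
    head = None
    for child in node[1:]:
        if is_leaf(child) and child[0] == HEAD_CATEGORY.get(label):
            head = child[1]
    yield label, " ".join(words(node)), head
    for child in node[1:]:
        yield from phrases(child)

for label, text, head in phrases(TREE):
    print(label, "|", text, "| head:", head)
```

The nesting of the tuples mirrors the nesting of the phrases: the object NP sits inside the VP, which sits inside S. S has no entry in the head table here, so its head is reported as None.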

Word order is one of the most fundamental syntactic properties of a language, and concept #44 captures the core insight: syntax places constraints on possible sentences. Not every ordering of words is grammatical.

In English, the canonical order is SVO (Subject–Verb–Object):

The dog chased the cat. — grammatical
*Chased the dog the cat. — ungrammatical (violates English word order constraints)

Languages differ in their canonical word order and in how strictly they enforce it:

Japanese is SOV — the verb comes last. The argument order is flexible because grammatical function is marked by case particles (see Unit 4), not by position. A Japanese parser reads case particles, not word order, to determine which NP is subject and which is object.
Welsh is VSO — the verb comes first, followed by subject and object. The subject is identified by its position after the verb, not before it.
Warlpiri (an Australian Aboriginal language) has very free word order: constituents can appear in almost any sequence because case morphology unambiguously marks grammatical function regardless of position.

Constituency tests are formal procedures for identifying the boundaries of syntactic units. Three standard tests are:

Movement test — a constituent can be moved as a unit to the front of the sentence: The frightened cat, the big dog chased. The NP the frightened cat moves as a unit; if the resulting sentence is grammatical, the moved string is a constituent.
Replacement (pronominalization) test — a constituent can be replaced by a pronoun or pro-form: The big dog chased it. The NP the frightened cat is replaced by it.
Question formation test — a constituent can be questioned: What did the big dog chase? The NP the frightened cat is questioned by what.

NLP implication: parsers for different languages must implement different phrase structure rules. An English parser trained on SVO order will fail on SOV Japanese or VSO Welsh — the structural rules are language-specific, not universal.

Concept #45 states that syntax provides scaffolding for semantic composition. The meaning of a sentence is not simply the sum of the meanings of its words — it is derived from the meanings of its parts combined according to their syntactic structure. This is the principle of compositionality.

A minimal pair illustrates this forcefully:

The dog bit the man.
The man bit the dog.

The same words, in two different orders, yield radically different meanings: a different agent, a different patient, a different situation. Syntax determines who does what to whom. Without syntactic structure, the word meanings alone are not enough to construct the sentence meaning.
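
The contrast can be made concrete with a small sketch. An order-free "bag of words" representation cannot tell the two sentences apart, whereas even a very crude position-based reading can. The function svo_roles is a hypothetical helper that only handles sentences of the shape "the N V the N"; it is an illustration, not a parser.

```python
from collections import Counter

def bag_of_words(sentence):
    """Order-free representation: word counts only."""
    return Counter(sentence.lower().rstrip(".").split())

def svo_roles(sentence):
    """Toy role assignment, assuming the fixed shape 'the N V the N'."""
    w = sentence.lower().rstrip(".").split()
    return {"agent": w[1], "predicate": w[2], "patient": w[4]}

a = "The dog bit the man."
b = "The man bit the dog."

print(bag_of_words(a) == bag_of_words(b))  # the word counts are identical
print(svo_roles(a))
print(svo_roles(b))
```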

Structural ambiguity makes this even clearer:

Old men and women — two readings: (a) old men and old women; (b) old men and (all-age) women. The same words, combined with different phrase structure, yield different meanings.

Concept #46 deepens this: constraints ruling out some strings as ungrammatical also constrain the range of possible semantic interpretations of other strings. The syntactic rules that make *The quickly dog ran ungrammatical (an adverb cannot modify a noun inside the NP) are the same rules that, when satisfied by The dog ran quickly, determine the structural relationship between the verb and the adverb, and therefore enable the semantic interpretation. Well-formedness constraints do double duty: they filter impossible strings and they scaffold the semantic interpretation of possible ones.

NLP implication: semantic role labelling, question answering, textual inference, and information extraction all depend on knowing the syntactic structure of the input. Without syntax, semantic interpretation is guesswork. Errors in parsing propagate directly into errors in meaning representation.

Key Concepts A: Syntactic Constraints and Semantic Scaffolding (#44–#45)

Expand each concept. Consider the NLP implication before reading the explanation.

Syntactic constraints determine which sequences of words are grammatical sentences in a language. These constraints are not merely preferences — they categorically rule out certain strings. Three main types of syntactic constraint are relevant:

Category constraints — certain syntactic categories must appear in certain positions. In English, the object position of a transitive verb must be filled by a noun phrase, not an adverb: The dog chased the cat vs *The dog chased quickly (with quickly standing in for the direct object). The verb's syntactic category constrains what can fill the object position.
Subcategorisation — verbs select for particular argument types. This is a lexical-syntactic property:
- sleep is intransitive — it takes no object: She slept vs *She slept the baby.
- put takes two complements, a direct object and a locative phrase, and requires both: She put the book on the shelf vs *She put the book (missing location) vs *She put on the shelf (missing object, in the relevant sense).
- seem takes a clausal complement: It seems that she left vs *It seems her.
Subcategorisation frames are lexically stored properties of individual verbs that constrain what syntactic structures they can head.
Agreement constraints — morphological features must be consistent across related elements (see Unit 4). In Spanish, *el niña is ungrammatical because the masculine article el conflicts with the feminine noun niña. The syntactic constraint is that the determiner and noun must match in gender.
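
Subcategorisation frames lend themselves to a simple lexicon lookup. The sketch below is a minimal illustration under stated assumptions: the SUBCAT dictionary, its category labels (NP, PP, CP), and the function licensed are all hypothetical, and real lexicons record far more detail than this.

```python
# Hypothetical mini-lexicon: each verb lists the complement sequences it allows.
SUBCAT = {
    "slept": [[]],            # intransitive: no complements
    "chased": [["NP"]],       # transitive: one NP object
    "put": [["NP", "PP"]],    # direct object plus locative PP
    "seems": [["CP"]],        # clausal complement
}

def licensed(verb, complements):
    """True if the verb's lexical entry allows this complement sequence."""
    return list(complements) in SUBCAT.get(verb, [])

print(licensed("slept", []))          # She slept
print(licensed("slept", ["NP"]))      # *She slept the baby
print(licensed("put", ["NP", "PP"]))  # She put the book on the shelf
print(licensed("put", ["NP"]))        # *She put the book
print(licensed("put", ["PP"]))        # *She put on the shelf
```

Because the frames are stored per verb, the same check rejects an object after slept and the absence of one after put.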

NLP implication: syntactic constraints dramatically reduce the search space for a parser. Without them, every ordering of every word would need to be considered as a potential sentence, and the number of orderings of n words grows factorially (n!). With syntactic constraints, the parser can immediately rule out impossible analyses. Constraint violation is also the diagnostic signal for grammatical error detection.
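
To get a feel for the size of the unconstrained search space, the following sketch counts the possible orderings of n words. It assumes all n words are distinct; a sentence with repeated words has somewhat fewer distinct orderings. The function name orderings is chosen for this illustration.

```python
import math

def orderings(n):
    """Number of distinct orderings of n different words."""
    return math.factorial(n)

for n in (5, 10, 20):
    print(n, orderings(n))
```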

The principle of compositionality holds that the meaning of a complex expression is determined by the meanings of its parts and the syntactic structure combining them. Syntax is not merely a formal convention — it is the mechanism by which word meanings are assembled into sentence meanings.

Basic case — thematic roles: with an action verb such as bite, the subject NP of a transitive active sentence is the agent and the object NP is the patient. Which NP receives which role is determined by syntactic position, not by word meaning alone:

The dog bit the man. — dog = agent, man = patient
The man bit the dog. — man = agent, dog = patient

Identical words, different syntax, different meaning. The syntactic structure (which NP is subject, which is object) is the scaffolding that determines who does what to whom.

Advanced case — quantifier scope: Every student read a book has two interpretations: (a) for every student, there is a (possibly different) book they read; (b) there is a specific book that every student read. These two readings differ in the scope of the quantifiers every and a. Both readings arise from the same surface syntactic structure — the disambiguation requires applying scope-bearing elements in different orders over the syntactic structure. Semantic interpretation is computed over the syntax; without the syntax, the scope ambiguity cannot even be properly stated.
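
The two scope readings can be checked against a small invented situation. In the sketch below, the sets students, books, and read form a hypothetical model in which each student read a different book; the two functions evaluate reading (a), where every takes wide scope, and reading (b), where a takes wide scope.

```python
# Hypothetical model: which student read which book.
students = {"ana", "ben"}
books = {"b1", "b2"}
read = {("ana", "b1"), ("ben", "b2")}

def every_over_a(students, books, read):
    """Reading (a): for every student there is some book that they read."""
    return all(any((s, b) in read for b in books) for s in students)

def a_over_every(students, books, read):
    """Reading (b): there is one book that every student read."""
    return any(all((s, b) in read for s in students) for b in books)

print(every_over_a(students, books, read))
print(a_over_every(students, books, read))
```

In this model the sentence is true on reading (a) and false on reading (b), which shows that the two readings are genuinely different claims about the world and not two paraphrases of one claim.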

NLP implication: semantic role labelling (who did what to whom), question answering (what is the answer entity in the syntax), and natural language inference (do these two sentences entail each other) all depend on the syntactic structure of the input. A system that operates directly on word sequences without a parse is attempting semantic interpretation without the scaffolding on which interpretation depends.

Key Concepts B: Constraints, Ambiguity, and Recursion (#46)

Continue with the remaining core concept and the key syntactic phenomena it connects to.

This concept makes explicit the double function of syntactic well-formedness constraints:

Filtering function — they rule out impossible strings. The constraint that past tense in English is formed by -ed (or an irregular allomorph, see Unit 3) rules out *The dog bited the man.
Interpretive function — the same constraints, when satisfied by a well-formed string, provide the structural scaffolding for semantic interpretation. The constraint that sentences have a subject NP + VP structure means that when a string conforms to this pattern, the subject NP can be assigned to its thematic role (agent or experiencer) and the VP's semantic content can be composed with it.

A concrete illustration: consider the constraint that English transitive verbs take an NP object. This rules out *She saw quickly (adverb in object position). But the same constraint, satisfied by She saw the man, determines that the man is the object — the entity seen — and therefore the patient/theme of the event. The filtering constraint and the interpretive scaffolding are two sides of the same structural rule.

NLP implication: syntactic well-formedness constraints do double duty — they are both a filter on the input space and a structural representation that supports semantic computation. A parser that identifies the constituent structure of a sentence is simultaneously ruling out impossible analyses and providing the representation needed for interpretation.
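
The double duty can be sketched as a single function that either rejects a string or returns the structure needed for interpretation. Everything here is a toy assumption: the input is already chunked into (category, text) pairs, the grammar knows only the pattern NP + transitive verb + NP, and the TRANSITIVE set is an invented mini-lexicon.

```python
# Hypothetical toy grammar: a clause is NP + transitive V + NP.
TRANSITIVE = {"saw", "chased", "bit"}

def interpret(chunks):
    """Return subject/verb/object if the clause is well-formed, else None.

    `chunks` is a list of (category, text) pairs, e.g. ("NP", "She").
    """
    categories = [category for category, _ in chunks]
    if categories != ["NP", "V", "NP"] or chunks[1][1] not in TRANSITIVE:
        return None  # filtering function: the string is ruled out
    subject, verb, obj = (text for _, text in chunks)
    # Interpretive function: the licensed structure fixes who is who.
    return {"subject": subject, "verb": verb, "object": obj}

print(interpret([("NP", "She"), ("V", "saw"), ("NP", "the man")]))
print(interpret([("NP", "She"), ("V", "saw"), ("Adv", "quickly")]))
```

The same condition that returns None for *She saw quickly is the one that, when met, identifies the man as the object of saw.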

PP attachment ambiguity arises when a prepositional phrase can be interpreted as modifying either the verb phrase or a noun phrase within it. The classic example:

I saw the man with the telescope.

Two syntactically well-formed readings:

VP attachment: I [saw [the man] [with the telescope]] — I used the telescope to see the man. The PP attaches to the VP.
NP attachment: I [saw [the man with the telescope]] — the man has a telescope. The PP attaches to the NP the man.

Both readings are grammatical; syntax alone cannot determine which is intended. Resolving the ambiguity requires semantic knowledge (is it plausible to use a telescope to see?) and pragmatic knowledge (is the telescope already in discourse context?).

The problem compounds with multiple PPs: She photographed the model with the camera on the tripod has five structurally possible attachment analyses if attachments are not allowed to cross. For n PPs, the number of possible attachment structures grows combinatorially. This is the attachment ambiguity explosion.
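
The growth can be counted with a short recursive sketch. It assumes a verb followed by an object NP and n PPs, and it assumes that attachments may not cross: once a PP attaches to an earlier site, the sites to the right of that one are closed. The function name and the "open sites" bookkeeping are choices made for this illustration.

```python
def count_attachments(n_pps, open_sites=2):
    """Count non-crossing attachment analyses for V NP PP^n.

    open_sites is the number of sites still available for attachment
    (initially the verb and the object NP). Attaching a PP at site k
    (0-based) closes every site after k and opens one new site, namely
    the NP inside the newly attached PP.
    """
    if n_pps == 0:
        return 1
    return sum(count_attachments(n_pps - 1, k + 2) for k in range(open_sites))

for n in range(1, 5):
    print(n, count_attachments(n))
```

Under these assumptions one PP gives 2 analyses, two PPs give 5, three give 14, and four give 42; the sequence follows the Catalan numbers. Semantic plausibility will usually rule out some of these, but a purely syntactic parser has to consider all of them.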

NLP implication: PP attachment is one of the hardest problems in syntactic parsing. NLP parsers trained on English corpora have historically preferred low attachment, that is, attaching a PP to the most recent NP. On ambiguous sentences judged by human annotators, this default is wrong approximately 40% of the time. Improving PP attachment accuracy requires semantic and world knowledge beyond what syntax provides.

Recursion allows syntax to embed phrases inside phrases, in principle without limit. Relative clauses provide the clearest example:

The cat [that the dog [that the child owned] chased] ran away.

Here a relative clause is embedded inside a relative clause, yielding a centre-embedded structure. Human parsers handle such structures, though they become cognitively demanding at deep embeddings. NLP parsers must handle recursive structures correctly; many neural parsers perform well at standard depths but degrade on deep or unusual embeddings.
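
Recursion in the grammar corresponds naturally to recursion in code. The sketch below builds centre-embedded sentences of any depth from a list of nouns and verbs; the function name centre_embed and its argument layout are assumptions made for this illustration.

```python
def centre_embed(nouns, verbs, final_vp):
    """Build a centre-embedded sentence with bracketed relative clauses.

    nouns[0] is the main subject; each later noun is the subject of a
    relative clause modifying the noun before it. verbs[i] is the verb
    whose subject is nouns[i + 1].
    """
    def noun_phrase(i):
        if i == len(nouns) - 1:
            return f"the {nouns[i]}"
        # An NP containing a relative clause that itself contains an NP.
        return f"the {nouns[i]} [that {noun_phrase(i + 1)} {verbs[i]}]"

    sentence = f"{noun_phrase(0)} {final_vp}."
    return sentence[0].upper() + sentence[1:]

print(centre_embed(["cat", "dog"], ["chased"], "ran away"))
print(centre_embed(["cat", "dog", "child"], ["chased", "owned"], "ran away"))
```

Adding one more noun and verb produces a grammatical sentence that is one level deeper, which is the sense in which embedding has no principled limit.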

Long-distance dependencies arise when two syntactically related elements are separated by an arbitrary amount of intervening material. Wh-questions in English are the paradigm case:

What₁ did the girl who won the prize think the teacher said she deserved ___₁?

The wh-word what appears at the beginning of the sentence, but it is semantically the object of deserved — several embedded clauses deep. The dependency between what and its gap spans multiple clause boundaries. Human parsers track this dependency; NLP parsers and sequence models are known to perform worse on long-distance dependencies than on local ones, particularly for novel or syntactically unusual configurations.

NLP implication: neural sequence models handle local syntactic patterns well because they are frequent in training data. But rare long-distance dependencies or deeply nested relative clauses fall outside the distribution of common training examples, and model performance degrades. The limitation is therefore partly distributional and partly a matter of how well the architecture generalises beyond that distribution.

Worked Examples: Syntax in Action

Work through these examples to see how syntactic structure operates in English and other languages, and what challenges each poses for NLP.

Constituent structure of a simple sentence: The large dog chased the frightened cat.

Labelled bracket notation showing all constituents:

      [S  [NP The large dog]
          [VP chased
              [NP the frightened cat]]]

Node labels:

S — Sentence: the root node; consists of NP (subject) + VP (predicate)
NP (The large dog) — Noun Phrase: Det + Adj + N; head is dog
VP (chased the frightened cat) — Verb Phrase: V + NP; head is chased
NP (the frightened cat) — Noun Phrase: Det + Adj + N; head is cat

Applying the three constituency tests to the frightened cat:

Test	Result	Conclusion
Pronominalization	The large dog chased it.	Replaced as a unit ✓
Movement	The frightened cat, the large dog chased.	Moved as a unit to sentence-initial position ✓
Question formation	What did the large dog chase?	Questioned as a unit ✓

All three tests confirm that the frightened cat is a constituent (an NP). This is the procedural basis on which linguistic phrase structure is established. NLP parsers must produce representations equivalent to this labelled bracket structure to support downstream semantic tasks.
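
Labelled bracket notation is easy to read into a tree structure. The following sketch is a minimal bracket reader written for this unit's examples; parse_brackets, leaves, and constituents are hypothetical helper names, and the code assumes well-formed input in which every bracket starts with a label.

```python
import re

def parse_brackets(text):
    """Parse labelled brackets such as '[NP the cat]' into (label, children)."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                    # consume '['
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                    # consume ']'
        return (label, children)

    return node()

def leaves(tree):
    """Return the words under a node, left to right."""
    _, children = tree
    out = []
    for child in children:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out

def constituents(tree):
    """Yield (label, text) for every bracketed node, top-down."""
    label, children = tree
    yield label, " ".join(leaves(tree))
    for child in children:
        if isinstance(child, tuple):
            yield from constituents(child)

s = "[S [NP The large dog] [VP chased [NP the frightened cat]]]"
for label, text in constituents(parse_brackets(s)):
    print(label, "->", text)
```

Each line of output corresponds to one constituent, including the object NP the frightened cat that the three tests identified.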

PP attachment ambiguity is one of the hardest problems in NLP parsing. Both readings of an ambiguous sentence are syntactically well-formed; disambiguation requires knowledge beyond the syntax.

Example 1: I saw the man with the telescope.

      Reading A (VP attachment):
      [S [NP I] [VP saw [NP the man] [PP with the telescope]]]
      → I used the telescope to see the man.

      Reading B (NP attachment):
      [S [NP I] [VP saw [NP the man [PP with the telescope]]]]
      → The man has a telescope.
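
The difference between the two readings comes down to which node directly dominates the PP. The sketch below encodes both readings as nested tuples and reports the parent of the PP in each. The encoding is a simplification made for this illustration (the NP inside the PP is left flat), and parent_of is a hypothetical helper.

```python
# The two readings as nested tuples: (label, child, ...); strings are words.
# Simplification: the words inside the PP are not bracketed further.
READING_A = ("S", ("NP", "I"),
             ("VP", "saw", ("NP", "the", "man"),
                    ("PP", "with", "the", "telescope")))
READING_B = ("S", ("NP", "I"),
             ("VP", "saw", ("NP", "the", "man",
                            ("PP", "with", "the", "telescope"))))

def parent_of(tree, target_label, parent=None):
    """Return the label of the node directly above the first target_label."""
    if tree[0] == target_label:
        return parent
    for child in tree[1:]:
        if isinstance(child, tuple):
            found = parent_of(child, target_label, tree[0])
            if found is not None:
                return found
    return None

print(parent_of(READING_A, "PP"))
print(parent_of(READING_B, "PP"))
```

The word string is identical in both trees; only the position of one bracket differs, and that single difference changes who has the telescope.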

Example 2: She photographed the model with the camera on the tripod.

Two PPs follow the object. The first, with the camera, can attach to photographed (VP) or to the model (NP); the second, on the tripod, can attach to either of those sites or to the camera (NP). The combinatorial possibilities grow with each additional PP.

NLP parser behaviour:

Parsers trained on English corpora tend to prefer low attachment (attaching the PP to the most recently mentioned NP) because this is statistically more frequent in the training data.
Low attachment is correct roughly 60% of the time — meaning it is wrong approximately 40% of the time. This is a systematic error source in NLP parsing.
Semantic constraints can help: if the PP contains an instrument (telescope, camera), VP attachment (used to do the action) is often more plausible than NP attachment (the entity has the instrument). But exploiting such constraints requires semantic and world knowledge.

Phrase structure rules are language-specific. The same NLP parsing approach cannot be applied unchanged across languages with different word orders.

Japanese (SOV — Subject–Object–Verb):

inu-ga neko-wo oi-kaketa

dog-NOM cat-ACC chase-PAST

“The dog chased the cat.”

The verb comes last. The parser cannot know the predicate until the final word of the clause — it must hold the arguments in memory while waiting for the verb that determines what the predicate–argument structure is.

Welsh (VSO — Verb–Subject–Object):

Gwelodd y dyn y gath.

saw-3SG.PAST the man the cat

“The man saw the cat.”

The verb comes first. The subject follows the verb rather than preceding it; the parser identifies the subject by its position immediately after the verb.

German Verb-Second (V2) in main clauses:

Gestern sah ich den Hund.

Yesterday saw I the.ACC dog

“Yesterday I saw the dog.” (Lit. Yesterday saw I the dog.)

German main clauses require the finite verb to be the second constituent. When a topic (here gestern, yesterday) is fronted, the subject appears after the verb instead of before it. Case marking identifies the subject (ich, nominative) despite its non-canonical post-verbal position.

NLP implication: each of these languages requires a parser with language-specific phrase structure rules. A parser for Japanese must hold arguments pending verb arrival; a parser for German must handle V2 movement and identify subjects by case, not position. Cross-lingual transfer of parsing systems requires explicit modelling of these structural differences.
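
The contrast between position-based and case-based identification of subject and object can be sketched with two small functions. Both are deliberately naive and hypothetical: roles_by_case reads the NOM and ACC labels from the glosses used in this unit, and roles_by_position assumes the input is already split into three chunks in a known order.

```python
def roles_by_case(glossed):
    """Case-based: read roles off the NOM/ACC labels in (form, gloss) pairs."""
    roles = {}
    for form, gloss in glossed:
        if gloss.endswith("-NOM"):
            roles["subject"] = form
        elif gloss.endswith("-ACC"):
            roles["object"] = form
        else:
            roles["verb"] = form
    return roles

def roles_by_position(chunks, order):
    """Position-based: `order` is a string such as 'SVO' or 'VSO'."""
    names = {"S": "subject", "V": "verb", "O": "object"}
    return {names[slot]: chunk for slot, chunk in zip(order, chunks)}

japanese = [("inu-ga", "dog-NOM"), ("neko-wo", "cat-ACC"), ("oi-kaketa", "chase-PAST")]
reordered = [japanese[1], japanese[0], japanese[2]]  # object before subject

print(roles_by_case(japanese) == roles_by_case(reordered))
print(roles_by_position(["Gwelodd", "y dyn", "y gath"], "VSO"))
print(roles_by_position(["Gwelodd", "y dyn", "y gath"], "SVO"))
```

The case-based reading is unaffected by reordering the arguments, whereas applying an SVO assumption to the Welsh sentence mislabels the verb as the subject.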

Long-distance wh-dependency:

      What₁ did the girl who won the prize think
          the teacher said
              she deserved ___₁?

The gap (___₁) is the object of deserved, two levels of embedding below the matrix clause. The wh-word what at the front of the sentence is co-indexed with this gap. A parser must track this dependency across three clauses:

The matrix clause: did ... think
The first embedded clause: the teacher said
The second embedded clause: she deserved ___

Human parsers handle this; NLP parsers and language models are known to perform worse on long-distance dependencies — particularly when the intervening material is complex.
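
One simple way to quantify "long-distance" is to count the tokens that separate the wh-word from its gap. The sketch below does this for a short question and for the example above. It assumes whitespace tokenisation and an explicit ___ marker for the gap, both of which are conveniences for this illustration; real text does not mark gaps.

```python
def filler_gap_distance(tokens, filler, gap="___"):
    """Number of tokens strictly between the filler and its gap."""
    start = tokens.index(filler)
    end = tokens.index(gap)
    return end - start - 1

local = "What did she deserve ___ ?".split()
distant = ("What did the girl who won the prize think "
           "the teacher said she deserved ___ ?").split()

print(filler_gap_distance(local, "What"))
print(filler_gap_distance(distant, "What"))
```

Token distance is only a rough proxy: the number of clause boundaries crossed and the complexity of the intervening material also matter.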

Japanese pre-nominal relative clauses:

[kodomo-ga katta] inu

[child-NOM bought] dog

“The dog that the child bought”

In Japanese, relative clauses are pre-nominal: the entire modifying clause appears before the noun it modifies. For complex relative clauses, the head noun — which determines the semantic type of the whole NP — is not encountered until after the entire embedded clause has been processed. The NLP parser must hold the relative clause structure in memory pending the head noun.

NLP implication: transformer-based language models theoretically allow attention connections between any two positions in a sentence. But in practice, performance on rare long-distance dependencies and deeply nested embeddings is worse than on common local patterns — even with transformer attention. The problem is not only architectural but distributional: rare structures are underrepresented in training data, and models do not generalise well beyond the distribution they have seen.

Check Your Understanding

Select the best answer for each question.

An NLP system is asked to interpret 'I photographed the tourist with the camera.' It selects only one reading. Which concept best explains why this input is inherently problematic even for a perfect syntactic parser?

#44 — syntax constrains possible sentences
#45 — syntax provides scaffolding for semantic composition
#46 — constraints on ungrammatical strings restrict interpretations of grammatical ones
None — PP attachment is a purely semantic problem, not a syntactic one

A phrase structure grammar rules out the string *'Quickly chased dog cat the' as ungrammatical. According to concept #46, what does this grammaticality constraint also tell us?

Nothing — grammaticality constraints only filter strings and have no semantic implication
That the words in the string have no meaning when unordered
That grammatical strings with the same words are constrained in how their meaning can be computed
That word order is irrelevant in natural language because meaning can always be recovered from context

AI Dimension

Four issues from this unit connect directly to how AI language systems handle syntax — and where they characteristically fail:

Parsing and sequence prediction — LLMs generate text token by token. They implicitly learn syntactic preferences from training data but have no explicit parse tree or phrase structure representation. They handle common constructions well, since these dominate their training data, but are less reliable on rare, deeply nested, or syntactically unusual structures, because those patterns are underrepresented in training (#44, #46).
Attachment ambiguity — LLMs resolve PP attachment ambiguities largely through statistical tendencies learned from training data. They tend to default to the most common attachment pattern, which produces interpretively incorrect outputs when the semantics requires the less frequent reading. This resembles the low-attachment bias documented in traditional NLP parsers, now reproduced at scale in neural generation (#45).
Long-distance dependencies — transformer attention theoretically connects any two positions in a sentence, providing a potential mechanism for handling long-distance dependencies. In practice, LLMs perform worse on test sentences that contain deep embeddings or novel long-distance configurations not well represented in their training distribution. The architectural mechanism does not guarantee generalisation to rare structural patterns (#45, #46).
Structural errors in generation — LLMs sometimes generate syntactically malformed sentences: missing required arguments, wrong subcategorisation frame (a verb used transitively when it is intransitive, or vice versa), or dangling constituents — particularly in lower-resource languages or when prompted in unusual stylistic registers. Concept #44 provides the diagnostic framework: the system has violated a subcategorisation or category constraint. Identifying and correcting such violations programmatically requires explicit grammatical knowledge that the LLM does not expose.

Activities

Individual task — Constituent structure and ambiguity analysis

Identify the constituent structure of the following sentences. Use labelled brackets to show NP, VP, PP, and S boundaries. Then identify any potential attachment ambiguities and describe the alternative readings.

The student saw the professor with a telescope.
She told the woman that she had won the prize.
The old men and women sang.

For each sentence, structure your answer as: (a) labelled bracket representation of the most likely reading; (b) if structurally ambiguous, labelled bracket representation of the alternative reading; (c) what information — syntactic, semantic, or pragmatic — would be required to select between the two readings.

Pair task — Phrase structure comparison: English and Japanese

Compare the basic phrase structure of English and Japanese by examining the following three sentences. If you do not know Japanese, look up the Japanese equivalents using a reliable language resource or dictionary.

The cat chased the mouse.
I read an interesting book yesterday.
She thinks that he will come.

For each sentence pair:

Note the word order in the English version and the word order in the Japanese version. Express both using S, V, O, and adverbial (ADV) notation.
Draw a labelled bracket structure for both the English and Japanese sentences, identifying NP, VP, and any embedded clauses.
Discuss what challenges the difference in phrase structure poses for a machine translation system working between English and Japanese — focusing on what information the parser must have at each point in the sentence and when.

Group task — Syntactic ambiguity corpus and disambiguation strategies

As a group, collect five examples of syntactically ambiguous sentences — you may draw from newspapers, social media, AI-generated text, or construct them yourselves. Aim for variety: include at least one PP attachment ambiguity, one scope ambiguity, and one structural ambiguity involving coordination.

For each sentence:

Draw two alternative phrase structure representations using labelled bracket notation. Label each reading clearly (e.g. Reading A: VP-attachment / Reading B: NP-attachment).
State which reading is most likely in context and explain what contextual or world knowledge supports that reading.
Classify the disambiguation strategy required: purely syntactic (structural frequency alone resolves it), semantic (world knowledge about plausibility of events), or pragmatic (discourse context, speaker intention). Justify your classification.

After working through all five examples, discuss as a group: What does the range of disambiguation strategies required tell us about the limits of purely syntactic parsing for NLP? What additional resources or representations would a robust NLP system need?

Review

#44 — Syntax places constraints on possible sentences.: Syntactic constraints operate at three levels: category constraints (certain categories must fill certain positions), subcategorisation (individual verbs select for specific argument types — transitive, intransitive, ditransitive, clausal complement), and agreement constraints (morphological features must be consistent across related elements). These constraints reduce the search space for parsers, enable grammatical error detection, and define the space of structurally possible interpretations. Without them, every ordering of every word would need to be evaluated — computationally intractable for real sentences.
#45 — Syntax provides scaffolding for semantic composition.: The principle of compositionality holds that sentence meaning is derived from word meanings combined according to syntactic structure. The assignment of thematic roles (agent, patient, experiencer) is determined by syntactic position, not word meaning alone — the dog bit the man and the man bit the dog contain the same words but describe opposite events. Quantifier scope ambiguities can only be properly stated over a syntactic representation. All downstream semantic tasks — semantic role labelling, question answering, textual inference, information extraction — require syntactic structure as their input. Without syntax, semantic interpretation is guesswork.
#46 — Constraints ruling out ungrammatical strings also constrain the interpretations of grammatical strings.: Syntactic well-formedness constraints do double duty. They are filters that rule out impossible strings, and they are scaffolds that, when satisfied by a well-formed string, determine how its meaning is composed. The constraint that a transitive verb takes an NP object rules out *She saw quickly; the same constraint, satisfied by She saw the man, determines that the man is the object — the entity seen — and scaffolds its semantic role assignment. Filtering and interpretation are two sides of the same structural rule.

Syntactic analysis presents NLP with five major challenges:

PP attachment ambiguity — prepositional phrases can attach to the VP or to an NP, and both readings are syntactically well-formed. Disambiguation requires semantic and pragmatic knowledge. NLP parsers default to low attachment (most recently mentioned NP), which is wrong approximately 40% of the time. With multiple PPs, the number of possible analyses grows combinatorially.
Long-distance dependencies — syntactically related elements (such as a wh-word and its gap) can be separated by arbitrarily large amounts of intervening material. NLP parsers and language models perform worse on long-distance dependencies than on local ones, particularly for rare or novel configurations not well represented in training data.
Recursive structure — phrases embed inside phrases without principled limit. Deeply nested centre-embedded relative clauses are grammatically well-formed but computationally challenging. Neural parsers degrade on deep embeddings.
Cross-linguistic phrase structure differences — SVO, SOV, and VSO languages require different phrase structure rules. Japanese verb-final order, German V2 movement, and Warlpiri free word order each require language-specific parsing strategies. Parsers trained on English cannot be directly applied to typologically different languages.
The relationship between syntactic and semantic interpretation — attachment ambiguities, scope ambiguities, and structural ambiguities in coordination cannot be resolved by syntax alone. Semantic and pragmatic knowledge must interact with syntactic analysis. Systems that separate syntax and semantics into strict pipeline stages lose the bidirectional interaction that human language understanding relies on.

Proceed to Unit 6: Semantics when ready.
