Unit 8: Argument Types and Grammatical Functions
Learning Objectives
- Distinguish semantic roles from grammatical functions and explain why the two do not always align
- Describe how different languages identify grammatical functions (word order, agreement, case marking)
- Explain the obliqueness hierarchy and its relevance to argument reordering constructions
- Analyse the deep/surface dependency distinction and its implications for NLP information extraction
Reading
Read Chapter 8 (Argument Types and Grammatical Functions) of Bender, E. M. (2013). Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. Morgan & Claypool. Use the course materials below to activate and consolidate the concepts from that chapter.
Core Input
Read through each tab. Take notes on the key ideas before moving to the activities.
Semantic roles describe the role a participant plays in an event or state. They are meaning-level descriptions, not grammatical categories. The principal roles include:
- Agent — the volitional doer of the action
- Patient — the thing affected by the action
- Theme — the thing moved or described
- Goal — the destination or endpoint
- Source — the origin
- Experiencer — the one who perceives or feels
- Instrument — the tool used
- Beneficiary — the one who benefits
Example: The dog (Agent) bit the man (Patient).
There is no universally agreed-upon set of semantic roles (#68): FrameNet uses hundreds of fine-grained roles; VerbNet uses approximately 35; classical generative linguistics uses around ten macro-roles. The lack of universality means that NLP semantic role labelling (SRL) systems must make deliberate design decisions about their role inventory. SRL is a core NLP task for event extraction, question answering, and inference.
Grammatical functions are the syntactic relations that arguments bear to the predicate: Subject, Direct Object, Indirect Object, Oblique, and Complement. They are structural categories, not semantic ones, and languages signal them in different ways (word order, agreement, or case marking).
The subject is the "distinguished argument" (#70): it controls verb agreement, has special access to reflexive binding, and is the understood subject of controlled predicates (She tried to leave — she is subject of both tried and leave).
Crucially, the subject is not always the Agent:
- "The window broke" — the window is subject but is the Patient.
- "She was frightened by the noise" — she is subject but is the Experiencer.
Grammatical function ≠ semantic role. This non-alignment is one of the most important facts for NLP systems that aim to interpret meaning rather than merely parse structure.
Surface dependencies are the syntactic relations visible in the sentence — who is the subject, what is the direct object.
Deep dependencies are the underlying semantic relations — who is doing what to whom.
Example: "The cat was chased by the dog."
- Surface: cat is subject.
- Deep: dog is Agent; cat is Patient.
For NLP tasks that require understanding who did what to whom (#74), surface structure is insufficient. Relation extraction, question answering, and knowledge base population all need deep (semantic) dependencies. Systems that read only surface syntax will confuse agent and patient in passive sentences; deep parsing is required to recover the true event structure.
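The active/passive contrast above can be made concrete with a minimal sketch. The `SurfaceClause` record and the `deep_roles` function are hypothetical names introduced for illustration, and the mapping assumes a simple agentive transitive verb; it would not handle cases such as "The window broke", where an active subject is the Patient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurfaceClause:
    """Surface (syntactic) analysis of a simple transitive clause (illustrative only)."""
    verb: str
    voice: str                         # "active" or "passive"
    subject: str
    direct_object: Optional[str] = None
    by_phrase: Optional[str] = None    # oblique Agent of a passive, if expressed


def deep_roles(clause: SurfaceClause) -> dict:
    """Recover Agent and Patient from surface functions, assuming an agentive transitive verb."""
    if clause.voice == "passive":
        # Passive: the subject is the Patient; the Agent is in the by-phrase or unexpressed.
        return {"predicate": clause.verb, "Agent": clause.by_phrase, "Patient": clause.subject}
    # Active: default mapping, subject = Agent and direct object = Patient.
    return {"predicate": clause.verb, "Agent": clause.subject, "Patient": clause.direct_object}


active = SurfaceClause(verb="chase", voice="active", subject="dog", direct_object="cat")
passive = SurfaceClause(verb="chase", voice="passive", subject="cat", by_phrase="dog")
agentless = SurfaceClause(verb="break", voice="passive", subject="window")

print(deep_roles(active))
print(deep_roles(passive))
print(deep_roles(agentless))
```

The active and passive clauses differ in their surface subject but yield the same deep structure, while the agentless passive leaves the Agent slot empty rather than filling it with the subject.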
Key Concepts A (Concepts #68–#77)
Expand each concept. Think about your answer before reading the explanation.
Linguists disagree on the granularity and membership of semantic role inventories. Nevertheless, some roles are near-universally recognised:
- Agent: volitional, animate, causes the event
- Patient: affected by the event, undergoes change
- Theme: moves or is described, e.g. the book in "She put the book on the table"
- Goal: the endpoint — "She gave the book to him" — him is Goal
- Experiencer: experiences a mental state — "She feared the dog" — she is Experiencer, not Agent (not volitional)
- Instrument: "She opened the door with a key" — key is Instrument
The lack of universality means NLP SRL systems must make explicit design decisions about their role inventory. FrameNet, VerbNet, and PropBank each make different choices; systems trained on one inventory may not transfer well to another.
Syntactic argument types include:
- Subject — the distinguished syntactic argument of the predicate
- Direct object — the NP following a transitive verb, often accusative in case-marking languages
- Indirect object — the NP that receives the transferred item, often dative
- Oblique — a PP argument, more peripheral in the argument hierarchy
These categories may not transfer cross-linguistically. Some languages have no structural distinction between direct and indirect object, using only case marking to differentiate them. NLP systems trained on English syntactic categories should not assume those categories are universal.
In English, the subject displays a cluster of special properties that no other argument shares:
- Controls subject–verb agreement (she runs / they run)
- Is the understood subject of infinitival complements (she tried to leave — she is also the subject of leave)
- Controls reflexive binding (she hurt herself — herself refers to the subject)
- Appears before the verb in declaratives
- Is deleted in imperatives (Leave! = you leave)
These properties cluster on the subject in English; in other languages the privileged argument may have a different profile. Importantly, subjects are not always Agents: "The door opened" — subject = Patient.
The obliqueness hierarchy ranks arguments by directness of their relation to the head:
Subject > Direct Object > Indirect Object > Oblique
More oblique = more peripheral: more likely to be omitted, more likely to be expressed as a PP rather than a bare NP, less able to control agreement. The hierarchy predicts promotion patterns in passive, dative shift, and causative constructions. Knowing that an argument is more oblique helps predict how it will be realised in different syntactic environments.
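One way to make the ranking usable in code is to treat it as an ordered enumeration. This is only a sketch: the `GF` enum and the two helper functions are hypothetical names, and the numeric values simply encode the order Subject > Direct Object > Indirect Object > Oblique given above.

```python
from enum import IntEnum


class GF(IntEnum):
    """Grammatical functions; a lower value means less oblique."""
    SUBJECT = 0
    DIRECT_OBJECT = 1
    INDIRECT_OBJECT = 2
    OBLIQUE = 3


def is_more_oblique(a: GF, b: GF) -> bool:
    """True if function a is more peripheral than function b on the hierarchy."""
    return a > b


def order_by_obliqueness(args: dict) -> list:
    """Sort argument phrases from least to most oblique; args maps phrase -> GF."""
    return sorted(args, key=lambda phrase: args[phrase])


arguments = {
    "to the committee": GF.OBLIQUE,
    "she": GF.SUBJECT,
    "the report": GF.DIRECT_OBJECT,
}
ranked = order_by_obliqueness(arguments)
print(ranked)
```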
Clauses can occupy argument positions just as NPs do:
- "She believes [that he lied]" — finite clause as object
- "She wants [to leave]" — infinitival clause as object
- "She saw [him running]" — participial clause as object
- "She asked [whether he would come]" — embedded question as object
Finite clauses carry their own tense and agreement. Non-finite clauses lack independent tense, and their subject is often understood from the matrix clause (she wants to leave), although it can also be expressed overtly (she saw him running). NLP clausal argument extraction is essential for understanding beliefs, desires, communication, and mental state reports, all central to opinion mining and event extraction.
The same surface subject can bear many different semantic roles:
- "The cat was chased by the dog" — syntactic subject = cat; semantic Agent = dog
- "It rained" — syntactic subject = it (expletive, no semantic role)
- "She seems to know the answer" — syntactic subject = she, raised from the embedded clause
- "The door opened" — syntactic subject = door (Patient, not Agent)
The regular relations (#73) are learned mappings: by default, Agent maps to subject and Patient maps to object — but morphosyntactic constructions such as the passive can rearrange this (#82).
Several core NLP tasks require recovering who did what to whom, not merely what is the syntactic subject:
- Question answering: in "Who was chased by the dog?", the answer is the Patient, which here is also the syntactic subject; a system that equates subject with Agent will return the wrong participant.
- Information extraction: "The company acquired its rival" and "The rival was acquired by the company" encode the same deep relation (acquirer/acquired) in different surface configurations.
- Machine translation: translating a passive must generate the appropriate voice in the target language, which requires knowing the deep roles.
NLP SRL systems aim to assign semantic roles regardless of surface syntactic realisation. This makes deep dependency analysis a prerequisite for accurate meaning extraction.
The mapping between semantic roles and grammatical functions is a lexical property — different verbs map the same participants differently:
- buy: Buyer → Subject; Goods → Direct Object; Seller → Oblique (optional from)
- sell: Seller → Subject; Goods → Direct Object; Buyer → Oblique (to)
The same participants (Buyer, Seller, Goods) occupy different syntactic positions depending on which verb is chosen. NLP lexical resources such as VerbNet encode these mappings explicitly; they are central to both SRL and machine translation.
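The buy/sell contrast can be sketched as a small lookup table. The dictionary layout and the names `LEXICAL_MAPPING` and `frame_roles` are assumptions made for illustration; they do not reproduce the actual format of VerbNet or any other resource, and the participant names Kim and Lee are invented.

```python
# Per-verb mapping from grammatical function to participant role (illustrative format).
LEXICAL_MAPPING = {
    "buy":  {"subject": "Buyer",  "direct_object": "Goods", "oblique_from": "Seller"},
    "sell": {"subject": "Seller", "direct_object": "Goods", "oblique_to": "Buyer"},
}


def frame_roles(verb: str, surface_args: dict) -> dict:
    """Relabel surface arguments (function -> phrase) with the verb's participant roles."""
    mapping = LEXICAL_MAPPING[verb]
    return {mapping[gf]: phrase for gf, phrase in surface_args.items() if gf in mapping}


# "Kim bought the car from Lee." / "Lee sold the car to Kim."
bought = frame_roles("buy", {"subject": "Kim", "direct_object": "the car", "oblique_from": "Lee"})
sold = frame_roles("sell", {"subject": "Lee", "direct_object": "the car", "oblique_to": "Kim"})
print(bought)
print(sold)
```

Although the two sentences place Kim and Lee in different syntactic positions, both calls return the same role assignment, which is the normalisation an SRL system needs.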
Grammatical function — not semantic role — is the variable that controls many syntactic phenomena:
- Agreement: the verb agrees with the subject in English; in Basque, with both subject and object.
- Reflexive binding: controlled by the subject (she hurt herself, not *herself hurt her).
- Relative clause extraction: subject relatives are easiest (the dog that bit the man); object relatives harder (the man that the dog bit); oblique relatives most marked (the man that the dog gave the bone to).
For NLP, grammatical function is the key variable for many syntactic phenomena. Parsers must correctly identify grammatical functions before higher-level phenomena can be analysed.
Default mappings connect grammatical function to semantic role:
- Subject → Agent (for active transitive verbs)
- Direct Object → Patient
- Indirect Object → Recipient / Goal
These defaults hold for canonical active sentences. With the verb's lexical mapping in hand (#75) and the grammatical function identified (#76), the semantic role can often be inferred. Grammatical function identification (parsing) is thus a prerequisite for semantic role labelling — even though the two levels of analysis are not the same thing.
Key Concepts B (Concepts #78–#82)
Continue with the remaining concepts from Chapter 8.
In English, position is the primary cue for grammatical function: the NP before the verb is the subject; the NP after a transitive verb is the direct object. This is the canonical (default) word order.
Consequence for passive sentences: the word-order cue (NP before verb) points to the syntactic subject even though that NP is semantically the Patient. The same surface heuristic applies, but it now tracks a non-Agent.
NLP parsers for English can use word order as a primary feature for grammatical function assignment. This strategy transfers poorly to languages where word order is freer (Japanese, Turkish, Russian) and other cues — case marking, agreement — do the work instead.
In agreement-marking languages, the verb carries morphological information that identifies its subject (and sometimes object):
- Swahili: a-li-m-pig-a (he-PAST-her-hit-FINAL = "he hit her") — agreement prefixes mark both subject (a-) and object (-m-) on the verb.
- Basque: agreement tracks both the absolutive and ergative arguments.
In subject-agreement languages, the agreement affix on the verb (a prefix in the Swahili example) can identify the subject even when word order varies. NLP systems for agreement-marking languages must extract subject identity from agreement morphology, not only from position.
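As a rough sketch of reading arguments off verbal morphology, the function below glosses a pre-segmented verb against a fixed slot template. It covers only the single form a-li-m-pig-a and uses the glosses given above; the slot names, `GLOSSES` table, and `gloss_verb` function are hypothetical, and a real system would need a morphological analyser rather than a five-entry table.

```python
# Slot order assumed from the one example in this unit: a-li-m-pig-a.
SLOTS = ["subject_agreement", "tense", "object_agreement", "root", "final_vowel"]

# Glosses copied from the example; this is not a Swahili lexicon.
GLOSSES = {
    "subject_agreement": {"a": "he"},
    "tense": {"li": "PAST"},
    "object_agreement": {"m": "her"},
    "root": {"pig": "hit"},
    "final_vowel": {"a": "FINAL"},
}


def gloss_verb(segmented: str) -> dict:
    """Map each hyphen-separated morpheme to its gloss according to its slot."""
    morphemes = segmented.split("-")
    if len(morphemes) != len(SLOTS):
        raise ValueError("expected one morpheme per slot")
    return {slot: GLOSSES[slot][m] for slot, m in zip(SLOTS, morphemes)}


analysis = gloss_verb("a-li-m-pig-a")
print(analysis["subject_agreement"], analysis["root"], analysis["object_agreement"])
```

Both the subject and the object are recovered from the verb alone, without consulting any noun phrase or its position.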
Case marking encodes grammatical function directly on the noun phrase, allowing word order to vary freely:
- Japanese: -ga (nominative/subject), -wo (accusative/direct object), -ni (dative/indirect object, location, direction)
- German: Nominativ (subject), Akkusativ (direct object), Dativ (indirect object), Genitiv (possessor)
- Urdu/Hindi: -ne (ergative, subject in perfective), -ko (accusative/dative)
- Turkish: six cases — nominative, accusative, dative, locative, ablative, genitive
NLP parsers for case-marking languages read grammatical function from case particles or inflections, not from position. This enables analysis of free word order languages where positional heuristics would fail.
Two typological strategies encode grammatical-function information:
- Head-marking: the head of a phrase encodes information about its dependents. Example: verb agreement in Swahili and Basque marks the subject (and sometimes object) on the verb (the head), not on the noun (the dependent).
- Dependent-marking: the dependent encodes information about its relation to the head. Example: case marking on the noun (the dependent), as in Japanese and German.
English is mixed: it shows dependent-marking (case on pronouns: he/him) but also head-marking (verb agreement). Cross-linguistically, head-marking is the more common strategy. NLP parsers must know which encoding strategy a language uses in order to extract grammatical function information correctly.
Several constructions systematically alter the default mapping of semantic roles to grammatical functions:
- Passive: Agent demoted to oblique (by-phrase) or omitted; Patient promoted to subject. "The rival was acquired by the company."
- Causative: adds a Causer argument as the new subject; the original subject becomes object or oblique.
- Dative shift: "She gave the book to him" ↔ "She gave him the book" — same participants, different syntactic positions.
- Applicative: in many Bantu languages, adds a Beneficiary or Instrument as a promoted object.
These rearrangements mean that the same event may be described with very different surface argument patterns. NLP SRL systems must normalise to the deep dependency structure (#74) rather than reading roles from surface position.
Worked Examples
Study each tab carefully. Make sure you can explain the NLP relevance of each example.
The following pairs show the same event expressed in active and passive voice. Study the surface and deep dependencies in each case.
| Sentence | Surface subject | Surface object | Semantic Agent | Semantic Patient |
|---|---|---|---|---|
| The dog chased the cat. | dog | cat | dog | cat |
| The cat was chased by the dog. | cat | — | dog | cat |
| The engineer approved the report. | engineer | report | engineer | report |
| The report was approved by the engineer. | report | — | engineer | report |
| The window was broken. (agentless passive) | window | — | unexpressed | window |
NLP implication: SRL must recover the Agent even when it is not the subject. Passive voice is common in scientific and legal text — systems that read surface structure will misassign roles, identifying the Patient as the "doer."
English allows two constructions for ditransitive verbs — the PP dative and the double-object (NP dative):
(a) PP dative: "She sent the report to the committee."
(b) NP dative: "She sent the committee the report."
In the PP dative (a): committee is Oblique (Goal, inside PP); report is Direct Object (Theme).
In the NP dative (b): committee is Indirect Object (Goal, bare NP); report is Direct Object (Theme).
Both encode the same event with the same Agent, Theme, and Goal; only the syntactic positions differ.
Not all verbs participate in dative shift:
- "She explained the theory to him." ✓
- *"She explained him the theory." ✗ — explain does not allow the NP dative
NLP implication: Relation extraction must recognise both constructions as expressing the same underlying relation. The argument positions of the same semantic role (Goal) differ between the two surface forms, requiring normalisation.
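The normalisation step might look like the sketch below. It assumes the clause has already been parsed into a subject and two post-verbal phrases, each tagged with its preposition (or `None` for a bare NP); the function name, the input format, and the two-verb whitelist are all illustrative assumptions.

```python
# Assumption: a tiny illustrative list of verbs that allow the NP dative.
DATIVE_SHIFT_VERBS = {"send", "give"}


def normalise_ditransitive(verb: str, subject: str, post_verbal: list) -> dict:
    """Return Agent/Theme/Goal for a PP dative or NP dative clause.

    post_verbal is a list of two (phrase, preposition-or-None) pairs in surface order.
    """
    if len(post_verbal) != 2:
        raise ValueError("expected exactly two post-verbal arguments")
    (first, prep1), (second, prep2) = post_verbal
    if prep1 is None and prep2 == "to":
        # PP dative: V Theme [to Goal]
        theme, goal = first, second
    elif prep1 is None and prep2 is None:
        # NP dative: V Goal Theme, only for verbs that allow dative shift
        if verb not in DATIVE_SHIFT_VERBS:
            raise ValueError(f"'{verb}' does not allow the NP dative")
        goal, theme = first, second
    else:
        raise ValueError("unrecognised argument pattern")
    return {"predicate": verb, "Agent": subject, "Theme": theme, "Goal": goal}


pp_dative = normalise_ditransitive("send", "she", [("the report", None), ("the committee", "to")])
np_dative = normalise_ditransitive("send", "she", [("the committee", None), ("the report", None)])
print(pp_dative == np_dative)
```

Rejecting the NP dative for verbs outside the list mirrors the explain example: the lexical restriction has to be stored per verb, since it cannot be read off the surface pattern.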
Case marking directly encodes grammatical function on the noun phrase. The Japanese case system illustrates this:
| Particle | Case | Grammatical function |
|---|---|---|
| -ga | Nominative | Subject (of both transitive and intransitive verbs); also the object of some stative predicates |
| -wo | Accusative | Direct object |
| -ni | Dative | Indirect object; location; direction; time |
| -no | Genitive | Possessor |
Japanese example:
田中さんが本を図書館に返した
Tanaka-san-ga hon-wo toshokan-ni kaeshita
Tanaka-NOM book-ACC library-DAT returned
“Tanaka returned the book to the library.”
German uses four grammatical cases with noun declension (Nominativ, Akkusativ, Dativ, Genitiv), allowing considerable word order variation while keeping grammatical function unambiguous.
NLP implication: Parsers for case-marking languages must read grammatical function from case particles or inflections; positional heuristics that work for English will fail on these languages.
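To illustrate, here is a minimal sketch that reads grammatical functions from the particles in the romanised gloss line above, ignoring position. The function name and the one-to-one particle table are simplifying assumptions: as the table notes, -ga and -ni have other uses that a real parser would need to disambiguate.

```python
# Simplified one-to-one mapping; real usage of -ga and -ni is broader (see the table above).
PARTICLE_TO_FUNCTION = {
    "ga": "subject",
    "wo": "direct_object",
    "ni": "indirect_object_or_goal",
}


def functions_from_case(tokens: list) -> dict:
    """Assign grammatical functions from case particles, regardless of token order."""
    functions = {}
    for token in tokens:
        # Split off the final hyphen-separated element, e.g. "Tanaka-san-ga" -> "Tanaka-san", "ga".
        stem, _, particle = token.rpartition("-")
        if stem and particle in PARTICLE_TO_FUNCTION:
            functions[PARTICLE_TO_FUNCTION[particle]] = stem
        else:
            # No recognised particle: treat the token as the predicate.
            functions["predicate"] = token
    return functions


canonical = "Tanaka-san-ga hon-wo toshokan-ni kaeshita".split()
reordered = "hon-wo toshokan-ni Tanaka-san-ga kaeshita".split()
print(functions_from_case(canonical))
print(functions_from_case(reordered))
```

Both orders produce the same analysis, whereas an English-style heuristic ("the first NP is the subject") would pick hon ("book") as subject in the reordered sentence.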
Languages differ in how they align subjects and objects across transitive and intransitive verbs:
| Alignment type | Examples | Pattern |
|---|---|---|
| Nominative-accusative | English, French, Japanese, German | Subject of intransitive and subject of transitive marked the same (Nominative); object of transitive is Accusative |
| Ergative-absolutive | Basque, many Indigenous Australian and Mayan languages | Subject of intransitive and object of transitive marked the same (Absolutive); subject of transitive marked differently (Ergative) |
| Split ergativity | Hindi-Urdu | Ergative pattern in perfective aspect; nominative-accusative pattern in imperfective |
NLP implication: Alignment type determines which argument positions map to which semantic roles. Cross-lingual SRL systems cannot assume English-style mapping; they must detect the alignment system of each target language (#80, #81).
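The three patterns in the table can be written as a small function. This sketch uses the common typological shorthand S (intransitive subject), A (transitive subject), and O (transitive object), which is not used elsewhere in this unit; the function name and the treatment of split ergativity as a simple perfective switch are simplifying assumptions.

```python
def case_for(argument: str, alignment: str, perfective: bool = False) -> str:
    """Return the case category of an argument under a given alignment type.

    argument: "S" (intransitive subject), "A" (transitive subject), "O" (transitive object).
    alignment: "accusative", "ergative", or "split" (ergative only in the perfective).
    """
    if argument not in {"S", "A", "O"}:
        raise ValueError("argument must be 'S', 'A', or 'O'")
    if alignment == "split":
        alignment = "ergative" if perfective else "accusative"
    if alignment == "accusative":
        # S and A pattern together; O is marked differently.
        return "accusative" if argument == "O" else "nominative"
    if alignment == "ergative":
        # S and O pattern together; A is marked differently.
        return "ergative" if argument == "A" else "absolutive"
    raise ValueError("unknown alignment type")


for alignment in ("accusative", "ergative"):
    print(alignment, {arg: case_for(arg, alignment) for arg in ("S", "A", "O")})
```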
Check Your Understanding
Select the best answer for each question.
In the sentence "The prize was won by the student", the student is in a by-phrase oblique. Which concept states that for NLP tasks requiring event understanding, this surface relation may be less important than the underlying semantic dependency?
A computational linguist is building a semantic role labeller for Japanese. She notes that the same event is expressed using case particles (-ga, -wo, -ni) regardless of word order. Which concept best explains why this allows word order to vary freely in Japanese?
Agent and patient identification
LLMs show systematic errors in identifying semantic roles in passive sentences. When asked "who performed the action" in a passive clause, models frequently report the syntactic subject (the Patient) rather than the Agent in the by-phrase. This is a direct consequence of reading surface dependencies instead of deep ones (#73, #74).
Event extraction and SRL
NLP systems for relation extraction, knowledge base population, and question answering require identifying semantic roles. Current neural SRL systems perform well on English but struggle with passives, unusual argument orders, and lower-resource languages with different alignment types (#75, #80, #82).
Passive voice and agency in AI-generated text
LLMs use passive voice in ways that systematically obscure agency: "Mistakes were made." "The data was processed." "The model was trained on…" The passive construction removes the Agent from subject position, often omitting it entirely. Concept #82 (rearrangement of the lexical mapping) explains the grammatical mechanism. This pattern is frequent in AI-generated text and has ethical implications for accountability: when the Agent is omitted, the text no longer states who is responsible.
Cross-linguistic argument structure
LLMs trained primarily on English learn English-specific mappings of semantic roles to grammatical functions (#75). For languages with different alignment systems (ergative-absolutive, split ergativity), these mappings do not transfer. Even for closely related languages (French, Spanish), passive and causative constructions have different surface realisations (#80, #82), and models fine-tuned on English role labels perform poorly when applied cross-lingually.
Activities
Individual task
For each sentence below, identify (a) the grammatical subject, (b) the semantic role of the subject (Agent, Patient, Experiencer, Theme, Goal), and (c) the semantic role of any direct object. Use #77 as a guide: start from grammatical function and apply the default mapping, then check whether any rearrangement (#82) has occurred.
- "The manager approved the report."
- "The report was approved by the manager."
- "She received a long letter."
- "The window shattered."
- "The teacher sent the students the assignment."
For each, note whether the surface reading (grammatical function) and the deep reading (semantic role) match or differ, and state which concept explains any mismatch.
Pair task
Find five sentences in a news article or academic paper that use passive constructions. For each sentence:
- Identify the surface subject and state its semantic role.
- Identify the Agent — if expressed in a by-phrase, state who it is; if omitted, note that it is unexpressed.
- Discuss why the writer may have chosen the passive construction — consider topic prominence, agent demotion, and agency concealment as possible motivations.
Link your observations explicitly to #82 (rearrangement of the lexical mapping) and #74 (deep semantic dependencies matter for applications).
Group task
Compare how three languages (English, Japanese, and one of: Turkish, German, Swahili, Hindi) express the following event:
"The teacher sent the students the assignment."
Provide a glossed example for each language. For each language, identify:
- How grammatical functions are marked — word order (#78), agreement (#79), or case marking (#80)?
- Whether the language is primarily head-marking or dependent-marking (#81).
- What challenges this typological profile poses for a multilingual SRL system attempting to extract consistent role labels across all three languages.
Review
- #68 — No universal semantic role set; role inventories are design decisions
- #69 — Syntactic argument types (subject, direct object, indirect object, oblique) may not be universal
- #70 — Subject is the distinguished argument, controlling agreement, reflexives, and control
- #71 — Arguments arrange on an obliqueness hierarchy: Subject > Direct Object > Indirect Object > Oblique
- #72 — Clauses (finite and non-finite) can also be arguments
- #73 — Syntactic and semantic arguments are not the same, though they stand in regular relations
- #74 — For most NLP applications, deep (semantic) dependencies matter more than surface (syntactic) relations
- #75 — Lexical items map semantic roles to grammatical functions; different verbs map the same participants differently
- #76 — Syntactic phenomena (agreement, reflexives, extraction) are sensitive to grammatical function, not semantic role
- #77 — Identifying grammatical function helps infer semantic role via the default mapping
- #78 — Some languages (English) identify grammatical functions through word order
- #79 — Some languages (Swahili, Basque) identify grammatical functions through agreement morphology
- #80 — Some languages (Japanese, German) identify grammatical functions through case marking
- #81 — Head-marking (dependency info on the head) is more common cross-linguistically than dependent-marking (info on the dependent)
- #82 — Passive, causative, dative shift, and applicative constructions rearrange the lexical mapping of roles to functions
Surface dependencies are the syntactic relations visible in a sentence (who is the subject, what is the direct object). Deep dependencies are the underlying semantic relations (who is Agent, who is Patient, who is Beneficiary).
In canonical active sentences the two align: the subject is the Agent and the object is the Patient. But several grammatical constructions break this alignment (#82): in a passive, the Patient is the subject; in an agentless passive, the Agent is absent from the surface entirely; in a causative, a new Causer argument displaces the original subject.
Semantic role labelling (SRL) is the NLP task that bridges the gap: it assigns semantic role labels (Agent, Patient, Goal, etc.) to argument phrases regardless of their surface syntactic position. SRL systems must therefore look beyond position to lexical mapping (#75), construction type (#82), and deep dependency structure (#74).
Cross-linguistic complications arise from alignment (#80, #81): the same event expressed in a nominative-accusative language and an ergative-absolutive language will have different surface argument patterns, requiring language-specific or language-aware processing.
AI failure mode: LLMs systematically misread passive sentences by treating the surface subject as the Agent. This is a direct consequence of relying on surface patterns rather than deep dependency analysis (#73, #74).
Proceed to Unit 9: Semantic and Syntactic Mismatches when ready.