Chunking
Core Chunking Terms
Chunking: The process of grouping words in a sentence into syntactically related phrases (chunks), such as noun phrases (NP), verb phrases (VP), and prepositional phrases (PP), without building a full parse tree.
Chunk: A group of words that function together as a single syntactic unit. Example: "the big dog" (NP), "ran quickly" (VP).
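IOB Tagging: A scheme for marking chunk boundaries by assigning one tag to each word. The B- and I- prefixes are combined with a chunk type (for example, B-NP, I-NP):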
- B- (Beginning): Marks the first word of a chunk
- I- (Inside): Marks subsequent words in the same chunk
- O (Outside): Marks words not part of any chunk
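The IOB scheme above can be illustrated by decoding a tagged sentence back into chunks. The following is a minimal sketch, not a standard library routine: the function name iob_to_chunks and the example sentence are assumptions made for illustration, and the tags follow this glossary's convention that a PP includes the preposition's object.

```python
def iob_to_chunks(tokens, tags):
    """Group tokens into (label, words) chunks from IOB tags such as B-NP, I-NP, O."""
    chunks = []
    current_label, current_words = None, []
    for token, tag in zip(tokens, tags):
        # "B-NP" -> ("B", "-", "NP"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")
        # An I- tag with the same label extends the chunk that is currently open.
        if prefix == "I" and label == current_label:
            current_words.append(token)
            continue
        # Any other tag closes the open chunk, if there is one.
        if current_label is not None:
            chunks.append((current_label, current_words))
            current_label, current_words = None, []
        # B- opens a new chunk. A stray I- is leniently treated as a start too
        # (an assumption of this sketch; stricter decoders may reject it).
        if prefix in ("B", "I"):
            current_label, current_words = label, [token]
    if current_label is not None:
        chunks.append((current_label, current_words))
    return chunks


tokens = ["the", "big", "dog", "ran", "in", "the", "park", "."]
tags = ["B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "I-PP", "I-PP", "O"]
for label, words in iob_to_chunks(tokens, tags):
    print(label, " ".join(words))
```

The final period is tagged O, so it does not appear in any chunk.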
Noun Phrase (NP): A chunk containing a noun and its modifiers. Example: "the red ball"
Verb Phrase (VP): A chunk containing a verb and its auxiliaries or complements. Example: "is running fast"
Prepositional Phrase (PP): A chunk consisting of a preposition followed by its object. Example: "in the park"
Adverbial Phrase (ADVP): A chunk containing adverbs and their modifiers. Example: "very quickly"
Adjectival Phrase (ADJP): A chunk containing adjectives and their modifiers. Example: "extremely happy"
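As an illustration of how chunks such as noun phrases can be found without a full parse tree, here is a small rule-based sketch. It assumes the input is already part-of-speech tagged with Penn Treebank style tags (DT, JJ, NN, and so on), and the rule itself (an optional determiner, any number of adjectives, then one or more nouns) is a simplifying assumption, not a complete NP grammar. The function name np_chunk is hypothetical.

```python
def np_chunk(tagged):
    """Return IOB tags for NP chunks matching: optional DT, any JJ*, one or more NN*.

    `tagged` is a list of (word, pos_tag) pairs.
    """
    n = len(tagged)
    iob = ["O"] * n
    i = 0
    while i < n:
        j = i
        # Optional determiner.
        if j < n and tagged[j][1] == "DT":
            j += 1
        # Zero or more adjectives (JJ, JJR, JJS).
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1
        # One or more nouns (NN, NNS, NNP, NNPS).
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:
            # At least one noun was found: words i..k-1 form one NP chunk.
            iob[i] = "B-NP"
            for m in range(i + 1, k):
                iob[m] = "I-NP"
            i = k
        else:
            # No noun follows, so this position does not start an NP.
            i += 1
    return iob


sentence = [("the", "DT"), ("red", "JJ"), ("ball", "NN"),
            ("rolled", "VBD"), ("away", "RB")]
print(list(zip([word for word, _ in sentence], np_chunk(sentence))))
```

Here "the red ball" is marked as one NP and the remaining words are tagged O, because this sketch has rules for noun phrases only.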
Cross-Linguistic Terms
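Postposition: A word that follows a noun or noun phrase to mark its grammatical relationship to the rest of the sentence, where English would use a preposition. Postpositions are common in Hindi. Example: in "घर में" (in the house), "में" (in) follows "घर" (house).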
SOV Order: Subject-Object-Verb word order, typical in Hindi.
SVO Order: Subject-Verb-Object word order, typical in English.
Analysis Terms
Chunk Boundary: The point in a sentence where one chunk ends and another begins.
Gold Standard Annotation: The correct chunking of a sentence, used for evaluation.
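Chunking Accuracy: A measure of how many chunks a system identifies correctly relative to the gold standard, where a chunk counts as correct only if both its boundaries and its type match. It is often reported as precision, recall, and F1 score.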
Phrase Structure: The hierarchical organization of words into chunks and phrases.
Shallow Parsing: Another term for chunking. It emphasizes that the analysis stops at the phrase level rather than producing a full syntactic parse.
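The evaluation terms above can be made concrete with a short sketch that compares predicted IOB tags against a gold standard annotation. It assumes a chunk is correct only when its type, start, and end all match the gold chunk, and it reports precision, recall, and F1 over chunks. The function names chunk_spans and chunk_scores and the example tags are hypothetical.

```python
def chunk_spans(tags):
    """Return the set of (label, start, end) chunks in a list of IOB tags (end exclusive)."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes a chunk that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, lab = tag.partition("-")
        if prefix == "I" and lab == label:
            continue  # still inside the open chunk
        if label is not None:
            spans.add((label, start, i))
            start, label = None, None
        if prefix in ("B", "I"):
            start, label = i, lab
    return spans


def chunk_scores(gold_tags, pred_tags):
    """Return (precision, recall, f1) over exactly matching chunks."""
    gold, pred = chunk_spans(gold_tags), chunk_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Gold: [the big dog]NP [ran]VP [in the park]PP
gold = ["B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "I-PP", "I-PP"]
# Prediction splits the PP into [in]PP [the park]NP, which moves a chunk boundary.
pred = ["B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"]
precision, recall, f1 = chunk_scores(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example two of the four predicted chunks match the three gold chunks, so a single misplaced chunk boundary lowers both precision and recall.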