Computer Science & Engineering Natural Language Processing Lab Experiments

Building Chunker

Chunking is an analysis of a sentence that identifies its constituents (noun groups, verbs, verb groups, etc.), each of which is a group of closely related words. Chunks are non-overlapping regions of text. Usually, each chunk contains a head, possibly with some function words and modifiers before or after it, depending on the language. Chunks are non-recursive, i.e. a chunk cannot contain another chunk of the same category.


Some of the possible groups are:

1. Noun Group


2. Verb Group


For example, the sentence 'He reckons the current account deficit will narrow to only 1.8 billion in September.' can be divided as follows:


[NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only 1.8 billion ] [PP in ] [NP September ]
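
The bracketed notation above can be read mechanically. The following is a minimal sketch, assuming every chunk is written exactly as `[LABEL word ... ]` with no nesting; the name `parse_chunks` is hypothetical and introduced only for illustration.

```python
import re

# Matches one flat chunk such as "[NP the current account deficit ]".
# Assumption: chunks are never nested, so a chunk body contains no brackets.
CHUNK_PATTERN = re.compile(r"\[(\w+)\s+([^\[\]]*?)\s*\]")


def parse_chunks(bracketed):
    """Return a list of (label, words) pairs in sentence order."""
    return [
        (label, body.split())
        for label, body in CHUNK_PATTERN.findall(bracketed)
    ]


example = (
    "[NP He ] [VP reckons ] [NP the current account deficit ] "
    "[VP will narrow ] [PP to ] [NP only 1.8 billion ] [PP in ] [NP September ]"
)

for label, words in parse_chunks(example):
    print(label, words)
```

Each pair holds the chunk category and the words it groups, so the example sentence yields eight chunks in the order NP, VP, NP, VP, PP, NP, PP, NP.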


Each chunk has an open boundary and a close boundary that delimit the word group as a minimal non-recursive unit.
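
One way to make the open and close boundaries concrete is to record, for each chunk, the position of its first and last word in the sentence. The sketch below assumes the chunks are already available as (label, words) pairs; `chunk_boundaries` is a hypothetical helper, not part of any library.

```python
def chunk_boundaries(chunks):
    """Map (label, words) pairs to (label, open_index, close_index) spans.

    Indices count words from 0, and close_index is inclusive.
    """
    spans = []
    position = 0  # index of the next unassigned word in the sentence
    for label, words in chunks:
        if not words:
            raise ValueError("a chunk must contain at least one word")
        spans.append((label, position, position + len(words) - 1))
        position += len(words)
    return spans


# The example sentence, written out as (label, words) pairs.
chunks = [
    ("NP", ["He"]),
    ("VP", ["reckons"]),
    ("NP", ["the", "current", "account", "deficit"]),
    ("VP", ["will", "narrow"]),
    ("PP", ["to"]),
    ("NP", ["only", "1.8", "billion"]),
    ("PP", ["in"]),
    ("NP", ["September"]),
]

for label, start, end in chunk_boundaries(chunks):
    print(f"{label}: words {start}..{end}")
```

Because each span opens immediately after the previous one closes, no two chunks can share a word, which reflects the non-overlapping property of chunks.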