POS Tagging - Hidden Markov Model
1. Ambiguity in POS Tagging
Sentence:
They can fish in the can.
- Task: Identify ambiguous words and list all possible POS tags for each.
- Explain: How does context help determine the correct POS sequence?
2. Emission and Transition Probabilities
Training Corpus:
Sentence |
---|
The/DET dog/NOUN barks/VERB loudly/ADV |
A/DET cat/NOUN sleeps/VERB peacefully/ADV |
The/DET cat/NOUN barks/VERB |
Dogs/NOUN sleep/VERB |
Questions:
A. Calculate:
- P(dog | NOUN)
- P(cat | NOUN)
B. Calculate:
- P(VERB | NOUN)
- P(ADV | VERB)
C. Which transition is more likely:
- NOUN → VERB
- VERB → ADV?
3. Viterbi Algorithm Application
Sentence:
The dog sleeps
Probability Matrices:
Transition Probabilities:
- P(NOUN | DET) = 0.8
- P(VERB | NOUN) = 0.6
- P(END | VERB) = 0.9
Emission Probabilities:
- P(the | DET) = 0.7
- P(dog | NOUN) = 0.4
- P(sleeps | VERB) = 0.5
Task:
Show your step-by-step calculations to find the most likely POS tag sequence using the Viterbi algorithm.
4. POS Tag Ambiguity
POS-tagged Sentences:
Sentence 1: I/PRON will/AUX park/VERB the/DET car/NOUN Sentence 2: The/DET park/NOUN is/VERB beautiful/ADJ Sentence 3: Park/PROPN Avenue/PROPN is/VERB busy/ADJ
- Task: Are there any words that appear with different POS tags?
- Explain: How does an HMM handle such ambiguities?
5. HMM Design and Probability Calculation
Mini Tagset:
{DET, NOUN, VERB}
Training Sentences:
- The/DET boy/NOUN runs/VERB
- A/DET girl/NOUN walks/VERB
- The/DET dog/NOUN sleeps/VERB
- The/DET cat/NOUN jumps/VERB
Tasks:
- Calculate all transition probabilities:
A = {ai,j} - Calculate all emission probabilities for the word "the"
- Calculate initial state probabilities:
π = {πi}