(Figures: corpus_data under MyDrive; ICNALE corpus under MyDrive/corpus_file)
By the end of this session, students will be able to:
- Understand NLP tasks such as POS tagging and dependency parsing
- Understand how automated parsing works
- Conduct multilingual part-of-speech (POS) tagging using TagAnt
- Conduct POS tagging using the spaCy library in Python (through Google Colab)
- Conduct dependency parsing using the spaCy library in Python (through Google Colab)
Task 1: POS tagging with TagAnt
Task 2: POS-sensitive frequency list
Task 3: Understanding dependency grammar through visualization
(Figures: Input Files; Language)
The following are the basic display options in TagAnt.
| Menu | Function | Example |
|---|---|---|
| word | tokenization | dogs, ran |
| pos | POS tag (simple) | NOUN, VERB |
| pos_tag | POS tag (detailed) | NNS, VBD |
| lemma | lemmatized word | dog, run |
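These options can also be combined: word+pos, word+pos_tag, word+lemma, and word+pos+lemma. The table below shows how a Japanese sentence is displayed with several of these options.
| Display option | Example |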
|---|---|
| word | カキ を 食べ たい |
| word+pos | カキ_NOUN を_ADP 食べ_VERB たい_AUX |
| word+pos_tag | カキ_名詞-普通名詞-一般 を_助詞-格助詞 食べ_動詞-一般 たい_助動詞 |
| word+lemma | カキ_カキ を_を 食べ_食べる たい_たい |
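
Each display option joins the selected fields of a token with underscores and separates tokens with spaces. The following is a minimal Python sketch of that convention. It assumes each token is stored as a dictionary with the four fields from the first table. `format_tokens` is a hypothetical helper for illustration and is not part of TagAnt.

```python
# Hypothetical re-creation of TagAnt's display options (TagAnt itself is a GUI tool).
# Token fields copied from the two tables above.
TOKENS = [
    {"word": "カキ", "pos": "NOUN", "pos_tag": "名詞-普通名詞-一般", "lemma": "カキ"},
    {"word": "を", "pos": "ADP", "pos_tag": "助詞-格助詞", "lemma": "を"},
    {"word": "食べ", "pos": "VERB", "pos_tag": "動詞-一般", "lemma": "食べる"},
    {"word": "たい", "pos": "AUX", "pos_tag": "助動詞", "lemma": "たい"},
]

def format_tokens(tokens, option):
    """Render tokens for a display option such as 'word+pos' or 'word+pos+lemma'."""
    fields = option.split("+")
    # Join the chosen fields of each token with "_", then separate tokens with spaces.
    return " ".join("_".join(tok[f] for f in fields) for tok in tokens)

for option in ["word", "word+pos", "word+pos_tag", "word+lemma", "word+pos+lemma"]:
    print(f"{option:15} {format_tokens(TOKENS, option)}")
```

The aozora_50 sample below follows the same pattern, with word, pos_tag, and lemma joined in that order.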
Sample text from aozora_50
Before
「大溝」
僕は本所界隈のことをスケツチしろといふ社命を受け、同じ社のO君と一しよに久振りに本所へ出かけて行つた。
After
「_補助記号-括弧開_「 大溝_名詞-固有名詞-人名-姓_大溝 」_補助記号-括弧閉_」
_SPACE_ 僕_代名詞_僕 は_助詞-係助詞_は 本所_名詞-固有名詞-地名-一般_本所 界隈_名詞-普通名詞-一般_界隈 の_助詞-格助詞_の こと_名詞-普通名詞-一般_こと を_助詞-格助詞_を スケツチ_名詞-普通名詞-一般_スケツチ しろ_動詞-非自立可能_する と_助詞-格助詞_と いふ_動詞-一般_いふ 社命_名詞-普通名詞-一般_社命 を_助詞-格助詞_を 受け_動詞-一般_受ける 、_補助記号-読点_、 同じ_連体詞_同じ 社_名詞-普通名詞-助数詞可能_社 の_助詞-格助詞_の O_名詞-普通名詞-一般_o 君_接尾辞-名詞的-一般_君 と_助詞-格助詞_と 一しよ_名詞-普通名詞-サ変可能_一しよ に_助詞-格助詞_に 久_形容詞-一般_久い 振り_接尾辞-名詞的-一般_振り に_助詞-格助詞_に 本所_名詞-固有名詞-地名-一般_本所 へ_助詞-格助詞_へ 出_動詞-一般_出る かけ_動詞-非自立可能_かける て_助詞-接続助詞_て 行つ_動詞-一般_行ふ た_助動詞_た 。_補助記号-句点_。
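Using AntConc, create the following frequency lists:
- Words tagged 動詞-非自立可能 (search term: *_動詞-非自立可能)

非自立可能動詞 ("possibly non-independent verbs") are verbs that can be used either on their own or in an auxiliary-like role after another word, such as する (しろ) and かける (かけ) in the sample above.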
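
AntConc does the counting for you, but the idea behind a POS-sensitive frequency list is easy to show in code. The following is a minimal sketch, assuming TagAnt output in the word_pos_tag_lemma format of the aozora_50 sample above. `parse_tagged` and `pos_frequency` are hypothetical helpers, and matching tags by prefix is a simplification of AntConc's wildcard search, not its actual implementation.

```python
from collections import Counter

# A subset of tokens from the aozora_50 "After" sample (word_pos_tag_lemma format).
SAMPLE = (
    "僕_代名詞_僕 は_助詞-係助詞_は スケツチ_名詞-普通名詞-一般_スケツチ "
    "しろ_動詞-非自立可能_する いふ_動詞-一般_いふ 受け_動詞-一般_受ける "
    "出_動詞-一般_出る かけ_動詞-非自立可能_かける て_助詞-接続助詞_て "
    "行つ_動詞-一般_行ふ た_助動詞_た"
)

def parse_tagged(text):
    """Split each 'word_tag_lemma' token into a (word, tag, lemma) triple."""
    # Split from the right so that a word containing "_" would stay intact.
    return [tuple(tok.rsplit("_", 2)) for tok in text.split()]

def pos_frequency(triples, tag_prefix):
    """Count lemmas whose tag starts with tag_prefix, most frequent first."""
    counts = Counter(lemma for _, tag, lemma in triples if tag.startswith(tag_prefix))
    return counts.most_common()

triples = parse_tagged(SAMPLE)
print(pos_frequency(triples, "動詞-非自立可能"))  # [('する', 1), ('かける', 1)]
print(pos_frequency(triples, "動詞"))            # all verbs, including 動詞-一般
```

Using a shorter prefix such as 動詞 widens the list to every verb subtype, which is one way to compare 非自立可能 verbs with verbs in general.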
Loading other models
A dependency relation consists of three elements: a head, a dependency type, and a dependent. The word that has no head is labeled ROOT. The following is the dependency structure of "I play baseball."
(Figure: simple-dependency)
The same sentence, "I play baseball.", can also be expressed in the following table:
| tid | token | dep | head |
|---|---|---|---|
| 1 | I | nsubj | 2 |
| 2 | play | ROOT | |
| 3 | baseball | dobj | 2 |
| 4 | . | punct | 2 |
This type of vertical format is often used to represent multi-layered token information.
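
The vertical table stores exactly the three elements of each relation described above: head, dependency type, and dependent. The following is a minimal sketch, assuming the table is held as plain Python tuples and that the empty head of the ROOT row is represented as `None`. The helper names are hypothetical.

```python
# Rows of the vertical table for "I play baseball." as (tid, token, dep, head).
# head is None for ROOT because the root word has no head.
ROWS = [
    (1, "I", "nsubj", 2),
    (2, "play", "ROOT", None),
    (3, "baseball", "dobj", 2),
    (4, ".", "punct", 2),
]

def to_triples(rows):
    """Turn vertical rows into (head, dependency type, dependent) triples."""
    token_by_id = {tid: token for tid, token, _, _ in rows}
    return [
        (token_by_id[head], dep, token)
        for _, token, dep, head in rows
        if head is not None  # skip ROOT, which has no head
    ]

def dependents_of(rows, head_id):
    """List the tokens that depend directly on the token with id head_id."""
    return [token for _, token, _, head in rows if head == head_id]

for head, dep, dependent in to_triples(ROWS):
    print(f"{head} --{dep}--> {dependent}")
```

Reading the head column this way shows that "play" governs every other token in this sentence.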
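We can use dependency parsing to identify fine-grained features of grammar. The main dependency relations defined by Universal Dependencies (UD) are summarized below.
Note that spaCy's English models are trained on the ClearNLP dependency label set, so some of their labels differ from the UD relations below (e.g., spaCy outputs dobj, as in the table above, where UD uses obj).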
| | Nominals | Clauses | Modifier words | Function Words |
|---|---|---|---|---|
| Core arguments | nsubj, obj, iobj | csubj, ccomp, xcomp | | |
| Non-core dependents | obl, vocative, expl, dislocated | advcl | advmod, discourse | aux, cop, mark |
| Nominal dependents | nmod, appos, nummod | acl | amod | det, clf, case |

| Coordination | Headless | Loose | Special | Other |
|---|---|---|---|---|
| conj, cc | fixed, flat | list, parataxis | compound, orphan, goeswith, reparandum | punct, root, dep |
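
To see where a given label sits in these tables, a simple lookup helps. The following is a minimal sketch, assuming the groups exactly as tabulated above. The `CLEARNLP_TO_UD` mapping is a deliberately partial, hypothetical helper that covers only a few well-known ClearNLP labels, so labels outside it (e.g., pobj) return `None`.

```python
# UD relation groups, copied from the two tables above: (row, column) -> relations.
UD_GROUPS = {
    ("Core arguments", "Nominals"): ["nsubj", "obj", "iobj"],
    ("Core arguments", "Clauses"): ["csubj", "ccomp", "xcomp"],
    ("Non-core dependents", "Nominals"): ["obl", "vocative", "expl", "dislocated"],
    ("Non-core dependents", "Clauses"): ["advcl"],
    ("Non-core dependents", "Modifier words"): ["advmod", "discourse"],
    ("Non-core dependents", "Function Words"): ["aux", "cop", "mark"],
    ("Nominal dependents", "Nominals"): ["nmod", "appos", "nummod"],
    ("Nominal dependents", "Clauses"): ["acl"],
    ("Nominal dependents", "Modifier words"): ["amod"],
    ("Nominal dependents", "Function Words"): ["det", "clf", "case"],
    ("Coordination", None): ["conj", "cc"],
    ("Headless", None): ["fixed", "flat"],
    ("Loose", None): ["list", "parataxis"],
    ("Special", None): ["compound", "orphan", "goeswith", "reparandum"],
    ("Other", None): ["punct", "root", "dep"],
}
# Reverse index: relation -> (row, column).
RELATION_TO_GROUP = {rel: key for key, rels in UD_GROUPS.items() for rel in rels}

# Partial, illustrative mapping from a few ClearNLP labels to UD labels.
CLEARNLP_TO_UD = {
    "dobj": "obj",
    "nsubjpass": "nsubj:pass",
    "auxpass": "aux:pass",
    "relcl": "acl:relcl",
    "poss": "nmod:poss",
}

def classify(label):
    """Return the (row, column) of a dependency label in the UD tables, or None."""
    ud = CLEARNLP_TO_UD.get(label, label)
    base = ud.split(":")[0].lower()  # drop subtypes (acl:relcl -> acl); ROOT -> root
    return RELATION_TO_GROUP.get(base)

for label in ["nsubj", "dobj", "relcl", "ROOT", "pobj"]:
    print(label, classify(label))
```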
(Figure: xcomp example)
(Figure: conj example)
Try filling in the gaps in the following table.
| tid | token | dep | head |
|---|---|---|---|
| 1 | I | ||
| 2 | love | ROOT | |
| 3 | beef | ||
| 4 | tongue | ||
| 5 | . | punct | 2 |
Try filling in the gaps in the following table.
| tid | token | dep | head |
|---|---|---|---|
| 1 | The | ||
| 2 | cat | ||
| 3 | sleeps | ROOT | |
| 4 | on | ||
| 5 | the | ||
| 6 | mat | ||
| 7 | . | punct | 3 |
Try filling in the gaps in the following table.
| tid | token | dep | head |
|---|---|---|---|
| 1 | She | ||
| 2 | quickly | ||
| 3 | reads | ROOT | |
| 4 | interesting | ||
| 5 | books | ||
| 6 | . | punct | 3 |
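
Once you have filled in a table, you can check that it is at least a well-formed dependency tree: exactly one ROOT, every head pointing to an existing token, and no cycles. The following is a minimal sketch, assuming rows are (tid, token, dep, head) tuples with `None` for the empty ROOT head. `check_tree` is a hypothetical helper. It checks structure only, not whether your dependency labels are correct.

```python
def check_tree(rows):
    """Check basic well-formedness of a (tid, token, dep, head) table.

    Returns a list of problems; an empty list means the heads form a tree.
    """
    problems = []
    ids = {tid for tid, *_ in rows}
    roots = [tid for tid, _, dep, _ in rows if dep == "ROOT"]
    if len(roots) != 1:
        problems.append(f"expected exactly one ROOT, found {len(roots)}")
    heads = {}
    for tid, token, dep, head in rows:
        if dep == "ROOT":
            continue  # the root word has no head
        if head not in ids:
            problems.append(f"{token} (tid {tid}): head {head!r} is not a token id")
        else:
            heads[tid] = head
    # Follow heads upward from every token; revisiting a token means a cycle.
    for start in heads:
        seen, current = set(), start
        while current in heads:
            if current in seen:
                problems.append(f"cycle reached from tid {start}")
                break
            seen.add(current)
            current = heads[current]
    return problems

# The "I play baseball." table from earlier passes the check.
print(check_tree([(1, "I", "nsubj", 2), (2, "play", "ROOT", None),
                  (3, "baseball", "dobj", 2), (4, ".", "punct", 2)]))
```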
(Figure: example_text)
Describe the complexification strategies:
Visit our webapp
Try the sentences above and analyze their dependencies.
Linguistic Data Analysis I