By the end of this session, students will be able to:
- Define extraction rules to identify fine-grained grammatical features in language
- Conduct analysis using template Python code or a web application provided by the instructor.
Describe complexification strategies:
Visit our web app.
Try the sentences above and analyze their dependencies.
In Assignment 4, you will conduct a grammatical analysis of a corpus, combining a POS tagger and a dependency parser.
You will be able to:
- extract fine-grained grammatical features from either a Japanese or an English corpus;
- write a short report describing and interpreting the analysis results.
In this notebook, the following analysis pipeline is implemented for you.
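As a hedged sketch of the step such a pipeline typically starts with, assuming the corpus is a flat folder of UTF-8 plain-text files, the files can be read into a dictionary before they are passed to the tagger and parser. The helper name load_corpus and the default *.txt pattern are hypothetical choices, not the notebook's actual code.

```python
from pathlib import Path


def load_corpus(folder, pattern="*.txt", encoding="utf-8"):
    """Read every file in `folder` matching `pattern` into {file name: text}.

    Hypothetical helper: assumes a flat folder of plain-text files.
    UTF-8 handles both English and Japanese text.
    """
    files = sorted(Path(folder).glob(pattern))  # sorted for a stable file order
    return {path.name: path.read_text(encoding=encoding) for path in files}


# Possible use in the notebook (requires spaCy and a downloaded model):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")  # or a Japanese model such as "ja_core_news_sm"
#   corpus = load_corpus("path/to/corpus")
#   docs = {name: nlp(text) for name, text in corpus.items()}
```

Keeping the file name as the key makes it easy to trace each extracted example back to its source text when you write the report.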
File path to your corpus files.
Adjectival modifier (amod)
From Table 5.1 in Durrant (2023, p. 102), pick one or two sentences.
That-clause complement: you will first look for ccomp and then check whether the ccomp token has a mark dependent whose text is "that".
Some useful token attributes are the following:
| attribute | what it does | example |
|---|---|---|
| token.lemma_ | lemmatized form | be, child |
| token.pos_ | coarse-grained POS (Universal Dependencies tag set) | NOUN, VERB |
| token.tag_ | fine-grained POS (Penn Treebank tag set) | NN, JJ, VB, VBZ |
| token.dep_ | dependency label | amod, advmod |
| token.head | the head token of the dependency (a token with the same attributes) | token.head.lemma_ |
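The that-clause rule described above can be sketched with these attributes. So that the example runs without downloading a spaCy model, it builds hand-annotated parses from SimpleNamespace objects that mimic spaCy's text, lemma_, pos_, tag_, dep_, head, and children attributes. The helper names make_doc and find_that_clauses, and the parses themselves, are illustrative assumptions; a real model may label some sentences differently. With spaCy, you would pass nlp(sentence) in place of the stand-in.

```python
from types import SimpleNamespace


def make_doc(rows):
    """Hypothetical stand-in for a spaCy Doc.

    Each row is (text, lemma_, pos_, tag_, dep_, head index).
    The root's head is itself, as in spaCy.
    """
    tokens = [
        SimpleNamespace(text=text, lemma_=lemma, pos_=pos, tag_=tag, dep_=dep, children=[])
        for text, lemma, pos, tag, dep, _ in rows
    ]
    for token, row in zip(tokens, rows):
        token.head = tokens[row[5]]
        if token.head is not token:
            token.head.children.append(token)  # register each dependent on its head
    return tokens


def find_that_clauses(doc):
    """Return (main verb lemma, clause verb lemma) for ccomp clauses marked by 'that'."""
    results = []
    for token in doc:
        if token.dep_ != "ccomp":
            continue  # step 1: keep only clausal complements
        # Step 2: keep the clause only if it has a 'mark' dependent reading 'that'.
        if any(child.dep_ == "mark" and child.text.lower() == "that" for child in token.children):
            results.append((token.head.lemma_, token.lemma_))
    return results


# "I think that she left." (hand-annotated parse)
with_that = make_doc([
    ("I", "I", "PRON", "PRP", "nsubj", 1),
    ("think", "think", "VERB", "VBP", "ROOT", 1),
    ("that", "that", "SCONJ", "IN", "mark", 4),
    ("she", "she", "PRON", "PRP", "nsubj", 4),
    ("left", "leave", "VERB", "VBD", "ccomp", 1),
    (".", ".", "PUNCT", ".", "punct", 1),
])

# "I think she left." (same structure, no complementizer)
without_that = make_doc([
    ("I", "I", "PRON", "PRP", "nsubj", 1),
    ("think", "think", "VERB", "VBP", "ROOT", 1),
    ("she", "she", "PRON", "PRP", "nsubj", 3),
    ("left", "leave", "VERB", "VBD", "ccomp", 1),
    (".", ".", "PUNCT", ".", "punct", 1),
])

print(find_that_clauses(with_that))     # [('think', 'leave')]
print(find_that_clauses(without_that))  # []
```

The second sentence shows why the mark check matters: both sentences contain a ccomp, but only the first counts as a that-clause under this rule.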
In pairs, brainstorm three to five grammatical constructions you would like to identify in your Corpus Lab.
The final corpus lab is about syntactic features.
In this task, you will describe your research questions, hypotheses, and methods.
For example, you might use amod as the dependency label to extract adjective + noun phrases.
Once you have articulated the information above, you will conduct a search over the corpus.
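Following the amod example above, a minimal sketch of the search step: collect adjective + noun pairs and count them across documents with collections.Counter. As before, the tokens are hand-built stand-ins for spaCy tokens so the example runs without a model; make_doc, extract_amod, and the example parses are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def make_doc(rows):
    """Hypothetical stand-in for a spaCy Doc: rows are (text, lemma_, pos_, dep_, head index)."""
    tokens = [
        SimpleNamespace(text=text, lemma_=lemma, pos_=pos, dep_=dep)
        for text, lemma, pos, dep, _ in rows
    ]
    for token, row in zip(tokens, rows):
        token.head = tokens[row[4]]  # link each token to its head
    return tokens


def extract_amod(doc):
    """Return (adjective lemma, noun lemma) pairs where an amod attaches to a noun."""
    return [
        (token.lemma_, token.head.lemma_)
        for token in doc
        if token.dep_ == "amod" and token.head.pos_ in {"NOUN", "PROPN"}
    ]


docs = [
    # "The young children read long books."
    make_doc([
        ("The", "the", "DET", "det", 2),
        ("young", "young", "ADJ", "amod", 2),
        ("children", "child", "NOUN", "nsubj", 3),
        ("read", "read", "VERB", "ROOT", 3),
        ("long", "long", "ADJ", "amod", 5),
        ("books", "book", "NOUN", "dobj", 3),
        (".", ".", "PUNCT", "punct", 3),
    ]),
    # "Young children like stories."
    make_doc([
        ("Young", "young", "ADJ", "amod", 1),
        ("children", "child", "NOUN", "nsubj", 2),
        ("like", "like", "VERB", "ROOT", 2),
        ("stories", "story", "NOUN", "dobj", 2),
        (".", ".", "PUNCT", "punct", 2),
    ]),
]

# Count each (adjective, noun) pair over the whole corpus.
counts = Counter(pair for doc in docs for pair in extract_amod(doc))
for (adj, noun), n in counts.most_common():
    print(f"{adj} + {noun}: {n}")
# young + child: 2
# long + book: 1
```

Using lemmas groups "Young children" and "young children" under one pair; printing token.text instead would keep the original word forms.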
You should use either the simple text analyzer or your own Colab notebook.
Submission
.docx file) that addresses the requirements in written form (one or two pages, depending on your analysis results).
.ipynb file) with extraction code and results.
Success Criteria
Your submission …
Linguistic Data Analysis I