Corpus Lab 3
Mini-research on vocabulary and multiword units
Assignment Overview
This assignment aims to help you practice the following skills:
- Constructing lists of formulaic language through concordance software
- Planning and conducting a small-scale corpus study using single- and multi-word indices
Assignment Details
Task 1: Describing statistical characteristics of collocations (4 points)
In the first task, I would like you to calculate the major Strength of Association (SOA) measures, which quantify the association between two words (a node word and its collocate).
The frequencies of the node words and their collocates, as well as the size of the entire corpus, will be given to you.
Your task is to calculate T-score, MI, MI^2, and LogDice.
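The four measures can be sketched in a few lines of Python. This is a minimal illustration, not the required workflow (a spreadsheet works equally well). It uses the commonly cited formulas, where the expected frequency is E = f(node) × f(collocate) / N, and it assumes that the observed co-occurrence frequency O of each pair is also available and that no window-size correction is applied. The function name `soa_measures` and the counts in the example are hypothetical.

```python
import math


def soa_measures(o11, f_node, f_coll, n):
    """Return T-score, MI, MI^2 and LogDice for one node-collocate pair.

    o11    : observed co-occurrence frequency (assumed to be available)
    f_node : corpus frequency of the node word
    f_coll : corpus frequency of the collocate
    n      : corpus size in tokens
    """
    # Expected co-occurrence frequency if the two words were independent
    # (assumption: no correction for the collocation window size).
    expected = f_node * f_coll / n
    return {
        "t_score": (o11 - expected) / math.sqrt(o11),
        "mi": math.log2(o11 / expected),
        "mi2": math.log2(o11 ** 2 / expected),
        "log_dice": 14 + math.log2(2 * o11 / (f_node + f_coll)),
    }


# Hypothetical counts, for illustration only.
scores = soa_measures(o11=50, f_node=1000, f_coll=500, n=1_000_000)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Note that LogDice does not use the corpus size, whereas the other three measures depend on it through the expected frequency.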
Submit the following:
- A spreadsheet file with SOA values.
- A Word file (.docx) for plots and prose descriptions.
Your submission …
- contains accurate T-score, MI, MI^2, and LogDice scores
- provides visualizations of the relations between SOA indices
- describes the relationships among SOA indices and typical collocations
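One possible way to support the description of relationships among SOA indices, alongside scatter plots, is a rank correlation between each pair of measures. The sketch below computes Spearman's rho in plain Python; the helper names and the four SOA values are made up for illustration, and you are free to use spreadsheet functions or any other tool instead.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up SOA values for four collocations, for illustration only.
t_scores = [7.0, 2.1, 12.4, 4.3]
mi_scores = [6.6, 9.8, 3.2, 5.1]
rho = spearman(t_scores, mi_scores)
print(f"Spearman's rho (T-score vs. MI): {rho:.2f}")
```

A strongly negative or weak correlation between two measures would suggest that they rank the same collocations differently, which is worth commenting on in your prose description.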
Task 2 & 3: Mini-research project (8 points; 12 points altogether)
Tasks 2 and 3 are related to the mini-research project.
In this part of the assignment, you will conduct a mini-research project to describe uses of single- and multi-word units in a corpus you choose.
Specifically, you will:
- select lexical richness or phraseological sophistication indices to answer a set of research questions
- analyze the chosen corpus with the selected indices
- present the results and interpretation in written prose
The final report is a one-to-two-page report with the following sections:
- Short background and Research Questions (one paragraph)
- Method section
  - Corpus descriptions (one paragraph)
  - Index descriptions (one paragraph)
  - Analysis (one paragraph)
- Research hypothesis (one paragraph)
- Results (data interpretation and commentary)
- Figures or statistical report
- Conclusion
Assignment Guideline
Step 1: Construct research questions
In this type of research, researchers typically set RQs about the relationships between lexical characteristics and variables that define subsections of the corpus (e.g., grade, genre, or proficiency score).
The following information is available through the GiG corpus:
The following information is available through the ICNALE corpus:
- Ratings performed by external raters
Step 2: Understand and choose the corpus
In this assignment, please choose one of the following corpora:
- Growth in Grammar (GiG) corpus (Durrant, 2023)
- ICNALE corpus (Edited Essay OR GRA)
- A Japanese corpus (ask Masaki about availability).
Step 3: Construct hypothesis
Based on what you’ve learned about learners' vocabulary use, state several hypotheses about what you expect to find for each research question.
In other words, what do you expect as the relationship between lexical characteristics X and external variable Y?
Step 4: Select index
Based on the RQs and hypotheses, you will select indices that can capture the lexical characteristics X in your corpus.
Step 5: Compute the index
You will now use the tools we have covered in this course to derive lexical richness scores for the text.
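To make concrete what a lexical richness score is, here is a minimal sketch of two diversity indices, the type-token ratio (TTR) and the moving-average TTR (MATTR). It is an illustration only: the tools covered in the course may tokenize, lemmatize, and compute indices differently, and the whitespace tokenization, the window size, and the example sentence are assumptions made for this sketch.

```python
def ttr(tokens):
    """Type-token ratio: number of unique tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over all windows of a fixed size."""
    if len(tokens) < window:
        # Text shorter than one window: fall back to plain TTR.
        return ttr(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Toy example; a real analysis would use a proper tokenizer.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.lower().split()
print(f"TTR:   {ttr(tokens):.3f}")
print(f"MATTR: {mattr(tokens, window=5):.3f}")
```

Plain TTR tends to fall as texts get longer, which is one reason length-robust variants such as MATTR are often preferred when the texts in a corpus differ in length.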
Step 6: Conduct analysis
To answer the research questions, you may want to do the following:
- Obtain descriptive statistics of the lexical richness indices
- Visualize the relationship between variables
- Optionally run statistical analyses
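The first of these steps, descriptive statistics by group, might look like the sketch below. The group labels, the scores, and the function name `describe_by_group` are hypothetical; in practice the groups would come from the metadata of the corpus you chose (e.g., grade, genre, or proficiency score) and the scores from Step 5.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (group, index score) pairs, for illustration only.
records = [
    ("A2", 0.62), ("A2", 0.58),
    ("B1", 0.66), ("B1", 0.70),
    ("B2", 0.74), ("B2", 0.71),
]


def describe_by_group(records):
    """Return n, mean, and sample SD of the scores for each group."""
    groups = defaultdict(list)
    for group, score in records:
        groups[group].append(score)
    summary = {}
    for group, scores in sorted(groups.items()):
        summary[group] = {
            "n": len(scores),
            "mean": mean(scores),
            # The sample SD is undefined for a single observation.
            "sd": stdev(scores) if len(scores) > 1 else float("nan"),
        }
    return summary


summary = describe_by_group(records)
for group, stats in summary.items():
    print(f"{group}: n={stats['n']}, M={stats['mean']:.3f}, SD={stats['sd']:.3f}")
```

A table of this kind, together with a plot of the index against the grouping variable, is usually enough to support the data commentary in a short report.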
Step 7: Interpret and write up the results
You will write up what you found in your mini-research project in a short one-to-two-page report.
Your submission …