Day 3: Multiword Units and Collocations
Exploring formulaic language
Overview
Day 3 explores multiword units, collocations, and statistical measures for analyzing word combinations in corpus linguistics.
Key Concepts
- Types of multiword units (collocations, n-grams, lexical bundles)
- Association strength measures (t-score, Mutual Information, LogDice)
- Context window vs dependency bigram approaches
- n-gram search and window-based collocation search
- Linear regression analysis for corpus data
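As a rough preview of the n-gram search, window-based search, and association measures listed above, here is a minimal Python sketch. It is not the tool or code used in the sessions. The function names (`ngrams`, `window_cooccurrences`, `association_scores`) and the toy sentence are hypothetical, and the expected frequency uses the simplest formulation, E = f1 × f2 / N, with no correction for window size. The three measures follow their commonly cited definitions: MI = log2(O / E), t-score = (O − E) / √O, and LogDice = 14 + log2(2O / (f1 + f2)).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def window_cooccurrences(tokens, node, span=4):
    """Count tokens occurring within `span` words left or right of each `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        counts.update(left + right)
    return counts


def association_scores(o, f1, f2, n):
    """Compute MI, t-score, and LogDice from observed and marginal frequencies.

    o  = observed co-occurrence frequency of node and collocate
    f1 = corpus frequency of the node
    f2 = corpus frequency of the collocate
    n  = corpus size in tokens
    Assumption: expected frequency is f1 * f2 / n (no window-size correction).
    """
    e = f1 * f2 / n
    return {
        "expected": e,
        "MI": math.log2(o / e),
        "t_score": (o - e) / math.sqrt(o),
        "logDice": 14 + math.log2(2 * o / (f1 + f2)),
    }


# Toy "corpus" for illustration only (hypothetical data).
tokens = "the strong tea was strong and the tea was hot".split()
freq = Counter(tokens)

bigram_counts = Counter(ngrams(tokens, 2))
collocates = window_cooccurrences(tokens, "tea", span=1)
scores = association_scores(
    o=collocates["was"], f1=freq["tea"], f2=freq["was"], n=len(tokens)
)
print(bigram_counts.most_common(3))
print(collocates)
print(scores)
```

With a corpus this small the scores are not meaningful; the point is only to show which frequencies each measure needs. In practice MI tends to reward rare, exclusive pairings, the t-score tends to reward frequent ones, and LogDice does not depend on corpus size.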
Preparation
Before Day 3:
- Read:
  - Durrant (2023) Ch. 7
  - Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in Corpus‐Based Language Learning Research. Language Learning, 67(S1), 155–179.
- Skim:
  - Durrant (2023) Ch. 8 (ignore the R code if you are not familiar with R)
  - Eguchi & Kyle (2020) - review if needed
Schedule
| Time | Activity |
|---|---|
| 10:30-12:00 | Session 7: Multiword Units — Conceptual Overview |
| 12:00-13:00 | Lunch |
| 13:00-14:30 | Session 8: Hands-on Collocation Analysis |
| 14:30-14:40 | Break |
| 14:40-16:10 | Session 9: Learner Corpus Mini-Research |
| 16:10-17:00 | Office Hour (You can ask questions.) |
Assignments
- Due 8/7 (Thu): Corpus Lab Assignment 3
- Prepare mini-project research topic and questions for presentation