Day 3: Multiword Units and Collocations

Exploring formulaic language

Author: Masaki EGUCHI, Ph.D.

Modified: August 4, 2025

Overview

Day 3 explores multiword units, collocations, and statistical measures for analyzing word combinations in corpus linguistics.

Key Concepts

  • Types of multiword units (collocations, n-grams, lexical bundles)
  • Association measures (t-score, Mutual Information, logDice)
  • Context-window vs. dependency-bigram approaches
  • n-gram search and window-based collocation search
  • Linear regression analysis for corpus data
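As a rough illustration of n-gram search, window-based collocation search, and the association measures listed above, here is a minimal Python sketch. It assumes a pre-tokenized, lowercased corpus and a symmetric window of 2 tokens on each side (an arbitrary choice for the example). It uses commonly cited formulations: expected frequency E11 = f(node) × f(collocate) / N, MI = log2(O11 / E11), t-score = (O11 - E11) / √O11, and logDice = 14 + log2(2 × O11 / (f(node) + f(collocate))). The function names (`ngrams`, `window_cooccurrences`, `association_scores`) are hypothetical helpers, not part of any tool used in the course. Expected frequency is not adjusted for window size here, although some corpus tools do make that adjustment. Dependency-bigram search is not shown because it requires a syntactic parser.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count contiguous n-grams, returned as tuples of tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def window_cooccurrences(tokens, node, span=2):
    """Count every token within `span` positions left/right of each `node` hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - span)
        end = min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i:  # skip the node itself
                counts[tokens[j]] += 1
    return counts


def association_scores(o11, f_node, f_coll, n):
    """t-score, MI, and logDice for one node-collocate pair (assumes o11 > 0)."""
    e11 = f_node * f_coll / n  # expected frequency; not adjusted for window size
    return {
        "t": (o11 - e11) / math.sqrt(o11),
        "MI": math.log2(o11 / e11),
        "logDice": 14 + math.log2(2 * o11 / (f_node + f_coll)),
    }


# Toy usage on a hypothetical mini-corpus
tokens = "i drink strong tea and she drinks strong tea every day".split()
freq = Counter(tokens)
bigrams = ngrams(tokens, 2)
cooc = window_cooccurrences(tokens, "tea", span=2)
scores = association_scores(cooc["strong"], freq["tea"], freq["strong"], len(tokens))
print(bigrams.most_common(1))
print(scores)
```

In this toy corpus, *strong tea* is the only bigram that occurs twice. The pair also reaches the maximum logDice of 14, because both words occur twice and always within each other's window. Real corpora will give much less tidy values, and the three measures will often rank collocates differently.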

Preparation

Before Day 3:

  • Read:
    • Durrant (2023) Ch. 7
    • Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in Corpus‐Based Language Learning Research. Language Learning, 67(S1), 155–179.
  • Skim:
    • Durrant (2023) Ch. 8 (ignore the R code if you are not familiar with R)
    • Eguchi & Kyle (2020): review if needed

Schedule

Time Activity
10:30-12:00 Session 7: Multiword Units — Conceptual Overview
12:00-13:00 Lunch
13:00-14:30 Session 8: Hands-on Collocation Analysis
14:30-14:40 Break
14:40-16:10 Session 9: Learner Corpus Mini-Research
16:10-17:00 Office Hour (questions welcome)

Assignments

Reflection