Corpus Lab 2

Frequency lists & lexical richness

Author

Masaki EGUCHI, Ph.D.

Modified

August 4, 2025

Assignment Overview

This assignment aims to help you practice the following skills:

  • Creating a word frequency list from a Japanese corpus (as an example of a non-English language)
  • Computing and interpreting lexical diversity scores for English text samples
  • Computing and interpreting lexical sophistication indices for English text samples

Assignment Details

Complete the following three tasks. Submit a single Word file containing the write-up for each task, with appendices in the specified file formats.

Submit the finished assignment through Google Classroom.

Task 1: A Japanese word frequency list

Goals

The goal of this task is to:

  • Construct a Japanese word frequency list.

Instructions

  • Use the Aozora 500 corpus.
  • Create a frequency list using TagAnt and AntConc.
  • Examine the frequency distribution of the resulting list using the simple text analyzer.
Submission:
  • A Japanese word frequency list (.txt or .tsv format)
  • Descriptive paragraphs explaining the frequency distribution of words in the Japanese corpus.
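As an optional cross-check on the AntConc word list, the counting step can be sketched in Python. This is a minimal sketch that assumes the corpus has already been segmented (for example, by TagAnt) into space-separated word tokens with no POS tags attached; the function names `frequency_list` and `to_tsv` and the toy sentences are hypothetical and are not part of either tool.

```python
from collections import Counter


def frequency_list(lines):
    """Count word tokens in pre-segmented text (tokens separated by spaces)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def to_tsv(counts):
    """Return a TSV string with rank, word, raw frequency, and percent of all tokens."""
    total = sum(counts.values())
    rows = ["rank\tword\tfrequency\tpercent"]
    # most_common() sorts by frequency; ties keep first-seen order
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append(f"{rank}\t{word}\t{freq}\t{100 * freq / total:.2f}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Hypothetical toy input standing in for segmented Aozora text
    sample = ["猫 が 好き です", "猫 と 犬 が いる"]
    print(to_tsv(frequency_list(sample)), end="")
```

To save the list as a submission file, the TSV string could be written with `open("freq.tsv", "w", encoding="utf-8")`; the file name is only an example. The percent column makes the skew of the distribution easy to see: a handful of top-ranked items account for a large share of all tokens.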
Success Criteria

Your submission …

Task 2: Replication of Figure 4.19 from Durrant (2023) with two more recent lexical diversity indices

Goals

The goals of this task are:

  • to compute more recent and more robust alternatives to classical lexical diversity indices (such as TTR) using TAALED
  • to replicate Durrant’s analysis with two more recent lexical diversity indices

Instructions

  • Complete the hand calculations of the lexical diversity indices in the spreadsheet.
  • Compute the two recommended lexical diversity indices, MATTR and MTLD Original, using TAALED.
  • Replicate Figure 4.19 in Durrant (2023, p. 72) with the two indices (i.e., MATTR and MTLD Original).
  • Discuss the implications of the findings.
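If the spreadsheet covers classical indices such as TTR, Root TTR (Guiraud's index), and Log TTR (Herdan's C), the hand calculations can be double-checked with a short sketch like this one. Which indices the spreadsheet actually asks for is an assumption here; the formulas are the standard ones (TTR = types / tokens, Root TTR = types / √tokens, Log TTR = log types / log tokens), and the helper name `classical_indices` is hypothetical.

```python
import math


def classical_indices(tokens):
    """Return TTR, Root TTR, and Log TTR for a list of already normalized tokens."""
    n_tokens = len(tokens)
    if n_tokens < 2:
        # Log TTR divides by log(tokens), which is 0 for a single token
        raise ValueError("need at least two tokens")
    n_types = len(set(tokens))
    return {
        "TTR": n_types / n_tokens,
        "RootTTR": n_types / math.sqrt(n_tokens),
        "LogTTR": math.log(n_types) / math.log(n_tokens),
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat".split()  # 6 tokens, 5 types
    for name, value in classical_indices(sample).items():
        print(f"{name}: {value:.4f}")
```

All three indices depend on text length, which is the motivation for the more robust alternatives computed with TAALED.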
Submission:
  • A spreadsheet file containing the hand-calculated lexical diversity scores.
  • Descriptive paragraphs (300 words) explaining your replication of Durrant’s analysis, covering:
    • Research question
    • Your hypothesis regarding the replication
    • Two plots (one for MTLD and one for MATTR)
    • Results and interpretation
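TAALED reports MATTR and MTLD Original directly, so the following is only a sketch for understanding what the two indices measure. MATTR averages the TTR of every window of a fixed number of consecutive tokens; MTLD counts how many "factors" (stretches of text whose running TTR falls to a threshold, conventionally 0.72) the text contains and divides the token count by that number, averaging a forward and a backward pass. The window size of 50, the short-text fallback, and the function names are assumptions of this sketch; TAALED's own preprocessing (e.g., lemmatization and word filtering) is not reproduced, so its scores will not match these exactly.

```python
def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens.

    `tokens` must be non-empty.
    """
    if len(tokens) < window:
        # Assumption: fall back to plain TTR when the text is shorter than one window
        return len(set(tokens)) / len(tokens)
    ttrs = [len(set(tokens[i:i + window])) / window
            for i in range(len(tokens) - window + 1)]
    return sum(ttrs) / len(ttrs)


def _mtld_pass(tokens, threshold=0.72):
    """One directional MTLD pass: token count divided by the number of factors."""
    factors = 0.0
    types = set()
    count = 0
    ttr = 1.0
    for tok in tokens:
        count += 1
        types.add(tok)
        ttr = len(types) / count
        if ttr <= threshold:
            # A full factor is complete once the running TTR falls to the threshold
            factors += 1
            types, count, ttr = set(), 0, 1.0
    # Partial factor for the leftover segment at the end of the text
    factors += (1.0 - ttr) / (1.0 - threshold)
    return len(tokens) / factors if factors > 0 else None


def mtld(tokens, threshold=0.72):
    """MTLD: mean of forward and backward passes (None if no factor is ever formed)."""
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(tokens[::-1], threshold)
    if forward is None or backward is None:
        return None
    return (forward + backward) / 2


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the cat".split()
    print("MATTR (window=5):", round(mattr(text, window=5), 4))
    print("MTLD:", mtld(text))
```

A small window is used in the usage example only because the toy text is short; real essays would normally be analyzed with a larger window.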
Success Criteria

Your submission …

Task 3: Qualitative analysis of lexical sophistication

Goals

The goals of this task are to:

  • compute several important lexical sophistication indices
  • compare and contrast two texts using the selected indices
  • describe the use of vocabulary in the two texts based on the quantitative and qualitative information

Instructions

  • Use the two texts from the example discussed in class.
  • Using the simple text analyzer, compare the two texts on two indices of your choice.
  • Select one frequency-based index and one index of another type.
  • Interpret the results of the analysis and describe the differences in one or more paragraphs.
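The simple text analyzer computes the indices for you; this sketch only illustrates what a typical frequency-based index measures: the mean log-transformed reference-corpus frequency of the words in a text. A lower mean suggests heavier use of less frequent (and so arguably more sophisticated) vocabulary. The reference counts are made-up toy values, the function name `mean_log_frequency` is hypothetical, and the analyzer's exact formula may differ (for example, in how it handles words missing from the reference list).

```python
import math


def mean_log_frequency(tokens, ref_freq):
    """Return (mean log10 reference frequency, coverage) for a list of tokens.

    Tokens missing from `ref_freq` are skipped; coverage is the share of
    tokens that could be scored. The mean is None if nothing was scored.
    """
    logs = [math.log10(ref_freq[t]) for t in tokens if t in ref_freq]
    coverage = len(logs) / len(tokens) if tokens else 0.0
    mean = sum(logs) / len(logs) if logs else None
    return mean, coverage


if __name__ == "__main__":
    # Toy reference frequencies (hypothetical counts, not from a real corpus)
    ref = {"the": 1000, "dog": 50, "ran": 40, "canine": 2, "sprinted": 1}
    for label, text in [("Text A", "the dog ran".split()),
                        ("Text B", "the canine sprinted".split())]:
        mean, cov = mean_log_frequency(text, ref)
        print(f"{label}: mean log freq = {mean:.3f}, coverage = {cov:.2f}")
```

The log transform is used because word frequencies are highly skewed: without it, a few very common words would dominate the average. In the toy example, Text B gets the lower mean, matching the intuition that "canine sprinted" is less common wording than "dog ran".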
Submission:
  • Plots showing the results of the lexical sophistication analysis.
  • Descriptive paragraph(s) contrasting the two texts in terms of lexical sophistication (200-300 words).
Success Criteria

Your submission …