Building Custom NLP Tools to Annotate Discourse-Functional Features for Second Language Writing Research: A Tutorial

Research Methods in Applied Linguistics

Rhetorical Analysis
Corpus Annotation
NLP Model Training

Author

Eguchi, M., Kyle, K.

Published

December 1, 2024

DOI

https://doi.org/10.1016/j.rmal.2024.100153

Abstract

This tutorial describes the process of developing a custom natural language processing (NLP) model, with a particular focus on a discourse annotation task. After an overview of recent developments in NLP, the paper discusses the development of the Engagement Analyzer (Eguchi & Kyle, 2023), focusing on corpus annotation, the machine learning model, model training, evaluation, and dissemination. A step-by-step tutorial of this process using the spaCy Python package is provided. The paper highlights the feasibility of developing custom NLP tools to enhance the scalability and replicability of annotating context-sensitive linguistic features in L2 writing research.

APA references

Eguchi, M., & Kyle, K. (2023). Span identification of epistemic stance-taking in academic written English. Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), 429–442. https://doi.org/10.18653/v1/2023.bea-1.35
Eguchi, M., & Kyle, K. (2024). Building custom NLP tools to annotate discourse-functional features for second language writing research: A tutorial. Research Methods in Applied Linguistics, 3(3), 100153. https://doi.org/10.1016/j.rmal.2024.100153