Course Syllabus

Linguistic Data Analysis I - Graduate Course

Author

Masaki EGUCHI, Ph.D.

Modified

August 7, 2025

Course Information

Course Title: Linguistic Data Analysis I
Credits: 2
Format: Intensive 5-day course (15 sessions)
Language: English
Classroom: 113 Lecture room

Instructor Information

Instructor: Masaki Eguchi, Ph.D. 

Course Description

This course introduces the foundations of corpus linguistics and the analysis of learner language through corpus linguistic approaches. It covers key concepts in corpus linguistics, including what corpora are, how they are used to answer (applied) linguistic research questions, and how to design corpus-based analyses to address substantive research questions in second language research. The primary language of analysis in this course is English, but students are encouraged to apply the concepts introduced to the languages they work with in their own research.

Learning Objectives

By the end of this course, students will be able to:

  • Explain what corpus linguistics is and how it can help us understand linguistic phenomena
  • Search for and select available corpora relevant to their own research
  • Discuss design issues related to language corpora for specific research purposes
  • Apply introductory corpus linguistic analyses (e.g., frequency analysis, concordancing, POS tagging) to preprocessed corpora
  • Evaluate the benefits and drawbacks of a corpus linguistic approach to linguistic analysis

Course Components

  • Lectures and tutorials
  • Daily hands-on activities
  • Mini-corpus labs
  • Final project

Required Materials

Textbook

  • Durrant, P. (2023). Corpus linguistics for writing development: A guide for research. Routledge. https://doi.org/10.4324/9781003152682

  • Stefanowitsch, A. (2020). Corpus linguistics: A guide to the methodology. Zenodo. https://doi.org/10.5281/ZENODO.3735822 (This is an open-access textbook, freely available online.)

Other required and optional readings are provided through Google Classroom.

Software (Free)

Web application for simple text analyses

Concordancing Software

  • AntConc: Corpus analysis toolkit for concordancing

Lexical Profiling Software

Statistics

  • JASP: Statistical analysis software

Others

Assignments and Grading

You can find detailed information about each assignment on the Assignments page (under construction).

There are two options for the grade distribution in this course. I will explain each option on the first day of the course. The class as a whole will decide which option to adopt; once it is decided, all students will follow the same plan.

Grade Distribution – Option A

  • For Option A, pairs of students will design and conduct their own mini research project and present it as their final project on the final day.
Assignment                                  Percent
Corpus Lab Assignments (4 × 15%)            60%
Class Participation                         20%
Final Project                               20%

Grade Distribution – Option B

  • For Option B, pairs of students will present one selected Corpus Lab assignment on the final day.
Assignment                                              Percent
Corpus Lab Assignments (4 × 15%)                        60%
Class Participation                                     20%
Final Presentation on Selected Corpus Lab Assignment    20%

Grading Scale

We follow Tohoku University's grading system.

Grade    Range      Grade Point
AA       90-100%    4.0
A        80-89%     3.0
B        70-79%     2.0
C        60-69%     1.0
D        0-59%      0.0

Daily Structure

Each day follows this general pattern:

Time           Activity
10:30-12:00    Session 1
12:00-13:00    Lunch break
13:00-14:30    Session 2
14:30-14:40    Break
14:40-16:10    Session 3
16:10-17:00    Office hour (questions welcome)

Attendance Policy

  • Because the course is intensive, attendance and participation are crucial to your success.
  • However, in case of an emergency, do not hesitate to contact the instructor about possible accommodations. I may be able to accommodate you, depending on the situation.

Assignment Submission

Deadlines

  • All assignments are due at 10:30 AM on the specified day.
  • Late submissions will receive a 10% penalty per day late.
  • Extensions may be granted for documented emergencies.

Submission Format

  • Submit all assignments via the course management system (Google Classroom)
  • Use the provided templates when available
  • File naming convention: LastName_Assignment#.ext
  • Acceptable formats: .docx, .pdf, .ipynb (for Python notebooks)

Collaboration

  • Because this is a very intensive course, collaboration is encouraged so that you get the most out of our time together. I will help each of you during class time and office hours, but I also encourage you to help each other with:
    • setting up the tools
    • recalling class materials
    • thinking about approaches to the corpus labs
  • However, you MUST write up your assignments on your own: you MUST outline, draft, and finalize each written submission yourself.

Plagiarism

You must present your own write-up. Do not copy each other’s work or ask someone else to write it for you.

Regarding AI tools such as ChatGPT: do not use them to write drafts. You may use them to polish your language.

Technology Policy

Required Technology

  • Bring a laptop to every session.
  • Ensure all required software is installed.

Classroom Etiquette

  • Laptops should be used for course activities only.

Communication

  • Course-related communication takes place via Google Classroom.

Course Announcements

  • Course announcements are made through Google Classroom.

Materials Sharing

  • Materials (e.g., slides) are shared through this website.

Accommodations