Session 12: Hands-on Activity

Masaki EGUCHI, Ph.D.

Session overview

🎯 Learning Objectives

By the end of this session, students will be able to:

  • Define extraction rules to identify fine-grained grammatical features in language
  • Conduct analysis using a template Python code or web application provided by the instructor.

In what ways are these sentences complex?

Describe complexification strategies:

  • She hopes to join an international research team after graduation.
  • Experts agree that collaboration improves problem-solving efficiency.
  • Students often struggle because they lack sufficient guidance.
  • He succeeded in the most demanding and competitive program at the university.
  • The growing influence of social media on youth behavior is concerning.
  • Policies that encourage innovation are essential for economic growth.

Let’s parse the sentence.

  • Visit our webapp

  • Try the sentences above and analyze their dependencies

Dependency collocation

Dependency-based collocation

Assignment 4

In assignment 4, you will conduct a grammatical analysis on a corpus, combining a POS tagger and a dependency parser.

You will be able to:

  • extract fine-grained grammatical features from either a Japanese or an English corpus.
  • write a short report describing the results and your interpretation of the analysis.

Let’s start

Colab Notebook

Introduction

  • The notebook is a very basic version of what TAASSC can do for you (Kyle & Crossley, 2018).
  • It is meant for educational use; for research, a more rigorous approach may be needed to identify T-units more consistently.

Algorithm used in the notebook

In this notebook, the following analysis pipeline is implemented for you.

  • Your input is the file path to your corpus files.
  • The current code loads the corpus files onto Colab.
  • It then iterates through the corpus files one by one.

Algorithm used in the notebook (cont’d)

  • Parse each sentence using spaCy
  • Conduct basic analysis (such as calculating the number of tokens, sentences, etc.)
  • Count the number of specific grammatical structures (MAIN FEATURE)
  • Store the results in a Python dictionary
  • After every corpus file is processed, it creates a dataset to export
  • You can export the results for further analysis

Let’s now work on the notebook

  • Let’s first identify adjectival modifiers (amod)

Coming up with extraction rules

From Table 5.1 in Durrant (2023, p. 102), pick one or two sentences.

  • Using the POS/dependency parsing option in the simple text analyzer, try to describe how to identify those structures.
  • For example, to identify a that-clause complement, you would first look for a ccomp and then check whether the ccomp has a mark child whose form is that.

spaCy token information

Some useful token attributes are the following:

code           what it does                                example
token.lemma_   lemmatized form                             be, child
token.pos_     simple POS (Universal Dependencies)         NOUN, VERB
token.tag_     fine-grained POS (Penn Treebank tagset)     NN, JJ, VB, VBZ
token.dep_     dependency label                            amod, advmod
token.head     the head token of the dependency            (a Token object)

Thinking grammatically

In pairs, brainstorm 3–5 grammatical constructions you would like to identify in your Corpus Lab.

  • Describe the grammatical feature
  • Give some examples that fall under the grammatical construction.
  • Explain why you are interested in it.

Corpus Lab 4

Overview

The final corpus lab is about syntactic features.

  • Using the Colab notebook and Python, you will extract fine-grained grammatical features that might distinguish scores in ICNALE.
  • You will extract the features in the Colab notebook.
  • Then you will export the dataset and plot the relationships with the ICNALE score.
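As an illustration of the export-and-plot step, assuming pandas is available. The file names and score values below are made-up toy numbers, not real ICNALE data:

```python
import pandas as pd

# Toy results: feature counts per file plus a hypothetical score column
rows = [
    {"file": "essay_a.txt", "n_amod": 12, "n_tokens": 240, "score": 2},
    {"file": "essay_b.txt", "n_amod": 30, "n_tokens": 310, "score": 4},
]
df = pd.DataFrame(rows)

# Normalize the raw count to a rate per 100 tokens before comparing scores
df["amod_per_100"] = df["n_amod"] / df["n_tokens"] * 100

df.to_csv("results.csv", index=False)            # export for further analysis
# df.plot.scatter(x="score", y="amod_per_100")   # plot against the score
print(df[["file", "amod_per_100"]])
```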

Task 1: Research questions, Hypotheses and Methods

In this task you will describe your research questions, hypotheses, and methods.

Research questions

  • Research questions should include:
    • type of features you are looking at (e.g., adverbial clauses)
    • situational variables that define your sub-corpora (e.g., grade, genre, proficiency)

Hypothesis

  • Your research hypothesis should:
    • describe your predictions in terms of:
      • quantitative trends of the feature in relation to the factor you are interested in.

Task 2: Definitions and operationalization of grammatical features to extract

  • You must describe the specific grammatical features that you plan to extract.
  • For example, for clausal features you need to specify whether you are interested in:
    • subordinate clauses or embedded clauses
    • a particular type of clause
  • Describe the rules used to identify the desired linguistic feature.
    • For example, you would specify amod as the dependency label to extract adjective + noun phrases.

Fine-grained Descriptive grammatical features

  • Once you have articulated the information above, you will conduct a search over the corpus.

  • You should use either the simple text analyzer or your own Colab notebook.

    • I will specify which option should be used by the time we start working on this assignment (this depends on your progress as a group).

Task 3: Results and interpretation

  • Provide the results of your corpus analysis in the way you think most effectively addresses your research questions. Make effective use of tables, plots, or other data-presentation techniques as you see fit.
  • Provide descriptive paragraphs that walk the reader through the results and how to interpret them.

Submission

  • A Word file (.docx) that addresses the requirements in written form (one or two pages, depending on your analysis results).
  • If you use Colab, a Google Colab notebook (.ipynb file) with your extraction code and results.

Success Criteria

Your submission …