Basic Concordancing
This assignment aims to help you practice the following skills:
In this task, you are asked to articulate your research question and hypothesis for your first corpus assignment.
In this task, you are asked to describe the methods of your corpus search. This should include justifications for the corpora used and the type of corpus search used: LIST, Chart, Collocates, Compare, or KWIC.
In this part of the assignment, you are asked to describe the search results and provide interpretations of the findings.
Corpus search results: Provide some numbers or discourse samples based on your corpus search. This can be a frequency list, a table with frequency counts, or a copy of KWIC results.
Interpretation: Write a paragraph, providing some interpretations of your findings.
Success Criteria
Your submission …
Remember from Session 2:
Operationalization
“an explicit and unambiguous description of a set of operations that are performed to identify and measure that construct.” (Stefanowitsch, 2020, p. 77)
Measure of height
Let’s operationalize a foundational concept in linguistic research, word.
Concept to operationalize: The word “run”
What is a possible operationalization?
That is, what search term would you like to enter below?
Search field
Search - run (264014)
What about runs, ran, and running? We now recognize that the word form run is just one of the possible forms of the headword run.
If we intended to retrieve all occurrences of the word “run”, then our search was imprecise.
What should we do?
Lemma: A headword and all the inflected forms derived from it.
RUN: run, runs, ran, running
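The difference between searching a word form and searching a lemma can be sketched in a few lines of Python. This is only an illustration: the `LEMMA_OF` table and the sample sentence are invented here, whereas a real corpus relies on its own lemmatization.

```python
# Hypothetical mini-lexicon mapping inflected forms to their headword.
# Invented for illustration; a real corpus uses its own lemmatizer.
LEMMA_OF = {"run": "RUN", "runs": "RUN", "ran": "RUN", "running": "RUN"}

# Invented sample text, split into tokens.
tokens = "she runs daily and he ran today while they run or go running".split()

def count_form(tokens, form):
    """Count exact matches of a single word form."""
    return sum(1 for t in tokens if t == form)

def count_lemma(tokens, lemma):
    """Count every token whose headword is the given lemma."""
    return sum(1 for t in tokens if LEMMA_OF.get(t) == lemma)

form_hits = count_form(tokens, "run")     # only the form "run"
lemma_hits = count_lemma(tokens, "RUN")   # run, runs, ran, running
print(form_hits, lemma_hits)
```

The form search misses three of the four relevant tokens in this toy sample, which is the imprecision that the lemma search removes.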
Searching lemma - run
Search result
Now you might wonder:
→ This is an even more precise operationalization of word: by its grammatical category.
You can use POS tags like the following.
| Category | Simple tag | Symbol | Example |
|---|---|---|---|
| Common noun | NOUN | N | run_N |
| Proper noun | NAME | NP | Sendai_NP |
| All nouns | NOUN+ | N+ | sun_N+, Sendai_N+ |
| Lexical verbs | VERB | V | run_V |
| All verbs | VERB+ | V+ | run_V+, do_V+ |
| Adjectives | ADJ | J | simple_J |
| Adverbs | ADV | R | clear_R |
See this page for more information on POS tags in English-Corpora.org.
run used as a noun: to retrieve instances of run that are used as nouns, search run_N.
run - as noun
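A query in the `word_TAG` form can be thought of as a filter over tagged tokens. The sketch below assumes a hand-tagged toy sample; the `search` helper and the tags `DET` and `PRON` are made up for this example and are not part of English-Corpora.org.

```python
# Hand-tagged toy tokens: (word form, tag symbol).
# Tags are assigned by hand for illustration; DET and PRON are hypothetical labels.
tagged = [
    ("a", "DET"), ("long", "J"), ("run", "N"),
    ("they", "PRON"), ("run", "V"), ("every", "DET"), ("morning", "N"),
]

def search(tagged, query):
    """Match 'word' (any tag) or 'word_TAG' (that tag only)."""
    word, _, tag = query.partition("_")
    return [(w, t) for w, t in tagged if w == word and (not tag or t == tag)]

print(search(tagged, "run"))    # both the noun and the verb
print(search(tagged, "run_N"))  # the noun only
print(search(tagged, "run_V"))  # the verb only
```

Adding the tag narrows the same word form to one grammatical category, which is why the tagged query returns fewer hits than the plain one.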
KWIC = Key Word In Context
KWIC will give us insights into how each word is used in context.
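As a rough picture of what a KWIC display does, the sketch below collects a fixed number of words to the left and right of each hit. The `kwic` function, the `width` parameter, and the sample text are assumptions made for this example.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = tokens[max(0, i - width):i]    # up to `width` words before
            right = tokens[i + 1:i + 1 + width]   # up to `width` words after
            lines.append((left, tok, right))
    return lines

# Invented sample text, already tokenized with spaces.
text = "We run every day . The long run matters . They run a small shop ."
tokens = text.split()
concordance = kwic(tokens, "run")

# Print with the keyword aligned in a central column.
for left, key, right in concordance:
    print(f"{' '.join(left):>20}  [{key}]  {' '.join(right)}")
```

Aligning the keyword in one column is what makes recurring neighbours on either side easy to spot by eye.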
Select KWIC in the SEARCH window.
KWIC search
You can sort the words.
KWIC search
Sorting helps identify:
- Common phrases
- Grammatical patterns
- Semantic preferences
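Sorting by position can be sketched as follows, where R1 is the first word to the right of the keyword and L1 is the first word to its left. The concordance lines and the helper names `r1` and `l1` are invented for this example.

```python
# Toy concordance lines: (left context, keyword, right context). Invented data.
lines = [
    (["they", "will"], "run", ["the", "shop"]),
    (["a", "long"], "run", ["of", "luck"]),
    (["we", "will"], "run", ["a", "test"]),
]

def r1(line):
    """First word to the right of the keyword (empty string if none)."""
    return line[2][0] if line[2] else ""

def l1(line):
    """First word to the left of the keyword (empty string if none)."""
    return line[0][-1] if line[0] else ""

by_r1 = sorted(lines, key=r1)  # groups lines by what follows the keyword
by_l1 = sorted(lines, key=l1)  # groups lines by what precedes the keyword

for left, key, right in by_l1:
    print(" ".join(left), f"[{key}]", " ".join(right))
```

After sorting by L1, the two lines with "will" before the keyword sit next to each other, so a repeated pattern becomes visible without reading every line.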
Choose a word that you want to see the context for.
Search the word with KWIC.
Sort the word in the following way.
KWIC search
Search “prove” (lemma) and sort by:
1. R1 (what follows “prove”?)
2. L1 (what precedes “prove”?)
What patterns do you notice?
Research question: How does the frequency of lol change across time?
CHART shows frequency across:
Genres
Time periods
Text types
This allows you to return frequency by sections of COCA (= conditional frequency).
Search lol using CHART: select CHART and enter lol.
Search - lol
What does this tell us about “lol”?
Genre preferences?
Formality levels?
Change over time?
Describe the frequency pattern by looking at the PER MIL row.
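A per-million figure is a raw count scaled by the size of the section it comes from, which makes sections of different sizes comparable. The sketch below uses invented counts and section sizes, not real COCA figures.

```python
def per_million(raw_count, section_size):
    """Normalize a raw count to occurrences per million words."""
    return raw_count * 1_000_000 / section_size

# Invented raw counts and section sizes (in words), for illustration only.
sections = {
    "section A": (50, 2_000_000),
    "section B": (50, 10_000_000),
}

normalized = {
    name: per_million(count, size) for name, (count, size) in sections.items()
}
for name, value in normalized.items():
    print(f"{name}: {value:.1f} per million")
```

Both sections have the same raw count here, but the smaller section has the higher normalized frequency, so comparisons across sections are better made on the normalized row than on raw counts.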
The CHART function can be used to get frequency across genres, time periods, and text types.
Research question: What is the most frequent word that ends with the derivational morpheme -ness?
Regular expressions (正規表現) allow you to search a corpus through “pattern matching”.
Any ideas for operationalizing derivational morphemes?
Search: *ness
Search results - *ness
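One way to picture the wildcard search is to translate the pattern into a regular expression and count the matching word forms. The sketch assumes that `*` stands for any run of letters, possibly empty; the `wildcard_to_regex` helper and the word list are invented for this example.

```python
import re
from collections import Counter

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into an anchored regex.

    Assumption: '*' matches any run of letters, including none.
    """
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("^" + "[a-z]*".join(parts) + "$", re.IGNORECASE)

# Invented sample text.
words = "happiness is not a harness and kindness beats sadness but happiness wins".split()

regex = wildcard_to_regex("*ness")
ness_counts = Counter(w.lower() for w in words if regex.match(w))
print(ness_counts.most_common())
```

The pattern also catches "harness", where the final -ness is not the derivational morpheme. A string pattern operationalizes spelling rather than morphology, so the result list still needs to be checked by hand.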
Let’s go back to List search.
Enter “a * of the”
What result do you expect with this search?
Don’t turn to the next page YET!!!
a * of the
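A multi-word pattern with a free slot can be sketched as a sliding window over tokens. The sketch assumes that a standalone `*` stands for exactly one word; the `match_slots` function and the sample text are invented for this example.

```python
from collections import Counter

def match_slots(tokens, pattern):
    """Yield token sequences matching a pattern; '*' stands for exactly one word."""
    slots = pattern.split()
    n = len(slots)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(s == "*" or s == w.lower() for s, w in zip(slots, window)):
            yield " ".join(window)

# Invented sample text.
tokens = "a lot of the time a part of the plan and a lot of the work".split()

hits = Counter(match_slots(tokens, "a * of the"))
print(hits.most_common())
```

Counting the distinct fillers of the slot gives a frequency list of the phrases that share the same frame.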
This helps find:
Which is more frequent?
exciting or excited?
RQ: Is there any repetitive phrase with three adjectives?
three adjectives
The Collocates search allows us to search for co-occurring words within a specified window.
entering words
enter collocates
window
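The idea of a collocation window can be sketched by counting the words that fall within a set number of positions to the left and right of a target word. The `collocates` function, its `left` and `right` parameters, and the sample text are assumptions for this example, and the output is raw co-occurrence counts rather than an association measure.

```python
from collections import Counter

def collocates(tokens, node, left=2, right=2):
    """Count words within `left`/`right` positions of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(window)
    return counts

# Invented sample text.
tokens = "strong tea and strong coffee but weak tea with milk".split()

wide = collocates(tokens, "tea")          # 2 words on each side
narrow = collocates(tokens, "tea", 1, 1)  # 1 word on each side
print(wide.most_common(3))
print(narrow.most_common(3))
```

Widening the window picks up the second "strong", so the choice of window size changes which words count as collocates.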
| Method | Purpose | Example | Key Features |
|---|---|---|---|
| Simple Word | Find exact word forms | run → finds only “run” | Case sensitive; single form only |
| LEMMA | Find all forms of a word | RUN → finds “run, runs, ran, running” | Use CAPITAL letters; includes all inflections |
| POS Tag | Find words by grammatical category | run_N (noun), run_V (verb) | Disambiguates word classes; use underscore + tag |
| KWIC | View words in context | Any search term | Sort by L1, R1, etc.; find patterns in usage |
| CHART | Track frequency across categories | lol across time/genres | Shows distribution; genre/time comparisons |
| Wildcards | Pattern matching | *ness → “happiness”; a * of the → “a lot of the” | * = any characters; find lexical patterns |
| Collocates | Find co-occurring words | Words near target | Specify window size; statistical associations |
Success Criteria
Your submission …
What patterns in English interest you?
Consider:
Linguistic Data Analysis I