English Corpora Guide
Online Corpus Interfaces
Overview
The English-Corpora.org corpora, formerly known as the BYU (Brigham Young University) corpora, provide web-based interfaces to some of the largest and most widely used corpora in the world. These include COCA (Corpus of Contemporary American English), BNC (British National Corpus), and many others.
You will need a working account for english-corpora.org on Day 1.
Available Corpora
Major English Corpora
- COCA (Corpus of Contemporary American English): 1 billion words, 1990-2019
- BNC (British National Corpus): 100 million words
- GloWbE (Global Web-Based English): 1.9 billion words
- NOW (News on the Web): 14+ billion words, updated daily
- COHA (Corpus of Historical American English): 400 million words, 1810-2009
Specialized Corpora
- SOAP (Corpus of American Soap Operas): 100 million words
- TIME (TIME Magazine Corpus): 100 million words, 1920s-2000s
- Wikipedia Corpus: 1.9 billion words
Registration and Access
Free Access
- Visit english-corpora.org
- Click on the desired corpus
- Register for a free account
- Limited to 20 queries per day
Academic License (Do not purchase for this class)
- Extended query limits
- Download capabilities
- Available through your institution
Search hints
A list of corpora
| Corpus | Size (words) | Regions | Time | Genre |
|---|---|---|---|---|
| IWEB | 13.9b | 6 | 2017 | Web |
| NOW | 16.2b | 20 | 2010-now | Web: News |
| CORONA | 1.58b | 20 | 2020-now | Web: News |
| GLOWBE | 1.9b | 20 | 2012-13 | Web/blogs |
| WIKI | 1.9b | (+) | 2014 | Wikipedia |
| COCA | 1.0b | Am | 1990-2019 | Balanced |
| COHA | 400m | Am | 1810-2009 | Balanced |
| TV | 325m | 6 | 1950-2018 | TV shows |
| MOVIES | 200m | 6 | 1930-2018 | Movies |
| SOAP | 100m | Am | 2001-2012 | TV shows |
| HANSARD | 1.6b | Br | 1803-2005 | Parliament |
| EEBO | 755m | Br | 1470s-1690s | Various |
| SUP CRT | 130m | Am | 1790s-2010s | Legal |
| TIME | 100m | Am | 1923-2006 | Magazine |
| BNC | 100m | Br | 1980s-1993 | Balanced |
| CAN | 50m | Can | 1970s-2000s | Balanced |
| CORE | 50m | 6 | 2014 | Web |
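When choosing a corpus, it can help to compare sizes as plain word counts rather than abbreviations. The following is a minimal Python sketch, assuming that "b" in the table means billion words and "m" means million words. The names `CORPUS_SIZES`, `parse_size`, and `corpora_at_least` are hypothetical helpers written for this guide; they are not part of english-corpora.org or any library.

```python
# Sizes copied from the table above.
# Assumption: "b" = billion words, "m" = million words.
CORPUS_SIZES = {
    "IWEB": "13.9b", "NOW": "16.2b", "CORONA": "1.58b", "GLOWBE": "1.9b",
    "WIKI": "1.9b", "COCA": "1.0b", "COHA": "400m", "TV": "325m",
    "MOVIES": "200m", "SOAP": "100m", "HANSARD": "1.6b", "EEBO": "755m",
    "SUP CRT": "130m", "TIME": "100m", "BNC": "100m", "CAN": "50m",
    "CORE": "50m",
}


def parse_size(size: str) -> int:
    """Convert a size such as '1.9b' or '400m' into a word count."""
    multipliers = {"b": 1_000_000_000, "m": 1_000_000}
    number, unit = size[:-1], size[-1].lower()
    return round(float(number) * multipliers[unit])


def corpora_at_least(min_words: int) -> list[str]:
    """Return corpus names with at least min_words words, largest first."""
    sizes = {name: parse_size(text) for name, text in CORPUS_SIZES.items()}
    selected = [name for name, words in sizes.items() if words >= min_words]
    return sorted(selected, key=lambda name: sizes[name], reverse=True)


if __name__ == "__main__":
    # List every corpus with at least one billion words.
    for name in corpora_at_least(1_000_000_000):
        print(f"{name}: {parse_size(CORPUS_SIZES[name]):,} words")
```

A word count in this form is also what you would divide by when turning a raw hit count into a rate, which is why the sketch returns integers rather than strings.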