English Corpora Guide

Online Corpus Interfaces

Author

Masaki EGUCHI, Ph.D.

Modified

August 2, 2025

Overview

EnglishCorpora.org, formerly the BYU (Brigham Young University) corpora, provides web-based interfaces to some of the largest and most widely used corpora in the world. These include COCA (Corpus of Contemporary American English), BNC (British National Corpus), and many others.

You will need a working english-corpora.org account on Day 1.

Available Corpora

Major English Corpora

  • COCA (Corpus of Contemporary American English): 1 billion words, 1990-2019
  • BNC (British National Corpus): 100 million words
  • GloWbE (Global Web-Based English): 1.9 billion words
  • NOW (News on the Web): 14+ billion words, updated daily
  • COHA (Corpus of Historical American English): 400 million words, 1810-2009

Specialized Corpora

  • SOAP (Corpus of American Soap Operas): 100 million words
  • TIME (TIME Magazine Corpus): 100 million words, 1920s-2000s
  • Wikipedia Corpus: 1.9 billion words

Registration and Access

Free Access

  1. Visit english-corpora.org
  2. Click on desired corpus
  3. Register for free account
  4. Start searching; free accounts are limited to 20 queries per day

Academic License (Do not purchase for this class)

  • Extended query limits
  • Download capabilities
  • Available through institution

Search hints

A list of corpora

Corpus   Size (words)  Regions  Time         Genre
IWEB     13.9b         6        2017         Web
NOW      16.2b         20       2010-now     Web: News
CORONA   1.58b         20       2020-now     Web: News
GLOWBE   1.9b          20       2012-13      Web/blogs
WIKI     1.9b          (+)      2014         Wikipedia
COCA     1.0b          Am       1990-2019    Balanced
COHA     400m          Am       1810-2009    Balanced
TV       325m          6        1950-2018    TV shows
MOVIES   200m          6        1930-2018    Movies
SOAP     100m          Am       2001-2012    TV shows
HANSARD  1.6b          Br       1803-2005    Parliament
EEBO     755m          Br       1470s-1690s  Various
SUP CRT  130m          Am       1790s-2010s  Legal
TIME     100m          Am       1923-2006    Magazine
BNC      100m          Br       1980s-1993   Balanced
CAN      50m           Can      1970s-2000s  Balanced
CORE     50m           6        2014         Web
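
The sizes in the list above are abbreviated (b for billion words, m for million words), which makes them awkward to compare directly. The following is a minimal Python sketch, not part of english-corpora.org, that converts those abbreviations to word counts and ranks a few corpora by size. The helper name parse_size and the CORPUS_SIZES dictionary are hypothetical names used only for illustration. The values are copied from the list above.

```python
# Illustrative helper (hypothetical, not provided by english-corpora.org).
MULTIPLIERS = {"b": 1_000_000_000, "m": 1_000_000}


def parse_size(size: str) -> int:
    """Convert an abbreviated size such as '1.9b' or '400m' to a word count."""
    suffix = size[-1].lower()
    if suffix not in MULTIPLIERS:
        raise ValueError(f"Unrecognized size suffix in {size!r}")
    # round() avoids floating-point artifacts such as 1899999999.9999998
    return round(float(size[:-1]) * MULTIPLIERS[suffix])


# A subset of the corpora, with sizes copied from the list above.
CORPUS_SIZES = {
    "NOW": "16.2b",
    "GLOWBE": "1.9b",
    "COCA": "1.0b",
    "COHA": "400m",
    "BNC": "100m",
}

# Rank corpora from largest to smallest.
ranked = sorted(
    ((name, parse_size(size)) for name, size in CORPUS_SIZES.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, words in ranked:
    print(f"{name:<8}{words:>16,} words")

# How many times larger COCA is than BNC, by word count.
coca_to_bnc = parse_size(CORPUS_SIZES["COCA"]) / parse_size(CORPUS_SIZES["BNC"])
print(f"COCA is {coca_to_bnc:.0f}x the size of BNC")
```

A comparison like this is mainly useful for interpreting raw frequencies: the same raw count represents a much lower relative frequency in a larger corpus, so counts from different corpora should be normalized (for example, per million words) before they are compared.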