Corpus of Contemporary American English

A one-billion-word corpus of American English

The Corpus of Contemporary American English (COCA) is a one-billion-word corpus[1] of contemporary American English. It was created by Mark Davies, retired professor of corpus linguistics at Brigham Young University (BYU).[2][3]

Content

COCA contained one billion words as of November 2021.[1][2][4] The corpus has grown steadily: in 2009 it contained more than 385 million words;[5] in 2010 it had grown to 400 million words;[6] and by March 2019 it had reached 560 million words.[7]

As of November 2021, the corpus comprises 485,202 texts.[4] According to the corpus website, these texts contribute 24–25 million words for each year from 1990 to 2019.[4]

For each year in the corpus (1990–2019), the texts are evenly divided among six registers (genres): TV/movies, spoken, fiction, magazine, newspaper, and academic. As of November 2021, COCA also contains 125,496,215 words from blogs and 129,899,426 words from websites (see the Texts and Registers page of the COCA website).[4]
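The figures above can be checked against the one-billion-word total with a short calculation. The following is a minimal sketch, not part of the corpus tooling: it assumes that the blog and website words are counted in addition to the 24–25 million words per year, which the text does not state explicitly, and all names in it are illustrative.

```python
# Figures quoted in the text (as of November 2021).
YEARS = range(1990, 2020)                     # 1990-2019 inclusive
WORDS_PER_YEAR = (24_000_000, 25_000_000)     # stated per-year range
BLOG_WORDS = 125_496_215
WEB_WORDS = 129_899_426


def estimated_total_range():
    """Return (low, high) estimates of the total corpus size in words.

    Assumption: blog and website words are added on top of the
    per-year figure rather than being included in it.
    """
    n_years = len(YEARS)
    extra = BLOG_WORDS + WEB_WORDS
    low = n_years * WORDS_PER_YEAR[0] + extra
    high = n_years * WORDS_PER_YEAR[1] + extra
    return low, high


low, high = estimated_total_range()
print(f"{low:,} to {high:,} words")
```

Under that assumption the estimate falls between roughly 975 million and 1.005 billion words, which is consistent with the stated size of about one billion words.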


The texts come from a variety of sources.

Availability

The Corpus of Contemporary American English is free to search for registered users.

Queries

Related

The Corpus of Global Web-based English (GloWbE; pronounced "globe") contains about 1.9 billion words of text from twenty different countries. This makes it about 100 times as large as corpora such as the International Corpus of English and allows many types of searches that would not otherwise be possible. In addition to the online interface, full-text data from the corpus can be downloaded.

It is distinctive in allowing comparisons between different varieties of English. GloWbE is related to many other corpora of English.[8]

References

  1. ^ a b Milana, Prior (2021). A Comparative Corpus Study on Intensifier Usage across Registers in American English (Thesis).
  2. ^ a b "Mark Davies, Professor of (Corpus) Linguistics, Brigham Young University (BYU)". www.mark-davies.org. Retrieved November 9, 2021.
  3. ^ Kauhanen, Henri (March 21, 2011). "The Corpus of Contemporary American English: Background and history". VARIENG. Retrieved October 13, 2011.
  4. ^ a b c d "Homepage". Corpus of Contemporary American English. Retrieved April 24, 2022.
  5. ^ Davies, Mark (January 1, 2009). "The 385+ million word Corpus of Contemporary American English (1990–2008+): Design, architecture, and linguistic insights". International Journal of Corpus Linguistics. 14 (2): 159–190. doi:10.1075/ijcl.14.2.02dav. ISSN 1384-6655.
  6. ^ Davies, Mark (December 1, 2010). "The Corpus of Contemporary American English as the first reliable monitor corpus of English". Literary and Linguistic Computing. 25 (4): 447–464. doi:10.1093/llc/fqq018. ISSN 0268-1145.
  7. ^ a b Davies, Mark; Kim, Jong Bok (March 1, 2019). "The advantages and challenges of "big data": Insights from the 14 billion word iWeb corpus". Linguistic Research. 36 (1): 1–34. doi:10.17250/khisli.36.1.201903.001. ISSN 1229-1374. S2CID 133013527.
  8. ^ "Corpus of Web-Based Global English". www.english-corpora.org. Retrieved December 18, 2019.

Further reading

  • Anderson, Wendy; Corbett, John (2009). Exploring English with Online Corpora. Palgrave Macmillan. p. 205. ISBN 978-0-230-55140-4.
  • Bennett, Gena R. (2010). Using Corpora in the Language Learning Classroom: Corpus Linguistics for Teachers. Ann Arbor, Michigan: University of Michigan. p. 144. ISBN 978-0-472-03385-0.
  • Davies, Mark (2005). "The advantage of using relational databases for large corpora: Speed, advanced queries, and unlimited annotation". International Journal of Corpus Linguistics. 10 (3). John Benjamins Publishing Company: 307–334. doi:10.1075/ijcl.10.3.02dav.
  • Davies, Mark (2010). "More than a peephole: Using large and diverse online corpora". International Journal of Corpus Linguistics. 15 (3): 405–411. doi:10.1075/ijcl.15.3.13dav.
  • Lindquist, Hans (2009). Corpus Linguistics and the Description of English. Edinburgh University Press. ISBN 978-0-7486-2615-1.
