In simple terms, a corpus is a collection of texts that is stored electronically. A corpus, or a set of corpora, can then be used to see how writers and speakers use words in context.
Corpus tools allow you to see how a certain word is used in context, how frequent the word is, the company the word keeps (its collocations), the registers, genres, or disciplines it is frequently used in, and so on. Some tools also let you submit your own paper to see how you are using certain words.
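To make the ideas of frequency and collocation concrete, here is a minimal Python sketch. It is not how any of the tools listed on this page work internally. The three-sentence `CORPUS`, the function names, and the two-word `window` are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical three-sentence "corpus", for illustration only.
CORPUS = [
    "The results provide strong evidence for the claim.",
    "We found strong evidence that the effect is real.",
    "There is little evidence of a strong effect.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(texts):
    """Count how often each word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def collocates(texts, node, window=2):
    """Count the words found within `window` tokens of `node`."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                start = max(0, i - window)
                end = min(len(tokens), i + window + 1)
                # Neighbors on both sides, excluding the node word itself.
                counts.update(tokens[start:i] + tokens[i + 1:end])
    return counts

freqs = word_frequencies(CORPUS)
near_evidence = collocates(CORPUS, "evidence")

print(freqs.most_common(5))
print(near_evidence.most_common(5))
```

In this toy corpus, "strong" appears next to "evidence" more often than most other words do, which is the kind of pattern a collocation search is meant to surface. Real corpus tools work on millions of words and typically use statistical measures rather than raw counts.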
Here is a list of some great corpora and corpus tools:
- Academic Vocabulary Lists: This site includes the academic vocabulary lists of English based on the academic sub-corpus of the Corpus of Contemporary American English.
- British National Corpus: This is a huge corpus of written and spoken texts in British English.
- Corpus of Contemporary American English (COCA): This is a huge corpus of written and spoken texts in American English.
- COCA Words and Phrases [Academic]: This site is the academic sub-corpus of COCA. It includes the Academic Vocabulary List. It also allows you to see how academic English words are used in different disciplines, and it allows you to enter your own text so that you can see the types of academic and technical words you are using in your paper.
- Google Books Corpus (American & British English): This is a huge corpus built from Google Books texts in both American and British English.
- Lextutor Concordancer: This tool allows you to submit your paper, see a list of the different words you use in your paper and their frequencies, and see concordance lines showing keywords in context.
- Michigan Corpus of Academic Spoken English (MICASE): This corpus is a collection of various speech events in the university setting including lectures, lab sections, seminars, dissertation defenses, and advising sessions.
- Michigan Corpus of Upper-Level Student Papers (MICUSP): This corpus is a collection of upper-level undergraduate and graduate student papers from various disciplines.
- Netspeak: This site allows you to search for words, their frequencies, and synonyms. Netspeak draws on text from the World Wide Web to find words and determine how frequently they are used.
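
Several of the tools above display concordance lines, which show a keyword in its surrounding context. The following Python sketch shows roughly what such a display involves. It is an illustration only, not the code behind any of these tools. The `kwic` function, the `SAMPLE` text, and the 30-character `width` are hypothetical.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: left context, keyword, right context."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so the keyword lines up in a single column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Hypothetical sample text standing in for a submitted paper.
SAMPLE = (
    "This study examines the data from two surveys. "
    "The data suggest a modest effect. "
    "Further data are needed."
)

concordance = kwic(SAMPLE, "data")
for line in concordance:
    print(line)
```

Because the keyword is aligned in one column, you can scan down the lines and quickly notice which words tend to come before and after it in your own writing.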