Catalog of corpora and other data from the Linguistic Data Consortium (LDC). All LDC data licensed by the UW Department of Linguistics is accessible via the CompLing Database.
Widely used online corpora of English, suitable for teachers, researchers, and companies. Includes the Corpus of Contemporary American English (COCA), the Corpus of Historical American English (COHA), the British National Corpus (BNC), and many more.
"... provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum."
"... site devoted to hosting speech and language resources, such as training corpora for speech recognition, and software related to speech recognition."
Database Downloads
Several online language and linguistics databases offer free downloads of their data. Full database downloads are available from the following resources, among others: