wiki:LanguageModelSources
Last modified on 09/21/08 20:42:41

DIY corpus using search engines (such as Google)

Natural Language Toolkit (NLTK)

ARPA

Possible sources of written data (written corpora) for the creation of Language Models

Other Sources with Licensing Restrictions

Multilingual Corpora