Available Corpora
Masaki EGUCHI, Ph.D.
July 23, 2025
---
title: "Available Corpora"
---
## Placeholder
Content to be added.
- [CorpusMate](https://corpusmate.com/info)
- [Montclair State University - CORAL lab](https://www.coralcorpuslab.com)
# English corpora
# Japanese corpus resources
## Japanese corpora
- [NINJAL-LWP for TWC](https://tsukubawebcorpus.jp/)
- [Nagoya University Conversation Corpus (名大会話コーパス)](https://mmsrv.ninjal.ac.jp/nucc/)
- [List of Japanese dialogue corpora (日本語対話コーパス一覧)](https://individuality.jp/dialogue_corpus.html)
- [Kansai dialect corpus (関西弁コーパス)](https://sites.google.com/view/kvjcorpus/%E3%83%9B%E3%83%BC%E3%83%A0)
- [Aozorabunko-data](https://huggingface.co/datasets/globis-university/aozorabunko-clean)
## Vocabulary list
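- [Balanced Corpus of Contemporary Written Japanese (BCCWJ, 現代日本語書き言葉均衡コーパス): public data](https://clrd.ninjal.ac.jp/bccwj/bcc-chu.html)
- [Corpus of Spontaneous Japanese (CSJ, 日本語話し言葉コーパス)](https://clrd.ninjal.ac.jp/csj/chunagon.html)
- [NINJAL Web Japanese Corpus (国語研日本語ウェブコーパス)](https://www.gsk.or.jp/catalog/gsk2020-d/)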
## Corpus data
- [Wortschatz corpus](https://wortschatz.uni-leipzig.de/en/download/Japanese)
## Databases
- [Collection of age of acquisition ratings for over 5,000 Japanese words](https://osf.io/fawmq/files/osfstorage)
- [JALEX: Japanese Version of Lexical Decision Database](https://osf.io/qr2sg/)
- [AWD-J: Abstractness of Word Database for Japanese common words](https://sociocom.naist.jp/awd-j/)