
Embedding Datasets Download


This page provides download links for the Tencent AI Lab Chinese and English Term Embedding Corpora.

Latest Version for Chinese

The latest version is v0.2.0, released on Dec 24, 2021.

| Version | Dimension | Vocabulary size | Download URL | Description |
|---------|-----------|-----------------|--------------|-------------|
| v0.2.0 | 200 | Small (2,000,000) | tencent-ailab-embedding-zh-d200-v0.2.0-s.tar.gz | Original size: 3.6 GB; tar.gz size: 1.5 GB |
| v0.2.0 | 200 | Large (12,287,936) | tencent-ailab-embedding-zh-d200-v0.2.0.tar.gz | Original size: 22 GB; tar.gz size: 9.0 GB |
| v0.2.0 | 100 | Small (2,000,000) | tencent-ailab-embedding-zh-d100-v0.2.0-s.tar.gz | Original size: 1.8 GB; tar.gz size: 763 MB |
| v0.2.0 | 100 | Large (12,287,936) | tencent-ailab-embedding-zh-d100-v0.2.0.tar.gz | Original size: 12 GB; tar.gz size: 4.7 GB |
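Since every download is a tar.gz archive that expands to roughly two to three times its compressed size, it can help to unpack it programmatically into a directory with enough free space. The following is a minimal sketch using only the Python standard library; the function name `extract_archive` and the destination directory are hypothetical, and the layout of the files inside the archive is not described on this page, so the function simply reports whatever member names it finds.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **extra)
    return names


# Example call (assumes the small 100-dimensional Chinese archive has already
# been downloaded into the current directory):
# extract_archive("tencent-ailab-embedding-zh-d100-v0.2.0-s.tar.gz", "embeddings")
```

The same call should work for the English archives as well, since they are listed with the same tar.gz packaging.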

Information about version v0.2.0:

Main updates of this version:

Latest Version for English

The latest version is v0.1.0, released on Sep 15, 2022. Instructions for decoding URL-encoded phrases back into their original forms are given in Q4 of the FAQ.
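As a rough illustration of what decoding a URL-encoded phrase can look like, the sketch below uses `urllib.parse.unquote` from the Python standard library. It assumes the phrases use standard percent-encoding (for example, `%20` for a space); the sample tokens and the helper name `decode_phrase` are hypothetical, and Q4 of the FAQ remains the authoritative description of the encoding.

```python
from urllib.parse import unquote


def decode_phrase(token: str) -> str:
    """Turn a percent-encoded vocabulary entry back into readable text."""
    return unquote(token, encoding="utf-8", errors="replace")


# Hypothetical vocabulary entries, not taken from the actual corpus.
examples = ["New%20York", "machine%20learning", "plain"]
for token in examples:
    print(token, "->", decode_phrase(token))
```

Entries without any percent escapes pass through unchanged, so the helper can be applied to every vocabulary item without first checking whether it is a phrase.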

| Version | Dimension | Vocabulary size | Download URL | Description |
|---------|-----------|-----------------|--------------|-------------|
| v0.1.0 | 200 | Small (2,000,000) | tencent-ailab-embedding-en-d200-v0.1.0-s.tar.gz | Original size: 3.6 GB; tar.gz size: 1.5 GB |
| v0.1.0 | 200 | Large (6,596,681) | tencent-ailab-embedding-en-d200-v0.1.0.tar.gz | Original size: 12 GB; tar.gz size: 4.8 GB |
| v0.1.0 | 100 | Small (2,000,000) | tencent-ailab-embedding-en-d100-v0.1.0-s.tar.gz | Original size: 1.8 GB; tar.gz size: 763 MB |
| v0.1.0 | 100 | Large (6,596,681) | tencent-ailab-embedding-en-d100-v0.1.0.tar.gz | Original size: 6 GB; tar.gz size: 2.5 GB |

Information about version v0.1.0:


Previous Versions Download

  • v0.1.0 (Chinese)