Supported Languages

Trainable Languages

The table below lists the 100 languages for which users can train their own pipelines with Trankit, provided they have training data for the language.

| Trainable Languages | | | |
|---|---|---|---|
| Afrikaans | Estonian | Kyrgyz | Sindhi |
| Albanian | Filipino | Lao | Sinhala |
| Amharic | Finnish | Latin | Slovak |
| Arabic | French | Latvian | Slovenian |
| Armenian | Galician | Lithuanian | Somali |
| Assamese | Georgian | Macedonian | Spanish |
| Azerbaijani | German | Malagasy | Sundanese |
| Basque | Greek | Malay | Swahili |
| Belarusian | Gujarati | Malayalam | Swedish |
| Bengali | Hausa | Marathi | Tamil |
| Bengali | Hebrew | Mongolian | Tamil Romanized |
| Bosnian | Hindi | Nepali | Telugu |
| Breton | Hindi Romanized | Norwegian | Telugu Romanized |
| Bulgarian | Hungarian | Oriya | Thai |
| Burmese | Icelandic | Oromo | Turkish |
| Burmese | Indonesian | Pashto | Ukrainian |
| Catalan | Irish | Persian | Urdu |
| Chinese (Simplified) | Italian | Polish | Urdu Romanized |
| Chinese (Traditional) | Japanese | Portuguese | Uyghur |
| Croatian | Javanese | Punjabi | Uzbek |
| Czech | Kannada | Romanian | Vietnamese |
| Danish | Kazakh | Russian | Welsh |
| Dutch | Khmer | Sanskrit | Western Frisian |
| English | Korean | Scottish Gaelic | Xhosa |
| Esperanto | Kurdish (Kurmanji) | Serbian | Yiddish |

Pretrained Languages & Their Code Names

Trankit provides 90 pretrained pipelines for 56 languages. Each pretrained pipeline is associated with the treebank it was trained on. The table below lists the 56 pretrained languages, their corresponding treebanks, and the code names used to initialize the pretrained pipelines. A pretrained pipeline can be downloaded directly by clicking its code name in the table.

Default treebanks are shown in brackets []. For example, English has four treebanks: UD_English-EWT, UD_English-GUM, UD_English-LinES, and UD_English-ParTUT. UD_English-EWT appears in brackets, so it is the default treebank for English. To initialize a pipeline, select the appropriate code name from the table and follow the pipeline initialization instructions. For example, the code name english initializes a pipeline trained on the default treebank UD_English-EWT, while english-gum initializes a pipeline trained on the non-default treebank UD_English-GUM.
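The bracket convention can be sketched in a few lines of Python. This is only an illustration: the `PRETRAINED` dictionary is a hand-copied excerpt of two languages from the table, and `code_name` and `load_pipeline` are hypothetical helper names, not part of Trankit. The last function assumes that `trankit.Pipeline` accepts the code name as its first argument.

```python
# Excerpt of the pretrained table: language -> [(treebank, code name), ...].
# As in the table, the default treebank is written in brackets.
PRETRAINED = {
    "English": [
        ("[UD_English-EWT]", "english"),
        ("UD_English-GUM", "english-gum"),
        ("UD_English-LinES", "english-lines"),
        ("UD_English-ParTUT", "english-partut"),
    ],
    "Finnish": [
        ("UD_Finnish-FTB", "finnish-ftb"),
        ("[UD_Finnish-TDT]", "finnish"),
    ],
}


def code_name(language, treebank=None):
    """Return the code name for `treebank`, or for the default treebank if omitted."""
    for entry, code in PRETRAINED[language]:
        is_default = entry.startswith("[") and entry.endswith("]")
        if treebank is None and is_default:
            return code
        if treebank is not None and entry.strip("[]") == treebank:
            return code
    raise KeyError(f"no pretrained pipeline for {language!r} / {treebank!r}")


def load_pipeline(language, treebank=None):
    """Initialize a pretrained pipeline (hypothetical wrapper; needs trankit installed)."""
    from trankit import Pipeline  # imported lazily so the lookup works without trankit

    return Pipeline(code_name(language, treebank))
```

With this excerpt, `code_name("English")` resolves to the default code name `english`, and `code_name("English", "UD_English-GUM")` resolves to `english-gum`. Note that the default treebank is not always listed first (see Finnish), so the lookup checks the brackets rather than the position.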

| Language | Treebank | Code Name (for pipeline initialization) | Requires MWT expansion? | Treebank License | Treebank Documentation |
|---|---|---|---|---|---|
| Afrikaans | [UD_Afrikaans-AfriBooms] | afrikaans | | https://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/af_afribooms/index.html |
| Ancient Greek | [UD_Ancient_Greek-PROIEL] | ancient-greek | | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/grc_proiel/index.html |
| | UD_Ancient_Greek-Perseus | ancient-greek-perseus | | http://creativecommons.org/licenses/by-nc-sa/2.5/ | https://universaldependencies.org/treebanks/grc_perseus/index.html |
| Arabic | [UD_Arabic-PADT] | arabic | Yes | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/ar_padt/index.html |
| Armenian | [UD_Armenian-ArmTDP] | armenian | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/hy_armtdp/index.html |
| Basque | [UD_Basque-BDT] | basque | | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/eu_bdt/index.html |
| Belarusian | [UD_Belarusian-HSE] | belarusian | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/be_hse/index.html |
| Bulgarian | [UD_Bulgarian-BTB] | bulgarian | | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/bg_btb/index.html |
| Catalan | [UD_Catalan-AnCora] | catalan | Yes | https://www.gnu.org/licenses/gpl-3.0.en.html | https://universaldependencies.org/treebanks/ca_ancora/index.html |
| Chinese (simplified) | [UD_Simplified_Chinese-GSDSimp] | chinese | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/zhs_gsdsimp/index.html |
| Chinese (traditional) | [UD_Chinese-GSD] | traditional-chinese | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/zh_gsd/index.html |
| Chinese (classical) | [UD_Classical_Chinese-Kyoto] | classical-chinese | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/lzh_kyoto/index.html |
| Croatian | [UD_Croatian-SET] | croatian | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/hr_set/index.html |
| Czech | UD_Czech-CAC | czech-cac | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/cs_cac/index.html |
| | UD_Czech-CLTT | czech-cltt | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/cs_cltt/index.html |
| | UD_Czech-FicTree | czech-fictree | Yes | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/cs_fictree/index.html |
| | [UD_Czech-PDT] | czech | Yes | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/cs_pdt/index.html |
| Danish | [UD_Danish-DDT] | danish | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/da_ddt/index.html |
| Dutch | [UD_Dutch-Alpino] | dutch | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/nl_alpino/index.html |
| | UD_Dutch-LassySmall | dutch-lassysmall | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/nl_lassysmall/index.html |
| English | [UD_English-EWT] | english | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/en_ewt/index.html |
| | UD_English-GUM | english-gum | | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/en_gum/index.html |
| | UD_English-LinES | english-lines | | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/en_lines/index.html |
| | UD_English-ParTUT | english-partut | Yes | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/en_partut/index.html |
| Estonian | [UD_Estonian-EDT] | estonian | | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/et_edt/index.html |
| | UD_Estonian-EWT | estonian-ewt | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/et_ewt/index.html |
| Finnish | UD_Finnish-FTB | finnish-ftb | Yes | http://creativecommons.org/licenses/by/4.0/ | https://universaldependencies.org/treebanks/fi_ftb/index.html |
| | [UD_Finnish-TDT] | finnish | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/fi_tdt/index.html |
| French | [UD_French-GSD] | french | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/fr_gsd/index.html |
| | UD_French-ParTUT | french-partut | Yes | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/fr_partut/index.html |
| | UD_French-Sequoia | french-sequoia | Yes | http://infolingu.univ-mlv.fr/DonneesLinguistiques/Lexiques-Grammaires/lgpllr.html | https://universaldependencies.org/treebanks/fr_sequoia/index.html |
| | UD_French-Spoken | french-spoken | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/fr_spoken/index.html |
| Galician | [UD_Galician-CTG] | galician | Yes | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/gl_ctg/index.html |
| | UD_Galician-TreeGal | galician-treegal | Yes | http://infolingu.univ-mlv.fr/DonneesLinguistiques/Lexiques-Grammaires/lgpllr.html | https://universaldependencies.org/treebanks/gl_treegal/index.html |
| German | [UD_German-GSD] | german | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/de_gsd/index.html |
| | UD_German-HDT | german-hdt | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/de_hdt/index.html |
| Greek | [UD_Greek-GDT] | greek | Yes | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/el_gdt/index.html |
| Hebrew | [UD_Hebrew-HTB] | hebrew | Yes | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/he_htb/index.html |
| Hindi | [UD_Hindi-HDTB] | hindi | | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/hi_hdtb/index.html |
| Hungarian | [UD_Hungarian-Szeged] | hungarian | | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/hu_szeged/index.html |
| Indonesian | [UD_Indonesian-GSD] | indonesian | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/id_gsd/index.html |
| Irish | [UD_Irish-IDT] | irish | | http://creativecommons.org/licenses/by-sa/3.0/ | https://universaldependencies.org/treebanks/ga_idt/index.html |
| Italian | [UD_Italian-ISDT] | italian | Yes | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/it_isdt/index.html |
| | UD_Italian-ParTUT | italian-partut | Yes | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/it_partut/index.html |
| | UD_Italian-PoSTWITA | italian-postwita | Yes | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/it_postwita/index.html |
| | UD_Italian-TWITTIRO | italian-twittiro | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/it_twittiro/index.html |
| | UD_Italian-VIT | italian-vit | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/it_vit/index.html |
| Japanese | [UD_Japanese-GSD] | japanese | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/ja_gsd/index.html |
| Kazakh | [UD_Kazakh-KTB] | kazakh | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/kk_ktb/index.html |
| Korean | [UD_Korean-GSD] | korean | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/ko_gsd/index.html |
| | UD_Korean-Kaist | korean-kaist | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/ko_kaist/index.html |
| Kurmanji | [UD_Kurmanji-MG] | kurmanji | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/kmr_mg/index.html |
| Latin | [UD_Latin-ITTB] | latin | | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/la_ittb/index.html |
| | UD_Latin-Perseus | latin-perseus | | http://creativecommons.org/licenses/by-nc-sa/2.5/ | https://universaldependencies.org/treebanks/la_perseus/index.html |
| | UD_Latin-PROIEL | latin-proiel | | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/la_proiel/index.html |
| Latvian | [UD_Latvian-LVTB] | latvian | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/lv_lvtb/index.html |
| Lithuanian | [UD_Lithuanian-ALKSNIS] | lithuanian | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/lt_alksnis/index.html |
| | UD_Lithuanian-HSE | lithuanian-hse | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/lt_hse/index.html |
| Marathi | [UD_Marathi-UFAL] | marathi | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/mr_ufal/index.html |
| Norwegian (Bokmaal) | [UD_Norwegian-Bokmaal] | norwegian-bokmaal | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/no_bokmaal/index.html |
| Norwegian (Nynorsk) | [UD_Norwegian_Nynorsk-Nynorsk] | norwegian-nynorsk | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/nn_nynorsk/index.html |
| | UD_Norwegian_Nynorsk-NynorskLIA | norwegian-nynorsklia | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/nn_nynorsklia/index.html |
| Old French | [UD_Old_French-SRCMF] | old-french | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/fro_srcmf/index.html |
| Old Russian | [UD_Old_Russian-TOROT] | old-russian | | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/orv_torot/index.html |
| Persian | [UD_Persian-Seraji] | persian | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/fa_seraji/index.html |
| Polish | UD_Polish-LFG | polish-lfg | | https://www.gnu.org/licenses/gpl-3.0.en.html | https://universaldependencies.org/treebanks/pl_lfg/index.html |
| | [UD_Polish-PDB] | polish | Yes | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/pl_pdb/index.html |
| Portuguese | [UD_Portuguese-Bosque] | portuguese | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/pt_bosque/index.html |
| | UD_Portuguese-GSD | portuguese-gsd | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/pt_gsd/index.html |
| Romanian | UD_Romanian-Nonstandard | romanian-nonstandard | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/ro_nonstandard/index.html |
| | [UD_Romanian-RRT] | romanian | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/ro_rrt/index.html |
| Russian | UD_Russian-GSD | russian-gsd | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/ru_gsd/index.html |
| | [UD_Russian-SynTagRus] | russian | | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/ru_syntagrus/index.html |
| | UD_Russian-Taiga | russian-taiga | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/ru_taiga/index.html |
| Scottish Gaelic | [UD_Scottish_Gaelic-ARCOSG] | scottish-gaelic | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/gd_arcosg/index.html |
| Serbian | [UD_Serbian-SET] | serbian | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/sr_set/index.html |
| Slovak | [UD_Slovak-SNK] | slovak | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/sk_snk/index.html |
| Slovenian | [UD_Slovenian-SSJ] | slovenian | | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/sl_ssj/index.html |
| | UD_Slovenian-SST | slovenian-sst | | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/sl_sst/index.html |
| Spanish | [UD_Spanish-AnCora] | spanish | Yes | https://www.gnu.org/licenses/gpl-3.0.en.html | https://universaldependencies.org/treebanks/es_ancora/index.html |
| | UD_Spanish-GSD | spanish-gsd | Yes | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/es_gsd/index.html |
| Swedish | UD_Swedish-LinES | swedish-lines | | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/sv_lines/index.html |
| | [UD_Swedish-Talbanken] | swedish | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/sv_talbanken/index.html |
| Tamil | [UD_Tamil-TTB] | tamil | Yes | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/ta_ttb/index.html |
| Telugu | [UD_Telugu-MTG] | telugu | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/te_mtg/index.html |
| Turkish | [UD_Turkish-IMST] | turkish | Yes | http://creativecommons.org/licenses/by-nc-sa/3.0/ | https://universaldependencies.org/treebanks/tr_imst/index.html |
| Ukrainian | [UD_Ukrainian-IU] | ukrainian | Yes | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/uk_iu/index.html |
| Urdu | [UD_Urdu-UDTB] | urdu | | http://creativecommons.org/licenses/by-nc-sa/4.0/ | https://universaldependencies.org/treebanks/ur_udtb/index.html |
| Uyghur | [UD_Uyghur-UDT] | uyghur | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/ug_udt/index.html |
| Vietnamese | [VLSP + UD_Vietnamese-VTB] | vietnamese | | https://vlsp.org.vn/sites/default/files/2019-06/VLSP2013%20User%20Agreement_0.pdf http://creativecommons.org/licenses/by-sa/4.0/ | https://vlsp.org.vn/resources-vlsp2013 https://universaldependencies.org/treebanks/vi_vtb/index.html |
| | UD_Vietnamese-VTB | vietnamese-vtb | | http://creativecommons.org/licenses/by-sa/4.0/ | https://universaldependencies.org/treebanks/vi_vtb/index.html |
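
Because each pretrained pipeline inherits the license of its treebank, it can be useful to screen code names by license before choosing one. The following is a minimal sketch, not part of Trankit: `LICENSES` is a hand-copied excerpt of the license column, and `is_noncommercial` and `without_nc_clause` are hypothetical helper names. The check relies only on the fact that Creative Commons license URLs spell out their clauses (for example `by-nc-sa`), so it says nothing about the terms of other licenses such as GPL or LGPL-LR.

```python
# Excerpt of the pretrained table: code name -> treebank license URL.
LICENSES = {
    "english": "http://creativecommons.org/licenses/by-sa/4.0/",
    "english-gum": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "finnish-ftb": "http://creativecommons.org/licenses/by/4.0/",
    "catalan": "https://www.gnu.org/licenses/gpl-3.0.en.html",
    "czech": "http://creativecommons.org/licenses/by-nc-sa/3.0/",
}


def is_noncommercial(license_url):
    """Return True if the URL names a Creative Commons license with an NC clause."""
    marker = "creativecommons.org/licenses/"
    if marker not in license_url:
        return False  # not a Creative Commons URL; this check does not apply
    # The path segment after /licenses/ lists the clauses, e.g. "by-nc-sa".
    clauses = license_url.split(marker)[1].split("/")[0].split("-")
    return "nc" in clauses


def without_nc_clause(licenses):
    """Return the sorted code names whose license URL has no NC clause."""
    return sorted(name for name, url in licenses.items() if not is_noncommercial(url))
```

For the excerpt above, `without_nc_clause(LICENSES)` keeps `catalan`, `english`, and `finnish-ftb`, and drops the two pipelines trained on `by-nc-sa` treebanks.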