Introduction
This is the central place for hyphenation patterns in TeX. They are all bundled in a single package called hyph-utf8.
For pattern authors
If you are a pattern author and wish to update your patterns, please contact the hyph-utf8 package maintainers through the tex-hyphen mailing list.
Documentation
Algorithm
Papers
- Documentation (needs improvement)
- Documentation for the Lua(La)TeX part of the package
- TUG 2008 paper
- TUG 2016 paper
- The latest public hyphenation exception log from TUGboat, volume 39 (2018), no. 2
Slides
- TeX hyphenation applied to HTML (Mathias Nater, BachoTeX 2010)
Related packages
- Babel – for pdfTeX and other 8-bit TeX engines
- Polyglossia – for XeTeX and LuaTeX
Links
Collaboration
- Mozilla
- FOP XML Hyphenation patterns (Simon Pepping)
- TeX-Hyphen-Pattern (Perl implementation on CPAN) (Roland van Ipenburg)
- Hyphenator.js (Client-side implementation of hyphenation in HTML documents) (Mathias Nater)
OpenOffice.org
- Test TeX/OpenOffice hyphenation algorithm online (based on hunspell)
- Using TeX hyphenation patterns in OpenOffice.org (explains how to properly convert TeX patterns into OpenOffice-friendly form)
Other external links
- Hunspell (library)
- Open Office language extensions
- text-hyphen (rubyforge); (source code repository)
- TeX Hyphenator in Java
- Knuth-Liang Hyphenation for Haskell
- Knuth-Liang’s original, and László’s extended hyphenation for Rust
- WordPress wp-Typography plugin, with hyphenation as a central feature
- Indic languages:
- An article about the soft hyphen
- TeX line breaking algorithm in JavaScript
Languages
The list of supported languages is in the table below. Note that German and Spanish have additional documentation in a separate file.
If patterns exist for a language that is missing from the table below, please let us know.
language | name, synonyms | BCP 47 tag (link to file) | (left, right) hyphenmin | 8-bit encoding | licence | authors |
---|---|---|---|---|---|---|
Afrikaans | afrikaans | af | (1, 2) | EC | LPPL | Tilla Fick, Chris Swanepoel |
Ancient Greek | ancientgreek | grc | (1, 1) | | LPPL | Dimitrios Filippou |
Ancient Greek | ibycus | grc-x-ibycus | (2, 2) | | custom | Peter Heslin |
Arabic | arabic | ar | (, ) | | MIT | |
Armenian | armenian | hy | (1, 2) | | LGPL | Sahak Petrosyan |
Assamese | assamese | as | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Basque | basque | eu | (2, 2) | EC | custom | Juan M. Aguirregabiria |
Belarusian | belarusian | be | (2, 2) | T2A | MIT | Maksim Salau |
Bengali | bengali | bn | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Bulgarian | bulgarian | bg | (2, 2) | T2A | custom | Anton Zinoviev |
Catalan | catalan | ca | (2, 2) | EC | LPPL | Gonçal Badenes, Francina Turon |
Chinese | pinyin | zh-latn-pinyin | (1, 1) | EC | GPL | Werner Lemberg |
Church Slavic | churchslavonic | cu | (1, 2) | | MIT | Aleksandr Andreev, Mike Kroutikov |
Coptic | coptic | cop | (1, 1) | | MIT | Claudio Beccari |
Croatian | croatian | hr | (2, 2) | EC | LPPL, custom | Igor Marinović |
Czech | czech | cs | (2, 3) | EC | GPL | Pavel Ševeček |
Danish | danish | da | (2, 2) | EC | LPPL, MIT | Frank Jensen |
Dutch | dutch | nl | (2, 2) | EC | LPPL | Piet Tutelaers |
English | ukenglish, british, UKenglish | en-gb | (2, 3) | ASCII | MIT | Dominik Wujastyk, Graham Toal |
English | usenglishmax | en-us | (2, 3) | ASCII | custom | Gerard D.C. Kuiken |
Esperanto | esperanto | eo | (2, 2) | IL3 | LPPL | Sergei B. Pokrovsky |
Estonian | estonian | et | (2, 3) | EC | MIT, LPPL | Enn Saar |
Ethiopic | ethiopic, amharic, geez | mul-ethi | (1, 1) | | MIT, custom, custom | Mojca Miklavec |
Finnish | finnish | fi | (2, 2) | EC | custom | Kauko Saarinen |
French | french, patois, francais | fr | (2, 2) | EC | MIT | Daniel Flipo, Bernard Gaulle, Arthur Reutenauer |
Friulian | friulan | fur | (2, 2) | EC | MIT, LPPL | Claudio Beccari |
Galician | galician | gl | (2, 2) | EC | LPPL | Javier A. Múgica |
Georgian | georgian | ka | (1, 2) | T8M | LPPL | Levan Shoshiashvili |
German | german | de-1901 | (2, 2) | EC | MIT | Deutschsprachige Trennmustermannschaft |
German | ngerman | de-1996 | (2, 2) | EC | MIT | Deutschsprachige Trennmustermannschaft |
German | swissgerman | de-ch-1901 | (2, 2) | EC | MIT | Deutschsprachige Trennmustermannschaft |
Gujarati | gujarati | gu | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Hindi | hindi | hi | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Hungarian | hungarian | hu | (2, 2) | EC | MPL, GPL, LGPL | Bence Nagy |
Icelandic | icelandic | is | (2, 2) | EC | LPPL | Jörgen Pind |
Interlingua | interlingua | ia | (2, 2) | ASCII | LPPL | Peter Kleiweg |
Irish | irish | ga | (2, 3) | EC | GPL, MIT | Kevin P. Scannell |
Italian | italian | it | (2, 2) | ASCII | LPPL, MIT | Claudio Beccari |
Kannada | kannada | kn | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Kurdish | kurmanji | kmr | (2, 2) | EC | LPPL | Jörg Knappen, Medeni Shemdê |
Latin | latin | la | (2, 2) | EC | MIT, LPPL | Claudio Beccari |
Latin | classiclatin | la-x-classic | (2, 2) | ASCII | MIT, LPPL | Claudio Beccari |
Latin | liturgicallatin | la-x-liturgic | (2, 2) | EC | MIT | Claudio Beccari, Monastery of Solesmes, Élie Roux |
Latvian | latvian | lv | (2, 2) | L7X | LGPL, GPL | Janis Vilims |
Lithuanian | lithuanian | lt | (2, 2) | L7X | MIT | Vytas Statulevičius, Sigitas Tolušis, Yannis Haralambous |
Malay | indonesian | id | (2, 2) | ASCII | GPL | Jörg Knappen, Terry Mart |
Malayalam | malayalam | ml | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Marathi | marathi | mr | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Modern Greek | monogreek | el-monoton | (1, 1) | | LPPL | Dimitrios Filippou |
Modern Greek | greek, polygreek | el-polyton | (1, 1) | | LPPL | Dimitrios Filippou |
Mongolian | mongolian | mn-cyrl | (2, 2) | T2A | LPPL, MIT | Dorjgotov Batmunkh |
Mongolian | mongolianlmc | mn-cyrl-x-lmc | (2, 2) | LMC | custom | Oliver Corff, Dorjpalam Dorj |
Norwegian | bokmal, norwegian, norsk | nb | (2, 2) | EC | custom | Rune Kleveland, Ole Michael Selberg, Karl Ove Hufthammer |
Norwegian | nynorsk | nn | (2, 2) | EC | custom | Karl Ove Hufthammer, Rune Kleveland, Ole Michael Selberg |
Norwegian | | no | (2, 2) | | custom | Rune Kleveland, Ole Michael Selberg |
Occitan | occitan | oc | (2, 2) | EC | MIT, LPPL | Claudio Beccari |
Oriya | oriya | or | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Pali | pali | pi | (1, 2) | | MIT | Wie-Ming Cittānurakkho Bhikkhu |
Panjabi | panjabi | pa | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Persian | farsi, persian | fa | (, ) | | MIT | |
Piemontese | piedmontese | pms | (2, 2) | ASCII | MIT, LPPL | Claudio Beccari |
Polish | polish | pl | (2, 2) | QX | MIT, custom | Hanna Kołodziejska, Bogusław Jackowski, Marek Ryćko |
Portuguese | portuguese, portuges | pt | (2, 3) | EC | BSD 3-clause licence | Pedro J. de Rezende, J. Joao Dias Almeida |
Romanian | romanian | ro | (2, 2) | EC | custom | Adrian Rezus |
Romansh | romansh | rm | (2, 2) | ASCII | MIT, LPPL | Claudio Beccari |
Russian | russian | ru | (2, 2) | T2A | LPPL | Alexander I. Lebedev |
Sanskrit | sanskrit | sa | (1, 3) | | custom | Yves Codet |
Serbian | serbianc | sh-cyrl | (2, 2) | T2A | LPPL | Dejan Muhamedagić |
Serbian | serbian | sh-latn | (2, 2) | EC | LPPL | Dejan Muhamedagić |
Slovak | slovak | sk | (2, 3) | EC | GPL | Jana Chlebíková |
Slovenian | slovenian, slovene | sl | (2, 2) | EC | LPPL, MIT | Matjaž Vrečko |
Spanish | spanish, espanol | es | (2, 2) | EC | MIT/X11 | Javier Bezos |
Swedish | swedish | sv | (2, 2) | EC | LPPL | Jan Michael Rynning |
Tamil | tamil | ta | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Telugu | telugu | te | (1, 1) | | MIT, LGPL, GPL | Santhosh Thottingal |
Thai | thai | th | (2, 3) | LTH | LPPL | Theppitak Karoonboonyanan |
Turkish | turkish | tr | (2, 2) | EC | LPPL | Pierre A. MacKay, H. Turgut Uyar, S. Ekin Kocabas, Mojca Miklavec |
Turkmen | turkmen | tk | (2, 2) | EC | MIT | Nazar Annagurban |
Ukrainian | ukrainian | uk | (2, 2) | T2A | LPPL | Maksym Polyakov |
Upper Sorbian | uppersorbian | hsb | (2, 2) | EC | LPPL | Eduard Werner |
Welsh | welsh | cy | (2, 3) | EC | LPPL, MIT | Yannis Haralambous |
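
The (left, right) hyphenmin pair in the table corresponds to TeX's `\lefthyphenmin` and `\righthyphenmin`: the minimum number of characters that must remain before the first and after the last break in a word. The following is a minimal Python sketch of how such a pair restricts candidate break positions. The function name `apply_hyphenmin` and the candidate positions are hypothetical, chosen only for illustration; they are not part of hyph-utf8, and real candidates would come from the language's pattern file.

```python
def apply_hyphenmin(word, candidates, left, right):
    """Keep only the break positions allowed by a (left, right) hyphenmin pair.

    A position p means "break between word[:p] and word[p:]", so at least
    `left` characters must precede the break and `right` must follow it.
    """
    return [p for p in candidates if left <= p <= len(word) - right]


# Made-up candidate positions, for illustration only.
word = "hyphenation"
candidates = [2, 6, 7, 10]

# (2, 3) is the pair listed for English in the table.
print(apply_hyphenmin(word, candidates, 2, 3))  # [2, 6, 7]
```

With (2, 3), position 10 is rejected because it would leave a single character after the break, while the same candidates filtered with (1, 1) would all be kept.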