GermaLemma – Lemmatizer for German language text
Markus Konrad, WZB, May 2017

To use GermaLemma, you need to download the TIGER corpus from the University of Stuttgart. The corpus is free to use for non-commercial purposes.

GermaLemma is designed to work with a corpus that uses the STTS tagset.

Then, you should convert the corpus into pickle format for faster loading by running:

python tiger_release_[…].conll09

This will place a lemmata.pickle file in the “data” directory which is then automatically loaded when you use GermaLemma like this:
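
The conversion step can be pictured roughly as follows. This is a minimal sketch, not GermaLemma's own conversion code: it assumes the standard CoNLL-2009 column order (FORM in column 2, LEMMA in column 3, POS in column 5), and the helper name `build_lemmata`, the sample rows, and the nested-dict layout are all hypothetical.

```python
import pickle
from collections import defaultdict

# Two made-up, tab-separated rows in CoNLL-2009 column order (illustrative only).
SAMPLE = (
    "1\tHäuser\tHaus\t_\tNN\t_\n"
    "2\tstehen\tstehen\t_\tVVFIN\t_\n"
)


def build_lemmata(lines):
    """Collect {pos: {token: lemma}} from CoNLL-2009 style lines (hypothetical helper)."""
    lemmata = defaultdict(dict)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 5:
            continue  # skip blank sentence separators and malformed rows
        token, lemma, pos = parts[1], parts[2], parts[4]
        lemmata[pos][token] = lemma
    return dict(lemmata)


lemmata = build_lemmata(SAMPLE.splitlines())
blob = pickle.dumps(lemmata)   # bytes that would be written to a .pickle file
restored = pickle.loads(blob)  # what a later run loads instead of re-parsing
```

Loading the pickled dictionary avoids re-parsing the full corpus on every start, which is the point of the conversion.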

`from germalemma import GermaLemma`
`lemmatizer = GermaLemma()`

Module Contents


class GermaLemma(**kwargs)

Main class of the lemmatizer for German language text.


Initialize the GermaLemma lemmatizer. By default, it loads the lemmatizer data from ‘data/lemmata.pickle’. Alternatively, you can pass a lemmata dictionary via lemmata, load a corpus in CONLL09 format via tiger_corpus, or load pickled lemmatizer data via pickle. Set use_pattern_module to True to force use of the pattern module, or to False to disable it. By default, the module is used if it is installed.

find_lemma(w, pos_tag)

Find a lemma for word w that has the Part-of-Speech tag pos_tag. pos_tag should be a valid STTS tag or one of the simplified forms:

- ‘N’ for nouns
- ‘V’ for verbs
- ‘ADJ’ for adjectives
- ‘ADV’ for adverbs

All other tags raise a ValueError(“Unsupported POS tag”). Return the lemma or, if no lemma was found, return w.
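
One way to picture how full STTS tags relate to the simplified forms is the sketch below. It is an illustration under assumptions, not necessarily how GermaLemma does it: the helper name `simplify_pos_tag` is hypothetical, and the prefix rule relies on STTS noun tags starting with "N" (NN, NE), verb tags with "V" (e.g. VVFIN), and adjective tags with "ADJ" (ADJA, ADJD).

```python
def simplify_pos_tag(pos_tag):
    """Reduce an STTS tag to 'N', 'V', 'ADJ' or 'ADV' (hypothetical helper)."""
    if pos_tag.startswith("N") or pos_tag.startswith("V"):
        return pos_tag[0]
    if pos_tag.startswith("ADJ") or pos_tag.startswith("ADV"):
        return pos_tag[:3]
    # Articles, pronouns, prepositions etc. are not handled.
    raise ValueError("Unsupported POS tag")
```

Because the simplified forms are themselves prefixes of the full tags, passing ‘N’ or ‘ADJ’ directly goes through the same branches.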

Lemmata dictionary lookup for word w with POS tag pos. Return lemma if found, else None.
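
The lookup contract (lemma if found, else None) can be sketched as below. The nested layout of `LEMMATA`, its entries, and both function names are assumptions made for illustration; GermaLemma's internal data structure may differ.

```python
# Hypothetical layout: simplified POS tag -> {word form: lemma}
LEMMATA = {
    "N": {"Häuser": "Haus"},
    "V": {"ging": "gehen"},
}


def dict_lookup(lemmata, w, pos):
    """Return the lemma stored for (pos, w), or None when there is no entry."""
    return lemmata.get(pos, {}).get(w)


def lemma_or_word(lemmata, w, pos):
    """Fall back to the unchanged word when the lookup yields nothing."""
    found = dict_lookup(lemmata, w, pos)
    return found if found is not None else w
```

Returning None rather than w from the lookup itself lets a caller tell "no entry" apart from "the lemma equals the word" and try other strategies before falling back.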


Try to lemmatize an adjective using common German adjective suffixes. Return the possibly lemmatized adjective.
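
A suffix-based heuristic of this kind might look like the following sketch. The suffix list, the minimum stem length, and the name `strip_adj_suffix` are assumptions chosen for illustration; the suffixes GermaLemma actually uses are not listed here.

```python
# Common inflectional endings of German adjectives; two-letter endings are tried first.
ADJ_SUFFIXES = ("em", "en", "er", "es", "e")


def strip_adj_suffix(w, min_stem_len=3):
    """Remove one inflectional ending if a plausible stem remains (hypothetical helper)."""
    for suffix in ADJ_SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem_len:
            return w[: -len(suffix)]
    return w
```

The result is only "possibly" lemmatized: a heuristic like this also truncates base forms that happen to end in a suffix (for example, "teuer" would become "teu"), which is why it suits a fallback role after a dictionary lookup has failed.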


Try to split a word w that may be a compound (German: Kompositum). Return the lemma if found, else return None.
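
The idea behind compound handling can be sketched as follows: find the longest tail of the word that has a known lemma, then re-attach the untouched head. This is an assumed strategy for illustration only; the function name, the lowercase-keyed dictionary, and the minimum tail length are hypothetical.

```python
def split_compound_lemma(w, lemmata_lower, min_tail_len=3):
    """Lemmatize the longest known tail of w and re-attach the head (hypothetical helper)."""
    # Moving the split point to the right makes the tail shorter,
    # so the first hit is the longest tail with a known lemma.
    for i in range(1, len(w) - min_tail_len + 1):
        head, tail = w[:i], w[i:]
        lemma = lemmata_lower.get(tail.lower())
        if lemma is not None:
            return head + lemma
    return None


# Made-up dictionary with lowercase keys and lowercase lemmata.
KNOWN = {"belastungen": "belastung"}
result = split_compound_lemma("Feinstaubbelastungen", KNOWN)
```

Here the tail "belastungen" is found in the dictionary and the head "Feinstaub" is kept as-is, so an unseen compound can still be lemmatized through its final component.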

_lemma_via_patternlib(w, pos)

Try to find a lemma for word w that has the Part-of-Speech tag pos by using the pattern module’s functions. Return the lemma, or w if lemmatization was not possible with that module.

add_to_lemmata_dicts(lemmata_lower, token, lemma, pos)