AMHARIC VERB STEM RESOURCES
Author: Michael Gasser, School of Informatics, Indiana University, gasser@indiana.edu
2007-12-15
CONTENTS
The accompanying XML files give the stems for 4275 simple and derived verbs from Amsalu Aklilu's Amharic-English dictionary (1981). Also included are two simple Python programs that search for particular lexemes or stems.
There are four XML files:
am_verbstems_phon.xml: Phonetic stem forms, alphabetized by stem
am_verbstems_writ.xml: Written stem forms, alphabetized by stem
am_verblex_phon.xml: Phonetic stem forms, organized by lexeme
am_verblex_writ.xml: Written stem forms, organized by lexeme
There are two Python modules:
find_stems.py
find_lexeme.py
ROMANIZATION
Verb stems are represented using the SERA romanization conventions (Firdyiwek & Yacob, 1997), except that all words beginning with vowels start with the "consonants" "'" or "`". Romanized stems appear in two different forms, "written" forms that are adequate for handling written Amharic and "phonetic" forms that supplement the written forms with gemination and the epenthetic (sixth order) vowel [ɨ] (I in SERA). For example, the imperfect stem of the verb መለሰ 'returned' is /mels/ in the written form and /mel:Is/ in the phonetic form (gemination is represented by ":"). The written forms are in the *writ* files and the phonetic forms in the *phon* files.
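The relation between the two forms can be sketched in a few lines of Python. This is only an illustration, not part of the distributed programs: the function name phonetic_to_written is hypothetical, the code is written for a current Python 3, and it assumes that every "I" in a phonetic form is epenthetic and that ":" is used only for gemination.

```python
def phonetic_to_written(phonetic):
    """Reduce a phonetic romanized form to its written counterpart.

    Assumption: every "I" is the epenthetic vowel and every ":" marks
    gemination, so deleting both yields the written form.
    """
    return phonetic.replace(":", "").replace("I", "")


print(phonetic_to_written("mel:Is"))  # mels
```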
LEXEME FILES
The stems are arranged in two different ways in the files. The *verblex* files are organized by lexemes, with all of the associated stems appearing under the lexeme entry. For example, here is the entry for the lexeme "tewadede" (ተዋደደ) in the file with phonetic representations of stems (am_verblex_phon.xml):
loved, liked each other; was fixed (e.g. spade, pickaxe etc. with the handle)
Romanized lexeme names are "written" (no gemination or epenthetic vowels) representations of the 3rd person singular masculine perfect, as is conventional for Ethiopian Semitic. In the few cases where there are homonyms, the lexemes are distinguished by suffixed "_1" and "_2"; these suffixes also appear on the Ethiopic forms of the lexeme names (for example, "Tebeqe_1" (ጠበቀ_1) and "Tebeqe_2" (ጠበቀ_2)).
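A program that reads lexeme names may want to separate the homonym suffix from the citation form. The following Python 3 sketch shows one way to do this; the function name split_lexeme_name is hypothetical, and the code assumes that "_" occurs in a lexeme name only as the homonym separator.

```python
def split_lexeme_name(name):
    """Split "Tebeqe_1" into ("Tebeqe", 1).

    Names without a numeric homonym suffix are returned unchanged,
    paired with None.
    """
    base, sep, suffix = name.rpartition("_")
    if sep and suffix.isdigit():
        return base, int(suffix)
    return name, None


print(split_lexeme_name("Tebeqe_1"))   # ('Tebeqe', 1)
print(split_lexeme_name("tewadede"))   # ('tewadede', None)
```

The same function works on the Ethiopic names, since the suffix is attached in the same way.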
The attributes for each lexeme include its romanized ("rom") and Ethiopic ("eth") name (citation form), its morphological class ("class", in my own notation), and the consonants in its root ("root"). Next appears the gloss from Amsalu's dictionary (if this is available).
Next the five stems that constitute the verb's "principal parts" (Bender & Hailu, 1978) appear ("stems"): perfect(ive) ("prf"), imperfect(ive) ("impf"), imperative/jussive ("impv"), infinitive ("inf"), and gerund(ive) ("ger"). For the example verb, with 2nd person singular masculine inflections (except for the infinitive), these would take the forms "tewad:edk" (ተዋደድክ), "tIw:ad:edal:eh" (ትዋደዳለህ), "tewaded" (ተዋደድ), "mew:aded" (መዋደድ), and "tewad:eh" (ተዋደህ). The infinitive form includes the "me-"/"m-" prefix.
Following the stem list, there are alternate stems ("altstem"), which are used in contexts where morphophonological changes occur. These could be dispensed with in a system that uses finite-state rules to handle these changes. Each alternate stem includes its tense-aspect-mood ("tam"), its inflection if there is one ("infl"), and its form ("form"). For example, the line
states that for the 2nd person singular feminine imperfect, the appropriate stem is "w:ad:ej"; with inflections this would take the form "tIw:ad:ejal:ex" (ትዋደጃለሽ). Note that stems are not checked for whether they might fail to occur because of semantic restrictions.
The following alternate stems are possible:
tam='prf' infl='pre': perfect with prefix (for example, "ar:ef" in "alar:efem" (አላረፈም))
tam='prf' infl='23p': perfect, 2nd or 3rd person plural (for example, "dem:" in "dem:u" (ደሙ))
tam='prf' infl='pre,23p': perfect, prefix and 2nd or 3rd person plural (for example, "ay:" in "alay:um" (አላዩም))
tam='impf' infl='2f': imperfect, 2nd person singular feminine
tam='impf' infl='23p': imperfect, 2nd or 3rd person plural (for example, "bel" in "sibelu" (ሲበሉ))
tam='impv' infl='2f': imperative, 2nd person singular feminine (for example, "gIZ" in "gIZi" (ግዢ))
tam='impv' infl='neg,2f': imperative, negative 2nd person singular feminine (for example, "meN" in "at:ImeNi" (አትመኚ))
tam='impv' infl='23p': imperative, 2nd person plural; jussive, 3rd person plural (for example, "ament" in "amentu" (አመንቱ))
tam='jus': jussive (for example, "sber" in "yIsber" (ይስበር))
tam='jus' infl='3p': jussive, 3rd person plural (for example, "sm" in "yIsmu" (ይስሙ))
tam='ger' infl='1s': gerund, 1st person singular (for example, "meTIc:" in "meTIc:E" (መጥቼ))
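One way a program might choose between a basic stem and an alternate stem is sketched below in Python 3. This is an assumption about how the data could be used, not the logic of the accompanying programs. The function name select_stem and the dictionary layout are hypothetical; the basic imperfect stem "w:ad:ed" is inferred from the inflected form "tIw:ad:edal:eh" rather than copied from the files; and the caller is assumed to pass the "infl" label exactly as it is listed above (for example '23p' or 'pre,23p').

```python
def select_stem(stems, altstems, tam, infl=None):
    """Return the alternate stem for (tam, infl) if one is listed,
    otherwise fall back to the basic stem for that tam."""
    # Try the exact (tam, infl) pair first, then an alternate stem
    # that has a tam but no inflection (as with tam='jus').
    for key in ((tam, infl), (tam, None)):
        if key in altstems:
            return altstems[key]
    # The principal parts list one imperative/jussive stem under "impv".
    return stems["impv" if tam == "jus" else tam]


# Phonetic stems for "tewadede" (basic stem inferred, see above).
stems = {"impf": "w:ad:ed"}
altstems = {("impf", "2f"): "w:ad:ej"}

# Prefix and suffixes are taken from the inflected examples in the text.
print("tI" + select_stem(stems, altstems, "impf", "2f") + "al:ex")  # tIw:ad:ejal:ex
print("tI" + select_stem(stems, altstems, "impf", "2m") + "al:eh")  # tIw:ad:edal:eh
```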
STEM FILES
The *verbstems* files are organized by the stems rather than the lexemes. An entry for one of the basic stems (in the file with written forms, am_verbstems_writ.xml) looks like this:
The entry includes the stem form ("form"), the Ethiopic and romanized representations of its lexeme ("lexeth", "lexrom"), its tense-aspect-mood ("tam"), its class ("class"), and its root consonants ("root"); the stem in question is the (written) imperfect form of the verb "tesgebegebe" (ተስገበገበ), as in "ysgebegebal" (ይስገበገባል).
An entry for an alternate stem looks like this:
In addition to the attributes of a "stem" entry, an alternate stem entry may include an inflection ("infl"); the stem in question is used for the (written) 1st person singular form of the gerund of the verb "'aseqaye" (አሰቃየ), as in "aseqaycE" (አሰቃይቼ).
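Once the entries of a stem file have been read into memory, looking up a stem amounts to indexing the entries by "form". The Python 3 sketch below illustrates the idea with two hand-written records for the phonetic stems of "tewadede"; it is not the implementation of find_stems.py. The function name build_index and the use of plain dictionaries are assumptions, the records carry only some of the attributes described above, and the basic stem "w:ad:ed" is inferred from the inflected form quoted earlier.

```python
from collections import defaultdict


def build_index(entries):
    """Map each stem form to the list of entries that have that form."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["form"]].append(entry)
    return index


# Hand-written records using the attribute names described above.
entries = [
    {"form": "w:ad:ed", "lexrom": "tewadede", "lexeth": "ተዋደደ", "tam": "impf"},
    {"form": "w:ad:ej", "lexrom": "tewadede", "lexeth": "ተዋደደ",
     "tam": "impf", "infl": "2f"},
]

index = build_index(entries)
for entry in index["w:ad:ej"]:
    print(entry["lexrom"], entry["tam"], entry.get("infl"))  # tewadede impf 2f
```

Each form maps to a list rather than a single entry so that a stem form shared by more than one lexeme or inflection would not be lost.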
THE PROGRAMS
The accompanying programs require Python 2.5. If you want to run the programs from the command line, make sure that the path in the first line of each file points to Python on your system and that the file is executable. To find the stem for a given romanized written form, type the following in a shell:
find_stems