AMHARIC VERB STEM RESOURCES

Author: Michael Gasser, School of Informatics, Indiana University, gasser@indiana.edu
2007-12-15

CONTENTS

The accompanying XML files give the stems for 4275 simple and derived verbs from Amsalu Aklilu's Amharic-English dictionary (1981). Also included are two simple Python programs that search for particular lexemes or stems.

There are four XML files:

    am_verbstems_phon.xml: Phonetic stem forms, alphabetized by stem
    am_verbstems_writ.xml: Written stem forms, alphabetized by stem
    am_verblex_phon.xml: Phonetic stem forms, organized by lexeme
    am_verblex_writ.xml: Written stem forms, organized by lexeme

There are two Python modules:

    find_stems.py
    find_lexeme.py

ROMANIZATION

Verb stems are represented using the SERA romanization conventions (Firdyiwek & Yacob, 1997), except that all words beginning with vowels start with the "consonants" "'" or "`". Romanized stems appear in two different forms: "written" forms that are adequate for handling written Amharic and "phonetic" forms that supplement the written forms with gemination and the epenthetic (sixth order) vowel [ɨ] (I in SERA). For example, the imperfect stem of the verb መለሰ 'returned' is /mels/ in the written form and /mel:Is/ in the phonetic form (gemination is represented by ":"). The written forms are in the *writ* files and the phonetic forms in the *phon* files.

LEXEME FILES

The stems are arranged in two different ways in the files. The *verblex* files are organized by lexemes, with all of the associated stems appearing under the lexeme entry. For example, here is the entry for the lexeme "tewadede" (ተዋደደ) in the file with phonetic representations of stems (am_verblex_phon.xml):

    loved, liked each other; was fixed (e.g. spade, pickaxe etc. with the handle)

Romanized lexeme names are "written" (no gemination or epenthetic vowels) representations of the 3rd person singular masculine perfect, as is conventional for Ethiopian Semitic.
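
The relation between the two romanizations can be illustrated with a short sketch. The function name phon_to_written is hypothetical and is not part of the accompanying modules. The sketch assumes that deleting the gemination mark ":" and the epenthetic vowel "I" is enough to recover a written form from a phonetic one; this holds for the examples quoted in this document but has not been checked against every entry in the XML files.

```python
def phon_to_written(stem):
    """Reduce a phonetic SERA stem to its written counterpart.

    Assumption (not taken from the accompanying modules): the written
    form equals the phonetic form with ":" (gemination) and "I"
    (epenthetic sixth-order vowel) removed.
    """
    return stem.replace(":", "").replace("I", "")


if __name__ == "__main__":
    # Phonetic stems quoted elsewhere in this document.
    for phon in ["mel:Is", "teflekel:ek", "tefleklIk"]:
        print("%s -> %s" % (phon, phon_to_written(phon)))
```

A conversion in the other direction is not possible by string manipulation alone, since the written forms carry no information about where gemination and epenthesis occur; that is why the *phon* files are provided separately.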
In the few cases where there are homonyms, the lexemes are distinguished by the suffixes "_1" and "_2"; these suffixes also appear on the Ethiopic forms of the lexeme names (for example, "Tebeqe_1" (ጠበቀ_1) and "Tebeqe_2" (ጠበቀ_2)).

The attributes for each lexeme include its romanized ("rom") and Ethiopic ("eth") name (citation form), its morphological class ("class", in my own notation), and the consonants in its root ("root"). Next comes the gloss from Amsalu's dictionary, if one is available. After the gloss come the five stems that constitute the verb's "principal parts" (Bender & Hailu, 1978) ("stems"): perfect(ive) ("prf"), imperfect(ive) ("impf"), imperative/jussive ("impv"), infinitive ("inf"), and gerund(ive) ("ger"). For the example verb, with 2nd person singular masculine inflections (except for the infinitive), these would take the forms "tewad:edk" (ተዋደድክ), "tIw:ad:edal:eh" (ትዋደዳለህ), "tewaded" (ተዋደድ), "mew:aded" (መዋደድ), and "tewad:eh" (ተዋደህ). The infinitive form includes the "me-"/"m-" prefix.

Following the stem list, there are alternate stems ("altstem"), which are used in contexts where morphophonological changes occur. These could be dispensed with in a system that uses finite-state rules to handle these changes. Each alternate stem includes its tense-aspect-mood ("tam"), its inflection if there is one ("infl"), and its form ("form"). For example, the alternate stem of the example verb with tam='impf', infl='2f', and form='w:ad:ej' states that for the 2nd person singular feminine imperfect, the appropriate stem is "w:ad:ej"; with inflections this would take the form "tIw:ad:ejal:ex" (ትዋደጃለሽ). Note that stems are not checked for whether they might fail to occur because of semantic restrictions.
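
The way an alternate stem takes precedence over a basic stem can be sketched as follows. The sketch works on the dictionary layout that read_lexicon() returns in the interactive session shown under THE PROGRAMS ('stems' keyed by TAM, 'altstems' keyed by TAM or by TAM+inflection). The function select_stem and its lookup order are assumptions made for illustration and are not part of the accompanying modules; in particular, falling back from the jussive to the basic "impv" stem is a guess based on the label "imperative/jussive".

```python
def select_stem(entry, tam, infl=None):
    """Pick the stem of a lexeme entry for a TAM and optional inflection.

    Hypothetical helper. Lookup order (an assumption):
      1. alternate stem keyed "tam+infl"
      2. alternate stem keyed "tam"
      3. basic stem for "tam" ("jus" falls back to "impv")
    """
    altstems = entry.get("altstems", {})
    if infl is not None:
        key = "%s+%s" % (tam, infl)
        if key in altstems:
            return altstems[key]
    if tam in altstems:
        return altstems[tam]
    if tam == "jus":
        # Assumed: the jussive shares the basic imperative/jussive stem.
        tam = "impv"
    return entry["stems"][tam]


# Phonetic entry for "sebere", copied from the interactive session.
SEBERE = {
    "stems": {"prf": "seb:er", "impf": "sebr", "impv": "sIber",
              "inf": "mesber", "ger": "sebr"},
    "altstems": {"jus": "sber", "ger+1s": "sebIr:"},
}

if __name__ == "__main__":
    print(select_stem(SEBERE, "ger", "1s"))   # alternate stem applies
    print(select_stem(SEBERE, "ger", "2f"))   # no alternate, basic stem
    print(select_stem(SEBERE, "jus"))         # alternate jussive stem
```

The inflection strings are passed through unchanged, so combined values such as 'pre,23p' have to be supplied exactly as they appear in the files.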
The following alternate stems are possible:

    tam='prf' infl='pre': perfect with prefix (for example, "ar:ef" in "alar:efem" (አላረፈም))
    tam='prf' infl='23p': perfect, 2nd or 3rd person plural (for example, "dem:" in "dem:u" (ደሙ))
    tam='prf' infl='pre,23p': perfect, prefix and 2nd or 3rd person plural (for example, "ay:" in "alay:um" (አላዩም))
    tam='impf' infl='2f': imperfect, 2nd person singular feminine
    tam='impf' infl='23p': imperfect, 2nd or 3rd person plural (for example, "bel" in "sibelu" (ሲበሉ))
    tam='impv' infl='2f': imperative, 2nd person singular feminine (for example, "gIZ" in "gIZi" (ግዢ))
    tam='impv' infl='neg,2f': imperative, negative 2nd person singular feminine (for example, "meN" in "at:ImeNi" (አትመኚ))
    tam='impv' infl='23p': imperative, 2nd person plural; jussive, 3rd person plural (for example, "ament" in "amentu" (አመንቱ))
    tam='jus': jussive (for example, "sber" in "yIsber" (ይስበር))
    tam='jus' infl='3p': jussive, 3rd person plural (for example, "sm" in "yIsmu" (ይስሙ))
    tam='ger' infl='1s': gerund, 1st person singular (for example, "meTIc:" in "meTIc:E" (መጥቼ))

STEM FILES

The *verbstems* files are organized by the stems rather than the lexemes. An entry for one of the basic stems (in the file with written forms, am_verbstems_writ.xml) looks like this:

The entry includes the stem form ("form"), the Ethiopic and romanized representations of its lexeme ("lexeth", "lexrom"), its tense-aspect-mood ("tam"), its class ("class"), and its root consonants ("root"); the stem in question is the (written) imperfect form of the verb "tesgebegebe" (ተስገበገበ), as in "ysgebegebal" (ይስገበገባል).

An entry for an alternate stem looks like this:

In addition to the attributes of a "stem" entry, an alternate stem entry may include an inflection ("infl"); the stem in question is used for the (written) 1st person singular form of the gerund of the verb "'aseqaye" (አሰቃየ), as in "aseqaycE" (አሰቃይቼ).

THE PROGRAMS

The accompanying programs require Python 2.5.
If you want to run the programs from the command line, make sure that the path in the first line of each file points to Python on your system and that the file is executable.

To find the stem for a given romanized written form, type the following in a shell, followed by the form:

    find_stems

If you want phonetic rather than written stems, add "phon" after the form. If you want only basic stems and no alternate stems, add "no_alt" after the form. If you want lexeme names to include the Ethiopic spelling as well as the romanization, add "eth" after the form. The keywords "phon", "no_alt", and "eth" can appear in any order.

Here are some examples of "find_stems":

    % find_stems.py Saf
    Stems with form Saf
    Lexeme: Safe -- TAM: prf
    Lexeme: Safe -- TAM: impv
    Lexeme: teSafe -- TAM: impf
    Lexeme: teSafe -- TAM: jus

    % find_stems.py Saf phon
    Stems with form Saf
    Lexeme: Safe -- TAM: prf
    Lexeme: Safe -- TAM: impv

    % find_stems.py S:af phon eth
    Stems with form S:af
    Lexeme: teSafe ተጻፈ -- TAM: impf
    Lexeme: teSafe ተጻፈ -- TAM: jus

    % find_stems.py SIf phon eth
    Stems with form SIf
    Lexeme: Safe ጻፈ -- TAM: ger

    % find_stems.py SIf: phon eth
    Stems with form SIf:
    Lexeme: Safe ጻፈ -- TAM+infl: ger+1s

    % find_stems.py SIf: phon eth no_alt
    No stems found

To find the lexeme, given the romanized lexeme name (written form of the 3rd singular masculine perfect), type the following in a shell:

    find_lexeme

Options for "find_lexeme" are the same as for "find_stems". Here are some examples of "find_lexeme":

    % find_lexeme.py teflekeleke
    Lexeme: teflekeleke
    Gloss: swarmed (ants etc.)
    Root: f l k
    Basic stems:
      prf teflekelek
      impf flekelek
      impv tefleklek
      inf mefleklek
      ger tefleklk
    Alternate stems:
      jus fleklek

    % find_lexeme.py teflekeleke phon
    Lexeme: teflekeleke
    Gloss: swarmed (ants etc.)
    Root: f l k
    Basic stems:
      prf teflekel:ek
      impf flekel:ek
      impv tefleklek
      inf mefleklek
      ger tefleklIk
    Alternate stems:
      jus fleklek
      ger+1s tefleklIk:

    % find_lexeme.py teflekeleke phon eth no_alt
    Lexeme: teflekeleke ተፍለከለከ
    Gloss: swarmed (ants etc.)
    Root: f l k
    Basic stems:
      prf teflekel:ek
      impf flekel:ek
      impv tefleklek
      inf mefleklek
      ger tefleklIk

If the lexeme name applies to two or more homonymous lexemes, all are printed out:

    % find_lexeme.py Tebeqe phon
    Lexeme: Tebeqe_1
    Gloss: was tightened, was firm, was fastened; was stressed (syllable)
    Root: T b q
    Basic stems:
      prf Teb:eq
      impf Tebq
      impv TIbeq
      inf meTbeq
      ger Tebq
    Alternate stems:
      jus Tbeq
      ger+1s TebIq:
    Lexeme: Tebeqe_2
    Gloss: waited for; looked after, watched, took care of; expected; preserved
    Root: T b q
    Basic stems:
      prf Teb:eq
      impf Teb:Iq
      impv Teb:Iq
      inf meTeb:eq
      ger Teb:Iq
    Alternate stems:
      ger+1s Teb:Iq:

Both "find_stems" and "find_lexeme" can also be run interactively from within Python. In addition, the "find_lexeme.py" module contains a function, read_lexicon, that reads in the entire lexicon and returns it as a Python dictionary. Here is an example of its use.

    % python
    Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
    [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from find_lexeme import *
    >>> LEXICON = read_lexicon(written = False)
    >>> LEXICON['sebere']['stems']
    {u'impf': u'sebr', u'inf': u'mesber', u'prf': u'seb:er', u'impv': u'sIber', u'ger': u'sebr'}
    >>> LEXICON['sebere']['stems']['prf']
    u'seb:er'
    >>> LEXICON['sebere']['altstems']
    {u'jus': u'sber', u'ger+1s': u'sebIr:'}
    >>> [lexeme for lexeme, entry in LEXICON.iteritems() if entry['stems']['prf'][:3] == 'qef']
    [u'qeferere', u'qefedede', u'qefefe', u'qefeqefe']

LIMITATIONS

The defective irregular verbs "al:e" (አለ) 'there is' and "new" (ነው) 'is' are not included.

The agent (አድራጊ) and instrument (ማድረጊያ) forms of the verbs are not included.

Some variations in spelling/pronunciation are not covered. For example, alternatives to the labialized consonant-vowel combinations /kWe/, /kWI/, /gWe/, /gWI/, etc.
using the vowels /o/ and /u/ (/ko/, /ku/, /go/, /gu/) are not provided, and verbs with initial "laryngeals" appear either with initial /'/ or /`/ but not with both ("`aweqe" (ዐወቀ), "`abede" (ዐበደ), "'amene" (አመነ), "'adere" (አደረ)).

Finally, the stems were generated using hand-coded regular expressions for simple and derived verbs in 154 classes (based on the root verb classes in Bender & Hailu (1978)) and a set of hand-coded phonological/orthographic rules. There are almost certainly errors. The lexicons have also not yet been tested for their coverage. The author welcomes all comments and corrections.

REFERENCES

Amsalu Aklilu. (1981). Amharic-English dictionary. Mega Publishing Enterprise, Addis Ababa.

Bender, Lionel and Hailu Fulass. (1978). Amharic Verb Morphology: A Generative Approach.

Yitna Firdyiwek and Daniel Yacob. (1997). The System for Ethiopic Representation in ASCII. URL: citeseer.ist.psu.edu/56365.html