Suggested Research
Areas of suggested research on or related to Amharic NLP.
To Submit Topics
- Give a brief description of the topic.
- Explain the benefit of the research to the public.
- Explain the benefit to the student who undertakes the research.
- Provide your contact information (email addresses will be converted to images to avoid spam).
- Submit topics to: Samuel Eyassu

Thesis Topics
Classification Using Any Neural Network (NN) Topology
Applying an NN topology, for instance Learning Vector Quantization (LVQ), to the classification of Ethiopic texts is one potential research area. Besides allowing the results to be compared with other schemes that have already been tested, the work would add to the pool of knowledge on the classification performance of the chosen topology. The student will gain valuable knowledge of NLP, classification, and NNs.
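As a rough illustration of what such a study would implement, the following is a minimal sketch of the basic LVQ1 update rule. It assumes documents have already been reduced to numeric feature vectors; the two-dimensional vectors, the category labels, and the function names are all hypothetical and chosen only to keep the example small.

```python
import math

def nearest(prototypes, x):
    """Return the index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: math.dist(prototypes[i][0], x))

def train_lvq1(prototypes, samples, lr=0.3, epochs=20):
    """LVQ1: pull the winning prototype toward a sample of the same class,
    push it away from a sample of a different class."""
    protos = [(list(vec), label) for vec, label in prototypes]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, label in samples:
            w, w_label = protos[nearest(protos, x)]
            sign = 1.0 if w_label == label else -1.0
            for j in range(len(w)):
                w[j] += sign * rate * (x[j] - w[j])
    return protos

def classify(prototypes, x):
    """Assign x the label of its nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]

# Toy data (made up): each vector stands in for a document's term weights.
samples = [([0.9, 0.1], "sport"), ([0.8, 0.2], "sport"),
           ([0.1, 0.9], "politics"), ([0.2, 0.8], "politics")]
initial = [([0.6, 0.4], "sport"), ([0.4, 0.6], "politics")]

trained = train_lvq1(initial, samples)
print(classify(trained, [0.7, 0.3]))
```

A real experiment would use one or more prototypes per category and feature vectors derived from an Ethiopic corpus; the learning rate and epoch count here are arbitrary defaults.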
Contact:
Information Filtering (IF) Using the LSI Model
The performance of IF using an NN model was tested as part of a thesis by Ato Lemma Nigussie. The Latent Semantic Indexing (LSI) model, however, with its capacity to project the term-by-document matrix space into term-by-concept and concept-by-document spaces, enables researchers to work with documents in a concept space rather than a term space. Consequently, this model is worth investigating for its lower-dimensional pseudo-documents, easier preprocessing, and representation of documents by their concepts. Besides allowing the performance of this model to be compared with other schemes that have already been tested, the work would add to the pool of knowledge on the IF performance of the LSI model. The student will gain valuable knowledge of IF, LSI, and NLP.
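For readers new to the model, the projection is conventionally written as a truncated singular value decomposition. This is the standard textbook formulation rather than anything specific to the thesis mentioned here; the symbols are assumptions introduced for illustration: $A$ is the $m \times n$ term-by-document matrix and $k$ is the chosen number of concepts.

$$A \approx A_k = U_k \, \Sigma_k \, V_k^{\top}$$

Here $U_k$ ($m \times k$) is the term-by-concept matrix, $V_k^{\top}$ ($k \times n$) is the concept-by-document matrix, and $\Sigma_k$ is the diagonal matrix of the $k$ largest singular values. A new document with term vector $d$ ($m \times 1$) can then be folded into the concept space as a $k$-dimensional pseudo-document:

$$\hat{d} = \Sigma_k^{-1} \, U_k^{\top} \, d$$

A filtering decision could then, for example, compare $\hat{d}$ with a user profile expressed in the same $k$-dimensional space using cosine similarity.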
Contact:
Ontological Modeling of Amharic Verbs
Amharic verbs have been well studied, and morphological models do not vary by a wide degree. At this time, however, no computational models are available for Amharic, and the verb surveys available in paper references cover only limited collections of verbs and lack the depth needed for computational applications. The output of this research will be a comprehensive semantic model of Amharic verbs with classified instances that will constitute a knowledge base. The knowledge base will be an aid for machine translation and computational morphology of Amharic.
This project would apply semantic technology resources to develop a model for Amharic verbs using a survey of approximately 2,000 verbs. Verbs will be classified initially according to the classic taxonomies (I, II, III, etc.); class properties and subclasses will then be added to account for all characteristics of verbs and their relationships to one another and to verbs of the Semitic family. The student will gain in-depth knowledge of ontologies, modeling techniques, and verb morphology.
Contact: 
Projects: Graduates or Undergraduates
Ethiopic Regular Expressions in Java
Regular expressions (regex) are a labor-saving syntax for describing textual patterns and have become pervasive in high-level and scripting languages alike. Regex patterns are applicable to problems of text retrieval and string matching in many scenarios. Regex syntax has been developed around the needs of Roman script and as such does not offer the same level of utility for expressing patterns over the characteristics of Ethiopic script. In this project the student will port a Perl-based regex package for Ethiopic to the Java language, building on the International Components for Unicode (ICU) class libraries.
The output of this research will be a Java-based regular expressions package for Amharic that can become an invaluable resource for future researchers as well as for the private sector. The resulting package will be submitted to the ICU project for consideration for permanent inclusion in the open source project. The student researcher will benefit from working with a world-class open source project (ICU) and will gain a deep understanding of text pattern matching.
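To make the gap concrete, here is a small sketch of one script-aware convenience that plain regex syntax lacks. It is written in Python for brevity and is not the syntax of the Perl package or of ICU; the helper names `family_class` and `family_pattern` are hypothetical. It relies on the layout of the Unicode Ethiopic block (U+1200 to U+137F), where most syllables sit in rows of eight code points sharing one consonant, so "any vowel order of this consonant" becomes an eight-code-point range.

```python
import re

ETHIOPIC_START, ETHIOPIC_END = 0x1200, 0x137F  # Unicode "Ethiopic" block

def family_class(syllable):
    """Return a regex character class covering the 8-slot row of `syllable`."""
    cp = ord(syllable)
    if not ETHIOPIC_START <= cp <= ETHIOPIC_END:
        raise ValueError("not in the Ethiopic block: %r" % syllable)
    base = cp & ~0x7  # round down to the first code point of the row
    return "[{}-{}]".format(chr(base), chr(base + 7))

def family_pattern(word):
    """Build a pattern matching `word` with any vowel order on each syllable."""
    return re.compile("".join(family_class(ch) for ch in word))

# The consonant skeleton of ሰበረ should also match forms such as ስበር.
pattern = family_pattern("ሰበረ")
print(bool(pattern.fullmatch("ስበር")))
print(bool(pattern.fullmatch("ነገረ")))
```

This is a simplification: some rows of the block hold labialized forms, punctuation, or digits rather than a full consonant family, and a real package would need to treat those separately.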
Contact: 
Verb Lexicons
A developing database of verbs and their base (pre-inflected) forms will be expanded to cover as much of the Amharic verb space as possible. The student will devise a plan to expand the verb set and to validate it (computationally, manually, or by a combination of the two). The output of the work will be a verb database representing a map of verb bases and their potential states. Future projects can use the database for conflation and stemming. The student will benefit from learning computational approaches to verb morphology.
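The sketch below shows one way such a map could later serve conflation. The structure, the function names, and the handful of entries are assumptions made for illustration only; they do not describe the existing database.

```python
# Hypothetical miniature lexicon: base form -> known inflected forms.
VERB_LEXICON = {
    "ሰበረ": ["ሰበረ", "ሰበረች", "ሰበሩ"],
    "ነገረ": ["ነገረ", "ነገረች", "ነገሩ"],
}

def build_conflation_index(lexicon):
    """Invert base -> forms into surface form -> set of candidate bases."""
    index = {}
    for base, forms in lexicon.items():
        for form in forms:
            index.setdefault(form, set()).add(base)
    return index

def conflate(index, token):
    """Return the sorted candidate bases for a token, or [] if it is unknown."""
    return sorted(index.get(token, ()))

index = build_conflation_index(VERB_LEXICON)
print(conflate(index, "ሰበረች"))
```

Keeping a set of bases per surface form allows for ambiguous forms that could derive from more than one verb, which a validation step would need to review.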
Contact: 
Amharic Corpus Development
Collections of corpus material are available in Amharic but are not in formats convenient for conducting research. This project would assess and characterize the status of the various collections, propose a strategy for converting them to the Text Encoding Initiative (TEI) format in Unicode, and then undertake the conversion and archiving of the collections. The result of this work will be a high-quality corpus, available to future researchers, that can serve as the basis of thesis investigations. The student will gain valuable experience in document management and corpus-based linguistics.
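As a starting point for the conversion step, the following sketch wraps already-Unicode paragraphs in a minimal TEI document. The function name, the placeholder header text, and the sample input are hypothetical; converting legacy font encodings to Unicode is a separate problem that is not shown, and a real corpus would need a much richer header.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def _el(parent, tag, text=None):
    """Append a TEI-namespaced child element and return it."""
    el = ET.SubElement(parent, "{%s}%s" % (TEI_NS, tag))
    el.text = text
    return el

def to_tei(title, source_note, paragraphs):
    """Return a minimal TEI document (as a string) for a list of paragraphs."""
    root = ET.Element("{%s}TEI" % TEI_NS)
    file_desc = _el(_el(root, "teiHeader"), "fileDesc")
    _el(_el(file_desc, "titleStmt"), "title", title)
    # Placeholder statement; a real header would record availability details.
    _el(_el(file_desc, "publicationStmt"), "p", "Unpublished research corpus.")
    _el(_el(file_desc, "sourceDesc"), "p", source_note)
    body = _el(_el(root, "text"), "body")
    for para in paragraphs:
        _el(body, "p", para)
    return ET.tostring(root, encoding="unicode")

print(to_tei("Sample document", "Hypothetical source description", ["ሰላም"]))
```

Generating the markup through an XML library rather than string concatenation means that characters such as `&` and `<` in the source text are escaped automatically.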
Contact: 