
Suggested Research

by Daniel Yacob last modified 2007-02-02 08:23

Areas of suggested research on or related to Amharic NLP.


To Submit Topics

  1. Give a brief description of the topic.
  2. Explain the benefit of the research to the public.
  3. Explain the benefit to the student who undertakes the research. 
  4. Provide your contact information (email addresses will be converted to images to avoid spam).
  5. Submit topics to: Samuel Eyassu


Thesis Topics


Classification Using Any of the Neural Network (NN) Topologies

Applying any NN topology, for instance Learning Vector Quantization (LVQ), to the classification of Ethiopic texts is one potential research area. Besides allowing the results to be compared with those of schemes that have already been tested, the work contributes to the pool of knowledge on the classification performance of the chosen topology. The student will gain valuable knowledge of NLP, classification and NN.
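
As a rough illustration of what an LVQ classifier does, the sketch below implements the basic LVQ1 update rule in plain Python: the prototype nearest to a training sample is pulled toward it when their labels agree and pushed away otherwise. The two-dimensional feature vectors, the category labels "sport" and "politics", the learning rate and the epoch count are all hypothetical placeholders; a real study would use term-weight vectors derived from Ethiopic documents.

```python
import math

def nearest(prototypes, x):
    """Index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: math.dist(prototypes[i][0], x))

def train_lvq1(samples, prototypes, lr=0.3, epochs=20):
    """LVQ1: attract the winning prototype if its label matches, repel otherwise."""
    protos = [(list(vec), label) for vec, label in prototypes]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, label in samples:
            i = nearest(protos, x)
            vec, proto_label = protos[i]
            sign = 1.0 if proto_label == label else -1.0
            moved = [v + sign * rate * (xi - v) for v, xi in zip(vec, x)]
            protos[i] = (moved, proto_label)
    return protos

def classify(protos, x):
    """Label of the prototype nearest to x."""
    return protos[nearest(protos, x)][1]

# Hypothetical training data: (feature vector, category label).
SAMPLES = [
    ([0.9, 0.1], "sport"), ([0.8, 0.2], "sport"), ([0.7, 0.1], "sport"),
    ([0.1, 0.9], "politics"), ([0.2, 0.8], "politics"), ([0.1, 0.7], "politics"),
]
# One initial prototype per category (hypothetical starting points).
INITIAL = [([0.6, 0.4], "sport"), ([0.4, 0.6], "politics")]

PROTOS = train_lvq1(SAMPLES, INITIAL)
```

Calling `classify(PROTOS, vector)` then returns the label of the nearest trained prototype, which is the step whose accuracy would be compared against other classification schemes.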

Contact: Samuel Eyassu


Information Filtering (IF) Using LSI Model

The performance of IF using an NN model was tested as part of a thesis by Ato Lemma Nigussie. The Latent Semantic Indexing (LSI) model, however, with its capacity to project the term-by-document matrix into term-by-concept and concept-by-document spaces, lets researchers deal with documents in a concept space rather than a term space. The model is therefore worth investigating for its lower-dimensional pseudo-documents, easier preprocessing, and representation of documents by their concepts. Besides allowing its performance to be compared with that of schemes that have already been tested, the research contributes to the pool of knowledge on the IF performance of the LSI model. The student will gain valuable knowledge of IF, LSI and NLP.
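
The projection described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the term-by-document counts are invented, and the leading concept directions are approximated by power iteration on the term-by-term matrix instead of the library SVD routine a real LSI system would use. Each document column is then expressed in a two-dimensional concept space, where documents can be compared by cosine similarity.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_concepts(term_doc, k, iters=100):
    """Approximate the k leading left singular vectors of term_doc
    (rows = terms, columns = documents) by power iteration with
    orthogonalization against the concepts already found."""
    gram = [[dot(ri, rj) for rj in term_doc] for ri in term_doc]  # A * A^T
    concepts = []
    for _concept in range(k):
        v = [1.0 / (i + 1) for i in range(len(term_doc))]  # fixed start vector
        for _step in range(iters):
            w = [dot(row, v) for row in gram]
            for u in concepts:  # remove directions already extracted
                p = dot(w, u)
                w = [wi - p * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            v = [wi / norm for wi in w]
        concepts.append(v)
    return concepts

def to_concept_space(concepts, doc_vector):
    """Project a term-space document vector onto the concept directions."""
    return [dot(u, doc_vector) for u in concepts]

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

# Hypothetical counts: 4 terms (rows) by 4 documents (columns).
TERM_DOC = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 2],
]

CONCEPTS = top_concepts(TERM_DOC, k=2)
DOCS = [to_concept_space(CONCEPTS, list(col)) for col in zip(*TERM_DOC)]
```

In this toy matrix the first two documents share one set of terms and the last two share another, so in concept space the first pair becomes nearly identical while documents from different groups stay nearly orthogonal. An incoming document in a filtering task would be projected with `to_concept_space` in the same way before being compared with a user profile.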

Contact: Samuel Eyassu


Ontological Modeling of Amharic Verbs

Amharic verbs have been well studied and morphological models do not vary by a wide degree. At this time, however, no computational models are available for Amharic, and the verb surveys available in print references cover only limited collections of verbs and lack the depth needed for computational applications. The output of this research will be a comprehensive semantic model of Amharic verbs with classified instances that will constitute a knowledge base. The knowledge base will be an aid for machine translation and computational morphology of Amharic.

This project would apply semantic technology resources to develop a model for Amharic verbs using a survey of approximately 2,000 verbs. Verbs will be classified initially according to the classic taxonomies (I, II, III, etc.); class properties and subclasses will then be added to account for all characteristics of the verbs and their relationships to one another and to verbs of the Semitic family. The student will gain in-depth knowledge of ontologies, modeling techniques and verb morphology.

Contact: Daniel Yacob


Projects: Graduates or Undergraduates


Ethiopic Regular Expressions in Java

Regular expressions (regex) are a labor-saving syntax for describing textual patterns and have become pervasive in high-level and scripting languages alike. Regex patterns are applicable to problems of text retrieval and string matching in many scenarios. Regex syntax has been developed around the needs of Roman script and as such does not offer the same level of utility for expressing patterns specific to the characteristics of Ethiopic script. In this project the student will port a Perl-based regex package for Ethiopic to the Java language, building on the International Components for Unicode (ICU) class libraries.

The output of this research will be a Java-based regular expressions package for Amharic that can become an invaluable resource for future researchers as well as for the private sector. The resulting package will be submitted to the ICU project for consideration for permanent inclusion in the open source project. The student researcher will benefit from working with a world-class open source project (ICU) and will gain a deep understanding of text pattern matching.
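
To give a feel for the kind of script-specific pattern that generic regex syntax does not express directly, here is a minimal sketch, written in Python for brevity even though the project itself targets Java and ICU. It relies on the layout of the Unicode Ethiopic block, where the syllables of one consonant family occupy an aligned row of eight code points (some positions in some rows are unassigned or used for other forms). The helper name `family_class` is hypothetical and is not taken from the Perl package mentioned above.

```python
import re

# Syllable range of the main Unicode Ethiopic block (U+1200 to U+135A).
SYLLABLES = "\u1200-\u135a"
WORD = re.compile("[" + SYLLABLES + "]+")

def family_class(syllable):
    """Return a regex character class covering the 8-code-point row that
    contains the given Ethiopic syllable (hypothetical helper)."""
    cp = ord(syllable)
    if not 0x1200 <= cp <= 0x135A:
        raise ValueError("not an Ethiopic syllable: %r" % syllable)
    base = cp & ~0x7  # start of the aligned row of eight code points
    return "[%s-%s]" % (chr(base), chr(base + 7))

# Sample text: ሰላም ለዓለም, written with escapes to keep the code points explicit.
TEXT = "\u1230\u120b\u121d \u1208\u12d3\u1208\u121d"

# Match any syllable in the same family as ለ (U+1208), whatever its vowel.
LA_FAMILY = re.compile(family_class("\u1208"))
```

With these definitions, `LA_FAMILY.findall(TEXT)` picks out ላ, ለ and ለ from the sample text, and `WORD.findall(TEXT)` splits it into its two words. A ported package would presumably hide the code point arithmetic behind a more readable pattern syntax.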

Contact: Daniel Yacob

Verb Lexicons

A developing database of verbs and their base (pre-inflected) forms will be expanded toward full coverage of the Amharic verb space. The student will devise a plan to expand the verb database and validate its entries (computationally, manually, or a combination of the two). The output of the work will be a verb database representing a map of verb bases and their potential states. Future projects can utilize the database for conflation and stemming. The student will benefit from learning computational approaches to verb morphology.
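
One way such a database could feed a conflation step is sketched below, assuming the lexicon is available as a simple mapping from a base form to its known surface forms. The data structure, the function names and the single hand-picked entry (ሰበረ with two of its perfective forms) are illustrative assumptions only, not the format of the existing database.

```python
# Assumed layout: base form -> list of known inflected surface forms.
VERB_LEXICON = {
    "\u1230\u1260\u1228": [           # ሰበረ
        "\u1230\u1260\u1228\u127d",   # ሰበረች
        "\u1230\u1260\u1229",         # ሰበሩ
    ],
}

def build_conflation_index(lexicon):
    """Invert the lexicon so that every surface form points to its base."""
    index = {}
    for base, forms in lexicon.items():
        index[base] = base
        for form in forms:
            index[form] = base
    return index

def conflate(index, tokens):
    """Replace each known verb form by its base; leave other tokens unchanged."""
    return [index.get(token, token) for token in tokens]

INDEX = build_conflation_index(VERB_LEXICON)
```

A lookup table of this kind only covers forms that were entered or generated in advance, which is why the validation plan mentioned above matters for the quality of any later stemming work.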

Contact: Daniel Yacob

 

Amharic Corpus Development

Collections of corpus material are available in Amharic but are not in formats convenient for conducting research. This project would assess and characterize the status of the various collections, propose a strategy for converting them to the Text Encoding Initiative (TEI) format in Unicode, and then undertake the conversion and archiving of the collections. The result of this work will be a high-quality corpus, available to future researchers, that can serve as the basis of thesis investigations. The student will gain valuable experience in document management and corpus-based linguistics.
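
The final wrapping step of such a conversion might look like the following sketch, which assumes the text has already been converted to Unicode and targets a minimal TEI P5 skeleton (a header with title, publication and source statements, followed by a body of paragraphs). The function name, the header wording and the sample paragraphs are placeholders; mapping legacy font encodings to Unicode, which is likely the harder part, is not shown.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def to_tei(title, paragraphs, source_note):
    """Wrap Unicode paragraphs in a minimal TEI document, returned as UTF-8 bytes."""
    def el(parent, tag, text=None):
        node = ET.SubElement(parent, "{%s}%s" % (TEI_NS, tag))
        node.text = text
        return node

    root = ET.Element("{%s}TEI" % TEI_NS)
    file_desc = el(el(root, "teiHeader"), "fileDesc")
    el(el(file_desc, "titleStmt"), "title", title)
    el(el(file_desc, "publicationStmt"), "p", "Unpublished research corpus file.")
    el(el(file_desc, "sourceDesc"), "p", source_note)
    body = el(el(root, "text"), "body")
    for paragraph in paragraphs:
        el(body, "p", paragraph)
    return ET.tostring(root, encoding="utf-8")

# Placeholder content: two short Ethiopic strings standing in for real paragraphs.
PARAGRAPHS = ["\u1230\u120b\u121d", "\u1208\u12d3\u1208\u121d"]
SAMPLE = to_tei("Sample document", PARAGRAPHS, "Converted from a legacy collection.")
```

Keeping the source description in the header of every file records where each text came from, which supports the assessment and archiving goals of the project.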

Contact: Daniel Yacob



