g2 is a command-line utility that provides a front end to the [LibEth] character transcoding services. g2 ships with the [LibEth] source code but is otherwise undocumented. This Wiki page attempts to provide some useful documentation:

Why "g2"?

Prior to g2 there was a sera2any utility, which collected together a number of separate sera2xyz utilities. The sera2any name was in turn inspired by the Mule any2ps utility. g2 accepts more than just [SERA] as input, and the name is short for "Ge'ez To Any" (or geez2any).

Building

Building g2 assumes that you have already built and installed the LibEth library. g2 can be dynamically linked against the shared LibEth library (typically libeth.so, residing in /usr/lib or /usr/local/lib), which reduces the size of the g2 executable. Alternatively, LibEth can be built into the executable by compiling against the static library (libeth.a). This results in a larger g2 executable, but one that can be copied between Linux systems (or between Solaris systems, etc.) and should still be able to run.

With a shared libeth

gcc gezXfer.c common.c tables.c -leth -lm -o g2

With a static libeth

gcc gezXfer.c common.c tables.c /path/to/libeth.a -lm -o g2

Notes

-lm links in the math library, which is needed only for the pow function. If LibEth was built to use its internal pow function, then the math library is not required and the -lm flag may be omitted.

-g can be added if you want to debug g2. Likewise LibEth must be compiled with the -g flag if your debugging session is to enter the libeth routines.

Usage

g2 [options] filein > fileout

By default, [SERA] input and UTF-8 output are assumed. g2 recognizes the following command line switches; archaic switches are not documented here:

Flag Argument Meaning
-fromdos none Remove DOS ^M (carriage return) at end of lines.
-todos none Insert DOS ^M (carriage return) at end of lines. Same as -tvout dos. Cannot be combined with -tvout options.
-h none Help. Presently out of date.
-html none Indicates input document is HTML. HTML tags will not be transliterated.
-l <iso-639-code> Set language context for input/output, one of: am - Amharic
amh - Amharic
ti - Tigrinya
tir - Tigrinya
gz - Ge'ez
gez - Ge'ez
la - Latin
lat - Latin
The default context is Tigrinya. This flag is useful when working with [SERA] transliteration, which interprets some letters differently depending on the language. For example: Amharic: a ⇒ አ
Tigrinya: a ⇒ ኣ
So to apply Amharic interpretation rules:
./g2 -l amh filein > fileout
-i <input-encoding> Input encoding, one of:
-o <output-encoding> Output encoding, one of: uni - Unicode, UTF-8 encoding assumed (default output)
aausisa - Transliteration system in use at SISA.
acis -
acuwork -
addis98 -
addisword -
addiswp -
alpas -
braille - Ethiopic Braille convention under Unicode Braille support.
brana - Brana
cbhale - CBHale encoding (multifont).
dehai - Dehai email network's transliteration system.
dejene -
ed - "Ed" transliteration system used by the SIL "Ed" editor for Amharic.
enhpfr -
ethiome - EthioMicroEmacs encoding (multifont).
ethiop - Ethiop transliteration.
ethiopic -
ethiosoft -
ethiosys - EthioSystems encoding (multifont)
ethiowalia -
fidel -
geez -
geezab -
geezbausi -
geezedit -
geezfont -
geezinga -
geeztypenet - Phonetic Systems Ge'ezTypeNet font encoding.
ies - Institute of Ethiopian Studies transliteration system.
image - Output as links to images (old ENH system).
iso - ISO transliteration system for Ethiopic (old proposal).
jis - Japanese Industrial Standard, used for Ethiopic before web browsers supported Unicode.
jun - Short for "JUNET" - Japanese Unix Network encoding used in Mule.
latex - LaTeX encoding (if TeX support is enabled).
mainz - Mainz University's transliteration system.
mono -
monoalt -
nci - New Concepts Incorporated encoding.
ncic - National Computer and Information Center encoding (used in Agafari fonts).
ncic_et - NCIC modified encoding of the Ejji Tsihuf font.
omnitech - OmniTech corporation's encoding.
phonetic - Phonetic Systems encoding.
powergeez - PowerGe'ez encoding.
qubee - Qubee transliteration.
sera - System for Ethiopic Encoding in ASCII
tex - TeX encoding (if TeX support is enabled).
tfanus -
tfanusnew -
visgeez - VisualGe'ez encoding.
visgeez2k - VisualGe'ez 2000 encoding.
wazema - Wazema encoding.
-tvin Input encoding variant, or "secondary encoding", one of: utf7 - UTF-7, 7-Bit UCS Transformation Format
utf8 - UTF-8, 8-Bit UCS Transformation Format, the default with -i uni
utf16 - UTF-16, 16-Bit UCS Transformation Format, or "two byte" Unicode.
-tvout Output encoding variant, or "secondary encoding", one of: clike - Lowercase "C-Like" character escape: ካ ⇒ \x12ab.
Clike - Uppercase "C-Like" character escape: ካ ⇒ \x12AB.
decimal - Decimal address value format: ካ ⇒ d4779.
dos - Insert DOS ^M (carriage return) at end of lines. Same as -todos. Cannot be combined with other -tvout options.
escd - XML/HTML entity in decimal form: ካ ⇒ &#4779;.
esch - XML/HTML entity in lowercase hexadecimal form: ካ ⇒ &#x12ab;.
Esch - XML/HTML entity in uppercase hexadecimal form: ካ ⇒ &#x12AB;.
java - Lowercase Java character escape: ካ ⇒ \u12ab.
Java - Uppercase Java character escape: ካ ⇒ \u12AB.
name - Lowercase Unicode character name: ካ ⇒ ethiopic syllable kaa.
Name - Uppercase Unicode character name: ካ ⇒ ETHIOPIC SYLLABLE KAA.
uplus - Lowercase U+wxyz character escape: ካ ⇒ U+12ab.
Uplus - Uppercase U+WXYZ character escape: ካ ⇒ U+12AB.
utf7 - UTF-7, 7-Bit UCS Transformation Format
utf8 - UTF-8, 8-Bit UCS Transformation Format, the default with -o uni
utf16 - UTF-16, 16-Bit UCS Transformation Format, or "two byte" Unicode.
xml - Lowercase XML tag character escape: ካ ⇒ <U12ab>.
Xml - Uppercase XML tag character escape: ካ ⇒ <U12AB>.
zerox - Lowercase 0x character escape: ካ ⇒ 0x12ab.
Zerox - Uppercase 0x character escape: ካ ⇒ 0x12AB.
-rtf none Make output in RTF. This feature works with the circa 1997 definition of RTF.
-s none Substitute Latin spaces with Ge'ez wordspace.
-S <string> Convert the string following the flag instead of reading from a file or stdin. Use quotation marks when the string contains several words separated by spaces. This flag is useful for quickly looking up a character address. Example:
./g2 -tvout uplus -S ka ⇒ U+12ab
./g2 -tvout java -S ka ⇒ \u12ab
./g2 -tvout esch -S ka ⇒ &#x12ab;
-stats <output-encoding> Print tables of statistics in fidel.out and fidel2.out. This is likely broken but repairable.
-u none Make output UPPERCASE. This flag can be used with: -clike
-esch
-java
-name
-uplus
-xml
-zerox
Hence ./g2 -tvout Uplus ... and ./g2 -tvout uplus -u ... are identical.
-v none Print version and exit.
-x none Close string. When used with the -html option, this ensures that, if appropriate, a closing </font> tag closes a block of text. This makes more sense when used with blocks of text through the Perl interface and not on the command line (I'm probably forgetting the use case here).
-z, -0 none Treat ዐ (Ayn-Ge'ez) as 0 (Zero) in a numeric context, e.g.: 1ዐ2, ዐ234, 12.ዐ5, 12,ዐዐ5. This was a common problem with Geezigna documents.
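
Two behaviours in the table above can be illustrated with a short Python sketch. It is not taken from g2 or LibEth: the function names tvout_escapes and ayn_to_zero are hypothetical, the escape strings are reproduced from the ካ examples listed under -tvout, and the rule used to decide what counts as a "numeric context" for -z is an assumption made for illustration only.

```python
import re
import unicodedata

AYN = "\u12D0"  # ዐ ETHIOPIC SYLLABLE PHARYNGEAL A


def tvout_escapes(ch: str) -> dict:
    """Lowercase escape forms for one character, mirroring the -tvout examples."""
    cp = ord(ch)
    return {
        "clike": f"\\x{cp:04x}",
        "decimal": f"d{cp}",
        "escd": f"&#{cp};",
        "esch": f"&#x{cp:04x};",
        "java": f"\\u{cp:04x}",
        "name": unicodedata.name(ch).lower(),
        "uplus": f"U+{cp:04x}",
        "zerox": f"0x{cp:04x}",
    }


# Assumption: a numeric context is a run of ASCII digits, ዐ, '.' and ','
# that starts with a digit or ዐ, contains at least one ASCII digit, and is
# not attached to other Ethiopic letters (U+1200..U+137F) on either side.
NUMERIC_RUN = re.compile(
    r"(?<![\u1200-\u137F])[0-9\u12D0][0-9\u12D0.,]*(?![\u1200-\u137F])"
)


def ayn_to_zero(text: str) -> str:
    """Replace ዐ with 0 inside runs that look numeric; leave words alone."""
    def fix(match):
        run = match.group(0)
        if any(c in "0123456789" for c in run):
            return run.replace(AYN, "0")
        return run
    return NUMERIC_RUN.sub(fix, text)


if __name__ == "__main__":
    for variant, value in tvout_escapes("\u12AB").items():  # ካ
        print(variant, value)
    print(ayn_to_zero("1ዐ2 ዐ234 12.ዐ5 12,ዐዐ5"))
```

For ካ (U+12AB) the uplus, java and esch entries agree with the -S examples above. The capitalised variants differ only in the case of the hexadecimal digits or of the character name.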

Converting a Text File

notes on converting a text file

Converting an HTML File

notes on converting an html file

Converting an MS Word File

notes on converting an ms word file