(see Project Page at: http://www.meteck.org/files/geni/index.html ; the following is a snapshot of that page as of August 2015)
Project funded by the National Research Foundation of South Africa under the Competitive Programme for Rated Researchers (CPRR) -- Y-rated development grant, 2015-2017 (3 years)
Introduction and background
The use of natural languages in applications is ubiquitous. Canned, unchangeable text suffices for some scenarios, but not when the information to be communicated depends on the context or involves large amounts of text. This is addressed by controlled natural languages and natural language generation (NLG) systems, which take structured data or knowledge as domain input and match it at runtime with templates or a grammar engine to generate the text. NLG systems mainly focus on generating English, however, and neither an NLG system nor sufficient theoretical foundations exist for the indigenous South African languages, despite the need for them. Preliminary results in isiZulu NLG have shown that a template-based approach is infeasible for Bantu languages, mainly due to their complex grammar rules, noun class system, and agglutination. Thus, extant NLG systems cannot be adopted for Bantu languages, and a grammar engine is required to obtain automatically generated, understandable text.
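To illustrate why a fixed template falls short, consider the quantifier "all": in English it is the same string for every noun, whereas in isiZulu its form has to agree with the noun class of the noun it quantifies. The following is a minimal sketch only, not the project's grammar engine. The function names, the four-noun lexicon, and the small agreement table are assumptions made for this example, and the word forms should be checked against an isiZulu grammar reference.

```python
# Illustrative sketch: a tiny hand-written lexicon and agreement table.
# All names and entries here are assumptions for the example, not project data.

# Noun -> noun class (only four nouns, chosen for illustration).
NOUN_CLASS = {
    "umuntu": 1,   # person
    "abantu": 2,   # people
    "inja": 9,     # dog
    "izinja": 10,  # dogs
}

# Form of the quantifier "all/every" for each noun class covered above.
ALL_BY_CLASS = {
    1: "wonke",
    2: "bonke",
    9: "yonke",
    10: "zonke",
}


def english_all(noun: str) -> str:
    """Fixed template: the quantifier never changes."""
    return f"all {noun}"


def zulu_all(noun: str) -> str:
    """Rule-based: look up the noun class, then pick the agreeing quantifier."""
    noun_class = NOUN_CLASS[noun]
    return f"{ALL_BY_CLASS[noun_class]} {noun}"


if __name__ == "__main__":
    for noun in ("abantu", "izinja"):
        print(zulu_all(noun))
```

Even in this toy case the output depends on a lexical property of the noun, so the sentence cannot be written once with a slot to fill. A full grammar engine would additionally have to handle the remaining noun classes, verb conjugation, and agglutination.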
Aims
The aims of this project are to define the formal and algorithmic foundations for an isiZulu/isiXhosa grammar engine and to implement it to realize a (controlled) NLG system. The project will uncover sentence and linguistic realization patterns, postulated to be very similar for isiZulu and isiXhosa, and it will ensure that multilingualism is incorporated. The rules, together with modular and efficient algorithms, will make the grammar usable for computation. These will be optimized with respect to the linguistic annotations of the input and text generation at runtime. A proof-of-concept grammar engine for isiZulu/isiXhosa will be developed to validate the theory. To ensure broad usability and interoperability with related theoretical and technological advances, such as linguistic linked data and ontology-driven information systems, the engine will take as input domain knowledge represented in ontologies serialized in the Semantic Web language OWL, which also facilitates incremental system development.