A Grammar engine for Nguni natural language interfaces (GeNi)

(see Project Page at: http://www.meteck.org/files/geni/index.html ; the following is a snapshot of that page as of August 2015)

Project funded by the National Research Foundation of South Africa under the Competitive Programme for Rated Researchers (CPRR) -- Y-rated development grant, 2015-2017 (3 years)

Overview

Introduction and background

The use of natural languages in applications is ubiquitous. Canned, unchangeable text suffices for some scenarios, but not when the information to be communicated depends on the context and involves large amounts of text. Controlled natural languages and natural language generation (NLG) systems address this: they take structured data or knowledge as domain input, which is matched at runtime with templates or a grammar engine to generate the text. NLG systems mainly focus on generating English, however, and neither an NLG system nor sufficient theoretical foundations exist for the indigenous South African languages, despite the need for them. Preliminary results in isiZulu NLG have shown that a template-based approach is infeasible for Bantu languages, mainly because of their complex grammar rules, noun class system, and agglutination. Thus, extant NLG systems cannot be adopted for Bantu languages, and a grammar engine is required to generate understandable text automatically.
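As a rough illustration of why fixed templates break down under a noun class system, the sketch below contrasts a canned template with a class-aware lookup for the isiZulu quantifier "-onke" ("all"). The three-noun lexicon, the table of quantifier forms, and the function names `template_all` and `grammar_all` are assumptions made for this example only; they are not part of the GeNi project's code, and a real grammar engine would cover all noun classes and many more agreement phenomena.

```python
# Illustrative sketch only: a tiny hand-made lexicon mapping a few plural
# isiZulu nouns to their noun class (an assumption for this example).
NOUN_CLASS = {
    "abantu": 2,   # people
    "imithi": 4,   # trees / medicines
    "izinja": 10,  # dogs
}

# Form of the quantifier "-onke" ("all") for the classes used above.
# The form must agree with the class of the noun it quantifies.
ALL_BY_CLASS = {2: "bonke", 4: "yonke", 10: "zonke"}


def template_all(noun: str) -> str:
    """Canned template with one fixed quantifier form, as in English 'all <noun>'."""
    return f"bonke {noun}"


def grammar_all(noun: str) -> str:
    """Pick the quantifier form that agrees with the noun's class."""
    return f"{ALL_BY_CLASS[NOUN_CLASS[noun]]} {noun}"


if __name__ == "__main__":
    for noun in NOUN_CLASS:
        print(f"template: {template_all(noun):<14} grammar: {grammar_all(noun)}")
```

The template happens to be right for class 2 nouns such as "abantu", but produces a disagreeing quantifier for nouns of the other classes, whereas the lookup selects the form per class. One template per class would be needed for every such construct, which is the kind of blow-up that motivates a grammar engine.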

Aims

The aims of this project are to define the formal and algorithmic foundations for an isiZulu/isiXhosa grammar engine and to implement it to realize a (controlled) NLG system. The project will uncover sentence and linguistic realization patterns, postulated to be very similar for isiZulu and isiXhosa, and it will ensure incorporation of multilingualism. Rules and modular, efficient algorithms will make the grammar usable for computation; these will be optimized for linguistic annotation of the input and for text generation at runtime. A proof-of-concept grammar engine for isiZulu/isiXhosa will be developed to validate the theory. To ensure broad usability and interoperability with related theoretical and technological advances, such as linguistic linked data and ontology-driven information systems, it will take as input domain knowledge represented in ontologies serialized in the Semantic Web language OWL, which also facilitates incremental system development.
 

Participants and collaborators

  • Maria Keet (PI), Department of Computer Science, University of Cape Town (UCT)
  • Catherine Chavula (PhD student), Department of Computer Science, UCT
  • Joan Buyamugisha (PhD student), Department of Computer Science, UCT
  • Lyneve Laing (MSc student), Department of Computer Science, UCT
  • TBA (MSc student), Department of Computer Science, UCT
     
  • Langa Khumalo, Linguistics Program, School of Arts, University of KwaZulu-Natal
  • Mantoa Smouse, African Languages and Literatures Section, UCT
  • Zukile Jama, African Languages and Literatures Section, UCT
  • Somikazi Deyi, African Languages and Literatures Section, UCT

Outputs

  • Publications
    • Chavula, C., Keet, C.M. An Orchestration Framework for Linguistic Task Ontologies. 9th Metadata and Semantics Research Conference (MTSR'15), Springer CCIS. 9-11 September, 2015, Manchester, UK. (accepted)
  • Presentations
    • TBA
  • Other dissemination
    • TBA