Open research areas and topics

Hussein's primary research areas are Digital Libraries, Information Retrieval and ICT for development.

From a digital libraries perspective, my current foci are, firstly, on the architecture of highly distributed interoperable and scalable Internet-based information systems and, secondly, on digital preservation, especially of cultural heritage. Digital libraries is a relatively new research area, at the intersection of Computer Science, computer networking and information sciences. From a Computer Science perspective, there are various technical issues that need to be resolved to support the ultimate aim of enabling simpler access to more information of a higher quality to all users of online and electronic systems. I have worked closely with the Open Archives Initiative (http://www.openarchives.org) and currently work with the Networked Digital Library of Theses and Dissertations (http://www.ndltd.org) and have worked with the various groups promoting Open Access in South(ern) Africa, thus collaborating with institutions and individual researchers on a wide and distributed scale. I have active and ongoing collaborations with UCT's Fine Arts, Archaeology and Geomatics departments related to the preservation of Bushman and other African heritage.

From an ICT4D perspective, I am interested in all aspects of development and how technology can support these goals. This includes education, healthcare, democracy, the prevention of conflict, employment, infrastructure, service delivery and especially the development of human dignity and the maintenance of human rights. My view on ICT4D is that development is not about different systems for "developing countries", but about a carefully-constructed plan for progress in any society, without resorting to the deficit model that traditionally maintains that some countries may be called "developed". I have a particular interest in educational technology.

At the intersection of ICT4D and Information Retrieval, I am interested in the application of IR to development and the development of African language information retrieval systems. I have worked with students in recent years to investigate isiZulu and isiXhosa systems, as well as other African languages.

I am interested in working with motivated postgraduate students who share my somewhat idealistic passion for improving the lives of people by removing barriers to information and computing. The ideas listed below are therefore far from exhaustive and all wild and wacky ideas are welcome and encouraged.

General Areas

ICT in Education One of the biggest challenges in South Africa and many African countries is the state of education, especially in schools. South Africa, particularly, has one of the worst education systems in the world (by recent Math and Science assessments). Various technology interventions have been proposed to address this, including distribution of textbooks, videos, laptops; the creation of computer labs in schools; and the creation of portals for sharing of high quality educational material. Nevertheless, there is still scope for more innovative interventions that work in spite of the current environment.

Digital Library Architecture In attempting to move closer to the goal of making information readily available to users, managed and flexible information systems must be placed within the grasp of all institutions and archivists. As such, the architecture of digital libraries needs to be simple but flexible. Ongoing research in this area, at UCT and with various international collaborators, is producing component models, frameworks, visual interfaces and specification languages for the construction of custom digital libraries without the need for custom software development. There is still much scope for additional work in these aspects as well as methodologies for component packaging and user interface workflow definition that is relevant not only to digital libraries but all online systems.

Digital Preservation of Cultural Heritage South Africa has many important collections of information such as the Bleek and Lloyd Collection documenting the Bushman languages, the DISA project documenting the struggle for liberation and the District Six museum. A recurring problem with such projects is the difficulty in managing the process of digitising and creating and managing data and metadata electronically. There is much scope for improvement in usability and in the creation of tools specifically aimed at heritage collection (by scanning, oral recordings, etc.) and preservation. Projects related to this are not about algorithms but innovative interventions to safeguard history.

African Languages Information Retrieval South African (or other African) languages are arguably marginalized online and most local information is available only in English. This project will investigate the creation of a search engine for African languages. There are multiple aspects of this project that may be tackled independently: the creation of a focused crawler for African language documents; the organization and presentation of African language documents as a preservation archive; and the development of core information retrieval algorithms (stemmers, morphological analyzers, cross-lingual transformations) for African languages. The ultimate goal is to have a specialized search engine where users can find African language documents easily and with greater accuracy than language-independent search engines.

Information Retrieval for Development How do we specialise search engines to enable development or address developmental goals? Do we need specific basic algorithms or specialised user interfaces or both? Is a greater study of users needed? How are such systems influenced by the typical infrastructure in so-called developing countries?

Some Specific Project Ideas (open)

Note: Most of these projects are appropriate for Honours and Masters study.

Electronic heritage resources in local languages The Bleek and Lloyd Collection of Bushman heritage artefacts preserves the culture of an important group of South Africans. The online archive provides users with copies of notebooks containing stories in English, |xam and !kun. The direct descendants of these Bushman groups do not, however, speak any of these languages, so the information is not useful to them without translation and reinterpretation. Simultaneously, the reinterpretation and translation may provide useful information to researchers studying the original texts.
Two recently completed projects looked into using AI and crowdsourcing to create transcriptions of the text, while previous work from researchers included manual transcription. This project will look into how to present these multiple representations of the information so that users viewing the information can use the most appropriate representation and/or contribute/enhance the existing information. Examples of similar projects are Google Goggles and how Chrome does translation of Web pages. The work will make research contributions in the area of novel interfaces for the creation, display and enhancement of heritage information.

Mobile devices and Heritage Archives to support Education Archaeological datasets encode and preserve key elements of South African heritage, such as the Bushman rock art. While such collections are an invaluable tool to researchers, they are not generally accessible to the public or anyone outside the tertiary education sector. This project is on the presentation of archaeological data for the express purpose of pre-university learning. In particular, a large number of annotated images of rock art are part of the Bleek and Lloyd collection at UCT and the Rock Art Institute's collection at Wits. It should be possible for a school pupil walking through a cave to take a picture of some rock art and instantly find out what she is looking at (based on image retrieval). This would greatly enhance the experience of learning from primary source materials. It could also be possible to perform such exploration based on photographs and other static representations so it is not necessary to go to the physical caves to experience the rock art. This project partly builds on a previous project (School of Rock Art - Honours 2012) where a user could take a virtual tour through a cave, with annotated, overlaid and hyperlinked data to enhance the experience. The emphasis in both cases is on supporting learning through interactive presentation of information. Major aspects of this project will include the image retrieval/matching mechanism and designing the interaction between users and the data. Experiments with users will confirm the learning aspects of the project. (This was done by Ayodeji Olojede in 2014-2016, but extensions are possible)

High Performance Computing in Developing Countries HPC techniques have become increasingly popular for solving computationally-intensive problems. However, many of these solutions are not applicable in the African context because of limited computational, storage or network resources. A general aim of many ongoing efforts is to adapt scalable solutions to local conditions, thereby making HPC more practical for those without supercomputers or massive bandwidth.

Javascript/HTML5 Curation Tools Creating cross-platform tools for the management of digital document collections using in-browser Javascript only. This project is about testing the limits of the latest Web technology to create platform-independent desktop tools. Probably not suitable for more than an Honours project.

Geographical Navigation of Information NDLTD has a collection of more than 3 million electronic theses from around the world (managed at UCT). These can be searched using keywords but many users would prefer to search for information based on source location or topic location, e.g., documents dealing with Kenyan politics or documents produced in Kenya. Thus, the goal is to produce a GIS-type navigation interface for the information. There are 3 aspects of the project that are interesting: the user interface for information navigation; the information processing; and the automatic classification of documents by location.
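As a starting point for the automatic classification of documents by topic location, here is a minimal sketch that counts place-name mentions against a gazetteer. The tiny `GAZETTEER` table and the function names are hypothetical, chosen only for illustration. A real system would need a full gazetteer and some form of disambiguation.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer: place-name term -> country.
# A real system would load a complete gazetteer instead.
GAZETTEER = {
    "kenya": "Kenya",
    "kenyan": "Kenya",
    "nairobi": "Kenya",
    "south africa": "South Africa",
    "cape town": "South Africa",
}


def topic_locations(text):
    """Count gazetteer matches in the text, grouped by country."""
    lowered = text.lower()
    counts = Counter()
    for term, country in GAZETTEER.items():
        # Word boundaries stop "kenya" from also matching inside "kenyan".
        hits = re.findall(r"\b" + re.escape(term) + r"\b", lowered)
        counts[country] += len(hits)
    return +counts  # unary plus drops countries with zero matches


def classify(text):
    """Return the most frequently mentioned country, or None if none match."""
    counts = topic_locations(text)
    return counts.most_common(1)[0][0] if counts else None
```

Counting mentions is the simplest possible baseline. It says nothing about source location (where a document was produced), which would more likely come from repository metadata than from the text.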

AJAX High Performance Computing Client Develop and evaluate an AJAX-only client for a high performance computing paradigm such as volunteer computing. Traditionally, volunteer computing requires the installation of a local client, but with the maturation of AJAX (Asynchronous JavaScript and XML), it may be possible to use the JavaScript interpreter within a Web browser as the sandboxed environment and have users contribute to global problem solving simply by visiting a website!

Recommending Digital Repositories Develop and evaluate a multi-criteria recommendation engine for selecting a digital repository technology. Selecting the most appropriate digital repository tool (e.g., DSpace vs EPrints) is somewhat of a black art because of the large number of variables and the difficulty in making direct comparisons among tools. Nevertheless, many repository managers have made successful choices, while some have also made poor choices. Using this data (which will need to be collected), patterns can be learnt using machine learning to help repository managers make better choices in future.
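One simple way to learn from past choices is a nearest-neighbour vote: find the institutions most similar to the one asking and recommend what worked for them. The sketch below assumes that approach. The three profile features, the placeholder tool names and every record in `PAST_CHOICES` are invented for illustration, since the real data still has to be collected.

```python
import math
from collections import Counter

# Hypothetical past decisions: (institution profile, tool chosen, successful?).
# Assumed profile features, each scaled to the range 0..1:
#   (technical_staff, collection_size, budget)
PAST_CHOICES = [
    ((0.9, 0.8, 0.7), "tool_A", True),
    ((0.8, 0.9, 0.6), "tool_A", True),
    ((0.2, 0.3, 0.2), "tool_B", True),
    ((0.1, 0.2, 0.3), "tool_B", True),
    ((0.2, 0.2, 0.1), "tool_A", False),
]


def recommend(profile, k=3):
    """Recommend the tool chosen most often by the k most similar
    institutions whose choice was reported as successful."""
    successes = [(p, tool) for p, tool, ok in PAST_CHOICES if ok]
    # Euclidean distance between profiles measures (dis)similarity.
    nearest = sorted(successes, key=lambda rec: math.dist(profile, rec[0]))[:k]
    votes = Counter(tool for _, tool in nearest)
    return votes.most_common(1)[0][0]
```

This version ignores unsuccessful choices entirely. A fuller model could treat them as negative evidence, and could weight criteria differently for different kinds of institution.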

Digital Libraries as Platform Facebook has become a phenomenal success largely because of its clean API, simple toolkits and reasonable model to add third party applications to a core system. This plug-in approach has not been as successful in other Web-based systems but Facebook appears to have hit on the best compromise between capability and control. In particular, digital library systems (such as the ACM digital library) could possibly offer lots of interesting services (e.g., recommendations, local copies) to users but these systems are notoriously difficult to extend. This project will look into how a digital library system can be decomposed into a platform with services so that extensions work in a manner similar to Facebook. The big question is: can the technology of Facebook Applications (or other systems like Google Gadgets) be generalised to provide services to users in arbitrary content management systems?

Web-based Component Testing With the rapid acceptance of Web Services and Web-based technology, there is a growing proliferation of services that can be accessed remotely through well-defined interfaces. Past experience in protocol development has shown that well-defined interface specifications are not sufficient to ensure compliance with a standard and this usually results in multiple non-conformant interpretations and, generally, problems for human and machine users of the services. Incompatibility among Web browsers is possibly the best contemporary example that illustrates why standards-compliance and compliance-testing are crucial in networked environments. In the digital library community, Hussein has worked with the Open Archives Initiative in developing protocol testing tools such as the Repository Explorer (a local mirror is at http://re.cs.uct.ac.za) and this has greatly influenced the success of the standard it tests. This is, however, a first generation testing tool. Much work remains to be done in generalising the testing framework so that testing tools can be automatically generated or driven by specifications. In an ideal environment, any Web-based protocol should be specified formally, in order to generate testing tools and test cases automatically. This work can have a major impact on the success of emerging digital library protocols and standards based within the Web Services initiative in general.
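To illustrate what "driven by specifications" could mean in practice, the sketch below checks an OAI-PMH `Identify` response against a small declarative table instead of hand-written tests. It is not how the Repository Explorer is implemented. The `IDENTIFY_SPEC` table and `check_identify` function are hypothetical and cover only a subset of what the protocol requires.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Declarative mini-specification for the Identify response:
# element name -> set of allowed values (None means any non-empty text).
IDENTIFY_SPEC = {
    "repositoryName": None,
    "baseURL": None,
    "protocolVersion": {"2.0"},
    "earliestDatestamp": None,
    "deletedRecord": {"no", "persistent", "transient"},
    "granularity": {"YYYY-MM-DD", "YYYY-MM-DDThh:mm:ssZ"},
    "adminEmail": None,
}


def check_identify(xml_text, spec=IDENTIFY_SPEC):
    """Return a list of violations; an empty list means these checks passed."""
    root = ET.fromstring(xml_text)
    identify = root.find(OAI_NS + "Identify")
    if identify is None:
        return ["missing <Identify> element"]
    problems = []
    for name, allowed in spec.items():
        element = identify.find(OAI_NS + name)
        value = (element.text or "").strip() if element is not None else ""
        if not value:
            problems.append(f"missing or empty <{name}>")
        elif allowed is not None and value not in allowed:
            problems.append(f"<{name}> has unexpected value {value!r}")
    return problems
```

Because the rules live in data, not in code, the same checker could in principle be pointed at a different table for a different verb or protocol. That is the generalisation the project is concerned with.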

Innovative Document Management Currently, there are a number of digital repository software toolkits to support centralised archiving of electronic resources. However, all of these tools require user intervention where users are required to explicitly submit items with associated descriptions. This has long been recognised as the bottleneck in acquiring and archiving material. Innovative techniques are required to support users and incorporate archiving (and sharing when appropriate) into their routine tasks by integrating document management into desktop software and other systems. An example of such a system would be one that transparently and efficiently archives all versions of a word processor document at the level of the filesystem. An example from a different extreme would be a system to replace photocopying for archival purposes with a scanner and software to automatically tag, organise and manage short-term and long-term duplicate copies of documents. Personal archiving is very relevant in an age where we produce a growing number of digital artefacts such as email messages, digital photos, PDA schedule entries and electronic documents. How do we effectively manage such fluid information in a connected world where digital photos may be shared on one website, research documents on others and everything related to an individual must be periodically archived?
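The idea of archiving every version of a document can be sketched with content-addressed storage: each version is saved under a hash of its contents, so unchanged files cost nothing extra. This is a minimal illustration only. The `archive_version` function is hypothetical, and a transparent system would call something like it from a filesystem hook or file watcher, not by hand.

```python
import hashlib
import shutil
from pathlib import Path


def archive_version(document, archive_dir):
    """Copy `document` into `archive_dir`, named by its SHA-256 digest.

    Returns the archived path. Identical content is stored only once,
    so repeated calls on an unchanged file add nothing to the archive.
    """
    document = Path(document)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(document.read_bytes()).hexdigest()
    target = archive_dir / (digest + document.suffix)
    if not target.exists():
        shutil.copy2(document, target)  # copy2 also preserves timestamps
    return target
```

The sketch keeps whole copies of each version and records no ordering between them. Storing differences and a version history per document are the obvious next steps toward the efficiency the project calls for.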

digital libraries laboratory @ uct . cs
