Caroline Barrière
Phone: 819-934-3450
Fax: 819-934-2607
Email: Caroline.Barriere@cnrc-nrc.gc.ca
Michel Mellinger
Phone: 819-934-2602
Fax: 819-934-2607
Email: Michel.Mellinger@cnrc-nrc.gc.ca
The TerminoWeb project is developing technology that, as a medium-term objective, will allow the automatic construction of specialized ontologies (i.e. ontologies for specific domains), converging in this way with the study of terminology. The project began in 2004. Although it is a long-term project, several of its aspects are being actively explored, and results are presented here as they are produced (consult this page regularly for updates). Currently, we are developing technology for processing English texts; adaptation of TerminoWeb to French is in progress, and we will consider exploring bilingual English-French terminology in the future.
The different aspects of the project are as follows:
The project is based on information extraction technology that uses linguistic patterns, and it also develops new technologies of its own. For example, for the construction of specialized corpora, we are developing a module that post-processes the results returned by search engines and orders the texts according to criteria of text structure (flowing text) and knowledge density (knowledge-rich contexts). This technology is unique and supports information retrieval for the purpose of ontology extraction. For term extraction, we are conducting R&D on integrating relational criteria between terms in order to determine their status (term of interest or not) in a given domain. This is also an original approach, since the criteria commonly used at present are essentially linguistic and statistical criteria applied to individual terms only (one term at a time).
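As a rough illustration of ordering texts by structure and knowledge density, the sketch below scores each text with two simple measures and sorts on their weighted sum. It is not TerminoWeb's implementation: the pattern list, the function names (`knowledge_density`, `flowing_text_ratio`, `rank_texts`), the thresholds, and the equal weighting are all assumptions made for the example.

```python
import re

# Hypothetical knowledge patterns: surface cues that often signal a definition
# or a semantic relation. Illustrative only, not TerminoWeb's actual patterns.
KNOWLEDGE_PATTERNS = [
    r"\bis an?\b",
    r"\bis defined as\b",
    r"\bis called\b",
    r"\bsuch as\b",
    r"\balso known as\b",
    r"\bconsists? of\b",
]
PATTERN = re.compile("|".join(KNOWLEDGE_PATTERNS), re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def knowledge_density(text):
    """Fraction of sentences that contain at least one knowledge pattern."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if PATTERN.search(s))
    return hits / len(sentences)


def flowing_text_ratio(text, min_words=8):
    """Fraction of non-empty lines that look like prose rather than menus or
    link lists: at least `min_words` words and ending in sentence punctuation."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    prose = sum(
        1 for ln in lines if len(ln.split()) >= min_words and ln[-1] in ".!?"
    )
    return prose / len(lines)


def rank_texts(texts, weight=0.5):
    """Order texts from most to least promising using a weighted sum of the
    two scores. The 0.5 weighting is an arbitrary assumption."""
    def score(t):
        return weight * knowledge_density(t) + (1 - weight) * flowing_text_ratio(t)
    return sorted(texts, key=score, reverse=True)


if __name__ == "__main__":
    rich = ("A ligand is a molecule that binds to a receptor. "
            "Ligands such as hormones trigger a cellular response.")
    menu = "Home\nProducts\nContact us\nLogin"
    for text in rank_texts([menu, rich]):
        print(repr(text[:40]))
```

A real system would need a proper sentence segmenter and patterns tuned to the domain and language; the point here is only that both criteria can be computed per document and combined into a single ordering.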
The TerminoWeb project will lead to several software applications, including:
The project therefore aims to have a significant impact in the fields of terminology, translation, and language learning.
The publication NRC-48765 provides further details of the project, explains the various functions of TerminoWeb, illustrates its capabilities with results from experiments, and identifies the areas of future R&D work considered at that time.
We welcome discussions of technology transfer options with private sector companies interested in one or more of the application areas listed above.
Beta version 1.0 of TerminoWeb has been available since December 2006 in order to gather comments from early users; please see the Web site for the TerminoWeb prototype if you wish to participate in specific tests of TerminoWeb. This beta version includes a preliminary module for corpus construction, as well as a module for term extraction and for searching for knowledge-rich contexts in the corpus built by the first module. We continue to evolve the TerminoWeb technology according to feedback received from users, while pursuing the areas of R&D relevant to the development of tools derived from TerminoWeb.