Search engine for websites from the Comunidad Valenciana region. Tabarca Search Engine is an information retrieval system that takes an input (either full sentences in natural language or a set of keywords) and returns a list of documents ordered by their relevance to the query. The search engine currently indexes around 50,000 web pages written in different languages (Spanish, Valencian and English). Notably, the index includes all the University of Alicante pages, including the press-related ones.
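The description above does not say how Tabarca computes relevance, so the following is only a minimal sketch of query-to-document ranking, assuming a generic TF-IDF cosine similarity. The function names (tokenize, rank) and the sample documents are hypothetical and are not taken from the Tabarca code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def rank(query, documents):
    """Return (doc_index, score) pairs sorted by descending TF-IDF cosine similarity."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    # Smoothed inverse document frequency, always positive.
    idf = {term: math.log((1 + n_docs) / (1 + freq)) + 1.0 for term, freq in df.items()}

    def vectorize(tokens):
        # Terms unseen in the collection are ignored.
        counts = Counter(tokens)
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vectorize(tokenize(query))
    scores = [(i, cosine(q_vec, vectorize(tokens))) for i, tokens in enumerate(doc_tokens)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini-collection, for illustration only.
documents = [
    "University of Alicante press office news",
    "Beaches and tourism on the island of Tabarca",
    "Admission to the University of Alicante",
]
ranking = rank("university admission", documents)
```

Here the third document matches both query terms and is ranked first, while the second shares no term with the query and receives a score of zero.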
Information retrieval systems process a collection of texts, selecting those that contain terms related to the question and discarding those that do not. IR-n is a passage-based information retrieval system: it uses a probabilistic model to perform the retrieval and a query expansion module to improve the results. The system has participated in international competitions such as CLEF.
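To make the idea of passage-based retrieval concrete, here is a small sketch under stated assumptions: passages are overlapping windows of consecutive sentences, each passage is scored with Okapi BM25 (one well-known probabilistic ranking function, used here as a stand-in because the text does not specify IR-n's model), and a document is ranked by its best passage. The window size, the parameter values and all function names are illustrative choices, and query expansion is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def split_passages(text, size=2):
    """Split a document into overlapping passages of `size` consecutive sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= size:
        return [" ".join(sentences)]
    return [" ".join(sentences[i:i + size]) for i in range(len(sentences) - size + 1)]


def bm25_scores(query_terms, passages, k1=1.2, b=0.75):
    """Score each tokenized passage against the query terms with Okapi BM25."""
    if not passages:
        return []
    n = len(passages)
    avg_len = sum(len(p) for p in passages) / n
    # Number of passages that contain each term.
    df = Counter(t for p in passages for t in set(p))
    scores = []
    for p in passages:
        tf = Counter(p)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(p) / avg_len))
        scores.append(score)
    return scores


def rank_documents(query, documents, size=2):
    """Rank documents by the BM25 score of their best-scoring passage."""
    query_terms = tokenize(query)
    passages, owners = [], []
    for doc_id, text in enumerate(documents):
        for passage in split_passages(text, size):
            passages.append(tokenize(passage))
            owners.append(doc_id)
    best = {doc_id: 0.0 for doc_id in range(len(documents))}
    for doc_id, score in zip(owners, bm25_scores(query_terms, passages)):
        best[doc_id] = max(best[doc_id], score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical documents, for illustration only.
documents = [
    "The notary signed the deed. The weather was sunny. Tourists visited the castle.",
    "Query expansion adds related terms. Passage retrieval scores short text windows. "
    "The best passage represents the document.",
]
ranking = rank_documents("passage retrieval", documents)
```

Scoring passages rather than whole documents means that a long document is not penalised for containing a lot of unrelated text around one highly relevant fragment.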
Information extraction systems, unlike information retrieval systems, extract from a collection of texts belonging to the same domain the information considered relevant to the application. Their main purpose is to locate specific information in texts and use it to fill a database that can then be queried. In effect, these systems transform unstructured information into structured information.
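As a rough illustration of turning unstructured text into a queryable database, the sketch below extracts three fields with regular expressions and stores them in an in-memory SQLite table. The field names, the patterns and the sample sentences are all hypothetical; a real extraction system would rely on much richer linguistic rules or trained models.

```python
import re
import sqlite3

# Hypothetical patterns for an invented text format, for illustration only.
PATTERNS = {
    "deed_type": re.compile(r"\b(sale|mortgage|donation)\b", re.IGNORECASE),
    "date": re.compile(r"\b(\d{1,2} \w+ \d{4})\b"),
    "amount_eur": re.compile(r"\b(\d[\d,]*) euros\b", re.IGNORECASE),
}


def extract_record(text):
    """Return a dict with one value per field, or None where nothing matched."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1) if match else None
    # Normalise the extracted strings into consistent, typed values.
    if record["deed_type"]:
        record["deed_type"] = record["deed_type"].lower()
    if record["amount_eur"]:
        record["amount_eur"] = int(record["amount_eur"].replace(",", ""))
    return record


# Invented sample texts standing in for domain documents.
texts = [
    "On 12 March 2008 the parties signed a sale deed for 150,000 euros.",
    "A mortgage of 90,000 euros was granted on 3 June 2009.",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deeds (deed_type TEXT, date TEXT, amount_eur INTEGER)")
conn.executemany(
    "INSERT INTO deeds VALUES (:deed_type, :date, :amount_eur)",
    [extract_record(t) for t in texts],
)

# A structured question asked of what used to be free text.
rows = conn.execute(
    "SELECT deed_type, date FROM deeds WHERE amount_eur > 100000"
).fetchall()
```

Once the fields are in a table, questions such as "which deeds exceed a given amount" become ordinary database queries instead of text searches.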
This application has been developed within the Group of Natural Language Processing and Information Systems as a result of technology transfer in the areas of Information Retrieval and Information Extraction. It consists of two main modules, IR-n as the Information Retrieval System and an Information Extraction System, both adapted to the application domain: the notarial information framework.
This application assigns meanings to words using the EuroWordNet electronic dictionary, in both Spanish and English.
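The text does not describe how meanings are chosen, so the following is only a sketch of one classic approach, a simplified Lesk algorithm that picks the sense whose dictionary gloss shares the most words with the surrounding sentence. The tiny sense inventory below is invented for illustration and is not EuroWordNet data; the sense identifiers and function names are likewise hypothetical.

```python
import re

# Toy sense inventory, invented for illustration; it is NOT EuroWordNet data.
SENSES = {
    "bank": {
        "bank.finance": "institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a river or lake",
    },
}

STOPWORDS = {"a", "an", "and", "that", "the", "of", "or", "to", "in", "on", "by"}


def content_words(text):
    """Return the set of lowercase word tokens that are not stopwords."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence.

    Returns None when the word is not in the inventory; ties go to the
    sense listed first.
    """
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in SENSES.get(word, {}).items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


financial = disambiguate("bank", "She deposits money at the bank")
riverside = disambiguate("bank", "They fished from the bank of the river")
```

With a full lexical database in place of the toy inventory, the same overlap idea applies to any word that has several recorded senses.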