Monday, September 22, 2008

ARCHITECTURE OF WORDNET

Topic discussed on the 19th September 2008
Research on WordNet architecture.
---------------------------------------------------------------------

The diagram below gives a brief overview of the WordNet architecture.

Grinder
The grinder acts as a converter: it reads the lexical source files written by lexicographers, checks them, and converts them into the database format that WordNet can read and update.
The WordNet Database
The word/synset pairs are stored in the WordNet database. Nouns and verbs are grouped into files according to semantic field, while adjectives and adverbs are kept separately in files of their own.
Lexical Source Files
Each lexical file is assigned a file number for use within the database.
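The following minimal Python sketch shows what the grinder does. It assumes a heavily simplified, hypothetical source format (one synset per line, written as `{ word1, word2, (gloss) }`) rather than the real lexicographer-file syntax. The `LEXFILE_NUMBERS` table and the `grind` function are illustrative names only. The sketch parses each entry and tags every word/synset pair with the number of the file it came from.

```python
import re
from dataclasses import dataclass

# Hypothetical, illustrative file-number table (not taken from WordNet itself).
LEXFILE_NUMBERS = {"noun.animal": 5, "verb.motion": 38}

# Simplified entry format assumed for this sketch: { word, word, (gloss) }
ENTRY = re.compile(r"^\{\s*(?P<words>[^()]*?)\s*,?\s*\((?P<gloss>[^)]*)\)\s*\}$")


@dataclass
class SynsetRecord:
    words: list[str]
    gloss: str
    lexfile_num: int


def grind(lexfile_name: str, lines: list[str]) -> list[SynsetRecord]:
    """Convert simplified source lines into synset records tagged with a file number."""
    num = LEXFILE_NUMBERS[lexfile_name]
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        m = ENTRY.match(line)
        if m is None:
            # A real grinder reports syntax errors; this sketch simply raises.
            raise ValueError(f"{lexfile_name}:{lineno}: cannot parse {line!r}")
        words = [w.strip() for w in m.group("words").split(",") if w.strip()]
        records.append(SynsetRecord(words, m.group("gloss"), num))
    return records


records = grind("noun.animal", ["{ dog, domestic_dog, (a member of the genus Canis) }"])
print(records[0].lexfile_num, records[0].words)  # 5 ['dog', 'domestic_dog']
```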

Most wordnets have been modelled on, or implemented following, the Princeton WordNet. For our project, the EuroWordNet architecture can also be taken into consideration because it is multilingual (some WordNet applications, such as disambiguation, are found in multilingual wordnets). In practice, the two architectures (Princeton WordNet and EuroWordNet) are largely similar. The EuroWordNet architecture also allows monolingual source lexicons to be enriched by exploiting the semantic information encoded in corresponding entries.

Tuesday, September 16, 2008

Applications of WordNet (2)

Topic discussed on the 5th September 2008
Where is WordNet used? Give some applications.
Give detailed explanations of the applications.

------------------------------------------------------------

Detailed Applications of WordNet (WN)
Information retrieval

Information retrieval is closely related to the organisation and representation of knowledge on the Internet. Artificial Intelligence techniques have helped information retrieval, for example through methods that incorporate logic and inference built on WN.
WordNet has been used as a semantic lexicon in which queries are expanded through keyword design, as in full-text retrieval in a communication aid. It can serve as a linguistic knowledge tool to represent and interpret meaning, giving the user efficient and integrated access to information. WordNet can also be used to develop a natural language interface that improves the precision of Internet search engines by expanding queries.

Conceptual identification/disambiguation
Semantic disambiguation proceeds as follows:

i. All the noun-verb pairs in the sentence are selected.

ii. The most likely meaning of each term is chosen; the Internet is used for this purpose.

iii. Starting from the most frequently appearing concepts (step ii), all the nouns in the "glossaries" (glosses) of each verb and its hierarchical subordinates are selected.

iv. Starting from the most frequently appearing concepts, all the nouns in the "glossaries" (glosses) of each noun and its hierarchical subordinates are selected.

v. A formula is applied to calculate the concepts common to the nouns from steps (iii) and (iv).
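The core of steps (iii) to (v) can be sketched in Python as follows. The gloss nouns are hypothetical toy data, assumed to be already extracted from the glosses of each candidate sense and its subordinates. In other words, steps (i) and (ii) are taken as done. The formula for step (v) is not given here, so the sketch assumes the simplest one: counting the nouns that the two sets share.

```python
from itertools import product

# Hypothetical toy data: for each candidate sense left after step (ii),
# the nouns found in its gloss and in the glosses of its subordinates.
VERB_SENSES: dict[str, set[str]] = {
    "deposit#1": {"money", "bank", "account"},     # put money into an account
    "deposit#2": {"layer", "sediment", "ground"},  # lay down material
}
NOUN_SENSES: dict[str, set[str]] = {
    "bank#1": {"money", "institution", "account"},  # financial institution
    "bank#2": {"river", "slope", "sediment"},       # sloping land beside water
}


def common_concepts(verb_nouns: set[str], noun_nouns: set[str]) -> int:
    """Assumed formula for step (v): how many nouns both gloss sets share."""
    return len(verb_nouns & noun_nouns)


def disambiguate_pair(verb_senses: dict[str, set[str]],
                      noun_senses: dict[str, set[str]]) -> tuple[str, str]:
    """Pick the (verb sense, noun sense) combination with the most shared nouns."""
    return max(
        product(verb_senses, noun_senses),
        key=lambda pair: common_concepts(verb_senses[pair[0]], noun_senses[pair[1]]),
    )


print(disambiguate_pair(VERB_SENSES, NOUN_SENSES))  # ('deposit#1', 'bank#1')
```

For the verb/noun pair "deposit ... bank", the financial senses share two gloss nouns ("money", "account"), so they win over the river-bank reading.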

Disambiguation is a very frequent and varied WordNet application. IWA/H, ARPA and KRSL were created to avoid ambiguity; later, AutoASC proved very effective for disambiguation in information retrieval. This product incorporates the latest WordNet gloss definitions.

Query expansion
An expansion system was proposed in 1994. It calculates the tf-idf (term frequency–inverse document frequency) of the query terms and adds to it half the tf-idf of the WN synonyms of these terms. Applying WordNet to queries together with a Word Sense Disambiguator (WSD) enhances the search process.
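The Python sketch below shows one reading of this weighting scheme. A document's score is the tf-idf of each query term plus half the tf-idf of that term's WordNet synonyms. The toy corpus and the `synonyms` mapping are assumptions that stand in for real documents and real WN synsets. The tf-idf variant used here (raw term frequency times log inverse document frequency) is one common choice and not necessarily the one used in the original system.

```python
import math
from collections import Counter


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Raw term frequency times log inverse document frequency (one common variant)."""
    if not doc:
        return 0.0
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0


def expanded_score(query: list[str], doc: list[str], corpus: list[list[str]],
                   synonyms: dict[str, list[str]]) -> float:
    """tf-idf of each query term plus half the tf-idf of each of its synonyms."""
    score = 0.0
    for term in query:
        score += tf_idf(term, doc, corpus)
        for syn in synonyms.get(term, []):
            score += 0.5 * tf_idf(syn, doc, corpus)
    return score


# Toy corpus and hypothetical synonym list standing in for WordNet synsets.
corpus = [
    ["car", "engine", "repair"],
    ["automobile", "engine", "sale"],
    ["river", "bank", "fishing"],
]
synonyms = {"car": ["automobile", "auto"]}
scores = [expanded_score(["car"], d, corpus, synonyms) for d in corpus]
```

Without expansion, the "automobile" document would score zero for the query "car". With expansion, it is retrieved at half the weight of an exact match.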

Document classification
Semantic traits are extracted by grouping WN nouns, verbs and adjectives into grammatical categories and by categorising the relevance of the data. Judith Klavans designed an algorithm for automatically determining the genre of a document from the WN verb categories it uses.
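A toy Python sketch can illustrate the general idea of using verb categories as evidence of genre. This is not Klavans's algorithm. The verb-to-category lexicon and the category-to-genre rule table are hypothetical. The rule simply picks the genre associated with the most frequent verb category in the document.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping verbs to WordNet-style verb categories.
VERB_CATEGORY = {
    "say": "verb.communication",
    "announce": "verb.communication",
    "claim": "verb.communication",
    "run": "verb.motion",
    "travel": "verb.motion",
}

# Hypothetical rule table: dominant verb category -> genre label.
GENRE_BY_CATEGORY = {
    "verb.communication": "news report",
    "verb.motion": "travel narrative",
}


def guess_genre(verbs: list[str]) -> str:
    """Return the genre linked to the most frequent known verb category."""
    counts = Counter(VERB_CATEGORY[v] for v in verbs if v in VERB_CATEGORY)
    if not counts:
        return "unknown"
    dominant, _ = counts.most_common(1)[0]
    return GENRE_BY_CATEGORY.get(dominant, "unknown")


print(guess_genre(["say", "announce", "run"]))  # news report
```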

Applications of WordNet

Topic discussed on the 5th September 2008
Where is WordNet used? Give some applications.
Give detailed explanations of the applications.

------------------------------------------------------------
WordNet set out to create a product that combines the advantages of electronic dictionaries and online thesauri. It is an ideal tool for meaning disambiguation, semantic tagging and information retrieval.

Some applications of WordNet (WN):
1. Conceptual identification/disambiguation
2. Information retrieval
3. Query expansion
4. Machine translation
5. Document classification

WN can be used in the development of a natural language interface to optimize the precision of Internet search engines by expanding the queries.
WordNet has served as a support for the development of tools to enhance the efficiency of internet resource searches.

Saturday, September 6, 2008

First Research (Yankesh)

Topic discussed on Monday 25th August 2008,
What is wordnet?
Where can wordnet be used, both internationally and locally?

Mauritian Creole as a language in Mauritius.
--------------------------------------------------------------------

WORDNET FOR MAURITIAN CREOLE

WordNet
WordNet is a dictionary and thesaurus that distinguishes between nouns, verbs, adjectives and adverbs because they follow different grammatical rules. Words with the same meaning are grouped into sets of synonyms (synsets), and synsets are connected to other synsets via a number of semantic relations.

Individual words can also be connected to other words through lexical relations. Nouns and verbs are arranged into hierarchies by the ISA (hypernymy) relation. WordNet does not include information about a word's etymology, its pronunciation or the forms of irregular verbs.
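Here is a toy Python model of synsets linked by the ISA relation. The `Synset` class, the three-level hierarchy and the lemmas are simplified assumptions for illustration. They are not the real WordNet data or any library's API. Following the `hypernym` links upwards from a synset gives its ISA chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Synset:
    """A toy synset: synonymous lemmas plus a single ISA (hypernym) link."""
    name: str
    lemmas: list[str]
    hypernym: Optional["Synset"] = None


def isa_chain(synset: Optional[Synset]) -> list[str]:
    """Follow hypernym links upwards, collecting synset names."""
    chain = []
    while synset is not None:
        chain.append(synset.name)
        synset = synset.hypernym
    return chain


def is_a(synset: Synset, ancestor: Synset) -> bool:
    """True if `ancestor` appears in the ISA chain of `synset`."""
    return ancestor.name in isa_chain(synset)


# Simplified hierarchy; the real WordNet chain for "dog" is much longer.
organism = Synset("organism", ["organism", "being"])
animal = Synset("animal", ["animal", "beast"], hypernym=organism)
dog = Synset("dog", ["dog", "domestic_dog"], hypernym=animal)

print(isa_chain(dog))  # ['dog', 'animal', 'organism']
```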

Neural network algorithms can be used to search the expanded WordNet for related terms in order to disambiguate search keywords.

Mauritian Creole
This language, often known as "Morisyen", is spoken by almost everyone in Mauritius. The nouns, verbs, adjectives and adverbs used in this language are often ambiguous: some words have the same pronunciation and the same spelling, yet can function either as an adjective or as a verb. Our WordNet system must take this into serious consideration.
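One way to handle such homographs is to index the lexicon by lemma and by part of speech. That way, a single spelling can carry separate adjective and verb entries. The Python sketch below uses a hypothetical `CreoleLexicon` class. "dilo" (water) is a real Mauritian Creole noun, while `HOMOGRAPH` is a placeholder and not a real Morisyen word.

```python
from collections import defaultdict
from typing import Optional


class CreoleLexicon:
    """Hypothetical lexicon indexed by lemma and then by part of speech."""

    def __init__(self) -> None:
        # lemma -> part of speech -> list of glosses (one per sense)
        self._entries = defaultdict(lambda: defaultdict(list))

    def add(self, lemma: str, pos: str, gloss: str) -> None:
        self._entries[lemma][pos].append(gloss)

    def senses(self, lemma: str, pos: Optional[str] = None) -> dict[str, list[str]]:
        """Return glosses for a lemma, optionally restricted to one part of speech."""
        by_pos = self._entries.get(lemma, {})  # .get avoids creating empty entries
        return {p: list(g) for p, g in by_pos.items() if pos is None or p == pos}


lex = CreoleLexicon()
lex.add("dilo", "noun", "water")
lex.add("HOMOGRAPH", "adj", "placeholder adjective sense")
lex.add("HOMOGRAPH", "verb", "placeholder verb sense")
print(lex.senses("HOMOGRAPH", "verb"))  # {'verb': ['placeholder verb sense']}
```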

Uses of WordNet
• WordNet Enhanced Automatic Crossword Generation was developed by Aoife Aherne and Carl Vogel of the Computational Linguistics Group at the University of Dublin, Ireland.

• WordNet has been used to support searches across search engines such as Yahoo, Google, Ask.com and many others.



WHERE CAN THIS WORDNET FOR MAURITIAN CREOLE BE USED?

The Nelson Mandela Centre for African Culture Trust Fund is a cultural centre under the supervision of the Ministry of Arts and Culture. Its objectives and services include the following:
• To preserve and promote African arts and culture.

• To preserve and promote Creole arts and culture.

• To collect, publish and disseminate information with respect to the African and Creole arts and culture.

• To organise lectures, seminars, workshops, exhibitions and any other activities leading to a better understanding of the African and Creole arts and culture.

The centre provides other services as well, but the ones listed above best suit our project. Our software could be used there as a teaching toolkit for a better understanding of Mauritian Creole.

Friday, September 5, 2008

First Research (Ved)

Topic discussed on Monday 25th August 2008,
What is wordnet?
Where can wordnet be used, both internationally and locally?

Mauritian Creole as a language in Mauritius.
---------------------------------------------------------------

WordNet for Mauritian Creole

WordNet

What is WordNet?

· A large lexical database, or electronic dictionary

· Covers most English nouns, verbs, adjectives, adverbs

· Electronic format enables automatic manipulation

WordNet v/s Paper Dictionaries

· Traditional paper dictionaries are organized alphabetically, so words that appear together (on the same page) are usually unrelated in meaning

· WordNet is organized by meaning, so words in close proximity are related

· Users can browse WordNet and find words that are meaningfully related to their queries (like in a thesaurus) Ref1

Uses of WordNet

Systems that use WordNet include:

· Sense disambiguation (The process of identifying which sense of a word is used in a given sentence)

· Information Extraction & Retrieval

· Prepositional Attachment (Prepositional phrase attachment is a common cause of structural ambiguity in natural language)

· Textual Summarization (Consists of the following processes: Topic analysis, Passage Extraction, Text Understanding and Information Integration)

· Recognition of Textual Cohesion (more research will follow on this)

· Intelligent Internet Searches Ref2

Mauritian Creole

Kreol Morisyen, the language of Mauritius

Mauritius has a very complex linguistic situation. While English is the official language of parliament, traffic regulations and school administration, it is spoken by only 3% of the population. French is the native language and 80% of our newspapers are written in French. Other languages, such as Urdu, Chinese, Hindi and Bhojpuri, are also spoken. Mauritian Creole (MC) is the national language, and it is spoken by nearly the entire population.

The majority of MC words are of French origin, while some are derived from English, Indian Languages, Malagasy and Chinese.

MC is a French-lexicon creole, and many of its words come from French: for example, “livere” (winter) derives from the French “l’hiver”, and “dilo” (water) from “de l’eau”. Ref3

Future implementation of WordNet for Mauritian Creole

1) Nelson Mandela Centre for African Culture Trust Fund

Objectives/Services

….

-To preserve and promote Creole Arts and Culture

….Ref 4

2) Nelson Mandela Centre for African Culture

Documentation and Research

We could propose WordNet for Mauritian Creole as a research field, which would later give other people the opportunity to expand the Creole vocabulary and enhance our WordNet for Mauritian Creole.

3) Ministry of Arts and Culture

We could create and provide this software as a key step in promoting the Creole language, since a Creole dictionary already exists Ref5. This software would provide a Creole WordNet analogous to the English WordNet.