A personalized ontology model for web information gathering
As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in personalized web information gathering. However, when representing user profiles, many models have utilized only knowledge from either a global knowledge base or user local information. In this paper, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful.
1. Golden Model: TREC Model
The TREC model was used to demonstrate interviewing user profiles, which reflected user concept models perfectly. For each topic, TREC users were given a set of documents to read and asked to judge each as relevant or nonrelevant to the topic. The TREC user profiles perfectly reflected the users’ personal interests because the relevance judgments were provided by the same people who created the topics, on the premise that only users know their own interests and preferences perfectly.
2. Baseline Model: Category Model
This model demonstrated noninterviewing user profiles. A user’s interests and preferences are described by a set of weighted subjects learned from the user’s browsing history. These subjects are specified with the semantic relations of superclass and subclass in an ontology. When an OBIWAN agent receives the search results for a given topic, it filters and reranks the results based on their semantic similarity with the subjects. Documents that are similar to the subjects are rewarded and ranked higher on the result list.
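The filter-and-rerank step can be illustrated with a minimal Python sketch. It assumes that a profile is a dictionary mapping subject names to weights and that each search result already carries a list of subject labels. The function name `rerank`, the data layout, and the summed-weight score standing in for a real semantic similarity measure are all assumptions made for illustration, not details of the OBIWAN system.

```python
def rerank(results, profile):
    """Order results by the summed profile weight of the subjects they mention.

    results: list of (doc_id, subjects) pairs, in the search engine's order.
    profile: dict mapping subject name -> weight (hypothetical layout).
    """
    def score(item):
        _, subjects = item
        # Subjects missing from the profile contribute nothing to the score.
        return sum(profile.get(s, 0.0) for s in subjects)

    # sorted() is stable, so ties keep the search engine's original order.
    return sorted(results, key=score, reverse=True)


# Invented profile and results, for illustration only.
profile = {"ontology": 0.9, "information retrieval": 0.6}
results = [
    ("d1", ["cooking"]),
    ("d2", ["information retrieval"]),
    ("d3", ["ontology", "information retrieval"]),
]
ranked = [doc_id for doc_id, _ in rerank(results, profile)]
```

Here the document that matches two weighted subjects moves to the top of the list, and the document with no matching subject drops to the bottom.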
3. Baseline Model: Web Model
The web model was an implementation of typical semi-interviewing user profiles. It acquired user profiles from the web by employing a web search engine. For a given topic, feature terms referred to the interesting concepts of the topic, and noisy terms referred to its paradoxical or ambiguous concepts.
- The topic coverage of TREC profiles was limited.
- The TREC user profiles had good precision but relatively poor recall performance.
- Using web documents as training sets has one severe drawback: web information contains much noise and uncertainty. As a result, the web user profiles were satisfactory in terms of recall but weak in terms of precision. No negative training set was generated by this model.
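The trade-off between precision and recall noted in these points can be made concrete with a small worked example. The document identifiers and counts below are invented purely for illustration and are not results from the evaluation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {"d1", "d2", "d3", "d4"}

# A narrow profile: everything it returns is relevant, but it misses half.
narrow = precision_recall({"d1", "d2"}, relevant)

# A broad, noisy profile: it finds every relevant document plus four others.
broad = precision_recall(
    {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}, relevant
)
```

The narrow profile yields high precision with low recall, and the broad profile yields the reverse, which mirrors the contrast drawn between the TREC and web user profiles.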
World knowledge and a user’s local instance repository (LIR) are used in the proposed model.
1) World knowledge is commonsense knowledge acquired by people from experience and education.
2) An LIR is a user’s personal collection of information items.
From a world knowledge base, we construct personalized ontologies by adopting user feedback on interesting knowledge. A multidimensional ontology mining method, Specificity and Exhaustivity, is also introduced in the proposed model for analyzing concepts specified in ontologies. The users’ LIRs are then used to discover background knowledge and to populate the personalized ontologies.
- Compared with the TREC model, the Ontology model had better recall but relatively weaker precision performance. The Ontology model discovered user background knowledge from user local instance repositories, rather than from documents read and judged by users. Thus, the Ontology user profiles were not as precise as the TREC user profiles.
- The Ontology profiles had broad topic coverage. The substantial coverage of possibly related topics was gained from the use of the world knowledge base (WKB) and the large number of training documents.
- Compared with the web data used by the web model, the LIRs used by the Ontology model were controlled and contained fewer uncertainties. Additionally, a large number of uncertainties were eliminated when user background knowledge was discovered. As a result, the user profiles acquired by the Ontology model performed better than those of the web model.
1. World knowledge base:
The world knowledge base must cover an exhaustive range of topics, since users may come from different backgrounds. The structure of the world knowledge base used in this research is encoded from the LCSH references. The LCSH system contains three types of references:
1. Broader term: The BT references are for two subjects describing the same topic, but at different levels of abstraction (or specificity). In our model, they are encoded as the is-a relations in the world knowledge base.
2. Used-for: The UF references in the LCSH are used for many semantic situations, including broadening the semantic extent of a subject and describing compound subjects and subjects subdivided by other topics. When object A is used for an action, A becomes a part of that action (e.g., “a fork is used for dining”); when A is used for another object, B, A becomes a part of B (e.g., “a wheel is used for a car”). These cases can be encoded as the part-of relations.
3. Related term: The RT references are for two subjects related in some manner other than by hierarchy. They are encoded as the related-to relations in our world knowledge base.
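The encoding of the three reference types can be sketched as a small relation graph. This is a minimal illustration, assuming that references arrive as (subject, reference type, target) triples. The triple layout, the function name `build_wkb`, and the sample subjects are hypothetical and are not quoted from the LCSH.

```python
from collections import defaultdict

# Mapping from LCSH reference type to the relation used in the knowledge base.
REFERENCE_TO_RELATION = {"BT": "is-a", "UF": "part-of", "RT": "related-to"}


def build_wkb(references):
    """Turn (subject, reference_type, target) triples into a relation graph."""
    graph = defaultdict(list)
    for subject, ref_type, target in references:
        # An unknown reference type raises KeyError rather than being guessed.
        graph[subject].append((REFERENCE_TO_RELATION[ref_type], target))
    return dict(graph)


# Hypothetical references, invented for illustration.
references = [
    ("Ontology", "BT", "Knowledge representation"),
    ("Wheels", "UF", "Automobiles"),
    ("Ontology", "RT", "Semantic Web"),
]
wkb = build_wkb(references)
```

Keeping the relation label on each edge lets later steps treat is-a, part-of, and related-to links differently when walking the graph.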
2. Ontology Learning Environment:
The subjects of user interest are extracted from the WKB via user interaction. A tool called the Ontology Learning Environment (OLE) is developed to assist users with such interaction. Regarding a topic, the interesting subjects consist of two sets: positive subjects are the concepts relevant to the information need, and negative subjects are the concepts resolving paradoxical or ambiguous interpretations of the information need. Thus, for a given topic, the OLE provides users with a set of candidates from which to identify positive and negative subjects. These candidate subjects are extracted from the WKB. Candidates that the user marks as neither positive nor negative become the neutral subjects for the given topic.
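The three-way split of candidate subjects can be sketched as follows. This is only an illustration of the bookkeeping, assuming that user feedback arrives as two collections of subject names. The function name `partition_subjects` and the sample subjects are hypothetical and do not describe the actual OLE interface.

```python
def partition_subjects(candidates, positive, negative):
    """Split candidate subjects into positive, negative, and neutral lists."""
    positive, negative = set(positive), set(negative)
    overlap = positive & negative
    if overlap:
        # A subject cannot be both relevant and disambiguating for one topic.
        raise ValueError(f"subjects marked both ways: {sorted(overlap)}")
    return {
        "positive": [s for s in candidates if s in positive],
        "negative": [s for s in candidates if s in negative],
        # Anything the user gave no feedback on is neutral.
        "neutral": [s for s in candidates
                    if s not in positive and s not in negative],
    }


# Invented candidates for a topic about economic espionage.
candidates = ["Business intelligence", "Espionage", "Spy films", "Economics"]
split = partition_subjects(
    candidates,
    positive=["Business intelligence", "Espionage"],
    negative=["Spy films"],
)
```

Each list preserves the order in which the candidates were offered, so the result can be shown back to the user unchanged.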
3. Ontology mining:
Ontology mining discovers interesting and on-topic knowledge from the concepts, semantic relations, and instances in an ontology. A multidimensional ontology mining method, Specificity and Exhaustivity, is introduced. Specificity (denoted spe) describes a subject’s focus on a given topic. Exhaustivity (denoted exh) restricts a subject’s semantic space dealing with the topic. This method aims to investigate the subjects and the strength of their associations in an ontology. User background knowledge is then discovered from the user local instance repository, that is, from user local information collections such as a user’s stored documents, browsed web pages, and composed or received emails.
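The intuition that more focused subjects score higher on specificity can be illustrated with a toy calculation. The rule used here is an assumption made for this sketch, not a formula stated in the text above: a leaf subject scores 1.0, and a subject with children scores a decay factor `theta` times the lowest score among its children. The hierarchy, the function name, and the value of `theta` are likewise hypothetical.

```python
def specificity(subject, children, theta=0.5):
    """Toy specificity: 1.0 for a leaf, else theta times the lowest child score.

    children maps a subject to its more specific subjects and is assumed
    to be acyclic; a cycle would cause unbounded recursion.
    """
    kids = children.get(subject, [])
    if not kids:
        return 1.0
    return theta * min(specificity(k, children, theta) for k in kids)


# Invented hierarchy: each key is broader than the subjects listed under it.
children = {
    "Science": ["Computer science"],
    "Computer science": ["Ontology", "Databases"],
}
scores = {
    s: specificity(s, children)
    for s in ["Ontology", "Computer science", "Science"]
}
```

Under these assumptions the score shrinks with every step up the hierarchy, so the broadest subject receives the lowest value.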
Processor : Intel Dual Core.
Hard Disk : 60 GB.
Floppy Drive : 1.44 MB.
Monitor : LCD Colour.
Mouse : Optical Mouse.
RAM : 512 MB.
Operating system : Windows XP.
Coding Language : ASP.Net with C#
Data Base : SQL Server 2005
Xiaohui Tao, Yuefeng Li, and Ning Zhong, “A Personalized Ontology Model for Web Information Gathering”, IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 4, April 2011.