Conceptual Documents and Hypertext Documents are two Different Forms of Virtual Document



Sylvie Ranwez - Michel Crampes


Laboratoire de Génie Informatique et d'Ingénierie de Production

EMA - EERIE, Parc Scientifique Georges Besse

F-30035 Nîmes Cedex 1,


Abstract. When posing simple queries on the Internet, users often find themselves facing relatively incoherent and poorly organized groups of items of information. They should be helped in their navigation through them. This help can take the form of Conceptual Documents that are adapted to their particular state and preferences.

In this paper we give a definition of Virtual Documents and we study two particular cases: Hypertext Documents and Conceptual Documents. We show how Conceptual Documents can improve data retrieval and quality of knowledge transfer. We propose an approach for generating conceptual documents and we present an application based on this approach.

Key Words: Virtual Documents, Conceptual Documents, Hypertext, Conceptual Navigation.




1. Introduction


Searching for information on the Internet, users find themselves faced with collections of pieces of information which they bring to light either by browsing or by using search engines. These collections often have little coherence and users have to filter and organize them in order to make them exploitable. When they do this, they synthesize a form of document suited to their needs. In what follows, we call the individual pages or pieces of pages which users unearth ‘Information Bricks’ (IBs). We call a Real Document (RD) the composite document which they eventually synthesize out of these bricks.


Our research focuses on techniques that would allow the production of these real documents from raw information as coherently as possible. A collection of information bricks with techniques suited to building a real document is called a Virtual Document (VD) - one which is not itself a real document but which contains the specifications necessary for producing one.


This paper first analyses what characterizes coherency in a document, then how it is possible to produce a VD from a set of IBs. Two forms of VD are analyzed: Hypertextual and Conceptual. A Real Document is presented as the result of a VD and specific circumstances. The circumstances differ according to the type of VD, whether it is hypertextual or conceptual. Hypertext documents come into being through the user’s own browsing through hyperlinks. Conceptual documents come into being through a conceptual engine and the user’s specifications.

Finally, we present a project under development in our laboratory that aims to lay a foundation for building conceptual documents.


2. Preliminary definitions


The word document (from the Latin documentum) means "a thing used for giving instruction". A document is often written and can be used as proof, information or testimony.


A document is a tool used for transferring knowledge. Therefore it must be structured in a way that optimizes the user’s comprehension. This structure might be as follows:


This document is called a Real Document (RD) since it can be consulted without any change, i.e. in its present state.


What we call an Information Brick (IB) is a fragment of a document, rendered on at least one medium, characterized by a conceptual model and insertable into a real document. Once a set of IBs exists, building a real document consists of selecting the pertinent ones, then organizing and assembling them. Bricks can be nested, since they can be segmented into sub-bricks. The size of an IB depends on the author’s wishes (how deeply a concept is explained) and on the content itself: the description of a pen will probably be shorter than that of a plane.


The document from which IBs are extracted is called a Source Document.


The most common definition of Virtual is: being in essence or effect but not in fact; in other words, being in a state of possibility. In what follows we adhere to this definition, which is widely used in data processing, for example in the term ‘virtual memory’: something which appears functionally to a given user regardless of the physical structure or the logic used.


In accordance with the two definitions given above, we can define what a Virtual Document is.


3. Virtual Document: a definition


A Virtual Document (VD) is a non-organized collection of Information Bricks (IBs) associated with tools and techniques allowing the creation of a Real Document (RD).



By analogy with object-oriented languages, a Virtual Document is a composite class, IBs are components, and techniques and tools are construction methods. A Real Document can be seen as an instance of this class.


A more formal definition for a Virtual Document might be:


VD = { IB } + Methods allowing the generation of a finite IB sequence.


The methods must take into account:


A Real Document is then a sequence of IBs generated by the methods in concordance with the user’s specification.
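
As an illustration only, the object-oriented analogy above can be sketched in Python. The class and field names (`InformationBrick`, `VirtualDocument`, `realize`, `keep_requested`) and the shape of the user's specification are our assumptions for this sketch; they are not part of the definition itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class InformationBrick:
    """A fragment of a document (IB). Field names are illustrative."""
    ident: str
    content: str


# A construction method maps (bricks, user specification) to an IB sequence.
Method = Callable[[Sequence[InformationBrick], dict], List[InformationBrick]]


@dataclass
class VirtualDocument:
    """VD = { IB } + a method able to generate an IB sequence."""
    bricks: Sequence[InformationBrick]
    method: Method

    def realize(self, specification: dict) -> List[InformationBrick]:
        """Produce a Real Document: an ordered list of bricks."""
        return self.method(self.bricks, specification)


def keep_requested(bricks, specification):
    """Toy method (hypothetical): keep the requested bricks, in the requested order."""
    by_id = {b.ident: b for b in bricks}
    return [by_id[i] for i in specification["wanted"] if i in by_id]


vd = VirtualDocument(
    bricks=[
        InformationBrick("a", "Introduction"),
        InformationBrick("b", "Details"),
        InformationBrick("c", "Summary"),
    ],
    method=keep_requested,
)
real_document = vd.realize({"wanted": ["c", "a"]})
print([b.ident for b in real_document])  # ['c', 'a']
```

The same collection of bricks yields a different Real Document for each specification, which is what distinguishes the Virtual Document from any one of its instances.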

4. Different forms of Virtual Document and their characteristics


In this part we will differentiate two kinds of Virtual Document: Hypertextual and Conceptual.


4.1 Hypertext Documents (HD)


By analogy with the vocabulary used in object languages, a Hypertext Document is a sub-class of a VD.


An HD is composed of hypertext information bricks that have predefined connections. The links contained in these bricks (hyperlinks) lead to other bricks, the whole constituting a graph. The method which allows the building of a Real Document out of an HD (the "constructor" of an object-oriented language) is the user’s browsing through the document, i.e. the traversal of this graph.


A hypertext document is by definition a VD because its final form depends entirely on the wishes of the user. The route taken in visiting the document is known only by the user; it is not preset. A particular case of an HD is the one where there is only one link at the end of each page, leading to the following page. In this case the HD is also the RD.


The formal characteristics of an HD are:


HD = {Hypertext IB} + User’s browsing.
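
A minimal sketch of this formula, under our own assumptions: the bricks are reduced to identifiers, the hyperlinks to an adjacency table, and the user's browsing to a hypothetical `choose` callback that picks one of the links offered by the current brick. The page names are invented for the example.

```python
from typing import Callable, Dict, List, Optional

Links = Dict[str, List[str]]                            # brick id -> ids its hyperlinks lead to
Chooser = Callable[[str, List[str]], Optional[str]]     # stands in for the user


def browse(links: Links, start: str, choose: Chooser, max_steps: int = 10) -> List[str]:
    """Return the Real Document as the sequence of bricks visited by the user."""
    path = [start]
    for _ in range(max_steps):
        targets = links.get(path[-1], [])
        target = choose(path[-1], targets)
        if target is None or target not in targets:
            break  # the user stops, or asks for a link the brick does not offer
        path.append(target)
    return path


def scripted_user(choices: List[str]) -> Chooser:
    """A user whose successive clicks are known in advance (for the example only)."""
    remaining = iter(choices)
    return lambda current, targets: next(remaining, None)


# General case: the graph allows many routes; the user's browsing fixes one.
graph = {"home": ["history", "usage"], "history": ["usage"], "usage": ["home"]}
print(browse(graph, "home", scripted_user(["history", "usage"])))  # ['home', 'history', 'usage']

# Particular case: one link per page, leading to the following page.
linear = {"p1": ["p2"], "p2": ["p3"], "p3": []}
first_link = lambda current, targets: targets[0] if targets else None
print(browse(linear, "p1", first_link))  # ['p1', 'p2', 'p3']
```

In the linear case only one route exists, so every user obtains the same sequence: the HD and the RD coincide.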



4.2 Conceptual Documents (CD)


A Conceptual Document is a Virtual Document from which it is possible to build a Real Document dynamically, whenever the user asks for one. Contrary to HDs, the bricks that compose a CD can have several formats. These bricks are selected via the semantics of their contents. They can be accessed via the Internet or locally (CD-ROM, DVD, hard disk, etc.). In this case, the methods used to create a Real Document consist of an engine and the user’s specifications. The engine is in charge of selecting the IBs, organizing and assembling them, but it does this in obedience to specifications defined by the user.


These specifications may include economic constraints such as reading time. This point is significant since it constitutes one of the major differences between HDs and CDs. Indeed, the links of an HD may take into account the semantics of the IBs they refer to, but time never plays a role in the browsing, even though it is one of the things users keep firmly in mind.


With our formalism, we can write:


CD = {IB} + Engine and specifications.
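
The following sketch shows one way such an engine could behave; it is not the engine described later in this paper. We assume, for illustration, that each brick carries a set of concepts and a reading time in minutes, and that the user's specification is a set of wanted concepts plus a time budget. The greedy strategy and all names are ours.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Brick:
    ident: str
    concepts: FrozenSet[str]   # hypothetical semantic description of the content
    fmt: str                   # the bricks of a CD may have several formats
    reading_minutes: int


@dataclass(frozen=True)
class Specification:
    concepts: FrozenSet[str]   # what the user wants to read about
    max_minutes: int           # economic constraint: available reading time


def engine(bricks: List[Brick], spec: Specification) -> List[Brick]:
    """Select bricks by concept overlap, most relevant first, within the time budget."""
    relevant = [b for b in bricks if b.concepts & spec.concepts]
    relevant.sort(key=lambda b: (-len(b.concepts & spec.concepts), b.reading_minutes))
    document, remaining = [], spec.max_minutes
    for brick in relevant:
        if brick.reading_minutes <= remaining:
            document.append(brick)
            remaining -= brick.reading_minutes
    return document


bricks = [
    Brick("xml-intro", frozenset({"xml"}), "html", 5),
    Brick("dtd-video", frozenset({"xml", "dtd"}), "video", 12),
    Brick("dtd-notes", frozenset({"dtd"}), "text", 4),
    Brick("cooking", frozenset({"recipes"}), "html", 3),
]
wanted = frozenset({"xml", "dtd"})
print([b.ident for b in engine(bricks, Specification(wanted, 17))])  # ['dtd-video', 'dtd-notes']
print([b.ident for b in engine(bricks, Specification(wanted, 10))])  # ['dtd-notes', 'xml-intro']
```

The two calls use the same bricks and the same concepts; only the reading time changes, and with it the Real Document produced. No amount of browsing through fixed hyperlinks gives this kind of control.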



4.3 Other characteristics for VDs

It is possible to distinguish homogeneous VDs from heterogeneous VDs.


4.3.1 Homogeneous Virtual Documents


A VD is homogeneous if all its bricks come from the same source document. Thus they all have the same author or group of authors.

A homogeneous document has the following properties:


4.3.2 Heterogeneous Virtual Documents


Heterogeneous Virtual Documents are documents composed of bricks that can have several different origins – Internet, CD-ROM, etc. – and thus several authors. Their characteristics are as follows:

In such documents, problems of coherence are likely. Transitions between IBs will need particular treatment.
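
The distinction can be made operational with very little code. In this sketch we assume that each brick records the source document it was extracted from (the file names below are invented); the second function locates the places where two consecutive bricks come from different sources, which is where transitions are likely to need particular treatment.

```python
from typing import List


def is_homogeneous(brick_sources: List[str]) -> bool:
    """True when every brick was extracted from the same source document."""
    return len(set(brick_sources)) <= 1


def source_transitions(brick_sources: List[str]) -> List[int]:
    """Indices i such that brick i and brick i + 1 come from different sources."""
    return [
        i
        for i in range(len(brick_sources) - 1)
        if brick_sources[i] != brick_sources[i + 1]
    ]


sources = ["course.html", "course.html", "newspaper.html", "course.html"]
print(is_homogeneous(sources))      # False
print(source_transitions(sources))  # [1, 2]
```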


5. How to create Real Documents from Conceptual Ones?


The construction of an RD has three phases. The first is Information Retrieval (IR), which takes size constraints into account. The second is the building of the real document; this includes filtering the retrieved IBs, then ordering and assembling them. The last consists in displaying the document through a convenient interface (HTML for example).
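
The three phases can be pictured as a small pipeline. The sketch below is deliberately naive and rests on our own assumptions: retrieval is a substring match, the size constraint is a maximum number of bricks, filtering discards very short bricks, and ordering relies on a hypothetical `rank` field attached to each brick.

```python
from html import escape


def retrieve(bricks, query, max_bricks):
    """Phase 1: information retrieval, bounded by a size constraint."""
    hits = [b for b in bricks if query.lower() in b["text"].lower()]
    return hits[:max_bricks]


def build(hits, min_chars):
    """Phase 2: filter the retrieved bricks, then order and assemble them."""
    kept = [b for b in hits if len(b["text"]) >= min_chars]
    return sorted(kept, key=lambda b: b["rank"])


def display(document, title):
    """Phase 3: render the real document as a simple HTML page."""
    paragraphs = "\n".join(f"<p>{escape(b['text'])}</p>" for b in document)
    return (f"<html><head><title>{escape(title)}</title></head>\n"
            f"<body>\n{paragraphs}\n</body></html>")


bricks = [
    {"text": "XML documents are validated against a DTD.", "rank": 2},
    {"text": "XML is a markup language.", "rank": 1},
    {"text": "XML", "rank": 0},
    {"text": "Hypertext links form a graph.", "rank": 3},
]
hits = retrieve(bricks, "xml", max_bricks=3)
document = build(hits, min_chars=10)
page = display(document, "About XML")
print(page)
```

Keeping the phases as separate functions means that each can be replaced independently, for instance by swapping the substring match for a semantic selection.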


To do this we need an engine. We have constructed what we call a Conceptual Evocative Engine (CEE), one which uses the semantics of the IBs to do the selecting and building of the final document [CRA 97] [CRA 98].


To allow reusability of IBs, they have to be qualified with metadata. Qualification needs a description method. We have developed such a method through a Document Type Definition (DTD) and the corresponding language written in XML. Once the qualification is done the CEE can identify each brick by the semantics of its content and build up conceptual links between those bricks.
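
To make the idea of qualification concrete, here is a small sketch. The DTD and element names below (`brick`, `title`, `concept`, `duration`) are invented for this example and are not the DTD developed in our project. The standard `xml.etree.ElementTree` parser reads the description but does not validate it against the DTD.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical brick description with an inline DTD (illustration only).
BRICK_XML = """
<!DOCTYPE brick [
  <!ELEMENT brick (title, concept+, duration)>
  <!ATTLIST brick id ID #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT concept (#PCDATA)>
  <!ELEMENT duration (#PCDATA)>
  <!ATTLIST duration unit CDATA #REQUIRED>
]>
<brick id="b12">
  <title>Validating a document against a DTD</title>
  <concept>XML</concept>
  <concept>DTD</concept>
  <duration unit="minutes">6</duration>
</brick>
""".strip()


def read_brick(xml_text):
    """Extract the metadata of one qualified brick (parsed, not validated)."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "concepts": {c.text for c in root.findall("concept")},
        "minutes": int(root.findtext("duration")),
    }


def conceptual_links(bricks):
    """Pairs of brick ids whose descriptions share at least one concept."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(bricks, 2)
        if a["concepts"] & b["concepts"]
    ]


brick = read_brick(BRICK_XML)
other = {"id": "b7", "title": "From SGML to XML", "concepts": {"DTD", "SGML"}, "minutes": 4}
print(brick["concepts"] == {"XML", "DTD"})      # True
print(conceptual_links([brick, other]))         # [('b12', 'b7')]
```

Here a conceptual link is simply a shared concept; a real engine would rely on a richer semantic model.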


Brick qualification using a DTD and the associated processing has several advantages, among them the ability to handle constraints of different sorts, narrative and economic for example.

First, narrative constraints:

Second, economic constraints:


When the engine has selected IBs which meet the user’s needs, it has to assemble them. This is achieved by taking into account not just user preferences and narrative and economic constraints but also information such as the educational curriculum the user has followed or is following. The result is a real document adapted to the specific needs of the specific user.
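
One possible reading of this adaptation step is sketched below; it is a simplification under our own assumptions, not the assembling procedure of our engine. The curriculum is reduced to a set of concepts the user has already covered, the preferences to an ordered list of formats, and all field names are hypothetical.

```python
def adapt_to_user(selected, covered_concepts, preferred_formats):
    """Keep bricks that bring something new, preferred formats first."""
    # A brick is kept if it teaches at least one concept not yet covered.
    fresh = [b for b in selected if set(b["concepts"]) - covered_concepts]

    def rank(brick):
        # Position in the preference list; unlisted formats come last.
        if brick["format"] in preferred_formats:
            return preferred_formats.index(brick["format"])
        return len(preferred_formats)

    # sorted() is stable: bricks of equal rank keep the engine's order.
    return sorted(fresh, key=rank)


selected = [
    {"id": "s1", "concepts": ["html"], "format": "text"},
    {"id": "s2", "concepts": ["xml", "dtd"], "format": "text"},
    {"id": "s3", "concepts": ["dtd"], "format": "video"},
]
result = adapt_to_user(selected, covered_concepts={"html"}, preferred_formats=["video", "text"])
print([b["id"] for b in result])  # ['s3', 's2']
```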


6. Application: The Karina project and the associated DTD


The theoretical considerations set out above constitute the basis of Karina, an application that we have developed in our laboratory.

Karina is a project jointly undertaken by the Alès School of Mines (l'Ecole des Mines d'Alès or EMA) and the Marseilles Higher Engineering School (l'Ecole Supérieure des Ingénieurs de Marseille or ESIM) within the framework of the French Ministry of Industry's call for projects for the Information Highway. This particular project aims to furnish teachers with tools enabling them to use the Internet for making courses available and also for making use of course bricks provided either by other teachers or by alternative sources - electronic newspapers etc. Initially our application is oriented towards distance learning, but we wish it to remain general enough to embrace cultural and leisure activities as well.


The conceptual evocation engine has already been modeled [CRA 98] for producing narrative abstracts in the domain of interactive television. It is now being developed further, with the aim of applying it to the domain of teaching [RAN 98].


7. State of the art


In adaptive hypermedia systems, the aim is to find a compromise between guiding users and letting them browse on their own [BRU 96] [GRE 97] [STE 97] [WEB 97]. This work concerns essentially the adaptation of user-browsing to an already established hypergraph [BRU 98]. It does not aim at the construction of new links and their organization in response to user needs. The approaches cited above are attempting to find ways of adapting pre-existent hypermedia, not dynamically constructing routings through ad hoc collections of bricks borrowed from other documents.


The sharing of resources, however, makes document indexing necessary. Document description is organized around complete documents and makes use of either specific descriptors [MAR 97][BAR 98] or descriptors already established as standards or recommendations [DUB 97][MARC]. This latter category includes the IMS (Instructional Management System) recommendation for educational documents [IMS 98]. The aim of the IMS project is to furnish specifications for the description of educational materials using a system of meta-data.


XML (eXtensible Mark-up Language) allows the description of electronic documents by means of a DTD (Document Type Definition). The use of DTDs for internet documents is recent yet already well-established: for example, a preliminary version of the MARC standard DTD is available [MARCDTD]. XML is intended to be extensible since it allows for the fusing of several DTDs, not unlike the principles of inheritance in the object world [W3C 98].

Other forms of document description have been explored: [BAL 97], for instance, proposes an object architecture for modeling electronic documents. In Karina, we use a similar reification technique to transform a descriptor into an XML element, without however situating our model in the pure object world.

The authors of [BRA 98] use HTML comments to annotate documents in order to be able to implement adaptive changes. In the Karina project, the entities forming the basis of conceptual browsing are explicitly declared as XML elements - not hidden in comments.


[GRE 98] proposes a technique for automatically generating links between documents dealing with the same subject. This permits grouping the responses to a search on that subject. In order to extract the semantics of the content, the approach is to note the words and their frequency of use in a document, including closely related words (found via relations established through WordNet links). The main problem here is speed of execution.
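
To give an idea of what frequency-based linking involves, the sketch below links two documents when the cosine similarity of their raw word counts exceeds a threshold. This is a deliberately simplified stand-in, not the algorithm of [GRE 98]: it ignores related words and WordNet entirely, and the threshold and sample texts are our own.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_frequencies(text):
    """Count the lowercase alphabetic words of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_links(documents, threshold):
    """Link every pair of documents whose similarity reaches the threshold."""
    vectors = {name: term_frequencies(text) for name, text in documents.items()}
    return [
        (x, y)
        for x, y in combinations(sorted(vectors), 2)
        if cosine(vectors[x], vectors[y]) >= threshold
    ]


documents = {
    "a": "xml dtd xml elements",
    "b": "dtd elements and xml attributes",
    "c": "cooking pasta recipes",
}
print(generate_links(documents, threshold=0.5))  # [('a', 'b')]
```

Every pair of documents is compared, so the cost grows quadratically with the size of the collection, which gives a feel for why speed of execution becomes a concern.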


Starting from the linear form of the conceptual graph, we have explored the use of light versions of weighted conceptual graphs for effecting conceptual evocations between documents [CRA 97] [CRA 98].



8. Conclusion


In their browsing and searching on the internet, users need guidance. They need a system capable of creating documents adapted to their precise requirements and demands; documents which come into being as a result of the application of certain circumstances to a virtual document.

After defining these different types of document, we have concentrated on one of them - the conceptual one. We put forward a system permitting the creation of such documents and the production of real documents adapted to the needs of their users.

Our approach focuses on modes of conceptual browsing. These operate on collections of information bricks which are qualified - that is to say containing inside themselves meta-information related to their content and to any constraints limiting their use. A conceptual evocation engine (CEE) has been modeled and is currently the object of further development.

The process of constructing real documents out of their conceptual counterparts requires the definition of rules of narrative construction and constraint optimization algorithms - this because a real document has to obey constraints of size and time. Our theoretical work now bears on this aspect of the problem. We are attempting to establish the formal bases of a language for conceptual document description - and thus also for virtual document construction.





References

[BAL 97] Baldonado M., Chen-Chuan K.C., Gravano L., Metadata for Digital Libraries: Architecture and Design Rationale. Proc. DL'97 ACM Digital Library '97, Philadelphia, PA, USA, July 1997, ACM Press, pp. 247-253.

[BAR 98] Barthélémi S., Loubier M., Pinon J.M., SEMUSDI, SErveur MUltimédia pour les Sciences De l'Ingénieur. Proc. NTICF'98, INSA de Rouen, 18-20 November 1998.

[BRA 98] De Bra P., Calvi L., 2L670: A Flexible Adaptive Hypertext Courseware System. Proc. HyperText'98, Pittsburgh, PA, USA, June 1998, ACM Press, pp. 283-284.

[BRU 96] Brusilovsky P., Schwartz E., Weber G., A Tool for Developing Hypermedia-Based ITS on WWW, Position Paper for ITS'96 Workshop on Architectures and Methods for Designing Cost-Effective and Reusable ITSs, Montreal, June 10th 1996.

[BRU 98] Brusilovsky P., Methods and Techniques of Adaptive Hypermedia. In Adaptive Hypertext and Hypermedia, Brusilovsky P., Kobsa A., and Vassileva J. eds., Kluwer Academic Publishers, 1998.

[CRA 95] Crampes M. Composition Multimédia dans un contexte Narratif. Modèles et Maquetage Basé sur une Architecture Agents. PhD Thesis, University of Montpellier II, 1995.

[CRA 97] Crampes M., Auto-Adaptative Illustration through Conceptual Evocation. Proc. DL'97 ACM Digital Library '97, Philadelphia, PA, USA, July 1997, ACM Press, pp. 247-253.

[CRA 98] Crampes M., Veuillez J.P., Ranwez S., Adaptive Narrative Abstraction. Proc. HyperText'98, Pittsburgh, PA, USA, June 1998, ACM Press, pp. 97-105.

[DUB 97] Dublin Core Metadata Element Set: Reference Description, 1997.

[GRE 97] Greer J.E., Philip T., Guided Navigation Through Hyperspace. Proc. Workshop "Intelligent Educational Systems on the World Wide Web", 8th World Conference of the AIED Society, Kobe, Japan, 18-22 August 1997.

[GRE 98] Green S.J. Automated Link Generation: can we do better than term repetition? Seventh International World Wide Web Conference, Brisbane, Australia, 14-18 April 1998.

[IMS 98] Educause, Instructional Management Systems, 1998.

[MAR 97] Marchionini G., Nolet V., Williams H., Ding W., Beale Jr. J., Rose A., Gordon A., Enomoto E., Harbinson L., Content + Connectivity => Community: Digital Resources for a Learning Community. Proc. Second ACM Digital Library conference, Philadelphia, PA, USA, July 1997.

[MARC] Library of Congress; Network Development and MARC Standards Office. MARC STANDARDS Machine-Readable Cataloging.

[MARCDTD] Library of Congress; Network Development and MARC Standards Office. MARCDTD.

[RAN 98] Ranwez S., Formalisation d'Ontologie Pédagogique (incluant matériel et procédés didactiques) et raisonnement sur cette ontologie pour l'élaboration de cours adaptatifs, suivant différentes Stratégies Pédagogiques, dans un système de formation continue disponible via Internet. Rapport interne LGI2P, Ecole des Mines d'Ales, 1998.

[STE 97] Sterb M.K., The difficulties in Web-Based Tutoring, and Some Possible Solutions. Proc. Workshop "Intelligent Educational Systems on the World Wide Web", 8th World Conference of the AIED Society, Kobe, Japan, 18-22 August 1997.

[W3C 98] W3C, Document Object Model (DOM) Level 1 Specification Version 1.0, W3C REC-DOM-Level-1-19981001, 1 October 1998.

[WEB 97] Weber G., Specht M., User Modeling and Adaptive Navigation Support in WWW-based Tutoring Systems. Proc. UM-97, Cagliari, Italy, 2-5 June 1997.