

Presentation "Harvesting Topic Maps with XSLT" at the Knowledge Technologies 2001 Conference, 2001-03-06.

Harvesting Topic Maps with XSLT


by Nikita Ogievetsky, Cogitech, Inc.
nogievet@cogx.com
©Cogitech, Inc.



Food Chain

Crops are grown.
Crops are harvested and fowl is hunted.
Food is cooked.
Cooked food is consumed.
Consumed food is recycled, disseminated, turned into fertilizer...
Crops are grown.



Knowledge Chain

Information is acquired.
Conceived information becomes knowledge.
Knowledge is cooked (prepared) for presentation.
Presentation is perceived.
Perceived presentation is recycled, disseminated, turned into common sense...
Information is acquired.



Food Chain. Large Perspective

Food undergoes 2 stages before it is consumed:
  • Harvesting and storing.
  • Aggregating and cooking.



Food Chain. Outcome




Final result depends on both:
  • How the produce was grown and gathered.
    • How, who, when, where.
  • How it was cooked.




Harvesting Constraints

It is hard to cook delicious, nice-looking and healthy dishes
  • given spoiled ingredients.
It is quite possible to cook tasteless dishes
  • given excellent ingredients.



Harvesting Stylesheets

A collection of constraints and rules constitutes a stylesheet.
  • Stylesheets that transform agricultural resources into edible groceries.
  • Stylesheets that transform groceries (bwyd) into food.



Knowledge Chain. Large Perspective.

Information has to undergo 2 stages before it is conceived:
  • Data acquisition (harvesting) and storing.
  • Aggregating and presenting.



Knowledge Chain. Outcome.

Final result depends on both:
  • How the information was collected.
    • who, when, where
  • How it was presented.



Harvesting Constraints

It is hard to make a good presentation
  • given a corrupt or wrong underlying knowledge base.
It is quite possible to make a terrible presentation
  • given a great underlying knowledge base.



Knowledge Harvesting Stylesheets

A collection of constraints and rules constitutes a stylesheet.
  • Stylesheets that transform information resources into a knowledge base.
    • Cognition Stylesheets.
  • Stylesheets that transform knowledge base into a presentation.
    • Presentation Stylesheets.



Cognition Stylesheet

A stylesheet that transforms ...
  • the situation that a researcher is looking at into the situation he sees
  • sounds that a researcher is listening to into the signals he distinguishes from the noise
  • a wine bouquet that a researcher is tasting into the bouquet he appreciates
  • ...



Perspectives...

  • The further back we look in time, the more adornments people use in their cognition stylesheets
    • mythologies
  • Or look back into your childhood...
<xsl:choose>
	<xsl:when test="Understand">
		<Have-Fun/>
	</xsl:when>
	<xsl:otherwise>
		<Disregard/>
	</xsl:otherwise>
</xsl:choose>



Food Web

"A complex of interrelated food chains in an ecological community." -- The American Heritage Dictionary

Semantic Web?




Why intermediate Knowledge repository

Or ?




Why XML Topic Maps for Knowledge Repository on the Web

  • They allow metadata to be maintained in a highly structured way, at a higher level than a single web site.
  • Different types of resources can be stored and maintained separately, and at the same time interconnected with each other and with the business rules of the web site.
  • Not only content and look and feel, but also the web site structure itself and navigational profiles can be customized for different types of users.



XSLT pseudocode

Harvesting Topic Maps: How-to

<xsl:choose>
	<!-- Pseudocode: the test/select values are placeholders, not real XPath. -->
	<xsl:when test="has-relevant-metadata">
		<topic>
		<xsl:for-each select="children-without-relevant-metadata">
			<occurrence/>
		</xsl:for-each>
		</topic>
		<xsl:for-each select="children-with-relevant-metadata">
			<association/>
		</xsl:for-each>
	</xsl:when>
	<xsl:otherwise/>
</xsl:choose>
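
The pseudocode above can be turned into a small runnable stylesheet. The following is only a sketch under stated assumptions, not the stylesheet used in the talk: it assumes the input is RDF/XML with rdf:Description elements, treats the presence of dc:title as "has relevant metadata", and uses dc:date as a stand-in for data kept as an occurrence. Associations are left out to keep it short.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns="http://www.topicmaps.org/xtm/1.0/"
    exclude-result-prefixes="rdf dc">

  <xsl:output method="xml" indent="yes"/>

  <!-- Root: wrap everything harvested in one <topicMap>. -->
  <xsl:template match="/">
    <topicMap>
      <xsl:apply-templates select="//rdf:Description"/>
    </topicMap>
  </xsl:template>

  <!-- Assumption: a description with a dc:title "has relevant metadata"
       and therefore becomes a topic. -->
  <xsl:template match="rdf:Description[dc:title]">
    <topic id="{generate-id()}">
      <baseName>
        <baseNameString><xsl:value-of select="dc:title"/></baseNameString>
      </baseName>
      <!-- Assumption: dc:date is kept as plain data, i.e. an occurrence. -->
      <xsl:for-each select="dc:date">
        <occurrence>
          <resourceData><xsl:value-of select="."/></resourceData>
        </occurrence>
      </xsl:for-each>
    </topic>
  </xsl:template>

  <!-- The "otherwise" branch: descriptions without relevant metadata are skipped. -->
  <xsl:template match="rdf:Description"/>
</xsl:stylesheet>
```

The xsl:choose of the pseudocode is expressed here as two templates: the one with the [dc:title] predicate has the higher default priority, so the empty template only fires for descriptions that lack a title.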



Knowledge Extraction Stylesheets for Dublin Core Metadata Element Set Mapping




dc.identifier

  • dc.identifier => topic/@id
If dc.identifier is missing
  • generate-id(.)=> topic/@id
<xsl:variable name="id">
	<xsl:choose>
		<xsl:when test="dc:identifier"><xsl:value-of select="dc:identifier"/></xsl:when>
		<xsl:otherwise><xsl:value-of select="generate-id()"/></xsl:otherwise>
	</xsl:choose>
</xsl:variable>
<topic id="{$id}"/>



dc.subject

  • dc.subject => <instanceOf> elements
<instanceOf>
	<xsl:choose>
		<xsl:when test="@rdf:resource">
			<topicRef xlink:href="#{@rdf:resource}"/>
		</xsl:when>
		<xsl:otherwise>
			<topicRef xlink:href="#{psv:Descriptor/@rdf:about}"/>
		</xsl:otherwise>
	</xsl:choose>
</instanceOf>



dc:subject Classes

  • Extract unique <dc.subject>s
<xsl:for-each select="//dc:subject[@rdf:resource][not(following::dc:subject/@rdf:resource = @rdf:resource)]">
 	<topic id = "{@rdf:resource}">
  		<subjectIdentity>
			<topicRef xlink:href="{substring-before(@rdf:resource,':')}.xtm#{@rdf:resource}"/>
		</subjectIdentity>
  		<instanceOf><topicRef xlink:href="#{substring-before(@rdf:resource,':')}"/></instanceOf>
  		<baseName>
	  		<scope><topicRef xlink:href="#{substring-before(@rdf:resource,':')}"/></scope>
			<baseNameString><xsl:value-of select="substring-after(@rdf:resource,':')"/></baseNameString>
  		</baseName>
  	</topic>
</xsl:for-each>
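
The following:: test above rescans the rest of the document for every dc:subject, which can get slow on large inputs. A common XSLT 1.0 alternative is Muenchian grouping with xsl:key. The sketch below is not from the original slides, and the key name subjectByResource is made up for illustration. It keeps the first occurrence of each value rather than the last, which should yield the same set of topics.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.topicmaps.org/xtm/1.0/"
    exclude-result-prefixes="rdf dc">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical key: index every dc:subject by its rdf:resource value. -->
  <xsl:key name="subjectByResource"
           match="dc:subject[@rdf:resource]"
           use="@rdf:resource"/>

  <xsl:template match="/">
    <topicMap>
      <!-- Keep a dc:subject only if it is the first node in its key group. -->
      <xsl:for-each select="//dc:subject[@rdf:resource]
          [generate-id() = generate-id(key('subjectByResource', @rdf:resource)[1])]">
        <topic id="{@rdf:resource}">
          <instanceOf>
            <topicRef xlink:href="#{substring-before(@rdf:resource, ':')}"/>
          </instanceOf>
          <baseName>
            <baseNameString>
              <xsl:value-of select="substring-after(@rdf:resource, ':')"/>
            </baseNameString>
          </baseName>
        </topic>
      </xsl:for-each>
    </topicMap>
  </xsl:template>
</xsl:stylesheet>
```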



dc:subject Classes in PRISM

  • xslt:template mode="PRISM"
<xsl:for-each select="//dc:subject[@rdf:resource][not(following::dc:subject/@rdf:resource = @rdf:resource) and not(following::dc:subject/psv:Descriptor/@rdf:about=@rdf:resource )]">
 	<topic id = "{@rdf:resource}">
  		<subjectIdentity><topicRef xlink:href="{substring-before(@rdf:resource,':')}.xtm#{@rdf:resource}"/></subjectIdentity>
  		<instanceOf><topicRef xlink:href="#{substring-before(@rdf:resource,':')}"/></instanceOf>
  		<baseName>
	  		<scope><topicRef xlink:href="#{substring-before(@rdf:resource,':')}"/></scope>
			<baseNameString><xsl:value-of select="substring-after(@rdf:resource,':')"/></baseNameString>
  		</baseName>
  	</topic>
</xsl:for-each>
<xsl:for-each select="//dc:subject/psv:Descriptor[not(following::dc:subject/@rdf:resource = @rdf:about) and not(following::dc:subject/psv:Descriptor/@rdf:about=@rdf:about)]">
 	<topic id = "{@rdf:about}">
  		<instanceOf><topicRef xlink:href="#{substring-before(@rdf:about,':')}"/></instanceOf>
  		<baseName>
	  		<scope><topicRef xlink:href="#{substring-before(@rdf:about,':')}"/></scope>
			<baseNameString><xsl:value-of select="psv:label"/></baseNameString>
  		</baseName>
  		<baseName>
	  		<scope><topicRef xlink:href="#{substring-before(@rdf:about,':')}"/></scope>
			<baseNameString><xsl:value-of select="psv:code"/></baseNameString>
  		</baseName>
  	</topic>
</xsl:for-each>



dc.format

  • dc.format => <instanceOf> MIME types
<instanceOf>
	<topicRef xlink:href="#{translate(.,' .,/?-','')}"/>
</instanceOf>
  • Extract unique <dc.format>s
<xsl:for-each select="//rdf:Description[not(following::rdf:Description/dc:format=dc:format)][dc:format]">
 	<topic id = "{translate(dc:format,' .,/?-','')}">
  		<instanceOf><topicRef xlink:href="#dc-format"/></instanceOf>
		<baseName>
			<baseNameString><xsl:value-of select="dc:format"/></baseNameString>
		</baseName>
  	</topic>
</xsl:for-each>



#dc-format

<topic id="dc-format">
  <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://purl.org/dc/elements/1.1#format"/>
  </subjectIdentity>
  <occurrence>
      <instanceOf><topicRef xlink:href="#definition"/></instanceOf>
      <scope><topicRef xlink:href="#dc"/></scope>
      <resourceData>The physical or digital manifestation of the resource.</resourceData>
  </occurrence>
  <occurrence>
      <instanceOf><topicRef xlink:href="#description"/></instanceOf>
      <scope><topicRef xlink:href="#prism"/></scope>
      <resourceData>
		Typically, Format may include the media-type or dimensions of the resource. 
		Format may be used to determine the software, hardware or other equipment needed to 
		display or operate the resource. Examples of dimensions include size and duration. 
		Recommended best practice is to select a value from a controlled vocabulary 
		(for example, the list of Internet Media Types [MIME] defining computer media formats).
		[For PRISM, I think we are only interested in the media type. 
		Physical format info is probably not something we need to do in an interoperable manner.]
      </resourceData>
  </occurrence>
</topic>



rdf:about

  • rdf:about => <resourceRef>/<subjectIndicatorRef>
PRISM metadata is about resource content => <subjectIndicatorRef>.
<subjectIdentity>
	<subjectIndicatorRef xlink:href="{@rdf:about}"/>
</subjectIdentity>



dc:title

  • dc:title => <baseName>
<baseName>
	<baseNameString><xsl:value-of select="."/></baseNameString>
</baseName>



dc:date

  • dc:date => <occurrence> of type "dc-date"
<occurrence>
	<instanceOf><topicRef xlink:href="#dc-date"/></instanceOf>
	<resourceData><xsl:value-of select="."/></resourceData>
</occurrence>



#dc-date

<topic id="dc-date">
  <instanceOf>
      <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/index.html#psi-occurrence"/>
  </instanceOf>
  <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://purl.org/dc/elements/1.1#date"/>
  </subjectIdentity>
  <baseName><baseNameString>date</baseNameString></baseName>
  <occurrence>
      <instanceOf><topicRef xlink:href="#definition"/></instanceOf>
      <scope><topicRef xlink:href="#dc"/></scope>
      <resourceData>
		A date associated with an event in the life cycle of the resource.
      </resourceData>
  </occurrence>
  <occurrence>
      <instanceOf><topicRef xlink:href="#description"/></instanceOf>
      <scope><topicRef xlink:href="#prism"/></scope>
      <resourceData>
		Typically, Date will be associated with the creation or availability of the resource. 
		Recommended best practice for encoding the date value is defined in a profile of 
		ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.
		Any number of dates may need to be associated with a resource. PRISM recommends 
		that this element contain the date and time the resource was published. 
		Preference should be given to the more specific PRISM date and time elements.
      </resourceData>
  </occurrence>
</topic>



Creators

  • Unique dc.creator => <topic> of type "dc-creator"
<xsl:for-each select="//dc:creator[not(following::dc:creator = .)]">
 	<topic id = "{translate(.,' .,/?-','')}">
  		<instanceOf><topicRef xlink:href="#dc-creator"/></instanceOf>
		<baseName>
			<baseNameString><xsl:value-of select="."/></baseNameString>
		</baseName>
  	</topic>
</xsl:for-each>



#dc-creator

<topic id="dc-creator">
  <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://purl.org/dc/elements/1.1#creator"/>
  </subjectIdentity>
  <baseName><baseNameString>creator</baseNameString></baseName>
  <occurrence>
      <instanceOf><topicRef xlink:href="#definition"/></instanceOf>
      <scope><topicRef xlink:href="#dc"/></scope>
      <resourceData>
		An entity primarily responsible for making the content of the resource.
      </resourceData>
  </occurrence>
  <occurrence>
      <instanceOf><topicRef xlink:href="#description"/></instanceOf>
      <scope><topicRef xlink:href="#prism"/></scope>
      <resourceData>
		Examples of a Creator include a person, an organization, or a service. 
		Typically, the name of a Creator should be used to indicate the entity. 
		In principle, any number of creators may be associated with a resource. 
		PRISM recommends that this element contain the name of one person or 
		organization primarily responsible for this resource.
		Synonyms or "aliases" for creator names should be handled with an Authority File. 
		Use other PRISM elements to describe arbitrary contributory roles.
      </resourceData>
  </occurrence>
</topic>



Knowledge Extraction Stylesheets for Publishing Requirements for Industry Standard Metadata (PRISM) Specification




prism:copyright

  • prism:copyright => <occurrence> of type "copyright"
<occurrence>
	<instanceOf><topicRef xlink:href="#copyright"/></instanceOf>
	<resourceData><xsl:value-of select="."/></resourceData>
</occurrence>



prism:hasAlternative, prism:isAlternative

  • prism:hasAlternative; prism:isAlternative => <association> of type "alternative"
<association>
	<instanceOf><topicRef xlink:href="#alternative"/></instanceOf>
	<member>
  		<roleSpec><topicRef xlink:href="#hasAlternative"/></roleSpec>
  		<topicRef xlink:href="#{../dc:identifier}"/>
	</member>
	<member>
  		<roleSpec><topicRef xlink:href="#isAlternative"/></roleSpec>
  		<topicRef xlink:href="#{//rdf:Description[@rdf:about=current()/@rdf:resource]/dc:identifier}"/>
	</member>
</association>
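
The association fragment above needs an enclosing template to run. The sketch below is one possible wrapper and is not taken from the original stylesheets: the prism namespace URI is assumed from the subject indicators used on these slides, the key name descriptionByAbout is hypothetical, and the xsl:if guard that skips unresolved references is an addition.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:prism="http://prismstandard.org/1.0#"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.topicmaps.org/xtm/1.0/"
    exclude-result-prefixes="rdf dc prism">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical key: look up a resource description by its rdf:about URI. -->
  <xsl:key name="descriptionByAbout" match="rdf:Description" use="@rdf:about"/>

  <xsl:template match="/">
    <topicMap>
      <xsl:apply-templates select="//prism:hasAlternative"/>
    </topicMap>
  </xsl:template>

  <xsl:template match="prism:hasAlternative">
    <!-- Identifier of the resource that this element points to. -->
    <xsl:variable name="target"
        select="key('descriptionByAbout', @rdf:resource)/dc:identifier"/>
    <!-- Added guard: emit the association only when both ends resolve. -->
    <xsl:if test="../dc:identifier and $target">
      <association>
        <instanceOf><topicRef xlink:href="#alternative"/></instanceOf>
        <member>
          <roleSpec><topicRef xlink:href="#hasAlternative"/></roleSpec>
          <topicRef xlink:href="#{../dc:identifier}"/>
        </member>
        <member>
          <roleSpec><topicRef xlink:href="#isAlternative"/></roleSpec>
          <topicRef xlink:href="#{$target}"/>
        </member>
      </association>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Using a key instead of the //rdf:Description[...] scan is a design choice for larger inputs; the member structure is the same as in the fragment.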



#hasAlternative, #isAlternative

<topic id="isAlternative">
  <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://prismstandard.org/1.0#isAlternative"/>
  </subjectIdentity>
  <baseName><baseNameString>is alternative for</baseNameString></baseName>
  <occurrence>
      <instanceOf><topicRef xlink:href="#description"/></instanceOf>
      <scope><topicRef xlink:href="#prism"/></scope>
      <resourceData>
		The described resource can be substituted for the referenced resource.
      </resourceData>
  </occurrence>
</topic>
<topic id="hasAlternative">
  <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://prismstandard.org/1.0#hasAlternative"/>
  </subjectIdentity>
  <baseName><baseNameString>has an alternative</baseNameString></baseName>
  <occurrence>
      <instanceOf><topicRef xlink:href="#description"/></instanceOf>
      <scope><topicRef xlink:href="#prism"/></scope>
      <resourceData>The described resource has an alternative version that can be substituted, namely the referenced resource.</resourceData>
  </occurrence>
</topic>



XSLT Layers

Per the XWATL framework, harvesting stylesheets are split into layers.
Include only required stylesheets.
Example:
<xsl:stylesheet ...>
  <!--"http://purl.org/dc/elements/1.1/" vocabulary -->
  <xsl:include href = "dc2xtm.xsl" />  
  <!--"http://purl.org/rss/1.0/modules/syndication/" vocabulary -->
  <xsl:include href = "sy2xtm.xsl" />  
  <!--"http://purl.org/rss/1.0/modules/company/" vocabulary -->
  <xsl:include href = "co2xtm.xsl" />  
  <!--"http://purl.org/rss/1.0/modules/textinput/" vocabulary -->
  <xsl:include href = "ti2xtm.xsl" />  
  <!--"http://purl.org/rss/1.0/" vocabulary -->
  <xsl:include href = "rss2xtm.xsl" />  
  <xsl:include href = "prism2xtm.xsl" />  
  <xsl:include href = "psv2xtm.xsl" />  
{...}



Knowledge Presentation XSLT Templates

Topic Maps give XSLT something to do!




Indexing topics with XSLT keys

<xsl:key
  name = "topicByID" 
  match = "topic" 
  use = "concat('#',@id)" />

  <xsl:apply-templates select="key('topicByID',@xlink:href)"/>
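
As a rough illustration of how this key might be used in a complete stylesheet, the sketch below lists every topic together with the names of its types, resolving each topicRef through the key. It assumes the topic map is in the XTM 1.0 namespace (bound here to the prefix xtm), so the key matches xtm:topic rather than the unprefixed topic shown on the slide.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xtm="http://www.topicmaps.org/xtm/1.0/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xtm xlink">

  <xsl:output method="html" indent="yes"/>

  <!-- Index topics by '#id' so a local xlink:href can be used as the lookup value. -->
  <xsl:key name="topicByID" match="xtm:topic" use="concat('#', @id)"/>

  <xsl:template match="/">
    <ul>
      <xsl:for-each select="//xtm:topic">
        <li>
          <xsl:value-of select="xtm:baseName/xtm:baseNameString"/>
          <!-- Follow each instanceOf reference to the topic it points at. -->
          <xsl:for-each select="xtm:instanceOf/xtm:topicRef">
            <xsl:text> (type: </xsl:text>
            <xsl:value-of
                select="key('topicByID', @xlink:href)/xtm:baseName/xtm:baseNameString"/>
            <xsl:text>)</xsl:text>
          </xsl:for-each>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

The key only resolves references within the same document; an href that points into another topic map file comes back empty.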



Indexing instantiated topics with XSLT keys

<xsl:key
  name = "instance"
  match = "topic"
  use = "substring-after(instanceOf/topicRef/@xlink:href,'#')" />

  <xsl:apply-templates select="key('instance',@id)"/>
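
A minimal sketch of the instance key in use, again assuming the XTM 1.0 namespace bound to the prefix xtm: it prints each topic that has at least one instance, followed by the names of those instances. Note that substring-after() converts its argument to a string, so a topic with several instanceOf children is indexed only under the first one.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xtm="http://www.topicmaps.org/xtm/1.0/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xtm xlink">

  <xsl:output method="html" indent="yes"/>

  <!-- Index each topic under the id of its (first) type. -->
  <xsl:key name="instance" match="xtm:topic"
           use="substring-after(xtm:instanceOf/xtm:topicRef/@xlink:href, '#')"/>

  <xsl:template match="/">
    <dl>
      <!-- Only topics that act as a type for at least one other topic. -->
      <xsl:for-each select="//xtm:topic[key('instance', @id)]">
        <dt><xsl:value-of select="xtm:baseName/xtm:baseNameString"/></dt>
        <xsl:for-each select="key('instance', @id)">
          <dd><xsl:value-of select="xtm:baseName/xtm:baseNameString"/></dd>
        </xsl:for-each>
      </xsl:for-each>
    </dl>
  </xsl:template>
</xsl:stylesheet>
```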



XTM Cooking Stylesheets: Structural Components

  • Topic Map source code that controls web site content and site map.
  • XSLT stylesheets that control web page layout and look-and-feel style.
  • The whole WWW universe of resources referenced by XTM topic <occurrence> resource locators.
More on this in the AWL book
  • "XML Topic Maps: Creating and Using Topic Maps for the Web", edited by Jack Park.



Mapping Topic Map elements for HTML rendition

  • Topic Map => Web Site
  • Topic => Web Page
  • Topic Associations => Site map
  • Occurrences => Images, Logo, Text, HTML fragments, External Links
  • Topic Names => Page Headers, Titles, UL lists, Hyperlink titles
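
To make the mapping above concrete, here is a small presentation ("cooking") stylesheet sketch. It is an illustration only, not the XWATL implementation: the page parameter is hypothetical, the topic map is assumed to be in the XTM 1.0 namespace, and only topic names and occurrences are rendered (no site map from associations).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xtm="http://www.topicmaps.org/xtm/1.0/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xtm xlink">

  <xsl:output method="html" indent="yes"/>

  <!-- Hypothetical parameter: id of the topic to render as a web page. -->
  <xsl:param name="page"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//xtm:topic[@id = $page]"/>
  </xsl:template>

  <!-- Topic => Web Page -->
  <xsl:template match="xtm:topic">
    <html>
      <head>
        <!-- Topic name => page title -->
        <title><xsl:value-of select="xtm:baseName/xtm:baseNameString"/></title>
      </head>
      <body>
        <!-- Topic name => page header -->
        <h1><xsl:value-of select="xtm:baseName/xtm:baseNameString"/></h1>
        <!-- Inline occurrences => text -->
        <xsl:for-each select="xtm:occurrence/xtm:resourceData">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
        <!-- Referenced occurrences => external links -->
        <xsl:if test="xtm:occurrence/xtm:resourceRef">
          <ul>
            <xsl:for-each select="xtm:occurrence/xtm:resourceRef">
              <li><a href="{@xlink:href}"><xsl:value-of select="@xlink:href"/></a></li>
            </xsl:for-each>
          </ul>
        </xsl:if>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The topic to render is chosen by passing the page parameter to the XSLT processor; how a parameter is passed depends on the processor.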



Bon Appétit!

  • http://www.cogx.com





