<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>TheContentGuy &#187; unstructured content</title>
	<atom:link href="http://thecontentguy.net/blog/tag/unstructured-content/feed/" rel="self" type="application/rss+xml" />
	<link>http://thecontentguy.net</link>
	<description>all things unstructured</description>
	<lastBuildDate>Sat, 12 May 2012 05:00:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.5</generator>
		<item>
		<title>The Role of Taxonomy in Intelligent Content</title>
		<link>http://thecontentguy.net/blog/2011/02/04/the-role-of-taxonomy-in-intelligent-content/</link>
		<comments>http://thecontentguy.net/blog/2011/02/04/the-role-of-taxonomy-in-intelligent-content/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 21:56:13 +0000</pubDate>
		<dc:creator>paulwlodarczyk</dc:creator>
				<category><![CDATA[ECM]]></category>
		<category><![CDATA[Front Page]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[intelligent content]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[unstructured content]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/?p=1023</guid>
		<description><![CDATA[<a href="http://www.rockley.com/IC2011/"> <img src="http://thecontentguy.net/wp-content/uploads/ic2011skyscraper.jpg" align=RIGHT alt="Intelligent Content 2011 - Palm Springs, CA 2/16-18" height="129" width="103"  /></a>Admittedly, taxonomy is probably the farthest thing from your mind if you’re designing an intelligent content application. My conclusion in working with search and enterprise content management technology is that taxonomy development and management is a key success factor in creating effective intelligent content systems. Taxonomy can inform content types and metadata schema, make for consistent tagging, harmonize disparate structured data, and drive dynamic search and navigation user experiences, even with not-so-intelligent legacy content.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1027" title="tag-image" src="http://thecontentguy.net/wp-content/uploads/tag-image.jpg" alt="" width="339" height="502" />Admittedly, taxonomy is probably the farthest thing from your mind if you’re designing an intelligent content application. You’re probably focused on technology selection and content strategy. In fact, many search engine providers – notably Google – would argue strenuously that you don’t need a taxonomy to find content. I would argue even more strenuously that your intelligent content project – whether it’s a live content project with user-generated content or an information publishing portal – just won’t succeed unless you think long and hard about taxonomy and content classification.</p>
<p>Over the last several years I’ve been helping companies build intelligent content applications using taxonomy, and almost all of them had already spent a pile of cash on search and content management technology, only to see it fall short of their vision. Many had even implemented component content in DITA or another XML vocabulary. Taxonomy helps to bridge the gap in several important ways.</p>
<p>First of all, taxonomy helps companies organize and manage their source content. A well-designed taxonomy can become the basis for a metadata and content type strategy for the CMS, and the source of the controlled vocabularies that content authors and publishers use to classify their content. Content classification is important for defining the “aboutness” of content as well as for administering it. We all need well-defined, unambiguous terms for administrative metadata, including our organization structures, customers, products, information types, and information security classifications, to name a few important categories. Leaving CMS users to enter this metadata as free text exposes us to human error and inconsistency. At a minimum, we want to maintain an authoritative term list and expose it in drop-down lists so that users select values when they upload content to the CMS. Users can usually enter this metadata with low error rates when selecting from controlled term lists, especially if their job is content publishing. The list just makes it easy and consistent.</p>
<p>For “aboutness” metadata, we need to help users be comprehensive and consistent, so we often use the taxonomy to inform a classification engine that analyzes the document and provides suggested metadata to the user. Subject classification schemes are conceptually similar to the subject headings in library card catalog systems – they start with broad domains and categories that are broken into increasingly narrower topic spaces. The big difference is that each organization will need to develop and maintain subject classifications that are relevant to its business content. For example, I’m helping a high tech manufacturer classify technical documents; their taxonomy covers technologies, manufacturing process steps, customer needs / applications, as well as symptoms, fault codes, and root causes for troubleshooting and repairing their products. They had a rich set of terms for all of these topics, some in a corporate taxonomy and others in specific systems for service and quality management. Putting them into a taxonomy helped us use that information for auto-classification of content. Metadata is proposed to content publishers when they upload content to the CMS, and they can add or remove terms from the proposed metadata using the same taxonomy the classifier used. The result is that documents are now more completely tagged with “aboutness” metadata in the CMS.</p>
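<p>As a rough illustration of taxonomy-driven tag suggestion, here is a minimal Python sketch. It is not the classification engine used in the project described above; the facets, terms, and the <code>suggest_tags</code> helper are all hypothetical, and the matching is plain whole-phrase lookup rather than a real classifier.</p>

```python
import re

# Hypothetical taxonomy fragment: facet -> preferred term -> labels to look for.
TAXONOMY = {
    "symptom": {"overheating": ["overheating", "overheats", "thermal shutdown"]},
    "process step": {"etch": ["etch", "etching"]},
}


def suggest_tags(text, taxonomy=TAXONOMY):
    """Return {facet: sorted preferred terms} for labels found as whole phrases."""
    suggestions = {}
    lowered = text.lower()
    for facet, terms in taxonomy.items():
        for preferred, labels in terms.items():
            # A term is suggested when any of its labels appears in the text.
            found = any(
                re.search(r"\b" + re.escape(label) + r"\b", lowered)
                for label in labels
            )
            if found:
                suggestions.setdefault(facet, []).append(preferred)
    return {facet: sorted(found) for facet, found in suggestions.items()}
```

<p>The suggestions would then be shown to the publisher, who can accept, remove, or add terms from the same taxonomy.</p>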
<p>Intelligent content doesn’t end with the CMS, however. Search engines classify content using their algorithms to match end-user search queries (what you type in the search box) to content – whether it’s unstructured document content or structured data in an enterprise system, or both. A search engine doesn’t understand your business – it only looks at all of your content as a “bag of words” that it statistically determines to be “about” something by looking at unusual combinations of terms or high-frequency terms. A taxonomy can tell the search indexing engine that certain terms are more meaningful to your business, and that there are relationships between terms that matter when it comes to relevance. Even if the search engine is placing higher value on metadata values, those usually contain “preferred” terms – the official business labels. Users, on the other hand, are not so disciplined when they type in the search box – they may use “non-preferred” terms. A favorite example is searching a NASA site for “moon buggy” when NASA calls the item the “lunar excursion vehicle.” The taxonomy can relate those terms so the search engine returns relevant documents – even if they never contain the term “moon buggy” or refer to the vehicle only by its acronym.</p>
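<p>The “moon buggy” case can be sketched as a simple query expansion step. This is a minimal Python sketch under stated assumptions: the one-entry <code>TAXONOMY</code> dictionary and the <code>expand_query</code> helper are hypothetical, and a real search engine would apply such mappings through its own synonym or thesaurus configuration.</p>

```python
# Hypothetical mini-taxonomy: preferred term -> non-preferred variants.
TAXONOMY = {
    "lunar excursion vehicle": ["moon buggy", "LEV"],
}


def build_lookup(taxonomy):
    """Map every label (preferred or not), lowercased, to its preferred term."""
    lookup = {}
    for preferred, variants in taxonomy.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def expand_query(query, taxonomy=TAXONOMY):
    """Return the set of lowercased terms to match for this query."""
    key = " ".join(query.lower().split())  # normalize case and whitespace
    preferred = build_lookup(taxonomy).get(key)
    if preferred is None:
        return {key}  # unknown to the taxonomy: search for it as typed
    return {preferred.lower()} | {v.lower() for v in taxonomy[preferred]}
```

<p>With this mapping, a query for “Moon Buggy” is expanded to also match “lunar excursion vehicle” and “lev”, while a term the taxonomy does not know is passed through unchanged.</p>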
<p>Finally, taxonomy can be used to drive the search user experience in major ways. It can become the basis for the facets in search refinement, allowing users to narrow their search along the dimensions of the taxonomy (show me only information about these document types, or these products, etc.). It can define the terms we show in tag clouds and other interface objects. The taxonomy can also help the search engine identify related searches – for instance, all of the astronauts who ever drove a LEV, or the Apollo missions that included a LEV.</p>
<p>I’ve actually seen a recent <a href="http://www.nhs.uk/Search/Pages/HealthExplorer.aspx?q=Diabetes&amp;qID=845#/tab~845~term~845~history~0">example</a> of an intelligent content application that is entirely defined in a taxonomy – the UK National Health Service has built a Flash application that lets you navigate a hyperbolic tree of symptoms and diseases, all of which is directly managed in a taxonomy and flowed directly into the portal. Their taxonomy aids search and results relevancy as well by taking terms like “AIDS” and assuring that search engine stemming doesn’t return documents that contain the terms “aid” or “aiding” – imagine all the results for “first aid”, “band aid”, “hearing aid”, or “health aid”.</p>
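<p>To show why protecting a term from stemming matters, here is a toy Python sketch. It is an assumption-laden illustration, not how the NHS portal works: the suffix-stripping <code>naive_stem</code> function stands in for a real stemmer, and <code>PROTECTED_TERMS</code> is a hypothetical list that would be exported from the taxonomy.</p>

```python
# Hypothetical list of taxonomy terms that must be indexed exactly as written.
PROTECTED_TERMS = {"AIDS"}

# Toy suffix list; a real stemmer applies far more careful rules.
SUFFIXES = ("ing", "s")


def naive_stem(token):
    """Lowercase the token and strip the first matching suffix."""
    lowered = token.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix):
            return lowered[: -len(suffix)]
    return lowered


def index_token(token, protected=PROTECTED_TERMS):
    """Return the form to index: protected terms bypass stemming entirely."""
    if token in protected:
        return token
    return naive_stem(token)
```

<p>Without the protected list, “AIDS” and “aiding” both collapse to “aid”, which is exactly how “first aid” and “hearing aid” documents end up in the results. The check here is case-sensitive, so lowercase “aids” is still stemmed.</p>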
<p>We also tend not to think of intelligent content systems as being driven by rather “dumb” legacy content; instead, we assume they are all about XML and structured content. In fact, most of the intelligent content portals I’ve worked on in the last several years were being populated by legacy PDF content – which was made intelligent only through the use of auto-classification with a well-crafted taxonomy and exposed through faceted search. For lengthy documents, document preview technologies can help home in on relevant pages – also being guided by the taxonomy to map search queries to preferred terms.</p>
<p>My conclusion in working with search and enterprise content management technology is that taxonomy development and management is a key success factor in creating effective intelligent content systems. Taxonomy can inform content types and metadata schema, make for consistent tagging, harmonize disparate structured data, and drive dynamic search and navigation user experiences, even with not-so-intelligent content.</p>
<p> I&#8217;ll be presenting more on this topic with extensive examples at <a title="Intelligent Content 2011" href="http://www.rockley.com/IC2011/">Intelligent Content 2011</a> in Palm Springs, February 16-18. Hope to see you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2011/02/04/the-role-of-taxonomy-in-intelligent-content/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Intelligent Content 2010: Making hay from 30 years of legacy content</title>
		<link>http://thecontentguy.net/blog/2009/11/12/intelligent-content-2010-making-hay-from-30-years-of-legacy-content/</link>
		<comments>http://thecontentguy.net/blog/2009/11/12/intelligent-content-2010-making-hay-from-30-years-of-legacy-content/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 01:43:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[ECM]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[DITA]]></category>
		<category><![CDATA[Earley & Associates]]></category>
		<category><![CDATA[intelligent content]]></category>
		<category><![CDATA[legacy content]]></category>
		<category><![CDATA[service information]]></category>
		<category><![CDATA[unstructured content]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/?p=752</guid>
		<description><![CDATA[<a href="http://www.rockley.com/IntelligentContent2010/"> <img class="alignleft" src="http://thecontentguy.net/wp-content/uploads/2009/11/IntelligentContent2010.jpg" width="84" height="143" alt="Join us at Intelligent Content 2010"/></a>When businesses implement intelligent content, they usually adopt a “day forward” strategy that assures all new content is “intelligent” (in XML and dynamically published), and they minimize the volume of legacy content to convert and migrate.  Semiconductor equipment manufacturers – like many other capital equipment manufacturers – support products that last 30 years or more, so legacy technical content is critical to keeping that equipment up, running, and profitable for their customers.  In this presentation, we’ll discuss one company’s unique pathway forward to intelligent content. <a href="http://thecontentguy.net/blog/2009/11/12/intelligent-content-2010-making-hay-from-30-years-of-legacy-content/">Read more...</a>]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-362 alignright" title="Earley &amp; Associates" src="http://thecontentguy.net/wp-content/uploads/2009/06/earleysmall.png" alt="Earley &amp; Associates" width="120" height="126" /><strong>Intelligent Content 2010</strong><br />
February 25-26, 2010<br />
The Parker, Palm Springs, CA</p>
<p>When businesses implement intelligent content, they usually adopt a “day forward” strategy that assures all new content is “intelligent” (i.e., is developed in XML and dynamically published), and they minimize the volume of legacy content to convert and migrate. Semiconductor equipment manufacturers – like many other capital equipment manufacturers – support products that last 30 years or more, so legacy technical content is critical to keeping that equipment up, running, and profitable for their customers.</p>
<p>In one such company today, that legacy content exists as monolithic manuals in PDF format that are hundreds of pages long, or as PDF renditions of engineering drawings, or as data in enterprise systems in relational databases or ERP systems. Field Service Engineers spend many hours per week searching for content across multiple systems – ERP data, content repositories, engineering websites, drawing repositories, knowledge bases, technical forums, email, personal notes – to find the procedures, drawings, reference information, and expert advice they need to effectively troubleshoot and repair customer equipment.</p>
<p>In the future, that information needs to be seamlessly integrated into a single point of access that provides the Field Service Engineer with information that is relevant to their current context: the product they are working on, the customer account, the current configuration, the current problem or fault condition, the latest engineering information and best known methods – all without entering a word into a search box.</p>
<p>The challenge for intelligent content is simply stated: How do we get there from here?</p>
<p>In this presentation at the <a title="Intelligent Content 2010" href="http://www.rockley.com/IntelligentContent2010/" target="_blank">Intelligent Content</a> conference, we’ll discuss this company’s unique pathway forward to intelligent content:</p>
<ul>
<li>The complexity and richness of content types and sources that must be unified through search and navigation for the end user (service engineers);</li>
<li>Building a firm foundation with a sound information architecture including taxonomy and metadata;</li>
<li>Taking the first steps with a modular content strategy based upon PDF documents in SharePoint, with a unified custom search experience;</li>
<li>Transitioning in later phases of the project to intelligent XML content integrated with enterprise data in a seamless, task-focused interface.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2009/11/12/intelligent-content-2010-making-hay-from-30-years-of-legacy-content/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weekly Digest for 2009-05-29</title>
		<link>http://thecontentguy.net/blog/2009/05/29/twitcontentguy-weekly-digest-for-2009-05-29/</link>
		<comments>http://thecontentguy.net/blog/2009/05/29/twitcontentguy-weekly-digest-for-2009-05-29/#comments</comments>
		<pubDate>Fri, 29 May 2009 12:00:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Digest]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[unstructured content]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/blog/2009/05/29/twitcontentguy-weekly-digest-for-2009-05-29/</guid>
		<description><![CDATA[[metadata] Turn Tagging into Business Results: Take the Metadata Best Practices Survey http://ping.fm/TZa6M [search] Steve Jurvetson: &#8220;All the problems of search would be solved if search relevance was ranked by what browsers were displaying.&#8221; [trends] Churchill Club Top Tech Trend#3: A deluge of unstructured data creates the next great information leaders http://ping.fm/0ehWs [content management] via [...]]]></description>
			<content:encoded><![CDATA[<ul class="aktt_tweet_digest">
<li><span class="syndication-description">[metadata] Turn Tagging into Business Results: Take the Metadata Best Practices Survey <a rel="nofollow" href="http://ping.fm/TZa6M">http://ping.fm/TZa6M</a></span></li>
<li><span class="syndication-description">[search] Steve Jurvetson: &#8220;All the problems of search would be solved if search relevance was ranked by what browsers were displaying.&#8221;</span></li>
<li><span class="syndication-description">[trends] Churchill Club Top Tech Trend#3: A deluge of unstructured data creates the next great information leaders <a rel="nofollow" href="http://ping.fm/0ehWs">http://ping.fm/0ehWs</a></span></li>
<li><span class="syndication-description">[content management] via cNet: &#8220;The dark matter of the enterprise is unstructured data,&#8221; Ann Winblad <a rel="nofollow" href="http://ping.fm/0ehWs">http://ping.fm/0ehWs</a></span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2009/05/29/twitcontentguy-weekly-digest-for-2009-05-29/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UIMA &#8211; First Standard for Accessing Unstructured Information</title>
		<link>http://thecontentguy.net/blog/2009/04/07/uima-first-standard-for-accessing-unstructured-information/</link>
		<comments>http://thecontentguy.net/blog/2009/04/07/uima-first-standard-for-accessing-unstructured-information/#comments</comments>
		<pubDate>Tue, 07 Apr 2009 16:14:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[ECM]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[content management]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[OASIS]]></category>
		<category><![CDATA[unstructured content]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/blog/?p=184</guid>
		<description><![CDATA[I posted earlier today at Not Otherwise Categorized&#8230; about the announcement last month of OASIS&#8217;s new standard for the Unstructured Information Management Architecture, Version 1.0.  Read all about it here:  OASIS Approves UIMA &#8211; the first standard for accessing Unstructured Information]]></description>
			<content:encoded><![CDATA[<p>I posted earlier today at <a title="Earley &amp; Associates blog" href="http://sethearley.wordpress.com" target="_blank">Not Otherwise Categorized&#8230;</a> about the announcement last month of OASIS&#8217;s new standard for the Unstructured Information Management Architecture, Version 1.0.  Read all about it here:  <a title="OASIS Approves UIMA - the first standard for accessing Unstructured Information" rel="bookmark" href="http://sethearley.wordpress.com/2009/04/07/oasis-approves-uima-the-first-standard-for-accessing-unstructured-information/" target="_blank"><span style="color: #105cb6;">OASIS Approves UIMA &#8211; the first standard for accessing Unstructured Information</span></a></p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2009/04/07/uima-first-standard-for-accessing-unstructured-information/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

