<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>TheContentGuy &#187; semantic search</title>
	<atom:link href="http://thecontentguy.net/blog/tag/semantic-search/feed/" rel="self" type="application/rss+xml" />
	<link>http://thecontentguy.net</link>
	<description>all things unstructured</description>
	<lastBuildDate>Sat, 12 May 2012 05:00:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.5</generator>
		<item>
		<title>The Role of Taxonomy in Intelligent Content</title>
		<link>http://thecontentguy.net/blog/2011/02/04/the-role-of-taxonomy-in-intelligent-content/</link>
		<comments>http://thecontentguy.net/blog/2011/02/04/the-role-of-taxonomy-in-intelligent-content/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 21:56:13 +0000</pubDate>
		<dc:creator>paulwlodarczyk</dc:creator>
				<category><![CDATA[ECM]]></category>
		<category><![CDATA[Front Page]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[intelligent content]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[unstructured content]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/?p=1023</guid>
		<description><![CDATA[<a href="http://www.rockley.com/IC2011/"> <img src="http://thecontentguy.net/wp-content/uploads/ic2011skyscraper.jpg" align=RIGHT alt="Intelligent Content 2011 - Palm Springs, CA 2/16-18" height="129" width="103"  /></a>Admittedly, taxonomy is probably the farthest thing from your mind if you’re designing an intelligent content application. My conclusion in working with search and enterprise content management technology is that taxonomy development and management is a key success factor in creating effective intelligent content systems. Taxonomy can inform content types and metadata schema, make for consistent tagging, harmonize disparate structured data, and drive dynamic search and navigation user experiences, even with not-so-intelligent legacy content.]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1027" title="tag-image" src="http://thecontentguy.net/wp-content/uploads/tag-image.jpg" alt="" width="339" height="502" />Admittedly, taxonomy is probably the farthest thing from your mind if you’re designing an intelligent content application. You’re probably focused on technology selection and content strategy. In fact, many search engine providers – notably Google – would argue strenuously that you don’t need a taxonomy to find content. I would argue even more strenuously that your intelligent content project – whether it’s a live content project with user-generated content or an information publishing portal – just won’t succeed unless you think long and hard about taxonomy and content classification.</p>
<p>Over the last several years I’ve been helping companies build intelligent content applications using taxonomy, and almost all of them had already spent a pile of cash on search and content management technology, only to see it fall short of their vision. Many had even implemented component content in DITA or another XML vocabulary. Taxonomy helps to bridge the gap in several important ways.</p>
<p>First of all, taxonomy helps companies organize and manage their source content. A well designed taxonomy can become the basis for a metadata and content type strategy for the CMS, and the source of the controlled vocabularies that content authors and publishers use to classify their content. Content classification is important for defining the “aboutness” of content as well as administering it. We all need well-defined, clear, unambiguous terms for administrative metadata, including our organization structures, customers, products, information types, and information security classifications, to name a few important categories. Leaving CMS users to enter this metadata by freely typing exposes us to human errors and inconsistency. At a minimum, we want to maintain an authoritative term list and expose it in drop-down lists for users to select values when they upload content to the CMS. Users are usually able to enter this metadata with low error rates when selecting from controlled term lists, especially if their job is content publishing. The list just makes it easy and consistent.</p>
<p>For “aboutness” metadata, we need to help users be comprehensive and consistent, so we often use the taxonomy to inform a classification engine that analyzes the document and provides suggested metadata to the user. Subject classification schemes are conceptually similar to the subject headings in library card catalog systems – they start with broad domains and categories that are broken into increasingly narrower topic spaces. The big difference is that each organization will need to develop and maintain subject classifications that are relevant to its business content. For example, I’m helping a high tech manufacturer classify technical documents; their taxonomy covers technologies, manufacturing process steps, customer needs / applications, as well as symptoms, fault codes, and root causes for troubleshooting and repairing their products. They had a rich set of terms for all of these topics, some in a corporate taxonomy and others in specific systems for service and quality management. Putting them into a taxonomy helped us use that information for auto-classification of content. Metadata is proposed to content publishers when they upload a document to the CMS, and they can add or remove terms from the proposed metadata using the same taxonomy the classifier used. The result is that documents are now more completely tagged with “aboutness” metadata in the CMS.</p>
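<p>The suggestion step described above might be sketched roughly as follows in Python. The taxonomy entries, the signal phrases, and the <em>suggest_tags</em> function are all hypothetical illustrations, not the manufacturer’s actual terms or any particular classification product; a real classifier would use much richer analysis than this simple phrase match.</p>

```python
import re

# Hypothetical taxonomy: each preferred term maps to phrases that signal it.
TAXONOMY = {
    "Fault code E42": ["e42", "error 42"],
    "Overheating": ["overheating", "runs hot", "thermal shutdown"],
    "Wafer etching": ["etching", "etch step"],
}


def suggest_tags(text: str) -> list[str]:
    """Return preferred terms whose signal phrases appear in the text."""
    # Normalize to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    padded = f" {normalized} "
    suggestions = []
    for preferred, phrases in TAXONOMY.items():
        # Padding with spaces makes this a whole-word phrase match.
        if any(f" {phrase} " in padded for phrase in phrases):
            suggestions.append(preferred)
    return suggestions


doc = "Unit runs hot during the etch step and reports Error 42."
print(suggest_tags(doc))
```

<p>The publisher would then accept, remove, or add to the suggested terms, always choosing from the same taxonomy.</p>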
<p>Intelligent content doesn’t end with the CMS, however. Search engines classify content using their algorithms to match end-user search queries (what you type in the search box) to content – whether it’s unstructured document content or structured data in an enterprise system, or both. A search engine doesn’t understand your business – it only looks at all of your content as a “bag of words” that it statistically determines to be “about” something by looking at unusual combinations of terms or high-frequency terms. A taxonomy can tell the search indexing engine that certain terms are more meaningful to your business, and that there are relationships between terms that matter when it comes to relevance. Even if the search engine is placing higher value on metadata values, those usually contain “preferred” terms – the official business labels. Users, on the other hand, are not so disciplined when they type in the search box – they may use “non-preferred” terms. A favorite example is searching a NASA site for “moon buggy” when NASA calls the item the “lunar excursion vehicle.” The taxonomy can relate those terms so the search engine returns relevant documents – even if they never contain the term “moon buggy” or refer to the vehicle only by its acronym.</p>
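<p>A minimal Python sketch of that term mapping, assuming a tiny hand-built thesaurus. The <em>NON_PREFERRED</em> table and the <em>expand_query</em> and <em>search</em> functions are hypothetical names for illustration; a production search engine would apply the same idea inside its own index and query pipeline.</p>

```python
import re

# Hypothetical thesaurus: non-preferred (user) terms mapped to the preferred label.
NON_PREFERRED = {
    "moon buggy": "lunar excursion vehicle",
    "lev": "lunar excursion vehicle",
}


def expand_query(query: str) -> set[str]:
    """Return the query plus its preferred term and the other variants."""
    key = query.strip().lower()
    preferred = NON_PREFERRED.get(key, key)
    terms = {key, preferred}
    # Add every non-preferred variant that maps to the same preferred term.
    terms.update(alt for alt, pref in NON_PREFERRED.items() if pref == preferred)
    return terms


def search(query: str, documents: list[str]) -> list[str]:
    """Return documents that contain any expanded form of the query."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b") for t in expand_query(query)]
    return [d for d in documents if any(p.search(d.lower()) for p in patterns)]


docs = [
    "The lunar excursion vehicle carried two astronauts.",
    "Apollo launch schedule and crew levels.",
]
print(search("Moon Buggy", docs))
```

<p>The first document is returned even though it never mentions “moon buggy”; the second is not, because the acronym is matched only as a whole word.</p>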
<p>Finally, taxonomy can be used to drive the search user experience in major ways. It can become the basis for the facets in search refinement, allowing users to narrow their search along the dimensions of the taxonomy (show me only information about these document types, or these products, etc.). It can define the terms we show in tag clouds and other interface objects. The taxonomy can also help the search engine identify related searches – for instance, all of the astronauts who ever drove a LEV, or the Apollo missions that included a LEV.</p>
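<p>Faceted refinement along taxonomy dimensions can be illustrated with a short Python sketch. The result records, the facet names <em>doc_type</em> and <em>product</em>, and both functions are assumptions made up for this example, not the behavior of any specific search product.</p>

```python
from collections import Counter

# Hypothetical search results, each already tagged with taxonomy values.
RESULTS = [
    {"title": "Pump service manual", "doc_type": "Manual", "product": "Pump A"},
    {"title": "Pump fault bulletin", "doc_type": "Bulletin", "product": "Pump A"},
    {"title": "Valve service manual", "doc_type": "Manual", "product": "Valve B"},
]


def facet_counts(results, facet):
    """Count how many results carry each value of one taxonomy facet."""
    return Counter(r[facet] for r in results)


def refine(results, facet, value):
    """Narrow the result set to documents tagged with the chosen value."""
    return [r for r in results if r[facet] == value]


print(facet_counts(RESULTS, "doc_type"))
print([r["title"] for r in refine(RESULTS, "product", "Pump A")])
```

<p>The counts feed the refinement links a user sees beside the results, and each click applies one more <em>refine</em> step.</p>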
<p>I’ve actually seen a recent <a href="http://www.nhs.uk/Search/Pages/HealthExplorer.aspx?q=Diabetes&amp;qID=845#/tab~845~term~845~history~0">example</a> of an intelligent content application that is entirely defined in a taxonomy – the UK National Health Service has built a Flash application that lets you navigate a hyperbolic tree of symptoms and diseases, all of which is directly managed in a taxonomy and flowed directly into the portal. Their taxonomy aids search and results relevancy as well by taking terms like “AIDS” and assuring that search engine stemming doesn’t return documents that contain the terms “aid” or “aiding” – imagine all the results for “first aid”, “band aid”, “hearing aid”, or “health aid”.</p>
<p>We also tend to assume that intelligent content systems are all about XML and structured content, not rather “dumb” legacy content. In fact, most of the intelligent content portals I’ve worked on in the last several years were populated by legacy PDF content – which was made intelligent only through auto-classification with a well-crafted taxonomy and exposed through faceted search. For lengthy documents, document preview technologies can help home in on relevant pages – again guided by the taxonomy, which maps search queries to preferred terms.</p>
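<p>One way to picture the stemming safeguard is a protected-term list that the indexer checks before stemming. The Python below is only a sketch under that assumption: the <em>PROTECTED</em> set, the crude <em>naive_stem</em> function, and <em>index_token</em> are hypothetical, and this is not a description of how the NHS system is implemented.</p>

```python
# Hypothetical protected list managed in the taxonomy; matching is case-sensitive.
PROTECTED = {"AIDS"}


def naive_stem(word: str) -> str:
    """A deliberately crude stemmer: lowercase, then strip one common suffix."""
    w = word.lower()
    for suffix in ("ing", "s"):
        # Only strip when at least three characters would remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def index_token(word: str) -> str:
    """Keep protected taxonomy terms intact; stem everything else."""
    return word if word in PROTECTED else naive_stem(word)


for token in ["AIDS", "aiding", "aids", "aid"]:
    print(token, index_token(token))
```

<p>Without the protected list, all four tokens would collapse to the same stem, so a query for the disease would match every “first aid” page.</p>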
<p>My conclusion in working with search and enterprise content management technology is that taxonomy development and management is a key success factor in creating effective intelligent content systems. Taxonomy can inform content types and metadata schema, make for consistent tagging, harmonize disparate structured data, and drive dynamic search and navigation user experiences, even with not-so-intelligent content.</p>
<p> I&#8217;ll be presenting more on this topic with extensive examples at <a title="Intelligent Content 2011" href="http://www.rockley.com/IC2011/">Intelligent Content 2011</a> in Palm Springs, February 16-18. Hope to see you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2011/02/04/the-role-of-taxonomy-in-intelligent-content/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to Turn Tagging into Cash: Take the Metadata Best Practices Survey</title>
		<link>http://thecontentguy.net/blog/2009/05/26/how-to-turn-tagging-into-cash-take-the-metadata-best-practices-survey/</link>
		<comments>http://thecontentguy.net/blog/2009/05/26/how-to-turn-tagging-into-cash-take-the-metadata-best-practices-survey/#comments</comments>
		<pubDate>Tue, 26 May 2009 15:07:21 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[DITA]]></category>
		<category><![CDATA[ECM]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[Benchmarking]]></category>
		<category><![CDATA[content management]]></category>
		<category><![CDATA[Earley & Associates]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[survey]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/blog/?p=310</guid>
		<description><![CDATA[We tag stuff to add meaning, and so that we and others – especially information systems – can find it.  But is your approach to tagging business content effective?  Find out - take the Metadata Best Practices Benchmarking Survey from Earley &#038; Associates and Taxonomy Strategies.]]></description>
			<content:encoded><![CDATA[<p>If you couldn’t tell by now, one of my particular interests is tagging, a.k.a. content classification, a.k.a. metadata.  We tag stuff to add meaning, and so that we and others – especially information systems – can find it.  But is your approach to tagging business content effective?  Find out &#8211; take the <strong><a title="Metadata Best Practices Benchmarking Survey" href="http://www.surveymonkey.com/s.aspx?sm=TEtPrAKwkiKIXhkey6revA_3d_3d" target="_blank">Metadata Best Practices Benchmarking Survey</a></strong> from Earley &amp; Associates and Taxonomy Strategies.</p>
<p><strong><span style="color: black; font-size: 14pt;  mso-bidi-font-size: 11.0pt; "><a title="Metadata Best Practices Benchmarking Survey" href="http://www.surveymonkey.com/s.aspx?sm=TEtPrAKwkiKIXhkey6revA_3d_3d" target="_blank"><span style="color: blue;"><span style="font-family: Calibri;">Take the Survey</span></span></a></span></strong></p>
<p><span id="more-310"></span>Depending upon context, “tagging” can mean one of three different things: tagging a document, tagging within a document, or tagging a content object.</p>
<p style="PADDING-LEFT: 30px"><strong>Tagging documents.</strong>  These days most of us think of tagging as the keywords we put on our documents – like our photos and websites – so that others can find them when they search.  User tags are fine for finding photos in Flickr, but for tagging to be effective in business we need to make it systematic, so that we avoid ambiguity and improve search recall and relevance.  So we’re increasingly “mature” in our approaches to tagging: We use taxonomy to organize our terms into classes and to manage the relationships between terms.  We develop thesauri and foreign language equivalents.  We integrate taxonomies and thesauri into search indexes for ECM and site search and SEO.</p>
<p style="PADDING-LEFT: 30px"><strong>Tagging within a document.</strong>  I got interested in tagging in the early days of XML (back when we spelled it &#8220;S-G-M-L&#8221;), when we were tagging within documents.  By tagging unstructured content inside documents we could do really sophisticated things – not just multi-channel output.  For example, knowing that a paragraph in a document was a step in a service procedure or that a string of gibberish was a part number let us bring life to that content when we transformed it from markup into an interactive electronic technical manual.  <strong>Tagging let us turn books into diagnostic software.</strong></p>
<p style="PADDING-LEFT: 30px"><strong>Tagging reusable content objects.</strong> As content reuse matured with standards like DITA, organizations had more reusable components, with more people creating them in more departments.  Tagging reusable content objects became essential to actually reusing them – if you couldn’t find it, you’d never reuse it.  If you had a single service manual with 100 procedures, now you have at least 100 reusable content objects, so the search scope increased by two orders of magnitude.  At IBM, colleagues report having over a million DITA topics in more than six repositories, with over a dozen departments sharing content across thousands of publications.  <strong>Searching for content objects is like trying to find a needle in a haystack, except you’re trying to find the right needle, and you have more and smaller needles to search amongst, in more and increasingly bigger haystacks.</strong></p>
<p><strong>Measuring Metadata Maturity.</strong>  Each type of tagging can have measurable benefits for your business.  Five years ago, <a title="Earley &amp; Associates" href="http://www.earley.com/" target="_blank">Earley &amp; Associates</a> and <a title="Taxonomy Strategies" href="http://www.taxonomystrategies.com/" target="_blank">Taxonomy Strategies</a> developed a survey to understand metadata maturity for various types of businesses.  Earley is conducting an updated survey to see how organizations have moved up the learning curve.  Since we have a baseline of responses from five years ago, we’ll be able to describe how metadata and taxonomy practices have matured over time.  Also, the original survey was focused on the impact of metadata best practices on knowledge management and e-commerce search.  We now recognize that metadata is also used by technical communicators – especially those who use XML and other technologies to create, manage, and publish reusable content across multiple channels.  We want to hear from you all for the first time.</p>
<p>The survey is pretty detailed, so you might want to grab your favorite caffeinated beverage before you dig in.  As compensation for your time (about 15 minutes) Earley &amp; Associates is offering these nifty incentives:</p>
<ul type="disc">
<li style="line-height: 14.25pt; margin: 0in 0in 10pt; color: black; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in"><strong>A free pass to any future Earley &amp; Associates Community of Practice conference call</strong> (a $50 value).  These are monthly, and the next one is Wednesday June 2<sup>nd</sup> on <a title="Taxonomy Community of Practice - June 2009" href="http://www.earley.com/_June2009.asp" target="_blank">Taxonomy for Portals</a> featuring Giovanni Piazza, Chief Knowledge Officer of Ernst &amp; Young, and Ralph Poole of Earley &amp; Associates.</li>
<li style="line-height: 14.25pt; margin: 0in 0in 10pt; color: black; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><strong>A $200 discount on registration to the <a title="Henry Stewart Digital Asset Management Conference" href="http://www.damusers.com/" target="_blank">Henry Stewart conference</a></strong> on digital asset management, June 1-2 in NYC.  Seth Earley will be there presenting preliminary results.</li>
<li style="line-height: 14.25pt; margin: 0in 0in 10pt; color: black; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><strong>Free participation</strong> in a webcast reviewing the results of the survey (date TBA).</li>
</ul>
<p class="MsoNormal" style="line-height: 14.25pt; margin: 0in 0in 0pt;"><strong><span style="color: black; font-size: 14pt; mso-fareast-font-family: 'Times New Roman'; mso-bidi-font-family: 'Times New Roman'; mso-bidi-font-size: 11.0pt; mso-ascii-font-family: Calibri; mso-hansi-font-family: Calibri;"><a title="Metadata Best Practices Benchmarking Survey" href="http://www.surveymonkey.com/s.aspx?sm=TEtPrAKwkiKIXhkey6revA_3d_3d" target="_blank"><span style="color: blue;"><span style="font-family: Calibri;">Take the Survey</span></span></a></span></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2009/05/26/how-to-turn-tagging-into-cash-take-the-metadata-best-practices-survey/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If it&#8217;s about Search 3.0, shouldn&#8217;t it be Google Cubed?</title>
		<link>http://thecontentguy.net/blog/2009/05/20/if-its-about-search-30-shouldnt-it-be-google-cubed/</link>
		<comments>http://thecontentguy.net/blog/2009/05/20/if-its-about-search-30-shouldnt-it-be-google-cubed/#comments</comments>
		<pubDate>Wed, 20 May 2009 19:41:48 +0000</pubDate>
		<dc:creator>paulwlodarczyk</dc:creator>
				<category><![CDATA[search]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[Wolfram Alpha]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/blog/?p=277</guid>
		<description><![CDATA[I&#8217;ve been trying to catch up on my surfing after losing a week to a hard drive failure and laptop rebuild.  One pretty big thing I missed was Google Squared (the other was Wolfram&#124;Alpha &#8211; I&#8217;ll cover that in a separate post).  Google Squared is Google&#8217;s answer (or perhaps one of their answers) to semantic search [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.googlelabs.com"><img class="alignleft size-full wp-image-279" title="googlelabs" src="http://thecontentguy.net/wp-content/uploads/2009/05/googlelabs.png" alt="googlelabs" width="222" height="85" /></a>I&#8217;ve been trying to catch up on my surfing after losing a week to a hard drive failure and laptop rebuild.  One pretty big thing I missed was Google Squared (the other was Wolfram|Alpha &#8211; I&#8217;ll cover that in a separate post). </p>
<p>Google Squared is Google&#8217;s answer (or perhaps one of their answers) to semantic search and Linked Data.  &#8216;Squared gets its moniker from the matrix used for displaying results &#8211; each &#8220;square&#8221; in the matrix contains some fact derived from the content on the source site. <br />
<span id="more-277"></span>Each row in the Google Squared matrix is a search result, but the interesting part is the columns.  &#8216;Squared relies on RDFa and microformats on the source sites to extract structure for the search category &#8211; if it&#8217;s available (I&#8217;m not entirely sure how &#8216;Squared derives its semantic structure in the absence of metadata, but clearly it does).  So a search on &#8220;rollercoasters&#8221; will generate columns for height, speed, construction, etc.  Essentially, &#8216;Squared is generating search facets on the fly using structure that is implied or explicit in the result set. </p>
<p>Because sites are inconsistent with the amount of structure they provide, &#8216;Squared can &#8211; and will &#8211; make errors in interpreting free text.  For example, in the video, we can see that &#8220;height&#8221; &#8211; while intended to describe the height of the rollercoaster &#8211; sometimes returns text about the minimum height requirements for riders.  Still, from the demo &#8216;Squared does look interesting.  Predictable categories like Restaurants (e.g. a search on &#8220;pizza&#8221;) have the dimensions you&#8217;d expect to see in columns &#8211; description, address, price range, ambiance, etc. </p>
<p>Google Squared affords general-purpose faceted search, because the columns can be used to refine the search results.  The current alpha doesn&#8217;t let you sort on a column, but clearly this is where things are heading.</p>
<p>Google Squared will be made publicly available on the Google Labs site in the next week or so. </p>
<p>Here is the video <a title="What Is Google Squared? It Is How Google Will Crush Wolfram Alpha (Exclusive Video)" href="http://www.techcrunch.com/2009/05/12/what-is-google-squared-it-is-how-google-will-crush-wolfram-alpha-exclusive-video/" target="_blank">courtesy of TechCrunch</a> (over six minutes and shaky-cam &#8211; but you&#8217;ll get the idea).</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="src" value="http://www.youtube.com/v/t2onuEXThPs&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="640" height="480" src="http://www.youtube.com/v/t2onuEXThPs&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" allowfullscreen="true"></embed></object></p>
<p>As a footnote, TechCrunch declares that &#8216;Squared &#8220;is how Google will crush Wolfram|Alpha&#8221;.  I&#8217;m sorry &#8211; they missed on that point by a country mile.  Alpha isn&#8217;t a search engine &#8211; it&#8217;s a user experience (and a cool one at that) built atop a &#8220;curated&#8221; database, designed to answer queries that are primarily computational in nature.  Google searches the web.  No comparison &#8211; the products aren&#8217;t in the same class, and don&#8217;t solve the same problem.  Google can&#8217;t plot the <a title="Wolfram|Alpha Julia Set Query" href="http://www93.wolframalpha.com/input/?i=Julia%20set%20c%3D-0.38%2B0.62i" target="_blank">Julia Set</a> for you &#8211; Alpha can.  That doesn&#8217;t mean Alpha &#8220;crushes&#8221; &#8216;Squared, either.  Geez.   Anyway, more on Alpha later.</p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2009/05/20/if-its-about-search-30-shouldnt-it-be-google-cubed/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Creating a News Digest for My Website</title>
		<link>http://thecontentguy.net/blog/2009/05/01/creating-a-news-digest-for-my-website/</link>
		<comments>http://thecontentguy.net/blog/2009/05/01/creating-a-news-digest-for-my-website/#comments</comments>
		<pubDate>Fri, 01 May 2009 15:57:54 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[social technology]]></category>
		<category><![CDATA[blogging]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[WordPress]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/blog/?p=255</guid>
		<description><![CDATA[I&#8217;ve been looking for some time for a way to simplify how I repost headlines from news sources and blogs, so that I can aggregate them into my own &#8220;digest&#8221; page of items of interest to share with my readers, network, and followers.  Several things were important to me about how to do this: I [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been looking for some time for a way to simplify how I repost headlines from news sources and blogs, so that I can aggregate them into my own &#8220;digest&#8221; page of items of interest to share with my readers, network, and followers. </p>
<p>Several things were important to me about how to do this:</p>
<ol>
<li>I wanted to use this feature to notify about interesting content without a lot of authoring on my part.  Ideally posting an item to the digest would be automated with a semantic search bot pulling in the content.  Worst case I would have to drag and drop headlines and links in a Twitter-like fashion.</li>
<li>I didn&#8217;t want to add commentary &#8211; I wanted a simple re-blog capability.</li>
<li>I wanted attribution of the author to be very clear &#8211; this was not my content, just items of interest to share.</li>
</ol>
<p><span id="more-255"></span></p>
<h3>Inline RSS for a WordPress page</h3>
<p>I concluded that what I really needed was a way to display RSS feeds in-line on a WordPress page to create my headlines or digest.  WordPress has loads of plug-ins that are great for integrating RSS feeds into sidebar widgets, but I found one plug-in &#8211; <a title="inlineRSS Plug-in Page" href="http://wordpress.org/extend/plugins/dge-inlinerss/" target="_blank">DGE inlineRSS</a> &#8211; that provided all of the features I needed to put the RSS into the body of a WordPress posting or page. </p>
<p>inlineRSS is pretty simple to use.  After install, there are three easy steps to getting a feed embedded into a page.  First, you need to configure inlineRSS to point to your feeds.  This is a simple matter of entering the RSS feed URL, and associating an XSLT file that will transform the feed into the HTML that displays on your site (inlineRSS provides a simple XSLT file that you can alter to meet your needs).  Next, you need to be sure that the configuration sets the path to the XSLT file on your site, and make any changes to the XSLT for your unique formatting.  Lastly, you need to enter the embed code for the inline RSS itself &#8211; this is simply:</p>
<blockquote><p>!inlineRSS:<em>myrssfeed</em></p></blockquote>
<p>where <em>myrssfeed</em> is any of the feeds you configured in the inlineRSS options screen.  inlineRSS implements a WordPress filter to replace this code with the XSLT-formatted RSS feed.</p>
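<p>To make the mechanics concrete, here is a rough Python analogue of what such a filter does: it finds the embed code in the page body and swaps in HTML rendered from the feed. This is not the plug-in&#8217;s code. inlineRSS itself uses XSLT for the transform, while this sketch substitutes a hand-written renderer, and the sample feed, <em>rss_to_html</em>, and <em>apply_filter</em> are hypothetical names invented for the example.</p>

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample feed standing in for a configured RSS feed URL.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>First headline</title><link>http://example.com/1</link></item>
<item><title>Second and last</title><link>http://example.com/2</link></item>
</channel></rss>"""


def rss_to_html(rss_text: str) -> str:
    """Render each RSS item as a linked list entry (stand-in for the XSLT)."""
    channel = ET.fromstring(rss_text).find("channel")
    rows = []
    for item in channel.findall("item"):
        title = escape(item.findtext("title", default=""))
        link = escape(item.findtext("link", default=""), quote=True)
        rows.append(f'<li><a href="{link}">{title}</a></li>')
    return "<ul>" + "".join(rows) + "</ul>"


def apply_filter(page_body: str, feeds: dict[str, str]) -> str:
    """Replace each !inlineRSS:name marker with the rendered feed."""
    for name, rss_text in feeds.items():
        page_body = page_body.replace(f"!inlineRSS:{name}", rss_to_html(rss_text))
    return page_body


print(apply_filter("<p>!inlineRSS:myrssfeed</p>", {"myrssfeed": SAMPLE_RSS}))
```

<p>In the real plug-in the formatting lives in the XSLT file, so changing the look of the headlines means editing the stylesheet rather than any code.</p>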
<h3>Creating the Digest RSS Feed</h3>
<p>The next step to getting a digest page up and running was to create the source of the RSS feed itself.  I considered three options:</p>
<ul>
<li><strong>reFeed</strong> &#8211; a server for creating custom RSS feeds from items of interest</li>
<li><strong>Twitter or Ping.fm</strong> &#8211; generate a &#8220;tweetstream&#8221; of headlines and links (re-blogging vs. micro-blogging)</li>
<li><strong>TextWise</strong> &#8211; using a semantic search bot (specifically TextWise&#8217;s <a title="TextWise Gyzork" href="http://www.gyzork.com" target="_blank">Gyzork</a> demonstration app) to auto-generate a custom RSS feed that contains blog posts and news items that match specific semantic signatures</li>
</ul>
<p><strong>reFeed.</strong>  Last month, <a title="Mike's Digital Lab" href="http://www.mikeaxelrod.com" target="_blank">Mike Axelrod</a> set up a <a title="reFeed and reBlog" href="http://www.reblog.org" target="_blank">reFeed</a> server that he and I experimented with briefly.  This may hold some promise for the future for hand-selecting items of interest to reblog via an RSS feed.  However, Mike noted some technical issues, particularly with the quality of the RSS, so we tabled that project for the time being (he and I promise to blog more about reFeed in the near future).</p>
<p><strong>Twitter and Ping.fm.</strong>  As Mike discusses in a <a title="Twitter to wordpress mojo and can tweets feed the semantic web" href="http://www.mikeaxelrod.com/wp/2009/04/30/twitter-to-wordpress-mojo-and-can-tweets-feed-the-semantic-web" target="_blank">related post</a>, we&#8217;ve both been exploring Twitter and <a title="Ping.fm" href="http://ping.fm" target="_blank">Ping.fm</a> to stream reblog-type items to our websites.  Mike has the <a title="Ping.fm WordPress plug-in page" href="http://wordpress.org/extend/plugins/pingfm-custom-url-status-updates/" target="_blank">Ping.fm custom URL plug-in</a> for WordPress working and has his &#8220;pingstream&#8221; going to headline items in his sidebar.  I&#8217;m doing the same, using Ping to drive Twitter, then putting the Twitter feed into my sidebar with the <a title="Alex King's Twitter Tools plug-in page" href="http://wordpress.org/extend/plugins/twitter-tools/" target="_blank">Twitter Tools</a> plug-in (look under <span style="text-decoration: underline;">Headlines</span> in the sidebar to your right).  I&#8217;m still working on getting my Ping feed to display in-line with RSS or other techniques.</p>
<p><strong>TextWise Gyzork.</strong>  You may be familiar with <a title="TextWise" href="http://www.textwise.com" target="_blank">TextWise</a> if only from my use of their technology here for finding related products, Wikipedia articles, and blogs for my blog posts.  TextWise uses semantic analysis to search the web for similar documents.  Each document (web page, blog post, etc.) gets a semantic signature that identifies the concepts in the document and their weight (i.e. relevance).  <a title="TextWise Gyzork" href="http://www.gyzork.com" target="_blank">Gyzork</a> lets me apply a semantic signature to any document, then create a saved semantic search that returns either blog posts or news items that match.  Gyzork also lets me create an RSS feed for that search.  If you check out my <a title="TheContentGuy Headlines" href="http://thecontentguy.net/blog/headlines" target="_blank">Headlines</a> tab above, you&#8217;ll see a set of headlines (embedded using inlineRSS) that were generated by a Gyzork search seeded by one of my posts.  This search has been pretty good over the last six months at surfacing items I&#8217;m interested in.</p>
<h3>Next Steps</h3>
<p>I plan to continue to experiment with the automatic news digest.  Things I plan to try out:</p>
<ul>
<li>Propagating the news feed items to my status updates in LinkedIn and Twitter</li>
<li>Getting my Ping.fm stream to display inline as a set of headlines</li>
<li>Micro-blogging using Ping as the front end</li>
<li>Working with TextWise technology to create more Gyzork feeds for the news page</li>
<li>Cleaning up the XSLT to better format the news feed</li>
<li>Experimenting with other semantic technologies to create automatic feeds of interest</li>
</ul>
<p>If you have any further ideas on this, please comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2009/05/01/creating-a-news-digest-for-my-website/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Earley &amp; Associates to sponsor Semantic Technologies Jumpstart Series</title>
		<link>http://thecontentguy.net/blog/2008/09/30/earley-associates-to-sponsor-semantic-technologies-jumpstart-series/</link>
		<comments>http://thecontentguy.net/blog/2008/09/30/earley-associates-to-sponsor-semantic-technologies-jumpstart-series/#comments</comments>
		<pubDate>Tue, 30 Sep 2008 22:47:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[search]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[content technologies]]></category>
		<category><![CDATA[Earley & Associates]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[webcast]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/blog/?p=103</guid>
		<description><![CDATA[The Content Guy is pleased to have been invited by Earley &#038; Associates to present in their Semantic Technologies Jumpstart series this fall.  This free series runs every Thursday from October 30th to November 20th, between 11:30 am and 1:00 pm ET.  Join us November 6 for "Implementing Semantic Search."]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.earley.com/"><img class="size-full wp-image-102 alignleft" style="margin: 3px; border: 0px;" title="Earley &amp; Associates" src="http://thecontentguy.net/wp-content/uploads/2008/09/ad_tagline_72dpi.jpg" alt="Earley &amp; Associates" width="125" height="125" /></a></p>
<p>The Content Guy is pleased to have been invited by <a title="Earley &amp; Associates" href="http://www.earley.com/" target="_blank">Earley &amp; Associates</a> to present in their <span style="color: #888888;"><a title="Semantic Technologies Jumpstart Series" href="http://www.earley.com/Jumpstarts.asp" target="_blank">Semantic Technologies Jumpstart</a></span> series this fall.  This free series runs <strong>every Thursday from October 30th to November 20th, between 11:30 am and 1:00 pm ET.  </strong><br />
<span id="more-103"></span><br />
<strong><em>Join us November 6, 2008: </em></strong></p>
<p><strong><em>Implementing Semantic Search<br />
Presented by Paul Wlodarczyk and Amber Swope</em></strong><br />
Semantic search helps business people answer pressing questions by wading through oceans of information to find the meaningful nuggets.  In this presentation we’ll discuss how semantic search and content analysis technologies are starting to appear in the marketplace today.  We’ll recap what semantic search is and what its key benefits are, then we’ll answer the following questions:<br />
• Is semantic search a feature, an application, or an enterprise system?<br />
• How can I add semantic search to my existing work processes?<br />
• Will I need to replace my existing content technologies?<br />
• What will I need to do to prepare my content for semantic search?<br />
• Is semantic search just for documents or can I search my data too?<br />
• Can I use semantic search to find information on the internet and other public data sources?<br />
• Are there standards to consider?</p>
<p>This conference call is for implementers and decision makers of all technical levels, including those new to semantic technologies.  We will introduce technical concepts in terms everyone can understand.</p>
<p>For more information on the other presentations in the Semantic Technologies Jumpstart series, or to register, <a title="Semantic Technologies Jumpstart Series" href="http://www.earley.com/Jumpstarts.asp" target="_blank">click here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2008/09/30/earley-associates-to-sponsor-semantic-technologies-jumpstart-series/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microsoft Live Search goes Semantic with first Powerset “flights”</title>
		<link>http://thecontentguy.net/blog/2008/09/27/microsoft-live-search-goes-semantic-with-first-powerset-flights/</link>
		<comments>http://thecontentguy.net/blog/2008/09/27/microsoft-live-search-goes-semantic-with-first-powerset-flights/#comments</comments>
		<pubDate>Sat, 27 Sep 2008 21:44:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[search]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Powerset]]></category>
		<category><![CDATA[semantic search]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/blog/?p=86</guid>
		<description><![CDATA[The first 30 days of Powerset integration projects at Microsoft have resulted in some experiments that are being “flighted” on Microsoft Live Search and Powerset.com.]]></description>
			<content:encoded><![CDATA[<p><a title="Powerset.com" href="http://www.powerset.com/" target="_blank">Powerset</a>, a natural language search technology startup, was <a title="Microsoft Acquires Powerset" href="http://blogs.msdn.com/livesearch/archive/2008/07/01/powerset-joins-live-search.aspx" target="_blank">acquired by Microsoft August 1</a> of this year. Powerset uses natural language processing technology licensed from PARC to improve the relevance of internet searches. Prior to the acquisition, Powerset had focused its semantic technologies on improving search and discovery on Wikipedia – both in terms of a better user experience for entering search terms, and better relevance and organization of search results.<br />
<span id="more-86"></span><br />
Now, according to the <a title="Powerset Blog" href="http://www.powerset.com/blog/articles/2008/09/17/powersets-first-live-search-projects" target="_blank">Powerset blog</a>, the first 30 days of integration projects at Microsoft have resulted in some experiments that are being “flighted” on Microsoft Live Search and Powerset.com.</p>
<p style="TEXT-ALIGN: left">The integration work comprised three separate projects, with the following goals:</p>
<ol>
<li>
<div style="TEXT-ALIGN: left">To expand the number of queries for which Live Search shows Answers (using data from Freebase). </div>
</li>
<li>
<div style="TEXT-ALIGN: left">To use Powerset’s semantic technology to generate improved captions for Wikipedia articles (shown below).</div>
</li>
<li>
<div style="TEXT-ALIGN: left">To use Powerset’s Factz extraction to generate a list of related searches for a set of queries.</div>
</li>
</ol>
<p class="MsoNormal" style="margin: 0in 0in 6pt;">While the impact of Powerset’s semantic processing on Live Search is still limited, you can see the beginnings of it (that is, if you&#8217;re lucky enough to be in the random sample of searches in the experiment): semantically improved summaries in the captions for results that point to Wikipedia articles.</p>
<p> <a href="http://farm4.static.flickr.com/3143/2865651306_10abdfd41b_o.png"><img class="alignnone" style="margin: 2px; border: gray 2px solid;" src="http://farm4.static.flickr.com/3143/2865651306_10abdfd41b_o.png" alt="" width="607" height="206" /></a><a href="http://flickr.com/photos/powerset/2865651306"></a></p>
<p style="TEXT-ALIGN: left">The integration with Live Search is two-way: Powerset.com now boasts related searches that are powered by Live Search.</p>
<p style="TEXT-ALIGN: left"> <a href="http://flickr.com/photos/powerset/2865651466/"><img class="alignnone" style="margin: 2px; border: gray 2px solid;" title="Powerset related links powered by Live Search" src="http://farm4.static.flickr.com/3118/2865651466_5083cd2e2b_o.png" alt="" width="640" height="260" /></a><a href="http://thecontentguy.net/blog/wp-content/uploads/2008/09/powersetbobney.jpg"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2008/09/27/microsoft-live-search-goes-semantic-with-first-powerset-flights/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

