<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>TheContentGuy &#187; Linked Data</title>
	<atom:link href="http://thecontentguy.net/blog/tag/linked-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://thecontentguy.net</link>
	<description>all things unstructured</description>
	<lastBuildDate>Sat, 05 Mar 2011 06:00:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.5</generator>
		<item>
		<title>Weekly Digest for 2009-12-05</title>
		<link>http://thecontentguy.net/blog/2009/12/06/weekly-digest-for-2009-12-05/</link>
		<comments>http://thecontentguy.net/blog/2009/12/06/weekly-digest-for-2009-12-05/#comments</comments>
		<pubDate>Sun, 06 Dec 2009 14:08:21 +0000</pubDate>
		<dc:creator>paulwlodarczyk</dc:creator>
				<category><![CDATA[Digest]]></category>
		<category><![CDATA[BI]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[ECM]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[SharePoint]]></category>
		<category><![CDATA[social media]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/?p=773</guid>
		<description><![CDATA[This week's twitter digest from TheContentGuy.]]></description>
			<content:encoded><![CDATA[<ul>
<li>[ECM] Bearingpoint on SharePoint governance and risk <a href="http://bit.ly/4xFXdc" target="_blank">http://bit.ly/4xFXdc</a> <a class="twitter-hashtag" href="http://search.twitter.com/search?q=%23ecm" target="_blank">#ecm</a> <a class="twitter-hashtag" href="http://search.twitter.com/search?q=%23bearingpoint" target="_blank">#bearingpoint</a> <a class="twitter-hashtag" href="http://search.twitter.com/search?q=%23sharepoint" target="_blank">#sharepoint</a> RT <a class="twitter-user" href="http://twitter.com/jmancini77" target="_blank">@jmancini77</a></li>
<li>[ECM] If buying cars were like buying ECM (very funny and insightful) <a href="http://bit.ly/6F3Q0N" target="_blank">http://bit.ly/6F3Q0N</a> <a class="twitter-hashtag" href="http://search.twitter.com/search?q=%23ecm" target="_blank">#ecm</a> RT <a class="twitter-user" href="http://twitter.com/ldallasBMOC" target="_blank">@ldallasBMOC</a></li>
<li>[semantic web] Matt McAlister says Socially Linked Data is here today &#8211; you&#8217;re using it right now <a href="http://ow.ly/HtbX" target="_blank">http://ow.ly/HtbX</a></li>
<li>[cloud] AFP: IBM builds Blue Insight BI platform for employees; model for Smart Analytics Cloud offering <a href="http://j.mp/22xuez" target="_blank">http://j.mp/22xuez</a> via <a class="twitter-user" href="http://twitter.com/dcarli" target="_blank">@dcarli</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2009/12/06/weekly-digest-for-2009-12-05/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If it&#8217;s about Search 3.0, shouldn&#8217;t it be Google Cubed?</title>
		<link>http://thecontentguy.net/blog/2009/05/20/if-its-about-search-30-shouldnt-it-be-google-cubed/</link>
		<comments>http://thecontentguy.net/blog/2009/05/20/if-its-about-search-30-shouldnt-it-be-google-cubed/#comments</comments>
		<pubDate>Wed, 20 May 2009 19:41:48 +0000</pubDate>
		<dc:creator>paulwlodarczyk</dc:creator>
				<category><![CDATA[search]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[semantic search]]></category>
		<category><![CDATA[Wolfram Alpha]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/blog/?p=277</guid>
		<description><![CDATA[I&#8217;ve been trying to catch up on my surfing after losing a week to a hard drive failure and laptop rebuild.  One pretty big thing I missed was Google Squared (the other was Wolfram&#124;Alpha &#8211; I&#8217;ll cover that in a separate post).  Google Squared is Google&#8217;s answer (or perhaps one of their answers) to semantic search [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.googlelabs.com"><img class="alignleft size-full wp-image-279" title="googlelabs" src="http://thecontentguy.net/wp-content/uploads/2009/05/googlelabs.png" alt="googlelabs" width="222" height="85" /></a>I&#8217;ve been trying to catch up on my surfing after losing a week to a hard drive failure and laptop rebuild.  One pretty big thing I missed was Google Squared (the other was Wolfram|Alpha &#8211; I&#8217;ll cover that in a separate post). </p>
<p>Google Squared is Google&#8217;s answer (or perhaps one of their answers) to semantic search and Linked Data.  &#8216;Squared gets its moniker from the matrix used for displaying results &#8211; each &#8220;square&#8221; in the matrix contains some fact derived from the content on the source site. <br />
<span id="more-277"></span>Each row in the Google Squared matrix is a search result, but the interesting part is the columns.  &#8216;Squared relies on RDFa and microformats on the source sites to extract structure for the search category &#8211; if it&#8217;s available (I&#8217;m not entirely sure how &#8216;Squared derives its semantic structure in the absence of metadata, but clearly it does).  So a search on &#8220;rollercoasters&#8221; will generate columns for height, speed, construction, etc.  Essentially, &#8216;Squared is generating search facets on the fly using structure that is implied or explicit in the result set. </p>
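<p>To make the idea of on-the-fly facets concrete, here is a minimal Python sketch.  It is not how &#8216;Squared works internally (I don&#8217;t know that); the result rows, attribute names, and values are invented for illustration, and <code>derive_columns</code> and <code>build_square</code> are hypothetical helper names.  The assumption is simply that each search result arrives with whatever attributes could be extracted from its source page.</p>

```python
# Hypothetical search results: each row carries only the attributes that
# could be extracted from its source page (all names and values are made up).
results = [
    {"name": "Coaster A", "height": "60 m", "speed": "100 km/h"},
    {"name": "Coaster B", "height": "45 m", "construction": "steel"},
    {"name": "Coaster C", "speed": "80 km/h", "construction": "wood"},
]


def derive_columns(rows):
    """Collect every attribute seen in the rows, in order of first appearance."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns


def build_square(rows, columns):
    """Lay the rows out as a matrix, leaving None where a fact is missing."""
    return [[row.get(column) for column in columns] for row in rows]


columns = derive_columns(results)
square = build_square(results, columns)

for line in square:
    print(line)
```

<p>The columns are not defined up front; they fall out of the results themselves, which is the sense in which the facets are generated on the fly.  The empty cells are where a source page offered no structure for that attribute.</p>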
<p>Because sites are inconsistent with the amount of structure they provide, &#8216;Squared can &#8211; and will &#8211; make errors in interpreting free text.  For example, in the video, we can see that &#8220;height&#8221; &#8211; while intended to describe the height of the rollercoaster &#8211; sometimes returns text about the minimum height requirements for riders.  Still, from the demo &#8216;Squared does look interesting.  Predictable categories like Restaurants (e.g. a search on &#8220;pizza&#8221;) have the dimensions you&#8217;d expect to see in columns &#8211; description, address, price range, ambiance, etc. </p>
<p>Google Squared affords general-purpose faceted search, because the columns can be used to refine the search results.  The current alpha doesn&#8217;t let you sort on a column, but clearly this is where things are heading.</p>
<p>Google Squared will be made publicly available on the Google Labs site in the next week or so. </p>
<p>Here is the video <a title="What Is Google Squared? It Is How Google Will Crush Wolfram Alpha (Exclusive Video)" href="http://www.techcrunch.com/2009/05/12/what-is-google-squared-it-is-how-google-will-crush-wolfram-alpha-exclusive-video/" target="_blank">courtesy of TechCrunch</a> (over six minutes and shaky-cam &#8211; but you&#8217;ll get the idea).</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="640" height="480" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="src" value="http://www.youtube.com/v/t2onuEXThPs&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="640" height="480" src="http://www.youtube.com/v/t2onuEXThPs&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" allowfullscreen="true"></embed></object></p>
<p>As a footnote, TechCrunch declares that &#8216;Squared &#8220;is how Google will crush Wolfram|Alpha&#8221;.  I&#8217;m sorry &#8211; they missed on that point by a country mile.  Alpha isn&#8217;t a search engine &#8211; it&#8217;s a user experience (and a cool one at that) built atop a &#8220;curated&#8221; database, designed to answer queries that are primarily computational in nature.  Google searches the web.  No comparison &#8211; the products aren&#8217;t in the same class, and don&#8217;t solve the same problem.  Google can&#8217;t plot the <a title="Wolfram|Alpha Julia Set Query" href="http://www93.wolframalpha.com/input/?i=Julia%20set%20c%3D-0.38%2B0.62i" target="_blank">Julia Set</a> for you &#8211; Alpha can.  That doesn&#8217;t mean Alpha &#8220;crushes&#8221; &#8216;Squared, either.  Geez.   Anyway, more on Alpha later.</p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2009/05/20/if-its-about-search-30-shouldnt-it-be-google-cubed/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Talis Enters the Public Linked Data Arena</title>
		<link>http://thecontentguy.net/blog/2009/04/06/talis-enters-the-public-linked-data-arena/</link>
		<comments>http://thecontentguy.net/blog/2009/04/06/talis-enters-the-public-linked-data-arena/#comments</comments>
		<pubDate>Mon, 06 Apr 2009 21:43:25 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[Creative Commons]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[public data]]></category>
		<category><![CDATA[Talis]]></category>

		<guid isPermaLink="false">http://thecontentguy.net/blog/?p=174</guid>
		<description><![CDATA[Ever have a data set that was burning a hole in your proverbial pocket, and you just wanted to share it with the world, but had nowhere to put it?  Well now you do.  For some time now Amazon has made large data sets publicly available through their Public Data Sets &#8211; but these were [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="Talis" src="http://www.talis.com/images/logo.gif" alt="" width="88" height="68" />Ever have a data set that was burning a hole in your proverbial pocket, and you just wanted to share it with the world, but had nowhere to put it?  Well now you do.  For some time now Amazon has made large data sets publicly available through their Public Data Sets &#8211; but these were one-way.  They put it up, you could access it.  Now Talis has entered the public domain data game with the <a title="Talis Connected Commons" href="http://blogs.talis.com/n2/cc" target="_blank">Talis Connected Commons</a>.   Unlike Amazon, the Talis Commons is a place for <em>you</em> to make <em>your</em> data available. <br />
<span id="more-174"></span><br />
From the Talis press release:</p>
<blockquote><p>The Talis Connected Commons scheme is intended to directly support the publishing and reuse of <a onclick="pageTracker._trackPageview('/outbound/article/http://linkeddata.org');" href="http://linkeddata.org/"><span style="color: #1e6fac;">Linked Data</span></a> in the public domain by removing the costs associated with those activities.</p>
<p>The scheme is intended to support a wide range of different forms of data publishing. For example scientific researchers seeking to share their research data; dissemination of public domain data from a variety of different charitable, public sector or volunteer organizations; open data enthusiasts compiling data sets to be shared with the web community.</p>
<p>For qualifying data sets, Talis will provide, through the <a href="http://www.talis.com/platform"><span style="color: #1e6fac;">Talis Platform</span></a>:</p>
<ul>
<li>Free hosting of up to 50 million <a onclick="pageTracker._trackPageview('/outbound/article/http://www.w3.org/TR/REC-rdf-syntax/');" href="http://www.w3.org/TR/REC-rdf-syntax/"><span style="color: #1e6fac;">RDF</span></a> triples and 10Gb of content</li>
<li>Access to <a href="http://n2.talis.com/wiki/Platform_API"><span style="color: #1e6fac;">data access services</span></a> that operate on that data, including data retrieval and text search</li>
<li>Free access to a public <a onclick="pageTracker._trackPageview('/outbound/article/http://www.w3.org/TR/rdf-sparql-query/');" href="http://www.w3.org/TR/rdf-sparql-query/"><span style="color: #1e6fac;">SPARQL</span></a> endpoint for each dataset.</li>
</ul>
<p>This means that data set providers will not incur any of the commercial costs normally associated with hosting data on the Talis Platform. In addition neither the data set provider or its users will incur any usage charges relating to the use of the Platform services made available on that data.</p>
<p>To qualify for entry into the scheme all data and content hosted in the Platform must be made available under one of the following public domain data licenses:</p>
<ul>
<li><a onclick="pageTracker._trackPageview('/outbound/article/http://www.opendatacommons.org/licenses/pddl/1.0/');" href="http://www.opendatacommons.org/licenses/pddl/1.0/"><span style="color: #1e6fac;">Open Data Commons Public Domain Dedication and License</span></a></li>
<li><a onclick="pageTracker._trackPageview('/outbound/article/http://creativecommons.org/license/zero/');" href="http://creativecommons.org/license/zero/"><span style="color: #1e6fac;">Creative Commons CC0</span></a></li>
</ul>
</blockquote>
<p>As Mike Axelrod and I have been actively discussing, as more of these services become available through web APIs (e.g. Nova Spivack&#8217;s hosted ontology service, Amazon&#8217;s Public Data Sets, or text analysis services like TextWise, OpenCalais, or Amplify), developers can start mashing them up into useful virtual applications.  Marshall Kirkpatrick at ReadWriteWeb discussed the roadmap for this in a <a title="Talis Takes on Amazon With Pot of Structured Data in the Sky" href="http://www.readwriteweb.com/archives/talis_takes_on_amazon_with_pot_of_structured_data.php" target="_blank">recent post</a>. </p>
<blockquote><p>First, massive bodies of data are created or gathered, books are scanned, census data is collected, and patients donate their anonymous aggregate medical data to science. Next, the data is semantically analyzed and marked up (through any number of different semantic processing engines). Then, the data is stored and an API is made available (this is where the Talis Connected Commons comes in). Finally, developers build applications that leverage the smart data offered up through the platform, data visualizers find new stories to tell in images built from the marked up data and new relationships between people, organizations and concepts have the mist cleared away from them through systematic analysis of various permutations of previously unavailable structured data.</p></blockquote>
<p>That last bit is what has Mike and me interested &#8211; finding new ways of making use of the relationships between data and content that all the various semantic tools unearth.</p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2009/04/06/talis-enters-the-public-linked-data-arena/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Connecting the dots: How XML authoring enables the Semantic Web</title>
		<link>http://thecontentguy.net/blog/2008/08/15/connecting-the-dots-how-xml-authoring-enables-the-semantic-web/</link>
		<comments>http://thecontentguy.net/blog/2008/08/15/connecting-the-dots-how-xml-authoring-enables-the-semantic-web/#comments</comments>
		<pubDate>Fri, 15 Aug 2008 20:10:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[DITA]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[Calais]]></category>
		<category><![CDATA[Linked Data]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Search Monkey]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[web services]]></category>

		<guid isPermaLink="false">http://paulwlodarczyk.wordpress.com/?p=4</guid>
		<description><![CDATA[What if we start combining semantic web technologies and semantic document technologies?]]></description>
			<content:encoded><![CDATA[<p><a title="New, Improved *Semantic* Web!" href="http://flickr.com/photos/14829735@N00/303503677"><img class="alignleft" style="margin: 2px;" src="http://farm1.static.flickr.com/105/303503677_e83d70118f_m.jpg" alt="" width="193" height="240" /></a>I recently attended the <a title="Linked Data Planet" href="http://www.linkeddataplanet.com/" target="_blank">Linked Data Planet </a>conference where a number of pioneers in the field of Semantic Web shared their perspectives on the state of the art – and business – of helping the world tag their web pages for meaning.  For those of you in the dark about semantic mark-up, it lets authors annotate their web pages with metadata (HTML attributes that don’t get displayed in the document) that describe what those pages are about. <br />
<span id="more-7"></span><br />
So for example, when I say “New York” in an HTML document it&#8217;s ambiguous – do I mean the city, the state, the Yankees, the Mets, the Giants, the Jets, the song, the steak, the state of mind – you get the idea.  Words are ambiguous – except in the context of the language in which they occur.  So if I am writing about a sporting event <strong>you</strong> know from the context of the article that I mean the team, but the typical search engine does not.  To a search engine, New York is just a string that occurs in the document with some frequency. </p>
<p>There are two ways to make sense out of words in a document.  One is semantic analysis (I&#8217;ll leave that topic to another day).  The other is semantic tagging &#8211; adding metadata to a document.<br />
With metadata, I can define things precisely.  I can state that this document is about the sports team, not the steak.  I can do this by tagging the named entities in the document – the people, places, things, events, and facts – in an unambiguous way.  I can also set those entities into relationships with each other.  For example, a piece of text may refer to two companies involved in a merger.  So I can tag the document being about <strong>Company A</strong> (thing number one) and <strong>Company B</strong> (thing number two) involved in a <strong>merger</strong> (an event, but also a relationship between the two named entities). </p>
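<p>The merger example can be sketched as a handful of subject, predicate, object statements.  This is a minimal illustration in Python, not any particular tagging standard: the <code>example.org</code> identifiers, the predicate names, and the <code>match</code> helper are all assumptions made up for the sketch.</p>

```python
# Each entity gets an unambiguous identifier (hypothetical URIs), so the
# string "Company A" is never confused with another thing of the same name.
DOC = "http://example.org/doc/press-release"
COMPANY_A = "http://example.org/id/company-a"
COMPANY_B = "http://example.org/id/company-b"
MERGER = "http://example.org/id/merger-1"

# Statements about the document: what it mentions, and how those things relate.
triples = [
    (DOC, "mentions", COMPANY_A),
    (DOC, "mentions", COMPANY_B),
    (DOC, "mentions", MERGER),
    (MERGER, "type", "MergerEvent"),
    (MERGER, "participant", COMPANY_A),
    (MERGER, "participant", COMPANY_B),
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return the statements that fit the pattern; None acts as a wildcard."""
    return [
        t for t in statements
        if subject in (None, t[0])
        and predicate in (None, t[1])
        and obj in (None, t[2])
    ]


# Which entities take part in the merger?
participants = [o for _, _, o in match(triples, subject=MERGER, predicate="participant")]
print(participants)
```

<p>Notice that the merger is both a thing the document mentions and the link between the two companies, which is exactly the double role described above.</p>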
<p>So semantic tagging adds meaning to documents that goes beyond the text, and it does it in an unambiguous way, which is handy.  But it has traditionally faced two large hurdles: (1) it’s been relatively expensive to add semantic markup (either with investments in labor or technology) and (2) there has been little mass market for consuming this markup.  Both of those hurdles are rapidly falling away. </p>
<p>Let’s address the second point first.  Yahoo has introduced <a title="Yahoo! Search Monkey" href="http://developer.yahoo.com/searchmonkey/" target="_blank">Search Monkey</a> – a new technology that rates web pages not on the keywords and number of links to the page (the “wisdom of crowds”) but on the semantic markup that is embedded in the page (the wisdom of the author).  This creates a substantial motive for adding the markup: Search Engine Optimization.  Semantic markup makes your content more likely to be found and more relevant to the searcher.</p>
<p>Great, so how do you add semantic markup?  For legacy content, you need to use some combination of people and automation to add markup to what you already wrote.  Using people to tag content requires specialized skills that are not in good supply.  Natural language processing technologies for auto-tagging content have been around since the late 90s in lab settings; auto-tagging products are emerging in new and interesting forms in the marketplace today. Thomson Reuters’ <a title="Thomson-Reuters Calais" href="http://www.opencalais.com/">Calais</a> open source project is a great example.  For a demo <a title="Calais Viewer Demo" href="http://sws.clearforest.com/calaisviewer/" target="_blank">click here</a> and try pasting some <a title="Terms of use" href="http://www.opencalais.com/terms" target="_blank">non-proprietary</a> text that describes what your company does (for example, I tried the “About Our Company” page we used in proposals at JustSystems and it accurately tagged all of the named companies, legal entities, products, technologies, countries, cities, and correctly identified JustSystems’s acquisition of XMetaL from Blast Radius as a business event).</p>
<p>Adding semantic markup to new web content as it is created &#8211; making it available as data &#8211; is the way to go.  But what about other types of unstructured content, like documents, that might be published to the web and other channels?  We’ve been doing this with XML and SGML documents all along, using semantic tags to unambiguously flag specific pieces of text for future discovery.  This has ranged from tagging part numbers in a service manual (which could automate adding hyperlinks or improve search relevance), to tagging financial reports with XBRL to find specific facts within the MD&amp;A or footnotes of an annual report (which could prevent another Enron).  But the important concept here is this: when content is tagged, it can be treated as data.</p>
<p>More recent XML standards like <a title="DITA.XML.ORG" href="http://dita.xml.org/" target="_blank">DITA</a> help authors focus on creating granular content – primarily for content reuse.  But our customers are finding that DITA and other topic-oriented XML approaches are helping them break out of the document model – where loads of facts are locked-up within documents.  Think of a lengthy Policies and Procedures manual.  The historical reason it’s all bound in one book is for the convenience of publishing.  Today – with electronic publishing on the web, intranets, and portals – you really only want to publish a single policy or procedure as it is added or revised.  The book itself is obsolete when you can publish a procedure at a time. </p>
<p>In a DITA world, because of its granular nature, a single document (like a Policy manual that was one very large document in your document management system) may instead be managed as a collection of hundreds of DITA topics in your CMS or XML object store.  The document would no longer exist; it becomes a collection of topics, more like records in a database.  To effectively manage large collections of DITA topics, you <strong>need</strong> to specify metadata for each topic – just so that you can find any given topic again.  So a typical DITA project would define the CMS metadata scheme and the taxonomy for classifying the DITA topics.  For those of us in the XML document world, this is old hat.</p>
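<p>Here is a rough Python sketch of why that metadata matters once the manual has become a pile of topics.  Nothing here is a real CMS API: the <code>Topic</code> record, the metadata keys (<code>kind</code>, <code>subject</code>), the sample topics, and <code>find_topics</code> are all hypothetical, chosen only to show topics behaving like database records.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One DITA topic as a CMS might record it (hypothetical schema)."""
    topic_id: str
    title: str
    metadata: dict = field(default_factory=dict)


# Made-up topics that once lived inside a single policy manual.
topics = [
    Topic("t-001", "Requesting vacation time",
          {"kind": "procedure", "subject": "leave"}),
    Topic("t-002", "Vacation accrual policy",
          {"kind": "policy", "subject": "leave"}),
    Topic("t-003", "Submitting an expense report",
          {"kind": "procedure", "subject": "expenses"}),
]


def find_topics(collection, **criteria):
    """Return the topics whose metadata matches every given key and value."""
    return [
        topic for topic in collection
        if all(topic.metadata.get(key) == value for key, value in criteria.items())
    ]


leave_procedures = find_topics(topics, kind="procedure", subject="leave")
print([topic.title for topic in leave_procedures])
```

<p>Without the <code>metadata</code> dictionary the only way to find the vacation procedure would be to open topics one by one, which is the problem the metadata scheme and taxonomy exist to solve.</p>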
<p>So all this makes me ask:</p>
<ul>
<li>What if we start combining semantic web technologies and semantic document technologies?</li>
<li>What if we combine technologies that auto-tag named entities with granular authoring approaches like DITA?</li>
<li>What if you could automatically tag named entities within the DITA topic you are creating, tagging as you type? </li>
<li>What if a web service could automatically provide the CMS metadata when you go to check-in a new topic?</li>
<li>What if the publishing tools that transform your DITA to HTML could automatically add the semantic markup to your HTML pages that are published from your DITA content?</li>
<li>How would that change how you publish business documents like policies and procedures to your employees?</li>
<li>How would it change how you create marketing content for your web site?</li>
<li>How would it change the way you create and manage your product technical content?</li>
</ul>
<p>Could the secret to the semantic web be right under our nose?</p>
]]></content:encoded>
			<wfw:commentRss>http://thecontentguy.net/blog/2008/08/15/connecting-the-dots-how-xml-authoring-enables-the-semantic-web/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

