Ever have a data set that was burning a hole in your proverbial pocket, and you just wanted to share it with the world, but had nowhere to put it? Well, now you do. For some time now Amazon has made large data sets publicly available through its Public Data Sets program – but those were one-way: Amazon put the data up, you could access it. Now Talis has entered the public domain data game with the Talis Connected Commons. Unlike Amazon's offering, the Talis Connected Commons is a place for you to make your own data available.

From the Talis press release:

The Talis Connected Commons scheme is intended to directly support the publishing and reuse of Linked Data in the public domain by removing the costs associated with those activities.

The scheme is intended to support a wide range of different forms of data publishing. For example scientific researchers seeking to share their research data; dissemination of public domain data from a variety of different charitable, public sector or volunteer organizations; open data enthusiasts compiling data sets to be shared with the web community.

For qualifying data sets, Talis will provide, through the Talis Platform:

  • Free hosting of up to 50 million RDF triples and 10Gb of content
  • Access to data access services that operate on that data, including data retrieval and text search
  • Free access to a public SPARQL endpoint for each dataset.

This means that data set providers will not incur any of the commercial costs normally associated with hosting data on the Talis Platform. In addition, neither the data set provider nor its users will incur any usage charges relating to the use of the Platform services made available on that data.

To qualify for entry into the scheme, all data and content hosted in the Platform must be made available under one of the public domain data licenses approved for the scheme.
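
The free SPARQL endpoint is the piece developers are likely to care about most. As a minimal sketch, here is what querying a hosted data set from Python with SPARQLWrapper might look like; the store name and endpoint URL are placeholders I have assumed for illustration, not details taken from the announcement.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Placeholder endpoint: substitute the SPARQL service URL of your own store.
    endpoint = SPARQLWrapper("http://api.talis.com/stores/your-store/services/sparql")
    endpoint.setQuery("""
        SELECT ?s ?p ?o
        WHERE { ?s ?p ?o }
        LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)

    # Fetch the first ten triples in the store and print them.
    results = endpoint.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])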

As Mike Axelrod and I have been actively discussing, as more of these services become available through web APIs (e.g., Nova Spivak's hosted ontology service, Amazon's Public Data Sets, or text analysis services like TextWise, OpenCalais, or Amplify), developers can start mashing them up into useful virtual applications. Marshall Kirkpatrick at ReadWriteWeb discussed the roadmap for this in a recent post:

First, massive bodies of data are created or gathered, books are scanned, census data is collected, and patients donate their anonymous aggregate medical data to science. Next, the data is semantically analyzed and marked up (through any number of different semantic processing engines). Then, the data is stored and an API is made available (this is where the Talis Connected Commons comes in). Finally, developers build applications that leverage the smart data offered up through the platform, data visualizers find new stories to tell in images built from the marked up data and new relationships between people, organizations and concepts have the mist cleared away from them through systematic analysis of various permutations of previously unavailable structured data.

That last bit is what has Mike and me interested – finding new ways of making use of the relationships between data and content that all the various semantic tools unearth.
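
To make that concrete, here is a rough sketch of what that last step might look like: a handful of triples standing in for the "smart data" a semantic markup engine might produce, plus a SPARQL query that surfaces a relationship no single source stated directly. The entity URIs and predicates are purely hypothetical, and rdflib stands in locally for whatever platform API would actually host the data.

    from rdflib import Graph, Namespace

    # Purely hypothetical vocabulary standing in for the output of a semantic
    # markup engine (OpenCalais, TextWise, etc.).
    EX = Namespace("http://example.org/entities/")

    g = Graph()
    # "Smart data": assertions extracted from separate documents.
    g.add((EX.jane_doe, EX.worksFor, EX.acme_corp))
    g.add((EX.acme_corp, EX.headquarteredIn, EX.springfield))
    g.add((EX.john_smith, EX.coauthorWith, EX.jane_doe))

    # A query that joins those assertions to surface a relationship neither
    # source document stated on its own: people connected to a place through
    # the organization they work for.
    query = """
        PREFIX ex: <http://example.org/entities/>
        SELECT ?person ?place
        WHERE {
            ?person ex:worksFor ?org .
            ?org ex:headquarteredIn ?place .
        }
    """
    for person, place in g.query(query):
        print(person, "is connected to", place)

That sort of join is trivial once the data sits in a triple store with a SPARQL endpoint in front of it – which is exactly the piece Talis is now offering to host for free.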