12 Dec 2011
A short Linked Data URI design Q&A
URI design for Linked Data is pretty straightforward, but there are a few common practices out there in the real world which I find jarring as somebody who is mainly a data consumer. This is my attempt to briefly talk about why you shouldn’t do those things. I’ll probably update this post over time.
- Why should I avoid routinely redirecting [usually a 303] from a published thing URI to a specific representation of a document describing it? (e.g., a web page, or some RDF/XML, etc.)
- Redirecting, rather than just sending back the document with a Content-Location header, will work, but it does make life slightly trickier for developers debugging their consuming applications. It also makes it harder for people to share links to your data: the URIs that people see are always the specific document URIs, not your content-negotiating endpoint.
- Why do I want to differentiate between my thing URIs and my document URIs?
- Because many vocabularies include properties that could be used to describe either one (much of the Dublin Core Metadata Terms, for example), and it’s very often useful to provide information about both the document and the thing it primarily describes.
- Why should I avoid publishing my data on a separate subdomain to my normal web pages?
- This is much the same as why you shouldn’t routinely redirect to representations: you’re making discoverability and link-sharing harder.
- Why should I avoid deriving my identifiers from names and titles?
- URIs shouldn’t change, but the names of things do, even if only to correct a mistake. When that happens, you either have to break links by changing the identifier, or accept that it no longer matches the actual title (in which case, why make it match in the first place?). Even an identifier derived from a correct name can have unintended results: a recipe on the BBC Food site named “Carrots glazed with cumin and orange” had a derived identifier which truncated the title portion in a rather unfortunate place. UUIDs can be a good choice, because they don’t require a centralised identifier-issuing service (or person!).
- Why should I avoid including things like “.rdf” or “.action” in my data URIs?
- URIs shouldn’t change. The web server technology, data formats, and virtually everything else in the technical stack used to publish your data should be outlived by the identifier: don’t tie the two together.
- Why should I opt for minting http: and https: URIs?
- Every linked data consumer understands HTTP (and HTTPS). Other schemes require either additional protocol support or specialist knowledge about how to resolve URIs to resources. By all means, reference other URIs, including those using schemes other than http: and https:, but mint http: or https: URIs as well.
- What should I do about things appearing in multiple collections?
- Make the items accessible within each collection (using the same local identifier in each case), but choose one location to be the canonical home for the item and redirect the other locations to it (see the 3xx items here for a short guide to which you should use).
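To illustrate the answer on differentiating thing URIs from document URIs, here is a minimal Python sketch that describes both resources separately and serialises them as N-Triples. The URIs are hypothetical, and the "#id" hash pattern for the thing is just one common convention, assumed here for illustration. Note that the same Dublin Core property (dcterms:title) is applied to each resource, while the modification date belongs only to the document.

```python
from typing import Optional

DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def iri(value: str) -> str:
    """Render an IRI term in N-Triples syntax."""
    return f"<{value}>"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Render a literal, escaping the characters N-Triples requires."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    term = f'"{escaped}"'
    return f"{term}^^<{datatype}>" if datatype else term


def describe(thing_uri: str, doc_uri: str, title: str, modified: str) -> str:
    """Return N-Triples describing a thing and the document about it."""
    triples = [
        # The document is primarily about the thing...
        (iri(doc_uri), iri(FOAF + "primaryTopic"), iri(thing_uri)),
        # ...and the same Dublin Core property can describe either resource.
        (iri(thing_uri), iri(DCTERMS + "title"), literal(title)),
        (iri(doc_uri), iri(DCTERMS + "title"), literal(f"About: {title}")),
        # The modification date is a fact about the document, not the thing.
        (iri(doc_uri), iri(DCTERMS + "modified"), literal(modified, XSD_DATE)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(describe(
    "http://example.com/recipes/1234#id",  # hypothetical thing URI
    "http://example.com/recipes/1234",     # hypothetical document URI
    "Carrots glazed with cumin and orange",
    "2011-12-12",
))
```

If the two were collapsed into a single URI, the two dcterms:title statements would contradict each other, which is exactly the ambiguity the answer warns about.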
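The answer on deriving identifiers from names suggests UUIDs. A minimal Python sketch, assuming a hypothetical base URI: each item gets a random (version 4) UUID minted locally, and the title is stored as data rather than baked into the URI, so correcting it later leaves the identifier untouched.

```python
import uuid
from dataclasses import dataclass

# Hypothetical base URI; substitute your own domain and path.
BASE = "http://example.com/id/"


def mint_uri(base: str = BASE) -> str:
    """Mint an opaque identifier locally, with no central issuing service."""
    return base + str(uuid.uuid4())


@dataclass
class Item:
    uri: str    # permanent: never derived from the title
    title: str  # free to change, e.g. to correct a mistake


def create_item(title: str) -> Item:
    return Item(uri=mint_uri(), title=title)


item = create_item("Carrots glazed with cumin and orange")
item.title = "Carrots glazed with cumin & orange"  # a later correction
print(item.uri, item.title)  # the URI is the same one minted above
```

Because version 4 UUIDs are random, any number of publishers (or scripts) can mint them independently; the trade-off is that the URI tells a human nothing about the item.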
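The first answer recommends sending back the document with a Content-Location header instead of redirecting, and the answer on “.rdf” and “.action” recommends keeping format details out of published URIs. The Python sketch below combines the two, assuming a hypothetical set of representation URLs: the server picks a representation from the Accept header and answers 200 with Content-Location, rather than a 303. Accept parsing is simplified (the most specific matching media range decides the q-value, and ties go to the server's preference order); a production server should follow the HTTP specification more closely.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical: media type -> specific document URL, in server preference order.
REPRESENTATIONS: Dict[str, str] = {
    "text/html": "http://example.com/recipes/1234/html",
    "text/turtle": "http://example.com/recipes/1234/turtle",
    "application/rdf+xml": "http://example.com/recipes/1234/rdfxml",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    ttype, _, tsub = media_type.partition("/")
    return rtype == ttype and rsub in ("*", tsub)


def specificity(media_range: str) -> int:
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def choose(accept: str, available: List[str]) -> Optional[str]:
    """Return the acceptable media type with the highest q, or None."""
    ranges = parse_accept(accept) if accept.strip() else [("*/*", 1.0)]
    best, best_q = None, 0.0
    for media_type in available:
        hits = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not hits:
            continue
        q = max(hits)[1]  # the most specific matching range wins
        if q > best_q:    # strict: earlier server preference wins ties
            best, best_q = media_type, q
    return best


def respond(accept: str, reps: Dict[str, str] = REPRESENTATIONS) -> Tuple[int, Dict[str, str]]:
    """Status and headers for a request to the format-neutral URI."""
    media_type = choose(accept, list(reps))
    if media_type is None:
        return 406, {"Vary": "Accept"}
    return 200, {
        "Content-Type": media_type,
        "Content-Location": reps[media_type],  # names the specific document
        "Vary": "Accept",                      # tells caches the body varies
    }
```

The published URI never changes with the format; only the Content-Location value does, so links people copy from the address bar stay on the content-negotiating endpoint.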
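For the answer on things appearing in multiple collections, here is a minimal Python sketch, assuming hypothetical collection names and a hypothetical /collections/{collection}/items/{id} path layout. Each collection exposes the item under the same local identifier; the canonical location is served directly and every other location redirects to it. The redirect status is left as a parameter, because which 3xx code to use is the subject of the post's separate guidance; the 301 default is only a placeholder.

```python
from typing import Dict, Set, Tuple

# Hypothetical data: every collection lists items by the same local identifier.
COLLECTIONS: Dict[str, Set[str]] = {
    "desserts": {"1234"},
    "quick-meals": {"1234", "5678"},
}
# Hypothetical: the one collection chosen as each item's canonical home.
CANONICAL_HOME: Dict[str, str] = {"1234": "desserts", "5678": "quick-meals"}


def item_path(collection: str, item_id: str) -> str:
    return f"/collections/{collection}/items/{item_id}"


def resolve(collection: str, item_id: str,
            redirect_status: int = 301) -> Tuple[int, Dict[str, str]]:
    """Serve the canonical location; redirect every other location to it.

    redirect_status is a placeholder: use the 3xx code your own policy calls for.
    """
    if item_id not in COLLECTIONS.get(collection, set()):
        return 404, {}
    home = CANONICAL_HOME[item_id]
    if collection == home:
        return 200, {}
    return redirect_status, {"Location": item_path(home, item_id)}
```

Keeping the local identifier identical across collections means the redirect target can be computed from the canonical-home table alone, with no per-collection mapping.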