creating resources the RESTful way can be done with either PUT or POST, with PUT requiring that the URI of the created resource is known in advance (by the client issuing the request). semantically, the difference is that PUT is idempotent and thus can be repeated safely, whereas POST is non-idempotent and thus cannot be safely repeated. a PUT can be repeated because after creating a resource, a repeated PUT will simply overwrite the resource with the same representation, whereas a POST is always directed towards some factory resource, and thus repeating it would create a duplicate, which should be avoided.
in many cases, clients do not know the URI of the new resource, so the PUT approach cannot be used. the problem with the simple POST model is that clients have a hard time figuring out what to do when the POST request fails (e.g., the HTTP library says something went wrong). did the request make it to the server and cause a resource to be created? or did the request fail before that, so that the client could retry the POST? HTTP semantics say that POSTs cannot be safely repeated, so what now?
this is where people came up with the POST/PUT pattern. clients POST to the factory, which creates an empty resource (often a template) and returns its URI. clients then PUT the actual data there, and the important observation is that this PUT request can be safely repeated if it fails, because of its HTTP semantics.
what this does is simply push complexity around. instead of having to deal with possible duplicates being created, the server now has to deal with possible templates that were never populated. in many scenarios, this latter task is easier than the former one: it's essentially garbage collection, and the server can clean up all resources that were created as a result of a POST but were never populated with a subsequent PUT. if a client waits for an absurdly long period of time and then attempts to PUT to a resource that was garbage-collected, then it's up to server logic to either keep track of all URIs ever minted (which might get expensive), or just use URIs that are very unlikely to collide. in either case, the server would let the client know that the URI is gone, and that a new one should be requested by POSTing to the factory.
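as a rough illustration of the POST/PUT pattern and its garbage collection, here is a minimal in-memory sketch. everything in it is an assumption made for illustration: the class name PostPutFactory, the /items/ URI prefix, the TTL, and the choice of 410 for a URI that is gone are not taken from any particular implementation.

```python
import time
import uuid


class PostPutFactory:
    """In-memory stand-in for a factory resource (hypothetical, for illustration)."""

    def __init__(self, ttl_seconds=3600.0):
        self.ttl_seconds = ttl_seconds
        # uri -> {"created": timestamp, "body": None or the PUT representation}
        self.resources = {}

    def post(self, now=None):
        """POST to the factory: create an empty placeholder, return (status, uri)."""
        now = time.time() if now is None else now
        # random URI, so collisions with earlier URIs are very unlikely
        uri = "/items/" + uuid.uuid4().hex
        self.resources[uri] = {"created": now, "body": None}
        return 201, uri

    def put(self, uri, body):
        """PUT the actual data; a repeated PUT just overwrites with the same body."""
        entry = self.resources.get(uri)
        if entry is None:
            # this sketch does not track minted URIs, so it cannot tell a
            # garbage-collected URI from one that never existed
            return 410
        entry["body"] = body
        return 200

    def collect_garbage(self, now=None):
        """Delete placeholders that were never populated within the TTL."""
        now = time.time() if now is None else now
        stale = [uri for uri, entry in self.resources.items()
                 if entry["body"] is None
                 and now - entry["created"] > self.ttl_seconds]
        for uri in stale:
            del self.resources[uri]
        return len(stale)
```

a client that gets 410 back from put() would start over by calling post() again, which matches the "request a new one from the factory" behavior described above.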
so far, this is just what people have been doing for a while already. as a new twist, what if the creation of a resource on the server is actually a fairly heavy-weight operation, because of all sorts of bookkeeping and retention mechanisms built into the persistence layer? creating a resource before it has been populated ideally should be avoided, because garbage collection is not as easy; as soon as a resource has been created, a lot of repository data is being created and cannot be deleted anymore.
the question i have been very slowly getting to thus is the following: in such a scenario, what about issuing a one-time token URI as the result of an initial request, and allowing the client to PUT to that URI? only if that PUT happens will the actual resource be created in the repository. the token is just a place-holder, and the URI of the actual resource depends on the creation of the repository-level URI. in that case, the first PUT will be accepted, but will result in a 303 response that redirects to the cool URI. a second request to the token URI should result in a redirect as well, so that clients know that this token has been used.
there are two ways in which the tokens can be associated with the cool URIs. the REST layer could maintain a list of token/persistent URI pairs, or the repository itself could have a field for each resource that keeps track of the token from which a resource has been generated. fast look-up is important, but can be done in either case. extending the repository model probably is the better way to go, but may not be possible in all cases. it is important to notice that tokens only need to be tracked once they have been used (i.e., something has been PUT there). before that, all that is needed is a token scheme that makes collisions highly unlikely (such as time stamp + client IP + random string).
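a token scheme along the lines of "time stamp + client IP + random string" could be sketched as follows. the function name mint_token is hypothetical, and hashing the three parts into one opaque string is an extra assumption of this sketch (it keeps the client IP out of the URI and gives a fixed-length token), not something the scheme itself requires.

```python
import hashlib
import secrets
import time


def mint_token(client_ip, now=None):
    """Build an opaque token from time stamp + client IP + random string."""
    timestamp = time.time() if now is None else now
    random_part = secrets.token_hex(16)  # 128 random bits as 32 hex characters
    raw = f"{timestamp:.6f}|{client_ip}|{random_part}"
    # assumption of this sketch: hash the parts so the token is fixed-length,
    # URI-safe, and does not reveal the client IP
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

because of the random part, two calls with the same time stamp and the same client IP still yield different tokens, so nothing needs to be stored when a token is handed out.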
here is the proposed sequence of steps in this scenario:
- clients GET the template and a token from the factory resource. since this operation does not change server state, it is safe and idempotent and it is possible to use GET.
- clients PUT to the token URI, at which point the server creates and populates the resource at the repository level, and keeps track of the token from which the resource has been created. the server responds with a 303 (See Other) response, indicating the persistent URI of the resource.
- subsequent requests to the token URIs are handled with a 301 (Moved Permanently) response, telling clients that the token URI should no longer be used, and that a persistent URI is now available.
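the three steps can be sketched with a small in-memory model of the server side. this is only a sketch under stated assumptions: the class TokenFactory, the /tokens/ and /items/ URI prefixes, the shape of the template, and the counter used as internal id are all made up for illustration, and the token/id map lives in the REST layer here rather than in the repository.

```python
import secrets


class TokenFactory:
    """Hypothetical in-memory model of the GET/PUT token pattern."""

    def __init__(self):
        self.repository = {}   # internal id -> stored representation
        self.token_to_id = {}  # used tokens only: token -> internal id
        self.next_id = 1

    def get_template(self):
        """Step 1: return a template and a fresh token URI; stores nothing."""
        token_uri = "/tokens/" + secrets.token_hex(16)
        return 200, {"template": {"title": ""}, "put_to": token_uri}

    def put(self, token_uri, body):
        """Step 2: first PUT creates the resource (303). Step 3: later ones get 301."""
        token = token_uri.rsplit("/", 1)[-1]
        if token in self.token_to_id:
            # token already used: point the client at the persistent URI
            return 301, self._persistent_uri(self.token_to_id[token])
        # unused tokens are not tracked, so any unused token is accepted here
        internal_id = self.next_id
        self.next_id += 1
        self.repository[internal_id] = body
        self.token_to_id[token] = internal_id
        return 303, self._persistent_uri(internal_id)

    def _persistent_uri(self, internal_id):
        # the persistent URI reflects the id assigned at creation time
        return f"/items/{internal_id}"
```

the second element of each returned pair stands in for the Location header of the redirect. note that get_template() leaves both dictionaries untouched, which is what makes the first step safe to repeat.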
if the initial GET fails, it can be repeated safely, because it does not change server state, and thus is safe and idempotent. if something goes wrong with the PUT, it can be repeated as well; if the server had already processed the first attempt, the repeated request will be redirected with a 301. as soon as the resource has been created and the persistent URI is established, future interactions with the resource work as usual.
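from the client's point of view, the retry logic might look like the sketch below. the two transport callables (get_template and put) are hypothetical placeholders for whatever HTTP library is in use, and the sketch assumes they raise ConnectionError when a request fails in transit and that redirects are not followed automatically.

```python
def create_resource(get_template, put, body, max_attempts=3):
    """Run the GET/PUT pattern and return the persistent URI.

    get_template() -> token URI; put(uri, body) -> (status, location).
    Both are hypothetical transport callables supplied by the caller.
    """
    token_uri = _retry(get_template, max_attempts)
    status, location = _retry(lambda: put(token_uri, body), max_attempts)
    # 303: this PUT created the resource; 301: an earlier attempt already did
    if status in (303, 301):
        return location
    raise RuntimeError(f"unexpected status {status}")


def _retry(call, max_attempts):
    """Repeat an idempotent call until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

the point of the sketch is that the client never has to ask "did my first PUT get through?": both a 303 and a 301 end with the persistent URI in hand.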
what are the main benefits of this approach?
- the factory GET allows the server to supply a template, and tell the client where to direct requests for creation. since the server is providing the URI, we can use PUT requests for creating the resource at this known URI. this initial step is stateless on the server side.
- the token-based creation allows the server to delay persistence-layer creation until something has been PUT by the client. the tokens are one-time PUT only, and after that they respond with 301. this approach scales with the number of resources, not with the number of clients, and usually is simply one more attribute in the persistence layer.
any feedback on this pattern is very welcome. has it been used somewhere else? does it seem to introduce problems not discussed here? is there a simpler way of handling the same scenario?
mark nottingham's POST Once Exactly (POE, http://www.mnot.net/blog/2005/03/21/poe) is probably pretty much what i've described here; as usual, going back to the history books is a good idea. mark has pointed out that the pattern and the required information could be used by extending HTTP, or by embedding appropriate information in representations. since POE hasn't made it as an RFC, i guess the way to go nowadays is the "plain HTTP" version described here. as mark notes, the important thing is that the server needs to keep state about whether a resource has been used or not.
so maybe the simplest way to describe this GET/PUT pattern would be to say it implements POE, but avoids POST.
Posted by: dret | Thursday, November 17, 2011 at 10:14
Thanks for your explanations. My second question was a bit ambiguous, sorry. I would like to know how to provide the link to the resource where I, as a client, have to PUT my filled form (the link in the response to the first GET to receive the template, not the response to the PUT request).
I'm looking forward to your more detailed documentation as I am really interested in this topic. One of my students currently analyses different POST Once proposals (http://fernuni-hagen.de/dvt/studium/masterarbeiten_12.shtml, nearly finished) using colored petri nets to model server and client states and interactions. I will try to apply his method and model to your approach next week, then I might have further questions.
Daniel
Posted by: Daniel Schulte | Thursday, November 17, 2011 at 05:30
daniel, thanks for your comments! here are my responses:
1) how to link to the factory/template: yes, i think there should be a link relation. it simply would be a relationship defined for this pattern. with AtomPub, this is easy because you just POST to the collection URI, so there's no additional resource involved. for the GET/PUT pattern, this is different, and we will need to have a link relationship for that.
2) returning the PUT URI: http://tools.ietf.org/html/rfc2616#section-10.3.4 says that the 303 response should return the new URI in a Location header, and thus that's probably more appropriate than a Link header.
3) using the token URI as persistent URI: that's not something we would like to do because we want the token to be transient, and we want the persistent URI to reflect an internal id. but i don't think the pattern would change at all if you used the same URI, the only difference on the implementation side would be that instead of storing the token with the persisted object, you would add a switch to the persisted object saying uninitialized/initialized.
thanks for the comments! i am planning on documenting this a little better, any feedback what you would look for would be welcome.
Posted by: dret | Wednesday, November 16, 2011 at 21:05
Erik,
at first I was surprised to see a solution for the "POST Once" problem without using the POST verb. But now, I see several advantages, e.g., the option to use content negotiation before creating a new resource and the option to provide a form (instead of or as part of a template).
However, as you focus on the idea to use GET+PUT, I have questions with respect to technical details:
- Browsing a RESTful application (a collection, ...), a client needs to know where and how to add resources using the GET+PUT combination, i.e., in the first step, it needs to know where to get the template. A link relation can specify that the provided URI allows adding resources using GET+PUT with templates (and in my opinion the link relation should be independent of the context as it specifies a procedure; a further link relation may state more precisely what kind of resource may be added). Do you have any link relation or another approach in mind?
- How should I provide the URI for PUTting the filled template? As a link header with some special link relation?
- I also like Ruben's point and I think in many use cases one URI will be enough (even though it might be less nice). I see no problem in supporting both options. If the client gets a 201 Created, it knows the PUT was successful (I would only use 200 OK for subsequent PUTs) and it already has the URI of the new resource. If it gets a 303 See Other, it knows the PUT was successful and the new URI is provided. And clients should be able to handle 301 not only after creating resources. Did I miss any potential problem in supporting both options?
Thanks
Daniel
Posted by: Daniel Schulte | Wednesday, November 16, 2011 at 05:42
@ruben, thanks for the comment. one goal we have is to provide "nice" URIs as the persistent identifiers for resources, and for our back-end architecture, this means to reuse the internal identifier as part of that URI. however, that identifier is only known after the resource has been created in the persistence layer. i agree that in principle, you could get away with just using one identifier and simply remember whether it had been initialized before, but then you don't have the opportunity to use the persistence ID as part of the URI.
Posted by: dret | Monday, November 07, 2011 at 08:33
This is an interesting approach – but are the two different URIs really necessary?
While I like the mechanism "token/definitive URI", I don't see the benefit of actually having two URIs.
Would this proposition, with only one URI, have different effects?
1. GET for template and token URI (no resource created yet)
2. PUT to token URI – the token URI was an uninitialized resource (404) but now becomes a resource (200)
I think it still has the same benefits, but a little less complexity.
And also: how is your token generation going to work? Do you use GUIDs or an incremental counter (which could insanely increment without serving a purpose)?
Posted by: Ruben Verborgh | Sunday, November 06, 2011 at 23:23