creating resources the RESTful way can be done with either PUT or POST, with PUT requiring that the URI of the created resource is known in advance (by the client issuing the request). semantically, the difference is that PUT is idempotent and thus can be repeated safely, whereas POST is non-idempotent and thus cannot be repeated safely. a PUT can be repeated because after creating a resource, a repeated PUT will simply overwrite the resource with the same representation, whereas a POST is always directed at some factory resource, and thus repeating it would create a duplicate, which should be avoided.
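to make the difference concrete, here is a minimal sketch using a toy in-memory store. the `Store` class, its method names, and the `/items` URIs are made up for illustration and stand in for a real HTTP server.

```python
import itertools


class Store:
    """Toy in-memory resource store; all names here are hypothetical."""

    def __init__(self):
        self.resources = {}
        self._ids = itertools.count(1)

    def put(self, uri, representation):
        # PUT: the client names the URI, so repeating the call simply
        # overwrites the resource with the same representation.
        self.resources[uri] = representation
        return uri

    def post(self, factory_uri, representation):
        # POST: the factory mints a fresh URI on every call, so a
        # repeated request creates a second resource.
        uri = f"{factory_uri}/{next(self._ids)}"
        self.resources[uri] = representation
        return uri


store = Store()
store.put("/items/a", {"name": "a"})
store.put("/items/a", {"name": "a"})  # repeated PUT: still one resource
put_count = len(store.resources)

store.post("/items", {"name": "b"})
store.post("/items", {"name": "b"})  # repeated POST: a duplicate
post_count = len(store.resources) - put_count
```

after the two PUTs there is one resource, and after the two POSTs there are two more, which is exactly the duplicate the text warns about.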
in many cases, clients do not know the URI of the new resource, so the PUT approach cannot be used. the problem with the simple POST model is that clients have a hard time figuring out what to do when the POST request fails (e.g., the HTTP library says something went wrong). did the request make it to the server and cause a resource to be created? or did the request fail before that and the client could retry the POST? HTTP semantics say that POSTs cannot be repeated, so what now?
this is where people came up with the POST/PUT pattern. clients POST to a factory resource, but mostly in an attempt to get the URI of a new resource. an often useful side-effect of this pattern is that the server can return a template for the resource, allowing a client to understand what is expected. after the client has received the URI of the new resource, it PUTs the actual data there, and the important observation is that this PUT request can be safely repeated if it fails, because of its HTTP semantics.
what this does is simply push complexity around. instead of having to deal with possible duplicates being created, the server now has to deal with possible templates that were never populated. in many scenarios, this latter task is easier than the former: it's essentially garbage collection, and the server can clean up all resources that were created as a result of a POST, but were never populated with a subsequent PUT. if a client waits for an absurdly long period of time and then attempts to PUT to a resource that was garbage-collected, then it's up to server logic to either keep track of all URIs ever minted (which might get expensive), or just use URIs that are very unlikely to collide. in either case, the server would let the client know that the URI is gone, and that a new one should be requested by POSTing to the factory.
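a rough sketch of the server side of the POST/PUT pattern, including the garbage collection of never-populated resources, could look as follows. everything here is an assumption made for illustration: the `PostPutServer` class, the `/items/` URI layout, the use of random UUIDs as "unlikely to collide" URIs, and the choice of 410 (Gone) as the way of telling the client that the URI is gone.

```python
import uuid


class PostPutServer:
    """Hypothetical in-memory model of the POST/PUT pattern."""

    def __init__(self):
        # uri -> {"created": timestamp, "data": representation or None}
        self.slots = {}

    def post_factory(self, now):
        # Mint a URI that is very unlikely to collide, and create an
        # empty placeholder resource for it.
        uri = f"/items/{uuid.uuid4().hex}"
        self.slots[uri] = {"created": now, "data": None}
        return 201, uri

    def put(self, uri, data):
        slot = self.slots.get(uri)
        if slot is None:
            # Assumption: this server does not remember collected URIs,
            # so any unknown URI is answered with 410 (Gone).
            return 410
        slot["data"] = data  # repeating this PUT is harmless
        return 200

    def collect_garbage(self, now, max_age):
        # Remove placeholders that were never populated by a PUT.
        stale = [
            uri
            for uri, slot in self.slots.items()
            if slot["data"] is None and now - slot["created"] > max_age
        ]
        for uri in stale:
            del self.slots[uri]
        return len(stale)


server = PostPutServer()
_, populated = server.post_factory(now=0)
_, abandoned = server.post_factory(now=0)
first = server.put(populated, {"title": "hello"})
retry = server.put(populated, {"title": "hello"})  # safe retry
removed = server.collect_garbage(now=100, max_age=60)
late = server.put(abandoned, {"title": "too late"})
```

in this sketch the retried PUT succeeds just like the first one, the abandoned placeholder is collected, and the late PUT is told to go back to the factory.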
so far, this is just what people have been doing for a while already. as a new twist, what if the creation of a resource on the server is actually a fairly heavy-weight operation, because of all sorts of bookkeeping and retention mechanisms built into the persistence layer? creating a resource before it has been populated ideally should be avoided, because garbage collection is not as easy; as soon as a resource has been created, a lot of repository data is being created and cannot be deleted anymore.
the question i have been very slowly getting to is thus the following: in such a scenario, what about issuing a one-time token as a result of an initial request, together with a token URI that the client can PUT to? the actual resource is created in the repository only if that PUT happens. the token is just a placeholder, and the URI of the actual resource is only determined when the resource is created at the repository level. in that case, the first PUT will be accepted, but will result in a 303 response that redirects to the cool URI. a second request to the token URI should result in a redirect as well, so that clients know that this token has been used.
there are two ways in which tokens can be associated with the cool URIs. the REST layer could maintain a list of token/persistent URI pairs, or the repository itself could have a field for each resource that keeps track of the token from which a resource has been generated. fast look-up is important, but can be done in either case. extending the repository model probably is the better way to go, but may not be possible in all cases. it is important to notice that tokens only need to be tracked once they have been used (i.e., something has been PUT there); before that, all that is needed is a token scheme that makes collisions highly unlikely (such as time stamp + client IP + random string).
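as an illustration of such a token scheme, here is a minimal sketch that concatenates a time stamp, the client IP, and a random string. the function name `mint_token`, the millisecond resolution, the `-` separator, and the 8 random bytes are all arbitrary choices made for this example.

```python
import secrets
import time


def mint_token(client_ip, now=None):
    """Build a token from time stamp + client IP + random string.

    Hypothetical helper: nothing is stored on the server when a token
    is minted, so collisions are avoided only by making them unlikely.
    """
    seconds = time.time() if now is None else now
    timestamp = int(seconds * 1000)  # milliseconds since the epoch
    # Replace the separators of IPv4/IPv6 addresses so that the token
    # can be used as a URI path segment.
    ip_part = client_ip.replace(".", "-").replace(":", "-")
    random_part = secrets.token_hex(8)  # 16 hex characters
    return f"{timestamp}-{ip_part}-{random_part}"


token_a = mint_token("192.0.2.1", now=1.0)
token_b = mint_token("192.0.2.1", now=1.0)
```

even with the same time stamp and the same client IP, the random part keeps the two tokens apart, which is all that is needed until a token has actually been used.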
here is the proposed sequence of steps in this scenario:
- clients GET the template and a token from the factory resource. since this operation does not change server state, it is safe and idempotent and it is possible to use GET.
- clients PUT to the token URI, at which point the server creates and populates the resource at the repository level, and keeps track of the token from which the resource has been created. the server responds with a 303 (See Other) response, indicating the persistent URI of the resource.
- subsequent requests to the token URIs are handled with a 301 (Moved Permanently) response, telling clients that the token URI should no longer be used, and that a persistent URI is now available.
if the initial GET fails, it can be repeated safely, because it does not change server state, and thus is safe and idempotent. if something goes wrong with the PUT, it can be repeated as well; if the server had already processed the first attempt, the repeated request will be redirected with a 301. as soon as the resource has been created and the persistent URI is established, future interactions with the resource work as usual.
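the three steps above can be sketched with a small in-memory model. this is only an illustration under several assumptions: the `TokenFactory` class, the `/tokens/` and `/items/` URI layouts, and the template fields are invented, and the token/persistent URI pairs are kept in a REST-layer map rather than in a repository field.

```python
import itertools
import secrets


class TokenFactory:
    """Hypothetical in-memory model of the token-based creation flow."""

    TEMPLATE = {"title": "", "body": ""}  # made-up template fields

    def __init__(self):
        self.repository = {}   # persistent URI -> representation
        self.used_tokens = {}  # token URI -> persistent URI
        self._ids = itertools.count(1)

    def get_factory(self):
        # Step 1: hand out a template and a token URI. Nothing is
        # stored, so the request is safe and can be repeated freely.
        token_uri = f"/tokens/{secrets.token_hex(16)}"
        return 200, {"template": dict(self.TEMPLATE), "token": token_uri}

    def put_token(self, token_uri, representation):
        # Step 3: a token that has been used redirects permanently.
        if token_uri in self.used_tokens:
            return 301, self.used_tokens[token_uri]
        # Step 2: the first PUT creates the resource in the repository
        # and records which token it was created from.
        persistent_uri = f"/items/{next(self._ids)}"
        self.repository[persistent_uri] = representation
        self.used_tokens[token_uri] = persistent_uri
        return 303, persistent_uri


server = TokenFactory()
status, body = server.get_factory()
state_after_get = (dict(server.repository), dict(server.used_tokens))
first = server.put_token(body["token"], {"title": "hello", "body": "..."})
second = server.put_token(body["token"], {"title": "hello", "body": "..."})
```

in this model the GET leaves both maps empty, the first PUT answers with a 303 pointing at the persistent URI, and the repeated PUT answers with a 301 to the same URI without creating a second resource.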
what are the main benefits of this approach?
- the factory GET allows the server to supply a template, and tell the client where to direct requests for creation. since the server is providing the URI, we can use PUT requests for creating the resource at this known URI. this initial step is stateless on the server side.
- the token-based creation allows the server to delay persistence-layer creation until something has been PUT by the client. the tokens are one-time PUT only, and after that they respond with 301. this approach scales with the number of resources, not with the number of clients, and usually is simply one more attribute in the persistence layer.
any feedback on this pattern is very welcome. has it been used somewhere else? does it seem to introduce problems not discussed here? is there a simpler way of handling the same scenario?