Web Services
-thoughts on service orientated architectures


Wednesday, October 09, 2002

Long-Lived Transactions in a Web Services Network  

Introduction


As web services move from content distribution to transactions, the need to develop a transactional model that works in a highly distributed environment, such as the Internet, is becoming a key challenge for companies moving to a Services Orientated Architecture (SOA).


The classical transaction model presumes two-phase commit (2PC) and ACID properties (Atomic, Consistent, Isolated, Durable). This view presumes that transactions are short-lived and under centralized control. Building very large-scale distributed systems requires balancing latency and consistency against scalability and loose coupling. In practice, it is not possible to maintain full ACID properties across a widely distributed system and still scale. Attempting to do so creates very tightly coupled systems in which companies integrate transaction semantics. This tight integration requires each company to expose the internal workings of its systems to the others. Integration is then no longer a simple data-sharing relationship but a tight transactional coupling that crosses organizational boundaries.


When two transaction systems are tightly coupled, deadlocks can propagate from one system to the other. A deadlock caused by an external system can have significant business impact. Integration of systems is always a complex undertaking; the degree of agreement required to integrate two systems is a good proxy for the complexity of the integration. Integrating transactions requires the highest level of agreement among systems, because it requires agreement on the form and function of business processes.


The definitions of the classic ACID properties help to understand the changes required for long-lived transactions (LLT) in an environment with no centralized control and a highly distributed architecture.



  • Atomic: Atomicity guarantees that all operations within a transaction happen as a single unit of work: either all of them take effect or none do.

  • Consistent: Consistency guarantees that all transactional resources within a transaction are left in a consistent state either after the transaction succeeds and is committed or after it fails and all resources are rolled back to their previous state.

  • Isolated: Isolation ensures that even though multiple transactions may be running in parallel they appear to be running in a serial manner.

  • Durable: Durability ensures that once a transaction has been marked as committed all information relating to the transaction has been committed to durable storage.


The need for ACID-type properties does not go away in a distributed system; rather, the properties need to be extended to encompass loosely coupled architectures. The degree to which they need to be extended depends very much on the business process being carried out. Business processes still require guaranteed transaction semantics. But as businesses move to SOAs, they also need to compose complex transactions from loosely coupled systems, without the tight coupling that a shared two-phase commit requires.


Services Orientated Architecture


Leading companies are moving to an SOA; they are either part of an SOA or they orchestrate a set of services. They need to maintain ACID-type properties, but they also need to embrace loosely coupled architectures from a purely economic and scalability perspective. To achieve all these goals, there is a need to extend the definition of the ACID properties for long-lived transactions such that the properties are still individually valid and collectively consistent.


In the diagram below, an example LLT is made up of four actors: the initiator, who decides the rules of engagement; the two participants, who deliver the services requested by the initiator; and the coordination service. The coordination service provides the necessary mediation and management functions to enable all parties to participate on an equal footing with a guaranteed level of service.






A long-lived transaction is a collection of individual transactions that communicate through a reliable framework (typically a guaranteed messaging environment). Each individual transaction sends and/or receives a guaranteed message as part of its transaction bracket. When a message is sent, delivery is guaranteed and therefore the transaction can complete. In the event of an error, the messaging infrastructure is responsible for delivering a message to all nodes that have completed their transaction. The message carries a coordination identifier that tells the node which LLT it needs to compensate for. Compensation is different from rollback, as typically the state of the transaction system has changed since the initial transaction was committed. Going back to a consistent state can therefore be more complex than simply rolling back to the state before the transaction was committed. Using this model, the definitions of the ACID properties are relaxed to include the concept of an LLT.



  • Atomic: In an LLT, each part of the transaction needs to be atomic; the overall transaction is now composed of a set of atomic components.

  • Consistent: Consistency now holds over a set of atomic transactions, each of which may internally use 2PC and have ACID properties. The consistency requirement is relaxed to hold within a defined execution time. If all parties do not complete within that time, a compensation event occurs.

  • Isolated: Each atomic element of an LLT is isolated. The LLT is a set of isolated transactions; the degree of isolation it provides as a whole is delegated to each transaction node and depends on the compensation mechanism that node implements. If every node implements a well-designed compensation mechanism, the overall system maintains the isolation property.

  • Durable: The overall LLT must be durable, persisting its state at each step so that if any transaction node fails, the LLT will either succeed and commit or fail, in which case a compensation message is sent to each participant.
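
The difference between compensation and rollback can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: the Node class, the stock counter, and the coordination identifiers are assumptions, and each method merely stands in for a local ACID transaction on one participant.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical participant; each method stands in for a local ACID transaction."""
    name: str
    stock: int
    committed: dict = field(default_factory=dict)  # coordination id -> quantity

    def commit(self, coordination_id: str, quantity: int) -> bool:
        if quantity > self.stock:
            return False  # the local transaction fails and changes nothing
        self.stock -= quantity
        self.committed[coordination_id] = quantity
        return True

    def compensate(self, coordination_id: str) -> None:
        # Not a rollback: other work may have committed since, so apply a
        # new, offsetting transaction for this one LLT only.
        quantity = self.committed.pop(coordination_id, 0)
        self.stock += quantity


def run_llt(coordination_id, steps):
    """steps is a list of (node, quantity); returns True if every step committed."""
    done = []
    for node, quantity in steps:
        if node.commit(coordination_id, quantity):
            done.append(node)
        else:
            # Tell every node that already completed which LLT to compensate for.
            for finished in done:
                finished.compensate(coordination_id)
            return False
    return True


# Compensating llt-1 after llt-2 has committed must not undo llt-2.
a = Node("participant-a", stock=10)
a.commit("llt-1", 3)    # stock is now 7
a.commit("llt-2", 2)    # stock is now 5
a.compensate("llt-1")   # stock is 8; restoring the old value of 10 would lose llt-2

# A failing participant triggers compensation on the nodes that completed.
b = Node("participant-b", stock=1)
ok = run_llt("llt-3", [(a, 4), (b, 5)])
print(ok, a.stock, b.stock)
```

Restoring the value the node held before llt-1 would silently discard llt-2, which is why each node has to define its own offsetting operation.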


To interoperate, the consistency requirement must be loosened to a finite time period set by the initiator of the transactional conversation; that is, everyone in the transaction must have reached a consistent state with the initiator within a fixed time period. If they do not, the coordinating application cancels the transaction and sends a compensation message. It is then the responsibility of each participant to carry out its compensation mechanism and inform the coordinator of successful compensation. If compensation fails, the initiator must be informed. The action the initiator takes on a failed compensation is process-specific.
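
A minimal sketch of this time-bounded consistency check follows. The Coordinator and Participant classes, their method names, and the numeric clock values are all assumptions made for illustration. As a simplification, the compensation message goes to every participant rather than only to those that completed.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical participant that may or may not manage to compensate."""
    name: str
    compensation_works: bool = True
    state: str = "pending"

    def compensate(self) -> bool:
        # Returns True once this participant's own compensation has succeeded.
        if self.compensation_works:
            self.state = "compensated"
        return self.compensation_works


@dataclass
class Coordinator:
    deadline: float      # fixed time period chosen by the initiator
    participants: list
    consistent: set = field(default_factory=set)

    def report_consistent(self, name: str, now: float) -> None:
        # Reports that arrive after the deadline are ignored.
        if now <= self.deadline:
            self.consistent.add(name)

    def resolve(self) -> dict:
        """Decide the outcome once the deadline has passed."""
        names = {p.name for p in self.participants}
        if names <= self.consistent:
            return {"outcome": "committed", "failed_compensations": []}
        # Cancel: ask each participant to compensate and collect the failures
        # so that the initiator can be informed.
        failed = [p.name for p in self.participants if not p.compensate()]
        return {"outcome": "cancelled", "failed_compensations": failed}


maker = Participant("maker")
shipper = Participant("shipper", compensation_works=False)
coordinator = Coordinator(deadline=100.0, participants=[maker, shipper])
coordinator.report_consistent("maker", now=50.0)
coordinator.report_consistent("shipper", now=150.0)  # too late, ignored
result = coordinator.resolve()
print(result)
```

What the initiator does with the names in failed_compensations is left open here, since that decision is process-specific.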


The state of individual systems is not tightly coupled; they move independently of one another. Their consistency is defined within a fixed time period. The notion of compensation is key to long-lived transactions. In all reliable systems, the key is how well exceptions are handled. The more automatic the exception handling, the more robust and reliable the system is.


Each transactional node can perform two-phase commits internally; the only interface exposed to other parties is the sending and receiving of guaranteed messages. Each node provides for sending and receiving messages within a transaction. This allows each node to maintain two-phase commits and ACID properties while participating in a long-lived transaction that spans individual 2PC transaction boundaries. The coordination system allows the initiator to create rules that determine the success or failure of each long-lived transaction. Through the coordination system, all nodes have the same visibility into the progress of the transaction as it moves through the individual nodes.
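
One common way to approximate "sending a message within a transaction" on a single node is to write the outgoing message to a local outbox table in the same database transaction as the business update. This is an assumption about how a node might be built, not something prescribed above; the table names and the place_order function are hypothetical. The sketch uses Python's standard sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY, coordination_id TEXT NOT NULL, body TEXT NOT NULL)"
)
conn.commit()


def place_order(order_id, coordination_id, fail=False):
    """Update local state and queue the outgoing message in one local transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        if fail:
            raise RuntimeError("simulated failure before the message is queued")
        conn.execute(
            "INSERT INTO outbox (coordination_id, body) VALUES (?, ?)",
            (coordination_id, f"order {order_id} placed"),
        )


place_order("A1", "llt-42")
try:
    place_order("A2", "llt-42", fail=True)
except RuntimeError:
    pass  # the order row for A2 was rolled back together with its message

orders = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
messages = conn.execute("SELECT coordination_id, body FROM outbox").fetchall()
print(orders, messages)
```

A separate process would then hand the outbox rows to the guaranteed messaging layer, so other parties only ever see messages for work that actually committed.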


Web Services Network


Enabling loosely coupled transactions requires several components be available to all participants. The most important is an environment to create loosely coupled systems from disparate nodes. A guaranteed messaging infrastructure is needed to connect the individual nodes in the transaction group. Other necessary features are a management system to provide visibility and guaranteed exception handling.


To deliver these features without requiring every participant to implement the functionality of the coordinator, a third-party service is required. This is why a Web Services Network (WSN) is necessary for widespread implementation of LLTs. The WSN acts as the run-time engine for building loosely coupled interactions. It provides a guaranteed messaging infrastructure for all participants, provides the necessary visibility, and delivers exceptions and events as guaranteed messages, allowing the construction of compensation rules.


Grand Central Communications (GCC) provides the needed infrastructure for the construction and management of long-lived transactions. GCC provides a service that enables edge nodes to participate in LLTs. GCC provides once-and-only-once messaging semantics for every message sent into the network. For every message, GCC provides shared visibility to all participants, allowing a common view of the progress of a transaction. All messages, including exceptions and events, are treated as first-class messages with the same degree of reliability and visibility. This allows the initiator of an LLT to create compensation rules, where the rules for successful completion or compensation can be described and applied to individual business processes.
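
To show what once-and-only-once semantics mean for a receiving node, here is a generic sketch of duplicate suppression by message identifier. It says nothing about how GCC itself implements delivery; the DedupingReceiver class and the message identifiers are hypothetical.

```python
class DedupingReceiver:
    """Hypothetical receiver that processes each message id at most once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # a real node would keep this in durable storage

    def deliver(self, message_id, payload) -> bool:
        """Returns True if the message was handled, False if it was a duplicate."""
        if message_id in self.seen:
            return False
        self.handler(payload)
        # Record the id only after the handler succeeds, so a failed attempt
        # can be retried when the message is redelivered.
        self.seen.add(message_id)
        return True


handled = []
receiver = DedupingReceiver(handled.append)
results = [
    receiver.deliver("m1", "ship order 7"),
    receiver.deliver("m1", "ship order 7"),  # redelivery of the same message
    receiver.deliver("m2", "bill order 7"),
]
print(results, handled)
```

Retried delivery on the sending side combined with suppression like this on the receiving side is one way the two halves can add up to a message being processed once.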


The key driver is the ability for businesses to implement both transactions and compensation. Compensation is something that has historically been difficult to implement. How many times has an order been cancelled several hours or days after it was placed, and then either shipped and not billed or billed and not shipped? One of the objectives of a WSN is to provide the shared infrastructure that enables the creation of LLTs with compensation, to help eliminate these types of problems.


Value Chain Example


A simple example is a distributor taking orders from customers through a web services interface and then placing the individual orders with several manufacturers through web services interfaces. This gives manufacturers real-time visibility into the demand chain and allows the distributor to feed ship and delivery times directly into customers' applications.


Within the place-order process across all the different systems, there is a significant amount of latency, especially if any of the participants use batch processing. The simplest failure is when the order cannot be completed. For some reason, cancellation occurs and all parties need to be notified that the order was canceled. Typically, some time has passed since several of the parties committed their transactions. A simple rollback is therefore not appropriate, as the state of the systems has changed since the initial transaction. To recover from the error, a compensating transaction is required.


A failure in any node will create a notification within GCC that will be delivered to the distributor. The distributor will have created a set of rules indicating the action to be taken if a single step fails. These rules could route the order to a different manufacturer (this is a form of compensation) or, in the event of complete failure, deliver a compensation message to all participants who registered a successful transaction. If a compensation message is not acknowledged, the problem can then be escalated to a human to ensure the appropriate resolution.
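
The distributor's rules might be sketched as plain data plus a small handler. All names here (handle_failure, the manufacturer names, and the two callables) are hypothetical; in practice the callables would send guaranteed messages through the network rather than run locally.

```python
def handle_failure(failed, rules, succeeded, place_order, send_compensation):
    """Apply the distributor's rules after one step of the order fails.

    failed: name of the manufacturer whose step failed
    rules: dict mapping a manufacturer to its fallback manufacturers
    succeeded: participants that registered a successful transaction
    place_order: callable(name) -> True if the alternate accepts the order
    send_compensation: callable(name) -> True if the message is acknowledged
    """
    # First choice: route the order to a different manufacturer.
    for alternate in rules.get(failed, []):
        if place_order(alternate):
            return {"action": "rerouted", "to": alternate, "escalate": []}
    # Complete failure: compensate everyone who had already succeeded and
    # collect the participants that did not acknowledge, for a human to follow up.
    unacknowledged = [p for p in succeeded if not send_compensation(p)]
    return {"action": "compensated", "to": None, "escalate": unacknowledged}


rules = {"maker-a": ["maker-b"]}

rerouted = handle_failure(
    "maker-a", rules, ["warehouse"],
    place_order=lambda name: name == "maker-b",
    send_compensation=lambda name: True,
)

compensated = handle_failure(
    "maker-c", rules, ["maker-a", "warehouse"],
    place_order=lambda name: False,
    send_compensation=lambda name: name == "maker-a",
)
print(rerouted, compensated)
```

Because the rules are ordinary data, the distributor can change the fallback order without redeploying the handler.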


There are many other failure modes that may exist in an interaction among several parties. One advantage a loosely coupled system such as GCC provides is the ability to create rules at runtime, so that different modes of success and failure can be defined. This enables a truly loosely coupled system, as its behavior can be changed without the need to deploy or change software.


Conclusion


Not all actors will provide the same level of semantics. Not everyone will be able to provide the necessary persistence mechanisms for once-and-only-once delivery. It is in this environment, heterogeneous in both implementations and capabilities, that Grand Central Communications provides the necessary services to enable solutions.


Grand Central Communications provides the necessary infrastructure and runtime environment to build services that require long-lived transactions. The range of capabilities that GCC provides enables companies to offer increasing levels of sophistication in an evolutionary manner. This enables businesses to effectively position themselves as providers of robust and reliable services for partners and customers.


posted by John McDowall | 4:10 PM

