Monday, February 25, 2008

Crossed the final hurdle, at last...

I passed my viva last week, and it was sheer joy, a sense of relief, a feeling that I can never explain in words. This is to acknowledge the love and cheer and encouragement from everyone -

"I remember the day I first came to Newcastle to attend the interview for an RA position in the myGrid project. Newcastle was in stark contrast with the town of Gaithersburg, Maryland, where I used to live while working for Verizon Communications Inc. at Silver Spring. It was the middle of January - cold, cloudy and dark, and the wind from the North Sea almost blew me away. It was hard to decide to leave the job in the US to join academia. Today, when I look back, I know I was right. I have now spent more than five years in Newcastle, the longest period in one place since I left home in 1997 to join the software industry. And I can say that Newcastle has been nothing less than a home to me. It is like my hometown Calcutta - a city which slowly grows around a person - Newcastle has grown all around me. And I owe it to Professor Paul Watson, Professor Pete Lee and Dr. Anil Wipat, who extended the first welcome to me. I owe my gratitude to the City of Newcastle, the university, all members of staff in the School of Computing Science, and my friends within and outside the university, for the warmest five years I have spent here, despite the chilling weather.

Professor Paul Watson was my supervisor, and I could not have come this far without his constant guidance and support. Apart from always managing to find funds to keep me employed as an RA, without which I could not have continued with my PhD, he provided me with valuable guidance and insights throughout the course of the research. On the numerous occasions when I was struggling to find the right approach, and regardless of his busy schedule as the Director of the North East Regional e-Science Centre, Paul was tireless in his attempts to make me focus on the problem from the correct angle. No words are sufficient to express my gratitude to Paul.

I would like to thank Professor Pete Lee and Dr. Aad van Moorsel, the two other members of my thesis committee, for their valuable suggestions during and after the thesis committee meetings, which acted as inputs to my work. Dr. Jim Smith is another person who has been a close friend during these five years and has always helped me when I faced a problem, be it an architectural aspect of my work or a silly question about LaTeX. I am indebted to Jim for his help and support whenever I asked for it. Dr. Savas Parastatidis, who is now at Microsoft Research, was a source of inspiration during the years I was able to work with him. All the long discussions I had with him regarding the architecture of Web Services contributed a lot to my knowledge and the research. I must also mention the support I received from the Computing Officers, especially Jim Wight and Gerry Tomlinson, who always listened to my requests for new software on the cluster and helped me configure my experimental setup, which sometimes required Jim to bypass security rules of the Computing Cluster for the external computers I used during my experiments.

A large section of the work presented here was the result of collaborative research between Newcastle and Manchester Universities. I wish to thank my colleagues from Manchester, especially Professor Norman W. Paton, Dr. Alvaro A. A. Fernandes, Dr. Tasos Gounaris, Steven Lynden and Dr. M. Nedim Alpdemir (who unfortunately left for his home country a couple of years ago), for all the active collaboration and support I received from them. Another part of the research, the development of the dynamic service-oriented framework, was based on collaborative research as well, and I wish to thank Dr. Chris Fowler, Charles Kubicek and John Colquhoun for their valuable contributions.

I cannot forget the amount of support I received from my family during this entire journey. My parents, Mrs. Binata Mukherjee and Mr. Prabhat Mukherjee, have inspired me to dream since I was a child. I am extremely indebted to them, and I hope that these three letters, if I am able to achieve them, will fulfil a part of their dreams. I can never express enough gratitude for the support I received from my sister, Dr. Nandini Mukhopadhyay, who has constantly encouraged me, and at times pushed me when I used to get frustrated. One person needs a special mention here, and that is my wife, Sumana, who never fell short in supporting me at every step, and did not shy away from sacrificing her perfectly good job in the US when I decided to join academia in the UK to pursue my dreams. Our little boy, Rik, has been my source of joy at home, and our newborn daughter, Riti, has been another source of inspiration during the last few months of my work.

Finally, I would like to thank EPSRC, who funded the major projects I have worked on, and my colleagues at OGSA-DAI for their valuable support during the course of the research."

Wednesday, May 03, 2006

Contract first

I had an excellent e-mail exchange with Jim and Savas about proper message orientation, and now I have a much clearer concept. The way I have implemented DynaSOAr 2.0 is message-oriented and loosely coupled, but there is one drawback: with the current WS tools, the consumer will not be able to generate any metadata about the service and the messages it consumes. In a proper message-oriented service, you define your messages first and interact with the service by sending those messages, not your business objects, which is a common tendency in the current WS programming style. Locally these messages are "java objects" - you create them and set their properties - but that is not the same as interacting with the service by sending your internal business objects.

What do I mean by this? Let's take a couple of examples:

(1) Sending business objects:

public class PurchaseOrder { /* internal business object */ }

public class OrderedItem { /* internal business object */ }

// one operation per kind of processing (plus deletePO, updatePO, ...)
public OrderedItem processPO (PurchaseOrder myOrder) { }

This is the common style, where you send your business objects (like PurchaseOrder) when interacting with your service. Effectively, you are exposing the internal details of your service, and you also need several APIs (like deletePO, updatePO, etc.) to perform the different kinds of processing.

(2) Sending messages:

public class POMessage { /* message defined in the schema */ }

public class ResponseMessage { /* message defined in the schema */ }

// single entry point: behaviour determined by the message itself
public ResponseMessage processMessage (POMessage myMessage) { }

Here, you are explicitly sending messages, and not exposing your business objects. Ideally, you can have different types of messages, and the service should be able to deal with them differently based on the type of each message - which means you do not have to expose a CORBA-style OO interface.
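To make style (2) concrete, here is a minimal, library-free sketch of the idea (the class and message names are my own illustrations, not DynaSOAr code): a single processMessage entry point, with the behaviour chosen by the type of the incoming message.

```java
// Sketch of a message-oriented service: one entry point, behaviour
// selected by the type of the message (all names illustrative only).
abstract class Message { }

class POMessage extends Message {
    final String orderId;
    POMessage(String orderId) { this.orderId = orderId; }
}

class CancelPOMessage extends Message {
    final String orderId;
    CancelPOMessage(String orderId) { this.orderId = orderId; }
}

class ResponseMessage {
    final String status;
    ResponseMessage(String status) { this.status = status; }
}

class OrderService {
    // The only operation in the contract: the service decides what to do
    // from the message itself, instead of exposing processPO/deletePO/...
    public ResponseMessage processMessage(Message msg) {
        if (msg instanceof POMessage) {
            return new ResponseMessage("processed " + ((POMessage) msg).orderId);
        } else if (msg instanceof CancelPOMessage) {
            return new ResponseMessage("cancelled " + ((CancelPOMessage) msg).orderId);
        }
        return new ResponseMessage("unknown message");
    }
}
```

The point is that adding a new operation means defining a new message type, not widening the service interface.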

So, this is the style I will be adopting for a revision of DynaSOAr 2.0 - "contract first", as it is termed in the WS-world these days.

And there is a "contract first" issue elsewhere too - my contract runs out on June 15th. Unless it is extended soon, I will not be able to finish what I started. Yes, I can always go back home (to India) and get another job there - but, in that case I won't be able to complete my PhD here - which will mean that I will have wasted a great deal of time...

I am a little tense about this - hoping that something will come up soon.

Tuesday, April 25, 2006

A step closer to the fully dynamic DQP

The guys from EPCC mentioned that they are trying to create a lighter version of OGSA-DAI which would be readily deployable within a container - as a WAR file. This is good news for me. So far, deploying OGSA-DAI has been a rather complicated process - first you had to deploy OGSA-DAI, then add a data service, then create a data service resource and finally add the resource to the service - as a result of which I wasn't able to deploy OGSA-DAI on the fly on a node where the data resides. Once this version of OGSA-DAI is available (soon), I will move one step closer to the concept of "moving code to the data" - because I already have a version of DQP which can deploy my evaluator services on the fly, and I am now adding the code to deploy the analysis services on the fly as well.

The code is mostly ready, but I can't test it because, since yesterday, some machines in the giga-cluster have for some reason been down or unreachable :-(

Old, but still gold

It's about ten years old, but still worth reading - an essay by Neal Stephenson (the author of Snow Crash) - "In the Beginning... was the Command Line".

PS: Whatever Neal writes in the essay, and however well he writes it, I still love my PowerBook G4. Apple has come a long way since those memory problems, when improper memory handling caused their machines to crash... Looks-wise, PowerBooks, iBooks and iMacs are sleek, actually resembling luxury cars; performance-wise, they seem superior to the Windows machines I use. But then, you need to know how to drive before you start driving a Jaguar, don't you?

Wednesday, April 12, 2006

DynaSOAr 2.0 is ready

Finally, I have been able to finish off the DynaSOAr 2.0 prototype.

This new prototype extends the previous one with the concept of a broker, and I have also added the DynaSOAr registries (using GRIMOIRES). In the new architecture, DynaSOAr maintains an internal registry (for its own use) which is updated every time a new service is deployed or added to the repository. For example, when a service is added to the repository for the first time, an entry is made in the registry, with the endpoint of the repository recorded as a CategoryBag for the service. Initially, there are no AccessPoints (service endpoints), because the service is not deployed anywhere. When the service is deployed on some node, the registry is updated and a new AccessPoint is added to the service entry.
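The registry bookkeeping described above can be sketched in plain Java (an illustration of the idea only - the names are mine, and the real implementation goes through GRIMOIRES/UDDI):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of a DynaSOAr registry entry: the repository
// endpoint is recorded when the service code is first added, and an
// AccessPoint is appended each time the service is deployed on a node.
class ServiceEntry {
    final String serviceName;
    final String repositoryEndpoint;  // kept as a CategoryBag in UDDI terms
    final List<String> accessPoints = new ArrayList<String>();  // one per deployment

    ServiceEntry(String serviceName, String repositoryEndpoint) {
        this.serviceName = serviceName;
        this.repositoryEndpoint = repositoryEndpoint;
    }

    boolean isDeployed() {
        // No AccessPoints yet means the service exists only in the repository.
        return !accessPoints.isEmpty();
    }

    void recordDeployment(String nodeEndpoint) {
        accessPoints.add(nodeEndpoint);
    }
}
```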

In the new architecture, the DynaSOAr Service Provider receives a service request (say, one meant for Service A) from a consumer. It simply adds the abstract name of the service and the endpoint of the internal registry to the message header and passes it on to the bound entity, which can be a Broker or a HostProvider. The Broker and the HostProvider have similar interfaces, but they function differently. A HostProvider either manages a cluster of nodes, in which case it is a ROOT, or it is one of the nodes in the cluster itself, in which case it is a LEAF. So, the Service Provider will always have an entity locally bound to it - and this entity can be a Broker connecting to other Brokers or HostProviders, a ROOT HostProvider, or a LEAF HostProvider. Services are always deployed on a LEAF HostProvider. A Broker decides where to forward the request if it is connected to other Brokers or HostProviders. Right now, I have a very simple random scheduler, but there is a group at Newcastle looking into proper scheduling algorithms for this. The HostProvider, once it receives the request, checks whether the service is already deployed within its domain - in which case the request is passed on to that node. If the service hasn't yet been deployed, the service code (currently only WAR files) is fetched from the repository and deployed on a target node...

(Right now, I am not considering hybrid nodes - where a node may be responsible for managing other nodes and also deploy services on itself.)
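The routing described above can be sketched roughly as follows (my own simplified naming; the real Broker and HostProvider exchange SOAP messages, and the scheduler is just the simple random one mentioned):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Rough sketch of DynaSOAr-style request routing (illustrative names).
interface RequestTarget {
    String handle(String serviceName);
}

// A Broker forwards the request to one of its children, picked by the
// very simple random scheduler.
class Broker implements RequestTarget {
    private final List<RequestTarget> children;
    private final Random random = new Random();

    Broker(List<RequestTarget> children) { this.children = children; }

    public String handle(String serviceName) {
        RequestTarget next = children.get(random.nextInt(children.size()));
        return next.handle(serviceName);
    }
}

// A LEAF HostProvider deploys the service on demand, then invokes it.
class LeafHostProvider implements RequestTarget {
    private final Set<String> deployed = new HashSet<String>();

    public String handle(String serviceName) {
        if (!deployed.contains(serviceName)) {
            // In DynaSOAr this step fetches the WAR file from the
            // repository and deploys it on the node.
            deployed.add(serviceName);
        }
        return "invoked " + serviceName;
    }
}
```

A ROOT HostProvider would sit between the two, forwarding to the LEAF nodes it manages in much the same way as the Broker does.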

I have tried some new things in this version - for example, using Castor instead of Axis-generated stubs. I am using the recommended signature for message-oriented services (because these services need to deal with message headers). I created a schema for all the messages (i.e. defined the messages first) and used Castor to generate the Java bindings. Within the services, the messages are created using these Java bindings and then marshalled into an XML document by Castor; in the other direction, an incoming XML message is unmarshalled into Java objects by Castor. I am not totally sure whether this approach has been tried before, or whether it is the proper approach to message-orientation. But it seems to fit the concept of message-oriented services, where you first define your schema, and there is only one method in the service interface, which decides what to do based on the message received...
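The round trip looks roughly like this; note that in the real code Castor generates the bindings from the schema and does the (un)marshalling itself, so the hand-rolled XML handling below is purely illustrative, and the message name is hypothetical:

```java
// Hand-rolled illustration of the marshalling round trip: a message
// type defined up front (as if generated from the schema), marshalled
// to XML on the way out and unmarshalled back on the way in.
class DeployMessage {
    final String serviceName;

    DeployMessage(String serviceName) { this.serviceName = serviceName; }

    // Marshal: Java object -> XML document (Castor does this via bindings).
    String marshal() {
        return "<DeployMessage><serviceName>" + serviceName
             + "</serviceName></DeployMessage>";
    }

    // Unmarshal: XML document -> Java object (again, Castor's job in reality).
    static DeployMessage unmarshal(String xml) {
        int start = xml.indexOf("<serviceName>") + "<serviceName>".length();
        int end = xml.indexOf("</serviceName>");
        return new DeployMessage(xml.substring(start, end));
    }
}
```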

Further, I have been able to use this framework with OGSA-DQP - the evaluator services can now be dynamically deployed on the nodes which contain the data resources. So, this is one step in the direction of achieving what we call "moving the code to the data".

Friday, January 13, 2006

GRIMOIRES doing funny things

As I mentioned in my past two posts, I have been trying to use GRIMOIRES as the registry in my dynamic deployment work. The idea is that when a service is deployed on a node, it is registered with the GRIMOIRES registry - which is the basic concept of UDDI. What we are trying to add is a dynamic deployment feature, where the Service Provider advertises a set of services which may or may not be deployed somewhere. The flow is as follows:

A consumer contacts the Service Provider (SP) and finds the services that are supported by the SP. The consumer then decides which service is to be invoked and sends a request message (SOAP) to the SP. The SP looks up the registry and finds out on which nodes this service is already deployed. If such a node is found, the request is forwarded to that node and, on completion, the response is sent back to the consumer. But if there is no node on which the service is already deployed, the SP sends a message to a suitable host to deploy the service dynamically. In this message, the SP provides the service name/ID and the location where the deployment code can be found. In the DynaSOAr work, we have already developed this infrastructure (except the registry), where the deployable code can be fetched from a code-store (which is again a service). I am now trying to develop a registry, and that is where GRIMOIRES comes in. I could have developed a simple MySQL-based utility to store all the required information in a MySQL backend, but that would defeat the purpose of UDDI.
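The decision flow above condenses to a few lines (a sketch with hypothetical names, not the real DynaSOAr code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the SP's decision: look the service up in the registry,
// forward if already deployed, otherwise trigger a dynamic deployment
// using the code-store location (all names illustrative).
class ServiceProvider {
    // serviceName -> endpoints where the service is currently deployed
    private final Map<String, List<String>> registry = new HashMap<String, List<String>>();
    // serviceName -> code-store URL from which the code can be fetched
    private final Map<String, String> codeStore = new HashMap<String, String>();

    void advertise(String service, String codeStoreUrl) {
        // The SP advertises services that may not be deployed anywhere yet.
        codeStore.put(service, codeStoreUrl);
    }

    String invoke(String service, String candidateHost) {
        List<String> nodes = registry.get(service);
        if (nodes != null && !nodes.isEmpty()) {
            return "forwarded to " + nodes.get(0);
        }
        // Not deployed anywhere: ask a suitable host to deploy it,
        // telling it where the deployment code can be found.
        List<String> deployed = new ArrayList<String>();
        deployed.add(candidateHost);
        registry.put(service, deployed);
        return "deployed on " + candidateHost + " from " + codeStore.get(service);
    }
}
```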

So, I need to add "businessEntities" and "businessServices" to the GRIMOIRES registry. The services would have more than one bindingTemplate, because a service can be deployed on more than one node. Each service should also have a reference to the Code-Store URL, from where the service code can be fetched during hot-deployment. The UDDI specs allow more than one bindingTemplate, and there is a concept of tModels, which can be used for reference purposes - which exactly suits the requirement for the reference to the Code-Store. This is what I tried to do. Adding a businessEntity and a businessService did not prove to be difficult at all. But a problem cropped up when I added the bindingTemplates, more specifically more than one of them. It seems GRIMOIRES creates duplicate entries when this is done via the UDDIBrowser. So, once I add a bindingTemplate to a service, the registry is updated, and when the businessEntity (under which the businessService is created) is expanded, multiple copies of the same entry are displayed (using UDDIBrowser). But, funnily, if a query is sent to the registry's inquiry interface (via the same UDDIBrowser), it returns the correct number of services... I suspect it sends a "select distinct"-like query to the database.

Other than this, I think I am comfortable with the tModel concept for CodeStore. So, each of the services registered with the registry will have multiple bindingTemplates, and one tModel reference. I have created a tModel as follows:
<tModel tModelKey="some tModel key - uuid">
  <description xml:lang="en">some description</description>
</tModel>
And I am using this as a reference within the businessServices - as categoryBag entries. Each deployment then gets its own bindingTemplate, carrying the service endpoint and a reference back to the tModel:
<bindingTemplate>
  <accessPoint URLType="http">serviceURL</accessPoint>
  <tModelInstanceDetails>
    <tModelInstanceInfo tModelKey="some key"/>
  </tModelInstanceDetails>
</bindingTemplate>
<bindingTemplate>
  <accessPoint URLType="http">serviceURL</accessPoint>
  <tModelInstanceDetails>
    <tModelInstanceInfo tModelKey="some key"/>
  </tModelInstanceDetails>
</bindingTemplate>
I guess this should work...

Thursday, January 12, 2006

UDDI Registry And tModels

For the past few days I have been scratching my head over how to describe the services that I will put in the DynaSOAr registry for the dynamic distributed query processing work. I have been exploring GRIMOIRES as a possible option. It provides a GShell interface for interacting with the registry, and alternatively the UDDIBrowser can be used to browse the contents and publish/query entries in the registry. Unfortunately, UDDIBrowser comes with minimal (rather, no) documentation, which led to several problems in publishing new entries, especially tModels. A Google search led me to this article on tModels - I found it quite useful. I now have a clearer idea of how to describe the services to be exposed by the registry - especially identifying the code store repository from where the service code can be fetched in case of dynamic deployment, and also information about the service or the virtual machine image...

I still have problems with GRIMOIRES, which I think are possibly due to a bug in the UDDI server - because similar steps against jUDDI were successful.

Tuesday, December 20, 2005

Thoughts about OGSA-DQP and DynaSOAr

We have just released OGSA-DQP 3.0. The previous two releases received a fair amount of positive feedback from the community, and we hope this new release will meet a similar fate. DQP 3.0 is based on OGSA-DAI WSRF 2.1 and OGSA-DAI WSI 2.1. This release contains major bug-fixes and performance enhancements, and, most importantly, it allows the co-ordinator component to be deployed on Windows. Now I will be able to concentrate on my research - hoping to be able to start writing my thesis sometime next summer.

So, what am I trying to achieve?

In the past few months, we have also produced a Dynamic Service-Oriented Architecture framework called DynaSOAr. It provides an architecture for dynamically deploying web services remotely on a grid or the Internet. When a web service provider receives a request for a service, it checks whether that service is already deployed. If it is, the call is routed straight to the deployed service. If not, the service code is fetched from a repository and the service is deployed on a host provider under the service provider's domain. One potential use for this architecture is moving web services that access a database and perform analysis on the data closer to that database.

I am trying to use this DynaSOAr idea inside OGSA-DQP by allowing dynamic deployment of the query evaluation component of DQP (called the evaluator) on target hosts, dynamic deployment of analysis services, and also of packaged virtual machines containing services or databases.

The evaluators are already compatible with DynaSOAr. I have also been able to create Virtual Machines using VMWare Workstation, start them on Linux hosts from a suspended state using the VMWare Player, and invoke services on them. The challenge is in transporting those VMs on the fly to and from different hosts, for which I am planning to explore several technologies like GridFTP, Peer-to-peer, etc.

Right now, I am fiddling with GRIMOIRES, a UDDI registry released by the University of Southampton. I'm planning to use it within the dynamically deployable version of DQP. This registry will store information about the services, where they are deployed, and where the service code can be found in case a hot-deployment is required...

Data and SOA

I just finished reading an excellent article on Service Orientation. It’s written by Pat Helland (from Microsoft), and titled – "Data on the Outside vs. Data on the Inside". Pat explores the concepts of Service Orientation, and talks about how data should be structured within the service and outside the service boundary – which has always been an interesting topic since the advent of Web Services…