Chatbot, the software world's new celebrity. What is it? As per Wikipedia:

A Chatbot (also known as a Talkbot, Chatterbot, Bot, Chatterbox, Artificial Conversational Entity) is a computer program designed to simulate an intelligent conversation with one or more human users via auditory or textual methods.

The way I see it (in an oversimplified statement), it is a web service powered by Natural Language Processing and driven by Big Data and Machine/Deep Learning.

There are many frameworks available to build a Chatbot, and of all of them I chose Microsoft Bot Framework for obvious reasons. There are a good number of articles on how to build a Chatbot using Microsoft Bot Framework and deploy it on Azure, which I'll not repeat.

Once a Chatbot is built and deployed, a client application is required through which users can communicate with it. Microsoft provides out-of-the-box integration with Skype and allows Chatbot developers to integrate their Chatbots with Facebook Messenger, WeChat and a few others.

There are situations when developers want their custom chat clients to interact with Chatbots, and integration with custom chat clients is made possible by the DirectLine API. Chatbot integration with custom chat clients is a challenge, as there is little or no documentation and no samples available for reference. In this article I will be using Fiddler to demonstrate Chatbot integration with custom chat clients. Let's get started.

Step 1: Enable DirectLine API

Log in to your bot dashboard using your Microsoft Live ID. Once you register your bot, you'll see the supported channels. In the list of channels, add the Direct Line channel.

When you select Add, a new browser tab will open where you should check the "Enable this bot on Direct Line" option. You should also generate a Direct Line secret by clicking the "Generate Direct Line secret" button. Save the generated secret for later use.

Step 2: Use Fiddler to communicate with the Chatbot hosted on Azure

Step 2.1: Initiate communication with Chatbot

Any custom chat client should start its interaction with the Chatbot by posting a message to https://directline.botframework.com/api/conversations.

The header of the POST message should have an Authorization parameter, which takes the Direct Line secret generated in Step 1:

Authorization: BotConnector <Direct Line secret>

The above request returns status code 200 and a conversation id in the response body.

Note the conversationId; this token/conversationId is valid only for 30 minutes.
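
For reference, here is a minimal C# sketch of this call using HttpClient. It follows exactly the endpoint and BotConnector authorization scheme described above; the secret value is a placeholder.

    using System;
    using System.Net.Http;

    class DirectLineStart
    {
        static void Main()
        {
            var client = new HttpClient();
            // Direct Line secret generated in Step 1 (placeholder)
            client.DefaultRequestHeaders.Add("Authorization", "BotConnector <Direct Line secret>");

            // Step 2.1: start a new conversation
            var response = client.PostAsync(
                "https://directline.botframework.com/api/conversations", null).Result;

            // The response body contains the conversationId (valid for 30 minutes)
            Console.WriteLine(response.Content.ReadAsStringAsync().Result);
        }
    }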

Step 2.2: POST message to Chatbot

Using the conversationId, we can POST any number of messages to the Chatbot. All messages to the Chatbot are posted to a new URL, https://directline.botframework.com/api/conversations/{conversationId}/messages. In this example the POST request will be sent to

https://directline.botframework.com/api/conversations/Jk21ZkYRQ6a/messages

The request body will be in the format below:
{
  "text": "<Your Message>"
}

The above request doesn't have a response body, and the return code will be 204.

Content-Type must be sent in the header. Without Content-Type in the header, the chat client cannot post messages to the Chatbot.
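
A minimal C# sketch of this POST (the conversation id and message text are placeholders; note the mandatory Content-Type header):

    using System;
    using System.Net.Http;
    using System.Text;

    class DirectLinePost
    {
        static void Main()
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Authorization", "BotConnector <Direct Line secret>");

            // Step 2.2: post a message to the conversation
            var body = new StringContent("{ \"text\": \"<Your Message>\" }", Encoding.UTF8, "application/json");
            var response = client.PostAsync(
                "https://directline.botframework.com/api/conversations/Jk21ZkYRQ6a/messages", body).Result;

            // Expect 204 No Content on success
            Console.WriteLine((int)response.StatusCode);
        }
    }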

Step 2.3: Retrieve Chatbot’s response for the message sent

The above request doesn't return the response from the bot. To view responses from the bot, we need to perform a GET request on the same URI to which we posted in Step 2.2.

GET https://directline.botframework.com/api/conversations/Jk21ZkYRQ6a/messages

The response from the bot is present in the "text" field.
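
And a matching sketch for the GET call (same placeholders as above):

    using System;
    using System.Net.Http;

    class DirectLineGet
    {
        static void Main()
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Authorization", "BotConnector <Direct Line secret>");

            // Step 2.3: read all messages in the conversation; bot replies appear in the "text" fields
            var json = client.GetStringAsync(
                "https://directline.botframework.com/api/conversations/Jk21ZkYRQ6a/messages").Result;

            Console.WriteLine(json);
        }
    }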

When you ask two architects with similar experience to define a system, they will approach and define it very differently. What distinguishes the two resulting architectures, and why are they different? I believe software architecture should not be considered merely as a set of models or structures, but should include the decisions that lead to those particular structures, and the rationale behind them.

What is decision making, and how do we make decisions? Decision making is a cognitive process that involves selecting among several alternative possibilities. Each of us makes decisions based on our attitude, which we develop over years based on our interpretation of the world around us. This attitude dictates what we choose from the alternative possibilities. An architect who likes to have control over things might take a monolithic design approach, while another architect with a more objective attitude might end up with a distributed or microservices-driven architecture.

Like medicine, software architecture is not a science but a practice. An individual's attitude defines how he/she approaches and architects the solution to a problem.

To be a successful architect, it is important to use experience to tune that attitude and apply it to the problem at hand.

Thanks for taking the time to read through my last post, Parallelism on Multi-core Processors. Many asked me how to handle calls to resources which are not thread-safe and have to be locked, either by the caller or by the library that exposes the resource. Let me pick the same example that I used in the previous post, simplified in the interest of time.

for (int i = 0; i < 10000; ++i)
{
    Console.WriteLine("Test Parallel");
}

How much time does this for loop take to complete? OK, how much time would a parallel version of it take to run? Guesses?

Parallel.For(0, 10000, i =>
{
    Console.WriteLine("Test Parallel");
});

On my laptop, the simple for loop took ~4 seconds and the parallel version took ~6 seconds.
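
A minimal sketch of one way to reproduce this comparison yourself, using Stopwatch (the numbers will of course vary by machine):

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    class LoopTiming
    {
        static void Main()
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 10000; ++i)
            {
                Console.WriteLine("Test Parallel");
            }
            sw.Stop();
            long sequentialMs = sw.ElapsedMilliseconds;

            sw.Restart();
            Parallel.For(0, 10000, i =>
            {
                Console.WriteLine("Test Parallel");
            });
            sw.Stop();
            long parallelMs = sw.ElapsedMilliseconds;

            Console.WriteLine("for: {0} ms, Parallel.For: {1} ms", sequentialMs, parallelMs);
        }
    }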

If you thought the parallel loop would take less time, then you are wrong. Why does the parallel version take more time? I'll try to keep it simple and do my best not to confuse you.

Let us assume a scenario where there are 2 threads (T1 and T2) and a resource which can be accessed by only one thread at a time, such as the console, a file handle, etc. In this scenario, assume that thread T1 started first and acquired the lock on the console, and then thread T2 started and tries to acquire the lock on the console. Since the lock is already held by thread T1, thread T2 will wait for X nanoseconds before attempting to acquire the lock again. If the lock is still not available, thread T2 will wait another X nanoseconds before attempting again, and this continues until thread T2 acquires the lock or until it times out (if applicable), whichever is earlier. There is a fixed amount of time that a thread has to wait before attempting to acquire a lock again; the thread is not notified when the lock becomes available so that it can come out of its wait/sleep and take the lock immediately. This is the overhead of contention, and it cannot be avoided. This is exactly what is happening in the Parallel.For loop, and it leads to more execution time than the simple for loop.
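
To make the wait-and-retry behaviour concrete, here is a simplified, hypothetical sketch of two threads competing for one lock; it mirrors the simplified model above rather than the CLR's actual lock implementation:

    using System;
    using System.Threading;

    class ContentionSketch
    {
        static readonly object consoleLock = new object();

        static void Worker(string name)
        {
            // Keep trying to acquire the lock; back off briefly between attempts (simplified model).
            while (!Monitor.TryEnter(consoleLock))
            {
                Thread.Sleep(1); // fixed wait before retrying
            }
            try
            {
                Console.WriteLine("{0} acquired the lock", name);
            }
            finally
            {
                Monitor.Exit(consoleLock);
            }
        }

        static void Main()
        {
            var t1 = new Thread(() => Worker("T1"));
            var t2 = new Thread(() => Worker("T2"));
            t1.Start();
            t2.Start();
            t1.Join();
            t2.Join();
        }
    }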

The above explanation is an overly simplified view of contention management in .NET; in reality it is much more complex and sophisticated. For the purpose of this topic, the above explanation serves well enough.

To conclude, whenever there is contention for a resource, or a resource can be accessed by only one thread at a time, it is advisable to take the sequential route rather than the parallel route for performance reasons.

Let me know what you guys think…

Parallelism is a dark art; no one gets it right the first time, and often not for many attempts after that either. This is one area of software that cannot be understood without understanding the underlying hardware (the processor). To begin with, let me explain the evolution of hardware and then dive into the software aspects.

Until the early 2000s, processors were single core and capable of executing only one instruction at a time. The focus of Intel and AMD was on adding more cycles to the processor so that more instructions could be executed per second, thereby reducing the overall time a program takes to execute. After a certain point, heat dissipation issues didn't allow adding any more cycles to the processor.

Then came multi-core processors, starting with 2 cores; now we have 8 cores in consumer devices. But how many cores can we add? Each core produces a certain amount of heat in a given time, which needs to be dissipated at an equal rate so that the processor doesn't melt down. With every additional core, there is a need for more infrastructure to dissipate the heat generated by the processor, which becomes unmanageable from both size and cost perspectives. This is when the hardware geeks say we are done: we can't improve the processor to give more cycles because they are constrained by the laws of physics.

Now it is up to the software pros to show some skill. We'll come to the skill part in a minute, but before that, do software programmers understand the underlying hardware? Do they write hardware-optimized code; that is, do they write programs that use all the cores that are given to them today? Good question; let me try out an example.

namespace CompareTPL
{
    class Program
    {
        static int addPara(int num)
        {
            int j = 0;
            for (int i = 0; i < 10000000; ++i)
            {
                 j = i * i;
            }

            return j;
        }

        static void Main(string[] args)
        {
            for (int i = 0; i < 1000; ++i)
            {
                addPara(5);
            }
        }
    }
}

A simple for loop, which madly iterates over a large number and computes the product of two numbers. When I run this program on an Intel i7 quad-core machine with 16 GB RAM, what do I see in Perfmon?

The histogram bars tell me that the above program is not utilizing all cores efficiently and that the load is unevenly distributed. This means the above program will take longer to execute.

Now I'll modify the program to use multiple threads to achieve parallelism by replacing for with Parallel.For.

using System.Threading.Tasks;

namespace CompareTPL
{
    class Program
    {
        static int addPara(int num)
        {
            int j = 0;
            for (int i = 0; i < 10000000; ++i)
            {
                j = i * i;
            }

            return j;
        }

        static void Main(string[] args)
        {

            Parallel.For(0, 1000, i =>
            {
                addPara(5);
            });
        }
    }
}

Now, what does Perfmon show?

The load is equally distributed among the 4 cores, and all cores are utilized to their maximum. This means the program will run faster. On my machine (Intel i7, 4 cores, 16 GB RAM) it took 2.487 seconds to complete the parallel version and 5.320 seconds to complete the simple for loop version. This tells me that using all cores effectively will execute the program in less time. I've added the .NET CLR LocksAndThreads counters to detect any contentions (in yellow) during execution of this program. Even though there are multiple threads executing the same set of instructions, there are no contentions reported.

Let's make this more interesting by replacing i*i with a Console statement.

using System;
using System.Diagnostics;
using System.Threading.Tasks;

namespace CompareTPL
{
    class Program
    {
        static int addPara(int num)
        {
            int j = 0;
            for (int i = 0; i < 10000000; ++i)
            {
                Console.WriteLine("Test Parallel");
            }

            return j;
        }

        static void Main(string[] args)
        {
            Stopwatch s = new Stopwatch();
            s.Start();
            Parallel.For(0, 1000, i =>
            {
                addPara(5);
            });
            s.Stop();
            Console.WriteLine(s.ElapsedMilliseconds);
        }
    }
}

What do I see in Perfmon now? Utilization of the cores is 100%, and the load is evenly distributed among the 4 cores.

The total number of contentions (Total # of Contentions) has gone up significantly, and the contention rate per second has also gone up. Which resource is causing the contention? The console: yes, the console is the resource under contention. Only one thread can write to the console at any moment. To write to the console, a thread has to request the lock on the console and wait until it gets the lock. There are multiple threads waiting for the same lock/resource, and this has a profound impact on performance and execution time.

This means that the threads created by Parallel.For are waiting on the console in order to write to it. The same program (with Console) takes more than 30 minutes to complete, where it earlier took 2.487 seconds (with no Console, just the multiplication i*i). Locks and contentions are very costly and can kill an application by choking its performance.

Another kind of dependency that slows down an application is a data dependency between instructions, for example:

int op1 = a + b;
int op2 = op1 + c;

In the above code fragment, the second expression (op1 + c) cannot be executed until the first expression (op1 = a + b) has been executed, because the second expression needs the result of the first expression (op1) as input. This instruction-level dependency constrains the compiler and forces it to generate sequential instructions, which cannot be executed in parallel on the available cores.
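
As an illustrative contrast (a hypothetical fragment, not taken from the programs above), expressions that do not depend on each other leave the compiler and processor free to overlap their execution:

    static int Combine(int a, int b, int c, int d)
    {
        // Dependent: op2 must wait for op1 (as in the fragment above)
        int op1 = a + b;
        int op2 = op1 + c;

        // Independent: sum1 and sum2 can be evaluated in any order, or in parallel
        int sum1 = a + b;
        int sum2 = c + d;

        // Only this final step depends on both intermediate results
        return op2 + sum1 + sum2;
    }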

These are challenges that a hardware engineer cannot address; only a software engineer can. No matter how many more cycles we add to the processor, we cannot improve the program's performance, because there is a flaw in the program that is stopping it from being executed in parallel on the available cores.

From the above examples, it is evident that we have enough processing power today to get things done decently fast, and that we are not writing programs that make full use of the underlying processors. Hence I support the hardware folks not adding any more cycles to the processor, because they have already added more cycles than a fairly complex program needs.

The point I was trying to explain is that no matter how much parallelism we introduce, if there is contention for a resource we will not reap the benefits. To conclude, I'll summarize the things that one needs to remember when introducing parallelism to gain performance:

  • Today's processors and operating systems are smart enough to distribute load evenly among all available cores.

  • The programmer has to make sure that all threads or tasks that he/she introduces are fairly independent so that there are no contentions. This allows the compiler to break work into independent instructions which the operating system can execute in parallel on different cores.

  • Deploying an under-performing application on a multi-core platform will not improve performance beyond a certain limit. The application has to be refactored to reap the maximum benefit of the multi-core platform on which it is deployed.

  • Look at contentions and processor utilization in Perfmon to evaluate your program's performance. If you notice contentions or underutilization of the available processor cores, then focus on refactoring the code.

In the next blog I'll try to explain the differences between parallel and asynchronous programming, and also when to use which paradigm.

Azure Scheduler today doesn't allow submitting long-running jobs to Service Bus queues/topics, but it does allow submitting them to storage queues. What if there is a requirement to submit jobs to Service Bus queues/topics so that features like duplicate detection can be leveraged to prevent duplicate jobs from being executed? One can submit messages to a Service Bus queue from the Scheduler dashboard using REST APIs. Here is how we do it.

Job Action

Select the POST method in the Scheduler Job dashboard, which displays the above template. The URI represents the queue or topic to which we want to send the message. The URI template for Service Bus is http{s}://{serviceNamespace}.servicebus.windows.net/{queuePath|topicPath}/messages

We can add a timeout for the POST request by appending it to the end of the URI. The POST URI with a timeout will be http{s}://{serviceNamespace}.servicebus.windows.net/{queuePath|topicPath}/messages?timeout=

Example: http{s}://{serviceNamespace}.servicebus.windows.net/{queuePath|topicPath}/messages?timeout=60

BrokerProperties allows one to set properties like TimeToLive, MessageId, Label, etc. These properties have to be added in JSON format, and this header is mandatory.

The Authorization property is used to supply the access token.

The Content-Type value can be anything, but it is mandatory to add this parameter to the header.
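
The access token here is a Service Bus Shared Access Signature (SAS). Below is a minimal sketch of generating one from a shared access key; the resource URI, key name and key value are placeholders taken from the Service Bus portal:

    using System;
    using System.Globalization;
    using System.Net;
    using System.Security.Cryptography;
    using System.Text;

    static class SasTokenHelper
    {
        public static string CreateSasToken(string resourceUri, string keyName, string key)
        {
            // Token lifetime of one hour (arbitrary choice for this sketch)
            var expiry = Convert.ToString(
                (int)DateTime.UtcNow.AddHours(1).Subtract(new DateTime(1970, 1, 1)).TotalSeconds,
                CultureInfo.InvariantCulture);

            // String to sign: URL-encoded resource URI + newline + expiry (standard Service Bus SAS format)
            string stringToSign = WebUtility.UrlEncode(resourceUri) + "\n" + expiry;
            using (var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(key)))
            {
                var signature = Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));
                return string.Format(CultureInfo.InvariantCulture,
                    "SharedAccessSignature sr={0}&sig={1}&se={2}&skn={3}",
                    WebUtility.UrlEncode(resourceUri), WebUtility.UrlEncode(signature), expiry, keyName);
            }
        }
    }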

When the Scheduler executes the job, a message is posted to the Service Bus queue or topic. In the Service Bus Explorer screenshot below, we can see the properties that we set while creating the job.

You can do this from code as well; below is a sample.

    using System;
    using Microsoft.WindowsAzure.Scheduler;
    using Microsoft.Azure;
    using Microsoft.WindowsAzure.Scheduler.Models;
    using Microsoft.WindowsAzure.Management.Scheduler;

    namespace SchedulerFault
    {
        public class Scheduler
        {
            private SchedulerClient SchdClient;
            SubscriptionCloudCredentials credentials;
            public Scheduler(SubscriptionCloudCredentials creds)
            {
                credentials = creds;
                SchdClient = new SchedulerClient("<Scheduler Service>", "Job Collection Name", credentials);                
            }

            public void inject()
            {
                var Request = new JobHttpRequest(new Uri(@"<Topic/Queue URI>?timeout=60"), "POST");
                Request.Body = "Test Newton Properties";
                Request.Headers.Add("BrokerProperties", "{\"Label\":\"M2\", \"MessageId\":\"Id1\"}");
                Request.Headers.Add("Content-Type","application/xml");
                Request.Headers.Add("Authorization", "<Access Token>");

                var StartTime = DateTime.Now;
                var Recurrence = new JobRecurrence()
                {
                    Frequency = JobRecurrenceFrequency.Minute,
                    Interval = 1,
                    Count = 5
                };
                var Action = new JobAction();
                Action.Type = JobActionType.Http;
                Action.Request = Request;
                var jobParam = new JobCreateParameters();
                jobParam.Action = Action;
                jobParam.Recurrence = Recurrence;
                jobParam.StartTime = StartTime;

                try
                {
                    var Response = SchdClient.Jobs.Create(jobParam);
                    Console.WriteLine(Response.RequestId);
                    Console.WriteLine(Response.StatusCode);
                    Console.WriteLine(Response.Job.Id);
                }
                catch(Exception ex)
                {
                    Console.WriteLine(ex.Message);            
                }            
            }
        }
    }

We need to install the Microsoft WindowsAzure Management Scheduler NuGet package, which has the necessary SDK for managing the Scheduler and the jobs in it.

The Scheduler service name (the first parameter of the SchedulerClient constructor) can be obtained from the Scheduler dashboard, or it follows the format CS-RegionName-scheduler. If the scheduler is created in Southeast Asia, then the service name will be CS-SoutheastAsia-scheduler. If it is created in South Central US, then the service name will be CS-SouthCentralUS-scheduler.

The Job Collection name (the second parameter of the SchedulerClient constructor) is the container in which the job will be created.

Simple, huh? If anyone tells you that Azure Scheduler cannot submit messages to a Service Bus topic or queue, you know what to say and where to redirect them.

Queues/topics have been around for a long time and everyone thinks they know everything about them. I'll park that discussion for a later date and discuss partitioning, which was introduced as an add-on capability to these services.

In a normal Service Bus queue/topic there is only one message store, backed by SQL, to store messages. If this message store is not reachable for any reason, then the entire queue/topic is unavailable. The throughput of the queue/topic is limited by this single message store, which more often than not becomes a bottleneck.

In a partitioned queue/topic there are multiple message stores (also called partitions), and read and write requests are served by different partitions, which reduces contention. If one client is writing a message to the queue and another client is reading from the queue, these two operations can happen on two different partitions. This isolation boosts performance and improves overall throughput.

Partitioned queues/topics can be created by selecting a checkbox in the Create/Configure Queue or Topic dialog. By default partitioning is enabled, and the screenshots below show how to enable/disable partitioning.
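
The same can be done from code with the Service Bus SDK (Microsoft.ServiceBus NuGet package); a minimal sketch, with the connection string and queue name as placeholders:

    using Microsoft.ServiceBus;
    using Microsoft.ServiceBus.Messaging;

    class CreatePartitionedQueue
    {
        static void Main()
        {
            var namespaceManager = NamespaceManager.CreateFromConnectionString("<Service Bus connection string>");

            var description = new QueueDescription("<queue name>")
            {
                EnablePartitioning = true,   // creates 16 message stores behind the scenes
                MaxSizeInMegabytes = 1024    // 1 GB per partition, 16 GB in total
            };

            if (!namespaceManager.QueueExists(description.Path))
            {
                namespaceManager.CreateQueue(description);
            }
        }
    }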

Create Queue

Configure Queue

Queue summary

In the above screenshots, you can see that I specified the queue size (Max Size) as 1 GB, but a 16 GB queue was created. What happened? When partitioning is enabled, Azure creates 16 partitions, or message stores, at the backend, and each partition is of the size specified in the Create/Configure Queue dialog. In the above screenshot, I requested a 1 GB queue and enabled partitioning, so Azure created 16 partitions, each of 1 GB. If I had specified 5 GB as the queue size, I would get an 80 GB queue (16 partitions, each of 5 GB).

Additional space is fine, but how does it impact my reads and writes? As shown in the diagram below (for simplicity only 3 partitions are shown), 9 messages (Msg 1 through Msg 9) are sent to the queue, and the queue internally uses a round-robin algorithm to distribute messages into partitions. If there is one receiver reading messages from the queue, messages are picked from the partitions at random. If we abandon a message after reading it, the message goes back to the partition from which it was picked and doesn't break the sequence in that partition. For example, if we pick Msg 5 from partition 2 and then abandon it, it will still be the second message in partition 2. But when the receiver makes the next receive call, the queue can return a message from any partition (it could also be Msg 5 again). This can break sequential processing, because we are expecting Msg 5 (which would always be the case if the queue didn't have partitions), and with partitions there is no guarantee that Msg 5 will be returned.
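
A minimal sketch of the receive-and-abandon behaviour described above (connection string and queue name are placeholders; the default PeekLock receive mode is assumed):

    using System;
    using Microsoft.ServiceBus.Messaging;

    class ReceiveAbandonSketch
    {
        static void Main()
        {
            var client = QueueClient.CreateFromConnectionString("<Service Bus connection string>", "<queue name>");

            // Receive a message; with a partitioned queue it may come from any partition
            BrokeredMessage message = client.Receive();
            if (message != null)
            {
                Console.WriteLine(message.MessageId);

                // Abandon it: the message goes back to the partition it came from,
                // but the next Receive() may return a message from a different partition.
                message.Abandon();
            }
        }
    }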

Message selection

If a partition is unavailable when a client is writing a message to it, the queue sends the message to the next available partition. While reading, a message from an available partition is returned, so there is no wait time as there is with a single-partition queue, which improves performance. The client application is abstracted from this entire process of partition selection. To enable high availability for queues/topics during local failures, it is advised to turn on partitioning at creation time. If sequential processing is a hard requirement, then don't turn on partitioning.

Event Hub, Microsoft's PaaS offering for accepting millions of events per second, is built to handle traffic for Internet of Things scenarios. Event Hub is an offshoot of partitioned queues/topics and leverages the same partitioning strategy. I know you would ask: why does Event Hub scale when queues/topics don't? To achieve that scale, certain features like session-based messaging and TTL were let go in Event Hub, which makes it lightweight and able to achieve the scale.

In this post I'll not be writing about Event Hubs functionality, which is available on MSDN; instead I'll be sharing subtle aspects which are important to know while using Event Hubs.

Event Hub speed is measured in Throughput Units (TPU), which define the maximum event rate (ingress and egress): 1 throughput unit equals 1 MB/sec ingress and 2 MB/sec egress, 2 throughput units equal 2 MB/sec ingress and 4 MB/sec egress, and so on.

Event Hubs are designed for downstream parallelism, which is why the egress rate is double that of ingress. To understand this better, let's do some math.

Ingress rate per partition => (TPU / No of Partitions)

Egress rate per partition => ((TPU * 2) / No of Partitions); egress is twice the ingress per TPU

TPU   Partitions   Ingress                      Egress
1     8            0.125 MB/Sec (=> 1/8)        0.25 MB/Sec (=> 2/8)
2     8            0.250 MB/Sec (=> 2/8)        0.50 MB/Sec (=> 4/8)
1     32           0.03125 MB/Sec (=> 1/32)     0.0625 MB/Sec (=> 2/32)
2     32           0.0625 MB/Sec (=> 2/32)      0.125 MB/Sec (=> 4/32)

TPU - No of Throughput Units

Partitions - Eventhub partitions

Ingress - Ingress rate per partition (MB/Sec)

Egress - Egress rate per partition (MB/Sec)

For example, if the user selects 1 TPU and 8 partitions and all partitions see even load, each partition gets approximately 0.125 MB/sec ingress throughput, for a total aggregate throughput of 1 MB/sec (8 x 0.125). If the user selects 2 TPUs and 8 partitions, each partition gets approximately 0.25 MB/sec, for a total of 2 MB/sec (8 x 0.25).

However, there is a cap on the maximum ingress and egress that a partition can deliver: 1 MB/sec ingress and 2 MB/sec egress. For instance, suppose I choose 2 TPUs and 8 partitions for an Event Hub, and one partition is receiving all of the traffic while the other partitions sit idle. When this partition receives more than 1 MB/sec it starts throttling, regardless of the fact that the maximum traffic an Event Hub with 2 TPUs can take is 2 MB/sec.

To summarize, irrespective of the load patterns of individual partitions and the maximum throughput defined by the TPUs selected, a partition cannot take more than 1 MB/sec ingress.

Let's say the user selects 8 partitions and 10 TPUs for an Event Hub; the total ingress it can support is 10 MB/sec, which means that with even load distribution each partition will receive 1.25 MB/sec. How?

=> Each TPU guarantees 1MB/Sec Ingress

=> 10 TPUs will deliver 10 * 1 MB/Sec, which is 10 MB/Sec

=> Total number of partitions to cater traffic is 8

=> Even distribution means 10 MB per Sec/ 8 Partitions, which is 1.25 MB/Sec per Partition

But there is a cap on the maximum throughput of a partition, which is 1 MB/sec. What does this imply? It means that partitions will throttle if the Event Hub is operating at its full capacity. This brings us to the most important decision point: the number of partitions and TPUs have to be selected together, after evaluating the load patterns. Event Hub allows changing TPUs at any time, but not partitions. Once allocated, partitions cannot be increased or decreased; to alter their count the Event Hub has to be recreated.

Too many parameters to consider, aren't there? Let me make it simple with an example. I have a scenario where the peak traffic volume is 2,000 transactions/second and each transaction (message) is 15 KB in size. The net volume that the Event Hub will have to deal with is

=> 2,000 * 15 KB => 30,000 KB/Sec => 30 MB/Sec (for simplicity 1 MB = 1000 KB)

I choose to have 8 partitions and in the case of even load distribution each partition will receive

=> 30 MB per Sec / 8 Partitions => 3.75 MB/Sec per Partition, which is way more than the maximum ingress rate of a partition (1 MB/sec). An 8-partition Event Hub will throttle for the above scenario.

Now I configure 32 partitions and in the case of even load distribution each partition will receive

=> 30 MB per Sec / 32 Partitions => 0.9375 MB/Sec per Partition, which is less than the maximum ingress rate of a partition (1 MB/sec). 32 partitions will cater to my requirement (2,000 transactions per second, each transaction 15 KB in size) without throttling.
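
The same arithmetic can be captured in a few lines of code; a small sketch that mirrors the example above:

    using System;

    class EventHubSizing
    {
        static void Main()
        {
            const double maxIngressPerPartitionMbPerSec = 1.0; // hard cap per partition
            double transactionsPerSecond = 2000;
            double messageSizeKb = 15;

            // For simplicity, 1 MB = 1000 KB (as above)
            double requiredIngressMbPerSec = transactionsPerSecond * messageSizeKb / 1000; // 30 MB/sec

            foreach (int partitions in new[] { 8, 32 })
            {
                double perPartition = requiredIngressMbPerSec / partitions;
                Console.WriteLine("{0} partitions => {1:0.####} MB/sec per partition ({2})",
                    partitions, perPartition,
                    perPartition > maxIngressPerPartitionMbPerSec ? "will throttle" : "OK");
            }
        }
    }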

The Azure portal allows the user to create a maximum of 32 partitions and provision up to 20 TPUs. If a user requires more throughput units (and, by implication, partitions) he/she should contact Azure support.

A physician, a civil engineer, and a computer scientist were arguing about what was the oldest profession in the world. The physician remarked, “Well, in the Bible, it says that God created Eve from a rib taken out of Adam. This clearly required surgery, and so I can rightly claim that mine is the oldest profession in the world.” The civil engineer interrupted, and said, “But even earlier in the book of Genesis, it states that God created the order of the heavens and the earth from out of the chaos. This was the first and certainly the most spectacular application of civil engineering. Therefore, fair doctor, you are wrong: mine is the oldest profession in the world.” The computer scientist leaned back in her chair, smiled, and then said confidently, “Ah, but who do you think created the chaos?”

One prime (though not the only) reason for the chaos is the rigidity of software architecture and design. By rigidity I mean how easy or hard it is to replace an existing algorithm, feature, or component with a new and better one. If a component can be replaced with a better one, such as replacing one caching strategy with another, or an RDBMS with NoSQL, without having to refactor the entire code base, then the design/architecture can be called flexible or stable. But how often are changes like these contained to a class or two?

In the present world, where there are millions of software developers connected through the internet and in the virtual company of Uncle Bob, Martin Fowler, Mark Seemann and a long list of craftsmen, we still see software projects with rigid architectures and bad code. Why does this happen?

In the Ramayana, Sita asks Ravana:

इह सन्तो न वा सन्ति सतो वा नानुवर्तसे |

तथाहि विपरीता ते बुद्धिराचारवर्जिता ||

This roughly translates to: "Is there no learned person in Lanka who can preach righteous behaviour to you, or is it that he/she preaches and you don't listen? Where is the problem?"

This dialogue between Sita and Ravana suits software professionals more than anyone else. There are myriad articles, books and videos on OOAD, SOLID principles, and patterns and practices by renowned software professionals that will help in building optimal designs and architectures. Yet good designs and code are rare sightings. One reason for the current status quo is that engineers and architects spend a good amount of time debating current and upcoming features and creating architectures and designs for present and future needs. The outcome of these discussions is bringing things together which should never have been together, like an employee and his/her reporting functionality. This coupling leads to rigidity, and the software becomes unmanageable. "Change is inevitable" is the undeniable truth that history, software craftsmen and all the resources on the internet have been repeating, and that engineers have been ignoring.

To avoid these familiar situations in the future, every software professional should ask himself/herself the following set of questions before proposing and finalizing an architecture and design:

  • How will my architecture/design handle change?

  • How many components/modules will be affected if I have to replace a component with another? For instance, if we are changing the message tracking functionality, will alerting be impacted? How will I manage changes to tracking without affecting alerting?

  • Am I designing for the exact customer requirements, or am I over-designing on the assumption that the customer will have additional requirements in the future?

  • How will I unit-test my code?

  • If I have to make changes to my code, do I have unit tests to check whether I broke any existing functionality?

  • Am I taking any direct dependency on external components, like using stored procedures in my class functions? (See the sketch after this list.)
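
As a concrete illustration of the last two questions (all names below are hypothetical), depending on an abstraction instead of calling a stored procedure directly keeps the data access replaceable and the class unit-testable:

    using System.Collections.Generic;

    // Abstraction: the business class depends on this interface, not on SQL or stored procedures
    public interface IEmployeeRepository
    {
        IEnumerable<string> GetReportees(string employeeId);
    }

    // One replaceable implementation (could be stored procedures, an ORM, or a NoSQL store)
    public class SqlEmployeeRepository : IEmployeeRepository
    {
        public IEnumerable<string> GetReportees(string employeeId)
        {
            // Database call goes here; the details stay behind the interface
            return new List<string>();
        }
    }

    public class ReportingService
    {
        private readonly IEmployeeRepository repository;

        // The dependency is injected, so tests can pass a fake repository
        public ReportingService(IEmployeeRepository repository)
        {
            this.repository = repository;
        }

        public int CountReportees(string employeeId)
        {
            int count = 0;
            foreach (var reportee in repository.GetReportees(employeeId)) count++;
            return count;
        }
    }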

As long as we are designing for the present and not the future, and ensuring that coupling is minimal, chaos can be managed and reduced. Take care of the present, and the future will take care of itself. This discussion is more relevant in the age of cloud computing than ever before, because the proliferation of new SaaS and PaaS services with competitive pricing models, and the need to stay competitive, have put us under tremendous pressure. We have to choose the right services and replace existing services without major downtime or maintenance. If we are tightly coupled to existing services and their interfaces, then any change in those interfaces or strategy can potentially put one out of business.

To conclude, it is no longer about right and perfect architectures and designs; it is all about optimal architectures and designs which can meet customer needs at that point in time and can co-evolve with the customer's business strategy. This is because software is no longer a tactical investment; it is a strategic advantage for organizations, and strategies are influenced by many external factors which are beyond the developers' span of control. All developers can do is keep their designs open for change.