I was recently working with a client to troubleshoot several problem areas in their Oracle IPM system. This client is part of a large organization with hundreds of users who could be logged into the system at any given time. To handle this much usage, the client was set up with several Info and Process Brokers. One of the symptoms this client was seeing, primarily under heavy usage, was uneven load balancing.
One of the Info Brokers, IB1, would process most if not all of the search requests. This would continue until IB1 overloaded and crashed. Restarting the IPM service on IB1 would restore usage, but when IB1 went down it would pass off a great deal of its unprocessed workload, stressing the second Info Broker, IB2.
So failover was partially working in this environment, but even after the first Info Broker was back up and running, the majority of the workload was now being carried by the second Info Broker.
Load balancing in Oracle IPM is essentially round robin, but it uses client-side address caching to hold onto a server (in this case an Info Broker, which runs the SQL queries) for a predetermined time. The theory is that a user making several queries won’t have to request an Info Broker from the Request Broker for each search. When a user runs a search, that user is assigned an Info Broker, and the address timeout keeps the user connected to the same Info Broker for X amount of time, however many searches are run. In this client’s case, once a user was assigned to an Info Broker, that user stayed with it until logging out.
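To make the interaction between round robin and the address cache concrete, here is a minimal sketch in Python. It is not Oracle IPM code: the class names, the two-broker setup, and the choice to measure the timeout from the moment of assignment are all assumptions made for illustration.

```python
from collections import Counter
from itertools import cycle


class RequestBroker:
    """Hypothetical stand-in: hands out Info Broker names in round-robin order."""

    def __init__(self, brokers):
        self._next = cycle(brokers)

    def assign(self):
        return next(self._next)


class Client:
    """Hypothetical client that caches its assigned broker until the timeout expires."""

    def __init__(self, request_broker, cache_timeout_ms):
        self.request_broker = request_broker
        self.cache_timeout_ms = cache_timeout_ms
        self.cached_broker = None
        self.cached_at_ms = None

    def search(self, now_ms):
        # Assumption: the timeout runs from the time of assignment, not last use.
        expired = (
            self.cached_broker is None
            or now_ms - self.cached_at_ms >= self.cache_timeout_ms
        )
        if expired:
            self.cached_broker = self.request_broker.assign()
            self.cached_at_ms = now_ms
        return self.cached_broker


def simulate(cache_timeout_ms, search_times_ms):
    """Count how many searches from one user land on each broker."""
    client = Client(RequestBroker(["IB1", "IB2"]), cache_timeout_ms)
    return Counter(client.search(t) for t in search_times_ms)


# One user running a search every 30 seconds, ten searches in total.
times = [i * 30_000 for i in range(10)]
print(simulate(300_000, times))  # all ten searches stay on IB1
print(simulate(20_000, times))   # searches alternate: five on IB1, five on IB2
```

In this toy model a five-minute timeout pins every search from the user to IB1, while a 20-second timeout sends each search back through the round robin. A real system has many users and other factors, so treat this only as an illustration of the caching effect.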
The setting that controls this behavior is the Address Cache Timeout. It is located under the Advanced tab of the Oracle I/PM settings in the Services Configuration utility, and the factory default is 300,000 ms (5 minutes). This seems a reasonable value on lower-volume systems, or on systems without multiple Info and Process Brokers, since the Request Broker won’t have to direct every search request to an Info/Process Broker. The software manufacturer even states that “Settings larger than 30 seconds cause unusual behavior between computers.”
In systems with a single Info and Process Broker, this setting will never be an issue. We have also seen systems with load-balanced Info Brokers that don’t suffer from the issue this client was experiencing, where users seemed to be locked onto a single Info Broker. With this client, we ended up reducing the Address Cache Timeout.
We moved in a series of steps, reducing the timeout by about half each time, and saw load balancing improve after each change. Currently the client is at around 20 seconds. Failover now behaves as we would expect, and users no longer get locked onto a single broker until that broker shows signs of stress and goes down.
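For a rough sense of how many steps that involves, the sketch below halves the timeout from the factory default until it reaches roughly 20 seconds. The exact values are an assumption: the actual steps were only "about half" each time, and the function name is made up for this example.

```python
def halving_schedule(start_ms, target_ms):
    """Return successive halved timeouts, stopping at the first value at or below target_ms."""
    steps = [start_ms]
    while steps[-1] > target_ms:
        steps.append(steps[-1] // 2)
    return steps


# Assumed starting point: the 300,000 ms factory default.
schedule = halving_schedule(300_000, 20_000)
print(schedule)  # [300000, 150000, 75000, 37500, 18750]
```

Under these assumptions, four halvings take the setting from 5 minutes to just under 19 seconds, which is in line with the "around 20 seconds" where this client ended up.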
SE with ImageSource Inc.