Subscriber and Subscription Management (SSM) is the system that routes orders for IBM SaaS offerings sold through IBM and third-party marketplaces to the appropriate endpoints. It provisions orders for customers and manages their end-to-end subscriber and subscription lifecycle. It handles about 2,000 requests per hour.

SSM is a legacy monolith app. Maintaining such a mission-critical application with millions of lines of code can be a nightmare. Making it more complex, transaction handling is implemented at every smallest service-layer unit. To support high-end business use cases, SSM exposes dozens of composite APIs. These composite APIs internally call the smallest-unit APIs, holding multiple DB connections for a single composite API request.
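To illustrate why one composite request can pin several connections at once, here is a minimal, self-contained Java sketch. It is a toy model with invented names, not SSM's actual code: a `Semaphore` stands in for the connection pool, and each unit API runs its own transaction, so it checks out its own connection and keeps it for the whole call.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model with invented names (not SSM's actual code): each unit API
// runs its own transaction, so it checks out its own connection from a
// bounded pool and keeps it for the whole call.
public class CompositeDemo {
    static final Semaphore pool = new Semaphore(10);  // pool of 10 "connections"
    static final AtomicInteger held = new AtomicInteger();
    static int peakHeld = 0;

    // A smallest-unit API: acquires a connection, releases it on close().
    static AutoCloseable unitApiCall() throws InterruptedException {
        pool.acquire();
        peakHeld = Math.max(peakHeld, held.incrementAndGet());
        return () -> { held.decrementAndGet(); pool.release(); };
    }

    // A composite API nests three unit calls, so one inbound request
    // holds three pool connections at the same time.
    public static int compositeApiCall() throws Exception {
        try (AutoCloseable a = unitApiCall();
             AutoCloseable b = unitApiCall();
             AutoCloseable c = unitApiCall()) {
            return peakHeld;  // all unit connections held simultaneously
        }
    }
}
```

Under load, dozens of such composite requests multiply the per-request connection count, which is what pushes the pool and the DB toward their limits.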

This eventually exhausted the DB memory, dropping myriad live transactions. You might be asking:

  • Can’t transaction handling be done at the composite API level rather than at the smallest API unit? No, because the data access layer is tightly coupled with the lower-level APIs, and moving it to a higher level would introduce many stale object-state exception cases.
  • Can’t the monolith app be broken down into a microservices architecture, which is the current market trend? No, because it is a costly affair in terms of resources and time; moreover, developers were busy tending to the above issue, leaving no room to think about and invest time in this approach.

It was important to find a fast and efficient solution to this problem, as it impacted the business. To make matters worse, with SSM at the core of the marketplace ordering flow, both upstream and downstream systems were significantly impacted. It was also difficult to identify the source of the problem: the code, the database, or the infrastructure layer (as the application is deployed on IBM Cloud). With the team's engineering skills and aggressive debugging, the issue was analyzed.

The journey

Pattern Discovery Phase

We analyzed the historical performance issues using an internal monitoring tool. This helped us identify that a huge number of calls were being made to fetch a user with many roles or associated entitlements, resulting in the application consuming more resources and ultimately causing delays for subsequent API calls. This was a progressive effort achieved through:

  • Grouping the specific APIs in the monitoring tool that caused extra load on the application.
  • Taking a snapshot of historical data, enabling us to find the pattern that caused the performance degradation.
  • Creating similar API suites to run in an SSM preproduction environment.

Problem Reproduction Phase

Performance load tests were run on an SSM preproduction environment over a few weeks at different times of the day. For every run, heap dumps were collected. Heap dump collection for analysis was a bottleneck. The solution was to signal the main Java process and copy the dump to a local machine for debugging. Steps to collect the heap dump from the IBM Cloud environment:

  • ibmcloud target --cf -sso
  • ibmcloud cf apps
  • ibmcloud cf ssh <appname>
  • Run ps -aux (to get the process ID of the running cf apps)

We then killed the process ID with the -3 option (don't use the -9 option: -3, i.e. SIGQUIT, prompts the IBM JVM to write a javacore dump, while -9 terminates the process without producing one). Once the above commands have run, you'll find the core dump under the following folder:

       vcap@27854948-c2e2-4bc8-7649-c266:~$ ls -ltr /home/vcap/app/
        total 5840
        drwxr-xr-x 4 vcap vcap      62 Jul  5 09:50 WEB-INF
        drwxr-xr-x 3 vcap vcap      38 Jul  5 09:50 META-INF
        drwxr-xr-x 2 vcap vcap      26 Jul  5 09:50 jsp
        -rw-r----- 1 vcap vcap 5979538 Jul  5 12:40 javacore.20210705.124041.16.0001.txt

You can generate as many core dumps as you want (depending on the investigation).

Next, we copied the remote core dump to a local laptop, redirecting the command output into a local file: ibmcloud cf ssh <appname> -c "cat <path of core dump>" > <local path>.

After a couple of executions, the same scenario was reproduced, which gave us some confidence that the investigation was on the right track. It was indeed a daunting task to simulate it again and again during peak times.

Problem Analysis Phase

With a few dumps in hand, the REST calls (GET and POST) were analyzed in depth. This gave insights into the degraded application behavior. The GET calls were holding their DB connection even after the result set had been fetched. Meanwhile, other incoming requests waited for DB connections to be released. This occasionally caused a deadlock situation, pushing the overall app into degraded performance during high-traffic periods and ultimately resulting in a crash. As the following screenshot shows, 75 threads were waiting in com/mchange/v2/resourcepool/BasicResourcePool.awaitAvailable for a connection from the pool.

Screenshot shows 75 threads waiting for a connection
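The thread-dump symptom can be reproduced with a small, hypothetical Java simulation (our own toy model, not SSM code), in which a `Semaphore` stands in for c3p0's pool: once every pooled connection is held by a handler that never releases it, all remaining request threads queue up, the same state the dump showed in `awaitAvailable`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

// Toy simulation (not SSM code): a Semaphore stands in for the c3p0 pool.
// Handlers that never give their connection back exhaust the pool, and
// every later request thread parks waiting, the analogue of the threads
// seen in BasicResourcePool.awaitAvailable.
public class StarvationDemo {
    public static int waitingRequests(int poolSize, int requests) throws InterruptedException {
        Semaphore pool = new Semaphore(poolSize);
        CountDownLatch allConnectionsHeld = new CountDownLatch(poolSize);
        Thread[] handlers = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            handlers[i] = new Thread(() -> {
                try {
                    pool.acquire();                  // check out a connection
                    allConnectionsHeld.countDown();
                    Thread.sleep(Long.MAX_VALUE);    // "hold until end of transaction"
                } catch (InterruptedException e) {
                    // demo shutdown
                }
            });
            handlers[i].start();
        }
        allConnectionsHeld.await();                  // pool is now fully checked out
        long deadline = System.currentTimeMillis() + 5000;
        while (pool.getQueueLength() < requests - poolSize
                && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);                        // let remaining requests reach the pool
        }
        int waiting = pool.getQueueLength();         // requests stuck waiting
        for (Thread t : handlers) t.interrupt();
        for (Thread t : handlers) t.join();
        return waiting;
    }
}
```

With a pool of 5 and 20 concurrent requests, 15 threads end up queued, mirroring the 75 waiters seen in production at a larger scale.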

Solution Phase

Based on the analysis, the commit mechanism of the GET calls was changed from autoCommit = false to autoCommit = true. This releases the connection immediately once the result set is fetched, instead of holding it until the end of the transaction.

Screenshot shows releasing the connection
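At the JDBC level, the pattern looks roughly like the following hypothetical helper (the class and method names are ours, not SSM's; the real code path goes through Hibernate). With autoCommit enabled there is no open transaction pinning the connection, so try-with-resources can hand it back to the pool as soon as the result set has been drained.

```java
import java.sql.*;
import java.util.*;

// Hypothetical read-path helper (names invented for illustration).
// With autoCommit on, a read-only GET query leaves no transaction open,
// so the connection is released as soon as the result set is consumed.
public class ReadPath {
    public static List<String> fetchNames(Connection con, String sql) throws SQLException {
        List<String> names = new ArrayList<>();
        try (Connection c = con) {            // closed (returned to pool) on exit
            c.setAutoCommit(true);            // read-only GET: no explicit commit/rollback
            try (Statement st = c.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
        }                                     // connection released right after the fetch
        return names;
    }
}
```

Write paths (POST) keep explicit transactions; only the read-only GET calls were switched to this release-early behavior.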

We also fine-tuned the DB connection pool size to optimize connections between the application and the data layer. We increased hibernate.c3p0.max_size from 125 to 250 to allow more DB connections in the pool, and reduced hibernate.c3p0.idle_test_period from 120 to 60 (the interval, in seconds, at which idle connections in the pool are tested).

Screenshot shows the reduced Hibernate idle-test period
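Assuming these settings are supplied the usual Hibernate way (for example in a hibernate.properties file or equivalent configuration; only the two tuned keys are shown here), the post-tuning values would look like:

```properties
# c3p0 pool tuning (values from this article; other settings omitted)
# up from 125: more connections available in the pool under load
hibernate.c3p0.max_size=250
# down from 120: idle connections are tested every 60 seconds
hibernate.c3p0.idle_test_period=60
```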

The combined approach above resulted in an ~80% improvement in response time across all APIs.

Bar charts show improvement in response time for all APIs

The performance improvement was valuable and had a positive impact on the API consumers. The journey was a hard one, but the discovery and learning made both the application and the team more resilient.


Thanks to Anil Sharma for the analysis of the database and to Bhakta for sharing expertise on heap dumps. And special thanks to Nalini V. for guiding us on this journey.
