I believe in observing each application together with its underlying services and infrastructure, rather than only monitoring the network as a whole and reacting just quickly enough for damage control.
In a T24 Core Banking system, monitoring all TSA job executions down to the code level can save your team endless troubleshooting meetings.
Before a T24 job can run, agents are allocated to it through TSA.WORKLOAD.PROFILE. The agents are assigned depending on whether the routine is single-threaded or multithreaded. Each agent takes a job list number and processes the keys inside that list. Once it finishes processing a key, it deletes the key and moves on to the next.
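The claim, process, delete cycle described above can be illustrated with a small simulation. This is only a sketch in Python, not T24 code: `run_agents`, the in-memory `job_list` dictionary, and the `process` callback are hypothetical stand-ins for the real job list file and job routine.

```python
import threading
from typing import Callable


def run_agents(job_list: dict, agent_count: int,
               process: Callable[[str, object], None]) -> dict:
    """Drain job_list with agent_count threads; return keys handled per agent."""
    lock = threading.Lock()
    claimed = set()  # keys an agent is currently working on
    handled = {f"agent-{n}": 0 for n in range(1, agent_count + 1)}

    def agent(name: str) -> None:
        while True:
            with lock:
                # Pick the next key that no other agent has claimed yet.
                key = next((k for k in job_list if k not in claimed), None)
                if key is None:
                    return  # nothing left to claim in this job list
                claimed.add(key)
                payload = job_list[key]
            process(key, payload)
            with lock:
                # The key is deleted only after processing has finished.
                del job_list[key]
                handled[name] += 1

    threads = [threading.Thread(target=agent, args=(name,)) for name in handled]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

Because a key is removed only after `process` returns, any key still present after the run points to work that was claimed but never finished, which is the kind of leftover discussed under residual entries further down.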
Defining "REVIEW.TIME" and "TIME.OUT" in TSA.SERVICE
TSM periodically reviews all the TSA agents running in the system to check whether they are still active.
NOTE: As per the standard functionality of T24, the review time specified for a service should always be less than the death watch time or the timeout specified.
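The rule in the note is easy to check before a service record is changed. The following is a minimal sketch, assuming both values are held in seconds; `check_service_timers` is a hypothetical helper, not part of T24.

```python
def check_service_timers(review_time: int, time_out: int) -> list:
    """Return a list of problems with a REVIEW.TIME / TIME.OUT pair (assumed seconds)."""
    problems = []
    if review_time <= 0 or time_out <= 0:
        problems.append("REVIEW.TIME and TIME.OUT must both be positive")
    if review_time >= time_out:
        # The review must happen before the timeout can expire.
        problems.append("REVIEW.TIME must be less than TIME.OUT")
    return problems
```

An empty list means the pair satisfies the rule; anything else can be reported to whoever maintains the TSA.SERVICE record.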
Checking, clearing, and backing up COMO for TSA agents
The COMO name for each agent can be checked in the TSA.STATUS application. The &COMO& directory inside bnk.run contains log files such as COB logs, upgrade logs, and update logs.
The system does not clear this directory, so its contents must be removed manually.
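Backing up before clearing can be scripted outside T24. The sketch below, in Python, archives COMO files older than a given age into a compressed tar file and deletes them only after the archive has been written. The function name, the paths passed in, and the age threshold are all assumptions for illustration; the age cutoff is there so that logs still being written by running agents are left alone.

```python
import tarfile
import time
from pathlib import Path


def archive_old_como(como_dir: Path, archive_path: Path, max_age_days: float) -> list:
    """Archive files in como_dir older than max_age_days, then delete them.

    Returns the names of the files that were archived and removed.
    """
    cutoff = time.time() - max_age_days * 86400
    old_files = sorted(
        p for p in como_dir.iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )
    if not old_files:
        return []
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in old_files:
            tar.add(p, arcname=p.name)
    # Delete only after the archive has been written and closed.
    for p in old_files:
        p.unlink()
    return [p.name for p in old_files]
```

A call such as `archive_old_como(Path("bnk.run/&COMO&"), Path("/backup/como.tar.gz"), 30)` would be typical, with both paths adjusted to the local environment.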
Changing TSA Agents at runtime
Some jobs require more agents and some require fewer to run in COB. Depending on the jobs, we can write a routine to change the number of agents in TSA.WORKLOAD.PROFILE.
Note: Agents can be increased or decreased.
Clearing TSA.STATUS before starting COB
Note: It is not required to clear F.TSA.STATUS to start the COB. However, this table is dependent on F.T24.SESSION: F.TSA.STATUS and F.T24.SESSION are interlinked, and removing one but not the other will cause duplicate session number issues.
TSA.STATUS record with @ID starting with OLTP is in RUNNING status
Whenever a new TSS session is launched or a user logs in to T24 through the browser, the SERVICE.CONTROL of the TSA.SERVICE record OLTP is updated with the value 'START', and a new record for the OLTP service is created in TSA.STATUS.
Whenever the server or JBoss is started, a minimum number of sessions is spawned. When the TSS session is closed, or the user logs out of T24 properly, the corresponding OLTP record in TSA.STATUS is automatically stopped.
If the T24 session is closed abruptly or the user does not log out properly, AGENT.STATUS remains RUNNING.
TSA logs off the TSM agent even though it is active
TSM recovery interval should be increased to 'deathwatch time*1.5' to avoid restarting TSM when it is active.
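As a worked example of the 1.5 multiplier, the sketch below derives the recovery interval from a death watch time. It assumes the death watch time is expressed in seconds and rounds up to a whole second; `tsm_recovery_interval` is a hypothetical helper name.

```python
import math


def tsm_recovery_interval(death_watch_seconds: int, factor: float = 1.5) -> int:
    """Return the recovery interval: death watch time times factor, rounded up."""
    return math.ceil(death_watch_seconds * factor)
```

With a death watch time of 300 seconds, this gives a recovery interval of 450 seconds.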
TSM needs to perform the service heartbeat at frequent intervals, and the TSA log-off needs to be accompanied by a meaningful message.
Restarting TSM if all agents die due to DB failure
TSM must be manually restarted if every agent dies due to a database failover. This is because idle agents never resume and hang forever.
TSM will be restarted after crossing the TIME.OUT seconds set in F.TSA.SERVICE>TSM.
F.JOB.LIST in T24
Use of the files F_JOB_LIST_xx
When a job is executed during COB, the corresponding files are selected based on the underlying query, and the record IDs to be processed by the job are written to the file F_JOB_LIST_xx. The TSA agents then pick the IDs one by one from the job list and process them.
JOB.LIST files increase in size daily
JOB.LIST tables are temporary files that the COB/service framework uses to hold the record keys that will be processed by the TSA agents.
All these data/record keys will be processed and deleted by the TSA agents after the successful completion of the Job.
Note: Ensure that no TSA agents or TSM agents are running while the JOB.LIST files are being shrunk.
System crash while trying to access/create JOB.LIST when running COB/ Online Services
Check the ORA/SQL/XMLdriver.log for entries around the timestamp when the problem occurred.
Residual entries in JOB.LIST
Residual entries may still be present in the JOB.LIST file even though the job is reported as completed. Additionally, the message 'Job finished by another thread' can be seen in the COMO records. Check whether the 'ROLLBACK' flag is set to 'NO' for files like F.TSA.STATUS and F.BATCH.STATUS.
Changing ROLLBACK from "-T" to "+T" may increase your log size slightly if you use TJ, but it will not have any performance impact.
Where to go from here?
Keep in mind that this is just a small subset of all JOB.LIST and TSA agent monitoring and tuning best practices. We look forward to seeing you at our T24 performance tuning and monitoring workshop this fall.
Happy Performance Engineering! Keep up the great work!