We are using Hyperic HQ 4.1.2. In our deployment we see several instances of the following warning in our log:
2010-05-04 14:59:52,011 WARN [Timer-3] [org.hyperic.hq.application.HQApp@313] Method ran a long time.
Class: org.hyperic.hq.autoinventory.shared.AutoinventoryManagerLocal
Method: notifyAgentsNeedingRuntimeScan
RunTime: 920406
This appears to be caused by a large number of agents (more than 100 in our environment) that are flagged as "dirty" in the EAM_AI_AGENT_REPORT table but are no longer valid agents. Unfortunately, the rows in this table reference the EAM_AGENT table in the order in which the agent systems registered, so older agents appear first. This became a problem after we removed a number of agents (more than 50) from the UI because they no longer corresponded to valid systems in our environment. It appears that the SERVICE_DIRTY column in EAM_AI_AGENT_REPORT was never reset to 0 when these agents were deleted. As a result, the notifyAgentsNeedingRuntimeScan method retries the removed agents indefinitely, which eventually causes it to time out. Because the invalid "dirty" agents sit at the front of the list the method works through, it never completes in a timely manner and never gets to update the valid "dirty" agents later in the list.
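To illustrate why the ordering matters, here is a minimal Python simulation. It is a hypothetical model, not HQ code: it assumes the scan walks dirty agents strictly oldest-first, that every unreachable agent costs one full connect timeout, and that the job effectively has a fixed time budget. The `Agent` class, the `scan_dirty_agents` function, the 1-second cost per reachable agent and the 900-second budget are all illustrative. For scale, the RunTime of 920406 ms in the warning above is roughly 15.3 minutes, which would correspond to about 15 connect timeouts of 60 seconds each.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    agent_id: int    # registration order: lower IDs registered earlier
    reachable: bool  # False for agents removed from the UI but still flagged dirty

def scan_dirty_agents(agents, connect_timeout_s, budget_s, ok_cost_s=1.0):
    """Hypothetical model of an oldest-first scan over dirty agents.

    Each reachable agent costs ok_cost_s; each unreachable one costs a full
    connect timeout. Returns the IDs of reachable agents notified before the
    time budget is exhausted.
    """
    elapsed = 0.0
    notified = []
    for agent in sorted(agents, key=lambda a: a.agent_id):
        cost = ok_cost_s if agent.reachable else connect_timeout_s
        if elapsed + cost > budget_s:
            break  # out of time: everything after this point stays dirty
        elapsed += cost
        if agent.reachable:
            notified.append(agent.agent_id)
    return notified

if __name__ == "__main__":
    # 50 stale agents registered before 20 valid ones.
    agents = [Agent(i, False) for i in range(1, 51)] + [Agent(i, True) for i in range(51, 71)]
    print(len(scan_dirty_agents(agents, connect_timeout_s=60, budget_s=900)))  # 0
    print(len(scan_dirty_agents(agents, connect_timeout_s=30, budget_s=900)))  # 0
    cleaned = [a for a in agents if a.reachable]  # stale dirty flags cleared
    print(len(scan_dirty_agents(cleaned, connect_timeout_s=30, budget_s=900)))  # 20
```

In this model, halving the timeout alone still leaves every valid agent unreached when 50 stale agents come first; only removing the stale entries from the front of the list lets the valid ones through.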
Our workaround was to (1) reduce the connect timeout by applying a slightly modified version of the changeset from HHQ-3694, the main difference being a timeout of 30 seconds instead of 60, and (2) set the SERVICE_DIRTY column to 0 for every invalid agent in the table. Once we cleared the column for the invalid agents, the method was able to complete and reschedule measurements for the remaining "dirty" agents in the table.
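Step (2) amounts to a single UPDATE. The sketch below runs it through Python's built-in sqlite3 module against an in-memory stand-in table so it can be tried safely. The AGENT_ID column name, the `clear_dirty_flag` helper and the list of invalid IDs are assumptions for illustration; check the real EAM_AI_AGENT_REPORT definition for your database before running anything similar against HQ. Depending on the backing database, SERVICE_DIRTY may also be stored as a boolean rather than 0/1.

```python
import sqlite3

def clear_dirty_flag(conn, invalid_agent_ids):
    """Reset SERVICE_DIRTY to 0 for report rows of agents known to be invalid.

    AGENT_ID is an assumed column name; verify it against your HQ schema.
    Returns the number of rows changed.
    """
    ids = list(invalid_agent_ids)
    if not ids:
        return 0
    placeholders = ", ".join("?" for _ in ids)  # one bound parameter per ID
    cur = conn.execute(
        "UPDATE EAM_AI_AGENT_REPORT SET SERVICE_DIRTY = 0 "
        f"WHERE SERVICE_DIRTY = 1 AND AGENT_ID IN ({placeholders})",
        ids,
    )
    conn.commit()
    return cur.rowcount

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Minimal stand-in for the real table; only the columns used here.
    conn.execute("CREATE TABLE EAM_AI_AGENT_REPORT (AGENT_ID INTEGER, SERVICE_DIRTY INTEGER)")
    conn.executemany(
        "INSERT INTO EAM_AI_AGENT_REPORT VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 1), (4, 0)],
    )
    print(clear_dirty_flag(conn, [1, 2]))  # 2
    print(conn.execute(
        "SELECT AGENT_ID FROM EAM_AI_AGENT_REPORT WHERE SERVICE_DIRTY = 1"
    ).fetchall())  # [(3,)]
```

Passing an explicit list of IDs, rather than deriving "invalid" from a join, keeps the update limited to agents you have confirmed are gone.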
There are a few related tickets that help mitigate the problem, but none of them fully solves the issue described here because of the registration ordering of agents (i.e. older agents are always tried first).
http://jira.hyperic.com/browse/HHQ-3234
http://jira.hyperic.com/browse/HHQ-3240
http://jira.hyperic.com/browse/HHQ-3694
http://jira.hyperic.com/browse/HHQ-3782