On 08/03/21, at 6:30 a.m. PST, the True North helpdesk began receiving reports of connection performance issues to the Las Vegas data center. Initial review confirmed that random disconnects were occurring across multiple client environments, and True North engineers immediately began a technical deep dive. They found a significant increase in file activity on both database and application servers across several client environments. This activity increased latency in the infrastructure, which in turn caused the random session disconnects.
During the event, some users were able to connect and work actively in the EMR, while others had difficulty establishing or maintaining an active login session.
What We Found
Final vendor log analyses were still pending, but once we had enough data to identify the source of the problem, we took action that reduced latency to its pre-event levels. Each database and application server in the True North data center has an EDR (endpoint detection and response) security scanning component installed. This tool, SentinelOne, protects the operating system it is installed on from malicious activity and ransomware attacks. After we consulted with several vendors during the investigation, we determined that the EDR active scanning function on each machine had detected an increase in activity in the hosted environments and categorized it as abnormal. This appears to have been a false-positive event caused by the detection of “new” and/or unexpected traffic patterns following the recent increase in customers who enabled Medication Management for athenaPractice/athenaFlow. Although the v20 upgrade did not itself cause the false positive, the related changes to file systems, databases, and file movement patterns triggered SentinelOne to take immediate defensive measures. The extreme load of this additional scanning and log generation led to the spike in latency that caused the event.
After the discovery, True North worked directly with SentinelOne to fine-tune the EDR logging and scanning filters and relieve the additional load this traffic caused. Exclusions were put in place based on our findings with the vendor. After we applied the new filters to the scanner and reset all agents, the traffic load returned to normal levels. User connections to the data center began to re-establish just after 12 p.m. PST and continued to increase through 3 p.m. PST. Performance and activity loads have remained at expected levels since that time.
We want to be clear that the recent events are not related to any malicious activity and are not the result of a cybersecurity threat. All of your organization’s data is secure and 100% accounted for. Please be assured that we are continuing to closely monitor the infrastructure for any anomalies. In addition, we will continue to periodically review and update our internal processes to ensure more frequent and proactive communications, including status page and/or ticket updates, during a customer-impacting event such as this.
If you have any questions about the event, please let us know and we’ll do our best to respond within 72 hours. If you have any new urgent or work-stoppage issues, please call the helpdesk to open an emergency ticket so that we can assist you in a timely manner.