Root cause
A new release containing database changes was being deployed to production, and the deployment locked access to some database tables. Because those tables receive a high volume of requests, the blocked requests piled up in the queue and overloaded our servers.
What happened
For an estimated 45 minutes, users were unable to access their TrekkSoft sites because our production servers were down.
What we did
Once our developers identified that the new release was degrading the performance of our services and overloading the servers, they restarted the servers and rolled back the release to return them to normal operation.
The consequences
During the estimated 45 minutes of the incident, no bookings could be processed.
Learnings
To prevent similar incidents in the future, we have updated our internal release guidelines to avoid releasing critical changes during peak hours, when high availability is needed.
We apologize for any inconvenience this might have caused you.