November 6, 5:00 PM CT:
We just sent out the following letter via email:
Dear Pressero Customer,
I apologize for the significant Pressero issues encountered as of late. We understand that you have entrusted us with the web-to-print portion of your business and that we have let you down. For this, I am very sorry. Those who have been with us for the last 12 years know that this was an extreme and unusual occurrence.
So what happened? In many ways, “growing pains” may be the best description. We are grateful that many new printing companies have purchased Pressero, and thrilled that many of you have been successful using Pressero. This has resulted in significantly more storefronts running on the platform. To respond to this increased demand, we went through a major hardware upgrade at our Chicago data center in mid-September. Sadly, despite careful planning and analysis, simply adding more hardware did not prove to be enough. Our team has been working diligently day and night to reconfigure the structure of the Pressero server farm. We are continuing to make adjustments to our servers as needed and will keep you informed.
While we cannot change the past, it’s critical to apply the lessons we have learned to the future. We are making the following improvements:
- We will be more proactive about letting you know when Pressero or our other services are not running as they should and what steps we are taking to resolve them.
- We will retain the ongoing services of a specialized company to proactively monitor, advise, and help our team manage the hosting infrastructure.
- We have now implemented a mechanism to better manage usage of the API by external 3rd party applications some of our customers have connected to Pressero. This will help reduce overall system load without impacting any users of properly integrated applications.
We are optimistic that these changes will result in a more dependable and stable Pressero experience. Your patience and support through this difficult time have been greatly appreciated. If you wish to discuss any of this with me, please feel free to email me (firstname.lastname@example.org) or call me at 800.571.2138 x710.
November 6, 1:30 PM CT:
All services are restored. You may now add or edit categories and products without concern. We will continue to closely monitor performance.
November 6, 12:00 PM CT:
While storefronts continue to be responsive, we are still making updates. Again, as a precaution, we ask that subscribers not add or edit products or categories in admin for another 60-90 minutes while we complete that work.
November 6, 10:24 AM CT:
While we work to restore full services we ask as a precaution that subscribers not add or edit products or categories in their admin accounts. We will update you in about an hour.
November 6, 9:08 AM CT:
Services were partially restored around 8:40 AM. Our team is working to fully restore services.
November 6, 8:10 AM CT:
We are currently experiencing some problems with Pressero storefronts. Our team is investigating.
November 3, 5:22 PM CT:
Pressero continues to be stable.
November 2, 12:20 PM CT:
We discovered that some customers have been overutilizing the Pressero API. While this utilization was most likely accidental, it did have an impact on the overall system load. We instituted rate limiting (throttling) on the API and have seen a significant improvement. If you are an API user and are seeing some of your API requests being denied, please evaluate your code to determine whether it is overutilizing the API. If you need log information, please open a ticket with our support team.
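For API users reviewing their integrations, one common way to reduce request volume is to space out calls on the client side and to wait progressively longer before retrying a denied request. The sketch below is illustrative only: `MinIntervalThrottle` and `backoff_delays` are hypothetical names, and the rates and delays shown are assumptions, not the actual limits enforced by the Pressero API.

```python
import time


class MinIntervalThrottle:
    """Space out calls so that at most `max_per_second` start each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


def backoff_delays(base=1.0, factor=2.0, retries=4, cap=30.0):
    """Return the waits (in seconds) to use between retries of a denied request."""
    return [min(cap, base * factor ** i) for i in range(retries)]
```

In an integration, `wait()` would be called immediately before each API request, and the values from `backoff_delays()` would be used as pauses between successive retries of a request that was denied. The right rate for a given integration is an assumption to confirm with our support team.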
November 2, 11:00 AM CT:
We are receiving reports of intermittent outages. We are investigating.
November 1, 1:00 PM CT:
All systems have been stable since our previous posting. Unfortunately, some of our application updates caused issues related to products with inventory as well as products showing a preview of uploaded files. These issues have been resolved as well. We will continue to closely monitor performance.
October 31, 12:15 PM CT:
The database server has been stable since 11:15 AM. We are continuing to monitor system performance.
October 31, 11:02 AM CT:
We are seeing some intermittent issues on the new database server. Our consultant is investigating.
October 30, 4:45 PM CT:
We have added four additional web servers to the cluster serving storefronts. We will continue to monitor system performance.
October 30, 2:45 PM CT:
Pressero continues to experience intermittent availability issues. Our consultant is recommending that we temporarily add more web servers to the cluster to help with performance until the root cause can be identified. Our DevOps team will deploy the additional servers over the next hour.
October 30, 10:40 AM CT:
The database server is back online. We will continue to monitor performance.
October 30, 10:30 AM CT:
The slow page load time issue has become worse. We are going to take the database server down now for 5 minutes to change resource allocation and make a configuration change. Thank you for your continued patience.
October 30, 10:00 AM CT:
Unfortunately, we are receiving some reports of slow page load time. Our team is working to address this.
October 29, 4:45 PM CT:
In an abundance of caution, we had our caching (redis) cluster rebuilt and expanded. We will continue to monitor system performance.
October 29, 10:15 AM CT:
Pressero storefronts just experienced an outage. The admin interface stayed mostly available during this incident. The outage was not due to the new database server. It was related to a failure of our caching (redis) cluster. Services have been restored but the team is investigating the reason and what can be done to prevent this from happening again.
October 29, 12:53 AM CT:
The conversion to a new database server is complete. We will continue to monitor system performance.
October 27, 5:02 PM CT:
Our datacenter is nearly ready for the database server update. We will be doing this during our regularly scheduled maintenance window at 11 pm CT on Saturday, October 28. Pressero will be down approximately two hours.
October 27, 12:26 PM CT:
Unfortunately Pressero does continue to experience some intermittent performance problems. Our DevOps team is working with the consultants to address these concerns.
October 27, 8:15 AM CT:
Our data center ran into technical problems with the new database server. We decided that we could no longer have Pressero down. Our consultants added resources to our existing database server and made some performance-related configuration changes. Services will be completely restored over the next few minutes. We will post an update later today about system performance and when the next attempt to switch to a new database server will be.
October 27, 6:56 AM CT:
Unfortunately, a few challenges have been encountered. The revised estimate for a return to services is now 8:00 AM. We apologize for this delay.
October 27, 4:51 AM CT:
We are in the final stages of preparing the new database server. Pressero will be down until 7:00 AM CT (revised from 6:30) for the server cutover. During this time we will display a message on your sites stating that your site is down temporarily for maintenance. We will update this document with further information once the cutover is complete.
October 26, 8:10 PM CT:
Our consultants have advised us that the next step is to replace the current database server that Pressero uses. We are working with the data center to quickly provision this server within our private cloud infrastructure. Unfortunately, moving to this new server will involve some downtime later tonight. We hope to keep this outage to a minimum. We will update this post when a time and estimated outage window are available.
October 26, 4:22 PM CT:
As you most likely know, we have been experiencing significant system issues with the Pressero system running out of our Chicago data center. On behalf of the entire Aleyant team, I want to apologize for this. We understand that these issues have been more than an inconvenience to you.
I want to assure you that the severity of the problem is understood. Our technical team has been working long hours to resolve this problem. In an effort to resolve it more quickly, we have also brought in third-party consultants to assist. We will spare no expense to make Pressero a stable and dependable platform for you and your customers.
One piece of feedback we have received from our customers is that we have not done enough to communicate the status of this issue to you. We will begin posting more notices on our support news page, which can be found here: http://support.aleyant.com/news/root.aspx
Your patience and understanding are greatly appreciated.