Richard Cave's blog

DNS Issue with PLoS.org - Resolved

Submitted by Richard Cave on Sun, 2008-10-05 11:40.

We had an outage on our PLoS.org domain resources yesterday due to a DNS issue. As a result, www.plos.org and all of the plos.org subdomains were intermittently unavailable. The issue was addressed promptly, but it took 2-10 hours for DNS servers to pick up the change and roughly 24 hours for the update to propagate to DNS servers internationally. Most people were able to access www.plos.org within a few hours.
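
For anyone who wants to check whether their own resolver can see the site again, here is a minimal Python sketch. It is only an illustration and not something we ran during the outage: the function name `check_hosts` is made up for this example, and `journals.plos.org` is a placeholder subdomain. It also only reports what your local resolver returns, so it says nothing about DNS servers elsewhere.

```python
import socket


def check_hosts(hosts, resolve=socket.gethostbyname):
    """Return {hostname: IP address or None} using the given resolver.

    `resolve` defaults to the system resolver; it is a parameter only so
    the function can be exercised without network access.
    """
    results = {}
    for host in hosts:
        try:
            results[host] = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results


if __name__ == "__main__":
    # www.plos.org is from the post; the second name is a hypothetical example.
    for host, ip in check_hosts(["www.plos.org", "journals.plos.org"]).items():
        print(host, "->", ip if ip else "did not resolve")
```

A hostname that still shows "did not resolve" a few hours after a fix usually means the resolver you are using has not refreshed its cached answer yet.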



Journal Websites - Topaz 0.9 rc1 Upgrade

Submitted by Richard Cave on Thu, 2008-07-17 17:03.

Last night, we upgraded the journal websites to Topaz 0.9 rc1 (rc1 because this is a “beta” 0.9 release). Development for this release focused on performance and stability, specifically on alleviating the sluggish speed of the websites and the pain of ingests.


Topaz 0.9 (rc1) - Site Maintenance Tonight on the PLoS Journals

Submitted by Richard Cave on Wed, 2008-07-16 18:08.

We're going to take down the journal websites for a few hours tonight starting at 7pm PST. Russ is going to upgrade to Topaz 0.9 (rc1) on the production servers while Josh replaces/rebuilds the dead drive on the Mulgara server.



Sunday Outage Sunday

Submitted by Richard Cave on Mon, 2008-07-14 15:38.

We experienced a hardware malfunction yesterday that took the Topaz-hosted journals offline from 4pm to 10pm PST.

James sent an email late yesterday afternoon reporting site errors on the PLoS journal websites. Soon after his email, the IT team started receiving SMS alerts. I assumed that something had gone wrong with the Topaz framework and started looking at the relevant log files, but couldn't find anything. I spent the requisite amount of time banging my head against the "site error" wall without success, then called Russ for assistance. After a bit of digging through the server logs, he found the culprit: a drive had failed on the Mulgara server. This drive is part of a redundant RAID configuration, so we didn't lose any data, but we also mysteriously lost the connection from the Mulgara server to the DAS array (disk storage for the Mulgara data).


Update on Performance Issues of PLoS Websites

Submitted by Richard Cave on Tue, 2008-03-18 17:56.

Performance of the websites hosted on Topaz has improved over the last two weeks thanks to a variety of patches ported to the production servers. We still have an outstanding memory problem that requires restarting the Topaz applications three times a day (these restarts usually occur around midnight, 8am and 4pm, and each lasts less than 10 minutes). I feel that we're close to diagnosing the memory problem, which is the last performance hurdle.
