Archive for the ‘Clusters’ Category

Current state of VPAC clusters

Monday, August 13th, 2007

After the weekend power down we are working on bringing back Edda and Wexstan but need to complete the following additional work first:

  • More storage! We are bringing about 12TB of storage online to replace the current allocations.
  • Updated firmware on Edda - this is to help fault detection and address various maintenance issues.
  • Updated Myrinet drivers on Wexstan - this is necessary to migrate its Myrinet connections from Brecca to Edda
  • Migrating to LDAP for user authentication - this is necessary to bring online the new electronic system for applying for projects and accounts

More news as we have it.

Brecca Shutdown

Friday, August 10th, 2007

Brecca was shut down for the final time today, and in the near future it will be moved to Monash to be reincarnated as a VPAC cluster that will be accessed purely via the grid by VPAC members.

Here are the final moments of Brecca in its current incarnation. David Bannon issues the final shutdown command on the management node.

David Bannon shuts down brecca-m, the management node for Brecca

The shutdown in progress.

The final shutdown of Brecca

Next week Brecca will be moved out of the way in the VPAC machine room and the first two racks of the new cluster, Tango, will be set into its place. The final rack will join Tango once the new air conditioning unit has been installed to the rear of Tango.

Tango update - 8th August - Much progress!

Wednesday, August 8th, 2007

It’s been a week since the last update, and the reason for that is that we’ve been flat out here! All three of the racks now have all the delivered nodes racked up..

Tango starting to look complete

..and last weekend the cablers were in wiring up the racks with made-to-measure, tested cables, making a rather nice looking arrangement.

Rear view of Tango showing the rather neat wiring work.

This coming weekend we have a total shutdown in the machine room so that the electricians can upgrade the wiring, ready for our new UPS to arrive on Tuesday, and also prepare for the additional air conditioning unit, which is due in early September.

Tango update - 1st August - Third rack arrives

Wednesday, August 1st, 2007

Today the last rack arrived and stands ominously in the corridor, waiting for its imminent move into the machine room.

The rack awaits

The racking of the compute nodes proceeds apace - Xenon have one rack almost full and another started.

The other 2 racks are filling up nicely

Tango update - 31st July - More compute nodes arrive

Tuesday, July 31st, 2007

Today we had another 29 compute nodes arrive..

Chris and Sam passing compute nodes into the VPAC machine room

..stacked and ready to rack!

29 compute nodes, 116 cores!

Tango update - 24th July - new rack arrives

Tuesday, July 24th, 2007

Today we had the second of the three racks arrive for Tango. The first essential is to dispose of the packaging!

The VPAC systems manager

Because of the false floor in the VPAC machine room, the rack has to be pushed up a purpose-built ramp by a couple of handy press-ganged volunteers.

Andy and David manoeuvre the rack into the machine room

It then needs to turn past Wexstan…

Turning the corner past Wexstan

…to go down the central corridor past Edda…

Passing Edda

…to its temporary home whilst Tango is assembled.

New rack in place

Tango update - 23rd July - racking nodes

Monday, July 23rd, 2007

Busy day today for Tango. First we had the rest of our first batch of 27 nodes arriving, ready for racking.

More boxes arrive

Xenon then unpacked them..

Nodes unpacked

…and racked them.

Nodes being racked

So now we have 25 nodes racked (the other two are being used for familiarisation work by VPAC) and ready to be joined by their compatriots in the near future.

First 25 nodes racked

Brecca Shutdown - 10th August

Thursday, July 19th, 2007

Brecca is due to be retired and replaced by a nice new cluster of AMD Opteron machines called “Tango”.

Currently we are expecting to retire Brecca on the weekend of the 11th August as that is the date of power work in the machine room to support the new power and cooling for the new cluster. All VPAC machines will need to be shut down that weekend.

We are currently waiting for confirmation from RMIT of that date, but we have put a reservation on to prevent *any* jobs starting now that would still be running on the 11th. Shorter jobs will still run. Jobs that were already started and have not finished will sadly need to be killed.
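The effect of the reservation described above can be sketched as a simple check: a job may start only if its requested walltime ends before the reservation begins. This is an illustrative sketch, not the scheduler's actual logic. The name `can_start` and the exact reservation start time (midnight on the 11th) are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed start of the shutdown reservation; the post only says "the 11th".
RESERVATION_START = datetime(2007, 8, 11, 0, 0)


def can_start(now: datetime, walltime: timedelta,
              reservation_start: datetime = RESERVATION_START) -> bool:
    """Return True if a job started at `now` would finish by the reservation start."""
    return now + walltime <= reservation_start


now = datetime(2007, 7, 19, 12, 0)
print(can_start(now, timedelta(days=7)))   # short job: ends before the reservation
print(can_start(now, timedelta(days=30)))  # long job: would overlap the reservation
```

Under these assumptions the seven-day job is allowed to start, while the thirty-day job has to wait because it would still be running on the 11th.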

Remember that a VPAC account lets you access any of the academic clusters at VPAC (Brecca, Wexstan and Edda at present) and that your files are independent of the clusters, so Brecca going away won’t change your data and you will see the same files whichever cluster you log in to.

VPAC has moved to AARNET3

Saturday, May 12th, 2007

It has been a long day but we have finally managed to transition all the clusters and other VPAC infrastructure to our new AARNET3 network connection.

All three of the VPAC clusters (brecca, edda and wexstan) are now open for business again and running jobs have continued uninterrupted during this period.

If you find you have problems connecting to the systems on Saturday, please try again on Sunday. It may take a while for some sites’ DNS records to update if they ignore the timeout on them.

If you have any problems please email the VPAC helpdesk at the email address: help at vpac.org

Change to how “showq” works

Monday, April 23rd, 2007

We have changed the “showq” command to not list blocked jobs by default.

Previously you would see a list of all running, eligible (waiting to run) and blocked jobs, but if there were several hundred blocked jobs at the time you would need to page up a lot to see which jobs were actually waiting to run.

Blocked jobs are those that exceed part of the VPAC scheduling policy, such as how many jobs you can have running at once and how many jobs you can have waiting to run. This doesn’t mean that they will never run, just that they have to wait for some of your other jobs to finish first before they will be eligible.

To see blocked jobs you can do “showq -b”.
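As a rough illustration of the distinction between running, eligible and blocked jobs, here is a minimal sketch that sorts a job list using two per-user limits. The limit values, the function names `classify` and `show_queue`, and the `show_blocked` flag are all hypothetical. The real VPAC policy values are not given here, and the scheduler's actual rules are likely more involved than this.

```python
from collections import Counter

# Hypothetical per-user limits, chosen only for the example.
MAX_RUNNING = 2    # jobs a user may have running at once
MAX_ELIGIBLE = 2   # jobs a user may have waiting to run


def classify(jobs, max_running=MAX_RUNNING, max_eligible=MAX_ELIGIBLE):
    """Split (job_id, user, is_running) tuples into running, eligible, blocked."""
    running = [job for job in jobs if job[2]]
    running_per_user = Counter(user for _, user, _ in running)
    eligible_per_user = Counter()
    eligible, blocked = [], []
    for job in jobs:
        _, user, is_running = job
        if is_running:
            continue
        # Simplification: a queued job is blocked if its owner is already at
        # the running limit or already has the maximum number of eligible jobs.
        if (running_per_user[user] >= max_running
                or eligible_per_user[user] >= max_eligible):
            blocked.append(job)
        else:
            eligible.append(job)
            eligible_per_user[user] += 1
    return running, eligible, blocked


def show_queue(jobs, show_blocked=False):
    """List running and eligible job IDs by default, blocked IDs on request."""
    running, eligible, blocked = classify(jobs)
    if show_blocked:
        return [job_id for job_id, _, _ in blocked]
    return [job_id for job_id, _, _ in running + eligible]


jobs = [
    (1, "alice", True), (2, "alice", True), (3, "alice", False),
    (4, "bob", True), (5, "bob", False), (6, "bob", False), (7, "bob", False),
]
print(show_queue(jobs))                     # default view, blocked jobs hidden
print(show_queue(jobs, show_blocked=True))  # blocked jobs only
```

In this example alice's queued job is blocked because she is already at the assumed running limit, and bob's third queued job is blocked because he already has two jobs waiting. Both would become eligible once enough of that user's earlier jobs finish.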