Archive for August, 2007

Tango now open for early adopters

Friday, August 31st, 2007

We have now opened our new AMD Opteron cluster, Tango, for early adopters to experiment with - please feel free to compile code and run short jobs on it.

Until the new air conditioning unit is installed in early September we can only bring up 2 racks of the machine, so at the moment there are just 230 CPUs available. Another 130 will be added once that work has been done.

At the end of the year the cluster will be upgraded to be in excess of 500 processors!

Please note:

  • Compute nodes may be shut down at any time if problems are found.
  • The head node will need to be shut down briefly on Monday for some new hardware.
  • Jobs are limited to 1 day whilst in experimental mode.
  • This is a 64-bit system - please recompile your code!
  • We have lifted the maximum number of CPUs a single user can use to 128.
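
As a rough illustration of the limits above, here is a minimal Python sketch that checks a job request against the 1-day job limit and the 128-CPU per-user cap. The function and constant names are hypothetical and are not part of Tango's batch system, which is what actually enforces these limits.

```python
from datetime import timedelta

# Limits taken from the notes above; the names are illustrative only.
MAX_WALLTIME = timedelta(days=1)   # jobs are limited to 1 day in experimental mode
MAX_CPUS_PER_USER = 128            # maximum CPUs a single user can use


def check_request(walltime, cpus, cpus_in_use=0):
    """Return a list of reasons the request exceeds the limits (empty if OK)."""
    problems = []
    if walltime > MAX_WALLTIME:
        problems.append("walltime exceeds 1 day")
    # Assumption: the cap counts CPUs the user already has in use plus the new request.
    if cpus_in_use + cpus > MAX_CPUS_PER_USER:
        problems.append("user would exceed 128 CPUs")
    return problems
```

For example, `check_request(timedelta(hours=12), 64)` returns an empty list, while a 25-hour request is reported as exceeding the 1-day limit.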

Tango has:

  • Portland Group compilers - pgcc, pgCC, pgf77, pgf90.
  • MVAPICH2 for MPI (using Portland Group compilers & InfiniBand interconnect).
  • mpiexec set up to start MVAPICH2 jobs by default.

If you find *any* problems with using Tango please please please let us know ASAP - email help (at) vpac.org with as much as you can tell us about how something went wrong.

We are still installing software & libraries; if something you need is missing, please let us know so we can prioritise it.

The Brecca Tango swap

Friday, August 17th, 2007

Over the past two days we’ve uncabled Brecca & moved it out of the way ready for it to be removed to its new home at Monash, and have moved two of the Tango racks into its place. Xenon Systems are currently finishing off the power cabling for Tango and we should be able to start commissioning the new cluster in the next week.

Here is a photo of those two Tango racks, sitting where Brecca once stood.

Tango stands in Brecca’s former location.

New UPS in service

Thursday, August 16th, 2007

Today Chloride Hydride commissioned the new UPS that was delivered on Tuesday and so the VPAC machine room is again running on UPS power.

Chloride Hydride 80-Net 120kVA UPS at VPAC.

The unit is rated at 120kVA, with a battery life of around 20 minutes at full load.

Edda status update - jobs running again!

Tuesday, August 14th, 2007

After a very long, and very busy, day we have Edda up and running again.

We are a few nodes down due to various issues, but currently there are 41 compute nodes available giving a total of 164 CPUs for jobs.

Don’t forget, some nodes are reserved for jobs of 8 hours or less (between 8am and 8pm) and one node is reserved for test jobs of less than 15 minutes (between those same times).
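
The reservation rules can be sketched in a few lines of Python. This is only an illustration: the pool names "short" and "test" and the function name are made up here, and treating the window as running from 8am up to, but not including, 8pm is an assumption. The scheduler on Edda is what actually applies these rules.

```python
from datetime import time, timedelta

# Daytime window during which the reservations apply (8am-8pm).
WINDOW_START = time(8, 0)
WINDOW_END = time(20, 0)   # assumption: 8pm itself is outside the window


def reserved_pools(walltime, now):
    """Return the reserved node pools a job could use at time-of-day `now`."""
    pools = []
    if WINDOW_START <= now < WINDOW_END:
        if walltime <= timedelta(hours=8):      # jobs of 8 hours or less
            pools.append("short")
        if walltime < timedelta(minutes=15):    # test jobs of less than 15 minutes
            pools.append("test")
    return pools
```

A 10-minute job submitted at 9am qualifies for both reserved pools, whereas the same job at 9pm qualifies for neither and simply competes for the ordinary nodes.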

Edda status update - login node up

Tuesday, August 14th, 2007

We have successfully reinstalled the login node for Edda; you can now access edda.vpac.org and queue jobs.

We are reinstalling the compute nodes now; once we are happy with their state we will start running jobs on them!

Thank you for your patience on this.

Machine room update

Tuesday, August 14th, 2007

This morning the old UPS, which was disconnected as part of the power work over the weekend, was removed.

We have also taken delivery of the new UPS and Chloride Hydride are currently installing the new batteries and will be commissioning the unit tomorrow.

A large data safe was also removed from the machine room to make space for the extra air-conditioning unit that is due to arrive in early September.

Edda status update

Tuesday, August 14th, 2007

We are now able to talk to the hardware management device from the cluster management system.

As a result, we can now finish updating the nodes' firmware and migrating them to the new authentication system and storage.

Edda status update

Monday, August 13th, 2007

Sadly Edda is still down due to an incompatibility between the firmware on the management device and the cluster management software. We are downloading the previous version of the firmware for the management device and will downgrade to that version tomorrow morning. We hope that this will let us power up the Edda nodes again.

Apologies for the inconvenience!

Wexstan back online

Monday, August 13th, 2007

Wexstan is now back online - we are working on some internal issues with Edda at the moment.

Current state of VPAC clusters

Monday, August 13th, 2007

After the weekend power down we are working on bringing back Edda and Wexstan but need to complete the following additional work first:

  • More storage! We are bringing about 12TB of storage online to replace the current allocations.
  • Updated firmware on Edda - this is to help fault detection and address various maintenance issues.
  • Updated Myrinet drivers on Wexstan - this is necessary to migrate its Myrinet connections from Brecca to Edda.
  • Migrating to LDAP for user authentication - this is necessary to bring online the new electronic system for applying for projects and accounts.

More news as we have it.