News
-
11.6.2010: ui5 offline
-
ui5 crashed again and had to be taken offline for maintenance.
-
11.6.2010: Blade enclosure shut down
-
Yesterday at around 8 pm the HP Blade Enclosure shut down for an unknown reason. All 166 jobs that were running on this platform were terminated. We will try to bring the enclosure back online in the course of the day.
-
14.6.2010: ui1 online
-
A new ui is now online: ui1.bfg.uni-freiburg.de.
-
22.06.2010: hp32n2 up again
-
After the rack4 power failure, hp32n2 is up and running again.
-
2.7.2010: ui1 and ui2 down
-
To reduce overheating in the computer room, we decided to temporarily shut down ui1 and ui2.
-
05.07.2010: BFG cluster still down
-
The BFG remains shut down until further notice.
-
02.07.2010 bfg cluster down
-
After the thermal situation escalated, the entire cluster had to be shut down.
-
05.07.2010 bfg nfs (homes) up
-
To allow files to be copied from the home directories, the NFS file servers were restarted. Lustre will also be available in the late evening. Access NFS and Lustre via ui7 and ui8.
-
06.07.2010 local cluster up
-
The ui nodes and a subset of the worker nodes are up again to provide basic cluster functionality for local users. GRID services, including dCache, are still down.
-
06.08.2010 hp32n2 up again
-
The machine is back. Please note: we don't know for sure if the maintenance operation really fixed the problem.
-
09.09.2010 Cooling failure
-
The cooling failed today. Many worker nodes, including the whole bwGRiD, had to be shut down.
-
16.09.2010 Cluster offline
-
A severe kernel security issue forces all worker nodes offline until they are upgraded and rebooted.
-
17.09.2010 Cluster up partially
-
We are now using a patched kernel until a secure official SL5 kernel is released. Only the lustretest queue and the user interfaces are available.
The bwGRiD cluster remains down.
-
23.09.2010 Cluster productive
-
A new SL5 kernel was released and installed. After a reboot of almost all machines, including the user interfaces, bfg-classic is now fully productive.
bwGRiD: the bwui user interface was also patched and is available.
-
29.10.2010 hp32n1,n2 up again
-
Cooling in the racks should be OK now. The package update has been completed.
-
15.11.2010 hp32n2 down during weekend
-
hp32n2 shut itself down during the weekend. Please keep in mind that as long as the reason for the shutdown remains unknown, the machine might go down again.
-
24.11.2010 Cooling failure
-
All worker nodes, servers and storage servers had to be shut down.
-
26.11.2010 BFG cluster back online
-
You can now submit your jobs to the cluster again.
-
bwGRiD News
-
bwGRiD is online again with some new features. New queues have been established and the old ones have been modified. You have more time and resources for your jobs. You have 30TB storage and BFG users have new homes.
-
bwGRiD: Lustre one OST deactivated
-
One OST has been deactivated. One vdisk was lost and two are being recreated.
-
GPU computing on the bwGRiD
-
First tests on the bwGRiD with nvidia Tesla
-
bwGRiD LustreFS performance problems
-
LustreFS performance is currently degraded. Please try not to use it. We are searching for the cause.
-
bwGRiD: New single queue
-
We created a new single queue for all jobs that need only one node. We hope this gives us better control when scheduling jobs.
-
Transition of the cooling system done
-
The BFG cluster and bwGRiD are operational again.
-
hp32n1 RAM replaced
-
A memory module has been replaced today. The node had to be stopped for this procedure.
-
04.04.2011 bwGRiD frontend offline due to hardware problem (UPDATE)
-
Due to hardware problems with the BWS frontend server, it is currently not possible to submit jobs on the bwGRiD cluster Freiburg.
-
bwGRiD back online
-
It is now possible to submit jobs again.
-
BFG cluster back online
-
End of scheduled downtime
-
Cooling system crashed! Starting the clusters with caution!
-
A minimal setup of the clusters is running until the problem is solved.
The new cooling system crashed! The BFG and bwGRiD clusters had to be shut down.
-
bwGRiD: New software available / old removed
-
Some new software has been installed on the bwGRiD. The list is available
via 'module avail'.
-
BFG user homes up again (fs2)
-
There was a hardware problem with server fs2. Some of the BFG user homes were down. fs1 and users from the VO bwgrid were not affected.
-
Crash rack 5
-
Many jobs were killed. Please check the status of your jobs.
-
BFG Lustre file system maintenance on 22-23 August 2011
-
Maintenance of the Lustre file system will take place
from 08:00 on 22.08.2011 to 18:00 on 23.08.2011.
-
bwGRiD: TORQUE server crash
-
Some jobs could not be finished because there was no communication between the TORQUE server and the clients. We're sorry for the inconvenience.
-
bwGRiD: New software available / old will be deactivated
-
As you may know, the cooling system will undergo maintenance tomorrow. We will try to use this time to install some new software and deactivate old software.
-
bwGRiD: Quotas enabled for /gridhome
-
The limit is 25G soft and 50G hard!
Please save your files to the storage in Karlsruhe or to your local home. For calculations you can use your Lustre space in ${SCRATCH} (30T for all users). Only Grid users (dgbw...) are affected.
-
BFG Lustre offline
-
Due to a hardware failure, the Lustre file system is down until further notice.
We hope to get it back online soon.
-
Lustre back online
-
The maintenance has been successfully completed.
-
13.03.2012 downtime bfg cluster postponed
-
Necessary software, system and firmware
upgrades and reboots will take place in April.
-
24.04.2012 downtime bfg/bwGRiD clusters
-
The yearly scheduled maintenance of the air conditioning and important updates will last until 17:00 (estimated). Lustre maintenance will
take longer.
-
23.07.2012 Air Condition Maintenance (ended)
-
The air conditioning unit will be improved with a dust protector.
-
30.08.2012 Short downtime of bfg cluster
-
Upgrading dCache and cvmfs requires a short downtime and a restart
of services. The downtime is expected to last only from 10:00 to 14:00.
Queues have to be drained starting the evening before.
-
18.06.2013 fileserver crash, cluster down
-
Fileserver fs2, which hosts the system image of the cluster nodes and part of the user home directories, crashed.
-
19.06.2013 cluster up - bfg homedirs partially missing
-
Batch and grid functions of the bfg cluster are restored.
50% of the user homedirs are not available at the moment.
-
20.06.2013 bfg cluster up - homedirs available
-
The homedirs from fs2 were moved to fs1 and are available.
-
09.11.2013 bfg cluster down - fileserver crashed
-
The system of fileserver fs1, which hosts diskless NFS shares and 50% of the user homes, crashed.
-
10.11.2013 bfg cluster up - homedirs available
-
After some time-consuming hardware and installation
operations, fs1 is up again. Homedirs are available.
There was no data loss.
-
14.11.2013 bfg cluster up after nfs problems
-
fs1 was reinstalled with oi151a8, which provides better ZFS/NFS capabilities.
Individual reboots may still be needed to complete the system configuration.