To improve system performance and stability, we are migrating ICER/HPCC systems to a new operating system image (CentOS 7), a new software stack (installed with EasyBuild), an upgraded module system (LMOD 7.7), and a new manager for jobs and cluster resources (SLURM). Migration of nodes to the new systems is in progress and will be completed by October 15th. To use the new environment, users will need to examine their applications, workflows, and scripts for any necessary changes.
For the operating system, nodes will be updated from CentOS 6.9 to CentOS 7.4, and the upgraded system libraries improve performance and compatibility. One major change is our use of Linux Control Groups (cgroups), which enforce limits on CPU and memory to match what your job requests through the scheduling system. In addition, jobs on all nodes will be limited to no more than 80% of each node's resources to reduce resource contention.
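As a sketch of what cgroup enforcement looks like from inside a running job (the paths below assume the typical cgroup v1 layout used by SLURM on CentOS 7; the exact hierarchy on our nodes may differ):

```shell
# From within a running job, inspect the memory limit that cgroups
# are enforcing for this job (should match your job's memory request):
cat /sys/fs/cgroup/memory/slurm/uid_${UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes

# A process that exceeds this limit is killed by the kernel's
# OOM handler rather than slowing down other users' jobs.
```

Because of this enforcement, it is now important to request realistic CPU and memory values: under-requesting memory will cause your job to be killed, while over-requesting may delay scheduling.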
For the software and module systems, we have installed recent versions of many software packages with EasyBuild, managed by the latest LMOD module system. Users should find improved software search functionality and better-optimized builds to support their research. Requests for additional software installations on the new systems are welcome.
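As a quick sketch of the improved search functionality, LMOD's `module spider` command searches the entire software tree, including modules that only become visible after loading a compiler or toolchain (the package name and version below are illustrative):

```shell
# Search the whole software tree for a package, across all toolchains
# (more thorough than "module avail", which only shows currently
# visible modules):
module spider GCC

# Show details and any prerequisite modules for a specific version
# (version number is illustrative):
module spider GCC/6.4.0-2.28

# Load the module and confirm what is active:
module load GCC/6.4.0-2.28
module list
```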
For job scheduling, the new environment uses SLURM to manage cluster resources instead of TORQUE/Moab. We are providing documentation on job management and job scheduling. You may also consult our Torque vs. SLURM comparison for a smooth transition of your job scripts and system commands.
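To illustrate the transition, here is a minimal SLURM job script with the rough Torque/PBS equivalents shown in comments (resource values, file names, and the program name are illustrative; see our documentation for the options supported on our clusters):

```shell
#!/bin/bash
# Minimal SLURM job script (values are illustrative).
#SBATCH --job-name=myjob          # was: #PBS -N myjob
#SBATCH --time=01:00:00           # was: #PBS -l walltime=01:00:00
#SBATCH --nodes=1                 # was: #PBS -l nodes=1:ppn=4
#SBATCH --ntasks-per-node=4
#SBATCH --mem=8G                  # was: #PBS -l mem=8gb

cd $SLURM_SUBMIT_DIR              # was: cd $PBS_O_WORKDIR
srun ./my_program                 # launch tasks with srun
```

The everyday commands change as well: submit with `sbatch myjob.sb` instead of `qsub`, monitor with `squeue -u $USER` instead of `qstat -u $USER`, and cancel with `scancel <jobid>` instead of `qdel <jobid>`.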
We have prepared documentation and a transition website for the new environment, which will be continually updated. To help users transition to the new systems, we will hold open office hours every Monday and Thursday at 1 PM. In addition, group consulting can be scheduled with our research consultants and system administrators to help you recompile your code and adapt your job scripts to the new environment. There are also upcoming training sessions; the schedule is posted at https://icer.msu.edu/upcoming-workshops
We have migrated a significant number of nodes, including sixty intel16 nodes, to the new environment so users can move their work there. To access these new features and to schedule jobs on Intel16 nodes in the new systems, users must log in to a new development node, dev-intel18. This dev node is a prototype for the new intel18 cluster that will be available in September. Because fewer nodes remain in the current (CentOS 6) environment, users may experience significantly longer job wait times there; jobs submitted to the new environment, however, should start running quickly. Information about the resources currently available in the new environment is available at: https://wiki.hpcc.msu.edu/x/QwIzAQ
Further updates will be posted on our announcements blog https://wiki.hpcc.msu.edu/pages/viewrecentblogposts.action?key=Announce and to social media https://twitter.com/icermsu and https://www.facebook.com/msuicer/ . If you are a principal investigator who has purchased Intel16 nodes through our buy-in program, please contact us if you want to upgrade your nodes to this new environment ahead of schedule. We will also be contacting buy-in users to help your group move to the new environment.
The next few months will be a time of transition; we appreciate your patience and understanding as we make these necessary changes to improve the MSU user experience. If you have any questions or concerns, please contact us at https://contact.icer.msu.edu/contact using the “Compute Transition” subject.