What is my HPCC user name/password?
If you are affiliated with MSU, then your MSU NetID is your user name, and your NetID password is your HPCC password. These are the same credentials you use for all MSU online services. An HPCC account must be requested by an MSU faculty member at https://contact.icer.msu.edu/account
My login was denied after multiple failed attempts. Can I reset my password on the HPCC?
No. The authentication on the HPCC is directly tied to MSU. You will need to request a password reset at https://netid.msu.edu/netid/password/index.html
I used to be able to connect to the HPCC server, but now I can't. Why?
There can be multiple reasons for this, such as system downtime (so please check https://wiki.hpcc.msu.edu/display/Announce first). Another common reason is account expiry. The HPCC periodically disables users who are no longer affiliated with the university or registered with a class for which the instructor has created temporary student accounts. To re-activate your HPCC account, please have your PI submit a sponsoring form at https://contact.icer.msu.edu/sponsoredrenewal
Can you keep me posted on the current status of HPCC?
Yes. Users are encouraged to check the HPCC Announcements blog (https://wiki.hpcc.msu.edu/display/Announce) regularly to stay informed about the status of the HPCC, such as scheduled downtimes and urgent notices.
I am looking for help to troubleshoot my problem. How do I share my code/files with you?
We do not access your directory to view files or test your code. Please attach your files to your reply to the ticket email.
Is there any limit per user on using the HPCC resources?
Limit on running a program on a dev-node: 2 CPU hours. If you are running a multi-threaded program, the wall time limit is roughly 2 hours divided by the number of threads.
Limit on file counts: 1 million files for each of the home, research and scratch spaces.
Limit on storage size: each user has up to 1 TB of storage for free (for each of the home and research directories); beyond 1 TB, the cost is $125 per TB per year. For the newest scratch space (i.e. /mnt/scratch/<your_user_name>), 50 TB is the maximum and can't be increased further (users will need to archive or delete files when this limit has been exceeded).
Limit on cluster usage: 1) the longest wall time you can request is 7 days; 2) the maximum number of CPU cores you can use at any time is 1040; 3) the maximum number of jobs that can be queued is 1000; 4) non-buyin users have a maximum of 1 million CPU hours per year.
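As a rough illustration of the storage pricing above, the following shell sketch estimates the yearly charge for a single home or research space. The function name is hypothetical, and the sketch assumes that only whole terabytes beyond the free 1 TB are charged at the quoted rate; actual billing rules may differ.

```shell
# Hypothetical helper: estimated yearly cost in dollars for one space,
# given its size in whole TB.
# Assumes the first 1 TB is free and each additional TB costs $125 per year.
storage_cost_per_year() {
    local used_tb="$1"
    local free_tb=1
    local rate=125
    if [ "$used_tb" -le "$free_tb" ]; then
        echo 0
    else
        echo $(( (used_tb - free_tb) * rate ))
    fi
}
```

For example, "storage_cost_per_year 3" estimates the yearly cost of a 3 TB research space under these assumptions.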
I would like to know more about the dev-node limit.
When you connect to any of the HPCC's dev-nodes, you will see the following message:
"processes on development nodes are limited to two hours of CPU time."
The two-hour CPU time limit applies to each process you run on that dev-node. If one process uses more than 2 hours of CPU time, only that process is killed; you can still connect to that dev-node and run another process. Because the limit counts CPU time rather than wall time, a process that uses 100% CPU (1 core) is terminated after two hours, a process that uses 200% CPU (2 cores) is terminated after one hour, and so on.
How do I check my CPU time usage?
Run the "cputime" command. You can also run "sreport" to get usage from a given start date, for example "sreport user top user=<username> start=2021-01-06 -t hour".
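The arithmetic behind this limit can be sketched in shell. The function below is a hypothetical helper, not an HPCC tool, and it assumes every thread keeps one core fully busy for the whole run, which real programs rarely do exactly.

```shell
# Hypothetical helper: approximate wall-clock minutes before a dev-node
# process reaches the CPU-time limit.
# $1 = number of fully busy threads; $2 = limit in CPU hours (defaults to 2).
devnode_walltime_minutes() {
    local threads="$1"
    local limit_hours="${2:-2}"
    echo $(( limit_hours * 60 / threads ))
}
```

For example, "devnode_walltime_minutes 4" gives the approximate number of minutes a 4-thread process can run under these assumptions.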
How do I get my storage usage data?
Run the "quota" command. You cannot write new files once your quota has been used up.
I have a buyin account. Do I need to specify it when I submit jobs?
No. When you submit a job without specifying an account, your default account is used. You can check your default account using the "buyin_status -l" command; a buyin user's default is their buyin account. We recommend you read https://wiki.hpcc.msu.edu/x/3IAhAQ if you have purchased buyin nodes.
What is HPCC's data backup policy?
We back up data in users' home and research directories, not in their scratch spaces.
Hourly backups are available for the previous 24 hours. For earlier days, only daily backups are available. Daily backups are performed at 12 AM Eastern Time and retained for 60 days.
My files in the scratch space are gone.
Files in scratch are automatically purged if their last modification time is older than 45 days. Note that the scratch spaces are not intended for long-term storage, and files saved in scratch are not backed up.
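To spot files that are getting close to the purge age, something like the following shell sketch may help. The function name and the 30-day default are illustrative assumptions; the purge itself is run by the HPCC and may evaluate file age differently from "find".

```shell
# Hypothetical helper: list regular files under a directory that were last
# modified more than a given number of days ago.
# find measures age in whole 24-hour periods, so "-mtime +30" matches files
# that are at least 31 days old.
stale_files() {
    local dir="$1"
    local days="${2:-30}"
    find "$dir" -type f -mtime +"$days"
}
```

For example, "stale_files /mnt/scratch/<your_user_name> 30" lists files you may want to copy to your home or research space before they are purged.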
I can't transfer files from/to my scratch space.
If you use hpcc.msu.edu as the hostname, you are connecting to the gateway (login) node, which has no scratch mounted. Connect to rsync.hpcc.msu.edu instead; it is a dedicated server for file transfer.
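A transfer through the file-transfer server could look like the sketch below. The helper name is hypothetical; the scratch path follows the /mnt/scratch/<your_user_name> pattern mentioned in the limits question, and "rsync -av" is just one common choice of options.

```shell
# Hypothetical helper: build an rsync destination on the file-transfer server
# that points at a user's scratch directory.
hpcc_scratch_target() {
    local user="$1"
    echo "${user}@rsync.hpcc.msu.edu:/mnt/scratch/${user}/"
}

# Example (run from your own computer, with your NetID in place of your_netid):
# rsync -av results "$(hpcc_scratch_target your_netid)"
```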
Do you support running GPU jobs?
Why did I get an "Illegal Instruction" error?
This is usually because a program was compiled on a newer CPU architecture (e.g., intel18) but then run on an older one (e.g., intel14). Our system has a range of CPUs, and the newest ones support instructions that are not available on the older CPUs. One short-term fix is to run programs on the same CPU type that they were compiled on. In our experience, this error has occurred only on intel14 nodes, so you should avoid them. That is, for dev-node testing, pick one of dev-intel16, dev-intel16-k80 and dev-intel18. For job submission, add #SBATCH --constraint="[intel16|intel18]" to your SLURM script.
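A minimal job script using this constraint might look like the sketch below. The job name, wall time and task count are placeholder values, and the final line stands in for your own program.

```shell
#!/bin/bash
# Placeholder job name, wall time and task count; adjust for your job.
#SBATCH --job-name=no-intel14
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
# Request nodes that are either intel16 or intel18, never intel14.
#SBATCH --constraint="[intel16|intel18]"

# Replace this line with the program that raised the error.
echo "Running on node: $(uname -n)"
```

Submit the script with "sbatch" as usual; the node name it prints shows where the job actually ran.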
How do I use Python on HPCC system?
There are two methods: users can install their own version of Python with Anaconda or use the versions of Python installed on the HPCC system. To support our users, a wiki page specifically addressing these questions is provided: https://wiki.hpcc.msu.edu/display/ITH/Python
I tried to use python matplotlib to plot, but got an error of "No module named '_tkinter'".
If you use the default python module (/opt/software/Python/3.6.4-foss-2018a/bin/python) on a dev-node, you need to load the Tkinter module before using python in order to proceed without errors. Run: module load Tkinter/3.6.4-Python-3.6.4
I have a Python conflict. What should I do to resolve it?
Upon login to a dev-node, a default module list will load automatically. Since Python/3.6.4 is included in the list, it can interfere with a user's conda environment. As a consequence, your program may not be able to find packages installed in your conda environment even if it has been activated. In other words, the program still picks up Python/3.6.4 in the module system. The solution is to run "module unload Python" before activating the conda environment.
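One way to check whether you are affected is to see which python comes first on your PATH after activating your environment. The function below is a hypothetical helper; it relies on the CONDA_PREFIX variable that "conda activate" sets.

```shell
# Hypothetical helper: succeed only if the python found first on PATH lives
# inside the currently active conda environment.
# CONDA_PREFIX is set by "conda activate" and is empty when nothing is active.
python_is_from_conda() {
    [ -n "${CONDA_PREFIX:-}" ] || return 1
    case "$(command -v python)" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

After running "module unload Python" and activating your environment, "python_is_from_conda && echo OK" should print OK. If it does not, the module system's Python is probably still being picked up.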
How do I deactivate the Conda base environment?
Many users have reported that after a local installation of Anaconda on the HPCC, their login prompt changes to something starting with "(base) -bash-4.2$". This is because conda activates the default environment, "base", upon startup. To disable this behavior, which often results in conflicts with system defaults, users can run the following command:
conda config --set auto_activate_base False
Why did my "module load" command output errors?
There are many reasons that errors occur when you try loading a module. However, the most common cause is that you have forgotten to run "module purge" first.
Sometimes, "module spider" can also fail to find the module. Most likely this is because your personal module cache is out of date. To clear it, run "rm -rf ~/.lmod.d/.cache".
We have module load examples at https://wiki.hpcc.msu.edu/x/yQQzAQ
I want to install some software packages. What should I do?
See here for instructions. Please note that we encourage users to install software on their own, if possible. The HPCC provides numerous versions of compilers and libraries, which should accommodate the vast majority of software across different fields.
When I submit a job, sometimes the following message is displayed: "Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions". Why does this message occur?
Once a job is submitted the scheduler adds it to the calculations and continues to update the status of the job as the system works. The status for a job will reflect the current state of the scheduler, so you will see this message update once the scheduler has found a place to put the job. There are always some nodes which are down or drained in the cluster due to normal maintenance, but the "reserved for jobs in higher priority partitions" is the important part, and simply indicates that the scheduler has not yet found a time to schedule the job. This will update as the scheduler continues to function.
How do I request software installation?
ICER currently has over 900 different software packages installed and available to users on the HPCC. Counting the different versions of each package, there are more than 5,000 system-wide software installations on the HPCC system, and that number is increasing monthly. Each month we receive requests from HPCC users for new software installations. Not all requested software can be installed system-wide. Some requests may be recommended for installation in the user's home or research group space instead.
If you are thinking of requesting the system-wide installation of a piece of software, we strongly recommend you consider the following factors when submitting your request:
(1) How popular is the software? If it is not popular, are there other users on the HPCC who would also be using it? If you are the only one using it, we would recommend it be installed in your home directory.
(2) What type of license agreement does the software have? Some software licenses may restrict use even when the software is free. Examples include software with export control, specific end-user agreements, etc. When a software license restricts use, we typically recommend that the user make an agreement directly with the software provider to obtain the software and install it in their home directory. If it will be used by a group of people, HPCC system administrators can help with setting up group access in compliance with the license agreement.
(3) Is the software well maintained and up-to-date? If the software you wish to install is legacy software or is not being well maintained, chances are its installation will require older versions of its dependencies as well. The effort to install this software may then be greater than the effort required to find an up-to-date package with the same, similar, or even better functionality. It may be time to consider transitioning to a newer package.
ICER’s Research Consulting Team is happy to help HPCC users resolve their software installation issues, no matter where the software is installed, either system-wide or in your private home or research directory.
Can I use the HPCC through a web browser?
Yes. Many new users attempt to log in to the HPCC via the wiki page (https://wiki.hpcc.msu.edu/). However, it is not possible to log in to the HPCC through the wiki pages. Instead, to access the HPCC via a web browser, use one of the following two portals:
1. HPCC OnDemand, login portal: https://ondemand.hpcc.msu.edu/
This is our new web access to HPCC resources. To see its features and how to use it, please visit the "Open OnDemand" wiki page.
2. Web-based Remote Desktop Protocol, login portal: https://webrdp.hpcc.msu.edu/
This is the old web access, which may be replaced by HPCC OnDemand in the future. Users who still use this portal should be aware that their Conda installation conflicts with this protocol. For proposed solutions to this issue, please visit the "Web-based Remote Desktop Protocol" wiki page.
What should I do when I cannot load a module?
MSU's High Performance Computing Center (HPCC) has a variety of software packages installed system-wide. To access this software, users must load modules, which manage the software and its dependencies systematically.
Sometimes, you may find that a particular module cannot be loaded. This occurs when the module requires other prerequisite modules to be loaded first. For example, loading OpenMPI/3.1.4 requires GCC/8.3.0 to be loaded beforehand. The command "module spider <module_name>" shows the names of all prerequisite modules for <module_name>. For example, "module spider OpenMPI/3.1.4" will show which modules are required in order to load OpenMPI/3.1.4.
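Loading prerequisites in the right order can be wrapped in a small shell function, sketched below. The function name is hypothetical, and the sketch assumes that "module load" returns a non-zero status when a module fails to load.

```shell
# Hypothetical helper: start from a clean module environment, then load the
# given modules in order, stopping at the first one that fails.
# Assumes "module load" returns a non-zero status on failure.
load_in_order() {
    local m
    module purge || return 1
    for m in "$@"; do
        if ! module load "$m"; then
            echo "Could not load $m; try: module spider $m" >&2
            return 1
        fi
    done
}
```

For the example above, "load_in_order GCC/8.3.0 OpenMPI/3.1.4" loads the prerequisite first and then OpenMPI.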
I have issues with copying files to my HPCC research space.
Many users have reported problems copying or transferring files to their research space. Although their research space still has plenty of available space, they get the following error message:
“failed to ... ... Disk quota exceeded”
This problem may occur because the folders to which you copy or transfer files have incorrect group ownership or no set-group-ID bit. Please read the "Instruction for using research space" section of the Research Space wiki page to learn how to resolve this issue.
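The general idea of fixing group ownership and the set-group-ID bit can be sketched with standard tools. This is an illustration, not a copy of the wiki instructions; the function name is hypothetical and the group name you pass in should be your research group's.

```shell
# Hypothetical helper: give a directory tree to a group and set the
# set-group-ID bit on every directory in it, so that new files created
# inside inherit the group.
# $1 = group name or ID; $2 = directory to fix.
fix_group_dir() {
    local group="$1"
    local dir="$2"
    chgrp -R "$group" "$dir" || return 1
    find "$dir" -type d -exec chmod g+s {} +
}
```

You can check the result with "ls -ld <directory>": a directory with the set-group-ID bit shows an "s" in the group permissions.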
What is powertools?
Powertools is a collection of software tools and examples that allows researchers to better utilize HPC systems. Powertools was created to help advanced users use the HPCC more effectively. To learn more about powertools, run the "powertools" command.
I want to copy files from/to my Microsoft OneDrive/Google Drive.
Rclone is currently installed on the HPCC. This software helps HPCC users sync files and directories between MSU’s HPCC and their cloud storage, including OneDrive and Google Drive.
Instructions for starting Rclone on the HPCC are provided in the HPCC documentation. Instructions for configuring rclone on the HPCC are also provided.
A new tool, cloudSync, has been developed to help users sync files and directories. After running "rclone config" successfully, load the tool with the command "module load powertools", then try the "cloudSync" command. More information on how to use cloudSync is available in the HPCC documentation.
How do I check HPCC node usage?
Users can see this information by running the "node_status" command on any dev-node.
I know that the HPCC offers a storage purchase plan. However, I do not need to access my data frequently. Does the HPCC offer a cheaper long-term archiving plan?
We do not. However, MSU offers the Data Storage Finder (https://data-storage-finder.tech.msu.edu), which lists several possible options for data archiving.