[mpich-discuss] HPC cluster network utilisation monitoring tool ?
raffenet at mcs.anl.gov
Tue Oct 26 10:00:24 CDT 2021
In a previous life, I used Nagios (https://www.nagios.com/solutions/network-monitoring/) to monitor all kinds of things on servers. It should be able to report port-level network utilization. As for your switch, you might need to find something specific to the hardware.
On the MPI side, profilers like mpiP (https://software.llnl.gov/mpiP/) can capture usage statistics and produce a report that shows how MPI is performing for your applications.
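For a quick per-run number without any monitoring stack, one option is to read the kernel's interface byte counters before and after the job. The following is only a minimal sketch, assuming a Linux node where /proc/net/dev is available; the script name netutil.py, the interface name eth0, and the function names are placeholders of mine, not part of MPICH, Nagios, or mpiP. It reports the average fraction of a 1 Gbit/s link used on the node where it runs, so short bursts are averaged out, traffic from other processes is counted too, and ranks on the same node that talk through shared memory will not show up.

```python
import subprocess
import sys
import time


def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, seconds, link_bits_per_s):
    """Return the average (rx, tx) fraction of link capacity used."""
    capacity_bits = link_bits_per_s * seconds
    rx_bits = (after[0] - before[0]) * 8
    tx_bits = (after[1] - before[1]) * 8
    return rx_bits / capacity_bits, tx_bits / capacity_bits


def read_counters(path="/proc/net/dev"):
    """Read the current byte counters for every interface."""
    with open(path) as f:
        return parse_net_dev(f.read())


def measure(command, iface, link_bits_per_s=1e9):
    """Run `command` and return ((rx, tx) utilization, elapsed seconds)."""
    before = read_counters()[iface]
    start = time.monotonic()
    subprocess.run(command, check=True)
    elapsed = time.monotonic() - start
    after = read_counters()[iface]
    return utilization(before, after, elapsed, link_bits_per_s), elapsed


if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.exit("usage: netutil.py IFACE COMMAND [ARGS...]")
    (rx, tx), elapsed = measure(sys.argv[2:], sys.argv[1])
    # One CSV-style line per run, easy to append to a log file over time.
    print(f"{time.time():.0f},{elapsed:.1f},{rx:.4f},{tx:.4f}")
```

It could be invoked as, for example, "python3 netutil.py eth0 mpiexec -n 4 ./app" on each node of interest, appending the output line to a file to build up a history. If the averages stay well below 1.0, the link is unlikely to be the bottleneck on that node; values close to 1.0 would be a reason to look more closely with a proper monitoring tool or at the switch itself.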
On 10/19/21, 12:52 PM, "Nicholas Yue via discuss" <discuss at mpich.org> wrote:
I am mainly using MPICH via the mpiexec that ships with ParaView.
I have a small test cluster and a 1Gbit switch.
What is the recommended way to determine and record the network utilization for a given MPI run?
I was hoping to gather this information over time so that I can plan a network equipment upgrade proactively if the network starts to become the performance bottleneck.