[mpich-discuss] HPC cluster network utilisation monitoring tool?

Raffenetti, Ken raffenet at mcs.anl.gov
Tue Oct 26 10:00:24 CDT 2021


In a previous life, I used NAGIOS (https://www.nagios.com/solutions/network-monitoring/) to monitor all kinds of things on servers. It should be able to report port-level network utilization on your nodes. For the switch itself, you might need a tool specific to the hardware.
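
For a rough per-run number without setting up a full monitoring stack, one option on Linux nodes is to read the interface byte counters from /proc/net/dev before and after the run and compare the difference with the link speed. Below is a minimal sketch of that idea, not a tested tool. The function names (parse_net_dev, utilization, read_counters, measure_run) are made up for illustration, and the 1 Gbit/s default is an assumption taken from the switch mentioned in the question. The counters cover all traffic on the interface of the node where the script runs, not only MPI traffic, and an average over the whole run will hide short bursts.

```python
import subprocess
import time


def parse_net_dev(text):
    """Parse /proc/net/dev content into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 16:
            # field 0 is received bytes, field 8 is transmitted bytes
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, seconds, link_bits_per_s=1e9):
    """Return (rx, tx) as average fractions of link capacity over the interval."""
    rx = (after[0] - before[0]) * 8 / seconds / link_bits_per_s
    tx = (after[1] - before[1]) * 8 / seconds / link_bits_per_s
    return rx, tx


def read_counters(path="/proc/net/dev"):
    """Read the current counters from the kernel (Linux only)."""
    with open(path) as f:
        return parse_net_dev(f.read())


def measure_run(iface, cmd, link_bits_per_s=1e9):
    """Run cmd and return ((rx, tx), elapsed_seconds) for iface on this node."""
    before = read_counters()[iface]
    t0 = time.monotonic()
    subprocess.run(cmd, check=False)
    elapsed = time.monotonic() - t0
    after = read_counters()[iface]
    return utilization(before, after, elapsed, link_bits_per_s), elapsed
```

A call might look like measure_run("eth0", ["mpiexec", "-n", "4", "./app"]), where the interface name and the command are placeholders for your own setup. Appending each result to a log file with a timestamp would give the history over time that the question asks about. For a multi-node job the same measurement would have to be taken on every node.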

On the MPI side, profilers like mpiP (https://software.llnl.gov/mpiP/) can capture usage statistics and produce a report that gives you insight into how MPI is performing for your applications.

Ken

On 10/19/21, 12:52 PM, "Nicholas Yue via discuss" <discuss at mpich.org> wrote:

    Hi,

      I am mainly using MPICH via the mpiexec that ships with ParaView.

      I have a small test cluster and a 1Gbit switch.

      What is the recommended way to determine and record the network utilization for a given MPI run?

      I was hoping to gather this information over time so that I can plan network equipment upgrades proactively, before the network becomes the performance bottleneck.

    Cheers
    -- 
    Nicholas Yue
    https://www.linkedin.com/in/nicholasyue/


