[mpich-discuss] HPC cluster network utilisation monitoring tool ?

Raffenetti, Ken raffenet at mcs.anl.gov
Tue Oct 26 13:08:02 CDT 2021

Apologies, I meant to link to the Nagios Core product, which is free, as an entry point to Nagios monitoring. https://www.nagios.com/products/nagios-core/


On 10/26/21, 10:01 AM, "Raffenetti, Ken via discuss" <discuss at mpich.org> wrote:

    In a previous life, I used Nagios (https://www.nagios.com/solutions/network-monitoring/) to monitor all kinds of things on servers. It should be able to report port-level network utilization on each server. For the switch itself, you may need a tool specific to that hardware.
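As a rough illustration of what a port-level check does, here is a minimal Python sketch of a Nagios-style plugin. It assumes a Linux node, because it reads the kernel's /proc/net/dev byte counters. The 1 Gbit/s capacity, the 5-second sample window, the 70%/90% thresholds and the script name `check_net.py` are hypothetical choices, not anything from this thread. It follows the usual Nagios plugin conventions: exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, and performance data after the `|`.

```python
"""Hypothetical Nagios-style check: NIC utilization from Linux /proc/net/dev.

Assumptions (not from the thread): Linux host, 1 Gbit/s link, 5 s sample
window, 70%/90% warning/critical thresholds.
"""
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_proc_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Receive bytes is field 0; transmit bytes is field 8.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization_percent(before, after, seconds, link_bps):
    """Busiest direction (rx or tx) as a percentage of link capacity.

    A full-duplex link has the full capacity in each direction, so the
    larger of the two rates is compared against link_bps.
    """
    rx_bps = (after[0] - before[0]) * 8 / seconds
    tx_bps = (after[1] - before[1]) * 8 / seconds
    return 100.0 * max(rx_bps, tx_bps) / link_bps


def nagios_status(pct, warn=70.0, crit=90.0):
    """Map a utilization percentage to (exit code, status line)."""
    if pct >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif pct >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after "|" is Nagios performance data: label=value[UOM];warn;crit
    return code, f"NET {label} - {pct:.1f}% | util={pct:.1f}%;{warn};{crit}"


def read_counters():
    with open("/proc/net/dev") as f:
        return parse_proc_net_dev(f.read())


def main(iface, seconds=5.0, link_bps=1e9):
    try:
        before = read_counters()[iface]
        time.sleep(seconds)
        after = read_counters()[iface]
    except (OSError, KeyError) as exc:
        print(f"NET UNKNOWN - cannot read counters: {exc}")
        return UNKNOWN
    pct = utilization_percent(before, after, seconds, link_bps)
    code, message = nagios_status(pct)
    print(message)
    return code


# Run as: python check_net.py eth0
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Nagios Core would run a check like this on a schedule and can graph the `util` performance data over time. Counter wraparound and interface resets are not handled in this sketch.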

    On the MPI side, profilers like mpiP (https://software.llnl.gov/mpiP/) can capture MPI usage statistics and produce a report that gives insight into how MPI is performing in your applications.


    On 10/19/21, 12:52 PM, "Nicholas Yue via discuss" <discuss at mpich.org> wrote:


          I mainly use MPICH through the mpiexec that ships with ParaView.

          I have a small test cluster and a 1Gbit switch.

          What is the recommended way to determine and record the network utilization for a given MPI run?

          I would like to gather this information over time so that I can plan network equipment upgrades proactively if I find the network is becoming the performance bottleneck.
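For the per-run part of the question, one low-tech approach is to read the interface byte counters just before and just after the run and log the difference. The Python sketch below assumes Linux, an interface named `eth0`, a 1 Gbit/s link and a log file `net_usage.csv`; all of these are hypothetical. It only sees the node it runs on, and it counts all traffic on that interface, not just MPI messages.

```python
"""Hypothetical per-run recorder: NIC bytes moved during one mpiexec run.

Assumptions (not from the thread): Linux, interface "eth0", a 1 Gbit/s link,
and a CSV log named "net_usage.csv". Counters cover ALL traffic on the
interface, so other activity on the node is included.
"""
import csv
import subprocess
import sys
import time
from datetime import datetime, timezone


def read_bytes(iface, path="/proc/net/dev"):
    """Return (rx_bytes, tx_bytes) for one interface."""
    with open(path) as f:
        for line in f:
            name, sep, data = line.partition(":")
            if sep and name.strip() == iface:
                fields = data.split()
                # Receive bytes is field 0; transmit bytes is field 8.
                return int(fields[0]), int(fields[8])
    raise KeyError(iface)


def summarize(before, after, seconds, link_bps=1e9):
    """Bytes moved and average utilization (busiest direction) over a run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rx = after[0] - before[0]
    tx = after[1] - before[1]
    avg_pct = 100.0 * max(rx, tx) * 8 / (seconds * link_bps)
    return {"rx_bytes": rx, "tx_bytes": tx, "seconds": round(seconds, 3),
            "avg_util_pct": round(avg_pct, 2)}


def record_run(cmd, iface="eth0", log_path="net_usage.csv"):
    """Run cmd (e.g. an mpiexec command line) and append one CSV row."""
    before = read_bytes(iface)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    seconds = time.monotonic() - start
    row = {"timestamp": datetime.now(timezone.utc).isoformat(),
           "command": " ".join(cmd),
           **summarize(before, read_bytes(iface), seconds)}
    with open(log_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if f.tell() == 0:  # new or empty file: write the header once
            writer.writeheader()
        writer.writerow(row)
    return row


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python record_run.py mpiexec -n 4 ./my_app", file=sys.stderr)
    else:
        print(record_run(sys.argv[1:]))
```

Because `avg_util_pct` is averaged over the whole run, short bursts that saturate the link will look smaller than they are. For upgrade planning, a rising trend in this column across runs is the signal to watch, ideally alongside periodic sampling from a monitoring tool such as Nagios.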

        Nicholas Yue
