[mpich-discuss] HPC cluster network utilisation monitoring tool ?

Raffenetti, Ken raffenet at mcs.anl.gov
Tue Oct 26 13:08:02 CDT 2021


Apologies, I meant to link to the Nagios Core product, which is free, as an entry point to Nagios monitoring. https://www.nagios.com/products/nagios-core/

Ken

On 10/26/21, 10:01 AM, "Raffenetti, Ken via discuss" <discuss at mpich.org> wrote:

    In a previous life, I used Nagios (https://www.nagios.com/solutions/network-monitoring/) to monitor all kinds of things on servers. It should be able to report port-level network utilization. For the switch itself, you may need a tool specific to that hardware.
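Counter-based monitors (for example, a Nagios check polling a switch port over SNMP) typically derive utilization by sampling an interface's byte counter twice and dividing the difference by what the link could have carried in that interval. The Python sketch below shows only that arithmetic. The function name, its parameters and the 1 Gbit/s default are illustrative assumptions, not part of Nagios. Counter width matters here: a 32-bit octet counter on a busy 1 Gbit/s port wraps in roughly 34 seconds (2^32 bytes × 8 / 10^9 bit/s), so either poll more often than that or use 64-bit counters.

```python
def utilization(bytes_t0: int, bytes_t1: int, interval_s: float,
                link_bps: float = 1e9, counter_bits: int = 64) -> float:
    """Fraction of link capacity used between two byte-counter samples.

    bytes_t0, bytes_t1: interface octet counter at the start and end.
    interval_s: seconds between the two samples.
    link_bps: nominal link speed in bit/s (1 Gbit/s assumed here).
    counter_bits: counter width, used to correct a single wrap-around.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = bytes_t1 - bytes_t0
    if delta < 0:  # counter wrapped (at most once) between the samples
        delta += 2 ** counter_bits
    return delta * 8 / (interval_s * link_bps)


# 75,000,000 bytes in 1 s on a 1 Gbit/s port is 600 Mbit/s, i.e. 60% busy.
print(f"{utilization(0, 75_000_000, 1.0):.0%}")  # prints "60%"
```

Each direction of a full-duplex port has its own capacity, so received and transmitted counters are usually evaluated separately.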

    On the MPI side, profilers such as mpiP (https://software.llnl.gov/mpiP/) can capture MPI usage statistics and produce a report that gives insight into how MPI is performing for your applications.
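mpiP reports what MPI itself did. To track the wire-level side of the question over many runs, one option is to snapshot a node's interface counters before and after each launch and append the totals to a CSV file. This is a minimal sketch, assuming Linux (which exposes per-interface counters in /proc/net/dev) and a hypothetical interface name eth0; log_run, parse_net_dev, read_counters and net_usage.csv are illustrative names, not part of MPICH or mpiP. The counters include all traffic on the interface, not only MPI messages, and ranks on the same node normally communicate through shared memory, so that traffic will not show up here.

```python
"""Log per-run network traffic for one node by sampling /proc/net/dev."""
from __future__ import annotations

import csv
import subprocess
import sys
import time
from datetime import datetime, timezone


def parse_net_dev(text: str) -> dict[str, tuple[int, int]]:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        name, sep, fields = line.partition(":")
        if not sep:
            continue
        values = fields.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def read_counters(iface: str) -> tuple[int, int]:
    with open("/proc/net/dev") as f:
        return parse_net_dev(f.read())[iface]


def log_run(cmd: list[str], iface: str = "eth0",
            logfile: str = "net_usage.csv", link_bps: float = 1e9) -> int:
    """Run cmd, then append one CSV row of traffic totals for iface."""
    rx0, tx0 = read_counters(iface)
    start = time.monotonic()
    returncode = subprocess.run(cmd).returncode
    elapsed = time.monotonic() - start
    rx1, tx1 = read_counters(iface)
    rx, tx = rx1 - rx0, tx1 - tx0
    # Full duplex: each direction has its own capacity, so report the busier one.
    # This is an average over the whole run; short bursts at line rate are hidden.
    avg_util = max(rx, tx) * 8 / (elapsed * link_bps) if elapsed > 0 else 0.0
    with open(logfile, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(), " ".join(cmd),
            f"{elapsed:.3f}", rx, tx, f"{avg_util:.4f}",
        ])
    return returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(log_run(sys.argv[1:]))
```

For example, saving this as a hypothetical net_log.py and running `python net_log.py mpiexec -n 4 ./app` appends one row per job. Because the logged utilization is averaged over the run, a job that saturates the link in short bursts can still record a low figure; an interval-based monitor is the better tool for spotting peaks.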

    Ken

    On 10/19/21, 12:52 PM, "Nicholas Yue via discuss" <discuss at mpich.org> wrote:

        Hi,

          I am mainly using MPICH via the mpiexec that ships with ParaView.

          I have a small test cluster and a 1 Gbit/s switch.

          What is the recommended way to determine and record the network utilization for a given MPI run?

          I would like to gather this information over time, so that I can plan proactively for a network equipment upgrade if I find that the network is becoming the performance bottleneck.

        Cheers
        -- 
        Nicholas Yue
        https://www.linkedin.com/in/nicholasyue/

    _______________________________________________
    discuss mailing list     discuss at mpich.org
    To manage subscription options or unsubscribe:
    https://lists.mpich.org/mailman/listinfo/discuss


