[mpich-discuss] HPC cluster network utilisation monitoring tool?

Benson Muite benson_muite at emailplus.org
Tue Oct 26 10:22:38 CDT 2021


May also be of interest:
https://github.com/octoshell/octoshell-v2
https://github.com/srcc-msu/job_statistics

There are related papers on JobDigest and OctoShell.

You will probably want to decide up front what data to measure and keep,
as many MPI performance monitoring tools produce far more information
than you need.
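If link-level throughput per run is all you need, a very small amount of
data may be enough. Below is a minimal sketch, assuming Linux nodes: it
reads the per-interface byte counters in /proc/net/dev before and after a
command and reports average receive/transmit utilisation as a fraction of
an assumed 1 Gbit/s link (the switch speed from the original question).
The helper names (parse_net_dev, read_counters, utilisation, measure) are
hypothetical. The counters include all traffic on the interface (NFS, ssh,
etc.), not just MPI, and only cover the node the script runs on, so on a
cluster you would need to collect them on every node.

```python
import subprocess
import time

# Assumed link speed: 1 Gbit/s, matching the switch in the question.
LINK_BPS = 1_000_000_000


def parse_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} parsed from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Split on the colon: older kernels may omit the space after "eth0:".
        iface, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_counters(path="/proc/net/dev"):
    """Snapshot the byte counters of every interface on this node."""
    with open(path) as f:
        return parse_net_dev(f.read())


def utilisation(before, after, seconds, iface, link_bps=LINK_BPS):
    """Average (rx, tx) utilisation of one interface as a fraction of link_bps."""
    rx_bytes = after[iface][0] - before[iface][0]
    tx_bytes = after[iface][1] - before[iface][1]
    return (rx_bytes * 8 / seconds / link_bps,
            tx_bytes * 8 / seconds / link_bps)


def measure(cmd, iface):
    """Run cmd to completion and return the average utilisation of iface."""
    before = read_counters()
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    after = read_counters()
    return utilisation(before, after, elapsed, iface)
```

For example, measure(["mpiexec", "-n", "4", "./app"], "eth0") would report
how busy eth0 was on the launching node during the run (./app and eth0 are
placeholders). Appending each result to a log over time gives the kind of
trend data that shows whether the 1 Gbit/s network is approaching
saturation; averages can hide short bursts, so for a closer look you could
sample the counters at intervals instead.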


On 10/26/21 6:00 PM, Raffenetti, Ken via discuss wrote:
> In a previous life, I used NAGIOS (https://www.nagios.com/solutions/network-monitoring/) to monitor all kinds of things on servers. It should be capable of telling you port-level network utilization. As for your switch, you might need to find something specific to the hardware.
> 
> On the MPI side, profilers like mpiP (https://software.llnl.gov/mpiP/) can capture usage statistics and produce a report which you can look to for insights on how MPI is performing for your applications.
> 
> Ken
> 
> On 10/19/21, 12:52 PM, "Nicholas Yue via discuss" <discuss at mpich.org> wrote:
> 
>      Hi,
> 
>        I am mainly using MPICH via the mpiexec that ships with ParaView.
> 
>        I have a small test cluster and a 1Gbit switch.
> 
>        What is the recommended way to determine and record the network utilization for a given MPI run?
> 
>        I was hoping to gather this information over time, so that I can plan network equipment upgrades proactively if the network starts to become the performance bottleneck.
> 
>      Cheers
>      --
>      Nicholas Yue
>      https://www.linkedin.com/in/nicholasyue/
> 
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
> 


