[mpich-discuss] HPC cluster network utilisation monitoring tool ?
Benson Muite
benson_muite at emailplus.org
Tue Oct 26 10:22:38 CDT 2021
May also be of interest:
https://github.com/octoshell/octoshell-v2
https://github.com/srcc-msu/job_statistics
There are related papers on JobDigest and OctoShell.
You will probably want to determine what data to measure and keep, as many
of the MPI performance monitoring tools may give significantly more
information than you need.
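As a minimal starting point on Linux nodes, one could snapshot the kernel's per-interface byte counters in /proc/net/dev before and after a run, then convert the difference into average link utilisation. This is a hypothetical sketch, not a tool from the thread. The script, the interface name (e.g. eth0) and the 1 Gbit/s link speed are assumptions. It only sees traffic on the node where it runs, so it would need to run on every node. It also counts all traffic on that interface, not just MPI messages.

```python
"""Hypothetical sketch: average NIC utilisation around one MPI run on a Linux node.

Assumptions (not from the thread): Linux /proc/net/dev counters, an interface
name such as "eth0", and a nominal 1 Gbit/s link speed.
"""
import subprocess
import sys
import time

LINK_BITS_PER_SEC = 1_000_000_000  # assumed nominal 1 Gbit/s switch port


def parse_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} parsed from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_counters(path="/proc/net/dev"):
    with open(path) as f:
        return parse_net_dev(f.read())


def utilisation(before, after, iface, seconds, link_bps=LINK_BITS_PER_SEC):
    """Average rx/tx utilisation (fraction of link capacity) over an interval."""
    rx = after[iface][0] - before[iface][0]
    tx = after[iface][1] - before[iface][1]
    return rx * 8 / (seconds * link_bps), tx * 8 / (seconds * link_bps)


# Only run the measurement when an interface and a command are given, e.g.
#   python netutil.py eth0 mpiexec -n 4 ./app
if __name__ == "__main__" and len(sys.argv) > 2:
    iface, cmd = sys.argv[1], sys.argv[2:]
    before, t0 = read_counters(), time.monotonic()
    subprocess.run(cmd, check=False)
    after, elapsed = read_counters(), time.monotonic() - t0
    rx_u, tx_u = utilisation(before, after, iface, elapsed)
    print(f"{iface}: rx {rx_u:.1%}, tx {tx_u:.1%} of link over {elapsed:.1f} s")
```

Logging these per-run figures over time would show whether runs are approaching the capacity of the 1 Gbit switch. Averages hide short bursts, though, so periodic sampling during the run (or switch-side port counters) gives a fuller picture.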
On 10/26/21 6:00 PM, Raffenetti, Ken via discuss wrote:
> In a previous life, I used NAGIOS (https://www.nagios.com/solutions/network-monitoring/) to monitor all kinds of things on servers. It should be capable of telling you port-level network utilization. As for your switch, you might need to find something specific to the hardware.
>
> On the MPI side, profilers like mpiP (https://software.llnl.gov/mpiP/) can capture usage statistics and produce a report which you can look to for insights on how MPI is performing for your applications.
>
> Ken
>
> On 10/19/21, 12:52 PM, "Nicholas Yue via discuss" <discuss at mpich.org> wrote:
>
> Hi,
>
> I am mainly using MPICH via the mpiexec that ships with ParaView.
>
> I have a small test cluster and a 1Gbit switch.
>
> What is the recommended way to determine and record the network utilization for a given MPI run?
>
> I was hoping to gather such information over time so that I can plan proactively for a network equipment upgrade if I find that the network is becoming the performance bottleneck.
>
> Cheers
> --
> Nicholas Yue
> https://www.linkedin.com/in/nicholasyue/
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>