Home Exam 2: Evaluation of Transports for Applications Running in a Cluster

In this assignment you will evaluate the performance of different transports available in the compute cluster at NopBox.

  • Compare and evaluate the performance of Gigabit Ethernet, IP over PCI Express (IPoPCIe) and SuperSockets.

  • Write a report where you present the evaluation and the main conclusions.

  • Create a poster (2 x A3 pages) that the group will present on May 4th.

Scenario / Testbed

The assignment will be graded based on the groups’ ability to produce useful and correct information within the boundaries of the given time and resources.

You must design a set of experiments to be performed on the testbed provided for the assignment.

The scenario in this assignment is the same cloud service provider, which needs to do some processing in its data center. The cloud provider has an 8-node cluster available for your evaluations. You should recommend which transport they should use for their MPI applications. In addition to standard Gigabit Ethernet, the machines are connected in a PCI Express network, so you can also use IP over PCI Express (IPoPCIe) or SuperSockets.

For your tests, try to use only two nodes. All nodes are connected to an 8-port PCI Express switch, so the cost of communication is the same between any pair of nodes in the cluster. There are several interesting metrics to measure, but the most important ones, which your evaluation should cover, are throughput and latency. For this assignment you can use the OSU benchmarks in the MPI library (located in the /opt/osu directory); you are also free to write your own.
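If you post-process benchmark output with a script, a small parser keeps the raw numbers and the analysis separate. The sketch below assumes the benchmark prints header lines starting with "#" followed by two whitespace-separated columns (message size in bytes, then the measured value); check this against the output you actually get on the cluster. The function name parse_osu_output and the sample text are made up for illustration and are not measurements.

```python
def parse_osu_output(text):
    """Parse two-column benchmark output into (size_bytes, value) pairs.

    Assumption: lines starting with '#' are headers, and every data line
    has exactly two columns (message size, measured value).
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) != 2:
            continue  # not a data row in the assumed format
        try:
            rows.append((int(fields[0]), float(fields[1])))
        except ValueError:
            continue  # ignore rows that are not numeric
    return rows


# Invented sample text in the assumed format (NOT real measurements).
SAMPLE = """# Example latency test
# Size          Latency (us)
0                       1.50
1                       1.75
1024                    4.25
"""

if __name__ == "__main__":
    for size, value in parse_osu_output(SAMPLE):
        print(size, value)
```

Saving the raw output of every run to a file and parsing it afterwards makes it easier to repeat the analysis and to show in the report exactly what was measured.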

You are free to use the built-in tools to do measurements (located in the /opt/DIS/bin directory). You are also free to modify the test applications or write your own benchmarks. If you choose to write your own benchmarks, it is important that you document this in your report.
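As a starting point for a benchmark of your own, here is a minimal ping-pong latency sketch using plain TCP sockets. It is an illustration, not one of the provided tools: it assumes the transport under test is reachable through the ordinary sockets API, it runs both ends on the loopback interface for demonstration, and it estimates one-way latency as half the round-trip time. All function names are hypothetical. For real measurements you would run the echo side on a second node and connect to that node's address on the interface you want to test.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(listener, size, rounds):
    """Accept one connection and echo `rounds` messages of `size` bytes."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(rounds):
            conn.sendall(recv_exact(conn, size))


def ping_pong(host, port, size, rounds):
    """Return one-way latency estimates in seconds (round trip / 2)."""
    payload = b"x" * size
    samples = []
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle's algorithm so small messages are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            start = time.perf_counter()
            sock.sendall(payload)
            recv_exact(sock, size)
            samples.append((time.perf_counter() - start) / 2)
    return samples


def run_local_demo(size=64, rounds=100):
    """Run server and client on loopback, only to show the mechanics."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener, size, rounds))
    server.start()
    samples = ping_pong("127.0.0.1", port, size, rounds)
    server.join()
    listener.close()
    return samples


if __name__ == "__main__":
    latencies = run_local_demo()
    print("median one-way latency (us):", sorted(latencies)[len(latencies) // 2] * 1e6)
```

A Python loop adds overhead of its own, so treat the numbers from a sketch like this as an upper bound and state that limitation in the report.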

There are several options (shown both in the group session and in the lecture) for adjusting the properties of all three transports. You are expected to experiment with them and give a thorough evaluation of the optimal settings for the cluster.

We encourage you to discuss the challenges and techniques across groups to reduce the overhead of getting into a new field of knowledge.

Report

You must write up the results as a technical report of no more than 4 pages in ACM format. The report is expected to include the core elements presented in the lectures under “A systematic approach to performance evaluation”. The results must be based on your own experiments and your own data.

The report is evaluated on writing quality, clarity of presentation, and the trustworthiness and correctness of the results. The evaluation does not consider whether related work (citations of other papers) is included.

Evaluation Details

In our evaluation of the reports, we will focus on the following elements:

  • Choice of metrics, workloads, system configuration parameters and methodology for the experiments

  • Use of statistically sound methods when analysing the data

  • Disposition of the available time (ability to collect and present useful information within the boundaries of the available resources)

  • Objectivity in defining the work, choosing metrics and workloads, in the analysis and in presenting the results

  • Transparency of reporting (exposure of assumptions and limitations to the reader)

  • Clarity of presentation
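As one illustration of what a statistically sound summary of repeated measurements can look like, the sketch below reports the mean, the sample standard deviation and an approximate 95% confidence interval for the mean. It is only one possible approach and rests on assumptions: the samples are treated as independent, and the interval uses the normal approximation (z = 1.96), which is reasonable only for a fairly large number of repetitions; for few repetitions a Student's t quantile is more appropriate. The function name summarize is hypothetical.

```python
import math
import statistics


def summarize(samples, z=1.96):
    """Return (mean, sample stdev, CI half-width) for a list of measurements.

    Assumptions: samples are independent, and z = 1.96 gives an approximate
    95% interval via the normal approximation. With few samples, replace z
    with the matching Student's t quantile.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate variability")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (n - 1)
    half_width = z * stdev / math.sqrt(len(samples))
    return mean, stdev, half_width


if __name__ == "__main__":
    # Invented numbers for illustration only (NOT real measurements).
    mean, stdev, half = summarize([10.0, 12.0, 11.0, 13.0, 9.0])
    print(f"mean = {mean:.2f}, stdev = {stdev:.2f}, 95% CI = +/- {half:.2f}")
```

Reporting the interval together with the number of repetitions lets the reader judge whether the difference between two transports is larger than the run-to-run variation.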

Bonus elements:

  • Analysis of metrics, beyond the core metrics listed above, that helps illustrate the qualities of the different transport mechanisms.

  • Custom workloads on the OpenMPI testbed.

  • Evaluation of one or more applications, such as distributed training of neural networks (e.g. TensorFlow) or databases.

 

Formalities

The deadline for handing in your assignment is Friday, April 28th at 23:59:59.999.

Deliver your report (as PDF) at https://devilry.ifi.uio.no/.

Each group should also prepare a poster (2 x A3 pages) and a quick talk (max 5 minutes, without slides) in which you pitch your poster to the class on May 4th. Name the poster file after your group, and e-mail it to inf5072@ifi.uio.no no later than noon (12:00) on May 2nd. We will then print the poster for you.

For questions and course related chatter, we have created a Slack space:

https://mpglab.slack.com/messages/inf5072/

 

There will be a prize for best poster/presentation (awarded by an independent panel and independent of the grade).

Please check the Dolphin & Cluster FAQ page for updates.

For questions please contact: inf5072@ifi.uio.no

 

Resources:

MPI Benchmarks (OSU Micro-Benchmarks)

Useful commands for tweaking SuperSockets, IPoPCIe and running MPI

 

Published Apr. 3, 2017 15:32 - Last modified Apr. 5, 2017 09:47