{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cluster computing\n", "\n", "## Performance of geodynamic models\n", "\n", "![Memory hierarchy](img/memory-hierarchy.jpg)\n", "\n", "*Memory hierarchy of modern PC ([https://allthingsvlsi.wordpress.com](https://allthingsvlsi.wordpress.com))*\n", "\n", "Let's consider a typical 2D thermomechanical geodynamic model:\n", "\n", "- Dimensions: 1000 km x 1000 km\n", "- Resolution: 1 km \n", " - Grid points: **1 000 000**\n", " - Maximum time step (diffusion limit): 16 kyrs\n", "- Four unknowns: $v_x$, $v_y$, $P$, $T$\n", " - Four equations per grid point\n", "- Discretized versions of the equations:\n", " - About 20 operations (+ - \\* /) per equation\n", " - Operations per step: **80 000 000**\n", "- Modern PC processors can do about 10-100 GFLOPS (1 GFLOP = $10^9$ floating-point operations per second)\n", " - The *processor* could do 1000 steps per second\n", " - For example, 50 Myrs / 16 kyrs per step = 3200 steps\n", " - Model run time: **3.2 secs**\n", "- **BUT**: Memory access time (random): approx. 50 ns\n", " - Each operation needs to fetch at least one number from memory\n", " - Worst case: Random location:\n", " - $80\\times10^6\\times50\\times10^{-9}~s=4.0~s$ per step\n", " - **Total runtime** (\"**wall clock time**\"): $\\approx 4~\\mathrm{s/step}\\times3200~\\mathrm{steps}=3.5~\\mathrm{hours}$\n", "- Also, a lot of other \"book keeping\" during the model calculations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise - How heavy is a 3D model?\n", "\n", "- Make a similar runtime estimation for a 3D model with same resolution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Improving model performance\n", "\n", "The cure: Split the job onto multiple processors.\n", "\n", "- Each will have fewer operations to do\n", " - *Partitioning* of the job:\n", " - Each processor will handle its own grid points, or\n", " - Each processor will handle its own part in solving the coefficient matrix\n", "- Each will have a smaller memory region to worry about (can store numbers closer to the processing unit)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modern computer architecture\n", "\n", "![Processor architecture](img/arch1.jpg)\n", "\n", "*Processor architecture of a 4-core processor ([http://sips.inesc-id.pt/~nfvr/msc_theses/msc10g/](http://sips.inesc-id.pt/~nfvr/msc_theses/msc10g/))*\n", "\n", "Modern PCs already use multiple *cores* (CPUs within one physical processor).\n", "\n", "- No speedup if the program/code used does not support multiple cores!\n", "- Limited (currently) to about 16 cores, typically 2-4\n", " - Some CPUs with larger numbers of cores (up to 72) exist, but are expensive\n", "- Some PC hardware allows two physical processors\n", " \n", "More cores can be used by interconnecting multiple physical computers (*nodes*)\n", "\n", "- Needs a fast way to communicate between computers\n", " - Faster is better (>10 Gb/s)\n", "- Needs a protocol for CPUs/nodes to discuss with each other in order to distribute (partition) the work\n", " - One of the most common: MPI (Message Passing Interface)\n", " \n", "![Architecture of a computing cluster](img/nodesNetwork.gif)\n", "\n", "*Architecture of a computing cluster*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The **geo-hpcc** computer cluster\n", "\n", "![The geo-hpcc cluster](img/IMG_1903.jpg)\n", "\n", "- 35 nodes, each with 2 processors, each with 8 cores = 560 cores" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "## Performance of parallel programs\n", "\n", "We will test the effect of running a code in parallel, using the *geo-hpcc* cluster.\n", "\n", "1. Login to the cluster using instructions at [https://introgm.github.io/2020/instructions/cluster.html](https://introgm.github.io/2020/instructions/cluster.html)\n", "2. Type\n", "```bash\n", "$ cd mpi\n", "$ srun -n 64 python mpi.py\n", "```\n", "3. To see and edit the Python code\n", "```bash\n", "$ nano mpi.py\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise - Timing parallel performance\n", "\n", "- Run the `mpi.py` script with different number of cores (modify the number after `-n`, try values between 1-400 cores). Keep record of the core count and time elapsed. We'll compile our results and plot them as a group.\n", " - What kind of relationship would you expect to see?\n", " - What do you actually see?\n", "- Try commands `squeue` and `sinfo` to see the job queue and the status of different nodes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parallel performance results\n", "You can find the results of the parallel performance exercise in the [exercise summary notebook](https://introgm.github.io/2020/day-4/parallel-performance.html)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 2 }