{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cluster computing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Performance of geodynamic models\n", "\n", "For example, typical 2D thermomechanical geodynamic model:\n", "\n", " - Dimensions 1000 km x 1000 km\n", " - Resolution 1 km \n", " - **1 000 000 grid points**\n", " - Max. time step (diffusion limit) 16 kyrs\n", " - Four unknowns ($v_x$, $v_y$, $P$, $T$)\n", " - Four equations per grid point\n", " - Discretized versions of the equations: About 20 operations (+-\\*/) per equation\n", " - **80 000 000 operations (80 MFLOPS =** $80\\times10^6$ **FLOPS) per step**\n", " - Modern PC processors can do about 10-100 GFLOPS (GFLOP = $10^9$ FLOPS)\n", " - *Processor* could do 1000 steps per second\n", " - E.g. 50 Myrs / 16 kyrs/step = 3200 steps\n", " - **Model run time 3.2 secs**\n", " - BUT: Memory access time (random) approx. 50 ns\n", " - Each operation needs to fetch at least one number from memory\n", " - Worst case: random location. $80\\times10^6\\times50\\times10^{-9}~s=4.0~s$ per step\n", " - **Total runtime** (\"**wall clock time**\") $\\approx 4~\\mathrm{s/step}\\times3200~\\mathrm{steps}=3.5~\\mathrm{hours}$\n", " - Also, a lot of other \"book keeping\" during the model calculations\n", " \n", "### Do\n", "\n", " - Make a similar runtime estimation for a 3D model with same resolution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Memory hierarchy](img/memory-hierarchy.jpg)\n", "\n", "*Memory hierarchy of modern PC (https://allthingsvlsi.wordpress.com)*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cure: Split the job among multiple processors.\n", "\n", " - Each will have less operations to do\n", " - *Partitioning* of the job:\n", " - Each processors will handle its own grid points, or\n", " - Each processors will handle itw own part in solving the coefficient matrix\n", " - Each will have smaller memory region to worry about (can store numbers closer to the processing unit)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Processor architecture](img/arch1.jpg)\n", "\n", "*Processor architecture of a 4-core processor (http://sips.inesc-id.pt/~nfvr/msc_theses/msc10g/)*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Modern PCs already use multiple *cores* (CPUs within one physical processor).\n", "\n", " - No speedup if the program/code used does not support multiple cores!\n", " - Limited (currently) to about 16 cores, typically 2-4\n", " - Some PC hardware allows two physical processors\n", " \n", "More cores can be used by interconnecting multiple physical computers (*nodes*)\n", "\n", " - Needs a fast way to communicate between computers\n", " - Faster is better (>10 Gb/s)\n", " - Needs a protocol for CPUs/nodes to discuss with each other in order\n", " to distribute (partition) the work\n", " - One of the most common: MPI (Message Passing Interface)\n", " \n", "![](img/nodesNetwork.gif)\n", "\n", "*Architecture of a computing cluster*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**geo-hpcc** cluster\n", "\n", " - 35 nodes, each with 2 processors, each with 8 cores = 560 cores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Performance of parallel programs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will test the effect of running a code in parallel, using the *geo-hpcc* cluster.\n", "\n", "1. Login to the cluster using instructions at https://introgm.github.io/2018/instructions/cluster.html\n", "2. Type\n", "```\n", "$ cd mpi\n", "$ srun -n 64 python mpi.py\n", "```\n", "3. To see and edit the Python code\n", "```\n", " $ nano mpi.py\n", "``` " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do\n", "\n", " - Run the `mpi.py` script with different number of cores (modify the number after `-n`, try values between 1-256 cores). Keep record of core count and time elapsed. Use Excel/LibreOffice/Python to plot number of processors versus elapsed time. What kind of plot would you expect to see? What do you actually see?\n", " - Try plotting 2-base logarithm of time vs 2-base logarithm of number of cores used.\n", " - Try commands `squeue` and `sinfo` to see the job queue and the status of different nodes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parallel performance results\n", "You can find the results of the parallel performance exercise in the [exercise summary notebook](parallel-performance.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }