
The importance of parallel computing

Suppose one wants to simulate a harbour with a typical domain size of 2 x 2 km2 with SWASH. In addition, we assume the following typical values: a peak wave period of 8 s, a water depth of about 20 m and a simulation period of 60 minutes.

According to the dispersion relation, kd = 1.4, so the wave length of the primary wave is about 90 m. We choose a grid size of 2 m, being 1/45 of the wave length. Requiring a wave Courant number of at most 0.5, the associated time step is 0.03 s. To take the higher harmonics into account accurately, two layers are chosen (see Table 5.2; the minimum wave period is 3.2 s, which is a factor 2.5 smaller than the peak period of 8 s). On a present-day computer (a 2.0 GHz Intel Core 2 processor) SWASH requires about 6 μs per grid point and per time step for a two-layer simulation. Hence, the simulation of our harbour takes about 17 days on a single processor to complete a run of 60 minutes of real time. This clearly shows the need for parallel computing.
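As an illustration of where these numbers come from, the following small C program solves the linear dispersion relation omega^2 = g k tanh(kd) by fixed-point iteration. The peak period of 8 s is taken from the example, while the depth of 20 m is an assumed value; the program is only a sketch and not part of SWASH.

    /* sketch: solve the linear dispersion relation omega^2 = g*k*tanh(k*d) */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double pi = 3.14159265358979;
        const double g  = 9.81;   /* gravitational acceleration [m/s2]   */
        const double T  = 8.0;    /* peak wave period [s], from example  */
        const double d  = 20.0;   /* water depth [m], assumed            */
        const double omega = 2.0 * pi / T;

        double k = omega * omega / g;           /* deep-water first guess */
        for (int i = 0; i < 100; ++i)           /* fixed-point iteration  */
            k = omega * omega / (g * tanh(k * d));

        printf("kd = %.2f\n", k * d);           /* about 1.4              */
        printf("L  = %.0f m\n", 2.0 * pi / k);  /* wave length about 90 m */
        return 0;
    }

The chosen grid size of 2 m is then indeed about 1/45 of this wave length.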


Different parallelization strategies can be considered, of which the most popular are data parallel programming, shared memory programming and distributed memory programming (message passing). Data parallel programming relies on automatic parallelizing compilers that enable loop-level parallelization. Generally, this approach will not yield high efficiency, mainly because a large portion of the existing code is inherently sequential.


On shared memory platforms, where all processors use a single, common memory, parallelization is usually done by multithreading with the help of OpenMP compiler directives. A drawback of this approach is that good parallel performance can typically only be obtained for up to about 16 processors.
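As an illustration of this kind of loop-level multithreading (a minimal sketch, not taken from the SWASH source), a per-point update of a flow variable on a structured grid can be distributed over the available threads with a single OpenMP directive:

    /* sketch: loop-level multithreading with an OpenMP compiler directive */
    void step(double *eta, const double *rhs, int nx, int ny, double dt)
    {
        #pragma omp parallel for collapse(2)
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i)
                eta[j * nx + i] += dt * rhs[j * nx + i];  /* per-point update */
    }

(Compiled with an OpenMP-capable compiler, e.g. gcc -fopenmp; all threads read and write the same, shared arrays.)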


Obtaining good scalability for a relatively large number of processors is usually achieved on distributed memory parallel machines, where each processor has its own private memory. A popular example of a distributed memory architecture is a cluster of Linux PCs connected via a fast network, since it is powerful, relatively cheap and available to nearly all end-users. The conventional methodology for parallelization on distributed computing systems is domain decomposition, which not only spreads the computational work over many processors but also provides the large amount of memory required. It leads to efficient parallel algorithms and is easy to program within a message passing environment such as MPICH2.
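The essence of domain decomposition for a structured grid is that each processor stores and updates only its own strip of the domain. A hypothetical sketch (not the actual SWASH partitioning) of dividing the ny grid rows of the example over nproc processors:

    /* sketch: divide ny grid rows as evenly as possible over nproc processors;
       each processor allocates only its own strip (plus halo rows)            */
    void my_rows(int ny, int nproc, int rank, int *jstart, int *jend)
    {
        int base = ny / nproc;   /* rows every processor gets at least         */
        int rest = ny % nproc;   /* the first 'rest' processors get one extra  */
        *jstart = rank * base + (rank < rest ? rank : rest);
        *jend   = *jstart + base + (rank < rest ? 1 : 0) - 1;   /* inclusive   */
    }

Besides spreading the work, this also spreads the memory: with 32 processors, each one holds only about 1/32 of the one million grid points of the example.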


A parallel version of SWASH based on the distributed memory paradigm and the MPI standard has been developed. The message passing is implemented with the high-level communication library MPICH2. Only simple point-to-point and collective communications are employed. No other libraries or software are required. For a full three-dimensional simulation with a high resolution we expect good, scalable performance.
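Such point-to-point and collective communications amount to operations like the ones sketched below: exchanging halo rows with the neighbouring processors of a one-dimensional strip decomposition, and agreeing on a global time step. This is a simplified illustration, not the actual SWASH code; the row layout (one halo row on each side of the strip) is an assumption.

    /* sketch: halo exchange (point-to-point) and global time step (collective) */
    #include <mpi.h>

    void exchange_halo(double *strip, int nx, int nrows, MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int below = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int above = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* send first interior row down, receive upper halo row from above */
        MPI_Sendrecv(&strip[1 * nx],           nx, MPI_DOUBLE, below, 0,
                     &strip[(nrows - 1) * nx], nx, MPI_DOUBLE, above, 0,
                     comm, MPI_STATUS_IGNORE);
        /* send last interior row up, receive lower halo row from below    */
        MPI_Sendrecv(&strip[(nrows - 2) * nx], nx, MPI_DOUBLE, above, 1,
                     &strip[0],                nx, MPI_DOUBLE, below, 1,
                     comm, MPI_STATUS_IGNORE);
    }

    double global_time_step(double dt_local, MPI_Comm comm)
    {
        double dt;   /* all processors must march with the same, smallest dt */
        MPI_Allreduce(&dt_local, &dt, 1, MPI_DOUBLE, MPI_MIN, comm);
        return dt;
    }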


Referring to the example above, numerical computations have been carried out for the full simulation period of our harbour on 1 through 32 computational cores of our Linux cluster. The results show a super-linear speedup of up to a factor 8.6 on 8 cores, which then levels off to a factor of 26 on 32 cores. As a result, the computing time is reduced to about 15 hours per 60 minutes of simulated time.
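Expressed as a short check (a sketch using only the figures quoted above):

    /* sketch: parallel efficiency and wall-clock time from the quoted figures */
    #include <stdio.h>

    int main(void)
    {
        const double serial_days = 17.0;                /* single processor   */
        const double speedup8 = 8.6, speedup32 = 26.0;  /* measured speedups  */

        printf("efficiency on  8 cores: %.0f %%\n", 100.0 * speedup8  /  8.0);
        printf("efficiency on 32 cores: %.0f %%\n", 100.0 * speedup32 / 32.0);
        printf("wall time on 32 cores : %.1f hours\n",
               serial_days * 24.0 / speedup32);  /* roughly the 15 hours above */
        return 0;
    }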

