V Latin American Symposium on
High Performance Computing
Symposium: July 23-24, 2012
School: July 25 to August 3, 2012
The use and development of High Performance Computing in Latin America is growing steadily. The new challenges arising from the use of the computing capabilities of clusters, grids, and distributed systems for HPC help promote research and innovation in this area. Building on the great success of the previous four editions, in 2012 the Latin American Symposium on High Performance Computing grew to include three major events: the V HPCLatAm 2012 International Symposium (Buenos Aires, July 23-24), the High Performance Computing School (ECAR 2012, Buenos Aires, July 25 to August 3), and the HPC Day (La Plata, August 30) within the 41st Argentine Conference of Informatics (41 JAIIO).
The HPCLatAm 2012 International Symposium provided a regional forum fostering the growth of the HPC community in Latin America through the exchange and dissemination of new ideas, techniques, and research in High Performance Computing. The symposium featured invited talks from academia and industry, as well as short- and full-paper sessions presenting both mature work and new ideas in research and industrial applications. The submitted articles presented valuable new contributions in the areas of Parallel Algorithms and Architectures, High Performance Applications, Tools and Environments for High Performance System Engineering, Graphics Processing Units in High Performance Computing, Distributed and Grid Computing, and Parallelism and Data Sharing on Multi-core Architectures, among others.
AUTHORS: Andre Rossa (Engineering Simulation and Scientific Software, Rio de Janeiro, Brazil) and Alvaro Coutinho (High-Performance Computing Center, Department of Civil Engineering, Federal University of Rio de Janeiro, Brazil)
ABSTRACT: We study coupled incompressible viscous flow and advective-diffusive transport of a scalar. Both the Navier-Stokes and transport equations are solved using an Eulerian approach. The SUPG/PSPG stabilized finite element formulation is applied to the 8-node isoparametric hexahedron. The implementation uses the libMesh FEM library, which provides support for adaptive mesh refinement and coarsening as well as parallel computation. The Rayleigh-Bénard natural convection and planar lock-exchange density current problems are solved to assess the adaptive parallel performance of the numerical solution. Download full paper.
AUTHORS: Santiago Costarelli, Mario Storti, Rodrigo Paz, Lisandro Dalcín (CIMEC-INTEC-CONICET-UNL) and Sergio Idelsohn (CIMEC-INTEC-CONICET-UNL, International Center for Numerical Methods in Engineering (CIMNE), Technical University of Catalonia (UPC), Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Spain)
ABSTRACT: Graphics Processing Units have received much attention in recent years. Compute-intensive algorithms operating on multidimensional arrays that have nearest-neighbor dependencies and/or exploit data locality can achieve massive speedups. This work discusses a solver for the pressure problem in applications that use immersed boundary techniques to account for moving solid bodies. The solver is based on standard Conjugate Gradient iterations and depends on the availability of a fast Poisson solver on the whole domain to define a preconditioner. Download full paper.
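The preconditioned Conjugate Gradient iteration the abstract refers to can be sketched as follows. This is a minimal illustration over a generic symmetric positive definite system; a simple Jacobi (diagonal) preconditioner stands in for the paper's fast full-domain Poisson solve, and all names are illustrative:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=200):
    """Preconditioned Conjugate Gradients for a SPD matrix A.
    M_inv applies the preconditioner (in the paper, a fast Poisson solve)."""
    x = np.zeros_like(b)
    r = b - A @ x                  # initial residual
    z = M_inv(r)                   # preconditioned residual
    p = z.copy()                   # initial search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)      # optimal step along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # conjugate update of the direction
        rz = rz_new
    return x

# Hypothetical small SPD system; the diagonal preconditioner below is only
# a stand-in for the whole-domain Poisson solver described in the abstract.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = pcg(A, b, lambda r: r / np.diag(A))
```

The structure is the point here: each iteration needs one matrix-vector product and one preconditioner application, which is why a fast Poisson solve on the full domain makes an effective preconditioner on GPU hardware.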
AUTHORS: Rodrigo Alonso (Instituto de Física, Facultad de Ingeniería, Universidad de la República, Uruguay) and Sergio Nesmachnow (Centro de Cálculo, Facultad de Ingeniería, Universidad de la República, Uruguay)
ABSTRACT: This article presents the application of parallel computing techniques to process satellite imagery for solar resource estimation. A distributed-memory parallel algorithm is introduced that generates, from visible-channel images, the inputs required to feed a statistical solar irradiation model. The parallelization strategy distributes the images among the available processors, so that each image is accessed by only one process. The experimental analysis demonstrates that a maximum speedup of 2.32 is achieved when using four computing resources; beyond that point, performance degrades due to hard-disk input/output speed. Download full paper.
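The distribution strategy described above, where each image is read by exactly one process, amounts to a static partition of the image list. A minimal sketch (the `distribute` helper and the round-robin policy are illustrative assumptions, not the paper's code):

```python
def distribute(images, n_workers):
    """Statically assign each image to exactly one worker (round-robin),
    so that no image file is ever read by more than one process."""
    chunks = [[] for _ in range(n_workers)]
    for i, img in enumerate(images):
        chunks[i % n_workers].append(img)
    return chunks

# Five hypothetical image files split across two workers; each worker then
# processes only its own chunk, with no shared file access.
chunks = distribute(["img_%03d" % i for i in range(5)], 2)
```

Because the dominant cost is reading the images from disk, this kind of scheme scales only until the storage bandwidth saturates, which is consistent with the speedup ceiling the abstract reports at four processes.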
AUTHORS: Germán Gadea, Andrés Flevaris, Juan Souteras, Sergio Nesmachnow, Alejandro Gutiérrez and Gabriel Cazes (Universidad de la República, Uruguay)
ABSTRACT: This paper presents an efficient parallel algorithm for converting satellite imagery into binary files. The algorithm was designed to update, at global scale, the land cover information used by the WRF climate model. We present the characteristics of the implemented algorithm, as well as the results of a performance analysis and a comparison between two approaches to implementing it. The performance analysis shows that the parallel algorithm substantially outperforms the sequential algorithm for the problem, achieving a linear speedup. Download full paper.
AUTHORS: Juan Pablo Balarini, Martín Rodríguez, and Sergio Nesmachnow (Centro de Cálculo, Facultad de Ingeniería, Universidad de la República, Uruguay)
ABSTRACT: This article introduces a parallel neural network approach implemented on Graphics Processing Units (GPUs) to solve a facial recognition problem: deciding which way the face of a person in a given image is pointing. The proposed method uses the parallel capabilities of the GPU to train and evaluate a neural network for this problem. The experimental evaluation demonstrates that a significant reduction in computing times can be obtained, allowing large instances to be solved in reasonable time. A speedup greater than 8 is achieved with respect to a sequential implementation, and a classification rate above 85% is obtained. Download full paper.
AUTHORS: Mauro Canabé and Sergio Nesmachnow (Centro de Cálculo, Facultad de Ingeniería, Universidad de la República, Uruguay)
ABSTRACT: This work presents parallel implementations of the MinMin scheduling heuristic for heterogeneous computing on Graphics Processing Units, in order to improve its computational efficiency. The experimental evaluation of the four proposed MinMin variants demonstrates that a significant reduction in computing times can be attained, allowing large scheduling scenarios to be tackled in reasonable execution times. Download full paper.
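For reference, the classical sequential MinMin heuristic that the paper parallelizes can be sketched as follows; the small ETC (expected time to compute) matrix is a made-up example, not data from the paper:

```python
def min_min(etc):
    """Classical MinMin: etc[t][m] is the expected time to compute task t
    on machine m. Repeatedly pick the task whose best completion time is
    smallest and assign it to the machine achieving that time."""
    n_tasks, n_machines = len(etc), len(etc[0])
    ready = [0.0] * n_machines          # per-machine availability times
    unassigned = set(range(n_tasks))
    schedule = {}
    while unassigned:
        # For each pending task, the machine giving its minimum completion time.
        best = {
            t: min(range(n_machines), key=lambda m: ready[m] + etc[t][m])
            for t in unassigned
        }
        # Among those minima, pick the overall minimum (the "min of mins").
        t = min(unassigned, key=lambda u: ready[best[u]] + etc[u][best[u]])
        m = best[t]
        ready[m] += etc[t][m]
        schedule[t] = m
        unassigned.remove(t)
    return schedule, max(ready)

# Three hypothetical tasks on two heterogeneous machines.
etc = [[3.0, 1.0], [2.0, 4.0], [5.0, 2.0]]
schedule, makespan = min_min(etc)
```

The inner loop over all pending task-machine pairs is what makes MinMin quadratic in the number of tasks, and is the natural target for the GPU data-parallel variants the abstract describes.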
AUTHORS: Santiago Iturriaga, Sergio Nesmachnow (Universidad de la República, Uruguay), Francisco Luna, and Enrique Alba (Universidad de Málaga, Spain)
ABSTRACT: This work presents a parallel GPU implementation of a stochastic local search method to efficiently solve the task scheduling problem in heterogeneous computing environments. The research community has been searching for accurate schedulers for heterogeneous computing systems that run in reduced times. The parallel stochastic search proposed in this work is based on simple operators in order to keep the computational complexity as low as possible, thus allowing large scheduling instances to be tackled efficiently. The experimental analysis demonstrates that the parallel stochastic local search method on GPU computes accurate suboptimal schedules in significantly shorter execution times than state-of-the-art schedulers. Download full paper.
AUTHORS: Javier Arnedo-Fdez, Igor Zwir, and Rocío Romero-Zaliz (Dpt. of Computer Science and Artificial Intelligence, University of Granada, Spain)
ABSTRACT: In this work we report our first research steps toward using GPUs to accelerate biclustering of very large data sets, which are common in real-world applications such as biomedicine and biotechnology. The biclustering problem is NP-hard; thus, finding an optimal solution can be time consuming, especially when dealing with large data sets. We present a GPU-accelerated implementation of FLOC, a probabilistic move-based biclustering algorithm that can efficiently and accurately approximate biclusters with low mean squared residues without the impact of random interference. Results show that as the size of the dataset increases, the GP-GPU version of FLOC solves the biclustering problem much faster than the FLOC version running on a single CPU core. Download full paper.
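The mean squared residue mentioned above is the standard bicluster coherence measure (due to Cheng and Church) that FLOC-style algorithms minimize. A minimal sketch of how it is computed, on a made-up matrix:

```python
import numpy as np

def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the bicluster given by the row and column
    index sets; lower values mean a more coherent bicluster."""
    sub = data[np.ix_(rows, cols)]
    row_mean = sub.mean(axis=1, keepdims=True)   # per-row means
    col_mean = sub.mean(axis=0, keepdims=True)   # per-column means
    all_mean = sub.mean()                        # overall mean
    residue = sub - row_mean - col_mean + all_mean
    return float((residue ** 2).mean())

# A perfectly additive bicluster (each row is a shifted copy of the others)
# has residue exactly 0; this toy matrix is such a case.
data = np.array([[1.0, 2.0, 3.0],
                 [2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0]])
msr = mean_squared_residue(data, [0, 1, 2], [0, 1, 2])
```

Evaluating this score for many candidate row/column moves is the dominant cost of move-based biclustering, which is why it parallelizes well on a GPU.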
AUTHORS: Rafael F Garabato (Argentina Software Design Center, Intel Córdoba), Andrés More (Argentina Software Design Center, Intel Córdoba and Instituto Universitario Aeronáutico) and Victor Hugo Rosales (Argentina Software Design Center, Intel Córdoba)
ABSTRACT: This paper discusses how to decrease and stabilize network latency in a Beowulf system. Low latency is particularly important for reducing the execution time of High Performance Computing applications. Optimization opportunities are identified and analyzed across the different system components integrated in compute nodes, including device drivers, operating system services, and kernel parameters. This work contributes a systematic approach to optimizing communication latency, provided with a detailed checklist and procedure. Performance impacts are shown through benchmark figures and mpiBLAST as a real-world application. We found that, after applying different techniques, the default Gigabit Ethernet latency can be reduced from about 50 µs to nearly 20 µs. Download full paper.
AUTHORS: Diego Montezanti, Fernando Emmanuel Frati (Instituto de Investigación en Informática, Facultad de Informática, UNLP, Argentina and Consejo Nacional de Investigaciones Científicas y Técnicas), Dolores Rexachs, Emilio Luque (Departamento de Arquitectura de Computadores y Sistemas Operativos, UAB, Spain), Marcelo Naiouf (Instituto de Investigación en Informática, Facultad de Informática, UNLP, Argentina) and Armando De Giusti (Instituto de Investigación en Informática, Facultad de Informática, UNLP, Argentina and Consejo Nacional de Investigaciones Científicas y Técnicas)
ABSTRACT: The challenge of improving the performance of current processors is met by increasing the integration scale. This carries a growing vulnerability to transient faults, whose impact increases on multicore clusters running large scientific parallel applications. The need to enhance the reliability of these systems, coupled with the high cost of rerunning an application from the beginning, motivates software strategies specific to the target systems. This paper introduces SMCV, a fully distributed technique that provides fault detection for message-passing parallel applications by validating the contents of messages to be sent, preventing the transmission of errors to other processes and leveraging the intrinsic hardware redundancy of the multicore. SMCV achieves broad robustness against transient faults with reduced overhead, and strikes a trade-off between moderate detection latency and low additional workload. Download full paper.
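The core idea of validating outgoing message contents can be sketched in a few lines: run the computation that produces a message on redundant replicas and compare the results before the message leaves the process. The `validated_send`, `compute`, and `send` names below are hypothetical, and this toy runs both replicas in one thread rather than on the sibling cores SMCV exploits:

```python
import hashlib
import pickle

def validated_send(compute, send):
    """Replicate the computation that produces an outgoing message and
    compare the replicas' contents; only a matching message is sent, so a
    transient fault in one replica cannot propagate to other processes."""
    msg_a = compute()   # primary replica
    msg_b = compute()   # redundant replica (a sibling core, in SMCV)
    digest = lambda m: hashlib.sha256(pickle.dumps(m)).hexdigest()
    if digest(msg_a) != digest(msg_b):
        raise RuntimeError("transient fault detected: message contents differ")
    send(msg_a)

# With a deterministic, fault-free computation the message goes through.
sent = []
validated_send(lambda: [1, 2, 3], sent.append)
```

Because comparison happens only at send points, detection latency is bounded by the interval between messages, which is the trade-off against workload that the abstract describes.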
AUTHORS: Germán Bianchini, Miguel Mendez-Garabetti and Paola Caymes-Scutari (Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza - Universidad Tecnológica Nacional, Argentina)
ABSTRACT: Several propagation models have been developed to predict forest fire behaviour. They can be grouped into empirical, semi-empirical, and physical models, and can be used to develop simulators and tools for preventing and fighting forest fires. Nevertheless, in many cases these models present limitations related to the need for a large number of input parameters. Furthermore, such parameters often carry some degree of uncertainty, since it is impossible to measure all of them in real time; they must instead be estimated from indirect measurements, which negatively impacts the output of the model. In this paper we present a method that combines Statistical Analysis with Parallel Evolutionary Algorithms (taking advantage of the computational power provided by High Performance Computing) to improve the quality of the model's output. Download full paper.
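The evolutionary calibration of uncertain input parameters can be illustrated with a deliberately tiny sketch: a population of candidate values for one parameter evolves toward the value that best reproduces an observation. The toy spread model, bounds, and all names are assumptions for illustration; the paper's parallel EA evolves many parameters against full fire simulations:

```python
import random

def evolve(fitness, bounds, pop_size=20, gens=50, seed=1):
    """Minimal evolutionary search over one uncertain parameter: mutate each
    individual with Gaussian noise and keep the better of parent and child."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(gens):
        children = [min(max(p + rng.gauss(0, 0.1 * (hi - lo)), lo), hi)
                    for p in pop]
        pop = [c if fitness(c) < fitness(p) else p
               for p, c in zip(pop, children)]
    return min(pop, key=fitness)

# Hypothetical calibration: find the wind factor that best reproduces an
# observed spread rate under a toy model spread = 2.0 * wind_factor.
observed = 3.0
best = evolve(lambda w: abs(2.0 * w - observed), bounds=(0.0, 5.0))
```

Each individual's fitness evaluation is independent, which is what makes this family of methods embarrassingly parallel and a good match for HPC resources.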