KEYNOTES
  • Keynote by Prof. Mateo Valero (title to be announced)

    AUTHOR: Prof. Mateo Valero (Barcelona Supercomputing Center, Spain)

    SCHEDULE: Monday, July 23, 9:30

    ABSTRACT: to be completed

    SHORT BIO: Prof. Mateo Valero is a professor in the Computer Architecture Department at UPC, in Barcelona. His research interests focus on high-performance architectures. He has published approximately 600 papers, has served in the organization of more than 300 international conferences, and has given more than 400 invited talks. He is the director of the Barcelona Supercomputing Center, the national supercomputing center of Spain. Dr. Valero has been honoured with several awards, among them the Eckert-Mauchly Award, the Harry Goode Award, the "King Jaime I" Award in research, and two Spanish National Awards, in Informatics and in Engineering. He has been named Honorary Doctor by Chalmers University of Technology, by the University of Belgrade, by the Universities of Las Palmas de Gran Canaria and Zaragoza in Spain, and by the University of Veracruz in Mexico. He is a "Hall of Fame" member of the IST European Program (selected in Lyon, November 2008, as one of the 25 most influential European researchers in IT during the period 1983-2008). In December 1994, Dr. Valero became a founding member of the Royal Spanish Academy of Engineering. In 2005 he was elected Corresponding Academician of the Spanish Royal Academy of Sciences, in 2006 a member of the Royal Spanish Academy of Doctors, and in 2008 a member of the Academia Europaea. He is a Fellow of the IEEE, a Fellow of the ACM, and an Intel Distinguished Research Fellow.

  • Capacity Planning in Data Centers for Web Search Engines

    AUTHOR: Prof. Mauricio Marín (Universidad de Santiago de Chile and Yahoo! Research Chile)

    SCHEDULE: Monday, July 23, 15:00

    ABSTRACT: Data centers for Web search engines are systems dedicated to running a single application/service, with algorithms and data structures optimized to process hundreds of thousands of user queries per second. Query traffic intensity is highly variable and unpredictable, which forces the nodes of the processor cluster to operate at low utilization in steady state in order to avoid saturation under sharp traffic variations. This talk describes performance evaluation techniques for search engines, which make it possible to carry out tasks such as optimizing processes, evaluating alternative designs, testing the system's capacity to absorb traffic variations, and dimensioning the amount of resources required to handle a given increase in traffic level when the system operates in steady state. The techniques described are based on the combined use of formulas from the operational analysis of computer system performance and process- and resource-oriented discrete event simulation.
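
    As a minimal illustration of the operational-analysis side (my own sketch, not material from the talk), the utilization law U = X·S relates throughput X and per-query service demand S, and can be used to size a cluster for a target steady-state utilization. The query rate, service demand, and utilization ceiling below are hypothetical.

      # Hedged sketch: cluster sizing from the utilization law U = X * S.
      # All numbers are hypothetical, not taken from the talk.
      import math

      def nodes_needed(query_rate, service_demand, target_utilization):
          """Minimum node count keeping per-node utilization below the target.

          Per-node utilization with n nodes: U = (query_rate / n) * service_demand.
          """
          return math.ceil(query_rate * service_demand / target_utilization)

      # 200,000 queries/s, 10 ms of service demand per query, and a 40%
      # utilization ceiling to absorb traffic bursts -> 5000 nodes.
      print(nodes_needed(200_000, 0.010, 0.40))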

    SHORT BIO: Mauricio Marín is a full professor in the Informatics Engineering Department at the Universidad de Santiago de Chile, and an associate researcher and the director of Yahoo! Labs Santiago. He obtained his PhD from the University of Oxford, UK. His research lines relate to parallel and distributed processing, with applications to information retrieval systems and databases.

  • Keynote by Prof. Adrián Cristal (title to be announced)

    AUTHOR: Prof. Adrián Cristal (Barcelona Supercomputing Center, Spain)

    SCHEDULE: Tuesday, July 24, 9:15

    ABSTRACT: to be completed

    SHORT BIO: to be completed

  • Keynote by Prof. Ian Foster (title to be announced)

    AUTHOR: Prof. Ian Foster (Argonne National Laboratory and University of Chicago, USA)

    SCHEDULE: Tuesday, July 24, 11:30

    ABSTRACT: We have made much progress over the past decade toward effectively harnessing the collective power of IT resources distributed across the globe. In fields such as high-energy physics, astronomy, and climate, thousands benefit daily from tools that manage and analyze large quantities of data produced and consumed by large collaborative teams. But we now face a far greater challenge: exploding data volumes and powerful simulation tools mean that far more researchers (ultimately, perhaps most) will soon require capabilities not so different from those used by these big-science teams. How is the general population of researchers and institutions to meet these needs? Must every lab be filled with computers loaded with sophisticated software, and every researcher become an information technology (IT) specialist? Can we possibly afford to equip our labs in this way, and where would we find the experts to operate them? Consumers and businesses face similar challenges, and industry has responded by moving IT out of homes and offices to so-called cloud providers (e.g., Google, Netflix, Amazon, Salesforce), slashing costs and complexity. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity. More importantly, we can free researchers from the burden of managing IT, giving them back their time to focus on research and empowering them to go beyond the scope of what was previously possible. I describe work we are doing at the Computation Institute to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date with the Globus Online system, and suggest a path towards large-scale delivery of these capabilities.

    SHORT BIO: to be completed

INVITED TALKS
  • Presentation of National System of High Performance Computing Large Scientific Equipment

    AUTHOR: Prof. Pablo D. Mininni

    SCHEDULE: Monday, July 23, 10:15

    ABSTRACT: The National System of High Performance Computing Large Scientific Equipment (SNCAD) is a joint initiative of the Science, Technology and Productive Innovation Ministry and the Interinstitutional Council of Science and Technology (CICyT), inserted in the Programme of Large Scientific Equipment and Databases. Its main purpose is to consolidate a national network of high-performance computing centers belonging to the scientific and academic community, aiming to meet the growing demand in the areas of storage, grid computing, high-performance and high-throughput computing, visualization, and other emerging technologies. This talk will introduce the system and explain the conditions and steps for associating equipment with this national system.

    SHORT BIO: Pablo D. Mininni received his diploma in 1999 and his doctoral degree in 2003, both in physics from the University of Buenos Aires (UBA), Argentina. From 2004 to 2007 he lived in Boulder, Colorado, in the United States, where he was a postdoc at the National Center for Atmospheric Research (NCAR). He is now professor and chair of the Physics Department at the University of Buenos Aires, and a scientist at NCAR. Dr. Mininni works on the numerical and theoretical study of turbulent flows, with applications in geophysics and astrophysics. In the field of fluid dynamics, his expertise includes parallelization methods for computational fluid dynamics, the application of statistical methods for the characterization and analysis of turbulent flows, spectral analysis of multi-scale and multi-physics phenomena, and sub-grid modeling for turbulent flows.

    Download full presentation
  • Meeting Exascale Challenges with File System Building Blocks

    AUTHOR: Andy McNeil (IBM)

    SCHEDULE: Monday, July 23, 14:30

    ABSTRACT: to be completed

    SHORT BIO: Andy is a Distinguished Engineer in IBM's Systems and Technology Group, based in Research Triangle Park, NC. He has been architecting and developing direct server-attached and external shared storage products for most of his 33+ year career with IBM. He is currently the Chief Engineer for Modular and Blade Storage in the IBM Storage Systems development organization, responsible for the architecture and technology strategy of storage products such as the IBM DS3000/5000 and Storwize V7000. He is also the leader of the storage architecture and attachment strategy for the IBM BladeCenter platform and the recently announced IBM PureSystems platform. He is an IBM Master Inventor with over 30 issued US patents and has been a member of the IBM Academy of Technology since 1999. He earned a BS in Electrical Engineering from Clemson University in 1979.

  • RISC Project: Initial Results

    AUTHOR: Prof. Ulises Cortés (Barcelona Supercomputing Center, Spain)

    SCHEDULE: Monday, July 23, 18:00

    ABSTRACT: The RISC project aims at deepening strategic R&D cooperation between Europe and Latin America in the field of High Performance Computing (HPC) by building a multinational and multi-stakeholder community that will involve a significant representation of the relevant European and Latin American HPC R&D actors (researchers, policy makers, users). RISC will identify common needs, research issues, and opportunities for cooperative R&D on HPC between the EU and Latin America in the transition to multi-core architectures across the computing spectrum and the relevant programming paradigms, algorithms, and modelling approaches, thus setting the basis for the formulation of a global strategy for future research.

    SHORT BIO: Prof. Ulises Cortés has been a professor and researcher at the Technical University of Catalonia (UPC) since 1982 (tenured since 1988, habilitated as Full Professor since 2006), working on several areas of Artificial Intelligence (AI) in the Software Department, including knowledge acquisition for and concept formation in knowledge-based systems, machine learning, and autonomous intelligent agents. Since 1989, Professor Cortés and his group have been applying their work in Artificial Intelligence to the environmental sciences, especially to wastewater treatment plants, with the financial support of CICyT, CIRIT, and the European Union. He is a co-founder of SISLTECH and has been a member of its Board of Directors since 2011.

    Download full presentation
  • Domain decomposition methods for the finite element solution of partial differential equations

    AUTHOR: Prof. Javier Príncipe (Universitat Politècnica de Catalunya, Spain)

    SCHEDULE: Tuesday, July 24, 10:00

    ABSTRACT: Future increases in the computational power of distributed-memory architectures will most likely be achieved with a substantially higher degree of multicore parallelism per node, which in turn will be accompanied by more complex hierarchical cache/memory designs. These two levels of hardware parallelism (inter-node and intra-node) are naturally exploited by a hybrid software design, which in turn may be realized by hybrid algorithms adapted to the organization of the underlying parallel hardware. In this scenario, domain decomposition (DD) methods provide a natural framework for the development of hybrid parallel solvers tailored to finite element (FE) analysis. Some finite element codes are built on a nested nonoverlapping decomposition of the domain, employing direct elimination of the interior variables of elements and of progressively larger macro-elements, created by assembling elements or smaller macro-elements. Iterative substructuring methods can be thought of as halting this process at some stage and solving the remaining linear system by a preconditioned Krylov space method. These methods are hybrid in an algorithmic sense by construction: direct at the subdomain level and iterative in the global coupling.
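
    As standard background for the iterative substructuring idea sketched above (textbook material, not taken verbatim from the talk), eliminating the interior unknowns u_I leaves a Schur complement system on the interface unknowns u_Γ, which is then solved with a preconditioned Krylov method:

      \begin{pmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma\Gamma} \end{pmatrix}
      \begin{pmatrix} u_I \\ u_\Gamma \end{pmatrix}
      =
      \begin{pmatrix} f_I \\ f_\Gamma \end{pmatrix},
      \qquad
      S \, u_\Gamma = f_\Gamma - A_{\Gamma I} A_{II}^{-1} f_I,
      \quad
      S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma}.

    The subdomain solves involving A_{II}^{-1} are direct and local to each processor, while the interface system is solved iteratively: exactly the algorithmic hybrid the abstract refers to.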

    SHORT BIO: Javier Príncipe (http://principe.rmee.upc.edu) is an assistant professor in the Fluid Mechanics Department, Universitat Politècnica de Catalunya (UPC), and an associate researcher at the International Center for Numerical Methods in Engineering (CIMNE), an autonomous research center in Barcelona. With a six-year degree in physics, obtained in 1999 at the Facultad de Ciencias Exactas y Naturales (School of Sciences), Universidad de Buenos Aires, he started working at the Center for Industrial Research (CINI) in Buenos Aires, implementing numerical methods to simulate turbulent flows and applying them to steel-making processes. In 2002 he joined CIMNE as a research assistant, taking part in Spanish and European projects. He held a doctoral fellowship from the Catalan government for his PhD studies between 2004 and 2007, and received his PhD from the Universitat Politècnica de Catalunya in April 2008. His thesis, entitled "Subgrid scale stabilized finite elements for low speed flows" and developed under the supervision of Prof. Ramon Codina, was awarded the UPC Premi Extraordinari de Doctorat 2010 (award for the best PhD thesis in Civil Engineering of 2008). He then returned to CIMNE as a Juan de la Cierva postdoctoral researcher between 2009 and 2010. He is the author of 16 publications in indexed international journals, more than 30 conference articles and technical reports, and 17 presentations at international conferences, as well as many invited talks. He also acts as a reviewer for international journals, has participated in evaluation committees, and has organized special sessions at conferences.

    Download full presentation
FULL PAPER PRESENTATIONS

Full Papers I: High performance scientific computing

  • Parallel Adaptive Simulation of Coupled Incompressible Viscous Flow and Advective-Diffusive Transport Using Stabilized FEM Formulation

    PRESENTER: Andre Rossa

    SCHEDULE: Monday, July 23, 11:30

    AUTHORS: Andre Rossa (Engineering Simulation and Scientific Software, Rio de Janeiro, Brazil) and Alvaro Coutinho (High-Performance Computing Center, Department of Civil Engineering, Federal University of Rio de Janeiro, Brazil)

    ABSTRACT: We study coupled incompressible viscous flow and advective-diffusive transport of a scalar. Both the Navier-Stokes and transport equations are solved using an Eulerian approach. The SUPG/PSPG stabilized finite element formulation is applied to the 8-node isoparametric hexahedron. The implementation is built on the libMesh FEM library, which provides support for adaptive mesh refinement and coarsening and for parallel computation. The Rayleigh-Bénard natural convection and planar lock-exchange density current problems are solved to assess the adaptive parallel performance of the numerical solution.

    Download full presentation and some additional videos.
  • A Numerical Algorithm for the Solution of Viscous Incompressible Flow on GPUs

    PRESENTER: Santiago D. Costarelli

    SCHEDULE: Monday, July 23, 11:50

    AUTHORS: Santiago Costarelli, Mario Storti, Rodrigo Paz, Lisandro Dalcín (CIMEC-INTEC-CONICET-UNL) and Sergio Idelsohn (CIMEC-INTEC-CONICET-UNL, International Center for Numerical Methods in Engineering (CIMNE), Technical University of Catalonia (UPC), Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Spain)

    ABSTRACT: Graphics Processing Units (GPUs) have received much attention in recent years. Compute-intensive algorithms operating on multidimensional arrays that have nearest-neighbor dependency and/or exploit data locality can achieve massive speedups. This work discusses a solver for the pressure problem in applications using immersed boundary techniques to account for moving solid bodies. The solver is based on standard Conjugate Gradient iterations and depends on the availability of a fast Poisson solver on the whole domain to define a preconditioner.
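
    A hedged, self-contained sketch of that solver strategy (not the authors' GPU implementation): Conjugate Gradient on a pressure-like system, preconditioned by a fast Poisson solve on the whole box via a discrete sine transform. The grid size and the "immersed body" penalty below are made-up assumptions.

      # Hedged sketch: CG preconditioned with a fast (DST-based) Poisson solver.
      import numpy as np
      from scipy.fft import dst, idst
      from scipy.sparse import diags
      from scipy.sparse.linalg import LinearOperator, cg

      n = 127
      lap = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))  # 1D Laplacian
      mask = np.zeros(n)
      mask[50:70] = 10.0                 # crude immersed-body penalty term
      A = lap + diags(mask)              # pressure-like SPD operator

      # DST-I diagonalizes the plain Laplacian: this is the "fast Poisson solver".
      eigs = 2.0 - 2.0 * np.cos(np.pi * np.arange(1, n + 1) / (n + 1))
      def poisson_solve(f):
          return idst(dst(f, type=1) / eigs, type=1)

      M = LinearOperator((n, n), matvec=poisson_solve, dtype=np.float64)
      b = np.random.default_rng(0).standard_normal(n)
      x, info = cg(A, b, M=M)
      print(info, np.linalg.norm(A @ x - b))   # info == 0 means converged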

    Download full presentation and some additional videos.
  • Parallel Computing Applied to Satellite Images Processing for Solar Resource Estimates

    PRESENTER: Rodrigo Alonso

    SCHEDULE: Monday, July 23, 12:10

    AUTHORS: Rodrigo Alonso (Instituto de Física, Facultad de Ingeniería, Universidad de la República, Uruguay) and Sergio Nesmachnow (Centro de Cálculo, Facultad de Ingeniería, Universidad de la República, Uruguay)

    ABSTRACT: This article presents the application of parallel computing techniques to process satellite imagery for solar resource estimates. A distributed-memory parallel algorithm is introduced, capable of generating the required inputs from visible-channel images to feed a statistical solar irradiation model. The parallelization strategy consists of distributing the images among the available processors, so that every image is accessed by only one process. The experimental analysis shows that a maximum speedup of 2.32 is achieved when using four computing resources; beyond that point, performance degrades due to hard-disk input/output speed.
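
    A minimal sketch of that distribution strategy (an illustration under assumed file names, not the authors' code): the image list is statically partitioned so that each file is read by exactly one worker process.

      # Hedged sketch: static distribution of satellite images to workers.
      # File names and the per-image processing are placeholders.
      from multiprocessing import Pool

      def process_image(path):
          """Placeholder: extract model inputs from one visible-channel image."""
          return path  # real work would read and reduce the image here

      if __name__ == "__main__":
          paths = [f"goes_{i:04d}.img" for i in range(1024)]  # hypothetical
          with Pool(processes=4) as pool:
              # Large chunks give each worker a contiguous block of images,
              # so every image is accessed by only one process.
              results = pool.map(process_image, paths, chunksize=256)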

    Download full presentation
  • Parallel conversion of satellite image information for a wind energy generation forecasting model

    PRESENTER: Germán Gadea

    SCHEDULE: Monday, July 23, 12:40

    AUTHORS: Germán Gadea, Andrés Flevaris, Juan Souteras, Sergio Nesmachnow, Alejandro Gutiérrez and Gabriel Cazes (Universidad de la República, Uruguay)

    ABSTRACT: This paper presents an efficient parallel algorithm for the problem of converting satellite imagery into binary files. The algorithm was designed to update, at global scale, the land-cover information used by the WRF climate model. We present the characteristics of the implemented algorithm, as well as the results of a performance analysis and a comparison between two approaches to implementing it. The performance analysis shows that the parallel algorithm improves substantially on the sequential algorithm that solves the problem, obtaining linear speedup.

    Download full presentation

Full Papers II: GPU computing

  • Facial Recognition Using Neural Networks over GPGPU

    PRESENTER: Juan Pablo Balarini

    SCHEDULE: Monday, July 23, 16:20

    AUTHORS: Juan Pablo Balarini, Martín Rodríguez, and Sergio Nesmachnow (Centro de Cálculo, Facultad de Ingeniería, Universidad de la República, Uruguay)

    ABSTRACT: This article introduces a parallel neural network approach implemented over Graphics Processing Units (GPU) to solve a facial recognition problem, which consists of deciding where the face of a person in a given image is pointing. The proposed method uses the parallel capabilities of the GPU to train and evaluate a neural network for the aforementioned problem. The experimental evaluation demonstrates that a significant reduction in computing time can be obtained, allowing large instances to be solved in reasonable time. A speedup greater than 8 is achieved compared with a sequential implementation, and a classification rate above 85% is also obtained.

    Download full presentation
  • Parallel implementations of the MinMin heterogeneous computing scheduler in GPU

    PRESENTER: Mauro Canabé

    SCHEDULE: Monday, July 23, 16:40

    AUTHORS: Mauro Canabé and Sergio Nesmachnow (Centro de Cálculo, Facultad de Ingeniería, Universidad de la República, Uruguay)

    ABSTRACT: This work presents parallel implementations of the MinMin scheduling heuristic for heterogeneous computing on Graphics Processing Units, in order to improve its computational efficiency. The experimental evaluation of the four proposed MinMin variants demonstrates that a significant reduction in computing times can be attained, making it possible to tackle large scheduling scenarios in reasonable execution times.
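
    For reference, the sequential baseline heuristic (the textbook MinMin, not the paper's GPU variants) is compact: repeatedly pick the task whose minimum completion time over all machines is smallest, and assign it to that machine. The example ETC values are made up.

      # Hedged sketch: sequential MinMin. etc[t][m] is the estimated time to
      # compute task t on machine m (values would come from the scenario).
      def minmin(etc):
          n_tasks, n_machines = len(etc), len(etc[0])
          ready = [0.0] * n_machines        # machine availability times
          unassigned = set(range(n_tasks))
          schedule = {}
          while unassigned:
              # Global minimum completion time over all (task, machine) pairs
              # equals "the min of the per-task minima", i.e. MinMin.
              ct, t, m = min((ready[m] + etc[t][m], t, m)
                             for t in unassigned for m in range(n_machines))
              schedule[t] = m
              ready[m] = ct
              unassigned.remove(t)
          return schedule, max(ready)       # assignment and makespan

      # Example: 3 tasks x 2 machines (made-up ETC matrix).
      print(minmin([[3.0, 1.0], [2.0, 4.0], [5.0, 2.5]]))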

    Download full presentation
  • A parallel online GPU scheduler for large heterogeneous computing systems

    PRESENTER: Francisco Luna

    SCHEDULE: Monday, July 23, 17:20

    AUTHORS: Santiago Iturriaga, Sergio Nesmachnow (Universidad de la República, Uruguay), Francisco Luna, and Enrique Alba (Universidad de Málaga, Spain)

    ABSTRACT: This work presents a parallel implementation on GPU for a stochastic local search method to efficiently solve the task scheduling problem in heterogeneous computing environments. The research community has been searching for accurate schedulers for heterogeneous computing systems, able to run in reduced times. The parallel stochastic search proposed in this work is based on simple operators in order to keep the computational complexity as low as possible, thus allowing large scheduling instances to be efficiently tackled. The experimental analysis demonstrates that the parallel stochastic local search method on GPU is able to compute accurate suboptimal schedules in significantly shorter execution times than state-of-the-art schedulers.

    Download full presentation
  • Biclustering of very large datasets with GPU technology using CUDA

    PRESENTER: Rocío Romero-Zaliz

    SCHEDULE: Monday, July 23, 17:20

    AUTHORS: Javier Arnedo-Fdez, Igor Zwir, and Rocío Romero-Zaliz (Dept. of Computer Science and Artificial Intelligence, University of Granada, Spain)

    ABSTRACT: In this work we report our first research steps on using GPUs to accelerate biclustering of very large datasets, which are common in real-world applications such as biomedicine and biotechnology. The biclustering problem is NP-hard; thus, finding an optimal solution can be time-consuming, especially when dealing with large datasets. We present a GPU-accelerated implementation of FLOC, a probabilistic move-based biclustering algorithm that can efficiently and accurately approximate biclusters with low mean squared residues without the impact of random interference. Results show that as the size of the dataset increases, the GPGPU version of FLOC solves the biclustering problem much faster than the FLOC version running on a single CPU core.
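
    For reference, the mean squared residue (MSR) that FLOC minimizes has a standard definition (this sketch is the textbook formula, not the paper's CUDA code):

      # Hedged sketch: mean squared residue of a bicluster (rows, cols) of data.
      import numpy as np

      def mean_squared_residue(data, rows, cols):
          sub = data[np.ix_(rows, cols)]
          row_means = sub.mean(axis=1, keepdims=True)
          col_means = sub.mean(axis=0, keepdims=True)
          residue = sub - row_means - col_means + sub.mean()
          return float((residue ** 2).mean())

    Since the residue of every candidate move can be evaluated independently, this computation maps naturally onto GPU threads.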

    Download full presentation

Full Papers III: Applications

  • Optimizing Latency in Beowulf Clusters

    PRESENTER: Andrés More

    SCHEDULE: Tuesday, July 24, 14:00

    AUTHORS: Rafael F Garabato (Argentina Software Design Center, Intel Córdoba), Andrés More (Argentina Software Design Center, Intel Córdoba and Instituto Universitario Aeronáutico) and Victor Hugo Rosales (Argentina Software Design Center, Intel Córdoba)

    ABSTRACT: This paper discusses how to decrease and stabilize network latency in a Beowulf system. Low latency is particularly important for reducing the execution time of High Performance Computing applications. Optimization opportunities are identified and analyzed across the different system components integrated in compute nodes, including device drivers, operating system services, and kernel parameters. This work contributes a systematic approach to optimizing communication latency, together with a detailed checklist and procedure. Performance impacts are shown through benchmark figures and through mpiBLAST as a real-world application. We found that, after applying different techniques, the default Gigabit Ethernet latency can be reduced from about 50 µs to nearly 20 µs.
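
    Latency figures of this kind are typically measured with a ping-pong microbenchmark; the following is a hedged sketch of such a measurement (an illustration, not the paper's benchmark).

      # Hedged sketch: minimal MPI ping-pong to measure small-message latency.
      # Run with: mpirun -np 2 python pingpong.py
      import time
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()
      msg = bytearray(1)            # 1-byte payload: latency-dominated
      reps = 10000

      comm.Barrier()
      t0 = time.perf_counter()
      for _ in range(reps):
          if rank == 0:
              comm.Send(msg, dest=1)
              comm.Recv(msg, source=1)
          else:
              comm.Recv(msg, source=0)
              comm.Send(msg, dest=0)
      elapsed = time.perf_counter() - t0
      if rank == 0:
          print(f"one-way latency ~ {elapsed / (2 * reps) * 1e6:.1f} us")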

    Download full presentation
  • SMCV: a Methodology for Detecting Transient Faults in Multicore Clusters

    PRESENTER: Diego Montezanti

    SCHEDULE: Tuesday, July 24, 14:20

    AUTHORS: Diego Montezanti, Fernando Emmanuel Frati (Instituto de Investigación en Informática, Facultad de Informática, UNLP, Argentina and Consejo Nacional de Investigaciones Científicas y Técnicas), Dolores Rexachs, Emilio Luque (Departamento de Arquitectura de Computadores y Sistemas Operativos, UAB, Spain), Marcelo Naiouf (Instituto de Investigación en Informática, Facultad de Informática, UNLP, Argentina) and Armando De Giusti (Instituto de Investigación en Informática, Facultad de Informática, UNLP, Argentina and Consejo Nacional de Investigaciones Científicas y Técnicas)

    ABSTRACT: The performance of current processors is improved by increasing the integration scale, which brings a growing vulnerability to transient faults; their impact increases on multicore clusters running large scientific parallel applications. The requirement of enhancing the reliability of these systems, coupled with the high cost of rerunning an application from the beginning, motivates specific software strategies for the target systems. This paper introduces SMCV, a fully distributed technique that provides fault detection for message-passing parallel applications by validating the contents of the messages to be sent, preventing the transmission of errors to other processes and leveraging the intrinsic hardware redundancy of the multicore. SMCV achieves broad robustness against transient faults with reduced overhead, and accomplishes a trade-off between moderate detection latency and low additional workload.

    Download full presentation
  • Evolutionary Statistical System for Forest Fire Spread Prediction

    PRESENTER: Miguel Mendez-Garabetti

    SCHEDULE: Tuesday, July 24, 14:40

    AUTHORS: Germán Bianchini, Miguel Mendez-Garabetti and Paola Caymes-Scutari (Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza - Universidad Tecnológica Nacional, Argentina)

    ABSTRACT: Several propagation models have been developed to predict forest fire behaviour. They can be grouped into empirical, semi-empirical, and physical models. These models can be used to develop simulators and tools for preventing and fighting forest fires. Nevertheless, in many cases the models present a series of limitations related to the need for a large number of input parameters. Furthermore, such parameters often have some degree of uncertainty due to the impossibility of measuring all of them in real time. Therefore, they have to be estimated from indirect measurements, which negatively impacts the output of the model. In this paper we present a method that combines statistical analysis with parallel evolutionary algorithms (taking advantage of the computational power provided by High Performance Computing) to improve the quality of the model's output.

    Download full presentation
PAPER PRESENTATIONS

Presentations I

  • Parallel Simulations of Coupled Multiphysics Problems in Bio-physics

    PRESENTER: Mariano Vázquez

    SCHEDULE: Tuesday, July 24, 10:30

    AUTHORS: M. Vázquez (Barcelona Supercomputing Center, Barcelona, Spain and IIIA-CSIC, Bellaterra, Spain), G. Houzeaux, P. Lafortune, R. Arís and J. Aguado-Sierra (Barcelona Supercomputing Center, Barcelona, Spain)

    ABSTRACT: This paper introduces simulation issues for parallel coupled multiphysics problems in bio-mechanics. In particular, it focuses on a paradigmatic case: cardiac electromechanics. As a physical system, the heart can be modelled as a many-parts problem. An electrical activation potential propagates throughout the muscle, causing mechanical contraction that acts against the blood, which is in turn pumped through the valves. In this work we describe the electromechanical activation in greater detail and introduce a fluid-structure interaction scheme. The parallel coupled model, implemented in BSC's in-house multiphysics code Alya, is thoroughly discussed, with scalability proven up to several hundred cores.
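
    As standard background (a common formulation, not necessarily the exact one used in the paper), the electrical activation stage is often modelled with a monodomain reaction-diffusion equation for the transmembrane potential V, coupled to an ionic model with state variables w:

      \chi \left( C_m \frac{\partial V}{\partial t} + I_{\mathrm{ion}}(V, w) \right)
        = \nabla \cdot \left( \mathbf{D} \, \nabla V \right),
      \qquad
      \frac{\partial w}{\partial t} = g(V, w),

    whose solution drives the active stress in the mechanical problem; the blood adds a third coupled field, which is why a partitioned parallel coupling scheme is needed.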

  • Scaled Real-Time Parallelization for DEVS Simulation of Hybrid Systems

    PRESENTER: Federico Bergero

    SCHEDULE: Tuesday, July 24, 10:45

    AUTHORS: Federico Bergero, Ernesto Kofman (Laboratorio de Sistemas Dinámicos. FCEIA - UNR. CIFASIS-CONICET), and Francois Cellier (Modeling and Simulation Research Group ETH Zürich, Switzerland)

    ABSTRACT: We introduce a novel parallelization technique for discrete event simulation of hybrid systems. The models are first split into several sub-models that are concurrently simulated on different processors. To avoid the cost of synchronization between processes, the simulation time of each sub-model is locally synchronized in a real-time fashion with a scaled version of physical time, which implicitly synchronizes all sub-models. The new methodology, coined Scaled Real-Time Synchronization, and an adaptive extension of it were implemented in PowerDEVS, a DEVS simulation tool, under a real-time operating system. We tested their performance by simulating two large-scale models, obtaining a considerable speedup in both cases.
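
    The pacing idea can be sketched in a few lines (my own illustration, not PowerDEVS code): each sub-model advances its local event clock against a scaled version of wall-clock time, so concurrently running sub-models stay implicitly synchronized without exchanging synchronization messages.

      # Hedged sketch of scaled real-time pacing; events and scale are made up.
      import time

      def run_scaled(events, scale=100.0):
          """events: iterable of (sim_time, action), sorted by sim_time.
          scale: simulated seconds per wall-clock second."""
          start = time.perf_counter()
          for sim_time, action in events:
              deadline = sim_time / scale          # wall-clock deadline
              while time.perf_counter() - start < deadline:
                  pass                             # an RTOS would sleep instead
              action()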

    Download full presentation
  • Accessing files using transactional memory

    PRESENTER: Adrián Cristal

    SCHEDULE: Tuesday, July 24, 11:00

    AUTHORS: Cristian Perfumo (School of Electrical Engineering and Computer Science, Faculty of Engineering and Built Environment, University of Newcastle, Australia) and Adrián Cristal (Barcelona Supercomputing Center, Artificial Intelligence Research Institute and Spanish National Research Council (CSIC), Spain)

    ABSTRACT: Although lock-free transactional memory facilitates concurrent programming, most implementations serialise those transactions in which input/output is performed. In this paper we present the first approach that allows programmers to access files within transactions using the traditional (fopen, fread, fwrite, etc.) operations and at the same time provides real parallelism without making changes to the operating system. The performance evaluations presented show promising results using both synthetic and real-world benchmarks.
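
    As a loose illustration of the goal (all-or-nothing file updates without serialising transactions while they run), and emphatically not the paper's mechanism, writes can be buffered per transaction and published atomically at commit:

      # Hedged sketch: buffer writes in a transaction, commit via atomic rename.
      import os, tempfile

      class FileTx:
          def __init__(self, path):
              self.path, self.buffer = path, []

          def write(self, data):
              self.buffer.append(data)    # deferred; visible only to this tx

          def commit(self):
              fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
              with os.fdopen(fd, "w") as f:
                  f.writelines(self.buffer)
              os.replace(tmp, self.path)  # atomic publication of the new file

      tx = FileTx("results.txt")          # hypothetical file name
      tx.write("hello\n")
      tx.write("world\n")
      tx.commit()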

Presentations II

  • Parallel execution of a parameter sweep for molecular dynamics simulations in a mixed GPU/CPU environment

    PRESENTER: Carlos García Garino

    SCHEDULE: Tuesday, July 24, 12:15

    AUTHORS: Emmanuel N. Millán (CONICET and ITIC, Universidad Nacional de Cuyo, Mendoza, Argentina and Instituto de Ciencias Básicas, Universidad Nacional de Cuyo, Mendoza, Argentina), Carlos García Garino (ITIC, Universidad Nacional de Cuyo, Mendoza, Argentina), and Eduardo M. Bringa (CONICET and Instituto de Ciencias Básicas, Universidad Nacional de Cuyo, Mendoza, Argentina)

    ABSTRACT: Molecular Dynamics (MD) simulations can help in understanding an immense number of phenomena at the nano- and microscale. They often require the exploration of a large parameter space, and a possible parallelization strategy consists of sending different parameter sets to different processors. Here we present such an approach using a mixed environment of GPUs and CPU cores. We take advantage of the software LAMMPS (lammps.sandia.gov), which is already prepared to run in a mixed environment, in order to carry out an efficient parameter sweep. Two examples are presented in this work: one where a random variation of the initial conditions allows appropriate statistics to be built, and another where the collision of two clusters is sampled over a multivariate space to obtain information on the resulting structural properties.
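
    A minimal dispatcher for such a sweep might look as follows (a sketch under assumed file and slot names, not the authors' scripts). The LAMMPS command-line switches -var, -in, and -sf gpu are real, but the input script and slot pool are hypothetical.

      # Hedged sketch: one LAMMPS run per parameter set on a mixed GPU/CPU pool.
      import subprocess
      from concurrent.futures import ThreadPoolExecutor
      from queue import Queue

      slots = Queue()
      for s in ("gpu0", "gpu1", "cpu0", "cpu1"):   # assumed execution slots
          slots.put(s)

      def run_case(temperature):
          slot = slots.get()                       # wait for a free slot
          try:
              cmd = ["lmp", "-var", "T", str(temperature), "-in", "in.melt"]
              if slot.startswith("gpu"):
                  cmd += ["-sf", "gpu"]            # enable the GPU package
              subprocess.run(cmd, check=True)
          finally:
              slots.put(slot)

      with ThreadPoolExecutor(max_workers=4) as pool:
          list(pool.map(run_case, [300, 400, 500, 600]))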

    Download presentation
  • Improving an efficient simulation of multidimensional Gaussian random fields on GPU

    PRESENTER: Daniel Baeza

    SCHEDULE: Tuesday, July 24, 12:30

    AUTHORS: Daniel Baeza, Exequiel Sepúlveda, and Julián Ortiz (ALGES Laboratory, Advanced Mining Technology Center (AMTC), University of Chile, Chile and Department of Mining Engineering, University of Chile, Chile)

    ABSTRACT: This paper presents a parallel implementation of the Turning Bands Method for conditional simulation of random fields using a Graphics Processing Unit. This implementation is compared with a classical serial CPU algorithm by reporting the speedup and processing power of each implementation. The use of the algorithm is illustrated with the acceleration results obtained in an application that uses a real dataset to produce conditional simulations.

    Download full presentation
  • Applying List Scheduling Algorithms In A Multithreaded Execution Environment

    PRESENTER: Maurício Lima Pilla

    SCHEDULE: Tuesday, July 24, 12:45

    AUTHORS: Cícero Camargo, Gerson G. H. Cavalheiro, Maurício L. Pilla, Simone A. C. Cavalheiro, and Luciana Foss (Technological Development Center (CDTec), Federal University of Pelotas, Brazil)

    ABSTRACT: List scheduling algorithms are known to be efficient when the application to be executed can be described statically as a directed acyclic graph (DAG) of tasks. Many programming tools use this basic technique to schedule programs written with multithreaded programming interfaces. This paper presents an analysis of scheduling algorithms for multithreaded programs in a dynamic scenario where threads are created and destroyed during execution. We introduce an algorithm to convert DAGs describing applications as tasks into directed cyclic graphs (DCGs) describing the same application designed with a multithreaded programming interface. Our algorithm covers the case studies described in previous works, successfully mapping from the abstract level of graphs to the application environment. These mappings preserve the guarantees offered by the abstract model, providing efficient scheduling of dynamic programs that follow the intended multithread model.
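
    For reference, the static baseline the paper starts from can be sketched as textbook list scheduling (my own illustration, not the paper's algorithm): greedily assign each ready DAG task to the earliest-available processor. The example DAG and costs are made up.

      # Hedged sketch: greedy list scheduling of a static task DAG.
      # dag maps task -> (cost, set of predecessor tasks).
      import heapq

      def list_schedule(dag, n_procs):
          indeg = {t: len(preds) for t, (_, preds) in dag.items()}
          finish = {}                                   # task -> finish time
          procs = [(0.0, p) for p in range(n_procs)]    # (available time, id)
          heapq.heapify(procs)
          ready = [t for t, d in indeg.items() if d == 0]
          while ready:
              t = ready.pop(0)
              cost, preds = dag[t]
              avail, p = heapq.heappop(procs)
              start = max([avail] + [finish[q] for q in preds])
              finish[t] = start + cost
              heapq.heappush(procs, (finish[t], p))
              for s, (_, sp) in dag.items():            # release successors
                  if t in sp:
                      indeg[s] -= 1
                      if indeg[s] == 0:
                          ready.append(s)
          return max(finish.values())                   # makespan

      # Example: diamond DAG a -> {b, c} -> d on two processors.
      dag = {"a": (1.0, set()), "b": (2.0, {"a"}),
             "c": (2.0, {"a"}), "d": (1.0, {"b", "c"})}
      print(list_schedule(dag, 2))                      # expected: 4.0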

    Download full presentation

Presentations III

  • Parallelizing NEC's Equation Solver Algorithm with OpenMP

    PRESENTER: Mario Trangoni

    SCHEDULE: Tuesday, July 24, 15:00

    AUTHORS: Mario Trangoni and Victor Rosales (Argentina Software Design Center, Intel, Argentina)

    ABSTRACT: In order to take advantage of the parallel multi-core architecture of modern processors, legacy serial code must be analyzed to discover the regions where the parallelization effort can be most rewarding. In this work, Intel VTune Amplifier XE was used to profile NEC2C, a software package for simulating the electromagnetic response of antennas, and a parallelization of this code was implemented. This work reviews the optimization procedure applied to an HPC application. After profiling and parallelizing the main computational kernel, a speedup of nearly 7x was obtained.
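
    To put such figures in context (a standard bound, not an analysis from the paper), Amdahl's law gives the speedup when a fraction p of the serial runtime is parallelized over N cores:

      S(N) = \frac{1}{(1 - p) + p/N}

    A speedup of about 7x on 8 cores, for instance, requires p ≈ 0.98, i.e. almost all of the runtime must lie in the parallelized kernel.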

    Download full presentation
  • Parallelizing Lattice Boltzmann Methods with OpenMP

    PRESENTER: Miguel Montes

    SCHEDULE: Tuesday, July 24, 15:15

    AUTHORS: Miguel Montes (Instituto Universitario Aeronáutico, Córdoba, Argentina)

    ABSTRACT: This article describes the results of parallelizing Lattice Boltzmann methods using OpenMP. The main goal was to evaluate OpenMP as a tool to incrementally parallelize pre-existing serial code. It analyzes the speedup obtained with different numbers of cores and the effects of using different types of CPU affinity. It also identifies a problem that arises when the code is compiled with gcc and the number of threads is close to the number of hardware threads of the machine.
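
    For background (the standard BGK form of the method, which may differ in detail from the code discussed), each lattice Boltzmann time step streams the particle distributions f_i along lattice velocities c_i and relaxes them toward a local equilibrium:

      f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\ t + \Delta t)
        = f_i(\mathbf{x}, t)
        - \frac{\Delta t}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right].

    The collision step is local to each lattice node, so the loop over nodes is a natural target for incremental OpenMP parallelization.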

    Download full presentation
  • Power Characterisation of Shared-Memory HPC Systems

    PRESENTER: Javier Balladini

    SCHEDULE: Tuesday, July 24, 15:30

    AUTHORS: Javier Balladini (Department of Computer Engineering, Universidad Nacional del Comahue, Argentina), Enzo Rucci, Armando De Giusti, Marcelo Naiouf (III LIDI, Facultad de Informática, Universidad Nacional de La Plata), Remo Suppi, Dolores Rexachs, and Emilio Luque (Department of Computer Architecture and Operating Systems, Universitat Autònoma de Barcelona, Spain)

    ABSTRACT: Energy consumption has become one of the greatest challenges in the field of High Performance Computing (HPC). Besides its impact on the environment, energy is a limiting factor for HPC. Keeping the power consumption of a system below a threshold is one of the main problems, and power prediction can help to solve it. Power characterisation can be used to understand the power behaviour of the system under study and to support power prediction. Furthermore, it could be used to design power-aware application programs. In this article we propose a methodology to characterise the power consumption of shared-memory HPC systems. Our methodology involves finding the factors that influence the power consumed by the system, that is, a sensitivity analysis of workload properties and system parameters with respect to power behaviour. The workload considers both the computation and communication aspects of applications. Our methodology is similar to previous works, but we propose an in-depth approach that can yield a better power characterisation of the system. We apply our methodology to characterise an Intel server platform, and the results show that we can find a more extensive set of factors influencing power consumption.

  • Using distributed local information to improve global performance in Grids

    PRESENTER: Paula Verghelet

    SCHEDULE: Tuesday, July 24, 15:45

    AUTHORS: Paula Verghelet, Diego Fernández Slezak, Pablo Turjanski and Esteban Mocskos (Laboratorio de Sistemas Complejos, Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires)

    ABSTRACT: Grid computing refers to the federation of geographically distributed and heterogeneous computer resources. These resources may belong to different administrative domains, but are shared among users. Every grid has a key component responsible for obtaining, distributing, indexing, and archiving information about the configuration and state of services and resources. Optimizing task assignment and the matching of user requests to resources requires maintaining up-to-date information about the grid. In large-scale grids, the dynamics of resource information cannot be captured using a static hierarchy that relies on manual configuration and administration, so new policies for the discovery and propagation of resource information must be designed. There is growing interest in the interaction of Grid Computing with the Peer-to-Peer (P2P) paradigm, pushing towards scalable solutions. In this work, starting from the Best-Neighbor policy based on previously published ideas, the reasons behind its lack of performance are explored. A new, improved Best-Neighbor policy is proposed and analyzed, comparing it with the Random, Hierarchical, and Super-Peer policies.

    Download full presentation