GPU Accelerated Stochastic Inversion of Deep Water Seismic Data

Tomás Ferreirinha, Rúben Nunes, Amílcar Soares, Frederico Pratas, Pedro Tomás, Nuno Roma
INESC-ID, Instituto Superior Técnico, Universidade de Lisboa
pedro.tomas@inesc-id.pt

Outline
1. Motivation and objectives
2. Stochastic Seismic AVO Inversion Algorithm: algorithm description and dependencies
3. Parallelization Strategy: mapping methodology, main considered optimizations, and multi-GPU parallelization approach
4. Experimental Results
5. Conclusions

Seismic Inversion Methods
Seismic inversion methods estimate the physical properties of the Earth's subsurface, typically from seismic reflection data. They support the study of well logs, e.g., for accurate drilling.

Motivation
Simple seismic inversion models carry a high level of uncertainty, which can lead to drilling errors. Complex models, on the other hand, lead to long execution times: simulations lasting from weeks to months are common. As a result, simulations are often constrained in algorithmic complexity and in the numerical precision they achieve.

Objectives
1. Accelerate state-of-the-art complex seismic inversion algorithms, reducing execution time from weeks or months to days or even hours. This makes it possible to use more complex and accurate computational models and to apply them to larger fields.
2. Efficiently exploit platforms composed of multiple heterogeneous accelerators, specifically GPUs. OpenCL allows exploiting GPUs from different vendors.
25/08/2014

Stochastic Seismic AVO Inversion Algorithm: Concepts
The Stochastic Seismic AVO Inversion algorithm is based on two main concepts:
1. Density, P-wave velocity and S-wave velocity models (ρ, Vp and Vs) are perturbed towards an objective function using the Direct Sequential Simulation (DSS) algorithm.
2. A genetic algorithm acts as a global optimizer, ensuring that the solution converges from iteration to iteration.

Stochastic Seismic AVO Inversion Algorithm: Flowchart
1. Stochastic simulation of the ρ/Vp/Vs models using the Direct Sequential Simulation (DSS) algorithm.
2. Calculation of the synthetic pre-stack seismic cube from the simulated ρ/Vp/Vs models.
3. Comparison between the synthetic and the real seismic data, producing the seismic correlation cube.
4. The best ρ/Vp/Vs models are built by using a genetic algorithm to select the areas of highest correlation.
5. Steps 1 to 4 are repeated until a defined number of simulations per iteration has been reached.
6. The accuracy of the models is estimated by computing the correlation cubes for the best ρ/Vp/Vs models.
7. The whole procedure is repeated, using information about the best models found, until a matching criterion is reached.
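Steps 3 and 4 can be illustrated with a toy sketch. It is a simplified, hypothetical stand-in for the genetic selection on the slides, not the authors' implementation. Each "cube" is reduced to a list of traces, and each trace gets a single Pearson coefficient instead of a local correlation cube. For every trace, the sketch keeps the simulated model whose synthetic seismic correlates best with the real data. The helper names `pearson` and `select_best` are illustrative only.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length traces."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat trace: correlation undefined, treat as no match
    return cov / sqrt(var_a * var_b)


def select_best(models, synthetic, real):
    """Keep, per trace, the simulated model whose synthetic seismic matches best.

    models[s][t]    model value(s) of trace t in simulation s (e.g. rho/Vp/Vs)
    synthetic[s][t] synthetic seismic trace t computed from simulation s
    real[t]         observed seismic trace t
    Returns the best model per trace and its correlation.
    """
    n_traces = len(real)
    best_model = [None] * n_traces
    best_corr = [float("-inf")] * n_traces
    for s, synth in enumerate(synthetic):
        for t in range(n_traces):
            c = pearson(synth[t], real[t])
            if c > best_corr[t]:  # simulation s matches trace t better
                best_corr[t] = c
                best_model[t] = models[s][t]
    return best_model, best_corr


real = [[1, 2, 3], [3, 2, 1]]
synthetic = [[[1, 2, 3], [1, 2, 3]],   # simulation 0
             [[3, 2, 1], [3, 2, 1]]]   # simulation 1
models = [["m0_t0", "m0_t1"], ["m1_t0", "m1_t1"]]
print(select_best(models, synthetic, real))
# → (['m0_t0', 'm1_t1'], [1.0, 1.0])
```

Each trace ends up with the model from a different simulation, which mirrors the idea of assembling the best models from the areas of highest correlation.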

Stochastic Seismic AVO Inversion: DSS Algorithm
The Direct Sequential Simulation (DSS) algorithm, which is required for each of ρ, Vp and Vs, accounts for 95% of the execution time. Within DSS, a simulation procedure is executed for each node of the simulation cube.
[Figure: simulation cube]

Direct Sequential Simulation (DSS) Algorithm: Dependencies
DSS accounts for 95% of the execution time. The procedure executed for each node of the simulation cube accounts for 85% of the execution time.
[Figure: simulation cube]
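In sequential simulation methods such as DSS, nodes are visited along a random path. Each node is conditioned on data and on previously simulated nodes inside its search neighbourhood, and this is the dependency that limits parallelism. The minimal sketch below shows only that dependency structure on a 2-D grid. It simulates no values, and it assumes a Chebyshev-distance neighbourhood. The function name `conditioning_sets` and its parameters are hypothetical.

```python
import random


def conditioning_sets(shape, radius, seed=0):
    """Visit a 2-D grid along a random path and record each node's conditioning set.

    A node may only be conditioned on nodes within `radius` (Chebyshev distance)
    that were simulated earlier on the path, so its result depends on the order.
    """
    nx, ny = shape
    path = [(i, j) for i in range(nx) for j in range(ny)]
    random.Random(seed).shuffle(path)  # the random path
    simulated = []
    deps = []
    for i, j in path:
        neighbours = [(a, b) for a, b in simulated
                      if max(abs(a - i), abs(b - j)) <= radius]
        deps.append(((i, j), neighbours))
        simulated.append((i, j))  # later nodes may now use this value
    return deps


for node, neighbours in conditioning_sets((3, 3), radius=1)[:3]:
    print(node, "conditioned on", neighbours)
```

The first node on the path has an empty conditioning set, and later nodes accumulate more neighbours. Simulating two nearby nodes at the same time would therefore change the result.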

Next: Parallelization Strategy

Algorithm Parallelization: Possible Strategies (1)
Data-level: each part of the algorithm is accelerated individually by processing independent data simultaneously. However, there are few parallelization opportunities, and the sequential processing time of each part is too short to be worth the parallelization overhead. The resulting speed-up is around 1.

Algorithm Parallelization: Possible Strategies (2)
Functional-level: multiple independent parts of the algorithm are executed simultaneously by creating a pipeline of operations over the random sequence of nodes. The simulation is limited by the sequential execution of parts C and D, which caps the maximum achievable speed-up at 1.4.
[Figure: pipelined per-node procedure, accounting for 85% of the execution time]
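The 1.4 bound can be read as a pipeline bound. This rests on assumptions that the slide does not state. First, the per-node work is split into stages with times $T_k$. Second, parts C and D must run back to back, so that together they form the slowest stage. Under these assumptions, the steady-state speed-up over sequential execution is:

```latex
S_{\max} = \frac{\sum_k T_k}{\max_k T_k} = \frac{T_{\text{node}}}{T_C + T_D},
\qquad
S_{\max} = 1.4 \;\Rightarrow\; \frac{T_C + T_D}{T_{\text{node}}} = \frac{1}{1.4} \approx 0.71
```

Read this way, parts C and D would take roughly 71% of the per-node time, so overlapping the remaining stages can save at most about 29%. Pipeline fill and inter-stage communication would lower the achievable figure further.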

Algorithm Parallelization: Possible Strategies (3)
Path-level: multiple nodes from the random path are simulated at the same time by different parallel threads. This requires ensuring that no conflicts occur between the nodes being simulated in parallel. Dividing the grid into many sub-grids breaks more dependencies, which may eventually affect the convergence of the algorithm.
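One way to enforce the "no conflicts" requirement is a greedy batching pass over the random path. This is a hypothetical sketch, not necessarily the sub-grid scheme used by the authors. Two nodes are assumed to conflict when they lie within each other's search radius. A conflicting node is deferred to a later batch, and this deferral is exactly the kind of relaxation of the random-path order the slide describes. The function name `conflict_free_batches` is illustrative.

```python
def conflict_free_batches(path, radius):
    """Greedily group nodes of a random path into batches that can run in parallel.

    Two nodes conflict if they are within `radius` of each other (Chebyshev
    distance), i.e. one could need the other as conditioning data. A node that
    conflicts with the current batch is deferred, which relaxes the path order.
    """
    pending = list(path)
    batches = []
    while pending:
        batch, deferred = [], []
        for node in pending:
            if all(max(abs(a - b) for a, b in zip(node, other)) > radius
                   for other in batch):
                batch.append(node)     # independent of every node in the batch
            else:
                deferred.append(node)  # must wait for a later batch
        batches.append(batch)
        pending = deferred
    return batches


print(conflict_free_batches([(0, 0), (0, 1), (2, 2), (1, 1), (3, 3)], radius=1))
# → [[(0, 0), (2, 2)], [(0, 1), (3, 3)], [(1, 1)]]
```

Every node inside a batch can be simulated by a different thread. A larger radius produces smaller batches and less parallelism.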

Algorithm Parallelization: Optimizations
Postpone the simulation of nodes located in regions with few available data values in the neighbourhood.
[Figure: sub-grids labelled with their number of conditioning nodes: 5 6 5 4 1 / 4 5 4 4 2 / 3 4 3 3 2]
For example, in this case only nodes from sub-grids with at least 4 conditioning nodes (counting both the sub-grid being simulated and the neighbouring sub-grids) are simulated.
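The postponing rule can be sketched in two steps. The first step counts the conditioning values available to each sub-grid. The second splits the sub-grids into those simulated now and those postponed. Counting over the sub-grid plus its (up to 8) surrounding sub-grids is an assumption, since the slide only says "the neighbouring sub-grids". The helper names are hypothetical. The input at the end uses the counts listed on the slide above, read as a 3×5 grid, which is itself an assumption about the figure's layout.

```python
def conditioning_counts(known):
    """Count known values in each sub-grid plus its (up to 8) neighbouring sub-grids.

    known[r][c] = number of data/simulated values already available in sub-grid (r, c).
    """
    rows, cols = len(known), len(known[0])
    counts = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            counts[r][c] = sum(
                known[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            )
    return counts


def split_by_conditioning(counts, threshold=4):
    """Return (ready, postponed) sub-grid coordinates for the current step."""
    ready, postponed = [], []
    for r, row in enumerate(counts):
        for c, n in enumerate(row):
            (ready if n >= threshold else postponed).append((r, c))
    return ready, postponed


# Hypothetical input: the slide's counts, read as a 3x5 grid.
figure_counts = [[5, 6, 5, 4, 1],
                 [4, 5, 4, 4, 2],
                 [3, 4, 3, 3, 2]]
ready, postponed = split_by_conditioning(figure_counts)
print(len(ready), postponed)
# → 9 [(0, 4), (1, 4), (2, 0), (2, 2), (2, 3), (2, 4)]
```

Once the ready sub-grids have been simulated, their new values raise the counts of their neighbours. Postponed sub-grids can then be re-evaluated in a later step.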

Summary of the main optimizations:
1. Postpone the simulation of nodes located in regions with few available data values in the neighbourhood.
2. Optimize the data structures and their indexing to increase the coalescence of memory accesses.
3. Use GPU local memory to reduce the average access time to global memory buffers.
4. Minimize the number of OpenCL calls to reduce the parallelization overhead.
5. Use the bottom-up merge sort algorithm to minimize warp divergence when executing multiple sorts in parallel.
6. Overlap computations with communications and file write operations.
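A sequential Python sketch of bottom-up merge sort is below. It is not the authors' OpenCL kernel. A plausible reason the variant suits many parallel sorts is that it needs no recursion. The number of passes, the run boundaries and the trip count of each merge loop depend only on the array length, not on the data. Threads sorting arrays of the same length therefore follow the same loop structure, and only the compare-and-select step is data-dependent.

```python
def bottom_up_merge_sort(values):
    """Iterative merge sort: merge sorted runs of width 1, 2, 4, ... instead of recursing."""
    src = list(values)
    n = len(src)
    dst = [None] * n
    width = 1
    while width < n:
        # One pass: merge each pair of adjacent runs [lo, mid) and [mid, hi).
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j = lo, mid
            for k in range(lo, hi):  # fixed trip count: hi - lo
                # Take from the left run while it has elements and is not larger.
                if i < mid and (j >= hi or src[i] <= src[j]):
                    dst[k] = src[i]
                    i += 1
                else:
                    dst[k] = src[j]
                    j += 1
        src, dst = dst, src  # ping-pong between the two buffers
        width *= 2
    return src


print(bottom_up_merge_sort([5, 1, 4, 2, 3]))
# → [1, 2, 3, 4, 5]
```

On a GPU, each work-item would typically run this loop on its own small array, with the two buffers held in local or private memory.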

Algorithm Parallelization: Multi-GPU Approach
- A synchronization step is required at the end of each iteration.
- Nodes are distributed among the available devices according to real-time performance measurements.
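A minimal sketch of performance-proportional distribution follows. It assumes that each device's throughput (nodes per second) is re-measured after every synchronization step, and that the next batch of nodes is split in proportion to it. Largest-remainder rounding makes the shares add up exactly to the total. The function name `distribute_nodes` and the throughput metric are assumptions, not details from the slides.

```python
def distribute_nodes(n_nodes, throughputs):
    """Split n_nodes among devices in proportion to measured throughput.

    throughputs[d] is the most recent measured rate of device d (nodes per
    second). Largest-remainder rounding makes the shares sum to n_nodes.
    """
    total = sum(throughputs)
    if total <= 0:
        raise ValueError("need at least one device with positive throughput")
    exact = [n_nodes * t / total for t in throughputs]
    shares = [int(x) for x in exact]
    # Hand out the nodes lost to truncation, largest fractional part first.
    leftovers = n_nodes - sum(shares)
    order = sorted(range(len(exact)), key=lambda d: exact[d] - shares[d], reverse=True)
    for d in order[:leftovers]:
        shares[d] += 1
    return shares


print(distribute_nodes(1000, [3.0, 1.0]))
# → [750, 250]
```

A faster GPU, such as one of the two cards in a mixed system, receives proportionally more nodes. This keeps both devices busy until the synchronization point.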

Next: Experimental Results

Experimental Results: Computing Platforms

Component   System 1             System 2          System 3
CPU         i7 3820              Xeon E5-2609      i7 4770K
RAM         16 GB                32 GB             32 GB
GPU 1       Hawaii R9 290X       GeForce GTX 680   GeForce GTX 780
GPU 2       GeForce GTX 560 Ti   GeForce GTX 680   GeForce GTX 660 Ti

Reference: i7 3820 CPU (4 cores).
Experimental dataset: 237 × 197 × 350 nodes (about 16.3 million nodes), 5 iterations, each with 8 sets of simulations.

Experimental Results: Performance
- From 1 to 4 CPU cores: 5.3x speedup (hyper-threading enabled).
- From 1 GPU to 2 GPUs: 1.8x speedup for the parallelized parts. The gap to the ideal 2x is due to device synchronization requirements.
- The algorithm is now limited by its non-parallelized parts.
- Best performance: 22x speedup over the 4-core CPU reference.
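The remark that the algorithm is "now limited by the non-parallelized parts" is what Amdahl's law predicts. As the accelerated fraction p of the runtime gets faster, the overall speedup approaches 1/(1 − p). The sketch below uses an illustrative p = 0.9, which is not a value reported on the slides. It also computes the parallel efficiency of the 1 GPU to 2 GPUs step.

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only `parallel_fraction` of the runtime is sped up."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


def parallel_efficiency(speedup, n_devices):
    """Achieved speedup as a fraction of ideal linear scaling."""
    return speedup / n_devices


# Hypothetical parallel fraction: 90% of the runtime is accelerated.
for s in (10, 100, 1000):
    print(s, round(amdahl_speedup(0.9, s), 2))
print(parallel_efficiency(1.8, 2))  # 1 -> 2 GPUs on the slide; prints 0.9
```

With p = 0.9, a 100x faster parallel part yields only about 9.2x overall. Beyond that point, further GPU optimization pays off less than parallelizing the remaining sequential parts.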

Experimental Results: Convergence
Convergence is achieved when the number of grid divisions is well chosen.

Conclusions and On-going Work
- Significant acceleration of the state-of-the-art Stochastic Seismic AVO Inversion algorithm: up to 22x speed-up versus the 4-core CPU-only version.
- Parallelism was achieved by relaxing the algorithm's dependencies with respect to the random path.
- There is no loss of accuracy in the generated models.
- On-going work: accelerate other parts of the algorithm, such as the computation of the correlation, and further explore the algorithm's parallelism.


Experimental Results: Performance Bottlenecks
- Synchronization and communication requirements.
- The application is memory-bound.