Report 2 for meeting 04 of the mach2d optimization group

Transcript

Name: Diego Fernando Moro
Date: 15 to 16/10/12

Contents:
A - Evaluation of the effect of dt using different values of dt and different solvers
B - Optimization of the mach2d subroutines using Intel(R) VTune(TM) Performance Analyzer 9.1
C - Summary of the modifications
D - PG refinement
E - Conclusions

A - Evaluation of the effect of dt using different values of dt and different solvers

Computer used:
Intel Core i5-2450M, 2.5 GHz
4 GB of RAM
750 GB hard disk

PARAMETERS 1
The following parameters were used in the tables below:
- imax = 1
- nitm_u = nitm_p = 4
- tolu = tolp = 1d-2
- num = 1
- kg = coord = 1
- modvis = modtur = cctw = 0
- beta = 0.d0
- Variable solver: pure MSI (0), TDMAXY (1), TDMAX (2), MSI and TDMAX (3), MSI and GS (4)
- Stopping criterion: machine error for the variation of Fd*

- MESH OF 56 x 20 NODES
Table 1 - 56x20-node mesh, time per scheme (s)
Table 2 - 56x20-node mesh, iterations (it) per scheme
[The numeric entries of Tables 1 and 2 were lost in transcription. The 5.00E-05 row reads NC for all five schemes; the remaining rows are labeled Back.01_X_ (suffix truncated).]

Table 3 - 56x20-node mesh, Cd per scheme
Table 4 - 56x20-node mesh, Fd* per scheme
Table 5 - 56x20-node mesh, d(Fd*) per scheme
[The numeric entries of Tables 3 to 5 were lost in transcription. The 5.00E-05 row reads NC for all five schemes; the remaining rows are labeled Back.01_X_ (suffix truncated).]

- MESH OF 112 x 40 NODES
Table 6 - 112x40-node mesh, time per scheme (s)
Table 7 - 112x40-node mesh, iterations (it) per scheme
Table 8 - 112x40-node mesh, Cd per scheme
[The numeric entries of Tables 6 to 8 were lost in transcription. The 3.00E-05 row reads NC for all five schemes; the remaining rows are labeled Back.02_X_ (suffix truncated).]

Table 9 - 112x40-node mesh, Fd* per scheme
Table 10 - 112x40-node mesh, d(Fd*) per scheme
[The numeric entries of Tables 9 and 10 were lost in transcription. The 3.00E-05 row reads NC for all five schemes; the remaining rows are labeled Back.02_X_ (suffix truncated).]

- MESH OF 224 x 80 NODES
Table 11 - 224x80-node mesh, time per scheme (s)
Table 12 - 224x80-node mesh, iterations (it) per scheme
Table 13 - 224x80-node mesh, Cd per scheme
Table 14 - 224x80-node mesh, Fd* per scheme
[The numeric entries of Tables 11 to 14 were lost in transcription. The 2.00E-05 row reads NC for all five schemes; the remaining rows are labeled Back.03_X_ (suffix truncated).]

Table 15 - 224x80-node mesh, d(Fd*) per scheme
[The numeric entries of Table 15 were lost in transcription. The 2.00E-05 row reads NC for all five schemes; the remaining rows are labeled Back.03_X_ (suffix truncated). One entry survives only as the exponent E-16.]

- MESH OF 448 x 160 NODES
Table 16 - 448x160-node mesh, time per scheme (s)
Table 17 - 448x160-node mesh, iterations (it) per scheme
Table 18 - 448x160-node mesh, Cd per scheme
Table 19 - 448x160-node mesh, Fd* per scheme
Table 20 - 448x160-node mesh, d(Fd*) per scheme
[The numeric entries of Tables 16 to 20 were lost in transcription. The 6.00E-06 row reads NC for all five schemes; the single remaining row is labeled Back.04_X_ (suffix truncated).]

PARAMETERS 2
The following parameters were used in the tables below:
- imax = 2
- nitm_u = nitm_p = 4
- tolu = 1d-1
- tolp = 1d-2
- num = 1

- kg = coord = 1
- modvis = modtur = cctw = 0
- beta = 0.d0
- Stopping criterion: machine error for the variation of Fd*

- MESH OF 56 x 20 NODES
Table 21 - 56x20-node mesh, time per scheme (s)
Table 22 - 56x20-node mesh, iterations (it) per scheme
Table 23 - 56x20-node mesh, Cd per scheme
Table 24 - 56x20-node mesh, Fd* per scheme
Table 25 - 56x20-node mesh, d(Fd*) per scheme
[The numeric entries of Tables 21 to 25 were lost in transcription. The 5.00E-05 row reads NC for all five schemes; in Table 23 the remaining rows are labeled Back2.01_X_01 to Back2.01_X_04, and in the other tables the suffix is truncated.]

- MESH OF 112 x 40 NODES
Table 26 - 112x40-node mesh, time per scheme (s)
Table 27 - 112x40-node mesh, iterations (it) per scheme
Table 28 - 112x40-node mesh, Cd per scheme
Table 29 - 112x40-node mesh, Fd* per scheme
Table 30 - 112x40-node mesh, d(Fd*) per scheme
[The numeric entries of Tables 26 to 30 were lost in transcription. The 3.00E-05 row reads NC for all five schemes; the remaining rows are labeled Back2.02_X_ (suffix truncated).]

- MESH OF 224 x 80 NODES
Table 31 - 224x80-node mesh, time per scheme (s)
[The numeric entries of Table 31 were lost in transcription. The 2.00E-05 row reads NC for all five schemes; the remaining rows are labeled Back2.03_X_ (suffix truncated).]

Table 32 - 224x80-node mesh, iterations (it) per scheme
Table 33 - 224x80-node mesh, Cd per scheme
Table 34 - 224x80-node mesh, Fd* per scheme
Table 35 - 224x80-node mesh, d(Fd*) per scheme
[The numeric entries of Tables 32 to 35 were lost in transcription. The 2.00E-05 row reads NC for all five schemes; the remaining rows are labeled Back2.03_X_ (suffix truncated).]

- MESH OF 448 x 160 NODES
Table 36 - 448x160-node mesh, time per scheme (s)
Table 37 - 448x160-node mesh, iterations (it) per scheme
[The numeric entries of Tables 36 and 37 were lost in transcription. The 6.00E-06 row reads NC for all five schemes; the remaining rows are labeled Back2.04_X_ (suffix truncated).]

Table 38 - 448x160-node mesh, Cd per scheme
Table 39 - 448x160-node mesh, Fd* per scheme
Table 40 - 448x160-node mesh, d(Fd*) per scheme
[The numeric entries of Tables 38 to 40 were lost in transcription. The 6.00E-06 row reads NC for all five schemes; the remaining rows are labeled Back2.04_X_ (suffix truncated).]

B - OPTIMIZATION OF THE MACH2D SUBROUTINES USING Intel(R) VTune(TM) Performance Analyzer 9.1

Computer used: CFD-6
Intel Core 2 Duo E6700, 2.66 GHz
8 GB of RAM
160 GB hard disk

The quantity analyzed here is the CPU_CLK_UNHALTED event count, that is, the total number of clock cycles spent in a subroutine. A high value means that the subroutine takes many clock cycles, so it is a candidate for optimization, although it is not necessarily slow.

- This study used a 224x80-node mesh, dt = 1d-5, itmax = 1000, imax = 2, nitm_u = nitm_p = 4, tolu = 1d-1, tolp = 1d-2, solver = MSI.

1) Version GB_2012_10_01, simulation: SEN03_0001
Figure 1 - Original version
Program output (numeric values lost in transcription):
*** Efficiency: numerical solution 2D / analytic Q1D (dimensionless) ***
- discharge coefficient
- dynamic thrust
- pressure thrust at sea level
- pressure thrust in the vacuum
- total thrust at sea level
- total thrust in the vacuum
- thrust coefficient at sea level
- thrust coefficient in the vacuum
- characteristic velocity
- effective ejection velocity at sea level
- effective ejection velocity in the vacuum
- specific impulse at sea level
- specific impulse in the vacuum
- tcpuo: accumulated CPU time (s) (before interruption)

- dtcpu: CPU time (s) (after interruption)
- tcpu: total CPU time (s)

2) Version with some modifications in 'main.f90': SEN03_0002
Figure 2 - Modified version
[Program output: efficiency summary and CPU times; numeric values lost in transcription.]

3) Version with multiplication by a constant in the subroutines instead of division, and auxiliary variables to store the index: SEN03_
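The index part of this change can be illustrated with a minimal sketch. It is written in Python rather than the Fortran of mach2d, and it assumes the numbering aux_i = (j-1) * nx, index = aux_i + i given in the summary of modifications. The function names are hypothetical and do not come from the mach2d sources.

```python
def flat_indices_naive(nx, ny):
    """Recompute the full expression (j-1)*nx + i for every volume."""
    return [(j - 1) * nx + i
            for j in range(1, ny + 1)
            for i in range(1, nx + 1)]


def flat_indices_hoisted(nx, ny):
    """Same numbering, with the row offset computed once per row."""
    indices = []
    for j in range(1, ny + 1):
        aux_i = (j - 1) * nx           # depends only on j, so it leaves the i loop
        for i in range(1, nx + 1):
            indices.append(aux_i + i)  # a single addition per volume
    return indices
```

Both functions produce the same numbering; the second only moves the multiplication out of the inner loop, which is where the saving would come from in compiled code.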

Figure 3 - Modified version
[Program output: efficiency summary and CPU times; numeric values lost in transcription.]

4) SUBROUTINE: COEFFICIENTS_mp_GET_T_COEFFICIENTS_AND_SOURCE: SEN03_0004
An auxiliary variable inv_dt, the reciprocal of dt, was introduced, so that multiplication by inv_dt replaces division (which costs more computational time). The clock-cycle count dropped from 7,544,780,000 to 6,963,592,000. The routine, which used 4.79% of the total time, now uses 4.43%.
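A minimal sketch of the reciprocal trick follows, in Python rather than Fortran. Python floats are IEEE double precision, like Fortran double precision values, so the rounding behaviour carries over. The function names are hypothetical. The sketch also shows why the replacement is not always bit-for-bit neutral: multiplying by a stored reciprocal is exact when the divisor is a power of two, but can differ from the division in the last bit otherwise.

```python
def divide_each(values, dt):
    """Reference form: one division per element."""
    return [v / dt for v in values]


def multiply_by_reciprocal(values, dt):
    """Optimized form: one division in total, then multiplications."""
    inv_dt = 1.0 / dt  # computed once, outside the loop
    return [v * inv_dt for v in values]


def max_relative_difference(values, dt):
    """Largest relative gap between the two forms (0.0 if they agree)."""
    a = divide_each(values, dt)
    b = multiply_by_reciprocal(values, dt)
    return max((abs(x - y) / abs(x) for x, y in zip(a, b) if x != 0.0),
               default=0.0)
```

With a divisor of 4.0 the two forms agree exactly, because 0.25 is representable in binary. With a divisor of 3.0 and the value 5.0 they differ by one unit in the last place, a relative difference of roughly 1e-16.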

Figure 4 - Before optimization
Figure 5 - After optimization
[Program output: efficiency summary; numeric values lost in transcription.]

[Remainder of the program output, including CPU times; numeric values lost in transcription.]

5) SUBROUTINE: COEFFICIENTS_mp_GET_INTERNAL_SIMPLEC_COEFFICIENTS: SEN03_0005
The calculation was split in two: one loop to find the sums of the coefficients au and av, and two further loops for the calculation itself. Two arrays, soma_au and soma_av, were created and are used in the calculations. The clock-cycle count dropped from 6,291,760,000 to 4,668,166,000. The routine, which used 4.00% of the total time, now uses 2.99%.

Figure 6 - Before optimization
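The idea of storing the coefficient sums before using them can be sketched as follows, in Python rather than Fortran. The report does not show the SIMPLEC formula used in mach2d, so the face expression below is a placeholder chosen only because each sum is needed by two neighbouring faces. The function names and the data layout (one row of coefficients per volume) are assumptions.

```python
def face_factors_recomputed(a):
    """Pattern being avoided: every face recomputes the sums of both neighbours."""
    return [0.5 * (1.0 / sum(a[p]) + 1.0 / sum(a[p + 1]))
            for p in range(len(a) - 1)]


def face_factors_precomputed(a):
    """First loop stores the row sums; the second loop only reads them."""
    soma = [sum(row) for row in a]  # plays the role of soma_au or soma_av
    return [0.5 * (1.0 / soma[p] + 1.0 / soma[p + 1])  # placeholder formula
            for p in range(len(soma) - 1)]
```

Calling face_factors_precomputed once with the au rows and once with the av rows mirrors the soma_au and soma_av arrays described above. Each sum is evaluated once instead of once per face that needs it.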

Figure 7 - After optimization
[Program output: efficiency summary and CPU times; numeric values lost in transcription.]

6) SUBROUTINE: COEFFICIENTS_mp_GET_VELOCITIES_AT_FACES: SEN03_0006
An auxiliary variable inv_dt, the reciprocal of dt, was introduced, so that multiplication by inv_dt replaces division (which costs more computational time). The clock-cycle count dropped from 26,308,088,000 to 24,695,158,000. The routine, which used 16.73% of the total time, now uses 16.00%.

Figure 8 - Before optimization
Figure 9 - After optimization
[Program output: efficiency summary; numeric values lost in transcription.]

[Remainder of the program output, including CPU times; numeric values lost in transcription.]

7) SUBROUTINE: COEFFICIENTS_mp_GET_VELOCITIES_AT_FACES: SEN03_0007
The variable mp was indexed; now only mpa(np), mpa(npe) and mpa(npn) are used. The clock-cycle count dropped from 24,695,158,000 to 24,100,640,000. The routine, which used 16.00% of the total time, now uses 15.68%.

Figure 10 - Before optimization

Figure 11 - After optimization
[Program output: efficiency summary and CPU times; numeric values lost in transcription.]

8) SUBROUTINE: COEFFICIENTS_mp_GET_VELOCITIES_AT_FACES: SEN03_0008
A single loop is now used to compute the velocities, with special cases for i = nx - 1 and j = ny - 1. The clock-cycle count dropped from 24,100,640,000 to 22,085,144,000. The routine, which used 15.68% of the total time, now uses 14.55%.
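Merging two loops into one with boundary special cases can be sketched as below, in Python rather than Fortran. The sketch uses 0-based indices, a plain arithmetic average as a stand-in for the actual face-velocity formula, and hypothetical function names, so its loop bounds are illustrative and are not meant to reproduce the Fortran bounds of get_velocities_at_faces.

```python
def faces_two_loops(phi, nx, ny):
    """East and north face values computed in two separate loop nests."""
    east, north = {}, {}
    for j in range(ny):
        for i in range(nx - 1):
            east[(i, j)] = 0.5 * (phi[j][i] + phi[j][i + 1])
    for j in range(ny - 1):
        for i in range(nx):
            north[(i, j)] = 0.5 * (phi[j][i] + phi[j + 1][i])
    return east, north


def faces_one_loop(phi, nx, ny):
    """Same values from one loop nest plus two boundary special cases."""
    east, north = {}, {}
    for j in range(ny - 1):
        for i in range(nx - 1):
            east[(i, j)] = 0.5 * (phi[j][i] + phi[j][i + 1])
            north[(i, j)] = 0.5 * (phi[j][i] + phi[j + 1][i])
        i = nx - 1  # special case: the last column has no east face
        north[(i, j)] = 0.5 * (phi[j][i] + phi[j + 1][i])
    j = ny - 1      # special case: the last row has no north face
    for i in range(nx - 1):
        east[(i, j)] = 0.5 * (phi[j][i] + phi[j][i + 1])
    return east, north
```

The fused version reads each phi[j][i] once for both faces, which is the likely source of the saving. The special cases replace the if tests that would otherwise sit inside the inner loop.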

Figure 12 - Before optimization
Figure 13 - After optimization
[Program output: efficiency summary; numeric values lost in transcription.]

[Remainder of the program output, including CPU times; numeric values lost in transcription.]

9) SUBROUTINE: COEFFICIENTS_mp_GET_VELOCITIES_AT_FACES: SEN03_0009
sumup and sumvp were indexed. The clock-cycle count dropped from 22,085,144,000 to 14,447,054,000. The routine, which used 14.55% of the total time, now uses 10.00%.

Figure 14 - Before optimization

Figure 15 - After optimization
[Program output: efficiency summary and CPU times; numeric values lost in transcription.]

COMPARISON between the scheme with MSI in the mass loop and TDMAX in the others, without parallelization (using the -qopenmp library on the command line), and PURE MSI:

Figure 16 - MSI in the mass loop and TDMAX in the others: mach2d-sen03_0011
[Program output: efficiency summary and CPU times; numeric values lost in transcription.]

Figure 17 - Pure MSI: mach2d-sen03_0012
[Program output: efficiency summary and CPU times; numeric values lost in transcription.]

C - SUMMARY OF THE MODIFICATIONS relative to the mach2d version received from Guilherme Bertoldo:

1) The declaration of the variable g was removed from main.f90; otherwise the program would not even run.
2) A folder with the gnuplot files was created at the root of mach2d, so the subroutine postp.f90 uses the character variable gnuplot = .\gnuplot\gnuplot.
3) In every subroutine that used one loop in j and another in i, an auxiliary variable aux_i was created to hold part of the index:
   aux_i = (j-1) * nx
   index = aux_i + i
4) In every subroutine that divided by a constant, the division was replaced by a multiplication by a constant, for example:

- instead of 1/4, 0.25d0 was used.
- instead of 1/3, a decimal literal was used (it survives in the transcript only as 'd-1'), but Professor Manoel, of the High-Performance Computing course, did not approve this change; according to him it can add rounding errors to the numerical solution.
The optimizations above showed that division by dt also consumes a lot of computational time, so my recommendation is to create an auxiliary variable inv_dt, which can save considerable computational time.
5) Five solver schemes were added as options: 0) pure MSI; 1) TDMA-XY; 2) TDMA-X; 3) MSI in p and TDMAX in u, v and T; 4) MSI in p and Gauss-Seidel in u, v and T. Only one variable in the input data file needs to be changed (select case is used to choose among them).
6) Some lines of code in main.f90 were moved, such as the writing of the header, which is now outside the outer loop of the program.
7) The TDMA solver routines were optimized to remove the if statements from the program, using special cases instead.
8) In the subroutine get_t_coefficients_and_source, a variable inv_dt was created to store the reciprocal of dt.
9) In the subroutine get_internal_simplec_coefficients, soma_au and soma_av were indexed.
10) In the subroutine get_velocities_at_faces, a variable was created to store the reciprocal of dt; mpa, sumup and sumvp were indexed; and the two loops were merged into a single one with special cases.

D - PG REFINEMENT

- PG refinement in mach2d is easier than previously thought. For example, in a refinement by a factor of 2 in both directions, the first volume is divided by 2 (in the transverse direction). In other words, using the coarse-mesh a1 divided by 2 in the refined mesh is enough to obtain the PG refinement.
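The rule above can be sketched under two assumptions that the report does not state explicitly: that "PG" denotes a mesh whose volume widths follow a geometric progression, and that a1 is the width of the first volume. The sketch is in Python rather than Fortran, the function names are hypothetical, and the ratio is found by bisection, which is not necessarily how mach2d computes it.

```python
def gp_ratio(a1, n, length):
    """Ratio q >= 1 such that a1 * (1 + q + ... + q**(n-1)) equals length.

    Assumes a1 * n <= length, i.e. the first volume is the smallest one.
    """
    def total(q):
        return a1 * sum(q ** k for k in range(n))

    lo, hi = 1.0, 2.0
    while total(hi) < length:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):       # plain bisection on a monotonic function
        mid = 0.5 * (lo + hi)
        if total(mid) < length:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gp_widths(a1, n, length):
    """Widths of n volumes in geometric progression filling the given length."""
    q = gp_ratio(a1, n, length)
    return [a1 * q ** k for k in range(n)]


def refine_by_two(a1_coarse, n_coarse, length):
    """Refined mesh: twice the volumes, first width a1_coarse / 2."""
    return gp_widths(a1_coarse / 2.0, 2 * n_coarse, length)
```

For instance, gp_widths(0.01, 10, 1.0) and refine_by_two(0.01, 10, 1.0) give a coarse and a refined distribution over the same length, with the first refined volume half as wide as the first coarse one.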

Figure 18 - Base mesh of the PG refinement of 19/12/11
Figure 19 - Base mesh of the NEW PG refinement

Figure 20 - Mesh base * 2 of the PG refinement of 21/12/11
Figure 21 - Mesh base * 2 of the NEW PG refinement

Figure 22 - Mesh base * 4 of the PG refinement of 19/12/11
Figure 23 - Mesh base * 4 of the NEW PG refinement

E - CONCLUSIONS

- In this version of mach2d there is indeed no dt effect on the solution, whether a larger or a smaller dt is used. The same was verified for the different solvers.
- The E factor does not work for u and v, only for p and T. Taking a fixed dt for all variables as the baseline, it is possible to find an E factor for p and T that reduces the computational time while keeping dt fixed for u and v.

- The scheme with MSI in p and TDMAX in u, v and T is the best solver scheme for mach2d-euler, but when simulating mach2d-navier_stokes pure MSI proved to be the best scheme. It had previously been concluded that the scheme with MSI in p and Gauss-Seidel in u, v and T was faster, but in the current version the residual calculation was removed from the TDMAX solver, which reduced the computational time enough to make the MSI and TDMAX scheme the fastest.
- With VTune it is possible to analyze the computational time subroutine by subroutine, which is very important in mach2d because it has many subroutines. Some of them consume very little computational time and others, surprisingly, consumed more than the solver itself. This matters a great deal for optimization, because a small change in the routine that consumes the most CPU time can improve the program considerably. One example is the use of an auxiliary variable to store the reciprocal of dt (strength reduction), which removed about 1.6 billion clock cycles (from 26,308,088,000 to 24,695,158,000) from the subroutine get_velocities_at_faces.
- Several subroutines were optimized with VTune, the most important being get_velocities_at_faces. The computational time went from 61.329 s to 56.281 s (pure MSI with the -Qopenmp library enabled). This was on a 224x80-node mesh; on a finer mesh the computational gain should be easier to observe. The program is also faster on a better computer: on the Intel i5, the same mesh with the same parameters took 22.887 s before the optimizations and 20.200 s after them.
- To do PG refinement, simply use refined a1 = coarse a1 / 2.
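The gains quoted in these conclusions can be put in relative terms with a short calculation. The sketch below is in Python and uses only figures that appear in the report; the helper name and the dictionary keys are hypothetical labels.

```python
def relative_gain(before, after):
    """Fraction of the original cost that was removed."""
    return (before - after) / before


# Total run times in seconds, before and after the optimizations.
run_times = {
    "224x80 mesh, pure MSI": (61.329, 56.281),
    "224x80 mesh, Intel i5": (22.887, 20.200),
}

# Clock cycles of get_velocities_at_faces, from SEN03_0005 to SEN03_0009.
cycles = [26_308_088_000, 24_695_158_000, 24_100_640_000,
          22_085_144_000, 14_447_054_000]

time_gains = {name: relative_gain(b, a) for name, (b, a) in run_times.items()}
step_gains = [relative_gain(b, a) for b, a in zip(cycles, cycles[1:])]
total_cycle_gain = relative_gain(cycles[0], cycles[-1])
```

On these numbers the total run time falls by roughly 8% on the first machine and roughly 12% on the Intel i5, while get_velocities_at_faces alone loses about 45% of its clock cycles, most of it in the last step (indexing sumup and sumvp).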