What is so special in the Tomasulo Hardware Algorithm?




Leonel Sousa

Outline
1. Motivation and main objective
2. Tomasulo hardware algorithm
3. Impact on current processors and systems
4. Robert Tomasulo's life and OoO history


Motivation and main objective
Computers in the late 1960s started to have several execution units.
The first step toward exploiting concurrency was to divide execution into two independent parts: fixed-point and floating-point.
However, for this scheme to be effective, the program must contain a balanced mixture of both types of instructions.
Therefore, the main objective of Tomasulo's paper is to achieve concurrent execution of floating-point instructions in the IBM System/360 Model 91.
In the IBM System/360 Model 91, the floating-point execution units are an adder and a multiplier/divider.

Motivation and main objective
It is no longer a matter of classifying each individual instruction independently of the previous instructions.
Rather, it is a question of determining each instruction's relationship with all previous, incomplete instructions.
The final objective is to do this in HARDWARE:
To preserve the precedences while allowing independent operations to overlap in execution.
For the first time, to dynamically exploit concurrency in programs with no effort by the programmer or by the compiler.

Motivation and main objective: IBM System/360 Model 91
Designed to handle data processing for scientific applications such as space exploration, theoretical astronomy, subatomic physics and global weather forecasting.
Instruction time: 60 x 10^-9 s (60 ns)
Time to fetch/store 8 bytes: 780 ns
Main memory: 2,097,152 bytes
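To put these figures in proportion, the short calculation below expresses the memory access time in units of the instruction time and the memory size in MiB. It only restates the numbers on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide for the IBM System/360 Model 91.
CYCLE_NS = 60               # instruction time, in ns
MEMORY_ACCESS_NS = 780      # time to fetch/store 8 bytes, in ns
MAIN_MEMORY_BYTES = 2_097_152

# How many instruction times fit into one memory access.
cycles_per_access = MEMORY_ACCESS_NS // CYCLE_NS

# Main memory size expressed in MiB (2**20 bytes).
main_memory_mib = MAIN_MEMORY_BYTES // 2**20

print(f"one memory access = {cycles_per_access} instruction times")
print(f"main memory = {main_memory_mib} MiB")
```

A single memory access costs as much time as 13 instructions, which is why keeping the floating-point units busy while operands are still arriving matters so much.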


Tomasulo hardware algorithm: IBM 360/91 datapath (figure)

Tomasulo hardware algorithm: Data (True) dependencies
In the sequence LD F0, FLB1; MD F0, FLB2, the operands F0 and FLB2 cannot be sent to the multiplier while FLB1 has not yet been loaded into F0.
This sequence illustrates the cardinal precedence principle:
No floating-point register may participate in an operation if it is the sink of another incomplete instruction.
A register cannot be used until its contents reflect the result of the most recent operation that uses that register as a sink.

Tomasulo hardware algorithm: Before the Tomasulo algorithm
Designs did not incorporate mechanisms to deal with this dependence.
Such a hardware mechanism requires the following functions:
To identify dependencies and enforce the correct sequencing of the dependent instructions.
To distinguish the given sequence from others such as LD F0, FLB1; MD F2, FLB2, where it must allow the independent MD to proceed regardless of the disposition of the LD.

Tomasulo hardware algorithm: the simplest approach does not work
One possible hardware approach to identify true dependencies:
A busy bit is associated with each register. It is set when an instruction designates the register as its sink and reset when the result is written.
No instruction can be issued if the busy bit of its sink is on.
If the busy bit of a source register is on, control bits are set.
When the result is written into the register, these control bits cause the register to be sent via the FLR bus to the unit waiting for it as a source.
However, in practice this simple hardware scheme does not exploit concurrency, for two main reasons:
Execution units are occupied while waiting for their operands.
It does not resolve name (non-true) dependencies, such as in the code LD F0, Ei; MD F0, Di; AD F0, Ci; LD F0, Ei-1
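The effect of the busy-bit rule on the code sequence above can be sketched with a toy model. This is an illustration under stated assumptions, not the Model 91 logic: the function name, the one-issue-per-cycle rule and the latencies (LD 2, MD 6, AD 2 cycles) are hypothetical, and only the rule "no issue while the sink's busy bit is on" is modeled.

```python
def issue_cycles(program, latency):
    """In-order issue with one busy bit per register.

    program: list of (op, sink) pairs; latency: op -> cycles (assumed values).
    Returns the cycle at which each instruction is issued.
    """
    busy_until = {}   # register -> cycle at which its busy bit is reset
    cycle = 0
    issued = []
    for op, sink in program:
        # Stall while the busy bit of the sink register is still on.
        cycle = max(cycle, busy_until.get(sink, 0))
        issued.append(cycle)
        busy_until[sink] = cycle + latency[op]
        cycle += 1    # assumption: at most one instruction issued per cycle
    return issued

# Hypothetical latencies, chosen only for illustration.
LATENCY = {"LD": 2, "MD": 6, "AD": 2}

# LD F0, Ei; MD F0, Di; AD F0, Ci; LD F0, Ei-1: every instruction sinks into F0.
same_register = [("LD", "F0"), ("MD", "F0"), ("AD", "F0"), ("LD", "F0")]
schedule = issue_cycles(same_register, LATENCY)
print(schedule)
```

In this model the last LD is issued only after the AD has finished, although it needs none of the earlier results: it waits purely because it reuses the name F0.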

Tomasulo hardware algorithm: RS + CDB/Tag
Reservation Stations (RS)
Several sets of registers (control, sink, source) are associated with each execution unit. Each set is called an RS.
Instruction issue depends on the availability of a reservation station.
In the Model 91 there are three add RSs and two multiply/divide RSs.
Common Data Bus (CDB)
The CDB is fed by all units that can modify a register, and it feeds all units that can have a register as an operand (RS and FLR).
A TAG identifies each contributor to the CDB. Since there are eleven contributors in the 360/91, a 4-bit tag is used.
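The 4-bit tag width follows directly from the number of contributors. The sketch below recomputes it; the slide gives the five reservation stations and the total of eleven, and the remaining six contributors are taken here to be the floating-point buffers (FLB), as described in Tomasulo's original paper.

```python
# Contributors to the CDB in the 360/91 (the FLB count is the assumption noted above).
contributors = {
    "floating-point buffers (FLB)": 6,
    "add reservation stations": 3,
    "multiply/divide reservation stations": 2,
}

total = sum(contributors.values())

# Smallest number of bits able to give each contributor a distinct tag.
tag_bits = (total - 1).bit_length()

print(f"{total} contributors need a {tag_bits}-bit tag")
```

Three bits would distinguish only eight sources, so four bits is the minimum for eleven.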

Tomasulo hardware algorithm: Microarchitecture (figure)

Tomasulo hardware algorithm
1. Instruction issue
The instruction is issued to a free RS for the current operation.
Source operands whose values are available are copied to this RS.
If an operand is not available, the tag of the source register is copied to the RS instead.
The tag of the sink register is set to the identifier of the RS used.
2. Instruction execution
When all source operands are available in an RS, the operation is executed.
If not, the RS snoops the CDB to catch the missing source operands.
3. Write results
When an executed instruction produces its result, the result is placed on the CDB together with the RS tag.
This is the opportunity for other RSs and FLRs to update themselves.
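The three steps above can be sketched as a small simulator. This is a minimal sketch under stated assumptions, not the Model 91 design: the class and method names are hypothetical, instructions use a two-address form (sink = sink op source), loads from the FLBs are not modeled, and one ready RS executes and broadcasts per call to step().

```python
from dataclasses import dataclass, field

@dataclass
class Station:
    name: str                 # tag this RS broadcasts on the CDB
    kind: str                 # "AD" or "MD": the unit this RS belongs to
    busy: bool = False
    values: list = field(default_factory=list)  # operand values (None = missing)
    tags: list = field(default_factory=list)    # producer tag of each missing operand

OPS = {"AD": lambda a, b: a + b, "MD": lambda a, b: a * b}

class Machine:
    def __init__(self, regs):
        self.regs = dict(regs)   # floating-point register values
        self.reg_tag = {}        # register -> tag of the RS that will write it
        self.stations = [Station("A1", "AD"), Station("A2", "AD"), Station("M1", "MD")]

    def issue(self, op, sink, src):
        """Step 1: take a free RS, copy values or tags, retag the sink."""
        rs = next((s for s in self.stations if s.kind == op and not s.busy), None)
        if rs is None:
            raise RuntimeError("no free reservation station: issue stalls")
        rs.busy, rs.values, rs.tags = True, [], []
        for reg in (sink, src):
            if reg in self.reg_tag:             # value not produced yet: keep the tag
                rs.values.append(None)
                rs.tags.append(self.reg_tag[reg])
            else:                               # value available: copy it
                rs.values.append(self.regs[reg])
                rs.tags.append(None)
        self.reg_tag[sink] = rs.name
        return rs.name

    def step(self):
        """Steps 2 and 3: execute one ready RS and broadcast (tag, result)."""
        rs = next((s for s in self.stations if s.busy and None not in s.values), None)
        if rs is None:
            return None
        result = OPS[rs.kind](*rs.values)
        rs.busy = False
        for other in self.stations:             # waiting RSs snoop the CDB
            if other.busy:
                for i, tag in enumerate(other.tags):
                    if tag == rs.name:
                        other.values[i], other.tags[i] = result, None
        for reg, tag in list(self.reg_tag.items()):   # registers update themselves
            if tag == rs.name:
                self.regs[reg] = result
                del self.reg_tag[reg]
        return rs.name, result

m = Machine({"F0": 2.0, "F2": 3.0, "F4": 4.0})
m.issue("AD", "F0", "F2")     # F0 = F0 + F2
m.issue("MD", "F4", "F0")     # F4 = F4 * F0, must wait for the AD
trace = [m.step(), m.step(), m.step()]
print(trace, m.regs)
```

The MD is issued immediately even though F0 is not ready: it holds the tag A1 in place of the value and picks the result up from the broadcast.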

Tomasulo hardware algorithm: Why is it so important?
RSs relieve structural hazards by making a single physical unit appear as several virtual units.
The CDB-based tag scheme resolves true dependencies:
Inst I: LD F0, FLB1
Inst J: MD F0, FLB2
(Figure: Inst I passes F0 to Inst J through the CDB, identified by its tag.)

Tomasulo hardware algorithm: Why is it so important?
Better still, it resolves all non-true dependencies.
Name dependencies: anti-dependencies and output dependencies.
WHY? Because with RSs Tomasulo's algorithm automatically performs register renaming.
Anti-dependency
Inst I: DADD R2,R1,R0
Inst J: DSUB R1,R3,R4
Output dependency
Inst I: DADD R2,R1,R0
Inst J: DSUB R2,R3,R4
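Why renaming removes both kinds of name dependency can be checked on the two instruction pairs above. The sketch below is a simplified model with hypothetical names and register values: at issue each instruction captures its operand values (or the producer's tag), and at write-back only the latest writer of a register updates it. Executing Inst J before Inst I then gives the same registers as program order.

```python
def run(program, regs, order):
    """program: list of (op, dest, src1, src2); order: execution order (indices).

    The order must respect true dependencies; name dependencies may be violated.
    """
    regs = dict(regs)
    producer = {}    # register -> index of the latest instruction that writes it
    stations = []    # captured operands per instruction: ("val", v) or ("tag", i)
    # Issue in program order: capturing operands here is the renaming step.
    for i, (op, dest, s1, s2) in enumerate(program):
        stations.append([("tag", producer[s]) if s in producer else ("val", regs[s])
                         for s in (s1, s2)])
        producer[dest] = i
    results = {}
    # Execute in the given (possibly out-of-order) sequence.
    for i in order:
        op, dest = program[i][0], program[i][1]
        a, b = (results[x] if kind == "tag" else x for kind, x in stations[i])
        results[i] = a + b if op == "DADD" else a - b
        if producer[dest] == i:      # only the latest writer updates the register
            regs[dest] = results[i]
    return regs

# Hypothetical initial register values.
REGS = {"R0": 1, "R1": 2, "R2": 0, "R3": 10, "R4": 4}

anti = [("DADD", "R2", "R1", "R0"), ("DSUB", "R1", "R3", "R4")]
output = [("DADD", "R2", "R1", "R0"), ("DSUB", "R2", "R3", "R4")]

anti_result = run(anti, REGS, order=[1, 0])      # Inst J executes first
output_result = run(output, REGS, order=[1, 0])  # Inst J executes first
print(anti_result, output_result)
```

In the anti-dependency case Inst I still adds the old R1, because the value was copied at issue. In the output-dependency case the late result of Inst I is discarded, because R2 is already tagged with Inst J.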


Impact on current processors
It naturally supports Out-of-Order (OoO) execution.
With only slight changes it supports speculative execution: an additional commit step and a ReOrder Buffer (ROB).
OoO speculative execution allows processors to be truly parallel, executing several instructions per cycle: Instruction Level Parallelism (ILP).
Current out-of-order superscalar processors are based on the Tomasulo hardware algorithm:
General-purpose processors: PowerPC (IBM), Intel, AMD
Embedded processors: ARM, MIPS
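The role of the commit step can be illustrated with a small sketch. The slide only names the ROB and the extra commit step; the model below is an assumption about how they are commonly combined: instructions complete in any order but leave the ROB (commit) strictly in program order. The function and instruction names are hypothetical.

```python
from collections import deque

def commit_trace(issue_order, completion_order):
    """Return, for each completion event, the instructions committed by it."""
    rob = deque(issue_order)   # ROB entries kept in program order
    done = set()
    trace = []
    for tag in completion_order:
        done.add(tag)
        retired = []
        # Commit from the head of the ROB only, so state changes in program order.
        while rob and rob[0] in done:
            retired.append(rob.popleft())
        trace.append((tag, retired))
    return trace

# I3 finishes first but cannot commit before I1 and I2.
rob_trace = commit_trace(["I1", "I2", "I3"], ["I3", "I1", "I2"])
print(rob_trace)
```

Because nothing younger commits before the head of the ROB, a mispredicted or faulting instruction can still be discarded together with everything issued after it.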

Impact on current processors
It has been fundamental to keeping pace with Moore's law: the number of transistors on an integrated circuit doubles about every two years.


Robert Tomasulo's life and OoO history
Eckert-Mauchly Award 1997, citation: "For the ingenious Tomasulo's algorithm, which enabled out-of-order execution processors to be implemented."
Why did it take so long to make use of Tomasulo's hardware algorithm?
Mainly because at that time it did not significantly improve performance: memory was slow, there were few execution units, and no ILP was exploited.

Robert Tomasulo's life and OoO history
Last seminar, January 2008, at the University of Michigan: Out-of-Order processing: history of the IBM System/360 Model 91
http://inst-tech.engin.umich.edu/media/index.php?sk=cse-dls-07&id=1683
Robert Tomasulo passed away on April 3, aged 73. His friends called him Uncle Bob.
WHAT is GOING ON?
The exploitation of ILP has reached its limits, and the multi-core era is starting.
Several processors work in parallel.
More of the effort moves to the software side: programming models and compilers.
PPL Stanford: http://ppl.stanford.edu/wiki/index.php/pervasive_parallelism_laboratory

See John Hennessy on YouTube: http://www.youtube.com/watch?v=fou3zzh5qo4
http://sips.inesc-id.pt
