Performance-aware task management and frequency scaling in embedded systems




Leonel Sousa, Francisco Gaspar, Aleksandar Ilic, Pedro Tomás
{las,fgaspar,ilic,pfzt}@sips.inesc-id.pt
Signal Processing Systems, INESC-ID / IST, Portugal

Motivation
- Demand for high performance in mobile embedded devices is increasing
  - High-frequency multi-core architectures
  - Solution to high power consumption: single-ISA heterogeneity (big.LITTLE)
- Default OS scheduling does not consider performance targets
  - Resources may be over-allocated
  - No performance fairness among tasks
- Tasks on mobile embedded systems do not require the high performance goals that typical schedulers aim to achieve

[Figure: normalised task performance (0 to 1) of tasks A and B in three panels: Default, Shares, Frequency. Adjusting shares equalizes performance; adjusting frequency reduces the error to target.]

Objectives
- Adaptive and lightweight task management
  - Provide performance fairness among the running tasks
  - Attain control over the allocation of shared computational resources
- Automatically scale frequency according to the dynamic characterization of the execution of the parallel tasks
  - Achieve energy-efficient execution

Outline
- Background
  - Scheduler
  - DVFS and cluster migration
- Performance-aware task management and frequency scaling in embedded systems
  - Concept
  - Share calculation and conversion
  - Frequency scaling
  - System and applications
- Experimental evaluation
  - Platform
  - Results
- Conclusions and future work


Scheduler
- The Linux Completely Fair Scheduler (CFS) assigns CPU shares to tasks
- For a compute-bound task, the share depends mainly on its nice level
- By default, tasks have the same nice level (i.e., the same processor share)
- Tasks with lower nice levels receive a larger CPU share

[Figure: timeline of tasks A, B and C across epoch i and epoch i+1; in epoch i+1 a task with a lower nice level gets a larger CPU share.]
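The relation between nice levels and CPU share can be illustrated with a small sketch. It assumes the commonly cited CFS rule of thumb that a task's load weight is 1024 at nice 0 and changes by a factor of about 1.25 per nice level; the kernel uses a precomputed table, so the numbers here are approximate. The function names are illustrative and not part of any kernel API.

```python
def cfs_weight(nice):
    """Approximate CFS load weight: 1024 at nice 0, about 1.25x per level.

    Assumption: the kernel's lookup table is replaced by this closed form.
    """
    if not -20 <= nice <= 19:
        raise ValueError("nice level must be in [-20, 19]")
    return 1024.0 / (1.25 ** nice)


def cpu_shares(nice_levels):
    """Fraction of CPU time for each always-runnable task sharing one core."""
    weights = {task: cfs_weight(n) for task, n in nice_levels.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


if __name__ == "__main__":
    # Same nice level: every task gets the same share.
    print(cpu_shares({"A": 0, "B": 0, "C": 0}))
    # Lowering A's nice level increases A's share at the expense of B and C.
    print(cpu_shares({"A": -5, "B": 0, "C": 0}))
```

Only the ratio between weights matters, which is why shares depend on the differences between nice levels rather than on their absolute values.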

DVFS and cluster migration
- Dynamic Voltage and Frequency Scaling (DVFS)
  - Different governors result in different behaviors; voltage is set according to frequency
- In a heterogeneous system with cluster migration, DVFS controls the migration
  - The system sees a virtual frequency range of 250 MHz to 1.6 GHz
  - 250 MHz to 600 MHz maps to the Cortex-A7 at twice the frequency
  - 800 MHz to 1.6 GHz maps to the Cortex-A15 directly

Virtual frequency range    Real frequency range
250 MHz to 600 MHz         500 MHz to 1.2 GHz (Cortex-A7)
800 MHz to 1.6 GHz         800 MHz to 1.6 GHz (Cortex-A15)
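The virtual-to-real frequency mapping described above can be sketched as a small lookup. The function name and return format are hypothetical; the slide does not say how frequencies between 600 MHz and 800 MHz are handled, so this sketch rejects them instead of guessing.

```python
def virtual_to_real(virtual_mhz):
    """Map a virtual frequency (MHz) to (cluster, real frequency in MHz).

    Ranges follow the slide: 250-600 MHz runs on the Cortex-A7 at twice the
    virtual frequency; 800-1600 MHz runs on the Cortex-A15 unchanged.
    """
    if 250 <= virtual_mhz <= 600:
        return ("Cortex-A7", 2 * virtual_mhz)
    if 800 <= virtual_mhz <= 1600:
        return ("Cortex-A15", virtual_mhz)
    # Assumption: anything outside the two documented ranges is an error.
    raise ValueError(f"no documented mapping for {virtual_mhz} MHz")


if __name__ == "__main__":
    for v in (250, 600, 800, 1600):
        print(v, virtual_to_real(v))
```

Crossing the boundary between the two ranges is what triggers a cluster migration, so a frequency controller built on this interface implicitly chooses the cluster as well.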


Performance-aware task management and frequency scaling in embedded systems: Concept
- Application-system interaction model
- Performance is assumed proportional to share and frequency: $P \propto s$; $P \propto f$
- Applications report their performance
- Application-specific parameter: $P \propto c$

$$P = c \cdot s \cdot f$$
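A minimal sketch of how the model $P = c \cdot s \cdot f$ could be used at run time: the application-specific parameter is estimated from one reported performance sample and then reused to predict performance under a different share and frequency. The function names and the sample values are illustrative assumptions, not taken from the slides.

```python
def estimate_c(performance, share, freq_mhz):
    """Estimate the application-specific parameter c from P = c * s * f."""
    if share <= 0 or freq_mhz <= 0:
        raise ValueError("share and frequency must be positive")
    return performance / (share * freq_mhz)


def predict_performance(c, share, freq_mhz):
    """Predict performance for a given share and frequency under the model."""
    return c * share * freq_mhz


if __name__ == "__main__":
    # Hypothetical sample: 30 heartbeats/s at half the CPU and 800 MHz.
    c = estimate_c(performance=30.0, share=0.5, freq_mhz=800)
    # Halving the share while doubling the frequency leaves P unchanged.
    print(predict_performance(c, share=0.25, freq_mhz=1600))
```

Because the model is linear in both share and frequency, a single sample is enough to estimate the parameter; in practice the estimate would be refreshed as new performance reports arrive.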

Share calculation and conversion
- Attribute shares to minimize the global error
  - Equalize the application error
- Shares are applied through nice levels
  - The conversion only handles intervals between nice levels
  - Additional restriction introduced: the highest-priority task is kept as close as possible to nice level 0

[Figure: application performance relative to target, before ("Previous") and after ("After") the share adjustment.]
[Equation: mathematical formulation of the share calculation]
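The slide's own formulation is not preserved in this transcript, so the following is only one possible sketch of the idea, built on the model $P = c \cdot s \cdot f$. It assumes all tasks run at a common frequency and that shares sum to 1; equal relative performance $P_i / T_i$ across tasks then gives $s_i \propto T_i / c_i$. The conversion to nice levels assumes a weight ratio of about 1.25 per nice level and pins the task with the largest share to nice 0. All names are hypothetical.

```python
import math


def equalizing_shares(targets, coeffs):
    """Shares that give every task the same performance-to-target ratio.

    Assumption: P_i = c_i * s_i * f with a common f and sum(s_i) == 1.
    """
    demand = {}
    for task, target in targets.items():
        if target <= 0 or coeffs[task] <= 0:
            raise ValueError("targets and coefficients must be positive")
        demand[task] = target / coeffs[task]
    total = sum(demand.values())
    return {task: d / total for task, d in demand.items()}


def shares_to_nice(shares, step=1.25, max_nice=19):
    """Convert shares to nice levels, with the largest share at nice 0.

    Only ratios between shares matter, so the top task is pinned to 0 and
    the others are offset by the rounded number of 1.25x steps.
    """
    top = max(shares.values())
    return {
        task: min(max_nice, round(math.log(top / s) / math.log(step)))
        for task, s in shares.items()
    }


if __name__ == "__main__":
    # Hypothetical targets (heartbeats/s) and fitted coefficients.
    shares = equalizing_shares({"A": 30.0, "B": 30.0}, {"A": 0.1, "B": 0.05})
    print(shares)
    print(shares_to_nice(shares))
```

Rounding to integer nice levels means the applied shares only approximate the computed ones.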

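Frequency scaling
- Scale frequency
  - Bring applications to target
  - Achieve energy savings

[Figure: application performance relative to target, before ("Previous") and after ("After") frequency scaling.]
[Equation: mathematical formulation relating target performance and predicted performance]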
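As the formulation itself is missing from this transcript, the sketch below shows one plausible reading: with shares fixed, the model $P = c \cdot s \cdot f$ predicts the frequency each task needs to reach its target, and the controller picks the lowest available frequency that satisfies the most demanding task. The list of available frequencies and all function names are assumptions made for illustration.

```python
def required_frequency(targets, coeffs, shares):
    """Lowest frequency (MHz) at which every predicted P_i reaches T_i."""
    return max(
        targets[task] / (coeffs[task] * shares[task]) for task in targets
    )


def pick_frequency(required_mhz, available_mhz):
    """Lowest available frequency that meets the requirement.

    Falls back to the maximum when the requirement cannot be met.
    """
    for freq in sorted(available_mhz):
        if freq >= required_mhz:
            return freq
    return max(available_mhz)


if __name__ == "__main__":
    targets = {"A": 30.0, "B": 30.0}
    coeffs = {"A": 0.1, "B": 0.05}
    shares = {"A": 1.0 / 3.0, "B": 2.0 / 3.0}
    need = required_frequency(targets, coeffs, shares)
    # Hypothetical frequency steps exposed by the DVFS driver.
    print(pick_frequency(need, [250, 600, 800, 1000, 1600]))
```

Choosing the lowest sufficient frequency is what yields the energy savings: any frequency above it would only push applications past their targets.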

System and applications
- Shares are applied to the system by changing the tasks' nice levels
- Frequency is applied by interacting with DVFS and setting the system frequency
- Both affect application performance
- Modified applications report their performance through Heartbeats*

* H. Hoffmann, J. Eastep, M. D. Santambrogio, J. E. Miller, and A. Agarwal, "Application Heartbeats: A Generic Interface for Specifying Program Performance and Goals in Autonomous Computing Environments"
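The two actuators can be sketched from user space on Linux. The slides do not say which interfaces the authors use; this sketch assumes the standard `setpriority` call (exposed in Python as `os.setpriority`) for nice levels and the cpufreq `scaling_setspeed` sysfs file, which is only writable when the `userspace` governor is active. The helper names and the default path are assumptions.

```python
import os

# Assumption: cpufreq sysfs file for CPU 0 under the userspace governor.
DEFAULT_SETSPEED = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed"


def set_task_nice(pid, nice):
    """Apply a nice level to a task (lowering it usually needs privileges)."""
    os.setpriority(os.PRIO_PROCESS, pid, nice)


def set_frequency_khz(freq_khz, path=DEFAULT_SETSPEED):
    """Request a frequency in kHz by writing it to the cpufreq sysfs file."""
    with open(path, "w") as handle:
        handle.write(str(int(freq_khz)))


if __name__ == "__main__":
    # Raising our own nice level needs no special privileges.
    set_task_nice(os.getpid(), 5)
    print(os.getpriority(os.PRIO_PROCESS, os.getpid()))
```

On Linux the nice value is effectively a per-thread attribute, so a controller managing multi-threaded applications would likely need to apply the level to each thread ID.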


Experimental evaluation: Platform
- Odroid-XU+E
  - big.LITTLE: 4x Cortex-A7; 4x Cortex-A15
  - Cluster migration
  - 2 GB of RAM
- OS: Linux Ubuntu, custom 3.4 kernel (by Hardkernel)

Virtual frequency range    Real frequency range
250 MHz to 600 MHz         500 MHz to 1.2 GHz (Cortex-A7)
800 MHz to 1.6 GHz         800 MHz to 1.6 GHz (Cortex-A15)

Experimental evaluation: Benchmarks and results
- Iterative QoS applications
  - That interact in real time with the user (target set by maximum perceived performance)
  - That sample data from sensors (target set by data availability)
- Fluidanimate, Swaptions, Blackscholes and x264 (from PARSEC) used for benchmarking (4 threads each)

[Figure: results annotated with the share controller (fairness) and the frequency controller (energy).]

Experimental evaluation: Performance results
- Fluidanimate, Swaptions and x264 running simultaneously
- No controller: performance is not on target
- With controller: performance is on target, within ±10% of the target

[Figure: application performance without and with the controller; the ±10% target band is marked.]
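The ±10% band can be expressed as a simple check. The slides do not define the error metric, so this sketch assumes the relative error $|P - T| / T$; the function names and sample values are illustrative.

```python
def relative_error(measured, target):
    """Relative distance of measured performance from its target.

    Assumption: error is |P - T| / T; the slides do not give the definition.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    return abs(measured - target) / target


def on_target(measured, target, tolerance=0.10):
    """True when performance lies within the tolerance band around target."""
    return relative_error(measured, target) <= tolerance


if __name__ == "__main__":
    print(on_target(28.0, 30.0))  # inside the band
    print(on_target(45.0, 30.0))  # 50% above target, outside the band
```

Note that overshooting the target counts as an error too, which matches the motivation that over-allocated resources are wasteful.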

Experimental evaluation: Frequency and power results
- No controller: thermal throttling
- With controller: migration to the Cortex-A7 cluster

[Figure: frequency and power, without and with the controller.]


Roundup and conclusions
- Scheduling for heterogeneous embedded systems
  - Lightweight task management and frequency scaling method
- Performance-aware
  - Application-system interaction established
  - Captures the run-time behavior of multiple parallel applications
- Performance fairness and energy savings facilitated
  - Shared system resources allocated to meet target performance
  - Relies on DVFS to manage the system energy-efficiency levels
- Experimental evaluation
  - Relative performance error reduced from 2.801 to 0.168, a roughly 16x drop
  - Up to 49% reduction in the overall energy consumption

Future work
- Improve response in case of thermal emergencies
- Gracefully handle non-QoS tasks
- Explore per-core performance fairness (thread level)*
- Consider systems that allow different frequency levels per core

* Already in progress

Thank you! Questions?
Leonel Sousa, las@sips.inesc-id.pt