Performance-aware task management and frequency scaling in embedded systems
Leonel Sousa, Francisco Gaspar, Aleksandar Ilic, Pedro Tomás
{las,fgaspar,ilic,pfzt}@sips.inesc-id.pt
Signal Processing Systems, INESC-ID / IST, Portugal
Motivation
- Demand for high performance in mobile embedded devices is increasing
  - High-frequency multi-core architectures
  - Solution to high power consumption: single-ISA heterogeneity (big.LITTLE)
- Default OS scheduling does not consider performance targets:
  - Resources may be over-allocated
  - No performance fairness among tasks
- Tasks on mobile embedded systems do not require the high performance that typical schedulers aim to achieve
[Figure: normalised task performance (0 to 1) of tasks A and B in three panels: Default; Shares (equalize performance); Frequency (reduce error to target)]
Objectives
- Adaptive and lightweight task management
- Provide performance fairness among the running tasks
- Attain control over the allocation of shared computational resources
- Automatically scale frequency according to the dynamic characterization of the execution of the parallel tasks
- Achieve energy-efficient execution
Outline
- Background
  - Scheduler
  - DVFS and Cluster migration
- Performance-aware task management and frequency scaling in embedded systems
  - Concept
  - Share calculation and conversion
  - Frequency Scaling
  - System and applications
- Experimental Evaluation
  - Platform
  - Results
- Conclusions and Future Work
Scheduler
- The scheduler (CFS) attributes CPU shares to tasks
- For a compute-bound task, the share depends mainly on its nice level
- By default, tasks have the same nice level (i.e., the same processor share)
- Tasks with lower nice levels receive a larger CPU share
[Figure: timeline of tasks A, B and C scheduled across epoch i and epoch i+1]
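The relation between nice levels and CPU shares can be sketched as follows. This is a minimal illustration, not the kernel's code: it assumes the commonly cited approximation that CFS weights change by roughly a factor of 1.25 per nice step, with nice 0 at weight 1024. The function names are hypothetical.

```python
# Assumption: CFS weight is approximately 1024 / 1.25**nice.
# The kernel uses a precomputed table; this closed form is only an approximation.
NICE_0_WEIGHT = 1024.0
STEP = 1.25


def nice_to_weight(nice: int) -> float:
    """Approximate scheduler weight for a nice level in [-20, 19]."""
    if not -20 <= nice <= 19:
        raise ValueError("nice level must be in [-20, 19]")
    return NICE_0_WEIGHT / (STEP ** nice)


def cpu_shares(nice_levels: dict[str, int]) -> dict[str, float]:
    """CPU share of each compute-bound task competing on the same CPU."""
    weights = {task: nice_to_weight(n) for task, n in nice_levels.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


if __name__ == "__main__":
    # Same nice level: every task gets the same share.
    print(cpu_shares({"A": 0, "B": 0, "C": 0}))
    # Lowering the nice level of A increases its share.
    print(cpu_shares({"A": -5, "B": 0, "C": 0}))
```

Because only weight ratios matter, shifting every nice level by the same amount leaves the shares essentially unchanged under this approximation.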
DVFS and Cluster migration
- Dynamic Voltage and Frequency Scaling (DVFS)
  - Different governors result in different behaviors; voltage is set according to frequency
- In a heterogeneous system with cluster migration, DVFS controls migration
  - The system sees a virtual frequency range of 250 MHz to 1.6 GHz
  - Virtual 250 MHz to 600 MHz maps to the Cortex-A7 at twice the frequency (500 MHz to 1.2 GHz real)
  - Virtual 800 MHz to 1.6 GHz maps to the Cortex-A15 directly (800 MHz to 1.6 GHz real)
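The virtual-to-real frequency mapping described on this slide can be sketched as a small function. The function name is hypothetical, and the handling of values between 600 MHz and 800 MHz is an assumption: the slide does not say what happens there, so the sketch rejects them.

```python
def map_virtual_frequency(virtual_mhz: int) -> tuple[str, int]:
    """Return (cluster, real frequency in MHz) for a virtual frequency in MHz."""
    if 250 <= virtual_mhz <= 600:
        # Low virtual range runs on the Cortex-A7 at twice the frequency.
        return ("Cortex-A7", 2 * virtual_mhz)
    if 800 <= virtual_mhz <= 1600:
        # High virtual range runs on the Cortex-A15 at the same frequency.
        return ("Cortex-A15", virtual_mhz)
    # Assumption: anything outside the two ranges is not a valid setting.
    raise ValueError(f"{virtual_mhz} MHz is outside the virtual frequency ranges")


if __name__ == "__main__":
    for f in (250, 600, 800, 1600):
        print(f, "->", map_virtual_frequency(f))
```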
Performance-aware task management and frequency scaling in embedded systems: Concept
- Application-system interaction model
- Performance assumed proportional to share and frequency: $P \propto s$; $P \propto f$
- Applications report their performance
- Application-specific parameter: $P \propto c$
- $P = c \times s \times f$
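The model P = c x s x f can be illustrated with a short sketch: the application-specific parameter c is estimated from one reported performance sample and then used to predict performance under a different share and frequency. The function names and the sample numbers are hypothetical.

```python
def estimate_c(performance: float, share: float, frequency: float) -> float:
    """Estimate the application-specific parameter c from P = c * s * f."""
    if share <= 0 or frequency <= 0:
        raise ValueError("share and frequency must be positive")
    return performance / (share * frequency)


def predict_performance(c: float, share: float, frequency: float) -> float:
    """Predict performance for a new share and frequency under P = c * s * f."""
    return c * share * frequency


if __name__ == "__main__":
    # Hypothetical sample: 30 heartbeats/s reported at a 50% share and 1200 MHz.
    c = estimate_c(30.0, 0.5, 1200.0)
    # Predicted rate if the share drops to 25% and the frequency rises to 1600 MHz.
    print(predict_performance(c, 0.25, 1600.0))
```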
Share calculation and conversion
- Attribute shares to minimize global error
- Equalize application error
- Shares applied through nice levels
- Conversion only handles intervals
- Additional restriction introduced: highest-priority task as close as possible to nice level 0
[Figure and mathematical formulation: application performance relative to target, "Previous" versus "After" share attribution]
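The slide's mathematical formulation is not preserved here, so the following is only one possible reading, not the authors' formulation. It assumes the model P = c x s x f, equalizes the ratio of performance to target across applications (which gives shares proportional to target / c), and converts shares to nice levels using the approximate 1.25x-per-step weight rule, pinning the task with the largest share at nice 0. All names are hypothetical.

```python
import math


def equalizing_shares(c: list[float], targets: list[float]) -> list[float]:
    """Shares that give every application the same performance/target ratio.

    Under P_i = c_i * s_i * f, the ratio P_i / T_i is equal for all i
    when s_i is proportional to T_i / c_i.
    """
    demand = [t / ci for ci, t in zip(c, targets)]
    total = sum(demand)
    return [d / total for d in demand]


def shares_to_nice(shares: list[float], step: float = 1.25) -> list[int]:
    """Convert shares to nice levels; the largest share is placed at nice 0.

    Assumption: weights differ by about `step` per nice level, so only the
    distance between nice levels is determined by the share ratio.
    """
    top = max(shares)
    levels = []
    for s in shares:
        offset = round(math.log(top / s) / math.log(step))
        levels.append(min(19, offset))  # nice levels cannot exceed 19
    return levels


if __name__ == "__main__":
    # Two applications with the same target; the first is twice as efficient.
    shares = equalizing_shares(c=[2.0, 1.0], targets=[1.0, 1.0])
    print(shares)
    print(shares_to_nice(shares))
```

Rounding to integer nice levels means the applied shares only approximate the computed ones.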
Frequency scaling
- Scale frequency
- Bring applications to target
- Achieve energy savings
[Figure and mathematical formulation: application performance relative to target, "Previous" versus "After" frequency scaling; the formulation relates target performance to predicted performance]
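Since the formulation itself is not preserved, the sketch below shows one way frequency scaling could follow from the proportional model, and should not be read as the authors' method. It assumes performance scales linearly with frequency, computes the frequency needed to move the predicted performance to the target, and picks the lowest available level that satisfies it. The function name and frequency list are hypothetical.

```python
def next_frequency(current_mhz: float, predicted: float, target: float,
                   available_mhz: list[int]) -> int:
    """Lowest available frequency expected to reach the target performance.

    Assumption: performance is proportional to frequency, so the required
    frequency is current * target / predicted.
    """
    if predicted <= 0:
        raise ValueError("predicted performance must be positive")
    required = current_mhz * target / predicted
    candidates = sorted(f for f in available_mhz if f >= required)
    # If no level is fast enough, fall back to the highest one.
    return candidates[0] if candidates else max(available_mhz)


if __name__ == "__main__":
    levels = [250, 500, 800, 1200, 1600]
    # Running twice as fast as needed: the frequency can be halved.
    print(next_frequency(1600, predicted=40.0, target=20.0, available_mhz=levels))
    # Target unreachable: stay at the maximum frequency.
    print(next_frequency(800, predicted=10.0, target=30.0, available_mhz=levels))
```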
System and applications
- Shares are applied to the system by changing the tasks' nice levels
- Frequency is applied by interacting with DVFS and setting the system frequency
- Both affect application performance
- Modified applications report their performance through Heartbeats*
* H. Hoffmann, J. Eastep, M. D. Santambrogio, J. E. Miller, and A. Agarwal, "Application Heartbeats: A Generic Interface for Specifying Program Performance and Goals in Autonomous Computing Environments"
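The idea of an application reporting its performance through heartbeats can be sketched as follows. This is not the Application Heartbeats API; the class and method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque


class HeartbeatMonitor:
    """Tracks the heartbeat rate over a sliding window of recent beats."""

    def __init__(self, window: int = 10) -> None:
        if window < 2:
            raise ValueError("window must hold at least two beats")
        self._stamps: deque[float] = deque(maxlen=window)

    def beat(self, timestamp: float) -> None:
        """Record one heartbeat, e.g. once per completed iteration."""
        self._stamps.append(timestamp)

    def rate(self) -> float:
        """Heartbeats per second over the current window (0.0 if unknown)."""
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        if elapsed <= 0:
            return 0.0
        return (len(self._stamps) - 1) / elapsed


if __name__ == "__main__":
    monitor = HeartbeatMonitor(window=5)
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        monitor.beat(t)
    print(monitor.rate())  # beats per second seen by the controller
```

In a real application the timestamp would typically come from a monotonic clock at the end of each iteration.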
Experimental Evaluation: Platform
- Odroid-XU+E
- big.LITTLE: 4x Cortex-A7; 4x Cortex-A15
- Cluster migration
- 2 GB of RAM
- OS: Linux Ubuntu, custom 3.4 kernel (by Hardkernel)
- Virtual frequency range 250 MHz to 600 MHz: Cortex-A7, real frequency range 500 MHz to 1.2 GHz
- Virtual frequency range 800 MHz to 1.6 GHz: Cortex-A15, real frequency range 800 MHz to 1.6 GHz
Experimental Evaluation: Benchmarks and results
- Iterative QoS applications
  - That interact in real time with the user (target set by maximum perceived performance)
  - That sample data from sensors (target set by data availability)
- Fluidanimate, Swaptions, Blackscholes and x264 (from PARSEC) used for benchmarking (4 threads each)
[Figure labels: share controller (fairness); frequency controller (energy)]
Experimental Evaluation: Performance results
- Fluidanimate, Swaptions and x264 running simultaneously
[Figure: performance in two panels with a ±10% target band. No controller: not on target. With controller: performance on target]
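The ±10% target band can be made concrete with a small check. The slides do not define the error metric, so this sketch assumes relative error is the absolute deviation from the target divided by the target; the function names and sample values are hypothetical.

```python
def relative_error(performance: float, target: float) -> float:
    """Assumed metric: |performance - target| / target."""
    if target <= 0:
        raise ValueError("target must be positive")
    return abs(performance - target) / target


def on_target(performance: float, target: float, tolerance: float = 0.10) -> bool:
    """True when performance lies within the tolerance band around the target."""
    return relative_error(performance, target) <= tolerance


def mean_relative_error(performances: list[float], targets: list[float]) -> float:
    """Average relative error over several applications."""
    errors = [relative_error(p, t) for p, t in zip(performances, targets)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(on_target(29.0, 30.0))   # inside the band
    print(on_target(60.0, 30.0))   # over-allocated, outside the band
    print(mean_relative_error([30.0, 15.0], [30.0, 30.0]))
```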
Experimental Evaluation: Frequency and power results
[Figure: frequency and power in two panels. No controller: thermal throttling. With controller: migration to A7]
Roundup and Conclusions
- Scheduling for heterogeneous embedded systems
  - Lightweight task management and frequency scaling method
- Performance-aware application-system interaction established
  - Captures the run-time behavior of multiple parallel applications
- Performance fairness and energy savings facilitated
  - Shared system resources allocated to meet target performance
  - Relies on DVFS to manage the system's energy-efficiency levels
- Experimental evaluation
  - Relative performance error was reduced from 2.801 to 0.168, a 16× drop
  - Up to 49% reduction in overall energy consumption
Future Work
- Improve response in case of thermal emergencies
- Gracefully handle non-QoS tasks
- Explore per-core performance fairness (thread level)*
- Consider systems that allow different frequency levels per core
* already in progress
Thank you! Questions?
Leonel Sousa, las@sips.inesc-id.pt