Anais SESSÃO DE FERRAMENTAS Congresso Brasileiro de Software: Teoria e Prática 28 de setembro a 03 de outubro de 2014 Maceió/AL


Congresso Brasileiro de Software: Teoria e Prática 28 de setembro a 03 de outubro de 2014 Maceió/AL
XXI SESSÃO DE FERRAMENTAS
SESSÃO DE FERRAMENTAS 2014
Anais

Anais
Volume 02
ISSN
SESSÃO DE FERRAMENTAS 2014
XXI Sessão de Ferramentas

COORDENADORES DO COMITÊ DE PROGRAMA
Uirá Kulesza - Universidade Federal do Rio Grande do Norte (UFRN)
Valter Camargo - Universidade Federal de São Carlos (UFSCar)

COORDENAÇÃO DO CBSOFT 2014
Baldoino Fonseca - Universidade Federal de Alagoas (UFAL)
Leandro Dias da Silva - Universidade Federal de Alagoas (UFAL)
Márcio Ribeiro - Universidade Federal de Alagoas (UFAL)

REALIZAÇÃO
Universidade Federal de Alagoas (UFAL)
Instituto de Computação (IC/UFAL)

PROMOÇÃO
Sociedade Brasileira de Computação (SBC)

PATROCÍNIO
CAPES, CNPq, INES, Google

APOIO
Instituto Federal de Alagoas, Aloo Telecom, Springer, Secretaria de Estado do Turismo - AL, Maceió Convention & Visitors Bureau, Centro Universitário CESMAC e Mix Cópia

PROCEEDINGS
Volume 02
ISSN
TOOLS 2014
XXI Tools Session

PROGRAM CHAIRS
Uirá Kulesza - Universidade Federal do Rio Grande do Norte (UFRN)
Valter Camargo - Universidade Federal de São Carlos (UFSCar)

CBSOFT 2014 GENERAL CHAIRS
Baldoino Fonseca - Universidade Federal de Alagoas (UFAL)
Leandro Dias da Silva - Universidade Federal de Alagoas (UFAL)
Márcio Ribeiro - Universidade Federal de Alagoas (UFAL)

ORGANIZATION
Universidade Federal de Alagoas (UFAL)
Instituto de Computação (IC/UFAL)

PROMOTION
Sociedade Brasileira de Computação (SBC)

SPONSORS
CAPES, CNPq, INES, Google

SUPPORT
Instituto Federal de Alagoas, Aloo Telecom, Springer, Secretaria de Estado do Turismo - AL, Maceió Convention & Visitors Bureau, Centro Universitário CESMAC and Mix Cópia

Autorizo a reprodução parcial ou total desta obra, para fins acadêmicos, desde que citada a fonte.

Apresentação

A Sessão de Ferramentas é um evento bastante tradicional da comunidade de software brasileira, sendo esta a sua 21ª edição. Até 2009, a Sessão de Ferramentas era realizada como evento satélite de dois simpósios: o SBES (Simpósio Brasileiro de Engenharia de Software) e o SBCARS (Simpósio Brasileiro de Componentes, Arquiteturas e Reutilização de Software). Em 2010, com a criação do CBSoft, SBES, SBCARS e SBLP (Simpósio Brasileiro de Linguagens de Programação) passaram a ser englobados por um único congresso. Em 2011, o SBMF (Simpósio Brasileiro de Métodos Formais) foi também incorporado ao CBSoft. Dessa forma, o escopo da Sessão de Ferramentas foi ampliado, aceitando contribuições dessas quatro comunidades. Nesta 21ª edição, a Sessão de Ferramentas foi realizada novamente no âmbito do CBSoft, nos dias 01 e 02 de outubro, na cidade de Maceió, Alagoas, Brasil. O comitê de programa foi composto por 49 membros de diferentes universidades brasileiras e estrangeiras, cobrindo diferentes áreas de pesquisa da engenharia de software. Foram recebidas 31 submissões de artigos de diferentes programas de pós-graduação do Brasil. Cada artigo foi revisto por três membros do comitê de programa, gerando um total de 93 excelentes revisões, o que contribuiu imensamente no processo de seleção dos artigos. Durante o processo de revisão, uma etapa de consenso e uma de rebuttal foram realizadas, melhorando o consenso entre revisores e dando aos autores a oportunidade de responderem questões levantadas nas revisões. Um diferencial desta edição foi a exigência de um vídeo sobre a ferramenta, eliminando a tradicional necessidade de deixar a ferramenta disponível para download. Como resultado do processo de revisão, 16 ferramentas foram selecionadas para serem publicadas nos anais e apresentadas na conferência (taxa de aceitação = 51%).
Os artigos selecionados abordam as seguintes áreas da engenharia de software: arquitetura de software; modularidade e refatoração; mineração de repositórios; testes de software e métodos formais; linhas de produtos de software; e processos de software e negócio. O sucesso da Sessão de Ferramentas do CBSoft 2014 somente foi possível por causa da dedicação e entusiasmo de muitas pessoas. Primeiramente, gostaríamos de agradecer aos autores que submeteram seus trabalhos. Gostaríamos de agradecer também aos membros do comitê de programa e revisores externos, pelo excelente trabalho de revisão e participação ativa nas discussões. Também agradecemos à organização geral do CBSoft, representada por Leandro Dias da Silva (UFAL), Baldoino Fonseca (UFAL) e Márcio Ribeiro (UFAL), que foi essencial para o ótimo andamento deste evento. Esperamos que você aprecie o programa técnico da 21ª Sessão de Ferramentas do 5º CBSoft.

Maceió, outubro de 2014
Prof. Dr. Uirá Kulesza
Prof. Dr. Valter Vieira de Camargo
Coordenadores da 21ª Sessão de Ferramentas do CBSoft

Foreword

The Tools Session is one of the most traditional events of the Brazilian software community, and this is its 21st edition. Until 2009, the Tools Session had been held as a satellite event of two well-known Brazilian symposia: SBES (Brazilian Symposium on Software Engineering) and SBCARS (Brazilian Symposium on Software Components, Architectures and Reuse). In 2010, a new conference series named CBSoft was initiated, putting together SBES, SBCARS and also SBLP (Brazilian Symposium on Programming Languages). In 2011, SBMF (Brazilian Symposium on Formal Methods) was also incorporated. As a consequence, the scope of the Tools Session was broadened by accepting contributions from these four communities. In this 21st edition, the Tools Session was held again under the CBSoft conference, on October 1st and 2nd, in Maceió, Alagoas, Brazil. The program committee involved 49 members from Brazilian and international universities, covering the main software engineering research areas. We received 31 submissions from different graduate programs in Brazil. Each paper was reviewed by three members of the program committee, resulting in 93 excellent reviews that contributed to the selection process of the tools. A consensus and rebuttal phase was also conducted along the reviewing process, leading to a better consensus among reviewers and giving the authors the opportunity to elucidate unclear points. A remarkable characteristic of this edition was the requirement to make a video describing the tool available together with the submission, instead of the traditional necessity of making the tool available for download. As a result of the reviewing process, 16 tools were selected to be included in these proceedings and presented at the conference (acceptance rate = 51%).
The selected tools encompass the following software engineering areas: software architecture; modularity and refactoring; mining software repositories; software testing and formal methods; product lines; and business and software processes. The success of the CBSoft Tools Session 2014 was only possible because of the dedication and enthusiasm of many people. First of all, we would like to thank the authors for submitting their papers. We would also like to thank the Program Committee members and external reviewers for the excellent reviewing work and the active participation in the discussions. We also thank the CBSoft organization, Leandro Dias da Silva (UFAL), Baldoino Fonseca (UFAL) and Márcio Ribeiro (UFAL), which was fundamental to the organization and success of this event. We hope you enjoy the technical program of the 21st CBSoft Tools Session.

Prof. Dr. Uirá Kulesza
Prof. Dr. Valter Vieira de Camargo
Chairs of the CBSoft 2014 Tools Session

Biografia dos Coordenadores / Chairs Short Biographies

Uirá Kulesza

Uirá Kulesza is an Associate Professor at the Department of Informatics and Applied Mathematics (DIMAp), Federal University of Rio Grande do Norte (UFRN), Brazil. He obtained his PhD in Computer Science at PUC-Rio, Brazil (2007), in cooperation with the University of Waterloo and Lancaster University. His main research interests include software architecture, modularity, and software product lines. He has co-authored over 150 refereed papers in journals, conferences, and books. He is currently a visiting researcher at the Software Engineering Research Group (SERG) at Delft University of Technology (TU Delft). He worked as a post-doc researcher in the AMPLE project - Aspect-Oriented Model-Driven Product Line Engineering (www.ample-project.net) - at the New University of Lisbon, Portugal. He is currently a CNPq research fellow level 2.

Valter Vieira de Camargo

Valter Vieira de Camargo is an Associate Professor at the Department of Computing of the Federal University of São Carlos, Brazil (DC/UFSCar). Currently he is the head of the AdvanSE (Advanced Research on Software Engineering) group in this department. He obtained his PhD in Computer Science at ICMC/USP in 2006 and his Master's degree in 2001 at DC/UFSCar. During 2013, he worked as an invited researcher in the ENOFES project, at the Computing Department of the University of Twente, Netherlands. His main research interests are software modernization, software modularity and software reuse (frameworks and product lines). He has co-authored over 110 refereed papers in journals, conferences and books.

Comitês Técnicos / Program Committee

Comitê do programa / Program Committee
Adenilso Simao - Universidade de São Paulo (USP)
Alexandre Mota - Universidade Federal de Pernambuco (UFPE)
Anamaria Martins Moreira - Universidade Federal do Rio de Janeiro (UFRJ)
André Santos - Universidade Federal de Pernambuco (UFPE)
Arilo Dias Neto - Universidade Federal do Amazonas (UFAM)
Cecilia Rubira - Universidade Estadual de Campinas (UNICAMP)
Cláudio Sant'Anna - Universidade Federal da Bahia (UFBA)
Daniel Lucrédio - Universidade Federal de São Carlos (UFSCar)
David Déharbe - Universidade Federal do Rio Grande do Norte (UFRN)
Delano Beder - Universidade Federal de São Carlos (UFSCar)
Eduardo Almeida - Universidade Federal da Bahia (UFBA)
Eduardo Figueiredo - Universidade Federal de Minas Gerais (UFMG)
Elder José Cirilo - Universidade Federal de São João del-Rei (UFSJ)
Elisa Huzita - Universidade Estadual de Maringá (UEM)
Fabiano Ferrari - Universidade Federal de São Carlos (UFSCar)
Fernando Castor - Universidade Federal de Pernambuco (UFPE)
Fernando Trinta - Universidade Federal do Ceará (UFC)
Frank Siqueira - Universidade Federal de Santa Catarina (UFSC)
Franklin Ramalho - Universidade Federal de Campina Grande (UFCG)
Glauco Carneiro - Universidade Salvador (UNIFACS)
Gledson Elias - Universidade Federal da Paraíba (UFPB)
Ingrid Nunes - Universidade Federal do Rio Grande do Sul (UFRGS)
Leila Silva - Universidade Federal de Sergipe (UFS)
Lile Hattori - Microsoft Research
Luis Ferreira Pires - University of Twente, Netherlands
Marcel Oliveira - Universidade Federal do Rio Grande do Norte (UFRN)
Marcelo d'Amorim - Universidade Federal de Pernambuco (UFPE)
Marcelo Augusto Santos Turine - Universidade Federal de Mato Grosso do Sul (UFMS)
Marco Tulio Valente - Universidade Federal de Minas Gerais (UFMG)
Maria Istela Cagnin - Universidade Federal de Mato Grosso do Sul (UFMS)
Márcio Cornélio - Universidade Federal de Pernambuco (UFPE)
Nabor Mendonca - Universidade de Fortaleza (UNIFOR)
Otavio Lemos - Universidade Federal de São Paulo (UNIFESP)
Patricia Machado - Universidade Federal de Campina Grande (UFCG)
Paulo Maciel - Universidade Federal de Pernambuco (UFPE)
Paulo Pires - Universidade Federal do Rio de Janeiro (UFRJ)
Pedro Santos Neto - Universidade Federal do Piauí (UFPI)
Raphael Camargo - Universidade Federal do ABC (UFABC)
Ricardo Lima - Universidade Federal de Pernambuco (UFPE)
Rita Suzana Pitangueira Maciel - Universidade Federal da Bahia (UFBA)
Roberta Coelho - Universidade Federal do Rio Grande do Norte (UFRN)
Rohit Gheyi - Universidade Federal de Campina Grande (UFCG)
Rosana Braga - Universidade de São Paulo (USP)
Rosângela Penteado - Universidade Federal de São Carlos (UFSCar)
Sandra Fabbri - Universidade Federal de São Carlos (UFSCar)
Sérgio Soares - Universidade Federal de Pernambuco (UFPE)
Tayana Conte - Universidade Federal do Amazonas (UFAM)
Tiago Massoni - Universidade Federal de Campina Grande (UFCG)
Vander Alves - Universidade de Brasília (UnB)

Avaliadores Externos / Additional Reviewers
Alex Alberto - Universidade de São Paulo (USP)
Davi Viana - Universidade Federal do Amazonas (UFAM)
Heitor Costa - Universidade Federal de Lavras (UFLA)
Jacilane Rabelo - Universidade Federal do Amazonas (UFAM)
Ricardo Terra - Universidade Federal de Lavras (UFLA)

Comitê organizador / Organizing Committee

COORDENAÇÃO GERAL
Baldoino Fonseca - Universidade Federal de Alagoas (UFAL)
Leandro Dias da Silva - Universidade Federal de Alagoas (UFAL)
Márcio Ribeiro - Universidade Federal de Alagoas (UFAL)

COMITÊ LOCAL
Adilson Santos - Centro Universitário Cesmac (CESMAC)
Elvys Soares - Instituto Federal de Alagoas (IFAL)
Francisco Dalton Barbosa Dias - Universidade Federal de Alagoas (UFAL)

COORDENADORES DO COMITÊ DE PROGRAMA DA SESSÃO DE FERRAMENTAS
Uirá Kulesza - Universidade Federal do Rio Grande do Norte (UFRN)
Valter Vieira de Camargo - Universidade Federal de São Carlos (UFSCar)

Índice / Table of Contents

ArchViz: a Tool to Support Architecture Recovery Research
Vanius Zapalowski, Ingrid Nunes e/and Daltro Nunes

Uma Ferramenta para Verificação de Conformidade Visando Diferentes Percepções de Arquiteturas de Software
Izabela Melo, Dalton Serey e/and Marco Túlio Valente

JExtract: An Eclipse Plug-in for Recommending Automated Extract Method Refactorings
Danilo Silva, Ricardo Terra e/and Marco Túlio Valente

ModularityCheck: A Tool for Assessing Modularity using Co-Change Clusters
Luciana Silva, Daniel Félix, Marco Túlio Valente e/and Marcelo de Almeida Maia

Nuggets Miner: Assisting Developers by Harnessing the StackOverflow Crowd Knowledge and the GitHub Traceability
Eduardo Campos, Lucas Batista Leite de Souza e/and Marcelo de Almeida Maia

NextBug: A Tool for Recommending Similar Bugs in Open-Source Systems
Henrique Rocha, Guilherme Oliveira, Humberto Marques e/and Marco Túlio Valente

FunTester: A fully automatic functional testing tool
Thiago Pinto e/and Arndt von Staa

JMLOK2: A tool for detecting and categorizing nonconformances
Alysson Milanez, Dennis de Sousa, Tiago Massoni e/and Rohit Gheyi

A Rapid Approach for Building a Semantically Well Founded Circus Model Checker
Alexandre Mota e/and Adalberto Farias

SPLConfig: Product Configuration in Software Product Line
Lucas Machado, Juliana Pereira, Lucas Garcia e/and Eduardo Figueiredo

SPLICE: Software Product Lines Integrated Construction Environment
Bruno Cabral, Tassio Vale e/and Eduardo Almeida

FlexMonitorWS: uma solução para monitoração de serviços Web com foco em atributos de QoS
Rômulo Franco, Cecilia Rubira e/and Amanda Nascimento

A Code Smell Detection Tool for Compositional-based Software Product Lines
Ramon Abílio, Gustavo Vale, Johnatan Oliveira, Eduardo Figueiredo e/and Heitor Costa

AccTrace: Considerando Acessibilidade no Processo de Desenvolvimento de Software
Rodrigo Branco, Maria Istela Cagnin e/and Debora Paiva

Spider-RM: Uma Ferramenta para Auxílio ao Gerenciamento de Riscos em Projetos de Software
Heresson Mendes, Bleno Silva, Diego Abreu, Diogo Ferreira, Manoel Victor Leite, Marcos Leal e/and Sandro Oliveira

A Tool to Generate Natural Language Text from Business Process Models
Raphael Rodrigues, Leonardo Azevedo, Kate Revoredo e/and Henrik Leopold

ArchViz: a Tool to Support Architecture Recovery Research

Vanius Zapalowski, Ingrid Nunes, and Daltro José Nunes
Prosoft Research Group, Instituto de Informática, Universidade Federal do Rio Grande do Sul, Brazil

Abstract. In order to produce documented software architectures, many software architecture recovery methods have been proposed. Developing such methods involves a nontrivial data analysis, and this calls for different data visualisations, which help compare predicted and target software architectures. Moreover, comparing methods is also difficult, because they use divergent measurements to evaluate their performance. With the goal of improving and supporting architecture recovery research, we developed the ArchViz tool, which is presented in this paper. Our tool provides metrics and visualisations of software architectures, supporting the analysis of the output of architecture recovery methods, and possibly the standardisation of their evaluation and comparison. Video link:

1. Introduction

An explicitly documented software architecture plays a key role in software development, as it keeps track of many design decisions and helps maintain consistency in the developed software. It provides useful knowledge to deal with software evolution according to planned architectural principles captured by a high-level model, usually represented graphically. Despite the importance of having a documented architecture, many systems lack proper architectural documentation. Software architecture recovery (SAR) methods aid software architects in the task of inspecting the source code to understand implemented software when there is no architectural documentation available or it is outdated; such methods have been proposed to reduce the human effort needed to perform this task.
Such methods use different inputs (e.g., dependencies, semantics, and patterns) and a variety of metrics (e.g., precision, recall, and distance) to produce recovered architectures. Moreover, SAR studies focus on the measurement of certain properties, lacking a visual representation of their recovered and target architectures [Ducasse and Pollet 2009]. Consequently, the process of evaluating and analysing the results of a SAR method is a complex and time-consuming activity [Garcia et al. 2013], given the combination of possible sources of information, evaluation metrics, and results analysis. Different tools have been proposed in the literature to improve software architectures, e.g. [Lindvall and Muthig 2008], and most of them focus on checking architecture conformance or compliance. On the other hand, ArchViz, the tool introduced in this paper, has the goal of supporting SAR research. Therefore, the purpose of this new tool is the key difference from other existing tools in the context of software architecture. In previous work [Zapalowski et al. 2014], we faced the problems discussed above in a

study to evaluate the relevance of code-based characteristics to identify modules of recovered architectures. To address such problems, we implemented a web-based tool, named ArchViz, able to partially automate the analysis of recovered architectures, as no other similar tool was available. Therefore, our tool emerged from our own (real) need for supporting our research, and its effectiveness is indicated by the research results we were able to derive from our data analysis with the support of ArchViz. Our tool provides evaluation metrics of recovered software architectures using well-known information retrieval measures. In addition, our tool generates three visualisations (tree-map, module dependencies graph and element dependencies graph) of recovered (or predicted) and target architectures in order to help understand the results of the recovery process. This paper is organised as follows. Section 2 describes ArchViz, presenting its main features. Next, Section 3 discusses existing tools related to software architecture visualisation and their support to SAR. Finally, we present the final remarks in Section 4.

2. The ArchViz Tool

This section presents the contributions that ArchViz provides to support SAR research. Our intended users are researchers, who are able to compare a target architecture (oracle) with a recovered architecture and verify metrics that indicate the classification effectiveness. First, we describe in Section 2.1 ArchViz's architecture and how to use it, presenting its user interface. In Sections 2.2 and 2.3, we present the two main features of ArchViz: (i) measurement of well-known information retrieval metrics, which are adopted in the context of general-purpose classification problems, and are detailed in Section 2.2; and (ii) plotting of three different graphical models of recovered and target architectures.
In order to illustrate the functionalities of ArchViz, we present the evaluation and analysis of one of the five subject systems used in our previous work [Zapalowski et al. 2014], named OLIS. Thus, the metrics and visualisations presented in the remainder of this paper are extracted from OLIS, which is an agent-based product line that provides personal services for users.

2.1. Architecture and User Interface

ArchViz is a web-based application implemented in Ruby using the Ruby on Rails (RoR) framework. Consequently, the architecture of our tool follows the Model-View-Controller architectural pattern adopted by RoR. In our implementation, the main tasks of the architectural modules are: the Model represents and stores imported architectures; the Controller calculates the implemented evaluation metrics; and the View plots the architectural visualisations using the D3 JavaScript library. To start using ArchViz, users should import a software project using the Import Project option available in the menu, which allows users to provide input data. Project details must be specified in two Comma-Separated Values (CSV) files: (i) the first with information about the module to which each architectural element belongs in the recovered and in the target architectures; and (ii) the second with all the dependencies between architectural elements. The adequate format of such files is detailed in the functionality

of importing projects. After importing a project, our tool summarises and presents the data related to it, as illustrated in Figure 1.

Figure 1. ArchViz User Interface.

2.2. Architecture Recovery Metrics

We selected general-purpose metrics used to evaluate multi-class prediction of machine learning algorithms, defined by Sokolova and Lapalme [Sokolova and Lapalme 2009], as the metrics to evaluate the quality of the recovered architecture, because they are also applicable to our context. Using these metrics, we are able to standardise the analysis of results of SAR methods, given that this set of metrics only needs the recovered and target architectures to be calculated. The metric definitions are given in Table 1, following this notation: K is the set of proposed architectural modules; i is a module such that i ∈ K; |K| represents the cardinality of K; tp_i are the true positives of i; tn_i are the true negatives of i; fp_i are the false positives of i; fn_i are the false negatives of i; and |i| is the number of elements in module i. The definition of what is an element is specific to each recovery method: it can be a class, a component, a procedure, and so on.

2.3. Architecture Visualisations

Because of the complexity of large-scale software, it is difficult to represent it in a single simple model. Software architecture visualisation helps stakeholders involved with software development to understand the concepts adopted in their applications using a high-level representation. Most SAR approaches focus on presenting metrics to evaluate their results, and they do not provide architectural visualisations that enable a finer-grained analysis of results. This is helpful particularly to researchers, because humans can derive findings based on visual models and data abstractions better than machines [Keim et al. 2008]. Therefore, we proposed and implemented three visualisations that aim to improve the analysis and comparison of recovered and target architectures.
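Returning to the import step of Section 2.1: the paper does not specify the exact CSV columns, so the sketch below assumes a minimal, hypothetical layout for the two input files (column names and element names are illustrative, not ArchViz's actual format) and parses both with Ruby's standard CSV library, Ruby being the tool's implementation language.

```ruby
require "csv"

# Hypothetical layout for the first file: one row per architectural
# element, with its module in the recovered and in the target architecture.
modules_csv = <<~CSV
  element,recovered_module,target_module
  UserDAO,Data,Data
  EventView,UI,UI
  TripAgent,Business,Agent
CSV

# Hypothetical layout for the second file: one row per element dependency.
deps_csv = <<~CSV
  source_element,target_element
  EventView,UserDAO
  TripAgent,UserDAO
CSV

assignments = CSV.parse(modules_csv, headers: true).map do |row|
  { element: row["element"],
    recovered: row["recovered_module"],
    target: row["target_module"] }
end

dependencies = CSV.parse(deps_csv, headers: true).map do |row|
  [row["source_element"], row["target_element"]]
end

puts "#{assignments.size} elements, #{dependencies.size} dependencies"
```

With the two files parsed into these structures, everything the tool computes (metrics and the three visualisations) can be derived without touching the subject system's source code.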
We next present the three visualisations that our tool provides: (i) Tree-map, which provides a visual analysis of the recovered architecture using a hierarchical representation (Section 2.3.1); (ii) Module Dependencies Graph, which details dependencies

among modules, and their respective sizes (Section 2.3.2); and (iii) Element Dependencies Graph, which presents a fine-grained visualisation of architectural elements showing only the inter-module dependencies (Section 2.3.3). Note that, in our tool, all visualisations are shown together with the metrics described previously.

Table 1. Metrics implemented in ArchViz.

Precision: measures the correctness of the overall recovered architecture independently of architectural module size. It considers only the tp of each module.
Formula: (Σ_{i=1}^{|K|} tp_i) / (Σ_{i=1}^{|K|} |i|)

Average Precision: to evaluate the per-module precision, the average precision measures the agreement between the recovered and the target architecture for each module. It only considers the cases where the recovered classification of the architectural elements agrees with the target architecture.
Formula: (Σ_{i=1}^{|K|} tp_i / (tp_i + fp_i)) / |K|

Average Recall: by calculating the average recall, we obtain an average of the per-class effectiveness of a SAR method to identify architectural modules. It considers the tp and fn of each module.
Formula: (Σ_{i=1}^{|K|} tp_i / (tp_i + fn_i)) / |K|

Average Accuracy: measures the correctness of each module and its distinctness from the other modules. It evaluates the correctly classified architectural elements, tp and tn, of each recovered module, and is useful to measure the recovery method's per-module effectiveness.
Formula: (Σ_{i=1}^{|K|} (tp_i + tn_i) / (tp_i + tn_i + fp_i + fn_i)) / |K|

Average F-measure: combines the average precision and average recall to provide one metric that indicates both the overall correctness and the module prediction quality.
Formula: (2 × avg_prec × avg_rec) / (avg_prec + avg_rec)
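As a sketch of how the Table 1 metrics can be computed, the following Ruby code (Ruby being the tool's implementation language; this is an illustration, not ArchViz's actual code) takes two parallel arrays giving each element's module in the recovered and target architectures, builds a per-module confusion count, and macro-averages across modules. Module names are hypothetical.

```ruby
# Per-module confusion counts: treat "belongs to module `mod`" as the
# positive class and count tp/fp/fn/tn over all elements.
def confusion(recovered, target, mod)
  tp = fp = fn = tn = 0
  recovered.zip(target).each do |r, t|
    if    r == mod && t == mod then tp += 1
    elsif r == mod             then fp += 1
    elsif t == mod             then fn += 1
    else                            tn += 1
    end
  end
  [tp, fp, fn, tn]
end

# Macro-averaged metrics, mirroring the Table 1 definitions.
def metrics(recovered, target)
  mods = target.uniq
  per  = mods.map { |m| confusion(recovered, target, m) }
  avg_prec = per.sum { |tp, fp, _, _| tp.zero? ? 0.0 : tp.fdiv(tp + fp) } / mods.size
  avg_rec  = per.sum { |tp, _, fn, _| tp.zero? ? 0.0 : tp.fdiv(tp + fn) } / mods.size
  avg_acc  = per.sum { |tp, fp, fn, tn| (tp + tn).fdiv(tp + fp + fn + tn) } / mods.size
  avg_f    = (avg_prec + avg_rec).zero? ? 0.0 : 2 * avg_prec * avg_rec / (avg_prec + avg_rec)
  { avg_precision: avg_prec, avg_recall: avg_rec,
    avg_accuracy: avg_acc, avg_f_measure: avg_f }
end

# Toy example: five elements, three modules (names are illustrative).
recovered = %w[UI UI Data Business Data]
target    = %w[UI Data Data Business Data]
p metrics(recovered, target)
```

Because only the two assignment vectors are needed, the same computation applies to the output of any SAR method, which is what enables the standardised comparison the paper argues for.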
Moreover, the last two visualisation types show two graphs corresponding to the recovered and target architectures, allowing a side-by-side comparison.

2.3.1. Tree-map

The tree-map visualisation is a two-dimensional hierarchy graph created by Shneiderman [Shneiderman 1992] to analyse hard disk usage. Similarly to software architectures, the disk folder hierarchy represents categories, and files are leaf elements that are in a folder. A common problem in visualising data is representing the relevance of more than two attributes in a single chart. In hard disk usage, for example, files have a parent folder and a size, which means that we need to plot separate graphs for file usage and for folder usage to visualise both attributes using a Cartesian coordinate system. Shneiderman proposed a tree-organisation structure, where each element is represented as a rectangle and attributes can be specified by colours, sizes or the hierarchy position of the rectangles. Then, the hard disk usage can be represented in a single graph, where folders

are outside rectangles with their file elements inside. Additionally, their sizes can represent the amount of disk usage.

Figure 2. Tree-map Visualisation of the OLIS Recovered Architecture.

As in hard disk usage, a software architecture typically has a hierarchical structure: architectural elements belong to modules. So, we mapped software architectures to the tree-map representation in order to understand the predicted results of the recovered and target architectures. We represent both architecture versions in a single graph to visualise predicted results. Figure 2 is an example of the tree-map visualisation, where the outer rectangles are the target architectural modules, the inner rectangles are the architectural elements with their names, and the colours of architectural elements are assigned according to the recovered module to which they belong. The target module names, shown in the upper right-hand side of Figure 2, are possibly from a manually recovered architecture. In the case when the recovered architecture matches the target architecture completely, all outer rectangles are coloured with only one colour and each outer rectangle has a different colour from the others. Figure 2 illustrates a scenario in which the recovered architecture differs from the target architecture. As can be seen in Figure 2, each outer rectangle's major colour defines its recovered architectural module, i.e. in Figure 2, the lower left rectangle corresponds to the Data module and the upper left rectangle corresponds to the UI module. Figure 2 thus indicates the recovered modules and the assignment distribution of architectural elements to the target modules. Furthermore, this representation confirms the information provided by the module accuracy: (i) if a single colour is concentrated in a single module, accuracy is high; and (ii) otherwise, it is low.
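The mapping just described (outer rectangles = target modules, leaf colour = recovered module) amounts to building a nested hierarchy of the kind D3's treemap layout consumes. The Ruby sketch below shows one way to assemble it; the field names and element records are assumptions for illustration, not ArchViz's internal data shape.

```ruby
require "json"

# Hypothetical element records: each element knows its target module and
# the module the SAR method assigned to it.
elements = [
  { name: "UserDAO",   target: "Data", recovered: "Data" },
  { name: "EventDAO",  target: "Data", recovered: "Data" },
  { name: "EventView", target: "UI",   recovered: "Data" }, # misclassified
]

# Outer rectangles: target modules. Each leaf keeps its recovered module,
# which the view maps to a colour; `value` drives the rectangle size.
tree = {
  name: "architecture",
  children: elements.group_by { |e| e[:target] }.map do |mod, es|
    { name: mod,
      children: es.map { |e| { name: e[:name], colour: e[:recovered], value: 1 } } }
  end
}

puts JSON.pretty_generate(tree)
```

A module whose leaves all share one colour reads as perfectly recovered; mixed colours inside a rectangle correspond directly to the lower per-module accuracy discussed above.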
Additionally, the tree-map visualisation combines the recovered and target architectures, allowing a visual comparison of the architectural measures extracted from a SAR method.

2.3.2. Module Dependencies Graph

The module dependencies graph is a coarse-grained view that aims to provide an overall view of the system. It is similar to the most common notation used to represent architectures, where architectural modules are represented as nodes and communication among them as edges. This representation improves architecture understanding, because

it exposes the main system concepts and presents them in a concise way, showing both the architectural modules and how they communicate with each other. Figure 3 shows an example of this notation, presenting the architecture of OLIS, which uses a layered architectural pattern.

Figure 3. Example of a Typical Architecture Model.

Although this traditional model presents, in a high-level view, the main architectural modules and their communication, it lacks important details needed to compare the recovered and target architectures. Furthermore, it omits architectural information that could be represented in an architectural visualisation, such as the intensity of the dependency between two modules, which is helpful for understanding a recovered architecture. Analysing the representation of the OLIS architecture in Figure 3, it is impossible to identify the intensity of dependencies among modules. In this usual representation, the module sizes represent just the existence of architectural modules and do not correspond to the size of the modules in the system. To enrich the information provided by this traditional module dependencies graph, we implemented the module visualisation with modifications. The same architecture presented in Figure 3 is represented in ArchViz as shown in Figure 4(a). In ArchViz, the modules are defined by their size and colour. Their colours characterise each module's role and their sizes are proportional to the number of architectural elements that they contain. Additionally, we add labels to the module nodes with the architectural role and the number of elements that they have. The edges represent the communication among modules, specifying the dependency hierarchy, i.e. an edge in red means that the red module uses the module that it is linked to. Moreover, the edge thickness is proportional to the dependency level that the modules have, e.g.
the dependency between the Agent and Model modules is stronger than that between the Agent and Business modules in the presented OLIS architecture.

2.3.3. Element Dependencies Graph

The element dependencies graph is the finest-grained architectural visualisation that ArchViz provides. It presents the dependencies among architectural elements, classifying them into architectural modules. This visualisation represents elements as nodes, and the node colours characterise the module to which they belong. The edges are coloured by the inter-module dependency, similarly to the module dependencies graph. This representation disregards the intra-module dependencies to reduce the number of shown edges; otherwise, the graph would provide too much information. As an example, we present the OLIS target architecture using the element dependencies graph in Figure 4(b). The graph

Figure 4. OLIS Graphs: (a) Module Dependencies; (b) Element Dependencies.

shows: (i) the five modules, (ii) the inter-module dependencies, and (iii) the 211 elements of the system.

3. Related Work

Besides the previous work already discussed, which defines the adopted metrics and inspired our visualisations, there are important studies closely related to ArchViz. We discuss these studies in this section. A metric commonly used to evaluate recovered architectures is the MoJo distance metric [Tzerpos and Holt 1999]. It is a domain-specific metric that measures the number of steps needed to obtain the target architecture, given a recovered architecture. It was not implemented in ArchViz due to recently reported problems in its application to some specific SAR methods [Garcia et al. 2013]. Bunch [Mancoridis et al. 1999] is one of the first tools to support the whole software decomposition process, from manual investigation to the visualisation of the recovered architecture. It generates subsystems and creates a fine-grained representation of the software based on architectural element dependencies, similar to that presented in Figure 3. Differently from ArchViz, Bunch's objective is to obtain a recovered architecture; thus, it does not provide evaluation metrics or comparisons of the recovered against the target architecture. An approach that compares architectural models was proposed by Beck and Diehl [Beck and Diehl 2010]. Their approach evaluates the similarity of architectural element dependencies using a dependency matrix representation. This method points out similarities and divergences of the architectures at the architectural element level. Therefore, it analyses only element-level similarities; module-level information, such as module communication, is not taken into account. 4.
Conclusion

Software architecture recovery (SAR) methods have been proposed to decrease the effort needed to maintain up-to-date architectural documentation of software systems. These

methods apply different evaluation metrics to analyse recovered architectures and use them as a basis to derive their findings. In addition, SAR methods often lack a visual representation of their recovered and target architectures, which is essential to analyse results. We built ArchViz to address these issues and to support SAR research, by providing the measurement of evaluation metrics and architecture visualisation representations. The implemented metrics provide statistical evidence of the level of agreement between recovered and target architectures. Moreover, the visualisations and side-by-side comparisons of recovered and target architectures provide useful knowledge to understand the results of a method, which helps in the process of refining and improving it. Thus, ArchViz is a tool that reduces the effort needed to analyse recovered architectures, providing a useful set of metrics together with the automatic generation of architectural models to support SAR research. It is important to highlight that one of the subject systems investigated in our research on SAR using ArchViz comes from industry. However, unlike OLIS, this system was not presented in this paper due to a confidentiality agreement. Although we used the tool in a real-world scenario, we have not used it with (very) large-scale systems, and this is part of our future work.

References

[Beck and Diehl 2010] Beck, F. and Diehl, S. (2010). Visual comparison of software architectures. In International Symposium on Software Visualization.

[Ducasse and Pollet 2009] Ducasse, S. and Pollet, D. (2009). Software architecture reconstruction: A process-oriented taxonomy. Trans. Softw. Eng.

[Garcia et al. 2013] Garcia, J., Ivkovic, I., and Medvidovic, N. (2013). A comparative analysis of software architecture recovery techniques. In International Conference on Automated Software Engineering.

[Keim et al. 2008] Keim, D., Mansmann, F., Schneidewind, J., Thomas, J., and Ziegler, H. (2008). Visual analytics: Scope and challenges. In Visual Data Mining.

[Lindvall and Muthig 2008] Lindvall, M. and Muthig, D. (2008). Bridging the software architecture gap. Computer, 41(6).

[Mancoridis et al. 1999] Mancoridis, S., Mitchell, B. S., Chen, Y., and Gansner, E. R. (1999). Bunch: A clustering tool for the recovery and maintenance of software system structures. In International Conference on Software Maintenance.

[Shneiderman 1992] Shneiderman, B. (1992). Tree visualization with tree-maps: 2-d space-filling approach. ACM Transactions on Graphics.

[Sokolova and Lapalme 2009] Sokolova, M. and Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Inform. Process. Manag.

[Tzerpos and Holt 1999] Tzerpos, V. and Holt, R. C. (1999). MoJo: A distance metric for software clusterings. In Working Conference on Reverse Engineering.

[Zapalowski et al. 2014] Zapalowski, V., Nunes, I., and Nunes, D. (2014). Revealing the relationship between architectural elements and source code characteristics. In International Conference on Program Comprehension.

A Conformance Checking Tool Targeting Different Perceptions of Software Architectures

Izabela Melo 1, Dalton Serey 1, Marco Tulio Valente 2

1 Laboratório de Práticas de Software, Departamento de Sistemas e Computação, Universidade Federal de Campina Grande (UFCG), Campina Grande, PB, Brazil
2 Departamento de Ciência da Computação, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG, Brazil

Abstract. Current architecture conformance checking tools do not take into account the different levels of abstraction for defining architectural rules. For example, architects often use a descriptive language, whereas developers prefer automatic and testable technologies. In this paper we present ARTT, an architecture conformance checking tool which allows the automatic transformation between different architectural representations. ARTT extracts rules defined in a document, written in a declarative language, and generates design tests using the DesignWizard API. The tests generated by ARTT agreed with 89.12% of the tests written by a specialist.

Resumo. As atuais ferramentas de verificação de conformidade arquitetural não levam em consideração os diferentes níveis de abstração para definir regras arquiteturais. Por exemplo, arquitetos de software normalmente usam linguagem descritiva, enquanto os desenvolvedores preferem tecnologias automáticas e testáveis. Este trabalho apresenta ARTT, uma ferramenta de verificação arquitetural que considera a existência de diferentes níveis de abstração arquitetural, permitindo a transformação automática entre eles. ARTT extrai regras definidas em um documento, descritas em uma linguagem declarativa, e as transforma em testes de design, utilizando DesignWizard. Os testes gerados automaticamente por ARTT concordam com 89,12% dos testes escritos por um especialista.

Video URL: 1.
Introduction

Software architecture comprises a set of decisions and architectural rules that establish relations among the components of an application [Jansen and Bosch 2005], being one of the most important artifacts in a system's life cycle [Knodel and Popescu 2007]. It affects business goals, functional goals, and system quality. Once created, the architecture is rarely updated. In addition, technical development constraints and conflicting quality requirements that may arise during implementation are factors that generate architectural violations. As the system evolves, the number of violations tends to grow and not be removed [Brunet et al. 2012]. To avoid the problems caused by the accumulation of architectural violations and to ensure the adequacy of the architecture, several architecture conformance checking techniques have been proposed [Passos et al. 2010] [Knodel and Popescu 2007]. Such techniques check whether the

implementation of a system complies with the architecture planned by its developers and architects [Clements et al. 2003]. Ensuring the architectural conformance of a system is important to enable reuse, system comprehension, consistency between documentation and implementation, and control of system evolution, and to foster discussion among team members about the system's structure [Knodel and Popescu 2007]. However, current architecture checking techniques do not take into account the different levels of abstraction of the architecture. While the architecture team tends to prefer languages that express properties in a declarative and/or descriptive way, development teams tend to prefer behavioural and/or executable languages to express architectural constraints and/or rules. As a result, the communication between the different levels of abstraction is not always consistent. Moreover, transforming one abstraction into another can be a costly and error-prone activity. The solution proposed in this paper originated in a cooperation with the team responsible for software architecture checking activities at Dataprev. In that company, distinct teams carry out the architecture checking and the software development activities. One of the company's architects stated: At first, we thought about using UML to write the software architecture. In practice, however, UML is not used. It is heavyweight, strict, and undergoes constant changes. We chose to write the architectural constraints in Portuguese and to use DesignWizard to perform the conformance checking. But manually writing the design tests in Java, although well accepted by the development team, would take time from the architecture team and/or the development team.
The architecture team needs a language closer to Portuguese or English, more descriptive, and a tool that transforms our rules into design tests. Hence, this paper presents ARTT (Architectural Representation Transformation Tool), a tool for automatic transformation between the abstraction levels of the development and architecture teams. The central goal is to enable faster and more consistent communication between these teams. On one side, the architecture team defines the architectural rules in a declarative language inspired by DCL (Dependency Constraint Language) [Terra and Valente 2009], a domain-specific language created to represent software architectures. On the other side, the development team gets the architectural rule definitions written as design tests, which can then be incorporated into the system's suite of functional tests. The design tests are written in Java (the language closest to the development team) with the support of the DesignWizard API [Brunet et al. 2009], which supports static analysis of Java programs. The proposed automatic transformation saves time in the architecture checking process and ensures that each team uses its own level of abstraction. As a case study for the tool, a real system was used (e-pol - Sistema de Gestão das Informações de Polícia Judiciária). Our study showed that the tests automatically generated by ARTT agree with 89.12% of the tests manually written by a specialist. The disagreement between the two sets of tests was caused by defects in the DesignWizard API, which have already been reported to and discussed with the API's maintenance team.

The remainder of this paper is organized as follows. Section 2 presents the ARTT tool. Section 3 presents a case study. Section 4 presents related work and Section 5 presents the conclusions.

2. The ARTT Tool

ARTT is a checking tool that takes into account two levels of architectural abstraction: that of the architecture team, which uses a simple, declarative language to describe architectural rules; and that of the development team, which uses a programming language to write architectural rules in the form of automated tests. Figure 1 presents the general idea of ARTT. Architectural documents, written as .pdf files, are received as input. In these documents, the architectural rules are preceded by #archtest. This makes it possible to detail, in natural language, information about the architecture that is relevant to the reader of the document. The rules are then extracted and stored in a file with the .arch extension. If the user does not want to write a .pdf file, the .arch file can be passed directly as input. This specification is then transformed into design tests written in Java, using DesignWizard and JUnit. At this point, the tests are ready to be executed in order to capture architectural violations.

Figure 1. How the tool works.

We believe that using a declarative language to define architectural rules is simpler and faster for the architect (the tool's intended user). This language was inspired by DCL and has the syntax presented in Figure 2. The difference between the language used by the tool and DCL is small; the modifications introduced by ARTT only aim to bring it closer to the English language.
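The extraction step described above — collecting every rule marked with #archtest from the document text into a .arch specification — can be sketched as follows. This is a minimal Python illustration, not the tool's actual implementation (ARTT itself is written in Java and reads .pdf files); the sample document text and function names are illustrative.

```python
# Illustrative sketch of ARTT's rule-extraction step: keep only the
# text that follows the #archtest marker on each marked line.
ARCHTEST_MARKER = "#archtest"

def extract_rules(document_text):
    """Return the contents of a .arch file: one extracted rule per line."""
    rules = []
    for line in document_text.splitlines():
        if ARCHTEST_MARKER in line:
            rule = line.split(ARCHTEST_MARKER, 1)[1].strip()
            if rule:
                rules.append(rule)
    return "\n".join(rules)

doc = """Module A is formed by classes Exemplo1 and Exemplo2.
#archtest module A is Exemplo1, Exemplo2
One of the rules states that A cannot access B.
#archtest rule A cannot-access B"""

print(extract_rules(doc))
```

Running this prints the two rules, stripped of the surrounding natural-language explanation, exactly as they would appear in a .arch file.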
For example, while in DCL a module is defined with module NOME: DIRETORIO, in ARTT's language one writes module NOME is DIRETORIO or #archtest module NOME is DIRETORIO (if the definition is made in a .pdf file). On the other hand, design tests are suitable to represent the architecture for the development team, since they are written in a language closer to that team, are verifiable during the execution of the functional tests, and can ensure architectural integrity and consistency. However, writing and specifying design tests is usually a costly task for software architects. Therefore, a translator was created that automatically transforms one abstraction into the other, so that communication becomes faster and more consistent.

Figure 2. Syntax of the DCL version used by ARTT.

Table 1. Possible tool inputs.

Architectural document in .pdf:
- Module A is formed by classes Exemplo1 and Exemplo2. #archtest module A is Exemplo1, Exemplo2
- Module B is formed by the classes under br.gov.classesb.* #archtest module B is br.gov.classesb
- One of the architectural rules defined by the architects states that A cannot access B. #archtest rule A cannot-access B

Architectural document in .arch:
- module A is Exemplo1, Exemplo2
- module B is br.gov.classesb
- rule A cannot-access B

The transformations are performed following the rules presented in Table 2. As an example, the rule defined by the architect in Table 1 was transformed by ARTT into the design test presented in Algorithm 1.

3. Case Study

The object of the case study is the e-pol project (Sistema de Gestão das Informações de Polícia Judiciária), developed in partnership by the Brazilian Federal Police and the Universidade Federal de Campina Grande. The e-pol architects defined three modules and six basic architectural rules. The rules were written in a .pdf file in the language proposed by the tool. ARTT was executed successfully, transforming the defined rules into design tests. Executing the design tests detected 1489 architectural violations distributed over 4 rules. Next, a survey was applied to the e-pol architects in order to capture their perception of the proposed tool. All architects involved in the experiment considered the language expressive, simple, and easy to use. Furthermore, they considered that it will take some time for the

Table 2. Examples of transformations.

- module M is S exceptclasses C:
  for element in S: for class in allClassesDesignWizard: if S contains class and C not contains class: M.add(class)
- rule A cannot-access B exceptclasses C:
  for a in A: for element in a.getCalleeClasses(): assert B not contains element or C contains element
- rule only A can-implement B:
  for b in B: for element in b.getEntitiesThatImplements(): assert A contains element
- rule A cannot-extends B:
  for a in A: for b in B: assert not a.extendsClass(b)

team members to get used to the language and to the approach of continuous, automated tests. However, this time would be longer if the design tests had to be written manually. The architecture checking plan they defined will not change, since only a transformation tool will be introduced. One of the architects remarked that although there is a cost, it is important to implement this approach to increase the system's life expectancy. In order to verify that the tests automatically generated by the tool really capture architectural violations, some mutant violations were inserted into the code. As shown in Figure 3, the inserted violations were indeed captured by the tests (rules 2, 4, 5, and 6). Finally, to verify whether the automatically generated tests were correct, a specialist in design tests (using DesignWizard) was asked to write the tests for the six rules manually. The results of the test executions were compared (Figure 4): 162 of the violations found (10.87%) by the automatically generated tests were not found by the specialist's tests, while two of the violations found by the specialist's tests (0.13%) were not found by the automatically generated tests.
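The semantics of the cannot-access transformation in Table 2 can be sketched as a plain dependency check. This is an illustrative Python sketch, not the Java/DesignWizard test that ARTT actually emits; the dependency map stands in for what DesignWizard's getCalleeClasses() would return, and all class names are hypothetical.

```python
def violations_cannot_access(module_a, module_b, callees, exceptions=()):
    """Check 'rule A cannot-access B exceptclasses C': report every
    (caller, callee) pair where a class of A depends on a class of B
    that is not in the exception set."""
    found = []
    for caller in module_a:
        for callee in callees.get(caller, ()):
            if callee in module_b and callee not in exceptions:
                found.append((caller, callee))
    return found

# Hypothetical dependency map: class -> classes it calls.
callees = {
    "Exemplo1": ["br.gov.classesb.X", "Util"],
    "Exemplo2": ["Util"],
}
A = ["Exemplo1", "Exemplo2"]
B = {"br.gov.classesb.X"}
print(violations_cannot_access(A, B, callees))
# → [('Exemplo1', 'br.gov.classesb.X')]
```

Adding br.gov.classesb.X to the exception set (the exceptclasses clause) would silence this violation, mirroring the pseudocode of Table 2.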
For the rules that showed an inconsistent number of violations between the two sets of tests, it was observed that the only difference between the implementations was that the specialist used the DesignWizard method getCallerClasses(), while the tests

Algorithm 1. Design test automatically generated by the tool.

// A cannot access B
public void testRule1() {
    System.out.println("Regra: A cannot access B");
    Set<ClassNode> allClassesExcept = new HashSet<ClassNode>();
    for (ClassNode caller : A) {
        Set<ClassNode> calleeA = new HashSet<ClassNode>();
        calleeA.addAll(caller.getCalleeClasses());
        if (!calleeA.isEmpty()) {
            for (ClassNode callee : calleeA) {
                try {
                    Assert.assertTrue(allClassesExcept.contains(callee)
                        || (!B.contains(callee)));
                } catch (AssertionFailedError e) {
                    System.out.println(caller.getName() + " dependsOn "
                        + callee.getName());
                }
            }
        }
    }
}

Figure 3. Comparison of the execution of the automatically generated tests on code with and without mutants.

generated automatically use the getCalleeClasses() method. After studying and inspecting the DesignWizard code, it was concluded that the difference between the results of the two sets of tests is due to the API used. This fact was reported to the API's maintenance team and is already being fixed. Thus, we believe it is reliable to use ARTT to transform the descriptive architectural representation into design tests. Time is saved, as architects no longer need to write the design tests manually. Developers, in turn, can continuously check whether architectural violations are being introduced, since the tests can be incorporated into the project's test suite and, moreover, the design tests are written in a language close to the development team. We know that the results of this case study cannot be generalized.
Moreover, the defects found in the DesignWizard API affect our results. Only after the API is fixed will it be possible to carry out a deeper study.

Figure 4. Comparison of the execution of the automatically generated tests against the tests written by the specialist.

4. Related Work

In the software evolution area, there is much research aimed at finding simple and practical ways to perform architecture checking. In most of it, automating some step of the checking process is a matter of great importance, since performing architecture checking manually often becomes a complex activity, especially for large-scale systems [Postma 2003]. Several approaches to architecture checking exist, but they are not always used. This often happens because the language used for checking differs from the language used in the application [Brunet et al. 2009]. In this context, Brunet et al. developed an API, called DesignWizard, which allows writing design tests for Java implementations using JUnit. With DesignWizard, developers can understand the system better, since it uses the same language used for development. Furthermore, the architecture documentation becomes executable, easing the architecture conformance task. Since they can be added to the suite of functional tests, design tests are useful to ensure that architectural decisions are followed (without requiring manual analysis). However, software architects need extra time to understand the API and write the architectural rules before handing them over to the developers. Moreover, none of these architecture checking techniques takes into account the different levels of abstraction between the teams, let alone an automatic transformation between them. While a declarative level is more advisable for the architecture team, an executable and testable level tends to be more suitable for the development team. There are two works that deal with transformations between different levels of abstraction.
Pires et al. proposed a technique to automatically transform UML class diagrams into design tests [Pires et al. 2008]. Rabelo et al. proposed a technique to automatically transform UML sequence diagrams into design tests [Rabelo and Pinto 2012]. Although both are translators between different abstraction levels, neither of them addresses architecture checking.

5. Conclusions

DesignWizard is able to speed up the architecture conformance checking process [Brunet et al. 2011]. Therefore, we believe that the ARTT tool can introduce time savings in the architecture checking process, since it allows automatic transformation between the two abstraction levels (of the architecture team and of the development team). Developers will have their architectural representation written in a language close to the development language and, moreover, will be able to incorporate the design tests into the project's suite of functional tests. Communication between the two teams can therefore be faster and more consistent. According to the studies carried out, the tool transforms the language defined in this work into design tests satisfactorily, with an agreement of 89.12% with the tests manually written by a specialist. As future work, this tool can be extended to other languages. To this end, the DesignWizard API must be implemented for the desired output languages and ARTT must be adapted. Also as future work, we intend to conduct a qualitative study to evaluate how software architects perform architecture checking activities. We expect to better understand why industry does not use automated tools for this activity, despite the several approaches available in academia.

References

Brunet, J., Bittencourt, R., Guerrero, D., and Figueredo, J. (2012). On the evolutionary nature of architectural violations. In Proceedings of the 19th Working Conference on Reverse Engineering (WCRE 2012).

Brunet, J., Guerrero, D., and Figueredo, J. (2011). Structural conformance checking with design tests: An evaluation of usability and scalability. In ICSM.

Brunet, J., Guerrero, D., and Figueredo, J. (2009). Design tests: An approach to programmatically check your code against design rules. In Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), New Ideas and Emerging Results.

Clements, P., Garlan, D., Little, R., Nord, R., and Stafford, J. (2003). Documenting software architectures: views and beyond.
Proceedings of the 25th International Conference on Software Engineering.

Jansen, A. and Bosch, J. (2005). Software architecture as a set of architectural design decisions. In Proceedings of the 5th Working Conference on Software Architecture.

Knodel, J. and Popescu, D. (2007). A comparison of static architecture compliance checking approaches. In IEEE/IFIP Working Conference on Software Architecture.

Passos, L., Terra, R., Diniz, R., Valente, M. T., and Mendonca, N. C. (2010). Static architecture-conformance checking: An illustrative overview. IEEE Software, 27(5).

Pires, W., Brunet, J., Ramalho, F., and Guerrero, D. (2008). UML-based design test generation. In 23rd ACM Symposium on Applied Computing (SAC 2008).

Postma, A. (2003). A method for module architecture verification and its application on a large component-based system. Information & Software Technology, 45(4).

Rabelo, J. and Pinto, S. E. (2012). Verificação de conformidade entre diagramas de sequência UML e código Java. Master's dissertation, Campina Grande, Brazil.

Terra, R. and Valente, M. T. (2009). A dependency constraint language to manage object-oriented software architectures. Software: Practice and Experience, 32(12).

JExtract: An Eclipse Plug-in for Recommending Automated Extract Method Refactorings

Danilo Silva 1, Ricardo Terra 2, Marco Túlio Valente 1
1 Federal University of Minas Gerais, Brazil
2 Federal University of Lavras, Brazil

Abstract. Although Extract Method is a key refactoring for improving program comprehension, refactoring tools for this purpose are often underused. To address this shortcoming, we present JExtract, a recommendation system based on structural similarity that identifies Extract Method refactoring opportunities that are directly automated by IDE-based refactoring tools. Our evaluation suggests that JExtract is more effective (w.r.t. recall and precision) at identifying contiguous misplaced code in methods than JDeodorant, a state-of-the-art tool. Tool demonstration video.

1. Introduction

Refactoring has increased in importance as a technique for improving the design of existing code [2], e.g., to increase cohesion, decrease coupling, foster maintainability, etc. In particular, Extract Method is a key refactoring for improving program comprehension. Besides promoting reuse and reducing code duplication, it contributes to readability and comprehensibility by encouraging the extraction of self-documenting methods [2]. Nevertheless, recent empirical research indicates that, while Extract Method is one of the most common refactorings, automated tools supporting this refactoring are underused most of the time [5, 4]. For example, Negara et al. found that Extract Method is the third most frequent refactoring, but the number of developers who apply the refactoring manually is higher than the number of those who do it automatically [5]. Moreover, current tools focus only on automating the refactoring application, while developers expend considerable effort on the manual identification of refactoring opportunities.
To address this shortcoming, this paper presents JExtract, a tool that implements a novel approach for recommending automated Extract Method refactorings. The tool was designed as a plug-in for the Eclipse IDE that automatically identifies, ranks, and applies the refactoring when requested. Thereby, JExtract may help developers find refactoring opportunities and contribute to a wider adoption of refactoring practices. The underlying technique is inspired by the separation of concerns design guideline. More specifically, we assume that the structural dependencies established by Extract Method candidates should be very different from the ones established by the remaining statements in the original method. The remainder of this paper is structured as follows. Section 2 describes the JExtract tool, including its design and implementation. Section 3 discusses related tools and Section 4 presents final remarks.

2. The JExtract Tool

JExtract is a tool that analyzes the source code of methods and recommends Extract Method refactoring opportunities, as illustrated in Figure 1. First, the tool generates all Extract Method possibilities for each method. Second, these possibilities are ranked according to a scoring function based on the similarity between sets of dependencies established in the code.

Figure 1. The JExtract tool.

This section is organized as follows. Subsection 2.1 provides an overview of our approach for identifying Extract Method refactoring opportunities. Subsection 2.2 describes the design and implementation of the tool. Finally, Subsection 2.3 presents the results of our evaluation on open-source systems. A detailed description of the recommendation technique behind JExtract is presented in a recent full technical paper [9].

2.1. Proposed Approach

The approach is divided into three phases: Generation of Candidates, Scoring, and Ranking.

Generation of candidates

This phase is responsible for identifying all possible Extract Method refactoring opportunities. First, we split the methods into blocks, which consist of sequential statements that follow a linear control flow. As an example, Figure 2 presents the method mouseReleased of class SelectionClassifierBox, extracted from ArgoUML. Note that each statement is labeled using the SX.Y pattern, where X and Y denote the block and the statement, respectively. For example, S2.3 is the third statement of the second block, which declares a variable cw.
public void mouseReleased(MouseEvent me) {
S1.1   for (Button btn : buttons) {
S2.1     int cx = btn.fig.getX() + btn.fig.getWidth() - btn.icon.getIconWidth();
S2.2     int cy = btn.fig.getY();
S2.3     int cw = btn.icon.getIconWidth();
S2.4     int ch = btn.icon.getIconHeight();
S2.5     Rectangle rect = new Rectangle(cx, cy, cw, ch);
S2.6     if (rect.contains(me.getX(), me.getY())) {
S3.1       Object metatype = btn.metatype;
S3.2       FigClassifierBox fcb = (FigClassifierBox) getContent();
S3.3       FigCompartment fc = fcb.getCompartment(metatype);
S3.4       fc.setEditOnRedraw(true);
S3.5       fc.createModelElement();
S3.6       me.consume();
S3.7       return;
         }
       }
S1.2   super.mouseReleased(me);
}

Figure 2. An Extract Method candidate in a method of ArgoUML (S3.2 to S3.5)
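The block-splitting and SX.Y labeling described above can be sketched with a small recursive routine. This is an illustrative Python sketch (JExtract itself works on Java ASTs inside Eclipse); the nested-list representation of a method body and the abridged statement texts are assumptions made for the example.

```python
def label_statements(block, _state=None):
    """Assign SX.Y labels: X is the block number (in order of discovery)
    and Y the statement's position in its block. A statement is a
    (text, child_block) pair; child_block is None or a nested list."""
    if _state is None:
        _state = {"next_block": 0, "labels": {}}
    _state["next_block"] += 1
    x = _state["next_block"]
    children = []
    for y, (text, child) in enumerate(block, start=1):
        _state["labels"][f"S{x}.{y}"] = text
        if child is not None:
            children.append(child)
    for child in children:  # nested blocks are numbered afterwards
        label_statements(child, _state)
    return _state["labels"]

# Simplified shape of mouseReleased from Figure 2 (statement texts abridged)
method = [
    ("for (Button btn : buttons)", [
        ("int cx = ...", None), ("int cy = ...", None), ("int cw = ...", None),
        ("int ch = ...", None), ("Rectangle rect = ...", None),
        ("if (rect.contains(...))", [
            ("Object metatype = btn.metatype", None),
            ("FigClassifierBox fcb = ...", None),
        ]),
    ]),
    ("super.mouseReleased(me)", None),
]
labels = label_statements(method)
print(labels["S2.3"])  # int cw = ...
```

With this numbering, the outer block is block 1 (S1.1 and S1.2), the loop body is block 2, and the if body is block 3, matching the labels of Figure 2.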

Second, we generate all Extract Method candidates based on Algorithm 1 (extracted from [9]).

Algorithm 1 Candidates generation algorithm [9]
Input: A method M
Output: List with Extract Method candidates
1: Candidates ← ∅
2: for all block B ∈ M do
3:   n ← statements(B)
4:   for i ← 1, n do
5:     for j ← i, n do
6:       C ← subset(B, i, j)
7:       if isValid(C) then
8:         Candidates ← Candidates + C
9:       end if
10:    end for
11:  end for
12: end for

Fundamentally, the three nested loops in Algorithm 1 (lines 2, 4, and 5) enforce that the list of selected statements satisfies the following preconditions:

- Only contiguous statements inside a block are selected. In Figure 2, for example, it is not possible to select a candidate with S3.2 and S3.4 without including S3.3.
- The selected statements are part of a single block of statements. In Figure 2, for example, it is not possible to generate a candidate with both S2.6 and S3.1, since they belong to distinct blocks.
- When a statement is selected, its children statements are also included. In Figure 2, for example, when statement S2.6 is selected, its children statements S3.1 to S3.7 are also included.

Last but not least, not every iteration of the loop yields an Extract Method candidate because: (i) a candidate recommendation must respect a size threshold defined by the parameter Minimum Extracted Statements, preset to 3 (changeable), which means that an Extract Method candidate must have at least three statements; and (ii) a candidate recommendation must respect the preconditions defined by the Extract Method refactoring engine.

Scoring

This phase is responsible for scoring the possible Extract Method refactoring opportunities generated in the previous phase, using a technique inspired by a Move Method recommendation heuristic [7]. Assume m' is the selection of statements of an Extract Method candidate and m'' the remaining statements in the original method m.
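The enumeration of Algorithm 1 can be sketched as follows. This is a simplified Python illustration under two stated assumptions: blocks are flat lists of statement labels (so the child-inclusion rule is implicit in the labeling), and the size threshold stands in for the full isValid(C) check, which in the real tool also consults the Extract Method refactoring engine's preconditions.

```python
MIN_EXTRACTED_STATEMENTS = 3  # the tool's preset (changeable) threshold

def generate_candidates(blocks, min_size=MIN_EXTRACTED_STATEMENTS):
    """Enumerate every contiguous selection of statements inside each
    block, mirroring the three nested loops of Algorithm 1."""
    candidates = []
    for block in blocks:
        n = len(block)
        for i in range(n):
            for j in range(i, n):
                c = block[i:j + 1]
                if len(c) >= min_size:  # simplified stand-in for isValid(C)
                    candidates.append(c)
    return candidates

# The three blocks of Figure 2, by statement label
blocks = [
    ["S1.1", "S1.2"],
    ["S2.1", "S2.2", "S2.3", "S2.4", "S2.5", "S2.6"],
    ["S3.1", "S3.2", "S3.3", "S3.4", "S3.5", "S3.6", "S3.7"],
]
cands = generate_candidates(blocks)
print(len(cands))                                 # 25
print(["S3.2", "S3.3", "S3.4", "S3.5"] in cands)  # True: the candidate of Figure 2
```

Note that block 1 yields no candidate (only two statements), while blocks 2 and 3 yield 10 and 15 contiguous selections of at least three statements, respectively.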
The proposed heuristic aims to maximize the structural dissimilarity between m' and m''.

Structural Dependencies: The sets of dependencies established by a selection of statements S with variables, types, and packages are denoted by Dep_var(S), Dep_type(S), and Dep_pack(S), respectively. These sets are constructed as described next.

Variables: If a statement s from a selection of statements S declares, assigns, or reads a variable v, then v is added to Dep_var(S). Reads from and writes to formal parameters and fields are handled in a similar way.

Types: If a statement s from a selection of statements S uses a type (class or interface) T, then T is added to Dep_type(S).

Packages: For each type T included in Dep_type(S), as described in the previous item, the package where T is implemented and all its parent packages are also included in Dep_pack(S).

For instance, assume m' is the highlighted code in Figure 2 (i.e., an Extract Method candidate) and m'' the remaining statements in the original method mouseReleased. On one hand, Dep_var(m') = {metaType, fc, fcb}. On the other hand, Dep_var(m'') = {metaType, btn, cy, cx, cw, ch, buttons, me, rect}. In this case, the intersection between these two sets contains only metaType. Moreover, the computation of fc and fcb is isolated from the remaining code. Therefore, one can claim that m' is cohesive and decoupled from m'', i.e., a good separation of concerns is achieved.

Scoring Function: To compute the dissimilarity between m' and m'', we rely on the distance between the dependency sets Dep' and Dep'' using the Kulczynski similarity coefficient [10, 7]:

    dist(Dep', Dep'') = 1 - 1/2 * [ a / (a + b) + a / (a + c) ]

where a = |Dep' intersection Dep''|, b = |Dep' \ Dep''|, and c = |Dep'' \ Dep'|.

Thus, let m' be the selection of statements of an Extract Method candidate for method m, and let m'' be the remaining statements in m. The score of m' is defined as:

    score(m') = 1/3 * dist(Dep_var(m'), Dep_var(m''))
              + 1/3 * dist(Dep_type(m'), Dep_type(m''))
              + 1/3 * dist(Dep_pack(m'), Dep_pack(m''))

The scoring function is centered on the observation that a good Extract Method candidate should encapsulate the use of variables, types, and packages. In other words, we should maximize the distance between the dependency sets Dep' and Dep''.

2.1.3. Ranking

This phase is responsible for ranking and filtering the Extract Method candidates based on the score computed in the previous phase. Basically, we sort the candidates and filter them according to the following parameters: (i) Maximum Recommendations per Method.
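The Kulczynski-based distance above can be sketched as a small Java method. The example in main reuses the variable-dependency sets of the mouseReleased candidate discussed in the text (a = 1, b = 2, c = 8, so the distance is 1 - 1/2 * (1/3 + 1/9) = 7/9, a high dissimilarity); the full score would average this distance over the variable, type, and package sets.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the Kulczynski-based dissimilarity used by the scoring phase:
// dist(Dep', Dep'') = 1 - 1/2 * [ a/(a+b) + a/(a+c) ], where
// a = |Dep' ∩ Dep''|, b = |Dep' \ Dep''|, c = |Dep'' \ Dep'|.
public class KulczynskiDistance {

    static double dist(Set<String> dep1, Set<String> dep2) {
        Set<String> inter = new HashSet<>(dep1);
        inter.retainAll(dep2);
        int a = inter.size();
        int b = dep1.size() - a;   // elements only in Dep'
        int c = dep2.size() - a;   // elements only in Dep''
        if (a == 0) {
            return 1.0;            // no shared dependencies: maximal distance
        }
        return 1.0 - 0.5 * ((double) a / (a + b) + (double) a / (a + c));
    }

    public static void main(String[] args) {
        // Dep_var(m') and Dep_var(m'') from the mouseReleased example.
        Set<String> depM1 = Set.of("metaType", "fc", "fcb");
        Set<String> depM2 = Set.of("metaType", "btn", "cy", "cx", "cw",
                                   "ch", "buttons", "me", "rect");
        System.out.printf("%.4f%n", dist(depM1, depM2)); // 7/9 ≈ 0.7778
    }
}
```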
The value is preset to 3 (changeable), which means that the tool triggers up to three recommendations for each method; and (ii) Minimum Score Value, which can be configured when the user wants to set up a minimum dissimilarity threshold.

2.2. Internal Architecture and Interface

We implemented JExtract as a plug-in on top of the Eclipse platform. Therefore, we rely mainly on native Eclipse APIs, such as Java Development Tools (JDT) and Language

Toolkit (LTK). The current JExtract implementation follows an architecture with five main modules:

1. Code Analyzer: This module provides the following services to the other modules: (a) it builds the structure of blocks and statements (refer to Subsection 2.1.1); (b) it extracts the structural dependencies (refer to Subsection 2.1.2); and (c) it checks whether an Extract Method candidate satisfies the preconditions of the underlying Eclipse Extract Method refactoring. In fact, this module concentrates most of the communication between JExtract and the Eclipse APIs (e.g., org.eclipse.jdt.core and org.eclipse.ltk.core.refactoring).

2. Candidate Generator: This module generates all Extract Method candidates based on Algorithm 1 and hence depends on service (a) of the Code Analyzer module.

3. Scorer: This module calculates the dissimilarity of the Extract Method candidates generated by the Candidate Generator module (refer to Subsection 2.1.2) and hence depends on service (b) of the Code Analyzer module.

4. Ranker: This module ranks and filters the Extract Method candidates generated by the Candidate Generator module and scored by the Scorer module. It depends on service (c) of the Code Analyzer module to filter out candidates that do not satisfy the preconditions.

5. UI: This module consists of the front-end of the tool, which relies on the Eclipse UI API (org.eclipse.ui) to implement two menu extensions, six actions, and one main view. Moreover, it depends on the UI module of LTK (org.eclipse.ltk.ui.refactoring) to delegate the application of the refactoring to the underlying Eclipse Extract Method refactoring tool.

This architecture makes the tool extensible. For example, the Scorer module may be replaced by one that employs a new heuristic based on semantic and structural information. As another example, the Candidate Generator module may be extended to support the identification of non-contiguous code fragments. Figure 3 presents JExtract's UI, displaying the method mouseReleased previously presented in Figure 2.
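The sort-and-filter step performed by the Ranker module can be sketched as follows. This is an illustrative sketch, not the tool's API: the Candidate record and the parameter names mirror the Maximum Recommendations per Method and Minimum Score Value parameters described earlier.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of the Ranker module: discard candidates below the
// minimum score, sort the rest by score (descending), and keep at most
// maxRecommendationsPerMethod of them.
public class Ranker {

    record Candidate(String statements, double score) {}

    static List<Candidate> rank(List<Candidate> candidates,
                                int maxRecommendationsPerMethod,
                                double minScoreValue) {
        return candidates.stream()
                .filter(c -> c.score() >= minScoreValue)
                .sorted(Comparator.comparingDouble(Candidate::score).reversed())
                .limit(maxRecommendationsPerMethod)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> all = List.of(
                new Candidate("S3.2-S3.5", 0.78),
                new Candidate("S2.1-S2.6", 0.41),
                new Candidate("S3.1-S3.7", 0.55),
                new Candidate("S2.3-S2.6", 0.12));
        // Top-3 recommendations with a minimum score of 0.3.
        for (Candidate c : rank(all, 3, 0.3)) {
            System.out.println(c.statements() + " " + c.score());
        }
    }
}
```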
When a developer triggers JExtract to identify Extract Method refactoring opportunities for this method, it opens the Extract Method Recommendations view to report the potential recommendations. In this case, the best candidate is the extraction of statements S3.2 to S3.5, the one with the highest dissimilarity score.

2.3. Evaluation

We conducted two different but complementary empirical studies.

Study #1: In our previous paper [9], we evaluated the recommendations provided by our tool on three systems to assess precision and recall. We extended this study to consider minor modifications to the ranking method and to compare the results with JDeodorant, a state-of-the-art tool that identifies Extract Method opportunities [11]. For each system S, we apply random Inline Method refactoring operations to obtain a modified version S'.

Figure 3. JExtract UI

We assume that good Extract Method opportunities are the ones that revert the modifications (i.e., restoring S from S').

Table 1. Study #1 - Recall and precision results (JExtract Top-1/Top-2/Top-3 and JDeodorant)

System        #    Top-1 (Recall/Prec.)   Top-2 (Recall/Prec.)   Top-3 (Recall/Prec.)   JDeodorant (Recall/Prec.)
JHotDraw      56   19 (34%) / 34%         26 (46%) / 24%         32 (57%) / 20%         2 (4%) / 5%
JUnit         25   13 (52%) / 52%         16 (64%) / 33%         18 (72%) / 25%         0 (0%) / 0%
MyWebMarket   14   12 (86%) / 86%         14 (100%) / 50%        14 (100%) / 33%        2 (14%) / 33%
Total         95   44 (46%) / 46%         56 (59%) / 30%         64 (67%) / 23%         4 (4%) / 6%

Table 1 reports the recall and precision values achieved using JExtract with three different configurations (Top-k Recommendations per Method). While a high parameter value favors recall (e.g., Top-3), a low one favors precision (e.g., Top-1). Table 1 also presents the results achieved using JDeodorant with its default settings. As the main finding, JExtract outperforms JDeodorant regardless of the configuration used.

Study #2: We replicated the previous study on ten other popular open-source Java systems to assess how the precision and recall rates would vary. Nevertheless, we do not compare our results with JDeodorant, since we were not able to reliably provide the source code of all required libraries, as demanded by JDeodorant. Table 2 reports the recall and precision values achieved using the same settings as the previous study. On one hand, the overall recall ranges from 25% (Top-1) to 49.2% (Top-3). On the other hand, the overall precision ranges from 25% (Top-1) down to 16.7% (Top-3). We argue these values are acceptable for two reasons: (i) we only consider a recommendation correct when it matches the oracle entry exactly; thus, a slight difference of including (or excluding) a single statement is enough for it to be counted as a miss; and (ii) the modified methods may have preexisting Extract Method opportunities, besides the ones we introduced, which are considered wrong by our oracle.
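The recall and precision figures in both studies follow the usual exact-match definitions: a recommendation counts as a hit only if it matches an oracle entry exactly, recall divides hits by the oracle size, and precision divides hits by the number of recommendations issued. A minimal sketch (the method/statement-range identifiers are illustrative):

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch of the exact-match evaluation: raising the Top-k
// parameter adds recommendations, which can only help recall but tends
// to hurt precision.
public class ExactMatchEvaluation {

    static long hits(Set<String> oracle, List<String> recommendations) {
        return recommendations.stream().filter(oracle::contains).count();
    }

    static double recall(Set<String> oracle, List<String> recommendations) {
        return (double) hits(oracle, recommendations) / oracle.size();
    }

    static double precision(Set<String> oracle, List<String> recommendations) {
        return (double) hits(oracle, recommendations) / recommendations.size();
    }

    public static void main(String[] args) {
        // One oracle entry per inlined method; Top-2 recommendations per method.
        Set<String> oracle = Set.of("m1:S2-S5", "m2:S1-S3");
        List<String> top2 = List.of("m1:S2-S5", "m1:S2-S6",
                                    "m2:S1-S3", "m2:S2-S3");
        System.out.println(recall(oracle, top2));    // 1.0
        System.out.println(precision(oracle, top2)); // 0.5
    }
}
```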

Table 2. Study #2 - Recall and precision results (JExtract)

System        #      Top-1 (Recall/Prec.)   Top-2 (Recall/Prec.)   Top-3 (Recall/Prec.)
Ant           964    235 (24.4%) / 24.4%    363 (37.7%) / 19.1%    460 (47.7%) / 16.3%
ArgoUML       439    98 (22.3%) / 22.3%     160 (36.4%) / 18.3%    186 (42.4%) / 14.4%
Checkstyle    533    227 (42.6%) / 42.6%    338 (63.4%) / 31.9%    389 (73.0%) / 24.7%
FindBugs      714    179 (25.1%) / 25.1%    278 (38.9%) / 19.7%    350 (49.0%) / 16.7%
FreeMind      348    85 (24.4%) / 24.4%     134 (38.5%) / 19.4%    181 (52.0%) / 17.8%
JFreeChart    1,090  204 (18.7%) / 18.7%    396 (36.3%) / 18.2%    536 (49.2%) / 16.5%
JUnit         35     11 (31.4%) / 32.4%     17 (48.6%) / 26.6%     22 (62.9%) / 23.7%
Quartz        239    99 (41.4%) / 41.4%     125 (52.3%) / 26.5%    142 (59.4%) / 20.4%
SQuirreL SQL  39     15 (38.5%) / 38.5%     18 (46.2%) / 23.7%     20 (51.3%) / 18.2%
Tomcat        1,076  214 (19.9%) / 19.9%    325 (30.2%) / 15.2%    409 (38.0%) / 12.8%
Total         5,477  1,367 (25.0%) / 25.0%  2,154 (39.3%) / 19.8%  2,695 (49.2%) / 16.7%

3. Related Tools

Recent empirical research shows that automated refactoring tools, especially those supporting Extract Method refactorings, are underused most of the time [5, 4]. In view of such circumstances, recent studies on the identification of refactoring opportunities seek to address this shortcoming. In this paper, we implemented our approach in a way that it can be straightforwardly incorporated into the current development process, through a tool that identifies, ranks, and automates Extract Method refactoring opportunities [9].

JMove is the refactoring recommendation system that inspired our approach [7, 6]. The tool identifies Move Method refactoring opportunities based on the similarity between dependency sets [7]. More specifically, it computes the similarity of the set of dependencies established by a given method m with (i) the methods of its own class C_1 and (ii) the methods in the other classes of the system (C_2, C_3, ..., C_n). Whereas JMove recommends moving a method m to a more similar class C_i, our approach recommends extracting a fragment from a given method m into a new method m' when there is a high dissimilarity between m' and the remaining statements in m.
JDeodorant is a state-of-the-art system to identify and apply common refactoring operations in Java systems, including Extract Method [11]. In contrast to our approach, which relies on the similarity between dependency sets, JDeodorant relies on the concept of program slicing to select related statements that can be extracted into a new method. Our approach, on the other hand, is not based on specific code patterns (such as a computation slice). It is also more conservative in preserving program behavior (although it is currently restricted to contiguous fragments of code), and it relies on a scoring function to rank and filter recommendations. There are other techniques to identify refactoring opportunities based, for example, on search-based algorithms [8], Relational Topic Models (RTM) [1], metrics-based rules [3], etc., that can be adapted to identify Extract Method refactoring opportunities.

4. Final Remarks

JExtract implements a novel approach for recommending automated Extract Method refactorings. The tool was designed as a plug-in for the Eclipse IDE that automatically

identifies, ranks, and applies the refactoring. Thereupon, the tool may contribute to increasing the popularity of IDE-based refactoring tools, which most recent empirical studies on refactoring consider underused. Moreover, our evaluation indicates that JExtract is more effective (w.r.t. recall and precision) at identifying contiguous misplaced code in methods than JDeodorant, a state-of-the-art tool. As ongoing work, we are extending JExtract to perform statement reordering in order to uncover better Extract Method opportunities, as long as the modification preserves the behavior of the original code. Moreover, we intend to evaluate our tool with human experts to mitigate the threat that the synthesized datasets do not capture the full spectrum of Extract Method instances faced by developers. Last, we also intend to support other kinds of refactoring (e.g., Move Method). The JExtract tool, including its source code, is publicly available at

Acknowledgments: Our research is supported by CAPES, FAPEMIG, and CNPq.

References

[1] G. Bavota, R. Oliveto, M. Gethers, D. Poshyvanyk, and A. De Lucia. Methodbook: Recommending move method refactorings via relational topic models. IEEE Transactions on Software Engineering, 2014.

[2] M. Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.

[3] R. Marinescu. Detection strategies: Metrics-based rules for detecting design flaws. In 20th International Conference on Software Maintenance (ICSM), 2004.

[4] E. R. Murphy-Hill, C. Parnin, and A. P. Black. How we refactor, and how we know it. IEEE Transactions on Software Engineering, 38(1):5-18, 2012.

[5] S. Negara, N. Chen, M. Vakilian, R. E. Johnson, and D. Dig. A comparative study of manual and automated refactorings. In 27th European Conference on Object-Oriented Programming (ECOOP), 2013.

[6] V. Sales, R. Terra, L. F. Miranda, and M. T. Valente. JMove: Seus métodos em classes apropriadas. In IV Brazilian Conference on Software: Theory and Practice (CBSoft), Tools Session, pages 1-6, 2013.

[7] V. Sales, R. Terra, L. F. Miranda, and M. T. Valente. Recommending move method refactorings using dependency sets. In 20th Working Conference on Reverse Engineering (WCRE), 2013.

[8] O. Seng, J. Stammel, and D. Burkhart. Search-based determination of refactorings for improving the class structure of object-oriented systems. In 8th Conference on Genetic and Evolutionary Computation (GECCO), 2006.

[9] D. Silva, R. Terra, and M. T. Valente. Recommending automated Extract Method refactorings. In 22nd International Conference on Program Comprehension (ICPC), 2014.

[10] R. Terra, J. Brunet, L. F. Miranda, M. T. Valente, D. Serey, D. Castilho, and R. S. Bigonha. Measuring the structural similarity between source code entities. In 25th Conference on Software Engineering and Knowledge Engineering (SEKE), 2013.

[11] N. Tsantalis and A. Chatzigeorgiou. Identification of extract method refactoring opportunities for the decomposition of methods. Journal of Systems and Software, 84(10), 2011.

ModularityCheck: A Tool for Assessing Modularity using Co-Change Clusters

Luciana Lourdes Silva 1,2, Daniel Félix 1, Marco Túlio Valente 1, Marcelo de A. Maia 3

1 Department of Computer Science, Federal University of Minas Gerais (UFMG)
2 Federal Institute of Minas Gerais (IFMG)
3 Faculty of Computing, Federal University of Uberlândia

{luciana.lourdes, dfelix,

Abstract. It is widely accepted that traditional modular structures suffer from the dominant decomposition problem. Therefore, to improve current modularity views, it is important to investigate the impact of design decisions concerning modularity in other dimensions, such as the evolutionary view. In this paper, we propose ModularityCheck, a tool to assess package modularity using co-change clusters, which are sets of classes that usually changed together in the past. Our tool extracts information from version control platforms and issue reports, retrieves co-change clusters, generates metrics related to co-change clusters, and provides visualizations for assessing modularity. We also provide a case study to evaluate the tool.

1. Introduction

There is a growing interest in tools to enhance software quality [Kersten and Murphy 2006, Zimmermann et al. 2005]. Specifically, several tools have been developed to support software modularity improvement [Rebêlo et al. 2014, Vacchi et al. 2014, Bryton and Brito e Abreu 2008, Schwanke 1991]. Most of these tools help architects understand the current package decomposition. Basically, they extract information from the source code by using structural dependencies and the source code text [Robillard and Murphy 2007, Robillard and Murphy 2002]. Modularity is a key concept when designing complex software systems [Baldwin and Clark 2003]. The central idea is that modules should hide important design decisions or decisions that are likely to change [Parnas 1972].
Typically, the standard approach to assess modularity is based on coupling and cohesion, calculated using the structural dependencies established between the modules of a system (coupling) and between the internal elements of each module (cohesion). Usually, highly cohesive and low-coupled modules are desirable because they ease software comprehension, maintenance, and reuse. However, typical cohesion and coupling metrics measure a single dimension of the software implementation (the static-structural dimension). On the other hand, it is widely accepted that traditional modular structures and metrics suffer from the dominant decomposition problem and tend to hinder different facets that developers may be interested in [Kersten and Murphy 2006, Robillard and Murphy 2002, Robillard and Murphy 2007]. For example, there are various effects of coupling that are not captured by structural coupling. Therefore, to improve current modularity views, it

is important to investigate the impact of design decisions concerning modularity in other dimensions of a software system, such as the evolutionary dimension.

Figure 1. ModularityCheck's overview.

To address this question, we present in this paper ModularityCheck, a tool to support package modularity assessment and understanding using co-change clusters. The proposed tool has the following features:

- It extracts commits automatically from the version history of the target system and discards noisy commits by checking them against their issue reports.
- It retrieves sets of classes that usually changed together in the past, which we term co-change clusters.
- It relies on distribution maps [Ducasse et al. 2006] to reason about the projection of the extracted co-change clusters onto the traditional decomposition of a system into packages. It also calculates a set of metrics defined for distribution maps to support the characterization of the extracted co-change clusters.

2. ModularityCheck in a Nutshell

ModularityCheck supports the following stages to assess the quality of a system's package modularity: pre-processing, post-processing, co-change cluster retrieval, and cluster visualization. Figure 1 shows the process to retrieve co-change clusters. A detailed presentation of this process is available in a full technical paper [Silva et al. 2014]. In the first stage, the tool applies several pre-processing tasks, which are responsible for selecting commits from the version history to create the co-change graph. In such graphs, the vertices are classes and the edges link classes changed together in the same commits. In the second stage, a post-processing task prunes edges with small weights from the co-change graphs. After that, the co-change graph is automatically processed to produce a new modular facet: co-change clusters, which abstract out common changes made to a system, as stored in version control platforms.
Therefore, co-change clusters represent sets of classes that changed together in the past. Finally, the tool uses distribution maps [Ducasse et al. 2006], a well-known visualization technique, to reason about the projection of the extracted clusters onto the traditional decomposition of a system into packages. ModularityCheck also provides a set of metrics defined for distribution maps to reason about the extracted co-change clusters. Particularly, it is possible to reason about recurrent distribution patterns of the co-change clusters listed by the tool, including patterns denoting well-modularized and crosscutting clusters.

2.1. Architecture

ModularityCheck supports package modularity assessment of software systems implemented in the Java language. The tool relies on the following inputs: (i) the issue reports

saved in XML files; (ii) the URL of the version control platform (SVN or Git); (iii) the maximum number of packages, used to remove highly scattered commits; and (iv) the minimum number of classes in a co-change cluster. We discard small clusters because they may eventually generate a decomposition of the system with hundreds of clusters.

Figure 2. ModularityCheck's architecture.

Figure 2 shows the tool's architecture, which includes the following modules:

Co-Change Graph Extraction: As illustrated in Figure 2, the tool receives the URL of the version control platform of the target system and the issue reports. When extracting co-change graphs, it is fundamental to preprocess the considered commits to filter out commits that may pollute the graph with noise. First, the tool removes commits not associated with maintenance issues, because such commits can denote partial implementations of programming tasks. Second, the tool removes commits that do not change classes, because the co-changes considered by ModularityCheck are defined for classes. Third, commits associated with multiple maintenance issues are removed, since they could generate edges connecting classes modified to implement semantically unrelated maintenance tasks, included in the same commit just for convenience, for example. Finally, the last pruning task removes highly scattered commits, according to the Maximum Scattering threshold, an input parameter. Such commits are usually associated with refactoring activities, dead code removal, or changes to comment styles. The default value considered by the tool is ten packages.

Co-Change Cluster Retrieval: After extracting the co-change graph, a post-processing task is applied to prune edges with small weights. In this phase, edges with weights smaller than two co-changes are removed. Then, in a further step, a data mining algorithm named Chameleon [Karypis et al. 1999] is applied to retrieve subgraphs with high density.
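The commit filters and the edge-weight pruning described above can be sketched as follows. The class and method names are illustrative, not the tool's API; the real tool parses VCS logs and issue reports rather than receiving pre-digested commit data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of co-change graph construction: each commit that
// survives the pre-processing filters adds 1 to the weight of the edge
// between every pair of classes it changes; edges with weight below a
// threshold (two co-changes by default) are pruned afterwards.
public class CoChangeGraph {

    final Map<String, Integer> edgeWeights = new HashMap<>();

    static String edge(String a, String b) {
        // Canonical undirected-edge key.
        return a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
    }

    void addCommit(List<String> changedClasses, int packagesTouched,
                   int linkedIssues, int maxScattering) {
        // Pre-processing filters: the commit must be linked to exactly one
        // maintenance issue, must change classes, and must not be highly
        // scattered (Maximum Scattering threshold, default ten packages).
        if (linkedIssues != 1 || changedClasses.isEmpty()
                || packagesTouched > maxScattering) {
            return;
        }
        for (int i = 0; i < changedClasses.size(); i++) {
            for (int j = i + 1; j < changedClasses.size(); j++) {
                edgeWeights.merge(edge(changedClasses.get(i),
                                       changedClasses.get(j)), 1, Integer::sum);
            }
        }
    }

    Map<String, Integer> pruned(int minWeight) {
        Map<String, Integer> result = new HashMap<>();
        edgeWeights.forEach((e, w) -> { if (w >= minWeight) result.put(e, w); });
        return result;
    }

    public static void main(String[] args) {
        CoChangeGraph g = new CoChangeGraph();
        g.addCommit(List.of("A", "B"), 1, 1, 10);
        g.addCommit(List.of("A", "B", "C"), 2, 1, 10);
        g.addCommit(List.of("A", "C"), 12, 1, 10); // discarded: too scattered
        System.out.println(g.pruned(2)); // only the A-B edge survives
    }
}
```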
The number of clusters is defined by executing Chameleon multiple times. After each execution, small clusters are discarded according to the Minimum Cluster Size threshold informed by the user. The default value considered by the tool is four classes, i.e., after the clustering execution, clusters with fewer than four classes are removed.

Metric Set Extraction: The tool calculates the number of vertices, the number of edges, and the co-change graph's density, before and after the post-processing filter. After retrieving the co-change clusters, the tool presents the final number of clusters and several standard descriptive statistics. These metrics describe the size and density of the extracted co-change clusters, as well as the average edge weight per cluster. Moreover, the tool presents metrics defined for distribution maps, like focus and spread. ModularityCheck also allows investigating the distribution of the co-change clusters over the package structure by using distribution maps [Ducasse et al. 2006]. In our distribution maps [Santos et al. 2013, Santos et al. 2014], entities (classes) are represented as small

squares and the package structure groups such squares into large rectangles. In the package structure, we only consider classes that are members of co-change clusters, in order to improve the visualization of the maps. Finally, all classes in a co-change cluster have the same color. A distribution map metric named focus ranges between 0 and 1, where the value one means that the cluster q dominates the packages it touches. There is also a second metric, called spread, that measures the number of packages touched by q. After measuring focus and spread, the tool classifies recurrent distribution patterns of co-change clusters as follows: well-encapsulated, partially encapsulated, well-confined in packages, or crosscutting clusters. Well-encapsulated clusters are those that dominate the packages they touch. Clusters classified as partially encapsulated have focus close to 1.0, but touch classes in other packages (spread > 1). Clusters defined as well-confined have focus < 1.0 and spread = 1. Finally, clusters with crosscutting behavior have focus close to 0 and a high spread.

Figure 3. Filters and metric results.

3. Use Case Scenario: Geronimo Web Application Server

In order to present ModularityCheck, we provide a usage scenario involving information from the Geronimo Web Application Server, covering almost 10 years (08/20/ /04/2013). Figure 3 shows the results concerning co-change clustering. A detailed discussion of these results is presented in a full technical paper [Silva et al. 2014].

3.1. Co-Change Extraction

First, our tool extracted 9,829 commits. We kept the value of Maximum Scattering at 10, i.e., the tool discarded commits changing classes located in more than ten

packages. After the pre-processing tasks, only 1,406 commits were considered useful. However, we observed that about 44.4% of the commits change a single software artifact and therefore would not contribute to co-change analysis anyway.

3.2. Co-Change Clustering

In the next step, small clusters are discarded following the Minimum Cluster Size filter. The tool removed clusters with fewer than 4 classes, resulting in 21 clusters. The ratio between the final number of clusters and the number of packages in the system is 0.05. This fact is an indication that the maintenance activity in the system is concentrated in few classes. Figure 3a shows standard descriptive statistics regarding the size, density, and average edge weight of the extracted co-change clusters. ModularityCheck presents the size of the extracted co-change clusters in terms of number of classes. The extracted clusters have 7.48 ± 3.78 classes in Geronimo. Moreover, the biggest cluster has a considerable number of classes: 20. The tool also presents the density of the extracted co-change clusters, whose average is 0.79. We can also analyze the average weight of the edges in the extracted co-change clusters. For a given co-change cluster, we define this average as the sum of the weights of all edges divided by the number of edges in the cluster. We can observe that the average edge weight is not high, being slightly greater than two in Geronimo.

3.3. Modularity Analysis
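The pattern classification based on focus and spread, as defined in Section 2, can be sketched as follows. This is an illustrative sketch: the concrete cut-offs for "close to 1" and "close to 0" are assumptions for the example, not the tool's actual thresholds.

```java
// Illustrative classification of a co-change cluster by its focus and
// spread values: well-encapsulated, partially encapsulated, well-confined
// in a package, or crosscutting. Thresholds 0.9 and 0.3 are assumed.
public class ClusterPattern {

    static String classify(double focus, int spread) {
        if (focus == 1.0) {
            return "well-encapsulated";
        }
        if (focus >= 0.9 && spread > 1) {     // focus close to 1, several packages
            return "partially encapsulated";
        }
        if (spread == 1) {                    // focus < 1, single package
            return "well-confined";
        }
        if (focus <= 0.3 && spread >= 4) {    // focus close to 0, high spread
            return "crosscutting";
        }
        return "unclassified";
    }

    public static void main(String[] args) {
        System.out.println(classify(1.0, 1));  // e.g., Geronimo's Cluster 2
        System.out.println(classify(0.97, 2)); // e.g., Geronimo's Cluster 8
        System.out.println(classify(0.1, 6));  // crosscutting behavior
    }
}
```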
ModularityCheck also provides a visualization that relies on co-change clusters to assess the quality of a system's package decomposition. Basically, this visualization reveals the distribution of the co-change clusters over the package structure by using distribution maps. The tool also shows standard descriptive statistics regarding the focus and the spread of the co-change clusters. As presented in Figure 3a, the co-change clusters in Geronimo have a high focus on average. Figure 3b shows the focus, spread, and pattern type of each cluster.

3.4. Geronimo Results

Figure 4 shows the distribution map for Geronimo. To improve the visualization, besides background colors, we use a number in each class (small square) to indicate its cluster. Hovering the mouse over a class displays a tooltip with its name. The large boxes are the packages, and the text below each box is the package name. Considering the clusters that are well-encapsulated (high focus) in Geronimo, we found two relevant distribution patterns:

- Clusters well-encapsulated (focus = 1.0) in a single package (spread = 1). Four clusters have this behavior. As an example, we have Cluster 2, which dominates the co-change classes in the package main.webapp.web-INF.view.realmwizard (line 1 in the map, column 9). Other examples are Cluster 5 (package mail, line 1 in the map, column 10) and Cluster 11 (package security.remoting.jmx, line 1, column 3).

- Clusters partially encapsulated (focus close to 1.0), but touching classes in other packages (spread > 1). As an example, we have Cluster 8 (focus = 0.97, spread = 2), which dominates the co-change classes in the package tomcat.model (line 1, column 1 in the map), but also touches the class TomcatServerGBean from package tomcat (line 2, column 8).

Figure 4. Distribution maps for Geronimo [Silva et al. 2014].

3.5. Practical Usage

ModularityCheck can support software architects in assessing modularity under an evolutionary view. It helps to detect co-change behavior patterns, as follows:

- When the package structure is adherent to the cluster structure, localized co-changes are likely to occur, as in Geronimo's clusters.
- When there is no clear adherence to the cluster structure, our tool detects two cluster patterns that may suggest modularity flaws. The first pattern denotes clusters with crosscutting behavior, not detected in Geronimo, but which we could detect in other systems presented in [Silva et al. 2014]. The second indicates partially encapsulated clusters, which suggest a possible ripple effect, where changes in a module can propagate to dependent modules during maintenance activities.

4. Related Tools

Zimmermann et al. proposed ROSE, a tool that uses association rule mining on version histories to recommend further changes [Zimmermann et al. 2005]. Their tool differs from ours because they rely on association rules, whereas we use co-change clusters that are semantically related to a maintenance task. Furthermore, our goal is not to recommend future changes but to assess modularity, using distribution maps to compare and contrast co-change clusters with the system's packages. ConcernMapper [Robillard and Weigand-Warr 2005] is an Eclipse plug-in to organize and view concerns using a hierarchical structure similar to the package structure. However, the concern model is created manually by developers, and the relations between concerns are typically syntactical and structural.
On the other hand, in our tool, the elements and their relationships are obtained by mining the version history. Particularly,

relationships express co-changes, and concerns are retrieved automatically by clustering co-change graphs.

Wong et al. presented CLIO, a tool that detects and locates modularity violations [Wong et al. 2011]. CLIO compares how components should co-change according to the modular structure with how components usually co-change, retrieving information from the version history. A modularity violation is detected when two components usually change together but belong to different modules, which are supposed to evolve independently. CLIO identifies modularity violations by comparing the results of structural coupling with the results of change coupling, i.e., it compares association rules and structural information. On the other hand, we retrieve co-change clusters and use distribution maps to reason about the projection of the extracted clusters onto the traditional decomposition of a software system into packages.

Palomba et al. proposed HIST, a tool that uses association rule mining on version histories to detect the following code smells: Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob, and Feature Envy [Palomba et al. 2013]. HIST is based on changes at the method-level granularity. For each smell, they defined a heuristic that relies on association rule discovery or that analyzes co-changed classes/methods to detect bad smells. In contrast, our goal is not to detect code smells but to assess package decomposition using co-change clusters.

5. Conclusion

In this paper, we proposed a tool to assess modularity using evolutionary information. The tool extracts commits automatically from version histories and filters out noisy information by parsing issue reports. After that, the tool retrieves co-change clusters, computes a set of metrics concerning the clusters, and provides a visualization based on distribution maps.
The central goal of ModularityCheck is to detect classes of the target system that usually change together, to help in assessing the package modular decomposition. Moreover, the co-change clusters can also be used as an alternative view during maintenance tasks to improve developers' comprehension of their tasks. The ModularityCheck tool is publicly available at: aserg.labsoft.dcc.ufmg.br/modularitycheck

Acknowledgement

This work was supported by CNPq, CAPES, and FAPEMIG.

References

Baldwin, C. Y. and Clark, K. B. (2003). Design Rules: The Power of Modularity. MIT Press.

Bryton, S. and Brito e Abreu, F. (2008). Modularity-oriented refactoring. In 12th European Conference on Software Maintenance and Reengineering (CSMR).

Ducasse, S., Gîrba, T., and Kuhn, A. (2006). Distribution map. In 22nd IEEE International Conference on Software Maintenance (ICSM).

Karypis, G., Han, E.-H. S., and Kumar, V. (1999). Chameleon: Hierarchical clustering using dynamic modeling. Computer, 32(8).

Kersten, M. and Murphy, G. C. (2006). Using task context to improve programmer productivity. In 14th International Symposium on Foundations of Software Engineering (FSE), pages
Palomba, F., Bavota, G., Penta, M. D., Oliveto, R., de Lucia, A., and Poshyvanyk, D. (2013). Detecting bad smells in source code using change history information. In 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages
Parnas, D. L. (1972). On the criteria to be used in decomposing systems into modules. Communications of the ACM, 15(12):
Rebêlo, H., Leavens, G. T., Bagherzadeh, M., Rajan, H., Lima, R., Zimmerman, D. M., Cornélio, M., and Thüm, T. (2014). Modularizing crosscutting contracts with AspectJML. In 13th International Conference on Modularity (MODULARITY), pages
Robillard, M. P. and Murphy, G. C. (2002). Concern graphs: finding and describing concerns using structural program dependencies. In 24th International Conference on Software Engineering (ICSE), pages
Robillard, M. P. and Murphy, G. C. (2007). Representing concerns in source code. ACM Transactions on Software Engineering and Methodology, 16(1):1-38.
Robillard, M. P. and Weigand-Warr, F. (2005). ConcernMapper: simple view-based separation of scattered concerns. In OOPSLA Workshop on Eclipse Technology Exchange, eclipse 05, pages
Santos, G., Santos, K., Valente, M. T., Serey, D., and Anquetil, N. (2013). TopicViewer: Evaluating remodularizations using semantic clustering. In IV Congresso Brasileiro de Software: Teoria e Prática (Sessão de Ferramentas), pages 1-6.
Santos, G., Valente, M. T., and Anquetil, N. (2014). Remodularization analysis using semantic clustering. In IEEE Conference on Software Maintenance, Reengineering and Reverse Engineering (CSMR-WCRE), pages
Schwanke, R. (1991). An intelligent tool for re-engineering software modularity. In 13th International Conference on Software Engineering (ICSE), pages
Silva, L., Valente, M. T., and Maia, M. (2014). Assessing modularity using co-change clusters. In 13th International Conference on Modularity, pages
Vacchi, E., Olivares, D. M., Shaqiri, A., and Cazzola, W. (2014). Neverlang 2: A framework for modular language implementation. In 13th International Conference on Modularity (MODULARITY), pages
Wong, S., Cai, Y., Kim, M., and Dalton, M. (2011). Detecting software modularity violations. In 33rd Int. Conference on Software Engineering (ICSE), pages
Zimmermann, T., Weissgerber, P., Diehl, S., and Zeller, A. (2005). Mining version histories to guide software changes. IEEE Transactions on Software Engineering, 31(6):

Nuggets Miner: Assisting Developers by Harnessing the StackOverflow Crowd Knowledge and the GitHub Traceability

Eduardo C. Campos 1, Lucas B. L. de Souza 1, Marcelo de A. Maia 1

1 Department of Computer Science Federal University of Uberlândia (UFU), Uberlândia, MG, , Brazil

Abstract. StackOverflow.com (SOF) is a Question and Answer service oriented to support collaboration among developers. The information available on this type of service is also known as crowd knowledge and is currently an important trend in supporting activities related to software development. GitHub.com (GitHub) is a successful social site for developers that makes unique information about users and their activities visible within and across open source software projects. The traceability of GitHub's issue tracker can be harnessed in the Integrated Development Environment (IDE) to assist software maintenance. We realize our approach by implementing Nuggets Miner, an Eclipse plugin that recommends a ranked and interactive list of results to the system's user. Video Demo URL: https://www.youtube.com/watch?v=ajsbgujl-ny

1. Introduction

Modern-day software development is inseparable from the use of Application Programming Interfaces (APIs) [Duala-Ekoko and Robillard 2012]. Several studies have shown that developers face problems when dealing with unfamiliar APIs [Duala-Ekoko and Robillard 2012, Holmes et al. 2006, Thung et al. 2013]. It is seldom the case that the documentation and examples provided with a large framework or library are sufficient for a developer to use its API effectively. Frequently, developers become lost when trying to use an API, unsure of how to make progress on a programming task [Holmes et al. 2006]. A common behavior of developers is to post questions on social media services and receive answers from other programmers that belong to different projects [Treude et al. 2011].
To help developers find their way, a widely-known alternative is StackOverflow (SOF), a Question and Answer (Q&A) website that uses social media to facilitate knowledge exchange between programmers, mitigating the pitfalls involved in using code from the Internet. Mamykina et al. conducted a statistical study of the entire SOF corpus to find out what is behind its immediate success. Their findings showed that the majority of questions receive one or more answers (above 90%, very quickly, with a median answer time of 11 minutes) [Mamykina et al. 2011]. The set of information available on these social media services is called crowd knowledge and often becomes a substitute for the official software documentation [Treude et al. 2011]. Despite its usefulness, the knowledge provided by Q&A services cannot be directly leveraged from within an Integrated Development Environment (IDE), in the sense

that developers must switch to the Web browser to access those services. Moreover, when dealing with maintenance tasks, software developers often also need to know what changes were made in the past of the project. Thus, developers are forced to explore the historical information of the project (e.g., issues and respective commits) [Robillard and Dagenais 2010]. In order to address this problem, we examined a successful social site called GitHub 1. This site makes unique information about users and their activities visible within and across open source software projects [Dabbish et al. 2012]. Furthermore, GitHub's issue tracker has excellent traceability, and this feature can be harnessed in the IDE (e.g., given a closed issue, we can explore the respective commit). Although GitHub has an integrated issue tracker, it is not possible to search automatically for issues related to a particular maintenance task. Thus, during a maintenance task, the developer is constantly reviewing the issue tracker in search of some issue related to his task. In summary, developers spend most of their time in the IDE to write and understand code [LaToza et al. 2006] and should be focused on the current task without any major interruption or disturbance [Raskin 2000]. Nevertheless, developers are forced to leave the IDE, interrupting the programming flow, lowering their focus on the current task, and possibly getting distracted with other activities on the Web. To deal with those problems, recommendation systems can be a reasonable alternative. According to Robillard et al., a recommendation system for software engineering (RSSE) can assist developers during maintenance and development tasks by providing useful information (e.g., the right code for a task, a good example of API usage) [Robillard et al. 2010].
This information can be gathered from the crowd knowledge provided by Q&A services or from closed issues in GitHub related to the current maintenance task. Considering StackOverflow, we can rely on regular dumps of the entire dataset to obtain the desired information. In the case of GitHub, the project that the developer is working on must host its issues in GitHub's issue tracker instead of other issue trackers (e.g., Bugzilla 2). Nuggets Miner extracts only issues in the CLOSED state (i.e., issues that were previously solved by other developers), displays the ranked issues directly in the IDE, and allows developers to select an issue and explore the historical changes made in the respective commit files. Our work has the following contribution: we present Nuggets Miner, a recommendation system in the form of a plugin for the Eclipse IDE 3 to assist software developers in development and maintenance tasks. Our recommendation strategy was partially assessed in [Souza et al. 2014] (i.e., only recommendations from SOF were evaluated). There are several differences between this paper and [Souza et al. 2014], the two most important being: 1) that paper was not tool-oriented; indeed, no tool was presented; 2) that paper was only about SOF posts, whereas Nuggets Miner also indexes the project's issues. The rest of this paper is organized as follows. In Section 2 we illustrate Nuggets Miner usage with a use case scenario. In Section 3 we present Nuggets Miner components and its architecture. In Section 4 we discuss related work. In Section 5 we draw our

1 https://github.com/

conclusions.

2. A Use Case Scenario

We show how Nuggets Miner can help developers solve programming problems by leveraging SOF and GitHub traceability from within the Eclipse IDE. Bob is required to build a panel with three tabs using the Java Swing API. However, he is a novice with this library. Bob opens the Eclipse IDE, with the Nuggets Miner plugin installed, and writes the following query in Nuggets Miner's Navigator: tab pane java swing. Figure 1 shows the search results returned by the search engine for this query: Q&A pairs in the left panel and issues in the right panel. Concerning the StackOverflow panel, the search engine returns to Bob the top 15 Q&A pairs from SOF in a ranked list considering two main aspects: the textual similarity of the pairs with respect to the query and the quality of the pairs (whose content was previously evaluated by the SOF community). Among the recommended Q&A pairs, Bob finds a pair whose title is JTabbedPane: show task progress in a tab. Figure 2 shows the content of the selected Q&A pair. He reads the Q&A pair and finds an accepted answer that creates an object of type JTabbedPane and invokes the method public void addTab(String title, Icon icon, Component component) on this object. Bob can also import the code snippet given in the answer into the program's editor via drag & drop and execute the Java program (in this case without any modification). Thus, Bob can start modifying the code in the editor to achieve the desired outcome. Concerning the GitHub panel, in the list of returned issues, Bob can choose the issue that he thinks is most related to his activity. Figure 3 shows the content of a selected issue. He can visualize the conversations between Bob's colleagues about the selected issue (through the Conversation tab), which is supposed to be related to his current task. In Figure 3, it is also possible to visualize the list of commits with their respective links (through the Commits tab).
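The two ranking aspects mentioned above, textual similarity and community-assessed quality, have to be combined into a single ordering. A hedged sketch of one plausible combination follows; the linear mix and the `alpha` parameter are illustrative assumptions, since the paper does not give the exact formula used by Nuggets Miner.

```python
def rank_qa_pairs(pairs, alpha=0.5):
    """Order Q&A pairs by a weighted mix of similarity and quality.

    Each pair is (title, similarity, quality), with both scores in
    [0, 1]. The linear mix and `alpha` are assumptions for illustration.
    """
    scored = sorted(pairs,
                    key=lambda p: alpha * p[1] + (1 - alpha) * p[2],
                    reverse=True)
    return [title for title, _, _ in scored]

candidates = [
    ("JTabbedPane: show task progress in a tab", 0.9, 0.8),
    ("Swing layout question", 0.7, 0.4),
    ("Unrelated LINQ post", 0.1, 0.9),
]
print(rank_qa_pairs(candidates)[0])  # JTabbedPane: show task progress in a tab
```

Note how a highly up-voted but textually unrelated pair is pushed below a moderately voted pair that matches the query well.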
When Bob clicks on the commit link, another page opens showing the code modified by the commit. Figure 4 shows a snapshot of the commit selected by Bob.

Figure 1. Nuggets Miner's Navigator: Search Results.

3. Nuggets Miner

In this section, we present Nuggets Miner's architecture (Subsection 3.1), the mechanism for data collection and classification (Subsection 3.2), and the query engine (Subsection 3.3).

Figure 2. Nuggets Miner's Document Content for the Q&A pair selected by Bob.

Figure 3. Nuggets Miner's Issue Content for the issue selected by Bob.

Figure 4. Snapshot of the commit selected by Bob.

3.1. The Architecture

Figure 5 depicts Nuggets Miner's architecture. The left part of this figure represents the server side, while the right side represents the client side (i.e., the graphical user interface and features of the plugin). On the server side, there is a component for collection and classification of data, which is responsible for collecting and classifying Q&A pairs from SOF. Through this

component, we can also collect issues in the CLOSED state, along with the respective commits, for a given project of interest hosted on GitHub. Therefore, the Apache Solr 4 index is constructed with Q&A pairs from SOF and with issues and respective commits of the developer's project hosted on GitHub.

Figure 5. Nuggets Miner's architecture.

The client side is responsible for querying the Apache Solr index, parsing the JSON response (converting the JSON into an object-oriented representation for further manipulation), applying the methodology for ranking the Q&A pairs, applying the methodology for ranking the related issues, and presenting these search results to the system's user. The ranking criterion for Q&A pairs is based on two main aspects: the textual similarity of the pairs with respect to the query and the quality of the pairs (assessed by SOF community members), while the ranking for GitHub issues takes into account only the textual similarity of the issues with respect to the query.

3.2. Mechanism for Data Collection and Classification

In this subsection, we explain the mechanism for data collection (Subsection 3.2.1) and the mechanism for data classification of Q&A pairs (Subsection 3.2.2).

3.2.1. Data Collection

We downloaded a release of the SOF public data dump 5 provided by Stack Exchange 6, which comprises several XML files that represent the database of each website. Since performing these operations by manipulating data directly from XML files is resource intensive, we imported everything into a relational database in order to classify the SOF Q&A pairs. The posts table of this database stores all questions posted by questioners

in the website until the date the dump was performed. This table also stores all answers that were given to each question, if any. To retrieve the issues in the CLOSED state from a GitHub project, we developed another program that connects to the GitHub server (informing the user and password) and downloads these issues from a given repository (e.g., in our study we considered a Swing look-and-feel project called Insubstantial 7 that is hosted on GitHub 8). Our program used an object-oriented GitHub API 9. Then, for each retrieved issue, the program stores the issue data (e.g., issue title, issue body, issue id, commit address of the issue in GitHub, code modified by the commit) in an XML file in the format required by the Apache Solr search engine. For instance, the issue whose id is #124 belongs to the repository Insubstantial/insubstantial. The title of this issue is: Modify base delay of TabWindowPreview. The commit address of this issue in GitHub is: https://github.com/insubstantial/insubstantial/pull/124/commits. This page will be displayed inside the browser of the Nuggets Miner plugin.

3.2.2. Data Classification for Q&A pairs

On SOF, users ask many kinds of questions. According to Nasehi et al. [Nasehi et al. 2012], questions from SOF can also be classified in a second dimension that concerns the main interests of the questioners and what they wanted to solve. In this dimension, one of the categories is How-to-do-it, in which the questioner provides a scenario and asks how to implement it. This category is very close to the scenario in which a developer has a programming task at hand and needs to solve it. For this reason, in our approach, we only consider Q&A pairs that are classified as How-to-do-it. In order to automate the selection of this kind of Q&A pair, we developed a classification strategy.
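The XML serialization step described above can be sketched as follows. The `<add><doc><field>` structure is Solr's standard XML update format, but the field names below are assumptions for illustration; the tool's actual Solr schema is not published.

```python
import xml.etree.ElementTree as ET

def issue_to_solr_xml(issue):
    """Serialize an issue dict into Solr's <add><doc> update XML.

    Field names are hypothetical; only the XML structure follows the
    standard Solr update format.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in issue.items():
        # One <field name="..."> element per issue attribute.
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

xml = issue_to_solr_xml({
    "issue_id": 124,
    "issue_title": "Modify base delay of TabWindowPreview",
})
print(xml)
```

The resulting document can then be posted to Solr's update endpoint for indexing.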
The information about the categories of Q&A pairs proposed in this study, the classifier's attributes, and the steps to build the dataset for training/testing the classifier are described in more detail in [Souza et al. 2014]. We used this classifier to automatically classify Q&A pairs of a pre-selected set of APIs (Swing of Java, Boost of C++, and LINQ of C#) into one of three categories: How-to-do-it, Conceptual, and Seeking-something. The Apache Solr index was populated only with Q&A pairs of the How-to-do-it category from this pre-selected set of APIs.

3.3. The Query Engine

Nuggets Miner's Eclipse plugin makes the Q&A crowd knowledge and closed issues of a working GitHub project available in the IDE. Users can interact with this crowd knowledge in ways that the SOF website normally does not allow, such as importing code snippets into the program's editor through simple drag & drop. The main goal of the query engine is to communicate with Apache Solr, creating a query from an input string. A Q&A pair must have some matching information in its title, question body, or answer body to be returned by the search engine. Likewise, a GitHub issue must have some matching information in its issue title, issue body, or code modified by

7
8 https://github.com/insubstantial/insubstantial
9

the commit to be returned by the search engine. As stated above, the query engine simultaneously queries the Apache Solr index for both Q&A pairs and issues similar to the entered query. The query engine tokenizes the string inserted by the developer. The engine then builds the query, according to Apache Solr syntax, in a way that every token must be present in the document fields.

4. Related Work

Ponzanelli et al. [Ponzanelli et al. 2013] presented an approach to assist programmers who want to leverage the crowd knowledge of Q&A services. They implemented SEAHAWK, a recommendation system in the form of a plugin for the Eclipse IDE to harness the crowd knowledge of SOF from within the IDE. In our work, we introduced a more efficient ranking mechanism than SEAHAWK and provided the GitHub access point. We used the SEAHAWK software 10 to help us develop our plugin. Cordeiro et al. [Cordeiro et al. 2012] presented an Eclipse plugin to help developers in problem-solving tasks. Based on an exception's stack trace gathered from the IDE's console, they suggest related documents from SOF. Hipikat [Čubranić et al. 2004] is a recommendation system developed to support newcomers in a project by recommending items from problem reports, newsgroups, and articles. Our approach recommends a project's issues in the CLOSED state related to the maintenance task at hand. Takuya et al. presented SELENE [Takuya and Masuhara 2011], a source code recommendation tool based on an associative search engine. It spontaneously searches and displays example programs while the developer is editing a program text. Our work also relies on search engines, but we suggest Q&A pairs taken from SOF to enrich the information provided by code snippets.

5. Conclusions

We presented a novel approach to leverage the SOF crowd knowledge and the GitHub traceability. We have detailed the implementation of our approach, Nuggets Miner.
This recommendation system allows users to interact with SOF knowledge by importing code snippets. It also allows users to navigate through related issues previously solved by other developers. Thus, users can explore the respective commit for a recommended issue and see the modifications. As future work, we intend to improve the evaluation of Nuggets Miner with human subjects to assess the performance gains compared to the use of an external browser.

6. Acknowledgments

This work was partially supported by FAPEMIG grant CEXAPQ and CNPQ grant /

References

Cordeiro, J., Antunes, B., and Gomes, P. (2012). Context-based recommendation to support problem solving in software development. In Proceedings of the 3rd Int. Workshop on RSSE, pages

Dabbish, L., Stuart, C., Tsay, J., and Herbsleb, J. (2012). Social Coding in GitHub: Transparency and collaboration in an open software repository. CSCW 12, pages ACM.
Duala-Ekoko, E. and Robillard, M. P. (2012). Asking and answering questions about unfamiliar APIs: An exploratory study. In Proc. of ICSE 2012, pages IEEE Press.
Holmes, R., Walker, R. J., and Murphy, G. C. (2006). Approximate structural context matching: An approach to recommend relevant examples. IEEE Trans. Softw. Eng., 32(12):
LaToza, T. D., Venolia, G., and DeLine, R. (2006). Maintaining mental models: A study of developer work habits. In Proc. of ICSE 2006, pages ACM.
Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., and Hartmann, B. (2011). Design lessons from the fastest Q&A site in the west. In Proc. of the SIGCHI Conference on Human Factors in Computing Systems, pages ACM.
Nasehi, S., Sillito, J., Maurer, F., and Burns, C. (2012). What makes a good code example? A study of programming Q&A in Stack Overflow. In Proc. of ICSM 2012, pages
Ponzanelli, L., Bacchelli, A., and Lanza, M. (2013). Leveraging crowd knowledge for software comprehension and development. In Cleve, A., Ricca, F., and Cerioli, M., editors, Proc. of CSMR 2013, pages IEEE Computer Society.
Raskin, J. (2000). The Humane Interface: New Directions for Designing Interactive Systems. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA.
Robillard, M. P. and Dagenais, B. (2010). Recommending change clusters to support software investigation: An empirical study. J. Softw. Maint. Evol., 22(3):
Robillard, M. P., Walker, R. J., and Zimmermann, T. (2010). Recommendation systems for software engineering. IEEE Software, 27(4):
Souza, L., Campos, E., and Maia, M. (2014). Ranking crowd knowledge to assist software development. In Proc. of ICPC 2014, pages
Takuya, W. and Masuhara, H. (2011). A spontaneous code recommendation tool based on associative search. In Proceedings of the 3rd International Workshop on Search-Driven Development, pages ACM.
Thung, F., Wang, S., Lo, D., and Lawall, J. L. (2013). Automatic recommendation of API methods from feature requests. In ASE, pages IEEE.
Treude, C., Barzilay, O., and Storey, M.-A. (2011). How do programmers ask and answer questions on the web? (NIER track). In Proc. of ICSE 2011, pages ACM.
Čubranić, D., Murphy, G. C., Singer, J., and Booth, K. S. (2004). Learning from project history: A case study for software development. In Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, CSCW 04, pages ACM.

NextBug: A Tool for Recommending Similar Bugs in Open-Source Systems

Henrique S. C. Rocha 1, Guilherme A. de Oliveira 2, Humberto T. Marques-Neto 2, Marco Túlio O. Valente 1

1 Department of Computer Science Federal University of Minas Gerais (UFMG) Belo Horizonte MG Brazil
2 Department of Computer Science Pontifical Catholic University of Minas Gerais (PUC Minas) Belo Horizonte MG Brazil

Abstract. Due to the characteristics of the maintenance process of open-source systems, grouping similar bugs to improve developers' productivity is a challenging task. In this paper, we propose and evaluate a tool, called NextBug, for recommending similar bugs in open-source systems. NextBug is implemented as a Bugzilla plug-in and was designed to help maintainers select the next bug he/she will fix. We also report an experience on using NextBug with 109,145 bugs previously reported for Mozilla products. Video URL: <http://youtu.be/tt69zvobnf8>

1. Introduction

Considering the great importance, the costs, and the increasing complexity of software maintenance activities, most organizations usually maintain their systems by performing tasks periodically, i.e., maintenance requests are grouped and implemented as part of large software projects [Tan and Mookerjee 2005; Aziz et al. 2009; Junio et al. 2011; Marques-Neto et al. 2013]. On the other hand, open-source projects typically adopt continuous maintenance policies, where maintenance requests are addressed by maintainers with different skills and commitment levels, as soon as possible after being registered in an issue tracking platform, such as Bugzilla and Jira [Mockus et al. 2002; Tan and Mookerjee 2005; Liu et al. 2012]. However, this process is usually uncoordinated, which results in a high number of issues, of which many are invalid or duplicated [Liu et al. 2012].
In 2005, a certified maintainer from the Mozilla Software Foundation made the following comment on this situation: "everyday, almost 300 bugs appear that need triaging. This is far too much for only the Mozilla programmers to handle" [Anvik et al. 2006]. The dataset formed by bugs reported for the Mozilla projects indicates that, in 2011, the number of reported issues per year increased approximately 75% when compared to. In this context, tools to assist in issue processing would be very helpful and could contribute to increasing the productivity of open-source systems development.

In this paper, we claim that a very simple form of periodic maintenance policy can be promoted in open-source systems by recommending similar maintenance requests to maintainers whenever they manifest interest in handling a given request. Suppose that a developer has manifested interest in a bug with a textual description d_i. In this case, we rely on text mining techniques to retrieve open bugs with descriptions d_j similar to d_i, and we recommend such bugs to the maintainers. More specifically, we present NextBug, a tool to recommend similar bugs to maintainers based on the textual description of each bug stored in Bugzilla, an issue tracking system widely used by open-source projects. The proposed tool is compatible with the software development process followed by open-source systems for the following reasons: (a) it is based on recommendations and, therefore, maintainers are not required to accept extra bugs to fix; (b) it is a fully automatic and unsupervised approach which does not depend on human intervention; and (c) it relies on information readily available in Bugzilla. Assuming the recommendations effectively denote similar bugs and supposing that the maintainers would accept the recommendations pointed out by NextBug, the tool can contribute to introducing gains of scale similar to the ones achieved with periodic policies [Banker and Slaughter 1997]. We also report a field study in which we populated NextBug with a dataset of bugs reported for Mozilla systems. The remainder of this paper is organized as follows. Section 2 discusses tools for finding duplicated issue reports in bug tracking systems and also tools that assign bugs to developers. The architecture and the central features of NextBug are described in Section 3. An example of usage is presented in Section 4. Finally, conclusions are offered in Section 5.

2. Related Tools

Most open-source systems adopt an Issue Tracking System (ITS) to support their maintenance processes.
Normally, in such systems both users and testers can report modification requests [Liu et al. 2012]. This practice usually results in a continuous maintenance process where maintainers address the change requests as soon as possible. The ITS provides a central knowledge repository, which also serves as a communication channel for geographically distributed developers and users [Anvik et al. 2006; Ihara et al. 2009]. Recent studies have focused on finding duplicated issue reports in bug tracking systems. Duplicated reports can hamper the bug triaging process and may drain maintenance resources [Cavalcanti et al. 2013]. Typically, studies for finding duplicated issues rely on traditional information retrieval techniques such as natural language processing, the vector space model, and cosine similarity [Alipour et al. 2013]. Approaches to infer the most suitable developer to correct a software issue are also reported in the literature. Most of them can be viewed as recommendation systems that suggest developers to handle a reported bug. For instance, [Anvik and Murphy 2011] proposed an approach based on supervised machine learning that requires training to create a classifier. This classifier assigns the data (bug reports) to the closest developer. However, to the best of our knowledge, we are not aware of any tool designed to recommend similar bugs to maintainers of open-source systems.

Figure 1. NextBug Screenshot (similar bugs are shown in the lower right corner)

3. NextBug in a Nutshell

In this section, we present NextBug's 1 main features (Section 3.1). We also present the tool's architecture and main components (Section 3.2).

3.1. Main Features

Currently, there are several ITSs used in software maintenance, such as Bugzilla, Jira, Mantis, and Redmine. NextBug was implemented as a Bugzilla plug-in mainly because this ITS is used by the Mozilla project, which was used to validate our tool. When a developer is analyzing or browsing an issue, NextBug can recommend similar bugs in the usual Bugzilla web interface. As described in Section 3.2, NextBug uses a textual similarity algorithm to verify the similarity among bug reports. Figure 1 shows a usage example of our tool. This figure shows a real bug from the Mozilla project, which refers to a FirefoxOS application issue related to a mobile device camera (Bug ). As we can observe, Bugzilla shows detailed information about this bug, such as a summary description, creation date, product, component, operating system, and hardware information. NextBug extends this original interface by showing a list of bugs similar to the browsed one. This list is shown in the lower right corner. Another important feature is that NextBug is only executed if its Ajax link is clicked; thus, it will not cause additional overhead or hinder performance for developers who do not wish to use similar bug recommendations. In Figure 1, NextBug suggested three bugs similar to the one browsed on the screenshot. As we can note, NextBug not only detects similar bugs but also calculates an index to express this similarity. Our final goal is to guide the developer's workflow by suggesting similar bugs to the one he/she is currently browsing.
If a developer chooses to handle one of the recommended bugs, we claim he/she can minimize the context change inherent to the task of handling different bugs and, consequently, improve his/her productivity.

1 NextBug is open-source and available under the Mozilla Public License (MPL) at <http://aserg.labsoft.dcc.ufmg.br/nextbug/>.

Figure 2. NextBug Architecture

3.2. Architecture and Algorithms

Figure 2 shows NextBug's architecture, including the system's main components and the interaction among them. As described in Section 3.1, NextBug is a plug-in for Bugzilla. Therefore, it is implemented in Perl, the same language used in the implementation of Bugzilla. Basically, NextBug instruments the Bugzilla interface used for browsing and for selecting bugs reported for a system. NextBug registers an Ajax event in this interface that calls NextBug, passing the browsed issue as an input parameter. NextBug's architecture has two central components: Information Retrieval (IR) Process and Recommender. The IR Process component obtains all open issues currently available on the Bugzilla system along with the browsed issue. Then it relies on IR techniques for natural language processing, such as tokenization, stemming, and stop-word removal [Runeson et al. 2007]. We implemented all such techniques in Perl. After this processing, the issues are transformed into vectors using the vector space model (VSM) [Baeza-Yates and Ribeiro-Neto 1999; Runeson et al. 2007]. VSM is a classical information retrieval model to process documents and to quantify their similarities. The usage of VSM is accomplished by decomposing the data (available bug reports and queries) into t-dimensional vectors, assigning weights to each indexed term. The weights w_i are positive real numbers that represent the i-th index term in the vector. To calculate w_i we used the following equation, which is called a tf-idf weight formula:

w_i = (1 + log2 f_i) * log2(N / n_i)

where f_i is the frequency of the i-th term in the document, N is the total number of documents, and n_i is the number of documents in which the i-th term occurs. The Recommender component receives the processed issues and verifies the ones similar to the browsed issue. The similarity is computed using the cosine similarity measure [Baeza-Yates and Ribeiro-Neto 1999; Runeson et al.
2007]. More specifically, the similarity between the vector of a document d_j and a query q is described by the following equation, which is called the cosine similarity because it measures the cosine of the angle between the two vectors:

Sim(d_j, q) = cos(theta) = (d_j . q) / (|d_j| |q|) = sum_{i=1..t} w_{i,d} w_{i,q} / ( sqrt(sum_{i=1..t} (w_{i,d})^2) * sqrt(sum_{i=1..t} (w_{i,q})^2) )
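The two formulas above, tf-idf term weighting and cosine similarity, translate directly into executable form. The sketch below is an illustration in Python (NextBug itself implements these steps in Perl):

```python
import math

def tfidf_weight(f_i, N, n_i):
    """w_i = (1 + log2 f_i) * log2(N / n_i); zero when the term is absent."""
    if f_i == 0:
        return 0.0
    return (1 + math.log2(f_i)) * math.log2(N / n_i)

def cosine_similarity(d, q):
    """Cosine of the angle between two t-dimensional weight vectors."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_d * norm_q)

# A term appearing once in a document, occurring in 2 of 8 documents:
print(tfidf_weight(1, 8, 2))                       # 2.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

Identical vectors yield similarity 1 and orthogonal vectors yield 0, matching the bounds discussed next.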

Since all the weights are greater than or equal to zero, we have 0 <= Sim(d_j, q) <= 1, where zero indicates that there is no relation between the two vectors and one indicates the highest possible similarity, i.e., both vectors are actually the same. The issues are then ordered according to their similarity before being returned to Bugzilla. Since NextBug is started by an Ajax event, the recommendations are shown in the same Bugzilla interface used by developers for browsing and selecting bugs to fix.

4. Evaluation

We used a dataset with bugs from the Mozilla project to evaluate the proposed tool. Mozilla is composed of 69 products from different domains, implemented in different programming languages. The Mozilla project includes some popular systems such as Firefox, Thunderbird, SeaMonkey, and Bugzilla. We considered only issues that were actually fixed from January 2009 to October. More specifically, we ignored issue types such as duplicated, incomplete, and invalid. Mozilla issues are also classified according to their severity on the following scale: blocker, critical, major, normal, minor, and trivial. Table 1 shows the number and the percentage of issues in each of these severity categories in the considered dataset, together with the number of days required to fix the issues in each category. The scale also includes enhancement as a particular severity category; however, enhancements were not considered in our study, i.e., we do not provide recommendations for similar enhancements.

Table 1. Issues per Severity: number and percentage of issues per category (blocker, critical, enhancement, major, minor, normal, trivial, plus Total and Final Dataset rows) and min/max/average/standard-deviation/median days to resolve. (The numeric cells were lost in extraction; the final dataset totals 109,154 issues.)

We can observe that blocker bugs are quickly corrected by developers, showing the lowest values for the maximum, average, standard deviation, and median measures among the considered categories. The reported lifetimes also indicate that issues with critical and major severity are close to each other.
Finally, enhancements are very different from the other categories, showing the highest values for the average, standard deviation, and median. Issues marked as blocker, critical, or major were not considered in our evaluation because developers have to fix them as quickly as possible; they would probably not consider fixing other issues together, since their ultimate priority is to fix the main blocking issue. Thus, our dataset is formed by fixed issues classified as normal, minor, or trivial. These issues account for 109,154 bugs (83.64%) of our initial population of bugs available for the NextBug evaluation.
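The dataset selection described above amounts to a simple filter over issue records. A minimal sketch, where the record field names are our assumptions rather than Bugzilla's exact schema:

```python
KEPT_SEVERITIES = {"normal", "minor", "trivial"}

def build_evaluation_dataset(issues):
    """Keep only fixed issues whose severity is used in the evaluation,
    discarding blocker/critical/major issues and enhancements."""
    return [i for i in issues
            if i["resolution"] == "FIXED" and i["severity"] in KEPT_SEVERITIES]
```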

We used three metrics in our evaluation: Feedback, Precision, and Likelihood. These metrics were inspired by the evaluation of the ROSE recommendation system [Zimmermann et al. 2004]. Feedback is the ratio of queries for which NextBug makes at least k recommendations. Precision indicates the percentage of recommendations that are actually relevant among NextBug's top-k suggestions. Finally, Likelihood indicates whether at least one relevant recommendation is included in NextBug's top-k suggestions. In this evaluation, we defined a relevant recommendation as one that shares the same developer with the main issue. More specifically, we consider that a recently created issue q is connected to a second open issue when they are handled by the same developer. The assumption in this case is that our approach fosters productivity gains whenever it recommends issues that were later fixed anyway by the same developer. Figure 3 shows the average feedback (left chart), precision (central chart), and likelihood (right chart) for k up to 5.

Figure 3. Average Evaluation Results for k = 1 to k = 5.

We summarize our results as follows:

- We achieved a feedback of 0.63 for k = 1. Therefore, on average, NextBug made at least one suggestion for 63% of the bugs, i.e., for every five bugs, NextBug was able to provide at least one similar recommendation for three of them. Moreover, NextBug showed on average 3.2 recommendations per query.
- We achieved a precision of 0.31 or more for all values of k. In other words, the NextBug recommendations were on average 31% relevant (i.e., later handled by the same developer), no matter how many suggestions were given.
- We achieved a likelihood of 0.54 for k = 3. More specifically, in about 54% of the cases there is a top-3 recommendation that was later handled by the same developer responsible for the original bug.
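Under the relevance criterion above (same handling developer), the three metrics can be sketched for ranked top-k recommendation lists as follows; the function and input shapes are our assumptions, not the paper's implementation:

```python
def evaluate(queries, k):
    """queries: list of (recommendations, relevant_ids) pairs, where
    recommendations is the ranked list of suggested bug ids and
    relevant_ids is the set of bugs later fixed by the same developer.
    Returns (feedback, average precision, likelihood) for top-k lists."""
    # Feedback: ratio of queries with at least k recommendations
    with_k = [(recs, rel) for recs, rel in queries if len(recs) >= k]
    feedback = len(with_k) / len(queries)
    if not with_k:
        return feedback, 0.0, 0.0
    tops = [(recs[:k], rel) for recs, rel in with_k]
    # Precision: fraction of relevant suggestions among the top-k, averaged
    precision = sum(len([r for r in top if r in rel]) / k
                    for top, rel in tops) / len(tops)
    # Likelihood: ratio of queries whose top-k has >= 1 relevant suggestion
    likelihood = sum(any(r in rel for r in top)
                     for top, rel in tops) / len(tops)
    return feedback, precision, likelihood
```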
We also conducted a survey with Mozilla developers using our tool. We gave recommendations suggested by NextBug to 176 Mozilla maintainers and asked them a few questions. Our summarized results are: (i) 77% found our recommendations relevant; (ii) 85% confirmed that a tool to recommend similar bugs would be useful to the Mozilla community and would allow them to do more work in less time.

4.1. Example of Recommendation

Table 2 presents an example of a bug (the browsed, or main, issue) opened for the component DOM: Device Interfaces of the Core Mozilla product, and the first three recommendations (top-3) suggested by our tool for this bug. As we can observe in the summary descriptions, both the query and the recommendations require maintenance in the Device Storage API, used by Web apps to access local file systems. Moreover, all four issues were handled by the same developer.

Table 2. Example of Recommendation (bug IDs, creation dates, and fix dates were lost in extraction)

             Similarity   Summary
    Browsed      -        Device Storage - Default location for device storage on windows should be NS_WIN_PERSONAL_DIR
    Top-1       56%       Device Storage - Clean up error strings
    Top-2       47%       Device Storage - Convert tests to use public types
    Top-3       42%       Device Storage - use a properties file instead of the mime service

We can also observe that the three recommended issues were created before the original query. In fact, the developer fixed the bugs associated with the second and the third recommendations on the same date on which he fixed the original query. However, he only resolved the other recommended bug 41 days later. Therefore, our approach would have helped this maintainer to quickly discover the related issues; without recommendations automatically provided by a tool like NextBug, this task would probably have demanded more effort.

5. Conclusion

This paper presented NextBug, a tool for recommending similar bugs. NextBug is implemented as a plug-in for Bugzilla, a widely used Issue Tracking System (ITS), especially popular among open-source systems. The proposed tool relies on information retrieval techniques to extract semantic information from issue reports in order to identify open bugs similar to the one being handled by a developer.
We evaluated NextBug with a dataset of 109,154 Mozilla bugs, achieving feedback results of 63%, precision results around 31%, and likelihood results greater than 54%. These results are very reasonable compared to other recommendation tools. We also conducted a survey with 176 Mozilla developers using recommendations provided by NextBug: 77% of them thought our recommendations were relevant, and 85% confirmed that a tool like NextBug would be useful to the Mozilla community.

6. Acknowledgements

This work was supported by CNPq, CAPES, and FAPEMIG.

References

[Alipour et al. 2013] Alipour, A., Hindle, A., and Stroulia, E. (2013). A contextual approach towards more accurate duplicate bug report detection. In 10th Working Conference on Mining Software Repositories (MSR).
[Anvik et al. 2006] Anvik, J., Hiew, L., and Murphy, G. C. (2006). Who should fix this bug? In 28th International Conference on Software Engineering (ICSE).
[Anvik and Murphy 2011] Anvik, J. and Murphy, G. C. (2011). Reducing the effort of bug report triage: recommenders for development-oriented decisions. ACM Transactions on Software Engineering and Methodology (TOSEM), 20(3):10:1-10:35.
[Aziz et al. 2009] Aziz, J., Ahmed, F., and Laghari, M. (2009). Empirical analysis of team and application size on software maintenance and support activities. In 1st International Conference on Information Management and Engineering (ICIME).
[Baeza-Yates and Ribeiro-Neto 1999] Baeza-Yates, R. A. and Ribeiro-Neto, B. (1999). Modern Information Retrieval. Addison-Wesley, 2nd edition.
[Banker and Slaughter 1997] Banker, R. D. and Slaughter, S. A. (1997). A field study of scale economies in software maintenance. Management Science, 43.
[Cavalcanti et al. 2013] Cavalcanti, Y. C., Mota Silveira Neto, P. A., Lucrédio, D., Vale, T., Almeida, E. S., and Lemos Meira, S. R. (2013). The bug report duplication problem: an exploratory study. Software Quality Journal, 21(1).
[Ihara et al. 2009] Ihara, A., Ohira, M., and Matsumoto, K. (2009). An analysis method for improving a bug modification process in open source software development. In 7th International Workshop on Principles of Software Evolution and Software Evolution (IWPSE-Evol).
[Junio et al. 2011] Junio, G., Malta, M., de Almeida Mossri, H., Marques-Neto, H., and Valente, M. (2011). On the benefits of planning and grouping software maintenance requests. In 15th European Conference on Software Maintenance and Reengineering (CSMR).
[Liu et al. 2012] Liu, K., Tan, H. B.
K., and Chandramohan, M. (2012). Has this bug been reported? In 20th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE), pages 28:1-28:4.
[Marques-Neto et al. 2013] Marques-Neto, H., Aparecido, G. J., and Valente, M. T. (2013). A quantitative approach for evaluating software maintenance services. In 28th ACM Symposium on Applied Computing (SAC).
[Mockus et al. 2002] Mockus, A., Fielding, R. T., and Herbsleb, J. D. (2002). Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, 11(3).
[Runeson et al. 2007] Runeson, P., Alexandersson, M., and Nyholm, O. (2007). Detection of duplicate defect reports using natural language processing. In 29th International Conference on Software Engineering (ICSE).
[Tan and Mookerjee 2005] Tan, Y. and Mookerjee, V. (2005). Comparing uniform and flexible policies for software maintenance and replacement. IEEE Transactions on Software Engineering, 31(3).
[Zimmermann et al. 2004] Zimmermann, T., Weisgerber, P., Diehl, S., and Zeller, A. (2004). Mining version histories to guide software changes. In 26th International Conference on Software Engineering (ICSE).

FunTester: A fully automatic functional testing tool

Thiago Delgado Pinto 1,2, Arndt von Staa 2 *

1 Informatics Department, Federal Center of Technological Education (CEFET/RJ), Nova Friburgo, RJ, Brazil
2 Informatics Department, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, RJ, Brazil

Abstract. This paper presents a free, multi-language, model-based testing tool that uses use cases and their business rules to generate relevant functional tests with test data and oracles. These business rules can describe constraints on data located at external sources, such as relational databases, and use them for generating tests. The tool also executes the tests and analyzes their results.

1. Introduction

Video available at

Over the last years, researchers have been using Model-Based Testing (MBT) to address the problem of automatic test generation. Tools like [1], [2], and [3] use interesting approaches that allow software engineers to derive the system under test (SUT) model from use case scenarios written in structured or natural languages, without requiring formal modeling expertise. Such tools were also successful in presenting a set of directives to guide the test generation process in an automated manner. However, these directives did not tackle test data generation or automatic oracle generation, both very important for creating effective tests. In a previous work [4], we described a successful approach to solve these problems. In this paper we present FunTester,1 an open-source tool that implements our approach and tries to fill these gaps, using business rules and techniques like equivalence partitioning and boundary value analysis to generate test data and oracles. The tool is intended for a wide range of applications, such as websites and form-based desktop and mobile applications.

2. Overview

The tool provides a GUI to help the user document functional requirements using use cases.
When describing a use case, the user can detail its basic and alternative flows and define business rules about the widgets involved in it. These business rules try to capture the accepted values, value ranges, or formats, and describe the expected behavior when a user enters an incorrect value. In this way, the tool can generate valid and invalid test values for use in different tests, and create oracles that verify whether the system under test (SUT) behaves as expected (e.g., shows the right error message).

* Financially supported by a CNPq/Brazil grant.
1 FunTester is available under an open-source license.
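A widget business rule of the kind described above could be represented as a small record plus a validity check. This is a sketch in Python; the field names and the `violates` helper are our assumptions, not FunTester's actual format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BusinessRule:
    """One widget's business rule, in the spirit of FunTester's rules."""
    widget: str
    required: bool = False
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    error_message: Optional[str] = None  # expected SUT reaction to bad input

def violates(rule: BusinessRule, value) -> bool:
    """True when a value breaks the rule (simplified: range + required only)."""
    if value is None:
        return rule.required
    return ((rule.min_value is not None and value < rule.min_value) or
            (rule.max_value is not None and value > rule.max_value))

# A hypothetical rule for a "price" textbox:
price_rule = BusinessRule(widget="price", required=True,
                          min_value=0.01, max_value=999.99,
                          error_message="Price must be between 0.01 and 999.99")
```

An oracle can then check that the SUT reacts with `error_message` exactly when `violates` is true for the entered value.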

After describing a use case and its business rules, the tool can generate abstract test cases and test scripts, run these scripts, and evaluate their results. Each abstract test case is an instance of a testing scenario with test data and oracles, in a structure not tied to any programming language or testing framework. FunTester can transform these abstract test cases into test scripts using plug-ins. Each plug-in executes these scripts and transforms the execution results (e.g., an XML file containing the test results) into a format that the automatic test tool can read. After running the plug-in, the tool presents the results, relating failing tests to their respective scenarios and use case steps, so that the tester can diagnose the failures and identify their causes. Figure 1 shows an overview of this process.

Figure 1 Process

Main Features

FunTester's main features are: (i) generating abstract test cases with data and corresponding oracles that explore the software's business rules and try to expose defects in the SUT; (ii) transforming abstract test cases into test scripts through plug-ins; (iii) executing the test scripts through plug-ins and testing frameworks and collecting their results; and (iv) analyzing test results and requirements, helping a user understand the reason for failures. Other interesting features are:

a) configurable vocabulary: a vocabulary is, in this context, a kind of translation of a profile, which is a set of reserved words (e.g., "click", "type", "move") used to compose the steps (sentences) of a flow. It can use one or more synonyms to better express the intent of a system action or user (actor) action. In this way, the documentation can be written in, say, French, but the generated tests will stay in English;

b) referencing external databases in business rules: the data source for the values of editable widgets can be defined through database queries, so the tests can use these data to generate valid or invalid values.
This is especially useful for testers because they can prepare a testing database with values similar to those used in a production environment, simulating real use of the system; and

c) generating tests with meaningful names: each test method name aims to help the tester understand what the method verifies, thus making failure diagnosis easier when compared to names generated by record-and-playback tools (such as t1, t2, t3, and so forth). For instance, a test named price_with_random_value_below_lower_limit will fill out all the editable fields with valid values except for the price, which will be filled out with a random value below its lower limit.

A comparison to other tools' approaches can be found in [4].

Test Cases

Myers et al. [5] affirm that test cases that explore boundary conditions have a higher payoff than test cases that do not. Most of FunTester's test cases explore boundary conditions. Chen et al. [6] indicate that failures are likely to manifest themselves on or near the boundary between subdomains, and that test cases based on knowledge about the program's input domain that explore these boundary conditions can help reveal failures. Our tool uses the business rules to create equivalence classes and generate test cases with valid and invalid random data, according to these classes. Figure 2(a) shows an example of a valid value/length range, according to defined lower and upper bounds. Examples of generated valid values/lengths are: (i) the lower bound; (ii) just above the lower bound; (iii) zero, when applicable; (iv) the median, when applicable; (v) just below the upper bound; (vi) the upper bound; and (vii) a random value/length between the lower and upper bounds.

Figure 2 Valid and invalid ranges

Figure 2(b) shows an example of invalid value/length ranges. Examples of generated invalid values/lengths are: (i) just below the lower bound; (ii) a random value/length below the lower bound; (iii) just above the upper bound; and (iv) a random value/length above the upper bound. Additionally, the tool also generates values with invalid formats, according to the respective business rules. Each abstract test case verifies a use case scenario. When a scenario is executed, there can be variations in the expected behavior depending on the data entered by a user. For instance, given a form that has a field (widget), if the value of this field has a format considered invalid according to its business rule, the system could be expected to ask the user to correct the value. In such a case, the scenario behaves differently from how it would if a valid value had been entered.
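The valid and invalid boundary values enumerated above can be derived mechanically from a range's bounds. A minimal Python sketch for integer ranges; the function names are ours, not the tool's:

```python
import random

def valid_values(lower, upper):
    """Boundary-based valid test values for an integer range [lower, upper]:
    bounds, their neighbors, zero and the median when applicable, plus one
    random in-range value."""
    vals = {lower, lower + 1, upper - 1, upper,
            random.randint(lower, upper)}      # (vii) random in-range value
    if lower <= 0 <= upper:
        vals.add(0)                            # (iii) zero, when applicable
    vals.add((lower + upper) // 2)             # (iv) the median
    return sorted(vals)

def invalid_values(lower, upper, spread=1000):
    """Boundary-based invalid values: just outside and randomly far outside."""
    return [lower - 1, random.randint(lower - spread, lower - 1),
            upper + 1, random.randint(upper + 1, upper + spread)]
```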
Thus, the tool generates different tests for the same scenario, each one in a test method.2

Architecture

The solution is distributed as a set of four basic Maven projects: (i) core: contains the main project classes and artifacts; (ii) common: useful classes used by the other projects of the solution; (iii) app: the application's user interface; and (iv) plugin-common: base classes for Java plug-ins. Although the solution is implemented in Java, its plug-ins do not need to be, nor do they need to use the plugin-common project. A plug-in can be any executable file (including executable Java Archive files) that follows some rules, such as receiving

2 More information on this at https://www.assembla.com/spaces/funtester-project/wiki/generated_tests

specific parameters and producing a test execution report file. The tool detects and executes a plug-in through a plug-in descriptor: a simple JSON file containing some information about the plug-in and how to run it. A plug-in is responsible for (i) transforming abstract test cases into test scripts; (ii) running the tests; and (iii) analyzing and transforming the framework-specific execution results into a common, framework-independent format (e.g., transforming an XML file produced by JUnit into the JSON file format expected by FunTester). Figure 3 illustrates the process performed by a plug-in.

Figure 3 Plug-in execution process

Most of the files handled by FunTester are JSON files, enabling a user to visualize or, if needed, edit them in simple text editors (e.g., for solving version conflicts), and to control their evolution using any version control system.

4. End-to-end example

To illustrate the tool's usage, let us document a small FunTester use case called "Create a Software", shown in Figure 4. Observe that FunTester is being used to document itself. Figure 5 shows how to document the target use case ("Create a Software"), its Basic Flow, and one of the Steps of this Basic Flow (an Oracle Step). In this use case, the system shows the Software dialog, the actor types the software name, selects one of the available vocabularies, and clicks "OK"; the system then checks the business rules and, if they are met, closes the dialog.

Figure 4 "Create a Software" use case

Figure 5 Creating the use case, flow, and steps

Figure 6 Describing elements and business rules

Figure 6 shows how a test analyst can document the business rules involved in the use case, through the Elements tab. He or she can also describe the files that should be included in the test cases through the Include Files tab. The Elements tab presents the user interface elements (widgets) involved in the flow's steps. The test analyst can provide some information about these elements, such as their internal names (the widget names), their types (e.g., textbox, combobox, button, etc.), their accepted value types and business rules, and whether they are editable. The currently available business rule types (as of version 0.7) are: (a) Minimum and maximum values: the minimum or maximum values for numbers, dates, times,

and date-time values; (b) Minimum and maximum lengths: the minimum or maximum lengths; (c) Required: whether the element must be filled out; (d) Regular Expression: a regular expression for the value; (e) Equal to: an accepted value; (f) One of: a list of accepted values; and (g) Not one of: a list of non-accepted values. Each of these business rule types can also be configured to come from database queries or from other elements (widgets). Each database query accepts parameters that can come from other business rule configurations, which makes the business rules very flexible. After describing the business rules and included files, we are ready to generate the tests. However, the tests will not know how to fire the use case "Create a Software", because it is fired through our main screen (see Figure 4). In this case, we describe an "Access System" use case with an alternate flow that calls our target use case. Now our tests can execute the system and reach our target use case. Figure 7 shows the Generate and Run dialog used to configure the test generation, which involves the abstract test generation, the plug-in selection, the generated test code, and the parameters to run the tests and get their execution results.

Figure 7 Test generation configuration and execution

Figure 8 shows the screen that presents the execution results and an example of source code generated by the FEST Plug-in for FunTester (Java with the FEST and TestNG frameworks).

Figure 8 Execution results

In case of failing tests, a tester can view details about the failures (e.g., the execution trace, the related use case step) that can give him/her relevant information about the problem.

5. Final remarks

This paper presented FunTester, a fully automatic model-based functional testing tool that generates and executes test suites from use case specifications. The tool reifies our approach [4] and can be used in a wide range of applications, such as websites and form-based desktop and mobile applications. We are currently developing a plug-in for Selenium and another for Robotium, by means of which web, iOS, and Android (native and web) applications can be tested with JUnit or TestNG. More information on the tool and on plug-in development can be found at the FunTester Wiki page: https://www.assembla.com/spaces/funtester-project/wiki. Our future plans include (i) improving the flexibility of the test case steps to allow other kinds of interaction between an actor and the system; (ii) allowing a system analyst to use steps in the business rules (just as he/she does for the flows) to define the expected system behavior, aiming at generating other kinds of test oracles; (iii) reducing the number of generated scenarios by using a history-based and incremental use case combination; and (iv) creating plug-ins for other programming languages and testing frameworks.

References

[1] Felype Ferreira, Laís Neves, Michelle Silva, and Paulo Borba, "TaRGeT: a Model Based Product Line Testing Tool," in 1st Brazilian Conference on Software: Theory and Practice, Salvador, Bahia, 2010.

[2] Neil W. Kassel, "An approach to automate test case generation from structured use cases," Clemson University, Clemson, SC, USA, Doctoral Dissertation.

[3] Mingyue Jiang and Zuohua Ding, "Automation of test case generation from textual use cases," Hangzhou, China.

[4] Thiago Delgado Pinto and Arndt von Staa, "Functional validation driven by automated tests," in XXVII Brazilian Symposium on Software Engineering (SBES 2013), Brasília, 2013.

[5] Glenford J. Myers, Corey Sandler, and Tom Badgett, The Art of Software Testing, 3rd ed., Wiley.

[6] Tsong Yueh Chen, Fei-Ching Kuo, Robert G. Merkel, and T. H. Tse, "Adaptive Random Testing: the Art of Test Case Diversity," Journal of Systems and Software, vol. 83, no. 1, January.

JMLOK2: A tool for detecting and categorizing nonconformances

Alysson Milanez 1, Dênnis Sousa 1, Tiago Massoni 1, Rohit Gheyi 1

1 Department of Computing Systems, UFCG

Abstract. In contract-based programs, detection and characterization of nonconformances is hard. Assigning categories to nonconformances can be useful for maintenance. In this work, we present JMLOK2, which detects and categorizes nonconformances, suggesting their likely causes. We evaluated the tool by comparing its categorization results with manually-provided results, with respect to 84 nonconformances discovered in Java Modeling Language (JML) projects summing up 29 KLOC and 9 K lines of contracts. JMLOK2 is demonstrated online.

1. Introduction

In contract-based programs [Guttag et al. 1993] (as with the Java Modeling Language (JML) [Leavens et al. 1999]), early detection of nonconformances is highly desirable, in order to provide a more reliable account of correctness and robustness [Meyer 1997]. However, nonconformance detection can be hard to achieve. Formal conformance verification is quite costly and not scalable, making it unfeasible for large-scale development. Therefore, developers tend to apply automated, although incomplete, approaches. For JML, there are basically two ways to automatically check conformance: statically, with ESC/Java2 [Cok and Kiniry 2004]; and dynamically, with several tools (JMLUnit [Cheon and Leavens 2002b], JMLUnitNG [Zimmerman and Nagmoti 2011], JET [Cheon 2007], Jartege [Oriat 2005], and Korat [Boyapati et al. 2002]). Nevertheless, those approaches present limitations, mostly by falling short in providing (1) effective test data generation; (2) comprehensive unit tests that fully exercise sequences of calls to unveil subtle nonconformances (as seen in Section 2); and (3) categorization of detected nonconformances. In this paper, we describe JMLOK2, a tool for detecting and categorizing nonconformances in contract-based programs.
The tool applies randomly-generated tests (RGT) to detect nonconformances, and a heuristics-based approach to categorize them (Section 3). JMLOK2 was evaluated in two scenarios: first, it was applied to open-source JML projects, in order to assess the applicability of the approach in detecting and categorizing nonconformances; then, it was compared with JET [Cheon 2007], to the best of our knowledge the only other tool that does not require test data provision (Section 4).

2. Motivating Example

In JML, contracts are written as qualified comments (Listing 1). The following example is adapted from the TransactedMemory experimental unit (details in Section 4); visibility modifiers are omitted for simplicity.

Listing 1. GenCounter and MapMemory classes

    class GenCounter {
      //@ invariant 0 <= cntgen && cntgen <= MapMemory.MAX;
      int cntgen;
      GenCounter() { cntgen = 1; }

      //@ ensures (b == true) ==> (cntgen == \old(cntgen + 1));
      void updateCount(boolean b) { if (b) { cntgen++; } }

      //@ ensures cntgen == 0;
      void resetCount() { cntgen = 0; }
    }

    class MapMemory {
      final static int MAX = 3, MSIZE = 10;
      GenCounter g;
      boolean[] map;
      int pos;
      MapMemory() { g = new GenCounter(); map = new boolean[MSIZE]; pos = 0; }

      //@ requires pos < MSIZE - 1;
      void updateMap(boolean m) { map[pos++] = m; g.updateCount(m); }

      //@ ensures pos == 0;
      void resetMap() { map = new boolean[MSIZE]; g.resetCount(); pos = 0; }
    }

GenCounter represents a piece of information about some named tag, while MapMemory represents a Java implementation of memory for smart cards. These classes declare a constructor and two methods each: one for updating values and one for resetting values. JML contracts are declared with the keywords requires and ensures, specifying pre- and postconditions, respectively, for a method. A class invariant clause must hold after constructor execution, and before and after every method call; the invariant in GenCounter enforces that field cntgen must be in the range [0, MapMemory.MAX]. The \old clause used in the postcondition refers to the pre-state value of cntgen. Despite its simplicity, this program is not in conformance with its contracts. The nonconformance in GenCounter can be detected only with a sequence of at least three calls to MapMemory.updateMap with parameter m = true. In Listing 2, a test case reveals this problem.
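The effect of those three calls can be simulated by checking the invariant as a runtime assertion. This is a Python mirror of the Java/JML example, for illustration only (the counter starts at 1, so three increments push it past MAX):

```python
MAX = 3  # mirrors MapMemory.MAX

class GenCounter:
    """Python mirror of the Java/JML GenCounter above."""
    def __init__(self):
        self.cntgen = 1

    def invariant_holds(self):
        # //@ invariant 0 <= cntgen && cntgen <= MapMemory.MAX;
        return 0 <= self.cntgen <= MAX

    def update_count(self, b):
        if b:
            self.cntgen += 1

g = GenCounter()
for _ in range(3):       # three updates, as in the revealing test case
    g.update_count(True)
# cntgen is now 4, exceeding MAX = 3: the invariant is violated
```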
This example is illustrative: nonconformances between contract and implementation may be subtle to detect even in small programs, and more complex programs tend to present greater detection challenges. Regardless of where the bug is located (contract, code, or both), the failure may only arise within a sequence of calls to two or more methods, called in a particular order. This nonconformance can be removed by adding a precondition to GenCounter.updateCount requiring the value of cntgen to be less than MapMemory.MAX.

Listing 2. A test case revealing the nonconformance from the GenCounter class

    MapMemory m = new MapMemory();
    m.updateMap(true);
    m.updateMap(true);
    m.updateMap(true);

3. JMLOK2

In this work, we propose and implement an RGT-based (randomly-generated tests) approach to detect nonconformances, and a categorization model for those nonconformances. Our approach automatically generates and executes tests, comparing the test

results with oracles (generated from the contracts). The generated tests are composed of sequences of calls to the methods and constructors under test, while the test oracles are assertions generated from the JML contracts by specialized compilers, such as jmlc [Cheon and Leavens 2002a] and OpenJML [Cok 2011]. After test execution, two filters are applied. First, meaningless test cases are discarded [Cheon 2007]: tests violating a precondition in the first call to a method under test. The remaining failures consist of relevant contract violations, which are candidate nonconformances. The second filter distinguishes faults from the returned failures; those faults make up the nonconformances subject to the categorization process. Regarding nonconformance categorization, we propose a three-level model composed of a category, a type, and a likely cause. The category corresponds to the artifact in which the nonconformance probably occurs: source code or contract. The type is given automatically by the assertion checker, and corresponds to the violated part of JML, considering only the visible behavior of the systems. The suggested likely cause is given by specific heuristics derived from our experience in investigating likely causes for nonconformances. This model is implemented in a heuristics-based approach, which suggests a specific category and likely cause for a given nonconformance. Each heuristic is based on a set of possible scenarios related to the type of the detected nonconformance. Based on the contract-based program, the nonconformance type, and the corresponding set of heuristics, a likely cause is suggested.
For instance, regarding an invariant error in class C when calling method m, we suggest a likely cause with the following heuristics: (1) first, check for uninitialized fields in C; in this case, suggest the category Code error; (2) otherwise, check for the absence of a precondition (default = true), or the presence of at least one field modified in m's body; in either case, suggest the category Contract error and the likely cause Weak precondition; (3) otherwise, suggest the category Contract error and the likely cause Strong invariant. For the example in Section 2, an invariant nonconformance, since method GenCounter.updateCount has the default precondition, the suggested likely cause is Weak precondition. JMLOK2 is the implementation of this approach in the context of Java/JML programs (JMLOK2 is an improvement over JMLOK [Varjão et al. 2011]). JMLOK2 avoids false positives by grouping failures into faults, whereas the JMLOK tool presented the overall failures (possibly repeated) revealed by the tests. Moreover, JMLOK2 is an extension that categorizes nonconformances. JMLOK2 is available online at http://massoni.computacao.ufcg.edu.br/home/jmlok, for the Windows, Mac, and Linux platforms, under the GNU General Public License (GPL) v3. In the detection module, test generation is performed automatically by Randoop [C. Pacheco and Ball 2007]. Randoop is a feedback-directed test generator, producing JUnit test cases with sequences of calls to methods and constructors. This feature made Randoop a satisfactory infrastructure for our approach. In addition, test oracle generation is performed by the jmlc compiler; although OpenJML is currently recommended by the JML research community, it still presents limitations that led to false positives1. Although jmlc is no longer under active development, it is mature enough to support the most basic JML features, limited to Java 1.4.
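The invariant-error heuristics (1)-(3) described above can be sketched as a small decision procedure. The inputs and the "Uninitialized field" cause label for case (1) are our assumptions for illustration; the paper names only the category for that case:

```python
def categorize_invariant_error(uninitialized_fields, has_default_precondition,
                               modifies_fields):
    """Suggest (category, likely cause) for an invariant violation raised
    in class C when calling method m, following heuristics (1)-(3)."""
    if uninitialized_fields:                         # (1) code error in C
        return ("Code error", "Uninitialized field")
    if has_default_precondition or modifies_fields:  # (2) weak/absent precondition
        return ("Contract error", "Weak precondition")
    return ("Contract error", "Strong invariant")    # (3) otherwise
```

For the Section 2 example, GenCounter.updateCount has the default precondition and modifies cntgen, so the procedure yields ("Contract error", "Weak precondition").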
Afterwards, the two filters are

¹ We have contacted the OpenJML team for support, but a solution was not feasible within our time constraints, so this integration is left for future work.

enacted. Next, the set of distinct nonconformances is returned to the categorization module. In the categorization module, the contract-based program and a set of heuristics are used to suggest a likely cause for each nonconformance. After categorization, the list of categorized nonconformances is sent to the Controller module, which sends the list to the UI.

4. Evaluation

4.1. Detection and Categorization

The first study assesses JMLOK2 with respect to nonconformance detection and categorization, from the point of view of the developer, in the context of JML programs. This study addresses the following research questions:

Q1. Is JMLOK2 able to detect nonconformances in JML programs?
Q2. How many answers from the tool coincide with our previous manual analysis?

The experimental units consist of sample programs from the JML web site² and collected open-source JML projects. The samples include eleven example programs for training purposes³. Regarding the open-source JML programs, Bomber [Rebêlo et al. 2009] is a mobile game; HealthCard [Rodrigues 2009] is an application for the management of medical appointments in smart cards. JAccounting and JSpider are two case studies from the ajml compiler project [Rebêlo et al. 2009], implementing, respectively, an accounting system and a Web Spider Engine. In addition, Mondex [Tonin 2007] is a direct translation to JML from an existing Z specification⁴. Finally, TransactedMemory [Poll et al. 2002] is a specific feature of the Javacard API. These units totalize over 29 KLOC and 9 K lines of JML contracts (which we will refer to as KLJML henceforth) and are characterized in Table 1.

Table 1. Programs characterization.

      | Samples | Bomber | HealthCard | JAccounting | JSpider | Mondex | TransactedMemory | Total
LOC   | 3,400   | 6,400  | 1,700      | 6,500       | 8,800   | 1,000  | 1,800            | 29,600
LJML  | 5,…     | …      | …          | …           | …       | …      | …                | 9,268

The study was performed on a PC with an Intel Core i… CPU, 8 GB of RAM, Windows 8, and Java 7 update 51. Since Randoop [C.
Pacheco and Ball 2007] requires a time limit for generating tests (the time after which the generation process stops), we used 10 s as the basis⁵. To collect data about test coverage, we used the EclEmma Eclipse plugin⁶, and we manually collected the JML coverage aided by EclEmma, counting the number of assertions, generated from the contracts, covered by the tests. Table 2 presents the results of JMLOK2 for the sample and open-source JML projects, including information about the detected nonconformances. For the sample programs,

² leavens/jml/examples.shtml
³ dbc, digraph, dirobserver, jmlkluwer, jmltutorial, list, misc, reader, sets, stacks, table, and an adaptation of the subpackage stacks bounded
⁴ ⁵ We performed some experiments increasing the time limit from 10 s to 120 s, but the results were unchanged; thus 10 s was chosen as our reference time

18 nonconformances were detected: 15 were categorized as postcondition errors, two as invariant errors, and one as an evaluation error. For the open-source JML projects, 66 nonconformances were detected: Bomber (4), HealthCard (30), JAccounting (23), Mondex (2), and TransactedMemory (7). JSpider did not present nonconformances. Regarding type, most of the 84 nonconformances had type postcondition (38), followed by invariant (35). Concerning likely causes, most of the 84 nonconformances had the cause categorized as Weak precondition (38), followed by Code error (23).

Table 2. For each experimental unit we present: the number of generated test cases, test coverage, and all nonconformances detected, grouped by the nonconformance type and by the likely cause manually assigned.

                     | Samples | Bomber | HealthCard | JAccounting | JSpider | Mondex | TransactedMemory
# of Generated Tests | 7,…     | …      | …          | …           | …       | …      | …
Java Coverage        | 93.44%  | 11.62% | 87.51%     | 36.14%      | 32.93%  | 53.42% | 70.30%
JML Coverage         | 96.33%  | 11.62% | 87.51%     | 62.63%      | 32.93%  | 22.58% | 55.93%

Nonconformance type rows: Postcondition error, Invariant error, Constraint error, Evaluation error, Precondition error (per-unit values: …). Likely cause rows: Weak precondition, Code error, Strong postcondition, Undefined, Strong precondition, Strong constraint, Strong invariant, Weak postcondition (per-unit values: …).

We compare the results of JMLOK2 with the manual categorization presented in our tech report [Milanez 2014]. The coincidence ratio is measured by the matches metric (Equation 1). We got matches = 1 for bounded, stack, misc, JAccounting, Mondex and TransactedMemory; and 0 (dbc), 0.2 (list), 0.5 (Bomber), and 0.63 (HealthCard).

    matches(x) = Total_of_Agreements(x) / Total_of_Categorized_Nonconformances(x)    (1)

where x is an experimental unit and Total_of_Agreements is the total of coincidences between the automatic and manual categorization.

Discussion Q1. The JMLOK2 tool was able to detect 84 nonconformances.
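Equation 1 is straightforward to compute; a minimal sketch (a hypothetical helper, not part of the tool):

```java
// Equation 1 as code: the ratio of category/cause agreements between
// the automatic and the manual categorization for one experimental unit.
class Matches {
    static double matches(int totalOfAgreements, int totalOfCategorizedNonconformances) {
        if (totalOfCategorizedNonconformances == 0)
            throw new IllegalArgumentException("unit has no categorized nonconformances");
        return (double) totalOfAgreements / totalOfCategorizedNonconformances;
    }
}
```

For instance, a unit with 2 agreements over 4 categorized nonconformances yields matches = 0.5.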
The generated sequences prove to be a benefit of the approach, as several nonconformances were only detected by running a particular sequence of constructor and method calls. For instance, a postcondition error in AbstractTransactedMemory (a class from TransactedMemory) is only revealed after 32 specific method calls. In addition, the test coverage results vary considerably. While Bomber showed a very low value (due to the

need for user interaction), Samples and HealthCard presented the highest coverage rates.

Discussion Q2. The mean value of the matches metric used to compare the results from the manual and automatic categorization was …. Nevertheless, there were two cases in which the metric was very low: in Samples, 0.00 on dbc and 0.20 on list. In dbc, this result occurred because only a semantic analysis of the contract-based program can give a precise result. In list, the low matches metric is due to the manual categorization having assigned Undefined as the likely cause, whereas the automatic categorization assigned Weak precondition. This difference occurred because the manual analysis was not able to determine whether the problem arises from the code or from the specification. On the other hand, six experimental units (bounded, stacks, misc, JAccounting, Mondex, and TransactedMemory) obtained the highest possible matches value, and the other two (Bomber and HealthCard) had values of 0.5 and 0.63, respectively. These results show that, although we use a heuristics-based approach, our automatic categorization achieves good results in comparison with the baseline (manual categorization).

4.2. Comparison between JMLOK2 and JET

The goal of this study is to compare two RGT approaches, JMLOK2 and JET, for the purpose of evaluation with respect to their effectiveness, from the point of view of the developer, in the context of JML programs. This study addresses the following question:

Q3. Does the JMLOK2 approach perform better than the JET tool?

This comparison considered only a subset of the experimental units from the first study, due to JET requirements [Cheon 2007]. The units were Samples, JAccounting, Mondex and TransactedMemory, totalizing over 6 KLOC and 5 KLJML; from JAccounting, only the Account class was considered. This study was performed in the same machine setup as the first study (Section 4.1).
We used the JMLOK2 and JET tools with their default configurations. Table 3 presents the results of the experimental evaluation considering JMLOK2 and JET. The total number of nonconformances detected by JET was 9, against 30 nonconformances detected by JMLOK2. In relation to test coverage, only for JAccounting did JET present higher coverage.

Table 3. Comparison between JMLOK2 and JET.

       |               | Samples                                     | JAccounting                   | Mondex      | TransactedMemory
JET    | # Tests       | 8,306                                       | 1,…                           | …           | …958
       | Java Coverage | 62.86%                                      | 100%                          | 7.50%       | 21.53%
       | JML Coverage  | 63.70%                                      | 100%                          | 11.60%      | 52.59%
       | # NCs (Types) | 2 invariant, 2 postcondition                | 4 postcondition               | -           | 1 invariant
JMLOK2 | # Tests       | 7,581                                       | 1,000                         | 3,…         | …
       | Java Coverage | 93.44%                                      | 96.60%                        | 53.42%      | 70.30%
       | JML Coverage  | 96.33%                                      | 95.83%                        | 22.58%      | 55.93%
       | # NCs (Types) | 15 postcondition, 2 invariant, 1 evaluation | 1 postcondition, 2 evaluation | 2 invariant | 6 invariant, 1 postcondition

Discussion Q3. JET was able to reveal nonconformances not detected by JMLOK2, especially for the JAccounting experimental unit. However, we observed an important

drawback: the tool is inconsistent regarding the nonconformances discovered. For instance, in the JAccounting unit, different executions found different nonconformances: JET often detects zero nonconformances, and then in the next execution shows four nonconformances; for the same unit, JMLOK2 always finds three nonconformances. The genetic algorithm in JET's backend may explain why its results differ between repeated executions. This behaviour was not observed in JMLOK2, despite its RGT approach. Considering test coverage, in general JMLOK2 performed better than JET; the only case where JET was better was JAccounting. This result can be related to JET's requirements: no public fields can be assigned, and object sharing is not allowed; the tests miss the several parts of the programs that do not fulfill those requirements, which does not occur with JMLOK2. Considering the number of nonconformances detected, the only case where JET also performed better than JMLOK2 was JAccounting: four against three.

5. Conclusions

In this work, we presented an approach for detecting and categorizing nonconformances in contract-based programs. In our experimental studies, JMLOK2 detected 84 nonconformances in over 29 KLOC and over 9 KLJML. We reported those nonconformances and their classification to the authors, and the answers were positive. Furthermore, we classified the nonconformances and established likely causes; the causes mostly split into Weak preconditions and Code errors. Comparing the coincidences (matches) between the automatic categorization (by means of the JMLOK2 tool) and our manual categorization (the baseline), we got a mean matches of …. When comparing JMLOK2 with JET, the former detected 30 nonconformances with a Java instruction coverage of 78.44% and a JML instruction coverage of 67.67%, while JET detected 9 nonconformances, covering 47.97% of Java instructions and 56.97% of JML instructions, for the same experimental units (a subset of the first study, totalizing approximately 6 KLOC and 5 KLJML).
These numbers suggest that JMLOK2 performs better than JET with respect to the number of detected nonconformances and test coverage (block instruction coverage) for these experimental units. As future work, we intend to improve the test generation of JMLOK2, integrate it with OpenJML, and extend our model to treat nonconformances within method bodies.

Acknowledgment

This work was supported by CAPES, CNPq PIBITI 04/2013 and the National Institute of Science and Technology for Software Engineering (INES⁷), funded by CNPq, grant …/….

References

[Boyapati et al. 2002] Boyapati, C., Khurshid, S., and Marinov, D. (2002). Korat: Automated Testing Based on Java Predicates. In ISSTA. ACM.

[C. Pacheco and Ball 2007] Pacheco, C., Lahiri, S. K., Ernst, M. D., and Ball, T. (2007). Feedback-directed random test generation. In ICSE.

[Cheon 2007] Cheon, Y. (2007). Automated Random Testing to Detect Specification-Code Inconsistencies. In SETP.

[Cheon and Leavens 2002a] Cheon, Y. and Leavens, G. (2002a). A Runtime Assertion Checker for the Java Modeling Language (JML). In SERP. CSREA Press.

[Cheon and Leavens 2002b] Cheon, Y. and Leavens, G. (2002b). A Simple and Practical Approach to Unit Testing: The JML and JUnit Way. In ECOOP. Springer-Verlag.

[Cok 2011] Cok, D. (2011). OpenJML: JML for Java 7 by Extending OpenJDK. In NFM. Springer-Verlag.

[Cok and Kiniry 2004] Cok, D. and Kiniry, J. (2004). ESC/Java2: Uniting ESC/Java and JML: Progress and issues in building and using ESC/Java2. In CASSIS. Springer-Verlag.

[Guttag et al. 1993] Guttag, J., Horning, J., Garland, S., Jones, K., Modet, A., and Wing, J. (1993). Larch: Languages and Tools for Formal Specification. Springer-Verlag.

[Leavens et al. 1999] Leavens, G., Baker, A., and Ruby, C. (1999). JML: A Notation for Detailed Design.

[Meyer 1997] Meyer, B. (1997). Object-Oriented Software Construction. Prentice Hall.

[Milanez 2014] Milanez, A. (2014). Case study on categorizing nonconformances. Technical report, Software Practices Laboratory, Federal University of Campina Grande.

[Oriat 2005] Oriat, C. (2005). Jartege: A Tool for Random Generation of Unit Tests for Java Classes. In QoSA/SOQUA. Springer Berlin Heidelberg.

[Poll et al. 2002] Poll, E., Hartel, P., and Jong, E. (2002). A Java Reference Model of Transacted Memory for Smart Cards. In CARDIS. USENIX Association.

[Rebêlo et al. 2009] Rebêlo, H., Lima, R., Cornélio, M., Leavens, G., Mota, A., and Oliveira, C. (2009). Optimizing JML Features Compilation in ajmlc Using Aspect-Oriented Refactorings. In SBLP.

[Rodrigues 2009] Rodrigues, R. (2009). JML-Based Formal Development of a Java Card Application for Managing Medical Appointments. Master's thesis, Universidade da Madeira.

[Tonin 2007] Tonin, I. (2007). Verifying the Mondex Case Study: the KeY approach. Technical report, Fakultät für Informatik, Universität Karlsruhe.

[Varjão et al. 2011] Varjão, C., Massoni, T., Gheyi, R., and Soares, G. (2011).
JMLOK: Uma Ferramenta para Verificar Conformidade em Programas Java/JML. In CBSoft (Tools Session).

[Zimmerman and Nagmoti 2011] Zimmerman, D. and Nagmoti, R. (2011). JMLUnit: the Next Generation. In FoVeOOS'10. Springer-Verlag.

A Rapid Approach for Building a Semantically Well Founded Circus Model Checker

Alexandre Mota¹, Adalberto Farias²

¹ Centro de Informática, UFPE, Caixa Postal …, Recife, PE, Brazil
² Departamento de Sistemas e Computação, UFCG, Rua Aprígio Veloso, 882, Bloco CN, Campina Grande, PB, Brazil

Abstract. Model checkers are tools focused on checking the satisfaction relation M ⊨ f, where M is a transition system (a graph representation) of a specification written in a language L and f is a property. Such a graph may come from the semantics of L. We present a model checker resulting from a rapid prototyping strategy for the language Circus. We capture its operational semantics with the Microsoft FORMULA framework and use it to analyse classical properties of specifications. As FORMULA supports SMT solving, we can handle infinite data in communications and predicates. Furthermore, we create a semantically correct Circus model checker because working with FORMULA is equivalent to reasoning with first-order logic (Clark completion). We illustrate the use of the model checker with an extract of an industrial case study.

Link: https://sites.google.com/site/adalbertocajueiro/research/circusmc

1. Problem and Motivation

Model checking [Clarke et al. 1994] is an automatic technique to verify the satisfiability of the relation M ⊨ f, where M is a model (a Labelled Transition System or Kripke structure) of some formal language L and f is a temporal logic formula. A model checker is a tool containing search procedures and specific representations for M and f. Model checkers use very specialized algorithms and data structures to achieve the best space and time complexities when checking M ⊨ f. It is not common to find model checkers for rich-state-space languages (those using elaborate data structures) that clearly follow a formal semantics.
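For a classical property such as deadlock freedom, checking M ⊨ f reduces to a reachability search over the transition graph. A minimal, tool-agnostic sketch in Java (illustrative only; the paper's checker instead queries FORMULA-generated LTS facts):

```java
import java.util.*;

// Minimal labelled transition system and a deadlock search: a deadlock
// is a reachable state with no outgoing transition.
class Lts {
    private final Map<String, List<String>> succ = new HashMap<>();

    void addTransition(String from, String event, String to) {
        // the event label is irrelevant for the deadlock check,
        // so only the successor relation is stored
        succ.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // returns a reachable deadlocked state, or null if none exists
    String findDeadlock(String init) {
        Deque<String> todo = new ArrayDeque<>(List.of(init));
        Set<String> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            String s = todo.pop();
            if (!seen.add(s)) continue;          // already explored
            List<String> next = succ.getOrDefault(s, List.of());
            if (next.isEmpty()) return s;        // no outgoing transition
            todo.addAll(next);
        }
        return null;
    }
}
```

The hard part the paper addresses is not this search but obtaining M faithfully from the semantics of L, which is where generating the LTS from the SOS rules inside FORMULA matters.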
Two essential issues are intrinsic to model checker development: how to guarantee that M conforms to the semantics (usually the Structural Operational Semantics, SOS) of the language L, and how to guarantee the correctness of the check M ⊨ f (or of a refinement check between models). For instance, FDR [Roscoe et al. 1994] and PAT [Liu et al. 2010] have had several versions delivered due to bug fixes. [Mota and Sampaio 2001] analyses CSP-Z via FDR, but the analysis is not assured to be correct. The possibility of building a model checker from a formal semantics document plays an important role in this scenario. A very recent technology developed by Microsoft Research, known as FORMULA [Jackson et al. 2011] (Formal Modelling Using Logic Programming and Analysis), seems to be appropriate for creating semantically correct model checkers. It is based

on the Constraint Programming Paradigm [Rossi et al. 2006] and Satisfiability Modulo Theories (SMT) solving provided by Z3 [De Moura and Bjørner 2008]. Besides providing a high abstraction level for describing structures, FORMULA allows one to deal with some infiniteness aspects of data types and to define search procedures over structures. We used FORMULA as a framework for describing semantics and for analysing Circus models. The language Circus [Woodcock and Cavalcanti 2002] is a formal notation that combines Z [Woodcock and Davies 1996], CSP [Roscoe 2010], constructs of the refinement calculi [Morgan 1990], and Dijkstra's language of guarded commands, a simple imperative language with nondeterminism. Our model checker contains a transcription from the Circus SOS rules to FORMULA and the encoding of classical properties (deadlock, livelock and nondeterminism) in terms of (FORMULA) queries. The encoding of the SOS rules allows FORMULA to create an LTS (as logical facts) for an arbitrary Circus specification according to the formal semantics, while queries check desirable properties by looking at certain logical facts. These tasks involve interaction with the Z3 SMT solver and, therefore, are powerful enough to handle a fair class of infinite state-space problems. Another interesting feature of FORMULA is that values can be dynamically instantiated to satisfy a property. Naturally, this instantiation represents a drawback if it does not validate the property (continuous search for other values may lead to non-termination). Fortunately, FORMULA works with least fixed-point search and, using some good practices, one can overcome this limitation.

2. Design and Implementation

Figure 1 shows the scenario for creating semantically correct model checkers using FORMULA. The language of FORMULA includes algebraic data types (ADTs) and strongly typed constraint logic programming (CLP). This allows one to create concise specifications [Jackson et al.
2011], analysable by SMT solving. The necessary elements to implement model checkers in FORMULA are a BNF grammar, an SOS, and a set of properties stated in some (temporal) logic. The SOS rules (associated with the constructors defined by the BNF) are described as abstractions (how to build a model for an instance of Circus and how to check properties over it) in FORMULA. The Circus model checker is a FORMULA abstraction. Regarding the correctness of our model checker, we follow the idea of the Clark completion [Dao-Tran et al. 2010] of a definite clause program, which makes the assumption that the axioms in a program completely axiomatise all possible reasons for atomic formulas to be true. This approach is also used by other works in the literature.

Figure 1. A model checker product line

We spent 2 months learning FORMULA, 8 months to create the proposed strategy for any SOS, and 72 hours to build the model checker. This fast development is a result of the high abstraction level of FORMULA. Currently, our model checker does not have optimal performance, but it is semantically well founded. Other approaches, like [Freitas 2005], took a whole PhD to build a first model checker for Circus, and the manipulation of infiniteness aspects there is not fully automatic.

Figure 2 illustrates the use of our model checker over the graphical user interface of Visual Studio. In fact, FORMULA relies on Visual Studio; that is, it uses libraries and components to implement a specific engine that is able to generate terms and validate constraints over them (automatically invoking Z3). Thus, in our model checker the user has to encode a Circus specification using suitable (and already defined) FORMULA constructors and inform the property to be checked. Then FORMULA generates the LTS and checks the property. Naturally, the translation from Circus to FORMULA requires knowledge about how Circus terms are mapped into FORMULA (it is almost a one-to-one mapping). However, tools like Stratego/XT can be used for this purpose, and a GUI (model checker front-end) emerges very easily. The answer returned by FORMULA is SAT or UNSAT. In the first case, FORMULA is able to instantiate a value (even when its type is infinite) to validate the property. Moreover, as FORMULA creates the LTS (and all elements involved in its creation), deeper analyses using the internal structure of the LTS are possible.

Figure 2. FORMULA running over Visual Studio

3. Practical Use and Case Study

The Circus model checker requires previous installation of FORMULA¹, which is free and requires Microsoft Visual Studio². This makes the current version of the Circus model checker platform dependent, as the underlying framework is from Microsoft. A FORMULA script can be put as part of a FORMULA project in Visual Studio.
The user creates a new FORMULA project and replaces the default content with that of the script. Then analysis can be done by using features of Visual Studio. Figure 3 shows the analysis result of a Circus specification. It is possible to view information about the FORMULA code itself (B), the internal structure (domains and models, region C), the execution time of internal tasks (D), and the executed queries and the base of facts containing the LTS and all elements used to create and analyse it (A).

Figure 3. Running FORMULA on Visual Studio

To evaluate our Circus model checker, we consider the Emergency Response System (ERS) introduced in [Andrews et al. 2013b, Andrews et al. 2013a]. Figure 4 shows its outline view. The ERS model is a set of SysML diagrams, and the behaviour in Circus is obtained from Activity Diagrams with specialized stereotypes. Due to space restrictions, we extract the code that corresponds to the activation, detection and recovery of faults. Listing C 1 shows the Circus specification. We add the controller processes ERUs_0, ERUs_1, or ERUs_2, which add details of the behaviour of the Call Centre, controlling the number of ERUs currently allocated. Version 0 has a flaw in the implementation of the schema AllocateState: the schema should add 1 to the previous allocated value. This simple mistake causes a deadlock on process ERSystem_0, because channel service_rescue is never offered by ERUs_0; the deadlock is successfully detected by the model checker. Process ERUs_1 fixes this problem, and ERSystem_1 is deadlock-free. The FORMULA code and instructions to run our case study are available at https://sites.google.com/site/adalbertocajueiro/research/circusmc.

Figure 4. Outline of the ERS

4. Overall Architecture

The heart of our model checker is the embedding of the structured operational semantics of Circus in FORMULA and the embedding (obtained by derivation rules) of predicates describing the classical properties in FORMULA. We have made a modular embedding of the elements, so that they are mapped to distinct parts/sections of the FORMULA script. This is illustrated in Figure 5, which is composed of several sections with dependencies between them (represented by arrows).
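The rule-embedding idea can be illustrated outside FORMULA: each SOS rule becomes a function from a process term to its outgoing transitions, and exhaustively applying the rules from the initial term yields the LTS as a set of facts, in least-fixed-point style. A toy prefix/choice fragment in Java (illustrative names; not Circus syntax or the actual FORMULA encoding):

```java
import java.util.*;

// Toy process terms: Stop, event -> P (prefix), P [] Q (choice).
// Each SOS rule is encoded as a case producing outgoing transitions,
// mirroring how SOS rules become logical rules in FORMULA.
class Sos {
    interface Proc {}
    record Stop() implements Proc {}
    record Prefix(String event, Proc cont) implements Proc {}
    record Choice(Proc left, Proc right) implements Proc {}

    record Trans(Proc from, String event, Proc to) {}

    // SOS rules: a prefix fires its event; a choice inherits the
    // initial transitions of both branches; Stop has no rules.
    static List<Trans> step(Proc p) {
        List<Trans> out = new ArrayList<>();
        if (p instanceof Prefix pr) {
            out.add(new Trans(p, pr.event(), pr.cont()));
        } else if (p instanceof Choice c) {
            for (Trans t : step(c.left()))
                out.add(new Trans(p, t.event(), t.to()));
            for (Trans t : step(c.right()))
                out.add(new Trans(p, t.event(), t.to()));
        }
        return out;
    }

    // Least-fixed-point style exploration: apply the rules until no
    // new state appears; the accumulated transitions are the LTS facts.
    static Set<Trans> lts(Proc init) {
        Set<Trans> facts = new LinkedHashSet<>();
        Deque<Proc> todo = new ArrayDeque<>(List.of(init));
        Set<Proc> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            Proc p = todo.pop();
            if (!seen.add(p)) continue;
            for (Trans t : step(p)) { facts.add(t); todo.push(t.to()); }
        }
        return facts;
    }
}
```

In FORMULA the analogous rules are declarative, and Z3 additionally handles symbolic data in guards and communications, which is what extends this idea to infinite domains.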

process ERUs_0 = begin
    state Control == [ allocated, total_erus : N ]
    InitControl == [ Control | allocated' = 0 ∧ total_erus' = 5 ]
    AllocateState == [ Control | allocated' = allocated ]
    Allocate = allocate_idle_eru → AllocateState ; Choose
    ServiceState == [ Control | allocated' = allocated − 1 ]
    Service = service_rescue → ServiceState ; Choose
    Choose = if [ Control | allocated = 0 ] → Allocate
             [] [ Control | allocated = total_erus ] → Service
             [] [ Control | allocated > 0 ∧ allocated < total_erus ] → Allocate [] Service
             fi
    • InitControl ; Choose
end

process InitiateRescueFault1Activation = begin
    CallCentreStart = start_rescue → FindIdleEru
    FindIdleEru = find_idle_erus → (IdleEru [] (wait → FindIdleEru))
    IdleEru = allocate_idle_eru → send_rescue_info_to_eru → IR1
    IR1 = (process_message → FAReceiveMessage) [] (fault_1_activation → IR2)
    FAReceiveMessage = receive_message → ServiceRescue
    ServiceRescue = service_rescue → CallCentreStart
    IR2 = IR2Out [] (error_1_detection → FAStartRecovery)
    IR2Out = drop_message → target_not_attended → CallCentreStart
    FAStartRecovery = start_recovery_1 → end_recovery_1 → ServiceRescue
    • CallCentreStart
end

process Recovery1 = begin
    Recovery1Start = start_recovery_1 → log_fault_1 → resend_rescue_info_to_eru →
                     process_message → receive_message → end_recovery_1 → Recovery1Start
    • Recovery1Start
end

process ERSystem_i, i ∈ {0,1} = InitiateRescueFault1Activation [| ERUsSignals |] ERUs_i
process ERSystem_2 = ERSystem_1 [| RecoverySignals |] Recovery1

chanset ERUsSignals == { allocate_idle_eru, service_rescue }
chanset RecoverySignals == { start_recovery_1, end_recovery_1 }

Listing C 1: Emergency Response System processes

Figure 5. Overall Embedding
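The deadlock mechanism of Listing C 1 can be mimicked in a few lines (a Java abstraction of the Circus schemas; the class and method names are ours):

```java
// Java abstraction (ours, not Circus) of the Control state in Listing
// C 1, showing why the flawed AllocateState deadlocks ERSystem_0.
class ErusControl {
    int allocated = 0;
    final int totalErus = 5;
    private final boolean flawed;

    ErusControl(boolean flawed) { this.flawed = flawed; }

    // AllocateState in Version 0 keeps allocated unchanged
    // (allocated' = allocated); Version 1 increments it.
    void allocate() {
        if (!flawed) allocated = allocated + 1;
    }

    // In Choose, the Service branch (channel service_rescue) is enabled
    // only when allocated > 0; with the flaw the guard never holds, so
    // a partner process waiting on service_rescue blocks forever.
    boolean offersServiceRescue() {
        return allocated > 0;
    }
}
```

With the flawed version, any number of allocate() calls leaves offersServiceRescue() false: this is the never-offered channel that the model checker reports as a deadlock of ERSystem_0.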

The Auxiliary Definitions section contains the representation of types and operations over them. The Circus syntax is conveniently mapped (transcribed) to a specific Syntax Domain section, which defines abstractions for Circus constructs. Then each Circus SOS rule is mapped to a logical rule in the Semantics Domain section of the FORMULA script, which contains abstractions useful to create the LTS according to the firing rules of the operational semantics of Circus [Woodcock et al. 2005]. Afterwards, the properties (described in first-order logic) are translated to FORMULA queries (kept in the Properties Domain section), which define the way the search engine analyses the LTS to validate properties.

4.1. Comparisons

Palikareva et al. [Palikareva et al. 2012] propose a prototype called SymFDR, which implements a bounded model checker for CSP based on SAT solvers. The authors make a comparison to show that SymFDR can deal with problems beyond FDR (such as combinatorially complex problems). They report that FDR outperforms SymFDR when a counter-example does not exist. In our work, we extend the class of problems analysable by SymFDR with the aid of SMT solving. This results in a more expressive approach to create the LTS, because we do not depend on FDR, and it makes our approach able to handle infinite-state systems, while SymFDR can only deal with systems that FDR can.

Leuschel [Leuschel 2001] proposes an implementation of CSP in SICStus Prolog for interpretation and animation purposes. Part of the design of our model checker in FORMULA follows a similar declarative and logic representation. However, as we handle infinite-state systems, we indeed implement a future work of [Leuschel 2001].

The advances of SMT solving bring a new level of verification. Bjørner et al. [Bjørner et al. 2012] extend SMT-LIB to describe rules and declare recursive predicates, which can be used by symbolic model checking. Alberti et al. [Alberti et al.
2012] propose an SMT-based specification language to improve the verification of safety properties. We use Circus as the language and a model checker with a new perspective for reasoning about infinite systems, where SMT solving allows automatic verification and reasoning over concurrent systems described in Circus. Another similar approach was proposed in [Verdejo and Marti-Oliet 2002] and uses Maude for executing and verifying CCS (the Calculus of Communicating Systems). According to that work, only behavioural aspects can be handled, whereas we deal with data aspects even if they come from an infinite domain and are involved in communications and in predicates. Moreover, that work also considers temporal logic, whereas we do not (it is not part of the Circus culture, but FORMULA can handle it). We point out that Maude can be more powerful than FORMULA, but it can be harder to guarantee convergence when applying rewriting rules. Our work is free of convergence problems because the engine of FORMULA focuses on finding the least fixed point using SMT solving.

5. Final Remarks

This work proposed a model checker for Circus that can handle infinite data (involved in communications and in predicates). The relation between first-order logic and FORMULA assures the semantic correctness of the model checker. The development strategy used here represents a remarkable result: it uses principles of model-driven development, where a conceptual and abstract model (the SOS) is the starting point for the implementation.

The use of FORMULA as the underlying framework has been crucial for reducing time and complexity in the development life-cycle. The actual implementation of the Circus semantics took about 3 days. Compared with usual approaches for implementing model checkers, this is remarkably fast. Besides that, correctness is another immediate benefit of our development strategy. Our model checker is freely available at https://sites.google.com/site/adalbertocajueiro/research/circusmc³. Our work is used in the context of COMPASS⁴, which uses CML, a formal language built on the maturity of Circus that combines VDM, CSP, and the refinement calculus of Morgan [Morgan 1990]. Our model checker has been adapted to CML and also allows time aspects, available in a single language and tool support. As future work, we intend to propose a DSL [Fowler 2010] for describing SOS rules (following [Corradini et al. 2000], for example), to use Stratego/XT [Visser 2004] or QVT [Dan 2010] to automate the generation of FORMULA abstractions, to give a UTP semantics [Hoare and He 1998] to FORMULA, and to develop a refinement calculus.

References

Alberti, F., Bruttomesso, R., Ghilardi, S., Ranise, S., and Sharygina, N. (2012). Reachability Modulo Theory Library (Extended abstract). In SMT Workshop.

Andrews, Z., Didier, A., Payne, R., Ingram, C., Holt, J., Perry, S., Oliveira, M., Woodcock, J., Mota, A., and Romanovsky, A. (2013a). Report on timed fault tree analysis fault modelling. Technical Report D24.2, COMPASS.

Andrews, Z., Payne, R., Romanovsky, A., Didier, A., and Mota, A. (2013b). Model-based development of fault tolerant systems of systems. In Systems Conference (SysCon), 2013 IEEE International, pages ….

Bjørner, N., McMillan, K., and Rybalchenko, A. (2012). Program Verification as Satisfiability Modulo Theories. In SMT Workshop.

Clarke, E., Grumberg, O., and Long, D. (1994). Model Checking and Abstraction. ACM Trans.
on Programming Languages and Systems, 16(5):….

Corradini, A., Heckel, R., and Montanari, U. (2000). Graphical operational semantics. In ICALP Satellite Workshops, pages ….

Dan, L. (2010). QVT Based Model Transformation from Sequence Diagram to CSP. In Engineering of Complex Computer Systems (ICECCS), …th IEEE International Conference on, pages ….

Dao-Tran, M., Eiter, T., Fink, M., and Krennwallner, T. (2010). First-order encodings for modular nonmonotonic datalog programs. In de Moor, O., Gottlob, G., Furche, T., and Sellers, A. J., editors, Datalog, volume 6702 of Lecture Notes in Computer Science, pages …. Springer.

De Moura, L. and Bjørner, N. (2008). Z3: an efficient SMT solver. In Proceedings of the Theory and practice of software, 14th international conference on Tools and

³ Visual Studio 2010 is available under MSDN Licensing. FORMULA is distributed over Microsoft Research License for non-commercial use only.
⁴ The EU Framework 7 Integrated Project Comprehensive Modelling for Advanced Systems of Systems (COMPASS, Grant Agreement …).

algorithms for the construction and analysis of systems, TACAS'08/ETAPS'08, pages …, Berlin, Heidelberg. Springer-Verlag.

Fowler, M. (2010). Domain Specific Languages. Addison-Wesley Professional, 1st edition.

Freitas, L. (2005). Model Checking Circus. PhD thesis, University of York.

Hoare, T. and He, J. (1998). Unifying Theories of Programming, volume 14. Prentice Hall, Englewood Cliffs.

Jackson, E. K., Levendovszky, T., and Balasubramanian, D. (2011). Reasoning about metamodeling with formal specifications and automatic proofs. In Model Driven Engineering Languages and Systems, pages …. Springer.

Leuschel, M. (2001). Design and Implementation of the High-Level Specification Language CSP(LP). In PADL, volume 1990 of LNCS, pages …. Springer.

Liu, Y., Sun, J., and Dong, J. (2010). Developing Model Checkers Using PAT. In Bouajjani, A. and Chin, W.-N., editors, Automated Technology for Verification and Analysis, volume 6252 of Lecture Notes in Computer Science, pages …. Springer Berlin Heidelberg.

Morgan, C. (1990). Programming from Specifications. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

Mota, A. and Sampaio, A. (2001). Model-checking CSP-Z: strategy, tool support and industrial application. Science of Computer Programming, 40(1):….

Palikareva, H., Ouaknine, J., and Roscoe, A. W. (2012). SAT-solving in CSP Trace Refinement. Sci. Comput. Program., 77(10-11):….

Roscoe, A. (2010). Understanding Concurrent Systems. Springer.

Roscoe, A. W. et al. (1994). Model-checking CSP. In A Classical Mind: Essays in Honour of C. A. R. Hoare, pages ….

Rossi, F., van Beek, P., and Walsh, T., editors (2006). Handbook of Constraint Programming. Elsevier.

Verdejo, A. and Marti-Oliet, N. (2002). Executing and Verifying CCS in Maude. Technical report, Dpto. Sist. Informaticos y Programacion, Univ. Complutense de.

Visser, E. (2004). Program transformation with Stratego/XT. In Domain-Specific Program Generation, pages …. Springer.

Woodcock, J. and Cavalcanti, A. (2002). The semantics of Circus.
In Proceedings of the 2Nd International Conference of B and Z Users on Formal Specification and Development in Z and B, ZB 02, pages , London, UK, UK. Springer-Verlag. Woodcock, J., Cavalcanti, A., and Freitas, L. (2005). Operational semantics for model checking circus. In Fitzgerald, J., Hayes, I., and Tarlecki, A., editors, FM 2005: Formal Methods, volume 3582 of Lecture Notes in Computer Science, pages Springer Berlin Heidelberg. Woodcock, J. and Davies, J. (1996). Using Z: Specification, Refinement, and Proof. Prentice Hall International Series in Computer Science. 84

SPLConfig: Product Configuration in Software Product Line

Lucas Machado, Juliana Pereira, Lucas Garcia, Eduardo Figueiredo

Department of Computer Science, Federal University of Minas Gerais (UFMG), Brazil

{lucasmdo, juliana.pereira, lucas.sg,

Abstract. A software product line (SPL) is a set of software systems that share a common set of features satisfying the specific needs of a particular market segment. A feature represents an increment in functionality relevant to some stakeholders. SPLs commonly use a feature model to capture and document common and varying features. The key challenge of using feature models is to derive a product configuration that satisfies all business and customer requirements. To address this challenge, this paper presents a tool, called SPLConfig, to support businesses during product configuration in SPL. Based on feature models, SPLConfig automatically finds an optimal product configuration that maximizes customer satisfaction.

Demo Video: https://www.youtube.com/watch?v=qlhtiy8oht8

1. Introduction

The growing need for developing larger and more complex software systems demands better support for reusable software artifacts [Pohl et al., 2005]. In order to address these demands, software product lines (SPL) have been increasingly adopted in the software industry [Clements and Northrop, 2001; Apel et al., 2013]. An SPL is a set of software systems that share a common set of features satisfying the specific needs of a particular market segment [Pohl et al., 2005]. It is built around a set of common software components with points of variability that allow product configuration [Clements and Northrop, 2001]. Large companies, such as Hewlett-Packard, Nokia, Motorola, and Dell, have adopted SPL practices. The potential benefits of SPLs are achieved through a software architecture designed to increase the reuse of features across SPL products. An important concept of an SPL is the feature model.
Feature models are used to represent the common features found in all products of the product line (known as mandatory features) and the variable features that distinguish between products (generally represented as optional or alternative features) [Czarnecki and Eisenecker, 2000; Kang et al., 1990]. Variable features define points of variation, and their role is to permit the instantiation of different products by enabling or disabling specific SPL functionality. In practice, developing an SPL involves modeling features to represent different viewpoints, sub-systems, or concerns of the software system [Batory, 2005]. A fundamental challenge in SPL is the process of enabling and disabling features in a feature model for a new software product configuration [Pohl et al., 2005]. As the number of features in a feature model increases, so does the number of product options in
an SPL [Benavides et al., 2005]. For instance, an SPL where all features are optional can instantiate 2^n different products, where n is the number of features. Moreover, once a feature is selected, it must be verified against the myriad constraints in the feature model, turning this process into a complex, time-consuming, and error-prone task. Industrial-sized feature models with hundreds or thousands of features make this process impractical. Guidance and automatic support are needed to increase business efficiency when dealing with the many possible feature combinations in an SPL.

This paper presents a tool, called SPLConfig, to support automatic product configuration in SPL. The main goal of SPLConfig is to derive an optimized feature set that satisfies the customer requirements. The primary contribution of this tool is to assist businesses during product configuration, answering the following question: what is the set of features that balances cost and customer satisfaction, given the available budget? By resolving this problem, industries can more effectively achieve greater customer satisfaction.

The rest of this paper is organized as follows. Section 2 describes the problem. Section 3 presents the SPLConfig architecture. Section 4 discusses the design and implementation of the tool. Examples are used to validate the tool's main functionalities in Section 5. Section 6 presents related work. Finally, Section 7 concludes and points out directions for future work.

2. Problem Description

Feature models represent the common and variable features of an SPL using a feature tree [Kang et al., 1990]. In feature models, nodes represent features and edges show relationships between parent and child features [Batory, 2005]. These relationships constrain how features can be combined. As an example, the mobile phone industry uses features to specify and build software for configurable phones.
The software product deployed into a phone is determined by the phone's characteristics. Figure 1 depicts a simplified feature model of an SPL, called MobileMedia [Figueiredo et al., 2008], inspired by the mobile phone industry. It was developed for a family of four brands of devices, namely Nokia, Motorola, Siemens, and RIM.

Figure 1. Example of a Feature Model for a Mobile Phone Product Line

SPLConfig is available at https://sourceforge.net/p/splconfig/
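The combinatorial growth and constraint checking described above can be sketched in a few lines. This is a hypothetical miniature model: the feature names echo the MobileMedia domain but are not taken from the paper's Figure 1.

```python
from itertools import combinations

# Hypothetical miniature feature model: mandatory features are always present;
# optional features may be toggled, subject to cross-tree constraints
# (here: SMSTransfer requires CopyMedia).
MANDATORY = {"MediaSelection", "AlbumManagement"}
OPTIONAL = ["Favourites", "CopyMedia", "SMSTransfer", "SortingMedia"]
CONSTRAINTS = [lambda cfg: "SMSTransfer" not in cfg or "CopyMedia" in cfg]

def valid_products():
    """Enumerate every valid configuration (2^n candidates for n optional features)."""
    products = []
    for r in range(len(OPTIONAL) + 1):
        for combo in combinations(OPTIONAL, r):
            cfg = MANDATORY | set(combo)
            if all(check(cfg) for check in CONSTRAINTS):
                products.append(cfg)
    return products

print(len(valid_products()))  # 12 of the 2^4 = 16 candidates satisfy the constraint
```

Even this toy model loses a quarter of its 2^n candidates to a single constraint, which is why manual configuration stops scaling as models grow.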

A key need in SPL is determining how to configure an optimized feature set that satisfies the customer requirements. In Figure 1, features refer to functional requirements of the MobileMedia SPL. However, features may also be associated with non-functional requirements. Kang et al. [1990] suggest the need to take non-functional requirements into account. For instance, considering the MobileMedia SPL, it is possible to identify non-functional requirements related to each feature, such as the cost and benefit of each feature to the customer. This means that every product differs not only in its functional features, but also in its non-functional requirements. Therefore, we propose to extend feature models with non-functional requirements by adapting the notation proposed in Benavides et al. [2005] to our problem. Figure 2 illustrates the non-functional requirements in the feature model of Figure 1. In Figure 2, all features (mandatory and optional) have the cost attribute, and only optional features have the benefit attribute. For example, the optional feature Favourites has cost and benefit attributes, while the mandatory feature MediaSelection has only the cost attribute. Note that the benefit attribute is classified into six qualitative levels: none, very low, low, medium, high, and very high. Our goal is to find an optimal solution by means of an objective function that maximizes customer satisfaction (benefit/cost) without exceeding the available budget (shown in the lower right corner of Figure 2). As a motivating example, given a mobile phone product line that includes a variety of varying features, what is the product that best meets the customer requirements within a given budget? The challenge is that, with hundreds or thousands of features, it is hard to analyze all the different product configurations to find an optimal one. Figure 2.
A Feature Model Decorated with Non-Functional Requirements

Therefore, the main goal of the SPLConfig tool is to provide an automatic product configuration method based on search-based software engineering (SBSE) techniques [Harman & Jones, 2001]. Figure 3 presents an abstract overview of our method. As shown in Figure 3, from a feature model we use search-based algorithms to derive an optimized product configuration that maximizes customer satisfaction, subject to business requirements (cost), customer requirements (benefit and budget), composition constraints, and cross-tree constraints. These algorithms are described in detail in Pereira (2014).
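As an illustration of this objective, a brute-force sketch follows. All costs, the budget, and the level-to-score mapping are assumptions made for the example; the paper only names the six qualitative levels, and SPLConfig itself uses the search-based algorithms detailed in Pereira (2014) rather than exhaustive enumeration.

```python
from itertools import combinations

# Assumed mapping of the six qualitative benefit levels to scores.
LEVEL = {"none": 0, "very low": 1, "low": 2, "medium": 3, "high": 4, "very high": 5}

MANDATORY_COST = 30                 # made-up total cost of all mandatory features
OPTIONAL = {                        # name: (cost, benefit level) -- illustrative values
    "Favourites":   (10, "high"),
    "CopyMedia":    (15, "medium"),
    "SMSTransfer":  (20, "very high"),
    "SortingMedia": (5,  "low"),
}

def best_configuration(budget):
    """Exhaustively pick the optional-feature set maximizing total benefit
    without the overall cost exceeding the budget."""
    best, best_benefit = set(), -1
    names = list(OPTIONAL)
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            cost = MANDATORY_COST + sum(OPTIONAL[f][0] for f in combo)
            benefit = sum(LEVEL[OPTIONAL[f][1]] for f in combo)
            if cost <= budget and benefit > best_benefit:
                best, best_benefit = set(combo), benefit
    return best, best_benefit

print(best_configuration(60))
```

The exhaustive loop makes the scalability problem concrete: it visits all 2^n subsets, which is exactly what the search-based algorithms avoid on industrial-sized models.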

Figure 3. Overview of the Method for Automatic Product Configuration

3. SPLConfig Architecture

SPLConfig is an Eclipse plugin implemented in Java. It requires one additional plugin, named FeatureIDE, in order to support product configuration. Figure 4 presents the architecture of SPLConfig and its relationships with FeatureIDE and the Eclipse platform. The decision to extend FeatureIDE is particularly attractive for two reasons. First, FeatureIDE is an open source, extensible, modular, and easy-to-understand tool. Second, by extending FeatureIDE we could reuse the key functionalities of typical feature modeling tools, such as creating and editing a feature model, automatically analyzing the feature model, the basic product configuration infrastructure, code generation, and feature model import/export. FeatureIDE is a widely used SPL development tool that supports all phases of feature-oriented software development, from domain and application analysis to code generation.

Figure 4. SPLConfig's Architecture

Automatic product configuration allows the customization of different products according to the business and customer requirements, composition constraints, and cross-tree constraints of the feature model (SPLConfig in Figure 4). Two important activities are performed in the SPL lifecycle, domain engineering and application engineering, described below:

Domain Engineering. This process is represented by a feature model and is supported by FeatureIDE (feature model editor in Figure 4). Common and variable requirements of the product line are elicited and documented. The developers create reusable artifacts of a product line in such a way that these artifacts are sufficiently adaptable (variable) to efficiently derive individual applications from them.

Application Engineering. In meetings, the developers identify and describe the customer's requirements and their priorities (SPLConfig in Figure 4). SPLConfig prioritizes
and creates reports to aid the product builder when defining a product configuration for a specific customer. Our product configuration algorithm uses an optimization scheme and provides valuable decision support for product configuration within the SPL. The product configuration result is visualized in FeatureIDE (product configuration module in Figure 4). It is important to observe that the tool can be easily extended, by means of new algorithms, to support additional non-functional features.

As shown in Figure 4, feature modeling in the early stages enables deciding which features should be supported by an SPL and which should not. The result is a feature model represented in FeatureIDE. In a second stage, during product configuration, it is necessary to choose the features that appropriately fulfill the business and customer requirements. This stage allows a single product configuration to be created through search-based algorithms in SPLConfig. This product configuration is presented by FeatureIDE in the product configuration module. Note that the current implementation of SPLConfig randomly picks one of the optimal solutions. However, we are working on a new version that presents a sorted list of the top solutions, so the customer will have several product options to choose from. In a third stage, SPLConfig allows manual configuration of features if necessary. In a fourth stage, it also allows compiling and building the product.

4. Design and Implementation Decisions

This section discusses some design and implementation decisions we made in the development of SPLConfig. Figure 5 shows a screenshot of the SPLConfig view in the Eclipse IDE. Figure 5 shows the package explorer view typical of the Eclipse IDE (a). It also illustrates the FeatureIDE model editor (b) and outline view (c). Figure 5 (d) shows details of the SPLConfig view.

Figure 5. SPLConfig View in the Eclipse IDE

Figure 5 (d) presents the main view of SPLConfig integrated with the Eclipse IDE. At the top of the view, there are two fields, named Budget and Customer, that should be filled in with the available budget and the customer identification, respectively. Each feature in the feature model is presented in a row of this view. Columns give additional information, such as the feature name, the level of importance of the feature to the customer (Benefit), and the cost of developing each feature (Cost). The tool is meant to be used by developers, who are expected to translate qualitative information from customers into quantitative values. Moreover, this view includes typical buttons, such as Refresh and Execute, which trigger their respective actions. Refresh is used to update the data presented in this view, while Execute selects the product configuration that best satisfies the customer requirements.

In addition to this view, we extend Eclipse with a preference page, presented in Figure 6. Industries can set specific preferences, such as the development cost of each feature that makes up the SPL. Therefore, for the same SPL, several products can be generated according to the needs and constraints specific to each customer, while keeping the cost of each feature fixed. Note that the FeatureIDE view can be used to show the product configuration that best satisfies the customer requirements, but it does not prevent other features from being included in or excluded from the final product (i.e., manual tuning).

Figure 6. SPLConfig Preference Page

5. Preliminary Evaluation

We reviewed a large number of research works in the field of SPL and selected a set of ten feature models that were used as benchmark instances to evaluate SPLConfig: MobileMedia [Figueiredo et al., 2008], System [Thüm et al., 2014], Smart Home [Alferez et al., 2009], Devolution [Thüm et al., 2014], Gasparc [Aranega et al., 2012],
Web Portal [Mendonça et al., 2008], FraSCAti [Seinturier et al., 2012], Model Transformation [Czarnecki & Helsen, 2003], Battle of Tanks [Thüm & Benduhn, 2011], and e-shop [Lau, 2006]. We found that the time spent configuring a feature model with 213 features (e-shop) is about 6 milliseconds. So far, SPLConfig has scaled well for all evaluated feature models (up to three hundred features). Evaluation details are available in Pereira (2014).

6. Related Tools

Automated reasoning is a challenging field in SPL engineering. Mendonça et al. (2009) and Thüm et al. (2014) have proposed visualization techniques for representing staged feature configuration in SPLs. However, industrial-sized feature models with hundreds or thousands of features make a manual feature selection process hard. Moreover, these approaches focus on the functional features of a product and their dependencies, neglecting non-functional requirements. To the best of our knowledge, we still lack tools to deal with non-functional features.

7. Conclusions and Future Work

This paper presented SPLConfig, a tool developed for product configuration in SPL using search-based algorithms. We described the problem handled by the tool (Section 2) and summarized the method behind the tool and its main functionalities (Sections 3 and 4). Our results so far are, in general, satisfactory. Nevertheless, our goals for future work include improving SPLConfig to (i) support other non-functional requirements [Kang et al. 1998] and (ii) consider the difficulty of developing and maintaining products, in order to minimize the development effort of integrating features to compose a product. Further work should also address industrial case studies, with direct contact with businesses and customers, to ensure that realistic requirements are used.
Acknowledgment

This work was partially supported by CNPq (grant Universal /2013-5) and FAPEMIG (grants APQ and PPM ).

References

Apel, S., Batory, D., Kästner, C., and Saake, G. (2013). Feature-Oriented Software Product Lines: Concepts and Implementation. Springer-Verlag.

Alferez, M., Santos, J., Moreira, A., Garcia, A., Kulesza, U., Araujo, J., and Amaral, V. (2009). Multi-view composition language for software product line requirements. In International Conference on Software Language Engineering (SLE).

Aranega, V., Etien, A., and Mosser, S. (2012). Using feature model to build model transformation chains. In 15th International Conference on Model Driven Engineering Languages and Systems (MODELS).

Batory, D. S. (2005). Feature models, grammars and propositional formulas. In 9th International Software Product Lines Conference (SPLC).

Benavides, D., Martin-Arroyo, P. T., and Cortes, A. R. (2005). Automated reasoning on feature models. In 17th Conference on Advanced Information Systems Engineering (CAiSE).

Clements, P. and Northrop, L. (2001). Software Product Lines: Practices and Patterns. Addison-Wesley.

Czarnecki, K. and Eisenecker, U. W. (2000). Generative Programming: Methods, Tools, and Applications. Addison-Wesley.

Czarnecki, K. and Helsen, S. (2003). Classification of model transformation approaches. Available at: /Czarnecki_Helsen.pdf/.

Figueiredo, E. et al. (2008). Evolving software product lines with aspects: An empirical study. In 30th International Conference on Software Engineering (ICSE).

Harman, M. and Jones, B. F. (2001). Search based software engineering. Information and Software Technology, 43.

Kang, K., Cohen, S., Hess, J., Novak, W., and Peterson, S. (1990). Feature-oriented domain analysis (FODA) feasibility study. Technical Report CMU/SEI-90-TR-21, ESD-90-TR-222.

Kang, K., Kim, S., Lee, J., Kim, K., Shin, E., and Huh, M. (1998). FORM: A feature-oriented reuse method with domain-specific reference architectures. Annals of Software Engineering, 5(1).

Lau, S. Q. (2006). Domain analysis of e-commerce systems using feature-based model templates. Master's thesis, University of Waterloo, Canada.

Mendonça, M., Bartolomei, T., and Cowan, D. (2008). Decision-making coordination in collaborative product configuration. In ACM Symposium on Applied Computing (SAC).

Mendonça, M., Branco, M., and Cowan, D. (2009). S.p.l.o.t.: Software product lines online tools. In 24th Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA).

Pereira, J. A. (2014). Search-Based Product Configuration in Software Product Lines. Federal University of Minas Gerais, Belo Horizonte.

Pohl, K., Böckle, G., and Van der Linden, F. J. (2005). Software Product Line Engineering: Foundations, Principles and Techniques. Springer-Verlag.
Seinturier, L., Merle, P., Rouvoy, R., Romero, D., Schiavoni, V., and Stefani, J. (2012). A component-based middleware platform for reconfigurable service-oriented architectures. Software: Practice and Experience, 42(5).

Thüm, T. and Benduhn, F. (2011). Spl2go: An online repository for open-source software product lines. Available at: [Online; accessed 10-December-2013].

Thüm, T., Kästner, C., Benduhn, F., Meinicke, J., Saake, G., and Leich, T. (2014). FeatureIDE: An extensible framework for feature-oriented software development. Science of Computer Programming, 79.

SPLICE: Software Product Line Integrated Construction Environment

Bruno Cabral 1,3, Tassio Vale 2,3, Eduardo Santana de Almeida 1,3

1 Computer Science Department, Federal University of Bahia (UFBA), Salvador, BA, Brazil
2 Center of Exact Sciences and Technology, Federal University of Recôncavo da Bahia (UFRB), Cruz das Almas, BA, Brazil
3 RiSE Labs, Reuse in Software Engineering, Salvador, BA, Brazil

Abstract. A software product line (SPL) is, basically, a set of products developed from reusable assets. During the development of an SPL, a wide range of artifacts needs to be created and maintained to preserve the consistency of the family model, and it is important to manage the SPL variability and the traceability among those artifacts. In this paper, we propose the Software Product Line Integrated Construction Environment (SPLICE), an open source web-based lifecycle management tool for managing software product line activities in an automated way. This initiative intends to support most SPL process activities, such as scoping, requirements, architecture, testing, version control, evolution, management, and agile practices.

1. Introduction

Overview video:

Software product line (SPL) engineering is considered one of the most effective methodologies for developing software products and is a proven, successful approach in business environments. The successful introduction of a software product line provides a significant opportunity for a company to improve its competitive position [Pohl et al. 2005]. However, managing an SPL is not simple, since it demands planning and reuse, adequate management and development techniques, and the ability to deal with organizational issues and architectural complexity [Cavalcanti et al. 2012].
During the development of an SPL, a wide range of artifacts needs to be created and maintained to preserve the consistency of the family model, and it is important to manage the SPL variability and the traceability among those artifacts. However, this is a hard task, due to the heterogeneity of the assets developed during the SPL lifecycle. Manually keeping artifact traceability up to date is error-prone, time-consuming, and complex. Therefore, using a project management system to support those activities is essential. A large number of CASE tools exist for assisting software engineering activities, and there are also specific tools for SPL engineering [Lisboa 2008]. However, they are complex and formal, and they enforce a specific project management process without suitable customization. Another issue is that these tools focus on a specific activity of the process,
forcing software engineers to use several tools to support their process. When different tools are used, it is hard to automate trace links (links from one artifact to another) and thus provide traceability among the artifacts. Moreover, software engineers also have to handle the installation, maintenance, and user management of a number of tools, or rely on someone else for those tasks not directly related to product development.

Aiming to address such issues, we present the Software Product Line Integrated Construction Environment (SPLICE). SPLICE is a web-based software product line lifecycle management tool, providing traceability and variability management and supporting most SPL process activities, such as scoping, requirements, architecture, testing, version control, evolution, management, and agile practices. The tool assists the engineers involved in the process with asset creation and maintenance, while providing traceability and variability management, as well as offering detailed reports and enabling engineers to easily navigate between assets using the traceability links. It also provides a basic infrastructure for development and a centralized point for user management.

The remainder of this paper is organized as follows: Section 2 addresses what has been proposed in the literature and the tools available in the market; Section 3 presents the tool, including its requirements, the proposed metamodel, and its general architecture; Section 4 shows a case study conducted inside a research laboratory to validate the tool; Section 5 provides the concluding remarks.

2.
Related Work

Schwaber [Schwaber 2006] defines an ALM tool as a tool that provides: "The coordination of development lifecycle activities, including requirements, modeling, development, build and testing, through: 1) enforcement of processes that span these activities; 2) management of relationships between development artifacts used or produced by these activities; and 3) reporting on progress of the development effort as a whole."

During our search [Cabral 2014] for similar commercial or academic tools, we initially came up with 221 possible tools related to ALM or general project management; after filtering, the list was reduced to twenty-three tools. The selection criteria comprised support for the following assets: requirements elicitation and/or use cases; planning using agile methodologies; issue report management; testing; and feature modeling. In addition, the tool should provide traceability among the managed assets, flexibility for adapting the tool metamodel to specific needs, and features oriented specifically to SPL development.

According to our analysis, the desired characteristics were sparsely supported among different tools. Only IBM Jazz Collaborative Lifecycle Management, Polarion ALM, codeBeamer ALM, and Endeavour Agile ALM fulfilled multiple criteria, though with some issues. Issues in the other solutions included missing characteristics, metamodel inflexibility, and absence of support for SPL development. A detailed comparison table is available on the web

3. SPLICE

SPLICE is an open source (GNU General Public License 2), Python web-based tool built to support and integrate SPL activities such as requirements management, architecture, coding, testing, tracking, and release management, providing process automation and traceability. Our tool provides version control and issue tracking infrastructure, enabling stakeholders and software engineers to create the architecture and artifacts in a controllable and traceable way. The tool requirements are described as follows:

FR1 - Traceability of lifecycle artifacts. The tool should identify and maintain relationships between managed lifecycle artifacts.

FR2 - Reporting of lifecycle artifacts. It must use lifecycle artifacts and traceability information to generate the needed reports from the lifecycle product information.

FR3 - Metamodel implementation. SPLICE should implement the entities and relationships described in a defined metamodel. The metamodel comprises the relationships among the SPL assets, allowing traceability and facilitating the evolution and maintenance of the SPL.

FR4 - Issue tracking. Issue tracking plays a central role in software development, and the tool must support it. It is used by developers to support collaborative bug fixing and the implementation of new features. In addition, it is also used by other stakeholders, such as managers, QA experts, and end-users, for tasks such as project management, communication and discussion, code reviews, and story tracking.

FR5 - Agile planning. In the software industry, there is a strong shift from traditional phase-based development towards agile methods and practices [Bjarnason et al. 2011]. The tool must support them.

FR6 - Configuration management. For evolution management, the tool must support change management across all managed artifacts. It must also support creating and controlling mainstream version management systems such as SVN and Git.

FR7 - Unified user management.
As an integrated environment that plans to cover all lifecycle activities, the tool can use a number of external tools, taking advantage of the vibrant community and quality present in some open-source/freely available tools. For convenience, it must provide unified user management across all the tools.

FR8 - Collaborative documentation. A wiki is a collaborative authoring system for collective intelligence that is quickly gaining popularity in content publication, and it is a good candidate for creating documentation. Wikis' collaborative features and markup language make them very appropriate for documenting software development tasks.

FR9 - Artifact search. All artifacts managed by the tool must support keyword search and present the results in an adequate way.

For brevity, the non-functional requirements are not described here. They include easy access, metamodel flexibility, extensibility, usability, accountability, transparency, and security.

2 https://www.gnu.org/copyleft/gpl.html
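FR1 and FR2 together boil down to reachability queries over the trace-link graph: a report must list not only an artifact's direct links but everything reachable through them. A minimal sketch with hypothetical artifact identifiers (not SPLICE's actual data model):

```python
from collections import deque

# Hypothetical trace links: artifact -> directly related artifacts.
LINKS = {
    "Feature:Favourites": ["UseCase:ManageFavourites"],
    "UseCase:ManageFavourites": ["TestCase:TC-07", "Ticket:#42"],
    "TestCase:TC-07": [],
    "Ticket:#42": [],
}

def related(artifact):
    """Return every artifact reachable from `artifact` (direct and indirect links)."""
    seen, queue = set(), deque([artifact])
    while queue:
        for nxt in LINKS.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(related("Feature:Favourites")))
```

A breadth-first walk like this is what lets a feature-level report pull in the use cases, test cases, and tickets transitively linked to it.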

3.1. Metamodel

To address the previously defined functional and non-functional requirements, we decided to use a model-driven approach to represent all the information, activities, and connections among artifacts. With a well-defined metamodel, we can provide automation and interactive tool support to aid the corresponding activities. Several metamodels have been proposed in the literature [Buhne et al. 2005, Cavalcanti et al. 2012, Schubanz et al. 2013]. However, we argue that all of them assume a traditional phase-based methodology and do not fit a more lightweight, flexible, and informal methodology such as agile methods. We propose a lightweight metamodel, adapted from [Cavalcanti et al. 2012], representing the interactions among SPL assets and developed to provide a way of managing traceability and variability. The proposed metamodel represents the reusable assets involved in an SPL project; a simplified description of its modules is presented next. The complete metamodel, with all relations, is available on the web.

Scoping Module comprises the Feature and the Product models. Many artifacts relate directly to the Feature Model, including Use Case, Glossary, User Story, and Scope Backlog. A Product is composed of one or more Features.

Requirements Module involves the requirements engineering traceability and interaction issues, considering the variability and commonality in the SPL products. The main object of this SPL phase is the Use Case. The Use Case model is composed of a description, a precondition, a title, and a number of MainSteps and AlternativeSteps. The concept of User Stories is used in this metamodel to represent what a user does or needs to do as part of his or her job function. It is composed of a name and the associated template: As a..., I want..., So that...

Testing Module is centered on the Test Case, which is composed of a name, a description, the expected result, and a set of Test Steps.
One Test Case can have many Test Executions, each representing one run of it. The reasoning for Test Execution is to enable test automation machinery. The metamodel also represents acceptance testing with the Acceptance Test and Acceptance Test Execution models.

Agile Planning Module contains Sprint Planning models, which are composed of a number of Tickets, a deadline, an objective, and a start date. At the end of the sprint, a retrospective takes place, represented in the model by Sprint Retrospective, which contains a set of Strong Points and Should Be Improved models that express which points of the sprint were adequate and which need improvement.

3.2. Architecture

The SPLICE architecture is composed of Trac, a core module (Tonho), a database module, an Authentication Control module (Maculellê), and a set of versioning and revision control tools. Trac is an open source, web-based project management and bug tracking system, and it is used as the foundation of SPLICE. On top of it, two separate modules were built to provide the missing functionality. Tonho is the main module, where the metamodel and the functionality not provided by Trac are implemented; the Authentication Manager is the module that provides the single sign-on property among the tools, supplying unified access control with a single login.
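The modules above can be rendered as plain data classes. This is a simplified, hypothetical sketch of the metamodel, not SPLICE's actual implementation:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Feature:                 # Scoping module
    name: str

@dataclass
class Product:                 # Scoping: a product is one or more features
    name: str
    features: List[Feature] = field(default_factory=list)

@dataclass
class UserStory:               # Requirements: "As a .., I want .., So that .."
    name: str
    as_a: str
    i_want: str
    so_that: str

@dataclass
class TestStep:
    description: str

@dataclass
class TestCase:                # Testing module
    name: str
    description: str
    expected_result: str
    steps: List[TestStep] = field(default_factory=list)

@dataclass
class SprintPlanning:          # Agile Planning module
    objective: str
    start_date: str
    deadline: str
    tickets: List[str] = field(default_factory=list)
```

Keeping the entities this small is what makes a lightweight, agile-friendly metamodel: each asset carries only the fields needed to link it to the others.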

Figure 1. SPLICE architecture overview

Figure 1 illustrates a simplified organization of the architecture: the Authentication Control validates requests and enables access to the tools if the user has the right credentials; Trac is responsible for issue tracking, managing the versioning and revision control tools, collaborative documentation, and plugin extensibility; the main module (Tonho) is where the metamodel is implemented, with sub-modules for each metamodel module: Scoping (SC), Requirements (RQ), Tests (TE), Agile Planning (AP), and Other (OT); versioning and revision control is part of software configuration management and is composed of a set of external version control system (VCS) tools, such as SVN, Git, and Mercurial, which track and provide control over changes to source code, documents, and other collections of information; finally, the Database stores all the assets in an organized way and handles data persistence among the tools.

The bridge between Tonho and Trac is made using plugins, a shared database, and templates. Absolutely no modification was made to the Trac core. This solution allows easy upgrading of Trac in the future, taking advantage of new features and security fixes. All modules of the architecture share the same database and the same template design.

3.3. Main Functionalities

Aiming to address the requirements described previously, the main functionalities of SPLICE include:

Metamodel Implementation. All screens are completely auto-generated from the model descriptions, allowing the software engineer to easily adapt the process to specific project needs. For every model, a complete CRUD (Create, Read, Update, and Delete) system is created, but idiosyncrasies can be easily customized. SPLICE also provides advanced features such as filtering and classification.
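Auto-generating screens from model descriptions can be approximated by introspecting the model classes. This is a toy sketch of the idea, with a hypothetical model class; SPLICE's real generator is built on Trac's templating and is more elaborate:

```python
from dataclasses import dataclass, fields

@dataclass
class Feature:                 # hypothetical model; SPLICE derives screens from its own models
    name: str
    description: str
    is_optional: bool

def render_form(model_cls):
    """Emit a minimal HTML form with one input per field of the model class."""
    rows = [f'<label>{f.name}<input name="{f.name}"></label>' for f in fields(model_cls)]
    return "<form>" + "".join(rows) + "</form>"

print(render_form(Feature))
```

Because the form is derived from the class definition, adding a field to the model automatically adds it to the generated screen, which is what lets engineers reshape the process without touching UI code.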

Issue Tracking. Based on the Trac core, SPLICE has full-featured issue tracking. We extended it to implement SPL-specific features and to provide traceability to other assets.

Traceability. SPLICE provides full traceability for all assets in the metamodel and is able to report direct and indirect relations between them. In reports, assets have hyperlinks, enabling navigation between them.

Custom SPL Widgets. SPLICE has a set of custom widgets to represent specific SPL models, such as Feature Map, Feature Restriction, Product Map, and Agile Planning Poker.

Change History and Timeline. SPLICE has a rich set of features to visualize how the project is going, where the changes are happening, and who made them. For every Issue or Asset, a complete change history is recorded.

Unified Control Panel. The tool aggregates the configuration of all external tools in a unified interface. With the same credentials, the user is able to access all SPLICE features, including external tools such as the VCS.

Agile Planning. SPLICE supports a set of Agile practices such as effort estimation, where team members use effort and degree of difficulty to estimate their own work. Features can be dragged with the mouse, and their position is updated accordingly.

Automatic Report Generation. SPLICE is able to create reports, including PDFs. A generated report includes a cover, a summary and the set of chosen artifacts related to the product. This format is suitable for requirements validation by stakeholders. The tool is also able to collect all reports for a given Product and create a compressed file containing the set of generated reports.

4. Case Study: RescueMe SPL

The SPLICE tool was evaluated during an SPL development project. The study comprised the migration from a manual Software Engineering process to the SPLICE tool and the proposed metamodel. Cabral (2013) 8 provides a more detailed description of the case study.

Figure 2. Features list in SPLICE

Between June and November 2013, we developed the RescueMe SPL, following an agile SPL process. RescueMe is a product line developed in Objective-C for

8 Bachelor's thesis available at

iOS devices, designed to help its users in dangerous situations. It was developed using an iterative and incremental process, carried out by four developers, with a face-to-face meeting at the end of each sprint. These meetings were responsible for evaluating results and planning the next sprint. Before using SPLICE, the group manually maintained the SPL process with a set of external tools, such as the SourceForge 9 service for issue tracking and VCS. All the requirements artifacts were maintained using text documents and questionnaires. SPLICE was introduced to manage the SPL process, and all artifacts were migrated to it. After the migration, the development team used only SPLICE to manage the application lifecycle. Figure 2 shows a list of features in SPLICE. After the migration to SPLICE, we selected a survey as the data collection instrument. To evaluate the applicability of the tool, the survey design was based on the guidelines of [Kitchenham and Pfleeger 2008] and is composed of a set of personal questions and of closed-ended and open-ended questions related to the research questions. We used a cross-sectional design, in which participants were asked about their past experiences at a particular fixed point in time. The survey was performed as a self-administered printed questionnaire, which can be seen on the web 10, and was aimed at PhD students who are experts on the subject. Two experts answered the questionnaire; both have more than 5 years of experience in Software Engineering and more than 4 years in SPL development. Analyzing the answers, no one reported any major difficulties during tool usage. One developer reported a minor problem with the fact that the initial screen is the collaborative document and not the assets screen, which is hidden behind a menu item. No major usability problem was found, and all participants were able to use and evaluate the tool without supervision. This can indicate that the tool fulfilled the Usability requirement.
The experts explicitly stated that the tool was useful, aided asset traceability, provided all the traceability links they wanted and offered a valuable set of features. They also stated that they would spontaneously use the tool in future SPL projects. The experts also mentioned some points of improvement during the survey. One suggested improvement was the ability to configure the process and the metamodel. This is a non-functional requirement of the tool, and the SPLICE architecture is capable of it; however, it requires editing some files manually, so a visual editor should be added to the tool to address this problem. Other reported problems include the need for better change impact analysis and for integration with variability in source code, to perform product-line derivation; both are on the roadmap for the next version.

5. Conclusion

This paper presented SPLICE, an open source, Python-based web tool built to support and integrate SPL activities such as requirements management, architecture, coding, testing, tracking, and release management, providing process automation and traceability across the process. Moreover, we also presented a lightweight metamodel for SPL development with agile methodologies, which is implemented as the default process in SPLICE. SPLICE fills a gap in open-source ALM tools focused on SPL. It supports the main SPL life-cycle artifacts, providing traceability between those artifacts and, consequently, an integrated and consistent management. Compared to the other tools, SPLICE also has as a differential the fact of being web-based, allowing users with different devices to use it and collaborate. As future work, we intend to implement the enhancements and fix the issues reported by the experts. Two important upcoming features are visual metamodel editing, so users can visually adapt the metamodel to their specific needs, and source-code variation support, to achieve a complete derivation process. The proposed tool is publicly available at

Acknowledgments

This work was partially supported by the National Institute of Science and Technology for Software Engineering (INES 11 ), funded by CNPq and FACEPE, grants / and APQ /08 and CNPq grants /2010-6, /2010-8, / and FAPESB.

References

Bayer, J. and Widen, T. (2002). Introducing traceability to product lines. In Revised Papers from the 4th International Workshop on Software Product-Family Engineering, PFE 01, London, UK. Springer-Verlag.

Baysal, O., Holmes, R., and Godfrey, M. W. (2013). Situational awareness: Personalizing issue tracking systems. In Proceedings of the 2013 International Conference on Software Engineering, ICSE 13, Piscataway, NJ, USA. IEEE Press.

Bjarnason, E., Wnuk, K., and Regnell, B. (2011). A case study on benefits and side-effects of agile practices in large-scale requirements engineering. ACM.

Buhne, S., Lauenroth, K., and Pohl, K. (2005). Modelling requirements variability across product lines. In Proceedings of the 13th IEEE Conference on Requirements Engineering, RE 05, pages 41-52, Washington, DC, USA. IEEE.

Cabral, B. S. (2014). SPLICE: A flexible SPL lifecycle management tool.

Cavalcanti, Y. C., do Carmo Machado, I., da Mota Silveira Neto, P. A., and Lobato, L. L. (2012). Software Product Line - Advanced Topics, chapter Handling Variability and Traceability over SPL Disciplines. InTech.

Kitchenham, B. and Pfleeger, S. (2008). Personal opinion surveys.
In Shull, F., Singer, J., and Sjøberg, D., editors, Guide to Advanced Empirical Software Engineering. Springer London.

Lisboa, L. B. (2008). ToolDAy - a tool for domain analysis.

Pohl, K., Böckle, G., and van der Linden, F. (2005). Software Product Line Engineering: Foundations, Principles, and Techniques.

Schubanz, M., Pleuss, A., Pradhan, L., Botterweck, G., and Thurimella, A. K. (2013). Model-driven planning and monitoring of long-term software product line evolution. In Proceedings of the Seventh International Workshop on Variability Modelling of Software-intensive Systems, VaMoS 13, pages 18:1-18:5, New York, NY, USA. ACM.

Schwaber, C. (2006). The Changing Face Of Application Life-Cycle Management. Forrester Research.

11 INES

FlexMonitorWS: a solution for monitoring Web services with a focus on QoS attributes

Rômulo J. Franco 1, Cecília M. Rubira 1, Amanda S. Nascimento 2

1 Instituto de Computação - Universidade Estadual de Campinas
2 Departamento de Computação - Universidade Federal de Ouro Preto

Abstract. FlexMonitorWS is a tool for monitoring Web services that aims to collect QoS attribute values and to support understanding of the factors that degrade those attributes in the context of process monitoring. The solution adopts methods based on Software Product Lines to explore the software variability present in current monitoring systems, generating a family of monitors responsible for monitoring different QoS attributes and different IT resources as targets, to which different operating modes can be applied. Two case studies were performed to assess the feasibility of the tool, obtaining satisfactory results in delivering QoS attribute values and in understanding their degradation.

Tool demonstration at:

1. Introduction

Service-Oriented Architecture (SOA) and Software Product Lines (SPL) support software reuse. SOA is a software component model that interrelates different functional units, called services, through well-defined interfaces that are independent of platforms and implementation languages. An SPL, in turn, can be defined as a set of software systems that share common characteristics and have distinct characteristics aimed at satisfying the needs of a market niche [8]. The main purpose of product line engineering is to offer customized products at a reasonable cost [11]. Due to the inherent properties of SOA, such as dynamism, heterogeneity, distribution, service autonomy and the uncertainty of the execution environment, quality of service (QoS) attributes may fluctuate over time (e.g. availability and performance) [1].
Consequently, it is necessary to monitor services over time in order to ensure that they sustain the expected or defined quality level, that is, the level defined by the service provider. However, although there is a growing demand for monitoring QoS attributes, designing and implementing tools for this activity is a non-trivial task. Monitoring can be performed, for example, from different locations (e.g. on the client and/or server side), considering different QoS attributes and different monitoring frequencies. There is a lack of solutions that simultaneously cover the different ways of monitoring QoS attributes and thus support the requirements of different users and applications.

In this work, we present FlexMonitorWS, an SPL-based solution that supports a family of tools for monitoring QoS attributes. First, based on a systematic literature review, existing QoS monitoring solutions were studied in order to identify common and variable functionality among them [5, 6, 2, 1, 7]. This functionality was initially mapped to a feature model and later implemented through a component-based product-line architecture (PLA). The PLA aims to ease the instantiation of specific tools (i.e. products) according to the features of interest.

2. SPL Fundamentals

The original SPL vision consists of a domain conceived from business needs, formulated as a family of products with an associated production process [9]. This domain supports the rapid construction and evolution of customized products as customer requirements change. Software variability is a fundamental concept in SPL and refers to the ability of a software system or artifact to be modified, customized or configured for use in a specific context [10]. Variabilities can be initially identified through features. A feature is a system property that is relevant to some stakeholder and is used to capture common or variable parts among systems of the same family [12]. Software variabilities can be represented through a feature model, in which a classification separates common features from variable ones [8]. This classification can be further detailed by separating features into optional, alternative and mandatory [3].

3. FlexMonitorWS

FlexMonitorWS is a monitoring tool based on SPL concepts that explores the software variability found in Web service monitoring systems.
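The mandatory/optional classification described above can be sketched as a small validity check over feature selections. This is a hedged illustration: the feature names follow the mandatory features the paper lists later (Target, QoS Attribute, Operation Mode, Execution Frequency, Notification), but the dictionary layout and notification sub-features are assumptions, not FlexMonitorWS's real feature model.

```python
# Illustrative feature model with mandatory and optional features;
# structure and optional sub-features are hypothetical.
FEATURE_MODEL = {
    "Target":             {"kind": "mandatory"},
    "QoSAttribute":       {"kind": "mandatory"},
    "OperationMode":      {"kind": "mandatory"},
    "ExecutionFrequency": {"kind": "mandatory"},
    "Notification":       {"kind": "mandatory"},
    "Email":              {"kind": "optional", "parent": "Notification"},
    "LogFile":            {"kind": "optional", "parent": "Notification"},
}

def valid_product(selection):
    """A product configuration must include every mandatory feature."""
    mandatory = {f for f, m in FEATURE_MODEL.items() if m["kind"] == "mandatory"}
    return mandatory.issubset(selection)

print(valid_product({"Target", "QoSAttribute", "OperationMode",
                     "ExecutionFrequency", "Notification", "LogFile"}))  # True
print(valid_product({"Target", "Notification"}))  # False: mandatory features missing
```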
From a systematic literature review of service-monitoring solutions, we sought to answer the following questions: 1) Where will the monitoring take place (target)? 2) What should be obtained through monitoring, or what should be monitored (QoS attributes)? 3) In what way can it be performed (operation mode)? 4) At what frequency should the monitoring run? 5) By which means are the results or alerts generated by the monitoring obtained (notification)? Answering these questions reveals the variability present in these kinds of monitoring systems. The questions were mapped to features and are represented in the feature model shown in Figure 1. The feature model in Figure 1 was used to design and implement the product-line architecture, which allows the instantiation of specific products according to the requirements of the different users interested in the monitoring. That is, combining the possibilities identified as features in the model of Figure 1 yields a monitor for a specific need. Note that the model in Figure 1 contains the mandatory features Target (Alvo), QoS Attribute (AQoS), Operation Mode, Execution Frequency and Notification. The rest of the model hierarchy contains the remaining, optional features.

Figure 1. Feature model of monitoring systems

Figure 2. Communication diagram representing the product-line architecture of FlexMonitorWS

In Figure 2, we present a communication diagram that represents the interaction among the objects of FlexMonitorWS. As illustrated in this figure, from the central FlexMonitorWS object, the Execution Frequency of the monitoring is configured through a Timer, which, once triggered, generates an interruption in the virtual machine and starts the monitoring process. The next step takes place in the Operation Mode, which obtains samples and information about QoS attributes through the means associated with this point (e.g. invocation, interception or inspection). The following step takes place in the AQoS object, where the values of the QoS attributes associated with this point are calculated. The Notification object finishes the process, with the Notification Types holding the possible ways of notifying the interested parties (e.g. sending an e-mail or saving a Log file) about the results obtained during the monitoring. Note in Figure 2 that the Control and Entity elements in the diagram offer a view of the extension points of the solution and of the possible design alternatives associated with them, since these points are linked to the optional features of the model in Figure 1. The interface for configuring the tool is a parameter file named target.properties. This file is organized in blocks, where each block corresponds to a target and its relevant settings; thus, several blocks can be inserted in the file to monitor an unlimited number of targets. The three possible operation modes of the tool are: interception, through the TCPDUMP 1 command, whose output is analyzed and interpreted to detect security policies implemented in the Web service; inspection of servers, performed using the Hyperic Sigar API, which obtains diverse information about hardware resources.
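The paper describes target.properties as a block-structured file, one block per monitoring target. A hypothetical fragment might look like the following; every key name and value here is an assumption for illustration, not FlexMonitorWS's documented format:

```properties
# Block 1: availability monitor for a Web service (invocation mode)
target1.name=ProviderMaster
target1.address=http://example.org/service
target1.port=8080
target1.qos=availability
target1.mode=invocation
target1.frequency=60s
target1.notification=logfile

# Block 2: server inspection via hardware-resource information
target2.name=AppServerHost
target2.qos=performance
target2.mode=inspection
target2.frequency=300s
target2.notification=email
```

Several such blocks can be appended to monitor an unlimited number of targets, as the text above states.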
Still in the inspection mode, exceptions raised inside the log file of the server application are captured. For the invocation mode, we use the SAAJ API 2: the target.properties parameter file containing the address, port and other parameters of the Web service is read, and the request XML is created at runtime. We also consider the use of the PING command, through the ICMP protocol, an invocation mode, applied to assess QoS attributes related to elements of the communication network.

4. Case Studies

We present two case studies that were executed and analyzed to evaluate our solution qualitatively.

4.1. Case Study 1 - Controlled Scenario

This case study assesses the feasibility of the solution by demonstrating its capacity to identify the degradation of a QoS attribute in a service composition, in which there are several providers with several resources (e.g. services, servers, network) subject to failures. When a failure occurs, it degrades a given QoS attribute. Since there are many resources, it becomes difficult to monitor all of them effectively. In this case, a family of monitors is generated to cover the scenario exemplified here. At the end of the monitoring, the data can be cross-checked to find out who is responsible for the degradation, both at the provider level and at the resource level. Figure 3 identifies each monitor in the scenario by means of numbered circles. Note in Figure 3 that each monitor was generated by the product line to serve a specific need and was strategically placed to collect QoS attribute values from the targets. The generated monitors are for monitoring 1) Availability; 2) Performance; 3) Network (including gateways, routers and the Google public IP); 4) Server; 5) Reliability; and 6) Server Application. In summary, an SPL-based family of products was generated (e.g. monitorardisponibilidade.jar, monitorardesempenho.jar, etc.).

1 TCPDUMP, 2 Overview of SAAJ

Figure 3. Controlled scenario composed of providers, consumers and the monitors represented by numbered circles over the targets

The execution of the six monitors covering the scenario defined in Figure 3 lasted 14 hours. After several tests with different intervals, we managed to fit four interventions on the controlled scenario into these 14 hours. These interventions included, for example, turning off the network interface of the Master Provider in the first hour; the other interventions consisted of shutting down services and resources. They were executed at strategically defined time intervals to generate anomalies in the scenario and to capture information about the quality of the data produced by the monitoring. From the values delivered for each QoS attribute, it was possible to cross-check data from different monitors and identify where the degradation of a QoS attribute actually came from. The chart in Figure 4 presents one of the cross-checks obtained. On the one hand, in the case presented in Figure 4, even if the Master Provider were unaware of the unavailability caused by an intervention in the first hour, the monitoring performed by Provider A over the Master Provider and over the Google public IP would guarantee the precision of the diagnosis offered by the monitoring solution. On the other hand, even if the Master Provider were aware of the attribute degradation and wanted to exempt itself from it, Provider A would point to the Master Provider as the cause of the degradation.
The results shown were obtained from the network monitor pointed at the Google public IP (Monitor 3 in Figure 3) and from the availability monitor of the Master Provider's service (Monitor 2 in Figure 3).

Figure 4. Chart representing the result of the intervention applied to the controlled scenario

4.2. Case Study 2 - Fault Injection Scenario

In this second case study, FlexMonitorWS was applied to inject faults and identify vulnerabilities in Web services. The main goal of this study was to determine the feasibility of FlexMonitorWS as a fault-injection solution to support the monitoring of the robustness QoS attribute. We executed automatically generated scripts for attacks of the types Malformed XML, XML Bomb injection and duplicated requests. Analyzing the results, FlexMonitorWS revealed vulnerabilities in public Web services when inserting a Malformed XML script that repeats tags after an existing tag (e.g. <param1>value</param1><fault><fault1></fault></fault1>). The first vulnerability identified concerns the insertion of a character string after a valid value, according to the report shown in Figure 5. In this specific case, the credit card number provided is valid, but appending special characters to the end of the value makes it invalid, creating a malicious request corresponding to a fault injection. Note in Figure 5 that the highlighted values show the diagnosis offered by the solution.

Figure 5. Report of the script executions and identification of vulnerabilities in the public Web service
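The Malformed XML payload quoted above closes its repeated tags in the wrong order, which is why a well-formedness check rejects it. The sketch below only demonstrates that property of the quoted payload; the real attack scripts are generated by FlexMonitorWS and are not shown in the paper.

```python
import xml.etree.ElementTree as ET

# The payload from the text: a valid parameter followed by mismatched tags.
valid = "<param1>value</param1>"
malformed = valid + "<fault><fault1></fault></fault1>"  # tags closed out of order

def well_formed(fragment):
    """Return True if the XML fragment parses inside a dummy root element."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False

print(well_formed(valid))      # True
print(well_formed(malformed))  # False: </fault> closes while <fault1> is still open
```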

5. Related Work

Table 1 compares FlexMonitorWS with studies identified during a review of existing solutions in the literature [4, 2, 5, 1, 6, 7]. The first item in Table 1 is platform independence from the server application: many of the studied proposals, as shown in Table 1, run on the server side bound to a server application (e.g. Tomcat) and only monitor the services deployed inside it. The level of flexibility mentioned in Table 1 is the ability to add and remove QoS attributes; the highest level of flexibility would be one that allowed adding and removing any attribute. However, not every attribute can be obtained with a single operation mode. Table 1 shows that FlexMonitorWS is innovative in using SPL to support a family of monitoring tools. It also supports flexibility in operation modes and acts on a set of QoS attributes.

Table 1. Comparison of related work with FlexMonitorWS

Advantages | [4] | [2] | [5] | [1] | [6] | [7] | FlexMonitorWS
Platform independence from the server application
Has a level of flexibility with QoS attributes
Number of QoS attributes verified and validated: NA ... 7
Has a level of flexibility with operation modes
Monitors more than one target simultaneously*
Uses an SPL approach

Legend: a mark means the proposal follows an approach similar to the advantage; NA: no QoS attribute evaluated in the approach; *) monitors services and other resources simultaneously

6. Conclusions

This paper presented FlexMonitorWS, an SPL-based monitoring solution that supports a family of monitoring tools. Case studies were performed in which the proposed solution monitors QoS attributes of Web services. The results obtained indicate the feasibility, applicability and flexibility of FlexMonitorWS.
The case studies also suggest the feasibility of FlexMonitorWS for monitoring the robustness of Web services by enabling fault injection and the identification of the associated vulnerabilities. As future work, FlexMonitorWS will be extended to: 1) monitor Web services based on the REST protocol; 2) include the TCP/ICMP communication protocols in the feature model; and 3) support self-adaptive monitoring, in which the monitoring process understands and acts on the environment to make decisions about the impact of the monitoring itself.

References

[1] F. Souza, D. Lopes, K. Gama, N. S. Rosa, and R. Lima (2011). Dynamic event-based monitoring in a SOA environment. In OTM Conferences.

[2] B. Wetzstein, P. Leitner, F. Rosenberg, I. Brandic, S. Dustdar, F. Leymann (2009). Monitoring and Analyzing Influential Factors of Business Process Performance. EDOC.

[3] H. Gomaa (2004). Designing Software Product Lines with UML: From Use Cases to Pattern-Based Software Architectures. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA.

[4] C. Müller, M. Oriol, M. Rodríguez, X. Franch, J. Marco, M. Resinas, A. Ruiz-Cortés (2012). SALMonADA: A Platform for Monitoring and Explaining Violations of WS-Agreement-compliant Documents. In Proceedings of the 4th International Workshop on Principles of Engineering Service-Oriented Systems, PESOS 2012, IEEE, June.

[5] N. Artaiam and T. Senivongse (2008). Enhancing Service-side QoS Monitoring for Web Services. In Proceedings of the 2008 Ninth ACIS International Conference on Software Engineering, Washington, DC, USA. IEEE Computer Society.

[6] Q. Wang, J. Shao, F. Deng, Y. Liu, M. Li, J. Han, H. Mei (2009). An Online Monitoring Approach for Web Service Requirements. IEEE Transactions on Services Computing, vol. 2, no. 4.

[7] L. Baresi, S. Guinea, M. Pistore, M. Trainotti (2009). Dynamo + Astro: An Integrated Approach for BPEL Monitoring. ICWS 2009.

[8] Clements, P. and Northrop, L. (2001). Software product lines: practices and patterns. Addison Wesley Longman Publishing Co., Inc., Boston, MA, USA.

[9] G. H. Campbell. Renewing the product line vision. In Proceedings of the th International Software Product Line Conference, Washington, DC, USA. IEEE Computer Society.

[10] J. V. Gurp, J. Bosch, and M. Svahnberg (2001). On the notion of variability in software product lines. In Proceedings of the Working IEEE/IFIP Conference on Software Architecture, WICSA 01, Washington, DC, USA. IEEE Computer Society.
[11] Pohl, K.; Böckle, G.; van der Linden, F. Software product line engineering: Foundations, principles, and techniques. Berlin/Heidelberg: Springer.

[12] S. H. Chang and S. D. Kim. A variability modeling method for adaptable services in service-oriented computing. In SPLC 07: Proceedings of the 11th International Software Product Line Conference, Washington, DC, USA. IEEE Computer Society.

A Code Smell Detection Tool for Compositional-based Software Product Lines

Ramon Abilio 1, Gustavo Vale 2, Johnatan Oliveira 2, Eduardo Figueiredo 2, Heitor Costa 3

1 IT Department - Federal University of Lavras (UFLA), Lavras, MG, Brazil
2 Department of Computer Science - Federal University of Minas Gerais (UFMG)
3 Department of Computer Science - Federal University of Lavras (UFLA)

Abstract. Software systems have different properties that can be measured. Developers may perform manual inspections or use measure-based detection strategies to evaluate software quality. Detection strategies can be implemented in a computational tool, which performs detection faster. We developed an Eclipse plug-in called VSD (Variability Smell Detection) to measure and detect code smells in AHEAD-based Software Product Lines.

Tool demonstration: https://www.youtube.com/watch?v=m8vybwpcni8

1. Introduction

Despite the extensive use of software measures, isolated measure values are not meaningful because they are too fine-grained. However, measures can be combined into measure-based detection strategies, for example to detect code smells. Detection strategies are based on the combination of measures and thresholds using logical operators (AND and OR) [Marinescu, 2004; Lanza; Marinescu, 2006]. Threshold values can be represented with the labels Low, Avg (average), and High, because the real values may differ depending on the context [Marinescu, 2004; Lanza; Marinescu, 2006]. A measure-based detection strategy may be implemented in a computational tool and used to detect code smells faster than manual inspections, which are time-consuming. Measure-based detection strategies have been used to locate code smells in Object-Oriented (OO) [Lanza; Marinescu, 2006] and Aspect-Oriented (AO) [Figueiredo et al., 2012] software, but they have not been applied to detect code smells in Feature-Oriented (FO) software.
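A detection strategy of the kind described above combines measures and thresholds with logical operators. The sketch below illustrates the general shape with a God-Method-like rule; the measures chosen and the threshold values are assumptions for illustration, not the calibrated thresholds used by VSD.

```python
# Hypothetical "High" thresholds; real values are context-dependent,
# as the text notes (Low / Avg / High labels).
HIGH_LOC = 50      # assumed threshold for lines of code
HIGH_CYCLO = 10    # assumed threshold for cyclomatic complexity

def is_god_method(loc, cyclo, max_nesting):
    """Illustrative strategy: (LOC > High) AND (Cyclo > High OR deep nesting)."""
    return loc > HIGH_LOC and (cyclo > HIGH_CYCLO or max_nesting > 4)

print(is_god_method(loc=120, cyclo=15, max_nesting=2))  # True
print(is_god_method(loc=20, cyclo=3, max_nesting=1))    # False
```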
Feature-oriented programming (FOP) is particularly useful in applications where a large variety of similar objects is needed [Prehofer, 1997; Batory et al., 2003], such as in the development of Software Product Lines (SPL). SPL is an approach to the design and implementation of software systems that share common properties and differ in some features in order to meet the needs of the market or of specific clients [Pohl et al., 2005]. A feature may be defined as a prominent or distinctive user-visible aspect, quality, or characteristic of software [Kang et al., 1990] and can be implemented with different approaches, such as Compositional and Annotative. In the Compositional approach, features are implemented in separate artifacts, using FOP [Batory et al., 2003] or Aspect-Oriented Programming (AOP) [Kiczales et al., 1997]. AHEAD is a compositional approach based on gradual refinements, in which programs are defined as constants and features are added using refinement functions [Batory et al., 2003]. That is, classes implement the basic functions of a system (constants), while extensions, variations, and adaptations of these functions constitute the features (refinements). Features are implemented in modules syntactically independent of the classes and can insert or modify methods and attributes [Batory et al., 2003].

The definitions of code smells and their detection strategies for OO and AO address mechanisms of those techniques, such as classes, methods, aspects, and pointcuts [Fowler et al., 1999; Lanza; Marinescu, 2006; Macia et al., 2010]. In SPLs, there are smells which indicate potentially inadequate feature modeling or implementation. To emphasize the difference in the focus on variability, Apel et al. (2013) called them variability smells. A variability smell is a perceivable property of an SPL that is an indicator of an undesired code property. It may be related to all kinds of artifacts in an SPL, including feature models, domain artifacts, feature selections, and products. Focusing on the variability smells related to the implementation of features, Abilio (2014) adapted three traditional code smells - God Method, God Class, and Shotgun Surgery - to address specific characteristics of compositional-based SPLs. The adaptation of the code smells was based on the literature [Fowler et al., 1999; Lanza; Marinescu, 2006; Macia et al., 2010] and on the analysis of a set of AHEAD-based SPLs. This adaptation was necessary because the traditional code smells do not address mechanisms of FOP, such as constants and refinements. In addition, detection strategies for those code smells were defined [Abilio, 2014]. Abilio (2014) described the code smells and detection strategies using the structure presented by Lanza and Marinescu (2006).
Measures proposed to address specific mechanisms of FOP [Abilio, 2014] and measures indicated as useful for detecting the traditional code smells [Lanza; Marinescu, 2006; Padilha et al., 2013; Padilha et al., 2014] were used for filtering. The Low, Avg, and High values were calculated from a set of SPLs (not products). To measure the source code of an AHEAD-based SPL and detect the proposed code smells, we developed the Variability Smell Detection 1 (VSD) tool as an Eclipse plug-in. Software measures are a key means for assessing software modularity and detecting design flaws [Blonski et al., 2013; Lanza; Marinescu, 2006; Marinescu, 2004]. The software measurement community has traditionally explored quantifiable module properties, such as class coupling, cohesion, and interface size, in order to identify code smells in software systems [Lanza; Marinescu, 2006; Marinescu, 2004]. For instance, Marinescu (2004) relies on traditional measures to detect code smells in object-oriented systems. Following a similar trend, Blonski et al. (2013) propose a tool, called ConcernMeBS, to detect code smells based on concern measures. However, as far as we know, there is no tool to measure the source code and detect code smells in AHEAD-based SPLs. Therefore, the main goal of this work is to present VSD by showing a high-level view of its architecture, the implemented measures and detection strategies, and its main functions (Section 2). Section 3 presents an example of VSD use and discusses the results of a preliminary evaluation. Finally, Section 5 concludes this paper and suggests future work.

1 VSD source code is available at <https://code.google.com/p/vsdtool/>

2. Variability Smell Detection Tool (VSD)

This section presents VSD, a tool to detect code smells in AHEAD-based SPLs. Section 2.1 presents the VSD architecture, highlighting the main components and their interaction. The implemented measures and an example of the detection strategies are summarized in Section 2.2, and the main functions are detailed in Section 2.3. To illustrate VSD use, we measured and detected code smells in the TankWar SPL [Schulze et al., 2012]. This SPL is a game developed by students of the University of Magdeburg (Germany). It has medium size (~5,000 LOC), has 37 features (31 concrete features), and runs on PCs and mobile phones [Schulze et al., 2012]. This SPL was chosen due to its size (lines of code and number of features) and because it was used in other studies [Schulze; Apel; Kastner, 2010; Apel; Beyer, 2011; Schulze et al., 2012].

2.1. VSD Architecture

Figure 1 presents a high-level view of the VSD architecture. We used Eclipse IDE 4.3 (Kepler) and FeatureIDE to develop VSD. FeatureIDE is an Eclipse-based IDE that supports feature-oriented software development for the development of SPLs [Thüm et al., 2014]. It integrates different SPL implementation techniques, such as FOP, AOP, and preprocessors, and provides resources to deal with AHEAD projects. VSD has 3.5 KLoC, and its classes are distributed in three packages, illustrated in Figure 1: i) Detection Strategies: contains the classes that implement the detection strategies; ii) Measurement: contains measures and classes that perform the measurement; and iii) Plugin: contains classes responsible for interacting with Eclipse.

Figure 1 - High-Level View of the VSD Architecture

2.2. Implemented Measures and Detection Strategies

To measure AHEAD-based SPL source code, VSD implements traditional, OO, and FO measures (Table 1), such as McCabe's Cyclomatic Complexity (Cyclo) [McCabe, 1976], Weighted Methods Per Class (WMC) [Chidamber; Kemerer, 1994], and Number of Constants (NOCt) [Abilio, 2014], respectively.
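To give an intuition for the Method-group measures, the sketch below approximates McCabe's Cyclomatic Complexity by counting decision points in a method body plus one. This is a simplification for illustration only; VSD's actual measurement works on Jak source through FeatureIDE, and the class and method names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CycloSketch {
    // Decision points that add an execution path: branches, loops,
    // case labels, exception handlers, and short-circuit operators.
    private static final Pattern DECISION =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\|");

    // Cyclomatic complexity ~= number of decision points + 1.
    public static int cyclo(String methodBody) {
        int complexity = 1;
        Matcher m = DECISION.matcher(methodBody);
        while (m.find()) {
            complexity++;
        }
        return complexity;
    }

    public static void main(String[] args) {
        String body = "if (a > b && a > c) { for (int i = 1; i < n; i++) { sum += i; } }";
        System.out.println(cyclo(body)); // one if, one &&, one for -> prints 4
    }
}
```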
As AHEAD uses the Jak programming language (a Java superset), the traditional and OO measures assess properties of the OO source code, and the FO measures assess the properties added by the mechanisms that Jak provides to implement features. We organized the measures in three groups: i) Method group: measures related to individual properties of methods; ii) Component group: measures for properties of components (classes, interfaces, constants, refinements); and iii) SPL group: measures to assess properties of the source code of an entire SPL.

VSD implements three detection strategies, one for each code smell. When the user selects the option to detect a code smell, VSD performs a measurement, executes the respective detection strategy based on the obtained values, and shows the

methods/components found with the selected code smell symptom. For example, the detection strategy for God Method is based on two main characteristics [Abilio, 2014]: i) methods that may concentrate responsibilities, represented by method overrides, i.e., complete overrides and refinements; and ii) long and complex methods. To address these characteristics, four measures were selected: NOOr, NMR, MLoC, and Cyclo. The detection strategy is [Abilio, 2014]:

((NOOr + NMR) > HIGH) OR ((MLoC > AVG) AND ((Cyclo/MLoC) > HIGH))

Table 1 - VSD Measures

Method group: Method's Lines of Code (MLoC); Number of Method Refinements (NMR); McCabe's Cyclomatic Complexity (Cyclo); Number of Operation Overrides (NOOr); Number of Parameters (NP)

Component group: Lines of Code (LoC); Coupling between Objects (CBO); Number of Methods (NOM); Weighted Methods Per Class (WMC); Number of Attributes (NOA); Number of Constant Refinements (NCR)

SPL group: Number of Components (NOC); Number of Features (NOF); Number of Constants (NOCt); Number of Refinements (NOR); Number of Refined Constants (NRC); Number of Refined Methods (NRM); Number of Overridden Operations (NOrO); Total Lines of Code (TLoC); Total Number of Methods (TNOM); Total Number of Attributes (TNOA); Total Number of Method Refinements (TNMR); Total Cyclomatic Complexity (TCyclo); Total Coupling between Objects (TCBO); Total Number of Operation Overrides (TNOOr)

2.3. Functions

The potential users of VSD are software engineers who want to measure an AHEAD-based SPL and detect undesired behavior in the implementation of the SPL features. They can access VSD via the pop-up menu shown after right-clicking an AHEAD project. The available options are Detect Shotgun Surgery, Detect God Class, Detect God Method, and Measure. After selecting an option, VSD shows the results in the respective view: VSD Shotgun Surgery, VSD God Class, VSD God Method, and VSD Measures.
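The God Method strategy above is a pure boolean predicate over four measures, so it can be sketched directly; the class and method names below are hypothetical, and the default thresholds (2.39, 10.09, 0.24) are the ones VSD reports in its God Method view (Section 2.3).

```java
public class GodMethodStrategy {
    // Default thresholds, as shown in the VSD God Method view:
    // HIGH for overrides/refinements, AVG for method length,
    // HIGH for branch density (Cyclo/MLoC).
    static final double HIGH_OVERRIDES = 2.39;
    static final double AVG_MLOC = 10.09;
    static final double HIGH_DENSITY = 0.24;

    // ((NOOr + NMR) > HIGH) OR ((MLoC > AVG) AND ((Cyclo/MLoC) > HIGH))
    public static boolean isGodMethod(int noor, int nmr, int mloc, int cyclo) {
        boolean concentratesResponsibilities = (noor + nmr) > HIGH_OVERRIDES;
        boolean longAndComplex = mloc > AVG_MLOC && ((double) cyclo / mloc) > HIGH_DENSITY;
        return concentratesResponsibilities || longAndComplex;
    }

    public static void main(String[] args) {
        // toolkontroller() from the Beschleunigung feature (Section 2.3):
        // MLoC = 11, Cyclo = 6, NMR = 0, NOOr = 0 -> flagged (branch density ~0.55).
        System.out.println(isGodMethod(0, 0, 11, 6)); // prints "true"
    }
}
```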
Figure 2 depicts the VSD Measures view, which presents the measurement results in a tree view; thus, users can see the value per measure - e.g., the Total of Method's Lines of Code is 3,910. Users can also expand each measure to check the value for a method/component - e.g., the malen() method, from the ExplodierenEffekt.jak component and the explodieren feature, has 5 as its Cyclomatic Complexity value. If the user selects the Save as CSV button (at the top-right position), VSD saves the results in three Comma-Separated Values (CSV) files - one file per group (Table 1) - in the vsd-output folder.

The VSD God Method, VSD God Class, and VSD Shotgun Surgery views are similar. Figure 3 depicts the VSD God Method view. Each view presents the respective strategy with threshold values centered on top, a Save as CSV button at the top-right corner, and one table. The table contains id, feature, component, and method (VSD God Method only) names, an indication of whether the component/method is a refinement (Y =

yes, N = no), and measures. For instance, the strategy ((NOOr+NMR) > 2.39) OR ((MLoC > 10.09) AND ((Cyclo/MLoC) > 0.24)) is centered on top of Figure 3. Line #2 presents the Beschleunigung feature, Tank.jak component, and toolkontroller() method. This method is a refinement (Y) and has: MLoC: 11, Cyclo: 6, NP: 0, NMR: 0, and NOOr: 0. If the user double-clicks a row, VSD opens the respective component in a code editor. In addition, if the user selects Save as CSV, the result is saved in a CSV file whose identifier (name) is the code smell name (e.g., godmethod.csv) in the vsd-output folder.

Figure 2 - VSD Measures View

Figure 3 - VSD God Method View

The threshold values may vary depending on the selected (set of) project(s) (i.e., SPL). VSD has a preferences page, as presented in Figure 4, accessed via Window -> Preferences -> VSD Preferences. In the VSD preferences page, users can define values for each measure in each strategy. The strategies with the acronym and name of each measure are presented. When the user saves or applies changes, VSD updates the values presented in the views, and the new values are used when the user performs a new detection.

3. An Example of VSD Use

We configured VSD with default threshold values to detect the proposed code smells, based on empirical data from eight SPLs [Abilio, 2014]. Using those values, VSD found 64 methods with God Method symptoms in the TankWar SPL. Table 2, Table 3, and Table 4 show samples of methods and components detected by VSD with code smell symptoms in the TankWar SPL.

Figure 4 - VSD Preference Page

Table 2 - Sample of Methods with SPL God Method
Feature | Component | Method | Refinement | MLoC | Cyclo | NMR | NOOr
TankWar | Tank.jak | toolbehandeln(int) | N
Tools | Tank.jak | toolbehandeln(int) | N
Beschleunigung | Tank.jak | toolbehandeln(int) | Y
Bombe | Tank.jak | toolbehandeln(int) | Y
einfrieren | Tank.jak | toolbehandeln(int) | Y
Feuerkraft | Tank.jak | toolbehandeln(int) | Y

Table 3 - Sample of Components with SPL God Class
Feature | Component | Interface | Refinement | LoC | CBO | WMC | NCR
Handy | Maler.jak | N | N
PC | Maler.jak | N | N
fuer_handy | Maler.jak | N | Y
fuer_pc | Maler.jak | N | Y

Table 4 - Sample of Components with SPL Shotgun Surgery
Feature | Component | Interface | Refinement | NOM | NOA | CBO | NCR
Handy | Maler.jak | N | N
PC | Maler.jak | N | N

The sample of methods (Table 2) addresses one method and its refinements. The toolbehandeln(int) method from the Tank.jak component was implemented as an empty method in the TankWar feature, which is the root of the feature model. This method was re-implemented (overridden) in the Tools feature and refined in the Beschleunigung, Bombe, einfrieren, and Feuerkraft features. One possible problem is: if one adds some code to the first method (TankWar feature), it will be overridden and will not be used in the refinement chain, because the second method (Tools feature) completely overrides it. That is, the first method was not refined. The other four refinements were detected with the code smell because their density of branches (Cyclo/MLoC) is higher than the threshold. In fact, these methods seem simple, but the software engineer needs to pay attention to them.

The sample of components (Table 3) detected with SPL God Class shows that the Maler.jak components are very similar between the Handy and PC features and between the fuer_handy and fuer_pc features. This occurred because Handy and PC are

alternative features, and the selection of one of them implies the selection of only one fuer feature. We identified three possible problems with Maler.jak. The first problem is duplicated code, which is itself a code smell. The second is that there are two constants and a large number of refinements adding behavior, and the developer does not know to which constant each refinement belongs until the product is built. Finally, the third problem is that the refinements are very complex and coupled to other components. Observing the code of Maler.jak (PC feature), we noted that this component is responsible for the screen, menus, behavior of menus and keys, and help items, for example. That is, this component concentrates many responsibilities.

Table 4 shows two constants of the TankWar SPL detected with SPL Shotgun Surgery: Maler.jak from the Handy and PC features. These components were also flagged with SPL God Class, and they were detected with SPL Shotgun Surgery because they are coupled to many components and share many methods and attributes with many refinements. By a manual inspection of the code of Maler.jak (PC feature) and its refinements, we noticed that the refinements directly access the attributes of the constant, i.e., they do not use setter and getter methods. Hence, a change in the attributes may be propagated to the refinements. For example, the helpitemerstellen() method instantiates the protected attribute menu in PC, and this method was refined six times to add items to the menu, which is directly accessed in the refinements. That is, changes in the refinements may be needed if Maler.jak is refactored regarding menu.

4. Conclusion

Several tools have been developed to measure properties of software, and measures-based strategies have been proposed to detect code smells in OO and AO software. We developed the Variability Smell Detection (VSD) tool to measure FO software and detect specific code smells in AHEAD-based SPLs. We used VSD with eight SPLs of different sizes, e.g.,
AHEAD-java, with 16,719 lines of code, 963 components, and 838 refinements. In an empirical study, Abilio (2014) verified that the results of VSD detection are in agreement with the results of a manual inspection performed by specialists. Therefore, VSD can save time in the analysis of methods and components, allows software engineers to get a notion of the feature implementation, and allows the identification of code smells.

The first version of VSD only measures AHEAD-based SPLs, but it can be extended to measure SPLs developed with other techniques, such as AspectJ and FeatureHouse; we only need to parse the code into the VSD structure and adapt the measures, if necessary. VSD can also implement strategies to detect variability smells related to feature models, for example. Therefore, our future goal is to improve VSD and perform further empirical studies to evaluate it.

Acknowledgments

This work was partially supported by Capes, CNPq (grant Universal /2013-5) and FAPEMIG (grants APQ and PPM ).

References

Abilio, R. (2014) Detecting Code Smells in Software Product Lines. Master's thesis, Federal University of Lavras (UFLA), 141p.

Apel, S.; Batory, D.; Kastner, C.; Saake, G. (2013) Feature-Oriented Software Product Lines: Concepts and Implementation. Springer, 315p.

Apel, S.; Beyer, D. (2011) Feature Cohesion in Software Product Lines: An Exploratory Study. In: International Conference on Software Engineering.

Batory, D.; Sarvela, J.; Rauschmayer, A. (2003) Scaling Step-Wise Refinement. In: 25th International Conference on Software Engineering.

Blonski, H.; Padilha, J.; Barbosa, M.; Santana, D.; Figueiredo, E. (2013) ConcernMeBS: Metrics-based Detection of Code Smells. In: Brazilian Conference on Software (CBSoft), Tools Session. Brasilia, Brazil.

Chidamber, S.; Kemerer, C. (1994) A Metrics Suite for Object Oriented Design. IEEE Transactions on Software Engineering, v. 20, n. 6.

Figueiredo, E.; Sant'Anna, C.; Garcia, A.; Lucena, C. et al. (2012) Applying and Evaluating Concern-Sensitive Design Heuristics. Journal of Systems and Software, v. 85, n. 2.

Fowler, M.; Beck, K.; Brant, J.; Opdyke, W.; Roberts, D. (1999) Refactoring: Improving the Design of Existing Code. Addison-Wesley, 464p.

Kang, K. C.; Cohen, S. G.; Hess, J. A.; Novak, W. E.; Peterson, A. S. (1990) Feature-Oriented Domain Analysis (FODA) Feasibility Study, Technical Report, SEI.

Kiczales, G.; Lamping, J.; Mendhekar, A.; Maeda, C.; Lopes, C.; Loingtier, J.; Irwin, J. (1997) Aspect-Oriented Programming. In: ECOOP'97.

Lanza, M.; Marinescu, R. (2006) Object-Oriented Metrics in Practice: Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems. Springer, 205p.

Macia, I.; Garcia, A.; Staa, A. von. (2010) Defining and Applying Detection Strategies for Aspect-Oriented Code Smells. In: 24th Brazilian Symposium on Software Engineering.

Marinescu, R. (2004) Detection Strategies: Metrics-Based Rules for Detecting Design Flaws. In: International Conference on Software Maintenance.

McCabe, T. J. (1976) A Complexity Measure. IEEE Transactions on Software Engineering, v. 2, n. 4.

Padilha, J.; Figueiredo, E.; Sant'Anna, C.; Garcia, A. (2013) Detecting God Methods with Concern Metrics: An Exploratory Study.
In: 7th Latin-American Workshop on Aspect-Oriented Software Development.

Padilha, J.; Pereira, J.; Figueiredo, E.; Almeida, J.; Garcia, A.; Sant'Anna, C. (2014) On the Effectiveness of Concern Metrics to Detect Code Smells: An Empirical Study. In: International Conference on Advanced Information Systems Engineering.

Pohl, K.; Bockle, G.; Linden, F. J. van der. (2005) Software Product Line Engineering: Foundations, Principles, and Techniques. Springer, 490p.

Prehofer, C. (1997) Feature-Oriented Programming: A Fresh Look at Objects. In: European Conference on Object-Oriented Programming.

Schulze, S.; Apel, S.; Kastner, C. (2010) Code Clones in Feature-Oriented Software Product Lines. In: 9th International Conference on Generative Programming and Component Engineering.

Schulze, S.; Thüm, T.; Kuhlemann, M.; Saake, G. (2012) Variant-Preserving Refactoring in Feature-Oriented Software Product Lines. In: 6th Workshop on Variability Modeling of Software-Intensive Systems.

Thüm, T.; Kästner, C.; Benduhn, F.; Meinicke, J.; Saake, G.; Leich, T. (2014) FeatureIDE: An Extensible Framework for Feature-Oriented Software Development. Science of Computer Programming, v. 79.

AccTrace: Considering Accessibility in the Software Development Process

Rodrigo Gonçalves de Branco, Maria Istela Cagnin, Debora Maria Barroso Paiva

Faculdade de Computação, Universidade Federal de Mato Grosso do Sul (UFMS), Campo Grande, MS, Brazil

Abstract. Software development processes that do not consider accessibility in their scope can deliver an inaccessible product as a result. In addition, developers may not have the skills to interpret and implement accessibility requirements. This paper presents AccTrace, a CASE tool built as an Eclipse plugin, which delivers to the developer, through the traceability of accessibility requirements and comments in the source code, useful information for the implementation of those requirements. Link to video:

1. Introduction

Providing accessible software remains a challenge today, with several studies in the area [Lazar et al. 2004, Brajnik 2006, Parmanto and Zeng 2005]. Among the difficulties of the problem, we can highlight the identification of accessibility requirements and their subsequent propagation and traceability up to the product construction phase. While there are proposals to integrate usability and accessibility into Software Engineering processes, many developers do not know how to implement such accessible products [Kavcic 2005, Alves 2011].
The use of CASE tools in Software Engineering processes is very common and, in general, increases developer productivity, since they automate some tasks, reducing the effort and time needed to build the solution. In the accessibility area, several specialized tools can be found, such as frameworks, simulators, and validators, among others [Fuertes et al. 2011, Bigham et al. 2010, Votis et al. 2009, Masuwa-Morgan 2008]. However, developers constantly point out problems in these tools and are generally dissatisfied with the support provided by the companies that develop and sell them [Trewin et al. 2010].

Most of the existing tools in this context are used when the product is already in the coding phase. Therefore, it would be useful if accessibility requirements, as soon as they are identified, could be traced to verify whether they are being coded correctly. There are several studies related to requirements traceability in a generic way [Ali et al. 2011, Gotel and Finkelstein 1994, Mader and Egyed 2012], but few studies address accessibility requirements during the software development process [Dias et al. 2010].

This paper presents AccTrace, a CASE tool developed as an Eclipse plugin, which makes it possible to follow the evolution of accessibility requirements up to the coding phase, providing relevant information to the developer for the construction of an accessible product. The tool was built using the MTA [Maia 2010], a software development process based on ISO/IEC [ISO/IEC 1998] that includes accessibility tasks. Using this new approach provided by AccTrace, the relationships between accessibility requirements and UML (Unified Modeling Language [Booch et al. 1996]) models, together with information relevant to the developer, are transformed into comments in the source code, retrieved in real time, and presented to the developer. The tool has an experimental and academic character, is distributed under the Eclipse Public License V1.0, and can be downloaded at https://github.com/rodrigogbranco/acctrace.

This paper is organized as follows: Section 2 discusses the characteristics of the tool, its main functionalities, and potential users. Section 3 discusses the main concepts of the tool architecture, software components, and interfaces. Section 4 describes related tools and work. Finally, Section 5 discusses conclusions and future work.
2. Tool Characteristics

The main functionality of the AccTrace tool is to provide traceability of accessibility requirements, from requirements engineering up to the coding phase. Built as a CASE tool that acts as a plugin for the Eclipse IDE, it works together with other plugins to achieve its goal.

2.1. Theoretical Background

AccTrace uses the MTA software development process [Maia 2010] to define the tool's workflow, mainly subprocesses 4 (Software Requirements Analysis), 5 (Software Design), and 6 (Software Construction). Figure 1 presents the workflow and the high-level traceability scheme, including the accessibility requirements. The AccTrace tool, like the MTA, expects the project to have a role designated Accessibility Specialist. The person in this role is responsible for identifying the accessibility requirements and relating them to the models and to their implementation techniques.

2.2. Functionalities

In general, the AccTrace tool allows (a) the relationships among accessibility requirements, UML models, and accessibility implementation techniques

Figure 1. Detail of the MTA subprocesses [Maia 2010] used to provide traceability of accessibility requirements according to the approach adopted in this work. The artifact with its name in bold is the artifact generated by AccTrace.

to be managed; (b) the traceability matrix of the described relationships to be created; (c) the source code to receive custom comments indicating those relationships; and (d) the developer to retrieve information about those relationships in real time.

2.3. Potential Users

The Accessibility Specialist registers the information about the relationship between accessibility requirements and UML models. He or she adds to this relationship information about accessibility implementation techniques directly in the AccTrace tool. Then, the users who benefit from artifacts such as the Requirements Document and UML models (project managers, requirements engineers, analysts, developers, among others) can use this information, either through the views directly in the tool, through the traceability matrix, or, in the case of developers, through the specific view for retrieving information in the source code.

3. Tool Architecture

AccTrace works together with the following tools: (a) Eclipse as the IDE; (b) Requirement Designer, a requirements management plugin; (c) UML Designer, a UML modeling plugin; and (d) UML to Java Generator, a code generation plugin. The three plugins are distributed by the Obeo company and were chosen to work together with AccTrace because they are interoperable. They are available at. In addition, an essential technology used is the Accessibility Ontology of the Aegis Project [Aegis 2013], which uses the WCAG 2.0 reference document, is made available in OWL format, and maps accessibility guidelines and implementation techniques.
Figure 2 shows the behavior of and the relationships among the tools, technologies, and actors involved in the AccTrace workflow.

Figure 2. Association of the tools and actors in the context of this work

There is no formal interface between the tools and, for that reason, there is no language describing the architecture. The artifacts generated in RDF format by the Requirement Designer and UML Designer plugins are given as input to AccTrace. The tools should be seen as black boxes that receive inputs and produce outputs. AccTrace also uses a file in RDF format as its persistence mechanism, which is used as input to the UML to Java Generator plugin, which was adapted to serve the purposes of this work. The AccTrace class diagram can be seen in Figure 3.

Figure 3. Class diagram of the tool (AccTrace)

The main class is AccTraceModel, which stores the references to the requirements repositories (Repository) as well as reference objects for the associations among requirements, models, and accessibility implementation techniques (Reference). A Reference object references a requirement (Requirement), a UML diagram (EObject), and one or more accessibility implementation techniques, represented here by a selection from the available ontology. References to the ontology are persisted through their IRI (Internationalized Resource Identifier, a generalization of URI, Uniform Resource Identifier) as a String. Moreover, since a project may contain numerous requirements and only the accessibility requirements are relevant for association with accessibility implementation techniques, the model also provides for requirement filters (RequirementFilter), so as not to clutter the visualization of requirements in the tool.

The tool has three main views, as shown in Figure 4. In the editor (AccTrace Editor - 2) it is possible to change the requirements repositories and create the associations among UML models, requirements, and implementation techniques. In the requirements view (Requirement Associations - 1) it is possible to see which requirements associated with the UML model were selected in the editor. In the view of the already linked techniques (Accessibility Specifications View - 3) it is possible to see the implementation techniques already associated, according to the UML model selected in the editor and the accessibility requirement selected in the requirements view. It is also possible to remove the associated implementation techniques. The three views are important for the correct operation of the tool.

Figure 4. View of the AccTrace tool in the main Eclipse window

Once the UML model and the requirement are selected, the accessibility implementation technique can be associated by right-clicking the UML model, as illustrated in Figure 5. The accessibility implementation techniques are mapped in the ontology provided by the Aegis Project [Aegis 2013]. This ontology is the repository of accessibility implementation techniques and maps the domain. These techniques are linked to the requirements and UML models, and their information, stored in the repository, is retrieved in the specific view. Since the artifacts are described in RDF format (requirements, UML models, and the ontology), the links are made through the RDF:ID element. In practice, any UML model described in RDF format can be linked and traced through the traceability matrix and the views in Eclipse.
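The class model described in Figure 3 can be sketched as plain Java; the class names follow those in the text (AccTraceModel, Repository, Reference, RequirementFilter), while the constructors and the use of String for the requirement and model identifiers are simplifications, since the real tool uses EMF objects (EObject) and the Obeo plugins' types.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the AccTrace persistence model described in Section 3.
public class AccTraceModel {
    // Requirements repositories referenced by the model.
    final List<Repository> repositories = new ArrayList<>();
    // Associations among requirements, UML models, and techniques.
    final List<Reference> references = new ArrayList<>();
    // Filters so only accessibility requirements are shown.
    final List<RequirementFilter> filters = new ArrayList<>();
}

class Repository {
    final String location; // e.g., path to an RDF requirements file
    Repository(String location) { this.location = location; }
}

class Reference {
    final String requirementId;  // RDF:ID of the requirement
    final String umlElementId;   // RDF:ID of the UML element (an EObject in the real tool)
    final List<String> techniqueIris = new ArrayList<>(); // ontology IRIs persisted as Strings
    Reference(String requirementId, String umlElementId) {
        this.requirementId = requirementId;
        this.umlElementId = umlElementId;
    }
}

class RequirementFilter {
    final String pattern; // hides requirements that are not accessibility-related
    RequirementFilter(String pattern) { this.pattern = pattern; }
}
```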
For the comments in the source code, however, only the UML models that are received as input by the code generation plugin, such as class diagrams, have an effect. Once the relationship among the requirements, UML models, and accessibility

Figure 5. Procedure for associating the accessibility implementation technique

implementation techniques is defined, the traceability matrix can be generated automatically by the AccTrace tool in ODS format (Open Document Spreadsheet, equivalent to a Microsoft Excel spreadsheet). Part of this matrix can be seen in Figure 6, which shows the relationship between requirements and UML models. The matrix must be generated using the wizard built for this purpose, accessed through the File, New, and then Other... options in the menu bar, selecting the Traceability Matrix file wizard option.

Figure 6. Part of the traceability matrix generated by the AccTrace tool

For code generation, the UML to Java Generator plugin was customized to receive as input, besides the UML models, the AccTrace RDF file. As the source code is generated, the plugin checks whether there is an accessibility relationship for the UML model used as its basis. If so, a custom comment is generated, matching the following Java regular expression:

String regex = "//!ACCTRACE!(/)?([^/\\\\0#]+(/)?)+#([^\\*\\*/])+";

Figure 7 shows a comment based on the regular expression above, already translated in the view built for this purpose, in which it is possible to observe information such as the requirement, the UML model, and which techniques are referenced in the comment. A proof of concept of AccTrace can be found in [Branco 2013].
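As an illustration, the sketch below applies the regular expression above to locate an AccTrace comment in generated source and recover its model path and ontology fragment; the comment text and the parsing helper are hypothetical examples, not taken from the tool itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccTraceCommentScanner {
    // The custom-comment pattern used by the adapted UML to Java Generator.
    static final Pattern ACCTRACE =
            Pattern.compile("//!ACCTRACE!(/)?([^/\\\\0#]+(/)?)+#([^\\*\\*/])+");

    // Returns "path#fragment" of the first AccTrace comment, or null if absent.
    public static String firstReference(String sourceLine) {
        Matcher m = ACCTRACE.matcher(sourceLine);
        if (!m.find()) {
            return null;
        }
        // Strip the marker prefix, leaving the model path and ontology IRI fragment.
        return m.group().substring("//!ACCTRACE!".length());
    }

    public static void main(String[] args) {
        // Hypothetical generated line; the path and fragment are illustrative.
        String line = "//!ACCTRACE!model/Tank.uml#accessibilityTechnique";
        System.out.println(firstReference(line)); // prints "model/Tank.uml#accessibilityTechnique"
    }
}
```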

Figure 7. Display of the selected comment

4. Related Tools and Work

There are already initiatives, mainly corporate ones, that make it possible to link the elicited requirements to the artifacts of the development process, for example, building traceability reports using the IBM Rational Software Architect, IBM Rational RequisitePro, and BIRT for WebSphere programs [Hovater 2008]. The Enterprise Architect software from Sparx Systems allows the use of requirements diagrams, which are extensions of the traditional UML diagrams, enabling traceability of the model. However, no work was found in the literature that deals specifically with the traceability of accessibility requirements within the software development process. Furthermore, we did not find how those alternatives allow the traceability information of the requirements to be retrieved starting from the source code of the solution.

5. Conclusions

This study showed that it is possible to specify, before the coding phase and linked to the models and accessibility requirements, the implementation techniques that should be seen by the programmers. The use of the predefined ontology of the Aegis project [Aegis 2013] helped to achieve this goal, extending the aforementioned implementation techniques to approaches, guidelines, success criteria, etc. The final product is expected to have better accessibility, since the information about its implementation is available during the software development process. As a limitation, only accessibility requirements can be used, because the domain, in the form of an ontology, is mapped this way.
Note that the AccTrace tool does not retrieve, from arbitrary source code, the traceability information about requirements, UML models, and implementation techniques, since at the code level the information needed for data retrieval is the custom AccTrace comment, which will not be present in such projects. The tool also requires the involvement of an Accessibility Specialist so that the accessibility information is recorded at the right time, a professional who may not be available in the product development team.

Some activities can be identified for future work: (a) use AccTrace in a real project as a case study; (b) improve the usability of the tool, improving the presented messages and taking advantage of the relationships in the Aegis project ontology; (c) extend the scope of this work to include software and system testing and integration tasks (subtasks 7, 8, 9, and 10 of the MTA); (d) extend the requirements traceability matrix to include the test cases described in the previous item; (e) implement queries and visualizations of the traceable artifacts, retrievable through the links made by the RDF:ID elements of the artifacts; and (f) replace the

HTML entity strings retrieved from the ontology when they are presented in the views, thus solving the problem of the unknown characters shown in Figure 7.

References

Aegis (2013). Aegis ontology. content&view=article&id=107&itemid=65. Accessed in May.

Ali, N., Gueheneuc, Y., and Antoniol, G. (2011). Trust-based requirements traceability. In 19th IEEE ICPC, Kingston, Ontario, Canada.

Alves, D. D. (2011). Acessibilidade no desenvolvimento de software livre. Master's thesis, UFMS. 135 pages.

Bigham, J. P., Brudvik, J. T., and Zhang, B. (2010). Accessibility by demonstration: enabling end users to guide developers to web accessibility solutions. In Proceedings of the 12th International ACM SIGACCESS, ASSETS '10, pages 35-42, New York, NY, USA. ACM.

Booch, G., Rumbaugh, J., and Jacobson, I. (1996). The Unified Modeling Language: selections from OOPSLA 96. Tutorial 37.

Brajnik, G. (2006). Web accessibility testing: When the method is the culprit. In ICCHP.

Branco, R. G. d. (2013). Acessibilidade nas fases de engenharia de requisitos, projeto e codificação de software: Uma ferramenta de apoio. Master's thesis, UFMS. 95 pages.

Dias, A. L., de Mattos Fortes, R. P., Masiero, P. C., and Goularte, R. (2010). Uma revisão sistemática sobre a inserção de acessibilidade nas fases de desenvolvimento da engenharia de software em sistemas web. In Proceedings of the IX Symposium on Human Factors in Computing Systems, IHC '10, pages 39-48, Porto Alegre, Brazil. Sociedade Brasileira de Computação.

Fuertes, J. L., Gutiérrez, E., and Martínez, L. (2011). Developing Hera-FFX for WCAG 2.0. In Proceedings of the International Cross-Disciplinary Conference on Web Accessibility, W4A '11, pages 3:1-3:9, New York, NY, USA. ACM.

Gotel, O. C. Z. and Finkelstein, A. C. W. (1994). An analysis of the requirements traceability problem.
Em Proceedings of the First International Conference on Requirements Engineering, páginas , Colorado Springs, Colorado, Estados Unidos. Hovater, S. (2008). Uml-requirements traceability using ibm rational requisitepro, ibm rational software architect, and birt, part 1: Reporting requirements. rational/tutorials/dw-r-umltracebirt/dw-r-umltracebirt-pdf.pdf. Acessado em Maio de ISO/IEC (1998). ISO/IEC Standard for Informational Technology - Software Lifecycle Processes. ISO/IEC, 1, ch. de la Voie-Creuse - CP 56 - CH-1211 Geneva 20 - Switzerland. Kavcic, A. (2005). Software accessibility: Recommendations and guidelines. Em The International Conference on Computer as a Tool, EUROCON 2005, volume 2, páginas , Belgrado, Sérvia. Lazar, J., Dudley-Sponaugle, A., e Greenidge, K.-D. (2004). Improving web accessibility: a study of webmaster perceptions. Computers in Human Behavior, 20(2): Mader, P. e Egyed, A. (2012). Assessing the effect of requirements traceability for software maintenance. Em 28th IEEE ICSM, páginas , Trento, Itália. Maia, L. S. (2010). Um processo para o desenvolvimento de aplicações web acessíveis. Dissertação de Mestrado, UFMS. 94 páginas. Masuwa-Morgan, K. (2008). Introducing accessonto: Ontology for accessibility requirements specification. Em First International Workshop on ONTORACT, páginas Parmanto, B. e Zeng, X. (2005). Metric for web accessibility evaluation. JASIST, 56(13): Trewin, S., Cragun, B., Swart, C., Brezin, J., e Richards, J. (2010). Accessibility challenges and tool features: an ibm web developer perspective. Em Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A), W4A 10, páginas 1 10, New York, NY, USA. ACM. Votis, K., Oikonomou, T., Korn, P., Tzovaras, D., e Likothanassis, S. (2009). A visual impaired simulator to achieve embedded accessibility designs. Em IEEE International Conference on ICIS, volume 3, páginas

Spider-RM: A Tool to Support Risk Management in Software Projects

Heresson João Pampolha de Siqueira Mendes¹, Bleno Wilson Franklin Vale da Silva², Diego Oliveira Abreu², Diogo Adriel Lima Ferreira², Manoel Victor Rodrigues Leite², Marcos Senna Benaion Leal², Sandro Ronaldo Bezerra Oliveira¹,²

¹Graduate Program in Computer Science (PPGCC), Universidade Federal do Pará (UFPA), Rua Augusto Corrêa, 01, Guamá, Belém, PA, Brazil

²Faculdade de Computação, Instituto de Ciências Exatas e Naturais, Universidade Federal do Pará (UFPA)

{heresson, blenofvale, diegooabreu, cetlho, victor.ufpaa,

Abstract. This paper presents a software tool named Spider-RM, a desktop solution that supports risk management according to software quality models. Its main purpose is to systematize risk management best practices, reducing the execution time of tasks and enhancing learning by those involved.

Tool Video.

1. Introduction

To become competitive, software development organizations must deliver products that satisfy customer needs in a way that ensures trust and satisfaction [SOFTEX 2012]. Consequently, quality is an indispensable attribute to be considered throughout the software production process [KOSCIANSKI and SOARES 2007]. In addition, software development has a non-repeatable aspect, which makes this activity unpredictable [KOSCIANSKI and SOARES 2007]. Risk management aims to handle the uncertainties of a project proactively, preventing them from becoming problems that jeopardize the execution of the project as planned. Some of the main quality models for software projects present best practices for effective risk management, among them MR-MPS-SW [SOFTEX 2012], PMBOK [PMI 2013], ISO/IEC 12207 [ABNT 2009], and CMMI-DEV [SEI 2010].
Risk management is usually recommended at the higher maturity levels of an organization and, given the limited practical experience at those levels in Brazil [SOFTEX 2014], a software tool that systematizes and speeds up the tasks of this process becomes important, reducing costs and facilitating learning.

The paper is organized as follows: Section 2 presents the tool's architecture; Section 3 presents its main functionalities; Section 4 discusses some implementation aspects; Section 5 reports a validation study; Section 6 presents related work; and, finally, Section 7 presents conclusions and future work.

2. Architecture

The tool was built to be used in desktop mode and, initially, supports a single user, the Project Manager or Project Leader. This approach was chosen to meet the needs of the company where the Case Study was carried out. Nevertheless, its architecture was designed to accommodate constant evolution and can later be adapted to a web, multi-user environment.

The architecture of Spider-RM (Risk Management) is based on a combination of the three-layer architecture and the MVC model. Events are handled by controllers, which mediate between the user interface and the entities modeled in the database. The main gain of this approach is the ease of maintaining and adding new features, such as changing the user interfaces or the native database. To keep the code as readable as possible, standardize the development team's understanding, and reduce the cost of future maintenance, the Facade and DAO design patterns [Gamma et al. 2000] were also adopted, isolating the business layer from the view and persistence layers. The architecture also integrates Spider-RM with Redmine [Redmine 2014], giving the other members of the project team visibility into risk management.

3. Main Functionalities

This section presents the main functionalities of Spider-RM, which set it apart from other available tools.

3.1. Creation of an Organizational Policy

The tool allows an organizational policy to be stored as text, or an existing document of the organization to be attached, for future consultation. The Organizational Policy defines guidelines on the scope of application of the Risk Management process within the organization's structure.
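The layered separation described in Section 2, with a Facade isolating the business layer from DAO-backed persistence, can be sketched as follows. All class names and bodies here are illustrative assumptions, not Spider-RM's actual code; the real tool persists to MySQL, while this sketch uses an in-memory map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Persistence layer: the DAO hides how risks are stored (MySQL in
// Spider-RM; an in-memory map here, for illustration only).
class RiskDAO {
    private final Map<Integer, String> storage = new HashMap<>();
    private int nextId = 1;

    int insert(String description) {
        storage.put(nextId, description);
        return nextId++;
    }

    List<String> findAll() {
        return new ArrayList<>(storage.values());
    }
}

// Business layer: the Facade is the single entry point used by the
// controllers, so the view layer never touches the DAO directly.
public class RiskFacade {
    private final RiskDAO dao = new RiskDAO();

    public int registerRisk(String description) {
        if (description == null || description.isEmpty()) {
            throw new IllegalArgumentException("risk needs a description");
        }
        return dao.insert(description);
    }

    public List<String> listRisks() {
        return dao.findAll();
    }

    public static void main(String[] args) {
        RiskFacade facade = new RiskFacade();
        facade.registerRisk("Key developer may leave the project");
        System.out.println(facade.listRisks().size()); // prints 1
    }
}
```

The payoff of this arrangement is the one claimed in the paper: swapping MySQL for another store, or the desktop view for a web view, only touches one layer.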
Centralizing this information speeds up planning and monitoring, reducing effort during project execution.

3.2. Management of Multiple Simultaneous Projects

The tool was designed to easily manage multiple projects, showing information for each one in detail or in comparison. In aggregate form, it also allows the evaluation of completed projects and the analysis of new risk categories identified during a new project. These evaluations are important inputs for building an organizational historical base and for guiding future projects.

3.3. Creation of a Risk Management Plan

As with the organizational policy, a project's risk plan can be entered as text or attached as an existing document. In addition, a dedicated functionality allows milestones and control points to be entered, with the tool creating tasks tied to these important dates.

3.4. Management of the Risk Breakdown Structure

A widely used resource in risk management is the Risk Breakdown Structure (RBS) [PMI 2013], which accumulates category information to support later analysis and mitigation. The tool provides an RBS template and allows it to be edited according to the organization's needs. Each project has its own RBS, independent of the organizational one, and may contain new categories or omit categories of the organizational RBS. After a project is concluded, newly identified categories may be institutionalized.

3.5. Risk Identification and Analysis

The tool supports risk identification and detailed analysis through: subconditions, which signal occurrence; the recording of relations between identified risks; and the calculation of a risk's degree of severity.

3.6. Recording and Monitoring of Mitigation and Contingency Plans

For each identified risk, mitigation and/or contingency plans can be defined, guiding monitoring throughout the project life cycle. A mitigation plan is defined as a task at a given milestone or control point of the project, while a contingency plan appears as a pending task once all subconditions of a risk have occurred.

3.7. Risk Monitoring

Besides planning, Spider-RM provides full support for risk monitoring, allowing: tracking of changes in a project's risks; inclusion of new risks that arise during project execution; free choice of which risks to monitor and prioritize; and a history of risk occurrences.

3.8. Task Management

Much of the monitoring relies on tasks. Mitigation plans are defined as tasks at milestones or control points and, if not carried out, are flagged as pending tasks. In this way it is possible to control which risks need to be mitigated, merely monitored, or even handled through contingency.

4. Implementation

Spider-RM was implemented in the Java programming language, under the GPL (General Public License), targeting specifically the Risk Management process and adhering to the best practices recommended by MR-MPS-SW, CMMI-DEV, ISO/IEC 12207, and PMBOK. It is a desktop environment, and its development relied on free software tools such as the Ubuntu 14.04 operating system, the NetBeans IDE 7.4, and the MySQL database, used both for data persistence and for communication between the local server and the tool.

5. Case Study

To evaluate the systematization of the risk management process, the tool was used at a software development company in Recife whose projects are certified at CMMI Level 3, where projects deal with product evolution with a focus on decision processes.
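The risk analysis behavior described in Section 3 can be sketched as below. Note that the severity formula (probability times impact) is an assumption for illustration; the paper only states that the tool computes a degree of severity. The subcondition rule, however, follows the text: a risk is detected as occurred only when all of its subconditions hold, which then triggers its contingency plan as a pending task.

```java
import java.util.List;

// Hypothetical model of a risk with subconditions, in the spirit of
// Sections 3.5-3.7. The severity formula (probability x impact) is an
// assumption; the paper does not specify how severity is computed.
public class Risk {
    private final double probability; // 0.0 .. 1.0
    private final double impact;      // e.g. a 1 .. 5 scale
    private final List<Boolean> subconditions;

    public Risk(double probability, double impact, List<Boolean> subconditions) {
        this.probability = probability;
        this.impact = impact;
        this.subconditions = subconditions;
    }

    public double severity() {
        return probability * impact;
    }

    // A risk is detected as "occurred" (triggering its contingency plan
    // as a pending task) only when every subcondition has been observed.
    public boolean hasOccurred() {
        return subconditions.stream().allMatch(Boolean::booleanValue);
    }

    public static void main(String[] args) {
        Risk r = new Risk(0.4, 5.0, List.of(true, true, false));
        System.out.println(r.severity());    // 2.0
        System.out.println(r.hasOccurred()); // false
    }
}
```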

The pilot project team consisted of three managers and a technical team of forty-eight members. The managers have more than seven years of experience in this role, hold PMBOK and Scrum certifications, and two of them took part in CMMI implementations at other organizations in Recife.

Initially, a standard organizational risk breakdown structure was created. The tool already offers a suggestion, as shown in Figure 1, but this suggestion was adapted to the user's needs. A new project was then created by entering its basic information, attaching a document with the risk plan, and mapping the organizational risk breakdown structure onto the project's risk categories.

Figure 1. Organizational Risk Breakdown Structure in the Spider-RM Tool

The second step was to add the milestones and control points already planned for the pilot project and to begin identifying risks. From the identification of risks and the analysis of probability and impact, shown in Figure 2, the tool automatically derives a priority, although the priority order can be adjusted by the user as needed. A sample of the prioritized risks was selected for monitoring during project execution at the milestones and control points, generating tasks for those days, which are flagged as pending if not carried out. Mitigation and contingency plans were created for the monitored risks. In addition, cause-effect relations with other risks were identified, along with subconditions to be evaluated to detect occurrence during monitoring.

In a second iteration, new risks were identified and entered into the tool to be analyzed, and a new risk prioritization was performed. During monitoring, mitigation plans were executed, followed by a new analysis to check for a reduction in the probability or impact of each mitigated risk. Risks whose subconditions had all been met were detected as occurred, triggering their contingency plans as new pending tasks. Figure 3 shows the monitoring of one of the occurred risks.

Figure 2. Registration of New Risks in the Spider-RM Tool

Figure 3. Monitoring of Risk Subconditions in the Spider-RM Tool

After the project was concluded, its evaluation was recorded in the tool, covering the strengths, weaknesses, and improvement opportunities identified, so as to guide future projects whose risks will be managed. Figure 4 shows the evaluation screen for completed projects.

After the project's conclusion, the team of specialists who used the tool highlighted several positive results, such as: the visibility that risk monitoring gives over conditions and over mitigation and contingency tasks; the clear planning of risks at milestones and control points of the schedule; a knowledge base about risks, maintained per project and shared by all managers; the definition of an RBS for projects and for the organization; and support for organizational improvement based on MPS.BR, CMMI, and ISO/IEC 12207. The three participants in the experiment also requested adjustments to non-functional requirements, such as usability, portability, and maintainability, which will be addressed in the next versions of Spider-RM. At the Case Study company, other managers also used Spider-RM; as these professionals were still in training, this allowed us to mentor them on risk management through the use of the tool, and we observed a clear understanding of this process area.

Figure 4. Evaluation of Completed Projects in the Spider-RM Tool

6. Related Work

Among the risk management tools found, TRIMS, CRAMM, and RiskFree were used as a basis for comparison. TRIMS was developed by the BMP Center of Excellence as part of the PMWS (Program Manager's WorkStation) product suite and is owned by the Department of the Navy of the United States Government [TRIMS 2014]. CRAMM (CCTA Risk Analysis and Management Method) was developed by the British CCTA (Central Communication and Telecommunication Agency) of the UK government [Yazar 2002]. Finally, the RiskFree risk management tool was developed at PUC-RS; it is based on the best practices of PMBOK and adheres to the CMMI model [Knob 2006].

The main differentiators of Spider-RM with respect to the related work are: its grounding in software process quality models; the ability to define and customize the RBS; risk prioritization; the identification of conditions and tasks for risk monitoring; and the sharing of risks across different projects. For further detail, Table 1 compares the functionalities of TRIMS, CRAMM, RiskFree, and Spider-RM.

The functionalities were grouped into categories, and each tool was marked: "S" if it provides the functionality; "P" if it provides the functionality partially, i.e., with restrictions; "N" if the functionality does not exist; and "D" if the reference analyzed for the tool does not provide this information.

Table 1. Comparison of the Main Functionalities of Spider-RM and the Related Tools

Functionality | TRIMS | CRAMM | RiskFree | Spider-RM

Organizational
Easy access to the Organizational Policy | N | N | N | S
Evaluation of Completed Projects | N | N | P | S
Evaluation of Added Risk Categories | N | N | N | S
Management of Multiple Simultaneous Projects | S | D | S | S

Risk Management Planning
Insertion of the Risk Management Plan for Easy Access | N | N | S | S
Definition and Customization of the Risk Breakdown Structure (Risk Categories) | S | D | N | S
Definition of Milestones and Control Points | N | N | N | S
Insertion and Control of Mitigation Plans | S | P | S | S
Insertion and Control of Contingency Plans | N | N | S | S

Risk Analysis
Identification of Subconditions to Monitor Risk Occurrence | P | N | N | S
Identification of Relations between Risks | N | N | N | S
Flexible Risk Prioritization | N | N | N | S
Calculation of a Risk's Degree of Severity | S | S | S | S
Customization of a Risk's Degree of Severity | P | N | N | S

Risk Monitoring
Tracking of Changes in Risks during the Project | S | P | S | S
Identification of Occurred Risks | S | D | S | S
Free Choice of Risks to be Monitored | N | N | N | S
History of Risk Occurrences | S | N | D | S
History of Occurrences in the Whole Project | N | D | D | S
History of Changes to Risks | N | N | N | S

Task Management
Presentation of Pending Tasks | P | D | N | S
History of Completed Tasks | N | D | S | S
Presentation of Tasks to be Performed at a Control Point or Milestone | N | N | N | S

7. Final Remarks

The focus of this work was a study of risk management in quality models, supporting the development of a software tool that systematizes the best practices of those models. The tool was used in an experimental project, which identified possible adjustments.

Spider-RM aims to reduce costs and speed up the implementation of the risk management process in software development organizations. The organization as a whole benefits, gaining better control of risk-related tasks. Managers with little risk-related experience can also more easily deploy this process in their projects, in line with the main software process quality models.

As future work, we intend to: (1) promote the use of the tool in other real software projects, covering different development scenarios, especially in organizations seeking certification in quality models; (2) evolve the tool to support a web, multi-user environment; (3) integrate it with tools that support the implementation of other software processes, such as project management, requirements management, etc.

8. Acknowledgments

This work is financially supported by CAPES, through an institutional master's scholarship granted to PPGCC-UFPA, and by SPIDER, through undergraduate research scholarships. This project is part of the SPIDER-UFPA Project (www.spider.ufpa.br) [OLIVEIRA et al. 2011].

References

ABNT - Associação Brasileira de Normas Técnicas (2009) NBR ISO/IEC 12207: Engenharia de Sistemas de Software - Processos de Ciclo de Vida de Software.

Gamma, E. et al. (2000) "Padrões de Projetos - Soluções Reutilizáveis de Software Orientado a Objetos". Bookman.

Knob, F. et al. (2006) "RiskFree - Uma Ferramenta de Gerenciamento de Riscos Baseada no PMBOK e Aderente ao CMMI". Anais do V Simpósio Brasileiro de Qualidade de Software - SBQS, Vila Velha, ES.

Koscianski, A. and Soares, M. S. (2007) "Qualidade de Software". São Paulo, Novatec, 2nd ed.

Oliveira, S. R. B. et al. (2011) "SPIDER - Uma Proposta de Solução Sistêmica de um SUITE de Ferramentas de Software Livre de Apoio à Implementação do Modelo MPS.BR". Revista do Programa Brasileiro da Qualidade e Produtividade em Software, SEPIN-MCT, 2nd edition, Brasília-DF.

PMI - Project Management Institute (2013) "A Guide to the Project Management Body of Knowledge". Campus Boulevard, Newton Square, 5th Edition.

Redmine (2014) "Ferramenta Web Flexível para Gerenciamento de Projetos". Accessed August 1, 2014.

SEI - Software Engineering Institute (2010) "Capability Maturity Model Integration (CMMI) for Development", Version 1.3, Carnegie Mellon, USA.

SOFTEX - Associação para Promoção da Excelência do Software Brasileiro (2012) "Melhoria do Processo de Software Brasileiro (MPS.BR) - Guia Geral 2012". Brasil.

SOFTEX - Associação para Promoção da Excelência do Software Brasileiro (2014) "Avaliações MPS-SW (Software) Publicadas (prazo de validade: 3 anos)". MPSSW-Publicadas_29.JAN_.2014_5331.pdf. Accessed February 2.

TRIMS (2014) Risk management tool. Accessed May 18.

Yazar, Z. (2002) "A qualitative risk analysis and management tool: CRAMM". SANS Institute - GSEC. doi= &rep=rep1&type=pdf. Last accessed May 18.

A Tool to Generate Natural Language Text from Business Process Models

Raphael de Almeida Rodrigues¹, Leonardo Guerreiro Azevedo¹,², Kate Revoredo¹, Henrik Leopold³

¹Graduate Program in Informatics (PPGI), Federal University of the State of Rio de Janeiro (UNIRIO), Av. Pasteur, 456, Urca, Rio de Janeiro, RJ, Brazil

²IBM Research - Brazil, Av. Pasteur 146, Botafogo, Rio de Janeiro, RJ, Brazil

³WU Vienna, Welthandelsplatz 1, 1020 Vienna, Austria

{raphael.rodrigues, azevedo,

Abstract. Today, many organizations extensively model their business processes using standardized notations such as the Business Process Model and Notation (BPMN). However, such notations are often very specific and are not necessarily intuitive for domain experts. This paper addresses this problem by providing a flexible technique for automatically transforming BPMN process models into natural language text. In this way, people with no or limited knowledge of process modeling are enabled to read and understand the information captured by a process model. The presented version supports the transformation of English as well as Portuguese models. However, the technique is flexible, and extensions for other languages can be implemented with reasonable effort.

* Tool's presentation video:

1. Introduction

Business process models provide an abstract graphical view of organizational procedures by reducing the complex reality to a number of activities. By doing so, they help to foster an understanding of the underlying organizational procedures, serve as process documentation, and represent an important asset for organizational redesign [Larman 2005]. In order to depict business processes, many companies use specific notations, such as BPMN [Ko et al. 2009], which was developed and standardized by the Object Management Group [OMG 2011]. While these notations are useful in many different scenarios, fully understanding the semantics of a process model still represents a challenge for many employees.
If the reader is not familiar with the wide range of concepts (e.g., gateways, events, or actors), parts of the process may remain unclear. For example, domain experts usually do not have the necessary skills to read the process models designed by business analysts [Dumas et al. 2013]. Training employees in understanding process models is costly and can hardly be considered an option for the entire workforce of a company.

In this paper, we follow up on prior work [Leopold et al. 2012a, Leopold et al. 2014] and present a tool that implements a language-independent framework (e.g., Portuguese, English) capable of generating natural language texts from BPMN process models [Rodrigues 2013]¹. In order to demonstrate the capabilities of our tool, we present a proof-of-concept implementation based on English process models. It shows that the texts generated by our tool fully describe the input process models and, thus, that it is possible to understand business process models without being familiar with the employed process modeling notation. Our tool has the potential to increase the benefits that can be derived from process modeling, as discussions based on text tend to be more productive than discussions based on models [Castro et al. 2011]. Furthermore, it may significantly increase the audience of process models, as understanding a process model is no longer bound to knowledge of a specific notation.

The remainder of this work is structured as follows. Section 2 presents the pipeline concept of natural language generation systems. Section 3 presents the framework. Section 4 presents an example of natural language generation in English. Finally, Section 5 presents the conclusion and future work.

2. The Pipeline Approach for Text Generation

As pointed out by Reiter and Dale [Reiter and Dale 1997], many natural language generation systems follow a pipeline approach consisting of three main steps:

Text Planning: The information to be communicated in the text is determined, and the order in which this information will be conveyed is specified.

Sentence Planning: Specific words are chosen to express the information determined in the preceding phase. Where applicable, messages are aggregated and pronouns are introduced in order to obtain variety.

Sentence Realization: The messages are transformed into grammatically correct sentences.

Figure 1. Tool's execution process [Leopold et al. 2012a].

Figure 1 illustrates how we adapted the pipeline architecture for generating text from process models.
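The three pipeline stages above can be sketched as a chain of transformations. Every name and the trivial stage bodies below are illustrative only, not the tool's actual API; the point is the shape of the pipeline, where each stage consumes the previous stage's output.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative three-stage NLG pipeline in the spirit of Reiter and Dale:
// text planning -> sentence planning -> surface realization.
public class NlgPipeline {

    // Text planning: decide what to say and in which order.
    static List<String> planText(List<String> modelFacts) {
        return modelFacts; // a trivial plan: keep the model's order
    }

    // Sentence planning: choose words and insert discourse markers.
    static List<String> planSentences(List<String> messages) {
        return messages.stream()
                .map(m -> "then " + m)
                .collect(Collectors.toList());
    }

    // Surface realization: produce grammatical sentences.
    static String realize(List<String> messages) {
        return messages.stream()
                .map(m -> Character.toUpperCase(m.charAt(0)) + m.substring(1) + ".")
                .collect(Collectors.joining(" "));
    }

    public static String generate(List<String> modelFacts) {
        return realize(planSentences(planText(modelFacts)));
    }

    public static void main(String[] args) {
        System.out.println(generate(List.of("the clerk checks the order",
                                            "the clerk ships the goods")));
        // prints: Then the clerk checks the order. Then the clerk ships the goods.
    }
}
```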
In total, it consists of six components:

Linguistic Information Extraction: In this component, we use the linguistic label analysis presented in [Leopold et al. 2012b] to decompose the differing formats of process model element labels. In this way, for instance, we are able to decompose an activity label such as "Inform customer about problem" into the action "inform", the business object "customer", and the addition "about problem".

¹ Available for download at:
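The decomposition described above can be sketched naively as below. The real component relies on the linguistic analysis of [Leopold et al. 2012b]; this version is only an assumption-laden illustration that treats a label as "verb object [preposition ...]" and splits on a small, hard-coded preposition list.

```java
// Naive sketch of splitting an activity label such as
// "Inform customer about problem" into action, business object and
// addition. The real tool uses proper linguistic analysis; this sketch
// assumes a "verb object [preposition ...]" label format.
public class LabelDecomposer {
    static final String[] PREPOSITIONS = {"about", "for", "to", "of", "from"};

    public static String[] decompose(String label) {
        String[] words = label.trim().split("\\s+");
        String action = words[0].toLowerCase();
        StringBuilder object = new StringBuilder();
        StringBuilder addition = new StringBuilder();
        boolean inAddition = false;
        for (int i = 1; i < words.length; i++) {
            if (!inAddition) {
                for (String p : PREPOSITIONS) {
                    if (words[i].equalsIgnoreCase(p)) { inAddition = true; break; }
                }
            }
            StringBuilder target = inAddition ? addition : object;
            if (target.length() > 0) target.append(' ');
            target.append(words[i].toLowerCase());
        }
        return new String[] { action, object.toString(), addition.toString() };
    }

    public static void main(String[] args) {
        String[] parts = decompose("Inform customer about problem");
        System.out.println(parts[0]); // inform
        System.out.println(parts[1]); // customer
        System.out.println(parts[2]); // about problem
    }
}
```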

Annotated RPST Generation: This component derives a tree representation (Refined Process Structure Tree - RPST [Vanhatalo et al. 2009]) from the process model in order to provide a basis for a step-by-step process description.

Text Structuring: After deriving the RPST, we annotate each element with the linguistic information obtained in the previous phase.

DSynT-Message Generation: The message generation component maps the annotated RPST elements to a list of intermediate messages. More specifically, each sentence is stored as a Deep-Syntactic Tree (DSynT) [Mel'čuk and Polguère 1987]. DSynTs facilitate the manageable yet comprehensive storage of the constituents of a sentence.

Message Refinement: This component takes care of message aggregation, referring expression generation (e.g., replacing the role "analyst" with "he"), and discourse marker insertion (e.g., "afterwards" or "subsequently"). The need for these measures arises if the considered process contains long sequences of tasks.

Surface Realization: This component transforms the intermediate messages into grammatically correct sentences. This is accomplished by systematically mapping the generated DSynTs to the corresponding grammatical entities.

To the best of our knowledge, we are the first to propose a tool for generating natural language text from business process models that can be adapted to a wide range of languages. In prior work, the base generation technique has been introduced [Leopold et al. 2012a, Leopold et al. 2014], but it does not support languages other than English. There are, however, tools that address the opposite direction, i.e., works on generating models (process models [Friedrich et al. 2011], ontologies [Leão et al. 2013], and UML diagrams [Bajwa and Choudhary 2006]) from natural language text. While all these techniques address different challenges, they mainly differ from our technique by using real natural language text as input.
The main challenge in generating text from process models is to adequately analyze the existing natural language fragments from the process model elements and to organize the information from the process model in a sequential fashion.

3. The NLG Tool

According to Pree, our tool can be classified as an application framework. Application frameworks consist of ready-to-use and semi-finished building blocks, and the overall architecture is predefined as well. Producing specific applications usually means adjusting building blocks to specific needs by overriding some methods in subclasses [Pree 1994]. The tool architecture (Figure 2) is composed of several ready-to-use building blocks (known as frozen spots [Pree 1994]) and defines interfaces which must be implemented to support specific languages. Each interface represents a hot spot [Pree 1994], because it is flexible enough to satisfy specific needs (in our case, generating text in a specific language). The architecture's frozen spots are represented by classes, while the hot spots are represented by interfaces (elements stereotyped as interface).

The GeneralLanguageCommon package (Figure 2) is the generic (language-independent) module. It includes the interface definitions, which must be implemented for a specific language in order to generate natural language text, i.e., it is the Natural Language Generation core. It contains the necessary infrastructure to work with the NLG pipeline process. It includes the data structures, and it knows exactly which object must be called, and when, to handle a specific phase of the pipeline. For example, regarding the Localization strategy (represented by the classes of the Localization package), the module knows when to call the LocalizationManager object to translate a specific message, retrieved from the LocalizationMessages enumeration (keys) during text information extraction. E.g., for the key PROCESS_BEGIN_WHEN, the returned text for Portuguese would be "O processo começa quando". Analogously, it knows when to trigger each interface method implemented for a given language at runtime.

Figure 2. Tool Architecture - UML Package diagram.

Figure 3 presents a package diagram of the hot spot implementations for Portuguese and English. They are named Realizers, since they realize the implementations of the GeneralLanguageCommon package interfaces. Each language has its own specific implementation (e.g., the PortugueseLabelHelper class implements ILabelHelper). Thus, the PortugueseRealizer and EnglishRealizer classes implement all architecture hot spots and use frozen spots to accomplish the necessary tasks.
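The key-to-text lookup performed by LocalizationManager can be sketched as a per-language dictionary. Only the PROCESS_BEGIN_WHEN key and its Portuguese text come from the example above; the class internals, the constructor signature, and the second key are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Localization strategy: an enumeration of message keys
// and a manager that resolves them against a per-language dictionary.
// Only PROCESS_BEGIN_WHEN is taken from the paper; the rest is invented
// for illustration.
public class LocalizationManager {
    public enum LocalizationMessages { PROCESS_BEGIN_WHEN, PROCESS_END }

    private final Map<LocalizationMessages, String> dictionary = new HashMap<>();

    public LocalizationManager(String language) {
        if (language.equals("pt")) {
            dictionary.put(LocalizationMessages.PROCESS_BEGIN_WHEN,
                           "O processo começa quando");
            dictionary.put(LocalizationMessages.PROCESS_END,
                           "O processo termina");
        } else { // default: English
            dictionary.put(LocalizationMessages.PROCESS_BEGIN_WHEN,
                           "The process begins when");
            dictionary.put(LocalizationMessages.PROCESS_END,
                           "The process ends");
        }
    }

    public String translate(LocalizationMessages key) {
        return dictionary.get(key);
    }

    public static void main(String[] args) {
        LocalizationManager pt = new LocalizationManager("pt");
        System.out.println(pt.translate(LocalizationMessages.PROCESS_BEGIN_WHEN));
        // prints: O processo começa quando
    }
}
```

Supporting a new language then amounts to supplying one more dictionary, without touching the pipeline core.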

Figure 3. Implementation of the hot spots defined by the architecture.

With the definition of the NLG core module, the developer does not need to know in detail how the NLG process works. The module assures that a natural language text will be produced for the given language, as long as the interfaces and their methods are implemented according to the specification. The components of this model are described as follows.

LabelAnalysis: extracts linguistic information from process model labels. The interfaces defined in this package (hot spots) must be implemented for each supported language, i.e., all the linguistic classification algorithms must be implemented for each language. For example, an algorithm to identify that "assess" is the verb in the label "application assessment".

Fragments: represents sentences in natural language patterns. The classes defined in this package are frozen spots.

DSynt: maps the information from the process into DSynT trees. The classes defined in this package are frozen spots.

Localization: defines the logic needed to access language-specific dictionaries. It also provides the common functionality for fetching the translation of a given word. For example, the LocalizationManager class is used to fetch messages from the dictionary, which are then used in the final text representation.

ISurfaceRealizer: defines the contract that all language-specific realizers must implement to produce messages in natural language format. In a nutshell, the implementation of this interface for a specific language must be able to read the textual information from the nodes of a given DSynT tree and assemble a grammatically correct sentence.

LanguageConfig (in the MultiLanguageProject package): a Factory that creates objects from the classes implementing the language-specific logic interfaces for a given language (e.g., Portuguese or English). (A factory is a program component whose main responsibility is the creation of other objects.)

4. Example of Generation of Natural Language Text

As an example of the use of our tool, consider a scenario where a business expert wants to validate a process model written in BPMN by a business process analyst. When the business expert looks at the model (depicted in Figure 4), he realizes that he has never

used the notation before and cannot understand what the analyst tried to express with the process model. Instead of trying to learn the BPMN notation, the business expert runs the tool using the given process model as input (samples of business process models in Portuguese are presented in Appendix B of the document available at azevedo/bpm2nlg/rebuttal_apendix.pdf). The tool then outputs a natural language text (depicted in Figure 5). With the natural language representation of the model, the business expert reads the text and realizes that the analyst did not model a guest age verification activity before the processing of an alcoholic beverage order. He decides to engage in a discussion with the system analyst. During the discussion, both realize that this information is indeed missing and the expert asks the analyst to add it to the model. The business expert was able to identify a flaw in the model only by reading the textual representation in a natural language format.

Figure 4. Business Process Model sample using the BPMN notation.

Figure 5. Tool's output: Natural language text generated from the BPMN process model.

5. Conclusion

Process models are frequently used in various organizations for understanding, documenting, and visualizing the tasks they perform. Through the approach of Natural Language Generation (NLG), we enable non-technical users to understand process models without knowing the notation that was used for designing them. This paper presented a tool that builds on NLG techniques to generate natural language text from BPMN process models. The tool's architecture is flexible enough to support other languages (e.g., Spanish or German). Currently, the tool covers several elements of the BPMN notation (the supported elements are presented in Appendix C of the document available at azevedo/bpm2nlg/rebuttal_apendix.pdf). The running example demonstrates that texts are generated correctly. Those texts can be used to align the knowledge of business experts and system analysts.

The tool was implemented in the Java language and is composed of:
207 Java classes;
4 Java projects;
62 Java libraries, including WordNet (English corpus) and RealPro (English sentence realization);
a Portuguese corpus with 22,821 nouns; 32 adverbs and conjunctions; 230,637 verbs; 39 prepositions; 50 pronouns; 63,344 adjectives; and 8 articles, mostly gathered from the Floresta corpus [Afonso et al. 2002].

As future work, we suggest using the tool in a real business scenario. By presenting the generated text to non-technical users, we can learn about the usefulness of our tool. In particular, questionnaires could be applied to evaluate whether the texts generated from the models were sufficient to understand the process. We further suggest adding new languages to the tool. For a new language, it is necessary to implement the operations defined in the specific interfaces (package LabelAnalysis). Suitable candidates are, among others, German and Spanish.

References

Afonso, S., Bick, E., Haber, R., and Santos, D. (2002). Floresta sintá(c)tica: A treebank for Portuguese. In LREC.

Bajwa, I. S. and Choudhary, M. A. (2006). Natural language processing based automated system for UML diagrams generation. In The 18th Saudi National Computer Conf. on Computer Science (NCC18). Riyadh, Saudi Arabia: The Saudi Computer Society (SCS).

Geração automática de suíte de teste para GUI a partir de Rede de Petri

Geração automática de suíte de teste para GUI a partir de Rede de Petri Raquel Jauffret Guilhon Geração automática de suíte de teste para GUI a partir de Rede de Petri Dissertação de Mestrado Dissertação apresentada como requisito parcial para obtenção do grau de Mestre pelo

Leia mais

UNIVERSIDADE DE SÃO PAULO FACULDADE DE EDUCAÇÃO JOÃO FÁBIO PORTO. Diálogo e interatividade em videoaulas de matemática

UNIVERSIDADE DE SÃO PAULO FACULDADE DE EDUCAÇÃO JOÃO FÁBIO PORTO. Diálogo e interatividade em videoaulas de matemática UNIVERSIDADE DE SÃO PAULO FACULDADE DE EDUCAÇÃO JOÃO FÁBIO PORTO Diálogo e interatividade em videoaulas de matemática São Paulo 2010 JOÃO FÁBIO PORTO Diálogo e interatividade em videoaulas de matemática

Leia mais

Análise Probabilística de Semântica Latente aplicada a sistemas de recomendação

Análise Probabilística de Semântica Latente aplicada a sistemas de recomendação Diogo Silveira Mendonça Análise Probabilística de Semântica Latente aplicada a sistemas de recomendação Dissertação de Mestrado Dissertação apresentada como requisito parcial para obtenção do título de

Leia mais

Easy Linux! FUNAMBOL FOR IPBRICK MANUAL. IPortalMais: a «brainware» company www.iportalmais.pt. Manual

Easy Linux! FUNAMBOL FOR IPBRICK MANUAL. IPortalMais: a «brainware» company www.iportalmais.pt. Manual IPortalMais: a «brainware» company FUNAMBOL FOR IPBRICK MANUAL Easy Linux! Title: Subject: Client: Reference: Funambol Client for Mozilla Thunderbird Doc.: Jose Lopes Author: N/Ref.: Date: 2009-04-17 Rev.:

Leia mais

User interface evaluation experiences: A brief comparison between usability and communicability testing

User interface evaluation experiences: A brief comparison between usability and communicability testing User interface evaluation experiences: A brief comparison between usability and communicability testing Kern, Bryan; B.S.; The State University of New York at Oswego kern@oswego.edu Tavares, Tatiana; PhD;

Leia mais

Tese / Thesis Work Análise de desempenho de sistemas distribuídos de grande porte na plataforma Java

Tese / Thesis Work Análise de desempenho de sistemas distribuídos de grande porte na plataforma Java Licenciatura em Engenharia Informática Degree in Computer Science Engineering Análise de desempenho de sistemas distribuídos de grande porte na plataforma Java Performance analysis of large distributed

Leia mais

A Cloud Computing Architecture for Large Scale Video Data Processing

A Cloud Computing Architecture for Large Scale Video Data Processing Marcello de Lima Azambuja A Cloud Computing Architecture for Large Scale Video Data Processing Dissertação de Mestrado Dissertation presented to the Postgraduate Program in Informatics of the Departamento

Leia mais

Software product lines. Paulo Borba Informatics Center Federal University of Pernambuco

Software product lines. Paulo Borba Informatics Center Federal University of Pernambuco Software product lines Paulo Borba Informatics Center Federal University of Pernambuco Software product lines basic concepts Paulo Borba Informatics Center Federal University of Pernambuco Um produto www.usm.maine.edu

Leia mais

Digital Cartographic Generalization for Database of Cadastral Maps

Digital Cartographic Generalization for Database of Cadastral Maps Mariane Alves Dal Santo marianedalsanto@udesc.br Francisco Henrique de Oliveira chicoliver@yahoo.com.br Carlos Loch cloch@ecv.ufsc.br Laboratório de Geoprocessamento GeoLab Universidade do Estado de Santa

Leia mais

A meus pais, Ari e Célia, sempre presentes, todo o meu amor incondicional!

A meus pais, Ari e Célia, sempre presentes, todo o meu amor incondicional! ii A meus pais, Ari e Célia, sempre presentes, todo o meu amor incondicional! iii Agradeço à Deus, esta força maior, pela vida, pela sabedoria e pelo amor. Mas, sobretudo, por me ensinar saber fazer ser

Leia mais

ESPECIFICAÇÃO DO AMBIENTE EXPSEE SEGUNDO O MÉTODO CATALYSIS

ESPECIFICAÇÃO DO AMBIENTE EXPSEE SEGUNDO O MÉTODO CATALYSIS ESPECIFICAÇÃO DO AMBIENTE EXPSEE SEGUNDO O MÉTODO CATALYSIS RESUMO Este artigo apresenta a especificação de um sistema gerenciador de workflow, o ExPSEE, de acordo com a abordagem de desenvolvimento baseado

Leia mais

Uma Análise da História do VEM, WBVS e WMSWM

Uma Análise da História do VEM, WBVS e WMSWM VEM Uma Análise da História do VEM, WBVS e WMSWM Renato Novais, Thiago S. Mendes, Fernando Teles Instituto Federal da Bahia (IFBA) Salvador Bahia Brasil {renato,thiagosouto,fernandoteles}@ifba.edu.br Abstract.

Leia mais

NORMAS PARA AUTORES. As normas a seguir descritas não dispensam a leitura do Regulamento da Revista Portuguesa de Marketing, disponível em www.rpm.pt.

NORMAS PARA AUTORES. As normas a seguir descritas não dispensam a leitura do Regulamento da Revista Portuguesa de Marketing, disponível em www.rpm.pt. NORMAS PARA AUTORES As normas a seguir descritas não dispensam a leitura do Regulamento da Revista Portuguesa de Marketing, disponível em www.rpm.pt. COPYRIGHT Um artigo submetido à Revista Portuguesa

Leia mais

INFORMATION SECURITY IN ORGANIZATIONS

INFORMATION SECURITY IN ORGANIZATIONS INFORMATION SECURITY IN ORGANIZATIONS Ana Helena da Silva, MCI12017 Cristiana Coelho, MCI12013 2 SUMMARY 1. Introduction 2. The importance of IT in Organizations 3. Principles of Security 4. Information

Leia mais

Easy Linux! FUNAMBOL FOR IPBRICK MANUAL. IPortalMais: a «brainmoziware» company www.iportalmais.pt. Manual Jose Lopes

Easy Linux! FUNAMBOL FOR IPBRICK MANUAL. IPortalMais: a «brainmoziware» company www.iportalmais.pt. Manual Jose Lopes IPortalMais: a «brainmoziware» company www.iportalmais.pt FUNAMBOL FOR IPBRICK MANUAL Easy Linux! Title: Subject: Client: Reference: Funambol Client for Microsoft Outlook Doc.: Author: N/Ref.: Date: 2009-04-17

Leia mais

Software reliability analysis by considering fault dependency and debugging time lag Autores

Software reliability analysis by considering fault dependency and debugging time lag Autores Campos extraídos diretamente Título Software reliability analysis by considering fault dependency and debugging time lag Autores Huang, Chin-Yu and Lin, Chu-Ti Ano de publicação 2006 Fonte de publicação

Leia mais

Uma Abordagem para a Avaliação de Processos de Desenvolvimento de Software Baseada em Risco e Conformidade

Uma Abordagem para a Avaliação de Processos de Desenvolvimento de Software Baseada em Risco e Conformidade Rafael de Souza Lima Espinha Uma Abordagem para a Avaliação de Processos de Desenvolvimento de Software Baseada em Risco e Conformidade Dissertação de Mestrado Dissertação apresentada como requisito parcial

Leia mais

Banca examinadora: Professor Paulo N. Figueiredo, Professora Fátima Bayma de Oliveira e Professor Joaquim Rubens Fontes Filho

Banca examinadora: Professor Paulo N. Figueiredo, Professora Fátima Bayma de Oliveira e Professor Joaquim Rubens Fontes Filho Título: Direção e Taxa (Velocidade) de Acumulação de Capacidades Tecnológicas: Evidências de uma Pequena Amostra de Empresas de Software no Rio de Janeiro, 2004 Autor(a): Eduardo Coelho da Paz Miranda

Leia mais

Engenharia Reversa para Recuperação de Modelos de Sistemas Desenvolvidos em PL/SQL

Engenharia Reversa para Recuperação de Modelos de Sistemas Desenvolvidos em PL/SQL Engenharia Reversa para Recuperação de Modelos de Sistemas Desenvolvidos em PL/SQL Rodnei Couto 1, Luana Lachtermacher 1, Soeli Fiorini 1, Akeo Tanabe 1, Gustavo Carvalho 1, Arndt von Staa 1, Ricardo Choren

Leia mais

O que é modularidade? Sérgio Soares scbs@cin.ufpe.br

O que é modularidade? Sérgio Soares scbs@cin.ufpe.br O que é modularidade? Sérgio Soares scbs@cin.ufpe.br AOSD Aspectos tem como objetivo aumentar a modularidade dos sistemas...... mas 2 O que é modularidade??? 3 Parnas, 1972 modularization is a mechanism

Leia mais

OVERVIEW DO EAMS. Enterprise Architecture Management System 2.0

OVERVIEW DO EAMS. Enterprise Architecture Management System 2.0 OVERVIEW DO EAMS Enterprise Architecture Management System 2.0 NETWORKS @arqcorp_br #eamsrio http://arquiteturacorporativa.wordpress.com/ WE MANAGE KNOWLEDGE, WITH YOU Arquitetura Empresarial Repositório

Leia mais

WebUML: Uma Ferramenta Colaborativa de Apoio ao Projeto e Análise de Sistemas Descritos em Classes UML

WebUML: Uma Ferramenta Colaborativa de Apoio ao Projeto e Análise de Sistemas Descritos em Classes UML Carlos Henrique Pereira WebUML: Uma Ferramenta Colaborativa de Apoio ao Projeto e Análise de Sistemas Descritos em Classes UML Florianópolis - SC 2007 / 2 Resumo O objetivo deste trabalho é especificar

Leia mais

Uma arquitetura baseada em agentes de software para a automação de processos de gerênciadefalhasemredesde telecomunicações

Uma arquitetura baseada em agentes de software para a automação de processos de gerênciadefalhasemredesde telecomunicações Adolfo Guilherme Silva Correia Uma arquitetura baseada em agentes de software para a automação de processos de gerênciadefalhasemredesde telecomunicações Dissertação de Mestrado Dissertação apresentada

Leia mais

Mestrado em Ciências Jurídicas Especialização em História do Direito 2015-16

Mestrado em Ciências Jurídicas Especialização em História do Direito 2015-16 Mestrado em Ciências Jurídicas Especialização em História do Direito Unidade curricular História do Direito Português I (1º sem). Docente responsável e respectiva carga lectiva na unidade curricular Prof.

Leia mais

Rafael Jessen Werneck de Almeida Martins. Recomendação de pessoas em redes sociais com base em conexões entre usuários

Rafael Jessen Werneck de Almeida Martins. Recomendação de pessoas em redes sociais com base em conexões entre usuários Rafael Jessen Werneck de Almeida Martins Recomendação de pessoas em redes sociais com base em conexões entre usuários Dissertação de Mestrado Dissertação apresentada como requisito parcial para a obtenção

Leia mais

Analysis, development and monitoring of business processes in Corporate environment

Analysis, development and monitoring of business processes in Corporate environment Analysis, development and monitoring of business processes in Corporate environment SAFIRA is an IT consulting boutique known for transforming the way organizations do business, or fulfil their missions,

Leia mais

Contribution of the top boat game for learning production engineering concepts

Contribution of the top boat game for learning production engineering concepts Contribution of the top boat game for learning production engineering concepts Carla Sena Batista, Fabiana Lucena Oliveira, Enily Vieira do Nascimento, Viviane Da Silva Costa Novo Research Problem: How

Leia mais

Information technology specialist (systems integration) Especialista em tecnologia da informação (integração de sistemas)

Information technology specialist (systems integration) Especialista em tecnologia da informação (integração de sistemas) Information technology specialist (systems integration) Especialista em tecnologia da informação (integração de sistemas) Professional activities/tasks Design and produce complex ICT systems by integrating

Leia mais

BR-EMS MORTALITY AND SUVIVORSHIP LIFE TABLES BRAZILIAN LIFE INSURANCE AND PENSIONS MARKET

BR-EMS MORTALITY AND SUVIVORSHIP LIFE TABLES BRAZILIAN LIFE INSURANCE AND PENSIONS MARKET BR-EMS MORTALITY AND SUVIVORSHIP LIFE TABLES BRAZILIAN LIFE INSURANCE AND PENSIONS MARKET 2015 1 e-mail:mario@labma.ufrj.br Tables BR-EMS, mortality experience of the Brazilian Insurance Market, were constructed,

Leia mais

DPI. Núcleo de Apoio ao Desenvolvimento de Projetos e Internacionalização Project Development And Internationalization Support Office

DPI. Núcleo de Apoio ao Desenvolvimento de Projetos e Internacionalização Project Development And Internationalization Support Office DPI Núcleo de Apoio ao Desenvolvimento de Projetos e Internacionalização Project Development And Internationalization Support Office Apresentação/Presentation Criado em 1 de março de 2011, o Núcleo de

Leia mais

Project Management Activities

Project Management Activities Id Name Duração Início Término Predecessoras 1 Project Management Activities 36 dias Sex 05/10/12 Sex 23/11/12 2 Plan the Project 36 dias Sex 05/10/12 Sex 23/11/12 3 Define the work 15 dias Sex 05/10/12

Leia mais

DISSERTAÇÃO DE MESTRADO

DISSERTAÇÃO DE MESTRADO Otavio Rezende da Silva Uma Arquitetura para Sistemas Multi- Agentes Baseada em Espaços de Tuplas Reflexivos DISSERTAÇÃO DE MESTRADO Programa de Pós-Graduação em Informática Rio de Janeiro, dezembro de

Leia mais

MODELAGEM VISUAL DE UM SOFTWARE PARA O GERENCIAMENTO DAS COMUNICAÇÕES EM GESTÃO DE PROJETOS

MODELAGEM VISUAL DE UM SOFTWARE PARA O GERENCIAMENTO DAS COMUNICAÇÕES EM GESTÃO DE PROJETOS 127 MODELAGEM VISUAL DE UM SOFTWARE PARA O GERENCIAMENTO DAS COMUNICAÇÕES EM GESTÃO DE PROJETOS VISUAL MODELING OF SOFTWARE FOR COMMUNICATION MANAGEMENT IN PROJECT MANAGEMENT Ricardo Rall 1 Arilson José

Leia mais

T Ã O B O M Q U A N T O N O V O

T Ã O B O M Q U A N T O N O V O D I S S E R T A Ç Ã O D E M E S T R A D O M A S T E R I N G D I S S E R T A T I O N A V A L I A Ç Ã O D A C O N D I Ç Ã O D E T Ã O B O M Q U A N T O N O V O U M A A P L I C A Ç Ã O E N V O L V E N D O

Leia mais

Métodos Formais em Engenharia de Software. VDMToolTutorial

Métodos Formais em Engenharia de Software. VDMToolTutorial Métodos Formais em Engenharia de Software VDMToolTutorial Ana Paiva apaiva@fe.up.pt www.fe.up.pt/~apaiva Agenda Install Start Create a project Write a specification Add a file to a project Check syntax

Leia mais

Frameworks orientados a objetos. Por Sergio Crespo

Frameworks orientados a objetos. Por Sergio Crespo Frameworks orientados a objetos Por Sergio Crespo Frameworks O que é um Framework??? Um framework é um conjunto de classes que constitui um design abstrato para soluções de uma família de problemas - Johnson

Leia mais

Leonardo Pereira Rodrigues dos Santos

Leonardo Pereira Rodrigues dos Santos Leonardo Pereira Rodrigues dos Santos Desenvolvimento de serviços na área de educação: uma aplicação de análise conjunta nos cursos de mestrado em administração de empresas DISSERTAÇÃO DE MESTRADO DEPARTAMENTO

Leia mais

Modelo de Controle de Acesso no Projeto de Aplicações na Web Semântica

Modelo de Controle de Acesso no Projeto de Aplicações na Web Semântica Mairon de Araújo Belchior Modelo de Controle de Acesso no Projeto de Aplicações na Web Semântica Dissertação de Mestrado Dissertação apresentada como requisito parcial para obtenção do título de Mestre

Leia mais

Ficha de unidade curricular Curso de Doutoramento

Ficha de unidade curricular Curso de Doutoramento Ficha de unidade curricular Curso de Doutoramento Unidade curricular História do Direito Português I (Doutoramento - 1º semestre) Docente responsável e respectiva carga lectiva na unidade curricular Prof.

Leia mais

Redes Neurais na Manutenção Preditiva de Caminhões Fora de Estrada

Redes Neurais na Manutenção Preditiva de Caminhões Fora de Estrada Felipe Miana de Faria Furtado Redes Neurais na Manutenção Preditiva de Caminhões Fora de Estrada Dissertação de Mestrado Dissertação apresentada como requisito parcial para obtenção do grau de Mestre pelo

Leia mais

Aplicação de um Metamodelo de Contexto a uma Tarefa de Investigação Policial

Aplicação de um Metamodelo de Contexto a uma Tarefa de Investigação Policial Aplicação de um Metamodelo de Contexto a uma Tarefa de Investigação Policial Lucas A. de Oliveira, Rui A. R. B. Figueira, Expedito C. Lopes Mestrado em Sistemas e Computação Universidade de Salvador (UNIFACS)

Leia mais

Dealing with Device Data Overflow in the Cloud

Dealing with Device Data Overflow in the Cloud Jaumir Valença da Silveira Junior Dealing with Device Data Overflow in the Cloud Dissertação de Mestrado Dissertation presented to the Programa de Pós- Graduação em Informática of the Departamento de Informática,

Leia mais

Proposta de Modelo de Desenvolvimento de Sistema de Medição de Desempenho Logístico

Proposta de Modelo de Desenvolvimento de Sistema de Medição de Desempenho Logístico Winston Carvalho Santana Proposta de Modelo de Desenvolvimento de Sistema de Medição de Desempenho Logístico DISSERTAÇÃO DE MESTRADO DEPARTAMENTO DE ENGENHARIA INDUSTRIAL Programa de Pós-Graduação Profissional

Leia mais

Transformação de um Modelo de Empresa em Requisitos de Software

Transformação de um Modelo de Empresa em Requisitos de Software Transformação de um Modelo de Empresa em Requisitos de Software Fábio Levy Siqueira 1 and Paulo Sérgio Muniz Silva 2 1 Programa de Educação Continuada da Poli-USP, São Paulo, Brazil 2 Escola Politécnica

Leia mais

Informática e Programação. Computer Science and Programming. Semestre do plano de estudos 1

Informática e Programação. Computer Science and Programming. Semestre do plano de estudos 1 Nome UC Informática e Programação CU Name Código UC 4 Curso LEC Semestre do plano de estudos 1 Área científica Informática Duração Semestral Horas de trabalho 135 ECTS 5 Horas de contacto TP - 67,5 Observações

Leia mais

Serviços: API REST. URL - Recurso

Serviços: API REST. URL - Recurso Serviços: API REST URL - Recurso URLs reflectem recursos Cada entidade principal deve corresponder a um recurso Cada recurso deve ter um único URL Os URLs referem em geral substantivos URLs podem reflectir

Leia mais

Wiki::Score A Collaborative Environment For Music Transcription And Publishing

Wiki::Score A Collaborative Environment For Music Transcription And Publishing Wiki::Score A Collaborative Environment For Music Transcription And Publishing J.J. Almeida 1 N.R. Carvalho 1 J.N. Oliveira 1 1 Department of Informatics, University of Minho {jj,narcarvalho,jno}@di.uminho.pt

Leia mais

Indicadores de Pesquisa, Desenvolvimento e Inovação (P,D&I) em Software e Serviços de TI: o Caso da Lei do Bem (nº 11.196/05)

Indicadores de Pesquisa, Desenvolvimento e Inovação (P,D&I) em Software e Serviços de TI: o Caso da Lei do Bem (nº 11.196/05) Universidade de Brasília Indicadores de Pesquisa, Desenvolvimento e Inovação (P,D&I) em Software e Serviços de TI: o Caso da Lei do Bem (nº 11.196/05) Rafael Henrique Rodrigues Moreira BRASÍLIA 2014 Universidade

Leia mais

Guião M. Descrição das actividades

Guião M. Descrição das actividades Proposta de Guião para uma Prova Grupo: Inovação Disciplina: Inglês, Nível de Continuação, 11.º ano Domínio de Referência: O Mundo do trabalho Duração da prova: 15 a 20 minutos 1.º MOMENTO Guião M Intervenientes

Leia mais

A tangibilidade de um serviço de manutenção de elevadores

A tangibilidade de um serviço de manutenção de elevadores A tangibilidade de um serviço de manutenção de elevadores Tese de Mestrado em Gestão Integrada de Qualidade, Ambiente e Segurança Carlos Fernando Lopes Gomes INSTITUTO SUPERIOR DE EDUCAÇÃO E CIÊNCIAS Fevereiro

Leia mais

Cowboys, Ankle Sprains, and Keepers of Quality: How Is Video Game Development Different from Software Development?

Cowboys, Ankle Sprains, and Keepers of Quality: How Is Video Game Development Different from Software Development? Cowboys, Ankle Sprains, and Keepers of Quality: How Is Video Game Development Different from Software Development? Emerson Murphy-Hill Thomas Zimmermann and Nachiappan Nagappan Guilherme H. Assis Abstract

Leia mais

Semestre do plano de estudos 1

Semestre do plano de estudos 1 Nome UC Inglês CU Name Código UC 6 Curso LEC Semestre do plano de estudos 1 Área científica Gestão Duração Semestral Horas de trabalho 54 ECTS 2 Horas de contacto TP - 22,5 Observações n.a. Docente responsável

Leia mais

Automated Control in Cloud Computing: Challenges and Opportunities

Automated Control in Cloud Computing: Challenges and Opportunities Automated Control in Cloud Computing: Challenges and Opportunities Harold C. Lim¹, Shivnath Babu¹, Jeffrey S. Chase², Sujay S. Parekh² Duke University, NC, USA¹, IBM T.J. Watson Research Center² ACDC '09

Leia mais

METODOLOGIAS ESTATÍSTICAS APLICADAS A DADOS DE ANÁLISES QUÍMICAS DA ÁGUA PRODUZIDA EM UM CAMPO MADURO DE PETRÓLEO

METODOLOGIAS ESTATÍSTICAS APLICADAS A DADOS DE ANÁLISES QUÍMICAS DA ÁGUA PRODUZIDA EM UM CAMPO MADURO DE PETRÓLEO UNIVERSIDADE FEDERAL DO RIO GRANDE DO NORTE CENTRO DE TECNOLOGIA CT CENTRO DE CIÊNCIAS EXATAS E DA TERRA CCET PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIA E ENGENHARIA DE PETRÓLEO - PPGCEP DISSERTAÇÃO DE MESTRADO

Leia mais

Interoperability through Web Services: Evaluating OGC Standards in Client Development for Spatial Data Infrastructures

Interoperability through Web Services: Evaluating OGC Standards in Client Development for Spatial Data Infrastructures GeoInfo - 2006 Interoperability through Web Services: Evaluating OGC Standards in Client Development for Spatial Data Infrastructures Leonardo Lacerda Alves Clodoveu A. Davis Jr. Information Systems Lab

Leia mais

5/10/10. Implementação. Building web Apps. Server vs. client side. How to create dynamic contents?" Client side" Server side"

5/10/10. Implementação. Building web Apps. Server vs. client side. How to create dynamic contents? Client side Server side 5/10/10 Implementação Mestrado em Informática Universidade do Minho! 6! Building web Apps How to create dynamic contents?" Client side" Code runs on the client (browser)" Code runs on a virtual machine

Leia mais

Solicitação de Mudança 01

Solicitação de Mudança 01 Solicitação de Mudança 01 Refatorar a especificação da linha de produtos Crisis Management System permitindo que o suporte ao registro de LOG seja opcional. Isso significa que o comportamento descrito

Leia mais

CMDB no ITIL v3. Miguel Mira da Silva. mms@ist.utl.pt 919.671.425

CMDB no ITIL v3. Miguel Mira da Silva. mms@ist.utl.pt 919.671.425 CMDB no ITIL v3 Miguel Mira da Silva mms@ist.utl.pt 919.671.425 1 CMDB v2 Configuration Management IT components and the services provided with them are known as CI (Configuration Items) Hardware, software,

Leia mais

Marcelo Novaes Coutinho. Um Processo de Gerência de Estratégia de Rastreabilidade: Um Caso em Ambiente Oracle. Dissertação de Mestrado

Marcelo Novaes Coutinho. Um Processo de Gerência de Estratégia de Rastreabilidade: Um Caso em Ambiente Oracle. Dissertação de Mestrado Marcelo Novaes Coutinho Um Processo de Gerência de Estratégia de Rastreabilidade: Um Caso em Ambiente Oracle Dissertação de Mestrado Dissertação apresentada como requisito parcial para obtenção do grau

Leia mais

Online Collaborative Learning Design

Online Collaborative Learning Design "Online Collaborative Learning Design" Course to be offered by Charlotte N. Lani Gunawardena, Ph.D. Regents Professor University of New Mexico, Albuquerque, New Mexico, USA July 7- August 14, 2014 Course

Leia mais

Guião A. Descrição das actividades

Guião A. Descrição das actividades Proposta de Guião para uma Prova Grupo: Ponto de Encontro Disciplina: Inglês, Nível de Continuação, 11.º ano Domínio de Referência: Um Mundo de Muitas Culturas Duração da prova: 15 a 20 minutos 1.º MOMENTO

Leia mais

Engenharia de Requisitos. Professor: Dr. Eduardo Santana de Almeida Universidade Federal da Bahia esa@dcc.ufba.br

Engenharia de Requisitos. Professor: Dr. Eduardo Santana de Almeida Universidade Federal da Bahia esa@dcc.ufba.br Engenharia de Requisitos Professor: Dr. Eduardo Santana de Almeida Universidade Federal da Bahia esa@dcc.ufba.br O Documento de Requisitos Introdução The requirements for a system are the descriptions

Leia mais

Uma Arquitetura de Linha de Produto Baseada em Componentes para Sistemas de Gerenciamento de Workflow

Uma Arquitetura de Linha de Produto Baseada em Componentes para Sistemas de Gerenciamento de Workflow Uma Arquitetura de Linha de Produto Baseada em Componentes para Sistemas de Gerenciamento de Workflow Itana M. S. Gimenes 1 itana@din.uem.br Fabrício R. Lazilha 2 fabricio@cesumar.br Edson A. O. Junior

Leia mais

Intellectual Property. IFAC Formatting Guidelines. Translated Handbooks

Intellectual Property. IFAC Formatting Guidelines. Translated Handbooks Intellectual Property IFAC Formatting Guidelines Translated Handbooks AUTHORIZED TRANSLATIONS OF HANDBOOKS PUBLISHED BY IFAC Formatting Guidelines for Use of Trademarks/Logos and Related Acknowledgements

Leia mais

Neutron Reference Measurements to Petroleum Industry

Neutron Reference Measurements to Petroleum Industry LABORATÓRIO NACIONAL DE METROLOGIA DAS RADIAÇÕES IONIZANTES IRD- Instituto de Radioproteção e Dosimetria Neutron Reference Measurements to Petroleum Industry Karla C. de Souza Patrão, Evaldo S. da Fonseca,

Leia mais

Multicriteria Impact Assessment of the certified reference material for ethanol in water

Multicriteria Impact Assessment of the certified reference material for ethanol in water Multicriteria Impact Assessment of the certified reference material for ethanol in water André Rauen Leonardo Ribeiro Rodnei Fagundes Dias Taiana Fortunato Araujo Taynah Lopes de Souza Inmetro / Brasil

Leia mais

Conformação Arquitetural. com DCLcheck. Defina as dependências aceitáveis e inaceitáveis de acordo com a arquitetura planejada de seu sistema

Conformação Arquitetural. com DCLcheck. Defina as dependências aceitáveis e inaceitáveis de acordo com a arquitetura planejada de seu sistema dclcheck_ Conformação Arquitetural com DCLcheck Defina as dependências aceitáveis e inaceitáveis de acordo com a arquitetura planejada de seu sistema Arquitetura de software é geralmente definida como

Leia mais

ANALYSIS OF THE APPLICATION OF THE LADM IN THE BRAZILIAN URBAN CADASTRE: A CASE STUDY FOR THE CITY OF ARAPIRACA BRAZIL

ANALYSIS OF THE APPLICATION OF THE LADM IN THE BRAZILIAN URBAN CADASTRE: A CASE STUDY FOR THE CITY OF ARAPIRACA BRAZIL Federal University of Pernambuco Recife PE - Brazil ANALYSIS OF THE APPLICATION OF THE LADM IN THE BRAZILIAN URBAN CADASTRE: A CASE STUDY FOR THE CITY OF ARAPIRACA BRAZIL Juciela C. SANTOS and Andrea F.T

Leia mais

Ontology Building Process: The Wine Domain

Ontology Building Process: The Wine Domain Ontology Building Process: The Wine Domain João Graça, Márcio Mourão, Orlando Anunciação, Pedro Monteiro, H. Sofia Pinto, and Virgílio Loureiro Summary Context Ontology Wine Domain Existing Wine Ontologies

Leia mais

MARCELO DE LIMA BRAZ REDUÇÃO DA QUANTIDADE DE REPROCESSO NO SETOR DE PRODUÇÃO DE CALDOS ALIMENTÍCIOS NA EMPRESA DO RAMO ALIMENTÍCIO (ERA).

MARCELO DE LIMA BRAZ REDUÇÃO DA QUANTIDADE DE REPROCESSO NO SETOR DE PRODUÇÃO DE CALDOS ALIMENTÍCIOS NA EMPRESA DO RAMO ALIMENTÍCIO (ERA). MARCELO DE LIMA BRAZ REDUÇÃO DA QUANTIDADE DE REPROCESSO NO SETOR DE PRODUÇÃO DE CALDOS ALIMENTÍCIOS NA EMPRESA DO RAMO ALIMENTÍCIO (ERA). Poços de Caldas / MG 2014 MARCELO DE LIMA BRAZ REDUÇÃO DA QUANTIDADE

Leia mais

Course Computer Science Academic year 2012/2013 Subject Social Aspects of Computers ECTS 5

Course Computer Science Academic year 2012/2013 Subject Social Aspects of Computers ECTS 5 Course Computer Science Academic year 2012/2013 Subject Social Aspects of Computers ECTS 5 Type of course Compulsory Year 2º Semester 2nd sem Student Workload: Professor(s) Natalia Gomes, Ascensão Maria

Leia mais

Universidade do Minho. Escola de Engenharia. UC transversais Programas Doutorais 1º semestre 2012-13. 11 de outubro 2012

Universidade do Minho. Escola de Engenharia. UC transversais Programas Doutorais 1º semestre 2012-13. 11 de outubro 2012 Universidade do Minho Escola de Engenharia UC transversais Programas Doutorais 1º semestre 2012-13 11 de outubro 2012 1 2 2 courses offered in the first semestre: Métodos de Investigação em Engenharia

Leia mais

Projeto de Serviços: proposta de modelo teórico para sites de compras coletivas

Projeto de Serviços: proposta de modelo teórico para sites de compras coletivas Iris Campos Martins Projeto de Serviços: proposta de modelo teórico para sites de compras coletivas Dissertação de Mestrado Dissertação apresentada como requisito parcial para obtenção do grau de Mestre

Leia mais

Simulação Gráfica e Visão Computacional. Soraia Raupp Musse

Objective: to analyze commercial and state-of-the-art scientific examples that use real data to improve the quality of simulations and animations.

Institutional Skills. Sessão informativa INSTITUTIONAL SKILLS. Passo a passo. www.britishcouncil.org.br

British Council and Newton Fund: the British Council is the United Kingdom's international organisation for cultural relations and opportunities

Reitor / President Marcos Macari, Ph.D. Vice-Reitor /Vice-President Herman Jacobus Cornelis Voorwald, Ph.D.

UNIVERSIDADE ESTADUAL PAULISTA JULIO DE MESQUITA FILHO. Pró-Reitora de Pós-Graduação / Graduate

Daniele Reis Gonzaga Santos. Suporte ao Registro e Uso de Decisões de Projetos de Aplicações para a Web

Master's dissertation, presented in partial fulfillment of the requirements for the degree of

JULIANO AUGUSTO DE SOUZA OLIVEIRA

UNIVERSIDADE DE RIBEIRÃO PRETO, CENTRO DE CIÊNCIAS EXATAS, NATURAIS E TECNOLÓGICAS, PÓS-GRADUAÇÃO LATO SENSU EM BANCO DE DADOS. IMPLEMENTAÇÃO DE UM SISTEMA DE CONTROLE DE

A Dinâmica em um Projeto de Tecnologia de Grande Porte

Fabiano Sannino. Master's dissertation (professional track), presented in partial fulfillment of the requirements for the Master's degree

e-lab: a didactic interactive experiment An approach to the Boyle-Mariotte law

Sérgio Leal (a,b), João Paulo Leal (a,c), Horácio Fernandes (d). (a) Departamento de Química e Bioquímica, FCUL, Lisboa, Portugal; (b) Escola Secundária com 3.º ciclo Padre António Vieira, Lisboa, Portugal; (c) Unidade

UNIVERSIDADE FEDERAL DO RIO GRANDE DO NORTE CENTRO DE CIÊNCIAS SOCIAIS APLICADAS PROGRAMA DE PÓS-GRADUAÇÃO EM ADMINISTRAÇÃO

PROCESSO DE IMPLANTAÇÃO DE UM SISTEMA INTEGRADO DE GESTÃO EM UMA ORGANIZAÇÃO

The Indigenous Population of Brazil 1991 Population Census

Authors: Nilza Oliveira Martins Pereira (principal author), Tereza Cristina Nascimento Araujo, Valéria Beiriz, Antonio Florido (IBGE). The definition

ORGANIZAÇÃO DA INFORMAÇÃO NOTICIOSA EM COMUNIDADE ONLINE PARA O SÉNIOR. RENATO MIGUEL SILVA COSTA. Departamento de Comunicação e Arte

Universidade de Aveiro, 2012. Departamento de Comunicação e Arte.

Efficient Locally Trackable Deduplication in Replicated Systems. www.gsd.inesc-id.pt

João Barreto and Paulo Ferreira, Distributed Systems Group, INESC-ID / Technical University of Lisbon, Portugal (www.gsd.inesc-id.pt). Bandwidth remains

A METHOD OF STRATEGIC MANAGEMENT AND PLANNING TO OBTAIN COMPETITIVENESS IN FARMING BUSINESS

Mr. Frederico Fonseca Lopes, MARKESTRAT, ffflopes@markestrat.org; Ms. Janaína Gagliardi Bara, USP / FEARP / MARKESTRAT

FATEsC - Uma Ferramenta de apoio ao teste estrutural de componentes

Vânia Somaio Teixeira (1,2), Marcio Eduardo Delamaro (1), Auri Marcelo Rizzo Vincenzi (3). (1) Programa de Pós-graduação em Ciência da Computação

Avaliação de Investimentos em Tecnologia da Informação: uma Perspectiva de Opções Reais

André Fichel Nascimento. Master's dissertation, presented to the Programa de Pós-graduação em Engenharia

Luiz Fernando Fernandes de Albuquerque. Avaliação de algoritmos online para seleção de links patrocinados. Dissertação de Mestrado

Master's dissertation, presented in partial fulfillment of the requirements for the degree of

NCE/10/00806 Decisão de apresentação de pronúncia - Novo ciclo de estudos

Decisão de Apresentação de Pronúncia ao Relatório da

Erasmus Student Work Placement

EMPLOYER INFORMATION. Name of organisation: SPORT LISBOA E BENFICA. Address: AV. GENERAL NORTON DE MATOS. Post code: 1500-313 LISBOA. Country: PORTUGAL. Telephone: 21 721 95 09. Fax

Table 1. Dados do trabalho

Title: Desenvolvimento de geradores de aplicação configuráveis por linguagens de padrões. Student: Edison Kicho Shimabukuro Junior. Advisor: Prof. Dr. Paulo Cesar Masiero. Co-advisor: Profa. Dra. Rosana

JSCity Visualização de Sistemas JavaScript em 3D

Marcos Viana, Estevão Moraes, Guilherme Barbosa, André Hora, Marco Tulio Valente ({longuinho,estevaoma,barbosa,hora,mtov}@dcc.ufmg.br). Departamento de Ciência

APRESENTAÇÃO. ABNT CB-3 Comitê Brasileiro de Eletricidade Comissão de Estudo CE 03:064.01 Instalações Elétricas de Baixa Tensão NBR 5410

Instalações elétricas de baixa tensão: NBR 5410:1997, NBR 5410:2004

Revisão do Mapeamento de Processos em Levantamentos Topográficos de Áreas Patrimoniais. Antônio Diego Oliveira de Almeida Ivanildo Barbosa

Instituto Militar de Engenharia (IME), CEP 22290-270, Rio de Janeiro

desenvolvimento de software em indústria, comunidades acadêmicas e científicas: uma fábrica de software?... joa@ufrpe.br silvio@cesar.org.br

development laboratories... Software production: history

Especificação de um Sistema Gerenciador de Workflow de Acordo com a Abordagem de Desenvolvimento Baseado em Componentes

Edson Alves de Oliveira Junior (1), Itana Maria de Souza Gimenes (1). (1) Departamento de

Desenvolvimento de uma Plataforma Gráfica para a Descrição de Modelos de Sistemas Ambientais

Tiago F. M. Lima (1,2), Tiago G. S. Carneiro (2), Sérgio D. Faria (3). (1) Programa de Pós-Graduação em Análise e Modelagem

Andrew Diniz da Costa. Automação do Processo de Gerência do Teste de Software. Tese de Doutorado

Doctoral thesis, presented in partial fulfillment of the requirements for the degree of Doctor in the Programa de Pós-graduação em

Monitoramento de Métricas de Segurança da Informação

Rafael Seidi Shigueoka (1), Bruno Bogaz Zarpelão (1). (1) Departamento de Computação, Universidade Estadual de Londrina (UEL), Caixa Postal 10.011, CEP 86057-970

Infraestrutura, Gestão Escolar e Desempenho em Leitura e Matemática: um estudo a partir do Projeto Geres

Naira da Costa Muylaert Lima. Master's dissertation, presented in partial fulfillment