4th SIBBR Workshop - Petrópolis-RJ, 25-29 August 2014
INPA's Biological Collection Data Quality Improvement
Laurindo Campos (lcampos@inpa.gov.br)
MCTI/INPA - The National Institute for Amazonian Research
Information Technology Coordination
BioGeo Informatics Unit
Semantic Interoperability Laboratory
Outline
- Biodiversity Scenarios: Global and National
- INPA and its Presence in Amazonia
- Data Quality Issues
- Disseminating Biological Data
- INPA's IT Evolution
- Concluding Remarks
Biodiversity Scenarios: Global and National
Global Diversity
- Of the ~1.7 million known species:
  - 56% are insects
  - 14% are plants
  - 2.7% are mammals and birds
- An estimated 4-20 million species have not yet been described
Megadiverse Countries
Latin America and the Caribbean is the region with the greatest biological diversity on the planet, holding:
- 50% of the world's tropical forests
- 33% of its mammals
- 35% of its reptile species
- 41% of its birds
- 50% of its amphibians
Six of the megadiverse countries are in Latin America.
National Scenario
- Brazil holds the largest share of the world's biodiversity (~20%) (Assunção, 2011)
- Six biomes (disruption and degradation are the main threats)
- Combined pressures are driving the loss of habitats and species
- Planning and decision-making depend on data/metadata management
National Scenario (cont.)
- Data and metadata are collected as a preliminary step in scientific experiments
- Managing them is mandatory
- Sharing, analysis and synthesis of data are crucial
- Data governance: data and information as commodities (Jason Kolb, 2011)
Barriers to Overcome
1. Data policy for organizations;
2. Improvements in infrastructure;
3. Improvements in data quality;
4. Effective management and use of data/metadata and information.
INPA and its Presence in Amazonia
Mission: "To generate and disseminate knowledge and technologies, and to train human resources for the development of the Amazon."
Themes
- Biodiversity: knowledge of the biological diversity of the Amazon region.
- Environmental Dynamics: understanding the Amazon ecosystem.
- Society, Environment and Health: dynamics of the human populations of the Amazon and their social and environmental implications.
- Technology & Innovation: application of the knowledge produced on natural resources to develop techniques, processes and products that meet socioeconomic demands.
Research Areas
Ecology; Botany; Aquaculture; Entomology; Aquatic Biology; Health Sciences; Natural Products; Forest Products; Tropical Forest; Agronomy; Food Technology; Climate and Water Resources; Humanities and Social Sciences
MSc and PhD Programs
Ecology; Botany; Entomology; Agriculture; Tropical Forest; Aquatic Biology and Fisheries; Genetics, Conservation & Evolutionary Biology; Biological Reserves Management; Biotechnology (UFAM); Regional Products and Biotechnology (UEA); Food Science (UFAM)
Brazilian Amazonia: INPA central and state centers, and consolidated partnerships (São Gabriel da Cachoeira, Tefé, Santarém)
INPA's New Geographic Approach
Amazonia sensu latissimo (source: Eva & Huber, 2004).
Expertise - Projects, Partnerships & Training
Program of Scientific Collections and Archives
Zoological Collections:
- Invertebrates: ~3,500,000 insects
- Fish: ~120,000 specimens from various river drainages
- Reptiles and Amphibians: ~17,000
- Birds: ~800 specimens
- Mammals: ~5,242 specimens
Herbarium: 217,462 records
Carpological (fruit and seed) Collection: 2,500 samples
Wood Collection: ~10,445 samples
Microbiological Collections: medical and agroforestry
To Follow Principles
The way to fulfill INPA's mission is to treat data as a long-term asset and to manage it within a coordinated framework. Principles of data quality need to be applied at all stages of the data management process (capture, digitization, storage, analysis, presentation and use). The focus is on two keys to improving data quality: prevention and correction.
Data Quality
Poor quality jeopardizes the decision-making process, the credibility of data and user satisfaction; it also raises the cost of data management and reduces the effective use and value of data (Redman, 1996).
Poor Data Quality: Impacts
- Pervasiveness of poor data;
- Troublesome data and collection management;
- Difficult data integration and database merging;
- Damage to scientific and institutional reputation.
(Dalcin, 2005)
Approach
Since data users have a wide range of needs and data are collected from different sources, INPA must enable data of known (good) quality to be shared. For each dataset, it must document how the data were compiled and verified, and use this record to provide valuable information for the metadata description.
- Implement data curation activities
Data and Computational Resources
Data Curation
INPA organizational structure (chart): INPA Directorate; COAE, CPAF; Research; CTIN; Collections Program; Large Projects; SDIN, LIS, NBGI; Scientific Data Curation
Data Curation @ INPA
- Ongoing management activities to maintain scientific data over the long term, so that they remain available for reuse and preservation. E.g.: LBA, GEOMA, PELD, PPBio, TEAM, GoAmazon, ATTO, etc.;
- Institutional Data Committee (researchers & IT professionals);
- Implementing the data policy and its enforcement;
Data Curation @ INPA
Best practices and the development/adoption of specific tools, focusing on:
- Accuracy of taxonomic identification
- Precision of the location and associated information in the record
- Clarity of the recording approach and methodology
- Accuracy in producing and documenting the record
- Quality of data transmission
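As an illustration of the "prevention" side, a few of these focus points can be checked automatically when a record is captured. The sketch below is hypothetical, not an INPA tool: it assumes records are dictionaries keyed by Darwin Core term names (scientificName, decimalLatitude, decimalLongitude), treats a simple binomial as a proxy for a well-formed identification, and uses the number of decimal places as a rough proxy for coordinate precision. The function names and the 2-decimal threshold are assumptions.

```python
"""Hypothetical record-level checks (Darwin Core field names assumed)."""
import re

# Simplified binomial: capitalized genus, space, lowercase specific epithet.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+(-[a-z]+)?$")


def decimal_places(value: str) -> int:
    """Digits after the decimal point, used as a rough proxy for precision."""
    return len(value.split(".", 1)[1]) if "." in value else 0


def check_record(record: dict, min_decimals: int = 2) -> list[str]:
    """Return a list of problems found in one occurrence record (empty if none)."""
    problems = []
    name = (record.get("scientificName") or "").strip()
    if not BINOMIAL.match(name):
        problems.append(f"scientificName is not a simple binomial: {name!r}")

    lat_raw = (record.get("decimalLatitude") or "").strip()
    lon_raw = (record.get("decimalLongitude") or "").strip()
    try:
        lat, lon = float(lat_raw), float(lon_raw)
    except ValueError:
        problems.append("coordinates missing or not numeric")
        return problems

    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("coordinates out of range")
    elif lat == 0 and lon == 0:
        # (0, 0) is a frequent placeholder left by empty coordinate fields.
        problems.append("coordinates are (0, 0), likely a placeholder")
    if min(decimal_places(lat_raw), decimal_places(lon_raw)) < min_decimals:
        problems.append("coordinate precision below threshold")
    return problems
```

For example, `check_record({"scientificName": "Victoria amazonica", "decimalLatitude": "-3.10", "decimalLongitude": "-60.02"})` returns an empty list, while a record with a lowercase single-word name and coordinates "0", "0" is flagged three times. Checking at capture time prevents errors instead of correcting them later.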
Generic Error Patterns (English, 1999)
Data cleansing error patterns:
- Domain value redundancy
- Missing data values
- Incorrect data values
- Nonatomic data values
- Domain schizophrenia
- Duplicate occurrences
- Inconsistent data values
- Information quality contamination
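Four of these patterns lend themselves to simple automated detection during data cleansing. The following is a minimal sketch, not English's method or an INPA tool: it assumes records are dictionaries with Darwin Core-style keys, and the required-field list, the separators that signal a nonatomic value, and the country synonym table are all hypothetical choices made for illustration.

```python
"""Hypothetical detectors for four generic error patterns (English, 1999)."""
from collections import Counter

REQUIRED = ("catalogNumber", "scientificName", "country")  # assumed required fields

# Assumed synonym table illustrating domain value redundancy.
COUNTRY_SYNONYMS = {"brasil": "Brazil", "br": "Brazil", "brazil": "Brazil"}


def find_missing(records):
    """Missing data values: (row index, field) pairs where a required field is empty."""
    return [(i, f) for i, r in enumerate(records) for f in REQUIRED
            if not str(r.get(f) or "").strip()]


def find_duplicates(records, key="catalogNumber"):
    """Duplicate occurrences: key values shared by more than one record."""
    counts = Counter(r.get(key) for r in records if r.get(key))
    return sorted(k for k, n in counts.items() if n > 1)


def find_nonatomic(records, field="recordedBy", separators=(";", "|", " & ")):
    """Nonatomic data values: row indices where one field packs several values."""
    return [i for i, r in enumerate(records)
            if any(sep in str(r.get(field) or "") for sep in separators)]


def normalize_country(value):
    """Domain value redundancy: map synonymous spellings to one canonical value."""
    v = str(value or "").strip()
    return COUNTRY_SYNONYMS.get(v.lower(), v)
```

With two hypothetical records sharing catalog number "INPA-1", one with an empty scientificName and collectors "A. Silva; B. Souza", the detectors report the missing name, the duplicate key and the nonatomic collector field, and "Brasil" and "BR" both normalize to "Brazil". Detection only flags candidates; deciding the correction remains a curation task.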
Disseminating Biological Data
Information Network: Traditional Infrastructure
Diagram elements: Tools for Synthesis; Science Application; Policy; Tools for Presentation; Information; Tools for Analysis; Information Infrastructure; Data Providers; Data Digitization
(Adapted from Erick Mata, 2008.)
Applications / data access today...
Tools: maps, modeling, data cleaning, automatic georeferencing
Websites: speciesLink; Biodiversidade amazônica (PPBio Amazônia); INPA
DiGIR portals: speciesLink; Biodiversidade amazônica; INPA
Servers: Biodiversidadeamazonica.net.br; colecoes.inpa.gov.br
Data sources: Paraná network (Taxon-line), Espírito Santo network, São Paulo network, Rio de Janeiro network; holdings of other Western Amazon collections; INPA Biological Collections
Networks: GBIF, IABIN, SIBBr
Websites: Biodiversidade amazônica (PPBio Amazônia); INPA
DiGIR portals: Biodiversidade amazônica; INPA
speciesLink network
Servers: Biodiversidadeamazonica.net.br; colecoes.inpa.gov.br
Data sources: Paraná network (Taxon-line), Espírito Santo network, São Paulo network, Rio de Janeiro network; holdings of other Western Amazon collections; INPA Biological Collections
SIBBr: A National Initiative
(Slide from SIBBr/LNCC, 2013)
Towards a Better Cyberinfrastructure
Conceptual Map (slide from D. Pennington)
Conceptual Landscape of Technology Enabling Science (CLTES)
Mental model: Research Design; Collect Data; Conduct Analyses; Dissemination/Publishing; Cyberinfrastructure Systems
CLTES: Technology & Research Cycle (slide from D. Pennington)
CLTES at INPA, MPEG, SIBBr, GBIF, Probio II, Biota, CRIA, etc.
(SIBBr slide adapted from D. Pennington)
Infrastructure for Improving Data Management
From Data on the Web to a Web of Data
EVOLUTION OF DATA/METADATA
Tim Berners-Lee's Open Data Classification (2010):
1. On the web, with an open license
2. Machine-readable data
3. Non-proprietary format
4. RDF standards
5. Linked RDF, linked to rich descriptions capable of supporting interoperability
Linked Open (Biological) Data (LOD)
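To make the last two levels concrete, a specimen record can be published as RDF that uses shared vocabularies and links out to other resources. The sketch below writes N-Triples with the standard library only; the Darwin Core term namespace and the RDF/RDFS property URIs are real, but the example.org subject and link URIs are hypothetical placeholders, and using rdfs:seeAlso for the outbound link is one possible choice, not a prescribed one.

```python
"""Serialize one specimen record as RDF N-Triples (stdlib-only sketch)."""

DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def literal(text: str) -> str:
    """Quote a plain literal, escaping backslash, double quote and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(subject_uri: str, record: dict, links=()) -> str:
    """One triple per Darwin Core field (level 4), plus outbound links (level 5)."""
    s = f"<{subject_uri}>"
    lines = [f"{s} <{RDF_TYPE}> <{DWC}Occurrence> ."]
    for term, value in sorted(record.items()):
        lines.append(f"{s} <{DWC}{term}> {literal(str(value))} .")
    for uri in links:
        lines.append(f"{s} <{RDFS_SEE_ALSO}> <{uri}> .")
    return "\n".join(lines) + "\n"
```

Calling `record_to_ntriples("http://example.org/specimen/INPA-1", {"scientificName": "Victoria amazonica", "country": "Brazil"}, links=["http://example.org/taxon/victoria-amazonica"])` yields four triples: the type, the two Darwin Core fields, and the link. A production system would more likely use an RDF library and resolvable institutional URIs.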
EVOLUTION OF THE WEB
Concluding Remarks
- Data quality refers to understanding and describing the processes involved in data acquisition, treatment and management; information production, use and delivery; and data modeling and implementation.
- A significant aspect of data quality today is tied to the Internet and to a robust cyberinfrastructure, which improves the way information is delivered.
- Researchers (biologists) must adopt and trust the new way data are managed.
Thank you!
Laurindo Campos (lcampos@inpa.gov.br)
Partners and Collaborators