School Performance Evaluation in Portugal: A Data Warehouse Implementation to Automate Information Analysis
DSIE 11 Doctoral Symposium in Informatics Engineering
Rui Alberto Castro
ProDEI, 2010/2011 Edition
27 January 2011
Contents
- Introduction
- Data Warehouse System Description
- Data Warehouse Implementation
- ETL Process
- Data Extraction, Transformation and Loading
- Results
- Conclusions
DAPI, DSIE 11 Doctoral Symposium in Informatics Engineering, 2010/2011 Edition, 27 January 2011. José Monteiro, Rui Castro
Introduction (1)
- Importance of measuring, comparing and ranking in all knowledge areas
- Comparative analysis between schools (inter-school)
  - Uses national exam results, available since 2002
- Intra-school analysis
  - Importance of internal factors (teacher, course, social background, past school history)
  - Important improvement factors (teachers, directors)
Introduction (2)
- Difficulties in performing inter- and intra-school analysis
  - Very different data sources
  - Incompatible data systems
  - Complex relational models not suited to analysis
- Proposed data warehouse system
  - All data (inter and intra) in a consistent and simple model
  - Simple queries (dimensional model) suited to automated analysis
Data Warehouse System Description (1)
- System built around two stars (fact tables)
- Star One stores the results from the national exams.
- Both stars share four common dimensions:
  - anolectivo: school year to which the grades refer.
  - escola_dw: data about a specific school [name, code, geographical location and type (public or private)].
  - course_dw: information about all programs.
  - disciplina: information about all courses (name, cod_enes, cod_exame and the year of course termination).
DW System (Star One: Exams) (2)
System built around two stars (fact tables)
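The layout of Star One can be approximated with a small in-memory database. The sketch below is only an illustration under stated assumptions: the table names (anolectivo, escola_dw, disciplina, exames_nacionais_me) come from the slides, while most column names, all column types, the measure columns (n_exames, soma_cif, soma_exame, soma_cfd) and every data row are hypothetical. The course_dw dimension is left out for brevity, and SQLite stands in for the original database engine.

```python
import sqlite3

# Table names follow the slides; most column names, all types, and the
# measure columns (n_exames, soma_*) are assumptions for illustration.
STAR_ONE_DDL = """
CREATE TABLE anolectivo (id INTEGER PRIMARY KEY, ano INTEGER, ano_desc TEXT);
CREATE TABLE escola_dw  (id INTEGER PRIMARY KEY, nome TEXT, cod_escola TEXT);
CREATE TABLE disciplina (id INTEGER PRIMARY KEY, nome TEXT, abr TEXT,
                         cod_enes TEXT, cod_exame INTEGER, anoterm INTEGER);
CREATE TABLE exames_nacionais_me (
    id_escola     INTEGER REFERENCES escola_dw(id),
    id_disciplina INTEGER REFERENCES disciplina(id),
    id_anolectivo INTEGER REFERENCES anolectivo(id),
    interno TEXT, fase INTEGER, sexo TEXT,
    n_exames INTEGER, soma_cif INTEGER, soma_exame INTEGER, soma_cfd INTEGER
);
"""


def build_star_one():
    """Create an empty in-memory copy of the simplified Star One schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_ONE_DDL)
    return conn


def load_demo_rows(conn):
    """Insert a few invented rows (not real exam data)."""
    conn.execute("INSERT INTO anolectivo VALUES (1, 2009, '2009/2010')")
    conn.executemany("INSERT INTO escola_dw VALUES (?, ?, ?)",
                     [(1, "Escola A", "0001"), (2, "Escola B", "0002")])
    conn.execute(
        "INSERT INTO disciplina VALUES (1, 'Matemática A', 'mat-a', 'X', 101, 12)")
    conn.executemany(
        "INSERT INTO exames_nacionais_me VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [(1, 1, 1, "I", 1, "F", 2, 24, 250, 25),
         (1, 1, 1, "I", 1, "M", 2, 26, 230, 26),
         (2, 1, 1, "I", 1, "F", 4, 40, 400, 40)])


def average_exam_by_school(conn, ano_desc):
    """Mean exam score per school for one school year, highest first."""
    sql = """
        SELECT e.nome, 1.0 * SUM(f.soma_exame) / SUM(f.n_exames) AS media
        FROM exames_nacionais_me f
        JOIN escola_dw e  ON f.id_escola = e.id
        JOIN anolectivo a ON f.id_anolectivo = a.id
        WHERE a.ano_desc = ?
        GROUP BY e.nome
        ORDER BY media DESC
    """
    return conn.execute(sql, (ano_desc,)).fetchall()
```

Because the fact rows hold counts and sums rather than individual exams, a mean is obtained by dividing the summed score by the summed count, as average_exam_by_school does.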
DW System (Star Two: a School) (3)
- Star Two stores the grades and exams of a specific school.
- Star Two has two more dimensions:
  - aluno: stores information about students.
  - professor: stores information about the students' teachers.
DW System (Star Two: a School) (4)
Data Warehouse System Description (5)
- From the two fact tables:
  - Aggregated stars are built to make analysis easier
  - Some results are pre-computed
  - Example: ranking
DW System (Aggregated Star: Rank) (6)
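One possible way to pre-compute a ranking star from an exam fact table is sketched below. It is an assumption-laden illustration, not the authors' implementation: the aggregated table name rank_escola, the columns n_exames, soma_exame, media_exame and posicao, the choice of mean exam score as the ranking criterion, and all data rows are hypothetical. SQLite is used only so the sketch runs on its own.

```python
import sqlite3


def demo_fact_table():
    """Build a tiny fact table with invented rows (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE exames_nacionais_me (
        id_anolectivo INTEGER, id_escola INTEGER,
        n_exames INTEGER, soma_exame INTEGER)""")
    conn.executemany(
        "INSERT INTO exames_nacionais_me VALUES (?, ?, ?, ?)",
        [(1, 1, 2, 250), (1, 1, 2, 230),   # school 1: mean 120
         (1, 2, 4, 400),                   # school 2: mean 100
         (1, 3, 2, 240)])                  # school 3: mean 120 (tie)
    return conn


def build_rank_star(conn):
    """Pre-compute per-year school means, then derive each school's rank."""
    conn.executescript("""
        DROP TABLE IF EXISTS rank_escola;
        CREATE TABLE rank_escola AS
        SELECT id_anolectivo, id_escola,
               1.0 * SUM(soma_exame) / SUM(n_exames) AS media_exame
        FROM exames_nacionais_me
        GROUP BY id_anolectivo, id_escola;
    """)
    # Competition ranking: 1 + number of schools with a strictly higher mean
    # in the same school year, so tied schools share a position.
    return conn.execute("""
        SELECT r.id_anolectivo, r.id_escola, r.media_exame,
               1 + (SELECT COUNT(*) FROM rank_escola o
                    WHERE o.id_anolectivo = r.id_anolectivo
                      AND o.media_exame > r.media_exame) AS posicao
        FROM rank_escola r
        ORDER BY r.id_anolectivo, posicao, r.id_escola
    """).fetchall()
```

Storing the means in their own table keeps ranking queries away from the detailed fact rows, which is the usual motivation for an aggregated star.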
Data Warehouse Implementation (1)
- ETL process (Extraction, Transformation and Loading)
- Data extraction
  - Straightforward process
  - Data available in electronic format
  - Exams from the ME in Access tables
  - Data from the school in a relational database
Data Warehouse Implementation (2)
- Data transformation
  - Complex and heavyweight process
  - Ambiguous data (course definition)
  - Missing information (ME tables)
  - Requires manual assistance
  - Semi-automated processes
Data Warehouse Implementation (3)
Example of a semi-automated SQL transformation process

    -- Collect the courses with exams since 2001 into a temporary table,
    -- with empty placeholder columns for the exam match.
    select distinct course.name, abr, cod_enes, termina, class.year as AnoD,
           cast('' as varchar(200)) as nome_tbl,
           cast(0 as integer) as CodExame,
           cast('' as varchar(25)) as AnoTerm,
           0 as OkFlag
    into _temp_tab_course
    from course, year_area, year, class_course, class
    where course.id_year_area = year_area.id
      and id_year = year.id
      and year.year >= 2001
      and exame = 1
      and id_course = course.id
      and id_class = class.id
      and (termina = class.year or termina = 0)

    -- Fill in and flag the courses whose name equals an exam description.
    update _temp_tab_course
    set nome_tbl = descr, codexame = exame, anoterm = [anoterminal], okflag = 1
    from tblexames t
    where name = descr

This SQL code automatically processes 28% of all courses.
The remaining 72% need manual intervention and validation.
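The split between automatically matched courses and those left for manual review can be sketched as follows. This is a simplified illustration rather than the original process: the helper name match_courses, the reduced column set, and all course names and exam codes are invented, and SQLite replaces the original engine (so the update uses correlated subqueries, and the name comparison is case-sensitive, which may differ from the original database).

```python
import sqlite3


def match_courses(courses, exams):
    """Flag courses whose name equals an exam description exactly.

    courses: list of course names.
    exams:   list of (descr, exame, anoterminal) tuples.
    Returns (matched, pending): matched rows carry the exam code and
    terminal year; pending names still need manual validation.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE temp_tab_course (
        name TEXT, codexame INTEGER DEFAULT 0,
        anoterm TEXT DEFAULT '', okflag INTEGER DEFAULT 0)""")
    conn.execute(
        "CREATE TABLE tblexames (descr TEXT, exame INTEGER, anoterminal TEXT)")
    conn.executemany("INSERT INTO temp_tab_course (name) VALUES (?)",
                     [(c,) for c in courses])
    conn.executemany("INSERT INTO tblexames VALUES (?, ?, ?)", exams)

    # Automatic step: exact name match only, mirroring "where name = descr".
    conn.execute("""
        UPDATE temp_tab_course
        SET codexame = (SELECT exame FROM tblexames
                        WHERE descr = temp_tab_course.name),
            anoterm  = (SELECT anoterminal FROM tblexames
                        WHERE descr = temp_tab_course.name),
            okflag   = 1
        WHERE EXISTS (SELECT 1 FROM tblexames
                      WHERE descr = temp_tab_course.name)
    """)

    matched = conn.execute(
        "SELECT name, codexame, anoterm FROM temp_tab_course "
        "WHERE okflag = 1 ORDER BY name").fetchall()
    pending = [row[0] for row in conn.execute(
        "SELECT name FROM temp_tab_course WHERE okflag = 0 ORDER BY name")]
    return matched, pending


def demo():
    """Run the matcher on invented course names and exam codes."""
    courses = ["Matemática A", "Português", "Física e Química"]
    exams = [("Matemática A", 101, "12"),
             ("Português", 102, "12"),
             ("Física e Química A", 103, "11")]
    return match_courses(courses, exams)
```

In this invented sample the third course fails to match only because the exam description carries a trailing "A", the kind of naming ambiguity that leaves rows with okflag = 0 for a person to resolve.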
Data Warehouse Implementation (4)
- Data loading
  - Simple and automatic process
  - The transformation phase has already prepared all data
  - Star tables are filled from the temporary tables
  - No manual assistance needed
Data Warehouse Implementation (5)
Example of data loading for Star One (year 2009/2010)

    -- Rows with interno = 's' are loaded with flag 'I'.
    insert exames_nacionais_me
    select escola_dw.id, disciplina.id, anolectivo.id, 'I', fase, sexo,
           count(*), sum(cif), sum(class_exam), sum(cfd)
    from tblhomologa_2009, escola_dw, disciplina, anolectivo
    where ano = 2009
      and escola = cod_escola
      and cod_exame = exame
      and interno = 's'
    group by escola_dw.id, disciplina.id, anolectivo.id, fase, sexo
    union
    -- Rows with interno = 'n' are loaded with flag 'E'.
    select escola_dw.id, disciplina.id, anolectivo.id, 'E', fase, sexo,
           count(*), sum(cif), sum(class_exam), sum(cfd)
    from tblhomologa_2009, escola_dw, disciplina, anolectivo
    where ano = 2009
      and escola = cod_escola
      and cod_exame = exame
      and interno = 'n'
    group by escola_dw.id, disciplina.id, anolectivo.id, fase, sexo

This SQL code automatically processes 100% of the data.
Results (1)
- Examples of queries to the implemented system
  - Very simple SQL queries
  - Queries share the same structure
  - The variety of results allows a wide range of analyses
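The shared query structure (a fact table joined to its dimensions, grouped by a few dimension attributes, with one aggregate) lends itself to generation from a template. The sketch below is a hypothetical illustration of that idea: the helper build_query, the reduced table layouts and all data rows are assumptions, with only the table and key names (notas_exames_escola, anolectivo, aluno, id_anolectivo, id_aluno) taken from the slides.

```python
import sqlite3


def build_query(fact, dims, group_by, measure):
    """Compose a fixed-structure star query.

    dims maps each dimension table to the fact-table column that references
    its id. Identifiers are interpolated directly into the SQL text, so this
    is only suitable for trusted, hard-coded names.
    """
    joins = " AND ".join(f"{fact}.{fk} = {dim}.id" for dim, fk in dims.items())
    tables = ", ".join([fact, *dims])
    cols = ", ".join(group_by)
    return (f"SELECT {cols}, {measure} FROM {tables} "
            f"WHERE {joins} GROUP BY {cols} ORDER BY {cols}")


def demo_db():
    """Tiny school star with invented rows (hypothetical, reduced columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE anolectivo (id INTEGER PRIMARY KEY, ano_desc TEXT);
        CREATE TABLE aluno (id INTEGER PRIMARY KEY, sexo TEXT);
        CREATE TABLE notas_exames_escola (
            id_anolectivo INTEGER, id_aluno INTEGER, exame INTEGER);
        INSERT INTO anolectivo VALUES (1, '2008/2009'), (2, '2009/2010');
        INSERT INTO aluno VALUES (1, 'F'), (2, 'M'), (3, 'F');
        INSERT INTO notas_exames_escola VALUES
            (1, 1, 120), (1, 2, 100), (2, 1, 150), (2, 3, 130), (2, 2, 90);
    """)
    return conn


def exams_by_year_and_sex(conn):
    """Count exams per school year and sex using the generated query."""
    sql = build_query(
        "notas_exames_escola",
        {"anolectivo": "id_anolectivo", "aluno": "id_aluno"},
        ["ano_desc", "sexo"],
        "COUNT(*)")
    return conn.execute(sql).fetchall()
```

Changing only the dimension mapping, the grouping columns and the aggregate yields a different analysis from the same template, for example an average exam score per school year with `build_query("notas_exames_escola", {"anolectivo": "id_anolectivo"}, ["ano_desc"], "AVG(exame)")`.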
Results (2)
Grades and exams of all CPVI students in Math in 2009/2010 (sample)

    select ano_desc, abr, anoterm, professor.nome, per3, cif, exame, cfd
    from notas_exames_escola, professor, disciplina, escola_dw, anolectivo
    where id_disciplina = disciplina.id
      and id_anolectivo = anolectivo.id
      and id_professor = professor.id
      and abr = 'mat-a'
      and ano = 2009

    year  abr    code  nf  cif  exame_f1  exame_f2  cfd
    2009  MAT-A  JC    10  11   121                 11
    2009  MAT-A  JC    15  15   171                 16
    2009  MAT-A  JC    17  16   147       135       16
    2009  MAT-A  JC    14  14   147       156       15
    2009  MAT-A  JC    20  20   196                 20
    2009  MAT-A  JC     9  11    96                 11
    2009  MAT-A  JC    19  19   178                 19
    2009  MAT-A  JC    18  19   185       177       19
    2009  MAT-A  JC    15  15   160                 15
    2009  MAT-A  JC    17  18   196                 19
Results (3)
Top 12 schools with the most exams in 2009/2010

    select top 12 nome, count(*) Total_Exames_Nacionais
    from exames_nacionais_me, anolectivo, escola_dw
    where id_anolectivo = anolectivo.id
      and id_escola = escola_dw.id
      and ano_desc = '2009/2010'
    group by nome
    order by count(*) desc

    School                                          Number of Exams in 2009/2010
    Escola Secundária Camões                        591
    Escola Secundária Alberto Sampaio               522
    Escola Secundária Jaime Moniz                   482
    Escola Secundária Santa Maria de Sintra         482
    Escola Secundária da Amadora                    466
    Escola Secundária Alexandre Herculano           459
    Escola Secundária de Odivelas                   454
    Escola Secundária Maria Amália Vaz de Carvalho  443
    Escola Secundária Leal da Câmara                429
    Escola Secundária Avelar Brotero                428
    Escola Secundária de Cascais                    428
    Escola Secundária Alves Martins                 419
Results (4)
Total exam count in CPVI by sex and year

    select ano_desc, sexo, count(*) Total_Exames_Nacionais
    from notas_exames_escola, anolectivo, aluno
    where id_anolectivo = anolectivo.id
      and id_aluno = aluno.id
    group by ano_desc, sexo
    order by ano_desc, sexo

    Year       Sex  Exams (Total)
    2005/2006  F    193
    2005/2006  M     86
    2006/2007  F    359
    2006/2007  M    231
    2007/2008  F    287
    2007/2008  M    250
    2008/2009  F    300
    2008/2009  M    313
    2009/2010  F    304
    2009/2010  M    318
Results (5)
Three-year analysis of CPVI Math teachers' performance

    select ano_desc, disciplina.nome, id_prof, avg(exame) Média_Exame_Prof
    from notas_exames_escola, anolectivo, disciplina, professor
    where id_anolectivo = anolectivo.id
      and id_disciplina = disciplina.id
      and id_professor = professor.id
      and disciplina.abr = 'mat-a'
    group by ano_desc, disciplina.nome, id_prof
    order by ano_desc, avg(exame) desc

    Year       Course        Teacher Id  Exam Avg
    2006/2007  Matemática A    6         124
    2006/2007  Matemática A   21         109
    2006/2007  Matemática A  157          82
    2007/2008  Matemática A    6         163
    2007/2008  Matemática A   21         115
    2007/2008  Matemática A  157         102
    2008/2009  Matemática A   41         158
    2008/2009  Matemática A   21         151
    2008/2009  Matemática A  157         122
    2009/2010  Matemática A    6         143
    2009/2010  Matemática A   41         140
    2009/2010  Matemática A   21         115
Conclusions and Future Work
- Implemented a DW system suited to inter- and intra-school performance analysis
- Able to obtain a variety of results and analyses
- The system allows simple, fixed-structure SQL queries
- Future work
  - Use the simple, fixed query structure to build an automatic analysis tool
  - Increase automation in the data transformation phase
  - Build a prototype and test it in new environments