Latin American Research Network
Registration Form
The Quality of Education in LAC

1. Name of institution: Instituto Futuro Brasil
2. Name of the participants:
   Project Director: Naercio Aquino Menezes Filho
   Researcher 1: Creso Franco
   Researcher 2: Fabio Waltenberg
   Researcher 3:
3. Name, title and phone number of the person responsible for signing the letter of agreement with the Bank:
   Name: Regina Carla Madalozzo
   Title: Director of IFB
   Phone Number:
   Fax:

4. Name of institution: Escola de Economia de São Paulo - Fundação Getúlio Vargas
5. Name of the participants:
   Project Director: André Portela Souza
   Researcher 1: Aloisio Araújo
   Researcher 2: Gabriel Buchmann
   Researcher 3: Marcelo Neri
   Researcher 4: Paulo Picchetti
   Researcher 5: Vladimir Ponczek
6. Name, title and phone number of the person responsible for signing the letter of agreement with the Bank:
   Name: Yoshiaki Nakano
   Title: Director, Escola de Economia de São Paulo, Fundação Getúlio Vargas
   Phone Number:
   Fax:
Instituto Futuro Brasil
&
Escola de Economia de São Paulo
Fundação Getúlio Vargas

The Quality of Education in Brazil

Research Proposal
Inter-American Development Bank
June 2007
Research Team

Instituto Futuro Brasil (IFB)
   Naercio Aquino Menezes-Filho (Leader) - IFB
   Creso Franco - PUC-RJ
   Fabio Waltenberg - IETS

Escola de Economia de São Paulo and Escola de Pós-Graduação em Economia, Fundação Getúlio Vargas (EESP/EPGE-FGV)
   Aloísio Araújo - EPGE-FGV
   Gabriel Buchmann - CPS/IBRE-FGV
   Marcelo Néri - EPGE-FGV
   Paulo Picchetti - EESP-FGV
   Vladimir Ponczek - EESP-FGV
   André Portela Souza (Leader) - EESP-FGV
Introduction

Human capital is one of the main determinants of the rate of economic growth and the level of welfare in a country, and formal education is one of the most important components of human capital. The process of educational attainment in Brazil can be described as backward (even when compared to less developed countries) and highly skewed in favor of a privileged slice of the population. The debate over the importance of education as a factor explaining Brazil's income inequality is intense, with the leading current of opinion being that the distribution of schooling is the main causal factor explaining this inequality, by generating productivity differences among individuals that last throughout their lifetimes [Menezes-Filho (2001)].

Substantial improvements have been observed in Brazilian education in terms of quantitative indicators, both in flow variables (decreased average delay, decreased fraction of delayed children, increased gross and net enrollment rates) and in stock variables (increased average years of schooling, reduced illiteracy rates), as shown in Table 1. However, there is still considerable room for improvement in those respects, given that the country is clearly still very far from a situation of universal access to all levels of education, with acceptable dropout rates at each level.

Another shortcoming is that recent changes have not led to a homogeneous pattern across the country. Indeed, there is huge variation across regions, states and metropolitan areas. For example, the illiteracy rate among children (10-14 years old) is 0.4% in the metropolitan area of Curitiba and 3.8% in Fortaleza; the fraction of children (7-14 years old) at school is 98.4% in the state of Santa Catarina and 89.1% in Maranhão; and the net enrollment rate at secondary school is 66.6% in the state of São Paulo and 22.1% in Alagoas.
Table 1. Education Indicators
Source: Brazilian household surveys (PNAD), compiled by the Brazilian IETS (www.iets.org.br).
Finally, and more importantly, while the numbers above suggest that a quantitative change has taken place, they do not indicate that it has been accompanied by a qualitative improvement. It is not enough to make sure pupils go to school; they should be learning something there. Unfortunately, there are reasons to believe this is not happening in Brazil. Recent evidence indicates that the average quality of education in the country is low and unequally distributed among its population, compared with the situation found in many other countries, both internationally and regionally (Latin America). [2]

[2] See, for example, how badly Brazil compares to other countries in: OECD (2001, 2004a, 2004b), Ravela (2004), Waltenberg (2005), Willms (2006), Hanushek & Wössman (2007).

According to a study conducted by the Ministry of Education in 2003, for example, 55% of students who complete fourth grade have reading performance considered critical or very critical, and in the Northeast and North regions this percentage reaches 70% and 66%, respectively. This dismal performance is also observed in nearly 40% of students who complete the third year of high school. Once again the regional disparities stand out: in the South 29% have a reading level deemed critical or very critical, while
in the North and Northeast these percentages are much larger, at 51% and 48%, respectively [SAEB (2004)].

Therefore, this project intends to examine in more detail the quality of education in Brazil, to understand its main determinants and to verify the impact of the quality of education on future socioeconomic outcomes, such as wages, inequality and health. The main objectives of the proposal are:

1- Decompose the factors that contribute to education quality in Brazil (topics 1 and 2)
2- Compare education quality in Brazil with that of other Latin American and South European countries (topic 3)
3- Investigate the impact of education quality on wages and early pregnancy (topics 4 and 5)
4- Analyze the design of efficient education policies (topic 6)

This proposal consists of six additional sections and 2 annexes. Each of the following sections describes the specific topic it intends to cover in the project, together with the main data sets to be used in each topic. We then present the bibliographic references, the budget and the CVs of all the researchers participating in the project in the annexes.

1) Decomposition of the Quality of Education
Fundação Getulio Vargas

In this topic we plan to investigate the relative importance of school characteristics, student characteristics and teacher characteristics for the quality of education. This will be accomplished by applying different decomposition methods to test results from the SAEB and Prova Brasil datasets. The main question to be answered is the following: of the total dispersion of test results, how much is due to school, student, and teacher characteristics? This question will be answered by using three different methodologies.

Data Sources: SAEB and Prova Brasil

The datasets to be used are the SAEB dataset and, conditional upon availability, the Prova Brasil dataset. The SAEB is collected by the Anísio Teixeira National Institute of Studies and Research (INEP) of the
Ministry of Education of Brazil. The SAEB dataset (Sistema de Avaliação de Educação Básica) consists of microdata on students' proficiency test scores in Portuguese and Mathematics for students in the 4th and 8th grades of primary education and the 3rd grade of secondary education. It also includes two questionnaires that elicit information on the family background of the student and on school characteristics. It has been run every two years since 1995, and its sample is stratified to be representative at different levels (with some variations across the years), e.g., grade levels, state and national levels, public and private schools, metropolitan and non-metropolitan areas. It is important to note that it is not a panel dataset. Rather, it is a series of cross-sections in which each student takes only one of the two exams, Portuguese or Mathematics. The most recent year available is [...].

The Prova Brasil dataset is a set of data on proficiency test scores also collected by the Anísio Teixeira National Institute of Studies and Research (INEP). It is intended to cover most students in the 4th and 8th grades of primary education in [...]. More precisely, it includes all students in urban public schools with at least 30 students in the respective grade. The exams are in Portuguese and Mathematics as well. Unlike SAEB, it is representative at the municipality level and each student takes both the Portuguese and the Mathematics exams. It also includes information on the student's family background.

1.2 The Methodologies

Three different methodologies will be used to decompose the test results into school characteristics, student characteristics, and teacher characteristics. The first two methods will explore the correlations between observable characteristics and the test scores.
The third one will make a further effort to disentangle the effect of quality of education from other critical unobservable inputs.

Regression-Based Approach Decomposition

The first method to be used is the regression-based decomposition of inequality developed by Morduch and Sicular (2002). It extends the decomposition by factor components developed by Shorrocks (1982) to linear regression analysis. Its advantages are that it is very flexible and fits our objective easily. First, it admits OLS, weighted least squares, quantile regressions,
and corrections for endogeneity. Second, it is an exact decomposition (the shares sum to one) and yields an exact allocation of the contribution of each variable. Third, it can be applied to many different inequality indices that satisfy the most commonly required properties of inequality indices plus the property of uniform additions. This property states that measured inequality should fall if everyone in the population receives a positive transfer of equal size. This property is implied by the transfer axiom and the scale invariance axiom. The Gini, the Coefficient of Variation and the Theil-T indices satisfy this property.

Briefly, the authors start with a linear equation

$$ y_i = X_i \beta + \varepsilon_i , $$

where $y_i$ can be the proficiency score of student $i$, $X_i$ is an $M \times 1$ vector of explanatory variables $x_i^m$ that include student, school, and teacher characteristics, and $\varepsilon_i$ is the error term. Note that

$$ y_i = \sum_{m=1}^{M+1} \hat{y}_i^m \quad \text{for all } i, $$

where $\hat{y}_i^m = \hat{\beta}_m x_i^m$ for $m = 1, \dots, M$, and $\hat{y}_i^m = \hat{\varepsilon}_i$ for $m = M+1$. These estimated test scores can then be used to directly compute decomposition components for all regression variables. The share of variable $m$ in the total inequality index $I(y)$ takes the general form

$$ s^m = \hat{\beta}_m \, \frac{\sum_{i} a_i(y)\, x_i^m}{I(y)}, \quad m = 1, \dots, M. $$

The term $a_i(y)$ depends on the particular index used, which can be written as a weighted sum of factor components. Examples for the Gini, CV and Theil-T can be found in Morduch and Sicular (2002).

Hierarchical Models

The advantage of the first method is that it is simple. It gives us a first approximation to the relative importance of the observable characteristics of schools, students, and teachers for test results. However, its drawbacks are that it does not take into account the fact that the variables have different levels of aggregation and that it does not control for possible unobservable characteristics correlated with observable characteristics and test results.
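As a concrete illustration of this first method, the sketch below applies the regression-based decomposition to simulated scores using the Theil-T index, for which the weights can be written as $a_i(y) = \ln(y_i/\mu) / (n\mu)$. Everything in it is an assumption made for illustration: the data are simulated rather than drawn from SAEB, the two regressors and their coefficients are hypothetical, and the function names (`ols`, `theil_t_weights`, `regression_shares`) are ours. It is not the estimation code of the project.

```python
import math
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    return [A[c][k] / A[c][c] for c in range(k)]


def theil_t_weights(y):
    """Weights a_i(y) such that Theil-T(y) = sum_i a_i(y) * y_i (requires y > 0)."""
    n, mu = len(y), sum(y) / len(y)
    return [math.log(v / mu) / (n * mu) for v in y]


def regression_shares(X, y, names):
    """Share of each regressor, and of the residual, in the Theil-T index of y."""
    beta = ols(X, y)
    a = theil_t_weights(y)
    index = sum(ai * yi for ai, yi in zip(a, y))
    shares = {}
    for m, name in enumerate(names):
        # s^m = beta_m * sum_i a_i(y) x_i^m / I(y)
        shares[name] = beta[m] * sum(ai * r[m] for ai, r in zip(a, X)) / index
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    shares["residual"] = sum(ai * e for ai, e in zip(a, resid)) / index
    return shares


# Simulated data (hypothetical): constant, one student trait, one school trait.
random.seed(0)
X, y = [], []
for _ in range(500):
    student, school = random.gauss(0, 1), random.gauss(0, 1)
    X.append([1.0, student, school])
    y.append(250 + 20 * student + 10 * school + random.gauss(0, 15))

shares = regression_shares(X, y, ["constant", "student", "school"])
for name, share in shares.items():
    print(f"{name:>9}: {share: .3f}")
```

Because the fitted components add up to $y_i$ for every student, the shares of the constant, the regressors and the residual sum to one by construction, which is the exactness property mentioned above. Note that the constant term can receive a non-zero share under indices such as the Theil-T.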
The first drawback can be circumvented by a second method that uses multilevel models to assess school performance. When analyzing the determinants of students' performance, one is normally interested in modeling a response variable such as a test score or any other measure related to
proficiency or well-being. This variable can be continuous, discrete or even categorical, but in all circumstances the interest lies in the relationship between this variable and a set of explanatory variables representing known characteristics of students.

The rationale for using multilevel models rests on the issue of exploring the fundamental relationships between different forms of aggregating the data. At one extreme we can, for example, work with school averages when investigating the relationships between the response and explanatory variables, while at the other extreme we could analyze the same relationships using data at the student level, ignoring the effects caused by the fact that these students are grouped in different schools. Some studies, such as Aitkin et al. (1986) and Woodhouse and Goldstein (1989), demonstrate that results are not robust when these issues of aggregation are ignored.

When the different levels of aggregation are explicitly considered, it is possible to analyze the interaction between these levels in terms of their relation to the response variable. It is possible to simultaneously model data containing information on individual students, the classrooms (and teachers) they are part of, their families, their schools, and even some meaningful group of schools, determined for instance by geographic location.

A simple relationship between a response variable and an explanatory variable illustrates the point. If we index students in a sample by $i$ and schools by $j$, the values observed for a response variable can be represented by $y_{ij}$, and the values for an explanatory variable by $x_{ij}$. Assuming a linear form, the relationship between these variables can be modeled as

$$ y_{ij} = \beta_0 + \beta_1 x_{ij} + \varepsilon_{ij} $$

If we estimate this model by traditional methods such as Ordinary Least Squares, the residual term will necessarily capture all the non-modeled relationships between students and their schools, and these will remain unexplained.
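The sensitivity to aggregation discussed above can be seen in a small simulation. The sketch below is only an illustration under assumed values: the data are simulated, the true student-level slope is set to 1, and an unobserved school effect is made to correlate with the school mean of $x$. It compares the slope obtained from pooled student-level OLS, from school averages, and from within-school deviations. All names (`slope`, `group_means`, the parameter values) are hypothetical.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def group_means(values, groups, n_groups):
    """Mean of `values` within each group id 0..n_groups-1."""
    totals, counts = [0.0] * n_groups, [0] * n_groups
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [t / c for t, c in zip(totals, counts)]


random.seed(1)
n_schools, n_students = 50, 30
school_id, x, y = [], [], []
for j in range(n_schools):
    school_x = random.gauss(0, 1)                        # school mean of x
    school_effect = 2.0 * school_x + random.gauss(0, 1)  # unobserved, correlated with it
    for _ in range(n_students):
        xi = school_x + random.gauss(0, 1)
        school_id.append(j)
        x.append(xi)
        y.append(1.0 * xi + school_effect + random.gauss(0, 1))

mean_x = group_means(x, school_id, n_schools)
mean_y = group_means(y, school_id, n_schools)

pooled = slope(x, y)                      # student level, grouping ignored
between = slope(mean_x, mean_y)           # school averages only
within = slope(
    [xi - mean_x[g] for xi, g in zip(x, school_id)],
    [yi - mean_y[g] for yi, g in zip(y, school_id)],
)                                         # deviations from school means

print(f"pooled={pooled:.2f}  between={between:.2f}  within={within:.2f}")
```

In this setup only the within-school slope stays close to the assumed student-level coefficient; the pooled and school-average slopes absorb part of the unmodeled school effect, which is the problem that the dummy-variable and multilevel approaches are meant to address.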
One possibility is to include dummy variables for schools, which will then control for differences in the intercept between schools. One can even interact these dummy variables with the explanatory variable, obtaining different values for the associated coefficient, one for each school in the sample. This approach faces statistical problems in terms of decreased efficiency (due to the large number of coefficients to be estimated), but mainly suffers from the lack of a probabilistic structure to capture the important relationships between the different levels contained in the data.

[4] For more details, see Morduch and Sicular (2002).

Multilevel models explicitly circumvent these shortcomings by assuming that the coefficients are allowed to vary between units in a probabilistic way. We can assume, for example: [...]

Coefficients are now random around a common value, taking particular values for each unit of analysis. The relationships between the effects among these units are captured by the correlation matrix of these coefficients. These models have been applied to a variety of problems, including the assessment of educational performance. A good survey of the techniques and applications can be found in Goldstein (1987). The technique is general in the sense that the number of levels is part of the specification of the model, along with the included explanatory variables and their functional forms. Each of the explanatory variables can have associated coefficients at any of the different levels, with the benefit of statistical inference tools to decide whether or not it is appropriate to allow the coefficients to be random (capturing the differences between the levels). Therefore, with the available information on students' backgrounds, one can tentatively specify models to capture the impact on measures of academic performance of students' characteristics, such as gender, and also of their belonging to specific households, classrooms, schools and geographic regions. Algorithms implementing the statistical inference procedures involved in this class of models are found in dedicated software, such as MLwiN, and in general-purpose statistical software, such as Stata.

Structural Model Using Minimum Distance Estimators
The first and second methods have the potential problem of not fully controlling for the bias from unobservable characteristics correlated with school, student, and teacher characteristics. One way to tackle this issue is to estimate a structural model that formally assumes the presence of unobservable characteristics.

The third method to be used is the estimation of a structural panel model using minimum distance estimators. It tries to extract information about the unobservable characteristics of the student by exploiting the correlation between the Portuguese and Mathematics test scores of the same student in the Prova Brasil. Of course, this part of the project will be carried out only if the data are available on time. The advantage of this method is that, under specific assumptions, it controls for some unobservable variables that are correlated with the observable inputs and with outcomes as well.

This is a variance component model that establishes how much of the total variance of the test scores is due to an (unobservable) student effect and to other effects. The structure of the model can vary depending on the degrees of freedom and can include observable variables as well. For simplicity, suppose the test scores depend on an individual fixed effect plus a random shock. Formally, let $N_{isp}$ be the grade of student $i$ in school $s$ in exam $p$ (normalized to mean 0), and assume the additive form

$$ N_{isp} = \mu_i + \varepsilon_{isp}, $$

where $\mu_i$ is the student fixed effect, with $\mu_i \sim N(0, \sigma_\mu^2)$, and $\varepsilon_{isp}$ is the error term, with $\varepsilon_{isp} \sim N(0, \sigma_\varepsilon^2)$. Suppose

$$ E[\mu_i \, \varepsilon_{isp}] = 0 \ \ \forall p, \qquad E[\varepsilon_{isp} \, \varepsilon_{isp'}] = \rho \ \text{ for the same } i. $$

When two exams are considered, the variance-covariance system is given by

$$ V(N_{isp}) = \sigma_\mu^2 + \sigma_\varepsilon^2 $$
$$ V(N_{isp'}) = \sigma_\mu^2 + \sigma_\varepsilon^2 $$
$$ COV(N_{isp}, N_{isp'}) = \sigma_\mu^2 + \rho $$

There are three observed moments in this system: two variances and one covariance. One can estimate the parameters $\sigma_\mu^2$, $\sigma_\varepsilon^2$ and $\rho$ using the observed moments. The ratio $\sigma_\mu^2 / V(N_{isp})$ gives the relative importance of the student effect in the total variance of scores. In this example, this system is exactly identified. However, if one considers different states, regions, school systems, etc. as different moments, an over-identified system can be estimated. Moreover, one can start with a regression equation to which the observable student characteristics are added. The computed residual can then be used to decompose its variance into the two components. Comparing the two sets of results can give us an idea of the relative weight of observable variables in explaining the total student effect. The same can be done for schools and teachers. Of course, different models can be estimated, and their refinement will depend on the degrees of freedom available. The estimation can be carried out, and the over-identification restrictions tested, by using minimum distance estimation as in Chamberlain (1982, 1984) and Abowd and Card (1989).

2) Identifying Classroom and School Effects using Longitudinal Data
Instituto Futuro Brasil

2.1 Research Questions

(a) To what extent does students' achievement vary between classrooms and between schools?
(b) Which classroom and school features promote students' learning?
(c) Which classroom and school features promote a more equitable social distribution of learning within classrooms and schools?

2.2 Method Design
Almost all attempts to study school effects in Latin American countries have an important limitation, related to the cross-sectional nature of most available databases. The quotations below stress this methodological point boldly:

"[Due to] the cross-sectional nature of data, problems of causal inference are daunting." (Raudenbush, Fotiu and Cheong 1998).

"A common obstacle to carrying out appropriate adjustments when modelling examination results is the lack of suitable prior achievement measures". (Goldstein 1995).

"If only cross sectional data are used, whether with aggregate or individual level data, it is not possible to make inferences about the 'effectiveness' of schools". (Goldstein, Huiqi, Rath and Hill 1995).

"A major strength of these two studies is their longitudinal designs, which allow us to investigate learning (which measures change over time in academic status) rather than achievement (which is a status measure). The ability to demonstrate how schools influence the students who attend them is strengthened by being able to take into account the status of these students at the point that they enter the schools (specially their academic status)." (Lee 2001)

"To establish appropriate estimates of a school's improvement over time a number of components must be brought together. This, in our view, includes measures of outcomes and prior attainment on individual pupils; data from 3 or (preferably) more years; a multilevel statistical analysis; an orientation towards examining the data for systematic changes in school performance over time." (Gray, Jesson, Goldstein, Hedger and Rasbash 1999).

"The fact that very few studies, if any, satisfy the minimum conditions for satisfactory inference, suggest that few positive conclusions can be derived from existing evidence.
The minimum conditions can be summarized as: that a study is longitudinal so that pre-existing student differences and subsequent contingent events among institutions can be taken into account; that a proper multilevel analysis is undertaken so that statistical inferences are valid and in particular that 'differential effectiveness' is explored; that some replication over time and
space is undertaken to support reliability; that some plausible explanation of the process whereby schools become effective is available." (Goldstein 1997)

Data and measures

We are going to use panel data gathered by the GERES project. The GERES project is financed by the Ford Foundation, by the Brazilian National Research Council and by the Ministry of Education. It follows around 20 thousand students who attend 303 schools in five major Brazilian cities. Up to now, the study has collected three waves of data: baseline data at the beginning of 1st grade (March 2005); end of 1st grade data (November 2005); and end of 2nd year data (November 2006). Although the project is going to collect additional data in November 2007 and in November 2008, we are going to use the data that have already been collected in waves 1 to 3. The GERES project is a joint project of six Brazilian universities, and one member of our team is the coordinator of the GERES project.

The cities were chosen as a consequence of the location of the universities collaborating in the GERES project. Considering the whole population of schools offering 1st grade education in these cities, a probabilistic sample, stratified by city, sector and level of resources available, was taken. Once a school was selected, all of its 1st grade classrooms and students were selected. These students were followed in subsequent waves, even if they repeated a grade. Although the project did not follow students who moved to schools outside the school sample, every effort was made to gather information, including school academic records, on leaving students. This, along with the decision to measure students who repeated a grade, creates a good context for differentiating value added from selection.

In each wave students took a reading and a mathematics test, and teachers were asked to fill in a questionnaire on their background, classroom features and school features.
In wave 1 and wave 3 principals filled in a questionnaire. Around wave 2, parents filled in a short questionnaire with basic family background information (place of residence, father's and mother's education, father's and mother's occupation, and family access to services and to consumption goods, as a proxy for income). The variables available on classrooms and schools are classified in the following categories:

(a) classroom social composition;
(b) school social composition;
(c) classroom resources and size;
(d) school resources and size;
(e) classroom climate and practice;
(f) school academic climate.

Analytical Approach

The analytical approach is based on the hierarchical structure of the data. It includes a preliminary step of partitioning the variance into its three components: between students within classrooms, between classrooms within schools, and between schools. This is going to be estimated both for achievement in each wave and for achievement in each wave controlled for previous achievement. As the concept of classroom does not hold for more than one academic year, the classroom effect is going to be investigated based on two subsequent waves (waves 1 and 2; waves 2 and 3). Two- and three-level hierarchical linear models will be estimated. For the school effect, it will also be useful to use achievement in wave 3 as the dependent variable and achievement in wave 1 as the prior achievement control.

3) Benchmarking of Brazil's Education Performance
Instituto Futuro Brasil

The objectives of this part of the research are the following:

1. To assess more precisely how low and unequally distributed the quality of education is in Brazil, both in a regional perspective (Latin America) and in a more global perspective.
2. To compare Brazil with other countries (and, complementarily, to compare Brazilian intra-national units) in terms of patterns of educational inequality and of educational inequity.

After finishing this research, we would like to be able to identify more clearly the characteristics of quality of education in Brazil which are similar to those found in other countries, as well as those characteristics which are unique to the Brazilian case. In order to accomplish the two main objectives stated above, we will use PISA 2003 datasets.
3.1 - Research steps

Firstly, we will summarize, interpret, and contextualize all the available findings concerning the performance of Brazilian students in PISA 2003 in an international perspective, such as those contained in the studies by Ravela (2004), OECD (2004a and 2004b), and Sprietsma (2007). Whenever relevant and feasible, we will compare those findings with those related to PISA 2000, such as: OECD (2001), Willms (2006), Brazil's national report for PISA 2000 (INEP, 2001), our own previous study (Waltenberg, 2005), and Fuchs and Wössman (2007). We would like to provide a precise description of education quality in the country, and an idea of its evolution over three years (from PISA 2000 to PISA 2003). This initial descriptive exercise is all the more important given that the Brazilian educational authorities have not published a national report analyzing PISA [...].

For the international comparisons, we will compare Brazil's results with those of other Latin American countries. But we also plan to compare them with the performance of students from Southern European countries, which share some cultural characteristics with Latin American countries (Portugal and Spain in particular) and which could be taken as a more useful benchmark than much more developed and/or culturally distant countries such as Northern and Eastern European, Asian, and Anglo-Saxon countries.

Secondly, due to Brazil's vast territorial extension, its large and diverse population, and the heterogeneity observed in quantitative educational indicators (mentioned above), it is important to carry out further analysis at the intra-national level (according to the type of school students attend, its location, etc.), along the same lines followed by the studies mentioned in the previous paragraphs (detailed descriptive statistics, analysis of variance, multilevel/cluster analysis, etc.).
We will come up with new results, mapping the situation in terms of quality of education inside the country.

Thirdly, given the importance of quality of education along the whole distribution (cf. IDB, 2007), given the unenviable record of Brazil in this respect, and finally given the fact that we have at our disposal a unique dataset which allows us both to compare Brazil to a great number of countries and, to some extent, to compare Brazilian intra-national units, we will devote some effort to investigating patterns of educational inequality and educational inequity.

Finally, we will interpret, discuss, and put into perspective our results, trying to distinguish those which are specific to the Brazilian case from those which are relevant for other countries, and emphasizing how our results can be relevant to policy-making.

3.2 - Data source: PISA 2003

We plan to work with PISA 2003, a choice which provides a series of advantages and is restricted by a few limitations. In what follows, the advantages and limitations are described, both generally (regarding all countries) and specifically (regarding Brazil).

PISA 2003 is not only the most recent but also, to date, the most complete student achievement dataset in terms of the number of participating countries and of the number of students represented by the samples. PISA datasets also provide detailed information on students' backgrounds and schools' operating conditions, which paves the way for describing a great number of relationships between students' and schools' characteristics and students' performance. In particular, it is possible to decompose test results into student, school, and teacher characteristics, with potential policy implications. Moreover, the focus of PISA is not restricted to assessing students' knowledge, but extends to evaluating their ability to reflect, and to apply their knowledge and experience to real-world issues (OECD, 2003: 9), which is of paramount importance in today's world. An additional virtue of those datasets is comparability, since it is possible to compare results drawn from PISA 2003 with those of the two sets of assessments which compose the first cycle (PISA 2000 and PISA Plus in 2002), and it will be possible to compare them with future cycles, planned to take place every three years.
Finally, PISA data and scales occupy a privileged status, as a kind of currency of school quality, in the protocols that are being developed by Hanushek & Wössman (2007) for pooling and analyzing different test score data.

It is also relevant to compare Brazilian education quality to that of countries which are more similar to Brazil in different respects. We would like to include in this subset of countries not only Latin American countries (which are the closest ones), but also the two Iberian countries (given their cultural
similarities with Latin American countries [5]), and finally other Southern European countries (which have relatively similar cultural patterns and, while being more developed than Latin American countries, are not as distant in terms of culture or development as Northern and Eastern European, Asian, or Anglo-Saxon countries are). By proceeding this way, we will be able to compare Brazil more carefully to a group of seven countries (ten if we include countries evaluated in the previous cycle of PISA) [6], in a vein similar to the one adopted in the Uruguay PISA 2003 national report (Ravela, 2004) to compare that country with other small-scale countries.

An additional reason for analyzing education quality in Brazil by means of the PISA datasets is the scarcity of studies of such data by Brazilian researchers. To the best of our knowledge, there is little published or ongoing research undertaken in the country using these datasets. One reason for that scarcity is certainly the fact that many researchers have chosen to focus on the national equivalent of PISA (the so-called SAEB), which is quite useful for intra-country analysis, as discussed in the first topic of this proposal.

The main limitation of PISA is the difficulty in inferring causality about the relationships unveiled by the data, as acknowledged by Willms (2006: pp. 55 and 71-72). That is due to the following features: (i) the cross-sectional design of the data and the absence of random assignment of students to treatment and control groups; (ii) test scores at age 15 reflect the cumulative effect of a series of factors over a long period of time, and not only the recent impact of schools; (iii) there is no information at the classroom level, but only at the individual and school levels. This latter point is also highlighted in the PISA Data Analysis Manual (OECD, 2005: chapter 13).
Yet it is useful to employ PISA data as a powerful descriptive device, as pointed out by Willms (2006), with presumable policy implications. One limitation which is specific to Brazil is the very low coverage of 15-year-old Brazilians both in PISA 2003 (around 54% of the cohort) and in PISA 2000 (around 2/3 of the cohort), which is due to the high dropout rates of very young pupils in the country. 7 This is a problem which would affect any data collected in Brazil through a school-based procedure. It is thus a problem which is common to other education quality datasets regarding Brazil. In any case, all the results should be interpreted taking this limitation into account. Additionally, some variables (manually coded items) are not available for Brazil in PISA 2003 because Brazil did not submit them on time (OECD, 2005: p. 240), but that should not constitute a major problem for our analysis. Finally, in PISA 2003, while information is available for relevant strata (regions, private/public-state/public-municipal, school size, urban/rural, school infrastructure index), there is no information on some important disaggregated levels (state, metropolitan/not, suburban/central), which is an important limitation for a more detailed description of education quality in Brazil. This limitation is partially compensated in another part of this research project, which uses the SAEB data.

5 The recently created research network Ibero-American PISA Group (or GIP) is evidence of the increasing interest in this kind of comparison.
6 Greece, Italy, Mexico, Portugal, Spain, Turkey, Uruguay took part in PISA Argentina, Peru, Chile took part in PISA Plus in Colombia joined this group of countries in PISA 2006.

Methodology

Due to the nested nature of the data, HLM (or multilevel or clusters) regressions 8 will also be estimated in this part of the proposal, in order to distinguish inter-individual variance from inter-school variance, as discussed in topic 1 above. Various studies, including Hanushek & Wössman (2007), maintain that teacher quality is of paramount importance for learning. Moreover, according to OECD data, teachers' wages accounted for around ¾ of total expenditures on primary and secondary education in Brazil at the beginning of the 2000s. So clearly, while teachers' characteristics are a component of the more general category of school inputs, it is advisable to include specific variables describing them in the estimations, both because of their impact on learning and because of their cost.
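As a rough illustration of the distinction between inter-individual and inter-school variance, the sketch below splits the total variation in test scores into a between-school and a within-school share. It is not an HLM estimation: it is only the raw sum-of-squares decomposition that motivates one, with no covariates and no sampling weights. The function name, the school identifiers and the scores are hypothetical and are not taken from PISA.

```python
from statistics import fmean


def variance_shares(scores_by_school):
    """Split total variation in scores into between- and within-school shares.

    scores_by_school: dict mapping a school id to a list of student scores.
    Returns (between_share, within_share); the two shares sum to 1.
    """
    all_scores = [s for scores in scores_by_school.values() for s in scores]
    grand_mean = fmean(all_scores)

    between_ss = 0.0  # variation of school means around the grand mean
    within_ss = 0.0   # variation of students around their own school mean
    for scores in scores_by_school.values():
        school_mean = fmean(scores)
        between_ss += len(scores) * (school_mean - grand_mean) ** 2
        within_ss += sum((s - school_mean) ** 2 for s in scores)

    total_ss = between_ss + within_ss
    return between_ss / total_ss, within_ss / total_ss


# Hypothetical toy data: two schools with three students each.
toy = {
    "school_a": [400.0, 420.0, 440.0],
    "school_b": [500.0, 520.0, 540.0],
}
between, within = variance_shares(toy)
print(f"between-school share: {between:.3f}, within-school share: {within:.3f}")
```

In this toy case most of the variation lies between schools. A multilevel regression would estimate the same two components while also conditioning on student-level and school-level variables such as the teacher characteristics discussed above.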
We will take the models proposed and estimated by Willms (2006: p. 57), Dumay & Dupriez (2004), Sprietsma (2007) and Fuchs and Wössman (2007) as a point of departure for the international comparisons using all the countries' data. Then we will check alternative specifications, taking into account relevant specificities of Brazil and of the subset of countries described above (Latin American plus Southern European countries). To assess educational inequality, both across countries and across Brazilian intra-national units, we will apply the usual distributional analysis tools to the PISA data, such as dominance analysis and decomposable inequality indices. To assess educational inequity, both across countries and across Brazilian intra-national units, we will calculate SES gradients and school profiles (e.g., Willms, 2006; Vandenberghe & Zachary, 2000). But we will also employ recently developed opportunity-dominance analysis tools (Pistolesi et al., 2005; Lefranc et al., 2006) and indices of equality of educational opportunity (Checchi & Peragine, 2005), which are based on, or try to go beyond, the normative framework stated by Roemer (1998), and which we have recently explored using SAEB data (Waltenberg, 2007).

7 This problem is also mentioned in Hanushek & Wössman (2007: Section 5), where they try to calculate overall literacy in different developing countries by combining household surveys and test score data.
8 Bryk & Raudenbush (1992), Wooldridge (2002).

4) Effects of the Quality of Education on Earnings
Instituto Futuro Brasil

In this topic we intend to investigate and quantify the impact of the quality of education on future wages in Brazil, using a pseudo-panel formed of groups of individuals born in 1977/78 in various States and followed until

Methodology

Many international studies show that educational quality positively influences individuals' future wages [Murnane et al. (1995), Murphy and Peltzman (2004)], their probability of continuing on to higher education [Rivkin (1995)] and countries' economic growth [Bishop (1989), Hanushek and Kimko (2000)]. The econometric analysis we propose to carry out in this topic will use a pseudo-panel and correct for selection bias. The literature on pseudo-panels was pioneered by Browning et al. (1985), and the technique is used by researchers who do not have panel data, but instead have various mutually independent cross sections in which different individuals are interviewed in each period.
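The construction of a pseudo-panel can be sketched as follows, assuming that each cross section records a birth year, a State and a wage for every respondent. Individuals are grouped into cells defined by birth cohort and State, and each cell's mean wage is tracked across survey years, so that the cell, rather than the individual, becomes the unit followed over time. The survey years, the records and the function name are hypothetical, and the selection-bias correction mentioned above is not included.

```python
from collections import defaultdict
from statistics import fmean


def build_pseudo_panel(cross_sections):
    """Aggregate independent cross sections into cohort-by-State cell means.

    cross_sections: dict mapping a survey year to a list of
        (birth_year, state, wage) records; different individuals
        are interviewed in each year.
    Returns a dict mapping (birth_year, state) to {survey_year: mean wage}.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for year, records in cross_sections.items():
        for birth_year, state, wage in records:
            cells[(birth_year, state)][year].append(wage)

    # Replace each list of individual wages with the cell mean for that year.
    return {
        cell: {year: fmean(wages) for year, wages in sorted(by_year.items())}
        for cell, by_year in cells.items()
    }


# Hypothetical toy data: two survey years, two cohort-by-State cells.
cross_sections = {
    1995: [(1977, "SP", 100.0), (1977, "SP", 120.0), (1978, "RJ", 90.0)],
    2000: [(1977, "SP", 200.0), (1977, "SP", 220.0), (1978, "RJ", 150.0)],
}
panel = build_pseudo_panel(cross_sections)
for cell, series in sorted(panel.items()):
    print(cell, series)
```

Each cell then has one observation per survey year, which makes it possible to work with lagged cell means even though no individual is observed twice.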
The objective of this technique is to overcome the limits of the cross sections, while using their advantages in relation to panel data. The limitations result from the fact that the researcher does not have lagged values for the variables, which in