Wednesday, 4 February 2015

University Entrance: Part II - Standards Based Testing

This post is part of a multi-part series on University Entrance and whether it is set at the right standard. For the previous part, click here - Introduction.

What is University Entrance?
We start by investigating the context in which the UE debate is held. The New Zealand Qualifications Authority (NZQA) states that:

University Entrance is the minimum requirement to go to a New Zealand university. To qualify, you will need:
- NCEA Level 3 [60 credits at Level 3 or above + 20 credits at Level 2 or above]
- 14 credits in each of three approved subjects at Level 3
- Literacy requirement: 10 credits at Level 2 or above, with at least 5 credits in writing and 5 credits in reading
- Numeracy requirement: 10 credits at Level 1 or above, made up of either achievement standards or all three of the unit standards 26623, 26626, 26627
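
As a rough illustration, the four requirements above can be written as a simple check. This is only a sketch: the CreditRecord class, its field names, and the meets_ue function are made up for this example and are not an NZQA tool. It also assumes that the subject credits passed in already belong to approved subjects, and that reading and writing credits are counted separately.

```python
from dataclasses import dataclass, field

# Unit standards named in the numeracy requirement above.
NUMERACY_UNIT_STANDARDS = {26623, 26626, 26627}


@dataclass
class CreditRecord:
    """Hypothetical summary of one student's credits (all names are assumptions)."""
    credits_l3_plus: int = 0   # credits at Level 3 or above
    credits_l2_plus: int = 0   # all credits at Level 2 or above (includes the Level 3 ones)
    approved_subject_credits: dict = field(default_factory=dict)  # subject -> Level 3 credits
    reading_credits: int = 0   # literacy credits in reading, Level 2 or above
    writing_credits: int = 0   # literacy credits in writing, Level 2 or above
    numeracy_credits: int = 0  # numeracy credits from achievement standards, Level 1 or above
    unit_standards_passed: set = field(default_factory=set)


def meets_ue(r):
    """Return True if the record satisfies all four requirements listed above."""
    # NCEA Level 3: 60 credits at Level 3+ plus a further 20 at Level 2+ (80 in total).
    has_level3 = r.credits_l3_plus >= 60 and r.credits_l2_plus >= 80
    # At least three approved subjects with 14 or more Level 3 credits each.
    subjects_ok = sum(1 for c in r.approved_subject_credits.values() if c >= 14) >= 3
    # Literacy: at least 5 reading and 5 writing credits (10 in total).
    literacy_ok = r.reading_credits >= 5 and r.writing_credits >= 5
    # Numeracy: 10 credits from achievement standards, or all three unit standards.
    numeracy_ok = (r.numeracy_credits >= 10
                   or NUMERACY_UNIT_STANDARDS <= r.unit_standards_passed)
    return has_level3 and subjects_ok and literacy_ok and numeracy_ok


student = CreditRecord(
    credits_l3_plus=62,
    credits_l2_plus=84,
    approved_subject_credits={"English": 16, "Calculus": 14, "Physics": 18},
    reading_credits=5,
    writing_credits=5,
    unit_standards_passed={26623, 26626, 26627},
)
print(meets_ue(student))
```

Note that every condition has to hold at once, so a student with plenty of credits overall can still miss UE by being one credit short in a third approved subject.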

It is important to note that this is the minimum requirement to be admitted to a university; each programme can then have additional or higher requirements. Often, the credits are converted into another score, such as a rank score (Auckland) or guaranteed entry score (Victoria), for the purposes of comparing NCEA to other high school qualifications like Cambridge (CIE) and International Baccalaureate (IB). It’s probably worth mentioning here that these blog posts will only deal with entrance for domestic students.

What existed before University Entrance (as we know it now)?
Before 2003/2004, the system had two parts. Year 12 and 13 students worked towards the School Certificate, which led towards the New Zealand University Bursary, or Bursary for short. Most students (Forms 6 and 7 at the time) completed internally assessed coursework and sat externally assessed exams in the same way as they do now for NCEA.

Marks for up to five subjects were added together; with each subject scored out of 100, an aggregate score of 300 or more was an A Bursary, and a score between 250 and 299 was a B Bursary. This influenced how much cash was provided by the government for tertiary study. On a side note, scholarship grades were separate and awarded to the top 3-4% of each subject. Paradoxically, scholarship grades did not have any monetary award attached to them, although Top Scholars (top in each subject or scholarship in five or six subjects) did receive additional money.
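
To make the arithmetic concrete, here is a minimal sketch of the aggregate calculation. The function name bursary_grade is made up for this example, and two details are assumptions rather than facts from the old rules: that the best five marks were the ones counted when a student sat more than five subjects, and that any aggregate of 300 or more counted as an A Bursary.

```python
def bursary_grade(marks):
    """Return (aggregate, grade) for a list of subject marks out of 100.

    Assumptions: only the best five marks count, and 300 or more is an A Bursary.
    """
    if any(not 0 <= m <= 100 for m in marks):
        raise ValueError("each mark must be between 0 and 100")
    # Add together the best five marks (or all of them if fewer than five).
    aggregate = sum(sorted(marks, reverse=True)[:5])
    if aggregate >= 300:
        return aggregate, "A Bursary"
    if aggregate >= 250:
        return aggregate, "B Bursary"
    return aggregate, None


print(bursary_grade([70, 65, 60, 55, 52]))  # (302, 'A Bursary')
print(bursary_grade([60, 55, 50, 45, 42]))  # (252, 'B Bursary')
```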

Critically, the exams were norm-referenced; students received grades dependent on their performance in comparison to other students, with a letter awarded based on a pre-determined distribution.

What does norm-referenced mean?
Students are compared to other students – you don’t have to get a high number of questions correct, you just have to do better than everyone else (or a large majority of them). It was very popular around the world because the outputs are predictable, mainly because the proportion of students who pass, and the spread of those students, are pre-determined. It allows easy comparison between two students, as you can easily say which one is “better”. It’s useful if you’re a government and you have to plan for the right number of students to enter universities because there are only so many spaces and only so much funding available.

Scores are often given in percentiles instead of percentages – the higher the percentile, the more highly you scored in comparison to other students. Alternatively, grade boundaries are used to divide students into broader groups, acknowledging that a student who scores 85% in a test is probably roughly at the same level as a student who scores 86%. The grade boundaries move year-to-year to ensure that the number of students in each group is roughly proportional to the desired distribution. This is the approach used in the Cambridge and IB examinations. For Bursary, it was a combination of both: marks were scaled to fit a distribution, and then letter grades were assigned based on fixed grade boundaries. The grade boundaries looked something like this:


For School Certificate, it was widely known that roughly half (or maybe 46%) of the students would fail. The results were scaled to fit a standard distribution, because it was decided that a standard distribution was the desired distribution. These days it seems weird that we once forced half of all students to fail. Over a number of decades, there were increasing calls for educational reform. Ultimately, we scrapped School Certificate and Bursary, and replaced them with NCEA, a standards-based system.

So what does standards-based mean?
The big problem with norm-referenced testing is that it doesn’t ensure that any particular student has proficiency in a particular skill or knowledge about a particular subject; only that they are better/worse at the skill or know more/less about the subject than another student. Standards-based testing sets targets (also known as criteria, as in criterion-based marking), and assesses whether or not the student can achieve those targets. The idea is that every student who passes a standard is known to be capable of achieving that target, regardless of whether they are in the top 50% or not. In fact, you shouldn’t care about what percentile they are in comparison to other students; just whether they achieved the standard or not.
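
The difference between the two approaches is easiest to see when both are applied to the same set of marks. The sketch below is purely illustrative: the function names, the 50% pass fraction, the cut score of 50, and the cohort marks are all made-up assumptions, not figures from any real exam.

```python
def norm_referenced_pass(scores, pass_fraction=0.5):
    """Pass the top `pass_fraction` of the cohort, whatever their raw scores.

    Ties are broken by position in the list, which a real system would
    need to handle more carefully.
    """
    n_pass = int(len(scores) * pass_fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    passed = set(ranked[:n_pass])
    return [i in passed for i in range(len(scores))]


def criterion_referenced_pass(scores, cut_score=50):
    """Pass every student who reaches the fixed target, regardless of rank."""
    return [s >= cut_score for s in scores]


cohort = [72, 68, 65, 61, 58, 55]  # a strong cohort: everyone is above 50

print(norm_referenced_pass(cohort))       # [True, True, True, False, False, False]
print(criterion_referenced_pass(cohort))  # [True, True, True, True, True, True]
```

With the norm-referenced rule, half of this cohort fails even though every student met the target; with the criterion-referenced rule, the pass rate depends only on where the target is set.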

NCEA breaks the targets up into different subjects, different areas within those subjects (achievement standards), and divides those subjects/areas into levels (the numerical levels 1, 2, and 3 as well as the Achieved, Merit, Excellence sublevels within each numerical level). Since we aren’t focused on making students better than other students, we can focus on getting students across each line and ensuring that every student can achieve as highly as they can. This seems to make a lot more sense to most people, and educational reform around the world is moving in this direction.

What’s the problem with standards then?
The challenge is where to set the targets. If you set them too low, then they’re too easy, so students and teachers are not incentivised to put in much effort. If you set them too high, then they’re too hard, so students and teachers are demoralised (and see no point in trying). On top of the psychological effects, setting the targets at appropriate levels has implications for sectors of society that rely on the standards to judge ability and competency, such as employers and universities. Holding a particular qualification, such as NCEA Level 2 with Merit, has to communicate to other people that this student will be capable of completing some task. If the standard isn’t quite in the right place, then the value of the qualification becomes more questionable. Additionally, there are issues surrounding “teaching to the test/standard” and students working only just hard enough to get across the line, rather than continuously working to be the best that they could be, as they would in a norm-referenced system like School Certificate. But let’s get back to getting the targets right.

How are the standards set?
If we purely followed the philosophy of standards-based testing, then you would set the standards at some level, and leave them there. If you need to, you make more standards higher or lower, and you make the standards public so people can look up what each standard means. However, there is a trade-off between the granularity/number of standards and the communicability of those standards. In a hypothetical world, you can have a hundred different targets under NCEA Level 1 math, and that would very accurately describe what the student can or can’t do. But when you tried to communicate that ability to someone else, it would be a very painful process; it makes life much easier to be able to say that the student has the ability to do geometry at an NCEA Level 1 Merit level. So in order to make things easier to understand, the system has to sacrifice some granularity and create broader bands.

When the standards exist in bands, then it’s much harder to get them right. Over time, you have to move them up and down slightly to ensure that you have the boundaries at the right levels. Accepting that you have to move them up and down means that the standards can be more responsive to the needs of society and the abilities of students; for example if students are getting smarter over time (as we hope they are), then the standards can drift upwards. So the NZQA monitors the results, consults with stakeholders, and decides whether standards need to move. In fact, the NZQA updates lists of standards that are being revised or reviewed on a monthly basis.

So what makes a standard “right”?
And here we arrive at what I think is probably the biggest issue. When the NCEA standards were first set, they were drafted, consulted upon, redrafted, and consulted upon again many times until teachers and government were mostly okay with things. It was a process that took many years, but they wanted to get things right. NCEA Level 1 was introduced in 2002, Level 2 in 2003, and Level 3 in 2004. As students sat the exams and went through the system, NZQA adjusted the standards as they deemed necessary. This is to be expected for any new system – some calibration is needed when you move from the theory to the real world. When the State Services Commission (SSC) investigated the performance of the NZQA in delivering secondary school qualifications in 2005, it found patterns such as this:


The targets were moved and they became harder. This didn’t happen to every achievement standard, but it occurred enough that one of the recommendations from the SSC report was that expectations with tolerances should be introduced to ensure that standards are appropriately set. Assessment should be consistent year to year, and assuming that the performance of students does not change significantly year to year, then roughly the same proportions of students should be scoring achieved, merit, and excellence each year. If assessment is unnecessarily harsh one year, then it would fall outside of the expected range and the alarm bells would ring.
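
The “expectations with tolerances” idea can be sketched as a simple range check on the share of students receiving each grade. Everything specific here is an assumption for illustration: the band values are invented and are not NZQA figures, and the function name out_of_band is hypothetical.

```python
from collections import Counter

# Invented expected ranges (low, high) for the share of students at each grade.
EXPECTED_BANDS = {
    "Not Achieved": (0.20, 0.25),
    "Achieved": (0.35, 0.50),
    "Merit": (0.15, 0.30),
    "Excellence": (0.03, 0.12),
}


def out_of_band(results, bands=EXPECTED_BANDS):
    """Return {grade: observed share} for every grade outside its expected range."""
    if not results:
        raise ValueError("no results to check")
    counts = Counter(results)
    total = len(results)
    flagged = {}
    for grade, (low, high) in bands.items():
        share = counts[grade] / total
        if not low <= share <= high:
            flagged[grade] = share
    return flagged


# A year in which the assessment was harsher than expected.
harsh_year = (["Not Achieved"] * 35 + ["Achieved"] * 40
              + ["Merit"] * 20 + ["Excellence"] * 5)
print(out_of_band(harsh_year))  # {'Not Achieved': 0.35}
```

An empty result means every grade fell inside its expected range; anything returned is what would set the alarm bells ringing.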

To be clear, the NZQA’s official position is that if 100% of students earn enough credits to meet the requirements for an NCEA certificate, then 100% of them pass. Education Minister Hekia Parata has set an explicit target that by 2017, at least 85% of 18-year-olds will have passed NCEA Level 2 (or equivalent). Pass rates for NCEA are generally increasing. But when we look at the individual standards, it becomes clear that something else is going on. For example, ENGLISH 90951 (NCEA Level 1 Unfamiliar Texts, which replaced ENGLISH 90057 after it expired in 2011) has this “Profile of Expected Performance” published for 2014:


The NZQA expects that somewhere between 20% and 25% of students who sit ENGLISH 90951 will fail. Exam papers are usually re-marked in order to force the grade distribution into these expected “profiles”, although the NZQA denies that this is scaling. In fact, the NZQA says “there is no predetermined distribution of grades”. The NZQA deems claims that profiles of expected performance (PEP) are scaling or norm-referenced to be myths. Perhaps regardless of the intention, the outcome is the same; a certain percentage of students are expected to fail.

So once PEPs were set, that dictated how to set standards. The “right” standard is the one that achieves the expected grade distribution. There are some tolerances (as there should be), but overall the standard is expected to be set at a level that causes some students to fail. There’s the problem – you can’t expect a 100% pass rate if you set a standard that forces some students to fail. It all seems reminiscent of… norm-referencing (even if NZQA calls that suggestion a myth). It’s important to note that you can still pass NCEA overall even if you fail individual achievement standards; that’s why NCEA overall pass rates are increasing.

What does this have to do with University Entrance?
Universities require students to be at a particular level of competency before they enter university; otherwise they would struggle to be able to provide degrees that are of sufficient quality to be accredited. So Universities NZ, the body that represents all universities in New Zealand, along with NZQA and other stakeholders, set a minimum bar. It makes sense to use NCEA to set the height of the bar, and thus University Entrance is standards-based. It causes UE perceptions to be subject to the psychological implications of standards. It causes UE to be susceptible to the fluctuations of changing standards levels. It causes UE to be affected by the use of Profiles of Expected Performance and their suggested fail rates. Ideally, universities would be able to say “an incoming student must be able to…” but the way that standards are used in New Zealand makes that tricky.

So it makes answering the question of “Is University Entrance in New Zealand set at the right standard?” just a little bit harder, because the very system used to measure University Entrance has some flaws. So far we’ve provided some context about what we used to do, why that was bad, what standards are, why we have standards-based testing, and what some of the shortcomings are. In the next section, we’ll have a look at the more recent history of University Entrance, why it was changed recently, and the perspective of government.

If you’re interested in reading further about the problems of NCEA and the steps taken to improve it, I suggest this 2007 policy paper from the Maxim Institute as a starting point. There are probably additional issues surrounding internal vs. external assessment, vague standards, and marking fairness, but those are outside the scope of this discussion on University Entrance. It’s interesting to see recommendations for NCEA taken on, with initiatives such as Grade Score Marking, which provides some additional granularity to the system without affecting overall communicability of the qualification. Ultimately, I think very few people would say that we should go back to the School Certificate and Bursary system; NCEA is mostly working and is still the better system.

Part III of this series - History, Reviews, and the Perspective of Government, is available here.
