Assessing Teachers Needs Research

Now that the summer exam results circus is over, and the usual suspects have made their usual criticisms of the examination system, I would like to suggest to OFSTED (and school managers) what they should be looking at when they assess teaching.

Currently, most schools operate formal lesson observations in which a manager sits in on a lesson and fills in a proforma. This sheet lists the key observations to make and a four-level grading system based on OFSTED's procedures, so that the school can defend it when inspectors arrive. (The grades range from 1 = very good to 4 = poor, with 3 = satisfactory, the new poor.) Teachers are, of course, carefully trained in the system, so that formally observed lessons fill one tick-box after another.

Particularly important to the watched are those lesson features deemed good practice. They can be set as hurdles, limiting the grades of otherwise good lessons if the box is not ticked. For example, 'are the lesson aims written on the board?' Not 'do the students understand what they are learning?' or 'were the students swept along?'. Another example: 'was ICT used in the lesson?', even if that use was no better than the non-computer alternative, since there is a government target on ICT use in the classroom.

So what would be better? I suggest that there is plenty of research evidence as to what techniques work in classrooms. Rather than writing off a teacher on their annual observation because they did not use ICT that lesson and the class exam average was below the 'benchmark', or because the teacher was idiosyncratic, the observers should be checking to see if the teacher was doing what objectively works.

Most educational interventions have some positive effect on students' achievement, so what is needed is a list of the most effective interventions, since we all have only limited time and energy. There are several literature reviews summarising the evidence for interventions. For example, here and here.

Effect sizes can show quickly which interventions are worth expending time and effort on and which can safely be given lower priority. A list from the first link shows these effect sizes:
Feedback / 1.13
Prior Ability / 1.04
Instructional Quality / 1.00
Direct instruction / 0.82
Remediation feedback / 0.65
Student disposition / 0.61
Class environment (culture) / 0.56
Challenge/goals / 0.52
Peer tutoring / 0.50
Mastery learning / 0.50
Team teaching / 0.06
Behavioural objectives / 0.12
Finances/money / 0.12
Individualisation / 0.14
Audio visual aids / 0.16
Ability grouping / 0.18
Effect sizes do not tell you what is good, but they do indicate what actually improves student outcomes. An effect size of 1.0 is well worth achieving, and is approximately equivalent to one year of advancement. 0.5 is well worth a try. Requiring your teachers to include interventions with lower effect sizes may be counter-productive; indeed, some of your better teachers may start to quietly rebel.
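For readers unfamiliar with the term: an effect size here is the standardised mean difference (Cohen's d), the gap between the treated and control group means divided by their pooled standard deviation. A minimal sketch of the calculation, using made-up class marks rather than any real study data:

```python
import statistics

def cohens_d(treated, control):
    """Effect size: difference in means over the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = statistics.stdev(treated), statistics.stdev(control)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled

# Hypothetical exam marks: a class given structured feedback vs. one without
with_feedback = [62, 70, 58, 75, 66, 71, 64, 69]
without       = [55, 60, 48, 63, 57, 59, 52, 61]
print(round(cohens_d(with_feedback, without), 2))
```

The literature reviews linked above do this aggregation across many studies at once; the arithmetic per study is no more than this.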

Lesson observations have the power to force teachers to do what the Principal or Head Teacher wants them to do. It is essential that these demands are informed by the best educational research, and not by political or bureaucratic considerations.



And Another Thing...

Why do the top A Level stories in the newspapers, especially the front-page photo ones, always exclude successful boys? Or, for that matter, any but the prettiest girls? Or black students?

Yes, I do know the answer, but it hardly matches the equality rhetoric of their editorials.

Defensive Statistics

The results are in, so now is the time for all conscientious Physics teachers to analyse their A Level results.

The newspapers are always first off the mark with ranked tables of gross percentages, showing what proportion of each grammar or independent school gained A or B grades. The tables are then dissected and the top schools pronounced. That they are always selective, either academically or socially, will not be dwelt upon.

Within schools, the Physics teachers will have a couple of weeks to analyse the results of their own students. There are two pressing reasons: to properly assess your own performance, and to have a ready defence against naive assessments by managers.

There will be decisions made using unjustified or unreliable comparisons between very different subjects, and with small-sample problems. (Why on Earth do class stats get reported to three significant figures?) There will be a need to explain that Johnny did not necessarily underperform in Physics just because he got higher grades in Computing and English.
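The three-significant-figure complaint is easy to make concrete. A quick sketch, with invented marks for a class of twelve, comparing the quoted mean against the standard error that a sample this small actually carries:

```python
import statistics

# Hypothetical A Level point scores for a class of 12
marks = [71, 64, 58, 80, 55, 69, 73, 62, 77, 60, 66, 68]

mean = statistics.mean(marks)
# Standard error of the mean: sample standard deviation over sqrt(n)
sem = statistics.stdev(marks) / len(marks) ** 0.5

print(f"mean = {mean:.3f}")       # the figure a report would quote
print(f"+/- roughly {sem:.1f}")   # the uncertainty swallowing the extra digits
```

With an uncertainty of a couple of points on the mean, everything after the first two figures is noise dressed up as precision.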

I have had a cross Head of Sixth Form present me with data showing that half the Physics class had Physics as their worst grade at AS Level. Was I really suggesting, he asked, that Physics was harder than other subjects? Well, er, yes.

It would be funny if important decisions weren't being made based on these amateur stats and analysis.


A Level Results Up, Again

It's time for the annual hoo-ha over the remorseless rise in A Level passes. Hooray! Students starting the first year of their courses next month will be the first to have the chance of A* grades if they get a grade A in the first year and 90%+ in the second year units. This is the best that can be done with the erosion of grade As as the identifiers of top flight students, now awarded at a rate of nearly 30%.

And how long before grade inflation renders the A* insufficiently discriminating, requiring A**? With a one fifth increase in A grades in the last five years, it won't be long.

None of this affects this year's students, but stabilising standards in the long run will surely be essential.

Unless the government is happy to see A Levels wither away.


Sats Results - Not a Good Test

The Press Association manages to make much of a one percent change in Key Stage 3 figures:
Teachers' leaders have warned that too much importance is placed on national school tests as figures revealed a drop in reading standards.

Almost one in three 14-year-olds are failing to reach the reading level expected of their age group, according to Government figures.
Dr Mary Bousted, general secretary of the Association of Teachers and Lecturers (ATL), said: "The Key Stage 3 tests are an irrelevance. No-one will be interested in the results when young people apply for a job."
Plummet? Physics teachers train their students to assess the uncertainty in data they present, yet the government press release makes much of a 1% drop in one measure. The sample size is huge, for sure, but the tests are different each year and there is an element of judgement in deciding the grade boundaries. Is Ed Balls really saying that these tests are so precise, year on year? Is he aware of the reliability of the tests? Can he tell us what they actually measure, for that matter?
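To see why the huge sample is beside the point, here is a back-of-envelope sketch with assumed round numbers (a cohort of roughly 600,000 and a pass proportion of about 69%; both are illustrative, not the official figures):

```python
# Hypothetical Key Stage 3 cohort and pass proportion
n = 600_000
p = 0.69  # proportion reaching the expected reading level

# Binomial standard error on the proportion
se = (p * (1 - p) / n) ** 0.5
print(f"sampling error ~ {100 * se:.2f} percentage points")
```

The sampling error is a few hundredths of a percentage point, so a 1% year-on-year change is not sampling noise. It is exactly the size of change that a different test paper and a judgement call on grade boundaries can produce, which is the point: the dominant uncertainty is systematic, and no cohort size fixes that.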

At least the ATL has called it right. An irrelevance, except that their side effects are not. This, from the Guardian, puts the issue well:
For all age cohorts, the system of assessment is now the major inhibitor of much-needed curriculum innovation.
But Sats are also suffering from a more general public policy problem. Even if they were ever a reliable indicator of performance, over time they've tended to become merely a guide to schools' willingness and ability to teach to the test.

This is a classic example of Goodhart's law - that a measure of performance is no longer a reliable indicator once it becomes a target.
Why does everyone but ministers and managers know this?


Are Fewer Pupils Bunking Off, Really?

The DCSF has released school pupil absence data "presented as emerging findings (based on provisional data)" that seem to show slightly reduced unauthorised absence rates during the recent Spring Term, down to 1.00% from 1.06%. The reasons given for publishing emerging findings are "to help planning, to study trends and to monitor outcomes of initiatives and interventions on pupil attendance."

The data as presented, naturally, do not even remotely meet this worthy aim as they are collected together under a few broad headings. There is no attempt to report on any controlled tests of any of these initiatives, so no judgements can be made for them. But, surely, a drop in the gross absence figures should be welcomed?

No. Since the data are reported publicly on a school-by-school basis and included in the league tables, the collection of this data is contaminated. The data collectors, the schools themselves, benefit from improvements in the figures, so small reported drops are just that. Reported. In a previous position I was myself instructed by the headteacher to change absences marked in my form's register from unauthorised (i.e. skiving) to authorised, to make the school's figures look better.

It is high time that government stopped measuring the education system by measuring every student all the time. Any sensible industry would use random samples for overall judgements: cheaper, reliable and immune to fiddling by schools.
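The sampling point can be made concrete. A toy simulation under invented assumptions (20,000 schools, uniformly spread true absence rates, and a fixed downward "fiddle" applied to every self-reported figure): a census of fiddled returns inherits the bias in full, while an independently audited random sample is merely noisy.

```python
import random

random.seed(1)

# Hypothetical true unauthorised-absence rates for 20,000 schools
true_rates = [random.uniform(0.0, 0.03) for _ in range(20_000)]

# Schools shave a fraction off their own returns (systematic fiddling)
reported = [max(0.0, r - 0.002) for r in true_rates]

# A census of self-reported figures carries the full bias...
census_estimate = sum(reported) / len(reported)

# ...while an externally audited random sample of 500 schools
# carries only sampling error, which honest error bars cover.
sample = random.sample(true_rates, 500)
sample_estimate = sum(sample) / len(sample)

print(f"census of reported figures: {100 * census_estimate:.2f}%")
print(f"audited random sample:      {100 * sample_estimate:.2f}%")
```

The census is precise but wrong; the sample is approximate but right. And since no individual school's sampled figure feeds a league table, the incentive to fiddle disappears with it.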