Jul 13, 2004
I've done a fair bit of essay assessment, not so much recently, but enough foot-high stacks to have developed a methodology. I think that the assessment of bulletin board discussions falls under the same metric, and I have done a great deal of that as well.
In assessing discursive writing - the sort of writing we would expect to find in student essays or on discussion boards - I look at two major criteria: sentence construction, and reasoning. For the actual grading, I assess the degree of difficulty and then the number of errors. There are also some 'wildcards' that I will discuss at the end.
Looking at each of these in turn:
Sentence Construction
Students should be aware of the properties of a well-constructed sentence. In particular, the standard for each sentence is that it:
- say something
- do so clearly
The first criterion is not so frivolous as it may seem. Many sentences (at least, those penned by students in essays) do not actually say anything. With allowance for style and context, it can be said that sentence fragments do not say anything. The same can be said of sentences with ambiguous word usage, amphibolies, and the like. Such sentences are marked as errors.
The difficulty of creating a sentence (that says something) can be measured as a function of the semantic complexity of the sentence. A rough rule of thumb can be obtained from observing various possibilities of construction. Specifically, in order of increasing difficulty:
- Simple declaration: A is a B
- Categorical proposition: All As are Bs, Some As are Bs
- Logical Relation: if A then B, A or B
- Propositional attitude: it is true (false, likely) that P, P said that 'Q'
- Modality: it is possible that P, it is necessary that P
The second criterion, clarity, may also seem frivolous, but again, it is not. Most writing (and sadly, most academic writing) is unclear. Unclarity results not from the use of too complex a vocabulary, but from an inappropriate use of vocabulary. Unclarity stems from two major causes: vagueness, and superfluous precision.
Vagueness is caused when a more general expression (or an equivocal expression) is used where a specific expression is required. It is one thing to say "There is something in the woods" and quite another to say "There is a tiger in the woods." The use of vague expressions is a hedge; it frequently masks a lack of comprehension, and should be penalized.
Though greater precision aids clarity, there occurs a point at which additional precision ceases to add meaning to an expression. For example, if being an A entails being a B, then when an A is identified, nothing is added by the mention that it is also a B. For example, "There is a tiger in the woods" is clear; "There is a striped tiger with claws and teeth in the woods" is less clear.
Precision is obtained through two major mechanisms:
- word selection - selecting the most appropriate element of the taxonomy of possible entities (e.g., 'thing' - 'cat' - 'tiger')
- the use of adjectives, adverbs, and associated expressions and subordinate clauses.
A greater degree of difficulty exists the more precise a writer attempts to be, because it is more difficult to be precise than vague, and because with increasing precision comes the risk of superfluous precision.
Reasoning
There are four types of reasoning. Each of these is subject to different criteria of assessment. Addressing each briefly:
- Description: one or more sentences which assert that some thing is the case. A description typically has two parts: (a) a reference to one or more entities, and (b) the assertion that some property, function or relation is possessed by those entities. Contrary to popular belief, a properly constructed description is difficult to write. Clarity is critical.
A description is assessed according to whether it is accurate or inaccurate. 'Snow is white' is an accurate description, since snow is white; 'Snow is blue' is not an accurate description, since snow is not blue. Of course, we are not always able to verify the accuracy of a description. In such cases, we need to ask whether (a) the author was directly in a position to know (for example, the author is reporting a personal experience), or (b) the description could be independently verified - that is, the description is such that, were an observer in the appropriate place, he or she could distinguish whether or not a statement is true or false.
The purpose of the reference in an academic paper is to satisfy this latter criterion. The writer is asserting that 'P said Q'. In order for me to know whether it is true that 'P said Q' I need to be pointed to the location where P, in fact, said Q. The reference performs this function. It need not be added that it does not follow from the fact that 'P said Q' that 'Q is true'. The assertions of P are subject to the same assessment as those of the writer of the paper.
- Definition - the use of an expression to fix the meaning of a term or expression. In general, a definition will appeal (sometimes implicitly) to a taxonomy, and within that taxonomy, assert that an entity P is an entity of type T, and distinguished from other entities of type T by virtue of having properties, functions or relations P, F or R.
Working with this definition (and keeping in mind that words may be defined ostensively, through use, or through various other mechanisms), there are four major criteria for assessing definitions: clarity, wideness, narrowness, and consistency. Specifically, a definition must identify a discrete set of entities, it must subsume all the entities in question, and only the entities in question, and be such that there could exist at least one entity of that description.
The difficulty entailed by a definition is a function of the type of features used to distinguish entities falling under the definition. In general, it is more difficult (though often more useful) to define entities according to their function or their relation with other entities than by their properties.
- Argument - the use of one or more propositions, called the premises, in order to show that another proposition, called the conclusion, is true. The argument is usually the sole topic of most discussions of critical reasoning (sadly). Arguments are difficult to construct but relatively easy to assess.
There are two major forms of argument, each of which must be assessed according to its own criteria: the deductive argument, and the inductive argument. In general, though, the assessment of an argument falls into two stages: first, a determination of whether the premises are true, and second, an assessment of whether the conclusion follows from the premises.
The premises of an argument may be the result of any of the four forms of reasoning described here, and are thus evaluated according to the appropriate criteria.
The question of whether the conclusion follows is determined by the type of argument. In the case of deductive reasoning (which includes mathematics, propositional logic, predicate calculus, and more) this determination is mechanical, based solely on the form of the argument.
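To illustrate what 'mechanical' means here, the following is a minimal sketch of a validity check for the propositional case only, using a brute-force truth table. The truth-table method and the names `is_valid` and `implies` are my own choices for illustration; predicate calculus and mathematics need other procedures. An argument form is treated as valid when no assignment of truth values makes every premise true and the conclusion false.

```python
from itertools import product


def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b


def is_valid(premises, conclusion, variables):
    """Return True if no truth assignment makes all premises true
    while the conclusion is false (hypothetical helper)."""
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counterexample found: the form is invalid
    return True


# Modus ponens: if A then B; A; therefore B.
modus_ponens = is_valid(
    premises=[lambda e: implies(e["A"], e["B"]), lambda e: e["A"]],
    conclusion=lambda e: e["B"],
    variables=["A", "B"],
)

# Affirming the consequent: if A then B; B; therefore A.
affirming_consequent = is_valid(
    premises=[lambda e: implies(e["A"], e["B"]), lambda e: e["B"]],
    conclusion=lambda e: e["A"],
    variables=["A", "B"],
)

print(modus_ponens)          # True
print(affirming_consequent)  # False
```

Note that the check never looks at what A and B are about; only the form matters, which is the point being made about deductive reasoning.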
In an inductive argument, the premises are only required to establish a likelihood that the conclusion is true (a common error of assessment is to demand certainty of an inductive argument). The premises of an inductive argument constitute a 'sample' (a1 is a B, a2 is a B, etc.) while the conclusion may be either a generalization (a's are likely Bs) or a projection (a3 is likely a B).
Most inductive arguments that fail do so because either (a) the sample size is too small to warrant the conclusion, or (b) the sample is in some important respect unrepresentative of the population as a whole. Much student work commits one of these errors, for example, asserting that a generalization is true on the basis of a personal experience (an argument which, interestingly, commits both errors, since the sample size (1) is too small, and the sample (yourself) is unrepresentative, no matter how much you think the rest of the world is like you).
A special case of inductive reasoning worth touching on here is the causal argument, that is, an inductive argument that has a conclusion in the form 'A causes B'. Though it is common, in general, a conclusion of the form 'A causes B' cannot be established inductively; causal reasoning is most properly the product of an explanation (see below). Very often, a correlation (a sample that shows that 'When A occurs, B occurs' and 'When A does not occur, B does not occur') is used to infer that 'A causes B'. This conclusion, however, does not follow and is easily refuted by positing alternative explanations (for example, 'C causes both A and B').
- Explanation - the derivation of a causal relation or other underlying principle from an observed set of phenomena. Explanations typically take one of two forms: an invocation of an underlying cause or principle ('rain falls because of the condensation of water vapour') or a specific instance of the event or principle ('it is raining today because it was so humid yesterday'). In either case, the reasoning is the same, with one or more elements left implicit.
An inference to the best explanation (also known as 'abduction') is subject to several well-established criteria of evaluation. Among these are:
- genuine phenomena - many purported explanations are of phenomena that do not actually exist (for example, 'most people hate baseball because baseball involves competition')
- simplicity - a good explanation does not multiply entities beyond necessity (Ockham's razor)
- breadth - a good explanation applies to more instances, and more varied instances, of phenomena
- testability - the explanation can be used to make projections, which may then be observed as confirming (or disconfirming) instances
- relevance - the explanation is appropriate for the circumstances in which it is used (for example, a description of the process of photosynthesis may well explain why plants grow, but will not be of use to the gardener)
The oft-cited requirement that students 'consider other (or multiple) points of view' is essentially a request for an evaluation of explanations. Unlike other forms of reasoning, an explanation is assessed not in isolation but rather with respect to a set of alternative competing hypotheses.
Degree of Difficulty
Though people are swayed by such things as Bloom's taxonomy, there is no real difference in the difficulty inherent in one type of reasoning as opposed to another. Comprehension, for example, is often more difficult than evaluation ('I don't know what it is, but I know I don't like it'). Analysis is often impossible without an understanding of synthesis, while synthesis is often possible without prior knowledge of analysis.
A paper, therefore, that expresses an opinion is not inherently more difficult than, say, a paper that describes an event. The degree of difficulty is obtained through an analysis of the clarity attempted by the writer, that is, the achievement of a greater degree of precision without descent into obscurity.
Greater precision in sentence structure has been discussed above. Greater precision in reasoning is obtained through the use of multiple (and coherent) instances of reasoning.
For example, the following consists of one argument: 'Fred will win because he is faster than Jill'. And the following consists of two arguments: 'Fred is faster than Jill because he gets lower times, and so Fred will win'. The second instance of reasoning is more complex than the first.
Multiple instances of reasoning need also be coherent. The following also consists of two instances of reasoning: 'Fred will win because he is faster than Jill and pizza is good for you because it contains cheese'. The two arguments are not coherent, that is, neither plays any role in the formation of the other. The degree of difficulty in forming these two arguments is no greater than that involved in forming the first argument (though the quantity - a vastly over-rated criterion - is greater).
The overall difficulty of a paper, therefore, is a function of the precision attempted in each sentence and the precision attempted in the reasoning as a whole. This yields intuitive results: a paper in a specific discipline that requires care and attention to terminology is more difficult than a paper that requires and uses more general everyday terminology. A paper that assesses the work of another writer is more difficult than a paper that does not (because it involves a greater use of propositional attitudes). A paper that has a unifying theme - arguing for a single proposition, for example - is more difficult than a paper that makes a set of unrelated assertions.
Number of Errors
A paper consists of a finite number of sentences and (we hope) a non-zero and finite number of instances of reasoning. This constitutes what we may call the 'quantity' of the paper. An off-the-cuff calculation is usually sufficient to establish the quantity of a paper; a four page paper, for example, may consist of 1000 words and therefore roughly 100 sentences (of 10 words each; your mileage may vary). It takes on average three or four sentences to complete an instance of reasoning, so such a paper may contain 25 instances of reasoning. This creates a total of 125 possible errors.
Depending on your intent, you may want to weight these forms of errors more or less heavily. In my own classes I have always placed a greater emphasis on reasoning, and hence gave half the total weight to reasoning even though it represents only one fifth the number of total possible errors. This gives me a calculation of roughly 100 points, each sentence having a weight of 0.5 and each instance of reasoning having a weight of 2.
Note that if the average sentence is longer (as it would be if greater precision is attempted, which we would expect the higher the grade level) then the numbers of sentences and instances of reasoning will be lower, and these point values need to be adjusted accordingly.
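The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the figures already given (10 words per sentence, four sentences per instance of reasoning, 100 total points, half the weight on reasoning); the function and parameter names are hypothetical, and the rounding down with integer division is my assumption.

```python
def paper_quantity(words, words_per_sentence=10, sentences_per_reasoning=4):
    """Estimate the number of sentences and instances of reasoning in a paper."""
    sentences = words // words_per_sentence
    reasoning = sentences // sentences_per_reasoning
    return sentences, reasoning


def point_weights(sentences, reasoning, total_points=100.0, reasoning_share=0.5):
    """Split total_points between sentences and reasoning, then divide each
    share evenly among the possible errors of that kind."""
    sentence_weight = total_points * (1.0 - reasoning_share) / sentences
    reasoning_weight = total_points * reasoning_share / reasoning
    return sentence_weight, reasoning_weight


# The four-page, 1000-word paper from the text.
sentences, reasoning = paper_quantity(1000)
print(sentences, reasoning, sentences + reasoning)  # 100 25 125
print(point_weights(sentences, reasoning))          # (0.5, 2.0)

# Longer sentences (an assumed 20 words each) mean fewer possible errors,
# so each one carries more weight.
long_sentences, long_reasoning = paper_quantity(1000, words_per_sentence=20)
print(point_weights(long_sentences, long_reasoning))
```

Changing `reasoning_share` is where an instructor's own emphasis would enter; the values here simply reproduce the weighting described above.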
In different contexts, different forms of reasoning may be more important. For example, in a lower university class, it may be more important to the instructor that the student understood the papers he or she is reading and writing about, and therefore, more weight would be given to instances of description (and possibly definition) than to argument or explanation.
In any event, before marking papers, the instructor should have an assessment of what elements are important, that is, what degree of weight will be given to each sentence and to each instance of reasoning.
Marking then becomes an exercise of counting the errors and comparing this number against the total possible errors. This creates a raw grade, which is then measured against the difficulty expected for students at a given level.
There are no metrics (that I am aware of) that plot difficulty (as I have defined it) against grade level. In general, instructors are expected to have a rule of thumb (though this rule is sometimes expressed in terms of number of pages, not a useful metric). In general, students are expected to be more precise and more coherent as they progress (though they receive no training in this, except perhaps by osmosis).
At any rate, if we estimate that there are five levels of difficulty in a given assignment, then we define a range of five possible maximum scores: a perfect paper at difficulty level five will obtain 100, a perfect paper at difficulty level 4 will obtain only a 90, and so on. The final grade is then a function of the difficulty level and the percentage of errors.
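As a rough sketch of how the final grade might be computed: the text fixes only two points (a perfect paper scores 100 at level five and 90 at level four) and says 'and so on', so the 10-point step per level is an assumption, as is the choice to scale the maximum score by the fraction of points kept. The names `max_score` and `final_grade` are hypothetical.

```python
def max_score(difficulty_level, top_level=5, top_score=100.0, step=10.0):
    """Maximum score for a perfect paper at the given difficulty level.
    Assumes a constant 10-point drop per level below the top."""
    if not 1 <= difficulty_level <= top_level:
        raise ValueError("difficulty_level must be between 1 and top_level")
    return top_score - step * (top_level - difficulty_level)


def final_grade(difficulty_level, points_lost, points_possible=100.0):
    """Scale the level's maximum score by the share of points not lost.
    This multiplicative form is one possible reading, not the only one."""
    error_fraction = points_lost / points_possible
    return max_score(difficulty_level) * (1.0 - error_fraction)


# A level-4 paper with 3 sentence errors (0.5 points each)
# and 2 reasoning errors (2 points each).
points_lost = 3 * 0.5 + 2 * 2
print(round(final_grade(4, points_lost), 2))  # 85.05
```

Other mappings would serve as well, provided that a perfect paper at each level earns that level's maximum and that more errors always lower the grade.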
Wildcards
Wildcards are adjustments in grade for those students who break out of the curve, that is, those students who are exceptionally good or exceptionally bad. Though applied only in unusual instances, wildcards represent common sense constraints on the metric described above.
Relevance: most students will write on the topic assigned (or, if they select a topic, will select a topic related to the course content). On occasion, however, a student will write about something else entirely. It goes without saying that the submitted work should actually attempt to accomplish the assignment given, and thus, it is reasonable to penalize students for handing in work that does not address the question at hand.
Originality: originality is difficult to assess because what is original to the student may not be original to you. That said, originality is relatively easy for an experienced (human - I doubt that automatic grading systems are of much use here) instructor to identify.
The Metric in Practice
Consider, for example, the criteria offered by Melissa Lee Price (this is not to pick on a particular example, just to use an example that happens to be convenient).
Excellent: Essay shows a high degree of reflection and analysis of the teaching/learning experiences you've encountered in this online class. It is a thorough, critical assessment of the different communication experiences. It weighs the positives and negatives of each one and how each applies to communicating and working in the modern world. It is also well written with few or no grammar and spelling errors.
What constitutes 'excellent' in this description is on the one hand too vague and on the other hand too narrow. It is too vague because the meanings of 'reflection' and 'critical assessment' are unclear. Reflection and critical assessment are not forms of reason; they denote a melange of discourse that may include reason but may also include digressions and diatribes.
It is too narrow because, in attempting to clarify the meaning of critical assessment, it identifies only one of many discursive strategies. The discussion of positives and negatives is required only if you are working on a pre-theoretical basis, that is, it resembles brainstorming more than it does discourse. What should be required is not a mere (disassociated) listing of assessments but some consideration of why something would be thought of as a 'positive' (and some discussion of what would constitute a 'positive' in this context).
Spelling and grammar are also considered to be important, but there is no consideration of why. This encourages a definition of excellence where some functions are performed by rote. The objective is clarity, and proper spelling and grammar are what make a paper clear. And so the paper should be assessed on the basis of clarity, not adherence to a particular rule.
Average: Essay shows some degree of reflection and analysis of the teaching/learning experiences you've encountered in this online class. It lacks thoroughness and its critical assessment of the communication process is superficial. There is some understanding of how the different communication processes applies to the modern world. There are some spelling and grammar errors.
The difficulty I have with this next definition is that I am unable to determine what constitutes 'more' reflection or 'less' reflection. There is, presumably, some sort of discourse that is not reflective, and an average essay will have a greater amount of it than does an excellent essay.
I can only guess that the 'non-reflective' elements of a paper are those parts that constitute description. But a paper that was all reflection and no description would be terrible! You have to hang your reflective hat on something!
Moreover, if we assume that by 'reflection' we mean argumentation and explanation (which includes, to anticipate an objection, theorizing), then we must ask, what is meant, really, by a greater or lesser 'degree' of reflection? Presumably, the difference between an excellent paper and an average paper is not the amount of argument or explanation, but the quality of it and the degree of difficulty (that is, the extent to which it is coherent). But this aspect is nowhere to be found; on this metric, a bad argument is as good as a good argument.
Looking at these two grades of assessment, it appears that the author is looking for:
- a taxonomy of communication processes (or experiences; these two words are used interchangeably)
- (implicitly) a statement or account of what would make a communications process (or experience) good
- an assessment of each communication process with respect to this statement
Presumably such a taxonomy has been provided in class, as has some account of what makes them good or bad. Thus, the first two parts of the assignment are a request for a set of definitions and should be evaluated according to appropriate criteria (and note that 'reflection' and 'critical assessment' are not of a lot of use to this, the bulk, of the assignment; clarity of description (from sources offered as readings in class) and accuracy of definition are what count here).
The final part of the assignment requests an evaluation of the goodness or badness of each process of communication. Weighing the 'goodness and badness' is essentially the application of a set of evaluation criteria against an entity. For each, the writer will be required to form an argument: 'communications process C has property P, instances of P fail against criteria K, therefore C fails against K'.
The best possible essay, therefore, on these criteria will be one that most completely (and presumably, most accurately, though this criterion is not stated) fills out a grid measuring communications processes against assessment criteria. This may be all the author wants, but it is easy to see several ways in which such an essay could be better.
One way would be to increase the complexity of the assessment criteria, thereby increasing the complexity of the argumentation. For evaluation of a process is not univocal; whether something is good or bad depends very much on what you want to do. Thus, 'spraying a hose' has many advantages when you are trying to grow a garden; far fewer when trying to program a computer.
Another way would be to increase the complexity of the argumentation by weighting the assessment criteria. For example, it may be difficult to draw pictures using a discussion board, but this is of little importance in an English class, and rather more so in a drafting class.
The perfect paper (and possibly beyond the capacity of this class) would involve these considerations in a single coherent assessment of what constitutes assessment of communication processes. It would identify regularities among the assessments offered and offer an explanation of the regularities, supporting it with comparisons to alternative explanations, resulting in a process or mechanism through which a communications designer could reliably obtain a 'good' assessment.
Some Concluding Remarks
What I have tried to do in this paper is mostly to describe the way I mark papers. But as in everything, I do it this way because I think it ought to be done this way. And though I have kept the argumentation in this work to a minimum, I believe that what I offer here provides a greater degree of precision - and fairness - in the marking of papers.