Assessing Discursive Writing

Terry Anderson writes in IFETS: Thanks Melissa for your comments about tying the "discussion" into the assessment. Could you (or anyone else) on this list talk a bit more about how one does "Assess a personal essay about their participation in the BBS and on the web project. This is where the QUALITATIVE assessment comes in."

I've done a fair bit of essay assessment, not so much recently, but enough foot-high stacks to have developed a methodology. I think that the assessment of bulletin board discussions falls under the same metric, and I have done a great deal of that as well.

In assessing discursive writing - the sort of writing we would expect to find in student essays or on discussion boards - I look at two major criteria: sentence construction, and reasoning. For the actual grading, I assess the degree of difficulty and then the number of errors. There are also some 'wildcards' that I will discuss at the end.

Looking at each of these in turn:

Sentence construction

Students should be aware of the properties of a well-constructed sentence. In particular, the standard for each sentence is that it:
- say something
- do so clearly

The first criterion is not so frivolous as it may seem. Many sentences (at least, those penned by students in essays) do not actually say anything. With allowance for style and context, it can be said that sentence fragments do not say anything. The same can be said of sentences with ambiguous word usage, amphibolies, and the like. Such sentences are marked as errors.

The difficulty of creating a sentence (that says something) can be measured as a function of the semantic complexity of the sentence. A rough rule of thumb can be obtained from observing various possibilities of construction. Specifically, in order of increasing difficulty:
- Simple declaration: A is a B
- Categorical proposition: All As are Bs, Some As are Bs
- Logical Relation: if A then B, A or B
- Propositional attitude: it is true (false, likely) that P, P said that 'Q'
- Modality: it is possible that P, it is necessary that P

The second criterion, clarity, may also seem frivolous, but again, it is not. Most writing (and sadly, most academic writing) is unclear. Unclarity results not from the use of too complex a vocabulary, but from an inappropriate use of vocabulary. Unclarity stems from two major causes: vagueness, and superfluous precision.

Vagueness is caused when a more general expression (or an equivocal expression) is used where a specific expression is required. It is one thing to say "There is something in the woods" and quite another to say "There is a tiger in the woods." The use of vague expressions is a hedge; it frequently masks a lack of comprehension, and should be penalized.

Though greater precision aids clarity, there occurs a point at which additional precision ceases to add meaning to an expression. For example, if being an A entails being a B, then when an A is identified, nothing is added by the mention that it is also a B. For example, "There is a tiger in the woods" is clear; "There is a striped tiger with claws and teeth in the woods" is less clear.

Precision is obtained through two major mechanisms:
- word selection - selecting the most appropriate element of the taxonomy of possible entities (e.g., 'thing' - 'cat' - 'tiger')
- the use of adjectives, adverbs, and associated expressions and subordinate clauses.

A greater degree of difficulty exists the more precise a writer attempts to be, because it is more difficult to be precise than vague, and because with increasing precision comes the risk of superfluous precision.

Reasoning

There are four types of reasoning. Each of these is subject to different criteria of assessment. Addressing each briefly:

- Description: one or more sentences which assert that some thing is the case. A description typically has two parts: (a) a reference to one or more entities, and (b) the assertion that some property, function or relation is possessed by those entities. Contrary to popular belief, a properly constructed description is difficult to write. Clarity is critical.

A description is assessed according to whether it is accurate or inaccurate. 'Snow is white' is an accurate description, since snow is white; 'Snow is blue' is not an accurate description, since snow is not blue. Of course, we are not always able to verify the accuracy of a description. In such cases, we need to ask whether (a) the author was directly in a position to know (for example, the author is reporting a personal experience), or (b) the description could be independently verified - that is, the description is such that, were an observer in the appropriate place, he or she could distinguish whether or not a statement is true or false.

The purpose of the reference in an academic paper is to satisfy this latter criterion. The writer is asserting that 'P said Q'. In order for me to know whether it is true that 'P said Q' I need to be pointed to the location where P, in fact, said Q. The reference performs this function. It need not be added that it does not follow from the fact that 'P said Q' that 'Q is true'. The assertions of P are subject to the same assessment as those of the writer of this paper.

- Definition - the use of an expression to fix the meaning of a term or expression. In general, a definition will appeal (sometimes implicitly) to a taxonomy, and within that taxonomy, assert that an entity P is an entity of type T, and distinguished from other entities of type T by virtue of having properties, functions or relations P, F or R.

Working with this definition (and keeping in mind that words may be defined ostensively, through use, or through various other mechanisms) then there are four major criteria for assessing definitions: clarity, wideness, narrowness, and consistency. Specifically, a definition must identify a discrete set of entities, it must subsume all the entities in question, and only the entities in question, and be such that there could exist at least one entity of that description.

The difficulty entailed by a definition is a function of the type of features used to distinguish entities falling under the definition. In general, it is more difficult (though often more useful) to define entities according to their function or their relation with other entities than by their properties.

- Argument - the use of one or more propositions, called the premises, in order to show that another proposition, called the conclusion, is true. The argument is usually the sole topic of most discussions of critical reasoning (sadly). Arguments are difficult to construct but relatively easy to assess.

There are two major forms of argument, each of which must be assessed according to its own criteria: the deductive argument, and the inductive argument. In general, though, the assessment of an argument falls into two stages: first, a determination of whether the premises are true, and second, an assessment of whether the conclusion follows from the premises.

The premises of an argument may be the result of any of the four forms of reasoning described here, and are thus evaluated according to the appropriate criteria.

The question of whether the conclusion follows is determined by the type of argument. In the case of deductive reasoning (which includes mathematics, propositional logic, predicate calculus, and more) this determination is mechanical, based solely on the form of the argument.
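To illustrate what 'mechanical' means here, the following is a minimal Python sketch of a truth-table check for small propositional arguments. It is only an illustration, not part of the marking method: the names `is_valid` and `implies` are hypothetical, and the approach covers propositional logic only, not predicate calculus or mathematics generally.

```python
from itertools import product

def implies(a, b):
    """Material conditional: 'if a then b'."""
    return (not a) or b

def is_valid(premises, conclusion, num_vars):
    """An argument form is valid if no assignment of truth values
    makes every premise true while the conclusion is false."""
    for values in product([True, False], repeat=num_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # found a counterexample
    return True

# Modus ponens: if A then B; A; therefore B.
modus_ponens = is_valid(
    [lambda a, b: implies(a, b), lambda a, b: a],
    lambda a, b: b,
    2,
)

# Affirming the consequent: if A then B; B; therefore A.
affirming_consequent = is_valid(
    [lambda a, b: implies(a, b), lambda a, b: b],
    lambda a, b: a,
    2,
)

print(modus_ponens, affirming_consequent)
```

The check never looks at what A and B mean. The first form passes and the second fails purely because of their shape, which is the sense in which the assessment depends solely on the form of the argument.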

In an inductive argument, the premises are only required to establish a likelihood that the conclusion is true (a common error of assessment is to demand certainty of an inductive argument). The premises of an inductive argument constitute a 'sample' (a1 is a B, a2 is a B, etc) while the conclusion may be either a generalization (a's are likely Bs) or a projection (a3 is likely a B).

Most inductive arguments that fail do so because either (a) the sample size is too small to warrant the conclusion, or (b) the sample is in some important respect unrepresentative of the population as a whole. Much student work commits one of these errors, for example, asserting that a generalization is true on the basis of a personal experience (an argument which, interestingly, commits both errors, since the sample size (1) is too small, and the sample (yourself) is unrepresentative, no matter how much you think the rest of the world is like you).

A special case of inductive reasoning worth touching on here is the causal argument, that is, an inductive argument that has a conclusion in the form 'A causes B'. Though it is common, in general, a conclusion of the form 'A causes B' cannot be established inductively; causal reasoning is most properly the product of an explanation (see below). Very often, a correlation (a sample that shows that 'When A occurs, B occurs' and 'When A does not occur, B does not occur') is used to infer that 'A causes B'. This conclusion, however, does not follow and is easily refuted by positing alternative explanations (for example, 'C causes both A and B').

- Explanation - the derivation of a causal relation or other underlying principle from an observed set of phenomena. Explanations typically take one of two forms: an invocation of an underlying cause or principle ('rain falls because of the condensation of water vapour') or a specific instance of the event or principle ('it is raining today because it was so humid yesterday'). In either case, the reasoning is the same, with one or more elements left implicit.

An inference to the best explanation (also known as 'abduction') is subject to several well established criteria of evaluation. Among these are:
- genuine phenomena - many purported explanations are of phenomena that do not actually exist (for example, 'most people hate baseball because baseball involves competition')
- simplicity - a good explanation does not multiply entities beyond necessity (Ockham's razor)
- breadth - a good explanation applies to more instances, and more varied instances, of phenomena
- testability - the explanation can be used to make projections, which may then be observed as confirming (or disconfirming) instances
- relevance - the explanation is appropriate for the circumstances in which it is used (for example, a description of the process of photosynthesis may well explain why plants grow, but will not be of use to the gardener)

The oft-cited requirement that students 'consider other (or multiple) points of view' is essentially a request for an evaluation of explanations. Unlike other forms of reasoning, an explanation is assessed not in isolation but rather with respect to a set of alternative competing hypotheses.

Degree of Difficulty

Though people are swayed by such things as Bloom's taxonomy, there is no real difference in the difficulty inherent in one type of reasoning as opposed to another. Comprehension, for example, is often more difficult than evaluation ('I don't know what it is, but I know I don't like it'). Analysis is often impossible without an understanding of synthesis, while synthesis is often possible without prior knowledge of analysis.

A paper, therefore, that expresses an opinion is not inherently more difficult than, say, a paper that describes an event. The degree of difficulty is obtained through an analysis of the clarity attempted by the writer, that is, the achievement of a greater degree of precision without descent into obscurity.

Greater precision in sentence structure has been discussed above. Greater precision in reasoning is obtained through the use of multiple (and coherent) instances of reasoning.

For example, the following consists of one argument: 'Fred will win because he is faster than Jill'. And the following consists of two arguments: 'Fred is faster than Jill because he gets lower times, and so Fred will win'. The second instance of reasoning is more complex than the first.

Multiple instances of reasoning need also be coherent. The following also consists of two instances of reasoning: 'Fred will win because he is faster than Jill and pizza is good for you because it contains cheese'. The two arguments are not coherent, that is, neither plays any role in the formation of the other. The degree of difficulty in forming these two arguments is no greater than that involved in forming the first argument (though the quantity - a vastly over-rated criterion - is greater).

The overall difficulty of a paper, therefore, is a function of the precision attempted in each sentence and the precision attempted in the reasoning as a whole. This yields intuitive results: a paper in a specific discipline that requires care and attention to terminology is more difficult than a paper that requires and uses more general everyday terminology. A paper that assesses the work of another writer is more difficult than a paper that does not (because it involves a greater use of propositional attitudes). A paper that has a unifying theme - arguing for a single proposition, for example - is more difficult than a paper that makes a set of unrelated assertions.

Number of Errors

A paper consists of a finite number of sentences and (we hope) a non-zero and finite number of instances of reasoning. This constitutes what we may call the 'quantity' of the paper. An off-the-cuff calculation is usually sufficient to establish the quantity of a paper; a four page paper, for example, may consist of 1000 words and therefore roughly 100 sentences (of 10 words each; your mileage may vary). It takes on average three or four sentences to complete an instance of reasoning, so such a paper may contain 25 instances of reasoning. This creates a total of 125 possible errors.

Depending on your intent, you may want to weigh these forms of errors more or less heavily. In my own classes I have always placed a greater emphasis on reasoning, and hence gave half the total weight to reasoning even though it represents only one fifth the number of total possible errors. This gives me a calculation of roughly 100 points, each sentence having a weight of 0.5 and each instance of reasoning having a weight of 2.

Note that if the average sentence is longer (as it would be if greater precision is attempted, which we would expect the higher the grade level) then the numbers of sentences and instances of reasoning will be lower, and these point values need to be adjusted accordingly.
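The arithmetic above can be sketched in a few lines of Python. This is a rough illustration only: the function name `point_weights` and its parameters are hypothetical, and the defaults are simply the off-the-cuff figures used in the text (10 words per sentence, four sentences per instance of reasoning, 100 points split evenly between sentences and reasoning).

```python
def point_weights(words, words_per_sentence=10, sentences_per_instance=4,
                  total_points=100, reasoning_share=0.5):
    """Estimate the 'quantity' of a paper and the weight of each error.

    Every default is an assumption taken from the rough figures in the
    text; change them to suit the grade level and desired emphasis.
    """
    sentences = words / words_per_sentence
    instances = sentences / sentences_per_instance
    return {
        "sentences": sentences,
        "instances": instances,
        "possible_errors": sentences + instances,
        # Points available to sentences, spread over all sentences.
        "sentence_weight": total_points * (1 - reasoning_share) / sentences,
        # Points available to reasoning, spread over all instances.
        "reasoning_weight": total_points * reasoning_share / instances,
    }

four_pages = point_weights(1000)
longer_sentences = point_weights(1000, words_per_sentence=20)
print(four_pages)
print(longer_sentences)
```

With the default figures this reproduces the numbers in the text: 100 sentences, 25 instances of reasoning, 125 possible errors, and weights of 0.5 and 2. Doubling the average sentence length halves both counts and doubles both weights, which is the kind of adjustment the note about longer sentences calls for.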

In different contexts, different forms of reasoning may be more important. For example, in a lower university class, it may be more important to the instructor that the student understood the papers he or she is reading and writing about, and therefore, more weight would be given to instances of description (and possibly definition) than to argument or explanation.

In any event, before marking papers, the instructor should have an assessment of what elements are important, that is, what degree of weight will be given to each sentence and to each instance of reasoning.

Marking then becomes an exercise of counting the errors and comparing this number against the total possible errors. This creates a raw grade, which is then measured against the difficulty expected for students at a given level.

There are no metrics (that I am aware of) that plot difficulty (as I have defined it) against grade level. In general, instructors are expected to have a rule of thumb (though this rule is sometimes expressed in terms of number of pages, not a useful metric). In general, students are expected to be more precise and more coherent as they progress (though they receive no training in this, except perhaps by osmosis).

At any rate, if we estimate that there are five levels of difficulty in a given assignment, then we define a range of five possible maximum scores: a perfect paper at difficulty level five will obtain 100, a perfect paper at difficulty level 4 will obtain only a 90, and so on. The final grade is then a function of the difficulty level and the percentage of errors.
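One possible reading of this calculation is sketched below in Python. The text says only that the final grade is a function of the difficulty level and the percentage of errors, so the details here are assumptions: the names `max_score` and `final_grade` are hypothetical, the ceiling is assumed to drop by 10 points per level, the default weights are assumed to be 0.5 per sentence error and 2 per reasoning error, and the raw grade is assumed to be scaled by the ceiling rather than, say, capped by it.

```python
def max_score(level, top_level=5, step=10):
    """Ceiling for a perfect paper: 100 at level 5, 90 at level 4, and so on
    (the 10-point step is an assumption extrapolated from those two values)."""
    return 100 - step * (top_level - level)

def final_grade(sentence_errors, reasoning_errors, level,
                sentence_weight=0.5, reasoning_weight=2.0):
    """Raw grade out of 100 from weighted error counts, then scaled
    to the ceiling for the paper's difficulty level (assumed rule)."""
    points_lost = (sentence_errors * sentence_weight
                   + reasoning_errors * reasoning_weight)
    raw = max(0.0, 100.0 - points_lost)
    return raw * max_score(level) / 100.0

# Ten sentence errors and five reasoning errors in a level 4 paper.
grade = final_grade(10, 5, level=4)
print(grade)
```

In this example the paper loses 15 points for a raw grade of 85, which scaled to the level 4 ceiling of 90 gives 76.5. An instructor who preferred capping to scaling would replace the last line of `final_grade` with a `min` of the raw grade and the ceiling.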

Wildcards

Wildcards are adjustments in grade for those students that break out of the curve, that is, those students that are exceptionally good or exceptionally bad. Though applied only in unusual instances, wildcards represent common sense constraints on the metric described above.

Relevance: most students will write on the topic assigned (or, if they select a topic, will select a topic related to the course content). On occasion, however, a student will write about something else entirely. It goes without saying that the submitted work should actually attempt to accomplish the assignment given, and thus, it is reasonable to penalize students for handing in work that does not address the question at hand.

Originality: originality is difficult to assess because what is original to the student may not be original to you. That said, originality is relatively easy for an experienced (human - I doubt that automatic grading systems are of much use here) instructor to identify.

The Metric in Practice

Consider, for example, the criteria offered by Melissa Lee Price (this is not to pick on a particular example, just to use an example that happens to be convenient).

Excellent: Essay shows a high degree of reflection and analysis of the teaching/learning experiences you've encountered in this online class. It is a thorough, critical assessment of the different communication experiences. It weighs the positives and negatives of each one and how each applies to communicating and working in the modern world. It is also well written with few or no grammar and spelling errors.

What constitutes 'excellent' in this description is on the one hand too vague and on the other hand too narrow. It is too vague because the meanings of 'reflection' and 'critical assessment' are unclear. Reflection and critical assessment are not forms of reason; they denote a melange of discourse that may include reason but may also include digressions and diatribes.

It is too narrow because, in attempting to clarify the meaning of critical assessment, it identifies only one of many discursive strategies. The discussion of positives and negatives is required only if you are working on a pre-theoretical basis, that is, it resembles brainstorming more than it does discourse. What should be required is not a mere (disassociated) listing of assessments but some consideration of why something would be thought of as a 'positive' (and some discussion of what would constitute a 'positive' in this context).

Spelling and grammar are also considered to be important, but there is no consideration of why. This encourages a definition of excellence where some functions are performed by rote. The objective is clarity, and proper spelling and grammar are what make a paper clear. And so the paper should be assessed on the basis of clarity, not adherence to a particular rule.

Average: Essay shows some degree of reflection and analysis of the teaching/learning experiences you've encountered in this online class. It lacks thoroughness and its critical assessment of the communication process is superficial. There is some understanding of how the different communication processes applies to the modern world. There are some spelling and grammar errors.

The difficulty I have with this next definition is that I am unable to determine what constitutes 'more' reflection or 'less' reflection. There is, presumably, some sort of discourse that is not reflective, and an average essay will have a greater amount of it than does an excellent essay.

I can only guess that the 'non-reflective' elements of a paper are those parts that constitute description. But a paper that was all reflection and no description would be terrible! You have to hang your reflective hat on something!

Moreover, if we assume that by 'reflection' we mean argumentation and explanation (which includes, to anticipate an objection, theorizing), then we must ask, what is meant, really, by a greater or lesser 'degree' of reflection? Presumably, the difference between an excellent paper and an average paper is not the amount of argument or explanation, but the quality of it and the degree of difficulty (that is, the extent to which it is coherent). But this aspect is nowhere to be found; on this metric, a bad argument is as good as a good argument.

Looking at these two grades of assessment, it appears that the author is looking for:

- a taxonomy of communication processes (or experiences; these two words are used interchangeably)

- (implicitly) a statement or account of what would make a communications process (or experience) good

- an assessment of each communication process with respect to this statement

Presumably such a taxonomy has been provided in class, as has some account of what makes them good or bad. Thus, the first two parts of the assignment are a request for a set of definitions and should be evaluated according to appropriate criteria (and note that 'reflection' and 'critical assessment' are not of a lot of use to this, the bulk, of the assignment; clarity of description (from sources offered as readings in class) and accuracy of definition are what count here).

The final part of the assignment requests an evaluation of the goodness or badness of each process of communication. Weighing the 'goodness and badness' is essentially the application of a set of evaluation criteria against an entity. For each, the writer will be required to form an argument: 'communications process C has property P, instances of P fail against criteria K, therefore C fails against K'.

The best possible essay, therefore, on these criteria will be one that most completely (and presumably, most accurately, though this criterion is not stated) fills out a grid measuring communications processes against assessment criteria. This may be all the author wants, but it is easy to see several ways in which such an essay could be better.

One way would be to increase the complexity of the assessment criteria, thereby increasing the complexity of the argumentation. For evaluation of a process is not univocal; whether something is good or bad depends very much on what you want to do. Thus, 'spraying a hose' has many advantages when you are trying to grow a garden; far fewer when trying to program a computer.

Another way would be to increase the complexity of the argumentation by weighting the assessment criteria. For example, it may be difficult to draw pictures using a discussion board, but this is of little importance in an English class, and rather more so in a drafting class.

The perfect paper (and possibly beyond the capacity of this class) would involve these considerations in a single coherent assessment of what constitutes assessment of communication processes. It would identify regularities among the assessments offered and offer an explanation of the regularities, supporting it with comparisons to alternative explanations, resulting in a process or mechanism through which a communications designer could reliably obtain a 'good' assessment.

Some Concluding Remarks

What I have tried to do in this paper is mostly to describe the way I mark papers. But as in everything, I do it this way because I think it ought to be done this way. And though I have kept the argumentation in this work to a minimum, I believe that what I offer here provides a greater degree of precision - and fairness - in the marking of papers.

Comments

Re: Assessing Discursive Writing

Your approach, Stephen, is based on two major criteria: sentence construction and reasoning. And clarity is an important component of sentence construction.

But I would be interested to know if and how you assess the "rightness" or "wrongness" or, say, number of errors, when considering the following (related) factors:

a) the writing in relation to the audience, i.e. clarity - yes, but for whom? For example: something written for peers would be different to something written for a community of experts to which the student is aspiring.

b) the objective for writing (for example: if the objective is to clarify an argument to an audience that includes people with other cultural - and linguistic - references, it will be written in a different way if it is to clarify an argument for a culturally homogenous audience).

c) the identity or the persona of the author. A student is experimenting with his or her voice - how it sounds (authoritarian, dismissive, obsequious, respectful, inclusive etc.) and sometimes gets it "right" or "wrong" (and also "creative") in relation to the audience and to his/her objective.

In my experience students are often self-conscious of spelling/grammatical errors and those of reasoning at the cost of being aware of their audience, their objective(s) for writing and the identity they are creating/projecting.

How do you consider these factors in your assessment of a "perfect paper"?

Cheers
Bev
(P.S. This is the same message I put in IFETS)

Re: Assessing Discursive Writing

Hiya, I would like to offer some brief comments in reply to the responses to my previous post. First, in the discussion area on my site and in Bill Williams's response the question of ease of use is raised. Admittedly, what I describe is, as Bill says, "pretty exhaustive." But so would the practice of riding a bicycle if described in a similar manner. In fact, variables such as the number of sentences, the number of elements of reasoning, and their respective complexity, are not measured (by counting, say) by experienced evaluators, but rather, are sensed directly. How many words are there in this paragraph? An experienced reader would say "a hundred or so" without counting. The errors, meanwhile, are simply spotted in the process of reading (and in my case, simply circled). Learning what constitutes an error in reasoning and how to read sentences for content may take more time, but this is something I am sure professors are up to.

Second, I want to emphasize that the four forms of discursive writing I offer are exhaustive. Aside from some non-discursive elements (such as questions and exclamations), these four types exhaust the potential types of writing a student - or anyone - can provide. Terry Anderson returns to the word 'reflection' and offers a clarification from the OED: "The action of turning (back) or fixing the thoughts on some subject; meditation, deep or serious consideration." It should be obvious that one's thoughts are not on the page, only expressions of them. Even so. Let's look at the thoughts: what can they be? Aside from being focused on a topic, we are told only 'meditation, deep or serious consideration'. How is this to be evaluated? Such 'meditations' or 'deep and serious considerations' can only be in one of (or a combination of) the four forms of discursive writing I identified.

In other words, Terry is identifying some sort of writing that is 'a much more affective, contemplative account of the process, as opposed to a descriptive or logical description.' Now I'm not sure whether he means here the assessment process or the writing process, but it doesn't matter. The typology I offered is exhaustive; there are no other criteria. I don't say this as a matter of opinion as to what should be counted in the writing or assessment of a piece of writing, I state it as a matter of observation. Pick any writing you care to offer, and it falls into one of the four categories I identify. Pick any assessment criteria, and they either fall under the criteria I offered, or they amount to a non-statement of criteria, leaving the student guessing as to what is being evaluated, and the evaluator assigning arbitrary grades for no reason at all.

I recognize that readers may find this contentious, but I offer my observations here to empirical test. Find some writing and see whether any part of it falls outside my typology. Find some assessment criteria and ask, "What would count as a successful instance, and what would not?" If an answer can be found, it is one of the assessment criteria I described. Otherwise, if no answer can be found, then it turns out that the assessment is based on nothing at all.

To return, for example, to Bill Williams's account, he offers four criteria:
1) understanding of processes and their application
2) analysis and argument
3) use of appropriate language and genre
4) structure and organisation

The first element is seeking accurate descriptions and (where appropriate) definitions of a certain set of communication processes. Aside from merely counting the fact that such descriptions and definitions are attempted, what would constitute a good answer here? If we consider only the descriptions, it seems clear that accurate descriptions would score better than inaccurate ones, and that the external referents employed in order to verify this are (a) actual observed communications processes, and (b) texts by authorities describing those processes and their applicability. What else would count?

The second element requires 'analysis and argument'. Argument already being one of the criteria I identify, I can move directly to analysis. What is analysis? What would distinguish between a good one and a bad one? A naive account of 'analysis' will tell you that it is the 'breaking down into parts' of something - naive, because such an account does not encourage the writer to take into account the functional or relational properties of a thing, only the structural properties. But even so. This is no more than a request for a description of something. And as such is evaluated according to the criteria I describe earlier.

The third criterion is 'use of appropriate language and genre'. Again, we ask, what would make some piece of language or a genre 'appropriate' or 'inappropriate'? Without further clarification, this designation is meaningless, and will offer no guidance to either the student or the assessor. For example, is it appropriate to hand in a work of fiction as an academic essay? This is no idle question - it is something I did as a student and something that has been done to me as an assessor of student work. If by 'appropriate' we mean something other than clarity and complexity as I have described them, then what do we mean? And if not, then I would argue, we are not served well in the use of a vague term such as 'appropriate'.

Finally, once again, we ask what would count as good 'structure and organization' as compared to a poorer performance? This would depend ineliminably on the nature of the structure and organization being considered. But as the writing can be composed only of the four forms of reason I describe, then the structure and organization amount to what I called 'coherence' - do the parts of the piece of writing fit into a logical whole? Does one argument lead into the next? Is a description used as the basis of an explanation? The terms 'structure and organization' are vague descriptors for what is actually required.

My apologies if this post sounds a little strident; it wasn't meant to, but I felt a straightforward discursion would be the most effective means of presenting my argument in a short space. Thanks for your time.

Re: Assessing Discursive Writing

I want to comment later on the content of the article but first I would like to suggest, Stephen, that you do post a copy of it to the IFETS list so that it will be archived there also. It is useful to have all contributions to the discussion archived together. Bill Williams

Assessing Discursive Writing

I really wonder if you do this with every piece of writing that you get in an online course. Consider Edgar Allan Poe's "Philosophy of Composition". Did he write The Raven first and then compose the Philosophy, or the other way around?
