Gregory S. Hadley
and John E. Naaykens
Department of General Education
Few issues in the field of second language research have
been as contentious as cloze testing. Over the years, opinions in the TEFL
academic community have been divided over the applicability of cloze tests for
the second language classroom. Some contend that cloze tests measure a language
learner's overall communicative ability in the target language (Hanania and
Shikhani 1986). Others maintain that cloze tests assess only the most basic aspects of
second language learning and reading comprehension (Shanahan, Kamil and Tobin
1982). Still others support a moderate position. Ikeguchi (1995), who quotes
Bachman (1990:86-89), states that cloze testing:
. . . hold[s] potential for measuring aspects of
students' written grammatical competence, "knowledge of vocabulary, morphology,
syntax, and phonology," as well as textual competence, "knowledge of
cohesive and rhetorical properties of text" in second language (p. 167).
Some years earlier, Bachman
(1982:61-70) reported that certain types of cloze tests, such as the selective
deletion cloze, can be used to investigate a subject's knowledge of written
discourse items such as context cohesion, syntax and strategic textual
comprehension. Alderson (1979) adds that cloze testing correlates more closely
with grammar tests than with reading tests, and according to Bowen et al. (1985:376),
the selective deletion cloze is ideal for testing vocabulary and grammar.
Claims such as these should prompt us to find out for ourselves whether cloze tests,
such as the selective deletion cloze, can measure a subject's knowledge of
grammar. Would students with higher scores on a selective deletion cloze test
also score higher on a criterion-referenced examination designed to measure
grammatical competency?
We will consider this question as we review a 1996 study
conducted at Niigata University. The purpose of this study was to investigate
whether the selective deletion cloze correlates highly with traditional,
grammar-based tests. Many language teachers in the Japanese national university system
opt for criterion-referenced tests (C-RTs), which attempt to measure grammatical
knowledge (Garland 1996). Putting aside the issue of whether language teachers
should focus primarily on grammatical proficiency, a selective deletion cloze
test, if proven to be a valid measure of grammatical competency, might provide
a time-saving method of examination that is both fair to students and easy for
teachers to grade. Before looking at the findings of this study, however, we
provide a brief history of cloze testing for readers new to it.
Cloze testing was first introduced by W.L. Taylor (1953),
who developed it as a reading test for native speakers. He derived the term
"cloze" from the gestalt concept of closure, which holds that an individual will
be able to complete a task only after its pattern has been discerned:
A cloze unit may be defined as: any single occurrence of
a successful attempt to reproduce accurately a part deleted from a 'message'
(any language product), by deciding from the context that remains, what the
missing part should be (p. 416).
Cloze tests consist of a text
(usually two or three paragraphs) which has had words or parts of words deleted
from it. Test subjects must draw on their knowledge of the language in order
to write appropriate words in the blanks (see Table One).

There are at least five main
types of cloze tests available to language teachers: the fixed-rate deletion,
the selective deletion (also known as the rational cloze), the multiple-choice
cloze, the cloze elide and the C-test (Ikeguchi 1995; Weir 1990; Klein-Braley
and Raatz 1984).
In the fixed-rate deletion, the first one or two sentences are left intact,
and every nth word is deleted thereafter. Usually every fifth or seventh word
is deleted, but Brown (1983) suggests that longer texts with every eleventh or
fifteenth word deleted can be used with subjects who have a lower level of
language proficiency. Multiple-choice cloze tests provide the subjects with
several possible items to choose from for each blank. The cloze elide inserts
words which do not belong in the text, and requires the subjects to identify
the incorrect words and to write appropriate items in their place. The C-test
deletes part of every second word in a text, and asks subjects to complete each
truncated word. In the selective deletion or rational cloze, the tester chooses
which items he or she wishes to delete from the text. The goal for teachers
using this test is not only to fine-tune the level of difficulty of the text,
but also to measure knowledge of specific grammatical points and vocabulary
items. Let us now consider whether the selective deletion cloze truly is a
reliable measure of grammatical knowledge.
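The mechanical difference between the fixed-rate and selective deletion procedures can be sketched in Python. This is a minimal illustration under our own assumptions, not the procedure used to build the Niigata test: words are taken to be whitespace-delimited, the "one or two intact sentences" are approximated by a fixed number of intact words, and the helper names (`make_cloze`, `fixed_rate_positions`, `selective_positions`) are hypothetical. In practice a selective deletion rests on the tester's judgment about which grammatical points to target; deleting every occurrence of chosen target words is only one possible rule.

```python
from __future__ import annotations

import re

BLANK = "______"
# Split a whitespace-delimited token into leading punctuation, the word,
# and trailing punctuation, so that "class." is blanked as "______."
TOKEN = re.compile(r"^(\W*)([\w'-]+)(\W*)$")


def _blank(token: str) -> tuple[str, str]:
    """Return (blanked token, deleted word) for a single token."""
    m = TOKEN.match(token)
    if m is None:  # unusual token (e.g. bare punctuation): blank it whole
        return BLANK, token
    lead, word, trail = m.groups()
    return f"{lead}{BLANK}{trail}", word


def make_cloze(text: str, positions: set[int]) -> tuple[str, list[str]]:
    """Blank the words at the given 0-based positions; return (test, answer key)."""
    tokens = text.split()
    key = []
    for i in sorted(positions):
        tokens[i], word = _blank(tokens[i])
        key.append(word)
    return " ".join(tokens), key


def fixed_rate_positions(n_words: int, n: int, intact: int) -> set[int]:
    """Fixed-rate deletion: skip the first `intact` words (a stand-in for the
    opening sentence or two), then select every nth word after them."""
    return set(range(intact + n - 1, n_words, n))


def selective_positions(text: str, targets: set[str]) -> set[int]:
    """Selective (rational) deletion: the tester decides what to test; here,
    as one possible rule, every occurrence of the chosen target words."""
    chosen = set()
    for i, token in enumerate(text.split()):
        m = TOKEN.match(token)
        if m and m.group(2).lower() in targets:
            chosen.add(i)
    return chosen
```

For example, with the sentence "The students read the text and then answered the questions in class.", `fixed_rate_positions(12, 5, 2)` blanks "then" and "class", whereas `selective_positions(passage, {"the", "in"})` blanks every article and the preposition, which is the kind of targeted grammatical coverage the selective deletion cloze is meant to offer.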
One group (see Table Two) from Niigata University was
selected for this study. As Table Two shows, all were native speakers of
Japanese, and most were first-year science majors. No special criteria were used
in selecting or excluding the subjects, nor was the group's English proficiency
tested before the course began. However, classroom experience with the subjects
led us to believe that most group members had limited speaking, listening and
writing skills, typical of a Japanese university first-year EFL class
(cf. Wadden 1993).
Table Two. Subject profile (English 1B, Niigata University, 1996-1997)

| Characteristic | Profile |
|---|---|
| Language | Japanese |
| Age | 18 (82%); 19 (18%) |
| Sex | Male (55%); Female (45%) |
| Department | Science (91%); Education (9%) |
| Skill level | False beginners |
| Total number of subjects | 22 |
Interchange Two (Richards et al. 1993) was
used as the primary text. The selective deletion cloze was created from one of
the general interest reading texts in the first chapter of the course book
(Richards et al. 1993:7; see Table Three). Although the subjects had read
the text several months earlier, we were fairly certain that very few, if any,
of the students had read it again since that time. The cloze test
consisted of a 133-word passage with 25 blanks, meaning that roughly 19% of the
words in the text were deleted. Test-retest reliability was estimated on two
separate occasions for this particular cloze. Both reliability coefficients
were moderate ($r_{xx}$ = +.56 and +.60) and statistically significant at
p < .01; that is, there is less than a one percent probability that
coefficients this large would arise by chance.
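Both figures in the paragraph above are easy to check. The Python sketch below, offered as an illustration rather than as the software used in the study, computes the deletion rate of the passage and the Pearson correlation on which a test-retest reliability coefficient ($r_{xx}$) is based: the same students' scores on two administrations of one test are correlated. The function names are our own, and the score lists in the test are invented, not data from this study.

```python
from __future__ import annotations

from math import sqrt


def deletion_rate(n_blanks: int, n_words: int) -> float:
    """Share of the passage's words replaced by blanks."""
    return n_blanks / n_words


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired score lists,
    e.g. the same students' scores on two administrations of one test."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists of length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dev_x, dev_y))
    sxx = sum(a * a for a in dev_x)
    syy = sum(b * b for b in dev_y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all scores in a list are equal")
    return sxy / sqrt(sxx * syy)


print(f"{deletion_rate(25, 133):.1%}")  # prints 18.8%
```

A test-retest coefficient would be obtained by passing the first-administration scores as `x` and the second-administration scores as `y`, in the same student order.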

The cloze test (see Figure
One) was administered to the subjects twice, two weeks apart. During the
second administration, the subjects also took a grammar-based test created by
the textbook designers (Richards et al. 1993:168-172). Instructions were given
verbally and in writing, in both English and Japanese, to ensure that the task
was clearly understood. On each occasion, the cloze tests were collected after
20 minutes. One significant difference between the two administrations,
however, was that the first took place during a regular class session, while
the second was given during the midterm examination. While this is certainly
not standard practice when studying the validity of a particular test design,
the procedure gave us an opportunity to see how the cloze test would function
under a variety of classroom conditions.

Figure One. The selective deletion cloze test
The tests were graded by two scorers. The classroom teacher
graded the grammar-based tests using the key provided in the teacher's manual
(Richards et al. 1993:189-190), while a native English-speaking TEFL lecturer
graded the cloze tests using the Semantically Acceptable Word (SEMAC) method.
Cloze tests are typically graded using either the exact word or the SEMAC
scoring method. In the exact word method, each blank must be completed with
the exact word that appeared in the original text. Correct answers receive
1 point, while any other response receives no points. SEMAC scoring accepts
answers which are grammatically and lexically appropriate, even if they are
not the original words deleted from the text. For the purposes of this
experiment, it did not matter whether the exact word method or the SEMAC
method was used, since scores from the two methods correlate highly (cf.
Owen et al. 1996; Hadley and Naaykens, in press). However, SEMAC scoring may
require a subjective judgment by the scorer. To prevent the cloze test scores
from being influenced by personal knowledge of the subjects, an evaluator
unacquainted with the subjects was chosen. Before grading the tests, this
blind evaluator was given a copy of the complete text and instructed to accept
any word in the cloze that was either synonymous with the original or
lexically and grammatically correct. Mistakes in historical accuracy and minor
spelling errors were ignored. If it was difficult to ascertain whether an
answer was acceptable, it was scored as incorrect.
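The two scoring methods can be contrasted in a short Python sketch. It is an illustration under stated assumptions, not the procedure followed in this study: the per-blank sets of acceptable answers stand in for the human evaluator's judgments (doubtful answers are simply left out, so they score zero), and the optional spelling tolerance, built on the standard library's `difflib.get_close_matches` with a cutoff of our own choosing, only roughly approximates a human scorer ignoring minor spelling errors. The function names are hypothetical.

```python
from __future__ import annotations

from difflib import get_close_matches


def normalize(answer: str) -> str:
    """Lower-case and collapse whitespace before comparing answers."""
    return " ".join(answer.lower().split())


def exact_word_score(responses: list[str], key: list[str]) -> int:
    """Exact word method: 1 point only when the blank holds the word that
    was deleted from the original text, 0 otherwise."""
    return sum(normalize(r) == normalize(k) for r, k in zip(responses, key))


def semac_score(responses: list[str], acceptable: list[set[str]],
                spelling_cutoff: float | None = None) -> int:
    """SEMAC method: 1 point for any answer the scorer listed as acceptable
    for that blank (the original word plus synonyms and other lexically and
    grammatically correct choices). If spelling_cutoff is given, close
    misspellings of an acceptable answer are also accepted."""
    score = 0
    for response, ok in zip(responses, acceptable):
        answer = normalize(response)
        candidates = [normalize(a) for a in ok]
        if answer in candidates:
            score += 1
        elif spelling_cutoff is not None and get_close_matches(
                answer, candidates, n=1, cutoff=spelling_cutoff):
            score += 1
    return score
```

With an answer key of "because" and "went", the responses "Since" and "go" earn 0 points under the exact word method but 1 point under SEMAC if the evaluator has listed "since" as acceptable for the first blank.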
After the scores were totaled,
all of the data were analyzed using the VAR Grade for Windows 2.0 software
package (Revie 1997). The analysis used the Pearson product-moment correlation
coefficient (r) with a directional, one-tailed test of significance. The
cloze test scores were correlated with the scores on the grammar-based test,
yielding a correlation coefficient of +.72 (see Figure Two).

According to Brown
(1993:132-141), at p < .005 (one-tailed), the critical value of r for a group
of 22 subjects (df = 20) is approximately +.54 (see also Fisher and Yates 1963).
Since the observed coefficient of +.72 exceeds this value, the correlation
between the grammar-based test and the selective deletion cloze appears to be
statistically significant.
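The critical value can also be checked directly rather than read from a table. The Python sketch below is our own standard-library illustration, not the VAR Grade procedure: assuming bivariate normal scores and a true correlation of zero, Pearson's r for n pairs has density proportional to (1 - r^2)^((n - 4) / 2), so the one-tailed probability of a coefficient at least as large as the one observed is the area under that curve from r to 1, computed here by Simpson's rule. Published tables such as Fisher and Yates (1963) are indexed by degrees of freedom, df = n - 2. The function names are hypothetical.

```python
import math


def r_null_density(x: float, n: int) -> float:
    """Density of Pearson r for n pairs when the true correlation is zero
    (bivariate normal data): (1 - x^2)^((n - 4) / 2) / B(1/2, (n - 2) / 2)."""
    log_beta = math.lgamma(0.5) + math.lgamma((n - 2) / 2) - math.lgamma((n - 1) / 2)
    return (1.0 - x * x) ** ((n - 4) / 2) / math.exp(log_beta)


def one_tailed_p(r: float, n: int, steps: int = 2000) -> float:
    """P(R >= r) under the null hypothesis, by Simpson's rule on [r, 1].
    `steps` must be even; n must be at least 4."""
    if n < 4:
        raise ValueError("n must be at least 4")
    h = (1.0 - r) / steps
    total = r_null_density(r, n) + r_null_density(1.0, n)
    for i in range(1, steps):
        # Simpson weights alternate 4, 2, 4, ... on interior points
        total += (4 if i % 2 else 2) * r_null_density(r + i * h, n)
    return total * h / 3


def critical_r(n: int, alpha: float) -> float:
    """Smallest r whose one-tailed p does not exceed alpha (by bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if one_tailed_p(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return hi
```

For n = 22, `critical_r(22, 0.005)` comes out at roughly .537, and `one_tailed_p(0.72, 22)` is far below .001, so the observed coefficient of +.72 exceeds the critical value comfortably.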
It would be foolhardy for language teachers to change their testing practices
completely on the basis of this one study. However, the findings of this
research suggest that selective deletion cloze tests could be used in place of,
or alongside, grammar-based language tests. If careful consideration is given
to the design of the selective deletion cloze, it has a high potential for
reliability, even under less-than-ideal testing conditions. It may be even more
reliable than tests which our learners are frequently exposed to: tests which
have been thrown together late at night by language teachers under the pressure
of several deadlines. Conservative use of the selective deletion cloze could
provide teachers with a time-saving method of testing their learners. Learners
could be assured that, despite the brevity of the test, their level of
grammatical competence in the target language is being, to a certain degree,
reliably measured. Both teachers and learners might then be freed from much of
the time normally spent on testing, and more time could be devoted to studying
the target language.
Conclusion
It is hoped that language teachers will begin experimenting
with cloze testing as a viable alternative to the traditional tests which are
normally administered in university language classrooms. Even if some are
uncertain about the reliability and validity of the selective deletion cloze
for use as a C-RT, it could still be used as a quick measure to see if the
learners are making progress in the course.
This study opens avenues for
future research. For example, to what extent would a selective-deletion cloze
correlate with a test measuring oral proficiency, or with a listening
proficiency test? If such scores did consistently correlate highly, would this
suggest that cloze tests can measure more than just grammatical competence in
second language learning? These are just a few of the many questions which
deserve further investigation as we continue our search for innovative and
effective methods of second language testing.
Alderson, J.C. (1979).
"The cloze procedure and proficiency in English as a second
language." TESOL Quarterly, 13, 219-226.
Bachman, L. (1982). "The
trait structure of cloze test scores." TESOL Quarterly, 16, 61-70.
Bachman, L. (1990). Fundamental
Considerations in Language Testing. Oxford: Oxford University Press.
Bowen, J.D., Madsen, H., and
Hilferty, A. (1985). TESOL: Techniques and Procedures. Rowley, MA:
Newbury House.
Brown, J.D. (1983). "A
closer look at the cloze: Validity and reliability." In J.W. Oller, Jr.
(Ed.), Issues in Language Testing Research (pp. 237-250). Rowley, MA:
Newbury House.
Brown, J.D. (1993). Understanding
Research in Second Language Learning. New York: Cambridge University Press.
Brown, J.D. and Yamashita, S.
(Eds.) (1995). Language Testing in Japan. Tokyo: The Japan Association
for Language Teaching.
Fisher, R.A. and Yates, F.
(1963). Statistical Tables for Biological, Agricultural and Medical
Research. London: Longman.
Garland, V. (1996). "Teaching
techniques and learning styles in Japanese universities." Journal of Cross-Cultural
Studies, 6, 73-96.
Hadley, G. and Naaykens, J.
(in press). "Testing the Test: Comparing SEMAC and Exact Word Scoring on
the Selective Deletion Cloze." Korea TESOL Journal, 1(1).
Hanania, E. and Shikhani, M.
(1986). "Interrelationships among three tests of language proficiency:
Standardized ESL, cloze and writing." TESOL Quarterly, 20, 97-109.
Ikeguchi, C. (1995).
"Cloze testing options for the classroom." In J.D. Brown and S.
Yamashita (Eds.), Language Testing in Japan (pp. 166-178). Tokyo:
The Japan Association for Language Teaching.
Klein-Braley, C. and Raatz, U.
(1984). "A survey of research on the C-test." Language Testing, 1,
134-146.
Oller, J.W. Jr. (Ed.) (1983). Issues
in Language Testing Research. Rowley, MA: Newbury House.
Owen, C., Reeves, J. and
Widener, S. (1996). Testing. Birmingham, UK: University of Birmingham.
Revie, D. (1997). VAR Grade
for Windows 2.0: Grading Tools for Teachers. Thousand Oaks, CA: VARed
Software.
Richards, J., Hull, J., and
Proctor, S. (1993). Interchange 2: English for International Communication.
New York: Cambridge University Press.
Richards, J., Hull, J., and
Proctor, S. (1993). Interchange 2: English for International Communication:
Teacher's Manual. New York: Cambridge University Press.
Shanahan, T., Kamil, M.L., and
Tobin, A. (1982). "Cloze as a measure of intersentential comprehension." Reading
Research Quarterly, 17, 229-255.
Taylor, W.L. (1953).
"Cloze procedure: A new tool for measuring readability." Journalism
Quarterly, 30, 415-433.
Wadden, P. (Ed.) (1993). A
Handbook for Teaching English at Japanese Colleges and Universities. New
York: Oxford University Press.
Weir, C. (1990). Communicative
Language Testing. Hemel Hempstead: Prentice Hall International Ltd.