Bochum English Countability Lexicon

Countable or uncountable – empirical, computational, logical
A project supported by the Alexander-von-Humboldt-Foundation
Current research team

Project staff at Ruhr-Universität Bochum in June 2015. From left: Francis Jeffry Pelletier, Tibor Kiss, Halima Husic, Johanna Marie Poppek, Ron Hoffmann and Roman Nino Simunic

Most analyses of the count/mass distinction start with a small set of nouns that serve to illustrate some salient properties. We find it problematic that the survey of nouns often stops at this level, because a variety of difficult cases then go undetected.

We are aiming to provide a broader picture of the count/mass distinction and some deeper insights into the more problematic areas of this distinction at the same time. Our approach is guided by two basic assumptions:

  • The distinction between count and mass nouns is not actually a binary one.
  • The distinction between count and mass nouns cannot be analyzed at the level of the lemma (or lexeme), but must be analyzed at the level of individual senses of a lexeme.

We have thus started by extracting a large set of (American English) nouns from the Open ANC corpus. The extracted nouns were matched with their senses in WordNet. For the current analysis, we have added up to four senses from WordNet to each lemma (future analyses will include more senses). The result was a large lexicon consisting of noun-sense pairs, i.e. lemmata with added senses from WordNet. An illustration is found here:

noun sense pairs
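
As a rough illustration of what a lexicon of noun-sense pairs might look like as data, here is a minimal Python sketch. It is not the project's code: the class `NounSensePair`, the function `build_pairs`, and the small in-memory `toy_inventory` (standing in for WordNet) are all assumptions made for this example. Only the cap of four senses per lemma and the `noun#x` notation come from the text.

```python
from dataclasses import dataclass

MAX_SENSES = 4  # the current analysis adds up to four senses per lemma


@dataclass(frozen=True)
class NounSensePair:
    """Hypothetical record for one noun-sense pair such as car#1."""
    lemma: str
    sense_number: int  # 1-based, as in the noun#x notation
    gloss: str

    @property
    def label(self) -> str:
        return f"{self.lemma}#{self.sense_number}"


def build_pairs(lemma_to_glosses, max_senses=MAX_SENSES):
    """Pair each lemma with at most max_senses of its glosses, in listed order."""
    pairs = []
    for lemma, glosses in lemma_to_glosses.items():
        for number, gloss in enumerate(glosses[:max_senses], start=1):
            pairs.append(NounSensePair(lemma, number, gloss))
    return pairs


# Toy sense inventory standing in for WordNet (assumption for illustration);
# the glosses are the ones quoted in the examples on this page.
toy_inventory = {
    "fruitcake": [
        "a whimsical person",
        "a rich cake containing dried fruit and nuts",
    ],
    "car": ["a motor vehicle with four wheels"],
}

pairs = build_pairs(toy_inventory)
labels = [pair.label for pair in pairs]
print(labels)
```

Keeping the sense number on the record, rather than only the lemma, reflects the second assumption above: countability is annotated per sense, not per lemma.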

Based on Tobias Stadtfeld’s dissertation, we have developed a questionnaire so that annotators can answer questions about individual noun-sense pairs.

Research team in 2013

Project kick-off at Simon Fraser University, May 2013, research team and annotators
From left: Tobias Stadtfeld, Tibor Kiss, Mathieu Dovan, Lisa Shorten, Francis Jeffry Pelletier, Meghan Jeffrey, Fiona Wilson

Most importantly, none of the questions were about the count/mass distinction directly. Instead, we wanted to know whether speakers with intuitions about syntactic and semantic contexts are able to implicitly classify noun-sense pairs.

An illustration of a question is provided below:

(1) Does inserting noun#x (where #x is sense x of the noun) into the frame NP1 VERB more NOUN(Sg) than NP2 yield a grammatical sentence, an ungrammatical sentence, or is the insertion not possible?

Possible answers here are yes, no, and not applicable (e.g. if the noun does not have a singular form). The pattern was then subjected to a second question:

(2) If you have answered the first question with yes, is the comparison based on number or a different measurement?

Let’s look at a few examples: car#1 (a motor vehicle with four wheels), fruitcake#1 (a whimsical person), fruitcake#2 (a rich cake containing dried fruit and nuts), lingerie#1 (women’s underwear and nightclothes), and whiskey#1 (a liquor made from fermented mash of grain).

a. John bought more car#1 than Mike. (no, not applicable)
b. John knows more fruitcake#1 than Mike. (no, not applicable)
c. John ate more fruitcake#2 than Mike. (yes, not number)
d. John bought more lingerie#1 than Mike. (yes, number)
e. John bought more whiskey#1 than Mike. (yes, not number)
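
The dependency between questions (1) and (2) can be sketched as a small data type: the second answer is only meaningful when the first is yes. The following Python sketch encodes the five examples above. The name `ComparisonAnswer` and the use of `None` for "not applicable" in the second slot are assumptions made for illustration, not the project's actual annotation format.

```python
from dataclasses import dataclass
from typing import Optional

Q1_ANSWERS = {"yes", "no", "not applicable"}
Q2_ANSWERS = {"number", "not number"}


@dataclass(frozen=True)
class ComparisonAnswer:
    """Hypothetical record for the answers to questions (1) and (2)."""
    q1: str
    q2: Optional[str] = None  # None stands for "not applicable"

    def __post_init__(self):
        if self.q1 not in Q1_ANSWERS:
            raise ValueError(f"unknown answer to question (1): {self.q1!r}")
        if self.q1 == "yes":
            # Question (2) must be answered when question (1) was answered yes.
            if self.q2 not in Q2_ANSWERS:
                raise ValueError("question (2) needs 'number' or 'not number'")
        elif self.q2 is not None:
            # Question (2) does not apply unless question (1) was answered yes.
            raise ValueError("question (2) only applies after a yes")


# The five example judgments given in the text.
answers = {
    "car#1": ComparisonAnswer("no"),
    "fruitcake#1": ComparisonAnswer("no"),
    "fruitcake#2": ComparisonAnswer("yes", "not number"),
    "lingerie#1": ComparisonAnswer("yes", "number"),
    "whiskey#1": ComparisonAnswer("yes", "not number"),
}
```

Validating at construction time means an inconsistent pair such as (no, number) is rejected before it can enter the lexicon.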

In total, the annotators had to answer six questions for each noun. You can find more information on the annotation process here and here.

The resulting 14,000 noun-sense pairs were independently annotated by at least two native speakers of Canadian English; reports from an inter-annotator agreement study can be found in Kiss, Pelletier and Stadtfeld (2014).

The next step was classification based on the annotations. In principle, each possible pattern of answers makes up a possible class. We have, however, restricted the classes to answer patterns that were given consistently: we have only included noun-sense pairs that received unanimous answers in the first step (candidates that did not receive unanimous answers were retained for adjudication and will be included in BECL 2.1).
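
The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the function names `split_by_agreement` and `group_into_classes` are invented, answer patterns are shortened to two answers instead of six, and all annotator data below, including the label `example#1`, is made up.

```python
from collections import defaultdict


def split_by_agreement(annotations):
    """Separate unanimous noun-sense pairs from those needing adjudication.

    annotations maps a label such as 'car#1' to a list of answer patterns
    (tuples), one pattern per annotator.
    """
    unanimous, disputed = {}, {}
    for label, patterns in annotations.items():
        # Require at least two annotators who all gave the same pattern.
        if len(patterns) >= 2 and len(set(patterns)) == 1:
            unanimous[label] = patterns[0]
        else:
            disputed[label] = patterns
    return unanimous, disputed


def group_into_classes(unanimous):
    """Group labels by answer pattern; each attested pattern forms one class."""
    classes = defaultdict(list)
    for label, pattern in unanimous.items():
        classes[pattern].append(label)
    return dict(classes)


# Invented toy annotations (two annotators each); not real BECL data.
toy_annotations = {
    "whiskey#1": [("yes", "not number"), ("yes", "not number")],
    "fruitcake#2": [("yes", "not number"), ("yes", "not number")],
    "car#1": [("no", "not applicable"), ("no", "not applicable")],
    "example#1": [("yes", "number"), ("no", "not applicable")],  # disagreement
}

unanimous, disputed = split_by_agreement(toy_annotations)
classes = group_into_classes(unanimous)
```

In this toy run, `example#1` ends up in `disputed`, mirroring the candidates that are held back for adjudication, while the remaining three pairs fall into two classes.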

This resulted in 18 fine-grained classes, which can, however, be summarized into four major classes. The grouping is based on the answers to two annotation questions, whose answers show a tendency toward classification as a count or as a mass noun sense. Since these two questions are not mutually exclusive, some noun senses tend to be classified as count and mass at the same time, while other noun senses are classified as neither count nor mass. The four major classes thus comprise noun senses that are (i) regular count, (ii) regular mass, (iii) both count and mass, and (iv) neither count nor mass.
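
Because the two questions are independent, the four major classes correspond to the four combinations of two yes/no tendencies. A minimal Python sketch of that mapping follows. The names `MajorClass` and `major_class` and the boolean inputs are assumptions made for illustration; the text does not say here which two questions are used or how their answers are turned into tendencies.

```python
from enum import Enum


class MajorClass(Enum):
    REGULAR_COUNT = "regular count"
    REGULAR_MASS = "regular mass"
    BOTH = "both count and mass"
    NEITHER = "neither count nor mass"


def major_class(count_tendency: bool, mass_tendency: bool) -> MajorClass:
    """Map two independent tendencies onto the four major classes.

    How each tendency is derived from an annotation question is left open;
    the booleans are a simplifying assumption of this sketch.
    """
    if count_tendency and mass_tendency:
        return MajorClass.BOTH
    if count_tendency:
        return MajorClass.REGULAR_COUNT
    if mass_tendency:
        return MajorClass.REGULAR_MASS
    return MajorClass.NEITHER


for count_tendency in (True, False):
    for mass_tendency in (True, False):
        result = major_class(count_tendency, mass_tendency)
        print(count_tendency, mass_tendency, result.value)
```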

A brief introduction to the current state is found here.

By annotating noun-sense pairs, BECL distinguishes between nouns whose different senses fall into the same countability class and nouns whose different senses fall into different countability classes. We call the latter multiples, as more than one countability class is assigned to a single noun. This case should be clearly distinguished from noun senses that are assigned a countability class from the third major class, i.e. both count and mass noun senses. Multiples are nouns whose different senses appear to be, for example, count and mass. For instance, classification#2 (a group of people or things arranged by class or category) is classified as a count noun sense, whereas classification#3 (the basic cognitive process of arranging into classes or categories) is classified as a mass noun sense. An example of a noun sense from the third major class is guarantee#2 (an unconditional commitment that something will happen or that something is true). It received annotations according to which it tends to be classified as both count and mass.
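
The difference between a multiple and a single sense that is both count and mass can be made concrete with a short sketch. The function `find_multiples` and the class names as plain strings are assumptions made for illustration; the three classified senses are the ones named above.

```python
from collections import defaultdict


def find_multiples(sense_classes):
    """Return lemmas whose annotated senses fall into more than one class.

    sense_classes maps a label such as 'classification#2' to a class name.
    """
    classes_by_lemma = defaultdict(set)
    for label, class_name in sense_classes.items():
        lemma, _, _ = label.rpartition("#")  # 'classification#2' -> 'classification'
        classes_by_lemma[lemma].add(class_name)
    return {
        lemma: sorted(class_names)
        for lemma, class_names in classes_by_lemma.items()
        if len(class_names) > 1
    }


# The three senses discussed in the text.
sense_classes = {
    "classification#2": "count",
    "classification#3": "mass",
    "guarantee#2": "both count and mass",
}

multiples = find_multiples(sense_classes)
print(multiples)
```

Here classification is reported as a multiple because its senses carry two different classes, whereas guarantee is not: its single annotated sense belongs to one class, which happens to be the both count and mass class.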

The current release of BECL is available under a non-disclosure agreement. For further information, please contact us or register with us and submit a request.

The project is supported by the Alexander-von-Humboldt-Foundation (AvH). Francis Jeffry Pelletier has received the Anneliese-Maier-Award from the AvH. The Anneliese-Maier-Award is granted to excellent international researchers to foster and facilitate the cooperation of researchers in the humanities and social sciences from Germany and abroad, and to make research in this area in Germany more visible.

