When one starts thinking of famous names in the speech and natural language processing world, names like Jurafsky, Joshi, or Jelinek come to mind. These names are pretty much universally recognized, and chances are, if you work in NLP, you have either met them yourself or work with someone who has. However, there is another well-known name in the community, and chances are slim that anybody who works in NLP has actually met this person. At the same time, almost anyone who has done work in parsing or semantic role labeling might be able to tell you an age they associate with his name.
Who is this person, and why would anybody know his age? The answer comes from an artifact of NLP history. Sometime in the early 1990s, Mitch Marcus and others at the University of Pennsylvania obtained a million words of 1989 Wall Street Journal material. The first two sentences of this corpus are:
Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .
By 1992, this text had been hand-labeled with part-of-speech tags and syntactic parse structure. By 2005, Martha Palmer and Mitch Marcus had led an effort to add semantic information in the form of predicate-argument structure on top of the existing treebank. More recently, the Conference on Computational Natural Language Learning (CoNLL) converted this treebank to dependency parse structures. What does this all mean? In short, it means that nearly every English-based statistical parser and semantic role labeler has been trained on this data, and with that training comes debugging, which inevitably leads many to read about the illustrious Pierre Vinken.
But who is Pierre Vinken? Searches on the web yield little additional information. Many of the search results are confused by his legacy in the Penn Treebank. Others are in Dutch. Some point to books he has written. The most insight comes from a brief paragraph in an article titled “The History and Heritage of Science Information Systems”, which, strangely enough, is hosted on a University of Pennsylvania library site.
Another non-traditional information pioneer I should mention is Pierre Vinken. A neurosurgeon and editor, I met him in the 1950s when the Excerpta Medica Foundation was established. He converted this to a commercial enterprise which has become one of the world’s largest publishing conglomerates, Reed Elsevier.
Given the sparseness of information about him, and the fact that over 20 years have elapsed since the aforementioned sentences were published, I sometimes wonder whether Mr. Vinken is still alive and, if he is, whether he knows of his role in the world of computational linguistics.