All posts by lee

The Ashokan Farewell

In 10th grade, my U.S. history teacher taught our class about the American Civil War by playing Ken Burns’ documentary, frequently pausing at the salient points to interject other notes of worth.  This approach often led to digressions about the ills of drunk driving or the dangers of getting married before you’re thirty, and consequently it took us nearly a month to get through the videos.  While the combination of Mr. Burns’ thought-provoking storytelling and Mr. Davis’ tirades provided an educational and entertaining exploration of the Civil War, my biggest takeaway came not from the history, but from the soundtrack.

Ashokan Farewell features prominently in the documentary, getting played 25 times over the span of the eleven-hour series.  In one segment we heard a voice reading “Grant stood by me when I was crazy, and I stood by him when he was drunk, and now we stand by each other” over this captivating fiddle-based waltz.  And then the voice signed it “William Tecumseh Sherman”.  It was at this point that a classmate and I realized that you could read just about anything to Ashokan Farewell, and it would automatically be imbued with incredible depth and gravitas.  More importantly, this could be exploited to great comedic effect.  We soon began to amuse ourselves during Mr. Davis’ lectures by reciting nonsensical prose, imagining it set to this tune, and closing with the all-important “William Tecumseh Sherman”.  Shortly after, I discovered that Conan O’Brien and his staff had found similar humor in it, producing a short sketch wherein Conan writes letters to his parents from summer camp.

This idea sat dormant for far too long, until late 2008, when someone mentioned in an e-mail thread with some friends that reading a heartfelt letter in a raspy voice over “That song from The Civil War” equalled comedy gold.  Within a day we were producing our own takes on this theme and sharing them with each other via e-mail, but just as quickly as the idea resurfaced we allowed it to go back into hibernation… until today, when my friend Tom reminded us of our silliness.  Now I feel the time has come to share some of this with the rest of the world.  I should note that the samples below were not the first Ashokan trailers produced, but they are the ones that I made.  Hopefully my friends will upload their videos, so that I can update this post with some new links.

And now without further ado I present:

Ashokan Bloodsport

Ashokan Bigalow

Butter the bread with cream cheese

Today in my computational lexical semantics class we were discussing Talmy’s Toward a Cognitive Semantics, specifically about Lexicalization Patterns.  At one point in the presentation our presenter spoke to how some verbs undergo “incorporation” wherein they integrate multiple semantic concepts in one verb usage.  For motion events, the semantic concepts include:

  • Figure – object moving or being located
  • Ground – reference object
  • Path – path followed or site occupied by figure
  • Motion – presence of motion or locatedness in the event

So in a sentence like “The pencil rolled off the table”, Figure = pencil, Ground = table, and Path = off.  What is interesting is that the verb roll has a Manner co-event in addition to its Motion one, meaning that it manages to incorporate two semantic concepts into the same verb usage.
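To make the decomposition a bit more concrete, here is a minimal sketch of the pencil example as a data structure.  The class name `MotionEvent`, its fields, and the `incorporated` helper are invented for illustration; they are not Talmy's notation or part of any existing library, and the rule that a verb always carries Motion is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MotionEvent:
    """Hypothetical container for the components of a motion event."""
    verb: str
    figure: str                   # object moving or being located
    ground: str                   # reference object
    path: str                     # path followed or site occupied by the figure
    manner: Optional[str] = None  # manner co-event carried by the verb, if any

    def incorporated(self) -> List[str]:
        """Concepts this sketch treats as packed into the verb itself."""
        concepts = ["Motion"]  # assumption: every motion verb carries Motion
        if self.manner is not None:
            concepts.append("Manner")
        return concepts


# "The pencil rolled off the table"
rolled = MotionEvent(verb="roll", figure="pencil", ground="table",
                     path="off", manner="rolling")
print(rolled.verb, "incorporates", rolled.incorporated())
```

A verb with no manner co-event would simply leave `manner` unset, and the helper would then report Motion alone.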

At the point this came up, our professor suggested that the somewhat-uniquely English verb “butter” also undergoes incorporation, in a somewhat different way.  In “I buttered the bread”, butter is both the Figure and the Manner and possibly even Motion, whereas many other languages would force the construction to look like “I put the butter on the bread” or “I spread the butter on the bread”.  Another student countered that in the sentence “I buttered the bread with cream cheese” the Figure and the Manner are different, so the semantic decomposition might call for some other explanation.  Most of the class thought the sentence seemed a little awkward, as “I spread the bread with cream cheese” would be more natural to a native speaker.  Promptly after this was brought up, yet another student went to Google, our repository for all possible combinations of language, searched for “butter the bread with cream cheese”, and found no matches.

And that brings me to this post.  I thought it was a pity to have something so simple spoken in real life, yet not reflected in the vast index that is Google.  To perform due diligence, I put together a more general query, taking care to subtract the name of a band that would skew the results.  Among the first several pages of results, the phrase “butter the bread with” was followed by some variation of butter like “softened butter”, “teaspoon of butter”, or “lowfat butter”, or by some instrument like a “knife”, which does not change the Figure.  Though if you go far enough you start to find some different Figures, including “mustard”, “coconut oil”, “garlic powder”, or “cheez-whiz”.  Even further down I found an article on load alternation and semantic shift that stated one can only butter things with items that are butter-like in consistency.  It is funny how garlic powder still kind of fits that restriction.  Given all this, it may be acceptable to say something like “butter the bread with cream cheese” after all.  If nothing else, perhaps this posting will bump that phrase to at least one hit in the annals of search.
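For the curious, tallying what follows “butter the bread with” can be roughed out in a few lines.  This is only a sketch under loud assumptions: the snippets are made-up sentences built around fillers mentioned above, not actual search results, and the names `FILLER` and `count_fillers` are invented for illustration.

```python
import re
from collections import Counter

# Capture whatever follows the phrase, up to the next punctuation mark,
# skipping a leading article so "a knife" is counted as "knife".
FILLER = re.compile(
    r"butter the bread with\s+(?:(?:a|an|the)\s+)?([^.,;!?]+)",
    re.IGNORECASE,
)


def count_fillers(snippets):
    """Count the lowercased fillers found after the phrase in each snippet."""
    counts = Counter()
    for snippet in snippets:
        for match in FILLER.finditer(snippet):
            counts[match.group(1).strip().lower()] += 1
    return counts


# Invented example sentences, NOT real search results.
snippets = [
    "Butter the bread with softened butter.",
    "Next, butter the bread with coconut oil, then toast it.",
    "butter the bread with a knife",
]
for filler, n in count_fillers(snippets).items():
    print(n, filler)
```

The pattern is deliberately crude: it stops at the first punctuation mark, so a long unpunctuated sentence would be swallowed whole.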

Merantau: a Silat player’s review

After several months of waiting, I finally saw Merantau[1], an Indonesian-language martial arts flick.  In the trailers it was billed as kind of an Ong Bak, but with Muay Thai swapped for Pencak Silat.  Merantau more than delivered on this premise.

In terms of story, Merantau does very little to differentiate itself.  A boy from the country goes to the city and encounters evil men doing evil things to the vulnerable, and through gifts of well-placed punches and kicks, he remedies the situation.  Almost stereotypically, the main villain is a white guy with a bad temper who mistreats the women he plans to sell into prostitution.  Additionally, blood and gore were overused to little effect.[2]

Nonetheless, while the plot is overly familiar and the acting is not altogether memorable, director Gareth Evans breaks ground in a much more interesting manner.  This film marks the first time I’ve seen real Pencak Silat in a movie.  Most Indonesian movies seem to forget the beauty and richness of their native arts, and typically present fight scenes using choreography that looks more like karate or tae kwon do.  While those are respectable arts in their own right, they are not Silat, and it shows.  Though Merantau has a good number of Hollywood (Bali-wood?) touches with gigantic leaping kicks and improbable knockouts, it manages to stay grounded in its Silat core.  From the opening jurus scene to the finale, Silat practitioners will recognize many of the locks, traps, and manipulations that make the art so deadly and effective.  I took great pleasure in seeing foot traps and puter kepalas[3] subtly woven into the choreography.  Moreover, Iko Uwais does an incredible job of making us believe Yuda, the protagonist, is truly a pendekar[4] of Silat Minangkabau (a style from Western Sumatra whose movements are well known throughout Indonesia).  Most importantly, Evans’ cinematography allows the viewer to fully appreciate the timing and fluidity of this choreography without getting overwhelmed by strange camera angles and slow motion effects.  For the most part it’s pure, enjoyable, unadulterated hand-to-hand combat.

I am hoping this movie will start an upward trend of Pencak Silat in movies.  If all goes well, someone will one day make an epic set in Dutch colonial times featuring not just Silat Minang, but also Silat Madura, Silat Cimande, Silat Mataram, and even Chinese Kuntao.

[1] The word “merantau” is roughly translated as “to wander about” or “to go abroad”.  It plays a central role in Minangkabau culture, as inheritance is passed down matrilineally (i.e. from woman to woman) and a man must go out into the world and earn his keep before returning to his homeland in Western Sumatra.  I believe this practice explains the multitude of Padang-style eateries across the archipelago.

[2] This review reminds me of why sentiment analysis is such a hard proposition.  I managed to state both negative and positive aspects in the same review in a way that makes it nearly impossible for any algorithm to tease them apart in a principled manner.  This little meta-blurb at the bottom probably doesn’t help either.

[3] Puter = turn, Kepala = head

[4] Pendekar = master of martial arts

Named Entity Recognition

Hi Wayne,
I’ve been trying to figure out an appropriate information model (most likely XML-based) to correspond to my annotation schema.  As I have started to form my notions of how this should look to allow for future expansion, ease of use when annotating, and accessibility for feature extraction, I’m kind of rethinking how annotation of the thematic roles should look.
In the annotations we’ve done so far, we’ve been labeling sections of the text with labels like agent, patient, theme, etc.  However, in past discussions we’ve both come to the conclusion that this should be produced by a statistical semantic role labeler with arguments mapped to something like VerbNet classes.
To think through the possible annotation, I’ve started playing around a bit with ASSERT, and have come to the realization that its spans are nowhere near as detailed as how I have been annotating, and that I may need to back things off a bit.
Take this example:
S0: <new_speaker_male_1> we’ve been learning a little bit about <um> how electricity conducts through a battery to a light bulb
T9: You said you’ve been learning about electricity and lightbulbs
T10: Tell me more about that.
If I run the first statement through ASSERT I get the following parses:
>new_speaker_male_1< [ARG0 we] ‘ve been [TARGET learning ] [ARG1 a little bit about >um<] how electricity conducts through a battery to a light bulb
>new_speaker_male_1< we ‘ve been learning a little bit about >um< [ARGM-MNR how] [ARG1 electricity] [TARGET conducts ] through a battery to a light bulb
Notice first of all how the entities marked by the tutor do not correspond exactly to any of the arguments for any of the predicates.  How would a link annotation look in this instance for the marking act at turn T9?  Would it be best to advise the annotators to select the argument (role) closest to the one present, and if none fit, make no link?  Or would something else be more appropriate?
Secondly, do you think it would be better to annotate the links directly to the PropBank argument structure, or would it be better to see if we could translate the arguments into a VerbNet role first and then do the linking?
I know these are kind of mundane details, but I think I need to work through them to really finalize the annotation approach.
Thanks,
Lee
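For anyone wanting to poke at output like the two parses in the email above, here is a minimal sketch that pulls the labeled spans out of one line.  It is a regular expression over the printed text only, not ASSERT's API; the bracket format is assumed from the two lines shown, and the names `SPAN` and `labeled_spans` are invented for illustration.

```python
import re

# One bracketed span, e.g. "[ARG0 we]" or "[TARGET learning ]".
# Group 1 is the label, group 2 the text with surrounding spaces trimmed.
SPAN = re.compile(r"\[([A-Z0-9-]+)\s+([^\]]*?)\s*\]")


def labeled_spans(parse_line):
    """Return (label, text) pairs for every bracketed span in one parse line."""
    return SPAN.findall(parse_line)


first = ("[ARG0 we] 've been [TARGET learning ] [ARG1 a little bit about >um<] "
         "how electricity conducts through a battery to a light bulb")
second = ("we 've been learning a little bit about >um< [ARGM-MNR how] "
          "[ARG1 electricity] [TARGET conducts ] through a battery to a light bulb")

for line in (first, second):
    for label, text in labeled_spans(line):
        print(label, "->", text)
    print()
```

Anything outside the brackets, such as “through a battery to a light bulb”, is simply dropped, which makes the mismatch with the tutor's entities easy to see.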

When one starts thinking of famous names in the speech and natural language processing world, names like Jurafsky, Joshi, or Jelinek come to mind.  These names are pretty much universally recognized, and chances are, if you work in NLP you have either met them yourself, or work with someone who has.  However, there is another well-known name in the community, and chances are slim that anybody who works in NLP has actually met this person.  At the same time, almost anyone who has done work in parsing or semantic role labeling might be able to tell you an age they associate with his name.

Who is this person, and why would anybody know this?  The answer comes from an artifact of NLP history. Some time in the early 90’s Mitch Marcus and others at the University of Pennsylvania obtained a million words of 1989 Wall Street Journal material. The first two sentences of this corpus are:
Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group .

By 1992, this text had been hand-labeled with part-of-speech tags and syntactic parse structure. By 2005 Martha Palmer and Mitch Marcus had led an effort to add semantic information in the form of predicate argument structure on top of the existing treebank. More recently, the Conference on Computational Natural Language Learning (CoNLL) converted this treebank to dependency parse structures. What does this all mean? In short, it means that nearly every English-based statistical parser and semantic role labeler has been trained on this data, and with that training comes debugging, which inevitably leads many to read about the illustrious Pierre Vinken.

But who is Pierre Vinken? Searches on the web yield little additional information. Many of the search results are confused by his legacy in the Penn Treebank. Others are in Dutch. Some point to books he has written. The most insight comes from a brief paragraph in an article titled “The History and Heritage of Science Information Systems”, which strangely enough is hosted on a University of Pennsylvania library site.

Another non-traditional information pioneer I should mention is Pierre Vinken. A neurosurgeon and editor, I met him in the 1950s when the Excerpta Medica Foundation was established. He converted this to a commercial enterprise which has become one of the world’s largest publishing conglomerates — Reed Elsevier.

Given the sparseness of information about him, and the fact that over 20 years have elapsed since the aforementioned sentences were published, I sometimes wonder if Mr. Vinken is still alive, and if he is, whether he knows of his role in the world of computational linguistics.

Presenting Pabst Honorable Mention

Pabst Blue Ribbon (PBR) owes much of its resurgence over the past decade to the indie-rock loving, fixed-gear riding, and tight-jean wearing hipsters of America.  Its modest price and old-fashioned labeling have resonated well with a crowd looking to display its anti-consumerism and sense of irony at the same time.  However, PBR is starting to lose some of its indie cred.  I have seen places charging $3-4 for a can, and a quick Google search for “Most Expensive PBR” finds stories of bars charging more than $5 for this beer.

Thus I offer my suggestion to Pabst.  They should make a discount line of lower quality, possibly rejected beer called their “Honorable Mention” label, which could also go by the catchier PGR — Pabst Green Ribbon.  This would allow them to turn a profit on any slips in quality control while simultaneously continuing to foster their image as the preferred anti-establishment beer.

Silat Demo at CSU

It’s become kind of an annual event for Inner Wave Pencak Silat to do a demonstration at Colorado State University’s World Unity Fair. As with all of our demos, the level of planning is fairly minimal. When I first started learning Silat under Daniel, there were only about 3 students. On the day of the demo I learned not only which form I would be doing, but the form itself. This year I at least was given over a week’s warning. Perhaps Daniel is losing his Indonesian touch.

Not that we didn’t have our share of surprises. This year, with an hour to go before it was time to perform, we found out we had to sing… in Javanese. Surprisingly, for how tone deaf we are as a group, we managed to pull it off. Our Silat, not unexpectedly, was much better. John and Brandon did some nice staff work. Laura and Jason did a cool bit of choreography using the Clurit (aka Maduran sickle). Steve and I performed an impromptu rain dance using whips, and of course Daniel pulled out some motion none of us had ever seen before.

Obligatory first post

I guess before I can start writing my second and third posts, I need to have a first post.  Unlike SaraAndLee.com, which was started as a place to keep guests informed of our wedding plans and which shifted into a way of telling stories about our time in Indonesia, Ke-LeeBecker-an really has no initial purpose other than to drive my Google PageRank score up.  I imagine that, like most other blogs, this will be a collection of ramblings that few besides myself will appreciate.

Enjoy!