I just wanted to announce that there is now a meetup for people in the Front Range who are interested in all things related to Natural Language Processing. If you’re interested be sure to join the group, and if you have ideas for possible topics, feel free to share. Hopefully this grows into a venue where people can come to share their knowledge and discuss ideas relating to NLP.
I spent a good part of this evening (can you call it evening when it’s well past midnight?) trying to learn how to do a simple Java-based query using Amazon’s Product Advertising API. This entire exercise could have been finished within an hour had there been clear, concise documentation. Instead of one recipe for success, Amazon presents a dizzying array of approaches that lead to many dead ends.
In hopes of saving others from encountering these difficulties, I present you with my findings. For those of you who just want the recipe, jump to the bottom.
Lesson 1 – Don’t Trust the Getting Started Guide
Admittedly the getting started guide gives a useful overview of the functionality and the approach to using the API. However, following the instructions step by step will only lead to frustration. The Java example for Implementing a Product Advertising API Request DOES NOT WORK! One would think that simply swapping in the proper access key where it says “YOUR ID” would be all that is needed, but upon execution I found it yields the following:
Exception in thread "main" com.sun.xml.internal.ws.client.ClientTransportException: The server sent HTTP status code 400: Bad Request
Thinking I had omitted something small, I looked into resolving this error only to discover:
Lesson 2 – APA API’s Authentication has changed
As of August 15, 2009, the API requires a signature mechanism for authentication. In addition to invalidating all the code from the getting started guide, it also adds poorly documented steps to the process. Amazon does provide some detail, but it’s probably not the quickest path to get up and running.
Lesson 3 – There are some semi-functional examples
After digging around in the documentation, I found these two examples: Product Advertising API Signed Requests Sample Code – Java REST/QUERY and Product Advertising API Signed Requests Sample Code – Java SOAP. Since everything I had tried up until this point had been SOAP-centric, I decided to try the SOAP example first. Upon importing the code into Eclipse, I found that this example was fraught with too many errors and dependencies, so I turned to the REST example.
The REST code was clear and mostly error free. The few errors I saw were caused by the absence of the Apache Commons Codec library. After downloading this jar and adding it to my classpath, the example code finally compiled. Unfortunately, when I went to run it, I was greeted with this exception:
Server returned HTTP response code: 403 for URL: http://ecs.amazonaws.com/onca/xml?....
Lesson 4 – Apache Commons Codec 1.3 and 1.4 are different
After crawling through the forums looking for answers, I found out that the REST example above depended on Apache Commons Codec version 1.3, whereas the version I downloaded was 1.4. It turns out the old version appended extra CRLF (\r\n) characters onto the authentication signature, and the workaround is to force the new codec to exhibit the same behavior. If you read the codec’s documentation, you’ll see that the default behavior comes when you set the line length to 76 characters. To fix the REST example change line 183 of SignatureRequestHelper to:
Base64 encoder = new Base64(76, new byte[0]);
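For anyone who would rather see the signing step in one place, here is a minimal sketch of how I understand it: sort and percent-encode the parameters, build a string to sign, run HMAC-SHA256 over it with your secret key, and Base64 the result. It is not the Amazon sample code. The class name SignatureSketch, the key values, and the timestamp are all made up for illustration, and the parameters shown are not a complete working request. I have also swapped in the JDK's own java.util.Base64 (Java 8 and later) in place of Commons Codec; its basic encoder emits no line separators, so the line-length question never comes up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SignatureSketch {
    // Endpoint host and path as they appear in the request URLs above.
    static final String HOST = "ecs.amazonaws.com";
    static final String PATH = "/onca/xml";

    // Percent-encode in the RFC 3986 style: URLEncoder does form encoding,
    // so spaces, '*' and '~' need to be adjusted afterwards.
    static String percentEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")
                    .replace("*", "%2A")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sort parameters by name and join them as name=value pairs with '&'.
    static String canonicalQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(percentEncode(e.getKey()))
              .append('=')
              .append(percentEncode(e.getValue()));
        }
        return sb.toString();
    }

    // HMAC-SHA256 over the string to sign, Base64-encoded without line breaks.
    static String sign(String stringToSign, String secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(
                    secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical values; substitute your own keys and a current timestamp.
        Map<String, String> params = new TreeMap<>();
        params.put("Service", "AWSECommerceService");
        params.put("Operation", "ItemLookup");
        params.put("AWSAccessKeyId", "HYPOTHETICAL_ACCESS_KEY");
        params.put("Timestamp", "2009-09-01T00:00:00Z");

        String query = canonicalQuery(params);
        String toSign = "GET\n" + HOST + "\n" + PATH + "\n" + query;
        String signature = sign(toSign, "hypothetical-secret");

        // The signature itself must be percent-encoded before it goes in the URL.
        System.out.println("http://" + HOST + PATH + "?" + query
                + "&Signature=" + percentEncode(signature));
    }
}
```

The thing to watch for is stray characters on the end of the encoded signature: the server recomputes the signature from the same string, so anything extra is a mismatch.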
After doing all this, I finally got a small victory in the form of real output:
Map form example: Signed Request is "http://ecs.amazonaws.com/onca/xml?AWSAccessKeyId=...." Signed Title is "Harry Potter and the Deathly Hallows (Book 7)" String form example: Request is "http://ecs.amazonaws.com/onca/xml?AWSAccessKeyId=...." Title is "Harry Potter and the Deathly Hallows (Book 7)"
About a month ago I ssh-ed into my server to discover that my bash settings were not initializing. I thought it was just a little hiccup with our server configuration, but it turned out that our system was compromised. The hacker had logged in by exploiting an ssh overload bug. From there he/she/it felt compelled to replace our ssh binaries with their own. Fortunately our files were still intact, and I was able to download a backup of all my files and databases before we wiped the system clean.
After the reinstall, I spent a couple of days reconfiguring my websites and getting the packages necessary to get my Django-based dialog annotation tool back up and running. Things were as good as they were before, and we even decided to upgrade to having backup service. Then just a few days later, another hacker broke in. This time he changed all of our passwords and pretty much made the server unusable. Thanks to some heroics, my friend Ian was able to log in through some minimal access and recover our files.
Seeing as much of the data for my dissertation lived on this server, I decided I needed to break away from the server that had served me well for the past seven years and get a more managed service. After doing some searching, I discovered Webfaction allowed for easy configuration of many web development frameworks, including Django. They also provide nice tools for importing your WordPress blogs from other sites, as well as full backup of my directory and databases. Plus it ended up being cheaper per month than my share of the old server.
And now I present the LeeBecker.com affiliate program!
Commercial aside, I have enjoyed using this service, and now feel like I have better control of all things I host remotely whether it be my research tools, blogs, or code repositories. This change also has encouraged me to fully redesign my personal website in WordPress. While part of me still thinks I can do all the CSS and HTML by hand, for my needs WordPress does enough, and it brings a level of cohesion to the site that I previously lacked.
In 10th grade, my U.S. history teacher taught our class about the American Civil War by playing the Ken Burns documentary, frequently pausing at the salient points to interject other notes of worth. This approach often led to digressions about the ills of drunk driving or the dangers of getting married before you’re thirty, and consequently it took us nearly a month to get through these videos. While the combination of Mr. Burns’ thought-provoking storytelling and Mr. Davis’ tirades provided an educational and entertaining exploration of the Civil War, my biggest takeaway came not from the history, but from the soundtrack.
Ashokan Farewell features prominently in the documentary, getting played 25 times throughout the span of the eleven-hour series. In one segment we heard a voice reading “Grant stood by me when I was crazy, and I stood by him when he was drunk, and now we stand by each other.” over this captivating fiddle-based waltz. And then the voice signed it “William Tecumseh Sherman”. It was at this point a classmate and I realized that you could read just about anything to Ashokan Farewell, and it would be automatically imbued with incredible depth and gravitas. More importantly, this could be exploited to great comedic effect. We soon began to amuse ourselves during Mr. Davis’ lectures by reciting nonsensical prose, imagining it set to this tune, and closing with the all-important “William Tecumseh Sherman”. Shortly after, I discovered that Conan O’Brien and his staff found similar humor, producing a short sketch wherein Conan writes letters to his parents from summer camp.
This idea sat dormant for far too long, until late 2008, when in an e-mail thread with some friends it was mentioned that reading a heartfelt letter in a raspy voice over “That song from The Civil War” equalled comedy gold. Within a day we were producing our own takes on this theme and sharing them with each other via e-mail, but just as quickly as this idea resurfaced we allowed it to go back into hibernation… until today, when my friend Tom reminded us of our silliness, and now I feel the time has come to share some of this with the rest of the world. I should note that the samples below were not the first Ashokan trailers produced, but are the ones that I made. Hopefully, my friends will upload their videos, so that I can update this post with some new links.
And now without further ado I present:
Today in my computational lexical semantics class we were discussing Talmy’s Toward a Cognitive Semantics, specifically about Lexicalization Patterns. At one point in the presentation our presenter spoke to how some verbs undergo “incorporation” wherein they integrate multiple semantic concepts in one verb usage. For motion events, the semantic concepts include:
- Figure – object moving or being located
- Ground – reference object
- Path – path followed or site occupied by figure
- Motion – presence of motion or locatedness in the event
So in a sentence like “The pencil rolled off the table”, Figure = pencil, Ground = table, Path = off. What is interesting is that the verb roll has a Manner co-event in addition to its Motion one, meaning that it manages to incorporate two semantic concepts into the same verb usage.
At the point this came up, our professor suggested that the somewhat-uniquely English verb “butter” also undergoes incorporation in a somewhat different way. In “I buttered the bread”, butter is both the Figure and the Manner and possibly even Motion, whereas many other languages would force the construction to look like “I put the butter on the bread” or “I spread the butter on the bread”. Another student countered that in the sentence “I buttered the bread with cream cheese” the Figure and the Manner are different, and that this semantic decomposition can be handled through some other explanation. Most of the class thought the sentence seemed a little awkward, as “I spread the bread with cream cheese” would be more natural to a native speaker. Promptly after this was brought up, yet another student went to Google, our repository for all possible combinations of language, searched for “butter the bread with cream cheese”, and found no matches.
And that brings me to this post. I thought it was a pity to have something so simple spoken in real life, yet not reflected in the vast index that is Google. To perform due diligence, I put together a more general query, taking care to subtract the name of a band that would skew the results. Among the first several pages of results, the phrase “butter the bread with” was followed by some variation like “softened butter”, “teaspoon of butter”, “lowfat butter” or by some instrument like a “knife”, which does not change the Figure. Though if you go far enough you start to find some different Figures including “mustard”, “coconut oil”, “garlic powder”, or “cheez-whiz”. Even further down I found an article on load alternation and semantic shift that stated one can only butter things with items that are butter-like in consistency. It is funny how garlic powder still kind of fits that restriction. Given all this, it may be acceptable to say something like “butter the bread with cream cheese” after all. If nothing else, perhaps this posting will bump that phrase to at least one hit in the annals of search
Pabst Blue Ribbon (PBR) owes much of its resurgence over the past decade to the indie-rock loving, fixed-gear riding, and tight-jean wearing hipsters of America. Its modest price and old-fashioned labeling have resonated well with a crowd looking to display its anti-consumerism and sense of irony at the same time. However, PBR is starting to lose some of its indie cred. I have seen places charging $3-4 for a can, and a quick Google search for “Most Expensive PBR” finds stories of bars charging more than $5 for this beer.
Thus I offer my suggestion to Pabst. They should make a discount line of lower quality, possibly rejected beer called their “Honorable Mention” label, which could also go by the catchier PGR — Pabst Green Ribbon. This would allow them to turn a profit on any slips in quality control while simultaneously continuing to foster their image as the preferred anti-establishment beer.
I guess before I can start writing my second and third posts, I need to have a first post. Unlike SaraAndLee.com, which was started as a place to keep guests informed of our wedding plans and which shifted into a way of telling stories about our time in Indonesia, Ke-LeeBecker-an really has no initial purpose other than to drive my Google PageRank score up. I imagine, like most other blogs, this will be a collection of ramblings that few besides myself will appreciate.