Lucene Revolution 2012, Boston, May 9th. Topic: Big Data Search
Hadoop Summit 2012, San Jose, June 13-14. Topic: Semantic Zooming Interfaces for Big Data
And don’t be late…
Toby Ord of Giving What We Can has other interests, including ones that connect back to Solomonoff inference and algorithmic information theory. Specifically, Ord worked earlier on topics related to hypercomputation or, more simply put, the notion that there may be computational systems that exceed the capabilities of Turing Machines.
Turing Machines are abstract computers that can compute logical functions, and the question that has dominated theoretical computer science is what is computable and what is incomputable. The Kolmogorov Complexity of a string is the length of the shortest program that outputs the string on a given reference machine. And the Kolmogorov Complexity is incomputable. Yet a compact representation is a minimalist model that can, in turn, lead to optimal future prediction of the underlying generator.
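Although the Kolmogorov Complexity itself cannot be computed, any lossless compressor gives a computable upper bound on it (up to an additive constant for the decompressor). The following is a rough sketch of that idea using Python's standard zlib module; the function name `compressed_length` and the two test strings are illustrative assumptions, not anything from Ord's work.

```python
import os
import zlib


def compressed_length(data: bytes) -> int:
    """Return the size in bytes of the zlib-compressed form of `data`.

    This is only an upper bound on the shortest description of `data`;
    the true Kolmogorov Complexity may be far smaller.
    """
    return len(zlib.compress(data, 9))


# A highly regular 10,000-byte string: a short program ("repeat 'ab' 5000 times")
# describes it, and the compressor finds most of that regularity.
regular = b"ab" * 5000

# 10,000 bytes from the operating system's entropy source: almost certainly
# incompressible, so the compressed form is about as long as the input.
random_like = os.urandom(10000)

print(len(regular), compressed_length(regular))
print(len(random_like), compressed_length(random_like))
```

The gap between the two compressed sizes is the practical face of the "compact representation" idea: the regular string has a short model, while the random one has none that a general-purpose compressor can find.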
Wouldn’t it be astonishing if there were, in fact, computational systems that exceeded the limits of computability? That’s what Ord’s work set out to investigate, though there have been detractors.…
For fun, I decided to try writing a partial post using Apple’s iBooks Author. The application runs on Mac OS X Lion and is available for free. It appears to be derivative of Keynote, which explains Apple’s rapid development of the authoring tool.
There are some limitations, though. I couldn’t embed equations from Word for Mac 2011 without converting them into images. It also only publishes to the iBookstore, although you can export to PDF (as below). There are few PDF export options, however, and the metadata and labeling include Apple logos.
Tearing apart the .iba format via unzip showed a collection of .jpg and .tiff images, a binary color array, and an .xml specification of the project. Fairly simple, but not including the compiled .epub file that iBookstore generally takes.
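For anyone who wants to repeat the inspection without the unzip command, here is a minimal sketch using Python's standard zipfile module. It assumes only what the paragraph above reports, namely that an .iba file opens as a zip archive; the function name `extension_counts` and the file name in the usage comment are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def extension_counts(archive_path: str) -> Counter:
    """Count the files in a zip archive by (lowercased) file extension."""
    counts = Counter()
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(none)"] += 1
    return counts


# Hypothetical usage; "MyPost.iba" stands in for a real project file:
# for ext, n in extension_counts("MyPost.iba").most_common():
#     print(ext, n)
```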
Total elapsed time: 1 hour (including download/installation). With improvements to the software and with more experience, that should be halved.
Kalev Leetaru at UIUC highlights the use of sentiment analysis to retrospectively predict the Arab Spring using Big Data in this paper. Dr. Leetaru took English transcriptions of Egyptian press sources and looked at aggregate measures of positive and negative sentiment terminology. Sentiment terminology is fairly simple in this case, consisting primarily of positive and negative adjectives, but the measure could be more discriminating if it checked for negating modifiers (“not happy,” “less than happy,” etc.). Leetaru points out some of the other follies that can arise from semi-intelligent broad measures like this one applied too liberally:
It is important to note that computer–based tone scores capture only the overall language used in a news article, which is a combination of both factual events and their framing by the reporter. A classic example of this is a college football game: the hometown papers of both teams will report the same facts about the game, but the winning team’s paper will likely cast the game as a positive outcome, while the losing team’s paper will have a more negative take on the game, yielding insight into their respective views towards it.
This is an old issue in computational linguistics. In the “pragmatics” of machine translation, for example, the classic problem is how to translate the word for fighters in a rebellion. They could be anything from “terrorists” to “freedom fighters,” depending on the perspective of the translator and the original writer.
In Leetaru’s work, the end result was an unusually high churn of negative-going sentiment as the events of the Egyptian revolution unfolded.
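To make the word-list approach and the negation check concrete, here is a toy sketch. It is not Leetaru's method or lexicon: the word lists, the `NEGATORS` set, and the function name `tone_score` are all assumptions made up for illustration, and the one-word lookback only catches patterns like “not happy,” not “less than happy.”

```python
import re

# Tiny illustrative lexicons; a real tone dictionary would be far larger.
POSITIVE = {"happy", "good", "peaceful", "hopeful"}
NEGATIVE = {"angry", "bad", "violent", "fearful"}
NEGATORS = {"not", "never", "no", "hardly"}


def tone_score(text: str) -> int:
    """Sum +1 for each positive word and -1 for each negative word,
    flipping the sign when the word is directly preceded by a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            polarity = 1
        elif token in NEGATIVE:
            polarity = -1
        else:
            continue
        # Look back one token only; longer-range negation is not handled.
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


print(tone_score("The crowd was happy"))      # 1
print(tone_score("The crowd was not happy"))  # -1
```

Averaging such scores over every article in a time window gives the kind of aggregate tone curve discussed above, with the same blind spot Leetaru describes: the score mixes the facts with the reporter's framing.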
But is it repeatable or generalizable? I’m skeptical. The rise of social media, enhanced government suppression of the media, spamming, disinformation, rapid technological change, distributed availability of technology, and the evolving government understanding of social dynamics can all significantly smear out the priors associated with the positive signal relative to the indeterminacy of the messaging.…