Maxwell Terry

Finding Interesting People

Online dating is one of those things, like auctions and classified listings, that's really only valuable once the userbase reaches a certain size. It could be argued that the threshold is even higher for dating sites, since one will presumably only use the service until they've found someone. Unlike auction and classified sites, successful use of a dating site means abandoning the service. I've never seriously used an online personals site, but I've seen what's out there and have been struck by tasteless advertising, exploitation of users to lure in others, poor design, and overly complicated questionnaires. It would seem a waste of time to even fill out the forms.

I've investigated different ways of providing a valuable meeting service without having to build up a dedicated network first. I approached this idea as location-aware software, dabbled with Facebook, and eventually built on top of Twitter. I started out with an iPhone app that generated a "compatibility score" of people around you from data obtained from a fairly extensive set of questions. It would ping the server with one's current location, and then compare the GPS coordinates and each user's answers.

It wasn't right. Questionnaires are weird when the people filling them out are around. I felt awkward asking anyone to fill it out, and wasn't too compelled to do so myself, which is clearly a bad sign. Maybe we only like filling out the same form a few times before it gets tiresome. Some have done them for dating sites, or on MySpace, Facebook, and others. It's one thing when you're expressing information about yourself for the hell of it, and another when looking for something in return.

Now since a lot of people already have profiles up somewhere online, I thought about doing a Facebook or OpenSocial app. Again I was able to come up with a technical solution that didn't work socially. I just didn't want what I'd written. If making something you (and by extension, others) want is "scratching an itch," this might be "scratching a scab." For whatever reason, I for one don't want to walk around broadcasting my Facebook profile with my current location. Around this time I first tried out Loopt. When they rolled out the Mix feature, they nailed the core functionality of what I had in mind, and true enough, it wasn't that compelling (in Maine anyway).

I think exploratory product development is graph traversal: you proceed down one path, discover it's flawed, go down another, see someone else has already realized a big part of it, reduce down what you considered important, and move in another direction. My work on this project was essentially depth-first, with an aim to "do matchmaking right."

Most approaches to matchmaking-style dating sites seem to be a kind of "compatibility test." This might be the optimal strategy for, say, finding an organ donor or maybe an employee, but doesn't seem well-suited to starting romantic relationships, or even friendships. Around this time I watched Steven Pinker discuss the non-rational basis of love and its evolutionary function.

I'd been going about it all wrong. I was trying to do pattern matching when what I really wanted were interesting diffs. What I was looking for was luck, surprise, and randomness; those were the requirements to optimize for. When I was able to mentally articulate this, it gave me a new perspective on the project, and the design became clearer.

I wrote a (mobile-optimized) web app with a reduced set of questions and an algorithm based on complementary answers, timing, and a heavy helping of randomness. The basic idea is that one fills in some simple fields and gets occasional messages linking to matched profiles. To implement this, I first had to choose between different communication protocols. Should contact be by location, phone, email, IM, Facebook, or Twitter?

Twitter seemed to fit best. The pure communication systems lacked much context. Just being recommended an address isn't that helpful; what you want is a way of checking someone out first, and then being able to communicate with them comes second. Plus, I realized that the best meeting site probably won't even have to be about "dating" per se. So I generalized it out to just finding interesting people on Twitter:

http://tweetmatcher.appjet.net

In gearing up for this application, I started writing a wrapper over the Twitter API for AppJet. While it wasn't used extensively in the app, it's something that I see being useful (to myself and others) in the future:

http://source.lib-status.appjet.net

When it came to the interface, I decided to make this very much an exercise in getting a form to work with smart, reusable text fields. Forms are a pain: validation is fairly arbitrary, and it's tedious to write the logic (on the client and server!) each time you want to collect some information. So I wrote a number of jQuery extensions, which would tap into the Value.js library I wrote largely in parallel.

The first field in the application is "Username." When text is entered, the given user name is queried against Twitter to see if it's a real user. Right now there's no protection against spoofing someone else's name; in the future a password field should be included, or Twitter OAuth (which was opened after primary development) could be used. The implementation is a big hack: it just looks up a Twitter page and checks the source to see if the user exists. This could be done more elegantly by using the REST API.

Let's look more deeply at the age extension to jQuery.

/**
 * Change background color on age.
 *
 * @param {object} [o] Options. Can pass on/off properties for activated/deactivated colors and min/max for youngest/oldest valid age.
 */
jQuery.fn.age = function(o) {

  o = o || {}

  o.on  = o.on  || "#90ee90"
  o.off = o.off || "#ff5d5d"
  o.min = o.min || 18
  o.max = o.max || 115 // http://en.wikipedia.org/wiki/Gertrude_Baines

  var field = this // the matched input(s)

  return field.keypress(function(e) {

    var key = $.which(e)

    // Reject anything that isn't a digit.
    if (!$.is.num(e)) return(false)

    // keypress fires before the field updates, so work out the value the
    // field is about to hold: drop a character on backspace (key code 8),
    // otherwise append the typed digit.
    var age = field.val(),
        id  = Number(key != 8 ? age+$.str(e) :
              age.substring(0, age.length-1))

    if (id && (id < o.min || id > o.max))
      field.bg($.start(o.off, "#")) // out of range
    else if (id >= o.min && id <= o.max)
      field.bg($.start(o.on, "#"))  // valid
    else
      field.bg("#fff")              // empty

  })

}

The function optionally takes in an object specifying activated/deactivated color and minimum/maximum age, with defaults provided otherwise. [1] The keypress method gets called on each press of a key. It only takes numbers, and then displays whether the given age is invalid (show deactivated background color) or not (activated background). Here on and off basically mean valid and invalid.
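
The decision the handler makes on each keypress can be pulled out into a small pure function. What follows is a minimal sketch in Python 3 rather than the app's jQuery; classify_age and its return labels are hypothetical names, and treating non-numeric text as "empty" is an assumption (the real field simply rejects those keys).

```python
def classify_age(text, min_age=18, max_age=115):
    """Return "on", "off", or "empty" for the text an age field would hold.

    Hypothetical helper: mirrors the three background states described above.
    """
    if not text.isdecimal():
        return "empty"   # nothing usable typed yet (assumption for non-digits)
    age = int(text)
    if age == 0:
        return "empty"   # a bare zero is treated like an empty field
    return "on" if min_age <= age <= max_age else "off"


for sample in ["", "1", "18", "25", "116"]:
    print(repr(sample), classify_age(sample))
```

Keeping the rule separate from the event handling would let the same check run on the server, where the write-up notes validation is still thin.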

Notice the bg jQuery extension method. Defined earlier in the source, this is a convenience allowing one to say

$("a").bg("fff")

instead of

$("a").css("background", "#fff")

The "Gender" and "Seeking" fields are implemented similarly to the age extension.

The "Location" field uses an API designed by Aza Raskin, using the MaxMind geolocation API. It's not perfect: sometimes it'll get caught on "Loading..." and it can give inaccurate results (it thinks my iPhone is in New York). But it usually works for me, giving United States and Maine for Country and District respectively. When a result is determined, it's cached. So if "Country" is clicked, then "District," and then "Country" again, the country name is put in from saved state on the client, without having to pull the data again.

The "Interests" field basically collects keywords for comparison. Right now given terms are just put up against others' submissions, but these could be used to find people on Twitter by using the Search API. Right now hitting tab jumps down to the submit button. It could alternatively add a new field below the current one, which for now has to be done by clicking the "+" with the mouse. Ideally the interface should work completely with just the keyboard.

Let's turn to the server-side. While client-side validation is very mature, the server basically just spits back anything obviously wrong. The key parts of the server-side are the matching algorithm and caching mechanism. Matching is done by a function called get_match (the get_ prefix just denotes that it gets executed on HTTP GET requests to the /match path; this is a built-in AppJet feature and is somewhat reminiscent of Sinatra). The match function computes the differences between the current user and other (suitable) users, and comes up with a score. This is kept internal, and depends on the prefs object, which gives the "points" to award for similarities (e.g. living in the same location awards 10 points right now; each shared interest 2 points). While this is still a "compatibility score," the idea is that it could be determined implicitly (scraping profile information, at the user's command, perhaps), and from this light the form is just a way of testing it out.
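
To make the scoring concrete, here is a minimal sketch in Python 3 of how a prefs table could drive the score. Only the two point values (10 for a shared location, 2 per shared interest) come from the description above; the PREFS layout, the profile fields, and the score function are assumptions for illustration, and the sketch leaves out the complementary answers, timing, and randomness that the real get_match also uses.

```python
# Hypothetical prefs table; the point values are the ones quoted above.
PREFS = {"location": 10, "interest": 2}


def score(a, b, prefs=PREFS):
    """Score two profile dicts by shared location and shared interests."""
    total = 0
    if a["location"] == b["location"]:
        total += prefs["location"]
    shared = set(a["interests"]) & set(b["interests"])
    total += prefs["interest"] * len(shared)
    return total


me    = {"location": "Maine",    "interests": ["arc", "jquery", "arduino"]}
near  = {"location": "Maine",    "interests": ["arc", "arduino", "python"]}
other = {"location": "New York", "interests": ["arc"]}

print(score(me, near))   # shared location plus two shared interests
print(score(me, other))  # one shared interest only
```

Because the weights live in one table, they can be tuned without touching the comparison logic.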

I didn't realize before working on this software that caching is the key to performance. I implemented three caching mechanisms throughout the program, for usernames, geolocation, and on the client-side. Retrieved usernames from Twitter are stored in the database, so typing "max" should be faster than doing, I don't know, "askjki." This also decreases requests needed to be made to Twitter, and since the data is fairly small, shouldn't be much of a burden to me, at least for now. The geolocation determined from IP is handled in a similar manner. And the client-side caching, mentioned earlier, just saves location results in an object on the client.
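
The username and geolocation caches follow the same look-aside pattern: check the store first, and only go to the remote service on a miss. A minimal sketch in Python 3, with a plain dict standing in for the AppJet database and a fake lookup standing in for Twitter (CachedLookup and fake_twitter are hypothetical names, not part of the app):

```python
class CachedLookup:
    """Wrap a slow lookup function with a simple in-memory cache."""

    def __init__(self, fetch):
        self.fetch = fetch   # the slow call, e.g. a request to a remote API
        self.store = {}      # stands in for persistent storage
        self.misses = 0      # how many times the slow call was made

    def get(self, key):
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.fetch(key)
        return self.store[key]


def fake_twitter(name):
    # Pretend only these accounts exist; a real lookup would hit the network.
    return name in {"max", "paul"}


users = CachedLookup(fake_twitter)
print(users.get("max"), users.get("max"), users.misses)
```

Note that negative results ("askjki" does not exist) are cached too, which is what saves the repeat requests but also means a name registered later would stay marked as missing until the cache is cleared.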

I'm probably most happy with the jQuery extensions and AppJet Twitter API wrapper I wrote. I feel that the interface components are valuable, my experience working on caching worthwhile, and the result achieves what I set out to do. I plan to continue working on this through the components, and see what more can come of it.

The code is available at http://source.tweetmatcher.appjet.net; feel free to clone it!

1. The default acceptable age range is 18 (legal adult in the United States) to 115 (the current age of the oldest living human). The minimum age should maybe default to 13 instead?

Tagged  essay  

Scraping a News.arc Feed With AppJet

Feeds are machine-readable serializations of web content. While the web was designed as a network of linked documents, feeds offer linked data. Most feeds provide frequently updated content, like weather forecasts, stock listings, status updates, and alerts of new blog posts. While this data could be published in any form, standard interchange formats are typically used so that other services don't have to use or write a custom parser.

The two dominant formats are XML and JSON: the former is a general-purpose markup language related to HTML (which gives structure to web pages) and the latter is based on a subset of the JavaScript programming language (which runs natively in all popular browsers). XML and JSON are widely supported, and libraries to convert the data to in-memory program structures are already written for most languages. (See http://en.wikipedia.org/wiki/Category:XML_parsers and http://json.org/.) We'll be working exclusively with JSON.
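
As a small illustration of what "converting to in-memory structures" means, here is a sketch using Python 3's standard json module. The sample record is made up for the example; its field names simply echo the kind of data a story feed might carry.

```python
import json

# An invented feed entry, as it might arrive over the wire.
text = '{"title": "Example story", "score": 42, "tags": ["arc", "news"]}'

story = json.loads(text)       # JSON text -> dict, list, int, str
print(story["title"], story["score"], story["tags"][0])

round_trip = json.dumps(story) # and back to JSON text
print(json.loads(round_trip) == story)
```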

Sometimes a site doesn't provide a feed, or the official feed is found lacking. We can however scrape (i.e. programmatically extract) content from any accessible site. It's important to first make sure doing so doesn't violate the host's terms of service, and remember that the data might be blocked or otherwise unavailable to you at any given time. (I'd advise against starting a business around scraping.) But it can be very useful for getting content alerts from data that isn't already syndicated.

Let's look at how we would scrape updates from news.arc sites with AppJet. Included in Paul Graham's implementation of the Arc language, news.arc is a library deployed as Hacker News, Arc Forum, New Mogul, and Academic Hacker News, among others. Let's scrape the newest stories every hour, which we can use as an alert of forum activity.

Rather than simply grabbing the source and stripping out solely what we need, we'll build up a complete JSON feed of the site's dynamic content.

At the beginning of our AppJet code we'll include some metadata, including the version of the framework (required by AppJet) and an overview of what we're doing.

/* appjet:version 0.1 */
/** @fileoverview Scrape headlines from news.arc sites to JSON. */

We'll need to import a few libraries.

import("storage", "lib-json2", "lib-value")

The storage library can be used to persist objects on disk, lib-json2 is a server- and client-side copy of Douglas Crockford's json2.js, and lib-value is my own Value.js framework.

We'll include a table of the shortnames and addresses of existing deployments.

sites = {
  hn:  "http://news.ycombinator.com",
  arc: "http://arclanguage.org/forum",
  nm:  "http://newmogul.com",
  ahn: "http://www.cs.toronto.edu/~ad/news/"
}

And feature pages. (Some of these may only be available on Hacker News for the time being.)

pages = [
  "top",
  "newest",
  "threads",
  "newcomments",
  "leaders",
  "jobs",
  "best",
  "active",
  "bestcomments",
  "noobs",
  "classic"
]

It's generally rude to pull data from a host indiscriminately. To prevent excessive requests, we'll cap the maximum frequency at once a minute. This is user-driven: if no one requests data, it's not being pulled behind the scenes. [1]

cache = function(site) {

  if (!storage.site) storage.site = {}
  if (!storage.time) storage.time = {}

  if (!storage.site[site]) {
    storage.site[site] = wget(site)
    storage.time[site] = as.now()
  }

  as.minutely(function() {
    storage.site[site] = wget(site)
    storage.time[site] = as.now()
  }, storage.time[site])

}

The cache function takes in a site URL string. We first make sure the storage object has site and time properties. Then if the site hasn't been cached yet we retrieve it and record the current time under the site's URL in the time property. (This will let us know how stale the cache is.) Finally, the as.minutely function is a method of the as object in Value.js. A convenience variation of what's currently called as.cron (but should probably be expanded or renamed to as.every or as.often), as.minutely will only call the passed function (first argument) if the given Unix timestamp integer is from at least a minute ago.
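
The gating behaviour described for as.minutely can be sketched in a few lines. This is a Python 3 illustration of the idea, not Value.js itself: the minutely function below, its now and interval parameters, and the use of seconds for timestamps are all assumptions.

```python
import time


def minutely(fn, last, now=None, interval=60):
    """Call fn only if `last` (Unix seconds) is at least `interval` old.

    Returns True when fn was called, False when the call was skipped.
    """
    now = time.time() if now is None else now
    if now - last >= interval:
        fn()
        return True
    return False


refreshed = []
print(minutely(lambda: refreshed.append("pull"), last=0, now=30))  # too soon
print(minutely(lambda: refreshed.append("pull"), last=0, now=90))  # stale
print(refreshed)
```

Passing now explicitly is only there to make the behaviour easy to check; in normal use the current time is read from the clock.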

posts = function(site, n, html) {

  cache(site)

  var them    = [],
      stories = storage.site[site].split("vote?").slice(1, n+1)

  for (var i=0, l=stories.length; i<l; i++) {

    var o   = {}

    o.by    = as.after(stories[i], "?id=", "\">")
    o.id    = 1 * as.after(stories[i], 4, "&")
    o.url   = as.after(stories[i], "href=\"", "\"")
    o.title = as.after(as.after(stories[i], "href=\"", "</a>"), ">")
    o.score = 1 * as.after(stories[i], "score_", " point")
                    .substring((o.id+"").length + 1)
    o.type  = "story"
    o.time  = trim(as.before(stories[i], "| ", "</a>"))

    if (o.url.substring(0,4) != "http") {
      o.url = site.split("/").slice(0,3).join("/")+"/"+o.url
    }

    if (!html) o = as.str(o)
    them.push(o)

  }

  return(html ? them : "["+them.join(",")+"]")

}

The meat of the program is the posts function, which accepts a site address, number of posts to include, and whether to return HTML (mostly for debugging purposes). The site address is given as a string, and should include the full address of the page (e.g. "http://news.ycombinator.com/newest"). The number of posts to display is currently capped at 30 [2]. The cache function is called, refreshing the cache if necessary. After splitting the code into chunks for each submission, the central for loop runs through each block, pulling out data and adding it as a value to a property in an object, then appending the new object to an array accumulating them. In the end the JSON (or HTML) is returned.
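
The split-then-extract approach can be shown on a toy page. This is a sketch in Python 3 under stated assumptions: between is a hypothetical stand-in for the as.after helper, the SAMPLE markup is invented and far simpler than real news.arc output, and only the url and title fields are extracted.

```python
import json
from urllib.parse import urljoin


def between(text, start, end):
    """Return the text after the first `start` and before the next `end`."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    return None if j == -1 else text[i:j]


def scrape(html, page_url):
    stories = []
    # Everything before the first vote link is page header, so skip it.
    for chunk in html.split("vote?")[1:]:
        href = between(chunk, 'href="', '"')
        link = between(chunk, 'href="', "</a>")
        if href is None or link is None:
            continue
        stories.append({
            "url": urljoin(page_url, href),      # resolve relative links
            "title": link.partition(">")[2],     # text after the tag closes
        })
    return stories


# Invented markup: two submissions, the second with a relative link.
SAMPLE = (
    'header vote?for=1"></a><a href="http://example.com/a">First</a> '
    'vote?for=2"></a><a href="item?id=2">Second</a>'
)
stories = scrape(SAMPLE, "http://news.ycombinator.com/newest")
print(json.dumps(stories))
```

String splitting like this is brittle by nature: any change to the page's markup can silently break it, which is one more reason to treat scraped feeds as a convenience rather than a dependency.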

A main function is also provided for calling on each request. This could probably be better written as a block (i.e. (function() {})()), but AppJet doesn't seem to support them.

srv = function(site, n) {

  page.setMode("plain")

  // A URL parameter wins, then the argument, then Hacker News.
  site = request.params.site || site || sites.hn
  n = n || 30 //: only supports first page right now
  var html = !!request.params.html

  if (!request.params.page) print(posts(site, n, html))

  else {

    if (is.within(pages, request.params.page)) {
      print(posts(site+"/"+request.params.page, n, html))
    }

  }

}

This can be called as

srv()

or optionally provided with a value for the site and number of posts, which otherwise default respectively to "http://news.ycombinator.com" and 30.

You can access this at http://news-arc-scrape.appjet.net/. It can be manipulated solely with URL parameters; try http://news-arc-scrape.appjet.net/?page=newest, http://news-arc-scrape.appjet.net/?page=newest&html=true, and http://news-arc-scrape.appjet.net/?site=http://arclanguage.org/forum.

This solution isn't perfect. Since the post date is displayed relatively, the absolute time can only be computed for more recent submissions, and must be done manually. (It would be preferable if the Unix time stamps that are used internally were exposed in the HTML or an official data feed.) Since the intended use here is to get the newest stories, the time information doesn't really matter. But while we're at it, we should derive a general solution to scraping news.arc sites. Future applications may need the time information.
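
For recent submissions, an approximate absolute time can be recovered from the relative label. A minimal sketch in Python 3, assuming labels of the form "N minutes ago", "N hours ago", or "N days ago" (the exact wording on a given site is an assumption, and approximate_time is a hypothetical helper):

```python
import re

SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def approximate_time(relative, now):
    """Turn e.g. "3 hours ago" into a Unix timestamp relative to `now`.

    Returns None when the label doesn't match the assumed format.
    """
    m = re.match(r"(\d+) (minute|hour|day)s? ago", relative.strip())
    if not m:
        return None
    return now - int(m.group(1)) * SECONDS[m.group(2)]


now = 1_000_000
print(approximate_time("3 hours ago", now))
print(approximate_time("1 minute ago", now))
print(approximate_time("yesterday", now))
```

The result is only as precise as the label: "3 hours ago" could be off by most of an hour, and the error grows once the site switches to days.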

Replies and text could be added by pulling the data from the story link. I'll leave this, as well as scraping comments, as an exercise for the reader. [3]

1. If this were preferred, we could use a cron job.

2. This is just because news pages usually have 30 elements. It could be expanded by recursively getting the fnid of the next page and retrieving it.

3. sockvotes, ip, and votes could also be included if one had admin access.

Tagged  essay  

Arduino Thermometer

We previously looked at an Arduino class for Python. Now let's use it to create an Arduino thermometer.

#!/usr/bin/env python

# temperature
# Maxwell Terry
# MIT license

from xml.etree.ElementTree import parse
from urllib2 import urlopen

def temperature(url, scale):
  # The feed may report values like "45.0", so go through float first.
  return int(float(parse(urlopen(url)).findtext('temp_'+scale)))

def leds(temp):
  if temp >= 120:  return 5
  elif temp >= 90: return 4
  elif temp >= 60: return 3
  elif temp >= 32: return 2
  elif temp > 0:   return 1
  else:            return 0

The temperature function takes in a url and scale ("c" or "f") and returns an integer value. This is dependent on a feed that gives the temperature in an element named "temp_" plus the scale, of course, so isn't a particularly general solution. The leds function takes in a temperature in Fahrenheit and returns the number of LED lights to turn on. The thresholds are arbitrary, but seem appropriate when only 5 LEDs are available.
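
The same two steps can be tried without a network connection or a board. The sketch below is Python 3 (the listing above targets Python 2); the SAMPLE document is invented and only mimics the temp_f/temp_c element naming the function relies on, and temperature_from, led_count, and the threshold table are hypothetical names. Returning 0 at or below zero is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented observation document with the element names assumed above.
SAMPLE = (
    "<current_observation>"
    "<temp_f>45.0</temp_f><temp_c>7.2</temp_c>"
    "</current_observation>"
)

# (minimum temperature in Fahrenheit, LEDs to light), highest first.
THRESHOLDS = [(120, 5), (90, 4), (60, 3), (32, 2), (1, 1)]


def temperature_from(xml_text, scale):
    """Parse the temperature for the given scale ("f" or "c") as an int."""
    return int(float(ET.fromstring(xml_text).findtext("temp_" + scale)))


def led_count(temp):
    """Map a whole-degree Fahrenheit temperature to a number of LEDs."""
    for floor, count in THRESHOLDS:
        if temp >= floor:
            return count
    return 0


temp = temperature_from(SAMPLE, "f")
print(temp, led_count(temp))
```

Putting the thresholds in a table makes it easy to rescale for a different number of LEDs or for Celsius.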

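#!/usr/bin/env python

# Old Town
# Maxwell Terry
# MIT license

# Assumes the other listings are saved as Arduino.py and temperature.py.
from Arduino import Arduino
from temperature import temperature, leds

connection = Arduino("/dev/tty.usbserial-A5002tRQ")

url = "http://www.weather.gov/data/current_obs/KOLD.xml"
unit = "f"

# writeLine sends a string one character at a time, so convert the count.
connection.writeLine(str(leds(temperature(url, unit))))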

Here's the Arduino class in action. This is specifically how we'd get the temperature of Old Town, Maine. As Weather.gov lists feeds by location (rather than, say, zipcode), it's hard to automate access. To do so, we'd have to create a table pairing city and location names, or seek out another collection of local feeds.
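
Such a table could start as small as a dictionary. A minimal sketch in Python 3: the KOLD code and URL pattern come from the script above, while the STATIONS table, its key format, and feed_url are hypothetical.

```python
# City name -> observation station code. Only the entry used in this
# post is filled in; further cities would be added by hand.
STATIONS = {
    "Old Town, ME": "KOLD",
}

FEED = "http://www.weather.gov/data/current_obs/{station}.xml"


def feed_url(city):
    """Return the observation feed URL for a known city."""
    return FEED.format(station=STATIONS[city])


print(feed_url("Old Town, ME"))
```

An unknown city raises KeyError here, which seems preferable to silently fetching the wrong feed.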

Tagged  tutorial  

Interfacing Python and Arduino

Arduino is a physical computing platform, supported by an assortment of microcontroller boards, such as the Diecimila. Since it's an open specification, one could make their own board by hand if desired. Let's look at scripting an Arduino board with Python.

Python is a general-purpose high-level programming language, drawing from the object-oriented, imperative, and functional paradigms, and featuring strong, dynamic (duck) typing. It provides built-in data structures like lists, tuples, and dictionaries, a classical object system, and very clean syntax. While functions are first-class citizens, they lack the flexibility of function literals in Lisp, JavaScript, and Ruby (lambdas can only include a single expression, or else the function must be named). It's a very fine language, and particularly noteworthy for its extensive standard library.

Python can easily talk to an Arduino board over a serial interface. While serial devices can be read from and written to like files on Unix-like systems, the pySerial wrapper makes this easier across operating systems.

Let's try reading some data:

>>> import serial
>>> arduino = serial.Serial('/dev/tty.usbserial', 9600)
>>> while 1:
...     arduino.readline()
...
'1 Hello world!\r\n'
'2 Hello world!\r\n'
'3 Hello world!\r\n'

And writing:

>>> arduino.write('5')

Let's write an Arduino-specific class.

#!/usr/bin/env python

# Arduino
# Maxwell Terry
# MIT license

import time, serial

class Arduino:

  connection = None

  def __init__(self, path):
    self.connection = serial.Serial(path, 9600)

  def __del__(self):
    self.connection.close()

  def isOpen(self):
    return self.connection.isOpen()

  def readLine(self):
    # Returns None when the port is closed or nothing is waiting.
    if self.isOpen():
      if self.connection.inWaiting() > 0:
        return self.connection.readline()
      else:
        print "---"
    return None

  def writeLine(self, line):
    # Send one character at a time, pausing so the board can keep up.
    if self.isOpen():
      for i in range(0, len(line)):
        self.connection.write(line[i])
        time.sleep(0.1)

  def flush(self):
    self.connection.flushInput()
    self.connection.flushOutput()

  def readPin(self, pin, line):
    self.flush()
    self.writeLine(line + str(pin))
    result = self.readLine()
    if result is None:
      return None
    # Keep the (up to three) characters that follow "Value: ".
    result = result[result.find("Value: ")+7:result.find("Value: ")+10]
    if result[2] == '-':
      result = result[:1]
    return result

  def setPin(self, pin, value):
    self.writeLine("S1" + str(pin) + str(value))
    return True

If you're already familiar with Python, this should be pretty straightforward. Feedback and comments welcome. I'm going to elaborate on the implementation, to serve as a general introduction to Python (assuming one has reasonable experience with other high-level languages).

First notice that with Python, nesting is denoted with indentation rather than braces (C-style languages) or parentheses (Lisp). We start out with connection, a class attribute defaulted to None that each instance replaces with its own serial connection. [1] Python requires one to explicitly pass around the current object, done with the self parameter. [2] __init__ and __del__ are special methods, called on initialization and deletion; special methods like these are also how one adds polymorphism to built-in operators. [3] Here they open and close the serial connection.

isOpen tests whether a connection is open; readLine and writeLine use it to check that the board can be read from or written to before doing so. flush clears the input and output, while readPin and setPin respectively read and write to a given pin.

Obviously, this is only a minimal wrapper over pySerial. We don't need it, but these convenience methods make actions more clear and easier both to debug and extend.

 

1. Python's None is equivalent to JavaScript's null or Arc's nil.

2. Along with significant whitespace and crippled lambdas, this is a common criticism of Python. I personally support the sentiment, but not execution. It's a bit of a hack to just pass state around as the first parameter; that kind of thing seems deserving of syntactic support. I prefer the way JavaScript does it:

example = function() {

  this.method = function(x) {
    return x
  }

  this.property = 1

  return this

}

Note that JavaScript's this isn't identical to Python's self: its value depends on how a function is called rather than where it's defined (for instance, if this.method is passed around and called on its own, this no longer refers to the object it was defined on).

3. This kind of functionality debatably offsets Python's general lack of metaprogramming capabilities (i.e. macros), but only provides extensibility to a point.

Tagged  tutorial  

Kid Paint Animations

Kid Paint

Futura

via YouTube

Created for a design class. The music is Wolf Parade's "You Are A Runner And I Am My Father's Son" from Apologies to the Queen Mary (2005).

Tagged  video  

Radiohead - fitter happier (2005)

I assembled this video for a new media class at the University of Maine in the fall of '05.

Tagged  music   video