enjoying salad since 1978.

Sunday, May 11, 2008

John McCarthy has a good sense of humor

From an informal talk he gave at Stanford recently that was written up in Hacker News:
Q. Can computers know?

A. This is largely a question of definition. If a camera looked at a table, we could
say it "knows" that there are four containers of liquid on the table (which was true).

Q. Is there any definition of "know" in which computers cannot succeed?

A. Well, I suppose the biblical sense.

Q. Ha, well, what makes you think that?

A. They don't satisfy the necessary axioms (laughter)

Monday, April 21, 2008

What are you doing?

reading @biz out me as a Twitter employee.

Tuesday, April 15, 2008

curious what delicious is saying about something?

Here's a bookmarklet that checks the current URL you're visiting with del.icio.us.
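
For anyone who hasn't written one, a bookmarklet is just a javascript: URL saved as a bookmark. Here's a minimal sketch of how one like this could be put together. The lookup endpoint (DELICIOUS_CHECK_URL) and the helper name deliciousLookupUrl are assumptions for illustration, not necessarily what the original bookmarklet used.

```javascript
// Assumed endpoint for looking up a URL on del.icio.us (hypothetical here).
const DELICIOUS_CHECK_URL = "http://del.icio.us/url/check?url=";

// Build the lookup URL for a given page, escaping it so that its own
// query string and slashes survive as a single parameter value.
function deliciousLookupUrl(pageUrl) {
  return DELICIOUS_CHECK_URL + encodeURIComponent(pageUrl);
}

// The same logic squeezed into a javascript: URL. When saved as a bookmark
// and clicked, the browser evaluates it against the page you're on.
const bookmarklet =
  "javascript:location.href='" + DELICIOUS_CHECK_URL +
  "'+encodeURIComponent(location.href)";

console.log(deliciousLookupUrl("http://example.com/a b?x=1"));
console.log(bookmarklet);
```

The encodeURIComponent call is the important bit: without it, a ? or & in the page's URL would be read as part of the lookup request instead.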

Sunday, April 13, 2008

My first Thrift app

When you find yourself working on big systems, a useful technique is to decompose them into services. Moving from a big monolithic server to a bunch of separate services can be a big challenge, but if you had foresight, many of your services were already decoupled in your system from day 1, even though you were deploying it monolithically.

A common technique for decomposing services is using RPC. At Google, we used protocol buffers, which were briefly described in the Sawzall paper.

Basically, you describe your data and the interface that processes the data in a language-independent format (a DDL, essentially) and use code generators to turn that DDL into a set of objects in your target language that can create and send those structures over the wire. This makes it easy to write servers in one language and clients in another, and the generated code deals with serialization.

I found that using a DDL to describe your code and services was really nice. When building a new service, you could simply reference your DDL in the design doc and have a meaningful discussion about the service without getting into the details of how it would be written until you had the semantics nailed down.

Facebook, as they were growing, decided to move to a homegrown binary RPC mechanism similar to protocol buffers called Thrift.

Let's say I wanted to write a simple service that would tell the client what time it was on the server. Here would be the DDL file describing both the data and the service plus a little extra to help out the generated code files.

# time.thrift
namespace java tserver.gen
namespace ruby TServer.Gen

typedef i64 Timestamp

service TimeServer {
  // Simply returns the current time.
  Timestamp time()
}

After running thrift --gen java --gen rb time.thrift on the file, I'd have an interface and server that I could implement in Java and a client that I could use in Ruby.

Based on the generated Java code, I could write a short server in Scala:


package tserver

import tserver.gen._
import com.facebook.thrift.TException
import com.facebook.thrift.TProcessor
import com.facebook.thrift.TProcessorFactory
import com.facebook.thrift.protocol.TProtocol
import com.facebook.thrift.protocol.TProtocolFactory
import com.facebook.thrift.transport.TServerTransport
import com.facebook.thrift.transport.TServerSocket
import com.facebook.thrift.transport.TTransport
import com.facebook.thrift.transport.TTransportFactory
import com.facebook.thrift.transport.TTransportException
import com.facebook.thrift.server.TServer
import com.facebook.thrift.server.TThreadPoolServer
import com.facebook.thrift.protocol.TBinaryProtocol

/**
 * TimeServer.time returns the current time according to the server.
 */
class TimeServer extends TimeServer.Iface {
  override def time: Long = {
    val now = System.currentTimeMillis
    println("somebody just asked me what time it is: " + now)
    now
  }
}

object SimpleServer extends Application {
  try {
    // Listen on port 7911, the same port the Ruby client connects to.
    val serverTransport = new TServerSocket(7911)
    // The generated Processor dispatches incoming calls to our implementation.
    val processor = new TimeServer.Processor(new TimeServer())
    val protFactory = new TBinaryProtocol.Factory(true, true)
    val server = new TThreadPoolServer(processor, serverTransport,
      protFactory)

    println("starting server")
    server.serve()
  } catch {
    case x: Exception => x.printStackTrace()
  }
}

(Geez, most of that space was taken up in my obsessive need to separate out all my imports. You can thank Google for that bit of OCD.)

The client is even shorter:


#!/usr/bin/ruby
$:.push('~/thrift/lib/rb/lib')
$:.push('../gen-rb')

require 'thrift/transport/tsocket'
require 'thrift/protocol/tbinaryprotocol'
require 'TimeServer'

transport = TBufferedTransport.new(TSocket.new("localhost", 7911))
protocol = TBinaryProtocol.new(transport)
client = TimeServer::Client.new(protocol)

transport.open()

puts "I wonder what time it is. Let's ask!"
puts client.time()

The Ruby client took about 20ms to get an answer from the Scala server.

Thrift advantages:

  • Pipelined connections mean you spend less time in connection setup/teardown, and TCP likes longer-lived connections.
  • Asynchronous requests. Asynchronous replies would be nice too but would be trickier to use.
  • Binary representation is much more efficient to transmit and process than, say, XML.
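
To put a rough number on that last point, here's a small sketch that encodes the same timestamp two ways. This is not Thrift's actual wire format or API (a real message also carries a header and field information); the function names and the XML element name are made up for illustration.

```javascript
// Encode a millisecond timestamp as a fixed 8-byte, big-endian signed integer.
function encodeTimestampBinary(ms) {
  const buf = new ArrayBuffer(8);
  new DataView(buf).setBigInt64(0, BigInt(ms)); // DataView is big-endian by default
  return new Uint8Array(buf);
}

// Read the 8 bytes back into a plain number.
function decodeTimestampBinary(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return Number(view.getBigInt64(0));
}

// The same value as a (hypothetical) XML element.
function encodeTimestampXml(ms) {
  return "<timestamp>" + ms + "</timestamp>";
}

const sampleMillis = 1208131200000;
const asBinary = encodeTimestampBinary(sampleMillis);
const asXml = encodeTimestampXml(sampleMillis);

console.log("binary bytes:", asBinary.length);
console.log("xml characters:", asXml.length);
console.log("round trip ok:", decodeTimestampBinary(asBinary) === sampleMillis);
```

The binary form is always 8 bytes and decoding it is a single fixed-width read, while the XML form has to be scanned and parsed as text before you get a number back.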

Thrift drawbacks:

  • Integrating generated source into your build system can be tricky. Typically, you rarely have to regenerate your stubs but debugging generated code can be a huge pain.
  • Its Java server should move away from ServerSocket to NIO for increased throughput. That's probably not more than a week's work as long as the existing code isn't too tightly coupled.
  • Currently it doesn't build cleanly on the Mac. I did some work and got it working but I don't think it's used extensively on the Mac so if that's your primary platform, you should be prepared to send them patches from time to time.

If you're looking to move towards decoupled services, Thrift is worth a hard look.

Here's a tarball with my time server. It contains all the generated code as well as libthrift.jar and a Makefile to run the example server.

Sunday, April 06, 2008

GVN and gold

Two things popped up on my radar recently:

gvn, Google's wrappers around Subversion to help them work in their code-review-heavy workflow. Even if you're not into code reviews, tkdiff integration is a nice improvement over colordiff or FileMerge.

gold, a new ELF linker built with giant binaries in mind. When you're building 900MB+ static binaries routinely, linking speed matters. gold claims to be at least 5x faster currently. Even if you have a massive distcc cluster, linking is still serial. One of gold's future design goals is to be concurrent and that would be pretty awesome. Imagine how fast I could link with a concurrent linker on my 8-core Mac Pro! Not that using an ELF linker under Leopard helps much since OS X uses Mach-O binaries but hey, there's always cross-compiling.

BTW, Ian Lance Taylor, the author of gold, has an excellent series of blog articles on linkers.

Friday, March 14, 2008

It's not just you

[downorjustyou] Since it's back now, check out al3x's neat new site.

Saturday, March 01, 2008

White People think Indian Food is Hot.

I just finished reading a secular biography of the Buddha recently and Stacy and I were chatting about the Buddha's magic Iddhi powers which he acquired as a highly skilled yogin. She said her first impressions of yoga came from watching Dhalsim's stretchy limbs in Street Fighter II. As a kid, she figured there must be some small kernel of truth to that. Equally stupid is that, as a kid, that's exactly where I got my first impressions of Indian food from. The manual says that Dhalsim got his "Yoga Flame" from the hot curry he ate. I couldn't imagine a food so hot that you felt like you were breathing fire. Wasn't that a bad meal? Wouldn't you ask for your money back? How could a place like that stay in business? These types of comics didn't help, either:

This clichéd comedic target was common in the magazines that adults read when I was a kid. The particular one I've always remembered had a guy running to the bathroom with flames coming out of his pants in a restaurant with a clearly Indian name. At 10, I recognized the dangers that the comic bravely warned me of. I knew that political comics often took aim at targets that the media didn't dare attack and figured this must be the same.

The first time somebody actually offered me Indian food, I was very nervous. It had a dangerously bright orange color but smelled and tasted great. Thankfully fire did not come out of my ass and set my pants on fire.

Sunday, February 17, 2008

JavaScript undefined vs. null

I was reading a modern, popular book on JavaScript last night and was disappointed by the handling of null. The author started out doing a lot of checking like:

if (foo == null) {
   alert('foo is not set.');
}

Then the author told the reader that they could just remove the == null because JavaScript knows you mean "== null".

What?! This isn't why you don't check for equality with null. It's because foo == null doesn't even remotely do what most people think it does in this context.

It's a commonly held belief that uninitialized properties in JavaScript are set to null as default values. People believe this mostly for 2 reasons: 1) foo == null returns true if foo is undefined and 2) authors don't teach JavaScript properly.

A property, when it has no definition, is undefined. Put that way, it's pretty obvious.

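typeof null reports object. That's a long-standing quirk of the language, since null is really a primitive value with a type of its own. undefined is not an object either; its type is undefined. That part is less obvious.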

The real trouble is that == does type coercion. === checks for both type and value and is the most intuitive form of equality in JavaScript, in my opinion.

I fired up a Jash console to hopefully clear things up for you.

>> window.hello
null
>> window.hello.something
window.hello has no properties
>> window.hello == null
true
>> window.hello === null
false
>> window.hello === undefined
true
>> if (window.hello) { alert('truthy'); } else { alert('falsy'); } // will print falsy.
null
>> window.hello == undefined
true
>> null == undefined
true // there's the rub, sir.
>> null
null
>> undefined
null
>> typeof null
object
>> typeof undefined
undefined

So people write

if (foo == null) {
   foo = "Joe";
}
When what they really mean is
if (!foo) {
   foo = "Joe";
}
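
To see how the three styles of check differ, here's a small sketch that runs each one over a handful of values. The helper names are made up for illustration. One caveat worth knowing: !foo also treats 0, the empty string, and false as "not set", which may or may not be what you want.

```javascript
// Three ways people test whether a value is "set".
function looseNullCheck(v) { return v == null; }            // true for null AND undefined
function falsyCheck(v) { return !v; }                       // true for anything falsy
function strictUndefinedCheck(v) { return v === undefined; } // true only for undefined

// Make each sample readable when printed (so "" doesn't vanish).
function label(v) {
  return typeof v === "string" ? JSON.stringify(v) : String(v);
}

const samples = [undefined, null, 0, "", false, "Joe"];

for (const v of samples) {
  console.log(
    label(v),
    "| == null:", looseNullCheck(v),
    "| !v:", falsyCheck(v),
    "| === undefined:", strictUndefinedCheck(v)
  );
}
```

If you only care about "was this ever given a value", the strict comparison says exactly that. If you want a default for anything empty-ish, !foo is the shortest way to write it.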
If you find yourself with a lot of null checks in your JavaScript, set aside some time and watch Douglas Crockford's "The JavaScript Programming Language" talk on Yahoo Video. It's part 1 of a 3-part series of excellent and enlightening talks.

Sunday, February 10, 2008

Ubuntu on the Mac Pro

Last year I bought a Mac Pro to replace my aging Linux PC. It was a pretty significant upgrade, from a HyperThreaded 2.5GHz Pentium 4 to a dual-processor, quad-core 3GHz Xeon with 8G of RAM. Leopard is nice but I wanted to try and get Ubuntu running on it.

Here's the setup. I have 4 drives:

  • 750G - Leopard
  • 200G - Windows
  • 750G - Backups
  • 750G - To be Ubuntu
and wanted to install Ubuntu on the 4th drive.

Here are the basic steps I followed:

  1. Install rEFIt
  2. Install from LiveCD (7.10)
  3. Select Manual Partition and from the advanced options, choose hd3,0. I set up my disk with one ext3 partition and one swap partition.
  4. After install, boot the Linux disk from the rEFIt menu
  5. Grub will get to stage 1.5. You will see the menu but be unable to start the system; it will complain about an unknown partition type
  6. Hit 'c' to drop to the grub command line

Here's a transcript of my tinkering with grub to find the right drive with my initrd and kernel on it so I could boot my fancy Apple-branded Linux machine.

grub> root (hd<TAB>
This showed a bunch of drives with various partitions but, most importantly, showed that hd0,0 had the Linux partition I was looking for. Even though we told the Ubuntu installer to use hd3, here we use hd0: the boot manager has shuffled the order of the drives. Telling the Ubuntu installer to use hd0,0 will only result in an unbootable disk.
grub> root (hd0,0)
grub> kernel /boot/vmlinuz<TAB>
This will show you some vmlinuz files, typically just the one Ubuntu installed. Pick it.
grub> kernel /boot/vmlinuz-2.6.22-14-generic root=/dev/sdd1 ro
grub> initrd /boot/initrd.img-2.6.22-14-generic
grub> boot

Now your system will boot. Edit your /boot/grub/menu.lst so that root points at (hd0,0).

Now reboot and you will be in Ubuntu. Your wifi adapter won't work. Follow these steps to enable ndiswrapper to use the Windows drivers.

I used a USB thumb drive to move the Windows drivers zip file between my laptop and desktop.

Once you have networking up, don't forget to enable ssh so you can log in in case you screw up your X config.

Edit /etc/apt/sources.list to include what you might want.

I have an ATI Radeon X1950 Pro video card. To get it working, I simply installed Envy which downloaded the right drivers and installed them for me.

But it didn't span monitors; they were just clones. So I ran aticonfig (which Envy installed for me):

aticonfig --initial=dual-head --overlay-on=1

And we were good to go.

Banshee imported all my music from my iPod after converting its database from an iTunes format it didn't recognize.

Using hibernate left the Mac's wifi adapter in a weird state. Rebooting it didn't help but booting into OS X and back into Linux corrected the wifi card.

The fan speed still isn't quite right. Various fans will become louder and softer at odd intervals. I will try monkeying with lm-sensors later.

I have nice smooth fonts by running Emacs 23 from Alexandre Vassalotti's repo and the Inconsolata font for programming.

Overall, it's a great Linux machine and maybe one day it won't be so much trouble to get running. Hopefully my instructions will help you get it up and running.

Monday, November 19, 2007

Lift Tutorial

I've been working on lift quite a bit lately, fitting it in with some other projects and my obsessive need for football. (Both NFL and Madden)

If you're interested in trying it out, give the lift tutorial I wrote a try. Please drop us a note if you have any problems.

Sunday, October 21, 2007

Tomato Wifi!

While doing that podcast with Daniel, I noticed I was seeing 30% or so packet loss on my wifi network. After rebooting some equipment and keeping an eye on things, I decided to scrap my faulty AirPort Express base stations and replace them with Linksys WRT54GLs running a custom open source firmware called Tomato.

While the instructions are good, they assume you already have the router up and running. The first thing I realized was to throw the Windows install CD away and just plug my Mac into the switch on the back. The router will be listening on 192.168.1.1:80 with a username and password of admin/admin. The instructions will carry you from that point.

I have a 3-story house so I bought two routers and bridged them together with WDS. My only complaint is that Tomato's WDS is difficult to set up. The AirPort Express admin utility did a wonderful job of easily bridging the two AirPort Express base stations together.

Except for one of the routers dying after only a few days, which Amazon was nice enough to replace for free with next-day shipping, things have been great on Tomato. 4 bars everywhere I go in the house.

[Hat tip to Nelson for the recommendation.]

Saturday, October 06, 2007

My interview with Daniel Brusilovsky

Daniel runs the Apple Universe podcast and interviewed me today. We had a fun chat, check it out!

Friday, September 28, 2007

What's next?

Werner Vogels on Michael Stonebraker's 50X challenge:

I like this challenge, given that 50X is likely to be able to make impact, where 2-4X in general can be easily compensated for by the next generation hardware. But something bugs me about the challenge and also about some of the demonstrations in the papers; 50X is still focused on scaling-up, just as many of the current database systems do, instead of scaling out, which is what the world really needs. The evidence in the paper is indeed about single box performance. This continuing N=1 thinking will never yield systems that can break through the current scalability limitations of enterprise software, regardless whether it runs 50 times faster or not.

I couldn't agree more. It's exciting to see so many people arguing for new kinds of databases, especially distributed databases. Why do I hate on relational databases? It's not the relational part, it's the part where they fail at 3am and wake me up. Joins are nice but I need my beauty sleep.

Friday, September 21, 2007

43 Folders makes the switch...

that I expect a number of others to follow. They just switched from a blogging system to a content management system. Lots of sites that have daily articles aren't blogs, but until recently it was easier to have separate blogs and auxiliary sites than to set up a full-blown CMS. These sites are easy to distinguish: they'll have forums, wikis, or other miscellany that aren't simply chronologically ordered notes of interest. For instance, 43 Folders has a Forum, Job Board, and sub-sites like gtd and Inbox Zero.

I always assumed CMSs would get easier to set up and use and some of the more popular sites would switch to them. I fully expect this to become a growing trend and am glad to see traditional blogging tools start to get some competition for serious users.

Update: Earlier, I mentioned in this post that Matt pointed to a site that switched from WordPress to a CMS. I misread it. It was the other way around. Thanks, Jeremy. I hate it when you're right.