Thursday, December 24, 2009

JetBrains TeamCity

JetBrains have released the latest version of their TeamCity build-and-test automation software.

This is starting to become a crowded field, with open-source offerings such as Hudson, BuildBot, Continuum, CruiseControl, etc., and commercial systems such as Atlassian's Bamboo. TeamCity has been around for a while, and their offering is both low cost and powerful, and the JetBrains team have a reputation as providers of solid tools.

Among the things I like about the latest TeamCity software are the integration with Amazon's EC2 cloud computing infrastructure, since it makes no sense to build your own machine farm in this day and age, and the Java API for extensibility, since no system like this is ever deployed without a certain amount of customization.

Among the things I don't like about TeamCity are their lack of built-in Git support, and their "pre-tested commit" feature.

The whole pre-tested commit thing is really a question of philosophy. It's extremely seductive, and I can completely understand why people think they want it. But you don't want it, really. Laura Wingerd does a great job of explaining why in her book on Perforce, which I don't happen to have handy or I'd give you the quote verbatim.

But the nub of it is that you want to remove barriers for commit, not add them. Your overall philosophy should be:

  • Make building and testing easy

  • Make commit easy and encourage it

  • Depend on your version control system and your build tracking system to help you monitor and understand test failures and regressions, not to try to prevent them.


There are enough barriers to commit as it is; any continuous integration system should have as its first and most important requirement: make it easy to Check In Early, Check In Often.

At work, we're still fairly happy with our home-grown continuous integration system (the Build Farm), but I can see the writing on the wall: between turnkey continuous integration systems, and cloud computing, the build-and-test experience of the typical commercial software product development staff in 2010 is going to be far different than it was 5-10 years ago.

Coders at Work: Donald Knuth

At last I've reached the end of Peter Seibel's Coders at Work; chapter 15 contains his interview with Donald Knuth.

Knuth has always been more of an educator and an author than a coder, but he has some serious coding chops as well, having designed and implemented TeX and Metafont as well as inventing the concept of literate programming in order to do so. Of course, he's best known for The Art of Computer Programming, but though I have those books on my shelf, it's interesting to me that my favorite Knuth book is actually Concrete Mathematics.

Seibel does his best to stick to his standard formula when interviewing Knuth: When did you learn to program, what was the hardest bug you ever fixed, how do you go about finding good programmers, how do you read code, etc. But I also enjoyed the parts of the interview where they stray from those topics into other areas.

For example, Knuth defends Literate Programming against one of its most common criticisms, that it's too wordy and ends up being redundant and repetitive:

The first rule of writing is to understand your audience -- the better you know your reader the better you can write. The second rule, for technical writing, is to say everything twice in complementary ways so that the person who's reading it has a chance to put the ideas into his or her brain in ways that reinforce each other.

So in technical writing usually there's redundancy. Things are said both formally and informally. Or you give a definition and then you say, "Therefore, such and such is true," which you can only understand if you've understood the definition.
...
So literate programming is based on this idea that the best way to communicate is to say things both informally and formally that are related.

I enjoy reading literate programming; I enjoy writing programs and their associated specifications/documentation/comments in as literate a fashion as I can accomplish. I think that Knuth's defense of literate programming holds water.

Another part of the interview that I found fascinating was this spirited attack on reusability, whether it comes from reusable subroutine libraries, object-oriented frameworks, or whatever:

People have this strange idea that we want to write our programs as worlds unto themselves so that everybody else can just set up a few parameters and our program will do it for them. So there'll be a few programmers in the world who write the libraries, and then there are people who write the user manuals for these libraries, and then there are people who apply these libraries and that's it.

The problem is that coding isn't fun if all you can do is call things out of a library, if you can't write the library yourself. If the job of coding is just to be finding the right combination of parameters, that does fairly obvious things, then who'd want to go into that as a career?

There's this overemphasis on reusable software where you never get to open up the box and see what's inside the box. It's nice to have these black boxes but, almost always, if you can look inside the box you can improve it and make it work better once you know what's inside the box. Instead people make these closed wrappers around everything and present the closure to the programmers of the world, and the programmers of the world aren't allowed to diddle with that. All they're able to do is assemble the parts.

I think this is Knuth-the-educator speaking. He doesn't want to see Computer Science degenerate into some sort of clerical and monotonous assembly task; he wants each successive generation of programmers to be standing on the shoulders of the ones before them, understanding what they did and why, and inventing the next version of programs.

Knuth returns to this topic later in the interview; it's clearly of tremendous importance to him:

[T]here's the change that I'm really worried about: that the way a lot of programming goes today isn't any fun because it's just plugging in magic incantations -- combine somebody else's software and start it up. It doesn't have much creativity. I'm worried that it's becoming too boring because you don't have a chance to do anything much new. Your kick comes out of seeing fun results coming out of the machine, but not the kind of kick that I always got by creating something new.


As an educator, Knuth realizes that this is an extremely challenging task, because students of computer science need to start at the beginning and learn the basics, rather than just assuming the presence of vast libraries of existing code and going from there:

[M]y take on it is this: take a scientist in any field. The scientist gets older and says, "Oh, yes, some of the things that I've been doing have a really great payoff and other things, I'm not using anymore. I'm not going to have my students waste time on the stuff that doesn't make giant steps. I'm not going to talk about low-level stuff at all. These theoretical concepts are really so powerful -- that's the whole story. Forget about how I got to this point."

I think that's a fundamental error made by scientists in every field. They don't realize that when you're learning something you've got to see something at all levels. You've got to see the floor before you build the ceiling. That all goes into the brain and gets shoved down to the point where the older people forget that they needed it.


As I've said many times, I think that there is great potential for Open Source in education, for it provides a large body of existing software that is available for study, critique, and improvement.

As I've come to the end of the book, I can't close without including the most startling paragraph in the entire book, the one which must have made Seibel's jaw, and the jaw of every reader, drop to the ground with a thundering "thwack", as Knuth singles out for celebration and praise the single most abhorred and condemned feature that Computer Science has produced in its first half-century of existence:

To me one of the most important revolutions in programming languages was the use of pointers in the C language. When you have nontrivial data structures, you often need one part of the structure to point to another part, and people played around with different ways to put that into a higher-level language. Tony Hoare, for example, had a pretty nice clean system but the thing that the C language added -- which at first I thought was a big mistake and then it turned out I loved it -- was that when x is a pointer and then you say, x + 1, that doesn't mean one more byte after x but it means one more node after x, depending on what x points to: if it points to a big node, x + 1 jumps by a large amount; if x points to a small thing, x + 1 just moves a little. That, to me, is one of the most amazing improvements in notation.

And with that, Knuth joins Joel Spolsky and doubles the number of people on the planet who celebrate the C pointer feature.

I really enjoyed Coders at Work, as you can tell by the depth to which I worked through it. In the end, it probably wasn't worth this much time, but I certainly found lots of food for thought in every chapter. If you're at all interested in coding, and in the people who do and enjoy it, you'll probably find this book interesting, too.

Tuesday, December 22, 2009

Language subsetting

On the Stack Overflow podcast recently, Joel and Jeff were discussing when and why programmers intentionally restrict themselves to using only a subset of the functionality available to them in their programming language.

At first it seems like an odd behavior: if you have features available in your programming language, why would you not use them?

I think there are (at least) 4 reasons: 3 valid reasons and 1 bad reason. Here's my list:


  • Complexity. C++ is a perfect example of this. C++ is such an enormous language, with so many features and possibilities and variations on ways of getting things done, that you'll end up creating incomprehensible, illegible, unmaintainable programs. So pretty much every C++ organization I've ever worked with has decided to restrict itself to some simpler subset of the language's features.

  • Vendor portability. SQL is a perfect example of this. There are many different implementations of SQL: Oracle, DB2, SQL Server, Sybase, MySQL, Postgres, Ingres, Derby, etc., and each of them has implemented a different subset of SQL's features, often with slightly different semantics. If you are writing a database-related application, you often find yourself wanting to be careful about particular database behaviors that you are using, so that you can "port" your application from one database to another without too many problems.

  • Version compatibility. Java is a perfect example of this. Over the years, there have been multiple versions of Java, and later releases have introduced many new features. But if you write an application against a new release of Java, using the new release's features, your application probably won't run in an older release of Java. So if you are hoping for your application to be widely used, you are reluctant to use those latest features until they have found widespread deployment in the Java community. Currently, it's my sense of things that JDK 1.4 is still widely used, although most Java environments are now moving to JDK 1.5. JDK 1.6 is out, but it's still somewhat surprising when you encounter a major Java deployment environment (application server, etc.) which has already moved to the JDK 1.6 level of support. So most large Java applications are only now moving from JDK 1.4 to JDK 1.5 as their base level of support. The current version of Ant, for example, still supports JDK 1.2! (A small sketch of the kind of defensive, version-guarded coding this constraint leads to appears after this list.)

  • Unfamiliarity. This is the bad reason for restricting yourself to a subset of your programming language's capabilities. Modern programming languages have an astounding number of features, and it can take a while to learn all these different features, and how to use them effectively. So many programmers, perhaps unconsciously, find themselves reluctant to use certain features: "yeah, I saw that in the manual, but I didn't understand what it was or how to use it, so I'm not using that feature". This is a shame: you owe it to yourself, each time you encounter such a situation, to seize the opportunity to learn something new, and stop and take the time to figure out what this feature is and how it works.
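
To make the version-compatibility point concrete, the usual defensive move when you can't assume a newer JDK is to probe for the newer feature at runtime and fall back gracefully. Here's a minimal sketch of that idea (the class name is mine, and the choice of File.getFreeSpace(), a JDK 1.6 addition, is just for illustration):

import java.io.File;
import java.lang.reflect.Method;

public class FreeSpaceProbe {
    // Returns the free space on the partition containing the given file,
    // or -1 if the running JDK is too old to tell us (File.getFreeSpace()
    // was only added in JDK 1.6).
    public static long freeSpace(File f) {
        try {
            // Look the method up reflectively, so this class still loads
            // and runs on JDK 1.4 and 1.5, where the method doesn't exist.
            Method m = File.class.getMethod("getFreeSpace", new Class[0]);
            Long result = (Long) m.invoke(f, new Object[0]);
            return result.longValue();
        } catch (NoSuchMethodException e) {
            return -1L;   // older JDK: the feature simply isn't there
        } catch (Exception e) {
            return -1L;   // reflection failed for some other reason
        }
    }

    public static void main(String[] args) {
        System.out.println("Free space: " + freeSpace(new File(".")));
    }
}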



So, anyway, there you go, Jeff and Joel: ask a question (on your podcast) and people will give you their answers!

Friday, December 18, 2009

p4 shelve

Perforce version 2009.2 is now out in beta, and it contains a very interesting new feature: shelving.

Laura Wingerd gave a fairly high-level introduction to shelving in her blog post:

You can cache your modified files on the server, without having to check them in as a versioned change. For teams, shelving makes code reviews and code handoffs possible. Individuals can use shelving for multi-tasking, code shunting, incremental snapshots, and safety nets.


The new commands are p4 shelve and p4 unshelve, and the blog post explains a bit of the workflow involved in using these commands to accomplish the various new scenarios.

I think it's going to take a bit of time to become comfortable with the new commands and how to use them, but I'm looking forward to getting this feature installed and available so I can start to learn more about it!

Scam victims and software security

When both DailyDave and Bruce Schneier point to a paper, you can bet it's going to be very interesting. So if you are at all interested in software security, run don't walk to this paper by Stajano and Wilson: Understanding scam victims: seven principles for systems security.

The seven principles are psychological aspects of human behavior which provide vulnerabilities that scammers and other bad guys exploit:

  • Distraction: While you are distracted by what retains your interest, hustlers can do anything to you and you won't notice.

  • Social Compliance: Society trains people not to question authority. Hustlers exploit this "suspension of suspiciousness" to make you do what they want.

  • Herd: Even suspicious marks will let their guard down when everyone next to them appears to share the same risks. Safety in numbers? Not if they're all conspiring against you.

  • Dishonesty: Anything illegal that you do will be used against you by the fraudster, making it harder for you to seek help once you realize you've been had.

  • Deception: Things and people are not what they seem. Hustlers know how to manipulate you to make you believe that they are.

  • Need and Greed: Your needs and desires make you vulnerable. Once hustlers know what you really want, they can easily manipulate you.

  • Time: When you are under time pressure to make an important choice, you use a different decision strategy. Hustlers steer you towards a strategy involving less reasoning.



It would be great if the BBC would release episodes of The Real Hustle (the TV show from which the paper's scam scenarios are drawn) on DVD; I'd really enjoy watching them, I think.

Wednesday, December 16, 2009

Coders at Work: Fran Allen, Bernie Cosell

I'm almost done with Peter Seibel's fascinating Coders at Work. Chapters 13 and 14 contain his interviews with Frances Allen and Bernie Cosell.

I didn't know much about Fran Allen, although I've certainly benefited from her work, as has anyone who has ever programmed a computer using a language other than assembler. The interview discusses much of her early work on developing the theory and practice of compilers, and, in particular, of compiler optimization. The goal of her work is simple to state:

The user writes a sequential code in the language in a way that's natural for the application and then have the compiler do the optimization and mapping it to the machine and taking advantage of concurrency.


Allen's recollections were interesting because they go a long ways back:

I'm not sure if this is accurate, but I used to believe that the first early work of symbolics for names of variables came from a man named Nat Rochester, on a very early IBM machine, the 701 around 1951. He was in charge of testing it and they wrote programs to test the machine. In the process of doing that, they introduced symbolic variables. Now, I've seen some other things since that make me believe that there were earlier ways of representing information symbolically. It emerged in the early 50's, I think, or maybe even in the 40's. One would have to go back and see exactly how things were expressed in the ENIAC, for one thing.


Anyone who can speak about their work in the days of the 701 and the ENIAC certainly is worth listening to!

The interview with Bernie Cosell is interesting because he's a networking guy, and I've always been fascinated with networking software and how it is developed. Cosell gets credit, along with Dave Walden and Will Crowther, for the early programming of the IMP, the Interface Message Processor, that was the key mechanism in the building of the first computer networks.

As Cosell tells the story, his arrival in the networking field was somewhat accidental, for he started doing other work:

BBN was working on a project with Massachusetts General Hospital to experiment with automating hospitals and I got brought onto that project. I started out as an application programmer because that was all I was good for. I think I spent about three weeks as an application programmer. I quickly became a systems programmer, working on the libraries that they were using.
...
When my projects ran out, Frank [Heart] would figure out what I should work on next. ... Somehow, Frank had decided that I was to be the third guy on the IMP project.


Cosell's interview contains a number of great passages. I liked this description of a debugging session that he remembered "fondly":

... thousands and thousands of machine cycles later, the program crashed because some data structure was corrupt. But it turns out the data structure was used all the time, so we couldn't put in code that says, "Stop when it changes." So I thought about it for a while and eventually I put in this two- or three-stage patch that when this first thing happened, it enabled another patch that went through a different part of the code. When that happened, it enabled another patch to put in another thing. And then when it noticed something bad happening, it froze the system. I managed to figure out how to delay it until the right time by doing a dynamic patching hack where one path through the code was patched dynamically to another piece of the code.

Nowadays, we programmers are spoiled with our powerful high-level programming languages. With Java's various features, such as the absence of a memory pointer type, bounds-checked arrays, immutable strings, automatic memory management, and so forth, we rarely experience such debugging scenarios. But Cosell's recollection brought back a fair number of memories from my own early days in programming, and it was certainly entertaining to read.

I also thought Cosell's description of the role of the design review was very good, and I wish more people had had his experience in order to be able to comprehend the value of that process:

Another thing that Frank did, on other projects, was design reviews. He had the most scary design reviews and I actually carried that idea forward. People would quake in their boots at his design reviews.
...
The parts that you did absolutely fine hardly got a mention. We all said, "Oh." But the part that you were most uncomfortable with, we would focus in on. I know some people were terrified of it. The trouble is if you were an insecure programmer you assumed that this was an attack and that you have now been shown up as being incompetent, and life sucks for you.

The reality -- I got to be on the good side of the table occasionally -- was it wasn't. The design review was to help you get your program right. There's nothing we can do to help you for the parts that you got right and now what you've got is four of the brightest people at BBN helping you fix this part that you hadn't thought through. Tell us why you didn't think it through. Tell us what you were thinking. What did you get wrong? We have 15 minutes and we can help you.

That takes enough confidence in your skill as an engineer, to say, "Well that's wonderful. Here's my problem. I couldn't figure out how to do this and I was hoping you guys wouldn't notice so you'd give me an OK on the design review." The implicit answer was, "Of course you're going to get an OK on the design review because it looks OK. Let's fix that problem while we've got all the good guys here so you don't flounder with it for another week or two."


This is a wonderful description of a process which, if handled correctly, can be incredibly effective. I've personally seen it work that way, from both the giving and receiving end. I've also seen it, far too often, fail, to the extent that it seems that more and more often people don't even attempt design reviews anymore.

It's a shame that people haven't learned how to do a design review effectively, and it's a shame that software managers rarely seem to understand what a powerful tool it is that they aren't using. Perhaps more people will read the Cosell interview and will realize that they have a powerful process improvement available to them.

Fran Allen is the only female programmer interviewed in the book, which is a shame. It would be nice to have had more. Women have been involved in computing since the beginning (think Ada Lovelace, Grace Hopper, etc.). How about, say, Val Henson, or Pat Selinger, or Barbara Liskov?

One last chapter to go...

Tuesday, December 15, 2009

git-add --patch

I just recently found Ryan Tomayko's essay about using git-add --patch.

I love the essay; it's quite well written.

He makes two interesting and inter-related points in the essay:

  • git-add --patch allows you to solve a problem in a (fairly) easy way which is extremely hard to solve using other source code control tools and methodologies.

  • The flexibility and power of Git is integral to its philosophy, and you won't understand Git until you understand this philosophy.



From the essay:

The thing about Git is that it's oddly liberal with how and when you use it. Version control systems have traditionally required a lot of up-front planning followed by constant interaction to get changes to the right place at the right time and in the right order.
...
Git is quite different in this regard.
...
When I'm coding, I'm coding. Period. Version control -- out of my head. When I feel the need to organize code into logical pieces and write about it, I switch into version control mode and go at it.


It's a very interesting essay, covering both the concrete parts about how to use the low-level Git tools to accomplish some very specific tasks, and the more abstract sections about why the author feels that this tool supports his own personal software development process more effectively.

I think that there isn't enough discussion about the role of tools in the development process, and about how tools influence and guide the success or failure of a particular process. One of my favorite articles in this respect is Martin Fowler's essay on Continuous Integration. I'm pleased whenever I find articles discussing the interaction of tools and process, since more discussion can only help improve the situation with respect to software development processes and tools.

Monday, December 14, 2009

HTML 5 and multiple file uploads

Firefox 3.6 now supports multiple file input, as in:

<input type="file" multiple=""/>


This is a great bit of functionality, and pretty much closes out a 10-year-old feature request for Firefox/Mozilla.

I know that for many years, the common solution to this was "use Flash". It's nice to see that features like this are making their way into the browser base.

Phenomenally complex security exploits

Secure software techniques have come a long way in the past decade, but it's important to understand that attacks against secure software have come a long way, as well. This wonderful essay by the ISS X-Force team at IBM gives some of the details behind the current state of the art of software vulnerability exploitation. In order to exploit the actual bug they had to work through multiple other steps first, including a technique they call "heap normalization", which involved inventing a pattern of leaking memory, then freeing the leaked memory, then leaking more, etc., in order to arrange the memory contents "just so".

Here's the conclusion; the whole paper is fun to read:

Although the time it took us to reach reliable exploitation neared several weeks it was worth the effort to prove a few things. Some people would have called this vulnerability "un-exploitable", which is obviously not the case. While others would have claimed remote code execution without actually showing it was possible. X-Force always demands that a working exploit be written for a code execution bug. This way, we never have to use the term "potential code execution". Finally we had to prove that the heap cache exploitation techniques were not just parlor tricks designed for a BlackHat talk, but a real-world technique that could leverage near impossible exploitation scenarios into internet victories.

Thursday, December 10, 2009

High end storage systems

For most of us, our principal exposure to computer storage systems involves the sorts of things we find on our personal computers:

  • Winchester-technology hard drives

  • USB-attached flash memory systems

  • writable optical (DVD/CD) media



But at the high end, where price is (mostly) no object, the state of the art in computer storage systems is advancing rapidly.

If you haven't been paying much attention to this area of computer technology, you owe it to yourself to have a look through the three-part series from Chuck Hollis at EMC.



A little sample to whet your appetite:

In this picture, we've got a pool of VMs, a pool of servers, a pool of paths, and a pool of different storage media types.

This sort of picture wants to be managed very differently than traditional server/fabric/storage approaches. It wants you to set policy, stand back -- and simply add more resources if and when they're needed.
...
In just a few short years, virtualization concepts have changed forever how we think about server environments: how we build them, and how we run them.


Earlier this fall I was talking with my friend Gil about a major e-commerce roll-out he's managing for a high-end retail enterprise, and he confirmed the impact that virtualization technologies are having on enterprise computing. There is lots to learn, and lots of new opportunities opening up.

Wednesday, December 9, 2009

Coders at Work: Ken Thompson

Chapter 12 of Coders at Work contains Peter Seibel's interview with Ken Thompson.

In almost any book involving computer programmers, Ken Thompson would be the most famous, impressive, and respected name in the book. He's most famous for his work on Unix, but he's also quite well known for Plan 9, and for Multics, and for B, and for Belle, and for ed, and so on and so on. Many people feel that he gave the best Turing Award Lecture ever.

Here, however, we still have Chapter 15 to look forward to.

Still, the Thompson interview does not disappoint. Fairly early into the interview, it's obvious that we're listening to somebody who has programming in their blood:

I knew, in a deep sense, every line of code I ever wrote. I'd write a program during the day, and at night I'd sit there and walk through it line by line and find bugs. I'd go back the next day and, sure enough, it would be wrong.

If you've ever fixed a bug in the shower, or while riding your bike, or while playing basketball, you know instantly what Thompson means.

This ability to manipulate symbols in your mind is crucial to successful programming, and Thompson describes it well:

I can visualize the structure of programs and how things are efficient or inefficient based on those op codes, by seeing the bottom and imagining the hierarchy. And I can see the same thing with programs. If someone shows me library routines or basic bottom-level things, I can see how you can build that into different programs and what's missing -- the kinds of programs that would still be hard to write.

This ability to conceptualize a synthesis of various component parts into an integrated whole is the essence of what many people call "systems thinking" or "system design", or "software architecture". It's an extremely impressive skill when you see it done well.

Later in the interview Thompson tells a story that I, at least, had not heard before, about the very early design of Unix, and how he wasn't even intending to write an operating system:

A group of us sat down and talked about a file system.
...
So I went off and implemented this file system, strictly on a PDP-7. At some point I decided that I had to test it. So I wrote some load-generating stuff. But I was having trouble writing programs to drive the file system. You want something interactive.

Seibel: And you just wanted to play around with writing a file system? At that point you weren't planning to write an OS?

Thompson: No, it was just a file system.

Seibel: So you basically wrote an OS so you'd have a better environment to test your file system.

Thompson: Yes. Halfway through that I realized it was a real time-sharing system.


I'm not really sure if I believe this story, but it was entertaining to read it.

Thompson describes his philosophy of identifying talented programmers:

It's just enthusiasm. You ask them what's the most interesting program they worked on. And then you get them to describe it and its algorithms and what's going on. If they can't withstand my questioning on their program, then they're not good. If I can attack them or find problems with their algorithms and their solutions and they can't defend it, being much more personally involved than I am, then no.
...
That's how I interview. I've been told that it's devastating to be on the receiving side of that.

I bet it is devastating!

I've been on both sides of such interviews. It's exhausting, although strangely enjoyable, to be on the receiving end of such questioning; it's exhausting, and often emotionally draining, to be the questioner. But it does seem to be a technique that works. Programming-in-the-large is all about finding groups of capable people who can communicate effectively about extraordinarily abstract topics, and I don't know any better way to do this than the one that Thompson describes.

It seems like Thompson has been drifting of late. He describes his work at Google:

Probably my job description -- whether I follow it or not, that's a different question -- would be just to find something to make life better. Or have some new idea of new stuff that replaces old stuff. Try to make it better. Whatever it is that's wrong, that takes time, that causes bugs.

In his defense, it can't be easy to join an organization such as Google and figure out how to effectively contribute.

A very interesting part of the Thompson interview comes near the end, when Seibel gets Thompson to talk about C++. After getting Thompson to describe the delicate working relationship he had with Stroustrup, Seibel gets Thompson to open up about the language itself:

It certainly has its good points. But by and large I think it's a bad language. It does a lot of things half well and it's just a garbage heap of ideas that are mutually exclusive. Everybody I know, whether it's personal or corporate, selects a subset and these subsets are different. So it's not a good language to transport an algorithm -- to say, "I wrote it; here, take it." It's way too big, way too complex. And it's obviously built by committee.

Thompson certainly lets us know how he feels!

But I spent nearly a decade working in C++, across several different companies and in several different domains, and the "subset" behavior that he describes is exactly true. No two uses of C++ ever seem to be the same, and it makes it strangely hard to move from one body of C++ to another body of C++ because even though they are all, at some level, written in the same language, any two libraries of C++ code always seem to be written in two different languages.

I wished that the Thompson interview went on and on, and I'm glad that Seibel was able to include him in the book.

IntelliJ IDEA version 9 is out of beta

JetBrains have released version 9 of their IntelliJ IDEA development environment.

I've been using the version 9 beta and I like it very much. It does take a long time to start up, but once it has started up it is very powerful and responsive.

Perhaps I have set up my project incorrectly, as it seems like IDEA wants to completely re-scan my rt.jar each time it starts up and opens the project. Is there something that I have mis-configured?

Time to go figure out how to upgrade my Beta version to the released version!

Monday, December 7, 2009

Coders at Work: Dan Ingalls and Peter Deutsch

Chapters 10 and 11 of Coders at Work contain Seibel's interviews with Dan Ingalls and Peter Deutsch, and represent Seibel's inclusion of the Smalltalk world.

Ingalls and Deutsch were both part of Alan Kay's historic team at Xerox PARC, and much of the two interviews contains discussions of the work that went on at that time, its goals and techniques and observations. Ingalls recalls the focus on educational software:

It was envisioned as a language for kids -- children of all ages -- to use Alan's original phrase. I think one of the things that helped that whole project, and it was a long-term project, was that it wasn't like we were out to do the world's best programming environment. We were out to build educational software, so a lot of the charter was more in the space of simplicity and modeling the real world.

One of the implementation techniques that Ingalls particularly remembers, in stark contrast to the multi-hour batch cycles that were then common, was the sense of immediacy you got for making a change and almost instantly being able to observe the effect of the change:

For instance, our turnaround time for making changes in the system from the beginning was seconds, and subsecond fairly soon. It was just totally alive. And that's something which I, and a bunch of the other people, got a passion for. Let's build a system that's living like that. That's what Smalltalk became.

This goal of providing rapid feedback, interactive development, and immediate reflection of changes is indeed powerful. It led to the development of modern IDE software (Eclipse was built by the OTI team, who wanted to expand on their work building environments for Smalltalk systems), as well as to many software methodologies that favor iteration and short cycles, so that feedback and learning can occur. As the Agile Manifesto proposes:

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.


My favorite part of the Ingalls interview was his description of the sort of person who makes the best programmer:

If you grow up in a family where when the cupboard door doesn't close right, somebody opens it up and looks at the hinge and sees that a screw is loose and therefore it's hanging this way vs. if they say, "Oh, the door doesn't work right; call somebody" -- there's a difference there. To me you don't need any involvement with computers to have that experience of what you see isn't right, what do you do? Inquire. Look. And then if you see the problem, how do you fix it?


In addition to this almost feverish curiosity, another major aspect of great programmers is their ability to think about problems abstractly. Deutsch recalls:

I've always been really comfortable in what I would call the symbolic world, the world of symbols. Symbols and their patterns have always just been the stuff I eat for lunch. And for a lot of people that's not the case.
...
The people who should be programming are the people who feel comfortable in the world of symbols. If you don't feel really pretty comfortable swimming around in that world, maybe programming isn't what you should be doing.


I thought it was very interesting to include Deutsch, as he's a notable person who was a Famous Computer Programmer for decades, then just one day got on his horse and rode out of town:

And I had this little epiphany that the reason that I was having trouble finding another software project to get excited about was not that I was having trouble finding a project. It was that I wasn't excited about software anymore. As crazy as it may seem now, a lot of my motivation for going into software in the first place was that I thought you could actually make the world a better place by doing it. I don't believe that anymore. Not really. Not in the same way.

Can software make the world a better place? I think it can, and after 30 years I'm not done trying. But I also believe you should follow your desires, and I'm pleased that Deutsch was able to recognize that his passion had moved elsewhere.

Thursday, December 3, 2009

The ongoing evolution of Java

Java continues to evolve. That's a good thing, of course, but it can be a little bit overwhelming trying to keep up, even in the focused areas of concern to me.

A brief set of capsule reports from my small area of Java-related interests:

  • JUnit has released version 4.8. This version adds the new annotations Category, IncludeCategory, and ExcludeCategory, as well as the Categories test runner. (A minimal sketch of how these annotations fit together appears after this list.) Meanwhile, I'm still stuck on version 3 of JUnit, as are many codebases that I know about. Is there an automated assistant which helps with the transition from JUnit 3.X to JUnit 4.X? How do others deal with this migration?

  • Mark Reinhold announced that JDK 1.7 has slipped to September 2010. However, as part of this slippage, there is also increasing anticipation that some form of closures will make the 1.7 release. The proposed closures syntax looks pretty horrendous to me, and reminds me of ancient C-language function prototype syntax; I hope they can see their way to building something less grotesque.

  • Meanwhile, in more near-term JDK 1.7 news, apparently Escape Analysis is now enabled by default as part of 1.7 Milestone 5. Escape Analysis is a very powerful compiler optimization which can dramatically reduce object allocation under certain circumstances; a tiny example of the sort of allocation it can eliminate appears after this list. If you want to learn more about it, here's a good place to start; that weblog posting contains a pointer to Brian Goetz's fascinating analysis of Java memory allocation from 2005.
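
Here's a minimal sketch of how the new category annotations fit together (the SlowTests marker interface and the test class are made up for illustration; running the suite below executes only the tests tagged as slow):

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.junit.experimental.categories.Categories;
import org.junit.experimental.categories.Categories.IncludeCategory;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;
import org.junit.runners.Suite.SuiteClasses;

// Running this suite executes only the tests tagged with the SlowTests category.
@RunWith(Categories.class)
@IncludeCategory(SlowSuite.SlowTests.class)
@SuiteClasses(SlowSuite.StringTests.class)
public class SlowSuite {

    // A marker interface used purely as a category label.
    public interface SlowTests {}

    public static class StringTests {
        @Test
        public void concatWorks() {
            assertEquals("ab", "a" + "b");
        }

        @Category(SlowTests.class)   // only this test is in the "slow" category
        @Test
        public void bigConcatWorks() {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100000; i++) {
                sb.append('x');
            }
            assertEquals(100000, sb.length());
        }
    }
}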

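And to give a feel for what Escape Analysis buys you, here's the classic shape of code it targets: the temporary objects below never escape the method, so the JIT can avoid allocating them on the heap at all (typically via scalar replacement). This is just an illustrative sketch, not a benchmark:

public class EscapeDemo {
    private static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
        double distanceTo(Point other) {
            double dx = x - other.x, dy = y - other.y;
            return Math.sqrt(dx * dx + dy * dy);
        }
    }

    // The Point objects are created and used entirely within this method;
    // they never "escape" to a field, a collection, or another thread,
    // which is what lets escape analysis eliminate the heap allocations.
    static double pathLength(double[] xs, double[] ys) {
        double total = 0.0;
        for (int i = 1; i < xs.length; i++) {
            Point a = new Point(xs[i - 1], ys[i - 1]);
            Point b = new Point(xs[i], ys[i]);
            total += a.distanceTo(b);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] xs = { 0, 3, 3 };
        double[] ys = { 0, 0, 4 };
        System.out.println(pathLength(xs, ys));   // 3.0 + 4.0 = 7.0
    }
}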


Back in Derby land, we're still somewhat stuck on JDK 1.4. We only dropped support for JDK 1.3 about 18 months ago, so we tend to be fairly conservative and support the older Java implementations long after others have moved on.

But there seems to be some pressure building to consider moving to JDK 1.5 as a base level of support. I think it's increasingly hard to find significant Java deployment environments where JDK 1.5 is not the standard, so I don't think it will be that much longer before Derby makes the jump to assuming a base JDK 1.5 level.

For example, it appears that the new Android Java system ("Dalvik") works best with Java bytecodes that are built by a JDK 1.5 compiler, and the Dalvik VM tools issue warnings when pointed at the Derby jar files.

As the Red Queen said to Alice:

you have to run as fast as you can to stay where you are; if you want to go somewhere, you have to run twice as fast as that

Greg Kroah-Hartman and the Linux drivers project

There was an interesting interview of Greg Kroah-Hartman over at How Software Is Built.

Kroah-Hartman describes the role of trust in the open source Linux development process:

I maintain the subsystems such as USB, and I have people who I trust enough that if they send me a patch, I’ll take it, no questions asked. Because the most important thing is I know that they will still be around in case there’s a problem with it. [laughs]

And then I send stuff off to Linus. So, Linus trusts 10 to 15 people, and I trust 10 to 15 people. And I’m one of the subsystem maintainers. So, it’s a big, giant web of trust helping this go on.

Other people have described this behavior as "meritocracy", or as "reputation-based development". It's a fascinating look inside the social aspects of the software development process, and it's interesting how much of the overall development process involves non-technical topics:

Companies want to get the most value out of Linux, so I counsel them that they should drive the development of their driver and of Linux as a whole in the direction that they think makes the most sense. If they rely on Linux and feel that Linux is going to be part of their business, I think they should become involved so they can help change it for the better.

Kroah-Hartman talks about the increasing maturity of Linux:

we don’t gratuitously change things. A big part of what drives that change is that what Linux is being used for is evolving. We’re the only operating system in something like 85 percent of the world’s top 500 supercomputers today, and we’re also in the number-one-selling phone for the past year, by quantity.

It’s the same exact kernel, and the same exact code base, which is pretty amazing.

In fact, he goes so far as to make a rather bold prediction: Linux is now so well established that it cannot be displaced:

I just looked it up, and we add 11,000 lines, remove 5500 lines, and modify 2200 lines every single day.

People ask whether we can keep that up, and I have to tell you that every single year, I say there's no way we can go any faster than this. And then we do. We keep growing, and I don't see that slowing down at all anywhere.

I mean, the giant server guys love us, the embedded guys love us, and there are entire processor families that only run Linux, so they rely on us. The fact that we’re out there everywhere in the world these days is actually pretty scary from an engineering standpoint. And even at that rate of change, we maintain a stable kernel.

It’s something that no one company can keep up with. It would actually be impossible at this point to create an operating system to compete against us. You can’t sustain that rate of change on your own, which makes it interesting to consider what might come after Linux.

For my part, I think the only thing that’s going to come after Linux is Linux itself, because we keep changing so much. I don’t see how companies can really compete with that.

I'm not sure I really believe it is "impossible"; perhaps this is one of those claims that people will laugh at, 50 years from now, but in my opinion the Linux work is astonishingly good. I run lots of different Linux distributions on a variety of machines and without exception they are solid, reliable, and impressively efficient.

It's a very interesting software development process and I enjoyed reading this interview and recommend it for those who are interested in how open source software development actually works.

Tuesday, December 1, 2009

Consistency in naming matters

If you have a naming convention for your code, stick with it.

If you don't have a naming convention for your code, establish one, and then stick with it.

You may not like the naming convention, but it doesn't matter. Even if you don't like it, stick with it. If you really don't like it, you can try to change the convention (which implies changing all the code to match it, of course), but don't just casually violate the naming convention.

It's just unbelievable how many hours of my life I've wasted dealing with situations where one programmer wrote:

String result = obj.getKeyByID(my_id);

while elsewhere, nearby in the code, a different programmer wrote:

Object obj = store.readById(my_id);

and thus, while working in that code, I casually wrote:

Object anotherObj = store.readByID(different_id);


Yes, I know, modern IDEs help you notice such simple problems rapidly.

But the whole world would be so much better if we just avoided such inconsistencies in the first place.

Grumble grumble grumble

Monday, November 30, 2009

Finding what you're looking for by looking for everything else

At work, one of my responsibilities is to maintain a complicated object cache. Caches are a great mechanism for improving the performance of searches. However, they don't work well for improving the performance of searches for non-existent items; in fact, they often worsen such performance because the code must first search the cache, fail to find the object, then must search the underlying store, and fail to find the object again. In this case, the cache has added overhead and contention without adding benefit.
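
To make that miss path concrete, here's a tiny sketch (the class and its names are invented for illustration) of why a search for an object that doesn't exist does strictly more work once a cache sits in front of the store:

import java.util.HashMap;
import java.util.Map;

class CachingStore {
    private final Map<Object, Object> cache = new HashMap<Object, Object>();
    // A stand-in for the expensive backing store.
    private final Map<Object, Object> store = new HashMap<Object, Object>();

    Object lookup(Object id) {
        Object obj = cache.get(id);        // first search: the cache
        if (obj != null) {
            return obj;                    // hit: the cache paid for itself
        }
        obj = store.get(id);               // second search: the backing store
        if (obj != null) {
            cache.put(id, obj);            // found it; remember it for next time
            return obj;
        }
        // For an object that doesn't exist we performed *both* searches and
        // have nothing to put in the cache -- pure overhead (and contention).
        return null;
    }
}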

In practice, we try to minimize the situations where clients search for objects that don't exist, but such situations still arise, so, every six months or so, somebody tries to figure out a way to improve the cache so that it can cache items that don't exist, a discussion that usually ends quietly once my colleague Tom points out the obvious impossibility of enumerating the infinite space of objects that don't exist.

At any rate, I was reminded (a bit) of this by this cute post on The Daily WTF, in which the programmer makes the basic mistake of trying to enumerate all the unwanted data, rather than simply specifying the valid data.

Of course, not only is this technique a poor performer, but, perhaps more importantly, it is a classic source of security bugs, since the bad guys can always think of an extension that you left off your list. So, in this case as in so many others, the simplest code is not only the easiest to understand and the best performing, but also the most secure.
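
To make the contrast concrete, here's a trivial sketch of the two approaches, using file extensions as the example (the specific lists are made up):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class UploadFilter {
    // The blacklist approach: doomed, because the list of "bad" extensions
    // is effectively endless and the attacker only needs the one you forgot.
    private static final Set<String> BANNED =
            new HashSet<String>(Arrays.asList("exe", "bat", "com", "scr"));

    static boolean allowedByBlacklist(String fileName) {
        return !BANNED.contains(extensionOf(fileName));   // ".vbs"? ".jar"? oops
    }

    // The whitelist approach: short, obvious, and it fails closed.
    private static final Set<String> ALLOWED =
            new HashSet<String>(Arrays.asList("jpg", "png", "gif"));

    static boolean allowedByWhitelist(String fileName) {
        return ALLOWED.contains(extensionOf(fileName));
    }

    private static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase();
    }
}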

Saturday, November 28, 2009

Coders at Work: Guy Steele

Chapter 9 of Coders at Work contains Peter Seibel's interview of Guy Steele. Steele is another in the progression of language designers that Seibel chose to include in the book: Crockford, Eich, Bloch, Armstrong, Peyton Jones, and now Steele. Although programming language design is clearly one of the most important fields within Computer Science, I do wish Seibel had balanced his choices somewhat, to include more coders from other areas: operating systems, networking, databases, graphics, etc. Still, Steele is an interesting engineer and I quite enjoyed this interview.

Having heard from Zawinski about some of the later developments in the work on Emacs and Lisp, it is interesting to hear Steele talk about some of the very early work in this area:

One of the wonderful things about MIT was that there was a lot of code sitting around that was not kept under lock and key, written by pretty smart hackers. So I read the ITS operating system. I read the implementations of TECO and of Lisp. And the first pretty printer for Lisp, written by Bill Gosper. In fact I read them as a high-school student and then proceeded to replicate some of that in my 1130 implementation.

This approach to learning how to program, by reading the programs of others, was widespread. It is certainly how I learned to program. Although I think that computer science education has come a long way in 30 years, I think that the technique of reading code is still a wonderful way to learn how to program. If you don't like reading code, and don't develop a great deal of comfort with reading code, then you're not going to enjoy programming.

Steele talks about the need to have a wide variety of high quality code to read:

I would not have been able to implement Lisp for an 1130 without having had access to existing implementations of Lisp on another computer. I wouldn't have known what to do. That was an important part of my education. Part of the problem we face nowadays, now that software has become valuable and most software of any size is commercial, is that we don't have a lot of examples of good code to read. The open source movement has helped to rectify that to some extent. You can go in and read the source to Linux, if you want to.


I think that the open source movement is an excellent source of code to read; in addition, many open source projects have communities of programmers who love to talk about the code in great detail, so if you have questions about why the code was written the way it was, there is usually somebody willing to discuss the reasoning behind it.

In addition to early work on Lisp, Steele was also present for the invention of the Emacs editor, one of the most famous and longest-living programs in existence:

Then came the breakthrough. The suggestion was, we have this idea of taking a character and looking it up in a table and executing TECO commands. Why don't we apply that to real-time edit mode? So that every character you can type is used as a lookup character in this table. And the default table says, printing characters are self-inserting and control characters do these things. But let's just make it programmable and see what happens. And what immediately happened was four or five different bright people around MIT had their own ideas about what to do with that.

In retrospect, a WYSIWYG text-editing program seems so obvious, but somebody had to think of it for the first time, and to hear first hand from somebody who was actually part of that process is great!

My favorite part of the Steele interview, however, was this description of programming language design, which, again, sounds simple in retrospect, but really cuts directly to the core of what programming language design is trying to achieve:

I think it's important that a language be able to capture what the programmer wants to tell the computer, to be recorded and taken into account. Now different programmers have different styles and different ideas about what they want recorded. As I've progressed through my understanding of what ought to be recorded I think that we want to say a lot more about data structures, we want to say a lot more about their invariants. The kinds of things we capture in Javadoc are the kinds of things that ought to be told to a compiler. If it's worth telling another programmer, it's worth telling the compiler, I think.

Exactly, and double-exactly! Firstly, Steele is absolutely right that most programming languages concentrate far too much on helping programmers describe control flow and not enough on helping programmers describe data structures. Most, perhaps nearly all, of the bugs and mistakes that I work on have to do with confusion about data structures, not with confusion about control flow. When reading most programs, the control flow is simple and evident, but teasing out the behavior of the data structures is often horrendous.

And, secondly, I just love the way Steele distills it:

If it's worth telling another programmer, it's worth telling the compiler, I think.
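
A small Java illustration of the gap Steele is pointing at (this sketch is mine, not his): the invariant lives in a comment for the human reader, and the most we can usually tell the machine is a runtime restatement of the same fact.

/**
 * An inclusive integer range.
 *
 * Invariant, stated for the human reader: low <= high, always.
 * The compiler never sees that sentence; the closest we can get is to
 * repeat it as a constructor check and an assertion.
 */
public final class Range {
    private final int low;
    private final int high;

    public Range(int low, int high) {
        if (low > high) {
            throw new IllegalArgumentException("low > high: " + low + " > " + high);
        }
        this.low = low;
        this.high = high;
    }

    public boolean contains(int value) {
        assert low <= high;   // restating the invariant, yet again
        return low <= value && value <= high;
    }
}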


Steele is apparently working on a new programming language, Fortress. It will be interesting to see how it turns out.

Monday, November 23, 2009

Coders at Work: Peter Norvig

Chapter 8 of Peter Seibel's Coders at Work contains his interview with Peter Norvig.

Norvig is the 2nd of 3 Google employees that Seibel interviews in his book, I believe. Is 25% too high a ratio for a single company to have? Perhaps, but there's no disputing that Google has a fantastic array of talent, and Norvig reputedly is one of the reasons why, so I'm pleased to see him included.

At this point in his career, Norvig is much more of a manager/executive than a coder, so his observations on coding are perhaps not as immediate as with others that Seibel speaks to, but still, Norvig has some very serious programming chops. Joel Spolsky and Jeff Atwood told a story on their podcast recently about their DevDays conference, which had the unusual format of being a short conference held multiple times, successively, in various locations around the world: at each location, one of the events that they organized was a tutorial introduction to the programming language Python. Each tutorial was taught by a different instructor, who was chosen from the available instructors local to the conference location, and each time, Joel and Jeff suggested to the instructor that for the material of the course, the instructor could choose to perform an in-depth analysis of Norvig's spelling checker. 10 of the 12 instructors apparently chose to take this suggestion, and, each time, the technique of analyzing a single brilliantly-written piece of software was successful.

One of the things that must be fascinating about being at Google is the scope and scale of the problems. Norvig says:

At Google I think we run up against all these types of problems. There's constantly a scaling problem. If you look at where we are today and say, we'll build something that can handle ten times more than that, in a couple years you'll have exceeded that and you have to throw it out and start all over again. But you want to at least make the right choice for the operating conditions that you've chosen -- you'll work for a billion up to ten billion web pages or something. So what does that mean in terms of how you distribute it over multiple machines? What kind of traffic are you going to have going back and forth?

Although the number of such Internet-scale applications is increasing, there still aren't many places where programmers work on problems of this size.

Another interesting section of the interview (for me) involves Norvig's thoughts about software testing:

But then he never got anywhere. He had five different blog posts and in each one he wrote a bit more and wrote lots of tests but he never got anything working because he didn't know how to solve the problem.
...
I see tests more as a way of correcting errors rather than as a way of design. This extreme approach of saying, "Well, the first thing you do is write a test that says I get the right answer at the end," and then you run it and see that it fails, and then you say, "What do I need next?" -- that doesn't seem like the right way to design something to me.
...
You look at these test suites and they have assertEqual and assertNotEqual and assertTrue and so on. And that's useful but we also want to have assertAsFastAsPossible and assert over this large database of possible queries we get results whose precision value of such and such...
...
They should write lots of tests. They should think about different conditions. And I think you want to have more complex regression tests as well as the unit tests. And think about failure modes.


During this long section of the interview, which actually spans about 5 pages, Norvig appears to be making two basic points:

  • Testing is not design. Design is design.

  • Many test suites are overly simple. Testing needs to start simple, but it needs to be pursued to the point where it is deep and powerful and sophisticated.


I think these are excellent points. I wonder what Norvig would think of Derby's test harness, which has support functions that are much more sophisticated than assertEquals and the like: assertFullResultSet, assertSameContents, assertParameterTypes, etc.
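
A helper along those lines can compare an entire JDBC result set against an expected table of values in one call, instead of a pile of individual assertEquals calls. Here's a rough, self-contained sketch of the idea (written against JUnit 3; this is not Derby's actual implementation):

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import junit.framework.Assert;

public class ResultSetAssertions {
    // Asserts that the result set contains exactly the expected rows,
    // in order, comparing every column as a string.
    public static void assertFullResultSet(ResultSet rs, String[][] expected)
            throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int row = 0;
        while (rs.next()) {
            Assert.assertTrue("more rows than expected", row < expected.length);
            Assert.assertEquals("column count, row " + row,
                    expected[row].length, meta.getColumnCount());
            for (int col = 0; col < expected[row].length; col++) {
                Assert.assertEquals("row " + row + ", column " + (col + 1),
                        expected[row][col], rs.getString(col + 1));
            }
            row++;
        }
        Assert.assertEquals("row count", expected.length, row);
    }
}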

I'm quite pleased that Seibel raises the often-controversial question of Google's "puzzle" style of interviewing, and Norvig's answer is quite interesting:

I don't think it's important whether people can solve the puzzles or not. I don't like the trick puzzle questions. I think it's important to put them in a technical situation and not just chitchat and get a feeling if they're a nice guy.
...
It's more you want to get a feeling for how this person thinks and how they work together, so do they know the basic ideas? Can they say, "Well, in order to solve this, I need to know A, B, and C," and they start putting it together. And I think you can demonstrate that while still failing on a puzzle. You can say, "Well, here's how I attack this puzzle. Well, I first think about this. Then I do that. Then I do that, but geez, here's this part I don't quite understand."
...
And then you really want to have people write code on the board if you're interviewing them for a coding job. Because some people have forgotten or didn't quite know and you could see that pretty quickly.

So the puzzle question, in Norvig's view, is just a way to force the interviewer and the interviewee to talk concretely about an actual problem, rather than retreating into the abstract.

Over the years (decades), I've interviewed at many interesting software companies, including both Microsoft and Google, and I think this approach to the interviewing process is quite sensible. Although in my experience it was actually Microsoft, not Google, where I encountered the full-on trick-puzzle interview process, I can see that, as a technique, it works very well. And, as the interviewee, I appreciate getting past the small talk and getting directly to the "code on a wall" portion of the interview.

I really enjoyed the Norvig interview.

Saturday, November 21, 2009

Chromium OS and custom hardware

I've been making my way, slowly, through the Chromium OS information. One of the things that surprised me was the notion that the OS requires custom hardware. Did Google make a mistake by doing this? This seems like it would reduce their potential user base and make it harder for casual users to experiment with the operating system. However, I see that there is already a growing list of hardware vendors and viable systems, so maybe (particularly since this is Google) there won't be a problem here, and support for Chromium OS will be commonly present in mainstream hardware.

Will a single physical machine be capable of running multiple OS's? (Say, Windows 7, Ubuntu, Chromium OS, etc.) Will virtualization technologies (Virtual Box, VMWare, etc.) help here?

Wednesday, November 18, 2009

Coders at Work: Joe Armstrong, Simon Peyton Jones

Chapters 6 and 7 of Coders at Work are the interviews with Joe Armstrong and Simon Peyton Jones.

I hadn't heard of Joe Armstrong before, although I knew of Erlang. Armstrong is clearly the real deal, the sort of coder I can recognize almost instantly when I'm around them. It can be hard to read an interview with such a person, because when you see their words in print, you think, ouch!

I used to read programs and think, "Why are they writing it this way; this is very complicated," and I'd just rewrite them to simplify them. It used to strike me as strange that people wrote complicated programs. I could see how to do things in a few lines and they'd written tens of lines and I'd sort of wonder why they didn't see the simple way. I got quite good at that.

I know this feeling; I have this feeling all the time; I'm very familiar with this feeling.

The danger, of course, is that what you think of as "the simple way" may be perhaps too simple, or may miss some property of the problem under study which was apparent to the original author but not to the next reader. This urge to re-write is incredibly strong, though, and while I was reading the interview with Armstrong I was instantly reminded of a saying that was prevalent two decades ago, and which I think is attributed to Michael Stonebraker: "pissing on the code". Stonebraker was describing a behavior that he observed during the research projects at UC Berkeley, during which, from time to time, one student would leave the project and another student would join the project to replace the first. Inevitably, the new student would decide that the prior student's work wasn't up to par, and would embark on an effort to re-write the code, a cycle of revision which came to be compared to the way that dogs mark their territory.

As I was reading the Armstrong interview, I couldn't really decide if he was pulling our legs or not:

If you haven't got a directory system and you have to put all the files in one directory, you have to be fairly disciplined. If you haven't got a revision control system, you have to be fairly disciplined. Given that you apply that discipline to what you're doing it doesn't seem to me to be any better to have hierarchical file systems and revision control. They don't solve the fundamental problem of solving your problem. They probably make it easier for groups of people to work together. For individuals I don't see any difference.

Is he serious? He doesn't see any difference? It's hard to believe.

His primitivistic approach seems almost boundless:

I said, "What's that for? You don't use it." He said, "I know. Reserved for future expansion." So I removed that.

I would write a specific algorithm removing all things that were not necessary for this program. Whenever I got the program, it became shorter as it became more specific.

This, too, is an emotion I know well; people who know me know that I rail against the unneeded and unnecessary in software, because I find that complexity breeds failure: it results in lower developer productivity, lower performance, and harder-to-use, buggier software. There's a famous story, retold in one of the old books describing the joint development of OS/2 by Microsoft and IBM, about how the software management at IBM was obsessed with measuring productivity by counting lines of code, and how the Microsoft engineers kept "messing up" the schedule by re-writing bits of IBM-written code in fewer lines, causing the graphs to show a negative slope and alarm bells to ring.

Many parts of the Armstrong interview definitely ring true, such as the observation that programming is a skill which can be improved by practice, almost to the point of being an addiction:

The really good programmers spend a lot of time programming. I haven't seen very good programmers who don't spend a lot of time programming. If I don't program for two or three days, I need to do it.

As well as his observation on the value of describing what a program is supposed to do:

I quite like a specification. I think it's unprofessional these people who say, "What does it do? Read the code." The code shows me what it does. It doesn't show me what it's supposed to do.

However I ended up feeling about the Armstrong interview, one thing is for sure: it was not boring!

I found the Peyton Jones interview much less gripping. Again, I had heard of Haskell, the language that Peyton Jones is associated with, but I hadn't heard much about Peyton Jones himself. I'd say that Peyton Jones is not really a coder; rather, he is a professor, a teacher, a researcher:

I write some code every day. It's not actually every day, but that's my mantra. I think there's this horrible danger that people who are any good at anything get promoted or become more important until they don't get to do the thing they're any good at anymore. So one of the things I like about working here and working in research generally is that I can still work on the compiler that I've been working on since 1990.
...
How much code do I write? Some days I spend the whole day programming, actually staring at code. Other days, none. So maybe, on average, a couple hours a day, certainly.

It's like he's a researcher who wants to be a coder, but also wants to be a researcher. But I suspect that what he really wants to be is a teacher:

I have a paper about how to write a good paper or give a good research talk and one of the high-order bits is, don't describe an artifact. An artifact is an implementation of an idea. What is the idea, the reusable brain-thing that you're trying to transfer into the minds of your listeners? There's something that's useful to them. It's the business of academics, I think, to abstract reusable ideas from concrete artifacts.

I think that's a great description; I suspect that if you flip it around, it's not a bad description of coders: "to implement concrete artifacts from abstract reusable ideas".

I went into both the Armstrong and Peyton Jones interviews thinking, "Oh, I know these guys! This is the guy that did language X; perhaps I will learn something about language X, or about why this guy's life led him to invent language X." Unfortunately, neither interview did that, perhaps because those stories have been told sufficiently many times elsewhere.

I'm still interested in Erlang, and in Haskell, and hopefully someday I will find the time to study them more. But these interviews were not the springboard to that activity.

Good rant on programming language design

I found this rant interesting.

It's rather strongly worded at times (it is a rant, after all), but I think the author makes some excellent points.

I felt his pain about the appearance of:


just another language written first for the compiler and only secondarily for the programmer --- and stuck in a 70s mindset* about the relationship of that programmer to the (digital) world within which they live and work


And I appreciated his observation that:

- programming languages are for *programmers* --- not compilers and compiler-writers
- until you make the everyday, "simple" things simple, it will continue to be a dark art practiced by fewer and fewer


Is it time for a Great New Programming Language? It's been nearly 15 years since Java and JavaScript appeared. What will that next language be?

Tuesday, November 17, 2009

Valid rant on Google Closure? Or premature optimization?

I found this rant on Google Closure on a SitePoint blog.

Midway through the long, detailed article, we find this:

Closure Library contains plenty of bloopers that further reveal that its authors lack extensive experience with the finer points of JavaScript.

From string.js, line 97:

// We cast to String in case an argument
// is a Function. ...
var replacement = String(arguments[i]).replace(...);

This code converts arguments[i] to a string object using the String conversion function. This is possibly the slowest way to perform such a conversion, although it would be the most obvious to many developers coming from other languages.

Much quicker is to add an empty string ("") to the value you wish to convert:

var replacement = (arguments[i] + "").replace(...);

Now, it's quite possible that the author of the blog entry has a point, and the one technique is faster than the other.

However, unless this is in a very performance-critical section, I think that the loss of readability is substantial. It is much easier, particularly for the casual reader, to read the first form of the code (with the String() conversion function) than it is to read the second form.

I'm only a part-time JavaScript coder, but for the foreseeable future I intend to concentrate on writing clear and legible code, and place my bets on the compiler and VM authors improving their ability to optimize my code in such a way that hacks like this become less and less necessary.

Monday, November 16, 2009

Stack Overflow

I've been trying to learn how to use Stack Overflow.

And I've been unsuccessful.

I've had a very hard time finding interesting questions being discussed. I see a lot of questions that look like they might be interesting, but aren't phrased very well. I looked today, and there was a question that read:

My application takes a long time to shut down.


And I see a lot of other questions that don't seem interesting at all: "which language is better, C# or Visual Basic."

Also, I have a hard time figuring out what I can do on the site. I started out with a reputation value of 1, which provides me with a very limited range of actions I can perform on the site.

And I'm having a terrible time learning how to use the tagging system effectively. For example, I'd like to pay attention to questions involving Java JDBC database development, but there are just dozens of tags that are relevant to this area: "java", "database", "DBMS", "SQL", "jdbc", "derby", "apache-derby", etc. Why, there are 416 pages of tags on the web site, currently!

I think that Jeff Atwood and Joel Spolsky are really super smart, and I can tell that there is, potentially, a lot of value in the Stack Overflow concept, but if I can't figure out how to use it better, I'm not sure how much more I'm going to be able to get out of it.

Is there an "Idiot's guide to getting started with Stack Overflow" somewhere?

Saturday, November 14, 2009

Coders at Work: Joshua Bloch

Chapter 5 of Coders at Work is the interview with Joshua Bloch.

I'm very familiar with Bloch's work in the Java community, and I've read his online writings as well as several of his books, so the material in his interview was pretty familiar to me.

Several things about Bloch's comments struck me:

  • Bloch talks frequently about programming-as-writing, an important topic for me:

    Another is Elements of Style, which isn't even a programming book. You should read it for two reasons: The first is that a large part of every software engineer's job is writing prose. If you can't write precise, coherent, readable specs, nobody is going to be able to use your stuff. So anything that improves your prose style is good. The second reason is that most of the ideas in that book are also applicable to programs.
    ...
    Oh, one more book: Merriam-Webster's Collegiate Dictionary, 11th Edition. Never go anywhere without it. It's not something you actually read, but as I said, when you're writing programs you need to be able to name your identifiers well. And your prose has to be good.
    ...
    The older I get, the more I realize that it isn't just about making it work; it's about producing an artifact that is readable, maintainable, and efficient.

    The Elements of Style, and a dictionary: what inspired choices! I wish that more programmers would read these books! One of the reasons I work at my blog is to keep up with the practice of writing; like any skill, it requires constant practice.

  • I also very much liked Bloch's description of test-first API development, although he doesn't really call it "test-first"; instead he talks about needing to write use cases:

    Coming up with a good set of use cases is the most important thing you can do at this stage. Once you have that, you have a benchmark against which you can measure any possible solution. It's OK if you spend a lot of time getting it reasonably close to right, because if you get it wrong, you're already dead. The rest of the process will be an exercise in futility.
    ...
    The whole idea is to stay agile at this stage, to flesh out the API just enough that you can take the use cases and code them up with this nascent API to see if it's up to the task.
    ...
    In fact, write the code that uses the API before you even flesh out the spec, because otherwise you may be wasting your time writing detailed specs for something that's fundamentally broken.
    ...
    In a sense, what I'm talking about is test-first programming and refactoring applied to APIs. How do you test an API? You write use cases to it before you've implemented it. Although I can't run them, I am doing test-first programming: I'm testing the quality of the API, when I code up the use cases to see whether the API is up to the task.

    Having such a well-known and influential person as Bloch coming out so strongly in favor of test development is a wonderful thing, and I think he makes the case very persuasively; the sketch just after this list is my attempt to show what I take him to mean.



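The API in the sketch below is invented purely for illustration; it is my reading of Bloch's advice, not his example. The point is that you rough out the interfaces, write the use cases you wish you could write against them, and let the friction you hit reshape the API before any implementation exists:

// Hypothetical API, sketched just far enough to code use cases against.
// Nothing here is implemented yet.
interface MessageQueue {
    void publish(String topic, byte[] payload);
    Subscription subscribe(String topic, MessageHandler handler);
}

interface MessageHandler {
    void onMessage(String topic, byte[] payload);
}

interface Subscription {
    void cancel();
}

// A use case coded against the nascent API. There is nothing to run yet,
// but writing it exposes the awkward spots early: is byte[] the right
// payload type? how do handlers report failure? is cancel() enough?
class AuditLogUseCase {
    void wireUp(MessageQueue queue) {
        Subscription sub = queue.subscribe("orders", new MessageHandler() {
            public void onMessage(String topic, byte[] payload) {
                System.out.println(topic + ": " + payload.length + " bytes");
            }
        });
        queue.publish("orders", "order-42".getBytes());
        sub.cancel();
    }
}

And by the time the implementation does exist, those same use cases are ready to become the first tests.
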
Bloch is so closely identified with Java, and so deeply involved in its development, that it's hard to imagine him ever doing anything else. Seibel is interested in this question, too:

Seibel: Do you expect that you will change your primary language again in your career or do you think you'll be doing Java until you retire?

Bloch: I don't know. I sort of turned on a dime from C to Java. I programmed in C pretty much exclusively from the time I left grad school until 1996, and then Java exclusively until now. I could certainly see some circumstance under which I would change to another programming language. But I don't know what that language would be.


My own experience mirrors Bloch's almost exactly: I programmed in C from 1987-1994, then briefly studied C++ from 1994-1997, then have been programming in Java almost exclusively for the last dozen years.

I continue to study other languages and environments, but Java is so rich, and so powerful, and I feel so effective and capable in Java, that I haven't yet found that next great language that offers enough of an advantage to take me out of Java.

Blog spam

I've been hit by blog comment spam for the first time.

I have full comment moderation turned on, so the spam just shows up in the moderation queue, and it's easy to delete.

So long as it's only 1 or 2 spam comments per day, that is.

Groan.

Wednesday, November 11, 2009

Go: a new programming language

Rob Pike and Ken Thompson have announced (with Google's help) their new language Go.

Here's a quick summary, from the Language Design FAQ:


Go is an attempt to combine the ease of programming of an interpreted, dynamically typed language with the efficiency and safety of a statically typed, compiled language. It also aims to be modern, with support for networked and multicore computing. Finally, it is intended to be fast: it should take at most a few seconds to build a large executable on a single computer. To meet these goals required addressing a number of linguistic issues: an expressive but lightweight type system; concurrency and garbage collection; rigid dependency specification; and so on. These cannot be addressed well by libraries or tools; a new language was called for.


It's interesting that they are concerned with compilation speed, as modern compilers seem extraordinarily fast to me: I can build the entire Derby system soup-to-nuts in barely 2 minutes on a 5-year-old computer, and that includes building the entire enormous test suite as well as constructing the sample database.

They seem to spend most of their time comparing Go to C and C++; perhaps it's less compelling to somebody coming from a Java background?

Regardless, it definitely looks like it's worth learning more about.

And it's interesting that they are doing this using an open-source methodology.

Learning about Maven

I'm taking an opportunity to try to learn a little bit about Maven.

I actually started by first taking an opportunity to try to learn a little bit about jUDDI. I encountered a small problem with jUDDI, which was patched within hours by one of the jUDDI developers (thanks!).

But now I need to learn how to build jUDDI, in order to experiment with the patch. And jUDDI uses Maven as its build environment.

It's somewhat interesting that this is the first project I've worked with that has used Maven for its builds, since my understanding is that Maven is increasingly popular. But until now, I hadn't actually encountered it in my own usage.

I can build almost all of jUDDI using Maven. jUDDI has about half-a-dozen different subsystems, and I can build all but one of them individually. But when I go to build the "Tomcat package", I get a Maven build error:


[INFO] Failed to resolve artifact.

GroupId: org.apache.juddi
ArtifactId: juddi-parent
Version: 3.0.0.SNAPSHOT

Reason: Unable to download the artifact from any repository

org.apache.juddi:juddi-parent:pom:3.0.0.SNAPSHOT

from the specified remote repositories:
central (http://repo1.maven.org/maven2),
apache (http://people.apache.org/repo/m2-ibiblio-rsync-repository),
maven2-repository.dev.java.net (http://download.java.net/maven/2),
maven-repository.dev.java.net (http://download.java.net/maven/1)


So far, I haven't figured this problem out, except that I know:

  • I get a similar problem on both the trunk, and on the 3.0.0-tagged branch

  • The jUDDI developers don't get this problem when they build

  • The juddi-parent subsystem is built successfully, and exists correctly in my local repository (under the .m2 folder)



My current working theory is that the build scripts are expecting that Maven will fetch this already-built object from my local repository, but for some reason Maven is not looking in the local repository, and is only willing to look in remote repositories.

I've learned about running Maven with the -X flag, and I can see that, at what seems to be the critical point, Maven deliberates about where to look for the juddi-parent object:


[DEBUG] Retrieving parent-POM:
org.apache.juddi:juddi-parent:pom:3.0.0.SNAPSHOT
for project: org.apache.juddi.bootstrap:apache-tomcat:pom:6.0.20
from the repository.
[DEBUG] Skipping disabled repository central
[DEBUG] juddi-parent: using locally installed snapshot
[DEBUG] Trying repository maven2-repository.dev.java.net


It seems like "using locally installed snapshot" should mean that it found the built object in my .m2 local repository, but then why does it then proceed to start looking out on the net?

The next step, I guess, is to learn more about Maven's behavior by reading through the Maven docs, and the jUDDI pom.xml files, and trying to correlate them to the output of the -X build.
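
One specific thing I plan to look at is the <parent> declaration in the Tomcat package's pom.xml. I don't yet know what jUDDI's POM actually says -- the snippet below is just the shape such a declaration typically takes in a multi-module project, not something copied from jUDDI -- but a missing or wrong <relativePath>, or snapshot handling that insists on consulting the remote repositories, seems like the kind of thing that could produce exactly this symptom:

<!-- Hypothetical sketch of a child module's parent declaration. -->
<parent>
  <groupId>org.apache.juddi</groupId>
  <artifactId>juddi-parent</artifactId>
  <version>3.0.0.SNAPSHOT</version>
  <!-- If the parent POM isn't one directory up, this must point to it;
       otherwise Maven resolves the parent from the local and remote
       repositories rather than from the source tree. -->
  <relativePath>../pom.xml</relativePath>
</parent>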

Slow going, but that's how learning occurs.

Monday, November 9, 2009

Coders at Work: Brendan Eich

Chapter 4 of Coders at Work presents the interview with Brendan Eich.

Eich is the creator of JavaScript, as well as being the first implementer of the language. He continues to lead the JavaScript language design efforts, but in addition he is still active in implementation, including notably the recent ultra-high-performance JavaScript implementations in modern Mozilla products.

Eich's description of the invention of JavaScript is well-known, but it's still good to hear it again, from his perspective:

The immediate concern at Netscape was it must look like Java. People have done Algol-like syntaxes for Lisp but I didn't have time to take a Scheme core so I ended up doing it all directly and that meant I could make the same mistakes that others make.

...

But I didn't stick to Scheme and it was because of the rushing. I had too little time to actually think through some of the consequences of things I was doing. I was economizing on the number of objects that I was going to have to implement in the browser. So I made the global object be the window object, which is a source of unknown new name bindings and makes it impossible to make static judgments about free variables. So that was regrettable. Doug Crockford and other object-capabilities devotees are upset about the unwanted source of authority you get through the global object. That's a different way of saying the same thing. JavaScript has memory-safe references so we're close to where we want to be but there are these big blunders, these loopholes.


In addition to reading Eich's account of the technical details of the development of JavaScript, he provides a very interesting account of the various social pressures which were complicating the work on the language:

It was definitely a collaborative effort and in some ways a compromise because we were working with Adobe, who had done a derivative language called ActionScript. Their version 3 was the one that was influencing the fourth-edition proposals. And that was based on Waldemar Horwat's work on the original JavaScript 2/ECMAScript fourth-edition proposals in the late 90's, which got mothballed in 2003 when Netscape mostly got laid off and the Mozilla foundation was set up.
...
At the time there was a real risk politically that Microsoft was just not going to cooperate. They came back into ECMA after being asleep and coasting. The new guy, who was from Hyderabad, was very enthusiastic and said, "Yes, we will put the CLR into IE8 and JScript.net will be our new implementation of web JavaScript." But I think his enthusiasm went upstairs and then he got told, "No, that's not what we're doing." So it led to the great revolt and splitting the committee.
...
I think there's been kind of a Stockholm syndrome with JavaScript: "Oh, it only does what it does because Microsoft stopped letting it improve, so why should we want better syntax; it's actually a virtue to go lambda-code everything". But that Stockholm syndrome aside, and Microsoft stagnating the Web aside, language design can do well to take a kernel idea or two and push them hard.

Eich's interview rushes like a whirlwind. He clearly is such an intense and active thinker, and has so much that he wants to talk about, that there just isn't enough time or space in a few small pages to contain it all.

Luckily, there are a number of places on the web where you can find recorded lectures and writings that he has done so you can learn more, and in more detail. For example, here is a recent talk he gave at Yahoo.

As I was reading through the chapter, I found myself pausing about every other page to go chase a reference to various bits of knowledge I hadn't been aware of:

  • Hindley-Milner type inference

  • The Curry-Howard correspondence

  • Valgrind, Helgrind, Chronomancer, and Replay


It must be exhausting to share an office with Eich :)

And, as a person who loves to use a debugger while studying code, I was pleased to read that Eich shares my fondness for stepping through code in the debugger:

When I did JavaScript's regular expressions I was looking at Perl 4. I did step through it in the debugger, as well as read the code. And that gave me ideas; the implementation I did was similar. In this case the recursive backtracking nature of them was a little novel, so that I had to wrap my head around. It did help to just debug simple regular expressions, just to trace the execution. I know other programmers talk about this: you should step through code, you should understand what the dynamic state of the program looks like in various quick bird's-eye views or sanity checks, and I agree with that.

Seibel: Do you do that with your own code, even when you're not tracking down a bug?

Eich: Absolutely -- just sanity checks. I have plenty of assertions, so if those botch then I'll be in the debugger for sure. But sometimes you write code and you've got some clever bookkeeping scheme or other. And you test it and it seems to work until you step through it in the debugger. Particularly if there's a bit of cleverness that only kicks in when the stars and the moon align. Then you want to use a conditional break point or even a watch point, a data break point, and then you can actually catch it in the act and check that, yes, the planets are all aligned the way they should be and maybe test that you weren't living in optimistic pony land. You can actually look in the debugger, whereas in the source you're still in pony land. So that seems important; I still do it.


"Optimistic pony land" -- what a great expression! It captures perfectly that fantasy world that all programmers are living in when they first start writing some code, before they work slowly and thoroughly through the myriad of pesky details that are inherent in specifying actions to the detail that computers require.

Well, more thoughts will have to wait for later; I'm heading back to pony land :)

Saturday, November 7, 2009

Coders at Work: Douglas Crockford

Chapter Three of Peter Seibel's Coders at Work contains his interview with Douglas Crockford.

I find the inclusion of Crockford in the book a little odd, because I don't really see him as much of a coder. You can see this in the interview: Crockford doesn't spend much time talking about code review, or source code control, or test harnesses, or debuggers, or the other sorts of things that occupy most waking seconds of most coders. When Crockford does talk about these things, he talks about his work at Basic Four, when he was using a Z80, or he talks about ideas from Multics; this is all relevant, but it's 30+ years old at this point.

I would describe Crockford as a language designer, because the work that has (rightfully) brought him attention and renown is his work in transforming the image (and reality) of JavaScript into its current position as the most important programming language in the world. So for that reason alone I think he is worth including in the book.

When I started learning about JavaScript about 10 years ago, the world was full of books talking about "DHTML" and telling you how to paste 3 ugly lines of JavaScript into your HTML form element so that when you clicked the Submit button, the JavaScript would check that your userid was not blank. Now we have Google Maps, and Yahoo Mail, and the Netflix movie browser UI, etc.: example after example of elegant, powerful, full-featured applications written in JavaScript.

Furthermore, the juxtaposition of the Crockford interview with the Eich interview (next chapter) is quite entertaining, as it has been the back-and-forth interaction between these two that has brought JavaScript to where it is. For example, in this chapter we get to hear Crockford say:

I can appreciate Brendan Eich's position there because he did some brilliant work but he rushed it and he was mismanaged and so bad stuff got out. And he's been cursed and vilified for the last dozen years about how stupid he is and how stupid the language is and none of that's true. There's actually brilliance there and he's a brilliant guy. So he's now trying to vindicate himself and prove, I'm really a smart guy and I'm going to show it off with this language that has every good feature that I've ever seen and we're going to put them all together and it's going to work.

And, next chapter, we get to hear this part of the story from Eich's point of view. So well done, Peter Seibel!

I found myself reading this chapter with an on-off switch: I kind of skimmed through the parts where Crockford discusses his work in his pre-JavaScript life, but when he talks about JavaScript, I found it much more interesting. He talks about the experience which has occurred to (probably) every experienced programmer who picked up JavaScript (certainly, it happened to me):

I understand why people are frustrated with the language. If you try to write in JavaScript as though it is Java, it'll keep biting you. I did this. One of the first things I did in the language was to figure out how to simulate something that looked sort of like a Java class, but at the edges it didn't work anything like it. And I would always eventually get pushed up against those edges and get hurt.

Eventually I figured out I just don't need these classes at all and then the language started working for me. Instead of fighting it, I found I was being empowered by it.

The key aspect of JavaScript which takes most Java programmers a long time to get past is the difference between abstraction-based-on-classification (Java) and abstraction-based-on-prototype (JavaScript), so it is here that I found Crockford's insights most fascinating:

Part of what makes programming difficult is most of the time we're doing stuff we've never done before. If it was stuff that had been done before we'd all be re-using something else. For most of what we do, we're doing something that we haven't done before. And doing things that you haven't done before is hard. It's a lot of fun but it's difficult. Particularly if you're using a classical methodology you're having to do classification on systems that you don't fully understand. And the likelihood that you're going to get the classification wrong is high.

Seibel: By "classical" you mean using classes.

Crockford: Right. I've found it's less of a problem in the prototypal world because you focus on the instances. If you can find one instance which is sort of typical of what the problem is, you're done. And generally you don't have to refactor those. But in a classical system you can't do that -- you're always working from the abstract back to the instance. And then making hierarchy out of that is really difficult to get right. So ultimately when you understand the problem better you have to go back and refactor it. But often that can have a huge impact on the code, particularly if the code's gotten big since you figured it out. So you don't.
...
I've become a really big fan of soft objects. In JavaScript, any object is whatever you say it is. That's alarming to people who come at it from a classical perspective because without a class, then what have you got? It turns out you just have what you need, and that's really useful. Adapting your objects ... the objects that you want is much more straightforward.


I think this may be one of the most insightful and brilliant distillations of everything that's right with object-oriented programming, and everything that's wrong with object-oriented programming, that I've ever read. I've been doing object-oriented programming for 15 years, and I haven't seen such a concise, precise, and accurate critique of its essence in a long time.

I hope that Crockford continues to find an audience, and I hope that he continues to work on improving JavaScript. Although he can be a prickly and sharp-tongued fellow, he has a lot of interesting thoughts on this subject, and, particularly given the importance of the subject, I hope he continues to keep attention focused on this topic for a long time.

Friday, November 6, 2009

Google have open-sourced their JavaScript library

Another powerful and sophisticated open source JavaScript library has joined the party.

Among the things that seem interesting about Google Closure are:

  • Their approach is to introduce an explicit compilation step, so that developers can feel confident in writing well-commented maintainable source, which is then compiled by the Closure compiler down to a smaller and tighter deployable JavaScript program

  • They've gone with a templating approach. Templating approaches seem to come and go, dating back to things like ASP and JSP last century, and probably older systems before that -- I think ColdFusion was a templating system.

  • Their infrastructure runs both server-side and client-side, and supports Java as the implementation language on the server side (and of course JavaScript as the language on the client side).



Since this library is from Google, you can be sure that it is thorough, powerful, and sophisticated, and therefore worthy of study.

I guess that means I've got something else to learn about now!