Alan Kay, on Quora, about Web Browsers as document viewers: My Wiki

13th November 2022 at 8:58pm

Alan Kay answering on Quora the question "Should web browsers have stuck to being document viewers?" (related to this tweet).

See my comment

Actually quite the opposite, if “document” means an imitation of old static text media (and later including pictures, and audio and video recordings).

It was being willing to settle for an overly simple text format and formatting scheme — “for convenience” — that started the web media architecture off in entirely the wrong direction (including the too simple reference scheme c.f. Doug Engelbart and Ted Nelson). Circa early 90s, it had the look and feel of an atavistic hack. I expected that Netscape would fix this rather than just try to dominate what was there (I expected a better architecture both for “thinking about media in the age of computing” and also something not like “an app” but more like an operating system to deal with the actual systems requirements, demands, and scalings of the world-wide Internet-in-progress).

It’s both surprisingly and dismayingly difficult to get people — especially computerists — to criticize the web and the web browsers — even more so perhaps today.

This is despite the glaring fact that the interactive media provided by the web and browsers has always been a hefty and qualitative subset of the media on the very personal computers that run the web browsers.

At the time of the WWW’s inception — in the early 90s — I made several recommendations — especially to Apple where I and my research group had been for a number of years — and generally to the field. These were partially based on the scope and scalings that the Internet was starting to expand into.

1. Apple’s HyperCard was a terrific and highly successful end-user authoring system whose media was scripted, WYSIWYG, and “symmetric” (in the sense that the “reader” could turn around and “author” in the same high-level terms and forms). It should be the start of, and the guide for, the “User Experience” of encountering and dealing with web content.
2. The underlying system for a browser should not be that of an “app” but of an Operating System whose job would be to protectively and safely run encapsulated systems (i.e. “real objects”) gotten from the web. It should be the way that web content could be open-ended, and not tied to functional subsets in the browser.

I pointed out that — as with the Macintosh itself — these two recommendations — which seem to be somewhat at odds — have to be reconciled. The first recommendation would be the next stage in the excellent Macintosh “guidelines” about its user experience (Chris Espinosa and others have never been praised highly enough for this important work). These guidelines laid out the conventions to be followed for any app of any functionality — they are the parts that must be similar.

The second recommendation was to reinforce the idea that the content to be run within the system had to be as free from the tools of the OS as absolutely possible (because special needs often require special designs, etc.). An example was that the content needed to be able to generate its own graphics if necessary (even if the OS supplied some graphics tools). The more the content wanted to go its own way, the more its presentation to the users had to be made to conform to the standards in (1). As with any decent OS, it has to allow for new ideas while also providing the resources for safety, efficiency, and to manifest user experiences.

If we squint at some of the implications of both of these, we can find a number of good principles from the past. One of them — as a real principle — I trace to the first Unix systems at Bell Labs. The design was partly a reaction against the extremely complex organization of the Multics OS at MIT. One of the great realizations of the early Unix was that the kernel of an OS — and essentially the only part that should be in “supervisor mode” — would only manage time (quanta for interleaved computations) and space (memory allocation and levels) and encapsulation (processes) — everything else should be expressible in the general vanilla processes of the system. More functionality could be supplied by the resources that came along with the OS, but these should easily be replaceable by developer processes when desired.

The original idea was to instigate as much progress as possible without incurring lock-in to a huge OS, but to protect what needed to be protected and ensure a threshold of system integrity and reliability.

Sidebar: perhaps the best early structuring and next stage design of Unix was Locus by Gerry Popek and his researchers at UCLA in the early 80s. Locus allowed live Unix processes to migrate not just from one machine to another on a network, but to a variety of machine types. This was done by combining the safety required for interrupts with multiple code hooks in each process, so an “interrupt” could allow the process to be moved to a different machine and resumed with different (equivalent) code. It was easy to see that combining this with an end-user language would provide a network-wide system that would run compatibly over the entire Internet. Soon after arriving at Apple ca. 1984, I tried to get them to buy Locus, but the “powers that be” at the time couldn’t see it.

Note that when such a system is made interactive — e.g. using the sweeping ideas from the ARPA/Parc research community — the end-users need to have a user interface framework that is generically similar as much as possible over all applications — and that this can conflict with the freedoms needed for new ideas and often new functionalities.

So this is an important, large, and difficult design problem.

My complaints about the web and the web browsers have been about how poorly they were thought about and implemented, and how weak are both the functionalities of web content and the means for going forward and fixing as many of the most critical mistakes as possible.

One way to look at where things are today is that the circumstances of the Internet forced the web browsers to be more and more like operating systems, but without the design and the look-aheads that are needed.

There is now a huge range of conventions both internally and externally, and some of them require and do use a dynamic language. However, neither the architecture of this, nor the form of the language, nor the forms of how one gets to the language, etc. are remotely organized for the end-users. The thresholds are ridiculous when compared to both the needs and the possibilities.
There is now something like a terribly designed OS that is the organizer and provider of “features” for the non-encapsulated web content. This is a disaster of lock-in, and with actually very little bang for the buck.

This was all done after — sometimes considerably after — much better conceptions of what the web experience and powers should be like. It looks like “a hack that grew”, in part because most users and developers were happy with what it did do, and had no idea of what else it should do (and especially the larger destinies of computer media on world-wide networks).

To try to answer the question, let me use “Licklider’s Vision” from the early 60s: “the destiny of computing is to become interactive intellectual amplifiers for all humanity pervasively networked worldwide”.

This doesn’t work if you only try to imitate old media, and especially the difficult to compose and edit properties of old media. You have to include all media that computers can give rise to, and you have to do it in a form that allows both “reading” and “writing” and the “equivalent of literature” for all users.

Examples of how to do some of this existed before the web and the web browser, so what has happened is that a critically weak subset has managed to dominate the imaginations of most people — including computer people — to the point that what is possible and what is needed has for all intents and purposes disappeared.

Footnote about “Ever expanding requirements at Parc” (prompted by Phillip Remaker’s comment and question)
When Gary Starkweather invented and got the first laser printer going very quickly, and at astounding speeds (a page per second, 500 pixels per inch), there was a push to get one of these on the networked Altos (for which the Ethernet had been invented). The idea was to use an Alto as a server that could set up and run a laser printer to rapidly print high quality documents.
Several of the best graphics people at Parc created an excellent “printing standard” for how a document was to be sent to the printer. This data structure was parsed at the printer side and followed to set up printing.
But just a few weeks after this, more document requirements surfaced and with them additional printing requirements.
This led to a “sad realization” that sending a data structure to a server is a terrible idea if the degrees of freedom needed on the sending side are large.
And eventually, this led to a “happy realization”, that sending a program to a server is a very good idea if the degrees of freedom needed on the sending side are large.
John Warnock and Martin Newell were experimenting with a simple flexible language that could express arbitrary resolution independent images, called “JAM” (for “John And Martin”), and it was realized that sending JAM programs (i.e. “real objects”) to the printer was a much better idea than sending a data structure.
This is because a universal interpreter can both be quite small and also can have more degrees of freedom than any data structure (that is not a program). The program has to be run in a protected address space in the printer computer, but it can be granted access to a bit-buffer, and whatever it does to it can then be printed out “blindly”.
This provides a much better match up between a desktop publishing system (which will want to print on any of the printers available, and shouldn’t have to know about their resolutions and other properties), and a printer (which shouldn’t have to know anything about the app that made the document).
“JAM” eventually became PostScript (but that’s another story).
Key Point: “sending a program, not a data structure” is a very big idea (and also scales really well if some thought is put into just how the program is set up).
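
Illustration (not part of Alan Kay's answer): a minimal sketch of the "send a program, not a data structure" idea, assuming a toy stack language invented for this example. The operator names (`pagew`, `pageh`, `rect`, and the arithmetic words) and the function `run_page_program` are hypothetical and are not JAM or PostScript. The sender's program asks the receiving device for its own dimensions, so the same text works on printers of different resolution, and all it can touch is the bit buffer it is handed.

```python
from typing import Callable, Dict, List

Page = List[List[int]]


def run_page_program(source: str, width: int, height: int) -> Page:
    """Run a whitespace-separated stack program against a fresh bit buffer.

    The program sees only its own operand stack and the page buffer, which
    stands in for the protected address space on the printer side.
    """
    page: Page = [[0] * width for _ in range(height)]
    stack: List[int] = []

    def fill_rect() -> None:
        # Operands were pushed as: x y w h. Set those bits, clipped to the page.
        h, w, y, x = stack.pop(), stack.pop(), stack.pop(), stack.pop()
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                page[row][col] = 1

    def binary(op: Callable[[int, int], int]) -> Callable[[], None]:
        # Wrap a two-argument integer function as a stack operator.
        def apply() -> None:
            b, a = stack.pop(), stack.pop()
            stack.append(op(a, b))
        return apply

    ops: Dict[str, Callable[[], None]] = {
        "pagew": lambda: stack.append(width),   # device width in pixels
        "pageh": lambda: stack.append(height),  # device height in pixels
        "add": binary(lambda a, b: a + b),
        "sub": binary(lambda a, b: a - b),
        "mul": binary(lambda a, b: a * b),
        "div": binary(lambda a, b: a // b),
        "rect": fill_rect,
    }

    for token in source.split():
        if token in ops:
            ops[token]()
        else:
            stack.append(int(token))  # anything else must be an integer literal
    return page


# The same program text, sent unchanged to two "printers" of different resolution.
LEFT_HALF = "0 0 pagew 2 div pageh rect"

low_res = run_page_program(LEFT_HALF, width=4, height=2)
high_res = run_page_program(LEFT_HALF, width=8, height=4)
```

Here the sender never states a pixel count, and the receiver knows nothing about the application that wrote the program. A fixed data structure would need a new field for every new layout rule; in this sketch the sender just writes a different program. A real language of this kind would also need loops, procedures, and resource limits, which are left out here.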

Related to

My comments on Alan Kay about Web Browsers as document viewers