A proposal to support hypertext links in ANSI terminals

Kragen Javier Sitaker, 2013-05-17 (updated 2019-12-26) (13 minutes)

We should totally have a way to render HTML-style links in terminal emulators.

Why this is a good idea

Lots of terminal emulators already have a way to recognize URLs in free text so that you can visit them (ctrl-click in gnome-terminal, an option on the dropdown menu in both gnome-terminal and Konsole), which is useful. But if you're chatting over XMPP with someone who's using HTML (like HipChat), they're probably going to embed links from time to time. And they expect that when they say:

We'll go from my house to Facebook at around 10:00.

where "my house" is linked to http://maps.google.com/maps?client=ubuntu&channel=fs&oe=utf-8&q=1456+Edgewood+Dr.,+Palo+Alto,+CA&um=1&ie=UTF-8&hq=&hnear=0x808fbb0d745a65e9:0xfde4b05151805922,1456+Edgewood+Dr,+Palo+Alto,+CA+94301&gl=us&sa=X&ei=bZNCUbbuI62g4AOPyYDQDw&ved=0CDAQ8gEwAA and "Facebook" is linked to http://maps.google.com/maps?q=facebook&hl=en&sll=37.45392,-122.13933&sspn=0.007495,0.013797&gl=us&hq=facebook&t=m&z=16, then you will see something like

We'll go from my house to Facebook at around 10:00.

and not

We'll go from http://maps.google.com/maps?client=ubuntu&channel=fs&oe =utf-8&q=1456+Edgewood+Dr.,+Palo+Alto,+CA&um=1&ie=UTF-8&hq=&hnear=0x80 8fbb0d745a65e9:0xfde4b05151805922,1456+Edgewood+Dr,+Palo+Alto,+CA+9430 1&gl=us&sa=X&ei=bZNCUbbuI62g4AOPyYDQDw&ved=0CDAQ8gEwAAmy house to
http://maps.google.com/maps?q=facebook&hl=en&sll=37.45392,-122.13933&s spn=0.007495,0.013797&gl=us&hq=facebook&t=m&z=16 Facebook at around
10:00.

which is pretty annoying and hard to read.

The proposed solution: ESC [ _ and ESC [ U < url >

We define two new ANSI-compatible escape sequences, which ought to be proposed for the next version of ECMA-48 and the corresponding ISO standard:

As an example, a link labeled "Canonical Hackers" linking to http://canonical.org/ could be represented as follows, with ESC representing the ASCII escape character:

ESC[_Canonical HackersESC[U<http://canonical.org/>

Rationale for the design of the escape sequence

Above and beyond the typographical possibilities of whitespace, we already have boldface, underlining, and different colors in our terminal emulators. These are produced by "escape sequences", invisible sequences of characters typically beginning with the ESC character (character 27) followed by a "[", which change the state of the terminal so that subsequently displayed characters will have a different effect. The ESC[ sequence is called "CSI", "Control Sequence Introducer".

So I think we should define an escape sequence to make the following (or preceding?) sequence of characters into a clickable link to a given URL.

It would be desirable if the escape sequence degraded into something that was human-readable when displayed on terminals that didn't support it. This is only possible to a limited extent, since programs that try to be aware of screen layout will necessarily be confused about where the text is on the screen, but it is still somewhat possible.

ANSI escape sequences can't contain sequences of arbitrary characters, while URLs can contain most characters. Consequently the escape sequences for setting the terminal title aren't ANSI escape sequences: ESC]0;new title^G, where ^G is the BEL character control-G (character 7) and can also be ESC\, and 0 can be replaced with 1 or 2. The ESC] sequence is known as "OSC" or "Operating System Command".

xterm's parsing of the above sequence seems to consume anything following ESC] until the next ^G or ESC\, even if it doesn't start with a digit, which is pretty unfriendly. This suggests that if we wanted to use the same ESC] introduction sequence, incompatible xterm implementations would eat the URL if we terminated it with the same ^G, and other things as well if we used a different terminator. Even if most people are now using other terminal emulators, this seems like a compatibility trap.

Putting the link escape sequence after rather than before the text to be marked up offers a certain kind of safety; it's quite easy for an unterminated color escape sequence to set the next few pages of output to white on white, or flashing, or underlined, or whatnot. It also probably improves matters on terminals that don't yet understand the escape sequence, as they could see "Canonicalhttp://canonical.org/" rather than "http://canonical.orgCanonical". Since you presumably need two escape sequences anyway to indicate the boundary of the link, you might as well put the URL in the final one. So you have one escape sequence to indicate "beginning of link" and another one to indicate "end of link; URL is xyz".

For reasons of tradition and URL-safety, I would like the delimiters (particularly in the gracefully degraded form) to be <>. Ideally the beginning and ending sequences would also have a pleasing visual and memorable symmetry, and would disappear from view entirely in old terminals.

(In the following, ESC means the ASCII character 27, escape.)

This suggests using one of the ASCII matching delimiter pairs [](){}<>`'. [] are right out, since they're already in use. ESC( and ESC) cause rxvt to swallow the following character; I'm not sure what their meaning is, but they seem to be used. Treating `' as a matched pair is sadly out of style, due to the unfortunate but now nearly universal adoption from Microsoft Windows fonts of a vertical '.

ESC{ and ESC} disappear in rxvt and xterm, render as literal ESC character glyphs in gnome-terminal, overlaid on top of the {} in my font, and disappear in konsole while producing warning messages on its stderr. ESC< and ESC> disappear in rxvt; the second disappears in gnome-terminal, while the first renders as a literal ESC character glyph; they disappear in konsole. This suggests that perhaps ESC< and ESC> are already taken.

A little looking around suggests that ESC> is "set numeric keypad mode" aka DECKPNM on VT100s, ESC< is "exit ANSI mode" on VT52s, while ESC( and ESC) are used by VT100s to change character sets (setaltg0, setaltg1, etc.)

ESC} on VT100s is "invoke the G2 character set", according to http://rtfm.etla.org/xterm/ctlseq.html, although that's ignored in xterm and probably all other modern terminal emulators.

So this suggests the following syntax:

ESC{Canonical.ESC}<http://canonical.org/>

Actually, though, you could use valid ANSI escape sequences instead of ESC{ and ESC}. I wanted to use ESC[a for the "begin link" sequence (like <a>), but rxvt already uses it for a nonstandard "move right" sequence, an alternative spelling of ESC[C. (See rxvt-2.6.4/src/command.c:2652, inside process_csi_seq.) The full set of CSI-ending codes handled by rxvt seems to be iAeBCaDEFG`dHfIZJK@LMXPT^ScmnrshltgW, which is to say, @ABCDEFGHIJKLMPSTWXZ^`acdefghilmnrst.

Argh, so what to use for the other escape sequence? Wikipedia says:

For two character sequences, the second character is in the range ASCII 64 to 95 (@ to _). However, most of the sequences are more than two characters, and start with the characters ESC and [ (left bracket). This sequence is called CSI for Control Sequence Introducer (or Control Sequence Initiator). The final character of these sequences is in the range ASCII 64 to 126 (@ to ~).

This suggests that we could use "ESC[", which konsole reports as an "Undecodable sequence" and drops, rxvt apparently drops (and isn't in rxvt's list of CSI codes), xterm and screen drop, and gnome-terminal displays literally. "ESC[" is nice and mnemonic: links are normally underlined.

ESC[U would work for the "end link, begin URL" sequence, which could be followed by the URL wrapped in <>. If the URL is specified to end at the next space or >, then this sequence would be unlikely to inadvertently gobble up a large quantity of text when random data is sent to the terminal that randomly happens to include ESC[U. So that would give us:

ESC[_Canonical.ESC[U<http://canonical.org/>

Implementing the escape sequence

You could write your own terminal emulator to support links, but it probably makes more sense to implement the feature in existing terminal-emulation software. The popular free-software terminal emulators are tmux, screen, gnome-terminal, konsole, rxvt, xterm, Emacs, and whatever Apple ships, plus perhaps implementation in ncurses is necessary for much application software to use it.

I looked through the available file on my Ubuntu box to see what other terminal emulators there are. The relevant popularity metrics from http://popcon.debian.org/main/by_vote are:

#rank name                            inst  vote   old recent no-files (maintainer)
34    libncurses5                    137051 119786  7528  9710    27 (Craig Small)                   
448   libvte9                        65767 30960 26691  8033    83 (Debian Gnome Maintainers)      
499   gnome-terminal                 54405 27886 21381  5118    20 (Debian Gnome Maintainers)      
771   xterm                          77540 17077 50118 10317    28 (Debian X Strike Force)         
918   screen                         45241 12867 30602  1758    14 (Axel Beckert)                  
1078  libvte-2.90-9                  27674  9519 11363  5591  1201 (Debian Gnome Maintainers)      
1182  konsole                        16489  8229  6662  1590     8 (Debian Qt/kde Maintainers)     
1434  emacsen-common                 28896  5699 19961  2805   431 (Rob Browning)                  
1609  xfce4-terminal                  9368  4110  4360   896     2 (Debian Xfce Maintainers)       
2109  tmux                            6908  2154  4244   508     2 (Karl Ferdinand Ebert)          
2285  lxterminal                      5290  1803  2946   541     0 (Debian Lxde Maintainers)       
2739  terminator                      2158  1274   764   119     1 (Nicolas Valcárcel Scerpella)   
2766  yakuake                         2324  1242   972   110     0 (Ana Beatriz Guerrero Lopez)    
3073  guake                           1499   972   449    78     0 (Sylvestre Ledru)               
4913  tilda                            724   324   367    33     0 (Davide Truffa)                 
4920  eterm                           1189   322   787    80     0 (Debian Qa Group)               
5130  terminal.app                     830   294   509    26     1 (Debian Gnustep Maintainers)    
5289  rxvt                            2032   274  1634   124     0 (Jan Christoph Nordholz)        
6432  aterm                            972   180   761    31     0 (Debian Qa Group)               
6948  cutecom                          796   148   596    50     2 (Roman I Khimov)                
7207  gtkterm                          781   135   619    27     0 (Sebastien Bacher)              
7299  sakura                           299   132   155    12     0 (Andrew Starr-bochicchio)       
7376  ajaxterm                         251   128   116     7     0 (Julien Valroff)                
7855  mrxvt                            452   110   320    22     0 (Jan Christoph Nordholz)        
7768  picocom                          535   113   378    43     1 (Matt Palmer)                   
7975  mlterm                           628   106   448    73     1 (Kenshi Muto)                   
8386  fbterm                          1392    95  1097   199     1 (Nobuhiro Iwamatsu)             
8947  roxterm                          470    82    79     5   304 (Tony Houghton)                 
10260 pterm                            345    59   231    55     0 (Colin Watson)                  
10458 jfbterm                         1142    56  1007    78     1 (Debian Qa Group)               
10771 kterm                            560    52   497    11     0 (Ishikawa Mutsumi)              
13843 evilvte                          117    27    87     3     0 (Wen-yen Chuang)                
14563 microcom                         187    24   150    13     0 (Alexander Reichle-schmehl)     
14898 xvt                              212    23   177    11     1 (Sam Hocevar)                   
15979 termit                            71    19    49     3     0 (Thomas Koch)                   
19919 vala-terminal                     44    10    32     2     0 (Debian Freesmartphone.org Team)
24553 xiterm+thai                       36     5    28     3     0 (Neutron Soutmun)               
24634 bogl-bterm                        41     4    32     5     0 (Samuel Thibault)               
25809 pyqonsole                         31     4    24     3     0 (Alexandre Fayolle)             
45654 libterm-vt102-perl                 5     0     5     0     0 (Debian Perl Group)

(I thought Text::CharWidth might be relevant, but it doesn't seem to handle escape sequences anyway.)

Now, libvte9 or libvte-2.90-9 is the actual terminal emulator library that powers a number of the above terminal emulators, at least gnome-terminal, sakura, xfce4-terminal, vala-terminal, tilda, lxterminal, gtkterm, and evilvte. But from looking at the code of gnome-terminal and xfce4-terminal, each separate application built on top of libvte would probably have to write some code to handle clicks on link regions.

It seems that once you add the feature to libncurses5 (in the termcap, at least), libvte9, and gnome-terminal, it's available to 20% of users of Debian and similar systems; if you add it to xterm, you get another 12%, although it's perhaps dubious whether upstream xterm will accept such a patch; adding the feature to screen makes it available to another 9%, although some of them will still be using other terminals (such as MacOS X terminal) to connect to their screen; konsole gets another 6%; emacs ansi-color.el 4%; xfce4-terminal 3%; and tmux 1.6%.

Security

Some escape sequences of the past have been disabled for security reasons in modern software. However, in general, it's safe to launch arbitrary URLs in a normal browser at the moment, or so we believe; so this should be safe. It may, however, result in terminal users getting rickrolled. It would be useful to have a way to see what the linked URL is before you click on it.

Topics