keyboards: apple keyboard | boot camp | ios keyboard

character sets and encodings: ascii | control characters | ansi escapes | unicode

operating systems and applications: mac | windows | editors | emacs | html

non-ascii characters: english punctuation | latin accent | polytonic greek | mathematics | keyboard notation

Notes on how I use my keyboard

Apple Keyboard

apple-keyboard.png

I use the keyboards that come with my laptops. I don't plug in an external keyboard.

I use Apple laptops. The keyboards are almost identical to the keyboard pictured above. The differences are that there are brightness controls for the keyboard at F5 and F6 and the eject key has been replaced by a power key.

Boot Camp

Boot Camp: Apple Keyboard

I run Windows on an Apple laptop using Boot Camp. Here are the Boot Camp mappings:

PC key AK key
Cmd (Ctrl+Esc)
Backspace delete
Enter return
Alt option
AltGr (Ctrl+Alt) ^⌥
Pause/Break fn esc
Insert fn return
Delete fn delete
Home fn←
End fn→
PgUp fn↑
PgDn fn↓
Num Lock fn F6
Print Screen fn⇧F11
Print Active Window fn⇧⌥F11
Scroll Lock fn⇧F12

iOS Keyboard

The iOS software keyboard has three panels. They cover all of the printing ASCII characters except for the backquote. The backquote can be generated by holding down on the apostrophe and selecting from a pop-up list. Some of the other keys have pop-up lists as well.

These pop-ups were added to Mac OS X 10.7. They are available in some applications (Chrome). Others will repeat instead (Terminal.app).

letter panel pop-ups
e e è é ê ë ē ė ę
y ÿ y
u ū ú ù ü û u
i ì į ī í ï î i
o õ ō ø œ ó ò ö ô o
a a à á â ä æ ã å ā
s s ß ś š
l ł l
z z ž ź ż
c c ç ć č
n ń ñ n
number and punctuation panel pop-ups
0 ° 0
- - – — •
$ ¥ € $ ¢ £ ₩
& & §
" « » „ “ ” "
. . …
? ? ¿
! ! ¡
' ` ‘ ’ '
% % ‰

I've successfully copied and pasted other Unicode characters on iOS using the Unicode Consortium Code Charts.

One can add other keyboards to iOS. Use Settings | General | Keyboard | Keyboards. This is also where to remove the Emoji keyboard. I added the Greek keyboard in addition to a U.S. keyboard. When multiple keyboards are chosen in Settings, there is a globe key on the keyboard to switch between them.

I turn off Auto-Capitalization and Auto-Correction.

  • Auto-Capitalization
  • Auto-Correction
  • Check Spelling
  • Enable Caps Lock
  • "." Shortcut

Caps Lock is effected by double tapping the shift key. The "." shortcut is effected by double tapping the space bar. It inserts a period followed by a space.

There is an iOS app called Prompt which serves as an ssh client. The keyboard has an extra row on the top which is always visible. These are the keys:

ESC CTRL TAB  / - | @  ↑ ↓ ← →

By holding the ESC key, one gets a pop-up with META on it.

Holding the arrow keys causes them to repeat.

The keys in the middle (/ - | @) can be changed by pressing and holding them.

Character Sets and Encodings

ASCII is a set of 128 printing and control characters. 8-bit ASCII is a way to represent ASCII characters with bytes. In 8-bit ASCII the most significant bit in a byte is zero. A string which is valid 8-bit ASCII is also valid UTF-8. See man iconv for instructions on how to convert a string in a different encoding to UTF-8.
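A minimal Python sketch of the claims above. It assumes Python 3's built-in codecs as a stand-in for iconv, and the sample strings are made up for illustration.

```python
# Every byte of an ASCII string has its most significant bit clear.
ascii_bytes = "plain text".encode("ascii")
assert all(b < 0x80 for b in ascii_bytes)

# The same bytes decode to the same string as ASCII and as UTF-8.
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# Converting from another encoding (here ISO-8859-1) to UTF-8,
# roughly what `iconv -f ISO-8859-1 -t UTF-8` does.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'
```

The non-ASCII character takes one byte in ISO-8859-1 and two bytes in UTF-8, which is why the source encoding has to be known before converting.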

Printing characters are characters which when typed render a character. Control characters are characters which when typed instruct a device, operating system, or application to perform an action other than render a character. Few applications or devices support all control characters, so in practice many control characters are ignored or cause an error.

ASCII

zone range chars
numeric 32 SPACE
numeric 33-41 ! " # $ % & ' ( )
numeric 42-47 * + , - . /
numeric 48-57 0-9
numeric 58-63 : ; < = > ?
uppercase 64 @
uppercase 65-90 A-Z
uppercase 91-95 [ \ ] ^ _
lowercase 96 `
lowercase 97-122 a-z
lowercase 123-126 { | } ~

Most American keyboards since the DEC vt100 (1978) and the IBM PC (1981) have had all the non-lowercase printing ASCII characters explicitly depicted. Furthermore the characters have been on the same keys since the Mac (1984) and the IBM Model M (1985).

Keys for control characters are more complicated. Broadly, keyboards since the DEC vt100 have dedicated keys for TAB, ESC, LF, and DEL, with a control modifier key which can be used to enter the other control characters. One adds 64 modulo 128 to the control character value to get the printing character whose key is used in conjunction with the control modifier key.
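The add-64-modulo-128 rule can be sketched in Python. The function names here are my own, chosen for illustration.

```python
def control_key(code: int) -> str:
    """Return the printing character whose key, pressed with the
    control modifier, produces the ASCII control character `code`."""
    if not (0 <= code < 32 or code == 127):
        raise ValueError("not an ASCII control character: %r" % code)
    # Add 64 modulo 128: NUL (0) maps to '@', DEL (127) wraps to '?'.
    return chr((code + 64) % 128)


def caret(code: int) -> str:
    """Write a control character in caret notation, e.g. ^C."""
    return "^" + control_key(code)


print(caret(0))    # ^@  (NUL)
print(caret(9))    # ^I  (TAB)
print(caret(27))   # ^[  (ESC)
print(caret(127))  # ^?  (DEL)
```

The modulo only matters for DEL, which is the one control character above the printing range.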

On DOS and Windows, the Enter key maps to a CR LF character sequence. The IBM PC had both a backspace (BS) and delete (DEL) key. The original ASCII interpretation for these characters was that BS was to be used to back up and overstrike, whereas DEL was used to remove the previous character. In IBM PC usage BS removes the previous character and DEL removes the following character.

On the original Mac there was no ESC or control key. There was, however, both a Return and an Enter key which mapped to CR and LF respectively. CR was used as newline on the original Mac OS, and LF sometimes had an application specific interpretation. Escape and control keys were added to the Mac by 1986.

Control Characters

Sometimes it is necessary to use printing characters to refer to control characters. There are several schemes for doing this. These schemes are ambiguous and rely on the reader to determine whether control characters or printing characters are the intended meaning. The space character is a printing character, but it is useful to have separate notation for it as if it were a control character. The ASCII standard provides codes of two to three printing characters for all of the control characters. The Unicode standard has special characters starting at U+2400 which combine the ASCII codes into a single character.

ASCII Unix Emacs Microsoft C string
NUL ^@ C-@ \000
SOH ^A C-a CTRL+A \001
STX ^B C-b CTRL+B \002
ETX ^C C-c CTRL+C \003
EOT ^D C-d CTRL+D \004
ENQ ^E C-e CTRL+E \005
ACK ^F C-f CTRL+F \006
BEL ^G C-g CTRL+G \a
BS ^H C-h BACKSPACE (CTRL+H) \b
TAB ^I TAB (C-i) TAB (CTRL+I) \t
LF ^J RET (C-j) ENTER (CTRL+J) \n
VT ^K C-k CTRL+K \v
FF ^L C-l CTRL+L \f
CR ^M C-m CTRL+M \r
SO ^N C-n CTRL+N \016
SI ^O C-o CTRL+O \017
DLE ^P C-p CTRL+P \020
DC1 (XON) ^Q C-q CTRL+Q \021
DC2 ^R C-r CTRL+R \022
DC3 (XOFF) ^S C-s CTRL+S \023
DC4 ^T C-t CTRL+T \024
NAK ^U C-u CTRL+U \025
SYN ^V C-v CTRL+V \026
ETB ^W C-w CTRL+W \027
CAN ^X C-x CTRL+X \030
EM ^Y C-y CTRL+Y \031
SUB ^Z C-z CTRL+Z \032
ESC ^[ ESC (C-[) ESC \033
FS ^\ C-\ \034
GS ^] C-] \035
RS ^^ C-^ \036
US ^_ C-_ \037
SP SPC SPACEBAR
DEL ^? DEL DELETE \177
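The Unicode characters starting at U+2400 make it possible to show control characters visibly. Here is a small Python sketch; the function names are hypothetical, and it relies on the block layout in which U+2400 through U+241F mirror the control codes 0 through 31, U+2420 stands for space, and U+2421 stands for DEL.

```python
def control_picture(ch: str) -> str:
    """Map a control character (or space) to its single-character picture."""
    code = ord(ch)
    if code < 0x20:
        return chr(0x2400 + code)  # U+2400..U+241F mirror codes 0..31
    if code == 0x20:
        return "\u2420"            # SYMBOL FOR SPACE
    if code == 0x7F:
        return "\u2421"            # SYMBOL FOR DELETE
    return ch                      # printing characters are left alone


def visible(text: str) -> str:
    """Replace every control character and space in text with its picture."""
    return "".join(control_picture(c) for c in text)


print(visible("a\tb\n"))
```

Whether the pictures are legible depends on the font, since each one squeezes two or three letters into a single character cell.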

"Unix" notation predates Unix, since it was used in ITS operating system documentation in 1969. The Unix notation is used by Mac OS X and we regard it as the preferred notation. We use Emacs notation in the context of text editors and Microsoft notation in the context of Windows, however.

ANSI Escapes

ECMA-48 (pdf)

escape sequences for cursor movement and screen clearing
sequence rendering
ESC [ A
ESC [ n A
move cursor up one or n rows
ESC [ B
ESC [ n B
move cursor down one or n rows
ESC [ C
ESC [ n C
move cursor forward one or n columns
ESC [ D
ESC [ n D
move cursor back one or n columns
ESC [ E
ESC [ n E
move cursor to beginning of line down one or n rows
ESC [ F
ESC [ n F
move cursor to beginning of line up one or n rows
ESC [ n G move cursor to column n
ESC [ n ; m H move cursor to row n, column m
ESC [ 0 J clear screen from cursor to end
ESC [ 1 J clear screen from cursor to beginning
ESC [ 2 J clear entire screen
ESC ] 0 ; name ^G set terminal name

To create an ESC when editing with emacs, type C-q ESC. To create an ESC when editing with vim, type Ctrl-V ESC in insert mode.
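The cursor movement sequences can also be generated from a program rather than typed. A minimal Python sketch follows; the helper names are my own, and the terminal is assumed to honor these sequences.

```python
ESC = "\x1b"     # the ESC control character (^[)
CSI = ESC + "["  # written "ESC [" in the table above


def cursor_up(n: int = 1) -> str:
    return f"{CSI}{n}A"


def cursor_down(n: int = 1) -> str:
    return f"{CSI}{n}B"


def cursor_to(row: int, col: int) -> str:
    """Move the cursor to the given row and column."""
    return f"{CSI}{row};{col}H"


def set_terminal_name(name: str) -> str:
    """ESC ] 0 ; name ^G, where ^G is the BEL character."""
    return f"{ESC}]0;{name}\x07"


# Show the raw sequences instead of sending them to the terminal.
print(repr(cursor_up(3)))      # '\x1b[3A'
print(repr(cursor_to(5, 10)))  # '\x1b[5;10H'
```

To act on the terminal, write the returned strings to standard output without a trailing newline.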

[[table class="wiki-content-table"]]
[[row]]
[[cell style="border: 1px solid black; background-color: #EEE"]]sequence[[/cell]]
[[cell style="border: 1px solid black; background-color: #EEE"]]rendering[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 0 m
ESC [ m[[/cell]]
[[cell style="border: 1px solid black"]]
normal
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 1 m
[[/cell]]
[[cell style="border: 1px solid black"]]
bold
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 3 m
[[/cell]]
[[cell style="border: 1px solid black"]]
italic
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 4 m
[[/cell]]
[[cell style="border: 1px solid black"]]
underlined
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 7 m
[[/cell]]
[[cell style="border: 1px solid black; color: white; background-color: black"]]
reverse video
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 30 m
[[/cell]]
[[cell style="border: 1px solid black"]]
black text color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 31 m
[[/cell]]
[[cell style="border: 1px solid black"]]
red text color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 32 m
[[/cell]]
[[cell style="border: 1px solid black"]]
green text color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 33 m
[[/cell]]
[[cell style="border: 1px solid black"]]
yellow text color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 34 m
[[/cell]]
[[cell style="border: 1px solid black"]]
blue text color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 35 m
[[/cell]]
[[cell style="border: 1px solid black"]]
magenta text color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 36 m
[[/cell]]
[[cell style="border: 1px solid black"]]
cyan text color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 37 m
[[/cell]]
[[cell style="border: 1px solid black"]]
white text color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 40 m
[[/cell]]
[[cell style="border: 1px solid black; color: white; background-color: black"]]
black background color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 41 m
[[/cell]]
[[cell style="border: 1px solid black; background-color: red"]]
red background color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 42 m
[[/cell]]
[[cell style="border: 1px solid black; background-color: green"]]
green background color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 43 m
[[/cell]]
[[cell style="border: 1px solid black; background-color: yellow"]]
yellow background color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 44 m
[[/cell]]
[[cell style="border: 1px solid black; background-color: blue"]]
blue background color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 45 m
[[/cell]]
[[cell style="border: 1px solid black; background-color: magenta"]]
magenta background color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 46 m
[[/cell]]
[[cell style="border: 1px solid black; background-color: cyan"]]
cyan background color
[[/cell]]
[[/row]]
[[row]]
[[cell style="border: 1px solid black"]]
ESC [ 47 m
[[/cell]]
[[cell style="border: 1px solid black; background-color: white"]]
white background color
[[/cell]]
[[/row]]
[[/table]]
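The rendering sequences in the table can be composed in code. This Python sketch uses hypothetical helper names. Joining several codes with semicolons in one sequence is not shown in the table; it is an assumption based on ECMA-48 that most terminals accept.

```python
ESC = "\x1b"
COLORS = ["black", "red", "green", "yellow",
          "blue", "magenta", "cyan", "white"]


def sgr(*codes: int) -> str:
    """Build ESC [ code ; code ... m from the numeric codes in the table."""
    return ESC + "[" + ";".join(str(c) for c in codes) + "m"


def styled(text, fg=None, bg=None, bold=False, underline=False) -> str:
    """Wrap text in rendering sequences and reset to normal afterwards."""
    codes = []
    if bold:
        codes.append(1)
    if underline:
        codes.append(4)
    if fg is not None:
        codes.append(30 + COLORS.index(fg))  # 30-37: text color
    if bg is not None:
        codes.append(40 + COLORS.index(bg))  # 40-47: background color
    return sgr(*codes) + text + sgr(0)


print(repr(styled("error", fg="red", bold=True)))  # '\x1b[1;31merror\x1b[0m'
```

Ending with ESC [ 0 m keeps the styling from leaking into whatever the terminal prints next.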

Unicode

Unicode characters have a Unicode point which is a number between 0 and 1114111 inclusive. Unicode characters with a point between 0 and 65535 are said to be in the Basic Multilingual Plane (BMP). Unicode points are usually written with hex notation, in which case the BMP points range from U+0000 to U+FFFF. In this notation the highest Unicode point is U+10FFFF.

To enter a Unicode character by point on Mac, switch to Unicode Hex Input. Hold down the option key and type in the four hex digit representation. How to enter characters outside the BMP?

Because it relies on the option key, Unicode Hex Input does not work with Emacs or Terminal with Use option as meta key set.

In Emacs use the keybinding C-x 8 RET or M-x ucs-insert to insert a Unicode character by point.

On Windows, use WordPad. Type the hex value for a point, then Alt+X, and the hex value will be converted to the character. Then copy-and-paste the text to another application.

Unicode 6.2 has 110,182 encoded characters, 137,468 code points reserved for private use, 2,048 code points for surrogates, and 66 code points for non-characters. Non-characters in the BMP are U+FFFE and U+FFFF and the range U+FDD0..U+FDEF.
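The surrogate and non-character counts can be checked with a short Python sketch. The function names are hypothetical. It assumes the standard definitions: surrogates are U+D800..U+DFFF, and non-characters are U+FDD0..U+FDEF plus the last two points of each of the 17 planes.

```python
MAX_POINT = 0x10FFFF  # 1114111


def is_bmp(point: int) -> bool:
    return 0 <= point <= 0xFFFF


def is_surrogate(point: int) -> bool:
    return 0xD800 <= point <= 0xDFFF


def is_noncharacter(point: int) -> bool:
    # U+FDD0..U+FDEF, or a point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= point <= 0xFDEF or (point & 0xFFFE) == 0xFFFE


def u_plus(point: int) -> str:
    """Format a point in the usual U+XXXX hex notation."""
    return f"U+{point:04X}"


points = range(MAX_POINT + 1)
print(sum(is_surrogate(p) for p in points))     # 2048
print(sum(is_noncharacter(p) for p in points))  # 66
print(u_plus(MAX_POINT))                        # U+10FFFF
```

The 66 non-characters break down as 32 in the U+FDD0 range plus 2 at the end of each of the 17 planes.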

The fields in [ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt UnicodeData.txt] are

  • Point
  • Name
  • General_Category
  • Canonical_Combining_Class
  • Bidi_Class
  • Decomposition_Type/Decomposition_Mapping
  • Numeric_Type/Numeric_Value
  • Bidi_Mirrored
  • Unicode_1_Name (obsolete)
  • ISO_Comment (obsolete)
  • Simple_Uppercase_Mapping
  • Simple_Lowercase_Mapping
  • Simple_Titlecase_Mapping
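A line of UnicodeData.txt is a semicolon-separated record, so it is easy to parse. The Python sketch below uses my own record and function names. It assumes each line has 15 fields, with the numeric information spread over three of them, and the sample line is written in the file's format for U+0041.

```python
from typing import NamedTuple, Optional, Tuple


class Record(NamedTuple):
    point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str
    decomposition: str
    numeric: Tuple[str, str, str]  # the three numeric fields, unparsed
    bidi_mirrored: bool
    simple_uppercase: Optional[int]
    simple_lowercase: Optional[int]
    simple_titlecase: Optional[int]


def parse_line(line: str) -> Record:
    f = line.rstrip("\n").split(";")
    if len(f) != 15:
        raise ValueError("expected 15 fields, got %d" % len(f))

    def opt(hex_point: str) -> Optional[int]:
        return int(hex_point, 16) if hex_point else None

    # Fields 10 and 11 are the two obsolete fields and are skipped.
    return Record(int(f[0], 16), f[1], f[2], int(f[3]), f[4], f[5],
                  (f[6], f[7], f[8]), f[9] == "Y",
                  opt(f[12]), opt(f[13]), opt(f[14]))


sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
rec = parse_line(sample)
print(rec.name, hex(rec.simple_lowercase))  # LATIN CAPITAL LETTER A 0x61
```

Empty case mapping fields mean the character maps to itself, which is why they are parsed as optional.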

If you need to handle combining characters or bidirectional text there are more properties to be aware of.

Unicode Newline Guidelines

Operating Systems and Applications

Shortcuts I find useful:

Shortcuts by Operating System and Application

Mac

Custom key bindings and shortcuts hamper communication and cause disorientation when using other people's setups. That said, I map the caps lock key to the control key.

To map caps lock to control on Mac OS X go to:

System Preferences | Keyboard | Keyboard | Modifier Keys...

In Terminal.app on Mac OS X I check this checkbox:

Preferences... | Keyboard | Use option as meta key

This makes it easier to use Emacs in Terminal.app. The following meta keystrokes are useful for line-mode editing: M-b M-f M-d M-DEL M-l M-u. Also, less has the Emacs binding for M-v. However, using option as a meta key disables option shortcuts for non-ASCII characters. I would like a keystroke shortcut to toggle the Use option as meta key preference, but it doesn't appear to be exposed as a property to AppleScript.

I define the following custom shortcuts on Mac OS X:

keystroke action software
^⌥⌘M maximize window divvy
^⌥⌘← put window to left divvy
^⌥⌘→ put window to right divvy
⌥⌘Space switch input mode

On Mac OS X I use these three input sources:

Differences between ABC Extended and U.S. International:

abc_extended.jpg
us_international.jpg

Windows

To map caps lock to control on Windows add the following to the Registry:

key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout
name Scancode Map
type REG_BINARY
data 00,00,00,00,00,00,00,00,02,00,00,00,1d,00,3a,00,00,00,00,00
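The Scancode Map data is a small little-endian binary structure. This Python sketch rebuilds the value above. The function name and the field interpretation (two zeroed header words, an entry count that includes the terminator, then pairs of new and original scancodes) are my reading of the format, so treat them as assumptions.

```python
import struct

LEFT_CTRL = 0x001D  # scancode the key should send
CAPS_LOCK = 0x003A  # scancode of the physical key being remapped


def scancode_map(mappings):
    """Build Scancode Map bytes from (new_scancode, original_scancode) pairs."""
    data = struct.pack("<II", 0, 0)               # version and flags, both zero
    data += struct.pack("<I", len(mappings) + 1)  # entries, counting the terminator
    for new, original in mappings:
        data += struct.pack("<HH", new, original)
    data += struct.pack("<I", 0)                  # null terminator entry
    return data


data = scancode_map([(LEFT_CTRL, CAPS_LOCK)])
print(",".join(f"{b:02x}" for b in data))
# 00,00,00,00,00,00,00,00,02,00,00,00,1d,00,3a,00,00,00,00,00
```

Additional remappings are extra pairs in the list; the count field grows with them.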

On Windows I use these three input languages and keyboards:

  • ENG - US
  • FRA - United States-International
  • ΕΛ - Greek Polytonic

I run Windows under Boot Camp, but I don't use the Apple keyboards. Use Shift+Space to change the input method. The United States-International input uses combining keystrokes:

input character
'a á
'c ç
`a à
"a ä
^a â
~a ã

The best way to get a combining character as a standalone seems to be to press it twice and then delete one.

The United States-International input also uses the Right-Alt key as a special modifier key:

Right-Alt+s ß
Right-Alt+1 ¡
Right-Alt+/ ¿

There are others, but I'm not aware of a way to preview them.

Editors

Editor Key Bindings

Emacs

I use the following custom key bindings in Emacs:

keystroke binding default binding
C-c b M-x revert-buffer
C-c c M-x clipboard-yank
C-c d M-x ido-dired
C-c f (defun display-buffer-file-name ()
  (interactive)
  (message "%s" buffer-file-name))
C-c r M-x query-replace
C-c v M-x clipboard-kill-ring-save
C-c x M-x clipboard-kill-region
C-x b M-x ido-switch-buffer M-x switch-to-buffer
C-x C-b M-x ibuffer M-x list-buffers
C-x C-f M-x ido-find-file M-x find-file
C-x C-i M-x ido-insert-file M-x insert-file
C-x C-w M-x ido-write-file M-x write-file

The Windows keybindings for copy Ctrl+C, paste Ctrl+V, and cut Ctrl+X conflict with the Emacs bindings. Hence the custom bindings C-c c, C-c v, and C-c x.

Use C-\ or C-x RET C-\ to enable or disable the Emacs input method. One can use M-x list-input-methods to see all the available input methods. I use these input methods:

  • latex
  • latin-prefix
  • rfc1345
  • greek

If an input method is currently enabled, use C-h I to see the documentation for it.

When I run Emacs on Mac, here is how the modifier keys are set:

variable setting
mac-control-modifier control
mac-right-control-modifier left
mac-option-modifier meta
mac-right-option-modifier left
mac-command-modifier super
mac-right-command-modifier left
mac-function-modifier none

Because the option key is bound to meta, it is not possible to use the option key to enter Latin accent characters in the customary Mac manner. I never use the right option key as a meta key, however. Setting mac-right-option-modifier to nil means that the right option key can be used to enter Latin accent characters. The other option is to use the latin-prefix input method.

The input method rfc1345 can be used to put macrons on the vowels of Latin text. RFC 1345 is a scheme for representing a variety of non-ASCII characters, including Latin accents, Greek, Cyrillic, Hebrew, Arabic, Hiragana and Katakana, using two character ASCII sequences.

One can use greek for monotonic Greek and greek-babel for polytonic Greek. greek-babel doesn't use the standard Greek keyboard layout, however. θ is bound to J instead of U, for example. Hence I prefer to use the Mac or Windows input method.

Use the keybinding C-x 8 RET or M-x ucs-insert to insert a Unicode character by name or by point in hexadecimal.

HTML

We are told that an HTML document should always declare its character encoding. In HTML4 a document which does not permit deprecated tags starts with this:

<!DOCTYPE HTML
  PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <meta http-equiv="Content-type" content="text/html;charset=UTF-8">

In HTML5 a document starts with this:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">

HTML document character encodings are seemingly paradoxical, since the reader must already know the character encoding to read the document in the first place. Some readers may try decoding an HTML document with multiple encodings. Such readers won't decode the entire document, so the declaration is supposed to be in the first 1024 bytes.
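Here is a rough Python sketch of how a reader might look for the declaration in the first 1024 bytes. The regular expression, the function name, and the ISO-8859-1 fallback are my own simplifications; real browsers follow a much more elaborate algorithm.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-type" content="text/html;charset=...">.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def sniff_charset(document: bytes, default: str = "iso-8859-1") -> str:
    """Guess the declared encoding from the first 1024 bytes only."""
    match = META_CHARSET.search(document[:1024])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


html5 = b'<!DOCTYPE html>\n<html>\n  <head>\n    <meta charset="UTF-8">'
html4 = (b'<html>\n  <head>\n    <meta http-equiv="Content-type" '
         b'content="text/html;charset=UTF-8">')

print(sniff_charset(html5))  # utf-8
print(sniff_charset(html4))  # utf-8
```

This works because the declaration uses only ASCII characters, which have the same bytes in most encodings.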

HTML 1.0 had the character entity references &lt; &gt; and &amp; since the characters <, > and & are interpreted as markup. HTML 2.0 added character entity references for the 96 upper 8-bit ISO-8859-1 characters. In addition to alphabetical codes, the ISO-8859-1 numeric code could be used: &#NUM; The numeric codes are decimal. There are 252 character entity references in HTML4, and in HTML4 the numeric codes are Unicode points. HTML5 defines many more character entity references.
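Python's standard html module can illustrate how character entity references and numeric codes resolve. This is only a sketch; the sample strings are made up.

```python
import html
from html.entities import name2codepoint

# The three markup characters must be escaped in text content.
print(html.escape("a < b & c > d"))  # a &lt; b &amp; c &gt; d

# A named reference, a decimal code, and a hex code for the same character.
print(html.unescape("&eacute; &#233; &#xE9;"))  # é é é

# The alphabetical code maps to a Unicode point.
print(hex(name2codepoint["eacute"]))  # 0xe9
```

The decimal code 233 and the hex code E9 are the same point written in two bases.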

XML documents do not have to have their encoding declared if they are UTF-8 or UTF-16. UTF-16 must have a byte-order mark. Here is how to declare the encoding:

<?xml version="1.0" encoding="UTF-8"?>

The default character encoding for HTTP 1.1 is ISO-8859-1. Servers which return Unicode documents must declare the character encoding in the response header:

Content-Type: text/html; charset=utf-8
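A client can read the charset parameter from the header with Python's standard email package. A minimal sketch follows; the function name is hypothetical, and the ISO-8859-1 fallback follows the HTTP 1.1 default described above.

```python
from email.message import Message


def charset_of(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset() or default


print(charset_of("text/html; charset=utf-8"))  # utf-8
print(charset_of("text/html"))                 # iso-8859-1
```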

English Punctuation

Non-ASCII punctuation necessary to correctly typeset English.

To enter these characters in Emacs or Windows, memorize the hex Unicode points. In Emacs use C-x 8 RET POINT RET. On Windows, use WordPad. Type the hex value, then use Alt+X. Then cut-and-paste the text to another application.

Windows traditionally uses Alt and the numeric keypad to enter these characters, but this is not possible when running Windows under Boot Camp.

english punctuation
chr mac os x (us extended) unicode name unicode point html entity
‘ ⌥] LEFT SINGLE QUOTATION MARK 2018 &lsquo;
’ ⇧⌥] RIGHT SINGLE QUOTATION MARK 2019 &rsquo;
“ ⌥[ LEFT DOUBLE QUOTATION MARK 201C &ldquo;
” ⇧⌥[ RIGHT DOUBLE QUOTATION MARK 201D &rdquo;
– ⌥- EN DASH 2013 &ndash;
— ⇧⌥- EM DASH 2014 &mdash;
… ⌥; HORIZONTAL ELLIPSIS 2026 &hellip;

The Chicago Manual of Style, 16th Ed.:

Published works should use directional (or “smart”) quotation marks, sometimes called typographer’s or “curly” quotation marks.

All software also includes a “default” mark ("); in published prose this unidirectional mark, far more portable than typographer's marks, nonetheless signals a lack of typographical sophistication. Proper directional characters should also be used for single quotation marks (‘’).

Published works should use directional (or “smart”) apostrophes.

The apostrophe is the same character as the right single quotation mark. Thanks to the limitations of conventional keyboards and many software programs, the apostrophe has been one of the most abused marks in punctuation—especially in the last generation or so. There are two common pitfalls: using the “default” unidirectional mark ('), on the one hand, and using the left single quotation mark, on the other. The latter usage in particular should always be construed as an error.

I prefer software which does not replace unidirectional marks with directional marks automatically. When using Google Docs, ⌘Z (Ctrl-Z on Windows) will sometimes undo a smart quotes conversion.

Hyphens are used in:

  • compound words
  • separators in telephone numbers, social security numbers, ISBNs, and spelled out words

En dashes are used in:

  • numeric, time, and date ranges, including unfinished ranges

Em dashes are used to:

  • set off an amplifying or explanatory clause
  • mark the end of an incomplete sentence

In source code the hyphen is used as a minus sign. In typeset mathematics the minus sign is a distinct character.

The Chicago Manual of Style, 16th Ed.:

An ellipsis is the omission of a word, phrase, line, paragraph, or more from a quoted passage. . . . Chicago style is to indicate such omissions by the use of three spaced periods rather than by another device such as asterisks.

Latin Accent

My default input source on Mac OS X is U.S. Extended.

In Emacs I use the input method latin-prefix to enter Latin letters with accents.

On Windows I use United States-International.

The Emacs and Windows input methods use modifying prefixes. To enter the prefix character literally, type Space after the character.

For characters which do not have Emacs or Windows prefix sequences, use C-x 8 RET POINT RET (Emacs) or POINT Alt+X (WordPad).

To get the Mac keybindings which are available using the option key, use the Keyboard Viewer, which I have bound to ^⌥⌘K.

In Emacs, browse the current input method bindings with C-h I.
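The unicode name, unicode point, and html entity columns in the tables can be looked up programmatically. Here is a Python sketch with a hypothetical helper name. It assumes the HTML4 entity set shipped in the standard library, so characters without an HTML4 entity get an empty column.

```python
import unicodedata
from html.entities import codepoint2name


def table_row(ch: str):
    """Return (chr, unicode name, unicode point, html entity) for one character."""
    point = ord(ch)
    entity = codepoint2name.get(point)
    return (ch,
            unicodedata.name(ch),
            f"{point:04X}",
            f"&{entity};" if entity else "")


for ch in "\u00e1\u00c1\u0153\u0101":
    print(*table_row(ch))
```

For á this gives LATIN SMALL LETTER A WITH ACUTE, 00E1, and &aacute;, while ā has a name and point but no HTML4 entity.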

vowels
chr mac os x (us extended) emacs (latin-prefix) windows (us-intl) unicode name unicode point html entity
á ⌥e a 'a 'a LATIN SMALL LETTER A WITH ACUTE 00E1 &aacute;
Á ⌥e A 'A 'A LATIN CAPITAL LETTER A WITH ACUTE 00C1 &Aacute;
é ⌥e e 'e 'e LATIN SMALL LETTER E WITH ACUTE 00E9 &eacute;
É ⌥e E 'E 'E LATIN CAPITAL LETTER E WITH ACUTE 00C9 &Eacute;
í ⌥e i 'i 'i &iacute;
Í ⌥e I 'I 'I &Iacute;
ó ⌥e o 'o 'o &oacute;
Ó ⌥e O 'O 'O &Oacute;
ú ⌥e u 'u 'u &uacute;
Ú ⌥e U 'U 'U &Uacute;
à ⌥` a `a `a &agrave;
À ⌥` A `A `A &Agrave;
è ⌥` e `e `e &egrave;
È ⌥` E `E `E &Egrave;
ù ⌥` u `u `u &ugrave;
Ù ⌥` U `U `U &Ugrave;
â ⌥^ a ^a ^a &acirc;
Â ⌥^ A ^A ^A &Acirc;
ê ⌥^ e ^e ^e &ecirc;
Ê ⌥^ E ^E ^E &Ecirc;
î ⌥^ i ^i ^i &icirc;
Î ⌥^ I ^I ^I &Icirc;
ô ⌥^ o ^o ^o &ocirc;
Ô ⌥^ O ^O ^O &Ocirc;
û ⌥^ u ^u ^u &ucirc;
Û ⌥^ U ^U ^U &Ucirc;
œ ⌥q /o2 LATIN SMALL LIGATURE OE 0153 &oelig;
Œ ⌥Q /O2 LATIN CAPITAL LIGATURE OE 0152 &OElig;
ä ⌥u a "a "a &auml;
Ä ⌥u A "A "A &Auml;
ë ⌥u e "e "e &euml;
Ë ⌥u E "E "E &Euml;
ï ⌥u i "i "i &iuml;
Ï ⌥u I "I "I &Iuml;
ö ⌥u o "o "o &ouml;
Ö ⌥u O "O "O &Ouml;
ü ⌥u u "u "u &uuml;
Ü ⌥u U "U "U &Uuml;
ÿ ⌥u y "y "y &yuml;
Ÿ ⌥u Y "Y LATIN CAPITAL LETTER Y WITH DIAERESIS 0178 &Yuml;
vowels w/ macrons
chr mac os x (us extended) emacs (rfc1345) windows (us-intl) unicode name unicode point html entity
ā ⌥a a &a- LATIN SMALL LETTER A WITH MACRON 0101
Ā ⌥a A &A- LATIN CAPITAL LETTER A WITH MACRON 0100
ē ⌥a e &e- LATIN SMALL LETTER E WITH MACRON 0113
Ē ⌥a E &E- LATIN CAPITAL LETTER E WITH MACRON 0112
ī ⌥a i &i- LATIN SMALL LETTER I WITH MACRON 012B
Ī ⌥a I &I- LATIN CAPITAL LETTER I WITH MACRON 012A
ō ⌥a o &o- LATIN SMALL LETTER O WITH MACRON 014D
Ō ⌥a O &O- LATIN CAPITAL LETTER O WITH MACRON 014C
ū ⌥a u &u- LATIN SMALL LETTER U WITH MACRON 016B
Ū ⌥a U &U- LATIN CAPITAL LETTER U WITH MACRON 016A
ȳ ⌥a y LATIN SMALL LETTER Y WITH MACRON 0233
Ȳ ⌥a Y LATIN CAPITAL LETTER Y WITH MACRON 0232
consonants
chr mac os x (us extended) emacs (latin-prefix) windows (us-intl) unicode name unicode point html entity
ç ⌥c c ~c Alt+Ctrl+, LATIN SMALL LETTER C WITH CEDILLA 00E7 &ccedil;
Ç ⌥c C ~C Alt+Ctrl+Shift+, LATIN CAPITAL LETTER C WITH CEDILLA 00C7 &Ccedil;
ñ ⌥n n ~n ~n LATIN SMALL LETTER N WITH TILDE 00F1 &ntilde;
Ñ ⌥n N ~N ~N LATIN CAPITAL LETTER N WITH TILDE 00D1 &Ntilde;
ß ⌥s "s Alt+Ctrl+s LATIN SMALL LETTER SHARP S 00DF &szlig;
non-english punctuation
chr mac os x (us extended) emacs (latin-prefix) windows (us-intl) unicode name unicode point html entity
¡ ⌥1 ~! Alt+Ctrl+1 INVERTED EXCLAMATION MARK 00A1 &iexcl;
« ⌥\ ~< Alt+Ctrl+[ LEFT-POINTING DOUBLE ANGLE QUOTATION MARK 00AB &laquo;
» ⇧⌥\ ~> Alt+Ctrl+] RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK 00BB &raquo;
¿ ⇧⌥/ ~? Alt+Ctrl+/ INVERTED QUESTION MARK 00BF &iquest;
‚ ⇧⌥0 SINGLE LOW-9 QUOTATION MARK 201A &sbquo;
‘ ⌥] LEFT SINGLE QUOTATION MARK 2018 &lsquo;
„ ⇧⌥, DOUBLE LOW-9 QUOTATION MARK 201E &bdquo;
“ ⌥[ LEFT DOUBLE QUOTATION MARK 201C &ldquo;
ª ⌥9 _a FEMININE ORDINAL INDICATOR 00AA &ordf;
º ⌥0 _o MASCULINE ORDINAL INDICATOR 00BA &ordm;

French

The French ordinal adjectives are

  • premier, première
  • deuxième
  • troisième

The abbreviated forms are 1er, 1re, 2e, and 3e. There are no Unicode characters for the superscripts.

Dieresis (Fr: tréma) is used in French to indicate two separate vowels that would otherwise be interpreted as a diphthong: Noël, naïf. Dieresis on a y is rare: L'Haÿ-les-Roses. The dieresis goes on the second vowel.

The letter g is a fricative before the letters e and i and a stop before the other vowels. A silent u is used to indicate when in fact the g is pronounced as a stop: givre (frost) and guivre (archaic term for a serpent). A dieresis is used if the u is not silent: aigüe (feminine form of acute). Before the spelling reform of 1990 the dieresis was written over the silent e: aiguë.

A few common French words use the œ digraph: mœurs, cœur, sœur, œuf, œuvre, œil. The œu vowel combination is a rounded front vowel like the German ö. œil is pronounced as a rounded front vowel followed by a palatal approximant and perhaps could be regarded as a diphthong.

The letter c is a velar stop before the vowels a, o, and u and an unvoiced dental sibilant before the vowels e and i. The c with cédille ç can be used before a, o, and u to represent an unvoiced dental sibilant. s is a voiced dental sibilant.

Accented letters, including c with cédille, appear at the same place as the unaccented letters in the collation order. If two words differ only in the presence of an accent, the accented word appears second in the collation order.

The French use guillemets « » to quote speech. The guillemets are separated from the interior text by space:

« Voulez-vous un sandwich, Henri ? »

They use the comma instead of the period as the decimal mark. The digits of large numbers can be set off in threes using spaces, e.g. 1 000 000.

German

The German name for the letter ß is (das) Eszett. ß and ss are used for the same unvoiced sibilant (English s). ß is used after long vowels and diphthongs and ss is used after short vowels. Before the spelling reform of 1996 ß was always used word-finally.

A single s is used for the voiced sibilant (English z), but note that voiced consonants do not occur word-finally in German and the German article das is an orthographic exception.

The letter ß and the unvoiced sibilant it represents do not occur word-initially, and there is no uppercase version of the letter. When a word is put in all caps, ß is replaced by SS. Before the spelling reform of 1996 SZ was used.

When German must be written in ASCII, ß is replaced by ss and ä, ö, and ü are replaced by ae, oe, and ue.
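The ASCII replacements can be applied mechanically. A minimal Python sketch follows; the mapping for the uppercase umlauts is my own assumption, since only the lowercase forms are listed above.

```python
ASCII_FALLBACK = {
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue", "\u00df": "ss",
    # Assumed by analogy with the lowercase forms:
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
}


def to_ascii(text: str) -> str:
    """Replace German umlauts and Eszett with their ASCII spellings."""
    return "".join(ASCII_FALLBACK.get(ch, ch) for ch in text)


print(to_ascii("G\u00e4nsef\u00fc\u00dfchen"))  # Gaensefuesschen

# Python's own case mapping also turns the Eszett into SS in all caps.
print("Fu\u00df".upper())  # FUSS
```

The conversion is lossy: an ss or ae in the output cannot be reliably turned back into ß or ä.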

An example of how to use quotes in German:

Er fragt: „Wie sagt man ‚foobar‘ auf Deutsch?“

The quotes are called Gänsefüßchen. Germans use the English left quotes on the right. Some fonts such as Verdana have English left quotes which look incorrect when used on the right in German.

Germans write numbers the same way as the French.

Latin

For pedagogical purposes macrons are used to distinguish long vowels from short vowels.

The classical Romans used a mark called the apex for this purpose over A, E, O, and V. The apex is not dissimilar to an acute accent. A long I was indicated by making the letter taller.

Spanish

Spanish has masculine and feminine ordinal ending abbreviation characters.

E.g. primera edición can be abbreviated as 1ª edición.

In Spanish most words that end in vowels, s, or n have penultimate stress and most words that end in r, l, or d have ultimate stress.

When a word does not follow the above pattern the position of the stress is indicated with an acute accent above the vowel receiving stress.

The ñ is treated as a separate letter. The Spanish collation order puts it after n. There is a capitalized version Ñ, but only a handful of words borrowed from foreign or indigenous languages start with it.

Traditionally ch and ll were treated as separate letters that came after c and l in the collation order. Since the reform of 1994 each is collated as a sequence of two ordinary letters.

Most Spanish speaking countries write numbers in the French manner. Mexicans write numbers in the English manner.

Polytonic Greek

Polytonic Greek is used to write Classical Greek. It uses acute, grave, and circumflex accents, the iota subscript, and smooth and rough breathing marks.

Mac OS X comes with a Polytonic Greek input source. Here is how it maps Greek letters to the keyboard:

greek letter mac (greek polytonic)
; : q Q
ς Σ w W
υ Υ y Y
θ Θ u U
η Η h H
ξ Ξ j J
χ Χ x X
ψ Ψ c C
ω Ω v V

The rest of the Greek letters are mapped to their phonetically matching Latin keys.

accented letter mac (greek polytonic)
]a
ά ;a
[a
'a
"a
⌥ia
-a
=a
/a
_a
+a
?a
greek mac (greek polytonic) usage
. . as English period
, , as English comma
; q as English question mark
· ⌥9 as English colon or semicolon

The number keys and their shift punctuation are the same when the Greek Polytonic input source is in effect. The keys for these characters are the same: `~|\,.<

Mathematics

A few non-ASCII mathematical symbols are available in the Mac OS X U.S. Extended and the Emacs latin-prefix input methods:

chr mac os x (us extended) emacs (latin-prefix) unicode name unicode point html entity
× /\ MULTIPLICATION SIGN 00D7 &times;
÷ ⌥/ /: DIVISION SIGN 00F7 &divide;
≠ NOT EQUAL TO 2260 &ne;
≤ ⌥, LESS-THAN OR EQUAL TO 2264 &le;
≥ ⌥. GREATER-THAN OR EQUAL TO 2265 &ge;
° ⇧⌥8 // DEGREE SIGN 00B0 &deg;
′ PRIME 2032 &prime;
″ DOUBLE PRIME 2033 &Prime;

For the remaining mathematical symbols I use input methods based on LaTeX.

chr unicode name unicode point latex html entity alt + fn
logical operators
¬ not sign U+00AC \neg &not; !
∧ logical and U+2227 \wedge &and; &
∨ logical or U+2228 \vee &or; |
∀ for all U+2200 \forall &forall; A
∃ there exists U+2203 \exists &exist; E
sets
∅ empty set U+2205 \emptyset &empty; 0
∈ element of U+2208 \in &isin; e
∉ not an element of U+2209 \notin &notin; n
⊂ subset of U+2282 \subset &sub; (
⊃ superset of U+2283 \supset &sup; )
⊆ subset of or equal to U+2286 \subseteq &sube; [
⊇ superset of or equal to U+2287 \supseteq &supe; ]
∩ intersection U+2229 \cap &cap; I
∪ union U+222A \cup &cup; U
relational operators
≤ less-than or equal to U+2264 \le &le; <
≥ greater-than or equal to U+2265 \ge &ge;
≠ not equal to U+2260 \ne &ne; #
≈ almost equal to U+2248 \approx &asymp; ~
≡ identical to U+2261 \equiv &equiv; =
arithmetic operators
± plus-minus sign U+00B1 \pm &plusmn; +
÷ division sign U+00F7 \div &divide; /
× multiplication sign U+00D7 \times &times; *
relational algebra
π project
σ select
ρ rename
⋈ (natural) join \bowtie
⋉ left semijoin \ltimes
⋊ right semijoin \rtimes
antijoin
left outer join
right outer join
full outer join
other
∞ infinity U+221E \infty &infin; i
° degree sign U+00B0 ^\circ &deg; d

Keyboard Notation

keyboard notation
chr unicode name unicode point key alt key html entity
← leftwards arrow U+2190 left &larr;
↑ upwards arrow U+2191 up &uarr;
→ rightwards arrow U+2192 right &rarr;
↓ downwards arrow U+2193 down &darr;
⌘ place of interest sign U+2318 command
⏎ return symbol U+23CE return
⌤ up arrowhead between two horizontal bars U+2324 enter
⇧ upwards white arrow U+21E7 shift
⇪ upwards white arrow from bar U+21EA caps lock
⇟ downwards arrow with double stroke U+21DF page down fn↓
⇞ upwards arrow with double stroke U+21DE page up fn↑
↖ north west arrow U+2196 home ⌘↑
↘ south east arrow U+2198 end ⌘↓
⌥ option key U+2325 option
⌫ erase to left U+232B delete pc backspace
⎋ broken circle with northwest arrow U+238B esc
⏏ eject symbol U+23CF eject ⌘E
⌦ erase to right U+2326 pc delete fn⌫
http://upload.wikimedia.org/wikipedia/commons/thumb/b/b2/IEC5009_Standby_Symbol.svg/18px-IEC5009_Standby_Symbol.svg.png none none power
⇥ rightwards arrow to bar U+21E5 tab
⇤ leftwards arrow to bar U+21E4 tab
⚙ gear U+2699

pc delete

ctrl: various notation, multiple unicode characters

enter vs return

escape

eject and power symbol

Ability to enter the printable version of control characters?