2011-10-14 09:47
Have you heard this old engineering joke? "Standards are great ... everyone should have one!" The problem is that, very often, everyone does. For instance, take the case of storing textual data inside a computer, where the computer regards everything as being a collection of numbers. In this case, someone has to (a) decide which characters need to be represented in the first place and (b) decide which numeric values are going to be associated with the various characters. The resulting collection of character-to-number mappings is referred to as a "code".

ASCII Code(s)

Towards the end of the 1950s, the American Standards Association (ASA) began to consider the problem of defining a standard character code mapping that could be used to facilitate the representation, storing, and interchanging of textual data between different computers and peripheral devices. In 1963, the ASA (which changed its name to the American National Standards Institute (ANSI) in 1969) announced the first version of the American Standard Code for Information Interchange (ASCII). However, this first version of the ASCII code (which is pronounced "ask-key") left many things, such as the lowercase Latin letters, undefined, and it wasn't until 1968 that the currently used ASCII standard of 95 printing characters (counting the space) and 33 control characters (counting DEL) was defined, as illustrated in Figure 1.

Figure 1: The 1968 version of the ASCII code. (Dollar '$' characters indicate hexadecimal values.)

Let's just pause for a moment to appreciate how tasty this version of the table looks (like all of the images in this article, it was created by yours truly in Visio). But we digress...

Note that code $20 (which is annotated "SP") is equivalent to a space. Also, as an aside, the terms uppercase and lowercase were handed down to us by the printing industry, from the compositors' practice of storing the type for capital letters and small letters in two separate trays, or cases.
When working at the type-setting table, the compositors invariably kept the capital letters and small letters in the upper and lower cases, respectively; hence, "uppercase" and "lowercase." Prior to this, scholars referred to capital letters as majuscules and small letters as minuscules, while everyone else simply called them capital letters and small letters.

We should also note that one of the really nice things about ASCII is that all of the alpha characters are numbered sequentially; that is, 65 ($41 in hexadecimal) = 'A', 66 = 'B', 67 = 'C', and so on until the end of the alphabet. Similarly, 97 ($61 in hexadecimal) = 'a', 98 = 'b', 99 = 'c', and so forth. This means that we can perform cunning programming tricks like saying "char = 'A' + 23" and have a reasonable expectation of ending up with the letter 'X'. Alternatively, if we wish to test to see if a character (called "char") is lowercase and, if so, convert it into its uppercase counterpart, we could use a piece of code similar to the following:

    if (char >= 'a') and (char <= 'z') then char = char - 32;

Don't worry as to what computer language this is; the important point here is that the left-hand portion of this statement (the test before "then") is used to determine whether or not we have a lowercase character and, if we do, subtracting 32 ($20 in hexadecimal) from that character's code will convert it into its uppercase counterpart.

As can be seen in Figure 1, in addition to the standard alphanumeric characters ('a'...'z', 'A'...'Z' and '0'...'9'), punctuation characters (comma, period, semi-colon, ...) and special characters ('*', '#', '%', ...), there are an awful lot of strange mnemonics such as EOT, ACK, NAK, and BEL. The point is that, in addition to representing textual data, ASCII was intended for a number of other purposes such as communications; hence the presence of such codes as EOT, meaning "End of transmission," and BEL, which was used to ring a physical bell on old-fashioned printers.
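For anyone who would like to try the letter-arithmetic tricks described above, here is a minimal sketch in Python. The function name to_upper_ascii is made up for this illustration, and the code assumes it is handed a single character; Python's built-in ord and chr are used to move between a character and its numeric code, since Python does not allow arithmetic directly on characters.

```python
def to_upper_ascii(ch: str) -> str:
    """Convert one ASCII lowercase letter to uppercase; return anything else unchanged.

    Hypothetical helper for illustration only; assumes ch is a single character.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if 'a' <= ch <= 'z':
        # 'a' is 97 ($61) and 'A' is 65 ($41), so the two cases are 32 ($20) apart.
        return chr(ord(ch) - 32)
    return ch


# 'A' is 65, so adding 23 gives 88, which is the code for 'X'.
print(chr(ord('A') + 23))

print(to_upper_ascii('q'))  # lowercase letter: converted
print(to_upper_ascii('Q'))  # already uppercase: left alone
print(to_upper_ascii('7'))  # not a letter: left alone
```

Note that the range test matters: subtracting 32 from a character that is not a lowercase letter would land on some unrelated code.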
Some of these control codes are still in use today, while others are, generally speaking, of historical interest only. For those who are interested, a more detailed breakdown of these special codes is presented in Figure 2.

Figure 2: ASCII control characters.

One final point is that ASCII is a 7-bit code, which means that it only uses the binary values %0000000 through %1111111 (that is, 0 through 127 in decimal or $00 through $7F in hexadecimal). However, computers store data in multiples of 8-bit bytes, which means that, when using the ASCII code, there's a bit left over. In some systems, the unused, most-significant bit of an 8-bit byte representing an ASCII character is simply set to logic 0. In other systems, the extra 128 codes that can be accessed using this bit might be used to represent simple "chunky graphics" characters. Alternatively, this bit might be used to implement a form of error detection known as a parity check, in which case it would be referred to as the parity bit.
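To make the parity idea a little more concrete, here is a minimal sketch in Python. It assumes an even-parity scheme (the spare bit is chosen so that the full 8-bit byte always contains an even number of 1s); the text above does not say which flavor of parity any particular system used, and the function names are invented for this illustration.

```python
def add_even_parity(code: int) -> int:
    """Pack a 7-bit ASCII code into a byte, using the top bit as an even-parity bit.

    Hypothetical helper; even parity is an assumption made for this sketch.
    """
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    # parity is 1 when the seven data bits contain an odd number of 1s
    parity = bin(code).count("1") % 2
    return (parity << 7) | code


def check_even_parity(byte: int) -> bool:
    """Return True if the 8-bit value contains an even number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 0


# 'C' is $43 = %1000011 (three 1s), so the parity bit is set, giving $C3.
encoded = add_even_parity(ord('C'))
print(f"${encoded:02X}")

# Flip a single bit to mimic a transmission error; the check should now fail.
corrupted = encoded ^ 0b00000100
print(check_even_parity(encoded), check_even_parity(corrupted))
```

A single parity bit of this kind catches any odd number of flipped bits, but an even number of flips (two, say) will slip through unnoticed.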