Search Java Programs

Wednesday, February 24, 2010

Understanding SQL Character Data Types

Table columns defined as being of one of the character data types can hold letters, numbers, and special characters (such as !,@,#,$,%,^, and so on). There are four character data types, each with one or two synonyms. The SQL character data types are:

Table 23.1: SQL Character Data Types

Character Type

Description

CHAR(length)

CHARACTER(length)

Fixed-length character string

VARCHAR(length)

CHAR VARYING(length)

CHARACTER VARYING(length)

Variable-length character string

NCHAR(length)

NATIONAL CHAR(length)

NATIONAL CHARACTER(length)

Fixed-length Unicode character string

NCHAR VARYING(length)

NATIONAL CHAR VARYING(length)

NATIONAL CHARACTER VARYING(length)

Variable-length Unicode character string

When declaring a column as one of the character types, you specify both the character data type and its length. (The length of a character string is the maximum number of letters, symbols, and numbers the string can hold.) Thus, given the SQL table declaration

CREATE TABLE character_table
(char_column CHAR(10),
char_column2 CHAR(100),
varchar_column VARCHAR(100),
nchar_column NCHAR(20)
nchar_varying_column NCHAR VARYING (200))

you can store 10 characters in the column CHAR_COLUMN, 100 characters in CHAR_COLUMN2, 100 characters in the VARCHAR_COLUMN column, 20 characters in the column NCHAR_COLUMN, and 200 characters in the NCHAR_VARYING_COLUMN column.

To insert values that include letters or symbols into a CHARACTER data type column, enclose the string you want to insert in either single or double quotes. In our current example, executing the SQL INSERT statement

 INSERT IN character_table
VALUES ("Konrad", 9, 5+4, '5+4')

you would store Konrad in CHAR_COLUMN, 9 in VARCHAR_COLUMN, 9 in NCHAR_COLUMN, and 5+4 in NCHAR_VARYING_COLUMN. As you can see, if a character string includes only numbers, you need not enclose it in quotes. However, if the character string is a numeric expression, you must enclose it in quotes if you want the DBMS to store the numeric expression instead of the results of the numeric expression.

Fixed-Length CHARACTER Data Types

When you store data in a CHAR or CHARACTER column, each character, symbol, or number uses 1 byte of storage space. CHAR and CHARACTER are fixed-length data types, and the DBMS will pad (add blanks to) your string to make it the length specified in the column type definition. In the current example, the CHAR_COLUMN can store 10 characters. As such, the DBMS will store 10 characters in the CHAR_COLUMN column—the character string Konrad followed by four blanks. Similarly, the 9 in CHAR_COLUMN2 is stored as the character 9 followed by 99 blank spaces, since column CHAR_COLUMN2 was declared as a fixed-length character field of 100 characters.

You can store up to 8,000 characters in a column of type CHAR or CHARACTER.

Variable-Length CHARACTER Data Types

VARCHAR, CHAR VARYING, and CHARACTER VARYING are variable-length character strings, meaning that the length in the declaration is the maximum number of characters the column can hold, but the character string in the column may actually have less characters. Thus, in the current example, the NCHAR_COLUMN holds the character 9, using only one byte of storage. Similarly, the column NCHAR_VARYING_COLUMN holds the character string 5+4, using 3 bytes of data. Conversely, the DBMS uses 100 bytes to store the character 9 in CHAR_COLUMN2 and 10 bytes to store the character string Konrad because CHAR_COLUMN and CHAR_COLUMN2 are fixed-length character fields that must have the number of characters given as the column length in the table declaration.

You can store up to 8,000 characters in a column of type VARCHAR, CHAR VARYING, or CHARACTER VARYING.

Fixed- and Variable-Length Unicode CHARACTER Data Types

Computers store characters (whether symbols, letters, or numbers) as a numeric value. As such, every character, symbol, and number in the English language is represented on the computer as a unique sequence of 1s and 0s. Because different languages have characters that differ from any characters in another language, each has its own in encoding scheme. Thus, an A in German will have a different encoding (be represented as a different sequence of 1s and 0s) than an A in Russian. In fact, the European Union requires several different encodings to cover all of its languages.

Unicode was designed to provide a unique number for every character, no matter what platform, program, or language. Thus, the Unicode encoding for the letter A will have the same numeric value whether the A is found in a table on a system in Russia, Greece, or Japan.

The advantage of using Unicode is that you don't have to program in all of the possible numeric values for each symbol, letter, and number for all of the languages whose text you want to store in your database. The disadvantage of using Unicode is that due to the large number of Unicode characters (remember, Unicode is a combination of common and unique characters from any different character sets), it takes 2 bytes instead of 1 to represent each Unicode character. As a result, a Unicode string of type NCHAR(20) takes 40 bytes of storage, while a string of type CHAR(20) takes only 20 bytes.

When you define a column of type NCHAR, NATIONAL CHAR, or NATIONAL CHARACTER, you are telling the DBMS to store a fixed-length character string in the column using the Unicode encoding for each character in the string. Thus, a column of type NCHAR(length) (NATIONAL CHAR(length) and NATIONAL CHARACTER (length)) is a fixed-length character string like a column of type CHARACTER(length). Both contain the number of characters specified by (length). Thus, in our example, the NCHAR_COLUMN defined as data type NCHAR(20) can hold a character string of 20 characters. If you insert a character string of less than 20 characters into an NCHAR(20) column, the DBMS will add spaces to the end of the string to bring it to 20 characters.

You can store up to 4,000 characters in a column of type NCHAR, NATIONAL CHAR, or NATIONAL CHARACTER.

NCHAR VARYING is the Unicode equivalent of the VARCHAR data type. Like VARCHAR, columns of data type NCHAR VARYING(length) (NATIONAL CHAR VARYING (length) and NATIONAL CHARACTER VARYING(length)) hold variable-length character strings up to the number of characters specified by length. Thus, in our example, the NCHAR_VARYING_COLUMN defined as data type NCHAR VARYING(200) can hold a character string of up to 200 characters. If you insert a string of less than 200 characters into an NCHAR VARYING(200) column, the DBMS will not add blanks to the end of the character string. As such, the length of a character string stored in an NCHAR VARYING column can be less than the maximum length (number of characters) specified for the column in the table declaration.

You can store up to 4,000 characters in a column of type NCHAR VARYING, NATIONAL CHAR VARYING, or NATIONAL CHARACTER VARYING.


Note

If you insert a character string longer than the length specified by the character type, the DBMS will truncate (or cut off) the extra characters and store the shortened string in the column without reporting an error. Therefore, if you have a column defined as being of type CHAR(10) and you attempt to insert the string abcdefghijklmnop, the DBMS will store abcdefghij in the column, shortening the maximum number of characters you specified for the character string. When storing a character string, the DBMS will truncate (shorten) a string longer than the maximum specified length, whether the character type is fixed-length or variable-length.

No comments:

Post a Comment

Website Design by Mayuri Multimedia