In this post, I am sharing a little database theory on character sets and collations.
As database professionals, most of us have already met “COLLATE” in SQL Server and MySQL.
Still, I find that freshers and intermediate database professionals often have doubts and questions about character sets and collations.
Let me clear this up with a short note.
What is a Character Set?
A character set is a set of symbols together with an encoding, that is, a rule that maps each symbol to a byte value.
For example, latin1 and UTF-8 are two of the most widely used character sets.
Using latin1, you can write any English word, because latin1 contains all ASCII characters, and those are sufficient for English. With ASCII alone, on the other hand, you cannot write every word in Western European languages, because characters like ‘ë’, ‘õ’, ‘Ñ’ are missing; latin1 adds them.
A character set also defines how each character is stored as bytes. For example, the euro symbol, €, is encoded as 0x80 in MySQL's latin1 (which is based on Windows cp1252) and as 0xA4 in ISO 8859-15 (often called latin9), while in UTF-8 it takes three bytes: 0xE282AC. Strict ISO 8859-1 has no euro symbol at all.
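To make the encoding point concrete, here is a small Python sketch that encodes the euro symbol under a few encodings and prints the resulting bytes. It is only an illustration: the codec names are Python's, not MySQL's, and "cp1252" stands in for MySQL's latin1 on the assumption that the two agree on this character.

```python
# Encode the euro sign under three encodings and show the bytes as hex.
# Assumption: Python's "cp1252" codec is used as a stand-in for MySQL's
# latin1; "iso-8859-15" is the encoding often called latin9.
euro = "€"

encodings = ["cp1252", "iso-8859-15", "utf-8"]
encoded = {name: euro.encode(name).hex() for name in encodings}

for name, hex_bytes in encoded.items():
    print(f"{name:<12} 0x{hex_bytes}")

# Strict ISO 8859-1 has no euro sign, so encoding raises an error.
try:
    euro.encode("iso-8859-1")
    in_strict_latin1 = True
except UnicodeEncodeError:
    in_strict_latin1 = False

print("representable in ISO 8859-1:", in_strict_latin1)
```

The same character therefore needs one byte in the single-byte encodings and three bytes in UTF-8, which is why the character set of a column matters for both storage and correctness.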
What is a Collation?
A collation is a set of rules for comparing characters in a character set. It also defines how characters are sorted, and the proper order of two characters varies from language to language.
A collation decides whether one string is less than, equal to, or greater than another, and sorts accordingly. If you are using the “latin1” character set, you can use the “latin1_swedish_ci” collation, which is MySQL's default for latin1 and is case-insensitive (the “_ci” suffix).
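The effect of different comparison rules can be sketched in Python. This is not MySQL's collation algorithm; it only imitates the idea of a binary comparison versus a case-insensitive one, and the word list is made up for the illustration.

```python
# Hypothetical sample data for the illustration.
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Binary-style comparison: order by raw code points, so every uppercase
# ASCII letter sorts before every lowercase one.
binary_order = sorted(words)

# Case-insensitive comparison: fold case before comparing, so "apple"
# and "Apple" compare as equal. sorted() is stable, so items that
# compare equal keep their original relative order.
ci_order = sorted(words, key=str.casefold)

print(binary_order)  # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
print(ci_order)      # ['Apple', 'apple', 'banana', 'Banana', 'cherry']

# Equality also depends on the rule in use.
binary_equal = "apple" == "Apple"
ci_equal = "apple".casefold() == "Apple".casefold()
print(binary_equal, ci_equal)  # False True
```

The same strings sort and compare differently depending on the rule, which is roughly the difference you would see between a “_bin” and a “_ci” collation.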
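You have to choose the right collation, because the wrong one can give unexpected comparison and sort results, and comparing columns with mismatched collations may keep MySQL from using an index, which hurts database performance.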
Now create a database in MySQL with an explicit character set and collation:
CREATE DATABASE DatabaseName CHARACTER SET latin1 COLLATE latin1_swedish_ci;