This article is half-done without your Comment! *** Please share your thoughts via Comment ***
Problem :
Email Address is a ubiquitous field and column for every application or system. As per the application size and scope database system stores lots of user Email Addresses.
E.g. 10,00,000 email address is stored in a database system and it is also increasing day by day.
Usually, any database user saves all this email address into one column. Most probably there is one column name is EmailAddress Varchar(250) and user store email address like ” myprofile@yahoo.co.in ” into this column. – Great and usually 70% to 75% user doing this.
Once database size and number of users are going to increase day by day at that time, this storage will not perform as per expectation and decrease the performance of the database.
How this will decrease the performance of the database?
- As we assumed that email address stored into one column like ” myprofile@yahoo.co.in”.
- Now, we require to find that, how many total numbers of users from only “Yahoo” domain?
- How many total number of users from only “India Yahoo”?
- Require to set some additional offer only for “Gmail” domain.
Friends, now think about above questions.
In my career path I have also stored email address into one column, but gradually I came to know that this is a bad idea because now a day Email address is not only for user name or only for storage purpose.
This is very important for promotional and marketing activity.
When you have stored in one column and let’s imagine, you want to find only ” Gmail ” domain out of 10,0,00,000 email address.
As a database developer, you can think that you require one string operation on all stored email address in which you will find the part which is after @”. But this is not a solution where string searching and parsing is required, which slows down the performance of relevant select statement.
Solution :
The solution is simple,
Just divide your Email address into two part or three part.
E.g.
- Single Part : ” myprofile@yahoo.co.in “.
- Two Part : ” myprofile” and ” yahoo.co.in “.
- Three Part : “myprofile” and ” yahoo ” and ” co.in “.
Now, I am going to explain this using one MySQL Demo.
I have created above two table into MySQL. I have divided Email Address into three part.
Below is a sample data.
tbl_User table Data:
tbl_Email table Data:
Examine the first email address:
User Anvesh has anvesh@gmail.com, Roy has roy@yahoo.co.in
As you can find the EmailDomainID reference column in, a User master table which is indicating that which email address is associated with User.
Now a thing is going to easy:
How ?
If I require finding how many users with Gmail, I need to find the only EmailDomainID=1.
You can also apply easy extension filter like Yahoo from co.in and .com.
This will give you 80% to 85% better performance than to store Email into one column.
Also, This will reduce the size of data, E.g, if we store ” Gmail ” keyword 10,00,000 times into the database, then it consume more space than to store only integer ” 1 “.
Yes, I also agree that, during every insert, you have to split email address for storing purpose, but this is a good instead of parsing string to count only domain name.
This structure is quite complex to manage at every insert and every email existence check, but it gives better performance.
Below is simple string operation.
Full Email Address is : anvesh@yahoo.co.in
Now parse only ” anvesh ” from anvesh@yahoo.co.in
1 2 3 4 |
SELECT SUBSTRING ('anvesh@yahoo.CO.IN',1, (POSITION('@' IN 'anvesh@yahoo.CO.IN') -1) ) |
Now parse only ” yahoo ” from anvesh@yahoo.co.in
1 2 3 4 5 6 7 8 |
SELECT SUBSTRING( SUBSTRING('anvesh@yahoo.CO.IN' ,(POSITION('@' IN 'anvesh@yahoo.CO.IN') + 1)) ,1 ,POSITION('.' IN SUBSTRING('anvesh@yahoo.CO.IN' ,(POSITION('@' IN 'anvesh@yahoo.CO.IN'))) )-2 ) |
Now parse only ” co.in ” from anvesh@yahoo.co.in
1 2 3 4 |
SELECT SUBSTRING((SUBSTRING('anvesh@yahoo.CO.IN' ,(POSITION('@' IN 'anvesh@yahoo.CO.IN') + 1))) ,POSITION('.' IN SUBSTRING('anvesh@yahoo.CO.IN' ,(POSITION('@' IN 'anvesh@yahoo.CO.IN') + 1)))+1) |
The above code is required whenever you insert the email address into the database because from the application end we receive one full email address, but for effective storage, you have to parse like above code.
You can also suggest three different input parameters from the application end where above all codes can shift to at application level.
Click here to find details about string functions of MySQL.