In this post, I am sharing a small demonstration of how to find the similarity between two strings in PostgreSQL as a percentage.
PostgreSQL is well known for its variety of string functions, which are useful for data analysis.
One of our developers manually generates random token strings for two columns, and now he needs to find the similarity between the two strings in each row.
In PostgreSQL, you can use the pg_trgm module to measure similarity based on trigram matching. A trigram is a group of three consecutive characters taken from a string, and two strings are compared by counting the trigrams they share.
Below is a demonstration of this:
Create a table with sample data:
```sql
CREATE TABLE tbl_SimilarString
(
	Str1 TEXT
	,Str2 TEXT
);

INSERT INTO tbl_SimilarString VALUES
('Anvesh Patel','Anvesh Pat')
,('dbrnd','dbrnd blog')
,('database dev','database developer')
,('postgres database','database postgres');
```
Install the pg_trgm module:
```sql
CREATE EXTENSION pg_trgm;
```
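To see what the comparison is based on, the module's show_trgm() function lists the trigrams extracted from a string. The query below is a small sketch that is not part of the original demonstration (the column aliases are my own), and it assumes the extension has just been created in the current database. pg_trgm lowercases the input and treats each word as having two spaces in front and one space behind before splitting it, so the single word 'dbrnd' should yield six trigrams.

```sql
-- List the trigrams pg_trgm extracts from one of the sample pairs.
SELECT show_trgm('dbrnd')      AS trgm_short
      ,show_trgm('dbrnd blog') AS trgm_long;
```

The similarity score is the number of trigrams the two strings share divided by the number of distinct trigrams found in either of them. For this pair that works out to 6 / 11, or about 0.545, which is the value reported for the second row in the result further down.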
Use similarity():
```sql
SELECT similarity(Str1,Str2)
FROM tbl_SimilarString;
```
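The result is a fraction between 0 and 1 for each row, where 1 means both strings produce the same set of trigrams. Multiply it by 100 to read it as a percentage:
```
 similarity
------------
   0.714286
   0.545455
   0.578947
          1
(4 rows)
```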
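Because similarity() returns a fraction, getting an actual percentage takes one more step. The following is a minimal sketch rather than part of the original demonstration: the alias similarity_pct and the rounding to two decimals are my own choices. The cast to numeric is there because PostgreSQL's two-argument round() is defined for numeric, while similarity() returns real.

```sql
-- Express trigram similarity as a percentage, most similar pairs first.
SELECT Str1
      ,Str2
      ,round((similarity(Str1, Str2) * 100)::numeric, 2) AS similarity_pct
FROM tbl_SimilarString
ORDER BY similarity_pct DESC;
```

For the sample data, the four pairs should come out at about 100, 71.43, 57.89 and 54.55 percent. The last pair in the table scores 100 because trigrams are taken word by word, so swapping the order of the words does not change the set.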