This article is half-done without your Comment! *** Please share your thoughts via Comment ***
What is a Columnstore index of SQL Server 2012?
A columnstore index is a technology for storing, retrieving and managing data by using a columnar data format, called a columnstore.
A columnstore is data that is logically organized as a table with rows and columns, and physically stored in a column-wise data format.
Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size.
What are the advantages of Columnstore index?
Columns store values from the same domain and commonly have similar values, which results in high compression rates. This minimizes or eliminates IO bottleneck in your system while reducing the memory footprint significantly.
High compression rates improve query performance by using a smaller in-memory footprint. In turn, query performance can improve because SQL Server can perform more query and data operations in-memory.
Batch execution improves query performance, typically 2-4x, by processing multiple rows together.
Queries often select only a few columns from a table, which reduces total I/O from the physical media.
When I should use clustered column stored index?
Use a clustered columnstore index to store fact tables and large dimension tables for data warehousing workloads. This improves query performance and data compression by up to 10x.
When I should use non-clustered column stored index?
Use a nonclustered columnstore index to perform analysis in real-time on an OLTP workload.
How do I choose between a rowstore index and a columnstore index?
Rowstore indexes perform best on queries that seek into the data, searching for a particular value, or for queries on a small range of values. Use rowstore indexes with transactional workloads since they tend to require mostly table seeks instead of table scans.
Columnstore indexes give high performance gains for analytic queries that scan large amounts of data, especially on large tables. Use columnstore indexes on data warehousing and analytics workloads, especially on fact tables, since they tend to require full table scans rather than table seeks.
What do you know about the Column store Data compression?
Columnstore indexes read compressed data from disk, which means fewer bytes of data need to be read into memory.
Columnstore indexes store data in compressed form in memory which reduces IO by reducing the number of times the same data is read into memory.
Columnstore indexes compress data by columns instead of by rows which achieves high compression rates and reduces the size of the data stored on disk. Each column is compressed and stored independently.
What is fill factor and pad index?
A fill factor is a specification done during the creation of indexes so that a particular amount of space can be left on a leaf level page to decrease the occurrence of page splits when the data has to be accommodated in the future.
A pad index specifies index padding. When it is set to ON, then the free space percentage per the fill factor specification is applied to the intermediate-level pages of the index. When it is set to OFF, the fill factor is not specified and enough space is left for a maximum size row.
What is it unwise to create wide clustered index keys?
A clustered index is a good choice for searching over a range of values. After an indexed row is found, the remaining rows being adjacent to it can be found easily. However, using wide keys with clustered indexes is not wise because these keys are also used by the non-clustered indexes for look ups and are also stored in every non-clustered index leaf entry.
Explain the ‘Fill factor’ concept in Indexes.
- The fill factor option is provided for smoothening index data storage and performance.
- The percentage of space on each leaf level page to be filled with data is determined by the fill factor value when an index is created.
- This reserves a percentage of free space for future growth.
- This option is provided for fine tuning index data storage and performance.
- The fill factor value is a percentage from 1 to 100 and the server-wide default is 0 which means that the leaf-level pages are filled to capacity.
How do you maintain a fill factor in existing indexes?
- Usually the data keeps getting added, deleted, or updated in the table due to which the fill factor is implemented during the index creation itself. Fill factor is not maintained after an index is created.
- The changes in the future can be determined only when a new index is created on a table that has an existing data. It is beneficial to set the fill factor then itself.
- Maintaining extra space on the data pages obviates the purpose of using the fill factor. The SQL Server would then have to split pages to maintain an amount of free space per the fill factor, on each page.
- Thus, when data is added filling up the empty space, an index can be created and the fill factor can be re-specified distribute the data again.
Difference between Unique Index vs Unique Constraint.
Unique Index and Unique Constraint are the same and achieve same goal. SQL performance is same for both.
What are the primary differences between an index reorganization and an index rebuild?
- Reorganization is an “online” operation by default; a rebuild is an “offline” operation by default
- Reorganization only affects the leaf level of an index.
- Reorganization swaps data pages in-place by using only the pages already allocated to the index; a rebuild uses new pages/allocations
- Reorganization is always a fully-logged operation; a rebuild can be a minimally-logged operation.
- Reorganization can be stopped mid-process and all completed work is retained; a rebuild is transactional and must be completed in entirety to keep changes.
If you need to REBUILD a non-clustered index that is 10GB in size and have 5GB of free data-file space available with no room to grow the data file(s), how can you accomplish the task?
When rebuilding an existing non-clustered index, you typically require free space that is approximately equivalent to 2.2 times the size of the existing index, since during a rebuild operation, the existing index is kept until the rebuilt structure is complete and an additional approximately 20% free space for temporary sorting structures used during the rebuild operation.
In this case, you would require at least an additional 10+ GB of free space for the rebuild operation to succeed, since the index itself is 10GB in size.
Using SORT_IN_TEMPDB would not suffice in this case, since only the temporary sort tables are stored in tempdb in this case, and at least 10 GB of free space would still be required in the database data files.
Why most of the DBAs recommended Index Reorganizing instead of Index Rebuilding?
Reorganizing an index uses minimal system resources. It defragments the leaf level of clustered and nonclustered indexes on tables and views by physically reordering the leaf-level pages to match the logical, left to right, order of the leaf nodes. Reorganizing also compacts the index pages.
Compaction is based on the existing fill factor value.
Reorganize a index after one or more data loads to achieve query performance benefits as quickly as possible. Reorganizing will initially require additional CPU resources to compress the data, which could slow overall system performance. However, as soon as the data is compressed, query performance can improve.
What is SORT_IN_TEMPDB option for Indexes?
When you create or rebuild an index, by setting the SORT_IN_TEMPDB option to ON you can direct the SQL Server Database Engine to use tempdb to store the intermediate sort results that are used to build the index.
When SORT_IN_TEMPDB is set to OFF, the default, the sort runs are stored in the destination filegroup.
Do you think, INDEX REORGANIZE operation requires additional free disk space?
No, INDEX REORGANIZE does not require additional free disk space, but related transaction log is required free space to log the operation information.