SQL Server
One of my biggest hot-buttons is around shrinking data files. Although I used to own the shrink code while I was at Microsoft, I never had a chance to rewrite it so that data file shrink is a more palatable operation. I really don’t like shrink.
Now, don’t confuse shrinking the transaction log with shrinking data files. Shrinking the log is necessary if your log has grown out of control, or as part of a process to remove excessive VLF fragmentation (see Kimberly’s excellent posts on this here and here). However, shrinking the log should be a rare operation and should not be part of any regular maintenance you perform.
Shrinking of data files should be performed even more rarely, if at all. Here’s why – data file shrink causes *massive* index fragmentation. Let me demonstrate with a simple script you can run. The script below will create a data file, create a 10MB ‘filler’ table at the start of the data file, create a 10MB ‘production’ clustered index, and then analyze the fragmentation of the new clustered index.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
| USE [master]; GO IF DATABASEPROPERTYEX (N 'DBMaint2008' , N 'Version' ) IS NOT NULL DROP DATABASE [DBMaint2008]; GO CREATE DATABASE DBMaint2008; GO USE [DBMaint2008]; GO SET NOCOUNT ON ; GO -- Create the 10MB filler table at the 'front' of the data file CREATE TABLE [FillerTable] ( [c1] INT IDENTITY, [c2] CHAR (8000) DEFAULT 'filler' ); GO -- Fill up the filler table INSERT INTO [FillerTable] DEFAULT VALUES ; GO 1280 -- Create the production table, which will be 'after' the filler table in the data file CREATE TABLE [ProdTable] ( [c1] INT IDENTITY, [c2] CHAR (8000) DEFAULT 'production' ); CREATE CLUSTERED INDEX [prod_cl] ON [ProdTable] ([c1]); GO INSERT INTO [ProdTable] DEFAULT VALUES ; GO 1280 -- Check the fragmentation of the production table SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats ( DB_ID (N 'DBMaint2008' ), OBJECT_ID (N 'ProdTable' ), 1, NULL , 'LIMITED' ); GO |
avg_fragmentation_in_percent ----------------------------- 0.390625 |
The logical fragmentation of the clustered index before the shrink is a near-perfect 0.4%.
Now I’ll drop the ‘filler’ table, run a shrink to reclaim the space, and re-analyze the fragmentation of the clustered index:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
| -- Drop the filler table, creating 10MB of free space at the 'front' of the data file DROP TABLE [FillerTable]; GO -- Shrink the database DBCC SHRINKDATABASE ([DBMaint2008]); GO -- Check the index fragmentation again SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats ( DB_ID (N 'DBMaint2008' ), OBJECT_ID (N 'ProdTable' ), 1, NULL , 'LIMITED' ); GO |
DbId FileId CurrentSize MinimumSize UsedPages EstimatedPages ----- ------- ------------ ------------ ---------- --------------- 6 1 1456 152 1448 1440 6 2 63 63 56 56 DBCC execution completed. If DBCC printed error messages, contact your system administrator. avg_fragmentation_in_percent ----------------------------- 99.296875 |
Wow! After the shrink, the logical fragmentation is almost 100%. The shrink operation *completely* fragmented the index, removing any chance of efficient range scans on it by ensuring the all range-scan readahead I/Os will be single-page I/Os.
Why does this happen? A data file shrink operation works on a single file at a time, and uses the GAM bitmaps (see Inside The Storage Engine: GAM, SGAM, PFS and other allocation maps) to find the highest page allocated in the file. It then moves it as far towards the front of the file as it can, and so on, and so on. In the case above, it completely reversed the order of the clustered index, taking it from perfectly defragmented to perfectly fragmented.
The same code is used for DBCC SHRINKFILE, DBCC SHRINKDATABASE, and auto-shrink – they’re equally as bad. As well as introducing index fragmentation, data file shrink also generates a lot of I/O, uses a lot of CPU, and generates *loads* of transaction log – as everything it does is fully logged.
Data file shrink should never be part of regular maintenance, and you should NEVER, NEVER have auto-shrink enabled. I tried to have it removed from the product for SQL 2005 and SQL 2008 when I was in a position to do so – the only reason it’s still there is for backwards compatibility. Don’t fall into the trap of having a maintenance plan that rebuilds all indexes and then tries to reclaim the space required to rebuild the indexes by running a shrink – that’s a zero-sum game where all you do is generate a log of transaction log for no actual gain in performance.
So what if you *do* need to run a shrink? For instance, if you’ve deleted a large proportion of a very large database and the database isn’t likely to grow, or you need to empty a file before removing it?
The method I like to recommend is as follows:
- Create a new filegroup
- Move all affected tables and indexes into the new filegroup using the CREATE INDEX … WITH (DROP_EXISTING = ON) ONsyntax, to move the tables and remove fragmentation from them at the same time
- Drop the old filegroup that you were going to shrink anyway (or shrink it way down if its the primary filegroup)
Basically you need to provision some more space before you can shrink the old files, but it’s a much cleaner mechanism.
If you absolutely have no choice and have to run a data file shrink operation, be aware that you’re going to cause index fragmentation and you should take steps to remove it afterwards if it’s going to cause performance problems. The only way to remove index fragmentation without causing data file growth again is to use DBCC INDEXDEFRAG or ALTER INDEX … REORGANIZE. These commands only require a single 8KB page of extra space, instead of needing to build a whole new index in the case of an index rebuild operation.
Bottom line – try to avoid running data file shrink at all costs!