Finding the percentage of NULL values for each column in a table.

Scenario

Let`s create a scenario where you are working on data validation and data analysis and got the requirement in which you need to find the Percentage of Null values in every column for all the tables in SQL Server Database. This can be really helpful in analysis to know how much data you are really getting or you are getting no values (Null) and plan the storage. Also depending upon the analysis we can find some records those should not be Null but you have receive Null for those Columns, so data corrections can be next step.

Solution

We are using Cursor to find percentage and Null Row Count in each of the column in all the tables in the Database. This query can take long time to execute as it will be running for each column in database if Column allows Nulls. If Column is Not Null, it will be ignored  as nobody can insert null values in those columns.
I will suggest to take the backup of the database and restore it on UAT and run this query in UAT environment. Its fine if your database size is small. Again you don't want to hurt performance of running processes, you might want to schedule to run after hours.

USE SampleDB
GO
DECLARE @DatabaseName VARCHAR(100),
                        @SchemaName VARCHAR(100),
                        @TableName VARCHAR(100),
                        @ColumnName VARCHAR(100),
                        @FullyQualifiedTableName VARCHAR(500)

--Create Temp Table to Save Results

IF OBJECT_ID('tempdb..#TempData') IS NOT NULL
DROP TABLE #TempData
CREATE TABLE #TempData
            (
            DatabaseName VARCHAR(100),
            SchemaName VARCHAR(100),
            TableName VARCHAR(100),
            ColumnName VARCHAR(100),
            TotalTableRowCount INT,
            NullRecordCount INT
            )

DECLARE Cur CURSOR
FOR
SELECT
            isc.Table_CataLog,
            isc.Table_Schema,
            isc.Table_Name,
            isc.Column_Name,
            '['+isc.Table_CataLog+']'+'.[' + isc.Table_Schema+'].'+'['+isc.Table_Name+']' AS FullQualifiedTableName
FROM information_schema.Columns isc
            INNER JOIN information_Schema.Tables ist ON isc.Table_Name = ist.Table_Name
                        AND ist.Table_Type = 'BASE TABLE'
                        AND isc.IS_Nullable='YES'
OPEN Cur
FETCH NEXT FROM Cur INTO
@DatabaseName,@SchemaName,@TableName,@ColumnName,@FullyQualifiedTableName

WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @SQL VARCHAR(MAX) = NULL
SET @SQL =
              ' Select ''' + @DatabaseName + ''' AS DatabaseName, '''+ @SchemaName + '''AS
              TableName,  ''' + @TableName + ''' AS SchemaName,  ''' + @ColumnName + '''
              AS ColumnName,  (Select count(*) from ' + @FullyQualifiedTableName + '
              with (nolock))  AS TotalTableRowCount,  count(*) as NullRecordCount from ' +
              @FullyQualifiedTableName  + 'with (nolock) Where ' + @ColumnName + ' IS NULL'
--Print @SQL

INSERT INTO #TempData
EXEC (@SQL)

FETCH NEXT FROM Cur INTO
@DatabaseName,@SchemaName,@TableName,@ColumnName,@FullyQualifiedTableName

END

CLOSE Cur
DEALLOCATE Cur

SELECT *, Cast((NullRecordCount/Cast(TotalTableRowCount as Numeric(13,1)))*100 AS Numeric(4,1)) AS NullPercentColumnValues
            FROM #TempData where NullRecordCount <>0
 --and TableName='M_OfferBuy_Info'

 --drop table #TempData

When I ran above script on one of the database, it returns Database Name, Schema Name, Table Name, Column Name, Row count for all the tables (TotalTableRowCount), Row count for null values (NullRecordCount) and Percentage of Null Column Values as shown below.







No comments:

Post a Comment