SQL Server: Convert CSV to Columns (table) – Method 1

5 03 2007

Sometimes, you’re blessed with the joy of delimited data within a column. I happened to be one of those lucky people recently, and put together a stored procedure that would produce a table with the data already columnized. Two versions of code were produced to accomplish this task. This is the first method which is more simple, but also slightly slower than Method 2. I’m posting it here to help others, and to hopefully get some feedback on how it can be improved. Method 2 will be posted as well. The procedure accepts the following parameters:

  • Delimiter – how your data is delimited (comma, etc.)
  • SourceTable – table containing your delimited data
  • DataColumn – column containing your delimited data
  • OutputTable – name of the output table to create
  • MaxCols – maximum # of columns any particular row will contain
  • ColPrefix – prefix for columns in OutputTable (‘col_’ for example)
  • ColType – type of each column in OutputTable (‘varchar(500) Null’ for example)
  • IntraColDelimiter – field delimiter for fields that can contain the overall delimiter
  • ReplaceChar – unique character known to never exist in your delimited data
  • FillerColValue – Value to fill columns with when a row is short ‘x’ # of columns

The only tricky parameter is IntraColDelimiter. In our case, the data was comma delimited, but it was also possible for us to get commas within a field. If a comma was going to be found within a field, it would be surrounded by double quotes. So, a 6 “column” delimited record in our situation might look like:

abc,123,def,456,”abc,123,,”,ghi,789

In that example, ‘abc,123,,’ is all one field of data and shouldn’t be broken into separate columns during parsing. Below is the script. Each section is preceeded by a single comment indicating the primary purpose of that section. The goals were to accurately parse the data, maintain performance as much as possible, and ensure the code was maintainable.

create procedure dbo.spParseDelimitedData
@Delimiter			char(1)			= ',',										-- column delimiter found in source data
@SourceTable		varchar(50)		= 'tbSourceRecord',							-- source table containing delimited data
@DataColumn			varchar(50)		= 'SourceRecord',							-- column containing the delimited data
@OutputTable		varchar(50)		= 'tbdump_new',								-- table this script will produce with data in columns
@MaxCols			int				= 10,										-- maximum number of columns found in @DataColumn
@ColPrefix			varchar(10)		= 'col_',									-- column prefix for @OutputTable
@ColType			varchar(20)		= ' varchar(1000) null default null ',		-- column type for each @OutputTable column
@IntraColDelimiter	char(1)			= '"',										-- delimiter for fields that actually contain the delimiter
-- for the case of:  123,432,661,"141,414",abd,afae,afe
@ReplaceChar		char(1)			= '^',										-- unique character known never to be in @DataColumn
@FillerColValue		varchar(20)		= ',Null'									-- value to fill columns with when @DataColumn is short columns
as
/*
* Convert CSV data to columns *

This script reads a delimited (CSV, etc.) column from a table and outputs a new table with the data in columns.
It handles situations where each delimited row may have different numbers of columns.  With the input parameters,
you are able to control the type and value of the columns in the otuput table.
*/
begin
-- setup environment
set nocount on
set concat_null_yields_null off

-- setup vars
declare	@x				int,
@SQL			nvarchar(max),
@Row			int,
@MaxRow			int,
@RowsAffected	int,
@Quit			bit

select	@ReplaceChar	= '^',
@Quit			= 0

-- create a copy of the source table, we need to modify records along the way
-- ensure the table doesn't already exist
select @SQL = 'if exists (select name from sysobjects where name = ' + char(39) + @OutputTable + char(39) + ') drop table ' + @OutputTable
exec (@SQL)

-- do the insert
select	@SQL = 'select *, 0 as startpos into tmpdata from ' + @SourceTable
exec (@SQL)

-- create destination table
select	@x = 2,
@SQL = 'create table ' + @OutputTable + '(id int identity, ' + @ColPrefix + '1' + @ColType + ','
while	@x <= @MaxCols
begin
select	@SQL = @SQL + @ColPrefix + convert(varchar(5), @x) + @ColType + ','
select	@x = @x + 1
end
select	@SQL = left(@SQL, len(@SQL) - 1) + ')'
exec (@SQL)

-- change delimiter chars into something else when they are inside a "field"
select @RowsAffected = 1
while @RowsAffected > 0
begin
select @SQL = '
update	tmpdata
set		' + @DataColumn + ' =
substring(' + @DataColumn + ', 1, charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', startpos) - 1)
+ replace(substring(' + @DataColumn + ', charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', startpos), charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', startpos) + 1) - charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', startpos) + 1), ''' + @Delimiter + ''', ''' + @ReplaceChar + ''')
+ substring(' + @DataColumn + ', charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', startpos) + 1) + 1, len(' + @DataColumn + ') - charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', startpos) + 1)),
startpos = charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', startpos + 1) + 1) + 1
from	tmpdata
where	startpos < charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ', startpos + 1) + 1) + 1
and		charindex(''' + @IntraColDelimiter + ''', ' + @DataColumn + ') > 0'
exec sp_executesql @SQL
select @RowsAffected = @@Rowcount
end

-- populate @SQL with insert statements
create table #ExecMe (ExecID int identity, SQL varchar(max))

select @SQL = 'select ''insert into ' + @OutputTable +  ' values ('' + char(39) + replace(replace(' + @DataColumn + ', '','', char(39) + '','' + char(39)), ' + char(39) + @replacechar + char(39) + ', ' + char(39) + @delimiter + char(39) + ') + char(39) + replicate('',Null'', ' + convert(varchar(5), @MaxCols) + ' - (len(' + @DataColumn + ') - len(replace(' + @DataColumn + ', '','', ''''))) - 1) + '')'' from tmpdata d'

insert into #ExecMe (SQL)
exec (@SQL)

declare curInserts2 cursor for
select SQL from #ExecMe

open curInserts2
fetch next from curInserts2 into @SQL
while @@fetch_status = 0
begin
exec sp_executesql @SQL
fetch next from curInserts2 into @SQL
end

-- cleanup
drop table tmpdata
close curInserts2
deallocate curInserts2

end




Create Real Database Diagrams

28 02 2007

Run the script below in the database you want to create a diagram for. It will create a series of statements that you will execute in a new [Databasename]_Diagram database.

/*

1. First, before getting into any details, before even getting
into the requirements, understand a simple scenario.

a. We have a table named some_table.
b. We have a stored procedure named some_proc.
c. The stored procedure some_proc contains the following SQL
statement:

select * from some_table

d. So, we now know that some_proc has a dependency on some_table.

2. Now understand the requirements.

a. The goal is to determine the impact of changes. Suppose some_table
is changed. Because some_proc depends on some_table, a change to
some_table could impact some_proc. That is what must be shown, the
dependency of some_proc on some_table.
b. There isn't any need to worry about whether or not a change to
some_table actually does impact some_proc. The issue is that there
is a potential impact. The process of impact analysis is really a
two step process:

(1) Figure out potential impacts.
(2) Consider the change carefully and see which potentially impacted
objects are actually impacted.

The issue is that figuring out where to look in the first place is
the pain point. That is the objective. The second part is just a
matter of going the analysis after figuring out where the analysis
should be performed.

3. The way to show a potential impact is to have lines drawn from dependent
objects to "parent" objects. The only way to draw a line is to create a
foreign key relationship. A foreign key relationship only exists between
two tables, not between a table and a stored procedure.

If the proper abstraction of the problem is made, the solution is obvious.
Create a new database. Every object in the database being analyzed is made
into a table in the new database. Dependencies between objects (which are
all tables now) are shown as foreign key relationships.

4. Generate DDL to create a table for every table, view, functions and stored procedure.
Every table name begins with a prefix indicating the type of object
for the original object. The DDL that is generated will create the following
tables:

TABL some_table
PROC some_proc

The tables will not contain data. I don't like using embedded spaces in object
names, but this is a good time to make an exception. Object names containing
embedded spaces must be delimited. The best approach is to use brackets. The
other approach involves using QUOTED_IDENTIFIERS, but that is a setting that
must be turned on. Brackets always work.

2. Generate DDL to create foreign keys for the tables created in the previous step.
The foreign keys are created where one object depends on another. Referring to
the example objects, a foreign key is needed to represent the dependency
some_proc has on some_table. This is accomplished by the following DDL (remember
it is necessary to delimit object names containing spaces):

alter table [PROC some_proc]
add constraint fk_P_some_proc_U_some_table
foreign key (fk) references [TABL some_table] (pk)

3. Sybase has all dependencies in sysdepends.
SQL Server 2000 has all dependencies in sysdepends and sysreferences.

4. How to make all of this work.
The SQL statements shown below are used to generate DDL statements that must
be executed. You will need to set the Query Analyzer or SQL Server Management
Studio to display the results as text instead of in a grid in order to
facilitate the copy/paste process.
*/

/* Step 1
Set to Results in Text first.
Run this to generate DDL statements.
*/

set nocount on  -- suppress the message "(n row(s) affected)"

/* This select statement generates DDL of this general format:

create table [PROC CustOrderHist] (pk int primary key, fk int)

It is necessary to create a table for each object that appears in the
diagram. The type of object is appended to the object's name so that
the diagram clearly indicates both the object type and object name.
*/
select
'create table ['
+ CASE rtrim(type)  -- rtrim is used because type is char(2)
WHEN 'FN' THEN 'FUNC'  -- scalar FuNction
WHEN 'IF' THEN 'FUNI'  -- Inline table Function
WHEN 'P'  THEN 'PROC'  -- stored Procedure
WHEN 'RF' THEN 'REPR'  -- Replication Filter stored procedure
WHEN 'TF' THEN 'FUNT'  -- Table Function
WHEN 'TR' THEN 'TRIG'  -- TRigger
WHEN 'U'  THEN 'TABL'  -- User table
WHEN 'V'  THEN 'VIEW'  -- View
WHEN 'X'  THEN 'XPRC'  -- eXtended stored procedure
END
+ ' ' + name + '] (pk int primary key, fk int)'
from sysobjects
/* restrict the diagram to only include the object types of interest */
where rtrim(type) in ('FN','IF','P','RF','TF','TR','U','V','X')
/* don't want system views in the diagram */
and name not in ('sysconstraints','syssegments')  -- these are system views
/* don't want the Visual SourceSafe stored procedures in the diagram */
and name not like 'dt_addtosourcecontrol%'
and name not like 'dt_adduserobject%'
and name not like 'dt_checkinobject%'
and name not like 'dt_checkoutobject%'
and name not like 'dt_displayoaerror%'
and name not like 'dt_drop%byid'
and name not like 'dt_getobjwithprop%'
and name not like 'dt_getpropertiesbyid%'
and name not like 'dt_isundersourcecontrol%'
and name not like 'dt_setpropertybyid%'
and name not like 'dt_validateloginparams%'
and name not like 'dt_verstamp%'
and name not like 'dt_whocheckedout%'
and name not in ('dtproperties','dt_generateansiname','dt_removefromsourcecontrol','dt_vcsenabled')

/* This select statement generates DDL of this general format:

alter table [PROC CustOrderHist] add constraint fk_P_CustOrderHist_U_Customers foreign key (fk) references [TABL Customers] (pk)
*/
select distinct 'alter table [' +
CASE rtrim(child.type)
WHEN 'FN' THEN 'FUNC'  -- scalar FuNction
WHEN 'IF' THEN 'FUNI'  -- Inline table Function
WHEN 'P'  THEN 'PROC'  -- stored Procedure
WHEN 'RF' THEN 'REPR'  -- Replication Filter stored procedure
WHEN 'TF' THEN 'FUNT'  -- Table Function
WHEN 'TR' THEN 'TRIG'  -- TRigger
WHEN 'U'  THEN 'TABL'  -- User table
WHEN 'V'  THEN 'VIEW'  -- View
WHEN 'X'  THEN 'XPRC'  -- eXtended stored procedure
END
+ ' ' + child.name
+ '] add constraint fk_'
+ REPLACE(rtrim(child.type)  + '_' + child.name + '_' +
rtrim(parent.type) + '_' + parent.name, ' ', '_' )
+ ' foreign key (fk) references ['
+ CASE rtrim(parent.type)
WHEN 'FN' THEN 'FUNC'   -- scalar FuNction
WHEN 'IF' THEN 'FUNI'   -- Inline table Function
WHEN 'P'  THEN 'PROC'   -- stored Procedure
WHEN 'RF' THEN 'REPR'   -- Replication Filter stored procedure
WHEN 'TF' THEN 'FUNT'   -- Table Function
WHEN 'TR' THEN 'TRIG'   -- TRigger
WHEN 'U'  THEN 'TABL'   -- User table
WHEN 'V'  THEN 'VIEW'   -- View
WHEN 'X'  THEN 'XPRC'   -- eXtended stored procedure
END
+ ' ' + parent.name
+ '] (pk)'
from sysobjects parent inner join sysdepends
on parent.id = sysdepends.depid
inner join sysobjects child
on child.id = sysdepends.id
where rtrim(child.type) in ('FN','IF','P','RF','TF','TR','U','V','X')
and child.name not in ('sysconstraints','syssegments')  -- these are system views
and child.name not like 'dt_addtosourcecontrol%'
and child.name not like 'dt_adduserobject%'
and child.name not like 'dt_checkinobject%'
and child.name not like 'dt_checkoutobject%'
and child.name not like 'dt_displayoaerror%'
and child.name not like 'dt_drop%byid'
and child.name not like 'dt_getobjwithprop%'
and child.name not like 'dt_getpropertiesbyid%'
and child.name not like 'dt_isundersourcecontrol%'
and child.name not like 'dt_setpropertybyid%'
and child.name not like 'dt_validateloginparams%'
and child.name not like 'dt_verstamp%'
and child.name not like 'dt_whocheckedout%'
and child.name not in ('dtproperties','dt_generateansiname','dt_removefromsourcecontrol','dt_vcsenabled')
and child.id != parent.id        -- filter out self dependencies

/* This select statement generates DDL of this general format:

alter table [TABL Orders] add constraint fk_U_Orders_U_Customers foreign key (fk) references [TABL Customers] (pk)
*/
select 'alter table [TABL'
+ ' ' + child.name
+ '] add constraint fk_'
+ REPLACE(rtrim(child.type)  + '_' + child.name + '_' +
rtrim(parent.type) + '_' + parent.name, ' ', '_' )
+ ' foreign key (fk) references [TABL'
+ ' ' + parent.name
+ '] (pk)'
from sysobjects parent inner join sysreferences
on parent.id = sysreferences.rkeyid
inner join sysobjects child
on child.id = sysreferences.fkeyid

/* Step 2
Run the DLL created by the previous statement.
You should probably execute the DDL in a completely new database to keep
the diagramming tables separate from the real tables.
*/

/* Step 3
Go make your diagram!
Important usage note: You might need to check the checkbox "Add related tables
automatically" before you select your first table. It depends on what you are
trying to accomplish.
If you are diagramming Northwind, select TABL Products to get a good
example diagram (check the checkbox first).
*/
Source:  http://www.dbazine.com/sql/sql-articles/cook17




Find the Nth Max Value

28 02 2007

Replace Employee with your table name, and Salary with your column name. The example below returns the 4th highest salary from the employee table.

Select * From Employee E1 Where
(4-1) = (Select Count(Distinct(E2.Salary)) From Employee E2 Where
E2.Salary > E1.Salary)

Source: http://www.sqlteam.com/item.asp?ItemID=16134





Return Numerics with Commas

28 02 2007

Replace @x with the field or variable you want formatted

replace(convert(varchar(25), convert(money, @x), 1), '.00', '')