
Compare Table Data

Currently I use RedGate's SQL Data Compare to see differences in data across databases. It's a great product. One thing I love about RedGate is the usability of their products; all of them are intuitive and easy to use. But what if you don't have the cash for it?

You could use the EXCEPT clause (introduced in SQL Server 2005) to do your comparison. For example:

  SELECT  *  
  FROM    #table2 -- returns 30 records

  EXCEPT

  SELECT  *  
  FROM    #table1 -- returns 10 records.

If all 10 records returned by the table1 query also exist in table2, you get a result set of 20 records. You can think of it as subtracting from table2 the records that also appear in table1, leaving 20 records.

The following statement builds on that, taking the result and inserting it into table1. This is useful if you want to sync an outdated table (in this case table1) with another table that has additional records (table2).

INSERT INTO #table1
  SELECT  *  
  FROM    #table2 -- latest data table (we'll be copying from here)

  EXCEPT

  SELECT  *
  FROM    #table1 -- outdated table (has old data)

Other tools to compare data across tables: CompareData may do what you want, and I believe its table compare is free and unlimited in the evaluation version. Another great thing about this tool is that it can compare the results of two SQL statements / stored procs, which is useful when unit testing.

You can also use TableDiff.exe, found in your "C:\Program Files\Microsoft SQL Server\90\COM" directory. Learn more about it at the SQL Team website.
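
I haven't memorized every switch, but a typical run looks roughly like this (server, database, and table names below are made up); the -f switch writes out a T-SQL script that would make the destination match the source:

tablediff -sourceserver MYSERVER -sourcedatabase SalesDB -sourcetable Orders -destinationserver MYSERVER -destinationdatabase SalesDB_Copy -destinationtable Orders -f c:\temp\fix_Orders.sql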

Connection Problem with SQL Server 2008

I can't say how many different problems I've had connecting to SQL Server; I've lost count at this point. This was the latest one, which I've encountered a number of times and have always managed to fix.

Cannot connect to x.x.x.x.

Additional Information:
A network-related or instance-specific error occurred while establishing a connection to SQL Server….

Now usually when I encounter this error, I do something suggested here at Pinal Dave's site. This time nothing mentioned there worked. What did work was explicitly selecting the Network protocol in the connection dialog, rather than leaving it at the default.

Still wondering why I have to do this, as I’ve never had to explicitly select the protocol before, and nothing on the server has changed in years. Will continue to investigate.
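
One thing that helps when poking at connection issues like this: you can ask SQL Server which protocol a session actually connected over. This query (SQL Server 2005 and up) shows it for your own session:

-- net_transport shows TCP, Named Pipes, or Shared Memory for the
-- current connection; @@SPID limits the result to your own session.
SELECT session_id, net_transport, auth_scheme
FROM   sys.dm_exec_connections
WHERE  session_id = @@SPID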

RedGate SQL Search

I love this tool. It's a free add-on if you own SQL Prompt, RedGate's version of IntelliSense, which is, in my eyes, better than Microsoft's. It uses SQL Prompt's FTS collection to search for any piece of string in any DB object, and it's crazy fast. What I like most, though, is the ability to right-click a result and move the focus to the appropriate DB object in the explorer panel. Hit CTRL+ALT+D to bring up the search screen and start typing away.

TSQL Functions Inspired By ColdFusion’s Lists Functions

In my last project there was a bit of data scrubbing to do on the database side (SQL Server 2008), so I decided to create a few UDFs that function similarly to ColdFusion's list functions. The one that varies a little is ListLen(), since I needed to take empty tokens into account. ChopIf() was inspired by Perl's chop() function. These UDFs should be SQL Server 2005-compatible.

I should say, though, that some of these functions depend on each other; ListLen(), GetToken(), and ChopIf() are the independent ones.

------------------------------------------------------------------
-- Functions similarly to ColdFusion's ListSort() function,
-- except it currently only sorts strings. 
--
-- Example 1:
--    dbo.ListSort( 'dan is so mega awesome that he rules all the time', 'ASC', ' ' )
--
-- Returns: 
--    all awesome dan he is mega rules so that the time
--
-- Example 2:
--    dbo.ListSort( 'dan is so mega awesome that he rules all the time', 'DESC', ' ' )
--
-- Returns: 
--    time the that so rules mega is he dan awesome all
------------------------------------------------------------------

CREATE FUNCTION [dbo].[ListSort]
(
  @string    VARCHAR(2000),  
  @sort_type VARCHAR(4)    = 'ASC',  -- VARCHAR(4) so 'DESC' isn't truncated
  @delimiter VARCHAR(2000) = ','
)
RETURNS VARCHAR(500)
AS

BEGIN

  DECLARE @position AS INT
  DECLARE @token AS VARCHAR (2000)
  DECLARE @counter   AS INT
  DECLARE @sortedList AS VARCHAR(500)

  DECLARE @sortTempTable TABLE ( token VARCHAR(500) )
  DECLARE @sortedTable   TABLE ( token VARCHAR(500) )  

  SELECT @string   = @string + @delimiter,
         @counter  = 1,
         @position = 0,
         @token    = ''

  WHILE ( PATINDEX( '%' + @delimiter + '%' , @string ) > 0 ) 
  BEGIN
    SELECT @position = PATINDEX('%' + @delimiter + '%' , @string ),
           @token    = LEFT( @string, @position - 1 ),
           @string   = STUFF( @string, 1, @position, NULL ),
           @counter  = @counter + 1

    INSERT @sortTempTable( token ) VALUES( @token )     
  END
  
  SET @sortedList = ''

  -- Sort the tokens and put them into @sortedTable.
  -- Because of the nature of RANK(), we can't build @sortedList in this
  -- statement; that happens in a separate SELECT below.
  INSERT INTO @sortedTable
    SELECT LTRIM( token )
    FROM   @sortTempTable
    ORDER  BY CASE WHEN @sort_type = 'ASC'  THEN ( RANK() OVER ( ORDER BY LTRIM(token) ASC ) )
                   WHEN @sort_type = 'DESC' THEN ( RANK() OVER ( ORDER BY LTRIM(token) DESC ) )
              END  

  SELECT @sortedList = @sortedList + token + @delimiter
  FROM   @sortedTable

  RETURN dbo.ChopIf( @sortedList, @delimiter )

END
GO
------------------------------------------------------------------
-- Functions sort of like ColdFusion's ListLen() method, but it
-- takes into account empty tokens. 
--
-- Example 1:
--    dbo.ListLen( 'Dan is cool', ' ' )
--
-- Returns: 
--    3
-- 
-- Example 2:
--    dbo.ListLen( 'dan,,very,,,,awesome,', ',' )
--
-- Returns: 
--    8
------------------------------------------------------------------

CREATE FUNCTION [dbo].[ListLen]
(
  @string VARCHAR(2000),
  @delimiter VARCHAR(2000) = ','
)
RETURNS INT
AS
BEGIN

  DECLARE @loopCount INT, 
          @tokenCount INT

  SELECT @loopCount = 1,  -- SUBSTRING is 1-based, so start scanning at 1
         @tokenCount = 0

  -- If it's an empty string, the list length is 0
  IF DATALENGTH( @string ) = 0
    BEGIN
      SET @tokenCount = 0
    END
  ELSE
    BEGIN
      -- Count tokens, including empty ones like dan,,very,,,,awesome,
      SET @tokenCount = @tokenCount + 1
      WHILE ( @loopCount < DATALENGTH( @string ) )
      BEGIN
        IF SUBSTRING( @string, @loopCount, DATALENGTH( @delimiter ) ) = @delimiter
          BEGIN
            SET @tokenCount = @tokenCount + 1
          END
        SET @loopCount = @loopCount + 1
      END
    END

  -- If there's no token to the right of the last delimiter, then count that
  -- as an empty token.
  IF ( RIGHT( @string, 1 ) = @delimiter ) 
  BEGIN
    SET @tokenCount = @tokenCount + 1
  END

  RETURN @tokenCount

END
GO

------------------------------------------------------------------
-- Functions like ColdFusion's ListLast().
-- Gets the last token in a list separated by a delimiter.
--
-- Example:
--    dbo.ListLast( 'Dan is cool', ' ' )
--
-- Returns: 
--    cool
------------------------------------------------------------------

CREATE FUNCTION [dbo].[ListLast]
(
  @string    VARCHAR(2000),
  @delimiter VARCHAR(2000) = ','
)
RETURNS VARCHAR(2000)
AS
BEGIN

  RETURN dbo.ListGetAt( @string, dbo.ListLen( @string, @delimiter ), @delimiter )

END
GO

------------------------------------------------------------------
-- Wrapper for the GetToken() function.
-- Gets the token at the given position in a delimited list.
--
-- Example:
--    dbo.ListGetAt( 'Dan is cool', 2, ' ' )
--
-- Returns: 
--    is
------------------------------------------------------------------

CREATE FUNCTION [dbo].[ListGetAt]
(
  @string    VARCHAR(2000),
  @token     INT,
  @delimiter VARCHAR(2000)
)
RETURNS VARCHAR(2000)
AS
BEGIN
  RETURN dbo.GetToken( @string, @token, @delimiter )
END
GO

------------------------------------------------------------------
-- Returns the first item in a tokenized list.
--
-- Example:
--    dbo.ListFirst( 'Dan is cool', ' ' )
--
-- Returns: 
--    Dan
------------------------------------------------------------------

CREATE FUNCTION [dbo].[ListFirst]
(
  @string    VARCHAR(2000),
  @delimiter VARCHAR(2000) = ','
)
RETURNS VARCHAR(2000)
AS
BEGIN

  RETURN dbo.ListGetAt( @string, 1, @delimiter )

END
GO

------------------------------------------------------------------
-- Functions similarly to ColdFusion's GetToken() function.
-- Gets the token at the given position in a delimited list.
--
-- Example:
--    dbo.GetToken( 'Dan is cool', 2, ' ' )
--
-- Returns: 
--    is
------------------------------------------------------------------

CREATE FUNCTION [dbo].[GetToken]
(
  @string        VARCHAR(2000),
  @tokenPosition INT,
  @delimiter     VARCHAR(2000)
)
RETURNS VARCHAR(2000)
AS
BEGIN

  DECLARE @position AS INT
  DECLARE @token    AS VARCHAR(2000)
  DECLARE @counter  AS INT

  SELECT @string   = @string + @delimiter,
         @counter  = 1,
         @position = 0,
         @token    = ''

  -- Walk the list until we've consumed @tokenPosition tokens
  WHILE ( PATINDEX( '%' + @delimiter + '%' , @string ) > 0 ) AND ( @tokenPosition + 1 > @counter )
  BEGIN
    SELECT @position = PATINDEX( '%' + @delimiter + '%' , @string ),
           @token    = LEFT( @string, @position - 1 ),
           @string   = STUFF( @string, 1, @position, NULL ),
           @counter  = @counter + 1
  END

  RETURN @token

END
GO

------------------------------------------------------------------
-- Chops the trailing @chopped off @string if present.
--
-- Example:
--    dbo.ChopIf( 'Dan is cool!', '!' )
--
-- Returns: 
--    Dan is cool
------------------------------------------------------------------

CREATE FUNCTION [dbo].[ChopIf]
(
  @string VARCHAR(2000),
  @chopped VARCHAR(2000)
)
RETURNS VARCHAR(2000)
AS
BEGIN

  IF ( RIGHT( @string, DATALENGTH(@chopped) ) = @chopped )
  BEGIN
    SET @string = LEFT( @string, DATALENGTH( @string ) - DATALENGTH( @chopped ) ) 
  END 

  RETURN @string
  
END
GO
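
And a quick sanity check of the whole set once the functions are created, with expected output in the comments:

SELECT dbo.ListSort( 'charlie,alpha,bravo', 'ASC', ',' )  -- alpha,bravo,charlie
SELECT dbo.ListLen( 'dan,,very,,,,awesome,', ',' )        -- 8
SELECT dbo.ListFirst( 'Dan is cool', ' ' )                -- Dan
SELECT dbo.ListLast( 'Dan is cool', ' ' )                 -- cool
SELECT dbo.GetToken( 'Dan is cool', 2, ' ' )              -- is
SELECT dbo.ChopIf( 'Dan is cool!', '!' )                  -- Dan is cool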

Testing Database Import Scripts with Snapshots

I can't stress enough how handy database snapshots (available only in the Enterprise and Developer editions of SQL Server 2005+) are when testing bulk imports, data scrubbing, or any sort of data / schema modification script. In nearly no time, I can create a snapshot (via a shortcut snippet) and run my script. Don't like the results? Script crapped out the data? I can do an instant undo by restoring from the snapshot, which works pretty fast. If I want another snapshot, I usually just overwrite the one I created by dropping it and creating it again.

Anywhoot, here’s how you create a snapshot:

-- CREATING A SNAPSHOT
CREATE DATABASE YourDatabase_snap ON         -- Name of new snapshot
(
  NAME = yourdb_data,                        -- Logical filename of source db you want to snapshot
  FILENAME = 'c:\YourDatabase_data_1800.ss'  -- Sparse file to create
)
AS SNAPSHOT OF YourDatabase;                 -- Source db name
-- RECOVERING FROM A SNAPSHOT
USE master;
RESTORE DATABASE YourDatabase                    -- Source db name where the data resided
FROM DATABASE_SNAPSHOT = 'YourDatabase_snap';    -- Snapshot db name

Deleting a snapshot is just like dropping a database.

DROP DATABASE YourDatabase_Snap

Linked Server via MS Jet 4.0 Provider

Here's another way to create a linked server, using a different provider: the Microsoft Jet 4.0 OLE DB Provider. This is for SQL Server 2008. Check out my previous Linked Server tutorial if you need the SQL Server 2005 version.

For this setup, let's use the login's current security context. Make sure that whatever SSMS connection you use to query the server uses the same credentials as the user that created the linked server.

So if I create it on a server, then I'll have to query it using those same credentials. If you want to know how to query it, check out my Previous Tutorial.
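
If you'd rather script the setup than click through SSMS, it looks roughly like this; the linked server name and workbook path below are placeholders, not my actual setup:

-- Link an Excel file through the Microsoft Jet 4.0 OLE DB provider
EXEC sp_addlinkedserver
     @server     = 'HARDWARE',                -- placeholder linked server name
     @srvproduct = 'Jet 4.0',
     @provider   = 'Microsoft.Jet.OLEDB.4.0',
     @datasrc    = 'C:\Data\Hardware.xls',    -- placeholder path to the workbook
     @provstr    = 'Excel 8.0'

-- Use the login's current security context
EXEC sp_addlinkedsrvlogin
     @rmtsrvname = 'HARDWARE',
     @useself    = 'TRUE'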

syspolicy_purge_history

This is a new job created by default on SQL Server 2008, and out of the box it will most likely fail unless you fix it.

Where it breaks is on STEP 3, which is a PowerShell command that does not reference the correct SQL Server object. Change it to the following to fix it:

(Get-Item SQLSERVER:\SQLPolicy\COMPUTERNAME\DEFAULT).EraseSystemHealthPhantomRecords()

The purpose of this job is to purge unneeded information coming from SQL Server 2008’s new Policy Management features.

More about this particular issue.

More about Policy-Based Management by Pinal Dave.

Common Table Expressions

This feature was introduced in SQL Server 2005. It's a great way to query the results of another query on the fly. I prefer CTEs over derived tables (DTs) because they provide more flexibility. Some people report better performance with Common Table Expressions (CTEs), though I've also seen and heard the opposite (that DTs are faster), so I suppose it depends. Just test it out and see for yourself.

Anywhoot, let’s play with CTEs. First let’s create two tables with dummy data.

CREATE TABLE Records 
(
  RecordID INT IDENTITY(1, 1) PRIMARY KEY,
  RandomData VARCHAR(100)
)

DECLARE @t INT 
SET @t = 0
WHILE @t < 1000
BEGIN
  SET @t = @t + 1
  INSERT INTO Records VALUES( NEWID() ) 
END
GO -- start a new batch so @t can be declared again below

CREATE TABLE Information 
(
  RecordID INT IDENTITY(1, 1) PRIMARY KEY,
  RandomData VARCHAR(100)
)
DECLARE @t INT 
SET @t = 0
WHILE @t < 1000
BEGIN
  SET @t = @t + 1
  INSERT INTO Information VALUES( NEWID() ) 
END

Now that we've created the dummy tables, here's a barebones example of a CTE:

WITH Slice1 AS 
(
  -- The results for this query gets put into Slice1
  -- It persists for the life of this query.
  SELECT * FROM Records
  WHERE RecordID BETWEEN 5 AND 400

) -- Done creating a virtual table called Slice1, now let's 
  -- query it:
  SELECT * FROM Slice1
  WHERE RecordID > 300

A CTE's real power shows when you create multiple virtual tables and then query them together, joining any of the virtual tables you created:

-- This whole thing is 1 query:
WITH    Slice1
          AS ( SELECT   RecordID,
                        RandomData
               FROM     Records
             ) , -- done creating Table Slice1
             
        Slice2
          AS ( SELECT   RecordID
               FROM     Records
               WHERE    RecordID BETWEEN 10 AND 20
             ) , -- done creating Table Slice2
             
        Info
          AS ( SELECT   RecordID
               FROM     Information
               WHERE    RecordID IN ( 5, 6, 7, 9, 15, 18 )
             ) -- done creating Table Info
             
  -- Now that we've created all these virtual tables, let's use them together in
  -- one single query:             
    SELECT  RecordID,
            RandomData
    FROM    Slice1
    WHERE   Slice1.RecordID IN ( SELECT *
                                 FROM   Info
                                 WHERE  RecordID IN ( SELECT    RecordID
                                                      FROM      Slice2 ) )
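
Side note: the final SELECT above filters with nested IN subqueries. Swap it for the version below and the CTEs are joined directly, returning the same rows; some will find this easier to read:

    SELECT  Slice1.RecordID,
            Slice1.RandomData
    FROM    Slice1
            JOIN Info   ON Info.RecordID   = Slice1.RecordID
            JOIN Slice2 ON Slice2.RecordID = Info.RecordID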

Linked Servers

I had to import information from an Excel file with datasheets that had 40+ columns. SSIS can be a bit tricky at times, so I decided to use a linked server instead. This feature works well; it's fast and less of a headache than SSIS. Though originally designed for connecting to other databases, you can also use it to import information by linking to a file. Here's how I went about importing an Excel (.xls) file (this is for SQL Server 2005):

  1. Under Server Objects in your instance, create a new Linked Server:

  2. Under the General section, pick an appropriate name for your linked server. Pick the OLE DB provider for Excel documents:

  3. Since I’m using this on a local machine, I don’t have to worry about security too much. Select "Be made without using a security context" under the Security section.

  4. Select your Server Options. I suggest these settings for local access.

  5. Hit OK to create it, and you'll see the new linked server under Server Objects.

SQL Server reads a spreadsheet in a workbook as a table. So now that we’ve created our linked server, let’s see how to query them.

-- Querying three spreadsheets.
SELECT * FROM Hardware...['CORE PROBONO$']
SELECT * FROM Hardware...['ET013-PartialRackElevation$']
SELECT * FROM Hardware...[ILO_TEMPLATE$]

Since I don't always want to rely on the linked server, I create tables in my General database, where I slice and dice the data.

-- Import data from a linked server into a database table
SELECT * 
INTO General.dbo.Elevation
FROM Hardware...['ET013-PartialRackElevation$']

LogParser to Query IIS logs using SQL

LogParser is a great way to query IIS logs (actually, any delimited text log).

Once you have it installed (the default install is to C:\Program Files\Log Parser 2.2), let's try to query the log file ex090915.log in the directory C:\WINDOWS\system32\LogFiles\W3SVC1942853941. You do it like this:

LogParser "select date, s-ip, cs-method from C:\WINDOWS\system32\LogFiles\W3SVC1942853941\ex090915.log" -rtp:-1

As you can probably imagine, "date", "s-ip", and "cs-method" are column headers from the log file. The select statement goes in quotes, and rather than naming a table, you give the path to the log file. What's the -rtp:-1 argument for? Without it, LogParser shows results 10 at a time, prompting you to "press a key…" between batches. In any case, the select statement we just ran spits out the following in the console:

date       s-ip            cs-method 
---------- --------------- ---------
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 POST
2009-09-15 192.168.157.128 POST


Statistics:
-----------
Elements processed: 27
Elements output:    27
Execution time:     0.02 seconds

LogParser will even generate graphs (.gif format) of your results.
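
I haven't reproduced one here, but as a rough sketch, a command along these lines should chart hits by method into a .gif (LogParser's chart output needs the Office Web Components installed):

LogParser "SELECT cs-method, COUNT(*) AS Hits INTO methods.gif FROM C:\WINDOWS\system32\LogFiles\W3SVC1942853941\ex090915.log GROUP BY cs-method" -o:CHART -chartType:Pie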

If you want to use a GUI for your queries, I suggest you try Log Parser Lizard.

Log Parser Lizard

Log Parser Lizard is a great free tool if you use Log Parser to query IIS logs with SQL. It's a visual tool for querying the logs, and it comes with pre-made queries. Let's take a look at one, "Requests and Full Status by Number of Hits," against IIS logs:

-- Let's query the IIS W3SVC80086301 Log file c:\temp\logs\ex080918.log
SELECT  STRCAT( cs-uri-stem, 
    REPLACE_IF_NOT_NULL(cs-uri-query, STRCAT('?',cs-uri-query))
    ) AS Request, 
  STRCAT( TO_STRING(sc-status),     
    STRCAT( '.',
      COALESCE(TO_STRING(sc-substatus), '?' )
      )
    ) AS Status, 
  COUNT(*) AS Total 
FROM c:\temp\logs\ex080918.log 
WHERE (sc-status >= 400) 
GROUP BY Request, Status 
ORDER BY Total DESC

Running it gives you a grid of results (depending, of course, on what's in your logs).

Also, I could’ve queried all the log files put together, such as:

select * from c:\temp\logs\*log

Also, you can create global variables and use them in your queries so that you don't always have to type the full path to a file.

For example, I set the variable IISW3C to c:\temp\logs\ex*.log. The queries that come with this tool use these variables (keys) as shortcuts. For your IIS logs directory, you may want to point it at C:\WINDOWS\system32\LogFiles\W3SVC80086301. Once you've done this, you can run (hit F5 to execute the query):

-- Get the top 10 from all IIS logs
select top 10 * from #IISW3C# 

You can also view LogParser graphs from within this tool. Let's try the query that shows every extension with its total hits:

SELECT  TO_UPPERCASE(EXTRACT_EXTENSION( cs-uri-stem )) AS Extension, 
  COUNT(*) AS [Total Hits]
FROM #IISW3C# 
GROUP BY Extension 
-- Ignore .CFM extension
HAVING TO_UPPERCASE(EXTRACT_EXTENSION( cs-uri-stem )) <> 'CFM'
ORDER BY [Total Hits] DESC

Import MySQL Data into SQL Server

Today I needed to analyze some forum data from vBulletin running on MySQL. The table had 60,000 records. Because my playing field is SQL Server, not MySQL, and I needed to slice and dice the data, I had to get it over to SQL Server somehow. Due to security restrictions, I could not set up a linked server on SQL Server, and I don't have remote access to the Linux box either. I tried exporting from SQLYog, but the CSV data could not be properly delimited and failed when I imported it via the SSIS import wizard (the table gives posters a lot of flexibility to use any character, and spammers often abuse it). So what did I do?

I only had 4 columns to import. So I ran a select statement returning one column, ordered by the id, and copied and pasted the results into an Excel spreadsheet. I did this for all four columns. Because Excel separates values with cells rather than delimiters, I didn't have to worry about the data breaking the format. After that, I ran an import via the SSIS import wizard. Ta-da, I can now slice and dice my data. There are probably more efficient ways to do this, but I needed a quick solution and this did it.

WordPress Tip: Allow comments on this post by default

Tired of checking the "Allow comments on this post" checkbox every time I made a new post, I went into the database and changed the setting. By default it's set to "closed," meaning you have to check the box every time. See for yourself:

select option_name, option_value
from wp_options where option_name in ( 'default_comment_status' );

Change it to “open”:

update wp_options
set option_value='open'
where option_name in('default_comment_status');

Now the checkbox should always be checked.

Bulk Import Ignoring Identity Column

Ever have to bulk import a text file (e.g., exported from Excel, or tab-delimited rows) into a table whose first column is an auto-incrementing identity primary key? Yes, you could create a format file that skips the identity column, so that the first column of your text file doesn't go into the identity column of your table. This MSDN page shows more about it.

The quick way, though, is to create a view on that table that omits the identity column. That way, the first column in the text file won't map to the identity column and throw one of those delicious BCP errors.
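
Here's a minimal sketch of the idea; the table, view, and file path are made up for illustration:

-- Hypothetical target table; the identity column is the one we want skipped
CREATE TABLE People
(
  PersonID  INT IDENTITY(1, 1) PRIMARY KEY,
  FirstName VARCHAR(50),
  LastName  VARCHAR(50)
)
GO

-- The view exposes every column except the identity column
CREATE VIEW PeopleImport
AS
  SELECT FirstName, LastName
  FROM   People
GO

-- Bulk insert the tab-delimited file into the view; PersonID
-- auto-increments as the rows land in the base table
BULK INSERT PeopleImport
FROM 'c:\temp\people.txt'   -- made-up path
WITH ( FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n' )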