Category: SQL Server

Linked Server via MS Jet 4.0 Provider

Here’s another way to create a linked server, this time using a different provider: the Microsoft Jet 4.0 OLE DB Provider. This is for SQL Server 2008. Check out my previous Linked Server tutorial if you need it for SQL Server 2005.

For this setup, let’s use the login’s current security context. Make sure that whatever SSMS session you use to connect to the server runs under the same credentials as the user that created the linked server.

So if I create the linked server, I’ll have to query it using the same credentials I just used. If you want to know how to query it, check out my previous tutorial.
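
If you’d rather script it than click through SSMS, here’s a minimal T-SQL sketch; the linked server name, file path, and provider string are assumptions you’d adjust for your setup:

-- Create the linked server using the MS Jet 4.0 provider (hypothetical name and path)
EXEC sp_addlinkedserver
  @server     = 'XLS',                      -- linked server name (your choice)
  @srvproduct = 'Jet 4.0',
  @provider   = 'Microsoft.Jet.OLEDB.4.0',
  @datasrc    = 'C:\Data\Inventory.xls',    -- hypothetical Excel file
  @provstr    = 'Excel 8.0'

-- Have logins connect using their own (current) security context
EXEC sp_addlinkedsrvlogin
  @rmtsrvname = 'XLS',
  @useself    = 'TRUE'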

syspolicy_purge_history

This is a new job created by default on SQL Server 2008, and out of the box it will most likely fail until you fix it.

Where it breaks is on Step 3, which is a PowerShell command that doesn’t reference the correct SQL Server object. Change it to the following to fix it:

(Get-Item SQLSERVER:\SQLPolicy\COMPUTERNAME\DEFAULT).EraseSystemHealthPhantomRecords()

The purpose of this job is to purge unneeded information coming from SQL Server 2008’s new Policy Management features.

More about this particular issue.

More about Policy-Based Management by Pinal Dave.

Common Table Expressions

This feature was introduced in SQL Server 2005. It’s a great way to query another query on the fly. I prefer using these over derived tables (DTs) because they provide more flexibility. Some people report better performance using Common Table Expressions (CTEs), though I’ve also seen and heard the opposite (that DTs are faster), so I suppose it depends. Just test it out and see for yourself.

Anywhoot, let’s play with CTEs. First let’s create two tables with dummy data.

CREATE TABLE Records 
(
  RecordID INT IDENTITY(1, 1) PRIMARY KEY,
  RandomData VARCHAR(100)
)

DECLARE @t INT 
SET @t = 0
WHILE @t < 1000
BEGIN
  SET @t = @t + 1
  INSERT INTO Records ( RandomData ) VALUES ( NEWID() )
END
GO

CREATE TABLE Information
(
  RecordID INT IDENTITY(1, 1) PRIMARY KEY,
  RandomData VARCHAR(100)
)
DECLARE @t INT 
SET @t = 0
WHILE @t < 1000
BEGIN
  SET @t = @t + 1
  INSERT INTO Information ( RandomData ) VALUES ( NEWID() )
END

Now that we’ve created the dummy tables, here’s a barebones example of a CTE:

WITH Slice1 AS 
(
  -- The results of this query get put into Slice1.
  -- It persists for the life of this query.
  SELECT * FROM Records
  WHERE RecordID BETWEEN 5 AND 400

) -- Done creating a virtual table called Slice1, now let's 
  -- query it:
  SELECT * FROM Slice1
  WHERE RecordID > 300
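
For comparison, here’s that same query written as a derived table; the inner query nests inside the outer one instead of reading top-down:

SELECT *
FROM   ( SELECT *
         FROM   Records
         WHERE  RecordID BETWEEN 5 AND 400
       ) AS Slice1
WHERE  Slice1.RecordID > 300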

The real power of CTEs comes when you create multiple virtual tables and then query them together at the end, joining any of the virtual tables you created:

-- This whole thing is 1 query:
WITH    Slice1
          AS ( SELECT   RecordID,
                        RandomData
               FROM     Records
             ) , -- done creating Table Slice1
             
        Slice2
          AS ( SELECT   RecordID
               FROM     Records
               WHERE    RecordID BETWEEN 10 AND 20
             ) , -- done creating Table Slice2
             
        Info
          AS ( SELECT   RecordID
               FROM     Information
               WHERE    RecordID IN ( 5, 6, 7, 9, 15, 18 )
             ) -- done creating Table Info
             
  -- Now that we've created all these virtual tables, let's use them together in
  -- one single query:             
    SELECT  RecordID,
            RandomData
    FROM    Slice1
    WHERE   Slice1.RecordID IN ( SELECT RecordID
                                 FROM   Info
                                 WHERE  RecordID IN ( SELECT    RecordID
                                                      FROM      Slice2 ) )

Linked Servers

I had to import information from an Excel file with datasheets that had 40+ columns. Using SSIS can be a bit tricky sometimes, so I decided to use a linked server instead. This feature works well: it’s fast and less of a headache than SSIS. Though linked servers were originally designed for connecting to other databases, you can use them to import information by linking to a file. Here’s how I went about importing an Excel (.xls) file (this is for SQL Server 2005):

  1. Under Server Objects in your instance, create a new Linked Server.

  2. Under the General section, pick an appropriate name for your linked server, and pick the OLE DB provider for Excel documents.

  3. Since I’m using this on a local machine, I don’t have to worry about security too much. Select "Be made without using a security context" under the Security section.

  4. Select your Server Options as appropriate for local access.

  5. Hit OK to create it, and you’ll see the new linked server under Server Objects.

SQL Server reads each spreadsheet in a workbook as a table. Now that we’ve created our linked server, let’s see how to query those spreadsheets.

-- Querying three spreadsheets.
SELECT * FROM Hardware...['CORE PROBONO$']
SELECT * FROM Hardware...['ET013-PartialRackElevation$']
SELECT * FROM Hardware...[ILO_TEMPLATE$]
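
If you don’t remember the exact sheet names, sp_tables_ex will list the tables (sheets) that the linked server exposes:

-- List the sheets available through the Hardware linked server
EXEC sp_tables_ex 'Hardware'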

Since I don’t always want to rely on the linked server, I create tables in my General database, where I slice and dice the data.

-- Import data from a linked server into a database table
SELECT * 
INTO General.dbo.Elevation
FROM Hardware...['ET013-PartialRackElevation$']

LogParser to Query IIS logs using SQL

LogParser is a great way to query IIS logs (actually, any delimited text log).

Once you have it installed (the default install is to C:\Program Files\Log Parser 2.2), let’s try to query the log file ex090915.log from the directory C:\WINDOWS\system32\LogFiles\W3SVC1942853941 . You would do it like this:

LogParser "select date, s-ip, cs-method from C:\WINDOWS\system32\LogFiles\W3SVC1942853941\ex090915.log" -rtp:-1

As you can probably imagine, “date”, “s-ip”, and “cs-method” are column headers from the log file. The select statement goes in quotes, and rather than naming a table, you give the path to the log file. What’s the argument -rtp:-1 ? If you don’t include it, LogParser will show 10 results, prompt you to “press a key…,” and then show the next batch. In any case, the select statement we just ran will spit out the following in the console:

date       s-ip            cs-method 
---------- --------------- ---------
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 GET
2009-09-15 192.168.157.128 POST
2009-09-15 192.168.157.128 POST


Statistics:
-----------
Elements processed: 27
Elements output:    27
Execution time:     0.02 seconds

LogParser will even generate graphs (.gif format) of your results.
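
For example, a command along these lines should chart hits per HTTP method (a sketch I haven’t run here; Methods.gif is just a name I picked, and the CHART output format requires the Office Web Components to be installed):

LogParser "SELECT cs-method, COUNT(*) AS Hits INTO Methods.gif FROM C:\WINDOWS\system32\LogFiles\W3SVC1942853941\ex090915.log GROUP BY cs-method" -o:CHART -chartType:Pie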

If you want to use a GUI for your queries, I suggest you try Log Parser Lizard.

Log Parser Lizard

Log Parser Lizard is a great free tool if you use Log Parser to query IIS logs with SQL. It’s a visual tool for querying the logs, and it comes with pre-made queries. Let’s take a look at one, “Requests and Full Status by Number of Hits”, against some IIS logs:

-- Let's query the IIS W3SVC80086301 Log file c:\temp\logs\ex080918.log
SELECT  STRCAT( cs-uri-stem, 
    REPLACE_IF_NOT_NULL(cs-uri-query, STRCAT('?',cs-uri-query))
    ) AS Request, 
  STRCAT( TO_STRING(sc-status),     
    STRCAT( '.',
      COALESCE(TO_STRING(sc-substatus), '?' )
      )
    ) AS Status, 
  COUNT(*) AS Total 
FROM c:\temp\logs\ex080918.log 
WHERE (sc-status >= 400) 
GROUP BY Request, Status 
ORDER BY Total DESC

This returns the requests with status 400 and up, along with their full status codes and hit counts (depending, of course, on what’s in your logs).

Also, I could’ve queried all the log files put together, like so:

select * from c:\temp\logs\*.log

Also, you can create global variables and use them in your queries so that you don’t always have to put the full path to a file. For example, I set the variable IISW3C equal to c:\temp\logs\ex*.log . The queries that come with this tool use these variables (keys) as shortcuts. For your IIS logs directory, you may want to point it to C:\WINDOWS\system32\LogFiles\W3SVC80086301 . Once you’ve done this, you can run the following (hit F5 to run the query):

-- Get the top 10 from all IIS logs
select top 10 * from #IISW3C# 

You can also view LogParser graphs from this tool. Let’s try the query that shows all extensions with their total hits:

SELECT  TO_UPPERCASE(EXTRACT_EXTENSION( cs-uri-stem )) AS Extension, 
  COUNT(*) AS [Total Hits]
FROM #IISW3C# 
GROUP BY Extension 
-- Ignore .CFM extension
HAVING TO_UPPERCASE(EXTRACT_EXTENSION( cs-uri-stem )) <> 'CFM'
ORDER BY [Total Hits] DESC


Import MySQL Data into SQL Server

Today I needed to analyze some forum data from vBulletin running on MySQL. The table in MySQL had 60,000 records. Because my playing field is SQL Server, not MySQL, and I needed to slice and dice the data, I needed a way to get the data onto SQL Server. Because of some security restrictions, I could not set up a linked server on SQL Server, and I don’t have remote access to the Linux box either. I tried exporting from SQLYog, but the CSV data could not be properly delimited, and the import failed when I ran it through the SSIS import wizard (the table gives posters a lot of flexibility to use any character and is often abused by spammers). What did I do?

I only had 4 columns to import from the table. So I ran a select statement returning one column, ordered by the id, and copied and pasted the results into an Excel spreadsheet. I did this for all four columns. Because Excel separates data into cells rather than with delimiters, I didn’t have to worry about the data breaking. After that, I did an import via the SSIS import wizard. Ta-da, I can now slice and dice my data. There are probably more efficient ways to do this, but I needed a quick solution and this did it.

Bulk Import Ignoring Identity Column

Ever have to bulk import a text file (e.g. from Excel, or tab-delimited rows) into a table whose first column is an auto-incrementing identity primary key? You could create a format file that skips the identity column, so that the first column of your text file doesn’t go into the identity column of your table. This MSDN page shows more about it.

The quick way, though, is to create a view of that table and omit the identity column when you create the view. In this manner, the first column in the text file won’t map to the identity column and throw one of those delicious BCP errors.
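
Here’s a quick sketch of the idea; the table, view, and file names are made up, and the terminators will depend on your text file:

-- Table whose first column is an identity primary key
CREATE TABLE dbo.People
(
  PersonID  INT IDENTITY(1, 1) PRIMARY KEY,
  FirstName VARCHAR(50),
  LastName  VARCHAR(50)
)
GO

-- View that leaves out the identity column
CREATE VIEW dbo.PeopleImport
AS
  SELECT FirstName, LastName
  FROM   dbo.People
GO

-- Bulk import through the view; the identity column fills itself in
BULK INSERT dbo.PeopleImport
FROM 'C:\Data\people.txt'
WITH ( FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n' )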

IIS Logs Scripts

While working with some IIS logs, I decided to start practicing my Python, so I put together some handy Python functions for working with IIS log files. On a 2.5GHz machine with 3GB of RAM running Windows XP, these functions take about 3 seconds to process a 180MB text file. The code could be optimized to run faster if you’re dealing with larger files.

#!/usr/bin/env python

# An IIS log file can have various log properties. Every time you add new columns to log
# in IIS, the file gets a new header row listing the columns in use.
import re
import os

MainLogDelimiter = "#Software: Microsoft Internet Information Services 6.0"
TestFile         = "C:\\Dan\\IIS-Log-Import\\Logs\\not-the-same.txt"
BigTestFile      = "C:\\Dan\\IIS-Log-Import\\Logs\\ex090914\\ex090914.log"
LogsDir          = "C:\\Dan\\IIS-Log-Import\\Logs"

def SearchForFile( rootpath, searchfor, includepath = 0 ):
  
  # Search for a file recursively from a root directory.
  #  rootpath  = root directory to start searching from.
  #  searchfor = regexp to search for, e.g.:
  #                 search for *.exe : \.exe$
  #  includepath = appends the full path to the file
  #                this attribute is optional
  # Returns a list of filenames that can be used to loop
  # through.
  #
  # TODO: Use the glob module instead. Could be faster.  
  names = []
  append = ""
  for root, dirs, files in os.walk( rootpath ): 
    for name in files:
      if re.search( searchfor, name ):
        if includepath == 0:
          root = ""          
        else:          
          append = "\\"
        names.append( root + append + name )        
  return names  


def isSameLogProperties( FILE ):
  # Tests whether a log file has the same columns throughout.
  # This is in case new column properties were added/subtracted over
  # the course of the log file.
  FILE.seek( 0, 0 )
  SubLogs = FILE.read().split( MainLogDelimiter )
  
  # SubLogs[0] Stores the number of different log variations in the log file  
  SubLogs[0] = len( SubLogs ) - 1    
  
  # Grab the column names from the log file, separated by space
  columns = re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[1], re.IGNORECASE | re.MULTILINE ).group(1)   
  LogSameProperties = True
  
  for i in range( 2, SubLogs[0] + 1 ):
    # If there are columns
    if ( len( columns ) > 0 ):    
      if ( columns != re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[i], re.IGNORECASE | re.MULTILINE ).group(1) ):        
        LogSameProperties = False
        break  
    
  return LogSameProperties
  

def getFirstColumn( FILE ):
  # This gets the column header from a log file. It returns only the first header
  # row, and ignores any later ones that may exist in case new columns were
  # added/subtracted in IIS.
  # input: FILE
  # output: 1 single element List
  FILE.seek( 0, 0 )
  names = []
  # Grab the column names from the log file, separated by space
  names.append( re.search( "^#Fields:\s([\w\-()\s]+)$", FILE.read().split( MainLogDelimiter )[1], re.IGNORECASE | re.MULTILINE ).group(1).strip() )
  return names
  

def getAllColumns( FILE ):
  # This gets all the columns from a log file. 
  # input: FILE
  # output: List
  FILE.seek( 0, 0 )  
  names = []
  SubLogs = FILE.read().split( MainLogDelimiter )    
  # SubLogs[0] Stores the number of different log variations in the log file  
  SubLogs[0] = len( SubLogs ) - 1        
  for i in range( 1, SubLogs[0] + 1 ):        
    names.append( re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[i], re.IGNORECASE | re.MULTILINE ).group(1).strip() )  
  return names  


# EXAMPLE:
# Loop through all the IIS log files in the directory
for file in SearchForFile( LogsDir, "\.txt$", 1 ):
  LogFile = open( file, "r" )
  if ( isSameLogProperties( LogFile ) ):
    print file, "the same"
  else:
    print file, "not the same"
  LogFile.close()

Python and SQL Server

Setting up Python to connect to SQL Server was relatively easy. First, you select a DB API driver; I chose pyodbc because I saw a Python article on Simple-Talk that used it. There are two simple steps:

  1. Install Pywin32. Get the latest. It’s a dependency for pyodbc.
  2. Install pyodbc. Get it for the version of Python you’re using.

Once you’ve done this, you can query your SQL Server database like so:

import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=192.168.0.5;DATABASE=MyAwesomeDB;UID=sa;PWD=password')
cursor = connection.cursor()

cursor.execute("select * from states")

for row in cursor:
  print row.StateID, row.Abbreviation, row.Name

For more snippets and a tutorial, check out the documentation.

Now let’s try something more interesting. Let’s try doing some inserts and see how long it takes.

import win32api
import uuid
import pyodbc 

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=192.168.0.5;DATABASE=MrSkittles;UID=sa;PWD=password')
cursor = connection.cursor()

_start = win32api.GetTickCount()

for i in range( 0, 10000 ):  
  # Let's insert two pieces of data, both random UUIDs. 
  sql = "INSERT INTO Manager VALUES( '" + str( uuid.uuid4() ) + "', '" + str( uuid.uuid4() ) + "' )"  
  cursor.execute( sql )
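  # Note: committing on every iteration forces a round trip per row;
  # committing once after the loop would be considerably faster.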
  connection.commit()

_end = win32api.GetTickCount()
_total = _end - _start

print "\n\nProcess took", _total * .001, "seconds"

After some tests, 10,000 records took roughly 20 to 30 seconds, and 1,000,000 records took 30 to 40 minutes. A bit slow, but it’s not a server machine: mine is a Core Duo, 1.8GHz x 2, with ~4GB of RAM and PAE on Windows XP, and I ran this on a VMware VM with 1GB of RAM running SQL Server 2005 on Windows Server 2003. The table was a two-column table, both columns varchar(50). Committing once after the loop instead of on every insert would also speed things up. On a server machine, it should be a helluva lot faster.