
Remove Comments from IIS Logs

If Log Parser is too slow for you (for example, when you’re dealing with big IIS logs) and you want to bulk import your logs into SQL Server, you first have to remove the # comment lines from the log files. Microsoft has the PrepWebLog utility for this, but it seems to choke on files larger than 100 MB. You’d also have to wrap it in a batch file to run it over a whole directory of files.

I wrote a Perl script that’s relatively fast (faster than PrepWebLog) and it can crawl folders/subfolders recursively. Here it is:

# parse.pl
# example: 
#   parse c:\temp\logs\logs*\*.log
#
# Requirement: no spaces in the directory names and file names.
# This gets called via run.bat. 


sub getFileList 
{    
    # Returns the list of files (with full path) that match a glob filter.
    # $_[0] = filter, e.g.: getFileList( "*.log" );
    my @files = glob( $_[0] );
    return @files;    
}


sub remove_comments
{
  # Remove # pound sign comments from files. 
  # $_[0] = filename; the cleaned copy is written to "filename.txt"
  
  open (my $in, "<", $_[0]) 
      or die "in: $_[0]";
  open (my $out, ">", "$_[0].txt") 
      or die "out: $_[0]";

  while( my $line = <$in> )
  {
      print $out $line
          unless $line =~ /^#/;
  }

  close $in;
  close $out;
}


########## MAIN #############
$arg = $ARGV[0];

# Location of root directory of logs files
#$arg = 'c:\temp\logs\logs*\*.log';

# Replace slashes
$arg =~ s/\\/\\\\/g;

# Loop through all the log files. 
for $file (getFileList ($arg))
{  
  print ( "Processing file $file ... \n" );    
  remove_comments( $file );  
}

The Perl script gets called via run.bat:

REM No spaces in directory and file names.
perl Parse.pl D:\statesites\W3SVC*\*.log
pause
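
For comparison, here is a minimal Python 3 sketch of the same idea: walk a root folder and, for every .log file, write a .txt copy with the # comment lines dropped. The function names (strip_comments, strip_tree) and the suffix parameter are my own choices for illustration; the "name.log.txt" output naming simply mirrors what the Perl script does. It has not been timed against PrepWebLog or the Perl version.

```python
import os


def strip_comments(src_path, dst_path):
    """Copy src_path to dst_path, skipping lines that start with '#'."""
    kept = 0
    with open(src_path, "r", errors="replace") as src, open(dst_path, "w") as dst:
        for line in src:
            if not line.startswith("#"):
                dst.write(line)
                kept += 1
    return kept


def strip_tree(root, suffix=".log"):
    """Walk root recursively and write '<file>.txt' next to each matching file."""
    written = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(suffix):
                src = os.path.join(folder, name)
                strip_comments(src, src + ".txt")
                written.append(src + ".txt")
    return written
```

Unlike the Perl script, this version takes a root folder rather than a wildcard pattern, and it does not mind spaces in directory or file names.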

Sharepod Manages iPhone/iPod Songs

Just tried out Sharepod. It’s a great free program to extract songs from your device. No installation needed, since it’s a standalone executable. No ads or sign-ups. I did come across some quirks though.

I was able to:

  • Delete MP3s from my iPhone
  • Copy MP3s from my computer onto my iPhone
  • Copy MP3s from my iPhone onto my computer

I was not able to:

  • Update a playlist when I copied an MP3 from my computer to the iPhone – still need iTunes
  • Do anything with the Photos options on Sharepod – when I click on  “Photos” the application locks up.

IIS Logs Scripts

While working with some IIS logs, I decided to start practicing my Python, so I put together some handy functions for working with IIS log files. On a 2.5GHz machine with 3GB of RAM running WinXP, these functions take about 3 seconds to process a 180MB text file. The code could be optimized further if you’re dealing with larger files.

#!/usr/bin/env python

# An IIS log file can have various log properties. Every time you change the columns that IIS
# logs, it writes a new #Fields row listing the columns.
import re
import os

MainLogDelimiter = "#Software: Microsoft Internet Information Services 6.0"
TestFile         = "C:\\Dan\\IIS-Log-Import\\Logs\\not-the-same.txt"
BigTestFile      = "C:\\Dan\\IIS-Log-Import\\Logs\\ex090914\\ex090914.log"
LogsDir          = "C:\\Dan\\IIS-Log-Import\\Logs"

def SearchForFile( rootpath, searchfor, includepath = 0 ):
  
  # Search for a file recursively from a root directory.
  #  rootpath  = root directory to start searching from.
  #  searchfor = regexp to search for, e.g.:
  #                 search for *.jpg : \.jpg$                     
  #  includepath = if non-zero, prepend the full path to each file name
  #                this attribute is optional
  # Returns a list of filenames that can be used to loop
  # through.
  #
  # TODO: Use the glob module instead. Could be faster.  
  names = []
  for root, dirs, files in os.walk( rootpath ): 
    for name in files:
      if re.search( searchfor, name ):
        if includepath == 0:
          names.append( name )
        else:          
          names.append( os.path.join( root, name ) )
  return names  


def isSameLogProperties( FILE ):
  # Tests to see if a log file has the same number of columns throughout
  # This is in case new column properties were added/subtracted in the course
  # of the log file.
  FILE.seek( 0, 0 )
  SubLogs = FILE.read().split( MainLogDelimiter )
  
  # SubLogs[0] Stores the number of different log variations in the log file  
  SubLogs[0] = len( SubLogs ) - 1    
  
  # Grab the column names from the log file, separated by space
  columns = re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[1], re.IGNORECASE | re.MULTILINE ).group(1)   
  LogSameProperties = True
  
  for i in range( 2, SubLogs[0] + 1 ):
    # If there are columns
    if ( len( columns ) > 0 ):    
      if ( columns != re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[i], re.IGNORECASE | re.MULTILINE ).group(1) ):        
        LogSameProperties = False
        break  
    
  return LogSameProperties
  

def getFirstColumn( FILE ):
  # This gets the columns from a log file. It returns only the first #Fields row and ignores any
  # later rows that may exist in case columns were added/removed in IIS. 
  # input: FILE
  # output: 1 single element List
  FILE.seek( 0, 0 )
  names = []
  # Grab the column names from the log file, separated by space
  names.append( re.search( "^#Fields:\s([\w\-()\s]+)$", FILE.read().split( MainLogDelimiter )[1], re.IGNORECASE | re.MULTILINE ).group(1).strip() )
  return names
  

def getAllColumns( FILE ):
  # This gets all the columns from a log file. 
  # input: FILE
  # output: List
  FILE.seek( 0, 0 )  
  names = []
  SubLogs = FILE.read().split( MainLogDelimiter )    
  # SubLogs[0] Stores the number of different log variations in the log file  
  SubLogs[0] = len( SubLogs ) - 1        
  for i in range( 1, SubLogs[0] + 1 ):        
    names.append( re.search( "^#Fields:\s([\w\-()\s]+)$", SubLogs[i], re.IGNORECASE | re.MULTILINE ).group(1).strip() )  
  return names  


# EXAMPLE:
# Loop through all the IIS log files in the directory
for name in SearchForFile( LogsDir, r"\.txt$", 1 ):  
  LogFile = open( name, "r" )
  if ( isSameLogProperties( LogFile ) ):
    print( "%s the same" % name )
  else:
    print( "%s not the same" % name )
  LogFile.close()
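
The functions above read the whole file into memory with FILE.read(). For much larger logs, one option is to stream the file line by line and look only at the #Fields rows. The sketch below is written for Python 3, and the names iter_field_rows and has_same_columns are hypothetical, not part of the script above. Unlike isSameLogProperties, it returns True for a file with no #Fields row at all instead of raising an error.

```python
def iter_field_rows(lines):
    """Yield the list of column names from each '#Fields:' row, in order."""
    for line in lines:
        if line.lower().startswith("#fields:"):
            yield line[len("#Fields:"):].split()


def has_same_columns(path):
    """Return True if every '#Fields:' row in the file lists the same columns."""
    first = None
    with open(path, "r", errors="replace") as log:
        for columns in iter_field_rows(log):
            if first is None:
                first = columns
            elif columns != first:
                return False
    return True
```

Because only one line is held in memory at a time, memory use should stay flat no matter how big the log is.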

Converting M4P to MP3

M4P is the protected audio format that Apple uses for songs sold through the iTunes Store. I needed to modify some songs that I had bought there. First I used Sharepod to copy the M4P files off my iPhone. Then I used SoundForge to record each song as it played back in Winamp. Winamp does not play M4P files natively, so you’ll have to install an M4P plugin. You may also have to open the Windows Master Volume tool and unmute the audio source you’re recording from.

If you want to burn M4P files from iTunes, create a local playlist, and drag-and-drop the M4P files you got from Sharepod into the local iTunes playlist.

Twitter: Migrate People You’re Following to Another Account

I just had to migrate the people I was following from one account to another, and I was able to do this using TweepML. You’ll need to have the people in lists first, though. The only drawback was that it didn’t create the lists in the new account, so I had to create them manually in Twitter. Annoying, but it will have to do until Twitter has this function built in.

Python and SQL Server

Setting up Python to connect to SQL Server was relatively easy. First, you select a DB API driver. I chose pyodbc because I saw a Python article on Simple-Talk. There are two simple steps:

  1. Install Pywin32. Get the latest. It’s a dependency for pyodbc.
  2. Install pyodbc. Get it for the version of Python you’re using.

Once you’ve done this, you can query your SQL Server database like so:

import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=192.168.0.5;DATABASE=MyAwesomeDB;UID=sa;PWD=password')
cursor = connection.cursor()

cursor.execute("select * from states")

for row in cursor:
  print row.StateID, row.Abbreviation, row.Name

For more snippets and a tutorial, check out the documentation.

Now let’s try something more interesting. Let’s try doing some inserts and see how long it takes.

import win32api
import uuid
import pyodbc 

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=192.168.0.5;DATABASE=MrSkittles;UID=sa;PWD=password')
cursor = connection.cursor()

_start = win32api.GetTickCount()

for i in range( 0, 10000 ):  
  # Let's insert two pieces of data, both random UUIDs. 
  sql = "INSERT INTO Manager VALUES( '" + str( uuid.uuid4() ) + "', '" + str( uuid.uuid4() ) + "' )"  
  cursor.execute( sql )
  connection.commit()

_end = win32api.GetTickCount()
_total = _end - _start

print "\n\nProcess took", _total * .001, "seconds"

After some tests, 10,000 records took roughly 20-30 seconds, and 1,000,000 records took 30 to 40 minutes. A bit slow, but it’s not a server machine. My machine is a 1.8GHz Core Duo with ~4GB of RAM (PAE enabled) on Windows XP, but I ran this against a VMware VM with 1GB of RAM running SQL Server 2005 on Windows Server 2003. The table had two columns, both varchar(50). On a server machine, it should be a helluva lot faster.
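
One thing that probably contributes to the slow numbers is that the loop above commits after every single INSERT. A common alternative is to pass the values as query parameters and commit once per batch. The sketch below shows that pattern in Python 3, using the standard library’s sqlite3 module as a stand-in so that it runs without a SQL Server. That substitution, the bulk_insert name, and the column names a and b are assumptions for illustration. pyodbc follows the same DB API shape (cursor.executemany with ? placeholders, then connection.commit), but I have not timed this against the setup described above.

```python
import sqlite3
import uuid


def bulk_insert(connection, rows, batch_size=1000):
    """Insert (a, b) pairs as parameters, committing once per batch."""
    cursor = connection.cursor()
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cursor.executemany("INSERT INTO Manager VALUES (?, ?)", batch)
        connection.commit()
        inserted += len(batch)
    return inserted


# In-memory stand-in for the two-column varchar(50) table described above.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Manager (a VARCHAR(50), b VARCHAR(50))")

rows = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(10000)]
total = bulk_insert(connection, rows)
```

The batch size is a knob to experiment with: bigger batches mean fewer commits, but more work is lost if a batch fails.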

yUML and ColdFusion

I just tried to write a quick script in Python that scans CFCs and generates a yUML URL to diagram them. I pointed my script at my root CFC path and got a URL roughly 13K characters long. I pasted it into the address bar to see what would happen, and I got the following:

Request-URI Too Large

The requested URL's length exceeds the capacity limit for this server.
Apache/2.2.3 (Debian) Phusion_Passenger/2.0.2 Server at Ess000235.gtcust.grouptelecom.net Port 80

I wonder what the limit is. I suppose I’ll have to do one CFC per diagram and then bind them together somehow. I’m choosing Python so this script can be part of my build script.

Here’s the code so far, which, of course, could be optimized:

import re
import os

# UML Syntax
# http://yuml.me/diagram/class/[User|Property1;Property2|Method1();Method2()]
# http://yuml.me/diagram/class/
# [
#   User
#   |
#     Property1;
#     Property2
#   |
#     Method1();
#     Method2()
#  ]


# Master Path
ROOT_PATH = 'C:\\temp\\cf-yuml'

def SearchForFile( rootpath, searchfor, includepath = 0 ):
 
  # Search for a file recursively from a root directory.
  #  rootpath  = root directory to start searching from.
  #  searchfor = regexp to search for, e.g.:
  #                 search for *.jpg : \.jpg$                     
  #  includepath = if non-zero, prepend the full path to each file name
  #                this attribute is optional
  # Returns a list of filenames that can be used to loop
  # through.
  #
  # TODO: Use the glob module instead. Could be faster.  
  names = []
  for root, dirs, files in os.walk( rootpath ): 
    for name in files:
      if re.search( searchfor, name ):
        if includepath == 0:
          names.append( name )
        else:          
          names.append( os.path.join( root, name ) )
  return names  


def getCFCInfo ( FILE, path ):
  FILE.seek( 0, 0 )  
  CFCLines = FILE.readlines()
  
  CFCFunctions  = []
  CFCProperties = []
  CFC           = {}
  
  for i in CFCLines:
    # Get names of methods (leading whitespace before the tag is allowed)
    if re.search( r"^\s*<cffunction", i , re.IGNORECASE | re.MULTILINE ):    
      CFCFunctions.append( re.search( r'name\s*=\s*"([\w$-]+)"', i, re.DOTALL | re.IGNORECASE).group(1) )
    
    # Get names of properties (leading whitespace before the tag is allowed)
    if re.search( r"^\s*<cfproperty", i , re.IGNORECASE | re.MULTILINE ):    
      CFCProperties.append( re.search( r'name\s*=\s*"([\w$-]+)"', i, re.DOTALL | re.IGNORECASE).group(1) )     
  
  CFC = { "properties":CFCProperties, "methods":CFCFunctions }  
  
  # Generate URL
  strFunctions  = ""
  strProperties = ""
  
  for i in CFCFunctions:
    strFunctions  += i + "();"
  
  for i in CFCProperties:
    strProperties += i + ";"  

  CFCFileName = re.search(r"\\([\w-]+)\.cfc$", path, re.DOTALL | re.IGNORECASE).group(1)  
  return "[" + CFCFileName + "|" + ( strProperties.strip()[:-1] + "|" if strProperties.strip()[:-1] else "" ) + strFunctions.strip()[:-1] + "]"  

URL = ""

for i in SearchForFile( ROOT_PATH, "\.cfc$", 1 ):
  CFCFile = open( i, "r" )
  URL += getCFCInfo( CFCFile, i ) + ","
  CFCFile.close()

URL = URL[:-1]
print "http://yuml.me/diagram/class/" + URL

I'll keep working on this as time goes on. So far it just goes through all the CFCs under the path you point it to, crawling through all subdirectories. There are no relationships between classes, however. Not yet at least.
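
One way around the Request-URI Too Large error might be to split the class specs across several URLs, each kept under a length limit. The sketch below is Python 3. I don't know the server's real limit, so MAX_URL_LEN is an assumed placeholder, and chunk_urls is a hypothetical helper, not something yUML provides. It expects the "[Class|...]" strings that getCFCInfo returns.

```python
BASE_URL = "http://yuml.me/diagram/class/"
MAX_URL_LEN = 2000  # assumed value; the server's real limit is unknown


def chunk_urls(class_specs, max_len=MAX_URL_LEN, base=BASE_URL):
    """Group '[Class|...]' specs into URLs that are at most max_len long.

    A single spec that is too long on its own still gets its own URL.
    """
    urls = []
    current = []
    for spec in class_specs:
        candidate = base + ",".join(current + [spec])
        if current and len(candidate) > max_len:
            # Adding this spec would overflow: close the current URL first.
            urls.append(base + ",".join(current))
            current = [spec]
        else:
            current.append(spec)
    if current:
        urls.append(base + ",".join(current))
    return urls
```

Each URL then renders as a separate diagram, so stitching the images together (or grouping CFCs by package) would still be a separate step.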

Handling Images from the Command Line

Recently, I needed to do some work from the Windows command line that involved a few images. Along the way, I found some great tools. All of them are free, and they can come in handy when automating.

Manipulation

ImageMagick – This is a collection of command line tools. You can convert images, view their properties, transform them, handle transparency, join and overlay images, add special effects, and tons more. It also has APIs for C, C++, Java, .NET, Perl, PHP, Python, Ruby, and others. Highly recommended.

Screen Capture


CmdCapture – Takes a screenshot of your desktop from the command line.

IECapt – Capture Internet Explorer’s rendering of a web page into a BMP, JPEG or PNG image file.

Cutycapt – Capture WebKit’s rendering of a web page into a variety of vector and bitmap formats, including SVG, PDF, PS, PNG, JPEG, TIFF, GIF, and BMP.

wkhtmltopdf – Convert HTML to PDF using the Webkit rendering engine, and Qt.

Reprogram Your Keyboard Keys

I recently got a new Microsoft keyboard without the Right-Windows-Key. I use that key all the time ’cause I’m a shortcut fanatic, and I couldn’t get used to using the one on the left. Not only that, but I use CTRL/SHIFT + INSERT a lot to copy/paste, and the keys were laid out very differently from my old keyboard…

Along comes AutoHotkey…

This program rocks. With it, you can create a script that runs and stays in memory, and you can remap keys and buttons on your keyboard, joystick, and mouse. With this tool, I was able to remap the menu key (the one usually between the Right-Windows-Key and the CTRL key), which I never use, to a Windows key. That was close enough for me. Once it was installed, I was able to create a script with this command:

AppsKey::RWin

And that’s it! That mapped the menu key to the windows key! I’ll be experimenting with this program a little more and trying to get more shortcuts. You can also compile your scripts into a program that runs without an installation – in case you want your shortcuts / remapping run on a different machine.

The URL is: http://www.autohotkey.com/

Who Keeps Calling Me

Want to try to get into the habit of writing in my blog again. Been checking through my caller ID, and I’ve noticed a whole slew of calls from 909-842-8164. Didn’t know they had sites where you can look up random phone numbers. It’s interesting to hear about people calling these pesky telemarketers back so they get taken off their slimy lists.

Two Web Sites I’ve found:

http://whocallsme.com/
http://800notes.com/

Success and Self-Confidence

I’m now listening to the audio book “Silva Mind Control for Success and Self-Confidence” by Hans DeJong. Wow, great stuff. He presents a method to control the mind, originally created by Jose Silva. Hans goes into detail about the Silva program and how to use both sides of the brain. The audio book also includes relaxation exercises for the mind.


One thing that captured my attention was right in the beginning when Hans states,

“How do you become confident in anything? Is it not through repeated successes- to become confident? When you do something and it comes out the right way… and then you do it again, and then it comes out the right way again, and then you do it for the third time, and then it comes out the right way, by the time you do it for the 14th time, you’re going to be pretty confident because you have all those previous successful experiences! So? That is what you do! You create practice at things until you become good at it…”

Taking that into consideration, he mentions that one should try to capture this good feeling of confidence that comes from the previous successful experiences – capture this feeling and save it for future use. By that, he says to write down previous successes on a piece of paper going in chronological order starting from today. Also, write down how you felt.

To summarize what I’ve listened to so far, here’s a list:

1. Write list of previous successes. Look at it before you go to a meeting or some important undertaking, in order to refresh yourself with the previous feelings of confidence.

2. To achieve a goal, write the goal down on an index card and place it somewhere you will look at it daily. Carry the card with you if you can. Stick to your goal and be willing to do anything for it.

3. Write down all steps required to get there (2), and do them. Do not compromise it. If the step is too large, break it down (4).

4. Break down complex tasks into smaller units.

5. Put out all negative thinking; only think positive.

Can’t wait to finish this one. Good stuff.

Different Google PageRanks

Tools out there, including the Google toolbar, may use the same PageRank algorithm, but may fetch outdated results because they get the data from different Google data centers. This may explain the reason for different page ranks you may get from the same page.

Here’s more information from http://www.web-wise-wizard.com:

Google PageRank™ Complexities

Before we proceed it might be useful to say a few words about Google PageRank. A factor that is likely to confuse anyone who is new to Google PageRank is that there are effectively four different types of Google PageRank, Real PageRank, Toolbar PageRank, Toolbar Display and Directory PageRank. Adding to the confusion is the fact that Google has a large number of data centers scattered around the World that contain the PageRank databases and these databases are very rarely in sync with each other. This means that you will regularly get conflicting results returned by different data centers for the same PageRank query. Of the four different types of Google PageRank, Real PageRank is arguably the most important by far.

More information

Retrieving the PageRank from Different Datacenters

One can check the status of a PageRank on various Google data centers by visiting: http://livepr.raketforskning.com

Google Page Rank Has Been Updated

There’s talk in the SEO blogworld that Google has changed their PageRanks and it may take a while to propagate on all datacenters around the world.

From Problogger.net.

Google Page Rank Update Underway

Tim just alerted me to the fact that it seems Google are doing one of it’s periodic Page Rank Updates. These updates take a little while to show up on all data-centers around the world so it could take a day or two to shakedown – but you can read more about it in Digital Point’s PR Update has begun discussion and at WMW’s PR update Started.

Since this comes from blogs, take it with a grain of salt. It’s also unclear how long (one day, one month, a few months) the data will take to synchronize across all Google servers.

Here are more blog posts and pages about this PageRank update, including what other people are saying about it:

Watchout, Google is updating their DB

Did Google Change Rankings?

SEO Updates – Google PageRank Updates Feb 05

Google PageRank Updates Feb 05