March 17th, 2010

Real life has had me far too busy to post for quite some time. I hope to get back to posting more often in the future, but I promise nothing.

Just for fun, here’s a ruby version of the code. It doesn’t have all of the frills of the others but performs roughly the same function. I don’t have time to make it pretty.

You’ll see an example of exception handling, sorting a hash by value, reading a file into an array, etc. The net result is a script that does the same thing my other scripts do but this time it’s in Ruby. I’m building a little Rosetta Stone.

#!/usr/bin/ruby
require 'resolv'

# arrays and hashes need to be initialized in ruby
accessLog = []

# set hash values to default of 0 or you can't +=
addresses = Hash.new(0)

# read in the file
accessLog = File.readlines('/var/log/httpd/access_log')

# take the first word from each line of the file
accessLog.each { |line| addresses[line.split[0]] += 1 }

# sort hash by value
addresses = addresses.to_a.sort_by {|a,b| b}

# print each of the keys and values, make them pretty
addresses.each do |key, value|
        # error handling, the resolv often fails
        begin
                puts value.to_s.ljust(10) + Resolv.new.getname(key)
        rescue
                puts value.to_s.ljust(10) + key
        end
end

October 21st, 2009

A while back I posted a small tutorial on creating your own command line utilities in perl and bash. I demonstrated that perl is faster, more flexible, and less verbose than bash. Recently I’ve taken up python and I decided that recreating the perl ip parsing utility that I created here would be not only good practice programming but a fine addition to the bake-off.

So, here you have the python version of the ipmolest script:

#!/usr/bin/python
#
# Script: ipmolest
'''
   This script sorts a list of IP addresses, listing the
   IP address that occurs most frequently in the list last.
   The script can also give a count of how often each IP
   appeared in the list.  And it can attempt to resolve the
   IP addresses to hostnames.
   By default it will parse the apache access_log.
   With the optional flags you can specify an alternate
   field separator or column to parse. You can also pipe
   data into this script as you would with many other UNIX
   utilities.
'''
# 10/21/2009 - Ryan Bonnett
#
#####################################

### Imports
import sys, socket
from optparse import OptionParser
from operator import itemgetter

### Usage
parser = OptionParser(usage=__doc__)
parser.add_option("-f", "--file",action="store", type="string",
        help="Path to the file you'd like to parse", dest="filename",
        default="/var/log/httpd/access_log")
parser.add_option("-s", "--field-separator", action="store", type="string",
        help="Specify a field separator", dest="separator", default=" ")
parser.add_option("-c", "--count", action="store_true",
        help="Track how many times each IP was seen", dest="count")
parser.add_option("-d", "--field", action="store", type="int", dest="field",
        help="The field that contains IPs to be parsed", default=0)
parser.add_option("-t", "--tail", action="store", type="int", default=0,
        help="Integer defining how many lines to parse", dest="tail")
parser.add_option("-r", "--resolve", action="store_true", dest="resolve",
        help="Resolve IP addresses to host names")
(options, args) = parser.parse_args()

### Functions
def nameByaddress(ip):
    # Fall back to the bare IP when the reverse lookup fails
    try:
        return socket.gethostbyaddr(ip)[0]
    except socket.error:
        return ip

### Variables
ips = dict()

# Check if we're reading from a pipe or redirected file
# Either way, create the ips dictionary containing IP and count
if not sys.stdin.isatty():
    source = sys.stdin
else:
    source = open(options.filename, "r")

for i in source:
    line = i.rstrip("\n").split(options.separator)
    try:
        key = line[options.field]
    except IndexError:
        # This line has too few fields, skip it
        continue
    ips[key] = ips.get(key, 0) + 1

# Okay, here we sort the dictionary named ips. First we suck the items
# of the dictionary into a list named sortedips. The list contains
# ("key", "value"), and we use itemgetter to sort on the value.
sortedips = ips.items()
sortedips.sort(key = itemgetter(1))

# Check for the tail option
if options.tail != 0:
    dictlen = len(sortedips)
    if dictlen > options.tail:
        begin = dictlen - options.tail
        sortedips = sortedips[begin:]

# Check for the resolve and count options.
# Print the results to STDOUT
for ip in sortedips:
    if options.resolve:
        if options.count:
            print ip[1], "\t", nameByaddress(ip[0])
        else:
            print nameByaddress(ip[0])
    else:
        if options.count:
            print ip[1], "\t", ip[0]
        else:
            print ip[0]

There are a lot of things I like about both python and perl. I must say though, python is winning me over. The python script is 21 lines of non-commented code shorter but it’s 214 characters longer. This is because perl uses a lot of idioms that save on typing but cost readability. The python isn’t just easier to read, it’s faster. You’ll recall that when we timed the perl script against the bash script, perl took 1/4 the time of bash. Well, as it turns out python takes less than 1/4 the time perl takes when processing the same 2 million line file.

[ryan@www www]# wc -l data/log
1995175 data/log
[ryan@www www]# time ./ipmolest -c -f data/log > output

real    0m8.308s
user    0m8.016s
sys     0m0.290s

What’s not to love? It was a bit of a pain to figure out the “if not sys.stdin.isatty():” trick to test if the script was being passed piped data. Everything else was relatively easy to figure out though. And I vastly prefer the optparse method to the getopts method of gathering and processing command line options.
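For anyone hunting for just those two snippets, here is a minimal Python 3 sketch of the stdin check and of sorting a dictionary by value. It is not the script above; the function names (count_field, sort_by_count, open_input, main) are made up for illustration, and it assumes whitespace-free values in the chosen field.

```python
import sys
from collections import Counter
from operator import itemgetter


def count_field(lines, separator=" ", field=0):
    """Count how often each value of the chosen field appears."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(separator)
        if field < len(parts):  # skip lines that are too short
            counts[parts[field]] += 1
    return counts


def sort_by_count(counts):
    """Return (value, count) pairs, most frequent last."""
    return sorted(counts.items(), key=itemgetter(1))


def open_input(path):
    """Use stdin when data is piped or redirected in, else open the file."""
    if not sys.stdin.isatty():
        return sys.stdin
    return open(path, "r")


def main(path="/var/log/httpd/access_log"):
    """Print count and value, tab separated, most frequent last."""
    for value, count in sort_by_count(count_field(open_input(path))):
        print(count, value, sep="\t")
```

Calling main() at the bottom of a file gives you something that works both as `./script` and as `cat somefile | ./script`.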

I probably haven’t used all the keywords I should to help people find code snippets that might be helpful to them. Let’s try this: python check for redirection, pipe, sorting dictionaries based on values.

Also, I didn’t explain the code heavily or comment it much. You’ll just have to ask me if you have any questions (or you could always RTFM, but what’s the fun in that?).

Enjoy!

Ryan

October 15th, 2009

So I was hanging out at macscripter.net and investigating some more GUI automation goodness. I ran across a little script there which I decided to modify heavily. This is actually a hybrid of a couple different examples from macscripter.net.  I’m not going to explain the inner workings of this script in much detail.  If you would like me to explain any of the details of what it is doing then please post a comment, I will respond.

The script launches a display, you click okay on that display.  From there on out, as you move your mouse about and click on different coordinates, the script sits in the background and captures your mouse activity.  It tracks where you clicked and how many milliseconds it was between clicks.  It dumps this information to a file on your desktop named playback.txt.  This file can actually be run without modification as an applescript.  It will reproduce all of the clicks you made with very nearly the same timing between clicks.

You need two things to use this, Extra Suites and cliclick.  Extra Suites is a scripting addition that allows you to do all sorts of fancy things.  We use it here to grab the mouse coordinates and check if the mouse button is currently depressed.  cliclick I’ve described before, it simply clicks at certain coordinates.  We use perl to determine how many milliseconds have passed between mouse clicks.  I could’ve used Extra Suites for the mouse click functionality but I wanted to generate output that could be easily munged into a usable shell/perl/whatever script as well (since I hate applescript with an intense passion).  Perl and BASH can’t make calls to Extra Suites, but they can call cliclick.  The output of the script looks like:

delay 1.184240102768
do shell script "/usr/local/bin/cliclick d1079 865"
delay 1.250790119171
do shell script "/usr/local/bin/cliclick d1081 856"
delay 1.734300136566
do shell script "/usr/local/bin/cliclick d1138 824"
delay 8.745839834213
do shell script "/usr/local/bin/cliclick d679 723"
delay 5.172240018845
do shell script "/usr/local/bin/cliclick d711 832"
delay 2.283070087433
do shell script "/usr/local/bin/cliclick d1094 829"

If you strip the first 3 words and the quotation marks on the cliclick lines and change the word delay to sleep, this will run as a BASH script. It is also easily stuck into a perl script.
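That conversion can be sketched in a few lines of Python. This is only an illustration of the rewrite rule described above, assuming the file contains nothing but the two kinds of lines shown; the function name playback_to_shell is made up.

```python
def playback_to_shell(lines):
    """Convert recorded AppleScript playback lines into shell commands."""
    out = ["#!/bin/bash"]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("delay "):
            # "delay 1.18" becomes "sleep 1.18"
            out.append("sleep " + line[len("delay "):])
        elif line.startswith("do shell script "):
            # drop the first three words, then the surrounding quotes
            command = line.split(" ", 3)[3].strip('"')
            out.append(command)
    return out
```

Feed it the contents of playback.txt split into lines (str.splitlines() copes with the carriage returns that applescript writes) and join the result with newlines to get the shell script.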

The code that generates this code, as ugly as it is, follows:

display alert "Mouse Recorder" message ("Click \"OK\" to start recording the position of mouse clicks." as Unicode text) & ¬
	return & return & "To return the recorded results, press " & «data utxt2318» & ¬
	"-. (command-period) or the Esc key." buttons {"Cancel", "OK"} cancel button 1 default button 2

set cliclick to "do shell script \"/usr/local/bin/cliclick d" as text

set the_file to (((path to desktop) as string) & "playback.txt") as file specification

set startTime to do shell script "perl -e 'use Time::HiRes qw(time); print time'"

set AppleScript's text item delimiters to {" "}

tell application "Extra Suites" to repeat
	try
		if ES mouse down then
			set endtime to do shell script "perl -e 'use Time::HiRes qw(time); print time'"
			set difference to (endtime - startTime)
			set coords to (ES mouse location) as text
			set coordinates to (every text item in coords) as list
			set coordx to item 1 of coordinates
			set coordy to item 2 of coordinates
			set addLine to cliclick & coordx & " " & coordy & "\"" & return as text

			try
				open for access the_file with write permission
				write "delay " & difference & return to the_file starting at eof
				write (addLine) to the_file starting at eof
				close access the_file
			on error
				return "wtf?"
			end try

			set startTime to do shell script "perl -e 'use Time::HiRes qw(time); print time'"
			repeat while ES mouse down
				delay 0.1
			end repeat
		end if
	on error
		return addLine
	end try
end repeat

There are programs out there that already do stuff like this, but none that allow you to extend the functionality of them as easily as this one does. I’ll probably modify this at some point in the future to capture key strokes as well. Why you ask? Because, applescript is sadistic and I’m a masochist.

Ryan

October 14th, 2009

I found myself playing a fun little flash game the other day; it can be found at clockwords.us (prelude is the only playable game on the site as of this writing).  I played it for a while and decided, once I’d had my fun, to write a script that would play the game automatically.  I wrote said script and let it play the game for a couple of days.  Thanks to this you’ll notice that linuxed is the top ranked player of all time, having passed level 263.  I had never written anything in applescript before, and while writing this script I found that there were very few good sources on the internet offering applescript tutorials or example code.  So I feel compelled to share my script and an explanation of its function, as meager as it is, with the community.  I wish I had been able to find something like it when I was writing it.  Instead I had to Frankenstein bits from here and there to build what I wanted, which took several hours of searching.  So, with no further ado:

First let’s discuss the game.  Essentially, you type in words and the letters from the words you type are used to destroy mechanical spiders that will destroy you if you don’t destroy them first.  The game is slightly more complex than that, play it yourself if you’d like to see the mechanics in greater detail.  Essentially though, all you need to do to keep leveling up is type in words and when you’ve beaten a level click a button to begin the next level.

So the first thing you need is a good word list.  I built my word list quickly by using the 5desk list from the 12dicts word list package which can be found here.  I chose the 5desk list because it focuses on more common words, rather than the truly esoteric which are found in many other word lists.  I didn’t want to be punching words into the game that the game didn’t recognize.  To create my list I wrote a quick and dirty little perl script I named string_length:

#!/usr/bin/perl -w

while (<>){
        # drop the newline so it isn't counted as a character
        chomp;
        # compare numerically; "gt" would compare the lengths as strings
        if (length($_) > 14){
                print "$_\n";
        }
}

And with the following one-liner we pipe the 61,406 words contained in the 5desk list through an awk that only prints out the words that contain an m, an o, a b, a y, and an i. We further pipe the remaining 108 words through the string_length script, which only prints out the words with more than 14 characters. This leaves us with a 47 word list.

cat ../data/12dicts/5desk.txt |awk '/m/&&/o/&&/b/&&/y/&&/i/{print}'|./string_length >wordlist
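The same filter can be sketched in Python. This is a rough equivalent of the awk and perl pipeline above, not what was actually used to build the list; the function name pick_words and its parameters are made up for illustration.

```python
def pick_words(words, required="mobyi", min_length=15):
    """Keep words that contain every required letter and are long enough."""
    needed = set(required)
    # needed <= set(w) is true when every required letter appears in w
    return [w for w in words if len(w) >= min_length and needed <= set(w)]
```

Like the awk patterns, the letter check is case sensitive, and a min_length of 15 matches "more than 14 characters".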

Word list in hand all we need to do now is write the applescript. The script must do a few things: read in the words from the custom 47 word list we’ve created, bring up Safari, type the words into the game, and once it has beaten a level it needs to click a button in the game to start the next level. Finally, it needs to repeat this ad nauseam. Then, you too, can have a ridiculously high level in clockwords. Be forewarned that I am by no means skilled in writing applescript, I don’t know much about the language and I don’t want to. I think it is one of the ugliest languages ever conceived. I am extremely verbose in my description of the script so that you can understand the script without understanding applescript. This is meant to be a good example script to help you write your own scripts without attempting to actually “learn” applescript.

-- Here is your first lesson in applescript, the -- at the beginning of this line
-- means that it is a comment and as such it will be ignored by the interpreter.

-- The following line causes the system to popup a box asking you to select a
-- system file.  In this case we'll be picking the file containing our 47 words.
-- A variable named theFile now contains a reference to the file you selected.
-- set is the syntax used when setting a variable.
set theFile to (choose file with prompt "Select a file to read:")

-- Now we open the file that the theFile variable references for reading.
open for access theFile

-- Now we create another variable, a list actually (applescript's version of an
-- array), named fileContents.  This list will contain the contents of the file
-- that is referenced by the variable theFile.  Notice that we read the file in
-- using space as the delimiter.  You could also use return or anything else
-- as the field delimiter.  Make sure your list of words is on a single line and
-- space delimited in your file if you want this to work correctly without
-- modification.
set fileContents to read theFile using delimiter {space}

-- You guessed it, we close the open access to the file referenced by the
-- variable theFile here.
close access theFile

-- Now, we want to be able to iterate through the list of words, typing them
-- one at a time into the game.  To do that we need to determine how many
-- words are in the list.  This next line sets the value of the variable
-- contentCount to the number of items that the fileContents list contains.
set contentCount to count of fileContents

-- Now we're into the action.  We create a loop that will repeat 100 times;
-- this is arbitrary, so set it to loop for as long as you want.  100 times will
-- last 8 * 47 * 100 seconds, about 10 hours.
repeat 100 times
-- Now we tell the system to activate the Safari application.  At this point
-- Safari will pop up on your screen and have focus.
        tell application "Safari"
                activate
        end tell
-- Now we initialize the System Events library, that will allow us to type using
-- keystroke.  Following that we begin talking to Safari, who we brought to life
-- earlier.
        tell application "System Events"
                tell process "Safari"
-- Now we start another repeat loop, inside of the repeat 100 times loop.  As
-- you can see the construction of this loop is slightly more complicated.  This
-- repeat loop is simulating what most other programming languages would call
-- a for loop.  It effectively is saying to repeat with variable i, first setting i to
-- the number 1 and then increasing the value of i by one with each iteration
-- of the loop until the value of i is equivalent to the value of contentCount -
-- which we set earlier and we know is 47, based on the length of the
-- fileContents list which contains our words.
                        repeat with i from 1 to contentCount
-- Now the variable gameWord is set to item i of fileContents.  If this is the
-- first iteration then i is 1 and that means we're asking for the first element
-- of the fileContents list - the first word of our list of 47 words.  The variable
-- named gameWord now contains the value of element i in the list fileContents.
                                set gameWord to item i of fileContents
-- Now we wait 8 seconds.  Typing the words into the game without waiting
-- actually gives you a lower score.
                                delay 8
-- Now we type in the value contained by the variable named gameWord.
                                keystroke gameWord
-- This is mildly archaic.  We're sending the return or newline by sending ascii
-- char 13.  I believe you could safely replace this with "keystroke return"
-- or "key code 36".
-- The game requires you to type enter after typing in a word.
                                keystroke (ASCII character 13)
-- Also archaic.  This sends the escape key.  This is necessary only because
-- it is possible that your word won't be a valid word.  If it isn't a valid word
-- then by hitting escape you clear what you've typed.  If the word you
-- typed was valid then hitting escape does nothing, so it doesn't bother us
-- that this will be hit for every word.
                                key code 53
-- There is more than one way to skin the mouse click cat, I like cliclick.
-- You can pick it up from the author's site at
-- http://www.bluem.net/en/mac/cliclick/
-- as you can see I installed it in /usr/local/bin.  Cliclick does just what you'd
-- expect it to.  It performs a mouse click at the given coordinates.  The
-- coordinates you see below happen to coincide with the button in clockwords
-- that begins the next level, on my screen, based on where I placed Safari.
-- What that means is: those coordinates will, in all likelihood, not work for you.
-- You need to put in the right coordinates for where you have the game running.
-- You can find those coordinates by mousing over the "next level" button, which
-- only appears after you beat a level, and pressing apple-shift-4.  This brings up
-- the screen capture tool which conveniently tells you exactly what coordinates
-- your mouse is currently resting on. Put those coordinates in the script here
-- and you're ready to roll.
                                do shell script "/usr/local/bin/cliclick 1561 683"
                        end repeat
                end tell
        end tell
end repeat
-- A final note.  All of the comments I've put in here make this script really ugly.
-- Make a clean copy of this script, sans comments.  The applescript is actually
-- well formatted and nice to look at without the comments.

Here is the clean version:

set theFile to (choose file with prompt "Select a file to read:")
open for access theFile
set fileContents to read theFile using delimiter {space}
close access theFile

set contentCount to count of fileContents

repeat 100 times
        tell application "Safari"
                activate
        end tell

        tell application "System Events"
                tell process "Safari"
                        repeat with i from 1 to contentCount
                                set gameWord to item i of fileContents
                                delay 8
                                keystroke gameWord
                                keystroke (ASCII character 13)
                                key code 53
                                do shell script "/usr/local/bin/cliclick 1561 683"
                        end repeat
                end tell
        end tell
end repeat

That’s all for this post, kids. I’ll probably do a follow up with some other fun hackery involving auto-generation of mouse click code with built-in time delays. If you have questions, comments, or code to share then post away. Correct anything I’ve said if you feel the need but know that I don’t care to know more than I do about applescript. I intend to avoid it by learning to use the applescript extensions that exist for ruby and or python. I hope to never deal directly with applescript again, though I likely will.

In summary: hacking flash games is fun, applescript is the devil’s handmaiden, word lists are your friend, and did I mention that applescript is made from pure concentrated evil?

Ryan

October 8th, 2009

From time to time I like to parse my Apache access_log file from the command line.  There are a lot of great tools for parsing access logs and retrieving information from them that operate outside of the command line.  When you want to operate against what’s happening right this second, there’s no equal to simply tailing the access_log from a shell.

I found myself frequently rendering the data a specific way by piping it through multiple utilities.  Specifically:

cat /var/log/httpd/access_log|awk {'print $1'}|sort -k 1,1n -k 2,2n -k 3,3n -k 4,4n|uniq -c|sort -k 1n

Which produces:

     41 66.249.65.52
     42 61.186.161.156
     47 208.101.2.194
     58 124.161.238.40
     63 67.182.236.189
     64 87.241.212.2
     64 98.100.108.130
    335 97.126.173.32

I find this rendering useful because it shows me what IPs, or hostnames if you’re resolving them, are visiting my site and how much interaction they’re having with the site.  And since I don’t have Apache attempting gethostbyname() lookups my logs only have IPs.  Once I’ve rendered the data in this way I sometimes wonder what domain the top visitors are coming from.  To obtain that information I construct another series of piped commands:

cat /var/log/httpd/access_log|awk {'print $1'}|sort -k 1,1n -k 2,2n -k 3,3n -k 4,4n|uniq -c|sort -k 1n|awk {'print $2'}|xargs -l1 dig -x |egrep 'SOA|PTR'|awk {'print $5'}

Which produces:

crawl-66-249-65-52.googlebot.com.
ns1.apnic.net.
208.101.2.194-static.reverse.softlayer.com.
dnssvr3.169ol.com.
c-67-182-236-189.hsd1.ut.comcast.net.
ns1.enforta.com.
rrcs-98-100-108-130.central.biz.rr.com.
97-126-173-32.slkc.qwest.net.

When I can’t see a PTR record I’m happy to simply see who has authority to represent that IP’s DNS information, which is why I also grep for the SOA.  I mention this in case anyone was feeling the need at this point to explain to me that the PTR and SOA are not interchangeable bits of information. They’re not, but for my purposes here either is informative.

Now, typing these commands out is a bit of a pain.  I could stick them in a simple bash script.  It would be very simple to read $1, the first argument passed to the script from the command line upon invocation, and take that command line argument as the file to parse.  So I did that, well, I did it one better.  I used getopts and wrote a quick little script that would take the argument to -f as the file to parse and allow a -c option to turn the printing of hit count on or off.  This way you can parse a file, sort it, print it without the count, and pipe the output to xargs -l1 dig -x etc. to find out who is visiting your site.

#!/bin/bash
#
# Script: unique_addresses
# Author: Ryan Bonnett
# Date: 10/02/2009
#
# Purpose: This script should properly parse any file as long
#       as the first field of each line is an IP address.
#       However, it was primarily written to parse Apache's
#       access_log inasmuch as Apache is using the default
#       logging format. The script returns either a list of
#       IP addresses or a list of IP addresses with a count
#       specifying how often that IP showed up in the log file.
#       I recommend piping the output to:
#       |xargs -l1 dig -x|egrep 'SOA|PTR'|awk {'print $5'}
#       that will attempt to resolve the IP addresses to
#       hostnames/SOA provider name.
#
############################################
#
# set -n # Uncomment to check command syntax without execution
# set -x # Uncomment to debug

file=/var/log/httpd/access_log

######## sortlog function
sortlog()
{
        cat "$file" |awk {'print $1'}\
          |sort -t\. -k 1,1n -k 2,2n -k 3,3n -k 4,4n\
          |uniq -c\
          |sort -k 1,1n
}

######## testfile function
testfile ()
{
        if [ -s "$file" -a -r "$file" ]
        then
                :
        else
                echo "$file doesn't exist, is empty, or we don't\
 have read permissions."
                exit 1
        fi
}

######## usage function
usage ()
{
        echo "Usage: $0 [options]"
        echo ""
        echo "-c        takes no arguments, shows IP address hit count."
        echo "-f        follow this option with the file you're processing."
        echo ""
        echo "Example: unique_addresses -c -f /var/log/httpd/access_log"
        exit 1
}

######## Main body of script
while getopts ":cf:" options
do
        case $options in
          c) countflag=1 ;;
          f) file=$OPTARG ;;
          \?) usage
                exit 1 ;;
        esac
done

# Check the file before either code path reads it
testfile

if [ "$countflag" = 1 ]
then
        sortlog
else
        sortlog |awk {'print $2'}
fi

After writing this short script I decided that it was lacking in many ways.  I could pipe data out of it, but I couldn’t pipe data in – I could only read from a file. I couldn’t specify a field separator, I couldn’t specify the field to read, I couldn’t resolve hostnames, and who knows what other features I may want in the future. A lot of files track IP address information and I wanted to be able to parse any of them and render the information sorted, counted, and possibly name resolved. So I decided to create a more fully featured utility, that’s where perl comes in:

#!/usr/bin/perl -w
#
# Script: ipmolest
# Purpose: This script sorts a list of IP addresses, listing the
#       IP address that occurs most frequently in the list last.
#       The script can also give a count of how often each IP
#       appeared in the list.  And it can attempt to resolve the
#       IP addresses to hostnames.
#       By default it will parse the apache access_log.
#       With the optional flags you can specify an alternate
#       field separator or column to parse. You can also pipe
#       data into this script as you would with many other UNIX
#       utilities.
#
# Author: Ryan Bonnett
# Date: 10/07/09
#
###############################################################

############################ Modules
use Socket;
use IO::Select;
use Getopt::Long;

############################ Subs
sub gethost { # This sub simply grabs a PTR if one is found.
        # Make sure we're looking at an IP address.
        return $_[0] if ($_[0] !~ m/^\d+\.\d+\.\d+\.\d+$/);
        my $ip = inet_aton $_[0];
        if (gethostbyaddr($ip, AF_INET)){
                return gethostbyaddr($ip, AF_INET);
        } else {
                return $_[0];
        }
}

sub usage {
        print "Usage: ipmolest [options]\n";
        printf "  %-20s%s\n",
                "--count, -c", "Track how many times each IP was seen.";
        printf "  %-20s%s\n",
                "--field, -d", "The input field that contains an IP address.";
        printf "  %-20s%s\n",
                "--file, -f", "Define the file you'd like to parse.";
        printf "  %-20s%s\n",
                "--separator, -fs", "Specify a field separator.";
        printf "  %-20s%s\n",
                "--help, -[h?]", "View the usage information.";
        printf "  %-20s%s\n",
                "--resolve, -r", "Resolve IPs to hostnames where possible.";
        printf "  %-20s%s\n",
                "--tail, -t", "Works like a pipe through tail, saves work.";
        exit;
}

############################ Default values and GetOptions
$file = "/var/log/httpd/access_log";
$field = 0;
$separator = " ";

# Read options passed to the script at the command line.
GetOptions (      'c|count'             => \$count
                , 'd|field=i'           => \$field
                , 'f|file=s'            => \$file
                , 'fs|separator=s'      => \$separator
                , 'h|help|?'            => \$help
                , 'r|resolve'           => \$resolve
                , 't|tail=i'            => \$tail);
&usage if ($help);

############################ Begin Main
# Check for STDIN - is someone piping data into this script?
# If we are reading piped data then spool it up in the %ips hash.
$pipe = IO::Select->new();
$pipe->add(\*STDIN);
if ($pipe->can_read(.5)){
        while(<>){
                @line = split($separator, $_);
                $ips{$line[$field]}+=1;
        }
}

# Pick a file to parse and open it as long as we didn't read piped input.
if (!@line) {open (LOG, "$file")|| die "Can't open: $!\n";}

# Grab the IP addresses and count them in %ips, if we're reading a file.
if (!@line){
        foreach $line (<LOG>) {
                @line = split($separator, $line);
                $ips{$line[$field]}+=1;
        }
        close (LOG);
}

# We stuck all of the IP addresses into a hash and gave the IPs a value
# that represents how many times that IP showed up in the input.
# Below we sort the keys by their value.
@sorted_ips = sort { $ips{$a} <=> $ips{$b} } keys %ips;

# Check for the tail option and modify the sorted_ips array accordingly
if ($tail){
        $array_length = @sorted_ips;
        if ($array_length > $tail){
                $begin = ($array_length - $tail);
                splice(@sorted_ips, 0, $begin);
        }
}

# Now we read through the sorted IP keys and print them out one at a time.
foreach $ip (@sorted_ips){
        next if (!$ip);
        # Check for the resolve flag, call the gethost sub to resolve.
        if ($resolve){
                # Check for the count flag, count if we're counting.
                if ($count){
                        printf ("%-10s%s\n", "$ips{$ip}", &gethost($ip));
                } else {
                        print &gethost($ip) . "\n";
                }
        } else {
                # We're not resolving, are we counting?
                if ($count){
                        printf ("%-10s%s\n", "$ips{$ip}", "$ip");
                } else {
                        print "$ip\n";
                }
        }
}

First let me say that this perl script isn’t “clean”. It’s something that I hacked together and it works perfectly well for me and my purposes. It could be cleaned up by adding a few more tests to make sure you’re operating with the expected data, etc. The point of this post isn’t to demonstrate my masterful skills at coding. If you do happen to take this script and industrialize it I would love to have a copy of your cleaned up version.

The perl script above weighs in at 1996 meaningful characters. I wasn’t going for concise, I was going for readable/commented. I’m sure it could be produced in half the size if that was the goal. The bash script comes in at 784 meaningful characters and it really couldn’t be written much more compact. Had I attempted to make the bash script offer the functionality that the perl script is now offering it would have taken infinitely more effort on my part. The bash script would’ve weighed in at more than double the length of the perl script. The bash script wouldn’t be portable, unless you added in endless case statements to support other Unix/Linux/GNU systems. The final reason that you shouldn’t extend bash scripts into the world of processing: the shell script takes 4 times as long to parse a file.

Behold, the perl is mightier than the bash when scripting:

[ryan@www www]# ls -l data/log
-rw-r--r-- 1 ryan ryan 455689591 2009-10-08 22:10 data/log

[ryan@www www]# wc -l data/log
1995175 data/log
[ryan@www www]# time ./ipmolest -c -f data/log > output

real    0m34.518s
user    0m33.375s
sys     0m0.636s
[ryan@www www]# time ./unique_addresses -c -f data/log > outputtoo

real    2m21.330s
user    2m23.667s
sys     0m1.613s
[ryan@www www]#

As you can see, it took the perl script 34 seconds to process the ~2 million line, 455MB file, and it took the bash script 2 minutes 21 seconds, more than 4 times as long.
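For anyone who wants to check the arithmetic, here is a small Python sketch that turns the `real` values from the `time` output above into a ratio and a rough lines-per-second figure. The helper name `to_seconds` is made up for this example, and the numbers are simply copied from the transcript.

```python
import re

def to_seconds(stamp):
    """Convert a bash `time` value such as '2m21.330s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)

# Values copied from the transcript above.
perl_real = to_seconds("0m34.518s")   # ./ipmolest
bash_real = to_seconds("2m21.330s")   # ./unique_addresses
lines = 1995175                       # wc -l data/log

ratio = bash_real / perl_real
print(f"bash/perl wall-clock ratio: {ratio:.2f}")
print(f"perl: {lines / perl_real:,.0f} lines/s")
print(f"bash: {lines / bash_real:,.0f} lines/s")
```

The ratio comes out at roughly 4.1, which is where "more than 4 times as long" comes from.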

Okay, on to other interesting things. The perl script has some portions that are comment worthy, let’s go over those now.

Here is a handy way to have your perl script accept data that is piped to it:

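# IO::Select ships with perl; it has to be loaded before use.
use IO::Select;

# Check for STDIN - is someone piping data into this script?
$pipe = IO::Select->new();
$pipe->add(\*STDIN);
if ($pipe->can_read(.5)){
        while(<>){
                # process each piped line here; it arrives in $_
        }
}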
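In keeping with the Rosetta Stone theme, the same check can be sketched in Python with `select.select`. This is only a rough equivalent, assuming a Unix-like system (on Windows `select` only works on sockets); the function name `has_piped_input` is made up for the example.

```python
import select
import sys

def has_piped_input(stream=None, timeout=0.5):
    """Return True if `stream` is ready to read within `timeout` seconds.

    `stream` may be a file object or a raw file descriptor; it defaults
    to sys.stdin. Like the perl version, "ready" also covers a closed
    pipe (EOF), not just pending data.
    """
    if stream is None:
        stream = sys.stdin
    readable, _, _ = select.select([stream], [], [], timeout)
    return bool(readable)

# Typical use inside a script (left commented out here):
# if has_piped_input():
#     for line in sys.stdin:
#         ...  # handle each piped line
```

The half second timeout mirrors the `.5` passed to `can_read` in the perl snippet.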

Another handy snippet is this bit, which creates an array containing the keys of a hash sorted by their values, smallest first.

@sorted_ips = sort { $ips{$a} <=> $ips{$b} } keys %ips;
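For comparison, a rough Python equivalent of that one-liner sorts the dictionary's keys using the values as the sort key. The addresses and counts below are made up for illustration.

```python
# Made-up hit counts keyed by IP address.
ips = {"10.0.0.1": 7, "10.0.0.2": 2, "10.0.0.3": 5}

# Keys sorted by their values, smallest count first,
# like the perl sort block with <=>.
sorted_ips = sorted(ips, key=ips.get)

for ip in sorted_ips:
    # Count left-justified in a 10 character column, then the address.
    print(f"{ips[ip]:<10}{ip}")
```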

That’s all for this post.

Ryan

September 22nd, 2009 | Tags:

Basic list of SMTP commands:
HELO:  Identifies the domain the client originates from.
MAIL FROM:  Identifies the sender of the message.
RCPT TO:  Identifies the recipient. RCPT can be issued repeatedly to address multiple recipients.
DATA:  Starts the message body. Type in the message and terminate it with a . on a line by itself.
QUIT:  Use this to close the connection.

The 8 Steps:

  1. telnet mail.linuxed.com 25 # This will connect you to the SMTP server at mail.linuxed.com.
    Trying 199.71.215.6...
    Connected to mail.linuxed.com (199.71.215.6).
    Escape character is '^]'.
    220 mail.linuxed.com ESMTP Postfix
  2. helo bob.com # This identifies you, the soon to be sender, as bob.com.
    250 mail.linuxed.com
  3. mail from: ryan@bob.com # This specifies the sender of the email.
    250 2.1.0 Ok
  4. rcpt to: root@linuxed.com #  This tells the MTA who to send the message to.
    250 2.1.5 Ok
  5. data #  Now we can enter the message
    354 End data with <CR><LF>.<CR><LF>
  6. Hello, Root! #  Content of the message is typed here.
  7. . #  Once you type a line with nothing but a period on it the message queues for delivery.
    250 2.0.0 Ok: queued as 1463416A5261
  8. quit # Type quit when you want the telnet session to terminate.
    221 2.0.0 Bye
    Connection closed by foreign host.
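The same conversation can be sketched in Python with the standard `smtplib` module, whose `helo`, `mail`, `rcpt`, `data`, and `quit` methods map onto the commands above. This is a minimal sketch, not a hardened tool: the function name `send_test_message` is made up, and the host and addresses you pass in are whatever you would have typed into telnet.

```python
import smtplib

def send_test_message(host, mail_from, rcpt_to, body,
                      port=25, helo_name="bob.com"):
    """Walk through HELO, MAIL FROM, RCPT TO, DATA and QUIT.

    Returns the numeric reply codes for the first four commands.
    """
    codes = []
    # Connecting reads the 220 greeting (step 1).
    smtp = smtplib.SMTP(host, port, timeout=10)
    try:
        codes.append(smtp.helo(helo_name)[0])   # step 2
        codes.append(smtp.mail(mail_from)[0])   # step 3
        codes.append(smtp.rcpt(rcpt_to)[0])     # step 4
        # data() sends DATA, the body, and the lone "." (steps 5-7).
        codes.append(smtp.data(body)[0])
    finally:
        smtp.quit()                             # step 8
    return codes

# Example call, mirroring the telnet session (not run here):
# send_test_message("mail.linuxed.com", "ryan@bob.com",
#                   "root@linuxed.com", "Hello, Root!")
```

A healthy server should answer 250 to each of the four commands, matching the replies in the session above.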

For some advanced fun, and if you already have Tcl and Expect installed on your system, you can use this expect script to automate the process of testing SMTP via telnet.  This script was created by using autoexpect and then modifying the resulting script to accept arguments.

test-smtp.exp:

#!/usr/bin/expect -f

set force_conservative 0  ;# set to 1 to force conservative mode

if {$force_conservative} {
        set send_slow {1 .1}
        proc send {ignore arg} {
                sleep .1
                exp_send -s -- $arg
        }
}

set domain [lindex $argv 0]
set rcpt_to [lindex $argv 1]
set mail_from [lindex $argv 2]
set data [lindex $argv 3]

set timeout -1
spawn telnet $domain 25
match_max 100000
expect "220"
send -- "helo bob.com\r"
expect "250"
send -- "mail from: $mail_from\r"
expect "250"
send -- "rcpt to: $rcpt_to\r"
expect "250"
send -- "data\r"
expect "354"
send -- "$data\r"
expect  "$data\r"
send -- ".\r"
expect  "250"
send -- "quit\r"
expect eof

You run the script as follows, passing four arguments on the command line – the domain to send the mail to, the recipient address, the sender address, and a data string.  Here is the output of the script when run:

[root@centos-gw1 Desktop]# ./test-smtp.exp linuxed.com root@linuxed.com linuxed@fake.com "Hi, Ryan"
spawn telnet linuxed.com 25
Trying 199.71.215.6...
Connected to linuxed.com (199.71.215.6).
Escape character is '^]'.
220 linuxed.com ESMTP Postfix
helo bob.com
250 linuxed.com
mail from: linuxed@fake.com
250 2.1.0 Ok
rcpt to: root@linuxed.com
250 2.1.5 Ok
data
354 End data with <CR><LF>.<CR><LF>
Hi, Ryan
.
250 2.0.0 Ok: queued as 5F80816A5261
quit
221 2.0.0 Bye
Connection closed by foreign host.

And of course, you could macro this down to a single command by aliasing it in your shell or wrapping it in a shell script – in which case you could then also check exit status and use this as a crude monitoring tool.
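As a rough idea of what such a wrapper might look like, here is a Python sketch that runs a command and turns its exit status into an OK/CRITICAL result. The function name `check` and the 0/2 return convention are assumptions for the example, and it only helps if the wrapped command really does exit non-zero on failure. The timeout is there because the expect script above sets `timeout -1` and could otherwise wait forever.

```python
import subprocess
import sys

def check(command, timeout=60):
    """Run `command` (a list of arguments) and report its health.

    Returns (0, "OK") on a zero exit status, otherwise (2, reason).
    """
    try:
        result = subprocess.run(command, capture_output=True,
                                text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Command missing, not executable, or hung past the timeout.
        return 2, f"CRITICAL: {exc}"
    if result.returncode == 0:
        return 0, "OK"
    return 2, f"CRITICAL: exit status {result.returncode}"

# Example call with the arguments from the run above (not run here):
# status, message = check(["./test-smtp.exp", "linuxed.com",
#                          "root@linuxed.com", "linuxed@fake.com",
#                          "Hi, Ryan"])
# print(message); sys.exit(status)
```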

Ryan

September 21st, 2009 | Tags:

I’ve owned the domain name linuxed.com since 5 days into 1999.  At the time of purchase I was 20 years old and only a year into my career working with Linux.   I picked up the domain planning to use it as my personal Linux soapbox.  My career and family life came first and Linuxed.com was never hosted.  Recently I’ve found myself with a bit of free time == Linuxed.com’s time has come.

Over the past 11 years I’ve worked for 5 companies and I’ve been titled manager, engineer, and architect (sometimes all three at once).  During that time I’ve kept close to Linux and the F/OSS community.  My career has been spent balancing the use of F/OSS and commercial solutions to provide the right balance of cost savings, reliability, and functionality in the enterprise space.

Linuxed.com will be my soapbox, but the preaching that comes from it won’t be completely Linux-centric.  Since I’ve designed, deployed, administered, and maintained IBM AIX, HP-UX, Sun Solaris, Microsoft Windows, AS/400, BSD variants, and a dozen or more Linux variants, it wouldn’t make sense to write only about Linux on Linuxed.com.  So the plan for now is to discuss everything in the world of computing that I’m interested in and working on.

If all I manage to do here is leave a trail evidencing where I’ve been, that would satisfy me.  If this were to grow into a forum providing meaningful discourse and the exchange of ideas among professionals, I would be thrilled.

Ryan

September 21st, 2009 | Tags:

When I was first learning the art of Systems Administration there were choices to be made.  What languages, operating systems, and applications should I learn?  I decided to tackle everything.

First I bought a system to learn on, a laptop.  I was working the grave shift as level 2 support at a national ISP during this time.  My thinking was that with a laptop I could drag it to work with me and study when I wasn’t assisting the level 1 support teams.  Well, this turned out to be a great idea… Sort of.  As it turns out, in 1998 Linux wasn’t laptop friendly.  This forced me to really dig into the code and learn how the OS worked before I could succeed at installing Linux on the laptop.  It took me ~ 2 weeks before I had successfully hacked the laptop to run Redhat 4.  This, of course, was long before RHEL or Fedora existed.  Along the way I had become so frustrated with trying to get Redhat on that I tried to put on Slackware.  That quickly sent me back to Redhat, as Slackware seemed infinitely more difficult to compile than Redhat.

At the same time that I was beginning to work with Linux I began studying perl, preparing for my MCSE, and fiddling with web design studies.  I finished my MCSE studies and certification within six months, NT4 track.  Right as I completed my MCSE I was hired into another department as the Systems Admin for the QA lab.  I inherited 30 Solaris (2.5 and 2.6) sparc systems, none of which were configured.

This was wonderful, it gave me all the time in the world to study – as my job was now nothing but learning more about system administration.  As a quick side note, and something I’ve always found humorous, I asked the hiring manager why he hired me for the Solaris Admin role and he told me that it was because I had my MCSE.  Puzzled, I asked him how that qualified me for a Unix role.  He told me that it showed I had gumption.  I suppose it did.  Okay, back to the story.  So I learned Solaris and learned it quickly.  I literally spent night and day working in that office, I loved it.  Learning how it all worked was a blast.  I set up DNS, jumpstart services, NIS+, auto-mounting home directories, and all the other good stuff that you’d want in a Unix lab in 1999.

Well, having learned Solaris and having generally improved my Unix skills, I decided to continue my learning.  By this point I was skilled with perl, the command line, most utilities, a variety of services (including Apache, Veritas’ tools and services, etc.) and networking in general.  I decided to keep my bent towards the internet and networking, in case I decided to go that route later.  So I picked up my CCNA, CCDA, and half of my CCNP before I ran into limitations in my ability to absorb and retain knowledge (I’d also been learning C, C++, Java, PHP, Expect, and other languages).  Along the way I’d picked up my Solaris Certified Systems Administrator paper as well.  At this point I was 2 years down the system administration road.

It was at this time that, for the first time, when I tried to remember something that I had previously learned about system administration I realized I needed to refresh my memory before proceeding.  I was forgetting things, and stuff, and what have you.  It was terrible and it was a first.  And it didn’t quit happening, it only got worse.  The more I learned, the more details I forgot about what I already knew.  I wasn’t losing any conceptual knowledge, just the minutiae.  It didn’t have any effect on my ability to rely on my knowledge to offer up creative solutions to situations but it was still frustrating, and enlightening.  Apparently my elastic mind has boundaries and by 21 I had found them.  At this point, I’ve forgotten more details about various services, utilities, hardware, and processes than I thought I’d ever learn.

Let’s throw out a late thesis statement based on the title of this post, this seems like just the right place.  Though the axiom goes ‘knowledge is power’ I would contend that the power isn’t in the knowing but in the learning and doing.  While knowledge will fade with time, the ability to learn and do doesn’t.  In my opinion the finest quality a Systems Administrator can possess is an unquenchable desire to learn and do.  That desire molds a person into an analytical machine capable of quickly troubleshooting the most difficult problems.  And that, that is what a good Systems Administrator is.

Learning is critical, being able to and wanting to.  Knowing is good, but optional.  It will save you a bit of time.  Rather than hope to be an admin that knows everything you should hope to be an admin that can quickly find and understand/learn anything.  Then there is the doing.  This, of course, is the most important part.  You don’t earn a living because of what you’ve learned, can learn, will learn, or know.  You earn a living based on what you do and have done.  This is where experience comes from, doing.  The learning and the knowing come with the doing.

So if there’s something that needs doing, or something you think you’d like to do, get out there and do it.  You’ll come to learn and know it only through doing it.  There are only so many hours in a day.  You can’t do everything, but you can try.  It can help to remind yourself that you can sleep when you’re dead.

Ryan
