ydal

Gaim to Pidgin log conversion

I was browsing through some older files of mine and cleaning up when I stumbled over a bunch of old instant messaging log files. These logs were still in the legacy unified log file format that Gaim (known today as Pidgin) used in the beginning. I didn’t find a converter after about ten seconds of using Google, so I went ahead and wrote my own.

It’s mostly feature-complete: it will split any number of [something].log files you pass to it into [something]/[date].txt style files, one per chat session. What it can’t do is determine which protocol it’s dealing with, so you’ll still have to move each log directory manually into the appropriate directory inside ~/.purple/logs, which Pidgin organizes as <protocol>/<account>/<buddy>/. Be careful when moving files, though, as you might accidentally overwrite other log files. Use rsync, and add --ignore-existing if you want it to skip files that already exist at the destination.
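Each chat session ends up in a file named after its start time. To show just that naming step, here is a sketch that pulls the date conversion out of the script into a function. It reuses the script's session-marker regex and month table; `session_filename` is a name made up for this example, and the HTML stripping and error reporting of the full script are left out.

```perl
use strict;
use warnings;

# month abbreviations as they appear in Gaim's session markers
my %shortmonths = (
	Jan => '01', Feb => '02', Mar => '03', Apr => '04',
	May => '05', Jun => '06', Jul => '07', Aug => '08',
	Sep => '09', Oct => '10', Nov => '11', Dec => '12',
);

# Map a (tag-stripped) session marker to the file name gaim2pidgin.pl
# writes for it; returns undef (in scalar context) if it doesn't parse.
sub session_filename {
	my ($session) = @_;
	return unless $session =~
		m/^ ?---- New Conversation @ \w{3} (\w{3}) ([0-9 ]{2}) (\d{2}):(\d{2}):(\d{2}) (\d{4}) ----$/;
	return unless exists $shortmonths{$1};
	# sprintf turns a space-padded day like " 5" into "05"
	return "$6-$shortmonths{$1}-" . sprintf('%02d', $2) . ".$3$4$5.txt";
}

print session_filename('---- New Conversation @ Mon Jan  5 12:34:56 2004 ----'), "\n";
# prints: 2004-01-05.123456.txt
```

The result is year-month-day, a dot, then hour, minute and second, so the files sort chronologically by name.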

Anyhoo, you can either get the file directly or just try this delicious copypasta:

#!/usr/bin/perl
# gaim2pidgin.pl
# author:  towo <towo@ydal.de>
# version: 3
# license: CC-BY-DE-3.0
 
use strict;
use warnings;
 
# convert short month names to numbers.
my %shortmonths = (
	'Jan' => '01',
	'Feb' => '02',
	'Mar' => '03',
	'Apr' => '04',
	'May' => '05',
	'Jun' => '06',
	'Jul' => '07',
	'Aug' => '08',
	'Sep' => '09',
	'Oct' => '10',
	'Nov' => '11',
	'Dec' => '12'
);
 
# go through files
FILE: foreach my $file (@ARGV) {
	my ($header, $target);
 
	# sanity checks
	unless (-f $file) {
		warn "$file is not a file.\n";
		next FILE;
	}
	unless (open(LOG, '<', $file)) {
		warn "Unable to open $file for reading: $!\n";
		next FILE;
	}
 
	# get file header, get target name
	chomp($header = <LOG>);
	$header =~ s#<.*?>##g;
	$target = $file;
	$target =~ s/\.log$//;
 
	# check header for correctness
	unless ($header =~ m{^(<HTML><HEAD><TITLE>)?IM Sessions with .*?\s*(</TITLE></HEAD><BODY BGCOLOR=".*?">)?$}i) {
		warn "$file does not seem to be a gaim conversation.\n";
		next FILE;
	}
 
	# read the rest of LOG into memory
	my @contents = <LOG>;
	close(LOG);
 
	# parse log file (each iteration handles one chat session)
	while(@contents) {
		my ($session, $identifier, $date);
 
		# get session identifier
		chomp($session = shift @contents);
 
		# Strip HTML.
		#$session =~ s#<.*?>##g;
		$session =~ s#</?(FONT|B|I|ALIGN|HTML|HEAD|TITLE|HR|BR|BODY|H3).*?>##ig;
 
		# sanity check for the session identifier, which looks something like
		# "---- New Conversation @ Mon Jan  5 12:34:56 2004 ----"
		unless ($session =~ m/^ ?---- New Conversation @ \w{3} (\w{3}) ([0-9 ]{2}) (\d{2}):(\d{2}):(\d{2}) (\d{4}) ----$/) {
			warn "Could not recognize session identifier: «$session»\n";
			next FILE;
		}
 
		# extract date from session identifier and create target identifier
		$date = "$6-$shortmonths{$1}-" . sprintf("%02d", $2) . ".$3$4$5";
		$identifier = "$target/$date.txt";
 
		# sanity check for target directory
		unless (-d $target) {
			unless(mkdir $target) {
				warn "Could not create directory $target: $!\n";
				next FILE;
			}
		}
 
		# open output file (three-argument open, lexical filehandle)
		my $output;
		unless (open($output, '>', $identifier)) {
			warn "Could not write to $identifier: $!\n";
			next FILE;
		}

		# copy lines into the session file until the next session
		# marker (or the end of the log), stripping HTML tags
		until (!@contents
		       or $contents[0] =~ m/^(<HR><BR><H3 Align=Center>)? ?---- New Conversation/) {
			my $line = shift @contents;
			$line =~ s#<.*?>##g;
			print $output $line;
		}
		close $output;
	}
}
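After the script has run, the new directories still have to be merged into ~/.purple/logs by hand. If you'd rather do that from Perl than with rsync, here is a minimal sketch of a merge that never overwrites anything. `merge_logs` is a made-up helper, not part of gaim2pidgin.pl, and choosing the right destination directory is still up to you.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Spec;

# Move every regular file from $src into $dst (creating $dst if needed),
# but never replace a file that already exists in $dst.
# Returns the number of files moved and the number skipped.
sub merge_logs {
	my ($src, $dst) = @_;
	my ($moved, $skipped) = (0, 0);

	make_path($dst) unless -d $dst;
	opendir(my $dh, $src) or die "Cannot read $src: $!\n";
	my @names = sort { $a cmp $b } readdir($dh);
	closedir($dh);

	for my $name (@names) {
		my $from = File::Spec->catfile($src, $name);
		next unless -f $from;    # skips ".", ".." and subdirectories
		my $to = File::Spec->catfile($dst, $name);
		if (-e $to) {
			warn "Skipping $name: $to already exists\n";
			$skipped++;
			next;
		}
		move($from, $to) or die "Cannot move $from to $to: $!\n";
		$moved++;
	}
	return ($moved, $skipped);
}
```

For example, `merge_logs('somebuddy', "$ENV{HOME}/.purple/logs/PROTOCOL/ACCOUNT/somebuddy")`, with PROTOCOL and ACCOUNT standing in for your real directory names. Skipped files stay in the source directory, so nothing is lost either way.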

Licensed as CC-BY-DE-3.0.

One Comment

  1. http://gist.github.com/245103

    Named variables for regex captures
    Inline instead of top-of-scope variable declarations
    Lexical instead of global filehandles
    3-argument open
    File::Spec and File::Basename instead of string manipulation on file names
    Only checks target directory existence once instead of over and over
    No LABELs
    Does not slurp the whole log file into memory
    No nested output loop
    30% (30 lines) shorter

    (I almost didn’t publish this because of CC-BY.)
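Two of the commenter's points, named regex captures and File::Basename/File::Spec instead of string manipulation on file names, can be illustrated in a few lines. This is a sketch of building the target path that way (named captures need Perl 5.10 or later), not the code from the linked gist; `target_path` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Jan => '01', ..., Dec => '12'
my %shortmonths;
@shortmonths{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} =
	map { sprintf('%02d', $_) } 1 .. 12;

# Build "<dir>/<buddy>/<date>.txt" from a Gaim log path and a session
# marker, reading the timestamp through named captures (%+).
sub target_path {
	my ($logfile, $session) = @_;
	$session =~ m/^ ?---- New Conversation @ \w{3} (?<mon>\w{3}) (?<day>[0-9 ]{2}) (?<hour>\d{2}):(?<min>\d{2}):(?<sec>\d{2}) (?<year>\d{4}) ----$/
		or return;
	my %t = %+;    # copy the named captures before anything else runs
	my $month = $shortmonths{ $t{mon} };
	return unless defined $month;

	# "logs/somebuddy.log" -> ("somebuddy", "logs/")
	my ($buddy, $dir) = fileparse($logfile, qr/\.log/);
	my $date = sprintf('%s-%s-%02d.%s%s%s',
		$t{year}, $month, $t{day}, $t{hour}, $t{min}, $t{sec});
	return File::Spec->catfile($dir, $buddy, "$date.txt");
}

print target_path('logs/somebuddy.log',
	'---- New Conversation @ Mon Jan  5 12:34:56 2004 ----'), "\n";
# prints: logs/somebuddy/2004-01-05.123456.txt
```

Compared with `$1` through `$6`, the named groups keep the `sprintf` call readable, and `fileparse` plus `catfile` replace the `s/\.log$//` edit and the hand-built `"$target/$date.txt"` string.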
