Easy fixes using Perl

Take advantage of these simple text processing tricks using Perl

There are a lot of really nice scripting languages available to Unix admins, but Perl is still one of my favorites for doing any work that involves regular expressions -- any text that you can describe with a pattern. If you want to locate or change chunks of text that match some particular specification, you can probably throw together a script in Perl that will do the work fairly easily. In this week's post, we're going to examine some easy fixes for common problems that take advantage of Perl's versatile nature.

Removing blank lines
The first trick is removing blank lines. In Perl, recognizing a blank line is easy. Since ^ represents the beginning of a line and $ represents its end, ^$ represents a line that begins and ends and has nothing in between. You can expand this to also match lines that contain white space by changing the expression to ^\s*$. The \s means "white space", so \s* matches zero or more characters of white space.

To skip over blank lines in a Perl script, you have several choices. You could use "next if /^$/" (skip if empty) or "next if /^\s*$/" (skip if empty or only white space). Alternatively, you could take the approach of printing only the lines you want to keep: "print if (/\S/)" (print if there is text) or "print if (!/^$/)" (print if not empty).

This script nugget would show only lines containing text by skipping blank lines:

while (<>) {
    next if /^\s*$/;    # skip blank lines
    print;
}

This one would only print lines containing text:

while (<>) {
    print if (/\S/);    # print only if the line contains text
}

This code displays only lines that aren't empty:

while (<>) {
    print if (!/^$/);   # print only if NOT empty
}
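The difference between these tests only shows up on lines that contain nothing but spaces or tabs. Here is a minimal sketch that runs two of the tests against the same made-up input; keep_by is a hypothetical helper written for this illustration, not a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines for which the test succeeds.
sub keep_by {
    my ($test, @lines) = @_;
    return grep { $test->($_) } @lines;
}

# Sample input: text, an empty line, a whitespace-only line, more text.
my @lines = ("first\n", "\n", "  \t\n", "last\n");

# /\S/ requires at least one non-whitespace character.
my @has_text  = keep_by(sub { $_[0] =~ /\S/ }, @lines);

# !/^$/ only rejects lines with nothing at all before the newline.
my @not_empty = keep_by(sub { $_[0] !~ /^$/ }, @lines);

printf "has text: %d lines, not empty: %d lines\n",
    scalar @has_text, scalar @not_empty;
# prints "has text: 2 lines, not empty: 3 lines"
```

The whitespace-only line survives the "not empty" test, so /\S/ or /^\s*$/ is usually the better choice when the goal is to drop anything that looks blank.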

This handy one-liner removes blank lines, but also saves a copy of the original file with a .old extension. The new file (the one without blank lines) keeps the original filename.

perl -i.old -n -e "print if /\S/" filename              

You can turn this into a script:

#!/bin/bash

if [ -f "$1" ]; then
    perl -i.old -n -e "print if /\S/" "$1"
fi

or an alias:

alias deblank='perl -i.old -n -e "print if /\S/"'              

Removing whitespace at the beginnings and ends of lines
Removing whitespace at the beginnings or ends of lines can simplify later processing by reducing the number of variations that your patterns need to handle.

To remove leading whitespace:

$string =~ s/^\s+//;

To remove trailing whitespace:

$string =~ s/\s+$//;
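The two substitutions are often applied together. A minimal sketch, assuming you want a trimmed copy rather than a change to the original variable; trim is a hypothetical helper name, not a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string with leading and
# trailing whitespace removed; the caller's string is left untouched.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;   # leading whitespace
    $string =~ s/\s+$//;   # trailing whitespace, including the newline
    return $string;
}

print "[", trim("  \tsome text  \n"), "]\n";   # prints "[some text]"
```

Because \s also matches the newline, the trailing substitution strips the line ending too, so add "\n" back when printing trimmed lines.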

Removing non-ASCII characters
Removing characters that fall outside the range of the traditional ASCII character set is a little tricky. In the command below, we're using Perl's tr operator to select the range of characters from hex 80 (decimal 128) to hex FF (decimal 255) and delete them (the d modifier). This isn't going to be useful to you if you're using an extended character set.

$string =~ tr/\x80-\xFF//d;              
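The range \x80-\xFF only reaches character 255, so characters with higher code points in a decoded Unicode string would be left alone. One possible variation, sketched here under the assumption that you want to keep nothing but 7-bit ASCII, is to complement the ASCII range instead; ascii_only is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every character outside the 7-bit ASCII range.
sub ascii_only {
    my ($string) = @_;
    # c complements the search list (everything NOT in \x00-\x7F),
    # d deletes the characters that were matched.
    $string =~ tr/\x00-\x7F//cd;
    return $string;
}

my $sample = "caf\x{E9} \x{2603} ok";   # e-acute (U+00E9) and snowman (U+2603)
print ascii_only($sample), "\n";        # prints "caf  ok"
```

Both non-ASCII characters are deleted outright, which is why two spaces remain between the words.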

Removing carriage returns
In Perl, the expression \r represents a carriage return while \n is a linefeed. You can easily remove carriage returns from a string variable as shown below.

$str =~ s/\r//g;              

To remove carriage returns and linefeeds together, combine \r and \n in that order, since that's the order in which they appear in text files created on Windows systems. Keep in mind that this removes the line endings entirely, so the lines in the string are joined together.

$str =~ s/\r\n//g;              
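If the goal is to convert Windows line endings rather than strip them, you can replace each pair with a lone linefeed. A minimal sketch; crlf_to_lf is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn Windows line endings (CR LF) into Unix ones (LF).
sub crlf_to_lf {
    my ($str) = @_;
    $str =~ s/\r\n/\n/g;   # replace each CR LF pair with a bare LF
    return $str;
}

my $dos  = "one\r\ntwo\r\n";
my $unix = crlf_to_lf($dos);
print length($dos), " -> ", length($unix), "\n";   # prints "10 -> 8"
```

Unlike s/\r//g, this leaves a stray carriage return in the middle of a line alone, which may or may not be what you want.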

This one-liner removes carriage returns in a file using the same logic:

perl -p -i -e 's/\r//g' dosfile              

This can be especially handy if you don't have a tool like dos2unix available. Plus, I really like the "in place" nature of Perl commands with the "pie" (-p -i -e) arguments: there are no temporary files or copies for you to clean up afterwards. And it easily turns into an alias:

alias rmCR='perl -p -i -e '\''s/\r//g'\'''              

Adding line numbers
To add line numbers to the contents of a file, try this:

perl -p -i -e '$_ = "$. $_"' myfile              

This one takes a bit of explanation. The $. variable contains the current line number and $_ holds the current line, so the command reads "change the line to the line preceded by the line number".
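If you want the numbers to line up in a column, sprintf can pad them to a fixed width. A minimal sketch working on a list of lines rather than a file; number_lines and its width argument are hypothetical, chosen just for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix each line with a right-aligned line number.
sub number_lines {
    my ($width, @lines) = @_;
    my $n = 0;
    # %*d takes the field width from the argument list.
    return map { sprintf("%*d  %s", $width, ++$n, $_) } @lines;
}

print number_lines(3, "alpha\n", "beta\n");
# prints "  1  alpha" and "  2  beta"
```

When reading a file with a while loop, $. can take the place of the $n counter.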

Replacing text
Perl pie commands make replacing text within a file fairly easy, though you should always be careful that you're not changing text you didn't mean to change along with the text you hoped to target.

perl -p -i -e 's/2011/2012/g' filename              
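One way to reduce the risk of changing text you didn't mean to change is to anchor the pattern with \b (word boundary), so the digits only match when they stand alone. A minimal sketch with made-up sample text; bump_year is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the year only where it stands alone as a word.
sub bump_year {
    my ($text) = @_;
    $text =~ s/\b2011\b/2012/g;   # \b matches at a word boundary
    return $text;
}

print bump_year("Report 2011, invoice 120115, (c) 2011\n");
# prints "Report 2012, invoice 120115, (c) 2012"
```

Without the \b anchors, the invoice number would have been changed to 120125 as well.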

Renaming files
This command renames files to their lowercase character equivalents. Read this as "for every argument provided (i.e., the file list "*"), rename the file to lowercase unless a file by that name already exists".

perl -e 'for (@ARGV) { rename $_, lc($_) unless -e lc($_); }' *              

And, of course, you could do the same thing with uppercase by replacing lc with uc in both locations.
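Before renaming anything, it can be reassuring to see what would change. Here is a minimal dry-run sketch that lists the planned renames without touching any files; lowercase_plan and its optional existence-check argument are hypothetical, added here so the logic can be tried without a real directory.

```perl
use strict;
use warnings;

# Hypothetical helper: return [old, new] pairs for names that would be
# renamed to lowercase. $exists is a code ref that reports whether a name
# is already taken; it defaults to a real filesystem check with -e.
sub lowercase_plan {
    my ($names, $exists) = @_;
    $exists ||= sub { -e $_[0] };
    my @plan;
    for my $old (@$names) {
        my $new = lc $old;
        next if $new eq $old;       # already lowercase
        next if $exists->($new);    # would clobber an existing file
        push @plan, [$old, $new];
    }
    return @plan;
}

# Show the plan for the files named on the command line.
printf "%s -> %s\n", @$_ for lowercase_plan(\@ARGV);
```

Once the plan looks right, the rename command shown earlier does the actual work.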

Copyright © 2012 IDG Communications, Inc.