Advanced String Manipulation

This document was written by CS 290W TA David Corcoran and was last modified

Sometimes it is useful to be able to modify some of the characters in a string. You may want to change all A's to B's, change all lowercase letters to capitals, or delete multiple blanks. Perl performs these types of operations using Translation, Substitution, and Regular Expressions.

The Translation (tr) function
This function takes a string and replaces every occurrence of each listed character with another. In its simplest form (shown below), tr operates on the contents of the $_ variable -- the default variable. The first argument (SearchList) holds the characters you want to match. The second (ReplaceList) holds their replacements, matched up position by position. tr returns the number of characters it matched, which is also the number replaced or deleted. Both the SearchList and ReplaceList can be a list of characters or a range of characters.
    # Example using the tr function
    # tr/SearchList/ReplaceList/;
    $_ = "is class over yet";
    $iMatched = tr/cioy/CIOY/;
    # $_ is now "Is Class Over Yet"
    # $iMatched is now 4

    $_ = "is class over yet";
    $iMatched = tr/a-z/A-Z/;
    # $_ is now "IS CLASS OVER YET"
    # $iMatched is now 14

Usually the SearchList and ReplaceList contain the same number of characters. If ReplaceList is shorter than SearchList, the last character of ReplaceList is replicated as necessary.

    $_ = "babs the cat decided to stay at home";
    $iMatched = tr/a-e/$*#/;
    # $_ is now "*$*s th# #$t ###i### to st$y $t hom#"
    # $iMatched is now 15

The next use of tr has no value for the second argument, but follows the // with the letter "d". This tells tr to delete every instance of characters in the first argument.

    $_ = "babs the cat decided to stay at home";
    $iMatched = tr/aeiou//d;
    # $_ is now "bbs th ct dcdd t sty t hm"
    # $iMatched is now 11

The squeeze "s" option causes a run of consecutive characters that translate to the same replacement character to be squeezed into just one occurrence.

    $_ = "babs the cat decided to stay at home";
    $iMatched = tr/a-e/#/s;
    # $_ is now "#s th# #t #i# to st#y #t hom#"
    # $iMatched is now 15

The complement "c" option matches all characters NOT in the SearchList.

    $_ = "Babs,the+cat,decided...to:--;STAY;at+++++home";
    $iMatched = tr/A-Za-z/ /cs;
    # $_ is now "Babs the cat decided to STAY at home"
    # $iMatched is now 16

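The examples above all work on $_. As a minimal sketch of the other common forms, the snippet below binds tr to a named variable with =~ and uses an empty ReplaceList (without the d option) to count characters without changing the string. The variable names $sPet, $sUpper, and $iVowels are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical variable names, chosen only for this sketch.
my $sPet = "babs the cat decided to stay at home";

# Copy the string first, then translate the copy, so $sPet is untouched.
(my $sUpper = $sPet) =~ tr/a-z/A-Z/;

# With an empty ReplaceList and no d option, tr replaces each character
# with itself, so the string is unchanged and only the count is useful.
my $iVowels = ($sPet =~ tr/aeiou//);

print "$sUpper\n";    # BABS THE CAT DECIDED TO STAY AT HOME
print "$iVowels\n";   # 11
print "$sPet\n";      # babs the cat decided to stay at home
```

Copying before translating is a design choice: tr modifies its target in place, so working on a copy keeps the original available for later use.
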
Regular Expressions
Regular expressions are a powerful way of identifying patterns within a string. Here are just a few of the many constructs available.
    cs.                   # . matches any single character (except a
                          # newline), so this matches cs1, cst, cs:, css
    cs*                   # * means 0, 1, or more of the preceding item,
                          # so this matches c, cs, css, csss, cssss, ....
    ^d                    # matches d at the beginning of a line
    d$                    # matches d at the end of a line
    ^(x|d)                # matches x OR d at the beginning of a line
    \bjon                 # \b matches word boundaries
                          # matches jonathon but not Mcjon
    jon\b                 # matches Mcjon but not jonathon
    [0123456789] or [0-9] # matches any single digit
    [ijklm] or [i-m]      # matches i,j,k,l,m
    [aeiouAEIOU]          # matches any vowel
    c|de|f                # matches c, or de, or f
    (chest|pea)nuts       # matches chestnuts or peanuts

Regular expressions can be used with any pattern matching function such as the Substitution function below.
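One way to check patterns like these is to run them against sample strings. The sketch below does that for several of the patterns listed above. The table layout and the names @aChecks, $rCheck, and $iFailed are assumptions made for this example, not part of the original material.

```perl
use strict;
use warnings;

# Each entry: [ pattern, test string, expected result (1 = match, 0 = no match) ].
# Single quotes keep \b and $ from being interpreted before the regex sees them.
my @aChecks = (
    [ 'cs.',             'cs1',      1 ],
    [ 'cs*',             'c',        1 ],
    [ '^d',              'dog',      1 ],
    [ '^d',              'bad',      0 ],
    [ 'd$',              'bad',      1 ],
    [ '\bjon',           'jonathon', 1 ],
    [ '\bjon',           'Mcjon',    0 ],
    [ 'jon\b',           'Mcjon',    1 ],
    [ 'jon\b',           'jonathon', 0 ],
    [ '(chest|pea)nuts', 'peanuts',  1 ],
);

my $iFailed = 0;
foreach my $rCheck (@aChecks) {
    my ($sPattern, $sText, $iExpected) = @$rCheck;
    my $iGot = ($sText =~ /$sPattern/) ? 1 : 0;
    $iFailed++ if $iGot != $iExpected;
    printf "%-18s %-10s %s\n", $sPattern, $sText, $iGot ? "match" : "no match";
}
print "unexpected results: $iFailed\n";
```
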

The Substitution (s) function
This function takes a string and replaces patterns found in that string with replacement text. In its simplest form (shown below), s operates on the contents of the $_ variable -- the default variable. The first argument is the pattern you want to match. The second is the replacement text. s returns the number of substitutions made, not the number of characters replaced.
    $_ = "babs the cat decided to stay at home";
    $iMatched = s/de/pur/;
    # $_ is now "babs the cat purcided to stay at home"
    # $iMatched is now 1

The first occurrence of "de" is replaced by "pur". What if you want to replace every occurrence of "de"? Use the global "g" option.

    $_ = "babs the cat decided to stay at home";
    $iMatched = s/de/pur/g;
    # $_ is now "babs the cat purcipurd to stay at home"
    # $iMatched is now 2

The pattern can be any regular expression:

    $_ = "babs the cat decided to stay at home";
    $iMatched = s/de./pur/g;
    # $_ is now "babs the cat puripur to stay at home"
    # $iMatched is now 2

    $_ = "babs the cat decided to stay at home";
    $iMatched = s/\bde./pur/g;
    # $_ is now "babs the cat purided to stay at home"
    # $iMatched is now 1

    $_ = "babs the cat decided to stay at home";
    $iMatched = s/[a-e]/#/g;
    # $_ is now "###s th# ##t ###i### to st#y #t hom#"
    # $iMatched is now 15
    # Notice that this has the same effect as tr/a-e/#/
    # (tr takes a plain character list, so no brackets are used there)

The ignore case "i" option is handy for mixed-case strings.

    $_ = "Babs the cat decided to STAY at home";
    $iMatched = s/.a./fred/gi;
    # $_ is now "freds the fred decided to Sfredfred home"
    # $iMatched is now 4

    $_ = "Babs the cat decided to STAY at home";
    $iMatched = s/..a.../fred/gi;
    # $_ is now "Babs thefredecided to fredt home"
    # $iMatched is now 2

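Like tr, s can be bound to a named variable with =~ instead of working on $_. The sketch below does that and also compares a character-class substitution with the matching tr call. All variable names here ($sOriginal, $sBySubst, $sByTr, and so on) are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical variable names, chosen only for this sketch.
my $sOriginal = "babs the cat decided to stay at home";

# Substitute on a copy bound with =~, leaving $sOriginal alone.
my $sBySubst    = $sOriginal;
my $iSubstCount = ($sBySubst =~ s/[a-e]/#/g);

# The tr counterpart. Note: no brackets, because tr takes a character
# list rather than a regular expression.
my $sByTr    = $sOriginal;
my $iTrCount = ($sByTr =~ tr/a-e/#/);

print "$sBySubst\n";
print "same result and same count\n"
    if $sBySubst eq $sByTr && $iSubstCount == $iTrCount;

# When the pattern is not found, s returns a false value.
my $sNoMatch = $sOriginal;
my $iNone    = ($sNoMatch =~ s/dog/cat/g);
print "no substitutions made\n" unless $iNone;
```
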
The matching operator
This allows you to test whether a string or pattern occurs within another string. The result is true or false, so it is typically used in conditions such as if statements.
    # Example using the Matching Operator =~ /.../
    $sString = "Hello CS290W";
    $sToFind = "CS";
    if ($sString =~ /$sToFind/) {
        print "I found CS in your string\n";
    }

    $sToFind = "cs";
    # Case Insensitive
    if ($sString =~ /$sToFind/i) {
        print "I found CS in your string\n";
    }

The Matching Operator allows you to check whether a string contains another string or a pattern. In the first test, we check whether $sString contains the string "CS". In the second, we perform the same check with lowercase "cs" and an i attached to the end of the regular expression. This i tells the Matching Operator to ignore case, so both tests succeed.
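Because the contents of $sToFind are treated as a pattern, characters such as "." keep their regular expression meaning. The sketch below shows one way to search for the text literally, using Perl's \Q...\E quoting, along with the negated binding operator !~. The sample strings and the names $iLoose, $iStrict, $iFound, and $iMissing are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for this sketch.
my $sString = "Hello CS290W (section 1.2)";
my $sToFind = "1.2";

# Unquoted, the "." in $sToFind is a wildcard and matches the "x".
my $iLoose  = ("section 1x2" =~ /$sToFind/)     ? 1 : 0;

# \Q...\E quotes metacharacters, so the "." must be a literal dot.
my $iStrict = ("section 1x2" =~ /\Q$sToFind\E/) ? 1 : 0;
my $iFound  = ($sString      =~ /\Q$sToFind\E/) ? 1 : 0;

# !~ is the negated form of =~: true when the pattern is NOT found.
my $iMissing = ($sString !~ /cs391/i) ? 1 : 0;

print "loose=$iLoose strict=$iStrict found=$iFound missing=$iMissing\n";
# prints: loose=1 strict=0 found=1 missing=1
```
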
