Advanced String Manipulation
This document was written by CS 290W TA David Corcoran and was last
modified
Sometimes it is useful to be able to modify some of the characters in
a string. You may want to change all A's to B's, change all lowercase
letters to capitals, or delete multiple blanks. Perl performs these
types of operations using Translation, Substitution, and Regular
Expressions.
- The Translation (tr) function
- This function takes a string and replaces every occurrence of
some character with another. In its simplest form (shown below), the
tr function operates on the contents of the $_ variable -- the
default variable. The first argument (the SearchList) holds the
characters you want to match. The second (the ReplaceList) holds the
replacement characters. tr returns the number of characters replaced
or deleted. Both the SearchList and ReplaceList can be a list of
characters or a range of characters.
# Example using the tr function
# tr /SearchList/ReplaceList/;
$_ = "is class over yet";
$iMatched = tr/cioy/CIOY/;
# $_ is now "Is Class Over Yet"
# $iMatched is now 4
$_ = "is class over yet";
$iMatched = tr/a-z/A-Z/;
# $_ is now "IS CLASS OVER YET"
# $iMatched is now 14
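The examples above rely on $_. As a minimal sketch, the same
translation can be applied to a named variable by binding tr to it
with the =~ operator. The variable name $sTitle is hypothetical and
only follows the naming style used in this document.

```perl
use strict;
use warnings;

# $sTitle is a hypothetical variable name used for illustration.
my $sTitle = "is class over yet";

# Bind tr to $sTitle with =~ instead of relying on $_.
my $iMatched = ($sTitle =~ tr/a-z/A-Z/);

print "$sTitle\n";    # IS CLASS OVER YET
print "$iMatched\n";  # 14
```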
Usually the SearchList and ReplaceList contain the same number of
characters. If ReplaceList is shorter than SearchList, the last
character of ReplaceList is replicated as necessary.
$_ = "babs the cat decided to stay at home";
$iMatched = tr/a-e/$*#/;
# $_ is now
# "*$*s th# #$t ###i### to st$y $t hom#"
# $iMatched is now 15
The next use of tr leaves the second argument empty and follows the
closing / with the letter "d". This tells tr to delete every instance
of the characters in the first argument.
$_ = "babs the cat decided to stay at home";
$iMatched = tr/aeiou//d;
# $_ is now "bbs th ct dcdd t sty t hm"
# $iMatched is now 11
The squeeze "s" option causes each run of consecutive characters that
were translated to the same replacement character to be squeezed into
a single occurrence.
$_ = "babs the cat decided to stay at home";
$iMatched = tr/a-e/#/s;
# $_ is now "#s th# #t #i# to st#y #t hom#"
# $iMatched is now 15
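The introduction mentions deleting multiple blanks. One way to do
that, sketched here under the assumption that only space characters
(not tabs) need collapsing, is to translate a space to itself with the
squeeze option. The variable name $sSpaced is hypothetical.

```perl
use strict;
use warnings;

# $sSpaced is a hypothetical example string: 3 spaces, then 4 spaces.
my $sSpaced = "is   class    over";

# Translate each space to a space, squeezing every run into one.
my $iMatched = ($sSpaced =~ tr/ / /s);

print "$sSpaced\n";   # is class over
print "$iMatched\n";  # 7 (every space matched, including squeezed ones)
```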
The complement "c" option matches all characters NOT in the
SearchList.
$_ =
"Babs,the+cat,decided...to:--;STAY;at+++++home";
$iMatched = tr/A-Za-z/ /cs;
# $_ is now
# "Babs the cat decided to STAY at home"
# $iMatched is now 16
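Because tr returns a count, it can also be used just to count
characters. The sketch below assumes the documented Perl behavior that
an empty ReplaceList without the "d" option leaves the string
unchanged. The variable names are hypothetical.

```perl
use strict;
use warnings;

# $sText and $iVowels are hypothetical names used for illustration.
my $sText = "babs the cat decided to stay at home";

# Empty ReplaceList and no "d" option: nothing is changed,
# but tr still returns the number of matching characters.
my $iVowels = ($sText =~ tr/aeiou//);

print "$iVowels\n";  # 11
print "$sText\n";    # babs the cat decided to stay at home
```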
- Regular Expressions
- Regular expressions are a powerful way of identifying patterns
within a string. Here are just a few of many.
cs.
# . matches any single character (except a newline), so this
# matches cs1, cst, cs:, css
cs*
# * means zero or more of the preceding item, so this
# matches c, cs, css, csss, cssss, ....
^d
# matches d at the beginning of a line
d$
# matches d at the end of a line
^(x|d)
# matches x OR d at the beginning of a line
\bjon
# \b matches word boundaries
# matches jonathon but not Mcjon
jon\b
# matches Mcjon but not jonathon
[0123456789] or [0-9]
# matches any single digit
[ijklm] or [i-m]
# matches any single one of i, j, k, l, m
[aeiouAEIOU]
# matches any vowel
c|de|f
# matches c, or de, or f
(chest|pea)nuts
# matches chestnuts or peanuts
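To see a few of these patterns in action, the following sketch tests
three of them against a small list of words. The word list and the
variable names are assumptions made for illustration. The matching
operator =~ used here is described later in this document.

```perl
use strict;
use warnings;

# Hypothetical sample words; each is tested against three patterns.
my @aWords = ("jonathon", "Mcjon", "peanuts", "walnuts");
my @aResults;

for my $sWord (@aWords) {
    my $sStart = ($sWord =~ /\bjon/)           ? "yes" : "no";
    my $sEnd   = ($sWord =~ /jon\b/)           ? "yes" : "no";
    my $sNuts  = ($sWord =~ /(chest|pea)nuts/) ? "yes" : "no";
    push @aResults, $sWord . ": " . join(" ", $sStart, $sEnd, $sNuts);
}

print "$_\n" for @aResults;
# jonathon: yes no no
# Mcjon: no yes no
# peanuts: no no yes
# walnuts: no no no
```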
Regular expressions can be used with any pattern matching function
such as the Substitution function below.
- The Substitution (s) function
- This function takes a string and replaces patterns found in that
string with replacement text. In its simplest form (shown below), the
s function operates on the contents of the $_ variable -- the default
variable. The first argument is the pattern you want to match. The
second is the replacement text. s returns the number of substitutions
made.
$_ = "babs the cat decided to stay at home";
$iMatched = s/de/pur/;
# $_ is now
# "babs the cat purcided to stay at home"
# $iMatched is now 1
The first occurrence of "de" is replaced by "pur". What if you want to
replace every occurrence of "de"? Use the global "g" option.
$_ = "babs the cat decided to stay at home";
$iMatched = s/de/pur/g;
# $_ is now
# "babs the cat purcipurd to stay at home"
# $iMatched is now 2
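As with tr, s can be bound to a named variable with =~ instead of
using $_. The sketch below also illustrates that the return value
counts substitutions made, not characters changed. The variable name
$sText is hypothetical.

```perl
use strict;
use warnings;

# $sText is a hypothetical variable name used for illustration.
my $sText = "babs the cat decided to stay at home";

# Two substitutions are made; each swaps 2 characters for 3.
my $iMatched = ($sText =~ s/de/pur/g);

print "$sText\n";     # babs the cat purcipurd to stay at home
print "$iMatched\n";  # 2
```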
The pattern can be any regular expression:
$_ = "babs the cat decided to stay at home";
$iMatched = s/de./pur/g;
# $_ is now
# "babs the cat puripur to stay at home"
# $iMatched is now 2
$_ = "babs the cat decided to stay at home";
$iMatched = s/\bde./pur/g;
# $_ is now
# "babs the cat purided to stay at home"
# $iMatched is now 1
$_ = "babs the cat decided to stay at home";
$iMatched = s/[a-e]/#/g;
# $_ is now
# "###s th# ##t ###i### to st#y #t hom#"
# $iMatched is now 15
# Notice that this produces the same result as
# tr/a-e/#/ (tr takes no brackets around a range)
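One difference worth keeping in mind when comparing the two: in s the
square brackets mark a character class, while in tr they are ordinary
characters that get translated like any other. The sketch below uses a
hypothetical input string containing brackets to make the difference
visible.

```perl
use strict;
use warnings;

# Hypothetical input containing literal square brackets.
my $sViaS  = "a[b]c";
my $sViaTr = "a[b]c";

# In s, [a-e] is a character class: only a, b, c are replaced.
my $iViaS = ($sViaS =~ s/[a-e]/#/g);

# In tr, [ and ] are part of the SearchList and are replaced too.
my $iViaTr = ($sViaTr =~ tr/[a-e]/#/);

print "$sViaS $iViaS\n";    # #[#]# 3
print "$sViaTr $iViaTr\n";  # ##### 5
```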
The ignore case "i" option is handy for mixed-case strings.
$_ = "Babs the cat decided to STAY at home";
$iMatched = s/.a./fred/gi;
# $_ is now
# "freds the fred decided to Sfredfred home"
# $iMatched is now 4
$_ = "Babs the cat decided to STAY at home";
$iMatched = s/..a.../fred/gi;
# $_ is now "Babs thefredecided to fredt home"
# $iMatched is now 2
- The matching operator
- This allows you to test whether a string contains another string
or a pattern, for use wherever a true/false value is needed, such as
in if statements.
# Example using the Matching Operator =~ /.../
$sString = "Hello CS290W";
$sToFind = "CS";
if ($sString =~ /$sToFind/)
{
print "I found CS in your string\n";
}
$sToFind = "cs";
# Case Insensitive
if ($sString =~ /$sToFind/i)
{
print "I found CS in your string\n";
}
The Matching Operator lets you check whether a string contains another
string or a pattern. The first test above checks whether $sString
contains the string "CS". The second performs the same check with
"cs" and an i attached to the end of the regular expression. This i
tells the Matching Operator to ignore case.
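Because the contents of $sToFind are treated as a regular expression,
characters such as . keep their special meaning. If the goal is to
find the text exactly as written, one option is to wrap the variable
in \Q and \E, which quote any special characters. The search text
below is a hypothetical value chosen to show the difference.

```perl
use strict;
use warnings;

my $sString = "Hello CS290W";
my $sToFind = "CS.90W";  # hypothetical search text containing a dot

# As a pattern, the dot matches any character, including the "2".
my $bAsPattern = ($sString =~ /$sToFind/) ? 1 : 0;

# Quoted with \Q...\E, the dot must match a literal dot.
my $bLiteral = ($sString =~ /\Q$sToFind\E/) ? 1 : 0;

print "$bAsPattern\n";  # 1
print "$bLiteral\n";    # 0
```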