grep {base} | R Documentation |
grep searches for matches to pattern (its first argument) within the character vector x (second argument). regexpr does too, but returns more detail in a different format.
sub and gsub perform replacement of matches determined by regular expression matching.
grep(pattern, x, ignore.case=FALSE, extended=TRUE, perl=FALSE, value=FALSE)
sub(pattern, replacement, x, ignore.case=FALSE, extended=TRUE, perl=FALSE)
gsub(pattern, replacement, x, ignore.case=FALSE, extended=TRUE, perl=FALSE)
regexpr(pattern, text, extended=TRUE, perl=FALSE)
pattern | a character string containing a regular expression to be matched in the given character vector. |
x, text | a character vector where matches are sought. |
ignore.case | if FALSE, the pattern matching is case sensitive, and if TRUE, case is ignored during matching. |
extended | if TRUE, extended regular expression matching is used, and if FALSE, basic regular expressions are used. |
perl | logical. Should perl-compatible regexps be used if available? Has priority over extended. |
value | if FALSE, a vector containing the (integer) indices of the matches determined by grep is returned, and if TRUE, a vector containing the matching elements themselves is returned. |
replacement | a replacement for the matched pattern in sub and gsub. |
The two *sub functions differ only in that sub replaces only the first occurrence of a pattern, whereas gsub replaces all occurrences.
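The difference between the two can be seen in a minimal sketch; the input vector below is hypothetical and chosen only for illustration. Note that "first occurrence" applies to each element of the vector separately.

```r
# Hypothetical input used only for illustration
x <- c("banana", "apple", "cherry")

# sub() replaces only the first "a" found in each element
first_only <- sub("a", "A", x)

# gsub() replaces every "a" found in each element
all_hits <- gsub("a", "A", x)

print(first_only)
print(all_hits)
```

Elements without a match ("cherry" here) are returned unchanged by both functions.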
The regular expressions used are those specified by POSIX 1003.2, either extended or basic, depending on the value of the extended argument, unless perl = TRUE, when they are those of PCRE, ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/.
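As a rough illustration of the two flavours, the sketch below uses a POSIX bracket class with the default settings and a lookahead, which is a PCRE construct, with perl = TRUE. The input strings are hypothetical, and the second call assumes an R build with PCRE support.

```r
# Hypothetical inputs used only for illustration
rooms <- "room 101, floor 3"
s <- "foobar foobaz"

# POSIX-style character class, default (non-perl) matching
posix_res <- gsub("[[:digit:]]+", "#", rooms)

# Lookahead "(?=bar)" is PCRE syntax, so perl = TRUE is required;
# only the "foo" that is immediately followed by "bar" is replaced
perl_res <- gsub("foo(?=bar)", "X", s, perl = TRUE)

print(posix_res)
print(perl_res)
```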
For grep, a vector giving either the indices of the elements of x that yielded a match or, if value is TRUE, the matched elements.
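The two return forms of grep can be compared in a short sketch. The vector fruits is a hypothetical input, not part of the original examples.

```r
# Hypothetical input used only for illustration
fruits <- c("apple", "Banana", "cherry", "avocado")

# Default: integer indices of the elements that match
idx <- grep("^a", fruits)

# value = TRUE: the matching elements themselves
vals <- grep("^a", fruits, value = TRUE)

# ignore.case = TRUE lets the lower-case pattern "b" match "Banana"
ci <- grep("b", fruits, ignore.case = TRUE)

print(idx)
print(vals)
print(ci)
```

Indexing the input with the returned indices, fruits[idx], gives the same elements as value = TRUE.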
For sub and gsub, a character vector of the same length as the original.
For regexpr, an integer vector of the same length as text giving the starting position of the first match, or -1 if there is none, with attribute "match.length" giving the length of the matched text (or -1 for no match).
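One way to use this return value is to combine the start positions with the "match.length" attribute to extract the matched text. This is a minimal sketch; the vector words and the helper variables are hypothetical.

```r
# Hypothetical input used only for illustration
words <- c("garden", "open", "sky")

m <- regexpr("en", words)

start <- as.integer(m)          # start of first match, -1 if none
len <- attr(m, "match.length")  # length of the match, -1 if none

# Extract the matched text, using NA where there was no match
matched <- ifelse(start > 0,
                  substring(words, start, start + len - 1),
                  NA)

print(start)
print(len)
print(matched)
```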
perl=TRUE will only be available if R was compiled against PCRE: this is detected at configure time. All Unix and Windows systems should have it.
agrep for approximate matching.
tolower, toupper and chartr for character translations.
charmatch, pmatch, match.
apropos uses regexps and has nice examples.
grep("[a-z]", letters)

txt <- c("arm","foot","lefroo", "bafoobar")
if(any(i <- grep("foo",txt)))
   cat("`foo' appears at least once in\n\t",txt,"\n")
i # 2 and 4
txt[i]

## Double all 'a' or 'b's; "\" must be escaped, i.e. `doubled'
gsub("([ab])", "\\1_\\1_", "abc and ABC")

txt <- c("The", "licenses", "for", "most", "software", "are",
  "designed", "to", "take", "away", "your", "freedom",
  "to", "share", "and", "change", "it.",
  "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
  "is", "intended", "to", "guarantee", "your", "freedom", "to",
  "share", "and", "change", "free", "software", "--",
  "to", "make", "sure", "the", "software", "is",
  "free", "for", "all", "its", "users")
( i <- grep("[gu]", txt) ) # indices
stopifnot( txt[i] == grep("[gu]", txt, value = TRUE) )
(ot <- sub("[b-e]",".", txt))
txt[ot != gsub("[b-e]",".", txt)]#- gsub does "global" substitution

txt[gsub("g","#", txt) !=
    gsub("g","#", txt, ignore.case = TRUE)] # the "G" words

regexpr("en", txt)

## trim trailing white space
str = 'Now is the time '
sub(' +$', '', str) ## spaces only
sub('[[:space:]]+$', '', str) ## white space, POSIX-style
if(capabilities("PCRE"))
   sub('\\s+$', '', str, perl = TRUE) ## perl-style white space