add pcre to in tree libs
git-svn-id: http://svn.freeswitch.org/svn/freeswitch/trunk@3732 d0543943-73ff-0310-b7d9-9358b9ac24b2
parent f82b80b57c
commit 9da5d7e90f
@ -0,0 +1,23 @@
THE MAIN PCRE LIBRARY
---------------------

Written by: Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk

University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.

Copyright (c) 1997-2006 University of Cambridge
All rights reserved


THE C++ WRAPPER LIBRARY
-----------------------

Written by: Google Inc.

Copyright (c) 2006 Google Inc
All rights reserved

####

@ -0,0 +1,68 @@
PCRE LICENCE
------------

PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.

Release 6 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.

The basic library functions are written in C and are freestanding. Also
included in the distribution is a set of C++ wrapper functions.


THE BASIC LIBRARY FUNCTIONS
---------------------------

Written by: Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk

University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.

Copyright (c) 1997-2006 University of Cambridge
All rights reserved.


THE C++ WRAPPER FUNCTIONS
-------------------------

Contributed by: Google Inc.

Copyright (c) 2006, Google Inc.
All rights reserved.


THE "BSD" LICENCE
-----------------

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of the University of Cambridge nor the name of Google
      Inc. nor the names of their contributors may be used to endorse or
      promote products derived from this software without specific prior
      written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

End

File diff suppressed because it is too large
@ -0,0 +1,185 @@
Basic Installation
==================

These are generic installation instructions that apply to systems that
can run the `configure' shell script - Unix systems and any that imitate
it. They are not specific to PCRE. There are PCRE-specific instructions
for non-Unix systems in the file NON-UNIX-USE.

The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, a file
`config.cache' that saves the results of its tests to speed up
reconfiguring, and a file `config.log' containing compiler output
(useful mainly for debugging `configure').

If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If at some point `config.cache'
contains results you don't want to keep, you may remove or edit it.

The file `configure.in' is used to create `configure' by a program
called `autoconf'. You only need `configure.in' if you want to change
it or regenerate `configure' using a newer version of `autoconf'.

The simplest way to compile this package is:

  1. `cd' to the directory containing the package's source code and type
     `./configure' to configure the package for your system. If you're
     using `csh' on an old version of System V, you might need to type
     `sh ./configure' instead to prevent `csh' from trying to execute
     `configure' itself.

     Running `configure' takes a while. While running, it prints some
     messages telling which features it is checking for.

  2. Type `make' to compile the package.

  3. Optionally, type `make check' to run any self-tests that come with
     the package.

  4. Type `make install' to install the programs and any data files and
     documentation.

  5. You can remove the program binaries and object files from the
     source code directory by typing `make clean'. To also remove the
     files that `configure' created (so you can compile the package for
     a different kind of computer), type `make distclean'. There is
     also a `make maintainer-clean' target, but that is intended mainly
     for the package's developers. If you use it, you may have to get
     all sorts of other programs in order to regenerate files that came
     with the distribution.
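Steps 1 to 4 above can be collected into one shell function. This is a minimal sketch, assuming a Bourne-compatible shell; the function name `build_and_install` is hypothetical and is not part of PCRE or Autoconf. It stops at the first step that fails and runs in a subshell, so the caller's working directory is left unchanged.

```sh
# Hypothetical helper wrapping steps 1-4 of the generic build procedure.
# Usage: build_and_install [SOURCE_DIR]
build_and_install() {
    srcdir=${1:-.}                # directory that contains `configure'
    (
        cd "$srcdir" || exit 1    # step 1: enter the source directory...
        sh ./configure || exit 1  # ...and configure (via sh, which also suits csh users)
        make || exit 1            # step 2: compile the package
        make check || exit 1      # step 3 (optional): run the self-tests
        make install              # step 4: install programs, data and docs
    )
}
```

Calling `build_and_install /path/to/pcre` returns the exit status of the step that failed, or that of `make install` if every step succeeded.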

Compilers and Options
=====================

Some systems require unusual options for compilation or linking that
the `configure' script does not know about. You can give `configure'
initial values for variables by setting them in the environment. Using
a Bourne-compatible shell, you can do that on the command line like
this:
     CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure

Or on systems that have the `env' program, you can do it like this:
     env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure

Compiling For Multiple Architectures
====================================

You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.

If you have to use a `make' that does not support the `VPATH'
variable, you have to compile the package for one architecture at a time
in the source code directory. After you have installed the package for
one architecture, use `make distclean' before reconfiguring for another
architecture.
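A separate-directory build as described above might look like the following sketch. It assumes a `make` that supports `VPATH` (such as GNU `make`) and an absolute path to the source tree; the function name `vpath_build` and its arguments are hypothetical.

```sh
# Hypothetical helper: configure and build in a separate object directory,
# leaving the source tree itself untouched.
# Usage: vpath_build /abs/path/to/source /abs/path/to/build
vpath_build() {
    src=$1     # unpacked source tree (absolute, because we cd below)
    build=$2   # directory that receives the object files and executables
    mkdir -p "$build" || return 1
    (
        cd "$build" || exit 1
        # configure looks for the sources in the directory it lives in,
        # so it is invoked by path rather than as ./configure
        "$src/configure" && make
    )
}
```

Repeating the call with a different build directory on each kind of machine keeps one object directory per architecture, all sharing the same sources.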

Installation Names
==================

By default, `make install' will install the package's files in
`/usr/local/bin', `/usr/local/man', etc. You can specify an
installation prefix other than `/usr/local' by giving `configure' the
option `--prefix=PATH'.

You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
give `configure' the option `--exec-prefix=PATH', the package will use
PATH as the prefix for installing programs and libraries.
Documentation and other data files will still use the regular prefix.

In addition, if you use an unusual directory layout you can give
options like `--bindir=PATH' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.

If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
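To make the split between the two prefixes concrete, the sketch below spells out how the install directories used by this package's Makefile (`BINDIR`, `LIBDIR`, `INCDIR`, `MANDIR`) would follow from them. It assumes the usual Autoconf defaults of this era (`bindir` and `libdir` under the exec prefix, `includedir` and `mandir` under the plain prefix); the paths are examples only.

```sh
# Illustration: the layout that
#   ./configure --prefix=/opt/pcre --exec-prefix=/opt/pcre-sparc
# would produce under the assumed Autoconf defaults.
prefix=/opt/pcre
exec_prefix=/opt/pcre-sparc     # would default to $prefix if not given

bindir=$exec_prefix/bin         # pcregrep, pcretest, pcre-config
libdir=$exec_prefix/lib         # the libraries
includedir=$prefix/include      # pcre.h, pcreposix.h
mandir=$prefix/man              # the man pages

echo "BINDIR=$bindir"
echo "LIBDIR=$libdir"
echo "INCDIR=$includedir"
echo "MANDIR=$mandir"
```

Only the architecture-dependent directories move when `--exec-prefix` is given, which is what lets several architectures share one copy of the headers and manual pages.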

Optional Features
=================

Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.

For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.
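Because the set of `--enable-` and `--with-` options differs from package to package, one quick way to see what a particular `configure` script accepts is to filter its `--help` output. This is a minimal sketch; the function name `list_feature_options` is hypothetical, and it relies only on `configure --help` and `grep -E`.

```sh
# Hypothetical helper: print the lines of `configure --help' that describe
# optional features (--enable-/--disable-) or external packages
# (--with-/--without-).
# Usage: list_feature_options [PATH_TO_CONFIGURE]
list_feature_options() {
    cfg=${1:-./configure}
    # `--' stops grep from reading the pattern itself as an option
    "$cfg" --help | grep -E -- '--(enable|disable|with|without)-'
}
```

Running it in the PCRE source directory lists the package-specific switches next to the generic ones, which is a useful cross-check against the `README`.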

Specifying the System Type
==========================

There may be some features `configure' cannot figure out
automatically, but needs to determine from the type of host the package
will run on. Usually `configure' can figure that out, but if it prints
a message saying it cannot guess the host type, give it the
`--host=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name with three fields:
     CPU-COMPANY-SYSTEM

See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the host type.

If you are building compiler tools for cross-compiling, you can also
use the `--target=TYPE' option to select the type of system they will
produce code for and the `--build=TYPE' option to select the type of
system on which you are compiling the package.
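The canonical host name is just a dash-separated string, so its fields can be pulled apart with POSIX parameter expansion. The sketch below is for illustration only: `arm-unknown-linux-gnu` is an example value, and, as it shows, the SYSTEM field may itself contain a dash.

```sh
# Split a canonical CPU-COMPANY-SYSTEM name into its three fields.
host_type=arm-unknown-linux-gnu   # example value, as passed to --host=TYPE

cpu=${host_type%%-*}     # text before the first dash:  arm
rest=${host_type#*-}     # text after the first dash:   unknown-linux-gnu
company=${rest%%-*}      # second field:                unknown
system=${rest#*-}        # everything that remains:     linux-gnu

echo "CPU=$cpu COMPANY=$company SYSTEM=$system"
# A cross build for this host would then be configured with
#   ./configure --host=arm-unknown-linux-gnu
```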

Sharing Defaults
================

If you want to set default values for `configure' scripts to share,
you can create a site shell script called `config.site' that gives
default values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.
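A site script is ordinary shell code that `configure` reads before it runs its tests. The fragment below is a sketch, assuming an Autoconf-generated `configure` that leaves `prefix` set to `NONE` when `--prefix` is not given; the paths and the choice of `gcc` are hypothetical examples.

```sh
# Example config.site: defaults shared by every configure script that reads it.

# Install under /opt/local unless --prefix was given on the command line.
test "$prefix" = NONE && prefix=/opt/local

# Prefer gcc unless CC was already set in the environment.
test -z "$CC" && CC=gcc

# Share one cache of test results between packages.
cache_file=/var/tmp/config.cache
```

Each setting only supplies a default: anything given explicitly on the `configure` command line or in the environment still takes precedence.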

Operation Controls
==================

`configure' recognizes the following options to control how it
operates.

`--cache-file=FILE'
     Use and save the results of the tests in FILE instead of
     `./config.cache'. Set FILE to `/dev/null' to disable caching, for
     debugging `configure'.

`--help'
     Print a summary of the options to `configure', and exit.

`--quiet'
`--silent'
`-q'
     Do not print messages saying which checks are being made. To
     suppress all normal output, redirect it to `/dev/null' (any error
     messages will still be shown).

`--srcdir=DIR'
     Look for the package's source code in directory DIR. Usually
     `configure' can determine that directory automatically.

`--version'
     Print the version of Autoconf used to generate the `configure'
     script, and exit.

`configure' also accepts some other, not widely useful, options.
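These options combine naturally when `configure` itself is being debugged: disable the cache so that every test really runs, silence the progress messages, and keep only errors on the terminal. This is a minimal sketch; the function name `debug_configure` is hypothetical, and any extra arguments are passed straight through to `configure`.

```sh
# Hypothetical helper: run ./configure with caching disabled and normal
# output discarded; error messages still reach the terminal on stderr.
debug_configure() {
    ./configure --cache-file=/dev/null --quiet "$@" >/dev/null
    status=$?
    if [ "$status" -ne 0 ]; then
        # config.log holds the compiler output behind the failing check
        echo "configure failed (status $status); see config.log" >&2
    fi
    return "$status"
}
```

The exit status is captured immediately after `configure` runs, so the caller sees the real failure code rather than that of the `if` test.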
@ -0,0 +1,606 @@
# Makefile.in for PCRE (Perl-Compatible Regular Expression) library.


#############################################################################

# PCRE is developed on a Unix system. I do not use Windows or Macs, and know
# nothing about building software on them. Although the code of PCRE should
# be very portable, the building system in this Makefile is designed for Unix
# systems. However, there are features that have been supplied to me by various
# people that should make it work on MinGW and Cygwin systems.

# This setting enables Unix-style directory scanning in pcregrep, triggered
# by the -f option. Maybe one day someone will add code for other systems.

PCREGREP_OSTYPE=-DIS_UNIX

#############################################################################


# Libtool places .o files in the .libs directory; this can mean that "make"
# thinks it is not up-to-date when in fact it is. This setting helps when
# GNU "make" is being used. It presumably does no harm in other cases.

VPATH=.libs


#---------------------------------------------------------------------------#
# The following lines are modified by "configure" to insert data that it is #
# given in its arguments, or which it finds out for itself.                 #
#---------------------------------------------------------------------------#

SHELL = @SHELL@
prefix = @prefix@
exec_prefix = @exec_prefix@
top_srcdir = @top_srcdir@

mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs

# NB: top_builddir is not referred to directly below, but it is used in the
# setting of $(LIBTOOL), so don't remove it!

top_builddir = .

# BINDIR is the directory in which the pcregrep, pcretest, and pcre-config
# commands are installed.
# INCDIR is the directory in which the public header files pcre.h and
# pcreposix.h are installed.
# LIBDIR is the directory in which the libraries are installed.
# MANDIR is the directory in which the man pages are installed.

BINDIR = @bindir@
LIBDIR = @libdir@
INCDIR = @includedir@
MANDIR = @mandir@

# EXEEXT is set by configure to the extension of an executable file
# OBJEXT is set by configure to the extension of an object file
# The BUILD_* equivalents are the same but for the host we're building on

EXEEXT = @EXEEXT@
OBJEXT = @OBJEXT@
# Note that these are just here to have a convenient place to look at the
# outcome.
BUILD_EXEEXT = @BUILD_EXEEXT@
BUILD_OBJEXT = @BUILD_OBJEXT@

# POSIX_OBJ and POSIX_LOBJ are either set empty, or to the names of the
# POSIX object files.

POSIX_OBJ = @POSIX_OBJ@
POSIX_LOBJ = @POSIX_LOBJ@

# The compiler, C flags, preprocessor flags, etc

CC = @CC@
CXX = @CXX@
CFLAGS = @CFLAGS@
CXXFLAGS = @CXXFLAGS@
LDFLAGS = @LDFLAGS@
CXXLDFLAGS = @CXXLDFLAGS@

CC_FOR_BUILD = @CC_FOR_BUILD@
CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
CXX_FOR_BUILD = @CXX_FOR_BUILD@
CXXFLAGS_FOR_BUILD = @CXXFLAGS_FOR_BUILD@
LDFLAGS_FOR_BUILD = $(LDFLAGS)

UCP = @UCP@
UTF8 = @UTF8@
NEWLINE = @NEWLINE@
POSIX_MALLOC_THRESHOLD = @POSIX_MALLOC_THRESHOLD@
LINK_SIZE = @LINK_SIZE@
MATCH_LIMIT = @MATCH_LIMIT@ @MATCH_LIMIT_RECURSION@
NO_RECURSE = @NO_RECURSE@
EBCDIC = @EBCDIC@

INSTALL = @INSTALL@
INSTALL_DATA = @INSTALL_DATA@

# LIBTOOL enables the building of shared and static libraries. It is set up
# to do one or the other or both by ./configure.

LIBTOOL = @LIBTOOL@
LTCOMPILE = $(LIBTOOL) --mode=compile $(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(NEWLINE) $(LINK_SIZE) $(MATCH_LIMIT) $(NO_RECURSE) $(EBCDIC)
LTCXXCOMPILE = $(LIBTOOL) --mode=compile $(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(NEWLINE) $(LINK_SIZE) $(MATCH_LIMIT) $(NO_RECURSE) $(EBCDIC)
@ON_WINDOWS@LINK = $(CC) $(LDFLAGS) -I. -I$(top_srcdir) -L.libs
@NOT_ON_WINDOWS@LINK = $(LIBTOOL) --mode=link $(CC) $(CFLAGS) $(LDFLAGS) -I. -I$(top_srcdir)
LINKLIB = $(LIBTOOL) --mode=link $(CC) -export-symbols-regex '^[^_]' $(LDFLAGS) -I. -I$(top_srcdir)
LINK_FOR_BUILD = $(LIBTOOL) --mode=link $(CC_FOR_BUILD) $(CFLAGS_FOR_BUILD) $(LDFLAGS_FOR_BUILD) -I. -I$(top_srcdir)
@ON_WINDOWS@CXXLINK = $(CXX) $(LDFLAGS) -I. -I$(top_srcdir) -L.libs
@NOT_ON_WINDOWS@CXXLINK = $(LIBTOOL) --mode=link $(CXX) $(CXXFLAGS) $(CXXLDFLAGS) -I. -I$(top_srcdir)
CXXLINKLIB = $(LIBTOOL) --mode=link $(CXX) $(LDFLAGS) -I. -I$(top_srcdir)

# These are the version numbers for the shared libraries

PCRELIBVERSION = @PCRE_LIB_VERSION@
PCREPOSIXLIBVERSION = @PCRE_POSIXLIB_VERSION@
PCRECPPLIBVERSION = @PCRE_CPPLIB_VERSION@

##############################################################################


OBJ = pcre_chartables.@OBJEXT@ \
	pcre_compile.@OBJEXT@ \
	pcre_config.@OBJEXT@ \
	pcre_dfa_exec.@OBJEXT@ \
	pcre_exec.@OBJEXT@ \
	pcre_fullinfo.@OBJEXT@ \
	pcre_get.@OBJEXT@ \
	pcre_globals.@OBJEXT@ \
	pcre_info.@OBJEXT@ \
	pcre_maketables.@OBJEXT@ \
	pcre_ord2utf8.@OBJEXT@ \
	pcre_refcount.@OBJEXT@ \
	pcre_study.@OBJEXT@ \
	pcre_tables.@OBJEXT@ \
	pcre_try_flipped.@OBJEXT@ \
	pcre_ucp_searchfuncs.@OBJEXT@ \
	pcre_valid_utf8.@OBJEXT@ \
	pcre_version.@OBJEXT@ \
	pcre_xclass.@OBJEXT@ \
	$(POSIX_OBJ)

LOBJ = pcre_chartables.lo \
	pcre_compile.lo \
	pcre_config.lo \
	pcre_dfa_exec.lo \
	pcre_exec.lo \
	pcre_fullinfo.lo \
	pcre_get.lo \
	pcre_globals.lo \
	pcre_info.lo \
	pcre_maketables.lo \
	pcre_ord2utf8.lo \
	pcre_refcount.lo \
	pcre_study.lo \
	pcre_tables.lo \
	pcre_try_flipped.lo \
	pcre_ucp_searchfuncs.lo \
	pcre_valid_utf8.lo \
	pcre_version.lo \
	pcre_xclass.lo \
	$(POSIX_LOBJ)

CPPOBJ = pcrecpp.@OBJEXT@ \
	pcre_scanner.@OBJEXT@ \
	pcre_stringpiece.@OBJEXT@

CPPLOBJ = pcrecpp.lo \
	pcre_scanner.lo \
	pcre_stringpiece.lo

CPP_TARGETS = libpcrecpp.la \
	pcrecpp_unittest@EXEEXT@ \
	pcre_scanner_unittest@EXEEXT@ \
	pcre_stringpiece_unittest@EXEEXT@

all: libpcre.la @POSIX_LIB@ pcretest@EXEEXT@ pcregrep@EXEEXT@ \
	@MAYBE_CPP_TARGETS@ @ON_WINDOWS@ winshared

pcregrep@EXEEXT@: libpcre.la pcregrep.@OBJEXT@ @ON_WINDOWS@ winshared
	$(LINK) -o pcregrep@EXEEXT@ pcregrep.@OBJEXT@ libpcre.la

pcretest@EXEEXT@: libpcre.la @POSIX_LIB@ pcretest.@OBJEXT@ \
	@ON_WINDOWS@ winshared
	$(LINK) $(PURIFY) $(EFENCE) -o pcretest@EXEEXT@ \
	pcretest.@OBJEXT@ \
	libpcre.la @POSIX_LIB@

pcrecpp_unittest@EXEEXT@: libpcrecpp.la pcrecpp_unittest.@OBJEXT@ \
	@ON_WINDOWS@ winshared
	$(CXXLINK) $(PURIFY) $(EFENCE) -o pcrecpp_unittest@EXEEXT@ \
	pcrecpp_unittest.@OBJEXT@ \
	libpcrecpp.la @POSIX_LIB@

pcre_scanner_unittest@EXEEXT@: libpcrecpp.la pcre_scanner_unittest.@OBJEXT@ \
	@ON_WINDOWS@ winshared
	$(CXXLINK) $(PURIFY) $(EFENCE) \
	-o pcre_scanner_unittest@EXEEXT@ \
	pcre_scanner_unittest.@OBJEXT@ \
	libpcrecpp.la @POSIX_LIB@

pcre_stringpiece_unittest@EXEEXT@: libpcrecpp.la \
	pcre_stringpiece_unittest.@OBJEXT@ @ON_WINDOWS@ winshared
	$(CXXLINK) $(PURIFY) $(EFENCE) \
	-o pcre_stringpiece_unittest@EXEEXT@ \
	pcre_stringpiece_unittest.@OBJEXT@ \
	libpcrecpp.la @POSIX_LIB@

libpcre.la: $(OBJ)
	-rm -f libpcre.la
	$(LINKLIB) -rpath $(LIBDIR) -version-info \
	'$(PCRELIBVERSION)' -o libpcre.la $(LOBJ)

libpcreposix.la: libpcre.la pcreposix.@OBJEXT@
	-rm -f libpcreposix.la
	$(LINKLIB) -rpath $(LIBDIR) libpcre.la -version-info \
	'$(PCREPOSIXLIBVERSION)' -o libpcreposix.la pcreposix.lo

libpcrecpp.la: libpcre.la $(CPPOBJ)
	-rm -f libpcrecpp.la
	$(CXXLINKLIB) -rpath $(LIBDIR) libpcre.la -version-info \
	'$(PCRECPPLIBVERSION)' -o libpcrecpp.la $(CPPLOBJ)

# Note that files generated by ./configure and by dftables are in the current
# directory, not the source directory.

pcre_chartables.@OBJEXT@: pcre_chartables.c
	@$(LTCOMPILE) pcre_chartables.c

pcre_compile.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_compile.c \
	$(top_srcdir)/pcre_printint.src
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_compile.c

pcre_config.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_config.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_config.c

pcre_dfa_exec.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_dfa_exec.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_dfa_exec.c

pcre_exec.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_exec.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_exec.c

pcre_fullinfo.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_fullinfo.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_fullinfo.c

pcre_get.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_get.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_get.c

pcre_globals.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_globals.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_globals.c

pcre_info.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_info.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_info.c

pcre_maketables.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_maketables.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_maketables.c

pcre_ord2utf8.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_ord2utf8.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_ord2utf8.c

pcre_refcount.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_refcount.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_refcount.c

pcre_study.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_study.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_study.c

pcre_tables.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_tables.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_tables.c

pcre_try_flipped.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_try_flipped.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_try_flipped.c

pcre_ucp_searchfuncs.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h \
	$(top_srcdir)/pcre_ucp_searchfuncs.c \
	$(top_srcdir)/ucptable.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_ucp_searchfuncs.c

pcre_valid_utf8.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_valid_utf8.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_valid_utf8.c

pcre_version.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_version.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_version.c

pcre_xclass.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_xclass.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_xclass.c

pcreposix.@OBJEXT@: $(top_srcdir)/pcreposix.c $(top_srcdir)/pcreposix.h \
	$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre.h config.h Makefile
	@$(LTCOMPILE) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcreposix.c

pcrecpp.@OBJEXT@: $(top_srcdir)/pcrecpp.cc $(top_srcdir)/pcrecpp.h \
	pcrecpparg.h pcre_stringpiece.h $(top_srcdir)/pcre.h config.h Makefile
	@$(LTCXXCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcrecpp.cc

pcre_scanner.@OBJEXT@: $(top_srcdir)/pcre_scanner.cc \
	$(top_srcdir)/pcre_scanner.h \
	$(top_srcdir)/pcrecpp.h pcrecpparg.h pcre_stringpiece.h \
	$(top_srcdir)/pcre.h config.h Makefile
	@$(LTCXXCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_scanner.cc

pcre_stringpiece.@OBJEXT@: $(top_srcdir)/pcre_stringpiece.cc \
	pcre_stringpiece.h \
	config.h Makefile
	@$(LTCXXCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	$(top_srcdir)/pcre_stringpiece.cc

pcretest.@OBJEXT@: $(top_srcdir)/pcretest.c $(top_srcdir)/pcre_internal.h \
	$(top_srcdir)/pcre_printint.src $(top_srcdir)/pcre.h config.h Makefile
	$(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	$(LINK_SIZE) $(top_srcdir)/pcretest.c

pcrecpp_unittest.@OBJEXT@: $(top_srcdir)/pcrecpp_unittest.cc \
	$(top_srcdir)/pcrecpp.h \
	pcrecpparg.h pcre_stringpiece.h $(top_srcdir)/pcre.h config.h Makefile
	$(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	$(LINK_SIZE) $(top_srcdir)/pcrecpp_unittest.cc

pcre_stringpiece_unittest.@OBJEXT@: $(top_srcdir)/pcre_stringpiece_unittest.cc \
	pcre_stringpiece.h pcrecpparg.h config.h Makefile
	$(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	$(LINK_SIZE) $(top_srcdir)/pcre_stringpiece_unittest.cc

pcre_scanner_unittest.@OBJEXT@: $(top_srcdir)/pcre_scanner_unittest.cc \
	$(top_srcdir)/pcre_scanner.h \
	$(top_srcdir)/pcrecpp.h pcre_stringpiece.h \
	$(top_srcdir)/pcre.h pcrecpparg.h config.h Makefile
	$(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	$(LINK_SIZE) $(top_srcdir)/pcre_scanner_unittest.cc

pcregrep.@OBJEXT@: $(top_srcdir)/pcregrep.c $(top_srcdir)/pcre.h Makefile config.h
	$(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	$(PCREGREP_OSTYPE) $(top_srcdir)/pcregrep.c

# Some Windows-specific targets for MinGW. Do not use for Cygwin.

winshared : .libs/@WIN_PREFIX@pcre.dll .libs/@WIN_PREFIX@pcreposix.dll \
	.libs/@WIN_PREFIX@pcrecpp.dll

.libs/@WIN_PREFIX@pcre.dll : libpcre.la
	$(CC) $(CFLAGS) -shared -o $@ \
	-Wl,--whole-archive .libs/libpcre.a \
	-Wl,--out-implib,.libs/libpcre.dll.a \
	-Wl,--output-def,.libs/@WIN_PREFIX@pcre.dll-def \
	-Wl,--export-all-symbols \
	-Wl,--no-whole-archive
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
	-e "s#library_names=''#library_names='libpcre.dll.a'#" \
	< .libs/libpcre.lai > .libs/libpcre.lai.tmp && \
	mv -f .libs/libpcre.lai.tmp .libs/libpcre.lai
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
	-e "s#library_names=''#library_names='libpcre.dll.a'#" \
	< libpcre.la > libpcre.la.tmp && \
	mv -f libpcre.la.tmp libpcre.la


.libs/@WIN_PREFIX@pcreposix.dll: libpcreposix.la libpcre.la
	$(CC) $(CFLAGS) -shared -o $@ \
	-Wl,--whole-archive .libs/libpcreposix.a \
	-Wl,--out-implib,.libs/@WIN_PREFIX@pcreposix.dll.a \
	-Wl,--output-def,.libs/@WIN_PREFIX@libpcreposix.dll-def \
	-Wl,--export-all-symbols \
	-Wl,--no-whole-archive .libs/libpcre.a
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
	-e "s#library_names=''#library_names='libpcreposix.dll.a'#"\
	< .libs/libpcreposix.lai > .libs/libpcreposix.lai.tmp && \
	mv -f .libs/libpcreposix.lai.tmp .libs/libpcreposix.lai
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
	-e "s#library_names=''#library_names='libpcreposix.dll.a'#"\
	< libpcreposix.la > libpcreposix.la.tmp && \
	mv -f libpcreposix.la.tmp libpcreposix.la

.libs/@WIN_PREFIX@pcrecpp.dll: libpcrecpp.la libpcre.la
	$(CXX) $(CXXFLAGS) -shared -o $@ \
	-Wl,--whole-archive .libs/libpcrecpp.a \
	-Wl,--out-implib,.libs/@WIN_PREFIX@pcrecpp.dll.a \
	-Wl,--output-def,.libs/@WIN_PREFIX@libpcrecpp.dll-def \
	-Wl,--export-all-symbols \
	-Wl,--no-whole-archive .libs/libpcre.a
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcrecpp.dll'#" \
	-e "s#library_names=''#library_names='libpcrecpp.dll.a'#"\
|
||||
< .libs/libpcrecpp.lai > .libs/libpcrecpp.lai.tmp && \
|
||||
mv -f .libs/libpcrecpp.lai.tmp .libs/libpcrecpp.lai
|
||||
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcrecpp.dll'#" \
|
||||
-e "s#library_names=''#library_names='libpcrecpp.dll.a'#"\
|
||||
< libpcrecpp.la > libpcrecpp.la.tmp && \
|
||||
mv -f libpcrecpp.la.tmp libpcrecpp.la
|
||||
|
||||
|
||||
wininstall : winshared
|
||||
$(mkinstalldirs) $(DESTDIR)$(LIBDIR)
|
||||
$(mkinstalldirs) $(DESTDIR)$(BINDIR)
|
||||
$(INSTALL) .libs/@WIN_PREFIX@pcre.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
|
||||
$(INSTALL) .libs/@WIN_PREFIX@pcreposix.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
|
||||
$(INSTALL) .libs/@WIN_PREFIX@libpcreposix.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcreposix.dll.a
|
||||
$(INSTALL) .libs/@WIN_PREFIX@libpcre.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcre.dll.a
|
||||
@HAVE_CPP@ $(INSTALL) .libs/@WIN_PREFIX@pcrecpp.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcrecpp.dll
|
||||
@HAVE_CPP@ $(INSTALL) .libs/@WIN_PREFIX@libpcrecpp.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcrecpp.dll.a
|
||||
-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
|
||||
-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
|
||||
@HAVE_CPP@ -strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcrecpp.dll
|
||||
-strip $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
|
||||
-strip $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@

# An auxiliary program makes the default character table source. This is put
# in the current directory, NOT the $top_srcdir directory.

pcre_chartables.c: dftables@BUILD_EXEEXT@
	./dftables@BUILD_EXEEXT@ pcre_chartables.c

dftables.@BUILD_OBJEXT@: $(top_srcdir)/dftables.c \
	$(top_srcdir)/pcre_maketables.c $(top_srcdir)/pcre_internal.h \
	$(top_srcdir)/pcre.h config.h Makefile
	$(CC_FOR_BUILD) -c $(CFLAGS_FOR_BUILD) -I. $(top_srcdir)/dftables.c

dftables@BUILD_EXEEXT@: dftables.@BUILD_OBJEXT@
	$(LINK_FOR_BUILD) -o dftables@BUILD_EXEEXT@ dftables.@BUILD_OBJEXT@

install: all @ON_WINDOWS@ wininstall
	@NOT_ON_WINDOWS@ $(mkinstalldirs) $(DESTDIR)$(LIBDIR)
	@NOT_ON_WINDOWS@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la"
	@NOT_ON_WINDOWS@ $(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la
	@NOT_ON_WINDOWS@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la"
	@NOT_ON_WINDOWS@ $(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la
	@NOT_ON_WINDOWS@@HAVE_CPP@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcrecpp.la $(DESTDIR)$(LIBDIR)/libpcrecpp.la"
	@NOT_ON_WINDOWS@@HAVE_CPP@ $(LIBTOOL) --mode=install $(INSTALL) libpcrecpp.la $(DESTDIR)$(LIBDIR)/libpcrecpp.la
	@NOT_ON_WINDOWS@ $(LIBTOOL) --finish $(DESTDIR)$(LIBDIR)
	$(mkinstalldirs) $(DESTDIR)$(INCDIR)
	$(INSTALL_DATA) $(top_srcdir)/pcre.h $(DESTDIR)$(INCDIR)/pcre.h
	$(INSTALL_DATA) $(top_srcdir)/pcreposix.h $(DESTDIR)$(INCDIR)/pcreposix.h
	@HAVE_CPP@ $(INSTALL_DATA) $(top_srcdir)/pcrecpp.h $(DESTDIR)$(INCDIR)/pcrecpp.h
	@HAVE_CPP@ $(INSTALL_DATA) pcrecpparg.h $(DESTDIR)$(INCDIR)/pcrecpparg.h
	@HAVE_CPP@ $(INSTALL_DATA) pcre_stringpiece.h $(DESTDIR)$(INCDIR)/pcre_stringpiece.h
	@HAVE_CPP@ $(INSTALL_DATA) $(top_srcdir)/pcre_scanner.h $(DESTDIR)$(INCDIR)/pcre_scanner.h
	$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre.3 $(DESTDIR)$(MANDIR)/man3/pcre.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreapi.3 $(DESTDIR)$(MANDIR)/man3/pcreapi.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrebuild.3 $(DESTDIR)$(MANDIR)/man3/pcrebuild.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrecallout.3 $(DESTDIR)$(MANDIR)/man3/pcrecallout.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrecompat.3 $(DESTDIR)$(MANDIR)/man3/pcrecompat.3
	@HAVE_CPP@ $(INSTALL_DATA) $(top_srcdir)/doc/pcrecpp.3 $(DESTDIR)$(MANDIR)/man3/pcrecpp.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrematching.3 $(DESTDIR)$(MANDIR)/man3/pcrematching.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrepartial.3 $(DESTDIR)$(MANDIR)/man3/pcrepartial.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrepattern.3 $(DESTDIR)$(MANDIR)/man3/pcrepattern.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreperform.3 $(DESTDIR)$(MANDIR)/man3/pcreperform.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreposix.3 $(DESTDIR)$(MANDIR)/man3/pcreposix.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreprecompile.3 $(DESTDIR)$(MANDIR)/man3/pcreprecompile.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcresample.3 $(DESTDIR)$(MANDIR)/man3/pcresample.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrestack.3 $(DESTDIR)$(MANDIR)/man3/pcrestack.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_compile.3 $(DESTDIR)$(MANDIR)/man3/pcre_compile.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_compile2.3 $(DESTDIR)$(MANDIR)/man3/pcre_compile2.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_config.3 $(DESTDIR)$(MANDIR)/man3/pcre_config.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_named_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_dfa_exec.3 $(DESTDIR)$(MANDIR)/man3/pcre_dfa_exec.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_exec.3 $(DESTDIR)$(MANDIR)/man3/pcre_exec.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring_list.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_fullinfo.3 $(DESTDIR)$(MANDIR)/man3/pcre_fullinfo.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_named_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_stringnumber.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_stringnumber.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_stringtable_entries.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_stringtable_entries.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring_list.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_info.3 $(DESTDIR)$(MANDIR)/man3/pcre_info.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_maketables.3 $(DESTDIR)$(MANDIR)/man3/pcre_maketables.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_refcount.3 $(DESTDIR)$(MANDIR)/man3/pcre_refcount.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_study.3 $(DESTDIR)$(MANDIR)/man3/pcre_study.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_version.3 $(DESTDIR)$(MANDIR)/man3/pcre_version.3
	$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man1
	$(INSTALL_DATA) $(top_srcdir)/doc/pcregrep.1 $(DESTDIR)$(MANDIR)/man1/pcregrep.1
	$(INSTALL_DATA) $(top_srcdir)/doc/pcretest.1 $(DESTDIR)$(MANDIR)/man1/pcretest.1
	$(mkinstalldirs) $(DESTDIR)$(BINDIR)
	$(LIBTOOL) --mode=install $(INSTALL) pcregrep@EXEEXT@ $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
	$(LIBTOOL) --mode=install $(INSTALL) pcretest@EXEEXT@ $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@
	$(INSTALL) pcre-config $(DESTDIR)$(BINDIR)/pcre-config
	$(mkinstalldirs) $(DESTDIR)$(LIBDIR)/pkgconfig
	$(INSTALL_DATA) libpcre.pc $(DESTDIR)$(LIBDIR)/pkgconfig/libpcre.pc

# The uninstall target removes all the files that were installed.

uninstall:; -rm -rf \
	$(DESTDIR)$(LIBDIR)/libpcre.* \
	$(DESTDIR)$(LIBDIR)/libpcreposix.* \
	$(DESTDIR)$(LIBDIR)/libpcrecpp.* \
	$(DESTDIR)$(INCDIR)/pcre.h \
	$(DESTDIR)$(INCDIR)/pcreposix.h \
	$(DESTDIR)$(INCDIR)/pcrecpp.h \
	$(DESTDIR)$(INCDIR)/pcrecpparg.h \
	$(DESTDIR)$(INCDIR)/pcre_scanner.h \
	$(DESTDIR)$(INCDIR)/pcre_stringpiece.h \
	$(DESTDIR)$(MANDIR)/man3/pcre.3 \
	$(DESTDIR)$(MANDIR)/man3/pcreapi.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrebuild.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrecallout.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrecompat.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrecpp.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrematching.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrepartial.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrepattern.3 \
	$(DESTDIR)$(MANDIR)/man3/pcreperform.3 \
	$(DESTDIR)$(MANDIR)/man3/pcreposix.3 \
	$(DESTDIR)$(MANDIR)/man3/pcreprecompile.3 \
	$(DESTDIR)$(MANDIR)/man3/pcresample.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrestack.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_compile.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_compile2.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_config.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_copy_named_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_copy_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_dfa_exec.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_exec.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_free_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_free_substring_list.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_fullinfo.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_named_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_stringnumber.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_stringtable_entries.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_substring_list.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_info.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_maketables.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_refcount.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_study.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_version.3 \
	$(DESTDIR)$(MANDIR)/man1/pcregrep.1 \
	$(DESTDIR)$(MANDIR)/man1/pcretest.1 \
	$(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@ \
	$(DESTDIR)$(BINDIR)/pcretest@EXEEXT@ \
	$(DESTDIR)$(BINDIR)/pcre-config \
	$(DESTDIR)$(LIBDIR)/pkgconfig/libpcre.pc

# We deliberately omit dftables and pcre_chartables.c from 'make clean'; once
# made pcre_chartables.c shouldn't change, and if people have edited the tables
# by hand, you don't want to throw them away.

clean:; -rm -rf *.@OBJEXT@ *.lo *.a *.la .libs pcretest@EXEEXT@ pcre_stringpiece_unittest@EXEEXT@ pcrecpp_unittest@EXEEXT@ pcre_scanner_unittest@EXEEXT@ pcregrep@EXEEXT@ testtry

# But "make distclean" should get back to a virgin distribution

distclean: clean
	-rm -f pcre_chartables.c libtool pcre-config libpcre.pc \
	pcre_stringpiece.h pcrecpparg.h \
	dftables@EXEEXT@ RunGrepTest RunTest \
	Makefile config.h config.status config.log config.cache

check: runtest

@WIN_PREFIX@pcre.dll : winshared
	cp .libs/@WIN_PREFIX@pcre.dll .

test: runtest

runtest: all @ON_WINDOWS@ @WIN_PREFIX@pcre.dll
	@./RunTest
	@./RunGrepTest
	@HAVE_CPP@ @echo ""
	@HAVE_CPP@ @echo "Testing C++ wrapper"
	@HAVE_CPP@ @echo ""; echo "Test 1++: stringpiece"
	@HAVE_CPP@ @./pcre_stringpiece_unittest@EXEEXT@
	@HAVE_CPP@ @echo ""; echo "Test 2++: RE class"
	@HAVE_CPP@ @./pcrecpp_unittest@EXEEXT@
	@HAVE_CPP@ @echo ""; echo "Test 3++: Scanner class"
	@HAVE_CPP@ @./pcre_scanner_unittest@EXEEXT@

# End
@@ -0,0 +1,266 @@
News about PCRE releases
------------------------

Release 6.7 04-Jul-06
---------------------

The main additions to this release are the ability to use the same name for
multiple sets of parentheses, and support for CRLF line endings in both the
library and pcregrep (and in pcretest for testing).
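Supporting CRLF means a line ending may be two characters long rather than one. As a rough illustration only, the following C sketch finds the end of the first line in a buffer under an LF, CR, or CRLF convention. The `enum newline_type` and `line_length()` names are hypothetical and are not part of the PCRE API.

```c
#include <stddef.h>

/* Hypothetical names for three line-ending conventions (not PCRE's API). */
enum newline_type { NL_LF, NL_CR, NL_CRLF };

/* Return the length of the first line in s[0..len), not counting its
   terminator. If term_len is not NULL it receives the terminator's length,
   or 0 when the buffer ends before any terminator is found. */
size_t line_length(const char *s, size_t len, enum newline_type nl,
                   size_t *term_len)
{
    for (size_t i = 0; i < len; i++) {
        size_t t = 0;
        if (nl == NL_LF && s[i] == '\n')
            t = 1;
        else if (nl == NL_CR && s[i] == '\r')
            t = 1;
        else if (nl == NL_CRLF && s[i] == '\r' && i + 1 < len && s[i + 1] == '\n')
            t = 2;      /* in this sketch a lone CR or LF is ordinary data in CRLF mode */
        if (t != 0) {
            if (term_len != NULL)
                *term_len = t;
            return i;
        }
    }
    if (term_len != NULL)
        *term_len = 0;
    return len;
}
```

With the CRLF convention, "ab\r\ncd" yields a first line of length 2, while "ab\ncd" is treated as one unterminated line.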

Thanks to Ian Taylor, the stack usage for many kinds of pattern has been
significantly reduced for certain subject strings.


Release 6.5 01-Feb-06
---------------------

Important changes in this release:

1. A number of new features have been added to pcregrep.

2. The Unicode property tables have been updated to Unicode 4.1.0, and the
   supported properties have been extended with script names such as "Arabic",
   and the derived properties "Any" and "L&". This has necessitated a change to
   the internal format of compiled patterns. Any saved compiled patterns that
   use \p or \P must be recompiled.

3. The specification of recursion in patterns has been changed so that all
   recursive subpatterns are automatically treated as atomic groups. Thus, for
   example, (?R) is treated as if it were (?>(?R)). This is necessary because
   otherwise there are situations where recursion does not work.

See the ChangeLog for a complete list of changes, which include a number of bug
fixes and tidies.


Release 6.0 07-Jun-05
---------------------

The release number has been increased to 6.0 because of the addition of several
major new pieces of functionality.

A new function, pcre_dfa_exec(), which implements pattern matching using a DFA
algorithm, has been added. This has a number of advantages for certain cases,
though it does run more slowly, and lacks the ability to capture substrings. On
the other hand, it does find all matches, not just the first, and it works
better for partial matching. The pcrematching man page discusses the
differences.
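To make the "all matches, not just the first" point concrete, here is a hand-coded C sketch (no regex engine involved) that enumerates every match of the pattern <.*> starting at the beginning of a subject, longest first, which is the order the pcreapi documentation describes for pcre_dfa_exec() results. A backtracking pcre_exec() call with this greedy pattern would report only the longest one. The function names are hypothetical, and the sketch ignores the fact that . does not match a newline by default.

```c
#include <stddef.h>
#include <string.h>

/* Hand-coded stand-in for the pattern <.*> anchored at offset 0: every '>'
   after the opening '<' ends a different match. Returns the length of the
   k-th longest match (k = 0 is the longest), or -1 if there is none. */
int tag_match_length(const char *subject, int k)
{
    size_t len = strlen(subject);

    if (k < 0 || len == 0 || subject[0] != '<')
        return -1;
    /* Walk back from the end so that longer matches are counted first. */
    for (size_t end = len; end >= 2; end--) {
        if (subject[end - 1] == '>' && k-- == 0)
            return (int)end;
    }
    return -1;
}

/* Number of distinct matches of <.*> at offset 0. */
int tag_match_count(const char *subject)
{
    int n = 0;
    while (tag_match_length(subject, n) >= 0)
        n++;
    return n;
}
```

For the subject <a> <b> <c>, the sketch reports three matches, of lengths 11, 7, and 3.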

The pcretest program has been enhanced so that it can make use of the new
pcre_dfa_exec() matching function and the extra features it provides.

The distribution now includes a C++ wrapper library. This is built
automatically if a C++ compiler is found. The pcrecpp man page discusses this
interface.

The code itself has been re-organized into many more files, one for each
function, so it no longer requires everything to be linked in when static
linkage is used. As a consequence, some internal functions have had to have
their names exposed. These functions all have names starting with _pcre_. They
are undocumented, and are not intended for use by outside callers.

The pcregrep program has been enhanced with new functionality such as
multiline matching and options for outputting more matching context. See the
ChangeLog for a complete list of changes to the library and the utility
programs.


Release 5.0 13-Sep-04
---------------------

The licence under which PCRE is released has been changed to the more
conventional "BSD" licence.

In the code, some bugs have been fixed, and there are also some major changes
in this release (which is why I've increased the number to 5.0). Some changes
are internal rearrangements, and some provide a number of new facilities. The
new features are:

1. There's an "automatic callout" feature that inserts callouts before every
   item in the regex, and there's a new callout field that gives the position
   in the pattern - useful for debugging and tracing.

2. The extra_data structure can now be used to pass in a set of character
   tables at exec time. This is useful if compiled regexes are saved and
   re-used at a later time when the tables may not be at the same address. If
   the default internal tables are used, the pointer saved with the compiled
   pattern is now set to NULL, which means that you don't need to do anything
   special unless you are using custom tables.

3. It is possible, with some restrictions on the content of the regex, to
   request "partial" matching. A special return code is given if all of the
   subject string matched part of the regex. This could be useful for testing
   an input field as it is being typed.
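As an illustration of the idea only, the following C sketch applies partial matching to the simplest possible case, an anchored literal pattern: the result is "partial" when every character of the subject matches the beginning of the pattern but the subject runs out first. The enum and function are hypothetical; PCRE itself signals a partial match through a special return code from its matching functions, and only for patterns that meet the restrictions mentioned above.

```c
#include <string.h>

/* Hypothetical result codes for this sketch only. */
enum match_result { MATCH_NONE, MATCH_PARTIAL, MATCH_FULL };

/* Anchored match of a literal pattern against the start of subject.
   MATCH_PARTIAL means every character of subject matched, but subject
   ended before the whole pattern had been matched. */
enum match_result match_literal(const char *pattern, const char *subject)
{
    size_t plen = strlen(pattern);
    size_t slen = strlen(subject);
    size_t n = slen < plen ? slen : plen;

    if (strncmp(pattern, subject, n) != 0)
        return MATCH_NONE;      /* mismatch before either string ran out */
    if (slen >= plen)
        return MATCH_FULL;      /* the whole pattern was matched */
    return MATCH_PARTIAL;       /* subject exhausted part-way through */
}
```

Checking an input field against the pattern 2006-07 as the user types would give partial results for "2", "20", "2006", and so on, and a full match only once the complete text has been entered.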

4. There is now some optional support for Unicode character properties, which
   means that pattern items such as \p{Lu} and \X can now be used. Only the
   general category properties are supported. If PCRE is compiled with this
   support, an additional 90K data structure is included, which increases the
   size of the library dramatically.

5. There is support for saving compiled patterns and re-using them later.

6. There is support for running regular expressions that were compiled on a
   different host with the opposite endianness.
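Running a pattern compiled on a host of the opposite endianness requires reversing the byte order of its multi-byte fields. The C sketch below shows the byte swaps, plus a check that recognises a byte-swapped header by comparing a known magic number in both byte orders. The function names and the EXAMPLE_MAGIC value are assumptions made for illustration, not PCRE's internal code.

```c
#include <stdint.h>

/* Reverse the byte order of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical magic number stored at the start of a saved pattern; chosen
   here as the ASCII bytes "PCRE". */
#define EXAMPLE_MAGIC 0x50435245u

/* Classify a stored magic number: 1 if written in this host's byte order,
   -1 if written by a host of the opposite endianness (so every multi-byte
   field must be swapped), 0 if it is not a recognised header at all. */
int header_byte_order(uint32_t stored_magic)
{
    if (stored_magic == EXAMPLE_MAGIC)
        return 1;
    if (swap32(stored_magic) == EXAMPLE_MAGIC)
        return -1;
    return 0;
}
```

Checking the magic number first means a loader can reject garbage outright and only pay for swapping when the pattern really did come from an opposite-endian host.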

7. The pcretest program has been extended to accommodate the new features.

The main internal rearrangement is that sequences of literal characters are no
longer handled as strings. Instead, each character is handled on its own. This
makes some UTF-8 handling easier, and makes the support of partial matching
possible. Compiled patterns containing long literal strings will be larger as a
result of this change; I hope that performance will not be much affected.


Release 4.5 01-Dec-03
---------------------

Again mainly a bug-fix and tidying release, with only a couple of new features:

1. It's possible now to compile PCRE so that it does not use recursive
function calls when matching. Instead it gets memory from the heap. This slows
things down, but may be necessary on systems with limited stacks.

2. UTF-8 string checking has been tightened to reject overlong sequences and to
check that a starting offset points to the start of a character. Failure of the
latter returns a new error code: PCRE_ERROR_BADUTF8_OFFSET.
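The two checks described in item 2 can be sketched in C as follows. This is a minimal illustration assuming the RFC 3629 rules (sequences of at most four bytes, no surrogates, nothing above U+10FFFF), which may be stricter than the checks in any particular PCRE release; utf8_valid() and utf8_offset_ok() are hypothetical names.

```c
#include <stddef.h>

/* Return 1 if s[0..len) is well-formed UTF-8 under RFC 3629 rules, else 0.
   Overlong encodings, UTF-16 surrogates and values above U+10FFFF fail. */
int utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp;      /* code point being assembled */

        if (c < 0x80) { i++; continue; }               /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return 0;         /* stray continuation byte or bad lead byte */

        if (len - i <= extra)
            return 0;          /* sequence truncated by the end of the buffer */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;      /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* Overlong: the value would have fitted in a shorter sequence. */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000))
            return 0;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += extra + 1;
    }
    return 1;
}

/* Return 1 if offset is the start of a character: either the end of the
   buffer or a byte that is not a continuation byte. Assumes s is valid. */
int utf8_offset_ok(const unsigned char *s, size_t len, size_t offset)
{
    if (offset > len)
        return 0;
    return offset == len || (s[offset] & 0xC0) != 0x80;
}
```

For example, the two-byte sequence C0 AF (an overlong encoding of "/") is rejected, and in "café" an offset pointing at the second byte of "é" fails the offset check.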

3. PCRE can now be compiled for systems that use EBCDIC code.


Release 4.4 21-Aug-03
---------------------

This is mainly a bug-fix and tidying release. The only new feature is that PCRE
checks UTF-8 strings for validity by default. There is an option to suppress
this, just in case anybody wants that teeny extra bit of performance.


Releases 4.1 - 4.3
------------------

Sorry, I forgot about updating the NEWS file for these releases. Please take a
look at ChangeLog.


Release 4.0 17-Feb-03
---------------------

There have been a lot of changes for the 4.0 release, adding additional
functionality and mending bugs. Below is a list of the highlights of the new
functionality. For full details of these features, please consult the
documentation. For a complete list of changes, see the ChangeLog file.

1. Support for Perl's \Q...\E escapes.

2. "Possessive quantifiers" ?+, *+, ++, and {,}+ which come from Sun's Java
package. They provide some syntactic sugar for simple cases of "atomic
grouping".

3. Support for the \G assertion. It is true when the current matching position
is at the start point of the match.

4. A new feature that provides some of the functionality that Perl provides
with (?{...}). The facility is termed a "callout". The way it is done in PCRE
is for the caller to provide an optional function, by setting pcre_callout to
its entry point. To get the function called, the regex must include (?C) at
appropriate points.

5. Support for recursive calls to individual subpatterns. This makes it really
easy to get totally confused.

6. Support for named subpatterns. The Python syntax (?P<name>...) is used to
name a group.

7. Several extensions to UTF-8 support; it is now fairly complete. There is an
option for pcregrep to make it operate in UTF-8 mode.

8. The single man page has been split into a number of separate man pages.
These also give rise to individual HTML pages which are put in a separate
directory. There is an index.html page that lists them all. Some hyperlinking
between the pages has been installed.


Release 3.5 15-Aug-01
---------------------

1. The configuring system has been upgraded to use later versions of autoconf
and libtool. By default it builds both a shared and a static library if the OS
supports it. You can use --disable-shared or --disable-static on the configure
command if you want only one of them.

2. The pcretest utility is now installed along with pcregrep because it is
useful for users (to test regexes) and by doing this, it automatically gets
relinked by libtool. The documentation has been turned into a man page, so
there are now .1, .txt, and .html versions in /doc.

3. Upgrades to pcregrep:
   (i)   Added long-form option names like GNU grep.
   (ii)  Added --help to list all options with an explanatory phrase.
   (iii) Added -r, --recursive to recurse into sub-directories.
   (iv)  Added -f, --file to read patterns from a file.

4. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
script, to force use of CR or LF instead of \n in the source. On non-Unix
systems, the value can be set in config.h.

5. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
absolute limit. Changed the text of the error message to make this clear, and
likewise updated the man page.

6. The limit of 99 on the number of capturing subpatterns has been removed.
The new limit is 65535, which I hope will not be a "real" limit.


Release 3.3 01-Aug-00
---------------------

There is some support for UTF-8 character strings. This is incomplete and
experimental. The documentation describes what is and what is not implemented.
Otherwise, this is just a bug-fixing release.


Release 3.0 01-Feb-00
---------------------

1. A "configure" script is now used to configure PCRE for Unix systems. It
builds a Makefile, a config.h file, and the pcre-config script.

2. PCRE is built as a shared library by default.

3. There is support for POSIX classes such as [:alpha:].

4. There is an experimental recursion feature.

----------------------------------------------------------------------------
IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00

Please note that there has been a change in the API such that a larger
ovector is required at matching time, to provide some additional workspace.
The new man page has details. This change was necessary in order to support
some of the new functionality in Perl 5.005.
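In the matching API as documented for current releases, the ovector holds two ints (a start and an end offset) for the whole match and for each capturing subpattern, plus one more int per pair that pcre_exec() uses as workspace, so its size should be a multiple of three. A small C sketch of that arithmetic follows; the helper names are hypothetical, and in practice the capture count would come from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT.

```c
/* Number of ints needed so that the whole match and each of capture_count
   capturing subpatterns can be returned: 2 offsets + 1 workspace slot each. */
int ovector_size(int capture_count)
{
    return 3 * (capture_count + 1);
}

/* Number of (start, end) offset pairs that an ovector of ovecsize ints can
   return. Only the first two-thirds hold offsets; a size that is not a
   multiple of three is effectively rounded down. */
int ovector_pairs(int ovecsize)
{
    return ovecsize / 3;
}
```

A pattern with two capturing subpatterns therefore needs 9 ints, and a fixed-size vector of 30 ints has room for 10 offset pairs.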

IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00

Another (I hope this is the last!) change has been made to the API for the
pcre_compile() function. An additional argument has been added to make it
possible to pass over a pointer to character tables built in the current
locale by pcre_maketables(). To use the default tables, this new argument
should be passed as NULL.

IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05

Yet another (and again I hope this really is the last) change has been made
to the API for the pcre_exec() function. An additional argument has been
added to make it possible to start the match other than at the start of the
subject string. This is important if there are lookbehinds. The new man
page has the details, but if you just want to convert existing programs, all
you need to do is to stick in a new fifth argument to pcre_exec(), with a
value of zero. For example, change

  pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize)
to
  pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize)
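Why a separate start offset matters for lookbehinds can be shown without PCRE at all. The hand-coded C sketch below stands in for the pattern (?<=a)b: given the full subject plus a start offset, the lookbehind can still see the character before the offset, whereas passing a pointer advanced to the offset hides that character. The function name is hypothetical.

```c
#include <stddef.h>

/* Hand-coded stand-in for the pattern (?<=a)b: return the position of the
   first 'b' at or after start_offset that is immediately preceded by 'a',
   or -1 if there is none. The lookbehind is allowed to inspect the byte
   before start_offset, just as a real lookbehind may. */
long find_b_after_a(const char *subject, size_t length, size_t start_offset)
{
    for (size_t i = start_offset; i < length; i++) {
        if (subject[i] == 'b' && i > 0 && subject[i - 1] == 'a')
            return (long)i;
    }
    return -1;
}
```

Searching "ab" from offset 1 finds the b at position 1, but searching the advanced pointer "ab" + 1 from offset 0 finds nothing, because the preceding a is no longer visible.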

****
@@ -0,0 +1,269 @@
Compiling PCRE on non-Unix systems
----------------------------------

See below for comments on Cygwin or MinGW and OpenVMS usage. I (Philip Hazel)
have no knowledge of Windows or VMS systems and how their libraries work. The
items in the PCRE Makefile that relate to anything other than Unix-like systems
have been contributed by PCRE users. There are some other comments and files in
the Contrib directory on the ftp site that you may find useful. See

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

If you want to compile PCRE for a non-Unix system (or perhaps, more strictly,
for a system that does not support "configure" and "make" files), note that
the basic PCRE library consists entirely of code written in Standard C, and so
should compile successfully on any system that has a Standard C compiler and
library. The C++ wrapper functions are a separate issue (see below).


GENERIC INSTRUCTIONS FOR THE C LIBRARY

The following are generic comments about building PCRE. The interspersed
indented commands are suggestions from Mark Tetrode as to which commands you
might use on a Windows system to build a static library.

(1) Copy or rename the file config.in as config.h, and change the macros that
define HAVE_STRERROR and HAVE_MEMMOVE to define them as 1 rather than 0.
Unfortunately, because of the way Unix autoconf works, the default setting has
to be 0. You may also want to make changes to other macros in config.h. In
particular, if you want to force a specific value for newline, you can define
the NEWLINE macro. The default is to use '\n', thereby using whatever value
your compiler gives to '\n'.

  rem Mark Tetrode's commands
  copy config.in config.h
  rem Use write, because notepad cannot handle UNIX files. Change values.
  write config.h

(2) Compile dftables.c as a stand-alone program, and then run it with
the single argument "pcre_chartables.c". This generates a set of standard
character tables and writes them to that file.

  rem Mark Tetrode's commands
  rem Compile & run
  cl -DSUPPORT_UTF8 -DSUPPORT_UCP dftables.c
  dftables.exe pcre_chartables.c
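For readers wondering what "character tables" means here: they are 256-entry lookup tables indexed by byte value, derived from the C library's <ctype.h> functions, so that matching can classify or case-fold a character with a single array access. The C sketch below builds two such tables, lower-case and case-flip, in the spirit of what dftables writes into pcre_chartables.c. The names and layout are illustrative assumptions; the real file also contains further tables (for example, character-type bits) in PCRE's own format.

```c
#include <ctype.h>

/* Illustrative 256-entry tables indexed by byte value. */
static unsigned char lower_case[256];
static unsigned char flip_case[256];
static int tables_built = 0;

/* Fill both tables using the current locale (the "C" locale unless the
   program has called setlocale()). */
static void build_case_tables(void)
{
    for (int c = 0; c < 256; c++) {
        lower_case[c] = (unsigned char)tolower(c);
        flip_case[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
    }
    tables_built = 1;
}

/* Look up the lower-case form of c, building the tables on first use. */
unsigned char table_lower(unsigned char c)
{
    if (!tables_built)
        build_case_tables();
    return lower_case[c];
}

/* Look up the opposite-case form of c, building the tables on first use. */
unsigned char table_flip(unsigned char c)
{
    if (!tables_built)
        build_case_tables();
    return flip_case[c];
}
```

Because the tables are precomputed, a caseless comparison costs two array lookups instead of repeated calls into the C library.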
|
||||
|
||||
(3) Compile the following source files:
|
||||
|
||||
pcre_chartables.c
|
||||
pcre_compile.c
|
||||
pcre_config.c
|
||||
pcre_dfa_exec.c
|
||||
pcre_exec.c
|
||||
pcre_fullinfo.c
|
||||
pcre_get.c
|
||||
pcre_globals.c
|
||||
pcre_info.c
|
||||
pcre_maketables.c
|
||||
pcre_ord2utf8.c
|
||||
pcre_refcount.c
|
||||
pcre_study.c
|
||||
pcre_tables.c
|
||||
pcre_try_flipped.c
|
||||
pcre_ucp_searchfuncs.c
|
||||
pcre_valid_utf8.c
|
||||
pcre_version.c
|
||||
pcre_xclass.c
|
||||
|
||||
and link them all together into an object library in whichever form your system
|
||||
keeps such libraries. This is the pcre C library. If your system has static and
|
||||
shared libraries, you may have to do this once for each type.
|
||||
|
||||
rem These comments are out-of-date, referring to a previous release which
|
||||
rem had fewer source files. Replace with the file names from above.
|
||||
rem Mark Tetrode's commands, for a static library
|
||||
rem Compile & lib
|
||||
cl -DSUPPORT_UTF8 -DSUPPORT_UCP -DPOSIX_MALLOC_THRESHOLD=10 /c maketables.c get.c study.c pcre.c
|
||||
lib /OUT:pcre.lib maketables.obj get.obj study.obj pcre.obj
|
||||
|
||||
(4) Similarly, compile pcreposix.c and link it (on its own) as the pcreposix
|
||||
library.
|
||||
|
||||
rem Mark Tetrode's commands, for a static library
|
||||
rem Compile & lib
|
||||
cl -DSUPPORT_UTF8 -DSUPPORT_UCP -DPOSIX_MALLOC_THRESHOLD=10 /c pcreposix.c
|
||||
lib /OUT:pcreposix.lib pcreposix.obj
|
||||
|
||||
(5) Compile the test program pcretest.c. This needs the functions in the
|
||||
pcre and pcreposix libraries when linking.
|
||||
|
||||
rem Mark Tetrode's commands
|
||||
rem compile & link
|
||||
cl /F0x400000 pcretest.c pcre.lib pcreposix.lib
|
||||
|
||||
(6) Run pcretest on the testinput files in the testdata directory, and check
|
||||
that the output matches the corresponding testoutput files. You must use the
|
||||
-i option when checking testinput2. Note that the supplied files are in Unix
|
||||
format, with just LF characters as line terminators. You may need to edit them
|
||||
to change this if your system uses a different convention.
|
||||
|
||||
rem Mark Tetrode's commands
|
||||
pcretest testdata\testinput1 testdata\myoutput1
|
||||
windiff testdata\testoutput1 testdata\myoutput1
|
||||
pcretest -i testdata\testinput2 testdata\myoutput2
|
||||
windiff testdata\testoutput2 testdata\myoutput2
|
||||
pcretest testdata\testinput3 testdata\myoutput3
|
||||
windiff testdata\testoutput3 testdata\myoutput3
|
||||
pcretest testdata\testinput4 testdata\myoutput4
|
||||
windiff testdata\testoutput4 testdata\myoutput4
|
||||
pcretest testdata\testinput5 testdata\myoutput5
|
||||
windiff testdata\testoutput5 testdata\myoutput5
|
||||
pcretest testdata\testinput6 testdata\myoutput6
|
||||
windiff testdata\testoutput6 testdata\myoutput6
|
||||
|
||||
Note that there are now three more tests (7, 8, 9) that did not exist when Mark
|
||||
wrote those comments. The test the new pcre_dfa_exec() function.
|
||||
|
||||
(7) If you want to use the pcregrep command, compile and link pcregrep.c; it
|
||||
uses only the basic PCRE library.
|
||||
|
||||
|
||||
THE C++ WRAPPER FUNCTIONS
|
||||
|
||||
The PCRE distribution now contains some C++ wrapper functions and tests,
|
||||
contributed by Google Inc. On a system that can use "configure" and "make",
|
||||
the functions are automatically built into a library called pcrecpp. It should
|
||||
be straightforward to compile the .cc files manually on other systems. The
|
||||
files called xxx_unittest.cc are test programs for each of the corresponding
|
||||
xxx.cc files.
|
||||
|
||||
|
||||
FURTHER REMARKS
|
||||
|
||||
If you have a system without "configure" but where you can use a Makefile, edit
|
||||
Makefile.in to create Makefile, substituting suitable values for the variables
|
||||
at the head of the file.
|
||||
|
||||
Some help in building a Win32 DLL of PCRE in GnuWin32 environments was
|
||||
contributed by Paul Sokolovsky. These environments are Mingw32
|
||||
(http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and CygWin
|
||||
(http://sourceware.cygnus.com/cygwin/). Paul comments:
|
||||
|
||||
For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get
|
||||
pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically
|
||||
linked pgrep and pcretest. If you have /bin/sh, run RunTest (the three
main tests pass; the locale test is not supported).
|
||||
|
||||
Changes to support MinGW builds with autoconf 2.50 were supplied by Fred Cox
|
||||
<sailorFred@yahoo.com>, who comments as follows:
|
||||
|
||||
If you are using the PCRE DLL, the normal Unix style configure && make &&
|
||||
make check && make install should just work[*]. If you want to statically
|
||||
link against the .a file, you must define PCRE_STATIC before including
|
||||
pcre.h, otherwise the pcre_malloc and pcre_free exported functions will be
|
||||
declared __declspec(dllimport), with hilarious results. See the configure.in
|
||||
and pcretest.c for how it is done for the static test.
|
||||
|
||||
Also, there will only be a libpcre.la, not a libpcreposix.la, as you
|
||||
would expect from the Unix version. The single DLL includes the pcreposix
|
||||
interface.
|
||||
|
||||
[*] But note that the supplied test files are in Unix format, with just LF
|
||||
characters as line terminators. You will have to edit them to change to CR LF
|
||||
terminators.
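If your tools need CR LF terminators, the conversion itself is simple. The C function below is a minimal sketch and is not part of the PCRE distribution. The name lf_to_crlf() is hypothetical. It copies a buffer and inserts a CR before every LF that is not already preceded by one, so data that is already in CR LF form passes through unchanged.

```c
#include <stddef.h>

/* Hypothetical helper: copy n bytes from src to dst, turning every bare LF
   into CR LF. dst must have room for 2 * n bytes. Returns the number of
   bytes written to dst. */
size_t lf_to_crlf(const char *src, size_t n, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        /* Add a CR only if this LF does not already follow one. */
        if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```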
|
||||
|
||||
A script for building PCRE using Borland's C++ compiler for use with VPASCAL
|
||||
was contributed by Alexander Tokarev. It is called makevp.bat.
|
||||
|
||||
These are some further comments about Win32 builds from Mark Evans. They
|
||||
were contributed before Fred Cox's changes were made, so it is possible that
|
||||
they may no longer be relevant.
|
||||
|
||||
"The documentation for Win32 builds is a bit shy. Under MSVC6 I
|
||||
followed their instructions to the letter, but there were still
|
||||
some things missing.
|
||||
|
||||
(1) Must #define STATIC for entire project if linking statically.
|
||||
(I see no reason to use DLLs for code this compact.) This of
|
||||
course is a project setting in MSVC under Preprocessor.
|
||||
|
||||
(2) Missing some #ifdefs relating to the function pointers
|
||||
pcre_malloc and pcre_free. See my solution below. (The stubs
|
||||
may not be mandatory but they made me feel better.)"
|
||||
|
||||
=========================
|
||||
#include <stdlib.h>   /* malloc() and free(), used by both branches */

#ifdef _WIN32

/* Local wrappers around the C library allocator. As noted above, these
   may not be mandatory. */
void* malloc_stub(size_t N)
{ return malloc(N); }

void free_stub(void* p)
{ free(p); }

void *(*pcre_malloc)(size_t) = &malloc_stub;
void (*pcre_free)(void *) = &free_stub;

#else

void *(*pcre_malloc)(size_t) = malloc;
void (*pcre_free)(void *) = free;

#endif
|
||||
=========================
|
||||
|
||||
|
||||
BUILDING PCRE ON OPENVMS
|
||||
|
||||
Dan Mooney sent the following comments about building PCRE on OpenVMS. They
|
||||
relate to an older version of PCRE that used fewer source files, so the exact
|
||||
commands will need changing. See the current list of source files above.
|
||||
|
||||
"It was quite easy to compile and link the library. I don't have a formal
|
||||
make file but the attached file [reproduced below] contains the OpenVMS DCL
|
||||
commands I used to build the library. I had to add #define
|
||||
POSIX_MALLOC_THRESHOLD 10 to pcre.h since it was not defined anywhere.
|
||||
|
||||
The library was built on:
|
||||
O/S: HP OpenVMS v7.3-1
|
||||
Compiler: Compaq C v6.5-001-48BCD
|
||||
Linker: vA13-01
|
||||
|
||||
The test results did not match 100% due to the issues you mention in your
|
||||
documentation regarding isprint(), iscntrl(), isgraph() and ispunct(). I
|
||||
modified some of the character tables temporarily and was able to get the
|
||||
results to match. Tests using the fr locale did not match since I don't have
|
||||
that locale loaded. The study size was always reported to be 3 less than the
|
||||
value in the standard test output files."
|
||||
|
||||
=========================
|
||||
$! This DCL procedure builds PCRE on OpenVMS
|
||||
$!
|
||||
$! I followed the instructions in the non-unix-use file in the distribution.
|
||||
$!
|
||||
$ COMPILE == "CC/LIST/NOMEMBER_ALIGNMENT/PREFIX_LIBRARY_ENTRIES=ALL_ENTRIES"
|
||||
$ COMPILE DFTABLES.C
|
||||
$ LINK/EXE=DFTABLES.EXE DFTABLES.OBJ
|
||||
$ RUN DFTABLES.EXE/OUTPUT=CHARTABLES.C
|
||||
$ COMPILE MAKETABLES.C
|
||||
$ COMPILE GET.C
|
||||
$ COMPILE STUDY.C
|
||||
$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
|
||||
$! did not seem to be defined anywhere.
|
||||
$! I edited pcre.h and added #DEFINE SUPPORT_UTF8 to enable UTF8 support.
|
||||
$ COMPILE PCRE.C
|
||||
$ LIB/CREATE PCRE MAKETABLES.OBJ, GET.OBJ, STUDY.OBJ, PCRE.OBJ
|
||||
$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
|
||||
$! did not seem to be defined anywhere.
|
||||
$ COMPILE PCREPOSIX.C
|
||||
$ LIB/CREATE PCREPOSIX PCREPOSIX.OBJ
|
||||
$ COMPILE PCRETEST.C
|
||||
$ LINK/EXE=PCRETEST.EXE PCRETEST.OBJ, PCRE/LIB, PCREPOSIX/LIB
|
||||
$! C programs that want access to command line arguments must be
|
||||
$! defined as a symbol
|
||||
$ PCRETEST :== "$ SYS$ROADSUSERS:[DMOONEY.REGEXP]PCRETEST.EXE"
|
||||
$! Arguments must be enclosed in quotes.
|
||||
$ PCRETEST "-C"
|
||||
$! Test results:
|
||||
$!
|
||||
$! The test results did not match 100%. The functions isprint(), iscntrl(),
|
||||
$! isgraph() and ispunct() on OpenVMS must not produce the same results
|
||||
$! as the system that built the test output files provided with the
|
||||
$! distribution.
|
||||
$!
|
||||
$! The study size did not match and was always 3 less on OpenVMS.
|
||||
$!
|
||||
$! Locale could not be set to fr
|
||||
$!
|
||||
=========================
|
||||
|
||||
****
|
|
@ -0,0 +1,528 @@
|
|||
README file for PCRE (Perl-compatible regular expression library)
|
||||
-----------------------------------------------------------------
|
||||
|
||||
The latest release of PCRE is always available from
|
||||
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
|
||||
|
||||
Please read the NEWS file if you are upgrading from a previous release.
|
||||
|
||||
|
||||
The PCRE APIs
|
||||
-------------
|
||||
|
||||
PCRE is written in C, and it has its own API. The distribution now includes a
|
||||
set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
|
||||
for details).
|
||||
|
||||
Also included is a set of C wrapper functions that are based on the POSIX
API. These end up in the library called libpcreposix. Note that this just
|
||||
provides a POSIX calling interface to PCRE: the regular expressions themselves
|
||||
still follow Perl syntax and semantics. The header file for the POSIX-style
|
||||
functions is called pcreposix.h. The official POSIX name is regex.h, but I
|
||||
didn't want to risk possible problems with existing files of that name by
|
||||
distributing it that way. To use it with an existing program that uses the
|
||||
POSIX API, it will have to be renamed or pointed at by a link.
|
||||
|
||||
If you are using the POSIX interface to PCRE and there is already a POSIX regex
|
||||
library installed on your system, you must take care when linking programs to
|
||||
ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
|
||||
up the "real" POSIX functions of the same name.
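For illustration, this is roughly what a caller of the POSIX-style API looks like. It is a sketch, compiled here against the system <regex.h> so that it stands alone. The helper name first_match_offset() is hypothetical. To use PCRE's wrapper instead, you would include pcreposix.h and link with libpcreposix and libpcre, taking the care described above so that the system library's functions are not picked up. The pattern b[0-9] is chosen because it means the same thing in POSIX basic syntax and in Perl syntax.

```c
#include <regex.h>

/* Hypothetical helper: returns the offset of the first match of pattern in
   subject, -1 if there is no match, or -2 if the pattern does not compile. */
long first_match_offset(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t match[1];
    long result;

    if (regcomp(&re, pattern, 0) != 0)
        return -2;

    if (regexec(&re, subject, 1, match, 0) == 0)
        result = (long)match[0].rm_so;   /* start of the whole match */
    else
        result = -1;

    regfree(&re);                        /* release the compiled pattern */
    return result;
}
```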
|
||||
|
||||
|
||||
Documentation for PCRE
|
||||
----------------------
|
||||
|
||||
If you install PCRE in the normal way, you will end up with an installed set of
|
||||
man pages whose names all start with "pcre". The one that is just called "pcre"
|
||||
lists all the others. In addition to these man pages, the PCRE documentation is
|
||||
supplied in two other forms; however, as there is no standard place to install
|
||||
them, they are left in the doc directory of the unpacked source distribution.
|
||||
These forms are:
|
||||
|
||||
1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
|
||||
first of these is a concatenation of the text forms of all the section 3
|
||||
man pages except those that summarize individual functions. The other two
|
||||
are the text forms of the section 1 man pages for the pcregrep and
|
||||
pcretest commands. Text forms are provided for ease of scanning with text
|
||||
editors or similar tools.
|
||||
|
||||
2. A subdirectory called doc/html contains all the documentation in HTML
|
||||
form, hyperlinked in various ways, and rooted in a file called
|
||||
doc/index.html.
|
||||
|
||||
|
||||
Contributions by users of PCRE
|
||||
------------------------------
|
||||
|
||||
You can find contributions from PCRE users in the directory
|
||||
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
|
||||
|
||||
where there is also a README file giving brief descriptions of what they are.
|
||||
Several of them provide support for compiling PCRE on various flavours of
|
||||
Windows systems (I myself do not use Windows). Some are complete in themselves;
|
||||
others are pointers to URLs containing relevant files.
|
||||
|
||||
|
||||
Building PCRE on a Unix-like system
|
||||
-----------------------------------
|
||||
|
||||
If you are using HP's ANSI C++ compiler (aCC), please see the special note
|
||||
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
|
||||
|
||||
To build PCRE on a Unix-like system, first run the "configure" command from the
|
||||
PCRE distribution directory, with your current directory set to the directory
|
||||
where you want the files to be created. This command is a standard GNU
|
||||
"autoconf" configuration script, for which generic instructions are supplied in
|
||||
INSTALL.
|
||||
|
||||
Most commonly, people build PCRE within its own distribution directory, and in
|
||||
this case, on many systems, just running "./configure" is sufficient, but the
|
||||
usual methods of changing standard defaults are available. For example:
|
||||
|
||||
CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
|
||||
|
||||
specifies that the C compiler should be run with the flags '-O2 -Wall' instead
|
||||
of the default, and that "make install" should install PCRE under /opt/local
|
||||
instead of the default /usr/local.
|
||||
|
||||
If you want to build in a different directory, just run "configure" with that
|
||||
directory as current. For example, suppose you have unpacked the PCRE source
|
||||
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
|
||||
|
||||
cd /build/pcre/pcre-xxx
|
||||
/source/pcre/pcre-xxx/configure
|
||||
|
||||
PCRE is written in C and is normally compiled as a C library. However, it is
|
||||
possible to build it as a C++ library, though the provided building apparatus
|
||||
does not have any features to support this.
|
||||
|
||||
There are some optional features that can be included or omitted from the PCRE
|
||||
library. You can read more about them in the pcrebuild man page.
|
||||
|
||||
. If you want to suppress the building of the C++ wrapper library, you can add
--disable-cpp to the "configure" command. Otherwise, when "configure" is run,
it will try to find a C++ compiler and C++ header files, and if it succeeds,
it will try to build the C++ wrapper.
|
||||
|
||||
. If you want to make use of the support for UTF-8 character strings in PCRE,
|
||||
you must add --enable-utf8 to the "configure" command. Without it, the code
|
||||
for handling UTF-8 is not included in the library. (Even when included, it
|
||||
still has to be enabled by an option at run time.)
|
||||
|
||||
. If, in addition to support for UTF-8 character strings, you want to include
|
||||
support for the \P, \p, and \X sequences that recognize Unicode character
|
||||
properties, you must add --enable-unicode-properties to the "configure"
|
||||
command. This adds about 30K to the size of the library (in the form of a
|
||||
property table); only the basic two-letter properties such as Lu are
|
||||
supported.
|
||||
|
||||
. You can build PCRE to recognize either CR or LF or the sequence CRLF as
|
||||
indicating the end of a line. Whatever you specify at build time is the
|
||||
default; the caller of PCRE can change the selection at run time. The default
|
||||
newline indicator is a single LF character (the Unix standard). You can
specify the default newline indicator by adding --newline-is-cr,
--newline-is-lf, or --newline-is-crlf to the "configure" command.
|
||||
|
||||
. When called via the POSIX interface, PCRE uses malloc() to get additional
|
||||
storage for processing capturing parentheses if there are more than 10 of
|
||||
them. You can increase this threshold by setting, for example,
|
||||
|
||||
--with-posix-malloc-threshold=20
|
||||
|
||||
on the "configure" command.
|
||||
|
||||
. PCRE has a counter that can be set to limit the amount of resources it uses.
|
||||
If the limit is exceeded during a match, the match fails. The default is ten
|
||||
million. You can change the default by setting, for example,
|
||||
|
||||
--with-match-limit=500000
|
||||
|
||||
on the "configure" command. This is just the default; individual calls to
|
||||
pcre_exec() can supply their own value. There is more discussion in the
pcreapi man page.
|
||||
|
||||
. There is a separate counter that limits the depth of recursive function calls
|
||||
during a matching process. This also has a default of ten million, which is
|
||||
essentially "unlimited". You can change the default by setting, for example,
|
||||
|
||||
--with-match-limit-recursion=500000
|
||||
|
||||
Recursive function calls use up the runtime stack; running out of stack can
|
||||
cause programs to crash in strange ways. There is a discussion about stack
|
||||
sizes in the pcrestack man page.
|
||||
|
||||
. The default maximum compiled pattern size is around 64K. You can increase
|
||||
this by adding --with-link-size=3 to the "configure" command. You can
|
||||
increase it even more by setting --with-link-size=4, but this is unlikely
|
||||
ever to be necessary. If you build PCRE with an increased link size, test 2
|
||||
(and 5 if you are using UTF-8) will fail. Part of the output of these tests
|
||||
is a representation of the compiled pattern, and this changes with the link
|
||||
size.
|
||||
|
||||
. You can build PCRE so that its internal match() function that is called from
|
||||
pcre_exec() does not call itself recursively. Instead, it uses blocks of data
|
||||
from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
|
||||
to save data that would otherwise be saved on the stack. To build PCRE like
|
||||
this, use
|
||||
|
||||
--disable-stack-for-recursion
|
||||
|
||||
on the "configure" command. PCRE runs more slowly in this mode, but it may be
|
||||
necessary in environments with limited stack sizes. This applies only to the
|
||||
pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
|
||||
use deeply nested recursion.
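The match limit and the recursion limit described above can be pictured with a toy backtracking matcher. The C code below is only a sketch of the idea, not PCRE's match() function. It supports just literals, '.', '*', '^' and '$', and the names limited_match(), MATCH_LIMIT and MATCH_RECURSION are hypothetical. One counter caps the total number of calls (like the match limit); the other caps the nesting depth (like the recursion limit). Exceeding either makes the match fail with an error code instead of running on.

```c
/* Hypothetical result codes for this sketch. */
enum { MATCH_NO = 0, MATCH_YES = 1, MATCH_LIMIT = -1, MATCH_RECURSION = -2 };

struct limits {
    unsigned long calls;      /* calls made so far              */
    unsigned long max_calls;  /* compare the match limit        */
    unsigned depth;           /* current nesting of match_here  */
    unsigned max_depth;       /* compare the recursion limit    */
};

static int match_here(const char *re, const char *text, struct limits *lim);

/* Match zero or more c (shortest first), then the rest of the pattern. */
static int match_star(int c, const char *re, const char *text,
                      struct limits *lim)
{
    do {
        int r = match_here(re, text, lim);
        if (r != MATCH_NO)
            return r;                  /* success or a limit error */
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return MATCH_NO;
}

/* Match re at the start of text, charging one call and one level of depth. */
static int match_here(const char *re, const char *text, struct limits *lim)
{
    int r;

    if (++lim->calls > lim->max_calls)
        return MATCH_LIMIT;
    if (lim->depth >= lim->max_depth)
        return MATCH_RECURSION;
    lim->depth++;

    if (re[0] == '\0')
        r = MATCH_YES;
    else if (re[1] == '*')
        r = match_star(re[0], re + 2, text, lim);
    else if (re[0] == '$' && re[1] == '\0')
        r = (*text == '\0') ? MATCH_YES : MATCH_NO;
    else if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        r = match_here(re + 1, text + 1, lim);
    else
        r = MATCH_NO;

    lim->depth--;
    return r;
}

/* Search for re anywhere in text within the given limits. */
int limited_match(const char *re, const char *text,
                  unsigned long max_calls, unsigned max_depth)
{
    struct limits lim = { 0, max_calls, 0, max_depth };

    if (re[0] == '^')
        return match_here(re + 1, text, &lim);
    do {
        int r = match_here(re, text, &lim);
        if (r != MATCH_NO)
            return r;
    } while (*text++ != '\0');
    return MATCH_NO;
}
```

With max_calls set to 1000, a pattern such as a*a*a*a*b against a run of twenty a's fails with MATCH_LIMIT, because the matcher has to try thousands of ways of dividing the a's among the four repeats before it can give up.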
|
||||
|
||||
The "configure" script builds the following files for the basic C library:
|
||||
|
||||
. Makefile is the makefile that builds the library
|
||||
. config.h contains build-time configuration options for the library
|
||||
. pcre-config is a script that shows the settings of "configure" options
|
||||
. libpcre.pc is data for the pkg-config command
|
||||
. libtool is a script that builds shared and/or static libraries
|
||||
. RunTest is a script for running tests on the library
|
||||
. RunGrepTest is a script for running tests on the pcregrep command
|
||||
|
||||
In addition, if a C++ compiler is found, the following are also built:
|
||||
|
||||
. pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
|
||||
. pcre_stringpiece.h is the header for the C++ "stringpiece" functions
|
||||
|
||||
The "configure" script also creates config.status, which is an executable
|
||||
script that can be run to recreate the configuration, and config.log, which
|
||||
contains compiler output from tests that "configure" runs.
|
||||
|
||||
Once "configure" has run, you can run "make". It builds two libraries, called
|
||||
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
|
||||
command. If a C++ compiler was found on your system, it also builds the C++
|
||||
wrapper library, which is called libpcrecpp, and some test programs called
|
||||
pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
|
||||
|
||||
The command "make test" runs all the appropriate tests. Details of the PCRE
|
||||
tests are given in a separate section of this document, below.
|
||||
|
||||
You can use "make install" to copy the libraries, the public header files
|
||||
pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
|
||||
the C++ wrapper was built), and the man pages to appropriate live directories
|
||||
on your system, in the normal way.
|
||||
|
||||
If you want to remove PCRE from your system, you can run "make uninstall".
|
||||
This removes all the files that "make install" installed. However, it does not
|
||||
remove any directories, because these are often shared with other programs.
|
||||
|
||||
|
||||
Retrieving configuration information on Unix-like systems
|
||||
---------------------------------------------------------
|
||||
|
||||
Running "make install" also installs the command pcre-config, which can be used
|
||||
to recall information about the PCRE configuration and installation. For
|
||||
example:
|
||||
|
||||
pcre-config --version
|
||||
|
||||
prints the version number, and
|
||||
|
||||
pcre-config --libs
|
||||
|
||||
outputs information about where the library is installed. This command can be
|
||||
included in makefiles for programs that use PCRE, saving the programmer from
|
||||
having to remember too many details.
|
||||
|
||||
The pkg-config command is another system for saving and retrieving information
|
||||
about installed libraries. Instead of separate commands for each library, a
|
||||
single command is used. For example:
|
||||
|
||||
pkg-config --cflags pcre
|
||||
|
||||
The data is held in *.pc files that are installed in a directory called
|
||||
pkgconfig.
|
||||
|
||||
|
||||
Shared libraries on Unix-like systems
|
||||
-------------------------------------
|
||||
|
||||
The default distribution builds PCRE as shared libraries and static libraries,
|
||||
as long as the operating system supports shared libraries. Shared library
|
||||
support relies on the "libtool" script which is built as part of the
|
||||
"configure" process.
|
||||
|
||||
The libtool script is used to compile and link both shared and static
|
||||
libraries. They are placed in a subdirectory called .libs when they are newly
|
||||
built. The programs pcretest and pcregrep are built to use these uninstalled
|
||||
libraries (by means of wrapper scripts in the case of shared libraries). When
|
||||
you use "make install" to install shared libraries, pcregrep and pcretest are
|
||||
automatically re-built to use the newly installed shared libraries before being
|
||||
installed themselves. However, the versions left in the source directory still
|
||||
use the uninstalled libraries.
|
||||
|
||||
To build PCRE using static libraries only you must use --disable-shared when
|
||||
configuring it. For example:
|
||||
|
||||
./configure --prefix=/usr/gnu --disable-shared
|
||||
|
||||
Then run "make" in the usual way. Similarly, you can use --disable-static to
|
||||
build only shared libraries.
|
||||
|
||||
|
||||
Cross-compiling on a Unix-like system
|
||||
-------------------------------------
|
||||
|
||||
You can specify CC and CFLAGS in the normal way to the "configure" command, in
|
||||
order to cross-compile PCRE for some other host. However, during the building
|
||||
process, the dftables.c source file is compiled *and run* on the local host, in
|
||||
order to generate the default character tables (the chartables.c file). It
|
||||
therefore needs to be compiled with the local compiler, not the cross compiler.
|
||||
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
|
||||
there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
|
||||
when calling the "configure" command. If they are not specified, they default
|
||||
to the values of CC and CFLAGS.
|
||||
|
||||
|
||||
Using HP's ANSI C++ compiler (aCC)
|
||||
----------------------------------
|
||||
|
||||
Unless C++ support is disabled by specifying the "--disable-cpp" option of the
|
||||
"configure" script, you *must* include the "-AA" option in the CXXFLAGS
|
||||
environment variable in order for the C++ components to compile correctly.
|
||||
|
||||
Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
|
||||
needed libraries fail to get included when specifying the "-AA" compiler
|
||||
option. If you experience unresolved symbols when linking the C++ programs,
|
||||
use the workaround of specifying the following environment variable prior to
|
||||
running the "configure" script:
|
||||
|
||||
CXXLDFLAGS="-lstd_v2 -lCsup_v2"
|
||||
|
||||
|
||||
Building on non-Unix systems
|
||||
----------------------------
|
||||
|
||||
For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
|
||||
the system supports the use of "configure" and "make" you may be able to build
|
||||
PCRE in the same way as for Unix systems.
|
||||
|
||||
PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
|
||||
the details because I don't use those systems. It should be straightforward to
|
||||
build PCRE on any system that has a Standard C compiler, because it uses only
|
||||
Standard C functions.
|
||||
|
||||
|
||||
Testing PCRE
|
||||
------------
|
||||
|
||||
To test PCRE on a Unix system, run the RunTest script that is created by the
|
||||
configuring process. There is also a script called RunGrepTest that tests the
|
||||
options of the pcregrep command. If the C++ wrapper library is built, three
|
||||
test programs called pcrecpp_unittest, pcre_scanner_unittest, and
|
||||
pcre_stringpiece_unittest are provided.
|
||||
|
||||
Both the scripts and all the program tests are run if you obey "make runtest",
|
||||
"make check", or "make test". For other systems, see the instructions in
|
||||
NON-UNIX-USE.
|
||||
|
||||
The RunTest script runs the pcretest test program (which is documented in its
|
||||
own man page) on each of the testinput files (in the testdata directory) in
|
||||
turn, and compares the output with the contents of the corresponding testoutput
|
||||
file. A file called testtry is used to hold the main output from pcretest
|
||||
(testsavedregex is also used as a working file). To run pcretest on just one of
|
||||
the test files, give its number as an argument to RunTest, for example:
|
||||
|
||||
RunTest 2
|
||||
|
||||
The first file can also be fed directly into the perltest script to check that
|
||||
Perl gives the same results. The only difference you should see is in the first
|
||||
few lines, where the Perl version is given instead of the PCRE version.
|
||||
|
||||
The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
|
||||
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
|
||||
detection, and run-time flags that are specific to PCRE, as well as the POSIX
|
||||
wrapper API. It also uses the debugging flag to check some of the internals of
|
||||
pcre_compile().
|
||||
|
||||
If you build PCRE with a locale setting that is not the standard C locale, the
|
||||
character tables may be different (see next paragraph). In some cases, this may
|
||||
cause failures in the second set of tests. For example, in a locale where the
|
||||
isprint() function yields TRUE for characters in the range 128-255, the use of
|
||||
[:ascii:] inside a character class defines a different set of characters, and
|
||||
this shows up in this test as a difference in the compiled code, which is being
|
||||
listed for checking. Where the comparison test output contains [\x00-\x7f] the
|
||||
test will contain [\x00-\xff], and similarly in some other cases. This is not a
|
||||
bug in PCRE.
|
||||
|
||||
The third set of tests checks pcre_maketables(), the facility for building a
|
||||
set of character tables for a specific locale and using them instead of the
|
||||
default tables. The tests make use of the "fr_FR" (French) locale. Before
|
||||
running the test, the script checks for the presence of this locale by running
|
||||
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
|
||||
in the list of available locales, the third test cannot be run, and a comment
|
||||
is output to say why. If running this test produces instances of the error
|
||||
|
||||
** Failed to set locale "fr_FR"
|
||||
|
||||
in the comparison output, it means that locale is not available on your system,
|
||||
despite being listed by "locale". This does not mean that PCRE is broken.
|
||||
|
||||
The fourth test checks the UTF-8 support. It is not run automatically unless
|
||||
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
|
||||
running "configure". This file can be also fed directly to the perltest script,
|
||||
provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
|
||||
commented in the script, can be used.)
|
||||
|
||||
The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
|
||||
features of PCRE that are not relevant to Perl.
|
||||
|
||||
The sixth test checks the support for Unicode character properties. It is
not run automatically unless PCRE is built with Unicode property support. To do
this you must set --enable-unicode-properties when running "configure".
|
||||
|
||||
The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
|
||||
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
|
||||
property support, respectively. The eighth and ninth tests are not run
|
||||
automatically unless PCRE is built with the relevant support.
|
||||
|
||||
|
||||
Character tables
|
||||
----------------
|
||||
|
||||
PCRE uses four tables for manipulating and identifying characters whose values
|
||||
are less than 256. The final argument of the pcre_compile() function is a
|
||||
pointer to a block of memory containing the concatenated tables. A call to
|
||||
pcre_maketables() can be used to generate a set of tables in the current
|
||||
locale. If the final argument for pcre_compile() is passed as NULL, a set of
|
||||
default tables that is built into the binary is used.
|
||||
|
||||
The source file called chartables.c contains the default set of tables. This is
|
||||
not supplied in the distribution, but is built by the program dftables
|
||||
(compiled from dftables.c), which uses the ANSI C character handling functions
|
||||
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
|
||||
sources. This means that the default C locale which is set for your system will
|
||||
control the contents of these default tables. You can change the default tables
|
||||
by editing chartables.c and then re-building PCRE. If you do this, you should
|
||||
probably also edit Makefile to ensure that the file doesn't ever get
|
||||
re-generated.
|
||||
|
||||
The first two 256-byte tables provide lower casing and case flipping functions,
|
||||
respectively. The next table consists of three 32-byte bit maps which identify
|
||||
digits, "word" characters, and white space, respectively. These are used when
|
||||
building 32-byte bit maps that represent character classes.
|
||||
|
||||
The final 256-byte table has bits indicating various character types, as
|
||||
follows:
|
||||
|
||||
1 white space character
|
||||
2 letter
|
||||
4 decimal digit
|
||||
8 hexadecimal digit
|
||||
16 alphanumeric or '_'
|
||||
128 regular expression metacharacter or binary zero
|
||||
|
||||
You should not alter the set of characters that have the 128 bit set, as that
will cause PCRE to malfunction.
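The table layout described in this section can be sketched in C. The code below is an illustration under stated assumptions, not the code of dftables or pcre_maketables(). The struct name char_tables, the function build_tables(), the CT_ bit names, and the list of metacharacters are hypothetical; only the bit values and the order of the tables come from the description above. Because it uses the <ctype.h> functions, its contents depend on the current locale, just as the default tables do.

```c
#include <ctype.h>
#include <string.h>

/* Bit values taken from the table above; the names are hypothetical. */
#define CT_SPACE   0x01   /* white space character */
#define CT_LETTER  0x02   /* letter */
#define CT_DIGIT   0x04   /* decimal digit */
#define CT_XDIGIT  0x08   /* hexadecimal digit */
#define CT_WORD    0x10   /* alphanumeric or '_' */
#define CT_META    0x80   /* regex metacharacter or binary zero */

/* Assumed metacharacter list, for illustration only. */
#define META_CHARS "\\*+?{^.$|()["

struct char_tables {
    unsigned char lower[256];    /* lower casing */
    unsigned char flip[256];     /* case flipping */
    unsigned char cbits[3][32];  /* bit maps: digits, word chars, space */
    unsigned char ctypes[256];   /* character type bits */
};

/* Set the bit for character c in a 32-byte (256-bit) map. */
static void set_bit(unsigned char *map, int c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Fill the tables using the <ctype.h> functions of the current locale. */
void build_tables(struct char_tables *t)
{
    memset(t, 0, sizeof *t);
    for (int c = 0; c < 256; c++) {
        unsigned char type = 0;

        t->lower[c] = (unsigned char)tolower(c);
        t->flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));

        if (isdigit(c)) set_bit(t->cbits[0], c);
        if (isalnum(c) || c == '_') set_bit(t->cbits[1], c);
        if (isspace(c)) set_bit(t->cbits[2], c);

        if (isspace(c)) type |= CT_SPACE;
        if (isalpha(c)) type |= CT_LETTER;
        if (isdigit(c)) type |= CT_DIGIT;
        if (isxdigit(c)) type |= CT_XDIGIT;
        if (isalnum(c) || c == '_') type |= CT_WORD;
        if (c == 0 || strchr(META_CHARS, c) != NULL) type |= CT_META;
        t->ctypes[c] = type;
    }
}
```

In the standard C locale, the entry for '7' has the digit, hexadecimal digit, and word bits set (4 + 8 + 16), and the entries from 128 to 255 have no bits set.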
|
||||
|
||||
|
||||
Manifest
|
||||
--------
|
||||
|
||||
The distribution should contain the following files:
|
||||
|
||||
(A) The actual source files of the PCRE library functions and their
|
||||
headers:
|
||||
|
||||
dftables.c auxiliary program for building chartables.c
|
||||
|
||||
pcreposix.c )
|
||||
pcre_compile.c )
|
||||
pcre_config.c )
|
||||
pcre_dfa_exec.c )
|
||||
pcre_exec.c )
|
||||
pcre_fullinfo.c )
|
||||
pcre_get.c ) sources for the functions in the library,
|
||||
pcre_globals.c ) and some internal functions that they use
|
||||
pcre_info.c )
|
||||
pcre_maketables.c )
|
||||
pcre_ord2utf8.c )
|
||||
pcre_refcount.c )
|
||||
pcre_study.c )
|
||||
pcre_tables.c )
|
||||
pcre_try_flipped.c )
|
||||
pcre_ucp_searchfuncs.c)
|
||||
pcre_valid_utf8.c )
|
||||
pcre_version.c )
|
||||
pcre_xclass.c )
|
||||
ucptable.c )
|
||||
|
pcre_printint.src           ) debugging function that is #included in pcretest, and
                            ) can also be #included in pcre_compile()

pcre.h                      the public PCRE header file
pcreposix.h                 header for the external POSIX wrapper API
pcre_internal.h             header for internal use
ucp.h                       ) headers concerned with
ucpinternal.h               ) Unicode property handling
config.in                   template for config.h, which is built by configure

pcrecpp.h                   the header file for the C++ wrapper
pcrecpparg.h.in             "source" for another C++ header file
pcrecpp.cc                  )
pcre_scanner.cc             ) source for the C++ wrapper library

pcre_stringpiece.h.in       "source" for pcre_stringpiece.h, the header for the
                            C++ stringpiece functions
pcre_stringpiece.cc         source for the C++ stringpiece functions

(B) Auxiliary files:

AUTHORS                     information about the author of PCRE
ChangeLog                   log of changes to the code
INSTALL                     generic installation instructions
LICENCE                     conditions for the use of PCRE
COPYING                     the same, using GNU's standard name
Makefile.in                 template for Unix Makefile, which is built by configure
NEWS                        important changes in this release
NON-UNIX-USE                notes on building PCRE on non-Unix systems
README                      this file
RunTest.in                  template for a Unix shell script for running tests
RunGrepTest.in              template for a Unix shell script for pcregrep tests
config.guess                ) files used by libtool,
config.sub                  ) used only when building a shared library
config.h.in                 "source" for the config.h header file
configure                   a configuring shell script (built by autoconf)
configure.ac                the autoconf input used to build configure
doc/Tech.Notes              notes on the encoding
doc/*.3                     man page sources for the PCRE functions
doc/*.1                     man page sources for pcregrep and pcretest
doc/html/*                  HTML documentation
doc/pcre.txt                plain text version of the man pages
doc/pcretest.txt            plain text documentation of test program
doc/perltest.txt            plain text documentation of Perl test program
install-sh                  a shell script for installing files
libpcre.pc.in               "source" for libpcre.pc for pkg-config
ltmain.sh                   file used to build a libtool script
mkinstalldirs               script for making install directories
pcretest.c                  comprehensive test program
pcredemo.c                  simple demonstration of coding calls to PCRE
perltest                    Perl test program
pcregrep.c                  source of a grep utility that uses PCRE
pcre-config.in              source of script which retains PCRE information
pcrecpp_unittest.c          )
pcre_scanner_unittest.c     ) test programs for the C++ wrapper
pcre_stringpiece_unittest.c )
testdata/testinput*         test data for main library tests
testdata/testoutput*        expected test results
testdata/grep*              input and output for pcregrep tests

(C) Auxiliary files for Win32 DLL

libpcre.def
libpcreposix.def

(D) Auxiliary file for VPASCAL

makevp.bat

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
June 2006
@ -0,0 +1,208 @@
#! /bin/sh

# This file is generated by configure from RunGrepTest.in. Make any changes
# to that file.

echo "Testing pcregrep"
./pcregrep -V

# Run pcregrep tests. The assumption is that the PCRE tests check the library
# itself. What we are checking here is the file handling and options that are
# supported by pcregrep.

cf=diff
valgrind=
if [ ! -d testdata ] ; then
ln -s @top_srcdir@/testdata testdata
fi
testdata=./testdata

while [ $# -gt 0 ] ; do
case $1 in
valgrind) valgrind="valgrind -q --leak-check=no";;
*) echo "Unknown argument $1"; exit 1;;
esac
shift
done

echo "---------------------------- Test 1 ------------------------------" >testtry
$valgrind ./pcregrep PATTERN $testdata/grepinput >>testtry

echo "---------------------------- Test 2 ------------------------------" >>testtry
$valgrind ./pcregrep '^PATTERN' $testdata/grepinput >>testtry

echo "---------------------------- Test 3 ------------------------------" >>testtry
$valgrind ./pcregrep -in PATTERN $testdata/grepinput >>testtry

echo "---------------------------- Test 4 ------------------------------" >>testtry
$valgrind ./pcregrep -ic PATTERN $testdata/grepinput >>testtry

echo "---------------------------- Test 5 ------------------------------" >>testtry
$valgrind ./pcregrep -in PATTERN $testdata/grepinput $testdata/grepinputx >>testtry

echo "---------------------------- Test 6 ------------------------------" >>testtry
$valgrind ./pcregrep -inh PATTERN $testdata/grepinput $testdata/grepinputx >>testtry

echo "---------------------------- Test 7 ------------------------------" >>testtry
$valgrind ./pcregrep -il PATTERN $testdata/grepinput $testdata/grepinputx >>testtry

echo "---------------------------- Test 8 ------------------------------" >>testtry
$valgrind ./pcregrep -l PATTERN $testdata/grepinput $testdata/grepinputx >>testtry

echo "---------------------------- Test 9 ------------------------------" >>testtry
$valgrind ./pcregrep -q PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
echo "RC=$?" >>testtry

echo "---------------------------- Test 10 -----------------------------" >>testtry
$valgrind ./pcregrep -q NEVER-PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
echo "RC=$?" >>testtry

echo "---------------------------- Test 11 -----------------------------" >>testtry
$valgrind ./pcregrep -vn pattern $testdata/grepinputx >>testtry

echo "---------------------------- Test 12 -----------------------------" >>testtry
$valgrind ./pcregrep -ix pattern $testdata/grepinputx >>testtry

echo "---------------------------- Test 13 -----------------------------" >>testtry
$valgrind ./pcregrep -f$testdata/greplist $testdata/grepinputx >>testtry

echo "---------------------------- Test 14 -----------------------------" >>testtry
$valgrind ./pcregrep -w pat $testdata/grepinput $testdata/grepinputx >>testtry

echo "---------------------------- Test 15 -----------------------------" >>testtry
$valgrind ./pcregrep 'abc^*' $testdata/grepinput 2>>testtry >>testtry

echo "---------------------------- Test 16 -----------------------------" >>testtry
$valgrind ./pcregrep abc $testdata/grepinput $testdata/nonexistfile 2>>testtry >>testtry

echo "---------------------------- Test 17 -----------------------------" >>testtry
$valgrind ./pcregrep -M 'the\noutput' $testdata/grepinput >>testtry

echo "---------------------------- Test 18 -----------------------------" >>testtry
$valgrind ./pcregrep -Mn '(the\noutput|dog\.\n--)' $testdata/grepinput >>testtry

echo "---------------------------- Test 19 -----------------------------" >>testtry
$valgrind ./pcregrep -Mix 'Pattern' $testdata/grepinputx >>testtry

echo "---------------------------- Test 20 -----------------------------" >>testtry
$valgrind ./pcregrep -Mixn 'complete pair\nof lines' $testdata/grepinputx >>testtry

echo "---------------------------- Test 21 -----------------------------" >>testtry
$valgrind ./pcregrep -nA3 'four' $testdata/grepinputx >>testtry

echo "---------------------------- Test 22 -----------------------------" >>testtry
$valgrind ./pcregrep -nB3 'four' $testdata/grepinputx >>testtry

echo "---------------------------- Test 23 -----------------------------" >>testtry
$valgrind ./pcregrep -C3 'four' $testdata/grepinputx >>testtry

echo "---------------------------- Test 24 -----------------------------" >>testtry
$valgrind ./pcregrep -A9 'four' $testdata/grepinputx >>testtry

echo "---------------------------- Test 25 -----------------------------" >>testtry
$valgrind ./pcregrep -nB9 'four' $testdata/grepinputx >>testtry

echo "---------------------------- Test 26 -----------------------------" >>testtry
$valgrind ./pcregrep -A9 -B9 'four' $testdata/grepinputx >>testtry

echo "---------------------------- Test 27 -----------------------------" >>testtry
$valgrind ./pcregrep -A10 'four' $testdata/grepinputx >>testtry

echo "---------------------------- Test 28 -----------------------------" >>testtry
$valgrind ./pcregrep -nB10 'four' $testdata/grepinputx >>testtry

echo "---------------------------- Test 29 -----------------------------" >>testtry
$valgrind ./pcregrep -C12 -B10 'four' $testdata/grepinputx >>testtry

echo "---------------------------- Test 30 -----------------------------" >>testtry
$valgrind ./pcregrep -inB3 'pattern' $testdata/grepinput $testdata/grepinputx >>testtry

echo "---------------------------- Test 31 -----------------------------" >>testtry
$valgrind ./pcregrep -inA3 'pattern' $testdata/grepinput $testdata/grepinputx >>testtry

echo "---------------------------- Test 32 -----------------------------" >>testtry
$valgrind ./pcregrep -L 'fox' $testdata/grepinput $testdata/grepinputx >>testtry

echo "---------------------------- Test 33 -----------------------------" >>testtry
$valgrind ./pcregrep 'fox' $testdata/grepnonexist >>testtry 2>&1
echo "RC=$?" >>testtry

echo "---------------------------- Test 34 -----------------------------" >>testtry
$valgrind ./pcregrep -s 'fox' $testdata/grepnonexist >>testtry 2>&1
echo "RC=$?" >>testtry

echo "---------------------------- Test 35 -----------------------------" >>testtry
$valgrind ./pcregrep -L -r --include=grepinputx 'fox' $testdata >>testtry
echo "RC=$?" >>testtry

echo "---------------------------- Test 36 -----------------------------" >>testtry
$valgrind ./pcregrep -L -r --include=grepinput --exclude 'grepinput$' 'fox' $testdata >>testtry
echo "RC=$?" >>testtry

echo "---------------------------- Test 37 -----------------------------" >>testtry
$valgrind ./pcregrep '^(a+)*\d' $testdata/grepinput >>testtry 2>teststderr
echo "RC=$?" >>testtry
echo "======== STDERR ========" >>testtry
cat teststderr >>testtry

echo "---------------------------- Test 38 ------------------------------" >>testtry
$valgrind ./pcregrep '>\x00<' $testdata/grepinput >>testtry

echo "---------------------------- Test 39 ------------------------------" >>testtry
$valgrind ./pcregrep -A1 'before the binary zero' $testdata/grepinput >>testtry

echo "---------------------------- Test 40 ------------------------------" >>testtry
$valgrind ./pcregrep -B1 'after the binary zero' $testdata/grepinput >>testtry

echo "---------------------------- Test 41 ------------------------------" >>testtry
$valgrind ./pcregrep -B1 -o '\w+ the binary zero' $testdata/grepinput >>testtry

echo "---------------------------- Test 41 ------------------------------" >>testtry
$valgrind ./pcregrep -B1 -onH '\w+ the binary zero' $testdata/grepinput >>testtry

echo "---------------------------- Test 42 ------------------------------" >>testtry
$valgrind ./pcregrep -on 'before|zero|after' $testdata/grepinput >>testtry

echo "---------------------------- Test 43 ------------------------------" >>testtry
$valgrind ./pcregrep -on -e before -e zero -e after $testdata/grepinput >>testtry

echo "---------------------------- Test 44 ------------------------------" >>testtry
$valgrind ./pcregrep -on -f $testdata/greplist -e binary $testdata/grepinput >>testtry

echo "---------------------------- Test 45 ------------------------------" >>testtry
$valgrind ./pcregrep -e abc -e '(unclosed' $testdata/grepinput 2>>testtry >>testtry

echo "---------------------------- Test 46 ------------------------------" >>testtry
$valgrind ./pcregrep -Fx "AB.VE
elephant" $testdata/grepinput >>testtry

echo "---------------------------- Test 47 ------------------------------" >>testtry
$valgrind ./pcregrep -F "AB.VE
elephant" $testdata/grepinput >>testtry

echo "---------------------------- Test 48 ------------------------------" >>testtry
$valgrind ./pcregrep -F -e DATA -e "AB.VE
elephant" $testdata/grepinput >>testtry

echo "---------------------------- Test 49 ------------------------------" >>testtry
$valgrind ./pcregrep "^(abc|def|ghi|jkl)" $testdata/grepinputx >>testtry

echo "---------------------------- Test 50 ------------------------------" >>testtry
$valgrind ./pcregrep -N CR "^(abc|def|ghi|jkl)" $testdata/grepinputx >>testtry

echo "---------------------------- Test 51 ------------------------------" >>testtry
$valgrind ./pcregrep --newline=crlf "^(abc|def|ghi|jkl)" $testdata/grepinputx >>testtry

echo "---------------------------- Test 52 ------------------------------" >>testtry
$valgrind ./pcregrep --newline=cr -F "def
jkl" $testdata/grepinputx >>testtry

echo "---------------------------- Test 53 ------------------------------" >>testtry
$valgrind ./pcregrep --newline=crlf -F "xxx
jkl" $testdata/grepinputx >>testtry

# Now compare the results.

$cf testtry $testdata/grepoutput
if [ $? != 0 ] ; then exit 1; else exit 0; fi

# End
@ -0,0 +1,258 @@
#! /bin/sh

# This file is generated by configure from RunTest.in. Make any changes
# to that file.

# Run PCRE tests

cf=diff
valgrind=
if [ ! -d testdata ] ; then
ln -s @top_srcdir@/testdata testdata
fi
testdata=./testdata


# Select which tests to run; if no selection, run all

do1=no
do2=no
do3=no
do4=no
do5=no
do6=no
do7=no
do8=no
do9=no

while [ $# -gt 0 ] ; do
case $1 in
1) do1=yes;;
2) do2=yes;;
3) do3=yes;;
4) do4=yes;;
5) do5=yes;;
6) do6=yes;;
7) do7=yes;;
8) do8=yes;;
9) do9=yes;;
valgrind) valgrind="valgrind -q";;
*) echo "Unknown test number $1"; exit 1;;
esac
shift
done

if [ "@LINK_SIZE@" != "" -a "@LINK_SIZE@" != "-DLINK_SIZE=2" ] ; then
if [ $do2 = yes ] ; then
echo "Can't run test 2 with an internal link size other than 2"
exit 1
fi
if [ $do5 = yes ] ; then
echo "Can't run test 5 with an internal link size other than 2"
exit 1
fi
if [ $do6 = yes ] ; then
echo "Can't run test 6 with an internal link size other than 2"
exit 1
fi
fi

if [ "@UTF8@" = "" ] ; then
if [ $do4 = yes ] ; then
echo "Can't run test 4 because UTF-8 support is not configured"
exit 1
fi
if [ $do5 = yes ] ; then
echo "Can't run test 5 because UTF-8 support is not configured"
exit 1
fi
if [ $do6 = yes ] ; then
echo "Can't run test 6 because UTF-8 support is not configured"
exit 1
fi
if [ $do8 = yes ] ; then
echo "Can't run test 8 because UTF-8 support is not configured"
exit 1
fi
if [ $do9 = yes ] ; then
echo "Can't run test 9 because UTF-8 support is not configured"
exit 1
fi
fi

if [ "@UCP@" = "" ] ; then
if [ $do6 = yes ] ; then
echo "Can't run test 6 because Unicode property support is not configured"
exit 1
fi
if [ $do9 = yes ] ; then
echo "Can't run test 9 because Unicode property support is not configured"
exit 1
fi
fi

if [ $do1 = no -a $do2 = no -a $do3 = no -a $do4 = no -a \
$do5 = no -a $do6 = no -a $do7 = no -a $do8 = no -a \
$do9 = no ] ; then
do1=yes
do2=yes
do3=yes
if [ "@UTF8@" != "" ] ; then do4=yes; fi
if [ "@UTF8@" != "" ] ; then do5=yes; fi
if [ "@UTF8@" != "" -a "@UCP@" != "" ] ; then do6=yes; fi
do7=yes
if [ "@UTF8@" != "" ] ; then do8=yes; fi
if [ "@UTF8@" != "" -a "@UCP@" != "" ] ; then do9=yes; fi
fi

# Show which release

./pcretest /dev/null

# Primary test, Perl-compatible

if [ $do1 = yes ] ; then
echo "Test 1: main functionality (Perl compatible)"
$valgrind ./pcretest -q $testdata/testinput1 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput1
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi

# PCRE tests that are not Perl-compatible - API & error tests, mostly

if [ $do2 = yes ] ; then
if [ "@LINK_SIZE@" = "" -o "@LINK_SIZE@" = "-DLINK_SIZE=2" ] ; then
echo "Test 2: API and error handling (not Perl compatible)"
$valgrind ./pcretest -q -i $testdata/testinput2 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput2
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
else
echo Test 2 skipped for link size other than 2 \(@LINK_SIZE@\)
echo " "
fi
fi

# Locale-specific tests, provided the "fr_FR" locale is available

if [ $do3 = yes ] ; then
locale -a | grep '^fr_FR$' >/dev/null
if [ $? -eq 0 ] ; then
echo "Test 3: locale-specific features (using 'fr_FR' locale)"
$valgrind ./pcretest -q $testdata/testinput3 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput3
if [ $? != 0 ] ; then
echo " "
echo "Locale test did not run entirely successfully."
echo "This usually means that there is a problem with the locale"
echo "settings rather than a bug in PCRE."
else
echo "OK"
fi
echo " "
else exit 1
fi
else
echo "Cannot test locale-specific features - 'fr_FR' locale not found,"
echo "or the \"locale\" command is not available to check for it."
echo " "
fi
fi

# Additional tests for UTF8 support

if [ $do4 = yes ] ; then
echo "Test 4: UTF-8 support (Perl compatible)"
$valgrind ./pcretest -q $testdata/testinput4 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput4
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi

if [ $do5 = yes ] ; then
if [ "@LINK_SIZE@" = "" -o "@LINK_SIZE@" = "-DLINK_SIZE=2" ] ; then
echo "Test 5: API and internals for UTF-8 support (not Perl compatible)"
$valgrind ./pcretest -q $testdata/testinput5 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput5
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
else
echo Test 5 skipped for link size other than 2 \(@LINK_SIZE@\)
echo " "
fi
fi

if [ $do6 = yes ] ; then
if [ "@LINK_SIZE@" = "" -o "@LINK_SIZE@" = "-DLINK_SIZE=2" ] ; then
echo "Test 6: Unicode property support"
$valgrind ./pcretest -q $testdata/testinput6 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput6
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
else
echo Test 6 skipped for link size other than 2 \(@LINK_SIZE@\)
echo " "
fi
fi

# Tests for DFA matching support

if [ $do7 = yes ] ; then
echo "Test 7: DFA matching"
$valgrind ./pcretest -q -dfa $testdata/testinput7 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput7
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi

if [ $do8 = yes ] ; then
echo "Test 8: DFA matching with UTF-8"
$valgrind ./pcretest -q -dfa $testdata/testinput8 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput8
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi

if [ $do9 = yes ] ; then
echo "Test 9: DFA matching with Unicode properties"
$valgrind ./pcretest -q -dfa $testdata/testinput9 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput9
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi

# End
@ -0,0 +1,143 @@

/* On Unix-like systems config.in is converted by "configure" into config.h.
Some other environments also support the use of "configure". PCRE is written in
Standard C, but there are a few non-standard things it can cope with, allowing
it to run on SunOS4 and other "close to standard" systems.

On a non-Unix-like system you should just copy this file into config.h, and set
up the macros the way you need them. You should normally change the definitions
of HAVE_STRERROR and HAVE_MEMMOVE to 1. Unfortunately, because of the way
autoconf works, these cannot be made the defaults. If your system has bcopy()
and not memmove(), change the definition of HAVE_BCOPY instead of HAVE_MEMMOVE.
If your system has neither bcopy() nor memmove(), leave them both as 0; an
emulation function will be used. */

/* If you are compiling for a system that uses EBCDIC instead of ASCII
character codes, define this macro as 1. On systems that can use "configure",
this can be done via --enable-ebcdic. */

#ifndef EBCDIC
#define EBCDIC 0
#endif

/* If you are compiling for a system other than a Unix-like system or Win32,
and it needs some magic to be inserted before the definition of a function that
is exported by the library, define this macro to contain the relevant magic. If
you do not define this macro, it defaults to "extern" for a C compiler and
'extern "C"' for a C++ compiler on non-Win32 systems. This macro appears at the
start of every exported function that is part of the external API. It does not
appear on functions that are "external" in the C sense, but which are internal
to the library. */

/* #define PCRE_DATA_SCOPE */

/* Define the following macro to empty if the "const" keyword does not work. */

#undef const

/* Define the following macro to "unsigned" if <stddef.h> does not define
size_t. */

#undef size_t

/* The following two definitions are mainly for the benefit of SunOS4, which
does not have the strerror() or memmove() functions that should be present in
all Standard C libraries. The macros HAVE_STRERROR and HAVE_MEMMOVE should
normally be defined with the value 1 for other systems, but unfortunately we
cannot make this the default because "configure" files generated by autoconf
will only change 0 to 1; they won't change 1 to 0 if the functions are not
found. */

#define HAVE_STRERROR 0
#define HAVE_MEMMOVE 0

/* There are some non-Unix-like systems that don't even have bcopy(). If this
macro is false, an emulation is used. If HAVE_MEMMOVE is set to 1, the value of
HAVE_BCOPY is not relevant. */

#define HAVE_BCOPY 0

/* The value of NEWLINE determines the newline character. The default is to
leave it up to the compiler, but some sites want to force a particular value.
On Unix-like systems, "configure" can be used to override this default. */

#ifndef NEWLINE
#define NEWLINE '\n'
#endif

/* The value of LINK_SIZE determines the number of bytes used to store links as
offsets within the compiled regex. The default is 2, which allows for compiled
patterns up to 64K long. This covers the vast majority of cases. However, PCRE
can also be compiled to use 3 or 4 bytes instead. This allows for longer
patterns in extreme cases. On systems that support it, "configure" can be used
to override this default. */

#ifndef LINK_SIZE
#define LINK_SIZE 2
#endif

/* When calling PCRE via the POSIX interface, additional working storage is
required for holding the pointers to capturing substrings because PCRE requires
three integers per substring, whereas the POSIX interface provides only two. If
the number of expected substrings is small, the wrapper function uses space on
the stack, because this is faster than using malloc() for each call. The
threshold above which the stack is no longer used is defined by
POSIX_MALLOC_THRESHOLD. On systems that support it, "configure" can be used to
override this default. */

#ifndef POSIX_MALLOC_THRESHOLD
#define POSIX_MALLOC_THRESHOLD 10
#endif

/* PCRE uses recursive function calls to handle backtracking while matching.
This can sometimes be a problem on systems that have stacks of limited size.
Define NO_RECURSE to get a version that doesn't use recursion in the match()
function; instead it creates its own stack by steam using pcre_recurse_malloc()
to obtain memory from the heap. For more detail, see the comments and other
stuff just above the match() function. On systems that support it, "configure"
can be used to set this in the Makefile (use --disable-stack-for-recursion). */

/* #define NO_RECURSE */

/* The value of MATCH_LIMIT determines the default number of times the internal
match() function can be called during a single execution of pcre_exec(). There
is a runtime interface for setting a different limit. The limit exists in order
to catch runaway regular expressions that take for ever to determine that they
do not match. The default is set very large so that it does not accidentally
catch legitimate cases. On systems that support it, "configure" can be used to
override this default default. */

#ifndef MATCH_LIMIT
#define MATCH_LIMIT 10000000
#endif

/* The above limit applies to all calls of match(), whether or not they
increase the recursion depth. In some environments it is desirable to limit the
depth of recursive calls of match() more strictly, in order to restrict the
maximum amount of stack (or heap, if NO_RECURSE is defined) that is used. The
value of MATCH_LIMIT_RECURSION applies only to recursive calls of match(). To
have any useful effect, it must be less than the value of MATCH_LIMIT. There is
a runtime method for setting a different limit. On systems that support it,
"configure" can be used to override this default default. */

#ifndef MATCH_LIMIT_RECURSION
#define MATCH_LIMIT_RECURSION MATCH_LIMIT
#endif

/* These three limits are parameterized just in case anybody ever wants to
change them. Care must be taken if they are increased, because they guard
against integer overflow caused by enormously large patterns. */

#ifndef MAX_NAME_SIZE
#define MAX_NAME_SIZE 32
#endif

#ifndef MAX_NAME_COUNT
#define MAX_NAME_COUNT 10000
#endif

#ifndef MAX_DUPLENGTH
#define MAX_DUPLENGTH 30000
#endif

/* End */
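The config.in comments above say that when a system has neither memmove() nor bcopy() (HAVE_MEMMOVE and HAVE_BCOPY both left as 0), PCRE uses an emulation function instead. The sketch below shows what an overlap-safe fallback of that kind typically looks like. The names `emulated_memmove` and `emulated_memmove_selftest` are hypothetical and for illustration only; this is not PCRE's own code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical overlap-safe copy, illustrating the kind of fallback used when
   HAVE_MEMMOVE and HAVE_BCOPY are both 0. Not PCRE's actual implementation. */
static void *emulated_memmove(void *dest, const void *src, size_t n)
{
  unsigned char *d = (unsigned char *)dest;
  const unsigned char *s = (const unsigned char *)src;

  if (d == s || n == 0) return dest;

  /* Strictly, ISO C defines '<' only for pointers into the same object; like
     most hand-written fallbacks, this relies on a flat address space when the
     two buffers are unrelated. */
  if (d < s)
  {
    /* Destination starts before source: copy forwards so that each source
       byte is read before it can be overwritten. */
    size_t i;
    for (i = 0; i < n; i++) d[i] = s[i];
  }
  else
  {
    /* Destination starts after source: copy backwards for the same reason. */
    while (n-- > 0) d[n] = s[n];
  }
  return dest;
}

/* Small self-check on overlapping buffers; returns 1 on success. */
static int emulated_memmove_selftest(void)
{
  char right[] = "abcdef";
  char left[] = "abcdef";

  emulated_memmove(right + 2, right, 4);   /* shift "abcd" right by two */
  emulated_memmove(left, left + 2, 4);     /* shift "cdef" left by two */

  return memcmp(right, "ababcd", 6) == 0 && memcmp(left, "cdefef", 6) == 0;
}
```

Choosing the copy direction from the relative position of the two regions is what separates memmove() from memcpy(). A plain forward loop would corrupt the right-shift case in the self-check.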
@ -0,0 +1,302 @@
dnl Process this file with autoconf to produce a configure script.

dnl This configure.in file has been hacked around quite a lot as a result of
dnl patches that various people have sent to me (PH). Sometimes the information
dnl I get is contradictory. I've tried to put in comments that explain things,
dnl but in some cases the information is second-hand and I have no way of
dnl verifying it. I am not an autoconf or libtool expert!

dnl This is required at the start; the name is the name of a file
dnl it should be seeing, to verify it is in the same directory.

AC_INIT(dftables.c)
AC_CONFIG_SRCDIR([pcre.h])

dnl A safety precaution

AC_PREREQ(2.57)

dnl Arrange to build config.h from config.h.in.
dnl Manual says this macro should come right after AC_INIT.
AC_CONFIG_HEADER(config.h)

dnl Default values for miscellaneous macros

POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=10

dnl Provide versioning information for libtool shared libraries that
dnl are built by default on Unix systems.

PCRE_LIB_VERSION=0:1:0
PCRE_POSIXLIB_VERSION=0:0:0
PCRE_CPPLIB_VERSION=0:0:0

dnl Find the PCRE version from the pcre.h file. The PCRE_VERSION variable is
dnl substituted in pcre-config.in.

PCRE_MAJOR=`grep '#define PCRE_MAJOR' ${srcdir}/pcre.h | cut -c 29-`
PCRE_MINOR=`grep '#define PCRE_MINOR' ${srcdir}/pcre.h | cut -c 29-`
PCRE_PRERELEASE=`grep '#define PCRE_PRERELEASE' ${srcdir}/pcre.h | cut -c 29-`
PCRE_VERSION=${PCRE_MAJOR}.${PCRE_MINOR}${PCRE_PRERELEASE}

dnl Handle --disable-cpp

AC_ARG_ENABLE(cpp,
[ --disable-cpp disable C++ support],
want_cpp="$enableval", want_cpp=yes)

dnl Checks for programs.

AC_PROG_CC

dnl Test for C++ for the C++ wrapper libpcrecpp. It seems, however, that
dnl AC_PROG_CXX will set $CXX to "g++" when no C++ compiler is installed, even
dnl though that is completely bogus. (This may happen only on certain systems
dnl with certain versions of autoconf, of course.) An attempt to include this
dnl test inside a check for want_cpp was criticized by a libtool expert, who
dnl tells me that it isn't allowed.

AC_PROG_CXX

dnl The icc compiler has the same options as gcc, so let the rest of the
dnl configure script think it has gcc when setting up options etc.
dnl This is a nasty hack which no longer seems necessary with the update
dnl to the latest libtool files, so I have commented it out.
dnl
dnl if test "$CC" = "icc" ; then GCC=yes ; fi

AC_PROG_INSTALL
AC_LIBTOOL_WIN32_DLL
AC_PROG_LIBTOOL

dnl We need to find a compiler for compiling a program to run on the local host
dnl while building. It needs to be different from CC when cross-compiling.
dnl There is a macro called AC_PROG_CC_FOR_BUILD in the GNU archive for
dnl figuring this out automatically. Unfortunately, it does not work with the
dnl latest versions of autoconf. So for the moment, we just default to the
dnl same values as the "main" compiler. People who are cross-compiling will
dnl just have to adjust the Makefile by hand or set these values when they
dnl run "configure".

CC_FOR_BUILD=${CC_FOR_BUILD:-'$(CC)'}
CXX_FOR_BUILD=${CXX_FOR_BUILD:-'$(CXX)'}
CFLAGS_FOR_BUILD=${CFLAGS_FOR_BUILD:-'$(CFLAGS)'}
CPPFLAGS_FOR_BUILD=${CPPFLAGS_FOR_BUILD:-'$(CPPFLAGS)'}
CXXFLAGS_FOR_BUILD=${CXXFLAGS_FOR_BUILD:-'$(CXXFLAGS)'}
BUILD_EXEEXT=${BUILD_EXEEXT:-'$(EXEEXT)'}
BUILD_OBJEXT=${BUILD_OBJEXT:-'$(OBJEXT)'}

dnl Checks for header files.

AC_HEADER_STDC
AC_CHECK_HEADERS(limits.h)

dnl The files below are C++ header files. One person told me (PH) that
dnl AC_LANG_CPLUSPLUS unsets CXX if it was explicitly set to something which
dnl doesn't work. However, this doesn't always seem to be the case.

if test "x$want_cpp" = "xyes" -a -n "$CXX"
then
AC_LANG_SAVE
AC_LANG_CPLUSPLUS

dnl We could be more clever here, given we're doing AC_SUBST with this
dnl (eg set a var to be the name of the include file we want). But we're not
dnl so it's easy to change back to 'regular' autoconf vars if we needed to.
AC_CHECK_HEADERS(string, [pcre_have_cpp_headers="1"],
[pcre_have_cpp_headers="0"])
AC_CHECK_HEADERS(bits/type_traits.h, [pcre_have_bits_type_traits="1"],
[pcre_have_bits_type_traits="0"])
AC_CHECK_HEADERS(type_traits.h, [pcre_have_type_traits="1"],
[pcre_have_type_traits="0"])
dnl Using AC_SUBST eliminates the need to include config.h in a public .h file
AC_SUBST(pcre_have_bits_type_traits)
AC_SUBST(pcre_have_type_traits)
AC_LANG_RESTORE
fi

dnl From the above, we now have enough info to know if C++ is fully installed
if test "x$want_cpp" = "xyes" -a -n "$CXX" -a "$pcre_have_cpp_headers" = 1; then
MAYBE_CPP_TARGETS='$(CPP_TARGETS)'
HAVE_CPP=
else
MAYBE_CPP_TARGETS=
HAVE_CPP="#"
fi
AC_SUBST(MAYBE_CPP_TARGETS)
AC_SUBST(HAVE_CPP)

dnl Checks for typedefs, structures, and compiler characteristics.

AC_C_CONST
AC_TYPE_SIZE_T

AC_CHECK_TYPES([long long], [pcre_have_long_long="1"], [pcre_have_long_long="0"])
AC_CHECK_TYPES([unsigned long long], [pcre_have_ulong_long="1"], [pcre_have_ulong_long="0"])
AC_SUBST(pcre_have_long_long)
AC_SUBST(pcre_have_ulong_long)

dnl Checks for library functions.

AC_CHECK_FUNCS(bcopy memmove strerror strtoq strtoll)

dnl Handle --enable-utf8

AC_ARG_ENABLE(utf8,
[ --enable-utf8 enable UTF8 support],
if test "$enableval" = "yes"; then
UTF8=-DSUPPORT_UTF8
fi
)

dnl Handle --enable-unicode-properties

AC_ARG_ENABLE(unicode-properties,
[ --enable-unicode-properties enable Unicode properties support],
if test "$enableval" = "yes"; then
UCP=-DSUPPORT_UCP
fi
)

dnl Handle --enable-newline-is-cr

AC_ARG_ENABLE(newline-is-cr,
[ --enable-newline-is-cr use CR as the newline character],
if test "$enableval" = "yes"; then
NEWLINE=-DNEWLINE=13
fi
)

dnl Handle --enable-newline-is-lf

AC_ARG_ENABLE(newline-is-lf,
[ --enable-newline-is-lf use LF as the newline character],
if test "$enableval" = "yes"; then
NEWLINE=-DNEWLINE=10
fi
)

dnl Handle --enable-newline-is-crlf

AC_ARG_ENABLE(newline-is-crlf,
[ --enable-newline-is-crlf use CRLF as the newline sequence],
|
||||
if test "$enableval" = "yes"; then
|
||||
NEWLINE=-DNEWLINE=3338
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --enable-ebcdic
|
||||
|
||||
AC_ARG_ENABLE(ebcdic,
|
||||
[ --enable-ebcdic assume EBCDIC coding rather than ASCII],
|
||||
if test "$enableval" = "yes"; then
|
||||
EBCDIC=-DEBCDIC=1
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --disable-stack-for-recursion
|
||||
|
||||
AC_ARG_ENABLE(stack-for-recursion,
|
||||
[ --disable-stack-for-recursion disable use of stack recursion when matching],
|
||||
if test "$enableval" = "no"; then
|
||||
NO_RECURSE=-DNO_RECURSE
|
||||
fi
|
||||
)
|
||||
|
||||
dnl There doesn't seem to be a straightforward way of having parameters
|
||||
dnl that set values, other than fudging the --with thing. So that's what
|
||||
dnl I've done.
|
||||
|
||||
dnl Handle --with-posix-malloc-threshold=n
|
||||
|
||||
AC_ARG_WITH(posix-malloc-threshold,
|
||||
[ --with-posix-malloc-threshold=10 threshold for POSIX malloc usage],
|
||||
POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=$withval
|
||||
)
|
||||
|
||||
dnl Handle --with-link-size=n
|
||||
|
||||
AC_ARG_WITH(link-size,
|
||||
[ --with-link-size=2 internal link size (2, 3, or 4 allowed)],
|
||||
LINK_SIZE=-DLINK_SIZE=$withval
|
||||
)
|
||||
|
||||
dnl Handle --with-match-limit=n
|
||||
|
||||
AC_ARG_WITH(match-limit,
|
||||
[ --with-match-limit=10000000 default limit on internal looping],
|
||||
MATCH_LIMIT=-DMATCH_LIMIT=$withval
|
||||
)
|
||||
|
||||
dnl Handle --with-match-limit-recursion=n
|
||||
|
||||
AC_ARG_WITH(match-limit-recursion,
|
||||
[ --with-match-limit-recursion=10000000 default limit on internal recursion],
|
||||
MATCH_LIMIT_RECURSION=-DMATCH_LIMIT_RECURSION=$withval
|
||||
)
|
||||
|
||||
dnl Unicode character property support implies UTF-8 support
|
||||
|
||||
if test "$UCP" != "" ; then
|
||||
UTF8=-DSUPPORT_UTF8
|
||||
fi
|
||||
|
||||
dnl "Export" these variables
|
||||
|
||||
AC_SUBST(BUILD_EXEEXT)
|
||||
AC_SUBST(BUILD_OBJEXT)
|
||||
AC_SUBST(CC_FOR_BUILD)
|
||||
AC_SUBST(CXX_FOR_BUILD)
|
||||
AC_SUBST(CFLAGS_FOR_BUILD)
|
||||
AC_SUBST(CXXFLAGS_FOR_BUILD)
|
||||
AC_SUBST(CXXLDFLAGS)
|
||||
AC_SUBST(EBCDIC)
|
||||
AC_SUBST(HAVE_MEMMOVE)
|
||||
AC_SUBST(HAVE_STRERROR)
|
||||
AC_SUBST(LINK_SIZE)
|
||||
AC_SUBST(MATCH_LIMIT)
|
||||
AC_SUBST(MATCH_LIMIT_RECURSION)
|
||||
AC_SUBST(NEWLINE)
|
||||
AC_SUBST(NO_RECURSE)
|
||||
AC_SUBST(PCRE_LIB_VERSION)
|
||||
AC_SUBST(PCRE_POSIXLIB_VERSION)
|
||||
AC_SUBST(PCRE_CPPLIB_VERSION)
|
||||
AC_SUBST(PCRE_VERSION)
|
||||
AC_SUBST(POSIX_MALLOC_THRESHOLD)
|
||||
AC_SUBST(UCP)
|
||||
AC_SUBST(UTF8)
|
||||
|
||||
dnl Stuff to make MinGW work better. Special treatment is no longer
|
||||
dnl needed for Cygwin.
|
||||
|
||||
case $host_os in
|
||||
mingw* )
|
||||
POSIX_OBJ=pcreposix.o
|
||||
POSIX_LOBJ=pcreposix.lo
|
||||
POSIX_LIB=
|
||||
ON_WINDOWS=
|
||||
NOT_ON_WINDOWS="#"
|
||||
WIN_PREFIX=
|
||||
;;
|
||||
* )
|
||||
ON_WINDOWS="#"
|
||||
NOT_ON_WINDOWS=
|
||||
POSIX_OBJ=
|
||||
POSIX_LOBJ=
|
||||
POSIX_LIB=libpcreposix.la
|
||||
WIN_PREFIX=
|
||||
;;
|
||||
esac
|
||||
AC_SUBST(WIN_PREFIX)
|
||||
AC_SUBST(ON_WINDOWS)
|
||||
AC_SUBST(NOT_ON_WINDOWS)
|
||||
AC_SUBST(POSIX_OBJ)
|
||||
AC_SUBST(POSIX_LOBJ)
|
||||
AC_SUBST(POSIX_LIB)
|
||||
|
||||
if test "x$enable_shared" = "xno" ; then
|
||||
AC_DEFINE([PCRE_STATIC],[1],[to link statically])
|
||||
fi
|
||||
|
||||
dnl This must be last; it determines what files are written as well as config.h
|
||||
AC_OUTPUT(Makefile pcre-config:pcre-config.in libpcre.pc:libpcre.pc.in pcrecpparg.h:pcrecpparg.h.in pcre_stringpiece.h:pcre_stringpiece.h.in RunGrepTest:RunGrepTest.in RunTest:RunTest.in,[chmod a+x RunTest RunGrepTest pcre-config])
|
|
@ -0,0 +1,172 @@
|
|||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/* PCRE is a library of functions to support regular expressions whose syntax
|
||||
and semantics are as close as possible to those of the Perl 5 language.
|
||||
|
||||
Written by Philip Hazel
|
||||
Copyright (c) 1997-2006 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice,
|
||||
this list of conditions and the following disclaimer.
|
||||
|
||||
* Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
|
||||
* Neither the name of the University of Cambridge nor the names of its
|
||||
contributors may be used to endorse or promote products derived from
|
||||
this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
||||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
|
||||
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
||||
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
||||
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
||||
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
||||
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
||||
POSSIBILITY OF SUCH DAMAGE.
|
||||
-----------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
|
||||
/* This is a freestanding support program to generate a file containing default
|
||||
character tables for PCRE. The tables are built according to the default C
|
||||
locale. Now that pcre_maketables is a function visible to the outside world, we
|
||||
make use of its code from here in order to be consistent. */
|
||||
|
||||
#include <ctype.h>
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
#define DFTABLES /* pcre_maketables.c notices this */
|
||||
#include "pcre_maketables.c"
|
||||
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
int i;
|
||||
FILE *f;
|
||||
const unsigned char *tables = pcre_maketables();
|
||||
const unsigned char *base_of_tables = tables;
|
||||
|
||||
if (argc != 2)
|
||||
{
|
||||
fprintf(stderr, "dftables: one filename argument is required\n");
|
||||
return 1;
|
||||
}
|
||||
|
||||
f = fopen(argv[1], "wb");
|
||||
if (f == NULL)
|
||||
{
|
||||
fprintf(stderr, "dftables: failed to open %s for writing\n", argv[1]);
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* There are two fprintf() calls here, because gcc in pedantic mode complains
|
||||
about the very long string otherwise. */
|
||||
|
||||
fprintf(f,
|
||||
"/*************************************************\n"
|
||||
"* Perl-Compatible Regular Expressions *\n"
|
||||
"*************************************************/\n\n"
|
||||
"/* This file is automatically written by the dftables auxiliary \n"
|
||||
"program. If you edit it by hand, you might like to edit the Makefile to \n"
|
||||
"prevent its ever being regenerated.\n\n");
|
||||
fprintf(f,
|
||||
"This file contains the default tables for characters with codes less than\n"
|
||||
"128 (ASCII characters). These tables are used when no external tables are\n"
|
||||
"passed to PCRE. */\n\n"
|
||||
"const unsigned char _pcre_default_tables[] = {\n\n"
|
||||
"/* This table is a lower casing table. */\n\n");
|
||||
|
||||
fprintf(f, " ");
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0) fprintf(f, "\n ");
|
||||
fprintf(f, "%3d", *tables++);
|
||||
if (i != 255) fprintf(f, ",");
|
||||
}
|
||||
fprintf(f, ",\n\n");
|
||||
|
||||
fprintf(f, "/* This table is a case flipping table. */\n\n");
|
||||
|
||||
fprintf(f, " ");
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0) fprintf(f, "\n ");
|
||||
fprintf(f, "%3d", *tables++);
|
||||
if (i != 255) fprintf(f, ",");
|
||||
}
|
||||
fprintf(f, ",\n\n");
|
||||
|
||||
fprintf(f,
|
||||
"/* This table contains bit maps for various character classes.\n"
|
||||
"Each map is 32 bytes long and the bits run from the least\n"
|
||||
"significant end of each byte. The classes that have their own\n"
|
||||
"maps are: space, xdigit, digit, upper, lower, word, graph,\n"
|
||||
"print, punct, and cntrl. Other classes are built from combinations. */\n\n");
|
||||
|
||||
fprintf(f, " ");
|
||||
for (i = 0; i < cbit_length; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0)
|
||||
{
|
||||
if ((i & 31) == 0) fprintf(f, "\n");
|
||||
fprintf(f, "\n ");
|
||||
}
|
||||
fprintf(f, "0x%02x", *tables++);
|
||||
if (i != cbit_length - 1) fprintf(f, ",");
|
||||
}
|
||||
fprintf(f, ",\n\n");
|
||||
|
||||
fprintf(f,
|
||||
"/* This table identifies various classes of character by individual bits:\n"
|
||||
" 0x%02x white space character\n"
|
||||
" 0x%02x letter\n"
|
||||
" 0x%02x decimal digit\n"
|
||||
" 0x%02x hexadecimal digit\n"
|
||||
" 0x%02x alphanumeric or '_'\n"
|
||||
" 0x%02x regular expression metacharacter or binary zero\n*/\n\n",
|
||||
ctype_space, ctype_letter, ctype_digit, ctype_xdigit, ctype_word,
|
||||
ctype_meta);
|
||||
|
||||
fprintf(f, " ");
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0)
|
||||
{
|
||||
fprintf(f, " /* ");
|
||||
if (isprint(i-8)) fprintf(f, " %c -", i-8);
|
||||
else fprintf(f, "%3d-", i-8);
|
||||
if (isprint(i-1)) fprintf(f, " %c ", i-1);
|
||||
else fprintf(f, "%3d", i-1);
|
||||
fprintf(f, " */\n ");
|
||||
}
|
||||
fprintf(f, "0x%02x", *tables++);
|
||||
if (i != 255) fprintf(f, ",");
|
||||
}
|
||||
|
||||
fprintf(f, "};/* ");
|
||||
if (isprint(i-8)) fprintf(f, " %c -", i-8);
|
||||
else fprintf(f, "%3d-", i-8);
|
||||
if (isprint(i-1)) fprintf(f, " %c ", i-1);
|
||||
else fprintf(f, "%3d", i-1);
|
||||
fprintf(f, " */\n\n/* End of chartables.c */\n");
|
||||
|
||||
fclose(f);
|
||||
free((void *)base_of_tables);
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* End of dftables.c */
|
|
@ -0,0 +1,348 @@
|
|||
Technical Notes about PCRE
|
||||
--------------------------
|
||||
|
||||
These are very rough technical notes that record potentially useful information
|
||||
about PCRE internals.
|
||||
|
||||
Historical note 1
|
||||
-----------------
|
||||
|
||||
Many years ago I implemented some regular expression functions to an algorithm
|
||||
suggested by Martin Richards. These were not Unix-like in form, and were quite
|
||||
restricted in what they could do by comparison with Perl. The interesting part
|
||||
about the algorithm was that the amount of space required to hold the compiled
|
||||
form of an expression was known in advance. The code to apply an expression did
|
||||
not operate by backtracking, as the original Henry Spencer code and current
|
||||
Perl code do, but instead checked all possibilities simultaneously by keeping
|
||||
a list of current states and checking all of them as it advanced through the
|
||||
subject string. In the terminology of Jeffrey Friedl's book, it was a "DFA
|
||||
algorithm". When the pattern was all used up, all remaining states were
|
||||
possible matches, and the one matching the longest substring of the subject string
|
||||
was chosen. This did not necessarily maximize the individual wild portions of
|
||||
the pattern, as is expected in Unix and Perl-style regular expressions.
|
||||
|
||||
Historical note 2
|
||||
-----------------
|
||||
|
||||
By contrast, the code originally written by Henry Spencer (which was
|
||||
subsequently heavily modified for Perl) compiles the expression twice: once in
|
||||
a dummy mode in order to find out how much store will be needed, and then for
|
||||
real. (The Perl version probably doesn't do this any more; I'm talking about
|
||||
the original library.) The execution function operates by backtracking and
|
||||
maximizing (or, optionally, minimizing in Perl) the amount of the subject that
|
||||
matches individual wild portions of the pattern. This is an "NFA algorithm" in
|
||||
Friedl's terminology.
|
||||
|
||||
OK, here's the real stuff
|
||||
-------------------------
|
||||
|
||||
For the set of functions that form the "basic" PCRE library (which are
|
||||
unrelated to those mentioned above), I tried at first to invent an algorithm
|
||||
that used an amount of store bounded by a multiple of the number of characters
|
||||
in the pattern, to save on compiling time. However, because of the greater
|
||||
complexity in Perl regular expressions, I couldn't do this. In any case, a
|
||||
first pass through the pattern is needed, for a number of reasons. PCRE works
|
||||
by running a very degenerate first pass to calculate a maximum store size, and
|
||||
then a second pass to do the real compile - which may use a bit less than the
|
||||
predicted amount of store. The idea is that this is going to turn out faster
|
||||
because the first pass is degenerate and the second pass can just store stuff
|
||||
straight into the vector, which it knows is big enough. It does make the
|
||||
compiling functions bigger, of course, but they have become quite big anyway to
|
||||
handle all the Perl stuff.
|
||||
|
||||
Traditional matching function
|
||||
-----------------------------
|
||||
|
||||
The "traditional", and original, matching function is called pcre_exec(), and
|
||||
it implements an NFA algorithm, similar to the original Henry Spencer algorithm
|
||||
and the way that Perl works. Not surprising, since it is intended to be as
|
||||
compatible with Perl as possible. This is the function most users of PCRE will
|
||||
use most of the time.
|
||||
|
||||
Supplementary matching function
|
||||
-------------------------------
|
||||
|
||||
From PCRE 6.0, there is also a supplementary matching function called
|
||||
pcre_dfa_exec(). This implements a DFA matching algorithm that searches
|
||||
simultaneously for all possible matches that start at one point in the subject
|
||||
string. (Going back to my roots: see Historical Note 1 above.) This function
|
||||
interprets the same compiled pattern data as pcre_exec(); however, not all the
|
||||
facilities are available, and those that are do not always work in quite the
|
||||
same way. See the user documentation for details.
|
||||
|
||||
Format of compiled patterns
|
||||
---------------------------
|
||||
|
||||
The compiled form of a pattern is a vector of bytes, containing items of
|
||||
variable length. The first byte in an item is an opcode, and the length of the
|
||||
item is either implicit in the opcode or contained in the data bytes that
|
||||
follow it.
|
||||
|
||||
In many cases below "two-byte" data values are specified. This is in fact just
|
||||
a default. PCRE can be compiled to use 3-byte or 4-byte values (impairing the
|
||||
performance). This is necessary only when patterns whose compiled length is
|
||||
greater than 64K are going to be processed. In this description, we assume the
|
||||
"normal" compilation options.
|
||||
|
||||
A list of all the opcodes follows:
|
||||
|
||||
Opcodes with no following data
|
||||
------------------------------
|
||||
|
||||
These items are all just one byte long:
|
||||
|
||||
OP_END end of pattern
|
||||
OP_ANY match any character
|
||||
OP_ANYBYTE match any single byte, even in UTF-8 mode
|
||||
OP_SOD match start of data: \A
|
||||
OP_SOM start of match (subject + offset): \G
|
||||
OP_CIRC ^ (start of data, or after \n in multiline)
|
||||
OP_NOT_WORD_BOUNDARY \B
|
||||
OP_WORD_BOUNDARY \b
|
||||
OP_NOT_DIGIT \D
|
||||
OP_DIGIT \d
|
||||
OP_NOT_WHITESPACE \S
|
||||
OP_WHITESPACE \s
|
||||
OP_NOT_WORDCHAR \W
|
||||
OP_WORDCHAR \w
|
||||
OP_EODN match end of data or \n at end: \Z
|
||||
OP_EOD match end of data: \z
|
||||
OP_DOLL $ (end of data, or before \n in multiline)
|
||||
OP_EXTUNI match an extended Unicode character
|
||||
|
||||
|
||||
Repeating single characters
|
||||
---------------------------
|
||||
|
||||
The common repeats (*, +, ?) when applied to a single character use the
|
||||
following opcodes:
|
||||
|
||||
OP_STAR
|
||||
OP_MINSTAR
|
||||
OP_PLUS
|
||||
OP_MINPLUS
|
||||
OP_QUERY
|
||||
OP_MINQUERY
|
||||
|
||||
In ASCII mode, these are two-byte items; in UTF-8 mode, the length is variable.
|
||||
Those with "MIN" in their name are the minimizing versions. Each is followed by
|
||||
the character that is to be repeated. Other repeats make use of
|
||||
|
||||
OP_UPTO
|
||||
OP_MINUPTO
|
||||
OP_EXACT
|
||||
|
||||
which are followed by a two-byte count (most significant first) and the
|
||||
repeated character. OP_UPTO matches from 0 to the given number. A repeat with a
|
||||
non-zero minimum and a fixed maximum is coded as an OP_EXACT followed by an
|
||||
OP_UPTO (or OP_MINUPTO).
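As a minimal sketch (the helper names are hypothetical, not PCRE's own macros), the two-byte count can be stored and read most significant byte first, and a bounded repeat such as a{3,5} splits into an OP_EXACT count of 3 followed by an OP_UPTO count of 2:

```c
#include <assert.h>

/* Hypothetical helpers (not PCRE's own macros): store and read the
   two-byte count that follows OP_UPTO, OP_MINUPTO and OP_EXACT, most
   significant byte first. */
void put_count(unsigned char *p, unsigned int count)
{
  assert(count <= 0xffff);              /* must fit in two bytes */
  p[0] = (unsigned char)(count >> 8);
  p[1] = (unsigned char)(count & 0xff);
}

unsigned int get_count(const unsigned char *p)
{
  return ((unsigned int)p[0] << 8) | p[1];
}

/* Split a repeat {min,max} with a non-zero minimum and a fixed maximum
   into the OP_EXACT count and the OP_UPTO (or OP_MINUPTO) count. */
void split_repeat(unsigned int min, unsigned int max,
                  unsigned int *exact, unsigned int *upto)
{
  assert(min > 0 && min <= max);
  *exact = min;        /* copies that must match */
  *upto = max - min;   /* further copies that may match, 0 to upto */
}
```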
|
||||
|
||||
|
||||
Repeating character types
|
||||
-------------------------
|
||||
|
||||
Repeats of things like \d are done exactly as for single characters, except
|
||||
that instead of a character, the opcode for the type is stored in the data
|
||||
byte. The opcodes are:
|
||||
|
||||
OP_TYPESTAR
|
||||
OP_TYPEMINSTAR
|
||||
OP_TYPEPLUS
|
||||
OP_TYPEMINPLUS
|
||||
OP_TYPEQUERY
|
||||
OP_TYPEMINQUERY
|
||||
OP_TYPEUPTO
|
||||
OP_TYPEMINUPTO
|
||||
OP_TYPEEXACT
|
||||
|
||||
|
||||
Match by Unicode property
|
||||
-------------------------
|
||||
|
||||
OP_PROP and OP_NOTPROP are used for positive and negative matches of a
|
||||
character by testing its Unicode property (the \p and \P escape sequences).
|
||||
Each is followed by two bytes that encode the desired property as a type and a
|
||||
value.
|
||||
|
||||
Repeats of these items use the OP_TYPESTAR etc. set of opcodes, followed by
|
||||
three bytes: OP_PROP or OP_NOTPROP and then the desired property type and
|
||||
value.
|
||||
|
||||
|
||||
Matching literal characters
|
||||
---------------------------
|
||||
|
||||
The OP_CHAR opcode is followed by a single character that is to be matched
|
||||
casefully. For caseless matching, OP_CHARNC is used. In UTF-8 mode, the
|
||||
character may be more than one byte long. (Earlier versions of PCRE used
|
||||
multi-character strings, but this was changed to allow some new features to be
|
||||
added.)
|
||||
|
||||
|
||||
Character classes
|
||||
-----------------
|
||||
|
||||
If there is only one character, OP_CHAR or OP_CHARNC is used for a positive
|
||||
class, and OP_NOT for a negative one (that is, for something like [^a]).
|
||||
However, in UTF-8 mode, the use of OP_NOT applies only to characters with
|
||||
values < 128, because OP_NOT is confined to single bytes.
|
||||
|
||||
Another set of repeating opcodes (OP_NOTSTAR etc.) is used for a repeated,
|
||||
negated, single-character class. The normal ones (OP_STAR etc.) are used for a
|
||||
repeated positive single-character class.
|
||||
|
||||
When there's more than one character in a class and all the characters are less
|
||||
than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a negative
|
||||
one. In either case, the opcode is followed by a 32-byte bit map containing a 1
|
||||
bit for every character that is acceptable. The bits are counted from the least
|
||||
significant end of each byte.
|
||||
|
||||
The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 mode,
|
||||
subject characters with values greater than 255 can be handled correctly. For
|
||||
OP_CLASS they don't match, whereas for OP_NCLASS they do.
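A minimal sketch of the bit map test, assuming a plain 32-byte array; class_clear, class_add and class_matches are illustrative names, not PCRE functions. It also shows why OP_NCLASS is needed: a character above 255 has no bit in the map, so the opcode alone decides the result.

```c
#include <assert.h>
#include <string.h>

/* Character c (< 256) is bit (c % 8) of byte (c / 8), with bits
   counted from the least significant end of each byte. */
void class_clear(unsigned char map[32])
{
  memset(map, 0, 32);
}

void class_add(unsigned char map[32], unsigned int c)
{
  assert(c < 256);
  map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* is_nclass selects OP_NCLASS behaviour: a character above 255 never
   matches OP_CLASS but always matches OP_NCLASS. For both opcodes a 1
   bit marks an acceptable character below 256. */
int class_matches(const unsigned char map[32], unsigned int c, int is_nclass)
{
  if (c > 255) return is_nclass;
  return (map[c / 8] & (1u << (c % 8))) != 0;
}
```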
|
||||
|
||||
For classes containing characters with values > 255, OP_XCLASS is used. It
|
||||
optionally uses a bit map (if any characters lie within it), followed by a list
|
||||
of pairs and single characters. There is a flag character that indicates
|
||||
whether it's a positive or a negative class.
|
||||
|
||||
|
||||
Back references
|
||||
---------------
|
||||
|
||||
OP_REF is followed by two bytes containing the reference number.
|
||||
|
||||
|
||||
Repeating character classes and back references
|
||||
-----------------------------------------------
|
||||
|
||||
Single-character classes are handled specially (see above). What follows applies to
|
||||
OP_CLASS and OP_REF. In both cases, the repeat information follows the base
|
||||
item. The matching code looks at the following opcode to see if it is one of
|
||||
|
||||
OP_CRSTAR
|
||||
OP_CRMINSTAR
|
||||
OP_CRPLUS
|
||||
OP_CRMINPLUS
|
||||
OP_CRQUERY
|
||||
OP_CRMINQUERY
|
||||
OP_CRRANGE
|
||||
OP_CRMINRANGE
|
||||
|
||||
All but the last two are just single-byte items. The last two are followed by
|
||||
four bytes of data, comprising the minimum and maximum repeat counts.
|
||||
|
||||
|
||||
Brackets and alternation
|
||||
------------------------
|
||||
|
||||
A pair of non-capturing (round) brackets is wrapped round each expression at
|
||||
compile time, so alternation always happens in the context of brackets.
|
||||
|
||||
Non-capturing brackets use the opcode OP_BRA, while capturing brackets use
|
||||
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
|
||||
speakers, including myself, can be round, square, curly, or pointy. Hence this
|
||||
usage.]
|
||||
|
||||
Originally PCRE was limited to 99 capturing brackets (so as not to use up all
|
||||
the opcodes). From release 3.5, there is no limit. What happens is that the
|
||||
first ones, up to EXTRACT_BASIC_MAX, are handled with separate opcodes, as
|
||||
above. If there are more, the opcode is set to EXTRACT_BASIC_MAX+1, and the
|
||||
first operation in the bracket is OP_BRANUMBER, followed by a 2-byte bracket
|
||||
number. This opcode is ignored while matching, but is fished out when handling
|
||||
the bracket itself. (They could have all been done like this, but I was making
|
||||
minimal changes.)
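Recovering a bracket's number might then look like the following sketch. The opcode values and EXTRACT_BASIC_MAX below are made up for illustration (the real ones are defined in the PCRE source), and the two-byte bracket number is assumed to be stored most significant byte first, like the repeat counts.

```c
#include <assert.h>

/* Illustrative values only; they are not PCRE's real opcode numbers. */
enum { SK_OP_BRANUMBER = 20, SK_OP_BRA = 100 };
#define SK_EXTRACT_BASIC_MAX 100
#define SK_LINK_SIZE 2

/* Return the bracket number for the bracket opcode at code[0]:
   0 for a non-capturing OP_BRA, n for OP_BRA+n up to
   EXTRACT_BASIC_MAX. Above that, the number is taken from the
   OP_BRANUMBER item that follows the LINK_SIZE link bytes. */
int bracket_number(const unsigned char *code)
{
  int number = code[0] - SK_OP_BRA;
  if (number > SK_EXTRACT_BASIC_MAX)
  {
    const unsigned char *p = code + 1 + SK_LINK_SIZE;
    assert(p[0] == SK_OP_BRANUMBER);
    number = (p[1] << 8) | p[2];
  }
  return number;
}
```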
|
||||
|
||||
A bracket opcode is followed by LINK_SIZE bytes which give the offset to the
|
||||
next alternative OP_ALT or, if there are no more alternatives, to the matching
|
||||
OP_KET opcode. Each OP_ALT is followed by LINK_SIZE bytes giving the offset to
|
||||
the next one, or to the OP_KET opcode.
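The chain of links can be followed as in this sketch, which counts the alternatives in a bracket. It assumes the default LINK_SIZE of 2, offsets measured from the opcode that carries them, and most significant byte first; the opcode values are made up for illustration.

```c
#include <assert.h>

/* Illustrative opcode values, not PCRE's real ones. */
enum { LK_OP_KET = 1, LK_OP_ALT = 2, LK_OP_BRA = 3 };
#define LK_LINK_SIZE 2

/* Read a LINK_SIZE-byte offset, most significant byte first. */
unsigned int get_link(const unsigned char *p)
{
  unsigned int v = 0;
  int i;
  for (i = 0; i < LK_LINK_SIZE; i++) v = (v << 8) | p[i];
  return v;
}

/* Starting at a bracket opcode, hop from link to link through each
   OP_ALT until OP_KET is reached; return the number of alternatives. */
int count_alternatives(const unsigned char *code)
{
  int n = 1;
  assert(*code == LK_OP_BRA);
  code += get_link(code + 1);
  while (*code == LK_OP_ALT)
  {
    n++;
    code += get_link(code + 1);
  }
  assert(*code == LK_OP_KET);
  return n;
}
```

For example, the bytes BRA 0 4 'x' ALT 0 4 'y' KET 0 8 describe a two-branch bracket whose closing OP_KET points 8 bytes back to the OP_BRA.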
|
||||
|
||||
OP_KET is used for subpatterns that do not repeat indefinitely, while
|
||||
OP_KETRMIN and OP_KETRMAX are used for indefinite repetitions, minimally or
|
||||
maximally respectively. All three are followed by LINK_SIZE bytes giving (as a
|
||||
positive number) the offset back to the matching OP_BRA opcode.
|
||||
|
||||
If a subpattern is quantified such that it is permitted to match zero times, it
|
||||
is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
|
||||
opcodes which tell the matcher that skipping this subpattern entirely is a
|
||||
valid branch.
|
||||
|
||||
A subpattern with an indefinite maximum repetition is replicated in the
|
||||
compiled data its minimum number of times (or once with OP_BRAZERO if the
|
||||
minimum is zero), with the final copy terminating with OP_KETRMIN or OP_KETRMAX
|
||||
as appropriate.
|
||||
|
||||
A subpattern with a bounded maximum repetition is replicated in a nested
|
||||
fashion up to the maximum number of times, with OP_BRAZERO or OP_BRAMINZERO
|
||||
before each replication after the minimum, so that, for example, (abc){2,5} is
|
||||
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?.
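The nesting rule can be checked with a small sketch that produces the textual equivalent of group{min,max}. expand_bounded is a hypothetical illustration that works on pattern text, whereas PCRE itself replicates compiled code; each ? stands for the OP_BRAZERO that precedes an optional copy.

```c
#include <assert.h>
#include <string.h>

/* Write into out the textual equivalent of group{min,max}: min plain
   copies, then max-min optional copies, each nested inside the one
   before it. group must already be parenthesized, e.g. "(abc)".
   Returns 0 on success, or -1 if outlen is too small. */
int expand_bounded(const char *group, int min, int max,
                   char *out, size_t outlen)
{
  size_t glen = strlen(group), pos = 0, need;
  int opt = max - min, i;

  assert(min >= 0 && max >= min);
  /* every copy, plus "(" and ")?" around each optional copy except the
     innermost, which gets just "?", plus the terminating NUL */
  need = (size_t)max * glen + (opt > 0 ? (size_t)(opt - 1) * 3 + 1 : 0) + 1;
  if (need > outlen) return -1;

  for (i = 0; i < min; i++) { memcpy(out + pos, group, glen); pos += glen; }
  for (i = 0; i < opt; i++)
  {
    if (i < opt - 1) out[pos++] = '(';   /* opens a nested optional group */
    memcpy(out + pos, group, glen); pos += glen;
  }
  if (opt > 0) out[pos++] = '?';         /* innermost copy is optional */
  for (i = 0; i < opt - 1; i++) { out[pos++] = ')'; out[pos++] = '?'; }
  out[pos] = '\0';
  return 0;
}
```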
|
||||
|
||||
|
||||
Assertions
|
||||
----------
|
||||
|
||||
Forward assertions are just like other subpatterns, but starting with one of
|
||||
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
|
||||
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
|
||||
is OP_REVERSE, followed by a two-byte count of the number of characters to move
|
||||
back the pointer in the subject string. When operating in UTF-8 mode, the count
|
||||
is a character count rather than a byte count. A separate count is present in
|
||||
each alternative of a lookbehind assertion, allowing them to have different
|
||||
fixed lengths.
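In UTF-8 mode, moving back by a character count means skipping continuation bytes (those of the form 10xxxxxx). A minimal sketch, assuming a valid UTF-8 subject; utf8_back is a hypothetical helper, not the code PCRE uses:

```c
#include <assert.h>
#include <stddef.h>

/* Step back n characters from byte offset pos in a valid UTF-8
   subject. Returns the new byte offset, or -1 if the start of the
   subject is reached before n characters have been passed, in which
   case the lookbehind cannot be satisfied at this position. */
long utf8_back(const unsigned char *subject, size_t pos, unsigned int n)
{
  assert(subject != NULL);
  while (n-- > 0)
  {
    if (pos == 0) return -1;
    pos--;
    /* skip back over continuation bytes to the character's lead byte */
    while (pos > 0 && (subject[pos] & 0xc0) == 0x80) pos--;
  }
  return (long)pos;
}
```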
|
||||
|
||||
|
||||
Once-only subpatterns
|
||||
---------------------
|
||||
|
||||
These are also just like other subpatterns, but they start with the opcode
|
||||
OP_ONCE.
|
||||
|
||||
|
||||
Conditional subpatterns
|
||||
-----------------------
|
||||
|
||||
These are like other subpatterns, but they start with the opcode OP_COND. If
|
||||
the condition is a back reference, this is stored at the start of the
|
||||
subpattern using the opcode OP_CREF followed by two bytes containing the
|
||||
reference number. If the condition is "in recursion" (coded as "(?(R)"), the
|
||||
same scheme is used, with a "reference number" of 0xffff. Otherwise, a
|
||||
conditional subpattern always starts with one of the assertions.
|
||||
|
||||
|
||||
Recursion
|
||||
---------
|
||||
|
||||
Recursion either matches the current regex, or some subexpression. The opcode
|
||||
OP_RECURSE is followed by a value which is the offset to the starting bracket
|
||||
from the start of the whole pattern. From release 6.5, OP_RECURSE is
|
||||
automatically wrapped inside OP_ONCE brackets (because otherwise some patterns
|
||||
broke it). OP_RECURSE is also used for "subroutine" calls, even though they
|
||||
are not strictly a recursion.
|
||||
|
||||
|
||||
Callout
|
||||
-------
|
||||
|
||||
OP_CALLOUT is followed by one byte of data that holds a callout number in the
|
||||
range 0 to 254 for manual callouts, or 255 for an automatic callout. In both
|
||||
cases there follows a two-byte value giving the offset in the pattern to the
|
||||
start of the following item, and another two-byte item giving the length of the
|
||||
next item.
|
||||
|
||||
|
||||
Changing options
|
||||
----------------
|
||||
|
||||
If any of the /i, /m, or /s options are changed within a pattern, an OP_OPT
|
||||
opcode is compiled, followed by one byte containing the new settings of these
|
||||
flags. If there are several alternatives, there is an occurrence of OP_OPT at
|
||||
the start of all those following the first options change, to set appropriate
|
||||
options for the start of the alternative. Immediately after the end of the
|
||||
group there is another such item to reset the flags to their previous values. A
|
||||
change of flag right at the very start of the pattern can be handled entirely
|
||||
at compile time, and so does not cause anything to be put into the compiled
|
||||
data.
|
||||
|
||||
Philip Hazel
|
||||
June 2006
|
|
@ -0,0 +1,128 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>PCRE specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>Perl-compatible Regular Expressions (PCRE)</h1>
|
||||
<p>
|
||||
The HTML documentation for PCRE comprises the following pages:
|
||||
</p>
|
||||
|
||||
<table>
|
||||
<tr><td><a href="pcre.html">pcre</a></td>
|
||||
<td> Introductory page</td></tr>
|
||||
|
||||
<tr><td><a href="pcreapi.html">pcreapi</a></td>
|
||||
<td> PCRE's native API</td></tr>
|
||||
|
||||
<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
|
||||
<td> Options for building PCRE</td></tr>
|
||||
|
||||
<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
|
||||
<td> The <i>callout</i> facility</td></tr>
|
||||
|
||||
<tr><td><a href="pcrecompat.html">pcrecompat</a></td>
|
||||
<td> Compatibility with Perl</td></tr>
|
||||
|
||||
<tr><td><a href="pcrecpp.html">pcrecpp</a></td>
|
||||
<td> The C++ wrapper for the PCRE library</td></tr>
|
||||
|
||||
<tr><td><a href="pcregrep.html">pcregrep</a></td>
|
||||
<td> The <b>pcregrep</b> command</td></tr>
|
||||
|
||||
<tr><td><a href="pcrematching.html">pcrematching</a></td>
|
||||
<td> Discussion of the two matching algorithms</td></tr>
|
||||
|
||||
<tr><td><a href="pcrepartial.html">pcrepartial</a></td>
|
||||
<td> Using PCRE for partial matching</td></tr>
|
||||
|
||||
<tr><td><a href="pcrepattern.html">pcrepattern</a></td>
|
||||
<td> Specification of the regular expressions supported by PCRE</td></tr>
|
||||
|
||||
<tr><td><a href="pcreperform.html">pcreperform</a></td>
|
||||
<td> Some comments on performance</td></tr>
|
||||
|
||||
<tr><td><a href="pcreposix.html">pcreposix</a></td>
|
||||
<td> The POSIX API to the PCRE library</td></tr>
|
||||
|
||||
<tr><td><a href="pcreprecompile.html">pcreprecompile</a></td>
|
||||
<td> How to save and re-use compiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcresample.html">pcresample</a></td>
|
||||
<td> Description of the sample program</td></tr>
|
||||
|
||||
<tr><td><a href="pcrestack.html">pcrestack</a></td>
|
||||
<td> Discussion of PCRE's stack usage</td></tr>
|
||||
|
||||
<tr><td><a href="pcretest.html">pcretest</a></td>
|
||||
<td> The <b>pcretest</b> command for testing PCRE</td></tr>
|
||||
</table>
|
||||
|
||||
<p>
|
||||
There are also individual pages that summarize the interface for each function
|
||||
in the library:
|
||||
</p>
|
||||
|
||||
<table>
|
||||
|
||||
<tr><td><a href="pcre_compile.html">pcre_compile</a></td>
|
||||
<td> Compile a regular expression</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_compile2.html">pcre_compile2</a></td>
|
||||
<td> Compile a regular expression (alternate interface)</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_config.html">pcre_config</a></td>
|
||||
<td> Show build-time configuration options</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_copy_named_substring.html">pcre_copy_named_substring</a></td>
|
||||
<td> Extract named substring into given buffer</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_copy_substring.html">pcre_copy_substring</a></td>
|
||||
<td> Extract numbered substring into given buffer</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_dfa_exec.html">pcre_dfa_exec</a></td>
|
||||
<td> Match a compiled pattern to a subject string
|
||||
(DFA algorithm; <i>not</i> Perl compatible)</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
|
||||
<td> Match a compiled pattern to a subject string
|
||||
(Perl compatible)</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
|
||||
<td> Free extracted substring</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_free_substring_list.html">pcre_free_substring_list</a></td>
|
||||
<td> Free list of extracted substrings</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_fullinfo.html">pcre_fullinfo</a></td>
|
||||
<td> Extract information about a pattern</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_named_substring.html">pcre_get_named_substring</a></td>
|
||||
<td> Extract named substring into new memory</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
|
||||
<td> Convert captured string name to number</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
|
||||
<td> Extract numbered substring into new memory</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
|
||||
<td> Extract all substrings into new memory</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_info.html">pcre_info</a></td>
|
||||
<td> Obsolete information extraction function</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
|
||||
<td> Build character tables in current locale</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
|
||||
<td> Maintain reference count in compiled pattern</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_study.html">pcre_study</a></td>
|
||||
<td> Study a compiled pattern</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_version.html">pcre_version</a></td>
|
||||
<td> Return PCRE version and release date</td></tr>
|
||||
</table>
|
||||
|
||||
</html>
|
|
@ -0,0 +1,252 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
|
||||
<li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>
|
||||
<li><a name="TOC3" href="#SEC3">LIMITATIONS</a>
|
||||
<li><a name="TOC4" href="#SEC4">UTF-8 AND UNICODE PROPERTY SUPPORT</a>
|
||||
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
|
||||
<P>
|
||||
The PCRE library is a set of functions that implement regular expression
|
||||
pattern matching using the same syntax and semantics as Perl, with just a few
|
||||
differences. The current implementation of PCRE (release 6.x) corresponds
|
||||
approximately with Perl 5.8, including support for UTF-8 encoded strings and
|
||||
Unicode general category properties. However, this support has to be explicitly
|
||||
enabled; it is not the default.
|
||||
</P>
|
||||
<P>
|
||||
In addition to the Perl-compatible matching function, PCRE also contains an
|
||||
alternative matching function that matches the same compiled patterns in a
|
||||
different way. In certain circumstances, the alternative function has some
|
||||
advantages. For a discussion of the two matching algorithms, see the
|
||||
<a href="pcrematching.html"><b>pcrematching</b></a>
|
||||
page.
|
||||
</P>
|
||||
<P>
|
||||
PCRE is written in C and released as a C library. A number of people have
|
||||
written wrappers and interfaces of various kinds. In particular, Google Inc.
|
||||
have provided a comprehensive C++ wrapper. This is now included as part of the
|
||||
PCRE distribution. The
|
||||
<a href="pcrecpp.html"><b>pcrecpp</b></a>
|
||||
page has details of this interface. Other people's contributions can be found
|
||||
in the <i>Contrib</i> directory at the primary FTP site, which is:
|
||||
<a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>
|
||||
</P>
|
||||
<P>
|
||||
Details of exactly which Perl regular expression features are and are not
|
||||
supported by PCRE are given in separate documents. See the
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
and
|
||||
<a href="pcrecompat.html"><b>pcrecompat</b></a>
|
||||
pages.
|
||||
</P>
|
||||
<P>
|
||||
Some features of PCRE can be included, excluded, or changed when the library is
|
||||
built. The
|
||||
<a href="pcre_config.html"><b>pcre_config()</b></a>
|
||||
function makes it possible for a client to discover which features are
|
||||
available. The features themselves are described in the
|
||||
<a href="pcrebuild.html"><b>pcrebuild</b></a>
|
||||
page. Documentation about building PCRE for various operating systems can be
|
||||
found in the <b>README</b> file in the source distribution.
|
||||
</P>
|
||||
<P>
|
||||
The library contains a number of undocumented internal functions and data
|
||||
tables that are used by more than one of the exported external functions, but
|
||||
which are not intended for use by external callers. Their names all begin with
|
||||
"_pcre_", which hopefully will not provoke any name clashes. In some
|
||||
environments, it is possible to control which external symbols are exported
|
||||
when a shared library is built, and in these cases the undocumented symbols are
|
||||
not exported.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>
|
||||
<P>
|
||||
The user documentation for PCRE comprises a number of different sections. In
|
||||
the "man" format, each of these is a separate "man page". In the HTML format,
|
||||
each is a separate page, linked from the index page. In the plain text format,
|
||||
all the sections are concatenated, for ease of searching. The sections are as
|
||||
follows:
|
||||
<pre>
|
||||
pcre this document
|
||||
pcreapi details of PCRE's native C API
|
||||
pcrebuild options for building PCRE
|
||||
pcrecallout details of the callout feature
|
||||
pcrecompat discussion of Perl compatibility
|
||||
pcrecpp details of the C++ wrapper
|
||||
pcregrep description of the <b>pcregrep</b> command
|
||||
pcrematching discussion of the two matching algorithms
|
||||
pcrepartial details of the partial matching facility
|
||||
pcrepattern syntax and semantics of supported regular expressions
|
||||
pcreperform discussion of performance issues
|
||||
pcreposix the POSIX-compatible C API
|
||||
pcreprecompile details of saving and re-using precompiled patterns
|
||||
pcresample discussion of the sample program
|
||||
pcrestack discussion of stack usage
|
||||
pcretest description of the <b>pcretest</b> testing command
|
||||
</pre>
|
||||
In addition, in the "man" and HTML formats, there is a short page for each
|
||||
C library function, listing its arguments and results.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">LIMITATIONS</a><br>
|
||||
<P>
|
||||
There are some size limitations in PCRE, but it is hoped that they will never in
|
||||
practice be relevant.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
|
||||
compiled with the default internal linkage size of 2. If you want to process
|
||||
regular expressions that are truly enormous, you can compile PCRE with an
|
||||
internal linkage size of 3 or 4 (see the <b>README</b> file in the source
|
||||
distribution and the
|
||||
<a href="pcrebuild.html"><b>pcrebuild</b></a>
|
||||
documentation for details). In these cases the limit is substantially larger.
|
||||
However, the speed of execution will be slower.
|
||||
</P>
|
||||
<P>
|
||||
All values in repeating quantifiers must be less than 65536. The maximum
|
||||
compiled length of a subpattern with an explicit repeat count is 30000 bytes. The
|
||||
maximum number of capturing subpatterns is 65535.
|
||||
</P>
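<P>
To make the repeat-count limit concrete, here is a minimal sketch in C that checks whether every number in a single braced quantifier such as {2,5} is below 65536. The function quantifier_within_limit() is hypothetical and not part of the PCRE API; in practice <b>pcre_compile()</b> rejects out-of-range values itself and reports an error.
</P>

```c
#include <ctype.h>

/* Reads a run of decimal digits. Accumulation stops growing once the
   value reaches 65536, so very long digit strings cannot overflow. */
static const char *parse_count(const char *p, long *value)
{
  *value = 0;
  while (isdigit((unsigned char)*p)) {
    if (*value < 65536) *value = *value * 10 + (*p - '0');
    p++;
  }
  return p;
}

/* Hypothetical helper, not part of PCRE. Returns 1 if every number in a
   quantifier of the form "{m}", "{m,}" or "{m,n}" is below 65536, 0 if a
   number is too large, and -1 if the text is not of that form. */
int quantifier_within_limit(const char *q)
{
  long min, max = 0;            /* "{m,}" has no upper bound: max stays 0 */
  const char *p = q;

  if (*p++ != '{' || !isdigit((unsigned char)*p)) return -1;
  p = parse_count(p, &min);
  if (*p == ',') {
    p++;
    if (isdigit((unsigned char)*p)) p = parse_count(p, &max);
  }
  if (*p != '}' || p[1] != '\0') return -1;
  return (min < 65536 && max < 65536) ? 1 : 0;
}
```

<P>
For example, {2,70000} yields 0 and {2,} yields 1, while {,5} yields -1 because it is not one of the three quantifier forms the sketch recognizes.
</P>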
|
||||
<P>
|
||||
There is no limit to the number of non-capturing subpatterns, but the maximum
|
||||
depth of nesting of all kinds of parenthesized subpattern, including capturing
|
||||
subpatterns, assertions, and other types of subpattern, is 200.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of the name of a named subpattern is 32, and the maximum number
|
||||
of named subpatterns is 10000.
|
||||
</P>
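<P>
The two name limits can be expressed as a simple check. This is a sketch only: names_within_limits() is a hypothetical helper, not a PCRE function, and it assumes the caller has already extracted the subpattern names from the pattern.
</P>

```c
#include <string.h>

/* Hypothetical helper, not part of PCRE: checks a list of subpattern
   names against the documented limits (names of at most 32 characters,
   at most 10000 names). Returns 1 if both limits hold, otherwise 0. */
int names_within_limits(const char *const names[], size_t count)
{
  if (count > 10000) return 0;
  for (size_t i = 0; i < count; i++)
    if (strlen(names[i]) > 32) return 0;
  return 1;
}
```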
|
||||
<P>
|
||||
The maximum length of a subject string is the largest positive number that an
|
||||
integer variable can hold. However, when using the traditional matching
|
||||
function, PCRE uses recursion to handle subpatterns and indefinite repetition.
|
||||
This means that the available stack space may limit the size of a subject
|
||||
string that can be processed by certain patterns. For a discussion of stack
|
||||
issues, see the
|
||||
<a href="pcrestack.html"><b>pcrestack</b></a>
|
||||
documentation.
|
||||
<a name="utf8support"></a></P>
|
||||
<br><a name="SEC4" href="#TOC1">UTF-8 AND UNICODE PROPERTY SUPPORT</a><br>
|
||||
<P>
|
||||
From release 3.3, PCRE has had some support for character strings encoded in
|
||||
the UTF-8 format. For release 4.0 this was greatly extended to cover most
|
||||
common requirements, and in release 5.0 additional support for Unicode general
|
||||
category properties was added.
|
||||
</P>
|
||||
<P>
|
||||
In order to process UTF-8 strings, you must build PCRE to include UTF-8 support in
|
||||
the code, and, in addition, you must call
|
||||
<a href="pcre_compile.html"><b>pcre_compile()</b></a>
|
||||
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
|
||||
subject strings that are matched against it are treated as UTF-8 strings
|
||||
instead of just strings of bytes.
|
||||
</P>
|
||||
<P>
|
||||
If you compile PCRE with UTF-8 support, but do not use it at run time, the
|
||||
library will be a bit bigger, but the additional run time overhead is limited
|
||||
to testing the PCRE_UTF8 flag in several places, so should not be very large.
|
||||
</P>
|
||||
<P>
|
||||
If PCRE is built with Unicode character property support (which implies UTF-8
|
||||
support), the escape sequences \p{..}, \P{..}, and \X are supported.
|
||||
The available properties that can be tested are limited to the general
|
||||
category properties such as Lu for an upper case letter or Nd for a decimal
|
||||
number, the Unicode script names such as Arabic or Han, and the derived
|
||||
properties Any and L&. A full list is given in the
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
documentation. Only the short names for properties are supported. For example,
|
||||
\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
|
||||
Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
|
||||
compatibility with Perl 5.6. PCRE does not support this.
|
||||
</P>
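<P>
Because only short property names are accepted, a program that receives Perl-style patterns might translate long names before compiling them. The sketch below assumes such a preprocessing step is wanted. short_property_name() is hypothetical, and its table covers only a few general categories for illustration, so it is not a complete mapping; the "Is" stripping is also naive and assumes no property name you care about genuinely begins with "Is".
</P>

```c
#include <string.h>

/* Hypothetical preprocessing helper, not part of PCRE. Maps a few
   Perl-style long property names to PCRE's short names and drops an
   optional Perl 5.6 style "Is" prefix. Unknown names are returned as-is,
   on the assumption that they are already short names. */
const char *short_property_name(const char *name)
{
  static const struct { const char *long_name, *short_name; } map[] = {
    { "Letter",           "L"  },
    { "Uppercase_Letter", "Lu" },
    { "Lowercase_Letter", "Ll" },
    { "Number",           "N"  },
    { "Decimal_Number",   "Nd" },
    { "Punctuation",      "P"  },
  };

  if (strncmp(name, "Is", 2) == 0 && name[2] != '\0')
    name += 2;                                  /* "IsLu" -> "Lu" */
  for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
    if (strcmp(name, map[i].long_name) == 0)
      return map[i].short_name;
  return name;
}
```

<P>
With this helper, a caller would rewrite \p{Letter} as \p{L} before passing the pattern to <b>pcre_compile()</b>.
</P>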
|
||||
<P>
|
||||
The following comments apply when PCRE is running in UTF-8 mode:
|
||||
</P>
|
||||
<P>
|
||||
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
|
||||
are checked for validity on entry to the relevant functions. If an invalid
|
||||
UTF-8 string is passed, an error return is given. In some situations, you may
|
||||
already know that your strings are valid, and therefore want to skip these
|
||||
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
|
||||
at compile time or at run time, PCRE assumes that the pattern or subject it
|
||||
is given (respectively) contains only valid UTF-8 codes. In this case, it does
|
||||
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
|
||||
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
|
||||
may crash.
|
||||
</P>
|
||||
<P>
|
||||
2. An unbraced hexadecimal escape sequence (such as \xb3) matches a two-byte
|
||||
UTF-8 character if the value is greater than 127.
|
||||
</P>
|
||||
<P>
|
||||
3. Octal numbers up to \777 are recognized, and match two-byte UTF-8
|
||||
characters for values greater than \177.
|
||||
</P>
|
||||
<P>
|
||||
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
|
||||
bytes, for example: \x{100}{3}.
|
||||
</P>
|
||||
<P>
|
||||
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
|
||||
</P>
|
||||
<P>
|
||||
6. The escape sequence \C can be used to match a single byte in UTF-8 mode,
|
||||
but because it can match part of a multibyte character, its use can lead to some strange effects. This facility is not available in
|
||||
the alternative matching function, <b>pcre_dfa_exec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
|
||||
test characters of any code value, but the characters that PCRE recognizes as
|
||||
digits, spaces, or word characters remain the same set as before, all with
|
||||
values less than 256. This remains true even when PCRE includes Unicode
|
||||
property support, because to do otherwise would slow down PCRE in many common
|
||||
cases. If you really want to test for a wider sense of, say, "digit", you
|
||||
must use Unicode property tests such as \p{Nd}.
|
||||
</P>
|
||||
<P>
|
||||
8. Similarly, characters that match the POSIX named character classes are all
|
||||
low-valued characters.
|
||||
</P>
|
||||
<P>
|
||||
9. Case-insensitive matching applies only to characters whose values are less
|
||||
than 128, unless PCRE is built with Unicode property support. Even when Unicode
|
||||
property support is available, PCRE still uses its own character tables when
|
||||
checking the case of low-valued characters, so as not to degrade performance.
|
||||
The Unicode property information is used only for characters with higher
|
||||
values. Even when Unicode property support is available, PCRE supports
|
||||
case-insensitive matching only when there is a one-to-one mapping between a
|
||||
letter's cases. There are a small number of many-to-one mappings in Unicode;
|
||||
these are not supported by PCRE.
|
||||
</P>
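<P>
The byte-versus-character distinctions in points 2 to 5 follow from how UTF-8 encodes code points. The sketch below, in plain C and independent of PCRE, encodes a code point and counts the characters in a byte string. It shows that \xb3 and \777 each become two bytes, and that the three characters matched by \x{100}{3} occupy six bytes. The helper names are hypothetical, the encoder assumes the RFC 3629 range up to U+10FFFF, and the counting function performs only a simplified validity check, not the one PCRE uses internally.
</P>

```c
#include <stddef.h>

/* Encodes one code point (up to U+10FFFF) as UTF-8 into out, which must
   have room for 4 bytes. Returns the number of bytes written, or 0 for
   values outside the Unicode range. */
int utf8_encode(unsigned long cp, unsigned char *out)
{
  if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp < 0x110000) {
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}

/* Counts characters (not bytes), the unit that "." and repeat quantifiers
   work on in UTF-8 mode. Returns -1 for a truncated sequence, a stray
   continuation byte, or an invalid lead byte. Overlong forms are NOT
   rejected, so this is weaker than a full UTF-8 check. */
long utf8_count_chars(const unsigned char *s, size_t len)
{
  long count = 0;
  size_t i = 0;
  while (i < len) {
    unsigned char c = s[i];
    size_t extra;
    if (c < 0x80) extra = 0;
    else if ((c & 0xE0) == 0xC0) extra = 1;
    else if ((c & 0xF0) == 0xE0) extra = 2;
    else if ((c & 0xF8) == 0xF0) extra = 3;
    else return -1;
    if (extra >= len - i) return -1;            /* sequence runs off the end */
    for (size_t k = 1; k <= extra; k++)
      if ((s[i + k] & 0xC0) != 0x80) return -1; /* expected continuation byte */
    i += extra + 1;
    count++;
  }
  return count;
}
```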
|
||||
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service,
|
||||
<br>
|
||||
Cambridge CB2 3QG, England.
|
||||
</P>
|
||||
<P>
|
||||
Putting an actual email address here seems to have been a spam magnet, so I've
|
||||
taken it away. If you want to email me, use my initial and surname, separated
|
||||
by a dot, at the domain ucs.cam.ac.uk.<br>
|
||||
Last updated: 05 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,80 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_compile specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_compile man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function compiles a regular expression into an internal form. Its
|
||||
arguments are:
|
||||
<pre>
|
||||
<i>pattern</i> A zero-terminated string containing the
|
||||
regular expression to be compiled
|
||||
<i>options</i> Zero or more option bits
|
||||
<i>errptr</i> Where to put an error message
|
||||
<i>erroffset</i> Offset in pattern where error was found
|
||||
<i>tableptr</i> Pointer to character tables, or NULL to
|
||||
use the built-in default
|
||||
</pre>
|
||||
The option bits are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Force pattern anchoring
|
||||
PCRE_AUTO_CALLOUT Compile automatic callouts
|
||||
PCRE_CASELESS Do caseless matching
|
||||
PCRE_DOLLAR_ENDONLY $ does not match a newline at the end
|
||||
PCRE_DOTALL . matches anything including NL
|
||||
PCRE_DUPNAMES Allow duplicate names for subpatterns
|
||||
PCRE_EXTENDED Ignore whitespace and # comments
|
||||
PCRE_EXTRA PCRE extra features
|
||||
(not much use currently)
|
||||
PCRE_FIRSTLINE Match must start before or at the first newline
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_UNGREEDY Invert greediness of quantifiers
|
||||
PCRE_UTF8 Run in UTF-8 mode
|
||||
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
|
||||
validity (only relevant if
|
||||
PCRE_UTF8 is set)
|
||||
</pre>
|
||||
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
|
||||
PCRE_NO_UTF8_CHECK.
|
||||
</P>
|
||||
<P>
|
||||
The yield of the function is a pointer to a private data structure that
|
||||
contains the compiled pattern, or NULL if an error was detected.
|
||||
</P>
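<P>
When <b>pcre_compile()</b> returns NULL, the variable passed via <i>errptr</i> points to an error message and the one passed via <i>erroffset</i> holds the offset in the pattern where the error was found. A common way to present these is a caret under the failing position. The following is a sketch of that; format_compile_error() is a hypothetical helper, not part of PCRE, and the message text in its comment is only illustrative.
</P>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE: formats a compile error message
   and offset as a two-line diagnostic with a caret under the failing
   position, e.g. for pattern "ab(c" and offset 4:
     ab(c
         ^ missing )
   Returns the snprintf() result (the length the full text needs). */
int format_compile_error(char *buf, size_t size, const char *pattern,
                         const char *errmsg, int erroffset)
{
  int patlen = (int)strlen(pattern);
  if (erroffset < 0) erroffset = 0;            /* keep the caret in range */
  if (erroffset > patlen) erroffset = patlen;
  /* "%*s" pads an empty string to erroffset spaces so the caret lines up */
  return snprintf(buf, size, "%s\n%*s^ %s", pattern, erroffset, "", errmsg);
}
```

<P>
A caller would use this right after a failed compile, passing the message and offset variables whose addresses were given to <b>pcre_compile()</b>.
</P>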
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,85 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_compile2 specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_compile2 man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre *pcre_compile2(const char *<i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>int *<i>errorcodeptr</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function compiles a regular expression into an internal form. It is the
|
||||
same as <b>pcre_compile()</b>, except for the addition of the <i>errorcodeptr</i>
|
||||
argument. The arguments are:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
<i>pattern</i> A zero-terminated string containing the
|
||||
regular expression to be compiled
|
||||
<i>options</i> Zero or more option bits
|
||||
<i>errorcodeptr</i> Where to put an error code
|
||||
<i>errptr</i> Where to put an error message
|
||||
<i>erroffset</i> Offset in pattern where error was found
|
||||
<i>tableptr</i> Pointer to character tables, or NULL to
|
||||
use the built-in default
|
||||
</pre>
|
||||
The option bits are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Force pattern anchoring
|
||||
PCRE_AUTO_CALLOUT Compile automatic callouts
|
||||
PCRE_CASELESS Do caseless matching
|
||||
PCRE_DOLLAR_ENDONLY $ does not match a newline at the end
|
||||
PCRE_DOTALL . matches anything including NL
|
||||
PCRE_DUPNAMES Allow duplicate names for subpatterns
|
||||
PCRE_EXTENDED Ignore whitespace and # comments
|
||||
PCRE_EXTRA PCRE extra features
|
||||
(not much use currently)
|
||||
PCRE_FIRSTLINE Match must start before or at the first newline
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_UNGREEDY Invert greediness of quantifiers
|
||||
PCRE_UTF8 Run in UTF-8 mode
|
||||
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
|
||||
validity (only relevant if
|
||||
PCRE_UTF8 is set)
|
||||
</pre>
|
||||
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
|
||||
PCRE_NO_UTF8_CHECK.
|
||||
</P>
|
||||
<P>
|
||||
The yield of the function is a pointer to a private data structure that
|
||||
contains the compiled pattern, or NULL if an error was detected.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,62 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_config specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_config man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function makes it possible for a client program to find out which optional
|
||||
features are available in the version of the PCRE library it is using. Its
|
||||
arguments are as follows:
|
||||
<pre>
|
||||
<i>what</i> A code specifying what information is required
|
||||
<i>where</i> Points to where to put the data
|
||||
</pre>
|
||||
The available codes are:
|
||||
<pre>
|
||||
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
|
||||
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
|
||||
PCRE_CONFIG_MATCH_LIMIT_RECURSION
|
||||
Internal recursion depth limit
|
||||
PCRE_CONFIG_NEWLINE Value of the newline sequence
|
||||
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
|
||||
Threshold of return slots, above
|
||||
which <b>malloc()</b> is used by
|
||||
the POSIX API
|
||||
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
|
||||
PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
|
||||
PCRE_CONFIG_UNICODE_PROPERTIES
|
||||
Availability of Unicode property support
|
||||
(1=yes 0=no)
|
||||
</pre>
|
||||
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,53 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_copy_named_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_copy_named_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
|
||||
<b>char *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring, identified
|
||||
by name, into a given buffer. The arguments are:
|
||||
<pre>
|
||||
<i>code</i> Pattern that was successfully matched
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringname</i> Name of the required substring
|
||||
<i>buffer</i> Buffer to receive the string
|
||||
<i>buffersize</i> Size of buffer
|
||||
</pre>
|
||||
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
|
||||
</P>
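<P>
Conceptually, extracting a substring by name is a two-step process: translate the name into a subpattern number, then copy that numbered substring. The sketch below shows only the first step, against a hypothetical in-memory table. The real function uses the name table stored inside the compiled pattern, and SKETCH_ERROR_NOSUBSTRING is a stand-in for PCRE_ERROR_NOSUBSTRING.
</P>

```c
#include <string.h>

/* Stand-in for PCRE_ERROR_NOSUBSTRING; real code should use pcre.h. */
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Hypothetical name-to-number table, only to keep the sketch
   self-contained; PCRE stores its own table in the compiled pattern. */
struct name_entry { const char *name; int number; };

/* Returns the subpattern number for stringname, or the stand-in error if
   the name is unknown or the number is not below stringcount (the value
   returned by pcre_exec()). */
int lookup_named_substring(const struct name_entry *table, int entries,
                           const char *stringname, int stringcount)
{
  for (int i = 0; i < entries; i++)
    if (strcmp(table[i].name, stringname) == 0)
      return table[i].number < stringcount ? table[i].number
                                           : SKETCH_ERROR_NOSUBSTRING;
  return SKETCH_ERROR_NOSUBSTRING;
}
```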
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,51 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_copy_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_copy_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
|
||||
<b>int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring into a given
|
||||
buffer. The arguments are:
|
||||
<pre>
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringnumber</i> Number of the required substring
|
||||
<i>buffer</i> Buffer to receive the string
|
||||
<i>buffersize</i> Size of buffer
|
||||
</pre>
|
||||
The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
|
||||
</P>
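<P>
The substring offsets live in <i>ovector</i> as start/end pairs, so substring <i>n</i> runs from ovector[2n] up to, but not including, ovector[2n+1]; a subpattern that did not take part in the match has both offsets set to -1. The following sketch mirrors the behaviour described above for illustration only. copy_substring_sketch() and its error constants are stand-ins, not PCRE's own code, and real programs should call <b>pcre_copy_substring()</b>.
</P>

```c
#include <string.h>

/* Stand-ins for PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING; real code
   should use the constants from pcre.h. */
#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Copies substring stringnumber into buffer and zero-terminates it.
   Returns its length, or one of the stand-in error values. */
int copy_substring_sketch(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
  if (stringnumber < 0 || stringnumber >= stringcount)
    return SKETCH_ERROR_NOSUBSTRING;
  int start = ovector[2 * stringnumber];
  int length = ovector[2 * stringnumber + 1] - start;  /* 0 for an unset group */
  if (length + 1 > buffersize)          /* need room for the terminating NUL */
    return SKETCH_ERROR_NOMEMORY;
  if (length > 0)
    memcpy(buffer, subject + start, (size_t)length);
  buffer[length] = '\0';
  return length;
}
```

<P>
For a subject "2006-06-05" matched by a pattern such as (\d+)-(\d+)-(\d+), the offset pairs are 0-10, 0-4, 5-7, and 8-10, so copying substring 1 yields 4 and the string "2006".
</P>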
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,93 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_dfa_exec specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_dfa_exec man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b>int *<i>workspace</i>, int <i>wscount</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function matches a compiled regular expression against a given subject
|
||||
string, using a DFA matching algorithm (<i>not</i> Perl-compatible). Note that
|
||||
the main, Perl-compatible, matching function is <b>pcre_exec()</b>. The
|
||||
arguments for this function are:
|
||||
<pre>
|
||||
<i>code</i> Points to the compiled pattern
|
||||
<i>extra</i> Points to an associated <b>pcre_extra</b> structure,
|
||||
or is NULL
|
||||
<i>subject</i> Points to the subject string
|
||||
<i>length</i> Length of the subject string, in bytes
|
||||
<i>startoffset</i> Offset in bytes in the subject at which to
|
||||
start matching
|
||||
<i>options</i> Option bits
|
||||
<i>ovector</i> Points to a vector of ints for result offsets
|
||||
<i>ovecsize</i> Number of elements in the vector
|
||||
<i>workspace</i> Points to a vector of ints used as working space
|
||||
<i>wscount</i> Number of elements in the vector
|
||||
</pre>
|
||||
The options are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NOTBOL Subject is not the beginning of a line
|
||||
PCRE_NOTEOL Subject is not the end of a line
|
||||
PCRE_NOTEMPTY An empty string is not a valid match
|
||||
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
|
||||
validity (only relevant if PCRE_UTF8
|
||||
was set at compile time)
|
||||
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
|
||||
PCRE_DFA_SHORTEST Return only the shortest match
|
||||
PCRE_DFA_RESTART This is a restart after a partial match
|
||||
</pre>
|
||||
There are restrictions on what may appear in a pattern when the
|
||||
DFA matching algorithm is requested. Details are given in the
|
||||
<a href="pcrematching.html"><b>pcrematching</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
A <b>pcre_extra</b> structure contains the following fields:
|
||||
<pre>
|
||||
<i>flags</i> Bits indicating which fields are set
|
||||
<i>study_data</i> Opaque data from <b>pcre_study()</b>
|
||||
<i>match_limit</i> Limit on internal resource use
|
||||
<i>match_limit_recursion</i> Limit on internal recursion depth
|
||||
<i>callout_data</i> Opaque data passed back to callouts
|
||||
<i>tables</i> Points to character tables or is NULL
|
||||
</pre>
|
||||
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
|
||||
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
|
||||
PCRE_EXTRA_TABLES. For DFA matching, the <i>match_limit</i> and
|
||||
<i>match_limit_recursion</i> fields are not used, and must not be set.
|
||||
</P>
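<P>
The <i>flags</i> field is what tells the matcher which of the other fields are valid, so setting a field means setting its flag as well. The sketch below illustrates that pattern and the DFA rule about the two limit fields. It uses a hypothetical stand-in struct and made-up flag values; the real <b>pcre_extra</b> type and PCRE_EXTRA_* constants come from <b>pcre.h</b>, and their values and field order may differ.
</P>

```c
/* Hypothetical flag values, for illustration only. */
#define SKETCH_EXTRA_STUDY_DATA            0x0001u
#define SKETCH_EXTRA_MATCH_LIMIT           0x0002u
#define SKETCH_EXTRA_CALLOUT_DATA          0x0004u
#define SKETCH_EXTRA_TABLES                0x0008u
#define SKETCH_EXTRA_MATCH_LIMIT_RECURSION 0x0010u

/* Stand-in with the fields listed above; not the real pcre_extra. */
struct extra_sketch {
  unsigned long flags;                 /* which of the fields below are set */
  void *study_data;
  unsigned long match_limit;
  unsigned long match_limit_recursion;
  void *callout_data;
  const unsigned char *tables;
};

/* Sets a field together with the flag bit that marks it as valid. */
void extra_set_callout_data(struct extra_sketch *e, void *data)
{
  e->callout_data = data;
  e->flags |= SKETCH_EXTRA_CALLOUT_DATA;
}

/* DFA matching does not use the two limit fields, which must not be set. */
int extra_ok_for_dfa(const struct extra_sketch *e)
{
  return (e->flags & (SKETCH_EXTRA_MATCH_LIMIT |
                      SKETCH_EXTRA_MATCH_LIMIT_RECURSION)) == 0;
}
```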
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,84 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_exec specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_exec man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function matches a compiled regular expression against a given subject
|
||||
string, using a matching algorithm that is similar to Perl's. It returns
|
||||
offsets to captured substrings. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> Points to the compiled pattern
|
||||
<i>extra</i> Points to an associated <b>pcre_extra</b> structure,
|
||||
or is NULL
|
||||
<i>subject</i> Points to the subject string
|
||||
<i>length</i> Length of the subject string, in bytes
|
||||
<i>startoffset</i> Offset in bytes in the subject at which to
|
||||
start matching
|
||||
<i>options</i> Option bits
|
||||
<i>ovector</i> Points to a vector of ints for result offsets
|
||||
<i>ovecsize</i> Number of elements in the vector (a multiple of 3)
|
||||
</pre>
|
||||
The options are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NOTBOL Subject is not the beginning of a line
|
||||
PCRE_NOTEOL Subject is not the end of a line
|
||||
PCRE_NOTEMPTY An empty string is not a valid match
|
||||
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
|
||||
validity (only relevant if PCRE_UTF8
|
||||
was set at compile time)
|
||||
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
|
||||
</pre>
|
||||
There are restrictions on what may appear in a pattern when partial matching is
|
||||
requested.
|
||||
</P>
|
||||
<P>
|
||||
A <b>pcre_extra</b> structure contains the following fields:
|
||||
<pre>
|
||||
<i>flags</i> Bits indicating which fields are set
|
||||
<i>study_data</i> Opaque data from <b>pcre_study()</b>
|
||||
<i>match_limit</i> Limit on internal resource use
|
||||
<i>match_limit_recursion</i> Limit on internal recursion depth
|
||||
<i>callout_data</i> Opaque data passed back to callouts
|
||||
<i>tables</i> Points to character tables or is NULL
|
||||
</pre>
|
||||
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
|
||||
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
|
||||
PCRE_EXTRA_TABLES.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,40 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_free_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_free_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>void pcre_free_substring(const char *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to <b>pcre_get_substring()</b> or <b>pcre_get_named_substring()</b>. Its
|
||||
only argument is a pointer to the string.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,40 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_free_substring_list specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_free_substring_list man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>void pcre_free_substring_list(const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to <b>pcre_get_substring_list()</b>. Its only argument is a pointer to the
|
||||
list of string pointers.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,71 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_fullinfo specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_fullinfo man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>int <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function returns information about a compiled pattern. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>extra</i> Result of <b>pcre_study()</b> or NULL
|
||||
<i>what</i> What information is required
|
||||
<i>where</i> Where to put the information
|
||||
</pre>
|
||||
The following information is available:
|
||||
<pre>
|
||||
PCRE_INFO_BACKREFMAX Number of highest back reference
|
||||
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
|
||||
PCRE_INFO_DEFAULT_TABLES Pointer to default tables
|
||||
PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
|
||||
-1 for start of string
|
||||
or after newline, or
|
||||
-2 otherwise
|
||||
PCRE_INFO_FIRSTTABLE Table of first bytes
|
||||
(after studying)
|
||||
PCRE_INFO_LASTLITERAL Literal last byte required
|
||||
PCRE_INFO_NAMECOUNT Number of named subpatterns
|
||||
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
|
||||
PCRE_INFO_NAMETABLE Pointer to name table
|
||||
PCRE_INFO_OPTIONS Options used for compilation
|
||||
PCRE_INFO_SIZE Size of compiled pattern
|
||||
PCRE_INFO_STUDYSIZE Size of study data
|
||||
</pre>
|
||||
The yield of the function is zero on success or:
|
||||
<pre>
|
||||
PCRE_ERROR_NULL the argument <i>code</i> was NULL, or
the argument <i>where</i> was NULL
|
||||
PCRE_ERROR_BADMAGIC the "magic number" was not found
|
||||
PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,54 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_get_named_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_named_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
|
||||
<b>const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring by name. The
|
||||
arguments are:
|
||||
<pre>
|
||||
<i>code</i> Compiled pattern
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringname</i> Name of the required substring
|
||||
<i>stringptr</i> Where to put the string pointer
|
||||
</pre>
|
||||
The memory in which the substring is placed is obtained by calling
|
||||
<b>pcre_malloc()</b>. The yield of the function is the length of the extracted
|
||||
substring, PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
||||
PCRE_ERROR_NOSUBSTRING if the string name is invalid.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,46 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_get_stringnumber specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_stringnumber man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>name</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This convenience function finds the number of a named substring capturing
|
||||
parenthesis in a compiled pattern. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>name</i> Name whose number is required
|
||||
</pre>
|
||||
The yield of the function is the number of the parenthesis if the name is
|
||||
found, or PCRE_ERROR_NOSUBSTRING otherwise.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,52 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_get_stringtable_entries specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_stringtable_entries man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_stringtable_entries(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>name</i>, char **<i>first</i>, char **<i>last</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This convenience function finds, for a compiled pattern, the first and last
|
||||
entries for a given name in the table that translates capturing parenthesis
|
||||
names into numbers. When names are required to be unique (PCRE_DUPNAMES is
|
||||
<i>not</i> set), it is usually easier to use <b>pcre_get_stringnumber()</b>
|
||||
instead.
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>name</i> Name whose entries are required
|
||||
<i>first</i> Where to return a pointer to the first entry
|
||||
<i>last</i> Where to return a pointer to the last entry
|
||||
</pre>
|
||||
The yield of the function is the length of each entry, or
|
||||
PCRE_ERROR_NOSUBSTRING if none are found.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API, including the format of
|
||||
the table entries, in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,52 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_get_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b>const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring. The
|
||||
arguments are:
|
||||
<pre>
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringnumber</i> Number of the required substring
|
||||
<i>stringptr</i> Where to put the string pointer
|
||||
</pre>
|
||||
The memory in which the substring is placed is obtained by calling
|
||||
<b>pcre_malloc()</b>. The yield of the function is the length of the substring,
|
||||
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
||||
PCRE_ERROR_NOSUBSTRING if the string number is invalid.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,51 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_get_substring_list specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_substring_list man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
|
||||
<b>int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a list of all the captured
|
||||
substrings. The arguments are:
|
||||
<pre>
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>listptr</i> Where to put a pointer to the list
|
||||
</pre>
|
||||
The memory in which the substrings and the list are placed is obtained by
|
||||
calling <b>pcre_malloc()</b>. A pointer to a list of pointers is put in
|
||||
the variable whose address is in <i>listptr</i>. The list is terminated by a
|
||||
NULL pointer. The yield of the function is zero on success or
|
||||
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,39 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_info specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_info man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b>
|
||||
<b>*<i>firstcharptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function is obsolete. You should be using <b>pcre_fullinfo()</b> instead.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,42 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_maketables specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_maketables man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>const unsigned char *pcre_maketables(void);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function builds a set of character tables for character values less than
|
||||
256. These can be passed to <b>pcre_compile()</b> to override PCRE's internal,
|
||||
built-in tables (which were made by <b>pcre_maketables()</b> when PCRE was
|
||||
compiled). You might want to do this if you are using a non-standard locale.
|
||||
The function yields a pointer to the tables.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,45 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_refcount specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_refcount man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_refcount(pcre *<i>code</i>, int <i>adjust</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function is used to maintain a reference count inside a data block that
|
||||
contains a compiled pattern. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>adjust</i> Adjustment to reference value
|
||||
</pre>
|
||||
The yield of the function is the adjusted reference value, which is constrained
|
||||
to lie between 0 and 65535.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,56 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_study specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_study man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function studies a compiled pattern, to see if additional information can
|
||||
be extracted that might speed up matching. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> A compiled regular expression
|
||||
<i>options</i> Options for <b>pcre_study()</b>
|
||||
<i>errptr</i> Where to put an error message
|
||||
</pre>
|
||||
If the function succeeds, it returns a value that can be passed to
|
||||
<b>pcre_exec()</b> via its <i>extra</i> argument.
|
||||
</P>
|
||||
<P>
|
||||
If the function returns NULL, either it could not find any additional
|
||||
information, or there was an error. You can tell the difference by looking at
|
||||
the value placed in <i>errptr</i>, which is NULL in the first case.
|
||||
</P>
|
||||
<P>
|
||||
There are currently no options defined; the value of the second argument should
|
||||
always be zero.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,39 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcre_version specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_version man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>char *pcre_version(void);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function returns a character string that gives the version number of the
|
||||
PCRE library and the date of its release.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
File diff suppressed because it is too large
|
@ -0,0 +1,225 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcrebuild specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrebuild man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
|
||||
<li><a name="TOC2" href="#SEC2">C++ SUPPORT</a>
|
||||
<li><a name="TOC3" href="#SEC3">UTF-8 SUPPORT</a>
|
||||
<li><a name="TOC4" href="#SEC4">UNICODE CHARACTER PROPERTY SUPPORT</a>
|
||||
<li><a name="TOC5" href="#SEC5">CODE VALUE OF NEWLINE</a>
|
||||
<li><a name="TOC6" href="#SEC6">BUILDING SHARED AND STATIC LIBRARIES</a>
|
||||
<li><a name="TOC7" href="#SEC7">POSIX MALLOC USAGE</a>
|
||||
<li><a name="TOC8" href="#SEC8">HANDLING VERY LARGE PATTERNS</a>
|
||||
<li><a name="TOC9" href="#SEC9">AVOIDING EXCESSIVE STACK USAGE</a>
|
||||
<li><a name="TOC10" href="#SEC10">LIMITING PCRE RESOURCE USAGE</a>
|
||||
<li><a name="TOC11" href="#SEC11">USING EBCDIC CODE</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
|
||||
<P>
|
||||
This document describes the optional features of PCRE that can be selected when
|
||||
the library is compiled. They are all selected, or deselected, by providing
|
||||
options to the <b>configure</b> script that is run before the <b>make</b>
|
||||
command. The complete list of options for <b>configure</b> (which includes the
|
||||
standard ones such as the selection of the installation directory) can be
|
||||
obtained by running
|
||||
<pre>
|
||||
./configure --help
|
||||
</pre>
|
||||
The following sections describe certain options whose names begin with --enable
|
||||
or --disable. These settings specify changes to the defaults for the
|
||||
<b>configure</b> command. Because of the way that <b>configure</b> works,
|
||||
--enable and --disable always come in pairs, so the complementary option always
|
||||
exists as well, but as it specifies the default, it is not described.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">C++ SUPPORT</a><br>
|
||||
<P>
|
||||
By default, the <b>configure</b> script will search for a C++ compiler and C++
|
||||
header files. If it finds them, it automatically builds the C++ wrapper library
|
||||
for PCRE. You can disable this by adding
|
||||
<pre>
|
||||
--disable-cpp
|
||||
</pre>
|
||||
to the <b>configure</b> command.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">UTF-8 SUPPORT</a><br>
|
||||
<P>
|
||||
To build PCRE with support for UTF-8 character strings, add
|
||||
<pre>
|
||||
--enable-utf8
|
||||
</pre>
|
||||
to the <b>configure</b> command. Of itself, this does not make PCRE treat
|
||||
strings as UTF-8. As well as compiling PCRE with this option, you also have
to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
|
||||
function.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
|
||||
<P>
|
||||
UTF-8 support allows PCRE to process character values greater than 255 in the
|
||||
strings that it handles. On its own, however, it does not provide any
|
||||
facilities for accessing the properties of such characters. If you want to be
|
||||
able to use the pattern escapes \P, \p, and \X, which refer to Unicode
|
||||
character properties, you must add
|
||||
<pre>
|
||||
--enable-unicode-properties
|
||||
</pre>
|
||||
to the <b>configure</b> command. This implies UTF-8 support, even if you have
|
||||
not explicitly requested it.
|
||||
</P>
|
||||
<P>
|
||||
Including Unicode property support adds around 90K of tables to the PCRE
|
||||
library, approximately doubling its size. Only the general category properties
|
||||
such as <i>Lu</i> and <i>Nd</i> are supported. Details are given in the
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
|
||||
<P>
|
||||
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
|
||||
of a line. This is the normal newline character on Unix-like systems. You can
|
||||
compile PCRE to use character 13 (carriage return, CR) instead, by adding
|
||||
<pre>
|
||||
--enable-newline-is-cr
|
||||
</pre>
|
||||
to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
|
||||
which explicitly specifies linefeed as the newline character.
|
||||
<br>
|
||||
<br>
|
||||
Alternatively, you can specify that line endings are to be indicated by the
two-character sequence CRLF. If you want this, add
|
||||
<pre>
|
||||
--enable-newline-is-crlf
|
||||
</pre>
|
||||
to the <b>configure</b> command. Whatever line ending convention is selected
|
||||
when PCRE is built can be overridden when the library functions are called. At
|
||||
build time it is conventional to use the standard for your operating system.
|
||||
</P>
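<P>
The difference between the three conventions can be illustrated with a small, self-contained C++ sketch. It is not PCRE code: the <i>Newline</i> enum and the <i>find_newline()</i> function are hypothetical names chosen for this example. It locates the first line ending in a subject under each setting, so for "ab\r\ncd" the line ending is found at offset 3 with LF, at offset 2 with CR, and as a two-byte sequence at offset 2 with CRLF.
</P>

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for the three build-time newline conventions.
enum class Newline { LF, CR, CRLF };

// Returns the offset of the first line ending at or after `from`, or
// std::string::npos if there is none. On return, `len` holds the length of
// the line-ending sequence: 1 for LF or CR, 2 for CRLF.
std::size_t find_newline(const std::string& s, std::size_t from, Newline nl,
                         std::size_t& len) {
  switch (nl) {
    case Newline::LF:
      len = 1;
      return s.find('\n', from);
    case Newline::CR:
      len = 1;
      return s.find('\r', from);
    case Newline::CRLF:
      len = 2;
      return s.find("\r\n", from);
  }
  len = 0;  // not reached for valid enum values
  return std::string::npos;
}
```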
|
||||
<br><a name="SEC6" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
|
||||
<P>
|
||||
The PCRE building process uses <b>libtool</b> to build both shared and static
|
||||
Unix libraries by default. You can suppress one of these by adding one of
|
||||
<pre>
|
||||
--disable-shared
|
||||
--disable-static
|
||||
</pre>
|
||||
to the <b>configure</b> command, as required.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">POSIX MALLOC USAGE</a><br>
|
||||
<P>
|
||||
When PCRE is called through the POSIX interface (see the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
documentation), additional working storage is required for holding the offsets
|
||||
of captured substrings, because PCRE requires three integers per substring,
|
||||
whereas the POSIX interface provides only two. If the number of expected
|
||||
substrings is small, the wrapper function uses space on the stack, because this
|
||||
is faster than using <b>malloc()</b> for each call. The default threshold above
|
||||
which the stack is no longer used is 10; it can be changed by adding a setting
|
||||
such as
|
||||
<pre>
|
||||
--with-posix-malloc-threshold=20
|
||||
</pre>
|
||||
to the <b>configure</b> command.
|
||||
</P>
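<P>
The following C++ sketch shows the idea behind the threshold. It is not the code of the <b>pcreposix</b> wrapper, and the names <i>OffsetVector</i> and <i>kPosixMallocThreshold</i> are hypothetical. The wrapper needs an int vector three times the number of expected substrings (PCRE uses the first two thirds for start/end offsets and the last third as workspace), and it falls back to the heap only when the request exceeds a fixed-size array sized by the threshold.
</P>

```cpp
#include <cstddef>
#include <memory>

// Hypothetical constant standing in for --with-posix-malloc-threshold.
constexpr std::size_t kPosixMallocThreshold = 10;

// Working vector for a POSIX-style wrapper: PCRE needs 3 ints per expected
// substring (two thirds for start/end offsets, one third as workspace).
class OffsetVector {
 public:
  explicit OffsetVector(std::size_t nmatch) : size_(nmatch * 3) {
    if (nmatch > kPosixMallocThreshold) {
      heap_.reset(new int[size_]);  // large request: one heap allocation
      data_ = heap_.get();
    } else {
      data_ = stack_;  // small request: no malloc() at all
    }
  }
  // data_ may point into stack_, so copying would leave a dangling pointer.
  OffsetVector(const OffsetVector&) = delete;
  OffsetVector& operator=(const OffsetVector&) = delete;

  int* data() { return data_; }
  std::size_t size() const { return size_; }
  bool on_heap() const { return heap_ != nullptr; }

 private:
  int stack_[kPosixMallocThreshold * 3];
  std::unique_ptr<int[]> heap_;
  int* data_ = nullptr;
  std::size_t size_;
};
```

With the default threshold of 10, <i>OffsetVector v(5)</i> uses the embedded array, while <i>OffsetVector v(11)</i> makes one heap allocation that is released when <i>v</i> goes out of scope.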
|
||||
<br><a name="SEC8" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
|
||||
<P>
|
||||
Within a compiled pattern, offset values are used to point from one part to
|
||||
another (for example, from an opening parenthesis to an alternation
|
||||
metacharacter). By default, two-byte values are used for these offsets, leading
|
||||
to a maximum size for a compiled pattern of around 64K. This is sufficient to
|
||||
handle all but the most gigantic patterns. Nevertheless, some people do want to
|
||||
process enormous patterns, so it is possible to compile PCRE to use three-byte
|
||||
or four-byte offsets by adding a setting such as
|
||||
<pre>
|
||||
--with-link-size=3
|
||||
</pre>
|
||||
to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
|
||||
longer offsets slows down the operation of PCRE because it has to load
|
||||
additional bytes when handling them.
|
||||
</P>
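<P>
To see why larger links cost extra work, here is a minimal C++ sketch of storing and loading an n-byte offset. The most-significant-byte-first layout and the function names are assumptions of this sketch, not a description of PCRE's internal macros. Each additional byte of link size means one more load and shift per offset, and raises the largest representable value from 65,535 (2 bytes) to 16,777,215 (3 bytes) or 4,294,967,295 (4 bytes); PCRE itself may impose lower practical limits.
</P>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest value an unsigned link of `link_size` bytes can hold (2, 3 or 4).
constexpr std::uint64_t max_link_value(int link_size) {
  return (std::uint64_t{1} << (8 * link_size)) - 1;
}

// Store `value` at code[pos..pos+link_size-1], most significant byte first.
void put_link(std::vector<unsigned char>& code, std::size_t pos,
              std::uint32_t value, int link_size) {
  for (int i = link_size - 1; i >= 0; --i) {
    code[pos + i] = static_cast<unsigned char>(value & 0xFF);
    value >>= 8;
  }
}

// Load it back: each extra byte of link size costs one more load and shift.
std::uint32_t get_link(const std::vector<unsigned char>& code, std::size_t pos,
                       int link_size) {
  std::uint32_t value = 0;
  for (int i = 0; i < link_size; ++i) {
    value = (value << 8) | code[pos + i];
  }
  return value;
}
```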
|
||||
<P>
|
||||
If you build PCRE with an increased link size, test 2 (and test 5 if you are
|
||||
using UTF-8) will fail. Part of the output of these tests is a representation
|
||||
of the compiled pattern, and this changes with the link size.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
|
||||
<P>
|
||||
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
|
||||
by making recursive calls to an internal function called <b>match()</b>. In
|
||||
environments where the size of the stack is limited, this can severely limit
|
||||
PCRE's operation. (The Unix environment does not usually suffer from this
|
||||
problem, but it may sometimes be necessary to increase the maximum stack size.
|
||||
There is a discussion in the
|
||||
<a href="pcrestack.html"><b>pcrestack</b></a>
|
||||
documentation.) An alternative approach to recursion that uses memory from the
|
||||
heap to remember data, instead of using recursive function calls, has been
|
||||
implemented to work round the problem of limited stack size. If you want to
|
||||
build a version of PCRE that works this way, add
|
||||
<pre>
|
||||
--disable-stack-for-recursion
|
||||
</pre>
|
||||
to the <b>configure</b> command. With this configuration, PCRE will use the
|
||||
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
|
||||
management functions. Separate functions are provided because the usage is very
|
||||
predictable: the block sizes requested are always the same, and the blocks are
|
||||
always freed in reverse order. A calling program might be able to implement
|
||||
optimized functions that perform better than the standard <b>malloc()</b> and
|
||||
<b>free()</b> functions. PCRE runs noticeably more slowly when built in this
|
||||
way. This option affects only the <b>pcre_exec()</b> function; it is not
|
||||
relevant for the <b>pcre_dfa_exec()</b> function.
|
||||
</P>
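<P>
Because the blocks are all the same size and are freed in reverse order, a caller can cache freed blocks on a simple stack and hand them out again without calling <b>malloc()</b>. The C++ sketch below is one possible approach, not code shipped with PCRE; the <i>lifo_pool</i> namespace and its functions are hypothetical. Its two entry points have the same shape as the functions the <b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables point to: one takes a size and returns a pointer, the other takes a pointer.
</P>

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

namespace lifo_pool {

// Header stored before every block; alignas keeps the payload suitably
// aligned for any type.
struct alignas(std::max_align_t) Header {
  std::size_t size;
};

std::vector<Header*> cache;  // freed blocks, most recently freed last

void* stack_malloc(std::size_t size) {
  // Reuse the most recently freed block when its size matches, which is the
  // common case given fixed-size requests and reverse-order freeing.
  if (!cache.empty() && cache.back()->size == size) {
    Header* h = cache.back();
    cache.pop_back();
    return h + 1;
  }
  void* raw = std::malloc(sizeof(Header) + size);
  if (raw == nullptr) return nullptr;
  Header* h = new (raw) Header{size};
  return h + 1;
}

void stack_free(void* p) {
  // Keep the block for reuse instead of returning it to the system.
  if (p != nullptr) cache.push_back(static_cast<Header*>(p) - 1);
}

// Return all cached blocks to the system, e.g. at program shutdown.
void release_all() {
  for (Header* h : cache) std::free(h);
  cache.clear();
}

}  // namespace lifo_pool
```

In a library built with --disable-stack-for-recursion, these functions could be installed before matching with assignments such as <i>pcre_stack_malloc = lifo_pool::stack_malloc;</i> and <i>pcre_stack_free = lifo_pool::stack_free;</i>. The sketch is not thread-safe, and <i>release_all()</i> should only be called when no match is in progress.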
|
||||
<br><a name="SEC10" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
|
||||
<P>
|
||||
Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
|
||||
(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
|
||||
function. By controlling the maximum number of times this function may be
|
||||
called during a single matching operation, a limit can be placed on the
|
||||
resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
|
||||
at run time, as described in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation. The default is 10 million, but this can be changed by adding a
|
||||
setting such as
|
||||
<pre>
|
||||
--with-match-limit=500000
|
||||
</pre>
|
||||
to the <b>configure</b> command. This setting has no effect on the
|
||||
<b>pcre_dfa_exec()</b> matching function.
|
||||
</P>
|
||||
<P>
|
||||
In some environments it is desirable to limit the depth of recursive calls of
|
||||
<b>match()</b> more strictly than the total number of calls, in order to
|
||||
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
|
||||
is specified) that is used. A second limit controls this; it defaults to the
|
||||
value that is set for --with-match-limit, which imposes no additional
|
||||
constraints. However, you can set a lower limit by adding, for example,
|
||||
<pre>
|
||||
--with-match-limit-recursion=10000
|
||||
</pre>
|
||||
to the <b>configure</b> command. This value can also be overridden at run time.
|
||||
</P>
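<P>
The effect of the two limits can be demonstrated with a toy backtracking matcher. The C++ sketch below is a variant of Rob Pike's small matcher supporting literal characters, "." and "*"; it is not PCRE's <b>match()</b> function, and the <i>Limits</i> structure is hypothetical. Every call of <i>match_here()</i> counts against a total budget (the analogue of --with-match-limit), and the current nesting depth is checked against a second budget (the analogue of --with-match-limit-recursion). When either is exceeded, the whole match is abandoned and -1 is returned, in the spirit of <b>pcre_exec()</b> reporting an error rather than a plain "no match".
</P>

```cpp
// Toy backtracking matcher (after Rob Pike) with two resource budgets.
// Supports literal characters, '.', and 'x*'. Not PCRE's match().
struct Limits {
  unsigned long match_limit;      // total calls allowed
  unsigned long recursion_limit;  // maximum nesting depth allowed
  unsigned long calls = 0;
  unsigned long depth = 0;
  bool exceeded = false;
  Limits(unsigned long total, unsigned long nesting)
      : match_limit(total), recursion_limit(nesting) {}
};

bool match_here(const char* re, const char* text, Limits& lim);

// Try "c*" followed by re, consuming zero or more c's (any char for '.').
bool match_star(char c, const char* re, const char* text, Limits& lim) {
  do {
    if (match_here(re, text, lim)) return true;
    if (lim.exceeded) return false;
  } while (*text != '\0' && (*text++ == c || c == '.'));
  return false;
}

bool match_here(const char* re, const char* text, Limits& lim) {
  // Charge this call against both budgets before doing any work.
  if (++lim.calls > lim.match_limit || lim.depth + 1 > lim.recursion_limit) {
    lim.exceeded = true;
    return false;
  }
  ++lim.depth;
  bool result;
  if (re[0] == '\0') {
    result = true;
  } else if (re[1] == '*') {
    result = match_star(re[0], re + 2, text, lim);
  } else if (*text != '\0' && (re[0] == '.' || re[0] == *text)) {
    result = match_here(re + 1, text + 1, lim);
  } else {
    result = false;
  }
  --lim.depth;
  return result;
}

// Unanchored search: 1 = match, 0 = no match, -1 = a limit was exceeded.
int match(const char* re, const char* text, Limits& lim) {
  do {
    if (match_here(re, text, lim)) return 1;
    if (lim.exceeded) return -1;
  } while (*text++ != '\0');
  return 0;
}
```

For example, with a call budget of 3, <i>match("a*a*a*b", "aaaaaaaaaa", lim)</i> gives up on the fourth call and returns -1, whereas with a generous budget it eventually reports no match (0).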
|
||||
<br><a name="SEC11" href="#TOC1">USING EBCDIC CODE</a><br>
|
||||
<P>
|
||||
PCRE assumes by default that it will run in an environment where the character
|
||||
code is ASCII (or Unicode, which is a superset of ASCII). PCRE can, however, be
|
||||
compiled to run in an EBCDIC environment by adding
|
||||
<pre>
|
||||
--enable-ebcdic
|
||||
</pre>
|
||||
to the <b>configure</b> command.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 06 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,186 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcrecallout specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrecallout man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PCRE CALLOUTS</a>
|
||||
<li><a name="TOC2" href="#SEC2">MISSING CALLOUTS</a>
|
||||
<li><a name="TOC3" href="#SEC3">THE CALLOUT INTERFACE</a>
|
||||
<li><a name="TOC4" href="#SEC4">RETURN VALUES</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE CALLOUTS</a><br>
|
||||
<P>
|
||||
<b>int (*pcre_callout)(pcre_callout_block *);</b>
|
||||
</P>
|
||||
<P>
|
||||
PCRE provides a feature called "callout", which is a means of temporarily
|
||||
passing control to the caller of PCRE in the middle of pattern matching. The
|
||||
caller of PCRE provides an external function by putting its entry point in the
|
||||
global variable <i>pcre_callout</i>. By default, this variable contains NULL,
|
||||
which disables all calling out.
|
||||
</P>
|
||||
<P>
|
||||
Within a regular expression, (?C) indicates the points at which the external
|
||||
function is to be called. Different callout points can be identified by putting
|
||||
a number less than 256 after the letter C. The default value is zero.
|
||||
For example, this pattern has two callout points:
|
||||
<pre>
|
||||
(?C1)\deabc(?C2)def
|
||||
</pre>
|
||||
If the PCRE_AUTO_CALLOUT option bit is set when <b>pcre_compile()</b> is called,
|
||||
PCRE automatically inserts callouts, all with number 255, before each item in
|
||||
the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
|
||||
<pre>
|
||||
A(\d{2}|--)
|
||||
</pre>
|
||||
it is processed as if it were
|
||||
<br>
|
||||
<br>
|
||||
(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
|
||||
<br>
|
||||
<br>
|
||||
Notice that there is a callout before and after each parenthesis and
|
||||
alternation bar. Automatic callouts can be used for tracking the progress of
|
||||
pattern matching. The
|
||||
<a href="pcretest.html"><b>pcretest</b></a>
|
||||
command has an option that sets automatic callouts; when it is used, the output
|
||||
indicates how the pattern is matched. This is useful information when you are
|
||||
trying to optimize the performance of a particular pattern.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">MISSING CALLOUTS</a><br>
|
||||
<P>
|
||||
You should be aware that, because of optimizations in the way PCRE matches
|
||||
patterns, callouts sometimes do not happen. For example, if the pattern is
|
||||
<pre>
|
||||
ab(?C4)cd
|
||||
</pre>
|
||||
PCRE knows that any matching string must contain the letter "d". If the subject
|
||||
string is "abyz", the lack of "d" means that matching doesn't ever start, and
|
||||
the callout is never reached. However, with "abyd", though the result is still
|
||||
no match, the callout is obeyed.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">THE CALLOUT INTERFACE</a><br>
|
||||
<P>
|
||||
During matching, when PCRE reaches a callout point, the external function
|
||||
defined by <i>pcre_callout</i> is called (if it is set). This applies to both
|
||||
the <b>pcre_exec()</b> and the <b>pcre_dfa_exec()</b> matching functions. The
|
||||
only argument to the callout function is a pointer to a <b>pcre_callout_block</b>
|
||||
block. This structure contains the following fields:
|
||||
<pre>
|
||||
int <i>version</i>;
|
||||
int <i>callout_number</i>;
|
||||
int *<i>offset_vector</i>;
|
||||
const char *<i>subject</i>;
|
||||
int <i>subject_length</i>;
|
||||
int <i>start_match</i>;
|
||||
int <i>current_position</i>;
|
||||
int <i>capture_top</i>;
|
||||
int <i>capture_last</i>;
|
||||
void *<i>callout_data</i>;
|
||||
int <i>pattern_position</i>;
|
||||
int <i>next_item_length</i>;
|
||||
</pre>
|
||||
The <i>version</i> field is an integer containing the version number of the
|
||||
block format. The initial version was 0; the current version is 1. The version
|
||||
number will change again in future if additional fields are added, but the
|
||||
intention is never to remove any of the existing fields.
|
||||
</P>
|
||||
<P>
|
||||
The <i>callout_number</i> field contains the number of the callout, as compiled
|
||||
into the pattern (that is, the number after ?C for manual callouts, and 255 for
|
||||
automatically generated callouts).
|
||||
</P>
|
||||
<P>
|
||||
The <i>offset_vector</i> field is a pointer to the vector of offsets that was
|
||||
passed by the caller to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. When
|
||||
<b>pcre_exec()</b> is used, the contents can be inspected in order to extract
|
||||
substrings that have been matched so far, in the same way as for extracting
|
||||
substrings after a match has completed. For <b>pcre_dfa_exec()</b> this field is
|
||||
not useful.
|
||||
</P>
|
||||
<P>
|
||||
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
|
||||
that were passed to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
The <i>start_match</i> field contains the offset within the subject at which the
|
||||
current match attempt started. If the pattern is not anchored, the callout
|
||||
function may be called several times from the same point in the pattern for
|
||||
different starting points in the subject.
|
||||
</P>
|
||||
<P>
|
||||
The <i>current_position</i> field contains the offset within the subject of the
|
||||
current match pointer.
|
||||
</P>
|
||||
<P>
|
||||
When the <b>pcre_exec()</b> function is used, the <i>capture_top</i> field
|
||||
contains one more than the number of the highest numbered captured substring so
|
||||
far. If no substrings have been captured, the value of <i>capture_top</i> is
|
||||
one. This is always the case when <b>pcre_dfa_exec()</b> is used, because it
|
||||
does not support captured substrings.
|
||||
</P>
|
||||
<P>
|
||||
The <i>capture_last</i> field contains the number of the most recently captured
|
||||
substring. If no substrings have been captured, its value is -1. This is always
|
||||
the case when <b>pcre_dfa_exec()</b> is used.
|
||||
</P>
|
||||
<P>
|
||||
The <i>callout_data</i> field contains a value that is passed to
|
||||
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> specifically so that it can be
|
||||
passed back in callouts. It is passed in the <i>callout_data</i> field of the
|
||||
<b>pcre_extra</b> data structure. If no such data was passed, the value of
|
||||
<i>callout_data</i> in the <b>pcre_callout_block</b> is NULL. There is a
|
||||
description of the <b>pcre_extra</b> structure in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
The <i>pattern_position</i> field is present from version 1 of the
|
||||
<i>pcre_callout_block</i> structure. It contains the offset to the next item to be
|
||||
matched in the pattern string.
|
||||
</P>
|
||||
<P>
|
||||
The <i>next_item_length</i> field is present from version 1 of the
|
||||
<i>pcre_callout_block</i> structure. It contains the length of the next item to be
|
||||
matched in the pattern string. When the callout immediately precedes an
|
||||
alternation bar, a closing parenthesis, or the end of the pattern, the length
|
||||
is zero. When the callout precedes an opening parenthesis, the length is that
|
||||
of the entire subpattern.
|
||||
</P>
|
||||
<P>
|
||||
The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
|
||||
help in distinguishing between different automatic callouts, which all have the
|
||||
same callout number. However, they are set for all callouts.
|
||||
</P>
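<P>
The following C++ sketch shows how a callout function might use these fields. To keep it self-contained it declares a local <i>CalloutBlockSketch</i> structure with the fields listed above; in a real program you would include <b>pcre.h</b>, use <b>pcre_callout_block</b> instead, and set <i>pcre_callout</i> to point to the function. The function names are hypothetical. The pair of offsets for the whole match (entries 0 and 1) is not yet meaningful during a callout, so the loop starts at substring 1, and unset substrings are assumed to have an offset of -1.
</P>

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Local stand-in for pcre_callout_block with the fields listed above, so
// this sketch compiles without pcre.h.
struct CalloutBlockSketch {
  int version;
  int callout_number;
  int* offset_vector;
  const char* subject;
  int subject_length;
  int start_match;
  int current_position;
  int capture_top;
  int capture_last;
  void* callout_data;
  int pattern_position;
  int next_item_length;
};

// Describe the progress of the match at this callout point.
std::string describe_callout(const CalloutBlockSketch* cb) {
  std::string out =
      "callout " + std::to_string(cb->callout_number) + ": matched \"";
  // Text between the start of this attempt and the current position.
  out.append(cb->subject + cb->start_match,
             static_cast<std::size_t>(cb->current_position - cb->start_match));
  out += "\" so far";
  // Substrings captured so far (pcre_exec() only); pair i is stored at
  // offset_vector[2*i] and offset_vector[2*i+1].
  for (int i = 1; i < cb->capture_top; ++i) {
    int start = cb->offset_vector[2 * i];
    int end = cb->offset_vector[2 * i + 1];
    if (start < 0) continue;  // assumed: unset substrings have offset -1
    out += ", $" + std::to_string(i) + "=\"";
    out.append(cb->subject + start, static_cast<std::size_t>(end - start));
    out += "\"";
  }
  return out;
}

// Hypothetical callout: log progress and let matching continue (return 0).
int logging_callout(CalloutBlockSketch* cb) {
  std::fprintf(stderr, "%s\n", describe_callout(cb).c_str());
  return 0;
}
```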
|
||||
<br><a name="SEC4" href="#TOC1">RETURN VALUES</a><br>
|
||||
<P>
|
||||
The external callout function returns an integer to PCRE. If the value is zero,
|
||||
matching proceeds as normal. If the value is greater than zero, matching fails
|
||||
at the current point, but the testing of other matching possibilities goes
|
||||
ahead, just as if a lookahead assertion had failed. If the value is less than
|
||||
zero, the match is abandoned, and <b>pcre_exec()</b> (or <b>pcre_dfa_exec()</b>)
|
||||
returns the negative value.
|
||||
</P>
|
||||
<P>
|
||||
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
|
||||
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
|
||||
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
|
||||
it will never be used by PCRE itself.
|
||||
</P>
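<P>
A callout can combine these return values to enforce its own policy. The C++ sketch below uses a hypothetical <i>CalloutBudget</i> structure, which a real callout would receive through <i>callout_data</i>, and a hypothetical decision function that a callout would call with its <i>current_position</i> and <i>callout_data</i> fields. It returns 0 to continue, 1 to reject the current path so that PCRE tries other alternatives, and a negative value to abandon the match once the budget is used up. The value -9 is assumed to equal PCRE_ERROR_CALLOUT; real code should use the macro from <b>pcre.h</b>.
</P>

```cpp
// Assumed to equal PCRE_ERROR_CALLOUT from pcre.h; use the macro in real code.
constexpr int kErrorCallout = -9;

// Hypothetical per-match state, passed to pcre_exec() via callout_data.
struct CalloutBudget {
  int remaining;  // callouts still allowed before the match is abandoned
  int reject_at;  // subject offset at which the current path is rejected, or -1
};

// Decision logic a callout could apply to cb->current_position and
// cb->callout_data:
//    0 -> continue matching normally
//   >0 -> fail at this point; other matching possibilities are still tried
//   <0 -> abandon the match; pcre_exec() returns this value
int callout_decision(int current_position, void* callout_data) {
  CalloutBudget* budget = static_cast<CalloutBudget*>(callout_data);
  if (budget == nullptr) return 0;                       // no data supplied
  if (--budget->remaining < 0) return kErrorCallout;     // budget exhausted
  if (current_position == budget->reject_at) return 1;  // reject this path
  return 0;
}
```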
|
||||
<P>
|
||||
Last updated: 28 February 2005
|
||||
<br>
|
||||
Copyright © 1997-2005 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,156 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcrecompat specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrecompat man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
DIFFERENCES BETWEEN PCRE AND PERL
|
||||
</b><br>
|
||||
<P>
|
||||
This document describes the differences in the ways that PCRE and Perl handle
|
||||
regular expressions. The differences described here are with respect to Perl
|
||||
5.8.
|
||||
</P>
|
||||
<P>
|
||||
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
|
||||
it does have are given in the
|
||||
<a href="pcre.html#utf8support">section on UTF-8 support</a>
|
||||
in the main
|
||||
<a href="pcre.html"><b>pcre</b></a>
|
||||
page.
|
||||
</P>
|
||||
<P>
|
||||
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
|
||||
them, but they do not mean what you might think. For example, (?!a){3} does
|
||||
not assert that the next three characters are not "a". It just asserts that the
|
||||
next character is not "a" three times.
|
||||
</P>
|
||||
<P>
|
||||
3. Capturing subpatterns that occur inside negative lookahead assertions are
|
||||
counted, but their entries in the offsets vector are never set. Perl sets its
|
||||
numerical variables from any such patterns that are matched before the
|
||||
assertion fails to match something (thereby succeeding), but only if the
|
||||
negative lookahead assertion contains just one branch.
|
||||
</P>
|
||||
<P>
|
||||
4. Though binary zero characters are supported in the subject string, they are
|
||||
not allowed in a pattern string because it is passed as a normal C string,
|
||||
terminated by zero. The escape sequence \0 can be used in the pattern to
|
||||
represent a binary zero.
|
||||
</P>
|
||||
<P>
|
||||
5. The following Perl escape sequences are not supported: \l, \u, \L,
|
||||
\U, and \N. In fact these are implemented by Perl's general string-handling
|
||||
and are not part of its pattern matching engine. If any of these are
|
||||
encountered by PCRE, an error is generated.
|
||||
</P>
|
||||
<P>
|
||||
6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is
|
||||
built with Unicode character property support. The properties that can be
|
||||
tested with \p and \P are limited to the general category properties such as
|
||||
Lu and Nd, script names such as Greek or Han, and the derived properties Any
|
||||
and L&.
|
||||
</P>
|
||||
<P>
|
||||
7. PCRE does support the \Q...\E escape for quoting substrings. Characters in
|
||||
between are treated as literals. This is slightly different from Perl in that $
|
||||
and @ are also handled as literals inside the quotes. In Perl, they cause
|
||||
variable interpolation (but of course PCRE does not have variables). Note the
|
||||
following examples:
|
||||
<pre>
|
||||
Pattern            PCRE matches  Perl matches
|
||||
|
||||
\Qabc$xyz\E        abc$xyz       abc followed by the contents of $xyz
|
||||
\Qabc\$xyz\E       abc\$xyz      abc\$xyz
|
||||
\Qabc\E\$\Qxyz\E   abc$xyz       abc$xyz
|
||||
</pre>
|
||||
The \Q...\E sequence is recognized both inside and outside character classes.
|
||||
</P>
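<P>
Because only "\E" ends a quoted run, arbitrary literal text can be embedded in a pattern by wrapping it in \Q...\E and splitting out any "\E" it contains. The C++ helper below is a sketch of that technique; <i>quote_literal()</i> is a hypothetical name, not part of PCRE. An embedded "\E" is replaced by \E\\E\Q, which closes the quote, matches a literal backslash followed by "E", and reopens the quote.
</P>

```cpp
#include <cstddef>
#include <string>

// Wrap `text` in \Q...\E so every character is matched literally.
// An embedded "\E" would end the quoted run early, so each one is emitted
// as \E (close quote), \\ (literal backslash), E (literal E), \Q (reopen).
std::string quote_literal(const std::string& text) {
  std::string out = "\\Q";
  std::size_t pos = 0;
  std::size_t hit;
  while ((hit = text.find("\\E", pos)) != std::string::npos) {
    out.append(text, pos, hit - pos);  // text before the "\E"
    out += "\\E\\\\E\\Q";
    pos = hit + 2;  // skip past the "\E" in the input
  }
  out.append(text, pos, std::string::npos);
  out += "\\E";
  return out;
}
```

For example, <i>quote_literal("abc$xyz")</i> yields \Qabc$xyz\E, the first pattern in the table above.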
|
||||
<P>
|
||||
8. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
|
||||
constructions. However, there is support for recursive patterns using the
|
||||
non-Perl items (?R), (?number), and (?P>name). Also, the PCRE "callout" feature
|
||||
allows an external function to be called during pattern matching. See the
|
||||
<a href="pcrecallout.html"><b>pcrecallout</b></a>
|
||||
documentation for details.
|
||||
</P>
|
||||
<P>
|
||||
9. There are some differences that are concerned with the settings of captured
|
||||
strings when part of a pattern is repeated. For example, matching "aba" against
|
||||
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
|
||||
</P>
|
||||
<P>
|
||||
10. PCRE provides some extensions to the Perl regular expression facilities:
|
||||
<br>
|
||||
<br>
|
||||
(a) Although lookbehind assertions must match fixed length strings, each
|
||||
alternative branch of a lookbehind assertion can match a different length of
|
||||
string. Perl requires them all to have the same length.
|
||||
<br>
|
||||
<br>
|
||||
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
|
||||
meta-character matches only at the very end of the string.
|
||||
<br>
|
||||
<br>
|
||||
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
|
||||
meaning is faulted. Otherwise, like Perl, the backslash is ignored. (Perl can
|
||||
be made to issue a warning.)
|
||||
<br>
|
||||
<br>
|
||||
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
|
||||
inverted, that is, by default they are not greedy, but if followed by a
|
||||
question mark they are.
|
||||
<br>
|
||||
<br>
|
||||
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
|
||||
only at the first matching position in the subject string.
|
||||
<br>
|
||||
<br>
|
||||
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
|
||||
options for <b>pcre_exec()</b> have no Perl equivalents.
|
||||
<br>
|
||||
<br>
|
||||
(g) The (?R), (?number), and (?P>name) constructs allow for recursive pattern
|
||||
matching. (Perl can do this using the (?p{code}) construct, which PCRE cannot
|
||||
support.)
|
||||
<br>
|
||||
<br>
|
||||
(h) PCRE supports named capturing substrings, using the Python syntax.
|
||||
<br>
|
||||
<br>
|
||||
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
|
||||
package.
|
||||
<br>
|
||||
<br>
|
||||
(j) The (R) condition, for testing recursion, is a PCRE extension.
|
||||
<br>
|
||||
<br>
|
||||
(k) The callout facility is PCRE-specific.
|
||||
<br>
|
||||
<br>
|
||||
(l) The partial matching facility is PCRE-specific.
|
||||
<br>
|
||||
<br>
|
||||
(m) Patterns compiled by PCRE can be saved and re-used at a later time, even on
|
||||
different hosts that have the other endianness.
|
||||
<br>
|
||||
<br>
|
||||
(n) The alternative matching function (<b>pcre_dfa_exec()</b>) matches in a
|
||||
different way and is not Perl-compatible.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 06 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,337 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcrecpp specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrecpp man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
|
||||
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
||||
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
|
||||
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a>
|
||||
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a>
|
||||
<li><a name="TOC6" href="#SEC6">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
|
||||
<li><a name="TOC7" href="#SEC7">SCANNING TEXT INCREMENTALLY</a>
|
||||
<li><a name="TOC8" href="#SEC8">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
|
||||
<li><a name="TOC9" href="#SEC9">REPLACING PARTS OF STRINGS</a>
|
||||
<li><a name="TOC10" href="#SEC10">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
|
||||
<P>
|
||||
<b>#include <pcrecpp.h></b>
|
||||
</P>
|
||||
<P>
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
The C++ wrapper for PCRE was provided by Google Inc. Some additional
|
||||
functionality was added by Giuseppe Maxia. This brief man page was constructed
|
||||
from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
|
||||
further details.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
|
||||
<P>
|
||||
The "FullMatch" operation checks that supplied text matches a supplied pattern
|
||||
exactly. If pointer arguments are supplied, it copies matched sub-strings that
|
||||
match sub-patterns into them.
|
||||
<pre>
|
||||
Example: successful match
|
||||
pcrecpp::RE re("h.*o");
|
||||
re.FullMatch("hello");
|
||||
|
||||
Example: unsuccessful match (requires full match):
|
||||
pcrecpp::RE re("e");
|
||||
!re.FullMatch("hello");
|
||||
|
||||
Example: creating a temporary RE object:
|
||||
pcrecpp::RE("h.*o").FullMatch("hello");
|
||||
</pre>
|
||||
You can pass in a "const char*" or a "string" for "text". The examples below
|
||||
tend to use a const char*. You can, as in the different examples above, store
|
||||
the RE object explicitly in a variable or use a temporary RE object. The
|
||||
examples below use one mode or the other arbitrarily. Either could correctly be
|
||||
used for any of these examples.
|
||||
</P>
|
||||
<P>
|
||||
You must supply extra pointer arguments to extract matched subpieces.
|
||||
<pre>
|
||||
Example: extracts "ruby" into "s" and 1234 into "i"
|
||||
int i;
|
||||
string s;
|
||||
pcrecpp::RE re("(\\w+):(\\d+)");
|
||||
re.FullMatch("ruby:1234", &s, &i);
|
||||
|
||||
Example: does not try to extract any extra sub-patterns
|
||||
re.FullMatch("ruby:1234", &s);
|
||||
|
||||
Example: does not try to extract into NULL
|
||||
re.FullMatch("ruby:1234", NULL, &i);
|
||||
|
||||
Example: integer overflow causes failure
|
||||
!re.FullMatch("ruby:1234567891234", NULL, &i);
|
||||
|
||||
Example: fails because there aren't enough sub-patterns:
|
||||
!pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
|
||||
|
||||
Example: fails because string cannot be stored in integer
|
||||
!pcrecpp::RE("(.*)").FullMatch("ruby", &i);
|
||||
</pre>
|
||||
The provided pointer arguments can be pointers to any scalar numeric
|
||||
type, or one of:
|
||||
<pre>
|
||||
string        (matched piece is copied to string)
|
||||
StringPiece   (StringPiece is mutated to point to matched piece)
|
||||
T             (where "bool T::ParseFrom(const char*, int)" exists)
|
||||
NULL          (the corresponding matched sub-pattern is not copied)
|
||||
</pre>
|
||||
The function returns true iff all of the following conditions are satisfied:
|
||||
<pre>
|
||||
a. "text" matches "pattern" exactly;
|
||||
|
||||
b. The number of matched sub-patterns is >= number of supplied
|
||||
pointers;
|
||||
|
||||
c. The "i"th argument has a suitable type for holding the
|
||||
string captured as the "i"th sub-pattern. If you pass in
|
||||
NULL for the "i"th argument, or pass fewer arguments than
|
||||
number of sub-patterns, "i"th captured sub-pattern is
|
||||
ignored.
|
||||
</pre>
|
||||
The matching interface supports at most 16 arguments per call.
|
||||
If you need more, consider using the more general interface
|
||||
<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
|
||||
<b>DoMatch</b>.
|
||||
</P>
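<P>
Conditions (b) and (c) mean that a successful pattern match can still be reported as a failure when a captured string cannot be converted to the type of the corresponding argument. The C++ sketch below shows that conversion step for an <i>int</i> argument. It is an illustration under assumptions, not pcrecpp's actual parsing code, and <i>parse_int()</i> is a hypothetical name. A null pointer skips the conversion entirely, mirroring the "does not try to extract into NULL" example above.
</P>

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Convert a captured substring into an int argument. Returns false (which
// would make the whole FullMatch() fail) if the text is not an integer or
// does not fit in an int.
bool parse_int(const std::string& captured, int* out) {
  if (out == nullptr) return true;  // NULL argument: nothing is extracted
  if (captured.empty()) return false;
  errno = 0;
  char* end = nullptr;
  long value = std::strtol(captured.c_str(), &end, 10);
  if (*end != '\0') return false;  // e.g. "ruby": not a number
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
    return false;                  // e.g. "1234567891234": overflow
  *out = static_cast<int>(value);
  return true;
}
```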
|
||||
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHES</a><br>
|
||||
<P>
|
||||
You can use the "PartialMatch" operation when you want the pattern
|
||||
to match any substring of the text.
|
||||
<pre>
|
||||
Example: simple search for a string:
|
||||
pcrecpp::RE("ell").PartialMatch("hello");
|
||||
|
||||
Example: find first number in a string:
|
||||
int number;
|
||||
pcrecpp::RE re("(\\d+)");
|
||||
re.PartialMatch("x*100 + 20", &number);
|
||||
assert(number == 100);
|
||||
</PRE>
|
||||
</P>
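<P>
The difference between the two operations corresponds to the difference between <i>std::regex_match</i> and <i>std::regex_search</i> in the C++ standard library. The sketch below uses <i>std::regex</i> purely as a stand-in so that it runs without pcrecpp; it is not the pcrecpp implementation, and <i>first_number()</i> is a hypothetical helper that mirrors the "find first number" example. Note that std::regex uses ECMAScript syntax by default, which is not identical to PCRE syntax.
</P>

```cpp
#include <regex>
#include <string>

// Stand-in for: int number; pcrecpp::RE re("(\\d+)"); re.PartialMatch(text, &number);
// std::regex_search succeeds if any substring matches (like PartialMatch);
// std::regex_match requires the entire text to match (like FullMatch).
bool first_number(const std::string& text, int* number) {
  static const std::regex re("(\\d+)");
  std::smatch m;
  if (!std::regex_search(text, m, re)) return false;
  *number = std::stoi(m[1].str());  // throws std::out_of_range on overflow
  return true;
}
```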
|
||||
<br><a name="SEC5" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
|
||||
<P>
|
||||
By default, pattern and text are plain text, one byte per character. The UTF8
|
||||
flag, passed to the constructor, causes both pattern and string to be treated
|
||||
as UTF-8 text, still a byte stream but potentially multiple bytes per
|
||||
character. In practice, the text is likelier to be UTF-8 than the pattern, but
|
||||
the match returned may depend on the UTF8 flag, so always use it when matching
|
||||
UTF8 text. For example, "." will match one byte normally but with UTF8 set may
|
||||
match several bytes of a multi-byte character.
|
||||
<pre>
|
||||
Example:
|
||||
pcrecpp::RE_Options options;
|
||||
options.set_utf8();
|
||||
pcrecpp::RE re(utf8_pattern, options);
|
||||
re.FullMatch(utf8_string);
|
||||
|
||||
Example: using the convenience function UTF8():
|
||||
pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
|
||||
re.FullMatch(utf8_string);
|
||||
</pre>
|
||||
NOTE: The UTF8 flag is ignored if PCRE was not configured with the option
|
||||
<pre>
|
||||
--enable-utf8
|
||||
</PRE>
|
||||
</P>
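<P>
The number of bytes that "." consumes in UTF-8 mode is determined by the lead byte of each character. The C++ sketch below (hypothetical helper names, assuming the RFC 3629 limit of four bytes per character) shows how the lead byte gives the sequence length and how a string is stepped through one character at a time. The two-byte character "é" makes "café" five bytes long but only four characters.
</P>

```cpp
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with `lead`
// (0 if `lead` is a continuation byte or not a valid lead byte).
int utf8_sequence_length(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                             // 10xxxxxx continuation, or invalid
}

// Count characters the way "." steps through them with UTF8 set.
// Assumes `s` is valid UTF-8; a stray byte is counted as one character.
std::size_t utf8_char_count(const std::string& s) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < s.size(); ++count) {
    int len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
    i += (len > 0 ? len : 1);
  }
  return count;
}
```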
|
||||
<br><a name="SEC6" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
|
||||
<P>
|
||||
PCRE defines some modifiers to change the behavior of the regular expression
|
||||
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
|
||||
pass such modifiers to a RE class. Currently, the following modifiers are
|
||||
supported:
|
||||
<pre>
|
||||
modifier              description                 Perl corresponding
|
||||
|
||||
PCRE_CASELESS         case insensitive match      /i
|
||||
PCRE_MULTILINE        multiple lines match        /m
|
||||
PCRE_DOTALL           dot matches newlines        /s
|
||||
PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
|
||||
PCRE_EXTRA            strict escape parsing       N/A
|
||||
PCRE_EXTENDED         ignore whitespaces          /x
|
||||
PCRE_UTF8             handles UTF8 chars          built-in
|
||||
PCRE_UNGREEDY         reverses * and *?           N/A
|
||||
PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
|
||||
</pre>
|
||||
(*) Both Perl and PCRE allow non-capturing parentheses by means of the
|
||||
"?:" modifier within the pattern itself. For example, (?:ab|cd) does not
|
||||
capture, while (ab|cd) does.
|
||||
</P>
|
||||
<P>
|
||||
For a full account of how each modifier works, please check the
|
||||
PCRE API reference page.
|
||||
</P>
|
||||
<P>
|
||||
For each modifier, there are two member functions whose name is made
|
||||
out of the modifier in lowercase, without the "PCRE_" prefix. For
|
||||
instance, PCRE_CASELESS is handled by
|
||||
<pre>
|
||||
bool caseless()
|
||||
</pre>
|
||||
which returns true if the modifier is set, and
|
||||
<pre>
|
||||
RE_Options & set_caseless(bool)
|
||||
</pre>
|
||||
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
|
||||
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
|
||||
functions. Setting <i>match_limit</i> to a non-zero value will limit the
|
||||
execution of pcre to keep it from doing bad things like blowing the stack or
|
||||
taking an eternity to return a result. A value of 5000 is good enough to stop
|
||||
stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
|
||||
match limiting. Alternatively, you can call <b>set_match_limit_recursion()</b>,
|
||||
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
|
||||
recurses. <b>match_limit()</b> limits the total number of internal match() calls PCRE makes;
|
||||
<b>match_limit_recursion()</b> limits the depth of internal recursion, and
|
||||
therefore the amount of stack that is used.
|
||||
</P>
|
||||
<P>
|
||||
Normally, to pass one or more modifiers to a RE class, you declare
|
||||
a <i>RE_Options</i> object, set the appropriate options, and pass this
|
||||
object to a RE constructor. Example:
|
||||
<pre>
|
||||
RE_Options opt;
|
||||
opt.set_caseless(true);
|
||||
if (RE("HELLO", opt).PartialMatch("hello world")) ...
|
||||
</pre>
|
||||
RE_Options has two constructors. The default constructor takes no arguments and
|
||||
creates a set of flags that are all off. The other constructor takes an
|
||||
<i>option_flags</i> argument to facilitate the transfer of legacy code from C programs.
|
||||
This lets you do
|
||||
<pre>
|
||||
RE(pattern,
|
||||
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
|
||||
</pre>
|
||||
However, new code is better off doing
|
||||
<pre>
|
||||
RE(pattern,
|
||||
RE_Options().set_caseless(true).set_multiline(true))
|
||||
.PartialMatch(str);
|
||||
</pre>
|
||||
If you are going to pass one of the most commonly used modifiers, there are some
|
||||
convenience functions that return a RE_Options object with the
|
||||
appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
|
||||
<b>MULTILINE()</b>, <b>DOTALL()</b>, and <b>EXTENDED()</b>.
|
||||
</P>
|
||||
<P>
|
||||
If you need to set several options at once, and you don't want to go through
|
||||
the trouble of declaring a RE_Options object and setting several options, there
|
||||
is a parallel method that gives you this ability on the fly. You can chain
|
||||
several <b>set_xxxxx()</b> member functions, since each of them returns a
|
||||
reference to its class object. For example, to pass PCRE_CASELESS,
|
||||
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
|
||||
<pre>
|
||||
RE(" ^ xyz \\s+ .* blah$",
|
||||
RE_Options()
|
||||
.set_caseless(true)
|
||||
.set_extended(true)
|
||||
.set_multiline(true)).PartialMatch(sometext);
|
||||
|
||||
</PRE>
|
||||
</P>
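<P>
The chaining works because every setter returns a reference to the object it modified. The following sketch shows that pattern on a small, hypothetical <i>Options</i> class; it is not the real RE_Options, and the flag constants are invented for the example rather than taken from pcre.h.
</P>

```cpp
#include <cassert>

// Hypothetical flag bits for this sketch only (not the values from pcre.h).
constexpr int kCaseless  = 1 << 0;
constexpr int kMultiline = 1 << 1;
constexpr int kExtended  = 1 << 2;

// Illustrative options class: each setter returns *this so calls can be chained.
class Options {
 public:
  Options() = default;                            // all flags off
  explicit Options(int flags) : flags_(flags) {}  // legacy-style bit mask

  bool caseless() const { return Has(kCaseless); }
  bool multiline() const { return Has(kMultiline); }
  bool extended() const { return Has(kExtended); }

  Options& set_caseless(bool on) { return Set(kCaseless, on); }
  Options& set_multiline(bool on) { return Set(kMultiline, on); }
  Options& set_extended(bool on) { return Set(kExtended, on); }

  int flags() const { return flags_; }

 private:
  bool Has(int bit) const { return (flags_ & bit) != 0; }
  Options& Set(int bit, bool on) {
    if (on) flags_ |= bit; else flags_ &= ~bit;
    return *this;  // returning a reference is what enables chaining
  }
  int flags_ = 0;
};

// Convenience function in the spirit of CASELESS(): a fresh object with one flag set.
inline Options Caseless() { return Options().set_caseless(true); }
```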
|
||||
<br><a name="SEC7" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
|
||||
<P>
|
||||
The "Consume" operation may be useful if you want to repeatedly
|
||||
match regular expressions at the front of a string and skip over
|
||||
them as they match. This requires use of the "StringPiece" type,
|
||||
which represents a sub-range of a real string. Like RE, StringPiece
|
||||
is defined in the pcrecpp namespace.
|
||||
<pre>
|
||||
Example: read lines of the form "var = value" from a string.
|
||||
string contents = ...; // Fill string somehow
|
||||
pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
string var;
|
||||
int value;
|
||||
pcrecpp::RE re("(\\w+) = (\\d+)\n");
|
||||
while (re.Consume(&input, &var, &value)) {
|
||||
...;
|
||||
}
|
||||
</pre>
|
||||
Each successful call to "Consume" will set "var/value", and also
|
||||
advance "input" so it points past the matched text.
|
||||
</P>
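<P>
As a self-contained illustration of the same "match at the front, then skip over it" loop without linking against PCRE, this sketch substitutes the C++ standard library's std::regex (whose ECMAScript syntax is close to, but not identical to, Perl/PCRE syntax). The flag std::regex_constants::match_continuous forces each match to start exactly at the current position, which mimics Consume's anchoring. The function name <i>ParseAssignments</i> is hypothetical.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Consume-style loop with std::regex (a stand-in for pcrecpp::RE::Consume):
// match only at the current front of the input, then advance past the match.
std::vector<std::pair<std::string, int>> ParseAssignments(const std::string& contents) {
  static const std::regex re(R"((\w+) = (\d+)\n)");
  std::vector<std::pair<std::string, int>> out;
  auto pos = contents.cbegin();
  std::smatch m;
  // match_continuous anchors the match at 'pos', so parsing stops at the
  // first line that does not fit the "var = value" form.
  while (std::regex_search(pos, contents.cend(), m, re,
                           std::regex_constants::match_continuous)) {
    out.emplace_back(m[1].str(), std::stoi(m[2].str()));
    pos = m[0].second;  // skip over the text just matched
  }
  return out;
}
```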
|
||||
<P>
|
||||
The "FindAndConsume" operation is similar to "Consume" but does not
|
||||
anchor your match at the beginning of the string. For example, you
|
||||
could extract all words from a string by repeatedly calling
|
||||
<pre>
|
||||
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
|
||||
</PRE>
|
||||
</P>
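<P>
The unanchored variant can be sketched the same way, again using std::regex in place of pcrecpp: the search may start anywhere in the remaining input, and the input is advanced past each match before searching again. <i>ExtractWords</i> is a hypothetical name.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// FindAndConsume-style loop with std::regex: no anchoring, so each search
// skips any leading non-matching text, then the input moves past the match.
std::vector<std::string> ExtractWords(const std::string& input) {
  static const std::regex word(R"((\w+))");
  std::vector<std::string> words;
  auto pos = input.cbegin();
  std::smatch m;
  while (std::regex_search(pos, input.cend(), m, word)) {
    words.push_back(m[1].str());
    pos = m[0].second;  // continue after the word just found
  }
  return words;
}
```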
|
||||
<br><a name="SEC8" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
|
||||
<P>
|
||||
By default, if you pass a pointer to a numeric value, the
|
||||
corresponding text is interpreted as a base-10 number. You can
|
||||
instead wrap the pointer with a call to one of the operators Hex(),
|
||||
Octal(), or CRadix() to interpret the text in another base. The
|
||||
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
|
||||
prefixes, but defaults to base-10.
|
||||
<pre>
|
||||
Example:
|
||||
int a, b, c, d;
|
||||
pcrecpp::RE re("(.*) (.*) (.*) (.*)");
|
||||
re.FullMatch("100 40 0100 0x40",
|
||||
pcrecpp::Octal(&a), pcrecpp::Hex(&b),
|
||||
pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
|
||||
</pre>
|
||||
will leave 64 in a, b, c, and d.
|
||||
</P>
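<P>
The same four conversions can be reproduced with the C library's strtol, which may help in checking the arithmetic: base 8 and base 16 correspond to Octal() and Hex(), and base 0 gives the C-radix rule ("0x" means hexadecimal, a leading "0" means octal, otherwise decimal). This is a sketch of the number parsing only, not pcrecpp's code; <i>ParseInBase</i> is a hypothetical helper.
</P>

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <optional>
#include <string>

// Parse the whole string as an integer in 'base' (0 = C-radix rules).
// Returns nullopt on overflow or if any characters are left unconsumed.
std::optional<long> ParseInBase(const std::string& text, int base) {
  if (text.empty()) return std::nullopt;
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text.c_str(), &end, base);
  if (errno != 0 || *end != '\0') return std::nullopt;
  return value;
}
```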
|
||||
<br><a name="SEC9" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
|
||||
<P>
|
||||
You can replace the first match of "pattern" in "str" with "rewrite".
|
||||
Within "rewrite", backslash-escaped digits (\1 to \9) can be
|
||||
used to insert the text matched by the corresponding parenthesized group
|
||||
from the pattern. \0 in "rewrite" refers to the entire matching
|
||||
text. For example:
|
||||
<pre>
|
||||
string s = "yabba dabba doo";
|
||||
pcrecpp::RE("b+").Replace("d", &s);
|
||||
</pre>
|
||||
will leave "s" containing "yada dabba doo". The result is true if the pattern
|
||||
matches and a replacement occurs, false otherwise.
|
||||
</P>
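<P>
To show how a rewrite string with \0 to \9 is expanded, here is a sketch built on std::regex rather than pcrecpp. <i>ExpandRewrite</i> and <i>ReplaceFirst</i> are hypothetical names, and treating "\\" in the rewrite as a single literal backslash is an assumption of this sketch.
</P>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Expand "\0".."\9" in 'rewrite' with the corresponding submatch of 'm';
// groups that do not exist or did not participate expand to "".
std::string ExpandRewrite(const std::string& rewrite, const std::smatch& m) {
  std::string out;
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    const char c = rewrite[i];
    if (c == '\\' && i + 1 < rewrite.size()) {
      const char next = rewrite[i + 1];
      if (std::isdigit(static_cast<unsigned char>(next))) {
        const std::size_t n = static_cast<std::size_t>(next - '0');
        if (n < m.size()) out += m[n].str();
        ++i;
        continue;
      }
      if (next == '\\') { out += '\\'; ++i; continue; }
    }
    out += c;
  }
  return out;
}

// Replace the first match of 'pattern' in *str; returns true if one was made.
bool ReplaceFirst(const std::regex& pattern, const std::string& rewrite, std::string* str) {
  std::smatch m;
  if (!std::regex_search(*str, m, pattern)) return false;
  // Build the result before assigning, since 'm' refers into *str.
  std::string result = m.prefix().str() + ExpandRewrite(rewrite, m) + m.suffix().str();
  *str = std::move(result);
  return true;
}
```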
|
||||
<P>
|
||||
<b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
|
||||
occurrences of the pattern in the string with the rewrite. Replacements are
|
||||
not subject to re-matching. For example:
|
||||
<pre>
|
||||
string s = "yabba dabba doo";
|
||||
pcrecpp::RE("b+").GlobalReplace("d", &s);
|
||||
</pre>
|
||||
will leave "s" containing "yada dada doo". It returns the number of
|
||||
replacements made.
|
||||
</P>
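<P>
The "replacements are not subject to re-matching" rule can be seen in this std::regex sketch: it scans only the original text and builds the output separately, so inserted text is never searched again. <i>ReplaceAll</i> is a hypothetical name; for brevity the rewrite is treated literally (no \1 substitution), and a fuller version would also need care with anchors such as ^ when resuming mid-string (for example via std::regex_constants::match_prev_avail).
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// Replace every non-overlapping match; returns the number of replacements.
int ReplaceAll(const std::regex& pattern, const std::string& rewrite, std::string* str) {
  std::string out;
  int count = 0;
  auto pos = str->cbegin();
  const auto end = str->cend();
  std::smatch m;
  while (std::regex_search(pos, end, m, pattern)) {
    out.append(pos, m[0].first);  // copy the text before the match
    out += rewrite;               // insert the replacement (never rescanned)
    ++count;
    if (m[0].first == m[0].second) {
      // Empty match: copy one character so the loop always makes progress.
      if (m[0].second == end) { pos = end; break; }
      out += *m[0].second;
      pos = m[0].second + 1;
    } else {
      pos = m[0].second;
    }
  }
  out.append(pos, end);
  *str = std::move(out);
  return count;
}
```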
|
||||
<P>
|
||||
<b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
|
||||
"rewrite" is copied into "out" (an additional argument) with substitutions.
|
||||
The non-matching portions of "text" are ignored. Returns true iff a match
|
||||
occurred and the extraction happened successfully; if no match occurs, the
|
||||
string is left unaffected.
|
||||
</P>
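<P>
A sketch of the Extract behaviour, again with std::regex standing in for pcrecpp: only the expanded rewrite is written to the output argument, and the output is left untouched when nothing matches. <i>ExtractFirst</i> and the compact <i>ExpandRewrite</i> helper are hypothetical names, and treating "\\" as a literal backslash is an assumption.
</P>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Expand "\0".."\9" in 'rewrite' using the submatches in 'm'.
std::string ExpandRewrite(const std::string& rewrite, const std::smatch& m) {
  std::string out;
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    const char c = rewrite[i];
    const char next = i + 1 < rewrite.size() ? rewrite[i + 1] : '\0';
    if (c == '\\' && std::isdigit(static_cast<unsigned char>(next))) {
      const std::size_t n = static_cast<std::size_t>(next - '0');
      if (n < m.size()) out += m[n].str();
      ++i;
    } else if (c == '\\' && next == '\\') {
      out += '\\';
      ++i;
    } else {
      out += c;
    }
  }
  return out;
}

// On a match, write only the expanded rewrite to *out; the non-matching
// parts of 'text' are ignored. Without a match, *out is not modified.
bool ExtractFirst(const std::regex& pattern, const std::string& rewrite,
                  const std::string& text, std::string* out) {
  std::smatch m;
  if (!std::regex_search(text, m, pattern)) return false;
  *out = ExpandRewrite(rewrite, m);
  return true;
}
```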
|
||||
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
The C++ wrapper was contributed by Google Inc.
|
||||
<br>
|
||||
Copyright © 2005 Google Inc.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,424 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcregrep specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcregrep man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
|
||||
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
||||
<li><a name="TOC3" href="#SEC3">OPTIONS</a>
|
||||
<li><a name="TOC4" href="#SEC4">ENVIRONMENT VARIABLES</a>
|
||||
<li><a name="TOC5" href="#SEC5">NEWLINES</a>
|
||||
<li><a name="TOC6" href="#SEC6">OPTIONS COMPATIBILITY</a>
|
||||
<li><a name="TOC7" href="#SEC7">OPTIONS WITH DATA</a>
|
||||
<li><a name="TOC8" href="#SEC8">MATCHING ERRORS</a>
|
||||
<li><a name="TOC9" href="#SEC9">DIAGNOSTICS</a>
|
||||
<li><a name="TOC10" href="#SEC10">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||
<P>
|
||||
<b>pcregrep [options] [long options] [pattern] [path1 path2 ...]</b>
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
<b>pcregrep</b> searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
for a full description of syntax and semantics of the regular expressions that
|
||||
PCRE supports.
|
||||
</P>
|
||||
<P>
|
||||
Patterns, whether supplied on the command line or in a separate file, are given
|
||||
without delimiters. For example:
|
||||
<pre>
|
||||
pcregrep Thursday /etc/motd
|
||||
</pre>
|
||||
If you attempt to use delimiters (for example, by surrounding a pattern with
|
||||
slashes, as is common in Perl scripts), they are interpreted as part of the
|
||||
pattern. Quotes can of course be used on the command line because they are
|
||||
interpreted by the shell, and indeed they are required if a pattern contains
|
||||
white space or shell metacharacters.
|
||||
</P>
|
||||
<P>
|
||||
The first argument that follows any option settings is treated as the single
|
||||
pattern to be matched when neither <b>-e</b> nor <b>-f</b> is present.
|
||||
Conversely, when one or both of these options are used to specify patterns, all
|
||||
arguments are treated as path names. At least one of <b>-e</b>, <b>-f</b>, or an
|
||||
argument pattern must be provided.
|
||||
</P>
|
||||
<P>
|
||||
If no files are specified, <b>pcregrep</b> reads the standard input. The
|
||||
standard input can also be referenced by a name consisting of a single hyphen.
|
||||
For example:
|
||||
<pre>
|
||||
pcregrep some-pattern /file1 - /file3
|
||||
</pre>
|
||||
By default, each line that matches the pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at the
|
||||
start of each line. However, there are options that can change how
|
||||
<b>pcregrep</b> behaves. In particular, the <b>-M</b> option makes it possible to
|
||||
search for patterns that span line boundaries. What defines a line boundary is
|
||||
controlled by the <b>-N</b> (<b>--newline</b>) option.
|
||||
</P>
|
||||
<P>
|
||||
Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
|
||||
BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>.
|
||||
</P>
|
||||
<P>
|
||||
If the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variable is set,
|
||||
<b>pcregrep</b> uses the value to set a locale when calling the PCRE library.
|
||||
The <b>--locale</b> option can be used to override this.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">OPTIONS</a><br>
|
||||
<P>
|
||||
<b>--</b>
|
||||
This terminates the list of options. It is useful if the next item on the
|
||||
command line starts with a hyphen but is not an option. This allows for the
|
||||
processing of patterns and filenames that start with hyphens.
|
||||
</P>
|
||||
<P>
|
||||
<b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
|
||||
Output <i>number</i> lines of context after each matching line. If filenames
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
|
||||
guarantees to have up to 8K of following text available for context output.
|
||||
</P>
|
||||
<P>
|
||||
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
|
||||
Output <i>number</i> lines of context before each matching line. If filenames
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
|
||||
guarantees to have up to 8K of preceding text available for context output.
|
||||
</P>
|
||||
<P>
|
||||
<b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
|
||||
Output <i>number</i> lines of context both before and after each matching line.
|
||||
This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
|
||||
</P>
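<P>
The context rules above (lines before and after each match, a different separator for context lines, and "--" between groups that are not contiguous) can be sketched as follows. This is an illustration, not pcregrep's code: <i>WithContext</i> is a hypothetical function, it works on lines already held in memory rather than on pcregrep's 8K buffer, and it prefixes line numbers with ':' for matching lines and '-' for context lines without any extra spacing.
</P>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the output lines for the given input, with 'after' (-A) and
// 'before' (-B) lines of context around each matching line.
std::vector<std::string> WithContext(const std::vector<std::string>& lines,
                                     const std::vector<bool>& matched,
                                     std::size_t after, std::size_t before) {
  const std::size_t n = lines.size();
  std::vector<bool> show(n, false);
  for (std::size_t i = 0; i < n; ++i) {
    if (!matched[i]) continue;
    const std::size_t lo = i >= before ? i - before : 0;
    const std::size_t hi = std::min(n - 1, i + after);
    for (std::size_t j = lo; j <= hi; ++j) show[j] = true;
  }
  std::vector<std::string> out;
  bool printed_any = false;
  std::size_t last = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (!show[i]) continue;
    if (printed_any && i != last + 1) out.push_back("--");  // gap between groups
    out.push_back(std::to_string(i + 1) + (matched[i] ? ":" : "-") + lines[i]);
    printed_any = true;
    last = i;
  }
  return out;
}
```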
|
||||
<P>
|
||||
<b>-c</b>, <b>--count</b>
|
||||
Do not output individual lines; instead just output a count of the number of
|
||||
lines that would otherwise have been output. If several files are given, a
|
||||
count is output for each of them. In this mode, the <b>-A</b>, <b>-B</b>, and
|
||||
<b>-C</b> options are ignored.
|
||||
</P>
|
||||
<P>
|
||||
<b>--colour</b>, <b>--color</b>
|
||||
If this option is given without any data, it is equivalent to "--colour=auto".
|
||||
If data is required, it must be given in the same shell item, separated by an
|
||||
equals sign.
|
||||
</P>
|
||||
<P>
|
||||
<b>--colour=</b><i>value</i>, <b>--color=</b><i>value</i>
|
||||
This option specifies under what circumstances the part of a line that matched
|
||||
a pattern should be coloured in the output. The value may be "never" (the
|
||||
default), "always", or "auto". In the latter case, colouring happens only if
|
||||
the standard output is connected to a terminal. The colour can be specified by
|
||||
setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
|
||||
of this variable should be a string of two numbers, separated by a semicolon.
|
||||
They are copied directly into the control string for setting colour on a
|
||||
terminal, so it is your responsibility to ensure that they make sense. If
|
||||
neither of the environment variables is set, the default is "1;31", which gives
|
||||
red.
|
||||
</P>
|
||||
<P>
|
||||
<b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
|
||||
If an input path is not a regular file or a directory, "action" specifies how
|
||||
it is to be processed. Valid values are "read" (the default) or "skip"
|
||||
(silently skip the path).
|
||||
</P>
|
||||
<P>
|
||||
<b>-d</b> <i>action</i>, <b>--directories=</b><i>action</i>
|
||||
If an input path is a directory, "action" specifies how it is to be processed.
|
||||
Valid values are "read" (the default), "recurse" (equivalent to the <b>-r</b>
|
||||
option), or "skip" (silently skip the path). In the default case, directories
|
||||
are read as if they were ordinary files. In some operating systems the effect
|
||||
of reading a directory like this is an immediate end-of-file.
|
||||
</P>
|
||||
<P>
|
||||
<b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>,
|
||||
<b>--regexp=</b><i>pattern</i> Specify a pattern to be matched. This option can
|
||||
be used multiple times in order to specify several patterns. It can also be
|
||||
used as a way of specifying a single pattern that starts with a hyphen. When
|
||||
<b>-e</b> is used, no argument pattern is taken from the command line; all
|
||||
arguments are treated as file names. There is an overall maximum of 100
|
||||
patterns. They are applied to each line in the order in which they are defined
|
||||
until one matches (or fails to match if <b>-v</b> is used). If <b>-f</b> is used
|
||||
with <b>-e</b>, the command line patterns are matched first, followed by the
|
||||
patterns from the file, independent of the order in which these options are
|
||||
specified. Note that multiple use of <b>-e</b> is not the same as a single
|
||||
pattern with alternatives. For example, X|Y finds the first character in a line
|
||||
that is X or Y, whereas if the two patterns are given separately,
|
||||
<b>pcregrep</b> finds X if it is present, even if it follows Y in the line. It
|
||||
finds Y only if there is no X in the line. This really matters only if you are
|
||||
using <b>-o</b> to show the portion of the line that matched.
|
||||
</P>
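<P>
The difference between several <b>-e</b> patterns and one pattern with alternatives can be demonstrated with std::regex as a stand-in for PCRE. The helper names are hypothetical; the first function tries patterns in order and stops at the first that matches anywhere in the line, while the second reports the leftmost match of a single alternation.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Separate -e patterns: the first pattern (in order) that matches wins.
std::string FirstPatternMatch(const std::string& line, const std::vector<std::regex>& patterns) {
  std::smatch m;
  for (const auto& re : patterns)
    if (std::regex_search(line, m, re)) return m[0].str();
  return "";
}

// One pattern with alternatives: the leftmost match in the line wins.
std::string AlternationMatch(const std::string& line, const std::regex& re) {
  std::smatch m;
  return std::regex_search(line, m, re) ? m[0].str() : "";
}
```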
|
||||
<P>
|
||||
<b>--exclude</b>=<i>pattern</i>
|
||||
When <b>pcregrep</b> is searching the files in a directory as a consequence of
|
||||
the <b>-r</b> (recursive search) option, any files whose names match the pattern
|
||||
are excluded. The pattern is a PCRE regular expression. If a file name matches
|
||||
both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no short
|
||||
form for this option.
|
||||
</P>
|
||||
<P>
|
||||
<b>-F</b>, <b>--fixed-strings</b>
|
||||
Interpret each pattern as a list of fixed strings, separated by newlines,
|
||||
instead of as a regular expression. The <b>-w</b> (match as a word) and <b>-x</b>
|
||||
(match whole line) options can be used with <b>-F</b>. They apply to each of the
|
||||
fixed strings. A line is selected if any of the fixed strings are found in it
|
||||
(subject to <b>-w</b> or <b>-x</b>, if present).
|
||||
</P>
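<P>
A sketch of how <b>-F</b> treats a pattern: it is split at newlines into fixed strings, and regular-expression metacharacters have no special meaning. <i>FixedStringsMatch</i> is a hypothetical helper; its <i>whole_line</i> flag stands in for <b>-x</b>, and <b>-w</b> is not modelled.
</P>

```cpp
#include <cassert>
#include <string>

// True if any newline-separated fixed string in 'pattern' occurs in 'line'
// (or, when whole_line is set, is equal to the entire line).
bool FixedStringsMatch(const std::string& pattern, const std::string& line, bool whole_line) {
  std::string::size_type begin = 0;
  while (true) {
    const auto nl = pattern.find('\n', begin);
    const std::string s =
        pattern.substr(begin, nl == std::string::npos ? std::string::npos : nl - begin);
    if (whole_line ? line == s : line.find(s) != std::string::npos) return true;
    if (nl == std::string::npos) return false;
    begin = nl + 1;
  }
}
```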
|
||||
<P>
|
||||
<b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
|
||||
Read a number of patterns from the file, one per line, and match them against
|
||||
each line of input. A data line is output if any of the patterns match it. The
|
||||
filename can be given as "-" to refer to the standard input. When <b>-f</b> is
|
||||
used, patterns specified on the command line using <b>-e</b> may also be
|
||||
present; they are tested before the file's patterns. However, no other pattern
|
||||
is taken from the command line; all arguments are treated as file names. There
|
||||
is an overall maximum of 100 patterns. Trailing white space is removed from
|
||||
each line, and blank lines are ignored. An empty file contains no patterns and
|
||||
therefore matches nothing.
|
||||
</P>
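<P>
The pattern-file rules (one pattern per line, trailing white space removed, blank lines ignored, a cap on the total) can be sketched like this. <i>ReadPatterns</i> is a hypothetical function; unlike pcregrep, it counts only the patterns from the file toward the limit, and it reports "too many patterns" by throwing an exception.
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read one pattern per line, dropping trailing white space and blank lines.
std::vector<std::string> ReadPatterns(std::istream& in, std::size_t max_patterns = 100) {
  std::vector<std::string> patterns;
  std::string line;
  while (std::getline(in, line)) {
    const auto keep = line.find_last_not_of(" \t\r\f\v");
    if (keep == std::string::npos) continue;  // blank or all white space
    line.erase(keep + 1);                     // strip trailing white space
    if (patterns.size() == max_patterns) throw std::runtime_error("too many patterns");
    patterns.push_back(line);
  }
  return patterns;
}
```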
|
||||
<P>
|
||||
<b>-H</b>, <b>--with-filename</b>
|
||||
Force the inclusion of the filename at the start of output lines when searching
|
||||
a single file. By default, the filename is not shown in this case. For matching
|
||||
lines, the filename is followed by a colon and a space; for context lines, a
|
||||
hyphen separator is used. If a line number is also being output, it follows the
|
||||
file name without a space.
|
||||
</P>
|
||||
<P>
|
||||
<b>-h</b>, <b>--no-filename</b>
|
||||
Suppress the output of filenames when searching multiple files. By default,
|
||||
filenames are shown when multiple files are searched. For matching lines, the
|
||||
filename is followed by a colon and a space; for context lines, a hyphen
|
||||
separator is used. If a line number is also being output, it follows the file
|
||||
name without a space.
|
||||
</P>
|
||||
<P>
|
||||
<b>--help</b>
|
||||
Output a brief help message and exit.
|
||||
</P>
|
||||
<P>
|
||||
<b>-i</b>, <b>--ignore-case</b>
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
</P>
|
||||
<P>
|
||||
<b>--include</b>=<i>pattern</i>
|
||||
When <b>pcregrep</b> is searching the files in a directory as a consequence of
|
||||
the <b>-r</b> (recursive search) option, only those files whose names match the
|
||||
pattern are included. The pattern is a PCRE regular expression. If a file name
|
||||
matches both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no
|
||||
short form for this option.
|
||||
</P>
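<P>
The combined effect of <b>--include</b> and <b>--exclude</b> during a recursive search amounts to a small decision function, sketched here with std::regex in place of PCRE. <i>ShouldScan</i> is hypothetical, and using an empty string to mean "option not given" is an assumption of the sketch.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// A file is scanned if it matches the include pattern (when one is given)
// and does not match the exclude pattern; exclusion wins when both match.
bool ShouldScan(const std::string& name, const std::string& include, const std::string& exclude) {
  if (!exclude.empty() && std::regex_search(name, std::regex(exclude))) return false;
  if (!include.empty() && !std::regex_search(name, std::regex(include))) return false;
  return true;
}
```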
|
||||
<P>
|
||||
<b>-L</b>, <b>--files-without-match</b>
|
||||
Instead of outputting lines from the files, just output the names of the files
|
||||
that do not contain any lines that would have been output. Each file name is
|
||||
output once, on a separate line.
|
||||
</P>
|
||||
<P>
|
||||
<b>-l</b>, <b>--files-with-matches</b>
|
||||
Instead of outputting lines from the files, just output the names of the files
|
||||
containing lines that would have been output. Each file name is output
|
||||
once, on a separate line. Searching stops as soon as a matching line is found
|
||||
in a file.
|
||||
</P>
|
||||
<P>
|
||||
<b>--label</b>=<i>name</i>
|
||||
This option supplies a name to be used for the standard input when file names
|
||||
are being output. If not supplied, "(standard input)" is used. There is no
|
||||
short form for this option.
|
||||
</P>
|
||||
<P>
|
||||
<b>--locale</b>=<i>locale-name</i>
|
||||
This option specifies a locale to be used for pattern matching. It overrides
|
||||
the value in the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variables. If no
|
||||
locale is specified, the PCRE library's default (usually the "C" locale) is
|
||||
used. There is no short form for this option.
|
||||
</P>
|
||||
<P>
|
||||
<b>-M</b>, <b>--multiline</b>
|
||||
Allow patterns to match more than one line. When this option is given, patterns
|
||||
may usefully contain literal newline characters and internal occurrences of ^
|
||||
and $ characters. The output for any one match may consist of more than one
|
||||
line. When this option is set, the PCRE library is called in "multiline" mode.
|
||||
There is a limit to the number of lines that can be matched, imposed by the way
|
||||
that <b>pcregrep</b> buffers the input file as it scans it. However,
|
||||
<b>pcregrep</b> ensures that at least 8K characters or the rest of the document
|
||||
(whichever is the shorter) are available for forward matching, and similarly
|
||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||
are guaranteed to be available for lookbehind assertions.
|
||||
</P>
|
||||
<P>
|
||||
<b>-N</b> <i>newline-type</i>, <b>--newline=</b><i>newline-type</i>
|
||||
The PCRE library supports three different character sequences for indicating
|
||||
the ends of lines. They are the single-character sequences CR (carriage return)
|
||||
and LF (linefeed), and the two-character sequence CR, LF. When the library is
|
||||
built, a default line-ending sequence is specified. This is normally the
|
||||
standard sequence for the operating system. Unless otherwise specified by this
|
||||
option, <b>pcregrep</b> uses the default. The possible values for this option
|
||||
are CR, LF, or CRLF. This makes it possible to use <b>pcregrep</b> on files that
|
||||
have come from other environments without having to modify their line endings.
|
||||
If the data that is being scanned does not agree with the convention set by
|
||||
this option, <b>pcregrep</b> may behave in strange ways.
|
||||
</P>
|
||||
<P>
|
||||
<b>-n</b>, <b>--line-number</b>
|
||||
Precede each output line by its line number in the file, followed by a colon
|
||||
and a space for matching lines or a hyphen and a space for context lines. If
|
||||
the filename is also being output, it precedes the line number.
|
||||
</P>
|
||||
<P>
|
||||
<b>-o</b>, <b>--only-matching</b>
|
||||
Show only the part of the line that matched a pattern. In this mode, no
|
||||
context is shown. That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b> options are
|
||||
ignored.
|
||||
</P>
|
||||
<P>
|
||||
<b>-q</b>, <b>--quiet</b>
|
||||
Work quietly, that is, display nothing except error messages. The exit
|
||||
status indicates whether or not any matches were found.
|
||||
</P>
|
||||
<P>
|
||||
<b>-r</b>, <b>--recursive</b>
|
||||
If any given path is a directory, recursively scan the files it contains,
|
||||
taking note of any <b>--include</b> and <b>--exclude</b> settings. By default, a
|
||||
directory is read as a normal file; in some operating systems this gives an
|
||||
immediate end-of-file. This option is a shorthand for setting the <b>-d</b>
|
||||
option to "recurse".
|
||||
</P>
|
||||
<P>
|
||||
<b>-s</b>, <b>--no-messages</b>
|
||||
Suppress error messages about non-existent or unreadable files. Such files are
|
||||
quietly skipped. However, the return code is still 2, even if matches were
|
||||
found in other files.
|
||||
</P>
|
||||
<P>
|
||||
<b>-u</b>, <b>--utf-8</b>
|
||||
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
|
||||
with UTF-8 support. Both patterns and subject lines must be valid strings of
|
||||
UTF-8 characters.
|
||||
</P>
|
||||
<P>
|
||||
<b>-V</b>, <b>--version</b>
|
||||
Write the version numbers of <b>pcregrep</b> and the PCRE library that is being
|
||||
used to the standard error stream.
|
||||
</P>
|
||||
<P>
|
||||
<b>-v</b>, <b>--invert-match</b>
|
||||
Invert the sense of the match, so that lines which do <i>not</i> match any of
|
||||
the patterns are the ones that are found.
|
||||
</P>
|
||||
<P>
|
||||
<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
|
||||
Force the patterns to match only whole words. This is equivalent to having \b
|
||||
at the start and end of the pattern.
|
||||
</P>
|
||||
<P>
|
||||
<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
|
||||
Force the patterns to be anchored (each must start matching at the beginning of
|
||||
a line) and in addition, require them to match entire lines. This is
|
||||
equivalent to having ^ and $ characters at the start and end of each
|
||||
alternative branch in every pattern.
|
||||
</P>
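<P>
One way to express <b>-w</b> and <b>-x</b> is to wrap each pattern in a group before compiling it; the group is what makes ^ and $ apply to every alternative branch. The sketch below uses std::regex ECMAScript syntax as a stand-in for PCRE, and the helper names are hypothetical rather than pcregrep's own.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// -w: the pattern must match a whole word.
std::string AsWordPattern(const std::string& p) { return "\\b(?:" + p + ")\\b"; }

// -x: the pattern must match the whole line. The non-capturing group makes
// ^ and $ apply to every alternative, not just the first and the last.
std::string AsLinePattern(const std::string& p) { return "^(?:" + p + ")$"; }
```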
|
||||
<br><a name="SEC4" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
|
||||
<P>
|
||||
The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
|
||||
order, for a locale. The first one that is set is used. This can be overridden
|
||||
by the <b>--locale</b> option. If no locale is set, the PCRE library's default
|
||||
(usually the "C" locale) is used.
|
||||
</P>
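<P>
The locale precedence just described can be written as a small function. This is a sketch with hypothetical names; null pointers stand for "not set", and an empty result stands for "use the PCRE library's default".
</P>

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// --locale beats LC_ALL, which beats LC_CTYPE; "" means "library default".
std::string ChooseLocale(const char* option, const char* lc_all, const char* lc_ctype) {
  if (option != nullptr) return option;
  if (lc_all != nullptr) return lc_all;
  if (lc_ctype != nullptr) return lc_ctype;
  return "";
}

// Convenience wrapper that consults the real environment.
std::string ChooseLocaleFromEnv(const char* option) {
  return ChooseLocale(option, std::getenv("LC_ALL"), std::getenv("LC_CTYPE"));
}
```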
|
||||
<br><a name="SEC5" href="#TOC1">NEWLINES</a><br>
|
||||
<P>
|
||||
The <b>-N</b> (<b>--newline</b>) option allows <b>pcregrep</b> to scan files with
|
||||
different newline conventions from the default. However, the setting of this
|
||||
option does not affect the way in which <b>pcregrep</b> writes information to
|
||||
the standard error and output streams. It uses the string "\n" in C
|
||||
<b>printf()</b> calls to indicate newlines, relying on the C I/O library to
|
||||
convert this to an appropriate sequence if the output is sent to a file.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
|
||||
<P>
|
||||
The majority of short and long forms of <b>pcregrep</b>'s options are the same
|
||||
as in the GNU <b>grep</b> program. Any long option of the form
|
||||
<b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
|
||||
(PCRE terminology). However, the <b>--locale</b>, <b>-M</b>, <b>--multiline</b>,
|
||||
<b>-u</b>, and <b>--utf-8</b> options are specific to <b>pcregrep</b>.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">OPTIONS WITH DATA</a><br>
|
||||
<P>
|
||||
There are four different ways in which an option with data can be specified.
|
||||
If a short form option is used, the data may follow immediately, or in the next
|
||||
command line item. For example:
|
||||
<pre>
|
||||
-f/some/file
|
||||
-f /some/file
|
||||
</pre>
|
||||
If a long form option is used, the data may appear in the same command line
|
||||
item, separated by an equals character, or (with one exception) it may appear
|
||||
in the next command line item. For example:
|
||||
<pre>
|
||||
--file=/some/file
|
||||
--file /some/file
|
||||
</pre>
|
||||
Note, however, that if you want to supply a file name beginning with ~ as data
|
||||
in a shell command, and have the shell expand ~ to a home directory, you must
|
||||
separate the file name from the option, because the shell does not treat ~
|
||||
specially unless it is at the start of an item.
|
||||
</P>
|
||||
<P>
|
||||
The exception to the above is the <b>--colour</b> (or <b>--color</b>) option,
|
||||
for which the data is optional. If this option does have data, it must be given
|
||||
in the first form, using an equals character. Otherwise it will be assumed that
|
||||
it has no data.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">MATCHING ERRORS</a><br>
|
||||
<P>
|
||||
It is possible to supply a regular expression that takes a very long time to
|
||||
fail to match certain lines. Such patterns normally involve nested indefinite
|
||||
repeats, for example: (a+)*\d when matched against a line of a's with no final
|
||||
digit. The PCRE matching function has a resource limit that causes it to abort
|
||||
in these circumstances. If this happens, <b>pcregrep</b> outputs an error
|
||||
message and the line that caused the problem to the standard error stream. If
|
||||
there are more than 20 such errors, <b>pcregrep</b> gives up.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">DIAGNOSTICS</a><br>
|
||||
<P>
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
||||
for syntax errors and non-existent or inaccessible files (even if matches were
|
||||
found in other files) or too many matching errors. Using the <b>-s</b> option to
|
||||
suppress error messages about inaccessible files does not affect the return
|
||||
code.
|
||||
</P>
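<P>
The exit-status rule reduces to the following sketch (a hypothetical function, not pcregrep's code). Any error forces status 2 even when some file matched, and <b>-s</b> only silences the messages, so it does not appear as a parameter.
</P>

```cpp
#include <cassert>

// any_error covers syntax errors, missing or unreadable files, and too many
// matching errors; it takes precedence over whether anything matched.
int ExitStatus(bool any_match, bool any_error) {
  if (any_error) return 2;
  return any_match ? 0 : 1;
}
```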
|
||||
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service
|
||||
<br>
|
||||
Cambridge CB2 3QG, England.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 06 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,192 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcrematching specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrematching man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PCRE MATCHING ALGORITHMS</a>
|
||||
<li><a name="TOC2" href="#SEC2">REGULAR EXPRESSIONS AS TREES</a>
|
||||
<li><a name="TOC3" href="#SEC3">THE STANDARD MATCHING ALGORITHM</a>
|
||||
<li><a name="TOC4" href="#SEC4">THE DFA MATCHING ALGORITHM</a>
|
||||
<li><a name="TOC5" href="#SEC5">ADVANTAGES OF THE DFA ALGORITHM</a>
|
||||
<li><a name="TOC6" href="#SEC6">DISADVANTAGES OF THE DFA ALGORITHM</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE MATCHING ALGORITHMS</a><br>
|
||||
<P>
|
||||
This document describes the two different algorithms that are available in PCRE
|
||||
for matching a compiled regular expression against a given subject string. The
|
||||
"standard" algorithm is the one provided by the <b>pcre_exec()</b> function.
|
||||
This works in the same way as Perl's matching function, and provides a
|
||||
Perl-compatible matching operation.
|
||||
</P>
|
||||
<P>
|
||||
An alternative algorithm is provided by the <b>pcre_dfa_exec()</b> function;
|
||||
this operates in a different way, and is not Perl-compatible. It has advantages
|
||||
and disadvantages compared with the standard algorithm, and these are described
|
||||
below.
|
||||
</P>
|
||||
<P>
|
||||
When there is only one possible way in which a given subject string can match a
|
||||
pattern, the two algorithms give the same answer. A difference arises, however,
|
||||
when there are multiple possibilities. For example, if the pattern
|
||||
<pre>
|
||||
^&#60;.*&#62;
|
||||
</pre>
|
||||
is matched against the string
|
||||
<pre>
|
||||
&#60;something&#62; &#60;something else&#62; &#60;something further&#62;
|
||||
</pre>
|
||||
there are three possible answers. The standard algorithm finds only one of
|
||||
them, whereas the DFA algorithm finds all three.
|
||||
</P>
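<P>
A backtracking engine reports a single match, and the quantifier decides which one. This sketch uses the C++ standard library's std::regex as a stand-in for the standard algorithm (it is not pcre_exec()): the greedy form of the example pattern returns the whole subject, and the ungreedy form returns only the first bracketed item. <i>StandardMatch</i> is a hypothetical helper.
</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Return the single match found by a backtracking regex engine, or "".
std::string StandardMatch(const std::string& pattern, const std::string& subject) {
  std::smatch m;
  return std::regex_search(subject, m, std::regex(pattern)) ? m[0].str() : "";
}
```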
|
||||
<br><a name="SEC2" href="#TOC1">REGULAR EXPRESSIONS AS TREES</a><br>
|
||||
<P>
|
||||
The set of strings that are matched by a regular expression can be represented
|
||||
as a tree structure. An unlimited repetition in the pattern makes the tree of
|
||||
infinite size, but it is still a tree. Matching the pattern to a given subject
|
||||
string (from a given starting point) can be thought of as a search of the tree.
|
||||
There are two ways to search a tree: depth-first and breadth-first, and these
|
||||
correspond to the two matching algorithms provided by PCRE.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br>
|
||||
<P>
|
||||
In the terminology of Jeffrey Friedl's book <i>Mastering Regular
|
||||
Expressions</i>, the standard algorithm is an "NFA algorithm". It conducts a
|
||||
depth-first search of the pattern tree. That is, it proceeds along a single
|
||||
path through the tree, checking that the subject matches what is required. When
|
||||
there is a mismatch, the algorithm tries any alternatives at the current point,
|
||||
and if they all fail, it backs up to the previous branch point in the tree, and
|
||||
tries the next alternative branch at that level. This often involves backing up
|
||||
(moving to the left) in the subject string as well. The order in which
|
||||
repetition branches are tried is controlled by the greedy or ungreedy nature of
|
||||
the quantifier.
|
||||
</P>
|
||||
<P>
|
||||
If a leaf node is reached, a matching string has been found, and at that point
|
||||
the algorithm stops. Thus, if there is more than one possible match, this
|
||||
algorithm returns the first one that it finds. Whether this is the shortest,
|
||||
the longest, or some intermediate length depends on the way the greedy and
|
||||
ungreedy repetition quantifiers are specified in the pattern.
|
||||
</P>
|
||||
<P>
|
||||
Because it ends up with a single path through the tree, it is relatively
|
||||
straightforward for this algorithm to keep track of the substrings that are
|
||||
matched by portions of the pattern in parentheses. This provides support for
|
||||
capturing parentheses and back references.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">THE DFA MATCHING ALGORITHM</a><br>
|
||||
<P>
|
||||
DFA stands for "deterministic finite automaton", but you do not need to
|
||||
understand the origins of that name. This algorithm conducts a breadth-first
|
||||
search of the tree. Starting from the first matching point in the subject, it
|
||||
scans the subject string from left to right, once, character by character, and
|
||||
as it does this, it remembers all the paths through the tree that represent
|
||||
valid matches.
|
||||
</P>
|
||||
<P>
|
||||
The scan continues until either the end of the subject is reached, or there are
|
||||
no more unterminated paths. At this point, terminated paths represent the
|
||||
different matching possibilities (if there are none, the match has failed).
|
||||
Thus, if there is more than one possible match, this algorithm finds all of
|
||||
them, and in particular, it finds the longest. In PCRE, the PCRE_DFA_SHORTEST option
|
||||
stops the algorithm after the first match (which is necessarily the shortest)
|
||||
has been found.
|
||||
</P>
|
||||
<P>
|
||||
Note that all the matches that are found start at the same point in the
|
||||
subject. If the pattern
|
||||
<pre>
|
||||
cat(er(pillar)?)?
|
||||
</pre>
|
||||
is matched against the string "the caterpillar catchment", the result will be
|
||||
the three strings "cat", "cater", and "caterpillar" that start at the fifth
|
||||
character of the subject. The algorithm does not automatically move on to find
|
||||
matches that start at later positions.
|
||||
</P>
|
||||
<P>
|
||||
There are a number of features of PCRE regular expressions that are not
|
||||
supported by the DFA matching algorithm. They are as follows:
|
||||
</P>
|
||||
<P>
|
||||
1. Because the algorithm finds all possible matches, the greedy or ungreedy
|
||||
nature of repetition quantifiers is not relevant. Greedy and ungreedy
|
||||
quantifiers are treated in exactly the same way.
|
||||
</P>
|
||||
<P>
|
||||
2. When dealing with multiple paths through the tree simultaneously, it is not
|
||||
straightforward to keep track of captured substrings for the different matching
|
||||
possibilities, and PCRE's implementation of this algorithm does not attempt to
|
||||
do this. This means that no captured substrings are available.
|
||||
</P>
|
||||
<P>
|
||||
3. Because no substrings are captured, back references within the pattern are
|
||||
not supported, and cause errors if encountered.
|
||||
</P>
|
||||
<P>
|
||||
4. For the same reason, conditional expressions that use a back reference as the
|
||||
condition are not supported.
|
||||
</P>
|
||||
<P>
|
||||
5. Callouts are supported, but the value of the <i>capture_top</i> field is
|
||||
always 1, and the value of the <i>capture_last</i> field is always -1.
|
||||
</P>
|
||||
<P>
|
||||
6.
|
||||
The \C escape sequence, which (in the standard algorithm) matches a single
|
||||
byte, even in UTF-8 mode, is not supported because the DFA algorithm moves
|
||||
through the subject string one character at a time, for all active paths
|
||||
through the tree.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE DFA ALGORITHM</a><br>
|
||||
<P>
|
||||
Using the DFA matching algorithm provides the following advantages:
|
||||
</P>
|
||||
<P>
|
||||
1. All possible matches (at a single point in the subject) are automatically
|
||||
found, and in particular, the longest match is found. To find more than one
|
||||
match using the standard algorithm, you have to do kludgy things with
|
||||
callouts.
|
||||
</P>
|
||||
<P>
|
||||
2. There is much better support for partial matching. The restrictions on the
|
||||
content of the pattern that apply when using the standard algorithm for partial
|
||||
matching do not apply to the DFA algorithm. For non-anchored patterns, the
|
||||
starting position of a partial match is available.
|
||||
</P>
|
||||
<P>
|
||||
3. Because the DFA algorithm scans the subject string just once, and never
|
||||
needs to backtrack, it is possible to pass very long subject strings to the
|
||||
matching function in several pieces, checking for partial matching each time.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE DFA ALGORITHM</a><br>
|
||||
<P>
|
||||
The DFA algorithm suffers from a number of disadvantages:
|
||||
</P>
|
||||
<P>
|
||||
1. It is substantially slower than the standard algorithm. This is partly
|
||||
because it has to search for all possible matches, but also because it is
|
||||
less susceptible to optimization.
|
||||
</P>
|
||||
<P>
|
||||
2. Capturing parentheses and back references are not supported.
|
||||
</P>
|
||||
<P>
|
||||
3. The "atomic group" feature of PCRE regular expressions is supported, but
|
||||
does not provide the advantage that it does for the standard algorithm.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 06 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,225 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcrepartial specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrepartial man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
|
||||
<li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>
|
||||
<li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
|
||||
<li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
|
||||
<P>
|
||||
In normal use of PCRE, if the subject string that is passed to
|
||||
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> matches as far as it goes, but is
|
||||
too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
|
||||
are circumstances where it might be helpful to distinguish this case from other
|
||||
cases in which there is no match.
|
||||
</P>
|
||||
<P>
|
||||
Consider, for example, an application where a human is required to type in data
|
||||
for a field with specific formatting requirements. An example might be a date
|
||||
in the form <i>ddmmmyy</i>, defined by this pattern:
|
||||
<pre>
|
||||
^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
|
||||
</pre>
|
||||
If the application sees the user's keystrokes one by one, and can check that
|
||||
what has been typed so far is potentially valid, it is able to raise an error
|
||||
as soon as a mistake is made, possibly beeping and not reflecting the
|
||||
character that has been typed. This immediate feedback is likely to be a better
|
||||
user interface than a check that is delayed until the entire string has been
|
||||
entered.
|
||||
</P>
|
||||
<P>
|
||||
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
|
||||
option, which can be set when calling <b>pcre_exec()</b> or
|
||||
<b>pcre_dfa_exec()</b>. When this flag is set for <b>pcre_exec()</b>, the return
|
||||
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
|
||||
during the matching process the last part of the subject string matched part of
|
||||
the pattern. Unfortunately, for non-anchored matching, it is not possible to
|
||||
obtain the position of the start of the partial match. No captured data is set
|
||||
when PCRE_ERROR_PARTIAL is returned.
|
||||
</P>
|
||||
<P>
|
||||
When PCRE_PARTIAL is set for <b>pcre_dfa_exec()</b>, the return code
|
||||
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
|
||||
subject is reached, there have been no complete matches, but there is still at
|
||||
least one matching possibility. The portion of the string that provided the
|
||||
partial match is set as the first matching string.
|
||||
</P>
|
||||
<P>
|
||||
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
|
||||
last literal byte in a pattern, and abandons matching immediately if such a
|
||||
byte is not present in the subject string. This optimization cannot be used
|
||||
for a subject string that might match only partially.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>
|
||||
<P>
|
||||
Because of the way certain internal optimizations are implemented in the
|
||||
<b>pcre_exec()</b> function, the PCRE_PARTIAL option cannot be used with all
|
||||
patterns. These restrictions do not apply when <b>pcre_dfa_exec()</b> is used.
|
||||
For <b>pcre_exec()</b>, repeated single characters such as
|
||||
<pre>
|
||||
a{2,4}
|
||||
</pre>
|
||||
and repeated single metasequences such as
|
||||
<pre>
|
||||
\d+
|
||||
</pre>
|
||||
are not permitted if the maximum number of occurrences is greater than one.
|
||||
Optional items such as \d? (where the maximum is one) are permitted.
|
||||
Quantifiers with any values are permitted after parentheses, so the invalid
|
||||
examples above can be coded thus:
|
||||
<pre>
|
||||
(a){2,4}
|
||||
(\d)+
|
||||
</pre>
|
||||
These constructions run more slowly, but for the kinds of application that are
|
||||
envisaged for this facility, this is not felt to be a major restriction.
|
||||
</P>
|
||||
<P>
|
||||
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
|
||||
<b>pcre_exec()</b> returns the error code PCRE_ERROR_BADPARTIAL (-13).
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a><br>
|
||||
<P>
|
||||
If the escape sequence \P is present in a <b>pcretest</b> data line, the
|
||||
PCRE_PARTIAL flag is used for the match. Here is a run of <b>pcretest</b> that
|
||||
uses the date example quoted above:
|
||||
<pre>
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 25jun04\P
|
||||
0: 25jun04
|
||||
1: jun
|
||||
data> 25dec3\P
|
||||
Partial match
|
||||
data> 3ju\P
|
||||
Partial match
|
||||
data> 3juj\P
|
||||
No match
|
||||
data> j\P
|
||||
No match
|
||||
</pre>
|
||||
The first data string is matched completely, so <b>pcretest</b> shows the
|
||||
matched substrings. The remaining four strings do not match the complete
|
||||
pattern, but the first two are partial matches. The same test, using DFA
|
||||
matching (by means of the \D escape sequence), produces the following output:
|
||||
<pre>
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 25jun04\P\D
|
||||
0: 25jun04
|
||||
data> 23dec3\P\D
|
||||
Partial match: 23dec3
|
||||
data> 3ju\P\D
|
||||
Partial match: 3ju
|
||||
data> 3juj\P\D
|
||||
No match
|
||||
data> j\P\D
|
||||
No match
|
||||
</pre>
|
||||
Notice that in this case the portion of the string that was matched is made
|
||||
available.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
|
||||
<P>
|
||||
When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
|
||||
to continue the match by providing additional subject data and calling
|
||||
<b>pcre_dfa_exec()</b> again with the PCRE_DFA_RESTART option and the same
|
||||
working space (where details of the previous partial match are stored). Here is
|
||||
an example using <b>pcretest</b>, where the \R escape sequence sets the
|
||||
PCRE_DFA_RESTART option and the \D escape sequence requests the use of
|
||||
<b>pcre_dfa_exec()</b>:
|
||||
<pre>
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 23ja\P\D
|
||||
Partial match: 23ja
|
||||
data> n05\R\D
|
||||
0: n05
|
||||
</pre>
|
||||
The first call has "23ja" as the subject, and requests partial matching; the
|
||||
second call has "n05" as the subject for the continued (restarted) match.
|
||||
Notice that when the match is complete, only the last part is shown; PCRE does
|
||||
not retain the previously partially-matched string. It is up to the calling
|
||||
program to do that if it needs to.
|
||||
</P>
|
||||
<P>
|
||||
This facility can be used to pass very long subject strings to
|
||||
<b>pcre_dfa_exec()</b>. However, some care is needed for certain types of
|
||||
pattern.
|
||||
</P>
|
||||
<P>
|
||||
1. If the pattern contains tests for the beginning or end of a line, you need
|
||||
to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
|
||||
subject string for any call does not contain the beginning or end of a line.
|
||||
</P>
|
||||
<P>
|
||||
2. If the pattern contains backward assertions (including \b or \B), you need
|
||||
to arrange for some overlap in the subject strings to allow for this. For
|
||||
example, you could pass the subject in chunks that were 500 bytes long, but in
|
||||
a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
|
||||
bytes at the start of the buffer.
|
||||
</P>
|
||||
<P>
|
||||
3. Matching a subject string that is split into multiple segments does not
|
||||
always produce exactly the same result as matching over one single long string.
|
||||
The difference arises when there are multiple matching possibilities, because a
|
||||
partial match result is given only when there are no completed matches in a
|
||||
call to <b>pcre_dfa_exec()</b>. This means that as soon as the shortest match has
|
||||
been found, continuation to a new subject segment is no longer possible.
|
||||
Consider this <b>pcretest</b> example:
|
||||
<pre>
|
||||
re> /dog(sbody)?/
|
||||
data> do\P\D
|
||||
Partial match: do
|
||||
data> gsb\R\P\D
|
||||
0: g
|
||||
data> dogsbody\D
|
||||
0: dogsbody
|
||||
1: dog
|
||||
</pre>
|
||||
The pattern matches the words "dog" or "dogsbody". When the subject is
|
||||
presented in several parts ("do" and "gsb" being the first two) the match stops
|
||||
when "dog" has been found, and it is not possible to continue. On the other
|
||||
hand, if "dogsbody" is presented as a single string, both matches are found.
|
||||
</P>
|
||||
<P>
|
||||
Because of this phenomenon, it does not usually make sense to end a pattern
|
||||
that is going to be matched in this way with a variable repeat.
|
||||
</P>
|
||||
<P>
|
||||
4. Patterns that contain alternatives at the top level which do not all
|
||||
start with the same pattern item may not work as expected. For example,
|
||||
consider this pattern:
|
||||
<pre>
|
||||
1234|3789
|
||||
</pre>
|
||||
If the first part of the subject is "ABC123", a partial match of the first
|
||||
alternative is found at offset 3. There is no partial match for the second
|
||||
alternative, because such a match does not start at the same point in the
|
||||
subject string. Attempting to continue with the string "789" does not yield a
|
||||
match because only those alternatives that match at one point in the subject
|
||||
are remembered. The problem arises because the start of the second alternative
|
||||
matches within the first alternative. There is no problem with anchored
|
||||
patterns or patterns such as:
|
||||
<pre>
|
||||
1234|ABCD
|
||||
</pre>
|
||||
where no string can be a partial match for both alternatives.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 16 January 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,97 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcreperform specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcreperform man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
PCRE PERFORMANCE
|
||||
</b><br>
|
||||
<P>
|
||||
Certain items that may appear in regular expression patterns are more efficient
|
||||
than others. It is more efficient to use a character class like [aeiou] than a
|
||||
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
|
||||
that provides the required behaviour is the most efficient. Jeffrey
|
||||
Friedl's book contains a lot of useful general discussion about optimizing
|
||||
regular expressions for efficient performance. This document contains a few
|
||||
observations about PCRE.
|
||||
</P>
|
||||
<P>
|
||||
Using Unicode character properties (the \p, \P, and \X escapes) is slow,
|
||||
because PCRE has to scan a structure that contains data for over fifteen
|
||||
thousand characters whenever it needs a character's property. If you can find
|
||||
an alternative pattern that does not use character properties, it will probably
|
||||
be faster.
|
||||
</P>
|
||||
<P>
|
||||
When a pattern begins with .* not in parentheses, or in parentheses that are
|
||||
not the subject of a back reference, and the PCRE_DOTALL option is set, the
|
||||
pattern is implicitly anchored by PCRE, since it can match only at the start of
|
||||
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
|
||||
optimization, because the . metacharacter does not then match a newline, and if
|
||||
the subject string contains newlines, the pattern may match from the character
|
||||
immediately following one of them instead of from the very start. For example,
|
||||
the pattern
|
||||
<pre>
|
||||
.*second
|
||||
</pre>
|
||||
matches the subject "first\nand second" (where \n stands for a newline
|
||||
character), with the match starting at the seventh character. In order to do
|
||||
this, PCRE has to retry the match starting after every newline in the subject.
|
||||
</P>
|
||||
<P>
|
||||
If you are using such a pattern with subject strings that do not contain
|
||||
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
|
||||
the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE
|
||||
from having to scan along the subject looking for a newline to restart at.
|
||||
</P>
|
||||
<P>
|
||||
Beware of patterns that contain nested indefinite repeats. These can take a
|
||||
long time to run when applied to a string that does not match. Consider the
|
||||
pattern fragment
|
||||
<pre>
|
||||
(a+)*
|
||||
</pre>
|
||||
This can match "aaaa" in 16 different ways, and this number increases very
|
||||
rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
|
||||
times, and for each of those cases other than 0 or 4, the + repeats can match
|
||||
different numbers of times.) When the remainder of the pattern is such that the
|
||||
entire match is going to fail, PCRE has in principle to try every possible
|
||||
variation, and this can take an extremely long time.
|
||||
</P>
|
||||
<P>
|
||||
An optimization catches some of the more simple cases such as
|
||||
<pre>
|
||||
(a+)*b
|
||||
</pre>
|
||||
where a literal character follows. Before embarking on the standard matching
|
||||
procedure, PCRE checks that there is a "b" later in the subject string, and if
|
||||
there is not, it fails the match immediately. However, when there is no
|
||||
following literal this optimization cannot be used. You can see the difference
|
||||
by comparing the behaviour of
|
||||
<pre>
|
||||
(a+)*\d
|
||||
</pre>
|
||||
with the pattern above. The (a+)*b pattern fails almost instantly when
|
||||
applied to a whole line of "a" characters, whereas (a+)*\d takes an
|
||||
appreciable time with strings longer than about 20 characters.
|
||||
</P>
|
||||
<P>
|
||||
In many cases, the solution to this kind of performance issue is to use an
|
||||
atomic group or a possessive quantifier.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 28 February 2005
|
||||
<br>
|
||||
Copyright © 1997-2005 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,244 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcreposix specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcreposix man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
|
||||
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
||||
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
|
||||
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
|
||||
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
|
||||
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
|
||||
<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
|
||||
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
|
||||
<P>
|
||||
<b>#include <pcreposix.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
|
||||
<b>int <i>cflags</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
|
||||
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
|
||||
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>void regfree(regex_t *<i>preg</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
This set of functions provides a POSIX-style API to the PCRE regular expression
|
||||
package. See the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation for a description of PCRE's native API, which contains much
|
||||
additional functionality.
|
||||
</P>
|
||||
<P>
|
||||
The functions described here are just wrapper functions that ultimately call
|
||||
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
|
||||
header file, and on Unix systems the library itself is called
|
||||
<b>pcreposix.a</b>, so it can be accessed by adding <b>-lpcreposix</b> to the
|
||||
command for linking an application that uses them. Because the POSIX functions
|
||||
call the native ones, it is also necessary to add <b>-lpcre</b>.
|
||||
</P>
|
||||
<P>
|
||||
I have implemented only those option bits that can be reasonably mapped to PCRE
|
||||
native options. In addition, the option REG_EXTENDED is defined with the value
|
||||
zero. This has no effect, but since programs that are written to the POSIX
|
||||
interface often use it, this makes it easier to slot in PCRE as a replacement
|
||||
library. Other POSIX options are not even defined.
|
||||
</P>
|
||||
<P>
|
||||
When PCRE is called via these functions, it is only the API that is POSIX-like
|
||||
in style. The syntax and semantics of the regular expressions themselves are
|
||||
still those of Perl, subject to the setting of various PCRE options, as
|
||||
described below. "POSIX-like in style" means that the API approximates to the
|
||||
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
|
||||
domains it is probably even less compatible.
|
||||
</P>
|
||||
<P>
|
||||
The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
|
||||
potential clash with other POSIX libraries. It can, of course, be renamed or
|
||||
aliased as <b>regex.h</b>, which is the "correct" name. It provides two
|
||||
structure types, <i>regex_t</i> for compiled internal forms, and
|
||||
<i>regmatch_t</i> for returning captured substrings. It also defines some
|
||||
constants whose names start with "REG_"; these are used for setting options and
|
||||
identifying error codes.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
|
||||
<P>
|
||||
The function <b>regcomp()</b> is called to compile a pattern into an
|
||||
internal form. The pattern is a C string terminated by a binary zero, and
|
||||
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
|
||||
to a <b>regex_t</b> structure that is used as a base for storing information
|
||||
about the compiled regular expression.
|
||||
</P>
|
||||
<P>
|
||||
The argument <i>cflags</i> is either zero, or contains one or more of the bits
|
||||
defined by the following macros:
|
||||
<pre>
|
||||
REG_DOTALL
|
||||
</pre>
|
||||
The PCRE_DOTALL option is set when the regular expression is passed for
|
||||
compilation to the native function. Note that REG_DOTALL is not part of the
|
||||
POSIX standard.
|
||||
<pre>
|
||||
REG_ICASE
|
||||
</pre>
|
||||
The PCRE_CASELESS option is set when the regular expression is passed for
|
||||
compilation to the native function.
|
||||
<pre>
|
||||
REG_NEWLINE
|
||||
</pre>
|
||||
The PCRE_MULTILINE option is set when the regular expression is passed for
|
||||
compilation to the native function. Note that this does <i>not</i> mimic the
|
||||
defined POSIX behaviour for REG_NEWLINE (see the following section).
|
||||
<pre>
|
||||
REG_NOSUB
|
||||
</pre>
|
||||
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
|
||||
for compilation to the native function. In addition, when a pattern that is
|
||||
compiled with this flag is passed to <b>regexec()</b> for matching, the
|
||||
<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
|
||||
are returned.
|
||||
<pre>
|
||||
REG_UTF8
|
||||
</pre>
|
||||
The PCRE_UTF8 option is set when the regular expression is passed for
|
||||
compilation to the native function. This causes the pattern itself and all data
|
||||
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
|
||||
is not part of the POSIX standard.
|
||||
</P>
|
||||
<P>
|
||||
In the absence of these flags, no options are passed to the native function.
|
||||
This means the regex is compiled with PCRE default semantics. In
|
||||
particular, the way it handles newline characters in the subject string is the
|
||||
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
|
||||
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
|
||||
newlines are matched by . (they aren't) or by a negative class such as [^a]
|
||||
(they are).
|
||||
</P>
|
||||
<P>
|
||||
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
|
||||
<i>preg</i> structure is filled in on success, and one member of the structure
|
||||
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
|
||||
the regular expression. Various error codes are defined in the header file.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
|
||||
<P>
|
||||
This area is not simple, because POSIX and Perl take different views of things.
|
||||
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
|
||||
intended to be a POSIX engine. The following table lists the different
|
||||
possibilities for matching newline characters in PCRE:
|
||||
<pre>
|
||||
                          Default   Change with
|
||||
|
||||
  . matches newline          no     PCRE_DOTALL
|
||||
  newline matches [^a]       yes    not changeable
|
||||
  $ matches \n at end        yes    PCRE_DOLLAR_ENDONLY
|
||||
  $ matches \n in middle     no     PCRE_MULTILINE
|
||||
  ^ matches \n in middle     no     PCRE_MULTILINE
|
||||
</pre>
|
||||
This is the equivalent table for POSIX:
|
||||
<pre>
|
||||
                          Default   Change with
|
||||
|
||||
  . matches newline          yes    REG_NEWLINE
|
||||
  newline matches [^a]       yes    REG_NEWLINE
|
||||
  $ matches \n at end        no     REG_NEWLINE
|
||||
  $ matches \n in middle     no     REG_NEWLINE
|
||||
  ^ matches \n in middle     no     REG_NEWLINE
|
||||
</pre>
|
||||
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
|
||||
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
|
||||
newline from matching [^a].
|
||||
</P>
|
||||
<P>
|
||||
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
|
||||
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
|
||||
REG_NEWLINE action.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
|
||||
<P>
|
||||
The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
|
||||
against a given <i>string</i>, which is terminated by a zero byte, subject to
|
||||
the options in <i>eflags</i>. These can be:
|
||||
<pre>
|
||||
REG_NOTBOL
|
||||
</pre>
|
||||
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
<pre>
|
||||
REG_NOTEOL
|
||||
</pre>
|
||||
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
</P>
|
||||
<P>
|
||||
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
|
||||
strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
|
||||
<b>regexec()</b> are ignored.
|
||||
</P>
|
||||
<P>
|
||||
Otherwise, the portion of the string that was matched, and also any captured
|
||||
substrings, are returned via the <i>pmatch</i> argument, which points to an
|
||||
array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
|
||||
members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
|
||||
character of each substring and the offset to the first character after the end
|
||||
of each substring, respectively. The 0th element of the vector relates to the
|
||||
entire portion of <i>string</i> that was matched; subsequent elements relate to
|
||||
the capturing subpatterns of the regular expression. Unused entries in the
|
||||
array have both structure members set to -1.
|
||||
</P>
|
||||
<P>
|
||||
A successful match yields a zero return; various error codes are defined in the
|
||||
header file, of which REG_NOMATCH is the "expected" failure code.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
|
||||
<P>
|
||||
The <b>regerror()</b> function maps a non-zero error code from either
|
||||
<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
|
||||
NULL, the error should have arisen from the use of that structure. A message
|
||||
terminated by a binary zero is placed in <i>errbuf</i>. The length of the
|
||||
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
|
||||
function is the size of buffer needed to hold the whole message.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
|
||||
<P>
|
||||
Compiling a regular expression causes memory to be allocated and associated
|
||||
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
|
||||
memory, after which <i>preg</i> may no longer be used as a compiled expression.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service,
|
||||
<br>
|
||||
Cambridge CB2 3QG, England.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 16 January 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,140 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcreprecompile specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcreprecompile man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a>
|
||||
<li><a name="TOC2" href="#SEC2">SAVING A COMPILED PATTERN</a>
|
||||
<li><a name="TOC3" href="#SEC3">RE-USING A PRECOMPILED PATTERN</a>
|
||||
<li><a name="TOC4" href="#SEC4">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a><br>
|
||||
<P>
|
||||
If you are running an application that uses a large number of regular
|
||||
expression patterns, it may be useful to store them in a precompiled form
|
||||
instead of having to compile them every time the application is run.
|
||||
If you are not using any private character tables (see the
|
||||
<a href="pcre_maketables.html"><b>pcre_maketables()</b></a>
|
||||
documentation), this is relatively straightforward. If you are using private
|
||||
tables, it is a little bit more complicated.
|
||||
</P>
|
||||
<P>
|
||||
If you save compiled patterns to a file, you can copy them to a different host
|
||||
and run them there. This works even if the new host has the opposite endianness
|
||||
to the one on which the patterns were compiled. There may be a small
|
||||
performance penalty, but it should be insignificant.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">SAVING A COMPILED PATTERN</a><br>
|
||||
<P>
|
||||
The value returned by <b>pcre_compile()</b> points to a single block of memory
|
||||
that holds the compiled pattern and associated data. You can find the length of
|
||||
this block in bytes by calling <b>pcre_fullinfo()</b> with an argument of
|
||||
PCRE_INFO_SIZE. You can then save the data in any appropriate manner. Here is
|
||||
sample code that compiles a pattern and writes it to a file. It assumes that
the variable <i>fd</i> is a <b>FILE *</b> stream that is open for output:
|
||||
<pre>
|
||||
int erroroffset, rc;
size_t size;          /* PCRE_INFO_SIZE returns a size_t */
const char *error;    /* pcre_compile() takes a const char ** */
pcre *re;

re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
if (re == NULL) { ... handle errors ... }
rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
if (rc < 0) { ... handle errors ... }
if (fwrite(re, 1, size, fd) != size) { ... handle errors ... }
|
||||
</pre>
|
||||
In this example, the bytes that comprise the compiled pattern are copied
|
||||
exactly. Note that this is binary data that may contain any of the 256 possible
|
||||
byte values. On systems that make a distinction between binary and non-binary
|
||||
data, be sure that the file is opened for binary output.
|
||||
</P>
|
||||
<P>
|
||||
If you want to write more than one pattern to a file, you will have to devise a
|
||||
way of separating them. For binary data, preceding each pattern with its length
|
||||
is probably the most straightforward approach. Another possibility is to write
|
||||
out the data in hexadecimal instead of binary, one pattern to a line.
|
||||
</P>
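<P>
One way to implement the length-prefix approach is sketched below. Each record is a 4-byte big-endian length followed by the raw bytes. For a compiled pattern, <i>data</i> would be the pointer returned by <b>pcre_compile()</b> and <i>len</i> the PCRE_INFO_SIZE value. The prefix format and the names <b>write_record()</b> and <b>read_record()</b> are assumptions for illustration, and the stream must be opened in binary mode ("wb" or "rb").
</P>

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write one record: a 4-byte big-endian length, then `len` raw bytes.
   A fixed byte order for the prefix keeps the file readable on hosts of
   either endianness. Returns 0 on success, -1 on a write error. */
static int write_record(FILE *f, const void *data, uint32_t len)
{
    unsigned char hdr[4];
    hdr[0] = (unsigned char)(len >> 24);
    hdr[1] = (unsigned char)(len >> 16);
    hdr[2] = (unsigned char)(len >> 8);
    hdr[3] = (unsigned char)len;
    if (fwrite(hdr, 1, sizeof hdr, f) != sizeof hdr)
        return -1;
    if (len > 0 && fwrite(data, 1, len, f) != len)
        return -1;
    return 0;
}

/* Read the next record into a malloc()ed buffer and store its length in
   *len. Returns NULL at end of file or on error; the caller must free()
   the result. */
static void *read_record(FILE *f, uint32_t *len)
{
    unsigned char hdr[4];
    void *buf;
    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr)
        return NULL;
    *len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
           ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
    buf = malloc(*len > 0 ? *len : 1);      /* avoid malloc(0) */
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, *len, f) != *len) {   /* truncated record */
        free(buf);
        return NULL;
    }
    return buf;
}
```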
|
||||
<P>
|
||||
Saving compiled patterns in a file is only one possible way of storing them for
|
||||
later use. They could equally well be saved in a database, or in the memory of
|
||||
some daemon process that passes them via sockets to the processes that want
|
||||
them.
|
||||
</P>
|
||||
<P>
|
||||
If the pattern has been studied, it is also possible to save the study data in
|
||||
a similar way to the compiled pattern itself. When studying generates
|
||||
additional information, <b>pcre_study()</b> returns a pointer to a
|
||||
<b>pcre_extra</b> data block. Its format is defined in the
|
||||
<a href="pcreapi.html#extradata">section on matching a pattern</a>
|
||||
in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation. The <i>study_data</i> field points to the binary study data, and
|
||||
this is what you must save (not the <b>pcre_extra</b> block itself). The length
|
||||
of the study data can be obtained by calling <b>pcre_fullinfo()</b> with an
|
||||
argument of PCRE_INFO_STUDYSIZE. Remember to check that <b>pcre_study()</b> did
|
||||
return a non-NULL value before trying to save the study data.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">RE-USING A PRECOMPILED PATTERN</a><br>
|
||||
<P>
|
||||
Re-using a precompiled pattern is straightforward. Having reloaded it into main
|
||||
memory, you pass its pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in
|
||||
the usual way. This should work even on another host, and even if that host has
|
||||
the opposite endianness to the one where the pattern was compiled.
|
||||
</P>
|
||||
<P>
|
||||
However, if you passed a pointer to custom character tables when the pattern
|
||||
was compiled (the <i>tableptr</i> argument of <b>pcre_compile()</b>), you must
|
||||
now pass a similar pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>,
|
||||
because the value saved with the compiled pattern will be meaningless. The
<i>tables</i> field in a <b>pcre_extra</b> block is used to pass this data, as described in
|
||||
the
|
||||
<a href="pcreapi.html#extradata">section on matching a pattern</a>
|
||||
in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
If you did not provide custom character tables when the pattern was compiled,
|
||||
the pointer in the compiled pattern is NULL, which causes <b>pcre_exec()</b> to
|
||||
use PCRE's internal tables. Thus, you do not need to take any special action at
|
||||
run time in this case.
|
||||
</P>
|
||||
<P>
|
||||
If you saved study data with the compiled pattern, you need to create your own
|
||||
<b>pcre_extra</b> data block and set the <i>study_data</i> field to point to the
|
||||
reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
|
||||
<i>flags</i> field to indicate that study data is present. Then pass the
|
||||
<b>pcre_extra</b> block to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in the
|
||||
usual way.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a><br>
|
||||
<P>
|
||||
The layout of the control block that is at the start of the data that makes up
|
||||
a compiled pattern was changed for release 5.0. If you have any saved patterns
|
||||
that were compiled with previous releases (not a facility that was previously
|
||||
advertised), you will have to recompile them for release 5.0. However, from now
|
||||
on, it should be possible to make changes in a compatible manner.
|
||||
</P>
|
||||
<P>
|
||||
Notwithstanding the above, if you have any saved patterns in UTF-8 mode that
|
||||
use \p or \P that were compiled with any release up to and including 6.4, you
|
||||
will have to recompile them for release 6.5 and above.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 01 February 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,81 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcresample specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcresample man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
PCRE SAMPLE PROGRAM
|
||||
</b><br>
|
||||
<P>
|
||||
A simple, complete demonstration program, to get you started with using PCRE,
|
||||
is supplied in the file <i>pcredemo.c</i> in the PCRE distribution.
|
||||
</P>
|
||||
<P>
|
||||
The program compiles the regular expression that is its first argument, and
|
||||
matches it against the subject string in its second argument. No PCRE options
|
||||
are set, and default character tables are used. If matching succeeds, the
|
||||
program outputs the portion of the subject that matched, together with the
|
||||
contents of any captured substrings.
|
||||
</P>
|
||||
<P>
|
||||
If the -g option is given on the command line, the program then goes on to
|
||||
check for further matches of the same regular expression in the same subject
|
||||
string. The logic is a little bit tricky because of the possibility of matching
|
||||
an empty string. Comments in the code explain what is going on.
|
||||
</P>
|
||||
<P>
|
||||
If PCRE is installed in the standard include and library directories for your
|
||||
system, you should be able to compile the demonstration program using this
|
||||
command:
|
||||
<pre>
|
||||
gcc -o pcredemo pcredemo.c -lpcre
|
||||
</pre>
|
||||
If PCRE is installed elsewhere, you may need to add additional options to the
|
||||
command line. For example, on a Unix-like system that has PCRE installed in
|
||||
<i>/usr/local</i>, you can compile the demonstration program using a command
|
||||
like this:
|
||||
<pre>
|
||||
gcc -o pcredemo -I/usr/local/include pcredemo.c -L/usr/local/lib -lpcre
|
||||
</pre>
|
||||
Once you have compiled the demonstration program, you can run simple tests like
|
||||
this:
|
||||
<pre>
|
||||
./pcredemo 'cat|dog' 'the cat sat on the mat'
|
||||
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
|
||||
</pre>
|
||||
Note that there is a much more comprehensive test program, called
|
||||
<a href="pcretest.html"><b>pcretest</b>,</a>
|
||||
which supports many more facilities for testing regular expressions and the
|
||||
PCRE library. The <b>pcredemo</b> program is provided as a simple coding
|
||||
example.
|
||||
</P>
|
||||
<P>
|
||||
On some operating systems (e.g. Solaris), when PCRE is not installed in the
|
||||
standard library directory, you may get an error like this when you try to run
|
||||
<b>pcredemo</b>:
|
||||
<pre>
|
||||
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
|
||||
</pre>
|
||||
This is caused by the way shared library support works on those systems. You
|
||||
need to add
|
||||
<pre>
|
||||
-R/usr/local/lib
|
||||
</pre>
|
||||
(for example) to the compile command to get round this problem.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 09 September 2004
|
||||
<br>
|
||||
Copyright © 1997-2004 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,127 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcrestack specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrestack man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
PCRE DISCUSSION OF STACK USAGE
|
||||
</b><br>
|
||||
<P>
|
||||
When you call <b>pcre_exec()</b>, it makes use of an internal function called
|
||||
<b>match()</b>. This calls itself recursively at branch points in the pattern,
|
||||
in order to remember the state of the match so that it can back up and try a
|
||||
different alternative if the first one fails. As matching proceeds deeper and
|
||||
deeper into the tree of possibilities, the recursion depth increases.
|
||||
</P>
|
||||
<P>
|
||||
Not all calls of <b>match()</b> increase the recursion depth; for an item such
|
||||
as a* it may be called several times at the same level, after matching
|
||||
different numbers of a's. Furthermore, in a number of cases where the result of
|
||||
the recursive call would immediately be passed back as the result of the
|
||||
current call (a "tail recursion"), the function is just restarted instead.
|
||||
</P>
|
||||
<P>
|
||||
The <b>pcre_dfa_exec()</b> function operates in an entirely different way, and
|
||||
hardly uses recursion at all. The limit on its complexity is the amount of
|
||||
workspace it is given. The comments that follow do NOT apply to
|
||||
<b>pcre_dfa_exec()</b>; they are relevant only for <b>pcre_exec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
You can set limits on the number of times that <b>match()</b> is called, both in
|
||||
total and recursively. If the limit is exceeded, an error occurs. For details,
|
||||
see the
|
||||
<a href="pcreapi.html#extradata">section on extra data for <b>pcre_exec()</b></a>
|
||||
in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
Each time that <b>match()</b> is actually called recursively, it uses memory
|
||||
from the process stack. For certain kinds of pattern and data, very large
|
||||
amounts of stack may be needed, despite the recognition of "tail recursion".
|
||||
You can often reduce the amount of recursion, and therefore the amount of stack
|
||||
used, by modifying the pattern that is being matched. Consider, for example,
|
||||
this pattern:
|
||||
<pre>
|
||||
([^<]|<(?!inet))+
|
||||
</pre>
|
||||
It matches from wherever it starts until it encounters "<inet" or the end of
|
||||
the data, and is the kind of pattern that might be used when processing an XML
|
||||
file. Each iteration of the outer parentheses matches either one character that
|
||||
is not "<" or a "<" that is not followed by "inet". However, each time a
|
||||
parenthesis is processed, a recursion occurs, so this formulation uses a stack
|
||||
frame for each matched character. For a long string, a lot of stack is
|
||||
required. Consider now this rewritten pattern, which matches exactly the same
|
||||
strings:
|
||||
<pre>
|
||||
([^<]++|<(?!inet))+
|
||||
</pre>
|
||||
This uses very much less stack, because runs of characters that do not contain
|
||||
"<" are "swallowed" in one item inside the parentheses. Recursion happens only
|
||||
when a "<" character that is not followed by "inet" is encountered (and we
|
||||
assume this is relatively rare). A possessive quantifier is used to stop any
|
||||
backtracking into the runs of non-"<" characters, but that is not related to
|
||||
stack usage.
|
||||
</P>
|
||||
<P>
|
||||
In environments where stack memory is constrained, you might want to compile
|
||||
PCRE to use heap memory instead of stack for remembering back-up points. This
|
||||
makes it run a lot more slowly, however. Details of how to do this are given in
|
||||
the
|
||||
<a href="pcrebuild.html"><b>pcrebuild</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
In Unix-like environments, there is not often a problem with the stack, though
|
||||
the default limit on stack size varies from system to system. Values from 8Mb
|
||||
to 64Mb are common. You can find your default limit by running the command:
|
||||
<pre>
|
||||
ulimit -s
|
||||
</pre>
|
||||
The effect of running out of stack is often SIGSEGV, though sometimes an error
|
||||
message is given. You can normally increase the limit on stack size by code
|
||||
such as this:
|
||||
<pre>
|
||||
#include <sys/resource.h>   /* struct rlimit, getrlimit(), setrlimit() */
struct rlimit rlim;
|
||||
getrlimit(RLIMIT_STACK, &rlim);
|
||||
rlim.rlim_cur = 100*1024*1024;
|
||||
setrlimit(RLIMIT_STACK, &rlim);
|
||||
</pre>
|
||||
This reads the current limits (soft and hard) using <b>getrlimit()</b>, then
|
||||
attempts to increase the soft limit to 100Mb using <b>setrlimit()</b>. You must
|
||||
do this before calling <b>pcre_exec()</b>.
|
||||
</P>
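<P>
The fragment above ignores errors, and <b>setrlimit()</b> rejects a soft limit larger than the hard limit. A slightly more defensive sketch, using only the POSIX <b>getrlimit()</b> and <b>setrlimit()</b> calls, caps the request at the hard limit and reports the limit actually in force. The function name <b>raise_stack_limit()</b> is hypothetical.
</P>

```c
#include <sys/resource.h>

/* Ask for a soft stack limit of at least `want` bytes, capped at the
   hard limit (setrlimit() fails if rlim_cur exceeds rlim_max). On
   success returns 0 and stores the soft limit now in force in *now;
   returns -1 on error. Call this before pcre_exec() runs. */
static int raise_stack_limit(rlim_t want, rlim_t *now)
{
    struct rlimit rlim;
    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= want) {
        *now = rlim.rlim_cur;               /* already large enough */
        return 0;
    }
    if (rlim.rlim_max != RLIM_INFINITY && want > rlim.rlim_max)
        want = rlim.rlim_max;               /* cannot exceed hard limit */
    rlim.rlim_cur = want;
    if (setrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    *now = rlim.rlim_cur;
    return 0;
}
```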
|
||||
<P>
|
||||
PCRE has an internal counter that can be used to limit the depth of recursion,
|
||||
and thus cause <b>pcre_exec()</b> to give an error code before it runs out of
|
||||
stack. By default, the limit is very large, and unlikely ever to operate. It
|
||||
can be changed when PCRE is built, and it can also be set when
|
||||
<b>pcre_exec()</b> is called. For details of these interfaces, see the
|
||||
<a href="pcrebuild.html"><b>pcrebuild</b></a>
|
||||
and
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
As a very rough rule of thumb, you should reckon on about 500 bytes per
|
||||
recursion. Thus, if you want to limit your stack usage to 8Mb, you
|
||||
should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can
|
||||
support around 128000 recursions. The <b>pcretest</b> test program has a command
|
||||
line option (<b>-S</b>) that can be used to increase its stack.
|
||||
</P>
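<P>
The arithmetic behind these figures, as a small C helper: dividing a stack budget by the rough 500-byte figure gives 16777 recursions for 8Mb (8 x 1024 x 1024 bytes) and 134217 for 64Mb, which the text rounds to 16000 and about 128000. The 500-byte constant is only the rule of thumb quoted above, and the helper name is hypothetical. The result could then be supplied as the recursion limit when <b>pcre_exec()</b> is called, through the interface described in the pcreapi documentation.
</P>

```c
/* Rule of thumb from the text: about 500 bytes of stack per recursive
   call of match(). Measure on your own platform before relying on it. */
#define BYTES_PER_RECURSION 500UL

/* Hypothetical helper: recursion limit that keeps match() within
   `stack_bytes` of stack, according to the rule of thumb. */
static unsigned long recursion_limit_for_stack(unsigned long stack_bytes)
{
    return stack_bytes / BYTES_PER_RECURSION;
}
```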
|
||||
<P>
|
||||
Last updated: 29 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,616 @@
|
|||
<html>
|
||||
<head>
|
||||
<title>pcretest specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcretest man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
|
||||
<li><a name="TOC2" href="#SEC2">OPTIONS</a>
|
||||
<li><a name="TOC3" href="#SEC3">DESCRIPTION</a>
|
||||
<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>
|
||||
<li><a name="TOC5" href="#SEC5">DATA LINES</a>
|
||||
<li><a name="TOC6" href="#SEC6">THE ALTERNATIVE MATCHING FUNCTION</a>
|
||||
<li><a name="TOC7" href="#SEC7">DEFAULT OUTPUT FROM PCRETEST</a>
|
||||
<li><a name="TOC8" href="#SEC8">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
|
||||
<li><a name="TOC9" href="#SEC9">RESTARTING AFTER A PARTIAL MATCH</a>
|
||||
<li><a name="TOC10" href="#SEC10">CALLOUTS</a>
|
||||
<li><a name="TOC11" href="#SEC11">SAVING AND RELOADING COMPILED PATTERNS</a>
|
||||
<li><a name="TOC12" href="#SEC12">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||
<P>
|
||||
<b>pcretest [options] [source] [destination]</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcretest</b> was written as a test program for the PCRE regular expression
|
||||
library itself, but it can also be used for experimenting with regular
|
||||
expressions. This document describes the features of the test program; for
|
||||
details of the regular expressions themselves, see the
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
documentation. For details of the PCRE library function calls and their
|
||||
options, see the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">OPTIONS</a><br>
|
||||
<P>
|
||||
<b>-C</b>
|
||||
Output the version number of the PCRE library, and all available information
|
||||
about the optional features that are included, and then exit.
|
||||
</P>
|
||||
<P>
|
||||
<b>-d</b>
|
||||
Behave as if each regex has the <b>/D</b> (debug) modifier; the internal
|
||||
form is output after compilation.
|
||||
</P>
|
||||
<P>
|
||||
<b>-dfa</b>
|
||||
Behave as if each data line contains the \D escape sequence; this causes the
|
||||
alternative matching function, <b>pcre_dfa_exec()</b>, to be used instead of the
|
||||
standard <b>pcre_exec()</b> function (more detail is given below).
|
||||
</P>
|
||||
<P>
|
||||
<b>-i</b>
|
||||
Behave as if each regex has the <b>/I</b> modifier; information about the
|
||||
compiled pattern is given after compilation.
|
||||
</P>
|
||||
<P>
|
||||
<b>-m</b>
|
||||
Output the size of each compiled pattern after it has been compiled. This is
|
||||
equivalent to adding <b>/M</b> to each regular expression. For compatibility
|
||||
with earlier versions of pcretest, <b>-s</b> is a synonym for <b>-m</b>.
|
||||
</P>
|
||||
<P>
|
||||
<b>-o</b> <i>osize</i>
|
||||
Set the number of elements in the output vector that is used when calling
|
||||
<b>pcre_exec()</b> to be <i>osize</i>. The default value is 45, which is enough
|
||||
for 14 capturing subexpressions. The vector size can be changed for individual
|
||||
matching calls by including \O in the data line (see below).
|
||||
</P>
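<P>
The default of 45 follows from how <b>pcre_exec()</b> uses the output vector: the first two-thirds hold pairs of offsets and the last third is workspace, so the whole match and each of the 14 subpatterns need three elements apiece (3 x 15 = 45). A hypothetical helper for choosing <i>osize</i> from a pattern's capture count, which <b>pcre_fullinfo()</b> reports for PCRE_INFO_CAPTURECOUNT:
</P>

```c
/* Hypothetical helper: output vector size (in ints) needed so that
   pcre_exec() can report the whole match plus `capture_count` capturing
   subpatterns; three elements per substring, as described above. */
static int ovector_size(int capture_count)
{
    return 3 * (capture_count + 1);
}
```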
|
||||
<P>
|
||||
<b>-p</b>
|
||||
Behave as if each regex has the <b>/P</b> modifier; the POSIX wrapper API is
|
||||
used to call PCRE. None of the other options has any effect when <b>-p</b> is
|
||||
set.
|
||||
</P>
|
||||
<P>
|
||||
<b>-q</b>
|
||||
Do not output the version number of <b>pcretest</b> at the start of execution.
|
||||
</P>
|
||||
<P>
|
||||
<b>-S</b> <i>size</i>
|
||||
On Unix-like systems, set the size of the runtime stack to <i>size</i>
|
||||
megabytes.
|
||||
</P>
|
||||
<P>
|
||||
<b>-t</b>
|
||||
Run each compile, study, and match many times with a timer, and output
|
||||
resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with
<b>-t</b>, because the size is then output on every timing iteration, and the
timing will be distorted.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
If <b>pcretest</b> is given two filename arguments, it reads from the first and
|
||||
writes to the second. If it is given only one filename argument, it reads from
|
||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||
stdout, and prompts for each line of input, using "re>" to prompt for regular
|
||||
expressions, and "data>" to prompt for data lines.
|
||||
</P>
|
||||
<P>
|
||||
The program handles any number of sets of input on a single input file. Each
|
||||
set starts with a regular expression, and continues with any number of data
|
||||
lines to be matched against the pattern.
|
||||
</P>
|
||||
<P>
|
||||
Each data line is matched separately and independently. If you want to do
|
||||
multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
|
||||
depending on the newline setting) in a single line of input to encode the
|
||||
newline characters. There is no limit on the length of data lines; the input
|
||||
buffer is automatically extended if it is too small.
|
||||
</P>
|
||||
<P>
|
||||
An empty line signals the end of the data lines, at which point a new regular
|
||||
expression is read. The regular expressions are given enclosed in any
|
||||
non-alphanumeric delimiters other than backslash, for example:
|
||||
<pre>
|
||||
/(a|bc)x+yz/
|
||||
</pre>
|
||||
White space before the initial delimiter is ignored. A regular expression may
|
||||
be continued over several input lines, in which case the newline characters are
|
||||
included within it. It is possible to include the delimiter within the pattern
|
||||
by escaping it, for example
|
||||
<pre>
|
||||
/abc\/def/
|
||||
</pre>
|
||||
If you do so, the escape and the delimiter form part of the pattern, but since
|
||||
delimiters are always non-alphanumeric, this does not affect its interpretation.
|
||||
If the terminating delimiter is immediately followed by a backslash, for
|
||||
example,
|
||||
<pre>
|
||||
/abc/\
|
||||
</pre>
|
||||
then a backslash is added to the end of the pattern. This is done to provide a
|
||||
way of testing the error condition that arises if a pattern finishes with a
|
||||
backslash, because
|
||||
<pre>
|
||||
/abc\/
|
||||
</pre>
|
||||
is interpreted as the first line of a pattern that starts with "abc/", causing
|
||||
pcretest to read the next line as a continuation of the regular expression.
|
||||
</P>
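<P>
The delimiter rules above can be sketched as a small C function that splits one input line into pattern text and modifiers. This is not <b>pcretest</b>'s own code. The name <b>split_pattern_line()</b> and its return convention are hypothetical, and it only reports that a continuation line is needed; it does not read one.
</P>

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: split a line such as "/abc\/def/i" into the
   pattern text and its modifiers. Returns 0 on success, 1 if the closing
   delimiter was not found (the pattern continues on the next line), or
   -1 for a bad delimiter or a buffer that is too small. */
static int split_pattern_line(const char *line, char *pat, size_t patsz,
                              char *mods, size_t modsz)
{
    size_t p = 0, m = 0;
    while (isspace((unsigned char)*line))     /* leading white space */
        line++;
    char delim = *line++;
    if (delim == '\0' || delim == '\\' || isalnum((unsigned char)delim))
        return -1;
    for (;;) {
        if (*line == '\0')
            return 1;                         /* no closing delimiter */
        if (*line == delim)
            break;
        /* A backslash is copied together with the next character, so an
           escaped delimiter stays in the pattern and does not end it. */
        size_t n = (*line == '\\' && line[1] != '\0') ? 2 : 1;
        if (p + n >= patsz)
            return -1;
        for (size_t i = 0; i < n; i++)
            pat[p++] = *line++;
    }
    line++;                                   /* skip closing delimiter */
    if (*line == '\\') {                      /* backslash after delimiter */
        if (p + 1 >= patsz)
            return -1;
        pat[p++] = '\\';                      /* is added to the pattern */
        line++;
    }
    pat[p] = '\0';
    for (; *line != '\0'; line++) {           /* modifiers, minus spaces */
        if (isspace((unsigned char)*line))
            continue;
        if (m + 1 >= modsz)
            return -1;
        mods[m++] = *line;
    }
    mods[m] = '\0';
    return 0;
}
```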
|
||||
<br><a name="SEC4" href="#TOC1">PATTERN MODIFIERS</a><br>
|
||||
<P>
|
||||
A pattern may be followed by any number of modifiers, which are mostly single
|
||||
characters. Following Perl usage, these are referred to below as, for example,
|
||||
"the <b>/i</b> modifier", even though the delimiter of the pattern need not
|
||||
always be a slash, and no slash is used when writing modifiers. Whitespace may
|
||||
appear between the final pattern delimiter and the first modifier, and between
|
||||
the modifiers themselves.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS,
|
||||
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when
|
||||
<b>pcre_compile()</b> is called. These four modifier letters have the same
|
||||
effect as they do in Perl. For example:
|
||||
<pre>
|
||||
/caseless/i
|
||||
</pre>
|
||||
The following table shows additional modifiers for setting PCRE options that do
|
||||
not correspond to anything in Perl:
|
||||
<pre>
|
||||
<b>/A</b> PCRE_ANCHORED
|
||||
<b>/C</b> PCRE_AUTO_CALLOUT
|
||||
<b>/E</b> PCRE_DOLLAR_ENDONLY
|
||||
<b>/f</b> PCRE_FIRSTLINE
|
||||
<b>/J</b> PCRE_DUPNAMES
|
||||
<b>/N</b> PCRE_NO_AUTO_CAPTURE
|
||||
<b>/U</b> PCRE_UNGREEDY
|
||||
<b>/X</b> PCRE_EXTRA
|
||||
<b>/<cr></b> PCRE_NEWLINE_CR
|
||||
<b>/<lf></b> PCRE_NEWLINE_LF
|
||||
<b>/<crlf></b> PCRE_NEWLINE_CRLF
|
||||
</pre>
|
||||
Those specifying line endings are literal strings as shown. Details of the
|
||||
meanings of these PCRE options are given in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><b>
|
||||
Finding all matches in a string
|
||||
</b><br>
|
||||
<P>
|
||||
Searching for all possible matches within each subject string can be requested
|
||||
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
|
||||
again to search the remainder of the subject string. The difference between
|
||||
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to
|
||||
<b>pcre_exec()</b> to start searching at a new point within the entire string
|
||||
(which is in effect what Perl does), whereas the latter passes over a shortened
|
||||
substring. This makes a difference to the matching process if the pattern
|
||||
begins with a lookbehind assertion (including \b or \B).
|
||||
</P>
|
||||
<P>
|
||||
If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an
|
||||
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
|
||||
flags set in order to search for another, non-empty, match at the same point.
|
||||
If this second match fails, the start offset is advanced by one, and the normal
|
||||
match is retried. This imitates the way Perl handles such cases when using the
|
||||
<b>/g</b> modifier or the <b>split()</b> function.
|
||||
</P>
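<P>
A sketch of a similar loop, written against the system POSIX <b>regexec()</b> rather than <b>pcre_exec()</b> (an assumption made so that the example is self-contained). POSIX has no start-offset argument, so each call passes a shortened subject with REG_NOTBOL, which corresponds to the <b>/G</b> behaviour rather than <b>/g</b>. POSIX also has no PCRE_NOTEMPTY, so the sketch simply advances by one character after an empty match. For a matcher that follows the POSIX leftmost-longest rule this should give the same result, because an empty leftmost-longest match means that no non-empty match starts at that point. The name <b>count_matches()</b> is hypothetical.
</P>

```c
#include <regex.h>
#include <string.h>

/* Hypothetical sketch: count the matches of `pattern` (POSIX extended
   syntax) in `subject`, passing a shortened subject on each call as /G
   does. Returns the count, or -1 if compiling or matching fails. */
static int count_matches(const char *pattern, const char *subject)
{
    regex_t preg;
    regmatch_t m;
    size_t len = strlen(subject), offset = 0;
    int count = 0;

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;
    while (offset <= len) {
        /* REG_NOTBOL: the shortened string does not begin at the start
           of the real subject, so ^ must not match there. */
        int rc = regexec(&preg, subject + offset, 1, &m,
                         offset > 0 ? REG_NOTBOL : 0);
        if (rc == REG_NOMATCH)
            break;
        if (rc != 0) {
            count = -1;
            break;
        }
        count++;
        if (m.rm_eo == m.rm_so)
            offset += (size_t)m.rm_eo + 1;    /* empty: step past it */
        else
            offset += (size_t)m.rm_eo;        /* resume after the match */
    }
    regfree(&preg);
    return count;
}
```

<P>
For example, x* finds three empty matches in "ab" (at offsets 0, 1, and 2), as Perl's <b>/g</b> does, while ^a finds only one match in "aaa" because REG_NOTBOL stops ^ matching at later offsets.
</P>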
|
||||
<br><b>
|
||||
Other modifiers
|
||||
</b><br>
|
||||
<P>
|
||||
There are yet more modifiers for controlling the way <b>pcretest</b>
|
||||
operates.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/+</b> modifier requests that as well as outputting the substring that
|
||||
matched the entire pattern, pcretest should in addition output the remainder of
|
||||
the subject string. This is useful for tests where the subject contains
|
||||
multiple copies of the same substring.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/L</b> modifier must be followed directly by the name of a locale, for
|
||||
example,
|
||||
<pre>
|
||||
/pattern/Lfr_FR
|
||||
</pre>
|
||||
For this reason, it must be the last modifier. The given locale is set,
|
||||
<b>pcre_maketables()</b> is called to build a set of character tables for the
|
||||
locale, and this is then passed to <b>pcre_compile()</b> when compiling the
|
||||
regular expression. Without an <b>/L</b> modifier, NULL is passed as the tables
|
||||
pointer; that is, <b>/L</b> applies only to the expression on which it appears.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
|
||||
compiled pattern (whether it is anchored, has a fixed first character, and
|
||||
so on). It does this by calling <b>pcre_fullinfo()</b> after compiling a
|
||||
pattern. If the pattern is studied, the results of that are also output.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/D</b> modifier is a PCRE debugging feature, which also assumes <b>/I</b>.
|
||||
It causes the internal form of compiled regular expressions to be output after
|
||||
compilation. If the pattern was studied, the information returned is also
|
||||
output.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the
|
||||
fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
|
||||
facility is for testing the feature in PCRE that allows it to execute patterns
|
||||
that were compiled on a host with a different endianness. This feature is not
|
||||
available when the POSIX interface to PCRE is being used, that is, when the
|
||||
<b>/P</b> pattern modifier is specified. See also the section about saving and
|
||||
reloading compiled patterns below.
|
||||
</P>
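<P>
The byte flipping that <b>/F</b> performs amounts to reversing the bytes of each 2-byte and 4-byte field. A minimal sketch of those two operations follows; it is not PCRE's internal code, and the names are hypothetical.
</P>

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit field, e.g. 0x1234 -> 0x3412. */
static uint16_t flip16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the four bytes of a 32-bit field, e.g. 0x11223344 -> 0x44332211. */
static uint32_t flip32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```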
|
||||
<P>
|
||||
The <b>/S</b> modifier causes <b>pcre_study()</b> to be called after the
|
||||
expression has been compiled, and the results used when the expression is
|
||||
matched.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/M</b> modifier causes the size of the memory block used to hold the compiled
|
||||
pattern to be output.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper
|
||||
API rather than its native API. When this is done, all other modifiers except
|
||||
<b>/i</b>, <b>/m</b>, and <b>/+</b> are ignored. REG_ICASE is set if <b>/i</b> is
|
||||
present, and REG_NEWLINE is set if <b>/m</b> is present. The wrapper functions
|
||||
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/8</b> modifier causes <b>pcretest</b> to call PCRE with the PCRE_UTF8
|
||||
option set. This turns on support for UTF-8 character handling in PCRE,
|
||||
provided that it was compiled with this support enabled. This modifier also
|
||||
causes any non-printing characters in output strings to be printed using the
|
||||
\x{hh...} notation if they are valid UTF-8 sequences.
|
||||
</P>
|
||||
<P>
|
||||
If the <b>/?</b> modifier is used with <b>/8</b>, it causes <b>pcretest</b> to
|
||||
call <b>pcre_compile()</b> with the PCRE_NO_UTF8_CHECK option, to suppress the
|
||||
checking of the string for UTF-8 validity.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">DATA LINES</a><br>
|
||||
<P>
|
||||
Before each data line is passed to <b>pcre_exec()</b>, leading and trailing
|
||||
whitespace is removed, and it is then scanned for \ escapes. Some of these are
|
||||
pretty esoteric features, intended for checking out some of the more
|
||||
complicated features of PCRE. If you are just testing "ordinary" regular
|
||||
expressions, you probably don't need any of these. The following escapes are
|
||||
recognized:
|
||||
<pre>
|
||||
\a alarm (= BEL)
|
||||
\b backspace
|
||||
\e escape
|
||||
\f formfeed
|
||||
\n newline
|
||||
\qdd set the PCRE_MATCH_LIMIT limit to dd (any number of digits)
|
||||
\r carriage return
|
||||
\t tab
|
||||
\v vertical tab
|
||||
\nnn octal character (up to 3 octal digits)
|
||||
\xhh hexadecimal character (up to 2 hex digits)
|
||||
\x{hh...} hexadecimal character, any number of digits in UTF-8 mode
|
||||
\A pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\B pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\Cdd call pcre_copy_substring() for substring dd after a successful match (number less than 32)
|
||||
\Cname call pcre_copy_named_substring() for substring "name" after a successful match (name termin-
|
||||
ated by next non-alphanumeric character)
|
||||
\C+ show the current captured substrings at callout time
|
||||
\C- do not supply a callout function
|
||||
\C!n return 1 instead of 0 when callout number n is reached
|
||||
\C!n!m return 1 instead of 0 when callout number n is reached for the mth time
|
||||
\C*n pass the number n (may be negative) as callout data; this is used as the callout return value
|
||||
\D use the <b>pcre_dfa_exec()</b> match function
|
||||
\F only shortest match for <b>pcre_dfa_exec()</b>
|
||||
\Gdd call pcre_get_substring() for substring dd after a successful match (number less than 32)
|
||||
\Gname call pcre_get_named_substring() for substring "name" after a successful match (name termin-
|
||||
ated by next non-alphanumeric character)
|
||||
\L call pcre_get_substringlist() after a successful match
|
||||
\M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
|
||||
\N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\Odd set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
|
||||
\P pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
|
||||
\R pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b>
|
||||
\S output details of memory get/free calls during matching
|
||||
\Z pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\? pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\>dd start the match at offset dd (any number of digits);
|
||||
this sets the <i>startoffset</i> argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\<cr> pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\<lf> pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\<crlf> pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
</pre>
|
||||
The escapes that specify line endings are literal strings, exactly as shown.
|
||||
A backslash followed by anything else just escapes the anything else. If the
|
||||
very last character is a backslash, it is ignored. This gives a way of passing
|
||||
an empty line as data, since a real empty line terminates the data input.
|
||||
</P>
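<P>
The whitespace trimming and character escapes described above can be sketched in C as follows. This is a hypothetical helper, decode_data_line, written for illustration rather than taken from pcretest: it handles only the character escapes (\a, \b, \e, \f, \n, \r, \t, \v, \nnn, \xhh) and the trailing-backslash rule. It leaves out \x{...} and all the option-setting escapes, which in this sketch would simply escape themselves, and it truncates octal values above 255 to one byte.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Decode the character escapes of a pcretest-style data line (a sketch).
   Leading and trailing whitespace is removed first. out must be at least
   strlen(in) + 1 bytes; decoding never makes the string longer.
   Returns the decoded length. */
static size_t decode_data_line(const char *in, char *out)
{
    const char *p = in, *end;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    end = in + strlen(in);
    while (p < end && isspace((unsigned char)*p)) p++;
    while (end > p && isspace((unsigned char)end[-1])) end--;

    while (p < end) {
        int c = (unsigned char)*p++;
        if (c != '\\') { out[n++] = (char)c; continue; }
        if (p >= end) break;            /* a final backslash is ignored */
        c = (unsigned char)*p++;
        switch (c) {
        case 'a': c = 7; break;         /* alarm (BEL) */
        case 'b': c = '\b'; break;
        case 'e': c = 27; break;        /* escape */
        case 'f': c = '\f'; break;
        case 'n': c = '\n'; break;
        case 'r': c = '\r'; break;
        case 't': c = '\t'; break;
        case 'v': c = '\v'; break;
        case 'x': {                     /* \xhh: up to two hex digits */
            int v = 0, k;
            for (k = 0; k < 2 && p < end && isxdigit((unsigned char)*p); k++, p++)
                v = v * 16 + (isdigit((unsigned char)*p) ? *p - '0'
                              : tolower((unsigned char)*p) - 'a' + 10);
            c = v;
            break;
        }
        default:
            if (c >= '0' && c <= '7') { /* \nnn: up to three octal digits */
                int v = c - '0', k;
                for (k = 1; k < 3 && p < end && *p >= '0' && *p <= '7'; k++, p++)
                    v = v * 8 + (*p - '0');
                c = v;                  /* truncated to a byte below */
            }
            break;                      /* anything else escapes itself */
        }
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

A data line consisting of a single backslash decodes to an empty string, which is how an empty subject can be supplied.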
|
||||
<P>
|
||||
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
|
||||
different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
|
||||
fields of the <b>pcre_extra</b> data structure, until it finds the minimum
|
||||
numbers for each parameter that allow <b>pcre_exec()</b> to complete. The
|
||||
<i>match_limit</i> number is a measure of the amount of backtracking that takes
|
||||
place, and checking it out can be instructive. For most simple matches, the
|
||||
number is quite small, but for patterns with very large numbers of matching
|
||||
possibilities, it can become large very quickly with increasing length of
|
||||
subject string. The <i>match_limit_recursion</i> number is a measure of how much
|
||||
stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
|
||||
to complete the match attempt.
|
||||
</P>
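<P>
pcretest's own choice of successive values is not described here. One way to find such a minimum, assuming that raising a limit never makes a match attempt fail where a lower limit succeeded, is a binary search. In the C sketch below, min_limit and fake_probe are hypothetical names; fake_probe stands in for a call to <b>pcre_exec()</b> with the <i>match_limit</i> field of a <b>pcre_extra</b> block set to the value being probed.
</P>

```c
#include <assert.h>

/* A probe runs one match attempt with the given limit and returns 1 if it
   completes, 0 if it stops because the limit was reached. */
typedef int (*limit_probe)(unsigned long limit, void *ctx);

/* Smallest limit in [1, max] for which probe() succeeds. Assumes success
   is monotonic in the limit and that probe(max) succeeds. */
static unsigned long min_limit(limit_probe probe, void *ctx, unsigned long max)
{
    unsigned long lo = 1, hi = max;
    assert(probe(max, ctx));
    while (lo < hi) {
        unsigned long mid = lo + (hi - lo) / 2;
        if (probe(mid, ctx))
            hi = mid;        /* mid is enough: the answer is mid or lower */
        else
            lo = mid + 1;    /* mid is too small: the answer is higher */
    }
    return lo;
}

/* Hypothetical stand-in for a limited matcher: the "match" needs exactly
   *(const unsigned long *)ctx units of work to complete. */
static int fake_probe(unsigned long limit, void *ctx)
{
    return limit >= *(const unsigned long *)ctx;
}
```

Each probe costs one full match attempt, so the search needs roughly log2(max) attempts per parameter.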
|
||||
<P>
|
||||
When \O is used, the value specified may be higher or lower than the size set
|
||||
by the <b>-O</b> command line option (or defaulted to 45); \O applies only to
|
||||
the call of <b>pcre_exec()</b> for the line in which it appears.
|
||||
</P>
|
||||
<P>
|
||||
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
|
||||
API to be used, the only option-setting sequences that have any effect are \B
|
||||
and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
|
||||
<b>regexec()</b>.
|
||||
</P>
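<P>
The effect of REG_NOTBOL and REG_NOTEOL can be seen with any POSIX regex implementation. The C sketch below uses the system's POSIX regex.h header with REG_EXTENDED, not PCRE's own POSIX wrapper, so the accepted pattern syntax differs and the PCRE_DOLLAR_ENDONLY and PCRE_DOTALL settings that the wrapper forces do not apply. The helper name posix_match is hypothetical.
</P>

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if pattern matches subject, 0 if it does not (or regexec()
   reports an error), and -1 if the pattern fails to compile.
   cflags go to regcomp() (e.g. REG_ICASE, REG_NEWLINE); eflags go to
   regexec() (e.g. REG_NOTBOL, REG_NOTEOL). */
static int posix_match(const char *pattern, const char *subject,
                       int cflags, int eflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, cflags | REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

With REG_NOTBOL, "^abc" no longer matches "abc", because the start of the subject is not treated as the start of a line; REG_NOTEOL does the same for "$" at the end.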
|
||||
<P>
|
||||
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
|
||||
of the <b>/8</b> modifier on the pattern. It is recognized always. There may be
|
||||
any number of hexadecimal digits inside the braces. The result is from one to
|
||||
six bytes, encoded according to the UTF-8 rules.
|
||||
</P>
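<P>
As an illustration of the one-to-six-byte rule, here is a minimal C sketch of the byte sequence that \x{hh...} produces. It is not taken from pcretest, and the function name utf8_encode is hypothetical. It follows the original UTF-8 definition (values up to 0x7FFFFFFF), which allows six-byte sequences; the current Unicode standard restricts UTF-8 to at most four bytes.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Encode code point cp into buf (at least 6 bytes) using the original
   UTF-8 rules. Returns the number of bytes written (1 to 6). */
static size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    /* Exclusive upper bound of the values that fit in 1..6 bytes. */
    static const unsigned long limits[] = {
        0x80UL, 0x800UL, 0x10000UL, 0x200000UL, 0x4000000UL, 0x80000000UL
    };
    /* Marker bits of the first byte for 1..6 byte sequences. */
    static const unsigned char lead[] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    size_t n, i;

    assert(cp < 0x80000000UL);              /* six bytes is the maximum */
    for (n = 1; n < 6 && cp >= limits[n - 1]; n++)
        ;
    /* Fill the continuation bytes from the end; each carries 6 bits. */
    for (i = n - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    buf[0] = (unsigned char)(lead[n - 1] | cp);
    return n;
}
```

For example, \x{100} becomes the two bytes 0xC4 0x80, and \x{20ac} becomes 0xE2 0x82 0xAC.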
|
||||
<br><a name="SEC6" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
||||
<P>
|
||||
By default, <b>pcretest</b> uses the standard PCRE matching function,
|
||||
<b>pcre_exec()</b>, to match each data line. From release 6.0, PCRE supports an
|
||||
alternative matching function, <b>pcre_dfa_exec()</b>, which operates in a
|
||||
different way, and has some restrictions. The differences between the two
|
||||
functions are described in the
|
||||
<a href="pcrematching.html"><b>pcrematching</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
If a data line contains the \D escape sequence, or if the command line
|
||||
contains the <b>-dfa</b> option, the alternative matching function is called.
|
||||
This function finds all possible matches at a given point. If, however, the \F
|
||||
escape sequence is present in the data line, it stops after the first match is
|
||||
found. This is always the shortest possible match.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
|
||||
<P>
|
||||
This section describes the output when the normal matching function,
|
||||
<b>pcre_exec()</b>, is being used.
|
||||
</P>
|
||||
<P>
|
||||
When a match succeeds, pcretest outputs the list of captured substrings that
|
||||
<b>pcre_exec()</b> returns, starting with number 0 for the string that matched
|
||||
the whole pattern. If the match fails, it outputs "No match" or "Partial match"
|
||||
when <b>pcre_exec()</b> returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
|
||||
respectively, and for any other failure it outputs the negative PCRE error number. Here is an example
|
||||
of an interactive <b>pcretest</b> run.
|
||||
<pre>
|
||||
$ pcretest
|
||||
PCRE version 5.00 07-Sep-2004
|
||||
|
||||
re> /^abc(\d+)/
|
||||
data> abc123
|
||||
0: abc123
|
||||
1: 123
|
||||
data> xyz
|
||||
No match
|
||||
</pre>
|
||||
If the strings contain any non-printing characters, they are output as \xhh
|
||||
escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the
|
||||
pattern. If the pattern has the <b>/+</b> modifier, the output for substring 0
|
||||
is followed by the rest of the subject string, identified by "0+" like
|
||||
this:
|
||||
<pre>
|
||||
re> /cat/+
|
||||
data> cataract
|
||||
0: cat
|
||||
0+ aract
|
||||
</pre>
|
||||
If the pattern has the <b>/g</b> or <b>/G</b> modifier, the results of successive
|
||||
matching attempts are output in sequence, like this:
|
||||
<pre>
|
||||
re> /\Bi(\w\w)/g
|
||||
data> Mississippi
|
||||
0: iss
|
||||
1: ss
|
||||
0: iss
|
||||
1: ss
|
||||
0: ipp
|
||||
1: pp
|
||||
</pre>
|
||||
"No match" is output only if the first match attempt fails.
|
||||
</P>
|
||||
<P>
|
||||
If any of the sequences <b>\C</b>, <b>\G</b>, or <b>\L</b> is present in a
|
||||
data line that is successfully matched, the substrings extracted by the
|
||||
convenience functions are output with C, G, or L after the string number
|
||||
instead of a colon. This is in addition to the normal full list. The string
|
||||
length (that is, the return from the extraction function) is given in
|
||||
parentheses after each string for <b>\C</b> and <b>\G</b>.
|
||||
</P>
|
||||
<P>
|
||||
Note that while patterns can be continued over several lines (a plain ">"
|
||||
prompt is used for continuations), data lines may not. However, newlines can be
|
||||
included in data by means of the \n escape (or \r or \r\n for those newline
|
||||
settings).
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
||||
<P>
|
||||
When the alternative matching function, <b>pcre_dfa_exec()</b>, is used (by
|
||||
means of the \D escape sequence or the <b>-dfa</b> command line option), the
|
||||
output consists of a list of all the matches that start at the first point in
|
||||
the subject where there is at least one match. For example:
|
||||
<pre>
|
||||
re> /(tang|tangerine|tan)/
|
||||
data> yellow tangerine\D
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
</pre>
|
||||
(Using the normal matching function on this data finds only "tang".) The
|
||||
longest matching string is always given first (and numbered zero).
|
||||
</P>
|
||||
<P>
|
||||
If <b>/g</b> is present on the pattern, the search for further matches resumes
|
||||
at the end of the longest match. For example:
|
||||
<pre>
|
||||
re> /(tang|tangerine|tan)/g
|
||||
data> yellow tangerine and tangy sultana\D
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
0: tang
|
||||
1: tan
|
||||
0: tan
|
||||
</pre>
|
||||
Since the alternative matching function does not support substring capture, the escape
|
||||
sequences that are concerned with captured substrings are not relevant.
|
||||
</P>
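<P>
For a pattern that is just an alternation of literal strings, the difference between the two functions' results can be modelled without a real matching engine. The C sketch below is such a toy model, not an implementation of <b>pcre_dfa_exec()</b>: backtrack_match() returns the first alternative, in pattern order, that matches at the earliest possible position, while all_matches() returns the lengths of every alternative that matches there, longest first. A caller imitating <b>/g</b> resumes at the end of the longest match. Both function names are hypothetical.
</P>

```c
#include <assert.h>
#include <string.h>

/* Backtracking-style result: index of the first alternative (in pattern
   order) that matches at the earliest position at or after `from`, or -1.
   *pos receives the start offset of the match. */
static int backtrack_match(const char *subj, size_t from,
                           const char *const *alts, int nalts, size_t *pos)
{
    size_t i, len = strlen(subj);
    int a;
    assert(from <= len);
    for (i = from; i < len; i++)
        for (a = 0; a < nalts; a++)
            if (strncmp(subj + i, alts[a], strlen(alts[a])) == 0) {
                *pos = i;
                return a;
            }
    return -1;
}

/* All-matches result: lengths of every alternative matching at the first
   position (at or after `from`) where any matches, stored longest first in
   lens[] (room for nalts entries). Returns how many were found; *pos gets
   the start offset when the result is non-zero. */
static int all_matches(const char *subj, size_t from,
                       const char *const *alts, int nalts,
                       size_t *pos, size_t *lens)
{
    size_t i, len = strlen(subj);
    int a, b, n = 0;
    assert(from <= len);
    for (i = from; i < len && n == 0; i++) {
        for (a = 0; a < nalts; a++) {
            size_t l = strlen(alts[a]);
            if (strncmp(subj + i, alts[a], l) == 0) {
                /* insertion sort keeps lens[] in decreasing order */
                for (b = n; b > 0 && lens[b - 1] < l; b--)
                    lens[b] = lens[b - 1];
                lens[b] = l;
                n++;
            }
        }
        if (n > 0)
            *pos = i;
    }
    return n;
}
```

With the alternatives tang, tangerine, and tan on "yellow tangerine and tangy sultana", backtrack_match() picks "tang" at offset 7, while successive all_matches() calls give lengths 9, 4, 3 at offset 7, then 4, 3 at offset 21, then 3 at offset 30, mirroring the output shown above.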
|
||||
<br><a name="SEC9" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
|
||||
<P>
|
||||
When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
|
||||
indicating that the subject partially matched the pattern, you can restart the
|
||||
match with additional subject data by means of the \R escape sequence. For
|
||||
example:
|
||||
<pre>
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 23ja\P\D
|
||||
Partial match: 23ja
|
||||
data> n05\R\D
|
||||
0: n05
|
||||
</pre>
|
||||
For further information about partial matching, see the
|
||||
<a href="pcrepartial.html"><b>pcrepartial</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC10" href="#TOC1">CALLOUTS</a><br>
|
||||
<P>
|
||||
If the pattern contains any callout requests, <b>pcretest</b>'s callout function
|
||||
is called during matching. This works with both matching functions. By default,
|
||||
the called function displays the callout number, the start and current
|
||||
positions in the text at the callout time, and the next pattern item to be
|
||||
tested. For example, the output
|
||||
<pre>
|
||||
--->pqrabcdef
|
||||
  0    ^  ^     \d
|
||||
</pre>
|
||||
indicates that callout number 0 occurred for a match attempt starting at the
|
||||
fourth character of the subject string, when the pointer was at the seventh
|
||||
character of the data, and when the next pattern item was \d. Just one
|
||||
circumflex is output if the start and current positions are the same.
|
||||
</P>
|
||||
<P>
|
||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
|
||||
result of the <b>/C</b> pattern modifier. In this case, instead of showing the
|
||||
callout number, the offset in the pattern, preceded by a plus, is output. For
|
||||
example:
|
||||
<pre>
|
||||
re> /\d?[A-E]\*/C
|
||||
data> E*
|
||||
--->E*
|
||||
 +0 ^      \d?
|
||||
 +3 ^      [A-E]
|
||||
 +8 ^^     \*
|
||||
+10 ^ ^
|
||||
0: E*
|
||||
</pre>
|
||||
The callout function in <b>pcretest</b> returns zero (carry on matching) by
|
||||
default, but you can use a \C item in a data line (as described above) to
|
||||
change this.
|
||||
</P>
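<P>
The layout of the marker line under "--->subject" can be sketched in C. This is a hypothetical helper, callout_marker, rather than pcretest's own code. It assumes each subject character occupies one column after the four-column "--->" prefix, treats callout number 255 as automatic, and uses an arbitrary gap before the next pattern item.
</P>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the callout marker line into out (caller supplies enough room).
   start and current are character offsets in the subject. */
static void callout_marker(char *out, int number, int pattern_offset,
                           size_t start, size_t current, const char *next_item)
{
    assert(current >= start);
    /* The label fills four columns, the width of the "--->" prefix. */
    if (number == 255)
        out += sprintf(out, "%+3d ", pattern_offset);  /* automatic callout */
    else
        out += sprintf(out, "%3d ", number);
    memset(out, ' ', start);            /* move under the start position */
    out += start;
    *out++ = '^';
    if (current > start) {              /* second caret at the current position */
        memset(out, ' ', current - start - 1);
        out += current - start - 1;
        *out++ = '^';
    }
    sprintf(out, "     %s", next_item); /* gap before the item: arbitrary width */
}
```

With callout number 0, start 3, and current position 6, the line begins "  0    ^  ^", as in the first example; with number 255, pattern offset 8, start 0, and current 1, it begins " +8 ^^".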
|
||||
<P>
|
||||
Inserting callouts can be helpful when using <b>pcretest</b> to check
|
||||
complicated regular expressions. For further information about callouts, see
|
||||
the
|
||||
<a href="pcrecallout.html"><b>pcrecallout</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
|
||||
<P>
|
||||
The facilities described in this section are not available when the POSIX
|
||||
interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
|
||||
specified.
|
||||
</P>
|
||||
<P>
|
||||
When the POSIX interface is not in use, you can cause <b>pcretest</b> to write a
|
||||
compiled pattern to a file, by following the modifiers with > and a file name.
|
||||
For example:
|
||||
<pre>
|
||||
/pattern/im >/some/file
|
||||
</pre>
|
||||
See the
|
||||
<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
|
||||
documentation for a discussion about saving and re-using compiled patterns.
|
||||
</P>
|
||||
<P>
|
||||
The data that is written is binary. The first eight bytes are the length of the
|
||||
compiled pattern data followed by the length of the optional study data, each
|
||||
written as four bytes in big-endian order (most significant byte first). If
|
||||
there is no study data (either the pattern was not studied, or studying did not
|
||||
return any data), the second length is zero. The lengths are followed by an
|
||||
exact copy of the compiled pattern. If there is additional study data, this
|
||||
follows immediately after the compiled pattern. After writing the file,
|
||||
<b>pcretest</b> expects to read a new pattern.
|
||||
</P>
|
||||
<P>
|
||||
A saved pattern can be reloaded into <b>pcretest</b> by specifying < and a file
|
||||
name instead of a pattern. The name of the file must not contain a < character,
|
||||
as otherwise <b>pcretest</b> will interpret the line as a pattern delimited by <
|
||||
characters.
|
||||
For example:
|
||||
<pre>
|
||||
re> </some/file
|
||||
Compiled regex loaded from /some/file
|
||||
No study data
|
||||
</pre>
|
||||
When the pattern has been loaded, <b>pcretest</b> proceeds to read data lines in
|
||||
the usual way.
|
||||
</P>
|
||||
<P>
|
||||
You can copy a file written by <b>pcretest</b> to a different host and reload it
|
||||
there, even if the new host has opposite endianness to the one on which the
|
||||
pattern was compiled. For example, you can compile on an i86 machine and run on
|
||||
a SPARC machine.
|
||||
</P>
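<P>
Reading the eight-byte header of a saved file, in the format described above, can be sketched in C. Because each length is assembled byte by byte from a big-endian value, the header reads the same on hosts of either byte order. The function names are hypothetical, and the sketch only checks that the announced lengths fit in the buffer; it does not validate the compiled pattern data itself.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian (most significant byte first) value. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the header: compiled-pattern length, then study-data length
   (zero if there is no study data). Returns 0 on success, or -1 if the
   buffer cannot hold the header plus the data it announces. */
static int parse_saved_header(const unsigned char *buf, size_t buflen,
                              uint32_t *pattern_len, uint32_t *study_len)
{
    assert(buf != NULL && pattern_len != NULL && study_len != NULL);
    if (buflen < 8)
        return -1;
    *pattern_len = read_be32(buf);
    *study_len = read_be32(buf + 4);
    if ((uint64_t)8 + *pattern_len + *study_len > (uint64_t)buflen)
        return -1;
    return 0;
}
```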
|
||||
<P>
|
||||
File names for saving and reloading can be absolute or relative, but note that
|
||||
the shell facility of expanding a file name that starts with a tilde (~) is not
|
||||
available.
|
||||
</P>
|
||||
<P>
|
||||
The ability to save and reload files in <b>pcretest</b> is intended for testing
|
||||
and experimentation. It is not intended for production use because only a
|
||||
single pattern can be written to a file. Furthermore, there is no facility for
|
||||
supplying custom character tables for use with a reloaded pattern. If the
|
||||
original pattern was compiled with custom tables, an attempt to match a subject
|
||||
string using a reloaded pattern is likely to cause <b>pcretest</b> to crash.
|
||||
Finally, if you attempt to load a file that is not in the correct format, the
|
||||
result is undefined.
|
||||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service,
|
||||
<br>
|
||||
Cambridge CB2 3QG, England.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 29 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
|
@ -0,0 +1,244 @@
|
|||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH INTRODUCTION
|
||||
.rs
|
||||
.sp
|
||||
The PCRE library is a set of functions that implement regular expression
|
||||
pattern matching using the same syntax and semantics as Perl, with just a few
|
||||
differences. The current implementation of PCRE (release 6.x) corresponds
|
||||
approximately with Perl 5.8, including support for UTF-8 encoded strings and
|
||||
Unicode general category properties. However, this support has to be explicitly
|
||||
enabled; it is not the default.
|
||||
.P
|
||||
In addition to the Perl-compatible matching function, PCRE also contains an
|
||||
alternative matching function that matches the same compiled patterns in a
|
||||
different way. In certain circumstances, the alternative function has some
|
||||
advantages. For a discussion of the two matching algorithms, see the
|
||||
.\" HREF
|
||||
\fBpcrematching\fP
|
||||
.\"
|
||||
page.
|
||||
.P
|
||||
PCRE is written in C and released as a C library. A number of people have
|
||||
written wrappers and interfaces of various kinds. In particular, Google Inc.
|
||||
have provided a comprehensive C++ wrapper. This is now included as part of the
|
||||
PCRE distribution. The
|
||||
.\" HREF
|
||||
\fBpcrecpp\fP
|
||||
.\"
|
||||
page has details of this interface. Other people's contributions can be found
|
||||
in the \fIContrib\fR directory at the primary FTP site, which is:
|
||||
.sp
|
||||
.\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
|
||||
.\" </a>
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
|
||||
.P
|
||||
Details of exactly which Perl regular expression features are and are not
|
||||
supported by PCRE are given in separate documents. See the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fR
|
||||
.\"
|
||||
and
|
||||
.\" HREF
|
||||
\fBpcrecompat\fR
|
||||
.\"
|
||||
pages.
|
||||
.P
|
||||
Some features of PCRE can be included, excluded, or changed when the library is
|
||||
built. The
|
||||
.\" HREF
|
||||
\fBpcre_config()\fR
|
||||
.\"
|
||||
function makes it possible for a client to discover which features are
|
||||
available. The features themselves are described in the
|
||||
.\" HREF
|
||||
\fBpcrebuild\fP
|
||||
.\"
|
||||
page. Documentation about building PCRE for various operating systems can be
|
||||
found in the \fBREADME\fP file in the source distribution.
|
||||
.P
|
||||
The library contains a number of undocumented internal functions and data
|
||||
tables that are used by more than one of the exported external functions, but
|
||||
which are not intended for use by external callers. Their names all begin with
|
||||
"_pcre_", which hopefully will not provoke any name clashes. In some
|
||||
environments, it is possible to control which external symbols are exported
|
||||
when a shared library is built, and in these cases the undocumented symbols are
|
||||
not exported.
|
||||
.
|
||||
.
|
||||
.SH "USER DOCUMENTATION"
|
||||
.rs
|
||||
.sp
|
||||
The user documentation for PCRE comprises a number of different sections. In
|
||||
the "man" format, each of these is a separate "man page". In the HTML format,
|
||||
each is a separate page, linked from the index page. In the plain text format,
|
||||
all the sections are concatenated, for ease of searching. The sections are as
|
||||
follows:
|
||||
.sp
|
||||
pcre this document
|
||||
pcreapi details of PCRE's native C API
|
||||
pcrebuild options for building PCRE
|
||||
pcrecallout details of the callout feature
|
||||
pcrecompat discussion of Perl compatibility
|
||||
pcrecpp details of the C++ wrapper
|
||||
pcregrep description of the \fBpcregrep\fP command
|
||||
pcrematching discussion of the two matching algorithms
|
||||
pcrepartial details of the partial matching facility
|
||||
.\" JOIN
|
||||
pcrepattern syntax and semantics of supported
|
||||
regular expressions
|
||||
pcreperform discussion of performance issues
|
||||
pcreposix the POSIX-compatible C API
|
||||
pcreprecompile details of saving and re-using precompiled patterns
|
||||
pcresample discussion of the sample program
|
||||
pcrestack discussion of stack usage
|
||||
pcretest description of the \fBpcretest\fP testing command
|
||||
.sp
|
||||
In addition, in the "man" and HTML formats, there is a short page for each
|
||||
C library function, listing its arguments and results.
|
||||
.
|
||||
.
|
||||
.SH LIMITATIONS
|
||||
.rs
|
||||
.sp
|
||||
There are some size limitations in PCRE but it is hoped that they will never in
|
||||
practice be relevant.
|
||||
.P
|
||||
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
|
||||
compiled with the default internal linkage size of 2. If you want to process
|
||||
regular expressions that are truly enormous, you can compile PCRE with an
|
||||
internal linkage size of 3 or 4 (see the \fBREADME\fP file in the source
|
||||
distribution and the
|
||||
.\" HREF
|
||||
\fBpcrebuild\fP
|
||||
.\"
|
||||
documentation for details). In these cases the limit is substantially larger.
|
||||
However, the speed of execution will be slower.
|
||||
.P
|
||||
All values in repeating quantifiers must be less than 65536. The maximum
|
||||
compiled length of a subpattern with an explicit repeat count is 30000 bytes. The
|
||||
maximum number of capturing subpatterns is 65535.
|
||||
.P
|
||||
There is no limit to the number of non-capturing subpatterns, but the maximum
|
||||
depth of nesting of all kinds of parenthesized subpattern, including capturing
|
||||
subpatterns, assertions, and other types of subpattern, is 200.
|
||||
.P
|
||||
The maximum length of a name for a named subpattern is 32, and the maximum number
|
||||
of named subpatterns is 10000.
|
||||
.P
|
||||
The maximum length of a subject string is the largest positive number that an
|
||||
integer variable can hold. However, when using the traditional matching
|
||||
function, PCRE uses recursion to handle subpatterns and indefinite repetition.
|
||||
This means that the available stack space may limit the size of a subject
|
||||
string that can be processed by certain patterns. For a discussion of stack
|
||||
issues, see the
|
||||
.\" HREF
|
||||
\fBpcrestack\fP
|
||||
.\"
|
||||
documentation.
|
||||
.sp
|
||||
.\" HTML <a name="utf8support"></a>
|
||||
.
|
||||
.
|
||||
.SH "UTF-8 AND UNICODE PROPERTY SUPPORT"
|
||||
.rs
|
||||
.sp
|
||||
From release 3.3, PCRE has had some support for character strings encoded in
|
||||
the UTF-8 format. For release 4.0 this was greatly extended to cover most
|
||||
common requirements, and in release 5.0 additional support for Unicode general
|
||||
category properties was added.
|
||||
.P
|
||||
In order to process UTF-8 strings, you must build PCRE to include UTF-8 support in
|
||||
the code, and, in addition, you must call
|
||||
.\" HREF
|
||||
\fBpcre_compile()\fP
|
||||
.\"
|
||||
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
|
||||
subject strings that are matched against it are treated as UTF-8 strings
|
||||
instead of just strings of bytes.
|
||||
.P
|
||||
If you compile PCRE with UTF-8 support, but do not use it at run time, the
|
||||
library will be a bit bigger, but the additional run time overhead is limited
|
||||
to testing the PCRE_UTF8 flag in several places, so should not be very large.
|
||||
.P
|
||||
If PCRE is built with Unicode character property support (which implies UTF-8
|
||||
support), the escape sequences \ep{..}, \eP{..}, and \eX are supported.
|
||||
The available properties that can be tested are limited to the general
|
||||
category properties such as Lu for an upper case letter or Nd for a decimal
|
||||
number, the Unicode script names such as Arabic or Han, and the derived
|
||||
properties Any and L&. A full list is given in the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fP
|
||||
.\"
|
||||
documentation. Only the short names for properties are supported. For example,
|
||||
\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.
|
||||
Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
|
||||
compatibility with Perl 5.6. PCRE does not support this.
|
||||
.P
|
||||
The following comments apply when PCRE is running in UTF-8 mode:
|
||||
.P
|
||||
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
|
||||
are checked for validity on entry to the relevant functions. If an invalid
|
||||
UTF-8 string is passed, an error return is given. In some situations, you may
|
||||
already know that your strings are valid, and therefore want to skip these
|
||||
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
|
||||
at compile time or at run time, PCRE assumes that the pattern or subject it
|
||||
is given (respectively) contains only valid UTF-8 codes. In this case, it does
|
||||
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
|
||||
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
|
||||
may crash.
|
||||
.P
|
||||
2. An unbraced hexadecimal escape sequence (such as \exb3) matches a two-byte
|
||||
UTF-8 character if the value is greater than 127.
|
||||
.P
|
||||
3. Octal numbers up to \e777 are recognized, and match two-byte UTF-8
|
||||
characters for values greater than \e177.
|
||||
.P
|
||||
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
|
||||
bytes, for example: \ex{100}{3}.
|
||||
.P
|
||||
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
|
||||
.P
|
||||
6. The escape sequence \eC can be used to match a single byte in UTF-8 mode,
|
||||
but its use can lead to some strange effects. This facility is not available in
|
||||
the alternative matching function, \fBpcre_dfa_exec()\fP.
|
||||
.P
|
||||
7. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
|
||||
test characters of any code value, but the characters that PCRE recognizes as
|
||||
digits, spaces, or word characters remain the same set as before, all with
|
||||
values less than 256. This remains true even when PCRE includes Unicode
|
||||
property support, because to do otherwise would slow down PCRE in many common
|
||||
cases. If you really want to test for a wider sense of, say, "digit", you
|
||||
must use Unicode property tests such as \ep{Nd}.
|
||||
.P
|
||||
8. Similarly, characters that match the POSIX named character classes are all
|
||||
low-valued characters.
|
||||
.P
|
||||
9. Case-insensitive matching applies only to characters whose values are less
|
||||
than 128, unless PCRE is built with Unicode property support. Even when Unicode
|
||||
property support is available, PCRE still uses its own character tables when
|
||||
checking the case of low-valued characters, so as not to degrade performance.
|
||||
The Unicode property information is used only for characters with higher
|
||||
values. Even when Unicode property support is available, PCRE supports
|
||||
case-insensitive matching only when there is a one-to-one mapping between a
|
||||
letter's cases. There are a small number of many-to-one mappings in Unicode;
|
||||
these are not supported by PCRE.
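.P
Items 4 and 5 above mean that in UTF-8 mode the unit of matching is a character rather than a byte. The C sketch below shows how character boundaries can be found from the byte values alone. The function names are hypothetical, and, like PCRE with PCRE_NO_UTF8_CHECK set, the sketch assumes its input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters in len bytes of valid UTF-8: every byte that is
   not a continuation byte (10xxxxxx) starts a new character. */
static size_t utf8_char_count(const unsigned char *s, size_t len)
{
    size_t i, n = 0;
    for (i = 0; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Length in bytes of the character that starts with `lead`, i.e. how far
   one "." advances in UTF-8 mode. Six-byte forms follow the original
   UTF-8 rules. */
static size_t utf8_char_len(unsigned char lead)
{
    assert(lead < 0x80 || lead >= 0xC0);  /* not a continuation byte */
    if (lead < 0x80) return 1;
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    if (lead < 0xF8) return 4;
    if (lead < 0xFC) return 5;
    return 6;
}
```

The six bytes 0xC4 0x80 0xC4 0x80 0xC4 0x80 are three copies of U+0100, so they count as three characters, the unit that a quantifier such as {3} repeats.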
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.P
|
||||
Putting an actual email address here seems to have been a spam magnet, so I've
|
||||
taken it away. If you want to email me, use my initial and surname, separated
|
||||
by a dot, at the domain ucs.cam.ac.uk.
|
||||
.sp
|
||||
.in 0
|
||||
Last updated: 05 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
File diff suppressed because it is too large
|
@ -0,0 +1,69 @@
|
|||
.TH PCRE_COMPILE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B pcre *pcre_compile(const char *\fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function compiles a regular expression into an internal form. Its
|
||||
arguments are:
|
||||
.sp
|
||||
\fIpattern\fR A zero-terminated string containing the
|
||||
regular expression to be compiled
|
||||
\fIoptions\fR Zero or more option bits
|
||||
\fIerrptr\fR Where to put an error message
|
||||
\fIerroffset\fR Offset in pattern where error was found
|
||||
\fItableptr\fR Pointer to character tables, or NULL to
|
||||
use the built-in default
|
||||
.sp
|
||||
The option bits are:
|
||||
.sp
|
||||
PCRE_ANCHORED Force pattern anchoring
|
||||
PCRE_AUTO_CALLOUT Compile automatic callouts
|
||||
PCRE_CASELESS Do caseless matching
|
||||
PCRE_DOLLAR_ENDONLY $ not to match newline at end
|
||||
PCRE_DOTALL . matches anything including NL
|
||||
PCRE_DUPNAMES Allow duplicate names for subpatterns
|
||||
PCRE_EXTENDED Ignore whitespace and # comments
|
||||
PCRE_EXTRA PCRE extra features
|
||||
(not much use currently)
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_UNGREEDY Invert greediness of quantifiers
|
||||
PCRE_UTF8 Run in UTF-8 mode
|
||||
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
|
||||
validity (only relevant if
|
||||
PCRE_UTF8 is set)
|
||||
.sp
|
||||
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
|
||||
PCRE_NO_UTF8_CHECK.
|
||||
.P
|
||||
The yield of the function is a pointer to a private data structure that
|
||||
contains the compiled pattern, or NULL if an error was detected.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fR
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,74 @@
|
|||
.TH PCRE_COMPILE2 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B int *\fIerrorcodeptr\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function compiles a regular expression into an internal form. It is the
|
||||
same as \fBpcre_compile()\fP, except for the addition of the \fIerrorcodeptr\fP
|
||||
argument. The arguments are:
|
||||
|
||||
.sp
|
||||
\fIpattern\fR A zero-terminated string containing the
|
||||
regular expression to be compiled
|
||||
\fIoptions\fR Zero or more option bits
|
||||
\fIerrorcodeptr\fP Where to put an error code
|
||||
\fIerrptr\fR Where to put an error message
|
||||
\fIerroffset\fR Offset in pattern where error was found
|
||||
\fItableptr\fR Pointer to character tables, or NULL to
|
||||
use the built-in default
|
||||
.sp
|
||||
The option bits are:
|
||||
.sp
|
||||
PCRE_ANCHORED Force pattern anchoring
|
||||
PCRE_AUTO_CALLOUT Compile automatic callouts
|
||||
PCRE_CASELESS Do caseless matching
|
||||
PCRE_DOLLAR_ENDONLY $ not to match newline at end
|
||||
PCRE_DOTALL . matches anything including NL
|
||||
PCRE_DUPNAMES Allow duplicate names for subpatterns
|
||||
PCRE_EXTENDED Ignore whitespace and # comments
|
||||
PCRE_EXTRA PCRE extra features
|
||||
(not much use currently)
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_UNGREEDY Invert greediness of quantifiers
|
||||
PCRE_UTF8 Run in UTF-8 mode
|
||||
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
|
||||
validity (only relevant if
|
||||
PCRE_UTF8 is set)
|
||||
.sp
|
||||
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
|
||||
PCRE_NO_UTF8_CHECK.
|
||||
.P
|
||||
The yield of the function is a pointer to a private data structure that
|
||||
contains the compiled pattern, or NULL if an error was detected.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fR
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,50 @@
|
|||
.TH PCRE_CONFIG 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function makes it possible for a client program to find out which optional
|
||||
features are available in the version of the PCRE library it is using. Its
|
||||
arguments are as follows:
|
||||
.sp
|
||||
\fIwhat\fR A code specifying what information is required
|
||||
\fIwhere\fR Points to where to put the data
|
||||
.sp
|
||||
The available codes are:
|
||||
.sp
|
||||
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
|
||||
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
|
||||
PCRE_CONFIG_MATCH_LIMIT_RECURSION
|
||||
Internal recursion depth limit
|
||||
PCRE_CONFIG_NEWLINE Value of the newline sequence
|
||||
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
|
||||
Threshold of return slots, above
|
||||
which \fBmalloc()\fR is used by
|
||||
the POSIX API
|
||||
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
|
||||
PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
|
||||
PCRE_CONFIG_UNICODE_PROPERTIES
|
||||
Availability of Unicode property support
|
||||
(1=yes 0=no)
|
||||
.sp
|
||||
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fR
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,44 @@
|
|||
.TH PCRE_COPY_NAMED_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_copy_named_substring(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, const char *\fIstringname\fP,
|
||||
.ti +5n
|
||||
.B char *\fIbuffer\fP, int \fIbuffersize\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring, identified
|
||||
by name, into a given buffer. The arguments are:
|
||||
.sp
|
||||
\fIcode\fP Pattern that was successfully matched
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
|
||||
\fIstringname\fP Name of the required substring
|
||||
\fIbuffer\fP Buffer to receive the string
|
||||
\fIbuffersize\fP Size of buffer
|
||||
.sp
|
||||
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,41 @@
|
|||
.TH PCRE_COPY_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_copy_substring(const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP, char *\fIbuffer\fP,
|
||||
.ti +5n
|
||||
.B int \fIbuffersize\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring into a given
|
||||
buffer. The arguments are:
|
||||
.sp
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
|
||||
\fIstringnumber\fP Number of the required substring
|
||||
\fIbuffer\fP Buffer to receive the string
|
||||
\fIbuffersize\fP Size of buffer
|
||||
.sp
|
||||
The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
|
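The following is a minimal C sketch of these semantics, not PCRE's own code. It assumes the usual ovector layout, with the start and end offsets of substring n at ovector[2n] and ovector[2n+1]. It also defines hypothetical SKETCH_ERROR_* constants in place of the real codes from pcre.h (the values are assumed to match). Note that buffersize must leave room for the terminating zero.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins; real code uses PCRE_ERROR_NOMEMORY and
   PCRE_ERROR_NOSUBSTRING from pcre.h (values assumed to match). */
#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Copy captured substring `stringnumber` out of `subject`, using the
   start/end offset pairs that a successful match left in `ovector`.
   Returns the substring length, or a negative error code. */
static int copy_substring_sketch(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
  int start, length;

  assert(subject != NULL && ovector != NULL && buffer != NULL);

  if (stringnumber < 0 || stringnumber >= stringcount)
    return SKETCH_ERROR_NOSUBSTRING;

  start = ovector[2 * stringnumber];
  length = ovector[2 * stringnumber + 1] - start;  /* unset group: -1 - -1 = 0 */

  /* The buffer needs room for the terminating zero as well. */
  if (buffersize < length + 1)
    return SKETCH_ERROR_NOMEMORY;

  if (length > 0)
    memcpy(buffer, subject + start, (size_t)length);
  buffer[length] = 0;
  return length;
}
```

For example, suppose a two-group match against "key=value" left the offsets {0, 9, 0, 3, 4, 9}. Substring 2 then copies as "value" and the call returns 5. A 3-byte buffer is too small for substring 1 ("key").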
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,85 @@
|
|||
.TH PCRE_DFA_EXEC 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B int *\fIworkspace\fP, int \fIwscount\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function matches a compiled regular expression against a given subject
|
||||
string, using a DFA matching algorithm (\fInot\fP Perl-compatible). Note that
|
||||
the main, Perl-compatible, matching function is \fBpcre_exec()\fP. The
|
||||
arguments for this function are:
|
||||
.sp
|
||||
\fIcode\fP Points to the compiled pattern
|
||||
\fIextra\fP Points to an associated \fBpcre_extra\fP structure,
|
||||
or is NULL
|
||||
\fIsubject\fP Points to the subject string
|
||||
\fIlength\fP Length of the subject string, in bytes
|
||||
\fIstartoffset\fP Offset in bytes in the subject at which to
|
||||
start matching
|
||||
\fIoptions\fP Option bits
|
||||
\fIovector\fP Points to a vector of ints for result offsets
|
||||
\fIovecsize\fP Number of elements in the vector
|
||||
\fIworkspace\fP Points to a vector of ints used as working space
|
||||
\fIwscount\fP Number of elements in the vector
|
||||
.sp
|
||||
The options are:
|
||||
.sp
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NOTBOL Subject is not the beginning of a line
|
||||
PCRE_NOTEOL Subject is not the end of a line
|
||||
PCRE_NOTEMPTY An empty string is not a valid match
|
||||
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
|
||||
validity (only relevant if PCRE_UTF8
|
||||
was set at compile time)
|
||||
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
|
||||
PCRE_DFA_SHORTEST Return only the shortest match
|
||||
PCRE_DFA_RESTART This is a restart after a partial match
|
||||
.sp
|
||||
There are restrictions on what may appear in a pattern when the DFA matching
|
||||
algorithm is requested. Details are given in the
|
||||
.\" HREF
|
||||
\fBpcrematching\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
A \fBpcre_extra\fP structure contains the following fields:
|
||||
.sp
|
||||
\fIflags\fP Bits indicating which fields are set
|
||||
\fIstudy_data\fP Opaque data from \fBpcre_study()\fP
|
||||
\fImatch_limit\fP Limit on internal resource use
|
||||
\fImatch_limit_recursion\fP Limit on internal recursion depth
|
||||
\fIcallout_data\fP Opaque data passed back to callouts
|
||||
\fItables\fP Points to character tables or is NULL
|
||||
.sp
|
||||
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
|
||||
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
|
||||
PCRE_EXTRA_TABLES. For DFA matching, the \fImatch_limit\fP and
|
||||
\fImatch_limit_recursion\fP fields are not used, and must not be set.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,73 @@
|
|||
.TH PCRE_EXEC 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function matches a compiled regular expression against a given subject
|
||||
string, using a matching algorithm that is similar to Perl's. It returns
|
||||
offsets to captured substrings. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP Points to the compiled pattern
|
||||
\fIextra\fP Points to an associated \fBpcre_extra\fP structure,
|
||||
or is NULL
|
||||
\fIsubject\fP Points to the subject string
|
||||
\fIlength\fP Length of the subject string, in bytes
|
||||
\fIstartoffset\fP Offset in bytes in the subject at which to
|
||||
start matching
|
||||
\fIoptions\fP Option bits
|
||||
\fIovector\fP Points to a vector of ints for result offsets
|
||||
\fIovecsize\fP Number of elements in the vector (a multiple of 3)
|
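The following sketches the ovector sizing rule. It assumes the layout documented in pcreapi: start/end pairs occupy the first two thirds of the vector, and pcre_exec() uses the last third as workspace. The helpers below compute a suitable ovecsize and read back one captured length. Their names are hypothetical. The capture count itself can be obtained from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest ovecsize with room for the whole match plus each of
   `capture_count` capturing subpatterns: every substring needs a
   start/end pair plus one more int of workspace, hence the factor 3. */
static int ovector_size_for(int capture_count)
{
  assert(capture_count >= 0);
  return (capture_count + 1) * 3;
}

/* After a successful match that returned rc > 0, pair i occupies
   ovector[2*i] (start) and ovector[2*i+1] (end). An unset group has
   both offsets set to -1, so its length comes out as 0. */
static int captured_length(const int *ovector, int rc, int i)
{
  assert(ovector != NULL && i >= 0 && i < rc);
  return ovector[2 * i + 1] - ovector[2 * i];
}
```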
||||
.sp
|
||||
The options are:
|
||||
.sp
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NOTBOL Subject is not the beginning of a line
|
||||
PCRE_NOTEOL Subject is not the end of a line
|
||||
PCRE_NOTEMPTY An empty string is not a valid match
|
||||
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
|
||||
validity (only relevant if PCRE_UTF8
|
||||
was set at compile time)
|
||||
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
|
||||
.sp
|
||||
There are restrictions on what may appear in a pattern when partial matching is
|
||||
requested. Details are given in the \fBpcrepartial\fP documentation.
|
||||
.P
|
||||
A \fBpcre_extra\fP structure contains the following fields:
|
||||
.sp
|
||||
\fIflags\fP Bits indicating which fields are set
|
||||
\fIstudy_data\fP Opaque data from \fBpcre_study()\fP
|
||||
\fImatch_limit\fP Limit on internal resource use
|
||||
\fImatch_limit_recursion\fP Limit on internal recursion depth
|
||||
\fIcallout_data\fP Opaque data passed back to callouts
|
||||
\fItables\fP Points to character tables or is NULL
|
||||
.sp
|
||||
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
|
||||
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
|
||||
PCRE_EXTRA_TABLES.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,28 @@
|
|||
.TH PCRE_FREE_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B void pcre_free_substring(const char *\fIstringptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to \fBpcre_get_substring()\fP or \fBpcre_get_named_substring()\fP. Its
|
||||
only argument is a pointer to the string.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,28 @@
|
|||
.TH PCRE_FREE_SUBSTRING_LIST 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B void pcre_free_substring_list(const char **\fIstringptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to \fBpcre_get_substring_list()\fP. Its only argument is a pointer to the
|
||||
list of string pointers.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,59 @@
|
|||
.TH PCRE_FULLINFO 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_fullinfo(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function returns information about a compiled pattern. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP Compiled regular expression
|
||||
\fIextra\fP Result of \fBpcre_study()\fP or NULL
|
||||
\fIwhat\fP What information is required
|
||||
\fIwhere\fP Where to put the information
|
||||
.sp
|
||||
The following information is available:
|
||||
.sp
|
||||
PCRE_INFO_BACKREFMAX Number of highest back reference
|
||||
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
|
||||
PCRE_INFO_DEFAULT_TABLES Pointer to default tables
|
||||
PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
|
||||
-1 for start of string
|
||||
or after newline, or
|
||||
-2 otherwise
|
||||
PCRE_INFO_FIRSTTABLE Table of first bytes
|
||||
(after studying)
|
||||
PCRE_INFO_LASTLITERAL Literal last byte required
|
||||
PCRE_INFO_NAMECOUNT Number of named subpatterns
|
||||
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
|
||||
PCRE_INFO_NAMETABLE Pointer to name table
|
||||
PCRE_INFO_OPTIONS Options used for compilation
|
||||
PCRE_INFO_SIZE Size of compiled pattern
|
||||
PCRE_INFO_STUDYSIZE Size of study data
|
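The following hypothetical helper interprets the PCRE_INFO_FIRSTBYTE result, following the -1/-2 conventions listed above. It is a sketch only. It assumes, per pcreapi, that \fIwhere\fP points to an int for this item.

```c
#include <assert.h>

/* Hypothetical classification of a PCRE_INFO_FIRSTBYTE value. */
enum firstbyte_kind
{
  FIRSTBYTE_FIXED,       /* value >= 0: a match must start with a known byte */
  FIRSTBYTE_LINE_START,  /* -1: start of subject or after a newline */
  FIRSTBYTE_NONE         /* -2: no first-byte information */
};

static enum firstbyte_kind classify_firstbyte(int firstbyte)
{
  assert(firstbyte >= -2);  /* no other negative values are documented */
  if (firstbyte >= 0)
    return FIRSTBYTE_FIXED;
  return (firstbyte == -1) ? FIRSTBYTE_LINE_START : FIRSTBYTE_NONE;
}
```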
||||
.sp
|
||||
The yield of the function is zero on success or:
|
||||
.sp
|
||||
PCRE_ERROR_NULL the argument \fIcode\fP was NULL
|
||||
or the argument \fIwhere\fP was NULL
|
||||
PCRE_ERROR_BADMAGIC the "magic number" was not found
|
||||
PCRE_ERROR_BADOPTION the value of \fIwhat\fP was invalid
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,45 @@
|
|||
.TH PCRE_GET_NAMED_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_named_substring(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, const char *\fIstringname\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIstringptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring by name. The
|
||||
arguments are:
|
||||
.sp
|
||||
\fIcode\fP Compiled pattern
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
|
||||
\fIstringname\fP Name of the required substring
|
||||
\fIstringptr\fP Where to put the string pointer
|
||||
.sp
|
||||
The memory in which the substring is placed is obtained by calling
|
||||
\fBpcre_malloc()\fP. The yield of the function is the length of the extracted
|
||||
substring, PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
||||
PCRE_ERROR_NOSUBSTRING if the string name is invalid.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,35 @@
|
|||
.TH PCRE_GET_STRINGNUMBER 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_stringnumber(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIname\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This convenience function finds the number of a named substring capturing
|
||||
parenthesis in a compiled pattern. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP Compiled regular expression
|
||||
\fIname\fP Name whose number is required
|
||||
.sp
|
||||
The yield of the function is the number of the parenthesis if the name is
|
||||
found, or PCRE_ERROR_NOSUBSTRING otherwise.
|
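To show what pcre_get_stringnumber() does, here is a sketch that searches a name table directly. It assumes the table format documented in pcreapi. Each entry holds the group number in two bytes, most significant first, followed by the zero-terminated name, and is padded to the size reported by PCRE_INFO_NAMEENTRYSIZE. A linear scan is used for clarity. The function name and the SKETCH_ERROR_NOSUBSTRING value are hypothetical stand-ins.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for PCRE_ERROR_NOSUBSTRING (value assumed). */
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Look up `name` in a table of `count` entries of `entrysize` bytes.
   Returns the group number, or SKETCH_ERROR_NOSUBSTRING. */
static int stringnumber_sketch(const unsigned char *table, int count,
                               int entrysize, const char *name)
{
  int i;

  assert(table != NULL && name != NULL && entrysize > 2);

  for (i = 0; i < count; i++)
  {
    const unsigned char *entry = table + i * entrysize;
    /* The name starts after the two-byte group number. */
    if (strcmp((const char *)(entry + 2), name) == 0)
      return (entry[0] << 8) | entry[1];
  }
  return SKETCH_ERROR_NOSUBSTRING;
}
```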
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,41 @@
|
|||
.TH PCRE_GET_STRINGTABLE_ENTRIES 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_stringtable_entries(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This convenience function finds, for a compiled pattern, the first and last
|
||||
entries for a given name in the table that translates capturing parenthesis
|
||||
names into numbers. When names are required to be unique (PCRE_DUPNAMES is
|
||||
\fInot\fP set), it is usually easier to use \fBpcre_get_stringnumber()\fP
|
||||
instead. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP Compiled regular expression
|
||||
\fIname\fP Name whose entries are required
|
||||
\fIfirst\fP Where to return a pointer to the first entry
|
||||
\fIlast\fP Where to return a pointer to the last entry
|
||||
.sp
|
||||
The yield of the function is the length of each entry, or
|
||||
PCRE_ERROR_NOSUBSTRING if none are found.
|
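With PCRE_DUPNAMES set, one name can map to several numbers. The sketch below assumes the same table format as pcreapi describes, with entries sorted by name so that duplicates are adjacent. It finds the first and last matching entries and returns the entry size, which is the contract this page describes. It uses const unsigned char pointers for clarity, whereas the real function takes char **. All names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_ERROR_NOSUBSTRING (-7)  /* stand-in for PCRE_ERROR_NOSUBSTRING */

/* Find the first and last entries named `name` in a sorted table of
   `count` entries of `entrysize` bytes. Returns the entry size, or
   SKETCH_ERROR_NOSUBSTRING (with both pointers NULL) if none match. */
static int stringtable_entries_sketch(const unsigned char *table, int count,
                                      int entrysize, const char *name,
                                      const unsigned char **first,
                                      const unsigned char **last)
{
  int i;

  assert(table != NULL && name != NULL && first != NULL && last != NULL);

  *first = *last = NULL;
  for (i = 0; i < count; i++)
  {
    const unsigned char *entry = table + i * entrysize;
    if (strcmp((const char *)(entry + 2), name) == 0)
    {
      if (*first == NULL)
        *first = entry;
      *last = entry;
    }
  }
  return (*first == NULL) ? SKETCH_ERROR_NOSUBSTRING : entrysize;
}
```

Stepping from the first entry to the last in increments of the returned size then visits every group number that carries the name.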
||||
.P
|
||||
There is a complete description of the PCRE native API, including the format of
|
||||
the table entries, in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,42 @@
|
|||
.TH PCRE_GET_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_substring(const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIstringptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring. The
|
||||
arguments are:
|
||||
.sp
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
|
||||
\fIstringnumber\fP Number of the required substring
|
||||
\fIstringptr\fP Where to put the string pointer
|
||||
.sp
|
||||
The memory in which the substring is placed is obtained by calling
|
||||
\fBpcre_malloc()\fP. The yield of the function is the length of the substring,
|
||||
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
||||
PCRE_ERROR_NOSUBSTRING if the string number is invalid.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,40 @@
|
|||
.TH PCRE_GET_SUBSTRING_LIST 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_substring_list(const char *\fIsubject\fP,
|
||||
.ti +5n
|
||||
.B int *\fIovector\fP, int \fIstringcount\fP, "const char ***\fIlistptr\fP);"
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a list of all the captured
|
||||
substrings. The arguments are:
|
||||
.sp
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
|
||||
\fIlistptr\fP Where to put a pointer to the list
|
||||
.sp
|
||||
The memory in which the substrings and the list are placed is obtained by
|
||||
calling \fBpcre_malloc()\fP. A pointer to a list of pointers is put in
|
||||
the variable whose address is in \fIlistptr\fP. The list is terminated by a
|
||||
NULL pointer. The yield of the function is zero on success or
|
||||
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained.
|
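The following sketches one plausible layout for such a list. The pointer array and the string copies share a single malloc() block, so one free() releases everything. This mirrors the documented behaviour but is not PCRE's code. The function name is hypothetical. Unlike the real function, it returns the list directly rather than through \fIlistptr\fP, and it returns NULL where the real function yields PCRE_ERROR_NOMEMORY.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build a NULL-terminated list of all captured substrings in one block:
   pointer array first, zero-terminated copies after it. */
static char **substring_list_sketch(const char *subject, const int *ovector,
                                    int stringcount)
{
  size_t size;
  char **list;
  char *p;
  int i;

  assert(subject != NULL && ovector != NULL && stringcount >= 0);

  /* Room for stringcount pointers plus the terminating NULL ... */
  size = (size_t)(stringcount + 1) * sizeof(char *);
  /* ... plus each substring and its terminating zero. */
  for (i = 0; i < stringcount; i++)
    size += (size_t)(ovector[2 * i + 1] - ovector[2 * i]) + 1;

  list = malloc(size);
  if (list == NULL)
    return NULL;

  p = (char *)(list + stringcount + 1);  /* strings follow the pointers */
  for (i = 0; i < stringcount; i++)
  {
    int len = ovector[2 * i + 1] - ovector[2 * i];
    if (len > 0)
      memcpy(p, subject + ovector[2 * i], (size_t)len);
    p[len] = 0;
    list[i] = p;
    p += len + 1;
  }
  list[stringcount] = NULL;
  return list;
}
```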
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,27 @@
|
|||
.TH PCRE_INFO 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_info(const pcre *\fIcode\fP, int *\fIoptptr\fP, int
|
||||
.B *\fIfirstcharptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function is obsolete. You should be using \fBpcre_fullinfo()\fP instead.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,30 @@
|
|||
.TH PCRE_MAKETABLES 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B const unsigned char *pcre_maketables(void);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function builds a set of character tables for character values less than
|
||||
256. These can be passed to \fBpcre_compile()\fP to override PCRE's internal,
|
||||
built-in tables (which were made by \fBpcre_maketables()\fP when PCRE was
|
||||
compiled). You might want to do this if you are using a non-standard locale.
|
||||
The function yields a pointer to the tables.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,33 @@
|
|||
.TH PCRE_REFCOUNT 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function is used to maintain a reference count inside a data block that
|
||||
contains a compiled pattern. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP Compiled regular expression
|
||||
\fIadjust\fP Adjustment to reference value
|
||||
.sp
|
||||
The yield of the function is the adjusted reference value, which is constrained
|
||||
to lie between 0 and 65535.
|
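The clamping rule can be sketched as follows. This is a hypothetical stand-alone version that operates on a plain counter, not on the library's internal field. Passing an adjustment of 0 reads the current value without changing it.

```c
#include <assert.h>
#include <stddef.h>

/* Apply `adjust` to *count, clamp the result to 0..65535, store it,
   and return the new value. */
static int refcount_sketch(unsigned short *count, int adjust)
{
  long value;

  assert(count != NULL);

  value = (long)*count + adjust;
  if (value < 0)
    value = 0;
  if (value > 65535)
    value = 65535;
  *count = (unsigned short)value;
  return (int)value;
}
```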
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,43 @@
|
|||
.TH PCRE_STUDY 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function studies a compiled pattern, to see if additional information can
|
||||
be extracted that might speed up matching. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP A compiled regular expression
|
||||
\fIoptions\fP Options for \fBpcre_study()\fP
|
||||
\fIerrptr\fP Where to put an error message
|
||||
.sp
|
||||
If the function succeeds, it returns a value that can be passed to
|
||||
\fBpcre_exec()\fP via its \fIextra\fP argument.
|
||||
.P
|
||||
If the function returns NULL, either it could not find any additional
|
||||
information, or there was an error. You can tell the difference by looking at
|
||||
the value stored through \fIerrptr\fP, which is NULL in the first case.
|
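The check described above can be written as a small classification function. This is a sketch. The enum and function names are hypothetical. Here \fIextra\fP stands for the value returned by \fBpcre_study()\fP, and \fIerror\fP stands for the string it stored through \fIerrptr\fP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of a call to pcre_study(). */
enum study_outcome { STUDY_USEFUL, STUDY_NOTHING_FOUND, STUDY_FAILED };

static enum study_outcome classify_study(const void *extra, const char *error)
{
  /* A non-NULL result never comes with an error message. */
  assert(extra == NULL || error == NULL);

  if (extra != NULL)
    return STUDY_USEFUL;
  return (error == NULL) ? STUDY_NOTHING_FOUND : STUDY_FAILED;
}
```

Only the third outcome is an error. In the second, the pattern can still be matched with a NULL \fIextra\fP argument.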
||||
.P
|
||||
There are currently no options defined; the value of the second argument should
|
||||
always be zero.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,27 @@
|
|||
.TH PCRE_VERSION 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B char *pcre_version(void);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function returns a character string that gives the version number of the
|
||||
PCRE library and the date of its release.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
|
@ -0,0 +1,213 @@
|
|||
.TH PCREBUILD 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE BUILD-TIME OPTIONS"
|
||||
.rs
|
||||
.sp
|
||||
This document describes the optional features of PCRE that can be selected when
|
||||
the library is compiled. They are all selected, or deselected, by providing
|
||||
options to the \fBconfigure\fP script that is run before the \fBmake\fP
|
||||
command. The complete list of options for \fBconfigure\fP (which includes the
|
||||
standard ones such as the selection of the installation directory) can be
|
||||
obtained by running
|
||||
.sp
|
||||
./configure --help
|
||||
.sp
|
||||
The following sections describe certain options whose names begin with --enable
|
||||
or --disable. These settings specify changes to the defaults for the
|
||||
\fBconfigure\fP command. Because of the way that \fBconfigure\fP works,
|
||||
--enable and --disable always come in pairs, so the complementary option always
|
||||
exists as well, but as it specifies the default, it is not described.
|
||||
.
|
||||
.SH "C++ SUPPORT"
|
||||
.rs
|
||||
.sp
|
||||
By default, the \fBconfigure\fP script will search for a C++ compiler and C++
|
||||
header files. If it finds them, it automatically builds the C++ wrapper library
|
||||
for PCRE. You can disable this by adding
|
||||
.sp
|
||||
--disable-cpp
|
||||
.sp
|
||||
to the \fBconfigure\fP command.
|
||||
.
|
||||
.SH "UTF-8 SUPPORT"
|
||||
.rs
|
||||
.sp
|
||||
To build PCRE with support for UTF-8 character strings, add
|
||||
.sp
|
||||
--enable-utf8
|
||||
.sp
|
||||
to the \fBconfigure\fP command. Of itself, this does not make PCRE treat
|
||||
strings as UTF-8. As well as compiling PCRE with this option, you also have
|
||||
to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fP
|
||||
function.
|
||||
.
|
||||
.SH "UNICODE CHARACTER PROPERTY SUPPORT"
|
||||
.rs
|
||||
.sp
|
||||
UTF-8 support allows PCRE to process character values greater than 255 in the
|
||||
strings that it handles. On its own, however, it does not provide any
|
||||
facilities for accessing the properties of such characters. If you want to be
|
||||
able to use the pattern escapes \eP, \ep, and \eX, which refer to Unicode
|
||||
character properties, you must add
|
||||
.sp
|
||||
--enable-unicode-properties
|
||||
.sp
|
||||
to the \fBconfigure\fP command. This implies UTF-8 support, even if you have
|
||||
not explicitly requested it.
|
||||
.P
|
||||
Including Unicode property support adds around 90K of tables to the PCRE
|
||||
library, approximately doubling its size. Only the general category properties
|
||||
such as \fILu\fP and \fINd\fP are supported. Details are given in the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.SH "CODE VALUE OF NEWLINE"
|
||||
.rs
|
||||
.sp
|
||||
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
|
||||
of a line. This is the normal newline character on Unix-like systems. You can
|
||||
compile PCRE to use character 13 (carriage return, CR) instead, by adding
|
||||
.sp
|
||||
--enable-newline-is-cr
|
||||
.sp
|
||||
to the \fBconfigure\fP command. There is also a --enable-newline-is-lf option,
|
||||
which explicitly specifies linefeed as the newline character.
|
||||
.sp
|
||||
Alternatively, you can specify that line endings are to be indicated by the
|
||||
two-character sequence CRLF. If you want this, add
|
||||
.sp
|
||||
--enable-newline-is-crlf
|
||||
.sp
|
||||
to the \fBconfigure\fP command. Whatever line ending convention is selected
|
||||
when PCRE is built can be overridden when the library functions are called. At
|
||||
build time it is conventional to use the standard for your operating system.
|
||||
.
|
||||
.SH "BUILDING SHARED AND STATIC LIBRARIES"
|
||||
.rs
|
||||
.sp
|
||||
The PCRE building process uses \fBlibtool\fP to build both shared and static
|
||||
Unix libraries by default. You can suppress one of these by adding one of
|
||||
.sp
|
||||
--disable-shared
|
||||
--disable-static
|
||||
.sp
|
||||
to the \fBconfigure\fP command, as required.
|
||||
.
|
||||
.SH "POSIX MALLOC USAGE"
|
||||
.rs
|
||||
.sp
|
||||
When PCRE is called through the POSIX interface (see the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
documentation), additional working storage is required for holding the pointers
|
||||
to capturing substrings, because PCRE requires three integers per substring,
|
||||
whereas the POSIX interface provides only two. If the number of expected
|
||||
substrings is small, the wrapper function uses space on the stack, because this
|
||||
is faster than using \fBmalloc()\fP for each call. The default threshold above
|
||||
which the stack is no longer used is 10; it can be changed by adding a setting
|
||||
such as
|
||||
.sp
|
||||
--with-posix-malloc-threshold=20
|
||||
.sp
|
||||
to the \fBconfigure\fP command.
|
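The stack-versus-heap choice can be sketched as below. SKETCH_POSIX_MALLOC_THRESHOLD and scratch_ovector_demo() are hypothetical names. They stand in for the configured threshold and for the wrapper's internal logic. The point of the sketch is the sizing of three ints per substring and the fallback to \fBmalloc()\fP above the threshold.

```c
#include <assert.h>
#include <stdlib.h>

/* Default threshold, as set by --with-posix-malloc-threshold. */
#define SKETCH_POSIX_MALLOC_THRESHOLD 10

/* Obtain a scratch vector of 3 ints per expected substring: a stack
   array when nmatch is small, malloc() otherwise. Returns 1 if the heap
   was used, 0 if the stack was used, -1 if allocation failed. */
static int scratch_ovector_demo(size_t nmatch)
{
  int stackvec[SKETCH_POSIX_MALLOC_THRESHOLD * 3];
  int *ovector = stackvec;
  int used_heap = 0;
  size_t i;

  if (nmatch > SKETCH_POSIX_MALLOC_THRESHOLD)
  {
    ovector = malloc(nmatch * 3 * sizeof(int));
    if (ovector == NULL)
      return -1;
    used_heap = 1;
  }

  assert(used_heap || nmatch * 3 <= sizeof stackvec / sizeof stackvec[0]);

  /* A real wrapper would pass ovector to the matcher here and then copy
     pairs into the caller's two-offset POSIX regmatch_t entries. */
  for (i = 0; i < nmatch * 3; i++)
    ovector[i] = -1;

  if (used_heap)
    free(ovector);
  return used_heap;
}
```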
||||
.
|
||||
.SH "HANDLING VERY LARGE PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
Within a compiled pattern, offset values are used to point from one part to
|
||||
another (for example, from an opening parenthesis to an alternation
|
||||
metacharacter). By default, two-byte values are used for these offsets, leading
|
||||
to a maximum size for a compiled pattern of around 64K. This is sufficient to
|
||||
handle all but the most gigantic patterns. Nevertheless, some people do want to
|
||||
process enormous patterns, so it is possible to compile PCRE to use three-byte
|
||||
or four-byte offsets by adding a setting such as
|
||||
.sp
|
||||
--with-link-size=3
|
||||
.sp
|
||||
to the \fBconfigure\fP command. The value given must be 2, 3, or 4. Using
|
||||
longer offsets slows down the operation of PCRE because it has to load
|
||||
additional bytes when handling them.
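.P
The 64K figure follows from the link size: an offset stored in \fIn\fP bytes can be at most 2^(8\fIn\fP) minus one. The sketch below shows both the limit and why a larger link size costs extra byte loads per offset. For illustration it assumes that offsets are plain unsigned values stored most significant byte first. The function names are hypothetical.
.sp
.nf
```cpp
#include <cstdint>

// Largest offset representable with link_size bytes; 2 gives 65535,
// the "around 64K" limit. Returns 0 for sizes configure does not accept.
std::uint64_t max_link_offset(int link_size) {
  if (link_size < 2 || link_size > 4) return 0;
  return (std::uint64_t{1} << (8 * link_size)) - 1;
}

// Store and load an offset of link_size bytes, most significant byte first
// (an assumed layout). Each extra byte is one more load and shift.
void put_link(unsigned char *p, int link_size, std::uint32_t value) {
  for (int i = link_size - 1; i >= 0; --i) {
    p[i] = static_cast<unsigned char>(value & 0xff);
    value >>= 8;
  }
}

std::uint32_t get_link(const unsigned char *p, int link_size) {
  std::uint32_t v = 0;
  for (int i = 0; i < link_size; ++i) v = (v << 8) | p[i];
  return v;
}
```
.fi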
|
||||
.P
|
||||
If you build PCRE with an increased link size, test 2 (and test 5 if you are
|
||||
using UTF-8) will fail. Part of the output of these tests is a representation
|
||||
of the compiled pattern, and this changes with the link size.
|
||||
.
|
||||
.SH "AVOIDING EXCESSIVE STACK USAGE"
|
||||
.rs
|
||||
.sp
|
||||
When matching with the \fBpcre_exec()\fP function, PCRE implements backtracking
|
||||
by making recursive calls to an internal function called \fBmatch()\fP. In
|
||||
environments where the size of the stack is limited, this can severely limit
|
||||
PCRE's operation. (The Unix environment does not usually suffer from this
|
||||
problem, but it may sometimes be necessary to increase the maximum stack size.
|
||||
There is a discussion in the
|
||||
.\" HREF
|
||||
\fBpcrestack\fP
|
||||
.\"
|
||||
documentation.) An alternative approach to recursion that uses memory from the
|
||||
heap to remember data, instead of using recursive function calls, has been
|
||||
implemented to work round the problem of limited stack size. If you want to
|
||||
build a version of PCRE that works this way, add
|
||||
.sp
|
||||
--disable-stack-for-recursion
|
||||
.sp
|
||||
to the \fBconfigure\fP command. With this configuration, PCRE will use the
|
||||
\fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
|
||||
management functions. Separate functions are provided because the usage is very
|
||||
predictable: the block sizes requested are always the same, and the blocks are
|
||||
always freed in reverse order. A calling program might be able to implement
|
||||
optimized functions that perform better than the standard \fBmalloc()\fP and
|
||||
\fBfree()\fP functions. PCRE runs noticeably more slowly when built in this
|
||||
way. This option affects only the \fBpcre_exec()\fP function; it is not
|
||||
relevant for the \fBpcre_dfa_exec()\fP function.
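.P
Blocks are always freed in reverse order, so a simple last-in, first-out arena is one way to take advantage of that predictability. The following sketch relies only on the documented signatures of \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP. The arena size, block count, and function names are arbitrary choices for illustration. When the arena is exhausted, the allocator returns NULL, as \fBmalloc()\fP would.
.sp
.nf
```cpp
#include <cstddef>
#include <cstdlib>

// A LIFO arena with the signatures of PCRE's stack hooks, documented as
//   void *(*pcre_stack_malloc)(size_t);  void (*pcre_stack_free)(void *);
// Blocks are always freed in reverse order, so a bump pointer that moves
// back on free is enough. Sizes below are arbitrary choices for the sketch.
namespace {
const std::size_t kArenaBytes = 1 << 20;
const std::size_t kMaxBlocks = 4096;
const std::size_t kAlign = alignof(std::max_align_t);
alignas(std::max_align_t) unsigned char g_arena[kArenaBytes];
std::size_t g_top = 0;                  // bytes currently in use
std::size_t g_block_sizes[kMaxBlocks];  // rounded size of each live block
std::size_t g_blocks = 0;               // number of live blocks
}  // namespace

void *arena_stack_malloc(std::size_t size) {
  std::size_t rounded = (size + kAlign - 1) / kAlign * kAlign;
  if (g_blocks == kMaxBlocks || rounded > kArenaBytes - g_top) return nullptr;
  void *block = g_arena + g_top;
  g_top += rounded;
  g_block_sizes[g_blocks++] = rounded;
  return block;
}

void arena_stack_free(void *block) {
  if (block == nullptr || g_blocks == 0) return;
  g_top -= g_block_sizes[--g_blocks];
  // Under strict LIFO use, the freed block is the one that was on top.
  if (block != g_arena + g_top) std::abort();
}

// Installation, in a build configured with --disable-stack-for-recursion:
//   pcre_stack_malloc = arena_stack_malloc;
//   pcre_stack_free = arena_stack_free;
```
.fi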
|
||||
.
|
||||
.SH "LIMITING PCRE RESOURCE USAGE"
|
||||
.rs
|
||||
.sp
|
||||
Internally, PCRE has a function called \fBmatch()\fP, which it calls repeatedly
|
||||
(sometimes recursively) when matching a pattern with the \fBpcre_exec()\fP
|
||||
function. By controlling the maximum number of times this function may be
|
||||
called during a single matching operation, a limit can be placed on the
|
||||
resources used by a single call to \fBpcre_exec()\fP. The limit can be changed
|
||||
at run time, as described in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation. The default is 10 million, but this can be changed by adding a
|
||||
setting such as
|
||||
.sp
|
||||
--with-match-limit=500000
|
||||
.sp
|
||||
to the \fBconfigure\fP command. This setting has no effect on the
|
||||
\fBpcre_dfa_exec()\fP matching function.
|
||||
.P
|
||||
In some environments it is desirable to limit the depth of recursive calls of
|
||||
\fBmatch()\fP more strictly than the total number of calls, in order to
|
||||
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
|
||||
is specified) that is used. A second limit controls this; it defaults to the
|
||||
value that is set for --with-match-limit, which imposes no additional
|
||||
constraints. However, you can set a lower limit by adding, for example,
|
||||
.sp
|
||||
--with-match-limit-recursion=10000
|
||||
.sp
|
||||
to the \fBconfigure\fP command. This value can also be overridden at run time.
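.P
The two limits can be illustrated with a toy backtracking matcher. It supports only literals, "." and \fIx\fP*, and it is not PCRE's \fBmatch()\fP. One counter caps the total number of recursive calls; the other caps their nesting depth. The names and result codes are local to the sketch.
.sp
.nf
```cpp
// Local result codes for this toy; they are not PCRE's error numbers.
const int kToyMatchLimit = -1;
const int kToyRecursionLimit = -2;

struct ToyLimits {
  unsigned long match_limit;      // cap on total calls of match_here()
  unsigned long recursion_limit;  // cap on nesting depth of match_here()
  unsigned long calls;            // calls made so far
};

// Backtracking matcher for literals, '.', and 'x*' (greedy), anchored at the
// start of text. Returns 1 (match), 0 (no match), or a negative limit code.
int match_here(const char *re, const char *text, ToyLimits &lim,
               unsigned long depth) {
  if (++lim.calls > lim.match_limit) return kToyMatchLimit;
  if (depth > lim.recursion_limit) return kToyRecursionLimit;
  if (re[0] == 0) return 1;
  if (re[1] == '*') {
    const char *t = text;  // greedy: take as many as possible, then back off
    while (*t != 0 && (re[0] == '.' || *t == re[0])) ++t;
    for (;; --t) {
      int r = match_here(re + 2, t, lim, depth + 1);
      if (r != 0) return r;  // a match, or a limit error, ends the search
      if (t == text) return 0;
    }
  }
  if (*text != 0 && (re[0] == '.' || re[0] == *text))
    return match_here(re + 1, text + 1, lim, depth + 1);
  return 0;
}

int toy_match(const char *re, const char *text, unsigned long match_limit,
              unsigned long recursion_limit) {
  ToyLimits lim = {match_limit, recursion_limit, 0};
  return match_here(re, text, lim, 0);
}
```
.fi
A pattern such as a*a*a*a*a*b applied to a long run of "a" characters with no "b" exhausts a small call limit long before the depth limit matters. A long literal pattern does the opposite.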
|
||||
.
|
||||
.SH "USING EBCDIC CODE"
|
||||
.rs
|
||||
.sp
|
||||
PCRE assumes by default that it will run in an environment where the character
|
||||
code is ASCII (or Unicode, which is a superset of ASCII). PCRE can, however, be
|
||||
compiled to run in an EBCDIC environment by adding
|
||||
.sp
|
||||
--enable-ebcdic
|
||||
.sp
|
||||
to the \fBconfigure\fP command.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 06 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,161 @@
|
|||
.TH PCRECALLOUT 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE CALLOUTS"
|
||||
.rs
|
||||
.sp
|
||||
.B int (*pcre_callout)(pcre_callout_block *);
|
||||
.PP
|
||||
PCRE provides a feature called "callout", which is a means of temporarily
|
||||
passing control to the caller of PCRE in the middle of pattern matching. The
|
||||
caller of PCRE provides an external function by putting its entry point in the
|
||||
global variable \fIpcre_callout\fP. By default, this variable contains NULL,
|
||||
which disables all calling out.
|
||||
.P
|
||||
Within a regular expression, (?C) indicates the points at which the external
|
||||
function is to be called. Different callout points can be identified by putting
|
||||
a number less than 256 after the letter C. The default value is zero.
|
||||
For example, this pattern has two callout points:
|
||||
.sp
|
||||
(?C1)\edabc(?C2)def
|
||||
.sp
|
||||
If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP is called,
|
||||
PCRE automatically inserts callouts, all with number 255, before each item in
|
||||
the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
|
||||
.sp
|
||||
A(\ed{2}|--)
|
||||
.sp
|
||||
it is processed as if it were
|
||||
.sp
|
||||
(?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
|
||||
.sp
|
||||
Notice that there is a callout before and after each parenthesis and
|
||||
alternation bar. Automatic callouts can be used for tracking the progress of
|
||||
pattern matching. The
|
||||
.\" HREF
|
||||
\fBpcretest\fP
|
||||
.\"
|
||||
command has an option that sets automatic callouts; when it is used, the output
|
||||
indicates how the pattern is matched. This is useful information when you are
|
||||
trying to optimize the performance of a particular pattern.
|
||||
.
|
||||
.
|
||||
.SH "MISSING CALLOUTS"
|
||||
.rs
|
||||
.sp
|
||||
You should be aware that, because of optimizations in the way PCRE matches
|
||||
patterns, callouts sometimes do not happen. For example, if the pattern is
|
||||
.sp
|
||||
ab(?C4)cd
|
||||
.sp
|
||||
PCRE knows that any matching string must contain the letter "d". If the subject
|
||||
string is "abyz", the lack of "d" means that matching doesn't ever start, and
|
||||
the callout is never reached. However, with "abyd", though the result is still
|
||||
no match, the callout is obeyed.
|
||||
.
|
||||
.
|
||||
.SH "THE CALLOUT INTERFACE"
|
||||
.rs
|
||||
.sp
|
||||
During matching, when PCRE reaches a callout point, the external function
|
||||
defined by \fIpcre_callout\fP is called (if it is set). This applies to both
|
||||
the \fBpcre_exec()\fP and the \fBpcre_dfa_exec()\fP matching functions. The
|
||||
only argument to the callout function is a pointer to a \fBpcre_callout_block\fP
structure, which contains the following fields:
|
||||
.sp
|
||||
int \fIversion\fP;
|
||||
int \fIcallout_number\fP;
|
||||
int *\fIoffset_vector\fP;
|
||||
const char *\fIsubject\fP;
|
||||
int \fIsubject_length\fP;
|
||||
int \fIstart_match\fP;
|
||||
int \fIcurrent_position\fP;
|
||||
int \fIcapture_top\fP;
|
||||
int \fIcapture_last\fP;
|
||||
void *\fIcallout_data\fP;
|
||||
int \fIpattern_position\fP;
|
||||
int \fInext_item_length\fP;
|
||||
.sp
|
||||
The \fIversion\fP field is an integer containing the version number of the
|
||||
block format. The initial version was 0; the current version is 1. The version
|
||||
number will change again in future if additional fields are added, but the
|
||||
intention is never to remove any of the existing fields.
|
||||
.P
|
||||
The \fIcallout_number\fP field contains the number of the callout, as compiled
|
||||
into the pattern (that is, the number after ?C for manual callouts, and 255 for
|
||||
automatically generated callouts).
|
||||
.P
|
||||
The \fIoffset_vector\fP field is a pointer to the vector of offsets that was
|
||||
passed by the caller to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. When
|
||||
\fBpcre_exec()\fP is used, the contents can be inspected in order to extract
|
||||
substrings that have been matched so far, in the same way as for extracting
|
||||
substrings after a match has completed. For \fBpcre_dfa_exec()\fP this field is
|
||||
not useful.
|
||||
.P
|
||||
The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
|
||||
that were passed to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP.
|
||||
.P
|
||||
The \fIstart_match\fP field contains the offset within the subject at which the
|
||||
current match attempt started. If the pattern is not anchored, the callout
|
||||
function may be called several times from the same point in the pattern for
|
||||
different starting points in the subject.
|
||||
.P
|
||||
The \fIcurrent_position\fP field contains the offset within the subject of the
|
||||
current match pointer.
|
||||
.P
|
||||
When the \fBpcre_exec()\fP function is used, the \fIcapture_top\fP field
|
||||
contains one more than the number of the highest numbered captured substring so
|
||||
far. If no substrings have been captured, the value of \fIcapture_top\fP is
|
||||
one. This is always the case when \fBpcre_dfa_exec()\fP is used, because it
|
||||
does not support captured substrings.
|
||||
.P
|
||||
The \fIcapture_last\fP field contains the number of the most recently captured
|
||||
substring. If no substrings have been captured, its value is -1. This is always
|
||||
the case when \fBpcre_dfa_exec()\fP is used.
|
||||
.P
|
||||
The \fIcallout_data\fP field contains a value that is passed to
|
||||
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP specifically so that it can be
|
||||
passed back in callouts. It is passed in the \fIcallout_data\fP field of the
|
||||
\fBpcre_extra\fP data structure. If no such data was passed, the value of
|
||||
\fIcallout_data\fP in a \fBpcre_callout_block\fP structure is NULL. There is a
|
||||
description of the \fBpcre_extra\fP structure in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
The \fIpattern_position\fP field is present from version 1 of the
|
||||
\fIpcre_callout_block\fP structure. It contains the offset to the next item to be
|
||||
matched in the pattern string.
|
||||
.P
|
||||
The \fInext_item_length\fP field is present from version 1 of the
|
||||
\fIpcre_callout_block\fP structure. It contains the length of the next item to be
|
||||
matched in the pattern string. When the callout immediately precedes an
|
||||
alternation bar, a closing parenthesis, or the end of the pattern, the length
|
||||
is zero. When the callout precedes an opening parenthesis, the length is that
|
||||
of the entire subpattern.
|
||||
.P
|
||||
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
|
||||
help in distinguishing between different automatic callouts, which all have the
|
||||
same callout number. However, they are set for all callouts.
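.P
The following sketch shows how the fields fit together when \fBpcre_exec()\fP is used. So that it compiles without \fIpcre.h\fP, it declares a hypothetical local struct, CalloutBlockSketch, that mirrors the fields listed above. A real callout function receives a pointer to the \fBpcre_callout_block\fP defined in \fIpcre.h\fP. Each capture \fIi\fP occupies offset_vector[2\fIi\fP] and offset_vector[2\fIi\fP+1], and the sketch inspects only captures below \fIcapture_top\fP.
.sp
.nf
```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical local mirror of the documented pcre_callout_block fields
// (version 1). Real code should use the definition from pcre.h.
struct CalloutBlockSketch {
  int version;
  int callout_number;
  int *offset_vector;
  const char *subject;
  int subject_length;
  int start_match;
  int current_position;
  int capture_top;
  int capture_last;
  void *callout_data;
  int pattern_position;
  int next_item_length;
};

// Text consumed by the current match attempt so far.
std::string matched_so_far(const CalloutBlockSketch &cb) {
  return std::string(cb.subject + cb.start_match,
                     static_cast<std::size_t>(cb.current_position - cb.start_match));
}

// Substrings captured so far (meaningful for pcre_exec() only);
// a negative start offset is treated as an unset group and gives "".
std::vector<std::string> captures_so_far(const CalloutBlockSketch &cb) {
  std::vector<std::string> out;
  for (int i = 1; i < cb.capture_top; ++i) {
    int start = cb.offset_vector[2 * i];
    int end = cb.offset_vector[2 * i + 1];
    if (start < 0) {
      out.push_back(std::string());
    } else {
      out.push_back(std::string(cb.subject + start,
                                static_cast<std::size_t>(end - start)));
    }
  }
  return out;
}
```
.fi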
|
||||
.
|
||||
.
|
||||
.SH "RETURN VALUES"
|
||||
.rs
|
||||
.sp
|
||||
The external callout function returns an integer to PCRE. If the value is zero,
|
||||
matching proceeds as normal. If the value is greater than zero, matching fails
|
||||
at the current point, but the testing of other matching possibilities goes
|
||||
ahead, just as if a lookahead assertion had failed. If the value is less than
|
||||
zero, the match is abandoned, and \fBpcre_exec()\fP (or \fBpcre_dfa_exec()\fP)
|
||||
returns the negative value.
|
||||
.P
|
||||
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
|
||||
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
|
||||
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
|
||||
it will never be used by PCRE itself.
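.P
Here is a minimal sketch of these rules, in which every name is hypothetical. \fBinterpret_callout_result()\fP maps a return value to what the matcher does next. \fBlimit_callout()\fP is an example policy: it rejects match attempts that start too far into the subject, and it abandons matching after too many callouts. The constant kCalloutErrorSketch is assumed to equal PCRE_ERROR_CALLOUT (-9 in \fIpcre.h\fP); real code should use the macro.
.sp
.nf
```cpp
// What the matcher does after a callout returns rc, as described above.
enum class CalloutAction { kContinue, kBacktrack, kAbandon };

CalloutAction interpret_callout_result(int rc) {
  if (rc == 0) return CalloutAction::kContinue;   // proceed normally
  if (rc > 0) return CalloutAction::kBacktrack;   // fail here, try alternatives
  return CalloutAction::kAbandon;                 // matcher returns rc itself
}

// Assumed to equal PCRE_ERROR_CALLOUT; use the pcre.h macro in real code.
const int kCalloutErrorSketch = -9;

// Example policy: fail locally for attempts starting beyond max_start, and
// abandon the whole match once more than max_calls callouts have occurred.
int limit_callout(int start_match, int max_start, int *calls_seen,
                  int max_calls) {
  if (++*calls_seen > max_calls) return kCalloutErrorSketch;
  return start_match > max_start ? 1 : 0;
}
```
.fi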
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 28 February 2005
|
||||
.br
|
||||
Copyright (c) 1997-2005 University of Cambridge.
|
|
@ -0,0 +1,126 @@
|
|||
.TH PCRECOMPAT 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "DIFFERENCES BETWEEN PCRE AND PERL"
|
||||
.rs
|
||||
.sp
|
||||
This document describes the differences in the ways that PCRE and Perl handle
|
||||
regular expressions. The differences described here are with respect to Perl
|
||||
5.8.
|
||||
.P
|
||||
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
|
||||
it does have are given in the
|
||||
.\" HTML <a href="pcre.html#utf8support">
|
||||
.\" </a>
|
||||
section on UTF-8 support
|
||||
.\"
|
||||
in the main
|
||||
.\" HREF
|
||||
\fBpcre\fP
|
||||
.\"
|
||||
page.
|
||||
.P
|
||||
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
|
||||
them, but they do not mean what you might think. For example, (?!a){3} does
|
||||
not assert that the next three characters are not "a". It just asserts that the
|
||||
next character is not "a" three times.
|
||||
.P
|
||||
3. Capturing subpatterns that occur inside negative lookahead assertions are
|
||||
counted, but their entries in the offsets vector are never set. Perl sets its
|
||||
numerical variables from any such patterns that are matched before the
|
||||
assertion fails to match something (thereby succeeding), but only if the
|
||||
negative lookahead assertion contains just one branch.
|
||||
.P
|
||||
4. Though binary zero characters are supported in the subject string, they are
|
||||
not allowed in a pattern string because it is passed as a normal C string,
|
||||
terminated by zero. The escape sequence \e0 can be used in the pattern to
|
||||
represent a binary zero.
|
||||
.P
|
||||
5. The following Perl escape sequences are not supported: \el, \eu, \eL,
|
||||
\eU, and \eN. In fact these are implemented by Perl's general string-handling
|
||||
and are not part of its pattern matching engine. If any of these are
|
||||
encountered by PCRE, an error is generated.
|
||||
.P
|
||||
6. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE is
|
||||
built with Unicode character property support. The properties that can be
|
||||
tested with \ep and \eP are limited to the general category properties such as
|
||||
Lu and Nd, script names such as Greek or Han, and the derived properties Any
|
||||
and L&.
|
||||
.P
|
||||
7. PCRE does support the \eQ...\eE escape for quoting substrings. Characters in
|
||||
between are treated as literals. This is slightly different from Perl in that $
|
||||
and @ are also handled as literals inside the quotes. In Perl, they cause
|
||||
variable interpolation (but of course PCRE does not have variables). Note the
|
||||
following examples:
|
||||
.sp
|
||||
    Pattern             PCRE matches   Perl matches
.sp
.\" JOIN
    \eQabc$xyz\eE         abc$xyz        abc followed by the
                                       contents of $xyz
    \eQabc\e$xyz\eE        abc\e$xyz       abc\e$xyz
    \eQabc\eE\e$\eQxyz\eE    abc$xyz        abc$xyz
|
||||
.sp
|
||||
The \eQ...\eE sequence is recognized both inside and outside character classes.
|
||||
.P
|
||||
8. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
|
||||
constructions. However, there is support for recursive patterns using the
|
||||
non-Perl items (?R), (?number), and (?P>name). Also, the PCRE "callout" feature
|
||||
allows an external function to be called during pattern matching. See the
|
||||
.\" HREF
|
||||
\fBpcrecallout\fP
|
||||
.\"
|
||||
documentation for details.
|
||||
.P
|
||||
9. There are some differences that are concerned with the settings of captured
|
||||
strings when part of a pattern is repeated. For example, matching "aba" against
|
||||
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
|
||||
.P
|
||||
10. PCRE provides some extensions to the Perl regular expression facilities:
|
||||
.sp
|
||||
(a) Although lookbehind assertions must match fixed length strings, each
|
||||
alternative branch of a lookbehind assertion can match a different length of
|
||||
string. Perl requires them all to have the same length.
|
||||
.sp
|
||||
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
|
||||
meta-character matches only at the very end of the string.
|
||||
.sp
|
||||
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
|
||||
meaning is faulted. Otherwise, like Perl, the backslash is ignored. (Perl can
|
||||
be made to issue a warning.)
|
||||
.sp
|
||||
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
|
||||
inverted, that is, by default they are not greedy, but if followed by a
|
||||
question mark they are.
|
||||
.sp
|
||||
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
|
||||
only at the first matching position in the subject string.
|
||||
.sp
|
||||
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
|
||||
options for \fBpcre_exec()\fP have no Perl equivalents.
|
||||
.sp
|
||||
(g) The (?R), (?number), and (?P>name) constructs allow for recursive pattern
|
||||
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
|
||||
support.)
|
||||
.sp
|
||||
(h) PCRE supports named capturing substrings, using the Python syntax.
|
||||
.sp
|
||||
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
|
||||
package.
|
||||
.sp
|
||||
(j) The (R) condition, for testing recursion, is a PCRE extension.
|
||||
.sp
|
||||
(k) The callout facility is PCRE-specific.
|
||||
.sp
|
||||
(l) The partial matching facility is PCRE-specific.
|
||||
.sp
|
||||
(m) Patterns compiled by PCRE can be saved and re-used at a later time, even on
|
||||
different hosts that have the other endianness.
|
||||
.sp
|
||||
(n) The alternative matching function (\fBpcre_dfa_exec()\fP) matches in a
|
||||
different way and is not Perl-compatible.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 06 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,312 @@
|
|||
.TH PCRECPP 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions.
|
||||
.SH "SYNOPSIS OF C++ WRAPPER"
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcrecpp.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
The C++ wrapper for PCRE was provided by Google Inc. Some additional
|
||||
functionality was added by Giuseppe Maxia. This brief man page was constructed
|
||||
from the notes in the \fIpcrecpp.h\fP file, which should be consulted for
|
||||
further details.
|
||||
.
|
||||
.
|
||||
.SH "MATCHING INTERFACE"
|
||||
.rs
|
||||
.sp
|
||||
The "FullMatch" operation checks that supplied text matches a supplied pattern
|
||||
exactly. If pointer arguments are supplied, it copies matched sub-strings that
|
||||
match sub-patterns into them.
|
||||
.sp
|
||||
Example: successful match
|
||||
pcrecpp::RE re("h.*o");
|
||||
re.FullMatch("hello");
|
||||
.sp
|
||||
Example: unsuccessful match (requires full match):
|
||||
pcrecpp::RE re("e");
|
||||
!re.FullMatch("hello");
|
||||
.sp
|
||||
Example: creating a temporary RE object:
|
||||
pcrecpp::RE("h.*o").FullMatch("hello");
|
||||
.sp
|
||||
You can pass in a "const char*" or a "string" for "text". The examples below
|
||||
tend to use a const char*. You can, as in the different examples above, store
|
||||
the RE object explicitly in a variable or use a temporary RE object. The
|
||||
examples below use one mode or the other arbitrarily. Either could correctly be
|
||||
used for any of these examples.
|
||||
.P
|
||||
You must supply extra pointer arguments to extract matched subpieces.
|
||||
.sp
|
||||
Example: extracts "ruby" into "s" and 1234 into "i"
|
||||
int i;
|
||||
string s;
|
||||
pcrecpp::RE re("(\e\ew+):(\e\ed+)");
|
||||
re.FullMatch("ruby:1234", &s, &i);
|
||||
.sp
|
||||
Example: does not try to extract any extra sub-patterns
|
||||
re.FullMatch("ruby:1234", &s);
|
||||
.sp
|
||||
Example: does not try to extract into NULL
|
||||
re.FullMatch("ruby:1234", NULL, &i);
|
||||
.sp
|
||||
Example: integer overflow causes failure
|
||||
!re.FullMatch("ruby:1234567891234", NULL, &i);
|
||||
.sp
|
||||
Example: fails because there aren't enough sub-patterns:
|
||||
!pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s);
|
||||
.sp
|
||||
Example: fails because string cannot be stored in integer
|
||||
!pcrecpp::RE("(.*)").FullMatch("ruby", &i);
|
||||
.sp
|
||||
The provided pointer arguments can be pointers to any scalar numeric
|
||||
type, or one of:
|
||||
.sp
|
||||
string (matched piece is copied to string)
|
||||
StringPiece (StringPiece is mutated to point to matched piece)
|
||||
T (where "bool T::ParseFrom(const char*, int)" exists)
|
||||
NULL (the corresponding matched sub-pattern is not copied)
|
||||
.sp
|
||||
The function returns true iff all of the following conditions are satisfied:
|
||||
.sp
|
||||
a. "text" matches "pattern" exactly;
|
||||
.sp
|
||||
b. The number of matched sub-patterns is >= number of supplied
|
||||
pointers;
|
||||
.sp
|
||||
c. The "i"th argument has a suitable type for holding the
|
||||
string captured as the "i"th sub-pattern. If you pass in
|
||||
NULL for the "i"th argument, or pass fewer arguments than
|
||||
number of sub-patterns, the "i"th captured sub-pattern is
|
||||
ignored.
|
||||
.sp
|
||||
The matching interface supports at most 16 arguments per call.
|
||||
If you need more, consider using the more general interface
|
||||
\fBpcrecpp::RE::DoMatch\fP. See \fBpcrecpp.h\fP for the signature for
|
||||
\fBDoMatch\fP.
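.P
The rules above can be approximated without PCRE. This sketch substitutes the C++ standard library's \fBstd::regex\fP (ECMAScript syntax, not PCRE). The function \fBfull_match_str_int()\fP is a hypothetical, fixed-signature stand-in for \fBFullMatch(text, &s, &i)\fP. The sketch shows the whole-text requirement, the skipping of NULL arguments, the group-count check, and failure on integer overflow.
.sp
.nf
```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <regex>
#include <string>

// Base-10 conversion that fails on trailing junk or if the value does not
// fit in an int, mirroring "integer overflow causes failure".
bool parse_int(const std::string &text, int *out) {
  if (text.empty()) return false;
  errno = 0;
  char *end = nullptr;
  long v = std::strtol(text.c_str(), &end, 10);
  if (errno == ERANGE || *end != 0 || v < INT_MIN || v > INT_MAX) return false;
  *out = static_cast<int>(v);
  return true;
}

// Hypothetical stand-in for FullMatch(text, &s, &i): argument 1 receives
// group 1, argument 2 receives group 2, and a null pointer skips a group.
bool full_match_str_int(const std::regex &re, const std::string &text,
                        std::string *s, int *i) {
  std::smatch m;
  if (!std::regex_match(text, m, re)) return false;  // whole text must match
  int needed = (i != nullptr) ? 2 : (s != nullptr ? 1 : 0);
  if (static_cast<int>(m.size()) - 1 < needed) return false;  // too few groups
  if (s != nullptr) *s = m[1].str();
  if (i != nullptr && !parse_int(m[2].str(), i)) return false;
  return true;
}
```
.fi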
|
||||
.
|
||||
.SH "PARTIAL MATCHES"
|
||||
.rs
|
||||
.sp
|
||||
You can use the "PartialMatch" operation when you want the pattern
|
||||
to match any substring of the text.
|
||||
.sp
|
||||
Example: simple search for a string:
|
||||
pcrecpp::RE("ell").PartialMatch("hello");
|
||||
.sp
|
||||
Example: find first number in a string:
|
||||
int number;
|
||||
pcrecpp::RE re("(\e\ed+)");
|
||||
re.PartialMatch("x*100 + 20", &number);
|
||||
assert(number == 100);
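.P
The difference between the two operations is the difference between matching the whole text and searching for a substring. This sketch again uses \fBstd::regex\fP rather than pcrecpp; \fBfull_match()\fP, \fBpartial_match()\fP, and \fBfirst_number()\fP are hypothetical helpers.
.sp
.nf
```cpp
#include <regex>
#include <string>

// FullMatch-like: the entire text must match the pattern.
bool full_match(const std::string &pattern, const std::string &text) {
  return std::regex_match(text, std::regex(pattern));
}

// PartialMatch-like: a match anywhere in the text is enough.
bool partial_match(const std::string &pattern, const std::string &text) {
  return std::regex_search(text, std::regex(pattern));
}

// First run of digits in text, or -1 if there is none.
long first_number(const std::string &text) {
  std::smatch m;
  if (!std::regex_search(text, m, std::regex("([0-9]+)"))) return -1;
  return std::stol(m[1].str());
}
```
.fi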
|
||||
.
|
||||
.
|
||||
.SH "UTF-8 AND THE MATCHING INTERFACE"
|
||||
.rs
|
||||
.sp
|
||||
By default, pattern and text are plain text, one byte per character. The UTF8
|
||||
flag, passed to the constructor, causes both pattern and string to be treated
|
||||
as UTF-8 text, still a byte stream but potentially multiple bytes per
|
||||
character. In practice, the text is likelier to be UTF-8 than the pattern, but
|
||||
the match returned may depend on the UTF8 flag, so always use it when matching
|
||||
UTF8 text. For example, "." will match one byte normally but with UTF8 set may
|
||||
match all the bytes of a multi-byte character (up to four for a valid Unicode code point).
|
||||
.sp
|
||||
Example:
|
||||
pcrecpp::RE_Options options;
|
||||
options.set_utf8();
|
||||
pcrecpp::RE re(utf8_pattern, options);
|
||||
re.FullMatch(utf8_string);
|
||||
.sp
|
||||
Example: using the convenience function UTF8():
|
||||
pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
|
||||
re.FullMatch(utf8_string);
|
||||
.sp
|
||||
NOTE: The UTF8 flag is ignored if pcre was not configured with the
|
||||
--enable-utf8 flag.
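.P
To make the byte-versus-character point concrete, this sketch computes how many bytes a single "." would consume at a given offset, with and without UTF-8 handling. The helpers are hypothetical. They ignore newline handling and do not validate the UTF-8 sequence, whereas PCRE itself does check validity.
.sp
.nf
```cpp
#include <cstddef>
#include <string>

// Length of the UTF-8 sequence introduced by lead byte b: 1 for ASCII,
// 2 to 4 for multi-byte lead bytes, 0 for a continuation or invalid byte.
int utf8_sequence_length(unsigned char b) {
  if (b < 0x80) return 1;
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  if ((b & 0xF8) == 0xF0) return 4;
  return 0;
}

// Bytes a single "." consumes at offset pos: always one byte without the
// UTF8 flag, a whole character with it (malformed input counts as 1 here).
std::size_t dot_match_bytes(const std::string &text, std::size_t pos,
                            bool utf8) {
  if (pos >= text.size()) return 0;
  if (!utf8) return 1;
  int n = utf8_sequence_length(static_cast<unsigned char>(text[pos]));
  return n == 0 ? 1 : static_cast<std::size_t>(n);
}
```
.fi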
|
||||
.
|
||||
.
|
||||
.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE"
|
||||
.rs
|
||||
.sp
|
||||
PCRE defines some modifiers to change the behavior of the regular expression
|
||||
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
|
||||
pass such modifiers to a RE class. Currently, the following modifiers are
|
||||
supported:
|
||||
.sp
|
||||
  modifier              description                 Perl corresponding
.sp
  PCRE_CASELESS         case insensitive match      /i
  PCRE_MULTILINE        multiple lines match        /m
  PCRE_DOTALL           dot matches newlines        /s
  PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
  PCRE_EXTRA            strict escape parsing       N/A
  PCRE_EXTENDED         ignore whitespace           /x
  PCRE_UTF8             handles UTF8 chars          built-in
  PCRE_UNGREEDY         reverses * and *?           N/A
  PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
|
||||
.sp
|
||||
(*) Both Perl and PCRE allow non-capturing parentheses by means of the
|
||||
"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not
|
||||
capture, while (ab|cd) does.
|
||||
.P
|
||||
For a full account of how each modifier works, please check the
|
||||
PCRE API reference page.
|
||||
.P
|
||||
For each modifier, there are two member functions whose name is made
|
||||
out of the modifier in lowercase, without the "PCRE_" prefix. For
|
||||
instance, PCRE_CASELESS is handled by
|
||||
.sp
|
||||
bool caseless()
|
||||
.sp
|
||||
which returns true if the modifier is set, and
|
||||
.sp
|
||||
RE_Options & set_caseless(bool)
|
||||
.sp
|
||||
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
|
||||
accessed through the \fBset_match_limit()\fR and \fBmatch_limit()\fR member
|
||||
functions. Setting \fImatch_limit\fR to a non-zero value will limit the
|
||||
execution of pcre to keep it from doing bad things like blowing the stack or
|
||||
taking an eternity to return a result. A value of 5000 is good enough to stop
|
||||
stack blowup in a 2MB thread stack. Setting \fImatch_limit\fR to zero disables
|
||||
match limiting. Similarly, \fBset_match_limit_recursion()\fP and
\fBmatch_limit_recursion()\fP use PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how
deeply PCRE recurses. \fBmatch_limit()\fP limits the total number of times
PCRE's internal \fBmatch()\fP function may be called; \fBmatch_limit_recursion()\fP
limits the depth of internal recursion, and therefore the amount of stack that
is used.
|
||||
.P
|
||||
Normally, to pass one or more modifiers to a RE class, you declare
|
||||
a \fIRE_Options\fR object, set the appropriate options, and pass this
|
||||
object to a RE constructor. Example:
|
||||
.sp
|
||||
RE_Options opt;
|
||||
opt.set_caseless(true);
|
||||
if (RE("HELLO", opt).PartialMatch("hello world")) ...
|
||||
.sp
|
||||
RE_Options has two constructors. The default constructor takes no arguments and
creates a set of flags that are all off. The other takes an \fIoption_flags\fR
argument, to facilitate transfer of legacy code from C programs that already
use PCRE option bits. This lets you do
|
||||
.sp
|
||||
  RE(pattern,
     RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
|
||||
.sp
|
||||
However, new code is better off doing
|
||||
.sp
|
||||
  RE(pattern,
     RE_Options().set_caseless(true).set_multiline(true))
     .PartialMatch(str);
|
||||
.sp
|
||||
If you are going to pass one of the most commonly used modifiers, there are some
convenience functions that return a RE_Options class with the
appropriate modifier already set: \fBCASELESS()\fR, \fBUTF8()\fR,
\fBMULTILINE()\fR, \fBDOTALL()\fR, and \fBEXTENDED()\fR.
|
||||
.P
|
||||
If you need to set several options at once, and you don't want to go through
the trouble of declaring a RE_Options object and setting each option in turn,
you can set them on the fly instead. You can concatenate several
\fBset_xxxxx()\fR member functions, since each of them returns a reference to
its class object. For example, to pass PCRE_CASELESS, PCRE_EXTENDED, and
PCRE_MULTILINE to a RE with one statement, you may write:
|
||||
.sp
|
||||
  RE(" ^ xyz \e\es+ .* blah$",
    RE_Options()
      .set_caseless(true)
      .set_extended(true)
      .set_multiline(true)).PartialMatch(sometext);
|
||||
.sp
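.P
Chaining works because every setter returns a reference to the object it modified. The sketch below shows that idiom with a hypothetical OptionsSketch class. It is not pcrecpp::RE_Options, and its bit values are illustrative only.
.sp
.nf
```cpp
// Hypothetical options class with chainable setters, illustrating the idiom
// only; it is not pcrecpp::RE_Options.
class OptionsSketch {
 public:
  // Illustrative bit values; PCRE's real constants live in pcre.h.
  static const int kCaseless = 0x1;
  static const int kMultiline = 0x2;

  OptionsSketch() : flags_(0) {}                          // all options off
  explicit OptionsSketch(int flags) : flags_(flags) {}    // legacy-style bits

  bool caseless() const { return (flags_ & kCaseless) != 0; }
  OptionsSketch &set_caseless(bool on) { return set(kCaseless, on); }
  bool multiline() const { return (flags_ & kMultiline) != 0; }
  OptionsSketch &set_multiline(bool on) { return set(kMultiline, on); }
  int all_options() const { return flags_; }

 private:
  OptionsSketch &set(int bit, bool on) {
    if (on) {
      flags_ |= bit;
    } else {
      flags_ &= ~bit;
    }
    return *this;  // returning *this is what makes chaining possible
  }
  int flags_;
};
```
.fi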
|
||||
.
|
||||
.
|
||||
.SH "SCANNING TEXT INCREMENTALLY"
|
||||
.rs
|
||||
.sp
|
||||
The "Consume" operation may be useful if you want to repeatedly
|
||||
match regular expressions at the front of a string and skip over
|
||||
them as they match. This requires use of the "StringPiece" type,
|
||||
which represents a sub-range of a real string. Like RE, StringPiece
|
||||
is defined in the pcrecpp namespace.
|
||||
.sp
|
||||
Example: read lines of the form "var = value" from a string.
|
||||
string contents = ...; // Fill string somehow
|
||||
pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
|
||||
|
||||
string var;
|
||||
int value;
|
||||
pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en");
|
||||
while (re.Consume(&input, &var, &value)) {
|
||||
...;
|
||||
}
|
||||
.sp
|
||||
Each successful call to "Consume" will set "var/value", and also
|
||||
advance "input" so it points past the matched text.
|
||||
.P
|
||||
The "FindAndConsume" operation is similar to "Consume" but does not
|
||||
anchor your match at the beginning of the string. For example, you
|
||||
could extract all words from a string by repeatedly calling
|
||||
.sp
|
||||
pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word)
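.P
Here is a rough model of the two operations. It uses std::regex in place of pcrecpp and a hypothetical PieceSketch struct in place of StringPiece. \fBconsume()\fP requires the match to begin at the front of the input, while \fBfind_and_consume()\fP may skip ahead; both then advance the input past the match. A pattern that can match the empty string would loop forever in this sketch, so the examples use only patterns containing +.
.sp
.nf
```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for StringPiece: a view [begin, end) into a string.
struct PieceSketch {
  const char *begin;
  const char *end;
};

// Consume-like: the match must start at input->begin. On success, group 1
// is copied to *out and input is advanced past the matched text.
bool consume(const std::regex &re, PieceSketch *input, std::string *out) {
  std::cmatch m;
  if (!std::regex_search(input->begin, input->end, m, re,
                         std::regex_constants::match_continuous))
    return false;
  if (out != nullptr) *out = m[1].str();
  input->begin = m[0].second;
  return true;
}

// FindAndConsume-like: the match may start anywhere in the input.
bool find_and_consume(const std::regex &re, PieceSketch *input,
                      std::string *out) {
  std::cmatch m;
  if (!std::regex_search(input->begin, input->end, m, re)) return false;
  if (out != nullptr) *out = m[1].str();
  input->begin = m[0].second;
  return true;
}
```
.fi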
|
||||
.
|
||||
.
|
||||
.SH "PARSING HEX/OCTAL/C-RADIX NUMBERS"
|
||||
.rs
|
||||
.sp
|
||||
By default, if you pass a pointer to a numeric value, the
|
||||
corresponding text is interpreted as a base-10 number. You can
|
||||
instead wrap the pointer with a call to one of the operators Hex(),
|
||||
Octal(), or CRadix() to interpret the text in another base. The
|
||||
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
|
||||
prefixes, but defaults to base-10.
|
||||
.sp
|
||||
Example:
|
||||
int a, b, c, d;
|
||||
pcrecpp::RE re("(.*) (.*) (.*) (.*)");
|
||||
re.FullMatch("100 40 0100 0x40",
|
||||
pcrecpp::Octal(&a), pcrecpp::Hex(&b),
|
||||
pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
|
||||
.sp
|
||||
will leave 64 in a, b, c, and d.
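.P
The conversions in this example correspond to \fBstrtol()\fP with base 8, base 16, and base 0 (C-style prefixes). The sketch below reproduces the result with that standard function. \fBparse_in_base()\fP is a hypothetical helper and not necessarily how pcrecpp is implemented.
.sp
.nf
```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Parses all of text as an integer in the given base using strtol().
// Bases 8 and 16 play the roles of Octal() and Hex(); base 0 applies C rules
// ("0x" prefix = hex, other leading "0" = octal, else decimal), like CRadix().
bool parse_in_base(const std::string &text, int base, long *out) {
  if (text.empty()) return false;
  errno = 0;
  char *end = nullptr;
  long v = std::strtol(text.c_str(), &end, base);
  if (errno == ERANGE || *end != 0) return false;  // overflow or trailing junk
  *out = v;
  return true;
}
```
.fi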
|
||||
.
|
||||
.
|
||||
.SH "REPLACING PARTS OF STRINGS"
|
||||
.rs
|
||||
.sp
|
||||
You can replace the first match of "pattern" in "str" with "rewrite".
|
||||
Within "rewrite", backslash-escaped digits (\e1 to \e9) can be
|
||||
used to insert text matching the corresponding parenthesized group
|
||||
from the pattern. \e0 in "rewrite" refers to the entire matching
|
||||
text. For example:
|
||||
.sp
|
||||
string s = "yabba dabba doo";
|
||||
pcrecpp::RE("b+").Replace("d", &s);
|
||||
.sp
|
||||
will leave "s" containing "yada dabba doo". The result is true if the pattern
|
||||
matches and a replacement occurs, false otherwise.
|
||||
.P
|
||||
\fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all
|
||||
occurrences of the pattern in the string with the rewrite. Replacements are
|
||||
not subject to re-matching. For example:
|
||||
.sp
|
||||
string s = "yabba dabba doo";
|
||||
pcrecpp::RE("b+").GlobalReplace("d", &s);
|
||||
.sp
|
||||
will leave "s" containing "yada dada doo". It returns the number of
|
||||
replacements made.
|
||||
.P
|
||||
\fBExtract\fP is like \fBReplace\fP, except that if the pattern matches,
|
||||
"rewrite" is copied into "out" (an additional argument) with substitutions.
|
||||
The non-matching portions of "text" are ignored. Returns true iff a match
|
||||
occurred and the extraction happened successfully; if no match occurs, the
|
||||
string is left unaffected.
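.P
The rewrite rules can be sketched with \fBstd::regex\fP standing in for PCRE. The helpers below are hypothetical. \fBexpand_rewrite()\fP substitutes backslash-digit references; the backslash is written as its ASCII code, 92. \fBreplace_first()\fP behaves like \fBReplace\fP. \fBreplace_all()\fP behaves like \fBGlobalReplace\fP and scans only the original text, so replacements are not re-matched. In this sketch, a reference to a group the pattern lacks inserts nothing.
.sp
.nf
```cpp
#include <cstddef>
#include <regex>
#include <string>

const char kBackslash = 92;  // ASCII code of the backslash character

// Backslash followed by a digit 0-9 becomes that group of m (0 = whole
// match); every other character of rewrite is copied unchanged.
std::string expand_rewrite(const std::string &rewrite, const std::smatch &m) {
  std::string out;
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    char c = rewrite[i];
    if (c == kBackslash && i + 1 < rewrite.size() &&
        rewrite[i + 1] >= '0' && rewrite[i + 1] <= '9') {
      std::size_t group = static_cast<std::size_t>(rewrite[i + 1] - '0');
      if (group < m.size()) out += m[group].str();
      ++i;  // skip the digit
    } else {
      out += c;
    }
  }
  return out;
}

// Replace-like: rewrites the first match only; true if one occurred.
bool replace_first(const std::regex &re, const std::string &rewrite,
                   std::string *str) {
  std::smatch m;
  if (!std::regex_search(*str, m, re)) return false;
  *str = m.prefix().str() + expand_rewrite(rewrite, m) + m.suffix().str();
  return true;
}

// GlobalReplace-like: rewrites every match in the original text and
// returns the number of replacements made.
int replace_all(const std::regex &re, const std::string &rewrite,
                std::string *str) {
  std::string result;
  std::size_t last = 0;  // end of the previous match in *str
  int count = 0;
  for (std::sregex_iterator it(str->cbegin(), str->cend(), re), done;
       it != done; ++it) {
    const std::smatch &m = *it;
    std::size_t pos = static_cast<std::size_t>(m[0].first - str->cbegin());
    std::size_t len = static_cast<std::size_t>(m[0].second - m[0].first);
    result += str->substr(last, pos - last);
    result += expand_rewrite(rewrite, m);
    last = pos + len;
    ++count;
  }
  result += str->substr(last);
  *str = result;
  return count;
}
```
.fi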
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
The C++ wrapper was contributed by Google Inc.
|
||||
.br
|
||||
Copyright (c) 2005 Google Inc.
|
|
@ -0,0 +1,376 @@
|
|||
.TH PCREGREP 1
|
||||
.SH NAME
|
||||
pcregrep - a grep with Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
.B pcregrep [options] [long options] [pattern] [path1 path2 ...]
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
\fBpcregrep\fP searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
.\" HREF
|
||||
\fBpcrepattern\fP
|
||||
.\"
|
||||
for a full description of syntax and semantics of the regular expressions that
|
||||
PCRE supports.
|
||||
.P
|
||||
Patterns, whether supplied on the command line or in a separate file, are given
|
||||
without delimiters. For example:
|
||||
.sp
|
||||
pcregrep Thursday /etc/motd
|
||||
.sp
|
||||
If you attempt to use delimiters (for example, by surrounding a pattern with
|
||||
slashes, as is common in Perl scripts), they are interpreted as part of the
|
||||
pattern. Quotes can of course be used on the command line because they are
|
||||
interpreted by the shell, and indeed they are required if a pattern contains
|
||||
white space or shell metacharacters.
|
||||
.P
|
||||
The first argument that follows any option settings is treated as the single
|
||||
pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present.
|
||||
Conversely, when one or both of these options are used to specify patterns, all
|
||||
arguments are treated as path names. At least one of \fB-e\fP, \fB-f\fP, or an
|
||||
argument pattern must be provided.
|
||||
.P
|
||||
If no files are specified, \fBpcregrep\fP reads the standard input. The
|
||||
standard input can also be referenced by a name consisting of a single hyphen.
|
||||
For example:
|
||||
.sp
|
||||
pcregrep some-pattern /file1 - /file3
|
||||
.sp
|
||||
By default, each line that matches the pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at the
|
||||
start of each line. However, there are options that can change how
|
||||
\fBpcregrep\fP behaves. In particular, the \fB-M\fP option makes it possible to
|
||||
search for patterns that span line boundaries. What defines a line boundary is
|
||||
controlled by the \fB-N\fP (\fB--newline\fP) option.
|
||||
.P
|
||||
Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
|
||||
BUFSIZ is defined in \fB<stdio.h>\fP.
|
||||
.P
|
||||
If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set,
|
||||
\fBpcregrep\fP uses the value to set a locale when calling the PCRE library.
|
||||
The \fB--locale\fP option can be used to override this.
|
||||
.
|
||||
.SH OPTIONS
|
||||
.rs
|
||||
.TP 10
|
||||
\fB--\fP
|
||||
This terminates the list of options. It is useful if the next item on the
|
||||
command line starts with a hyphen but is not an option. This allows for the
|
||||
processing of patterns and filenames that start with hyphens.
|
||||
.TP
|
||||
\fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP
|
||||
Output \fInumber\fP lines of context after each matching line. If filenames
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
|
||||
guarantees to have up to 8K of following text available for context output.
|
||||
.TP
|
||||
\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
|
||||
Output \fInumber\fP lines of context before each matching line. If filenames
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
|
||||
guarantees to have up to 8K of preceding text available for context output.
|
||||
.TP
|
||||
\fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP
|
||||
Output \fInumber\fP lines of context both before and after each matching line.
|
||||
This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value.
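The grouping rule for context lines can be illustrated with a small C++ sketch. It is not pcregrep's source, and the helper with_context is hypothetical. It also ignores file names, line numbers, and the hyphen separators. Each matching line pulls in up to "before" preceding and "after" following lines. Overlapping or adjacent groups merge, and "--" is emitted only between groups that are not contiguous in the input.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch of -A/-B/-C output: matched[i] marks matching lines.
// Groups that touch or overlap are merged; "--" separates the others.
std::vector<std::string> with_context(const std::vector<std::string>& lines,
                                      const std::vector<bool>& matched,
                                      int before, int after) {
    std::vector<std::string> out;
    long last_printed = -1;  // index of the last line emitted, -1 if none
    const long n = static_cast<long>(lines.size());
    for (long i = 0; i < n; ++i) {
        if (!matched[i]) continue;
        long start = std::max(i - before, last_printed + 1);
        long stop = std::min(i + after, n - 1);
        if (last_printed >= 0 && start > last_printed + 1) {
            out.push_back("--");  // gap since the previous group
        }
        for (long j = start; j <= stop; ++j) out.push_back(lines[j]);
        last_printed = std::max(last_printed, stop);
    }
    return out;
}
```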
|
||||
.TP
|
||||
\fB-c\fP, \fB--count\fP
|
||||
Do not output individual lines; instead just output a count of the number of
|
||||
lines that would otherwise have been output. If several files are given, a
|
||||
count is output for each of them. In this mode, the \fB-A\fP, \fB-B\fP, and
|
||||
\fB-C\fP options are ignored.
|
||||
.TP
|
||||
\fB--colour\fP, \fB--color\fP
|
||||
If this option is given without any data, it is equivalent to "--colour=auto".
|
||||
If data is required, it must be given in the same shell item, separated by an
|
||||
equals sign.
|
||||
.TP
|
||||
\fB--colour=\fP\fIvalue\fP, \fB--color=\fP\fIvalue\fP
|
||||
This option specifies under what circumstances the part of a line that matched
|
||||
a pattern should be coloured in the output. The value may be "never" (the
|
||||
default), "always", or "auto". In the latter case, colouring happens only if
|
||||
the standard output is connected to a terminal. The colour can be specified by
|
||||
setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
|
||||
of this variable should be a string of two numbers, separated by a semicolon.
|
||||
They are copied directly into the control string for setting colour on a
|
||||
terminal, so it is your responsibility to ensure that they make sense. If
|
||||
neither of the environment variables is set, the default is "1;31", which gives
|
||||
red.
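The colour value is pasted into a terminal control string. This C++ sketch assumes the usual ANSI SGR form, ESC[<value>m to start colouring and ESC[00m to reset. The page does not spell out the exact bytes pcregrep writes, and the helper colourize is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wrap line[start, start+len) in ANSI colour codes.
// Assumption: start sequence ESC[<colour>m and reset sequence ESC[00m.
std::string colourize(const std::string& line, std::size_t start,
                      std::size_t len, const std::string& colour = "1;31") {
    return line.substr(0, start) + "\x1b[" + colour + "m" +
           line.substr(start, len) + "\x1b[00m" + line.substr(start + len);
}
```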
|
||||
.TP
|
||||
\fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP
|
||||
If an input path is not a regular file or a directory, "action" specifies how
|
||||
it is to be processed. Valid values are "read" (the default) or "skip"
|
||||
(silently skip the path).
|
||||
.TP
|
||||
\fB-d\fP \fIaction\fP, \fB--directories=\fP\fIaction\fP
|
||||
If an input path is a directory, "action" specifies how it is to be processed.
|
||||
Valid values are "read" (the default), "recurse" (equivalent to the \fB-r\fP
|
||||
option), or "skip" (silently skip the path). In the default case, directories
|
||||
are read as if they were ordinary files. In some operating systems the effect
|
||||
of reading a directory like this is an immediate end-of-file.
|
||||
.TP
|
||||
\fB-e\fP \fIpattern\fP, \fB--regex=\fP\fIpattern\fP, \fB--regexp=\fP\fIpattern\fP
|
||||
Specify a pattern to be matched. This option can
|
||||
be used multiple times in order to specify several patterns. It can also be
|
||||
used as a way of specifying a single pattern that starts with a hyphen. When
|
||||
\fB-e\fP is used, no argument pattern is taken from the command line; all
|
||||
arguments are treated as file names. There is an overall maximum of 100
|
||||
patterns. They are applied to each line in the order in which they are defined
|
||||
until one matches (or fails to match if \fB-v\fP is used). If \fB-f\fP is used
|
||||
with \fB-e\fP, the command line patterns are matched first, followed by the
|
||||
patterns from the file, independent of the order in which these options are
|
||||
specified. Note that multiple use of \fB-e\fP is not the same as a single
|
||||
pattern with alternatives. For example, X|Y finds the first character in a line
|
||||
that is X or Y, whereas if the two patterns are given separately,
|
||||
\fBpcregrep\fP finds X if it is present, even if it follows Y in the line. It
|
||||
finds Y only if there is no X in the line. This really matters only if you are
|
||||
using \fB-o\fP to show the portion of the line that matched.
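This C++ sketch shows the difference between several -e patterns and a single alternation. It uses std::regex, and the helper first_pattern_match is hypothetical. Patterns are tried in order, and the first one that matches anywhere in the line decides what -o would show. Consider the line "Y comes before X". The alternation X|Y finds "Y" because it occurs first, but trying X and then Y separately finds "X".

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical illustration of the -e ordering rule: the first pattern
// (in definition order) that matches anywhere in the line wins.
std::string first_pattern_match(const std::vector<std::regex>& patterns,
                                const std::string& line) {
    std::smatch m;
    for (const auto& re : patterns) {
        if (std::regex_search(line, m, re)) {
            return m.str(0);  // the text -o would print
        }
    }
    return "";  // no pattern matched
}
```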
|
||||
.TP
|
||||
\fB--exclude\fP=\fIpattern\fP
|
||||
When \fBpcregrep\fP is searching the files in a directory as a consequence of
|
||||
the \fB-r\fP (recursive search) option, any files whose names match the pattern
|
||||
are excluded. The pattern is a PCRE regular expression. If a file name matches
|
||||
both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no short
|
||||
form for this option.
|
||||
.TP
|
||||
\fB-F\fP, \fB--fixed-strings\fP
|
||||
Interpret each pattern as a list of fixed strings, separated by newlines,
|
||||
instead of as a regular expression. The \fB-w\fP (match as a word) and \fB-x\fP
|
||||
(match whole line) options can be used with \fB-F\fP. They apply to each of the
|
||||
fixed strings. A line is selected if any of the fixed strings are found in it
|
||||
(subject to \fB-w\fP or \fB-x\fP, if present).
|
||||
.TP
|
||||
\fB-f\fP \fIfilename\fP, \fB--file=\fP\fIfilename\fP
|
||||
Read a number of patterns from the file, one per line, and match them against
|
||||
each line of input. A data line is output if any of the patterns match it. The
|
||||
filename can be given as "-" to refer to the standard input. When \fB-f\fP is
|
||||
used, patterns specified on the command line using \fB-e\fP may also be
|
||||
present; they are tested before the file's patterns. However, no other pattern
|
||||
is taken from the command line; all arguments are treated as file names. There
|
||||
is an overall maximum of 100 patterns. Trailing white space is removed from
|
||||
each line, and blank lines are ignored. An empty file contains no patterns and
|
||||
therefore matches nothing.
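The pattern-file rules can be sketched in C++. The helper read_patterns is hypothetical, and the 100-pattern limit is not enforced. Each line is one pattern, trailing white space is stripped, and lines that end up empty are skipped. An empty file therefore yields no patterns.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical reader for a -f pattern file: one pattern per line,
// trailing white space removed, blank lines ignored.
std::vector<std::string> read_patterns(std::istream& in) {
    std::vector<std::string> patterns;
    std::string line;
    while (std::getline(in, line)) {
        std::size_t end = line.size();
        while (end > 0 &&
               std::isspace(static_cast<unsigned char>(line[end - 1]))) {
            --end;  // strip trailing white space only
        }
        line.resize(end);
        if (!line.empty()) {
            patterns.push_back(line);
        }
    }
    return patterns;
}
```

For example, reading from an std::istringstream containing "foo  \n\n   \nbar\t\n" gives the two patterns "foo" and "bar".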
|
||||
.TP
|
||||
\fB-H\fP, \fB--with-filename\fP
|
||||
Force the inclusion of the filename at the start of output lines when searching
|
||||
a single file. By default, the filename is not shown in this case. For matching
|
||||
lines, the filename is followed by a colon and a space; for context lines, a
|
||||
hyphen separator is used. If a line number is also being output, it follows the
|
||||
file name without a space.
|
||||
.TP
|
||||
\fB-h\fP, \fB--no-filename\fP
|
||||
Suppress the output of filenames when searching multiple files. By default,
|
||||
filenames are shown when multiple files are searched. For matching lines, the
|
||||
filename is followed by a colon and a space; for context lines, a hyphen
|
||||
separator is used. If a line number is also being output, it follows the file
|
||||
name without a space.
|
||||
.TP
|
||||
\fB--help\fP
|
||||
Output a brief help message and exit.
|
||||
.TP
|
||||
\fB-i\fP, \fB--ignore-case\fP
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
.TP
|
||||
\fB--include\fP=\fIpattern\fP
|
||||
When \fBpcregrep\fP is searching the files in a directory as a consequence of
|
||||
the \fB-r\fP (recursive search) option, only those files whose names match the
|
||||
pattern are included. The pattern is a PCRE regular expression. If a file name
|
||||
matches both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no
|
||||
short form for this option.
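The interaction of --include and --exclude during a recursive search reduces to a small predicate. This C++ sketch uses std::regex and the hypothetical helper should_search. It assumes "match" means an unanchored search, as for ordinary patterns. A name that matches the exclude pattern is always skipped. Otherwise, if an include pattern is set, the name must match it.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical filter for files found by -r: exclude wins over include.
bool should_search(const std::string& name,
                   const std::optional<std::regex>& include,
                   const std::optional<std::regex>& exclude) {
    if (exclude && std::regex_search(name, *exclude)) return false;
    if (include && !std::regex_search(name, *include)) return false;
    return true;
}
```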
|
||||
.TP
|
||||
\fB-L\fP, \fB--files-without-match\fP
|
||||
Instead of outputting lines from the files, just output the names of the files
|
||||
that do not contain any lines that would have been output. Each file name is
|
||||
output once, on a separate line.
|
||||
.TP
|
||||
\fB-l\fP, \fB--files-with-matches\fP
|
||||
Instead of outputting lines from the files, just output the names of the files
|
||||
containing lines that would have been output. Each file name is output
|
||||
once, on a separate line. Searching stops as soon as a matching line is found
|
||||
in a file.
|
||||
.TP
|
||||
\fB--label\fP=\fIname\fP
|
||||
This option supplies a name to be used for the standard input when file names
|
||||
are being output. If not supplied, "(standard input)" is used. There is no
|
||||
short form for this option.
|
||||
.TP
|
||||
\fB--locale\fP=\fIlocale-name\fP
|
||||
This option specifies a locale to be used for pattern matching. It overrides
|
||||
the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no
|
||||
locale is specified, the PCRE library's default (usually the "C" locale) is
|
||||
used. There is no short form for this option.
|
||||
.TP
|
||||
\fB-M\fP, \fB--multiline\fP
|
||||
Allow patterns to match more than one line. When this option is given, patterns
|
||||
may usefully contain literal newline characters and internal occurrences of ^
|
||||
and $ characters. The output for any one match may consist of more than one
|
||||
line. When this option is set, the PCRE library is called in "multiline" mode.
|
||||
There is a limit to the number of lines that can be matched, imposed by the way
|
||||
that \fBpcregrep\fP buffers the input file as it scans it. However,
|
||||
\fBpcregrep\fP ensures that at least 8K characters or the rest of the document
|
||||
(whichever is the shorter) are available for forward matching, and similarly
|
||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||
are guaranteed to be available for lookbehind assertions.
|
||||
.TP
|
||||
\fB-N\fP \fInewline-type\fP, \fB--newline=\fP\fInewline-type\fP
|
||||
The PCRE library supports three different character sequences for indicating
|
||||
the ends of lines. They are the single-character sequences CR (carriage return)
|
||||
and LF (linefeed), and the two-character sequence CR, LF. When the library is
|
||||
built, a default line-ending sequence is specified. This is normally the
|
||||
standard sequence for the operating system. Unless otherwise specified by this
|
||||
option, \fBpcregrep\fP uses the default. The possible values for this option
|
||||
are CR, LF, or CRLF. This makes it possible to use \fBpcregrep\fP on files that
|
||||
have come from other environments without having to modify their line endings.
|
||||
If the data that is being scanned does not agree with the convention set by
|
||||
this option, \fBpcregrep\fP may behave in strange ways.
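This C++ sketch splits a buffer into lines under one of the three conventions. The enum Newline and the helper split_lines are hypothetical. A final line without a terminator is kept. The second test case shows one of the "strange" results: CRLF data split with LF leaves a stray CR at the end of every line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Newline { CR, LF, CRLF };  // hypothetical names for -N values

// Split data into lines using exactly one newline convention.
std::vector<std::string> split_lines(const std::string& data, Newline nl) {
    const std::string sep = nl == Newline::CR   ? "\r"
                          : nl == Newline::LF   ? "\n"
                                                : "\r\n";
    std::vector<std::string> lines;
    std::size_t pos = 0;
    while (pos < data.size()) {
        std::size_t hit = data.find(sep, pos);
        if (hit == std::string::npos) {
            lines.push_back(data.substr(pos));  // unterminated last line
            break;
        }
        lines.push_back(data.substr(pos, hit - pos));
        pos = hit + sep.size();
    }
    return lines;
}
```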
|
||||
.TP
|
||||
\fB-n\fP, \fB--line-number\fP
|
||||
Precede each output line by its line number in the file, followed by a colon
|
||||
and a space for matching lines or a hyphen and a space for context lines. If
|
||||
the filename is also being output, it precedes the line number.
|
||||
.TP
|
||||
\fB-o\fP, \fB--only-matching\fP
|
||||
Show only the part of the line that matched a pattern. In this mode, no
|
||||
context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are
|
||||
ignored.
|
||||
.TP
|
||||
\fB-q\fP, \fB--quiet\fP
|
||||
Work quietly, that is, display nothing except error messages. The exit
|
||||
status indicates whether or not any matches were found.
|
||||
.TP
|
||||
\fB-r\fP, \fB--recursive\fP
|
||||
If any given path is a directory, recursively scan the files it contains,
|
||||
taking note of any \fB--include\fP and \fB--exclude\fP settings. By default, a
|
||||
directory is read as a normal file; in some operating systems this gives an
|
||||
immediate end-of-file. This option is a shorthand for setting the \fB-d\fP
|
||||
option to "recurse".
|
||||
.TP
|
||||
\fB-s\fP, \fB--no-messages\fP
|
||||
Suppress error messages about non-existent or unreadable files. Such files are
|
||||
quietly skipped. However, the return code is still 2, even if matches were
|
||||
found in other files.
|
||||
.TP
|
||||
\fB-u\fP, \fB--utf-8\fP
|
||||
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
|
||||
with UTF-8 support. Both patterns and subject lines must be valid strings of
|
||||
UTF-8 characters.
|
||||
.TP
|
||||
\fB-V\fP, \fB--version\fP
|
||||
Write the version numbers of \fBpcregrep\fP and the PCRE library that is being
|
||||
used to the standard error stream.
|
||||
.TP
|
||||
\fB-v\fP, \fB--invert-match\fP
|
||||
Invert the sense of the match, so that lines which do \fInot\fP match any of
|
||||
the patterns are the ones that are found.
|
||||
.TP
|
||||
\fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP
|
||||
Force the patterns to match only whole words. This is equivalent to having \eb
|
||||
at the start and end of the pattern.
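In std::regex terms, -w can be sketched as wrapping the pattern in \b assertions. The helper word_regex is hypothetical. Putting the pattern in a non-capturing group is this sketch's own choice and is not stated above. It makes "cat|dog" become \b(?:cat|dog)\b, so both alternatives are bounded, instead of \bcat|dog\b.

```cpp
#include <regex>
#include <string>

// Hypothetical -w helper: require a word boundary on both sides of
// whatever the (grouped) pattern matches.
std::regex word_regex(const std::string& pattern) {
    return std::regex("\\b(?:" + pattern + ")\\b");
}
```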
|
||||
.TP
|
||||
\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
|
||||
Force the patterns to be anchored (each must start matching at the beginning of
|
||||
a line) and in addition, require them to match entire lines. This is
|
||||
equivalent to having ^ and $ characters at the start and end of each
|
||||
alternative branch in every pattern.
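A std::regex sketch of -x follows, with the hypothetical helper line_regex. Wrapping the whole pattern as ^(?:...)$ anchors every alternative at both ends in one step. For a single line this has the same effect as adding ^ and $ to each branch.

```cpp
#include <regex>
#include <string>

// Hypothetical -x helper: the pattern must match the entire line.
std::regex line_regex(const std::string& pattern) {
    return std::regex("^(?:" + pattern + ")$");
}
```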
|
||||
.
|
||||
.
|
||||
.SH "ENVIRONMENT VARIABLES"
|
||||
.rs
|
||||
.sp
|
||||
The environment variables \fBLC_ALL\fP and \fBLC_CTYPE\fP are examined, in that
|
||||
order, for a locale. The first one that is set is used. This can be overridden
|
||||
by the \fB--locale\fP option. If no locale is set, the PCRE library's default
|
||||
(usually the "C" locale) is used.
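The lookup order can be written as a short C++ function. The helper choose_locale is hypothetical. It treats a variable as "set" whenever getenv returns non-null, which is an assumption. An empty result stands for "use the PCRE library's default".

```cpp
#include <cstdlib>
#include <initializer_list>
#include <optional>
#include <string>

// Hypothetical locale choice: --locale, then LC_ALL, then LC_CTYPE.
std::optional<std::string> choose_locale(const char* locale_option) {
    if (locale_option) return std::string(locale_option);
    for (const char* var : {"LC_ALL", "LC_CTYPE"}) {
        if (const char* v = std::getenv(var)) return std::string(v);
    }
    return std::nullopt;  // fall back to the library default
}
```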
|
||||
.
|
||||
.
|
||||
.SH "NEWLINES"
|
||||
.rs
|
||||
.sp
|
||||
The \fB-N\fP (\fB--newline\fP) option allows \fBpcregrep\fP to scan files with
|
||||
different newline conventions from the default. However, the setting of this
|
||||
option does not affect the way in which \fBpcregrep\fP writes information to
|
||||
the standard error and output streams. It uses the string "\en" in C
|
||||
\fBprintf()\fP calls to indicate newlines, relying on the C I/O library to
|
||||
convert this to an appropriate sequence if the output is sent to a file.
|
||||
.
|
||||
.
|
||||
.SH "OPTIONS COMPATIBILITY"
|
||||
.rs
|
||||
.sp
|
||||
The majority of short and long forms of \fBpcregrep\fP's options are the same
|
||||
as in the GNU \fBgrep\fP program. Any long option of the form
|
||||
\fB--xxx-regexp\fP (GNU terminology) is also available as \fB--xxx-regex\fP
|
||||
(PCRE terminology). However, the \fB--locale\fP, \fB-M\fP, \fB--multiline\fP,
|
||||
\fB-u\fP, and \fB--utf-8\fP options are specific to \fBpcregrep\fP.
|
||||
.
|
||||
.
|
||||
.SH "OPTIONS WITH DATA"
|
||||
.rs
|
||||
.sp
|
||||
There are four different ways in which an option with data can be specified.
|
||||
If a short form option is used, the data may follow immediately, or in the next
|
||||
command line item. For example:
|
||||
.sp
|
||||
-f/some/file
|
||||
-f /some/file
|
||||
.sp
|
||||
If a long form option is used, the data may appear in the same command line
|
||||
item, separated by an equals character, or (with one exception) it may appear
|
||||
in the next command line item. For example:
|
||||
.sp
|
||||
--file=/some/file
|
||||
--file /some/file
|
||||
.sp
|
||||
Note, however, that if you want to supply a file name beginning with ~ as data
|
||||
in a shell command, and have the shell expand ~ to a home directory, you must
|
||||
separate the file name from the option, because the shell does not treat ~
|
||||
specially unless it is at the start of an item.
|
||||
.P
|
||||
The exception to the above is the \fB--colour\fP (or \fB--color\fP) option,
|
||||
for which the data is optional. If this option does have data, it must be given
|
||||
in the first form, using an equals character. Otherwise it will be assumed that
|
||||
it has no data.
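The data forms can be sketched for a single option pair, -f and --file, plus the special case of --colour. This is not pcregrep's argument parser. Both helpers, option_data and colour_setting, are hypothetical and handle only the spellings shown on this page.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: return the data for -f/--file at args[i], advancing i
// past the next item when the data is given separately.
std::optional<std::string> option_data(const std::vector<std::string>& args,
                                       std::size_t& i) {
    const std::string& a = args[i];
    if (a.rfind("--file=", 0) == 0) return a.substr(7);  // --file=/some/file
    if (a == "--file" || a == "-f") {                     // data in next item
        if (i + 1 < args.size()) return args[++i];
        return std::nullopt;                              // data missing
    }
    if (a.rfind("-f", 0) == 0) return a.substr(2);        // -f/some/file
    return std::nullopt;
}

// --colour/--color accept data only in the "=" form; bare means "auto".
std::string colour_setting(const std::string& a) {
    if (a.rfind("--colour=", 0) == 0) return a.substr(9);
    if (a.rfind("--color=", 0) == 0) return a.substr(8);
    return "auto";
}
```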
|
||||
.
|
||||
.
|
||||
.SH MATCHING ERRORS
|
||||
.rs
|
||||
.sp
|
||||
It is possible to supply a regular expression that takes a very long time to
|
||||
fail to match certain lines. Such patterns normally involve nested indefinite
|
||||
repeats, for example: (a+)*\ed when matched against a line of a's with no final
|
||||
digit. The PCRE matching function has a resource limit that causes it to abort
|
||||
in these circumstances. If this happens, \fBpcregrep\fP outputs an error
|
||||
message and the line that caused the problem to the standard error stream. If
|
||||
there are more than 20 such errors, \fBpcregrep\fP gives up.
|
||||
.
|
||||
.
|
||||
.SH DIAGNOSTICS
|
||||
.rs
|
||||
.sp
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
||||
for syntax errors and non-existent or inaccessible files (even if matches were
|
||||
found in other files) or too many matching errors. Using the \fB-s\fP option to
|
||||
suppress error messages about inaccessible files does not affect the return
|
||||
code.
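The exit-status rule reduces to a tiny C++ function. The helper exit_status is hypothetical. Any error forces status 2, whether or not matches were found elsewhere and whether or not -s was given. Otherwise the status is 0 when something matched and 1 when nothing did.

```cpp
// Hypothetical summary of the DIAGNOSTICS rule.
// had_error covers syntax errors, missing or unreadable files, and too
// many matching errors; -s does not change the result.
int exit_status(bool any_match, bool had_error) {
    if (had_error) return 2;
    return any_match ? 0 : 1;
}
```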
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel
|
||||
.br
|
||||
University Computing Service
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 06 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,399 @@
|
|||
PCREGREP(1) PCREGREP(1)
|
||||
|
||||
|
||||
NAME
|
||||
pcregrep - a grep with Perl-compatible regular expressions.
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
pcregrep [options] [long options] [pattern] [path1 path2 ...]
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
|
||||
pcregrep searches files for character patterns, in the same way as
|
||||
other grep commands do, but it uses the PCRE regular expression library
|
||||
to support patterns that are compatible with the regular expressions of
|
||||
Perl 5. See pcrepattern for a full description of syntax and semantics
|
||||
of the regular expressions that PCRE supports.
|
||||
|
||||
Patterns, whether supplied on the command line or in a separate file,
|
||||
are given without delimiters. For example:
|
||||
|
||||
pcregrep Thursday /etc/motd
|
||||
|
||||
If you attempt to use delimiters (for example, by surrounding a pattern
|
||||
with slashes, as is common in Perl scripts), they are interpreted as
|
||||
part of the pattern. Quotes can of course be used on the command line
|
||||
because they are interpreted by the shell, and indeed they are required
|
||||
if a pattern contains white space or shell metacharacters.
|
||||
|
||||
The first argument that follows any option settings is treated as the
|
||||
single pattern to be matched when neither -e nor -f is present. Con-
|
||||
versely, when one or both of these options are used to specify pat-
|
||||
terns, all arguments are treated as path names. At least one of -e, -f,
|
||||
or an argument pattern must be provided.
|
||||
|
||||
If no files are specified, pcregrep reads the standard input. The stan-
|
||||
dard input can also be referenced by a name consisting of a single
|
||||
hyphen. For example:
|
||||
|
||||
pcregrep some-pattern /file1 - /file3
|
||||
|
||||
By default, each line that matches the pattern is copied to the stan-
|
||||
dard output, and if there is more than one file, the file name is out-
|
||||
put at the start of each line. However, there are options that can
|
||||
change how pcregrep behaves. In particular, the -M option makes it pos-
|
||||
sible to search for patterns that span line boundaries. What defines a
|
||||
line boundary is controlled by the -N (--newline) option.
|
||||
|
||||
Patterns are limited to 8K or BUFSIZ characters, whichever is the
|
||||
greater. BUFSIZ is defined in <stdio.h>.
|
||||
|
||||
If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
|
||||
the value to set a locale when calling the PCRE library. The --locale
|
||||
option can be used to override this.
|
||||
|
||||
|
||||
OPTIONS
|
||||
|
||||
-- This terminates the list of options. It is useful if the next
|
||||
item on the command line starts with a hyphen but is not an
|
||||
option. This allows for the processing of patterns and file-
|
||||
names that start with hyphens.
|
||||
|
||||
-A number, --after-context=number
|
||||
Output number lines of context after each matching line. If
|
||||
filenames and/or line numbers are being output, a hyphen sep-
|
||||
arator is used instead of a colon for the context lines. A
|
||||
line containing "--" is output between each group of lines,
|
||||
unless they are in fact contiguous in the input file. The
|
||||
value of number is expected to be relatively small. However,
|
||||
pcregrep guarantees to have up to 8K of following text avail-
|
||||
able for context output.
|
||||
|
||||
-B number, --before-context=number
|
||||
Output number lines of context before each matching line. If
|
||||
filenames and/or line numbers are being output, a hyphen sep-
|
||||
arator is used instead of a colon for the context lines. A
|
||||
line containing "--" is output between each group of lines,
|
||||
unless they are in fact contiguous in the input file. The
|
||||
value of number is expected to be relatively small. However,
|
||||
pcregrep guarantees to have up to 8K of preceding text avail-
|
||||
able for context output.
|
||||
|
||||
-C number, --context=number
|
||||
Output number lines of context both before and after each
|
||||
matching line. This is equivalent to setting both -A and -B
|
||||
to the same value.
|
||||
|
||||
-c, --count
|
||||
Do not output individual lines; instead just output a count
|
||||
of the number of lines that would otherwise have been output.
|
||||
If several files are given, a count is output for each of
|
||||
them. In this mode, the -A, -B, and -C options are ignored.
|
||||
|
||||
--colour, --color
|
||||
If this option is given without any data, it is equivalent to
|
||||
"--colour=auto". If data is required, it must be given in
|
||||
the same shell item, separated by an equals sign.
|
||||
|
||||
--colour=value, --color=value
|
||||
This option specifies under what circumstances the part of a
|
||||
line that matched a pattern should be coloured in the output.
|
||||
The value may be "never" (the default), "always", or "auto".
|
||||
In the latter case, colouring happens only if the standard
|
||||
output is connected to a terminal. The colour can be speci-
|
||||
fied by setting the environment variable PCREGREP_COLOUR or
|
||||
PCREGREP_COLOR. The value of this variable should be a string
|
||||
of two numbers, separated by a semicolon. They are copied
|
||||
directly into the control string for setting colour on a ter-
|
||||
minal, so it is your responsibility to ensure that they make
|
||||
sense. If neither of the environment variables is set, the
|
||||
default is "1;31", which gives red.
|
||||
|
||||
-D action, --devices=action
|
||||
If an input path is not a regular file or a directory,
|
||||
"action" specifies how it is to be processed. Valid values
|
||||
are "read" (the default) or "skip" (silently skip the path).
|
||||
|
||||
-d action, --directories=action
|
||||
If an input path is a directory, "action" specifies how it is
|
||||
to be processed. Valid values are "read" (the default),
|
||||
"recurse" (equivalent to the -r option), or "skip" (silently
|
||||
skip the path). In the default case, directories are read as
|
||||
if they were ordinary files. In some operating systems the
|
||||
effect of reading a directory like this is an immediate end-
|
||||
of-file.
|
||||
|
||||
-e pattern, --regex=pattern, --regexp=pattern
|
||||
Specify a pattern to be matched. This option
|
||||
can be used multiple times in order to specify several pat-
|
||||
terns. It can also be used as a way of specifying a single
|
||||
pattern that starts with a hyphen. When -e is used, no argu-
|
||||
ment pattern is taken from the command line; all arguments
|
||||
are treated as file names. There is an overall maximum of 100
|
||||
patterns. They are applied to each line in the order in which
|
||||
they are defined until one matches (or fails to match if -v
|
||||
is used). If -f is used with -e, the command line patterns
|
||||
are matched first, followed by the patterns from the file,
|
||||
independent of the order in which these options are speci-
|
||||
fied. Note that multiple use of -e is not the same as a sin-
|
||||
gle pattern with alternatives. For example, X|Y finds the
|
||||
first character in a line that is X or Y, whereas if the two
|
||||
patterns are given separately, pcregrep finds X if it is
|
||||
present, even if it follows Y in the line. It finds Y only if
|
||||
there is no X in the line. This really matters only if you
|
||||
are using -o to show the portion of the line that matched.
|
||||
|
||||
--exclude=pattern
|
||||
When pcregrep is searching the files in a directory as a con-
|
||||
sequence of the -r (recursive search) option, any files whose
|
||||
names match the pattern are excluded. The pattern is a PCRE
|
||||
regular expression. If a file name matches both --include and
|
||||
--exclude, it is excluded. There is no short form for this
|
||||
option.
|
||||
|
||||
-F, --fixed-strings
|
||||
Interpret each pattern as a list of fixed strings, separated
|
||||
by newlines, instead of as a regular expression. The -w
|
||||
(match as a word) and -x (match whole line) options can be
|
||||
used with -F. They apply to each of the fixed strings. A line
|
||||
is selected if any of the fixed strings are found in it (sub-
|
||||
ject to -w or -x, if present).
|
||||
|
||||
-f filename, --file=filename
|
||||
Read a number of patterns from the file, one per line, and
|
||||
match them against each line of input. A data line is output
|
||||
if any of the patterns match it. The filename can be given as
|
||||
"-" to refer to the standard input. When -f is used, patterns
|
||||
specified on the command line using -e may also be present;
|
||||
they are tested before the file's patterns. However, no other
|
||||
pattern is taken from the command line; all arguments are
|
||||
treated as file names. There is an overall maximum of 100
|
||||
patterns. Trailing white space is removed from each line, and
|
||||
blank lines are ignored. An empty file contains no patterns
|
||||
and therefore matches nothing.
|
||||
|
||||
-H, --with-filename
|
||||
Force the inclusion of the filename at the start of output
|
||||
lines when searching a single file. By default, the filename
|
||||
is not shown in this case. For matching lines, the filename
|
||||
is followed by a colon and a space; for context lines, a
|
||||
hyphen separator is used. If a line number is also being out-
|
||||
put, it follows the file name without a space.
|
||||
|
||||
-h, --no-filename
|
||||
Suppress the output of filenames when searching multiple files.
|
||||
By default, filenames are shown when multiple files are
|
||||
searched. For matching lines, the filename is followed by a
|
||||
colon and a space; for context lines, a hyphen separator is
|
||||
used. If a line number is also being output, it follows the
|
||||
file name without a space.
|
||||
|
||||
--help Output a brief help message and exit.
|
||||
|
||||
-i, --ignore-case
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
|
||||
--include=pattern
|
||||
When pcregrep is searching the files in a directory as a con-
|
||||
sequence of the -r (recursive search) option, only those
|
||||
files whose names match the pattern are included. The pattern
|
||||
is a PCRE regular expression. If a file name matches both
|
||||
--include and --exclude, it is excluded. There is no short
|
||||
form for this option.
|
||||
|
||||
-L, --files-without-match
|
||||
Instead of outputting lines from the files, just output the
|
||||
names of the files that do not contain any lines that would
|
||||
have been output. Each file name is output once, on a sepa-
|
||||
rate line.
|
||||
|
||||
-l, --files-with-matches
|
||||
Instead of outputting lines from the files, just output the
|
||||
names of the files containing lines that would have been out-
|
||||
put. Each file name is output once, on a separate line.
|
||||
Searching stops as soon as a matching line is found in a
|
||||
file.
|
||||
|
||||
--label=name
|
||||
This option supplies a name to be used for the standard input
|
||||
when file names are being output. If not supplied, "(standard
|
||||
input)" is used. There is no short form for this option.
|
||||
|
||||
--locale=locale-name
|
||||
This option specifies a locale to be used for pattern match-
|
||||
ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
|
||||
ronment variables. If no locale is specified, the PCRE
|
||||
library's default (usually the "C" locale) is used. There is
|
||||
no short form for this option.
|
||||
|
||||
-M, --multiline
|
||||
Allow patterns to match more than one line. When this option
|
||||
is given, patterns may usefully contain literal newline char-
|
||||
acters and internal occurrences of ^ and $ characters. The
|
||||
output for any one match may consist of more than one line.
|
||||
When this option is set, the PCRE library is called in "mul-
|
||||
tiline" mode. There is a limit to the number of lines that
|
||||
can be matched, imposed by the way that pcregrep buffers the
|
||||
input file as it scans it. However, pcregrep ensures that at
|
||||
least 8K characters or the rest of the document (whichever is
|
||||
the shorter) are available for forward matching, and simi-
|
||||
larly the previous 8K characters (or all the previous charac-
|
||||
ters, if fewer than 8K) are guaranteed to be available for
|
||||
lookbehind assertions.
|
||||
|
||||
-N newline-type, --newline=newline-type
|
||||
The PCRE library supports three different character sequences
|
||||
for indicating the ends of lines. They are the single-charac-
|
||||
ter sequences CR (carriage return) and LF (linefeed), and the
|
||||
two-character sequence CR, LF. When the library is built, a
|
||||
default line-ending sequence is specified. This is normally
|
||||
the standard sequence for the operating system. Unless other-
|
||||
wise specified by this option, pcregrep uses the default. The
|
||||
possible values for this option are CR, LF, or CRLF. This
|
||||
makes it possible to use pcregrep on files that have come
|
||||
from other environments without having to modify their line
|
||||
endings. If the data that is being scanned does not agree
|
||||
with the convention set by this option, pcregrep may behave
|
||||
in strange ways.
|
||||
|
||||
-n, --line-number
|
||||
Precede each output line by its line number in the file, fol-
|
||||
lowed by a colon and a space for matching lines or a hyphen
|
||||
and a space for context lines. If the filename is also being
|
||||
output, it precedes the line number.
|
||||
|
||||
-o, --only-matching
|
||||
Show only the part of the line that matched a pattern. In
|
||||
this mode, no context is shown. That is, the -A, -B, and -C
|
||||
options are ignored.
|
||||
|
||||
-q, --quiet
|
||||
Work quietly, that is, display nothing except error messages.
|
||||
The exit status indicates whether or not any matches were
|
||||
found.
|
||||
|
||||
-r, --recursive
|
||||
If any given path is a directory, recursively scan the files
|
||||
it contains, taking note of any --include and --exclude set-
|
||||
tings. By default, a directory is read as a normal file; in
|
||||
some operating systems this gives an immediate end-of-file.
|
||||
This option is a shorthand for setting the -d option to
|
||||
"recurse".
|
||||
|
||||
-s, --no-messages
|
||||
Suppress error messages about non-existent or unreadable
|
||||
files. Such files are quietly skipped. However, the return
|
||||
code is still 2, even if matches were found in other files.
|
||||
|
||||
-u, --utf-8
|
||||
Operate in UTF-8 mode. This option is available only if PCRE
|
||||
has been compiled with UTF-8 support. Both patterns and sub-
|
||||
ject lines must be valid strings of UTF-8 characters.
|
||||
|
||||
-V, --version
|
||||
Write the version numbers of pcregrep and the PCRE library
|
||||
that is being used to the standard error stream.
|
||||
|
||||
-v, --invert-match
|
||||
Invert the sense of the match, so that lines which do not
|
||||
match any of the patterns are the ones that are found.
|
||||
|
||||
-w, --word-regex, --word-regexp
|
||||
Force the patterns to match only whole words. This is equiva-
|
||||
lent to having \b at the start and end of the pattern.
|
||||
|
||||
-x, --line-regex, --line-regexp
|
||||
Force the patterns to be anchored (each must start matching
|
||||
at the beginning of a line) and in addition, require them to
|
||||
match entire lines. This is equivalent to having ^ and $
|
||||
characters at the start and end of each alternative branch in
|
||||
every pattern.
|
||||
|
||||
|
||||
ENVIRONMENT VARIABLES
|
||||
|
||||
The environment variables LC_ALL and LC_CTYPE are examined, in that
|
||||
order, for a locale. The first one that is set is used. This can be
|
||||
overridden by the --locale option. If no locale is set, the PCRE
|
||||
library's default (usually the "C" locale) is used.
|
||||
|
||||
|
||||
NEWLINES
|
||||
|
||||
The -N (--newline) option allows pcregrep to scan files with different
|
||||
newline conventions from the default. However, the setting of this
|
||||
option does not affect the way in which pcregrep writes information to
|
||||
the standard error and output streams. It uses the string "\n" in C
|
||||
printf() calls to indicate newlines, relying on the C I/O library to
|
||||
convert this to an appropriate sequence if the output is sent to a
|
||||
file.
|
||||
|
||||
|
||||
OPTIONS COMPATIBILITY
|
||||
|
||||
The majority of short and long forms of pcregrep's options are the same
|
||||
as in the GNU grep program. Any long option of the form --xxx-regexp
|
||||
(GNU terminology) is also available as --xxx-regex (PCRE terminology).
|
||||
However, the --locale, -M, --multiline, -u, and --utf-8 options are
|
||||
specific to pcregrep.
|
||||
|
||||
|
||||
OPTIONS WITH DATA
|
||||
|
||||
There are four different ways in which an option with data can be spec-
|
||||
ified. If a short form option is used, the data may follow immedi-
|
||||
ately, or in the next command line item. For example:
|
||||
|
||||
-f/some/file
|
||||
-f /some/file
|
||||
|
||||
If a long form option is used, the data may appear in the same command
|
||||
line item, separated by an equals character, or (with one exception) it
|
||||
may appear in the next command line item. For example:
|
||||
|
||||
--file=/some/file
|
||||
--file /some/file
|
||||
|
||||
Note, however, that if you want to supply a file name beginning with ~
|
||||
as data in a shell command, and have the shell expand ~ to a home
|
||||
directory, you must separate the file name from the option, because the
|
||||
shell does not treat ~ specially unless it is at the start of an item.
|
||||
|
||||
The exception to the above is the --colour (or --color) option, for
|
||||
which the data is optional. If this option does have data, it must be
|
||||
given in the first form, using an equals character. Otherwise it will
|
||||
be assumed that it has no data.
|
||||
|
||||
|
||||
MATCHING ERRORS
|
||||
|
||||
It is possible to supply a regular expression that takes a very long
|
||||
time to fail to match certain lines. Such patterns normally involve
|
||||
nested indefinite repeats, for example: (a+)*\d when matched against a
|
||||
line of a's with no final digit. The PCRE matching function has a
|
||||
resource limit that causes it to abort in these circumstances. If this
|
||||
happens, pcregrep outputs an error message and the line that caused the
|
||||
problem to the standard error stream. If there are more than 20 such
|
||||
errors, pcregrep gives up.
|
||||
|
||||
|
||||
DIAGNOSTICS
|
||||
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found,
|
||||
and 2 for syntax errors and non-existent or inaccessible files (even if
|
||||
matches were found in other files) or too many matching errors. Using
|
||||
the -s option to suppress error messages about inaccessible files does
|
||||
not affect the return code.
|
||||
|
||||
|
||||
AUTHOR
|
||||
|
||||
Philip Hazel
|
||||
University Computing Service
|
||||
Cambridge CB2 3QG, England.
|
||||
|
||||
Last updated: 06 June 2006
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,157 @@
|
|||
.TH PCREMATCHING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE MATCHING ALGORITHMS"
|
||||
.rs
|
||||
.sp
|
||||
This document describes the two different algorithms that are available in PCRE
|
||||
for matching a compiled regular expression against a given subject string. The
|
||||
"standard" algorithm is the one provided by the \fBpcre_exec()\fP function.
|
||||
This works in the same way as Perl's matching function, and provides a
|
||||
Perl-compatible matching operation.
|
||||
.P
|
||||
An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP function;
|
||||
this operates in a different way, and is not Perl-compatible. It has advantages
|
||||
and disadvantages compared with the standard algorithm, and these are described
|
||||
below.
|
||||
.P
|
||||
When there is only one possible way in which a given subject string can match a
|
||||
pattern, the two algorithms give the same answer. A difference arises, however,
|
||||
when there are multiple possibilities. For example, if the pattern
|
||||
.sp
|
||||
^<.*>
|
||||
.sp
|
||||
is matched against the string
|
||||
.sp
|
||||
<something> <something else> <something further>
|
||||
.sp
|
||||
there are three possible answers. The standard algorithm finds only one of
|
||||
them, whereas the DFA algorithm finds all three.
|
||||
.
|
||||
.SH "REGULAR EXPRESSIONS AS TREES"
|
||||
.rs
|
||||
.sp
|
||||
The set of strings that are matched by a regular expression can be represented
|
||||
as a tree structure. An unlimited repetition in the pattern makes the tree of
|
||||
infinite size, but it is still a tree. Matching the pattern to a given subject
|
||||
string (from a given starting point) can be thought of as a search of the tree.
|
||||
There are two ways to search a tree: depth-first and breadth-first, and these
|
||||
correspond to the two matching algorithms provided by PCRE.
|
||||
.
|
||||
.SH "THE STANDARD MATCHING ALGORITHM"
|
||||
.rs
|
||||
.sp
|
||||
In the terminology of Jeffrey Friedl's book \fIMastering Regular
|
||||
Expressions\fP, the standard algorithm is an "NFA algorithm". It conducts a
|
||||
depth-first search of the pattern tree. That is, it proceeds along a single
|
||||
path through the tree, checking that the subject matches what is required. When
|
||||
there is a mismatch, the algorithm tries any alternatives at the current point,
|
||||
and if they all fail, it backs up to the previous branch point in the tree, and
|
||||
tries the next alternative branch at that level. This often involves backing up
|
||||
(moving to the left) in the subject string as well. The order in which
|
||||
repetition branches are tried is controlled by the greedy or ungreedy nature of
|
||||
the quantifier.
|
||||
.P
|
||||
If a leaf node is reached, a matching string has been found, and at that point
|
||||
the algorithm stops. Thus, if there is more than one possible match, this
|
||||
algorithm returns the first one that it finds. Whether this is the shortest,
|
||||
the longest, or some intermediate length depends on the way the greedy and
|
||||
ungreedy repetition quantifiers are specified in the pattern.
|
||||
.P
|
||||
Because it ends up with a single path through the tree, it is relatively
|
||||
straightforward for this algorithm to keep track of the substrings that are
|
||||
matched by portions of the pattern in parentheses. This provides support for
|
||||
capturing parentheses and back references.
|
||||
.
|
||||
.SH "THE DFA MATCHING ALGORITHM"
|
||||
.rs
|
||||
.sp
|
||||
DFA stands for "deterministic finite automaton", but you do not need to
|
||||
understand the origins of that name. This algorithm conducts a breadth-first
|
||||
search of the tree. Starting from the first matching point in the subject, it
|
||||
scans the subject string from left to right, once, character by character, and
|
||||
as it does this, it remembers all the paths through the tree that represent
|
||||
valid matches.
|
||||
.P
|
||||
The scan continues until either the end of the subject is reached, or there are
|
||||
no more unterminated paths. At this point, terminated paths represent the
|
||||
different matching possibilities (if there are none, the match has failed).
|
||||
Thus, if there is more than one possible match, this algorithm finds all of
|
||||
them, and in particular, it finds the longest. In PCRE, there is an option to
|
||||
stop the algorithm after the first match (which is necessarily the shortest)
|
||||
has been found.
|
||||
.P
|
||||
Note that all the matches that are found start at the same point in the
|
||||
subject. If the pattern
|
||||
.sp
|
||||
cat(er(pillar)?)
|
||||
.sp
|
||||
is matched against the string "the caterpillar catchment", the result will be
|
||||
the three strings "cat", "cater", and "caterpillar" that start at the fifth
|
||||
character of the subject. The algorithm does not automatically move on to find
|
||||
matches that start at later positions.
|
||||
.P
|
||||
There are a number of features of PCRE regular expressions that are not
|
||||
supported by the DFA matching algorithm. They are as follows:
|
||||
.P
|
||||
1. Because the algorithm finds all possible matches, the greedy or ungreedy
|
||||
nature of repetition quantifiers is not relevant. Greedy and ungreedy
|
||||
quantifiers are treated in exactly the same way.
|
||||
.P
|
||||
2. When dealing with multiple paths through the tree simultaneously, it is not
|
||||
straightforward to keep track of captured substrings for the different matching
|
||||
possibilities, and PCRE's implementation of this algorithm does not attempt to
|
||||
do this. This means that no captured substrings are available.
|
||||
.P
|
||||
3. Because no substrings are captured, back references within the pattern are
|
||||
not supported, and cause errors if encountered.
|
||||
.P
|
||||
4. For the same reason, conditional expressions that use a backreference as the
|
||||
condition are not supported.
|
||||
.P
|
||||
5. Callouts are supported, but the value of the \fIcapture_top\fP field is
|
||||
always 1, and the value of the \fIcapture_last\fP field is always -1.
|
||||
.P
|
||||
6.
|
||||
The \eC escape sequence, which (in the standard algorithm) matches a single
|
||||
byte, even in UTF-8 mode, is not supported because the DFA algorithm moves
|
||||
through the subject string one character at a time, for all active paths
|
||||
through the tree.
|
||||
.
|
||||
.SH "ADVANTAGES OF THE DFA ALGORITHM"
|
||||
.rs
|
||||
.sp
|
||||
Using the DFA matching algorithm provides the following advantages:
|
||||
.P
|
||||
1. All possible matches (at a single point in the subject) are automatically
|
||||
found, and in particular, the longest match is found. To find more than one
|
||||
match using the standard algorithm, you have to do kludgy things with
|
||||
callouts.
|
||||
.P
|
||||
2. There is much better support for partial matching. The restrictions on the
|
||||
content of the pattern that apply when using the standard algorithm for partial
|
||||
matching do not apply to the DFA algorithm. For non-anchored patterns, the
|
||||
starting position of a partial match is available.
|
||||
.P
|
||||
3. Because the DFA algorithm scans the subject string just once, and never
|
||||
needs to backtrack, it is possible to pass very long subject strings to the
|
||||
matching function in several pieces, checking for partial matching each time.
|
||||
.
|
||||
.SH "DISADVANTAGES OF THE DFA ALGORITHM"
|
||||
.rs
|
||||
.sp
|
||||
The DFA algorithm suffers from a number of disadvantages:
|
||||
.P
|
||||
1. It is substantially slower than the standard algorithm. This is partly
|
||||
because it has to search for all possible matches, but also because it is
|
||||
less susceptible to optimization.
|
||||
.P
|
||||
2. Capturing parentheses and back references are not supported.
|
||||
.P
|
||||
3. The "atomic group" feature of PCRE regular expressions is supported, but
|
||||
does not provide the advantage that it does for the standard algorithm.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 06 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,203 @@
|
|||
.TH PCREPARTIAL 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PARTIAL MATCHING IN PCRE"
|
||||
.rs
|
||||
.sp
|
||||
In normal use of PCRE, if the subject string that is passed to
|
||||
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP matches as far as it goes, but is
|
||||
too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
|
||||
are circumstances where it might be helpful to distinguish this case from other
|
||||
cases in which there is no match.
|
||||
.P
|
||||
Consider, for example, an application where a human is required to type in data
|
||||
for a field with specific formatting requirements. An example might be a date
|
||||
in the form \fIddmmmyy\fP, defined by this pattern:
|
||||
.sp
|
||||
^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$
|
||||
.sp
|
||||
If the application sees the user's keystrokes one by one, and can check that
|
||||
what has been typed so far is potentially valid, it is able to raise an error
|
||||
as soon as a mistake is made, possibly beeping and not reflecting the
|
||||
character that has been typed. This immediate feedback is likely to be a better
|
||||
user interface than a check that is delayed until the entire string has been
|
||||
entered.
|
||||
.P
|
||||
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
|
||||
option, which can be set when calling \fBpcre_exec()\fP or
|
||||
\fBpcre_dfa_exec()\fP. When this flag is set for \fBpcre_exec()\fP, the return
|
||||
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
|
||||
during the matching process the last part of the subject string matched part of
|
||||
the pattern. Unfortunately, for non-anchored matching, it is not possible to
|
||||
obtain the position of the start of the partial match. No captured data is set
|
||||
when PCRE_ERROR_PARTIAL is returned.
|
||||
.P
|
||||
When PCRE_PARTIAL is set for \fBpcre_dfa_exec()\fP, the return code
|
||||
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
|
||||
subject is reached, there have been no complete matches, but there is still at
|
||||
least one matching possibility. The portion of the string that provided the
|
||||
partial match is set as the first matching string.
|
||||
.P
|
||||
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
|
||||
last literal byte in a pattern, and abandons matching immediately if such a
|
||||
byte is not present in the subject string. This optimization cannot be used
|
||||
for a subject string that might match only partially.
|
||||
.
|
||||
.
|
||||
.SH "RESTRICTED PATTERNS FOR PCRE_PARTIAL"
|
||||
.rs
|
||||
.sp
|
||||
Because of the way certain internal optimizations are implemented in the
|
||||
\fBpcre_exec()\fP function, the PCRE_PARTIAL option cannot be used with all
|
||||
patterns. These restrictions do not apply when \fBpcre_dfa_exec()\fP is used.
|
||||
For \fBpcre_exec()\fP, repeated single characters such as
|
||||
.sp
|
||||
a{2,4}
|
||||
.sp
|
||||
and repeated single metasequences such as
|
||||
.sp
|
||||
\ed+
|
||||
.sp
|
||||
are not permitted if the maximum number of occurrences is greater than one.
|
||||
Optional items such as \ed? (where the maximum is one) are permitted.
|
||||
Quantifiers with any values are permitted after parentheses, so the invalid
|
||||
examples above can be coded thus:
|
||||
.sp
|
||||
(a){2,4}
|
||||
(\ed)+
|
||||
.sp
|
||||
These constructions run more slowly, but for the kinds of application that are
|
||||
envisaged for this facility, this is not felt to be a major restriction.
|
||||
.P
|
||||
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
|
||||
\fBpcre_exec()\fP returns the error code PCRE_ERROR_BADPARTIAL (-13).
|
||||
.
|
||||
.
|
||||
.SH "EXAMPLE OF PARTIAL MATCHING USING PCRETEST"
|
||||
.rs
|
||||
.sp
|
||||
If the escape sequence \eP is present in a \fBpcretest\fP data line, the
|
||||
PCRE_PARTIAL flag is used for the match. Here is a run of \fBpcretest\fP that
|
||||
uses the date example quoted above:
|
||||
.sp
|
||||
re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
|
||||
data> 25jun04\eP
|
||||
0: 25jun04
|
||||
1: jun
|
||||
data> 25dec3\eP
|
||||
Partial match
|
||||
data> 3ju\eP
|
||||
Partial match
|
||||
data> 3juj\eP
|
||||
No match
|
||||
data> j\eP
|
||||
No match
|
||||
.sp
|
||||
The first data string is matched completely, so \fBpcretest\fP shows the
|
||||
matched substrings. The remaining four strings do not match the complete
|
||||
pattern, but the first two are partial matches. The same test, using DFA
|
||||
matching (by means of the \eD escape sequence), produces the following output:
|
||||
.sp
|
||||
  re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
|
||||
data> 25jun04\eP\eD
|
||||
0: 25jun04
|
||||
data> 23dec3\eP\eD
|
||||
Partial match: 23dec3
|
||||
data> 3ju\eP\eD
|
||||
Partial match: 3ju
|
||||
data> 3juj\eP\eD
|
||||
No match
|
||||
data> j\eP\eD
|
||||
No match
|
||||
.sp
|
||||
Notice that in this case the portion of the string that was matched is made
|
||||
available.
|
||||
.
|
||||
.
|
||||
.SH "MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()"
|
||||
.rs
|
||||
.sp
|
||||
When a partial match has been found using \fBpcre_dfa_exec()\fP, it is possible
|
||||
to continue the match by providing additional subject data and calling
|
||||
\fBpcre_dfa_exec()\fP again with the PCRE_DFA_RESTART option and the same
|
||||
working space (where details of the previous partial match are stored). Here is
|
||||
an example using \fBpcretest\fP, where the \eR escape sequence sets the
|
||||
PCRE_DFA_RESTART option and the \eD escape sequence requests the use of
|
||||
\fBpcre_dfa_exec()\fP:
|
||||
.sp
|
||||
  re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
|
||||
data> 23ja\eP\eD
|
||||
Partial match: 23ja
|
||||
data> n05\eR\eD
|
||||
0: n05
|
||||
.sp
|
||||
The first call has "23ja" as the subject, and requests partial matching; the
|
||||
second call has "n05" as the subject for the continued (restarted) match.
|
||||
Notice that when the match is complete, only the last part is shown; PCRE does
|
||||
not retain the previously partially-matched string. It is up to the calling
|
||||
program to do that if it needs to.
|
||||
.P
|
||||
This facility can be used to pass very long subject strings to
|
||||
\fBpcre_dfa_exec()\fP. However, some care is needed for certain types of
|
||||
pattern.
|
||||
.P
|
||||
1. If the pattern contains tests for the beginning or end of a line, you need
|
||||
to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
|
||||
subject string for any call does not contain the beginning or end of a line.
|
||||
.P
|
||||
2. If the pattern contains backward assertions (including \eb or \eB), you need
|
||||
to arrange for some overlap in the subject strings to allow for this. For
|
||||
example, you could pass the subject in chunks that were 500 bytes long, but in
|
||||
a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
|
||||
bytes at the start of the buffer.
|
||||
.P
|
||||
3. Matching a subject string that is split into multiple segments does not
|
||||
always produce exactly the same result as matching over one single long string.
|
||||
The difference arises when there are multiple matching possibilities, because a
|
||||
partial match result is given only when there are no completed matches in a
|
||||
call to \fBpcre_dfa_exec()\fP. This means that as soon as the shortest match has
|
||||
been found, continuation to a new subject segment is no longer possible.
|
||||
Consider this \fBpcretest\fP example:
|
||||
.sp
|
||||
re> /dog(sbody)?/
|
||||
data> do\eP\eD
|
||||
Partial match: do
|
||||
data> gsb\eR\eP\eD
|
||||
0: g
|
||||
data> dogsbody\eD
|
||||
0: dogsbody
|
||||
1: dog
|
||||
.sp
|
||||
The pattern matches the words "dog" or "dogsbody". When the subject is
|
||||
presented in several parts ("do" and "gsb" being the first two) the match stops
|
||||
when "dog" has been found, and it is not possible to continue. On the other
|
||||
hand, if "dogsbody" is presented as a single string, both matches are found.
|
||||
.P
|
||||
Because of this phenomenon, it does not usually make sense to end a pattern
|
||||
that is going to be matched in this way with a variable repeat.
|
||||
.P
|
||||
4. Patterns that contain alternatives at the top level which do not all
|
||||
start with the same pattern item may not work as expected. For example,
|
||||
consider this pattern:
|
||||
.sp
|
||||
1234|3789
|
||||
.sp
|
||||
If the first part of the subject is "ABC123", a partial match of the first
|
||||
alternative is found at offset 3. There is no partial match for the second
|
||||
alternative, because such a match does not start at the same point in the
|
||||
subject string. Attempting to continue with the string "789" does not yield a
|
||||
match because only those alternatives that match at one point in the subject
|
||||
are remembered. The problem arises because the start of the second alternative
|
||||
matches within the first alternative. There is no problem with anchored
|
||||
patterns or patterns such as:
|
||||
.sp
|
||||
1234|ABCD
|
||||
.sp
|
||||
where no string can be a partial match for both alternatives.
|
||||
.
|
||||
.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 16 January 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,76 @@
|
|||
.TH PCREPERFORM 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE PERFORMANCE"
|
||||
.rs
|
||||
.sp
|
||||
Certain items that may appear in regular expression patterns are more efficient
|
||||
than others. It is more efficient to use a character class like [aeiou] than a
|
||||
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
|
||||
that provides the required behaviour is usually the most efficient. Jeffrey
|
||||
Friedl's book contains a lot of useful general discussion about optimizing
|
||||
regular expressions for efficient performance. This document contains a few
|
||||
observations about PCRE.
|
||||
.P
|
||||
Using Unicode character properties (the \ep, \eP, and \eX escapes) is slow,
|
||||
because PCRE has to scan a structure that contains data for over fifteen
|
||||
thousand characters whenever it needs a character's property. If you can find
|
||||
an alternative pattern that does not use character properties, it will probably
|
||||
be faster.
|
||||
.P
|
||||
When a pattern begins with .* not in parentheses, or in parentheses that are
|
||||
not the subject of a backreference, and the PCRE_DOTALL option is set, the
|
||||
pattern is implicitly anchored by PCRE, since it can match only at the start of
|
||||
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
|
||||
optimization, because the . metacharacter does not then match a newline, and if
|
||||
the subject string contains newlines, the pattern may match from the character
|
||||
immediately following one of them instead of from the very start. For example,
|
||||
the pattern
|
||||
.sp
|
||||
.*second
|
||||
.sp
|
||||
matches the subject "first\enand second" (where \en stands for a newline
|
||||
character), with the match starting at the seventh character. In order to do
|
||||
this, PCRE has to retry the match starting after every newline in the subject.
|
||||
.P
|
||||
If you are using such a pattern with subject strings that do not contain
|
||||
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
|
||||
the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE
|
||||
from having to scan along the subject looking for a newline to restart at.
|
||||
.P
|
||||
Beware of patterns that contain nested indefinite repeats. These can take a
|
||||
long time to run when applied to a string that does not match. Consider the
|
||||
pattern fragment
|
||||
.sp
|
||||
(a+)*
|
||||
.sp
|
||||
This can match "aaaa" in 16 different ways, and this number increases very
rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
times, and for each of those cases other than 0 or 4, the + repeats can match
|
||||
different numbers of times.) When the remainder of the pattern is such that the
|
||||
entire match is going to fail, PCRE has in principle to try every possible
|
||||
variation, and this can take an extremely long time.
|
||||
.P
|
||||
An optimization catches some of the more simple cases such as
|
||||
.sp
|
||||
(a+)*b
|
||||
.sp
|
||||
where a literal character follows. Before embarking on the standard matching
|
||||
procedure, PCRE checks that there is a "b" later in the subject string, and if
|
||||
there is not, it fails the match immediately. However, when there is no
|
||||
following literal this optimization cannot be used. You can see the difference
|
||||
by comparing the behaviour of
|
||||
.sp
|
||||
(a+)*\ed
|
||||
.sp
|
||||
with the pattern above. The pattern (a+)*b fails almost instantly when
applied to a whole line of "a" characters, whereas (a+)*\ed takes an
appreciable time with strings longer than about 20 characters.
|
||||
.P
|
||||
In many cases, the solution to this kind of performance issue is to use an
|
||||
atomic group or a possessive quantifier.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 28 February 2005
|
||||
.br
|
||||
Copyright (c) 1997-2005 University of Cambridge.
|
|
@ -0,0 +1,226 @@
|
|||
.TH PCREPOSIX 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions.
|
||||
.SH "SYNOPSIS OF POSIX API"
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcreposix.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int regcomp(regex_t *\fIpreg\fP, const char *\fIpattern\fP,
|
||||
.ti +5n
|
||||
.B int \fIcflags\fP);
|
||||
.PP
|
||||
.br
|
||||
.B int regexec(regex_t *\fIpreg\fP, const char *\fIstring\fP,
|
||||
.ti +5n
|
||||
.B size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);
|
||||
.PP
|
||||
.br
|
||||
.B size_t regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP,
|
||||
.ti +5n
|
||||
.B char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);
|
||||
.PP
|
||||
.br
|
||||
.B void regfree(regex_t *\fIpreg\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This set of functions provides a POSIX-style API to the PCRE regular expression
|
||||
package. See the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation for a description of PCRE's native API, which contains much
|
||||
additional functionality.
|
||||
.P
|
||||
The functions described here are just wrapper functions that ultimately call
|
||||
the PCRE native API. Their prototypes are defined in the \fBpcreposix.h\fP
|
||||
header file, and on Unix systems the library itself is called
|
||||
\fBpcreposix.a\fP, so can be accessed by adding \fB-lpcreposix\fP to the
|
||||
command for linking an application that uses them. Because the POSIX functions
|
||||
call the native ones, it is also necessary to add \fB-lpcre\fP.
|
||||
.P
|
||||
I have implemented only those option bits that can be reasonably mapped to PCRE
|
||||
native options. In addition, the option REG_EXTENDED is defined with the value
|
||||
zero. This has no effect, but since programs that are written to the POSIX
|
||||
interface often use it, this makes it easier to slot in PCRE as a replacement
|
||||
library. Other POSIX options are not even defined.
|
||||
.P
|
||||
When PCRE is called via these functions, it is only the API that is POSIX-like
|
||||
in style. The syntax and semantics of the regular expressions themselves are
|
||||
still those of Perl, subject to the setting of various PCRE options, as
|
||||
described below. "POSIX-like in style" means that the API approximates to the
|
||||
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
|
||||
domains it is probably even less compatible.
|
||||
.P
|
||||
The header for these functions is supplied as \fBpcreposix.h\fP to avoid any
|
||||
potential clash with other POSIX libraries. It can, of course, be renamed or
|
||||
aliased as \fBregex.h\fP, which is the "correct" name. It provides two
|
||||
structure types, \fIregex_t\fP for compiled internal forms, and
|
||||
\fIregmatch_t\fP for returning captured substrings. It also defines some
|
||||
constants whose names start with "REG_"; these are used for setting options and
|
||||
identifying error codes.
|
||||
.P
|
||||
.SH "COMPILING A PATTERN"
|
||||
.rs
|
||||
.sp
|
||||
The function \fBregcomp()\fP is called to compile a pattern into an
|
||||
internal form. The pattern is a C string terminated by a binary zero, and
|
||||
is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer
|
||||
to a \fBregex_t\fP structure that is used as a base for storing information
|
||||
about the compiled regular expression.
|
||||
.P
|
||||
The argument \fIcflags\fP is either zero, or contains one or more of the bits
|
||||
defined by the following macros:
|
||||
.sp
|
||||
REG_DOTALL
|
||||
.sp
|
||||
The PCRE_DOTALL option is set when the regular expression is passed for
|
||||
compilation to the native function. Note that REG_DOTALL is not part of the
|
||||
POSIX standard.
|
||||
.sp
|
||||
REG_ICASE
|
||||
.sp
|
||||
The PCRE_CASELESS option is set when the regular expression is passed for
|
||||
compilation to the native function.
|
||||
.sp
|
||||
REG_NEWLINE
|
||||
.sp
|
||||
The PCRE_MULTILINE option is set when the regular expression is passed for
|
||||
compilation to the native function. Note that this does \fInot\fP mimic the
|
||||
defined POSIX behaviour for REG_NEWLINE (see the following section).
|
||||
.sp
|
||||
REG_NOSUB
|
||||
.sp
|
||||
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
|
||||
for compilation to the native function. In addition, when a pattern that is
|
||||
compiled with this flag is passed to \fBregexec()\fP for matching, the
|
||||
\fInmatch\fP and \fIpmatch\fP arguments are ignored, and no captured strings
|
||||
are returned.
|
||||
.sp
|
||||
REG_UTF8
|
||||
.sp
|
||||
The PCRE_UTF8 option is set when the regular expression is passed for
|
||||
compilation to the native function. This causes the pattern itself and all data
|
||||
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
|
||||
is not part of the POSIX standard.
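The effect of REG_NOSUB, combined with other flags such as REG_ICASE, can be sketched with a hypothetical helper that only reports whether a match exists. Because \fInmatch\fP and \fIpmatch\fP are ignored for such a pattern, it passes 0 and NULL. As before, this assumes the POSIX <regex.h> interface; REG_DOTALL and REG_UTF8 are PCRE extensions and are not used here.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if PATTERN matches anywhere in SUBJECT,
   0 if it does not, and -1 if compiling or matching fails.  CFLAGS may
   add further bits such as REG_ICASE. */
int posix_matches(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    int rc;
    if (regcomp(&re, pattern, cflags | REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    /* With REG_NOSUB no captured strings are returned, so nmatch = 0. */
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : (rc == REG_NOMATCH ? 0 : -1);
}
```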
|
||||
.P
|
||||
In the absence of these flags, no options are passed to the native function.
|
||||
This means the regex is compiled with PCRE default semantics. In
|
||||
particular, the way it handles newline characters in the subject string is the
|
||||
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
|
||||
\fIsome\fP of the effects specified for REG_NEWLINE. It does not affect the way
|
||||
newlines are matched by . (they aren't) or by a negative class such as [^a]
|
||||
(they are).
|
||||
.P
|
||||
The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
|
||||
\fIpreg\fP structure is filled in on success, and one member of the structure
|
||||
is public: \fIre_nsub\fP contains the number of capturing subpatterns in
|
||||
the regular expression. Various error codes are defined in the header file.
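A common use of \fIre_nsub\fP is to size the \fIpmatch\fP array for \fBregexec()\fP: it needs re_nsub + 1 entries, one for the whole match. The hypothetical helper below just reports the value, again assuming the POSIX <regex.h> interface.

```c
#include <regex.h>

/* Hypothetical helper: number of capturing subpatterns in PATTERN, read
   from the public re_nsub member, or -1 if regcomp() fails. */
long count_groups(const char *pattern)
{
    regex_t re;
    long n;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                    /* preg is not usable after failure */
    n = (long)re.re_nsub;             /* re_nsub is a size_t */
    regfree(&re);
    return n;
}
```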
|
||||
.
|
||||
.
|
||||
.SH "MATCHING NEWLINE CHARACTERS"
|
||||
.rs
|
||||
.sp
|
||||
This area is not simple, because POSIX and Perl take different views of things.
|
||||
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
|
||||
intended to be a POSIX engine. The following table lists the different
|
||||
possibilities for matching newline characters in PCRE:
|
||||
.sp
|
||||
                           Default   Change with
|
||||
.sp
|
||||
  . matches newline          no      PCRE_DOTALL
|
||||
  newline matches [^a]       yes     not changeable
|
||||
  $ matches \en at end       yes     PCRE_DOLLAR_ENDONLY
|
||||
  $ matches \en in middle    no      PCRE_MULTILINE
|
||||
  ^ matches \en in middle    no      PCRE_MULTILINE
|
||||
.sp
|
||||
This is the equivalent table for POSIX:
|
||||
.sp
|
||||
                           Default   Change with
|
||||
.sp
|
||||
  . matches newline          yes     REG_NEWLINE
|
||||
  newline matches [^a]       yes     REG_NEWLINE
|
||||
  $ matches \en at end       no      REG_NEWLINE
|
||||
  $ matches \en in middle    no      REG_NEWLINE
|
||||
  ^ matches \en in middle    no      REG_NEWLINE
|
||||
.sp
|
||||
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
|
||||
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
|
||||
newline from matching [^a].
|
||||
.P
|
||||
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
|
||||
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
|
||||
REG_NEWLINE action.
|
||||
.
|
||||
.
|
||||
.SH "MATCHING A PATTERN"
|
||||
.rs
|
||||
.sp
|
||||
The function \fBregexec()\fP is called to match a compiled pattern \fIpreg\fP
|
||||
against a given \fIstring\fP, which is terminated by a zero byte, subject to
|
||||
the options in \fIeflags\fP. These can be:
|
||||
.sp
|
||||
REG_NOTBOL
|
||||
.sp
|
||||
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
.sp
|
||||
REG_NOTEOL
|
||||
.sp
|
||||
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
.P
|
||||
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
|
||||
strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
|
||||
\fBregexec()\fP are ignored.
|
||||
.P
|
||||
Otherwise, the portion of the string that was matched, and also any captured
|
||||
substrings, are returned via the \fIpmatch\fP argument, which points to an
|
||||
array of \fInmatch\fP structures of type \fIregmatch_t\fP, containing the
|
||||
members \fIrm_so\fP and \fIrm_eo\fP. These contain the offset to the first
|
||||
character of each substring and the offset to the first character after the end
|
||||
of each substring, respectively. The 0th element of the vector relates to the
|
||||
entire portion of \fIstring\fP that was matched; subsequent elements relate to
|
||||
the capturing subpatterns of the regular expression. Unused entries in the
|
||||
array have both structure members set to -1.
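The sketch below shows how \fIrm_so\fP and \fIrm_eo\fP are turned back into a substring, and how an unused entry is recognized by rm_so == -1. The helper name and the fixed array of 10 entries are arbitrary choices for this example, and it assumes the POSIX <regex.h> interface.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copy capture group GROUP (0 = whole match) of
   PATTERN applied to SUBJECT into OUT.  Returns 0 on success, 1 if the
   group exists but did not take part in the match (rm_so == -1), or -1
   on any other failure (bad pattern, no match, group out of range,
   buffer too small). */
int get_group(const char *pattern, const char *subject, size_t group,
              char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[10];                 /* whole match + up to 9 groups */
    int result = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (group <= re.re_nsub && group < 10 &&
        regexec(&re, subject, 10, m, 0) == 0) {
        if (m[group].rm_so == -1) {
            result = 1;               /* unused entry: both members are -1 */
        } else {
            size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
            if (len < outsz) {        /* leave room for the terminator */
                memcpy(out, subject + m[group].rm_so, len);
                out[len] = '\0';
                result = 0;
            }
        }
    }
    regfree(&re);
    return result;
}
```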
|
||||
.P
|
||||
A successful match yields a zero return; various error codes are defined in the
|
||||
header file, of which REG_NOMATCH is the "expected" failure code.
|
||||
.
|
||||
.
|
||||
.SH "ERROR MESSAGES"
|
||||
.rs
|
||||
.sp
|
||||
The \fBregerror()\fP function maps a non-zero error code from either
|
||||
\fBregcomp()\fP or \fBregexec()\fP to a printable message. If \fIpreg\fP is not
|
||||
NULL, the error should have arisen from the use of that structure. A message
|
||||
terminated by a binary zero is placed in \fIerrbuf\fP. The length of the
|
||||
message, including the zero, is limited to \fIerrbuf_size\fP. The yield of the
|
||||
function is the size of buffer needed to hold the whole message.
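Because the yield of \fBregerror()\fP is the buffer size needed, a caller can ask for the size first and then allocate exactly that much. The hypothetical helper below does this. Passing a NULL buffer with a zero \fIerrbuf_size\fP relies on the POSIX rule that the buffer is then ignored.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd copy of the message for ERRCODE.
   The first regerror() call, with a zero-size buffer, only asks for the
   size needed (including the terminating zero); the second fills it in.
   The caller frees the result.  Returns NULL if allocation fails. */
char *error_message(int errcode, const regex_t *preg)
{
    size_t need = regerror(errcode, preg, NULL, 0);
    char *buf = malloc(need);
    if (buf != NULL)
        regerror(errcode, preg, buf, need);
    return buf;
}
```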
|
||||
.
|
||||
.
|
||||
.SH MEMORY USAGE
|
||||
.rs
|
||||
.sp
|
||||
Compiling a regular expression causes memory to be allocated and associated
|
||||
with the \fIpreg\fP structure. The function \fBregfree()\fP frees all such
|
||||
memory, after which \fIpreg\fP may no longer be used as a compiled expression.
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 16 January 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,131 @@
|
|||
.TH PCREPRECOMPILE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "SAVING AND RE-USING PRECOMPILED PCRE PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
If you are running an application that uses a large number of regular
|
||||
expression patterns, it may be useful to store them in a precompiled form
|
||||
instead of having to compile them every time the application is run.
|
||||
If you are not using any private character tables (see the
|
||||
.\" HREF
|
||||
\fBpcre_maketables()\fP
|
||||
.\"
|
||||
documentation), this is relatively straightforward. If you are using private
|
||||
tables, it is a little bit more complicated.
|
||||
.P
|
||||
If you save compiled patterns to a file, you can copy them to a different host
|
||||
and run them there. This works even if the new host has the opposite endianness
|
||||
to the one on which the patterns were compiled. There may be a small
|
||||
performance penalty, but it should be insignificant.
|
||||
.
|
||||
.
|
||||
.SH "SAVING A COMPILED PATTERN"
|
||||
.rs
|
||||
.sp
|
||||
The value returned by \fBpcre_compile()\fP points to a single block of memory
|
||||
that holds the compiled pattern and associated data. You can find the length of
|
||||
this block in bytes by calling \fBpcre_fullinfo()\fP with an argument of
|
||||
PCRE_INFO_SIZE. You can then save the data in any appropriate manner. Here is
|
||||
sample code that compiles a pattern and writes it to a file. It assumes that
|
||||
the variable \fIfd\fP is a \fBFILE *\fP for a file that is open for output:
|
||||
.sp
|
||||
  int erroroffset, rc;
|
||||
  size_t size, written;
|
||||
  const char *error;
|
||||
  pcre *re;
|
||||
.sp
|
||||
  re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
|
||||
  if (re == NULL) { ... handle errors ... }
|
||||
  rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
|
||||
  if (rc < 0) { ... handle errors ... }
|
||||
  written = fwrite(re, 1, size, fd);
|
||||
  if (written != size) { ... handle errors ... }
|
||||
.sp
|
||||
In this example, the bytes that comprise the compiled pattern are copied
|
||||
exactly. Note that this is binary data that may contain any of the 256 possible
|
||||
byte values. On systems that make a distinction between binary and non-binary
|
||||
data, be sure that the file is opened for binary output.
|
||||
.P
|
||||
If you want to write more than one pattern to a file, you will have to devise a
|
||||
way of separating them. For binary data, preceding each pattern with its length
|
||||
is probably the most straightforward approach. Another possibility is to write
|
||||
out the data in hexadecimal instead of binary, one pattern to a line.
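One possible layout for the length-prefixed approach is sketched below. The record format (a 4-byte length stored least significant byte first, followed by the raw bytes) and the helper names are assumptions made for this example; PCRE does not define them. The bytes written would be the block returned by \fBpcre_compile()\fP, of the size reported by PCRE_INFO_SIZE. Fixing the byte order of the length keeps the file readable on a host of the other endianness, while PCRE itself copes with the pattern bytes, as described above.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record writer: 4-byte little-endian length, then the data.
   The FILE must be open in binary mode.  Returns 0 on success, -1 on a
   short write. */
int write_record(FILE *f, const void *data, uint32_t len)
{
    unsigned char hdr[4];
    for (int i = 0; i < 4; i++)
        hdr[i] = (unsigned char)(len >> (8 * i));
    if (fwrite(hdr, 1, 4, f) != 4)
        return -1;
    return fwrite(data, 1, len, f) == len ? 0 : -1;
}

/* Hypothetical record reader: returns a malloc'd buffer holding the next
   record and stores its length in *len, or returns NULL at end of file,
   on a short read, or if allocation fails. */
void *read_record(FILE *f, uint32_t *len)
{
    unsigned char hdr[4];
    void *buf;
    if (fread(hdr, 1, 4, f) != 4)
        return NULL;
    *len = (uint32_t)hdr[0] | (uint32_t)hdr[1] << 8 |
           (uint32_t)hdr[2] << 16 | (uint32_t)hdr[3] << 24;
    buf = malloc(*len ? *len : 1);
    if (buf != NULL && fread(buf, 1, *len, f) != *len) {
        free(buf);                    /* truncated record */
        buf = NULL;
    }
    return buf;
}
```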
|
||||
.P
|
||||
Saving compiled patterns in a file is only one possible way of storing them for
|
||||
later use. They could equally well be saved in a database, or in the memory of
|
||||
some daemon process that passes them via sockets to the processes that want
|
||||
them.
|
||||
.P
|
||||
If the pattern has been studied, it is also possible to save the study data in
|
||||
a similar way to the compiled pattern itself. When studying generates
|
||||
additional information, \fBpcre_study()\fP returns a pointer to a
|
||||
\fBpcre_extra\fP data block. Its format is defined in the
|
||||
.\" HTML <a href="pcreapi.html#extradata">
|
||||
.\" </a>
|
||||
section on matching a pattern
|
||||
.\"
|
||||
in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation. The \fIstudy_data\fP field points to the binary study data, and
|
||||
this is what you must save (not the \fBpcre_extra\fP block itself). The length
|
||||
of the study data can be obtained by calling \fBpcre_fullinfo()\fP with an
|
||||
argument of PCRE_INFO_STUDYSIZE. Remember to check that \fBpcre_study()\fP did
|
||||
return a non-NULL value before trying to save the study data.
|
||||
.
|
||||
.
|
||||
.SH "RE-USING A PRECOMPILED PATTERN"
|
||||
.rs
|
||||
.sp
|
||||
Re-using a precompiled pattern is straightforward. Having reloaded it into main
|
||||
memory, you pass its pointer to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP in
|
||||
the usual way. This should work even on another host, and even if that host has
|
||||
the opposite endianness to the one where the pattern was compiled.
|
||||
.P
|
||||
However, if you passed a pointer to custom character tables when the pattern
|
||||
was compiled (the \fItableptr\fP argument of \fBpcre_compile()\fP), you must
|
||||
now pass a similar pointer to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP,
|
||||
because the value saved with the compiled pattern will obviously be nonsense. A
|
||||
field in a \fBpcre_extra\fP block is used to pass this data, as described in
|
||||
the
|
||||
.\" HTML <a href="pcreapi.html#extradata">
|
||||
.\" </a>
|
||||
section on matching a pattern
|
||||
.\"
|
||||
in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
If you did not provide custom character tables when the pattern was compiled,
|
||||
the pointer in the compiled pattern is NULL, which causes \fBpcre_exec()\fP to
|
||||
use PCRE's internal tables. Thus, you do not need to take any special action at
|
||||
run time in this case.
|
||||
.P
|
||||
If you saved study data with the compiled pattern, you need to create your own
|
||||
\fBpcre_extra\fP data block and set the \fIstudy_data\fP field to point to the
|
||||
reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
|
||||
\fIflags\fP field to indicate that study data is present. Then pass the
|
||||
\fBpcre_extra\fP block to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP in the
|
||||
usual way.
|
||||
.
|
||||
.
|
||||
.SH "COMPATIBILITY WITH DIFFERENT PCRE RELEASES"
|
||||
.rs
|
||||
.sp
|
||||
The layout of the control block that is at the start of the data that makes up
|
||||
a compiled pattern was changed for release 5.0. If you have any saved patterns
|
||||
that were compiled with previous releases (not a facility that was previously
|
||||
advertised), you will have to recompile them for release 5.0. However, from now
|
||||
on, it should be possible to make changes in a compatible manner.
|
||||
.P
|
||||
Notwithstanding the above, if you have any saved patterns in UTF-8 mode that
|
||||
use \ep or \eP that were compiled with any release up to and including 6.4, you
|
||||
will have to recompile them for release 6.5 and above.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 01 February 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,66 @@
|
|||
.TH PCRESAMPLE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE SAMPLE PROGRAM"
|
||||
.rs
|
||||
.sp
|
||||
A simple, complete demonstration program, to get you started with using PCRE,
|
||||
is supplied in the file \fIpcredemo.c\fP in the PCRE distribution.
|
||||
.P
|
||||
The program compiles the regular expression that is its first argument, and
|
||||
matches it against the subject string in its second argument. No PCRE options
|
||||
are set, and default character tables are used. If matching succeeds, the
|
||||
program outputs the portion of the subject that matched, together with the
|
||||
contents of any captured substrings.
|
||||
.P
|
||||
If the -g option is given on the command line, the program then goes on to
|
||||
check for further matches of the same regular expression in the same subject
|
||||
string. The logic is a little bit tricky because of the possibility of matching
|
||||
an empty string. Comments in the code explain what is going on.
|
||||
.P
|
||||
If PCRE is installed in the standard include and library directories for your
|
||||
system, you should be able to compile the demonstration program using this
|
||||
command:
|
||||
.sp
|
||||
gcc -o pcredemo pcredemo.c -lpcre
|
||||
.sp
|
||||
If PCRE is installed elsewhere, you may need to add additional options to the
|
||||
command line. For example, on a Unix-like system that has PCRE installed in
|
||||
\fI/usr/local\fP, you can compile the demonstration program using a command
|
||||
like this:
|
||||
.sp
|
||||
.\" JOINSH
|
||||
gcc -o pcredemo -I/usr/local/include pcredemo.c \e
|
||||
-L/usr/local/lib -lpcre
|
||||
.sp
|
||||
Once you have compiled the demonstration program, you can run simple tests like
|
||||
this:
|
||||
.sp
|
||||
./pcredemo 'cat|dog' 'the cat sat on the mat'
|
||||
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
|
||||
.sp
|
||||
Note that there is a much more comprehensive test program, called
|
||||
.\" HREF
|
||||
\fBpcretest\fP,
|
||||
.\"
|
||||
which supports many more facilities for testing regular expressions and the
|
||||
PCRE library. The \fBpcredemo\fP program is provided as a simple coding
|
||||
example.
|
||||
.P
|
||||
On some operating systems (e.g. Solaris), when PCRE is not installed in the
|
||||
standard library directory, you may get an error like this when you try to run
|
||||
\fBpcredemo\fP:
|
||||
.sp
|
||||
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
|
||||
.sp
|
||||
This is caused by the way shared library support works on those systems. You
|
||||
need to add
|
||||
.sp
|
||||
-R/usr/local/lib
|
||||
.sp
|
||||
(for example) to the compile command to get round this problem.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 09 September 2004
|
||||
.br
|
||||
Copyright (c) 1997-2004 University of Cambridge.
|
|
@ -0,0 +1,115 @@
|
|||
.TH PCRESTACK 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE DISCUSSION OF STACK USAGE"
|
||||
.rs
|
||||
.sp
|
||||
When you call \fBpcre_exec()\fP, it makes use of an internal function called
|
||||
\fBmatch()\fP. This calls itself recursively at branch points in the pattern,
|
||||
in order to remember the state of the match so that it can back up and try a
|
||||
different alternative if the first one fails. As matching proceeds deeper and
|
||||
deeper into the tree of possibilities, the recursion depth increases.
|
||||
.P
|
||||
Not all calls of \fBmatch()\fP increase the recursion depth; for an item such
|
||||
as a* it may be called several times at the same level, after matching
|
||||
different numbers of a's. Furthermore, in a number of cases where the result of
|
||||
the recursive call would immediately be passed back as the result of the
|
||||
current call (a "tail recursion"), the function is just restarted instead.
|
||||
.P
|
||||
The \fBpcre_dfa_exec()\fP function operates in an entirely different way, and
|
||||
hardly uses recursion at all. The limit on its complexity is the amount of
|
||||
workspace it is given. The comments that follow do NOT apply to
|
||||
\fBpcre_dfa_exec()\fP; they are relevant only for \fBpcre_exec()\fP.
|
||||
.P
|
||||
You can set limits on the number of times that \fBmatch()\fP is called, both in
|
||||
total and recursively. If the limit is exceeded, an error occurs. For details,
|
||||
see the
|
||||
.\" HTML <a href="pcreapi.html#extradata">
|
||||
.\" </a>
|
||||
section on extra data for \fBpcre_exec()\fP
|
||||
.\"
|
||||
in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
Each time that \fBmatch()\fP is actually called recursively, it uses memory
|
||||
from the process stack. For certain kinds of pattern and data, very large
|
||||
amounts of stack may be needed, despite the recognition of "tail recursion".
|
||||
You can often reduce the amount of recursion, and therefore the amount of stack
|
||||
used, by modifying the pattern that is being matched. Consider, for example,
|
||||
this pattern:
|
||||
.sp
|
||||
([^<]|<(?!inet))+
|
||||
.sp
|
||||
It matches from wherever it starts until it encounters "<inet" or the end of
|
||||
the data, and is the kind of pattern that might be used when processing an XML
|
||||
file. Each iteration of the outer parentheses matches either one character that
|
||||
is not "<" or a "<" that is not followed by "inet". However, each time a
|
||||
parenthesis is processed, a recursion occurs, so this formulation uses a stack
|
||||
frame for each matched character. For a long string, a lot of stack is
|
||||
required. Consider now this rewritten pattern, which matches exactly the same
|
||||
strings:
|
||||
.sp
|
||||
([^<]++|<(?!inet))+
|
||||
.sp
|
||||
This uses very much less stack, because runs of characters that do not contain
|
||||
"<" are "swallowed" in one item inside the parentheses. Recursion happens only
|
||||
when a "<" character that is not followed by "inet" is encountered (and we
|
||||
assume this is relatively rare). A possessive quantifier is used to stop any
|
||||
backtracking into the runs of non-"<" characters, but that is not related to
|
||||
stack usage.
|
||||
.P
|
||||
In environments where stack memory is constrained, you might want to compile
|
||||
PCRE to use heap memory instead of stack for remembering back-up points. This
|
||||
makes it run a lot more slowly, however. Details of how to do this are given in
|
||||
the
|
||||
.\" HREF
|
||||
\fBpcrebuild\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
In Unix-like environments, there is not often a problem with the stack, though
|
||||
the default limit on stack size varies from system to system. Values from 8Mb
|
||||
to 64Mb are common. You can find your default limit by running the command:
|
||||
.sp
|
||||
ulimit -s
|
||||
.sp
|
||||
The effect of running out of stack is often SIGSEGV, though sometimes an error
|
||||
message is given. You can normally increase the limit on stack size by code
|
||||
such as this:
|
||||
.sp
|
||||
struct rlimit rlim;
|
||||
getrlimit(RLIMIT_STACK, &rlim);
|
||||
rlim.rlim_cur = 100*1024*1024;
|
||||
setrlimit(RLIMIT_STACK, &rlim);
|
||||
.sp
|
||||
This reads the current limits (soft and hard) using \fBgetrlimit()\fP, then
|
||||
attempts to increase the soft limit to 100Mb using \fBsetrlimit()\fP. You must
|
||||
do this before calling \fBpcre_exec()\fP.
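A slightly more defensive version of the code above is sketched here as a hypothetical helper. It leaves the limit alone if it is already large enough (or unlimited), never asks for more than the hard limit, and reports failure from \fBgetrlimit()\fP or \fBsetrlimit()\fP. Whether raising the soft limit at run time actually lets the running process's stack grow further depends on the system.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft stack limit towards WANT bytes,
   but no higher than the hard limit.  Returns 0 if the limit was already
   large enough or was raised as far as allowed, -1 if getrlimit() or
   setrlimit() fails.  Call it before pcre_exec(). */
int ensure_stack_limit(rlim_t want)
{
    struct rlimit rlim;
    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= want)
        return 0;                            /* nothing to do */
    if (rlim.rlim_max != RLIM_INFINITY && want > rlim.rlim_max)
        want = rlim.rlim_max;                /* cannot exceed the hard limit */
    rlim.rlim_cur = want;
    return setrlimit(RLIMIT_STACK, &rlim) == 0 ? 0 : -1;
}
```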
|
||||
.P
|
||||
PCRE has an internal counter that can be used to limit the depth of recursion,
|
||||
and thus cause \fBpcre_exec()\fP to give an error code before it runs out of
|
||||
stack. By default, the limit is very large, and unlikely ever to operate. It
|
||||
can be changed when PCRE is built, and it can also be set when
|
||||
\fBpcre_exec()\fP is called. For details of these interfaces, see the
|
||||
.\" HREF
|
||||
\fBpcrebuild\fP
|
||||
.\"
|
||||
and
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
As a very rough rule of thumb, you should reckon on about 500 bytes per
|
||||
recursion. Thus, if you want to limit your stack usage to 8Mb, you
|
||||
should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can
|
||||
support around 128000 recursions. The \fBpcretest\fP test program has a command
|
||||
line option (\fB-S\fP) that can be used to increase its stack.
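The rule of thumb is a simple division, sketched below. The 500-byte figure is the rough estimate from the text, so the result is only a starting point. The figures above treat 1Mb as 1,000,000 bytes; with 8 × 1024 × 1024 bytes the same rule gives 16777. A value computed this way would be supplied as the recursion limit described in the \fBpcreapi\fP documentation (for example via the match_limit_recursion field of a \fBpcre_extra\fP block, assuming a release that provides it).

```c
/* Rough estimate from the text: about 500 bytes of stack per recursion
   of match().  Not a measured value for any particular build. */
#define BYTES_PER_RECURSION 500UL

/* Suggested recursion limit for a stack of STACK_BYTES bytes. */
unsigned long recursion_limit_for(unsigned long stack_bytes)
{
    return stack_bytes / BYTES_PER_RECURSION;
}
```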
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 29 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,631 @@
|
|||
.TH PCRETEST 1
|
||||
.SH NAME
|
||||
pcretest - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B pcretest "[options] [source] [destination]"
|
||||
.sp
|
||||
\fBpcretest\fP was written as a test program for the PCRE regular expression
|
||||
library itself, but it can also be used for experimenting with regular
|
||||
expressions. This document describes the features of the test program; for
|
||||
details of the regular expressions themselves, see the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fP
|
||||
.\"
|
||||
documentation. For details of the PCRE library function calls and their
|
||||
options, see the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SH OPTIONS
|
||||
.rs
|
||||
.TP 10
|
||||
\fB-C\fP
|
||||
Output the version number of the PCRE library, and all available information
|
||||
about the optional features that are included, and then exit.
|
||||
.TP 10
|
||||
\fB-d\fP
|
||||
Behave as if each regex has the \fB/D\fP (debug) modifier; the internal
|
||||
form is output after compilation.
|
||||
.TP 10
|
||||
\fB-dfa\fP
|
||||
Behave as if each data line contains the \eD escape sequence; this causes the
|
||||
alternative matching function, \fBpcre_dfa_exec()\fP, to be used instead of the
|
||||
standard \fBpcre_exec()\fP function (more detail is given below).
|
||||
.TP 10
|
||||
\fB-i\fP
|
||||
Behave as if each regex has the \fB/I\fP modifier; information about the
|
||||
compiled pattern is given after compilation.
|
||||
.TP 10
|
||||
\fB-m\fP
|
||||
Output the size of each compiled pattern after it has been compiled. This is
|
||||
equivalent to adding \fB/M\fP to each regular expression. For compatibility
|
||||
with earlier versions of pcretest, \fB-s\fP is a synonym for \fB-m\fP.
|
||||
.TP 10
|
||||
\fB-o\fP \fIosize\fP
|
||||
Set the number of elements in the output vector that is used when calling
|
||||
\fBpcre_exec()\fP to be \fIosize\fP. The default value is 45, which is enough
|
||||
for 14 capturing subexpressions. The vector size can be changed for individual
|
||||
matching calls by including \eO in the data line (see below).
|
||||
.TP 10
|
||||
\fB-p\fP
|
||||
Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is
|
||||
used to call PCRE. None of the other options has any effect when \fB-p\fP is
|
||||
set.
|
||||
.TP 10
|
||||
\fB-q\fP
|
||||
Do not output the version number of \fBpcretest\fP at the start of execution.
|
||||
.TP 10
|
||||
\fB-S\fP \fIsize\fP
|
||||
On Unix-like systems, set the size of the runtime stack to \fIsize\fP
|
||||
megabytes.
|
||||
.TP 10
|
||||
\fB-t\fP
|
||||
Run each compile, study, and match many times with a timer, and output
|
||||
resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with
|
||||
\fB-t\fP, because you will then get the size output a zillion times, and the
|
||||
timing will be distorted.
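The default of 45 for \fB-o\fP follows from the way \fBpcre_exec()\fP uses its output vector. The first two-thirds hold pairs of offsets (one pair for the whole match, then one per capturing subpattern) and the last third is workspace, so the size should be a multiple of 3. The arithmetic is sketched below with hypothetical function names.

```c
/* Output-vector size needed to receive CAPTURES capturing subpatterns
   plus the whole match: two offsets each, plus one-third workspace. */
int ovector_size_for(int captures)
{
    return 3 * (captures + 1);
}

/* Largest number of capturing subpatterns whose offsets fit in an
   output vector of OSIZE elements. */
int max_captures_for(int osize)
{
    return osize / 3 - 1;
}
```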
|
||||
.
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
If \fBpcretest\fP is given two filename arguments, it reads from the first and
|
||||
writes to the second. If it is given only one filename argument, it reads from
|
||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||
stdout, and prompts for each line of input, using "re>" to prompt for regular
|
||||
expressions, and "data>" to prompt for data lines.
|
||||
.P
|
||||
The program handles any number of sets of input on a single input file. Each
|
||||
set starts with a regular expression, and continues with any number of data
|
||||
lines to be matched against the pattern.
|
||||
.P
|
||||
Each data line is matched separately and independently. If you want to do
|
||||
multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
|
||||
depending on the newline setting) in a single line of input to encode the
|
||||
newline characters. There is no limit on the length of data lines; the input
|
||||
buffer is automatically extended if it is too small.
|
||||
.P
|
||||
An empty line signals the end of the data lines, at which point a new regular
|
||||
expression is read. The regular expressions are given enclosed in any
|
||||
non-alphanumeric delimiters other than backslash, for example:
|
||||
.sp
|
||||
/(a|bc)x+yz/
|
||||
.sp
|
||||
White space before the initial delimiter is ignored. A regular expression may
|
||||
be continued over several input lines, in which case the newline characters are
|
||||
included within it. It is possible to include the delimiter within the pattern
|
||||
by escaping it, for example
|
||||
.sp
|
||||
/abc\e/def/
|
||||
.sp
|
||||
If you do so, the escape and the delimiter form part of the pattern, but since
|
||||
delimiters are always non-alphanumeric, this does not affect its interpretation.
|
||||
If the terminating delimiter is immediately followed by a backslash, for
|
||||
example,
|
||||
.sp
|
||||
/abc/\e
|
||||
.sp
|
||||
then a backslash is added to the end of the pattern. This is done to provide a
|
||||
way of testing the error condition that arises if a pattern finishes with a
|
||||
backslash, because
|
||||
.sp
|
||||
/abc\e/
|
||||
.sp
|
||||
is interpreted as the first line of a pattern that starts with "abc/", causing
|
||||
pcretest to read the next line as a continuation of the regular expression.
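The delimiter rules above can be sketched as a small parser. This is illustrative only, not the code \fBpcretest\fP uses: the function name is hypothetical, and it handles neither patterns continued over several lines nor the trailing-backslash case.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser: split a line such as "/abc\/def/i" into pattern
   and modifiers.  Leading white space is skipped; the delimiter is any
   non-alphanumeric character other than backslash; an escaped delimiter
   is kept, backslash included, in the pattern.  Returns a pointer to the
   modifiers, or NULL on a syntax error or if PAT is too small. */
const char *split_pattern(const char *line, char *pat, size_t patsz)
{
    size_t n = 0;
    char delim;

    while (isspace((unsigned char)*line))
        line++;
    delim = *line;
    if (delim == '\0' || delim == '\\' || isalnum((unsigned char)delim))
        return NULL;
    for (line++; *line != '\0' && *line != delim; line++) {
        if (*line == '\\' && line[1] != '\0') {  /* copy escape pair */
            if (n + 2 >= patsz)
                return NULL;
            pat[n++] = *line++;
        }
        if (n + 1 >= patsz)
            return NULL;
        pat[n++] = *line;
    }
    if (*line != delim)
        return NULL;                  /* no terminating delimiter */
    pat[n] = '\0';
    return line + 1;                  /* modifiers follow the delimiter */
}
```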
|
||||
.
|
||||
.
|
||||
.SH "PATTERN MODIFIERS"
|
||||
.rs
|
||||
.sp
|
||||
A pattern may be followed by any number of modifiers, which are mostly single
|
||||
characters. Following Perl usage, these are referred to below as, for example,
|
||||
"the \fB/i\fP modifier", even though the delimiter of the pattern need not
|
||||
always be a slash, and no slash is used when writing modifiers. Whitespace may
|
||||
appear between the final pattern delimiter and the first modifier, and between
|
||||
the modifiers themselves.
|
||||
.P
|
||||
The \fB/i\fP, \fB/m\fP, \fB/s\fP, and \fB/x\fP modifiers set the PCRE_CASELESS,
|
||||
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when
|
||||
\fBpcre_compile()\fP is called. These four modifier letters have the same
|
||||
effect as they do in Perl. For example:
|
||||
.sp
|
||||
/caseless/i
|
||||
.sp
|
||||
The following table shows additional modifiers for setting PCRE options that do
|
||||
not correspond to anything in Perl:
|
||||
.sp
|
||||
  \fB/A\fP       PCRE_ANCHORED
|
||||
  \fB/C\fP       PCRE_AUTO_CALLOUT
|
||||
  \fB/E\fP       PCRE_DOLLAR_ENDONLY
|
||||
  \fB/f\fP       PCRE_FIRSTLINE
|
||||
  \fB/J\fP       PCRE_DUPNAMES
|
||||
  \fB/N\fP       PCRE_NO_AUTO_CAPTURE
|
||||
  \fB/U\fP       PCRE_UNGREEDY
|
||||
  \fB/X\fP       PCRE_EXTRA
|
||||
  \fB/<cr>\fP    PCRE_NEWLINE_CR
|
||||
  \fB/<lf>\fP    PCRE_NEWLINE_LF
|
||||
  \fB/<crlf>\fP  PCRE_NEWLINE_CRLF
|
||||
.sp
|
||||
Those specifying line endings are literal strings as shown. Details of the
|
||||
meanings of these PCRE options are given in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SS "Finding all matches in a string"
|
||||
.rs
|
||||
.sp
|
||||
Searching for all possible matches within each subject string can be requested
|
||||
by the \fB/g\fP or \fB/G\fP modifier. After finding a match, PCRE is called
|
||||
again to search the remainder of the subject string. The difference between
|
||||
\fB/g\fP and \fB/G\fP is that the former uses the \fIstartoffset\fP argument to
|
||||
\fBpcre_exec()\fP to start searching at a new point within the entire string
|
||||
(which is in effect what Perl does), whereas the latter passes a shortened
|
||||
substring. This makes a difference to the matching process if the pattern
|
||||
begins with a lookbehind assertion (including \eb or \eB).
|
||||
.P
|
||||
If any call to \fBpcre_exec()\fP in a \fB/g\fP or \fB/G\fP sequence matches an
|
||||
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
|
||||
flags set in order to search for another, non-empty, match at the same point.
|
||||
If this second match fails, the start offset is advanced by one, and the normal
|
||||
match is retried. This imitates the way Perl handles such cases when using the
|
||||
\fB/g\fP modifier or the \fBsplit()\fP function.
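The loop below sketches the idea using the POSIX \fBregexec()\fP interface rather than \fBpcre_exec()\fP. POSIX has no equivalent of PCRE_NOTEMPTY or PCRE_ANCHORED, so this is a simplification: after an empty match it just advances by one character. That gives the same count as Perl for simple cases such as x* against "ab", but not for every pattern. Like \fB/G\fP, it passes a shortened substring, with REG_NOTBOL after the first call. The function name is hypothetical.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: count successive matches of PATTERN in SUBJECT.
   After an empty match the search resumes one character further on;
   after a non-empty match it resumes at the end of the match.
   Returns the number of matches, or -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];
    size_t len = strlen(subject), start = 0;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (start <= len &&
           regexec(&re, subject + start, 1, m,
                   start > 0 ? REG_NOTBOL : 0) == 0) {
        count++;
        if (m[0].rm_eo == m[0].rm_so)
            start += (size_t)m[0].rm_eo + 1;   /* empty: step past it */
        else
            start += (size_t)m[0].rm_eo;       /* resume after the match */
    }
    regfree(&re);
    return count;
}
```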
|
||||
.
|
||||
.
|
||||
.SS "Other modifiers"
|
||||
.rs
|
||||
.sp
|
||||
There are yet more modifiers for controlling the way \fBpcretest\fP
|
||||
operates.
|
||||
.P
|
||||
The \fB/+\fP modifier requests that as well as outputting the substring that
|
||||
matched the entire pattern, pcretest should in addition output the remainder of
|
||||
the subject string. This is useful for tests where the subject contains
|
||||
multiple copies of the same substring.
|
||||
.P
|
||||
The \fB/L\fP modifier must be followed directly by the name of a locale, for
|
||||
example,
|
||||
.sp
|
||||
/pattern/Lfr_FR
|
||||
.sp
|
||||
For this reason, it must be the last modifier. The given locale is set,
|
||||
\fBpcre_maketables()\fP is called to build a set of character tables for the
|
||||
locale, and this is then passed to \fBpcre_compile()\fP when compiling the
|
||||
regular expression. Without an \fB/L\fP modifier, NULL is passed as the tables
|
||||
pointer; that is, \fB/L\fP applies only to the expression on which it appears.
|
||||
.P
|
||||
The \fB/I\fP modifier requests that \fBpcretest\fP output information about the
|
||||
compiled pattern (whether it is anchored, has a fixed first character, and
|
||||
so on). It does this by calling \fBpcre_fullinfo()\fP after compiling a
|
||||
pattern. If the pattern is studied, the results of that are also output.
|
||||
.P
|
||||
The \fB/D\fP modifier is a PCRE debugging feature, which also assumes \fB/I\fP.
|
||||
It causes the internal form of compiled regular expressions to be output after
|
||||
compilation. If the pattern was studied, the information returned is also
|
||||
output.
|
||||
.P
|
||||
The \fB/F\fP modifier causes \fBpcretest\fP to flip the byte order of the
|
||||
fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
|
||||
facility is for testing the feature in PCRE that allows it to execute patterns
|
||||
that were compiled on a host with a different endianness. This feature is not
|
||||
available when the POSIX interface to PCRE is being used, that is, when the
|
||||
\fB/P\fP pattern modifier is specified. See also the section about saving and
|
||||
reloading compiled patterns below.
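Flipping byte order amounts to reversing the bytes of each 2-byte and 4-byte field. The helpers below are a generic sketch of that operation, and their names are illustrative. Which fields of a compiled pattern get swapped is internal to PCRE and is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 16-bit field. */
static uint16_t flip16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the byte order of a 32-bit field. */
static uint32_t flip32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Returns 1 when the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

Applying a flip twice restores the original value. This is why testing with \fB/F\fP exercises the same code path as loading a pattern compiled on an opposite-endian host.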
|
||||
.P
|
||||
The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the
|
||||
expression has been compiled, and the results used when the expression is
|
||||
matched.
|
||||
.P
|
||||
The \fB/M\fP modifier causes the size of memory block used to hold the compiled
|
||||
pattern to be output.
|
||||
.P
|
||||
The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper
|
||||
API rather than its native API. When this is done, all other modifiers except
|
||||
\fB/i\fP, \fB/m\fP, and \fB/+\fP are ignored. REG_ICASE is set if \fB/i\fP is
|
||||
present, and REG_NEWLINE is set if \fB/m\fP is present. The wrapper functions
|
||||
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
|
||||
.P
|
||||
The \fB/8\fP modifier causes \fBpcretest\fP to call PCRE with the PCRE_UTF8
|
||||
option set. This turns on support for UTF-8 character handling in PCRE,
|
||||
provided that it was compiled with this support enabled. This modifier also
|
||||
causes any non-printing characters in output strings to be printed using the
|
||||
\ex{hh...} notation if they are valid UTF-8 sequences.
|
||||
.P
|
||||
If the \fB/?\fP modifier is used with \fB/8\fP, it causes \fBpcretest\fP to
|
||||
call \fBpcre_compile()\fP with the PCRE_NO_UTF8_CHECK option, to suppress the
|
||||
checking of the string for UTF-8 validity.
|
||||
.
|
||||
.
|
||||
.SH "DATA LINES"
|
||||
.rs
|
||||
.sp
|
||||
Before each data line is passed to \fBpcre_exec()\fP, leading and trailing
|
||||
whitespace is removed, and it is then scanned for \e escapes. Some of these are
|
||||
pretty esoteric features, intended for checking out some of the more
|
||||
complicated features of PCRE. If you are just testing "ordinary" regular
|
||||
expressions, you probably don't need any of these. The following escapes are
|
||||
recognized:
|
||||
.sp
|
||||
\ea alarm (= BEL)
|
||||
\eb backspace
|
||||
\ee escape
|
||||
\ef formfeed
|
||||
\en newline
|
||||
.\" JOIN
|
||||
\eqdd set the PCRE_MATCH_LIMIT limit to dd
|
||||
(any number of digits)
|
||||
\er carriage return
|
||||
\et tab
|
||||
\ev vertical tab
|
||||
\ennn octal character (up to 3 octal digits)
|
||||
\exhh hexadecimal character (up to 2 hex digits)
|
||||
.\" JOIN
|
||||
\ex{hh...} hexadecimal character, any number of digits
|
||||
in UTF-8 mode
|
||||
.\" JOIN
|
||||
\eA pass the PCRE_ANCHORED option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eB pass the PCRE_NOTBOL option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eCdd call pcre_copy_substring() for substring dd
|
||||
after a successful match (number less than 32)
|
||||
.\" JOIN
|
||||
\eCname call pcre_copy_named_substring() for substring
|
||||
"name" after a successful match (name termin-
|
||||
ated by next non-alphanumeric character)
|
||||
.\" JOIN
|
||||
\eC+ show the current captured substrings at callout
|
||||
time
|
||||
\eC- do not supply a callout function
|
||||
.\" JOIN
|
||||
\eC!n return 1 instead of 0 when callout number n is
|
||||
reached
|
||||
.\" JOIN
|
||||
\eC!n!m return 1 instead of 0 when callout number n is
|
||||
reached for the mth time
|
||||
.\" JOIN
|
||||
\eC*n pass the number n (may be negative) as callout
|
||||
data; this is used as the callout return value
|
||||
\eD use the \fBpcre_dfa_exec()\fP match function
|
||||
\eF only shortest match for \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eGdd call pcre_get_substring() for substring dd
|
||||
after a successful match (number less than 32)
|
||||
.\" JOIN
|
||||
\eGname call pcre_get_named_substring() for substring
|
||||
"name" after a successful match (name termin-
|
||||
ated by next non-alphanumeric character)
|
||||
.\" JOIN
|
||||
\eL call pcre_get_substringlist() after a
|
||||
successful match
|
||||
.\" JOIN
|
||||
\eM discover the minimum MATCH_LIMIT and
|
||||
MATCH_LIMIT_RECURSION settings
|
||||
.\" JOIN
|
||||
\eN pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eOdd set the size of the output vector passed to
|
||||
\fBpcre_exec()\fP to dd (any number of digits)
|
||||
.\" JOIN
|
||||
\eP pass the PCRE_PARTIAL option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eQdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd
|
||||
(any number of digits)
|
||||
\eR pass the PCRE_DFA_RESTART option to \fBpcre_dfa_exec()\fP
|
||||
\eS output details of memory get/free calls during matching
|
||||
.\" JOIN
|
||||
\eZ pass the PCRE_NOTEOL option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\e? pass the PCRE_NO_UTF8_CHECK option to
|
||||
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP
|
||||
\e>dd start the match at offset dd (any number of digits);
|
||||
.\" JOIN
|
||||
this sets the \fIstartoffset\fP argument for \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\e<cr> pass the PCRE_NEWLINE_CR option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\e<lf> pass the PCRE_NEWLINE_LF option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\e<crlf> pass the PCRE_NEWLINE_CRLF option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.sp
|
||||
The escapes that specify line endings are literal strings, exactly as shown.
|
||||
A backslash followed by any other character just escapes that character. If the
|
||||
very last character is a backslash, it is ignored. This gives a way of passing
|
||||
an empty line as data, since a real empty line terminates the data input.
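The trimming and escape rules above can be sketched in C for a few of the escapes. This is a minimal sketch, and `decode_data_line()` is a hypothetical helper rather than pcretest's code. It handles only \en, \et, \er, \exhh, octal \ennn, a final backslash, and "backslash plus any other character". It does not handle \ex{hh...} or the option-setting escapes such as \eA and \eB. Those would be mistaken for literal characters here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of a hex digit already checked with isxdigit(). */
static int hexval(int d)
{
    return isdigit(d) ? d - '0' : tolower(d) - 'a' + 10;
}

/* Hypothetical helper: trims a data line and decodes a small subset of
   its escapes. 'out' needs strlen(in) bytes; returns the decoded length
   (the result may contain zero bytes, so it is not NUL-terminated). */
static size_t decode_data_line(const char *in, unsigned char *out)
{
    const char *p = in, *end = in + strlen(in);
    size_t n = 0;

    while (p < end && isspace((unsigned char)*p)) p++;          /* leading */
    while (end > p && isspace((unsigned char)end[-1])) end--;   /* trailing */

    while (p < end) {
        int c = (unsigned char)*p++;
        if (c != '\\') { out[n++] = (unsigned char)c; continue; }
        if (p == end) break;                 /* a final backslash is ignored */
        c = (unsigned char)*p++;
        if (c == 'n') out[n++] = '\n';
        else if (c == 't') out[n++] = '\t';
        else if (c == 'r') out[n++] = '\r';
        else if (c == 'x') {                 /* up to 2 hex digits */
            int v = 0, i;
            for (i = 0; i < 2 && p < end && isxdigit((unsigned char)*p); i++)
                v = v * 16 + hexval((unsigned char)*p++);
            out[n++] = (unsigned char)v;
        } else if (c >= '0' && c <= '7') {   /* up to 3 octal digits */
            int v = c - '0', i;
            for (i = 1; i < 3 && p < end && *p >= '0' && *p <= '7'; i++)
                v = v * 8 + (*p++ - '0');
            out[n++] = (unsigned char)(v & 0xFF);
        } else {
            out[n++] = (unsigned char)c;     /* escaped char stands for itself */
        }
    }
    return n;
}
```

A data line consisting of a single backslash decodes to zero bytes. This is how an empty subject can be supplied without ending the data input.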
|
||||
.P
|
||||
If \eM is present, \fBpcretest\fP calls \fBpcre_exec()\fP several times, with
|
||||
different values in the \fImatch_limit\fP and \fImatch_limit_recursion\fP
|
||||
fields of the \fBpcre_extra\fP data structure, until it finds the minimum
|
||||
numbers for each parameter that allow \fBpcre_exec()\fP to complete. The
|
||||
\fImatch_limit\fP number is a measure of the amount of backtracking that takes
|
||||
place, and checking it out can be instructive. For most simple matches, the
|
||||
number is quite small, but for patterns with very large numbers of matching
|
||||
possibilities, it can become large very quickly with increasing length of
|
||||
subject string. The \fImatch_limit_recursion\fP number is a measure of how much
|
||||
stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
|
||||
to complete the match attempt.
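This page does not describe how pcretest searches for these minima. A doubling-then-bisection search is one way to find such a minimum. It assumes that if an attempt completes at some limit, it also completes at every larger limit. The predicate type `completes_fn` is a hypothetical stand-in for running \fBpcre_exec()\fP with a given limit. `needs_at_least()` is a toy predicate for demonstration only.

```c
#include <assert.h>

/* Hypothetical predicate: nonzero if a match attempt completes when the
   limit is 'limit'. Assumed monotonic in 'limit'. */
typedef int (*completes_fn)(unsigned long limit, void *ctx);

/* Smallest limit >= 1 for which completes() is true. May loop forever
   if no limit ever succeeds. */
static unsigned long find_min_limit(completes_fn completes, void *ctx)
{
    unsigned long lo = 0;   /* treated as failing */
    unsigned long hi = 1;

    while (!completes(hi, ctx)) {     /* grow until an attempt completes */
        lo = hi;
        hi *= 2;
    }
    while (hi - lo > 1) {             /* lo fails, hi succeeds: bisect */
        unsigned long mid = lo + (hi - lo) / 2;
        if (completes(mid, ctx)) hi = mid;
        else lo = mid;
    }
    return hi;
}

/* Toy predicate for demonstration: succeeds once the limit reaches *ctx. */
static int needs_at_least(unsigned long limit, void *ctx)
{
    return limit >= *(const unsigned long *)ctx;
}
```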
|
||||
.P
|
||||
When \eO is used, the value specified may be higher or lower than the size set
|
||||
by the \fB-o\fP command line option (or defaulted to 45); \eO applies only to
|
||||
the call of \fBpcre_exec()\fP for the line in which it appears.
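The default of 45 follows from how \fBpcre_exec()\fP uses the output vector, as the \fBpcreapi\fP documentation describes. The first two-thirds of the vector hold start/end offset pairs, with pair 0 for the whole match. The final third is working space. A small arithmetic sketch, with illustrative function names:

```c
#include <assert.h>

/* Number of capturing subpatterns whose offsets fit in an output vector
   of 'size' elements: size/3 offset pairs, minus pair 0 for the whole match. */
static int captures_that_fit(int size)
{
    assert(size >= 3);
    return size / 3 - 1;
}

/* Smallest vector size that can report 'captures' subpatterns. */
static int ovector_size_for(int captures)
{
    assert(captures >= 0);
    return (captures + 1) * 3;
}
```

With 45 elements there are 15 offset pairs: one for the whole match and 14 for capturing subpatterns.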
|
||||
.P
|
||||
If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper
|
||||
API to be used, the only option-setting sequences that have any effect are \eB
|
||||
and \eZ, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
|
||||
\fBregexec()\fP.
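REG_NOTBOL and REG_NOTEOL are standard POSIX \fBregexec()\fP flags. To show what the two flags do, the sketch below uses the platform's own <regex.h> implementation rather than PCRE's wrapper. `posix_matches()` is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if 'pattern' matches 'subject' under the given flags,
   0 if it does not, and -1 if the pattern fails to compile. */
static int posix_matches(const char *pattern, const char *subject,
                         int cflags, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0) return -1;
    rc = regexec(&re, subject, 0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

With REG_NOTBOL, "^" no longer matches at the start of the subject. With REG_NOTEOL, "$" no longer matches at its end.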
|
||||
.P
|
||||
The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
|
||||
of the \fB/8\fP modifier on the pattern. It is recognized always. There may be
|
||||
any number of hexadecimal digits inside the braces. The result is from one to
|
||||
six bytes, encoded according to the UTF-8 rules.
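The encoding step can be sketched in C. This is a minimal sketch that assumes the original UTF-8 definition, with code points up to 0x7FFFFFFF and hence up to six bytes. `utf8_encode()` is an illustrative name, not a PCRE function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode cp (0 .. 0x7FFFFFFF) with the original 1-to-6-byte UTF-8
   scheme. Writes the bytes to buf and returns how many were written. */
static size_t utf8_encode(uint32_t cp, unsigned char buf[6])
{
    /* Smallest code point needing 2, 3, 4, 5 and 6 bytes respectively. */
    static const uint32_t limits[] = {0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    /* Marker bits for the first byte of a 1..6 byte sequence. */
    static const unsigned char lead[] = {0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
    size_t len = 1, i;

    assert(cp < 0x80000000u);
    while (len < 6 && cp >= limits[len - 1]) len++;
    for (i = len - 1; i > 0; i--) {          /* continuation bytes: 10xxxxxx */
        buf[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    buf[0] = (unsigned char)(lead[len - 1] | cp);
    return len;
}
```

For example, \ex{e9} becomes the two bytes 0xC3 0xA9, and \ex{20ac} becomes 0xE2 0x82 0xAC.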
|
||||
.
|
||||
.
|
||||
.SH "THE ALTERNATIVE MATCHING FUNCTION"
|
||||
.rs
|
||||
.sp
|
||||
By default, \fBpcretest\fP uses the standard PCRE matching function,
|
||||
\fBpcre_exec()\fP to match each data line. From release 6.0, PCRE supports an
|
||||
alternative matching function, \fBpcre_dfa_exec()\fP, which operates in a
|
||||
different way, and has some restrictions. The differences between the two
|
||||
functions are described in the
|
||||
.\" HREF
|
||||
\fBpcrematching\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
If a data line contains the \eD escape sequence, or if the command line
|
||||
contains the \fB-dfa\fP option, the alternative matching function is called.
|
||||
This function finds all possible matches at a given point. If, however, the \eF
|
||||
escape sequence is present in the data line, it stops after the first match is
|
||||
found. This is always the shortest possible match.
|
||||
.
|
||||
.
|
||||
.SH "DEFAULT OUTPUT FROM PCRETEST"
|
||||
.rs
|
||||
.sp
|
||||
This section describes the output when the normal matching function,
|
||||
\fBpcre_exec()\fP, is being used.
|
||||
.P
|
||||
When a match succeeds, pcretest outputs the list of captured substrings that
|
||||
\fBpcre_exec()\fP returns, starting with number 0 for the string that matched
|
||||
the whole pattern. Otherwise, it outputs "No match" or "Partial match"
|
||||
when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
|
||||
respectively, and for any other failure, the negative PCRE error number. Here is an example
|
||||
of an interactive \fBpcretest\fP run.
|
||||
.sp
|
||||
$ pcretest
|
||||
PCRE version 5.00 07-Sep-2004
|
||||
.sp
|
||||
re> /^abc(\ed+)/
|
||||
data> abc123
|
||||
0: abc123
|
||||
1: 123
|
||||
data> xyz
|
||||
No match
|
||||
.sp
|
||||
If the strings contain any non-printing characters, they are output as \exhh
|
||||
escapes, or as \ex{...} escapes if the \fB/8\fP modifier was present on the
|
||||
pattern. If the pattern has the \fB/+\fP modifier, the output for substring 0
|
||||
is followed by the rest of the subject string, identified by "0+" like
|
||||
this:
|
||||
.sp
|
||||
re> /cat/+
|
||||
data> cataract
|
||||
0: cat
|
||||
0+ aract
|
||||
.sp
|
||||
If the pattern has the \fB/g\fP or \fB/G\fP modifier, the results of successive
|
||||
matching attempts are output in sequence, like this:
|
||||
.sp
|
||||
re> /\eBi(\ew\ew)/g
|
||||
data> Mississippi
|
||||
0: iss
|
||||
1: ss
|
||||
0: iss
|
||||
1: ss
|
||||
0: ipp
|
||||
1: pp
|
||||
.sp
|
||||
"No match" is output only if the first match attempt fails.
|
||||
.P
|
||||
If any of the sequences \fB\eC\fP, \fB\eG\fP, or \fB\eL\fP are present in a
|
||||
data line that is successfully matched, the substrings extracted by the
|
||||
convenience functions are output with C, G, or L after the string number
|
||||
instead of a colon. This is in addition to the normal full list. The string
|
||||
length (that is, the return from the extraction function) is given in
|
||||
parentheses after each string for \fB\eC\fP and \fB\eG\fP.
|
||||
.P
|
||||
Note that while patterns can be continued over several lines (a plain ">"
|
||||
prompt is used for continuations), data lines may not. However, newlines can be
|
||||
included in data by means of the \en escape (or \er or \er\en for those newline
|
||||
settings).
|
||||
.
|
||||
.
|
||||
.SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION"
|
||||
.rs
|
||||
.sp
|
||||
When the alternative matching function, \fBpcre_dfa_exec()\fP, is used (by
|
||||
means of the \eD escape sequence or the \fB-dfa\fP command line option), the
|
||||
output consists of a list of all the matches that start at the first point in
|
||||
the subject where there is at least one match. For example:
|
||||
.sp
|
||||
re> /(tang|tangerine|tan)/
|
||||
data> yellow tangerine\eD
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
.sp
|
||||
(Using the normal matching function on this data finds only "tang".) The
|
||||
longest matching string is always given first (and numbered zero).
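The toy below illustrates only the shape of that output, for a pattern made purely of literal alternatives such as (tang|tangerine|tan). It does not reflect how \fBpcre_dfa_exec()\fP works internally; the \fBpcrematching\fP documentation describes that function as following all paths through the pattern in parallel. `all_matches_at_first_point()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Finds the first subject offset at which any literal alternative
   matches, and stores the lengths of all alternatives matching there,
   longest first. 'lens' needs room for 'nalts' entries. Returns the
   number of lengths stored (0 means no match). */
static size_t all_matches_at_first_point(const char *subject,
                                         const char *const alts[], size_t nalts,
                                         size_t *start, size_t lens[])
{
    size_t slen = strlen(subject);

    for (size_t pos = 0; pos <= slen; pos++) {
        size_t n = 0;
        for (size_t a = 0; a < nalts; a++) {
            size_t alen = strlen(alts[a]);
            if (alen <= slen - pos && memcmp(subject + pos, alts[a], alen) == 0)
                lens[n++] = alen;
        }
        if (n > 0) {
            /* Insertion sort so that the longest match comes first. */
            for (size_t i = 1; i < n; i++) {
                size_t v = lens[i], j = i;
                while (j > 0 && lens[j - 1] < v) { lens[j] = lens[j - 1]; j--; }
                lens[j] = v;
            }
            *start = pos;
            return n;
        }
    }
    return 0;
}
```

For "yellow tangerine", all three alternatives match at offset 7, and their lengths come back as 9, 4, 3. This is the same order as the 0:, 1:, 2: lines in the example above.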
|
||||
.P
|
||||
If \fB/g\fP is present on the pattern, the search for further matches resumes
|
||||
at the end of the longest match. For example:
|
||||
.sp
|
||||
re> /(tang|tangerine|tan)/g
|
||||
data> yellow tangerine and tangy sultana\eD
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
0: tang
|
||||
1: tan
|
||||
0: tan
|
||||
.sp
|
||||
Since the alternative matching function does not support substring capture, the escape
|
||||
sequences that are concerned with captured substrings are not relevant.
|
||||
.
|
||||
.
|
||||
.SH "RESTARTING AFTER A PARTIAL MATCH"
|
||||
.rs
|
||||
.sp
|
||||
When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
|
||||
indicating that the subject partially matched the pattern, you can restart the
|
||||
match with additional subject data by means of the \eR escape sequence. For
|
||||
example:
|
||||
.sp
|
||||
re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
|
||||
data> 23ja\eP\eD
|
||||
Partial match: 23ja
|
||||
data> n05\eR\eD
|
||||
0: n05
|
||||
.sp
|
||||
For further information about partial matching, see the
|
||||
.\" HREF
|
||||
\fBpcrepartial\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SH CALLOUTS
|
||||
.rs
|
||||
.sp
|
||||
If the pattern contains any callout requests, \fBpcretest\fP's callout function
|
||||
is called during matching. This works with both matching functions. By default,
|
||||
the called function displays the callout number, the start and current
|
||||
positions in the text at the callout time, and the next pattern item to be
|
||||
tested. For example, the output
|
||||
.sp
|
||||
--->pqrabcdef
|
||||
  0    ^  ^     \ed
|
||||
.sp
|
||||
indicates that callout number 0 occurred for a match attempt starting at the
|
||||
fourth character of the subject string, when the pointer was at the seventh
|
||||
character of the data, and when the next pattern item was \ed. Just one
|
||||
circumflex is output if the start and current positions are the same.
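The pointer line can be laid out with a small C helper. This is a sketch, and `caret_line()` is illustrative. It leaves blank the leading columns where pcretest prints the callout number or pattern offset. Here, `pad` is the width of the "--->" prefix, which is 4.

```c
#include <assert.h>
#include <stddef.h>

/* Builds the pointer line shown under the subject: '^' under the start
   position and another '^' under the current position, or a single '^'
   when they coincide. 'out' needs pad + current + 2 bytes. */
static void caret_line(size_t pad, size_t start, size_t current, char *out)
{
    size_t i, n = 0;

    assert(start <= current);
    for (i = 0; i < pad + start; i++) out[n++] = ' ';
    out[n++] = '^';
    if (current > start) {
        for (i = start + 1; i < current; i++) out[n++] = ' ';
        out[n++] = '^';
    }
    out[n] = '\0';
}
```

For the example above (start at the fourth character, offset 3; current at the seventh, offset 6), the circumflexes land in columns 7 and 10. These are the columns of "a" and "d" in "--->pqrabcdef".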
|
||||
.P
|
||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
|
||||
result of the \fB/C\fP pattern modifier. In this case, instead of showing the
|
||||
callout number, the offset in the pattern, preceded by a plus, is output. For
|
||||
example:
|
||||
.sp
|
||||
re> /\ed?[A-E]\e*/C
|
||||
data> E*
|
||||
--->E*
|
||||
 +0 ^      \ed?
|
||||
 +3 ^      [A-E]
|
||||
 +8 ^^     \e*
|
||||
+10 ^ ^
|
||||
0: E*
|
||||
.sp
|
||||
The callout function in \fBpcretest\fP returns zero (carry on matching) by
|
||||
default, but you can use a \eC item in a data line (as described above) to
|
||||
change this.
|
||||
.P
|
||||
Inserting callouts can be helpful when using \fBpcretest\fP to check
|
||||
complicated regular expressions. For further information about callouts, see
|
||||
the
|
||||
.\" HREF
|
||||
\fBpcrecallout\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SH "SAVING AND RELOADING COMPILED PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
The facilities described in this section are not available when the POSIX
|
||||
interface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is
|
||||
specified.
|
||||
.P
|
||||
When the POSIX interface is not in use, you can cause \fBpcretest\fP to write a
|
||||
compiled pattern to a file, by following the modifiers with > and a file name.
|
||||
For example:
|
||||
.sp
|
||||
/pattern/im >/some/file
|
||||
.sp
|
||||
See the
|
||||
.\" HREF
|
||||
\fBpcreprecompile\fP
|
||||
.\"
|
||||
documentation for a discussion about saving and re-using compiled patterns.
|
||||
.P
|
||||
The data that is written is binary. The first eight bytes are the length of the
|
||||
compiled pattern data followed by the length of the optional study data, each
|
||||
written as four bytes in big-endian order (most significant byte first). If
|
||||
there is no study data (either the pattern was not studied, or studying did not
|
||||
return any data), the second length is zero. The lengths are followed by an
|
||||
exact copy of the compiled pattern. If there is additional study data, this
|
||||
follows immediately after the compiled pattern. After writing the file,
|
||||
\fBpcretest\fP expects to read a new pattern.
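Building that 8-byte header can be sketched in C. This is a minimal sketch that assumes only the layout described above. The helper names are illustrative, not pcretest's.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value as four bytes, most significant byte first. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Read a big-endian 32-bit value regardless of host byte order. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Fill the header: compiled-pattern length, then study-data length
   (zero when there is no study data). */
static void make_header(unsigned char hdr[8], uint32_t pattern_len,
                        uint32_t study_len)
{
    put_be32(hdr, pattern_len);
    put_be32(hdr + 4, study_len);
}
```

Because the lengths are assembled byte by byte, the header reads the same on hosts of either endianness.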
|
||||
.P
|
||||
A saved pattern can be reloaded into \fBpcretest\fP by specifying < and a file
|
||||
name instead of a pattern. The name of the file must not contain a < character,
|
||||
as otherwise \fBpcretest\fP will interpret the line as a pattern delimited by <
|
||||
characters.
|
||||
For example:
|
||||
.sp
|
||||
re> </some/file
|
||||
Compiled regex loaded from /some/file
|
||||
No study data
|
||||
.sp
|
||||
When the pattern has been loaded, \fBpcretest\fP proceeds to read data lines in
|
||||
the usual way.
|
||||
.P
|
||||
You can copy a file written by \fBpcretest\fP to a different host and reload it
|
||||
there, even if the new host has opposite endianness to the one on which the
|
||||
pattern was compiled. For example, you can compile on an i86 machine and run on
|
||||
a SPARC machine.
|
||||
.P
|
||||
File names for saving and reloading can be absolute or relative, but note that
|
||||
the shell facility of expanding a file name that starts with a tilde (~) is not
|
||||
available.
|
||||
.P
|
||||
The ability to save and reload files in \fBpcretest\fP is intended for testing
|
||||
and experimentation. It is not intended for production use because only a
|
||||
single pattern can be written to a file. Furthermore, there is no facility for
|
||||
supplying custom character tables for use with a reloaded pattern. If the
|
||||
original pattern was compiled with custom tables, an attempt to match a subject
|
||||
string using a reloaded pattern is likely to cause \fBpcretest\fP to crash.
|
||||
Finally, if you attempt to load a file that is not in the correct format, the
|
||||
result is undefined.
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 29 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,569 @@
|
|||
PCRETEST(1) PCRETEST(1)
|
||||
|
||||
|
||||
NAME
|
||||
pcretest - a program for testing Perl-compatible regular expressions.
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
|
||||
pcretest [options] [source] [destination]
|
||||
|
||||
pcretest was written as a test program for the PCRE regular expression
|
||||
library itself, but it can also be used for experimenting with regular
|
||||
expressions. This document describes the features of the test program;
|
||||
for details of the regular expressions themselves, see the pcrepattern
|
||||
documentation. For details of the PCRE library function calls and their
|
||||
options, see the pcreapi documentation.
|
||||
|
||||
|
||||
OPTIONS
|
||||
|
||||
-C Output the version number of the PCRE library, and all avail-
|
||||
able information about the optional features that are
|
||||
included, and then exit.
|
||||
|
||||
-d Behave as if each regex has the /D (debug) modifier; the
|
||||
internal form is output after compilation.
|
||||
|
||||
-dfa Behave as if each data line contains the \D escape sequence;
|
||||
this causes the alternative matching function,
|
||||
pcre_dfa_exec(), to be used instead of the standard
|
||||
pcre_exec() function (more detail is given below).
|
||||
|
||||
-i Behave as if each regex has the /I modifier; information
|
||||
about the compiled pattern is given after compilation.
|
||||
|
||||
-m Output the size of each compiled pattern after it has been
|
||||
compiled. This is equivalent to adding /M to each regular
|
||||
expression. For compatibility with earlier versions of
|
||||
pcretest, -s is a synonym for -m.
|
||||
|
||||
-o osize Set the number of elements in the output vector that is used
|
||||
when calling pcre_exec() to be osize. The default value is
|
||||
45, which is enough for 14 capturing subexpressions. The vec-
|
||||
tor size can be changed for individual matching calls by
|
||||
including \O in the data line (see below).
|
||||
|
||||
-p Behave as if each regex has the /P modifier; the POSIX wrap-
|
||||
per API is used to call PCRE. None of the other options has
|
||||
any effect when -p is set.
|
||||
|
||||
-q Do not output the version number of pcretest at the start of
|
||||
execution.
|
||||
|
||||
-S size On Unix-like systems, set the size of the runtime stack to
|
||||
size megabytes.
|
||||
|
||||
-t Run each compile, study, and match many times with a timer,
|
||||
and output resulting time per compile or match (in millisec-
|
||||
onds). Do not set -m with -t, because you will then get the
|
||||
size output a zillion times, and the timing will be dis-
|
||||
torted.
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
|
||||
If pcretest is given two filename arguments, it reads from the first
|
||||
and writes to the second. If it is given only one filename argument, it
|
||||
reads from that file and writes to stdout. Otherwise, it reads from
|
||||
stdin and writes to stdout, and prompts for each line of input, using
|
||||
"re>" to prompt for regular expressions, and "data>" to prompt for data
|
||||
lines.
|
||||
|
||||
The program handles any number of sets of input on a single input file.
|
||||
Each set starts with a regular expression, and continues with any num-
|
||||
ber of data lines to be matched against the pattern.
|
||||
|
||||
Each data line is matched separately and independently. If you want to
|
||||
do multi-line matches, you have to use the \n escape sequence (or \r or
|
||||
\r\n, depending on the newline setting) in a single line of input to
|
||||
encode the newline characters. There is no limit on the length of data
|
||||
lines; the input buffer is automatically extended if it is too small.
|
||||
|
||||
An empty line signals the end of the data lines, at which point a new
|
||||
regular expression is read. The regular expressions are given enclosed
|
||||
in any non-alphanumeric delimiters other than backslash, for example:
|
||||
|
||||
/(a|bc)x+yz/
|
||||
|
||||
White space before the initial delimiter is ignored. A regular expres-
|
||||
sion may be continued over several input lines, in which case the new-
|
||||
line characters are included within it. It is possible to include the
|
||||
delimiter within the pattern by escaping it, for example
|
||||
|
||||
/abc\/def/
|
||||
|
||||
If you do so, the escape and the delimiter form part of the pattern,
|
||||
but since delimiters are always non-alphanumeric, this does not affect
|
||||
its interpretation. If the terminating delimiter is immediately fol-
|
||||
lowed by a backslash, for example,
|
||||
|
||||
/abc/\
|
||||
|
||||
then a backslash is added to the end of the pattern. This is done to
|
||||
provide a way of testing the error condition that arises if a pattern
|
||||
finishes with a backslash, because
|
||||
|
||||
/abc\/
|
||||
|
||||
is interpreted as the first line of a pattern that starts with "abc/",
|
||||
causing pcretest to read the next line as a continuation of the regular
|
||||
expression.
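The delimiter rules above can be sketched in C for a pattern that fits on one line. This is a simplified sketch, and `split_pattern()` is a hypothetical helper. pcretest continues a pattern onto further lines, but this sketch reports that case as an error instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Splits a one-line specification such as "/abc\/def/i" into pattern
   text and modifiers. Returns 0 on success, -1 for an invalid delimiter
   or a missing closing delimiter. Both buffers need strlen(spec)+1 bytes. */
static int split_pattern(const char *spec, char *pattern, char *mods)
{
    const char *p = spec;
    char delim;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;   /* leading white space ignored */
    delim = *p++;
    if (delim == '\0' || delim == '\\' || isalnum((unsigned char)delim))
        return -1;
    for (; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0') {     /* keep escape and next char */
            pattern[n++] = *p++;
            pattern[n++] = *p;
            continue;
        }
        if (*p == delim) {
            if (p[1] == '\\') {               /* "/abc/\" adds a backslash */
                pattern[n++] = '\\';
                p++;
            }
            pattern[n] = '\0';
            strcpy(mods, p + 1);
            return 0;
        }
        pattern[n++] = *p;
    }
    return -1;                                /* would need a continuation line */
}
```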
|
||||
|
||||
|
||||
PATTERN MODIFIERS
|
||||
|
||||
A pattern may be followed by any number of modifiers, which are mostly
|
||||
single characters. Following Perl usage, these are referred to below
|
||||
as, for example, "the /i modifier", even though the delimiter of the
|
||||
pattern need not always be a slash, and no slash is used when writing
|
||||
modifiers. Whitespace may appear between the final pattern delimiter
|
||||
and the first modifier, and between the modifiers themselves.
|
||||
|
||||
The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE,
|
||||
PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre_com-
|
||||
pile() is called. These four modifier letters have the same effect as
|
||||
they do in Perl. For example:
|
||||
|
||||
/caseless/i
|
||||
|
||||
The following table shows additional modifiers for setting PCRE options
|
||||
that do not correspond to anything in Perl:
|
||||
|
||||
/A PCRE_ANCHORED
|
||||
/C PCRE_AUTO_CALLOUT
|
||||
/E PCRE_DOLLAR_ENDONLY
|
||||
/f PCRE_FIRSTLINE
|
||||
/J PCRE_DUPNAMES
|
||||
/N PCRE_NO_AUTO_CAPTURE
|
||||
/U PCRE_UNGREEDY
|
||||
/X PCRE_EXTRA
|
||||
/<cr> PCRE_NEWLINE_CR
|
||||
/<lf> PCRE_NEWLINE_LF
|
||||
/<crlf> PCRE_NEWLINE_CRLF
|
||||
|
||||
Those specifying line endings are literal strings as shown. Details of
|
||||
the meanings of these PCRE options are given in the pcreapi documenta-
|
||||
tion.
|
||||
|
||||
Finding all matches in a string
|
||||
|
||||
Searching for all possible matches within each subject string can be
|
||||
requested by the /g or /G modifier. After finding a match, PCRE is
|
||||
called again to search the remainder of the subject string. The differ-
|
||||
ence between /g and /G is that the former uses the startoffset argument
|
||||
to pcre_exec() to start searching at a new point within the entire
|
||||
string (which is in effect what Perl does), whereas the latter passes
|
||||
over a shortened substring. This makes a difference to the matching
|
||||
process if the pattern begins with a lookbehind assertion (including \b
|
||||
or \B).
|
||||
|
||||
If any call to pcre_exec() in a /g or /G sequence matches an empty
|
||||
string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
|
||||
flags set in order to search for another, non-empty, match at the same
|
||||
point. If this second match fails, the start offset is advanced by
|
||||
one, and the normal match is retried. This imitates the way Perl han-
|
||||
dles such cases when using the /g modifier or the split() function.
|
||||
|
||||
Other modifiers
|
||||
|
||||
There are yet more modifiers for controlling the way pcretest operates.
|
||||
|
||||
The /+ modifier requests that as well as outputting the substring that
|
||||
matched the entire pattern, pcretest should in addition output the
|
||||
remainder of the subject string. This is useful for tests where the
|
||||
subject contains multiple copies of the same substring.
|
||||
|
||||
The /L modifier must be followed directly by the name of a locale, for
|
||||
example,
|
||||
|
||||
/pattern/Lfr_FR
|
||||
|
||||
For this reason, it must be the last modifier. The given locale is set,
|
||||
pcre_maketables() is called to build a set of character tables for the
|
||||
locale, and this is then passed to pcre_compile() when compiling the
|
||||
regular expression. Without an /L modifier, NULL is passed as the
|
||||
tables pointer; that is, /L applies only to the expression on which it
|
||||
appears.
|
||||
|
||||
The /I modifier requests that pcretest output information about the
|
||||
compiled pattern (whether it is anchored, has a fixed first character,
|
||||
and so on). It does this by calling pcre_fullinfo() after compiling a
|
||||
pattern. If the pattern is studied, the results of that are also out-
|
||||
put.
|
||||
|
||||
The /D modifier is a PCRE debugging feature, which also implies /I. It
|
||||
causes the internal form of compiled regular expressions to be output
|
||||
after compilation. If the pattern was studied, the information returned
|
||||
is also output.
|
||||
|
||||
The /F modifier causes pcretest to flip the byte order of the fields in
|
||||
the compiled pattern that contain 2-byte and 4-byte numbers. This
|
||||
facility is for testing the feature in PCRE that allows it to execute
|
||||
patterns that were compiled on a host with a different endianness. This
|
||||
feature is not available when the POSIX interface to PCRE is being
|
||||
used, that is, when the /P pattern modifier is specified. See also the
|
||||
section about saving and reloading compiled patterns below.
|
||||
|
||||
The /S modifier causes pcre_study() to be called after the expression
|
||||
has been compiled, and the results used when the expression is matched.
|
||||
|
||||
The /M modifier causes the size of memory block used to hold the com-
|
||||
piled pattern to be output.
|
||||
|
||||
The /P modifier causes pcretest to call PCRE via the POSIX wrapper API
|
||||
rather than its native API. When this is done, all other modifiers
|
||||
except /i, /m, and /+ are ignored. REG_ICASE is set if /i is present,
|
||||
and REG_NEWLINE is set if /m is present. The wrapper functions force
|
||||
PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
|
||||
|
||||
The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option
|
||||
set. This turns on support for UTF-8 character handling in PCRE, pro-
|
||||
vided that it was compiled with this support enabled. This modifier
|
||||
also causes any non-printing characters in output strings to be printed
|
||||
using the \x{hh...} notation if they are valid UTF-8 sequences.
|
||||
|
||||
If the /? modifier is used with /8, it causes pcretest to call
|
||||
pcre_compile() with the PCRE_NO_UTF8_CHECK option, to suppress the
|
||||
checking of the string for UTF-8 validity.
|
||||
|
||||
|
||||
DATA LINES
|
||||
|
||||
Before each data line is passed to pcre_exec(), leading and trailing
|
||||
whitespace is removed, and it is then scanned for \ escapes. Some of
|
||||
these are pretty esoteric features, intended for checking out some of
|
||||
the more complicated features of PCRE. If you are just testing "ordi-
|
||||
nary" regular expressions, you probably don't need any of these. The
|
||||
following escapes are recognized:
|
||||
|
||||
\a alarm (= BEL)
|
||||
\b backspace
|
||||
\e escape
|
||||
\f formfeed
|
||||
\n newline
|
||||
\qdd set the PCRE_MATCH_LIMIT limit to dd
|
||||
(any number of digits)
|
||||
\r carriage return
|
||||
\t tab
|
||||
\v vertical tab
|
||||
\nnn octal character (up to 3 octal digits)
|
||||
\xhh hexadecimal character (up to 2 hex digits)
|
||||
\x{hh...} hexadecimal character, any number of digits
|
||||
in UTF-8 mode
|
||||
\A pass the PCRE_ANCHORED option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\B pass the PCRE_NOTBOL option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\Cdd call pcre_copy_substring() for substring dd
|
||||
after a successful match (number less than 32)
|
||||
\Cname call pcre_copy_named_substring() for substring
|
||||
"name" after a successful match (name termin-
|
||||
ated by next non-alphanumeric character)
|
||||
\C+ show the current captured substrings at callout
|
||||
time
|
||||
\C- do not supply a callout function
|
||||
\C!n return 1 instead of 0 when callout number n is
|
||||
reached
|
||||
\C!n!m return 1 instead of 0 when callout number n is
|
||||
reached for the mth time
|
||||
\C*n pass the number n (may be negative) as callout
|
||||
data; this is used as the callout return value
|
||||
\D use the pcre_dfa_exec() match function
|
||||
\F only shortest match for pcre_dfa_exec()
|
||||
\Gdd call pcre_get_substring() for substring dd
|
||||
after a successful match (number less than 32)
|
||||
\Gname call pcre_get_named_substring() for substring
|
||||
"name" after a successful match (name termin-
|
||||
ated by next non-alphanumeric character)
|
||||
\L call pcre_get_substringlist() after a
|
||||
successful match
|
||||
\M discover the minimum MATCH_LIMIT and
|
||||
MATCH_LIMIT_RECURSION settings
|
||||
\N pass the PCRE_NOTEMPTY option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\Odd set the size of the output vector passed to
|
||||
pcre_exec() to dd (any number of digits)
|
||||
\P pass the PCRE_PARTIAL option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd
|
||||
(any number of digits)
|
||||
\R pass the PCRE_DFA_RESTART option to pcre_dfa_exec()
|
||||
\S output details of memory get/free calls during matching
|
||||
\Z pass the PCRE_NOTEOL option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\? pass the PCRE_NO_UTF8_CHECK option to
|
||||
pcre_exec() or pcre_dfa_exec()
|
||||
\>dd start the match at offset dd (any number of digits);
|
||||
this sets the startoffset argument for pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\<cr> pass the PCRE_NEWLINE_CR option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\<lf> pass the PCRE_NEWLINE_LF option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\<crlf> pass the PCRE_NEWLINE_CRLF option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
|
||||
The escapes that specify line endings are literal strings, exactly as
|
||||
shown. A backslash followed by anything else just escapes the anything
|
||||
else. If the very last character is a backslash, it is ignored. This
|
||||
gives a way of passing an empty line as data, since a real empty line
|
||||
terminates the data input.
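The data-line rules above can be illustrated with a minimal C sketch. It assumes a reduced escape set (only \n and \t are decoded; real pcretest also handles octal, hexadecimal and the option escapes listed earlier). Any other escaped character stands for itself, and a trailing backslash is dropped, so a line holding a single backslash yields an empty subject. The function process_data_line is a hypothetical name, not part of pcretest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not pcretest's code): decodes a reduced set of
   data-line escapes from `in` into `out`, which must have room for
   strlen(in) + 1 bytes. Returns the number of bytes written. */
size_t process_data_line(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (*in != '\\') {
            out[n++] = *in++;              /* ordinary character */
            continue;
        }
        in++;                              /* skip the backslash */
        if (*in == '\0')
            break;                         /* trailing backslash is ignored */
        switch (*in) {
        case 'n': out[n++] = '\n'; break;
        case 't': out[n++] = '\t'; break;
        default:  out[n++] = *in;  break;  /* other escaped chars stand for themselves */
        }
        in++;
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, a data line consisting of just a backslash produces a zero-length subject, which is how an empty line can be passed as data.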
|
||||
|
||||
If \M is present, pcretest calls pcre_exec() several times, with dif-
|
||||
ferent values in the match_limit and match_limit_recursion fields of
|
||||
the pcre_extra data structure, until it finds the minimum numbers for
|
||||
each parameter that allow pcre_exec() to complete. The match_limit num-
|
||||
ber is a measure of the amount of backtracking that takes place, and
|
||||
checking it out can be instructive. For most simple matches, the number
|
||||
is quite small, but for patterns with very large numbers of matching
|
||||
possibilities, it can become large very quickly with increasing length
|
||||
of subject string. The match_limit_recursion number is a measure of how
|
||||
much stack (or, if PCRE is compiled with NO_RECURSE, how much heap)
|
||||
memory is needed to complete the match attempt.
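The search that \M performs can be pictured as a doubling phase followed by a bisection. The C sketch below assumes that success is monotonic in the limit (every value at or above the minimum lets the match complete). The predicate stands in for "run pcre_exec() with this limit"; the names limit_ok_fn, threshold_predicate and find_min_limit are hypothetical, and pcretest's own search loop may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if a match attempt completes when the
   limit is `limit`. Assumed monotonic: once it succeeds, larger limits do too. */
typedef int (*limit_ok_fn)(unsigned long limit, void *ctx);

/* Stand-in predicate for demonstration: ctx points to the true minimum. */
int threshold_predicate(unsigned long limit, void *ctx)
{
    return limit >= *(const unsigned long *)ctx;
}

/* Returns the smallest limit in 1..max_limit for which ok() succeeds,
   or 0 if even max_limit is not enough. */
unsigned long find_min_limit(limit_ok_fn ok, void *ctx, unsigned long max_limit)
{
    unsigned long lo = 0;   /* largest limit known to fail (0 is assumed to fail) */
    unsigned long hi = 1;   /* candidate limit */

    assert(ok != NULL && max_limit >= 1);

    /* Doubling phase: find some limit that is large enough. */
    while (!ok(hi, ctx)) {
        if (hi >= max_limit)
            return 0;
        lo = hi;
        hi = (hi > max_limit / 2) ? max_limit : hi * 2;
    }

    /* Bisection: ok(hi) succeeds and lo fails; shrink the gap to one. */
    while (hi - lo > 1) {
        unsigned long mid = lo + (hi - lo) / 2;
        if (ok(mid, ctx))
            hi = mid;
        else
            lo = mid;
    }
    return hi;
}
```

The doubling phase keeps the number of trial matches logarithmic even when the required limit is large, which matters because each trial is a full match attempt.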
|
||||
|
||||
When \O is used, the value specified may be higher or lower than the
|
||||
size set by the -O command line option (or defaulted to 45); \O applies
|
||||
only to the call of pcre_exec() for the line in which it appears.
|
||||
|
||||
If the /P modifier was present on the pattern, causing the POSIX wrap-
|
||||
per API to be used, the only option-setting sequences that have any
|
||||
effect are \B and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively,
|
||||
to be passed to regexec().
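The effect of REG_NOTBOL and REG_NOTEOL can be tried out with the system's POSIX regex library, sketched below in C. This uses the platform <regex.h> rather than PCRE's pcreposix wrapper (which supplies functions with the same names), so the pattern syntax here is POSIX extended rather than Perl-compatible; matches_with_flags is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Compiles `pattern` as a POSIX extended regex and runs regexec() with
   `eflags`. Returns 1 for a match, 0 for no match, -1 on any error. */
int matches_with_flags(const char *pattern, const char *subject, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, eflags);
    regfree(&re);
    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

With REG_NOTBOL, ^abc no longer matches at the start of "abcdef"; with REG_NOTEOL, def$ no longer matches at its end.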
|
||||
|
||||
The use of \x{hh...} to represent UTF-8 characters is not dependent on
|
||||
the use of the /8 modifier on the pattern. It is always recognized.
|
||||
There may be any number of hexadecimal digits inside the braces. The
|
||||
result is from one to six bytes, encoded according to the UTF-8 rules.
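The one-to-six-byte range comes from the original UTF-8 definition, which covers values up to 0x7FFFFFFF; the current standard (RFC 3629) limits UTF-8 to four bytes. A minimal C sketch of the six-byte scheme, using the hypothetical name utf8_encode:

```c
#include <assert.h>
#include <stddef.h>

/* Encodes `cp` (0 .. 0x7FFFFFFF) into `buf` using the original UTF-8
   scheme of up to six bytes. Returns the byte count (1-6), or 0 if cp
   is out of range. */
size_t utf8_encode(unsigned long cp, unsigned char buf[6])
{
    /* Largest value representable with 1, 2, ... 6 bytes. */
    static const unsigned long limits[6] = {
        0x7FUL, 0x7FFUL, 0xFFFFUL, 0x1FFFFFUL, 0x3FFFFFFUL, 0x7FFFFFFFUL
    };
    /* Lead-byte marker bits for each sequence length. */
    static const unsigned char lead[6] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    size_t len, i;

    assert(buf != NULL);
    for (len = 0; len < 6 && cp > limits[len]; len++)
        ;
    if (len == 6)
        return 0;                          /* needs more than 31 bits */
    len++;                                 /* number of bytes required */

    /* Continuation bytes carry 6 bits each (10xxxxxx), filled from the end. */
    for (i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    buf[0] = (unsigned char)(lead[len - 1] | cp);
    return len;
}
```

For example, \x{20ac} corresponds to the three bytes E2 82 AC.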
|
||||
|
||||
|
||||
THE ALTERNATIVE MATCHING FUNCTION
|
||||
|
||||
By default, pcretest uses the standard PCRE matching function,
pcre_exec(), to match each data line. From release 6.0, PCRE supports an
alternative matching function, pcre_dfa_exec(), which operates in a
different way and has some restrictions. The differences between the
two functions are described in the pcrematching documentation.
|
||||
|
||||
If a data line contains the \D escape sequence, or if the command line
|
||||
contains the -dfa option, the alternative matching function is called.
|
||||
This function finds all possible matches at a given point. If, however,
|
||||
the \F escape sequence is present in the data line, it stops after the
|
||||
first match is found. This is always the shortest possible match.
|
||||
|
||||
|
||||
DEFAULT OUTPUT FROM PCRETEST
|
||||
|
||||
This section describes the output when the normal matching function,
|
||||
pcre_exec(), is being used.
|
||||
|
||||
When a match succeeds, pcretest outputs the list of captured substrings
|
||||
that pcre_exec() returns, starting with number 0 for the string that
|
||||
matched the whole pattern. Otherwise, it outputs "No match" or "Partial
match" when pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
respectively, and otherwise the PCRE negative error number. Here is an
example of an interactive pcretest run.
|
||||
|
||||
$ pcretest
|
||||
PCRE version 5.00 07-Sep-2004
|
||||
|
||||
re> /^abc(\d+)/
|
||||
data> abc123
|
||||
0: abc123
|
||||
1: 123
|
||||
data> xyz
|
||||
No match
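Each numbered line corresponds to a pair of offsets in the output vector that pcre_exec() fills in: when it returns rc > 0, ovector[2*i] and ovector[2*i+1] are the start and end of substring i for i < rc. The C sketch below formats such a vector in the layout shown above. print_substrings is hypothetical, the offsets are supplied by hand rather than by a real pcre_exec() call, and showing an unset substring (offset -1) as "<unset>" is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes one line per captured substring into `out` (size `outsize`),
   formatted as "%2d: text". `ovector` holds start/end offset pairs as
   returned by a successful match with return value rc. */
void print_substrings(const char *subject, const int *ovector, int rc,
                      char *out, size_t outsize)
{
    size_t used = 0;
    int i;

    assert(outsize > 0);
    out[0] = '\0';
    for (i = 0; i < rc && used < outsize; i++) {
        int start = ovector[2 * i], end = ovector[2 * i + 1];
        int n;

        if (start < 0)   /* pair not set by the match */
            n = snprintf(out + used, outsize - used, "%2d: <unset>\n", i);
        else
            n = snprintf(out + used, outsize - used, "%2d: %.*s\n",
                         i, end - start, subject + start);
        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```

For /^abc(\d+)/ against "abc123", the vector {0, 6, 3, 6} with rc = 2 gives the two lines of the example.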
|
||||
|
||||
If the strings contain any non-printing characters, they are output as
|
||||
\0x escapes, or as \x{...} escapes if the /8 modifier was present on
|
||||
the pattern. If the pattern has the /+ modifier, the output for
substring 0 is followed by the rest of the subject string, identified
by "0+" like this:
|
||||
|
||||
re> /cat/+
|
||||
data> cataract
|
||||
0: cat
|
||||
0+ aract
|
||||
|
||||
If the pattern has the /g or /G modifier, the results of successive
|
||||
matching attempts are output in sequence, like this:
|
||||
|
||||
re> /\Bi(\w\w)/g
|
||||
data> Mississippi
|
||||
0: iss
|
||||
1: ss
|
||||
0: iss
|
||||
1: ss
|
||||
0: ipp
|
||||
1: pp
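The successive attempts that /g makes can be approximated with the system POSIX regex API by restarting each search just past the previous match. Standard POSIX extended regexes do not define \B or \w, so this C sketch would be used with i([[:alnum:]_][[:alnum:]_]), which happens to find the same three matches in "Mississippi". find_all is a hypothetical helper, not pcretest's own loop.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Finds successive non-overlapping matches of the extended regex `pattern`
   in `subject`, storing the start offset of each whole match in `starts`.
   Returns the number found (at most `max`), or -1 if compilation fails. */
int find_all(const char *pattern, const char *subject, int *starts, int max)
{
    regex_t re;
    regmatch_t m[2];
    size_t offset = 0, len = strlen(subject);
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (count < max && offset <= len) {
        /* After the first attempt the search does not begin at the real
           start of the subject, so ^ must not match there. */
        int eflags = (offset > 0) ? REG_NOTBOL : 0;

        if (regexec(&re, subject + offset, 2, m, eflags) != 0)
            break;                                   /* no further match */
        starts[count++] = (int)(offset + (size_t)m[0].rm_so);
        /* Resume after this match; step past an empty match to avoid looping. */
        offset += (m[0].rm_eo > m[0].rm_so) ? (size_t)m[0].rm_eo
                                            : (size_t)m[0].rm_eo + 1;
    }
    regfree(&re);
    return count;
}
```

Passing a shortened subject with REG_NOTBOL is only an approximation: a pattern that looks behind the restart point, such as one beginning with \B, cannot see the preceding characters, which is why pcre_exec() takes a separate start offset argument instead.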
|
||||
|
||||
"No match" is output only if the first match attempt fails.
|
||||
|
||||
If any of the sequences \C, \G, or \L are present in a data line that
|
||||
is successfully matched, the substrings extracted by the convenience
|
||||
functions are output with C, G, or L after the string number instead of
|
||||
a colon. This is in addition to the normal full list. The string length
|
||||
(that is, the return from the extraction function) is given in paren-
|
||||
theses after each string for \C and \G.
|
||||
|
||||
Note that while patterns can be continued over several lines (a plain
">" prompt is used for continuations), data lines may not. However,
newlines can be included in data by means of the \n escape (or \r or \r\n
for those newline settings).
|
||||
|
||||
|
||||
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||
|
||||
When the alternative matching function, pcre_dfa_exec(), is used (by
|
||||
means of the \D escape sequence or the -dfa command line option), the
|
||||
output consists of a list of all the matches that start at the first
|
||||
point in the subject where there is at least one match. For example:
|
||||
|
||||
re> /(tang|tangerine|tan)/
|
||||
data> yellow tangerine\D
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
|
||||
(Using the normal matching function on this data finds only "tang".)
|
||||
The longest matching string is always given first (and numbered zero).
|
||||
|
||||
If /g is present on the pattern, the search for further matches
resumes at the end of the longest match. For example:
|
||||
|
||||
re> /(tang|tangerine|tan)/g
|
||||
data> yellow tangerine and tangy sultana\D
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
0: tang
|
||||
1: tan
|
||||
0: tan
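For a pattern that is only an alternation of literal strings, the output above can be reproduced by a toy C model: find the first subject position where at least one alternative matches, report every alternative that matches there, longest first, and for /g resume at the end of the longest one. This is a simplification under that assumption, not the algorithm pcre_dfa_exec() actually uses, and all_matches_at_first_point is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model for a pattern (alt0|alt1|...) made only of literals.
   Searches `subject` from offset `from`. On success stores the start
   offset in *start and the lengths of all alternatives matching there,
   longest first, in lens[] (room for nalts entries), and returns how
   many there are. Returns 0 if nothing matches. */
size_t all_matches_at_first_point(const char *subject, size_t from,
                                  const char *const *alts, size_t nalts,
                                  size_t *start, size_t *lens)
{
    size_t len = strlen(subject), pos, i, j, n;

    for (pos = from; pos <= len; pos++) {
        n = 0;
        for (i = 0; i < nalts; i++) {
            size_t alen = strlen(alts[i]);

            if (alen <= len - pos && strncmp(subject + pos, alts[i], alen) == 0) {
                /* Insertion sort keeps lens[] in descending order. */
                for (j = n; j > 0 && lens[j - 1] < alen; j--)
                    lens[j] = lens[j - 1];
                lens[j] = alen;
                n++;
            }
        }
        if (n > 0) {
            *start = pos;
            return n;
        }
    }
    return 0;
}
```

Calling it from offset 0 on "yellow tangerine and tangy sultana" with the alternatives tang, tangerine and tan gives lengths 9, 4, 3 at offset 7; resuming at 7 + 9 gives 4, 3; resuming again gives 3, matching the three groups of output above.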
|
||||
|
||||
Since the alternative matching function does not support substring
capture, the escape sequences that are concerned with captured
substrings are not relevant.
|
||||
|
||||
|
||||
RESTARTING AFTER A PARTIAL MATCH
|
||||
|
||||
When the alternative matching function has given the PCRE_ERROR_PARTIAL
|
||||
return, indicating that the subject partially matched the pattern, you
|
||||
can restart the match with additional subject data by means of the \R
|
||||
escape sequence. For example:
|
||||
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 23ja\P\D
|
||||
Partial match: 23ja
|
||||
data> n05\R\D
|
||||
0: n05
|
||||
|
||||
For further information about partial matching, see the pcrepartial
|
||||
documentation.
|
||||
|
||||
|
||||
CALLOUTS
|
||||
|
||||
If the pattern contains any callout requests, pcretest's callout func-
|
||||
tion is called during matching. This works with both matching func-
|
||||
tions. By default, the called function displays the callout number, the
|
||||
start and current positions in the text at the callout time, and the
|
||||
next pattern item to be tested. For example, the output
|
||||
|
||||
--->pqrabcdef
  0    ^  ^       \d
|
||||
|
||||
indicates that callout number 0 occurred for a match attempt starting
|
||||
at the fourth character of the subject string, when the pointer was at
|
||||
the seventh character of the data, and when the next pattern item was
|
||||
\d. Just one circumflex is output if the start and current positions
|
||||
are the same.
|
||||
|
||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||
a result of the /C pattern modifier. In this case, instead of showing
|
||||
the callout number, the offset in the pattern, preceded by a plus, is
|
||||
output. For example:
|
||||
|
||||
re> /\d?[A-E]\*/C
|
||||
data> E*
|
||||
--->E*
 +0 ^      \d?
 +3 ^      [A-E]
 +8 ^^     \*
+10 ^ ^
|
||||
0: E*
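The marker lines in these examples follow a fixed layout: a four-character prefix (the callout number, or for callout 255 the signed pattern offset) under "--->", a caret under the start position and another under the current position, then the next pattern item. The C sketch below reproduces that layout. format_callout_line is hypothetical, and padding four columns past the end of the subject before the pattern item is an assumption about spacing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends one character if there is room (always leaving space for '\0'). */
static void put(char *out, size_t outsize, size_t *used, char c)
{
    if (*used + 1 < outsize)
        out[(*used)++] = c;
}

/* Builds the marker line shown under "--->subject". For number 255
   (an automatic callout) the signed pattern offset replaces the number.
   start and current are offsets into a subject of subject_len bytes. */
void format_callout_line(char *out, size_t outsize, int number,
                         int pattern_offset, size_t start, size_t current,
                         size_t subject_len, const char *next_item)
{
    size_t used, i;

    assert(outsize > 0 && start <= current && current <= subject_len);

    /* A 4-character prefix lines up with the 4-character "--->". */
    if (number == 255)
        snprintf(out, outsize, "%+3d ", pattern_offset);
    else
        snprintf(out, outsize, "%3d ", number);
    used = strlen(out);

    for (i = 0; i < start; i++)
        put(out, outsize, &used, ' ');
    put(out, outsize, &used, '^');                 /* start of match attempt */
    if (current > start) {                         /* one caret if positions coincide */
        for (i = 0; i + 1 < current - start; i++)
            put(out, outsize, &used, ' ');
        put(out, outsize, &used, '^');             /* current position */
    }
    /* Pad past the end of the subject, then show the next pattern item. */
    for (i = 0; i < subject_len - current + 4; i++)
        put(out, outsize, &used, ' ');
    for (i = 0; next_item[i] != '\0'; i++)
        put(out, outsize, &used, next_item[i]);
    out[used] = '\0';
}
```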
|
||||
|
||||
The callout function in pcretest returns zero (carry on matching) by
|
||||
default, but you can use a \C item in a data line (as described above)
|
||||
to change this.
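The return-value rules can be summarised in a small C sketch. Per the pcrecallout documentation, zero lets matching continue, a positive value makes matching fail at the current point (backtracking continues), and a negative value abandons the match. The struct and function below are hypothetical; giving a nonzero \C*n value priority over \C!n is an assumption, and occurrences are counted per callout number as the escape table describes, which may not match pcretest's internal counting exactly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical settings mirroring the \C data-line escapes. */
struct callout_settings {
    int fail_id;     /* \C!n: callout number that should return 1 (-1 = none) */
    int fail_count;  /* \C!n!m: return 1 on the m-th occurrence (1 for \C!n) */
    int data;        /* \C*n: nonzero value returned directly */
    int seen;        /* occurrences of fail_id so far */
};

/* Returns 0 to carry on matching, or another value as described above. */
int callout_result(struct callout_settings *s, int callout_number)
{
    assert(s != NULL);
    if (s->data != 0)
        return s->data;                    /* \C*n: callout data is the result */
    if (callout_number == s->fail_id && ++s->seen >= s->fail_count)
        return 1;                          /* \C!n or \C!n!m */
    return 0;                              /* default: carry on matching */
}
```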
|
||||
|
||||
Inserting callouts can be helpful when using pcretest to check compli-
|
||||
cated regular expressions. For further information about callouts, see
|
||||
the pcrecallout documentation.
|
||||
|
||||
|
||||
SAVING AND RELOADING COMPILED PATTERNS
|
||||
|
||||
The facilities described in this section are not available when the
POSIX interface to PCRE is being used, that is, when the /P pattern
modifier is specified.
|
||||
|
||||
When the POSIX interface is not in use, you can cause pcretest to write
|
||||
a compiled pattern to a file, by following the modifiers with > and a
|
||||
file name. For example:
|
||||
|
||||
/pattern/im >/some/file
|
||||
|
||||
See the pcreprecompile documentation for a discussion about saving and
|
||||
re-using compiled patterns.
|
||||
|
||||
The data that is written is binary. The first eight bytes are the
|
||||
length of the compiled pattern data followed by the length of the
|
||||
optional study data, each written as four bytes in big-endian order
|
||||
(most significant byte first). If there is no study data (either the
|
||||
pattern was not studied, or studying did not return any data), the sec-
|
||||
ond length is zero. The lengths are followed by an exact copy of the
|
||||
compiled pattern. If there is additional study data, this follows imme-
|
||||
diately after the compiled pattern. After writing the file, pcretest
|
||||
expects to read a new pattern.
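Because the two lengths are defined as big-endian, they can be written and read with shifts so that the code does not depend on the host's byte order. A minimal C sketch, with hypothetical function names, treating the compiled pattern and study data that follow the header as opaque bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores `v` in p[0..3], most significant byte first. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Reads a big-endian 32-bit value from p[0..3]. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Builds the 8-byte header: compiled-pattern length, then study length. */
void write_saved_header(unsigned char hdr[8], uint32_t regex_len, uint32_t study_len)
{
    put_be32(hdr, regex_len);
    put_be32(hdr + 4, study_len);
}

/* Parses the header; a study length of 0 means there is no study data. */
void read_saved_header(const unsigned char hdr[8], uint32_t *regex_len,
                       uint32_t *study_len)
{
    assert(regex_len != NULL && study_len != NULL);
    *regex_len = get_be32(hdr);
    *study_len = get_be32(hdr + 4);
}
```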
|
||||
|
||||
A saved pattern can be reloaded into pcretest by specifying < and a file
name instead of a pattern. The name of the file must not contain a <
character, as otherwise pcretest will interpret the line as a pattern
delimited by < characters. For example:
|
||||
|
||||
re> </some/file
|
||||
Compiled regex loaded from /some/file
|
||||
No study data
|
||||
|
||||
When the pattern has been loaded, pcretest proceeds to read data lines
|
||||
in the usual way.
|
||||
|
||||
You can copy a file written by pcretest to a different host and reload
|
||||
it there, even if the new host has opposite endianness to the one on
|
||||
which the pattern was compiled. For example, you can compile on an i86
|
||||
machine and run on a SPARC machine.
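Reloading on a host of the opposite byte order works because the library can recognize a compiled pattern whose multi-byte fields are byte-swapped, and flip them. The core test can be sketched in C as below, assuming a 32-bit marker value stored in the compiled data (PCRE uses 0x50435245, ASCII "PCRE"). swap32 and check_magic are hypothetical names, and the real code also swaps the other header fields.

```c
#include <assert.h>
#include <stdint.h>

#define PATTERN_MAGIC 0x50435245UL   /* marker value, ASCII "PCRE" */

/* Reverses the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Classifies a marker read in host byte order:
   1 = same byte order, -1 = opposite byte order (fields need swapping),
   0 = not a compiled pattern. */
int check_magic(uint32_t stored)
{
    if (stored == PATTERN_MAGIC)
        return 1;
    if (swap32(stored) == PATTERN_MAGIC)
        return -1;
    return 0;
}
```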
|
||||
|
||||
File names for saving and reloading can be absolute or relative, but
|
||||
note that the shell facility of expanding a file name that starts with
|
||||
a tilde (~) is not available.
|
||||
|
||||
The ability to save and reload files in pcretest is intended for test-
|
||||
ing and experimentation. It is not intended for production use because
|
||||
only a single pattern can be written to a file. Furthermore, there is
|
||||
no facility for supplying custom character tables for use with a
|
||||
reloaded pattern. If the original pattern was compiled with custom
|
||||
tables, an attempt to match a subject string using a reloaded pattern
|
||||
is likely to cause pcretest to crash. Finally, if you attempt to load
|
||||
a file that is not in the correct format, the result is undefined.
|
||||
|
||||
|
||||
AUTHOR
|
||||
|
||||
Philip Hazel
|
||||
University Computing Service,
|
||||
Cambridge CB2 3QG, England.
|
||||
|
||||
Last updated: 29 June 2006
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
|
@ -0,0 +1,33 @@
|
|||
The perltest program
|
||||
--------------------
|
||||
|
||||
The perltest program tests Perl's regular expressions; it has the same
|
||||
specification as pcretest, and so can be given identical input, except that
|
||||
input patterns can be followed only by Perl's lower case modifiers and /+ (as
|
||||
used by pcretest), which is recognized and handled by the program.
|
||||
|
||||
The data lines are processed as Perl double-quoted strings, so if they contain
|
||||
" $ or @ characters, these have to be escaped. For this reason, all such
|
||||
characters in testinput1 and testinput4 are escaped so that they can be used
|
||||
for perltest as well as for pcretest. The special upper case pattern
|
||||
modifiers such as /A that pcretest recognizes, and its special data line
|
||||
escapes, are not used in these files. The output should be identical, apart
|
||||
from the initial identifying banner.
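The escaping that makes these files usable by both programs is mechanical: a backslash goes before each ", $ and @. Since pcretest treats a backslash followed by any other character as that character, the escaped lines read the same to it. A small C sketch with the hypothetical name escape_for_perltest:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies `in` to `out`, putting a backslash before each ", $ and @ so the
   line survives Perl double-quoted string interpolation. `out` must have
   room for 2 * strlen(in) + 1 bytes. Returns the output length. */
size_t escape_for_perltest(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        if (*in == '"' || *in == '$' || *in == '@')
            out[n++] = '\\';
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```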
|
||||
|
||||
The perltest script can also test UTF-8 features. It works as is for Perl 5.8
|
||||
or higher. It recognizes the special modifier /8 that pcretest uses to invoke
|
||||
UTF-8 functionality. The testinput4 file can be fed to perltest to run
|
||||
compatible UTF-8 tests.
|
||||
|
||||
For Perl 5.6, perltest won't work unmodified for the UTF-8 tests. You need to
|
||||
uncomment the "use utf8" lines that it contains. It is best to do this on a
|
||||
copy of the script, because for non-UTF-8 tests, these lines should remain
|
||||
commented out.
|
||||
|
||||
The other testinput files are not suitable for feeding to perltest, since they
|
||||
make use of the special upper case modifiers and escapes that pcretest uses to
|
||||
test some features of PCRE. Some of these files also contain malformed regular
expressions, in order to check that PCRE diagnoses them correctly.
|
||||
|
||||
Philip Hazel
|
||||
September 2004
|
|
@ -0,0 +1,251 @@
|
|||
#!/bin/sh
|
||||
#
|
||||
# install - install a program, script, or datafile
|
||||
# This comes from X11R5 (mit/util/scripts/install.sh).
|
||||
#
|
||||
# Copyright 1991 by the Massachusetts Institute of Technology
|
||||
#
|
||||
# Permission to use, copy, modify, distribute, and sell this software and its
|
||||
# documentation for any purpose is hereby granted without fee, provided that
|
||||
# the above copyright notice appear in all copies and that both that
|
||||
# copyright notice and this permission notice appear in supporting
|
||||
# documentation, and that the name of M.I.T. not be used in advertising or
|
||||
# publicity pertaining to distribution of the software without specific,
|
||||
# written prior permission. M.I.T. makes no representations about the
|
||||
# suitability of this software for any purpose. It is provided "as is"
|
||||
# without express or implied warranty.
|
||||
#
|
||||
# Calling this script install-sh is preferred over install.sh, to prevent
|
||||
# `make' implicit rules from creating a file called install from it
|
||||
# when there is no Makefile.
|
||||
#
|
||||
# This script is compatible with the BSD install script, but was written
|
||||
# from scratch. It can only install one file at a time, a restriction
|
||||
# shared with many OS's install programs.
|
||||
|
||||
|
||||
# set DOITPROG to echo to test this script
|
||||
|
||||
# Don't use :- since 4.3BSD and earlier shells don't like it.
|
||||
doit="${DOITPROG-}"
|
||||
|
||||
|
||||
# put in absolute paths if you don't have them in your path; or use env. vars.
|
||||
|
||||
mvprog="${MVPROG-mv}"
|
||||
cpprog="${CPPROG-cp}"
|
||||
chmodprog="${CHMODPROG-chmod}"
|
||||
chownprog="${CHOWNPROG-chown}"
|
||||
chgrpprog="${CHGRPPROG-chgrp}"
|
||||
stripprog="${STRIPPROG-strip}"
|
||||
rmprog="${RMPROG-rm}"
|
||||
mkdirprog="${MKDIRPROG-mkdir}"
|
||||
|
||||
transformbasename=""
|
||||
transformarg=""
|
||||
instcmd="$mvprog"
|
||||
chmodcmd="$chmodprog 0755"
|
||||
chowncmd=""
|
||||
chgrpcmd=""
|
||||
stripcmd=""
|
||||
rmcmd="$rmprog -f"
|
||||
mvcmd="$mvprog"
|
||||
src=""
|
||||
dst=""
|
||||
dir_arg=""
|
||||
|
||||
while [ x"$1" != x ]; do
|
||||
case $1 in
|
||||
-c) instcmd="$cpprog"
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-d) dir_arg=true
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-m) chmodcmd="$chmodprog $2"
|
||||
shift
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-o) chowncmd="$chownprog $2"
|
||||
shift
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-g) chgrpcmd="$chgrpprog $2"
|
||||
shift
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-s) stripcmd="$stripprog"
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-t=*) transformarg=`echo $1 | sed 's/-t=//'`
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-b=*) transformbasename=`echo $1 | sed 's/-b=//'`
|
||||
shift
|
||||
continue;;
|
||||
|
||||
*) if [ x"$src" = x ]
|
||||
then
|
||||
src=$1
|
||||
else
|
||||
# this colon is to work around a 386BSD /bin/sh bug
|
||||
:
|
||||
dst=$1
|
||||
fi
|
||||
shift
|
||||
continue;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [ x"$src" = x ]
|
||||
then
|
||||
echo "install: no input file specified"
|
||||
exit 1
|
||||
else
|
||||
true
|
||||
fi
|
||||
|
||||
if [ x"$dir_arg" != x ]; then
|
||||
dst=$src
|
||||
src=""
|
||||
|
||||
if [ -d $dst ]; then
|
||||
instcmd=:
|
||||
chmodcmd=""
|
||||
else
|
||||
instcmd=mkdir
|
||||
fi
|
||||
else
|
||||
|
||||
# Waiting for this to be detected by the "$instcmd $src $dsttmp" command
|
||||
# might cause directories to be created, which would be especially bad
|
||||
# if $src (and thus $dsttmp) contains '*'.
|
||||
|
||||
if [ -f $src -o -d $src ]
|
||||
then
|
||||
true
|
||||
else
|
||||
echo "install: $src does not exist"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ x"$dst" = x ]
|
||||
then
|
||||
echo "install: no destination specified"
|
||||
exit 1
|
||||
else
|
||||
true
|
||||
fi
|
||||
|
||||
# If destination is a directory, append the input filename; if your system
|
||||
# does not like double slashes in filenames, you may need to add some logic
|
||||
|
||||
if [ -d $dst ]
|
||||
then
|
||||
dst="$dst"/`basename $src`
|
||||
else
|
||||
true
|
||||
fi
|
||||
fi
|
||||
|
||||
## this sed command emulates the dirname command
|
||||
dstdir=`echo $dst | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
|
||||
|
||||
# Make sure that the destination directory exists.
|
||||
# this part is taken from Noah Friedman's mkinstalldirs script
|
||||
|
||||
# Skip lots of stat calls in the usual case.
|
||||
if [ ! -d "$dstdir" ]; then
|
||||
defaultIFS='
|
||||
'
|
||||
IFS="${IFS-${defaultIFS}}"
|
||||
|
||||
oIFS="${IFS}"
|
||||
# Some sh's can't handle IFS=/ for some reason.
|
||||
IFS='%'
|
||||
set - `echo ${dstdir} | sed -e 's@/@%@g' -e 's@^%@/@'`
|
||||
IFS="${oIFS}"
|
||||
|
||||
pathcomp=''
|
||||
|
||||
while [ $# -ne 0 ] ; do
|
||||
pathcomp="${pathcomp}${1}"
|
||||
shift
|
||||
|
||||
if [ ! -d "${pathcomp}" ] ;
|
||||
then
|
||||
$mkdirprog "${pathcomp}"
|
||||
else
|
||||
true
|
||||
fi
|
||||
|
||||
pathcomp="${pathcomp}/"
|
||||
done
|
||||
fi
|
||||
|
||||
if [ x"$dir_arg" != x ]
|
||||
then
|
||||
$doit $instcmd $dst &&
|
||||
|
||||
if [ x"$chowncmd" != x ]; then $doit $chowncmd $dst; else true ; fi &&
|
||||
if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dst; else true ; fi &&
|
||||
if [ x"$stripcmd" != x ]; then $doit $stripcmd $dst; else true ; fi &&
|
||||
if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dst; else true ; fi
|
||||
else
|
||||
|
||||
# If we're going to rename the final executable, determine the name now.
|
||||
|
||||
if [ x"$transformarg" = x ]
|
||||
then
|
||||
dstfile=`basename $dst`
|
||||
else
|
||||
dstfile=`basename $dst $transformbasename |
|
||||
sed $transformarg`$transformbasename
|
||||
fi
|
||||
|
||||
# don't allow the sed command to completely eliminate the filename
|
||||
|
||||
if [ x"$dstfile" = x ]
|
||||
then
|
||||
dstfile=`basename $dst`
|
||||
else
|
||||
true
|
||||
fi
|
||||
|
||||
# Make a temp file name in the proper directory.
|
||||
|
||||
dsttmp=$dstdir/#inst.$$#
|
||||
|
||||
# Move or copy the file name to the temp name
|
||||
|
||||
$doit $instcmd $src $dsttmp &&
|
||||
|
||||
trap "rm -f ${dsttmp}" 0 &&
|
||||
|
||||
# and set any options; do chmod last to preserve setuid bits
|
||||
|
||||
# If any of these fail, we abort the whole thing. If we want to
|
||||
# ignore errors from any of these, just make sure not to ignore
|
||||
# errors from the above "$doit $instcmd $src $dsttmp" command.
|
||||
|
||||
if [ x"$chowncmd" != x ]; then $doit $chowncmd $dsttmp; else true;fi &&
|
||||
if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dsttmp; else true;fi &&
|
||||
if [ x"$stripcmd" != x ]; then $doit $stripcmd $dsttmp; else true;fi &&
|
||||
if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dsttmp; else true;fi &&
|
||||
|
||||
# Now rename the file to the real destination.
|
||||
|
||||
$doit $rmcmd -f $dstdir/$dstfile &&
|
||||
$doit $mvcmd $dsttmp $dstdir/$dstfile
|
||||
|
||||
fi &&
|
||||
|
||||
|
||||
exit 0
|
|
@ -0,0 +1,20 @@
|
|||
LIBRARY libpcre
|
||||
EXPORTS
|
||||
pcre_malloc
|
||||
pcre_free
|
||||
pcre_config
|
||||
pcre_callout
|
||||
pcre_compile
|
||||
pcre_copy_substring
|
||||
pcre_dfa_exec
|
||||
pcre_exec
|
||||
pcre_get_substring
|
||||
pcre_get_stringnumber
|
||||
pcre_get_substring_list
|
||||
pcre_free_substring
|
||||
pcre_free_substring_list
|
||||
pcre_info
|
||||
pcre_fullinfo
|
||||
pcre_maketables
|
||||
pcre_study
|
||||
pcre_version
|
|
@ -0,0 +1,12 @@
|
|||
# Package Information for pkg-config
|
||||
|
||||
prefix=@prefix@
|
||||
exec_prefix=@exec_prefix@
|
||||
libdir=@libdir@
|
||||
includedir=@includedir@
|
||||
|
||||
Name: libpcre
|
||||
Description: PCRE - Perl compatible regular expressions C library
|
||||
Version: @PCRE_VERSION@
|
||||
Libs: -L${libdir} -lpcre
|
||||
Cflags: -I${includedir}
|
|
@ -0,0 +1,25 @@
|
|||
LIBRARY libpcreposix
|
||||
EXPORTS
|
||||
pcre_malloc
|
||||
pcre_free
|
||||
pcre_config
|
||||
pcre_callout
|
||||
pcre_compile
|
||||
pcre_copy_substring
|
||||
pcre_dfa_exec
|
||||
pcre_exec
|
||||
pcre_get_substring
|
||||
pcre_get_stringnumber
|
||||
pcre_get_substring_list
|
||||
pcre_free_substring
|
||||
pcre_free_substring_list
|
||||
pcre_info
|
||||
pcre_fullinfo
|
||||
pcre_maketables
|
||||
pcre_study
|
||||
pcre_version
|
||||
|
||||
regcomp
|
||||
regexec
|
||||
regerror
|
||||
regfree
|