add pcre to in tree libs

git-svn-id: http://svn.freeswitch.org/svn/freeswitch/trunk@3732 d0543943-73ff-0310-b7d9-9358b9ac24b2
This commit is contained in:
Michael Jerris 2006-12-19 19:54:26 +00:00
parent f82b80b57c
commit 9da5d7e90f
166 changed files with 125793 additions and 0 deletions

23
libs/pcre/AUTHORS Normal file
View File

@ -0,0 +1,23 @@
THE MAIN PCRE LIBRARY
---------------------
Written by: Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.
Copyright (c) 1997-2006 University of Cambridge
All rights reserved
THE C++ WRAPPER LIBRARY
-----------------------
Written by: Google Inc.
Copyright (c) 2006 Google Inc
All rights reserved
####

68
libs/pcre/COPYING Normal file
View File

@ -0,0 +1,68 @@
PCRE LICENCE
------------
PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Release 6 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.
The basic library functions are written in C and are freestanding. Also
included in the distribution is a set of C++ wrapper functions.
THE BASIC LIBRARY FUNCTIONS
---------------------------
Written by: Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.
Copyright (c) 1997-2006 University of Cambridge
All rights reserved.
THE C++ WRAPPER FUNCTIONS
-------------------------
Contributed by: Google Inc.
Copyright (c) 2006, Google Inc.
All rights reserved.
THE "BSD" LICENCE
-----------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the name of Google
Inc. nor the names of their contributors may be used to endorse or
promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
End

2279
libs/pcre/ChangeLog Normal file

File diff suppressed because it is too large Load Diff

185
libs/pcre/INSTALL Normal file
View File

@ -0,0 +1,185 @@
Basic Installation
==================
These are generic installation instructions that apply to systems that
can run the `configure' shell script - Unix systems and any that imitate
it. They are not specific to PCRE. There are PCRE-specific instructions
for non-Unix systems in the file NON-UNIX-USE.
The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, a file
`config.cache' that saves the results of its tests to speed up
reconfiguring, and a file `config.log' containing compiler output
(useful mainly for debugging `configure').
If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If at some point `config.cache'
contains results you don't want to keep, you may remove or edit it.
The file `configure.in' is used to create `configure' by a program
called `autoconf'. You only need `configure.in' if you want to change
it or regenerate `configure' using a newer version of `autoconf'.
The simplest way to compile this package is:
1. `cd' to the directory containing the package's source code and type
`./configure' to configure the package for your system. If you're
using `csh' on an old version of System V, you might need to type
`sh ./configure' instead to prevent `csh' from trying to execute
`configure' itself.
Running `configure' takes awhile. While running, it prints some
messages telling which features it is checking for.
2. Type `make' to compile the package.
3. Optionally, type `make check' to run any self-tests that come with
the package.
4. Type `make install' to install the programs and any data files and
documentation.
5. You can remove the program binaries and object files from the
source code directory by typing `make clean'. To also remove the
files that `configure' created (so you can compile the package for
a different kind of computer), type `make distclean'. There is
also a `make maintainer-clean' target, but that is intended mainly
for the package's developers. If you use it, you may have to get
all sorts of other programs in order to regenerate files that came
with the distribution.
Compilers and Options
=====================
Some systems require unusual options for compilation or linking that
the `configure' script does not know about. You can give `configure'
initial values for variables by setting them in the environment. Using
a Bourne-compatible shell, you can do that on the command line like
this:
CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure
Or on systems that have the `env' program, you can do it like this:
env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure
Compiling For Multiple Architectures
====================================
You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.
If you have to use a `make' that does not supports the `VPATH'
variable, you have to compile the package for one architecture at a time
in the source code directory. After you have installed the package for
one architecture, use `make distclean' before reconfiguring for another
architecture.
Installation Names
==================
By default, `make install' will install the package's files in
`/usr/local/bin', `/usr/local/man', etc. You can specify an
installation prefix other than `/usr/local' by giving `configure' the
option `--prefix=PATH'.
You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
give `configure' the option `--exec-prefix=PATH', the package will use
PATH as the prefix for installing programs and libraries.
Documentation and other data files will still use the regular prefix.
In addition, if you use an unusual directory layout you can give
options like `--bindir=PATH' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.
If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
Optional Features
=================
Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.
For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.
Specifying the System Type
==========================
There may be some features `configure' can not figure out
automatically, but needs to determine by the type of host the package
will run on. Usually `configure' can figure that out, but if it prints
a message saying it can not guess the host type, give it the
`--host=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name with three fields:
CPU-COMPANY-SYSTEM
See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the host type.
If you are building compiler tools for cross-compiling, you can also
use the `--target=TYPE' option to select the type of system they will
produce code for and the `--build=TYPE' option to select the type of
system on which you are compiling the package.
Sharing Defaults
================
If you want to set default values for `configure' scripts to share,
you can create a site shell script called `config.site' that gives
default values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.
Operation Controls
==================
`configure' recognizes the following options to control how it
operates.
`--cache-file=FILE'
Use and save the results of the tests in FILE instead of
`./config.cache'. Set FILE to `/dev/null' to disable caching, for
debugging `configure'.
`--help'
Print a summary of the options to `configure', and exit.
`--quiet'
`--silent'
`-q'
Do not print messages saying which checks are being made. To
suppress all normal output, redirect it to `/dev/null' (any error
messages will still be shown).
`--srcdir=DIR'
Look for the package's source code in directory DIR. Usually
`configure' can determine that directory automatically.
`--version'
Print the version of Autoconf used to generate the `configure'
script, and exit.
`configure' also accepts some other, not widely useful, options.

68
libs/pcre/LICENCE Normal file
View File

@ -0,0 +1,68 @@
PCRE LICENCE
------------
PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Release 6 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.
The basic library functions are written in C and are freestanding. Also
included in the distribution is a set of C++ wrapper functions.
THE BASIC LIBRARY FUNCTIONS
---------------------------
Written by: Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.
Copyright (c) 1997-2006 University of Cambridge
All rights reserved.
THE C++ WRAPPER FUNCTIONS
-------------------------
Contributed by: Google Inc.
Copyright (c) 2006, Google Inc.
All rights reserved.
THE "BSD" LICENCE
-----------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the name of Google
Inc. nor the names of their contributors may be used to endorse or
promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
End

606
libs/pcre/Makefile.in Normal file
View File

@ -0,0 +1,606 @@
# Makefile.in for PCRE (Perl-Compatible Regular Expression) library.
#############################################################################
# PCRE is developed on a Unix system. I do not use Windows or Macs, and know
# nothing about building software on them. Although the code of PCRE should
# be very portable, the building system in this Makefile is designed for Unix
# systems. However, there are features that have been supplied to me by various
# people that should make it work on MinGW and Cygwin systems.
# This setting enables Unix-style directory scanning in pcregrep, triggered
# by the -f option. Maybe one day someone will add code for other systems.
PCREGREP_OSTYPE=-DIS_UNIX
#############################################################################
# Libtool places .o files in the .libs directory; this can mean that "make"
# thinks is it not up-to-date when in fact it is. This setting helps when
# GNU "make" is being used. It presumably does no harm in other cases.
VPATH=.libs
#---------------------------------------------------------------------------#
# The following lines are modified by "configure" to insert data that it is #
# given in its arguments, or which it finds out for itself. #
#---------------------------------------------------------------------------#
SHELL = @SHELL@
prefix = @prefix@
exec_prefix = @exec_prefix@
top_srcdir = @top_srcdir@
mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs
# NB: top_builddir is not referred to directly below, but it is used in the
# setting of $(LIBTOOL), so don't remove it!
top_builddir = .
# BINDIR is the directory in which the pcregrep, pcretest, and pcre-config
# commands are installed.
# INCDIR is the directory in which the public header files pcre.h and
# pcreposix.h are installed.
# LIBDIR is the directory in which the libraries are installed.
# MANDIR is the directory in which the man pages are installed.
BINDIR = @bindir@
LIBDIR = @libdir@
INCDIR = @includedir@
MANDIR = @mandir@
# EXEEXT is set by configure to the extention of an executable file
# OBJEXT is set by configure to the extention of an object file
# The BUILD_* equivalents are the same but for the host we're building on
EXEEXT = @EXEEXT@
OBJEXT = @OBJEXT@
# Note that these are just here to have a convenient place to look at the
# outcome.
BUILD_EXEEXT = @BUILD_EXEEXT@
BUILD_OBJEXT = @BUILD_OBJEXT@
# POSIX_OBJ and POSIX_LOBJ are either set empty, or to the names of the
# POSIX object files.
POSIX_OBJ = @POSIX_OBJ@
POSIX_LOBJ = @POSIX_LOBJ@
# The compiler, C flags, preprocessor flags, etc
CC = @CC@
CXX = @CXX@
CFLAGS = @CFLAGS@
CXXFLAGS = @CXXFLAGS@
LDFLAGS = @LDFLAGS@
CXXLDFLAGS = @CXXLDFLAGS@
CC_FOR_BUILD = @CC_FOR_BUILD@
CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
CXX_FOR_BUILD = @CXX_FOR_BUILD@
CXXFLAGS_FOR_BUILD = @CXXFLAGS_FOR_BUILD@
LDFLAGS_FOR_BUILD = $(LDFLAGS)
UCP = @UCP@
UTF8 = @UTF8@
NEWLINE = @NEWLINE@
POSIX_MALLOC_THRESHOLD = @POSIX_MALLOC_THRESHOLD@
LINK_SIZE = @LINK_SIZE@
MATCH_LIMIT = @MATCH_LIMIT@ @MATCH_LIMIT_RECURSION@
NO_RECURSE = @NO_RECURSE@
EBCDIC = @EBCDIC@
INSTALL = @INSTALL@
INSTALL_DATA = @INSTALL_DATA@
# LIBTOOL enables the building of shared and static libraries. It is set up
# to do one or the other or both by ./configure.
LIBTOOL = @LIBTOOL@
LTCOMPILE = $(LIBTOOL) --mode=compile $(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(NEWLINE) $(LINK_SIZE) $(MATCH_LIMIT) $(NO_RECURSE) $(EBCDIC)
LTCXXCOMPILE = $(LIBTOOL) --mode=compile $(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(NEWLINE) $(LINK_SIZE) $(MATCH_LIMIT) $(NO_RECURSE) $(EBCDIC)
@ON_WINDOWS@LINK = $(CC) $(LDFLAGS) -I. -I$(top_srcdir) -L.libs
@NOT_ON_WINDOWS@LINK = $(LIBTOOL) --mode=link $(CC) $(CFLAGS) $(LDFLAGS) -I. -I$(top_srcdir)
LINKLIB = $(LIBTOOL) --mode=link $(CC) -export-symbols-regex '^[^_]' $(LDFLAGS) -I. -I$(top_srcdir)
LINK_FOR_BUILD = $(LIBTOOL) --mode=link $(CC_FOR_BUILD) $(CFLAGS_FOR_BUILD) $(LDFLAGS_FOR_BUILD) -I. -I$(top_srcdir)
@ON_WINDOWS@CXXLINK = $(CXX) $(LDFLAGS) -I. -I$(top_srcdir) -L.libs
@NOT_ON_WINDOWS@CXXLINK = $(LIBTOOL) --mode=link $(CXX) $(CXXFLAGS) $(CXXLDFLAGS) -I. -I$(top_srcdir)
CXXLINKLIB = $(LIBTOOL) --mode=link $(CXX) $(LDFLAGS) -I. -I$(top_srcdir)
# These are the version numbers for the shared libraries
PCRELIBVERSION = @PCRE_LIB_VERSION@
PCREPOSIXLIBVERSION = @PCRE_POSIXLIB_VERSION@
PCRECPPLIBVERSION = @PCRE_CPPLIB_VERSION@
##############################################################################
OBJ = pcre_chartables.@OBJEXT@ \
pcre_compile.@OBJEXT@ \
pcre_config.@OBJEXT@ \
pcre_dfa_exec.@OBJEXT@ \
pcre_exec.@OBJEXT@ \
pcre_fullinfo.@OBJEXT@ \
pcre_get.@OBJEXT@ \
pcre_globals.@OBJEXT@ \
pcre_info.@OBJEXT@ \
pcre_maketables.@OBJEXT@ \
pcre_ord2utf8.@OBJEXT@ \
pcre_refcount.@OBJEXT@ \
pcre_study.@OBJEXT@ \
pcre_tables.@OBJEXT@ \
pcre_try_flipped.@OBJEXT@ \
pcre_ucp_searchfuncs.@OBJEXT@ \
pcre_valid_utf8.@OBJEXT@ \
pcre_version.@OBJEXT@ \
pcre_xclass.@OBJEXT@ \
$(POSIX_OBJ)
LOBJ = pcre_chartables.lo \
pcre_compile.lo \
pcre_config.lo \
pcre_dfa_exec.lo \
pcre_exec.lo \
pcre_fullinfo.lo \
pcre_get.lo \
pcre_globals.lo \
pcre_info.lo \
pcre_maketables.lo \
pcre_ord2utf8.lo \
pcre_refcount.lo \
pcre_study.lo \
pcre_tables.lo \
pcre_try_flipped.lo \
pcre_ucp_searchfuncs.lo \
pcre_valid_utf8.lo \
pcre_version.lo \
pcre_xclass.lo \
$(POSIX_LOBJ)
CPPOBJ = pcrecpp.@OBJEXT@ \
pcre_scanner.@OBJEXT@ \
pcre_stringpiece.@OBJEXT@
CPPLOBJ = pcrecpp.lo \
pcre_scanner.lo \
pcre_stringpiece.lo
CPP_TARGETS = libpcrecpp.la \
pcrecpp_unittest@EXEEXT@ \
pcre_scanner_unittest@EXEEXT@ \
pcre_stringpiece_unittest@EXEEXT@
all: libpcre.la @POSIX_LIB@ pcretest@EXEEXT@ pcregrep@EXEEXT@ \
@MAYBE_CPP_TARGETS@ @ON_WINDOWS@ winshared
pcregrep@EXEEXT@: libpcre.la pcregrep.@OBJEXT@ @ON_WINDOWS@ winshared
$(LINK) -o pcregrep@EXEEXT@ pcregrep.@OBJEXT@ libpcre.la
pcretest@EXEEXT@: libpcre.la @POSIX_LIB@ pcretest.@OBJEXT@ \
@ON_WINDOWS@ winshared
$(LINK) $(PURIFY) $(EFENCE) -o pcretest@EXEEXT@ \
pcretest.@OBJEXT@ \
libpcre.la @POSIX_LIB@
pcrecpp_unittest@EXEEXT@: libpcrecpp.la pcrecpp_unittest.@OBJEXT@ \
@ON_WINDOWS@ winshared
$(CXXLINK) $(PURIFY) $(EFENCE) -o pcrecpp_unittest@EXEEXT@ \
pcrecpp_unittest.@OBJEXT@ \
libpcrecpp.la @POSIX_LIB@
pcre_scanner_unittest@EXEEXT@: libpcrecpp.la pcre_scanner_unittest.@OBJEXT@ \
@ON_WINDOWS@ winshared
$(CXXLINK) $(PURIFY) $(EFENCE) \
-o pcre_scanner_unittest@EXEEXT@ \
pcre_scanner_unittest.@OBJEXT@ \
libpcrecpp.la @POSIX_LIB@
pcre_stringpiece_unittest@EXEEXT@: libpcrecpp.la \
pcre_stringpiece_unittest.@OBJEXT@ @ON_WINDOWS@ winshared
$(CXXLINK) $(PURIFY) $(EFENCE) \
-o pcre_stringpiece_unittest@EXEEXT@ \
pcre_stringpiece_unittest.@OBJEXT@ \
libpcrecpp.la @POSIX_LIB@
libpcre.la: $(OBJ)
-rm -f libpcre.la
$(LINKLIB) -rpath $(LIBDIR) -version-info \
'$(PCRELIBVERSION)' -o libpcre.la $(LOBJ)
libpcreposix.la: libpcre.la pcreposix.@OBJEXT@
-rm -f libpcreposix.la
$(LINKLIB) -rpath $(LIBDIR) libpcre.la -version-info \
'$(PCREPOSIXLIBVERSION)' -o libpcreposix.la pcreposix.lo
libpcrecpp.la: libpcre.la $(CPPOBJ)
-rm -f libpcrecpp.la
$(CXXLINKLIB) -rpath $(LIBDIR) libpcre.la -version-info \
'$(PCRECPPLIBVERSION)' -o libpcrecpp.la $(CPPLOBJ)
# Note that files generated by ./configure and by dftables are in the current
# directory, not the source directory.
pcre_chartables.@OBJEXT@: pcre_chartables.c
@$(LTCOMPILE) pcre_chartables.c
pcre_compile.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_compile.c \
$(top_srcdir)/pcre_printint.src
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_compile.c
pcre_config.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_config.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_config.c
pcre_dfa_exec.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_dfa_exec.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_dfa_exec.c
pcre_exec.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_exec.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_exec.c
pcre_fullinfo.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_fullinfo.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_fullinfo.c
pcre_get.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_get.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_get.c
pcre_globals.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_globals.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_globals.c
pcre_info.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_info.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_info.c
pcre_maketables.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_maketables.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_maketables.c
pcre_ord2utf8.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_ord2utf8.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_ord2utf8.c
pcre_refcount.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_refcount.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_refcount.c
pcre_study.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_study.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_study.c
pcre_tables.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_tables.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_tables.c
pcre_try_flipped.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_try_flipped.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_try_flipped.c
pcre_ucp_searchfuncs.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h \
$(top_srcdir)/pcre_ucp_searchfuncs.c \
$(top_srcdir)/ucptable.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_ucp_searchfuncs.c
pcre_valid_utf8.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_valid_utf8.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_valid_utf8.c
pcre_version.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_version.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_version.c
pcre_xclass.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_xclass.c
@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_xclass.c
pcreposix.@OBJEXT@: $(top_srcdir)/pcreposix.c $(top_srcdir)/pcreposix.h \
$(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre.h config.h Makefile
@$(LTCOMPILE) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcreposix.c
pcrecpp.@OBJEXT@: $(top_srcdir)/pcrecpp.cc $(top_srcdir)/pcrecpp.h \
pcrecpparg.h pcre_stringpiece.h $(top_srcdir)/pcre.h config.h Makefile
@$(LTCXXCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcrecpp.cc
pcre_scanner.@OBJEXT@: $(top_srcdir)/pcre_scanner.cc \
$(top_srcdir)/pcre_scanner.h \
$(top_srcdir)/pcrecpp.h pcrecpparg.h pcre_stringpiece.h \
$(top_srcdir)/pcre.h config.h Makefile
@$(LTCXXCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_scanner.cc
pcre_stringpiece.@OBJEXT@: $(top_srcdir)/pcre_stringpiece.cc \
pcre_stringpiece.h \
config.h Makefile
@$(LTCXXCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
$(top_srcdir)/pcre_stringpiece.cc
pcretest.@OBJEXT@: $(top_srcdir)/pcretest.c $(top_srcdir)/pcre_internal.h \
$(top_srcdir)/pcre_printint.src $(top_srcdir)/pcre.h config.h Makefile
$(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
$(LINK_SIZE) $(top_srcdir)/pcretest.c
pcrecpp_unittest.@OBJEXT@: $(top_srcdir)/pcrecpp_unittest.cc \
$(top_srcdir)/pcrecpp.h \
pcrecpparg.h pcre_stringpiece.h $(top_srcdir)/pcre.h config.h Makefile
$(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
$(LINK_SIZE) $(top_srcdir)/pcrecpp_unittest.cc
pcre_stringpiece_unittest.@OBJEXT@: $(top_srcdir)/pcre_stringpiece_unittest.cc \
pcre_stringpiece.h pcrecpparg.h config.h Makefile
$(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
$(LINK_SIZE) $(top_srcdir)/pcre_stringpiece_unittest.cc
pcre_scanner_unittest.@OBJEXT@: $(top_srcdir)/pcre_scanner_unittest.cc \
$(top_srcdir)/pcre_scanner.h \
$(top_srcdir)/pcrecpp.h pcre_stringpiece.h \
$(top_srcdir)/pcre.h pcrecpparg.h config.h Makefile
$(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
$(LINK_SIZE) $(top_srcdir)/pcre_scanner_unittest.cc
pcregrep.@OBJEXT@: $(top_srcdir)/pcregrep.c $(top_srcdir)/pcre.h Makefile config.h
$(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
$(PCREGREP_OSTYPE) $(top_srcdir)/pcregrep.c
# Some Windows-specific targets for MinGW. Do not use for Cygwin.
winshared : .libs/@WIN_PREFIX@pcre.dll .libs/@WIN_PREFIX@pcreposix.dll \
.libs/@WIN_PREFIX@pcrecpp.dll
.libs/@WIN_PREFIX@pcre.dll : libpcre.la
$(CC) $(CFLAGS) -shared -o $@ \
-Wl,--whole-archive .libs/libpcre.a \
-Wl,--out-implib,.libs/libpcre.dll.a \
-Wl,--output-def,.libs/@WIN_PREFIX@pcre.dll-def \
-Wl,--export-all-symbols \
-Wl,--no-whole-archive
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
-e "s#library_names=''#library_names='libpcre.dll.a'#" \
< .libs/libpcre.lai > .libs/libpcre.lai.tmp && \
mv -f .libs/libpcre.lai.tmp .libs/libpcre.lai
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
-e "s#library_names=''#library_names='libpcre.dll.a'#" \
< libpcre.la > libpcre.la.tmp && \
mv -f libpcre.la.tmp libpcre.la
.libs/@WIN_PREFIX@pcreposix.dll: libpcreposix.la libpcre.la
$(CC) $(CFLAGS) -shared -o $@ \
-Wl,--whole-archive .libs/libpcreposix.a \
-Wl,--out-implib,.libs/@WIN_PREFIX@pcreposix.dll.a \
-Wl,--output-def,.libs/@WIN_PREFIX@libpcreposix.dll-def \
-Wl,--export-all-symbols \
-Wl,--no-whole-archive .libs/libpcre.a
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
-e "s#library_names=''#library_names='libpcreposix.dll.a'#"\
< .libs/libpcreposix.lai > .libs/libpcreposix.lai.tmp && \
mv -f .libs/libpcreposix.lai.tmp .libs/libpcreposix.lai
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
-e "s#library_names=''#library_names='libpcreposix.dll.a'#"\
< libpcreposix.la > libpcreposix.la.tmp && \
mv -f libpcreposix.la.tmp libpcreposix.la
.libs/@WIN_PREFIX@pcrecpp.dll: libpcrecpp.la libpcre.la
$(CXX) $(CXXFLAGS) -shared -o $@ \
-Wl,--whole-archive .libs/libpcrecpp.a \
-Wl,--out-implib,.libs/@WIN_PREFIX@pcrecpp.dll.a \
-Wl,--output-def,.libs/@WIN_PREFIX@libpcrecpp.dll-def \
-Wl,--export-all-symbols \
-Wl,--no-whole-archive .libs/libpcre.a
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcrecpp.dll'#" \
-e "s#library_names=''#library_names='libpcrecpp.dll.a'#"\
< .libs/libpcrecpp.lai > .libs/libpcrecpp.lai.tmp && \
mv -f .libs/libpcrecpp.lai.tmp .libs/libpcrecpp.lai
sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcrecpp.dll'#" \
-e "s#library_names=''#library_names='libpcrecpp.dll.a'#"\
< libpcrecpp.la > libpcrecpp.la.tmp && \
mv -f libpcrecpp.la.tmp libpcrecpp.la
wininstall : winshared
$(mkinstalldirs) $(DESTDIR)$(LIBDIR)
$(mkinstalldirs) $(DESTDIR)$(BINDIR)
$(INSTALL) .libs/@WIN_PREFIX@pcre.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
$(INSTALL) .libs/@WIN_PREFIX@pcreposix.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
$(INSTALL) .libs/@WIN_PREFIX@libpcreposix.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcreposix.dll.a
$(INSTALL) .libs/@WIN_PREFIX@libpcre.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcre.dll.a
@HAVE_CPP@ $(INSTALL) .libs/@WIN_PREFIX@pcrecpp.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcrecpp.dll
@HAVE_CPP@ $(INSTALL) .libs/@WIN_PREFIX@libpcrecpp.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcrecpp.dll.a
-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
@HAVE_CPP@ -strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcrecpp.dll
-strip $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
-strip $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@
# An auxiliary program makes the default character table source. This is put
# in the current directory, NOT the $top_srcdir directory.
pcre_chartables.c: dftables@BUILD_EXEEXT@
./dftables@BUILD_EXEEXT@ pcre_chartables.c
dftables.@BUILD_OBJEXT@: $(top_srcdir)/dftables.c \
$(top_srcdir)/pcre_maketables.c $(top_srcdir)/pcre_internal.h \
$(top_srcdir)/pcre.h config.h Makefile
$(CC_FOR_BUILD) -c $(CFLAGS_FOR_BUILD) -I. $(top_srcdir)/dftables.c
dftables@BUILD_EXEEXT@: dftables.@BUILD_OBJEXT@
$(LINK_FOR_BUILD) -o dftables@BUILD_EXEEXT@ dftables.@OBJEXT@
install: all @ON_WINDOWS@ wininstall
@NOT_ON_WINDOWS@ $(mkinstalldirs) $(DESTDIR)$(LIBDIR)
@NOT_ON_WINDOWS@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la"
@NOT_ON_WINDOWS@ $(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la
@NOT_ON_WINDOWS@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la"
@NOT_ON_WINDOWS@ $(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la
@NOT_ON_WINDOWS@@HAVE_CPP@ echo "$(LIBTOOL) --mode=install $(INSTALL) libpcrecpp.la $(DESTDIR)$(LIBDIR)/libpcrecpp.la"
@NOT_ON_WINDOWS@@HAVE_CPP@ $(LIBTOOL) --mode=install $(INSTALL) libpcrecpp.la $(DESTDIR)$(LIBDIR)/libpcrecpp.la
@NOT_ON_WINDOWS@ $(LIBTOOL) --finish $(DESTDIR)$(LIBDIR)
$(mkinstalldirs) $(DESTDIR)$(INCDIR)
$(INSTALL_DATA) $(top_srcdir)/pcre.h $(DESTDIR)$(INCDIR)/pcre.h
$(INSTALL_DATA) $(top_srcdir)/pcreposix.h $(DESTDIR)$(INCDIR)/pcreposix.h
@HAVE_CPP@ $(INSTALL_DATA) $(top_srcdir)/pcrecpp.h $(DESTDIR)$(INCDIR)/pcrecpp.h
@HAVE_CPP@ $(INSTALL_DATA) pcrecpparg.h $(DESTDIR)$(INCDIR)/pcrecpparg.h
@HAVE_CPP@ $(INSTALL_DATA) pcre_stringpiece.h $(DESTDIR)$(INCDIR)/pcre_stringpiece.h
@HAVE_CPP@ $(INSTALL_DATA) $(top_srcdir)/pcre_scanner.h $(DESTDIR)$(INCDIR)/pcre_scanner.h
$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre.3 $(DESTDIR)$(MANDIR)/man3/pcre.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcreapi.3 $(DESTDIR)$(MANDIR)/man3/pcreapi.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrebuild.3 $(DESTDIR)$(MANDIR)/man3/pcrebuild.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrecallout.3 $(DESTDIR)$(MANDIR)/man3/pcrecallout.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrecompat.3 $(DESTDIR)$(MANDIR)/man3/pcrecompat.3
@HAVE_CPP@ $(INSTALL_DATA) $(top_srcdir)/doc/pcrecpp.3 $(DESTDIR)$(MANDIR)/man3/pcrecpp.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrematching.3 $(DESTDIR)$(MANDIR)/man3/pcrematching.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrepartial.3 $(DESTDIR)$(MANDIR)/man3/pcrepartial.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrepattern.3 $(DESTDIR)$(MANDIR)/man3/pcrepattern.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcreperform.3 $(DESTDIR)$(MANDIR)/man3/pcreperform.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcreposix.3 $(DESTDIR)$(MANDIR)/man3/pcreposix.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcreprecompile.3 $(DESTDIR)$(MANDIR)/man3/pcreprecompile.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcresample.3 $(DESTDIR)$(MANDIR)/man3/pcresample.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcrestack.3 $(DESTDIR)$(MANDIR)/man3/pcrestack.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_compile.3 $(DESTDIR)$(MANDIR)/man3/pcre_compile.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_compile2.3 $(DESTDIR)$(MANDIR)/man3/pcre_compile2.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_config.3 $(DESTDIR)$(MANDIR)/man3/pcre_config.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_named_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_dfa_exec.3 $(DESTDIR)$(MANDIR)/man3/pcre_dfa_exec.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_exec.3 $(DESTDIR)$(MANDIR)/man3/pcre_exec.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring_list.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_fullinfo.3 $(DESTDIR)$(MANDIR)/man3/pcre_fullinfo.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_named_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_stringnumber.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_stringnumber.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_stringtable_entries.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_stringtable_entries.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring_list.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_info.3 $(DESTDIR)$(MANDIR)/man3/pcre_info.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_maketables.3 $(DESTDIR)$(MANDIR)/man3/pcre_maketables.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_refcount.3 $(DESTDIR)$(MANDIR)/man3/pcre_refcount.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_study.3 $(DESTDIR)$(MANDIR)/man3/pcre_study.3
$(INSTALL_DATA) $(top_srcdir)/doc/pcre_version.3 $(DESTDIR)$(MANDIR)/man3/pcre_version.3
$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man1
$(INSTALL_DATA) $(top_srcdir)/doc/pcregrep.1 $(DESTDIR)$(MANDIR)/man1/pcregrep.1
$(INSTALL_DATA) $(top_srcdir)/doc/pcretest.1 $(DESTDIR)$(MANDIR)/man1/pcretest.1
$(mkinstalldirs) $(DESTDIR)$(BINDIR)
$(LIBTOOL) --mode=install $(INSTALL) pcregrep@EXEEXT@ $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
$(LIBTOOL) --mode=install $(INSTALL) pcretest@EXEEXT@ $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@
$(INSTALL) pcre-config $(DESTDIR)$(BINDIR)/pcre-config
$(mkinstalldirs) $(DESTDIR)$(LIBDIR)/pkgconfig
$(INSTALL_DATA) libpcre.pc $(DESTDIR)$(LIBDIR)/pkgconfig/libpcre.pc
# The uninstall target removes all the files that were installed.
uninstall:; -rm -rf \
$(DESTDIR)$(LIBDIR)/libpcre.* \
$(DESTDIR)$(LIBDIR)/libpcreposix.* \
$(DESTDIR)$(LIBDIR)/libpcrecpp.* \
$(DESTDIR)$(INCDIR)/pcre.h \
$(DESTDIR)$(INCDIR)/pcreposix.h \
$(DESTDIR)$(INCDIR)/pcrecpp.h \
$(DESTDIR)$(INCDIR)/pcrecpparg.h \
$(DESTDIR)$(INCDIR)/pcre_scanner.h \
$(DESTDIR)$(INCDIR)/pcre_stringpiece.h \
$(DESTDIR)$(MANDIR)/man3/pcre.3 \
$(DESTDIR)$(MANDIR)/man3/pcreapi.3 \
$(DESTDIR)$(MANDIR)/man3/pcrebuild.3 \
$(DESTDIR)$(MANDIR)/man3/pcrecallout.3 \
$(DESTDIR)$(MANDIR)/man3/pcrecompat.3 \
$(DESTDIR)$(MANDIR)/man3/pcrecpp.3 \
$(DESTDIR)$(MANDIR)/man3/pcrematching.3 \
$(DESTDIR)$(MANDIR)/man3/pcrepartial.3 \
$(DESTDIR)$(MANDIR)/man3/pcrepattern.3 \
$(DESTDIR)$(MANDIR)/man3/pcreperform.3 \
$(DESTDIR)$(MANDIR)/man3/pcreposix.3 \
$(DESTDIR)$(MANDIR)/man3/pcreprecompile.3 \
$(DESTDIR)$(MANDIR)/man3/pcresample.3 \
$(DESTDIR)$(MANDIR)/man3/pcrestack.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_compile.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_compile2.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_config.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_copy_named_substring.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_copy_substring.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_dfa_exec.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_exec.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_free_substring.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_free_substring_list.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_fullinfo.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_get_named_substring.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_get_stringnumber.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_get_stringtable_entries.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_get_substring.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_get_substring_list.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_info.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_maketables.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_refcount.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_study.3 \
$(DESTDIR)$(MANDIR)/man3/pcre_version.3 \
$(DESTDIR)$(MANDIR)/man1/pcregrep.1 \
$(DESTDIR)$(MANDIR)/man1/pcretest.1 \
$(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@ \
$(DESTDIR)$(BINDIR)/pcretest@EXEEXT@ \
$(DESTDIR)$(BINDIR)/pcre-config \
$(DESTDIR)$(LIBDIR)/pkgconfig/libpcre.pc
# We deliberately omit dftables and pcre_chartables.c from 'make clean'; once
# made pcre_chartables.c shouldn't change, and if people have edited the tables
# by hand, you don't want to throw them away.
clean:; -rm -rf *.@OBJEXT@ *.lo *.a *.la .libs pcretest@EXEEXT@ pcre_stringpiece_unittest@EXEEXT@ pcrecpp_unittest@EXEEXT@ pcre_scanner_unittest@EXEEXT@ pcregrep@EXEEXT@ testtry
# But "make distclean" should get back to a virgin distribution
distclean: clean
-rm -f pcre_chartables.c libtool pcre-config libpcre.pc \
pcre_stringpiece.h pcrecpparg.h \
dftables@EXEEXT@ RunGrepTest RunTest \
Makefile config.h config.status config.log config.cache
check: runtest
@WIN_PREFIX@pcre.dll : winshared
cp .libs/@WIN_PREFIX@pcre.dll .
test: runtest
runtest: all @ON_WINDOWS@ @WIN_PREFIX@pcre.dll
@./RunTest
@./RunGrepTest
@HAVE_CPP@ @echo ""
@HAVE_CPP@ @echo "Testing C++ wrapper"
@HAVE_CPP@ @echo ""; echo "Test 1++: stringpiece"
@HAVE_CPP@ @./pcre_stringpiece_unittest@EXEEXT@
@HAVE_CPP@ @echo ""; echo "Test 2++: RE class"
@HAVE_CPP@ @./pcrecpp_unittest@EXEEXT@
@HAVE_CPP@ @echo ""; echo "Test 3++: Scanner class"
@HAVE_CPP@ @./pcre_scanner_unittest@EXEEXT@
# End

266
libs/pcre/NEWS Normal file
View File

@ -0,0 +1,266 @@
News about PCRE releases
------------------------
Release 6.7 04-Jul-06
---------------------
The main additions to this release are the ability to use the same name for
multiple sets of parentheses, and support for CRLF line endings in both the
library and pcregrep (and in pcretest for testing).
Thanks to Ian Taylor, the stack usage for many kinds of pattern has been
significantly reduced for certain subject strings.
Release 6.5 01-Feb-06
---------------------
Important changes in this release:
1. A number of new features have been added to pcregrep.
2. The Unicode property tables have been updated to Unicode 4.1.0, and the
supported properties have been extended with script names such as "Arabic",
and the derived properties "Any" and "L&". This has necessitated a change to
the interal format of compiled patterns. Any saved compiled patterns that
use \p or \P must be recompiled.
3. The specification of recursion in patterns has been changed so that all
recursive subpatterns are automatically treated as atomic groups. Thus, for
example, (?R) is treated as if it were (?>(?R)). This is necessary because
otherwise there are situations where recursion does not work.
See the ChangeLog for a complete list of changes, which include a number of bug
fixes and tidies.
Release 6.0 07-Jun-05
---------------------
The release number has been increased to 6.0 because of the addition of several
major new pieces of functionality.
A new function, pcre_dfa_exec(), which implements pattern matching using a DFA
algorithm, has been added. This has a number of advantages for certain cases,
though it does run more slowly, and lacks the ability to capture substrings. On
the other hand, it does find all matches, not just the first, and it works
better for partial matching. The pcrematching man page discusses the
differences.
The pcretest program has been enhanced so that it can make use of the new
pcre_dfa_exec() matching function and the extra features it provides.
The distribution now includes a C++ wrapper library. This is built
automatically if a C++ compiler is found. The pcrecpp man page discusses this
interface.
The code itself has been re-organized into many more files, one for each
function, so it no longer requires everything to be linked in when static
linkage is used. As a consequence, some internal functions have had to have
their names exposed. These functions all have names starting with _pcre_. They
are undocumented, and are not intended for use by outside callers.
The pcregrep program has been enhanced with new functionality such as
multiline-matching and options for output more matching context. See the
ChangeLog for a complete list of changes to the library and the utility
programs.
Release 5.0 13-Sep-04
---------------------
The licence under which PCRE is released has been changed to the more
conventional "BSD" licence.
In the code, some bugs have been fixed, and there are also some major changes
in this release (which is why I've increased the number to 5.0). Some changes
are internal rearrangements, and some provide a number of new facilities. The
new features are:
1. There's an "automatic callout" feature that inserts callouts before every
item in the regex, and there's a new callout field that gives the position
in the pattern - useful for debugging and tracing.
2. The extra_data structure can now be used to pass in a set of character
tables at exec time. This is useful if compiled regex are saved and re-used
at a later time when the tables may not be at the same address. If the
default internal tables are used, the pointer saved with the compiled
pattern is now set to NULL, which means that you don't need to do anything
special unless you are using custom tables.
3. It is possible, with some restrictions on the content of the regex, to
request "partial" matching. A special return code is given if all of the
subject string matched part of the regex. This could be useful for testing
an input field as it is being typed.
4. There is now some optional support for Unicode character properties, which
means that the patterns items such as \p{Lu} and \X can now be used. Only
the general category properties are supported. If PCRE is compiled with this
support, an additional 90K data structure is include, which increases the
size of the library dramatically.
5. There is support for saving compiled patterns and re-using them later.
6. There is support for running regular expressions that were compiled on a
different host with the opposite endianness.
7. The pcretest program has been extended to accommodate the new features.
The main internal rearrangement is that sequences of literal characters are no
longer handled as strings. Instead, each character is handled on its own. This
makes some UTF-8 handling easier, and makes the support of partial matching
possible. Compiled patterns containing long literal strings will be larger as a
result of this change; I hope that performance will not be much affected.
Release 4.5 01-Dec-03
---------------------
Again mainly a bug-fix and tidying release, with only a couple of new features:
1. It's possible now to compile PCRE so that it does not use recursive
function calls when matching. Instead it gets memory from the heap. This slows
things down, but may be necessary on systems with limited stacks.
2. UTF-8 string checking has been tightened to reject overlong sequences and to
check that a starting offset points to the start of a character. Failure of the
latter returns a new error code: PCRE_ERROR_BADUTF8_OFFSET.
3. PCRE can now be compiled for systems that use EBCDIC code.
Release 4.4 21-Aug-03
---------------------
This is mainly a bug-fix and tidying release. The only new feature is that PCRE
checks UTF-8 strings for validity by default. There is an option to suppress
this, just in case anybody wants that teeny extra bit of performance.
Releases 4.1 - 4.3
------------------
Sorry, I forgot about updating the NEWS file for these releases. Please take a
look at ChangeLog.
Release 4.0 17-Feb-03
---------------------
There have been a lot of changes for the 4.0 release, adding additional
functionality and mending bugs. Below is a list of the highlights of the new
functionality. For full details of these features, please consult the
documentation. For a complete list of changes, see the ChangeLog file.
1. Support for Perl's \Q...\E escapes.
2. "Possessive quantifiers" ?+, *+, ++, and {,}+ which come from Sun's Java
package. They provide some syntactic sugar for simple cases of "atomic
grouping".
3. Support for the \G assertion. It is true when the current matching position
is at the start point of the match.
4. A new feature that provides some of the functionality that Perl provides
with (?{...}). The facility is termed a "callout". The way it is done in PCRE
is for the caller to provide an optional function, by setting pcre_callout to
its entry point. To get the function called, the regex must include (?C) at
appropriate points.
5. Support for recursive calls to individual subpatterns. This makes it really
easy to get totally confused.
6. Support for named subpatterns. The Python syntax (?P<name>...) is used to
name a group.
7. Several extensions to UTF-8 support; it is now fairly complete. There is an
option for pcregrep to make it operate in UTF-8 mode.
8. The single man page has been split into a number of separate man pages.
These also give rise to individual HTML pages which are put in a separate
directory. There is an index.html page that lists them all. Some hyperlinking
between the pages has been installed.
Release 3.5 15-Aug-01
---------------------
1. The configuring system has been upgraded to use later versions of autoconf
and libtool. By default it builds both a shared and a static library if the OS
supports it. You can use --disable-shared or --disable-static on the configure
command if you want only one of them.
2. The pcretest utility is now installed along with pcregrep because it is
useful for users (to test regexs) and by doing this, it automatically gets
relinked by libtool. The documentation has been turned into a man page, so
there are now .1, .txt, and .html versions in /doc.
3. Upgrades to pcregrep:
(i) Added long-form option names like gnu grep.
(ii) Added --help to list all options with an explanatory phrase.
(iii) Added -r, --recursive to recurse into sub-directories.
(iv) Added -f, --file to read patterns from a file.
4. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
script, to force use of CR or LF instead of \n in the source. On non-Unix
systems, the value can be set in config.h.
5. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
absolute limit. Changed the text of the error message to make this clear, and
likewise updated the man page.
6. The limit of 99 on the number of capturing subpatterns has been removed.
The new limit is 65535, which I hope will not be a "real" limit.
Release 3.3 01-Aug-00
---------------------
There is some support for UTF-8 character strings. This is incomplete and
experimental. The documentation describes what is and what is not implemented.
Otherwise, this is just a bug-fixing release.
Release 3.0 01-Feb-00
---------------------
1. A "configure" script is now used to configure PCRE for Unix systems. It
builds a Makefile, a config.h file, and the pcre-config script.
2. PCRE is built as a shared library by default.
3. There is support for POSIX classes such as [:alpha:].
5. There is an experimental recursion feature.
----------------------------------------------------------------------------
IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00
Please note that there has been a change in the API such that a larger
ovector is required at matching time, to provide some additional workspace.
The new man page has details. This change was necessary in order to support
some of the new functionality in Perl 5.005.
IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00
Another (I hope this is the last!) change has been made to the API for the
pcre_compile() function. An additional argument has been added to make it
possible to pass over a pointer to character tables built in the current
locale by pcre_maketables(). To use the default tables, this new arguement
should be passed as NULL.
IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05
Yet another (and again I hope this really is the last) change has been made
to the API for the pcre_exec() function. An additional argument has been
added to make it possible to start the match other than at the start of the
subject string. This is important if there are lookbehinds. The new man
page has the details, but you just want to convert existing programs, all
you need to do is to stick in a new fifth argument to pcre_exec(), with a
value of zero. For example, change
pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize)
to
pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize)
****

269
libs/pcre/NON-UNIX-USE Normal file
View File

@ -0,0 +1,269 @@
Compiling PCRE on non-Unix systems
----------------------------------
See below for comments on Cygwin or MinGW and OpenVMS usage. I (Philip Hazel)
have no knowledge of Windows or VMS sytems and how their libraries work. The
items in the PCRE Makefile that relate to anything other than Unix-like systems
have been contributed by PCRE users. There are some other comments and files in
the Contrib directory on the ftp site that you may find useful. See
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
If you want to compile PCRE for a non-Unix system (or perhaps, more strictly,
for a system that does not support "configure" and "make" files), note that
the basic PCRE library consists entirely of code written in Standard C, and so
should compile successfully on any system that has a Standard C compiler and
library. The C++ wrapper functions are a separate issue (see below).
GENERIC INSTRUCTIONS FOR THE C LIBRARY
The following are generic comments about building PCRE. The interspersed
indented commands are suggestions from Mark Tetrode as to which commands you
might use on a Windows system to build a static library.
(1) Copy or rename the file config.in as config.h, and change the macros that
define HAVE_STRERROR and HAVE_MEMMOVE to define them as 1 rather than 0.
Unfortunately, because of the way Unix autoconf works, the default setting has
to be 0. You may also want to make changes to other macros in config.h. In
particular, if you want to force a specific value for newline, you can define
the NEWLINE macro. The default is to use '\n', thereby using whatever value
your compiler gives to '\n'.
rem Mark Tetrode's commands
copy config.in config.h
rem Use write, because notepad cannot handle UNIX files. Change values.
write config.h
(2) Compile dftables.c as a stand-alone program, and then run it with
the single argument "pcre_chartables.c". This generates a set of standard
character tables and writes them to that file.
rem Mark Tetrode's commands
rem Compile & run
cl -DSUPPORT_UTF8 -DSUPPORT_UCP dftables.c
dftables.exe pcre_chartables.c
(3) Compile the following source files:
pcre_chartables.c
pcre_compile.c
pcre_config.c
pcre_dfa_exec.c
pcre_exec.c
pcre_fullinfo.c
pcre_get.c
pcre_globals.c
pcre_info.c
pcre_maketables.c
pcre_ord2utf8.c
pcre_refcount.c
pcre_study.c
pcre_tables.c
pcre_try_flipped.c
pcre_ucp_searchfuncs.c
pcre_valid_utf8.c
pcre_version.c
pcre_xclass.c
and link them all together into an object library in whichever form your system
keeps such libraries. This is the pcre C library. If your system has static and
shared libraries, you may have to do this once for each type.
rem These comments are out-of-date, referring to a previous release which
rem had fewer source files. Replace with the file names from above.
rem Mark Tetrode's commands, for a static library
rem Compile & lib
cl -DSUPPORT_UTF8 -DSUPPORT_UCP -DPOSIX_MALLOC_THRESHOLD=10 /c maketables.c get.c study.c pcre.c
lib /OUT:pcre.lib maketables.obj get.obj study.obj pcre.obj
(4) Similarly, compile pcreposix.c and link it (on its own) as the pcreposix
library.
rem Mark Tetrode's commands, for a static library
rem Compile & lib
cl -DSUPPORT_UTF8 -DSUPPORT_UCP -DPOSIX_MALLOC_THRESHOLD=10 /c pcreposix.c
lib /OUT:pcreposix.lib pcreposix.obj
(5) Compile the test program pcretest.c. This needs the functions in the
pcre and pcreposix libraries when linking.
rem Mark Tetrode's commands
rem compile & link
cl /F0x400000 pcretest.c pcre.lib pcreposix.lib
(6) Run pcretest on the testinput files in the testdata directory, and check
that the output matches the corresponding testoutput files. You must use the
-i option when checking testinput2. Note that the supplied files are in Unix
format, with just LF characters as line terminators. You may need to edit them
to change this if your system uses a different convention.
rem Mark Tetrode's commands
pcretest testdata\testinput1 testdata\myoutput1
windiff testdata\testoutput1 testdata\myoutput1
pcretest -i testdata\testinput2 testdata\myoutput2
windiff testdata\testoutput2 testdata\myoutput2
pcretest testdata\testinput3 testdata\myoutput3
windiff testdata\testoutput3 testdata\myoutput3
pcretest testdata\testinput4 testdata\myoutput4
windiff testdata\testoutput4 testdata\myoutput4
pcretest testdata\testinput5 testdata\myoutput5
windiff testdata\testoutput5 testdata\myoutput5
pcretest testdata\testinput6 testdata\myoutput6
windiff testdata\testoutput6 testdata\myoutput6
Note that there are now three more tests (7, 8, 9) that did not exist when Mark
wrote those comments. The test the new pcre_dfa_exec() function.
(7) If you want to use the pcregrep command, compile and link pcregrep.c; it
uses only the basic PCRE library.
THE C++ WRAPPER FUNCTIONS
The PCRE distribution now contains some C++ wrapper functions and tests,
contributed by Google Inc. On a system that can use "configure" and "make",
the functions are automatically built into a library called pcrecpp. It should
be straightforward to compile the .cc files manually on other systems. The
files called xxx_unittest.cc are test programs for each of the corresponding
xxx.cc files.
FURTHER REMARKS
If you have a system without "configure" but where you can use a Makefile, edit
Makefile.in to create Makefile, substituting suitable values for the variables
at the head of the file.
Some help in building a Win32 DLL of PCRE in GnuWin32 environments was
contributed by Paul Sokolovsky. These environments are Mingw32
(http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and CygWin
(http://sourceware.cygnus.com/cygwin/). Paul comments:
For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get
pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically
linked pgrep and pcretest. If you have /bin/sh, run RunTest (three
main test go ok, locale not supported).
Changes to do MinGW with autoconf 2.50 were supplied by Fred Cox
<sailorFred@yahoo.com>, who comments as follows:
If you are using the PCRE DLL, the normal Unix style configure && make &&
make check && make install should just work[*]. If you want to statically
link against the .a file, you must define PCRE_STATIC before including
pcre.h, otherwise the pcre_malloc and pcre_free exported functions will be
declared __declspec(dllimport), with hilarious results. See the configure.in
and pcretest.c for how it is done for the static test.
Also, there will only be a libpcre.la, not a libpcreposix.la, as you
would expect from the Unix version. The single DLL includes the pcreposix
interface.
[*] But note that the supplied test files are in Unix format, with just LF
characters as line terminators. You will have to edit them to change to CR LF
terminators.
A script for building PCRE using Borland's C++ compiler for use with VPASCAL
was contributed by Alexander Tokarev. It is called makevp.bat.
These are some further comments about Win32 builds from Mark Evans. They
were contributed before Fred Cox's changes were made, so it is possible that
they may no longer be relevant.
"The documentation for Win32 builds is a bit shy. Under MSVC6 I
followed their instructions to the letter, but there were still
some things missing.
(1) Must #define STATIC for entire project if linking statically.
(I see no reason to use DLLs for code this compact.) This of
course is a project setting in MSVC under Preprocessor.
(2) Missing some #ifdefs relating to the function pointers
pcre_malloc and pcre_free. See my solution below. (The stubs
may not be mandatory but they made me feel better.)"
=========================
#ifdef _WIN32
#include <malloc.h>
void* malloc_stub(size_t N)
{ return malloc(N); }
void free_stub(void* p)
{ free(p); }
void *(*pcre_malloc)(size_t) = &malloc_stub;
void (*pcre_free)(void *) = &free_stub;
#else
void *(*pcre_malloc)(size_t) = malloc;
void (*pcre_free)(void *) = free;
#endif
=========================
BUILDING PCRE ON OPENVMS
Dan Mooney sent the following comments about building PCRE on OpenVMS. They
relate to an older version of PCRE that used fewer source files, so the exact
commands will need changing. See the current list of source files above.
"It was quite easy to compile and link the library. I don't have a formal
make file but the attached file [reproduced below] contains the OpenVMS DCL
commands I used to build the library. I had to add #define
POSIX_MALLOC_THRESHOLD 10 to pcre.h since it was not defined anywhere.
The library was built on:
O/S: HP OpenVMS v7.3-1
Compiler: Compaq C v6.5-001-48BCD
Linker: vA13-01
The test results did not match 100% due to the issues you mention in your
documentation regarding isprint(), iscntrl(), isgraph() and ispunct(). I
modified some of the character tables temporarily and was able to get the
results to match. Tests using the fr locale did not match since I don't have
that locale loaded. The study size was always reported to be 3 less than the
value in the standard test output files."
=========================
$! This DCL procedure builds PCRE on OpenVMS
$!
$! I followed the instructions in the non-unix-use file in the distribution.
$!
$ COMPILE == "CC/LIST/NOMEMBER_ALIGNMENT/PREFIX_LIBRARY_ENTRIES=ALL_ENTRIES
$ COMPILE DFTABLES.C
$ LINK/EXE=DFTABLES.EXE DFTABLES.OBJ
$ RUN DFTABLES.EXE/OUTPUT=CHARTABLES.C
$ COMPILE MAKETABLES.C
$ COMPILE GET.C
$ COMPILE STUDY.C
$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
$! did not seem to be defined anywhere.
$! I edited pcre.h and added #DEFINE SUPPORT_UTF8 to enable UTF8 support.
$ COMPILE PCRE.C
$ LIB/CREATE PCRE MAKETABLES.OBJ, GET.OBJ, STUDY.OBJ, PCRE.OBJ
$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
$! did not seem to be defined anywhere.
$ COMPILE PCREPOSIX.C
$ LIB/CREATE PCREPOSIX PCREPOSIX.OBJ
$ COMPILE PCRETEST.C
$ LINK/EXE=PCRETEST.EXE PCRETEST.OBJ, PCRE/LIB, PCREPOSIX/LIB
$! C programs that want access to command line arguments must be
$! defined as a symbol
$ PCRETEST :== "$ SYS$ROADSUSERS:[DMOONEY.REGEXP]PCRETEST.EXE"
$! Arguments must be enclosed in quotes.
$ PCRETEST "-C"
$! Test results:
$!
$! The test results did not match 100%. The functions isprint(), iscntrl(),
$! isgraph() and ispunct() on OpenVMS must not produce the same results
$! as the system that built the test output files provided with the
$! distribution.
$!
$! The study size did not match and was always 3 less on OpenVMS.
$!
$! Locale could not be set to fr
$!
=========================
****

528
libs/pcre/README Normal file
View File

@ -0,0 +1,528 @@
README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------
The latest release of PCRE is always available from
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
Please read the NEWS file if you are upgrading from a previous release.
The PCRE APIs
-------------
PCRE is written in C, and it has its own API. The distribution now includes a
set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
for details).
Also included are a set of C wrapper functions that are based on the POSIX
API. These end up in the library called libpcreposix. Note that this just
provides a POSIX calling interface to PCRE: the regular expressions themselves
still follow Perl syntax and semantics. The header file for the POSIX-style
functions is called pcreposix.h. The official POSIX name is regex.h, but I
didn't want to risk possible problems with existing files of that name by
distributing it that way. To use it with an existing program that uses the
POSIX API, it will have to be renamed or pointed at by a link.
If you are using the POSIX interface to PCRE and there is already a POSIX regex
library installed on your system, you must take care when linking programs to
ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
up the "real" POSIX functions of the same name.
Documentation for PCRE
----------------------
If you install PCRE in the normal way, you will end up with an installed set of
man pages whose names all start with "pcre". The one that is just called "pcre"
lists all the others. In addition to these man pages, the PCRE documentation is
supplied in two other forms; however, as there is no standard place to install
them, they are left in the doc directory of the unpacked source distribution.
These forms are:
1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
first of these is a concatenation of the text forms of all the section 3
man pages except those that summarize individual functions. The other two
are the text forms of the section 1 man pages for the pcregrep and
pcretest commands. Text forms are provided for ease of scanning with text
editors or similar tools.
2. A subdirectory called doc/html contains all the documentation in HTML
form, hyperlinked in various ways, and rooted in a file called
doc/index.html.
Contributions by users of PCRE
------------------------------
You can find contributions from PCRE users in the directory
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
where there is also a README file giving brief descriptions of what they are.
Several of them provide support for compiling PCRE on various flavours of
Windows systems (I myself do not use Windows). Some are complete in themselves;
others are pointers to URLs containing relevant files.
Building PCRE on a Unix-like system
-----------------------------------
If you are using HP's ANSI C++ compiler (aCC), please see the special note
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
To build PCRE on a Unix-like system, first run the "configure" command from the
PCRE distribution directory, with your current directory set to the directory
where you want the files to be created. This command is a standard GNU
"autoconf" configuration script, for which generic instructions are supplied in
INSTALL.
Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient, but the
usual methods of changing standard defaults are available. For example:
CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
instead of the default /usr/local.
If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
cd /build/pcre/pcre-xxx
/source/pcre/pcre-xxx/configure
PCRE is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.
There are some optional features that can be included or omitted from the PCRE
library. You can read more about them in the pcrebuild man page.
. If you want to suppress the building of the C++ wrapper library, you can add
--disable-cpp to the "configure" command. Otherwise, when "configure" is run,
will try to find a C++ compiler and C++ header files, and if it succeeds, it
will try to build the C++ wrapper.
. If you want to make use of the support for UTF-8 character strings in PCRE,
you must add --enable-utf8 to the "configure" command. Without it, the code
for handling UTF-8 is not included in the library. (Even when included, it
still has to be enabled by an option at run time.)
. If, in addition to support for UTF-8 character strings, you want to include
support for the \P, \p, and \X sequences that recognize Unicode character
properties, you must add --enable-unicode-properties to the "configure"
command. This adds about 30K to the size of the library (in the form of a
property table); only the basic two-letter properties such as Lu are
supported.
. You can build PCRE to recognize either CR or LF or the sequence CRLF as
indicating the end of a line. Whatever you specify at build time is the
default; the caller of PCRE can change the selection at run time. The default
newline indicator is a single LF character (the Unix standard). You can
specify the default newline indicator by adding --newline-is-cr or
--newline-is-lf or --newline-is-crlf to the "configure" command,
respectively.
. When called via the POSIX interface, PCRE uses malloc() to get additional
storage for processing capturing parentheses if there are more than 10 of
them. You can increase this threshold by setting, for example,
--with-posix-malloc-threshold=20
on the "configure" command.
. PCRE has a counter that can be set to limit the amount of resources it uses.
If the limit is exceeded during a match, the match fails. The default is ten
million. You can change the default by setting, for example,
--with-match-limit=500000
on the "configure" command. This is just the default; individual calls to
pcre_exec() can supply their own value. There is discussion on the pcreapi
man page.
. There is a separate counter that limits the depth of recursive function calls
during a matching process. This also has a default of ten million, which is
essentially "unlimited". You can change the default by setting, for example,
--with-match-limit-recursion=500000
Recursive function calls use up the runtime stack; running out of stack can
cause programs to crash in strange ways. There is a discussion about stack
sizes in the pcrestack man page.
. The default maximum compiled pattern size is around 64K. You can increase
this by adding --with-link-size=3 to the "configure" command. You can
increase it even more by setting --with-link-size=4, but this is unlikely
ever to be necessary. If you build PCRE with an increased link size, test 2
(and 5 if you are using UTF-8) will fail. Part of the output of these tests
is a representation of the compiled pattern, and this changes with the link
size.
. You can build PCRE so that its internal match() function that is called from
pcre_exec() does not call itself recursively. Instead, it uses blocks of data
from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
to save data that would otherwise be saved on the stack. To build PCRE like
this, use
--disable-stack-for-recursion
on the "configure" command. PCRE runs more slowly in this mode, but it may be
necessary in environments with limited stack sizes. This applies only to the
pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
use deeply nested recursion.
The "configure" script builds eight files for the basic C library:
. Makefile is the makefile that builds the library
. config.h contains build-time configuration options for the library
. pcre-config is a script that shows the settings of "configure" options
. libpcre.pc is data for the pkg-config command
. libtool is a script that builds shared and/or static libraries
. RunTest is a script for running tests on the library
. RunGrepTest is a script for running tests on the pcregrep command
In addition, if a C++ compiler is found, the following are also built:
. pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
. pcre_stringpiece.h is the header for the C++ "stringpiece" functions
The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.
Once "configure" has run, you can run "make". It builds two libraries, called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
command. If a C++ compiler was found on your system, it also builds the C++
wrapper library, which is called libpcrecpp, and some test programs called
pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
The command "make test" runs all the appropriate tests. Details of the PCRE
tests are given in a separate section of this document, below.
You can use "make install" to copy the libraries, the public header files
pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
the C++ wrapper was built), and the man pages to appropriate live directories
on your system, in the normal way.
If you want to remove PCRE from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.
Retrieving configuration information on Unix-like systems
---------------------------------------------------------
Running "make install" also installs the command pcre-config, which can be used
to recall information about the PCRE configuration and installation. For
example:
pcre-config --version
prints the version number, and
pcre-config --libs
outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.
The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:
pkg-config --cflags pcre
The data is held in *.pc files that are installed in a directory called
pkgconfig.
Shared libraries on Unix-like systems
-------------------------------------
The default distribution builds PCRE as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.
The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the source directory still
use the uninstalled libraries.
To build PCRE using static libraries only you must use --disable-shared when
configuring it. For example:
./configure --prefix=/usr/gnu --disable-shared
Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.
Cross-compiling on a Unix-like system
-------------------------------------
You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE for some other host. However, during the building
process, the dftables.c source file is compiled *and run* on the local host, in
order to generate the default character tables (the chartables.c file). It
therefore needs to be compiled with the local compiler, not the cross compiler.
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
when calling the "configure" command. If they are not specified, they default
to the values of CC and CFLAGS.
Using HP's ANSI C++ compiler (aCC)
----------------------------------
Unless C++ support is disabled by specifiying the "--disable-cpp" option of the
"configure" script, you *must* include the "-AA" option in the CXXFLAGS
environment variable in order for the C++ components to compile correctly.
Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
needed libraries fail to get included when specifying the "-AA" compiler
option. If you experience unresolved symbols when linking the C++ programs,
use the workaround of specifying the following environment variable prior to
running the "configure" script:
CXXLDFLAGS="-lstd_v2 -lCsup_v2"
Building on non-Unix systems
----------------------------
For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
the system supports the use of "configure" and "make" you may be able to build
PCRE in the same way as for Unix systems.
PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
the details because I don't use those systems. It should be straightforward to
build PCRE on any system that has a Standard C compiler, because it uses only
Standard C functions.
Testing PCRE
------------
To test PCRE on a Unix system, run the RunTest script that is created by the
configuring process. There is also a script called RunGrepTest that tests the
options of the pcregrep command. If the C++ wrapper library is build, three
test programs called pcrecpp_unittest, pcre_scanner_unittest, and
pcre_stringpiece_unittest are provided.
Both the scripts and all the program tests are run if you obey "make runtest",
"make check", or "make test". For other systems, see the instructions in
NON-UNIX-USE.
The RunTest script runs the pcretest test program (which is documented in its
own man page) on each of the testinput files (in the testdata directory) in
turn, and compares the output with the contents of the corresponding testoutput
file. A file called testtry is used to hold the main output from pcretest
(testsavedregex is also used as a working file). To run pcretest on just one of
the test files, give its number as an argument to RunTest, for example:
RunTest 2
The first file can also be fed directly into the perltest script to check that
Perl gives the same results. The only difference you should see is in the first
few lines, where the Perl version is given instead of the PCRE version.
The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flag to check some of the internals of
pcre_compile().
If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f] the
test will contain [\x00-\xff], and similarly in some other cases. This is not a
bug in PCRE.
The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr_FR" (French) locale. Before
running the test, the script checks for the presence of this locale by running
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
in the list of available locales, the third test cannot be run, and a comment
is output to say why. If running this test produces instances of the error
** Failed to set locale "fr_FR"
in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.
The fourth test checks the UTF-8 support. It is not run automatically unless
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
running "configure". This file can be also fed directly to the perltest script,
provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
commented in the script, can be be used.)
The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
features of PCRE that are not relevant to Perl.
The sixth and test checks the support for Unicode character properties. It it
not run automatically unless PCRE is built with Unicode property support. To to
this you must set --enable-unicode-properties when running "configure".
The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
property support, respectively. The eighth and ninth tests are not run
automatically unless PCRE is build with the relevant support.
Character tables
----------------
PCRE uses four tables for manipulating and identifying characters whose values
are less than 256. The final argument of the pcre_compile() function is a
pointer to a block of memory containing the concatenated tables. A call to
pcre_maketables() can be used to generate a set of tables in the current
locale. If the final argument for pcre_compile() is passed as NULL, a set of
default tables that is built into the binary is used.
The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
(compiled from dftables.c), which uses the ANSI C character handling functions
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
sources. This means that the default C locale which is set for your system will
control the contents of these default tables. You can change the default tables
by editing chartables.c and then re-building PCRE. If you do this, you should
probably also edit Makefile to ensure that the file doesn't ever get
re-generated.
The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes.
The final 256-byte table has bits indicating various character types, as
follows:
1 white space character
2 letter
4 decimal digit
8 hexadecimal digit
16 alphanumeric or '_'
128 regular expression metacharacter or binary zero
You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE to malfunction.
Manifest
--------
The distribution should contain the following files:
(A) The actual source files of the PCRE library functions and their
headers:
dftables.c auxiliary program for building chartables.c
pcreposix.c )
pcre_compile.c )
pcre_config.c )
pcre_dfa_exec.c )
pcre_exec.c )
pcre_fullinfo.c )
pcre_get.c ) sources for the functions in the library,
pcre_globals.c ) and some internal functions that they use
pcre_info.c )
pcre_maketables.c )
pcre_ord2utf8.c )
pcre_refcount.c )
pcre_study.c )
pcre_tables.c )
pcre_try_flipped.c )
pcre_ucp_searchfuncs.c)
pcre_valid_utf8.c )
pcre_version.c )
pcre_xclass.c )
ucptable.c )
pcre_printint.src ) debugging function that is #included in pcretest, and
) can also be #included in pcre_compile()
pcre.h the public PCRE header file
pcreposix.h header for the external POSIX wrapper API
pcre_internal.h header for internal use
ucp.h ) headers concerned with
ucpinternal.h ) Unicode property handling
config.in template for config.h, which is built by configure
pcrecpp.h the header file for the C++ wrapper
pcrecpparg.h.in "source" for another C++ header file
pcrecpp.cc )
pcre_scanner.cc ) source for the C++ wrapper library
pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the
C++ stringpiece functions
pcre_stringpiece.cc source for the C++ stringpiece functions
(B) Auxiliary files:
AUTHORS information about the author of PCRE
ChangeLog log of changes to the code
INSTALL generic installation instructions
LICENCE conditions for the use of PCRE
COPYING the same, using GNU's standard name
Makefile.in template for Unix Makefile, which is built by configure
NEWS important changes in this release
NON-UNIX-USE notes on building PCRE on non-Unix systems
README this file
RunTest.in template for a Unix shell script for running tests
RunGrepTest.in template for a Unix shell script for pcregrep tests
config.guess ) files used by libtool,
config.sub ) used only when building a shared library
config.h.in "source" for the config.h header file
configure a configuring shell script (built by autoconf)
configure.ac the autoconf input used to build configure
doc/Tech.Notes notes on the encoding
doc/*.3 man page sources for the PCRE functions
doc/*.1 man page sources for pcregrep and pcretest
doc/html/* HTML documentation
doc/pcre.txt plain text version of the man pages
doc/pcretest.txt plain text documentation of test program
doc/perltest.txt plain text documentation of Perl test program
install-sh a shell script for installing files
libpcre.pc.in "source" for libpcre.pc for pkg-config
ltmain.sh file used to build a libtool script
mkinstalldirs script for making install directories
pcretest.c comprehensive test program
pcredemo.c simple demonstration of coding calls to PCRE
perltest Perl test program
pcregrep.c source of a grep utility that uses PCRE
pcre-config.in source of script which retains PCRE information
pcrecpp_unittest.c )
pcre_scanner_unittest.c ) test programs for the C++ wrapper
pcre_stringpiece_unittest.c )
testdata/testinput* test data for main library tests
testdata/testoutput* expected test results
testdata/grep* input and output for pcregrep tests
(C) Auxiliary files for Win32 DLL
libpcre.def
libpcreposix.def
(D) Auxiliary file for VPASCAL
makevp.bat
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
June 2006

208
libs/pcre/RunGrepTest.in Normal file
View File

@ -0,0 +1,208 @@
#! /bin/sh
# This file is generated by configure from RunGrepTest.in. Make any changes
# to that file.
echo "Testing pcregrep"
./pcregrep -V
# Run pcregrep tests. The assumption is that the PCRE tests check the library
# itself. What we are checking here is the file handling and options that are
# supported by pcregrep.
cf=diff
valgrind=
if [ ! -d testdata ] ; then
ln -s @top_srcdir@/testdata testdata
fi
testdata=./testdata
while [ $# -gt 0 ] ; do
case $1 in
valgrind) valgrind="valgrind -q --leak-check=no";;
*) echo "Unknown argument $1"; exit 1;;
esac
shift
done
echo "---------------------------- Test 1 ------------------------------" >testtry
$valgrind ./pcregrep PATTERN $testdata/grepinput >>testtry
echo "---------------------------- Test 2 ------------------------------" >>testtry
$valgrind ./pcregrep '^PATTERN' $testdata/grepinput >>testtry
echo "---------------------------- Test 3 ------------------------------" >>testtry
$valgrind ./pcregrep -in PATTERN $testdata/grepinput >>testtry
echo "---------------------------- Test 4 ------------------------------" >>testtry
$valgrind ./pcregrep -ic PATTERN $testdata/grepinput >>testtry
echo "---------------------------- Test 5 ------------------------------" >>testtry
$valgrind ./pcregrep -in PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
echo "---------------------------- Test 6 ------------------------------" >>testtry
$valgrind ./pcregrep -inh PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
echo "---------------------------- Test 7 ------------------------------" >>testtry
$valgrind ./pcregrep -il PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
echo "---------------------------- Test 8 ------------------------------" >>testtry
$valgrind ./pcregrep -l PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
echo "---------------------------- Test 9 ------------------------------" >>testtry
$valgrind ./pcregrep -q PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
echo "RC=$?" >>testtry
echo "---------------------------- Test 10 -----------------------------" >>testtry
$valgrind ./pcregrep -q NEVER-PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
echo "RC=$?" >>testtry
echo "---------------------------- Test 11 -----------------------------" >>testtry
$valgrind ./pcregrep -vn pattern $testdata/grepinputx >>testtry
echo "---------------------------- Test 12 -----------------------------" >>testtry
$valgrind ./pcregrep -ix pattern $testdata/grepinputx >>testtry
echo "---------------------------- Test 13 -----------------------------" >>testtry
$valgrind ./pcregrep -f$testdata/greplist $testdata/grepinputx >>testtry
echo "---------------------------- Test 14 -----------------------------" >>testtry
$valgrind ./pcregrep -w pat $testdata/grepinput $testdata/grepinputx >>testtry
echo "---------------------------- Test 15 -----------------------------" >>testtry
$valgrind ./pcregrep 'abc^*' $testdata/grepinput 2>>testtry >>testtry
echo "---------------------------- Test 16 -----------------------------" >>testtry
$valgrind ./pcregrep abc $testdata/grepinput $testdata/nonexistfile 2>>testtry >>testtry
echo "---------------------------- Test 17 -----------------------------" >>testtry
$valgrind ./pcregrep -M 'the\noutput' $testdata/grepinput >>testtry
echo "---------------------------- Test 18 -----------------------------" >>testtry
$valgrind ./pcregrep -Mn '(the\noutput|dog\.\n--)' $testdata/grepinput >>testtry
echo "---------------------------- Test 19 -----------------------------" >>testtry
$valgrind ./pcregrep -Mix 'Pattern' $testdata/grepinputx >>testtry
echo "---------------------------- Test 20 -----------------------------" >>testtry
$valgrind ./pcregrep -Mixn 'complete pair\nof lines' $testdata/grepinputx >>testtry
echo "---------------------------- Test 21 -----------------------------" >>testtry
$valgrind ./pcregrep -nA3 'four' $testdata/grepinputx >>testtry
echo "---------------------------- Test 22 -----------------------------" >>testtry
$valgrind ./pcregrep -nB3 'four' $testdata/grepinputx >>testtry
echo "---------------------------- Test 23 -----------------------------" >>testtry
$valgrind ./pcregrep -C3 'four' $testdata/grepinputx >>testtry
echo "---------------------------- Test 24 -----------------------------" >>testtry
$valgrind ./pcregrep -A9 'four' $testdata/grepinputx >>testtry
echo "---------------------------- Test 25 -----------------------------" >>testtry
$valgrind ./pcregrep -nB9 'four' $testdata/grepinputx >>testtry
echo "---------------------------- Test 26 -----------------------------" >>testtry
$valgrind ./pcregrep -A9 -B9 'four' $testdata/grepinputx >>testtry
echo "---------------------------- Test 27 -----------------------------" >>testtry
$valgrind ./pcregrep -A10 'four' $testdata/grepinputx >>testtry
echo "---------------------------- Test 28 -----------------------------" >>testtry
$valgrind ./pcregrep -nB10 'four' $testdata/grepinputx >>testtry
echo "---------------------------- Test 29 -----------------------------" >>testtry
$valgrind ./pcregrep -C12 -B10 'four' $testdata/grepinputx >>testtry
echo "---------------------------- Test 30 -----------------------------" >>testtry
$valgrind ./pcregrep -inB3 'pattern' $testdata/grepinput $testdata/grepinputx >>testtry
echo "---------------------------- Test 31 -----------------------------" >>testtry
$valgrind ./pcregrep -inA3 'pattern' $testdata/grepinput $testdata/grepinputx >>testtry
echo "---------------------------- Test 32 -----------------------------" >>testtry
$valgrind ./pcregrep -L 'fox' $testdata/grepinput $testdata/grepinputx >>testtry
echo "---------------------------- Test 33 -----------------------------" >>testtry
$valgrind ./pcregrep 'fox' $testdata/grepnonexist >>testtry 2>&1
echo "RC=$?" >>testtry
echo "---------------------------- Test 34 -----------------------------" >>testtry
$valgrind ./pcregrep -s 'fox' $testdata/grepnonexist >>testtry 2>&1
echo "RC=$?" >>testtry
echo "---------------------------- Test 35 -----------------------------" >>testtry
$valgrind ./pcregrep -L -r --include=grepinputx 'fox' $testdata >>testtry
echo "RC=$?" >>testtry
echo "---------------------------- Test 36 -----------------------------" >>testtry
$valgrind ./pcregrep -L -r --include=grepinput --exclude 'grepinput$' 'fox' $testdata >>testtry
echo "RC=$?" >>testtry
echo "---------------------------- Test 37 -----------------------------" >>testtry
$valgrind ./pcregrep '^(a+)*\d' $testdata/grepinput >>testtry 2>teststderr
echo "RC=$?" >>testtry
echo "======== STDERR ========" >>testtry
cat teststderr >>testtry
echo "---------------------------- Test 38 ------------------------------" >>testtry
$valgrind ./pcregrep '>\x00<' $testdata/grepinput >>testtry
echo "---------------------------- Test 39 ------------------------------" >>testtry
$valgrind ./pcregrep -A1 'before the binary zero' $testdata/grepinput >>testtry
echo "---------------------------- Test 40 ------------------------------" >>testtry
$valgrind ./pcregrep -B1 'after the binary zero' $testdata/grepinput >>testtry
echo "---------------------------- Test 41 ------------------------------" >>testtry
$valgrind ./pcregrep -B1 -o '\w+ the binary zero' $testdata/grepinput >>testtry
echo "---------------------------- Test 41 ------------------------------" >>testtry
$valgrind ./pcregrep -B1 -onH '\w+ the binary zero' $testdata/grepinput >>testtry
echo "---------------------------- Test 42 ------------------------------" >>testtry
$valgrind ./pcregrep -on 'before|zero|after' $testdata/grepinput >>testtry
echo "---------------------------- Test 43 ------------------------------" >>testtry
$valgrind ./pcregrep -on -e before -e zero -e after $testdata/grepinput >>testtry
echo "---------------------------- Test 44 ------------------------------" >>testtry
$valgrind ./pcregrep -on -f $testdata/greplist -e binary $testdata/grepinput >>testtry
echo "---------------------------- Test 45 ------------------------------" >>testtry
$valgrind ./pcregrep -e abc -e '(unclosed' $testdata/grepinput 2>>testtry >>testtry
echo "---------------------------- Test 46 ------------------------------" >>testtry
$valgrind ./pcregrep -Fx "AB.VE
elephant" $testdata/grepinput >>testtry
echo "---------------------------- Test 47 ------------------------------" >>testtry
$valgrind ./pcregrep -F "AB.VE
elephant" $testdata/grepinput >>testtry
echo "---------------------------- Test 48 ------------------------------" >>testtry
$valgrind ./pcregrep -F -e DATA -e "AB.VE
elephant" $testdata/grepinput >>testtry
echo "---------------------------- Test 49 ------------------------------" >>testtry
$valgrind ./pcregrep "^(abc|def|ghi|jkl)" $testdata/grepinputx >>testtry
echo "---------------------------- Test 50 ------------------------------" >>testtry
$valgrind ./pcregrep -N CR "^(abc|def|ghi|jkl)" $testdata/grepinputx >>testtry
echo "---------------------------- Test 51 ------------------------------" >>testtry
$valgrind ./pcregrep --newline=crlf "^(abc|def|ghi|jkl)" $testdata/grepinputx >>testtry
echo "---------------------------- Test 52 ------------------------------" >>testtry
$valgrind ./pcregrep --newline=cr -F "def jkl" $testdata/grepinputx >>testtry
echo "---------------------------- Test 53 ------------------------------" >>testtry
$valgrind ./pcregrep --newline=crlf -F "xxx
jkl" $testdata/grepinputx >>testtry
# Now compare the results.
$cf testtry $testdata/grepoutput
if [ $? != 0 ] ; then exit 1; else exit 0; fi
# End

258
libs/pcre/RunTest.in Executable file
View File

@ -0,0 +1,258 @@
#! /bin/sh
# This file is generated by configure from RunTest.in. Make any changes
# to that file.
# Run PCRE tests
cf=diff
valgrind=
if [ ! -d testdata ] ; then
ln -s @top_srcdir@/testdata testdata
fi
testdata=./testdata
# Select which tests to run; if no selection, run all
do1=no
do2=no
do3=no
do4=no
do5=no
do6=no
do7=no
do8=no
do9=no
while [ $# -gt 0 ] ; do
case $1 in
1) do1=yes;;
2) do2=yes;;
3) do3=yes;;
4) do4=yes;;
5) do5=yes;;
6) do6=yes;;
7) do7=yes;;
8) do8=yes;;
9) do9=yes;;
valgrind) valgrind="valgrind -q";;
*) echo "Unknown test number $1"; exit 1;;
esac
shift
done
if [ "@LINK_SIZE@" != "" -a "@LINK_SIZE@" != "-DLINK_SIZE=2" ] ; then
if [ $do2 = yes ] ; then
echo "Can't run test 2 with an internal link size other than 2"
exit 1
fi
if [ $do5 = yes ] ; then
echo "Can't run test 5 with an internal link size other than 2"
exit 1
fi
if [ $do6 = yes ] ; then
echo "Can't run test 6 with an internal link size other than 2"
exit 1
fi
fi
if [ "@UTF8@" = "" ] ; then
if [ $do4 = yes ] ; then
echo "Can't run test 4 because UTF-8 support is not configured"
exit 1
fi
if [ $do5 = yes ] ; then
echo "Can't run test 5 because UTF-8 support is not configured"
exit 1
fi
if [ $do6 = yes ] ; then
echo "Can't run test 6 because UTF-8 support is not configured"
exit 1
fi
if [ $do8 = yes ] ; then
echo "Can't run test 8 because UTF-8 support is not configured"
exit 1
fi
if [ $do9 = yes ] ; then
echo "Can't run test 9 because UTF-8 support is not configured"
exit 1
fi
fi
if [ "@UCP@" = "" ] ; then
if [ $do6 = yes ] ; then
echo "Can't run test 6 because Unicode property support is not configured"
exit 1
fi
if [ $do9 = yes ] ; then
echo "Can't run test 9 because Unicode property support is not configured"
exit 1
fi
fi
if [ $do1 = no -a $do2 = no -a $do3 = no -a $do4 = no -a \
$do5 = no -a $do6 = no -a $do7 = no -a $do8 = no -a \
$do9 = no ] ; then
do1=yes
do2=yes
do3=yes
if [ "@UTF8@" != "" ] ; then do4=yes; fi
if [ "@UTF8@" != "" ] ; then do5=yes; fi
if [ "@UTF8@" != "" -a "@UCP@" != "" ] ; then do6=yes; fi
do7=yes
if [ "@UTF8@" != "" ] ; then do8=yes; fi
if [ "@UTF8@" != "" -a "@UCP@" != "" ] ; then do9=yes; fi
fi
# Show which release
./pcretest /dev/null
# Primary test, Perl-compatible
if [ $do1 = yes ] ; then
echo "Test 1: main functionality (Perl compatible)"
$valgrind ./pcretest -q $testdata/testinput1 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput1
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi
# PCRE tests that are not Perl-compatible - API & error tests, mostly
if [ $do2 = yes ] ; then
if [ "@LINK_SIZE@" = "" -o "@LINK_SIZE@" = "-DLINK_SIZE=2" ] ; then
echo "Test 2: API and error handling (not Perl compatible)"
$valgrind ./pcretest -q -i $testdata/testinput2 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput2
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
else
echo Test 2 skipped for link size other than 2 \(@LINK_SIZE@\)
echo " "
fi
fi
# Locale-specific tests, provided the "fr_FR" locale is available
if [ $do3 = yes ] ; then
locale -a | grep '^fr_FR$' >/dev/null
if [ $? -eq 0 ] ; then
echo "Test 3: locale-specific features (using 'fr_FR' locale)"
$valgrind ./pcretest -q $testdata/testinput3 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput3
if [ $? != 0 ] ; then
echo " "
echo "Locale test did not run entirely successfully."
echo "This usually means that there is a problem with the locale"
echo "settings rather than a bug in PCRE."
else
echo "OK"
fi
echo " "
else exit 1
fi
else
echo "Cannot test locale-specific features - 'fr_FR' locale not found,"
echo "or the \"locale\" command is not available to check for it."
echo " "
fi
fi
# Additional tests for UTF8 support
if [ $do4 = yes ] ; then
echo "Test 4: UTF-8 support (Perl compatible)"
$valgrind ./pcretest -q $testdata/testinput4 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput4
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi
if [ $do5 = yes ] ; then
if [ "@LINK_SIZE@" = "" -o "@LINK_SIZE@" = "-DLINK_SIZE=2" ] ; then
echo "Test 5: API and internals for UTF-8 support (not Perl compatible)"
$valgrind ./pcretest -q $testdata/testinput5 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput5
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
else
echo Test 5 skipped for link size other than 2 \(@LINK_SIZE@\)
echo " "
fi
fi
if [ $do6 = yes ] ; then
if [ "@LINK_SIZE@" = "" -o "@LINK_SIZE@" = "-DLINK_SIZE=2" ] ; then
echo "Test 6: Unicode property support"
$valgrind ./pcretest -q $testdata/testinput6 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput6
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
else
echo Test 6 skipped for link size other than 2 \(@LINK_SIZE@\)
echo " "
fi
fi
# Tests for DFA matching support
if [ $do7 = yes ] ; then
echo "Test 7: DFA matching"
$valgrind ./pcretest -q -dfa $testdata/testinput7 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput7
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi
if [ $do8 = yes ] ; then
echo "Test 8: DFA matching with UTF-8"
$valgrind ./pcretest -q -dfa $testdata/testinput8 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput8
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi
if [ $do9 = yes ] ; then
echo "Test 9: DFA matching with Unicode properties"
$valgrind ./pcretest -q -dfa $testdata/testinput9 testtry
if [ $? = 0 ] ; then
$cf testtry $testdata/testoutput9
if [ $? != 0 ] ; then exit 1; fi
else exit 1
fi
echo "OK"
echo " "
fi
# End

1495
libs/pcre/config.guess vendored Executable file

File diff suppressed because it is too large Load Diff

143
libs/pcre/config.h.in Normal file
View File

@ -0,0 +1,143 @@
/* On Unix-like systems config.in is converted by "configure" into config.h.
Some other environments also support the use of "configure". PCRE is written in
Standard C, but there are a few non-standard things it can cope with, allowing
it to run on SunOS4 and other "close to standard" systems.
On a non-Unix-like system you should just copy this file into config.h, and set
up the macros the way you need them. You should normally change the definitions
of HAVE_STRERROR and HAVE_MEMMOVE to 1. Unfortunately, because of the way
autoconf works, these cannot be made the defaults. If your system has bcopy()
and not memmove(), change the definition of HAVE_BCOPY instead of HAVE_MEMMOVE.
If your system has neither bcopy() nor memmove(), leave them both as 0; an
emulation function will be used. */
/* If you are compiling for a system that uses EBCDIC instead of ASCII
character codes, define this macro as 1. On systems that can use "configure",
this can be done via --enable-ebcdic. */
#ifndef EBCDIC
#define EBCDIC 0
#endif
/* If you are compiling for a system other than a Unix-like system or Win32,
and it needs some magic to be inserted before the definition of a function that
is exported by the library, define this macro to contain the relevant magic. If
you do not define this macro, it defaults to "extern" for a C compiler and
"extern C" for a C++ compiler on non-Win32 systems. This macro apears at the
start of every exported function that is part of the external API. It does not
appear on functions that are "external" in the C sense, but which are internal
to the library. */
/* #define PCRE_DATA_SCOPE */
/* Define the following macro to empty if the "const" keyword does not work. */
#undef const
/* Define the following macro to "unsigned" if <stddef.h> does not define
size_t. */
#undef size_t
/* The following two definitions are mainly for the benefit of SunOS4, which
does not have the strerror() or memmove() functions that should be present in
all Standard C libraries. The macros HAVE_STRERROR and HAVE_MEMMOVE should
normally be defined with the value 1 for other systems, but unfortunately we
cannot make this the default because "configure" files generated by autoconf
will only change 0 to 1; they won't change 1 to 0 if the functions are not
found. */
#define HAVE_STRERROR 0
#define HAVE_MEMMOVE 0
/* There are some non-Unix-like systems that don't even have bcopy(). If this
macro is false, an emulation is used. If HAVE_MEMMOVE is set to 1, the value of
HAVE_BCOPY is not relevant. */
#define HAVE_BCOPY 0
/* The value of NEWLINE determines the newline character. The default is to
leave it up to the compiler, but some sites want to force a particular value.
On Unix-like systems, "configure" can be used to override this default. */
#ifndef NEWLINE
#define NEWLINE '\n'
#endif
/* The value of LINK_SIZE determines the number of bytes used to store links as
offsets within the compiled regex. The default is 2, which allows for compiled
patterns up to 64K long. This covers the vast majority of cases. However, PCRE
can also be compiled to use 3 or 4 bytes instead. This allows for longer
patterns in extreme cases. On systems that support it, "configure" can be used
to override this default. */
#ifndef LINK_SIZE
#define LINK_SIZE 2
#endif
/* When calling PCRE via the POSIX interface, additional working storage is
required for holding the pointers to capturing substrings because PCRE requires
three integers per substring, whereas the POSIX interface provides only two. If
the number of expected substrings is small, the wrapper function uses space on
the stack, because this is faster than using malloc() for each call. The
threshold above which the stack is no longer used is defined by POSIX_MALLOC_
THRESHOLD. On systems that support it, "configure" can be used to override this
default. */
#ifndef POSIX_MALLOC_THRESHOLD
#define POSIX_MALLOC_THRESHOLD 10
#endif
/* PCRE uses recursive function calls to handle backtracking while matching.
This can sometimes be a problem on systems that have stacks of limited size.
Define NO_RECURSE to get a version that doesn't use recursion in the match()
function; instead it creates its own stack by steam using pcre_recurse_malloc()
to obtain memory from the heap. For more detail, see the comments and other
stuff just above the match() function. On systems that support it, "configure"
can be used to set this in the Makefile (use --disable-stack-for-recursion). */
/* #define NO_RECURSE */
/* The value of MATCH_LIMIT determines the default number of times the internal
match() function can be called during a single execution of pcre_exec(). There
is a runtime interface for setting a different limit. The limit exists in order
to catch runaway regular expressions that take for ever to determine that they
do not match. The default is set very large so that it does not accidentally
catch legitimate cases. On systems that support it, "configure" can be used to
override this default default. */
#ifndef MATCH_LIMIT
#define MATCH_LIMIT 10000000
#endif
/* The above limit applies to all calls of match(), whether or not they
increase the recursion depth. In some environments it is desirable to limit the
depth of recursive calls of match() more strictly, in order to restrict the
maximum amount of stack (or heap, if NO_RECURSE is defined) that is used. The
value of MATCH_LIMIT_RECURSION applies only to recursive calls of match(). To
have any useful effect, it must be less than the value of MATCH_LIMIT. There is
a runtime method for setting a different limit. On systems that support it,
"configure" can be used to override this default default. */
#ifndef MATCH_LIMIT_RECURSION
#define MATCH_LIMIT_RECURSION MATCH_LIMIT
#endif
/* These three limits are parameterized just in case anybody ever wants to
change them. Care must be taken if they are increased, because they guard
against integer overflow caused by enormously large patterns. */
#ifndef MAX_NAME_SIZE
#define MAX_NAME_SIZE 32
#endif
#ifndef MAX_NAME_COUNT
#define MAX_NAME_COUNT 10000
#endif
#ifndef MAX_DUPLENGTH
#define MAX_DUPLENGTH 30000
#endif
/* End */

1627
libs/pcre/config.sub vendored Executable file

File diff suppressed because it is too large Load Diff

21093
libs/pcre/configure vendored Executable file

File diff suppressed because it is too large Load Diff

302
libs/pcre/configure.ac Normal file
View File

@ -0,0 +1,302 @@
dnl Process this file with autoconf to produce a configure script.
dnl This configure.in file has been hacked around quite a lot as a result of
dnl patches that various people have sent to me (PH). Sometimes the information
dnl I get is contradictory. I've tried to put in comments that explain things,
dnl but in some cases the information is second-hand and I have no way of
dnl verifying it. I am not an autoconf or libtool expert!
dnl This is required at the start; the name is the name of a file
dnl it should be seeing, to verify it is in the same directory.
AC_INIT(dftables.c)
AC_CONFIG_SRCDIR([pcre.h])
dnl A safety precaution
AC_PREREQ(2.57)
dnl Arrange to build config.h from config.h.in.
dnl Manual says this macro should come right after AC_INIT.
AC_CONFIG_HEADER(config.h)
dnl Default values for miscellaneous macros
POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=10
dnl Provide versioning information for libtool shared libraries that
dnl are built by default on Unix systems.
PCRE_LIB_VERSION=0:1:0
PCRE_POSIXLIB_VERSION=0:0:0
PCRE_CPPLIB_VERSION=0:0:0
dnl Find the PCRE version from the pcre.h file. The PCRE_VERSION variable is
dnl substituted in pcre-config.in.
PCRE_MAJOR=`grep '#define PCRE_MAJOR' ${srcdir}/pcre.h | cut -c 29-`
PCRE_MINOR=`grep '#define PCRE_MINOR' ${srcdir}/pcre.h | cut -c 29-`
PCRE_PRERELEASE=`grep '#define PCRE_PRERELEASE' ${srcdir}/pcre.h | cut -c 29-`
PCRE_VERSION=${PCRE_MAJOR}.${PCRE_MINOR}${PCRE_PRERELEASE}
dnl Handle --disable-cpp
AC_ARG_ENABLE(cpp,
[ --disable-cpp disable C++ support],
want_cpp="$enableval", want_cpp=yes)
dnl Checks for programs.
AC_PROG_CC
dnl Test for C++ for the C++ wrapper libpcrecpp. It seems, however, that
dnl AC_PROC_CXX will set $CXX to "g++" when no C++ compiler is installed, even
dnl though that is completely bogus. (This may happen only on certain systems
dnl with certain versions of autoconf, of course.) An attempt to include this
dnl test inside a check for want_cpp was criticized by a libtool expert, who
dnl tells me that it isn't allowed.
AC_PROG_CXX
dnl The icc compiler has the same options as gcc, so let the rest of the
dnl configure script think it has gcc when setting up dnl options etc.
dnl This is a nasty hack which no longer seems necessary with the update
dnl to the latest libtool files, so I have commented it out.
dnl
dnl if test "$CC" = "icc" ; then GCC=yes ; fi
AC_PROG_INSTALL
AC_LIBTOOL_WIN32_DLL
AC_PROG_LIBTOOL
dnl We need to find a compiler for compiling a program to run on the local host
dnl while building. It needs to be different from CC when cross-compiling.
dnl There is a macro called AC_PROG_CC_FOR_BUILD in the GNU archive for
dnl figuring this out automatically. Unfortunately, it does not work with the
dnl latest versions of autoconf. So for the moment, we just default to the
dnl same values as the "main" compiler. People who are cross-compiling will
dnl just have to adjust the Makefile by hand or set these values when they
dnl run "configure".
CC_FOR_BUILD=${CC_FOR_BUILD:-'$(CC)'}
CXX_FOR_BUILD=${CXX_FOR_BUILD:-'$(CXX)'}
CFLAGS_FOR_BUILD=${CFLAGS_FOR_BUILD:-'$(CFLAGS)'}
CPPFLAGS_FOR_BUILD=${CFLAGS_FOR_BUILD:-'$(CPPFLAGS)'}
CXXFLAGS_FOR_BUILD=${CXXFLAGS_FOR_BUILD:-'$(CXXFLAGS)'}
BUILD_EXEEXT=${BUILD_EXEEXT:-'$(EXEEXT)'}
BUILD_OBJEXT=${BUILD_OBJEXT:-'$(OBJEXT)'}
dnl Checks for header files.
AC_HEADER_STDC
AC_CHECK_HEADERS(limits.h)
dnl The files below are C++ header files. One person told me (PH) that
dnl AC_LANG_CPLUSPLUS unsets CXX if it was explicitly set to something which
dnl doesn't work. However, this doesn't always seem to be the case.
if test "x$want_cpp" = "xyes" -a -n "$CXX"
then
AC_LANG_SAVE
AC_LANG_CPLUSPLUS
dnl We could be more clever here, given we're doing AC_SUBST with this
dnl (eg set a var to be the name of the include file we want). But we're not
dnl so it's easy to change back to 'regular' autoconf vars if we needed to.
AC_CHECK_HEADERS(string, [pcre_have_cpp_headers="1"],
[pcre_have_cpp_headers="0"])
AC_CHECK_HEADERS(bits/type_traits.h, [pcre_have_bits_type_traits="1"],
[pcre_have_bits_type_traits="0"])
AC_CHECK_HEADERS(type_traits.h, [pcre_have_type_traits="1"],
[pcre_have_type_traits="0"])
dnl Using AC_SUBST eliminates the need to include config.h in a public .h file
AC_SUBST(pcre_have_bits_type_traits)
AC_SUBST(pcre_have_type_traits)
AC_LANG_RESTORE
fi
dnl From the above, we now have enough info to know if C++ is fully installed
if test "x$want_cpp" = "xyes" -a -n "$CXX" -a "$pcre_have_cpp_headers" = 1; then
MAYBE_CPP_TARGETS='$(CPP_TARGETS)'
HAVE_CPP=
else
MAYBE_CPP_TARGETS=
HAVE_CPP="#"
fi
AC_SUBST(MAYBE_CPP_TARGETS)
AC_SUBST(HAVE_CPP)
dnl Checks for typedefs, structures, and compiler characteristics.
AC_C_CONST
AC_TYPE_SIZE_T
AC_CHECK_TYPES([long long], [pcre_have_long_long="1"], [pcre_have_long_long="0"])
AC_CHECK_TYPES([unsigned long long], [pcre_have_ulong_long="1"], [pcre_have_ulong_long="0"])
AC_SUBST(pcre_have_long_long)
AC_SUBST(pcre_have_ulong_long)
dnl Checks for library functions.
AC_CHECK_FUNCS(bcopy memmove strerror strtoq strtoll)
dnl Handle --enable-utf8
AC_ARG_ENABLE(utf8,
[ --enable-utf8 enable UTF8 support],
if test "$enableval" = "yes"; then
UTF8=-DSUPPORT_UTF8
fi
)
dnl Handle --enable-unicode-properties
AC_ARG_ENABLE(unicode-properties,
[ --enable-unicode-properties enable Unicode properties support],
if test "$enableval" = "yes"; then
UCP=-DSUPPORT_UCP
fi
)
dnl Handle --enable-newline-is-cr
AC_ARG_ENABLE(newline-is-cr,
[ --enable-newline-is-cr use CR as the newline character],
if test "$enableval" = "yes"; then
NEWLINE=-DNEWLINE=13
fi
)
dnl Handle --enable-newline-is-lf
AC_ARG_ENABLE(newline-is-lf,
[ --enable-newline-is-lf use LF as the newline character],
if test "$enableval" = "yes"; then
NEWLINE=-DNEWLINE=10
fi
)
dnl Handle --enable-newline-is-crlf
AC_ARG_ENABLE(newline-is-crlf,
[ --enable-newline-is-crlf use CRLF as the newline sequence],
if test "$enableval" = "yes"; then
NEWLINE=-DNEWLINE=3338
fi
)
dnl Handle --enable-ebcdic
AC_ARG_ENABLE(ebcdic,
[ --enable-ebcdic assume EBCDIC coding rather than ASCII],
if test "$enableval" == "yes"; then
EBCDIC=-DEBCDIC=1
fi
)
dnl Handle --disable-stack-for-recursion
AC_ARG_ENABLE(stack-for-recursion,
[ --disable-stack-for-recursion disable use of stack recursion when matching],
if test "$enableval" = "no"; then
NO_RECURSE=-DNO_RECURSE
fi
)
dnl There doesn't seem to be a straightforward way of having parameters
dnl that set values, other than fudging the --with thing. So that's what
dnl I've done.
dnl Handle --with-posix-malloc-threshold=n
AC_ARG_WITH(posix-malloc-threshold,
[ --with-posix-malloc-threshold=10 threshold for POSIX malloc usage],
POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=$withval
)
dnl Handle --with-link-size=n
AC_ARG_WITH(link-size,
[ --with-link-size=2 internal link size (2, 3, or 4 allowed)],
LINK_SIZE=-DLINK_SIZE=$withval
)
dnl Handle --with-match-limit=n
AC_ARG_WITH(match-limit,
[ --with-match-limit=10000000 default limit on internal looping],
MATCH_LIMIT=-DMATCH_LIMIT=$withval
)
dnl Handle --with-match-limit_recursion=n
AC_ARG_WITH(match-limit-recursion,
[ --with-match-limit-recursion=10000000 default limit on internal recursion],
MATCH_LIMIT_RECURSION=-DMATCH_LIMIT_RECURSION=$withval
)
dnl Unicode character property support implies UTF-8 support
if test "$UCP" != "" ; then
UTF8=-DSUPPORT_UTF8
fi
dnl "Export" these variables
AC_SUBST(BUILD_EXEEXT)
AC_SUBST(BUILD_OBJEXT)
AC_SUBST(CC_FOR_BUILD)
AC_SUBST(CXX_FOR_BUILD)
AC_SUBST(CFLAGS_FOR_BUILD)
AC_SUBST(CXXFLAGS_FOR_BUILD)
AC_SUBST(CXXLDFLAGS)
AC_SUBST(EBCDIC)
AC_SUBST(HAVE_MEMMOVE)
AC_SUBST(HAVE_STRERROR)
AC_SUBST(LINK_SIZE)
AC_SUBST(MATCH_LIMIT)
AC_SUBST(MATCH_LIMIT_RECURSION)
AC_SUBST(NEWLINE)
AC_SUBST(NO_RECURSE)
AC_SUBST(PCRE_LIB_VERSION)
AC_SUBST(PCRE_POSIXLIB_VERSION)
AC_SUBST(PCRE_CPPLIB_VERSION)
AC_SUBST(PCRE_VERSION)
AC_SUBST(POSIX_MALLOC_THRESHOLD)
AC_SUBST(UCP)
AC_SUBST(UTF8)
dnl Stuff to make MinGW work better. Special treatment is no longer
dnl needed for Cygwin.
case $host_os in
mingw* )
POSIX_OBJ=pcreposix.o
POSIX_LOBJ=pcreposix.lo
POSIX_LIB=
ON_WINDOWS=
NOT_ON_WINDOWS="#"
WIN_PREFIX=
;;
* )
ON_WINDOWS="#"
NOT_ON_WINDOWS=
POSIX_OBJ=
POSIX_LOBJ=
POSIX_LIB=libpcreposix.la
WIN_PREFIX=
;;
esac
AC_SUBST(WIN_PREFIX)
AC_SUBST(ON_WINDOWS)
AC_SUBST(NOT_ON_WINDOWS)
AC_SUBST(POSIX_OBJ)
AC_SUBST(POSIX_LOBJ)
AC_SUBST(POSIX_LIB)
if test "x$enable_shared" = "xno" ; then
AC_DEFINE([PCRE_STATIC],[1],[to link statically])
fi
dnl This must be last; it determines what files are written as well as config.h
AC_OUTPUT(Makefile pcre-config:pcre-config.in libpcre.pc:libpcre.pc.in pcrecpparg.h:pcrecpparg.h.in pcre_stringpiece.h:pcre_stringpiece.h.in RunGrepTest:RunGrepTest.in RunTest:RunTest.in,[chmod a+x RunTest RunGrepTest pcre-config])

172
libs/pcre/dftables.c Normal file
View File

@ -0,0 +1,172 @@
/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/
/* PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.
Written by Philip Hazel
Copyright (c) 1997-2006 University of Cambridge
-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the University of Cambridge nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
*/
/* This is a freestanding support program to generate a file containing default
character tables for PCRE. The tables are built according to the default C
locale. Now that pcre_maketables is a function visible to the outside world, we
make use of its code from here in order to be consistent. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include "pcre_internal.h"
#define DFTABLES /* pcre_maketables.c notices this */
#include "pcre_maketables.c"
int main(int argc, char **argv)
{
int i;
FILE *f;
const unsigned char *tables = pcre_maketables();
const unsigned char *base_of_tables = tables;
if (argc != 2)
{
fprintf(stderr, "dftables: one filename argument is required\n");
return 1;
}
f = fopen(argv[1], "wb");
if (f == NULL)
{
fprintf(stderr, "dftables: failed to open %s for writing\n", argv[1]);
return 1;
}
/* There are two fprintf() calls here, because gcc in pedantic mode complains
about the very long string otherwise. */
fprintf(f,
"/*************************************************\n"
"* Perl-Compatible Regular Expressions *\n"
"*************************************************/\n\n"
"/* This file is automatically written by the dftables auxiliary \n"
"program. If you edit it by hand, you might like to edit the Makefile to \n"
"prevent its ever being regenerated.\n\n");
fprintf(f,
"This file contains the default tables for characters with codes less than\n"
"128 (ASCII characters). These tables are used when no external tables are\n"
"passed to PCRE. */\n\n"
"const unsigned char _pcre_default_tables[] = {\n\n"
"/* This table is a lower casing table. */\n\n");
fprintf(f, " ");
for (i = 0; i < 256; i++)
{
if ((i & 7) == 0 && i != 0) fprintf(f, "\n ");
fprintf(f, "%3d", *tables++);
if (i != 255) fprintf(f, ",");
}
fprintf(f, ",\n\n");
fprintf(f, "/* This table is a case flipping table. */\n\n");
fprintf(f, " ");
for (i = 0; i < 256; i++)
{
if ((i & 7) == 0 && i != 0) fprintf(f, "\n ");
fprintf(f, "%3d", *tables++);
if (i != 255) fprintf(f, ",");
}
fprintf(f, ",\n\n");
fprintf(f,
"/* This table contains bit maps for various character classes.\n"
"Each map is 32 bytes long and the bits run from the least\n"
"significant end of each byte. The classes that have their own\n"
"maps are: space, xdigit, digit, upper, lower, word, graph\n"
"print, punct, and cntrl. Other classes are built from combinations. */\n\n");
fprintf(f, " ");
for (i = 0; i < cbit_length; i++)
{
if ((i & 7) == 0 && i != 0)
{
if ((i & 31) == 0) fprintf(f, "\n");
fprintf(f, "\n ");
}
fprintf(f, "0x%02x", *tables++);
if (i != cbit_length - 1) fprintf(f, ",");
}
fprintf(f, ",\n\n");
fprintf(f,
"/* This table identifies various classes of character by individual bits:\n"
" 0x%02x white space character\n"
" 0x%02x letter\n"
" 0x%02x decimal digit\n"
" 0x%02x hexadecimal digit\n"
" 0x%02x alphanumeric or '_'\n"
" 0x%02x regular expression metacharacter or binary zero\n*/\n\n",
ctype_space, ctype_letter, ctype_digit, ctype_xdigit, ctype_word,
ctype_meta);
fprintf(f, " ");
for (i = 0; i < 256; i++)
{
if ((i & 7) == 0 && i != 0)
{
fprintf(f, " /* ");
if (isprint(i-8)) fprintf(f, " %c -", i-8);
else fprintf(f, "%3d-", i-8);
if (isprint(i-1)) fprintf(f, " %c ", i-1);
else fprintf(f, "%3d", i-1);
fprintf(f, " */\n ");
}
fprintf(f, "0x%02x", *tables++);
if (i != 255) fprintf(f, ",");
}
fprintf(f, "};/* ");
if (isprint(i-8)) fprintf(f, " %c -", i-8);
else fprintf(f, "%3d-", i-8);
if (isprint(i-1)) fprintf(f, " %c ", i-1);
else fprintf(f, "%3d", i-1);
fprintf(f, " */\n\n/* End of chartables.c */\n");
fclose(f);
free((void *)base_of_tables);
return 0;
}
/* End of dftables.c */

348
libs/pcre/doc/Tech.Notes Normal file
View File

@ -0,0 +1,348 @@
Technical Notes about PCRE
--------------------------
These are very rough technical notes that record potentially useful information
about PCRE internals.
Historical note 1
-----------------
Many years ago I implemented some regular expression functions to an algorithm
suggested by Martin Richards. These were not Unix-like in form, and were quite
restricted in what they could do by comparison with Perl. The interesting part
about the algorithm was that the amount of space required to hold the compiled
form of an expression was known in advance. The code to apply an expression did
not operate by backtracking, as the original Henry Spencer code and current
Perl code does, but instead checked all possibilities simultaneously by keeping
a list of current states and checking all of them as it advanced through the
subject string. In the terminology of Jeffrey Friedl's book, it was a "DFA
algorithm". When the pattern was all used up, all remaining states were
possible matches, and the one matching the longest subset of the subject string
was chosen. This did not necessarily maximize the individual wild portions of
the pattern, as is expected in Unix and Perl-style regular expressions.
Historical note 2
-----------------
By contrast, the code originally written by Henry Spencer (which was
subsequently heavily modified for Perl) compiles the expression twice: once in
a dummy mode in order to find out how much store will be needed, and then for
real. (The Perl version probably doesn't do this any more; I'm talking about
the original library.) The execution function operates by backtracking and
maximizing (or, optionally, minimizing in Perl) the amount of the subject that
matches individual wild portions of the pattern. This is an "NFA algorithm" in
Friedl's terminology.
OK, here's the real stuff
-------------------------
For the set of functions that form the "basic" PCRE library (which are
unrelated to those mentioned above), I tried at first to invent an algorithm
that used an amount of store bounded by a multiple of the number of characters
in the pattern, to save on compiling time. However, because of the greater
complexity in Perl regular expressions, I couldn't do this. In any case, a
first pass through the pattern is needed, for a number of reasons. PCRE works
by running a very degenerate first pass to calculate a maximum store size, and
then a second pass to do the real compile - which may use a bit less than the
predicted amount of store. The idea is that this is going to turn out faster
because the first pass is degenerate and the second pass can just store stuff
straight into the vector, which it knows is big enough. It does make the
compiling functions bigger, of course, but they have become quite big anyway to
handle all the Perl stuff.
Traditional matching function
-----------------------------
The "traditional", and original, matching function is called pcre_exec(), and
it implements an NFA algorithm, similar to the original Henry Spencer algorithm
and the way that Perl works. Not surprising, since it is intended to be as
compatible with Perl as possible. This is the function most users of PCRE will
use most of the time.
Supplementary matching function
-------------------------------
From PCRE 6.0, there is also a supplementary matching function called
pcre_dfa_exec(). This implements a DFA matching algorithm that searches
simultaneously for all possible matches that start at one point in the subject
string. (Going back to my roots: see Historical Note 1 above.) This function
intreprets the same compiled pattern data as pcre_exec(); however, not all the
facilities are available, and those that are do not always work in quite the
same way. See the user documentation for details.
Format of compiled patterns
---------------------------
The compiled form of a pattern is a vector of bytes, containing items of
variable length. The first byte in an item is an opcode, and the length of the
item is either implicit in the opcode or contained in the data bytes that
follow it.
In many cases below "two-byte" data values are specified. This is in fact just
a default. PCRE can be compiled to use 3-byte or 4-byte values (impairing the
performance). This is necessary only when patterns whose compiled length is
greater than 64K are going to be processed. In this description, we assume the
"normal" compilation options.
A list of all the opcodes follows:
Opcodes with no following data
------------------------------
These items are all just one byte long
OP_END end of pattern
OP_ANY match any character
OP_ANYBYTE match any single byte, even in UTF-8 mode
OP_SOD match start of data: \A
OP_SOM, start of match (subject + offset): \G
OP_CIRC ^ (start of data, or after \n in multiline)
OP_NOT_WORD_BOUNDARY \W
OP_WORD_BOUNDARY \w
OP_NOT_DIGIT \D
OP_DIGIT \d
OP_NOT_WHITESPACE \S
OP_WHITESPACE \s
OP_NOT_WORDCHAR \W
OP_WORDCHAR \w
OP_EODN match end of data or \n at end: \Z
OP_EOD match end of data: \z
OP_DOLL $ (end of data, or before \n in multiline)
OP_EXTUNI match an extended Unicode character
Repeating single characters
---------------------------
The common repeats (*, +, ?) when applied to a single character use the
following opcodes:
OP_STAR
OP_MINSTAR
OP_PLUS
OP_MINPLUS
OP_QUERY
OP_MINQUERY
In ASCII mode, these are two-byte items; in UTF-8 mode, the length is variable.
Those with "MIN" in their name are the minimizing versions. Each is followed by
the character that is to be repeated. Other repeats make use of
OP_UPTO
OP_MINUPTO
OP_EXACT
which are followed by a two-byte count (most significant first) and the
repeated character. OP_UPTO matches from 0 to the given number. A repeat with a
non-zero minimum and a fixed maximum is coded as an OP_EXACT followed by an
OP_UPTO (or OP_MINUPTO).
Repeating character types
-------------------------
Repeats of things like \d are done exactly as for single characters, except
that instead of a character, the opcode for the type is stored in the data
byte. The opcodes are:
OP_TYPESTAR
OP_TYPEMINSTAR
OP_TYPEPLUS
OP_TYPEMINPLUS
OP_TYPEQUERY
OP_TYPEMINQUERY
OP_TYPEUPTO
OP_TYPEMINUPTO
OP_TYPEEXACT
Match by Unicode property
-------------------------
OP_PROP and OP_NOTPROP are used for positive and negative matches of a
character by testing its Unicode property (the \p and \P escape sequences).
Each is followed by two bytes that encode the desired property as a type and a
value.
Repeats of these items use the OP_TYPESTAR etc. set of opcodes, followed by
three bytes: OP_PROP or OP_NOTPROP and then the desired property type and
value.
Matching literal characters
---------------------------
The OP_CHAR opcode is followed by a single character that is to be matched
casefully. For caseless matching, OP_CHARNC is used. In UTF-8 mode, the
character may be more than one byte long. (Earlier versions of PCRE used
multi-character strings, but this was changed to allow some new features to be
added.)
Character classes
-----------------
If there is only one character, OP_CHAR or OP_CHARNC is used for a positive
class, and OP_NOT for a negative one (that is, for something like [^a]).
However, in UTF-8 mode, the use of OP_NOT applies only to characters with
values < 128, because OP_NOT is confined to single bytes.
Another set of repeating opcodes (OP_NOTSTAR etc.) are used for a repeated,
negated, single-character class. The normal ones (OP_STAR etc.) are used for a
repeated positive single-character class.
When there's more than one character in a class and all the characters are less
than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a negative
one. In either case, the opcode is followed by a 32-byte bit map containing a 1
bit for every character that is acceptable. The bits are counted from the least
significant end of each byte.
The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 mode,
subject characters with values greater than 256 can be handled correctly. For
OP_CLASS they don't match, whereas for OP_NCLASS they do.
For classes containing characters with values > 255, OP_XCLASS is used. It
optionally uses a bit map (if any characters lie within it), followed by a list
of pairs and single characters. There is a flag character than indicates
whether it's a positive or a negative class.
Back references
---------------
OP_REF is followed by two bytes containing the reference number.
Repeating character classes and back references
-----------------------------------------------
Single-character classes are handled specially (see above). This applies to
OP_CLASS and OP_REF. In both cases, the repeat information follows the base
item. The matching code looks at the following opcode to see if it is one of
OP_CRSTAR
OP_CRMINSTAR
OP_CRPLUS
OP_CRMINPLUS
OP_CRQUERY
OP_CRMINQUERY
OP_CRRANGE
OP_CRMINRANGE
All but the last two are just single-byte items. The others are followed by
four bytes of data, comprising the minimum and maximum repeat counts.
Brackets and alternation
------------------------
A pair of non-capturing (round) brackets is wrapped round each expression at
compile time, so alternation always happens in the context of brackets.
Non-capturing brackets use the opcode OP_BRA, while capturing brackets use
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
speakers, including myself, can be round, square, curly, or pointy. Hence this
usage.]
Originally PCRE was limited to 99 capturing brackets (so as not to use up all
the opcodes). From release 3.5, there is no limit. What happens is that the
first ones, up to EXTRACT_BASIC_MAX are handled with separate opcodes, as
above. If there are more, the opcode is set to EXTRACT_BASIC_MAX+1, and the
first operation in the bracket is OP_BRANUMBER, followed by a 2-byte bracket
number. This opcode is ignored while matching, but is fished out when handling
the bracket itself. (They could have all been done like this, but I was making
minimal changes.)
A bracket opcode is followed by LINK_SIZE bytes which give the offset to the
next alternative OP_ALT or, if there aren't any branches, to the matching
OP_KET opcode. Each OP_ALT is followed by LINK_SIZE bytes giving the offset to
the next one, or to the OP_KET opcode.
OP_KET is used for subpatterns that do not repeat indefinitely, while
OP_KETRMIN and OP_KETRMAX are used for indefinite repetitions, minimally or
maximally respectively. All three are followed by LINK_SIZE bytes giving (as a
positive number) the offset back to the matching OP_BRA opcode.
If a subpattern is quantified such that it is permitted to match zero times, it
is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
opcodes which tell the matcher that skipping this subpattern entirely is a
valid branch.
A subpattern with an indefinite maximum repetition is replicated in the
compiled data its minimum number of times (or once with OP_BRAZERO if the
minimum is zero), with the final copy terminating with OP_KETRMIN or OP_KETRMAX
as appropriate.
A subpattern with a bounded maximum repetition is replicated in a nested
fashion up to the maximum number of times, with OP_BRAZERO or OP_BRAMINZERO
before each replication after the minimum, so that, for example, (abc){2,5} is
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?.
Assertions
----------
Forward assertions are just like other subpatterns, but starting with one of
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
is OP_REVERSE, followed by a two byte count of the number of characters to move
back the pointer in the subject string. When operating in UTF-8 mode, the count
is a character count rather than a byte count. A separate count is present in
each alternative of a lookbehind assertion, allowing them to have different
fixed lengths.
Once-only subpatterns
---------------------
These are also just like other subpatterns, but they start with the opcode
OP_ONCE.
Conditional subpatterns
-----------------------
These are like other subpatterns, but they start with the opcode OP_COND. If
the condition is a back reference, this is stored at the start of the
subpattern using the opcode OP_CREF followed by two bytes containing the
reference number. If the condition is "in recursion" (coded as "(?(R)"), the
same scheme is used, with a "reference number" of 0xffff. Otherwise, a
conditional subpattern always starts with one of the assertions.
Recursion
---------
Recursion either matches the current regex, or some subexpression. The opcode
OP_RECURSE is followed by an value which is the offset to the starting bracket
from the start of the whole pattern. From release 6.5, OP_RECURSE is
automatically wrapped inside OP_ONCE brackets (because otherwise some patterns
broke it). OP_RECURSE is also used for "subroutine" calls, even though they
are not strictly a recursion.
Callout
-------
OP_CALLOUT is followed by one byte of data that holds a callout number in the
range 0 to 254 for manual callouts, or 255 for an automatic callout. In both
cases there follows a two-byte value giving the offset in the pattern to the
start of the following item, and another two-byte item giving the length of the
next item.
Changing options
----------------
If any of the /i, /m, or /s options are changed within a pattern, an OP_OPT
opcode is compiled, followed by one byte containing the new settings of these
flags. If there are several alternatives, there is an occurrence of OP_OPT at
the start of all those following the first options change, to set appropriate
options for the start of the alternative. Immediately after the end of the
group there is another such item to reset the flags to their previous values. A
change of flag right at the very start of the pattern can be handled entirely
at compile time, and so does not cause anything to be put into the compiled
data.
Philip Hazel
June 2006

View File

@ -0,0 +1,128 @@
<html>
<head>
<title>PCRE specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>Perl-compatible Regular Expressions (PCRE)</h1>
<p>
The HTML documentation for PCRE comprises the following pages:
</p>
<table>
<tr><td><a href="pcre.html">pcre</a></td>
<td>&nbsp;&nbsp;Introductory page</td></tr>
<tr><td><a href="pcreapi.html">pcreapi</a></td>
<td>&nbsp;&nbsp;PCRE's native API</td></tr>
<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
<td>&nbsp;&nbsp;Options for building PCRE</td></tr>
<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
<td>&nbsp;&nbsp;The <i>callout</i> facility</td></tr>
<tr><td><a href="pcrecompat.html">pcrecompat</a></td>
<td>&nbsp;&nbsp;Compability with Perl</td></tr>
<tr><td><a href="pcrecpp.html">pcrecpp</a></td>
<td>&nbsp;&nbsp;The C++ wrapper for the PCRE library</td></tr>
<tr><td><a href="pcregrep.html">pcregrep</a></td>
<td>&nbsp;&nbsp;The <b>pcregrep</b> command</td></tr>
<tr><td><a href="pcrematching.html">pcrematching</a></td>
<td>&nbsp;&nbsp;Discussion of the two matching algorithms</td></tr>
<tr><td><a href="pcrepartial.html">pcrepartial</a></td>
<td>&nbsp;&nbsp;Using PCRE for partial matching</td></tr>
<tr><td><a href="pcrepattern.html">pcrepattern</a></td>
<td>&nbsp;&nbsp;Specification of the regular expressions supported by PCRE</td></tr>
<tr><td><a href="pcreperform.html">pcreperform</a></td>
<td>&nbsp;&nbsp;Some comments on performance</td></tr>
<tr><td><a href="pcreposix.html">pcreposix</a></td>
<td>&nbsp;&nbsp;The POSIX API to the PCRE library</td></tr>
<tr><td><a href="pcreprecompile.html">pcreprecompile</a></td>
<td>&nbsp;&nbsp;How to save and re-use compiled patterns</td></tr>
<tr><td><a href="pcresample.html">pcresample</a></td>
<td>&nbsp;&nbsp;Description of the sample program</td></tr>
<tr><td><a href="pcrestack.html">pcrestack</a></td>
<td>&nbsp;&nbsp;Discussion of PCRE's stack usage</td></tr>
<tr><td><a href="pcretest.html">pcretest</a></td>
<td>&nbsp;&nbsp;The <b>pcretest</b> command for testing PCRE</td></tr>
</table>
<p>
There are also individual pages that summarize the interface for each function
in the library:
</p>
<table>
<tr><td><a href="pcre_compile.html">pcre_compile</a></td>
<td>&nbsp;&nbsp;Compile a regular expression</td></tr>
<tr><td><a href="pcre_compile2.html">pcre_compile2</a></td>
<td>&nbsp;&nbsp;Compile a regular expression (alternate interface)</td></tr>
<tr><td><a href="pcre_config.html">pcre_config</a></td>
<td>&nbsp;&nbsp;Show build-time configuration options</td></tr>
<tr><td><a href="pcre_copy_named_substring.html">pcre_copy_named_substring</a></td>
<td>&nbsp;&nbsp;Extract named substring into given buffer</td></tr>
<tr><td><a href="pcre_copy_substring.html">pcre_copy_substring</a></td>
<td>&nbsp;&nbsp;Extract numbered substring into given buffer</td></tr>
<tr><td><a href="pcre_dfa_exec.html">pcre_dfa_exec</a></td>
<td>&nbsp;&nbsp;Match a compiled pattern to a subject string
(DFA algorithm; <i>not</i> Perl compatible)</td></tr>
<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
<td>&nbsp;&nbsp;Match a compiled pattern to a subject string
(Perl compatible)</td></tr>
<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
<td>&nbsp;&nbsp;Free extracted substring</td></tr>
<tr><td><a href="pcre_free_substring_list.html">pcre_free_substring_list</a></td>
<td>&nbsp;&nbsp;Free list of extracted substrings</td></tr>
<tr><td><a href="pcre_fullinfo.html">pcre_fullinfo</a></td>
<td>&nbsp;&nbsp;Extract information about a pattern</td></tr>
<tr><td><a href="pcre_get_named_substring.html">pcre_get_named_substring</a></td>
<td>&nbsp;&nbsp;Extract named substring into new memory</td></tr>
<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
<td>&nbsp;&nbsp;Convert captured string name to number</td></tr>
<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
<td>&nbsp;&nbsp;Extract numbered substring into new memory</td></tr>
<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
<td>&nbsp;&nbsp;Extract all substrings into new memory</td></tr>
<tr><td><a href="pcre_info.html">pcre_info</a></td>
<td>&nbsp;&nbsp;Obsolete information extraction function</td></tr>
<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
<td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
<tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
<td>&nbsp;&nbsp;Maintain reference count in compiled pattern</td></tr>
<tr><td><a href="pcre_study.html">pcre_study</a></td>
<td>&nbsp;&nbsp;Study a compiled pattern</td></tr>
<tr><td><a href="pcre_version.html">pcre_version</a></td>
<td>&nbsp;&nbsp;Return PCRE version and release date</td></tr>
</table>
</html>

View File

@ -0,0 +1,252 @@
<html>
<head>
<title>pcre specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
<li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>
<li><a name="TOC3" href="#SEC3">LIMITATIONS</a>
<li><a name="TOC4" href="#SEC4">UTF-8 AND UNICODE PROPERTY SUPPORT</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
<P>
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
differences. The current implementation of PCRE (release 6.x) corresponds
approximately with Perl 5.8, including support for UTF-8 encoded strings and
Unicode general category properties. However, this support has to be explicitly
enabled; it is not the default.
</P>
<P>
In addition to the Perl-compatible matching function, PCRE also contains an
alternative matching function that matches the same compiled patterns in a
different way. In certain circumstances, the alternative function has some
advantages. For a discussion of the two matching algorithms, see the
<a href="pcrematching.html"><b>pcrematching</b></a>
page.
</P>
<P>
PCRE is written in C and released as a C library. A number of people have
written wrappers and interfaces of various kinds. In particular, Google Inc.
have provided a comprehensive C++ wrapper. This is now included as part of the
PCRE distribution. The
<a href="pcrecpp.html"><b>pcrecpp</b></a>
page has details of this interface. Other people's contributions can be found
in the <i>Contrib</i> directory at the primary FTP site, which is:
<a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>
</P>
<P>
Details of exactly which Perl regular expression features are and are not
supported by PCRE are given in separate documents. See the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
and
<a href="pcrecompat.html"><b>pcrecompat</b></a>
pages.
</P>
<P>
Some features of PCRE can be included, excluded, or changed when the library is
built. The
<a href="pcre_config.html"><b>pcre_config()</b></a>
function makes it possible for a client to discover which features are
available. The features themselves are described in the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
page. Documentation about building PCRE for various operating systems can be
found in the <b>README</b> file in the source distribution.
</P>
<P>
The library contains a number of undocumented internal functions and data
tables that are used by more than one of the exported external functions, but
which are not intended for use by external callers. Their names all begin with
"_pcre_", which hopefully will not provoke any name clashes. In some
environments, it is possible to control which external symbols are exported
when a shared library is built, and in these cases the undocumented symbols are
not exported.
</P>
<br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>
<P>
The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format,
each is a separate page, linked from the index page. In the plain text format,
all the sections are concatenated, for ease of searching. The sections are as
follows:
<pre>
pcre this document
pcreapi details of PCRE's native C API
pcrebuild options for building PCRE
pcrecallout details of the callout feature
pcrecompat discussion of Perl compatibility
pcrecpp details of the C++ wrapper
pcregrep description of the <b>pcregrep</b> command
pcrematching discussion of the two matching algorithms
pcrepartial details of the partial matching facility
pcrepattern syntax and semantics of supported regular expressions
pcreperform discussion of performance issues
pcreposix the POSIX-compatible C API
pcreprecompile details of saving and re-using precompiled patterns
pcresample discussion of the sample program
pcrestack discussion of stack usage
pcretest description of the <b>pcretest</b> testing command
</pre>
In addition, in the "man" and HTML formats, there is a short page for each
C library function, listing its arguments and results.
</P>
<br><a name="SEC3" href="#TOC1">LIMITATIONS</a><br>
<P>
There are some size limitations in PCRE but it is hoped that they will never in
practice be relevant.
</P>
<P>
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
compiled with the default internal linkage size of 2. If you want to process
regular expressions that are truly enormous, you can compile PCRE with an
internal linkage size of 3 or 4 (see the <b>README</b> file in the source
distribution and the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
documentation for details). In these cases the limit is substantially larger.
However, the speed of execution will be slower.
</P>
<P>
All values in repeating quantifiers must be less than 65536. The maximum
compiled length of subpattern with an explicit repeat count is 30000 bytes. The
maximum number of capturing subpatterns is 65535.
</P>
<P>
There is no limit to the number of non-capturing subpatterns, but the maximum
depth of nesting of all kinds of parenthesized subpattern, including capturing
subpatterns, assertions, and other types of subpattern, is 200.
</P>
<P>
The maximum length of name for a named subpattern is 32, and the maximum number
of named subpatterns is 10000.
</P>
<P>
The maximum length of a subject string is the largest positive number that an
integer variable can hold. However, when using the traditional matching
function, PCRE uses recursion to handle subpatterns and indefinite repetition.
This means that the available stack space may limit the size of a subject
string that can be processed by certain patterns. For a discussion of stack
issues, see the
<a href="pcrestack.html"><b>pcrestack</b></a>
documentation.
<a name="utf8support"></a></P>
<br><a name="SEC4" href="#TOC1">UTF-8 AND UNICODE PROPERTY SUPPORT</a><br>
<P>
From release 3.3, PCRE has had some support for character strings encoded in
the UTF-8 format. For release 4.0 this was greatly extended to cover most
common requirements, and in release 5.0 additional support for Unicode general
category properties was added.
</P>
<P>
In order process UTF-8 strings, you must build PCRE to include UTF-8 support in
the code, and, in addition, you must call
<a href="pcre_compile.html"><b>pcre_compile()</b></a>
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
subject strings that are matched against it are treated as UTF-8 strings
instead of just strings of bytes.
</P>
<P>
If you compile PCRE with UTF-8 support, but do not use it at run time, the
library will be a bit bigger, but the additional run time overhead is limited
to testing the PCRE_UTF8 flag in several places, so should not be very large.
</P>
<P>
If PCRE is built with Unicode character property support (which implies UTF-8
support), the escape sequences \p{..}, \P{..}, and \X are supported.
The available properties that can be tested are limited to the general
category properties such as Lu for an upper case letter or Nd for a decimal
number, the Unicode script names such as Arabic or Han, and the derived
properties Any and L&. A full list is given in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. Only the short names for properties are supported. For example,
\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
compatibility with Perl 5.6. PCRE does not support this.
</P>
<P>
The following comments apply when PCRE is running in UTF-8 mode:
</P>
<P>
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
are checked for validity on entry to the relevant functions. If an invalid
UTF-8 string is passed, an error return is given. In some situations, you may
already know that your strings are valid, and therefore want to skip these
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
at compile time or at run time, PCRE assumes that the pattern or subject it
is given (respectively) contains only valid UTF-8 codes. In this case, it does
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
may crash.
</P>
<P>
2. An unbraced hexadecimal escape sequence (such as \xb3) matches a two-byte
UTF-8 character if the value is greater than 127.
</P>
<P>
3. Octal numbers up to \777 are recognized, and match two-byte UTF-8
characters for values greater than \177.
</P>
<P>
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
bytes, for example: \x{100}{3}.
</P>
<P>
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
</P>
<P>
6. The escape sequence \C can be used to match a single byte in UTF-8 mode,
but its use can lead to some strange effects. This facility is not available in
the alternative matching function, <b>pcre_dfa_exec()</b>.
</P>
<P>
7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
test characters of any code value, but the characters that PCRE recognizes as
digits, spaces, or word characters remain the same set as before, all with
values less than 256. This remains true even when PCRE includes Unicode
property support, because to do otherwise would slow down PCRE in many common
cases. If you really want to test for a wider sense of, say, "digit", you
must use Unicode property tests such as \p{Nd}.
</P>
<P>
8. Similarly, characters that match the POSIX named character classes are all
low-valued characters.
</P>
<P>
9. Case-insensitive matching applies only to characters whose values are less
than 128, unless PCRE is built with Unicode property support. Even when Unicode
property support is available, PCRE still uses its own character tables when
checking the case of low-valued characters, so as not to degrade performance.
The Unicode property information is used only for characters with higher
values. Even when Unicode property support is available, PCRE supports
case-insensitive matching only when there is a one-to-one mapping between a
letter's cases. There are a small number of many-to-one mappings in Unicode;
these are not supported by PCRE.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Putting an actual email address here seems to have been a spam magnet, so I've
taken it away. If you want to email me, use my initial and surname, separated
by a dot, at the domain ucs.cam.ac.uk.
Last updated: 05 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,80 @@
<html>
<head>
<title>pcre_compile specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_compile man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b>const unsigned char *<i>tableptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function compiles a regular expression into an internal form. Its
arguments are:
<pre>
<i>pattern</i> A zero-terminated string containing the
regular expression to be compiled
<i>options</i> Zero or more option bits
<i>errptr</i> Where to put an error message
<i>erroffset</i> Offset in pattern where error was found
<i>tableptr</i> Pointer to character tables, or NULL to
use the built-in default
</pre>
The option bits are:
<pre>
PCRE_ANCHORED Force pattern anchoring
PCRE_AUTO_CALLOUT Compile automatic callouts
PCRE_CASELESS Do caseless matching
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_DUPNAMES Allow duplicate names for subpatterns
PCRE_EXTENDED Ignore whitespace and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_FIRSTLINE Force matching to be before newline
PCRE_MULTILINE ^ and $ match newlines within data
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
PCRE_UNGREEDY Invert greediness of quantifiers
PCRE_UTF8 Run in UTF-8 mode
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
</pre>
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
PCRE_NO_UTF8_CHECK.
</P>
<P>
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,85 @@
<html>
<head>
<title>pcre_compile2 specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_compile2 man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>pcre *pcre_compile2(const char *<i>pattern</i>, int <i>options</i>,</b>
<b>int *<i>errorcodeptr</i>,</b>
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b>const unsigned char *<i>tableptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function compiles a regular expression into an internal form. It is the
same as <b>pcre_compile()</b>, except for the addition of the <i>errorcodeptr</i>
argument. The arguments are:
</P>
<P>
<pre>
<i>pattern</i> A zero-terminated string containing the
regular expression to be compiled
<i>options</i> Zero or more option bits
<i>errorcodeptr</i> Where to put an error code
<i>errptr</i> Where to put an error message
<i>erroffset</i> Offset in pattern where error was found
<i>tableptr</i> Pointer to character tables, or NULL to
use the built-in default
</pre>
The option bits are:
<pre>
PCRE_ANCHORED Force pattern anchoring
PCRE_AUTO_CALLOUT Compile automatic callouts
PCRE_CASELESS Do caseless matching
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_DUPNAMES Allow duplicate names for subpatterns
PCRE_EXTENDED Ignore whitespace and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_FIRSTLINE Force matching to be before newline
PCRE_MULTILINE ^ and $ match newlines within data
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
PCRE_UNGREEDY Invert greediness of quantifiers
PCRE_UTF8 Run in UTF-8 mode
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
</pre>
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
PCRE_NO_UTF8_CHECK.
</P>
<P>
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,62 @@
<html>
<head>
<title>pcre_config specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_config man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function makes it possible for a client program to find out which optional
features are available in the version of the PCRE library it is using. Its
arguments are as follows:
<pre>
<i>what</i> A code specifying what information is required
<i>where</i> Points to where to put the data
</pre>
The available codes are:
<pre>
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
PCRE_CONFIG_MATCH_LIMIT_RECURSION
Internal recursion depth limit
PCRE_CONFIG_NEWLINE Value of the newline sequence
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
Threshold of return slots, above
which <b>malloc()</b> is used by
the POSIX API
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
PCRE_CONFIG_UNICODE_PROPERTIES
Availability of Unicode property support
(1=yes 0=no)
</pre>
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,53 @@
<html>
<head>
<title>pcre_copy_named_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_copy_named_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
<b>char *<i>buffer</i>, int <i>buffersize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring, identified
by name, into a given buffer. The arguments are:
<pre>
<i>code</i> Pattern that was successfully matched
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringname</i> Name of the required substring
<i>buffer</i> Buffer to receive the string
<i>buffersize</i> Size of buffer
</pre>
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,51 @@
<html>
<head>
<title>pcre_copy_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_copy_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
<b>int <i>buffersize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring into a given
buffer. The arguments are:
<pre>
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringnumber</i> Number of the required substring
<i>buffer</i> Buffer to receive the string
<i>buffersize</i> Size of buffer
</pre>
The yield is the legnth of the string, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,93 @@
<html>
<head>
<title>pcre_dfa_exec specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_dfa_exec man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
<b>int *<i>workspace</i>, int <i>wscount</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function matches a compiled regular expression against a given subject
string, using a DFA matching algorithm (<i>not</i> Perl-compatible). Note that
the main, Perl-compatible, matching function is <b>pcre_exec()</b>. The
arguments for this function are:
<pre>
<i>code</i> Points to the compiled pattern
<i>extra</i> Points to an associated <b>pcre_extra</b> structure,
or is NULL
<i>subject</i> Points to the subject string
<i>length</i> Length of the subject string, in bytes
<i>startoffset</i> Offset in bytes in the subject at which to
start matching
<i>options</i> Option bits
<i>ovector</i> Points to a vector of ints for result offsets
<i>ovecsize</i> Number of elements in the vector
<i>workspace</i> Points to a vector of ints used as working space
<i>wscount</i> Number of elements in the vector
</pre>
The options are:
<pre>
PCRE_ANCHORED Match only at the first position
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NOTBOL Subject is not the beginning of a line
PCRE_NOTEOL Subject is not the end of a line
PCRE_NOTEMPTY An empty string is not a valid match
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
PCRE_DFA_SHORTEST Return only the shortest match
PCRE_DFA_RESTART This is a restart after a partial match
</pre>
There are restrictions on what may appear in a pattern when matching using the
DFA algorithm is requested. Details are given in the
<a href="pcrematching.html"><b>pcrematching</b></a>
documentation.
</P>
<P>
A <b>pcre_extra</b> structure contains the following fields:
<pre>
<i>flags</i> Bits indicating which fields are set
<i>study_data</i> Opaque data from <b>pcre_study()</b>
<i>match_limit</i> Limit on internal resource use
<i>match_limit_recursion</i> Limit on internal recursion depth
<i>callout_data</i> Opaque data passed back to callouts
<i>tables</i> Points to character tables or is NULL
</pre>
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
PCRE_EXTRA_TABLES. For DFA matching, the <i>match_limit</i> and
<i>match_limit_recursion</i> fields are not used, and must not be set.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,84 @@
<html>
<head>
<title>pcre_exec specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_exec man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function matches a compiled regular expression against a given subject
string, using a matching algorithm that is similar to Perl's. It returns
offsets to captured substrings. Its arguments are:
<pre>
<i>code</i> Points to the compiled pattern
<i>extra</i> Points to an associated <b>pcre_extra</b> structure,
or is NULL
<i>subject</i> Points to the subject string
<i>length</i> Length of the subject string, in bytes
<i>startoffset</i> Offset in bytes in the subject at which to
start matching
<i>options</i> Option bits
<i>ovector</i> Points to a vector of ints for result offsets
<i>ovecsize</i> Number of elements in the vector (a multiple of 3)
</pre>
The options are:
<pre>
PCRE_ANCHORED Match only at the first position
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NOTBOL Subject is not the beginning of a line
PCRE_NOTEOL Subject is not the end of a line
PCRE_NOTEMPTY An empty string is not a valid match
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
</pre>
There are restrictions on what may appear in a pattern when partial matching is
requested.
</P>
<P>
A <b>pcre_extra</b> structure contains the following fields:
<pre>
<i>flags</i> Bits indicating which fields are set
<i>study_data</i> Opaque data from <b>pcre_study()</b>
<i>match_limit</i> Limit on internal resource use
<i>match_limit_recursion</i> Limit on internal recursion depth
<i>callout_data</i> Opaque data passed back to callouts
<i>tables</i> Points to character tables or is NULL
</pre>
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
PCRE_EXTRA_TABLES.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,40 @@
<html>
<head>
<title>pcre_free_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_free_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>void pcre_free_substring(const char *<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for freeing the store obtained by a previous
call to <b>pcre_get_substring()</b> or <b>pcre_get_named_substring()</b>. Its
only argument is a pointer to the string.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,40 @@
<html>
<head>
<title>pcre_free_substring_list specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_free_substring_list man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>void pcre_free_substring_list(const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for freeing the store obtained by a previous
call to <b>pcre_get_substring_list()</b>. Its only argument is a pointer to the
list of string pointers.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,71 @@
<html>
<head>
<title>pcre_fullinfo specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_fullinfo man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>int <i>what</i>, void *<i>where</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function returns information about a compiled pattern. Its arguments are:
<pre>
<i>code</i> Compiled regular expression
<i>extra</i> Result of <b>pcre_study()</b> or NULL
<i>what</i> What information is required
<i>where</i> Where to put the information
</pre>
The following information is available:
<pre>
PCRE_INFO_BACKREFMAX Number of highest back reference
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
PCRE_INFO_DEFAULT_TABLES Pointer to default tables
PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
-1 for start of string
or after newline, or
-2 otherwise
PCRE_INFO_FIRSTTABLE Table of first bytes
(after studying)
PCRE_INFO_LASTLITERAL Literal last byte required
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
PCRE_INFO_OPTIONS Options used for compilation
PCRE_INFO_SIZE Size of compiled pattern
PCRE_INFO_STUDYSIZE Size of study data
</pre>
The yield of the function is zero on success or:
<pre>
PCRE_ERROR_NULL the argument <i>code</i> was NULL
the argument <i>where</i> was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
</PRE>
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,54 @@
<html>
<head>
<title>pcre_get_named_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_named_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
<b>const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring by name. The
arguments are:
<pre>
<i>code</i> Compiled pattern
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringname</i> Name of the required substring
<i>stringptr</i> Where to put the string pointer
</pre>
The memory in which the substring is placed is obtained by calling
<b>pcre_malloc()</b>. The yield of the function is the length of the extracted
substring, PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
PCRE_ERROR_NOSUBSTRING if the string name is invalid.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,46 @@
<html>
<head>
<title>pcre_get_stringnumber specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_stringnumber man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
<b>const char *<i>name</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This convenience function finds the number of a named substring capturing
parenthesis in a compiled pattern. Its arguments are:
<pre>
<i>code</i> Compiled regular expression
<i>name</i> Name whose number is required
</pre>
The yield of the function is the number of the parenthesis if the name is
found, or PCRE_ERROR_NOSUBSTRING otherwise.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,52 @@
<html>
<head>
<title>pcre_get_stringtable_entries specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_stringtable_entries man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_stringtable_entries(const pcre *<i>code</i>,</b>
<b>const char *<i>name</i>, char **<i>first</i>, char **<i>last</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This convenience function finds, for a compiled pattern, the first and last
entries for a given name in the table that translates capturing parenthesis
names into numbers. When names are required to be unique (PCRE_DUPNAMES is
<i>not</i> set), it is usually easier to use <b>pcre_get_stringnumber()</b>
instead.
<pre>
<i>code</i> Compiled regular expression
<i>name</i> Name whose entries required
<i>first</i> Where to return a pointer to the first entry
<i>last</i> Where to return a pointer to the last entry
</pre>
The yield of the function is the length of each entry, or
PCRE_ERROR_NOSUBSTRING if none are found.
</P>
<P>
There is a complete description of the PCRE native API, including the format of
the table entries, in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,52 @@
<html>
<head>
<title>pcre_get_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
<b>const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring. The
arguments are:
<pre>
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringnumber</i> Number of the required substring
<i>stringptr</i> Where to put the string pointer
</pre>
The memory in which the substring is placed is obtained by calling
<b>pcre_malloc()</b>. The yield of the function is the length of the substring,
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
PCRE_ERROR_NOSUBSTRING if the string number is invalid.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,51 @@
<html>
<head>
<title>pcre_get_substring_list specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_substring_list man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
<b>int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a list of all the captured
substrings. The arguments are:
<pre>
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec</b> used
<i>stringcount</i> Value returned by <b>pcre_exec</b>
<i>listptr</i> Where to put a pointer to the list
</pre>
The memory in which the substrings and the list are placed is obtained by
calling <b>pcre_malloc()</b>. A pointer to a list of pointers is put in
the variable whose address is in <i>listptr</i>. The list is terminated by a
NULL pointer. The yield of the function is zero on success or
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,39 @@
<html>
<head>
<title>pcre_info specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_info man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b>
<b>*<i>firstcharptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function is obsolete. You should be using <b>pcre_fullinfo()</b> instead.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,42 @@
<html>
<head>
<title>pcre_maketables specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_maketables man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>const unsigned char *pcre_maketables(void);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function builds a set of character tables for character values less than
256. These can be passed to <b>pcre_compile()</b> to override PCRE's internal,
built-in tables (which were made by <b>pcre_maketables()</b> when PCRE was
compiled). You might want to do this if you are using a non-standard locale.
The function yields a pointer to the tables.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,45 @@
<html>
<head>
<title>pcre_refcount specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_refcount man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_refcount(pcre *<i>code</i>, int <i>adjust</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function is used to maintain a reference count inside a data block that
contains a compiled pattern. Its arguments are:
<pre>
<i>code</i> Compiled regular expression
<i>adjust</i> Adjustment to reference value
</pre>
The yield of the function is the adjusted reference value, which is constrained
to lie between 0 and 65535.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,56 @@
<html>
<head>
<title>pcre_study specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_study man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
<b>const char **<i>errptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function studies a compiled pattern, to see if additional information can
be extracted that might speed up matching. Its arguments are:
<pre>
<i>code</i> A compiled regular expression
<i>options</i> Options for <b>pcre_study()</b>
<i>errptr</i> Where to put an error message
</pre>
If the function succeeds, it returns a value that can be passed to
<b>pcre_exec()</b> via its <i>extra</i> argument.
</P>
<P>
If the function returns NULL, either it could not find any additional
information, or there was an error. You can tell the difference by looking at
the error value. It is NULL in first case.
</P>
<P>
There are currently no options defined; the value of the second argument should
always be zero.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,39 @@
<html>
<head>
<title>pcre_version specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_version man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>char *pcre_version(void);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function returns a character string that gives the version number of the
PCRE library and the date of its release.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,225 @@
<html>
<head>
<title>pcrebuild specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrebuild man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC2" href="#SEC2">C++ SUPPORT</a>
<li><a name="TOC3" href="#SEC3">UTF-8 SUPPORT</a>
<li><a name="TOC4" href="#SEC4">UNICODE CHARACTER PROPERTY SUPPORT</a>
<li><a name="TOC5" href="#SEC5">CODE VALUE OF NEWLINE</a>
<li><a name="TOC6" href="#SEC6">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC7" href="#SEC7">POSIX MALLOC USAGE</a>
<li><a name="TOC8" href="#SEC8">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC9" href="#SEC9">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC10" href="#SEC10">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC11" href="#SEC11">USING EBCDIC CODE</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<P>
This document describes the optional features of PCRE that can be selected when
the library is compiled. They are all selected, or deselected, by providing
options to the <b>configure</b> script that is run before the <b>make</b>
command. The complete list of options for <b>configure</b> (which includes the
standard ones such as the selection of the installation directory) can be
obtained by running
<pre>
./configure --help
</pre>
The following sections describe certain options whose names begin with --enable
or --disable. These settings specify changes to the defaults for the
<b>configure</b> command. Because of the way that <b>configure</b> works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
</P>
<br><a name="SEC2" href="#TOC1">C++ SUPPORT</a><br>
<P>
By default, the <b>configure</b> script will search for a C++ compiler and C++
header files. If it finds them, it automatically builds the C++ wrapper library
for PCRE. You can disable this by adding
<pre>
--disable-cpp
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC3" href="#TOC1">UTF-8 SUPPORT</a><br>
<P>
To build PCRE with support for UTF-8 character strings, add
<pre>
--enable-utf8
</pre>
to the <b>configure</b> command. Of itself, this does not make PCRE treat
strings as UTF-8. As well as compiling PCRE with this option, you also have
have to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
function.
</P>
<br><a name="SEC4" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
<P>
UTF-8 support allows PCRE to process character values greater than 255 in the
strings that it handles. On its own, however, it does not provide any
facilities for accessing the properties of such characters. If you want to be
able to use the pattern escapes \P, \p, and \X, which refer to Unicode
character properties, you must add
<pre>
--enable-unicode-properties
</pre>
to the <b>configure</b> command. This implies UTF-8 support, even if you have
not explicitly requested it.
</P>
<P>
Including Unicode property support adds around 90K of tables to the PCRE
library, approximately doubling its size. Only the general category properties
such as <i>Lu</i> and <i>Nd</i> are supported. Details are given in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation.
</P>
<br><a name="SEC5" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
<P>
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
compile PCRE to use character 13 (carriage return, CR) instead, by adding
<pre>
--enable-newline-is-cr
</pre>
to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
<br>
<br>
Alternatively, you can specify that line endings are to be indicated by the two
character sequence CRLF. If you want this, add
<pre>
--enable-newline-is-crlf
</pre>
to the <b>configure</b> command. Whatever line ending convention is selected
when PCRE is built can be overridden when the library functions are called. At
build time it is conventional to use the standard for your operating system.
</P>
<br><a name="SEC6" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<P>
The PCRE building process uses <b>libtool</b> to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
<pre>
--disable-shared
--disable-static
</pre>
to the <b>configure</b> command, as required.
</P>
<br><a name="SEC7" href="#TOC1">POSIX MALLOC USAGE</a><br>
<P>
When PCRE is called through the POSIX interface (see the
<a href="pcreposix.html"><b>pcreposix</b></a>
documentation), additional working storage is required for holding the pointers
to capturing substrings, because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using <b>malloc()</b> for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
<pre>
--with-posix-malloc-threshold=20
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC8" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
<P>
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process enormous patterns, so it is possible to compile PCRE to use three-byte
or four-byte offsets by adding a setting such as
<pre>
--with-link-size=3
</pre>
to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
longer offsets slows down the operation of PCRE because it has to load
additional bytes when handling them.
</P>
<P>
If you build PCRE with an increased link size, test 2 (and test 5 if you are
using UTF-8) will fail. Part of the output of these tests is a representation
of the compiled pattern, and this changes with the link size.
</P>
<br><a name="SEC9" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
<P>
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
by making recursive calls to an internal function called <b>match()</b>. In
environments where the size of the stack is limited, this can severely limit
PCRE's operation. (The Unix environment does not usually suffer from this
problem, but it may sometimes be necessary to increase the maximum stack size.
There is a discussion in the
<a href="pcrestack.html"><b>pcrestack</b></a>
documentation.) An alternative approach to recursion that uses memory from the
heap to remember data, instead of using recursive function calls, has been
implemented to work round the problem of limited stack size. If you want to
build a version of PCRE that works this way, add
<pre>
--disable-stack-for-recursion
</pre>
to the <b>configure</b> command. With this configuration, PCRE will use the
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
management functions. Separate functions are provided because the usage is very
predictable: the block sizes requested are always the same, and the blocks are
always freed in reverse order. A calling program might be able to implement
optimized functions that perform better than the standard <b>malloc()</b> and
<b>free()</b> functions. PCRE runs noticeably more slowly when built in this
way. This option affects only the <b>pcre_exec()</b> function; it is not
relevant for the the <b>pcre_dfa_exec()</b> function.
</P>
<br><a name="SEC10" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
<P>
Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
function. By controlling the maximum number of times this function may be
called during a single matching operation, a limit can be placed on the
resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
at run time, as described in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation. The default is 10 million, but this can be changed by adding a
setting such as
<pre>
--with-match-limit=500000
</pre>
to the <b>configure</b> command. This setting has no effect on the
<b>pcre_dfa_exec()</b> matching function.
</P>
<P>
In some environments it is desirable to limit the depth of recursive calls of
<b>match()</b> more strictly than the total number of calls, in order to
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
is specified) that is used. A second limit controls this; it defaults to the
value that is set for --with-match-limit, which imposes no additional
constraints. However, you can set a lower limit by adding, for example,
<pre>
--with-match-limit-recursion=10000
</pre>
to the <b>configure</b> command. This value can also be overridden at run time.
</P>
<br><a name="SEC11" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or Unicode, which is a superset of ASCII). PCRE can, however, be
compiled to run in an EBCDIC environment by adding
<pre>
--enable-ebcdic
</pre>
to the <b>configure</b> command.
</P>
<P>
Last updated: 06 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,186 @@
<html>
<head>
<title>pcrecallout specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecallout man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE CALLOUTS</a>
<li><a name="TOC2" href="#SEC2">MISSING CALLOUTS</a>
<li><a name="TOC3" href="#SEC3">THE CALLOUT INTERFACE</a>
<li><a name="TOC4" href="#SEC4">RETURN VALUES</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE CALLOUTS</a><br>
<P>
<b>int (*pcre_callout)(pcre_callout_block *);</b>
</P>
<P>
PCRE provides a feature called "callout", which is a means of temporarily
passing control to the caller of PCRE in the middle of pattern matching. The
caller of PCRE provides an external function by putting its entry point in the
global variable <i>pcre_callout</i>. By default, this variable contains NULL,
which disables all calling out.
</P>
<P>
Within a regular expression, (?C) indicates the points at which the external
function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero.
For example, this pattern has two callout points:
<pre>
(?C1)\deabc(?C2)def
</pre>
If the PCRE_AUTO_CALLOUT option bit is set when <b>pcre_compile()</b> is called,
PCRE automatically inserts callouts, all with number 255, before each item in
the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
<pre>
A(\d{2}|--)
</pre>
it is processed as if it were
<br>
<br>
(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
<br>
<br>
Notice that there is a callout before and after each parenthesis and
alternation bar. Automatic callouts can be used for tracking the progress of
pattern matching. The
<a href="pcretest.html"><b>pcretest</b></a>
command has an option that sets automatic callouts; when it is used, the output
indicates how the pattern is matched. This is useful information when you are
trying to optimize the performance of a particular pattern.
</P>
<br><a name="SEC2" href="#TOC1">MISSING CALLOUTS</a><br>
<P>
You should be aware that, because of optimizations in the way PCRE matches
patterns, callouts sometimes do not happen. For example, if the pattern is
<pre>
ab(?C4)cd
</pre>
PCRE knows that any matching string must contain the letter "d". If the subject
string is "abyz", the lack of "d" means that matching doesn't ever start, and
the callout is never reached. However, with "abyd", though the result is still
no match, the callout is obeyed.
</P>
<br><a name="SEC3" href="#TOC1">THE CALLOUT INTERFACE</a><br>
<P>
During matching, when PCRE reaches a callout point, the external function
defined by <i>pcre_callout</i> is called (if it is set). This applies to both
the <b>pcre_exec()</b> and the <b>pcre_dfa_exec()</b> matching functions. The
only argument to the callout function is a pointer to a <b>pcre_callout</b>
block. This structure contains the following fields:
<pre>
int <i>version</i>;
int <i>callout_number</i>;
int *<i>offset_vector</i>;
const char *<i>subject</i>;
int <i>subject_length</i>;
int <i>start_match</i>;
int <i>current_position</i>;
int <i>capture_top</i>;
int <i>capture_last</i>;
void *<i>callout_data</i>;
int <i>pattern_position</i>;
int <i>next_item_length</i>;
</pre>
The <i>version</i> field is an integer containing the version number of the
block format. The initial version was 0; the current version is 1. The version
number will change again in future if additional fields are added, but the
intention is never to remove any of the existing fields.
</P>
<P>
The <i>callout_number</i> field contains the number of the callout, as compiled
into the pattern (that is, the number after ?C for manual callouts, and 255 for
automatically generated callouts).
</P>
<P>
The <i>offset_vector</i> field is a pointer to the vector of offsets that was
passed by the caller to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. When
<b>pcre_exec()</b> is used, the contents can be inspected in order to extract
substrings that have been matched so far, in the same way as for extracting
substrings after a match has completed. For <b>pcre_dfa_exec()</b> this field is
not useful.
</P>
<P>
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
that were passed to <b>pcre_exec()</b>.
</P>
<P>
The <i>start_match</i> field contains the offset within the subject at which the
current match attempt started. If the pattern is not anchored, the callout
function may be called several times from the same point in the pattern for
different starting points in the subject.
</P>
<P>
The <i>current_position</i> field contains the offset within the subject of the
current match pointer.
</P>
<P>
When the <b>pcre_exec()</b> function is used, the <i>capture_top</i> field
contains one more than the number of the highest numbered captured substring so
far. If no substrings have been captured, the value of <i>capture_top</i> is
one. This is always the case when <b>pcre_dfa_exec()</b> is used, because it
does not support captured substrings.
</P>
<P>
The <i>capture_last</i> field contains the number of the most recently captured
substring. If no substrings have been captured, its value is -1. This is always
the case when <b>pcre_dfa_exec()</b> is used.
</P>
<P>
The <i>callout_data</i> field contains a value that is passed to
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> specifically so that it can be
passed back in callouts. It is passed in the <i>pcre_callout</i> field of the
<b>pcre_extra</b> data structure. If no such data was passed, the value of
<i>callout_data</i> in a <b>pcre_callout</b> block is NULL. There is a
description of the <b>pcre_extra</b> structure in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<P>
The <i>pattern_position</i> field is present from version 1 of the
<i>pcre_callout</i> structure. It contains the offset to the next item to be
matched in the pattern string.
</P>
<P>
The <i>next_item_length</i> field is present from version 1 of the
<i>pcre_callout</i> structure. It contains the length of the next item to be
matched in the pattern string. When the callout immediately precedes an
alternation bar, a closing parenthesis, or the end of the pattern, the length
is zero. When the callout precedes an opening parenthesis, the length is that
of the entire subpattern.
</P>
<P>
The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
help in distinguishing between different automatic callouts, which all have the
same callout number. However, they are set for all callouts.
</P>
<br><a name="SEC4" href="#TOC1">RETURN VALUES</a><br>
<P>
The external callout function returns an integer to PCRE. If the value is zero,
matching proceeds as normal. If the value is greater than zero, matching fails
at the current point, but the testing of other matching possibilities goes
ahead, just as if a lookahead assertion had failed. If the value is less than
zero, the match is abandoned, and <b>pcre_exec()</b> (or <b>pcre_dfa_exec()</b>)
returns the negative value.
</P>
<P>
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
it will never be used by PCRE itself.
</P>
<P>
Last updated: 28 February 2005
<br>
Copyright &copy; 1997-2005 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,156 @@
<html>
<head>
<title>pcrecompat specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecompat man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
DIFFERENCES BETWEEN PCRE AND PERL
</b><br>
<P>
This document describes the differences in the ways that PCRE and Perl handle
regular expressions. The differences described here are with respect to Perl
5.8.
</P>
<P>
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
it does have are given in the
<a href="pcre.html#utf8support">section on UTF-8 support</a>
in the main
<a href="pcre.html"><b>pcre</b></a>
page.
</P>
<P>
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
them, but they do not mean what you might think. For example, (?!a){3} does
not assert that the next three characters are not "a". It just asserts that the
next character is not "a" three times.
</P>
<P>
3. Capturing subpatterns that occur inside negative lookahead assertions are
counted, but their entries in the offsets vector are never set. Perl sets its
numerical variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but only if the
negative lookahead assertion contains just one branch.
</P>
<P>
4. Though binary zero characters are supported in the subject string, they are
not allowed in a pattern string because it is passed as a normal C string,
terminated by zero. The escape sequence \0 can be used in the pattern to
represent a binary zero.
</P>
<P>
5. The following Perl escape sequences are not supported: \l, \u, \L,
\U, and \N. In fact these are implemented by Perl's general string-handling
and are not part of its pattern matching engine. If any of these are
encountered by PCRE, an error is generated.
</P>
<P>
6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is
built with Unicode character property support. The properties that can be
tested with \p and \P are limited to the general category properties such as
Lu and Nd, script names such as Greek or Han, and the derived properties Any
and L&.
</P>
<P>
7. PCRE does support the \Q...\E escape for quoting substrings. Characters in
between are treated as literals. This is slightly different from Perl in that $
and @ are also handled as literals inside the quotes. In Perl, they cause
variable interpolation (but of course PCRE does not have variables). Note the
following examples:
<pre>
Pattern PCRE matches Perl matches
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
\Qabc\$xyz\E abc\$xyz abc\$xyz
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
</pre>
The \Q...\E sequence is recognized both inside and outside character classes.
</P>
<P>
8. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
constructions. However, there is support for recursive patterns using the
non-Perl items (?R), (?number), and (?P&#62;name). Also, the PCRE "callout" feature
allows an external function to be called during pattern matching. See the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation for details.
</P>
<P>
9. There are some differences that are concerned with the settings of captured
strings when part of a pattern is repeated. For example, matching "aba" against
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
</P>
<P>
10. PCRE provides some extensions to the Perl regular expression facilities:
<br>
<br>
(a) Although lookbehind assertions must match fixed length strings, each
alternative branch of a lookbehind assertion can match a different length of
string. Perl requires them all to have the same length.
<br>
<br>
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
meta-character matches only at the very end of the string.
<br>
<br>
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
meaning is faulted. Otherwise, like Perl, the backslash is ignored. (Perl can
be made to issue a warning.)
<br>
<br>
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.
<br>
<br>
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
only at the first matching position in the subject string.
<br>
<br>
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
options for <b>pcre_exec()</b> have no Perl equivalents.
<br>
<br>
(g) The (?R), (?number), and (?P&#62;name) constructs allows for recursive pattern
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
support.)
<br>
<br>
(h) PCRE supports named capturing substrings, using the Python syntax.
<br>
<br>
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
package.
<br>
<br>
(j) The (R) condition, for testing recursion, is a PCRE extension.
<br>
<br>
(k) The callout facility is PCRE-specific.
<br>
<br>
(l) The partial matching facility is PCRE-specific.
<br>
<br>
(m) Patterns compiled by PCRE can be saved and re-used at a later time, even on
different hosts that have the other endianness.
<br>
<br>
(n) The alternative matching function (<b>pcre_dfa_exec()</b>) matches in a
different way and is not Perl-compatible.
</P>
<P>
Last updated: 06 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,337 @@
<html>
<head>
<title>pcrecpp specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecpp man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a>
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a>
<li><a name="TOC6" href="#SEC6">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
<li><a name="TOC7" href="#SEC7">SCANNING TEXT INCREMENTALLY</a>
<li><a name="TOC8" href="#SEC8">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
<li><a name="TOC9" href="#SEC9">REPLACING PARTS OF STRINGS</a>
<li><a name="TOC10" href="#SEC10">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
<P>
<b>#include &#60;pcrecpp.h&#62;</b>
</P>
<P>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
The C++ wrapper for PCRE was provided by Google Inc. Some additional
functionality was added by Giuseppe Maxia. This brief man page was constructed
from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
further details.
</P>
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
<P>
The "FullMatch" operation checks that supplied text matches a supplied pattern
exactly. If pointer arguments are supplied, it copies matched sub-strings that
match sub-patterns into them.
<pre>
Example: successful match
pcrecpp::RE re("h.*o");
re.FullMatch("hello");
Example: unsuccessful match (requires full match):
pcrecpp::RE re("e");
!re.FullMatch("hello");
Example: creating a temporary RE object:
pcrecpp::RE("h.*o").FullMatch("hello");
</pre>
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. You can, as in the different examples above, store
the RE object explicitly in a variable or use a temporary RE object. The
examples below use one mode or the other arbitrarily. Either could correctly be
used for any of these examples.
</P>
<P>
You must supply extra pointer arguments to extract matched subpieces.
<pre>
Example: extracts "ruby" into "s" and 1234 into "i"
int i;
string s;
pcrecpp::RE re("(\\w+):(\\d+)");
re.FullMatch("ruby:1234", &s, &i);
Example: does not try to extract any extra sub-patterns
re.FullMatch("ruby:1234", &s);
Example: does not try to extract into NULL
re.FullMatch("ruby:1234", NULL, &i);
Example: integer overflow causes failure
!re.FullMatch("ruby:1234567891234", NULL, &i);
Example: fails because there aren't enough sub-patterns:
!pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
Example: fails because string cannot be stored in integer
!pcrecpp::RE("(.*)").FullMatch("ruby", &i);
</pre>
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
<pre>
string (matched piece is copied to string)
StringPiece (StringPiece is mutated to point to matched piece)
T (where "bool T::ParseFrom(const char*, int)" exists)
NULL (the corresponding matched sub-pattern is not copied)
</pre>
The function returns true iff all of the following conditions are satisfied:
<pre>
a. "text" matches "pattern" exactly;
b. The number of matched sub-patterns is &#62;= number of supplied
pointers;
c. The "i"th argument has a suitable type for holding the
string captured as the "i"th sub-pattern. If you pass in
NULL for the "i"th argument, or pass fewer arguments than
number of sub-patterns, "i"th captured sub-pattern is
ignored.
</pre>
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
<b>DoMatch</b>.
</P>
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHES</a><br>
<P>
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
<pre>
Example: simple search for a string:
pcrecpp::RE("ell").PartialMatch("hello");
Example: find first number in a string:
int number;
pcrecpp::RE re("(\\d+)");
re.PartialMatch("x*100 + 20", &number);
assert(number == 100);
</PRE>
</P>
<br><a name="SEC5" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
<P>
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and string to be treated
as UTF-8 text, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF8 text. For example, "." will match one byte normally but with UTF8 set may
match up to three bytes of a multi-byte character.
<pre>
Example:
pcrecpp::RE_Options options;
options.set_utf8();
pcrecpp::RE re(utf8_pattern, options);
re.FullMatch(utf8_string);
Example: using the convenience function UTF8():
pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
re.FullMatch(utf8_string);
</pre>
NOTE: The UTF8 flag is ignored if pcre was not configured with the
<pre>
--enable-utf8 flag.
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
<P>
PCRE defines some modifiers to change the behavior of the regular expression
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
pass such modifiers to a RE class. Currently, the following modifiers are
supported:
<pre>
modifier description Perl corresponding
PCRE_CASELESS case insensitive match /i
PCRE_MULTILINE multiple lines match /m
PCRE_DOTALL dot matches newlines /s
PCRE_DOLLAR_ENDONLY $ matches only at end N/A
PCRE_EXTRA strict escape parsing N/A
PCRE_EXTENDED ignore whitespaces /x
PCRE_UTF8 handles UTF8 chars built-in
PCRE_UNGREEDY reverses * and *? N/A
PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*)
</pre>
(*) Both Perl and PCRE allow non capturing parentheses by means of the
"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not
capture, while (ab|cd) does.
</P>
<P>
For a full account on how each modifier works, please check the
PCRE API reference page.
</P>
<P>
For each modifier, there are two member functions whose name is made
out of the modifier in lowercase, without the "PCRE_" prefix. For
instance, PCRE_CASELESS is handled by
<pre>
bool caseless()
</pre>
which returns true if the modifier is set, and
<pre>
RE_Options & set_caseless(bool)
</pre>
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
functions. Setting <i>match_limit</i> to a non-zero value will limit the
execution of pcre to keep it from doing bad things like blowing the stack or
taking an eternity to return a result. A value of 5000 is good enough to stop
stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
match limiting. Alternatively, you can call <b>match_limit_recursion()</b>
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
recurses. <b>match_limit()</b> limits the number of matches PCRE does;
<b>match_limit_recursion()</b> limits the depth of internal recursion, and
therefore the amount of stack that is used.
</P>
<P>
Normally, to pass one or more modifiers to a RE class, you declare
a <i>RE_Options</i> object, set the appropriate options, and pass this
object to a RE constructor. Example:
<pre>
RE_options opt;
opt.set_caseless(true);
if (RE("HELLO", opt).PartialMatch("hello world")) ...
</pre>
RE_options has two constructors. The default constructor takes no arguments and
creates a set of flags that are off by default. The optional parameter
<i>option_flags</i> is to facilitate transfer of legacy code from C programs.
This lets you do
<pre>
RE(pattern,
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
</pre>
However, new code is better off doing
<pre>
RE(pattern,
RE_Options().set_caseless(true).set_multiline(true))
.PartialMatch(str);
</pre>
If you are going to pass one of the most used modifiers, there are some
convenience functions that return a RE_Options class with the
appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
<b>MULTILINE()</b>, <b>DOTALL</b>(), and <b>EXTENDED()</b>.
</P>
<P>
If you need to set several options at once, and you don't want to go through
the pains of declaring a RE_Options object and setting several options, there
is a parallel method that give you such ability on the fly. You can concatenate
several <b>set_xxxxx()</b> member functions, since each of them returns a
reference to its class object. For example, to pass PCRE_CASELESS,
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
<pre>
RE(" ^ xyz \\s+ .* blah$",
RE_Options()
.set_caseless(true)
.set_extended(true)
.set_multiline(true)).PartialMatch(sometext);
</PRE>
</P>
<br><a name="SEC7" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
<P>
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
<pre>
Example: read lines of the form "var = value" from a string.
string contents = ...; // Fill string somehow
pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
</PRE>
</P>
<P>
<pre>
string var;
int value;
pcrecpp::RE re("(\\w+) = (\\d+)\n");
while (re.Consume(&input, &var, &value)) {
...;
}
</pre>
Each successful call to "Consume" will set "var/value", and also
advance "input" so it points past the matched text.
</P>
<P>
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
<pre>
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
</PRE>
</P>
<br><a name="SEC8" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
<P>
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
<pre>
Example:
int a, b, c, d;
pcrecpp::RE re("(.*) (.*) (.*) (.*)");
re.FullMatch("100 40 0100 0x40",
pcrecpp::Octal(&a), pcrecpp::Hex(&b),
pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
</pre>
will leave 64 in a, b, c, and d.
</P>
<br><a name="SEC9" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
<P>
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\1 to \9) can be
used to insert text matching corresponding parenthesized group
from the pattern. \0 in "rewrite" refers to the entire matching
text. For example:
<pre>
string s = "yabba dabba doo";
pcrecpp::RE("b+").Replace("d", &s);
</pre>
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
</P>
<P>
<b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
<pre>
string s = "yabba dabba doo";
pcrecpp::RE("b+").GlobalReplace("d", &s);
</pre>
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
</P>
<P>
<b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. Returns true iff a match
occurred and the extraction happened successfully; if no match occurs, the
string is left unaffected.
</P>
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
<P>
The C++ wrapper was contributed by Google Inc.
<br>
Copyright &copy; 2005 Google Inc.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,424 @@
<html>
<head>
<title>pcregrep specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcregrep man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">OPTIONS</a>
<li><a name="TOC4" href="#SEC4">ENVIRONMENT VARIABLES</a>
<li><a name="TOC5" href="#SEC5">NEWLINES</a>
<li><a name="TOC6" href="#SEC6">OPTIONS COMPATIBILITY</a>
<li><a name="TOC7" href="#SEC7">OPTIONS WITH DATA</a>
<li><a name="TOC8" href="#SEC8">MATCHING ERRORS</a>
<li><a name="TOC9" href="#SEC9">DIAGNOSTICS</a>
<li><a name="TOC10" href="#SEC10">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcregrep [options] [long options] [pattern] [path1 path2 ...]</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
<b>pcregrep</b> searches files for character patterns, in the same way as other
grep commands do, but it uses the PCRE regular expression library to support
patterns that are compatible with the regular expressions of Perl 5. See
<a href="pcrepattern.html"><b>pcrepattern</b></a>
for a full description of syntax and semantics of the regular expressions that
PCRE supports.
</P>
<P>
Patterns, whether supplied on the command line or in a separate file, are given
without delimiters. For example:
<pre>
pcregrep Thursday /etc/motd
</pre>
If you attempt to use delimiters (for example, by surrounding a pattern with
slashes, as is common in Perl scripts), they are interpreted as part of the
pattern. Quotes can of course be used on the command line because they are
interpreted by the shell, and indeed they are required if a pattern contains
white space or shell metacharacters.
</P>
<P>
The first argument that follows any option settings is treated as the single
pattern to be matched when neither <b>-e</b> nor <b>-f</b> is present.
Conversely, when one or both of these options are used to specify patterns, all
arguments are treated as path names. At least one of <b>-e</b>, <b>-f</b>, or an
argument pattern must be provided.
</P>
<P>
If no files are specified, <b>pcregrep</b> reads the standard input. The
standard input can also be referenced by a name consisting of a single hyphen.
For example:
<pre>
pcregrep some-pattern /file1 - /file3
</pre>
By default, each line that matches the pattern is copied to the standard
output, and if there is more than one file, the file name is output at the
start of each line. However, there are options that can change how
<b>pcregrep</b> behaves. In particular, the <b>-M</b> option makes it possible to
search for patterns that span line boundaries. What defines a line boundary is
controlled by the <b>-N</b> (<b>--newline</b>) option.
</P>
<P>
Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>.
</P>
<P>
If the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variable is set,
<b>pcregrep</b> uses the value to set a locale when calling the PCRE library.
The <b>--locale</b> option can be used to override this.
</P>
<br><a name="SEC3" href="#TOC1">OPTIONS</a><br>
<P>
<b>--</b>
This terminate the list of options. It is useful if the next item on the
command line starts with a hyphen but is not an option. This allows for the
processing of patterns and filenames that start with hyphens.
</P>
<P>
<b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
Output <i>number</i> lines of context after each matching line. If filenames
and/or line numbers are being output, a hyphen separator is used instead of a
colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
guarantees to have up to 8K of following text available for context output.
</P>
<P>
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
Output <i>number</i> lines of context before each matching line. If filenames
and/or line numbers are being output, a hyphen separator is used instead of a
colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
guarantees to have up to 8K of preceding text available for context output.
</P>
<P>
<b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
Output <i>number</i> lines of context both before and after each matching line.
This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
</P>
<P>
<b>-c</b>, <b>--count</b>
Do not output individual lines; instead just output a count of the number of
lines that would otherwise have been output. If several files are given, a
count is output for each of them. In this mode, the <b>-A</b>, <b>-B</b>, and
<b>-C</b> options are ignored.
</P>
<P>
<b>--colour</b>, <b>--color</b>
If this option is given without any data, it is equivalent to "--colour=auto".
If data is required, it must be given in the same shell item, separated by an
equals sign.
</P>
<P>
<b>--colour=</b><i>value</i>, <b>--color=</b><i>value</i>
This option specifies under what circumstances the part of a line that matched
a pattern should be coloured in the output. The value may be "never" (the
default), "always", or "auto". In the latter case, colouring happens only if
the standard output is connected to a terminal. The colour can be specified by
setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
of this variable should be a string of two numbers, separated by a semicolon.
They are copied directly into the control string for setting colour on a
terminal, so it is your responsibility to ensure that they make sense. If
neither of the environment variables is set, the default is "1;31", which gives
red.
</P>
<P>
<b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
If an input path is not a regular file or a directory, "action" specifies how
it is to be processed. Valid values are "read" (the default) or "skip"
(silently skip the path).
</P>
<P>
<b>-d</b> <i>action</i>, <b>--directories=</b><i>action</i>
If an input path is a directory, "action" specifies how it is to be processed.
Valid values are "read" (the default), "recurse" (equivalent to the <b>-r</b>
option), or "skip" (silently skip the path). In the default case, directories
are read as if they were ordinary files. In some operating systems the effect
of reading a directory like this is an immediate end-of-file.
</P>
<P>
<b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>,
<b>--regexp=</b><i>pattern</i> Specify a pattern to be matched. This option can
be used multiple times in order to specify several patterns. It can also be
used as a way of specifying a single pattern that starts with a hyphen. When
<b>-e</b> is used, no argument pattern is taken from the command line; all
arguments are treated as file names. There is an overall maximum of 100
patterns. They are applied to each line in the order in which they are defined
until one matches (or fails to match if <b>-v</b> is used). If <b>-f</b> is used
with <b>-e</b>, the command line patterns are matched first, followed by the
patterns from the file, independent of the order in which these options are
specified. Note that multiple use of <b>-e</b> is not the same as a single
pattern with alternatives. For example, X|Y finds the first character in a line
that is X or Y, whereas if the two patterns are given separately,
<b>pcregrep</b> finds X if it is present, even if it follows Y in the line. It
finds Y only if there is no X in the line. This really matters only if you are
using <b>-o</b> to show the portion of the line that matched.
</P>
<P>
<b>--exclude</b>=<i>pattern</i>
When <b>pcregrep</b> is searching the files in a directory as a consequence of
the <b>-r</b> (recursive search) option, any files whose names match the pattern
are excluded. The pattern is a PCRE regular expression. If a file name matches
both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no short
form for this option.
</P>
<P>
<b>-F</b>, <b>--fixed-strings</b>
Interpret each pattern as a list of fixed strings, separated by newlines,
instead of as a regular expression. The <b>-w</b> (match as a word) and <b>-x</b>
(match whole line) options can be used with <b>-F</b>. They apply to each of the
fixed strings. A line is selected if any of the fixed strings are found in it
(subject to <b>-w</b> or <b>-x</b>, if present).
</P>
<P>
<b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
Read a number of patterns from the file, one per line, and match them against
each line of input. A data line is output if any of the patterns match it. The
filename can be given as "-" to refer to the standard input. When <b>-f</b> is
used, patterns specified on the command line using <b>-e</b> may also be
present; they are tested before the file's patterns. However, no other pattern
is taken from the command line; all arguments are treated as file names. There
is an overall maximum of 100 patterns. Trailing white space is removed from
each line, and blank lines are ignored. An empty file contains no patterns and
therefore matches nothing.
</P>
<P>
<b>-H</b>, <b>--with-filename</b>
Force the inclusion of the filename at the start of output lines when searching
a single file. By default, the filename is not shown in this case. For matching
lines, the filename is followed by a colon and a space; for context lines, a
hyphen separator is used. If a line number is also being output, it follows the
file name without a space.
</P>
<P>
<b>-h</b>, <b>--no-filename</b>
Suppress the output filenames when searching multiple files. By default,
filenames are shown when multiple files are searched. For matching lines, the
filename is followed by a colon and a space; for context lines, a hyphen
separator is used. If a line number is also being output, it follows the file
name without a space.
</P>
<P>
<b>--help</b>
Output a brief help message and exit.
</P>
<P>
<b>-i</b>, <b>--ignore-case</b>
Ignore upper/lower case distinctions during comparisons.
</P>
<P>
<b>--include</b>=<i>pattern</i>
When <b>pcregrep</b> is searching the files in a directory as a consequence of
the <b>-r</b> (recursive search) option, only those files whose names match the
pattern are included. The pattern is a PCRE regular expression. If a file name
matches both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no
short form for this option.
</P>
<P>
<b>-L</b>, <b>--files-without-match</b>
Instead of outputting lines from the files, just output the names of the files
that do not contain any lines that would have been output. Each file name is
output once, on a separate line.
</P>
<P>
<b>-l</b>, <b>--files-with-matches</b>
Instead of outputting lines from the files, just output the names of the files
containing lines that would have been output. Each file name is output
once, on a separate line. Searching stops as soon as a matching line is found
in a file.
</P>
<P>
<b>--label</b>=<i>name</i>
This option supplies a name to be used for the standard input when file names
are being output. If not supplied, "(standard input)" is used. There is no
short form for this option.
</P>
<P>
<b>--locale</b>=<i>locale-name</i>
This option specifies a locale to be used for pattern matching. It overrides
the value in the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variables. If no
locale is specified, the PCRE library's default (usually the "C" locale) is
used. There is no short form for this option.
</P>
<P>
<b>-M</b>, <b>--multiline</b>
Allow patterns to match more than one line. When this option is given, patterns
may usefully contain literal newline characters and internal occurrences of ^
and $ characters. The output for any one match may consist of more than one
line. When this option is set, the PCRE library is called in "multiline" mode.
There is a limit to the number of lines that can be matched, imposed by the way
that <b>pcregrep</b> buffers the input file as it scans it. However,
<b>pcregrep</b> ensures that at least 8K characters or the rest of the document
(whichever is the shorter) are available for forward matching, and similarly
the previous 8K characters (or all the previous characters, if fewer than 8K)
are guaranteed to be available for lookbehind assertions.
</P>
<P>
<b>-N</b> <i>newline-type</i>, <b>--newline=</b><i>newline-type</i>
The PCRE library supports three different character sequences for indicating
the ends of lines. They are the single-character sequences CR (carriage return)
and LF (linefeed), and the two-character sequence CR, LF. When the library is
built, a default line-ending sequence is specified. This is normally the
standard sequence for the operating system. Unless otherwise specified by this
option, <b>pcregrep</b> uses the default. The possible values for this option
are CR, LF, or CRLF. This makes it possible to use <b>pcregrep</b> on files that
have come from other environments without having to modify their line endings.
If the data that is being scanned does not agree with the convention set by
this option, <b>pcregrep</b> may behave in strange ways.
</P>
<P>
<b>-n</b>, <b>--line-number</b>
Precede each output line by its line number in the file, followed by a colon
and a space for matching lines or a hyphen and a space for context lines. If
the filename is also being output, it precedes the line number.
</P>
<P>
<b>-o</b>, <b>--only-matching</b>
Show only the part of the line that matched a pattern. In this mode, no
context is shown. That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b> options are
ignored.
</P>
<P>
<b>-q</b>, <b>--quiet</b>
Work quietly, that is, display nothing except error messages. The exit
status indicates whether or not any matches were found.
</P>
<P>
<b>-r</b>, <b>--recursive</b>
If any given path is a directory, recursively scan the files it contains,
taking note of any <b>--include</b> and <b>--exclude</b> settings. By default, a
directory is read as a normal file; in some operating systems this gives an
immediate end-of-file. This option is a shorthand for setting the <b>-d</b>
option to "recurse".
</P>
<P>
<b>-s</b>, <b>--no-messages</b>
Suppress error messages about non-existent or unreadable files. Such files are
quietly skipped. However, the return code is still 2, even if matches were
found in other files.
</P>
<P>
<b>-u</b>, <b>--utf-8</b>
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
with UTF-8 support. Both patterns and subject lines must be valid strings of
UTF-8 characters.
</P>
<P>
<b>-V</b>, <b>--version</b>
Write the version numbers of <b>pcregrep</b> and the PCRE library that is being
used to the standard error stream.
</P>
<P>
<b>-v</b>, <b>--invert-match</b>
Invert the sense of the match, so that lines which do <i>not</i> match any of
the patterns are the ones that are found.
</P>
<P>
<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
Force the patterns to match only whole words. This is equivalent to having \b
at the start and end of the pattern.
</P>
<P>
<b>-x</b>, <b>--line-regex</b>, \fP--line-regexp\fP
Force the patterns to be anchored (each must start matching at the beginning of
a line) and in addition, require them to match entire lines. This is
equivalent to having ^ and $ characters at the start and end of each
alternative branch in every pattern.
</P>
<br><a name="SEC4" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
<P>
The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
order, for a locale. The first one that is set is used. This can be overridden
by the <b>--locale</b> option. If no locale is set, the PCRE library's default
(usually the "C" locale) is used.
</P>
<br><a name="SEC5" href="#TOC1">NEWLINES</a><br>
<P>
The <b>-N</b> (<b>--newline</b>) option allows <b>pcregrep</b> to scan files with
different newline conventions from the default. However, the setting of this
option does not affect the way in which <b>pcregrep</b> writes information to
the standard error and output streams. It uses the string "\n" in C
<b>printf()</b> calls to indicate newlines, relying on the C I/O library to
convert this to an appropriate sequence if the output is sent to a file.
</P>
<br><a name="SEC6" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
<P>
The majority of short and long forms of <b>pcregrep</b>'s options are the same
as in the GNU <b>grep</b> program. Any long option of the form
<b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
(PCRE terminology). However, the <b>--locale</b>, <b>-M</b>, <b>--multiline</b>,
<b>-u</b>, and <b>--utf-8</b> options are specific to <b>pcregrep</b>.
</P>
<br><a name="SEC7" href="#TOC1">OPTIONS WITH DATA</a><br>
<P>
There are four different ways in which an option with data can be specified.
If a short form option is used, the data may follow immediately, or in the next
command line item. For example:
<pre>
-f/some/file
-f /some/file
</pre>
If a long form option is used, the data may appear in the same command line
item, separated by an equals character, or (with one exception) it may appear
in the next command line item. For example:
<pre>
--file=/some/file
--file /some/file
</pre>
Note, however, that if you want to supply a file name beginning with ~ as data
in a shell command, and have the shell expand ~ to a home directory, you must
separate the file name from the option, because the shell does not treat ~
specially unless it is at the start of an item.
</P>
<P>
The exception to the above is the <b>--colour</b> (or <b>--color</b>) option,
for which the data is optional. If this option does have data, it must be given
in the first form, using an equals character. Otherwise it will be assumed that
it has no data.
</P>
<br><a name="SEC8" href="#TOC1">MATCHING ERRORS</a><br>
<P>
It is possible to supply a regular expression that takes a very long time to
fail to match certain lines. Such patterns normally involve nested indefinite
repeats, for example: (a+)*\d when matched against a line of a's with no final
digit. The PCRE matching function has a resource limit that causes it to abort
in these circumstances. If this happens, <b>pcregrep</b> outputs an error
message and the line that caused the problem to the standard error stream. If
there are more than 20 such errors, <b>pcregrep</b> gives up.
</P>
<br><a name="SEC9" href="#TOC1">DIAGNOSTICS</a><br>
<P>
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors and non-existent or inacessible files (even if matches were
found in other files) or too many matching errors. Using the <b>-s</b> option to
suppress error messages about inaccessble files does not affect the return
code.
</P>
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 06 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,192 @@
<html>
<head>
<title>pcrematching specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrematching man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE MATCHING ALGORITHMS</a>
<li><a name="TOC2" href="#SEC2">REGULAR EXPRESSIONS AS TREES</a>
<li><a name="TOC3" href="#SEC3">THE STANDARD MATCHING ALGORITHM</a>
<li><a name="TOC4" href="#SEC4">THE DFA MATCHING ALGORITHM</a>
<li><a name="TOC5" href="#SEC5">ADVANTAGES OF THE DFA ALGORITHM</a>
<li><a name="TOC6" href="#SEC6">DISADVANTAGES OF THE DFA ALGORITHM</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE MATCHING ALGORITHMS</a><br>
<P>
This document describes the two different algorithms that are available in PCRE
for matching a compiled regular expression against a given subject string. The
"standard" algorithm is the one provided by the <b>pcre_exec()</b> function.
This works in the same was as Perl's matching function, and provides a
Perl-compatible matching operation.
</P>
<P>
An alternative algorithm is provided by the <b>pcre_dfa_exec()</b> function;
this operates in a different way, and is not Perl-compatible. It has advantages
and disadvantages compared with the standard algorithm, and these are described
below.
</P>
<P>
When there is only one possible way in which a given subject string can match a
pattern, the two algorithms give the same answer. A difference arises, however,
when there are multiple possibilities. For example, if the pattern
<pre>
^&#60;.*&#62;
</pre>
is matched against the string
<pre>
&#60;something&#62; &#60;something else&#62; &#60;something further&#62;
</pre>
there are three possible answers. The standard algorithm finds only one of
them, whereas the DFA algorithm finds all three.
</P>
<br><a name="SEC2" href="#TOC1">REGULAR EXPRESSIONS AS TREES</a><br>
<P>
The set of strings that are matched by a regular expression can be represented
as a tree structure. An unlimited repetition in the pattern makes the tree of
infinite size, but it is still a tree. Matching the pattern to a given subject
string (from a given starting point) can be thought of as a search of the tree.
There are two ways to search a tree: depth-first and breadth-first, and these
correspond to the two matching algorithms provided by PCRE.
</P>
<br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br>
<P>
In the terminology of Jeffrey Friedl's book \fIMastering Regular
Expressions\fP, the standard algorithm is an "NFA algorithm". It conducts a
depth-first search of the pattern tree. That is, it proceeds along a single
path through the tree, checking that the subject matches what is required. When
there is a mismatch, the algorithm tries any alternatives at the current point,
and if they all fail, it backs up to the previous branch point in the tree, and
tries the next alternative branch at that level. This often involves backing up
(moving to the left) in the subject string as well. The order in which
repetition branches are tried is controlled by the greedy or ungreedy nature of
the quantifier.
</P>
<P>
If a leaf node is reached, a matching string has been found, and at that point
the algorithm stops. Thus, if there is more than one possible match, this
algorithm returns the first one that it finds. Whether this is the shortest,
the longest, or some intermediate length depends on the way the greedy and
ungreedy repetition quantifiers are specified in the pattern.
</P>
<P>
Because it ends up with a single path through the tree, it is relatively
straightforward for this algorithm to keep track of the substrings that are
matched by portions of the pattern in parentheses. This provides support for
capturing parentheses and back references.
</P>
<br><a name="SEC4" href="#TOC1">THE DFA MATCHING ALGORITHM</a><br>
<P>
DFA stands for "deterministic finite automaton", but you do not need to
understand the origins of that name. This algorithm conducts a breadth-first
search of the tree. Starting from the first matching point in the subject, it
scans the subject string from left to right, once, character by character, and
as it does this, it remembers all the paths through the tree that represent
valid matches.
</P>
<P>
The scan continues until either the end of the subject is reached, or there are
no more unterminated paths. At this point, terminated paths represent the
different matching possibilities (if there are none, the match has failed).
Thus, if there is more than one possible match, this algorithm finds all of
them, and in particular, it finds the longest. In PCRE, there is an option to
stop the algorithm after the first match (which is necessarily the shortest)
has been found.
</P>
<P>
Note that all the matches that are found start at the same point in the
subject. If the pattern
<pre>
cat(er(pillar)?)
</pre>
is matched against the string "the caterpillar catchment", the result will be
the three strings "cat", "cater", and "caterpillar" that start at the fourth
character of the subject. The algorithm does not automatically move on to find
matches that start at later positions.
</P>
<P>
There are a number of features of PCRE regular expressions that are not
supported by the DFA matching algorithm. They are as follows:
</P>
<P>
1. Because the algorithm finds all possible matches, the greedy or ungreedy
nature of repetition quantifiers is not relevant. Greedy and ungreedy
quantifiers are treated in exactly the same way.
</P>
<P>
2. When dealing with multiple paths through the tree simultaneously, it is not
straightforward to keep track of captured substrings for the different matching
possibilities, and PCRE's implementation of this algorithm does not attempt to
do this. This means that no captured substrings are available.
</P>
<P>
3. Because no substrings are captured, back references within the pattern are
not supported, and cause errors if encountered.
</P>
<P>
4. For the same reason, conditional expressions that use a backreference as the
condition are not supported.
</P>
<P>
5. Callouts are supported, but the value of the <i>capture_top</i> field is
always 1, and the value of the <i>capture_last</i> field is always -1.
</P>
<P>
6.
The \C escape sequence, which (in the standard algorithm) matches a single
byte, even in UTF-8 mode, is not supported because the DFA algorithm moves
through the subject string one character at a time, for all active paths
through the tree.
</P>
<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE DFA ALGORITHM</a><br>
<P>
Using the DFA matching algorithm provides the following advantages:
</P>
<P>
1. All possible matches (at a single point in the subject) are automatically
found, and in particular, the longest match is found. To find more than one
match using the standard algorithm, you have to do kludgy things with
callouts.
</P>
<P>
2. There is much better support for partial matching. The restrictions on the
content of the pattern that apply when using the standard algorithm for partial
matching do not apply to the DFA algorithm. For non-anchored patterns, the
starting position of a partial match is available.
</P>
<P>
3. Because the DFA algorithm scans the subject string just once, and never
needs to backtrack, it is possible to pass very long subject strings to the
matching function in several pieces, checking for partial matching each time.
</P>
<br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE DFA ALGORITHM</a><br>
<P>
The DFA algorithm suffers from a number of disadvantages:
</P>
<P>
1. It is substantially slower than the standard algorithm. This is partly
because it has to search for all possible matches, but is also because it is
less susceptible to optimization.
</P>
<P>
2. Capturing parentheses and back references are not supported.
</P>
<P>
3. The "atomic group" feature of PCRE regular expressions is supported, but
does not provide the advantage that it does for the standard algorithm.
</P>
<P>
Last updated: 06 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,225 @@
<html>
<head>
<title>pcrepartial specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrepartial man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
<li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>
<li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
<li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
</ul>
<br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
<P>
In normal use of PCRE, if the subject string that is passed to
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> matches as far as it goes, but is
too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
are circumstances where it might be helpful to distinguish this case from other
cases in which there is no match.
</P>
<P>
Consider, for example, an application where a human is required to type in data
for a field with specific formatting requirements. An example might be a date
in the form <i>ddmmmyy</i>, defined by this pattern:
<pre>
^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
</pre>
If the application sees the user's keystrokes one by one, and can check that
what has been typed so far is potentially valid, it is able to raise an error
as soon as a mistake is made, possibly beeping and not reflecting the
character that has been typed. This immediate feedback is likely to be a better
user interface than a check that is delayed until the entire string has been
entered.
</P>
<P>
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
option, which can be set when calling <b>pcre_exec()</b> or
<b>pcre_dfa_exec()</b>. When this flag is set for <b>pcre_exec()</b>, the return
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
during the matching process the last part of the subject string matched part of
the pattern. Unfortunately, for non-anchored matching, it is not possible to
obtain the position of the start of the partial match. No captured data is set
when PCRE_ERROR_PARTIAL is returned.
</P>
<P>
When PCRE_PARTIAL is set for <b>pcre_dfa_exec()</b>, the return code
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
subject is reached, there have been no complete matches, but there is still at
least one matching possibility. The portion of the string that provided the
partial match is set as the first matching string.
</P>
<P>
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
last literal byte in a pattern, and abandons matching immediately if such a
byte is not present in the subject string. This optimization cannot be used
for a subject string that might match only partially.
</P>
<br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>
<P>
Because of the way certain internal optimizations are implemented in the
<b>pcre_exec()</b> function, the PCRE_PARTIAL option cannot be used with all
patterns. These restrictions do not apply when <b>pcre_dfa_exec()</b> is used.
For <b>pcre_exec()</b>, repeated single characters such as
<pre>
a{2,4}
</pre>
and repeated single metasequences such as
<pre>
\d+
</pre>
are not permitted if the maximum number of occurrences is greater than one.
Optional items such as \d? (where the maximum is one) are permitted.
Quantifiers with any values are permitted after parentheses, so the invalid
examples above can be coded thus:
<pre>
(a){2,4}
(\d)+
</pre>
These constructions run more slowly, but for the kinds of application that are
envisaged for this facility, this is not felt to be a major restriction.
</P>
<P>
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
<b>pcre_exec()</b> returns the error code PCRE_ERROR_BADPARTIAL (-13).
</P>
<br><a name="SEC3" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a><br>
<P>
If the escape sequence \P is present in a <b>pcretest</b> data line, the
PCRE_PARTIAL flag is used for the match. Here is a run of <b>pcretest</b> that
uses the date example quoted above:
<pre>
re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data&#62; 25jun04\P
0: 25jun04
1: jun
data&#62; 25dec3\P
Partial match
data&#62; 3ju\P
Partial match
data&#62; 3juj\P
No match
data&#62; j\P
No match
</pre>
The first data string is matched completely, so <b>pcretest</b> shows the
matched substrings. The remaining four strings do not match the complete
pattern, but the first two are partial matches. The same test, using DFA
matching (by means of the \D escape sequence), produces the following output:
<pre>
re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data&#62; 25jun04\P\D
0: 25jun04
data&#62; 23dec3\P\D
Partial match: 23dec3
data&#62; 3ju\P\D
Partial match: 3ju
data&#62; 3juj\P\D
No match
data&#62; j\P\D
No match
</pre>
Notice that in this case the portion of the string that was matched is made
available.
</P>
<br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
<P>
When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
to continue the match by providing additional subject data and calling
<b>pcre_dfa_exec()</b> again with the PCRE_DFA_RESTART option and the same
working space (where details of the previous partial match are stored). Here is
an example using <b>pcretest</b>, where the \R escape sequence sets the
PCRE_DFA_RESTART option and the \D escape sequence requests the use of
<b>pcre_dfa_exec()</b>:
<pre>
re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data&#62; 23ja\P\D
Partial match: 23ja
data&#62; n05\R\D
0: n05
</pre>
The first call has "23ja" as the subject, and requests partial matching; the
second call has "n05" as the subject for the continued (restarted) match.
Notice that when the match is complete, only the last part is shown; PCRE does
not retain the previously partially-matched string. It is up to the calling
program to do that if it needs to.
</P>
<P>
This facility can be used to pass very long subject strings to
<b>pcre_dfa_exec()</b>. However, some care is needed for certain types of
pattern.
</P>
<P>
1. If the pattern contains tests for the beginning or end of a line, you need
to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
subject string for any call does not contain the beginning or end of a line.
</P>
<P>
2. If the pattern contains backward assertions (including \b or \B), you need
to arrange for some overlap in the subject strings to allow for this. For
example, you could pass the subject in chunks that were 500 bytes long, but in
a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
bytes at the start of the buffer.
</P>
<P>
3. Matching a subject string that is split into multiple segments does not
always produce exactly the same result as matching over one single long string.
The difference arises when there are multiple matching possibilities, because a
partial match result is given only when there are no completed matches in a
call to fBpcre_dfa_exec()\fP. This means that as soon as the shortest match has
been found, continuation to a new subject segment is no longer possible.
Consider this <b>pcretest</b> example:
<pre>
re&#62; /dog(sbody)?/
data&#62; do\P\D
Partial match: do
data&#62; gsb\R\P\D
0: g
data&#62; dogsbody\D
0: dogsbody
1: dog
</pre>
The pattern matches the words "dog" or "dogsbody". When the subject is
presented in several parts ("do" and "gsb" being the first two) the match stops
when "dog" has been found, and it is not possible to continue. On the other
hand, if "dogsbody" is presented as a single string, both matches are found.
</P>
<P>
Because of this phenomenon, it does not usually make sense to end a pattern
that is going to be matched in this way with a variable repeat.
</P>
<P>
4. Patterns that contain alternatives at the top level which do not all
start with the same pattern item may not work as expected. For example,
consider this pattern:
<pre>
1234|3789
</pre>
If the first part of the subject is "ABC123", a partial match of the first
alternative is found at offset 3. There is no partial match for the second
alternative, because such a match does not start at the same point in the
subject string. Attempting to continue with the string "789" does not yield a
match because only those alternatives that match at one point in the subject
are remembered. The problem arises because the start of the second alternative
matches within the first alternative. There is no problem with anchored
patterns or patterns such as:
<pre>
1234|ABCD
</pre>
where no string can be a partial match for both alternatives.
</P>
<P>
Last updated: 16 January 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,97 @@
<html>
<head>
<title>pcreperform specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcreperform man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
PCRE PERFORMANCE
</b><br>
<P>
Certain items that may appear in regular expression patterns are more efficient
than others. It is more efficient to use a character class like [aeiou] than a
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
that provides the required behaviour is usually the most efficient. Jeffrey
Friedl's book contains a lot of useful general discussion about optimizing
regular expressions for efficient performance. This document contains a few
observations about PCRE.
</P>
<P>
Using Unicode character properties (the \p, \P, and \X escapes) is slow,
because PCRE has to scan a structure that contains data for over fifteen
thousand characters whenever it needs a character's property. If you can find
an alternative pattern that does not use character properties, it will probably
be faster.
</P>
<P>
When a pattern begins with .* not in parentheses, or in parentheses that are
not the subject of a backreference, and the PCRE_DOTALL option is set, the
pattern is implicitly anchored by PCRE, since it can match only at the start of
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
optimization, because the . metacharacter does not then match a newline, and if
the subject string contains newlines, the pattern may match from the character
immediately following one of them instead of from the very start. For example,
the pattern
<pre>
.*second
</pre>
matches the subject "first\nand second" (where \n stands for a newline
character), with the match starting at the seventh character. In order to do
this, PCRE has to retry the match starting after every newline in the subject.
</P>
<P>
If you are using such a pattern with subject strings that do not contain
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE
from having to scan along the subject looking for a newline to restart at.
</P>
<P>
Beware of patterns that contain nested indefinite repeats. These can take a
long time to run when applied to a string that does not match. Consider the
pattern fragment
<pre>
(a+)*
</pre>
This can match "aaaa" in 33 different ways, and this number increases very
rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
times, and for each of those cases other than 0, the + repeats can match
different numbers of times.) When the remainder of the pattern is such that the
entire match is going to fail, PCRE has in principle to try every possible
variation, and this can take an extremely long time.
</P>
<P>
An optimization catches some of the more simple cases such as
<pre>
(a+)*b
</pre>
where a literal character follows. Before embarking on the standard matching
procedure, PCRE checks that there is a "b" later in the subject string, and if
there is not, it fails the match immediately. However, when there is no
following literal this optimization cannot be used. You can see the difference
by comparing the behaviour of
<pre>
(a+)*\d
</pre>
with the pattern above. The former gives a failure almost instantly when
applied to a whole line of "a" characters, whereas the latter takes an
appreciable time with strings longer than about 20 characters.
</P>
<P>
In many cases, the solution to this kind of performance issue is to use an
atomic group or a possessive quantifier.
</P>
<P>
Last updated: 28 February 2005
<br>
Copyright &copy; 1997-2005 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,244 @@
<html>
<head>
<title>pcreposix specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcreposix man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
<P>
<b>#include &#60;pcreposix.h&#62;</b>
</P>
<P>
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
<b>int <i>cflags</i>);</b>
</P>
<P>
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
</P>
<P>
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
</P>
<P>
<b>void regfree(regex_t *<i>preg</i>);</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation for a description of PCRE's native API, which contains much
additional functionality.
</P>
<P>
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
header file, and on Unix systems the library itself is called
<b>pcreposix.a</b>, so can be accessed by adding <b>-lpcreposix</b> to the
command for linking an application that uses them. Because the POSIX functions
call the native ones, it is also necessary to add <b>-lpcre</b>.
</P>
<P>
I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the option REG_EXTENDED is defined with the value
zero. This has no effect, but since programs that are written to the POSIX
interface often use it, this makes it easier to slot in PCRE as a replacement
library. Other POSIX options are not even defined.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates to the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
</P>
<P>
The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as <b>regex.h</b>, which is the "correct" name. It provides two
structure types, <i>regex_t</i> for compiled internal forms, and
<i>regmatch_t</i> for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
</P>
<P>
</P>
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
<P>
The function <b>regcomp()</b> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
to a <b>regex_t</b> structure that is used as a base for storing information
about the compiled regular expression.
</P>
<P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits
defined by the following macros:
<pre>
REG_DOTALL
</pre>
The PCRE_DOTALL option is set when the regular expression is passed for
compilation to the native function. Note that REG_DOTALL is not part of the
POSIX standard.
<pre>
REG_ICASE
</pre>
The PCRE_CASELESS option is set when the regular expression is passed for
compilation to the native function.
<pre>
REG_NEWLINE
</pre>
The PCRE_MULTILINE option is set when the regular expression is passed for
compilation to the native function. Note that this does <i>not</i> mimic the
defined POSIX behaviour for REG_NEWLINE (see the following section).
<pre>
REG_NOSUB
</pre>
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
for compilation to the native function. In addition, when a pattern that is
compiled with this flag is passed to <b>regexec()</b> for matching, the
<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
are returned.
<pre>
REG_UTF8
</pre>
The PCRE_UTF8 option is set when the regular expression is passed for
compilation to the native function. This causes the pattern itself and all data
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
is not part of the POSIX standard.
</P>
<P>
In the absence of these flags, no options are passed to the native function.
This means the the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they aren't) or by a negative class such as [^a]
(they are).
</P>
<P>
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
<i>preg</i> structure is filled in on success, and one member of the structure
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
<P>
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
<pre>
Default Change with
. matches newline no PCRE_DOTALL
newline matches [^a] yes not changeable
$ matches \n at end yes PCRE_DOLLARENDONLY
$ matches \n in middle no PCRE_MULTILINE
^ matches \n in middle no PCRE_MULTILINE
</pre>
This is the equivalent table for POSIX:
<pre>
Default Change with
. matches newline yes REG_NEWLINE
newline matches [^a] yes REG_NEWLINE
$ matches \n at end no REG_NEWLINE
$ matches \n in middle no REG_NEWLINE
^ matches \n in middle no REG_NEWLINE
</pre>
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
</P>
<P>
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
</P>
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
<P>
The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
against a given <i>string</i>, which is terminated by a zero byte, subject to
the options in <i>eflags</i>. These can be:
<pre>
REG_NOTBOL
</pre>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
<pre>
REG_NOTEOL
</pre>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
<b>regexec()</b> are ignored.
</P>
<P>
Otherwise,the portion of the string that was matched, and also any captured
substrings, are returned via the <i>pmatch</i> argument, which points to an
array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
character of each substring and the offset to the first character after the end
of each substring, respectively. The 0th element of the vector relates to the
entire portion of <i>string</i> that was matched; subsequent elements relate to
the capturing subpatterns of the regular expression. Unused entries in the
array have both structure members set to -1.
</P>
<P>
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
</P>
<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
<P>
The <b>regerror()</b> function maps a non-zero errorcode from either
<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in <i>errbuf</i>. The length of the
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
function is the size of buffer needed to hold the whole message.
</P>
<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
<P>
Compiling a regular expression causes memory to be allocated and associated
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
memory, after which <i>preg</i> may no longer be used as a compiled expression.
</P>
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 16 January 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,140 @@
<html>
<head>
<title>pcreprecompile specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcreprecompile man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a>
<li><a name="TOC2" href="#SEC2">SAVING A COMPILED PATTERN</a>
<li><a name="TOC3" href="#SEC3">RE-USING A PRECOMPILED PATTERN</a>
<li><a name="TOC4" href="#SEC4">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a>
</ul>
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a><br>
<P>
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run.
If you are not using any private character tables (see the
<a href="pcre_maketables.html"><b>pcre_maketables()</b></a>
documentation), this is relatively straightforward. If you are using private
tables, it is a little bit more complicated.
</P>
<P>
If you save compiled patterns to a file, you can copy them to a different host
and run them there. This works even if the new host has the opposite endianness
to the one on which the patterns were compiled. There may be a small
performance penalty, but it should be insignificant.
</P>
<br><a name="SEC2" href="#TOC1">SAVING A COMPILED PATTERN</a><br>
<P>
The value returned by <b>pcre_compile()</b> points to a single block of memory
that holds the compiled pattern and associated data. You can find the length of
this block in bytes by calling <b>pcre_fullinfo()</b> with an argument of
PCRE_INFO_SIZE. You can then save the data in any appropriate manner. Here is
sample code that compiles a pattern and writes it to a file. It assumes that
the variable <i>fd</i> refers to a file that is open for output:
<pre>
int erroroffset, rc, size;
char *error;
pcre *re;
re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
if (re == NULL) { ... handle errors ... }
rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
if (rc &#60; 0) { ... handle errors ... }
rc = fwrite(re, 1, size, fd);
if (rc != size) { ... handle errors ... }
</pre>
In this example, the bytes that comprise the compiled pattern are copied
exactly. Note that this is binary data that may contain any of the 256 possible
byte values. On systems that make a distinction between binary and non-binary
data, be sure that the file is opened for binary output.
</P>
<P>
If you want to write more than one pattern to a file, you will have to devise a
way of separating them. For binary data, preceding each pattern with its length
is probably the most straightforward approach. Another possibility is to write
out the data in hexadecimal instead of binary, one pattern to a line.
</P>
<P>
Saving compiled patterns in a file is only one possible way of storing them for
later use. They could equally well be saved in a database, or in the memory of
some daemon process that passes them via sockets to the processes that want
them.
</P>
<P>
If the pattern has been studied, it is also possible to save the study data in
a similar way to the compiled pattern itself. When studying generates
additional information, <b>pcre_study()</b> returns a pointer to a
<b>pcre_extra</b> data block. Its format is defined in the
<a href="pcreapi.html#extradata">section on matching a pattern</a>
in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation. The <i>study_data</i> field points to the binary study data, and
this is what you must save (not the <b>pcre_extra</b> block itself). The length
of the study data can be obtained by calling <b>pcre_fullinfo()</b> with an
argument of PCRE_INFO_STUDYSIZE. Remember to check that <b>pcre_study()</b> did
return a non-NULL value before trying to save the study data.
</P>
<br><a name="SEC3" href="#TOC1">RE-USING A PRECOMPILED PATTERN</a><br>
<P>
Re-using a precompiled pattern is straightforward. Having reloaded it into main
memory, you pass its pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in
the usual way. This should work even on another host, and even if that host has
the opposite endianness to the one where the pattern was compiled.
</P>
<P>
However, if you passed a pointer to custom character tables when the pattern
was compiled (the <i>tableptr</i> argument of <b>pcre_compile()</b>), you must
now pass a similar pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>,
because the value saved with the compiled pattern will obviously be nonsense. A
field in a <b>pcre_extra()</b> block is used to pass this data, as described in
the
<a href="pcreapi.html#extradata">section on matching a pattern</a>
in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<P>
If you did not provide custom character tables when the pattern was compiled,
the pointer in the compiled pattern is NULL, which causes <b>pcre_exec()</b> to
use PCRE's internal tables. Thus, you do not need to take any special action at
run time in this case.
</P>
<P>
If you saved study data with the compiled pattern, you need to create your own
<b>pcre_extra</b> data block and set the <i>study_data</i> field to point to the
reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
<i>flags</i> field to indicate that study data is present. Then pass the
<b>pcre_extra</b> block to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in the
usual way.
</P>
<br><a name="SEC4" href="#TOC1">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a><br>
<P>
The layout of the control block that is at the start of the data that makes up
a compiled pattern was changed for release 5.0. If you have any saved patterns
that were compiled with previous releases (not a facility that was previously
advertised), you will have to recompile them for release 5.0. However, from now
on, it should be possible to make changes in a compatible manner.
</P>
<P>
Notwithstanding the above, if you have any saved patterns in UTF-8 mode that
use \p or \P that were compiled with any release up to and including 6.4, you
will have to recompile them for release 6.5 and above.
</P>
<P>
Last updated: 01 February 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,81 @@
<html>
<head>
<title>pcresample specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresample man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
PCRE SAMPLE PROGRAM
</b><br>
<P>
A simple, complete demonstration program, to get you started with using PCRE,
is supplied in the file <i>pcredemo.c</i> in the PCRE distribution.
</P>
<P>
The program compiles the regular expression that is its first argument, and
matches it against the subject string in its second argument. No PCRE options
are set, and default character tables are used. If matching succeeds, the
program outputs the portion of the subject that matched, together with the
contents of any captured substrings.
</P>
<P>
If the -g option is given on the command line, the program then goes on to
check for further matches of the same regular expression in the same subject
string. The logic is a little bit tricky because of the possibility of matching
an empty string. Comments in the code explain what is going on.
</P>
<P>
If PCRE is installed in the standard include and library directories for your
system, you should be able to compile the demonstration program using this
command:
<pre>
gcc -o pcredemo pcredemo.c -lpcre
</pre>
If PCRE is installed elsewhere, you may need to add additional options to the
command line. For example, on a Unix-like system that has PCRE installed in
<i>/usr/local</i>, you can compile the demonstration program using a command
like this:
<pre>
gcc -o pcredemo -I/usr/local/include pcredemo.c -L/usr/local/lib -lpcre
</pre>
Once you have compiled the demonstration program, you can run simple tests like
this:
<pre>
./pcredemo 'cat|dog' 'the cat sat on the mat'
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
</pre>
Note that there is a much more comprehensive test program, called
<a href="pcretest.html"><b>pcretest</b>,</a>
which supports many more facilities for testing regular expressions and the
PCRE library. The <b>pcredemo</b> program is provided as a simple coding
example.
</P>
<P>
On some operating systems (e.g. Solaris), when PCRE is not installed in the
standard library directory, you may get an error like this when you try to run
<b>pcredemo</b>:
<pre>
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
</pre>
This is caused by the way shared library support works on those systems. You
need to add
<pre>
-R/usr/local/lib
</pre>
(for example) to the compile command to get round this problem.
</P>
<P>
Last updated: 09 September 2004
<br>
Copyright &copy; 1997-2004 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,127 @@
<html>
<head>
<title>pcrestack specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrestack man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
PCRE DISCUSSION OF STACK USAGE
</b><br>
<P>
When you call <b>pcre_exec()</b>, it makes use of an internal function called
<b>match()</b>. This calls itself recursively at branch points in the pattern,
in order to remember the state of the match so that it can back up and try a
different alternative if the first one fails. As matching proceeds deeper and
deeper into the tree of possibilities, the recursion depth increases.
</P>
<P>
Not all calls of <b>match()</b> increase the recursion depth; for an item such
as a* it may be called several times at the same level, after matching
different numbers of a's. Furthermore, in a number of cases where the result of
the recursive call would immediately be passed back as the result of the
current call (a "tail recursion"), the function is just restarted instead.
</P>
<P>
The <b>pcre_dfa_exec()</b> function operates in an entirely different way, and
hardly uses recursion at all. The limit on its complexity is the amount of
workspace it is given. The comments that follow do NOT apply to
<b>pcre_dfa_exec()</b>; they are relevant only for <b>pcre_exec()</b>.
</P>
<P>
You can set limits on the number of times that <b>match()</b> is called, both in
total and recursively. If the limit is exceeded, an error occurs. For details,
see the
<a href="pcreapi.html#extradata">section on extra data for <b>pcre_exec()</b></a>
in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<P>
Each time that <b>match()</b> is actually called recursively, it uses memory
from the process stack. For certain kinds of pattern and data, very large
amounts of stack may be needed, despite the recognition of "tail recursion".
You can often reduce the amount of recursion, and therefore the amount of stack
used, by modifying the pattern that is being matched. Consider, for example,
this pattern:
<pre>
([^&#60;]|&#60;(?!inet))+
</pre>
It matches from wherever it starts until it encounters "&#60;inet" or the end of
the data, and is the kind of pattern that might be used when processing an XML
file. Each iteration of the outer parentheses matches either one character that
is not "&#60;" or a "&#60;" that is not followed by "inet". However, each time a
parenthesis is processed, a recursion occurs, so this formulation uses a stack
frame for each matched character. For a long string, a lot of stack is
required. Consider now this rewritten pattern, which matches exactly the same
strings:
<pre>
([^&#60;]++|&#60;(?!inet))
</pre>
This uses very much less stack, because runs of characters that do not contain
"&#60;" are "swallowed" in one item inside the parentheses. Recursion happens only
when a "&#60;" character that is not followed by "inet" is encountered (and we
assume this is relatively rare). A possessive quantifier is used to stop any
backtracking into the runs of non-"&#60;" characters, but that is not related to
stack usage.
</P>
<P>
In environments where stack memory is constrained, you might want to compile
PCRE to use heap memory instead of stack for remembering back-up points. This
makes it run a lot more slowly, however. Details of how to do this are given in
the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
documentation.
</P>
<P>
In Unix-like environments, there is not often a problem with the stack, though
the default limit on stack size varies from system to system. Values from 8Mb
to 64Mb are common. You can find your default limit by running the command:
<pre>
ulimit -s
</pre>
The effect of running out of stack is often SIGSEGV, though sometimes an error
message is given. You can normally increase the limit on stack size by code
such as this:
<pre>
struct rlimit rlim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = 100*1024*1024;
setrlimit(RLIMIT_STACK, &rlim);
</pre>
This reads the current limits (soft and hard) using <b>getrlimit()</b>, then
attempts to increase the soft limit to 100Mb using <b>setrlimit()</b>. You must
do this before calling <b>pcre_exec()</b>.
</P>
<P>
PCRE has an internal counter that can be used to limit the depth of recursion,
and thus cause <b>pcre_exec()</b> to give an error code before it runs out of
stack. By default, the limit is very large, and unlikely ever to operate. It
can be changed when PCRE is built, and it can also be set when
<b>pcre_exec()</b> is called. For details of these interfaces, see the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
and
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<P>
As a very rough rule of thumb, you should reckon on about 500 bytes per
recursion. Thus, if you want to limit your stack usage to 8Mb, you
should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can
support around 128000 recursions. The <b>pcretest</b> test program has a command
line option (<b>-S</b>) that can be used to increase its stack.
</P>
<P>
Last updated: 29 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@ -0,0 +1,616 @@
<html>
<head>
<title>pcretest specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcretest man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">OPTIONS</a>
<li><a name="TOC3" href="#SEC3">DESCRIPTION</a>
<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>
<li><a name="TOC5" href="#SEC5">DATA LINES</a>
<li><a name="TOC6" href="#SEC6">THE ALTERNATIVE MATCHING FUNCTION</a>
<li><a name="TOC7" href="#SEC7">DEFAULT OUTPUT FROM PCRETEST</a>
<li><a name="TOC8" href="#SEC8">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
<li><a name="TOC9" href="#SEC9">RESTARTING AFTER A PARTIAL MATCH</a>
<li><a name="TOC10" href="#SEC10">CALLOUTS</a>
<li><a name="TOC11" href="#SEC11">SAVING AND RELOADING COMPILED PATTERNS</a>
<li><a name="TOC12" href="#SEC12">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcretest [options] [source] [destination]</b>
<br>
<br>
<b>pcretest</b> was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
expressions. This document describes the features of the test program; for
details of the regular expressions themselves, see the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. For details of the PCRE library function calls and their
options, see the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<br><a name="SEC2" href="#TOC1">OPTIONS</a><br>
<P>
<b>-C</b>
Output the version number of the PCRE library, and all available information
about the optional features that are included, and then exit.
</P>
<P>
<b>-d</b>
Behave as if each regex has the <b>/D</b> (debug) modifier; the internal
form is output after compilation.
</P>
<P>
<b>-dfa</b>
Behave as if each data line contains the \D escape sequence; this causes the
alternative matching function, <b>pcre_dfa_exec()</b>, to be used instead of the
standard <b>pcre_exec()</b> function (more detail is given below).
</P>
<P>
<b>-i</b>
Behave as if each regex has the <b>/I</b> modifier; information about the
compiled pattern is given after compilation.
</P>
<P>
<b>-m</b>
Output the size of each compiled pattern after it has been compiled. This is
equivalent to adding <b>/M</b> to each regular expression. For compatibility
with earlier versions of pcretest, <b>-s</b> is a synonym for <b>-m</b>.
</P>
<P>
<b>-o</b> <i>osize</i>
Set the number of elements in the output vector that is used when calling
<b>pcre_exec()</b> to be <i>osize</i>. The default value is 45, which is enough
for 14 capturing subexpressions. The vector size can be changed for individual
matching calls by including \O in the data line (see below).
</P>
<P>
<b>-p</b>
Behave as if each regex has the <b>/P</b> modifier; the POSIX wrapper API is
used to call PCRE. None of the other options has any effect when <b>-p</b> is
set.
</P>
<P>
<b>-q</b>
Do not output the version number of <b>pcretest</b> at the start of execution.
</P>
<P>
<b>-S</b> <i>size</i>
On Unix-like systems, set the size of the runtime stack to <i>size</i>
megabytes.
</P>
<P>
<b>-t</b>
Run each compile, study, and match many times with a timer, and output
resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with
<b>-t</b>, because you will then get the size output a zillion times, and the
timing will be distorted.
</P>
<br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br>
<P>
If <b>pcretest</b> is given two filename arguments, it reads from the first and
writes to the second. If it is given only one filename argument, it reads from
that file and writes to stdout. Otherwise, it reads from stdin and writes to
stdout, and prompts for each line of input, using "re&#62;" to prompt for regular
expressions, and "data&#62;" to prompt for data lines.
</P>
<P>
The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern.
</P>
<P>
Each data line is matched separately and independently. If you want to do
multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
depending on the newline setting) in a single line of input to encode the
newline characters. There is no limit on the length of data lines; the input
buffer is automatically extended if it is too small.
</P>
<P>
An empty line signals the end of the data lines, at which point a new regular
expression is read. The regular expressions are given enclosed in any
non-alphanumeric delimiters other than backslash, for example:
<pre>
/(a|bc)x+yz/
</pre>
White space before the initial delimiter is ignored. A regular expression may
be continued over several input lines, in which case the newline characters are
included within it. It is possible to include the delimiter within the pattern
by escaping it, for example
<pre>
/abc\/def/
</pre>
If you do so, the escape and the delimiter form part of the pattern, but since
delimiters are always non-alphanumeric, this does not affect its interpretation.
If the terminating delimiter is immediately followed by a backslash, for
example,
<pre>
/abc/\
</pre>
then a backslash is added to the end of the pattern. This is done to provide a
way of testing the error condition that arises if a pattern finishes with a
backslash, because
<pre>
/abc\/
</pre>
is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.
</P>
<br><a name="SEC4" href="#TOC1">PATTERN MODIFIERS</a><br>
<P>
A pattern may be followed by any number of modifiers, which are mostly single
characters. Following Perl usage, these are referred to below as, for example,
"the <b>/i</b> modifier", even though the delimiter of the pattern need not
always be a slash, and no slash is used when writing modifiers. Whitespace may
appear between the final pattern delimiter and the first modifier, and between
the modifiers themselves.
</P>
<P>
The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS,
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when
<b>pcre_compile()</b> is called. These four modifier letters have the same
effect as they do in Perl. For example:
<pre>
/caseless/i
</pre>
The following table shows additional modifiers for setting PCRE options that do
not correspond to anything in Perl:
<pre>
<b>/A</b> PCRE_ANCHORED
<b>/C</b> PCRE_AUTO_CALLOUT
<b>/E</b> PCRE_DOLLAR_ENDONLY
<b>/f</b> PCRE_FIRSTLINE
<b>/J</b> PCRE_DUPNAMES
<b>/N</b> PCRE_NO_AUTO_CAPTURE
<b>/U</b> PCRE_UNGREEDY
<b>/X</b> PCRE_EXTRA
<b>/&#60;cr&#62;</b> PCRE_NEWLINE_CR
<b>/&#60;lf&#62;</b> PCRE_NEWLINE_LF
<b>/&#60;crlf&#62;</b> PCRE_NEWLINE_CRLF
</pre>
Those specifying line endings are literal strings as shown. Details of the
meanings of these PCRE options are given in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<br><b>
Finding all matches in a string
</b><br>
<P>
Searching for all possible matches within each subject string can be requested
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
again to search the remainder of the subject string. The difference between
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to
<b>pcre_exec()</b> to start searching at a new point within the entire string
(which is in effect what Perl does), whereas the latter passes over a shortened
substring. This makes a difference to the matching process if the pattern
begins with a lookbehind assertion (including \b or \B).
</P>
<P>
If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
flags set in order to search for another, non-empty, match at the same point.
If this second match fails, the start offset is advanced by one, and the normal
match is retried. This imitates the way Perl handles such cases when using the
<b>/g</b> modifier or the <b>split()</b> function.
</P>
<br><b>
Other modifiers
</b><br>
<P>
There are yet more modifiers for controlling the way <b>pcretest</b>
operates.
</P>
<P>
The <b>/+</b> modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the remainder of
the subject string. This is useful for tests where the subject contains
multiple copies of the same substring.
</P>
<P>
The <b>/L</b> modifier must be followed directly by the name of a locale, for
example,
<pre>
/pattern/Lfr_FR
</pre>
For this reason, it must be the last modifier. The given locale is set,
<b>pcre_maketables()</b> is called to build a set of character tables for the
locale, and this is then passed to <b>pcre_compile()</b> when compiling the
regular expression. Without an <b>/L</b> modifier, NULL is passed as the tables
pointer; that is, <b>/L</b> applies only to the expression on which it appears.
</P>
<P>
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
compiled pattern (whether it is anchored, has a fixed first character, and
so on). It does this by calling <b>pcre_fullinfo()</b> after compiling a
pattern. If the pattern is studied, the results of that are also output.
</P>
<P>
The <b>/D</b> modifier is a PCRE debugging feature, which also assumes <b>/I</b>.
It causes the internal form of compiled regular expressions to be output after
compilation. If the pattern was studied, the information returned is also
output.
</P>
<P>
The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the
fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
facility is for testing the feature in PCRE that allows it to execute patterns
that were compiled on a host with a different endianness. This feature is not
available when the POSIX interface to PCRE is being used, that is, when the
<b>/P</b> pattern modifier is specified. See also the section about saving and
reloading compiled patterns below.
</P>
<P>
The <b>/S</b> modifier causes <b>pcre_study()</b> to be called after the
expression has been compiled, and the results used when the expression is
matched.
</P>
<P>
The <b>/M</b> modifier causes the size of memory block used to hold the compiled
pattern to be output.
</P>
<P>
The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper
API rather than its native API. When this is done, all other modifiers except
<b>/i</b>, <b>/m</b>, and <b>/+</b> are ignored. REG_ICASE is set if <b>/i</b> is
present, and REG_NEWLINE is set if <b>/m</b> is present. The wrapper functions
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
</P>
<P>
The <b>/8</b> modifier causes <b>pcretest</b> to call PCRE with the PCRE_UTF8
option set. This turns on support for UTF-8 character handling in PCRE,
provided that it was compiled with this support enabled. This modifier also
causes any non-printing characters in output strings to be printed using the
\x{hh...} notation if they are valid UTF-8 sequences.
</P>
<P>
If the <b>/?</b> modifier is used with <b>/8</b>, it causes <b>pcretest</b> to
call <b>pcre_compile()</b> with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the string for UTF-8 validity.
</P>
<br><a name="SEC5" href="#TOC1">DATA LINES</a><br>
<P>
Before each data line is passed to <b>pcre_exec()</b>, leading and trailing
whitespace is removed, and it is then scanned for \ escapes. Some of these are
pretty esoteric features, intended for checking out some of the more
complicated features of PCRE. If you are just testing "ordinary" regular
expressions, you probably don't need any of these. The following escapes are
recognized:
<pre>
\a alarm (= BEL)
\b backspace
\e escape
\f formfeed
\n newline
\qdd set the PCRE_MATCH_LIMIT limit to dd (any number of digits)
\r carriage return
\t tab
\v vertical tab
\nnn octal character (up to 3 octal digits)
\xhh hexadecimal character (up to 2 hex digits)
\x{hh...} hexadecimal character, any number of digits in UTF-8 mode
\A pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\B pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\Cdd call pcre_copy_substring() for substring dd after a successful match (number less than 32)
\Cname call pcre_copy_named_substring() for substring "name" after a successful match (name termin-
ated by next non alphanumeric character)
\C+ show the current captured substrings at callout time
\C- do not supply a callout function
\C!n return 1 instead of 0 when callout number n is reached
\C!n!m return 1 instead of 0 when callout number n is reached for the nth time
\C*n pass the number n (may be negative) as callout data; this is used as the callout return value
\D use the <b>pcre_dfa_exec()</b> match function
\F only shortest match for <b>pcre_dfa_exec()</b>
\Gdd call pcre_get_substring() for substring dd after a successful match (number less than 32)
\Gname call pcre_get_named_substring() for substring "name" after a successful match (name termin-
ated by next non-alphanumeric character)
\L call pcre_get_substringlist() after a successful match
\M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
\N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\Odd set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
\P pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
\R pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b>
\S output details of memory get/free calls during matching
\Z pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\? pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\&#62;dd start the match at offset dd (any number of digits);
this sets the <i>startoffset</i> argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\&#60;cr&#62; pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\&#60;lf&#62; pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\&#60;crlf&#62; pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
</pre>
The escapes that specify line endings are literal strings, exactly as shown.
A backslash followed by anything else just escapes the anything else. If the
very last character is a backslash, it is ignored. This gives a way of passing
an empty line as data, since a real empty line terminates the data input.
</P>
<P>
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
fields of the <b>pcre_extra</b> data structure, until it finds the minimum
numbers for each parameter that allow <b>pcre_exec()</b> to complete. The
<i>match_limit</i> number is a measure of the amount of backtracking that takes
place, and checking it out can be instructive. For most simple matches, the
number is quite small, but for patterns with very large numbers of matching
possibilities, it can become large very quickly with increasing length of
subject string. The <i>match_limit_recursion</i> number is a measure of how much
stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
to complete the match attempt.
</P>
<P>
When \O is used, the value specified may be higher or lower than the size set
by the <b>-O</b> command line option (or defaulted to 45); \O applies only to
the call of <b>pcre_exec()</b> for the line in which it appears.
</P>
<P>
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
API to be used, the only option-setting sequences that have any effect are \B
and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
<b>regexec()</b>.
</P>
<P>
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
of the <b>/8</b> modifier on the pattern. It is recognized always. There may be
any number of hexadecimal digits inside the braces. The result is from one to
six bytes, encoded according to the UTF-8 rules.
</P>
<br><a name="SEC6" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
By default, <b>pcretest</b> uses the standard PCRE matching function,
<b>pcre_exec()</b> to match each data line. From release 6.0, PCRE supports an
alternative matching function, <b>pcre_dfa_test()</b>, which operates in a
different way, and has some restrictions. The differences between the two
functions are described in the
<a href="pcrematching.html"><b>pcrematching</b></a>
documentation.
</P>
<P>
If a data line contains the \D escape sequence, or if the command line
contains the <b>-dfa</b> option, the alternative matching function is called.
This function finds all possible matches at a given point. If, however, the \F
escape sequence is present in the data line, it stops after the first match is
found. This is always the shortest possible match.
</P>
<br><a name="SEC7" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
<P>
This section describes the output when the normal matching function,
<b>pcre_exec()</b>, is being used.
</P>
<P>
When a match succeeds, pcretest outputs the list of captured substrings that
<b>pcre_exec()</b> returns, starting with number 0 for the string that matched
the whole pattern. Otherwise, it outputs "No match" or "Partial match"
when <b>pcre_exec()</b> returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
respectively, and otherwise the PCRE negative error number. Here is an example
of an interactive <b>pcretest</b> run.
<pre>
$ pcretest
PCRE version 5.00 07-Sep-2004
re&#62; /^abc(\d+)/
data&#62; abc123
0: abc123
1: 123
data&#62; xyz
No match
</pre>
If the strings contain any non-printing characters, they are output as \0x
escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the
pattern. If the pattern has the <b>/+</b> modifier, the output for substring 0
is followed by the the rest of the subject string, identified by "0+" like
this:
<pre>
re&#62; /cat/+
data&#62; cataract
0: cat
0+ aract
</pre>
If the pattern has the <b>/g</b> or <b>/G</b> modifier, the results of successive
matching attempts are output in sequence, like this:
<pre>
re&#62; /\Bi(\w\w)/g
data&#62; Mississippi
0: iss
1: ss
0: iss
1: ss
0: ipp
1: pp
</pre>
"No match" is output only if the first match attempt fails.
</P>
<P>
If any of the sequences <b>\C</b>, <b>\G</b>, or <b>\L</b> are present in a
data line that is successfully matched, the substrings extracted by the
convenience functions are output with C, G, or L after the string number
instead of a colon. This is in addition to the normal full list. The string
length (that is, the return from the extraction function) is given in
parentheses after each string for <b>\C</b> and <b>\G</b>.
</P>
<P>
Note that while patterns can be continued over several lines (a plain "&#62;"
prompt is used for continuations), data lines may not. However newlines can be
included in data by means of the \n escape (or \r or \r\n for those newline
settings).
</P>
<br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
When the alternative matching function, <b>pcre_dfa_exec()</b>, is used (by
means of the \D escape sequence or the <b>-dfa</b> command line option), the
output consists of a list of all the matches that start at the first point in
the subject where there is at least one match. For example:
<pre>
re&#62; /(tang|tangerine|tan)/
data&#62; yellow tangerine\D
0: tangerine
1: tang
2: tan
</pre>
(Using the normal matching function on this data finds only "tang".) The
longest matching string is always given first (and numbered zero).
</P>
<P>
If \fB/g\P is present on the pattern, the search for further matches resumes
at the end of the longest match. For example:
<pre>
re&#62; /(tang|tangerine|tan)/g
data&#62; yellow tangerine and tangy sultana\D
0: tangerine
1: tang
2: tan
0: tang
1: tan
0: tan
</pre>
Since the matching function does not support substring capture, the escape
sequences that are concerned with captured substrings are not relevant.
</P>
<br><a name="SEC9" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
<P>
When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
indicating that the subject partially matched the pattern, you can restart the
match with additional subject data by means of the \R escape sequence. For
example:
<pre>
re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data&#62; 23ja\P\D
Partial match: 23ja
data&#62; n05\R\D
0: n05
</pre>
For further information about partial matching, see the
<a href="pcrepartial.html"><b>pcrepartial</b></a>
documentation.
</P>
<br><a name="SEC10" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcretest</b>'s callout function
is called during matching. This works with both matching functions. By default,
the called function displays the callout number, the start and current
positions in the text at the callout time, and the next pattern item to be
tested. For example, the output
<pre>
---&#62;pqrabcdef
0 ^ ^ \d
</pre>
indicates that callout number 0 occurred for a match attempt starting at the
fourth character of the subject string, when the pointer was at the seventh
character of the data, and when the next pattern item was \d. Just one
circumflex is output if the start and current positions are the same.
</P>
<P>
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
result of the <b>/C</b> pattern modifier. In this case, instead of showing the
callout number, the offset in the pattern, preceded by a plus, is output. For
example:
<pre>
re&#62; /\d?[A-E]\*/C
data&#62; E*
---&#62;E*
+0 ^ \d?
+3 ^ [A-E]
+8 ^^ \*
+10 ^ ^
0: E*
</pre>
The callout function in <b>pcretest</b> returns zero (carry on matching) by
default, but you can use a \C item in a data line (as described above) to
change this.
</P>
<P>
Inserting callouts can be helpful when using <b>pcretest</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation.
</P>
<br><a name="SEC11" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
<P>
The facilities described in this section are not available when the POSIX
inteface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
specified.
</P>
<P>
When the POSIX interface is not in use, you can cause <b>pcretest</b> to write a
compiled pattern to a file, by following the modifiers with &#62; and a file name.
For example:
<pre>
/pattern/im &#62;/some/file
</pre>
See the
<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
documentation for a discussion about saving and re-using compiled patterns.
</P>
<P>
The data that is written is binary. The first eight bytes are the length of the
compiled pattern data followed by the length of the optional study data, each
written as four bytes in big-endian order (most significant byte first). If
there is no study data (either the pattern was not studied, or studying did not
return any data), the second length is zero. The lengths are followed by an
exact copy of the compiled pattern. If there is additional study data, this
follows immediately after the compiled pattern. After writing the file,
<b>pcretest</b> expects to read a new pattern.
</P>
<P>
A saved pattern can be reloaded into <b>pcretest</b> by specifing &#60; and a file
name instead of a pattern. The name of the file must not contain a &#60; character,
as otherwise <b>pcretest</b> will interpret the line as a pattern delimited by &#60;
characters.
For example:
<pre>
re&#62; &#60;/some/file
Compiled regex loaded from /some/file
No study data
</pre>
When the pattern has been loaded, <b>pcretest</b> proceeds to read data lines in
the usual way.
</P>
<P>
You can copy a file written by <b>pcretest</b> to a different host and reload it
there, even if the new host has opposite endianness to the one on which the
pattern was compiled. For example, you can compile on an i86 machine and run on
a SPARC machine.
</P>
<P>
File names for saving and reloading can be absolute or relative, but note that
the shell facility of expanding a file name that starts with a tilde (~) is not
available.
</P>
<P>
The ability to save and reload files in <b>pcretest</b> is intended for testing
and experimentation. It is not intended for production use because only a
single pattern can be written to a file. Furthermore, there is no facility for
supplying custom character tables for use with a reloaded pattern. If the
original pattern was compiled with custom tables, an attempt to match a subject
string using a reloaded pattern is likely to cause <b>pcretest</b> to crash.
Finally, if you attempt to load a file that is not in the correct format, the
result is undefined.
</P>
<br><a name="SEC12" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 29 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

244
libs/pcre/doc/pcre.3 Normal file
View File

@ -0,0 +1,244 @@
.TH PCRE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH INTRODUCTION
.rs
.sp
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
differences. The current implementation of PCRE (release 6.x) corresponds
approximately with Perl 5.8, including support for UTF-8 encoded strings and
Unicode general category properties. However, this support has to be explicitly
enabled; it is not the default.
.P
In addition to the Perl-compatible matching function, PCRE also contains an
alternative matching function that matches the same compiled patterns in a
different way. In certain circumstances, the alternative function has some
advantages. For a discussion of the two matching algorithms, see the
.\" HREF
\fBpcrematching\fP
.\"
page.
.P
PCRE is written in C and released as a C library. A number of people have
written wrappers and interfaces of various kinds. In particular, Google Inc.
have provided a comprehensive C++ wrapper. This is now included as part of the
PCRE distribution. The
.\" HREF
\fBpcrecpp\fP
.\"
page has details of this interface. Other people's contributions can be found
in the \fIContrib\fR directory at the primary FTP site, which is:
.sp
.\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
.\" </a>
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
.P
Details of exactly which Perl regular expression features are and are not
supported by PCRE are given in separate documents. See the
.\" HREF
\fBpcrepattern\fR
.\"
and
.\" HREF
\fBpcrecompat\fR
.\"
pages.
.P
Some features of PCRE can be included, excluded, or changed when the library is
built. The
.\" HREF
\fBpcre_config()\fR
.\"
function makes it possible for a client to discover which features are
available. The features themselves are described in the
.\" HREF
\fBpcrebuild\fP
.\"
page. Documentation about building PCRE for various operating systems can be
found in the \fBREADME\fP file in the source distribution.
.P
The library contains a number of undocumented internal functions and data
tables that are used by more than one of the exported external functions, but
which are not intended for use by external callers. Their names all begin with
"_pcre_", which hopefully will not provoke any name clashes. In some
environments, it is possible to control which external symbols are exported
when a shared library is built, and in these cases the undocumented symbols are
not exported.
.
.
.SH "USER DOCUMENTATION"
.rs
.sp
The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format,
each is a separate page, linked from the index page. In the plain text format,
all the sections are concatenated, for ease of searching. The sections are as
follows:
.sp
pcre this document
pcreapi details of PCRE's native C API
pcrebuild options for building PCRE
pcrecallout details of the callout feature
pcrecompat discussion of Perl compatibility
pcrecpp details of the C++ wrapper
pcregrep description of the \fBpcregrep\fP command
pcrematching discussion of the two matching algorithms
pcrepartial details of the partial matching facility
.\" JOIN
pcrepattern syntax and semantics of supported
regular expressions
pcreperform discussion of performance issues
pcreposix the POSIX-compatible C API
pcreprecompile details of saving and re-using precompiled patterns
pcresample discussion of the sample program
pcrestack discussion of stack usage
pcretest description of the \fBpcretest\fP testing command
.sp
In addition, in the "man" and HTML formats, there is a short page for each
C library function, listing its arguments and results.
.
.
.SH LIMITATIONS
.rs
.sp
There are some size limitations in PCRE but it is hoped that they will never in
practice be relevant.
.P
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
compiled with the default internal linkage size of 2. If you want to process
regular expressions that are truly enormous, you can compile PCRE with an
internal linkage size of 3 or 4 (see the \fBREADME\fP file in the source
distribution and the
.\" HREF
\fBpcrebuild\fP
.\"
documentation for details). In these cases the limit is substantially larger.
However, the speed of execution will be slower.
.P
All values in repeating quantifiers must be less than 65536. The maximum
compiled length of subpattern with an explicit repeat count is 30000 bytes. The
maximum number of capturing subpatterns is 65535.
.P
There is no limit to the number of non-capturing subpatterns, but the maximum
depth of nesting of all kinds of parenthesized subpattern, including capturing
subpatterns, assertions, and other types of subpattern, is 200.
.P
The maximum length of name for a named subpattern is 32, and the maximum number
of named subpatterns is 10000.
.P
The maximum length of a subject string is the largest positive number that an
integer variable can hold. However, when using the traditional matching
function, PCRE uses recursion to handle subpatterns and indefinite repetition.
This means that the available stack space may limit the size of a subject
string that can be processed by certain patterns. For a discussion of stack
issues, see the
.\" HREF
\fBpcrestack\fP
.\"
documentation.
.sp
.\" HTML <a name="utf8support"></a>
.
.
.SH "UTF-8 AND UNICODE PROPERTY SUPPORT"
.rs
.sp
From release 3.3, PCRE has had some support for character strings encoded in
the UTF-8 format. For release 4.0 this was greatly extended to cover most
common requirements, and in release 5.0 additional support for Unicode general
category properties was added.
.P
In order process UTF-8 strings, you must build PCRE to include UTF-8 support in
the code, and, in addition, you must call
.\" HREF
\fBpcre_compile()\fP
.\"
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
subject strings that are matched against it are treated as UTF-8 strings
instead of just strings of bytes.
.P
If you compile PCRE with UTF-8 support, but do not use it at run time, the
library will be a bit bigger, but the additional run time overhead is limited
to testing the PCRE_UTF8 flag in several places, so should not be very large.
.P
If PCRE is built with Unicode character property support (which implies UTF-8
support), the escape sequences \ep{..}, \eP{..}, and \eX are supported.
The available properties that can be tested are limited to the general
category properties such as Lu for an upper case letter or Nd for a decimal
number, the Unicode script names such as Arabic or Han, and the derived
properties Any and L&. A full list is given in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. Only the short names for properties are supported. For example,
\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.
Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
compatibility with Perl 5.6. PCRE does not support this.
.P
The following comments apply when PCRE is running in UTF-8 mode:
.P
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
are checked for validity on entry to the relevant functions. If an invalid
UTF-8 string is passed, an error return is given. In some situations, you may
already know that your strings are valid, and therefore want to skip these
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
at compile time or at run time, PCRE assumes that the pattern or subject it
is given (respectively) contains only valid UTF-8 codes. In this case, it does
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
may crash.
.P
2. An unbraced hexadecimal escape sequence (such as \exb3) matches a two-byte
UTF-8 character if the value is greater than 127.
.P
3. Octal numbers up to \e777 are recognized, and match two-byte UTF-8
characters for values greater than \e177.
.P
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
bytes, for example: \ex{100}{3}.
.P
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
.P
6. The escape sequence \eC can be used to match a single byte in UTF-8 mode,
but its use can lead to some strange effects. This facility is not available in
the alternative matching function, \fBpcre_dfa_exec()\fP.
.P
7. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
test characters of any code value, but the characters that PCRE recognizes as
digits, spaces, or word characters remain the same set as before, all with
values less than 256. This remains true even when PCRE includes Unicode
property support, because to do otherwise would slow down PCRE in many common
cases. If you really want to test for a wider sense of, say, "digit", you
must use Unicode property tests such as \ep{Nd}.
.P
8. Similarly, characters that match the POSIX named character classes are all
low-valued characters.
.P
9. Case-insensitive matching applies only to characters whose values are less
than 128, unless PCRE is built with Unicode property support. Even when Unicode
property support is available, PCRE still uses its own character tables when
checking the case of low-valued characters, so as not to degrade performance.
The Unicode property information is used only for characters with higher
values. Even when Unicode property support is available, PCRE supports
case-insensitive matching only when there is a one-to-one mapping between a
letter's cases. There are a small number of many-to-one mappings in Unicode;
these are not supported by PCRE.
.
.SH AUTHOR
.rs
.sp
Philip Hazel
.br
University Computing Service,
.br
Cambridge CB2 3QG, England.
.P
Putting an actual email address here seems to have been a spam magnet, so I've
taken it away. If you want to email me, use my initial and surname, separated
by a dot, at the domain ucs.cam.ac.uk.
.sp
.in 0
Last updated: 05 June 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

5153
libs/pcre/doc/pcre.txt Normal file

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,69 @@
.TH PCRE_COMPILE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B pcre *pcre_compile(const char *\fIpattern\fP, int \fIoptions\fP,
.ti +5n
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
.ti +5n
.B const unsigned char *\fItableptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This function compiles a regular expression into an internal form. Its
arguments are:
.sp
\fIpattern\fR A zero-terminated string containing the
regular expression to be compiled
\fIoptions\fR Zero or more option bits
\fIerrptr\fR Where to put an error message
\fIerroffset\fR Offset in pattern where error was found
\fItableptr\fR Pointer to character tables, or NULL to
use the built-in default
.sp
The option bits are:
.sp
PCRE_ANCHORED Force pattern anchoring
PCRE_AUTO_CALLOUT Compile automatic callouts
PCRE_CASELESS Do caseless matching
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_DUPNAMES Allow duplicate names for subpatterns
PCRE_EXTENDED Ignore whitespace and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_FIRSTLINE Force matching to be before newline
PCRE_MULTILINE ^ and $ match newlines within data
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
PCRE_UNGREEDY Invert greediness of quantifiers
PCRE_UTF8 Run in UTF-8 mode
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
.sp
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
PCRE_NO_UTF8_CHECK.
.P
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fR
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fR
.\"
page.

View File

@ -0,0 +1,74 @@
.TH PCRE_COMPILE2 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP,
.ti +5n
.B int *\fIerrorcodeptr\fP,
.ti +5n
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
.ti +5n
.B const unsigned char *\fItableptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This function compiles a regular expression into an internal form. It is the
same as \fBpcre_compile()\fP, except for the addition of the \fIerrorcodeptr\fP
argument. The arguments are:
.sp
\fIpattern\fR A zero-terminated string containing the
regular expression to be compiled
\fIoptions\fR Zero or more option bits
\fIerrorcodeptr\fP Where to put an error code
\fIerrptr\fR Where to put an error message
\fIerroffset\fR Offset in pattern where error was found
\fItableptr\fR Pointer to character tables, or NULL to
use the built-in default
.sp
The option bits are:
.sp
PCRE_ANCHORED Force pattern anchoring
PCRE_AUTO_CALLOUT Compile automatic callouts
PCRE_CASELESS Do caseless matching
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_DUPNAMES Allow duplicate names for subpatterns
PCRE_EXTENDED Ignore whitespace and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_FIRSTLINE Force matching to be before newline
PCRE_MULTILINE ^ and $ match newlines within data
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
PCRE_UNGREEDY Invert greediness of quantifiers
PCRE_UTF8 Run in UTF-8 mode
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
.sp
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
PCRE_NO_UTF8_CHECK.
.P
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fR
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fR
.\"
page.

View File

@ -0,0 +1,50 @@
.TH PCRE_CONFIG 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP);
.
.SH DESCRIPTION
.rs
.sp
This function makes it possible for a client program to find out which optional
features are available in the version of the PCRE library it is using. Its
arguments are as follows:
.sp
\fIwhat\fR A code specifying what information is required
\fIwhere\fR Points to where to put the data
.sp
The available codes are:
.sp
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
PCRE_CONFIG_MATCH_LIMIT_RECURSION
Internal recursion depth limit
PCRE_CONFIG_NEWLINE Value of the newline sequence
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
Threshold of return slots, above
which \fBmalloc()\fR is used by
the POSIX API
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
PCRE_CONFIG_UNICODE_PROPERTIES
Availability of Unicode property support
(1=yes 0=no)
.sp
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fR
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fR
.\"
page.

View File

@ -0,0 +1,44 @@
.TH PCRE_COPY_NAMED_SUBSTRING 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_copy_named_substring(const pcre *\fIcode\fP,
.ti +5n
.B const char *\fIsubject\fP, int *\fIovector\fP,
.ti +5n
.B int \fIstringcount\fP, const char *\fIstringname\fP,
.ti +5n
.B char *\fIbuffer\fP, int \fIbuffersize\fP);
.
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a captured substring, identified
by name, into a given buffer. The arguments are:
.sp
\fIcode\fP Pattern that was successfully matched
\fIsubject\fP Subject that has been successfully matched
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
\fIstringname\fP Name of the required substring
\fIbuffer\fP Buffer to receive the string
\fIbuffersize\fP Size of buffer
.sp
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,41 @@
.TH PCRE_COPY_SUBSTRING 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_copy_substring(const char *\fIsubject\fP, int *\fIovector\fP,
.ti +5n
.B int \fIstringcount\fP, int \fIstringnumber\fP, char *\fIbuffer\fP,
.ti +5n
.B int \fIbuffersize\fP);
.
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a captured substring into a given
buffer. The arguments are:
.sp
\fIsubject\fP Subject that has been successfully matched
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
\fIstringnumber\fP Number of the required substring
\fIbuffer\fP Buffer to receive the string
\fIbuffersize\fP Size of buffer
.sp
The yield is the legnth of the string, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,85 @@
.TH PCRE_DFA_EXEC 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
.ti +5n
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
.ti +5n
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
.ti +5n
.B int *\fIworkspace\fP, int \fIwscount\fP);
.
.SH DESCRIPTION
.rs
.sp
This function matches a compiled regular expression against a given subject
string, using a DFA matching algorithm (\fInot\fP Perl-compatible). Note that
the main, Perl-compatible, matching function is \fBpcre_exec()\fP. The
arguments for this function are:
.sp
\fIcode\fP Points to the compiled pattern
\fIextra\fP Points to an associated \fBpcre_extra\fP structure,
or is NULL
\fIsubject\fP Points to the subject string
\fIlength\fP Length of the subject string, in bytes
\fIstartoffset\fP Offset in bytes in the subject at which to
start matching
\fIoptions\fP Option bits
\fIovector\fP Points to a vector of ints for result offsets
\fIovecsize\fP Number of elements in the vector
\fIworkspace\fP Points to a vector of ints used as working space
\fIwscount\fP Number of elements in the vector
.sp
The options are:
.sp
PCRE_ANCHORED Match only at the first position
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NOTBOL Subject is not the beginning of a line
PCRE_NOTEOL Subject is not the end of a line
PCRE_NOTEMPTY An empty string is not a valid match
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
PCRE_DFA_SHORTEST Return only the shortest match
PCRE_DFA_RESTART This is a restart after a partial match
.sp
There are restrictions on what may appear in a pattern when matching using the
DFA algorithm is requested. Details are given in the
.\" HREF
\fBpcrematching\fP
.\"
documentation.
.P
A \fBpcre_extra\fP structure contains the following fields:
.sp
\fIflags\fP Bits indicating which fields are set
\fIstudy_data\fP Opaque data from \fBpcre_study()\fP
\fImatch_limit\fP Limit on internal resource use
\fImatch_limit_recursion\fP Limit on internal recursion depth
\fIcallout_data\fP Opaque data passed back to callouts
\fItables\fP Points to character tables or is NULL
.sp
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
PCRE_EXTRA_TABLES. For DFA matching, the \fImatch_limit\fP and
\fImatch_limit_recursion\fP fields are not used, and must not be set.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

73
libs/pcre/doc/pcre_exec.3 Normal file
View File

@ -0,0 +1,73 @@
.TH PCRE_EXEC 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
.ti +5n
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
.ti +5n
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
.
.SH DESCRIPTION
.rs
.sp
This function matches a compiled regular expression against a given subject
string, using a matching algorithm that is similar to Perl's. It returns
offsets to captured substrings. Its arguments are:
.sp
\fIcode\fP Points to the compiled pattern
\fIextra\fP Points to an associated \fBpcre_extra\fP structure,
or is NULL
\fIsubject\fP Points to the subject string
\fIlength\fP Length of the subject string, in bytes
\fIstartoffset\fP Offset in bytes in the subject at which to
start matching
\fIoptions\fP Option bits
\fIovector\fP Points to a vector of ints for result offsets
\fIovecsize\fP Number of elements in the vector (a multiple of 3)
.sp
The options are:
.sp
PCRE_ANCHORED Match only at the first position
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NOTBOL Subject is not the beginning of a line
PCRE_NOTEOL Subject is not the end of a line
PCRE_NOTEMPTY An empty string is not a valid match
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
.sp
There are restrictions on what may appear in a pattern when partial matching is
requested.
.P
A \fBpcre_extra\fP structure contains the following fields:
.sp
\fIflags\fP Bits indicating which fields are set
\fIstudy_data\fP Opaque data from \fBpcre_study()\fP
\fImatch_limit\fP Limit on internal resource use
\fImatch_limit_recursion\fP Limit on internal recursion depth
\fIcallout_data\fP Opaque data passed back to callouts
\fItables\fP Points to character tables or is NULL
.sp
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
PCRE_EXTRA_TABLES.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,28 @@
.TH PCRE_FREE_SUBSTRING 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B void pcre_free_substring(const char *\fIstringptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This is a convenience function for freeing the store obtained by a previous
call to \fBpcre_get_substring()\fP or \fBpcre_get_named_substring()\fP. Its
only argument is a pointer to the string.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,28 @@
.TH PCRE_FREE_SUBSTRING_LIST 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B void pcre_free_substring_list(const char **\fIstringptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This is a convenience function for freeing the store obtained by a previous
call to \fBpcre_get_substring_list()\fP. Its only argument is a pointer to the
list of string pointers.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,59 @@
.TH PCRE_FULLINFO 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_fullinfo(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
.ti +5n
.B int \fIwhat\fP, void *\fIwhere\fP);
.
.SH DESCRIPTION
.rs
.sp
This function returns information about a compiled pattern. Its arguments are:
.sp
\fIcode\fP Compiled regular expression
\fIextra\fP Result of \fBpcre_study()\fP or NULL
\fIwhat\fP What information is required
\fIwhere\fP Where to put the information
.sp
The following information is available:
.sp
PCRE_INFO_BACKREFMAX Number of highest back reference
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
PCRE_INFO_DEFAULT_TABLES Pointer to default tables
PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
-1 for start of string
or after newline, or
-2 otherwise
PCRE_INFO_FIRSTTABLE Table of first bytes
(after studying)
PCRE_INFO_LASTLITERAL Literal last byte required
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
PCRE_INFO_OPTIONS Options used for compilation
PCRE_INFO_SIZE Size of compiled pattern
PCRE_INFO_STUDYSIZE Size of study data
.sp
The yield of the function is zero on success or:
.sp
PCRE_ERROR_NULL the argument \fIcode\fP was NULL
the argument \fIwhere\fP was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of \fIwhat\fP was invalid
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,45 @@
.TH PCRE_GET_NAMED_SUBSTRING 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_get_named_substring(const pcre *\fIcode\fP,
.ti +5n
.B const char *\fIsubject\fP, int *\fIovector\fP,
.ti +5n
.B int \fIstringcount\fP, const char *\fIstringname\fP,
.ti +5n
.B const char **\fIstringptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a captured substring by name. The
arguments are:
.sp
\fIcode\fP Compiled pattern
\fIsubject\fP Subject that has been successfully matched
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
\fIstringname\fP Name of the required substring
\fIstringptr\fP Where to put the string pointer
.sp
The memory in which the substring is placed is obtained by calling
\fBpcre_malloc()\fP. The yield of the function is the length of the extracted
substring, PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
PCRE_ERROR_NOSUBSTRING if the string name is invalid.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,35 @@
.TH PCRE_GET_STRINGNUMBER 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_get_stringnumber(const pcre *\fIcode\fP,
.ti +5n
.B const char *\fIname\fP);
.
.SH DESCRIPTION
.rs
.sp
This convenience function finds the number of a named substring capturing
parenthesis in a compiled pattern. Its arguments are:
.sp
\fIcode\fP Compiled regular expression
\fIname\fP Name whose number is required
.sp
The yield of the function is the number of the parenthesis if the name is
found, or PCRE_ERROR_NOSUBSTRING otherwise.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,41 @@
.TH PCRE_GET_STRINGNUMBER 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_get_stringtable_entries(const pcre *\fIcode\fP,
.ti +5n
.B const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);
.
.SH DESCRIPTION
.rs
.sp
This convenience function finds, for a compiled pattern, the first and last
entries for a given name in the table that translates capturing parenthesis
names into numbers. When names are required to be unique (PCRE_DUPNAMES is
\fInot\fP set), it is usually easier to use \fBpcre_get_stringnumber()\fP
instead.
.sp
\fIcode\fP Compiled regular expression
\fIname\fP Name whose entries required
\fIfirst\fP Where to return a pointer to the first entry
\fIlast\fP Where to return a pointer to the last entry
.sp
The yield of the function is the length of each entry, or
PCRE_ERROR_NOSUBSTRING if none are found.
.P
There is a complete description of the PCRE native API, including the format of
the table entries, in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,42 @@
.TH PCRE_GET_SUBSTRING 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_get_substring(const char *\fIsubject\fP, int *\fIovector\fP,
.ti +5n
.B int \fIstringcount\fP, int \fIstringnumber\fP,
.ti +5n
.B const char **\fIstringptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a captured substring. The
arguments are:
.sp
\fIsubject\fP Subject that has been successfully matched
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
\fIstringnumber\fP Number of the required substring
\fIstringptr\fP Where to put the string pointer
.sp
The memory in which the substring is placed is obtained by calling
\fBpcre_malloc()\fP. The yield of the function is the length of the substring,
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
PCRE_ERROR_NOSUBSTRING if the string number is invalid.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,40 @@
.TH PCRE_GET_SUBSTRING_LIST 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_get_substring_list(const char *\fIsubject\fP,
.ti +5n
.B int *\fIovector\fP, int \fIstringcount\fP, "const char ***\fIlistptr\fP);"
.
.SH DESCRIPTION
.rs
.sp
This is a convenience function for extracting a list of all the captured
substrings. The arguments are:
.sp
\fIsubject\fP Subject that has been successfully matched
\fIovector\fP Offset vector that \fBpcre_exec\fP used
\fIstringcount\fP Value returned by \fBpcre_exec\fP
\fIlistptr\fP Where to put a pointer to the list
.sp
The memory in which the substrings and the list are placed is obtained by
calling \fBpcre_malloc()\fP. A pointer to a list of pointers is put in
the variable whose address is in \fIlistptr\fP. The list is terminated by a
NULL pointer. The yield of the function is zero on success or
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

27
libs/pcre/doc/pcre_info.3 Normal file
View File

@ -0,0 +1,27 @@
.TH PCRE_INFO 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_info(const pcre *\fIcode\fP, int *\fIoptptr\fP, int
.B *\fIfirstcharptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This function is obsolete. You should be using \fBpcre_fullinfo()\fP instead.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,30 @@
.TH PCRE_MAKETABLES 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B const unsigned char *pcre_maketables(void);
.
.SH DESCRIPTION
.rs
.sp
This function builds a set of character tables for character values less than
256. These can be passed to \fBpcre_compile()\fP to override PCRE's internal,
built-in tables (which were made by \fBpcre_maketables()\fP when PCRE was
compiled). You might want to do this if you are using a non-standard locale.
The function yields a pointer to the tables.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,33 @@
.TH PCRE_REFCOUNT 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP);
.
.SH DESCRIPTION
.rs
.sp
This function is used to maintain a reference count inside a data block that
contains a compiled pattern. Its arguments are:
.sp
\fIcode\fP Compiled regular expression
\fIadjust\fP Adjustment to reference value
.sp
The yield of the function is the adjusted reference value, which is constrained
to lie between 0 and 65535.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,43 @@
.TH PCRE_STUDY 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,
.ti +5n
.B const char **\fIerrptr\fP);
.
.SH DESCRIPTION
.rs
.sp
This function studies a compiled pattern, to see if additional information can
be extracted that might speed up matching. Its arguments are:
.sp
\fIcode\fP A compiled regular expression
\fIoptions\fP Options for \fBpcre_study()\fP
\fIerrptr\fP Where to put an error message
.sp
If the function succeeds, it returns a value that can be passed to
\fBpcre_exec()\fP via its \fIextra\fP argument.
.P
If the function returns NULL, either it could not find any additional
information, or there was an error. You can tell the difference by looking at
the error value. It is NULL in first case.
.P
There are currently no options defined; the value of the second argument should
always be zero.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

View File

@ -0,0 +1,27 @@
.TH PCRE_VERSION 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
.rs
.sp
.B #include <pcre.h>
.PP
.SM
.br
.B char *pcre_version(void);
.
.SH DESCRIPTION
.rs
.sp
This function returns a character string that gives the version number of the
PCRE library and the date of its release.
.P
There is a complete description of the PCRE native API in the
.\" HREF
\fBpcreapi\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcreposix\fP
.\"
page.

1789
libs/pcre/doc/pcreapi.3 Normal file

File diff suppressed because it is too large Load Diff

213
libs/pcre/doc/pcrebuild.3 Normal file
View File

@ -0,0 +1,213 @@
.TH PCREBUILD 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE BUILD-TIME OPTIONS"
.rs
.sp
This document describes the optional features of PCRE that can be selected when
the library is compiled. They are all selected, or deselected, by providing
options to the \fBconfigure\fP script that is run before the \fBmake\fP
command. The complete list of options for \fBconfigure\fP (which includes the
standard ones such as the selection of the installation directory) can be
obtained by running
.sp
./configure --help
.sp
The following sections describe certain options whose names begin with --enable
or --disable. These settings specify changes to the defaults for the
\fBconfigure\fP command. Because of the way that \fBconfigure\fP works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
.
.SH "C++ SUPPORT"
.rs
.sp
By default, the \fBconfigure\fP script will search for a C++ compiler and C++
header files. If it finds them, it automatically builds the C++ wrapper library
for PCRE. You can disable this by adding
.sp
--disable-cpp
.sp
to the \fBconfigure\fP command.
.
.SH "UTF-8 SUPPORT"
.rs
.sp
To build PCRE with support for UTF-8 character strings, add
.sp
--enable-utf8
.sp
to the \fBconfigure\fP command. Of itself, this does not make PCRE treat
strings as UTF-8. As well as compiling PCRE with this option, you also have
have to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fP
function.
.
.SH "UNICODE CHARACTER PROPERTY SUPPORT"
.rs
.sp
UTF-8 support allows PCRE to process character values greater than 255 in the
strings that it handles. On its own, however, it does not provide any
facilities for accessing the properties of such characters. If you want to be
able to use the pattern escapes \eP, \ep, and \eX, which refer to Unicode
character properties, you must add
.sp
--enable-unicode-properties
.sp
to the \fBconfigure\fP command. This implies UTF-8 support, even if you have
not explicitly requested it.
.P
Including Unicode property support adds around 90K of tables to the PCRE
library, approximately doubling its size. Only the general category properties
such as \fILu\fP and \fINd\fP are supported. Details are given in the
.\" HREF
\fBpcrepattern\fP
.\"
documentation.
.
.SH "CODE VALUE OF NEWLINE"
.rs
.sp
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
compile PCRE to use character 13 (carriage return, CR) instead, by adding
.sp
--enable-newline-is-cr
.sp
to the \fBconfigure\fP command. There is also a --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
.sp
Alternatively, you can specify that line endings are to be indicated by the two
character sequence CRLF. If you want this, add
.sp
--enable-newline-is-crlf
.sp
to the \fBconfigure\fP command. Whatever line ending convention is selected
when PCRE is built can be overridden when the library functions are called. At
build time it is conventional to use the standard for your operating system.
.
.SH "BUILDING SHARED AND STATIC LIBRARIES"
.rs
.sp
The PCRE building process uses \fBlibtool\fP to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
.sp
--disable-shared
--disable-static
.sp
to the \fBconfigure\fP command, as required.
.
.SH "POSIX MALLOC USAGE"
.rs
.sp
When PCRE is called through the POSIX interface (see the
.\" HREF
\fBpcreposix\fP
.\"
documentation), additional working storage is required for holding the pointers
to capturing substrings, because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using \fBmalloc()\fP for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
.sp
--with-posix-malloc-threshold=20
.sp
to the \fBconfigure\fP command.
.
.SH "HANDLING VERY LARGE PATTERNS"
.rs
.sp
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process enormous patterns, so it is possible to compile PCRE to use three-byte
or four-byte offsets by adding a setting such as
.sp
--with-link-size=3
.sp
to the \fBconfigure\fP command. The value given must be 2, 3, or 4. Using
longer offsets slows down the operation of PCRE because it has to load
additional bytes when handling them.
.P
If you build PCRE with an increased link size, test 2 (and test 5 if you are
using UTF-8) will fail. Part of the output of these tests is a representation
of the compiled pattern, and this changes with the link size.
.
.SH "AVOIDING EXCESSIVE STACK USAGE"
.rs
.sp
When matching with the \fBpcre_exec()\fP function, PCRE implements backtracking
by making recursive calls to an internal function called \fBmatch()\fP. In
environments where the size of the stack is limited, this can severely limit
PCRE's operation. (The Unix environment does not usually suffer from this
problem, but it may sometimes be necessary to increase the maximum stack size.
There is a discussion in the
.\" HREF
\fBpcrestack\fP
.\"
documentation.) An alternative approach to recursion that uses memory from the
heap to remember data, instead of using recursive function calls, has been
implemented to work round the problem of limited stack size. If you want to
build a version of PCRE that works this way, add
.sp
--disable-stack-for-recursion
.sp
to the \fBconfigure\fP command. With this configuration, PCRE will use the
\fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
management functions. Separate functions are provided because the usage is very
predictable: the block sizes requested are always the same, and the blocks are
always freed in reverse order. A calling program might be able to implement
optimized functions that perform better than the standard \fBmalloc()\fP and
\fBfree()\fP functions. PCRE runs noticeably more slowly when built in this
way. This option affects only the \fBpcre_exec()\fP function; it is not
relevant for the the \fBpcre_dfa_exec()\fP function.
.
.SH "LIMITING PCRE RESOURCE USAGE"
.rs
.sp
Internally, PCRE has a function called \fBmatch()\fP, which it calls repeatedly
(sometimes recursively) when matching a pattern with the \fBpcre_exec()\fP
function. By controlling the maximum number of times this function may be
called during a single matching operation, a limit can be placed on the
resources used by a single call to \fBpcre_exec()\fP. The limit can be changed
at run time, as described in the
.\" HREF
\fBpcreapi\fP
.\"
documentation. The default is 10 million, but this can be changed by adding a
setting such as
.sp
--with-match-limit=500000
.sp
to the \fBconfigure\fP command. This setting has no effect on the
\fBpcre_dfa_exec()\fP matching function.
.P
In some environments it is desirable to limit the depth of recursive calls of
\fBmatch()\fP more strictly than the total number of calls, in order to
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
is specified) that is used. A second limit controls this; it defaults to the
value that is set for --with-match-limit, which imposes no additional
constraints. However, you can set a lower limit by adding, for example,
.sp
--with-match-limit-recursion=10000
.sp
to the \fBconfigure\fP command. This value can also be overridden at run time.
.
.SH "USING EBCDIC CODE"
.rs
.sp
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or Unicode, which is a superset of ASCII). PCRE can, however, be
compiled to run in an EBCDIC environment by adding
.sp
--enable-ebcdic
.sp
to the \fBconfigure\fP command.
.P
.in 0
Last updated: 06 June 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

161
libs/pcre/doc/pcrecallout.3 Normal file
View File

@ -0,0 +1,161 @@
.TH PCRECALLOUT 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE CALLOUTS"
.rs
.sp
.B int (*pcre_callout)(pcre_callout_block *);
.PP
PCRE provides a feature called "callout", which is a means of temporarily
passing control to the caller of PCRE in the middle of pattern matching. The
caller of PCRE provides an external function by putting its entry point in the
global variable \fIpcre_callout\fP. By default, this variable contains NULL,
which disables all calling out.
.P
Within a regular expression, (?C) indicates the points at which the external
function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero.
For example, this pattern has two callout points:
.sp
(?C1)\deabc(?C2)def
.sp
If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP is called,
PCRE automatically inserts callouts, all with number 255, before each item in
the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
.sp
A(\ed{2}|--)
.sp
it is processed as if it were
.sp
(?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
.sp
Notice that there is a callout before and after each parenthesis and
alternation bar. Automatic callouts can be used for tracking the progress of
pattern matching. The
.\" HREF
\fBpcretest\fP
.\"
command has an option that sets automatic callouts; when it is used, the output
indicates how the pattern is matched. This is useful information when you are
trying to optimize the performance of a particular pattern.
.
.
.SH "MISSING CALLOUTS"
.rs
.sp
You should be aware that, because of optimizations in the way PCRE matches
patterns, callouts sometimes do not happen. For example, if the pattern is
.sp
ab(?C4)cd
.sp
PCRE knows that any matching string must contain the letter "d". If the subject
string is "abyz", the lack of "d" means that matching doesn't ever start, and
the callout is never reached. However, with "abyd", though the result is still
no match, the callout is obeyed.
.
.
.SH "THE CALLOUT INTERFACE"
.rs
.sp
During matching, when PCRE reaches a callout point, the external function
defined by \fIpcre_callout\fP is called (if it is set). This applies to both
the \fBpcre_exec()\fP and the \fBpcre_dfa_exec()\fP matching functions. The
only argument to the callout function is a pointer to a \fBpcre_callout\fP
block. This structure contains the following fields:
.sp
int \fIversion\fP;
int \fIcallout_number\fP;
int *\fIoffset_vector\fP;
const char *\fIsubject\fP;
int \fIsubject_length\fP;
int \fIstart_match\fP;
int \fIcurrent_position\fP;
int \fIcapture_top\fP;
int \fIcapture_last\fP;
void *\fIcallout_data\fP;
int \fIpattern_position\fP;
int \fInext_item_length\fP;
.sp
The \fIversion\fP field is an integer containing the version number of the
block format. The initial version was 0; the current version is 1. The version
number will change again in future if additional fields are added, but the
intention is never to remove any of the existing fields.
.P
The \fIcallout_number\fP field contains the number of the callout, as compiled
into the pattern (that is, the number after ?C for manual callouts, and 255 for
automatically generated callouts).
.P
The \fIoffset_vector\fP field is a pointer to the vector of offsets that was
passed by the caller to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. When
\fBpcre_exec()\fP is used, the contents can be inspected in order to extract
substrings that have been matched so far, in the same way as for extracting
substrings after a match has completed. For \fBpcre_dfa_exec()\fP this field is
not useful.
.P
The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
that were passed to \fBpcre_exec()\fP.
.P
The \fIstart_match\fP field contains the offset within the subject at which the
current match attempt started. If the pattern is not anchored, the callout
function may be called several times from the same point in the pattern for
different starting points in the subject.
.P
The \fIcurrent_position\fP field contains the offset within the subject of the
current match pointer.
.P
When the \fBpcre_exec()\fP function is used, the \fIcapture_top\fP field
contains one more than the number of the highest numbered captured substring so
far. If no substrings have been captured, the value of \fIcapture_top\fP is
one. This is always the case when \fBpcre_dfa_exec()\fP is used, because it
does not support captured substrings.
.P
The \fIcapture_last\fP field contains the number of the most recently captured
substring. If no substrings have been captured, its value is -1. This is always
the case when \fBpcre_dfa_exec()\fP is used.
.P
The \fIcallout_data\fP field contains a value that is passed to
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP specifically so that it can be
passed back in callouts. It is passed in the \fIpcre_callout\fP field of the
\fBpcre_extra\fP data structure. If no such data was passed, the value of
\fIcallout_data\fP in a \fBpcre_callout\fP block is NULL. There is a
description of the \fBpcre_extra\fP structure in the
.\" HREF
\fBpcreapi\fP
.\"
documentation.
.P
The \fIpattern_position\fP field is present from version 1 of the
\fIpcre_callout\fP structure. It contains the offset to the next item to be
matched in the pattern string.
.P
The \fInext_item_length\fP field is present from version 1 of the
\fIpcre_callout\fP structure. It contains the length of the next item to be
matched in the pattern string. When the callout immediately precedes an
alternation bar, a closing parenthesis, or the end of the pattern, the length
is zero. When the callout precedes an opening parenthesis, the length is that
of the entire subpattern.
.P
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
help in distinguishing between different automatic callouts, which all have the
same callout number. However, they are set for all callouts.
.
.
.SH "RETURN VALUES"
.rs
.sp
The external callout function returns an integer to PCRE. If the value is zero,
matching proceeds as normal. If the value is greater than zero, matching fails
at the current point, but the testing of other matching possibilities goes
ahead, just as if a lookahead assertion had failed. If the value is less than
zero, the match is abandoned, and \fBpcre_exec()\fP (or \fBpcre_dfa_exec()\fP)
returns the negative value.
.P
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
it will never be used by PCRE itself.
.P
.in 0
Last updated: 28 February 2005
.br
Copyright (c) 1997-2005 University of Cambridge.

126
libs/pcre/doc/pcrecompat.3 Normal file
View File

@ -0,0 +1,126 @@
.TH PCRECOMPAT 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "DIFFERENCES BETWEEN PCRE AND PERL"
.rs
.sp
This document describes the differences in the ways that PCRE and Perl handle
regular expressions. The differences described here are with respect to Perl
5.8.
.P
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
it does have are given in the
.\" HTML <a href="pcre.html#utf8support">
.\" </a>
section on UTF-8 support
.\"
in the main
.\" HREF
\fBpcre\fP
.\"
page.
.P
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
them, but they do not mean what you might think. For example, (?!a){3} does
not assert that the next three characters are not "a". It just asserts that the
next character is not "a" three times.
.P
3. Capturing subpatterns that occur inside negative lookahead assertions are
counted, but their entries in the offsets vector are never set. Perl sets its
numerical variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but only if the
negative lookahead assertion contains just one branch.
.P
4. Though binary zero characters are supported in the subject string, they are
not allowed in a pattern string because it is passed as a normal C string,
terminated by zero. The escape sequence \e0 can be used in the pattern to
represent a binary zero.
.P
5. The following Perl escape sequences are not supported: \el, \eu, \eL,
\eU, and \eN. In fact these are implemented by Perl's general string-handling
and are not part of its pattern matching engine. If any of these are
encountered by PCRE, an error is generated.
.P
6. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE is
built with Unicode character property support. The properties that can be
tested with \ep and \eP are limited to the general category properties such as
Lu and Nd, script names such as Greek or Han, and the derived properties Any
and L&.
.P
7. PCRE does support the \eQ...\eE escape for quoting substrings. Characters in
between are treated as literals. This is slightly different from Perl in that $
and @ are also handled as literals inside the quotes. In Perl, they cause
variable interpolation (but of course PCRE does not have variables). Note the
following examples:
.sp
Pattern PCRE matches Perl matches
.sp
.\" JOIN
\eQabc$xyz\eE abc$xyz abc followed by the
contents of $xyz
\eQabc\e$xyz\eE abc\e$xyz abc\e$xyz
\eQabc\eE\e$\eQxyz\eE abc$xyz abc$xyz
.sp
The \eQ...\eE sequence is recognized both inside and outside character classes.
.P
8. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
constructions. However, there is support for recursive patterns using the
non-Perl items (?R), (?number), and (?P>name). Also, the PCRE "callout" feature
allows an external function to be called during pattern matching. See the
.\" HREF
\fBpcrecallout\fP
.\"
documentation for details.
.P
9. There are some differences that are concerned with the settings of captured
strings when part of a pattern is repeated. For example, matching "aba" against
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
.P
10. PCRE provides some extensions to the Perl regular expression facilities:
.sp
(a) Although lookbehind assertions must match fixed length strings, each
alternative branch of a lookbehind assertion can match a different length of
string. Perl requires them all to have the same length.
.sp
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
meta-character matches only at the very end of the string.
.sp
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
meaning is faulted. Otherwise, like Perl, the backslash is ignored. (Perl can
be made to issue a warning.)
.sp
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.
.sp
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
only at the first matching position in the subject string.
.sp
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
options for \fBpcre_exec()\fP have no Perl equivalents.
.sp
(g) The (?R), (?number), and (?P>name) constructs allows for recursive pattern
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
support.)
.sp
(h) PCRE supports named capturing substrings, using the Python syntax.
.sp
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
package.
.sp
(j) The (R) condition, for testing recursion, is a PCRE extension.
.sp
(k) The callout facility is PCRE-specific.
.sp
(l) The partial matching facility is PCRE-specific.
.sp
(m) Patterns compiled by PCRE can be saved and re-used at a later time, even on
different hosts that have the other endianness.
.sp
(n) The alternative matching function (\fBpcre_dfa_exec()\fP) matches in a
different way and is not Perl-compatible.
.P
.in 0
Last updated: 06 June 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

312
libs/pcre/doc/pcrecpp.3 Normal file
View File

@ -0,0 +1,312 @@
.TH PCRECPP 3
.SH NAME
PCRE - Perl-compatible regular expressions.
.SH "SYNOPSIS OF C++ WRAPPER"
.rs
.sp
.B #include <pcrecpp.h>
.PP
.SM
.br
.SH DESCRIPTION
.rs
.sp
The C++ wrapper for PCRE was provided by Google Inc. Some additional
functionality was added by Giuseppe Maxia. This brief man page was constructed
from the notes in the \fIpcrecpp.h\fP file, which should be consulted for
further details.
.
.
.SH "MATCHING INTERFACE"
.rs
.sp
The "FullMatch" operation checks that supplied text matches a supplied pattern
exactly. If pointer arguments are supplied, it copies matched sub-strings that
match sub-patterns into them.
.sp
Example: successful match
pcrecpp::RE re("h.*o");
re.FullMatch("hello");
.sp
Example: unsuccessful match (requires full match):
pcrecpp::RE re("e");
!re.FullMatch("hello");
.sp
Example: creating a temporary RE object:
pcrecpp::RE("h.*o").FullMatch("hello");
.sp
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. You can, as in the different examples above, store
the RE object explicitly in a variable or use a temporary RE object. The
examples below use one mode or the other arbitrarily. Either could correctly be
used for any of these examples.
.P
You must supply extra pointer arguments to extract matched subpieces.
.sp
Example: extracts "ruby" into "s" and 1234 into "i"
int i;
string s;
pcrecpp::RE re("(\e\ew+):(\e\ed+)");
re.FullMatch("ruby:1234", &s, &i);
.sp
Example: does not try to extract any extra sub-patterns
re.FullMatch("ruby:1234", &s);
.sp
Example: does not try to extract into NULL
re.FullMatch("ruby:1234", NULL, &i);
.sp
Example: integer overflow causes failure
!re.FullMatch("ruby:1234567891234", NULL, &i);
.sp
Example: fails because there aren't enough sub-patterns:
!pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s);
.sp
Example: fails because string cannot be stored in integer
!pcrecpp::RE("(.*)").FullMatch("ruby", &i);
.sp
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
.sp
string (matched piece is copied to string)
StringPiece (StringPiece is mutated to point to matched piece)
T (where "bool T::ParseFrom(const char*, int)" exists)
NULL (the corresponding matched sub-pattern is not copied)
.sp
The function returns true iff all of the following conditions are satisfied:
.sp
a. "text" matches "pattern" exactly;
.sp
b. The number of matched sub-patterns is >= number of supplied
pointers;
.sp
c. The "i"th argument has a suitable type for holding the
string captured as the "i"th sub-pattern. If you pass in
NULL for the "i"th argument, or pass fewer arguments than
number of sub-patterns, "i"th captured sub-pattern is
ignored.
.sp
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
\fBpcrecpp::RE::DoMatch\fP. See \fBpcrecpp.h\fP for the signature for
\fBDoMatch\fP.
.
.SH "PARTIAL MATCHES"
.rs
.sp
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
.sp
Example: simple search for a string:
pcrecpp::RE("ell").PartialMatch("hello");
.sp
Example: find first number in a string:
int number;
pcrecpp::RE re("(\e\ed+)");
re.PartialMatch("x*100 + 20", &number);
assert(number == 100);
.
.
.SH "UTF-8 AND THE MATCHING INTERFACE"
.rs
.sp
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and string to be treated
as UTF-8 text, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF8 text. For example, "." will match one byte normally but with UTF8 set may
match up to three bytes of a multi-byte character.
.sp
Example:
pcrecpp::RE_Options options;
options.set_utf8();
pcrecpp::RE re(utf8_pattern, options);
re.FullMatch(utf8_string);
.sp
Example: using the convenience function UTF8():
pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
re.FullMatch(utf8_string);
.sp
NOTE: The UTF8 flag is ignored if pcre was not configured with the
--enable-utf8 flag.
.
.
.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE"
.rs
.sp
PCRE defines some modifiers to change the behavior of the regular expression
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
pass such modifiers to a RE class. Currently, the following modifiers are
supported:
.sp
modifier description Perl corresponding
.sp
PCRE_CASELESS case insensitive match /i
PCRE_MULTILINE multiple lines match /m
PCRE_DOTALL dot matches newlines /s
PCRE_DOLLAR_ENDONLY $ matches only at end N/A
PCRE_EXTRA strict escape parsing N/A
PCRE_EXTENDED ignore whitespaces /x
PCRE_UTF8 handles UTF8 chars built-in
PCRE_UNGREEDY reverses * and *? N/A
PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*)
.sp
(*) Both Perl and PCRE allow non capturing parentheses by means of the
"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not
capture, while (ab|cd) does.
.P
For a full account on how each modifier works, please check the
PCRE API reference page.
.P
For each modifier, there are two member functions whose name is made
out of the modifier in lowercase, without the "PCRE_" prefix. For
instance, PCRE_CASELESS is handled by
.sp
bool caseless()
.sp
which returns true if the modifier is set, and
.sp
RE_Options & set_caseless(bool)
.sp
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
accessed through the \fBset_match_limit()\fR and \fBmatch_limit()\fR member
functions. Setting \fImatch_limit\fR to a non-zero value will limit the
execution of pcre to keep it from doing bad things like blowing the stack or
taking an eternity to return a result. A value of 5000 is good enough to stop
stack blowup in a 2MB thread stack. Setting \fImatch_limit\fR to zero disables
match limiting. Alternatively, you can call \fBmatch_limit_recursion()\fP
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
recurses. \fBmatch_limit()\fP limits the number of matches PCRE does;
\fBmatch_limit_recursion()\fP limits the depth of internal recursion, and
therefore the amount of stack that is used.
.P
Normally, to pass one or more modifiers to a RE class, you declare
a \fIRE_Options\fR object, set the appropriate options, and pass this
object to a RE constructor. Example:
.sp
RE_options opt;
opt.set_caseless(true);
if (RE("HELLO", opt).PartialMatch("hello world")) ...
.sp
RE_options has two constructors. The default constructor takes no arguments and
creates a set of flags that are off by default. The optional parameter
\fIoption_flags\fR is to facilitate transfer of legacy code from C programs.
This lets you do
.sp
RE(pattern,
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
.sp
However, new code is better off doing
.sp
RE(pattern,
RE_Options().set_caseless(true).set_multiline(true))
.PartialMatch(str);
.sp
If you are going to pass one of the most used modifiers, there are some
convenience functions that return a RE_Options class with the
appropriate modifier already set: \fBCASELESS()\fR, \fBUTF8()\fR,
\fBMULTILINE()\fR, \fBDOTALL\fR(), and \fBEXTENDED()\fR.
.P
If you need to set several options at once, and you don't want to go through
the pains of declaring a RE_Options object and setting several options, there
is a parallel method that give you such ability on the fly. You can concatenate
several \fBset_xxxxx()\fR member functions, since each of them returns a
reference to its class object. For example, to pass PCRE_CASELESS,
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
.sp
RE(" ^ xyz \e\es+ .* blah$",
RE_Options()
.set_caseless(true)
.set_extended(true)
.set_multiline(true)).PartialMatch(sometext);
.sp
.
.
.SH "SCANNING TEXT INCREMENTALLY"
.rs
.sp
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
.sp
Example: read lines of the form "var = value" from a string.
string contents = ...; // Fill string somehow
pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
string var;
int value;
pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en");
while (re.Consume(&input, &var, &value)) {
...;
}
.sp
Each successful call to "Consume" will set "var/value", and also
advance "input" so it points past the matched text.
.P
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
.sp
pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word)
.
.
.SH "PARSING HEX/OCTAL/C-RADIX NUMBERS"
.rs
.sp
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
.sp
Example:
int a, b, c, d;
pcrecpp::RE re("(.*) (.*) (.*) (.*)");
re.FullMatch("100 40 0100 0x40",
pcrecpp::Octal(&a), pcrecpp::Hex(&b),
pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
.sp
will leave 64 in a, b, c, and d.
.
.
.SH "REPLACING PARTS OF STRINGS"
.rs
.sp
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\e1 to \e9) can be
used to insert text matching corresponding parenthesized group
from the pattern. \e0 in "rewrite" refers to the entire matching
text. For example:
.sp
string s = "yabba dabba doo";
pcrecpp::RE("b+").Replace("d", &s);
.sp
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
.P
\fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
.sp
string s = "yabba dabba doo";
pcrecpp::RE("b+").GlobalReplace("d", &s);
.sp
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
.P
\fBExtract\fP is like \fBReplace\fP, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. Returns true iff a match
occurred and the extraction happened successfully; if no match occurs, the
string is left unaffected.
.
.
.SH AUTHOR
.rs
.sp
The C++ wrapper was contributed by Google Inc.
.br
Copyright (c) 2005 Google Inc.

376
libs/pcre/doc/pcregrep.1 Normal file
View File

@ -0,0 +1,376 @@
.TH PCREGREP 1
.SH NAME
pcregrep - a grep with Perl-compatible regular expressions.
.SH SYNOPSIS
.B pcregrep [options] [long options] [pattern] [path1 path2 ...]
.
.SH DESCRIPTION
.rs
.sp
\fBpcregrep\fP searches files for character patterns, in the same way as other
grep commands do, but it uses the PCRE regular expression library to support
patterns that are compatible with the regular expressions of Perl 5. See
.\" HREF
\fBpcrepattern\fP
.\"
for a full description of syntax and semantics of the regular expressions that
PCRE supports.
.P
Patterns, whether supplied on the command line or in a separate file, are given
without delimiters. For example:
.sp
pcregrep Thursday /etc/motd
.sp
If you attempt to use delimiters (for example, by surrounding a pattern with
slashes, as is common in Perl scripts), they are interpreted as part of the
pattern. Quotes can of course be used on the command line because they are
interpreted by the shell, and indeed they are required if a pattern contains
white space or shell metacharacters.
.P
The first argument that follows any option settings is treated as the single
pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present.
Conversely, when one or both of these options are used to specify patterns, all
arguments are treated as path names. At least one of \fB-e\fP, \fB-f\fP, or an
argument pattern must be provided.
.P
If no files are specified, \fBpcregrep\fP reads the standard input. The
standard input can also be referenced by a name consisting of a single hyphen.
For example:
.sp
pcregrep some-pattern /file1 - /file3
.sp
By default, each line that matches the pattern is copied to the standard
output, and if there is more than one file, the file name is output at the
start of each line. However, there are options that can change how
\fBpcregrep\fP behaves. In particular, the \fB-M\fP option makes it possible to
search for patterns that span line boundaries. What defines a line boundary is
controlled by the \fB-N\fP (\fB--newline\fP) option.
.P
Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
BUFSIZ is defined in \fB<stdio.h>\fP.
.P
If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set,
\fBpcregrep\fP uses the value to set a locale when calling the PCRE library.
The \fB--locale\fP option can be used to override this.
.
.SH OPTIONS
.rs
.TP 10
\fB--\fP
This terminate the list of options. It is useful if the next item on the
command line starts with a hyphen but is not an option. This allows for the
processing of patterns and filenames that start with hyphens.
.TP
\fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP
Output \fInumber\fP lines of context after each matching line. If filenames
and/or line numbers are being output, a hyphen separator is used instead of a
colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
guarantees to have up to 8K of following text available for context output.
.TP
\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
Output \fInumber\fP lines of context before each matching line. If filenames
and/or line numbers are being output, a hyphen separator is used instead of a
colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
guarantees to have up to 8K of preceding text available for context output.
.TP
\fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP
Output \fInumber\fP lines of context both before and after each matching line.
This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value.
.TP
\fB-c\fP, \fB--count\fP
Do not output individual lines; instead just output a count of the number of
lines that would otherwise have been output. If several files are given, a
count is output for each of them. In this mode, the \fB-A\fP, \fB-B\fP, and
\fB-C\fP options are ignored.
.TP
\fB--colour\fP, \fB--color\fP
If this option is given without any data, it is equivalent to "--colour=auto".
If data is required, it must be given in the same shell item, separated by an
equals sign.
.TP
\fB--colour=\fP\fIvalue\fP, \fB--color=\fP\fIvalue\fP
This option specifies under what circumstances the part of a line that matched
a pattern should be coloured in the output. The value may be "never" (the
default), "always", or "auto". In the latter case, colouring happens only if
the standard output is connected to a terminal. The colour can be specified by
setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
of this variable should be a string of two numbers, separated by a semicolon.
They are copied directly into the control string for setting colour on a
terminal, so it is your responsibility to ensure that they make sense. If
neither of the environment variables is set, the default is "1;31", which gives
red.
.TP
\fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP
If an input path is not a regular file or a directory, "action" specifies how
it is to be processed. Valid values are "read" (the default) or "skip"
(silently skip the path).
.TP
\fB-d\fP \fIaction\fP, \fB--directories=\fP\fIaction\fP
If an input path is a directory, "action" specifies how it is to be processed.
Valid values are "read" (the default), "recurse" (equivalent to the \fB-r\fP
option), or "skip" (silently skip the path). In the default case, directories
are read as if they were ordinary files. In some operating systems the effect
of reading a directory like this is an immediate end-of-file.
.TP
\fB-e\fP \fIpattern\fP, \fB--regex=\fP\fIpattern\fP,
\fB--regexp=\fP\fIpattern\fP Specify a pattern to be matched. This option can
be used multiple times in order to specify several patterns. It can also be
used as a way of specifying a single pattern that starts with a hyphen. When
\fB-e\fP is used, no argument pattern is taken from the command line; all
arguments are treated as file names. There is an overall maximum of 100
patterns. They are applied to each line in the order in which they are defined
until one matches (or fails to match if \fB-v\fP is used). If \fB-f\fP is used
with \fB-e\fP, the command line patterns are matched first, followed by the
patterns from the file, independent of the order in which these options are
specified. Note that multiple use of \fB-e\fP is not the same as a single
pattern with alternatives. For example, X|Y finds the first character in a line
that is X or Y, whereas if the two patterns are given separately,
\fBpcregrep\fP finds X if it is present, even if it follows Y in the line. It
finds Y only if there is no X in the line. This really matters only if you are
using \fB-o\fP to show the portion of the line that matched.
.TP
\fB--exclude\fP=\fIpattern\fP
When \fBpcregrep\fP is searching the files in a directory as a consequence of
the \fB-r\fP (recursive search) option, any files whose names match the pattern
are excluded. The pattern is a PCRE regular expression. If a file name matches
both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no short
form for this option.
.TP
\fB-F\fP, \fB--fixed-strings\fP
Interpret each pattern as a list of fixed strings, separated by newlines,
instead of as a regular expression. The \fB-w\fP (match as a word) and \fB-x\fP
(match whole line) options can be used with \fB-F\fP. They apply to each of the
fixed strings. A line is selected if any of the fixed strings are found in it
(subject to \fB-w\fP or \fB-x\fP, if present).
.TP
\fB-f\fP \fIfilename\fP, \fB--file=\fP\fIfilename\fP
Read a number of patterns from the file, one per line, and match them against
each line of input. A data line is output if any of the patterns match it. The
filename can be given as "-" to refer to the standard input. When \fB-f\fP is
used, patterns specified on the command line using \fB-e\fP may also be
present; they are tested before the file's patterns. However, no other pattern
is taken from the command line; all arguments are treated as file names. There
is an overall maximum of 100 patterns. Trailing white space is removed from
each line, and blank lines are ignored. An empty file contains no patterns and
therefore matches nothing.
.TP
\fB-H\fP, \fB--with-filename\fP
Force the inclusion of the filename at the start of output lines when searching
a single file. By default, the filename is not shown in this case. For matching
lines, the filename is followed by a colon and a space; for context lines, a
hyphen separator is used. If a line number is also being output, it follows the
file name without a space.
.TP
\fB-h\fP, \fB--no-filename\fP
Suppress the output filenames when searching multiple files. By default,
filenames are shown when multiple files are searched. For matching lines, the
filename is followed by a colon and a space; for context lines, a hyphen
separator is used. If a line number is also being output, it follows the file
name without a space.
.TP
\fB--help\fP
Output a brief help message and exit.
.TP
\fB-i\fP, \fB--ignore-case\fP
Ignore upper/lower case distinctions during comparisons.
.TP
\fB--include\fP=\fIpattern\fP
When \fBpcregrep\fP is searching the files in a directory as a consequence of
the \fB-r\fP (recursive search) option, only those files whose names match the
pattern are included. The pattern is a PCRE regular expression. If a file name
matches both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no
short form for this option.
.TP
\fB-L\fP, \fB--files-without-match\fP
Instead of outputting lines from the files, just output the names of the files
that do not contain any lines that would have been output. Each file name is
output once, on a separate line.
.TP
\fB-l\fP, \fB--files-with-matches\fP
Instead of outputting lines from the files, just output the names of the files
containing lines that would have been output. Each file name is output
once, on a separate line. Searching stops as soon as a matching line is found
in a file.
.TP
\fB--label\fP=\fIname\fP
This option supplies a name to be used for the standard input when file names
are being output. If not supplied, "(standard input)" is used. There is no
short form for this option.
.TP
\fB--locale\fP=\fIlocale-name\fP
This option specifies a locale to be used for pattern matching. It overrides
the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no
locale is specified, the PCRE library's default (usually the "C" locale) is
used. There is no short form for this option.
.TP
\fB-M\fP, \fB--multiline\fP
Allow patterns to match more than one line. When this option is given, patterns
may usefully contain literal newline characters and internal occurrences of ^
and $ characters. The output for any one match may consist of more than one
line. When this option is set, the PCRE library is called in "multiline" mode.
There is a limit to the number of lines that can be matched, imposed by the way
that \fBpcregrep\fP buffers the input file as it scans it. However,
\fBpcregrep\fP ensures that at least 8K characters or the rest of the document
(whichever is the shorter) are available for forward matching, and similarly
the previous 8K characters (or all the previous characters, if fewer than 8K)
are guaranteed to be available for lookbehind assertions.
.TP
\fB-N\fP \fInewline-type\fP, \fB--newline=\fP\fInewline-type\fP
The PCRE library supports three different character sequences for indicating
the ends of lines. They are the single-character sequences CR (carriage return)
and LF (linefeed), and the two-character sequence CR, LF. When the library is
built, a default line-ending sequence is specified. This is normally the
standard sequence for the operating system. Unless otherwise specified by this
option, \fBpcregrep\fP uses the default. The possible values for this option
are CR, LF, or CRLF. This makes it possible to use \fBpcregrep\fP on files that
have come from other environments without having to modify their line endings.
If the data that is being scanned does not agree with the convention set by
this option, \fBpcregrep\fP may behave in strange ways.
.TP
\fB-n\fP, \fB--line-number\fP
Precede each output line by its line number in the file, followed by a colon
and a space for matching lines or a hyphen and a space for context lines. If
the filename is also being output, it precedes the line number.
.TP
\fB-o\fP, \fB--only-matching\fP
Show only the part of the line that matched a pattern. In this mode, no
context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are
ignored.
.TP
\fB-q\fP, \fB--quiet\fP
Work quietly, that is, display nothing except error messages. The exit
status indicates whether or not any matches were found.
.TP
\fB-r\fP, \fB--recursive\fP
If any given path is a directory, recursively scan the files it contains,
taking note of any \fB--include\fP and \fB--exclude\fP settings. By default, a
directory is read as a normal file; in some operating systems this gives an
immediate end-of-file. This option is a shorthand for setting the \fB-d\fP
option to "recurse".
.TP
\fB-s\fP, \fB--no-messages\fP
Suppress error messages about non-existent or unreadable files. Such files are
quietly skipped. However, the return code is still 2, even if matches were
found in other files.
.TP
\fB-u\fP, \fB--utf-8\fP
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
with UTF-8 support. Both patterns and subject lines must be valid strings of
UTF-8 characters.
.TP
\fB-V\fP, \fB--version\fP
Write the version numbers of \fBpcregrep\fP and the PCRE library that is being
used to the standard error stream.
.TP
\fB-v\fP, \fB--invert-match\fP
Invert the sense of the match, so that lines which do \fInot\fP match any of
the patterns are the ones that are found.
.TP
\fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP
Force the patterns to match only whole words. This is equivalent to having \eb
at the start and end of the pattern.
.TP
\fB-x\fP, \fB--line-regex\fP, \fP--line-regexp\fP
Force the patterns to be anchored (each must start matching at the beginning of
a line) and in addition, require them to match entire lines. This is
equivalent to having ^ and $ characters at the start and end of each
alternative branch in every pattern.
.
.
.SH "ENVIRONMENT VARIABLES"
.rs
.sp
The environment variables \fBLC_ALL\fP and \fBLC_CTYPE\fP are examined, in that
order, for a locale. The first one that is set is used. This can be overridden
by the \fB--locale\fP option. If no locale is set, the PCRE library's default
(usually the "C" locale) is used.
.
.
.SH "NEWLINES"
.rs
.sp
The \fB-N\fP (\fB--newline\fP) option allows \fBpcregrep\fP to scan files with
different newline conventions from the default. However, the setting of this
option does not affect the way in which \fBpcregrep\fP writes information to
the standard error and output streams. It uses the string "\en" in C
\fBprintf()\fP calls to indicate newlines, relying on the C I/O library to
convert this to an appropriate sequence if the output is sent to a file.
.
.
.SH "OPTIONS COMPATIBILITY"
.rs
.sp
The majority of short and long forms of \fBpcregrep\fP's options are the same
as in the GNU \fBgrep\fP program. Any long option of the form
\fB--xxx-regexp\fP (GNU terminology) is also available as \fB--xxx-regex\fP
(PCRE terminology). However, the \fB--locale\fP, \fB-M\fP, \fB--multiline\fP,
\fB-u\fP, and \fB--utf-8\fP options are specific to \fBpcregrep\fP.
.
.
.SH "OPTIONS WITH DATA"
.rs
.sp
There are four different ways in which an option with data can be specified.
If a short form option is used, the data may follow immediately, or in the next
command line item. For example:
.sp
-f/some/file
-f /some/file
.sp
If a long form option is used, the data may appear in the same command line
item, separated by an equals character, or (with one exception) it may appear
in the next command line item. For example:
.sp
--file=/some/file
--file /some/file
.sp
Note, however, that if you want to supply a file name beginning with ~ as data
in a shell command, and have the shell expand ~ to a home directory, you must
separate the file name from the option, because the shell does not treat ~
specially unless it is at the start of an item.
.P
The exception to the above is the \fB--colour\fP (or \fB--color\fP) option,
for which the data is optional. If this option does have data, it must be given
in the first form, using an equals character. Otherwise it will be assumed that
it has no data.
.
.
.SH MATCHING ERRORS
.rs
.sp
It is possible to supply a regular expression that takes a very long time to
fail to match certain lines. Such patterns normally involve nested indefinite
repeats, for example: (a+)*\ed when matched against a line of a's with no final
digit. The PCRE matching function has a resource limit that causes it to abort
in these circumstances. If this happens, \fBpcregrep\fP outputs an error
message and the line that caused the problem to the standard error stream. If
there are more than 20 such errors, \fBpcregrep\fP gives up.
.
.
.SH DIAGNOSTICS
.rs
.sp
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors and non-existent or inacessible files (even if matches were
found in other files) or too many matching errors. Using the \fB-s\fP option to
suppress error messages about inaccessble files does not affect the return
code.
.
.
.SH AUTHOR
.rs
.sp
Philip Hazel
.br
University Computing Service
.br
Cambridge CB2 3QG, England.
.P
.in 0
Last updated: 06 June 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

399
libs/pcre/doc/pcregrep.txt Normal file
View File

@ -0,0 +1,399 @@
PCREGREP(1) PCREGREP(1)
NAME
pcregrep - a grep with Perl-compatible regular expressions.
SYNOPSIS
pcregrep [options] [long options] [pattern] [path1 path2 ...]
DESCRIPTION
pcregrep searches files for character patterns, in the same way as
other grep commands do, but it uses the PCRE regular expression library
to support patterns that are compatible with the regular expressions of
Perl 5. See pcrepattern for a full description of syntax and semantics
of the regular expressions that PCRE supports.
Patterns, whether supplied on the command line or in a separate file,
are given without delimiters. For example:
pcregrep Thursday /etc/motd
If you attempt to use delimiters (for example, by surrounding a pattern
with slashes, as is common in Perl scripts), they are interpreted as
part of the pattern. Quotes can of course be used on the command line
because they are interpreted by the shell, and indeed they are required
if a pattern contains white space or shell metacharacters.
The first argument that follows any option settings is treated as the
single pattern to be matched when neither -e nor -f is present. Con-
versely, when one or both of these options are used to specify pat-
terns, all arguments are treated as path names. At least one of -e, -f,
or an argument pattern must be provided.
If no files are specified, pcregrep reads the standard input. The stan-
dard input can also be referenced by a name consisting of a single
hyphen. For example:
pcregrep some-pattern /file1 - /file3
By default, each line that matches the pattern is copied to the stan-
dard output, and if there is more than one file, the file name is out-
put at the start of each line. However, there are options that can
change how pcregrep behaves. In particular, the -M option makes it pos-
sible to search for patterns that span line boundaries. What defines a
line boundary is controlled by the -N (--newline) option.
Patterns are limited to 8K or BUFSIZ characters, whichever is the
greater. BUFSIZ is defined in <stdio.h>.
If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
the value to set a locale when calling the PCRE library. The --locale
option can be used to override this.
OPTIONS
-- This terminate the list of options. It is useful if the next
item on the command line starts with a hyphen but is not an
option. This allows for the processing of patterns and file-
names that start with hyphens.
-A number, --after-context=number
Output number lines of context after each matching line. If
filenames and/or line numbers are being output, a hyphen sep-
arator is used instead of a colon for the context lines. A
line containing "--" is output between each group of lines,
unless they are in fact contiguous in the input file. The
value of number is expected to be relatively small. However,
pcregrep guarantees to have up to 8K of following text avail-
able for context output.
-B number, --before-context=number
Output number lines of context before each matching line. If
filenames and/or line numbers are being output, a hyphen sep-
arator is used instead of a colon for the context lines. A
line containing "--" is output between each group of lines,
unless they are in fact contiguous in the input file. The
value of number is expected to be relatively small. However,
pcregrep guarantees to have up to 8K of preceding text avail-
able for context output.
-C number, --context=number
Output number lines of context both before and after each
matching line. This is equivalent to setting both -A and -B
to the same value.
-c, --count
Do not output individual lines; instead just output a count
of the number of lines that would otherwise have been output.
If several files are given, a count is output for each of
them. In this mode, the -A, -B, and -C options are ignored.
--colour, --color
If this option is given without any data, it is equivalent to
"--colour=auto". If data is required, it must be given in
the same shell item, separated by an equals sign.
--colour=value, --color=value
This option specifies under what circumstances the part of a
line that matched a pattern should be coloured in the output.
The value may be "never" (the default), "always", or "auto".
In the latter case, colouring happens only if the standard
output is connected to a terminal. The colour can be speci-
fied by setting the environment variable PCREGREP_COLOUR or
PCREGREP_COLOR. The value of this variable should be a string
of two numbers, separated by a semicolon. They are copied
directly into the control string for setting colour on a ter-
minal, so it is your responsibility to ensure that they make
sense. If neither of the environment variables is set, the
default is "1;31", which gives red.
-D action, --devices=action
If an input path is not a regular file or a directory,
"action" specifies how it is to be processed. Valid values
are "read" (the default) or "skip" (silently skip the path).
-d action, --directories=action
If an input path is a directory, "action" specifies how it is
to be processed. Valid values are "read" (the default),
"recurse" (equivalent to the -r option), or "skip" (silently
skip the path). In the default case, directories are read as
if they were ordinary files. In some operating systems the
effect of reading a directory like this is an immediate end-
of-file.
-e pattern, --regex=pattern,
--regexp=pattern Specify a pattern to be matched. This option
can be used multiple times in order to specify several pat-
terns. It can also be used as a way of specifying a single
pattern that starts with a hyphen. When -e is used, no argu-
ment pattern is taken from the command line; all arguments
are treated as file names. There is an overall maximum of 100
patterns. They are applied to each line in the order in which
they are defined until one matches (or fails to match if -v
is used). If -f is used with -e, the command line patterns
are matched first, followed by the patterns from the file,
independent of the order in which these options are speci-
fied. Note that multiple use of -e is not the same as a sin-
gle pattern with alternatives. For example, X|Y finds the
first character in a line that is X or Y, whereas if the two
patterns are given separately, pcregrep finds X if it is
present, even if it follows Y in the line. It finds Y only if
there is no X in the line. This really matters only if you
are using -o to show the portion of the line that matched.
--exclude=pattern
When pcregrep is searching the files in a directory as a con-
sequence of the -r (recursive search) option, any files whose
names match the pattern are excluded. The pattern is a PCRE
regular expression. If a file name matches both --include and
--exclude, it is excluded. There is no short form for this
option.
-F, --fixed-strings
Interpret each pattern as a list of fixed strings, separated
by newlines, instead of as a regular expression. The -w
(match as a word) and -x (match whole line) options can be
used with -F. They apply to each of the fixed strings. A line
is selected if any of the fixed strings are found in it (sub-
ject to -w or -x, if present).
-f filename, --file=filename
Read a number of patterns from the file, one per line, and
match them against each line of input. A data line is output
if any of the patterns match it. The filename can be given as
"-" to refer to the standard input. When -f is used, patterns
specified on the command line using -e may also be present;
they are tested before the file's patterns. However, no other
pattern is taken from the command line; all arguments are
treated as file names. There is an overall maximum of 100
patterns. Trailing white space is removed from each line, and
blank lines are ignored. An empty file contains no patterns
and therefore matches nothing.
-H, --with-filename
Force the inclusion of the filename at the start of output
lines when searching a single file. By default, the filename
is not shown in this case. For matching lines, the filename
is followed by a colon and a space; for context lines, a
hyphen separator is used. If a line number is also being out-
put, it follows the file name without a space.
-h, --no-filename
Suppress the output filenames when searching multiple files.
By default, filenames are shown when multiple files are
searched. For matching lines, the filename is followed by a
colon and a space; for context lines, a hyphen separator is
used. If a line number is also being output, it follows the
file name without a space.
--help Output a brief help message and exit.
-i, --ignore-case
Ignore upper/lower case distinctions during comparisons.
--include=pattern
When pcregrep is searching the files in a directory as a con-
sequence of the -r (recursive search) option, only those
files whose names match the pattern are included. The pattern
is a PCRE regular expression. If a file name matches both
--include and --exclude, it is excluded. There is no short
form for this option.
-L, --files-without-match
Instead of outputting lines from the files, just output the
names of the files that do not contain any lines that would
have been output. Each file name is output once, on a sepa-
rate line.
-l, --files-with-matches
Instead of outputting lines from the files, just output the
names of the files containing lines that would have been out-
put. Each file name is output once, on a separate line.
Searching stops as soon as a matching line is found in a
file.
--label=name
This option supplies a name to be used for the standard input
when file names are being output. If not supplied, "(standard
input)" is used. There is no short form for this option.
--locale=locale-name
This option specifies a locale to be used for pattern match-
ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
ronment variables. If no locale is specified, the PCRE
library's default (usually the "C" locale) is used. There is
no short form for this option.
-M, --multiline
Allow patterns to match more than one line. When this option
is given, patterns may usefully contain literal newline char-
acters and internal occurrences of ^ and $ characters. The
output for any one match may consist of more than one line.
When this option is set, the PCRE library is called in "mul-
tiline" mode. There is a limit to the number of lines that
can be matched, imposed by the way that pcregrep buffers the
input file as it scans it. However, pcregrep ensures that at
least 8K characters or the rest of the document (whichever is
the shorter) are available for forward matching, and simi-
larly the previous 8K characters (or all the previous charac-
ters, if fewer than 8K) are guaranteed to be available for
lookbehind assertions.
-N newline-type, --newline=newline-type
The PCRE library supports three different character sequences
for indicating the ends of lines. They are the single-charac-
ter sequences CR (carriage return) and LF (linefeed), and the
two-character sequence CR, LF. When the library is built, a
default line-ending sequence is specified. This is normally
the standard sequence for the operating system. Unless other-
wise specified by this option, pcregrep uses the default. The
possible values for this option are CR, LF, or CRLF. This
makes it possible to use pcregrep on files that have come
from other environments without having to modify their line
endings. If the data that is being scanned does not agree
with the convention set by this option, pcregrep may behave
in strange ways.
-n, --line-number
Precede each output line by its line number in the file, fol-
lowed by a colon and a space for matching lines or a hyphen
and a space for context lines. If the filename is also being
output, it precedes the line number.
-o, --only-matching
Show only the part of the line that matched a pattern. In
this mode, no context is shown. That is, the -A, -B, and -C
options are ignored.
-q, --quiet
Work quietly, that is, display nothing except error messages.
The exit status indicates whether or not any matches were
found.
-r, --recursive
If any given path is a directory, recursively scan the files
it contains, taking note of any --include and --exclude set-
tings. By default, a directory is read as a normal file; in
some operating systems this gives an immediate end-of-file.
This option is a shorthand for setting the -d option to
"recurse".
-s, --no-messages
Suppress error messages about non-existent or unreadable
files. Such files are quietly skipped. However, the return
code is still 2, even if matches were found in other files.
-u, --utf-8
Operate in UTF-8 mode. This option is available only if PCRE
has been compiled with UTF-8 support. Both patterns and sub-
ject lines must be valid strings of UTF-8 characters.
-V, --version
Write the version numbers of pcregrep and the PCRE library
that is being used to the standard error stream.
-v, --invert-match
Invert the sense of the match, so that lines which do not
match any of the patterns are the ones that are found.
-w, --word-regex, --word-regexp
Force the patterns to match only whole words. This is equiva-
lent to having \b at the start and end of the pattern.
-x, --line-regex, --line-regexp
Force the patterns to be anchored (each must start matching
at the beginning of a line) and in addition, require them to
match entire lines. This is equivalent to having ^ and $
characters at the start and end of each alternative branch in
every pattern.
ENVIRONMENT VARIABLES
The environment variables LC_ALL and LC_CTYPE are examined, in that
order, for a locale. The first one that is set is used. This can be
overridden by the --locale option. If no locale is set, the PCRE
library's default (usually the "C" locale) is used.
NEWLINES
The -N (--newline) option allows pcregrep to scan files with different
newline conventions from the default. However, the setting of this
option does not affect the way in which pcregrep writes information to
the standard error and output streams. It uses the string "\n" in C
printf() calls to indicate newlines, relying on the C I/O library to
convert this to an appropriate sequence if the output is sent to a
file.
OPTIONS COMPATIBILITY
The majority of short and long forms of pcregrep's options are the same
as in the GNU grep program. Any long option of the form --xxx-regexp
(GNU terminology) is also available as --xxx-regex (PCRE terminology).
However, the --locale, -M, --multiline, -u, and --utf-8 options are
specific to pcregrep.
OPTIONS WITH DATA
There are four different ways in which an option with data can be spec-
ified. If a short form option is used, the data may follow immedi-
ately, or in the next command line item. For example:
-f/some/file
-f /some/file
If a long form option is used, the data may appear in the same command
line item, separated by an equals character, or (with one exception) it
may appear in the next command line item. For example:
--file=/some/file
--file /some/file
Note, however, that if you want to supply a file name beginning with ~
as data in a shell command, and have the shell expand ~ to a home
directory, you must separate the file name from the option, because the
shell does not treat ~ specially unless it is at the start of an item.
The exception to the above is the --colour (or --color) option, for
which the data is optional. If this option does have data, it must be
given in the first form, using an equals character. Otherwise it will
be assumed that it has no data.
MATCHING ERRORS
It is possible to supply a regular expression that takes a very long
time to fail to match certain lines. Such patterns normally involve
nested indefinite repeats, for example: (a+)*\d when matched against a
line of a's with no final digit. The PCRE matching function has a
resource limit that causes it to abort in these circumstances. If this
happens, pcregrep outputs an error message and the line that caused the
problem to the standard error stream. If there are more than 20 such
errors, pcregrep gives up.
DIAGNOSTICS
Exit status is 0 if any matches were found, 1 if no matches were found,
and 2 for syntax errors and non-existent or inacessible files (even if
matches were found in other files) or too many matching errors. Using
the -s option to suppress error messages about inaccessble files does
not affect the return code.
AUTHOR
Philip Hazel
University Computing Service
Cambridge CB2 3QG, England.
Last updated: 06 June 2006
Copyright (c) 1997-2006 University of Cambridge.

View File

@ -0,0 +1,157 @@
.TH PCREMATCHING 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE MATCHING ALGORITHMS"
.rs
.sp
This document describes the two different algorithms that are available in PCRE
for matching a compiled regular expression against a given subject string. The
"standard" algorithm is the one provided by the \fBpcre_exec()\fP function.
This works in the same was as Perl's matching function, and provides a
Perl-compatible matching operation.
.P
An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP function;
this operates in a different way, and is not Perl-compatible. It has advantages
and disadvantages compared with the standard algorithm, and these are described
below.
.P
When there is only one possible way in which a given subject string can match a
pattern, the two algorithms give the same answer. A difference arises, however,
when there are multiple possibilities. For example, if the pattern
.sp
^<.*>
.sp
is matched against the string
.sp
<something> <something else> <something further>
.sp
there are three possible answers. The standard algorithm finds only one of
them, whereas the DFA algorithm finds all three.
.
.SH "REGULAR EXPRESSIONS AS TREES"
.rs
.sp
The set of strings that are matched by a regular expression can be represented
as a tree structure. An unlimited repetition in the pattern makes the tree of
infinite size, but it is still a tree. Matching the pattern to a given subject
string (from a given starting point) can be thought of as a search of the tree.
There are two ways to search a tree: depth-first and breadth-first, and these
correspond to the two matching algorithms provided by PCRE.
.
.SH "THE STANDARD MATCHING ALGORITHM"
.rs
.sp
In the terminology of Jeffrey Friedl's book \fIMastering Regular
Expressions\fP, the standard algorithm is an "NFA algorithm". It conducts a
depth-first search of the pattern tree. That is, it proceeds along a single
path through the tree, checking that the subject matches what is required. When
there is a mismatch, the algorithm tries any alternatives at the current point,
and if they all fail, it backs up to the previous branch point in the tree, and
tries the next alternative branch at that level. This often involves backing up
(moving to the left) in the subject string as well. The order in which
repetition branches are tried is controlled by the greedy or ungreedy nature of
the quantifier.
.P
If a leaf node is reached, a matching string has been found, and at that point
the algorithm stops. Thus, if there is more than one possible match, this
algorithm returns the first one that it finds. Whether this is the shortest,
the longest, or some intermediate length depends on the way the greedy and
ungreedy repetition quantifiers are specified in the pattern.
.P
Because it ends up with a single path through the tree, it is relatively
straightforward for this algorithm to keep track of the substrings that are
matched by portions of the pattern in parentheses. This provides support for
capturing parentheses and back references.
.
.SH "THE DFA MATCHING ALGORITHM"
.rs
.sp
DFA stands for "deterministic finite automaton", but you do not need to
understand the origins of that name. This algorithm conducts a breadth-first
search of the tree. Starting from the first matching point in the subject, it
scans the subject string from left to right, once, character by character, and
as it does this, it remembers all the paths through the tree that represent
valid matches.
.P
The scan continues until either the end of the subject is reached, or there are
no more unterminated paths. At this point, terminated paths represent the
different matching possibilities (if there are none, the match has failed).
Thus, if there is more than one possible match, this algorithm finds all of
them, and in particular, it finds the longest. In PCRE, there is an option to
stop the algorithm after the first match (which is necessarily the shortest)
has been found.
.P
Note that all the matches that are found start at the same point in the
subject. If the pattern
.sp
cat(er(pillar)?)
.sp
is matched against the string "the caterpillar catchment", the result will be
the three strings "cat", "cater", and "caterpillar" that start at the fourth
character of the subject. The algorithm does not automatically move on to find
matches that start at later positions.
.P
There are a number of features of PCRE regular expressions that are not
supported by the DFA matching algorithm. They are as follows:
.P
1. Because the algorithm finds all possible matches, the greedy or ungreedy
nature of repetition quantifiers is not relevant. Greedy and ungreedy
quantifiers are treated in exactly the same way.
.P
2. When dealing with multiple paths through the tree simultaneously, it is not
straightforward to keep track of captured substrings for the different matching
possibilities, and PCRE's implementation of this algorithm does not attempt to
do this. This means that no captured substrings are available.
.P
3. Because no substrings are captured, back references within the pattern are
not supported, and cause errors if encountered.
.P
4. For the same reason, conditional expressions that use a backreference as the
condition are not supported.
.P
5. Callouts are supported, but the value of the \fIcapture_top\fP field is
always 1, and the value of the \fIcapture_last\fP field is always -1.
.P
6.
The \eC escape sequence, which (in the standard algorithm) matches a single
byte, even in UTF-8 mode, is not supported because the DFA algorithm moves
through the subject string one character at a time, for all active paths
through the tree.
.
.SH "ADVANTAGES OF THE DFA ALGORITHM"
.rs
.sp
Using the DFA matching algorithm provides the following advantages:
.P
1. All possible matches (at a single point in the subject) are automatically
found, and in particular, the longest match is found. To find more than one
match using the standard algorithm, you have to do kludgy things with
callouts.
.P
2. There is much better support for partial matching. The restrictions on the
content of the pattern that apply when using the standard algorithm for partial
matching do not apply to the DFA algorithm. For non-anchored patterns, the
starting position of a partial match is available.
.P
3. Because the DFA algorithm scans the subject string just once, and never
needs to backtrack, it is possible to pass very long subject strings to the
matching function in several pieces, checking for partial matching each time.
.
.SH "DISADVANTAGES OF THE DFA ALGORITHM"
.rs
.sp
The DFA algorithm suffers from a number of disadvantages:
.P
1. It is substantially slower than the standard algorithm. This is partly
because it has to search for all possible matches, but is also because it is
less susceptible to optimization.
.P
2. Capturing parentheses and back references are not supported.
.P
3. The "atomic group" feature of PCRE regular expressions is supported, but
does not provide the advantage that it does for the standard algorithm.
.P
.in 0
Last updated: 06 June 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

203
libs/pcre/doc/pcrepartial.3 Normal file
View File

@ -0,0 +1,203 @@
.TH PCREPARTIAL 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PARTIAL MATCHING IN PCRE"
.rs
.sp
In normal use of PCRE, if the subject string that is passed to
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP matches as far as it goes, but is
too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
are circumstances where it might be helpful to distinguish this case from other
cases in which there is no match.
.P
Consider, for example, an application where a human is required to type in data
for a field with specific formatting requirements. An example might be a date
in the form \fIddmmmyy\fP, defined by this pattern:
.sp
^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$
.sp
If the application sees the user's keystrokes one by one, and can check that
what has been typed so far is potentially valid, it is able to raise an error
as soon as a mistake is made, possibly beeping and not reflecting the
character that has been typed. This immediate feedback is likely to be a better
user interface than a check that is delayed until the entire string has been
entered.
.P
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
option, which can be set when calling \fBpcre_exec()\fP or
\fBpcre_dfa_exec()\fP. When this flag is set for \fBpcre_exec()\fP, the return
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
during the matching process the last part of the subject string matched part of
the pattern. Unfortunately, for non-anchored matching, it is not possible to
obtain the position of the start of the partial match. No captured data is set
when PCRE_ERROR_PARTIAL is returned.
.P
When PCRE_PARTIAL is set for \fBpcre_dfa_exec()\fP, the return code
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
subject is reached, there have been no complete matches, but there is still at
least one matching possibility. The portion of the string that provided the
partial match is set as the first matching string.
.P
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
last literal byte in a pattern, and abandons matching immediately if such a
byte is not present in the subject string. This optimization cannot be used
for a subject string that might match only partially.
.
.
.SH "RESTRICTED PATTERNS FOR PCRE_PARTIAL"
.rs
.sp
Because of the way certain internal optimizations are implemented in the
\fBpcre_exec()\fP function, the PCRE_PARTIAL option cannot be used with all
patterns. These restrictions do not apply when \fBpcre_dfa_exec()\fP is used.
For \fBpcre_exec()\fP, repeated single characters such as
.sp
a{2,4}
.sp
and repeated single metasequences such as
.sp
\ed+
.sp
are not permitted if the maximum number of occurrences is greater than one.
Optional items such as \ed? (where the maximum is one) are permitted.
Quantifiers with any values are permitted after parentheses, so the invalid
examples above can be coded thus:
.sp
(a){2,4}
(\ed)+
.sp
These constructions run more slowly, but for the kinds of application that are
envisaged for this facility, this is not felt to be a major restriction.
.P
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
\fBpcre_exec()\fP returns the error code PCRE_ERROR_BADPARTIAL (-13).
.
.
.SH "EXAMPLE OF PARTIAL MATCHING USING PCRETEST"
.rs
.sp
If the escape sequence \eP is present in a \fBpcretest\fP data line, the
PCRE_PARTIAL flag is used for the match. Here is a run of \fBpcretest\fP that
uses the date example quoted above:
.sp
re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
data> 25jun04\eP
0: 25jun04
1: jun
data> 25dec3\eP
Partial match
data> 3ju\eP
Partial match
data> 3juj\eP
No match
data> j\eP
No match
.sp
The first data string is matched completely, so \fBpcretest\fP shows the
matched substrings. The remaining four strings do not match the complete
pattern, but the first two are partial matches. The same test, using DFA
matching (by means of the \eD escape sequence), produces the following output:
.sp
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data> 25jun04\eP\eD
0: 25jun04
data> 23dec3\eP\eD
Partial match: 23dec3
data> 3ju\eP\eD
Partial match: 3ju
data> 3juj\eP\eD
No match
data> j\eP\eD
No match
.sp
Notice that in this case the portion of the string that was matched is made
available.
.
.
.SH "MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()"
.rs
.sp
When a partial match has been found using \fBpcre_dfa_exec()\fP, it is possible
to continue the match by providing additional subject data and calling
\fBpcre_dfa_exec()\fP again with the PCRE_DFA_RESTART option and the same
working space (where details of the previous partial match are stored). Here is
an example using \fBpcretest\fP, where the \eR escape sequence sets the
PCRE_DFA_RESTART option and the \eD escape sequence requests the use of
\fBpcre_dfa_exec()\fP:
.sp
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data> 23ja\eP\eD
Partial match: 23ja
data> n05\eR\eD
0: n05
.sp
The first call has "23ja" as the subject, and requests partial matching; the
second call has "n05" as the subject for the continued (restarted) match.
Notice that when the match is complete, only the last part is shown; PCRE does
not retain the previously partially-matched string. It is up to the calling
program to do that if it needs to.
.P
This facility can be used to pass very long subject strings to
\fBpcre_dfa_exec()\fP. However, some care is needed for certain types of
pattern.
.P
1. If the pattern contains tests for the beginning or end of a line, you need
to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
subject string for any call does not contain the beginning or end of a line.
.P
2. If the pattern contains backward assertions (including \eb or \eB), you need
to arrange for some overlap in the subject strings to allow for this. For
example, you could pass the subject in chunks that were 500 bytes long, but in
a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
bytes at the start of the buffer.
.P
3. Matching a subject string that is split into multiple segments does not
always produce exactly the same result as matching over one single long string.
The difference arises when there are multiple matching possibilities, because a
partial match result is given only when there are no completed matches in a
call to fBpcre_dfa_exec()\fP. This means that as soon as the shortest match has
been found, continuation to a new subject segment is no longer possible.
Consider this \fBpcretest\fP example:
.sp
re> /dog(sbody)?/
data> do\eP\eD
Partial match: do
data> gsb\eR\eP\eD
0: g
data> dogsbody\eD
0: dogsbody
1: dog
.sp
The pattern matches the words "dog" or "dogsbody". When the subject is
presented in several parts ("do" and "gsb" being the first two) the match stops
when "dog" has been found, and it is not possible to continue. On the other
hand, if "dogsbody" is presented as a single string, both matches are found.
.P
Because of this phenomenon, it does not usually make sense to end a pattern
that is going to be matched in this way with a variable repeat.
.P
4. Patterns that contain alternatives at the top level which do not all
start with the same pattern item may not work as expected. For example,
consider this pattern:
.sp
1234|3789
.sp
If the first part of the subject is "ABC123", a partial match of the first
alternative is found at offset 3. There is no partial match for the second
alternative, because such a match does not start at the same point in the
subject string. Attempting to continue with the string "789" does not yield a
match because only those alternatives that match at one point in the subject
are remembered. The problem arises because the start of the second alternative
matches within the first alternative. There is no problem with anchored
patterns or patterns such as:
.sp
1234|ABCD
.sp
where no string can be a partial match for both alternatives.
.
.
.P
.in 0
Last updated: 16 January 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

1645
libs/pcre/doc/pcrepattern.3 Normal file

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,76 @@
.TH PCREPERFORM 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE PERFORMANCE"
.rs
.sp
Certain items that may appear in regular expression patterns are more efficient
than others. It is more efficient to use a character class like [aeiou] than a
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
that provides the required behaviour is usually the most efficient. Jeffrey
Friedl's book contains a lot of useful general discussion about optimizing
regular expressions for efficient performance. This document contains a few
observations about PCRE.
.P
Using Unicode character properties (the \ep, \eP, and \eX escapes) is slow,
because PCRE has to scan a structure that contains data for over fifteen
thousand characters whenever it needs a character's property. If you can find
an alternative pattern that does not use character properties, it will probably
be faster.
.P
When a pattern begins with .* not in parentheses, or in parentheses that are
not the subject of a backreference, and the PCRE_DOTALL option is set, the
pattern is implicitly anchored by PCRE, since it can match only at the start of
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
optimization, because the . metacharacter does not then match a newline, and if
the subject string contains newlines, the pattern may match from the character
immediately following one of them instead of from the very start. For example,
the pattern
.sp
.*second
.sp
matches the subject "first\enand second" (where \en stands for a newline
character), with the match starting at the seventh character. In order to do
this, PCRE has to retry the match starting after every newline in the subject.
.P
If you are using such a pattern with subject strings that do not contain
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE
from having to scan along the subject looking for a newline to restart at.
.P
Beware of patterns that contain nested indefinite repeats. These can take a
long time to run when applied to a string that does not match. Consider the
pattern fragment
.sp
(a+)*
.sp
This can match "aaaa" in 33 different ways, and this number increases very
rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
times, and for each of those cases other than 0, the + repeats can match
different numbers of times.) When the remainder of the pattern is such that the
entire match is going to fail, PCRE has in principle to try every possible
variation, and this can take an extremely long time.
.P
An optimization catches some of the more simple cases such as
.sp
(a+)*b
.sp
where a literal character follows. Before embarking on the standard matching
procedure, PCRE checks that there is a "b" later in the subject string, and if
there is not, it fails the match immediately. However, when there is no
following literal this optimization cannot be used. You can see the difference
by comparing the behaviour of
.sp
(a+)*\ed
.sp
with the pattern above. The former gives a failure almost instantly when
applied to a whole line of "a" characters, whereas the latter takes an
appreciable time with strings longer than about 20 characters.
.P
In many cases, the solution to this kind of performance issue is to use an
atomic group or a possessive quantifier.
.P
.in 0
Last updated: 28 February 2005
.br
Copyright (c) 1997-2005 University of Cambridge.

226
libs/pcre/doc/pcreposix.3 Normal file
View File

@ -0,0 +1,226 @@
.TH PCREPOSIX 3
.SH NAME
PCRE - Perl-compatible regular expressions.
.SH "SYNOPSIS OF POSIX API"
.rs
.sp
.B #include <pcreposix.h>
.PP
.SM
.br
.B int regcomp(regex_t *\fIpreg\fP, const char *\fIpattern\fP,
.ti +5n
.B int \fIcflags\fP);
.PP
.br
.B int regexec(regex_t *\fIpreg\fP, const char *\fIstring\fP,
.ti +5n
.B size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);
.PP
.br
.B size_t regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP,
.ti +5n
.B char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);
.PP
.br
.B void regfree(regex_t *\fIpreg\fP);
.
.SH DESCRIPTION
.rs
.sp
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the
.\" HREF
\fBpcreapi\fP
.\"
documentation for a description of PCRE's native API, which contains much
additional functionality.
.P
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the \fBpcreposix.h\fP
header file, and on Unix systems the library itself is called
\fBpcreposix.a\fP, so can be accessed by adding \fB-lpcreposix\fP to the
command for linking an application that uses them. Because the POSIX functions
call the native ones, it is also necessary to add \fB-lpcre\fP.
.P
I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the option REG_EXTENDED is defined with the value
zero. This has no effect, but since programs that are written to the POSIX
interface often use it, this makes it easier to slot in PCRE as a replacement
library. Other POSIX options are not even defined.
.P
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates to the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
.P
The header for these functions is supplied as \fBpcreposix.h\fP to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as \fBregex.h\fP, which is the "correct" name. It provides two
structure types, \fIregex_t\fP for compiled internal forms, and
\fIregmatch_t\fP for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
.P
.SH "COMPILING A PATTERN"
.rs
.sp
The function \fBregcomp()\fP is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer
to a \fBregex_t\fP structure that is used as a base for storing information
about the compiled regular expression.
.P
The argument \fIcflags\fP is either zero, or contains one or more of the bits
defined by the following macros:
.sp
REG_DOTALL
.sp
The PCRE_DOTALL option is set when the regular expression is passed for
compilation to the native function. Note that REG_DOTALL is not part of the
POSIX standard.
.sp
REG_ICASE
.sp
The PCRE_CASELESS option is set when the regular expression is passed for
compilation to the native function.
.sp
REG_NEWLINE
.sp
The PCRE_MULTILINE option is set when the regular expression is passed for
compilation to the native function. Note that this does \fInot\fP mimic the
defined POSIX behaviour for REG_NEWLINE (see the following section).
.sp
REG_NOSUB
.sp
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
for compilation to the native function. In addition, when a pattern that is
compiled with this flag is passed to \fBregexec()\fP for matching, the
\fInmatch\fP and \fIpmatch\fP arguments are ignored, and no captured strings
are returned.
.sp
REG_UTF8
.sp
The PCRE_UTF8 option is set when the regular expression is passed for
compilation to the native function. This causes the pattern itself and all data
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
is not part of the POSIX standard.
.P
In the absence of these flags, no options are passed to the native function.
This means the the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
\fIsome\fP of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they aren't) or by a negative class such as [^a]
(they are).
.P
The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
\fIpreg\fP structure is filled in on success, and one member of the structure
is public: \fIre_nsub\fP contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
.
.
.SH "MATCHING NEWLINE CHARACTERS"
.rs
.sp
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
.sp
Default Change with
.sp
. matches newline no PCRE_DOTALL
newline matches [^a] yes not changeable
$ matches \en at end yes PCRE_DOLLARENDONLY
$ matches \en in middle no PCRE_MULTILINE
^ matches \en in middle no PCRE_MULTILINE
.sp
This is the equivalent table for POSIX:
.sp
Default Change with
.sp
. matches newline yes REG_NEWLINE
newline matches [^a] yes REG_NEWLINE
$ matches \en at end no REG_NEWLINE
$ matches \en in middle no REG_NEWLINE
^ matches \en in middle no REG_NEWLINE
.sp
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
.P
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
.
.
.SH "MATCHING A PATTERN"
.rs
.sp
The function \fBregexec()\fP is called to match a compiled pattern \fIpreg\fP
against a given \fIstring\fP, which is terminated by a zero byte, subject to
the options in \fIeflags\fP. These can be:
.sp
REG_NOTBOL
.sp
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
.sp
REG_NOTEOL
.sp
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
.P
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
\fBregexec()\fP are ignored.
.P
Otherwise,the portion of the string that was matched, and also any captured
substrings, are returned via the \fIpmatch\fP argument, which points to an
array of \fInmatch\fP structures of type \fIregmatch_t\fP, containing the
members \fIrm_so\fP and \fIrm_eo\fP. These contain the offset to the first
character of each substring and the offset to the first character after the end
of each substring, respectively. The 0th element of the vector relates to the
entire portion of \fIstring\fP that was matched; subsequent elements relate to
the capturing subpatterns of the regular expression. Unused entries in the
array have both structure members set to -1.
.P
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
.
.
.SH "ERROR MESSAGES"
.rs
.sp
The \fBregerror()\fP function maps a non-zero errorcode from either
\fBregcomp()\fP or \fBregexec()\fP to a printable message. If \fIpreg\fP is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in \fIerrbuf\fP. The length of the
message, including the zero, is limited to \fIerrbuf_size\fP. The yield of the
function is the size of buffer needed to hold the whole message.
.
.
.SH MEMORY USAGE
.rs
.sp
Compiling a regular expression causes memory to be allocated and associated
with the \fIpreg\fP structure. The function \fBregfree()\fP frees all such
memory, after which \fIpreg\fP may no longer be used as a compiled expression.
.
.
.SH AUTHOR
.rs
.sp
Philip Hazel
.br
University Computing Service,
.br
Cambridge CB2 3QG, England.
.P
.in 0
Last updated: 16 January 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

View File

@ -0,0 +1,131 @@
.TH PCREPRECOMPILE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "SAVING AND RE-USING PRECOMPILED PCRE PATTERNS"
.rs
.sp
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run.
If you are not using any private character tables (see the
.\" HREF
\fBpcre_maketables()\fP
.\"
documentation), this is relatively straightforward. If you are using private
tables, it is a little bit more complicated.
.P
If you save compiled patterns to a file, you can copy them to a different host
and run them there. This works even if the new host has the opposite endianness
to the one on which the patterns were compiled. There may be a small
performance penalty, but it should be insignificant.
.
.
.SH "SAVING A COMPILED PATTERN"
.rs
.sh
The value returned by \fBpcre_compile()\fP points to a single block of memory
that holds the compiled pattern and associated data. You can find the length of
this block in bytes by calling \fBpcre_fullinfo()\fP with an argument of
PCRE_INFO_SIZE. You can then save the data in any appropriate manner. Here is
sample code that compiles a pattern and writes it to a file. It assumes that
the variable \fIfd\fP refers to a file that is open for output:
.sp
int erroroffset, rc, size;
char *error;
pcre *re;
.sp
re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
if (re == NULL) { ... handle errors ... }
rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
if (rc < 0) { ... handle errors ... }
rc = fwrite(re, 1, size, fd);
if (rc != size) { ... handle errors ... }
.sp
In this example, the bytes that comprise the compiled pattern are copied
exactly. Note that this is binary data that may contain any of the 256 possible
byte values. On systems that make a distinction between binary and non-binary
data, be sure that the file is opened for binary output.
.P
If you want to write more than one pattern to a file, you will have to devise a
way of separating them. For binary data, preceding each pattern with its length
is probably the most straightforward approach. Another possibility is to write
out the data in hexadecimal instead of binary, one pattern to a line.
.P
Saving compiled patterns in a file is only one possible way of storing them for
later use. They could equally well be saved in a database, or in the memory of
some daemon process that passes them via sockets to the processes that want
them.
.P
If the pattern has been studied, it is also possible to save the study data in
a similar way to the compiled pattern itself. When studying generates
additional information, \fBpcre_study()\fP returns a pointer to a
\fBpcre_extra\fP data block. Its format is defined in the
.\" HTML <a href="pcreapi.html#extradata">
.\" </a>
section on matching a pattern
.\"
in the
.\" HREF
\fBpcreapi\fP
.\"
documentation. The \fIstudy_data\fP field points to the binary study data, and
this is what you must save (not the \fBpcre_extra\fP block itself). The length
of the study data can be obtained by calling \fBpcre_fullinfo()\fP with an
argument of PCRE_INFO_STUDYSIZE. Remember to check that \fBpcre_study()\fP did
return a non-NULL value before trying to save the study data.
.
.
.SH "RE-USING A PRECOMPILED PATTERN"
.rs
.sp
Re-using a precompiled pattern is straightforward. Having reloaded it into main
memory, you pass its pointer to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP in
the usual way. This should work even on another host, and even if that host has
the opposite endianness to the one where the pattern was compiled.
.P
However, if you passed a pointer to custom character tables when the pattern
was compiled (the \fItableptr\fP argument of \fBpcre_compile()\fP), you must
now pass a similar pointer to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP,
because the value saved with the compiled pattern will obviously be nonsense. A
field in a \fBpcre_extra()\fP block is used to pass this data, as described in
the
.\" HTML <a href="pcreapi.html#extradata">
.\" </a>
section on matching a pattern
.\"
in the
.\" HREF
\fBpcreapi\fP
.\"
documentation.
.P
If you did not provide custom character tables when the pattern was compiled,
the pointer in the compiled pattern is NULL, which causes \fBpcre_exec()\fP to
use PCRE's internal tables. Thus, you do not need to take any special action at
run time in this case.
.P
If you saved study data with the compiled pattern, you need to create your own
\fBpcre_extra\fP data block and set the \fIstudy_data\fP field to point to the
reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
\fIflags\fP field to indicate that study data is present. Then pass the
\fBpcre_extra\fP block to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP in the
usual way.
.
.
.SH "COMPATIBILITY WITH DIFFERENT PCRE RELEASES"
.rs
.sp
The layout of the control block that is at the start of the data that makes up
a compiled pattern was changed for release 5.0. If you have any saved patterns
that were compiled with previous releases (not a facility that was previously
advertised), you will have to recompile them for release 5.0. However, from now
on, it should be possible to make changes in a compatible manner.
.P
Notwithstanding the above, if you have any saved patterns in UTF-8 mode that
use \ep or \eP that were compiled with any release up to and including 6.4, you
will have to recompile them for release 6.5 and above.
.P
.in 0
Last updated: 01 February 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

View File

@ -0,0 +1,66 @@
.TH PCRESAMPLE 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE SAMPLE PROGRAM"
.rs
.sp
A simple, complete demonstration program, to get you started with using PCRE,
is supplied in the file \fIpcredemo.c\fP in the PCRE distribution.
.P
The program compiles the regular expression that is its first argument, and
matches it against the subject string in its second argument. No PCRE options
are set, and default character tables are used. If matching succeeds, the
program outputs the portion of the subject that matched, together with the
contents of any captured substrings.
.P
If the -g option is given on the command line, the program then goes on to
check for further matches of the same regular expression in the same subject
string. The logic is a little bit tricky because of the possibility of matching
an empty string. Comments in the code explain what is going on.
.P
If PCRE is installed in the standard include and library directories for your
system, you should be able to compile the demonstration program using this
command:
.sp
gcc -o pcredemo pcredemo.c -lpcre
.sp
If PCRE is installed elsewhere, you may need to add additional options to the
command line. For example, on a Unix-like system that has PCRE installed in
\fI/usr/local\fP, you can compile the demonstration program using a command
like this:
.sp
.\" JOINSH
gcc -o pcredemo -I/usr/local/include pcredemo.c \e
-L/usr/local/lib -lpcre
.sp
Once you have compiled the demonstration program, you can run simple tests like
this:
.sp
./pcredemo 'cat|dog' 'the cat sat on the mat'
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
.sp
Note that there is a much more comprehensive test program, called
.\" HREF
\fBpcretest\fP,
.\"
which supports many more facilities for testing regular expressions and the
PCRE library. The \fBpcredemo\fP program is provided as a simple coding
example.
.P
On some operating systems (e.g. Solaris), when PCRE is not installed in the
standard library directory, you may get an error like this when you try to run
\fBpcredemo\fP:
.sp
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
.sp
This is caused by the way shared library support works on those systems. You
need to add
.sp
-R/usr/local/lib
.sp
(for example) to the compile command to get round this problem.
.P
.in 0
Last updated: 09 September 2004
.br
Copyright (c) 1997-2004 University of Cambridge.

115
libs/pcre/doc/pcrestack.3 Normal file
View File

@ -0,0 +1,115 @@
.TH PCRESTACK 3
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE DISCUSSION OF STACK USAGE"
.rs
.sp
When you call \fBpcre_exec()\fP, it makes use of an internal function called
\fBmatch()\fP. This calls itself recursively at branch points in the pattern,
in order to remember the state of the match so that it can back up and try a
different alternative if the first one fails. As matching proceeds deeper and
deeper into the tree of possibilities, the recursion depth increases.
.P
Not all calls of \fBmatch()\fP increase the recursion depth; for an item such
as a* it may be called several times at the same level, after matching
different numbers of a's. Furthermore, in a number of cases where the result of
the recursive call would immediately be passed back as the result of the
current call (a "tail recursion"), the function is just restarted instead.
.P
The \fBpcre_dfa_exec()\fP function operates in an entirely different way, and
hardly uses recursion at all. The limit on its complexity is the amount of
workspace it is given. The comments that follow do NOT apply to
\fBpcre_dfa_exec()\fP; they are relevant only for \fBpcre_exec()\fP.
.P
You can set limits on the number of times that \fBmatch()\fP is called, both in
total and recursively. If the limit is exceeded, an error occurs. For details,
see the
.\" HTML <a href="pcreapi.html#extradata">
.\" </a>
section on extra data for \fBpcre_exec()\fP
.\"
in the
.\" HREF
\fBpcreapi\fP
.\"
documentation.
.P
Each time that \fBmatch()\fP is actually called recursively, it uses memory
from the process stack. For certain kinds of pattern and data, very large
amounts of stack may be needed, despite the recognition of "tail recursion".
You can often reduce the amount of recursion, and therefore the amount of stack
used, by modifying the pattern that is being matched. Consider, for example,
this pattern:
.sp
([^<]|<(?!inet))+
.sp
It matches from wherever it starts until it encounters "<inet" or the end of
the data, and is the kind of pattern that might be used when processing an XML
file. Each iteration of the outer parentheses matches either one character that
is not "<" or a "<" that is not followed by "inet". However, each time a
parenthesis is processed, a recursion occurs, so this formulation uses a stack
frame for each matched character. For a long string, a lot of stack is
required. Consider now this rewritten pattern, which matches exactly the same
strings:
.sp
([^<]++|<(?!inet))
.sp
This uses very much less stack, because runs of characters that do not contain
"<" are "swallowed" in one item inside the parentheses. Recursion happens only
when a "<" character that is not followed by "inet" is encountered (and we
assume this is relatively rare). A possessive quantifier is used to stop any
backtracking into the runs of non-"<" characters, but that is not related to
stack usage.
.P
In environments where stack memory is constrained, you might want to compile
PCRE to use heap memory instead of stack for remembering back-up points. This
makes it run a lot more slowly, however. Details of how to do this are given in
the
.\" HREF
\fBpcrebuild\fP
.\"
documentation.
.P
In Unix-like environments, there is not often a problem with the stack, though
the default limit on stack size varies from system to system. Values from 8Mb
to 64Mb are common. You can find your default limit by running the command:
.sp
ulimit -s
.sp
The effect of running out of stack is often SIGSEGV, though sometimes an error
message is given. You can normally increase the limit on stack size by code
such as this:
.sp
struct rlimit rlim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = 100*1024*1024;
setrlimit(RLIMIT_STACK, &rlim);
.sp
This reads the current limits (soft and hard) using \fBgetrlimit()\fP, then
attempts to increase the soft limit to 100Mb using \fBsetrlimit()\fP. You must
do this before calling \fBpcre_exec()\fP.
.P
PCRE has an internal counter that can be used to limit the depth of recursion,
and thus cause \fBpcre_exec()\fP to give an error code before it runs out of
stack. By default, the limit is very large, and unlikely ever to operate. It
can be changed when PCRE is built, and it can also be set when
\fBpcre_exec()\fP is called. For details of these interfaces, see the
.\" HREF
\fBpcrebuild\fP
.\"
and
.\" HREF
\fBpcreapi\fP
.\"
documentation.
.P
As a very rough rule of thumb, you should reckon on about 500 bytes per
recursion. Thus, if you want to limit your stack usage to 8Mb, you
should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can
support around 128000 recursions. The \fBpcretest\fP test program has a command
line option (\fB-S\fP) that can be used to increase its stack.
.P
.in 0
Last updated: 29 June 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

631
libs/pcre/doc/pcretest.1 Normal file
View File

@ -0,0 +1,631 @@
.TH PCRETEST 1
.SH NAME
pcretest - a program for testing Perl-compatible regular expressions.
.SH SYNOPSIS
.rs
.sp
.B pcretest "[options] [source] [destination]"
.sp
\fBpcretest\fP was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
expressions. This document describes the features of the test program; for
details of the regular expressions themselves, see the
.\" HREF
\fBpcrepattern\fP
.\"
documentation. For details of the PCRE library function calls and their
options, see the
.\" HREF
\fBpcreapi\fP
.\"
documentation.
.
.
.SH OPTIONS
.rs
.TP 10
\fB-C\fP
Output the version number of the PCRE library, and all available information
about the optional features that are included, and then exit.
.TP 10
\fB-d\fP
Behave as if each regex has the \fB/D\fP (debug) modifier; the internal
form is output after compilation.
.TP 10
\fB-dfa\fP
Behave as if each data line contains the \eD escape sequence; this causes the
alternative matching function, \fBpcre_dfa_exec()\fP, to be used instead of the
standard \fBpcre_exec()\fP function (more detail is given below).
.TP 10
\fB-i\fP
Behave as if each regex has the \fB/I\fP modifier; information about the
compiled pattern is given after compilation.
.TP 10
\fB-m\fP
Output the size of each compiled pattern after it has been compiled. This is
equivalent to adding \fB/M\fP to each regular expression. For compatibility
with earlier versions of pcretest, \fB-s\fP is a synonym for \fB-m\fP.
.TP 10
\fB-o\fP \fIosize\fP
Set the number of elements in the output vector that is used when calling
\fBpcre_exec()\fP to be \fIosize\fP. The default value is 45, which is enough
for 14 capturing subexpressions. The vector size can be changed for individual
matching calls by including \eO in the data line (see below).
.TP 10
\fB-p\fP
Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is
used to call PCRE. None of the other options has any effect when \fB-p\fP is
set.
.TP 10
\fB-q\fP
Do not output the version number of \fBpcretest\fP at the start of execution.
.TP 10
\fB-S\fP \fIsize\fP
On Unix-like systems, set the size of the runtime stack to \fIsize\fP
megabytes.
.TP 10
\fB-t\fP
Run each compile, study, and match many times with a timer, and output
resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with
\fB-t\fP, because you will then get the size output a zillion times, and the
timing will be distorted.
.
.
.SH DESCRIPTION
.rs
.sp
If \fBpcretest\fP is given two filename arguments, it reads from the first and
writes to the second. If it is given only one filename argument, it reads from
that file and writes to stdout. Otherwise, it reads from stdin and writes to
stdout, and prompts for each line of input, using "re>" to prompt for regular
expressions, and "data>" to prompt for data lines.
.P
The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern.
.P
Each data line is matched separately and independently. If you want to do
multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
depending on the newline setting) in a single line of input to encode the
newline characters. There is no limit on the length of data lines; the input
buffer is automatically extended if it is too small.
.P
An empty line signals the end of the data lines, at which point a new regular
expression is read. The regular expressions are given enclosed in any
non-alphanumeric delimiters other than backslash, for example:
.sp
/(a|bc)x+yz/
.sp
White space before the initial delimiter is ignored. A regular expression may
be continued over several input lines, in which case the newline characters are
included within it. It is possible to include the delimiter within the pattern
by escaping it, for example
.sp
/abc\e/def/
.sp
If you do so, the escape and the delimiter form part of the pattern, but since
delimiters are always non-alphanumeric, this does not affect its interpretation.
If the terminating delimiter is immediately followed by a backslash, for
example,
.sp
/abc/\e
.sp
then a backslash is added to the end of the pattern. This is done to provide a
way of testing the error condition that arises if a pattern finishes with a
backslash, because
.sp
/abc\e/
.sp
is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.
.
.
.SH "PATTERN MODIFIERS"
.rs
.sp
A pattern may be followed by any number of modifiers, which are mostly single
characters. Following Perl usage, these are referred to below as, for example,
"the \fB/i\fP modifier", even though the delimiter of the pattern need not
always be a slash, and no slash is used when writing modifiers. Whitespace may
appear between the final pattern delimiter and the first modifier, and between
the modifiers themselves.
.P
The \fB/i\fP, \fB/m\fP, \fB/s\fP, and \fB/x\fP modifiers set the PCRE_CASELESS,
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when
\fBpcre_compile()\fP is called. These four modifier letters have the same
effect as they do in Perl. For example:
.sp
/caseless/i
.sp
The following table shows additional modifiers for setting PCRE options that do
not correspond to anything in Perl:
.sp
\fB/A\fP PCRE_ANCHORED
\fB/C\fP PCRE_AUTO_CALLOUT
\fB/E\fP PCRE_DOLLAR_ENDONLY
\fB/f\fP PCRE_FIRSTLINE
\fB/J\fP PCRE_DUPNAMES
\fB/N\fP PCRE_NO_AUTO_CAPTURE
\fB/U\fP PCRE_UNGREEDY
\fB/X\fP PCRE_EXTRA
\fB/<cr>\fP PCRE_NEWLINE_CR
\fB/<lf>\fP PCRE_NEWLINE_LF
\fB/<crlf>\fP PCRE_NEWLINE_CRLF
.sp
Those specifying line endings are literal strings as shown. Details of the
meanings of these PCRE options are given in the
.\" HREF
\fBpcreapi\fP
.\"
documentation.
.
.
.SS "Finding all matches in a string"
.rs
.sp
Searching for all possible matches within each subject string can be requested
by the \fB/g\fP or \fB/G\fP modifier. After finding a match, PCRE is called
again to search the remainder of the subject string. The difference between
\fB/g\fP and \fB/G\fP is that the former uses the \fIstartoffset\fP argument to
\fBpcre_exec()\fP to start searching at a new point within the entire string
(which is in effect what Perl does), whereas the latter passes over a shortened
substring. This makes a difference to the matching process if the pattern
begins with a lookbehind assertion (including \eb or \eB).
.P
If any call to \fBpcre_exec()\fP in a \fB/g\fP or \fB/G\fP sequence matches an
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
flags set in order to search for another, non-empty, match at the same point.
If this second match fails, the start offset is advanced by one, and the normal
match is retried. This imitates the way Perl handles such cases when using the
\fB/g\fP modifier or the \fBsplit()\fP function.
.
.
.SS "Other modifiers"
.rs
.sp
There are yet more modifiers for controlling the way \fBpcretest\fP
operates.
.P
The \fB/+\fP modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the remainder of
the subject string. This is useful for tests where the subject contains
multiple copies of the same substring.
.P
The \fB/L\fP modifier must be followed directly by the name of a locale, for
example,
.sp
/pattern/Lfr_FR
.sp
For this reason, it must be the last modifier. The given locale is set,
\fBpcre_maketables()\fP is called to build a set of character tables for the
locale, and this is then passed to \fBpcre_compile()\fP when compiling the
regular expression. Without an \fB/L\fP modifier, NULL is passed as the tables
pointer; that is, \fB/L\fP applies only to the expression on which it appears.
.P
The \fB/I\fP modifier requests that \fBpcretest\fP output information about the
compiled pattern (whether it is anchored, has a fixed first character, and
so on). It does this by calling \fBpcre_fullinfo()\fP after compiling a
pattern. If the pattern is studied, the results of that are also output.
.P
The \fB/D\fP modifier is a PCRE debugging feature, which also assumes \fB/I\fP.
It causes the internal form of compiled regular expressions to be output after
compilation. If the pattern was studied, the information returned is also
output.
.P
The \fB/F\fP modifier causes \fBpcretest\fP to flip the byte order of the
fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
facility is for testing the feature in PCRE that allows it to execute patterns
that were compiled on a host with a different endianness. This feature is not
available when the POSIX interface to PCRE is being used, that is, when the
\fB/P\fP pattern modifier is specified. See also the section about saving and
reloading compiled patterns below.
.P
The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the
expression has been compiled, and the results used when the expression is
matched.
.P
The \fB/M\fP modifier causes the size of memory block used to hold the compiled
pattern to be output.
.P
The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper
API rather than its native API. When this is done, all other modifiers except
\fB/i\fP, \fB/m\fP, and \fB/+\fP are ignored. REG_ICASE is set if \fB/i\fP is
present, and REG_NEWLINE is set if \fB/m\fP is present. The wrapper functions
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
.P
The \fB/8\fP modifier causes \fBpcretest\fP to call PCRE with the PCRE_UTF8
option set. This turns on support for UTF-8 character handling in PCRE,
provided that it was compiled with this support enabled. This modifier also
causes any non-printing characters in output strings to be printed using the
\ex{hh...} notation if they are valid UTF-8 sequences.
.P
If the \fB/?\fP modifier is used with \fB/8\fP, it causes \fBpcretest\fP to
call \fBpcre_compile()\fP with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the string for UTF-8 validity.
.
.
.SH "DATA LINES"
.rs
.sp
Before each data line is passed to \fBpcre_exec()\fP, leading and trailing
whitespace is removed, and it is then scanned for \e escapes. Some of these are
pretty esoteric features, intended for checking out some of the more
complicated features of PCRE. If you are just testing "ordinary" regular
expressions, you probably don't need any of these. The following escapes are
recognized:
.sp
\ea alarm (= BEL)
\eb backspace
\ee escape
\ef formfeed
\en newline
.\" JOIN
\eqdd set the PCRE_MATCH_LIMIT limit to dd
(any number of digits)
\er carriage return
\et tab
\ev vertical tab
\ennn octal character (up to 3 octal digits)
\exhh hexadecimal character (up to 2 hex digits)
.\" JOIN
\ex{hh...} hexadecimal character, any number of digits
in UTF-8 mode
.\" JOIN
\eA pass the PCRE_ANCHORED option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\eB pass the PCRE_NOTBOL option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\eCdd call pcre_copy_substring() for substring dd
after a successful match (number less than 32)
.\" JOIN
\eCname call pcre_copy_named_substring() for substring
"name" after a successful match (name termin-
ated by next non alphanumeric character)
.\" JOIN
\eC+ show the current captured substrings at callout
time
\eC- do not supply a callout function
.\" JOIN
\eC!n return 1 instead of 0 when callout number n is
reached
.\" JOIN
\eC!n!m return 1 instead of 0 when callout number n is
reached for the nth time
.\" JOIN
\eC*n pass the number n (may be negative) as callout
data; this is used as the callout return value
\eD use the \fBpcre_dfa_exec()\fP match function
\eF only shortest match for \fBpcre_dfa_exec()\fP
.\" JOIN
\eGdd call pcre_get_substring() for substring dd
after a successful match (number less than 32)
.\" JOIN
\eGname call pcre_get_named_substring() for substring
"name" after a successful match (name termin-
ated by next non-alphanumeric character)
.\" JOIN
\eL call pcre_get_substringlist() after a
successful match
.\" JOIN
\eM discover the minimum MATCH_LIMIT and
MATCH_LIMIT_RECURSION settings
.\" JOIN
\eN pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\eOdd set the size of the output vector passed to
\fBpcre_exec()\fP to dd (any number of digits)
.\" JOIN
\eP pass the PCRE_PARTIAL option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\eQdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd
(any number of digits)
\eR pass the PCRE_DFA_RESTART option to \fBpcre_dfa_exec()\fP
\eS output details of memory get/free calls during matching
.\" JOIN
\eZ pass the PCRE_NOTEOL option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\e? pass the PCRE_NO_UTF8_CHECK option to
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP
\e>dd start the match at offset dd (any number of digits);
.\" JOIN
this sets the \fIstartoffset\fP argument for \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\e<cr> pass the PCRE_NEWLINE_CR option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\e<lf> pass the PCRE_NEWLINE_LF option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.\" JOIN
\e<crlf> pass the PCRE_NEWLINE_CRLF option to \fBpcre_exec()\fP
or \fBpcre_dfa_exec()\fP
.sp
The escapes that specify line endings are literal strings, exactly as shown.
A backslash followed by anything else just escapes the anything else. If the
very last character is a backslash, it is ignored. This gives a way of passing
an empty line as data, since a real empty line terminates the data input.
.P
If \eM is present, \fBpcretest\fP calls \fBpcre_exec()\fP several times, with
different values in the \fImatch_limit\fP and \fImatch_limit_recursion\fP
fields of the \fBpcre_extra\fP data structure, until it finds the minimum
numbers for each parameter that allow \fBpcre_exec()\fP to complete. The
\fImatch_limit\fP number is a measure of the amount of backtracking that takes
place, and checking it out can be instructive. For most simple matches, the
number is quite small, but for patterns with very large numbers of matching
possibilities, it can become large very quickly with increasing length of
subject string. The \fImatch_limit_recursion\fP number is a measure of how much
stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
to complete the match attempt.
.P
When \eO is used, the value specified may be higher or lower than the size set
by the \fB-O\fP command line option (or defaulted to 45); \eO applies only to
the call of \fBpcre_exec()\fP for the line in which it appears.
.P
If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper
API to be used, the only option-setting sequences that have any effect are \eB
and \eZ, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
\fBregexec()\fP.
.P
The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
of the \fB/8\fP modifier on the pattern. It is recognized always. There may be
any number of hexadecimal digits inside the braces. The result is from one to
six bytes, encoded according to the UTF-8 rules.
.
.
.SH "THE ALTERNATIVE MATCHING FUNCTION"
.rs
.sp
By default, \fBpcretest\fP uses the standard PCRE matching function,
\fBpcre_exec()\fP to match each data line. From release 6.0, PCRE supports an
alternative matching function, \fBpcre_dfa_test()\fP, which operates in a
different way, and has some restrictions. The differences between the two
functions are described in the
.\" HREF
\fBpcrematching\fP
.\"
documentation.
.P
If a data line contains the \eD escape sequence, or if the command line
contains the \fB-dfa\fP option, the alternative matching function is called.
This function finds all possible matches at a given point. If, however, the \eF
escape sequence is present in the data line, it stops after the first match is
found. This is always the shortest possible match.
.
.
.SH "DEFAULT OUTPUT FROM PCRETEST"
.rs
.sp
This section describes the output when the normal matching function,
\fBpcre_exec()\fP, is being used.
.P
When a match succeeds, pcretest outputs the list of captured substrings that
\fBpcre_exec()\fP returns, starting with number 0 for the string that matched
the whole pattern. Otherwise, it outputs "No match" or "Partial match"
when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
respectively, and otherwise the PCRE negative error number. Here is an example
of an interactive \fBpcretest\fP run.
.sp
$ pcretest
PCRE version 5.00 07-Sep-2004
.sp
re> /^abc(\ed+)/
data> abc123
0: abc123
1: 123
data> xyz
No match
.sp
If the strings contain any non-printing characters, they are output as \e0x
escapes, or as \ex{...} escapes if the \fB/8\fP modifier was present on the
pattern. If the pattern has the \fB/+\fP modifier, the output for substring 0
is followed by the the rest of the subject string, identified by "0+" like
this:
.sp
re> /cat/+
data> cataract
0: cat
0+ aract
.sp
If the pattern has the \fB/g\fP or \fB/G\fP modifier, the results of successive
matching attempts are output in sequence, like this:
.sp
re> /\eBi(\ew\ew)/g
data> Mississippi
0: iss
1: ss
0: iss
1: ss
0: ipp
1: pp
.sp
"No match" is output only if the first match attempt fails.
.P
If any of the sequences \fB\eC\fP, \fB\eG\fP, or \fB\eL\fP are present in a
data line that is successfully matched, the substrings extracted by the
convenience functions are output with C, G, or L after the string number
instead of a colon. This is in addition to the normal full list. The string
length (that is, the return from the extraction function) is given in
parentheses after each string for \fB\eC\fP and \fB\eG\fP.
.P
Note that while patterns can be continued over several lines (a plain ">"
prompt is used for continuations), data lines may not. However newlines can be
included in data by means of the \en escape (or \er or \er\en for those newline
settings).
.
.
.SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION"
.rs
.sp
When the alternative matching function, \fBpcre_dfa_exec()\fP, is used (by
means of the \eD escape sequence or the \fB-dfa\fP command line option), the
output consists of a list of all the matches that start at the first point in
the subject where there is at least one match. For example:
.sp
re> /(tang|tangerine|tan)/
data> yellow tangerine\eD
0: tangerine
1: tang
2: tan
.sp
(Using the normal matching function on this data finds only "tang".) The
longest matching string is always given first (and numbered zero).
.P
If \fB/g\P is present on the pattern, the search for further matches resumes
at the end of the longest match. For example:
.sp
re> /(tang|tangerine|tan)/g
data> yellow tangerine and tangy sultana\eD
0: tangerine
1: tang
2: tan
0: tang
1: tan
0: tan
.sp
Since the matching function does not support substring capture, the escape
sequences that are concerned with captured substrings are not relevant.
.
.
.SH "RESTARTING AFTER A PARTIAL MATCH"
.rs
.sp
When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
indicating that the subject partially matched the pattern, you can restart the
match with additional subject data by means of the \eR escape sequence. For
example:
.sp
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data> 23ja\eP\eD
Partial match: 23ja
data> n05\eR\eD
0: n05
.sp
For further information about partial matching, see the
.\" HREF
\fBpcrepartial\fP
.\"
documentation.
.
.
.SH CALLOUTS
.rs
.sp
If the pattern contains any callout requests, \fBpcretest\fP's callout function
is called during matching. This works with both matching functions. By default,
the called function displays the callout number, the start and current
positions in the text at the callout time, and the next pattern item to be
tested. For example, the output
.sp
--->pqrabcdef
0 ^ ^ \ed
.sp
indicates that callout number 0 occurred for a match attempt starting at the
fourth character of the subject string, when the pointer was at the seventh
character of the data, and when the next pattern item was \ed. Just one
circumflex is output if the start and current positions are the same.
.P
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
result of the \fB/C\fP pattern modifier. In this case, instead of showing the
callout number, the offset in the pattern, preceded by a plus, is output. For
example:
.sp
re> /\ed?[A-E]\e*/C
data> E*
--->E*
+0 ^ \ed?
+3 ^ [A-E]
+8 ^^ \e*
+10 ^ ^
0: E*
.sp
The callout function in \fBpcretest\fP returns zero (carry on matching) by
default, but you can use a \eC item in a data line (as described above) to
change this.
.P
Inserting callouts can be helpful when using \fBpcretest\fP to check
complicated regular expressions. For further information about callouts, see
the
.\" HREF
\fBpcrecallout\fP
.\"
documentation.
.
.
.SH "SAVING AND RELOADING COMPILED PATTERNS"
.rs
.sp
The facilities described in this section are not available when the POSIX
inteface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is
specified.
.P
When the POSIX interface is not in use, you can cause \fBpcretest\fP to write a
compiled pattern to a file, by following the modifiers with > and a file name.
For example:
.sp
/pattern/im >/some/file
.sp
See the
.\" HREF
\fBpcreprecompile\fP
.\"
documentation for a discussion about saving and re-using compiled patterns.
.P
The data that is written is binary. The first eight bytes are the length of the
compiled pattern data followed by the length of the optional study data, each
written as four bytes in big-endian order (most significant byte first). If
there is no study data (either the pattern was not studied, or studying did not
return any data), the second length is zero. The lengths are followed by an
exact copy of the compiled pattern. If there is additional study data, this
follows immediately after the compiled pattern. After writing the file,
\fBpcretest\fP expects to read a new pattern.
.P
A saved pattern can be reloaded into \fBpcretest\fP by specifing < and a file
name instead of a pattern. The name of the file must not contain a < character,
as otherwise \fBpcretest\fP will interpret the line as a pattern delimited by <
characters.
For example:
.sp
re> </some/file
Compiled regex loaded from /some/file
No study data
.sp
When the pattern has been loaded, \fBpcretest\fP proceeds to read data lines in
the usual way.
.P
You can copy a file written by \fBpcretest\fP to a different host and reload it
there, even if the new host has opposite endianness to the one on which the
pattern was compiled. For example, you can compile on an i86 machine and run on
a SPARC machine.
.P
File names for saving and reloading can be absolute or relative, but note that
the shell facility of expanding a file name that starts with a tilde (~) is not
available.
.P
The ability to save and reload files in \fBpcretest\fP is intended for testing
and experimentation. It is not intended for production use because only a
single pattern can be written to a file. Furthermore, there is no facility for
supplying custom character tables for use with a reloaded pattern. If the
original pattern was compiled with custom tables, an attempt to match a subject
string using a reloaded pattern is likely to cause \fBpcretest\fP to crash.
Finally, if you attempt to load a file that is not in the correct format, the
result is undefined.
.
.
.SH AUTHOR
.rs
.sp
Philip Hazel
.br
University Computing Service,
.br
Cambridge CB2 3QG, England.
.P
.in 0
Last updated: 29 June 2006
.br
Copyright (c) 1997-2006 University of Cambridge.

569
libs/pcre/doc/pcretest.txt Normal file
View File

@ -0,0 +1,569 @@
PCRETEST(1) PCRETEST(1)
NAME
pcretest - a program for testing Perl-compatible regular expressions.
SYNOPSIS
pcretest [options] [source] [destination]
pcretest was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
expressions. This document describes the features of the test program;
for details of the regular expressions themselves, see the pcrepattern
documentation. For details of the PCRE library function calls and their
options, see the pcreapi documentation.
OPTIONS
-C Output the version number of the PCRE library, and all avail-
able information about the optional features that are
included, and then exit.
-d Behave as if each regex has the /D (debug) modifier; the
internal form is output after compilation.
-dfa Behave as if each data line contains the \D escape sequence;
this causes the alternative matching function,
pcre_dfa_exec(), to be used instead of the standard
pcre_exec() function (more detail is given below).
-i Behave as if each regex has the /I modifier; information
about the compiled pattern is given after compilation.
-m Output the size of each compiled pattern after it has been
compiled. This is equivalent to adding /M to each regular
expression. For compatibility with earlier versions of
pcretest, -s is a synonym for -m.
-o osize Set the number of elements in the output vector that is used
when calling pcre_exec() to be osize. The default value is
45, which is enough for 14 capturing subexpressions. The vec-
tor size can be changed for individual matching calls by
including \O in the data line (see below).
-p Behave as if each regex has the /P modifier; the POSIX wrap-
per API is used to call PCRE. None of the other options has
any effect when -p is set.
-q Do not output the version number of pcretest at the start of
execution.
-S size On Unix-like systems, set the size of the runtime stack to
size megabytes.
-t Run each compile, study, and match many times with a timer,
and output resulting time per compile or match (in millisec-
onds). Do not set -m with -t, because you will then get the
size output a zillion times, and the timing will be dis-
torted.
DESCRIPTION
If pcretest is given two filename arguments, it reads from the first
and writes to the second. If it is given only one filename argument, it
reads from that file and writes to stdout. Otherwise, it reads from
stdin and writes to stdout, and prompts for each line of input, using
"re>" to prompt for regular expressions, and "data>" to prompt for data
lines.
The program handles any number of sets of input on a single input file.
Each set starts with a regular expression, and continues with any num-
ber of data lines to be matched against the pattern.
Each data line is matched separately and independently. If you want to
do multi-line matches, you have to use the \n escape sequence (or \r or
\r\n, depending on the newline setting) in a single line of input to
encode the newline characters. There is no limit on the length of data
lines; the input buffer is automatically extended if it is too small.
An empty line signals the end of the data lines, at which point a new
regular expression is read. The regular expressions are given enclosed
in any non-alphanumeric delimiters other than backslash, for example:
/(a|bc)x+yz/
White space before the initial delimiter is ignored. A regular expres-
sion may be continued over several input lines, in which case the new-
line characters are included within it. It is possible to include the
delimiter within the pattern by escaping it, for example
/abc\/def/
If you do so, the escape and the delimiter form part of the pattern,
but since delimiters are always non-alphanumeric, this does not affect
its interpretation. If the terminating delimiter is immediately fol-
lowed by a backslash, for example,
/abc/\
then a backslash is added to the end of the pattern. This is done to
provide a way of testing the error condition that arises if a pattern
finishes with a backslash, because
/abc\/
is interpreted as the first line of a pattern that starts with "abc/",
causing pcretest to read the next line as a continuation of the regular
expression.
PATTERN MODIFIERS
A pattern may be followed by any number of modifiers, which are mostly
single characters. Following Perl usage, these are referred to below
as, for example, "the /i modifier", even though the delimiter of the
pattern need not always be a slash, and no slash is used when writing
modifiers. Whitespace may appear between the final pattern delimiter
and the first modifier, and between the modifiers themselves.
The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE,
PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when pcre_com-
pile() is called. These four modifier letters have the same effect as
they do in Perl. For example:
/caseless/i
The following table shows additional modifiers for setting PCRE options
that do not correspond to anything in Perl:
/A PCRE_ANCHORED
/C PCRE_AUTO_CALLOUT
/E PCRE_DOLLAR_ENDONLY
/f PCRE_FIRSTLINE
/J PCRE_DUPNAMES
/N PCRE_NO_AUTO_CAPTURE
/U PCRE_UNGREEDY
/X PCRE_EXTRA
/<cr> PCRE_NEWLINE_CR
/<lf> PCRE_NEWLINE_LF
/<crlf> PCRE_NEWLINE_CRLF
Those specifying line endings are literal strings as shown. Details of
the meanings of these PCRE options are given in the pcreapi documenta-
tion.
Finding all matches in a string
Searching for all possible matches within each subject string can be
requested by the /g or /G modifier. After finding a match, PCRE is
called again to search the remainder of the subject string. The differ-
ence between /g and /G is that the former uses the startoffset argument
to pcre_exec() to start searching at a new point within the entire
string (which is in effect what Perl does), whereas the latter passes
over a shortened substring. This makes a difference to the matching
process if the pattern begins with a lookbehind assertion (including \b
or \B).
If any call to pcre_exec() in a /g or /G sequence matches an empty
string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
flags set in order to search for another, non-empty, match at the same
point. If this second match fails, the start offset is advanced by
one, and the normal match is retried. This imitates the way Perl han-
dles such cases when using the /g modifier or the split() function.
Other modifiers
There are yet more modifiers for controlling the way pcretest operates.
The /+ modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the
remainder of the subject string. This is useful for tests where the
subject contains multiple copies of the same substring.
The /L modifier must be followed directly by the name of a locale, for
example,
/pattern/Lfr_FR
For this reason, it must be the last modifier. The given locale is set,
pcre_maketables() is called to build a set of character tables for the
locale, and this is then passed to pcre_compile() when compiling the
regular expression. Without an /L modifier, NULL is passed as the
tables pointer; that is, /L applies only to the expression on which it
appears.
The /I modifier requests that pcretest output information about the
compiled pattern (whether it is anchored, has a fixed first character,
and so on). It does this by calling pcre_fullinfo() after compiling a
pattern. If the pattern is studied, the results of that are also out-
put.
The /D modifier is a PCRE debugging feature, which also assumes /I. It
causes the internal form of compiled regular expressions to be output
after compilation. If the pattern was studied, the information returned
is also output.
The /F modifier causes pcretest to flip the byte order of the fields in
the compiled pattern that contain 2-byte and 4-byte numbers. This
facility is for testing the feature in PCRE that allows it to execute
patterns that were compiled on a host with a different endianness. This
feature is not available when the POSIX interface to PCRE is being
used, that is, when the /P pattern modifier is specified. See also the
section about saving and reloading compiled patterns below.
The /S modifier causes pcre_study() to be called after the expression
has been compiled, and the results used when the expression is matched.
The /M modifier causes the size of memory block used to hold the com-
piled pattern to be output.
The /P modifier causes pcretest to call PCRE via the POSIX wrapper API
rather than its native API. When this is done, all other modifiers
except /i, /m, and /+ are ignored. REG_ICASE is set if /i is present,
and REG_NEWLINE is set if /m is present. The wrapper functions force
PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option
set. This turns on support for UTF-8 character handling in PCRE, pro-
vided that it was compiled with this support enabled. This modifier
also causes any non-printing characters in output strings to be printed
using the \x{hh...} notation if they are valid UTF-8 sequences.
If the /? modifier is used with /8, it causes pcretest to call
pcre_compile() with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the string for UTF-8 validity.
DATA LINES
Before each data line is passed to pcre_exec(), leading and trailing
whitespace is removed, and it is then scanned for \ escapes. Some of
these are pretty esoteric features, intended for checking out some of
the more complicated features of PCRE. If you are just testing "ordi-
nary" regular expressions, you probably don't need any of these. The
following escapes are recognized:
\a alarm (= BEL)
\b backspace
\e escape
\f formfeed
\n newline
\qdd set the PCRE_MATCH_LIMIT limit to dd
(any number of digits)
\r carriage return
\t tab
\v vertical tab
\nnn octal character (up to 3 octal digits)
\xhh hexadecimal character (up to 2 hex digits)
\x{hh...} hexadecimal character, any number of digits
in UTF-8 mode
\A pass the PCRE_ANCHORED option to pcre_exec()
or pcre_dfa_exec()
\B pass the PCRE_NOTBOL option to pcre_exec()
or pcre_dfa_exec()
\Cdd call pcre_copy_substring() for substring dd
after a successful match (number less than 32)
\Cname call pcre_copy_named_substring() for substring
"name" after a successful match (name termin-
ated by next non alphanumeric character)
\C+ show the current captured substrings at callout
time
\C- do not supply a callout function
\C!n return 1 instead of 0 when callout number n is
reached
\C!n!m return 1 instead of 0 when callout number n is
reached for the nth time
\C*n pass the number n (may be negative) as callout
data; this is used as the callout return value
\D use the pcre_dfa_exec() match function
\F only shortest match for pcre_dfa_exec()
\Gdd call pcre_get_substring() for substring dd
after a successful match (number less than 32)
\Gname call pcre_get_named_substring() for substring
"name" after a successful match (name termin-
ated by next non-alphanumeric character)
\L call pcre_get_substringlist() after a
successful match
\M discover the minimum MATCH_LIMIT and
MATCH_LIMIT_RECURSION settings
\N pass the PCRE_NOTEMPTY option to pcre_exec()
or pcre_dfa_exec()
\Odd set the size of the output vector passed to
pcre_exec() to dd (any number of digits)
\P pass the PCRE_PARTIAL option to pcre_exec()
or pcre_dfa_exec()
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd
(any number of digits)
\R pass the PCRE_DFA_RESTART option to pcre_dfa_exec()
\S output details of memory get/free calls during matching
\Z pass the PCRE_NOTEOL option to pcre_exec()
or pcre_dfa_exec()
\? pass the PCRE_NO_UTF8_CHECK option to
pcre_exec() or pcre_dfa_exec()
\>dd start the match at offset dd (any number of digits);
this sets the startoffset argument for pcre_exec()
or pcre_dfa_exec()
\<cr> pass the PCRE_NEWLINE_CR option to pcre_exec()
or pcre_dfa_exec()
\<lf> pass the PCRE_NEWLINE_LF option to pcre_exec()
or pcre_dfa_exec()
\<crlf> pass the PCRE_NEWLINE_CRLF option to pcre_exec()
or pcre_dfa_exec()
The escapes that specify line endings are literal strings, exactly as
shown. A backslash followed by anything else just escapes the anything
else. If the very last character is a backslash, it is ignored. This
gives a way of passing an empty line as data, since a real empty line
terminates the data input.
If \M is present, pcretest calls pcre_exec() several times, with dif-
ferent values in the match_limit and match_limit_recursion fields of
the pcre_extra data structure, until it finds the minimum numbers for
each parameter that allow pcre_exec() to complete. The match_limit num-
ber is a measure of the amount of backtracking that takes place, and
checking it out can be instructive. For most simple matches, the number
is quite small, but for patterns with very large numbers of matching
possibilities, it can become large very quickly with increasing length
of subject string. The match_limit_recursion number is a measure of how
much stack (or, if PCRE is compiled with NO_RECURSE, how much heap)
memory is needed to complete the match attempt.
When \O is used, the value specified may be higher or lower than the
size set by the -O command line option (or defaulted to 45); \O applies
only to the call of pcre_exec() for the line in which it appears.
If the /P modifier was present on the pattern, causing the POSIX wrap-
per API to be used, the only option-setting sequences that have any
effect are \B and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively,
to be passed to regexec().
The use of \x{hh...} to represent UTF-8 characters is not dependent on
the use of the /8 modifier on the pattern. It is recognized always.
There may be any number of hexadecimal digits inside the braces. The
result is from one to six bytes, encoded according to the UTF-8 rules.
THE ALTERNATIVE MATCHING FUNCTION
By default, pcretest uses the standard PCRE matching function,
pcre_exec() to match each data line. From release 6.0, PCRE supports an
alternative matching function, pcre_dfa_test(), which operates in a
different way, and has some restrictions. The differences between the
two functions are described in the pcrematching documentation.
If a data line contains the \D escape sequence, or if the command line
contains the -dfa option, the alternative matching function is called.
This function finds all possible matches at a given point. If, however,
the \F escape sequence is present in the data line, it stops after the
first match is found. This is always the shortest possible match.
DEFAULT OUTPUT FROM PCRETEST
This section describes the output when the normal matching function,
pcre_exec(), is being used.
When a match succeeds, pcretest outputs the list of captured substrings
that pcre_exec() returns, starting with number 0 for the string that
matched the whole pattern. Otherwise, it outputs "No match" or "Partial
match" when pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PAR-
TIAL, respectively, and otherwise the PCRE negative error number. Here
is an example of an interactive pcretest run.
$ pcretest
PCRE version 5.00 07-Sep-2004
re> /^abc(\d+)/
data> abc123
0: abc123
1: 123
data> xyz
No match
If the strings contain any non-printing characters, they are output as
\0x escapes, or as \x{...} escapes if the /8 modifier was present on
the pattern. If the pattern has the /+ modifier, the output for sub-
string 0 is followed by the the rest of the subject string, identified
by "0+" like this:
re> /cat/+
data> cataract
0: cat
0+ aract
If the pattern has the /g or /G modifier, the results of successive
matching attempts are output in sequence, like this:
re> /\Bi(\w\w)/g
data> Mississippi
0: iss
1: ss
0: iss
1: ss
0: ipp
1: pp
"No match" is output only if the first match attempt fails.
If any of the sequences \C, \G, or \L are present in a data line that
is successfully matched, the substrings extracted by the convenience
functions are output with C, G, or L after the string number instead of
a colon. This is in addition to the normal full list. The string length
(that is, the return from the extraction function) is given in paren-
theses after each string for \C and \G.
Note that while patterns can be continued over several lines (a plain
">" prompt is used for continuations), data lines may not. However new-
lines can be included in data by means of the \n escape (or \r or \r\n
for those newline settings).
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
When the alternative matching function, pcre_dfa_exec(), is used (by
means of the \D escape sequence or the -dfa command line option), the
output consists of a list of all the matches that start at the first
point in the subject where there is at least one match. For example:
re> /(tang|tangerine|tan)/
data> yellow tangerine\D
0: tangerine
1: tang
2: tan
(Using the normal matching function on this data finds only "tang".)
The longest matching string is always given first (and numbered zero).
If /gP is present on the pattern, the search for further matches
resumes at the end of the longest match. For example:
re> /(tang|tangerine|tan)/g
data> yellow tangerine and tangy sultana\D
0: tangerine
1: tang
2: tan
0: tang
1: tan
0: tan
Since the matching function does not support substring capture, the
escape sequences that are concerned with captured substrings are not
relevant.
RESTARTING AFTER A PARTIAL MATCH
When the alternative matching function has given the PCRE_ERROR_PARTIAL
return, indicating that the subject partially matched the pattern, you
can restart the match with additional subject data by means of the \R
escape sequence. For example:
re> /^?(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$/
data> 23ja\P\D
Partial match: 23ja
data> n05\R\D
0: n05
For further information about partial matching, see the pcrepartial
documentation.
CALLOUTS
If the pattern contains any callout requests, pcretest's callout func-
tion is called during matching. This works with both matching func-
tions. By default, the called function displays the callout number, the
start and current positions in the text at the callout time, and the
next pattern item to be tested. For example, the output
--->pqrabcdef
0 ^ ^ \d
indicates that callout number 0 occurred for a match attempt starting
at the fourth character of the subject string, when the pointer was at
the seventh character of the data, and when the next pattern item was
\d. Just one circumflex is output if the start and current positions
are the same.
Callouts numbered 255 are assumed to be automatic callouts, inserted as
a result of the /C pattern modifier. In this case, instead of showing
the callout number, the offset in the pattern, preceded by a plus, is
output. For example:
re> /\d?[A-E]\*/C
data> E*
--->E*
+0 ^ \d?
+3 ^ [A-E]
+8 ^^ \*
+10 ^ ^
0: E*
The callout function in pcretest returns zero (carry on matching) by
default, but you can use a \C item in a data line (as described above)
to change this.
Inserting callouts can be helpful when using pcretest to check compli-
cated regular expressions. For further information about callouts, see
the pcrecallout documentation.
SAVING AND RELOADING COMPILED PATTERNS
The facilities described in this section are not available when the
POSIX inteface to PCRE is being used, that is, when the /P pattern mod-
ifier is specified.
When the POSIX interface is not in use, you can cause pcretest to write
a compiled pattern to a file, by following the modifiers with > and a
file name. For example:
/pattern/im >/some/file
See the pcreprecompile documentation for a discussion about saving and
re-using compiled patterns.
The data that is written is binary. The first eight bytes are the
length of the compiled pattern data followed by the length of the
optional study data, each written as four bytes in big-endian order
(most significant byte first). If there is no study data (either the
pattern was not studied, or studying did not return any data), the sec-
ond length is zero. The lengths are followed by an exact copy of the
compiled pattern. If there is additional study data, this follows imme-
diately after the compiled pattern. After writing the file, pcretest
expects to read a new pattern.
A saved pattern can be reloaded into pcretest by specifing < and a file
name instead of a pattern. The name of the file must not contain a <
character, as otherwise pcretest will interpret the line as a pattern
delimited by < characters. For example:
re> </some/file
Compiled regex loaded from /some/file
No study data
When the pattern has been loaded, pcretest proceeds to read data lines
in the usual way.
You can copy a file written by pcretest to a different host and reload
it there, even if the new host has opposite endianness to the one on
which the pattern was compiled. For example, you can compile on an i86
machine and run on a SPARC machine.
File names for saving and reloading can be absolute or relative, but
note that the shell facility of expanding a file name that starts with
a tilde (~) is not available.
The ability to save and reload files in pcretest is intended for test-
ing and experimentation. It is not intended for production use because
only a single pattern can be written to a file. Furthermore, there is
no facility for supplying custom character tables for use with a
reloaded pattern. If the original pattern was compiled with custom
tables, an attempt to match a subject string using a reloaded pattern
is likely to cause pcretest to crash. Finally, if you attempt to load
a file that is not in the correct format, the result is undefined.
AUTHOR
Philip Hazel
University Computing Service,
Cambridge CB2 3QG, England.
Last updated: 29 June 2006
Copyright (c) 1997-2006 University of Cambridge.

View File

@ -0,0 +1,33 @@
The perltest program
--------------------
The perltest program tests Perl's regular expressions; it has the same
specification as pcretest, and so can be given identical input, except that
input patterns can be followed only by Perl's lower case modifiers and /+ (as
used by pcretest), which is recognized and handled by the program.
The data lines are processed as Perl double-quoted strings, so if they contain
" $ or @ characters, these have to be escaped. For this reason, all such
characters in testinput1 and testinput4 are escaped so that they can be used
for perltest as well as for pcretest. The special upper case pattern
modifiers such as /A that pcretest recognizes, and its special data line
escapes, are not used in these files. The output should be identical, apart
from the initial identifying banner.
The perltest script can also test UTF-8 features. It works as is for Perl 5.8
or higher. It recognizes the special modifier /8 that pcretest uses to invoke
UTF-8 functionality. The testinput4 file can be fed to perltest to run
compatible UTF-8 tests.
For Perl 5.6, perltest won't work unmodified for the UTF-8 tests. You need to
uncomment the "use utf8" lines that it contains. It is best to do this on a
copy of the script, because for non-UTF-8 tests, these lines should remain
commented out.
The other testinput files are not suitable for feeding to perltest, since they
make use of the special upper case modifiers and escapes that pcretest uses to
test some features of PCRE. Some of these files also contains malformed regular
expressions, in order to check that PCRE diagnoses them correctly.
Philip Hazel
September 2004

251
libs/pcre/install-sh Executable file
View File

@ -0,0 +1,251 @@
#!/bin/sh
#
# install - install a program, script, or datafile
# This comes from X11R5 (mit/util/scripts/install.sh).
#
# Copyright 1991 by the Massachusetts Institute of Technology
#
# Permission to use, copy, modify, distribute, and sell this software and its
# documentation for any purpose is hereby granted without fee, provided that
# the above copyright notice appear in all copies and that both that
# copyright notice and this permission notice appear in supporting
# documentation, and that the name of M.I.T. not be used in advertising or
# publicity pertaining to distribution of the software without specific,
# written prior permission. M.I.T. makes no representations about the
# suitability of this software for any purpose. It is provided "as is"
# without express or implied warranty.
#
# Calling this script install-sh is preferred over install.sh, to prevent
# `make' implicit rules from creating a file called install from it
# when there is no Makefile.
#
# This script is compatible with the BSD install script, but was written
# from scratch. It can only install one file at a time, a restriction
# shared with many OS's install programs.
# set DOITPROG to echo to test this script
# Don't use :- since 4.3BSD and earlier shells don't like it.
doit="${DOITPROG-}"
# put in absolute paths if you don't have them in your path; or use env. vars.
mvprog="${MVPROG-mv}"
cpprog="${CPPROG-cp}"
chmodprog="${CHMODPROG-chmod}"
chownprog="${CHOWNPROG-chown}"
chgrpprog="${CHGRPPROG-chgrp}"
stripprog="${STRIPPROG-strip}"
rmprog="${RMPROG-rm}"
mkdirprog="${MKDIRPROG-mkdir}"
transformbasename=""
transform_arg=""
instcmd="$mvprog"
chmodcmd="$chmodprog 0755"
chowncmd=""
chgrpcmd=""
stripcmd=""
rmcmd="$rmprog -f"
mvcmd="$mvprog"
src=""
dst=""
dir_arg=""
while [ x"$1" != x ]; do
case $1 in
-c) instcmd="$cpprog"
shift
continue;;
-d) dir_arg=true
shift
continue;;
-m) chmodcmd="$chmodprog $2"
shift
shift
continue;;
-o) chowncmd="$chownprog $2"
shift
shift
continue;;
-g) chgrpcmd="$chgrpprog $2"
shift
shift
continue;;
-s) stripcmd="$stripprog"
shift
continue;;
-t=*) transformarg=`echo $1 | sed 's/-t=//'`
shift
continue;;
-b=*) transformbasename=`echo $1 | sed 's/-b=//'`
shift
continue;;
*) if [ x"$src" = x ]
then
src=$1
else
# this colon is to work around a 386BSD /bin/sh bug
:
dst=$1
fi
shift
continue;;
esac
done
if [ x"$src" = x ]
then
echo "install: no input file specified"
exit 1
else
true
fi
if [ x"$dir_arg" != x ]; then
dst=$src
src=""
if [ -d $dst ]; then
instcmd=:
chmodcmd=""
else
instcmd=mkdir
fi
else
# Waiting for this to be detected by the "$instcmd $src $dsttmp" command
# might cause directories to be created, which would be especially bad
# if $src (and thus $dsttmp) contains '*'.
if [ -f $src -o -d $src ]
then
true
else
echo "install: $src does not exist"
exit 1
fi
if [ x"$dst" = x ]
then
echo "install: no destination specified"
exit 1
else
true
fi
# If destination is a directory, append the input filename; if your system
# does not like double slashes in filenames, you may need to add some logic
if [ -d $dst ]
then
dst="$dst"/`basename $src`
else
true
fi
fi
## this sed command emulates the dirname command
dstdir=`echo $dst | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
# Make sure that the destination directory exists.
# this part is taken from Noah Friedman's mkinstalldirs script
# Skip lots of stat calls in the usual case.
if [ ! -d "$dstdir" ]; then
defaultIFS='
'
IFS="${IFS-${defaultIFS}}"
oIFS="${IFS}"
# Some sh's can't handle IFS=/ for some reason.
IFS='%'
set - `echo ${dstdir} | sed -e 's@/@%@g' -e 's@^%@/@'`
IFS="${oIFS}"
pathcomp=''
while [ $# -ne 0 ] ; do
pathcomp="${pathcomp}${1}"
shift
if [ ! -d "${pathcomp}" ] ;
then
$mkdirprog "${pathcomp}"
else
true
fi
pathcomp="${pathcomp}/"
done
fi
if [ x"$dir_arg" != x ]
then
$doit $instcmd $dst &&
if [ x"$chowncmd" != x ]; then $doit $chowncmd $dst; else true ; fi &&
if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dst; else true ; fi &&
if [ x"$stripcmd" != x ]; then $doit $stripcmd $dst; else true ; fi &&
if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dst; else true ; fi
else
# If we're going to rename the final executable, determine the name now.
if [ x"$transformarg" = x ]
then
dstfile=`basename $dst`
else
dstfile=`basename $dst $transformbasename |
sed $transformarg`$transformbasename
fi
# don't allow the sed command to completely eliminate the filename
if [ x"$dstfile" = x ]
then
dstfile=`basename $dst`
else
true
fi
# Make a temp file name in the proper directory.
dsttmp=$dstdir/#inst.$$#
# Move or copy the file name to the temp name
$doit $instcmd $src $dsttmp &&
trap "rm -f ${dsttmp}" 0 &&
# and set any options; do chmod last to preserve setuid bits
# If any of these fail, we abort the whole thing. If we want to
# ignore errors from any of these, just make sure not to ignore
# errors from the above "$doit $instcmd $src $dsttmp" command.
if [ x"$chowncmd" != x ]; then $doit $chowncmd $dsttmp; else true;fi &&
if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dsttmp; else true;fi &&
if [ x"$stripcmd" != x ]; then $doit $stripcmd $dsttmp; else true;fi &&
if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dsttmp; else true;fi &&
# Now rename the file to the real destination.
$doit $rmcmd -f $dstdir/$dstfile &&
$doit $mvcmd $dsttmp $dstdir/$dstfile
fi &&
exit 0

20
libs/pcre/libpcre.def Normal file
View File

@ -0,0 +1,20 @@
LIBRARY libpcre
EXPORTS
pcre_malloc
pcre_free
pcre_config
pcre_callout
pcre_compile
pcre_copy_substring
pcre_dfa_exec
pcre_exec
pcre_get_substring
pcre_get_stringnumber
pcre_get_substring_list
pcre_free_substring
pcre_free_substring_list
pcre_info
pcre_fullinfo
pcre_maketables
pcre_study
pcre_version

12
libs/pcre/libpcre.pc.in Normal file
View File

@ -0,0 +1,12 @@
# Package Information for pkg-config
prefix=@prefix@
exec_prefix=@exec_prefix@
libdir=@libdir@
includedir=@includedir@
Name: libpcre
Description: PCRE - Perl compatible regular expressions C library
Version: @PCRE_VERSION@
Libs: -L${libdir} -lpcre
Cflags: -I${includedir}

View File

@ -0,0 +1,25 @@
LIBRARY libpcreposix
EXPORTS
pcre_malloc
pcre_free
pcre_config
pcre_callout
pcre_compile
pcre_copy_substring
pcre_dfa_exec
pcre_exec
pcre_get_substring
pcre_get_stringnumber
pcre_get_substring_list
pcre_free_substring
pcre_free_substring_list
pcre_info
pcre_fullinfo
pcre_maketables
pcre_study
pcre_version
regcomp
regexec
regerror
regfree

6971
libs/pcre/ltmain.sh Normal file

File diff suppressed because it is too large Load Diff

Some files were not shown because too many files have changed in this diff Show More