
Re: [PATCH 5.005_63] utf8 REx botches



On Thu, 06 Jan 2000 20:49:17 EST, Ilya Zakharevich wrote:
>On Thu, Jan 06, 2000 at 04:33:51PM -0800, Gurusamy Sarathy wrote:
>> Thanks.  Did you know op/regexp.t#60 still fails with the patch?
>> 
>>     % ./perl -I../lib -Mutf8 op/regexp.t
>>     [...]
>>     ok 59
>>     not ok 60 () \ba\b:-a:y:-:- => `-', match=
>>     ok 61
>>     [...]
>
>Not here.  Did you do make test?

    % make utest
    [...]
    op/regexp.........FAILED at test 60
    [...]

(`make test` is fine.)

>Here is the debugging output:
>
>./perl -Ilib -Mre=debug -wle '"-a" =~ /\ba\b/ or die'
>Compiling REx `\ba\b'
>size 5 first at 1
>   1: BOUND(2)
>   2: EXACT <a>(4)
>   4: BOUND(5)
>   5: END(0)
>anchored `a' at 0 (checking anchored) stclass `BOUND' minlen 1
>Guessing start of match, REx `\ba\b' against `-a'...
>Found anchored substr `a' at offset 1...
>Does not contradict STCLASS...
>Guessed: match at offset 1
>Matching REx `\ba\b' against `a'
>  Setting an EVAL scope, savestack=3
>   1 <-> <a>              |  1:  BOUND
>   1 <-> <a>              |  2:  EXACT <a>
>   2 <-a> <>              |  4:  BOUND
>   2 <-a> <>              |  5:  END
>Match successful!
>Freeing REx: `\ba\b'

After putting just a single line (test 60) in re_tests, I see this:

    % ./perl -I../lib -Mutf8 -Mre=debug op/regexp.t
    [...]
    Compiling REx `\ba\b'
    size 5 first at 1
    rarest char a at 0
       1: BOUNDUTF8(2)
       2: EXACT <a>(4)
       4: BOUNDUTF8(5)
       5: END(0)
    anchored `a' at 0 (checking anchored) stclass `BOUNDUTF8' minlen 1 
    Guessing start of match, REx `\ba\b' against `-a'...
    Found anchored substr `a' at offset 1...
    Could not match STCLASS...
    Match rejected by optimizer
    Freeing REx: `\ba\b'
    not ok 1 () \ba\b:-a:y:-:- => `-', match=
    [...]

What's more, the lib/complex.t failure does seem to involve REs.
I haven't looked closely, but lib/complex.t does various substitutions
of the form C<s/\bz\b/\$s4/g> on the DATA, and those don't seem to
work right under utf8.  Dumping out the contents of the main
eval'' in lib/complex.t, the differences are all of the form:

    --- /tmp/right    Thu Jan  6 16:35:16 2000
    +++ /tmp/wrong    Thu Jan  6 16:35:28 2000
    @@ -742,7 +742,7 @@
     $z1 = 2*Re($s3);
     $res = abs($z0 - $z1) <= 1e-13 ? $z1 : $z0; check(149, 'z + ~z', $res, $z1, '(0,2)');
     $z0 = $s4 + ~$s4;
    -$z1 = 2*Re($s4);
    +$z1 = 2*Re(z);
     $res = abs($z0 - $z1) <= 1e-13 ? $z1 : $z0; check(150, 'z + ~z', $res, $z1, '[2,1] ');
     $z0 = $s0 - ~$s0;
     $z1 = 2*i*Im($s0);
     [...etc...]

I've attached my config info at the end.  If you still can't replicate it,
you might want to sync to what I've got by doing something like the
following:

    % tar zxf perl5.005_63.tar.gz
    % cd perl5.005_63
    % find . -name \* -type f -print | xargs perl -0777 -pi -e 's/\r$//mg'
    % zcat ~/gsar/APC/diffs/*.gz | patch -p1 -N | less
    % patch -p1 -N < 5.005_63_utf8_REx_botches
    % (./Configure -ders -Dusemultiplicity -Doptimize=-g && make all utest) >& log &

~/gsar/APC/ is the same as ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/.
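
The find/xargs step in that recipe strips carriage returns from every file before patching. A small sketch of the same one-liner on a throwaway file, assuming `perl`, `mktemp`, `tr` and `wc` are available (the file contents are made up for illustration):

```shell
# Create a scratch file with CRLF line endings.
tmp=$(mktemp)
printf 'line one\r\nline two\r\n' > "$tmp"

# -0777 slurps the whole file; with /m, $ matches before each newline,
# so every CR that ends a line is removed in place (-i).
perl -0777 -pi -e 's/\r$//mg' "$tmp"

cr_count=$(tr -cd '\r' < "$tmp" | wc -c)
line_count=$(wc -l < "$tmp")
echo "carriage returns left: $cr_count, lines: $line_count"

rm -f "$tmp"
```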


Sarathy
gsar@ActiveState.com

Summary of my perl5 (revision 5.0 version 5 subversion 640) configuration:
  Platform:
    osname=linux, osvers=2.0.36, archname=i686-linux-multi
    uname='linux auger 2.0.36 #1 tue mar 30 13:15:01 pst 1999 i686 unknown '
    config_args='-ders'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
    use64bits=undef usemultiplicity=define
  Compiler:
    cc='cc', optimize='-g', gccversion=2.7.2.3
    cppflags='-Dbool=char -DHAS_BOOL -DDEBUGGING -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -DDEBUGGING -I/usr/local/include'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=/lib/libc-2.0.7.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'


Characteristics of this binary (from libperl): 
  Compile-time options: DEBUGGING MULTIPLICITY PERL_IMPLICIT_CONTEXT
  Built under linux
  Compiled at Jan  5 2000 18:02:08
  @INC:
    lib
    /tmp/gsperl/lib/5.00563/i686-linux-multi
    /tmp/gsperl/lib/5.00563
    /tmp/gsperl/lib/site_perl/5.00563/i686-linux-multi
    /tmp/gsperl/lib/perl5/site_perl
    .


Follow-Ups from:
Ilya Zakharevich <ilya@math.ohio-state.edu>
References to:
Ilya Zakharevich <ilya@math.ohio-state.edu>
