
simple word wrap



I wanted a simple word wrap in a single regexp, something I have tried
before and found frustratingly difficult.

This time I eventually achieved the goal:

    s{
        \G              # begin where previous match left off
        ([\d\D]*?)      # consume short lines
        (?:(?<=^) | \G) # and pick up at the beginning of a line,
                        # or just after the previous replaced space
        (.{1,79})       # match as many characters on this line as fit
        \ +             # followed by spaces
        (?=(\S+))       # followed by (unconsumed) nonspace
    }{ (length($2) + length($3) >= 79) ? "$1$2\n" : "$1$2 " }mexg;
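
For anyone wanting to try it, here is a minimal driver sketch. The sample text and the variable names $text and $wrapped are hypothetical, chosen only for illustration; the substitution itself is the one above, applied to a copy of the string.

```perl
use strict;
use warnings;

# Hypothetical sample input: 40 short words on one long line.
my $text = join ' ', ('word') x 40;

# Copy $text into $wrapped, then wrap the copy in place.
(my $wrapped = $text) =~ s{
    \G              # begin where previous match left off
    ([\d\D]*?)      # consume short lines
    (?:(?<=^) | \G) # pick up at the beginning of a line,
                    # or just after the previous replaced space
    (.{1,79})       # match as many characters on this line as fit
    \ +             # followed by spaces
    (?=(\S+))       # followed by (unconsumed) nonspace
}{ (length($2) + length($3) >= 79) ? "$1$2\n" : "$1$2 " }mexg;

print "$wrapped\n";
```

Each match replaces exactly one run of spaces, so the /g loop walks the text one candidate break point at a time.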

In the process I went a long way down the garden path (reaching
(?p{ my $p = 80 - length $2; "(?=\\S{$p})" }) before turning back) until
I realised that while '((?=\S+))' captures only an empty string, since
the group wraps a zero-width assertion, '(?=(\S+))' captures the word
without consuming it. That might be worth a mention in the docs or
amongst the examples.
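
The difference between the two forms can be seen in a small sketch; the sample string and the variable names $outer and $inner are made up for illustration.

```perl
use strict;
use warnings;

my $text = "foo bar";

# Capture group wrapped around the lookahead: the assertion is
# zero-width, so the group captures an empty string.
my ($outer) = $text =~ /\ ((?=\S+))/;

# Capture group inside the lookahead: the word is captured,
# but the match itself still consumes only the space.
my ($inner) = $text =~ /\ (?=(\S+))/;

print "outer: '$outer'\n";   # outer: ''
print "inner: '$inner'\n";   # inner: 'bar'
```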

I also wished for a couple of other extensions in the process:

1) pos() already set and modifiable at the point of replacement; I got
weird results (not to my great surprise) when I tried this:

    s{
        \G              # begin where previous match left off
        ([\d\D]*?)      # consume short lines
        (?:(?<=^) | \G) # and pick up at the beginning of a line,
                        # or just after the previous replaced space
        (.{1,79})       # match as many characters on this line as fit
        \ +             # followed by spaces
        (\S+)           # followed by nonspace, to be unconsumed
    }{
        warn "pos: ", pos(), "\n";
        pos() -= length($3);
        $1 . $2 . ((length($2) + length($3) > 79) ? "\n" : "")
    }mxge;

2) a facility to step through s///g one at a time, as with m//g in a
scalar context: this would have let me do something like

  pos() -= length($3) while s///g;

3) a facility to examine the modified rather than the original string
in later matches of a s///g: this might have allowed me to make the
'(?:(?<=^) | \G)' clause clearer.
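
For comparison, this is the kind of stepping that m//g already allows in scalar context, and that wish (2) asks for in s///g. It is only a sketch on a made-up string: each iteration finds one match, and pos() can be adjusted before the next one, here to collect overlapping pairs.

```perl
use strict;
use warnings;

my $str = "abcd";   # hypothetical sample string
my @pairs;

# In scalar context, m//g finds one match per iteration and
# remembers where it stopped in pos($str).
while ($str =~ /(\w\w)/g) {
    push @pairs, $1;
    pos($str) -= 1;   # step back one character before the next match
}

print "@pairs\n";   # ab bc cd
```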

Happy new year and all that,

Hugo

