[FIXED] Regex capture multi-line groups

Issue

I’m struggling to create a regex that captures what is included between two keywords in a multi-line file.

In particular, consider the following file:

#%META
# date: 2022-08-27
# generated-by: Me
# id: 1
#%ENDS

#%BODY
....
#%ENDS

#%META
# date: 2022-08-27
# generated-by: Another Me
# id: 2
#%ENDS

#%BODY
....
#%ENDS

I want to capture what lies between the #%META and #%ENDS keywords, if possible without the leading #. The desired result is to capture both:

date: 2022-08-27
generated-by: Me
id: 1

and

date: 2022-08-27
generated-by: Another Me
id: 2

I came up with the following regex: (?<=#%META\n)([\S\s]*?)(?=#%ENDS\n).

However, it fails to identify the two chunks of text to be matched, and it does not remove the leading #.

Could anyone help with that?

Thanks a lot! 🙂

Solution

You might use a pattern to first capture all the parts between #%META and #%ENDS, and then post-process the capture group 1 values, removing the leading # followed by optional spaces.

^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$

Explanation

  • ^ Start of the line (the m flag is needed so that ^ and $ anchor at line boundaries)
  • #%META Match literally
  • ( Capture group 1
    • (?> Atomic group
      • \R Match any Unicode newline sequence
      • (?!#%(?:META|ENDS)$) Negative lookahead, assert that the line is not #%META or #%ENDS
      • .* Match the whole line
    • )+ Close the atomic group and repeat 1+ times
  • ) Close group 1
  • \R Match any Unicode newline sequence
  • #%ENDS Match literally
  • $ End of the line (with the m flag)
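
To illustrate what the negative lookahead buys you, here is a minimal sketch. The input string is a made-up edge case (it is not from the question) in which the first #%META block is missing its #%ENDS line. Because no captured line may itself be #%META or #%ENDS, a match cannot run across the next #%META marker, so only the properly closed block is captured.

```php
<?php
// Pattern from the answer; the m flag makes ^ and $ anchor at line boundaries.
$re = '/^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$/m';

// Hypothetical input: the first #%META block has no closing #%ENDS line.
$str = "#%META\n# id: 1\n#%META\n# id: 2\n#%ENDS";

preg_match_all($re, $str, $matches);

// Group 1 still contains the leading newline and the # prefix at this stage.
var_export($matches[1]);
```

Whether an unterminated block should be skipped silently or reported as an error depends on the file format. This sketch only shows how the pattern behaves.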


Example

$re = '/^#%META((?>\R(?!#%(?:META|ENDS)$).*)+)\R#%ENDS$/m';
$str = '#%META
# date: 2022-08-27
# generated-by: Me
# id: 1
#%ENDS

#%BODY
....
#%ENDS

#%META
# date: 2022-08-27
# generated-by: Another Me
# id: 2
#%ENDS

#%BODY
....
#%ENDS';

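if (preg_match_all($re, $str, $matches)) {
    // $matches[1] holds the text captured between each #%META and #%ENDS pair.
    $result = array_map(function ($s) {
        // Drop the surrounding newlines, then strip the leading # and any
        // horizontal whitespace from the start of every line.
        return preg_replace("/^#\h*/m", "", trim($s));
    }, $matches[1]);
    var_export($result);
}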

Output

array (
  0 => 'date: 2022-08-27
generated-by: Me
id: 1',
  1 => 'date: 2022-08-27
generated-by: Another Me
id: 2',
)
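
If the individual fields are needed rather than the raw text, each cleaned block can be split further into key/value pairs. The following is only a sketch: parseMetaBlock is a hypothetical helper name, not part of the original answer, and it assumes every line has the form "key: value" with no colon inside the key.

```php
<?php
// Hypothetical helper: turn one cleaned block into an associative array.
function parseMetaBlock(string $block): array
{
    $fields = [];
    // Split on any newline sequence, then on the first colon of each line.
    foreach (preg_split('/\R/', $block) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $fields[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $fields;
}

// The two cleaned blocks shown in the output above.
$result = [
    "date: 2022-08-27\ngenerated-by: Me\nid: 1",
    "date: 2022-08-27\ngenerated-by: Another Me\nid: 2",
];

$records = array_map('parseMetaBlock', $result);
var_export($records);
```

Lines without a colon are skipped here. Stricter validation may be preferable for real input.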

Answered By – The fourth bird

Answer Checked By – Terry (Easybugfix Volunteer)
