
The circumstances that led to this request are not hugely important (read: please don't give me pointers on how to avoid this situation, I have tried them), but I have an engine producing text files. As an example, these files normally look like this:

ENTRY: XYZ
COMMENT: This is a comment
ENTRY: 123
INTEGER: 4

Sometimes, however, the data we process contains line breaks, so the text files look like this:

ENTRY: XYZ
COMMENT: This is a comment
that spans over multiple lines
just to make life difficult
ENTRY: 123

What I'm looking for is some PowerShell that can process an entire text file and say, "for lines not containing a : character, make this line part of the line before it, potentially replacing the break with :: or something to make it clear". The sought end result would be:

ENTRY: XYZ
COMMENT: This is a comment :: that spans over multiple lines :: just to make life difficult
ENTRY: 123

So far I've been using Get-Content | % {$_ etc}, but % processes the file one line at a time. I don't believe % (ForEach-Object) or ? (Where-Object) has any awareness of context, so I can't see a way to say "where a line matches xyz, make it part of the previous line".

I spent a long time trying to use line numbers. My pseudo-code was:

  • while the document contains lines that do not contain a colon, get the line number of the first line not containing a colon
  • set the previous line number to contain both its data and the data of the offending line

However, the latter action would change the number of lines, meaning every time I made this adjustment I'd need to recalculate the line numbers. Add to this that the "while the document contains lines that do not contain a colon" check can potentially be very system-intensive (these documents can be very large) and it's a recipe for disaster.

Destroy666
  • 12,350
seagull
  • 6,216

1 Answer


Here's an example of a regex-based solution:

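(Get-Content -Raw test.txt) -replace '\r?\n([^\r\n:]+)(?=\r?\n|$)', ' :: $1' | Out-File -NoNewline test.txt

First, the -Raw flag (PowerShell 3.0 and later) loads the whole file as a single string instead of an array of lines. The regex then matches a line break (\r?\n, so both Windows CRLF and plain LF endings work), followed by one or more characters that are neither a line break nor a colon, captured in a group ([^\r\n:]+). Finally, a positive lookahead (?=\r?\n|$) requires the captured text to run up to the next line break or the end of the file. The lookahead checks for that line break without consuming it, so it remains available as the start of the next match, which is how several consecutive continuation lines get joined. Each match is replaced with ' :: ' followed by the captured group, and the result is written back to the same file. Wrapping Get-Content in parentheses makes sure the file has been read completely before Out-File overwrites it, and -NoNewline (PowerShell 5.0 and later) stops Out-File from appending an extra line break to text that already ends with one.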
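Since the question mentions that the files can be very large, a line-by-line alternative that avoids loading everything into memory may also be worth considering. The following is only a minimal sketch, not part of the original solution: the function name Join-ContinuationLine and its -Separator parameter are hypothetical names chosen for illustration. It buffers one line at a time and appends any non-empty line that contains no colon to the buffered line.

```powershell
# Hypothetical helper (name and parameters are illustrative assumptions).
# Joins every non-empty line that contains no ':' onto the previous line.
function Join-ContinuationLine {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line,

        [string]$Separator = ' :: '
    )
    begin {
        # Holds the record currently being assembled; $null means "nothing yet".
        $buffer = $null
    }
    process {
        if ($null -ne $buffer -and $Line -match '^[^:]+$') {
            # Continuation line: glue it onto the buffered record.
            $buffer += $Separator + $Line
        }
        else {
            # A new record starts: emit the finished one, then buffer this line.
            if ($null -ne $buffer) { $buffer }
            $buffer = $Line
        }
    }
    end {
        # Emit the last record, if any input was received.
        if ($null -ne $buffer) { $buffer }
    }
}
```

It could be used as Get-Content test.txt | Join-ContinuationLine | Set-Content test_joined.txt, where test_joined.txt is a placeholder name. Writing to a different file is deliberate here: because the input is streamed, the source file is still open for reading while the output is being written.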

Destroy666
  • 12,350