The title says it, really. Basically I am trying to condense an enormous log file.

Notepad++ with a regex (I know a little) can delete these repetitive lines, but the problem is that I don't want them all deleted. I want one instance to remain so that the structure and order of the log messages are preserved.

I've googled many an answer, but I only seem to get results like this. The problem is that I am not just trying to replace or exclude lines.

At this point I'd guess Regex is more likely to hold an answer, but I'm still at that stage where I don't know what tools are available.

Edit:

Example of a message that I have thousands of, but only need to see one of (I see tons of these because every SCSI device wants to chip in its own message; I only need to see that it's happening, not that it's happening to each of them):

multipathd[4893]: 3600a098000badf6800005dfe5a8cd2cd: sdie - rdac checker reports path is down: ctlr is in startup sequence
multipathd[4893]: 3600a098000badf6800005def5a8cd273: sdgq - rdac checker reports path is down: ctlr is in startup sequence
multipathd[4893]: 3600a098000badf6800005df05a8cd27b: sdeq - rdac checker reports path is down: ctlr is in startup sequence
multipathd[4893]: 3600a098000bae10c00005df55a8cd2ec: sdgw - rdac checker reports path is down: ctlr is in startup sequence
multipathd[4893]: 3600a098000bae10c00005df05a8cd2c2: sdfk - rdac checker reports path is down: ctlr is in startup sequence
multipathd[4893]: 3600a098000bae10c00005dec5a8cd2a3: sdgm - rdac checker reports path is down: ctlr is in startup sequence
multipathd[4893]: 3600a098000badf6800005df35a8cd292: sdfo - rdac checker reports path is down: ctlr is in startup sequence

But I want to see just

rdac checker reports path is down: ctlr is in startup sequence

If the repeated lines are consecutive, you can do:

  • Ctrl+H
  • Find what: ^([^-]+- )(.+)(?:\R(?1)\2)+
  • Replace with: $2
  • check Wrap around
  • check Regular expression
  • DO NOT CHECK . matches newline
  • Replace all

Explanation:

^           : beginning of line
  (         : start group 1
    [^-]+-  : 1 or more characters that are NOT a dash, then a dash and a space
  )         : end group 1
  (         : start group 2
    .+      : 1 or more of any character
  )         : end group 2
  (?:       : start non-capturing group
    \R      : any kind of line break
    (?1)    : same pattern as group 1 (i.e. "[^-]+- ")
    \2      : backreference to group 2
  )+        : end non-capturing group, must appear 1 or more times

Replacement:

$2      : content of group 2

Result for given example:

rdac checker reports path is down: ctlr is in startup sequence
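To try the same replacement outside Notepad++, here is a rough Python translation (a sketch, not part of the original answer). Python's built-in `re` module supports neither `\R` nor the `(?1)` subroutine call, so `\r?\n` and a second copy of the group-1 pattern stand in for them. Two small tightenings are deliberate deviations: `[^-\n]` keeps the prefix from spanning several lines, and a trailing `$` forces `\2` to match the whole message. The function name `collapse_consecutive` is hypothetical.

```python
import re

# Rough translation of the Notepad++ pattern ^([^-]+- )(.+)(?:\R(?1)\2)+
# for Python's built-in `re` module, which has neither \R nor (?1):
#   - \r?\n stands in for \R (line break),
#   - the group-1 pattern is written out again instead of calling (?1),
#   - [^-\n] instead of [^-] stops the prefix from spanning several lines,
#   - the trailing $ makes \2 match the whole message, not just its start.
CONSECUTIVE_DUPES = re.compile(
    r"^([^-\n]+- )(.+)(?:\r?\n[^-\n]+- \2$)+",
    re.MULTILINE,  # ^ and $ match at every line boundary
)


def collapse_consecutive(text: str) -> str:
    """Replace each run of two or more consecutive lines that share the same
    message (the text after the first "- ") with one copy of that message.
    Lines that are not part of such a run are left unchanged."""
    return CONSECUTIVE_DUPES.sub(r"\2", text)


if __name__ == "__main__":
    sample = "\n".join([
        "multipathd[4893]: 3600a098000badf6800005dfe5a8cd2cd: sdie - rdac checker reports path is down: ctlr is in startup sequence",
        "multipathd[4893]: 3600a098000badf6800005def5a8cd273: sdgq - rdac checker reports path is down: ctlr is in startup sequence",
        "multipathd[4893]: 3600a098000badf6800005df05a8cd27b: sdeq - rdac checker reports path is down: ctlr is in startup sequence",
    ])
    print(collapse_consecutive(sample))
```

As with the Notepad++ version, only runs of duplicates lose their prefix; a line whose message is not repeated on the next line is left exactly as it was.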

If the multiple instances are not consecutive, you'd better write a script in your favorite scripting language.

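Here is a Perl one-liner that prints the message part of each line (the text after the first " - ") only the first time it appears, so repeats are dropped even when they are not consecutive. Lines without a " - " are printed whole, also only once:

perl -nE 'chomp; my (undef, $x) = split / - /, $_, 2; $x //= $_; say $x unless $seen{$x}++' inputfile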
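For readers who prefer Python, the same first-occurrence filter can be sketched as below (an illustration, not the answer's own code). It assumes, based on the example lines, that the message is everything after the first " - " and treats a line without that separator as its own message. A set remembers what has already been printed, so memory grows with the number of distinct messages rather than with the file size. The function name `first_occurrences` and the script name in the usage comment are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def first_occurrences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the message part of each line the first time it is seen.

    The message is assumed to be everything after the first " - ";
    a line without that separator is treated as its own message.
    Later lines with an already-seen message are skipped, so the order
    of first appearances is preserved.
    """
    seen: set[str] = set()
    for line in lines:
        line = line.rstrip("\r\n")
        _prefix, sep, message = line.partition(" - ")
        key = message if sep else line
        if key not in seen:
            seen.add(key)
            yield key


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python dedupe_log.py inputfile  (script name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for msg in first_occurrences(fh):
            print(msg)
```

Reading the file line by line (rather than loading it whole) keeps this usable on very large logs.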