0

How can I sort a text according to hashtag on Windows-7?

I have a long text (.txt format) which looks something like this:

  • Blah blah #Test
  • 123123 #Really
  • Blah bluh #Really
  • klfdmngl #Test

I would like to conveniently, quickly and automatically be able to sort the text so that it looks like this:

  • Blah blah #Test
  • klfdmngl #Test
  • 123123 #Really
  • Blah bluh #Really

I have to do this on a daily basis so I would like to be able to do it in as few steps as possible.

karel

3 Answers

1

Here's a final PowerShell solution that handles entries broken across lines. The delimiter is assumed to be a hashtag followed by word characters at the end of a line. A line with no hashtag is assumed to continue on the next line. The other sections of this answer do not handle the special case mentioned by the asker, where an entry crosses a newline boundary. This example assumes the file is called test.txt and is in the current directory.

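# Read the whole file into memory first, so it can be overwritten at the end.
[string[]]$fileContent = (get-content .\test.txt);
[string]$linebuffer = '';

# Join every line that lacks a trailing hashtag onto the line that follows it.
[object]$fixedFile = foreach($line in $fileContent) {
    if(-not ($line -match "#\w+$")) {
        $linebuffer += ($line + ' ');
        continue;
    }

    $linebuffer += $line;
    $linebuffer;          # emit the completed entry
    $linebuffer = '';
}

# Move the hashtag to the front, sort, then move it back and write the result.
($fixedFile -replace '^(.*)\ (#.*)$', '$2 $1' | Sort-Object) -replace '^(#\w+)\ (.*)$','$2 $1' | out-file test.txt -encoding ascii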
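
For readers who want to see the line-joining step in isolation, here is a minimal sketch of the same idea in Python (not PowerShell, and not part of the solution above). The function name `join_continuations` is made up for illustration. One deliberate difference, labeled as an assumption: trailing text with no hashtag is kept as its own entry instead of being discarded.

```python
import re

# An entry is complete when its line ends in a hashtag: '#' plus word characters.
TAG_AT_END = re.compile(r"#\w+$")


def join_continuations(lines):
    """Merge each line that lacks a trailing hashtag into the next tagged line."""
    records = []
    buffer = ""
    for line in lines:
        if TAG_AT_END.search(line):
            # The hashtag closes the entry: emit buffered text plus this line.
            records.append(buffer + line)
            buffer = ""
        else:
            # No hashtag yet, so the entry continues on the next line.
            buffer += line + " "
    if buffer:
        # Assumption: leftover text without a hashtag is kept, not dropped.
        records.append(buffer.rstrip())
    return records


if __name__ == "__main__":
    for record in join_continuations(["Blah", "blah #Test", "123123 #Really"]):
        print(record)
```

The joined entries can then be sorted by hashtag with whichever method you prefer.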

Use gVim in Windows or MacVim on OS X.

NOTE: Vim is a modal editor. To type text as in a normal editor, you must be in Insert mode, which you enter by pressing a key such as a or i. The editor starts in Normal mode (often called command mode). From Normal mode, type a colon to enter each of the commands below.

:%s/^\(.*\)\ \(\#\w\+\)$/\2\ \1/g
:sort
:%s/^\(\#\w\+\)\ \(.*\)$/\2\ \1/g

The first command moves the hashtag from the end of each line to the beginning. The second command sorts the lines, and the third undoes the swap by moving the hashtag back to the end of the line.

I've tested this on your sample and it works.
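
The swap, sort, swap-back idea behind the three Vim commands can also be sketched outside the editor. This is a rough Python equivalent, offered only as an illustration; the function name `sort_by_trailing_tag` is hypothetical, and the regular expressions are Python translations of the Vim patterns, not the originals.

```python
import re


def sort_by_trailing_tag(lines):
    """Sort lines by the hashtag at the end of each line."""
    # Step 1: move the trailing hashtag to the front: "text #Tag" -> "#Tag text".
    swapped = [re.sub(r"^(.*) (#\w+)$", r"\2 \1", line) for line in lines]
    # Step 2: a plain sort now groups the lines by hashtag.
    swapped.sort()
    # Step 3: move the hashtag back to the end of the line.
    return [re.sub(r"^(#\w+) (.*)$", r"\2 \1", line) for line in swapped]


if __name__ == "__main__":
    sample = [
        "Blah blah #Test",
        "123123 #Really",
        "Blah bluh #Really",
        "klfdmngl #Test",
    ]
    print("\n".join(sort_by_trailing_tag(sample)))
```

On the sample from the question this groups the two #Really lines first and the two #Test lines second, because a plain sort orders the tags alphabetically.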


@Oliver_Salzburg provided a much easier answer using Excel in the comments. I didn't think outside the box and provided an answer with a text editor.

Step 1: Replace # with ,#
Step 2: Import as CSV into Excel or a similar application.
– Oliver Salzburg♦


Here's a solution using only PowerShell, which is available natively on Windows 7. I still haven't had a chance to read up on handling entries that span line breaks, so this solution does not account for those.

This example assumes that the file you're working with is test.txt.

$tempstor = (get-content test.txt) -replace '^(.*)\ (#.*)$', '$2 $1' | Sort-Object
$tempstor -replace '^(#\w+)\ (.*)$','$2 $1' | out-file test.txt -encoding ASCII

The same thing as a one-liner, using parenthesized subexpressions:

((get-content test.txt) -replace '^(.*)\ (#\w+)$', '$2 $1' | Sort-Object) -replace '^(#\w+)\ (.*)$','$2 $1' | out-file test.txt -encoding ascii
Sean C.
1

Here's a Windows batch (.bat) or command (.cmd) file that will do it. I wasn't sure what you wanted to do with the output, so this just displays one of the two temporary files it creates and then deletes both of them.

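@echo off
rem Require the input file name as the first argument.
if {%1} == {} (
echo usage: %0 ^<filename^>
goto :EOF
)
rem Pass 1: rewrite each "text #tag" line as "tag$text" so the tag comes first.
echo.>_temp1
for /F "tokens=1,2 delims=#" %%i in (%1) do echo %%j$%%i>>_temp1
rem Sort the lines by their leading tag.
echo.>_temp2
sort _temp1 >_temp2
rem Pass 2: restore the original "text #tag" layout.
echo.>_temp1
for /F "tokens=1,2 delims=$" %%i in (_temp2) do @echo %%j#%%i>>_temp1
rem Show the result, then remove both temporary files.
type _temp1
del _temp1
del _temp2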
martineau
0

If you're on Windows, you can use this simple PowerShell script:

[io.file]::ReadAllLines("test.txt")|Sort-Object {$_.SubString($_.IndexOf('#'))}

I'm hardly a PowerShell expert, so, sorry if there is a more optimal solution :)

Example

Here's the content of my input file test.txt:

PS C:\Users\Oliver> type test.txt
Blah blah #Test
123123 #Really
Oliver #SuperUser
Blah bluh #Really
klfdmngl #Test

This is the output when running the above script:

PS C:\Users\Oliver> [io.file]::ReadAllLines("test.txt")|Sort-Object {$_.SubString($_.IndexOf('#'))}
Blah bluh #Really
123123 #Really
Oliver #SuperUser
klfdmngl #Test
Blah blah #Test

Analysis

[io.file]       # From the .NET class System.IO.File...
::ReadAllLines  # use method ReadAllLines to read all text lines into an array...
("test.txt")    # from the file test.txt

|               # Take that array and pipe it to...
Sort-Object     # the cmdlet Sort-Object (to sort objects)
{               # To sort the elements in the array...
$_.SubString(   # use the part of the text line...
$_.IndexOf('#') # that starts at the first position of a #
)}
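
The same "sort by the part of the line starting at the first #" idea can be sketched in Python for comparison. This is only an illustration, not a replacement for the PowerShell script above. The names `hashtag_key` and `sort_by_hashtag` are made up, and two behaviors are assumptions on my part: a line with no # gets an empty key (so it sorts first instead of raising an error), and the comparison is case-sensitive.

```python
def hashtag_key(line):
    """Return the part of the line from the first '#', or '' if there is none."""
    pos = line.find("#")
    # Assumption: lines without a hashtag get an empty key and sort first.
    return line[pos:] if pos != -1 else ""


def sort_by_hashtag(lines):
    """Sort lines by hashtag; sorted() is stable, so ties keep input order."""
    return sorted(lines, key=hashtag_key)


if __name__ == "__main__":
    sample = [
        "Blah blah #Test",
        "123123 #Really",
        "Oliver #SuperUser",
        "Blah bluh #Really",
        "klfdmngl #Test",
    ]
    print("\n".join(sort_by_hashtag(sample)))
```

Because Python's sort is stable, lines that share a hashtag stay in their original relative order.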
Oliver Salzburg