3

I want to remove leading and trailing tags from country names.
In my example those tags are <li> and <a>.

<li><a href="http://afghanistan.makaan.com/">Afghanistan</a></li>
<li><a href="http://albanie.makaan.com/">Albanie</a></li>
<li><a href="http://algérie.makaan.com/">Algérie</a></li>

Result should be:

Afghanistan
Albanie
Algérie

In Microsoft Word, I want to use the Find and Replace feature to do this with a regular expression.

How can I use regular expressions in MS Word?

nixda
  • 27,634

4 Answers

4

Instead of copying your input text to Word, copy it to Notepad++ or any other editor with full RegEx support.

A RegEx that selects everything outside of tags, that is, everything between a > and a < sign, would be:

(?<=>).*?(?=<)

  • (?<=>) is a look-behind. It requires a > sign immediately before the match and acts as an anchor without becoming part of the match. This is important since you don't want >Afghanistan
  • .*? matches any characters with a lazy quantifier, so it selects as little as possible, up to the point where the next part of the expression matches
  • (?=<) is a look-ahead. It requires a < sign immediately after the match but, just like the look-behind, excludes that sign from the match
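For readers who would rather test the look-behind/look-ahead idea outside an editor, here is a minimal Python sketch. Note the assumption: it is a variant of the pattern above, not the same expression. The `.*?` part is swapped for `[^<>]+` (one or more characters that are not angle brackets) so that the empty gaps between adjacent tags are skipped. The names `TEXT_BETWEEN_TAGS`, `lines` and `names` are made up for this illustration.

```python
import re

# Assumed variant of the answer's pattern: the look-behind (?<=>) and the
# look-ahead (?=<) are kept, but `.*?` is replaced by `[^<>]+` so that only
# non-empty runs of text between a ">" and a "<" are matched.
TEXT_BETWEEN_TAGS = re.compile(r"(?<=>)[^<>]+(?=<)")

lines = [
    '<li><a href="http://afghanistan.makaan.com/">Afghanistan</a></li>',
    '<li><a href="http://albanie.makaan.com/">Albanie</a></li>',
    '<li><a href="http://algérie.makaan.com/">Algérie</a></li>',
]

# Collect every piece of text that sits directly between two tags.
names = [match for line in lines for match in TEXT_BETWEEN_TAGS.findall(line)]

for name in names:
    print(name)
```

Because the brackets are only checked by the look-around assertions, they never end up in the matched text.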

But you don't want to select the country names; you want to remove every tag. You need the opposite of the first regular expression, something like:

<.*?>

  1. Open Notepad++ search & replace dialog
  2. Set Search Mode to Regular expression
  3. Find what: <.*?>
  4. Replace with: (leave empty)
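
The same replacement can be sketched in Python, in case the list has to be cleaned in a script rather than in Notepad++. This is only an illustration of the `<.*?>` pattern from the steps above; the helper name `strip_tags` and the `sample` list are assumptions made for the example.

```python
import re

# The pattern from the steps above: a "<", then as few characters as
# possible, then the closing ">".
TAG = re.compile(r"<.*?>")


def strip_tags(line: str) -> str:
    """Replace every tag in the line with nothing (hypothetical helper)."""
    return TAG.sub("", line)


sample = [
    '<li><a href="http://afghanistan.makaan.com/">Afghanistan</a></li>',
    '<li><a href="http://albanie.makaan.com/">Albanie</a></li>',
    '<li><a href="http://algérie.makaan.com/">Algérie</a></li>',
]

for line in sample:
    print(strip_tags(line))
```

The lazy `?` matters here: a greedy `<.*>` would run from the first < to the last > on the line and delete the country name as well.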
nixda
  • 27,634
2

This is easy to do in MS Word's Find and Replace, without Regex, without JavaScript, etc.

If you escape an angle bracket with a backslash, Word searches for the actual bracket character. So, with Use wildcards toggled on, the expression \<*\> will find each pair of angle brackets and everything between them. Just replace that with nothing.

Greenonline
  • 2,390
0

This looks like it's what you need.

Given the latest comment (that you just want it in JavaScript), I'd look somewhere like here

If you want that in a SQL database, then I'd probably just write a couple of lines of Perl to give you the list from the raw JavaScript. As far as I can tell, MS Word doesn't come into it.

0

I wouldn't use find/replace for that. It would be simplest to use "Text to Columns" in Excel. Select the column that contains the text, go to the "Data" ribbon and select "Text to Columns". You will need to do it twice: once to remove all the text before the country name (the delimiter would be ">"; make sure you delete the extraneous columns to avoid confusion) and once to remove the text after the name (the delimiter would be "<").
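
The two-pass split can also be sketched in Python, which may help to see which column ends up holding the name. This is not Excel itself, only an imitation of the two delimiter passes. It assumes every cell has exactly the `<li><a href="...">Name</a></li>` shape from the question, and the function name `between_delimiters` is made up for the example.

```python
def between_delimiters(cell: str) -> str:
    """Imitate two "Text to Columns" passes on one cell (hypothetical helper).

    Assumes the cell looks like <li><a href="...">Name</a></li>.
    """
    # First pass, delimiter ">": the pieces are "<li", "<a href=...",
    # "Name</a", "</li" and an empty tail, so the name starts the third piece.
    after_opening_tags = cell.split(">")[2]
    # Second pass, delimiter "<": keep only the text before the closing tag.
    return after_opening_tags.split("<")[0]


cells = [
    '<li><a href="http://afghanistan.makaan.com/">Afghanistan</a></li>',
    '<li><a href="http://albanie.makaan.com/">Albanie</a></li>',
    '<li><a href="http://algérie.makaan.com/">Algérie</a></li>',
]

for cell in cells:
    print(between_delimiters(cell))
```

If a cell has a different number of tags in front of the name, the index `2` would have to change, which is the same reason the extra columns need to be deleted by hand in Excel.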

Karen927
  • 760