Regex: match the second pattern only where it is not inside a match of the first pattern
For example, I have this text:
www.google.com
<a href="www.google.com"> Google Homepage </a>
I wrote this: (<a.*</a>)
which captures the anchor tag, and this: (www\.[\S]+(\b|$))
which selects any text that starts with www.
But what I want is to select only the plain www.google.com,
not the one inside the anchor tag.
Is there any way to ignore the anchor tag completely and select only from the remaining text?
To be more precise, I need a regex that means: NOT (<a.*</a>) AND (www\.[\S]+(\b|$))
I hope my question is clear. Thanks for helping.
Are you using JavaScript or what's your context?
– wp78de
Jun 29 at 8:06
yeah, I'm using JS.
– Sandeep Gupta
2 days ago
1 Answer
As I understand it, you want to select each URL (starting with www.) when it is not in the href attribute.
This will work with a negative lookbehind:
(?<!href=")(www\.[\S]+(\b|$))
This regex selects the URL only when there is no href=" directly before it.
Be aware that JavaScript only gained lookbehind in ES2018, so older engines do not support it. Tested on https://regex101.com/
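A minimal sketch of the lookbehind variant in JS, assuming an engine with ES2018 lookbehind support and assuming the pattern was meant to be written with \. \S and \b. The variable names are only for illustration, and the sample text is the one from the question.

```javascript
// Sample text from the question: one bare URL and one URL inside an href.
const text = [
  'www.google.com',
  '<a href="www.google.com"> Google Homepage </a>',
].join('\n');

// Negative lookbehind: skip a URL that is directly preceded by href=".
// Assumption: requires an engine that implements ES2018 lookbehind.
const bareUrl = /(?<!href=")www\.\S+(\b|$)/g;

// With the g flag, match() returns the full matches (or null if none).
const matches = text.match(bareUrl) || [];
console.log(matches);
```

Note that this only guards against href=" immediately before the URL; a URL elsewhere inside a tag would still be matched.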
Edit, due to additions in the comments:
If you want to exclude everything inside an HTML tag (between < and the closing >), this should work for you:
(?![^<]*>)(([a-zA-Z0-9-_.])+@[a-zA-Z_]+?(\.[a-zA-Z]{2,6})+)
It's a negative lookahead: the match is rejected when the current position is followed by any number of characters that are not < and then by a >. In that case a closing > is reached before any new tag opens, so the position is inside a tag.
The good thing about negative lookahead is that it has always been supported in JS :)
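Here is a rough sketch of how the lookahead could be reused for both the URL and the mail address. The helper name matchOutsideTags is made up for this example, and the character class is reordered to [a-zA-Z0-9_.-] so the hyphen is unambiguous; treat both as assumptions, not as the exact patterns above.

```javascript
// Reject any match whose start is followed by a ">" before the next "<",
// i.e. a start position that lies inside a tag such as <a href="...">.
const NOT_IN_TAG = '(?![^<]*>)';

// Hypothetical helper: prefix a pattern with the lookahead and collect
// all full matches in the text.
function matchOutsideTags(pattern, text) {
  const re = new RegExp(NOT_IN_TAG + pattern, 'g');
  return text.match(re) || [];
}

// Sample lines taken from the question and the comments.
const sample = [
  'www.google.com',
  '<a href="www.google.com"> Google Homepage </a>',
  'sandeep.gupta@xyz.com',
  '<a href="mailto:sandeep.gupta@xyz.com"> Sandeep Gupta </a>',
].join('\n');

const urls = matchOutsideTags('www\\.\\S+(\\b|$)', sample);
const mails = matchOutsideTags(
  '[a-zA-Z0-9_.-]+@[a-zA-Z_]+?(\\.[a-zA-Z]{2,6})+',
  sample
);

console.log(urls);
console.log(mails);
```

Keeping the lookahead in one constant means the same "not inside a tag" check can be put in front of any other pattern without touching the pattern itself.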
thanks for helping, any suggestions for this
sandeep.gupta@xyz.com
<a href="mailto:sandeep.gupta@xyz.com"> Sandeep Gupta </a>
– Sandeep Gupta
Jun 29 at 8:16
I'm trying this regex:
(?<!mailto:)(([a-zA-Z0-9-_.])+@[a-zA-Z_]+?(\.[a-zA-Z]{2,6})+)
– Sandeep Gupta
Jun 29 at 8:17
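The problem here is that this will still match the mail address starting from its second character: the lookbehind only blocks the position directly after mailto:, and one character later the preceding text is no longer mailto: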
– Lars-Olof Kreim
Jun 29 at 8:20
see my edited comment
– Lars-Olof Kreim
Jun 29 at 8:27
Can you please explain how this works:
(?![^<]*>)
By the way, thanks for helping.
– Sandeep Gupta
2 days ago
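One way to see what (?![^<]*>) does is to test the lookahead by itself at individual positions. The sketch below uses a sticky regex for that; the helper passesLookahead and the sample string are assumptions made up for this illustration.

```javascript
// Sticky regex containing only the lookahead: it matches the empty string
// at lastIndex when that position is NOT followed by "[^<]*>".
const outsideTag = /(?![^<]*>)/y;

// Hypothetical helper: true when position `index` of `text` passes the lookahead.
function passesLookahead(text, index) {
  outsideTag.lastIndex = index;
  return outsideTag.test(text);
}

const html =
  'sandeep.gupta@xyz.com <a href="mailto:sandeep.gupta@xyz.com"> Sandeep Gupta </a>';

const bare = html.indexOf('sandeep');             // plain text before the tag
const inHref = html.indexOf('sandeep', bare + 1); // inside the href attribute
const inLinkText = html.indexOf('Sandeep');       // text between <a ...> and </a>

console.log(passesLookahead(html, bare));       // true: the next "<" comes before any ">"
console.log(passesLookahead(html, inHref));     // false: a ">" comes before the next "<"
console.log(passesLookahead(html, inLinkText)); // true: link text is not inside a tag
```

The last line shows a limitation: the lookahead only excludes what is written inside the tag itself (such as attributes), not the text between the opening and closing tags.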
Do not try to parse html with regex
– Ulysse BN
Jun 29 at 7:29