Regular expressions

From Code Trash

The following are my own experiments; some are excerpts from other sources.

Matching strings - substrings

Here I am trying to extract the value of the src attribute of any tag.
The condition is a string of the form src="..."; the text I am extracting is the part shown as three dots.
I go a bit further by including http://www.youtube.com, because I want to extract the request URI of a YouTube video.


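	$arr = array();
	// Group 1 is the opening quote and \1 requires the same quote to close; $arr[2] holds the request URI.
	preg_match('~src=(["\'])http://www\.youtube\.com/(.*?)\1~', $url, $arr);
	print_r($arr);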
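
The same extraction can be sketched in JavaScript. This is only a sketch: the function name extractYoutubePath and the sample markup are made up for illustration and are not part of the PHP code above.

```javascript
// Hypothetical helper: returns the part of a YouTube src URL after the host,
// or null when no matching src attribute is found.
function extractYoutubePath(html) {
  // Group 1 captures the opening quote; \1 requires the same quote to close.
  const match = html.match(/src=(["'])http:\/\/www\.youtube\.com\/(.*?)\1/);
  return match ? match[2] : null;
}

// Sample markup invented for this example.
const sample = '<embed src="http://www.youtube.com/v/abc123" type="x">';
console.log(extractYoutubePath(sample)); // v/abc123
```

The lazy (.*?) stops at the first closing quote, so attributes that follow src are not swallowed.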

Matching HTML Tags

Matching string between two strings

I wouldn't use regex for this either, but if you must, this expression should work:

<customtag>(.+?)</customtag>

If there won't be any other tags between the two tags, this regex is a little safer, and more efficient:

<customtag>[^<>]*</customtag>

Source http://stackoverflow.com/questions/299942/regex-matching-html-tags-and-extracting-text
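
A rough JavaScript sketch of both patterns, assuming a made-up sample string. A capturing group is added to the second pattern so the inner text can be read back; that group is my addition, not part of the quoted answer.

```javascript
// Hypothetical sample text; <customtag> is the placeholder tag used above.
const text = "a <customtag>first</customtag> b <customtag>second</customtag>";

// Lazy version: (.+?) stops at the nearest closing tag.
const lazy = [...text.matchAll(/<customtag>(.+?)<\/customtag>/g)].map(m => m[1]);

// Stricter version: [^<>]* refuses to cross any other tag.
const strict = [...text.matchAll(/<customtag>([^<>]*)<\/customtag>/g)].map(m => m[1]);

console.log(lazy);   // [ 'first', 'second' ]
console.log(strict); // [ 'first', 'second' ]
```

The two differ only when another tag sits inside customtag: the lazy version still matches, the strict one does not.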

Remove Tags Two

Remove javascript and CSS:

<(script|style).*?</\1>

Remove tags

<.*?>

---

Source http://stackoverflow.com/questions/181095/regular-expression-to-extract-text-from-html

Remove script and tags like that

The versions above do not match across line breaks, because . does not match a newline by default in most regex engines.
The following one does (\r?\n covers both Windows and Unix line endings):

<script(.|\r?\n)*?</script>

Use other tag names in place of script. Combining this with the regex above gives:

<(script|style)(.|\r?\n)*?</\1>
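
A minimal JavaScript sketch of the two removal steps together. Note that it swaps in [\s\S] as the "any character including newline" idiom in place of the alternation above; the helper name stripTags and the sample page are assumptions for illustration.

```javascript
// Hypothetical helper: removes <script> and <style> blocks, then remaining tags.
function stripTags(html) {
  return html
    // [\s\S] matches any character including line breaks; \1 closes the same tag name.
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "")
    // Then drop every remaining tag.
    .replace(/<.*?>/g, "");
}

// Sample page invented for this example; the script block spans several lines.
const page = "<p>Hello</p><script>\nvar a = 1;\n</script><b>world</b>";
console.log(stripTags(page)); // Helloworld
```

The order matters: if the generic tag pattern ran first, the script body would be left behind as plain text.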

Numbers

Numbers with trailing dot

I want to find all numbers ending with a dot, for example 1., 12. and 123. I tried the following and it worked.

  • [0-9]+[\.]

One digit and a dot

I want to match just one digit followed by a dot, and I used the following.

  • [0-9][\.]

Other Options

I tried the following and got only matches like 1.0 and 34.34. To match whole numbers as well, the dot and the digits after it have to be made optional.

  • [0-9]+[\.][0-9]+
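
The number patterns above can be compared side by side in JavaScript. This is a sketch on a made-up sample string; the last pattern, with an optional fraction, is my own variation and not one of the patterns listed above.

```javascript
// Sample text invented for this example.
const text = "1. 12. 123. 7 1.0 34.34";

// One or more digits followed by a dot.
const trailingDot = text.match(/[0-9]+\./g);
// Exactly one digit followed by a dot (also hits the last digit of longer numbers).
const oneDigitDot = text.match(/[0-9]\./g);
// Digits, a dot, and more digits.
const decimals = text.match(/[0-9]+\.[0-9]+/g);
// Variation: the fraction is optional, so whole numbers match too.
const anyNumber = text.match(/[0-9]+(\.[0-9]+)?/g);

console.log(trailingDot); // [ '1.', '12.', '123.', '1.', '34.' ]
console.log(oneDigitDot); // [ '1.', '2.', '3.', '1.', '4.' ]
console.log(decimals);    // [ '1.0', '34.34' ]
console.log(anyNumber);   // [ '1', '12', '123', '7', '1.0', '34.34' ]
```

None of these patterns is anchored, so the first two also match the front part of decimals such as 1.0.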

Remove all control characters

The pattern [\x00-\x1f] matches the ASCII control characters from 0x00 to 0x1F, including the NUL character. The g flag is needed to remove every occurrence, not just the first.

str = str.replace(/[\x00-\x1f]/g,'')
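
A small JavaScript sketch of the replacement, assuming a made-up input string; the helper name stripControlChars is hypothetical.

```javascript
// Hypothetical helper: removes every character in the range 0x00-0x1F.
function stripControlChars(str) {
  return str.replace(/[\x00-\x1f]/g, "");
}

// Sample input with NUL, tab, newline and unit separator.
const dirty = "a\x00b\tc\nd\x1fe";
console.log(stripControlChars(dirty)); // abcde
```

Be aware that tab (0x09), line feed (0x0A) and carriage return (0x0D) fall inside this range, so they are removed as well.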

Matching Date format

  • I am trying to match this format: 12/21/2010
  • in the first set (month), the first digit should not be more than 1
  • in the second set (day), the first digit should not be more than 3
  • in the third set (year), 20 stays constant for the next 90 years
  • the last two digits can vary

These are the steps I kept in mind to create the pattern, and it is working:

fom.xdate.value.match(/(0|1)[0-9]\/(0|1|2|3)[0-9]\/20[0-9][0-9]/)
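
The pattern above is not anchored and lets values such as 19/39/2099 through. A possible tighter variant is sketched below; the names loose, strict and isDate are made up for this example, and the stricter ranges are my assumption about what the rules are meant to allow.

```javascript
// The pattern from the notes: unanchored, first digits only loosely limited.
const loose = /(0|1)[0-9]\/(0|1|2|3)[0-9]\/20[0-9][0-9]/;

// Tighter sketch: anchored, month 01-12, day 01-31, year 2000-2099.
const strict = /^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/20[0-9]{2}$/;

// Hypothetical helper wrapping the stricter pattern.
function isDate(value) {
  return strict.test(value);
}

console.log(isDate("12/21/2010"));     // true
console.log(isDate("19/39/2099"));     // false
console.log(loose.test("19/39/2099")); // true
```

Even the stricter version does not know month lengths, so 02/31/2010 still passes; a real date check needs more than a regex.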

Match string with a character as optional

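var protocol = window.location.href.match(/https?:\/\//)
// match() returns an array (or null), so protocol[0] holds "http://" or "https://"
// alert(protocol[0])

The forward slash is the regex delimiter here, so each literal slash is preceded by a backslash. According to w3schools, window.location.protocol returns http: or https: without the two trailing slashes. The reason is that the protocol (scheme) of a URL ends at the colon; the two slashes introduce the host part and are not part of the protocol.

I also tried http(s):// and http(s)?:// in the regex. The first requires the s, so it only matches https://; the second makes the s optional and matches the same text as https?://.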
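
The three variants can be checked outside the browser with a fixed address. This is a sketch; the href value is a made-up example, and the built-in URL class stands in for window.location.

```javascript
// Hypothetical address used in place of window.location.href.
const href = "https://example.com/titles/";

// s? makes the s optional; match() returns an array, the text is at index 0.
const m = href.match(/https?:\/\//);
console.log(m[0]); // https://

// (s) without ? requires the s, so a plain http address does not match.
console.log(/http(s):\/\//.test("http://example.com/"));  // false
console.log(/http(s)?:\/\//.test("http://example.com/")); // true

// The parsed URL reports the protocol with the colon but without the slashes.
console.log(new URL(href).protocol); // https:
```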

Match last word with an optional forward slash

  • window.location.href.match(/titles\/?$/)
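
A quick JavaScript check of this pattern, assuming made-up addresses; the helper name isTitlesPage is hypothetical.

```javascript
// Hypothetical helper: true when the address ends in "titles" or "titles/".
function isTitlesPage(href) {
  // \/? allows one optional trailing slash; $ anchors the match at the end.
  return /titles\/?$/.test(href);
}

console.log(isTitlesPage("http://example.com/titles"));   // true
console.log(isTitlesPage("http://example.com/titles/"));  // true
console.log(isTitlesPage("http://example.com/titles/5")); // false
```

Because of the $ anchor, an address with a query string such as titles?page=2 does not match either.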


Match a quoted text zero or more times

Matches a quoted word (word characters only, no spaces)

"\w*"

Matches any quoted string, including backslash-escaped characters such as \"

"([^"\\]|\\.)*"
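
The difference between the two quoted-text patterns shows up on a line with spaces and escaped quotes. A JavaScript sketch, with a sample line invented for the purpose:

```javascript
// Sample line: a quoted word, a quoted phrase, and a string with escaped quotes.
const line = 'say "hello" and "two words" and "an \\"escaped\\" quote"';

// Quoted run of word characters only.
const words = line.match(/"\w*"/g);
// Any quoted string; \\. consumes a backslash together with the character after it.
const strings = line.match(/"([^"\\]|\\.)*"/g);

console.log(words);          // [ '"hello"' ]
console.log(strings.length); // 3
```

The second pattern treats \" as part of the string instead of as its end, which is why the last quoted string is matched in one piece.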


Match all hyperlinks

I modified this from the pattern above, but there should be a more direct way to find all hyperlinks within quotes.

"http://([^"\\]|\\.)*"
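
One way to collect every quoted link from a piece of markup, sketched in JavaScript. The sample markup is made up, and https? is my extension of the pattern above so that https links are found too.

```javascript
// Sample markup invented for this example.
const html = '<a href="http://example.com/a">A</a> <img src="pic.png"> ' +
             '<a href="https://example.org/b?x=1">B</a>';

// The g flag returns every match; slice(1, -1) drops the surrounding quotes.
const links = (html.match(/"https?:\/\/([^"\\]|\\.)*"/g) || [])
  .map(s => s.slice(1, -1));

console.log(links); // [ 'http://example.com/a', 'https://example.org/b?x=1' ]
```

The || [] guard is there because match() returns null, not an empty array, when nothing is found.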

From the TinyMCE editor: ip, url, ssn, cc, isbn, zip, phone, hexcolor and user


Remove characters from string

PHP

     echo preg_replace("/[^a-z0-9]+/i",'',$value);

Javascript

     val = val.replace(/[^a-z 0-9]+/gi,'');
 
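     where g replaces all occurrences and i makes the match case-insensitive. Note the space inside the character class: the JavaScript version keeps spaces, while the PHP version above removes them.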
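
The effect of the space inside the character class can be seen on a sample value. A JavaScript sketch with a made-up string:

```javascript
// Sample value invented for this example.
const value = "Hello, World! 2010";

// No space in the class: everything except letters and digits is removed.
const compact = value.replace(/[^a-z0-9]+/gi, "");
// Space in the class: spaces are kept.
const spaced = value.replace(/[^a-z 0-9]+/gi, "");

console.log(compact); // HelloWorld2010
console.log(spaced);  // Hello World 2010
```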


Matching first and last character

I have a string like '[sample]' and I want to match [ as the first character and ] as the last, which gives [sample]. If (.*) is
included as a capturing group, you get [sample],sample in an array.

     str.match(/^\[.*\]$/)
     str.match(/^\[(.*)\]$/)
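
What the two calls return can be inspected directly. A short JavaScript sketch using the sample string from above:

```javascript
const str = "[sample]";

// Without a group: the array holds only the whole match.
const whole = str.match(/^\[.*\]$/);
// With a group: index 0 is the whole match, index 1 the text between the brackets.
const inner = str.match(/^\[(.*)\]$/);

console.log(whole[0]);           // [sample]
console.log(inner[0], inner[1]); // [sample] sample
console.log("sample]".match(/^\[.*\]$/)); // null
```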

Match Capital Letters with optional underscore

[A-Z_]{2,}

I want it to match a minimum of two characters. I also want to check how other tools handle case sensitivity: in Notepad++ the pattern is only case-sensitive when the Match case option is ticked manually, and I hope other regex engines do not behave this way.
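
In JavaScript, at least, a pattern is case-sensitive unless the i flag is given. A small sketch on a made-up sentence:

```javascript
// Sample text invented for this example.
const text = "MAX_SIZE is set, max_size is not, A is too short";

// Default: case-sensitive, so only the upper-case name matches.
const upperOnly = text.match(/[A-Z_]{2,}/g);
// With the i flag the class also matches lower-case letters.
const anyCase = text.match(/[A-Z_]{2,}/gi);

console.log(upperOnly);      // [ 'MAX_SIZE' ]
console.log(anyCase.length); // 9
```

The single A is skipped in both cases because {2,} needs at least two characters. Note that the class also accepts a run made of underscores only, such as __.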

Extracting hostname from HTTP_REFERER

Matching newline in multiline text without using DOT with an alternate

Find the length of a string or a range in length with backreference in regex

Capturing and not capturing - Subpatterns

Regular expressions that I should remember

Regex in C language using regcomp and regexec

Reference