Regular expressions
From Code Trash
The following are my own experiments; some are excerpts from other sources.
Matching strings - substrings
Here I am trying to extract the value of the src attribute of any tag.
The condition is that the string sits between the quotes of src="..."; the text I want is the part shown as three dots.
I go a bit further and include http://www.youtube.com, because I want the request URI of a YouTube video.
$arr = array();
// Group 1 remembers the opening quote so that \1 requires the same closing quote; group 2 is the request URI.
preg_match('/src=(["\'])http:\/\/www\.youtube\.com\/(.*?)\1/', $url, $arr);
print_r($arr);
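For comparison, here is a minimal JavaScript sketch of the same extraction. The function name youtubeRequestUri and the sample markup are made up for illustration, and the \1 backreference is an assumption about the intent (the closing quote should be the same kind as the opening one).

```javascript
// Hypothetical helper: returns the part of a YouTube src URL after the host,
// or null when the markup has no matching src attribute.
function youtubeRequestUri(html) {
  // Group 1 is the opening quote, \1 requires the same quote to close,
  // and group 2 lazily captures everything in between.
  const m = html.match(/src=(["'])http:\/\/www\.youtube\.com\/(.*?)\1/);
  return m ? m[2] : null;
}

// Sample markup invented for this example.
console.log(youtubeRequestUri('<embed src="http://www.youtube.com/v/abc123?fs=1">'));
```

The lazy (.*?) stops at the first closing quote, so trailing attributes on the same tag are not swallowed.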
Matching HTML Tags
Matching string between two strings
I wouldn't use regex for this either, but if you must, this expression should work:
<customtag>(.+?)</customtag>
If there won't be any other tags between the two tags, this regex is a little safer, and more efficient:
<customtag>[^<>]*</customtag>
Source http://stackoverflow.com/questions/299942/regex-matching-html-tags-and-extracting-text
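A small JavaScript sketch of the safer expression in use. The helper name textBetweenTags and the sample input are assumptions for illustration only; the forward slash is escaped because it sits inside a regex literal.

```javascript
// Hypothetical helper: collects the text inside every <customtag>...</customtag> pair.
function textBetweenTags(input) {
  // [^<>]* refuses to cross another tag, so pairs that contain nested
  // markup are skipped instead of being matched incorrectly.
  const pattern = /<customtag>([^<>]*)<\/customtag>/g;
  return Array.from(input.matchAll(pattern), (m) => m[1]);
}

console.log(textBetweenTags("<customtag>one</customtag> x <customtag>two</customtag>"));
```

A pair such as <customtag><b>x</b></customtag> yields no match with this version, which is the trade-off for the extra safety.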
Remove Tags Two
Remove JavaScript and CSS: <(script|style).*?</\1>
Remove tags: <.*?>
Source http://stackoverflow.com/questions/181095/regular-expression-to-extract-text-from-html
Remove script and tags like that
The versions above do not match across line breaks, because . does not match a newline by default. The following one does:
<script(.|[\r\n])*?</script>
Use other tag names in place of script. Combining this with the regex above gives:
<(script|style)(.|[\r\n])*?</\1>
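The two steps (drop script and style blocks, then drop the remaining tags) can be sketched in JavaScript as follows. The function name stripHtml is hypothetical, and [\s\S] is swapped in here as a common way to match any character including line breaks; it is not the exact expression from the notes. This is a rough text extractor, not a substitute for a real HTML parser.

```javascript
// Hypothetical helper: removes script/style blocks first, then all other tags.
function stripHtml(html) {
  return html
    // [\s\S] matches any character, including \r and \n; \1 repeats the tag name.
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, "")
    // Whatever tags are left are removed one by one.
    .replace(/<[^>]*>/g, "");
}

console.log(stripHtml("<p>Hi</p><script>\nvar a = 1;\n</script><b>there</b>"));
```

The order matters: if tags were removed first, the script body would be left behind as plain text.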
Numbers
Numbers with trailing dot
I want to find all numbers ending with a dot, for example 1., 12., 123. I tried the following and it worked.
- [0-9]+[\.]
One digit and a dot
I want to match just one digit followed by a dot, and I used the following.
- [0-9][\.]
Other Options
I tried the following and got only matches like 1.0 and 34.34, because here the dot and the digits after it are required. To match whole numbers as well, that part has to be made optional.
- [0-9]+[\.][0-9]+
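A JavaScript sketch that runs the number patterns side by side. The function name findNumbers, the property names, and the sample text are invented for this example; the third pattern is one possible way to make the fractional part optional, not something taken from the notes.

```javascript
// Hypothetical helper: applies each pattern globally and returns the matches.
function findNumbers(text) {
  return {
    // One or more digits followed by a literal dot.
    trailingDot: text.match(/[0-9]+\./g) || [],
    // Digits, a dot, and digits again: the dot is required here.
    decimals: text.match(/[0-9]+\.[0-9]+/g) || [],
    // The non-capturing group with ? makes ".digits" optional.
    optionalFraction: text.match(/[0-9]+(?:\.[0-9]+)?/g) || [],
  };
}

console.log(findNumbers("1. 12. 123. 34.34 7"));
```

Note that the trailing-dot pattern also picks up "34." from 34.34, since nothing in it forbids digits after the dot.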
Remove all control characters
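The pattern [\x00-\x1f] matches the ASCII control characters from 0x00 to 0x1F, including the NUL character. The g flag is needed to replace every occurrence rather than only the first.
str = str.replace(/[\x00-\x1f]/g,'')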
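A short sketch of the difference the g flag makes with this pattern. The function name stripControlChars and the sample string are assumptions for illustration.

```javascript
// Hypothetical helper: removes every ASCII control character (0x00-0x1F).
function stripControlChars(str) {
  return str.replace(/[\x00-\x1f]/g, "");
}

const sample = "a\tb\nc\x00d";
// With g, the tab, the newline, and the NUL are all removed.
console.log(JSON.stringify(stripControlChars(sample)));
// Without g, only the first control character (the tab) is removed.
console.log(JSON.stringify(sample.replace(/[\x00-\x1f]/, "")));
```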
Matching Date format
- I am trying to match this format: 12/21/2010
- in the first set, the first digit should not be more than 1
- in the second set, the first digit should not be more than 3
- in the third set, 20 is constant (it covers 2010 and the next 90 years)
- the last two digits can vary
These are the rules I kept in mind while creating the pattern, and it is working:
fom.xdate.value.match(/(0|1)[0-9]\/(0|1|2|3)[0-9]\/20[0-9][0-9]/)
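The rules above still allow values such as 19/39/2010, and without anchors the pattern also matches inside a longer string. A stricter variant could look like the sketch below; the function name isUsDate and the tighter ranges are my own assumptions, not part of the original pattern. It still does not check calendar validity, so 02/31/2010 passes.

```javascript
// Hypothetical helper: checks the MM/DD/20YY shape with tighter ranges.
function isUsDate(value) {
  // ^ and $ force the whole string to match.
  // Month: 01-12, day: 01-31, year: 2000-2099.
  return /^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/20[0-9]{2}$/.test(value);
}

console.log(isUsDate("12/21/2010")); // true
console.log(isUsDate("19/39/2010")); // false
```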
Match string with a character as optional
var protocol = window.location.href.match(/https?:\/\//) // alert protocol
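Slashes delimit the regex literal, so each literal slash is preceded by a backslash. According to w3schools, window.location.protocol returns http: or https: without the two trailing slashes. The reason is that the property holds only the URL scheme plus its colon; the // introduces the host part that follows.
I also tried http(s):// and http(s)?:// in the regex. Note that http(s):// requires the s, so it only matches https URLs, while http(s)?:// makes it optional, just like https?://.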
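A sketch that can be run outside the browser to compare the variants. The function name protocolWithSlashes and the example.com URLs are placeholders; the ^ anchor is an addition of mine so that only the start of the URL is considered.

```javascript
// Hypothetical helper: returns "http://" or "https://" from the start of a URL,
// or null for any other scheme.
function protocolWithSlashes(url) {
  // s? makes the "s" optional; each slash is escaped inside the regex literal.
  const m = url.match(/^https?:\/\//);
  return m ? m[0] : null;
}

console.log(protocolWithSlashes("https://example.com/titles")); // "https://"
// The URL object's protocol property stops at the colon.
console.log(new URL("https://example.com/titles").protocol);    // "https:"
// Without the ?, the group (s) is required, so a plain http URL fails.
console.log(/http(s):\/\//.test("http://example.com/"));        // false
```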
Match last word with an optional forward slash
- window.location.href.match(/titles\/?$/)
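The same check as a plain function, so it can be tried on sample strings. The name endsWithTitles and the URLs are made up for illustration.

```javascript
// Hypothetical helper: true when the URL ends in "titles" or "titles/".
function endsWithTitles(url) {
  // \/? allows zero or one trailing slash; $ anchors the match at the end.
  return /titles\/?$/.test(url);
}

console.log(endsWithTitles("http://example.com/titles"));    // true
console.log(endsWithTitles("http://example.com/titles/"));   // true
console.log(endsWithTitles("http://example.com/titles/42")); // false
```

Since nothing anchors the start of the word, a URL ending in "subtitles" also passes; prefixing the pattern with an escaped slash would tighten it.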
Match a quoted text zero or more times
Matches a quoted string made up only of word characters
"\w*"
Matches any quoted string
"([^"\\]|\\.)*"
Match all hyperlinks
I modified this from the pattern above, but there should be a more direct way to find all hyperlinks within quotes.
"http://([^"\\]|\\.)*"
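A JavaScript sketch that applies both quoted-string patterns to the same text. The function names and the sample text are assumptions made for this example.

```javascript
// Hypothetical helpers built on the two patterns above.
function quotedStrings(text) {
  // Inside the quotes: either a character that is not a quote or backslash,
  // or a backslash followed by any character (an escape such as \").
  return text.match(/"([^"\\]|\\.)*"/g) || [];
}

function quotedLinks(text) {
  // Same idea, but the quoted text has to start with http://
  return text.match(/"http:\/\/([^"\\]|\\.)*"/g) || [];
}

const text = 'say "hello" and "a \\"quoted\\" word" then "http://example.com/a"';
console.log(quotedStrings(text)); // three quoted strings, escapes kept intact
console.log(quotedLinks(text));   // only the quoted link
```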
From Tiny MCE editor ip, url, ssn, cc, isbn, zip, phone, hexcolor and user
Remove characters from string
PHP
echo preg_replace("/[^a-z0-9]+/i",'',$value);
Javascript
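val = val.replace(/[^a-z 0-9]+/gi,'');
Here g replaces all occurrences and i makes the match case-insensitive. Note the space inside the character class: this version keeps spaces, unlike the PHP version above.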
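The effect of the space inside the character class can be seen by running both classes on the same input. The function names and the sample string are invented for this sketch.

```javascript
// Hypothetical helpers: the only difference is the space inside the class.
function keepAlphanumeric(value) {
  // Removes everything except letters and digits.
  return value.replace(/[^a-z0-9]+/gi, "");
}

function keepAlphanumericAndSpaces(value) {
  // The space in the class means spaces are kept as well.
  return value.replace(/[^a-z 0-9]+/gi, "");
}

console.log(keepAlphanumeric("Hello, World! 42"));          // "HelloWorld42"
console.log(keepAlphanumericAndSpaces("Hello, World! 42")); // "Hello World 42"
```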
Matching first and last character
I have a string like '[sample]' and I want to match [ as the first character and ] as the last, which gives [sample]. If (.*) is included as a capturing group, the result array holds both [sample] and sample.
str.match(/^\[.*\]$/)
str.match(/^\[(.*)\]$/)
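A small sketch of the capturing version wrapped in a function. The name unwrapBrackets is an assumption for illustration.

```javascript
// Hypothetical helper: returns the text between a leading [ and a trailing ],
// or null when the string is not wrapped in brackets.
function unwrapBrackets(str) {
  // ^\[ anchors the opening bracket at the start, \]$ the closing one at the end;
  // (.*) captures whatever lies between them.
  const m = str.match(/^\[(.*)\]$/);
  return m ? m[1] : null;
}

console.log(unwrapBrackets("[sample]"));                // "sample"
console.log(String("[sample]".match(/^\[(.*)\]$/)));    // "[sample],sample"
```

Index 0 of the match array is the whole match and index 1 is the first group, which is why the array prints as [sample],sample.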
