Regular expressions

From Code Trash

The following are my own experiments; some are excerpts from other sources.

Regex for ASCII Characters

A regular expression can match a character by its code using the escape \xHH, where HH is a two-digit hexadecimal character code from 00 to FF. Strictly speaking, ASCII covers only 00 to 7F; the codes 80 to FF belong to extended 8-bit character sets. A range of characters can be matched by placing two such codes, separated by a hyphen, inside square brackets.

/[\x00-\xFF]/

The expression above will match all characters from NUL (hex code 00) to ÿ (hex code FF, decimal 255). These characters are divided into three groups:

  • 33 control characters (hex codes 00 to 1F, as well as 7F)
  • 95 printable characters (hex codes 20 to 7E)
  • 128 extended characters (hex codes 80 to FF)

Note that the first 32 characters (00 to 1F) as well as 7F are control characters and can often be omitted. This requires specifying two character ranges that exclude these characters:

/[\x20-\x7E\x80-\xFF]/

Source https://regexland.com/ascii/
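
As a rough illustration of the two ranges above, here is a small JavaScript sketch. The function names are made up for this example. It assumes JavaScript string semantics, where \x80 to \xFF match the code points U+0080 to U+00FF.

```javascript
// Printable ASCII only: space (0x20) through tilde (0x7E).
const printableAscii = /^[\x20-\x7E]*$/;

// Same idea with the extended range 0x80-0xFF allowed as well.
const printableExtended = /^[\x20-\x7E\x80-\xFF]*$/;

// Hypothetical helper names, not taken from the source article.
function isPrintableAscii(text) {
  return printableAscii.test(text);
}

function isPrintableExtended(text) {
  return printableExtended.test(text);
}

console.log(isPrintableAscii('Hello, world!'));  // true
console.log(isPrintableAscii('tab\there'));      // false (0x09 is a control character)
console.log(isPrintableAscii('caf\u00E9'));      // false (é is 0xE9)
console.log(isPrintableExtended('caf\u00E9'));   // true
```

The patterns are anchored with ^ and $ so that the whole string is checked, not just one character somewhere in it.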

Matching strings - substrings

Here I am trying to extract the value of the src attribute of any tag, that is, the text represented by the three dots in src="...".
I go a bit further and include http://www.youtube.com in the pattern, because I want to extract the request URI of a YouTube video.


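	$arr = array();
	// Group 1 is the opening quote; \1 requires the same quote to close the value.
	// Group 2 is the request URI after the host. The dots in the host are escaped.
	preg_match('/src=(["\'])http:\/\/www\.youtube\.com\/(.*?)\1/', $url, $arr);
	print_r($arr);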
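
For comparison, a JavaScript sketch of the same extraction. The sample markup and the function name extractYoutubePath are assumptions made up for this example. It uses a backreference so that the closing quote must be the same character as the opening one.

```javascript
// Hypothetical input; in the PHP snippet this would be the content of $url.
const html = '<embed src="http://www.youtube.com/v/abc123" type="application/x-shockwave-flash">';

// Group 1: the opening quote. Group 2: the path after the host.
// \1 requires the closing quote to match the opening quote.
const srcPattern = /src=(["'])http:\/\/www\.youtube\.com\/(.*?)\1/;

function extractYoutubePath(markup) {
  const match = markup.match(srcPattern);
  return match ? match[2] : null;
}

console.log(extractYoutubePath(html)); // "v/abc123"
```

The function returns null when the markup has no matching src attribute, so the caller can tell "no match" apart from an empty path.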

Matching HTML Tags

Matching string between two strings

I wouldn't use regex for this either, but if you must, this expression should work:

<customtag>(.+?)</customtag>

If there won't be any other tags between the two tags, this regex is a little safer, and more efficient:

<customtag>[^<>]*</customtag>

Source http://stackoverflow.com/questions/299942/regex-matching-html-tags-and-extracting-text
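
A short JavaScript sketch comparing the two expressions. The sample text and the helper name captures are made up, and a capture group has been added to the second pattern (an assumption, not part of the quoted answer) so that both return the inner text.

```javascript
const text = '<customtag>first</customtag> and <customtag>second</customtag>';

// Lazy version: captures anything up to the nearest closing tag.
const lazy = /<customtag>(.+?)<\/customtag>/g;
// Stricter version: the content may not contain < or >.
const strict = /<customtag>([^<>]*)<\/customtag>/g;

// Hypothetical helper: collect group 1 of every match.
function captures(pattern, input) {
  return Array.from(input.matchAll(pattern), (m) => m[1]);
}

console.log(captures(lazy, text));   // [ 'first', 'second' ]
console.log(captures(strict, text)); // [ 'first', 'second' ]

// With nested markup only the lazy version matches.
const nested = '<customtag>a<b>c</b></customtag>';
console.log(captures(lazy, nested));   // [ 'a<b>c</b>' ]
console.log(captures(strict, nested)); // []
```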

Remove Tags Two

Remove javascript and CSS:

<(script|style).*?</\1>

Remove tags

<.*?>

---

Source http://stackoverflow.com/questions/181095/regular-expression-to-extract-text-from-html

Remove script and tags like that

The versions above use the dot, which does not match a newline by default, so they miss script blocks that span several lines.
The following version also matches across Windows (\r\n) line breaks.

<script(.|\r\n)*?</script>

Other tags can be used in place of script. For input with bare \n line breaks, (.|\r\n) can be replaced with [\s\S]. Combining this with the regex above gives:

<(script|style)(.|\r\n)*?</\1>
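
A JavaScript sketch that puts the pieces together. It swaps in [\s\S] for (.|\r\n) and <[^>]*> for <.*?>, so both patterns work across any kind of line break. The function name and sample markup are made up. Regex-based stripping like this is fragile on real-world HTML, so treat it as an experiment rather than a sanitizer.

```javascript
// [\s\S] matches any character, including \r and \n.
const scriptOrStyle = /<(script|style)[\s\S]*?<\/\1>/gi;
// Remaining tags; [^>]* also spans newlines inside a tag.
const anyTag = /<[^>]*>/g;

// Hypothetical helper: drop script/style blocks first, then all other tags.
function stripTags(html) {
  return html.replace(scriptOrStyle, '').replace(anyTag, '');
}

const sample = '<p>Hello</p>\n<script>\nvar x = 1;\n</script>\n<b>world</b>';
console.log(JSON.stringify(stripTags(sample))); // "Hello\n\nworld"
```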

Numbers

Numbers with trailing dot

I want to find all numbers ending with a dot, for example 1., 12. and 123. I tried the following and it worked.

  • [0-9]+[\.]

One digit and a dot

I want to find just one digit followed by a dot, and I used the following.

  • [0-9][\.]

Other Options

I tried the following and got only matches like 1.0 and 34.34, with digits on both sides of the dot. To also match whole numbers, the dot and the digits after it would have to be made optional.

  • [0-9]+[\.][0-9]+
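
The number patterns above can be tried out with a small JavaScript sketch. The sample text and variable names are made up, and the last pattern with an optional fraction is an assumption about what "optional" was meant to look like.

```javascript
const text = 'Items 1. 12. 123. cost 1.0 and 34.34 of 7';

const trailingDot = /[0-9]+\./g;                 // digits followed by a dot
const decimal = /[0-9]+\.[0-9]+/g;               // digits on both sides of the dot
const optionalFraction = /[0-9]+(?:\.[0-9]+)?/g; // the fraction is optional

console.log(text.match(trailingDot));      // [ '1.', '12.', '123.', '1.', '34.' ]
console.log(text.match(decimal));          // [ '1.0', '34.34' ]
console.log(text.match(optionalFraction)); // [ '1', '12', '123', '1.0', '34.34', '7' ]
```

Note that the trailing-dot pattern also picks up the integer part of 1.0 and 34.34, since nothing in it forbids digits after the dot.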

Remove all control characters

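The pattern [\x00-\x1f] matches the control characters 00 to 1F, including the NUL character. It does not cover DEL (7F). The g flag is needed to replace every occurrence rather than only the first.

str = str.replace(/[\x00-\x1f]/g,'')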
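
A slightly extended JavaScript sketch that also removes DEL (7F) and replaces every occurrence by using the g flag. The function name is made up. Tabs and newlines are control characters too, so they are removed as well.

```javascript
// 0x00-0x1F plus DEL (0x7F); the g flag replaces every occurrence.
const controlChars = /[\x00-\x1f\x7f]/g;

// Hypothetical helper name.
function stripControlChars(text) {
  return text.replace(controlChars, '');
}

console.log(JSON.stringify(stripControlChars('a\x00b\tc\nd\x7fe'))); // "abcde"
```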

Matching Date format

  • I am trying to match the format 12/21/2010 (MM/DD/YYYY)
  • in the first set (month), the first digit should not be more than 1
  • in the second set (day), the first digit should not be more than 3
  • in the third set (year), 20 stays constant for the next 90 years
  • the last two digits can vary

These are the steps I kept in mind to create this, and it is working. Note that the pattern only limits the first digit of each set, so values such as 19/39/2010 also match.

fom.xdate.value.match(/(0|1)[0-9]\/(0|1|2|3)[0-9]\/20[0-9][0-9]/)
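
A stricter variant, sketched in JavaScript under the assumption that the format is MM/DD/YYYY with years 2000 to 2099. The function name is made up. The pattern is anchored, and the month and day ranges are limited to 01-12 and 01-31.

```javascript
// MM: 01-12, DD: 01-31, YYYY: 2000-2099.
// Anchored with ^ and $ so nothing else may surround the date.
const datePattern = /^(0[1-9]|1[0-2])\/(0[1-9]|[12][0-9]|3[01])\/20[0-9]{2}$/;

// Hypothetical helper name.
function isValidDateFormat(value) {
  return datePattern.test(value);
}

console.log(isValidDateFormat('12/21/2010')); // true
console.log(isValidDateFormat('19/39/2010')); // false
console.log(isValidDateFormat('02/31/2010')); // true: the pattern does not know month lengths
```

A regex can only check the shape of the date. Whether the day really exists in that month needs a separate check.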

Match string with a character as optional

var protocol = window.location.href.match(/https?:\/\//)
// alert protocol

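Here the slashes are the regex delimiters, so each literal slash is preceded by a backslash. When I checked w3schools, window.location.protocol returns http: or https: but not the two trailing slashes. The reason is that the protocol property holds only the URL scheme followed by a colon; the slashes are not part of it.

I also tried http(s):// and http(s)?:// in the regex. The first matches only https://, because the s is required; the second makes the s optional, like https?://.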
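
A JavaScript sketch of the difference between an optional and a required s. The function names and the example.com URLs are made up and stand in for window.location.href. The last line uses the standard URL class to show what the protocol property holds.

```javascript
// s is optional: matches both http:// and https://
function schemeWithSlashes(href) {
  const match = href.match(/https?:\/\//);
  return match ? match[0] : null;
}

// s is required: matches https:// only
function httpsOnly(href) {
  const match = href.match(/http(s):\/\//);
  return match ? match[0] : null;
}

console.log(schemeWithSlashes('http://example.com/titles'));  // "http://"
console.log(httpsOnly('http://example.com/titles'));          // null
console.log(httpsOnly('https://example.com/titles'));         // "https://"
console.log(new URL('https://example.com/titles').protocol);  // "https:"
```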

Match last word with an optional forward slash

  • window.location.href.match(/titles\/?$/)
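
A quick JavaScript check of this pattern against a few made-up URLs. Because nothing is required before titles, a path ending in subtitles matches as well.

```javascript
const endsWithTitles = /titles\/?$/;

console.log(endsWithTitles.test('http://example.com/titles'));    // true
console.log(endsWithTitles.test('http://example.com/titles/'));   // true
console.log(endsWithTitles.test('http://example.com/titles/5'));  // false
console.log(endsWithTitles.test('http://example.com/subtitles')); // true
```

If only a whole path segment should match, one option is to require the preceding slash, as in /\/titles\/?$/.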


Match a quoted text zero or more times

Matches a single quoted word (letters, digits and underscores only, possibly empty)

"\w*"

Matches any quoted string

"([^"\\]|\\.)*"

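A JavaScript sketch comparing the two quoted-text patterns on a made-up sample. The sample contains a plain quoted word, a quoted phrase with a space, and a quoted phrase with escaped quotes inside.

```javascript
// The string contains literal backslashes before the inner quotes.
const text = 'say "hello" then "two words" and "an \\"escaped\\" quote"';

const quotedWord = /"\w*"/g;              // word characters only
const quotedString = /"([^"\\]|\\.)*"/g;  // anything, with backslash escapes

const words = text.match(quotedWord);
const strings = text.match(quotedString);

console.log(words);   // [ '"hello"' ]
console.log(strings);
```

The second pattern finds all three quoted strings, including the one with escaped quotes, because \\. consumes a backslash together with the character after it.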

Match all hyperlinks

I modified this from the pattern above, but there should be a more direct way to find all hyperlinks within quotes.

"http://([^"\\]|\\.)*"

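One possibly more direct approach, sketched in JavaScript: capture everything between the quotes that starts with http:// or https://. The sample markup and the function name are made up, and the sketch assumes the URLs contain no escaped quotes.

```javascript
const html = '<a href="http://example.com/a">A</a> <img src="https://example.org/b.png"> <a href="/local">L</a>';

// Group 1 is the URL without the surrounding quotes.
const quotedUrl = /"(https?:\/\/[^"]*)"/g;

// Hypothetical helper: collect every quoted absolute URL.
function quotedLinks(markup) {
  return Array.from(markup.matchAll(quotedUrl), (m) => m[1]);
}

console.log(quotedLinks(html)); // [ 'http://example.com/a', 'https://example.org/b.png' ]
```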
From Tiny MCE editor ip, url, ssn, cc, isbn, zip, phone, hexcolor and user

ip url ssn cc isbn zip phone hexcolor user

Remove characters from string

PHP

     echo preg_replace("/[^a-z0-9]+/i",'',$value);

Javascript

     val = val.replace(/[^a-z 0-9]+/gi,'');
 
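     where g replaces all occurrences and i makes the match case-insensitive. Note that the JavaScript pattern has a space inside the brackets, so spaces are kept, unlike in the PHP version.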
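
A small JavaScript sketch showing the effect of the space inside the character class, using a made-up sample value.

```javascript
const value = 'Hello, World! 2010';

// No space in the class: spaces are removed too.
const lettersAndDigits = value.replace(/[^a-z0-9]+/gi, '');
// Space in the class: spaces are kept.
const withSpaces = value.replace(/[^a-z 0-9]+/gi, '');

console.log(lettersAndDigits); // "HelloWorld2010"
console.log(withSpaces);       // "Hello World 2010"
```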


Matching first and last character

I have a string like '[sample]' and I want to match [ as the first character and ] as the last, which gives [sample]. If (.*) is included as a capture group, the result array contains both [sample] and sample.

     str.match(/^\[.*\]$/)
     str.match(/^\[(.*)\]$/)
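
The two calls can be compared with a short JavaScript sketch. The variable names are made up.

```javascript
const str = '[sample]';

const whole = str.match(/^\[.*\]$/);   // no capture group
const inner = str.match(/^\[(.*)\]$/); // group 1 holds the text inside the brackets

console.log(whole[0]);                    // "[sample]"
console.log(inner[0], inner[1]);          // "[sample]" "sample"
console.log('sample]'.match(/^\[.*\]$/)); // null (no leading bracket)
```

Since match returns null when nothing matches, the result should be checked before indexing into it.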

Match Capital Letters with optional underscore

[A-Z_]{2,}

I want it to match a minimum of two characters. I also want to check how other tools handle case sensitivity: in Notepad++ the pattern matches lowercase letters too unless the Match case option is checked manually, and I hope this is not the behaviour of other regex parsers.
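
In JavaScript, for comparison, matching is case-sensitive unless the i flag is given. A small sketch with a made-up sample text:

```javascript
const text = 'Set MAX_SIZE and min_size, A or AB';

// Case-sensitive by default: only runs of two or more capitals/underscores.
const sensitive = text.match(/[A-Z_]{2,}/g);
// With the i flag, lowercase runs match as well.
const insensitive = text.match(/[A-Z_]{2,}/gi);

console.log(sensitive);   // [ 'MAX_SIZE', 'AB' ]
console.log(insensitive); // [ 'Set', 'MAX_SIZE', 'and', 'min_size', 'or', 'AB' ]
```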

Extracting hostname from HTTP_REFERRER

Matching newline in multiline text without using DOT with an alternate

Find the length of a string or a range in length with backreference in regex

Capturing and not capturing - Subpatterns

Regular expressions that i should remember

Regex in C language using regcomp and regexec

Match a html content which has many children

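(function(){
// Match from a <fieldset whose line also contains gene_div up to the last </fieldset>.
// [\s\S]+ is greedy and matches across newlines, so all nested children are included.
var form = document.getElementById('retailform');
var found = form.innerHTML.match(/<fieldset(.*?)gene_div(.*?)[\s\S]+<\/fieldset>/);
if (!found) { return; } // nothing matched, leave the page untouched
var ele = document.createElement('div');
ele.innerHTML = found[0];
form.appendChild(ele);
document.getElementById('keyword_search').disabled = false;
})()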

Reference