When we process the source code of an HTML page, we often need to retrieve the page's meta description in addition to the links in the page. This description is usually located in a <meta> tag of the HTML page. The meta description is useful to search engines when they index the page. How can we retrieve the meta description? With a regular expression, we can get it easily.
In JavaScript, the regular expression looks like this:
var pattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;
The description is the value of the content attribute in the <meta> tag whose name attribute has the value description. So we need to find this tag and then use parentheses to capture the description for later retrieval. We also use a | character to separate two sub-patterns, because the two attributes can appear in either order: name before content (the first sub-pattern) or content before name (the second sub-pattern).
Suppose now we have a sample code snippet which contains:
var data='<meta name="description" content="This is a sample code snippet">';
Then we run:
var arr=pattern.exec(data);
The returned arr is an array with 3 elements if the pattern matches, and null if it does not. The first element, arr[0], is the matched text in the data variable; arr[1] is the text captured by the first pair of parentheses in the pattern; arr[2] is the text captured by the second pair of parentheses. If the first sub-pattern matched, arr[1] will contain the description and arr[2] will be undefined. Otherwise, arr[1] will be undefined and arr[2] will contain the description.
In the above case, arr[0] will be <meta name="description" content="This is a sample code snippet">, arr[1] will be This is a sample code snippet, and arr[2] will be undefined.
In conclusion, to get the meta description, first make sure arr is not null, then check whether arr[1] is undefined: if it is, the description is arr[2]; otherwise it's arr[1].
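To see the second sub-pattern at work, here is a small sketch with the attribute order reversed. The variable names reversed and arr2 are made up for this illustration, and the pattern is repeated only so the snippet runs on its own.

```javascript
// Same pattern as above, repeated so this snippet is self-contained.
var pattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;

// Hypothetical sample: content comes before name this time.
var reversed = '<meta content="This is a sample code snippet" name="description">';
var arr2 = pattern.exec(reversed);

// The first sub-pattern did not take part in the match, so its group is undefined.
console.log(arr2[1]); // undefined
// The second sub-pattern captured the description.
console.log(arr2[2]); // This is a sample code snippet
```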
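The check described above can be wrapped in a small helper. This is only a sketch: the function name getMetaDescription is made up here, and returning null when nothing is found is an assumption about what the caller wants.

```javascript
// Same pattern as above, repeated so this snippet is self-contained.
var pattern = /<meta.*?name="description".*?content="(.*?)".*?>|<meta.*?content="(.*?)".*?name="description".*?>/i;

// Hypothetical helper: returns the description text,
// or null when the page has no matching <meta> tag.
function getMetaDescription(html) {
  var arr = pattern.exec(html);
  if (arr === null) {
    return null; // no description tag found
  }
  // Compare against undefined rather than testing truthiness,
  // so that an empty description (content="") is still returned as "".
  return arr[1] !== undefined ? arr[1] : arr[2];
}

var data = '<meta name="description" content="This is a sample code snippet">';
console.log(getMetaDescription(data)); // This is a sample code snippet
console.log(getMetaDescription('<p>No meta tag here</p>')); // null
```

Because the pattern is written without the g flag, calling exec repeatedly always starts searching from the beginning of the string, so the helper can be reused on different pages.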