Get hostname from a URL using JavaScript
In JavaScript, we can use a regular expression like
var pattern=/(.+:\/\/)?([^\/]+)(\/.*)*/i;
This pattern can be used to get the hostname. It contains three pairs of parentheses, each forming a capture group. When the pattern is run against a target string, the text matched by each group is remembered and returned in an array, and we can read the hostname from that array. The first group matches the protocol of the URL, such as http://, https://, ftp:// or file://. The ? after it means the protocol may appear zero times or once. The second group matches the hostname: everything after the protocol up to the first '/'. If there is no '/', the whole string after the protocol is the hostname. The third group matches everything after the hostname.
For example, if we have a URL string
var url="http://www.example.com/aboutus.html";
After we run
var arr=pattern.exec(url);
The returned array arr contains 4 elements. arr[0] is the whole matched string, http://www.example.com/aboutus.html. arr[1] is http://, the text matched by the first group. arr[2] is the hostname www.example.com, matched by the second group. arr[3] is /aboutus.html, matched by the third group.
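The walkthrough above can be run as a short script. This is a minimal sketch that reuses the pattern and URL from the text; only the console.log calls are added for illustration.

```javascript
// Pattern from the article: optional protocol, hostname, optional rest.
var pattern = /(.+:\/\/)?([^\/]+)(\/.*)*/i;
var url = "http://www.example.com/aboutus.html";

// exec() returns an array: the full match followed by one entry per group.
var arr = pattern.exec(url);

console.log(arr.length); // 4
console.log(arr[0]);     // http://www.example.com/aboutus.html
console.log(arr[1]);     // http://
console.log(arr[2]);     // www.example.com
console.log(arr[3]);     // /aboutus.html
```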
What if we don't have http:// at the beginning of a URL? We can still use this pattern, and it still returns an array of 4 items. The only difference is that arr[1] is undefined, because the first group matched nothing. The same applies if the URL has no /index.html or similar path after the hostname: in that case arr[3] is undefined.
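Both cases can be checked directly. The sketch below uses two sample URLs chosen for illustration (they are assumptions, not from the original text) and shows that in JavaScript an optional group that matches nothing is reported as undefined rather than as an empty string.

```javascript
var pattern = /(.+:\/\/)?([^\/]+)(\/.*)*/i;

// No protocol: group 1 does not take part in the match.
var noProtocol = pattern.exec("www.example.com/aboutus.html");
console.log(noProtocol.length); // 4
console.log(noProtocol[1]);     // undefined
console.log(noProtocol[2]);     // www.example.com
console.log(noProtocol[3]);     // /aboutus.html

// No path: group 3 does not take part in the match.
var noPath = pattern.exec("http://www.example.com");
console.log(noPath.length); // 4
console.log(noPath[1]);     // http://
console.log(noPath[2]);     // www.example.com
console.log(noPath[3]);     // undefined
```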
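So for URLs of these forms, we can get the hostname with arr[2]. Note that arr[2] is everything between the protocol and the first '/', so it also includes a port number such as :8080 if the URL has one. Hopefully this helps when you want to know which host a URL belongs to. The pattern can also be used in other programming languages that support this regular expression syntax.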
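If the lookup is needed in several places, it can be wrapped in a small helper. This is only a sketch: the function name getHostname and the choice to return null when nothing matches are assumptions made for illustration, not part of the original article.

```javascript
// Hypothetical helper: returns the text captured by the second group,
// or null when the pattern does not match at all (for example, "").
function getHostname(url) {
  var pattern = /(.+:\/\/)?([^\/]+)(\/.*)*/i;
  var arr = pattern.exec(url);
  return arr ? arr[2] : null;
}

console.log(getHostname("https://www.example.com/aboutus.html")); // www.example.com
console.log(getHostname("www.example.com"));                      // www.example.com
console.log(getHostname(""));                                     // null
```

Returning null keeps the "no match" case distinct from a real hostname, so callers can test for it explicitly.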