You can also see the other urls2links function, urls2linksSimple,
and the rest of my projects & resources.

urls2linksComplex - Link & E-mail Address Converting Function


Description

This page demonstrates a PHP function that turns all URLs in a string into hyperlinks. It is intended for use in scripts such as forum or message board software, so that URLs typed into messages are automatically turned into hyperlinks.

string urls2linksComplex(string $text, string $schemes = null, string $tlds = 'normal')

It searches for all URLs in the string $text, with or without a protocol (e.g. 'www.bbc.co.uk' or 'http://www.google.com'), finding the entire address: host, port, path, query string and fragment. You can configure which schemes (protocols, e.g. http, ftp) and which TLDs (e.g. .com, .co.uk) it accepts. See the explanation of the parameters below.

If there is a punctuation mark directly after the URL, it excludes that from the link. Punctuation marks being included in links is a common problem on message boards.
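
As a rough illustration of the punctuation rule, the sketch below uses a much-simplified pattern rather than the full expression from the function. It assumes only http and https URLs, and the name linkifySketch is hypothetical. The URL body is matched lazily and must be followed by whitespace, a '<', the end of the text, or a single punctuation mark that is itself followed by one of those.

```php
<?php
// Simplified sketch of the trailing-punctuation rule (NOT the full function).
// Assumption: only http:// and https:// URLs are handled here.
function linkifySketch($text){
  // Lazy URL body, then a lookahead: optional punctuation mark (not a word
  // character or slash) followed by whitespace, '<', or the end of the text.
  $pattern = '~\b(https?://[^\s<]+?)(?=[^\w/]?(?:[\s<]|$))~';
  return preg_replace($pattern, '<a href="$1">$1</a>', $text);
}

echo linkifySketch('See http://www.google.com/search?q=php, then reply.');
// See <a href="http://www.google.com/search?q=php">http://www.google.com/search?q=php</a>, then reply.
```

The comma stays outside the link, while the '?' and '=' inside the query string are kept because they are not followed by whitespace.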

If there are URLs inside HTML tags (for example the src of an <img> tag) then the function will ignore them, unless there is a space at both ends of the URL. Try experimenting with HTML tags in the demonstration box below.

The PHP function code, explanation of the parameters and license info are available lower down on this page.

See also:

The simpler urls2links function, which cannot recognise e-mail addresses, but is only a few lines of code: urls2linksSimple

Technical information:

This function finds any URIs compliant with RFC 3986, with a few exceptions:

  • The scheme (protocol) is not required. It is assumed to be 'http' or 'mailto' as appropriate.
  • It does not recognise IPv6 addresses. This would make the regular expression much more complicated with not much additional benefit.
  • The host must be a DNS formatted domain name, or an IPv4 address. In the case of a DNS formatted domain name, the top-level-domain must be between two and six characters.
  • An 'authority' (host or domain) is required, except in the case of e-mail addresses.
  • It does not recognise relative addresses.
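
The host rules in the list above can be tried in isolation. The sketch below reuses the $decOctet, $ipv4 and $regname fragments from the function code further down this page. The wrapper isAcceptedHost and the anchors around the pattern are additions for this example only, so that the whole string has to match.

```php
<?php
// Hypothetical helper for illustration; the fragments are copied from the
// function code on this page, the ^...$ anchors are added for this sketch.
function isAcceptedHost($host){
  $decOctet = '(?:[0-9]|[0-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])';
  $ipv4 = '(?:'.$decOctet.'\.){3}'.$decOctet;
  $regname = '(?:(?:[0-9A-Za-z][0-9A-Za-z\-]*[0-9A-Za-z]|[0-9A-Za-z])\.)+[a-zA-Z]{2,6}';
  return preg_match('/^(?:'.$ipv4.'|'.$regname.')$/D', $host) === 1;
}

var_dump(isAcceptedHost('www.bbc.co.uk'));      // bool(true)  DNS name, two-letter TLD
var_dump(isAcceptedHost('192.168.0.1'));        // bool(true)  IPv4 address
var_dump(isAcceptedHost('example.museum'));     // bool(true)  six-letter TLD
var_dump(isAcceptedHost('example.technology')); // bool(false) TLD longer than six letters
var_dump(isAcceptedHost('[::1]'));              // bool(false) IPv6 is not recognised
var_dump(isAcceptedHost('localhost'));          // bool(false) no dot, so no TLD
```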

Demonstration

Enter text with URLs in it:


Schemes (Protocols):

The PHP function code:

Please feel free to use this code in your own scripts, and to modify it to suit your purposes. I've made it simple (you pass it one variable and it passes one back), but you may want it to work in a different way.
If you do use this function, please leave the note with my details intact. Thank you.

function urls2linksComplex($text, $schemes = null, $tlds = 'normal'){
  //"urls2links - Complex" function by Martin Pain / m-bread ( http://m-bread.com/resources/php/functions/urls2linksComplex )
  //This function can be distributed under the Creative Commons Attribution-Share Alike 2.0 UK: England & Wales License
  //( http://creativecommons.org/licenses/by-sa/2.0/uk/ )
  //Please leave these comments intact.
  if($schemes === 'normal'){
    $scheme = '(?:[Hh][Tt]|[Ff])[Tt][Pp][Ss]?';
  }elseif( is_array($schemes) ){
    $scheme = '(?:' . implode('|', $schemes) . ')';
  }elseif( is_string($schemes) ){
    $scheme = $schemes;
  }else{
    $scheme = '[a-zA-Z][a-zA-Z0-9\-+.]*';
  };//EoIF
  if($tlds === 'normal'){
    $tldExclude = array('doc', 'xls', 'txt', 'rtf', 'jpeg', 'jpg', 'gif', 'png', 'exe', 'html', 'htm', 'zip', 'gz', 'scr', 'rar', 'php', 'php3', 'inc', 'ico', 'bmp', 'asp', 'jsp', 'dat', 'lnk', 'cab', 'csv', 'xml', 'xsl', 'xsd', 'svg', 'psp', 'psd', 'pdf', 'bak', 'wav', 'mp3', 'm4v', 'midi', 'wmv', 'wma', 'js', 'css', 'ppt', 'pps', 'mdb');
  }elseif( is_array($tlds) ){
    $tldExclude = $tlds;
  }elseif( is_string($tlds) ){
    $tldExclude = array($tlds);
  }else{
    $tldExclude = array();
  };//EoIF
    $userinfo = '(?:(?:[a-zA-Z0-9\-._~!$&\'()*+,;=:]|%[0-9A-Fa-f]{2})*@)?';
      $decOctet = '(?:[0-9]|[0-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])';
     $ipv4 = '(?:'.$decOctet.'\.){3}'.$decOctet;
     $regname = '(?:(?:[0-9A-Za-z][0-9A-Za-z\-]*[0-9A-Za-z]|[0-9A-Za-z])\.)+[a-zA-Z]{2,6}';
    $host = '('.$ipv4.'|'.$regname.')';
    $port = '(?::[0-9]*)?';
   $authority = '((?://)?'.$userinfo.$host.$port.')';
   $path = '(?:/(?:[a-zA-Z0-9\-._~!$&\'()*+,;=:]|%[0-9A-Fa-f]{2})*?)*';
   $query = '(?:\?(?:[a-zA-Z0-9\-._~!$&\'()*+,;=:/?]|%[0-9A-Fa-f]{2})*?)?';
   $fragment = '(?:#(?:[a-zA-Z0-9\-._~!$&\'()*+,;=:/?]|%[0-9A-Fa-f]{2})*?)?';
  $pattern = '\b(('.$scheme.'\:)?'.$authority.$path.$query.$fragment.')($|[^\w/][<\s]|[<\s]|[^\w/]$)';
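  //The /e modifier was removed in PHP 7.0, so the link is built in a callback instead.
  //Groups: 1 = whole URL, 2 = scheme with colon, 3 = authority, 4 = host, 5 = trailing character(s).
  $callback = function($m) use ($tldExclude){
    //Leave the text untouched if the last label of the host is an excluded TLD.
    if( in_array( substr($m[4], strrpos($m[4], '.')+1), $tldExclude) ){
      return $m[0];
    };//EoIF
    if($m[2] == ''){
      //No scheme given: assume mailto for e-mail addresses, http otherwise.
      $href = ( (strpos($m[3], '@') !== false) ? 'mailto:' : 'http://' ) . $m[1];
    }else{
      $href = $m[1];
    };//EoIF
    return '<a href="'.$href.'">'.$m[1].'</a>'.$m[5];
  };//EoFn callback
  return preg_replace_callback('/'.str_replace('/', '\x2F', $pattern).'/', $callback, $text);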
};//EoFn urls2links

Parameters

$text

This parameter is the text containing the URLs to be converted to links. To convert line breaks to <br/> tags, use the nl2br(string) function on the text before passing it to this function. For example: urls2linksComplex(nl2br($text))

$schemes

This optional parameter provides the instructions about which schemes (protocols) to accept. Be careful when restricting schemes, because it can cause unwanted effects when someone tries to use a scheme which isn't allowed - it will replace the scheme with http:// or place http:// before the scheme they were trying to use.

This parameter can accept the following values:

null
Default. Accepts all valid schemes.
'normal' (string)
Accepts http, https, ftp and ftps.
array
Accepts all schemes in the array. Each member of the array must be a string containing a scheme name without a colon or slashes. For example: array('http', 'ftp')
string
Takes the string as a regular expression fragment specifying the acceptable schemes. See the PHP documentation of regular expressions for more information. This parameter should not include delimiters or either of the start (^) or end ($) assertions.
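
To make the four kinds of value concrete, the sketch below mirrors the way the function code on this page turns $schemes into a regular expression fragment. The helper names schemeFragment and schemeAllowed are hypothetical, and the anchors are added only so that a single scheme name can be tested on its own.

```php
<?php
// Hypothetical helper mirroring the $schemes handling in the function code.
function schemeFragment($schemes = null){
  if($schemes === 'normal'){
    return '(?:[Hh][Tt]|[Ff])[Tt][Pp][Ss]?';       // http, https, ftp, ftps
  }elseif( is_array($schemes) ){
    return '(?:' . implode('|', $schemes) . ')';   // any member of the array
  }elseif( is_string($schemes) ){
    return $schemes;                               // caller-supplied regex fragment
  }
  return '[a-zA-Z][a-zA-Z0-9\-+.]*';               // null: any valid scheme name
}

// Tests one scheme name against the fragment (anchors added for this sketch).
function schemeAllowed($candidate, $schemes = null){
  return preg_match('/^(?:'.schemeFragment($schemes).')$/D', $candidate) === 1;
}

var_dump(schemeAllowed('ftps', 'normal'));              // bool(true)
var_dump(schemeAllowed('mailto', 'normal'));            // bool(false)
var_dump(schemeAllowed('news', array('http', 'news'))); // bool(true)
var_dump(schemeAllowed('HTTPS', 'https?'));             // bool(false) a string fragment is case-sensitive
var_dump(schemeAllowed('svn+ssh'));                     // bool(true)  null accepts any valid scheme
```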

$tlds

This optional parameter provides the instructions about which top-level-domains not to accept. This parameter can accept the following values:

null
Accepts any TLDs. This is not recommended, because filenames (such as wordDocument.doc) will get interpreted as URLs.
'normal' (string)
Default. Does not accept the following common file extensions as TLDs: doc, xls, csv, txt, rtf, dat, jpeg, jpg, gif, png, ico, bmp, psp, psd, exe, scr, html, htm, zip, gz, rar, cab, php, php3, inc, asp, jsp, lnk, xml, xsl, xsd, svg, pdf, bak, wav, mp3, m4v, midi, wmv, wma, js, css, ppt, pps, mdb. This is to prevent file names being interpreted as URLs.
array
Does not accept any member of the array as a TLD. Each item in the array should be a string. The strings may contain periods (dots), but should not start with them. For example: array('gov', 'co.uk') to exclude US government websites and British websites.
string
Does not accept the contents of the string as a TLD. All other TLDs are accepted.
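
The exclusion check itself is small enough to try on its own. The sketch below copies the comparison used in the function code on this page into a hypothetical helper, tldIsExcluded. Note that, as that code is written, only the text after the final dot of the host is compared, so an entry that contains a dot (such as 'co.uk') does not appear to match anything.

```php
<?php
// Hypothetical helper: the same comparison the function code applies to the host.
function tldIsExcluded($host, $tldExclude){
  // Take everything after the last dot of the host and look it up in the list.
  $tld = substr($host, strrpos($host, '.') + 1);
  return in_array($tld, $tldExclude);
}

var_dump(tldIsExcluded('wordDocument.doc', array('doc', 'xls'))); // bool(true)  left as plain text
var_dump(tldIsExcluded('www.bbc.co.uk', array('doc', 'xls')));    // bool(false) becomes a link
var_dump(tldIsExcluded('www.bbc.co.uk', array('uk')));            // bool(true)
var_dump(tldIsExcluded('www.bbc.co.uk', array('co.uk')));         // bool(false) only 'uk' is compared
```
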

The demonstration box above allows you to select which schemes to use, but uses the default setting for TLDs. Try typing an ftp URL in and selecting the http, mailto & news scheme option to see how it behaves.

License

These urls2links PHP functions are licensed under the Creative Commons Attribution-Share Alike 2.0 UK: England & Wales License by Martin Pain. This means you are free to use, distribute and modify the functions, as long as you distribute any work you use them in under a similar licence. The licence URL is given in the comments at the top of the function code.
