Checking Bots using PHP Script


Checking whether a visitor is a search bot is easy. You can inspect one of the superglobals, $_SERVER['HTTP_USER_AGENT'], to see whether it contains a bot-like string. For example, many spiders, e.g. the Sogou web spider, contain 'spider' in their user-agent string. The checking functions (in PHP) are therefore pretty simple.

  
// Lowercase signatures that commonly appear in crawler user-agent strings.
$ZLAI_Search_Engines = array(
    'google',
    'baidu',
    'yahoo',
    'spider',
    'msn',
    'test',
    'http',
    'bot',
    'jeevesteoma',
    'slurp',
    'gulper',
    'linkwalker',
    'validator',
    'webaltbot',
    'wget',
    'feed',
    'bing',
    'websitepulse',
    'sogou',
    'mediapartners',
    'sohu',
    'soso',
    'search',
    'yodao',
    'robozilla'
);

// Case-insensitive pattern that matches any of the signatures above.
// It could also be built from the array:
//define('BOTS', '/('.implode('|', $ZLAI_Search_Engines).')/i');
define('BOTS',
  '/(test|http|google|baidu|yahoo|spider|msn|bot|jeevesteoma|slurp|gulper|linkwalker|validator|webaltbot|wget|feed|bing|websitepulse|sogou|mediapartners|sohu|soso|search|yodao|robozilla)/i'
);

// Returns 1 if the given user-agent string matches a bot signature, 0 otherwise.
function checkspider($u)
{
  return (preg_match(BOTS, $u));
}

// Same check, applied to the user agent of the current request.
function spider()
{
  $agent = '';
  if (isset($_SERVER['HTTP_USER_AGENT']))
  {
    $agent = $_SERVER['HTTP_USER_AGENT'];
  }
  return (preg_match(BOTS, $agent));
}

// Array-based variant using a case-insensitive substring search.
// Unlike spider(), an empty user agent is treated as a bot.
function isSE($BS)
{
  $BS = trim($BS);
  if (!$BS)
    return (true);
  global $ZLAI_Search_Engines;
  foreach ($ZLAI_Search_Engines as $v)
  {
    if (stripos($BS, $v) !== false)
    {
      return (true);
    }
  }
  return (false);
}
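
To see how this kind of check behaves, here is a minimal, self-contained sketch that classifies a few sample user-agent strings. The function name looksLikeBot, the shortened signature list and the sample strings are all made up for illustration and are not part of the listing above.

```php
<?php
// Hypothetical, shortened signature list (illustration only).
$signatures = array('google', 'baidu', 'spider', 'bot', 'slurp');

// Returns true when any signature occurs anywhere in the user-agent
// string (case-insensitive substring match).
function looksLikeBot($userAgent, array $signatures)
{
    foreach ($signatures as $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Sample user-agent strings (illustrative values).
$samples = array(
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Sogou web spider/4.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0',
);

foreach ($samples as $ua) {
    echo (looksLikeBot($ua, $signatures) ? 'bot:   ' : 'human: '), $ua, "\n";
}
```

Because the match is a plain substring search, short signatures can also hit strings that merely happen to contain them, so it is worth testing a new signature against a few ordinary browser user agents before relying on it.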
  
The list covers the most common spiders, such as Google and Baidu, and you are free to add any other signature. The page [here] uses these functions to list the visitors (human or search engine) crawling the domain steakovercooked.com.
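
Adding a signature could look like the sketch below, which builds the pattern from the array in the same spirit as the commented-out implode line in the listing. The entry 'examplecrawler', the function matchesSignature and the variable names are hypothetical. The preg_quote call is an addition that keeps any regex metacharacters in a signature from being interpreted as part of the pattern.

```php
<?php
// Hypothetical signature list; 'examplecrawler' is a made-up entry
// standing in for whatever signature you want to add.
$signatures = array('google', 'baidu', 'spider', 'bot');
$signatures[] = 'examplecrawler';

// Escape each signature so it is matched literally, then join them
// into a single case-insensitive alternation.
$escaped = array_map(function ($s) {
    return preg_quote($s, '/');
}, $signatures);
$pattern = '/(' . implode('|', $escaped) . ')/i';

// Returns true when the user-agent string matches any signature.
function matchesSignature($userAgent, $pattern)
{
    return preg_match($pattern, $userAgent) === 1;
}

var_dump(matchesSignature('ExampleCrawler/1.0', $pattern));
var_dump(matchesSignature('Mozilla/5.0 (X11; Linux x86_64)', $pattern));
```

Generating the pattern from the array means the list and the regular expression cannot drift apart when a signature is added.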

–EOF (The Ultimate Computing & Technology Blog) —
