Checking whether a visitor is a search bot is easy. You can use one of the PHP superglobals, $_SERVER['HTTP_USER_AGENT'], to check whether the user agent contains a bot-like string. For example, many spiders, e.g. Sogou Web Spider, contain 'spider' in their user-agent string. The checking functions (in PHP) are therefore pretty simple.
// Substrings that identify a search engine or other automated client.
$ZLAI_Search_Engines = array(
    'google', 'baidu', 'yahoo', 'spider', 'msn', 'test', 'http', 'bot',
    'jeevesteoma', 'slurp', 'gulper', 'linkwalker', 'validator',
    'webaltbot', 'wget', 'feed', 'bing', 'websitepulse', 'sogou',
    'mediapartners', 'sohu', 'soso', 'search', 'yodao', 'robozilla'
);

// Build the case-insensitive pattern from the list so the two cannot drift apart.
define('BOTS', '/(' . implode('|', $ZLAI_Search_Engines) . ')/i');

// Returns true if the given user-agent string matches a bot signature.
function checkspider($u)
{
    return preg_match(BOTS, $u) === 1;
}

// Same check for the current request; a missing user agent does not match.
function spider()
{
    $agent = '';
    if (isset($_SERVER['HTTP_USER_AGENT'])) {
        $agent = $_SERVER['HTTP_USER_AGENT'];
    }
    return preg_match(BOTS, $agent) === 1;
}

// Substring-based variant. Note: unlike spider(), an empty string counts as a bot.
function isSE($BS)
{
    $BS = trim($BS);
    if ($BS === '') {
        return true;
    }
    global $ZLAI_Search_Engines;
    foreach ($ZLAI_Search_Engines as $v) {
        if (stripos($BS, $v) !== false) {
            return true;
        }
    }
    return false;
}
The list covers the most common spiders, such as Google and Baidu, and you are free to add any other signature. The page [here] uses these functions to list the visitors (humans or search engines) visiting the domain steakovercooked.com.
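As a rough illustration of adding signatures and labelling visitors, here is a minimal sketch. The helper names build_bot_pattern and classify_visitor, the shortened signature list, the two extra signatures ('curl' and 'crawler') and the sample user-agent strings are all assumptions made for this example; they are not taken from the page mentioned above.

```php
<?php
// Hypothetical helper: turn a list of signatures into one case-insensitive regex.
function build_bot_pattern(array $signatures): string
{
    // preg_quote() protects against signatures that contain regex metacharacters.
    $quoted = array_map(function ($s) {
        return preg_quote($s, '/');
    }, $signatures);
    return '/(' . implode('|', $quoted) . ')/i';
}

// Hypothetical helper: label a user agent as 'search engine' or 'human'.
function classify_visitor(string $userAgent, string $pattern): string
{
    // Assumption: an empty user agent is treated as a bot, as isSE() does.
    if (trim($userAgent) === '') {
        return 'search engine';
    }
    return preg_match($pattern, $userAgent) === 1 ? 'search engine' : 'human';
}

// A shortened version of the list, for illustration only.
$signatures = array('google', 'baidu', 'spider', 'bot', 'slurp', 'bing');

// Adding signatures is just appending to the array before building the pattern.
$signatures[] = 'curl';
$signatures[] = 'crawler';

$pattern = build_bot_pattern($signatures);

// Sample user-agent strings (illustrative values, not real log data).
$samples = array(
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
    'curl/8.0.1',
);

foreach ($samples as $ua) {
    echo classify_visitor($ua, $pattern) . ': ' . $ua . PHP_EOL;
}
```

Keep in mind that the user-agent header is supplied by the client, so a match is only a hint, and short signatures such as 'test' or 'http' can also match ordinary visitors.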