CertForums




Webbots, Spiders, and Screen Scrapers

  #1  
04-May-2007, 04:11 PM
tripwire45
Lifetime Member
Posts: 14,410
 
Join Date: 29 Jun 2003
Location: Boise, ID, USA
Certifications: A+ and Network+
Webbots, Spiders, and Screen Scrapers

Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL
Author: Michael Schrenk
Format: Paperback, 328 pages
Publisher: No Starch Press (March 30, 2007)
ISBN-10: 1593271204
ISBN-13: 978-1593271206

Review by James Pyles
May 4, 2007

The book makes two assumptions about the reader, which is good. If you aren't part of the "assumed" group, this book won't be very interesting, or at least not very useful. The first assumption is that you know how to program. The book doesn't say to what degree, so I'll assume that basic programming skills are sufficient. The second assumption is a little more specific: you need at least a basic understanding of PHP. If you need help in these areas, try reading books like Beginning PHP5, Learning PHP5, or another beginner's programming book.

As with most books, and especially most programming books, there's a companion website, in this case http://www.schrenk.com/nostarch/webbots/. There you can download code libraries and sample scripts. The only provision is that you can't use these materials commercially (which is a bummer if you do this professionally... still, it's best to play by the rules).

Since Schrenk was polite enough to build a promotional website for the public, I decided to check it out. I found it interesting that the author refers to himself as "we" on his home page, especially when he talks about writing this book: "We offer both traditional online services as well as advanced strategies incorporating automated browsing agents called webbots. In fact, we wrote the book on webbots". I checked, and Schrenk is the sole author of the book, so the "we" must be an attempt to create the illusion of a corporate staff when, in fact, Michael Schrenk is the corporate staff.

Oh, the book. I'm supposed to be reviewing the book (and I'll try not to hold it against the author that he mentions being a fan of The Brady Bunch). OK, here goes. Learning the skills taught in this book is a little like learning to be "M": developing your "double-oh" agents and sending them out to solve problems and retrieve information. When you develop webbots, you are creating "representatives" of your intentions to the web (for good or for ill). Web robots tend to have a bad rep because the general public is almost always unaware of them until some cracker uses them to steal people's identities, to spam them, or to load malicious adware on their PCs. Like most things, however, webbots have both a light side and a dark side to consider.

If you specialize, or plan to specialize, in developing webbots for corporate use, this book contains all the information, skills, and tools you'll need to get going. Schrenk presents the material with the authority of his eleven years of programming experience, yet in a friendly, easy-to-read style. You don't necessarily have to "speak geek" to read this book.

The book's assumptions are spot on. If you don't have programming experience in general and PHP experience in particular, your learning curve will be steep, if not just plain vertical. On the other hand, you don't have to be a programming genius to use this book's content effectively. I'd say the text's target audience falls in the beginner-to-intermediate range. Some of the screenshots indicate that the author develops on a Microsoft platform, but the software tools he recommends (PHP, CURL, and MySQL) are freely available downloads. Using this book to learn webbot development won't cause you to break into your piggy bank (unless you can't afford the price of this text), plus we Linux people can also participate.

Pay special attention to Chapter 28, "Keeping Webbots Out of Trouble." Earlier, I mentioned the "dark side" of webbot development. It would be just as easy to use this information to develop "pain-in-the-butt" bots that do everything from annoying millions of Internet users to committing crimes against them. Take the ethical and legal standards that govern legitimate webbot development very seriously. These "critters" have a valued place on the web, but they can also be badly misused. In the words of Uncle Ben (or rather Stan Lee), "With great power comes great responsibility." It's like learning to drive a car: it's not enough to learn how to drive adequately; you must also practice driving safely. Don't hurt anyone. With that in mind, ladies and gentlemen, start your engines. This book is a great ride.


It's been said that if you give a million chimpanzees a million typewriters, they'll eventually reproduce the complete works of Shakespeare. Wanna bet?

Blog: A million chimpanzees


Last edited by tripwire45; 04-May-2007 at 07:55 PM.
 
  #2  
09-May-2007, 01:30 PM
tripwire45
Lifetime Member
Posts: 14,410
 
Join Date: 29 Jun 2003
Location: Boise, ID, USA
Certifications: A+ and Network+
I got a nice email from the author thanking me for my review. It turns out he really does have a staff, and they did help get the book written, so the "we" is authentic.



 
  #3  
13-Apr-2008, 05:27 PM
JasonGawker
New Member
Posts: 1
 
Join Date: 02 Apr 2008
Hi

Hello

I read the ***removed spam link***, and I found it highly informative and cutting edge, and it usually goes straight to the point. I liked the illustrations and the code snippets as well; they're very useful. I built my own private spider, and it was up and running after only a few months of work. Of course, I recommend it to anyone interested in these topics.


Last edited by tripwire45; 13-Apr-2008 at 06:25 PM. Reason: Removed spam link
 
  #4  
13-Apr-2008, 06:38 PM
dmarsh
Lifetime Member
Posts: 2,501
 
Join Date: 24 May 2007
Certifications: One or two...
I think the editing of the above post is a little ironic.

Trip, since you don't code in PHP and, I presume, know little about spiders and bots, aren't you essentially spamming the forums...?

In fact, with 2600 views in under 20 minutes, I can't help but think this post is being spidered!


Last edited by dmarsh; 13-Apr-2008 at 06:53 PM.
 
  #5  
13-Apr-2008, 06:54 PM
Sparky
I'll have a pint...
Posts: 8,274
 
Join Date: 15 Dec 2005
Location: Scotland
Certifications: MSc MCSE MCSA:M MCITP:EA MCTS(x4) N+ A+
WIP: Feels like everything : )
Quote:
Originally Posted by dmarsh26
In fact with 2600 views in under 20 minutes I can't help but think this post is being spidered !
Posted in 2007, mate.

 
  #6  
13-Apr-2008, 06:58 PM
dmarsh
Lifetime Member
Posts: 2,501
 
Join Date: 24 May 2007
Certifications: One or two...
Oh, my bad. I missed the date on the OP and didn't realise it was a zombie thread...


Last edited by dmarsh; 13-Apr-2008 at 07:02 PM.
 


All times are GMT +1.
CertForums.co.uk (C) copyright 2003-2009 All Rights Reserved. Content published on CertForums.co.uk requires permission for reprint.