Page Scraping

This tutorial walks you through page scraping: what it is, the legal and ethical issues involved, and an example of page scraping with a PHP script.

Viewing the Data

We assume here that you already have a PHP site set up within Dreamweaver. Open a new PHP dynamic page (choose File > New, then the Dynamic Page and PHP options) and copy the complete code displayed below into the <body> section of the page.

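<?php
    // Address of the page to scrape.
    $url = "http://www.amazon.com/exec/obidos/ASIN/1904151191/";

    // Start with an empty string so that .= below does not raise a notice.
    $file = "";

    // Opening a URL with fopen() requires allow_url_fopen to be enabled.
    $filepointer = fopen($url, "r");

    if ($filepointer) {
        // Read the page in chunks of up to 4096 bytes until the end is reached.
        while (!feof($filepointer)) {
            $buffer = fgets($filepointer, 4096);
            $file .= $buffer;
        }
        fclose($filepointer);
    } else {
        die("Could not create a connection to Amazon.com");
    }
?>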

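<?php
    // Capture the text that follows the "Amazon.com Sales Rank:" label.
    if (preg_match("/<b>Amazon\.com\sSales\sRank:\s<\/b>\s(.*)\s/i", $file, $match)) {
        $result = $match[1];
        echo $result;
    } else {
        // Without this check, a changed page layout would give a blank page.
        echo "Sales rank not found";
    }
?>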

When the page is saved and viewed in a browser, the sales rank should be displayed.
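Because the pattern depends on the exact markup of the scraped page, it can help to check it against a small piece of sample HTML before running it on the live site. The following is only a sketch: the helper name extract_sales_rank() and the sample fragment are assumptions made for illustration, and the pattern is a slightly stricter variant of the one above, not the original.

```php
<?php
// Hypothetical fragment imitating the markup the pattern expects;
// the real Amazon page may look quite different.
$sample = "<td><b>Amazon.com Sales Rank: </b> 12,345 in Books</td>";

// extract_sales_rank() is an illustrative helper, not part of the tutorial code.
function extract_sales_rank($html)
{
    // [^<\r\n]+ stops at the next tag or line end instead of matching greedily.
    $pattern = '/<b>Amazon\.com\sSales\sRank:\s*<\/b>\s*([^<\r\n]+)/i';

    if (preg_match($pattern, $html, $match)) {
        return trim($match[1]);
    }

    return null; // label not found in this HTML
}

var_dump(extract_sales_rank($sample));
var_dump(extract_sales_rank("<p>no rank here</p>"));
```

If the second call returned anything other than null, the pattern would be too loose for the page you are scraping.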

Of course, you can add HTML or more PHP script around the sales rank.
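As one possible way of doing this, the sketch below wraps the scraped value in a short paragraph of HTML. The function name render_rank(), the wording of the label and the placeholder value are all assumptions for illustration; in the real page, $result would come from the preg_match() call.

```php
<?php
// $result stands in for the value extracted by the scraping script;
// the figure used here is a made-up placeholder.
$result = "12,345";

// render_rank() is an illustrative helper name.
function render_rank($rank)
{
    // Escape the scraped text so stray markup cannot break the page.
    $safe = htmlspecialchars($rank, ENT_QUOTES, 'UTF-8');

    return "<p>Current Amazon.com sales rank: <strong>" . $safe . "</strong></p>";
}

echo render_rank($result);
```

Escaping the value is a design choice worth keeping: scraped text comes from a page you do not control, so it should not be written into your own HTML untouched.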

There are many other options as to what you could do with the data. For example, it could be logged to a database, or a custom page could be created which displays only the particular sales ranks that you're interested in. Alternatively, you could compare the data from the Amazon web page to a previous copy, to see if there have been any changes. If the page has changed, you could send an e-mail and then store the new page, overwriting the old one, ready for the next time the script checks whether the page has changed.
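The comparison idea described above could be sketched roughly as follows. The helper name page_has_changed(), the cache file location and the e-mail address are hypothetical, and the sample string stands in for the page that the scraping script would fetch.

```php
<?php
// page_has_changed() is a hypothetical helper; $cacheFile is any writable path.
function page_has_changed($html, $cacheFile)
{
    // Load the copy stored on the previous run, if there is one.
    $previous = is_file($cacheFile) ? file_get_contents($cacheFile) : null;

    if ($previous !== null && $previous === $html) {
        return false; // identical to the stored copy
    }

    // Overwrite the old copy so the next run compares against this version.
    file_put_contents($cacheFile, $html);

    // With no previous copy there is nothing to compare against yet.
    return $previous !== null;
}

// Usage with sample data; a real script would pass in the fetched page.
$cacheFile = sys_get_temp_dir() . "/scrape_cache_example.html";

if (page_has_changed("<html>sample page</html>", $cacheFile)) {
    // mail() needs a configured mail transport; the address is a placeholder.
    // mail("you@example.com", "Page changed", "The watched page has changed.");
    echo "Page changed\n";
}
```

Comparing whole pages will also report changes in parts you do not care about, such as dates or adverts, so in practice you may prefer to store and compare only the extracted value.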

About the author:

Gareth is a member of Team Macromedia and an author of two Dreamweaver Pro books from glasshaus (www.glasshaus.com/dreamweaverpro):

Dreamweaver MX: PHP Web Development

Dreamweaver MX: Advanced PHP Web Development

Gareth Downes-Powell

Gareth has a range of skills, covering many computer and internet related subjects. He is proficient in many different languages including ASP and PHP, and is responsible for the setup and maintenance of both Windows and Linux servers on a daily basis.


In his daily web development work he uses the complete range of Macromedia software, including Dreamweaver MX, Flash MX, Fireworks MX and Director to build a number of websites and applications. Gareth has a close relationship with Macromedia, and as a member of Team Macromedia Dreamweaver, he has worked closely in the development of Dreamweaver, and was a beta tester for Dreamweaver MX.


On a daily basis he provides support for users in the Macromedia forums, answering questions and providing help on a range of different web-related subjects. He has also written a number of free and commercial extensions for Dreamweaver MX, to further extend its capabilities using its native JavaScript APIs or C++.


As a web host, Gareth has worked with a range of different servers and operating systems, with the Linux OS as his personal favourite. Most of his development work is done using a combination of Linux, Apache and MySQL and he has written extensively about setting up this type of system, and also running Apache and MySQL under Windows.

Comments

scraping and asp

April 21, 2004 by briant sylvain78
The problem I have is how to do scraping for jobs... job scraping on an ASP website.

This tutorial is out of date I think!

May 22, 2006 by j ll

When I try to get this code working, a blank page is returned (i.e. no info between the body tags).

The Amazon information that is used in this example no longer works, as Amazon have changed the format. Could you please provide an example that works? It's a deadly bit of code if it works. Thank you, John L

Not quite fixed but I tried

July 6, 2009 by Jason Guritz

The script works if...

Just take another look at the tutorial and the new page, or change the pattern to what I have here:

preg_match("/<b>Amazon.com\sSales\sRank:<\/b>\s(.*)\s/i",$file,$match);
