Case 1: Collect web data from an online bookstore
Web Grabber was customized to search an online bookstore's website for a list of ISBNs supplied in an Excel file. The program reads the input ISBNs one by one, searches the website for each, and extracts data from the result pages. The output data include book title, author, price, and reader reviews. Results are saved back into the same Excel spreadsheet from which the ISBNs were read.
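The read, search, and write-back loop described above can be sketched in a few lines of Python. This is only an illustration, not Ficstar's implementation: CSV text stands in for the Excel file, the column names are assumed, and `lookup_book` is a hypothetical callable that would perform the website search and result-page parsing.

```python
import csv
import io

# Assumed output columns; the real spreadsheet layout is not documented here.
FIELDS = ["isbn", "title", "author", "price"]


def enrich_isbns(input_csv, lookup_book):
    """Read ISBNs from CSV text, look each one up, and return enriched CSV text.

    `lookup_book` is hypothetical: it takes an ISBN string and returns a dict
    with title/author/price keys, or None when the site has no match.
    """
    reader = csv.DictReader(io.StringIO(input_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        isbn = row["isbn"].strip()
        details = lookup_book(isbn) or {}  # keep the row even if nothing is found
        writer.writerow({
            "isbn": isbn,
            "title": details.get("title", ""),
            "author": details.get("author", ""),
            "price": details.get("price", ""),
        })
    return out.getvalue()
```

Keeping rows for ISBNs that return no match means the output lines up one-to-one with the input list, which makes gaps easy to spot.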
Case 2: Generate a company list by extracting data from a business directory website
This project used Web Grabber to build a complete company list from a targeted business directory website. The program goes to the website's main search page and searches for companies automatically. Once search results are generated, it extracts the required content from the result pages, separates it into data fields such as company name, address, phone number, email, and website address, saves the results into a database, and continues navigating the website until all companies have been searched.
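The navigate-until-done behaviour described above amounts to a page queue with a visited set. The sketch below assumes a hypothetical `fetch_page` callable that returns the records found on a page plus the URLs of further result pages. The field names are illustrative, and a list stands in for the database.

```python
from collections import deque

# Assumed data fields, taken from the list in the text above.
FIELDS = ("name", "address", "phone", "email", "website")


def crawl_directory(start_url, fetch_page):
    """Visit every result page reachable from start_url and collect records.

    `fetch_page` is hypothetical: given a URL it returns (records, next_urls),
    where each record is a dict of raw fields scraped from that page.
    """
    queue = deque([start_url])
    seen = {start_url}  # prevents revisiting pages that link to each other
    companies = []
    while queue:
        url = queue.popleft()
        records, next_urls = fetch_page(url)
        for rec in records:
            # Normalize each record into the fixed set of data fields.
            companies.append({field: rec.get(field, "") for field in FIELDS})
        for nxt in next_urls:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return companies
```

The visited set is what lets the loop terminate on a site whose result pages link back to earlier pages.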
The final database contains a complete company list of more than 51,900 records. It took Ficstar two days to customize Web Grabber for the project, and the customized program took only about three hours to search the website and generate the final output. By comparison, a typical computer operator can copy and paste content from the same website at about 100 pages per hour. Working 40 hours per week, one person would need about 13 weeks to collect the same data manually.
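The 13-week estimate can be reproduced with a short calculation. It only works out if each company record occupies roughly one page, which is an assumption here and is not stated in the figures above. The speed-up compares run time only and ignores the two days of customization.

```python
records = 51_900          # company records in the final database
pages_per_hour = 100      # manual copy-and-paste speed
hours_per_week = 40       # one person's working week
records_per_page = 1      # ASSUMPTION: one company record per page
program_hours = 3         # approximate run time of the customized program

manual_hours = records / (pages_per_hour * records_per_page)
manual_weeks = manual_hours / hours_per_week
speedup = manual_hours / program_hours

print(f"Manual effort: {manual_hours:.0f} hours, about {manual_weeks:.0f} weeks")
print(f"Run-time speed-up over manual collection: about {speedup:.0f}x")
```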
Case 3: Copy all product information from an e-commerce website into database
A client needed a web data extraction program to extract all product information from a major supplier's website. The targeted website lists more than 150,000 products, and every product had to be saved in exactly the same format into an Access database, with all available product information, including product title, description, specifications, price, and stock inventory. A major challenge was that products on this website are divided into about 120 categories and thousands of subcategories: not only did every product need to be saved, but the category information for each product had to be collected as well. The extractor program also had to be able to update the product information later, since products on the supplier's website change over time.
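The two requirements above, recording each product's category path and being able to refresh products later, can be illustrated with a small sketch. It is not the delivered program: SQLite stands in for the Access database, the nested-dict category tree and the `sku` key are hypothetical, and it assumes each product belongs to a single category.

```python
import sqlite3


def walk_categories(node, path=()):
    """Yield (category_path, product) for every product in a nested tree.

    `node` is a hypothetical dict with a "name" and optional "children"
    (sub-category dicts) and "products" (product dicts) lists.
    """
    here = path + (node["name"],)
    for product in node.get("products", []):
        yield " > ".join(here), product
    for child in node.get("children", []):
        yield from walk_categories(child, here)


def save_products(conn, tree):
    """Insert new products and overwrite existing ones, keyed by sku."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "sku TEXT PRIMARY KEY, title TEXT, price REAL, category TEXT)"
    )
    for category, p in walk_categories(tree):
        # INSERT OR REPLACE makes a later run act as an update pass.
        conn.execute(
            "INSERT OR REPLACE INTO products (sku, title, price, category) "
            "VALUES (?, ?, ?, ?)",
            (p["sku"], p["title"], p["price"], category),
        )
    conn.commit()
```

Running `save_products` again after the supplier changes a price overwrites the stored row instead of adding a duplicate, which is the behaviour an update pass needs.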
Ficstar Web Grabber was customized for this e-commerce data extraction. The software development process lasted less than two weeks, including programming, beta testing, full data extraction, and update testing. In addition to saving all product information into the database, the software extracted more than 51,000 product images, 47,000 product technical data sheets (PDF files totaling 11 GB), and 150,000 revised HTML source files with the original company information removed, so that the client can easily add its own messages.