A Web crawler is a computer program that browses the Web in a methodical, automated manner. Web crawlers are also known as automatic indexers, bots, and Web spiders, and the process is called Web crawling or spidering.
For more than a decade, UltraTech Partners has been developing some of the most sophisticated Web crawlers, customized to the needs of particular clients. Using HtmlUnit, one of the most powerful headless browsers, in combination with XPath and regular expressions allows us to make our crawlers as intelligent as needed. The following are the 2 main purposes of the custom Web crawlers that we develop for our clients:
1. Automation
Many companies today run processes in which one step requires interaction with external Web sites, such as downloading reports or extracting data that other steps of the process consume. Although many companies are good at automating the other steps, most still perform the Web interaction manually because they lack the know-how to automate it. As a result, a process that could be completely automated becomes dependent on manual tasks. To illustrate, we describe the automation of a process for eBay that saved eBay millions of dollars in lost business and man-hours.
Automation Process We Developed for eBay
eBay was developing a new application that required extracting and combining data from 3 of its internal Web applications and mixing the result with data from 2 spreadsheets. The number of entries in the spreadsheets, which corresponded to the number of entries in the 3 Web applications, was in the hundreds of thousands. The process was to read the entries in the spreadsheets, read the corresponding entries in the Web applications, mix and transform all of these entries, and finally create the final entry for the new application. For several reasons, direct access to the databases of the Web applications was neither desirable nor permitted. In addition, because the team involved had no experience with Web crawlers, automating this process seemed impossible.
For UltraTech, however, automating this process was trivial thanks to our extensive experience with automation and customized Web crawlers. Once the project was entrusted to UltraTech, we were able to automate the whole process easily. eBay benefited greatly from the automation. Some of the benefits were the following:
- The manual process had been estimated to take 9 months to complete because of the sheer number of entries that had to be handled manually. It took UltraTech 2 months to implement and test the automation application. Once the implementation was complete, the application processed all the entries in less than a day.
- Because this application was a major revenue generator, introducing it 7 months earlier meant tens of millions in revenue. On top of that, eBay saved hundreds of thousands in pay for the man-hours of employees who would otherwise have spent 9 months doing nothing but manually mixing and transforming data from all these sources.
- As is usual in big companies, the requirements were very dynamic, and this application was no exception. With the manual process, any change would have been almost impossible, since the tasks for each or most entries would have had to be repeated. For the automated process, the requirement changes were applied in 2 days; after that, all that was necessary was to rerun the whole process to generate new results.
- The manual process would have been error-prone, since such tedious and precise tasks are almost impossible for a human operator to execute flawlessly. The automated process produced zero errors, because a correctly tested automation applies the same steps to every entry.
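The eBay process described above (read the spreadsheet entries, read the corresponding Web entries, mix and transform them, and create the final entry) can be illustrated with a minimal Python sketch. This is not the application UltraTech built: the sample data, the column names `id`, `name` and `price`, and the helper `fetch_web_entry` are all hypothetical, and the crawler step is replaced by a dictionary lookup so the example runs on its own.

```python
import csv
import io

# Hypothetical sample data standing in for the two spreadsheets.
SHEET_A = "id,name\n1,Widget\n2,Gadget\n"
SHEET_B = "id,price\n1,9.99\n2,24.50\n"

# Hypothetical stand-in for records read from a Web application.
WEB_APP = {"1": {"status": "active"}, "2": {"status": "retired"}}


def read_sheet(text):
    """Index the rows of a CSV export by their 'id' column."""
    return {row["id"]: row for row in csv.DictReader(io.StringIO(text))}


def fetch_web_entry(entry_id):
    """Placeholder for the crawler step; a real crawler would load a page here."""
    return WEB_APP.get(entry_id, {})


def build_entries(sheet_a, sheet_b):
    """Merge matching spreadsheet rows with the Web entry for each id."""
    merged = []
    for entry_id, row_a in sheet_a.items():
        row_b = sheet_b.get(entry_id)
        if row_b is None:
            continue  # skip ids missing from the second spreadsheet
        web = fetch_web_entry(entry_id)
        merged.append({
            "id": entry_id,
            "name": row_a["name"].upper(),  # example transformation
            "price": float(row_b["price"]),
            "status": web.get("status", "unknown"),
        })
    return merged


entries = build_entries(read_sheet(SHEET_A), read_sheet(SHEET_B))
```

Because every entry passes through the same `build_entries` function, a change in requirements means editing the transformation once and rerunning the whole batch, which is the property the benefits above rely on.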
2. Web Data Mining
The World Wide Web is the largest source of information available today. But finding, collecting, and making sense of the information on the Web is time-consuming when done manually.
Many organizations use Web data mining to extract real-time information from the Web for competitive intelligence, market intelligence, content aggregation, and more. Web data mining is a software technique for extracting information from Web sites. It focuses on extracting unstructured Web content, which is typically in HTML format, and transforming it into structured data that can be stored and analyzed in a central local database. Uses of Web data mining include online price comparison, product data collection, automated monitoring of competitors' Web sites, Web site change detection, Web research, Web content mashups, and Web data integration.
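As a rough illustration of turning unstructured HTML into structured data, the following Python sketch parses a small product listing into a list of records. It uses only the Python standard library (`html.parser` and `re`) rather than the HtmlUnit and XPath stack mentioned earlier, and the markup, class names, and price pattern are assumptions made up for the example.

```python
import re
from html.parser import HTMLParser

# Hypothetical product listing; a real page would be fetched by a crawler.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
"""

# Matches a dollar amount such as $9.99 and captures the number.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


class ProductParser(HTMLParser):
    """Collect the text of <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.field = None    # field held by the current <span>, if any
        self.current = {}    # product record being assembled
        self.products = []   # finished records

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.field = css_class

    def handle_data(self, data):
        if self.field is None:
            return  # text outside the spans we care about
        text = data.strip()
        if self.field == "price":
            match = PRICE_RE.search(text)
            self.current["price"] = float(match.group(1)) if match else None
        else:
            self.current["name"] = text

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None
        elif tag == "li" and self.current:
            self.products.append(self.current)
            self.current = {}


def extract_products(html):
    """Return one dict per product found in the given HTML."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products


products = extract_products(SAMPLE_HTML)
```

The resulting list of dictionaries is the kind of structured output that could then be written to a database for price comparison or change detection.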
UltraTech has been developing customized crawlers that extract data from Web sites for almost a decade. We can provide you with customized crawlers that mine, monitor, collect, and aggregate valuable nuggets of information from the Web, including the Web sites of your competitors.
To inquire about developing customized Web crawlers and Web data mining services, use our inquiry form to contact us, or send us an email at support@ultratechpartners.com.
