WASET
	%0 Journal Article
	%A Xiaoming Jiang and  Jinqiao Shi and  Qingfeng Tan and  Wentao Zhang and  Xuebin Wang and  Muqian Chen
	%D 2016
	%J International Journal of Computer and Information Engineering
	%B World Academy of Science, Engineering and Technology
	%I Open Science Index 114, 2016
	%T Proxisch: An Optimization Approach of Large-Scale Unstable Proxy Servers Scheduling
	%U https://publications.waset.org/pdf/10004717
	%V 114
	%X Large companies such as Google and Microsoft, which have
ample proxy servers, have successfully implemented parallel web
crawlers for a single website. Researchers who lack such expensive
proxy servers, however, still find it difficult to crawl large amounts
of information from a single website in parallel. In this case, a good
option for researchers is to use free public proxy servers gathered
from the Internet. To improve the efficiency of a web crawler that
relies on such servers, the following two issues
should be considered first: (1) tasks may fail owing to the
instability of free proxy servers; (2) a proxy server will be blocked
if it visits a single website too frequently. In this paper, we propose
Proxisch, an optimization approach for scheduling large-scale unstable
proxy servers, which allows anyone to run a web crawler efficiently at
extremely low cost. Proxisch is designed to work efficiently
by making maximum use of reliable proxy servers. To solve the second
problem, it establishes a frequency control mechanism that
keeps the visiting frequency of any chosen proxy server below the
website’s limit. The results show that our approach performs better
than the other scheduling algorithms.
	%P 1149 - 1154