Web Spiders are also known as Crawlers, Bots, Ants or Scutters and the process of web data extraction may also be referred to as Web Crawling, Web Scraping or Automatic Indexing.
Spiders are automated bots or programs that 'crawl' webpages for the purpose of extracting or 'scraping' site information. They have many different uses on the web today, but in the context of fraud and risk mitigation they can be used to screen and analyze a website to look for signs of fraud or high risk.
Businesses that wish to use Spiders or Web Crawling services can build their own basic Spider bots or programs, purchase software for performing site crawls, or work with a provider who will help them design and execute custom projects. Businesses may use Spiders as part of initial screening and onboarding checks, and they may also continue to use Web Crawling for ongoing monitoring of websites.
Web spiders have many different uses, both good and bad:
Google and other search engines use spiders to copy and index webpage data and to extract page titles, descriptions, keywords and links. Collecting this data, keeping it up to date and having it on hand are crucial for quickly finding relevant sites for a user’s search query. Site content, inbound links and other information gained from web crawling also help search engines determine their search result rankings.
Site owners use spiders on their own sites to check that all links are still active and to validate their HTML code.
Businesses use spiders to watch what their competitors are doing: when they issue press releases, when they offer sales, and when they change their products or prices.
Researchers and businesses use spiders to harvest investment and financial data for market research. Spiders can be applied to such research in any market.
Fraudsters use spiders to crawl sites and forums for email addresses; this is how spammers build their mailing lists. Besides harvesting email addresses, a fraudster may also use a spider to copy content from a legitimate site they are imitating for a pharming scam.
THE FRAUD PRACTICE
KEY NOTES
Alternative Solutions - Businesses can build their own Spider bot or perform manual checks, but there are no true alternative third-party solutions. If a merchant services business needs to conduct a physical investigation, it should consider On-Site Surveys.
Building this In-House - It is possible to build your own spider or bot using programming languages such as Java or PHP. You can also perform many checks manually, such as: confirming that a business is legally registered in its country or state/province, running domain and WHOIS lookups on the site and its owner, searching online and in forums for any negative history or comments, and manually reviewing web pages to examine their content, what is being sold, prices, etc.
Estimated Cost - Basic Spider software can be purchased for under $100, while more advanced software generally costs a few thousand dollars. Some Web Crawling providers charge per project, by subscription, or both. For ongoing services, providers often charge an initial project setup fee and then a monthly fee for recurring data extraction, maintenance and support. Merchant Website Monitoring vendors may offer services on a per-merchant or per-inspection basis, as well as by subscription for ongoing monitoring.
Sample Vendors - N/A
SPIDER AND MERCHANT WEBSITE MONITORING TECHNIQUE OVERVIEW
Spiders are automated bots that scan and copy information from web pages. Spiders perform ‘web crawling’ or ‘web scraping’ on websites to copy the HTML code, text and other content. A merchant services business, for example, may use Web Spiders to investigate the content on their potential clients’ websites. There are also third-party services that use Spiders, along with other automated processes and manual analysis, for initial merchant website screening and ongoing monitoring. Merchant Website Monitoring providers check and monitor the activity and content on an online merchant’s website to recognize and address questionable content that may violate regulatory and legal compliance requirements or may indicate high risk. These services are intended for merchant acquirers, PSPs, ISOs and other entities that sell or underwrite merchant accounts.
Key considerations when implementing or buying this functionality include:
If purchasing a Spider or web crawling software, will resources be available from the provider for setting up and performing data extraction?
How is extracted data delivered back to the business (HTML, XML, Excel file, CSV, TSV, TXT, etc.)?
Can the spider extract images and files such as PDFs, as well as content delivered through JavaScript, Flash and AJAX?
If using a Spider or web crawling software, be prepared to investigate and analyze website data using internal resources. Businesses looking for a vendor to investigate and analyze websites should use Merchant Website Monitoring services.
Does the web content monitoring service ensure inspected websites comply with Visa’s Global Brand Protection Program and MasterCard’s Business Risk Assessment and Mitigation (BRAM) program?
Can the service ensure a merchant is compliant with laws and regulations in the U.S., the EU and all other regions where an acquirer may underwrite merchants?
Does the service only check for compliance with card association rules and legal restrictions, or can it also check the merchant’s site against acquirer-specific criteria?
Does the service provide any information about the website’s history or who owns the domain?
Does the service only perform an initial inspection or will they continue to monitor the merchant and their website?
HOW DOES IT WORK?
Web spiders use an automated process to extract and copy information such as site content and HTML code. In a practical application, the user gives a spider a URL or a list of sites to visit. The spider visits each URL, copies the site content and information, identifies all hyperlinks on the pages and adds each one to the list of sites to crawl. It may also be set to re-crawl sites at set intervals. Because all website information is copied, it can be viewed or analyzed offline.
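The crawl loop described above can be sketched as a breadth-first traversal: a queue of URLs still to visit plus a set of URLs already queued. This is a minimal sketch, not any provider’s implementation. The `fetch` callable is a hypothetical stand-in for the HTTP client (a production crawler would also honor robots.txt, rate-limit its requests and schedule re-crawls), and `max_pages` is an assumed safety limit.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(
    seeds: List[str],
    fetch: Callable[[str], Optional[str]],  # hypothetical: returns HTML or None
    max_pages: int = 100,                   # assumed safety limit
) -> Dict[str, str]:
    """Breadth-first crawl; returns {url: html} for every page copied."""
    frontier = deque(seeds)  # URLs still to visit, oldest first
    seen = set(seeds)        # every URL ever queued, so none is queued twice
    pages: Dict[str, str] = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # unreachable page: skip it
            continue
        pages[url] = html  # keep a copy for offline analysis
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and strip #fragments so duplicates collapse.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).scheme in ("http", "https") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages


# Usage with an in-memory "site" standing in for real HTTP requests:
site = {
    "https://shop.example/": '<a href="/books">Books</a> <a href="#top">Top</a>',
    "https://shop.example/books": '<a href="/">Home</a> <a href="mailto:a@shop.example">Mail</a>',
}
print(sorted(crawl(["https://shop.example/"], site.get)))
# prints ['https://shop.example/', 'https://shop.example/books']
```

Injecting `fetch` keeps the traversal logic separate from networking, so the same loop can run against live HTTP, a cache, or test data.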
Many spider and web crawling providers offer hosted software or an interface for conducting web crawls. The user can specify the sites to visit, the specific information to focus on and look for, and the method for selecting and crawling hyperlinks. There are also services that will create and automate the data extraction process based on the buyer’s instructions. Two important factors are selecting or isolating the key data to be extracted, and organizing and exporting this data into a usable format.
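Isolating key data and exporting it could look like the sketch below. It assumes the key fields are each page’s `<title>` and meta description and that the target format is CSV; both the field choice and the `export_csv` name are illustrative assumptions, and a real project would define its own fields.

```python
import csv
import io
from html.parser import HTMLParser
from typing import Dict


class PageSummary(HTMLParser):
    """Pulls the <title> text and the meta description out of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def export_csv(pages: Dict[str, str]) -> str:
    """Returns CSV text with one row per page: url, title, description."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url", "title", "description"])
    for url, html in sorted(pages.items()):
        summary = PageSummary()
        summary.feed(html)
        summary.close()
        writer.writerow([url, summary.title.strip(), summary.description.strip()])
    return out.getvalue()
```

The input is a plain `{url: html}` mapping, so it can take the output of any crawler; the resulting CSV opens directly in a spreadsheet.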
Depending on the provider, users may set all of this up in the software or UI themselves, use a combination of the hosted software and setup help from the provider, or work directly with the vendor on a custom project.
Merchant Website Monitoring or website inspection services use a combination of automated and human resources to thoroughly comb through a merchant’s website to ensure compliance with laws and regulations, with card association operating procedures and with any other terms or agreements an acquirer may have in place with their merchant client. Using both automated and manual checks, the service will verify that the merchant is not breaking laws, is in compliance with relevant rules and regulations, and is not engaging in other high-risk activities. Before underwriting a merchant account, an acquirer will want to confirm these qualities of a merchant’s site to properly assess the risk of underwriting that merchant.
The website surveying or monitoring service will check that the merchant is compliant with Visa’s Global Brand Protection Program (GBPP) and MasterCard’s Business Risk Assessment and Mitigation (BRAM) program. These programs were put in place to prevent merchants or merchant services businesses from processing credit card transactions for illegal or unethical goods or services. This includes websites that offer illegal prescription drug sales, counterfeit goods (or any good or service infringing on intellectual property such as trademarks or copyrights), gambling in regions where it is prohibited, tobacco products where they are prohibited, and other illegal or regulated goods or services.
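An automated first pass over these categories might be a simple phrase screen whose hits are routed to a human reviewer. The phrase lists below are hypothetical placeholders and do not reflect the actual GBPP or BRAM criteria; real services combine far richer rules with manual review.

```python
import re
from typing import Dict, List

# Hypothetical, illustrative phrase lists only. They are NOT the actual
# Visa GBPP or Mastercard BRAM criteria.
RISK_PHRASES: Dict[str, List[str]] = {
    "prescription drugs": ["no prescription needed", "rx without prescription"],
    "counterfeit goods": ["replica", "designer knockoff"],
    "gambling": ["online casino", "sports betting"],
    "tobacco": ["cigarettes", "loose tobacco"],
}


def screen_text(text: str) -> Dict[str, List[str]]:
    """Returns {category: [matched phrases]} for each category with a hit.

    A hit is a lead for a human reviewer, not a compliance verdict.
    """
    lowered = text.lower()
    hits: Dict[str, List[str]] = {}
    for category, phrases in RISK_PHRASES.items():
        # \b keeps "replica" from matching inside a longer word like "replicated"
        found = [p for p in phrases if re.search(r"\b" + re.escape(p) + r"\b", lowered)]
        if found:
            hits[category] = found
    return hits


print(screen_text("Genuine REPLICA watches, no prescription needed!"))
# prints {'prescription drugs': ['no prescription needed'], 'counterfeit goods': ['replica']}
```

Phrase matching is cheap and explainable, which suits a triage step, but it misses paraphrases and images, which is one reason these services also rely on human analysts.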
In addition to monitoring a merchant services business’ clients for compliance with laws and regulations, an acquirer will likely have other risk indicators they want to check. If a merchant says they are selling books, for example, their acquirer won’t want to find them selling electronics. Providers can offer ongoing monitoring of the items a merchant is selling, checking not only that they are legal and unrestricted but also that they are the types of products and services the acquirer has approved the merchant for. The service may also check product pricing to ensure prices are in line with the market (very low prices may indicate a scam), check web content to ensure the merchant isn’t making false claims or using deceptive advertising, and perform other checks to monitor the risk associated with a merchant’s site.
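The category and pricing checks could be sketched as follows, assuming the provider has already extracted a product catalog from the site and has a reference market price for each item. The `Product` type, the `market_prices` feed and the 50% threshold are hypothetical choices for illustration, not industry standards.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Product:
    name: str
    category: str
    price: float


def review_products(
    products: List[Product],
    approved_categories: Set[str],
    market_prices: Dict[str, float],  # hypothetical reference feed: name -> typical price
    min_ratio: float = 0.5,           # assumed threshold: flag prices under 50% of market
) -> List[str]:
    """Flags products outside the approved categories or priced far below market."""
    flags: List[str] = []
    for p in products:
        if p.category not in approved_categories:
            flags.append(f"{p.name}: category '{p.category}' not approved")
        reference = market_prices.get(p.name)
        if reference is not None and p.price < min_ratio * reference:
            flags.append(f"{p.name}: price {p.price:.2f} vs market {reference:.2f}")
    return flags


# A merchant approved to sell books turns out to list a cut-price laptop.
catalog = [
    Product("Field Guide to Birds", "books", 24.00),
    Product("UltraBook 14 Laptop", "electronics", 150.00),
]
market = {"Field Guide to Birds": 25.00, "UltraBook 14 Laptop": 900.00}
for flag in review_products(catalog, {"books"}, market):
    print(flag)
# prints:
# UltraBook 14 Laptop: category 'electronics' not approved
# UltraBook 14 Laptop: price 150.00 vs market 900.00
```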
HOW DO YOU USE THE RESULTS?
There are many different ways to use spiders, but in the context of fraud and risk mitigation the main uses are to obtain information about websites and to check that hyperlinks don’t lead to pages hosting malware. A merchant services business, for example, might use a spider to check the websites of potential and current merchant clients.
After using a spider to copy the site information, the business should check that the web pages have real content that matches the information provided by, or already known about, the merchant. The site content should also provide value: it should include detailed descriptions of products and services and other information showing that time and effort were put into making the website. Scam and shell websites won’t have the detail and effort of a real business. The business should also look for red flags, such as the sale of illegal or prohibited items, or any other indications of fraud or risk that bear on whether to underwrite the merchant.
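Two of these checks, whether a page has substantive content and whether it mentions details from the merchant’s application, can be roughly automated. This sketch assumes a word-count threshold (150 words, an arbitrary illustrative value) and naive case-insensitive substring matching of application fields. It would miss, for example, a phone number written in a different format, so it only flags pages for a person to review.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def content_red_flags(html: str, application: Dict[str, str], min_words: int = 150) -> List[str]:
    """Flags thin pages and application details (name, phone, ...) missing from the text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    flags: List[str] = []
    word_count = len(re.findall(r"\w+", text))
    if word_count < min_words:
        flags.append(f"thin content: {word_count} words")
    lowered = text.lower()
    for field, value in application.items():
        if value.lower() not in lowered:
            flags.append(f"{field} '{value}' not found on page")
    return flags


html = ("<html><head><style>p {}</style></head><body>"
        "<h1>Acme Rare Books</h1><p>Call 555-0100</p>"
        "<script>var x = 1;</script></body></html>")
print(content_red_flags(html, {"business name": "Acme Rare Books", "phone": "555-0199"}))
# prints ['thin content: 6 words', "phone '555-0199' not found on page"]
```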
The spider should also be used to check all links, subdomain pages and HTML code. First, it should check that the sites aren’t hosting any malware. A merchant with many links to product reviews and other information related to their business would be a positive signal. The spider can also list and check all subdomain pages, which helps the business keep track of all of the merchant’s pages and notice when new products or pages are added. Additionally, spiders can validate HTML code: a legitimate business will generally have concise page code, whereas a fraudster’s shell business will likely have pages written in quick and sloppy code.
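The subdomain inventory and a rough markup-quality signal can be sketched with the standard library. `subdomains` lists every host under an assumed root domain seen in the crawled URLs, and `markup_issues` counts unclosed or stray tags. This tag count is a crude heuristic, not HTML validation: HTML legitimately allows some end tags, such as `</p>` and `</li>`, to be omitted, so a real check would use a validator such as the W3C Markup Validation Service. Malware detection is not shown, since it requires a dedicated scanning service.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Iterable, List
from urllib.parse import urlparse

# Elements that never take a closing tag in HTML
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def subdomains(urls: Iterable[str], root: str) -> List[str]:
    """Returns the sorted host names under `root` (e.g. 'shop.example') seen in `urls`."""
    hosts = set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host == root or host.endswith("." + root):
            hosts.add(host)
    return sorted(hosts)


class TagBalance(HTMLParser):
    """Counts start tags left unclosed and end tags with no matching start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.open: Counter = Counter()
        self.stray_end = 0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open[tag] += 1

    def handle_startendtag(self, tag, attrs):
        pass  # <tag/> closes itself

    def handle_endtag(self, tag):
        if self.open[tag] > 0:
            self.open[tag] -= 1
        else:
            self.stray_end += 1


def markup_issues(html: str) -> int:
    """Rough sloppiness score: unclosed tags plus stray end tags."""
    checker = TagBalance()
    checker.feed(html)
    checker.close()
    return sum(checker.open.values()) + checker.stray_end


print(subdomains(["https://shop.example/", "https://blog.shop.example/post",
                  "https://pay.other.example/"], "shop.example"))
# prints ['blog.shop.example', 'shop.example']
print(markup_issues("<div><p>Hi</div></span>"))  # prints 2
```

Comparing the `subdomains` output between crawls is one simple way to notice when new sections appear on a merchant’s site.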
When using spiders, the only results are the data they extract. What the data can tell you and how it can be used is important, and there are many ways to use spiders and website data beyond those described above. The main point is that it is up to the business or user to analyze this data, which could require extensive human resources. Most Website Monitoring companies use spiders, but the value of these providers lies in the automated processes, and sometimes manual review, that they apply to website data to produce site analysis.
With Merchant Website Monitoring, the merchant services business requests one-time or ongoing monitoring for merchants they underwrite or are considering underwriting. If there is any indication of illegal or prohibited activity, the merchant services business will receive an immediate alert. The business may also receive monthly or quarterly reports for each of the merchants being surveyed or monitored.
If a merchant services business uses this service for new merchant onboarding, merchants that sell illegal or prohibited goods, don’t meet regulatory standards, have website characteristics inconsistent with their merchant account application or don’t meet the merchant services business’ proprietary standards should not be approved for merchant accounts.
If a merchant services business uses these services to monitor existing clients, then merchants that begin displaying risky behavior or stop following correct procedures can be recognized and addressed immediately. For minor offenses, a merchant can be given a warning and a deadline to correct the issue. For serious offenses, such as selling illegal or prohibited goods, the merchant services business may elect to terminate their services agreement with the merchant, as the merchant is likely in breach of contract.
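The handling rules in the last three paragraphs amount to a small decision table. The sketch below assumes each finding has already been labeled serious or minor by the business; the `Finding` type, the action wording and the 14-day correction window are hypothetical, and actual responses depend on each business’s policies and merchant agreements.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class Finding:
    merchant_id: str
    description: str
    serious: bool  # e.g. illegal or prohibited goods; labeling rules are up to the business


def triage(findings: List[Finding], onboarding: bool, today: date, cure_days: int = 14) -> List[str]:
    """Maps each finding to an action: decline, alert/escalate, or warn with a deadline."""
    actions: List[str] = []
    for f in findings:
        if onboarding:
            # Applicants with findings should not be approved.
            actions.append(f"{f.merchant_id}: do not approve ({f.description})")
        elif f.serious:
            actions.append(f"{f.merchant_id}: immediate alert; review agreement for termination ({f.description})")
        else:
            deadline = today + timedelta(days=cure_days)  # assumed correction window
            actions.append(f"{f.merchant_id}: warning; correct by {deadline.isoformat()} ({f.description})")
    return actions


findings = [
    Finding("M1", "sells counterfeit watches", serious=True),
    Finding("M2", "missing refund policy", serious=False),  # hypothetical minor issue
]
for action in triage(findings, onboarding=False, today=date(2024, 1, 1)):
    print(action)
# prints:
# M1: immediate alert; review agreement for termination (sells counterfeit watches)
# M2: warning; correct by 2024-01-15 (missing refund policy)
```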