Yellow Pages Scraper > Easily Scrape Data From The Yellow Pages

How to Scrape The Yellow Pages

Before the internet, one of the best ways to find new leads for your business was the Yellow Pages, a world-renowned business directory. Although the internet has made it easier to find new leads, online sources are not always accurate.

The Yellow Pages listings are still one of the most reliable sources for generating new leads. The problem is that it hasn’t been easy to scrape those listings and build your own list of leads.

The makers of Scrapebox have developed a very easy-to-use plugin, aptly named the ‘Yellow Page Scraper Plugin’. Don’t get me wrong, there are other premium Yellow Pages scrapers out there, but none of them harvest the data as quickly. Furthermore, the plugin integrates really well with Scrapebox itself: for example, you can use Scrapebox to generate keywords for the niche you need leads for, then import them straight into the Yellow Pages Scraper. Not bad for $37.


Yellow Pages Scraper

Free Instagram Image and Video Scraper


Instagram is a great source of free images and clips for your SEO projects. This Instagram scraper was developed by arc298 on GitHub and is available free of charge, with unlimited use. It is a Python script that is very easy to configure and use.

Whenever I use Python scripts, I run them in Ubuntu inside a virtual machine. The main reasons are that I find it much easier to run Python scripts on Ubuntu, and I can keep all of my scraping scripts, along with their instructions, in one place. Furthermore, the virtual machine can run tasks in the background without bothering me while I’m doing other things on my computer.

So, assuming you are running a fresh installation of any variation of Ubuntu, these are the instructions for installation and use.


To install instagram-scraper:

$ pip install instagram-scraper

To update instagram-scraper:

$ pip install instagram-scraper --upgrade

Alternatively, you can clone the project and install it manually. Make sure you cd into the instagram-scraper-master folder before running the command below.

$ python setup.py install

How to Use

To scrape a user’s media – all images and videos:

$ instagram-scraper <username> -u <your username> -p <your password>  

Enter the values without the < > characters.

NOTE: To scrape a private user’s media you must be an approved follower.

By default, downloaded media will be placed in <current working directory>/<username>.

Providing a username and password is optional; if they are not supplied, the scraper runs as a guest. Note: in this case, all private users’ media will be unavailable, as will all users’ stories and high-resolution profile pictures.

To scrape a hashtag for media:

$ instagram-scraper <hashtag without #> --tag 

It may be useful to specify the --maximum <#> argument to limit the total number of items to scrape when scraping by hashtag.

You can also supply a file containing a list of usernames:

$ instagram-scraper -f ig_users.txt
# ig_users.txt

username1,
username2;
username3

# and so on...

The usernames may be separated by newlines, commas, semicolons, or whitespace.
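As a quick illustration of those separator rules, here is a small Python sketch — my own illustration, not the scraper’s internal code — that splits a username list the same way:

```python
import re

def parse_usernames(text):
    """Split a username list on newlines, commas, semicolons, or whitespace."""
    # Treat commas, semicolons, and any run of whitespace (including
    # newlines) as separators, and drop the empty strings left behind.
    return [name for name in re.split(r"[,;\s]+", text) if name]

contents = "alice,\nbob;\ncarol dave"
print(parse_usernames(contents))  # → ['alice', 'bob', 'carol', 'dave']
```

However you mix the separators in ig_users.txt, the same list of usernames comes out.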

You can also supply a file containing a list of location ids:

$ instagram-scraper --tag <your_tag_here> --include-location --filter_location_file my_locations.txt
# my_locations.txt

<location_id_1>
<location_id_2>

# and so on...

The resulting directory structure will be:

├── some_region1
│   └── images_here
└── some_region2
    └── images_here

The locations can only be separated by newlines and spaces.
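Note that, unlike the username file, commas and semicolons are not valid separators here. A one-line Python sketch (again, my own illustration, not the tool’s code) of that stricter rule:

```python
def parse_locations(text):
    """Split a location-id list on newlines and spaces only."""
    # str.split() with no argument splits on any run of whitespace.
    return text.split()

print(parse_locations("12345\n67890 13579"))  # → ['12345', '67890', '13579']
```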


Options

--help -h               Show help message and exit.

--login-user  -u        Instagram login user.

--login-pass  -p        Instagram login password.

--followings-input      Use profiles followed by login-user as input.

--followings-output     Output profiles from --followings-input to file.

--filename    -f        Path to a file containing a list of users to scrape.

--destination -d        Specify the download destination. By default, media will 
                        be downloaded to <current working directory>/<username>.

--retain-username -n    Creates a username subdirectory when the destination flag is set.

--media-types -t        Specify media types to scrape. Enter as space separated values. 
                        Valid values are image, video, story (story-image & story-video), broadcast
                        or none. Stories require a --login-user and --login-pass to be defined.
--latest                Scrape only new media since the last scrape. Uses the last modified
                        time of the latest media item in the destination directory to compare.

--latest-stamps         Specify a file to save the timestamps of latest media scraped by user.
                        This works similarly to `--latest` except the file specified by
                        `--latest-stamps` will store the last modified time instead of using 
                        timestamps of media items in the destination directory. 
                        This allows the destination directories to be emptied whilst 
                        still maintaining history.

--cookiejar             File in which to store cookies so that they can be reused between runs.

--quiet       -q        Be quiet while scraping.

--maximum     -m        Maximum number of items to scrape.

--media-metadata        Saves the media metadata associated with the user's posts to 
                        <destination>/<username>.json. Can be combined with --media-types none
                        to only fetch the metadata without downloading the media.

--include-location      Includes location metadata when saving media metadata. 
                        Implicitly includes --media-metadata.

--profile-metadata      Saves the user profile metadata to  <destination>/<username>.json.

--proxies               Enable use of proxies, add a valid JSON with http or/and https urls.
                        Example: '{"http": "http://<ip>:<port>", "https": "https://<ip>:<port>" }'

--comments             Saves the comment metadata associated with the posts to 
                       <destination>/<username>.json. Implicitly includes --media-metadata.
--interactive -i       Enables interactive login challenge solving. Has 2 modes: SMS and Email

--retry-forever        Retry download attempts endlessly when errors are received

--tag                   Scrapes the specified hashtag for media.

--filter                Scrapes the specified hashtag within a user's media.

--filter_location       Filter scrape queries by command line location(s) ids

--filter_location_file  Provide location ids by file to filter queries 

--location              Scrapes the specified instagram location-id for media.

--search-location       Search for a location by name. Useful for determining the location-id of 
                        a specific place.
--template -T           Customize and format each file's name.
                        Default: {urlname}
                        {username}: Scraped user
                        {shortcode}: Post shortcode (profile_pic and story are empty)
                        {urlname}: Original file name from url.
                        {mediatype}: The type of media being downloaded.
                        {datetime}: Date and time of upload. (Format: 20180101 01h01m01s)
                        {date}: Date of upload. (Format: 20180101)
                        {year}: Year of upload. (Format: 2018)
                        {month}: Month of upload. (Format: 01-12)
                        {day}: Day of upload. (Format: 01-31)
                        {h}: Hour of upload. (Format: 00-23h)
                        {m}: Minute of upload. (Format: 00-59m)
                        {s}: Second of upload. (Format: 00-59s)

                        If the template is invalid, it will revert to the default.
                        Does not work with --tag and --location.
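To see how those placeholders combine, here is a hedged Python sketch — my own illustration, not the scraper’s internal code — that fills a custom template with str.format using the documented keys (the metadata values are made up for the example):

```python
from datetime import datetime

# Example metadata for a single downloaded item. These values are
# placeholders; the real scraper fills them from the post itself.
post = {
    "username": "some_user",
    "shortcode": "aBcDeFg",
    "urlname": "12345678_n.jpg",
    "mediatype": "image",
}
uploaded = datetime(2018, 1, 1, 1, 1, 1)

template = "{username}_{date}_{shortcode}"
filename = template.format(
    **post,
    date=uploaded.strftime("%Y%m%d"),                # {date}     -> 20180101
    datetime=uploaded.strftime("%Y%m%d %Hh%Mm%Ss"),  # {datetime} -> 20180101 01h01m01s
)
print(filename)  # → some_user_20180101_aBcDeFg
```

Unused keys are simply ignored by str.format, so the same metadata dictionary works with any template you choose.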

If you are not familiar with Python and/or Ubuntu, just be patient. It is really not that hard; just give it a go!