YouTube is forever trying to make it difficult for us to scrape data from their site. They have recently made some changes that made it so that Scrapebox was unable to directly scrape YouTube channels.
Fortunately, there is an easy workaround for this. Simply use the standard Scrapebox harvester with the following search string: <code>"https://www.youtube.com/channel/*" keyword</code>
Use the keyword scraper if you need to get more results, but do keep in mind that doing so will dilute how specific your targeted niche is.
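If you have a long keyword list, you can prepare the search strings outside of Scrapebox first. Here is a minimal Python sketch of that idea. The function name and the sample keywords are made up for illustration, and it assumes the search engine wants plain straight quotes around the footprint.

```python
# Hypothetical helper: prefix every keyword with the YouTube channel footprint.
FOOTPRINT = '"https://www.youtube.com/channel/*"'

def build_queries(keywords, footprint=FOOTPRINT):
    """Return one search string per non-empty keyword."""
    return [f"{footprint} {kw.strip()}" for kw in keywords if kw.strip()]

# Sample keywords, purely illustrative.
queries = build_queries(["guitar lessons", "  home brewing ", ""])
for query in queries:
    print(query)
```

Blank keywords are skipped, so a messy keyword file will not produce empty searches.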
Just as the title suggests, today I will show you how to scrape some basic user data from blackhatworld.com.
You may be wondering why. Well, because I can, but also because some people may be able to benefit from the scraped info. For example, with the user data we will scrape today, you can easily see who the most active users are. Then you can select the users without a premium membership and offer to buy them one. The reason for this is simple: in return, the user will surrender their comment signature and thus give your ads more exposure. In the world of SEO, this is actually a very cheap way to advertise.
Now let’s get started. What you will need for this tutorial is a registered copy of Scrapebox, Link Extractor Plugin installed, and the Premium Article Scraper addon.
We need to see how many registered users are online now. Just go to https://www.blackhatworld.com/online/?type=member and scroll down to see how many pages of results there are. Usually, there will be anywhere between 35 – 50 pages of results.
Now we need to generate the result page URLs. With the exception of the first results page, they are sequential.
You can simply copy and paste the URLs above into an Excel or Google sheet and drag the corner of the last cell to make as many result pages as you require.
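If you would rather skip the spreadsheet, the same list can be generated with a few lines of Python. This is only a sketch: the first-page URL comes from this tutorial, but the `&page={n}` pattern below is my assumption for illustration, so check the address bar on page 2 of the results and adjust the template to match.

```python
FIRST_PAGE = "https://www.blackhatworld.com/online/?type=member"

# ASSUMPTION: hypothetical pagination pattern, replace it with the one
# your browser actually shows on the second results page.
PAGE_TEMPLATE = FIRST_PAGE + "&page={n}"

def result_page_urls(last_page, template=PAGE_TEMPLATE):
    """Return the first page plus the sequential pages 2..last_page."""
    urls = [FIRST_PAGE]
    urls.extend(template.format(n=n) for n in range(2, last_page + 1))
    return urls

for url in result_page_urls(3):
    print(url)
```

Call `result_page_urls(40)` if the site shows 40 pages of online members, then paste the output into the harvester.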
Paste your generated URLs into the Scrapebox harvester and open the Link Extractor plugin. Before you start scraping, you will first need to add a filter to remove any URLs not containing /members/ and set your number of connections to 1 with a delay of 5 seconds. If you don’t do this, you will be blocked from the site.
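To show what that filter does, here is a small Python sketch of the same rule applied to a list of extracted links. The sample URLs use example.com and are invented for the demo; only the /members/ rule comes from this tutorial.

```python
def keep_containing(urls, needle="/members/"):
    """Keep only the URLs that contain the given path fragment."""
    return [url for url in urls if needle in url]

# Invented sample links, for illustration only.
extracted = [
    "https://example.com/members/some-user.123/",
    "https://example.com/help/terms/",
    "https://example.com/members/another-user.456/",
]

profile_urls = keep_containing(extracted)
print(len(profile_urls), "profile URLs kept")
```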
We will now be using the Premium Article Scraper to extract the user details. But first, you will need to add the following configuration file to your Scrapebox folder. Download this zip file and extract it to ~\scrapebox64\Plugins\ArticleScraper\Definitions
Now open the plugin and there should be a BHW members entry in the configurations. Select that and load the URLs from the Scrapebox harvester. Now you need to adjust the number of connections again or you will be blocked. The options tab is on the bottom right-hand side of the plugin. Navigate to the “worker” threads setting and set it to 1. If you have private proxies, then you can ignore that last step.
You can now hit start and wait for the scraping to finish. Once it has completed, you will need to export the results to a new folder dedicated to BHW user data.
Also, you will need to make sure the following export settings are made.
Select Both Fields
Use field value as file name
Use | for the separator.
And overwrite the file name if it exists. Useful for updating current users.
What we need to do now is merge the scraped user data files together. This is very easy to do in Windows. Sorry, I’m not a Mac guy. Just copy the folder location to your clipboard and then open the Windows command prompt. Type CMD in the Windows search bar to get to it quickly. In the terminal, type cd [paste your folder path here, without the brackets], then hit Enter. Now type the command copy *.txt bhwmembers.txt and hit Enter. You should now see a file named bhwmembers.txt in that folder.
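For the Mac and Linux folks, or anyone who prefers a script, the merge can also be sketched in Python. This is my own rough equivalent of the copy command, not something Scrapebox provides. It assumes the exported files are UTF-8 text, and it skips the output file so that re-running it does not merge the merged file into itself.

```python
from pathlib import Path

def merge_text_files(folder, output_name="bhwmembers.txt"):
    """Concatenate every .txt file in folder into a single output file."""
    folder = Path(folder)
    output = folder / output_name
    # Skip the output file itself so a second run gives the same result.
    parts = sorted(p for p in folder.glob("*.txt") if p.name != output_name)
    with output.open("w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8", errors="replace")
            # Make sure each file ends on its own line before the next starts.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return output
```

Usage is just `merge_text_files(r"C:\path\to\your\export\folder")`, where the path is whatever folder you exported the user data to.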
Now here comes the tricky part: we need to format the new file so that it can be imported into a spreadsheet. Before we start though, you will need a portable application called “Find and Replace”. It’s a free program that’s very useful for filtering scrapes.
To filter out the mess from our harvested user data, you will need to run the application and replace the following text with |
You might notice that there are a lot of unwanted spaces. Don’t stress, these will automatically be removed with the last step.
Now all that’s left to do is to import the txt file into Google Sheets.
Just log in to https://docs.google.com/spreadsheets/u/0/, create a new sheet, and import the file from the “file” menu on the top left-hand side.
Before you hit import, you need to make sure that you mark the “separator type” with |
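If you want to check the file before importing it, here is a small Python sketch that splits each line on the separator, trims the unwanted spaces, and writes plain CSV. The sample rows are made up, because the exact fields depend on the config file you used.

```python
import csv
import io

def pipe_to_csv(text, sep="|"):
    """Split each non-empty line on sep, trim spaces, and return CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines left over from the merge
        writer.writerow([field.strip() for field in line.split(sep)])
    return out.getvalue()

# Invented sample rows, for illustration only.
sample = "alice |  120 \n\nbob| 7\n"
print(pipe_to_csv(sample))
```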
Congratulations! You now know how to scrape blackhatworld.com user data.
NOTE: I have since discovered that some BHW users have the | symbol in their user profile. This will obviously cause issues for you when sorting. To solve this, just use a different character for the separator.
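One way to choose that different character is to test a few candidates against the scraped values and take the first one that never appears. The sketch below does this in Python. The candidate list and the sample names are my own picks for illustration.

```python
def pick_separator(values, candidates=("|", "\t", "~", "^")):
    """Return the first candidate that appears in none of the values."""
    for sep in candidates:
        if not any(sep in value for value in values):
            return sep
    raise ValueError("every candidate separator appears in the data")

# Invented sample data: one profile contains the pipe symbol.
names = ["alice", "bob|seo", "carol"]
print(repr(pick_separator(names)))
```

Use whichever character it returns in the export settings and again as the separator type when importing.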
With this tutorial, I will show you how you can harvest all of the data you need to make your own copy/clone of wplocker.com and publish it to WordPress.
Here is an example of what you will be able to achieve.
As you can see, we can scrape the Title, Meta Description, Demo URL with the lolinez link anonymizer still in place, and the download URLs. With the download URLs, you can keep them as non-clickable text or convert them to clickable hyperlinks with a free plugin called “Auto-hyperlink URLs”.
To start with, you will first need to download a config file I have made and copy it to the Scrapebox configuration folder. I have made 2 separate configs, one for WordPress Plugins and the other for Themes. This will make it a lot easier when posting.
Now you will need to scrape for the latest downloads. On average, wp locker will publish about 6-8 pages each of plugins and themes per week. What I would recommend is to just scrape the latest 6-8 pages each week for fresh content.
For this to work, you will need to have the free link extractor addon for Scrapebox. Now just copy and paste the following URLs into Scrapebox and open the link extractor.
Before you run the link extractor, it is recommended that you add some filters to exclude pages without downloads. Use the following settings for best results.
Number of connections: 1
Seconds of delay: 15
Remove URLs not containing:
for plugins: /wordpress-plugins/
for themes: /template/
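Since there is one config for plugins and one for themes, it helps to keep the two URL lists apart. Here is a short Python sketch that sorts harvested links into the two groups using the same path fragments as the filters above. The sample URLs are placeholders on example.com, not real pages.

```python
def split_by_type(urls):
    """Return (plugin_urls, theme_urls) based on the path fragment."""
    plugins = [url for url in urls if "/wordpress-plugins/" in url]
    themes = [url for url in urls if "/template/" in url]
    return plugins, themes

# Placeholder links, for illustration only.
harvested = [
    "https://example.com/wordpress-plugins/sample-plugin.html",
    "https://example.com/template/sample-theme.html",
    "https://example.com/about.html",
]

plugin_urls, theme_urls = split_by_type(harvested)
print(len(plugin_urls), "plugin pages,", len(theme_urls), "theme pages")
```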
Once you are done harvesting URLs, we can then move on to grabbing the data we need. All we need to do is import the list of harvested URLs into Scrapebox, go to the “Grab/Check” dropdown menu, follow the list down to “Custom Data Grabber”, and click on wp locker.
Save the data as text and you now have the raw data required to make your own clone of wp locker.
So, if you are lazy like me and you just want to publish the new pages to your WordPress site without having to do too much, just use the “batch poster” in the Article Scraper Premium plugin from Scrapebox.
And that’s it. Once a week repeat the process and you too can have a very professional and up-to-date website peddling nulled WordPress Themes and Plugins.
Now you will need to merge the file contents with the keywords in Scrapebox. Please make sure that you have the custom footprint selected before you start harvesting.
With your keywords now prefixed with the search operator phrase, you can begin harvesting. You will notice that you will only get between 6-10 Pinterest user board results for every keyword. Don’t worry about this, you are about to get a whole lot more with the next step. Afterwards, we will trim them down to just the user.
Now you will have to load the Link Extractor addon in Scrapebox. If you don’t have it, you can easily download it for free: go to the add-ons tab and hit ‘Show Available Addons’.
Before you import your harvested profile URLs, you need to input some settings to prevent extracting non-relevant links. In the settings tab of the link extractor, add the following lines:
I normally run this at 10 connections at once, but you may need to adjust this depending on how big your list is.
You can now expect to get about 100 results for every URL. If you want to really expand your list, you can run the new results through the Link Extractor again. However, the more you do this, the more you will dilute your niche. Before you move on from here, it is a good idea to remove duplicate URLs in Scrapebox.
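If your list has already left Scrapebox, duplicate removal is easy to sketch in Python too. The version below keeps the first copy of each URL and preserves the original order. Treating URLs that differ only by a trailing slash as duplicates is my assumption, so drop that part if you need exact matching.

```python
def remove_duplicates(urls):
    """Drop repeated URLs while keeping the first occurrence and the order."""
    seen = set()
    unique = []
    for url in urls:
        cleaned = url.strip()
        # ASSUMPTION: a trailing slash does not make it a different page.
        key = cleaned.rstrip("/")
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique

# Placeholder links, for illustration only.
boards = [
    "https://example.com/user-a/board-1/",
    "https://example.com/user-a/board-1",
    "https://example.com/user-b/board-2/",
    "",
]
print(remove_duplicates(boards))
```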
So, by now you should have a long list of Pinterest User Boards. Now we need to trim the URLs down to the User level. The best way to do this is with Notepad++, a free text editor with extra functionality. If you don’t already have it installed, go to https://notepad-plus-plus.org/downloads/ and install the latest version.
All you need to do now is open your list in Notepad++ and press Ctrl+H. This will pop up a window with some replace functions. For this section we are going to use a Regular Expression, aka Regex, to find and trim the URLs down to the user level. Make sure the ‘Regular expression’ search mode is selected, then paste the following Regex into the ‘Find what’ search box.
Leave the ‘Replace with’ part blank. Now hit ‘Replace All’ and all of the URLs will be trimmed down to the Pinterest User level.
Great! Now you have your own list of Pinterest users for a specific niche. Now it’s time to scrape for those homepage URLs.
For this to work, we need to assume that the user’s homepage itself has a link back to their Pinterest profile and is indexed by Google. If this is the case, all you need to do is enter the following keyword into Scrapebox and use your scraped Pinterest user URLs as a merge file.
Just use Google as your harvester engine and you will usually only get results for the Pinterest user’s homepage.
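If you don’t want to use Notepad++, the same trimming can be done in Python. Note that this sketch does not use the Regex from this tutorial. It parses each URL with the standard library instead, and it assumes board URLs look like https://www.pinterest.com/username/board-name/, so the first path segment is the user.

```python
from urllib.parse import urlsplit

def to_user_url(board_url):
    """Trim a board URL down to the user level, or return None if unusable."""
    parts = urlsplit(board_url.strip())
    segments = [segment for segment in parts.path.split("/") if segment]
    if not parts.scheme or not parts.netloc or not segments:
        return None
    # ASSUMPTION: the first path segment is the username.
    return f"{parts.scheme}://{parts.netloc}/{segments[0]}/"

# Made-up user and board names, for illustration only.
print(to_user_url("https://www.pinterest.com/someuser/some-board/"))
```

Running the result through a duplicate remover afterwards is a good idea, since several boards usually belong to the same user.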