Web scraping is a real time-saving workhorse, but that’s only half the story.
It can also make you rich.
Okay, that was a little tasteless – but it’s no joke – data is digital gold.
But before we Scrooge McDuck dive into any binary nuggets, we should go over a few web scraping basics. Here’s what I propose:
You make peace with any dad humor that haunts these pages, and I’ll get straight to the point of what exactly web scraping is and how it can bring home the bacon... or cabbage. (Vegan friendly.)
What is web scraping?
Web scraping is when someone collects data from the internet. (Think copy and paste, but faster)
They use an app or script to automate data collection, remove any unnecessary information, and organize the rest into a structured collection such as a spreadsheet or database.
Web scrapers download a site’s robots.txt file to find out where they are allowed to go. Then, they follow links to new pages and compile a list or “crawl queue.”
It’s kind of similar to queueing a playlist on Spotify as you discover songs you like. Except a web scraper’s palate is not so refined, and it queues everything.
As the scraper continues down the growing list, it branches out into different pages until its job is complete.
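Here is a minimal sketch of that crawl queue in Python. To keep it runnable without touching the internet, the robots.txt text and the `FAKE_SITE` dictionary are made-up stand-ins for a real website, and `crawl` and `demo-bot` are names invented for this example. A real scraper would download each page and pull the links out of the HTML.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical in-memory "website": each page maps to the links found on it.
FAKE_SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/private/x"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
    "https://example.com/private/x": [],
}


def crawl(start_url, robots_txt, site, user_agent="demo-bot"):
    """Walk the site breadth-first, skipping anything robots.txt disallows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    queue = deque([start_url])  # the "crawl queue"
    seen = {start_url}          # every URL ever queued, so nothing is queued twice
    visited = []                # pages the scraper was actually allowed to read

    while queue:
        url = queue.popleft()
        if not rules.can_fetch(user_agent, url):
            continue  # robots.txt says keep out
        visited.append(url)
        for link in site.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    for page in crawl("https://example.com/", ROBOTS_TXT, FAKE_SITE):
        print(page)
```

The `seen` set is what stops the scraper from looping forever when two pages link to each other.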
What exactly is it that one scrapes?
The web, of course. More specifically, you collect valuable information from web pages that can be used to start and run a business, contribute to research projects, and automate tedious tasks.
A few other examples include:
- Look at news and social media feeds to see what competitors are doing.
- Find out what products are hot on eBay.
- Offer the best prices for air travel with price aggregation.
- Scan for value discrepancies that can turn a profit.
You can program web scrapers to do anything that you can do online – but thousands of times faster.
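To make the price aggregation and value discrepancy ideas a bit more concrete, here is a rough sketch of what happens after the scraping is done. The listings, site names, and prices are all invented for illustration, and `best_prices` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# Hypothetical scraped listings: (item, source, price in USD).
LISTINGS = [
    ("flight NYC-LAX", "site-a", 320.0),
    ("flight NYC-LAX", "site-b", 285.0),
    ("flight NYC-LAX", "site-c", 301.5),
    ("console", "site-a", 499.0),
    ("console", "site-b", 560.0),
]


def best_prices(listings):
    """Return {item: (cheapest_source, cheapest_price, spread)}."""
    by_item = defaultdict(list)
    for item, source, price in listings:
        by_item[item].append((price, source))

    summary = {}
    for item, offers in by_item.items():
        low_price, low_source = min(offers)
        high_price, _ = max(offers)
        # The spread is the gap between the dearest and cheapest offer.
        summary[item] = (low_source, low_price, high_price - low_price)
    return summary


if __name__ == "__main__":
    for item, (source, price, spread) in best_prices(LISTINGS).items():
        print(f"{item}: cheapest at {source} for {price:.2f} (spread {spread:.2f})")
```

A big spread is the kind of value discrepancy worth a second look. Whether it actually turns a profit depends on fees, shipping, and timing, none of which this sketch models.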
How do you get a web scraper?
You could write your own scraper from scratch, or you could use what someone else has already written. If you’re not a coder, then you should probably opt for something like Octoparse that will save time and speed up the process.
One example that was around for a while is Yahoo’s YQL (Yahoo Query Language) API, which Yahoo retired in 2019. It provided access to many different kinds of data sources including RSS feeds, local weather forecasts, and movie listings – anything that got updated regularly online became available through the service without any additional coding. The great thing was that it wasn’t limited to web pages: it could also pull stock market quotes, social media feeds, or financial reports.
That’s pretty old school though.
Here are some examples of modern scraping extensions for your browser:
With the free version, you can scrape up to 500 pages of data each month. It’s not much, but if you want more you can upgrade to a paid plan.
2. Web Scraper
This scraper comes as both a Chrome extension and a cloud service, and its simple point-and-click interface requires no coding experience at all. It works with current web technologies and integrates easily with automation software and proxies.
This software is easy to use, but I recommend that you have some experience with coding. Click on any text in a table or list, choose “Scrape Similar” from the browser menu, and you can pull in more information by adding new columns with XPath or jQuery selectors.
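If XPath is new to you, here is a small taste of what a column selector looks like, using only Python’s standard library. The table markup is a made-up, well-formed snippet, and `scrape_column` is a hypothetical helper. Real pages are usually messier and tend to need a dedicated HTML parser, and the standard library only supports a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed table markup standing in for a scraped page.
PAGE = """
<table>
  <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
  <tr><td class="name">Gadget</td><td class="price">24.50</td></tr>
</table>
"""


def scrape_column(markup, css_class):
    """Collect the text of every <td> cell whose class attribute matches."""
    root = ET.fromstring(markup.strip())
    # ".//td[@class='...']" means: any <td> below the root with this class.
    return [cell.text for cell in root.findall(f".//td[@class='{css_class}']")]


if __name__ == "__main__":
    print(scrape_column(PAGE, "name"))
    print(scrape_column(PAGE, "price"))
```

Each XPath expression plays the role of one column: change the selector and you get a different slice of the same table.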
I could go on and on – the web is full of scraping tools. When you choose which one to use, there are a few things to think about:
How much time do you have? Is this something you’re doing regularly or just once?
What’s your budget? Do you already use software that covers part of the job? Why pay more if what you need is free?
How to really make money web scraping
The vast world of web scraping boils down to one main ingredient: information.
That information can be used as a sort of currency in all manner of business exchange:
- Selling or leveraging information directly
- Supporting business automation
- Optimizing trading and commerce
The sale of information is pretty straightforward – entire spy movies revolve around a thumb drive that contains valuable information.
But what about automation and commerce?
Look at it this way:
Every product down to the pixel is information.
1. Start a business that sells information.
- Financial guru – Compile the news and events that impact the stock market, real estate, and cryptocurrency.
- SEO extraordinaire – Provide keyword research and content marketing advice.
- Business consultant – Offer deep dives into industry competition and market trends.
In these cases, you would seek out information that people already pay for and package it as a product. You could also offer it free on your website to score traffic and earn from affiliate advertising.
2. Web scraping as a middleman service.
- Travel fare aggregation – Scrape the web for the best prices on airfare, hotels, and other travel services, and offer the results as a service. This requires continuous web scraping across a multitude of travel websites, so you’ll need to use rotating residential proxies. As you probably know, Google deploys its spiders to bring you the latest on hotels and airfare. Meanwhile, other companies like Expedia, Skyscanner, and Hostelworld capitalize on different travel niches.
- Stock brokerage or hedge fund management – Everyone’s an investment genius after they buy their first stock or crypto coin. But anyone with a track record of keeping their portfolios in the green is well aware of information bias. In order to see the big picture, it’s crucial to have big data. The only way to get that is with bots that gather information free from the narrowing filter of human perception. With that kind of support, you can successfully manage risk – a service people will hand over their money for (if you can give it back to them, with interest).
- Marketing and advertisement – Instead of just being an informant for marketing agencies and businesses, you can be the source of information. Once again, Google claims some sort of authority with Google Analytics, while offshoots like SEMRush and AnswerThePublic pick up the sizeable slack. You may think there isn’t any more slack to be had, but that’s just not true. Everything in the world gets transplanted online and multiplies there, and someone has to sort all that stuff out. (For a reasonable price, of course.)
3. Web scraping hot-ticket items
You want to keep an eye out for the hype. That way your risk is low and your reward is high. In other words, you won’t be stuck holding the bag and selling at a loss.
- Sneakers – A unique resale industry that blossoms from the heart of sneakerheads. The limited-release sneakers are where the money’s at, with an easy 10x return on some Yeezys or Jordans. However, the learning curve is steep if you’re starting out – but there are plenty of guides to prime you for profitable sneaker flipping.
- Electronics – Electronics like the PS5 or computer graphics cards are really easy to resell, and you can even earn a lucrative living from them. Just like sneakers, the competition is fierce.
- Event tickets – This may be the OG resale item. There’s a reason why ticket sales feel rigged – they kind of are. Bots scoop up most highly prized event tickets to be scalped at a premium price.
- Non-fungible tokens or NFTs – Some NFTs are incredibly hard to get your hands on. Probably because half the bids are made by bots. Bots in this example enter multiple bids and raffle entries in order to secure as many NFTs as possible, then resell them for crazy profit on marketplaces like OpenSea, Solanart or DigitalEyes.
In any of these cases, web scrapers have a slightly different function. They still crawl web pages and record data, but they also automate the checkout process.
If you throw in some proxies, you can multiply these checkouts to increase your chances to win. In fact, it’s absolutely necessary to run any automated software – bots and web scrapers – with proxies. If you don’t, then your whole operation will fail when your IP address is banned.
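As a rough idea of what “throwing in some proxies” looks like in code, here is a sketch that hands out proxies round-robin so no single IP address carries every request. The proxy addresses come from a documentation-only IP range and are placeholders, and `assign_proxies` is a hypothetical helper. The sketch only pairs URLs with proxies and sends nothing over the network.

```python
from itertools import cycle

# Hypothetical proxy endpoints (documentation-only IP range); swap in real ones.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy, wrapping around when the pool runs out."""
    pool = cycle(proxies)
    # Each proxy is wrapped in a dict keyed by URL scheme.
    return [(url, {"http": proxy, "https": proxy}) for url, proxy in zip(urls, pool)]


if __name__ == "__main__":
    pages = [f"https://example.com/page/{n}" for n in range(1, 5)]
    for url, proxy_config in assign_proxies(pages, PROXIES):
        print(url, "via", proxy_config["https"])
```

The scheme-keyed dictionary is the shape the popular `requests` library accepts through its `proxies=` argument, so with that library installed each pair could be passed straight to `requests.get(url, proxies=proxy_config)`.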
On its own, each of these strategies is worth the time and effort. But what do you get if you combine them?
Some sort of machine that eats information and poops paychecks.
Market insights literally stare you in the face, but the sheer volume overwhelms our processing limits. While I like to believe we can temporarily master the matrix (like Neo) – a web scraper is a little more dependable.
In seconds, you can:
- Analyze the current condition of the financial market
- Identify market changes and trends
- Keep up with national and global news that affects stocks and economics
- Get a read on consumer sentiment and behavior
Anything you can do online, web scrapers do on a much grander scale.
All thanks to proxies.
(Proxies are what conceal your presence from Agent Smith)