✨What is a web crawler?
✨How does a web crawler work?
✨What are the differences between it and a web scraper?
Find all the related information here!
👉Subscribe and Visit Us:
Download Octoparse 8
Join our community!
In this web scraping tutorial, I’ll show you how to scrape Twitter data in 5 minutes without using the Twitter API, Tweepy, or Python, and without writing a single line of code.
Because Octoparse simulates human interaction with a webpage, it lets you pull the information you see on a website such as Twitter. For example, you can extract the tweets of a given handle, tweets containing certain hashtags, or tweets posted within a specific time frame.
All you need to do is grab the URL of your target webpage and paste it into Octoparse’s built-in browser. With a few point-and-clicks, you can build a crawler from scratch yourself. When the extraction is complete, you can export the data to Excel, CSV, HTML, or SQL, or stream it into your database in real time via the Octoparse APIs.
Step 1: Enter the URL and set up pagination 1:24
Twitter uses “infinite scrolling”, which means you first need to scroll down the page so that Twitter loads more tweets, and then extract the data shown on the screen.
Step 2: Build a loop item to extract the data 2:28
Make sure you open the action settings of the “extract data” step. Click on the handle, then click “extract the text of the selected element”. Repeat this for every data field you want.
Step 3: Modify the pagination settings and run the crawler 4:03
Because we want Twitter to load the content fully before the bot extracts it, set the AJAX timeout to 5 seconds. This gives Twitter 5 seconds to load after each scroll.
Then set both the scroll repeats and the wait time to 2 so that Twitter has enough time to load the content. With these settings, each scroll moves Octoparse down 2 screens, with a 2-second wait per screen.
Go back to the loop item settings and set the number of loops to 20, so that the bot repeats the scrolling 20 times.
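You don’t need any code in Octoparse itself, but as a rough sanity check on the Step 3 settings, the Python sketch below estimates how long the scrolling phase can take. It assumes (an assumption, not something confirmed by Octoparse) that each loop scrolls 2 screens with a 2-second wait per screen and then allows up to the 5-second AJAX timeout for new tweets to load. The function names are hypothetical and only spell out the arithmetic.

```python
# Hypothetical back-of-the-envelope estimate for the Step 3 settings.
# ASSUMPTION: each loop scrolls `scroll_repeats` screens, waits
# `wait_seconds` per screen, then allows up to `ajax_timeout_seconds`
# for new content. Octoparse's actual timing may differ.


def max_scroll_time(loops: int, scroll_repeats: int, wait_seconds: float,
                    ajax_timeout_seconds: float) -> float:
    """Return the worst-case number of seconds spent scrolling and waiting."""
    per_loop = scroll_repeats * wait_seconds + ajax_timeout_seconds
    return loops * per_loop


def screens_scrolled(loops: int, scroll_repeats: int) -> int:
    """Return the total number of screens the bot scrolls."""
    return loops * scroll_repeats


if __name__ == "__main__":
    total = max_scroll_time(loops=20, scroll_repeats=2, wait_seconds=2,
                            ajax_timeout_seconds=5)
    screens = screens_scrolled(loops=20, scroll_repeats=2)
    print(f"{screens} screens, at most {total:.0f} seconds")
```

Under these assumptions, 20 loops cover 40 screens in at most about 180 seconds (20 × (2 × 2 + 5)). If tweets are still missing after a run, raising the loop count or the wait time trades a longer run for more loaded content.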
Check out our Help Center for all web scraping tutorials
***About Us***
Octoparse is a #webscrapingtool and #webcrawler designed for scalable extraction of various data types. It can harvest URLs, phone numbers, email addresses, product pricing, reviews, meta tag information, and body text. Octoparse is a SIMPLE but POWERFUL web scraping tool for harvesting structured information and specific data types related to the keywords you provide, by searching through multiple layers of websites.
*** FREE TRIAL ***
Start a FREE 14-Day Trial
Start a FREE 30-Day Enterprise Trial
*** FOLLOW OUR TEAM! ***
Email: support@octoparse.com
Skype: Octoparse
Twitter:
#Twitterscraper #Twitterextractor