Scraping Instagram: What Actually Works (And What Doesn’t)

AJ Tate
January 13, 2025

If you arrived here looking for “scrape Instagram follower lists at scale by farming logged-in accounts” — that approach is dead. Meta closed it down over the past few years and the unofficial Python libraries that wrapped it (Instagramy, the older instagram-scraper forks, instaloader’s logged-in mode) have been broken or actively dangerous since 2022.

What’s left is a narrower but more honest picture: there’s meaningful public data on Instagram that can be collected, the official Graph API covers a specific business use case, and a handful of legitimate techniques cover the gap between them. This post walks through what actually works for collecting Instagram data in 2026, what doesn’t, and where the legal and ethical lines fall.

What you can collect, and what you can’t

The single most important thing to understand before starting an Instagram data project in 2026: the access boundary has hardened. There’s public data and private data, and the gap between them is bigger and more strictly enforced than most older articles acknowledge.

Publicly accessible without login (scrapable):

Public profile data — username, full name, bio, follower count (approximate), following count, post count, verification status, business contact info on business accounts
Public posts on public accounts — images, video URLs, captions, hashtags, like counts, comment counts
Comments on public posts — text, author username, timestamp, like count
Public hashtag feeds — recent and top posts for any hashtag
Public location pages — posts tagged at a location

Requires the official Graph API (your own accounts only):

Detailed analytics on accounts you own or manage
Direct message access (your own DMs only)
Posting and content management
Ad performance data
Insights data for your own posts

Off-limits regardless of method:

Private account data — anything not visible to a logged-out viewer
Direct messages between other users
Email addresses, phone numbers, or other PII
Detailed follower lists at scale on accounts of any meaningful size (Instagram aggressively rate-limits even legitimate access to follower data)
Stories on private accounts (and increasingly limited even on public ones)

That last point about follower lists deserves emphasis. The original version of this article was titled “Scraping Instagram Followers” and the workflow it described — log into an account, scrape its target’s followers, repeat — is the single thing Meta has worked hardest to prevent. Even the official Graph API doesn’t expose follower lists in any useful form. Treat scraping follower data of any specific account as essentially impossible in 2026, and any “scraper” claiming to do it reliably as either broken, lying, or a few weeks from being broken.

Why the old methods don’t work

Worth being explicit about what changed, because the internet is still full of outdated tutorials:

Login-based scraping is dead. Logging into Instagram with a real account from an automated tool, then scraping under that account’s session, was the standard approach from roughly 2015 to 2020. Meta’s account-security systems now flag this pattern within minutes. The account gets a checkpoint challenge, then a temporary block, then a permanent ban. This applies whether you’re using instagrapi, instaloader in logged-in mode, Selenium with a real Instagram account, or any of the “Instagram scraper” GUI tools that ask for your credentials. Don’t do it.

The official Graph API stopped serving the use case. Meta’s API was tightened progressively through 2022–2025. The legacy Instagram Basic Display API was deprecated. The current Graph API exists, but its purpose is helping businesses manage their own Instagram accounts — not analytics on other accounts, not competitor research, not data aggregation. If your use case is “get data on accounts I don’t own,” the official API is not the answer.

Most “Instagram scraper” Python libraries are abandoned or broken. The active ones are a much shorter list than older articles suggest. Anything that hasn’t been updated in the last six months should be treated as broken until proven otherwise.

Datacenter IPs are flagged immediately. This isn’t new, but the threshold has tightened. Instagram’s bot detection often identifies a datacenter IP range before the first request even completes. Residential proxies are the floor; mobile proxies are stronger.

The requests library has a detectable TLS fingerprint. With its default settings, Python’s popular requests library produces a TLS handshake that Instagram’s anti-bot system identifies as automated, independent of the IP and headers. In 2026 the standard workaround is curl_cffi, which impersonates a real browser’s TLS stack.

What actually works: three viable methods

For collecting public Instagram data legitimately in 2026, the realistic options are:

Method 1: Official Graph API (your own accounts only)

If your use case is analytics on accounts you own or manage as a business — your brand’s account, your clients’ accounts, accounts where you have explicit access — the Instagram Graph API is the right answer. It’s stable, supported, and gives you detailed data the public web doesn’t expose: post-level insights, audience demographics (anonymized and aggregated), Stories metrics, and engagement breakdowns.

What it doesn’t do: data on accounts you don’t manage. The API was deliberately designed to prevent this.

Setup involves a Meta Developer account, a Facebook app, business verification for the app, and an Instagram Business or Creator account connected to a Facebook Page. The full process takes a few hours if everything goes smoothly and longer if it doesn’t.
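Once that setup is done, reading data for an account you manage is an ordinary authenticated GET. The sketch below only builds the request URL, so it runs without credentials. The API version string, the account ID, and the token are placeholders, and the field names (username, followers_count, media_count) are the ones listed for the IG User object as far as I know; check them against Meta’s current reference before relying on them.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"


def build_ig_user_url(ig_user_id, access_token, fields, api_version="v19.0"):
    """Build a Graph API URL that reads `fields` on an Instagram account you manage.

    api_version is an assumed placeholder; use whichever version your app targets.
    """
    query = urlencode({"fields": ",".join(fields), "access_token": access_token})
    return f"{GRAPH_BASE}/{api_version}/{ig_user_id}?{query}"


# Placeholder ID and token, for illustration only.
url = build_ig_user_url(
    "YOUR_IG_USER_ID",
    "YOUR_ACCESS_TOKEN",
    ["username", "followers_count", "media_count"],
)
print(url)
```

The resulting URL can be fetched with any HTTP client. Keeping URL construction separate from the network call makes the request easy to log and test.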

Method 2: Scraping public web data directly

For data on accounts you don’t own — competitor research, brand monitoring, hashtag tracking, influencer discovery — direct scraping of the public web is the path. Instagram’s web frontend communicates with backend GraphQL endpoints; those endpoints return structured JSON that’s much easier to work with than parsing HTML.

A skeleton in Python using curl_cffi:

python

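from curl_cffi import requests

# Placeholder proxy URL; substitute your own residential or mobile proxy.
PROXY_URL = "http://USER:PASS@proxy.example.com:8080"


def get_public_profile(username):
    """Return the public profile JSON for `username`, or None on failure."""
    url = f"https://www.instagram.com/{username}/?__a=1&__d=dis"
    headers = {
        "x-ig-app-id": "936619743392459",  # Public app ID used by web frontend
        "Accept": "*/*",
        "Accept-Language": "en-US,en;q=0.9",
    }
    response = requests.get(
        url,
        headers=headers,
        impersonate="chrome120",  # curl_cffi browser impersonation
        proxies={"https": PROXY_URL},
        timeout=15,  # avoid hanging forever on a stalled proxy
    )
    if response.status_code != 200:
        return None
    try:
        return response.json()
    except ValueError:
        # A 200 response can still carry an HTML page instead of JSON.
        return None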

A few practical notes that aren’t in most tutorials:

The internal endpoint structure changes regularly — Instagram rotates doc_id parameters on GraphQL endpoints every few weeks. Anything you build will require ongoing maintenance.
Sticky sessions are essential for pagination. Hashtag feeds, comment threads, and any paginated endpoint use cursor tokens that are tied to your IP session. Rotating IPs mid-pagination invalidates the cursor and your script breaks silently.
Rate limiting is aggressive even on public data. A reasonable starting point is 1 request every 3–5 seconds per IP, with backoff on any 429 or 401 response.
Mobile endpoints often have more permissive rate limits than desktop. Some scrapers route everything through the mobile API surface for this reason.
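The rate-limiting note above can be sketched as a small retry wrapper. This is a minimal illustration, not a tuned policy: `fetch` is a hypothetical callable you supply that returns a `(status_code, body)` tuple, and the delay values are starting points rather than known thresholds.

```python
import random
import time

RETRY_STATUSES = {401, 429}  # the two statuses the text says to back off on


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=4.0, sleep=time.sleep):
    """Call fetch(url) and retry with exponential backoff on 401/429.

    Returns the body on HTTP 200, or None if attempts run out or a
    non-retryable status comes back.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if status not in RETRY_STATUSES:
            return None
        if attempt < max_attempts - 1:
            # Delay doubles each attempt: 4s, 8s, 16s, ... plus up to 1s of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

Between successful requests on the same IP, a plain `time.sleep(random.uniform(3, 5))` covers the 3–5 second pacing. The `sleep` parameter exists so the wrapper can be tested without actually waiting.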

This method works but is high-maintenance. The endpoints change, the detection improves, and what worked last month may need patching next month.
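One inexpensive way to notice an endpoint change early is to check each response for the fields your pipeline depends on and alert when any go missing. The sketch below assumes the response is nested JSON. The dotted paths in `EXPECTED_PATHS` are hypothetical examples, not a description of Instagram’s actual response schema.

```python
def missing_paths(payload, required_paths):
    """Return the dotted paths that cannot be resolved in a nested dict."""
    missing = []
    for path in required_paths:
        node = payload
        for key in path.split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                missing.append(path)
                break
    return missing


# Hypothetical paths; replace with the fields your own parser reads.
EXPECTED_PATHS = ["data.user.username", "data.user.biography"]

sample = {"data": {"user": {"username": "example"}}}
print(missing_paths(sample, EXPECTED_PATHS))  # ['data.user.biography']
```

A non-empty result on a response that used to validate is a signal to pause the scrape and inspect the endpoint. The alternative is writing partial records without noticing.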

Method 3: Managed scraping APIs

For most teams that need Instagram data without dedicating engineering capacity to maintaining a scraper, a managed scraping API is the right answer. These services run the scraping infrastructure and return JSON; you pay per query.

The current options worth knowing about: ScrapFly’s Instagram scraper, Apify’s Instagram actors (community-maintained, several different ones for different jobs), Bright Data’s Instagram dataset and SERP API, and a number of smaller services (SociaVault, the various “Instagram API” vendors that appeared after Meta’s Basic Display API deprecation).

Trade-offs:

Pro: Zero maintenance. The vendor handles proxy rotation, detection evasion, endpoint changes, and parsing.
Pro: Faster to ship. You’re integrating an API, not building infrastructure.
Con: Per-query costs add up at scale. At 100K queries/month, raw scraping with your own residential proxies usually wins on cost — if you’ve built the infrastructure.
Con: Vendor risk. If the vendor’s scraping gets blocked, your pipeline breaks until they fix it.
Con: Data quality varies. Some vendors return cleaner, more complete data than others. Test with a small sample before committing.

For most teams, the right starting point is a managed API. Migrate to in-house scraping only if cost or specific data requirements justify the engineering investment.
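The cost trade-off comes down to simple break-even arithmetic: in-house scraping carries a fixed monthly cost (engineering time, monitoring) plus a lower per-query cost (mostly proxy bandwidth), while a managed API has no fixed cost but a higher per-query price. The sketch below makes that explicit. Every figure in the example call is invented for illustration and is not a quote from any vendor.

```python
import math


def break_even_monthly_queries(managed_per_1k, inhouse_per_1k, inhouse_fixed_monthly):
    """Monthly query volume above which in-house scraping becomes cheaper.

    Prices are per 1,000 queries. Returns None if in-house is never cheaper.
    """
    saving_per_1k = managed_per_1k - inhouse_per_1k
    if saving_per_1k <= 0:
        return None
    return math.ceil(inhouse_fixed_monthly / saving_per_1k * 1000)


# Illustrative numbers only: $4.00 vs $1.00 per 1,000 queries, $300/month fixed cost.
print(break_even_monthly_queries(4.0, 1.0, 300.0))
```

Plugging in your own vendor quote and an honest estimate of maintenance time is usually enough to decide whether migrating is worth discussing at all.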

The proxy layer

Whichever method you choose, the IP layer matters. For raw scraping (Method 2), this is your responsibility. For managed APIs (Method 3), the vendor handles it, typically by running residential proxies under the hood, which is part of why the API isn’t free.

For Instagram specifically, the requirements:

Residential or mobile IPs only. Datacenter is detected before the first request completes. ISP proxies sit in the middle and work in some cases but are weaker for hard targets like Instagram.
Sticky sessions for paginated workflows. Hashtag feeds, comment threads, follower endpoints (where accessible at all) — anything with a cursor needs the same IP for the duration of the session. Rotation breaks these.
Country-level targeting for region-specific content. Instagram serves slightly different responses depending on the requesting IP’s geography; if you’re collecting region-specific data (Brazilian influencers, Japanese hashtag trends), your IPs need to be in those regions.
Mobile IPs as the strongest detection-evasion play. Mobile carrier IPs (AT&T, T-Mobile, Verizon, Vodafone) are trusted more than residential by Instagram’s systems. Cost is higher; success rate is correspondingly higher on the hardest endpoints.
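The sticky-session requirement can be sketched as follows: pin one proxy session for the whole cursor walk, and only rotate between walks. Two things here are assumptions. The `user-session-<id>` username format is a convention some proxy providers use and may not match yours, and the page shape (`items`, `next_cursor`) is hypothetical.

```python
import uuid


def sticky_proxy_url(user, password, host, port, session_id=None):
    """Build a proxy URL that pins one exit IP via a session tag in the username.

    The "-session-<id>" suffix is an assumed format; check your provider's docs.
    """
    session_id = session_id or uuid.uuid4().hex[:8]
    return f"http://{user}-session-{session_id}:{password}@{host}:{port}"


def paginate(fetch_page, max_pages=50):
    """Yield items across pages, passing each page's cursor to the next call.

    fetch_page(cursor) must reuse the same proxy session for every call,
    otherwise the cursor may be rejected partway through.
    """
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

In practice `fetch_page` would be a closure over a single HTTP session configured with one `sticky_proxy_url(...)` value. A new session ID is generated only when a new pagination run starts.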

IPBurger’s residential and mobile proxies fit the public-data scraping case — clean IPs, country-level targeting, sticky sessions when needed — and the broader principle applies regardless of provider: at the volumes that matter, the IP layer determines whether the scrape runs to completion or stalls at 30%.

The legal and ethical layer

This is the section the original post handled poorly and that genuinely matters in 2026. Worth being precise:

Public data scraping is generally legal in the US. In hiQ Labs v. LinkedIn, the Ninth Circuit held in 2019, and again in 2022 after a Supreme Court remand, that scraping publicly accessible data likely doesn’t violate the Computer Fraud and Abuse Act; the parties settled later in 2022. The same reasoning applies to Instagram public profiles, posts, hashtags, and comments: anyone can see them without login, and collecting them at scale is generally defensible under the CFAA.

ToS violations are a different matter. Even when scraping is legal, Instagram’s Terms of Use prohibit automated access. Meta can pursue civil action (they have done so) and can ban any accounts associated with the scraping. The CFAA and ToS are separate questions; staying on the right side of one doesn’t put you right on the other.

GDPR applies to EU users’ data regardless of public availability. This is the trap most US-based scrapers miss. If you’re collecting personal data on people in the EU, even public data, you need a documented legal basis under GDPR, a clear retention policy, and the ability to handle deletion requests. Public availability of data does not exempt it from GDPR’s reach.
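The retention and deletion requirements translate into two operations every storage layer needs: purge records older than the retention window, and erase a specific person’s record on request. The sketch below uses an in-memory dict to show the shape. The class name, the 30-day default, and the record layout are all illustrative assumptions, and a real system would also need to cover backups and downstream copies.

```python
from datetime import datetime, timedelta, timezone


class ProfileStore:
    """In-memory store that timestamps every record it holds."""

    def __init__(self, retention_days=30):  # 30 days is an arbitrary placeholder
        self.retention = timedelta(days=retention_days)
        self._records = {}  # username -> (collected_at, data)

    def save(self, username, data, collected_at=None):
        self._records[username] = (collected_at or datetime.now(timezone.utc), data)

    def purge_expired(self, now=None):
        """Delete records older than the retention window; return their usernames."""
        now = now or datetime.now(timezone.utc)
        expired = [u for u, (ts, _) in self._records.items() if now - ts > self.retention]
        for username in expired:
            del self._records[username]
        return expired

    def erase(self, username):
        """Handle a deletion request; return True if a record was removed."""
        return self._records.pop(username, None) is not None
```

Storing the collection timestamp with every record from the first day is what makes both operations cheap later on.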

Avoid these patterns regardless of legal status:

Logging in with fake accounts to access non-public data
Collecting and storing personally identifiable information beyond what’s publicly displayed
Republishing scraped content in ways that violate copyright
Using scraped data for targeted harassment or doxing
Reselling scraped personal data without legitimate business purpose

The honest heuristic: if you’d be uncomfortable explaining your scraping operation to a journalist or to Meta’s lawyers, reconsider. The legal cover for public data scraping is real but narrow.

A reasonable starting workflow

If you’re building an Instagram data project today and you’re not sure where to start:

Define exactly what data you need. Most projects don’t need everything. Narrow the scope — it reduces cost, reduces detection surface, and clarifies your legal exposure.
Check if the Graph API covers it. If you’re only looking at accounts you manage, use the official API. It’s stable, free, and supported.
For public-data projects, start with a managed API. ScrapFly, Apify, or similar. Get the integration working, ship the project, see if the data actually delivers business value.
Migrate to in-house scraping only if cost or scale justifies it. Most projects don’t need to. The engineering cost of maintaining an Instagram scraper is significant and ongoing.
If you go in-house, plan for it as a real engineering investment. Residential or mobile proxies, curl_cffi for TLS fingerprinting, sticky sessions, exponential backoff, monitoring for endpoint changes, and dedicated maintenance time. This is a real project, not a weekend script.
Build for GDPR compliance from day one if any EU data is involved. Retroactively adding compliance is much harder than building it in.

The honest takeaway

The “scrape Instagram followers at scale” playbook from 2018–2021 is over. The methods that defined it — login-based scraping, account farms to bypass rate limits, follower-list harvesting — don’t work reliably, get accounts banned, and increasingly cross legal lines.

The realistic picture for 2026 is narrower but actually achievable: significant public data is collectible through proper public-web scraping or managed APIs, the official Graph API covers business-owned accounts, and the right infrastructure (residential or mobile proxies, modern TLS handling, sticky sessions, careful rate limiting) makes the public-data scrape sustainable. None of it gets you private DMs or scaled follower lists. All of it gets you enough for the legitimate business use cases — competitor monitoring, brand health, hashtag tracking, influencer discovery, public sentiment — that the original post was nominally written to support.

The methods that work are duller than the ones that don’t. They’re also the ones that still exist next year.
