Post Time: 2025-03-28
Step‑by‑step guide to bulk scraping LinkedIn using rotating residential proxies.
LinkedIn stands as the premier professional network, hosting over 900 million user profiles and millions of company pages. For sales teams, recruiters, and market researchers, accessing this wealth of data—names, titles, company details, job postings—can drive lead generation and competitive intelligence. However, LinkedIn’s anti‑scraping defenses and rate limits pose significant challenges to bulk data extraction. This guide outlines a clear, three‑method approach to scraping LinkedIn efficiently, leveraging rotating residential proxies for reliable, large‑scale operations.
Traditional scraping methods—single IP requests or free proxy lists—often trigger account bans or incomplete data. Dynamic residential proxies, which route requests through genuine home‑user IPs, mimic real visitors and bypass LinkedIn’s bot detection.
Before beginning, assemble the following:
1. Install Dependencies:
pip install requests beautifulsoup4
2. Configure Proxy List: Save your MacroProxy IPs and credentials in a proxies.json file (a sample layout follows this list).
3. Header Template: Extract your LinkedIn session cookie and user agent string for authenticated requests.
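For reference, here is a minimal proxies.json layout that the scripts below assume: a top-level "list" key holding full proxy URLs. The hostnames, ports, and credentials are placeholders.

{
  "list": [
    "http://USERNAME:PASSWORD@gateway1.example-proxy.com:8000",
    "http://USERNAME:PASSWORD@gateway2.example-proxy.com:8000"
  ]
}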
With the environment prepared, the following methods outline practical scraping scenarios.
The first method scrapes individual profiles:
1. Gather LinkedIn profile URLs (e.g., from search results or a CSV).
2. Send each GET request through a different proxy IP.
3. Use BeautifulSoup to extract name, headline, location, current role, and education.
4. Append results to a CSV or database table.
import requests
from bs4 import BeautifulSoup
import random, json, time

# Load the rotating proxy pool
with open('proxies.json') as f:
    proxies = json.load(f)['list']

headers = {
    'User-Agent': 'Mozilla/5.0 ...',
    'Cookie': 'li_at=YOUR_SESSION_COOKIE;'
}

def fetch_profile(url):
    # Route each request through a different residential IP
    proxy = random.choice(proxies)
    resp = requests.get(url, headers=headers,
                        proxies={'http': proxy, 'https': proxy}, timeout=10)
    soup = BeautifulSoup(resp.text, 'html.parser')
    name_tag = soup.select_one('.pv-top-card--list li')
    name = name_tag.get_text(strip=True) if name_tag else None
    # ... extract other fields ...
    return {'name': name}

profiles = ['https://www.linkedin.com/in/john-doe/']
for url in profiles:
    data = fetch_profile(url)
    print(data)
    time.sleep(random.uniform(2, 5))  # randomized pause between requests
Practical Tip: Rotate User‑Agent headers alongside proxies to further mimic varied browsers and devices.
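A minimal way to do that, assuming a small hand-maintained list of User-Agent strings (the entries below are abbreviated placeholders); call build_headers() in place of the static headers dict above:

import random

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ...',
    'Mozilla/5.0 (X11; Linux x86_64) ...',
]

def build_headers():
    # Pair a random browser signature with the session cookie on every request
    return {
        'User-Agent': random.choice(user_agents),
        'Cookie': 'li_at=YOUR_SESSION_COOKIE;'
    }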
The second method targets company pages and job postings. Objective: Extract company overviews, job postings, and employee counts.
URL pattern: https://www.linkedin.com/company/{company-id}/about/
Rotate proxies and parse .org-top-card-summary__title, .about-us-section.
URL pattern: https://www.linkedin.com/jobs/search/?f_C={company-id}
Extract job titles, locations, and posted dates via selectors such as .job-card-list__title.
Detect the “Next” link and loop until no further pages remain, sending each request via a new proxy (see the sketch below).
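A rough pagination sketch under two assumptions: the job search URL accepts a start offset for paging, and the .job-card-list__title selector above still matches rendered cards. The loop simply stops when a page returns no cards, as a stand-in for detecting the “Next” link.

import requests, random, json
from bs4 import BeautifulSoup

with open('proxies.json') as f:
    proxies = json.load(f)['list']

def scrape_company_jobs(company_id, headers):
    jobs, start = [], 0
    while True:
        proxy = random.choice(proxies)  # new IP for every page
        url = (f'https://www.linkedin.com/jobs/search/'
               f'?f_C={company_id}&start={start}')
        resp = requests.get(url, headers=headers,
                            proxies={'http': proxy, 'https': proxy}, timeout=10)
        soup = BeautifulSoup(resp.text, 'html.parser')
        cards = soup.select('.job-card-list__title')
        if not cards:  # no further pages
            break
        jobs += [c.get_text(strip=True) for c in cards]
        start += 25  # assumed page size
    return jobs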
The third method covers bulk people search. Use Case: Export profiles matching keywords (e.g., “Data Scientist California”).
Template: https://www.linkedin.com/search/results/people/?keywords={keyword}&origin=GLOBAL_SEARCH_HEADER
Loop pages, each via a distinct proxy; parse profile URLs.
Aggregate the profile URLs, then feed them into Method 1 for detailed scraping (a minimal loop sketch follows).
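A compact sketch of that loop. Assumptions: the search template above accepts a page parameter, profile links can be recognized by “/in/” in the href, and the results appear in the fetched HTML (heavily scripted pages may need the headless approach discussed later).

import requests, random, json
from urllib.parse import quote
from bs4 import BeautifulSoup

with open('proxies.json') as f:
    proxies = json.load(f)['list']

def collect_profile_urls(keyword, headers, max_pages=10):
    found = set()
    for page in range(1, max_pages + 1):
        proxy = random.choice(proxies)  # distinct proxy per page
        url = ('https://www.linkedin.com/search/results/people/'
               f'?keywords={quote(keyword)}&origin=GLOBAL_SEARCH_HEADER&page={page}')
        resp = requests.get(url, headers=headers,
                            proxies={'http': proxy, 'https': proxy}, timeout=10)
        soup = BeautifulSoup(resp.text, 'html.parser')
        links = [a['href'] for a in soup.select('a[href*="/in/"]')]
        if not links:
            break
        found.update(link.split('?')[0] for link in links)
    return sorted(found)  # feed these into Method 1's fetch_profile()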
95M+ residential IPs support high-volume search queries without manual proxy cycling, delivering reliable performance at affordable pricing. Sign up today for a free trial!
LinkedIn employs rate limits, CAPTCHAs, and bot detection. Key defenses and countermeasures:
Challenge | Countermeasure
IP Rate Limits | Rotate proxies on every request
CAPTCHAs | Use a managed service with CAPTCHA bypass
Session Expiry | Refresh cookies periodically or reauthenticate
JavaScript-Rendered Data | Employ headless browser automation sparingly
1. Headless Browsers: Tools like Playwright can render dynamic content but at higher resource cost. Combine with rotating residential proxies to distribute headless instances across IPs.
2. Adaptive Timing: Insert randomized delays (2–5 seconds) between requests to mimic human browsing patterns; the sketch below combines this with a proxied headless browser.
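A minimal sketch combining both tips, using Playwright's Python API. The gateway address and credentials are placeholders, and real pages will need explicit waits and selector tuning.

import random, time
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={'server': 'http://gateway.example-proxy.com:8000',  # placeholder gateway
               'username': 'USERNAME', 'password': 'PASSWORD'})
    page = browser.new_page()
    for url in ['https://www.linkedin.com/in/john-doe/']:
        page.goto(url, timeout=30000)
        time.sleep(random.uniform(2, 5))  # randomized, human-like pause
        html = page.content()             # rendered HTML, ready for BeautifulSoup
    browser.close()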
Network Size: 95M+ IPs across 195 countries.
Protocols: HTTP(S) & SOCKS5, with auto‑rotation and sticky sessions.
Performance: High anonymity with a 99% success rate.
MacroProxy rotating residential proxies start at $1/GB, with 200K fresh US IPs just added. Buy now!
Regularly test IPs against a “health check” endpoint to retire slow or blocked proxies (a simple check is sketched after these tips).
Cross-verify scraped fields (e.g., name formats) to catch parsing errors.
Store timestamps and scrape only new or updated profiles to save resources.
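A simple health check along those lines, assuming an echo endpoint such as https://httpbin.org/ip and the proxies.json pool from earlier; the latency threshold is illustrative.

import requests, json, time

with open('proxies.json') as f:
    pool = json.load(f)['list']

def healthy(proxy, max_latency=3.0):
    # A proxy passes if the echo endpoint answers quickly through it
    try:
        start = time.time()
        r = requests.get('https://httpbin.org/ip',
                         proxies={'http': proxy, 'https': proxy}, timeout=max_latency)
        return r.ok and (time.time() - start) <= max_latency
    except requests.RequestException:
        return False

pool = [p for p in pool if healthy(p)]  # retire slow or blocked IPs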
Q1: How many proxies are needed for scraping 10,000 profiles?
A pool of 50–100 residential IPs, rotated every 50–100 requests, balances speed and block avoidance.
Q2: Can free proxies be used for LinkedIn scraping?
Free lists often contain dead or blacklisted IPs; they increase block risk and reduce success rates.
Q3: How to handle LinkedIn login for private data?
Use session cookies from a dedicated LinkedIn account, rotated via proxies; avoid simultaneous logins on multiple IPs.
Q4: What’s the difference between rotating and sticky sessions?
Rotating assigns a new IP per request; sticky retains the same IP for a session—use rotating for breadth, sticky for checkout‑like flows.
Q5: How does MacroProxy ensure data privacy compliance?
MacroProxy sources IPs via opt‑in networks and adheres to GDPR/CCPA standards, providing audit logs for each request.