🍡 Introduction
Welcome to Part 2 of our GSMArena WinForms Scraping using C# in VS2022 series! In Part 1, we built a basic phone-list scraper with C# (.NET) and HtmlAgilityPack. This time, we'll explore more advanced scraping practices that make your tool more reliable and more polite to GSMArena's servers.
🗝️ st-text vs. brandmenu-v2
GSMArena uses different HTML structures for brands and models:
- st-text: Older list pages with simple UL/LI links.
- brandmenu-v2: Newer, structured grid menus with richer classes and responsive design.
To handle both, write one XPath expression that matches links in either container (HtmlAgilityPack's SelectNodes takes XPath; CSS selectors require an extra library):
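// Example: dual-selector strategy (the XPath union "|" covers both containers)
var nodes = doc.DocumentNode.SelectNodes(
    "//div[contains(@class,'st-text')]//a | //div[contains(@class,'brandmenu-v2')]//a");
// SelectNodes returns null, not an empty collection, when nothing matches
if (nodes == null) { /* layout may have changed: log it and review the XPath */ }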
✅ This approach makes your scraper resilient to GSMArena layout variations.
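Because the XPath union can pick up the same brand from both containers, it can help to de-duplicate the links before using them. The sketch below is a hypothetical helper (BrandLinks.Normalize is not part of HtmlAgilityPack). It assumes brand hrefs are relative links resolved against https://www.gsmarena.com/ and uses only the .NET base class library:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of HtmlAgilityPack): resolves brand hrefs to
// absolute URLs and drops duplicates that the XPath union may return.
public static class BrandLinks
{
    // Assumption: relative brand links are resolved against the site root.
    private static readonly Uri BaseUri = new Uri("https://www.gsmarena.com/");

    public static List<(string Name, Uri Url)> Normalize(
        IEnumerable<(string Text, string Href)> links)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var result = new List<(string Name, Uri Url)>();

        foreach (var (text, href) in links)
        {
            // Skip anchors without an href and anything that is not a valid URL
            if (string.IsNullOrWhiteSpace(href) ||
                !Uri.TryCreate(BaseUri, href.Trim(), out var absolute))
                continue;

            // Keep only ordinary web links (drops javascript:, mailto:, ...)
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
                continue;

            // HashSet.Add returns false when this URL was already collected
            if (seen.Add(absolute.AbsoluteUri))
                result.Add(((text ?? string.Empty).Trim(), absolute));
        }

        return result;
    }
}
```

With HtmlAgilityPack you could feed it from the selected nodes, for example `BrandLinks.Normalize(nodes.Select(n => (n.InnerText, n.GetAttributeValue("href", ""))))` (requires `using System.Linq;`).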
💪🏿 Using User-Agents
Many servers respond differently to bots than to browsers. Sending a browser-like User-Agent header helps avoid blocks and makes it more likely that you get the desktop layout your selectors expect.
VS2022-C# Code
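using System.Net.Http;

// Create one HttpClient and reuse it for all requests
var httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.UserAgent.ParseAdd(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36");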
✅ Best practice: Use realistic, modern user-agent strings. Avoid hard-coding only one; rotate if needed for large jobs.
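One way to rotate user-agents without mutating a shared HttpClient is to set the header per request on an HttpRequestMessage. This is a minimal sketch: the UserAgentPool class and its three sample strings are illustrative assumptions, so keep the list in step with current browser versions:

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: picks a User-Agent per request from a small pool.
public static class UserAgentPool
{
    // Sample desktop browser strings (assumption: update these periodically)
    private static readonly string[] Agents =
    {
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0"
    };

    private static readonly Random Rng = new Random();

    // Random is not thread-safe, so guard it if several tasks share the pool
    public static string Next()
    {
        lock (Rng) { return Agents[Rng.Next(Agents.Length)]; }
    }

    // Builds a GET request that carries its own User-Agent, leaving the shared
    // HttpClient's DefaultRequestHeaders untouched.
    public static HttpRequestMessage CreateGet(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.UserAgent.ParseAdd(Next());
        return request;
    }
}
```

You would then send it with `using var response = await httpClient.SendAsync(UserAgentPool.CreateGet(url));`.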
🧑‍💻 Using a Proxy
For serious scraping, especially at scale, using a proxy can:
- Distribute your requests across IP addresses
- Bypass simple IP rate-limiting
- Test access from different regions
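// Basic HttpClientHandler with a proxy (needs: using System.Net; using System.Net.Http;)
var handler = new HttpClientHandler
{
    // Placeholder address: replace with your proxy's IP; 8080 is only an example port
    Proxy = new WebProxy("http://your-proxy-ip:8080"),
    UseProxy = true
};
using var client = new HttpClient(handler);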
✅ Always choose reputable proxies. Free proxies are often unreliable, and an untrusted proxy can see (and, for plain HTTP, alter) everything you send through it.
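To actually spread requests across several proxies, you could keep one long-lived HttpClient per proxy and hand them out in turn. The ProxyRotator class below is a hypothetical sketch; the proxy URLs you pass in are placeholders for your provider's addresses, and proxy authentication (WebProxy.Credentials) is left out:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;

// Hypothetical helper: one long-lived HttpClient per proxy, handed out round-robin.
public sealed class ProxyRotator : IDisposable
{
    private readonly HttpClient[] _clients;
    private int _next = -1;

    public ProxyRotator(IEnumerable<string> proxyUrls)
    {
        _clients = proxyUrls
            .Select(url => new HttpClient(new HttpClientHandler
            {
                Proxy = new WebProxy(url),
                UseProxy = true
            }))
            .ToArray();

        if (_clients.Length == 0)
            throw new ArgumentException("At least one proxy URL is required.", nameof(proxyUrls));
    }

    public int Count => _clients.Length;

    // Interlocked keeps the counter safe when several scraping tasks share the rotator;
    // masking with int.MaxValue keeps the index non-negative after overflow.
    public HttpClient NextClient()
    {
        int index = Interlocked.Increment(ref _next);
        return _clients[(index & int.MaxValue) % _clients.Length];
    }

    // Disposing each client also disposes the handler it owns
    public void Dispose()
    {
        foreach (var client in _clients) client.Dispose();
    }
}
```

Creating the clients once and reusing them avoids the socket exhaustion that can come from constructing a new HttpClient for every request.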
🧩 Adding Delays Between Requests
Hammering GSMArena with rapid-fire requests is unethical and risks getting your IP blocked. Adding random delays mimics human browsing and keeps the load on their servers low.
// Create Random once and reuse it; Next's upper bound is exclusive
var rand = new Random();
await Task.Delay(rand.Next(1500, 3000)); // wait 1.5 to 3 seconds
✅ Pro tip: Add a cancellation token in production to safely stop long scraping runs.
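Here is a minimal sketch of that tip: a hypothetical PoliteDelay.WaitAsync helper that combines the random pause with a CancellationToken, so a Stop button can end a long run without waiting out the current delay:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: waits a random, human-like interval between requests
// and stops early if the caller cancels (e.g. from a WinForms "Stop" button).
public static class PoliteDelay
{
    private static readonly Random Rng = new Random();

    // minMs is inclusive, maxMs is exclusive (same as Random.Next)
    public static Task WaitAsync(int minMs, int maxMs, CancellationToken token)
    {
        if (minMs < 0 || maxMs <= minMs)
            throw new ArgumentOutOfRangeException(nameof(minMs), "Require 0 <= minMs < maxMs.");

        int delay;
        lock (Rng) { delay = Rng.Next(minMs, maxMs); }

        // If the token is canceled, the returned task is canceled and
        // awaiting it throws TaskCanceledException.
        return Task.Delay(delay, token);
    }
}
```

In the form, you might create a CancellationTokenSource when scraping starts, call `cts.Cancel()` from the Stop button, `await PoliteDelay.WaitAsync(1500, 3000, cts.Token);` between requests, and catch OperationCanceledException around the loop.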
📒 Expected Exceptions and How to Work Around Them
When scraping real-world websites, your code will hit errors. Here are common issues and solutions:
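- HttpRequestException: The server refused or dropped the connection, or returned a non-success status code.
➜ Retry with exponential backoff or switch proxies.
- 404/403 status codes: Page not found or access denied. GetStringAsync reports these as an HttpRequestException.
➜ Double-check URLs, adjust the User-Agent, and make sure robots.txt allows scraping that path.
- NullReferenceException: No matching HTML nodes. SelectNodes returns null instead of an empty collection.
➜ Null-check the result and update your XPath selectors to match the new site layout.
- TaskCanceledException: The request exceeded HttpClient.Timeout (100 seconds by default) because the server is slow or throttling you. HttpClient does not throw TimeoutException directly; on .NET 5 and later it appears as the InnerException.
➜ Adjust HttpClient.Timeout, add delays, and use retries.
- HtmlWebException (HtmlAgilityPack): Raised by HtmlWeb for some page-loading failures. Malformed HTML by itself usually does not throw, because the parser is lenient and records problems in doc.ParseErrors.
➜ Use try/catch and log problematic pages for manual review.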
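try
{
    // GetStringAsync throws HttpRequestException for non-success codes such as 404 or 403
    var html = await httpClient.GetStringAsync(url);
    var doc = new HtmlDocument(); // HtmlAgilityPack.HtmlDocument (using HtmlAgilityPack;)
    doc.LoadHtml(html);
}
catch (HttpRequestException ex)
{
    // Log and retry or skip
    Console.WriteLine($"Request failed: {ex.Message}");
}
catch (TaskCanceledException ex)
{
    // HttpClient reports an exceeded Timeout this way, not as TimeoutException
    Console.WriteLine($"Request timed out: {ex.Message}");
}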
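The "retry with exponential backoff" advice from the list above can be sketched as a small generic helper. Retry.WithBackoffAsync is a hypothetical name, not a .NET API. It retries only HttpRequestException and HttpClient timeouts, doubles the wait after each failure, and adds up to 250 ms of random jitter so parallel workers don't retry in lockstep:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: retries an async operation with exponential backoff
// (base, 2x base, 4x base, ...) plus a little random jitter.
public static class Retry
{
    private static readonly Random Jitter = new Random();

    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts,
        TimeSpan baseDelay,
        CancellationToken token = default)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && IsTransient(ex, token))
            {
                // Wait base * 2^(attempt - 1), plus 0 to 249 ms of jitter
                double backoffMs = baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);
                int jitterMs;
                lock (Jitter) { jitterMs = Jitter.Next(0, 250); }
                await Task.Delay(TimeSpan.FromMilliseconds(backoffMs + jitterMs), token);
            }
        }
    }

    // Network errors and HttpClient timeouts are worth retrying;
    // a cancellation requested through the caller's token is not.
    private static bool IsTransient(Exception ex, CancellationToken token) =>
        ex is HttpRequestException ||
        (ex is TaskCanceledException && !token.IsCancellationRequested);
}
```

Usage might look like `var html = await Retry.WithBackoffAsync(() => httpClient.GetStringAsync(url), maxAttempts: 4, baseDelay: TimeSpan.FromSeconds(2));`. Note that a 404 also surfaces as HttpRequestException, so this sketch retries it too; on .NET 5 or later you could check HttpRequestException.StatusCode in IsTransient to skip permanent errors.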
📋 Updated Workflow Overview
- On “Load Brands” click: call GetBrandSummaryAsync(progress).
- Populate DGVBrands with Brand/PhoneCount.
- The user selects brands (optionally filtering by name or checkbox).
- On “Scrape Selected” click: pass those Brand items to your detail scraper method.
- Fetch phones only for the chosen brands.
- Populate DGVscrap or another grid.
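The selection and scraping steps of this workflow can be sketched without any WinForms code, which keeps them easy to test. Everything here is hypothetical: BrandRow stands in for one DGVBrands row (Brand, PhoneCount, checkbox), and scrapeBrandAsync is a placeholder for your detail scraper method. It assumes .NET 6 or later for the record type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model for one DGVBrands row.
public sealed record BrandRow(string Name, string Url, int PhoneCount, bool Selected);

public static class BrandWorkflow
{
    // Keep only checked rows, optionally filtered by a case-insensitive name substring.
    public static List<BrandRow> PickBrands(IEnumerable<BrandRow> rows, string? nameFilter = null) =>
        rows.Where(r => r.Selected)
            .Where(r => string.IsNullOrWhiteSpace(nameFilter) ||
                        r.Name.IndexOf(nameFilter, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();

    // Scrape phones only for the chosen brands, pausing between brands
    // and reporting percent complete (e.g. to a ProgressBar via Progress<int>).
    public static async Task<Dictionary<string, IReadOnlyList<string>>> ScrapeSelectedAsync(
        IReadOnlyList<BrandRow> chosen,
        Func<BrandRow, CancellationToken, Task<IReadOnlyList<string>>> scrapeBrandAsync,
        TimeSpan pauseBetweenBrands,
        IProgress<int>? progress = null,
        CancellationToken token = default)
    {
        var results = new Dictionary<string, IReadOnlyList<string>>();

        for (int i = 0; i < chosen.Count; i++)
        {
            token.ThrowIfCancellationRequested();
            results[chosen[i].Name] = await scrapeBrandAsync(chosen[i], token);
            progress?.Report((i + 1) * 100 / chosen.Count);

            // No need to wait after the last brand
            if (i < chosen.Count - 1)
                await Task.Delay(pauseBetweenBrands, token);
        }

        return results;
    }
}
```

In the form, you could bind DGVBrands to a list of these rows and, in the “Scrape Selected” handler, pass the result of PickBrands to ScrapeSelectedAsync before filling DGVscrap.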
✅ Tip: Always wrap network calls in try/catch and log meaningful error messages to make debugging easier.
🧑‍⚖️ Conclusion
By supporting both st-text and brandmenu-v2 layouts, using realistic user-agents, proxies, and delays, and handling exceptions robustly, your GSMArena scraper becomes much closer to production-ready and far more respectful of the site it scrapes.
Always remember: Ethical scraping is sustainable scraping. Respect robots.txt, throttle requests, and don't overload target sites.
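To act on the robots.txt advice, you could download https://www.gsmarena.com/robots.txt once and check each path before requesting it. The RobotsTxt.IsAllowed helper below is a deliberately simplified, hypothetical sketch: it reads only the `User-agent: *` group, treats rules as plain path prefixes (no `*` or `$` wildcards), and applies the longest-match rule from RFC 9309, where Allow wins a tie:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified robots.txt checker: "User-agent: *" only,
// plain prefix rules (no * or $ wildcards), longest match wins.
public static class RobotsTxt
{
    public static bool IsAllowed(string robotsTxt, string path)
    {
        var rules = new List<(bool Allow, string Prefix)>();
        bool inStarGroup = false;  // inside a "User-agent: *" group?
        bool lastWasAgent = false; // consecutive User-agent lines share one group

        foreach (var rawLine in robotsTxt.Split('\n'))
        {
            // Strip comments and whitespace (including '\r'), then split "key: value"
            var line = rawLine.Split('#')[0].Trim();
            int colon = line.IndexOf(':');
            if (colon < 0) continue;

            var key = line.Substring(0, colon).Trim().ToLowerInvariant();
            var value = line.Substring(colon + 1).Trim();

            if (key == "user-agent")
            {
                if (!lastWasAgent) inStarGroup = false; // a new group starts here
                if (value == "*") inStarGroup = true;
                lastWasAgent = true;
                continue;
            }

            lastWasAgent = false;
            if (!inStarGroup || value.Length == 0) continue; // empty rule matches nothing

            if (key == "allow") rules.Add((true, value));
            else if (key == "disallow") rules.Add((false, value));
        }

        // Longest matching prefix wins; on a tie, Allow beats Disallow
        var best = rules
            .Where(r => path.StartsWith(r.Prefix, StringComparison.Ordinal))
            .OrderByDescending(r => r.Prefix.Length)
            .ThenByDescending(r => r.Allow)
            .FirstOrDefault();

        return best.Prefix == null || best.Allow; // no matching rule means allowed
    }
}
```

Pass the path portion of the page URL (for example `new Uri(url).PathAndQuery`) as the second argument.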
💬 Have questions or want more C# scraping tutorials? Let me know in the comments below! Check my other posts for .NET, ADO.NET, and WinForms guides.
Link to this post: https://adonetaccess2003.blogspot.com/2025/07/gsma-scraper-csharp-winforms-part2.html