GSMArena Scraping Using C# WinForms in VS2022 (Part 2)

🍡 Introduction

Welcome to Part 2 of our GSMArena WinForms Scraping using C# in VS2022 series! In Part 1, we built a basic phone list scraper using C#.NET and HtmlAgilityPack. This time, we'll explore advanced scraping practices, making your tool more reliable and polite to GSMArena servers.

🗝️ st-text vs. brandmenu-v2

GSMArena uses different HTML structures for brands and models:

  • st-text: Older list pages with simple UL/LI links.
  • brandmenu-v2: Newer, structured grid menus with richer classes and responsive design.

To handle both, you can write a single XPath expression that matches links in either container (HtmlAgilityPack selects with XPath; CSS selectors need an add-on package):


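// Example: dual-selector strategy (the XPath union "|" matches either layout)
// Note: SelectNodes returns null, not an empty collection, when nothing matches.
var nodes = doc.DocumentNode.SelectNodes(
    "//div[contains(@class,'st-text')]//a | //div[contains(@class,'brandmenu-v2')]//a");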

✅ This approach makes your scraper resilient to GSMArena layout variations.
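
Here is a minimal sketch of how that selector could be wrapped in a null-safe helper. ExtractLinks is a hypothetical name (it is not from Part 1), and the sample markup is made up for illustration; real GSMArena pages are more complex. It assumes the HtmlAgilityPack NuGet package is installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: collects link text and href from whichever
// of the two layouts is present in the page.
static List<(string Text, string Href)> ExtractLinks(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // XPath union: anchors inside either container.
    var nodes = doc.DocumentNode.SelectNodes(
        "//div[contains(@class,'st-text')]//a | //div[contains(@class,'brandmenu-v2')]//a");

    // SelectNodes returns null (not an empty collection) when nothing matches.
    if (nodes == null)
        return new List<(string Text, string Href)>();

    return nodes
        .Select(a => (Text: HtmlEntity.DeEntitize(a.InnerText).Trim(),
                      Href: a.GetAttributeValue("href", string.Empty)))
        .Where(link => link.Href.Length > 0)
        .ToList();
}

// Made-up markup for illustration only.
string sample =
    "<div class='st-text'><ul><li><a href='acer-phones-59.php'>Acer</a></li></ul></div>";

foreach (var (text, href) in ExtractLinks(sample))
    Console.WriteLine($"{text} -> {href}");
```

Returning an empty list instead of null means the calling code can loop over the result without an extra check, which sidesteps the NullReferenceException you would otherwise get after a layout change.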

💪🏿 Using User-Agents

Many servers respond differently to bots vs. browsers. Setting a user-agent helps avoid blocks and get the desktop layout you expect.

VS2022-C# Code


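// Reuse a single HttpClient for the whole run instead of creating one per request.
var httpClient = new HttpClient();
httpClient.DefaultRequestHeaders.UserAgent.ParseAdd(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36");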

✅ Best practice: Use realistic, modern user-agent strings. Avoid hard-coding only one; rotate if needed for large jobs.
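
If you do want to rotate, one possible approach is to set the header per request rather than on DefaultRequestHeaders, so a single shared HttpClient can send different agents. The sketch below assumes a small hand-picked pool; BuildRequest and the two strings are illustrative, not part of the original project, and the strings should be refreshed from time to time.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: builds a GET request carrying a randomly chosen User-Agent.
static HttpRequestMessage BuildRequest(string url, string[] agents, Random rng)
{
    var message = new HttpRequestMessage(HttpMethod.Get, url);
    message.Headers.UserAgent.ParseAdd(agents[rng.Next(agents.Length)]);
    return message;
}

// Example pool (assumption: any realistic, current browser strings will do).
string[] userAgents =
{
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0"
};

var rng = new Random();
using var demoRequest = BuildRequest("https://www.gsmarena.com/", userAgents, rng);

// Show which agent was picked for this request.
Console.WriteLine(demoRequest.Headers.UserAgent);
```

To actually send it, pass the message to httpClient.SendAsync(...) and read response.Content. A request message can only be sent once, so build a fresh one for every page.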

🧑‍💻 Using a Proxy

For serious scraping, especially at scale, using a proxy can:

  • Distribute your requests across IP addresses
  • Bypass simple IP rate-limiting
  • Test access from different regions

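// Basic HttpClientHandler with proxy (needs: using System.Net; using System.Net.Http;)
var handler = new HttpClientHandler
{
    // Placeholder address: replace it with your real proxy host and numeric port.
    // The literal text "port" is not a valid URI and will throw at runtime.
    Proxy = new WebProxy("http://your-proxy-ip:port"),
    UseProxy = true
};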

using var client = new HttpClient(handler);

✅ Always choose reputable proxies. Free proxies are often unreliable or dangerous.
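
Paid proxies usually need a username and password. As a rough sketch, the handler setup can be moved into a small helper that also accepts credentials. CreateProxyHandler and the proxy.example.com address are placeholders I made up for illustration; substitute the values your provider gives you.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical helper: builds a handler for an HTTP proxy, with optional credentials.
static HttpClientHandler CreateProxyHandler(string proxyUrl, string user = "", string password = "")
{
    var proxy = new WebProxy(new Uri(proxyUrl));

    // Only attach credentials when a user name was supplied.
    if (!string.IsNullOrEmpty(user))
        proxy.Credentials = new NetworkCredential(user, password);

    return new HttpClientHandler { Proxy = proxy, UseProxy = true };
}

// Placeholder address; HttpClient disposes the handler together with itself.
var proxyHandler = CreateProxyHandler("http://proxy.example.com:8080");
using var proxiedClient = new HttpClient(proxyHandler)
{
    Timeout = TimeSpan.FromSeconds(30)
};

Console.WriteLine($"Proxy enabled: {proxyHandler.UseProxy}");
```

Keeping the proxy address out of the scraping code makes it easier to switch providers later, or to read the value from a settings file.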

🧩 Adding Delays Between Requests

Hammering GSMArena with many rapid-fire requests is unethical and risks getting your IP blocked. Adding random delays mimics human browsing and keeps the load on their servers low.


var rand = new Random();
await Task.Delay(rand.Next(1500, 3000)); // 1.5 - 3 seconds

✅ Pro tip: Add a cancellation token in production to safely stop long scraping runs.
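
One way to combine the random delay with cancellation is sketched below. PoliteDelayAsync is a hypothetical helper name, and the page names and the 2-second CancelAfter are stand-ins for a real URL list and a real Cancel button.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: waits a random interval and returns false if the
// run was cancelled while waiting, so the caller can stop cleanly.
static async Task<bool> PoliteDelayAsync(
    Random rng, CancellationToken token, int minMs = 1500, int maxMs = 3000)
{
    try
    {
        await Task.Delay(rng.Next(minMs, maxMs), token);
        return true;
    }
    catch (OperationCanceledException)
    {
        return false;
    }
}

using var cts = new CancellationTokenSource();
// In a WinForms app, a Cancel button's Click handler would call cts.Cancel().
// Here a timer stands in for the user.
cts.CancelAfter(TimeSpan.FromSeconds(2));

var random = new Random();
string[] pages = { "page-1", "page-2", "page-3" }; // placeholders, not real URLs

foreach (var page in pages)
{
    Console.WriteLine($"Fetching {page} ...");
    if (!await PoliteDelayAsync(random, cts.Token))
    {
        Console.WriteLine("Cancelled, stopping the run.");
        break;
    }
}
```

Passing the same token to your HTTP calls as well lets a cancel request interrupt a slow download, not just the pause between downloads.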

📒 Expected Exceptions and How to Work Around Them

When scraping real-world websites, your code will hit errors. Here are common issues and solutions:

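  • HttpRequestException: Server refused or dropped the connection.
    ➜ Retry with exponential backoff or switch proxies.
  • 404/403 Status Codes: Page not found or access denied. GetStringAsync reports these as an HttpRequestException.
    ➜ Double-check URLs; adjust User-Agent; ensure robots.txt allows scraping.
  • NullReferenceException: SelectNodes returned null because no HTML nodes matched.
    ➜ Check the result for null before using it; update the XPath to match the new site layout.
  • TaskCanceledException (timeout): Server too slow or blocking. HttpClient reports an elapsed Timeout as a cancelled task rather than a bare TimeoutException.
    ➜ Increase HttpClient.Timeout; add delays; use retries.
  • HtmlWebException (HtmlAgilityPack): Thrown by the HtmlWeb loader when a page cannot be loaded. Malformed HTML passed to LoadHtml is normally tolerated and recorded in doc.ParseErrors instead.
    ➜ Use Try/Catch and log problematic pages for manual review.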

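try
{
    var html = await httpClient.GetStringAsync(url);
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    // Continue parsing here; doc is only in scope inside this try block.
}
catch (HttpRequestException ex)
{
    // Connection failures and non-success status codes (e.g. 403, 404) land here.
    // Log and retry or skip
    Console.WriteLine($"Request failed: {ex.Message}");
}
catch (TaskCanceledException ex)
{
    // HttpClient reports a timeout as a cancelled task.
    Console.WriteLine($"Request timed out: {ex.Message}");
}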
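
The "retry with exponential backoff" advice can be sketched as a small generic helper. RetryAsync, its parameter names, and the default values are my own assumptions for illustration; tune the attempt count and base delay to your needs.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: runs an async action and retries on network errors
// and timeouts, doubling the wait after each failed attempt.
static async Task<T> RetryAsync<T>(
    Func<Task<T>> action, int maxAttempts = 4, int baseDelayMs = 1000)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await action();
        }
        // The filter stops matching on the last attempt, so the final
        // exception propagates to the caller unchanged.
        catch (Exception ex) when (attempt < maxAttempts &&
                                   (ex is HttpRequestException || ex is TaskCanceledException))
        {
            int delayMs = baseDelayMs * (1 << (attempt - 1)); // 1x, 2x, 4x, ...
            Console.WriteLine($"Attempt {attempt} failed ({ex.Message}); retrying in {delayMs} ms");
            await Task.Delay(delayMs);
        }
    }
}
```

A call would look like `var html = await RetryAsync(() => httpClient.GetStringAsync(url));`. Note that this sketch also retries a TaskCanceledException caused by your own cancellation token, so check the token before retrying if you support user cancellation.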

📋 Updated Workflow Overview

  • On “Load Brands” click: call GetBrandSummaryAsync(progress).
  • Populate DGVBrands with Brand/PhoneCount.
  • User selects brands (optionally filtering by name or checkbox).
  • On “Scrape Selected” click: pass those Brand items to your detail scraper method.
  • Fetch phones only for the chosen brands, then populate DGVscrap or another grid.

✅ Tip: Always wrap network calls in Try/Catch and provide meaningful error messages for debugging.

🧑‍⚖️ Conclusion

By supporting both the st-text and brandmenu-v2 layouts, sending realistic user-agents, using proxies and delays where appropriate, and handling exceptions robustly, your GSMArena scraper becomes far more reliable and much kinder to the site it depends on.

Always remember: Ethical scraping is sustainable scraping. Respect robots.txt, throttle requests, and don't overload target sites.

📦 GSMArena Mobile Brands

GitHub Repository Showcase

This repository contains structured data for GSMArena mobile brands, ideal for apps and web scrapers: clean, JSON-formatted brand lists ready to use in your projects.

⭐ View on GitHub

💬 Have questions or want more C# scraping tutorials? Let me know in the comments below! Check my other posts for .NET, ADO.NET, and WinForms guides.

Link to this post: https://adonetaccess2003.blogspot.com/2025/07/gsma-scraper-csharp-winforms-part2.html
