You are viewing our Forum Archives. To view or take place in current topics click here.
Help on HTML data collecting. [c#]
HTTK
  • Gold Gifter
Status: Offline
Joined: Jul 25, 2014 (10 Year Member)
Posts: 2,496
Reputation Power: 313
For fun, I am trying to make a program that pulls data from TTG.

I don't have time to browse around forever to find out why my code is broken because of final exams at school, so I was hoping someone could help me out. That way, when I sit down to work on this again, I can jump right into it.

Please don't be too harsh on my code, I'm new to C#.

Also, I know a program like this already exists, but I want to make my own for the learning experience.




I keep getting this error:
[ Register or Signin to view external links. ]


Here is my code:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace Data_Collection
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
        }

        private void button1_Click(object sender, EventArgs e)
        {
            List<string> online = new List<string>();

            // Download the raw HTML of the forum index.
            WebClient web = new WebClient();
            string html = web.DownloadString("http://www.thetechgame.com/Forums.html");

            // Capture the contents of every <ul>...</ul> block; Singleline lets '.' span newlines.
            MatchCollection m1 = Regex.Matches(html, @"<ul>\s*(.+?)</ul>", RegexOptions.Singleline);

            foreach (Match m in m1)
            {
                online.Add(m.Groups[1].Value);
            }

            // Show the captured blocks in the list box.
            listBox1.DataSource = online;
        }

    }
}

The following 1 user thanked HTTK for this useful post:

Dusknoir (12-31-2015)
#2. Posted:
Hacz
  • Christmas!
Status: Offline
Joined: Mar 04, 2010 (14 Year Member)
Posts: 2,891
Reputation Power: 150
As for the error, it is a 403 Forbidden. The easy fix is to set the User-Agent header before making the request.

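WebClient web = new WebClient();
// Send a browser-like User-Agent header with the request.
web.Headers.Add("user-agent", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)");
// Then download the string as before and the 403 error will disappear.
string html = web.DownloadString("http://www.thetechgame.com/Forums.html");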
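
If you want to see the status code yourself instead of just an unhandled exception, something like this could work. It is only a rough sketch: the ForumDownloader class and its CreateClient and TryDownload methods are names made up for the example, not part of your project.

```csharp
using System;
using System.Net;

static class ForumDownloader
{
    // Browser-like User-Agent string, same one as in the snippet above.
    const string UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)";

    // Creates a WebClient that already carries the User-Agent header.
    public static WebClient CreateClient()
    {
        var web = new WebClient();
        web.Headers.Add("user-agent", UserAgent);
        return web;
    }

    // Downloads a page. Returns null and prints the reason if the request fails.
    public static string TryDownload(string url)
    {
        using (WebClient web = CreateClient())
        {
            try
            {
                return web.DownloadString(url);
            }
            catch (WebException ex)
            {
                // ex.Response is not an HttpWebResponse for non-HTTP failures such as DNS errors.
                var response = ex.Response as HttpWebResponse;
                Console.WriteLine(response != null
                    ? "HTTP " + (int)response.StatusCode + " " + response.StatusDescription
                    : "Request failed: " + ex.Message);
                return null;
            }
        }
    }
}
```

In button1_Click you would then call string html = ForumDownloader.TryDownload("http://www.thetechgame.com/Forums.html"); and check for null before running the Regex.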


The next issue you'll run into is the user list. It is filled in on page load by an Ajax request. Digging through the script, it is populated from a PHP file via a GET request (maybe there are other actions that return more details about users, site stats, etc.). That request returns the user data as JSON (username and number, like in the footer). Changing the Regex to match that response will then produce the output: [ Register or Signin to view external links. ]
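
To give an idea of what the changed Regex could look like, here is a minimal sketch. The JSON layout is a guess on my part: the "username" key, the OnlineUsers class and the ExtractUsernames method are all assumptions for the example, so adjust them to whatever the real response contains.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class OnlineUsers
{
    // Pulls every "username" value out of a JSON string.
    // The key name "username" is an assumption; change it to match the real response.
    public static List<string> ExtractUsernames(string json)
    {
        var names = new List<string>();

        // Matches "username" : "value" and captures the value (no escaped quotes handled).
        foreach (Match m in Regex.Matches(json, @"""username""\s*:\s*""([^""]*)"""))
        {
            names.Add(m.Groups[1].Value);
        }

        return names;
    }
}
```

You could then set listBox1.DataSource = OnlineUsers.ExtractUsernames(json); where json is the string downloaded from the Ajax endpoint. A proper JSON parser would be more robust than a Regex, but this stays close to what you already have.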

I'm not sure if the staff want me posting the link to the script, so feel free to PM me if you need help finding it.