A Segment-based Weighting Technique for URL-based Genre Classification of Web Pages

Chaker Jebari


We propose a segment-based weighting technique for genre classification of web pages. The technique uses character n-grams extracted from the URL of a web page rather than from its textual content. The main idea is to split the URL into segments and assign a weight to each segment. Experiments conducted on three known genre datasets show that our method achieves encouraging results.
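
As a rough illustration of the idea, the following minimal sketch splits a URL into segments, extracts character n-grams from each segment, and accumulates a per-segment weight for every n-gram. The choice of segments (host, path, query), the weight values in `SEGMENT_WEIGHTS`, and the function names are assumptions made for this example only; they are not taken from the method described here.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical segment weights, chosen only for illustration.
SEGMENT_WEIGHTS = {"host": 1.0, "path": 2.0, "query": 0.5}


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def weighted_url_ngrams(url, n=3, weights=SEGMENT_WEIGHTS):
    """Map each character n-gram of the URL to its accumulated segment weight."""
    parts = urlsplit(url)
    # Assumed segmentation: host, path, and query string.
    segments = {"host": parts.netloc, "path": parts.path, "query": parts.query}
    features = Counter()
    for name, text in segments.items():
        for gram in char_ngrams(text, n):
            # An n-gram counts more when it occurs in a heavier segment.
            features[gram] += weights[name]
    return features


features = weighted_url_ngrams("http://blog.example.com/blog/post?id=1")
```

In this sketch the trigram `blo` occurs once in the host and once in the path, so its weight is the sum of the two segment weights. The resulting feature dictionary could then be passed to any standard classifier.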


URL; genre classification; web page; segment weight

