A Segment-based Weighting Technique for URL-based Genre Classification of Web Pages
Abstract
We propose a segment-based weighting technique for genre classification of web pages. The technique uses character n-grams extracted from the URL of a web page rather than from its textual content. The main idea is to split the URL into segments and assign a weight to each segment. Experiments conducted on three known genre datasets show that our method achieves encouraging results.
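As a rough illustration of the idea, the following Python sketch splits a URL into segments, extracts character n-grams from each segment, and accumulates a per-segment weight for every n-gram. The segment names (domain, path, query), the weight values, and the function names are assumptions made for this example only; the abstract does not specify how the segments are defined or how the weights are chosen.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical per-segment weights; the actual values are not given in the abstract.
SEGMENT_WEIGHTS = {"domain": 1.0, "path": 2.0, "query": 0.5}


def char_ngrams(text, n):
    """Return all contiguous character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def weighted_url_ngrams(url, n=3, weights=SEGMENT_WEIGHTS):
    """Map each character n-gram to the summed weights of its occurrences.

    Assumes the URL includes a scheme (e.g. "http://"), so that urlsplit
    can separate the domain from the path.
    """
    parts = urlsplit(url.lower())
    segments = {"domain": parts.netloc, "path": parts.path, "query": parts.query}
    features = Counter()
    for name, text in segments.items():
        for gram in char_ngrams(text, n):
            # Each occurrence contributes the weight of the segment it came from.
            features[gram] += weights[name]
    return features


features = weighted_url_ngrams("http://example.com/blog/post?id=1")
```

The resulting weighted n-gram counts could then serve as the feature vector for any standard classifier. Here an n-gram that appears in the path counts twice as much as one in the domain, which is purely an illustrative choice.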
Keywords
URL; genre classification; web page; segment weight