HTML::Encoding
==============

This module can be used to determine the character encoding of HTML and
XHTML files. It reports explicitly given information about the encoding.
It tries to read

  * the 'charset' parameter of the Content-Type header of an
    HTTP::Headers or Mail::Header object
  * the XML declaration of XHTML files
  * the byte order mark at the beginning of the file
  * a meta element that declares the character encoding

You always have to know the encoding of (X)HTML files if you want to
process them, e.g. parsing them with HTML::Parser or extracting links
with HTML::LinkExtor. Assuming a default encoding such as US-ASCII or
ISO-8859-1 is not safe, and HTML 4 forbids it. Documents need not even
use an 8-bit character encoding: they may use UTF-16, or an encoding that
is not compatible with US-ASCII, such as EBCDIC. Assuming some US-ASCII
compatible encoding could fail and even break the document. Suppose you
retrieve a UTF-8 encoded file and pass it to another application, e.g. a
web browser, labeled as ISO-8859-1: the user will see lots of strange
characters.

This module provides an easy-to-use way to avoid all of these problems.
It may, however, fail if the page author didn't supply any character
encoding information. This is a real problem: if this module cannot
determine the encoding, no one can, and the document is considered
broken.

INSTALLATION

To install this module type the following:

   perl Makefile.PL
   make
   make test
   make install

COPYRIGHT AND LICENCE

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

Copyright (C) 2001 Björn Höhrmann
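
EXAMPLE

The checks listed in the description can be sketched in plain Perl. This
is a minimal illustration only, not the module's implementation or its
API: the function name sniff_declared_encoding, its arguments, and the
order in which the checks run are assumptions made for this example.
Apart from the byte order mark test, it only handles input that is
compatible with US-ASCII.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Encoding): return the encoding a
# document declares explicitly, or undef if nothing is declared.
# $bytes        - the raw, undecoded document
# $content_type - optional value of a Content-Type header
sub sniff_declared_encoding {
    my ($bytes, $content_type) = @_;

    # 1. 'charset' parameter of the Content-Type header value
    if (defined $content_type
        && $content_type =~ /;\s*charset\s*=\s*"?([^\s";]+)/i) {
        return $1;
    }

    # 2. XML declaration at the very start of an XHTML file
    if ($bytes =~ /^<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/) {
        return $1;
    }

    # 3. Byte order mark at the beginning of the file
    #    (UTF-32 is not distinguished from UTF-16LE in this sketch)
    return 'UTF-8'    if $bytes =~ /^\xEF\xBB\xBF/;
    return 'UTF-16BE' if $bytes =~ /^\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /^\xFF\xFE/;

    # 4. meta element carrying a charset
    if ($bytes =~ /<meta\b[^>]*?charset\s*=\s*["']?([^\s"'>;\/]+)/i) {
        return $1;
    }

    # Nothing explicit found: do not guess a default
    return;
}

# Usage: report the declared encoding of a small XHTML document
my $doc = qq{<?xml version="1.0" encoding="UTF-8"?>\n<html></html>};
my $enc = sniff_declared_encoding($doc);
print defined $enc ? "declared: $enc\n" : "no explicit encoding\n";
```

Returning undef instead of a fallback such as ISO-8859-1 mirrors the
point made above: when nothing is declared, the caller should treat the
encoding as unknown rather than assume a default.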