Harusame extracts a version of the HTML document written in the specified natural language, from a source HTML document that contains paragraphs in multiple natural languages.
Managing documents on a multilingual Web site, where there are multiple language versions of a (conceptually identical) document, is generally somewhat difficult. If the author of an HTML document edits a part of the document, he or she must remember to update the translations at the same time; otherwise the different language versions also end up differing in their content.
Using harusame, one can generate versions of an HTML document in different languages from one source HTML document that contains paragraphs written in all of those languages, so that authors no longer have to manage different content versions and different language versions in separate files.
An Atom feed for ChangeLog diffs is available.
First, prepare a source HTML document html-document.src. The document can be marked up in HTML as usual, except for the special markup for harusame to identify alternative blocks written in various languages.
An input document might look like:
<!DOCTYPE HTML>
<html lang=mul>
<title>Example Document</title>
<h1 data-lang-container>
  <span lang=en>Example document</span>
  <span lang=ja>文書の例</span>
</h1>
<p data-lang-container>
  <span lang=en>This is an example of document with multiple languages.</span>
  <span lang=ja>これは複数の言語で記述された文書の例です。</span>
Then, invoke the harusame command once for each language:
$ harusame --lang en < html-document.src > html-document.en
$ harusame --lang ja < html-document.src > html-document.ja
For more information on the harusame command-line options, see its documentation.
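The per-language invocations can also be scripted. The following is a minimal sketch in Python, assuming that harusame is on the PATH and that it reads the source from standard input and writes the result to standard output, as in the commands above. The helper names build_jobs and run_jobs, and the rule for naming output files, are illustrative and not part of harusame.

```python
import subprocess
from pathlib import Path


def build_jobs(source, langs, command="harusame"):
    """Return (argv, output_path) pairs, one per language.

    Assumption: the output name replaces the source suffix with the
    language tag (html-document.src -> html-document.en), following
    the example commands.
    """
    src = Path(source)
    return [([command, "--lang", lang], src.with_suffix("." + lang))
            for lang in langs]


def run_jobs(source, langs, command="harusame"):
    """Run the command once per language, piping the source via stdin/stdout."""
    for argv, out_path in build_jobs(source, langs, command):
        with open(source, "rb") as fin, open(out_path, "wb") as fout:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(argv, stdin=fin, stdout=fout, check=True)
```

Calling run_jobs("html-document.src", ["en", "ja"]) would then correspond to the two commands shown above.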
The special markup for harusame processing consists of data-lang-* attributes on HTML elements, as follows:
If the data-lang-container attribute is specified on an HTML element, then the element is treated as a container element that contains versions of the content in various languages.
An element with this attribute must contain one or more HTML elements. The one whose language (lang attribute value) matches the language specified by the --lang command-line option is selected. If more than one element matches, the first one is selected. If there is no element whose language matches the --lang option, then the first element (of whatever language) is selected.
If the attribute value is replace, then the element itself is replaced by the selected element. Otherwise, the content of the element is replaced by the selected element.
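The selection and replacement rules for data-lang-container can be sketched as follows. This is only an illustration in Python, not harusame's implementation. It uses xml.etree.ElementTree, which requires well-formed markup, so the sample input quotes its attribute values, whereas real HTML would need an HTML parser. The function names select_alternative and resolve are hypothetical, and comparing languages by plain string equality is an assumption.

```python
import xml.etree.ElementTree as ET


def select_alternative(container, lang):
    """Pick the child for `lang`: the first match, else the first child."""
    children = list(container)
    if not children:
        raise ValueError("a container must have at least one child element")
    for child in children:
        # Assumption: exact string comparison of the lang attribute value.
        if child.get("lang") == lang:
            return child
    return children[0]


def resolve(elem, lang):
    """Return the element that stands in for `elem`, resolving recursively."""
    mode = elem.get("data-lang-container")
    if mode is not None:
        chosen = select_alternative(elem, lang)
        if mode == "replace":
            # The container itself is replaced by the selected element.
            chosen.tail = elem.tail
            return resolve(chosen, lang)
        # Otherwise only the selected element is kept as the content.
        # (This sketch leaves the data-lang-container attribute in place.)
        for child in list(elem):
            if child is not chosen:
                elem.remove(child)
    for index, child in enumerate(list(elem)):
        elem[index] = resolve(child, lang)
    return elem


doc = ET.fromstring(
    '<body>'
    '<h1 data-lang-container=""><span lang="en">Example document</span>'
    '<span lang="ja">文書の例</span></h1>'
    '<p data-lang-container="replace"><span lang="en">Hello</span>'
    '<span lang="ja">こんにちは</span></p>'
    '</body>'
)
resolve(doc, "ja")
print(ET.tostring(doc, encoding="unicode"))
```

With "ja" requested, the h1 element keeps only its Japanese span, while the p element, which uses the replace value, is itself replaced by its Japanese span.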
If the data-lang-content attribute is specified on an HTML element, then the content of that element is replaced by the content of another element.
If this attribute is specified, its value must be the ID of another HTML element in the document.
A child element of the element addressed by the ID is selected in the same way as for the data-lang-container attribute. Then, the content of the element with the data-lang-content attribute is replaced by the text content (the value of the textContent attribute) of the selected element.
If the element addressed by the ID has the data-lang-declaration attribute, then that element will be removed from the document at the end of the whole processing.
This attribute is useful when an HTML element does not allow elements to be inserted within it. For example, the content model of the title element does not allow child elements. In such a case, instead of inserting elements with the lang attribute, this attribute can be used to point to translations of the title element content, as follows:
<title data-lang-content="title-content">Example</title>
<div id="title-content" data-lang-declaration hidden>
  <span lang=en>Example</span>
  <span lang=ja>例</span>
</div>
Note that the pointed element, i.e. the div element, has the data-lang-declaration attribute specified. In many (though not all) cases, the pointed translations do not form part of the actual content of the document, and therefore the attribute asks harusame to remove them from the document at the end of the processing.
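The data-lang-content processing described above can be sketched in Python as follows. This is a hedged illustration rather than harusame's actual code. It relies on xml.etree.ElementTree, so the input must be well-formed markup with quoted attribute values. The function names are hypothetical, and matching languages by plain string equality is an assumption.

```python
import xml.etree.ElementTree as ET


def select_alternative(container, lang):
    """Pick the child for `lang`: the first match, else the first child."""
    children = list(container)
    if not children:
        raise ValueError("the referenced element has no child elements")
    for child in children:
        # Assumption: exact string comparison of the lang attribute value.
        if child.get("lang") == lang:
            return child
    return children[0]


def apply_lang_content(root, lang):
    """Resolve every data-lang-content reference under `root` in place."""
    by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}
    parents = {child: parent for parent in root.iter() for child in parent}
    declarations = []
    for elem in list(root.iter()):
        ref = elem.get("data-lang-content")
        if ref is None:
            continue
        source = by_id[ref]  # raises KeyError if the ID does not exist
        chosen = select_alternative(source, lang)
        # Replace the content with the text content of the selected element.
        for child in list(elem):
            elem.remove(child)
        elem.text = "".join(chosen.itertext())
        if (source.get("data-lang-declaration") is not None
                and source not in declarations):
            declarations.append(source)
    # Declarations are removed only after all references are resolved.
    for source in declarations:
        parent = parents.get(source)
        if parent is not None:
            parent.remove(source)
    return root


doc = ET.fromstring(
    '<html><head><title data-lang-content="title-content">Example</title></head>'
    '<body><div id="title-content" data-lang-declaration="" hidden="">'
    '<span lang="en">Example</span><span lang="ja">例</span></div>'
    '<p>text</p></body></html>'
)
apply_lang_content(doc, "ja")
print(ET.tostring(doc, encoding="unicode"))
```

Removal of the declaring element is deferred to the end so that several elements can refer to the same set of translations.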
The data-lang-declaration attribute indicates that the element declares a set of alternatives in various languages.
This attribute must not be used on an element that is not pointed to by a data-lang-content attribute in the same document.
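The constraint above lends itself to a simple check on a source document. The following Python sketch is not a feature of harusame. The function name unreferenced_declarations is hypothetical, and the input is assumed to be well-formed markup that xml.etree.ElementTree can parse. It lists elements that carry data-lang-declaration but are not referenced by any data-lang-content attribute.

```python
import xml.etree.ElementTree as ET


def unreferenced_declarations(root):
    """Return data-lang-declaration elements that no data-lang-content points to."""
    referenced = {
        e.get("data-lang-content")
        for e in root.iter()
        if e.get("data-lang-content") is not None
    }
    return [
        e for e in root.iter()
        if e.get("data-lang-declaration") is not None
        # An element without an id cannot be referenced, so it is reported too.
        and e.get("id") not in referenced
    ]


doc = ET.fromstring(
    '<html><head><title data-lang-content="title-content">Example</title></head>'
    '<body>'
    '<div id="title-content" data-lang-declaration=""><span lang="en">Example</span></div>'
    '<div id="orphan" data-lang-declaration=""><span lang="en">Unused</span></div>'
    '</body></html>'
)
for elem in unreferenced_declarations(doc):
    print("unreferenced declaration:", elem.get("id"))
```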
This document is itself prepared using harusame. See the source document to learn how it can be used.
This software requires Perl 5.8 or later.
Harusame is developed using Git. The Git repository (on GitHub) can be cloned with the following commands:
$ git clone https://github.com/wakaba/harusame.git
$ cd harusame
$ make deps
$ ./harusame