2    
3    
4     INTERNET DRAFT to be NEWS sec. -
5    
6    
7    
8    
9    
10     News Article Format and Transmission
11    
12     Henry Spencer
13    
14    
15    
16     Status of this Memo
17    
18     This document is intended to become an Internet Draft.
19     Internet Drafts are working documents of the Internet Engi-
20     neering Task Force (IETF), its Areas, and its Working
21     Groups. Note that other groups may also distribute working
22     documents as Internet Drafts.
23    
24     Internet Drafts are draft documents valid for a maximum of
25     six months. Internet Drafts may be updated, replaced, or
26     obsoleted by other documents at any time. It is not appro-
27     priate to use Internet Drafts as reference material or to
28     cite them other than as a "working draft" or "work in
29     progress".
30    
31     Please check the I-D abstract listing contained in each
32     Internet Draft directory to learn the current status of this
33     or any other Internet Draft. (Actually, this draft is at
34     too early a stage to even be listed there yet.)
35    
36     It is hoped that a later version of this Draft will obsolete
37     RFC 1036 and will become an Internet standard.
38    
39     References to the "successor to this Draft" refer not to
40     later versions of this draft, but to a hypothetical future
41     rewrite of this Draft (in the same way that this Draft is a
42     rewrite of RFC 1036).
43    
44     Distribution of this memo is unlimited.
45    
46    
47     Abstract
48    
49     This Draft defines the format and procedures for interchange
50     of network news articles. It is hoped that a later version
51     of this Draft will obsolete RFC 1036, reflecting more recent
52     experience and accommodating future directions.
53    
54     Network news articles resemble mail messages but are broad-
55     cast to potentially-large audiences, using a flooding algo-
56     rithm that propagates one copy to each interested host (or
57     group thereof), typically stores only one copy per host, and
58     does not require any central administration or systematic
59     registration of interested users. Network news originated
60     as the medium of communication for Usenet, circa 1980.
61    
62    
63    
64     2 June 1994 - 1 - expires 15 July 1994
72    
73     Since then Usenet has grown explosively, and many Internet
74     sites participate in it. In addition, the news technology
75     is now in widespread use for other purposes, on the Internet
76     and elsewhere.
77    
78     This Draft primarily codifies and organizes existing prac-
79     tice. A few small extensions have been added in an attempt
80     to solve problems that are considered serious. Major exten-
81     sions (e.g. cryptographic authentication) that need signifi-
82     cant development effort are left to be undertaken as inde-
83     pendent efforts.
84    
85    
86     Table of Contents
87    
88     TBW
89    
90    
91     1. Introduction
92    
93     Network news articles resemble mail messages but are broad-
94     cast to potentially-large audiences, using a flooding algo-
95     rithm that propagates one copy to each interested host (or
96     groups thereof), typically stores only one copy per host,
97     and does not require any central administration or system-
98     atic registration of interested users. Network news origi-
99     nated as the medium of communication for Usenet, circa 1980.
100     Since then Usenet has grown explosively, and many Internet
101     sites participate in it. In addition, the news technology
102     is now in widespread use for other purposes, on the Internet
103     and elsewhere.
104    
105     The earliest news interchange used the so-called "A News"
106     article format. Shortly thereafter, an article format
107     vaguely resembling Internet mail was devised and used
108     briefly. Both of those formats are completely obsolete;
109     they are documented in appendix A for historical reasons
110     only. With publication of RFC 850 [rrr] in 1983, news arti-
111     cles came to closely resemble Internet mail messages, with
112     some restrictions and some additional headers. RFC 1036
113     [rrr] in 1987 updated RFC 850 without making major changes.
114    
115     In the intervening seven years, the RFC 1036 article format
116     has proven quite satisfactory, although minor extensions
117     appear desirable to match recent developments in areas such
118     as multi-media mail. RFC 1036 itself has not proven quite
119     so satisfactory. It is often rather vague and does not
120     address some issues at all; this has caused significant
121     interoperability problems at times, and implementations have
122     diverged somewhat. Worse, although it was intended primar-
123     ily to document existing practice, it did not precisely
124     match existing practice even at the time it was published,
125     and the deviations have grown since.
126    
139     This Draft attempts to specify the format of articles, and
140     the procedures used to exchange them and process them, in
141     sufficient detail to allow full interoperability. In addi-
142     tion, some tentative suggestions are made about directions
143     for future development, in an attempt to avert unnecessary
144     divergence and consequent loss of interoperability. Major
145     extensions (e.g. cryptographic authentication) that need
146     significant development effort are left to be undertaken as
147     independent efforts.
148    
149     NOTE: One question this all may raise is: why is
150     there no News-Version header, analogous to MIME-
151     Version, specifying a version number corresponding
152     to this specification? The answer is: it doesn't
153     appear to be useful, given news's backward-
154     compatibility constraints. The major use of a
155     version number is indicating which of several
156     INCOMPATIBLE interpretations is relevant. The
157     impossibility of orchestrating any sort of simul-
158     taneous change over news's installed base makes it
159     necessary to avoid such incompatible changes (as
160     opposed to extensions) entirely. MIME has a ver-
161     sion number mostly because it introduced incompat-
162     ible changes to the interpretation of several
163     "Content-" headers. This Draft attempts no
164     changes in interpretation and it appears doubtful
165     that future Drafts will find it feasible to intro-
166     duce any.
167    
168     UNRESOLVED ISSUE: Should this be reconsidered?
169     Only if the header has SPECIFIC IDENTIFIABLE uses
170     today. Otherwise it's just useless added bulk.
171    
172     As in this Draft's predecessors, the exact means used to
173     transmit articles from one host to another is not specified.
174     NNTP [rrr] is probably the most common transmission method
175     on the Internet, but a number of others are known to be in
176     use, including the UUCP protocol [rrr] extensively used in
177     the early days of Usenet and still much used on its fringes
178     today.
179    
180     Several of the mechanisms described in this Draft may seem
181     somewhat strange or even bizarre at first reading. As with
182     Internet mail, there is no reasonable possibility of updat-
183     ing the entire installed base of news software promptly, so
184     interoperability with old software is crucial and will
185     remain so. Compatibility with existing practice and robust-
186     ness in an imperfect world necessarily take priority over
205     elegance.
206    
207    
208     2. Definitions, Notations, and Conventions
209    
210    
211     2.1. Textual Notations
212    
213     Throughout this Draft, "MAIL" is short for "RFC 822 [rrr] as
214     amended by RFC 1123 [rrr]". (RFC 1123's amendments are
215     mostly relatively small, but they are not insignificant.)
216     See also the discussion in section 3 about this Draft's
217     relationship to MAIL. "MIME" is short for "RFCs 1341 and
218     1342" (or their updated replacements).
219    
220     UNRESOLVED ISSUE: Update these numbers.
221    
222     "ASCII" is short for "the ANSI X3.4 character set" [rrr].
223     While "ASCII" is often misused to refer to various character
224     sets somewhat similar to X3.4, in this Draft, "ASCII" means
225     X3.4 and only X3.4.
226    
227     NOTE: The name is traditional (to the point where
228     the ANSI standard sanctions it) even though it is
229     no longer an acronym for the name of the standard.
230    
231     NOTE: ASCII, X3.4, contains 128 characters, not
232     all of them printable. Character sets with more
233     characters are not ASCII, although they may
234     include it as a subset.
235    
236     Certain words used to define the significance of individual
237     requirements are capitalized. "MUST" means that the item is
238     an absolute requirement of the specification. "SHOULD"
239     means that the item is a strong recommendation: there may be
240     valid reasons to ignore it in unusual circumstances, but
241     this should be done only after careful study of the full
242     implications and a firm conclusion that it is necessary,
243     because there are serious disadvantages to doing so. "MAY"
244     means that the item is truly optional, and implementors and
245     users are warned that conformance is possible but not to be
246     relied on.
247    
248     The term "compliant", applied to implementations etc., indi-
249     cates satisfaction of all relevant "MUST" and "SHOULD"
250     requirements. The term "conditionally compliant" indicates
251     satisfaction of all relevant "MUST" requirements but viola-
252     tion of at least one relevant "SHOULD" requirement.
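
The two compliance terms can be read as a simple decision rule. The
Python sketch below is only an illustration: the function name, its
arguments, and the label "non-compliant" are assumptions of this
example and are not defined by this Draft.

```python
def compliance_level(must_violations, should_violations):
    """Classify an implementation from counts of violated requirements.

    Both arguments are counts of violated *relevant* requirements.
    The "non-compliant" label is an assumption of this sketch.
    """
    if must_violations > 0:
        return "non-compliant"
    if should_violations > 0:
        # All MUSTs satisfied, at least one SHOULD violated.
        return "conditionally compliant"
    return "compliant"
```

For instance, an implementation that meets every MUST but ignores one
SHOULD would be reported as "conditionally compliant".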
253    
254     This Draft contains explanatory notes using the following
255     format. These may be skipped by persons interested solely
256     in the content of the specification. The purpose of the
257     notes is to explain why choices were made, to place them in
258     context, or to suggest possible implementation techniques.
259    
271     NOTE: While such explanatory notes may seem super-
272     fluous in principle, they often help the less-
273     than-omniscient reader grasp the purpose of the
274     specification and the constraints involved. Given
275     the limitations of natural language for descrip-
276     tive purposes, this improves the probability that
277     implementors and users will understand the true
278     intent of the specification in cases where the
279     wording is not entirely clear.
280    
281     All numeric values are given in decimal unless otherwise
282     indicated. Octets are assumed to be unsigned values for
283     this purpose. Large numbers are written using the North
284     American convention, in which "," separates groups of three
285     digits but otherwise has no significance.
286    
287    
288     2.2. Syntax Notation
289    
290     Although the mechanisms specified in this Draft are all
291     described in prose, most are also described formally in the
292     modified BNF notation of RFC 822. Implementors will need to
293     be familiar with this notation to fully understand this
294     specification, and are referred to RFC 822 for a complete
295     explanation of the modified BNF notation. Here is a brief
296     illustrative example:
297    
298     sentence = clause *( punct clause ) "."
299     punct    = ":" / ";"
300     clause   = 1*word [ "(" clause ")" / "," 1*word ]
301     word     = <any English word>
302    
303     This defines a sentence as some clauses separated by puncts
304     and ended by a period, a punct as a colon or semicolon, a
305     clause as at least one <word> optionally followed by either
306     a parenthesized clause or a comma and at least one more
307     <word>, and a <word> as (informally) any English word. <>
308     are used to enclose names when (and only when) distinguish-
309     ing them from surrounding text is useful. The full form of
310     the repetition notation is <m>"*"<n><thing>, denoting <m>
311     through <n> repetitions of <thing>; <m> defaults to zero,
312     <n> to infinity, and the "*" and <n> can be omitted if <m>
313     and <n> are equal, so 1*word is one or more words, 1*5word
314     is one through five words, and 2word is exactly two words.
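
As a rough illustration of the repetition notation just described,
the following Python sketch splits an item such as 1*5word into its
lower bound, upper bound, and the thing being repeated. The name
parse_repetition and the use of None to stand for "infinity" are
assumptions of this sketch; neither RFC 822 nor this Draft defines
them.

```python
import re

# <m>"*"<n><thing>: both counts are optional decimal numbers.
_REPEAT = re.compile(r"^(\d*)(\*?)(\d*)(.*)$", re.DOTALL)

def parse_repetition(item):
    """Return (minimum, maximum, thing); maximum None means unbounded."""
    m, star, n, thing = _REPEAT.match(item).groups()
    if not star:
        # No "*": a bare <thing> occurs exactly once, and <m><thing>
        # occurs exactly <m> times.
        count = int(m) if m else 1
        return count, count, thing
    # With "*": <m> defaults to zero and <n> to infinity.
    return (int(m) if m else 0), (int(n) if n else None), thing
```

Under these assumptions, parse_repetition("1*word") yields
(1, None, "word") and parse_repetition("2word") yields (2, 2, "word").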
315    
316     The character "\" is not special in any way in this nota-
317     tion.
318    
319     This Draft is intended to be self-contained; all syntax
320     rules used in it are defined within it, and a rule with the
321     same name as one found in MAIL does not necessarily have the
322     same definition. The lexical layer of MAIL is NOT, repeat
323     NOT, used in this Draft, and its presence must not be
324     assumed; notably, this Draft spells out all places where
337     white space is permitted/required and all places where con-
338     structs resembling MAIL comments can occur.
339    
340     NOTE: News parsers historically have been much
341     less permissive than MAIL parsers.
342    
343    
344     2.3. Definitions
345    
346     The term "character set", wherever it is used in this Draft,
347     refers to a coded character set, in the sense of ISO charac-
348     ter set standardization work, and must not be misinterpreted
349     as meaning merely "a set of characters".
350    
351     In this Draft, ASCII character 32 is referred to as "blank";
352     the word "space" has a more generic meaning.
353    
354     An "article" is the unit of news, analogous to a MAIL "mes-
355     sage".
356    
357     A "poster" is a human being (or software equivalent) submit-
358     ting a possibly-compliant article to be "posted": made
359     available for reading on all relevant hosts. A "posting
360     agent" is software that assists posters to prepare articles,
361     including determining whether the final article is compli-
362     ant, passing it on to a relayer for posting if so, and
363     returning it to the poster with an explanation if not. A
364     "relayer" is software which receives allegedly-compliant
365     articles from posting agents and/or other relayers, files
366     copies in a "news database", and possibly passes copies on
367     to other relayers.
368    
369     NOTE: While the same software may well function
370     both as a relayer and as part of a posting agent,
371     the two functions are distinct and should not be
372     confused. The posting agent's purpose is (in
373     part) to validate an article, supply header infor-
374     mation that can or should be supplied automati-
375     cally, and generally take reasonable actions in an
376     attempt to transform the poster's submission into
377     a compliant article. The relayer's purpose is to
378     move already-compliant articles around efficiently
379     without damaging them.
380    
381     A "reader" is a human being reading news articles. A "read-
382     ing agent" is software which presents articles to a reader.
383    
384     NOTE: Informal usage often uses "reader" for both
385     these meanings, but this introduces considerable
386     potential for confusion and misunderstanding, so
387     this Draft takes care to make the distinction.
388    
389     A "newsgroup" is a single news forum, a logical bulletin
390     board, having a name and nominally intended for articles on
403     a specific topic. An article is "posted to" a single news-
404     group or several newsgroups. When an article is posted to
405     more than one newsgroup, it is said to be "cross-posted";
406     note that this differs from posting the same text as part of
407     each of several articles, one per newsgroup. A "hierarchy"
408     is the set of all newsgroups whose names share a first com-
409     ponent (see the name syntax in section 5.5).
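
A minimal Python sketch of the "hierarchy" definition follows. It
assumes that the components of a newsgroup name are separated by "."
(the actual name syntax is given in section 5.5, which is not
reproduced here); the function names are hypothetical.

```python
def hierarchy(newsgroup):
    """Return the first component of a newsgroup name.

    Assumes components are separated by "." as in "comp.lang.c".
    """
    return newsgroup.split(".", 1)[0]

def same_hierarchy(group_a, group_b):
    """True if both newsgroups share a first component."""
    # Comparison is case-sensitive, matching section 2.5's default.
    return hierarchy(group_a) == hierarchy(group_b)
```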
410    
411     A newsgroup may be "moderated", in which case submissions
412     are not posted directly, but mailed to a "moderator" for
413     consideration and possible posting. Moderators are typi-
414     cally human but may be implemented partially or entirely in
415     software.
416    
417     A "followup" is an article containing a response to the con-
418     tents of an earlier article (the followup's "precursor"). A
419     "followup agent" is a combination of reading agent and post-
420     ing agent that aids in the preparation and posting of a fol-
421     lowup.
422    
423     Text comparisons are "case-sensitive" if they consider
424     uppercase letters (e.g. "A") different from lowercase let-
425     ters (e.g. "a"), and "case-insensitive" if letters differing
426     only in case (e.g. "A" and "a") are considered identical.
427     Categories of text are said to be case-(in)sensitive if com-
428     parisons of such texts to others are case-(in)sensitive.
429    
430     A "cooperating subnet" is a set of news-exchanging hosts
431     which is sufficiently well-coordinated (typically via a cen-
432     tral administration of some sort) that stronger assumptions
433     can be made about hosts in the set than about news hosts in
434     general. This is typically used to relax restrictions which
435     are otherwise required for worst-case interoperability; mem-
436     bers of a cooperating subnet MAY interchange articles that
437     do not conform to this Draft's specifications, provided all
438     members have agreed to this and provided the articles are
439     not permitted to leak out of the subnet. The word "subnet"
440     is used to emphasize that a cooperating subnet is typically
441     not an isolated universe; care must be taken that traffic
442     leaving the subnet complies with the restrictions of the
443     larger net, not just those of the cooperating subnet.
444    
445     A "message ID" is a unique identifier for an article, usu-
446     ally supplied by the posting agent which posted it. It dis-
447     tinguishes the article from every other article ever posted
448     anywhere (in theory). Articles with the same message ID are
449     treated as identical copies of the same article even if they
450     are not in fact identical.
451    
452     A "gateway" is software which receives news articles and
453     converts them to messages of some other kind (e.g. mail to a
454     mailing list), or vice-versa; in essence it is a translating
455     relayer that straddles boundaries between different methods
456     of message exchange. The most common type of gateway
469     connects newsgroup(s) to mailing list(s), either unidirec-
470     tionally or bidirectionally, but there are also gateways
471     between news networks using this Draft's news format and
472     those using other formats.
473    
474     A "control message" is an article which is marked as con-
475     taining control information; a relayer receiving such an
476     article will (subject to permissions etc.) take actions
477     beyond just filing and passing on the article.
478    
479     NOTE: "Control article" would be more consistent
480     terminology, but "control message" is already well
481     established.
482    
483     An article's "reply address" is the address to which mailed
484     replies should be sent. This is the address specified in
485     the article's From header (see section 5.2), unless it also
486     has a Reply-To header (see section 6.3).
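
The reply-address rule can be sketched in Python as below. The
representation of headers as a dictionary keyed by the exact header
names "From" and "Reply-To" is an assumption of this sketch, and the
function returns the raw header content without attempting to parse
an address out of it.

```python
def reply_address_content(headers):
    """Return the content of the header that gives the reply address.

    headers: dict mapping header names to their contents (assumed).
    Reply-To, when present, takes precedence over From.
    """
    if "Reply-To" in headers:
        return headers["Reply-To"]
    return headers["From"]
```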
487    
488     The notation (e.g.) "(ASCII 17)" following a name means
489     "this name refers to the ASCII character having value 17".
490     An "ASCII printable character" is an ASCII character in the
491     range 33-126. An "ASCII control character" is an ASCII
492     character in the range 0-31, or the character DEL (ASCII
493     127). A "non-ASCII character" is a character having a value
494     exceeding 127.
495    
496     NOTE: Blank is neither an "ASCII printable charac-
497     ter" nor an "ASCII control character".
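
The character classes defined above partition the character values
as in the following Python sketch. The function name classify and
the returned labels are assumptions of this example; only the
numeric ranges come from the text.

```python
def classify(code):
    """Classify a character value using the terms of section 2.3."""
    if code < 0:
        raise ValueError("character values are non-negative")
    if code > 127:
        return "non-ASCII"
    if code <= 31 or code == 127:
        return "ASCII control"
    if code == 32:
        # Blank is neither printable nor control in this Draft's terms.
        return "blank"
    return "ASCII printable"  # the remaining range, 33-126
```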
498    
499    
500     2.4. End Of Line
501    
502     How the end of a text line is represented depends on the
503     context and the implementation. For Internet transmission
504     via protocols such as SMTP [rrr], an end-of-line is a CR
505     (ASCII 13) followed by an LF (ASCII 10). ISO C [rrr] and
506     many modern operating systems indicate end-of-line with a
507     single character, typically ASCII LF (aka "newline"), and
508     this is the normal convention when news is transmitted via
509     UUCP. A variety of other methods are in use, including out-
510     of-band methods in which there is no specific character that
511     means end-of-line.
512    
513     This Draft does not constrain how end-of-line is represented
514     in news, except that characters other than CR and LF MUST
515     NOT be usurped for use in end-of-line representations.
516     Also, obviously, all software dealing with a particular copy
517     of an article must agree on the convention to be used.
518     "EOL" is used to mean "whatever end-of-line representation
519     is appropriate"; it is not necessarily a character or
520     sequence of characters.
521    
535     NOTE: If faced with picking an EOL representation
536     in the absence of other constraints, use of a sin-
537     gle character simplifies processing, and the ASCII
538     standard [rrr] specifies that if one character is
539     to be used for this purpose, it should be LF
540     (ASCII 10).
541    
542     NOTE: Inside MIME encodings, use of the Internet
543     canonical EOL representation (CR followed by LF)
544     is mandatory. See [rrr].
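
As an illustration of moving between two of the EOL representations
mentioned in this section, the Python sketch below converts between
a local single-LF convention and the CR LF form used on the wire by
protocols such as SMTP. The function names are hypothetical, and the
sketch assumes the local copy uses LF alone; out-of-band
representations are outside its scope.

```python
def to_wire(data):
    """Convert LF-terminated bytes to CR LF form.

    Existing CR LF pairs are normalized first so they are not doubled.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def from_wire(data):
    """Convert CR LF-terminated bytes to the local single-LF form."""
    return data.replace(b"\r\n", b"\n")
```

Only CR and LF are touched, consistent with the rule that no other
character may be usurped for end-of-line purposes.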
545    
546    
547     2.5. Case-Sensitivity
548    
549     Text in newsgroup names, header parameters, etc. is case-
550     sensitive unless stated otherwise.
551    
552     NOTE: This is at variance with MAIL, which is
553     case-insensitive unless stated otherwise, but is
554     consistent with news historical practice and
555     existing news software. See the comments on back-
556     ward compatibility in section 1.
557    
558    
559     2.6. Language
560    
561     Various constant strings in this Draft, such as header names
562     and month names, are derived from English words. Despite
563     their derivation, these words do NOT change when the poster
564     or reader employing them is interacting in a language other
565     than English. Posting and reading agents SHOULD translate
566     as appropriate in their interaction with the poster or
567     reader, but the forms that actually appear in articles are
568     always the English-derived ones defined in this Draft.
569    
570    
571     3. Relation To MAIL (RFC 822 etc.)
572    
573     The primary intent of this Draft is to completely describe
574     the news article format as a subset of MAIL's message format
575     augmented by some new headers. Unless explicitly noted oth-
576     erwise, the intent throughout is that an article MUST also
577     be a valid MAIL message.
578    
579     NOTE: Despite obvious similarities between news
580     and mail, opinions vary on whether it is possible
581     or desirable to unify them into a single service.
582     However, it is unquestionably both possible and
583     useful to employ some of the same tools for manip-
584     ulating both mail messages and news articles, so
585     there is specific advantage to be had in defining
586     them compatibly. Furthermore, there is no appar-
587     ent need to re-invent the wheel when slight exten-
588     sions to an existing definition will suffice.
589    
601     Given that this Draft attempts to be self-contained, it
602     inevitably contains considerable repetition of information
603     found in MAIL. This raises the possibility of unintentional
604     conflicts. Unless specifically noted otherwise, any wording
605     in this Draft which permits behavior that is not MAIL-
606     compliant is erroneous and should be followed only to the
607     extent that the result remains compliant with MAIL.
608    
609     NOTE: RFC 1036 said "where this standard conflicts
610     with [RFC 822], RFC-822 should be considered cor-
611     rect and this standard in error". Taken liter-
612     ally, this was obviously incorrect, since RFC 1036
613     imposed a number of restrictions not found in RFC
614     822. The intent, however, was reasonable: to
615     indicate that UNINTENTIONAL differences were
616     errors in RFC 1036.
617    
618     Implementors and users should note that MAIL is deliberately
619     an extensible standard, and most extensions devised for mail
620     are also relevant to (and compatible with) news. Note par-
621     ticularly MIME [rrr], summarized briefly in appendix B,
622     which extends MAIL in a number of useful ways that are defi-
623     nitely relevant to news. Also of note is the work in
624     progress on reconciling PEM (Privacy Enhanced Mail, which
625     defines extensions for authentication and security) with
626     MIME, after which this may also be relevant to news.
627    
628     UNRESOLVED ISSUE: Update the MIME/PEM information.
629    
630     Similarly, descriptions here of MIME facilities should be
631     considered correct only to the extent that they do not
632     require or legitimize practices that would violate those
633     RFCs. (Note that this Draft does extend the application of
634     some MIME facilities, but this is an extension rather than
635     an alteration.)
636    
637    
638     4. Basic Format
639    
640    
641     4.1. Overall Syntax
642    
643     The overall syntax of a news article is:
644    
667     article        = 1*header separator body
668     header         = start-line *continuation
669     start-line     = header-name ":" space [ nonblank-text ] eol
670     continuation   = space nonblank-text eol
671     header-name    = 1*name-character *( "-" 1*name-character )
672     name-character = letter / digit
673     letter         = <ASCII letter A-Z or a-z>
674     digit          = <ASCII digit 0-9>
675     separator      = eol
676     body           = *( [ nonblank-text / space ] eol )
677     eol            = <EOL>
678     nonblank-text  = [ space ] text-character *( space-or-text )
679     text-character = <any ASCII character except NUL (ASCII 0),
680                        HT (ASCII 9), LF (ASCII 10), CR (ASCII 13),
681                        or blank (ASCII 32)>
682     space          = 1*( <HT (ASCII 9)> / <blank (ASCII 32)> )
683     space-or-text  = space / text-character
684    
685     An article consists of some headers followed by a body. An
686     empty line separates the two. The headers contain struc-
687     tured information about the article and its transmission. A
688     header begins with a header name identifying it, and can be
689     continued onto subsequent lines by beginning the continua-
690     tion line(s) with white space. (Note that section 4.2.3
691     adds some restrictions to the header syntax indicated here.)
692     The body is largely-unstructured text significant only to
693     the poster and the readers.
694    
695     NOTE: Terminology here follows the current custom
696     in the news community, rather than the MAIL con-
697     vention of (sometimes) referring to what is here
698     called a "header" as a "header field" or "field".
699    
700     Note that the separator line must be truly empty, not just a
701     line containing white space. Further empty lines following
702     it are part of the body, as are empty lines at the end of
703     the article.
704    
705     NOTE: Some systems make no distinction between
706     empty lines and lines consisting entirely of white
707     space; indeed, some systems cannot represent
708     entirely empty lines. The grammar's requirement
709     that header continuation lines contain some print-
710     able text is meant to ensure that the empty/space
711     distinction cannot confuse identification of the
712     separator line.
713    
714     NOTE: It is tempting to authorize posting agents
715     to strip empty lines at the beginning and end of
716     the body, but such empty lines could possibly be
717     part of a preformatted document.
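
The overall syntax can be sketched as a small Python function that
splits an article into headers and body. This is a simplified
illustration under stated assumptions, not a complete parser: it
assumes EOL is a single LF, it does not validate header names or
text characters, and the name split_article is hypothetical.

```python
def split_article(text):
    """Split an article (LF as EOL) into ([(name, content)], body)."""
    lines = text.split("\n")
    headers = []
    for index, line in enumerate(lines):
        if line == "":
            # Only a truly empty line separates headers from body.
            if not headers:
                raise ValueError("an article needs at least one header")
            return headers, "\n".join(lines[index + 1:])
        if line[0] in " \t":
            # Continuation: white space followed by non-blank text.
            if not headers or not line.strip(" \t"):
                raise ValueError("bad continuation line: %r" % line)
            name, content = headers[-1]
            headers[-1] = (name, content + "\n" + line)
        else:
            name, colon, content = line.partition(":")
            # The grammar requires white space after the colon.
            if not colon or content[:1] not in (" ", "\t"):
                raise ValueError("bad header line: %r" % line)
            headers.append((name, content.lstrip(" \t")))
    raise ValueError("no empty separator line found")
```

Everything after the separator, including leading, trailing, and
empty lines, is returned untouched as the body. A header-section
line holding only white space is rejected rather than treated as
the separator.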
718    
719     Implementors are warned that trailing white space, whether
720     alone on the line or not, MAY be significant in the body,
733     notably in early versions of the "uuencode" encoding for
734     binary data. Trailing white space MUST be preserved unless
735     the article is known to have originated within a cooperating
736     subnet that avoids using significant trailing white space,
737     and SHOULD be preserved regardless. Posters SHOULD avoid
738     using conventions or encodings which make trailing white
739     space significant; for encoding of binary data, MIME's
740     "base64" encoding is recommended. Implementors are warned
741     that ISO C implementations are not required to preserve
742     trailing white space, and special precautions may be neces-
743     sary in implementations which do not.
744    
745     NOTE: Unfortunately, the signature-delimiter con-
746     vention (described in section 4.3.2) does use sig-
747     nificant trailing white space. It's too late to
748     fix this; there is work underway on defining an
749     organized signature convention as part of MIME,
750     which is a preferable solution in the long run.
751    
752     Posters are warned that some very old relayer software mis-
753     behaves when the first non-empty line of an article body
754     begins with white space.
755    
756    
757     4.2. Headers
758    
759    
760     4.2.1. Names and Contents
761    
762     Despite the restrictions on header-name syntax imposed by
763     the grammar, relayers and reading agents SHOULD tolerate
764     header names containing any ASCII printable character other
765     than colon (":", ASCII 58).
766    
767     NOTE: MAIL header names can contain any ASCII
768     printable character (other than colon) in theory,
769     but in practice, arbitrary header names are known
770     to cause trouble for some news software. Section
771     4.1's restriction to alphanumeric sequences sepa-
772     rated by hyphens is believed to permit all widely-
773     used header names without causing problems for any
774     widely-used software. Software is nevertheless
775     encouraged to cope correctly with the full range
776     of possibilities, since aberrations are known to
777     occur.
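
As a rough illustration of the two levels of strictness described above, the following Python sketch separates the restricted name syntax from the wider set that relayers and reading agents are asked to tolerate. The function names are hypothetical, and "ASCII printable" is assumed here to mean codes 33 through 126 (that is, excluding blank); neither detail is fixed by this Draft.

```python
import re

# Strict form described in the NOTE: alphanumeric sequences separated by hyphens.
STRICT_NAME = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\Z")


def is_strict_name(name):
    """True if the name fits the restricted (preferred) syntax."""
    return STRICT_NAME.match(name) is not None


def is_tolerable_name(name):
    """True if the name uses only printable ASCII (assumed 33..126) and no colon."""
    return bool(name) and all(33 <= ord(c) <= 126 and c != ":" for c in name)
```

A posting agent might apply the strict test to what it generates, while a relayer would apply only the tolerant one to what it receives.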
778    
779     Relayers MUST disregard headers not described in this Draft
780     (that is, with header names not mentioned in this Draft),
781     and pass them on unaltered.
782    
783     Posters wishing to convey non-standard information in head-
784     ers SHOULD use header names beginning with "X-". No stan-
785     dard header name will ever be of this form. Reading agents
786     SHOULD ignore "X-" headers, or at least treat them with
799     great care.
800    
801     The order of headers in an article is not significant. How-
802     ever, posting agents are encouraged to put mandatory headers
803     (see section 5) first, followed by optional headers (see
804     section 6), followed by headers not defined in this Draft.
805    
806     NOTE: While relayers and reading agents must be
807     prepared to handle any order, having the signifi-
808     cant headers (the precise definition of "signifi-
809     cant" depends on context) first can noticeably
810     improve efficiency, especially in memory-limited
811     environments where it is difficult to buffer up an
812     arbitrary quantity of headers while searching for
813     the few that matter.
814    
815     Header names are case-insensitive. There is a preferred
816     case convention, which posters and posting agents SHOULD
817     use: each hyphen-separated "word" has its initial letter (if
818     any) in uppercase and the rest in lowercase, except that
819     some abbreviations have all letters uppercase (e.g.
820     "Message-ID" and "MIME-Version"). The forms used in this Draft
821     are the preferred forms for the headers described herein.
822     Relayers and reading agents are warned that articles might
823     not obey this convention.
824    
825     NOTE: Although software must be prepared for the
826     possibility of random use of case in header names
827     (and other case-independent text), establishing a
828     preferred convention reduces pointless diversity,
829     and may permit optimized software that looks for
830     the preferred forms before resorting to less-
831     efficient case-insensitive searches.
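
The case convention can be sketched as follows. This is only an illustration: the table of all-uppercase abbreviations is an assumption covering the two examples given above, not a complete list, and the function names are hypothetical.

```python
# Abbreviations written entirely in uppercase; illustrative, not exhaustive.
UPPERCASE_WORDS = {"id": "ID", "mime": "MIME"}


def preferred_case(name):
    """Return the header name in the preferred case convention."""
    words = []
    for word in name.split("-"):
        lower = word.lower()
        # Initial letter (if any) uppercase, rest lowercase, unless listed above.
        words.append(UPPERCASE_WORDS.get(lower, lower[:1].upper() + lower[1:]))
    return "-".join(words)


def same_header(a, b):
    """Header names are case-insensitive, so compare without regard to case."""
    return a.lower() == b.lower()
```

Software that looks for the preferred form first can fall back on something like same_header() when the fast comparison fails.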
832    
833     In general, a header can consist of several lines, with each
834     continuation line beginning with white space. The EOLs pre-
835     ceding continuation lines are ignored when processing such a
836     header, effectively combining the start-line and the contin-
837     uations into a single logical line. The logical line, less
838     the header name, colon, and any white space following the
839     colon, is the "header content".
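
A minimal sketch of this combining step, assuming the header lines have already been split apart and stripped of their EOLs (the function names are hypothetical):

```python
def unfold_headers(lines):
    """Combine start-lines and continuation lines into logical lines."""
    logical = []
    for line in lines:
        if line[:1] in (" ", "\t") and logical:
            # Continuation: the EOL is ignored, the leading white space is kept.
            logical[-1] += line
        else:
            logical.append(line)
    return logical


def split_header(logical_line):
    """Return (header name, header content) for one logical line."""
    name, _, rest = logical_line.partition(":")
    # The content excludes the colon and any white space following it.
    return name, rest.lstrip(" \t")
```

For example, the two physical lines "Subject: a long" and " subject line" yield the header content "a long subject line".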
840    
841    
842     4.2.2. Undesirable Headers
843    
844     A header whose content is empty is said to be an empty
845     header. Relayers and reading agents SHOULD not consider
846     presence or absence of an empty header to alter the seman-
847     tics of an article (although syntactic rules, such as
848     requirements that certain header names appear at most once
849     in an article, MUST still be satisfied). Posting agents
850     SHOULD delete empty headers from articles before posting
851     them.
852    
865     Headers that merely state defaults explicitly (e.g., a
866     Followup-To header with the same content as the Newsgroups
867     header, or a MIME Content-Type header with contents
868     "text/plain; charset=us-ascii") or state information that
869     reading agents can typically determine easily themselves
870     (e.g. the length of the body in octets) are redundant, con-
871     veying no information whatsoever. Headers that state infor-
872     mation which cannot possibly be of use to a significant num-
873     ber of relayers, reading agents, or readers (e.g., the name
874     of the software package used as the posting agent) are use-
875     less and pointless. Posters and posting agents SHOULD avoid
876     including redundant or useless headers in articles.
877    
878     NOTE: Information that someone, somewhere, might
879     someday find useful is best omitted from headers.
880     (There's quite enough of it in article bodies.)
881     Headers should contain information of known util-
882     ity only. This is not meant to preclude inclusion
883     of information primarily meant for news-software
884     debugging, but such information should be included
885     only if there is real reason, preferably based on
886     experience, to suspect that it may be genuinely
887     useful. Articles passing through gateways are the
888     only obvious case where inclusion of debugging
889     information appears clearly legitimate. (See sec-
890     tion 10.1.)
891    
892     NOTE: A useful rule of thumb for software imple-
893     mentors is: "if I had to pay a dollar a day for
894     the transmission of this header, would I still
895     think it worthwhile?".
896    
897    
898     4.2.3. White Space and Continuations
899    
900     The colon following the header name on the start-line MUST
901     be followed by white space, even if the header is empty. If
902     the header is not empty, at least some of the content MUST
903     appear on the start-line. Posting agents MUST enforce these
904     restrictions, but relayers (etc.) SHOULD accept even arti-
905     cles that violate them.
906    
907     NOTE: MAIL does not require white space after the
908     colon, but it is usual. RFC 1036 required the
909     white space, even in empty headers, and some
910     existing software demands it. In MAIL, and
911     arguably in RFC 1036 (although the wording is
912     vague), it is technically legitimate for the white
913     space to be part of a continuation line rather
914     than the start-line, but not all existing software
915     will accept this. Deleting empty headers and
916     placing some content on the start-line avoids this
917     issue... which is desirable because trailing
918     blanks, easily deleted by accident, are best not
931     made significant in headers.
932    
933     In general, posters and posting agents SHOULD use blank
934     (ASCII 32), not tab (ASCII 9), where white space is desired
935     in headers. Existing software does not consistently accept
936     tab as synonymous with blank in all contexts. In particu-
937     lar, RFC 1036 appeared to specify that the character immedi-
938     ately following the colon after a header name was required
939     to be a blank, and some news software insists on that, so
940     this character MUST be a blank. Again, posting agents MUST
941     enforce these restrictions but relayers SHOULD be more tol-
942     erant.
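
The checks a posting agent might make on a start-line can be sketched as below. This is an illustration under stated assumptions, not a complete validator: the function name, the has_continuation parameter, and the wording of the messages are all hypothetical, and the tab check reflects a SHOULD rather than a MUST.

```python
def start_line_problems(line, has_continuation=False):
    """Return a list of complaints about one header start-line (without EOL)."""
    name, colon, rest = line.partition(":")
    if not colon:
        return ["missing colon"]
    problems = []
    if not rest.startswith(" "):
        problems.append("colon is not immediately followed by a blank")
    if "\t" in rest:
        problems.append("tab used where blank is preferred")
    if has_continuation and rest.strip(" \t") == "":
        problems.append("non-empty header has no content on the start-line")
    return problems
```

A relayer, being asked to be more tolerant, would not reject an article on these grounds.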
943    
944     Since the white space beginning a continuation line remains
945     a part of the logical line, headers can be "broken" into
946     multiple lines only at white space. Posting agents SHOULD
947     not break headers unnecessarily. Relayers SHOULD preserve
948     existing header breaks, and SHOULD not introduce new breaks.
949     Breaking headers SHOULD be a last resort; relayers and read-
950     ing agents SHOULD handle long header lines gracefully. (See
951     the discussion of size limits in section 4.6.)
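
Where breaking really is unavoidable, the rule that breaks fall only at white space can be sketched as follows. The function name and the limit parameter are hypothetical, and the sketch assumes the content uses single blanks between words and has no trailing white space. It never breaks at the blank directly after the colon, so some content always stays on the start-line.

```python
def fold_header(logical_line, limit):
    """Break a logical line at blanks into physical lines of at most `limit`
    characters where possible; concatenating the result restores the input."""
    lines = []
    current = logical_line
    # On the start-line, do not break at the blank that follows the colon.
    floor = logical_line.find(":") + 2
    while len(current) > limit:
        cut = current.rfind(" ", floor, limit + 1)
        if cut == -1:
            break                  # nowhere to break; leave the line long
        lines.append(current[:cut])
        current = current[cut:]    # continuation line begins with the blank
        floor = 1                  # skip the leading blank of a continuation
    lines.append(current)
    return lines
```

Because each continuation line keeps the blank at which the break was made, unfolding (dropping the EOLs) reproduces the original logical line exactly.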
952    
953    
954     4.3. Body
955    
956     Although the article body is unstructured for most of the
957     purposes of this Draft, structure MAY be imposed on it by
958     other means, notably MIME headers (see appendix B).
959    
960    
961     4.3.1. Body Format Issues
962    
963     The body of an article MAY be empty, although posting agents
964     SHOULD consider this an error condition (meriting returning
965     the article to the poster for revision). A posting agent
966     which does not reject such an article SHOULD issue a warning
967     message to the poster and supply a non-empty body. Note
968     that the separator line MUST be present even if the body is
969     empty.
970    
971     NOTE: An empty body is probably a poster error
972     except, arguably, for some control messages... and
973     even they really ought to have a body explaining
974     the reason for the control message. Some old
975     reading agents are known to generate empty bodies
976     for "cancel" control messages, so posting agents
977     might opt not to reject body-less articles in such
978     cases (although it would be better to fix the
979     reading agents to request a body). However, some
980     existing news software is known to react badly to
981     body-less articles, hence the request for posting
982     agents to insert a body in such cases.
983    
997     NOTE: A possible posting-agent-supplied body text
998     (already used by one widespread posting agent) is
999     "This article was probably generated by a buggy
1000     news reader.". (The use of "reader" to refer to
1001     the reading agent is traditional, although this
1002     Draft uses more precise terminology.)
1003    
1004     NOTE: The requirement for the separator line even
1005     in a bodyless article is inherited from MAIL, and
1006     also distinguishes legitimately-bodyless articles
1007     from articles accidentally truncated in the middle
1008     of the headers.
1009    
1010     Note that an article body is a sequence of lines terminated
1011     by EOLs, not arbitrary binary data, and in particular it
1012     MUST end with an EOL. However, relayers SHOULD treat the
1013     body of an article as an uninterpreted sequence of octets
1014     (except as mandated by changes of EOL representation and by
1015     control-message processing) and SHOULD avoid imposing con-
1016     straints on it. See also section 4.6.
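
The structural points above (separator line always present, body ending with an EOL) can be illustrated with a small sketch. It assumes the article is held as octets using a single-octet local EOL, LF by default; the function names are hypothetical.

```python
def split_article(article, eol=b"\n"):
    """Split an article into (header block, body) at the separator line."""
    sep = article.find(eol + eol)
    if sep == -1:
        # No empty line: possibly an article truncated in the headers.
        raise ValueError("no separator line")
    headers = article[:sep + len(eol)]
    body = article[sep + 2 * len(eol):]
    return headers, body


def body_is_well_formed(body, eol=b"\n"):
    """An empty body is permitted; a non-empty one must end with an EOL."""
    return body == b"" or body.endswith(eol)
```

Note that the body is returned as uninterpreted octets; nothing here inspects or alters its content.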
1017    
1018    
1019     4.3.2. Body Conventions
1020    
1021     Although body lines can in principle be very long (see sec-
1022     tion 4.6 for some discussion of length limits), posters
1023     SHOULD restrict body line lengths to circa 70-75 characters.
1024     On systems where text is conventionally stored with EOLs
1025     only at paragraph breaks and other "hard return" points,
1026     with software breaking lines as appropriate for display or
1027     manipulation, posting agents SHOULD insert EOLs as necessary
1028     so that posted articles comply with this restriction.
1029    
1030     NOTE: News originated in environments where line
1031     breaks in plain text files were supplied by the
1032     user, not the software. Be this good or bad, much
1033     reading-agent and posting-agent software assumes
1034     that news articles follow this convention, so it
1035     is often inconvenient to read or respond to arti-
1036     cles which violate it. The "70-75" number comes
1037     from the widespread use of display devices which
1038     are 80 columns wide, and the desire to leave a bit
1039     of margin for quoting etc. (see below).
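
For a posting agent on a system that stores EOLs only at "hard return" points, the insertion of EOLs might be sketched as follows, using the textwrap module from the Python standard library. The width of 72 is an arbitrary choice within the 70-75 range, and the function name is hypothetical.

```python
import textwrap


def wrap_body(text, width=72):
    """Insert EOLs so that no line exceeds `width`, leaving short lines alone."""
    out = []
    for line in text.split("\n"):
        if len(line) <= width:
            out.append(line)       # already short enough; keep unchanged
        else:
            out.extend(textwrap.wrap(line, width=width,
                                     break_long_words=False,
                                     break_on_hyphens=False))
    return "\n".join(out)
```

Lines that are rewrapped have their internal white space normalized by textwrap, so this approach is unsuitable for preformatted material wider than the limit.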
1040    
1041     Reading agents confronted with body lines much longer than
1042     the available output-device width SHOULD break lines as
1043     appropriate. Posters are warned that such breaks may not
1044     occur exactly where the poster intends.
1045    
1046     NOTE: "As appropriate" would typically include
1047     breaking lines when supplying the text of an arti-
1048     cle to be quoted in a reply or followup, something
1049     that line-breaking reading agents often neglect to
1050     do now.
1051    
1063     Although styles vary widely, for plain text it is usual to
1064     use no left margin, leave the right edge ragged, use a sin-
1065     gle empty line to separate paragraphs, and employ normal
1066     natural-language usage on matters such as upper/lowercase.
1067     (In particular, articles SHOULD not be written entirely in
1068     uppercase. In environments where posters have access only
1069     to uppercase, posting agents SHOULD translate it to lower-
1070     case.)
1071    
1072     NOTE: Most people find substantial bodies of text
1073     entirely in uppercase relatively hard to read,
1074     while all-lowercase text merely looks slightly
1075     odd. The common association of uppercase with
1076     strong emphasis adds to this.
1077    
1078     Tone of voice does not carry well in written text, and mis-
1079     understandings are common when sarcasm, parody, or exaggera-
1080     tion for humorous effect is attempted without explicit warn-
1081     ing. It has become conventional to use the sequence ":-)",
1082     which (on most output devices) resembles a rotated "smiley
1083     face" symbol, as a marker for text not meant to be taken
1084     literally, especially when humor is intended. This practice
1085     aids communication and averts unintended ill-will; posters
1086     are urged to use it. A variety of analogous sequences are
1087     used with less-standardized meanings [Sanderson].
1088    
1089     The order of arrival of news articles at a particular host
1090     depends somewhat on transmission paths, and occasionally
1091     articles are lost for various reasons. When responding to a
1092     previous article, posters SHOULD not assume that all readers
1093     understand the exact context. It is common to quote some of
1094     the previous article to establish context. This SHOULD be
1095     done by prefacing each quoted line (even if it is empty)
1096     with the character ">". This will result in multiple levels
1097     of ">" when quoted context itself contains quoted context.
1098    
1099     NOTE: It may seem superfluous to put a prefix on
1100     empty lines, but it simplifies implementation of
1101     functions such as "skip all quoted text" in read-
1102     ing agents.
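
A minimal sketch of the quoting convention, and of the kind of "skip all quoted text" function it makes easy (both function names are hypothetical):

```python
def quote(body):
    """Preface every line of the body, including empty lines, with '>'."""
    return "".join(">" + line for line in body.splitlines(keepends=True))


def skip_quoted(lines):
    """Return only the lines that are not quoted context."""
    return [line for line in lines if not line.startswith(">")]
```

Quoting text that already contains quoted context simply adds another level of ">".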
1103    
1104     Readability is enhanced if quoted text and new text are sep-
1105     arated by an empty line.
1106    
1107     Posters SHOULD edit quoted context to trim it down to the
1108     minimum necessary. However, posting agents SHOULD not
1109     attempt to enforce this by imposing overly-simplistic rules
1110     like "no more than 50% of the lines should be quotes".
1111    
1112     NOTE: While encouraging trimming is desirable, the
1113     50% rule imposed by some old posting agents is
1114     both inadequate and counterproductive. Posters do
1115     not respond to it by being more selective about
1116     quoting; they respond by padding short responses,
1129     or by using different quoting styles to defeat
1130     automatic analysis. The former adds unnecessary
1131     noise and volume, while the latter also defeats
1132     more useful forms of automatic analysis that read-
1133     ing agents might wish to do.
1134    
1135     NOTE: At the very least, if a minimum-unquoted
1136     quota is being set, article bodies shorter than
1137     (say) 20 lines, or perhaps articles which exceed
1138     the quota by only a few lines, should be exempt.
1139     This avoids the ridiculous situation of complain-
1140     ing about a 5-line response to a 6-line quote.
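
If a quota is imposed despite the advice above, the suggested exemptions might look something like this sketch. The 50% fraction and the 20-line exemption come from the text; the slack of 3 lines stands in for "only a few lines" and is purely an assumption, as are the function and parameter names.

```python
def exceeds_quote_quota(body_lines, max_quoted_fraction=0.5,
                        exempt_below=20, slack_lines=3):
    """True if quoted lines exceed the quota, after applying the exemptions."""
    if len(body_lines) < exempt_below:
        return False               # short articles are exempt
    quoted = sum(1 for line in body_lines if line.startswith(">"))
    allowed = max_quoted_fraction * len(body_lines) + slack_lines
    return quoted > allowed
```

With these values a 5-line response to a 6-line quote draws no complaint.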
1141    
1142     NOTE: A more subtle posting-agent rule, suggested
1143     for experimental use, is to reject articles that
1144     appear to contain quoted signatures (see below).
1145     This is almost certainly the result of a careless
1146     poster not bothering to trim down quoted context.
1147     Also, if a posting agent or followup agent pre-
1148     sents an article template to the poster for edit-
1149     ing, it really should take note of whether the
1150     poster actually made any changes, and refrain from
1151     posting an unmodified template.
1152    
1153     Some followup agents supply "attribution" lines for quoted
1154     context, indicating where it first appeared and under whose
1155     name. When multiple levels of quoting are present and
1156     quoted context is edited for brevity, "inner" attribution
1157     lines are not always retained. The editing process is also
1158     somewhat error-prone. Reading agents (and readers) are
1159     warned not to assume that attributions are accurate.
1160    
1161     UNRESOLVED ISSUE: Should a standard format for
1162     attribution lines be defined? There is already
1163     considerable diversity... but automatic news anal-
1164     ysis would be substantially aided by a standard
1165     convention.
1166    
1167     Early difficulties in inferring return addresses from arti-
1168     cle headers led to "signatures": short closing texts, auto-
1169     matically added to the end of articles by posting agents,
1170     identifying the poster and giving his network addresses etc.
1171     If a poster or posting agent does append a signature to an
1172     article, the signature SHOULD be preceded by a delimiter
1173     line containing (only) two hyphens (ASCII 45) followed by
1174     one blank (ASCII 32). Posting agents SHOULD limit the
1175     length of signatures, since verbose excess bordering on
1176     abuse is common if no restraint is imposed; 4 lines is a
1177     common limit.
1178    
1179     NOTE: While signatures are arguably a blemish,
1180     they are a well-understood convention, and convey-
1181     ing the same information in headers exposes it to
1182     mangling and makes it rather less conspicuous. A
1195     standard delimiter line makes it possible for
1196     reading agents to handle signatures specially if
1197     desired. (This is unfortunately hampered by
1198     extensive misunderstanding of, and misuse of, the
1199     delimiter.)
1200    
1201     NOTE: The choice of delimiter is somewhat unfortu-
1202     nate, since it relies on preservation of trailing
1203     white space, but it is too well-established to
1204     change. There is work underway to define a more
1205     sophisticated signature scheme as part of MIME,
1206     and this will presumably supersede the current
1207     convention in due time.
1208    
1209     NOTE: Four 75-column lines of signature text is
1210     300 characters, which is ample to convey name and
1211     mail-address information in all but the most
1212     bizarre situations.
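
The delimiter convention lends itself to a simple sketch. The choice of the last delimiter line in the body as the start of the signature is an assumption (the text above does not say which to use if several appear), and all names are hypothetical. The third function illustrates the quoted-signature test suggested earlier for experimental use.

```python
SIG_DELIMITER = "-- "              # two hyphens followed by one blank


def split_signature(body_lines):
    """Return (text lines, signature lines), splitting at the last delimiter."""
    for i in range(len(body_lines) - 1, -1, -1):
        if body_lines[i] == SIG_DELIMITER:
            return body_lines[:i], body_lines[i + 1:]
    return body_lines, []


def signature_too_long(signature_lines, limit=4):
    """True if the signature exceeds the limit (4 lines is a common one)."""
    return len(signature_lines) > limit


def has_quoted_signature(body_lines):
    """True if a delimiter line appears inside quoted context."""
    return any(line.startswith(">") and line.lstrip("> ") == SIG_DELIMITER
               for line in body_lines)
```

Note that "--" without the trailing blank is not a delimiter, which is why software that strips trailing white space breaks this convention.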
1213    
1214    
1215     4.4. Characters And Character Sets
1216    
1217     Header and body lines MAY contain any ASCII characters other
1218     than CR (ASCII 13), LF (ASCII 10), and NUL (ASCII 0).
1219    
1220     NOTE: CR and LF are excluded because they clash
1221     with common EOL conventions. NUL is excluded
1222     because it clashes with the C end-of-string con-
1223     vention, which is significant to most existing
1224     news software. These three characters are
1225     unlikely to be transmitted successfully.
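
A sketch of a per-line check for the excluded characters follows. It assumes the line is supplied as octets with its EOL already removed, and it also flags octets above 127, since those are not ASCII characters at all. The names are hypothetical.

```python
FORBIDDEN = {0, 10, 13}            # NUL, LF, CR


def line_problems(line):
    """Return (position, complaint) pairs for one header or body line."""
    problems = []
    for pos, octet in enumerate(line):
        if octet in FORBIDDEN:
            problems.append((pos, "forbidden control character"))
        elif octet > 127:
            problems.append((pos, "non-ASCII octet"))
    return problems
```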
1226    
1227     However, posters SHOULD avoid using ASCII control characters
1228     except for tab (ASCII 9), formfeed (ASCII 12), and backspace
1229     (ASCII 8). Tab signifies sufficient horizontal white space
1230     to reach the next of a set of fixed positions; posters are
1231     warned that there is no standard set of positions, so tabs
1232     should be avoided if precise spacing is essential. Formfeed
1233     signifies a point at which a reading agent SHOULD pause and
1234     await reader interaction before displaying further text.
1235     Backspace SHOULD be used only for underlining, done by a
1236     sequence of underscores (ASCII 95) followed by an equal num-
1237     ber of backspaces, signifying that the same number of text
1238     characters following are to be underlined. Posters are
1239     warned that underlining is not available on all output
1240     devices and is best not relied on for essential meaning.
1241     Reading agents SHOULD recognize underlining and translate it
1242     to the appropriate commands for devices that support it.
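
The underlining convention can be sketched from both sides: a poster's tool producing it, and a reading agent recognizing it. This is an illustration only; the names are hypothetical, and sequences with more backspaces than underscores are simply left alone as malformed.

```python
import re

# A run of underscores followed by a run of backspaces (ASCII 8).
MARK = re.compile(r"(_+)(\x08+)")


def underline(text):
    """Underscores, then an equal number of backspaces, then the text."""
    return "_" * len(text) + "\b" * len(text) + text


def parse_underlining(line):
    """Split a line into (text, is_underlined) segments."""
    segments, pos = [], 0
    for m in MARK.finditer(line):
        n = len(m.group(2))
        if m.start() < pos or len(m.group(1)) < n:
            continue               # inside underlined text, or malformed
        plain = line[pos:m.end(1) - n]
        if plain:
            segments.append((plain, False))
        segments.append((line[m.end():m.end() + n], True))
        pos = m.end() + n
    if line[pos:]:
        segments.append((line[pos:], False))
    return segments
```

A reading agent would then render each underlined segment with whatever commands its output device provides, or fall back on plain text.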
1243    
1244     NOTE: Interpretation of almost all control charac-
1245     ters is device-specific to some degree, and
1246     devices differ. Tabs and underlining are sup-
1247     ported, to some extent, by most modern devices and
1248     reading agents, hence the cautious exemptions for
1261     them. The underlining method is specified because
1262     the inverse method, text and then underscores, is
1263     tempting to the naive... but if sent unaltered to
1264     a device that shows only the most recent of sev-
1265     eral overstruck characters rather than a compos-
1266     ite, the result can be utterly unreadable.
1267    
1268     NOTE: A common interpretation of tab is that it is
1269     a request to space forward to the next position
1270     whose number is one more than a multiple of 8,
1271     with positions numbered sequentially starting at
1272     1. (So tab positions are 9, 17, 25, ...) Reading
1273     agents not constrained by existing system conven-
1274     tions might wish to use this interpretation.
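
The common interpretation of tab described in the NOTE above can be written out explicitly as in this sketch (the function name is hypothetical; other control characters such as backspace are not accounted for in the column count):

```python
def expand_tabs(line, interval=8):
    """Replace each tab with blanks reaching the next position that is one
    more than a multiple of `interval`, positions being numbered from 1."""
    out = []
    column = 1
    for ch in line:
        if ch == "\t":
            target = ((column - 1) // interval + 1) * interval + 1
            out.append(" " * (target - column))
            column = target
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

For simple lines this gives the same result as Python's built-in str.expandtabs(8).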
1275    
1276     NOTE: It will typically be necessary for a reading
1277     agent to catch and interpret formfeed, not just
1278     send it to the output device. The actions per-
1279     formed by typical output devices on receiving a
1280     formfeed are neither adequate for nor appropriate
1281     to the pause-for-interaction meaning.
1282    
1283     Cooperating subnets which wish to employ non-ASCII character
1284     sets by using escape sequences (employing, e.g., ESC (ASCII
1285     27), SO (ASCII 14), and SI (ASCII 15)) to alter the meaning
1286     of superficially-ASCII characters MAY do so, but MUST use
1287     MIME headers to alert reading agents to the particular char-
1288     acter set(s) and escape sequences in use. A reading agent
1289     SHOULD not pass such an escape sequence through, unaltered,
1290     to the output device unless the agent confirms that the
1291     sequence is one used to affect character sets and has reason
1292     to believe that the device is capable of interpreting that
1293     particular sequence properly.
1294    
1295     NOTE: Cooperating-subnet organizers are warned
1296     that some very old relayers strip certain control
1297     characters out of articles they pass along. ESC
1298     is known to be among the affected characters.
1299    
1300     NOTE: There are now standard Internet encodings
1301     for Japanese [rrr] and Vietnamese [rrr] in partic-
1302     ular.
1303    
1304     Articles MUST not contain any octet with value exceeding
1305     127, i.e. any octet that is not an ASCII character.
1306    
1307     NOTE: This rule, like others, may be relaxed by
1308     unanimous consent of the members of a cooperating
1309     subnet, provided suitable precautions are taken to
1310     ensure that rule-violating articles do not leak
1311     out of the subnet. (This has already been done in
1312     many areas where ASCII is not adequate for the
1313     local language(s).) Beware that articles contain-
1314     ing non-ASCII octets in headers are a violation of
1327     the MAIL specifications and are not valid MAIL
1328     messages. MIME offers a way to encode non-ASCII
1329     characters in ASCII for use in headers; see sec-
1330     tion 4.5.
1331    
1332     NOTE: While there is great interest in using 8-bit
1333     character sets, not all software can yet handle
1334     them correctly. Hence the restriction to cooper-
1335     ating subnets. MIME encodings can be used to
1336     transmit such characters while remaining within
1337     the octet restriction.
1338    
1339     In anticipation of the day when it is possible to use non-
1340     ASCII characters safely anywhere, and to provide for the
1341     (substantial) cooperating subnets that are already using
1342     them, transmission paths SHOULD treat news articles as unin-
1343     terpreted sequences of octets (except perhaps for transfor-
1344     mations between EOL representations) and relayers SHOULD
1345     treat non-ASCII characters in articles as ordinary charac-
1346     ters.
1347    
1348     NOTE: 8-bit enthusiasts are warned that not all
1349     software conforms to these recommendations yet.
1350     In particular, standard NNTP [rrr] is a 7-bit pro-
1351     tocol, and there may be implementations which
1352     enforce this rule. Be warned, also, that it will
1353     never be safe to send raw binary data in the body
1354     of news articles, because changes of EOL represen-
1355     tation may (will!) corrupt it.
1356    
1357     Except where cooperating subnets permit more direct
1358     approaches, MIME [rrr] headers and encodings SHOULD be used
1359     to transmit non-ASCII content using ASCII characters; see
1360     section 4.5, appendix B, and the MIME RFCs for details. If
1361     article content can be expressed in ASCII, it SHOULD be.
1362     Failing that, the order of preference for character sets is
1363     that described in MIME [rrr].
1364    
1365     NOTE: Using the MIME facilities, it is possible to
1366     transmit ANY character set, and ANY form of binary
1367     data, using only ASCII characters. Equally impor-
1368     tant, such articles are self-describing and the
1369     reading agent can tell which octet-to-symbol map-
1370     ping is intended! Designation of some preferred
1371     character sets is intended to minimize the number
1372     of character sets that a reading agent must under-
1373     stand in order to display most articles properly.
1374    
1375     Articles containing non-ASCII characters, articles using
1376     ASCII characters (values 0 through 127) to refer to non-
1377     ASCII symbols, and articles using escape sequences to shift
1378     character sets SHOULD include MIME headers indicating which
1379     character set(s) and conventions are being used, and MUST do
1380     so unless such articles are strictly confined to a
1393     cooperating subnet which has its own pre-agreed conventions.
1394     MIME encodings are preferred over all these techniques. If
1395     it comes to a relayer's attention that it is being asked to
1396     pass an article using such techniques outward across what it
1397     knows to be the boundary of such a cooperating subnet, it
1398     MUST report this error to its administrator, and MAY refuse
1399     to pass the article beyond the subnet boundary. If it does
1400     pass the article, it MUST re-encode it with MIME encodings
1401     to make it conform to this Draft.
1402    
1403     NOTE: Such re-encoding is a non-trivial task, due
1404     to MIME rules such as the prohibition of nested
1405     encodings. It's not just a matter of pouring the
1406     body through a simple filter.
1407    
1408     Reading agents SHOULD note MIME headers and attempt to show
1409     the reader the closest possible approximation to the
1410     intended content. They SHOULD not just send the octets of
1411     the article to the output device unaltered, unless there is
1412     reason to believe that the output device will indeed inter-
1413     pret them correctly. Reading agents MUST not pass ASCII
1414     control characters or escape sequences, other than as dis-
1415     cussed above, unaltered to the output device; only by chance
1416     would the result be the desired one, and there is serious
1417     potential for harmful side effects, either accidental or
1418     malicious.
1419    
1420     NOTE: Exactly what to do with unwanted control
1421     characters/sequences depends on the philosophy of
1422     the reading agent, but passing them straight to
1423     the output device is almost always wrong. If the
1424     reading agent wants to mark the presence of such a
1425     character/sequence in circumstances where only
1426     ASCII printable characters are available, trans-
1427     lating it to "#" might be a suitable method; "#"
1428     is a conspicuous character seldom used in normal
1429     text.
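
The "#" translation might be sketched as follows. The sketch assumes it is applied to one line at a time with the EOL removed, that tab, formfeed, and backspace are interpreted elsewhere by the reading agent, and that DEL (ASCII 127) counts as a control character; the names are hypothetical.

```python
HANDLED_SEPARATELY = {8, 9, 12}    # backspace, tab, formfeed


def mark_unwanted_controls(line):
    """Replace unwanted ASCII control characters in a line with '#'."""
    out = []
    for ch in line:
        code = ord(ch)
        is_control = code < 32 or code == 127
        if is_control and code not in HANDLED_SEPARATELY:
            out.append("#")
        else:
            out.append(ch)
    return "".join(out)
```

An escape sequence thus loses its ESC and is displayed as harmless printable text.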
1430    
1431     NOTE: Reading agents should be aware that many old
1432     output devices (or the transmission paths to them)
1433     zero out the top bit of octets sent to them. This
1434     can transform non-ASCII characters into ASCII con-
1435     trol characters.
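
The effect is easy to demonstrate. In this sketch (hypothetical names), an octet is flagged if clearing its top bit would yield an ASCII control character:

```python
def strip_top_bit(data):
    """Simulate a device or path that zeroes the top bit of every octet."""
    return bytes(octet & 0x7F for octet in data)


def becomes_control(octet):
    """True if a non-ASCII octet turns into a control character when stripped."""
    low = octet & 0x7F
    return octet > 127 and (low < 32 or low == 127)
```

For instance, octet 155 (hex 9B) becomes ESC (ASCII 27) on such a device.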
1436    
1437     Followup agents MUST be careful to apply appropriate trans-
1438     formations of representation to the outbound followup as
1439     well as the inbound precursor. A followup to an article
1440     containing non-ASCII material is very likely to contain non-
1441     ASCII material itself.
1442    
1443    
1459     4.5. Non-ASCII Characters In Headers
1460    
1461     All octets found in headers MUST be ASCII characters. How-
1462     ever, it is desirable to have a way of encoding non-ASCII
1463     characters, especially in "human-readable" headers such as
1464     Subject. MIME [rrr] provides a way to do this. Full
1465     details may be found in the MIME specifications; herewith a
1466     quick summary to alert software authors to the issues...
1467    
1468     encoded-word = "=?" charset "?" encoding "?" codes "?="
1469     charset = 1*tag-char
1470     encoding = 1*tag-char
1471     tag-char = <ASCII printable character except !()<>@,;:\"[]/?=>
1472     codes = 1*code-char
1473     code-char = <ASCII printable character except ?>
1474    
1475     An encoded word is a sequence of ASCII printable characters
1476     that specifies the character set, encoding method, and bits
1477     of (potentially) non-ASCII characters. Encoded words are
1478     allowed only in certain positions in certain headers. Spe-
1479     cific headers impose restrictions on the content of encoded
1480     words beyond that specified in this section. Posting agents
1481     MUST ensure that any material resembling an encoded word
1482     (complete with all delimiters), in a context where encoded
1483     words may appear, really is an encoded word.
1484    
1485     NOTE: The syntax is a bit ugly, but it was
1486     designed to minimize chances of confusion with
1487     legitimate header contents, and to satisfy diffi-
1488     cult constraints on use within existing headers.
1489    
1490     An encoded word MUST not be more than 75 octets long. Each
1491     line of a header containing encoded word(s) MUST be at most
1492     76 octets long, not counting the EOL.
1493    
1494     NOTE: These limits are meant to bound the looka-
1495     head needed to determine whether text that begins
1496     "=?" is really an encoded word.
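
The grammar and the 75-octet limit above can be sketched as a recognizer in Python. This is only an illustration under stated assumptions: "ASCII printable character" is taken to mean codes 0x21 through 0x7E (no blank), the token is assumed to be ASCII already, and the names ENCODED_WORD and parse_encoded_word are hypothetical.

```python
import re

# Characters excluded from tag-char by the grammar above.
_EXCLUDED = '!()<>@,;:\\"[]/?='
_TAG_CHARS = "".join(chr(c) for c in range(0x21, 0x7F) if chr(c) not in _EXCLUDED)
_TAG = "[" + re.escape(_TAG_CHARS) + "]+"
_CODES = r"[\x21-\x3e\x40-\x7e]+"  # printable characters except "?" (0x3f)

ENCODED_WORD = re.compile(
    r"=\?(" + _TAG + r")\?(" + _TAG + r")\?(" + _CODES + r")\?="
)

MAX_ENCODED_WORD = 75  # octets, per the limit stated above


def parse_encoded_word(token: str):
    """Return (charset, encoding, codes) if the token is a well-formed
    encoded word within the length limit, otherwise None."""
    if len(token) > MAX_ENCODED_WORD:
        return None
    match = ENCODED_WORD.fullmatch(token)
    return match.groups() if match else None
```

The sketch checks syntax only; whether the charset and encoding are known is a separate question answered by the MIME specifications.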
1497    
1498     The details of charsets and encodings are defined by MIME
1499     [rrr]; the sequence of preferred character sets is the same
1500     as MIME's. Encoded words SHOULD not be used for content
1501     expressible in ASCII.
1502    
1503     When an encoded word is used, other than in a newsgroup name
1504     (see section 5.5), it MUST be separated from any adjacent
1505     non-space characters (including other encoded words) by
1506     white space. Reading agents displaying the contents of
1507     encoded words (as opposed to their encoded form) should
1508     ignore white space adjacent to encoded words.
1509    
1510     UNRESOLVED ISSUE: Should this section be deleted
1511     entirely, or made much more terse? The material
1512     is relevant, but too complex to discuss fully.
1513    
1525     NOTE: The deletion of intervening white space per-
1526     mits using multiple encoded words, implicitly con-
1527     catenated by the deletion, to encode text that
1528     will not fit within a single 75-character encoded
1529     word.
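
A reading agent's handling of adjacent encoded words might look like the following Python sketch. It assumes the MIME "B" (base64) and "Q" encodings, drops only the white space that sits between two encoded words, and uses hypothetical names (display_text and its helper); full details are in the MIME specifications, not here.

```python
import base64
import re

# Loose pattern for locating encoded words inside header text.
_EW = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([^?\s]+)\?=")


def _decode_word(charset: str, encoding: str, codes: str) -> str:
    if encoding.upper() == "B":
        raw = base64.b64decode(codes)
    else:
        # "Q" encoding: "_" stands for a blank, "=XX" for a hex-coded octet.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            codes.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)


def display_text(header_text: str) -> str:
    """Decode encoded words for display, implicitly concatenating
    neighbours by deleting the white space between them."""
    joined = re.sub(r"(\?=)\s+(?==\?)", r"\1", header_text)
    return _EW.sub(lambda m: _decode_word(*m.groups()), joined)
```

Because the white space between the two words is deleted before decoding, a blank that is meant to appear in the displayed text has to be carried inside one of the encoded words (here as "_").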
1530    
1531     Reading-agent implementors are warned that although this
1532     Draft completely specifies where encoded words may appear in
1533     the headers it defines, there are other headers (e.g. the
1534     MIME Content-Description header) that MAY contain them.
1535    
1536    
1537     4.6. Size Limits
1538    
1539     Implementations SHOULD avoid fixed constraints on the sizes
1540     of lines within an article and on the size of the entire
1541     article.
1542    
1543     Relayers SHOULD treat the body of an article as an uninter-
1544     preted sequence of octets (except as mandated by changes of
1545     EOL representation and processing of control messages), not
1546     to be altered or constrained in any way.
1547    
1548     If it is absolutely necessary for an implementation to
1549     impose a limit on the length of header lines, body lines, or
1550     header logical lines, that limit shall be at least 1000
1551     octets, including EOL representations. Relayers and trans-
1552     mission paths confronted with lines beyond their internal
1553     limits (if any) MUST not simply inject EOLs at random
1554     places; they MAY break headers (as described in 4.2.3) as a
1555     last resort, and otherwise they MUST either pass the long
1556     lines through unaltered, or refuse to pass the article at
1557     all (see section 9.1 for further discussion).
1558    
1559     NOTE: The limit here is essentially the same mini-
1560     mum as that specified for SMTP mail in RFC 821
1561     [rrr]. Implementors are warned that Path (see
1562     section 5.6) and References (see section 6.5)
1563     headers, in particular, often become several hun-
1564     dred characters long, so 1000 is not an overly
1565     generous limit.
1566    
1567     All implementations MUST be able to handle an article
1568     totalling at least 65,000 octets, including headers and EOL
1569     representations, gracefully and efficiently. All implemen-
1570     tations SHOULD be able to handle an article totalling at
1571     least 1,000,000 (one million) octets, including headers and
1572     EOL representations, gracefully and efficiently. "Grace-
1573     fully and efficiently" is intended to preclude not only
1574     failures, but also major loss of performance, serious prob-
1575     lems in error recovery, or resource consumption beyond what
1576     is reasonably necessary.
1577    
1591     NOTE: The intent here is to prohibit lowering the
1592     existing de-facto limit any further, while
1593     strongly encouraging movement towards a higher
1594     one. Actually, although improvements are desir-
1595     able in some cases, much news software copes rea-
1596     sonably well with very large articles. The same
1597     cannot be said of the communications software and
1598     protocols used to transmit news from one host to
1599     another, especially when slow communications links
1600     are involved. Occasional huge articles that
1601     appear now (by accident or through ignorance) typ-
1602     ically leave trails of failing software, system
1603     problems, and irate administrators in their wake.
1604    
1605     NOTE: It is intended that the successor to this
1606     Draft will raise the "MUST" limit to 1,000,000 and
1607     the "SHOULD" limit still further.
1608    
1609     Posters SHOULD limit posted articles to at most 60,000
1610     octets, including headers and EOL representations, unless
1611     the articles are being posted only within a cooperating sub-
1612     net which is known to be capable of handling larger articles
1613     gracefully. Posting agents presented with a large article
1614     SHOULD warn the poster and request confirmation.
1615    
1616     NOTE: The difference between this and the earlier
1617     "MUST" limit is margin for header growth, differ-
1618     ing EOL representations, and transmission over-
1619     heads.
1620    
1621     NOTE: Disagreeable though these limits are, it is
1622     a fact that in current networks, an article larger
1623     than 64K (after header growth etc.) simply is not
1624     transmitted reliably. Note also the comments
1625     above on the trauma caused by single extremely-
1626     large articles now; the problems are real and cur-
1627     rent. These problems arguably should be fixed,
1628     but this will not happen network-wide in the imme-
1629     diate future. Hence the restriction of larger
1630     articles to cooperating subnets, for now.
1631    
1632     Posters using non-ASCII characters in their text MUST take
1633     into account the overhead involved in MIME encoding, unless
1634     the article's propagation will be entirely limited to a
1635     cooperating subnet which does not use MIME encodings for
1636     non-ASCII characters. For example, MIME base64 encoding
1637     involves growth by a factor of approximately 4/3, so an
1638     article which would likely have to use this encoding should
1639     be at most about 45,000 octets before encoding.
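
The arithmetic behind the 45,000-octet figure can be checked with a short Python sketch. The function name and the default line length (76 characters) and EOL size (2 octets) are assumptions made for illustration; only the 4/3 factor comes from the text above.

```python
def base64_size(raw_octets: int, line_len: int = 76, eol_len: int = 2) -> int:
    """Octets produced by base64-encoding raw_octets, with the output
    broken into lines of line_len characters, each followed by an EOL."""
    encoded = 4 * ((raw_octets + 2) // 3)          # every 3 octets become 4 characters
    lines = (encoded + line_len - 1) // line_len   # number of output lines, rounded up
    return encoded + lines * eol_len
```

With line breaks ignored (eol_len=0), 45,000 octets encode to exactly 60,000. Under the assumed line length and EOL size the result is 61,580, so the "about" in the text deserves to be taken seriously.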
1640    
1641     Posters SHOULD use MIME "message/partial" conventions to
1642     facilitate automatic reassembly of a large document split
1643     into smaller pieces for posting. It is recommended that the
1644     content identifier used should be a message ID, generated by
1657     the same means as article message IDs (see section 5.3), and
1658     that all parts should have a See-Also header (see section
1659     6.16) giving the message IDs of at least the previous parts
1660     and preferably all the parts.
1661    
1662     NOTE: See-Also is more correct for this purpose
1663     than References, although References is in common
1664     use today (with less-formal reassembly arrange-
1665     ments). MIME reassemblers should probably examine
1666     articles suggested by References headers if See-
1667     Also headers are not present to indicate the
1668     whereabouts of the other parts of "mes-
1669     sage/partial" articles.
1670    
1671     To repeat: implementations SHOULD avoid fixed constraints on
1672     the sizes of lines within an article and on the size of the
1673     entire article.
1674    
1675    
1676     4.7. Example
1677    
1678     Here is a sample article:
1679    
1680     From: jerry@eagle.ATT.COM (Jerry Schwarz)
1681     Path: cbosgd!mhuxj!mhuxt!eagle!jerry
1682     Newsgroups: news.announce
1683     Subject: Usenet Etiquette -- Please Read
1684     Message-ID: <642@eagle.ATT.COM>
1685     Date: Mon, 17 Jan 1994 11:14:55 -0500 (EST)
1686     Followup-To: news.misc
1687     Expires: Wed, 19 Jan 1994 00:00:00 -0500
1688     Organization: AT&T Bell Laboratories, Murray Hill
1689    
1690     body
1691     body
1692     body
1693    
1694    
1695    
1696     5. Mandatory Headers
1697    
1698     An article MUST have one, and only one, of each of the fol-
1699     lowing headers: Date, From, Message-ID, Subject, Newsgroups,
1700     Path.
1701    
1702     NOTE: MAIL specifies (if read most carefully) that
1703     there must be exactly one Date header and exactly
1704     one From header, but otherwise does not restrict
1705     multiple appearances of headers. (Notably, it
1706     permits multiple Message-ID headers!) This
1707     appears singularly useless, or even harmful, in
1708     the context of news, and much current news soft-
1709     ware will not tolerate multiple appearances of
1710     mandatory headers.
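
A check for the one-and-only-one rule might be sketched in Python as follows. The function names are hypothetical, header names are assumed to be compared case-insensitively, and the input is assumed to be the header block alone, with continuation lines beginning with white space.

```python
from collections import Counter

MANDATORY = ("Date", "From", "Message-ID", "Subject", "Newsgroups", "Path")


def header_names(header_block: str):
    """Yield the name of each header, skipping continuation lines."""
    for line in header_block.splitlines():
        if line and not line[0].isspace() and ":" in line:
            yield line.split(":", 1)[0]


def mandatory_header_problems(header_block: str) -> dict:
    """Map each mandatory header that does not appear exactly once
    to the number of times it does appear."""
    counts = Counter(name.lower() for name in header_names(header_block))
    return {h: counts[h.lower()] for h in MANDATORY if counts[h.lower()] != 1}
```

An empty result means every mandatory header appears exactly once; the conditional requirements for References, Sender, and Approved are outside the scope of this sketch.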
1711    
1723     Note also that there are situations, discussed in the rele-
1724     vant parts of section 6, where References, Sender, or
1725     Approved headers are mandatory.
1726    
1727     In the discussions of the individual headers, the content of
1728     each is specified using the syntax notation. The convention
1729     used is that the content of, for example, the Subject header
1730     is defined as <Subject-content>.
1731    
1732    
1733     5.1. Date
1734    
1735     The Date header contains the date and time when the article
1736     was submitted for transmission:
1737    
1738     Date-content = [ weekday "," space ] date space time
1739     weekday = "Mon" / "Tue" / "Wed" / "Thu"
1740     / "Fri" / "Sat" / "Sun"
1741     date = day space month space year
1742     day = 1*2digit
1743     month = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun"
1744     / "Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"
1745     year = 4digit / 2digit
1746     time = hh ":" mm [ ":" ss ] space timezone
1747     timezone = "UT" / "GMT"
1748     / ( "+" / "-" ) hh mm [ space "(" zone-name ")" ]
1749     hh = 2digit
1750     mm = 2digit
1751     ss = 2digit
1752     zone-name = 1*( <ASCII printable character except ()\> / space )
1753    
1754     This is a restricted subset of the MAIL date format.
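
The grammar above translates fairly directly into a regular expression. The Python sketch below is an illustration only: it assumes that "space" means one or more blanks, it checks syntax but not value ranges, and DATE_CONTENT and parse_date_content are hypothetical names.

```python
import re

DATE_CONTENT = re.compile(
    r"(?:(?P<weekday>Mon|Tue|Wed|Thu|Fri|Sat|Sun),[ ]+)?"
    r"(?P<day>[0-9]{1,2})[ ]+"
    r"(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]+"
    r"(?P<year>[0-9]{4}|[0-9]{2})[ ]+"
    r"(?P<hh>[0-9]{2}):(?P<mm>[0-9]{2})(?::(?P<ss>[0-9]{2}))?[ ]+"
    # Either a named zone (UT or GMT only) or a numeric offset,
    # the latter optionally followed by a zone name as a comment.
    r"(?:(?P<ut>UT|GMT)|(?P<offset>[+-][0-9]{4})"
    r"(?:[ ]+\((?P<zone_name>[\x21-\x27\x2a-\x5b\x5d-\x7e ]+)\))?)"
)


def parse_date_content(text: str):
    """Return a dict of the named parts, or None if the syntax is wrong."""
    match = DATE_CONTENT.fullmatch(text)
    return match.groupdict() if match else None
```

Note that a zone name outside parentheses, such as a bare "EST", is rejected, in line with the grammar.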
1755    
1756     If a weekday is given, it MUST be consistent with the date.
1757     The modern Gregorian calendar is used, and dates MUST be
1758     consistent with its usual conventions; for example, if the
1759     month is May, the day must be between 1 and 31 inclusive.
1760     The year SHOULD be given as four digits, and posting agents
1761     SHOULD enforce this; however, relayers MUST accept the two-
1762     digit form, and MUST interpret it as having the implicit
1763     prefix "19".
1764    
1765     NOTE: Two-digit year numbers can, should, and must
1766     be phased out by 1999.
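
The consistency rules above might be checked as in the following Python sketch, which leans on the standard datetime module for Gregorian calendar arithmetic. The function names are hypothetical; the inputs are assumed to be the text fields of an already-parsed Date-content.

```python
import datetime

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
_WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def full_year(year_text: str) -> int:
    """Expand a two-digit year with the implicit prefix "19"."""
    return int("19" + year_text) if len(year_text) == 2 else int(year_text)


def date_is_consistent(weekday, day: str, month: str, year_text: str) -> bool:
    """True if the date exists and the weekday, when given, matches it."""
    try:
        date = datetime.date(full_year(year_text), _MONTHS.index(month) + 1, int(day))
    except ValueError:
        return False  # unknown month name or impossible day of month
    return weekday is None or _WEEKDAYS[date.weekday()] == weekday
```

Passing None for the weekday corresponds to a Date header that omits it.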
1767    
1768     The time is given on the 24-hour clock, e.g. two hours
1769     before midnight is "22:00" or "22:00:00". The hh must be
1770     between 00 and 23 inclusive, the mm between 00 and 59 inclu-
1771     sive, and the ss between 00 and 60 inclusive.
1772    
1773     NOTE: Leap seconds very occasionally result in
1774     minutes that are 61 seconds long.
1775    
1789     The date and time SHOULD be given in the poster's local
1790     timezone, including a specification of that timezone as a
1791     numeric offset (which SHOULD include the timezone name, e.g.
1792     "EST", supplied in parentheses like a MAIL comment). If
1793     not, they MUST be given in Universal Time (abbreviated "UT";
1794     "GMT" is a historical synonym for "UT"). The timezone name
1795     in parentheses, if present, is a comment; software MUST
1796     ignore it, except that reading agents might wish to display
1797     it to the reader. Timezone names other than "UT" and "GMT"
1798     MUST appear only in the comment.
1799    
1800     NOTE: Attempts to deal with a full set of timezone
1801     names have all foundered on the vast number of
1802     such names in use and the duplications (for exam-
1803     ple, there are at least FIVE different timezones
1804     called "EST" by somebody). Even the limited set
1805     of North American zone names authorized by MAIL is
1806     subject to confusion and misinterpretation. Hence
1807     the flat ban on non-UT timezone names except as
1808     comments.
1809    
1810     NOTE: RFC 1036 specified that use of GMT (aka UT,
1811     UTC) was preferred. However, the local time (in
1812     the poster's timezone) is arguably information of
1813     possible interest to the reader, and this requires
1814     some indication of the poster's timezone. Numeric
1815     offsets are an unambiguous way of doing this, and
1816     their use was indeed sanctioned by RFC 1036 (that
1817     is, this is a change of preference only).
1818    
1819     NOTE: There is frequent confusion, including
1820     errors in some news software, regarding the sign
1821     of numeric timezones. Zones west of Greenwich
1822     have negative offsets. For example, North Ameri-
1823     can Eastern Standard Time is zone -0500 and North
1824     American Eastern Daylight Time is zone -0400.
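
Since the sign convention is a frequent source of errors, a small Python sketch of the conversion to Universal Time may help. The function names are hypothetical; the offset is assumed to be a syntactically valid "+hhmm" or "-hhmm" string.

```python
import datetime


def zone_from_offset(offset: str) -> datetime.timezone:
    """Turn "+hhmm" or "-hhmm" into a fixed-offset zone; a negative
    offset means the zone is west of Greenwich (behind UT)."""
    sign = -1 if offset[0] == "-" else 1
    delta = datetime.timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    return datetime.timezone(sign * delta)


def to_utc(local: datetime.datetime, offset: str) -> datetime.datetime:
    """Interpret a naive local time in the given zone and convert it to UT."""
    zoned = local.replace(tzinfo=zone_from_offset(offset))
    return zoned.astimezone(datetime.timezone.utc)
```

Applied to the sample article's date, 11:14:55 at -0500 comes out as 16:14:55 UT: the local clock is behind Greenwich, so converting to UT adds five hours.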
1825    
1826     NOTE: Implementors are warned that the hh in a
1827     timezone can go up to about 14; it is not limited
1828     to 12. This is because the International Date
1829     Line does not run exactly along the boundary
1830     between zone -1200 and zone +1200.
1831    
1832     NOTE: The comments in section 2.6 regarding trans-
1833     lation to other languages are relevant here. The
1834     Date-content format, and the spellings of its com-
1835     ponents, as found in articles themselves, are
1836     always as defined in this Draft, regardless of the
1837     language used to interact with readers and
1838     posters. Reading and posting agents should trans-
1839     late as appropriate. Actually, even English-
1840     language reading and posting agents will probably
1841     want to do some degree of translation on dates, if
1842     only to abbreviate the lengthy format and
1855     (perhaps) translate to and from the reader's time-
1856     zone.
1857    
1858    
1859     5.2. From
1860    
1861     The From header contains the electronic address, and possi-
1862     bly the full name, of the article's author:
1863    
1864     From-content = address [ space "(" paren-phrase ")" ]
1865     / [ plain-phrase space ] "<" address ">"
1866     paren-phrase = 1*( paren-char / space / encoded-word )
1867     paren-char = <ASCII printable character except ()<>\>
1868     plain-phrase = plain-word *( space plain-word )
1869     plain-word = unquoted-word / quoted-word / encoded-word
1870     unquoted-word = 1*unquoted-char
1871     unquoted-char = <ASCII printable character except !()<>@,;:\".[]>
1872     quoted-word = quote 1*( quoted-char / space ) quote
1873     quote = <" (ASCII 34)>
1874     quoted-char = <ASCII printable character except "()<>\>
1875     address = local-part "@" domain
1876     local-part = unquoted-word *( "." unquoted-word )
1877     domain = unquoted-word *( "." unquoted-word )
1878    
1879     (Encoded words are described in section 4.5.) The full name
1880     is distinguished from the electronic address either by
1881     enclosing the former in parentheses (making it resemble a
1882     MAIL comment, after the address) or by enclosing the latter
1883     in angle brackets. The second form is preferred. In the
1884     first form, encoded words inside the full name MUST be com-
1885     posed entirely of <paren-char>s. In the second form,
1886     encoded words inside the full name may not contain charac-
1887     ters other than letters (of either case), digits, and the
1888     characters "!", "*", "+", "-", "/", "=", and "_". The local
1889     part is case-sensitive (except that all case counterparts of
1890     "postmaster" are deemed equivalent), the domain is case-
1891     insensitive, and all other parts of the From content are
1892     comments which MUST be ignored by news software (except
1893     insofar as reading agents may wish to display them to the
1894     reader). Posters and posting agents MUST restrict them-
1895     selves to this subset of the MAIL From syntax; relayers MAY
1896     accept a broader subset, but see the discussion in section
1897     9.1.
1898    
1899     NOTE: The syntax here is a restricted subset of
1900     the MAIL From syntax, with quoting particularly
1901     restricted, for simple parsing. In particular,
1902     the presence of "<" in the From content indicates
1903     that the second form is being used, otherwise the
1904     first form is being used. The major restrictions
1905     here are those already de-facto imposed by exist-
1906     ing software.
1907    
1921     NOTE: Overly-lenient posting agents sometimes per-
1922     mit the second form with a full name containing
1923     "(" or ")", but it is extremely rare for a full
1924     name to contain "<" or ">" even in mail. Accord-
1925     ingly, reading agents wishing to robustly deter-
1926     mine which form is in use in a particular article
1927     should key on the presence or absence of "<", not
1928     the presence or absence of "(".
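
The advice to key on "<" might be followed as in this Python sketch. It is deliberately lenient and does no validation of the address; the function name is hypothetical, and any quotation marks around the full name are left in place.

```python
def split_from(content: str):
    """Return (address, full_name) for either From form; full_name is
    None when no name is given."""
    if "<" in content:
        # Second form: [ plain-phrase ] "<" address ">"
        name, _, rest = content.partition("<")
        address = rest.partition(">")[0]
    else:
        # First form: address [ "(" paren-phrase ")" ]
        address, _, rest = content.partition("(")
        name = rest.rpartition(")")[0]
    return address.strip(), (name.strip() or None)
```

Keying on "<" means that a nonconforming full name containing parentheses in the second form is still split sensibly.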
1929    
1930     The address SHOULD be a valid and complete Internet domain
1931     address, capable of being successfully mailed to by an
1932     Internet host (possibly via an MX record and a forwarder).
1933     The pseudo-domain ".uucp" MAY be used for hosts registered
1934     in the UUCP maps (e.g. name "xyz.uucp" for registered site
1935     "xyz"), but such hosts SHOULD discontinue this usage (either
1936     by arranging a proper Internet address and forwarder, or by
1937     using the "% hack" (see below)), as soon as possible. Bit-
1938     net hosts SHOULD use Internet addresses, avoiding the obso-
1939     lescent ".bitnet" pseudo-domain. Other forms of address
1940     MUST not be used.
1941    
1942     NOTE: "Other forms" specifically include UK-style
1943     "backward" domains ("uk.oxbridge.cs" is in the
1944     Czechoslovakia, not the UK), pure-UUCP addressing
1945     ("knee!shin!foot" instead of
1946     "foot%shin@knee.uucp"), and abbreviated domains
1947     ("zebra.zoo" instead of "zebra.zoo.toronto.edu").
1948    
1949     If it is necessary to use the local part to specify a rout-
1950     ing relative to the nearest Internet host, this MUST be done
1951     using the "% hack", using "%" as a secondary "@". For exam-
1952     ple, to specify that mail to the address should go to Inter-
1953     net host "foo.bar.edu", then to non-Internet host "ein",
1954     then to non-Internet host "deux", for delivery there to
1955     mailbox "fred", a suitable address would be:
1956    
1957     fred%deux%ein@foo.bar.edu
1958    
1959     Analogous forms using "!" in the local part MUST not be
1960     used, as they are ambiguous; they should be expressed in the
1961     "%" form.
1962    
1963     NOTE: "a!b@c" can be interpreted as either "b%c@a"
1964     or "b%a@c", and there is no consistency in which
1965     choice is made. Such addresses consequently are
1966     unreliable. The "%" form does not suffer from
1967     this problem, and although its use is officially
1968     discouraged, it is a de-facto standard, to the
1969     point that MAIL recognizes it.
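
Both the construction of a "% hack" address and the ambiguity of the "!" form can be shown in a few lines of Python. The function names are hypothetical, and the second function assumes an address of exactly the shape "a!b@c".

```python
def percent_hack(mailbox: str, route: list, internet_host: str) -> str:
    """Build an address that reaches mailbox via internet_host and then
    the non-Internet hosts in route, listed in the order mail travels."""
    return "%".join([mailbox] + route[::-1]) + "@" + internet_host


def bang_readings(address: str):
    """Return both "%" forms that an address shaped like "a!b@c" could
    mean: "!" binding first, then "@" binding first."""
    left, _, c = address.partition("@")
    a, _, b = left.partition("!")
    return (b + "%" + c + "@" + a, b + "%" + a + "@" + c)
```

The first function reproduces the example above when given mailbox "fred", route ["ein", "deux"], and host "foo.bar.edu"; the second yields two different addresses for "a!b@c", which is the whole problem.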
1970    
1971     Relayers MUST not, repeat MUST not, repeat MUST not, rewrite
1972     From lines, in any way, however minor or innocent-seeming.
1973     Trying to "fix" a non-conforming address has a very high
1974     probability of making things worse. Either pass it along
1987     unchanged, or reject the article.
1988    
1989     NOTE: An additional reason for banning the use of
1990     "!" addressing is that it has a much higher proba-
1991     bility of being rewritten into mangled unrecogniz-
1992     ability by old relayers.
1993    
1994     Posters and posting agents SHOULD avoid use of the charac-
1995     ters "!" and "@" in full names, as they may trigger unwanted
1996     header rewriting by old, simple-minded news software.
1997    
1998     NOTE: Also, the characters "." and ",", not infre-
1999     quently found in names (e.g., "John W. Campbell,
2000     Jr."), are NOT, repeat NOT, allowed in an unquoted
2001     word. A From header like the following MUST not
2002     be written without the quotation marks:
2003    
2004     From: "John W. Campbell, Jr." <editor@analog.com>
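
A posting agent could apply the quoting rule mechanically, roughly as in this Python sketch. It uses the second (preferred) From form, assumes an ASCII full name (non-ASCII names would need encoded words instead), and the function name is hypothetical.

```python
# Characters that may not appear in an unquoted word.
_NEEDS_QUOTES = set('!()<>@,;:\\".[]')
# Characters that may not appear even inside a quoted word.
_NOT_QUOTABLE = set('"()<>\\')


def format_from(full_name: str, address: str) -> str:
    """Return From content in the form: full name, then <address>."""
    if any(ch in _NOT_QUOTABLE for ch in full_name):
        raise ValueError("full name cannot be written as a quoted word")
    if any(ch in _NEEDS_QUOTES for ch in full_name):
        full_name = '"' + full_name + '"'
    return full_name + " <" + address + ">"
```

Quoting the whole name as a single quoted word is a simplification; quoting only the offending words would also satisfy the grammar.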
2005    
2006    
2007    
2008     5.3. Message-ID
2009    
2010     The Message-ID header contains the article's message ID, a
2011     unique identifier distinguishing the article from every
2012     other article:
2013    
2014     Message-ID-content = message-id
2015     message-id = "<" local-part "@" domain ">"
2016    
2017     As with From addresses, a message ID's local part is case-
2018     sensitive and its domain is case-insensitive. The "<" and
2019     ">" are parts of the message ID, not peculiarities of the
2020     Message-ID header.
2021    
2022     NOTE: News message IDs are a restricted subset of
2023     MAIL message IDs. In particular, no existing news
2024     software copes properly with MAIL quoting conven-
2025     tions within the local part, so they are forbid-
2026     den. This is unfortunate, particularly for X.400
2027     gateways that often wish to include characters
2028     which are not legal in unquoted message IDs, but
2029     it is impossible to fix net-wide. See the notes
2030     on gatewaying in section 10.
2031    
2032     The domain in the message ID SHOULD be the full Internet
2033     domain name of the posting agent's host. Use of the ".uucp"
2034     pseudo-domain (for hosts registered in the UUCP maps) or the
2035     ".bitnet" pseudo-domain (for Bitnet hosts) is permissible,
2036     but SHOULD be avoided.
2037    
2038     Posters and posting agents MUST generate the local part of a
2039     message ID using an algorithm which obeys the specified syn-
2040     tax (words separated by ".", with certain characters not
2053     permitted) (see section 5.2 for details), and will not
2054     repeat itself (ever). The algorithm SHOULD not generate
2055     message IDs which differ only in case of letters. Note the
2056     specification in section 6.5 of a recommended convention for
2057     indicating subject changes. Otherwise the algorithm is up
2058     to the implementor.
2059    
2060     NOTE: The crucial use of message IDs is to distin-
2061     guish circulating articles from each other and
2062     from articles circulated recently. They are also
2063     potentially useful as permanent indexing keys,
2064     hence the requirement for permanent uniqueness...
2065     but indexers cannot absolutely rely on this
2066     because the earlier RFCs urged it but did not
2067     demand it. All major implementations have always
2068     generated permanently-unique message IDs by
2069     design, but in some cases this is sensitive to
2070     proper administration, and duplicates may have
2071     occurred by accident.
2072    
2073     NOTE: The most popular method of generating local
2074     parts is to use the date and time, plus some way
2075     of distinguishing between simultaneous postings on
2076     the same host (e.g. a process number), and encode
2077     them in a suitably-restricted alphabet. An older
2078     but now less-popular alternative is to use a
2079     sequence number, incremented each time the host
2080     generates a new message ID; this is workable, but
2081     requires careful design to cope properly with
2082     simultaneous posting attempts, and is not as
2083     robust in the presence of crashes and other mal-
2084     functions.
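
The date-and-time method described above might be sketched in Python as follows. This is an illustration rather than a recommended algorithm: the function names are hypothetical, the per-process counter is an added assumption for telling apart postings made within the same second, and permanent uniqueness still depends on a sane system clock.

```python
import itertools
import os
import time

# Lowercase letters and digits only, so no two IDs differ only in case.
_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
_sequence = itertools.count()  # distinguishes IDs made by one process


def base36(n: int) -> str:
    """Encode a non-negative integer in the restricted alphabet."""
    digits = ""
    while True:
        n, r = divmod(n, 36)
        digits = _ALPHABET[r] + digits
        if n == 0:
            return digits


def new_message_id(domain: str, now=None, pid=None) -> str:
    """Build "<time.pid.sequence@domain>" with each field in base 36."""
    now = time.time() if now is None else now
    pid = os.getpid() if pid is None else pid
    fields = (int(now), pid, next(_sequence))
    return "<" + ".".join(base36(f) for f in fields) + "@" + domain + ">"
```

The local part is a series of words separated by ".", as the syntax requires.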
2085    
2086     NOTE: Some buggy news software considers message
2087     IDs completely case-insensitive, hence the advice
2088     to avoid relying on case distinctions. The
2089     restrictions placed on the "alphabet" of local
2090     parts and domains in section 5.2 have the useful
2091     side effect of making it unnecessary to parse mes-
2092     sage IDs in complex ways to break them into case-
2093     sensitive and case-insensitive portions.
2094    
2095     The local part of a message ID MUST not be "postmaster" or
2096     any other string that would compare equal to "postmaster" in
2097     a case-insensitive comparison. Message IDs MUST be no
2098     longer than 250 octets, including the "<" and ">".
2099    
2100     NOTE: "Postmaster" is an irksome exception to
2101     case-sensitivity in local parts, inherited from
2102     MAIL, and simply avoiding it is the best way to
2103     deal with it (not that it's likely, but the issue
2104     needs to be dealt with). The length limit is
2105     undesirable, but is present in widely-used exist-
2106     ing software. The limit is actually 255, but a
2119     small safety margin is wise.
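
The syntax, the "postmaster" exception, and the length limit can be combined into one check, sketched here in Python. The names are hypothetical, and "ASCII printable character" is again assumed to mean codes 0x21 through 0x7E.

```python
import re

# unquoted-char: ASCII printable except the characters listed in section 5.2.
_UNQUOTED = "".join(
    chr(c) for c in range(0x21, 0x7F) if chr(c) not in '!()<>@,;:\\".[]'
)
_WORD = "[" + re.escape(_UNQUOTED) + "]+"
_DOTTED = _WORD + r"(?:\." + _WORD + ")*"   # words separated by "."
MESSAGE_ID = re.compile("<(" + _DOTTED + ")@(" + _DOTTED + ")>")

MAX_MESSAGE_ID = 250  # octets, including the "<" and ">"


def valid_message_id(text: str) -> bool:
    """True if text is a syntactically acceptable news message ID."""
    match = MESSAGE_ID.fullmatch(text)
    return (
        bool(match)
        and len(text) <= MAX_MESSAGE_ID
        and match.group(1).lower() != "postmaster"
    )
```

Since the pattern admits only ASCII, the character count used for the length test equals the octet count.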
2120    
2121    
2122     5.4. Subject
2123    
2124     The Subject header's content (the "subject" of the article)
2125     is a short phrase describing the topic of the article:
2126    
2127     Subject-content = [ "Re: " ] nonblank-text
2128    
2129     Encoded words MAY appear in this header.
2130    
2131     If the article is a followup, the subject SHOULD begin with
2132     "Re: " (a "back reference"). If the article is not a fol-
2133     lowup, the subject MUST not begin with a back reference.
2134     Back references are case-insensitive, although "Re: " is the
2135     preferred form. A followup agent assisting a poster in
2136     preparing a followup SHOULD prepend a back reference, UNLESS
2137     the subject already begins with one. If the poster deter-
2138     mines that the topic of the followup differs significantly
2139     from what is described in the subject, a new, more descrip-
2140     tive, subject SHOULD be substituted (with no back refer-
2141     ence). An article whose subject begins with a back refer-
2142     ence MUST have a References header referencing the precur-
2143     sor.
2144    
2145     NOTE: A back reference is FOUR characters, the
2146     fourth being a blank. RFC 1036 was confused about
2147     this. Observe also that only ONE back reference
2148     should be present.
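
A followup agent's handling of the back reference might be as simple as this Python sketch (the function name is hypothetical):

```python
def followup_subject(subject: str) -> str:
    """Prepend "Re: " unless the subject already begins with a back
    reference; the comparison is case-insensitive and covers all four
    characters, including the trailing blank."""
    if subject[:4].lower() == "re: ":
        return subject
    return "Re: " + subject
```

Testing all four characters avoids treating a subject such as "Regarding ..." as if it already carried a back reference.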
2149    
2150     NOTE: There is a semi-standard convention, often
2151     used, in which a subject change is flagged by mak-
2152     ing the new Subject-content of the form:
2153    
2154     new topic (was: old topic)
2155    
2156     possibly with "old topic" somewhat truncated.
2157     Posters wishing to do something like this are
2158     urged to use this exact form, to simplify auto-
2159     mated analysis.
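
The kind of automated analysis that the exact form makes easy is sketched below in Python. The pattern and function name are hypothetical, and the sketch assumes the subject follows the convention literally, with a single blank before the opening parenthesis.

```python
import re

_WAS = re.compile(r"(?P<new>.+) \(was: (?P<old>.+)\)")


def split_subject_change(subject: str):
    """Return (new_topic, old_topic); old_topic is None when the
    subject does not use the "(was: ...)" convention."""
    match = _WAS.fullmatch(subject)
    if match:
        return match.group("new"), match.group("old")
    return subject, None
```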
2160    
2161     For historical reasons, the subject MUST not begin with
2162     "cmsg " (note that this sequence ends with a blank).
2163    
2164     NOTE: Some old news software takes a subject
2165     beginning with "cmsg " as an indication that the
2166     article is a control message (see sections 6.6 and
2167     7). This mechanism is obsolete and undesirable,
2168     but accidental triggering of it is still possible.
2169    
2170     The subject SHOULD be terse. Posters SHOULD avoid trying to
2171     cram their entire article into the headers; even the sim-
2172     plest query usually benefits from a sentence or two of
2185     elaboration and context, and the details of header display
2186     vary widely among reading agents.
2187    
2188     NOTE: All-in-the-subject articles are sometimes
2189     the result of misunderstandings over the interac-
2190     tion protocol of a posting agent. Posting agents
2191     might wish to give special attention to the possi-
2192     bility that a poster specifying a very long sub-
2193     ject might have thought he was typing the body of
2194     the article.
2195    
2196    
2197     5.5. Newsgroups
2198    
2199     The Newsgroups header's content specifies which newsgroup(s)
2200     the article is posted to:
2201    
2202     Newsgroups-content = newsgroup-name *( ng-delim newsgroup-name )
2203     newsgroup-name = plain-component *( "." component )
2204     component = plain-component / encoded-word
2205     plain-component = component-start *13component-rest
2206     component-start = lowercase / digit
2207     lowercase = <letter a-z>
2208     component-rest = component-start / "+" / "-" / "_"
2209     ng-delim = ","
2210    
2211     Encoded words used in newsgroup names MUST not contain char-
2212     acters other than letters, digits, "+", "-", "/", "_", "=",
2213     and "?" (although they may encode them).
2214    
2215     A newsgroup name consists of one or more components, which
2216     may be plain components or (except for the first) encoded
2217     words. A plain component MUST contain at least one letter,
2218     MUST begin with a letter or digit, and MUST not be longer
2219     than 14 characters. The first component MUST begin with a
2220     letter; subsequent components SHOULD begin with a letter.
2221     Newsgroup names MUST not contain uppercase letters, except
2222     where required by encodings in encoded words. The sequences
2223     "all" and "ctl" MUST not be used as components.
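
The rules above for plain components can be sketched as a validator in Python. Encoded words in newsgroup names are deliberately not handled, the "SHOULD begin with a letter" recommendation for later components is not enforced, and the function names are hypothetical.

```python
import re

# plain-component: a lowercase letter or digit, then up to 13 more
# characters drawn from lowercase letters, digits, "+", "-", and "_".
_COMPONENT = re.compile(r"[a-z0-9][a-z0-9+_-]{0,13}")
_RESERVED = ("all", "ctl")


def valid_newsgroup_name(name: str) -> bool:
    """Check a newsgroup name made only of plain components."""
    components = name.split(".")
    for component in components:
        if not _COMPONENT.fullmatch(component):
            return False
        if not re.search("[a-z]", component) or component in _RESERVED:
            return False
    return components[0][0].isalpha()  # first component begins with a letter


def valid_newsgroups_content(content: str) -> bool:
    """Check a comma-separated list of newsgroup names."""
    return all(valid_newsgroup_name(name) for name in content.split(","))
```

A name such as "comp.lang.c++" passes, while an all-digit component, an uppercase letter, or a 15-character component is rejected.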
2224    
2225     NOTE: The alphabet and syntax specified encom-
2226     passes all existing names of widespread news-
2227     groups, while avoiding various forms that are
2228     known to cause problems. Important existing soft-
2229     ware uses various non-alphanumeric characters as
2230     punctuation adjacent to newsgroup names. (It
2231     would, in fact, be preferable to ban "+" from
2232     newsgroup names, were it not that several
2233     widespread newsgroups related to the C++ program-
2234     ming language already use it.)
2235    
2236     NOTE: Much existing software converts the news-
2237     group name into a directory path and stores the
2238     articles themselves using numeric filenames, so
2239    
2240    
2241    
2242     2 June 1994 - 34 - expires 15 July 1994
2243    
2244    
2245    
2246    
2247    
2248     INTERNET DRAFT to be NEWS sec. 5.5
2249    
2250    
2251     all-digit name components can be troublesome; the
2252     "Great Renaming" early in the history of Usenet
2253     included revisions of several newsgroup names to
2254     eliminate such components.
2255    
2256     NOTE: The same storage technique is the reason for
2257     the 14-character limit. The limit is now largely
2258     historical, since most modern systems have much
2259     larger limits on the length of a directory entry's
2260     name, but many old systems are still in use. Sys-
2261     tems with shorter limits also exist, but news
2262     software on such systems has had to deal with the
2263     problem already, since there are several
2264     widespread newsgroups with 14-character components
2265     in their names. Implementors are warned that it
2266     is intended that the successor to this Draft will
2267     increase the 14-character limit, and are urged to
2268     fix their software to handle longer names grace-
2269     fully (if such fixes are necessary, given the
2270     intended domain of application of the particular
2271     software).
2272    
2273     NOTE: The requirement that the first character of
2274     a name be a letter accommodates existing software
2275     which assumes it can tell the difference between a
2276     newsgroup name and other possible syntactic enti-
2277     ties by inspecting the first character. Similar
2278     considerations motivate excluding "+", "-", and
2279     "_" from coming first in a component, and the
2280     preference for components that do not begin with
2281     digits. The "all" sequence is used as a wildcard
2282     symbol in much existing software, and the "ctl"
2283     sequence was involved in an obsolete historical
2284     mechanism for marking control messages, so they
2285     are best avoided.
2286    
2287     NOTE: Possibly newsgroup names should have been
2288     case-insensitive, but all existing software treats
2289     them as case-sensitive. (RFC 977 [rrr] claims
2290     that they are case-insensitive in NNTP, but exist-
2291     ing implementations are believed to ignore this.)
2292     The simplest solution is just to ban use of upper-
2293     case letters, since no widespread newsgroup name
2294     uses them anyway; this avoids any possibility of
2295     confusion.
2296    
2297     NOTE: The syntax has the disadvantage of contain-
2298     ing no white space, making it impossible to con-
2299     tinue a Newsgroups header across several lines.
2300     Implementors of relayers and reading agents are
2301     warned that it is intended that the successor to
2302     this Draft will change the definition of ng-delim
2303     to:
2304    
2317     ng-delim = "," [ space ]
2318    
2319     and are urged to fix their software to handle
2320     (i.e., ignore) white space following the commas.
2321     Meanwhile, posters must avoid inserting such space
2322     (despite the natural-language convention which
2323     permits it) and posting agents should strip it
2324     out.
2325    
2326     NOTE: Encoded words as components are somewhat
2327     problematic, but are clearly desirable for use in
2328     non-English-speaking nations. They are not sub-
2329     ject to the 14-character limit, and this (plus the
2330     possibility of "/" within them) may require spe-
2331     cial handling in news software.
2332    
2333     Encoded words are allowed in newsgroup names ONLY where non-
2334     ASCII characters are necessary to the name, and must use the
2335     "b" encoding [rrr] and the first suitable character set in
2336     the MIME order of preferred character sets [rrr].
2337    
2338     NOTE: Since the newsgroup name is the encoded
2339     form, NOT the underlying non-ASCII form, there is
2340     room for terrible confusion here if the choice of
2341     encoding for a particular name is not fully stan-
2342     dardized.
2343    
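A rough Python sketch of forming such a component follows. It
is an illustration under stated assumptions, not a definition:
the preference list is only a stand-in for the MIME order of
preferred character sets, and the spelling chosen for the
character-set name and for the "b" is arbitrary. Real software
would need one agreed spelling, for exactly the reason given in
the preceding note.

```python
import base64

# Stand-in for the MIME order of preferred character sets (an assumption).
PREFERRED_CHARSETS = ["us-ascii", "iso-8859-1", "iso-8859-2"]


def encode_component(text):
    """Return `text` unchanged if it is ASCII, else a "b" encoded word."""
    for charset in PREFERRED_CHARSETS:
        try:
            raw = text.encode(charset)
        except UnicodeEncodeError:
            continue  # not representable; try the next character set
        if charset == "us-ascii":
            return text  # encoded words are only for non-ASCII names
        encoded = base64.b64encode(raw).decode("ascii")
        return "=?" + charset.upper() + "?b?" + encoded + "?="
    raise ValueError("no listed character set can represent " + repr(text))
```

The result uses only letters, digits, "+", "-", "/", "=", and
"?", which is consistent with the restriction on encoded words
stated earlier in this section.
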
2344     Posters SHOULD use only the names of existing newsgroups in
2345     the Newsgroups header, because newsgroups are NOT created
2346     simply by being posted to. However, it is legitimate to
2347     cross-post to newsgroup(s) which do not exist on the posting
2348     agent's host, provided that at least one of the newsgroups
2349     DOES exist there, and followup agents MUST accept this
2350     (posting agents MAY accept it, but SHOULD at least alert the
2351     poster to the situation and request confirmation). Relayers
2352     MUST not rewrite Newsgroups headers in any way, even if some
2353     or all of the newsgroups do not exist on the relayer's host.
2354    
2355     NOTE: Early experience with news software that
2356     created newsgroups when they were mentioned in a
2357     Newsgroups header was thoroughly negative: posters
2358     frequently mistype newsgroup names.
2359    
2360     NOTE: While it is legitimate for some of an arti-
2361     cle's newsgroups not to exist on the host where it
2362     is posted, this IS a rather unusual situation
2363     except in followups (which should go to all news-
2364     groups the precursor was posted to, even if not
2365     all of them reach the site where the followup is
2366     being posted).
2367    
2368     NOTE: Rewriting Newsgroups headers to strip
2369     locally-unknown newsgroups is superficially
2370     attractive. However, early experience with
2383     exactly that policy was thoroughly negative: news
2384     propagation is more redundant and much less
2385     orderly than many people imagine, and in particu-
2386     lar it is not unheard-of for the (sometimes)
2387     fastest path between two (say) U of Toronto sites
2388     to pass outside U of Toronto... in which case
2389     newsgroup stripping can cause incomplete propaga-
2390     tion. Having an article's set of newsgroups
2391     change as it propagates can also result in fol-
2392     lowups not achieving the same propagation as the
2393     original. It's been tried; it's more trouble than
2394     it's worth; don't do it.
2395    
2396     NOTE: In particular, newsgroup stripping superfi-
2397     cially looks like a solution to the problem of
2398     duplicate regional newsgroup names. For example,
2399     both University of Toronto and University of Texas
2400     have "ut.general" newsgroups, and material cross-
2401     posted to that name and a global newsgroup appears
2402     in both universities' local newsgroups. However,
2403     the side effects of stripping are sufficiently
2404     unacceptable to disqualify it for this purpose.
2405     Don't do it.
2406    
2407     Cross-posting an article to several relevant newsgroups is
2408     far superior to posting separate articles with duplicated
2409     content to each newsgroup, because reading agents can detect
2410     the situation and show the article to a reader only once.
2411     Posters SHOULD cross-post rather than duplicate-post.
2412    
2413     NOTE: On the other hand, cross-posting to a large
2414     number of newsgroups usually indicates that the
2415     poster has not thought about his audience; arti-
2416     cles are rarely pertinent to more than (say) half
2417     a dozen newsgroups. Posting agents might wish to
2418     request confirmation when the number of newsgroups
2419     exceeds (say) five in the presence of a Followup-
2420     To header, or (say) two in the absence of such a
2421     header.
2422    
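The confirmation rule suggested in the preceding note is easy to
state in code. This Python sketch simply restates the "(say)"
figures from the note as default parameters; the function and
parameter names are invented for the example, and a posting
agent is free to pick other limits.

```python
def should_confirm_crosspost(newsgroup_count, has_followup_to,
                             limit_with_followup=5,
                             limit_without_followup=2):
    """True if the poster should be asked to confirm the cross-post."""
    if has_followup_to:
        limit = limit_with_followup
    else:
        limit = limit_without_followup
    return newsgroup_count > limit
```

With the default limits, three newsgroups and no Followup-To
header would prompt for confirmation; three newsgroups with a
Followup-To header would not.
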
2423     NOTE: One problem with cross-postings is what to
2424     do with an article cross-posted to a set of news-
2425     groups including both moderated and unmoderated
2426     ones. Posters tend to expect such an article to
2427     show up immediately in the unmoderated newsgroups,
2428     especially if they do not realize that one or more
2429     of the newsgroups is moderated. However, since it
2430     is not possible for a moderator to retroactively
2431     add an already-posted article to a moderated news-
2432     group, the only correct action is to mail such an
2433     article to one (and only one) of the moderators
2434     for action. It is probably best for the posting
2435     agent to detect this situation and ask the poster
2436     what action is preferred. The acceptable choices
2449     are to alter the newsgroup list or to mail to a
2450     moderator of the poster's choice; the posting
2451     agent should NOT offer duplicate-posting as an
2452     easy-to-request option (if only because many mod-
2453     erators will reject a submission that has already
2454     been posted to unmoderated newsgroups).
2455    
2456     NOTE: An article cross-posted to multiple moder-
2457     ated newsgroups really should have approval from
2458     all the moderators involved. In practice, the
2459     only straightforward way to do this is to send the
2460     article to one of them and have him consult the
2461     others.
2462    
2463     A newsgroup SHOULD not appear more than once in the News-
2464     groups header.
2465    
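A posting agent could combine this rule with the stripping of
white space after commas urged earlier in this section. The
Python sketch below is one possible approach, not a required
one; the function name is invented for the example, and the
names are kept in first-seen order.

```python
def normalize_newsgroups(content):
    """Strip white space around commas and drop repeated names."""
    seen = set()
    names = []
    for raw in content.split(","):
        name = raw.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return ",".join(names)
```

For instance, "comp.lang.c, comp.lang.c++,comp.lang.c" would be
rewritten as "comp.lang.c,comp.lang.c++" before posting. Note
that this is a job for posting agents only; relayers do not
rewrite Newsgroups headers.
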
2466     Newsgroup names having only one component are reserved for
2467     newsgroups whose propagation is restricted to a single host
2468     (or the administrative equivalent). It is inadvisable to
2469     name a newsgroup "poster" because that word has special
2470     meaning in the Followup-To header (see section 6.1). The
2471     names "control" and "junk" are frequently used for pseudo-
2472     newsgroups internal to relayer implementations, and hence
2473     are also best avoided.
2474    
2475     NOTE: Beware of the duplicate-regional-newsgroup-
2476     names problem mentioned above. In particular,
2477     there are many, many hosts with a newsgroup named
2478     "general", and some surprising things show up in
2479     such newsgroups when people cross-post. It is
2480     probably better to use multi-component names,
2481     which are less likely to be duplicated. Fred's
2482     Widget House should use "fwh.general" rather than
2483     just "general" as its in-house general-topics
2484     newsgroup.
2485    
2486     It is conventional to reserve newsgroup names beginning with
2487     "to." for test messages sent on an essentially point-to-
2488     point basis (see also the ihave/sendme protocol described in
2489     section 7.2); newsgroup names beginning with "to." SHOULD
2490     not be used for any other purpose. The second (and possibly
2491     later) components of such a name should, together, comprise
2492     the relayer name (see section 5.6) of a relayer. The news-
2493     group exists only at the named relayer and its neighbors.
2494     The neighbors all pass that newsgroup to the named relayer,
2495     while the named relayer does not pass it to anyone.
2496    
2497     The order of newsgroup names in the Newsgroups header is not
2498     significant.
2499    
2500    
2515     5.6. Path
2516    
2517     The Path header's content indicates which relayers the arti-
2518     cle has already visited, so that unnecessary redundant
2519     transmission can be avoided:
2520    
2521          Path-content   = [ path-list path-delimiter ] local-part
2522          path-list      = relayer-name *( path-delimiter relayer-name )
2523          relayer-name   = 1*rn-char
2524          rn-char        = letter / digit / "." / "-" / "_"
2525          path-delimiter = "!"
2526    
2527     The Path content is a list of relayer names, separated by
2528     path delimiters, followed (after a final delimiter) by the
2529     local part of a mailing address. Each relayer MUST prepend
2530     its name, and a delimiter, to the Path content in all arti-
2531     cles it processes. A relayer MUST not pass an article to a
2532     neighboring relayer whose name is already mentioned in an
2533     article's path list, unless this is explicitly requested by
2534     the neighbor in some way. The Path content is case-
2535     sensitive.
2536    
2537     NOTE: The Path header supplied by a posting agent
2538     should normally contain only the local part. The
2539     relayer that the posting agent passes the article
2540     to for posting will prepend its relayer name to
2541     get the path list started.
2542    
2543     NOTE: Observe that the trailing local part is NOT
2544     part of the path list. This Path header:
2545    
2546     Path: fee!fie!foe!fum
2547    
2548     contains three relayer names: "fee", "fie", and
2549     "foe". A relayer named "fum" is still eligible to
2550     be sent this article.
2551    
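The handling of the path list described above can be sketched
in a few lines of Python. This is an illustration only; the
function names are invented for the example, and the "unless
explicitly requested by the neighbor" exception is not modelled.

```python
def split_path(path_content):
    """Split Path content into (path_list, local_part)."""
    *relayers, local_part = path_content.split("!")
    return relayers, local_part


def prepend_relayer(path_content, relayer_name):
    """What a relayer does to the Path of every article it processes."""
    return relayer_name + "!" + path_content


def may_send_to(path_content, neighbor_name):
    """False if the neighbor is already named in the path list.

    The comparison is case-sensitive, and the trailing local part
    is not part of the path list.
    """
    relayers, _local_part = split_path(path_content)
    return neighbor_name not in relayers
```

Given "fee!fie!foe!fum", the path list is "fee", "fie", "foe";
a neighbor named "fum" may still be sent the article, while one
named "fie" may not.
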
2552     NOTE: This syntax has the disadvantage of contain-
2553     ing no white space, making it impossible to con-
2554     tinue a Path header across several lines. Imple-
2555     mentors of relayers and reading agents are warned
2556     that it is intended that the successor to this
2557     Draft will change the definition of path delimiter
2558     to:
2559    
2560     path-delimiter = "!" [ space ]
2561    
2562     and are urged to fix their software to handle
2563     (i.e., ignore) white space following the exclama-
2564     tion points. They are urged to hurry; some ill-
2565     behaved systems reportedly already feel free to
2566     add such white space.
2567    
2581     NOTE: RFC 1036 allows considerably more flexibil-
2582     ity in choice of delimiter, in theory, but this
2583     flexibility has never been used and most news
2584     software does not implement it properly. The
2585     grammar reflects the current reality. Note, in
2586     particular, that RFC 1036 treats "_" as a delim-
2587     iter, but in fact it is known to appear in relayer
2588     names occasionally.
2589    
2590     Because an article will not propagate to a relayer already
2591     mentioned in its path list, the path list MUST not contain
2592     any names other than those of relayers the article has
2593     passed through AS NEWS. This is trivially obvious for nor-
2594     mal news articles, but requires attention from the modera-
2595     tors of moderated newsgroups and the implementors and main-
2596     tainers of gateways.
2597    
2598     NOTE: For the same reason, a relayer and its
2599     neighbors need to agree on the choice of relayer
2600     name, and names should not be changed without
2601     notifying neighbors.
2602    
2603     Relayer names need to be unique among all relayers which
2604     will ever see the articles using them. A relayer name is
2605     normally either an "official" name for the host the relayer
2606     runs on, or some other "official" name controlled by the
2607     same organization. Except in cooperating subnets that agree
2608     to some other convention, and don't let articles using it
2609     escape beyond the subnet, a relayer name MUST be either a
2610     UUCP name registered in the UUCP maps (without any domain
2611     suffix such as ".UUCP"), or a complete Internet domain name.
2612     Use of a (registered) UUCP name is recommended, where prac-
2613     tical, to keep the length of the path list down.
2614    
2615     The use of Internet domain names in the path list presents
2616     one problem: domain names are case-insensitive, but the path
2617     list is case-sensitive. Relayers using domain names as
2618     their relayer names MUST pick a standard form for the name,
2619     and use that form consistently to the exclusion of all oth-
2620     ers. The preferred form for this purpose, which relayers
2621     SHOULD use, is the all-lowercase form.
2622    
2623     NOTE: It is arguably unfortunate that the path
2624     list is case-sensitive, but it is much too late to
2625     change this. Most Internet sites do, in any
2626     event, use one standardized form of their name
2627     almost everywhere.
2628    
2629     In the ordinary case, where the poster is the author of the
2630     article, the local part following the path list SHOULD be
2631     the local part of the poster's full Internet domain mailing
2632     address.
2633    
2647     NOTE: It should be just the local part, not the
2648     full address. The character "@" does not appear
2649     in a Path header.
2650    
2651     The Path content somewhat resembles a mailing address, par-
2652     ticularly in the UUCP world with its manual routing and "!"
2653     address syntax. Historically, this resemblance was impor-
2654     tant, and the Path content was often used as a reply
2655     address. This practice has always been somewhat unreliable,
2656     since news paths are not always mail paths and news relayer
2657     names are not always recognized by mail handlers, and its
2658     reliability has generally worsened in recent times. The
2659     widespread use and recognition of Internet domain
2660     addresses, even outside the actual Internet, has largely
2661     eliminated the problem. Readers SHOULD not use the Path
2662     content as a reply address. On the other hand, relayer
2663     administrators are urged not to break this usage without
2664     good reason; where practical, paths followed by news SHOULD
2665     be traversable by mail, and mail handlers SHOULD recognize
2666     relayer names as host names.
2667    
2668     It will typically be difficult or impractical for gateways
2669     and moderators to supply a Path content that is useful as a
2670     reply address for the author, bearing in mind that the path
2671     list they supply will normally be empty. (To reiterate: the
2672     path list MUST not contain any names other than those of
2673     relayers the article has passed through AS NEWS.) They
2674     SHOULD supply a local part that will result in replies to a
2675     Path-derived address being returned to the sender with a
2676     brief explanation. Software permitting, the local part
2677     "not-for-mail" is recommended.
2678    
2679     NOTE: A moderator or gateway administrator who
2680     supplies a local part that delivers such mail to
2681     an administrative mailbox will quickly discover
2682     why it should be bounced automatically! It is
2683     best, however, for the returned message to include
2684     an explanation of what has probably happened,
2685     rather than just a mysterious "undeliverable mail"
2686     complaint, since the sender may not be aware that
2687     his/her software is unwisely using the Path con-
2688     tent as a reply address. Reply software might
2689     wish to question attempts to reply to a Path-
2690     derived address ending in "not-for-mail" (which is
2691     why a specific name is being recommended here).
2692    
2693    
2694     6. Optional Headers
2695    
2696     Many MAIL headers, and many of those specified in present
2697     and future MAIL extensions, are potentially applicable to
2698     news. Headers specific to MAIL's point-to-point transmis-
2699     sion paradigm, e.g. To and Cc, SHOULD not appear in news
2700     articles. (Gateways wishing to preserve such information
2713     for debugging probably SHOULD hide it under different names;
2714     prefixing "X-" to the original headers, resulting in e.g.
2715     "X-To", is suggested.)
2716    
2717     The following optional headers are either specific to news
2718     or of particular note in news articles; an article MAY con-
2719     tain some or all of them. (Note that there are some circum-
2720     stances in which some of them are mandatory; these are
2721     explained under the individual headers.) An article MUST
2722     not contain two or more headers with any one of these header
2723     names.
2724    
2725     NOTE: The ban on duplicate header names does not
2726     apply to headers not specified in this Draft at
2727     all, such as "X-" headers. Software should not
2728     assume that all header names in a given article
2729     are unique.
2730    
2731    
2732     6.1. Followup-To
2733    
2734     The Followup-To header contents specify which newsgroup(s)
2735     followups should be posted to:
2736    
2737     Followup-To-content = Newsgroups-content / "poster"
2738    
2739     The syntax is the same as that of the Newsgroups content,
2740     with the exception that the magic word "poster" means that
2741     followups should be mailed to the article's reply address
2742     rather than posted. In the absence of Followup-To, the
2743     default newsgroup(s) for a followup are those in the News-
2744     groups header.
2745    
2746     NOTE: The way to request that followups be mailed
2747     to a specific address other than that in the From
2748     line is to supply "Followup-To: poster" and a
2749     Reply-To header. Putting a mailing address in the
2750     Followup-To line is incorrect; posting agents
2751     should reject or rewrite such headers.
2752    
2753     NOTE: There is no syntax for "no followups
2754     allowed" because "Followup-To: poster" accom-
2755     plishes this effect without extra machinery.
2756    
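How a followup agent might pick the destination of a followup
is sketched below in Python. The sketch assumes the precursor's
headers are available as a simple mapping from header name to
content (a hypothetical representation, as is the function
name), and it takes the reply address to be the Reply-To
content when present and the From content otherwise.

```python
def followup_destination(headers):
    """Return ("mail", address) or ("post", [newsgroup, ...])."""
    followup_to = headers.get("Followup-To")
    if followup_to == "poster":
        # Mail to the reply address instead of posting.
        reply_address = headers.get("Reply-To", headers["From"])
        return "mail", reply_address
    if followup_to is None:
        # No Followup-To: default to the precursor's newsgroups.
        followup_to = headers["Newsgroups"]
    return "post", followup_to.split(",")
```

The sketch leaves the choice with the precursor's poster: it
never trims the newsgroup list on its own initiative.
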
2757     Although it is generally desirable to limit followups to the
2758     smallest reasonable set of newsgroups, especially when the
2759     precursor was cross-posted widely, posting agents SHOULD not
2760     supply a Followup-To header except at the poster's explicit
2761     request.
2762    
2763     NOTE: In particular, it is incorrect for the post-
2764     ing agent to assume that followups to a cross-
2765     posted article should be directed to the first
2766     newsgroup only. Trimming the list of newsgroups
2779     should be the poster's decision, not the posting
2780     agent's. However, when an article is to be cross-
2781     posted to a considerable number of newsgroups, a
2782     posting agent might wish to SUGGEST to the poster
2783     that followups go to a shorter list.
2784    
2785    
2786     6.2. Expires
2787    
2788     The Expires header content specifies a date and time when
2789     the article is deemed to be no longer useful and should be
2790     removed ("expired"):
2791    
2792     Expires-content = Date-content
2793    
2794     The content syntax is the same as that of the Date content.
2795     In the absence of Expires, the default is decided by the
2796     administrators of each host the article reaches, who MAY
2797     also restrict the extent to which the Expires header is hon-
2798     ored.
2799    
2800     The Expires header has two main applications: removing arti-
2801     cles whose utility ends on a specific date (e.g., event
2802     announcements which can be removed once the day of the event
2803     is past) and preserving articles expected to be of prolonged
2804     usefulness (e.g., information aimed at new readers of a
2805     newsgroup). The latter application is sometimes abused.
2806     Since individual hosts have local policies for expiration of
2807     news (depending on available disk space, for instance),
2808     posters SHOULD not provide Expires headers for articles
2809     unless there is a natural expiration date associated with
2810     the topic. Posting agents MUST not provide a default
2811     Expires header. Leave it out and allow local policies to be
2812     used unless there is a good reason not to. Expiry dates are
2813     properly the decision of individual host administrators;
2814     posters and moderators SHOULD set only expiry dates that
2815     most administrators would agree with.
2816    
2817     NOTE: A poster preparing an Expires header for an
2818     article whose utility ends on a specific day
2819     should typically specify the NEXT day as the
2820     expiry date. A meeting on July 7th remains of
2821     interest on the 7th.
2822    
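The "next day" advice in the preceding note might be applied
as in the Python sketch below. Several things here are
assumptions made for the example: the function names, the
choice of midnight UTC as the moment of expiry, and the use of
the standard library's date formatting, whose output may not
match the Date content syntax of this Draft in every detail.

```python
from datetime import date, datetime, timedelta, timezone
from email.utils import format_datetime


def expires_after_event(event_day):
    """Expiry moment: the start (UTC) of the day after `event_day`."""
    next_day = event_day + timedelta(days=1)
    return datetime(next_day.year, next_day.month, next_day.day,
                    tzinfo=timezone.utc)


def expires_header(event_day):
    """Build an Expires header line for an event on `event_day`."""
    moment = expires_after_event(event_day)
    return "Expires: " + format_datetime(moment, usegmt=True)
```

For a meeting on 7 July 1994 this yields an expiry on 8 July,
so the article remains available throughout the 7th.
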
2823    
2824     6.3. Reply-To
2825    
2826     The Reply-To header content specifies a reply address dif-
2827     ferent from the author's address given in the From header:
2828    
2829     Reply-To-content = From-content
2830    
2831     In the absence of Reply-To, the reply address is the address
2832     in the From header.
2833    
2845     Use of a Reply-To header is preferable to including a simi-
2846     lar request in the article body, because reply-preparation
2847     software can take account of Reply-To automatically.
2848    
2849    
2850     6.4. Sender
2851    
2852     The Sender header identifies the poster, in the event that
2853     this differs from the author identified in the From header:
2854    
2855     Sender-content = From-content
2856    
2857     In the absence of Sender, the default poster is the author
2858     (named in the From header).
2859    
2860     NOTE: The intent is that the Sender header have a
2861     fairly high probability of identifying the person
2862     who really posted the article. The ability to
2863     specify a From header naming someone other than
2864     the poster is useful but can be abused.
2865    
2866     If the poster supplies a From header, the posting agent MUST
2867     ensure that a Sender header is present, unless it can verify
2868     that the mailing address in the From header is a valid mail-
2869     ing address for the poster. A poster-supplied Sender header
2870     MAY be used, if its mailing address is verifiably a valid
2871     mailing address for the poster; otherwise the posting agent
2872     MUST supply a Sender header and delete (or rename, e.g. to
2873     X-Unverifiable-Sender) any poster-supplied Sender header.
2874    
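One reading of these requirements is sketched below in Python.
It is an interpretation, not a reference implementation. The
headers are assumed to be a mapping from header name to
content; `poster_address` stands for whatever address the
posting agent itself trusts for the poster; and
`is_valid_for_poster` is a hypothetical callback that performs
the verification, which this Draft does not specify.

```python
def apply_sender_rules(headers, poster_address, is_valid_for_poster):
    """Return a copy of `headers` with the Sender rules applied."""
    out = dict(headers)
    supplied = out.get("Sender")
    if supplied is not None:
        if is_valid_for_poster(supplied):
            return out  # a verified poster-supplied Sender may be used
        # Unverifiable: rename the supplied header and generate our own.
        out["X-Unverifiable-Sender"] = supplied
        out["Sender"] = poster_address
        return out
    if "From" in out and not is_valid_for_poster(out["From"]):
        # From cannot be verified, so a Sender header must be present.
        out["Sender"] = poster_address
    return out
```

The sketch renames rather than deletes an unverifiable Sender
header; deleting it would satisfy the requirement equally well.
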
2875     NOTE: It might be useful to preserve a poster-
2876     supplied Sender header so that the poster can sup-
2877     ply the full-name part of the content. The mail-
2878     ing address, however, must be right. Hence, the
2879     posting agent must generate the Sender header if
2880     it is unable to verify the mailing address of a
2881     poster-supplied one.
2882    
2883     NOTE: NNTP implementors, in particular, are urged
2884     to note this requirement (which would eliminate
2885     the need for ad hoc headers like NNTP-Posting-
2886     Host), although there are admittedly some imple-
2887     mentation difficulties. A user name from an RFC
2888     1413 server and a host name from an inverse map-
2889     ping of the address, perhaps with a "full name"
2890     comment noting the origin of the information,
2891     would be at least a first approximation:
2892    
2893     Sender: fred@zoo.toronto.edu (RFC-1413@reverse-lookup; not verified)
2894    
2895     While this does not completely meet the specs, it
2896     comes a lot closer than not having a Sender header
2897     at all. Even just supplying a placeholder for the
2898     user name:
2899    
2911     Sender: somebody@zoo.toronto.edu (user name unknown)
2912    
2913     would be better than nothing.
2914    
2915    
2916     6.5. References
2917    
2918     The References header content lists message IDs of precur-
2919     sors:
2920    
2921     References-content = message-id *( space message-id )
2922    
2923     A followup MUST have a References header, and an article
2924     which is not a followup MUST not have a References header.
2925     A followup to an article which did not have a References
2926     header MUST have a References header containing the precur-
2927     sor's message ID. A followup to an article which had a Ref-
2928     erences header MUST have a References header containing the
2929     precursor's References content, plus the precursor's message
2930     ID appended to the end of the list.
2933    
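Building the References content of a followup reduces to a
single append, as in this Python sketch. The function and
parameter names are invented for the example, and message IDs
are assumed to be passed as strings including their angle
brackets.

```python
def followup_references(precursor_message_id, precursor_references=None):
    """Return the References content for a followup.

    precursor_references is the precursor's References content,
    or None if the precursor had no References header.
    """
    if precursor_references:
        return precursor_references + " " + precursor_message_id
    return precursor_message_id
```

A followup to an article with no References header thus carries
just the precursor's message ID, and each later followup extends
the list by one.
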
2934     NOTE: Use the See-Also header (section 6.16) for
2935     interconnection of articles which are not in a
2936     followup relationship to each other.
2937    
2938     NOTE: In retrospect, RFCs 850 and 1036, and the
2939     implementations whose practice they represented,
2940     erred here. The proper MAIL header to use for
2941     references to precursors is In-Reply-To, and the
2942     References header is meant to be used for the pur-
2943     poses here ascribed to See-Also. This incompati-
2944     bility is far too solidly established to be fixed,
2945     unfortunately. The best that can be done is to
2946     provide a clear mapping between the two, and urge
2947     gateways to do the transformation. The news usage
2948     is (now) a deliberate violation of the MAIL speci-
2949     fications; articles containing news References
2950     headers are technically not valid MAIL messages,
2951     although it is unlikely that much MAIL software
2952     will notice because the incompatibility is at a
2953     subtle semantic level that does not affect the
2954     syntax.
2955    
2956     UNRESOLVED ISSUE: Would it be better to just give
2957     up and admit that news uses References for both
2958     purposes?
2959    
2960     UNRESOLVED ISSUE: Should the syntax be generalized
2961     to include URLs as alternatives to message IDs?
2962     Perhaps not; too many things know about References
2963     already. And non-articles can't be precursors of
2964     articles, not really.
2965    
2977     Followup agents SHOULD not shorten References headers. If
2978     it is absolutely necessary to shorten the header, as a des-
2979     perate last resort, a followup agent MAY do this by deleting
2980     some of the message IDs. However, it MUST not delete the
2981     first message ID, the last three message IDs (including that
2982     of the immediate precursor), or any message ID mentioned in
2983     the body of the followup. If it is possible for the fol-
2984     lowup agent to determine the Subject content of the articles
2985     identified in the References header, it MUST not delete the
2986     message ID of any article where the Subject content changed
2987     (other than by prepending of a back reference). The fol-
2988     lowup agent MUST not delete any message ID whose local part
2989     ends with "_-_" (underscore (ASCII 95), hyphen (ASCII 45),
2990     underscore); followup agents are urged to use this form to
2991     mark subject changes, and to avoid using it otherwise.
2992    
2993     NOTE: As software capable of exploiting References
2994     chains has grown more common, the random shorten-
2995     ing permitted by RFC 1036 has become increasingly
2996     troublesome. ANY shortening is undesirable, and
2997     software should do it only in cases of dire neces-
2998     sity. In such cases, these rules attempt to limit
2999     the damage.
3000    
3001     NOTE: The first message ID is very important as
3002     the starting point of the "thread" of discussion,
3003     and absolutely should not be deleted. Keeping the
3004     last three message IDs gives thread-following
3005     software a fighting chance to reconstruct a full
3006     thread even if an article or two is missing.
3007     Keeping message IDs mentioned in the body is obvi-
3008     ously desirable.
3009    
3010     NOTE: Subject changes are difficult to determine,
3011     but they are significant as possible beginnings of
3012     new threads. The "_-_" convention is provided so
3013     that posting agents (which have more information
3014     about subjects) can flag articles containing a
3015     subject change in a way that followup agents can
3016     detect without access to the articles themselves.
3017     The sequence is chosen as one that is fairly
3018     unlikely to occur by accident.
3019    
3020     UNRESOLVED ISSUE: Is "_-_" really worth having?
3021    
3022     When a References header is shortened, at least three blanks
3023     SHOULD be left between adjacent message IDs at each point
3024     where deletions were made. Software preparing new Refer-
3025     ences headers SHOULD preserve multiple blanks in older Ref-
3026     erences content.
3027    
3028     NOTE: It's desirable to have some marker of where
3029     deletions occurred, but the restricted syntax of
3030     the header makes this difficult. Extra white
3043     space is not a very good marker, since it may be
3044     deleted by software that ill-advisedly rewrites
3045     headers, but at least it doesn't break existing
3046     software.
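
NOTE: One possible reading of the shortening rules, sketched in
Python. The names must_keep, render and shorten_references, the
max_len parameter, and the rough message-ID pattern are all
assumptions. The sketch drops the oldest deletable IDs first,
and it does not implement the Subject-change rule, which needs
access to the referenced articles. It assumes the header
content has already been unfolded onto one line.

```python
import re
from typing import List, Tuple

# Rough message-ID pattern for illustration only; not the Draft's full grammar.
ENTRY_RE = re.compile(r"(\s*)(<[^<>\s]+>)")


def must_keep(index: int, ids: List[str], body: str) -> bool:
    """True if the rules forbid deleting ids[index]."""
    msgid = ids[index]
    local_part = msgid[1:-1].rpartition("@")[0]
    return (
        index == 0                      # first ID: start of the thread
        or index >= len(ids) - 3        # last three, including the precursor
        or msgid in body                # mentioned in the followup body
        or local_part.endswith("_-_")   # flagged subject change
    )


def render(entries: List[Tuple[str, str]], present: List[bool]) -> str:
    """Rebuild the header content, marking deletion points with blanks."""
    out = ""
    pending_gap = ""   # widest gap seen since the last emitted ID
    deleted = False    # was anything dropped since the last emitted ID?
    for (gap, msgid), here in zip(entries, present):
        if len(gap) > len(pending_gap):
            pending_gap = gap
        if not here:
            deleted = True
            continue
        if out:
            if deleted:
                # At least three blanks where deletions were made.
                out += pending_gap if len(pending_gap) >= 3 else "   "
            else:
                # Preserve multiple blanks found in older content.
                out += pending_gap or " "
        out += msgid
        pending_gap = ""
        deleted = False
    return out


def shorten_references(content: str, body: str, max_len: int) -> str:
    """Delete as few message IDs as possible to fit within max_len."""
    entries = ENTRY_RE.findall(content)   # list of (gap, msgid) tuples
    ids = [msgid for _gap, msgid in entries]
    present = [True] * len(ids)
    for i in range(len(ids)):
        if len(render(entries, present)) <= max_len:
            break
        if not must_keep(i, ids, body):
            present[i] = False
    return render(entries, present)
```

If every remaining ID is protected, the result may still exceed
max_len; the sketch then returns the protected list unchanged
rather than violate the rules.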
3047    
3048     To repeat: followup agents SHOULD not shorten References
3049     headers.
3050    
3051     NOTE: Unfortunately, reading agents and other
3052     software analyzing References patterns have to be
3053     prepared for the worst anyway. The worst includes
3054     random deletions and the possibility of circular
3055     References chains (when References is misused in
3056     place of See-Also, section 6.16).
3057    
3058    
3059     6.6. Control
3060    
3061     The Control header content marks the article as a control
3062     message, and specifies the desired actions (other than the
3063     usual ones of filing and passing on the article):
3064    
3065     Control-content = verb *( space argument )
3066     verb = 1*( letter / digit )
3067     argument = 1*<ASCII printable character>
3068    
3069     The verb indicates what action should be taken, and the
3070     argument(s) (if any) supply details. In some cases, the
3071     body of the article may also contain details. Section 7
3072     describes the standard verbs. See also the Also-Control
3073     header (section 6.15).
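
NOTE: The grammar above can be checked with a few lines of
Python. This is a sketch only: parse_control is a hypothetical
name, "space" is taken to mean runs of blanks and tabs, and
"ASCII printable character" is taken to mean codes 33 to 126.

```python
import re
from typing import List, Tuple

VERB_RE = re.compile(r"[A-Za-z0-9]+")   # verb = 1*( letter / digit )
ARG_RE = re.compile(r"[\x21-\x7e]+")    # argument = 1*<ASCII printable character>


def parse_control(content: str) -> Tuple[str, List[str]]:
    """Split Control content into (verb, arguments), validating both."""
    parts = re.split(r"[ \t]+", content.strip(" \t"))
    verb, args = parts[0], parts[1:]
    if not VERB_RE.fullmatch(verb):
        raise ValueError("invalid control verb: %r" % verb)
    for arg in args:
        if not ARG_RE.fullmatch(arg):
            raise ValueError("invalid control argument: %r" % arg)
    return verb, args
```

Because arguments may contain characters significant to command
interpreters, a caller should never pass them to a shell
unquoted.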
3074    
3075     NOTE: Control messages are often processed and
3076     filed rather differently than normal articles.
3077    
3078     NOTE: The restriction of verbs to letters and dig-
3079     its is new, but is consistent with existing prac-
3080     tice and potentially simplifies implementation by
3081     avoiding characters significant to command inter-
3082     preters. Beware that the arguments are under no
3083     such restriction in general.
3084    
3085     NOTE: Two other conventions for distinguishing
3086     control messages from normal articles were for-
3087     merly in use: a three-component newsgroup name
3088     ending in ".ctl" or a subject beginning with
3089     "cmsg " was considered to imply that the article
3090     was a control message. These conventions are
3091     obsolete. Do not use them.
3092    
3093     An article with a Control header MUST not have an Also-
3094     Control or Supersedes header.
3095    
3096    
3109     6.7. Distribution
3110    
3111     The Distribution header content specifies geographic or
3112     organizational limits on an article's propagation:
3113    
3114     Distribution-content = distribution *( dist-delim distribution )
3115     dist-delim = ","
3116     distribution = plain-component
3117    
3118     A distribution is syntactically identical to a one-component
3119     newsgroup name, and must satisfy the same rules and restric-
3120     tions. In the absence of Distribution, the default distri-
3121     bution is "world".
3122    
3123     NOTE: This syntax has the disadvantage of contain-
3124     ing no white space, making it impossible to con-
3125     tinue a Distribution header across several lines.
3126     Implementors of relayers and reading agents are
3127     warned that it is intended that the successor to
3128     this Draft will change the definition of dist-delim
3129     to:
3130    
3131     dist-delim = "," [ space ]
3132    
3133     and are urged to fix their software to handle
3134     (i.e., ignore) white space following the commas.
3135    
3136     A relayer MUST not pass an article to another relayer unless
3137     configuration information specifies transmission to that
3138     other relayer of BOTH (a) at least one of the article's
3139     newsgroup(s), and (b) at least one of the article's distri-
3140     bution(s). In effect, the only role of distributions is to
3141     limit propagation, by preventing transmission of articles
3142     that would have been transmitted had the decision been based
3143     solely on newsgroups.
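
NOTE: A sketch of the relaying test in Python. The function
names are hypothetical, and a neighbor's configuration is
modelled as two plain sets of exact names, which is an
assumption; real relayers typically use pattern matching. The
parser already ignores white space around commas, as relayer
implementors are urged to do.

```python
from typing import Iterable, List, Optional


def parse_distributions(content: Optional[str]) -> List[str]:
    """Return the article's distributions; the default is "world"."""
    if content is None or not content.strip():
        return ["world"]
    # Tolerate (ignore) blanks and tabs next to the commas.
    return [d.strip(" \t") for d in content.split(",") if d.strip(" \t")]


def may_relay(
    newsgroups: Iterable[str],
    distribution_content: Optional[str],
    peer_newsgroups: Iterable[str],
    peer_distributions: Iterable[str],
) -> bool:
    """True only if BOTH a newsgroup AND a distribution match the peer."""
    group_ok = bool(set(newsgroups) & set(peer_newsgroups))
    dist_ok = bool(set(parse_distributions(distribution_content)) & set(peer_distributions))
    return group_ok and dist_ok
```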
3144    
3145     A posting agent might wish to present a menu of possible
3146     distributions, or suggest a default, but normally SHOULD not
3147     supply a default without giving the poster a chance to over-
3148     ride it. A followup agent SHOULD initially supply the same
3149     Distribution header as found in the precursor, although the
3150     poster MAY alter this if appropriate.
3151    
3152     Despite the syntactic similarity and some historical confu-
3153     sion, distributions are NOT newsgroup names. The whole
3154     point of putting a distribution on an article is that it is
3155     DIFFERENT from the newsgroup(s). In general, a meaningful
3156     distribution corresponds to some sort of region of propaga-
3157     tion: a geographical area, an organization, or a cooperating
3158     subnet.
3159    
3160     NOTE: Distributions have historically suffered
3161     from the completely uncontrolled nature of their
3162     name space, the lack of feedback to posters on
3175     incomplete propagation resulting from use of ran-
3176     dom trash in Distribution headers, and confusion
3177     with newsgroups (arising partly because many
3178     regions and organizations DO have internal news-
3179     groups with names resembling their internal dis-
3180     tributions). This has resulted in much garbage in
3181     Distribution headers, notably the pointless prac-
3182     tice of automatically supplying the first compo-
3183     nent of the newsgroup name as a distribution
3184     (which is MOST unlikely to restrict propagation!).
3185     Many sites have opted to maximize propagation of
3186     such ill-formed articles by essentially ignoring
3187     distributions. This unfortunately interferes with
3188     legitimate uses. The situation is bad enough that
3189     distributions must be considered largely useless
3190     except within cooperating subnets that make an
3191     organized effort to restrain propagation of their
3192     internal distributions.
3193    
3194     NOTE: The distributions "world" and "local" have
3195     no standard magic meaning (except that the former
3196     is the default distribution if none is given).
3197     Some pieces of software do assign such meanings to
3198     them.
3199    
3200    
3201     6.8. Keywords
3202    
3203     The Keywords header content is one or more phrases intended
3204     to describe some aspect of the content of the article:
3205    
3206     Keywords-content = plain-phrase *( "," [ space ] plain-phrase )
3207    
3208     Keywords, separated by commas, each follow the <plain-
3209     phrase> syntax defined in section 5.2. Encoded words in
3210     keywords MUST not contain characters other than letters (of
3211     either case), digits, and the characters "!", "*", "+", "-",
3212     "/", "=", and "_".
3213    
3214     NOTE: Posters and posting agents are asked to take
3215     note that keywords are separated by commas, not by
3216     white space. The following Keywords header con-
3217     tains only one keyword (a rather improbable
3218     one):
3219    
3220     Keywords: Thompson Ritchie Multics Linux
3221    
3222     and should probably have been written:
3223    
3224     Keywords: Thompson, Ritchie, Multics, Linux
3225    
3226     This particular error is unfortunately rather
3227     widespread.
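
NOTE: The difference is easy to see in a small Python sketch.
The function name parse_keywords is hypothetical, and the
sketch assumes that a plain phrase never itself contains a
comma.

```python
from typing import List


def parse_keywords(content: str) -> List[str]:
    """Split Keywords content on commas, not on white space."""
    return [k.strip(" \t") for k in content.split(",") if k.strip(" \t")]
```

With this function, "Thompson Ritchie Multics Linux" yields a
single four-word keyword, while "Thompson, Ritchie, Multics,
Linux" yields four keywords.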
3228    
3241     NOTE: Reading agents and archivers preparing
3242     indexes of articles should bear in mind that user-
3243     chosen keywords are notoriously poor for indexing
3244     purposes unless the keywords are picked from a
3245     predefined set (which they are not in this case).
3246     Also, some followup agents unwisely propagate the
3247     Keywords header from the precursor into the fol-
3248     lowup by default. At least one news-based experi-
3249     ment has found the contents of Keywords headers to
3250     be completely valueless for indexing.
3251    
3252    
3253     6.9. Summary
3254    
3255     The Summary header content is a short phrase summarizing the
3256     article's content:
3257    
3258     Summary-content = nonblank-text
3259    
3260     As with the subject, no restriction is placed on the content
3261     since it is intended solely for display to humans.
3262    
3263     NOTE: Reading agents should be aware that the Sum-
3264     mary header is often used as a sort of secondary
3265     Subject header, and (if present) its contents
3266     should perhaps be displayed when the subject is
3267     displayed.
3268    
3269     The summary SHOULD be terse. Posters SHOULD avoid trying to
3270     cram their entire article into the headers; even the sim-
3271     plest query usually benefits from a sentence or two of elab-
3272     oration and context, and not all reading agents display all
3273     headers.
3274    
3275    
3276     6.10. Approved
3277    
3278     The Approved header content indicates the mailing addresses
3279     (and possibly the full names) of the persons or entities
3280     approving the article for posting:
3281    
3282     Approved-content = From-content *( "," [ space ] From-content )
3283    
3284     An Approved header is required in all postings to moderated
3285     newsgroups; the presence or absence of this header allows a
3286     posting agent to distinguish between articles posted by the
3287     moderator (which are normal articles to be posted normally)
3288     and attempted contributions by others (which should be
3289     mailed to the moderator for approval). An Approved header
3290     is also required in certain control messages, to reduce the
3291     probability of accidental posting of same; see the relevant
3292     parts of section 7.
3293    
3307     NOTE: There is, at present, no way to authenticate
3308     Approved headers to ensure that the claimed
3309     approval really was bestowed. Nor is there an
3310     established mechanism for even maintaining a list
3311     of legitimate approvers (such a list would quickly
3312     become out of date if it had to be maintained by
3313     hand). Such mechanisms, presumably relying on
3314     cryptographic authentication, would be a worth-
3315     while extension to this Draft, and experimental
3316     work in this area is encouraged. (The problem is
3317     harder than it sounds because news is used on many
3318     systems which do not have real-time access to key
3319     servers.)
3320    
3321     NOTE: Relayer implementors, please note well: it
3322     is the POSTING AGENT that is authorized to distin-
3323     guish between moderator postings and attempted
3324     contributions, and to mail the latter to the mod-
3325     erator. As discussed in section 9.1, relayers
3326     MUST not, repeat MUST not, send such mail; on
3327     receipt of an unApproved article in a moderated
3328     newsgroup, they should discard the article, NOT
3329     transform it into a mail message (except perhaps
3330     to a local administrator).
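
NOTE: The division of labor between posting agents and
relayers can be sketched in Python as follows. The function
names and the returned action strings are hypothetical, headers
are modelled as a simple dictionary keyed by header name, and
the "moderated" flag stands for "at least one target newsgroup
is moderated"; all of these are simplifying assumptions.

```python
from typing import Dict


def posting_agent_action(headers: Dict[str, str], moderated: bool) -> str:
    """Only the posting agent may divert a contribution to the moderator."""
    if not moderated or "Approved" in headers:
        return "post"
    return "mail-to-moderator"


def relayer_action(headers: Dict[str, str], moderated: bool) -> str:
    """A relayer never sends mail to the moderator; it discards instead."""
    if moderated and "Approved" not in headers:
        return "discard"
    return "accept"
```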
3331    
3332     NOTE: RFC 1036 restricted Approved to a single
3333     From-content. However, multiple moderation is no
3334     longer rare, and multi-moderator Approved headers
3335     are already in use.
3336    
3337    
3338     6.11. Lines
3339    
3340     The Lines header content indicates the number of lines in
3341     the body of the article:
3342    
3343     Lines-content = 1*digit
3344    
3345     The line count includes all body lines, including the signa-
3346     ture if any, including empty lines (if any) at beginning or
3347     end of the body. (The single empty separator line between
3348     the headers and the body is not part of the body.) The
3349     "body" here is the body as found in the posted article,
3350     AFTER all transformations such as MIME encodings.
3351    
3352     Reading agents SHOULD not rely on the presence of this
3353     header, since it is optional (and some posting agents do not
3354     supply it). They MUST not rely on it being precise, since
3355     it frequently is not.
3356    
3357     NOTE: The average line length in article bodies is
3358     surprisingly consistent at about 40 characters,
3359     and since the line count typically is used only
3360     for approximate judgements ("is this too long to
3373     read quickly?"), dividing the byte count of the
3374     body by 40 gives an estimate of the body line
3375     count that is adequate for normal use. This esti-
3376     mate is NOT adequate if the body has been MIME
3377     encoded... but neither is the Lines header, since
3378     at least one major relayer will supply a Lines
3379     header for an article that lacks one, and will not
3380     consider the possibility of MIME encodings when
3381     computing the line count.
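
NOTE: Both the exact count and the rough estimate are simple to
compute. The Python sketch below uses hypothetical function
names and assumes the article is held as one string whose lines
end in a newline character, with the first empty line
separating headers from body.

```python
def count_body_lines(article: str) -> int:
    """Count the lines of the body, including empty ones at either end."""
    # The single empty separator line is not part of the body.
    _headers, sep, body = article.partition("\n\n")
    if not sep or not body:
        return 0
    lines = body.count("\n")
    if not body.endswith("\n"):
        lines += 1   # final line without a terminating newline
    return lines


def estimate_body_lines(body_bytes: int) -> int:
    """Rough estimate from the byte count, at about 40 characters per line."""
    return body_bytes // 40
```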
3382    
3383     NOTE: It would be better to have a Content-Size
3384     header as part of MIME, so that body parts could
3385     have their own sizes, and so that the units used
3386     could be appropriate to the data type (line count
3387     is not a useful measure of the size of an encoded
3388     image, for example). Doing this is preferable to
3389     trying to fix Lines.
3390    
3391     UNRESOLVED ISSUE: Update on Content-Size?
3392    
3393     Relayers SHOULD discard this header if they find it neces-
3394     sary to re-encode the article in such a way that the origi-
3395     nal Lines header would be rendered incorrect.
3396    
3397    
3398     6.12. Xref
3399    
3400     The Xref header content indicates where an article was filed
3401     by the last relayer to process it:
3402    
3403     Xref-content = relayer 1*( space location )
3404     relayer = relayer-name
3405     location = newsgroup-name ":" article-locator
3406     article-locator = 1*<ASCII printable character>
3407    
3408     The relayer's name is included so that software can deter-
3409     mine which relayer generated the header (and specifically,
3410     whether it really was the one that filed the copy being
3411     examined). The locations specify what newsgroups the arti-
3412     cle was filed under (which may differ from those in the
3413     Newsgroups header) and where it was filed under them. The
3414     exact form of an article locator is implementation-specific.
3415    
3416     NOTE: Reading agents can exploit this information
3417     to avoid presenting the same article to a reader
3418     several times. The information is sometimes
3419     available in system databases, but having it in
3420     the article is convenient. Relayers traditionally
3421     generate an Xref header only if the article is
3422     cross-posted, but this is not mandatory, and there
3423     is at least one new application ("mirroring":
3424     keeping news databases on two hosts identical)
3425     where the header is useful in all articles.
3426    
3439     NOTE: The traditional form of an article locator
3440     is a decimal number, with articles in each news-
3441     group numbered consecutively starting from 1.
3442     NNTP [rrr] demands that such a model be provided,
3443     and there may be other software which expects it,
3444     but it seems desirable to permit flexibility for
3445     unorthodox implementations.
3446    
3447     A relayer inserting an Xref header into an article MUST
3448     delete any previous Xref header. A relayer which is not
3449     inserting its own Xref header SHOULD delete any previous
3450     Xref header. A relayer MAY delete the Xref header when
3451     passing an article on to another relayer.
3452    
3453     NOTE: RFC 1036 specified that the Xref header was
3454     not transmitted when an article was passed to
3455     another relayer, but the major news implementa-
3456     tions have never obeyed this rule, and applica-
3457     tions like mirroring depend on this disobedience.
3458    
3459     A relayer MUST use the same name in Xref headers as it uses
3460     in Path headers. Reading agents MUST ignore an Xref header
3461     containing a relayer name that differs from the one that
3462     begins the path list.
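
NOTE: A reading agent's check might look like the Python sketch
below. The function names are hypothetical. The sketch assumes
that path list entries are delimited by "!" and that relayer
names are compared exactly; both are assumptions about details
defined elsewhere in this Draft.

```python
from typing import Dict, Optional, Tuple


def parse_xref(content: str) -> Tuple[str, Dict[str, str]]:
    """Split Xref content into (relayer name, {newsgroup: article locator})."""
    fields = content.split()
    if len(fields) < 2:
        raise ValueError("Xref needs a relayer name and at least one location")
    locations = {}
    for field in fields[1:]:
        # Split at the first colon; the locator itself is implementation-specific.
        group, sep, locator = field.partition(":")
        if not (group and sep and locator):
            raise ValueError("malformed Xref location: %r" % field)
        locations[group] = locator
    return fields[0], locations


def usable_xref(xref_content: str, path_content: str) -> Optional[Dict[str, str]]:
    """Return the locations, or None if another relayer wrote the header."""
    relayer, locations = parse_xref(xref_content)
    first_in_path = path_content.strip().split("!", 1)[0]
    return locations if relayer == first_in_path else None
```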
3463    
3464    
3465     6.13. Organization
3466    
3467     The Organization header content is a short phrase identify-
3468     ing the poster's organization:
3469    
3470     Organization-content = nonblank-text
3471    
3472     This header is typically supplied by the posting agent. The
3473     Organization content SHOULD mention geographical location
3474     (e.g. city and country) when it is not obvious from the
3475     organization's name.
3476    
3477     NOTE: The motive here is that the organization is
3478     often difficult to guess from the mailing address,
3479     is not always supplied in a signature, and can
3480     help identify the poster to the reader.
3481    
3482     NOTE: There is no "s" in "Organization".
3483    
3484     The Organization content is provided for identification
3485     only, and does not imply that the poster speaks for the
3486     organization or that the article represents organization
3487     policy. Posting agents SHOULD permit the poster to override
3488     a local default Organization header.
3489    
3490    
3505     6.14. Supersedes
3506    
3507     The Supersedes header content specifies articles to be can-
3508     celled on arrival of this one:
3509    
3510     Supersedes-content = message-id *( space message-id )
3511    
3512     Supersedes is equivalent to Also-Control (section 6.15) with
3513     an implicit verb of "cancel" (section 7.1).
3514    
3515     NOTE: Supersedes is normally used where the arti-
3516     cle is an updated version of the one(s) being can-
3517     celled.
3518    
3519     NOTE: Although the ability to use multiple message
3520     IDs in Supersedes is highly desirable (see section
3521     7.1), posters are warned that existing implementa-
3522     tions often do not correctly handle more than one.
3523    
3524     NOTE: There is no "c" in "Supersedes".
3525    
3526     An article with a Supersedes header MUST not have an Also-
3527     Control or Control header.
3528    
3529    
3530     6.15. Also-Control
3531    
3532     The Also-Control header content marks the article as being a
3533     control message IN ADDITION to being a normal news article,
3534     and specifies the desired actions:
3535    
3536     Also-Control-content = Control-content
3537    
3538     An article with an Also-Control header is filed and passed
3539     on normally, but the content of the Also-Control header is
3540     processed as if it were found in a Control header.
3541    
3542     NOTE: It is sometimes desirable to piggyback con-
3543     trol actions on a normal article, so that the
3544     article will be filed normally but will also be
3545     acted on as a control message. This header is
3546     essentially a generalization of Supersedes.
3547    
3548     NOTE: Be warned that some old relayers do not
3549     implement Also-Control.
3550    
3551     An article with an Also-Control header MUST not have a Con-
3552     trol or Supersedes header.
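
NOTE: The relationship among Control, Also-Control, and
Supersedes can be summarized in a short Python sketch. The
function name and the dictionary model of headers are
assumptions. The second element of the result says whether the
article is also to be treated as a normal article.

```python
from typing import Dict, Optional, Tuple

CONTROL_HEADERS = ("Control", "Also-Control", "Supersedes")


def control_action(headers: Dict[str, str]) -> Optional[Tuple[str, bool]]:
    """Return (control content, also a normal article?) or None."""
    present = [name for name in CONTROL_HEADERS if name in headers]
    if len(present) > 1:
        # The three headers are mutually exclusive.
        raise ValueError("conflicting headers: " + ", ".join(present))
    if not present:
        return None
    name = present[0]
    if name == "Control":
        return headers[name], False
    if name == "Also-Control":
        return headers[name], True
    # Supersedes is Also-Control with an implicit verb of "cancel".
    return "cancel " + headers[name], True
```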
3553    
3554    
3555     6.16. See-Also
3556    
3557     The See-Also header content lists message IDs of articles
3558     that are related to this one but are not its precursors:
3559    
3560    
3571     See-Also-content = message-id *( space message-id )
3572    
3573     See-Also resembles References, but without the restrictions
3574     imposed on References by the followup rules.
3575    
3576     NOTE: See-Also provides a way to group related
3577     articles, such as the parts of a single document
3578     that had to be split across multiple articles due
3579     to its size, or to cross-reference between paral-
3580     lel threads.
3581    
3582     NOTE: See the discussion (in section 6.5) on MAIL
3583     compatibility issues of References and See-Also.
3584    
3585     NOTE: In the specific case where it is desired to
3586     essentially make another article PART of the cur-
3587     rent one, e.g. for annotation of the other arti-
3588     cle, MIME's "message/external-body" convention can
3589     be used to do so without actual inclusion. "news-
3590     message-ID" was registered as a standard external-
3591     body access method, with a mandatory NAME parame-
3592     ter giving the message ID and an optional SITE
3593     parameter suggesting an NNTP site that might have
3594     the article available (if it is not available
3595     locally), by IANA 22 June 1993.
3596    
3597     UNRESOLVED ISSUE: Could the syntax be generalized
3598     to include URLs as alternatives to message IDs?
3599     Here it makes much more sense than in References.
3600    
3601    
3602     6.17. Article-Names
3603    
3604     The Article-Names header content indicates any special sig-
3605     nificance the article may have in particular newsgroups:
3606    
3607     Article-Names-content = 1*( name-clause space )
3608     name-clause = newsgroup-name ":" article-name
3609     article-name = letter 1*( letter / digit / "-" )
3610    
3611     Each name clause specifies a newsgroup (which SHOULD be
3612     among those in the Newsgroups header) and an article name
3613     local to that newsgroup. Article names MAY be used by
3614     relayers to file the article in special ways, or they MAY
3615     just be noted for possible special attention by reading
3616     agents. Article names are case-sensitive.
3617    
3618     NOTE: This header provides a way to mark special
3619     postings, such as introductions, frequently-asked-
3620     question lists, etc., so that reading agents have
3621     a way of finding them automatically. The news-
3622     group name is specified for each article name
3623     because the names may be newsgroup-specific; for
3624     example, many frequently-asked-question lists are
3637     posted to "news.answers" in addition to their
3638     "home" newsgroup, and they would not be known by
3639     the same name(s) in both newsgroups.
3640    
3641     The Article-Names header SHOULD be ignored unless the arti-
3642     cle also contains an Approved header.
3643    
3644     NOTE: This stipulation is made in anticipation of
3645     the possibility that Approved headers will be
3646     involved in cryptographic authentication.
3647    
3648     The presence of an Article-Names header does not necessarily
3649     imply that the article will be retained unusually long
3650     before expiration, or that previous article(s) with similar
3651     Article-Names headers will be cancelled by its arrival.
3652     Posters preparing special postings SHOULD include appropri-
3653     ate other headers, such as Expires and Supersedes, to
3654     request such actions.
3655    
3656     Different networks MAY establish different sets of article
3657     names for the special postings they deem significant; it is
3658     preferable for usage to be standardized within networks,
3659     although it might be desirable for individual newsgroups to
3660     have different naming conventions in some situations. Arti-
3661     cle names MUST be 14 characters or fewer. The following
3662     names are suggested but are not mandatory:
3663    
3664     intro Introduction to the newsgroup for newcomers.
3665    
3666     charter Charter, rules, organization, moderation poli-
3667     cies, etc.
3668    
3669     background Biographies of special participants, history of
3670     the newsgroup, notes on related newsgroups, etc.
3671    
3672     subgroups Descriptions of sub-newsgroups under this news-
3673     group, e.g. "sci.space.news" under "sci.space".
3674    
3675     facts Information relating to the purpose of the news-
3676     group, e.g. an acronym glossary in "sci.space".
3677    
3678     references Where to get more information: books, journals,
3679     FTP repositories, etc.
3680    
3681     faq Answers to frequently-asked questions.
3682    
3683     menu If present, a list of all the other article
3684     names local to this newsgroup, with brief
3685     descriptions of their contents.
3686    
3687     Such articles may be divided into subsections using the MIME
3688     "multipart/mixed" conventions. If size considerations make
3689     it necessary to split such articles, names ending in a
3690     hyphen and a part number are suggested; for example, a
3703     three-part frequently-asked-questions list could have arti-
3704     cle names "faq-1", "faq-2", and "faq-3".
3705    
3706     NOTE: It is somewhat premature to attempt to stan-
3707     dardize article names, since this is essentially a
3708     new feature with no experience behind it. How-
3709     ever, if reading agents are to attach special sig-
3710     nificance to these names, some attempt at standard
3711     conventions is imperative. This is a first
3712     attempt at providing some.
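
NOTE: A reading agent might extract article names roughly as in
the Python sketch below. The function name and the dictionary
model of headers are assumptions, as is the choice to reject a
malformed header outright rather than skip the bad clause. The
pattern follows the article-name syntax given earlier in this
section, which requires a letter followed by at least one more
character.

```python
import re
from typing import Dict, List, Tuple

ARTICLE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]+")
MAX_NAME_LEN = 14


def article_names(headers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (newsgroup, article name) pairs, or [] if the header is ignored."""
    # The header is ignored unless the article also has an Approved header.
    if "Approved" not in headers or "Article-Names" not in headers:
        return []
    pairs = []
    for clause in headers["Article-Names"].split():
        group, sep, name = clause.partition(":")
        valid = bool(group and sep and ARTICLE_NAME_RE.fullmatch(name))
        if not valid or len(name) > MAX_NAME_LEN:
            raise ValueError("malformed name clause: %r" % clause)
        pairs.append((group, name))   # names are case-sensitive; keep as given
    return pairs
```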
3713    
3714    
3715     6.18. Article-Updates
3716    
3717     The Article-Updates header content indicates what previous
3718     articles this one is deemed (by the poster) to update (i.e.,
3719     replace):
3720    
3721     Article-Updates-content = message-id *( space message-id )
3722    
3723     Each message ID identifies a previous article that this one
3724     is deemed to update. This MUST not cause the previous arti-
3725     cle(s) to be cancelled or otherwise altered, unless this is
3726     implied by other headers (e.g. Supersedes); Article-Updates
3727     is merely an advisory which MAY be noted for special atten-
3728     tion by reading agents.
3729    
3730     NOTE: This header provides a way to mark articles
3731     which are only minor updates of previous ones,
3732     containing no significant new information and not
3733     worth reading if the previous ones have been read.
3734    
3735     NOTE: If suitable conventions using MIME multipart
3736     bodies and the "message/external-body" body-part
3737     type can be developed, a replacing article might
3738     contain only differences between the old text and
3739     the new text, rather than a complete new copy.
3740     This is the motivation for not making Article-
3741     Updates also function as Supersedes does: the
3742     replacing article might depend on the continued
3743     presence of the replaced article.
3744    
3745    
3746     7. Control Messages
3747    
3748     The following sections document the currently-defined con-
3749     trol messages. "Message" is used herein as a synonym for
3750     "article" unless context indicates otherwise.
3751    
3752     Posting agents are warned that since certain control mes-
3753     sages require article bodies in quite specific formats, sig-
3754     natures SHOULD not be appended to such articles, and it may
3755     be wise to take greater care than usual to avoid unintended
3756     (although perhaps well-meaning) alterations to text supplied
3769     by the poster. Relayers MUST assume that control messages
3770     mean what they say; they MAY be obeyed as is or rejected,
3771     but MUST NOT be reinterpreted.
3772    
3773     The execution of the actions requested by control messages
3774     is subject to local administrative restrictions, which MAY
3775     deny requests or refer them to an administrator for
3776     approval. The descriptions below are generally phrased in
3777     terms suggesting mandatory actions, but any or all of these
3778     MAY be subject to local administrative approval (either as a
3779     class or case-by-case). Analogously, where the description
3780     below specifies that a message or portion thereof is to be
3781     ignored, this action MAY include reporting it to an adminis-
3782     trator.
3783    
3784     NOTE: The exact choice of local action might
3785     depend on what action the control message
3786     requests, who it claims to come from, etc.
3787    
3788     Relayers MUST propagate even control messages they do not
3789     understand.
3790    
3791     In the following sections, each type of control message is
3792     defined syntactically by defining its arguments and its
3793     body. For example, "cancel" is defined by defining cancel-
3794     arguments and cancel-body.
3795    
3796    
3797     7.1. cancel
3798    
3799     The cancel message requests that one or more previous arti-
3800     cles be "cancelled":
3801    
3802     cancel-arguments = message-id *( space message-id )
3803     cancel-body = body
3804    
3805     The argument(s) identify the articles to be cancelled, by
3806     message ID. The body is a comment, which software MUST
3807     ignore, and SHOULD contain an indication of why the cancel-
3808     lation was requested. The cancel message SHOULD be posted
3809     to the same newsgroup(s), with the same distribution(s), as
3810     the article(s) it is attempting to cancel.
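
As an illustration only, the cancel-arguments syntax above might be
checked along the following lines. The function name
parse_cancel_arguments and the deliberately loose message-ID pattern
are assumptions of this sketch; the message-id syntax defined
elsewhere in this Draft is stricter.

```python
import re

# Loose approximation of a message ID: "<" text "@" text ">".
# This is an assumption for illustration, not the Draft's full syntax.
LOOSE_MESSAGE_ID = re.compile(r"<[^\s<>@]+@[^\s<>@]+>")


def parse_cancel_arguments(arguments):
    """Return the list of message IDs named by a cancel message."""
    ids = arguments.split()
    if not ids:
        raise ValueError("cancel requires at least one message ID")
    for message_id in ids:
        if not LOOSE_MESSAGE_ID.fullmatch(message_id):
            raise ValueError("not a message ID: %r" % message_id)
    return ids
```

A posting agent concerned about old software that mishandles multiple
cancellation could use the length of the returned list to warn the
poster.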
3811    
3812     NOTE: Using the same newsgroups and distributions
3813     maximizes the chances of the cancel message propa-
3814     gating everywhere the target articles went.
3815    
3816     NOTE: RFC 1036 permitted only a single message-id
3817     in a cancel message. Support for cancelling mul-
3818     tiple articles is highly desirable, especially for
3819     use with Supersedes (see section 6.14). If sev-
3820     eral revisions of an article appear in fast suc-
3821     cession, each using Supersedes to cancel the pre-
3822     vious one, it is possible for a middle revision to
3835     be destroyed by cancellation before it is propa-
3836     gated onward to cancel its predecessor. Allowing
3837     each article to cancel several predecessors
3838     greatly alleviates this problem. (Posting agents
3839     preparing a cancel of an article which itself can-
3840     cels other articles might wish to add those arti-
3841     cles to the cancel-arguments.) However, posters
3842     should be aware that much old software does not
3843     implement multiple cancellation properly, and
3844     should avoid using it when reliable cancellation
3845     is vitally important.
3846    
3847     When an article (the "target article") is to be cancelled,
3848     there are four cases of interest: the article hasn't arrived
3849     yet, it has arrived and been filed and is available for
3850     reading, it has expired and been archived on some less-
3851     accessible storage medium, or it has expired and been
3852     deleted. The next few paragraphs discuss each case in turn
3853     (in reverse order, which is convenient for the explanation).
3854    
3855     EXPIRED AND DELETED. Take no action.
3856    
3857     EXPIRED AND ARCHIVED. If the article is readily accessible
3858     and can be deleted or made unreadable easily, treat as under
3859     AVAILABLE below. Otherwise treat as under EXPIRED AND
3860     DELETED.
3861    
3862     NOTE: While it is desirable for archived articles
3863     to be cancellable, this can easily involve rewrit-
3864     ing an entire archive volume just to get rid of
3865     one article, perhaps with manual actions required
3866     to arrange it. It is difficult to envision a sit-
3867     uation so dire as to require such measures from
3868     hundreds or thousands of administrators, or for
3869     that matter one in which widespread compliance
3870     with such a request is likely.
3871    
3872     AVAILABLE. Compare the mailing addresses from the From
3873     lines of the cancel message and the target article, bearing
3874     in mind that local parts (except for "postmaster") are case-
3875     sensitive and domains are case-insensitive. If they do not
3876     match, either refer the issue to an administrator for a
3877     case-by-case decision, or treat as if they matched.
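
The comparison rule above can be sketched as follows. This is a
minimal illustration, assuming the bare mailing addresses have
already been extracted from the two From headers; the function names
are hypothetical and the addresses in the usage note are examples
only.

```python
def split_address(address):
    """Split 'local@domain' at the last '@'; the domain may be empty."""
    if "@" not in address:
        return address, ""
    local, _, domain = address.rpartition("@")
    return local, domain


def addresses_match(cancel_from, target_from):
    """Compare two mailing addresses as described for the AVAILABLE case."""
    c_local, c_domain = split_address(cancel_from)
    t_local, t_domain = split_address(target_from)
    # Domains are case-insensitive.
    if c_domain.lower() != t_domain.lower():
        return False
    # "postmaster" is the one local part compared without regard to case.
    if c_local.lower() == "postmaster" and t_local.lower() == "postmaster":
        return True
    # All other local parts are case-sensitive.
    return c_local == t_local
```

For example, addresses_match("alice@News.Example.ORG",
"alice@news.example.org") is true, while "Alice" and "alice" at the
same domain do not match. A false result calls for referral to an
administrator or treatment as a match, never for silently dropping
the cancel.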
3878    
3879     NOTE: It is generally trivial to forge articles,
3880     so nothing short of cryptographic authentication
3881     is really adequate to ensure that a cancel came
3882     from the original article's author. Moreover, it
3883     is highly desirable to permit authorities other
3884     than the author to cancel articles, to allow for
3885     cases in which the author is unavailable, uncoop-
3886     erative, or malicious, and in which damage and/or
3887     legal problems may be minimized by prompt cancel-
3888     lation. Reliable authentication that would permit
3901     such administrative cancels would be a worthwhile
3902     extension to this Draft, and experimental work in
3903     this area is encouraged.
3904    
3905     NOTE: Meanwhile, a simple check of addresses is
3906     useful accident prevention and catches at least
3907     the most simple-minded forgers. Since the intent
3908     is accident prevention rather than ironclad secu-
3909     rity, use of the From address is appropriate, all
3910     the more so because in the presence of gateways
3911     (especially redundant multiple gateways), the
3912     author may not have full control over Sender head-
3913     ers.
3914    
3915     NOTE: The "refer... or treat as if they matched"
3916     rule is intended to specifically forbid quietly
3917     ignoring cancels with mismatched addresses.
3918    
3919     If the addresses match, then if technically possible, the
3920     relayer MUST delete the target article completely and imme-
3921     diately. Failing that, it MUST make the target article
3922     unreadable (preferably to everyone, minimally to everyone
3923     but the administrator) and either arrange for it to be
3924     deleted as soon as possible or notify an administrator at
3925     once.
3926    
3927     NOTE: To allow for events such as criminal
3928     actions, malicious forgeries, and copyright
3929     infringements, where damage and/or legal problems
3930     may be minimized by prompt cancellation, complete
3931     removal is strongly preferred over merely making
3932     the target article unreadable. The potential for
3933     malice is outweighed by the importance of really
3934     getting rid of the target article in some legiti-
3935     mate cases. (In cases of inadvertent copyright
3936     violation in particular, the ability to quickly
3937     remedy the violation is of considerable legal
3938     importance.) Failing that, making it unreadable
3939     is better than nothing.
3940    
3941     NOTE: Merely annotating the article so that read-
3942     ers see an indication that the author wanted it
3943     cancelled is not acceptable. Making the article
3944     unreadable is the minimum action.
3945    
3946     NOTE: There have been experiments with making can-
3947     celled articles unreadable, so that local news
3948     administrators could reverse cancellations. In
3949     practice, administrators almost never find cause
3950     to do so. Removal appears to be clearly prefer-
3951     able where technically feasible.
3952    
3953     NOT ARRIVED YET. If practical, retain the cancel message
3954     until the target article does arrive, or until there is no
3967     further possibility of it arriving and being accepted (see
3968     section 9.2), and then treat as under AVAILABLE. Failing
3969     that, arrange for the target article to be rejected and dis-
3970     carded if it does arrive.
3971    
3972     NOTE: It may well be impractical to retain the
3973     control message, given uncertainty about whether
3974     the target article will ever arrive. Existing
3975     practice in such cases is to assume that addresses
3976     would match and arrange the equivalent of dele-
3977     tion. This is often done by making a spurious
3978     entry in a database of already-seen message IDs
3979     (see section 9.3), so that if the article does
3980     arrive, it will be rejected as a duplicate.
3981    
3982     The cancel message MUST be propagated onward in the usual
3983     fashion, regardless of which of the four cases applied, so
3984     that the target article will be cancelled everywhere even if
3985     cancellation and target article follow different routes.
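
The four cases and the unconditional propagation rule might be
summarized in code roughly as below. The names TargetState and
plan_cancel and the action strings are inventions of this sketch. On
an address mismatch it refers the matter to an administrator;
treating the addresses as matching is the other permitted choice.

```python
from enum import Enum


class TargetState(Enum):
    NOT_ARRIVED = "not arrived yet"
    AVAILABLE = "available"
    ARCHIVED = "expired and archived"
    DELETED = "expired and deleted"


def plan_cancel(state, from_matches,
                archive_easily_deletable=False, can_retain_cancel=False):
    """Return the ordered list of actions a relayer takes for one target."""
    actions = []
    if state is TargetState.ARCHIVED:
        # Readily deletable archives are treated as AVAILABLE,
        # anything else as EXPIRED AND DELETED.
        state = (TargetState.AVAILABLE if archive_easily_deletable
                 else TargetState.DELETED)
    if state is TargetState.AVAILABLE:
        if from_matches:
            actions.append("delete-or-make-unreadable")
        else:
            # Mismatches are never quietly ignored.
            actions.append("refer-to-administrator")
    elif state is TargetState.NOT_ARRIVED:
        actions.append("retain-cancel" if can_retain_cancel
                       else "reject-on-arrival")
    # EXPIRED AND DELETED needs no local action.
    # Propagation happens in every case.
    actions.append("propagate")
    return actions
```

Keeping "propagate" outside every branch mirrors the requirement that
onward propagation not depend on which case applied locally.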
3986    
3987     NOTE: RFC 1036 appeared to require stopping cancel
3988     propagation in the NOT ARRIVED YET case, although
3989     the wording was somewhat unclear. This appears to
3990     have been an unwise decision; there are known
3991     cases of important cancellations (in situations
3992     of, e.g., inadvertent copyright violation) achiev-
3993     ing rather poorer propagation than the target
3994     article. News propagation is often a much less
3995     orderly process than the authors of RFC 1036
3996     apparently envisioned. Modern implementations
3997     generally propagate the cancellation regardless.
3998    
3999     Posting agents meant for use by ordinary posters SHOULD
4000     reject an attempt to post a cancel message if the target
4001     article is available and the mailing address in its From
4002     header does not match the one in the cancel message's From
4003     header.
4004    
4005     NOTE: This, again, is primarily accident preven-
4006     tion.
4007    
4008    
4009     7.2. ihave, sendme
4010    
4011     The ihave and sendme control messages implement a crude
4012     batched predecessor of the NNTP [rrr] protocol. They are
4013     largely obsolete in the Internet, but still see use in the
4014     UUCP environment, especially for backup feeds that normally
4015     are active only when a primary feed path has failed.
4016    
4017     NOTE: The ihave and sendme messages defined here
4018     have ABSOLUTELY NOTHING TO DO WITH NNTP, despite
4019     similarities of terminology.
4020    
4033     The two messages share the same syntax:
4034    
4035     ihave-arguments = *( message-id space ) relayer-name
4036     sendme-arguments = ihave-arguments
4037     ihave-body = *( message-id eol )
4038     sendme-body = ihave-body
4039    
4040     Message IDs MUST appear in either the arguments or the body,
4041     but not both. Relayers SHOULD generate the form putting
4042     message IDs in the body, but the other form MUST be sup-
4043     ported for backward compatibility.
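
A receiving relayer might accept both forms along these lines. This
is a sketch under stated assumptions: parse_ihave is a hypothetical
name, message IDs are not validated, and a message with no message
IDs at all is rejected, which is one reading of the rule above.

```python
def parse_ihave(arguments, body):
    """Return (relayer_name, message_ids) for an ihave or sendme message."""
    tokens = arguments.split()
    if not tokens:
        raise ValueError("relayer name is required")
    # The relayer name is always the last argument.
    *argument_ids, relayer_name = tokens
    body_ids = [line.strip() for line in body.splitlines() if line.strip()]
    if argument_ids and body_ids:
        raise ValueError("message IDs in both arguments and body")
    if not argument_ids and not body_ids:
        raise ValueError("no message IDs supplied")
    return relayer_name, argument_ids or body_ids
```

The same function serves for sendme, since the two messages share
their syntax.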
4044    
4045     NOTE: RFC 1036 made the relayer name optional, but
4046     difficulties could easily ensue in determining the
4047     origin of the message, and this option is believed
4048     to be unused nowadays. Putting the message IDs in
4049     the body is strongly preferred over putting them
4050     in the arguments because it lends itself much bet-
4051     ter to large numbers of message IDs and avoids the
4052     empty-body problem mentioned in section 4.3.1.
4053    
4054     The ihave message states that the named relayer has filed
4055     articles with the specified message IDs, which may be of
4056     interest to the relayer(s) receiving the ihave message. The
4057     sendme message requests that the relayer receiving it send
4058     the articles having the specified message IDs to the named
4059     relayer.
4060    
4061     These control messages are normally sent essentially as
4062     point-to-point messages, by using "to." newsgroups (see sec-
4063     tion 5.5) that are sent only to the relayer the messages are
4064     intended for. The two relayers MUST be neighbors, exchang-
4065     ing news directly with each other. Each relayer advertises
4066     its new arrivals to the other using ihave messages, and each
4067     uses sendme messages to request the articles it lacks.
4068    
4069     NOTE: Arguably these point-to-point control mes-
4070     sages should flow by some other protocol, e.g.
4071     mail, but administrative and interfacing issues
4072     are simplified if the news system doesn't need to
4073     talk to the mail system.
4074    
4075     To reduce overhead, ihave and sendme messages SHOULD be sent
4076     relatively infrequently and SHOULD contain substantial num-
4077     bers of message IDs. If ihave and sendme are being used to
4078     implement a backup feed, it may be desirable to insert a
4079     delay between reception of an ihave and generation of a
4080     sendme, so that a slightly slow primary feed will not cause
4081     large numbers of articles to be requested unnecessarily via
4082     sendme.
4083    
4084    
4099     7.3. newgroup
4100    
4101     The newgroup control message requests that a new newsgroup
4102     be created:
4103    
4104     newgroup-arguments = newsgroup-name [ space moderation ]
4105     moderation         = "moderated" / "unmoderated"
4106     newgroup-body      = body
4107                        / [ body ] descriptor [ body ]
4108     descriptor         = descriptor-tag eol description-line eol
4109     descriptor-tag     = "For your newsgroups file:"
4110     description-line   = newsgroup-name space description
4111     description        = nonblank-text [ " (Moderated)" ]
4112    
4113     The first argument names the newsgroup to be created, and
4114     the second one (if present) indicates whether it is moder-
4115     ated. If there is no second argument, the default is
4116     "unmoderated".
4117    
4118     NOTE: Implementors are warned that there is occa-
4119     sional use of other forms in the second argument.
4120     It is suggested that such violations of this
4121     Draft, which are also violations of RFC 1036,
4122     cause the newgroup message to be ignored. RFC
4123     1036 was slightly vague about how second arguments
4124     other than "moderated" were to be treated (specif-
4125     ically, whether they were illegal or just
4126     ignored), but it is thought that all existing
4127     major implementations will handle "unmoderated"
4128     correctly, and it appears desirable to tighten up
4129     the specs to make it possible for other forms to
4130     be used in future.
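
As a sketch of the argument handling described above, assuming the
newsgroup name itself is validated elsewhere: the hypothetical
function below returns None when the message should be ignored
because the second argument is neither "moderated" nor "unmoderated".

```python
def parse_newgroup_arguments(arguments):
    """Return (newsgroup_name, is_moderated), or None to ignore the message."""
    tokens = arguments.split()
    if len(tokens) == 1:
        # No second argument: the default is "unmoderated".
        return tokens[0], False
    if len(tokens) == 2 and tokens[1] in ("moderated", "unmoderated"):
        return tokens[0], tokens[1] == "moderated"
    # Missing name, extra arguments, or another form of second argument.
    return None
```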
4131    
4132     The body is a comment, which software MUST ignore, except
4133     that if it contains a descriptor, the description line is
4134     intended to be suitable for addition to a list of newsgroup
4135     descriptions. The description cannot be continued onto
4136     later lines, but is not constrained to any particular
4137     length. Moderated newsgroups have descriptions that end
4138     with the string " (Moderated)" (note that this string begins
4139     with a blank).
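
Locating the descriptor in a newgroup body could look something like
the following. The function name and return shape are assumptions of
this sketch, and it treats any run of blanks or tabs after the
newsgroup name as the separator.

```python
DESCRIPTOR_TAG = "For your newsgroups file:"
MODERATED_SUFFIX = " (Moderated)"


def extract_description(body):
    """Return (newsgroup_name, description, is_moderated), or None."""
    lines = body.splitlines()
    for index, line in enumerate(lines[:-1]):
        # The tag must match exactly, alone on its line, case-sensitively.
        if line == DESCRIPTOR_TAG:
            parts = lines[index + 1].split(None, 1)
            if len(parts) != 2:
                return None
            name, description = parts
            return name, description, description.endswith(MODERATED_SUFFIX)
    return None
```

Everything else in the body remains a comment and is not interpreted.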
4140    
4141     NOTE: It is unfortunate that the description line
4142     is part of the body, rather than being supplied in
4143     a header, but this is established practice. News-
4144     group creators are cautioned that the descriptor
4145     tag must be reproduced exactly as given above,
4146     alone on a line, and is case-sensitive. (To
4147     reduce errors in this regard, posting agents might
4148     wish to question or reject newgroup messages which
4149     do not contain a descriptor.) Given the desire
4150     for short lines, description writers should avoid
4151     content-free phrases like "discussion of" and
4152     "news about", and stick to defining what the
4165     newsgroup is about.
4166    
4167     The remainder of the body SHOULD contain an explanation of
4168     the purpose of the newsgroup and the decision to create it.
4169    
4170     NOTE: Criteria for newsgroup creation vary widely
4171     and are outside the scope of this Draft, but if
4172     formal procedures of one kind or another were fol-
4173     lowed in the decision, the body should mention
4174     this. Administrators often look for such informa-
4175     tion when deciding whether to comply with cre-
4176     ation/deletion requests.
4177    
4178     A newgroup message which lacks an Approved header MUST be
4179     ignored.
4180    
4181     NOTE: It would also be desirable to ignore a new-
4182     group message unless its Approved header names a
4183     person who is authorized (in some sense) to create
4184     such a newsgroup. A cooperating subnet with suf-
4185     ficiently strong coordination to maintain a cor-
4186     rect and current list of authorized creators might
4187     wish to do so for its internal newsgroups. It
4188     also (or alternatively) might wish to ignore a
4189     newgroup message for an internal newsgroup that
4190     was posted (or cross-posted) to a non-internal
4191     newsgroup.
4192    
4193     NOTE: As mentioned in section 6.10, some form of
4194     (cryptographic?) authentication of Approved head-
4195     ers would be highly desirable, especially for con-
4196     trol messages.
4197    
4198     It would be desirable to provide some way of supplying a
4199     moderator's address in a newgroup message for a moderated
4200     newsgroup, but this will cause problems unless effective
4201     authentication is available, so it is left for future work.
4202    
4203     NOTE: This leaves news administrators stuck with
4204     the annoying chore of arranging proper mailing of
4205     moderated-newsgroup submissions. On Usenet, this
4206     can be simplified by exploiting a forwarding
4207     facility that some major sites provide: they main-
4208     tain forwarding addresses, each the name of a mod-
4209     erated newsgroup with all periods (".", ASCII 46)
4210     replaced by hyphens ("-", ASCII 45), which forward
4211     mail to the current newsgroup moderators. More
4212     advice on the subject of forwarding to moderators
4213     can be found in the document titled "How to Con-
4214     struct the Mailpaths File", posted regularly to
4215     the Usenet newsgroups news.lists, news.admin.misc,
4216     and news.answers.
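
The forwarding-address convention described in this note amounts to a
simple substitution. In the sketch below the function name, the
newsgroup name, and the forwarding host are all hypothetical.

```python
def moderator_forwarding_address(newsgroup, forwarding_host):
    """Build the submission address for a moderated newsgroup."""
    # Periods (ASCII 46) in the name become hyphens (ASCII 45).
    return newsgroup.replace(".", "-") + "@" + forwarding_host
```

For instance, moderator_forwarding_address("example.announce.moderated",
"forwarder.example") yields
"example-announce-moderated@forwarder.example".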
4217    
4231     A newgroup message naming a newsgroup that already exists is
4232     requesting a change in the moderation status or description
4233     of the newsgroup. The same rules apply.
4234    
4235    
4236     7.4. rmgroup
4237    
4238     The rmgroup message requests that a newsgroup be deleted:
4239    
4240     rmgroup-arguments = newsgroup-name
4241     rmgroup-body = body
4242    
4243     The sole argument is the newsgroup name. The body is a com-
4244     ment, which software MUST ignore; it SHOULD contain an
4245     explanation of the decision to delete the newsgroup.
4246    
4247     NOTE: Criteria for newsgroup deletion vary widely
4248     and are outside the scope of this Draft, but if
4249     formal procedures of one kind or another were fol-
4250     lowed in the decision, the body should mention
4251     this. Administrators often look for such informa-
4252     tion when deciding whether to comply with cre-
4253     ation/deletion requests.
4254    
4255     A rmgroup message which lacks an Approved header MUST be
4256     ignored.
4257    
4258     NOTE: It would also be desirable to ignore a
4259     rmgroup message unless its Approved header names a
4260     person who is authorized (in some sense) to delete
4261     such a newsgroup. A cooperating subnet with suf-
4262     ficiently strong coordination to maintain a cor-
4263     rect and current list of authorized deleters might
4264     wish to do so for its internal newsgroups. It
4265     also (or alternatively) might wish to ignore a
4266     rmgroup message for an internal newsgroup that was
4267     posted (or cross-posted) to a non-internal news-
4268     group.
4269    
4270     Unexpected deletion of a newsgroup being a disruptive
4271     action, implementations are strongly advised to refer
4272     rmgroup messages to an administrator by default, unless per-
4273     haps the message can be determined to have originated within
4274     a cooperating subnet whose members are considered trustwor-
4275     thy. Abuses have occurred.
4276    
4277    
4278     7.5. sendsys, version, whogets
4279    
4280     The sendsys message requests that a description of the
4281     relayer's news feeds to other relayers be mailed to the
4282     article's reply address:
4283    
4297     sendsys-arguments = [ relayer-name ]
4298     sendsys-body = body
4299    
4300     If there is an argument, relayers other than the one named
4301     by the argument MUST NOT respond. The body is a comment,
4302     which software MUST ignore; it SHOULD contain an explanation
4303     of the reason for the request.
4304    
4305     The version message requests that the name and version of
4306     the relayer software be mailed to the reply address:
4307    
4308     version-arguments =
4309     version-body = body
4310    
4311     There are no arguments. The body is a comment, which soft-
4312     ware MUST ignore; it SHOULD contain an explanation of the
4313     reason for the request.
4314    
4315     The whogets message requests that a description of the
4316     relayer and its news feeds to other relayers be mailed to
4317     the article's reply address:
4318    
4319     whogets-arguments = newsgroup-name [ space relayer-name ]
4320     whogets-body = body
4321    
4322     The first argument is the name of the "target newsgroup",
4323     specifying the newsgroup for which propagation information
4324     is desired. This MUST be a complete newsgroup name, not the
4325     name of a hierarchy or a portion of a newsgroup name that is
4326     not itself the name of a newsgroup. If there is a second
4327     argument, only the relayer named by that argument should
4328     respond. The body is a comment, which software MUST ignore;
4329     it SHOULD contain an explanation of the reason for the
4330     request.
4331    
4332     NOTE: Whogets is intended as a replacement for
4333     sendsys (and version) with a precisely-specified
4334     reply format. Since the syntax for specifying
4335     what newsgroups get sent to what other relayers
4336     varies widely between different forms of relayer
4337     software, the only practical way to standardize
4338     the reply format is to indicate a specific news-
4339     group and ask where THAT newsgroup propagates.
4340     The requirement that it be a complete newsgroup
4341     name is intended to (largely) avoid the problem of
4342     having to answer "yes and no" in cases where not
4343     all newsgroups in a hierarchy are sent.
4344    
4345     Any of these messages lacking an Approved header MUST be
4346     ignored. Response to any of these messages SHOULD be
4347     delayed for at least 24 hours, and no response should be
4348     attempted if the message has been cancelled in that time.
4349     Also, no response SHOULD be attempted unless the local part
4350     of the destination address is "newsmap". News
4363     administrators SHOULD arrange for mail to "newsmap" on their
4364     systems to be discarded (without reply) unless legitimate
4365     use is in progress.
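
The conditions above for answering a sendsys, version, or whogets
message can be collected into one check, sketched below. The function
name and parameters are hypothetical; headers is assumed to be a
mapping of header names to values, and the "newsmap" restriction,
which the text states as a SHOULD, is applied strictly here.

```python
from datetime import datetime, timedelta

RESPONSE_DELAY = timedelta(hours=24)


def may_respond(headers, reply_address, arrived, now, cancelled):
    """Decide whether a reply may be mailed for a mapping request."""
    # A request lacking an Approved header is ignored.
    if "approved" not in {name.lower() for name in headers}:
        return False
    # Wait at least 24 hours so that forgeries can be noticed.
    if now - arrived < RESPONSE_DELAY:
        return False
    # No response if the request was cancelled in the meantime.
    if cancelled:
        return False
    # Only reply to a destination whose local part is "newsmap".
    local_part = reply_address.rpartition("@")[0]
    return local_part == "newsmap"
```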
4366    
4367     NOTE: Because these messages can cause many, many
4368     relayers to send mail to one person, such mes-
4369     sages, specifying mailing to an innocent person's
4370     mailbox, have been forged as a half-witted practi-
4371     cal joke. A delay gives administrators time to
4372     notice a fraudulent message and act (by cancelling
4373     the message, preparing to divert the flood of mail
4374     into the bit bucket, or both). Restriction of the
4375     destination address to "newsmap" reduces the
4376     appeal of fraud by making it impossible to use it
4377     to harass a normal user. (A site which does NOT
4378     discard mail to "newsmap", but rather bounces it
4379     back, may incur higher communications costs than
4380     if the mail had been accepted into a user's mail-
4381     box... but a malicious forger could accomplish
4382     this anyway, by using an address whose local part
4383     is very unlikely to be a legitimate mailbox name.)
4384    
4385     NOTE: RFC 1036 did not require the Approved header
4386     for these control messages. This has been added
4387     because of the possibility that cryptographic
4388     authentication of Approved headers will become
4389     available.
4390    
4391     The body of the reply to a sendsys message SHOULD be of the
4392     form:
4393    
4394     sendsys-reply = responder 1*sys-line
4395     responder = "Responding-System:" space domain eol
4396     sys-line = relayer-name ":" newsgroup-patterns [ ":" text ] eol
4397     newsgroup-patterns = newsgroup-name *( "," newsgroup-name )
4398    
4399     The first line identifies the responding system, using a
4400     syntax resembling a header (but note that it is part of the
4401     BODY). Remaining lines indicate what newsgroups are sent to
4402     what other systems. The syntax of newsgroup patterns is not
4403     well standardized; the form described is common (often with
4404     newsgroup names only partially given, denoting all names
4405     starting with a particular set of components) but not uni-
4406     versal. The whogets message provides a better-defined
4407     alternative.
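
A sendsys reply body in the common form described above might be
generated as follows. This is only a sketch: the function name and
the shape of the feeds argument are assumptions, and a newline stands
in for eol.

```python
def format_sendsys_reply(domain, feeds):
    """Build a sendsys reply body.

    feeds is a list of (relayer_name, newsgroup_patterns, text) tuples;
    text may be None when there is nothing further to say.
    """
    if not feeds:
        raise ValueError("at least one sys-line is required")
    lines = ["Responding-System: " + domain]
    for relayer_name, patterns, text in feeds:
        line = relayer_name + ":" + ",".join(patterns)
        if text:
            line += ":" + text
        lines.append(line)
    return "\n".join(lines) + "\n"
```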
4408    
4409     The reply to a version message is of somewhat ill-defined
4410     form, with a body normally consisting of a single line of
4411     text that somehow describes the version of the relayer soft-
4412     ware. The whogets message provides a better-defined alter-
4413     native.
4414    
4415     The body of the reply to a whogets message MUST be of the
4416     form:
4417    
4429     whogets-reply = responder-domain responder-relayer response-date
4430                     responding-to arrived-via responder-version
4431                     whogets-delimiter *pass-line
4432     responder-domain = "Responding-System:" space domain eol
4433     responder-relayer = "Responding-Relayer:" space relayer-name eol
4434     response-date = "Response-Date:" space date eol
4435     responding-to = "Responding-To:" space message-id eol
4436     arrived-via = "Arrived-Via:" space path-list eol
4437     responder-version = "Responding-Version:" space nonblank-text eol
4438     whogets-delimiter = eol
4439     pass-line = relayer-name [ space domain ] eol
4440    
4441     The first six lines identify the responding relayer by its
4442     Internet domain name (use of the ".uucp" and ".bitnet"
4443     pseudo-domains is permissible, for registered hosts in them,
4444     but discouraged) and its relayer name, specify the date when
4445     the reply was generated and the message ID of the whogets
4446     message being replied to, give the path list (from the Path
4447     header) of the whogets message (which MAY, if absolutely
4448     necessary, be truncated to a convenient length, but MUST
4449     contain at least the leading three relayer names), and indi-
4450     cate the version of relayer software responding. Note that
4451     these lines are part of the BODY even though their format
4452     resembles that of headers. Despite the apparently-fixed
4453     order specified by the syntax above, they can appear in any
4454     order, but there must be exactly one of each.
4455    
4456     After those preliminaries, and an empty line to unambigu-
4457     ously define their end, the remaining lines are the relayer
4458     names (which MAY be accompanied by the corresponding domain
4459     names, if known) of systems which the responding system
4460     passes the target newsgroup to. Only the names of news
4461     relayers are to be included.
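
A program collecting whogets replies might parse each body roughly as
shown here. The sketch assumes newline-terminated lines; the function
name is hypothetical, field values are returned as uninterpreted
strings, and blank lines among the pass lines are skipped as a
leniency of this sketch.

```python
REQUIRED_FIELDS = (
    "Responding-System", "Responding-Relayer", "Response-Date",
    "Responding-To", "Arrived-Via", "Responding-Version",
)


def parse_whogets_reply(body):
    """Return (fields, pass_lines) from the body of a whogets reply."""
    head, delimiter, tail = body.partition("\n\n")
    if not delimiter:
        raise ValueError("missing empty line after the preliminaries")
    fields = {}
    for line in head.splitlines():
        name, colon, value = line.partition(":")
        if not colon or name not in REQUIRED_FIELDS:
            raise ValueError("unexpected line: %r" % line)
        if name in fields:
            raise ValueError("duplicate field: " + name)
        fields[name] = value.strip()
    # The six lines may come in any order, but each appears exactly once.
    if len(fields) != len(REQUIRED_FIELDS):
        raise ValueError("missing preliminary field")
    pass_lines = []
    for line in tail.splitlines():
        parts = line.split()
        if not parts:
            continue
        if len(parts) > 2:
            raise ValueError("malformed pass line: %r" % line)
        # Each pass line is a relayer name, optionally with its domain.
        pass_lines.append((parts[0], parts[1] if len(parts) == 2 else None))
    return fields, pass_lines
```

Splitting each preliminary line at its first colon keeps values that
themselves contain colons, such as the time of day in a date, intact.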
4462    
4463     NOTE: It is desirable for a reply to identify its
4464     source by both domain name and relayer name
4465     because news propagation is governed by the latter
4466     but location in a broader context is best deter-
4467     mined by the former. The date and whogets message
4468     ID should, in principle, be present in the MAIL
4469     headers, but are included in the body for robust-
4470     ness in the presence of uncooperative mail sys-
4471     tems. The reason for the path list is discussed
4472     below. Adding version information eliminates the
4473     need for a separate message to gather it.
4474    
4475     NOTE: The limitation of pass lines to contain only
4476     names of news relayers is meant to exclude names
4477     used within a single host (as identifiers for mail
4478     gateways, portions of ihave/sendme implementa-
4479     tions, etc.), which do not actually refer to other
4480     hosts.
4481    
4495     A relayer which is unaware of the existence of the target
4496     newsgroup MUST NOT reply to a whogets message at all,
4497     although this MUST NOT influence decisions on whether to
4498     pass the article on to other relayers.
4499    
4500     NOTE: While this may result in discontinuous maps
4501     in cases where some hosts have not honored
4502     requests for creation of a newsgroup, it will also
4503     prevent a flood of useless responses in the event
4504     that a whogets message intended to map a small
4505     region "leaks" out to a larger one. The possibil-
4506     ity of discontinuous recognition of a newsgroup
4507     does make it important that the whogets message
4508     itself continue to propagate (if other criteria
4509     permit). This is also the reason for the inclu-
4510     sion of the whogets message's path list, or at
4511     least the leading portion of it, in the reply: to
4512     permit reconstruction of at least small gaps in
4513     maps.
4514    
4515     Different networks set different rules for the legitimacy of
4516     these messages, given that they may reveal details of orga-
4517     nization-internal topology that are sometimes considered
4518     proprietary.
4519    
4520     NOTE: On Usenet, in particular, willingness to
4521     respond to these messages is held to be a condi-
4522     tion of network membership: the topology of Usenet
4523     is public information. Organizations wishing to
4524     belong to such networks while keeping their inter-
4525     nal topology confidential might wish to organize
4526     their internal news software so that all articles
4527     reaching outsiders appear to be from a single
4528     "gatekeeper" system, with the details of internal
4529     topology hidden behind that system.
4530    
4531     UNRESOLVED ISSUE: It might be useful to have a way
4532     to set some sort of hop limit for these.
4533    
4534    
4535     7.6. checkgroups
4536    
4537     The checkgroups control message contains a supposedly
4538     authoritative list of the valid newsgroups within some sub-
4539     set of the newsgroup name space:
4540    
4541     checkgroups-arguments =
4542     checkgroups-body = [ invalidation ] valid-groups
4543     / invalidation
4544     invalidation = "!" plain-component *( "," plain-component ) eol
4545     valid-groups = 1*( description-line eol )
4546    
4547     There are no arguments. The body lines (except possibly for
4548     an initial invalidation) each contain a description line for
4561     a newsgroup, as defined under the newgroup message (section
4562     7.3).
4563    
4564     NOTE: Some other, ill-defined, forms of the check-
4565     groups body were formerly used. See appendix A.
4566    
4567     The checkgroups message applies to all hierarchies contain-
4568     ing any of the newsgroups listed in the body. The check-
4569     groups message asserts that the newsgroups it lists are the
4570     only newsgroups in those hierarchies. If there is an inval-
4571     idation, it asserts that the hierarchies it names no longer
4572     contain any newsgroups.
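
NOTE: As an illustration only (it is not part of this
Draft), the body syntax above can be read with a few lines
of Python. The function names are hypothetical. The sketch
assumes that a description line is a newsgroup name, white
space, and free-form text (the precise syntax is given in
section 7.3), and that a hierarchy is identified by the
first component of a newsgroup name.

```python
def parse_checkgroups_body(body):
    """Return (invalidated_hierarchies, {newsgroup: description})."""
    lines = [line for line in body.split("\n") if line.strip()]
    invalidated = []
    if lines and lines[0].startswith("!"):
        # invalidation = "!" plain-component *( "," plain-component )
        invalidated = lines[0][1:].split(",")
        lines = lines[1:]
    groups = {}
    for line in lines:
        # Assumption: newsgroup name, white space, free-form description.
        fields = line.split(None, 1)
        groups[fields[0]] = fields[1] if len(fields) > 1 else ""
    return invalidated, groups


def affected_hierarchies(invalidated, groups):
    # Assumption: the first name component identifies the hierarchy.
    return set(invalidated) | {name.split(".", 1)[0] for name in groups}
```

A body consisting only of an invalidation line yields an
empty mapping, which matches the second alternative of
checkgroups-body.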
4573    
4574     Processing a checkgroups message MAY cause a local list of
4575     newsgroup descriptions to be updated. It SHOULD also cause
4576     the local lists of newsgroups (and their moderation sta-
4577     tuses) in the mentioned hierarchies to be checked against
4578     the message. The results of the check MAY be used for auto-
4579     matic corrective action, or MAY be reported to the news
4580     administrator in some way.
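
NOTE: A minimal Python sketch of the check described above,
offered as an illustration rather than a required
procedure. It only reports differences and takes no
corrective action. The function name is hypothetical,
moderation status is ignored for brevity, and a hierarchy is
assumed to be identified by the first component of a
newsgroup name.

```python
def check_against_checkgroups(local_groups, listed_groups, invalidated=()):
    """Return (missing, extra) newsgroup names, each as a sorted list.

    missing: listed in the checkgroups message but not known locally.
    extra:   known locally, inside a covered hierarchy, but not listed.
    """
    def hierarchy(name):
        return name.split(".", 1)[0]

    listed = set(listed_groups)
    covered = {hierarchy(name) for name in listed} | set(invalidated)
    # Local groups outside the covered hierarchies are not affected.
    in_scope = {name for name in local_groups if hierarchy(name) in covered}
    return sorted(listed - in_scope), sorted(in_scope - listed)
```

The two lists could be mailed to the news administrator or
written to a log for later review.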
4581    
4582     NOTE: Automatically updating descriptions of
4583     existing newsgroups is relatively safe. In the
4584     case of newsgroup additions or deletions, simply
4585     notifying the administrator is generally the wis-
4586     est action, unless perhaps the message can be
4587     determined to have originated within a cooperating
4588     subnet whose members are considered trustworthy.
4589    
4590     NOTE: There is a problem with the checkgroups con-
4591     cept: not all newsgroups in a hierarchy necessar-
4592     ily propagate to the same set of machines.
4593     (Notably, there is a set of newsgroups known as
4594     the "inet" newsgroups, which have relatively lim-
4595     ited distribution but coexist in several hierar-
4596     chies with more widely-distributed newsgroups.)
4597     The advice of checkgroups should always be taken
4598     with a grain of salt, and should never be followed
4599     blindly.
4600    
4601    
4602     8. Transmission Formats
4603    
4604     While this Draft does not specify transmission methods
4605     except to place a few constraints on them, there are some
4606     data formats used only for transmission that are unique to
4607     news.
4608    
4609    
4610     8.1. Batches
4611    
4612     For efficient bulk transmission and processing of news arti-
4613     cles, it is often desirable to transmit a number of them as
4614     a single block of data, a "batch". The format of a batch
4627     is:
4628    
4629     batch = 1*( batch-header article )
4630     batch-header = "#! rnews " article-size eol
4631     article-size = 1*digit
4632    
4633     A batch is a sequence of articles, each prefixed by a header
4634     line that includes its size. The article size is a decimal
4635     count of the octets in the article, counting each EOL as one
4636     octet regardless of how it is actually represented.
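
NOTE: The following Python sketch (illustrative only, with
hypothetical function names) writes and reads the batch
format above. It assumes that articles are held as octet
strings in which every EOL is a single LF, so that the
length of the string equals the article size; a host that
represents EOL differently would have to convert first.

```python
BATCH_PREFIX = b"#! rnews "


def make_batch(articles):
    """Join articles (bytes, one LF per EOL) into a single batch."""
    out = bytearray()
    for article in articles:
        out += b"#! rnews %d\n" % len(article)
        out += article
    return bytes(out)


def split_batch(batch):
    """Return the list of articles contained in a batch."""
    articles = []
    pos = 0
    while pos < len(batch):
        eol = batch.index(b"\n", pos)  # raises ValueError if no EOL
        header = batch[pos:eol]
        size_field = header[len(BATCH_PREFIX):]
        if not header.startswith(BATCH_PREFIX) or not size_field.isdigit():
            raise ValueError("malformed batch header: %r" % header)
        size = int(size_field)
        start = eol + 1
        article = batch[start:start + size]
        if len(article) != size:
            raise ValueError("batch is truncated")
        articles.append(article)
        pos = start + size
    return articles
```

The reader is deliberately strict: a header line with
anything after the size is rejected rather than guessed at.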
4637    
4638     NOTE: A relayer might wish to accept either a sin-
4639     gle article or a batch as input. Since "#" cannot
4640     appear in a header name, examination of the first
4641     octet of the input will reveal its nature.
4642    
4643     NOTE: In the header line, there is exactly one
4644     blank before "rnews", there is exactly one blank
4645     after "rnews", and the EOL immediately follows the
4646     article size. Beware that some software inserts
4647     non-standard trash after the size.
4648    
4649     NOTE: Despite the similarity of this format to the
4650     executable-script format used by some operating
4651     systems, it is EXTREMELY unwise to just feed
4652     incoming batches to a command interpreter in the
4653     anticipation that it will run a command named
4654     "rnews" to process the batch. Unless arrangements
4655     are made to very tightly restrict the range of
4656     commands that can be executed by this means, the
4657     security implications are disastrous.
4658    
4659    
4660     8.2. Encoded Batches
4661    
4662     When transmitting news, especially over communications links
4663     that are slow or are billed by the bit, it is often desir-
4664     able to batch news and apply data compression to the
4665     batches. Transmission links sending compressed batches
4666     SHOULD use out-of-band means of communication to specify the
4667     compression algorithm being used. If there is no way to
4668     send out-of-band information along with a batch, the follow-
4669     ing encapsulation for a compressed batch MAY be used:
4670    
4671     ec-batch = "#! " compression-keyword eol compressed-batch
4672     compression-keyword = "cunbatch"
4673    
4674     A line containing a keyword indicating the type of compres-
4675     sion is followed by the compressed batch. The only truly
4676     widespread compression keyword at present is "cunbatch",
4677     indicating compression using the widely-distributed "com-
4678     press" program. Other compression keywords MAY be used by
4679     mutual agreement between the hosts involved.
4680    
4693     NOTE: An encapsulated compressed batch is NOT, in
4694     general, a text file, despite having an initial
4695     text line. This combination of text and non-text
4696     data is often awkward to handle; for example,
4697     standard decompression programs cannot be used
4698     without first stripping off the initial line, and
4699     that in turn is painful to do because many text-
4700     handling tools that are superficially suited to
4701     the job do not cope well with non-text data.
4702     Hence the recommendation that out-of-band communi-
4703     cation be used instead when possible.
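
NOTE: One way to avoid the text-tool problem is to treat the
whole input as octets and cut it at the first EOL. The
Python sketch below is illustrative only; the function name
is hypothetical, EOL is assumed to be a single LF, and what
is done with the keyword and the compressed octets is left
to the caller.

```python
def split_encoded_batch(data):
    """Return (compression_keyword, compressed_octets)."""
    if not data.startswith(b"#! "):
        raise ValueError("not an encapsulated compressed batch")
    eol = data.find(b"\n")
    if eol < 0:
        raise ValueError("no EOL after the compression keyword")
    # Only the first line is text; the rest is left untouched as bytes.
    keyword = data[3:eol].decode("ascii")
    return keyword, data[eol + 1:]
```

Because nothing after the first EOL is interpreted, octets
in the compressed data that happen to look like EOLs are
preserved.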
4704    
4705     NOTE: For UUCP transmission, where a batch is typ-
4706     ically transmitted by invoking the remote command
4707     "rnews" with the batch as its input stream, a
4708     plausible out-of-band method for indicating a com-
4709     pression type would be to give a compression key-
4710     word in an option to "rnews", perhaps in the form:
4711    
4712     rnews -d decompressor
4713    
4714     where "decompressor" is the name of a decompres-
4715     sion program (e.g. "uncompress" for a batch com-
4716     pressed with "compress" or "gunzip" for a batch
4717     compressed with "gzip"). How this decompression
4718     program is located and invoked by the receiving
4719     relayer is implementation-specific.
4720    
4721     NOTE: See the notes in section 8.1 on the inadvis-
4722     ability of feeding batches directly to command
4723     interpreters.
4724    
4725     NOTE: There is exactly one blank between "#!" and
4726     the compression keyword, and the EOL immediately
4727     follows the keyword.
4728    
4729    
4730     8.3. News Within Mail
4731    
4732     It is often desirable to transmit news as mail, either for
4733     the convenience of a human recipient or because that is the
4734     only type of transmission available on a restrictive commu-
4735     nication path.
4736    
4737     Given the similarity between the news format and the MAIL
4738     format, it is superficially attractive to just send the news
4739     article as a mail message. This is typically a mistake:
4740     mail-handling software often feels free to manipulate vari-
4741     ous headers in undesirable ways (in some cases, such as
4742     Sender, such manipulation is actually mandatory), and mail
4743     transmission problems etc. MUST be reported to the adminis-
4744     trators responsible for the mail transmission rather than to
4745     the article's author. In general, news sent as mail should
4746     be encapsulated to separate the mail headers and the news
4759     headers.
4760    
4761     When the intended recipient is a human, any convenient form
4762     of encapsulation may be used. Recommended practice is to
4763     use MIME encapsulation with a content type of "mes-
4764     sage/news", given that news articles have additional seman-
4765     tics beyond what "message/rfc822" implies.
4766    
4767     NOTE: "message/news" was registered as a standard
4768     subtype by IANA 22 June 1993.
4769    
4770     When mail is being used as a transmission path between two
4771     relayers, however, a standard method is desirable. Cur-
4772     rently the standard method is to send the mail to an address
4773     whose local part is "rnews", with whatever mail headers are
4774     necessary for successful transmission. The news article
4775     (including its headers) is sent as the body of the mail mes-
4776     sage, with an "N" prepended to each line.
4777    
4778     NOTE: The "N" reduces the probability of an inno-
4779     cent line in a news article being taken as a magic
4780     command to mail software, and makes it easy for
4781     receiving software to strip off any lines added by
4782     mail software (e.g. the trailing empty line added
4783     by some UUCP mail software).
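
NOTE: The "N" convention is simple enough to sketch in a few
lines of Python. This is an illustration only, with
hypothetical function names; it assumes the article is held
as text with a single LF per EOL, and that any line in the
mail body not starting with "N" was added by mail software
and can be dropped.

```python
def n_encode(article):
    """Prefix every line of an article with "N"."""
    lines = article.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the article ended with an EOL
    return "".join("N" + line + "\n" for line in lines)


def n_decode(mail_body):
    """Recover the article, ignoring lines that lack the "N" prefix."""
    return "".join(
        line[1:] + "\n"
        for line in mail_body.split("\n")
        if line.startswith("N")
    )
```

An empty line in the article becomes a line containing only
"N", so it is distinguishable from an empty line appended in
transit.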
4784    
4785     This method has its weaknesses. In particular, it assumes
4786     that the mail transmission channel can transmit nearly-
4787     arbitrary body text undamaged. When mail is being used as a
4788     transmission path of last resort, however, the mail system
4789     often has inconvenient preconceived notions about the format
4790     of message bodies. Various ad-hoc encoding schemes have
4791     been used to avoid such problems. The recommended method is
4792     to send a news article or batch as the body of a MIME mail
4793     message, using content type "application/news-transmission"
4794     and MIME's "base64" encoding (which is specifically designed
4795     to survive all known major mail systems).
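
NOTE: As an illustration of the recommended method, the
following Python sketch uses the standard "email" package to
wrap an article or batch and to unwrap it again. The
function names and the addresses in the usage line are
hypothetical, and the sketch says nothing about how the
message is actually handed to the mail system.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication


def wrap_for_mail(payload, sender, recipient):
    """Build a MIME message carrying an article or batch (bytes)."""
    # MIMEApplication applies base64 transfer encoding by default.
    msg = MIMEApplication(payload, _subtype="news-transmission")
    msg["From"] = sender
    msg["To"] = recipient
    return msg


def unwrap_from_mail(raw_message):
    """Return the original octets from a received message (bytes)."""
    msg = message_from_bytes(raw_message)
    if msg.get_content_type() != "application/news-transmission":
        raise ValueError("unexpected content type")
    return msg.get_payload(decode=True)
```

For example, wrap_for_mail(batch, "news@a.example",
"rnews@b.example").as_bytes() gives the octets to be mailed.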
4796    
4797     NOTE: In the process, MIME conventions could be
4798     used to fragment and reassemble an article which
4799     is too large to be sent as a single mail message
4800     over a transmission path that restricts message
4801     length. In addition, the "conversions" parameter
4802     to the content type could be used to indicate what
4803     (if any) compression method has been used. And
4804     the Content-MD5 header [rrr 1544] can be used as a
4805     "checksum" to provide high confidence of detecting
4806     accidental damage to the contents.
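
NOTE: A Content-MD5 value is the base64 form of the 128-bit
MD5 digest. The Python sketch below is illustrative only;
the function name is hypothetical, and it assumes the digest
is taken over the article or batch octets before the base64
transfer encoding is applied.

```python
import base64
import hashlib


def content_md5(data):
    """Return the Content-MD5 header value for the given octets."""
    digest = hashlib.md5(data).digest()  # 16 octets
    return base64.b64encode(digest).decode("ascii")
```

A receiver would recompute the value over the decoded
octets and compare it with the header. This guards against
accidental damage only, not deliberate tampering.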
4807    
4808     UNRESOLVED ISSUE: The "conversions" parameter no
4809     longer exists. What should be done about this, if
4810     anything?
4811    
4825     NOTE: It might look tempting to use a content type
4826     such as "message/X-netnews", but MIME bans non-
4827     trivial encodings of the entire body of messages
4828     with content type "message". The intent is to
4829     avoid obscuring nested structure underneath encod-
4830     ings. For inter-relayer news transmission, there
4831     is no nested structure of interest, and it is
4832     important that the entire article (including its
4833     headers, not just its body) be protected against
4834     the vagaries of intervening mail software. This
4835     situation appears to fit the MIME description of
4836     circumstances in which "application" is the proper
4837     content type.
4838    
4839     NOTE: "application/news-transmission", with a
4840     "conversions" parameter, was registered as a stan-
4841     dard subtype by IANA 22 June 1993.
4842    
4843     UNRESOLVED ISSUE: The "conversions" parameter no
4844     longer exists in MIME. What should we do about
4845     this?
4846    
4847    
4848     8.4. Partial Batches
4849    
4850     UNRESOLVED ISSUE: The existing batch conventions
4851     assemble (potentially) many articles into one
4852     batch. Handling very large articles would be substantially
4853     less troublesome if there were also a
4854     fragmentation convention for splitting a large
4855     article into several batches. Is this worth
4856     defining at this time?
4857    
4858    
4859     9. Propagation and Processing
4860    
4861     Most aspects of news propagation and processing are imple-
4862     mentation-specific. The basic propagation algorithms, and
4863     certain details of how they are implemented, nevertheless
4864     need to be standard.
4865    
4866     There are two important principles that news implementors
4867     (and administrators) need to keep in mind. The first is the
4868     well-known Internet Robustness Principle:
4869    
4870     Be liberal in what you accept, and conservative in what you send.
4871    
4872     However, in the case of news there is an even more important
4873     principle, derived from a much older code of practice, the
4874     Hippocratic Oath (we will thus call this the Hippocratic
4875     Principle):
4876    
4877     First, do no harm.
4878    
4891     It is VITAL to realize that decisions which might be merely
4892     suboptimal in a smaller context can become devastating mis-
4893     takes when amplified by the actions of thousands of hosts
4894     within a few hours.
4895    
4896    
4897     9.1. Relayer General Issues
4898    
4899     Relayers MUST not alter the content of articles unnecessar-
4900     ily. Well-intentioned attempts to "improve" headers, in
4901     particular, typically do more harm than good. It is neces-
4902     sary for a relayer to prepend its own name to the Path con-
4903     tent (see section 5.6) and permissible for it to rewrite or
4904     delete the Xref header (see section 6.12). Relayers MAY
4905     delete the thoroughly-obsolete headers described in appendix
4906     A.3, although this behavior no longer seems useful enough to
4907     encourage. Other alterations SHOULD be avoided at all
4908     costs, as per the Hippocratic Principle.
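
NOTE: The one alteration every relayer makes, prepending its
name to the Path content, can be sketched as follows in
Python. This is an illustration only: the function name is
hypothetical, "!" is assumed as the delimiter (see section
5.6 for the actual rules), EOL is assumed to be a single LF,
and header continuation lines are not handled.

```python
def prepend_to_path(article, relayer_name):
    """Return the article with relayer_name added to the Path header."""
    head, separator, body = article.partition("\n\n")
    lines = head.split("\n")
    for index, line in enumerate(lines):
        if line.lower().startswith("path:"):
            name, _, content = line.partition(":")
            lines[index] = name + ": " + relayer_name + "!" + content.strip()
            break
    else:
        raise ValueError("article has no Path header")
    # Everything except the Path line is passed through unchanged.
    return "\n".join(lines) + separator + body
```

Only the header section is examined, so a body line that
happens to begin with "Path:" is left alone.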
4909    
4910     NOTE: As discussed in section 2.3, tidying up the
4911     headers of a user-prepared article is the job of
4912     the posting agent, not the relayer. The relayer's
4913     purpose is to move already-compliant articles
4914     around efficiently without damaging them. Note
4915     that in existing implementations, specific pro-
4916     grams may contain both posting-agent functions and
4917     relayer functions. The distinction is that post-
4918     ing-agent functions are invoked only on articles
4919     posted by local posters, never on articles
4920     received from other relayers.
4921    
4922     NOTE: A particular corollary of this rule is that
4923     relayers should not add headers unless truly nec-
4924     essary. In particular, this is not SMTP; do not
4925     add Received headers.
4926    
4927     Relayers MUST not pass non-conforming articles on to other
4928     relayers, except perhaps in a cooperating subnet that has
4929     agreed to permit certain kinds of non-conforming behavior.
4930     This is a direct consequence of the Internet Robustness
4931     Principle.
4932    
4933     The two preceding paragraphs may appear to be in conflict.
4934     What is to be done when a non-conforming article is
4935     received? The Robustness Principle argues that it should be
4936     accepted but must not be passed on to other relayers while
4937     still non-conforming, and the Hippocratic Principle strongly
4938     discourages attempts at repair. The conclusion that this
4939     appears to lead to is correct: a non-conforming article MAY
4940     be accepted for local filing and processing, or it MAY be
4941     discarded entirely, but it MUST not be passed on to other
4942     relayers.
4943    
4957     A relayer MUST not respond to the arrival of an article by
4958     sending mail to any destination, other than a local adminis-
4959     trator, except by explicit prearrangement with the recipi-
4960     ent. Neither posting an article (other than certain types
4961     of control message, see section 7.5) nor being the moderator
4962     of a moderated newsgroup constitutes such prearrangement.
4963     UNDER NO CIRCUMSTANCES WHATSOEVER may a relayer attempt to
4964     send mail to either an article's originator or a moderator.
4965    
4966     NOTE: Reporting apparent errors in message compo-
4967     sition is the job of a posting agent, not a
4968     relayer. The same is true of mailing moderated-
4969     newsgroup postings to moderators. In networks of
4970     thousands of cooperating relayers, it is simply
4971     unacceptable for there to be any circumstance
4972     whatsoever that causes any significant fraction of
4973     them to simultaneously send mail to the same des-
4974     tination. (Some control messages are exceptions,
4975     although perhaps ill-advised ones.) What might,
4976     in a smaller network, be a useful notification or
4977     forwarding becomes a deluge of near-identical mes-
4978     sages that can bring mail software to its knees
4979     and severely inconvenience recipients. Modera-
4980     tors, in particular, historically have suffered
4981     grievously from this.
4982    
4983     Notification of problems in incoming articles MAY go to
4984     local administrators, or at most (by prearrangement!) to
4985     the administrators of the neighboring relayer(s) that passed
4986     on the problematic articles.
4987    
4988     NOTE: It would be desirable to notify the author
4989     that his posting is not propagating as he expects.
4990     However, there is no known method for doing this
4991     that will scale up gracefully. (In particular,
4992     "notify only if within N relayers of the origina-
4993     tor" falls down in the presence of commercial news
4994     services like UUNET: there may be hundreds or
4995     thousands of relayers within a couple of hops of
4996     the originator.) The best that can be done right
4997     now is to notify neighbors, in hopes that the word
4998     will eventually propagate up the line, or organize
4999     regional monitoring at major hubs.
5000    
5001     If it is necessary to alter an article, e.g. translate it to
5002     another character set or alter its EOL representation,
5003     strenuous efforts should be made to ensure that such trans-
5004     formations are reversible, and that relayers or other soft-
5005     ware that might wish to reverse them know exactly how to do
5006     so.
5007    
5008     NOTE: For example, a cooperating subnet that
5009     exchanges articles using a non-ASCII character set
5010     like EBCDIC should define a standard, reversible
5023     ASCII-EBCDIC mapping and take pains to see that it
5024     is used at all points where the subnet meets the
5025     outside. If the only reason for using EBCDIC is
5026     that the readers typically employ EBCDIC devices,
5027     it would be more robust to employ ASCII as the
5028     interchange format and do the transformation in
5029     the reading and posting agents.
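
NOTE: To make "reversible" concrete, the Python sketch below
uses the "cp037" codec shipped with Python as a stand-in for
whatever mapping a subnet agrees on. The choice of cp037 and
the function names are assumptions made for illustration;
this Draft does not name a particular EBCDIC variant.

```python
EBCDIC_CODEC = "cp037"  # assumed mapping; a subnet must agree on one


def to_ebcdic(text):
    """Convert article text to EBCDIC octets at the subnet boundary."""
    return text.encode(EBCDIC_CODEC)


def from_ebcdic(octets):
    """Convert EBCDIC octets back to text when leaving the subnet."""
    return octets.decode(EBCDIC_CODEC)
```

The property that matters is that from_ebcdic(to_ebcdic(x))
returns x for every article x that can cross the boundary.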
5030    
5031    
5032     9.2. Article Acceptance And Propagation
5033    
5034     When a relayer first receives an article, it must decide
5035     whether to accept it. (This applies regardless of whether
5036     the article arrived by itself or as part of a batch, and in
5037     principle regardless of whether it originated as a local
5038     posting or as traffic from another relayer.) In a cooperat-
5039     ing subnet with well-controlled propagation paths, some of
5040     the tests specified here MAY be delegated to centrally-
5041     located relayers; that is, relayers that can receive news
5042     ONLY via one of the central relayers might simplify accep-
5043     tance testing based on the assumption that incoming traffic
5044     has already passed the full set of tests at a central
5045     relayer.
5046    
5047     The wording that follows is based on a model in which arti-
5048     cles arrive on a relayer's host before acceptance tests are
5049     done. However, depending on the degree of integration of
5050     the transport mechanisms and the relayer, some or all of
5051     these tests MAY be done before the article is actually
5052     transmitted, so that articles which definitely will not be
5053     accepted need not be transmitted at all.
5054    
5055     The wording that follows also specifies a particular order
5056     for the acceptance tests. While this order is the obvious
5057     one, the tests MAY be done in any order.
5058    
5059     First, the relayer MUST verify that the article is a legal
5060     news article, with all mandatory headers present with legal
5061     contents.
5062    
5063     NOTE: This check in principle is done by the first
5064     relayer to see an article, so an article received
5065     from another relayer should always be legal, but
5066     there is enough old software still operational
5067     that this cannot be taken for granted; see the
5068     discussion of the Internet Robustness Principle in
5069     section 9.1.
5070    
5071     Second, the relayer MUST determine whether it has already
5072     seen this article (identified by its message ID). This is
5073     normally done by retaining a history of all article message
5074     IDs seen in the last N days, where the value of N is decided
5075     by the relayer's administrator but SHOULD be at least 7.
5076     Since N cannot practically be infinite, articles whose Date
5089     content indicates that they are older than N days are
5090     declared "stale" and are deemed to have been seen already.
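
NOTE: A minimal Python sketch of the duplicate test and the
"stale" rule, for illustration only. The class and method
names are hypothetical, the history is held in memory, and
the caller is assumed to have parsed the Date content into a
timezone-aware datetime already.

```python
from datetime import datetime, timedelta, timezone


class History:
    """Message IDs seen during the last keep_days days."""

    def __init__(self, keep_days=7):
        self.keep = timedelta(days=keep_days)
        self.seen = {}  # message ID -> time it was first seen

    def already_seen(self, message_id, article_date, now):
        """Return True if the article is stale or a duplicate."""
        if now - article_date > self.keep:
            return True  # stale: deemed to have been seen already
        if message_id in self.seen:
            return True
        self.seen[message_id] = now
        return False

    def expire(self, now):
        """Forget message IDs first seen more than keep_days ago."""
        cutoff = now - self.keep
        self.seen = {m: t for m, t in self.seen.items() if t >= cutoff}
```

Once an entry has been expired, any later copy of the same
article is older than N days and is caught by the stale
test instead.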
5091    
5092     NOTE: This check is important because news propa-
5093     gation topology is typically redundant, often
5094     highly so, and it is not at all uncommon for a
5095     relayer to receive the same article from several
5096     neighbors. The history of already-seen message
5097     IDs can get quite large, hence the desire to limit
5098     its length... but it is important that it be long
5099     enough that slowly-propagating articles are not
5100     classed as stale. News propagation within the
5101     Internet is normally very rapid, but when UUCP
5102     links are involved, end-to-end delays of several
5103     days are not rare, so a week is not a particularly
5104     generous minimum.
5105    
5106     NOTE: Despite generally more rapid propagation in
5107     recent times, it is still not unheard-of for some
5108     propagation paths to be very slow. This can
5109     introduce the possibility of old articles arriving
5110     again after they are gone from the history. Hence
5111     the "stale" rule.
5112    
5113     Third, the relayer MUST determine whether any of the arti-
5114     cle's newsgroups are "subscribed to" by the host, i.e. fit a
5115     description of what hierarchies or newsgroups the site wants
5116     to receive.
5117    
5118     NOTE: This check is significant because informa-
5119     tion on what newsgroups a relayer wishes to
5120     receive is often stored at its neighbors, who may
5121     not have up-to-date information or may simplify
5122     the rules for implementation reasons. As a hedge
5123     against the possibility of missed or delayed new-
5124     group control messages, relayers may wish to
5125     observe a notion of a newsgroup subscription that
5126     is independent of the list of newsgroups actually
5127     known to the relayer. This would permit reception
5128     and relaying of articles in newsgroups that the
5129     relayer is not (yet) aware of, subject to more
5130     general criteria indicating that they are likely
5131     to be of interest.
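
NOTE: How subscriptions are expressed is implementation-
specific. As one illustration, the Python sketch below
treats a subscription list as a set of newsgroup or
hierarchy names and accepts an article if any of its
newsgroups falls under one of them. The function name and
the matching rule are assumptions, not part of this Draft.

```python
def is_subscribed(newsgroups_content, subscriptions):
    """Return True if any newsgroup of the article is subscribed to.

    newsgroups_content: the comma-separated Newsgroups content.
    subscriptions: newsgroup or hierarchy names, e.g. ["comp", "rec.arts"].
    """
    def falls_under(group, wanted):
        # "comp" matches "comp" and "comp.lang.c" but not "computers".
        return group == wanted or group.startswith(wanted + ".")

    groups = [name.strip() for name in newsgroups_content.split(",")]
    return any(falls_under(g, w) for g in groups for w in subscriptions)
```

Because the test works on names alone, it accepts articles
in newsgroups the relayer has not yet heard of, as long as
they fall under a subscribed hierarchy.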
5132    
5133     Once an article has been accepted, it may be passed on to
5134     other relayers. The fundamental news propagation rule is a
5135     flooding algorithm: on receiving and accepting an article,
5136     send it to each neighboring relayer that is not already in
5137     its path list and is fed its newsgroup(s) and distribution(s).
5138    
5139     NOTE: The path list's role in loop prevention may
5140     appear relatively unimportant, given that looping
5141     articles would typically be rejected as duplicates
5142     anyway. However, the path list's role in
5155     preventing superfluous transmissions is not triv-
5156     ial. In particular, the path list is the only
5157     thing that prevents relayer X, on receiving an
5158     article from relayer Y, from sending it back to Y
5159     again. (Indeed, the usual symptom of confusion
5160     about relayer names is that incoming news loops
5161     back in this manner.) The looping articles would
5162     be rejected as duplicates, but doubling the commu-
5163     nications load on every news transmission path is
5164     not to be taken lightly!
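
NOTE: The flooding rule can be sketched in Python as below.
This is an illustration only. The function names and the
shape of the feed table are hypothetical, "!" is assumed as
the path delimiter, newsgroup matching is by name or
hierarchy prefix, and an article with no distributions is
assumed to be unrestricted.

```python
def feed_wants(feed, newsgroups, distributions):
    """Return True if a neighbor's feed covers the article."""
    group_ok = any(
        group == wanted or group.startswith(wanted + ".")
        for group in newsgroups
        for wanted in feed["groups"]
    )
    dist_ok = not distributions or any(
        d in feed["distributions"] for d in distributions
    )
    return group_ok and dist_ok


def relay_targets(path_content, newsgroups, distributions, feeds):
    """Return the sorted names of neighbors to send the article to."""
    on_path = set(path_content.split("!"))
    return sorted(
        name
        for name, feed in feeds.items()
        # A neighbor already in the path list has seen the article.
        if name not in on_path and feed_wants(feed, newsgroups, distributions)
    )
```

Note that the sketch makes no attempt at "anticipation": a
neighbor is skipped only if it is in the path list or is not
fed the article's newsgroups and distributions.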
5165    
5166     In general, relayers SHOULD not make propagation decisions
5167     by "anticipation": relayer X, noting that the article's path
5168     list already contains relayer Y, decides not to send it to
5169     relayer Z because X anticipates that Z will get the article
5170     by a better path. If that is generally true, then why is
5171     there a news feed from X to Z at all? In fact, the "better
5172     path" may be running slowly or may be down. News propaga-
5173     tion is very robust precisely because some redundant trans-
5174     mission is done "just in case". If it is imperative to
5175     limit unnecessary traffic on a path, use of NNTP [rrr] or
5176     ihave/sendme (see section 7.2) to pass articles only when
5177     necessary is better than arbitrary decisions not to pass
5178     articles at all.
5179    
5180     Anticipation is occasionally justified in special cases.
5181     Such cases should involve both (1) a cooperating subnet
5182     whose propagation paths are well-understood and well-
5183     monitored, with failures and slowdowns noticed and dealt
5184     with promptly, and (2) a persistent pattern of heavy unnec-
5185     essary traffic on a path that is either slow or costly. In
5186     addition, there should be some reason why neither NNTP nor
5187     ihave/sendme is suitable as a solution to the problem.
5188    
5189    
5190     9.3. Administrator Contact
5191    
5192     It is desirable to have a standardized contact address for a
5193     relayer's administrators, in the spirit of the "postmaster"
5194     address for mail administrators. Mail addressed to "news-
5195     master" on a relayer's host MUST go to the administrator(s)
5196     of that relayer. Mail addressed to "usenet" on the
5197     relayer's host SHOULD be handled likewise. Mail addressed
5198     to either address on other hosts using the same news
5199     database SHOULD be handled likewise.
5200    
5201     NOTE: These addresses are case-sensitive, although
5202     it would be desirable for sequences equivalent to
5203     them using case-insensitive comparison to be han-
5204     dled likewise. While "newsmaster" seems the pre-
5205     ferred network-independent address, by analogy to
5206     "postmaster", there is an existing practice of
5207     using "usenet" for this purpose, and so "usenet"
5208     should be supported if at all possible (especially
5221     on hosts belonging to Usenet!). The address
5222     "news" is also sometimes used for purposes like
5223     this, but less consistently.
5224    
5225    
5226     10. Gatewaying
5227    
5228     Gatewaying of traffic between news networks using this Draft
5229     and those using other exchange mechanisms can be useful, but
5230     must be done cautiously. Gateway administrators are taking
5231     on significant responsibilities, and must recognize that the
5232     consequences of error can be quite serious.
5233    
5234    
5235     10.1. General Gatewaying Issues
5236    
5237     This section will primarily address the problems of gateway-
5238     ing traffic INTO news networks. Little can be said about
5239     the other direction without some specific knowledge of the
5240     network(s) involved. However, the two issues are not
5241     entirely independent: if a non-news network is gatewayed
5242     into a news network at more than one point, traffic injected
5243     into the non-news network by one gateway may appear at
5244     another as a candidate for injection back into the news net-
5245     work.
5246    
5247     This raises a more general principle, the single most impor-
5248     tant issue for gatewaying:
5249    
5250     Above all, prevent loops.
5251    
5252     The normal loop prevention of news transmission is vitally
5253     dependent on the Message-ID header. Any gateway which finds
5254     it necessary to remove this header, alter it, or supersede
5255     it (by moving it into the body), MUST take equally effective
5256     precautions against looping.
5257    
5258     NOTE: There are few things more effective at turn-
5259     ing news readers into a lynch mob than a malfunc-
5260     tioning gateway, or pair of gateways, that takes
5261     in news articles, mangles them just enough to pre-
5262     vent news relayers from recognizing them as dupli-
5263     cates, and regurgitates them back into the news
5264     stream. This happens rather too often.
5265    
5266     Gateway implementors should realize that gateways have all
5267     the responsibilities of relayers, plus the added complica-
5268     tions introduced by transformations between different infor-
5269     mation formats. Much of section 9's discussion of relayer
5270     issues is relevant to gateways as well. In particular,
5271     gateways SHOULD keep a history of recently-seen articles, as
5272     described in section 9.2, and not assume that articles will
5273     never reappear. This is particularly important for networks
5274     that have their own concept analogous to message IDs: a
5287     gateway should keep a history of traffic seen from BOTH
5288     directions.
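
As a rough illustration of keeping history in both directions, a gateway might record an identifier for every item it passes and consult one shared table regardless of which side the item arrived from. The class name, the retention period, and the in-memory table are assumptions of this sketch, not requirements of this Draft.

```python
import time


class GatewayHistory:
    """Remembers identifiers recently seen on either side of a gateway."""

    def __init__(self, retention_seconds=14 * 24 * 3600, clock=time.time):
        self.retention = retention_seconds  # hypothetical default: two weeks
        self.clock = clock
        # identifier -> time first seen. One table serves both directions,
        # so an item that arrived from news is still recognized when it
        # comes back from the non-news network, and vice versa.
        self.seen = {}

    def expire(self):
        """Forget entries older than the retention period."""
        cutoff = self.clock() - self.retention
        self.seen = {k: t for k, t in self.seen.items() if t >= cutoff}

    def should_pass(self, identifier):
        """Return True only the first time an identifier is offered."""
        self.expire()
        if identifier in self.seen:
            return False
        self.seen[identifier] = self.clock()
        return True
```

A real gateway would keep this table on disk and would need an identifier that survives the transformations applied on both sides.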
5289    
5290     If at all possible, articles entering the non-news network
5291     SHOULD be marked in some way so that they will NOT be re-
5292     gatewayed back into news. Multiple gateways obviously must
5293     agree on the marking method used; if it is done by having
5294     them know each others' names, name changes MUST be coordi-
5295     nated with great care. If marking cannot be done, all
5296     transformations MUST be reversible so that a re-gatewayed
5297     article is identical to the original (except perhaps for a
5298     longer Path header).
5299    
5300     Gateways MUST NOT pass control messages (articles containing
5301     Control, Also-Control, or Supersedes headers) without remov-
5302     ing the headers that make them control messages, unless
5303     there are compelling reasons to believe that they are rele-
5304     vant to both sides and that conventions are compatible. If
5305     it is truly desirable to pass them unaltered, suitable pre-
5306     cautions MUST be taken to ensure that there is NO POSSIBIL-
5307     ITY of a looping control message.
5308    
5309     NOTE: The damage done by looping articles is mul-
5310     tiplied a thousandfold if one of the affected
5311     articles is something like a sendsys message (see
5312     section 7.3) that requests multiple automatic
5313     replies. Most gateways simply should not pass
5314     control messages at all. If some unusual reason
5315     dictates doing so, gateway implementors and admin-
5316     istrators are urged to consider bulletproof rate-
5317     limiting measures for the more destructive ones
5318     like sendsys, e.g. passing only one per hour no
5319     matter how many are offered.
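
The "one per hour" suggestion can be sketched as a per-type limiter that drops excess messages rather than queueing them. The class and parameter names are hypothetical, and the one-hour default simply echoes the example in the NOTE above.

```python
import time


class ControlRateLimiter:
    """Passes at most one message of each control type per interval."""

    def __init__(self, interval_seconds=3600, clock=time.time):
        self.interval = interval_seconds
        self.clock = clock
        self.last_passed = {}  # control type -> time one was last passed

    def allow(self, control_type):
        """Return True if a message of this type may be passed now."""
        now = self.clock()
        last = self.last_passed.get(control_type)
        if last is not None and now - last < self.interval:
            return False  # one was already passed within the interval
        self.last_passed[control_type] = now
        return True
```

Dropping rather than delaying is a deliberate choice in this sketch: a queue of looping sendsys messages would merely postpone the damage.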
5320    
5321     Gateways, like relayers, SHOULD make determined efforts to
5322     avoid mangling articles unnecessarily. In the case of gate-
5323     ways, some transformations may be inevitable, but keeping
5324     them to a minimum and ensuring that they are reversible is
5325     still highly desirable.
5326    
5327     Gateways MUST avoid destroying information. In particular,
5328     the restrictions of section 4.2.2 are best taken with a
5329     grain of salt in the context of gateways. Information that
5330     does not translate directly into news headers SHOULD be
5331     retained, perhaps in "X-" headers, both because it may be of
5332     interest to sophisticated readers and because it may be cru-
5333     cial to tracing propagation problems.
5334    
5335     Gateway implementors should take particular note of the dis-
5336     cussion of mailed replies, or more precisely the ban on
5337     same, in section 9.1. Gateway problems MUST be reported to
5338     the local administration, not to the innocent originator of
5339     traffic. "Gateway problems" here includes all forms of
5340     propagation anomaly on the non-news side of the gateway,
5353     e.g. unreachable addresses on a mailing list. Note that
5354     this requires consideration of possible misbehavior of
5355     "downstream" hosts, not just the gateway host.
5356    
5357    
5358     10.2. Header Synthesis
5359    
5360     News articles prepared by gateways MUST be legal news arti-
5361     cles. In particular, they MUST include all of the mandatory
5362     headers (see section 5) and MUST fully conform to the
5363     restrictions on said headers. This often requires that a
5364     gateway function not only as a relayer, but also partly as a
5365     posting agent, aiding in the synthesis of a conforming arti-
5366     cle from non-conforming input.
5367    
5368     NOTE: The full-conformance requirement needs par-
5369     ticularly careful attention when gatewaying mail-
5370     ing lists to news, because a number of constructs
5371     that are legal in MAIL headers are NOT permissible
5372     in news headers. (Note also that not all mail
5373     traffic fully conforms to even the MAIL specifica-
5374     tion.) The rest of this section will be phrased
5375     in terms of mail-to-news gatewaying, but most of
5376     it is more generally applicable.
5377    
5378     The mandatory headers generally present few problems.
5379    
5380     If no date information is available, the gateway should sup-
5381     ply a Date header with the gateway's current date. If only
5382     partial information is available (e.g. date but not time),
5383     this should be fleshed out to a full Date header by adding
5384     default values, not by mixing in parts of the gateway's cur-
5385     rent date. (Defaults should be chosen so that fleshed-out
5386     dates will not be in the future!) It may be necessary to
5387     map timezone information to the restricted forms permitted
5388     in the news Date header. See section 5.1.
5389    
5390     NOTE: The prohibition of mixing dates is on the
5391     theory that it is better to admit ignorance than
5392     to lie.
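
A minimal sketch of fleshing out a partial date with fixed defaults follows. The function name is hypothetical. Treating the input as GMT and defaulting missing fields to zero are assumptions; midnight is the earliest moment of the given day, but a gateway would still need to confirm that the result is not in the future and that the layout satisfies section 5.1.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def flesh_out_date(year, month, day, hour=None, minute=None, second=None):
    """Build a full date string, filling missing time fields with defaults.

    Missing fields become zero; nothing is borrowed from the gateway's
    own clock, so the result never mixes original and local information.
    """
    dt = datetime(
        year, month, day,
        hour if hour is not None else 0,
        minute if minute is not None else 0,
        second if second is not None else 0,
        tzinfo=timezone.utc,  # assumption: the original date is taken as GMT
    )
    return format_datetime(dt, usegmt=True)
```

For example, flesh_out_date(1982, 11, 19) yields "Fri, 19 Nov 1982 00:00:00 GMT".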
5393    
5394     If the author's address as supplied in the original message
5395     is not suitable for inclusion in a From header, the gateway
5396     MUST transform it so it is, e.g. by use of the "% hack" and
5397     the domain address of the gateway. The desire to preserve
5398     information is NOT an excuse for violating the rules. If
5399     the transformation is drastic enough that there is reason to
5400     suspect loss of information, it may be desirable to include
5401     the original form in an X- header, but the From header's
5402     contents MUST be as specified in section 5.2.
5403    
5404     If the message contains a Message-ID header, the contents
5405     should be dealt with as discussed in section 10.3. If there
5406     is no message ID present, it will be necessary to synthesize
5419     one, following the news rules (see section 5.3).
5420    
5421     Every effort should be made to produce a meaningful Subject
5422     header; see section 5.4. Many news readers select articles
5423     to read based on Subject headers, and inserting a place-
5424     holder like "<no subject available>" is considered highly
5425     objectionable. Even synthesizing a Subject header by pick-
5426     ing out the first half-dozen nouns and adjectives in the
5427     article body is better than using a placeholder, since it
5428     offers SOME indication of what the article might contain.
5429    
5430     The contents of the Newsgroups header (section 5.5) are usu-
5431     ally predetermined by gateway configuration, but a gateway
5432     to a network that has its own concept of newsgroups or dis-
5433     cussions might have to make transformations. Such transfor-
5434     mations should be reversible; otherwise confusion is likely
5435     on both sides.
5436    
5437     It will rarely be possible for gateways to provide a Path
5438     header that is both an accurate history of the relayers the
5439     article has passed through AS NEWS and a usable reply
5440     address. The history function MUST be given priority; see
5441     the discussion in section 5.6. It will usually be necessary
5442     for a gateway to supply an empty path list, abandoning the
5443     reply function.
5444    
5445     It is desirable for gatewayed articles to convey as much
5446     useful information as possible, e.g. by use of optional news
5447     headers (see section 6) when the relevant information is
5448     available. Synthesis of optional headers can generally fol-
5449     low similar rules.
5450    
5451     Software synthesizing References headers should note the
5452     discussion in section 6.5 concerning the incompatibility
5453     between MAIL and news. Also of interest is the possibility
5454     of incorporating information from In-Reply-To headers and
5455     from attribution lines in the body; an incomplete or some-
5456     what conjectural References header is much better than none
5457     at all, and reading agents already have to cope with incom-
5458     plete or slightly erroneous References lists.
5459    
5460    
5461     10.3. Message ID Mapping
5462    
5463     This section, like the previous one, is phrased in terms of
5464     mail being gatewayed into news, but most of the discussion
5465     should be more generally applicable.
5466    
5467     A particularly sticky problem of gatewaying mail into news
5468     is supplying legal news message IDs. Note, in particular,
5469     that not all MAIL message IDs are legal in news; the news
5470     syntax (specified in section 5.3, with related material in
5471     5.2) is more restrictive. Generating a fully-conforming
5472     news article from a mail message may require transforming
5485     the message ID somewhat.
5486    
5487     Generation and transformation of message IDs assumes partic-
5488     ular importance if a given mailing list (or whatever) is
5489     being handled by more than one gateway. It is highly desir-
5490     able that the same article contents not appear twice in the
5491     same newsgroup, which requires that they receive the same
5492     message ID from all gateways. Gateways SHOULD use the fol-
5493     lowing algorithm (possibly modified by the later discussion
5494     of gatewaying into more than one newsgroup) unless local
5495     considerations dictate another:
5496    
5497     1. Separate message ID from surroundings, if necessary.
5498     A plausible method for this is to start at the first
5499     "<", end at the next ">", and reject the message if
5500     no ">" is found or a second "<" is seen before the
5501     ">". Also reject the message if the message ID con-
5502     tains no "@" or more than one "@", or if it contains
5503     no ".". Also reject the message if the message ID
5504     contains non-ASCII characters, ASCII control charac-
5505     ters, or white space.
5506    
5507     NOTE: Any legitimate domain will include at
5508     least one ".". RFC 822 section 6.2.2 forbids
5509     white space in this context when passing mail
5510     on to non-MAIL software.
5511    
5512     2. Delete the leading "<" and trailing ">". Separate
5513     message ID into local part and domain at the "@".
5514    
5515     3. In both components, transliterate leading dots
5516     (".", ASCII 46), trailing dots, and dots after the
5517     first in sequences of two or more consecutive
5518     dots, into underscores (ASCII 95).
5519    
5520     4. In both components, transliterate disallowed char-
5521     acters other than dots (see the definition of
5522     <unquoted-char> in section 5.2) to underscores
5523     (ASCII 95).
5524    
5525     5. Form the message ID as
5526    
5527     "<" local-part "@" domain ">"
5528    
5529    
5530     NOTE: This algorithm is approximately that of Rich
5531     Salz's successful gatewaying package.
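
The five steps above can be sketched as follows. This is an illustration only, not the code of any existing gatewaying package. The function names are hypothetical, and the DISALLOWED set is an assumed reading of <unquoted-char> that should be checked against section 5.2.

```python
import re

# Assumption: characters other than "." that <unquoted-char> excludes.
# Verify this set against section 5.2 before relying on it.
DISALLOWED = set('!()<>@,;:\\"[]')


def extract_message_id(field):
    """Step 1: isolate the text between "<" and ">", or return None."""
    start = field.find("<")
    if start < 0:
        return None
    end = field.find(">", start + 1)
    if end < 0:
        return None                      # no closing ">"
    inner = field[start + 1:end]
    if "<" in inner:
        return None                      # second "<" before the ">"
    if inner.count("@") != 1 or "." not in inner:
        return None
    # Reject control characters, white space, and non-ASCII characters.
    if any(ord(c) <= 32 or ord(c) >= 127 for c in inner):
        return None
    return inner


def fix_component(text):
    """Steps 3 and 4: transliterate offending characters to underscores."""
    text = re.sub(r"^\.+", lambda m: "_" * len(m.group()), text)   # leading
    text = re.sub(r"\.+$", lambda m: "_" * len(m.group()), text)   # trailing
    # Dots after the first in a run of consecutive dots.
    text = re.sub(r"\.(\.+)", lambda m: "." + "_" * len(m.group(1)), text)
    return "".join("_" if c in DISALLOWED else c for c in text)


def map_message_id(field):
    """Return a transformed message ID, or None if the input is rejected."""
    inner = extract_message_id(field)
    if inner is None:
        return None
    local, domain = inner.split("@")     # step 2: exactly one "@" remains
    return "<" + fix_component(local) + "@" + fix_component(domain) + ">"
```

Because the mapping depends only on the input, two gateways running it on the same mail message arrive at the same news message ID.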
5532    
5533     Despite the desire to keep message IDs consistent across
5534     multiple gateways, there is also a more subtle issue that
5535     can require a different approach. If the same articles are
5536     being gatewayed into more than one newsgroup, and it is not
5537     possible to arrange that all gateways gateway them to the
5538     same cross-posted set of newsgroups, then the message IDs in
5551     the different newsgroups MUST be DIFFERENT.
5552    
5553     NOTE: Otherwise, arrival of an article in one
5554     newsgroup will prevent it from appearing in
5555     another, and which newsgroup a particular article
5556     appears in will be an accident of which direction
5557     it arrives from first. It is very difficult to
5558     maintain a coherent discussion when each partici-
5559     pant sees a randomly-selected 50% of the traffic.
5560     The fundamental problem here is that the basic
5561     assumption behind message IDs is being violated:
5562     the gateways are assigning the same message ID to
5563     articles that differ in an important respect
5564     (Newsgroups header).
5565    
5566     In such cases, it is suggested that the newsgroup name, or
5567     an agreed-on abbreviation thereof, be prepended to the local
5568     part of the message ID (with a separating ".") by the gate-
5569     way. This will ensure that multiple gateways generate the
5570     same message ID, while also ensuring that different news-
5571     groups can be read independently.
5572    
5573     NOTE: It is preferable to have the gateway(s)
5574     cross-post the article, avoiding the issue alto-
5575     gether, but this may not be feasible, especially
5576     if one newsgroup is widespread and the other is
5577     purely local.
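
The suggested prefixing can be sketched in a few lines. The function name is hypothetical, and the input is assumed to be a message ID that is already legal for news.

```python
def prefix_message_id(message_id, newsgroup):
    """Prepend the newsgroup name and a "." to the local part."""
    if not (message_id.startswith("<") and message_id.endswith(">")):
        raise ValueError("expected a message ID in angle brackets")
    # The local part begins right after the "<", so the prefix goes there.
    return "<" + newsgroup + "." + message_id[1:]
```

For example, prefix_message_id("<1234@host.example>", "comp.lang.c") gives "<comp.lang.c.1234@host.example>", and every gateway feeding that newsgroup produces the same result.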
5578    
5579    
5580     10.4. Mail to and from News
5581    
5582     Gatewaying mail to news, and vice-versa, is the most obvious
5583     form of news gatewaying. It is common to set up gateways
5584     between news and mail rather too casually.
5585    
5586     It is hard to go very wrong in gatewaying news into a mail-
5587     ing list, except for the non-trivial matter of making sure
5588     that error reports go to the local administration rather
5589     than to the authors of news articles. (This requires atten-
5590     tion to the "envelope address" as well as to the message
5591     headers.) Doing the reverse connection correctly is much
5592     harder than it looks.
5593    
5594     NOTE: In particular, just feeding the mail message
5595     to "inews -h" or the equivalent is NOT, repeat
5596     NOT, adequate to gateway mail to news. Signifi-
5597     cant gatewaying software is necessary to do it
5598     right. Not all headers of mail messages conform
5599     to even the MAIL specifications, never mind the
5600     stricter rules for news.
5601    
5602     It is useful to distinguish between two different forms of
5603     mail-to-news gatewaying: gatewaying a mailing list into a
5604     newsgroup, and operating a "post-by-mail" service in which
5617     individual articles can be posted to a newsgroup by mailing
5618     them to a specific address. In the first case, the message
5619     is already being "broadcast", and the situation can be
5620     viewed as gatewaying one form of news into another. The
5621     second case is closer to that of a moderator posting submis-
5622     sions to a moderated newsgroup.
5623    
5624     In either case, the discussions in the preceding two sec-
5625     tions are relevant, as is the Hippocratic Principle of sec-
5626     tion 9. However, some additional considerations are spe-
5627     cific to mail-to-news gatewaying.
5628    
5629     As mentioned in section 6, point-to-point headers like To
5630     and Cc SHOULD NOT appear as such in news, although it is
5631     suggested that they be transformed to "X-" headers, e.g. X-
5632     To and X-Cc, to preserve their information content for pos-
5633     sible use by readers or troubleshooters. The Received
5634     header is entirely specific to MAIL and SHOULD be deleted
5635     completely during gatewaying, except perhaps for the
5636     Received header supplied by the gateway host itself.
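
A minimal sketch of this header treatment follows, assuming the headers have already been parsed into (name, value) pairs. The function name is hypothetical. For simplicity it drops every Received header, including the gateway's own, which the text above permits but does not require.

```python
def gateway_mail_headers(headers):
    """Rename point-to-point headers and drop Received headers.

    headers is a list of (name, value) pairs in their original order.
    """
    renamed = {"to": "X-To", "cc": "X-Cc"}
    result = []
    for name, value in headers:
        key = name.lower()               # header names compare case-insensitively
        if key == "received":
            continue                     # specific to MAIL; not carried into news
        result.append((renamed.get(key, name), value))
    return result
```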
5637    
5638     The Sender header is a tricky case, one where mailing-list
5639     and post-by-mail practice should differ. For gatewaying
5640     mailing lists, the mailing-list host should be considered a
5641     relayer, and the From and Sender headers supplied in its
5642     transmissions left strictly untouched. For post-by-mail, as
5643     for a moderator posting a mailed submission, the Sender
5644     header should reflect the poster rather than the author. If
5645     a post-by-mail gateway receives a message with its own
5646     Sender header, it might wish to preserve the content in an
5647     X-Sender header.
5648    
5649     It will generally be necessary to transform between mail's
5650     In-Reply-To/References convention and news's References/See-
5651     Also convention, to preserve correct semantics of cross ref-
5652     erences. This also requires attention when going the other
5653     way, from news to mail. See the discussion of the differ-
5654     ence in section 6.5.
5655    
5656    
5657     10.5. Gateway Administration
5658    
5659     Any news system will benefit from an attentive administra-
5660     tor, preferably assisted by automated monitoring for anoma-
5661     lies. This is particularly true of gateways. Gateway soft-
5662     ware SHOULD be instrumented so that unusual occurrences,
5663     such as sudden massive surges in traffic, are reported
5664     promptly. It is desirable, in fact, to go further: gateway
5665     software SHOULD endeavour to limit damage in the event that
5666     the administrator does not respond promptly.
5667    
5668     NOTE: For example, software might limit the gate-
5669     waying rate by queueing incoming traffic and emp-
5670     tying the queue at a finite maximum rate (well
5683     below the maximum that the host is capable of!)
5684     which is set by the administrator and is not
5685     raised automatically.
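
The queueing idea in the NOTE above might look like the following sketch. The class name and the notion of a periodic "tick" are assumptions; the essential property is that the limit is a fixed number chosen by the administrator and never adjusted by the software.

```python
from collections import deque


class ThrottledQueue:
    """Holds incoming traffic and releases it at a bounded rate."""

    def __init__(self, max_per_tick):
        self.max_per_tick = max_per_tick  # set by the administrator only
        self.pending = deque()

    def offer(self, article):
        """Accept an incoming article for later gatewaying."""
        self.pending.append(article)

    def drain(self):
        """Call once per tick; releases at most max_per_tick articles."""
        released = []
        while self.pending and len(released) < self.max_per_tick:
            released.append(self.pending.popleft())
        return released
```

A growing backlog in such a queue is itself a useful anomaly to report to the administrator.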
5686    
5687     Traffic gatewayed into a news network SHOULD include a suit-
5688     able header, perhaps X-Gateway-Administrator, giving an
5689     electronic address that can be used to report problems.
5690     This SHOULD be an address that goes direct to a human, not
5691     to a "routine administrative issues" mailbox that is exam-
5692     ined only occasionally, since the point is to be able to
5693     reach the administrator quickly in an emergency. Gateway
5694     administrators SHOULD arrange substitutes to cover gateway
5695     operation (with suitable redirection of mail) when they are
5696     on vacation etc.
5697    
5698    
5699     11. Security And Related Issues
5700    
5701     Although the interchange format itself raises no significant
5702     security issues, the wider context does.
5703    
5704    
5705     11.1. Leakage
5706    
5707     The most obvious form of security problem with news is
5708     "leakage" of articles which are intended to have only
5709     restricted circulation. The flooding algorithm is EXTREMELY
5710     good at finding any path by which articles can leave a sub-
5711     net with supposedly-restrictive boundaries. Substantial
5712     administrative effort is required to ensure that local news-
5713     groups remain local, unless connections to the outside world
5714     are tightly restricted.
5715    
5716     A related problem is that the sendme control message can be
5717     used to ask for any article by its message ID. The useful-
5718     ness of this has declined as message-ID generation algo-
5719     rithms have become less predictable, but it remains a poten-
5720     tial problem for "secure" newsgroups. Hosts with such news-
5721     groups may wish to disable the sendme control message
5722     entirely.
5723    
5724     The sendsys, version, and whogets control messages also
5725     allow "outsiders" to request information from "inside",
5726     which may reveal details of internal topology (etc.) that
5727     are considered confidential. (Note that at least limited
5728     openness about such matters may be a condition of membership
5729     in such networks, e.g. Usenet.)
5730    
5731     Organizations wishing to control these forms of leakage are
5732     strongly advised to designate a small number of "official
5733     gateway" hosts to handle all news exchange with the outside
5734     world, so that a bounded amount of administrative effort is
5735     needed to control propagation and eliminate problems.
5736     Attempts to keep news out entirely, by refusing to support
5749     an official gateway, typically result in large numbers of
5750     unofficial partial gateways appearing over time. Such a
5751     configuration is much more difficult to troubleshoot.
5752    
5753     A somewhat-related problem is the possibility of proprietary
5754     material being disclosed unintentionally by a poster who
5755     does not realize how far his words will propagate, either
5756     from sheer misunderstanding or because of errors made (by
5757     human or software) in followup preparation. There is little
5758     that can be done about this except education.
5759    
5760    
5761     11.2. Attacks
5762    
5763     Although the limitations of the medium restrict what can be
5764     done to attack a host via news, some possibilities exist,
5765     most of them problems news shares with mail.
5766    
5767     If reading agents are careless about transmitting non-
5768     printable characters to output devices, malicious posters
5769     may post articles containing control sequences ("letter-
5770     bombs") meant to have various destructive effects on output
5771     devices. Possible effects depend on the device, but they
5772     can include hardware damage (e.g. by repeated writing of
5773     values into configuration memories that can tolerate only a
5774     limited number of write cycles) and security violation (e.g.
5775     by reprogramming function keys potentially used by privi-
5776     leged readers).
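
One way for a reading agent to defuse such sequences is to display control characters visibly instead of transmitting them. The sketch below is an illustration under stated assumptions: the function name is hypothetical, only newline and tab are passed through unchanged, and the choice of caret notation and hexadecimal escapes is arbitrary.

```python
def make_safe_for_terminal(text):
    """Render control characters visibly instead of sending them raw."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\n\t":
            out.append(ch)                    # ordinary layout characters
        elif code < 32 or code == 127:
            out.append("^" + chr(code ^ 64))  # caret notation, e.g. ESC as ^[
        elif 128 <= code <= 159:
            out.append("\\x%02x" % code)      # 8-bit control characters
        else:
            out.append(ch)
    return "".join(out)
```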
5777    
5778     A more sophisticated variation on the letterbomb is inclu-
5779     sion of "Trojan horses" in programs. Obviously, readers
5780     must be cautious about using software found in news, but
5781     more subtly, reading agents must also exercise care. MIME
5782     messages can include material that is executable in some
5783     sense, such as PostScript documents (which are programs!),
5784     and letterbombs may be introduced into such material.
5785    
5786     Given the presence of finite resources and other software
5787     limitations, some degree of system disruption can be
5788     achieved by posting otherwise-innocent material in great
5789     volume, either in single huge articles (see section 4.6) or
5790     in a stream of modest-sized articles. (Some would say that
5791     the steady growth of Usenet volume constitutes a subtle and
5792     unintentional attack of the latter type; certainly it can
5793     have disruptive effects if administrators are inattentive.)
5794     Systems need some ability to cope with surges, because sin-
5795     gle huge articles occur occasionally as the result of soft-
5796     ware error, innocent misunderstanding, or deliberate malice,
5797     and downtime at upstream hosts can cause droughts, followed
5798     by floods, of legitimate articles. (There is also a certain
5799     amount of normal variation; for example, Usenet traffic is
5800     noticeably lighter on weekends and during Christmas holi-
5801     days, and rises noticeably at the start of the school term
5802     of North American universities.) However, a site that
5815     normally receives little traffic may be quite vulnerable to
5816     "swamping" attack if its software is insufficiently careful.
5817    
5818     In general, careless implementation may open doors that are
5819     not intrinsic to news. In particular, implementation of
5820     control messages (see sections 6.6 and 7) and unbatchers
5821     (see sections 8.1 and 8.2) via a command interpreter requires
5822     substantial precautions to ensure that only the intended
5823     capabilities are available. Care must also be taken that
5824     article-supplied text is not fed to programs that have
5825     escapes to command interpreters.
5826    
5827     Finally, there is considerable potential for malice in the
5828     sendsys, version, and whogets control messages. They are
5829     not harmful to the hosts receiving them as news, but they
5830     can be used to enlist those hosts (by the thousands) as
5831     unwitting allies in a mail-swamping attack on a victim who
5832     may not even receive news. The precautions discussed in
5833     section 7.5 can reduce the potential for such attacks con-
5834     siderably, but the hazard cannot be eliminated as long as
5835     these control messages exist.
5836    
5837    
5838     11.3. Anarchy
5839    
5840     The highly distributed nature of news propagation, and the
5841     lack of adequate authentication protocols (especially for
5842     use over the less-interactive transport mechanisms such as
5843     UUCP), make article forgery relatively straightforward. It
5844     may be possible to at least track a forgery to its source,
5845     once it is recognized as such, but clever forgers can make
5846     even that relatively difficult. The assumption that forg-
5847     eries will be recognized as such is also not to be taken for
5848     granted; readers are notoriously prone to blindly assuming
5849     authenticity. If a forged article's initial path list
5850     includes the relayer name of the supposed poster's host, the
5851     article will never be sent to that host, and the alleged
5852     author may learn about the forgery secondhand or not at all.
5853    
5854     A particularly noxious form of forgery is the forged "can-
5855     cel" control message. Notably, it is relatively straight-
5856     forward to write software that will automatically send out a
5857     (forged) cancel message for any article meeting some crite-
5858     rion, e.g. written by a specific author. The authentication
5859     problems discussed in section 7.1 make it difficult to solve
5860     this without crippling cancel's important functionality.
5861    
5862     A related problem is the possibility of disagreements over
5863     newsgroup creation, on networks where such things are not
5864     decided by central authorities. There have been cases of
5865     "rmgroup wars", where one poster persistently sends out new-
5866     group messages to create a newsgroup and another, equally
5867     persistently, sends out rmgroup messages asking that it be
5868     removed. This is not particularly damaging, if relayers are
5881     configured to be cautious, but can cause serious confusion
5882     among innocent third parties who just want to know whether
5883     they can use the newsgroup for communication or not.
5884    
5885    
5886     11.4. Liability
5887    
5888     News shares the legal uncertainty surrounding other forms of
5889     electronic communication: what rules apply to this new
5890     medium of information exchange? News is a particularly
5891     problematic case because it is a broadcast medium rather
5892     than a point-to-point one like mail, and analogies to older
5893     forms of communication are particularly weak.
5894    
5895     Are news-carrying hosts common carriers, like the phone com-
5896     panies, providing communications paths without having either
5897     authority over or responsibility for content? Or are they
5898     publishers, responsible for the content regardless of
5899     whether they are aware of it or not? Or something in
5900     between? Such questions are particularly significant when
5901     the content is technically criminal, e.g. some types of sex-
5902     ually-oriented material in some jurisdictions, in which case
5903     ignorance of its presence may not be an adequate defence.
5904    
5905     Even in milder situations such as libel or copyright viola-
5906     tion, the responsibilities of the poster, his host, and
5907     other hosts carrying the traffic are unclear. Note, in par-
5908     ticular, the problems arising when the article is a forgery,
5909     or when the alleged author claims it is a forgery but cannot
5910     prove this.
5911    
5912    
5913     A. Archeological Notes
5914    
5915    
5916     A.1. A-News Article Format
5917    
5918     The obsolete "A News" article format consisted of exactly
5919     five lines of header information, followed by the body. For
5920     example:
5921    
5922     Aeagle.642
5923     news.misc
5924     cbosgd!mhuxj!mhuxt!eagle!jerry
5925     Fri Nov 19 16:14:55 1982
5926     Usenet Etiquette - Please Read
5927     body
5928     body
5929     body
5930    
5931     The first line consisted of an "A" followed by an article ID
5932     (analogous to a message ID and used for similar purposes).
5933     The second line was the list of newsgroups. The third line
5934     was the path. The fourth was the date, in the format above
5947     (all fields fixed width), resembling an Internet date but
5948     not quite the same. The fifth was the subject.
5949    
5950     This format is documented for archeological purposes only.
5951     Do not generate articles in this format.
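
For anyone who needs to READ old archives in this format (never to generate it), the five-line layout described above can be taken apart as in the following sketch. The function name and the dictionary keys are hypothetical, and the newsgroups line is returned unsplit because its separator is not described here.

```python
def parse_a_news(text):
    """Split an A News article into its five header fields and body."""
    lines = text.split("\n")
    if len(lines) < 5 or not lines[0].startswith("A"):
        raise ValueError("not an A News article")
    return {
        "article_id": lines[0][1:],   # text after the leading "A"
        "newsgroups": lines[1],
        "path": lines[2],
        "date": lines[3],
        "subject": lines[4],
        "body": "\n".join(lines[5:]),
    }
```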
5952    
5953    
5954     A.2. Early B-News Article Format
5955    
5956     The obsolete pseudo-Internet article format, used briefly
5957     during the transition between the A News format and the mod-
5958     ern format, followed the general outline of a MAIL message
5959     but with some non-standard headers. For example:
5960    
5961     From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)
5962     Newsgroups: news.misc
5963     Title: Usenet Etiquette -- Please Read
5964     Article-I.D.: eagle.642
5965     Posted: Fri Nov 19 16:14:55 1982
5966     Received: Fri Nov 19 16:59:30 1982
5967     Expires: Mon Jan 1 00:00:00 1990
5968    
5969     body
5970     body
5971     body
5972    
5973     The From header contained the information now found in the
5974     Path header, plus possibly the full name now typically found
5975     in the From header. The Title header contained what is now
5976     the Subject content. The Posted header contained what is
5977     now the Date content. The Article-I.D. header contained an
5978     article ID, analogous to a message ID and used for similar
5979     purposes. The Newsgroups and Expires headers were approxi-
5980     mately as now. The Received header contained the date when
5981     the latest relayer to process the article first saw it. All
5982     dates were in the above format, with all fields fixed width,
5983     resembling an Internet date but not quite the same.
5984    
5985     This format is documented for archeological purposes only.
5986     Do not generate articles in this format.
5987    
5988    
5989     A.3. Obsolete Headers
5990    
5991     Early versions of news software following the modern format
5992     sometimes generated headers like the following:
5993    
5994     Relay-Version: version B 2.10 2/13/83; site cbosgd.UUCP
5995     Posting-Version: version B 2.10 2/13/83; site eagle.UUCP
5996     Date-Received: Friday, 19-Nov-82 16:59:30 EST
5997    
5998     Relay-Version contained version information about the
5999     relayer that last processed the article. Posting-Version
6000     contained version information about the posting agent that
6013     posted the article. Date-Received contained the date when
6014     the last relayer to process the article first saw it (in a
6015     slightly nonstandard format).
6016    
6017     These headers are documented for archeological purposes
6018     only. Do not generate articles using them.
6019    
6020    
6021     A.4. Obsolete Control Messages
6022    
6023     There once was a senduuname control message, resembling
6024     sendsys but requesting transmission of the list of hosts
6025     that the receiving host had UUCP connections to. This
6026     rapidly ceased to be of much use, and many organizations
6027     consider information about their internal connectivity to be
6028     confidential.
6029    
6030     Historically, a checkgroups body consisting of one or two
6031     lines, the first of the form "-n newsgroup", caused check-
6032     groups to apply to only that single newsgroup. This form is
6033     documented for archeological purposes only; do not use it.
6034    
6035     Historically, an article posted to a newsgroup whose name
6036     had exactly three components of which the third was "ctl"
6037     signified that the article was to be taken as a control message.
6038     The Subject header specified the actions, in the same way
6039     the Control header does now. This form is documented for
6040     archeological purposes only; do not use it; do not implement
6041     it.
6042    
6043    
6044     B. A Quick Tour Of MIME
6045    
6046     (The editor wishes to thank Luc Rooijakkers; most of this
6047     appendix is a lightly-edited version of a summary he kindly
6048     supplied.)
6049    
6050     MIME (Multipurpose Internet Mail Extensions) is an upward-
6051     compatible set of extensions to RFC 822, currently docu-
6052     mented in RFCs 1341 and 1342. This appendix summarizes
6053     these documents. See the MIME RFCs for more information;
6054     they are very readable.
6055    
6056     UNRESOLVED ISSUE: These RFC numbers (here and
6057     elsewhere in this Draft) need updating when the
6058     new MIME RFCs come out.
6059    
6060     MIME defines the following new headers:
6061    
6079     MIME-Version
6080     Content-Type
6081     Content-Transfer-Encoding
6082     Content-ID
6083     Content-Description
6084    
6085    
6086     The MIME-Version header is mandatory for all messages con-
6087     forming to the MIME specification and carries the version
6088     number of the MIME specification. Example:
6089    
6090     MIME-Version: 1.0
6091    
6092    
6093     The Content-Type header indicates the content type of the
6094     message. Content types are split into a top-level type and
6095     a subtype, separated by a slash. Auxiliary information can
6096     also be supplied, using an attribute-value notation. Exam-
6097     ple:
6098    
6099     Content-Type: text/plain; charset=us-ascii
6100    
6101     (In the absence of a Content-Type header this is in fact the
6102     default content type.)
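
As a rough illustration of the type/subtype split and the attribute-value notation, the following sketch reads the example header with the Python standard library's email package. The two message texts are made up for the example; the calls used (message_from_string, get_content_maintype, get_content_subtype, get_param, get_content_type) are existing standard-library functions.

```python
from email import message_from_string

article = message_from_string(
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "body\n"
)
main_type = article.get_content_maintype()   # top-level type
subtype = article.get_content_subtype()      # subtype after the slash
charset = article.get_param("charset")       # auxiliary attribute

# With no Content-Type header at all, the default type is reported.
bare = message_from_string("Subject: no MIME headers\n\nbody\n")
default_type = bare.get_content_type()
```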
6103    
6104     Important type/subtype combinations are
6105    
6106     text/plain              Plain text, possibly in a non-
6107                             ASCII character set.
6108
6109     text/enriched           A very simple wordprocessor-like
6110                             language supporting character
6111                             attributes (e.g., underlining),
6112                             justification control, and multi-
6113                             ple character sets. (This pro-
6114                             posal has gone through several
6115                             iterations and has recently split
6116                             off from the main MIME RFCs into a
6117                             separate document.)
6118
6119     message/rfc822          A mail message conforming to a
6120                             slightly-relaxed version of RFC
6121                             822.
6122
6123     message/partial         Part of a message (supporting the
6124                             transparent splitting and joining
6125                             of messages when they are too
6126                             large to be handled by some trans-
6127                             port agent).
6128
6129     message/external-body   A message whose body is external.
6130                             Possible access methods include
6131                             via mail, FTP, local file, etc.
6132    
6145     multipart/mixed         A message whose body consists of
6146                             multiple parts, possibly of dif-
6147                             ferent types, intended to be
6148                             viewed in serial order. Each part
6149                             looks like an RFC 822 message,
6150                             consisting of headers and a body.
6151                             Most of the RFC 822 headers have
6152                             no defined semantics for body
6153                             parts.
6154
6155     multipart/parallel      Likewise, except that the parts
6156                             are intended to be viewed in par-
6157                             allel (on user agents that support
6158                             it).
6159
6160     multipart/alternative   Likewise, except that the parts
6161                             are intended to be semantically
6162                             equivalent such that the part that
6163                             best matches the capabilities of
6164                             the environment should be dis-
6165                             played. For example, a message
6166                             may include plain-text, enriched-
6167                             text, and PostScript versions of
6168                             some document.
6169
6170     multipart/digest        A variant of multipart/mixed espe-
6171                             cially intended for message
6172                             digests (the default type of the
6173                             parts is message/rfc822 instead of
6174                             text/plain, saving on the number
6175                             of headers for the parts).
6176
6177     application/postscript  A PostScript document.
6178                             (PostScript is a trademark of
6179                             Adobe.)
6180    
6181     Other top-level types exist for still images, audio, and
6182     video samples.
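
To make the multipart/alternative idea concrete, here is a minimal sketch that builds a message with a plain-text part and an enriched-text part, using the EmailMessage class from the Python standard library. The subject and the two bodies are invented for the example, and the sketch shows one way a modern library models the structure rather than anything this Draft requires.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Example"

# First alternative: plain text (the simplest representation).
msg.set_content("Plain text version.\n")

# Second alternative: the same content as text/enriched.
# Adding it converts the message into multipart/alternative.
msg.add_alternative("<bold>Enriched</bold> text version.\n", subtype="enriched")

top_type = msg.get_content_type()
part_types = [part.get_content_type() for part in msg.iter_parts()]
```

A reading agent that cannot render text/enriched would fall back to the text/plain part.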
6183    
6184     Some of the above types require the ability to transport
6185     binary data. Since the existing message systems usually do
6186     not support this, MIME provides a Content-Transfer-Encoding
6187     header to indicate the kind of encoding used. The possible
6188     encodings are:
6189    
6190     7bit              No encoding; the data consists of short
6191                       (less than 1000 characters) lines of
6192                       7-bit ASCII data, delimited by EOL
6193                       sequences. This is the default encod-
6194                       ing.
6195
6196     8bit              Like 7bit, except that bytes with the
6197                       high-order bit set may be present.
6198                       Many transmission paths are incapable
6211                       of carrying messages which use this
6212                       encoding.
6213
6214     binary            No encoding; any sequence of bytes may
6215                       be present. Many transmission paths
6216                       are incapable of carrying messages
6217                       which use this encoding.
6218
6219     base64            The data is encoded by representing
6220                       every group of 3 bytes as 4 characters
6221                       from the alphabet "A-Za-z0-9+/", which
6222                       was chosen for its high robustness
6223                       through mail gateways (the alphabet
6224                       used by uuencode does not survive
6225                       ASCII-EBCDIC-ASCII translations). In
6226                       the final group of 4 characters, "=" is
6227                       used for those characters not repre-
6228                       senting data bytes. Line length is
6229                       limited and EOLs in the encoded form
6230                       are ignored.
6231
6232     quoted-printable  Any byte can be represented by a three
6233                       character "=XX" sequence where the X's
6234                       are upper case hexadecimal digits.
6235                       Bytes representing printable 7-bit US-
6236                       ASCII characters except "=" may be rep-
6237                       resented literally. Tabs and blanks
6238                       may be represented literally if not at
6239                       the end of a line. Line length is lim-
6240                       ited, and an EOL preceded by "=" was
6241                       inserted for this purpose and is not
6242                       present in the original.
6243    
6244     The base64 and quoted-printable encodings are applied to
6245     data in Internet canonical form, which means that any EOL
6246     encoded as anything but EOL must be an Internet canonical
6247     EOL: CR followed by LF.
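
The two encodings can be tried out with the base64 and quopri modules of the Python standard library. The sketch below is illustrative only; the sample byte strings and the variable names are invented for the example.

```python
import base64
import quopri

# base64: every 3 input bytes become 4 characters; "=" pads a short
# final group.
b64_three = base64.b64encode(b"abc")
b64_two = base64.b64encode(b"ab")
b64_one = base64.b64encode(b"a")

# Text is converted to canonical form (CR LF line ends) before encoding,
# so the EOLs survive as encoded data.
canonical = "caf\u00e9\nmenu\n".replace("\n", "\r\n").encode("iso-8859-1")
b64_text = base64.encodebytes(canonical)   # also wraps long output lines

# quoted-printable: "=XX" for bytes that cannot be represented literally,
# including "=" itself.
qp_word = quopri.encodestring(b"caf\xe9")
qp_equals = quopri.encodestring(b"a=b")

# A soft line break ("=" followed by EOL) disappears on decoding.
qp_soft = quopri.decodestring(b"long=\nline")
```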
6248    
6249     The Content-Description header allows further description of
6250     a body part, analogous to the use of Subject for messages.
6251    
6252     Finally, the Content-ID header can be used to assign an
6253     identification to body parts, analogous to the assignment of
6254     identifications to messages by Message-ID.
6255    
6256     Note that most of these headers are structured header
6257     fields, as defined in RFC 822. Consequently, comments are
6258     allowed in their values. The following is a legal MIME
6259     header:
6260    
6261     Content-Type: (a comment) text (yeah) /
6262     plain (and now some params:) ; charset= (guess what)
6263     iso-8859-1 (we don't have iso-10646 yet, pity)
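
The practical consequence is that a reader of these headers must discard comments before interpreting the value. The following Python sketch shows one simplified way to do that for the header above. The helper names strip_comments and parse_content_type are hypothetical, and the parameter handling is deliberately naive: it assumes no ";" or whitespace inside quoted parameter values, which a full RFC 822 parser would have to handle.

```python
from typing import Dict, Tuple


def strip_comments(value: str) -> str:
    """Remove RFC 822 comments, which may nest, from a header value."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string, where "(" is literal
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and (depth or in_quotes) and i + 1 < len(value):
            if in_quotes:
                out.append(value[i:i + 2])   # keep the quoted pair
            i += 2
            continue
        if in_quotes:
            out.append(ch)
            in_quotes = ch != '"'
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif depth == 0:
            in_quotes = ch == '"'
            out.append(ch)
        i += 1
    return "".join(out)


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Return (type/subtype, parameters) from a Content-Type value."""
    fields = strip_comments(value).split(";")
    content_type = "".join(fields[0].split()).lower()
    params = {}
    for field in fields[1:]:
        name, _, param_value = field.partition("=")
        params[name.strip().lower()] = param_value.strip()
    return content_type, params


header_value = (
    "(a comment) text (yeah) /\n"
    " plain (and now some params:) ; charset= (guess what)\n"
    " iso-8859-1 (we don't have iso-10646 yet, pity)"
)
result = parse_content_type(header_value)
```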
6264    
6277     NOTE: Although the MIME specification was devel-
6278     oped for mail, there is nothing precluding its use
6279     for news as well. While it might simplify imple-
6280     mentation to restrict the MIME headers somewhat,
6281     in the same way that other news headers (e.g.
6282     From) are restricted subsets of the RFC-822 origi-
6283     nals, this would add yet another divergence
6284     between two formats that ought to be as compatible
6285     as possible. In the case of the MIME headers,
6286     there is no body of existing code posing compati-
6287     bility concerns. A full-featured MIME reading
6288     agent needs a full RFC-822 parser anyway, to prop-
6289     erly handle body parts of types like mes-
6290     sage/rfc822, so there is little gain from
6291     restricting MIME headers. Adopting the MIME spec-
6292     ification unchanged seems best. However, article-
6293     level MIME headers must still comply with the
6294     overall news header syntax given in section 4, so
6295     that news software which is NOT interested in MIME
6296     need not contain a full RFC-822 parser.
6297    
6298     The second part of MIME, RFC 1342 (Representation of Non-
6299     ASCII Text in Internet Message Headers), addresses the prob-
6300     lem of non-ASCII characters in headers. An example of a
6301     header using the RFC 1342 mechanism is
6302    
6303     From: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
6304    
6305     Such encodings are allowed in selected headers, subject to
6306     the restrictions listed in RFC 1342.
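
As an illustration, the example header content can be decoded with the decode_header and make_header functions of the Python standard library's email.header module. This is a sketch of what a reading agent might do, not a required procedure; the variable names are invented for the example.

```python
from email.header import decode_header, make_header

raw = "=?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>"

# decode_header splits the value into (bytes, charset) chunks;
# in the Q encoding, "=E9" is a byte in hex and "_" stands for a space.
parts = decode_header(raw)

# make_header reassembles the chunks into a displayable string.
decoded = str(make_header(parts))
```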
6307    
6308     The MIME effort has also produced an RFC defining a Content-
6309     MD5 header [rrr 1544], containing an MD5-based "checksum" of
6310     the contents of an article or body part, giving high confi-
6311     dence of detecting accidental modifications to the contents.
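
A minimal sketch of computing such a header value in Python follows. It relies on the fact that RFC 1544 defines the value as the base64 encoding of the 128-bit MD5 digest of the canonical-form content; the function name content_md5 and the sample bodies are invented for the example.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the base64 form of the MD5 digest of the given content."""
    digest = hashlib.md5(body).digest()          # 16 raw bytes
    return base64.b64encode(digest).decode("ascii")


original = b"Hello, news.\r\n"                   # canonical CR LF line end
header = "Content-MD5: " + content_md5(original)

# A single changed character yields a different checksum.
damaged = content_md5(b"Hello, mews.\r\n")
```

As the text above says, this guards against accidental modification only; anyone able to alter the content can also recompute the header.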
6312    
6313     The "metamail" software package [rrr] helps provide MIME
6314     support with minimal changes to mailers, and may also be
6315     relevant to news reading agents.
6316    
6317     The PEM (Privacy Enhanced Mail) effort is pursuing analogous
6318     facilities to offer stronger guarantees against malicious
6319     modifications, unauthorized eavesdropping, and forgery.
6320     This work too may be applicable to news, once it is recon-
6321     ciled with MIME (by efforts now underway).
6322    
6323    
6324     C. Summary of Changes Since RFC 1036
6325    
6326     This Draft is much longer than RFC 1036, so there is obvi-
6327     ously much change in content. Much of this is just
6328     increased precision and rigor. Noteworthy changes and addi-
6329     tions include:
6330    
6343     + section 4.3's restrictions on article bodies
6344    
6345     + all references to MIME facilities
6346    
6347     + size limits on articles
6348    
6349     + precise specification of Date-content syntax
6350    
6351     + message IDs must never be re-used, ever
6352    
6353     + "!" is the only Path delimiter
6354    
6355     + multiple moderators in the Approved header
6356    
6357     + rules on References trimming, and the _-_ mechanism
6358    
6359     + generalization of the Xref rules
6360    
6361     + multiple message IDs in Cancel and Supersedes
6362    
6363     + Also-Control
6364    
6365     + See-Also
6366    
6367     + Article-Names
6368    
6369     + Article-Replacing
6370    
6371     + more precise rules for cancellation
6372    
6373     + cancellation authorization based on From, not Sender
6374    
6375     + "unmoderated" and descriptors in newgroup messages
6376    
6377     + restrictive rules on handling of sendsys and version
6378     messages
6379    
6380     + the whogets control message
6381    
6382     + precise specification of checkgroups messages
6383    
6384     + compression type preferably specified out-of-band
6385    
6386     + rules for encapsulating news in MIME mail
6387    
6388     + tighter specification of relayer functioning (section
6389     9.1)
6390    
6391     + the "newsmaster" contact address
6392    
6393     + rules for gatewaying (section 10)
6394    
6395     + discussion of security issues (section 11)
6396    
6397    
6409     D. Summary of Completely New Features
6410    
6411     Most of this Draft merely documents existing practice, but
6412     there are a few attempts to extend it. These are:
6413    
6414     TBW
6415    
6416    
6417     E. Summary of Differences From RFC 822+1123
6418    
6419     The following are noteworthy differences between this
6420     Draft's articles and MAIL messages:
6421    
6422     + generally less-permissive header syntax
6423    
6424     + notably, limited From syntax
6425    
6426     + MAIL header comments allowed in only a few contexts
6427    
6428     + slightly more restricted message-ID syntax
6429    
6430     + several more mandatory headers
6431    
6432     + duplicate headers forbidden
6433    
6434     + References/See-Also versus In-Reply-To/References
6435     (section 6.5)
6436    
6437     + case sensitivity in some contexts
6438    
6439     + point-to-point headers, e.g. To and Cc, forbidden
6440     (section 6)
6441    
6442     + several new headers
6443    
6444    
6445     References
6446    
6447     [Sanderson] "Smileys", David Sanderson, O'Reilly & Associ-
6448     ates Ltd., 1993.
6449    
6450     TBW
6451    
6452    
6453     Security Considerations
6454    
6455     Section 11 discusses security considerations in detail.
6456    
6457    
6458     Author's Address
6459    
6475     Henry Spencer
6476     henry@zoo.toronto.edu
6477    
6478     SP Systems
6479     Box 280 Stn. A
6480     Toronto, Ont. M5W1B2 Canada