Contents of /www/2005/pre-id/son-of-rfc1036.txt

Revision 1.1
Fri Mar 11 05:36:57 2005 UTC by wakaba
Branch: MAIN
CVS Tags: HEAD
File MIME type: text/plain
New

1
2
3
4 INTERNET DRAFT to be NEWS sec. -
5
6
7
8
9
10 News Article Format and Transmission
11
12 Henry Spencer
13
14
15
16 Status of this Memo
17
18 This document is intended to become an Internet Draft.
19 Internet Drafts are working documents of the Internet Engi-
20 neering Task Force (IETF), its Areas, and its Working
21 Groups. Note that other groups may also distribute working
22 documents as Internet Drafts.
23
24 Internet Drafts are draft documents valid for a maximum of
25 six months. Internet Drafts may be updated, replaced, or
26 obsoleted by other documents at any time. It is not appro-
27 priate to use Internet Drafts as reference material or to
28 cite them other than as a "working draft" or "work in
29 progress".
30
31 Please check the I-D abstract listing contained in each
32 Internet Draft directory to learn the current status of this
33 or any other Internet Draft. (Actually, this draft is at
34 too early a stage to even be listed there yet.)
35
36 It is hoped that a later version of this Draft will obsolete
37 RFC 1036 and will become an Internet standard.
38
39 References to the "successor to this Draft" refer not to
40 later versions of this draft, but to a hypothetical future
41 rewrite of this Draft (in the same way that this Draft is a
42 rewrite of RFC 1036).
43
44 Distribution of this memo is unlimited.
45
46
47 Abstract
48
49 This Draft defines the format and procedures for interchange
50 of network news articles. It is hoped that a later version
51 of this Draft will obsolete RFC 1036, reflecting more recent
52 experience and accommodating future directions.
53
54 Network news articles resemble mail messages but are broad-
55 cast to potentially-large audiences, using a flooding algo-
56 rithm that propagates one copy to each interested host (or
57 group thereof), typically stores only one copy per host, and
58 does not require any central administration or systematic
59 registration of interested users. Network news originated
60 as the medium of communication for Usenet, circa 1980.
61
62
63
64 2 June 1994 - 1 - expires 15 July 1994
73 Since then Usenet has grown explosively, and many Internet
74 sites participate in it. In addition, the news technology
75 is now in widespread use for other purposes, on the Internet
76 and elsewhere.
77
78 This Draft primarily codifies and organizes existing prac-
79 tice. A few small extensions have been added in an attempt
80 to solve problems that are considered serious. Major exten-
81 sions (e.g. cryptographic authentication) that need signifi-
82 cant development effort are left to be undertaken as inde-
83 pendent efforts.
84
85
86 Table of Contents
87
88 TBW
89
90
91 1. Introduction
92
93 Network news articles resemble mail messages but are broad-
94 cast to potentially-large audiences, using a flooding algo-
95 rithm that propagates one copy to each interested host (or
96 groups thereof), typically stores only one copy per host,
97 and does not require any central administration or system-
98 atic registration of interested users. Network news origi-
99 nated as the medium of communication for Usenet, circa 1980.
100 Since then Usenet has grown explosively, and many Internet
101 sites participate in it. In addition, the news technology
102 is now in widespread use for other purposes, on the Internet
103 and elsewhere.
104
105 The earliest news interchange used the so-called "A News"
106 article format. Shortly thereafter, an article format
107 vaguely resembling Internet mail was devised and used
108 briefly. Both of those formats are completely obsolete;
109 they are documented in appendix A for historical reasons
110 only. With publication of RFC 850 [rrr] in 1983, news arti-
111 cles came to closely resemble Internet mail messages, with
112 some restrictions and some additional headers. RFC 1036
113 [rrr] in 1987 updated RFC 850 without making major changes.
114
115 In the intervening five years, the RFC 1036 article format
116 has proven quite satisfactory, although minor extensions
117 appear desirable to match recent developments in areas such
118 as multi-media mail. RFC 1036 itself has not proven quite
119 so satisfactory. It is often rather vague and does not
120 address some issues at all; this has caused significant
121 interoperability problems at times, and implementations have
122 diverged somewhat. Worse, although it was intended primar-
123 ily to document existing practice, it did not precisely
124 match existing practice even at the time it was published,
125 and the deviations have grown since.
126
139 This Draft attempts to specify the format of articles, and
140 the procedures used to exchange them and process them, in
141 sufficient detail to allow full interoperability. In addi-
142 tion, some tentative suggestions are made about directions
143 for future development, in an attempt to avert unnecessary
144 divergence and consequent loss of interoperability. Major
145 extensions (e.g. cryptographic authentication) that need
146 significant development effort are left to be undertaken as
147 independent efforts.
148
149 NOTE: One question this all may raise is: why is
150 there no News-Version header, analogous to MIME-
151 Version, specifying a version number corresponding
152 to this specification? The answer is: it doesn't
153 appear to be useful, given news's backward-
154 compatibility constraints. The major use of a
155 version number is indicating which of several
156 INCOMPATIBLE interpretations is relevant. The
157 impossibility of orchestrating any sort of simul-
158 taneous change over news's installed base makes it
159 necessary to avoid such incompatible changes (as
160 opposed to extensions) entirely. MIME has a ver-
161 sion number mostly because it introduced incompat-
162 ible changes to the interpretation of several
163 "Content-" headers. This Draft attempts no
164 changes in interpretation and it appears doubtful
165 that future Drafts will find it feasible to intro-
166 duce any.
167
168 UNRESOLVED ISSUE: Should this be reconsidered?
169 Only if the header has SPECIFIC IDENTIFIABLE uses
170 today. Otherwise it's just useless added bulk.
171
172 As in this Draft's predecessors, the exact means used to
173 transmit articles from one host to another is not specified.
174 NNTP [rrr] is probably the most common transmission method
175 on the Internet, but a number of others are known to be in
176 use, including the UUCP protocol [rrr] extensively used in
177 the early days of Usenet and still much used on its fringes
178 today.
179
180 Several of the mechanisms described in this Draft may seem
181 somewhat strange or even bizarre at first reading. As with
182 Internet mail, there is no reasonable possibility of updat-
183 ing the entire installed base of news software promptly, so
184 interoperability with old software is crucial and will
185 remain so. Compatibility with existing practice and robust-
186 ness in an imperfect world necessarily take priority over
205 elegance.
206
207
208 2. Definitions, Notations, and Conventions
209
210
211 2.1. Textual Notations
212
213 Throughout this Draft, "MAIL" is short for "RFC 822 [rrr] as
214 amended by RFC 1123 [rrr]". (RFC 1123's amendments are
215 mostly relatively small, but they are not insignificant.)
216 See also the discussion in section 3 about this Draft's
217 relationship to MAIL. "MIME" is short for "RFCs 1341 and
218 1342" (or their updated replacements).
219
220 UNRESOLVED ISSUE: Update these numbers.
221
222 "ASCII" is short for "the ANSI X3.4 character set" [rrr].
223 While "ASCII" is often misused to refer to various character
224 sets somewhat similar to X3.4, in this Draft, "ASCII" means
225 X3.4 and only X3.4.
226
227 NOTE: The name is traditional (to the point where
228 the ANSI standard sanctions it) even though it is
229 no longer an acronym for the name of the standard.
230
231 NOTE: ASCII, X3.4, contains 128 characters, not
232 all of them printable. Character sets with more
233 characters are not ASCII, although they may
234 include it as a subset.
235
236 Certain words used to define the significance of individual
237 requirements are capitalized. "MUST" means that the item is
238 an absolute requirement of the specification. "SHOULD"
239 means that the item is a strong recommendation: there may be
240 valid reasons to ignore it in unusual circumstances, but
241 this should be done only after careful study of the full
242 implications and a firm conclusion that it is necessary,
243 because there are serious disadvantages to doing so. "MAY"
244 means that the item is truly optional, and implementors and
245 users are warned that conformance is possible but not to be
246 relied on.
247
248 The term "compliant", applied to implementations etc., indi-
249 cates satisfaction of all relevant "MUST" and "SHOULD"
250 requirements. The term "conditionally compliant" indicates
251 satisfaction of all relevant "MUST" requirements but viola-
252 tion of at least one relevant "SHOULD" requirement.
253
254 This Draft contains explanatory notes using the following
255 format. These may be skipped by persons interested solely
256 in the content of the specification. The purpose of the
257 notes is to explain why choices were made, to place them in
258 context, or to suggest possible implementation techniques.
259
271 NOTE: While such explanatory notes may seem super-
272 fluous in principle, they often help the less-
273 than-omniscient reader grasp the purpose of the
274 specification and the constraints involved. Given
275 the limitations of natural language for descrip-
276 tive purposes, this improves the probability that
277 implementors and users will understand the true
278 intent of the specification in cases where the
279 wording is not entirely clear.
280
281 All numeric values are given in decimal unless otherwise
282 indicated. Octets are assumed to be unsigned values for
283 this purpose. Large numbers are written using the North
284 American convention, in which "," separates groups of three
285 digits but otherwise has no significance.
286
287
288 2.2. Syntax Notation
289
290 Although the mechanisms specified in this Draft are all
291 described in prose, most are also described formally in the
292 modified BNF notation of RFC 822. Implementors will need to
293 be familiar with this notation to fully understand this
294 specification, and are referred to RFC 822 for a complete
295 explanation of the modified BNF notation. Here is a brief
296 illustrative example:
297
     sentence = clause *( punct clause ) "."
     punct    = ":" / ";"
     clause   = 1*word [ "(" clause ")" / "," 1*word ]
     word     = <any English word>
302
303 This defines a sentence as some clauses separated by puncts
304 and ended by a period, a punct as a colon or semicolon, a
305 clause as at least one <word> optionally followed by either
306 a parenthesized clause or a comma and at least one more
307 <word>, and a <word> as (informally) any English word. <>
308 are used to enclose names when (and only when) distinguish-
309 ing them from surrounding text is useful. The full form of
310 the repetition notation is <m>"*"<n><thing>, denoting <m>
311 through <n> repetitions of <thing>; <m> defaults to zero,
312 <n> to infinity, and the "*" and <n> can be omitted if <m>
313 and <n> are equal, so 1*word is one or more words, 1*5word
314 is one through five words, and 2word is exactly two words.
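
As an informal illustration of the repetition notation just described, the following Python sketch splits a single item such as "1*5word" into its minimum count, maximum count, and repeated thing. The function name parse_repetition and the use of None to stand for "infinity" are assumptions made for this example only; they are not part of RFC 822 or of this Draft.

```python
import re

# Optional <m>, optional "*", optional <n>, then the repeated thing.
_ITEM = re.compile(r"(\d*)(\*)?(\d*)(\D.*)")


def parse_repetition(item):
    """Return (minimum, maximum, thing); maximum is None for 'infinity'."""
    match = _ITEM.fullmatch(item)
    if match is None:
        raise ValueError("not a repetition item: %r" % item)
    low, star, high, thing = match.groups()
    if star:
        # <m> defaults to zero and <n> to infinity when "*" is present.
        minimum = int(low) if low else 0
        maximum = int(high) if high else None
    else:
        # Without "*", <m> and <n> are equal; a bare thing occurs once.
        minimum = maximum = int(low) if low else 1
    return minimum, maximum, thing
```

For instance, parse_repetition("2word") yields a minimum and maximum of 2, matching the statement that 2word is exactly two words.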
315
316 The character "\" is not special in any way in this nota-
317 tion.
318
319 This Draft is intended to be self-contained; all syntax
320 rules used in it are defined within it, and a rule with the
321 same name as one found in MAIL does not necessarily have the
322 same definition. The lexical layer of MAIL is NOT, repeat
323 NOT, used in this Draft, and its presence must not be
324 assumed; notably, this Draft spells out all places where
337 white space is permitted/required and all places where con-
338 structs resembling MAIL comments can occur.
339
340 NOTE: News parsers historically have been much
341 less permissive than MAIL parsers.
342
343
344 2.3. Definitions
345
346 The term "character set", wherever it is used in this Draft,
347 refers to a coded character set, in the sense of ISO charac-
348 ter set standardization work, and must not be misinterpreted
349 as meaning merely "a set of characters".
350
351 In this Draft, ASCII character 32 is referred to as "blank";
352 the word "space" has a more generic meaning.
353
354 An "article" is the unit of news, analogous to a MAIL "mes-
355 sage".
356
357 A "poster" is a human being (or software equivalent) submit-
358 ting a possibly-compliant article to be "posted": made
359 available for reading on all relevant hosts. A "posting
360 agent" is software that assists posters to prepare articles,
361 including determining whether the final article is compli-
362 ant, passing it on to a relayer for posting if so, and
363 returning it to the poster with an explanation if not. A
364 "relayer" is software which receives allegedly-compliant
365 articles from posting agents and/or other relayers, files
366 copies in a "news database", and possibly passes copies on
367 to other relayers.
368
369 NOTE: While the same software may well function
370 both as a relayer and as part of a posting agent,
371 the two functions are distinct and should not be
372 confused. The posting agent's purpose is (in
373 part) to validate an article, supply header infor-
374 mation that can or should be supplied automati-
375 cally, and generally take reasonable actions in an
376 attempt to transform the poster's submission into
377 a compliant article. The relayer's purpose is to
378 move already-compliant articles around efficiently
379 without damaging them.
380
381 A "reader" is a human being reading news articles. A "read-
382 ing agent" is software which presents articles to a reader.
383
384 NOTE: Informal usage often uses "reader" for both
385 these meanings, but this introduces considerable
386 potential for confusion and misunderstanding, so
387 this Draft takes care to make the distinction.
388
389 A "newsgroup" is a single news forum, a logical bulletin
390 board, having a name and nominally intended for articles on
403 a specific topic. An article is "posted to" a single news-
404 group or several newsgroups. When an article is posted to
405 more than one newsgroup, it is said to be "cross-posted";
406 note that this differs from posting the same text as part of
407 each of several articles, one per newsgroup. A "hierarchy"
408 is the set of all newsgroups whose names share a first com-
409 ponent (see the name syntax in section 5.5).
410
411 A newsgroup may be "moderated", in which case submissions
412 are not posted directly, but mailed to a "moderator" for
413 consideration and possible posting. Moderators are typi-
414 cally human but may be implemented partially or entirely in
415 software.
416
417 A "followup" is an article containing a response to the con-
418 tents of an earlier article (the followup's "precursor"). A
419 "followup agent" is a combination of reading agent and post-
420 ing agent that aids in the preparation and posting of a fol-
421 lowup.
422
423 Text comparisons are "case-sensitive" if they consider
424 uppercase letters (e.g. "A") different from lowercase let-
425 ters (e.g. "a"), and "case-insensitive" if letters differing
426 only in case (e.g. "A" and "a") are considered identical.
427 Categories of text are said to be case-(in)sensitive if com-
428 parisons of such texts to others are case-(in)sensitive.
429
430 A "cooperating subnet" is a set of news-exchanging hosts
431 which is sufficiently well-coordinated (typically via a cen-
432 tral administration of some sort) that stronger assumptions
433 can be made about hosts in the set than about news hosts in
434 general. This is typically used to relax restrictions which
435 are otherwise required for worst-case interoperability; mem-
436 bers of a cooperating subnet MAY interchange articles that
437 do not conform to this Draft's specifications, provided all
438 members have agreed to this and provided the articles are
439 not permitted to leak out of the subnet. The word "subnet"
440 is used to emphasize that a cooperating subnet is typically
441 not an isolated universe; care must be taken that traffic
442 leaving the subnet complies with the restrictions of the
443 larger net, not just those of the cooperating subnet.
444
445 A "message ID" is a unique identifier for an article, usu-
446 ally supplied by the posting agent which posted it. It dis-
447 tinguishes the article from every other article ever posted
448 anywhere (in theory). Articles with the same message ID are
449 treated as identical copies of the same article even if they
450 are not in fact identical.
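
The practical consequence of the message ID definition is that a relayer can reject duplicates by remembering which message IDs it has already seen. The Python sketch below illustrates the idea under simplifying assumptions: the class name SeenArticles is hypothetical, the record is held in memory rather than in a persistent history file, and message IDs are compared as exact strings (the precise comparison rules are not given in this part of the Draft).

```python
class SeenArticles:
    """Remembers message IDs so that repeat arrivals can be discarded."""

    def __init__(self):
        self._seen = set()

    def accept(self, message_id):
        """Return True for a first arrival, False for a duplicate."""
        if message_id in self._seen:
            # Same message ID: treated as the same article, even if
            # the two copies differ in content.
            return False
        self._seen.add(message_id)
        return True
```

A relayer built along these lines would file and forward an article only when accept() returns True.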
451
452 A "gateway" is software which receives news articles and
453 converts them to messages of some other kind (e.g. mail to a
454 mailing list), or vice-versa; in essence it is a translating
455 relayer that straddles boundaries between different methods
456 of message exchange. The most common type of gateway
469 connects newsgroup(s) to mailing list(s), either unidirec-
470 tionally or bidirectionally, but there are also gateways
471 between news networks using this Draft's news format and
472 those using other formats.
473
474 A "control message" is an article which is marked as con-
475 taining control information; a relayer receiving such an
476 article will (subject to permissions etc.) take actions
477 beyond just filing and passing on the article.
478
479 NOTE: "Control article" would be more consistent
480 terminology, but "control message" is already well
481 established.
482
483 An article's "reply address" is the address to which mailed
484 replies should be sent. This is the address specified in
485 the article's From header (see section 5.2), unless it also
486 has a Reply-To header (see section 6.3).
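
The reply-address rule can be sketched in a few lines of Python. This is an illustration only: it assumes the headers have already been parsed into a dictionary keyed by lowercased header name (a hypothetical representation), and it returns the raw header content without extracting the mail address from it.

```python
def reply_address(headers):
    """Return the content of Reply-To if present, else that of From.

    `headers` maps lowercased header names to header contents.
    An empty Reply-To is treated like an absent one.
    """
    return headers.get("reply-to") or headers["from"]
```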
487
488 The notation (e.g.) "(ASCII 17)" following a name means
489 "this name refers to the ASCII character having value 17".
490 An "ASCII printable character" is an ASCII character in the
491 range 33-126. An "ASCII control character" is an ASCII
492 character in the range 0-31, or the character DEL (ASCII
493 127). A "non-ASCII character" is a character having a value
494 exceeding 127.
495
496 NOTE: Blank is neither an "ASCII printable charac-
497 ter" nor an "ASCII control character".
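
The character classes defined above partition the possible values as shown in the following Python sketch. The function name classify and the returned labels are invented for this example.

```python
def classify(value):
    """Classify a character value using the terms defined in this section."""
    if value < 0:
        raise ValueError("character values are unsigned")
    if value > 127:
        return "non-ASCII"
    if value == 32:
        return "blank"          # neither printable nor control
    if value <= 31 or value == 127:
        return "control"        # 0-31, plus DEL
    return "printable"          # 33-126
```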
498
499
500 2.4. End Of Line
501
502 How the end of a text line is represented depends on the
503 context and the implementation. For Internet transmission
504 via protocols such as SMTP [rrr], an end-of-line is a CR
505 (ASCII 13) followed by an LF (ASCII 10). ISO C [rrr] and
506 many modern operating systems indicate end-of-line with a
507 single character, typically ASCII LF (aka "newline"), and
508 this is the normal convention when news is transmitted via
509 UUCP. A variety of other methods are in use, including out-
510 of-band methods in which there is no specific character that
511 means end-of-line.
512
513 This Draft does not constrain how end-of-line is represented
514 in news, except that characters other than CR and LF MUST
515 NOT be usurped for use in end-of-line representations.
516 Also, obviously, all software dealing with a particular copy
517 of an article must agree on the convention to be used.
518 "EOL" is used to mean "whatever end-of-line representation
519 is appropriate"; it is not necessarily a character or
520 sequence of characters.
521
535 NOTE: If faced with picking an EOL representation
536 in the absence of other constraints, use of a sin-
537 gle character simplifies processing, and the ASCII
538 standard [rrr] specifies that if one character is
539 to be used for this purpose, it should be LF
540 (ASCII 10).
541
542 NOTE: Inside MIME encodings, use of the Internet
543 canonical EOL representation (CR followed by LF)
544 is mandatory. See [rrr].
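
As an illustration of moving between two of the EOL representations mentioned above, the Python sketch below converts between the CR LF form used by Internet protocols and a single-LF local form. The function names are hypothetical, and the handling of a bare CR or bare LF arriving in wire-format data is deliberately left out, since this section does not say how such input should be treated.

```python
def wire_to_local(data):
    """Convert CR LF line endings to a single LF."""
    return data.replace(b"\r\n", b"\n")


def local_to_wire(data):
    """Convert LF line endings to CR LF.

    Existing CR LF pairs are collapsed first so they are not doubled.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```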
545
546
547 2.5. Case-Sensitivity
548
549 Text in newsgroup names, header parameters, etc. is case-
550 sensitive unless stated otherwise.
551
552 NOTE: This is at variance with MAIL, which is
553 case-insensitive unless stated otherwise, but is
554 consistent with news historical practice and
555 existing news software. See the comments on back-
556 ward compatibility in section 1.
557
558
559 2.6. Language
560
561 Various constant strings in this Draft, such as header names
562 and month names, are derived from English words. Despite
563 their derivation, these words do NOT change when the poster
564 or reader employing them is interacting in a language other
565 than English. Posting and reading agents SHOULD translate
566 as appropriate in their interaction with the poster or
567 reader, but the forms that actually appear in articles are
568 always the English-derived ones defined in this Draft.
569
570
571 3. Relation To MAIL (RFC 822 etc.)
572
573 The primary intent of this Draft is to completely describe
574 the news article format as a subset of MAIL's message format
575 augmented by some new headers. Unless explicitly noted oth-
576 erwise, the intent throughout is that an article MUST also
577 be a valid MAIL message.
578
579 NOTE: Despite obvious similarities between news
580 and mail, opinions vary on whether it is possible
581 or desirable to unify them into a single service.
582 However, it is unquestionably both possible and
583 useful to employ some of the same tools for manip-
584 ulating both mail messages and news articles, so
585 there is specific advantage to be had in defining
586 them compatibly. Furthermore, there is no appar-
587 ent need to re-invent the wheel when slight exten-
588 sions to an existing definition will suffice.
589
601 Given that this Draft attempts to be self-contained, it
602 inevitably contains considerable repetition of information
603 found in MAIL. This raises the possibility of unintentional
604 conflicts. Unless specifically noted otherwise, any wording
605 in this Draft which permits behavior that is not MAIL-
606 compliant is erroneous and should be followed only to the
607 extent that the result remains compliant with MAIL.
608
609 NOTE: RFC 1036 said "where this standard conflicts
610 with [RFC 822], RFC-822 should be considered cor-
611 rect and this standard in error". Taken liter-
612 ally, this was obviously incorrect, since RFC 1036
613 imposed a number of restrictions not found in RFC
614 822. The intent, however, was reasonable: to
615 indicate that UNINTENTIONAL differences were
616 errors in RFC 1036.
617
618 Implementors and users should note that MAIL is deliberately
619 an extensible standard, and most extensions devised for mail
620 are also relevant to (and compatible with) news. Note par-
621 ticularly MIME [rrr], summarized briefly in appendix B,
622 which extends MAIL in a number of useful ways that are defi-
623 nitely relevant to news. Also of note is the work in
624 progress on reconciling PEM (Privacy Enhanced Mail, which
625 defines extensions for authentication and security) with
626 MIME, after which this may also be relevant to news.
627
628 UNRESOLVED ISSUE: Update the MIME/PEM information.
629
630 Similarly, descriptions here of MIME facilities should be
631 considered correct only to the extent that they do not
632 require or legitimize practices that would violate those
633 RFCs. (Note that this Draft does extend the application of
634 some MIME facilities, but this is an extension rather than
635 an alteration.)
636
637
638 4. Basic Format
639
640
641 4.1. Overall Syntax
642
643 The overall syntax of a news article is:
644
     article         = 1*header separator body
     header          = start-line *continuation
     start-line      = header-name ":" space [ nonblank-text ] eol
     continuation    = space nonblank-text eol
     header-name     = 1*name-character *( "-" 1*name-character )
     name-character  = letter / digit
     letter          = <ASCII letter A-Z or a-z>
     digit           = <ASCII digit 0-9>
     separator       = eol
     body            = *( [ nonblank-text / space ] eol )
     eol             = <EOL>
     nonblank-text   = [ space ] text-character *( space-or-text )
     text-character  = <any ASCII character except NUL (ASCII 0),
                         HT (ASCII 9), LF (ASCII 10), CR (ASCII 13),
                         or blank (ASCII 32)>
     space           = 1*( <HT (ASCII 9)> / <blank (ASCII 32)> )
     space-or-text   = space / text-character
684
685 An article consists of some headers followed by a body. An
686 empty line separates the two. The headers contain struc-
687 tured information about the article and its transmission. A
688 header begins with a header name identifying it, and can be
689 continued onto subsequent lines by beginning the continua-
690 tion line(s) with white space. (Note that section 4.2.3
691 adds some restrictions to the header syntax indicated here.)
692 The body is largely-unstructured text significant only to
693 the poster and the readers.
694
695 NOTE: Terminology here follows the current custom
696 in the news community, rather than the MAIL con-
697 vention of (sometimes) referring to what is here
698 called a "header" as a "header field" or "field".
699
700 Note that the separator line must be truly empty, not just a
701 line containing white space. Further empty lines following
702 it are part of the body, as are empty lines at the end of
703 the article.
704
705 NOTE: Some systems make no distinction between
706 empty lines and lines consisting entirely of white
707 space; indeed, some systems cannot represent
708 entirely empty lines. The grammar's requirement
709 that header continuation lines contain some print-
710 able text is meant to ensure that the empty/space
711 distinction cannot confuse identification of the
712 separator line.
713
714 NOTE: It is tempting to authorize posting agents
715 to strip empty lines at the beginning and end of
716 the body, but such empty lines could possibly be
717 part of a preformatted document.
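
The separator rule can be illustrated with a short Python sketch that splits an article into its header lines and body lines. It assumes, purely for the example, that EOL is represented by a single LF and that the article is held in a string; the name split_article is hypothetical. Note that only a truly empty line is taken as the separator, and that empty or white-space-only lines in the body are kept untouched.

```python
def split_article(text):
    """Split an article into (header_lines, body_lines).

    Lines are returned without their EOL. Raises ValueError when no
    truly empty separator line is present.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()             # artifact of the final EOL, not a line
    try:
        separator = lines.index("")
    except ValueError:
        raise ValueError("no empty separator line found") from None
    return lines[:separator], lines[separator + 1:]
```

A line holding only a blank or tab is not equal to the empty string, so it never ends the headers in this sketch.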
718
719 Implementors are warned that trailing white space, whether
720 alone on the line or not, MAY be significant in the body,
733 notably in early versions of the "uuencode" encoding for
734 binary data. Trailing white space MUST be preserved unless
735 the article is known to have originated within a cooperating
736 subnet that avoids using significant trailing white space,
737 and SHOULD be preserved regardless. Posters SHOULD avoid
738 using conventions or encodings which make trailing white
739 space significant; for encoding of binary data, MIME's
740 "base64" encoding is recommended. Implementors are warned
741 that ISO C implementations are not required to preserve
742 trailing white space, and special precautions may be neces-
743 sary in implementations which do not.
744
745 NOTE: Unfortunately, the signature-delimiter con-
746 vention (described in section 4.3.2) does use sig-
747 nificant trailing white space. It's too late to
748 fix this; there is work underway on defining an
749 organized signature convention as part of MIME,
750 which is a preferable solution in the long run.
751
752 Posters are warned that some very old relayer software mis-
753 behaves when the first non-empty line of an article body
754 begins with white space.
755
756
757 4.2. Headers
758
759
760 4.2.1. Names and Contents
761
762 Despite the restrictions on header-name syntax imposed by
763 the grammar, relayers and reading agents SHOULD tolerate
764 header names containing any ASCII printable character other
765 than colon (":", ASCII 58).
766
767 NOTE: MAIL header names can contain any ASCII
768 printable character (other than colon) in theory,
769 but in practice, arbitrary header names are known
770 to cause trouble for some news software. Section
771 4.1's restriction to alphanumeric sequences sepa-
772 rated by hyphens is believed to permit all widely-
773 used header names without causing problems for any
774 widely-used software. Software is nevertheless
775 encouraged to cope correctly with the full range
776 of possibilities, since aberrations are known to
777 occur.
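
The difference between the header names the grammar allows and the wider set that relayers and reading agents are asked to tolerate can be expressed as two patterns. The Python sketch below is illustrative only; the function names is_strict_name and is_tolerated_name are assumptions of this example.

```python
import re

# Section 4.1 grammar: alphanumeric runs separated by single hyphens.
_STRICT = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

# Any ASCII printable character (33-126) other than colon (58).
_TOLERATED = re.compile(r"[!-9;-~]+")


def is_strict_name(name):
    """True if the name satisfies the header-name grammar."""
    return _STRICT.fullmatch(name) is not None


def is_tolerated_name(name):
    """True if receiving software should still tolerate the name."""
    return _TOLERATED.fullmatch(name) is not None
```

Under this sketch a posting agent would apply the strict test to what it generates, while software receiving articles would apply the tolerant one.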
778
779 Relayers MUST disregard headers not described in this Draft
780 (that is, with header names not mentioned in this Draft),
781 and pass them on unaltered.
782
783 Posters wishing to convey non-standard information in head-
784 ers SHOULD use header names beginning with "X-". No stan-
785 dard header name will ever be of this form. Reading agents
786 SHOULD ignore "X-" headers, or at least treat them with
799 great care.
800
801 The order of headers in an article is not significant. How-
802 ever, posting agents are encouraged to put mandatory headers
803 (see section 5) first, followed by optional headers (see
804 section 6), followed by headers not defined in this Draft.
805
806 NOTE: While relayers and reading agents must be
807 prepared to handle any order, having the signifi-
808 cant headers (the precise definition of "signifi-
809 cant" depends on context) first can noticeably
810 improve efficiency, especially in memory-limited
811 environments where it is difficult to buffer up an
812 arbitrary quantity of headers while searching for
813 the few that matter.
814
815 Header names are case-insensitive. There is a preferred
816 case convention, which posters and posting agents SHOULD
817 use: each hyphen-separated "word" has its initial letter (if
818 any) in uppercase and the rest in lowercase, except that
819 some abbreviations have all letters uppercase (e.g. "Mes-
820 sage-ID" and "MIME-Version"). The forms used in this Draft
821 are the preferred forms for the headers described herein.
822 Relayers and reading agents are warned that articles might
823 not obey this convention.
824
825 NOTE: Although software must be prepared for the
826 possibility of random use of case in header names
827 (and other case-independent text), establishing a
828 preferred convention reduces pointless diversity,
829 and may permit optimized software that looks for
830 the preferred forms before resorting to less-
831 efficient case-insensitive searches.
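
A minimal Python sketch of the case rules follows. The set of all-uppercase abbreviations contains only the two visible in the examples above ("ID" and "MIME"); treating that set as complete is an assumption of this example, as are the function names.

```python
# Abbreviations written entirely in uppercase (assumed list, taken
# from the "Message-ID" and "MIME-Version" examples).
_UPPERCASE_WORDS = {"ID", "MIME"}


def preferred_case(name):
    """Rewrite a header name in the preferred case convention."""
    words = []
    for word in name.split("-"):
        upper = word.upper()
        if upper in _UPPERCASE_WORDS:
            words.append(upper)
        else:
            words.append(word[:1].upper() + word[1:].lower())
    return "-".join(words)


def same_header_name(a, b):
    """Header names are case-insensitive, so compare folded forms."""
    return a.lower() == b.lower()
```

A posting agent might call preferred_case() on names it emits, while relayers and reading agents would rely on same_header_name() when looking headers up.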
832
833 In general, a header can consist of several lines, with each
834 continuation line beginning with white space. The EOLs pre-
835 ceding continuation lines are ignored when processing such a
836 header, effectively combining the start-line and the contin-
837 uations into a single logical line. The logical line, less
838 the header name, colon, and any white space following the
839 colon, is the "header content".
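
The combining of start-line and continuations described above
can be sketched as follows. This is a minimal Python sketch, not
a definitive implementation: the names unfold_headers,
split_header, and find_header are hypothetical, and the input is
assumed to be a list of header lines with EOLs already removed.

```python
def unfold_headers(lines):
    """Combine start-lines and continuation lines into logical lines."""
    logical = []
    for line in lines:
        if line[:1] in (" ", "\t") and logical:
            # The EOL before a continuation line is ignored; the white
            # space that begins the continuation stays in the logical line.
            logical[-1] += line
        else:
            logical.append(line)
    return logical


def split_header(logical_line):
    """Return (name, content) for one logical header line."""
    name, _, rest = logical_line.partition(":")
    # The header content excludes the name, the colon, and any white
    # space following the colon.
    return name, rest.lstrip(" \t")


def find_header(logical_lines, wanted):
    """Case-insensitive lookup of a header's content (None if absent)."""
    for line in logical_lines:
        name, content = split_header(line)
        if name.lower() == wanted.lower():
            return content
    return None
```

For example, find_header(unfold_headers(lines), "Message-ID")
finds the header even if the article spells it "message-id".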
840
841
842 4.2.2. Undesirable Headers
843
844 A header whose content is empty is said to be an empty
845 header. Relayers and reading agents SHOULD not consider
846 presence or absence of an empty header to alter the seman-
847 tics of an article (although syntactic rules, such as
848 requirements that certain header names appear at most once
849 in an article, MUST still be satisfied). Posting agents
850 SHOULD delete empty headers from articles before posting
851 them.
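
A posting agent's deletion of empty headers might look like the
following Python sketch. The name drop_empty_headers is
hypothetical, and the input is assumed to be a list of
well-formed logical header lines (continuations already
combined).

```python
def drop_empty_headers(logical_lines):
    """Return the logical header lines whose content is not empty."""
    kept = []
    for line in logical_lines:
        _name, _colon, rest = line.partition(":")
        # A header is empty when nothing but white space follows the colon.
        if rest.strip(" \t") == "":
            continue
        kept.append(line)
    return kept
```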
852
865 Headers that merely state defaults explicitly (e.g., a Fol-
866 lowup-To header with the same content as the Newsgroups
867 header, or a MIME Content-Type header with contents
868 "text/plain; charset=us-ascii") or state information that
869 reading agents can typically determine easily themselves
870 (e.g. the length of the body in octets) are redundant, con-
871 veying no information whatsoever. Headers that state infor-
872 mation which cannot possibly be of use to a significant num-
873 ber of relayers, reading agents, or readers (e.g., the name
874 of the software package used as the posting agent) are use-
875 less and pointless. Posters and posting agents SHOULD avoid
876 including redundant or useless headers in articles.
877
878 NOTE: Information that someone, somewhere, might
879 someday find useful is best omitted from headers.
880 (There's quite enough of it in article bodies.)
881 Headers should contain information of known util-
882 ity only. This is not meant to preclude inclusion
883 of information primarily meant for news-software
884 debugging, but such information should be included
885 only if there is real reason, preferably based on
886 experience, to suspect that it may be genuinely
887 useful. Articles passing through gateways are the
888 only obvious case where inclusion of debugging
889 information appears clearly legitimate. (See sec-
890 tion 10.1.)
891
892 NOTE: A useful rule of thumb for software imple-
893 mentors is: "if I had to pay a dollar a day for
894 the transmission of this header, would I still
895 think it worthwhile?".
896
897
898 4.2.3. White Space and Continuations
899
900 The colon following the header name on the start-line MUST
901 be followed by white space, even if the header is empty. If
902 the header is not empty, at least some of the content MUST
903 appear on the start-line. Posting agents MUST enforce these
904 restrictions, but relayers (etc.) SHOULD accept even arti-
905 cles that violate them.
906
907 NOTE: MAIL does not require white space after the
908 colon, but it is usual. RFC 1036 required the
909 white space, even in empty headers, and some
910 existing software demands it. In MAIL, and
911 arguably in RFC 1036 (although the wording is
912 vague), it is technically legitimate for the white
913 space to be part of a continuation line rather
914 than the start-line, but not all existing software
915 will accept this. Deleting empty headers and
916 placing some content on the start-line avoids this
917 issue... which is desirable because trailing
918 blanks, easily deleted by accident, are best not
931 made significant in headers.
932
933 In general, posters and posting agents SHOULD use blank
934 (ASCII 32), not tab (ASCII 9), where white space is desired
935 in headers. Existing software does not consistently accept
936 tab as synonymous with blank in all contexts. In particu-
937 lar, RFC 1036 appeared to specify that the character immedi-
938 ately following the colon after a header name was required
939 to be a blank, and some news software insists on that, so
940 this character MUST be a blank. Again, posting agents MUST
941 enforce these restrictions but relayers SHOULD be more tol-
942 erant.
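
The restrictions a posting agent must enforce on a start-line
can be checked along these lines. This is a hedged Python
sketch; check_start_line and its messages are hypothetical, and
the lines are assumed to have had their EOLs removed.

```python
def check_start_line(start_line, continuations=()):
    """Return a list of problems a posting agent would have to fix."""
    name, colon, rest = start_line.partition(":")
    if not colon:
        return ["no colon after header name"]
    problems = []
    # The character immediately after the colon must be a blank,
    # not a tab and not the first character of the content.
    if rest[:1] != " ":
        problems.append("colon not followed by a blank (ASCII 32)")
    start_content = rest.strip(" \t")
    later_content = "".join(continuations).strip(" \t")
    # A non-empty header must have some content on the start-line.
    if not start_content and later_content:
        problems.append("no content on the start-line of a non-empty header")
    return problems
```

A relayer would not reject an article on these grounds; the
check is meant for the posting side only.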
943
944 Since the white space beginning a continuation line remains
945 a part of the logical line, headers can be "broken" into
946 multiple lines only at white space. Posting agents SHOULD
947 not break headers unnecessarily. Relayers SHOULD preserve
948 existing header breaks, and SHOULD not introduce new breaks.
949 Breaking headers SHOULD be a last resort; relayers and read-
950 ing agents SHOULD handle long header lines gracefully. (See
951 the discussion of size limits in section 4.6.)
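
Where breaking a header cannot be avoided, the rule that breaks
occur only at white space can be sketched as below. This Python
sketch makes several assumptions: the names fold_header and
_earliest_break are hypothetical, the caller supplies the limit,
breaks are made only at blanks (not tabs), and a line with no
usable blank is left long rather than broken elsewhere.

```python
def _earliest_break(text, start):
    """Smallest break index that leaves some non-blank text before it."""
    i = start
    while i < len(text) and text[i] in " \t":
        i += 1
    return i + 1


def fold_header(logical_line, limit):
    """Break a logical header line at blanks so lines fit in `limit`."""
    colon = logical_line.find(":")
    # Never break at the white space right after the colon: some
    # content must stay on the start-line.
    earliest = _earliest_break(logical_line, colon + 1)
    lines = []
    rest = logical_line
    while len(rest) > limit:
        cut = rest.rfind(" ", earliest, limit + 1)
        if cut == -1:
            break  # no blank to break at; leave the line long
        lines.append(rest[:cut])
        rest = rest[cut:]  # the blank begins the continuation line
        earliest = _earliest_break(rest, 0)
    lines.append(rest)
    return lines
```

Because the blank at each break is kept at the start of the
continuation line, joining the pieces with no separator
reproduces the original logical line exactly.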
952
953
954 4.3. Body
955
956 Although the article body is unstructured for most of the
957 purposes of this Draft, structure MAY be imposed on it by
958 other means, notably MIME headers (see appendix B).
959
960
961 4.3.1. Body Format Issues
962
963 The body of an article MAY be empty, although posting agents
964 SHOULD consider this an error condition (meriting returning
965 the article to the poster for revision). A posting agent
966 which does not reject such an article SHOULD issue a warning
967 message to the poster and supply a non-empty body. Note
968 that the separator line MUST be present even if the body is
969 empty.
970
971 NOTE: An empty body is probably a poster error
972 except, arguably, for some control messages... and
973 even they really ought to have a body explaining
974 the reason for the control message. Some old
975 reading agents are known to generate empty bodies
976 for "cancel" control messages, so posting agents
977 might opt not to reject body-less articles in such
978 cases (although it would be better to fix the
979 reading agents to request a body). However, some
980 existing news software is known to react badly to
981 body-less articles, hence the request for posting
982 agents to insert a body in such cases.
983
997 NOTE: A possible posting-agent-supplied body text
998 (already used by one widespread posting agent) is
999 "This article was probably generated by a buggy
1000 news reader.". (The use of "reader" to refer to
1001 the reading agent is traditional, although this
1002 Draft uses more precise terminology.)
1003
1004 NOTE: The requirement for the separator line even
1005 in a bodyless article is inherited from MAIL, and
1006 also distinguishes legitimately-bodyless articles
1007 from articles accidentally truncated in the middle
1008 of the headers.
1009
1010 Note that an article body is a sequence of lines terminated
1011 by EOLs, not arbitrary binary data, and in particular it
1012 MUST end with an EOL. However, relayers SHOULD treat the
1013 body of an article as an uninterpreted sequence of octets
1014 (except as mandated by changes of EOL representation and by
1015 control-message processing) and SHOULD avoid imposing con-
1016 straints on it. See also section 4.6.
1017
1018
1019 4.3.2. Body Conventions
1020
1021 Although body lines can in principle be very long (see sec-
1022 tion 4.6 for some discussion of length limits), posters
1023 SHOULD restrict body line lengths to circa 70-75 characters.
1024 On systems where text is conventionally stored with EOLs
1025 only at paragraph breaks and other "hard return" points,
1026 with software breaking lines as appropriate for display or
1027 manipulation, posting agents SHOULD insert EOLs as necessary
1028 so that posted articles comply with this restriction.
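
For a posting agent on such a system, inserting EOLs might be
sketched with Python's standard textwrap module as follows. The
name insert_eols and the width of 72 are assumptions (72 is
simply a value inside the 70-75 range), as is the use of "\n" as
the local EOL representation.

```python
import textwrap


def insert_eols(paragraphs, width=72):
    """Wrap each paragraph and join paragraphs with one empty line."""
    wrapped = [
        # Words longer than the width (e.g. long addresses) are left
        # whole rather than split in the middle.
        textwrap.fill(p, width=width, break_long_words=False,
                      break_on_hyphens=False)
        for p in paragraphs
    ]
    # The body must end with an EOL.
    return "\n\n".join(wrapped) + "\n"
```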
1029
1030 NOTE: News originated in environments where line
1031 breaks in plain text files were supplied by the
1032 user, not the software. Be this good or bad, much
1033 reading-agent and posting-agent software assumes
1034 that news articles follow this convention, so it
1035 is often inconvenient to read or respond to arti-
1036 cles which violate it. The "70-75" number comes
1037 from the widespread use of display devices which
1038 are 80 columns wide, and the desire to leave a bit
1039 of margin for quoting etc. (see below).
1040
1041 Reading agents confronted with body lines much longer than
1042 the available output-device width SHOULD break lines as
1043 appropriate. Posters are warned that such breaks may not
1044 occur exactly where the poster intends.
1045
1046 NOTE: "As appropriate" would typically include
1047 breaking lines when supplying the text of an arti-
1048 cle to be quoted in a reply or followup, something
1049 that line-breaking reading agents often neglect to
1050 do now.
1051
1063 Although styles vary widely, for plain text it is usual to
1064 use no left margin, leave the right edge ragged, use a sin-
1065 gle empty line to separate paragraphs, and employ normal
1066 natural-language usage on matters such as upper/lowercase.
1067 (In particular, articles SHOULD not be written entirely in
1068 uppercase. In environments where posters have access only
1069 to uppercase, posting agents SHOULD translate it to lower-
1070 case.)
1071
1072 NOTE: Most people find substantial bodies of text
1073 entirely in uppercase relatively hard to read,
1074 while all-lowercase text merely looks slightly
1075 odd. The common association of uppercase with
1076 strong emphasis adds to this.
1077
1078 Tone of voice does not carry well in written text, and mis-
1079 understandings are common when sarcasm, parody, or exaggera-
1080 tion for humorous effect is attempted without explicit warn-
1081 ing. It has become conventional to use the sequence ":-)",
1082 which (on most output devices) resembles a rotated "smiley
1083 face" symbol, as a marker for text not meant to be taken
1084 literally, especially when humor is intended. This practice
1085 aids communication and averts unintended ill-will; posters
1086 are urged to use it. A variety of analogous sequences are
1087 used with less-standardized meanings [Sanderson].
1088
1089 The order of arrival of news articles at a particular host
1090 depends somewhat on transmission paths, and occasionally
1091 articles are lost for various reasons. When responding to a
1092 previous article, posters SHOULD not assume that all readers
1093 understand the exact context. It is common to quote some of
1094 the previous article to establish context. This SHOULD be
1095 done by prefacing each quoted line (even if it is empty)
1096 with the character ">". This will result in multiple levels
1097 of ">" when quoted context itself contains quoted context.
1098
1099 NOTE: It may seem superfluous to put a prefix on
1100 empty lines, but it simplifies implementation of
1101 functions such as "skip all quoted text" in read-
1102 ing agents.
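
The quoting convention, and the "skip all quoted text" function
that it simplifies, can be sketched in Python as below. The
names quote, skip_quoted, and _lines are hypothetical, and "\n"
is assumed as the local EOL representation.

```python
def _lines(body):
    """Split a body into lines, dropping the EOL that ends the body."""
    lines = body.split("\n")
    if lines[-1] == "":
        lines.pop()
    return lines


def quote(body):
    """Preface every line, empty lines included, with ">"."""
    return "".join(">" + line + "\n" for line in _lines(body))


def skip_quoted(body):
    """Return only the lines that are not quoted context."""
    return "".join(line + "\n" for line in _lines(body)
                   if not line.startswith(">"))
```

Quoting text that already contains quoted context yields ">>"
on the inner lines, as described above. Because empty quoted
lines also carry the prefix, skip_quoted needs no special case
for them.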
1103
1104 Readability is enhanced if quoted text and new text are sep-
1105 arated by an empty line.
1106
1107 Posters SHOULD edit quoted context to trim it down to the
1108 minimum necessary. However, posting agents SHOULD not
1109 attempt to enforce this by imposing overly-simplistic rules
1110 like "no more than 50% of the lines should be quotes".
1111
1112 NOTE: While encouraging trimming is desirable, the
1113 50% rule imposed by some old posting agents is
1114 both inadequate and counterproductive. Posters do
1115 not respond to it by being more selective about
1116 quoting; they respond by padding short responses,
1129 or by using different quoting styles to defeat
1130 automatic analysis. The former adds unnecessary
1131 noise and volume, while the latter also defeats
1132 more useful forms of automatic analysis that read-
1133 ing agents might wish to do.
1134
1135 NOTE: At the very least, if a minimum-unquoted
1136 quota is being set, article bodies shorter than
1137 (say) 20 lines, or perhaps articles which exceed
1138 the quota by only a few lines, should be exempt.
1139 This avoids the ridiculous situation of complain-
1140 ing about a 5-line response to a 6-line quote.
1141
1142 NOTE: A more subtle posting-agent rule, suggested
1143 for experimental use, is to reject articles that
1144 appear to contain quoted signatures (see below).
1145 This is almost certainly the result of a careless
1146 poster not bothering to trim down quoted context.
1147 Also, if a posting agent or followup agent pre-
1148 sents an article template to the poster for edit-
1149 ing, it really should take note of whether the
1150 poster actually made any changes, and refrain from
1151 posting an unmodified template.
1152
1153 Some followup agents supply "attribution" lines for quoted
1154 context, indicating where it first appeared and under whose
1155 name. When multiple levels of quoting are present and
1156 quoted context is edited for brevity, "inner" attribution
1157 lines are not always retained. The editing process is also
1158 somewhat error-prone. Reading agents (and readers) are
1159 warned not to assume that attributions are accurate.
1160
1161 UNRESOLVED ISSUE: Should a standard format for
1162 attribution lines be defined? There is already
1163 considerable diversity... but automatic news anal-
1164 ysis would be substantially aided by a standard
1165 convention.
1166
1167 Early difficulties in inferring return addresses from arti-
1168 cle headers led to "signatures": short closing texts, auto-
1169 matically added to the end of articles by posting agents,
1170 identifying the poster and giving his network addresses etc.
1171 If a poster or posting agent does append a signature to an
1172 article, the signature SHOULD be preceded with a delimiter
1173 line containing (only) two hyphens (ASCII 45) followed by
1174 one blank (ASCII 32). Posting agents SHOULD limit the
1175 length of signatures, since verbose excess bordering on
1176 abuse is common if no restraint is imposed; 4 lines is a
1177 common limit.
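
A sketch of how software might apply the delimiter line follows,
in Python. The names SIG_DELIMITER, split_signature, and
append_signature are hypothetical. Splitting at the last
delimiter line in the body, and refusing (rather than
truncating) an over-long signature, are assumptions; the text
above does not settle either point.

```python
SIG_DELIMITER = "-- "  # two hyphens followed by one blank


def split_signature(lines):
    """Split body lines into (text, signature) at the last delimiter."""
    for i in range(len(lines) - 1, -1, -1):
        # The trailing blank is significant: "--" alone does not match.
        if lines[i] == SIG_DELIMITER:
            return lines[:i], lines[i + 1:]
    return lines, []


def append_signature(lines, signature, max_lines=4):
    """Append a signature, preceded by the delimiter line."""
    if len(signature) > max_lines:
        raise ValueError("signature longer than %d lines" % max_lines)
    return lines + [SIG_DELIMITER] + signature
```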
1178
1179 NOTE: While signatures are arguably a blemish,
1180 they are a well-understood convention, and convey-
1181 ing the same information in headers exposes it to
1182 mangling and makes it rather less conspicuous. A
1195 standard delimiter line makes it possible for
1196 reading agents to handle signatures specially if
1197 desired. (This is unfortunately hampered by
1198 extensive misunderstanding of, and misuse of, the
1199 delimiter.)
1200
1201 NOTE: The choice of delimiter is somewhat unfortu-
1202 nate, since it relies on preservation of trailing
1203 white space, but it is too well-established to
1204 change. There is work underway to define a more
1205 sophisticated signature scheme as part of MIME,
1206 and this will presumably supersede the current
1207 convention in due time.
1208
1209 NOTE: Four 75-column lines of signature text is
1210 300 characters, which is ample to convey name and
1211 mail-address information in all but the most
1212 bizarre situations.
1213
1214
1215 4.4. Characters And Character Sets
1216
1217 Header and body lines MAY contain any ASCII characters other
1218 than CR (ASCII 13), LF (ASCII 10), and NUL (ASCII 0).
1219
1220 NOTE: CR and LF are excluded because they clash
1221 with common EOL conventions. NUL is excluded
1222 because it clashes with the C end-of-string con-
1223 vention, which is significant to most existing
1224 news software. These three characters are
1225 unlikely to be transmitted successfully.
1226
1227 However, posters SHOULD avoid using ASCII control characters
1228 except for tab (ASCII 9), formfeed (ASCII 12), and backspace
1229 (ASCII 8). Tab signifies sufficient horizontal white space
1230 to reach the next of a set of fixed positions; posters are
1231 warned that there is no standard set of positions, so tabs
1232 should be avoided if precise spacing is essential. Formfeed
1233 signifies a point at which a reading agent SHOULD pause and
1234 await reader interaction before displaying further text.
1235 Backspace SHOULD be used only for underlining, done by a
1236 sequence of underscores (ASCII 95) followed by an equal num-
1237 ber of backspaces, signifying that the same number of text
1238 characters following are to be underlined. Posters are
1239 warned that underlining is not available on all output
1240 devices and is best not relied on for essential meaning.
1241 Reading agents SHOULD recognize underlining and translate it
1242 to the appropriate commands for devices that support it.
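
The underlining convention can be illustrated with the Python
sketch below. The names underline and strip_underlining are
hypothetical. The second function returns the plain text plus
the character ranges to be underlined, leaving the choice of
device commands to the caller; backspaces that do not fit the
pattern are passed through untouched, which a real reading agent
would need to handle separately.

```python
import re

# A run of underscores immediately followed by a run of backspaces.
_UNDERLINE = re.compile(r"(_+)(\x08+)")


def underline(text):
    """Encode `text` as underscores, then backspaces, then the text."""
    n = len(text)
    return "_" * n + "\b" * n + text


def strip_underlining(line):
    """Return (plain_text, spans); spans are (start, end) ranges."""
    plain = ""
    spans = []
    i = 0
    while i < len(line):
        m = _UNDERLINE.match(line, i)
        if m and len(m.group(1)) == len(m.group(2)):
            # Equal counts: the same number of following characters
            # are to be underlined.
            n = len(m.group(1))
            text = line[m.end():m.end() + n]
            spans.append((len(plain), len(plain) + len(text)))
            plain += text
            i = m.end() + n
        else:
            plain += line[i]
            i += 1
    return plain, spans
```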
1243
1244 NOTE: Interpretation of almost all control charac-
1245 ters is device-specific to some degree, and
1246 devices differ. Tabs and underlining are sup-
1247 ported, to some extent, by most modern devices and
1248 reading agents, hence the cautious exemptions for
1261 them. The underlining method is specified because
1262 the inverse method, text and then underscores, is
1263 tempting to the naive... but if sent unaltered to
1264 a device that shows only the most recent of sev-
1265 eral overstruck characters rather than a compos-
1266 ite, the result can be utterly unreadable.
1267
1268 NOTE: A common interpretation of tab is that it is
1269 a request to space forward to the next position
1270 whose number is one more than a multiple of 8,
1271 with positions numbered sequentially starting at
1272 1. (So tab positions are 9, 17, 25, ...) Reading
1273 agents not constrained by existing system conven-
1274 tions might wish to use this interpretation.
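
The common interpretation of tab in the preceding NOTE works out
as in this Python sketch. The name expand_tabs is hypothetical,
and the sketch assumes the line contains no backspaces or other
characters that would disturb the column count.

```python
def expand_tabs(line, interval=8):
    """Replace each tab with blanks up to the next tab position."""
    out = []
    column = 1  # positions are numbered sequentially starting at 1
    for ch in line:
        if ch == "\t":
            # Next position whose number is one more than a
            # multiple of the interval: 9, 17, 25, ...
            target = ((column - 1) // interval + 1) * interval + 1
            out.append(" " * (target - column))
            column = target
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

For lines without control characters this gives the same result
as Python's built-in str.expandtabs(8).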
1275
1276 NOTE: It will typically be necessary for a reading
1277 agent to catch and interpret formfeed, not just
1278 send it to the output device. The actions per-
1279 formed by typical output devices on receiving a
1280 formfeed are neither adequate for nor appropriate
1281 to the pause-for-interaction meaning.
1282
1283 Cooperating subnets which wish to employ non-ASCII character
1284 sets by using escape sequences (employing, e.g., ESC (ASCII
1285 27), SO (ASCII 14), and SI (ASCII 15)) to alter the meaning
1286 of superficially-ASCII characters MAY do so, but MUST use
1287 MIME headers to alert reading agents to the particular char-
1288 acter set(s) and escape sequences in use. A reading agent
1289 SHOULD not pass such an escape sequence through, unaltered,
1290 to the output device unless the agent confirms that the
1291 sequence is one used to affect character sets and has reason
1292 to believe that the device is capable of interpreting that
1293 particular sequence properly.
1294
1295 NOTE: Cooperating-subnet organizers are warned
1296 that some very old relayers strip certain control
1297 characters out of articles they pass along. ESC
1298 is known to be among the affected characters.
1299
1300 NOTE: There are now standard Internet encodings
1301 for Japanese [rrr] and Vietnamese [rrr] in partic-
1302 ular.
1303
1304 Articles MUST not contain any octet with value exceeding
1305 127, i.e. any octet that is not an ASCII character.
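
A posting agent's check for the character restrictions of this
section might be sketched as follows in Python. The name
forbidden_octets is hypothetical, and the input is assumed to be
a single header or body line with its EOL already removed.

```python
def forbidden_octets(line: bytes):
    """Return the sorted octet values in `line` that are not permitted."""
    return sorted({
        octet for octet in line
        # Non-ASCII octets, plus NUL, LF, and CR inside a line.
        if octet > 127 or octet in (0, 10, 13)
    })
```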
1306
1307 NOTE: This rule, like others, may be relaxed by
1308 unanimous consent of the members of a cooperating
1309 subnet, provided suitable precautions are taken to
1310 ensure that rule-violating articles do not leak
1311 out of the subnet. (This has already been done in
1312 many areas where ASCII is not adequate for the
1313 local language(s).) Beware that articles contain-
1314 ing non-ASCII octets in headers are a violation of
1327 the MAIL specifications and are not valid MAIL
1328 messages. MIME offers a way to encode non-ASCII
1329 characters in ASCII for use in headers; see sec-
1330 tion 4.5.
1331
1332 NOTE: While there is great interest in using 8-bit
1333 character sets, not all software can yet handle
1334 them correctly. Hence the restriction to cooper-
1335 ating subnets. MIME encodings can be used to
1336 transmit such characters while remaining within
1337 the octet restriction.
1338
1339 In anticipation of the day when it is possible to use non-
1340 ASCII characters safely anywhere, and to provide for the
1341 (substantial) cooperating subnets that are already using
1342 them, transmission paths SHOULD treat news articles as unin-
1343 terpreted sequences of octets (except perhaps for transfor-
1344 mations between EOL representations) and relayers SHOULD
1345 treat non-ASCII characters in articles as ordinary charac-
1346 ters.
1347
1348 NOTE: 8-bit enthusiasts are warned that not all
1349 software conforms to these recommendations yet.
1350 In particular, standard NNTP [rrr] is a 7-bit pro-
1351 tocol, and there may be implementations which
1352 enforce this rule. Be warned, also, that it will
1353 never be safe to send raw binary data in the body
1354 of news articles, because changes of EOL represen-
1355 tation may (will!) corrupt it.
1356
1357 Except where cooperating subnets permit more direct
1358 approaches, MIME [rrr] headers and encodings SHOULD be used
1359 to transmit non-ASCII content using ASCII characters; see
1360 section 4.5, appendix B, and the MIME RFCs for details. If
1361 article content can be expressed in ASCII, it SHOULD be.
1362 Failing that, the order of preference for character sets is
1363 that described in MIME [rrr].
1364
1365 NOTE: Using the MIME facilities, it is possible to
1366 transmit ANY character set, and ANY form of binary
1367 data, using only ASCII characters. Equally impor-
1368 tant, such articles are self-describing and the
1369 reading agent can tell which octet-to-symbol map-
1370 ping is intended! Designation of some preferred
1371 character sets is intended to minimize the number
1372 of character sets that a reading agent must under-
1373 stand in order to display most articles properly.
1374
1375 Articles containing non-ASCII characters, articles using
1376 ASCII characters (values 0 through 127) to refer to non-
1377 ASCII symbols, and articles using escape sequences to shift
1378 character sets SHOULD include MIME headers indicating which
1379 character set(s) and conventions are being used, and MUST do
1380 so unless such articles are strictly confined to a
1393 cooperating subnet which has its own pre-agreed conventions.
1394 MIME encodings are preferred over all these techniques. If
1395 it comes to a relayer's attention that it is being asked to
1396 pass an article using such techniques outward across what it
1397 knows to be the boundary of such a cooperating subnet, it
1398 MUST report this error to its administrator, and MAY refuse
1399 to pass the article beyond the subnet boundary. If it does
1400 pass the article, it MUST re-encode it with MIME encodings
1401 to make it conform to this Draft.
1402
1403 NOTE: Such re-encoding is a non-trivial task, due
1404 to MIME rules such as the prohibition of nested
1405 encodings. It's not just a matter of pouring the
1406 body through a simple filter.
1407
1408 Reading agents SHOULD note MIME headers and attempt to show
1409 the reader the closest possible approximation to the
1410 intended content. They SHOULD not just send the octets of
1411 the article to the output device unaltered, unless there is
1412 reason to believe that the output device will indeed inter-
1413 pret them correctly. Reading agents MUST not pass ASCII
1414 control characters or escape sequences, other than as dis-
1415 cussed above, unaltered to the output device; only by chance
1416 would the result be the desired one, and there is serious
1417 potential for harmful side effects, either accidental or
1418 malicious.
1419
1420 NOTE: Exactly what to do with unwanted control
1421 characters/sequences depends on the philosophy of
1422 the reading agent, but passing them straight to
1423 the output device is almost always wrong. If the
1424 reading agent wants to mark the presence of such a
1425 character/sequence in circumstances where only
1426 ASCII printable characters are available, trans-
1427 lating it to "#" might be a suitable method; "#"
1428 is a conspicuous character seldom used in normal
1429 text.
1430
1431 NOTE: Reading agents should be aware that many old
1432 output devices (or the transmission paths to them)
1433 zero out the top bit of octets sent to them. This
1434 can transform non-ASCII characters into ASCII con-
1435 trol characters.
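
The "#" translation suggested above might be sketched as follows
in Python. The name sanitize_for_display is hypothetical. The
sketch assumes an ASCII-only output device and assumes that
formfeed and underlining have already been interpreted, so only
printable characters and tab are passed through; octets above
127 are also replaced, since a device that zeroes the top bit
could turn them into control characters.

```python
def sanitize_for_display(line: bytes) -> str:
    """Replace anything but printable ASCII and tab with "#"."""
    out = []
    for octet in line:
        if 32 <= octet <= 126 or octet == 9:
            out.append(chr(octet))
        else:
            # Control characters (ESC included), DEL, and non-ASCII.
            out.append("#")
    return "".join(out)
```

Octet 155 is a case in point: with its top bit zeroed it becomes
ESC (ASCII 27), so it is replaced as well.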
1436
1437 Followup agents MUST be careful to apply appropriate trans-
1438 formations of representation to the outbound followup as
1439 well as the inbound precursor. A followup to an article
1440 containing non-ASCII material is very likely to contain non-
1441 ASCII material itself.
1442
1443
1459 4.5. Non-ASCII Characters In Headers
1460
1461 All octets found in headers MUST be ASCII characters. How-
1462 ever, it is desirable to have a way of encoding non-ASCII
1463 characters, especially in "human-readable" headers such as
1464 Subject. MIME [rrr] provides a way to do this. Full
1465 details may be found in the MIME specifications; herewith a
1466 quick summary to alert software authors to the issues...
1467
1468 encoded-word = "=?" charset "?" encoding "?" codes "?="
1469 charset = 1*tag-char
1470 encoding = 1*tag-char
1471 tag-char = <ASCII printable character except !()<>@,;:\"[]/?=>
1472 codes = 1*code-char
1473 code-char = <ASCII printable character except ?>
1474
1475 An encoded word is a sequence of ASCII printable characters
1476 that specifies the character set, encoding method, and bits
1477 of (potentially) non-ASCII characters. Encoded words are
1478 allowed only in certain positions in certain headers. Spe-
1479 cific headers impose restrictions on the content of encoded
1480 words beyond that specified in this section. Posting agents
1481 MUST ensure that any material resembling an encoded word
1482 (complete with all delimiters), in a context where encoded
1483 words may appear, really is an encoded word.
1484
1485 NOTE: The syntax is a bit ugly, but it was
1486 designed to minimize chances of confusion with
1487 legitimate header contents, and to satisfy diffi-
1488 cult constraints on use within existing headers.
1489
1490 An encoded word MUST not be more than 75 octets long. Each
1491 line of a header containing encoded word(s) MUST be at most
1492 76 octets long, not counting the EOL.
1493
1494 NOTE: These limits are meant to bound the looka-
1495 head needed to determine whether text that begins
1496 "=?" is really an encoded word.
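
The grammar and the 75-octet limit above translate fairly
directly into a recognizer. The Python sketch below is
illustrative only: the names PRINTABLE, TAG_CHARS, CODE_CHARS,
ENCODED_WORD, and parse_encoded_word are hypothetical, and
"ASCII printable character" is assumed to mean ASCII 33 through
126, i.e. excluding the blank.

```python
import re

PRINTABLE = {chr(c) for c in range(33, 127)}  # assumption: no blank
TAG_CHARS = PRINTABLE - set('!()<>@,;:\\"[]/?=')
CODE_CHARS = PRINTABLE - {"?"}


def _char_class(chars):
    """Build a regular-expression character class from a set."""
    return "[" + "".join(re.escape(c) for c in sorted(chars)) + "]"


ENCODED_WORD = re.compile(
    r"=\?(" + _char_class(TAG_CHARS) + r"+)"    # charset
    r"\?(" + _char_class(TAG_CHARS) + r"+)"     # encoding
    r"\?(" + _char_class(CODE_CHARS) + r"+)\?="  # codes
)


def parse_encoded_word(word):
    """Return (charset, encoding, codes), or None if not an encoded word."""
    if len(word) > 75:
        return None
    m = ENCODED_WORD.fullmatch(word)
    return m.groups() if m else None
```

Because no encoded word exceeds 75 octets, a scanner that sees
"=?" never needs to look further ahead than that to decide.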
1497
1498 The details of charsets and encodings are defined by MIME
1499 [rrr]; the sequence of preferred character sets is the same
1500 as MIME's. Encoded words SHOULD not be used for content
1501 expressible in ASCII.
1502
1503 When an encoded word is used, other than in a newsgroup name
1504 (see section 5.5), it MUST be separated from any adjacent
1505 non-space characters (including other encoded words) by
1506 white space. Reading agents displaying the contents of
1507 encoded words (as opposed to their encoded form) should
1508 ignore white space adjacent to encoded words.
1509
1510 UNRESOLVED ISSUE: Should this section be deleted
1511 entirely, or made much more terse? The material
1512 is relevant, but too complex to discuss fully.
1513
1514
1525 NOTE: The deletion of intervening white space per-
1526 mits using multiple encoded words, implicitly con-
1527 catenated by the deletion, to encode text that
1528 will not fit within a single 75-character encoded
1529 word.
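
How a reading agent might produce the display form, including
the concatenation of adjacent encoded words, is sketched below
in Python. The names ENCODED_WORD, GAP, _decode_word, and
display_form are hypothetical. The sketch assumes the "Q" and
"B" encodings of the MIME specifications, removes white space
only between two encoded words, and leaves any word it cannot
decode in its encoded form. Python's standard email.header
module offers a fuller implementation.

```python
import base64
import re

ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([^?\s]+)\?=")
# White space lying between the end of one encoded word and the
# start of the next.
GAP = re.compile(r"(?<=\?=)[ \t]+(?==\?)")


def _decode_word(match):
    charset, encoding, codes = match.groups()
    try:
        if encoding.upper() == "B":
            raw = base64.b64decode(codes)
        elif encoding.upper() == "Q":
            data = codes.replace("_", " ").encode("ascii")
            raw = re.sub(rb"=([0-9A-Fa-f]{2})",
                         lambda m: bytes([int(m.group(1), 16)]), data)
        else:
            return match.group(0)
        return raw.decode(charset, errors="replace")
    except (LookupError, ValueError):
        # Unknown charset or malformed codes: show the encoded form.
        return match.group(0)


def display_form(header_content):
    """Decode encoded words, joining adjacent ones without white space."""
    return ENCODED_WORD.sub(_decode_word, GAP.sub("", header_content))
```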
1530
1531 Reading-agent implementors are warned that although this
1532 Draft completely specifies where encoded words may appear in
1533 the headers it defines, there are other headers (e.g. the
1534 MIME Content-Description header) that MAY contain them.
1535
1536
1537 4.6. Size Limits
1538
1539 Implementations SHOULD avoid fixed constraints on the sizes
1540 of lines within an article and on the size of the entire
1541 article.
1542
1543 Relayers SHOULD treat the body of an article as an uninter-
1544 preted sequence of octets (except as mandated by changes of
1545 EOL representation and processing of control messages), not
1546 to be altered or constrained in any way.
1547
1548 If it is absolutely necessary for an implementation to
1549 impose a limit on the length of header lines, body lines, or
1550 header logical lines, that limit shall be at least 1000
1551 octets, including EOL representations. Relayers and trans-
1552 mission paths confronted with lines beyond their internal
1553 limits (if any) MUST not simply inject EOLs at random
1554 places; they MAY break headers (as described in 4.2.3) as a
1555 last resort, and otherwise they MUST either pass the long
1556 lines through unaltered, or refuse to pass the article at
1557 all (see section 9.1 for further discussion).
1558
1559 NOTE: The limit here is essentially the same mini-
1560 mum as that specified for SMTP mail in RFC 821
1561 [rrr]. Implementors are warned that Path (see
1562 section 5.6) and References (see section 6.5)
1563 headers, in particular, often become several hun-
1564 dred characters long, so 1000 is not an overly
1565 generous limit.
1566
1567 All implementations MUST be able to handle an article
1568 totalling at least 65,000 octets, including headers and EOL
1569 representations, gracefully and efficiently. All implemen-
1570 tations SHOULD be able to handle an article totalling at
1571 least 1,000,000 (one million) octets, including headers and
1572 EOL representations, gracefully and efficiently. "Grace-
1573 fully and efficiently" is intended to preclude not only
1574 failures, but also major loss of performance, serious prob-
1575 lems in error recovery, or resource consumption beyond what
1576 is reasonably necessary.
1577
1591 NOTE: The intent here is to prohibit lowering the
1592 existing de-facto limit any further, while
1593 strongly encouraging movement towards a higher
1594 one. Actually, although improvements are desir-
1595 able in some cases, much news software copes rea-
1596 sonably well with very large articles. The same
1597 cannot be said of the communications software and
1598 protocols used to transmit news from one host to
1599 another, especially when slow communications links
1600 are involved. Occasional huge articles that
1601 appear now (by accident or through ignorance) typ-
1602 ically leave trails of failing software, system
1603 problems, and irate administrators in their wake.
1604
1605 NOTE: It is intended that the successor to this
1606 Draft will raise the "MUST" limit to 1,000,000 and
1607 the "SHOULD" limit still further.
1608
1609 Posters SHOULD limit posted articles to at most 60,000
1610 octets, including headers and EOL representations, unless
1611 the articles are being posted only within a cooperating sub-
1612 net which is known to be capable of handling larger articles
1613 gracefully. Posting agents presented with a large article
1614 SHOULD warn the poster and request confirmation.
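
A posting agent's warning check might be sketched as below, in
Python. The names POSTING_LIMIT, LINE_LIMIT, and size_warnings
are hypothetical, as is the choice of CR LF as the EOL
representation being counted. The line check uses the 1000-octet
figure given earlier in this section as the smallest limit an
implementation may impose, on the reasoning that longer lines
cannot be relied on to pass everywhere.

```python
POSTING_LIMIT = 60_000  # octets, including headers and EOLs
LINE_LIMIT = 1_000      # octets per line, including the EOL


def size_warnings(article: bytes, eol: bytes = b"\r\n"):
    """Return warnings to show the poster before asking to confirm."""
    warnings = []
    if len(article) > POSTING_LIMIT:
        warnings.append(
            "article is %d octets (over %d)" % (len(article), POSTING_LIMIT))
    for number, line in enumerate(article.split(eol), start=1):
        if len(line) + len(eol) > LINE_LIMIT:
            warnings.append(
                "line %d is %d octets with EOL (over %d)"
                % (number, len(line) + len(eol), LINE_LIMIT))
    return warnings
```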
1615
1616 NOTE: The difference between this and the earlier
1617 "MUST" limit is margin for header growth, differ-
1618 ing EOL representations, and transmission over-
1619 heads.
1620
1621 NOTE: Disagreeable though these limits are, it is
1622 a fact that in current networks, an article larger
1623 than 64K (after header growth etc.) simply is not
1624 transmitted reliably. Note also the comments
1625 above on the trauma caused by single extremely-
1626 large articles now; the problems are real and cur-
1627 rent. These problems arguably should be fixed,
1628 but this will not happen network-wide in the imme-
1629 diate future. Hence the restriction of larger
1630 articles to cooperating subnets, for now.
1631
1632 Posters using non-ASCII characters in their text MUST take
1633 into account the overhead involved in MIME encoding, unless
1634 the article's propagation will be entirely limited to a
1635 cooperating subnet which does not use MIME encodings for
1636 non-ASCII characters. For example, MIME base64 encoding
1637 involves growth by a factor of approximately 4/3, so an
1638 article which would likely have to use this encoding should
1639 be at most about 45,000 octets before encoding.
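
As a rough check on this arithmetic, the following Python sketch
estimates the encoded size of a body. It is an illustration only:
the function names are hypothetical, and the 76-character line
length and 2-octet EOL are assumptions, not requirements of this
Draft.

```python
POSTER_LIMIT = 60_000  # the "SHOULD" limit for posters given above


def base64_encoded_size(raw_octets, line_len=76, eol_len=2):
    """Estimate the size of raw_octets of data after base64 encoding.

    line_len and eol_len are assumed values for the encoded line
    length and the EOL representation.
    """
    chars = 4 * ((raw_octets + 2) // 3)          # 3 octets become 4 characters
    lines = (chars + line_len - 1) // line_len   # encoded lines, rounded up
    return chars + lines * eol_len


def fits_poster_limit(raw_octets, header_octets=0):
    """True if headers plus the encoded body stay within POSTER_LIMIT."""
    return header_octets + base64_encoded_size(raw_octets) <= POSTER_LIMIT
```

Under these assumptions a 45,000-octet body encodes to 60,000
characters before EOLs are counted, so the practical ceiling is
somewhat lower once line breaks and headers are included.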
1640
1641 Posters SHOULD use MIME "message/partial" conventions to
1642 facilitate automatic reassembly of a large document split
1643 into smaller pieces for posting. It is recommended that the
1644 content identifier used should be a message ID, generated by
1657 the same means as article message IDs (see section 5.3), and
1658 that all parts should have a See-Also header (see section
1659 6.16) giving the message IDs of at least the previous parts
1660 and preferably all the parts.
1661
1662 NOTE: See-Also is more correct for this purpose
1663 than References, although References is in common
1664 use today (with less-formal reassembly arrange-
1665 ments). MIME reassemblers should probably examine
1666 articles suggested by References headers if See-
1667 Also headers are not present to indicate the
1668 whereabouts of the other parts of "mes-
1669 sage/partial" articles.
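
A posting agent might organize the split along the following
lines. This Python sketch is illustrative only: the function name
and the make_id callback are hypothetical, the blank-separated
See-Also content is an assumption (section 6.16 defines the real
syntax), and wrapping each piece into a complete article is not
shown.

```python
def split_for_posting(body, part_size, make_id):
    """Split body into message/partial pieces.

    make_id is a caller-supplied function returning a fresh message
    ID on each call (hypothetical; see section 5.3 for the rules).
    Returns a list of (headers, chunk) pairs.
    """
    chunks = [body[i:i + part_size] for i in range(0, len(body), part_size)]
    ids = [make_id() for _ in chunks]
    content_id = make_id()          # shared by all parts of this document
    parts = []
    for number, chunk in enumerate(chunks, start=1):
        headers = {
            "Message-ID": ids[number - 1],
            "Content-Type": 'message/partial; id="%s"; number=%d; total=%d'
                            % (content_id, number, len(chunks)),
        }
        others = ids[:number - 1] + ids[number:]
        if others:
            # Assumed syntax: message IDs separated by blanks.
            headers["See-Also"] = " ".join(others)
        parts.append((headers, chunk))
    return parts
```

Listing every other part in See-Also, rather than only the
previous ones, follows the stated preference but requires that all
message IDs be chosen before the first part is posted.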
1670
1671 To repeat: implementations SHOULD avoid fixed constraints on
1672 the sizes of lines within an article and on the size of the
1673 entire article.
1674
1675
1676 4.7. Example
1677
1678 Here is a sample article:
1679
1680 From: jerry@eagle.ATT.COM (Jerry Schwarz)
1681 Path: cbosgd!mhuxj!mhuxt!eagle!jerry
1682 Newsgroups: news.announce
1683 Subject: Usenet Etiquette -- Please Read
1684 Message-ID: <642@eagle.ATT.COM>
1685 Date: Mon, 17 Jan 1994 11:14:55 -0500 (EST)
1686 Followup-To: news.misc
1687 Expires: Wed, 19 Jan 1994 00:00:00 -0500
1688 Organization: AT&T Bell Laboratories, Murray Hill
1689
1690 body
1691 body
1692 body
1693
1694
1695
1696 5. Mandatory Headers
1697
1698 An article MUST have one, and only one, of each of the fol-
1699 lowing headers: Date, From, Message-ID, Subject, Newsgroups,
1700 Path.
1701
1702 NOTE: MAIL specifies (if read most carefully) that
1703 there must be exactly one Date header and exactly
1704 one From header, but otherwise does not restrict
1705 multiple appearances of headers. (Notably, it
1706 permits multiple Message-ID headers!) This
1707 appears singularly useless, or even harmful, in
1708 the context of news, and much current news soft-
1709 ware will not tolerate multiple appearances of
1710 mandatory headers.
1711
1723 Note also that there are situations, discussed in the rele-
1724 vant parts of section 6, where References, Sender, or
1725 Approved headers are mandatory.
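
The one-and-only-one rule can be checked mechanically. The
following Python sketch is one possible approach, not a required
implementation: the function name is hypothetical, the input is
assumed to be a list of already-unfolded header lines, and header
names are compared without regard to case as an assumption.

```python
from collections import Counter

MANDATORY = ("Date", "From", "Message-ID", "Subject", "Newsgroups", "Path")


def mandatory_header_problems(header_lines):
    """Return a list of problems with the mandatory headers.

    header_lines is a list of "Name: content" strings (assumed to be
    unfolded already; parsing the article itself is not shown).
    """
    counts = Counter(line.split(":", 1)[0].strip().lower()
                     for line in header_lines)
    problems = []
    for name in MANDATORY:
        n = counts[name.lower()]
        if n == 0:
            problems.append("missing " + name)
        elif n > 1:
            problems.append("%d %s headers" % (n, name))
    return problems
```

The conditionally mandatory headers (References, Sender,
Approved) depend on context and are deliberately left out of this
sketch.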
1726
1727 In the discussions of the individual headers, the content of
1728 each is specified using the syntax notation. The convention
1729 used is that the content of, for example, the Subject header
1730 is defined as <Subject-content>.
1731
1732
1733 5.1. Date
1734
1735 The Date header contains the date and time when the article
1736 was submitted for transmission:
1737
1738 Date-content = [ weekday "," space ] date space time
1739 weekday      = "Mon" / "Tue" / "Wed" / "Thu"
1740              / "Fri" / "Sat" / "Sun"
1741 date         = day space month space year
1742 day          = 1*2digit
1743 month        = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun"
1744              / "Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"
1745 year         = 4digit / 2digit
1746 time         = hh ":" mm [ ":" ss ] space timezone
1747 timezone     = "UT" / "GMT"
1748              / ( "+" / "-" ) hh mm [ space "(" zone-name ")" ]
1749 hh           = 2digit
1750 mm           = 2digit
1751 ss           = 2digit
1752 zone-name    = 1*( <ASCII printable character except ()\> / space )
1753
1754 This is a restricted subset of the MAIL date format.
1755
1756 If a weekday is given, it MUST be consistent with the date.
1757 The modern Gregorian calendar is used, and dates MUST be
1758 consistent with its usual conventions; for example, if the
1759 month is May, the day must be between 1 and 31 inclusive.
1760 The year SHOULD be given as four digits, and posting agents
1761 SHOULD enforce this; however, relayers MUST accept the two-
1762 digit form, and MUST interpret it as having the implicit
1763 prefix "19".
1764
1765 NOTE: Two-digit year numbers can, should, and must
1766 be phased out by 1999.
1767
1768 The time is given on the 24-hour clock, e.g. two hours
1769 before midnight is "22:00" or "22:00:00". The hh must be
1770 between 00 and 23 inclusive, the mm between 0 and 59 inclu-
1771 sive, and the ss between 0 and 61 inclusive.
1772
1773 NOTE: Leap seconds very occasionally result in
1774 minutes that are 61 or 62 seconds long.
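
The syntax and consistency rules above can be combined into a
single check. The following Python sketch is a hedged illustration:
parse_date_content is a hypothetical name, and the treatment of
<space> as one or more blanks or tabs is an assumption about a
definition given elsewhere in this Draft.

```python
import datetime
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
SP = r"[ \t]+"   # assumption: <space> is one or more blanks or tabs

DATE_RE = re.compile(
    r"(?:(?P<weekday>" + "|".join(WEEKDAYS) + r")," + SP + r")?"
    + r"(?P<day>[0-9]{1,2})" + SP
    + r"(?P<month>" + "|".join(MONTHS) + r")" + SP
    + r"(?P<year>[0-9]{4}|[0-9]{2})" + SP
    + r"(?P<hh>[0-9]{2}):(?P<mm>[0-9]{2})(?::(?P<ss>[0-9]{2}))?" + SP
    + r"(?P<zone>UT|GMT|[+-][0-9]{4}(?:" + SP
    + r"\([\x20-\x27\x2a-\x5b\x5d-\x7e]+\))?)"
)


def parse_date_content(text):
    """Parse Date-content; raise ValueError if a rule above is broken.

    Returns (date, (hh, mm, ss), zone-text).
    """
    m = DATE_RE.fullmatch(text)
    if m is None:
        raise ValueError("does not match the Date-content syntax")
    year = int(m["year"])
    if len(m["year"]) == 2:
        year += 1900                          # implicit prefix "19"
    month = MONTHS.index(m["month"]) + 1
    # datetime.date rejects impossible dates such as 31 Apr.
    date = datetime.date(year, month, int(m["day"]))
    if m["weekday"] and WEEKDAYS[date.weekday()] != m["weekday"]:
        raise ValueError("weekday is inconsistent with the date")
    hh, mm, ss = int(m["hh"]), int(m["mm"]), int(m["ss"] or 0)
    if hh > 23 or mm > 59 or ss > 61:
        raise ValueError("time field out of range")
    return date, (hh, mm, ss), m["zone"]
```

The time is returned as a plain tuple because the standard
datetime.time type cannot represent a seconds value above 59.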
1775
1789 The date and time SHOULD be given in the poster's local
1790 timezone, including a specification of that timezone as a
1791 numeric offset (which SHOULD include the timezone name, e.g.
1792 "EST", supplied in parentheses like a MAIL comment). If
1793 not, they MUST be given in Universal Time (abbreviated "UT";
1794 "GMT" is a historical synonym for "UT"). The timezone name
1795 in parentheses, if present, is a comment; software MUST
1796 ignore it, except that reading agents might wish to display
1797 it to the reader. Timezone names other than "UT" and "GMT"
1798 MUST appear only in the comment.
1799
1800 NOTE: Attempts to deal with a full set of timezone
1801 names have all foundered on the vast number of
1802 such names in use and the duplications (for exam-
1803 ple, there are at least FIVE different timezones
1804 called "EST" by somebody). Even the limited set
1805 of North American zone names authorized by MAIL is
1806 subject to confusion and misinterpretation. Hence
1807 the flat ban on non-UT timezone names except as
1808 comments.
1809
1810 NOTE: RFC 1036 specified that use of GMT (aka UT,
1811 UTC) was preferred. However, the local time (in
1812 the poster's timezone) is arguably information of
1813 possible interest to the reader, and this requires
1814 some indication of the poster's timezone. Numeric
1815 offsets are an unambiguous way of doing this, and
1816 their use was indeed sanctioned by RFC 1036 (that
1817 is, this is a change of preference only).
1818
1819 NOTE: There is frequent confusion, including
1820 errors in some news software, regarding the sign
1821 of numeric timezones. Zones west of Greenwich
1822 have negative offsets. For example, North Ameri-
1823 can Eastern Standard Time is zone -0500 and North
1824 American Eastern Daylight Time is zone -0400.
1825
1826 NOTE: Implementors are warned that the hh in a
1827 timezone can go up to about 14; it is not limited
1828 to 12. This is because the International Date
1829 Line does not run exactly along the boundary
1830 between zone -1200 and zone +1200.
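
The sign convention is easy to get backwards, so a small worked
sketch may help. The Python below is illustrative only; both
function names are hypothetical. It does not cap hh at 12, for the
reason given in the preceding note.

```python
import datetime
import re


def zone_offset(zone):
    """Offset from UT as a timedelta, for "UT", "GMT", or a numeric zone."""
    if zone in ("UT", "GMT"):
        return datetime.timedelta(0)
    m = re.fullmatch(r"([+-])([0-9]{2})([0-9]{2})", zone)
    if m is None or int(m[3]) > 59:
        raise ValueError("not a valid numeric timezone")
    delta = datetime.timedelta(hours=int(m[2]), minutes=int(m[3]))
    # Zones west of Greenwich carry "-" and are behind UT.
    return -delta if m[1] == "-" else delta


def to_universal_time(local, zone):
    """Convert a naive local datetime in the given zone to naive UT."""
    return local - zone_offset(zone)
```

For example, 11:14:55 in zone -0500 corresponds to 16:14:55 UT,
since local time equals UT plus the (negative) offset.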
1831
1832 NOTE: The comments in section 2.6 regarding trans-
1833 lation to other languages are relevant here. The
1834 Date-content format, and the spellings of its com-
1835 ponents, as found in articles themselves, are
1836 always as defined in this Draft, regardless of the
1837 language used to interact with readers and
1838 posters. Reading and posting agents should trans-
1839 late as appropriate. Actually, even English-
1840 language reading and posting agents will probably
1841 want to do some degree of translation on dates, if
1842 only to abbreviate the lengthy format and
1855 (perhaps) translate to and from the reader's time-
1856 zone.
1857
1858
1859 5.2. From
1860
1861 The From header contains the electronic address, and possi-
1862 bly the full name, of the article's author:
1863
1864 From-content  = address [ space "(" paren-phrase ")" ]
1865               / [ plain-phrase space ] "<" address ">"
1866 paren-phrase  = 1*( paren-char / space / encoded-word )
1867 paren-char    = <ASCII printable character except ()<>\>
1868 plain-phrase  = plain-word *( space plain-word )
1869 plain-word    = unquoted-word / quoted-word / encoded-word
1870 unquoted-word = 1*unquoted-char
1871 unquoted-char = <ASCII printable character except !()<>@,;:\".[]>
1872 quoted-word   = quote 1*( quoted-char / space ) quote
1873 quote         = <" (ASCII 34)>
1874 quoted-char   = <ASCII printable character except "()<>\>
1875 address       = local-part "@" domain
1876 local-part    = unquoted-word *( "." unquoted-word )
1877 domain        = unquoted-word *( "." unquoted-word )
1878
1879 (Encoded words are described in section 4.5.) The full name
1880 is distinguished from the electronic address either by
1881 enclosing the former in parentheses (making it resemble a
1882 MAIL comment, after the address) or by enclosing the latter
1883 in angle brackets. The second form is preferred. In the
1884 first form, encoded words inside the full name MUST be com-
1885 posed entirely of <paren-char>s. In the second form,
1886 encoded words inside the full name may not contain charac-
1887 ters other than letters (of either case), digits, and the
1888 characters "!", "*", "+", "-", "/", "=", and "_". The local
1889 part is case-sensitive (except that all case counterparts of
1890 "postmaster" are deemed equivalent), the domain is case-
1891 insensitive, and all other parts of the From content are
1892 comments which MUST be ignored by news software (except
1893 insofar as reading agents may wish to display them to the
1894 reader). Posters and posting agents MUST restrict them-
1895 selves to this subset of the MAIL From syntax; relayers MAY
1896 accept a broader subset, but see the discussion in section
1897 9.1.
1898
1899 NOTE: The syntax here is a restricted subset of
1900 the MAIL From syntax, with quoting particularly
1901 restricted, for simple parsing. In particular,
1902 the presence of "<" in the From content indicates
1903 that the second form is being used, otherwise the
1904 first form is being used. The major restrictions
1905 here are those already de-facto imposed by exist-
1906 ing software.
1907
1921 NOTE: Overly-lenient posting agents sometimes per-
1922 mit the second form with a full name containing
1923 "(" or ")", but it is extremely rare for a full
1924 name to contain "<" or ">" even in mail. Accord-
1925 ingly, reading agents wishing to robustly deter-
1926 mine which form is in use in a particular article
1927 should key on the presence or absence of "<", not
1928 the presence or absence of "(".
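
A reading agent following this advice might separate the two
forms as in the Python sketch below. It is a minimal illustration
under stated assumptions: split_from is a hypothetical name, no
validation of the address itself is attempted, and quotation marks
around a full name are returned as found.

```python
def split_from(content):
    """Split From-content into (address, full name or None).

    The form is chosen by the presence of "<", never by "(".
    """
    content = content.strip()
    if "<" in content:
        # Second form: [ plain-phrase space ] "<" address ">"
        name, _, rest = content.partition("<")
        address, closer, _ = rest.partition(">")
        if not closer:
            raise ValueError('"<" without a closing ">"')
        return address.strip(), (name.strip() or None)
    # First form: address [ space "(" paren-phrase ")" ]
    address, opener, rest = content.partition("(")
    if not opener:
        return address, None
    rest = rest.rstrip()
    if not rest.endswith(")"):
        raise ValueError('"(" without a closing ")"')
    return address.strip(), rest[:-1]
```

Because the test is on "<", a lenient second-form header whose
full name contains parentheses is still split correctly.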
1929
1930 The address SHOULD be a valid and complete Internet domain
1931 address, capable of being successfully mailed to by an
1932 Internet host (possibly via an MX record and a forwarder).
1933 The pseudo-domain ".uucp" MAY be used for hosts registered
1934 in the UUCP maps (e.g. name "xyz.uucp" for registered site
1935 "xyz"), but such hosts SHOULD discontinue this usage (either
1936 by arranging a proper Internet address and forwarder, or by
1937 using the "% hack" (see below)), as soon as possible. Bit-
1938 net hosts SHOULD use Internet addresses, avoiding the obso-
1939 lescent ".bitnet" pseudo-domain. Other forms of address
1940 MUST not be used.
1941
1942 NOTE: "Other forms" specifically include UK-style
1943 "backward" domains ("uk.oxbridge.cs" is in the
1944 Czechoslovakia, not the UK), pure-UUCP addressing
1945 ("knee!shin!foot" instead of
1946 "foot%shin@knee.uucp"), and abbreviated domains
1947 ("zebra.zoo" instead of "zebra.zoo.toronto.edu").
1948
1949 If it is necessary to use the local part to specify a rout-
1950 ing relative to the nearest Internet host, this MUST be done
1951 using the "% hack", using "%" as a secondary "@". For exam-
1952 ple, to specify that mail to the address should go to Inter-
1953 net host "foo.bar.edu", then to non-Internet host "ein",
1954 then to non-Internet host "deux", for delivery there to
1955 mailbox "fred", a suitable address would be:
1956
1957 fred%deux%ein@foo.bar.edu
1958
1959 Analogous forms using "!" in the local part MUST not be
1960 used, as they are ambiguous; they should be expressed in the
1961 "%" form.
1962
1963 NOTE: "a!b@c" can be interpreted as either "b%c@a"
1964 or "b%a@c", and there is no consistency in which
1965 choice is made. Such addresses consequently are
1966 unreliable. The "%" form does not suffer from
1967 this problem, and although its use is officially
1968 discouraged, it is a de-facto standard, to the
1969 point that MAIL recognizes it.
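
The construction of a "% hack" address can be sketched in Python
as follows. Both function names are hypothetical, and next_hop
reflects the conventional reading of "%" as a secondary "@" (the
rightmost "%" becomes the new "@" at each hop), which is an
assumption about forwarder behavior rather than something this
Draft specifies.

```python
def percent_hack(mailbox, hops, internet_host):
    """Build a "% hack" address.

    hops lists the non-Internet hosts in the order the mail visits
    them after leaving internet_host.
    """
    local_part = mailbox + "".join("%" + hop for hop in reversed(hops))
    return local_part + "@" + internet_host


def next_hop(address):
    """Address as rewritten by the host named to the right of "@"."""
    local_part, _, _host = address.rpartition("@")
    head, percent, tail = local_part.rpartition("%")
    if not percent:
        raise ValueError("local part carries no further routing")
    return head + "@" + tail
```

Applied to the example above, percent_hack("fred", ["ein",
"deux"], "foo.bar.edu") yields the address shown, and two
successive next_hop steps reduce it to "fred@deux".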
1970
1971 Relayers MUST not, repeat MUST not, repeat MUST not, rewrite
1972 From lines, in any way, however minor or innocent-seeming.
1973 Trying to "fix" a non-conforming address has a very high
1974 probability of making things worse. Either pass it along
1987 unchanged, or reject the article.
1988
1989 NOTE: An additional reason for banning the use of
1990 "!" addressing is that it has a much higher proba-
1991 bility of being rewritten into mangled unrecogniz-
1992 ability by old relayers.
1993
1994 Posters and posting agents SHOULD avoid use of the charac-
1995 ters "!" and "@" in full names, as they may trigger unwanted
1996 header rewriting by old, simple-minded news software.
1997
1998 NOTE: Also, the characters "." and ",", not infre-
1999 quently found in names (e.g., "John W. Campbell,
2000 Jr."), are NOT, repeat NOT, allowed in an unquoted
2001 word. A From header like the following MUST not
2002 be written without the quotation marks:
2003
2004 From: "John W. Campbell, Jr." <editor@analog.com>
2005
2006
2007
2008 5.3. Message-ID
2009
2010 The Message-ID header contains the article's message ID, a
2011 unique identifier distinguishing the article from every
2012 other article:
2013
2014 Message-ID-content = message-id
2015 message-id = "<" local-part "@" domain ">"
2016
2017 As with From addresses, a message ID's local part is case-
2018 sensitive and its domain is case-insensitive. The "<" and
2019 ">" are parts of the message ID, not peculiarities of the
2020 Message-ID header.
2021
2022 NOTE: News message IDs are a restricted subset of
2023 MAIL message IDs. In particular, no existing news
2024 software copes properly with MAIL quoting conven-
2025 tions within the local part, so they are forbid-
2026 den. This is unfortunate, particularly for X.400
2027 gateways that often wish to include characters
2028 which are not legal in unquoted message IDs, but
2029 it is impossible to fix net-wide. See the notes
2030 on gatewaying in section 10.
2031
2032 The domain in the message ID SHOULD be the full Internet
2033 domain name of the posting agent's host. Use of the ".uucp"
2034 pseudo-domain (for hosts registered in the UUCP maps) or the
2035 ".bitnet" pseudo-domain (for Bitnet hosts) is permissible,
2036 but SHOULD be avoided.
2037
2038 Posters and posting agents MUST generate the local part of a
2039 message ID using an algorithm which obeys the specified syn-
2040 tax (words separated by ".", with certain characters not
2053 permitted) (see section 5.2 for details), and will not
2054 repeat itself (ever). The algorithm SHOULD not generate
2055 message IDs which differ only in case of letters. Note the
2056 specification in section 6.5 of a recommended convention for
2057 indicating subject changes. Otherwise the algorithm is up
2058 to the implementor.
2059
2060 NOTE: The crucial use of message IDs is to distin-
2061 guish circulating articles from each other and
2062 from articles circulated recently. They are also
2063 potentially useful as permanent indexing keys,
2064 hence the requirement for permanent uniqueness...
2065 but indexers cannot absolutely rely on this
2066 because the earlier RFCs urged it but did not
2067 demand it. All major implementations have always
2068 generated permanently-unique message IDs by
2069 design, but in some cases this is sensitive to
2070 proper administration, and duplicates may have
2071 occurred by accident.
2072
2073 NOTE: The most popular method of generating local
2074 parts is to use the date and time, plus some way
2075 of distinguishing between simultaneous postings on
2076 the same host (e.g. a process number), and encode
2077 them in a suitably-restricted alphabet. An older
2078 but now less-popular alternative is to use a
2079 sequence number, incremented each time the host
2080 generates a new message ID; this is workable, but
2081 requires careful design to cope properly with
2082 simultaneous posting attempts, and is not as
2083 robust in the presence of crashes and other mal-
2084 functions.
2085
2086 NOTE: Some buggy news software considers message
2087 IDs completely case-insensitive, hence the advice
2088 to avoid relying on case distinctions. The
2089 restrictions placed on the "alphabet" of local
2090 parts and domains in section 5.2 have the useful
2091 side effect of making it unnecessary to parse mes-
2092 sage IDs in complex ways to break them into case-
2093 sensitive and case-insensitive portions.
2094
2095 The local part of a message ID MUST not be "postmaster" or
2096 any other string that would compare equal to "postmaster" in
2097 a case-insensitive comparison. Message IDs MUST be no
2098 longer than 250 octets, including the "<" and ">".
2099
2100 NOTE: "Postmaster" is an irksome exception to
2101 case-sensitivity in local parts, inherited from
2102 MAIL, and simply avoiding it is the best way to
2103 deal with it (not that it's likely, but the issue
2104 needs to be dealt with). The length limit is
2105 undesirable, but is present in widely-used exist-
2106 ing software. The limit is actually 255, but a
2119 small safety margin is wise.
2120
2121
2122 5.4. Subject
2123
2124 The Subject header's content (the "subject" of the article)
2125 is a short phrase describing the topic of the article:
2126
2127 Subject-content = [ "Re: " ] nonblank-text
2128
2129 Encoded words MAY appear in this header.
2130
2131 If the article is a followup, the subject SHOULD begin with
2132 "Re: " (a "back reference"). If the article is not a fol-
2133 lowup, the subject MUST not begin with a back reference.
2134 Back references are case-insensitive, although "Re: " is the
2135 preferred form. A followup agent assisting a poster in
2136 preparing a followup SHOULD prepend a back reference, UNLESS
2137 the subject already begins with one. If the poster deter-
2138 mines that the topic of the followup differs significantly
2139 from what is described in the subject, a new, more descrip-
2140 tive, subject SHOULD be substituted (with no back refer-
2141 ence). An article whose subject begins with a back refer-
2142 ence MUST have a References header referencing the precur-
2143 sor.
2144
2145 NOTE: A back reference is FOUR characters, the
2146 fourth being a blank. RFC 1036 was confused about
2147 this. Observe also that only ONE back reference
2148 should be present.
2149
2150 NOTE: There is a semi-standard convention, often
2151 used, in which a subject change is flagged by mak-
2152 ing the new Subject-content of the form:
2153
2154 new topic (was: old topic)
2155
2156 possibly with "old topic" somewhat truncated.
2157 Posters wishing to do something like this are
2158 urged to use this exact form, to simplify auto-
2159 mated analysis.
2160
2161 For historical reasons, the subject MUST not begin with
2162 "cmsg " (note that this sequence ends with a blank).
2163
2164 NOTE: Some old news software takes a subject
2165 beginning with "cmsg " as an indication that the
2166 article is a control message (see sections 6.6 and
2167 7). This mechanism is obsolete and undesirable,
2168 but accidental triggering of it is still possible.
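
The back-reference and "cmsg " rules above lend themselves to a
short sketch. The Python below is illustrative; both function
names are hypothetical, and treating "cmsg " as an exact,
case-sensitive match is an assumption.

```python
def followup_subject(precursor_subject):
    """Subject for a followup: prepend "Re: " unless already present.

    A back reference is four characters, matched without regard to
    case; an existing one is left exactly as found.
    """
    if precursor_subject[:4].lower() == "re: ":
        return precursor_subject
    return "Re: " + precursor_subject


def subject_problem(subject, has_references):
    """Return a description of the first rule broken, or None."""
    if subject.startswith("cmsg "):
        return 'subject begins with "cmsg "'
    if subject[:4].lower() == "re: " and not has_references:
        return "back reference without a References header"
    return None
```

Note that "Re:x" does not begin with a back reference, because
the fourth character is not a blank.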
2169
2170 The subject SHOULD be terse. Posters SHOULD avoid trying to
2171 cram their entire article into the headers; even the sim-
2172 plest query usually benefits from a sentence or two of
2185 elaboration and context, and the details of header display
2186 vary widely among reading agents.
2187
2188 NOTE: All-in-the-subject articles are sometimes
2189 the result of misunderstandings over the interac-
2190 tion protocol of a posting agent. Posting agents
2191 might wish to give special attention to the possi-
2192 bility that a poster specifying a very long sub-
2193 ject might have thought he was typing the body of
2194 the article.
2195
2196
2197 5.5. Newsgroups
2198
2199 The Newsgroups header's content specifies which newsgroup(s)
2200 the article is posted to:
2201
2202 Newsgroups-content = newsgroup-name *( ng-delim newsgroup-name )
2203 newsgroup-name     = plain-component *( "." component )
2204 component          = plain-component / encoded-word
2205 plain-component    = component-start *13component-rest
2206 component-start    = lowercase / digit
2207 lowercase          = <letter a-z>
2208 component-rest     = component-start / "+" / "-" / "_"
2209 ng-delim           = ","
2210
2211 Encoded words used in newsgroup names MUST not contain char-
2212 acters other than letters, digits, "+", "-", "/", "_", "=",
2213 and "?" (although they may encode them).
2214
2215 A newsgroup name consists of one or more components, which
2216 may be plain components or (except for the first) encoded
2217 words. A plain component MUST contain at least one letter,
2218 MUST begin with a letter or digit, and MUST not be longer
2219 than 14 characters. The first component MUST begin with a
2220 letter; subsequent components SHOULD begin with a letter.
2221 Newsgroup names MUST not contain uppercase letters, except
2222 where required by encodings in encoded words. The sequences
2223 "all" and "ctl" MUST not be used as components.
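
These rules can be expressed as a small validator. The Python
sketch below is a simplified illustration: the function name is
hypothetical, and names containing encoded words are not handled
(they would be reported as bad components).

```python
import re

# component-start followed by at most 13 component-rest characters
_PLAIN = re.compile(r"[a-z0-9][a-z0-9+_-]{0,13}")


def plain_newsgroup_problem(name):
    """Return the first rule a plain newsgroup name breaks, or None."""
    for i, comp in enumerate(name.split(".")):
        if _PLAIN.fullmatch(comp) is None:
            return "bad component %r" % comp
        if not any(c.isalpha() for c in comp):
            return "component %r has no letter" % comp
        if comp in ("all", "ctl"):
            return "reserved component %r" % comp
        if i == 0 and not comp[0].isalpha():
            return "first component must begin with a letter"
    return None
```

The SHOULD-level preference that later components also begin with
a letter is not enforced here; an implementation might report it
as a warning instead.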
2224
2225 NOTE: The alphabet and syntax specified encom-
2226 passes all existing names of widespread news-
2227 groups, while avoiding various forms that are
2228 known to cause problems. Important existing soft-
2229 ware uses various non-alphanumeric characters as
2230 punctuation adjacent to newsgroup names. (It
2231 would, in fact, be preferable to ban "+" from
2232 newsgroup names, were it not that several
2233 widespread newsgroups related to the C++ program-
2234 ming language already use it.)
2235
2236 NOTE: Much existing software converts the news-
2237 group name into a directory path and stores the
2238 articles themselves using numeric filenames, so
2251 all-digit name components can be troublesome; the
2252 "Great Renaming" early in the history of Usenet
2253 included revisions of several newsgroup names to
2254 eliminate such components.
2255
2256 NOTE: The same storage technique is the reason for
2257 the 14-character limit. The limit is now largely
2258 historical, since most modern systems have much
2259 larger limits on the length of a directory entry's
2260 name, but many old systems are still in use. Sys-
2261 tems with shorter limits also exist, but news
2262 software on such systems has had to deal with the
2263 problem already, since there are several
2264 widespread newsgroups with 14-character components
2265 in their names. Implementors are warned that it
2266 is intended that the successor to this Draft will
2267 increase the 14-character limit, and are urged to
2268 fix their software to handle longer names grace-
2269 fully (if such fixes are necessary, given the
2270 intended domain of application of the particular
2271 software).
2272
2273 NOTE: The requirement that the first character of
2274 a name be a letter accommodates existing software
2275 which assumes it can tell the difference between a
2276 newsgroup name and other possible syntactic enti-
2277 ties by inspecting the first character. Similar
2278 considerations motivate excluding "+", "-", and
2279 "_" from coming first in a component, and the
2280 preference for components that do not begin with
2281 digits. The "all" sequence is used as a wildcard
2282 symbol in much existing software, and the "ctl"
2283 sequence was involved in an obsolete historical
2284 mechanism for marking control messages, so they
2285 are best avoided.
2286
2287 NOTE: Possibly newsgroup names should have been
2288 case-insensitive, but all existing software treats
2289 them as case-sensitive. (RFC 977 [rrr] claims
2290 that they are case-insensitive in NNTP, but exist-
2291 ing implementations are believed to ignore this.)
2292 The simplest solution is just to ban use of upper-
2293 case letters, since no widespread newsgroup name
2294 uses them anyway; this avoids any possibility of
2295 confusion.
2296
2297 NOTE: The syntax has the disadvantage of contain-
2298 ing no white space, making it impossible to con-
2299 tinue a Newsgroups header across several lines.
2300 Implementors of relayers and reading agents are
2301 warned that it is intended that the successor to
2302 this Draft will change the definition of ng-delim
2303 to:
2304
2317 ng-delim = "," [ space ]
2318
2319 and are urged to fix their software to handle
2320 (i.e., ignore) white space following the commas.
2321 Meanwhile, posters must avoid inserting such space
2322 (despite the natural-language convention which
2323 permits it) and posting agents should strip it
2324 out.
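
Tolerant parsing and stripping of such white space might be
sketched in Python as follows. The function names are
hypothetical. The stripping function is meant for posting agents
only, since relayers are not permitted to rewrite the Newsgroups
header.

```python
import re


def split_newsgroups(content):
    """Split Newsgroups-content on commas, ignoring white space
    (blanks or tabs) that follows a comma."""
    return re.split(r",[ \t]*", content.strip())


def strip_newsgroups_space(content):
    """Return the content with white space after commas removed,
    as a posting agent might do before the article is posted."""
    return ",".join(split_newsgroups(content))
```

For example, "comp.lang.c, news.misc" becomes
"comp.lang.c,news.misc".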
2325
2326 NOTE: Encoded words as components are somewhat
2327 problematic, but are clearly desirable for use in
2328 non-English-speaking nations. They are not sub-
2329 ject to the 14-character limit, and this (plus the
2330 possibility of "/" within them) may require spe-
2331 cial handling in news software.
2332
2333 Encoded words are allowed in newsgroup names ONLY where non-
2334 ASCII characters are necessary to the name, and must use the
2335 "b" encoding [rrr] and the first suitable character set in
2336 the MIME order of preferred character sets [rrr].
2337
2338 NOTE: Since the newsgroup name is the encoded
2339 form, NOT the underlying non-ASCII form, there is
2340 room for terrible confusion here if the choice of
2341 encoding for a particular name is not fully stan-
2342 dardized.
2343
2344 Posters SHOULD use only the names of existing newsgroups in
2345 the Newsgroups header, because newsgroups are NOT created
2346 simply by being posted to. However, it is legitimate to
2347 cross-post to newsgroup(s) which do not exist on the posting
2348 agent's host, provided that at least one of the newsgroups
2349 DOES exist there, and followup agents MUST accept this
2350 (posting agents MAY accept it, but SHOULD at least alert the
2351 poster to the situation and request confirmation). Relayers
2352 MUST not rewrite Newsgroups headers in any way, even if some
2353 or all of the newsgroups do not exist on the relayer's host.
2354
2355 NOTE: Early experience with news software that
2356 created newsgroups when they were mentioned in a
2357 Newsgroups header was thoroughly negative: posters
2358 frequently mistype newsgroup names.
2359
2360 NOTE: While it is legitimate for some of an arti-
2361 cle's newsgroups not to exist on the host where it
2362 is posted, this IS a rather unusual situation
2363 except in followups (which should go to all news-
2364 groups the precursor was posted to, even if not
2365 all of them reach the site where the followup is
2366 being posted).
2367
2368 NOTE: Rewriting Newsgroups headers to strip
2369 locally-unknown newsgroups is superficially
2370 attractive. However, early experience with
2383 exactly that policy was thoroughly negative: news
2384 propagation is more redundant and much less
2385 orderly than many people imagine, and in particu-
2386 lar it is not unheard-of for the (sometimes)
2387 fastest path between two (say) U of Toronto sites
2388 to pass outside U of Toronto... in which case
2389 newsgroup stripping can cause incomplete propaga-
2390 tion. Having an article's set of newsgroups
2391 change as it propagates can also result in fol-
2392 lowups not achieving the same propagation as the
2393 original. It's been tried; it's more trouble than
2394 it's worth; don't do it.
2395
2396 NOTE: In particular, newsgroup stripping superfi-
2397 cially looks like a solution to the problem of
2398 duplicate regional newsgroup names. For example,
2399 both University of Toronto and University of Texas
2400 have "ut.general" newsgroups, and material cross-
2401 posted to that name and a global newsgroup appears
2402 in both universities' local newsgroups. However,
2403 the side effects of stripping are sufficiently
2404 unacceptable to disqualify it for this purpose.
2405 Don't do it.
2406
2407 Cross-posting an article to several relevant newsgroups is
2408 far superior to posting separate articles with duplicated
2409 content to each newsgroup, because reading agents can detect
2410 the situation and show the article to a reader only once.
2411 Posters SHOULD cross-post rather than duplicate-post.
2412
2413 NOTE: On the other hand, cross-posting to a large
2414 number of newsgroups usually indicates that the
2415 poster has not thought about his audience; arti-
2416 cles are rarely pertinent to more than (say) half
2417 a dozen newsgroups. Posting agents might wish to
2418 request confirmation when the number of newsgroups
2419 exceeds (say) five in the presence of a Followup-
2420 To header, or (say) two in the absence of such a
2421 header.
2422
2423 NOTE: One problem with cross-postings is what to
2424 do with an article cross-posted to a set of news-
2425 groups including both moderated and unmoderated
2426 ones. Posters tend to expect such an article to
2427 show up immediately in the unmoderated newsgroups,
2428 especially if they do not realize that one or more
2429 of the newsgroups is moderated. However, since it
2430 is not possible for a moderator to retroactively
2431 add an already-posted article to a moderated news-
2432 group, the only correct action is to mail such an
2433 article to one (and only one) of the moderators
2434 for action. It is probably best for the posting
2435 agent to detect this situation and ask the poster
2436 what action is preferred. The acceptable choices
2449 are to alter the newsgroup list or to mail to a
2450 moderator of the poster's choice; the posting
2451 agent should NOT offer duplicate-posting as an
2452 easy-to-request option (if only because many mod-
2453 erators will reject a submission that has already
2454 been posted to unmoderated newsgroups).
2455
2456 NOTE: An article cross-posted to multiple moder-
2457 ated newsgroups really should have approval from
2458 all the moderators involved. In practice, the
2459 only straightforward way to do this is to send the
2460 article to one of them and have him consult the
2461 others.
2462
2463 A newsgroup SHOULD not appear more than once in the News-
2464 groups header.
2465
2466 Newsgroup names having only one component are reserved for
2467 newsgroups whose propagation is restricted to a single host
2468 (or the administrative equivalent). It is inadvisable to
2469 name a newsgroup "poster" because that word has special
2470 meaning in the Followup-To header (see section 6.1). The
2471 names "control" and "junk" are frequently used for pseudo-
2472 newsgroups internal to relayer implementations, and hence
2473 are also best avoided.
2474
2475 NOTE: Beware of the duplicate-regional-newsgroup-
2476 names problem mentioned above. In particular,
2477 there are many, many hosts with a newsgroup named
2478 "general", and some surprising things show up in
2479 such newsgroups when people cross-post. It is
2480 probably better to use multi-component names,
2481 which are less likely to be duplicated. Fred's
2482 Widget House should use "fwh.general" rather than
2483 just "general" as its in-house general-topics
2484 newsgroup.
2485
2486 It is conventional to reserve newsgroup names beginning with
2487 "to." for test messages sent on an essentially point-to-
2488 point basis (see also the ihave/sendme protocol described in
2489 section 7.2); newsgroup names beginning with "to." SHOULD
2490 not be used for any other purpose. The second (and possibly
2491 later) components of such a name should, together, comprise
2492 the relayer name (see section 5.6) of a relayer. The news-
2493 group exists only at the named relayer and its neighbors.
2494 The neighbors all pass that newsgroup to the named relayer,
2495 while the named relayer does not pass it to anyone.
2496
2497 The order of newsgroup names in the Newsgroups header is not
2498 significant.
2499
2500
2515 5.6. Path
2516
2517 The Path header's content indicates which relayers the arti-
2518 cle has already visited, so that unnecessary redundant
2519 transmission can be avoided:
2520
2521 Path-content = [ path-list path-delimiter ] local-part
2522 path-list = relayer-name *( path-delimiter relayer-name )
2523 relayer-name = 1*rn-char
2524 rn-char = letter / digit / "." / "-" / "_"
2525 path-delimiter = "!"
2526
2527 The Path content is a list of relayer names, separated by
2528 path delimiters, followed (after a final delimiter) by the
2529 local part of a mailing address. Each relayer MUST prepend
2530 its name, and a delimiter, to the Path content in all arti-
2531 cles it processes. A relayer MUST not pass an article to a
2532 neighboring relayer whose name is already mentioned in an
2533 article's path list, unless this is explicitly requested by
2534 the neighbor in some way. The Path content is case-
2535 sensitive.
2536
2537 NOTE: The Path header supplied by a posting agent
2538 should normally contain only the local part. The
2539 relayer that the posting agent passes the article
2540 to for posting will prepend its relayer name to
2541 get the path list started.
2542
2543 NOTE: Observe that the trailing local part is NOT
2544 part of the path list. This Path header:
2545
2546 Path: fee!fie!foe!fum
2547
2548 contains three relayer names: "fee", "fie", and
2549 "foe". A relayer named "fum" is still eligible to
2550 be sent this article.
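
NOTE: The rules above can be sketched in Python roughly as
follows. This is a minimal illustration, not a reference
implementation: the function names are invented for the
example, and the exception for a neighbor that explicitly
requests articles already bearing its name is not modeled.

```python
import re

def split_path(path_content):
    """Split Path content into (path_list, local_part)."""
    parts = path_content.split("!")
    # Everything before the final delimiter is the path list;
    # the trailing local part is NOT part of it.
    return parts[:-1], parts[-1]

def may_send_to(path_content, neighbor):
    """True if `neighbor` is not yet named in the path list.

    The comparison is case-sensitive, as the Path content is.
    """
    path_list, _local_part = split_path(path_content)
    return neighbor not in path_list

def prepend_relayer(path_content, relayer):
    """Prepend a relayer name and a delimiter to the Path content."""
    # relayer-name = 1*rn-char, rn-char = letter / digit / "." / "-" / "_"
    if re.fullmatch(r"[A-Za-z0-9._-]+", relayer) is None:
        raise ValueError("not a valid relayer-name: %r" % relayer)
    return relayer + "!" + path_content
```

With the header shown above, split_path("fee!fie!foe!fum")
yields the path list fee, fie, foe and the local part fum,
so may_send_to reports that a relayer named "fum" is still
eligible.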
2551
2552 NOTE: This syntax has the disadvantage of contain-
2553 ing no white space, making it impossible to con-
2554 tinue a Path header across several lines. Imple-
2555 mentors of relayers and reading agents are warned
2556 that it is intended that the successor to this
2557 Draft will change the definition of path delimiter
2558 to:
2559
2560 path-delimiter = "!" [ space ]
2561
2562 and are urged to fix their software to handle
2563 (i.e., ignore) white space following the exclama-
2564 tion points. They are urged to hurry; some ill-
2565 behaved systems reportedly already feel free to
2566 add such white space.
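
NOTE: One way to tolerate such white space is sketched
below in Python. The function name is an assumption for the
example, and the sketch is deliberately more liberal than
the proposed grammar: it ignores any run of blanks, tabs, or
line breaks after an exclamation point.

```python
import re

def split_path_tolerant(path_content):
    """Split Path content, ignoring white space after each "!"."""
    # Treat "!" plus any following white space as one delimiter.
    parts = re.split(r"![ \t\r\n]*", path_content.strip())
    return parts[:-1], parts[-1]
```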
2567
2581 NOTE: RFC 1036 allows considerably more flexibil-
2582 ity in choice of delimiter, in theory, but this
2583 flexibility has never been used and most news
2584 software does not implement it properly. The
2585 grammar reflects the current reality. Note, in
2586 particular, that RFC 1036 treats "_" as a delim-
2587 iter, but in fact it is known to appear in relayer
2588 names occasionally.
2589
2590 Because an article will not propagate to a relayer already
2591 mentioned in its path list, the path list MUST not contain
2592 any names other than those of relayers the article has
2593 passed through AS NEWS. This is trivially obvious for nor-
2594 mal news articles, but requires attention from the modera-
2595 tors of moderated newsgroups and the implementors and main-
2596 tainers of gateways.
2597
2598 NOTE: For the same reason, a relayer and its
2599 neighbors need to agree on the choice of relayer
2600 name, and names should not be changed without
2601 notifying neighbors.
2602
2603 Relayer names need to be unique among all relayers which
2604 will ever see the articles using them. A relayer name is
2605 normally either an "official" name for the host the relayer
2606 runs on, or some other "official" name controlled by the
2607 same organization. Except in cooperating subnets that agree
2608 to some other convention, and don't let articles using it
2609 escape beyond the subnet, a relayer name MUST be either a
2610 UUCP name registered in the UUCP maps (without any domain
2611 suffix such as ".UUCP"), or a complete Internet domain name.
2612 Use of a (registered) UUCP name is recommended, where prac-
2613 tical, to keep the length of the path list down.
2614
2615 The use of Internet domain names in the path list presents
2616 one problem: domain names are case-insensitive, but the path
2617 list is case-sensitive. Relayers using domain names as
2618 their relayer names MUST pick a standard form for the name,
2619 and use that form consistently to the exclusion of all oth-
2620 ers. The preferred form for this purpose, which relayers
2621 SHOULD use, is the all-lowercase form.
2622
2623 NOTE: It is arguably unfortunate that the path
2624 list is case-sensitive, but it is much too late to
2625 change this. Most Internet sites do, in any
2626 event, use one standardized form of their name
2627 almost everywhere.
2628
2629 In the ordinary case, where the poster is the author of the
2630 article, the local part following the path list SHOULD be
2631 the local part of the poster's full Internet domain mailing
2632 address.
2633
2647 NOTE: It should be just the local part, not the
2648 full address. The character "@" does not appear
2649 in a Path header.
2650
2651 The Path content somewhat resembles a mailing address, par-
2652 ticularly in the UUCP world with its manual routing and "!"
2653 address syntax. Historically, this resemblance was impor-
2654 tant, and the Path content was often used as a reply
2655 address. This practice has always been somewhat unreliable,
2656 since news paths are not always mail paths and news relayer
2657 names are not always recognized by mail handlers, and its
2658 reliability has generally worsened in recent times. The
2659 widespread use and recognition of Internet domain
2660 addresses, even outside the actual Internet, has largely
2661 eliminated the need for it. Readers SHOULD not use the Path
2662 content as a reply address. On the other hand, relayer
2663 administrators are urged not to break this usage without
2664 good reason; where practical, paths followed by news SHOULD
2665 be traversable by mail, and mail handlers SHOULD recognize
2666 relayer names as host names.
2667
2668 It will typically be difficult or impractical for gateways
2669 and moderators to supply a Path content that is useful as a
2670 reply address for the author, bearing in mind that the path
2671 list they supply will normally be empty. (To reiterate: the
2672 path list MUST not contain any names other than those of
2673 relayers the article has passed through AS NEWS.) They
2674 SHOULD supply a local part that will result in replies to a
2675 Path-derived address being returned to the sender with a
2676 brief explanation. Software permitting, the local part
2677 "not-for-mail" is recommended.
2678
2679 NOTE: A moderator or gateway administrator who
2680 supplies a local part that delivers such mail to
2681 an administrative mailbox will quickly discover
2682 why it should be bounced automatically! It is
2683 best, however, for the returned message to include
2684 an explanation of what has probably happened,
2685 rather than just a mysterious "undeliverable mail"
2686 complaint, since the sender may not be aware that
2687 his/her software is unwisely using the Path con-
2688 tent as a reply address. Reply software might
2689 wish to question attempts to reply to a Path-
2690 derived address ending in "not-for-mail" (which is
2691 why a specific name is being recommended here).
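
NOTE: The check suggested for reply software might look
like the following Python sketch. The function name is an
assumption for the example; only the local part
"not-for-mail" comes from the text above.

```python
def path_reply_is_suspect(path_content):
    """True if a Path-derived address would end in "not-for-mail"."""
    # The local part is whatever follows the final "!" (if any).
    local_part = path_content.rsplit("!", 1)[-1].strip()
    return local_part == "not-for-mail"
```

Reply software could use a True result to question the
attempt before any mail is sent.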
2692
2693
2694 6. Optional Headers
2695
2696 Many MAIL headers, and many of those specified in present
2697 and future MAIL extensions, are potentially applicable to
2698 news. Headers specific to MAIL's point-to-point transmis-
2699 sion paradigm, e.g. To and Cc, SHOULD not appear in news
2700 articles. (Gateways wishing to preserve such information
2713 for debugging probably SHOULD hide it under different names;
2714 prefixing "X-" to the original headers, resulting in e.g.
2715 "X-To", is suggested.)
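
NOTE: A gateway following this suggestion might rename
headers along these lines. The Python sketch below is only
an illustration; the function name and the representation of
headers as (name, content) pairs are assumptions, and only
the two headers named above (To and Cc) are handled.

```python
MAIL_ONLY_HEADERS = {"to", "cc"}  # the two examples named in the text

def hide_mail_headers(headers):
    """Rename mail-only headers by prefixing "X-".

    `headers` is a list of (name, content) pairs; order is preserved.
    """
    renamed = []
    for name, content in headers:
        if name.lower() in MAIL_ONLY_HEADERS:
            name = "X-" + name
        renamed.append((name, content))
    return renamed
```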
2716
2717 The following optional headers are either specific to news
2718 or of particular note in news articles; an article MAY con-
2719 tain some or all of them. (Note that there are some circum-
2720 stances in which some of them are mandatory; these are
2721 explained under the individual headers.) An article MUST
2722 not contain two or more headers with any one of these header
2723 names.
2724
2725 NOTE: The ban on duplicate header names does not
2726 apply to headers not specified in this Draft at
2727 all, such as "X-" headers. Software should not
2728 assume that all header names in a given article
2729 are unique.
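
NOTE: A check for banned duplicates might be sketched in
Python as follows. The set of names is only a partial list,
and comparing header names without regard to case is an
assumption of this example, as is the function name.

```python
from collections import Counter

# Partial list; extend it with the other headers this Draft defines.
UNIQUE_HEADERS = {
    "followup-to", "expires", "reply-to", "sender", "references",
    "control", "distribution", "keywords", "summary", "approved",
    "lines",
}

def duplicated_headers(header_names):
    """Return the restricted header names that occur more than once."""
    counts = Counter(name.lower() for name in header_names)
    return sorted(name for name, count in counts.items()
                  if count > 1 and name in UNIQUE_HEADERS)
```

Repeated "X-" headers are deliberately not reported, since
the ban does not apply to them.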
2730
2731
2732 6.1. Followup-To
2733
2734 The Followup-To header contents specify which newsgroup(s)
2735 followups should be posted to:
2736
2737 Followup-To-content = Newsgroups-content / "poster"
2738
2739 The syntax is the same as that of the Newsgroups content,
2740 with the exception that the magic word "poster" means that
2741 followups should be mailed to the article's reply address
2742 rather than posted. In the absence of Followup-To, the
2743 default newsgroup(s) for a followup are those in the News-
2744 groups header.
2745
2746 NOTE: The way to request that followups be mailed
2747 to a specific address other than that in the From
2748 line is to supply "Followup-To: poster" and a
2749 Reply-To header. Putting a mailing address in the
2750 Followup-To line is incorrect; posting agents
2751 should reject or rewrite such headers.
2752
2753 NOTE: There is no syntax for "no followups
2754 allowed" because "Followup-To: poster" accom-
2755 plishes this effect without extra machinery.
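
NOTE: The choice of destination for a followup might be
sketched in Python as follows. The function name and the use
of a dictionary keyed by header name are assumptions for the
example, as are the sample newsgroup names; the sketch also
assumes comma-separated newsgroup names.

```python
def followup_target(headers):
    """Return ("mail", address) or ("post", [newsgroup, ...])."""
    followup_to = headers.get("Followup-To")
    if followup_to is not None and followup_to.strip() == "poster":
        # Mail to the reply address: Reply-To if present, else From.
        return ("mail", headers.get("Reply-To", headers["From"]))
    if followup_to is not None:
        content = followup_to
    else:
        # Default: the newsgroups of the precursor.
        content = headers["Newsgroups"]
    return ("post", [g for g in content.strip().split(",") if g])
```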
2756
2757 Although it is generally desirable to limit followups to the
2758 smallest reasonable set of newsgroups, especially when the
2759 precursor was cross-posted widely, posting agents SHOULD not
2760 supply a Followup-To header except at the poster's explicit
2761 request.
2762
2763 NOTE: In particular, it is incorrect for the post-
2764 ing agent to assume that followups to a cross-
2765 posted article should be directed to the first
2766 newsgroup only. Trimming the list of newsgroups
2779 should be the poster's decision, not the posting
2780 agent's. However, when an article is to be cross-
2781 posted to a considerable number of newsgroups, a
2782 posting agent might wish to SUGGEST to the poster
2783 that followups go to a shorter list.
2784
2785
2786 6.2. Expires
2787
2788 The Expires header content specifies a date and time when
2789 the article is deemed to be no longer useful and should be
2790 removed ("expired"):
2791
2792 Expires-content = Date-content
2793
2794 The content syntax is the same as that of the Date content.
2795 In the absence of Expires, the default is decided by the
2796 administrators of each host the article reaches, who MAY
2797 also restrict the extent to which the Expires header is hon-
2798 ored.
2799
2800 The Expires header has two main applications: removing arti-
2801 cles whose utility ends on a specific date (e.g., event
2802 announcements which can be removed once the day of the event
2803 is past) and preserving articles expected to be of prolonged
2804 usefulness (e.g., information aimed at new readers of a
2805 newsgroup). The latter application is sometimes abused.
2806 Since individual hosts have local policies for expiration of
2807 news (depending on available disk space, for instance),
2808 posters SHOULD not provide Expires headers for articles
2809 unless there is a natural expiration date associated with
2810 the topic. Posting agents MUST not provide a default
2811 Expires header. Leave it out and allow local policies to be
2812 used unless there is a good reason not to. Expiry dates are
2813 properly the decision of individual host administrators;
2814 posters and moderators SHOULD set only expiry dates that
2815 most administrators would agree with.
2816
2817 NOTE: A poster preparing an Expires header for an
2818 article whose utility ends on a specific day
2819 should typically specify the NEXT day as the
2820 expiry date. A meeting on July 7th remains of
2821 interest on the 7th.
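
NOTE: The date arithmetic is trivial but easy to get wrong
by a day. A minimal Python sketch, with an assumed function
name, follows; turning the result into valid Expires content
is left to whatever code formats the Date header.

```python
import datetime

def expiry_for_event(event_day):
    """Return the day AFTER the event as the expiry date."""
    return event_day + datetime.timedelta(days=1)
```

For a meeting on 7 July 1994 this gives 8 July 1994, so the
article is still available on the day of the meeting.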
2822
2823
2824 6.3. Reply-To
2825
2826 The Reply-To header content specifies a reply address dif-
2827 ferent from the author's address given in the From header:
2828
2829 Reply-To-content = From-content
2830
2831 In the absence of Reply-To, the reply address is the address
2832 in the From header.
2833
2845 Use of a Reply-To header is preferable to including a simi-
2846 lar request in the article body, because reply-preparation
2847 software can take account of Reply-To automatically.
2848
2849
2850 6.4. Sender
2851
2852 The Sender header identifies the poster, in the event that
2853 this differs from the author identified in the From header:
2854
2855 Sender-content = From-content
2856
2857 In the absence of Sender, the default poster is the author
2858 (named in the From header).
2859
2860 NOTE: The intent is that the Sender header have a
2861 fairly high probability of identifying the person
2862 who really posted the article. The ability to
2863 specify a From header naming someone other than
2864 the poster is useful but can be abused.
2865
2866 If the poster supplies a From header, the posting agent MUST
2867 ensure that a Sender header is present, unless it can verify
2868 that the mailing address in the From header is a valid mail-
2869 ing address for the poster. A poster-supplied Sender header
2870 MAY be used, if its mailing address is verifiably a valid
2871 mailing address for the poster; otherwise the posting agent
2872 MUST supply a Sender header and delete (or rename, e.g. to
2873 X-Unverifiable-Sender) any poster-supplied Sender header.
2874
2875 NOTE: It might be useful to preserve a poster-
2876 supplied Sender header so that the poster can sup-
2877 ply the full-name part of the content. The mail-
2878 ing address, however, must be right. Hence, the
2879 posting agent must generate the Sender header if
2880 it is unable to verify the mailing address of a
2881 poster-supplied one.
2882
2883 NOTE: NNTP implementors, in particular, are urged
2884 to note this requirement (which would eliminate
2885 the need for ad hoc headers like NNTP-Posting-
2886 Host), although there are admittedly some imple-
2887 mentation difficulties. A user name from an RFC
2888 1413 server and a host name from an inverse map-
2889 ping of the address, perhaps with a "full name"
2890 comment noting the origin of the information,
2891 would be at least a first approximation:
2892
2893 Sender: fred@zoo.toronto.edu (RFC-1413@reverse-lookup; not verified)
2894
2895 While this does not completely meet the specs, it
2896 comes a lot closer than not having a Sender header
2897 at all. Even just supplying a placeholder for the
2898 user name:
2899
2911 Sender: somebody@zoo.toronto.edu (user name unknown)
2912
2913 would be better than nothing.
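
NOTE: The posting agent's handling of Sender might be
sketched in Python as follows. All names here are
assumptions for the example. In particular, header contents
are treated as bare mailing addresses, and verification is
reduced to membership in a set of addresses the posting
agent already knows to be valid for the poster.

```python
def ensure_sender(headers, agent_address, verified_addresses):
    """Return a copy of `headers` with a trustworthy Sender, if needed.

    headers:            dict mapping header name to content
    agent_address:      address the posting agent itself vouches for
    verified_addresses: addresses known to be valid for the poster
    """
    out = dict(headers)
    supplied = out.get("Sender")
    if supplied is not None:
        if supplied in verified_addresses:
            return out  # a verifiable poster-supplied Sender MAY be used
        # Unverifiable: rename it and supply our own.
        out["X-Unverifiable-Sender"] = out.pop("Sender")
        out["Sender"] = agent_address
        return out
    from_address = out.get("From")
    if from_address is not None and from_address not in verified_addresses:
        out["Sender"] = agent_address
    return out
```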
2914
2915
2916 6.5. References
2917
2918 The References header content lists message IDs of precur-
2919 sors:
2920
2921 References-content = message-id *( space message-id )
2922
2923 A followup MUST have a References header, and an article
2924 which is not a followup MUST not have a References header.
2925 A followup to an article which had no References header MUST
2926 have a References header containing the precursor's message
2927 ID. A followup to an article which had a References header
2928 MUST have a References header containing the precursor's
2929 References content, plus the precursor's message ID appended
2930 to the end of the list.
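
NOTE: Construction of a followup's References content might
be sketched in Python as follows. The function name is an
assumption for the example. The precursor's content is kept
verbatim rather than split and rejoined, so that any
multiple blanks inside it survive.

```python
def followup_references(precursor_message_id, precursor_references=None):
    """Build the References content for a followup."""
    if precursor_references is None or not precursor_references.strip():
        # The precursor had no References header.
        return precursor_message_id
    # Keep the older content as it stands and append the precursor's ID.
    return precursor_references.strip() + " " + precursor_message_id
```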
2933
2934 NOTE: Use the See-Also header (section 6.16) for
2935 interconnection of articles which are not in a
2936 followup relationship to each other.
2937
2938 NOTE: In retrospect, RFCs 850 and 1036, and the
2939 implementations whose practice they represented,
2940 erred here. The proper MAIL header to use for
2941 references to precursors is In-Reply-To, and the
2942 References header is meant to be used for the pur-
2943 poses here ascribed to See-Also. This incompati-
2944 bility is far too solidly established to be fixed,
2945 unfortunately. The best that can be done is to
2946 provide a clear mapping between the two, and urge
2947 gateways to do the transformation. The news usage
2948 is (now) a deliberate violation of the MAIL speci-
2949 fications; articles containing news References
2950 headers are technically not valid MAIL messages,
2951 although it is unlikely that much MAIL software
2952 will notice because the incompatibility is at a
2953 subtle semantic level that does not affect the
2954 syntax.
2955
2956 UNRESOLVED ISSUE: Would it be better to just give
2957 up and admit that news uses References for both
2958 purposes?
2959
2960 UNRESOLVED ISSUE: Should the syntax be generalized
2961 to include URLs as alternatives to message IDs?
2962 Perhaps not; too many things know about References
2963 already. And non-articles can't be precursors of
2964 articles, not really.
2965
2977 Followup agents SHOULD not shorten References headers. If
2978 it is absolutely necessary to shorten the header, as a des-
2979 perate last resort, a followup agent MAY do this by deleting
2980 some of the message IDs. However, it MUST not delete the
2981 first message ID, the last three message IDs (including that
2982 of the immediate precursor), or any message ID mentioned in
2983 the body of the followup. If it is possible for the fol-
2984 lowup agent to determine the Subject content of the articles
2985 identified in the References header, it MUST not delete the
2986 message ID of any article where the Subject content changed
2987 (other than by prepending of a back reference). The fol-
2988 lowup agent MUST not delete any message ID whose local part
2989 ends with "_-_" (underscore (ASCII 95), hyphen (ASCII 45),
2990 underscore); followup agents are urged to use this form to
2991 mark subject changes, and to avoid using it otherwise.
2992
2993 NOTE: As software capable of exploiting References
2994 chains has grown more common, the random shorten-
2995 ing permitted by RFC 1036 has become increasingly
2996 troublesome. ANY shortening is undesirable, and
2997 software should do it only in cases of dire neces-
2998 sity. In such cases, these rules attempt to limit
2999 the damage.
3000
3001 NOTE: The first message ID is very important as
3002 the starting point of the "thread" of discussion,
3003 and absolutely should not be deleted. Keeping the
3004 last three message IDs gives thread-following
3005 software a fighting chance to reconstruct a full
3006 thread even if an article or two is missing.
3007 Keeping message IDs mentioned in the body is obvi-
3008 ously desirable.
3009
3010 NOTE: Subject changes are difficult to determine,
3011 but they are significant as possible beginnings of
3012 new threads. The "_-_" convention is provided so
3013 that posting agents (which have more information
3014 about subjects) can flag articles containing a
3015 subject change in a way that followup agents can
3016 detect without access to the articles themselves.
3017 The sequence is chosen as one that is fairly
3018 unlikely to occur by accident.
3019
3020 NOTE: Is "_-_" really worth having?
3021
3022 When a References header is shortened, at least three blanks
3023 SHOULD be left between adjacent message IDs at each point
3024 where deletions were made. Software preparing new Refer-
3025 ences headers SHOULD preserve multiple blanks in older Ref-
3026 erences content.
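
NOTE: For software that truly cannot avoid shortening, the
rules above might be sketched in Python as follows. This is
an illustration under stated assumptions, not a recommended
routine: the function name and the max_ids parameter are
invented for the example, a message ID counts as "mentioned
in the body" only if it appears there verbatim, and the
Subject-change rule is not modeled because it needs access
to the other articles.

```python
import re

def shorten_references(references, body, max_ids):
    """Delete message IDs, oldest deletable first, until at most
    max_ids remain or nothing further may be deleted."""
    # Pair each message ID with a flag: True if the white space
    # before it already marks a deletion (more than one blank).
    pairs = re.findall(r"(\s*)(\S+)", references.strip())
    items = [[message_id, len(gap) > 1] for gap, message_id in pairs]
    if not items:
        return ""

    def protected(message_id):
        local_part = message_id.lstrip("<").split("@", 1)[0]
        return message_id in body or local_part.endswith("_-_")

    # Index 0 (the first ID) and the last three IDs are never candidates.
    index = 1
    while len(items) > max_ids and index < len(items) - 3:
        if protected(items[index][0]):
            index += 1
        else:
            del items[index]
            items[index][1] = True  # mark the deletion point

    result = items[0][0]
    for message_id, gap in items[1:]:
        # Three blanks where something was deleted, one elsewhere.
        result += ("   " if gap else " ") + message_id
    return result
```

The result can still exceed max_ids when every remaining
message ID is protected; the sketch prefers an overlong
header to a forbidden deletion.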
3027
3028 NOTE: It's desirable to have some marker of where
3029 deletions occurred, but the restricted syntax of
3030 the header makes this difficult. Extra white
3043 space is not a very good marker, since it may be
3044 deleted by software that ill-advisedly rewrites
3045 headers, but at least it doesn't break existing
3046 software.
3047
3048 To repeat: followup agents SHOULD not shorten References
3049 headers.
3050
3051 NOTE: Unfortunately, reading agents and other
3052 software analyzing References patterns have to be
3053 prepared for the worst anyway. The worst includes
3054 random deletions and the possibility of circular
3055 References chains (when References is misused in
3056 place of See-Also, section 6.16).
3057
3058
3059 6.6. Control
3060
3061 The Control header content marks the article as a control
3062 message, and specifies the desired actions (other than the
3063 usual ones of filing and passing on the article):
3064
3065 Control-content = verb *( space argument )
3066 verb = 1*( letter / digit )
3067 argument = 1*<ASCII printable character>
3068
3069 The verb indicates what action should be taken, and the
3070 argument(s) (if any) supply details. In some cases, the
3071 body of the article may also contain details. Section 7
3072 describes the standard verbs. See also the Also-Control
3073 header (section 6.15).
3074
3075 NOTE: Control messages are often processed and
3076 filed rather differently than normal articles.
3077
3078 NOTE: The restriction of verbs to letters and dig-
3079 its is new, but is consistent with existing prac-
3080 tice and potentially simplifies implementation by
3081 avoiding characters significant to command inter-
3082 preters. Beware that the arguments are under no
3083 such restriction in general.
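
NOTE: Parsing of Control content might be sketched in
Python as follows. The function names are assumptions for
the example. The arguments are returned untouched; since
they are not restricted in the way verbs are, they should
not be handed to a command interpreter without care.

```python
import re

def is_verb(word):
    """True if `word` matches verb = 1*( letter / digit )."""
    return re.fullmatch(r"[A-Za-z0-9]+", word) is not None

def parse_control(content):
    """Split Control content into (verb, [argument, ...])."""
    words = content.split()
    if not words or not is_verb(words[0]):
        raise ValueError("malformed Control content: %r" % content)
    return words[0], words[1:]
```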
3084
3085 NOTE: Two other conventions for distinguishing
3086 control messages from normal articles were for-
3087 merly in use: a three-component newsgroup name
3088 ending in ".ctl" or a subject beginning with
3089 "cmsg " was considered to imply that the article
3090 was a control message. These conventions are
3091 obsolete. Do not use them.
3092
3093 An article with a Control header MUST not have an Also-
3094 Control or Supersedes header.
3095
3096
3109 6.7. Distribution
3110
3111 The Distribution header content specifies geographic or
3112 organizational limits on an article's propagation:
3113
3114 Distribution-content = distribution *( dist-delim distribution )
3115 dist-delim = ","
3116 distribution = plain-component
3117
3118 A distribution is syntactically identical to a one-component
3119 newsgroup name, and must satisfy the same rules and restric-
3120 tions. In the absence of Distribution, the default distri-
3121 bution is "world".
3122
3123 NOTE: This syntax has the disadvantage of contain-
3124 ing no white space, making it impossible to con-
3125 tinue a Distribution header across several lines.
3126 Implementors of relayers and reading agents are
3127 warned that it is intended that the successor to
3128 this Draft will change the definition of dist
3129 delimiter to:
3130
3131 dist-delim = "," [ space ]
3132
3133 and are urged to fix their software to handle
3134 (i.e., ignore) white space following the commas.
3135
3136 A relayer MUST not pass an article to another relayer unless
3137 configuration information specifies transmission to that
3138 other relayer of BOTH (a) at least one of the article's
3139 newsgroup(s), and (b) at least one of the article's distri-
3140 bution(s). In effect, the only role of distributions is to
3141 limit propagation, by preventing transmission of articles
3142 that would have been transmitted had the decision been based
3143 solely on newsgroups.
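
NOTE: The two-part test above might be sketched in Python
as follows. The function and parameter names are assumptions
for the example, and the neighbor's configuration is reduced
to two plain sets of exact names; real relayer configuration
is usually more elaborate. White space after commas in the
Distribution content is ignored.

```python
def may_pass(newsgroups, distribution, neighbor_groups, neighbor_dists):
    """True only if BOTH a newsgroup and a distribution match."""
    article_groups = {g for g in newsgroups.split(",") if g}
    if distribution is None:
        # No Distribution header: the default distribution is "world".
        article_dists = {"world"}
    else:
        article_dists = {d.strip() for d in distribution.split(",")
                         if d.strip()}
    return (not article_groups.isdisjoint(neighbor_groups)
            and not article_dists.isdisjoint(neighbor_dists))
```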
3144
3145 A posting agent might wish to present a menu of possible
3146 distributions, or suggest a default, but normally SHOULD not
3147 supply a default without giving the poster a chance to over-
3148 ride it. A followup agent SHOULD initially supply the same
3149 Distribution header as found in the precursor, although the
3150 poster MAY alter this if appropriate.
3151
3152 Despite the syntactic similarity and some historical confu-
3153 sion, distributions are NOT newsgroup names. The whole
3154 point of putting a distribution on an article is that it is
3155 DIFFERENT from the newsgroup(s). In general, a meaningful
3156 distribution corresponds to some sort of region of propaga-
3157 tion: a geographical area, an organization, or a cooperating
3158 subnet.
3159
3160 NOTE: Distributions have historically suffered
3161 from the completely uncontrolled nature of their
3162 name space, the lack of feedback to posters on
3175 incomplete propagation resulting from use of ran-
3176 dom trash in Distribution headers, and confusion
3177 with newsgroups (arising partly because many
3178 regions and organizations DO have internal news-
3179 groups with names resembling their internal dis-
3180 tributions). This has resulted in much garbage in
3181 Distribution headers, notably the pointless prac-
3182 tice of automatically supplying the first compo-
3183 nent of the newsgroup name as a distribution
3184 (which is MOST unlikely to restrict propagation!).
3185 Many sites have opted to maximize propagation of
3186 such ill-formed articles by essentially ignoring
3187 distributions. This unfortunately interferes with
3188 legitimate uses. The situation is bad enough that
3189 distributions must be considered largely useless
3190 except within cooperating subnets that make an
3191 organized effort to restrain propagation of their
3192 internal distributions.
3193
3194 NOTE: The distributions "world" and "local" have
3195 no standard magic meaning (except that the former
3196 is the default distribution if none is given).
3197 Some pieces of software do assign such meanings to
3198 them.
3199
3200
3201 6.8. Keywords
3202
3203 The Keywords header content is one or more phrases intended
3204 to describe some aspect of the content of the article:
3205
3206 Keywords-content = plain-phrase *( "," [ space ] plain-phrase )
3207
3208 Keywords, separated by commas, each follow the <plain-
3209 phrase> syntax defined in section 5.2. Encoded words in
3210 keywords MUST not contain characters other than letters (of
3211 either case), digits, and the characters "!", "*", "+", "-",
3212 "/", "=", and "_".
3213
3214 NOTE: Posters and posting agents are asked to take
3215 note that keywords are separated by commas, not by
3216 white space. The following Keywords header con-
3217 tains only one keyword (a rather improbable
3218 one):
3219
3220 Keywords: Thompson Ritchie Multics Linux
3221
3222 and should probably have been written:
3223
3224 Keywords: Thompson, Ritchie, Multics, Linux
3225
3226 This particular error is unfortunately rather
3227 widespread.
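
NOTE: The difference shows up clearly when the header is
split as the syntax requires. A minimal Python sketch, with
an assumed function name and no handling of encoded words:

```python
def parse_keywords(content):
    """Split Keywords content into keywords at commas only."""
    return [k.strip() for k in content.split(",") if k.strip()]
```

Applied to the first header above this yields a single
keyword, "Thompson Ritchie Multics Linux"; applied to the
second it yields four.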
3228
3241 NOTE: Reading agents and archivers preparing
3242 indexes of articles should bear in mind that user-
3243 chosen keywords are notoriously poor for indexing
3244 purposes unless the keywords are picked from a
3245 predefined set (which they are not in this case).
3246 Also, some followup agents unwisely propagate the
3247 Keywords header from the precursor into the fol-
3248 lowup by default. At least one news-based experi-
3249 ment has found the contents of Keywords headers to
3250 be completely valueless for indexing.
3251
3252
3253 6.9. Summary
3254
3255 The Summary header content is a short phrase summarizing the
3256 article's content:
3257
3258 Summary-content = nonblank-text
3259
3260 As with the subject, no restriction is placed on the content
3261 since it is intended solely for display to humans.
3262
3263 NOTE: Reading agents should be aware that the Sum-
3264 mary header is often used as a sort of secondary
3265 Subject header, and (if present) its contents
3266 should perhaps be displayed when the subject is
3267 displayed.
3268
3269 The summary SHOULD be terse. Posters SHOULD avoid trying to
3270 cram their entire article into the headers; even the sim-
3271 plest query usually benefits from a sentence or two of elab-
3272 oration and context, and not all reading agents display all
3273 headers.
3274
3275
3276 6.10. Approved
3277
3278 The Approved header content indicates the mailing addresses
3279 (and possibly the full names) of the persons or entities
3280 approving the article for posting:
3281
3282 Approved-content = From-content *( "," [ space ] From-content )
3283
3284 An Approved header is required in all postings to moderated
3285 newsgroups; the presence or absence of this header allows a
3286 posting agent to distinguish between articles posted by the
3287 moderator (which are normal articles to be posted normally)
3288 and attempted contributions by others (which should be
3289 mailed to the moderator for approval). An Approved header
3290 is also required in certain control messages, to reduce the
3291 probability of accidental posting of same; see the relevant
3292 parts of section 7.
3293
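The posting-agent decision described above can be sketched as follows.
This is only an illustration under stated assumptions: the function
name route_posting, the two return strings, and the moderated_groups
set are hypothetical (the Draft does not say how a posting agent learns
which newsgroups are moderated), and the headers are assumed to have
been parsed already into a dict keyed by header name.

```python
def route_posting(headers, moderated_groups):
    """Decide what a posting agent does with a submitted article.

    headers: dict of header name -> content (hypothetical pre-parsed form).
    moderated_groups: set of newsgroup names known to be moderated
    (hypothetical input; not defined by the Draft).
    """
    newsgroups = [g.strip() for g in headers.get("Newsgroups", "").split(",")]
    to_moderated = any(g in moderated_groups for g in newsgroups if g)
    approved = headers.get("Approved", "").strip()
    if to_moderated and not approved:
        # Attempted contribution by a non-moderator: mail it for approval.
        return "mail-to-moderator"
    # Unmoderated newsgroup, or the article already carries Approved.
    return "post"
```

Only the presence of the Approved header is tested; as the text notes,
nothing here can establish that the claimed approval is genuine.
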
3307 NOTE: There is, at present, no way to authenticate
3308 Approved headers to ensure that the claimed
3309 approval really was bestowed. Nor is there an
3310 established mechanism for even maintaining a list
3311 of legitimate approvers (such a list would quickly
3312 become out of date if it had to be maintained by
3313 hand). Such mechanisms, presumably relying on
3314 cryptographic authentication, would be a worth-
3315 while extension to this Draft, and experimental
3316 work in this area is encouraged. (The problem is
3317 harder than it sounds because news is used on many
3318 systems which do not have real-time access to key
3319 servers.)
3320
3321 NOTE: Relayer implementors, please note well: it
3322 is the POSTING AGENT that is authorized to distin-
3323 guish between moderator postings and attempted
3324 contributions, and to mail the latter to the mod-
3325 erator. As discussed in section 9.1, relayers
3326 MUST not, repeat MUST not, send such mail; on
3327 receipt of an unApproved article in a moderated
3328 newsgroup, they should discard the article, NOT
3329 transform it into a mail message (except perhaps
3330 to a local administrator).
3331
3332 NOTE: RFC 1036 restricted Approved to a single
3333 From-content. However, multiple moderation is no
3334 longer rare, and multi-moderator Approved headers
3335 are already in use.
3336
3337
3338 6.11. Lines
3339
3340 The Lines header content indicates the number of lines in
3341 the body of the article:
3342
3343 Lines-content = 1*digit
3344
3345 The line count includes all body lines, including the signa-
3346 ture if any, including empty lines (if any) at beginning or
3347 end of the body. (The single empty separator line between
3348 the headers and the body is not part of the body.) The
3349 "body" here is the body as found in the posted article,
3350 AFTER all transformations such as MIME encodings.
3351
3352 Reading agents SHOULD not rely on the presence of this
3353 header, since it is optional (and some posting agents do not
3354 supply it). They MUST not rely on it being precise, since
3355 it frequently is not.
3356
3357 NOTE: The average line length in article bodies is
3358 surprisingly consistent at about 40 characters,
3359 and since the line count typically is used only
3360 for approximate judgements ("is this too long to
3373 read quickly?"), dividing the byte count of the
3374 body by 40 gives an estimate of the body line
3375 count that is adequate for normal use. This esti-
3376 mate is NOT adequate if the body has been MIME
3377 encoded... but neither is the Lines header, since
3378 at least one major relayer will supply a Lines
3379 header for an article that lacks one, and will not
3380 consider the possibility of MIME encodings when
3381 computing the line count.
3382
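The counting rule and the rough estimate above can be illustrated with
a short sketch. The function names are hypothetical, the article is
assumed to be available as a single string in its posted form (after
any MIME encoding), and the divisor 40 is simply the average line
length quoted in the note.

```python
def count_body_lines(article):
    """Count the body lines of an article given as one string.

    The single empty line separating headers from body is not counted;
    empty lines at the beginning or end of the body are.
    """
    text = article.replace("\r\n", "\n")
    _headers, separator, body = text.partition("\n\n")
    if not separator or not body:
        return 0
    # Each "\n" ends a line; a final unterminated line also counts.
    return body.count("\n") + (0 if body.endswith("\n") else 1)


def estimate_body_lines(body_byte_count):
    """Approximate the line count from the body size, at 40 bytes per line."""
    return body_byte_count // 40
```

For example, a body of 4000 bytes gives an estimate of 100 lines.
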
3383 NOTE: It would be better to have a Content-Size
3384 header as part of MIME, so that body parts could
3385 have their own sizes, and so that the units used
3386 could be appropriate to the data type (line count
3387 is not a useful measure of the size of an encoded
3388 image, for example). Doing this is preferable to
3389 trying to fix Lines.
3390
3391 UNRESOLVED ISSUE: Update on Content-Size?
3392
3393 Relayers SHOULD discard this header if they find it neces-
3394 sary to re-encode the article in such a way that the origi-
3395 nal Lines header would be rendered incorrect.
3396
3397
3398 6.12. Xref
3399
3400 The Xref header content indicates where an article was filed
3401 by the last relayer to process it:
3402
3403 Xref-content = relayer 1*( space location )
3404 relayer = relayer-name
3405 location = newsgroup-name ":" article-locator
3406 article-locator = 1*<ASCII printable character>
3407
3408 The relayer's name is included so that software can deter-
3409 mine which relayer generated the header (and specifically,
3410 whether it really was the one that filed the copy being
3411 examined). The locations specify what newsgroups the arti-
3412 cle was filed under (which may differ from those in the
3413 Newsgroups header) and where it was filed under them. The
3414 exact form of an article locator is implementation-specific.
3415
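A reading agent might split Xref content along the lines of the syntax
above as in the following sketch. The function name parse_xref is
hypothetical, and the sketch is lenient in accepting any run of white
space between fields. Because the form of an article locator is
implementation-specific, the locator is returned as an uninterpreted
string.

```python
def parse_xref(content):
    """Split Xref content into (relayer, [(newsgroup, locator), ...])."""
    fields = content.split()
    if len(fields) < 2:
        raise ValueError("need a relayer name and at least one location")
    relayer = fields[0]
    locations = []
    for field in fields[1:]:
        # Split at the first colon; the locator is kept as opaque text.
        newsgroup, colon, locator = field.partition(":")
        if not colon or not newsgroup or not locator:
            raise ValueError("malformed location: %r" % field)
        locations.append((newsgroup, locator))
    return relayer, locations
```
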
3416 NOTE: Reading agents can exploit this information
3417 to avoid presenting the same article to a reader
3418 several times. The information is sometimes
3419 available in system databases, but having it in
3420 the article is convenient. Relayers traditionally
3421 generate an Xref header only if the article is
3422 cross-posted, but this is not mandatory, and there
3423 is at least one new application ("mirroring":
3424 keeping news databases on two hosts identical)
3425 where the header is useful in all articles.
3426
3439 NOTE: The traditional form of an article locator
3440 is a decimal number, with articles in each news-
3441 group numbered consecutively starting from 1.
3442 NNTP [rrr] demands that such a model be provided,
3443 and there may be other software which expects it,
3444 but it seems desirable to permit flexibility for
3445 unorthodox implementations.
3446
3447 A relayer inserting an Xref header into an article MUST
3448 delete any previous Xref header. A relayer which is not
3449 inserting its own Xref header SHOULD delete any previous
3450 Xref header. A relayer MAY delete the Xref header when
3451 passing an article on to another relayer.
3452
3453 NOTE: RFC 1036 specified that the Xref header was
3454 not transmitted when an article was passed to
3455 another relayer, but the major news implementa-
3456 tions have never obeyed this rule, and applica-
3457 tions like mirroring depend on this disobedience.
3458
3459 A relayer MUST use the same name in Xref headers as it uses
3460 in Path headers. Reading agents MUST ignore an Xref header
3461 containing a relayer name that differs from the one that
3462 begins the path list.
3463
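The reading-agent check just described might look like the following
sketch. It assumes that entries in the Path content are delimited by
"!" and that relayer names are compared exactly; both are assumptions
of this example (the Path header is defined elsewhere in this Draft),
and the function name is hypothetical.

```python
def xref_is_usable(xref_content, path_content):
    """Return True if the Xref relayer name matches the first Path entry."""
    xref_fields = xref_content.split()
    # Assumption: "!" separates the entries of the path list.
    path_entries = path_content.strip().split("!")
    if not xref_fields or not path_entries[0]:
        return False
    return xref_fields[0] == path_entries[0]
```
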
3464
3465 6.13. Organization
3466
3467 The Organization header content is a short phrase identify-
3468 ing the poster's organization:
3469
3470 Organization-content = nonblank-text
3471
3472 This header is typically supplied by the posting agent. The
3473 Organization content SHOULD mention geographical location
3474 (e.g. city and country) when it is not obvious from the
3475 organization's name.
3476
3477 NOTE: The motive here is that the organization is
3478 often difficult to guess from the mailing address,
3479 is not always supplied in a signature, and can
3480 help identify the poster to the reader.
3481
3482 NOTE: There is no "s" in "Organization".
3483
3484 The Organization content is provided for identification
3485 only, and does not imply that the poster speaks for the
3486 organization or that the article represents organization
3487 policy. Posting agents SHOULD permit the poster to override
3488 a local default Organization header.
3489
3490
3505 6.14. Supersedes
3506
3507 The Supersedes header content specifies articles to be can-
3508 celled on arrival of this one:
3509
3510 Supersedes-content = message-id *( space message-id )
3511
3512 Supersedes is equivalent to Also-Control (section 6.15) with
3513 an implicit verb of "cancel" (section 7.1).
3514
3515 NOTE: Supersedes is normally used where the arti-
3516 cle is an updated version of the one(s) being can-
3517 celled.
3518
3519 NOTE: Although the ability to use multiple message
3520 IDs in Supersedes is highly desirable (see section
3521 7.1), posters are warned that existing implementa-
3522 tions often do not correctly handle more than one.
3523
3524 NOTE: There is no "c" in "Supersedes".
3525
3526 An article with a Supersedes header MUST not have an Also-
3527 Control or Control header.
3528
3529
3530 6.15. Also-Control
3531
3532 The Also-Control header content marks the article as being a
3533 control message IN ADDITION to being a normal news article,
3534 and specifies the desired actions:
3535
3536 Also-Control-content = Control-content
3537
3538 An article with an Also-Control header is filed and passed
3539 on normally, but the content of the Also-Control header is
3540 processed as if it were found in a Control header.
3541
3542 NOTE: It is sometimes desirable to piggyback con-
3543 trol actions on a normal article, so that the
3544 article will be filed normally but will also be
3545 acted on as a control message. This header is
3546 essentially a generalization of Supersedes.
3547
3548 NOTE: Be warned that some old relayers do not
3549 implement Also-Control.
3550
3551 An article with an Also-Control header MUST not have a Con-
3552 trol or Supersedes header.
3553
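The relationship among Control, Also-Control, and Supersedes can be
summarized in a small sketch. The function name and the dict
representation of headers are hypothetical, and raising an error for a
conflicting combination is an assumption of the example: the text
forbids the combinations but does not say how software should react
to one.

```python
CONTROL_HEADERS = ("Control", "Also-Control", "Supersedes")


def effective_control(headers):
    """Return the control content an article implies, or None.

    headers: dict of header name -> content (hypothetical pre-parsed form).
    """
    present = [name for name in CONTROL_HEADERS if name in headers]
    if len(present) > 1:
        # At most one of the three headers may appear in an article.
        raise ValueError("conflicting headers: " + ", ".join(present))
    if not present:
        return None
    name = present[0]
    content = headers[name].strip()
    if name == "Supersedes":
        # Supersedes acts like Also-Control with an implicit "cancel" verb.
        return "cancel " + content
    return content
```
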
3554
3555 6.16. See-Also
3556
3557 The See-Also header content lists message IDs of articles
3558 that are related to this one but are not its precursors:
3559
3571 See-Also-content = message-id *( space message-id )
3572
3573 See-Also resembles References, but without the restrictions
3574 imposed on References by the followup rules.
3575
3576 NOTE: See-Also provides a way to group related
3577 articles, such as the parts of a single document
3578 that had to be split across multiple articles due
3579 to its size, or to cross-reference between paral-
3580 lel threads.
3581
3582 NOTE: See the discussion (in section 6.5) on MAIL
3583 compatibility issues of References and See-Also.
3584
3585 NOTE: In the specific case where it is desired to
3586 essentially make another article PART of the cur-
3587 rent one, e.g. for annotation of the other arti-
3588 cle, MIME's "message/external-body" convention can
3589 be used to do so without actual inclusion. "news-
3590 message-ID" was registered as a standard external-
3591 body access method, with a mandatory NAME parame-
3592 ter giving the message ID and an optional SITE
3593 parameter suggesting an NNTP site that might have
3594 the article available (if it is not available
3595 locally), by IANA 22 June 1993.
3596
3597 UNRESOLVED ISSUE: Could the syntax be generalized
3598 to include URLs as alternatives to message IDs?
3599 Here it makes much more sense than in References.
3600
3601
3602 6.17. Article-Names
3603
3604 The Article-Names header content indicates any special sig-
3605 nificance the article may have in particular newsgroups:
3606
3607 Article-Names-content = 1*( name-clause space )
3608 name-clause = newsgroup-name ":" article-name
3609 article-name = letter 1*( letter / digit / "-" )
3610
3611 Each name clause specifies a newsgroup (which SHOULD be
3612 among those in the Newsgroups header) and an article name
3613 local to that newsgroup. Article names MAY be used by
3614 relayers to file the article in special ways, or they MAY
3615 just be noted for possible special attention by reading
3616 agents. Article names are case-sensitive.
3617
3618 NOTE: This header provides a way to mark special
3619 postings, such as introductions, frequently-asked-
3620 question lists, etc., so that reading agents have
3621 a way of finding them automatically. The news-
3622 group name is specified for each article name
3623 because the names may be newsgroup-specific; for
3624 example, many frequently-asked-question lists are
3637 posted to "news.answers" in addition to their
3638 "home" newsgroup, and they would not be known by
3639 the same name(s) in both newsgroups.
3640
3641 The Article-Names header SHOULD be ignored unless the arti-
3642 cle also contains an Approved header.
3643
3644 NOTE: This stipulation is made in anticipation of
3645 the possibility that Approved headers will be
3646 involved in cryptographic authentication.
3647
3648 The presence of an Article-Names header does not necessarily
3649 imply that the article will be retained unusually long
3650 before expiration, or that previous article(s) with similar
3651 Article-Names headers will be cancelled by its arrival.
3652 Posters preparing special postings SHOULD include appropri-
3653 ate other headers, such as Expires and Supersedes, to
3654 request such actions.
3655
3656 Different networks MAY establish different sets of article
3657 names for the special postings they deem significant; it is
3658 preferable for usage to be standardized within networks,
3659 although it might be desirable for individual newsgroups to
3660 have different naming conventions in some situations.
3661 Article names MUST be at most 14 characters long. The following
3662 names are suggested but are not mandatory:
3663
3664 intro Introduction to the newsgroup for newcomers.
3665
3666 charter Charter, rules, organization, moderation poli-
3667 cies, etc.
3668
3669 background Biographies of special participants, history of
3670 the newsgroup, notes on related newsgroups, etc.
3671
3672 subgroups Descriptions of sub-newsgroups under this news-
3673 group, e.g. "sci.space.news" under "sci.space".
3674
3675 facts Information relating to the purpose of the news-
3676 group, e.g. an acronym glossary in "sci.space".
3677
3678 references Where to get more information: books, journals,
3679 FTP repositories, etc.
3680
3681 faq Answers to frequently-asked questions.
3682
3683 menu If present, a list of all the other article
3684 names local to this newsgroup, with brief
3685 descriptions of their contents.
3686
3687 Such articles may be divided into subsections using the MIME
3688 "multipart/mixed" conventions. If size considerations make
3689 it necessary to split such articles, names ending in a
3690 hyphen and a part number are suggested; for example, a
3703 three-part frequently-asked-questions list could have arti-
3704 cle names "faq-1", "faq-2", and "faq-3".
3705
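The Article-Names syntax and the length limit can be checked with a
sketch like the following. The function name is hypothetical, "letter"
is taken to mean an ASCII letter, and any run of white space is
accepted between name clauses; these are assumptions of the example.

```python
import re

# article-name = letter 1*( letter / digit / "-" )
ARTICLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]+")
MAX_NAME_LENGTH = 14


def parse_article_names(content):
    """Split Article-Names content into [(newsgroup, article_name), ...]."""
    clauses = []
    for clause in content.split():
        newsgroup, colon, name = clause.partition(":")
        if not colon or not newsgroup:
            raise ValueError("malformed name clause: %r" % clause)
        if len(name) > MAX_NAME_LENGTH or not ARTICLE_NAME.fullmatch(name):
            raise ValueError("invalid article name: %r" % name)
        # Article names are case-sensitive, so the name is kept as written.
        clauses.append((newsgroup, name))
    if not clauses:
        raise ValueError("at least one name clause is required")
    return clauses
```
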
3706 NOTE: It is somewhat premature to attempt to stan-
3707 dardize article names, since this is essentially a
3708 new feature with no experience behind it. How-
3709 ever, if reading agents are to attach special sig-
3710 nificance to these names, some attempt at standard
3711 conventions is imperative. This is a first
3712 attempt at providing some.
3713
3714
3715 6.18. Article-Updates
3716
3717 The Article-Updates header content indicates what previous
3718 articles this one is deemed (by the poster) to update (i.e.,
3719 replace):
3720
3721 Article-Updates-content = message-id *( space message-id )
3722
3723 Each message ID identifies a previous article that this one
3724 is deemed to update. This MUST not cause the previous arti-
3725 cle(s) to be cancelled or otherwise altered, unless this is
3726 implied by other headers (e.g. Supersedes); Article-Updates
3727 is merely an advisory which MAY be noted for special atten-
3728 tion by reading agents.
3729
3730 NOTE: This header provides a way to mark articles
3731 which are only minor updates of previous ones,
3732 containing no significant new information and not
3733 worth reading if the previous ones have been read.
3734
3735 NOTE: If suitable conventions using MIME multipart
3736 bodies and the "message/external-body" body-part
3737 type can be developed, a replacing article might
3738 contain only differences between the old text and
3739 the new text, rather than a complete new copy.
3740 This is the motivation for not making Article-
3741 Updates also function as Supersedes does: the
3742 replacing article might depend on the continued
3743 presence of the replaced article.
3744
3745
3746 7. Control Messages
3747
3748 The following sections document the currently-defined con-
3749 trol messages. "Message" is used herein as a synonym for
3750 "article" unless context indicates otherwise.
3751
3752 Posting agents are warned that since certain control mes-
3753 sages require article bodies in quite specific formats, sig-
3754 natures SHOULD not be appended to such articles, and it may
3755 be wise to take greater care than usual to avoid unintended
3756 (although perhaps well-meaning) alterations to text supplied
3769 by the poster. Relayers MUST assume that control messages
3770 mean what they say; they MAY be obeyed as is or rejected,
3771 but MUST not be reinterpreted.
3772
3773 The execution of the actions requested by control messages
3774 is subject to local administrative restrictions, which MAY
3775 deny requests or refer them to an administrator for
3776 approval. The descriptions below are generally phrased in
3777 terms suggesting mandatory actions, but any or all of these
3778 MAY be subject to local administrative approval (either as a
3779 class or case-by-case). Analogously, where the description
3780 below specifies that a message or portion thereof is to be
3781 ignored, this action MAY include reporting it to an adminis-
3782 trator.
3783
3784 NOTE: The exact choice of local action might
3785 depend on what action the control message
3786 requests, who it claims to come from, etc.
3787
3788 Relayers MUST propagate even control messages they do not
3789 understand.
3790
3791 In the following sections, each type of control message is
3792 defined syntactically by defining its arguments and its
3793 body. For example, "cancel" is defined by defining cancel-
3794 arguments and cancel-body.
3795
3796
3797 7.1. cancel
3798
3799 The cancel message requests that one or more previous arti-
3800 cles be "cancelled":
3801
3802 cancel-arguments = message-id *( space message-id )
3803 cancel-body = body
3804
3805 The argument(s) identify the articles to be cancelled, by
3806 message ID. The body is a comment, which software MUST
3807 ignore, and SHOULD contain an indication of why the cancel-
3808 lation was requested. The cancel message SHOULD be posted
3809 to the same newsgroup(s), with the same distribution(s), as
3810 the article(s) it is attempting to cancel.
3811
3812 NOTE: Using the same newsgroups and distributions
3813 maximizes the chances of the cancel message propa-
3814 gating everywhere the target articles went.
3815
3816 NOTE: RFC 1036 permitted only a single message-id
3817 in a cancel message. Support for cancelling mul-
3818 tiple articles is highly desirable, especially for
3819 use with Supersedes (see section 6.14). If sev-
3820 eral revisions of an article appear in fast suc-
3821 cession, each using Supersedes to cancel the pre-
3822 vious one, it is possible for a middle revision to
3835 be destroyed by cancellation before it is propa-
3836 gated onward to cancel its predecessor. Allowing
3837 each article to cancel several predecessors
3838 greatly alleviates this problem. (Posting agents
3839 preparing a cancel of an article which itself can-
3840 cels other articles might wish to add those arti-
3841 cles to the cancel-arguments.) However, posters
3842 should be aware that much old software does not
3843 implement multiple cancellation properly, and
3844 should avoid using it when reliable cancellation
3845 is vitally important.
3846
3847 When an article (the "target article") is to be cancelled,
3848 there are four cases of interest: the article hasn't arrived
3849 yet, it has arrived and been filed and is available for
3850 reading, it has expired and been archived on some less-
3851 accessible storage medium, or it has expired and been
3852 deleted. The next few paragraphs discuss each case in turn
3853 (in reverse order, which is convenient for the explanation).
3854
3855 EXPIRED AND DELETED. Take no action.
3856
3857 EXPIRED AND ARCHIVED. If the article is readily accessible
3858 and can be deleted or made unreadable easily, treat as under
3859 AVAILABLE below. Otherwise treat as under EXPIRED AND
3860 DELETED.
3861
3862 NOTE: While it is desirable for archived articles
3863 to be cancellable, this can easily involve rewrit-
3864 ing an entire archive volume just to get rid of
3865 one article, perhaps with manual actions required
3866 to arrange it. It is difficult to envision a sit-
3867 uation so dire as to require such measures from
3868 hundreds or thousands of administrators, or for
3869 that matter one in which widespread compliance
3870 with such a request is likely.
3871
3872 AVAILABLE. Compare the mailing addresses from the From
3873 lines of the cancel message and the target article, bearing
3874 in mind that local parts (except for "postmaster") are case-
3875 sensitive and domains are case-insensitive. If they do not
3876 match, either refer the issue to an administrator for a
3877 case-by-case decision, or treat as if they matched.
3878
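The address comparison can be sketched as follows. The example assumes
the bare mailing addresses (local-part@domain) have already been
extracted from the two From headers; the function names are
hypothetical.

```python
def split_address(address):
    """Split a bare mailing address into (local_part, domain)."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError("expected local-part@domain: %r" % address)
    return local, domain


def addresses_match(cancel_from, target_from):
    """Compare two mailing addresses under the rules given above."""
    local_a, domain_a = split_address(cancel_from)
    local_b, domain_b = split_address(target_from)
    # Domains are case-insensitive.
    if domain_a.lower() != domain_b.lower():
        return False
    # "postmaster" is the one local part compared without regard to case.
    if local_a.lower() == "postmaster" and local_b.lower() == "postmaster":
        return True
    # All other local parts are case-sensitive.
    return local_a == local_b
```

A result of False is not grounds for silently discarding the cancel;
the choice is between referral to an administrator and treating the
addresses as if they matched.
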
3879 NOTE: It is generally trivial to forge articles,
3880 so nothing short of cryptographic authentication
3881 is really adequate to ensure that a cancel came
3882 from the original article's author. Moreover, it
3883 is highly desirable to permit authorities other
3884 than the author to cancel articles, to allow for
3885 cases in which the author is unavailable, uncoop-
3886 erative, or malicious, and in which damage and/or
3887 legal problems may be minimized by prompt cancel-
3888 lation. Reliable authentication that would permit
3901 such administrative cancels would be a worthwhile
3902 extension to this Draft, and experimental work in
3903 this area is encouraged.
3904
3905 NOTE: Meanwhile, a simple check of addresses is
3906 useful accident prevention and catches at least
3907 the most simple-minded forgers. Since the intent
3908 is accident prevention rather than ironclad secu-
3909 rity, use of the From address is appropriate, all
3910 the more so because in the presence of gateways
3911 (especially redundant multiple gateways), the
3912 author may not have full control over Sender head-
3913 ers.
3914
3915 NOTE: The "refer... or treat as if they matched"
3916 rule is intended to specifically forbid quietly
3917 ignoring cancels with mismatched addresses.
3918
3919 If the addresses match, then if technically possible, the
3920 relayer MUST delete the target article completely and imme-
3921 diately. Failing that, it MUST make the target article
3922 unreadable (preferably to everyone, minimally to everyone
3923 but the administrator) and either arrange for it to be
3924 deleted as soon as possible or notify an administrator at
3925 once.
3926
3927 NOTE: To allow for events such as criminal
3928 actions, malicious forgeries, and copyright
3929 infringements, where damage and/or legal problems
3930 may be minimized by prompt cancellation, complete
3931 removal is strongly preferred over merely making
3932 the target article unreadable. The potential for
3933 malice is outweighed by the importance of really
3934 getting rid of the target article in some legiti-
3935 mate cases. (In cases of inadvertent copyright
3936 violation in particular, the ability to quickly
3937 remedy the violation is of considerable legal
3938 importance.) Failing that, making it unreadable
3939 is better than nothing.
3940
3941 NOTE: Merely annotating the article so that read-
3942 ers see an indication that the author wanted it
3943 cancelled is not acceptable. Making the article
3944 unreadable is the minimum action.
3945
3946 NOTE: There have been experiments with making can-
3947 celled articles unreadable, so that local news
3948 administrators could reverse cancellations. In
3949 practice, administrators almost never find cause
3950 to do so. Removal appears to be clearly prefer-
3951 able where technically feasible.
3952
3953 NOT ARRIVED YET. If practical, retain the cancel message
3954 until the target article does arrive, or until there is no
3967 further possibility of it arriving and being accepted (see
3968 section 9.2), and then treat as under AVAILABLE. Failing
3969 that, arrange for the target article to be rejected and dis-
3970 carded if it does arrive.
3971
3972 NOTE: It may well be impractical to retain the
3973 control message, given uncertainty about whether
3974 the target article will ever arrive. Existing
3975 practice in such cases is to assume that addresses
3976 would match and arrange the equivalent of dele-
3977 tion. This is often done by making a spurious
3978 entry in a database of already-seen message IDs
3979 (see section 9.3), so that if the article does
3980 arrive, it will be rejected as a duplicate.
3981
3982 The cancel message MUST be propagated onward in the usual
3983 fashion, regardless of which of the four cases applied, so
3984 that the target article will be cancelled everywhere even if
3985 cancellation and target article follow different routes.
3986
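The four cases and the propagation rule can be drawn together in a
simplified sketch. The state names, the action strings, and the
keyword parameters are all hypothetical labels chosen for the example,
and "delete-target" stands in for the full rule (delete if possible,
otherwise make unreadable and arrange deletion or notify an
administrator).

```python
def cancel_actions(state, addresses_match=True, archive_accessible=False,
                   can_retain_cancel=False, refer_mismatches=True):
    """List the actions a relayer takes for one target of a cancel.

    state: "deleted", "archived", "available", or "not-arrived".
    """
    actions = []
    if state == "archived":
        # Readily accessible archives are treated as AVAILABLE,
        # others as EXPIRED AND DELETED.
        state = "available" if archive_accessible else "deleted"
    if state == "deleted":
        pass  # nothing to do locally
    elif state == "available":
        if addresses_match or not refer_mismatches:
            actions.append("delete-target")
        else:
            actions.append("refer-to-administrator")
    elif state == "not-arrived":
        if can_retain_cancel:
            actions.append("retain-cancel-until-arrival")
        else:
            actions.append("reject-target-on-arrival")
    else:
        raise ValueError("unknown state: %r" % state)
    # The cancel is passed on regardless of which case applied.
    actions.append("propagate-cancel")
    return actions
```
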
3987 NOTE: RFC 1036 appeared to require stopping cancel
3988 propagation in the NOT ARRIVED YET case, although
3989 the wording was somewhat unclear. This appears to
3990 have been an unwise decision; there are known
3991 cases of important cancellations (in situations
3992 of, e.g., inadvertent copyright violation) achiev-
3993 ing rather poorer propagation than the target
3994 article. News propagation is often a much less
3995 orderly process than the authors of RFC 1036
3996 apparently envisioned. Modern implementations
3997 generally propagate the cancellation regardless.
3998
3999 Posting agents meant for use by ordinary posters SHOULD
4000 reject an attempt to post a cancel message if the target
4001 article is available and the mailing address in its From
4002 header does not match the one in the cancel message's From
4003 header.
4004
4005 NOTE: This, again, is primarily accident preven-
4006 tion.
4007
4008
4009 7.2. ihave, sendme
4010
4011 The ihave and sendme control messages implement a crude
4012 batched predecessor of the NNTP [rrr] protocol. They are
4013 largely obsolete in the Internet, but still see use in the
4014 UUCP environment, especially for backup feeds that normally
4015 are active only when a primary feed path has failed.
4016
4017 NOTE: The ihave and sendme messages defined here
4018 have ABSOLUTELY NOTHING TO DO WITH NNTP, despite
4019 similarities of terminology.
4020
4033 The two messages share the same syntax:
4034
4035 ihave-arguments = *( message-id space ) relayer-name
4036 sendme-arguments = ihave-arguments
4037 ihave-body = *( message-id eol )
4038 sendme-body = ihave-body
4039
4040 Message IDs MUST appear in either the arguments or the body,
4041 but not both. Relayers SHOULD generate the form putting
4042 message IDs in the body, but the other form MUST be sup-
4043 ported for backward compatibility.
4044
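A receiver accepting both forms might parse an ihave or sendme message
as in the sketch below. The function name is hypothetical, and the
test for a missing relayer name (a final argument beginning with "<")
is an assumption based on message IDs being enclosed in angle
brackets.

```python
def parse_ihave(arguments, body):
    """Return (relayer_name, [message_id, ...]) for an ihave or sendme.

    arguments: the text following the verb in the Control content.
    body: the article body as one string.
    """
    fields = arguments.split()
    if not fields or fields[-1].startswith("<"):
        raise ValueError("relayer name is required")
    relayer = fields[-1]
    argument_ids = fields[:-1]
    body_ids = [line.strip() for line in body.splitlines() if line.strip()]
    if argument_ids and body_ids:
        raise ValueError("message IDs in both arguments and body")
    if not argument_ids and not body_ids:
        raise ValueError("no message IDs supplied")
    return relayer, argument_ids or body_ids
```
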
4045 NOTE: RFC 1036 made the relayer name optional, but
4046 difficulties could easily ensue in determining the
4047 origin of the message, and this option is believed
4048 to be unused nowadays. Putting the message IDs in
4049 the body is strongly preferred over putting them
4050 in the arguments because it lends itself much bet-
4051 ter to large numbers of message IDs and avoids the
4052 empty-body problem mentioned in section 4.3.1.
4053
4054 The ihave message states that the named relayer has filed
4055 articles with the specified message IDs, which may be of
4056 interest to the relayer(s) receiving the ihave message. The
4057 sendme message requests that the relayer receiving it send
4058 the articles having the specified message IDs to the named
4059 relayer.
4060
4061 These control messages are normally sent essentially as
4062 point-to-point messages, by using "to." newsgroups (see sec-
4063 tion 5.5) that are sent only to the relayer the messages are
4064 intended for. The two relayers MUST be neighbors, exchang-
4065 ing news directly with each other. Each relayer advertises
4066 its new arrivals to the other using ihave messages, and each
4067 uses sendme messages to request the articles it lacks.
4068
4069 NOTE: Arguably these point-to-point control mes-
4070 sages should flow by some other protocol, e.g.
4071 mail, but administrative and interfacing issues
4072 are simplified if the news system doesn't need to
4073 talk to the mail system.
4074
4075 To reduce overhead, ihave and sendme messages SHOULD be sent
4076 relatively infrequently and SHOULD contain substantial num-
4077 bers of message IDs. If ihave and sendme are being used to
4078 implement a backup feed, it may be desirable to insert a
4079 delay between reception of an ihave and generation of a
4080 sendme, so that a slightly slow primary feed will not cause
4081 large numbers of articles to be requested unnecessarily via
4082 sendme.
4083
4084
4099 7.3. newgroup
4100
4101 The newgroup control message requests that a new newsgroup
4102 be created:
4103
4104 newgroup-arguments = newsgroup-name [ space moderation ]
4105 moderation = "moderated" / "unmoderated"
4106 newgroup-body = body
4107 / [ body ] descriptor [ body ]
4108 descriptor = descriptor-tag eol description-line eol
4109 descriptor-tag = "For your newsgroups file:"
4110 description-line = newsgroup-name space description
4111 description = nonblank-text [ " (Moderated)" ]
4112
4113 The first argument names the newsgroup to be created, and
4114 the second one (if present) indicates whether it is moder-
4115 ated. If there is no second argument, the default is
4116 "unmoderated".
4117
4118 NOTE: Implementors are warned that there is occa-
4119 sional use of other forms in the second argument.
4120 It is suggested that such violations of this
4121 Draft, which are also violations of RFC 1036,
4122 cause the newgroup message to be ignored. RFC
4123 1036 was slightly vague about how second arguments
4124 other than "moderated" were to be treated (specif-
4125 ically, whether they were illegal or just
4126 ignored), but it is thought that all existing
4127 major implementations will handle "unmoderated"
4128 correctly, and it appears desirable to tighten up
4129 the specs to make it possible for other forms to
4130 be used in future.
4131
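The handling of newgroup arguments suggested above can be sketched as
follows. The function name is hypothetical; returning None stands for
"ignore the message", and the moderation keyword is compared exactly
as written, which is an assumption of the example.

```python
def parse_newgroup_arguments(arguments):
    """Return (newsgroup_name, is_moderated), or None to ignore the message."""
    fields = arguments.split()
    if len(fields) == 1:
        # No second argument: the default is "unmoderated".
        return fields[0], False
    if len(fields) == 2 and fields[1] in ("moderated", "unmoderated"):
        return fields[0], fields[1] == "moderated"
    # Any other form is a violation; the suggestion is to ignore it.
    return None
```
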
4132 The body is a comment, which software MUST ignore, except
4133 that if it contains a descriptor, the description line is
4134 intended to be suitable for addition to a list of newsgroup
4135 descriptions. The description cannot be continued onto
4136 later lines, but is not constrained to any particular
4137 length. Moderated newsgroups have descriptions that end
4138 with the string " (Moderated)" (note that this string begins
4139 with a blank).
4140
4141 NOTE: It is unfortunate that the description line
4142 is part of the body, rather than being supplied in
4143 a header, but this is established practice. News-
4144 group creators are cautioned that the descriptor
4145 tag must be reproduced exactly as given above,
4146 alone on a line, and is case-sensitive. (To
4147 reduce errors in this regard, posting agents might
4148 wish to question or reject newgroup messages which
4149 do not contain a descriptor.) Given the desire
4150 for short lines, description writers should avoid
4151 content-free phrases like "discussion of" and
4152 "news about", and stick to defining what the
4165 newsgroup is about.
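
As a rough sketch of the descriptor rules above, the following Python
looks for the descriptor tag in a newgroup body and splits the line
after it. The function name find_description is hypothetical, lines
are assumed to be separated by a single LF, and "space" is assumed to
mean a run of blanks or tabs (its definition lies outside this
section).

```python
DESCRIPTOR_TAG = "For your newsgroups file:"
MODERATED_SUFFIX = " (Moderated)"


def find_description(body):
    """Return (newsgroup_name, description, is_moderated), or None.

    Hypothetical helper; `body` is the newgroup message body as text.
    """
    lines = body.split("\n")
    # The last line cannot be a usable tag: nothing would follow it.
    for index, line in enumerate(lines[:-1]):
        # The tag must match exactly, alone on its line, case-sensitively.
        if line == DESCRIPTOR_TAG:
            parts = lines[index + 1].split(None, 1)
            if len(parts) != 2:
                return None  # no description after the newsgroup name
            name, description = parts
            return name, description, description.endswith(MODERATED_SUFFIX)
    return None
```

A posting agent could use the None result to question or reject a
newgroup message that lacks a descriptor, as the NOTE suggests.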
4166
4167 The remainder of the body SHOULD contain an explanation of
4168 the purpose of the newsgroup and the decision to create it.
4169
4170 NOTE: Criteria for newsgroup creation vary widely
4171 and are outside the scope of this Draft, but if
4172 formal procedures of one kind or another were fol-
4173 lowed in the decision, the body should mention
4174 this. Administrators often look for such informa-
4175 tion when deciding whether to comply with cre-
4176 ation/deletion requests.
4177
4178 A newgroup message which lacks an Approved header MUST be
4179 ignored.
4180
4181 NOTE: It would also be desirable to ignore a new-
4182 group message unless its Approved header names a
4183 person who is authorized (in some sense) to create
4184 such a newsgroup. A cooperating subnet with suf-
4185 ficiently strong coordination to maintain a cor-
4186 rect and current list of authorized creators might
4187 wish to do so for its internal newsgroups. It
4188 also (or alternatively) might wish to ignore a
4189 newgroup message for an internal newsgroup that
4190 was posted (or cross-posted) to a non-internal
4191 newsgroup.
4192
4193 NOTE: As mentioned in section 6.10, some form of
4194 (cryptographic?) authentication of Approved head-
4195 ers would be highly desirable, especially for con-
4196 trol messages.
4197
4198 It would be desirable to provide some way of supplying a
4199 moderator's address in a newgroup message for a moderated
4200 newsgroup, but this will cause problems unless effective
4201 authentication is available, so it is left for future work.
4202
4203 NOTE: This leaves news administrators stuck with
4204 the annoying chore of arranging proper mailing of
4205 moderated-newsgroup submissions. On Usenet, this
4206 can be simplified by exploiting a forwarding
4207 facility that some major sites provide: they main-
4208 tain forwarding addresses, each the name of a mod-
4209 erated newsgroup with all periods (".", ASCII 46)
4210 replaced by hyphens ("-", ASCII 45), which forward
4211 mail to the current newsgroup moderators. More
4212 advice on the subject of forwarding to moderators
4213 can be found in the document titled "How to Con-
4214 struct the Mailpaths File", posted regularly to
4215 the Usenet newsgroups news.lists, news.admin.misc,
4216 and news.answers.
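
The forwarding-address convention described in the NOTE amounts to a
one-line transformation. In this sketch the function name and the
forwarding host are hypothetical; only the period-to-hyphen rule comes
from the text.

```python
def moderator_forwarding_address(newsgroup, forwarding_host):
    """Build a submission address at a site that forwards to moderators.

    `forwarding_host` is an assumption: the name of some site known to
    provide the forwarding facility.
    """
    # Replace every period (ASCII 46) with a hyphen (ASCII 45).
    return newsgroup.replace(".", "-") + "@" + forwarding_host
```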
4217
4231 A newgroup message naming a newsgroup that already exists is
4232 requesting a change in the moderation status or description
4233 of the newsgroup. The same rules apply.
4234
4235
4236 7.4. rmgroup
4237
4238 The rmgroup message requests that a newsgroup be deleted:
4239
4240 rmgroup-arguments = newsgroup-name
4241 rmgroup-body = body
4242
4243 The sole argument is the newsgroup name. The body is a com-
4244 ment, which software MUST ignore; it SHOULD contain an
4245 explanation of the decision to delete the newsgroup.
4246
4247 NOTE: Criteria for newsgroup deletion vary widely
4248 and are outside the scope of this Draft, but if
4249 formal procedures of one kind or another were fol-
4250 lowed in the decision, the body should mention
4251 this. Administrators often look for such informa-
4252 tion when deciding whether to comply with cre-
4253 ation/deletion requests.
4254
4255 A rmgroup message which lacks an Approved header MUST be
4256 ignored.
4257
4258 NOTE: It would also be desirable to ignore a
4259 rmgroup message unless its Approved header names a
4260 person who is authorized (in some sense) to delete
4261 such a newsgroup. A cooperating subnet with suf-
4262 ficiently strong coordination to maintain a cor-
4263 rect and current list of authorized deleters might
4264 wish to do so for its internal newsgroups. It
4265 also (or alternatively) might wish to ignore a
4266 rmgroup message for an internal newsgroup that was
4267 posted (or cross-posted) to a non-internal news-
4268 group.
4269
4270 Unexpected deletion of a newsgroup being a disruptive
4271 action, implementations are strongly advised to refer
4272 rmgroup messages to an administrator by default, unless per-
4273 haps the message can be determined to have originated within
4274 a cooperating subnet whose members are considered trustwor-
4275 thy. Abuses have occurred.
4276
4277
4278 7.5. sendsys, version, whogets
4279
4280 The sendsys message requests that a description of the
4281 relayer's news feeds to other relayers be mailed to the
4282 article's reply address:
4283
4297 sendsys-arguments = [ relayer-name ]
4298 sendsys-body = body
4299
4300 If there is an argument, relayers other than the one named
4301 by the argument MUST not respond. The body is a comment,
4302 which software MUST ignore; it SHOULD contain an explanation
4303 of the reason for the request.
4304
4305 The version message requests that the name and version of
4306 the relayer software be mailed to the reply address:
4307
4308 version-arguments =
4309 version-body = body
4310
4311 There are no arguments. The body is a comment, which soft-
4312 ware MUST ignore; it SHOULD contain an explanation of the
4313 reason for the request.
4314
4315 The whogets message requests that a description of the
4316 relayer and its news feeds to other relayers be mailed to
4317 the article's reply address:
4318
4319 whogets-arguments = newsgroup-name [ space relayer-name ]
4320 whogets-body = body
4321
4322 The first argument is the name of the "target newsgroup",
4323 specifying the newsgroup for which propagation information
4324 is desired. This MUST be a complete newsgroup name, not the
4325 name of a hierarchy or a portion of a newsgroup name that is
4326 not itself the name of a newsgroup. If there is a second
4327 argument, only the relayer named by that argument should
4328 respond. The body is a comment, which software MUST ignore;
4329 it SHOULD contain an explanation of the reason for the
4330 request.
4331
4332 NOTE: Whogets is intended as a replacement for
4333 sendsys (and version) with a precisely-specified
4334 reply format. Since the syntax for specifying
4335 what newsgroups get sent to what other relayers
4336 varies widely between different forms of relayer
4337 software, the only practical way to standardize
4338 the reply format is to indicate a specific news-
4339 group and ask where THAT newsgroup propagates.
4340 The requirement that it be a complete newsgroup
4341 name is intended to (largely) avoid the problem of
4342 having to answer "yes and no" in cases where not
4343 all newsgroups in a hierarchy are sent.
4344
4345 Any of these messages lacking an Approved header MUST be
4346 ignored. Response to any of these messages SHOULD be
4347 delayed for at least 24 hours, and no response should be
4348 attempted if the message has been cancelled in that time.
4349 Also, no response SHOULD be attempted unless the local part
4350 of the destination address is "newsmap". News
4363 administrators SHOULD arrange for mail to "newsmap" on their
4364 systems to be discarded (without reply) unless legitimate
4365 use is in progress.
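
The conditions above combine into a simple check. The sketch below is
illustrative only: the function name may_respond and its parameters
are hypothetical, the caller is assumed to track arrival time and
cancellation, and the comparison with "newsmap" is assumed to be
exact.

```python
from datetime import datetime, timedelta

MINIMUM_DELAY = timedelta(hours=24)


def may_respond(has_approved_header, reply_address, arrived_at, now, cancelled):
    """Decide whether a sendsys, version, or whogets reply may be mailed."""
    if not has_approved_header:
        return False  # such messages MUST be ignored
    if cancelled:
        return False  # cancelled during the waiting period
    if now - arrived_at < MINIMUM_DELAY:
        return False  # too early; the caller should check again later
    local_part, at_sign, _domain = reply_address.rpartition("@")
    if not at_sign:
        local_part = reply_address  # address with no domain part
    return local_part == "newsmap"
```

Keeping the decision separate from the mailing code makes it easy to
re-evaluate once the delay has passed.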
4366
4367 NOTE: Because these messages can cause many, many
4368 relayers to send mail to one person, such mes-
4369 sages, specifying mailing to an innocent person's
4370 mailbox, have been forged as a half-witted practi-
4371 cal joke. A delay gives administrators time to
4372 notice a fraudulent message and act (by cancelling
4373 the message, preparing to divert the flood of mail
4374 into the bit bucket, or both). Restriction of the
4375 destination address to "newsmap" reduces the
4376 appeal of fraud by making it impossible to use it
4377 to harass a normal user. (A site which does NOT
4378 discard mail to "newsmap", but rather bounces it
4379 back, may incur higher communications costs than
4380 if the mail had been accepted into a user's mail-
4381 box... but a malicious forger could accomplish
4382 this anyway, by using an address whose local part
4383 is very unlikely to be a legitimate mailbox name.)
4384
4385 NOTE: RFC 1036 did not require the Approved header
4386 for these control messages. This has been added
4387 because of the possibility that cryptographic
4388 authentication of Approved headers will become
4389 available.
4390
4391 The body of the reply to a sendsys message SHOULD be of the
4392 form:
4393
4394 sendsys-reply = responder 1*sys-line
4395 responder = "Responding-System:" space domain eol
4396 sys-line = relayer-name ":" newsgroup-patterns [ ":" text ] eol
4397 newsgroup-patterns = newsgroup-name *( "," newsgroup-name )
4398
4399 The first line identifies the responding system, using a
4400 syntax resembling a header (but note that it is part of the
4401 BODY). Remaining lines indicate what newsgroups are sent to
4402 what other systems. The syntax of newsgroup patterns is not
4403 well standardized; the form described is common (often with
4404 newsgroup names only partially given, denoting all names
4405 starting with a particular set of components) but not uni-
4406 versal. The whogets message provides a better-defined
4407 alternative.
4408
4409 The reply to a version message is of somewhat ill-defined
4410 form, with a body normally consisting of a single line of
4411 text that somehow describes the version of the relayer soft-
4412 ware. The whogets message provides a better-defined alter-
4413 native.
4414
4415 The body of the reply to a whogets message MUST be of the
4416 form:
4417
4429 whogets-reply = responder-domain responder-relayer response-date
4430 responding-to arrived-via responder-version
4431 whogets-delimiter *pass-line
4432 responder-domain = "Responding-System:" space domain eol
4433 responder-relayer = "Responding-Relayer:" space relayer-name eol
4434 response-date = "Response-Date:" space date eol
4435 responding-to = "Responding-To:" space message-id eol
4436 arrived-via = "Arrived-Via:" space path-list eol
4437 responder-version = "Responding-Version:" space nonblank-text eol
4438 whogets-delimiter = eol
4439 pass-line = relayer-name [ space domain ] eol
4440
4441 The first six lines identify the responding relayer by its
4442 Internet domain name (use of the ".uucp" and ".bitnet"
4443 pseudo-domains is permissible, for registered hosts in them,
4444 but discouraged) and its relayer name, specify the date when
4445 the reply was generated and the message ID of the whogets
4446 message being replied to, give the path list (from the Path
4447 header) of the whogets message (which MAY, if absolutely
4448 necessary, be truncated to a convenient length, but MUST
4449 contain at least the leading three relayer names), and indi-
4450 cate the version of relayer software responding. Note that
4451 these lines are part of the BODY even though their format
4452 resembles that of headers. Despite the apparently-fixed
4453 order specified by the syntax above, they can appear in any
4454 order, but there must be exactly one of each.
4455
4456 After those preliminaries, and an empty line to unambigu-
4457 ously define their end, the remaining lines are the relayer
4458 names (which MAY be accompanied by the corresponding domain
4459 names, if known) of systems which the responding system
4460 passes the target newsgroup to. Only the names of news
4461 relayers are to be included.
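
A receiver of whogets replies might parse the body roughly as follows.
This is a sketch under stated assumptions: the function name
parse_whogets_reply is hypothetical, lines are assumed to end in a
single LF, and values are taken as whatever follows the first colon
with surrounding blanks removed. It accepts the six preliminary lines
in any order but insists on exactly one of each.

```python
REQUIRED_FIELDS = frozenset([
    "Responding-System", "Responding-Relayer", "Response-Date",
    "Responding-To", "Arrived-Via", "Responding-Version",
])


def parse_whogets_reply(body):
    """Return (fields, passes); raise ValueError on a malformed reply."""
    preliminaries, delimiter, rest = body.partition("\n\n")
    if not delimiter:
        raise ValueError("no empty line after the preliminaries")
    fields = {}
    for line in preliminaries.split("\n"):
        name, colon, value = line.partition(":")
        if not colon or name not in REQUIRED_FIELDS or name in fields:
            raise ValueError("bad or repeated line: %r" % line)
        fields[name] = value.strip()
    if set(fields) != REQUIRED_FIELDS:
        raise ValueError("missing preliminary line")
    passes = []
    for line in rest.split("\n"):
        parts = line.split()
        if parts:
            # Relayer name, optionally followed by its domain name.
            passes.append((parts[0], parts[1] if len(parts) > 1 else None))
    return fields, passes
```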
4462
4463 NOTE: It is desirable for a reply to identify its
4464 source by both domain name and relayer name
4465 because news propagation is governed by the latter
4466 but location in a broader context is best deter-
4467 mined by the former. The date and whogets message
4468 ID should, in principle, be present in the MAIL
4469 headers, but are included in the body for robust-
4470 ness in the presence of uncooperative mail sys-
4471 tems. The reason for the path list is discussed
4472 below. Adding version information eliminates the
4473 need for a separate message to gather it.
4474
4475 NOTE: The limitation of pass lines to contain only
4476 names of news relayers is meant to exclude names
4477 used within a single host (as identifiers for mail
4478 gateways, portions of ihave/sendme implementa-
4479 tions, etc.), which do not actually refer to other
4480 hosts.
4481
4495 A relayer which is unaware of the existence of the target
4496 newsgroup MUST not reply to a whogets message at all,
4497 although this MUST not influence decisions on whether to
4498 pass the article on to other relayers.
4499
4500 NOTE: While this may result in discontinuous maps
4501 in cases where some hosts have not honored
4502 requests for creation of a newsgroup, it will also
4503 prevent a flood of useless responses in the event
4504 that a whogets message intended to map a small
4505 region "leaks" out to a larger one. The possibil-
4506 ity of discontinuous recognition of a newsgroup
4507 does make it important that the whogets message
4508 itself continue to propagate (if other criteria
4509 permit). This is also the reason for the inclu-
4510 sion of the whogets message's path list, or at
4511 least the leading portion of it, in the reply: to
4512 permit reconstruction of at least small gaps in
4513 maps.
4514
4515 Different networks set different rules for the legitimacy of
4516 these messages, given that they may reveal details of orga-
4517 nization-internal topology that are sometimes considered
4518 proprietary.
4519
4520 NOTE: On Usenet, in particular, willingness to
4521 respond to these messages is held to be a condi-
4522 tion of network membership: the topology of Usenet
4523 is public information. Organizations wishing to
4524 belong to such networks while keeping their inter-
4525 nal topology confidential might wish to organize
4526 their internal news software so that all articles
4527 reaching outsiders appear to be from a single
4528 "gatekeeper" system, with the details of internal
4529 topology hidden behind that system.
4530
4531 UNRESOLVED ISSUE: It might be useful to have a way
4532 to set some sort of hop limit for these.
4533
4534
4535 7.6. checkgroups
4536
4537 The checkgroups control message contains a supposedly
4538 authoritative list of the valid newsgroups within some sub-
4539 set of the newsgroup name space:
4540
4541 checkgroups-arguments =
4542 checkgroups-body = [ invalidation ] valid-groups
4543 / invalidation
4544 invalidation = "!" plain-component *( "," plain-component ) eol
4545 valid-groups = 1*( description-line eol )
4546
4547 There are no arguments. The body lines (except possibly for
4548 an initial invalidation) each contain a description line for
4561 a newsgroup, as defined under the newgroup message (section
4562 7.3).
4563
4564 NOTE: Some other, ill-defined, forms of the check-
4565 groups body were formerly used. See appendix A.
4566
4567 The checkgroups message applies to all hierarchies contain-
4568 ing any of the newsgroups listed in the body. The check-
4569 groups message asserts that the newsgroups it lists are the
4570 only newsgroups in those hierarchies. If there is an inval-
4571 idation, it asserts that the hierarchies it names no longer
4572 contain any newsgroups.
4573
4574 Processing a checkgroups message MAY cause a local list of
4575 newsgroup descriptions to be updated. It SHOULD also cause
4576 the local lists of newsgroups (and their moderation sta-
4577 tuses) in the mentioned hierarchies to be checked against
4578 the message. The results of the check MAY be used for auto-
4579 matic corrective action, or MAY be reported to the news
4580 administrator in some way.
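
The check described above might be sketched as follows. The function
names are hypothetical, lines are assumed to end in a single LF, and a
"hierarchy" is assumed to be identified by the first component of a
newsgroup name. The code only computes a report for the administrator;
it takes no corrective action.

```python
def parse_checkgroups(body):
    """Return (invalidated_hierarchies, {newsgroup_name: is_moderated})."""
    lines = [line for line in body.split("\n") if line.strip()]
    invalidated = []
    if lines and lines[0].startswith("!"):
        # Optional initial invalidation: "!" name *( "," name )
        invalidated = lines.pop(0)[1:].split(",")
    listed = {}
    for line in lines:
        parts = line.split(None, 1)
        description = parts[1] if len(parts) > 1 else ""
        listed[parts[0]] = description.endswith(" (Moderated)")
    return invalidated, listed


def check_local_groups(local, invalidated, listed):
    """Compare `local` ({name: is_moderated}) against a checkgroups message.

    Returns three sorted lists: groups listed but absent locally, local
    groups the message does not list, and groups whose moderation status
    differs.
    """
    hierarchies = {name.split(".")[0] for name in listed} | set(invalidated)
    in_scope = {name for name in local if name.split(".")[0] in hierarchies}
    missing = sorted(set(listed) - in_scope)
    surplus = sorted(in_scope - set(listed))
    mismatched = sorted(name for name in in_scope & set(listed)
                        if local[name] != listed[name])
    return missing, surplus, mismatched
```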
4581
4582 NOTE: Automatically updating descriptions of
4583 existing newsgroups is relatively safe. In the
4584 case of newsgroup additions or deletions, simply
4585 notifying the administrator is generally the wis-
4586 est action, unless perhaps the message can be
4587 determined to have originated within a cooperating
4588 subnet whose members are considered trustworthy.
4589
4590 NOTE: There is a problem with the checkgroups con-
4591 cept: not all newsgroups in a hierarchy necessar-
4592 ily propagate to the same set of machines.
4593 (Notably, there is a set of newsgroups known as
4594 the "inet" newsgroups, which have relatively lim-
4595 ited distribution but coexist in several hierar-
4596 chies with more widely-distributed newsgroups.)
4597 The advice of checkgroups should always be taken
4598 with a grain of salt, and should never be followed
4599 blindly.
4600
4601
4602 8. Transmission Formats
4603
4604 While this Draft does not specify transmission methods
4605 except to place a few constraints on them, there are some
4606 data formats used only for transmission that are unique to
4607 news.
4608
4609
4610 8.1. Batches
4611
4612 For efficient bulk transmission and processing of news arti-
4613 cles, it is often desirable to transmit a number of them as
4614 a single block of data, a "batch". The format of a batch
4627 is:
4628
4629 batch = 1*( batch-header article )
4630 batch-header = "#! rnews " article-size eol
4631 article-size = 1*digit
4632
4633 A batch is a sequence of articles, each prefixed by a header
4634 line that includes its size. The article size is a decimal
4635 count of the octets in the article, counting each EOL as one
4636 octet regardless of how it is actually represented.
4637
4638 NOTE: A relayer might wish to accept either a sin-
4639 gle article or a batch as input. Since "#" cannot
4640 appear in a header name, examination of the first
4641 octet of the input will reveal its nature.
4642
4643 NOTE: In the header line, there is exactly one
4644 blank before "rnews", there is exactly one blank
4645 after "rnews", and the EOL immediately follows the
4646 article size. Beware that some software inserts
4647 non-standard trash after the size.
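
A minimal sketch of batch handling in Python follows. The names
is_batch and split_batch are hypothetical, and each EOL is assumed to
be stored as a single LF octet, so that the article size can be used
directly as a byte count. The header match is deliberately strict; a
receiver that wants to tolerate the non-standard trash mentioned above
would need a looser pattern.

```python
import re

# Exactly one blank on each side of "rnews", EOL right after the size.
BATCH_HEADER = re.compile(rb"#! rnews ([0-9]+)\n")


def is_batch(data):
    """A single article starts with a header name, which cannot contain "#"."""
    return data[:1] == b"#"


def split_batch(data):
    """Return the list of articles (as bytes) contained in a batch."""
    articles = []
    position = 0
    while position < len(data):
        match = BATCH_HEADER.match(data, position)
        if match is None:
            raise ValueError("malformed batch header at offset %d" % position)
        size = int(match.group(1))
        start = match.end()
        if start + size > len(data):
            raise ValueError("truncated article at offset %d" % start)
        articles.append(data[start:start + size])
        position = start + size
    if not articles:
        raise ValueError("empty batch")
    return articles
```

The articles are only split apart here, never executed or passed to a
command interpreter.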
4648
4649 NOTE: Despite the similarity of this format to the
4650 executable-script format used by some operating
4651 systems, it is EXTREMELY unwise to just feed
4652 incoming batches to a command interpreter in the
4653 anticipation that it will run a command named
4654 "rnews" to process the batch. Unless arrangements
4655 are made to very tightly restrict the range of
4656 commands that can be executed by this means, the
4657 security implications are disastrous.
4658
4659
4660 8.2. Encoded Batches
4661
4662 When transmitting news, especially over communications links
4663 that are slow or are billed by the bit, it is often desir-
4664 able to batch news and apply data compression to the
4665 batches. Transmission links sending compressed batches
4666 SHOULD use out-of-band means of communication to specify the
4667 compression algorithm being used. If there is no way to
4668 send out-of-band information along with a batch, the follow-
4669 ing encapsulation for a compressed batch MAY be used:
4670
4671 ec-batch = "#! " compression-keyword eol compressed-batch
4672 compression-keyword = "cunbatch"
4673
4674 A line containing a keyword indicating the type of compres-
4675 sion is followed by the compressed batch. The only truly
4676 widespread compression keyword at present is "cunbatch",
4677 indicating compression using the widely-distributed "com-
4678 press" program. Other compression keywords MAY be used by
4679 mutual agreement between the hosts involved.
4680
4693 NOTE: An encapsulated compressed batch is NOT, in
4694 general, a text file, despite having an initial
4695 text line. This combination of text and non-text
4696 data is often awkward to handle; for example,
4697 standard decompression programs cannot be used
4698 without first stripping off the initial line, and
4699 that in turn is painful to do because many text-
4700 handling tools that are superficially suited to
4701 the job do not cope well with non-text data.
4702 Hence the recommendation that out-of-band communi-
4703 cation be used instead when possible.
4704
4705 NOTE: For UUCP transmission, where a batch is typ-
4706 ically transmitted by invoking the remote command
4707 "rnews" with the batch as its input stream, a
4708 plausible out-of-band method for indicating a com-
4709 pression type would be to give a compression key-
4710 word in an option to "rnews", perhaps in the form:
4711
4712 rnews -d decompressor
4713
4714 where "decompressor" is the name of a decompres-
4715 sion program (e.g. "uncompress" for a batch com-
4716 pressed with "compress" or "gunzip" for a batch
4717 compressed with "gzip"). How this decompression
4718 program is located and invoked by the receiving
4719 relayer is implementation-specific.
4720
4721 NOTE: See the notes in section 8.1 on the inadvis-
4722 ability of feeding batches directly to command
4723 interpreters.
4724
4725 NOTE: There is exactly one blank between "#!" and
4726 the compression keyword, and the EOL immediately
4727 follows the keyword.
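
The encapsulation can be taken apart safely by treating the whole
input as octets rather than text. In this sketch the function name
split_encoded_batch is hypothetical and the EOL is assumed to be a
single LF; choosing and running a decompressor for the returned
keyword is left to the caller.

```python
def split_encoded_batch(data):
    """Return (compression_keyword, compressed_octets) from an ec-batch."""
    if not data.startswith(b"#! "):
        raise ValueError("not an encapsulated compressed batch")
    # Split at the first EOL only; the rest is arbitrary binary data.
    header, newline, payload = data.partition(b"\n")
    if not newline:
        raise ValueError("missing EOL after the compression keyword")
    keyword = header[3:].decode("ascii")
    if not keyword or " " in keyword:
        raise ValueError("malformed compression keyword")
    return keyword, payload
```

Because the payload is never decoded as text, LF octets or non-ASCII
octets inside the compressed data are preserved unchanged.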
4728
4729
4730 8.3. News Within Mail
4731
4732 It is often desirable to transmit news as mail, either for
4733 the convenience of a human recipient or because that is the
4734 only type of transmission available on a restrictive commu-
4735 nication path.
4736
4737 Given the similarity between the news format and the MAIL
4738 format, it is superficially attractive to just send the news
4739 article as a mail message. This is typically a mistake:
4740 mail-handling software often feels free to manipulate vari-
4741 ous headers in undesirable ways (in some cases, such as
4742 Sender, such manipulation is actually mandatory), and mail
4743 transmission problems etc. MUST be reported to the adminis-
4744 trators responsible for the mail transmission rather than to
4745 the article's author. In general, news sent as mail should
4746 be encapsulated to separate the mail headers and the news
4759 headers.
4760
4761 When the intended recipient is a human, any convenient form
4762 of encapsulation may be used. Recommended practice is to
4763 use MIME encapsulation with a content type of "mes-
4764 sage/news", given that news articles have additional seman-
4765 tics beyond what "message/rfc822" implies.
4766
4767 NOTE: "message/news" was registered as a standard
4768 subtype by IANA 22 June 1993.
4769
4770 When mail is being used as a transmission path between two
4771 relayers, however, a standard method is desirable. Cur-
4772 rently the standard method is to send the mail to an address
4773 whose local part is "rnews", with whatever mail headers are
4774 necessary for successful transmission. The news article
4775 (including its headers) is sent as the body of the mail mes-
4776 sage, with an "N" prepended to each line.
4777
4778 NOTE: The "N" reduces the probability of an inno-
4779 cent line in a news article being taken as a magic
4780 command to mail software, and makes it easy for
4781 receiving software to strip off any lines added by
4782 mail software (e.g. the trailing empty line added
4783 by some UUCP mail software).
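
The "N" convention is simple enough to sketch in a few lines of
Python. The function names are hypothetical, and the article is
assumed to use a single LF as its EOL and to end with one.

```python
def encapsulate(article):
    """Prepend "N" to each line of an article for mailing to "rnews"."""
    lines = article.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final EOL does not start another line
    return "".join("N" + line + "\n" for line in lines)


def extract(mail_body):
    """Recover the article, dropping any line that lacks the leading "N"."""
    kept = [line[1:] for line in mail_body.split("\n") if line.startswith("N")]
    return "".join(line + "\n" for line in kept)
```

Lines added by mail software, such as a trailing empty line, do not
begin with "N" and so fall away during extraction.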
4784
4785 This method has its weaknesses. In particular, it assumes
4786 that the mail transmission channel can transmit nearly-
4787 arbitrary body text undamaged. When mail is being used as a
4788 transmission path of last resort, however, the mail system
4789 often has inconvenient preconceived notions about the format
4790 of message bodies. Various ad-hoc encoding schemes have
4791 been used to avoid such problems. The recommended method is
4792 to send a news article or batch as the body of a MIME mail
4793 message, using content type "application/news-transmission"
4794 and MIME's "base64" encoding (which is specifically designed
4795 to survive all known major mail systems).
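
With a MIME library the recommended method takes little code. The
sketch below uses the Python standard library email package, whose
MIMEApplication class applies base64 encoding by default; the function
name and the addresses passed to it are hypothetical.

```python
from email.mime.application import MIMEApplication


def wrap_for_mail(article_or_batch, sender, recipient):
    """Wrap article or batch octets in a base64-encoded MIME mail message."""
    message = MIMEApplication(article_or_batch, "news-transmission")
    message["From"] = sender
    message["To"] = recipient
    return message
```

Calling as_string() on the result yields the text to hand to the mail
system; the receiving side can recover the original octets with
get_payload(decode=True).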
4796
4797 NOTE: In the process, MIME conventions could be
4798 used to fragment and reassemble an article which
4799 is too large to be sent as a single mail message
4800 over a transmission path that restricts message
4801 length. In addition, the "conversions" parameter
4802 to the content type could be used to indicate what
4803 (if any) compression method has been used. And
4804 the Content-MD5 header [RFC 1544] can be used as a
4805 "checksum" to provide high confidence of detecting
4806 accidental damage to the contents.
4807
4808 UNRESOLVED ISSUE: The "conversions" parameter no
4809 longer exists. What should be done about this, if
4810 anything?
4811
4825 NOTE: It might look tempting to use a content type
4826 such as "message/X-netnews", but MIME bans non-
4827 trivial encodings of the entire body of messages
4828 with content type "message". The intent is to
4829 avoid obscuring nested structure underneath encod-
4830 ings. For inter-relayer news transmission, there
4831 is no nested structure of interest, and it is
4832 important that the entire article (including its
4833 headers, not just its body) be protected against
4834 the vagaries of intervening mail software. This
4835 situation appears to fit the MIME description of
4836 circumstances in which "application" is the proper
4837 content type.
4838
4839 NOTE: "application/news-transmission", with a
4840 "conversions" parameter, was registered as a stan-
4841 dard subtype by IANA 22 June 1993.
4842
4843 UNRESOLVED ISSUE: The "conversions" parameter no
4844 longer exists in MIME. What should we do about
4845 this?
4846
4847
4848 8.4. Partial Batches
4849
4850 UNRESOLVED ISSUE: The existing batch conventions
4851 assemble (potentially) many articles into one
4852 batch. Handling very large articles would be sub-
4853 stantially less troublesome if there were also a
4854 fragmentation convention for splitting a large
4855 article into several batches. Is this worth
4856 defining at this time?
4857
4858
4859 9. Propagation and Processing
4860
4861 Most aspects of news propagation and processing are imple-
4862 mentation-specific. The basic propagation algorithms, and
4863 certain details of how they are implemented, nevertheless
4864 need to be standard.
4865
4866 There are two important principles that news implementors
4867 (and administrators) need to keep in mind. The first is the
4868 well-known Internet Robustness Principle:
4869
4870 Be liberal in what you accept, and conservative in what you send.
4871
4872 However, in the case of news there is an even more important
4873 principle, derived from a much older code of practice, the
4874 Hippocratic Oath (we will thus call this the Hippocratic
4875 Principle):
4876
4877 First, do no harm.
4878
4891 It is VITAL to realize that decisions which might be merely
4892 suboptimal in a smaller context can become devastating mis-
4893 takes when amplified by the actions of thousands of hosts
4894 within a few hours.
4895
4896
4897 9.1. Relayer General Issues
4898
4899 Relayers MUST not alter the content of articles unnecessar-
4900 ily. Well-intentioned attempts to "improve" headers, in
4901 particular, typically do more harm than good. It is neces-
4902 sary for a relayer to prepend its own name to the Path con-
4903 tent (see section 5.6) and permissible for it to rewrite or
4904 delete the Xref header (see section 6.12). Relayers MAY
4905 delete the thoroughly-obsolete headers described in appendix
4906 A.3, although this behavior no longer seems useful enough to
4907 encourage. Other alterations SHOULD be avoided at all
4908 costs, as per the Hippocratic Principle.
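
As an illustration of how narrow the necessary alteration is, the
following sketch prepends a relayer name to the Path content and
leaves every other octet of the article alone. The function name is
hypothetical, and several details are assumptions because section 5.6
lies outside this part of the text: "!" is used as the path delimiter,
the header name is matched as exactly "Path: ", the header is not
continued onto later lines, and each EOL is a single LF.

```python
def prepend_to_path(article, relayer_name):
    """Return the article with relayer_name prepended to its Path content."""
    headers, separator, body = article.partition("\n\n")
    lines = headers.split("\n")
    for index, line in enumerate(lines):
        if line.startswith("Path: "):
            old_path = line[len("Path: "):]
            lines[index] = "Path: " + relayer_name + "!" + old_path
            # Other headers and the body are passed through untouched.
            return "\n".join(lines) + separator + body
    raise ValueError("article has no Path header")
```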
4909
4910 NOTE: As discussed in section 2.3, tidying up the
4911 headers of a user-prepared article is the job of
4912 the posting agent, not the relayer. The relayer's
4913 purpose is to move already-compliant articles
4914 around efficiently without damaging them. Note
4915 that in existing implementations, specific pro-
4916 grams may contain both posting-agent functions and
4917 relayer functions. The distinction is that post-
4918 ing-agent functions are invoked only on articles
4919 posted by local posters, never on articles
4920 received from other relayers.
4921
4922 NOTE: A particular corollary of this rule is that
4923 relayers should not add headers unless truly nec-
4924 essary. In particular, this is not SMTP; do not
4925 add Received headers.
4926
4927 Relayers MUST not pass non-conforming articles on to other
4928 relayers, except perhaps in a cooperating subnet that has
4929 agreed to permit certain kinds of non-conforming behavior.
4930 This is a direct consequence of the Internet Robustness
4931 Principle.
4932
4933 The two preceding paragraphs may appear to be in conflict.
4934 What is to be done when a non-conforming article is
4935 received? The Robustness Principle argues that it should be
4936 accepted but must not be passed on to other relayers while
4937 still non-conforming, and the Hippocratic Principle strongly
4938 discourages attempts at repair. The conclusion that this
4939 appears to lead to is correct: a non-conforming article MAY
4940 be accepted for local filing and processing, or it MAY be
4941 discarded entirely, but it MUST not be passed on to other
4942 relayers.
4943
4957 A relayer MUST not respond to the arrival of an article by
4958 sending mail to any destination, other than a local adminis-
4959 trator, except by explicit prearrangement with the recipi-
4960 ent. Neither posting an article (other than certain types
4961 of control message, see section 7.5) nor being the moderator
4962 of a moderated newsgroup constitutes such prearrangement.
4963 UNDER NO CIRCUMSTANCES WHATSOEVER may a relayer attempt to
4964 send mail to either an article's originator or a moderator.
4965
4966 NOTE: Reporting apparent errors in message compo-
4967 sition is the job of a posting agent, not a
4968 relayer. The same is true of mailing moderated-
4969 newsgroup postings to moderators. In networks of
4970 thousands of cooperating relayers, it is simply
4971 unacceptable for there to be any circumstance
4972 whatsoever that causes any significant fraction of
4973 them to simultaneously send mail to the same des-
4974 tination. (Some control messages are exceptions,
4975 although perhaps ill-advised ones.) What might,
4976 in a smaller network, be a useful notification or
4977 forwarding becomes a deluge of near-identical mes-
4978 sages that can bring mail software to its knees
4979 and severely inconvenience recipients. Modera-
4980 tors, in particular, historically have suffered
4981 grievously from this.
4982
4983 Notification of problems in incoming articles MAY go to
4984 local administrators, or at most (by prearrangement!) to
4985 the administrators of the neighboring relayer(s) that passed
4986 on the problematic articles.
4987
4988 NOTE: It would be desirable to notify the author
4989 that his posting is not propagating as he expects.
4990 However, there is no known method for doing this
4991 that will scale up gracefully. (In particular,
4992 "notify only if within N relayers of the origina-
4993 tor" falls down in the presence of commercial news
4994 services like UUNET: there may be hundreds or
4995 thousands of relayers within a couple of hops of
4996 the originator.) The best that can be done right
4997 now is to notify neighbors, in hopes that the word
4998 will eventually propagate up the line, or organize
4999 regional monitoring at major hubs.
5000
5001 If it is necessary to alter an article, e.g. translate it to
5002 another character set or alter its EOL representation,
5003 strenuous efforts should be made to ensure that such trans-
5004 formations are reversible, and that relayers or other soft-
5005 ware that might wish to reverse them know exactly how to do
5006 so.
5007
5008 NOTE: For example, a cooperating subnet that
5009 exchanges articles using a non-ASCII character set
5010 like EBCDIC should define a standard, reversible
5023 ASCII-EBCDIC mapping and take pains to see that it
5024 is used at all points where the subnet meets the
5025 outside. If the only reason for using EBCDIC is
5026 that the readers typically employ EBCDIC devices,
5027 it would be more robust to employ ASCII as the
5028 interchange format and do the transformation in
5029 the reading and posting agents.
5030
5031
5032 9.2. Article Acceptance And Propagation
5033
5034 When a relayer first receives an article, it must decide
5035 whether to accept it. (This applies regardless of whether
5036 the article arrived by itself or as part of a batch, and in
5037 principle regardless of whether it originated as a local
5038 posting or as traffic from another relayer.) In a cooperat-
5039 ing subnet with well-controlled propagation paths, some of
5040 the tests specified here MAY be delegated to centrally-
5041 located relayers; that is, relayers that can receive news
5042 ONLY via one of the central relayers might simplify accep-
5043 tance testing based on the assumption that incoming traffic
5044 has already passed the full set of tests at a central
5045 relayer.
5046
5047 The wording that follows is based on a model in which arti-
5048 cles arrive on a relayer's host before acceptance tests are
5049 done. However, depending on the degree of integration of
5050 the transport mechanisms and the relayer, some or all of
5051 these tests MAY be done before the article is actually
5052 transmitted, so that articles which definitely will not be
5053 accepted need not be transmitted at all.
5054
5055 The wording that follows also specifies a particular order
5056 for the acceptance tests. While this order is the obvious
5057 one, the tests MAY be done in any order.
5058
5059 First, the relayer MUST verify that the article is a legal
5060 news article, with all mandatory headers present with legal
5061 contents.
5062
5063 NOTE: This check in principle is done by the first
5064 relayer to see an article, so an article received
5065 from another relayer should always be legal, but
5066 there is enough old software still operational
5067 that this cannot be taken for granted; see the
5068 discussion of the Internet Robustness Principle in
5069 section 9.1.
5070
5071 Second, the relayer MUST determine whether it has already
5072 seen this article (identified by its message ID). This is
5073 normally done by retaining a history of all article message
5074 IDs seen in the last N days, where the value of N is decided
5075 by the relayer's administrator but SHOULD be at least 7.
5076 Since N cannot practically be infinite, articles whose Date
5089 content indicates that they are older than N days are
5090 declared "stale" and are deemed to have been seen already.
5091
5092 NOTE: This check is important because news propa-
5093 gation topology is typically redundant, often
5094 highly so, and it is not at all uncommon for a
5095 relayer to receive the same article from several
5096 neighbors. The history of already-seen message
5097 IDs can get quite large, hence the desire to limit
5098 its length... but it is important that it be long
5099 enough that slowly-propagating articles are not
5100 classed as stale. News propagation within the
5101 Internet is normally very rapid, but when UUCP
5102 links are involved, end-to-end delays of several
5103 days are not rare, so a week is not a particularly
5104 generous minimum.
5105
5106 NOTE: Despite generally more rapid propagation in
5107 recent times, it is still not unheard-of for some
5108 propagation paths to be very slow. This can
5109 introduce the possibility of old articles arriving
5110 again after they are gone from the history. Hence
5111 the "stale" rule.
5112
5113 Third, the relayer MUST determine whether any of the arti-
5114 cle's newsgroups are "subscribed to" by the host, i.e. fit a
5115 description of what hierarchies or newsgroups the site wants
5116 to receive.
5117
5118 NOTE: This check is significant because informa-
5119 tion on what newsgroups a relayer wishes to
5120 receive is often stored at its neighbors, who may
5121 not have up-to-date information or may simplify
5122 the rules for implementation reasons. As a hedge
5123 against the possibility of missed or delayed new-
5124 group control messages, relayers may wish to
5125 observe a notion of a newsgroup subscription that
5126 is independent of the list of newsgroups actually
5127 known to the relayer. This would permit reception
5128 and relaying of articles in newsgroups that the
5129 relayer is not (yet) aware of, subject to more
5130 general criteria indicating that they are likely
5131 to be of interest.
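
NOTE: As an illustration only, the three acceptance tests above might be sketched in Python as follows. The function name accept, the use of shell-style patterns such as "comp.*" to describe subscriptions, the in-memory history set, and the assumption that the Date header has already been parsed into a UTC datetime are all inventions of this sketch, not requirements of this Draft. Only the presence of the mandatory headers is checked; a real relayer must also verify their contents.

```python
from datetime import datetime, timedelta
from fnmatch import fnmatchcase

# Mandatory headers of sections 5.1 to 5.6.
MANDATORY = ("Date", "From", "Message-ID", "Subject", "Newsgroups", "Path")
HISTORY_DAYS = 7  # "N": set by the administrator, SHOULD be at least 7


def accept(headers: dict, arrival: datetime, history: set, subscriptions: list):
    """Return (accepted, reason) for one article.

    Assumptions of this sketch: headers["Date"] is an already-parsed
    datetime in the same timezone as `arrival`, and `subscriptions`
    is a list of shell-style newsgroup patterns.
    """
    # First: legality (presence of mandatory headers only).
    for name in MANDATORY:
        if not headers.get(name):
            return False, "illegal: missing " + name
    # Second: stale articles are deemed to have been seen already.
    if arrival - headers["Date"] > timedelta(days=HISTORY_DAYS):
        return False, "stale"
    if headers["Message-ID"] in history:
        return False, "duplicate"
    # Third: at least one newsgroup must fit the subscription description.
    groups = headers["Newsgroups"].split(",")
    if not any(fnmatchcase(g, p) for g in groups for p in subscriptions):
        return False, "unsubscribed"
    history.add(headers["Message-ID"])
    return True, "accepted"
```

The sketch tests staleness before consulting the history; as stated above, the tests MAY be done in any order.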
5132
5133 Once an article has been accepted, it may be passed on to
5134 other relayers. The fundamental news propagation rule is a
5135 flooding algorithm: on receiving and accepting an article,
5136 send it to all neighboring relayers not already in its path
5137 list that are sent its newsgroup(s) and distribution(s).
5138
5139 NOTE: The path list's role in loop prevention may
5140 appear relatively unimportant, given that looping
5141 articles would typically be rejected as duplicates
5142 anyway. However, the path list's role in
5155 preventing superfluous transmissions is not triv-
5156 ial. In particular, the path list is the only
5157 thing that prevents relayer X, on receiving an
5158 article from relayer Y, from sending it back to Y
5159 again. (Indeed, the usual symptom of confusion
5160 about relayer names is that incoming news loops
5161 back in this manner.) The looping articles would
5162 be rejected as duplicates, but doubling the commu-
5163 nications load on every news transmission path is
5164 not to be taken lightly!
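
NOTE: The flooding rule can be sketched as below. This is a simplified illustration, not a definitive implementation: the function name feed_targets and the neighbors mapping are hypothetical, "!" is assumed to be the only path delimiter, every entry of the path list is treated as a relayer name, and distributions are ignored entirely.

```python
from fnmatch import fnmatchcase


def feed_targets(path, newsgroups, neighbors):
    """Return the names of neighboring relayers to send the article to.

    `path` is the Path header content ("x!y!origin"); `neighbors` maps
    each neighbor's relayer name to the newsgroup patterns it is sent.
    """
    seen = set(path.split("!"))
    groups = newsgroups.split(",")
    targets = []
    for name, patterns in neighbors.items():
        if name in seen:
            continue  # already in the path list: do not send it back
        if any(fnmatchcase(g, p) for g in groups for p in patterns):
            targets.append(name)
    return targets
```

Note that the sketch never skips a neighbor merely because some other relayer appears in the path list; it consults only the neighbor's own name.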
5165
5166 In general, relayers SHOULD NOT make propagation decisions
5167 by "anticipation": relayer X, noting that the article's path
5168 list already contains relayer Y, decides not to send it to
5169 relayer Z because X anticipates that Z will get the article
5170 by a better path. If that is generally true, then why is
5171 there a news feed from X to Z at all? In fact, the "better
5172 path" may be running slowly or may be down. News propaga-
5173 tion is very robust precisely because some redundant trans-
5174 mission is done "just in case". If it is imperative to
5175 limit unnecessary traffic on a path, use of NNTP [rrr] or
5176 ihave/sendme (see section 7.2) to pass articles only when
5177 necessary is better than arbitrary decisions not to pass
5178 articles at all.
5179
5180 Anticipation is occasionally justified in special cases.
5181 Such cases should involve both (1) a cooperating subnet
5182 whose propagation paths are well-understood and well-
5183 monitored, with failures and slowdowns noticed and dealt
5184 with promptly, and (2) a persistent pattern of heavy unnec-
5185 essary traffic on a path that is either slow or costly. In
5186 addition, there should be some reason why neither NNTP nor
5187 ihave/sendme is suitable as a solution to the problem.
5188
5189
5190 9.3. Administrator Contact
5191
5192 It is desirable to have a standardized contact address for a
5193 relayer's administrators, in the spirit of the "postmaster"
5194 address for mail administrators. Mail addressed to "news-
5195 master" on a relayer's host MUST go to the administrator(s)
5196 of that relayer. Mail addressed to "usenet" on the
5197 relayer's host SHOULD be handled likewise. Mail addressed
5198 to either address on other hosts using the same news
5199 database SHOULD be handled likewise.
5200
5201 NOTE: These addresses are case-sensitive, although
5202 it would be desirable for addresses that match them
5203 under case-insensitive comparison to be handled
5204 likewise. While "newsmaster" seems the pre-
5205 ferred network-independent address, by analogy to
5206 "postmaster", there is an existing practice of
5207 using "usenet" for this purpose, and so "usenet"
5208 should be supported if at all possible (especially
5221 on hosts belonging to Usenet!). The address
5222 "news" is also sometimes used for purposes like
5223 this, but less consistently.
5224
5225
5226 10. Gatewaying
5227
5228 Gatewaying of traffic between news networks using this Draft
5229 and those using other exchange mechanisms can be useful, but
5230 must be done cautiously. Gateway administrators are taking
5231 on significant responsibilities, and must recognize that the
5232 consequences of error can be quite serious.
5233
5234
5235 10.1. General Gatewaying Issues
5236
5237 This section will primarily address the problems of gateway-
5238 ing traffic INTO news networks. Little can be said about
5239 the other direction without some specific knowledge of the
5240 network(s) involved. However, the two issues are not
5241 entirely independent: if a non-news network is gatewayed
5242 into a news network at more than one point, traffic injected
5243 into the non-news network by one gateway may appear at
5244 another as a candidate for injection back into the news net-
5245 work.
5246
5247 This raises a more general principle, the single most impor-
5248 tant issue for gatewaying:
5249
5250 Above all, prevent loops.
5251
5252 The normal loop prevention of news transmission is vitally
5253 dependent on the Message-ID header. Any gateway which finds
5254 it necessary to remove this header, alter it, or supersede
5255 it (by moving it into the body), MUST take equally effective
5256 precautions against looping.
5257
5258 NOTE: There are few things more effective at turn-
5259 ing news readers into a lynch mob than a malfunc-
5260 tioning gateway, or pair of gateways, that takes
5261 in news articles, mangles them just enough to pre-
5262 vent news relayers from recognizing them as dupli-
5263 cates, and regurgitates them back into the news
5264 stream. This happens rather too often.
5265
5266 Gateway implementors should realize that gateways have all
5267 the responsibilities of relayers, plus the added complica-
5268 tions introduced by transformations between different infor-
5269 mation formats. Much of section 9's discussion of relayer
5270 issues is relevant to gateways as well. In particular,
5271 gateways SHOULD keep a history of recently-seen articles, as
5272 described in section 9.2, and not assume that articles will
5273 never reappear. This is particularly important for networks
5274 that have their own concept analogous to message IDs: a
5287 gateway should keep a history of traffic seen from BOTH
5288 directions.
5289
5290 If at all possible, articles entering the non-news network
5291 SHOULD be marked in some way so that they will NOT be re-
5292 gatewayed back into news. Multiple gateways obviously must
5293 agree on the marking method used; if it is done by having
5294 them know each other's names, name changes MUST be coordi-
5295 nated with great care. If marking cannot be done, all
5296 transformations MUST be reversible so that a re-gatewayed
5297 article is identical to the original (except perhaps for a
5298 longer Path header).
5299
5300 Gateways MUST NOT pass control messages (articles containing
5301 Control, Also-Control, or Supersedes headers) without remov-
5302 ing the headers that make them control messages, unless
5303 there are compelling reasons to believe that they are rele-
5304 vant to both sides and that conventions are compatible. If
5305 it is truly desirable to pass them unaltered, suitable pre-
5306 cautions MUST be taken to ensure that there is NO POSSIBIL-
5307 ITY of a looping control message.
5308
5309 NOTE: The damage done by looping articles is mul-
5310 tiplied a thousandfold if one of the affected
5311 articles is something like a sendsys message (see
5312 section 7.3) that requests multiple automatic
5313 replies. Most gateways simply should not pass
5314 control messages at all. If some unusual reason
5315 dictates doing so, gateway implementors and admin-
5316 istrators are urged to consider bulletproof rate-
5317 limiting measures for the more destructive ones
5318 like sendsys, e.g. passing only one per hour no
5319 matter how many are offered.
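
NOTE: A rate limit of the "one per hour no matter how many are offered" kind might look like the following sketch. The class name ControlLimiter, its method, and the use of plain seconds for time are hypothetical; a real gateway would also have to preserve the state across restarts.

```python
class ControlLimiter:
    """Pass at most one control message of each type per interval."""

    def __init__(self, min_interval=3600.0):
        self.min_interval = min_interval  # seconds; one hour by default
        self.last_passed = {}             # control type -> time last passed

    def may_pass(self, control_type, now):
        """Return True if a message of this type may be passed at `now`."""
        last = self.last_passed.get(control_type)
        if last is not None and now - last < self.min_interval:
            return False  # excess messages are dropped, not queued
        self.last_passed[control_type] = now
        return True
```

Excess messages are discarded rather than deferred, so a flood of looping sendsys messages cannot build up a backlog that is released later.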
5320
5321 Gateways, like relayers, SHOULD make determined efforts to
5322 avoid mangling articles unnecessarily. In the case of gate-
5323 ways, some transformations may be inevitable, but keeping
5324 them to a minimum and ensuring that they are reversible is
5325 still highly desirable.
5326
5327 Gateways MUST avoid destroying information. In particular,
5328 the restrictions of section 4.2.2 are best taken with a
5329 grain of salt in the context of gateways. Information that
5330 does not translate directly into news headers SHOULD be
5331 retained, perhaps in "X-" headers, both because it may be of
5332 interest to sophisticated readers and because it may be cru-
5333 cial to tracing propagation problems.
5334
5335 Gateway implementors should take particular note of the dis-
5336 cussion of mailed replies, or more precisely the ban on
5337 same, in section 9.1. Gateway problems MUST be reported to
5338 the local administration, not to the innocent originator of
5339 traffic. "Gateway problems" here includes all forms of
5340 propagation anomaly on the non-news side of the gateway,
5353 e.g. unreachable addresses on a mailing list. Note that
5354 this requires consideration of possible misbehavior of
5355 "downstream" hosts, not just the gateway host.
5356
5357
5358 10.2. Header Synthesis
5359
5360 News articles prepared by gateways MUST be legal news arti-
5361 cles. In particular, they MUST include all of the mandatory
5362 headers (see section 5) and MUST fully conform to the
5363 restrictions on said headers. This often requires that a
5364 gateway function not only as a relayer, but also partly as a
5365 posting agent, aiding in the synthesis of a conforming arti-
5366 cle from non-conforming input.
5367
5368 NOTE: The full-conformance requirement needs par-
5369 ticularly careful attention when gatewaying mail-
5370 ing lists to news, because a number of constructs
5371 that are legal in MAIL headers are NOT permissible
5372 in news headers. (Note also that not all mail
5373 traffic fully conforms to even the MAIL specifica-
5374 tion.) The rest of this section will be phrased
5375 in terms of mail-to-news gatewaying, but most of
5376 it is more generally applicable.
5377
5378 The mandatory headers generally present few problems.
5379
5380 If no date information is available, the gateway should sup-
5381 ply a Date header with the gateway's current date. If only
5382 partial information is available (e.g. date but not time),
5383 this should be fleshed out to a full Date header by adding
5384 default values, not by mixing in parts of the gateway's cur-
5385 rent date. (Defaults should be chosen so that fleshed-out
5386 dates will not be in the future!) It may be necessary to
5387 map timezone information to the restricted forms permitted
5388 in the news Date header. See section 5.1.
5389
5390 NOTE: The prohibition of mixing dates is on the
5391 theory that it is better to admit ignorance than
5392 to lie.
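
NOTE: The date rules above might be sketched as follows. The function name synthesize_date is hypothetical, the original date is assumed to be expressed in UTC, and midnight is merely one plausible default: the start of a day cannot be later than any moment within that day, so a date that is not itself in the future cannot be fleshed out into a future time.

```python
from datetime import date, datetime, time, timezone


def synthesize_date(orig_date: date, orig_time: time, now: datetime) -> datetime:
    """Return a full UTC datetime for the Date header.

    `orig_date` and `orig_time` are None when the original message did
    not supply them; `now` is the gateway's current time in UTC.
    """
    if orig_date is None:
        return now  # no date information at all: use the current date
    if orig_time is None:
        orig_time = time(0, 0, 0)  # fixed default, never taken from `now`
    return datetime.combine(orig_date, orig_time, tzinfo=timezone.utc)
```

Mapping the result to the restricted timezone forms of section 5.1 is left out of the sketch.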
5393
5394 If the author's address as supplied in the original message
5395 is not suitable for inclusion in a From header, the gateway
5396 MUST transform it so it is, e.g. by use of the "% hack" and
5397 the domain address of the gateway. The desire to preserve
5398 information is NOT an excuse for violating the rules. If
5399 the transformation is drastic enough that there is reason to
5400 suspect loss of information, it may be desirable to include
5401 the original form in an X- header, but the From header's
5402 contents MUST be as specified in section 5.2.
5403
5404 If the message contains a Message-ID header, the contents
5405 should be dealt with as discussed in section 10.3. If there
5406 is no message ID present, it will be necessary to synthesize
5419 one, following the news rules (see section 5.3).
5420
5421 Every effort should be made to produce a meaningful Subject
5422 header; see section 5.4. Many news readers select articles
5423 to read based on Subject headers, and inserting a place-
5424 holder like "<no subject available>" is considered highly
5425 objectionable. Even synthesizing a Subject header by pick-
5426 ing out the first half-dozen nouns and adjectives in the
5427 article body is better than using a placeholder, since it
5428 offers SOME indication of what the article might contain.
5429
5430 The contents of the Newsgroups header (section 5.5) are usu-
5431 ally predetermined by gateway configuration, but a gateway
5432 to a network that has its own concept of newsgroups or dis-
5433 cussions might have to make transformations. Such transfor-
5434 mations should be reversible; otherwise confusion is likely
5435 on both sides.
5436
5437 It will rarely be possible for gateways to provide a Path
5438 header that is both an accurate history of the relayers the
5439 article has passed through AS NEWS and a usable reply
5440 address. The history function MUST be given priority; see
5441 the discussion in section 5.6. It will usually be necessary
5442 for a gateway to supply an empty path list, abandoning the
5443 reply function.
5444
5445 It is desirable for gatewayed articles to convey as much
5446 useful information as possible, e.g. by use of optional news
5447 headers (see section 6) when the relevant information is
5448 available. Synthesis of optional headers can generally fol-
5449 low similar rules.
5450
5451 Software synthesizing References headers should note the
5452 discussion in section 6.5 concerning the incompatibility
5453 between MAIL and news. Also of interest is the possibility
5454 of incorporating information from In-Reply-To headers and
5455 from attribution lines in the body; an incomplete or some-
5456 what conjectural References header is much better than none
5457 at all, and reading agents already have to cope with incom-
5458 plete or slightly erroneous References lists.
5459
5460
5461 10.3. Message ID Mapping
5462
5463 This section, like the previous one, is phrased in terms of
5464 mail being gatewayed into news, but most of the discussion
5465 should be more generally applicable.
5466
5467 A particularly sticky problem of gatewaying mail into news
5468 is supplying legal news message IDs. Note, in particular,
5469 that not all MAIL message IDs are legal in news; the news
5470 syntax (specified in section 5.3, with related material in
5471 5.2) is more restrictive. Generating a fully-conforming
5472 news article from a mail message may require transforming
5485 the message ID somewhat.
5486
5487 Generation and transformation of message IDs assumes partic-
5488 ular importance if a given mailing list (or whatever) is
5489 being handled by more than one gateway. It is highly desir-
5490 able that the same article contents not appear twice in the
5491 same newsgroup, which requires that they receive the same
5492 message ID from all gateways. Gateways SHOULD use the fol-
5493 lowing algorithm (possibly modified by the later discussion
5494 of gatewaying into more than one newsgroup) unless local
5495 considerations dictate another:
5496
5497 1. Separate message ID from surroundings, if necessary.
5498 A plausible method for this is to start at the first
5499 "<", end at the next ">", and reject the message if
5500 no ">" is found or a second "<" is seen before the
5501 ">". Also reject the message if the message ID con-
5502 tains no "@" or more than one "@", or if it contains
5503 no ".". Also reject the message if the message ID
5504 contains non-ASCII characters, ASCII control charac-
5505 ters, or white space.
5506
5507 NOTE: Any legitimate domain will include at
5508 least one ".". RFC 822 section 6.2.2 forbids
5509 white space in this context when passing mail
5510 on to non-MAIL software.
5511
5512 2. Delete the leading "<" and trailing ">". Separate
5513 message ID into local part and domain at the "@".
5514
5515 3. In both components, transliterate leading dots
5516 (".", ASCII 46), trailing dots, and dots after the
5517 first in sequences of two or more consecutive
5518 dots, into underscores (ASCII 95).
5519
5520 4. In both components, transliterate disallowed char-
5521 acters other than dots (see the definition of
5522 <unquoted-char> in section 5.2) to underscores
5523 (ASCII 95).
5524
5525 5. Form the message ID as
5526
5527 "<" local-part "@" domain ">"
5528
5529
5530 NOTE: This algorithm is approximately that of Rich
5531 Salz's successful gatewaying package.
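
NOTE: The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the names map_message_id and DISALLOWED are hypothetical, the set of disallowed characters is this sketch's reading of <unquoted-char> (section 5.2 is authoritative), and input containing no "<" at all is rejected.

```python
import re

# Assumption: characters excluded from <unquoted-char>, other than ".".
DISALLOWED = set('!()<>@,;:\\"[]')


def _fix_component(part):
    """Apply steps 3 and 4 to a local part or a domain."""
    # Step 3: leading dots, trailing dots, and dots after the first in a run.
    part = re.sub(r"^\.+", lambda m: "_" * len(m.group()), part)
    part = re.sub(r"\.+$", lambda m: "_" * len(m.group()), part)
    part = re.sub(r"\.(\.+)", lambda m: "." + "_" * len(m.group(1)), part)
    # Step 4: remaining disallowed characters other than dots.
    return "".join("_" if c in DISALLOWED else c for c in part)


def map_message_id(text):
    """Return a news message ID derived from `text`, or None to reject."""
    # Step 1: separate the message ID from its surroundings.
    start = text.find("<")
    if start < 0:
        return None
    end = text.find(">", start)
    if end < 0 or "<" in text[start + 1:end]:
        return None
    body = text[start + 1:end]  # step 2: brackets deleted
    if body.count("@") != 1 or "." not in body:
        return None
    if any(ord(c) <= 32 or ord(c) >= 127 for c in body):
        return None  # control characters, white space, or non-ASCII
    local, domain = body.split("@")  # step 2: split at the "@"
    # Step 5: reassemble.
    return "<" + _fix_component(local) + "@" + _fix_component(domain) + ">"
```

Because the mapping depends only on the original message ID, two gateways running the same code produce the same news message ID for the same mail message.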
5532
5533 Despite the desire to keep message IDs consistent across
5534 multiple gateways, there is also a more subtle issue that
5535 can require a different approach. If the same articles are
5536 being gatewayed into more than one newsgroup, and it is not
5537 possible to arrange that all gateways gateway them to the
5538 same cross-posted set of newsgroups, then the message IDs in
5551 the different newsgroups MUST be DIFFERENT.
5552
5553 NOTE: Otherwise, arrival of an article in one
5554 newsgroup will prevent it from appearing in
5555 another, and which newsgroup a particular article
5556 appears in will be an accident of which direction
5557 it arrives from first. It is very difficult to
5558 maintain a coherent discussion when each partici-
5559 pant sees a randomly-selected 50% of the traffic.
5560 The fundamental problem here is that the basic
5561 assumption behind message IDs is being violated:
5562 the gateways are assigning the same message ID to
5563 articles that differ in an important respect
5564 (Newsgroups header).
5565
5566 In such cases, it is suggested that the newsgroup name, or
5567 an agreed-on abbreviation thereof, be prepended to the local
5568 part of the message ID (with a separating ".") by the gate-
5569 way. This will ensure that multiple gateways generate the
5570 same message ID, while also ensuring that different news-
5571 groups can be read independently.
5572
5573 NOTE: It is preferable to have the gateway(s)
5574 cross-post the article, avoiding the issue alto-
5575 gether, but this may not be feasible, especially
5576 if one newsgroup is widespread and the other is
5577 purely local.
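
NOTE: The suggested prefixing might be sketched as follows. The function name prefix_message_id is hypothetical, and the input is assumed to be a message ID that is already legal in news, with exactly one "@".

```python
def prefix_message_id(message_id, group_tag):
    """Prepend `group_tag` and a "." to the local part of `message_id`.

    `group_tag` is the newsgroup name or an agreed-on abbreviation of it.
    """
    local, domain = message_id[1:-1].split("@")
    return "<" + group_tag + "." + local + "@" + domain + ">"
```

For example, the same mail message gatewayed into two hypothetical newsgroups "list.a" and "list.b" would receive two different message IDs, each of which is still the same at every gateway feeding that newsgroup.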
5578
5579
5580 10.4. Mail to and from News
5581
5582 Gatewaying mail to news, and vice-versa, is the most obvious
5583 form of news gatewaying. It is common to set up gateways
5584 between news and mail rather too casually.
5585
5586 It is hard to go very wrong in gatewaying news into a mail-
5587 ing list, except for the non-trivial matter of making sure
5588 that error reports go to the local administration rather
5589 than to the authors of news articles. (This requires atten-
5590 tion to the "envelope address" as well as to the message
5591 headers.) Doing the reverse connection correctly is much
5592 harder than it looks.
5593
5594 NOTE: In particular, just feeding the mail message
5595 to "inews -h" or the equivalent is NOT, repeat
5596 NOT, adequate to gateway mail to news. Signifi-
5597 cant gatewaying software is necessary to do it
5598 right. Not all headers of mail messages conform
5599 to even the MAIL specifications, never mind the
5600 stricter rules for news.
5601
5602 It is useful to distinguish between two different forms of
5603 mail-to-news gatewaying: gatewaying a mailing list into a
5604 newsgroup, and operating a "post-by-mail" service in which
5617 individual articles can be posted to a newsgroup by mailing
5618 them to a specific address. In the first case, the message
5619 is already being "broadcast", and the situation can be
5620 viewed as gatewaying one form of news into another. The
5621 second case is closer to that of a moderator posting submis-
5622 sions to a moderated newsgroup.
5623
5624 In either case, the discussions in the preceding two sec-
5625 tions are relevant, as is the Hippocratic Principle of sec-
5626 tion 9. However, some additional considerations are spe-
5627 cific to mail-to-news gatewaying.
5628
5629 As mentioned in section 6, point-to-point headers like To
5630 and Cc SHOULD NOT appear as such in news, although it is
5631 suggested that they be transformed to "X-" headers, e.g. X-
5632 To and X-Cc, to preserve their information content for pos-
5633 sible use by readers or troubleshooters. The Received
5634 header is entirely specific to MAIL and SHOULD be deleted
5635 completely during gatewaying, except perhaps for the
5636 Received header supplied by the gateway host itself.
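
NOTE: The treatment of To, Cc, and Received headers might be sketched as follows. The function name mail_headers_to_news and the representation of headers as (name, value) pairs are assumptions of this sketch, which deletes every Received header and does not attempt the optional retention of the gateway host's own.

```python
def mail_headers_to_news(headers):
    """Transform a list of (name, value) mail headers for use in news."""
    out = []
    for name, value in headers:
        lower = name.lower()
        if lower == "received":
            continue  # entirely specific to MAIL: delete
        if lower in ("to", "cc"):
            name = "X-" + name  # keep the information as X-To / X-Cc
        out.append((name, value))
    return out
```

All other headers pass through unchanged here; the Sender and References handling discussed below needs separate logic.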
5637
5638 The Sender header is a tricky case, one where mailing-list
5639 and post-by-mail practice should differ. For gatewaying
5640 mailing lists, the mailing-list host should be considered a
5641 relayer, and the From and Sender headers supplied in its
5642 transmissions left strictly untouched. For post-by-mail, as
5643 for a moderator posting a mailed submission, the Sender
5644 header should reflect the poster rather than the author. If
5645 a post-by-mail gateway receives a message with its own
5646 Sender header, it might wish to preserve the content in an
5647 X-Sender header.
5648
5649 It will generally be necessary to transform between mail's
5650 In-Reply-To/References convention and news's References/See-
5651 Also convention, to preserve correct semantics of cross ref-
5652 erences. This also requires attention when going the other
5653 way, from news to mail. See the discussion of the differ-
5654 ence in section 6.5.
5655
5656
5657 10.5. Gateway Administration
5658
5659 Any news system will benefit from an attentive administra-
5660 tor, preferably assisted by automated monitoring for anoma-
5661 lies. This is particularly true of gateways. Gateway soft-
5662 ware SHOULD be instrumented so that unusual occurrences,
5663 such as sudden massive surges in traffic, are reported
5664 promptly. It is desirable, in fact, to go further: gateway
5665 software SHOULD endeavour to limit damage in the event that
5666 the administrator does not respond promptly.
5667
5668 NOTE: For example, software might limit the gate-
5669 waying rate by queueing incoming traffic and emp-
5670 tying the queue at a finite maximum rate (well
5683 below the maximum that the host is capable of!)
5684 which is set by the administrator and is not
5685 raised automatically.
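
NOTE: The queueing scheme just described might be sketched as follows. The class name GatewayQueue and the notion of a "tick" (one fixed time unit, driven by some external timer) are hypothetical; the essential property is that the rate is a fixed setting that the software itself never raises.

```python
from collections import deque


class GatewayQueue:
    """Queue gatewayed articles; release at most `max_per_tick` per tick."""

    def __init__(self, max_per_tick):
        self.max_per_tick = max_per_tick  # changed only by the administrator
        self.pending = deque()

    def offer(self, article):
        """Accept an incoming article for later injection."""
        self.pending.append(article)

    def tick(self):
        """Called once per time unit; return the articles to inject now."""
        n = min(self.max_per_tick, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]
```

A sudden surge therefore lengthens the queue, which is itself an anomaly worth reporting to the administrator, instead of reaching the news network at full speed.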
5686
5687 Traffic gatewayed into a news network SHOULD include a suit-
5688 able header, perhaps X-Gateway-Administrator, giving an
5689 electronic address that can be used to report problems.
5690 This SHOULD be an address that goes direct to a human, not
5691 to a "routine administrative issues" mailbox that is exam-
5692 ined only occasionally, since the point is to be able to
5693 reach the administrator quickly in an emergency. Gateway
5694 administrators SHOULD arrange substitutes to cover gateway
5695 operation (with suitable redirection of mail) when they are
5696 on vacation etc.
5697
5698
5699 11. Security And Related Issues
5700
5701 Although the interchange format itself raises no significant
5702 security issues, the wider context does.
5703
5704
5705 11.1. Leakage
5706
5707 The most obvious form of security problem with news is
5708 "leakage" of articles which are intended to have only
5709 restricted circulation. The flooding algorithm is EXTREMELY
5710 good at finding any path by which articles can leave a sub-
5711 net with supposedly-restrictive boundaries. Substantial
5712 administrative effort is required to ensure that local news-
5713 groups remain local, unless connections to the outside world
5714 are tightly restricted.
5715
5716 A related problem is that the sendme control message can be
5717 used to ask for any article by its message ID. The useful-
5718 ness of this has declined as message-ID generation algo-
5719 rithms have become less predictable, but it remains a poten-
5720 tial problem for "secure" newsgroups. Hosts with such news-
5721 groups may wish to disable the sendme control message
5722 entirely.
5723
5724 The sendsys, version, and whogets control messages also
5725 allow "outsiders" to request information from "inside",
5726 which may reveal details of internal topology (etc.) that
5727 are considered confidential. (Note that at least limited
5728 openness about such matters may be a condition of membership
5729 in such networks, e.g. Usenet.)
5730
5731 Organizations wishing to control these forms of leakage are
5732 strongly advised to designate a small number of "official
5733 gateway" hosts to handle all news exchange with the outside
5734 world, so that a bounded amount of administrative effort is
5735 needed to control propagation and eliminate problems.
5736 Attempts to keep news out entirely, by refusing to support
5749 an official gateway, typically result in large numbers of
5750 unofficial partial gateways appearing over time. Such a
5751 configuration is much more difficult to troubleshoot.
5752
5753 A somewhat-related problem is the possibility of proprietary
5754 material being disclosed unintentionally by a poster who
5755 does not realize how far his words will propagate, either
5756 from sheer misunderstanding or because of errors made (by
5757 human or software) in followup preparation. There is little
5758 that can be done about this except education.
5759
5760
5761 11.2. Attacks
5762
5763 Although the limitations of the medium restrict what can be
5764 done to attack a host via news, some possibilities exist,
5765 most of them problems news shares with mail.
5766
5767 If reading agents are careless about transmitting non-
5768 printable characters to output devices, malicious posters
5769 may post articles containing control sequences ("letter-
5770 bombs") meant to have various destructive effects on output
5771 devices. Possible effects depend on the device, but they
5772 can include hardware damage (e.g. by repeated writing of
5773 values into configuration memories that can tolerate only a
5774 limited number of write cycles) and security violation (e.g.
5775 by reprogramming function keys potentially used by privi-
5776 leged readers).
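
As an illustration only, a reading agent might neutralize such
sequences before display along the following lines. This is a
minimal sketch, not a requirement of this Draft: the function
name sanitize_for_display is hypothetical, and the choice to
pass newline and tab through while escaping every other control
character is an assumption about what a given output device can
safely receive.

```python
def sanitize_for_display(text: str) -> str:
    """Return text with control characters made visible and inert."""
    safe = []
    for ch in text:
        code = ord(ch)
        if ch in "\n\t":
            # Ordinary layout characters are passed through.
            safe.append(ch)
        elif code < 0x20 or code == 0x7F or 0x80 <= code <= 0x9F:
            # C0 controls, DEL, and C1 controls become \xNN text.
            safe.append("\\x%02X" % code)
        else:
            safe.append(ch)
    return "".join(safe)


# ESC [ 2 J clears many terminals; ESC ] 0 ; ... BEL sets a title.
hostile = "Hello\x1b[2J\x1b]0;pwned\x07 world"
print(sanitize_for_display(hostile))
```

The escape byte never reaches the device; the reader sees the
sequence spelled out as harmless text instead.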
5777
5778 A more sophisticated variation on the letterbomb is inclu-
5779 sion of "Trojan horses" in programs. Obviously, readers
5780 must be cautious about using software found in news, but
5781 more subtly, reading agents must also exercise care. MIME
5782 messages can include material that is executable in some
5783 sense, such as PostScript documents (which are programs!),
5784 and letterbombs may be introduced into such material.
5785
5786 Given the presence of finite resources and other software
5787 limitations, some degree of system disruption can be
5788 achieved by posting otherwise-innocent material in great
5789 volume, either in single huge articles (see section 4.6) or
5790 in a stream of modest-sized articles. (Some would say that
5791 the steady growth of Usenet volume constitutes a subtle and
5792 unintentional attack of the latter type; certainly it can
5793 have disruptive effects if administrators are inattentive.)
5794 Systems need some ability to cope with surges, because sin-
5795 gle huge articles occur occasionally as the result of soft-
5796 ware error, innocent misunderstanding, or deliberate malice,
5797 and downtime at upstream hosts can cause droughts, followed
5798 by floods, of legitimate articles. (There is also a certain
5799 amount of normal variation; for example, Usenet traffic is
5800 noticeably lighter on weekends and during Christmas holi-
5801 days, and rises noticeably at the start of the school term
5802 of North American universities.) However, a site that
5815 normally receives little traffic may be quite vulnerable to
5816 "swamping" attack if its software is insufficiently careful.
5817
5818 In general, careless implementation may open doors that are
5819 not intrinsic to news. In particular, implementation of
5820 control messages (see sections 6.6 and 7) and unbatchers
5821 (see sections 8.1 and 8.2) via a command interpreter requires
5822 substantial precautions to ensure that only the intended
5823 capabilities are available. Care must also be taken that
5824 article-supplied text is not fed to programs that have
5825 escapes to command interpreters.
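
One way to keep "only the intended capabilities" available is
to avoid the command interpreter altogether and dispatch on a
fixed table of known verbs. The sketch below assumes this
approach; the handler functions and the HANDLERS table are
hypothetical placeholders, not part of any existing news
software.

```python
# Hypothetical handlers; real ones would act on the news system.
def handle_cancel(args):
    return ("cancel", args)


def handle_newgroup(args):
    return ("newgroup", args)


# Only verbs listed here can ever be executed.
HANDLERS = {
    "cancel": handle_cancel,
    "newgroup": handle_newgroup,
}


def dispatch_control(control_content: str):
    """Run the handler named by a Control header's content."""
    words = control_content.split()
    if not words:
        raise ValueError("empty Control content")
    verb, args = words[0], words[1:]
    handler = HANDLERS.get(verb)
    if handler is None:
        # Article-supplied text never reaches a shell.
        raise ValueError("unsupported control message: %r" % verb)
    return handler(args)


print(dispatch_control("cancel <1@example.invalid>"))
```

Because the verb is looked up rather than executed, text such
as "sh -c ..." in a Control header is simply rejected.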
5826
5827 Finally, there is considerable potential for malice in the
5828 sendsys, version, and whogets control messages. They are
5829 not harmful to the hosts receiving them as news, but they
5830 can be used to enlist those hosts (by the thousands) as
5831 unwitting allies in a mail-swamping attack on a victim who
5832 may not even receive news. The precautions discussed in
5833 section 7.5 can reduce the potential for such attacks con-
5834 siderably, but the hazard cannot be eliminated as long as
5835 these control messages exist.
5836
5837
5838 11.3. Anarchy
5839
5840 The highly distributed nature of news propagation, and the
5841 lack of adequate authentication protocols (especially for
5842 use over the less-interactive transport mechanisms such as
5843 UUCP), make article forgery relatively straightforward. It
5844 may be possible to at least track a forgery to its source,
5845 once it is recognized as such, but clever forgers can make
5846 even that relatively difficult. The assumption that forg-
5847 eries will be recognized as such is also not to be taken for
5848 granted; readers are notoriously prone to blindly assuming
5849 authenticity. If a forged article's initial path list
5850 includes the relayer name of the supposed poster's host, the
5851 article will never be sent to that host, and the alleged
5852 author may learn about the forgery secondhand or not at all.
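
The effect of a forged path list can be seen in a simplified
model of the relayer's forwarding decision. This is a sketch
under stated assumptions: the function names and the forged
path "eagle!forger" are made up for illustration, and real
relayers apply further tests (newsgroups, distributions) that
are omitted here.

```python
def has_passed_through(path_content: str, relayer_name: str) -> bool:
    """True if relayer_name already appears in the path list."""
    return relayer_name in path_content.split("!")


def neighbours_to_feed(path_content, neighbours):
    """Neighbours that the path list does not already mention."""
    return [n for n in neighbours
            if not has_passed_through(path_content, n)]


# A forger pre-loads the path with the victim's relayer name.
forged_path = "eagle!forger"
print(neighbours_to_feed(forged_path, ["eagle", "cbosgd"]))
```

The host named "eagle" is skipped because it appears to have
seen the article already, which is why the alleged author may
never receive the forgery.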
5853
5854 A particularly noxious form of forgery is the forged "can-
5855 cel" control message. Notably, it is relatively straight-
5856 forward to write software that will automatically send out a
5857 (forged) cancel message for any article meeting some crite-
5858 rion, e.g. written by a specific author. The authentication
5859 problems discussed in section 7.1 make it difficult to solve
5860 this without crippling cancel's important functionality.
5861
5862 A related problem is the possibility of disagreements over
5863 newsgroup creation, on networks where such things are not
5864 decided by central authorities. There have been cases of
5865 "rmgroup wars", where one poster persistently sends out new-
5866 group messages to create a newsgroup and another, equally
5867 persistently, sends out rmgroup messages asking that it be
5868 removed. This is not particularly damaging, if relayers are
5881 configured to be cautious, but can cause serious confusion
5882 among innocent third parties who just want to know whether
5883 they can use the newsgroup for communication or not.
5884
5885
5886 11.4. Liability
5887
5888 News shares the legal uncertainty surrounding other forms of
5889 electronic communication: what rules apply to this new
5890 medium of information exchange? News is a particularly
5891 problematic case because it is a broadcast medium rather
5892 than a point-to-point one like mail, and analogies to older
5893 forms of communication are particularly weak.
5894
5895 Are news-carrying hosts common carriers, like the phone com-
5896 panies, providing communications paths without having either
5897 authority over or responsibility for content? Or are they
5898 publishers, responsible for the content regardless of
5899 whether they are aware of it or not? Or something in
5900 between? Such questions are particularly significant when
5901 the content is technically criminal, e.g. some types of sex-
5902 ually-oriented material in some jurisdictions, in which case
5903 ignorance of its presence may not be an adequate defence.
5904
5905 Even in milder situations such as libel or copyright viola-
5906 tion, the responsibilities of the poster, his host, and
5907 other hosts carrying the traffic are unclear. Note, in par-
5908 ticular, the problems arising when the article is a forgery,
5909 or when the alleged author claims it is a forgery but cannot
5910 prove this.
5911
5912
5913 A. Archeological Notes
5914
5915
5916 A.1. A-News Article Format
5917
5918 The obsolete "A News" article format consisted of exactly
5919 five lines of header information, followed by the body. For
5920 example:
5921
5922 Aeagle.642
5923 news.misc
5924 cbosgd!mhuxj!mhuxt!eagle!jerry
5925 Fri Nov 19 16:14:55 1982
5926 Usenet Etiquette - Please Read
5927 body
5928 body
5929 body
5930
5931 The first line consisted of an "A" followed by an article ID
5932 (analogous to a message ID and used for similar purposes).
5933 The second line was the list of newsgroups. The third line
5934 was the path. The fourth was the date, in the format above
5947 (all fields fixed width), resembling an Internet date but
5948 not quite the same. The fifth was the subject.
5949
5950 This format is documented for archeological purposes only.
5951 Do not generate articles in this format.
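
For software that must read archived articles in this format,
the five-line layout can be picked apart as sketched below. The
names ANewsArticle and parse_a_news are hypothetical, the date
is kept as uninterpreted text, and splitting the newsgroup list
on commas is an assumption rather than something stated above.

```python
from typing import List, NamedTuple


class ANewsArticle(NamedTuple):
    article_id: str
    newsgroups: List[str]
    path: str
    date_text: str
    subject: str
    body: str


def parse_a_news(text: str) -> ANewsArticle:
    """Split an A News article into its five headers and body."""
    lines = text.split("\n")
    if len(lines) < 5 or not lines[0].startswith("A"):
        raise ValueError("not an A News article")
    return ANewsArticle(
        article_id=lines[0][1:],          # drop the leading "A"
        newsgroups=lines[1].split(","),   # assumed comma-separated
        path=lines[2],
        date_text=lines[3],               # left unparsed
        subject=lines[4],
        body="\n".join(lines[5:]),
    )


sample = "\n".join([
    "Aeagle.642",
    "news.misc",
    "cbosgd!mhuxj!mhuxt!eagle!jerry",
    "Fri Nov 19 16:14:55 1982",
    "Usenet Etiquette - Please Read",
    "body",
    "body",
])
article = parse_a_news(sample)
print(article.article_id, article.newsgroups, article.subject)
```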
5952
5953
5954 A.2. Early B-News Article Format
5955
5956 The obsolete pseudo-Internet article format, used briefly
5957 during the transition between the A News format and the mod-
5958 ern format, followed the general outline of a MAIL message
5959 but with some non-standard headers. For example:
5960
5961 From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)
5962 Newsgroups: news.misc
5963 Title: Usenet Etiquette -- Please Read
5964 Article-I.D.: eagle.642
5965 Posted: Fri Nov 19 16:14:55 1982
5966 Received: Fri Nov 19 16:59:30 1982
5967 Expires: Mon Jan 1 00:00:00 1990
5968
5969 body
5970 body
5971 body
5972
5973 The From header contained the information now found in the
5974 Path header, plus possibly the full name now typically found
5975 in the From header. The Title header contained what is now
5976 the Subject content. The Posted header contained what is
5977 now the Date content. The Article-I.D. header contained an
5978 article ID, analogous to a message ID and used for similar
5979 purposes. The Newsgroups and Expires headers were approxi-
5980 mately as now. The Received header contained the date when
5981 the latest relayer to process the article first saw it. All
5982 dates were in the above format, with all fields fixed width,
5983 resembling an Internet date but not quite the same.
5984
5985 This format is documented for archeological purposes only.
5986 Do not generate articles in this format.
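
The correspondence between these headers and the modern ones
can be sketched as a reader for archived articles. Everything
here is illustrative: parse_early_b_news and the keys of the
returned dictionary are hypothetical names, values are kept as
text without conversion, and the comma-separated newsgroup list
is an assumption.

```python
import re


def parse_early_b_news(text: str) -> dict:
    """Read an early B News article into modern-style fields."""
    head, _, body = text.partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        name, sep, value = line.partition(": ")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        headers[name] = value

    # From held the path, optionally followed by "(Full Name)".
    match = re.match(r"(\S+)(?:\s+\((.*)\))?\s*$",
                     headers.get("From", ""))
    path, full_name = match.groups() if match else (None, None)

    return {
        "path": path,                              # now Path
        "full_name": full_name,                    # now in From
        "newsgroups": headers.get("Newsgroups", "").split(","),
        "subject": headers.get("Title"),           # now Subject
        "date_text": headers.get("Posted"),        # now Date
        "article_id": headers.get("Article-I.D."),
        "expires_text": headers.get("Expires"),
        "body": body,
    }


sample = (
    "From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)\n"
    "Newsgroups: news.misc\n"
    "Title: Usenet Etiquette -- Please Read\n"
    "Article-I.D.: eagle.642\n"
    "Posted: Fri Nov 19 16:14:55 1982\n"
    "Received: Fri Nov 19 16:59:30 1982\n"
    "Expires: Mon Jan 1 00:00:00 1990\n"
    "\n"
    "body\n"
)
fields = parse_early_b_news(sample)
print(fields["path"], fields["full_name"], fields["subject"])
```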
5987
5988
5989 A.3. Obsolete Headers
5990
5991 Early versions of news software following the modern format
5992 sometimes generated headers like the following:
5993
5994 Relay-Version: version B 2.10 2/13/83; site cbosgd.UUCP
5995 Posting-Version: version B 2.10 2/13/83; site eagle.UUCP
5996 Date-Received: Friday, 19-Nov-82 16:59:30 EST
5997
5998 Relay-Version contained version information about the
5999 relayer that last processed the article. Posting-Version
6000 contained version information about the posting agent that
6013 posted the article. Date-Received contained the date when
6014 the last relayer to process the article first saw it (in a
6015 slightly nonstandard format).
6016
6017 These headers are documented for archeological purposes
6018 only. Do not generate articles using them.
6019
6020
6021 A.4. Obsolete Control Messages
6022
6023 There once was a senduuname control message, resembling
6024 sendsys but requesting transmission of the list of hosts
6025 that the receiving host had UUCP connections to. This
6026 rapidly ceased to be of much use, and many organizations
6027 consider information about their internal connectivity to be
6028 confidential.
6029
6030 Historically, a checkgroups body consisting of one or two
6031 lines, the first of the form "-n newsgroup", caused check-
6032 groups to apply to only that single newsgroup. This form is
6033 documented for archeological purposes only; do not use it.
6034
6035 Historically, posting an article to a newsgroup whose name
6036 had exactly three components, of which the third was "ctl",
6037 signified that the article was to be taken as a control message.
6038 The Subject header specified the actions, in the same way
6039 the Control header does now. This form is documented for
6040 archeological purposes only; do not use it; do not implement
6041 it.
6042
6043
6044 B. A Quick Tour Of MIME
6045
6046 (The editor wishes to thank Luc Rooijakkers; most of this
6047 appendix is a lightly-edited version of a summary he kindly
6048 supplied.)
6049
6050 MIME (Multipurpose Internet Mail Extensions) is an upward-
6051 compatible set of extensions to RFC 822, currently docu-
6052 mented in RFCs 1341 and 1342. This appendix summarizes
6053 these documents. See the MIME RFCs for more information;
6054 they are very readable.
6055
6056 UNRESOLVED ISSUE: These RFC numbers (here and
6057 elsewhere in this Draft) need updating when the
6058 new MIME RFCs come out.
6059
6060 MIME defines the following new headers:
6061
6079 MIME-Version
6080 Content-Type
6081 Content-Transfer-Encoding
6082 Content-ID
6083 Content-Description
6084
6085
6086 The MIME-Version header is mandatory for all messages con-
6087 forming to the MIME specification and carries the version
6088 number of the MIME specification. Example:
6089
6090 MIME-Version: 1.0
6091
6092
6093 The Content-Type header indicates the content type of the
6094 message. Content types are split into a top-level type and
6095 a subtype, separated by a slash. Auxiliary information can
6096 also be supplied, using an attribute-value notation. Exam-
6097 ple:
6098
6099 Content-Type: text/plain; charset=us-ascii
6100
6101 (In the absence of a Content-Type header this is in fact the
6102 default content type.)
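
As a brief illustration, the type, subtype, and parameters can
be inspected with the email package of the Python standard
library, used here only as a convenient example of a MIME-aware
parser. The sample articles are made up for the purpose.

```python
from email import message_from_string

article = message_from_string(
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "body\n"
)
print(article.get_content_type())      # type "/" subtype
print(article.get_content_maintype())  # top-level type
print(article.get_content_subtype())   # subtype
print(article.get_param("charset"))    # attribute-value parameter

# With no Content-Type header the default type is reported.
bare = message_from_string("Subject: no MIME headers\n\nbody\n")
print(bare.get_content_type())
```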
6103
6104 Important type/subtype combinations are
6105
6106 text/plain              Plain text, possibly in a non-
6107                         ASCII character set.
6108
6109 text/enriched           A very simple wordprocessor-like
6110                         language supporting character
6111                         attributes (e.g., underlining),
6112                         justification control, and multi-
6113                         ple character sets. (This pro-
6114                         posal has gone through several
6115                         iterations and has recently split
6116                         off from the main MIME RFCs into a
6117                         separate document.)
6118
6119 message/rfc822          A mail message conforming to a
6120                         slightly-relaxed version of RFC
6121                         822.
6122
6123 message/partial         Part of a message (supporting the
6124                         transparent splitting and joining
6125                         of messages when they are too
6126                         large to be handled by some trans-
6127                         port agent).
6128
6129 message/external-body   A message whose body is external.
6130                         Possible access methods include
6131                         via mail, FTP, local file, etc.
6132
6145 multipart/mixed         A message whose body consists of
6146                         multiple parts, possibly of dif-
6147                         ferent types, intended to be
6148                         viewed in serial order. Each part
6149                         looks like an RFC 822 message,
6150                         consisting of headers and a body.
6151                         Most of the RFC 822 headers have
6152                         no defined semantics for body
6153                         parts.
6154
6155 multipart/parallel      Likewise, except that the parts
6156                         are intended to be viewed in par-
6157                         allel (on user agents that support
6158                         it).
6159
6160 multipart/alternative   Likewise, except that the parts
6161                         are intended to be semantically
6162                         equivalent such that the part that
6163                         best matches the capabilities of
6164                         the environment should be dis-
6165                         played. For example, a message
6166                         may include plain-text, enriched-
6167                         text, and PostScript versions of
6168                         some document.
6169
6170 multipart/digest        A variant of multipart/mixed espe-
6171                         cially intended for message
6172                         digests (the default type of the
6173                         parts is message/rfc822 instead of
6174                         text/plain, saving on the number
6175                         of headers for the parts).
6176
6177 application/postscript  A PostScript document.
6178                         (PostScript is a trademark of
6179                         Adobe.)
6180
6181 Other top-level types exist for still images, audio, and
6182 video samples.
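
To make the multipart structure concrete, the sketch below
builds a small multipart/alternative message using the email
package of the Python standard library. The body texts are
invented for the example, and nothing here should be read as a
recommendation about which alternatives news articles ought to
carry.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Two renderings of the same content, simplest first.
article = MIMEMultipart("alternative")
article.attach(MIMEText("Plain rendering of the text.\n", "plain"))
article.attach(MIMEText("<bold>Enriched</bold> rendering.\n",
                        "enriched"))

print(article.get_content_type())
for part in article.get_payload():
    # Each part carries its own headers, like a small message.
    print(part.get_content_type(), part.get_content_charset())
```

A reading agent would display whichever single part best
matches its capabilities.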
6183
6184 Some of the above types require the ability to transport
6185 binary data. Since the existing message systems usually do
6186 not support this, MIME provides a Content-Transfer-Encoding
6187 header to indicate the kind of encoding used. The possible
6188 encodings are:
6189
6190 7bit              No encoding; the data consists of short
6191                   (less than 1000 characters) lines of
6192                   7-bit ASCII data, delimited by EOL
6193                   sequences. This is the default encod-
6194                   ing.
6195
6196 8bit              Like 7bit, except that bytes with the
6197                   high-order bit set may be present.
6198                   Many transmission paths are incapable
6211                   of carrying messages which use this
6212                   encoding.
6213
6214 binary            No encoding; any sequence of bytes may
6215                   be present. Many transmission paths
6216                   are incapable of carrying messages
6217                   which use this encoding.
6218
6219 base64            The data is encoded by representing
6220                   every group of 3 bytes as 4 characters
6221                   from the alphabet "A-Za-z0-9+/", which
6222                   was chosen for its high robustness
6223                   through mail gateways (the alphabet
6224                   used by uuencode does not survive
6225                   ASCII-EBCDIC-ASCII translations). In
6226                   the final group of 4 characters, "=" is
6227                   used for those characters not repre-
6228                   senting data bytes. Line length is
6229                   limited and EOLs in the encoded form
6230                   are ignored.
6231
6232 quoted-printable  Any byte can be represented by a three
6233                   character "=XX" sequence where the X's
6234                   are upper case hexadecimal digits.
6235                   Bytes representing printable 7-bit US-
6236                   ASCII characters except "=" may be rep-
6237                   resented literally. Tabs and blanks
6238                   may be represented literally if not at
6239                   the end of a line. Line length is lim-
6240                   ited, and an EOL preceded by "=" was
6241                   inserted for this purpose and is not
6242                   present in the original.
6243
6244 The base64 and quoted-printable encodings are applied to
6245 data in Internet canonical form: any EOL in the data that is
6246 not encoded as a literal EOL (e.g. one carried inside base64
6247 data) must be the Internet canonical EOL, CR followed by LF.
6248
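The two encodings, and the conversion to canonical form that
precedes them, can be tried out with the base64 and quopri
modules of the Python standard library. This is a small sketch
only: the helper to_canonical is a hypothetical name, and the
use of ISO-8859-1 for the sample text is an assumption made so
that one byte has its high-order bit set.

```python
import base64
import quopri


def to_canonical(text: str) -> bytes:
    """Convert local "\\n" line ends to CR LF before encoding."""
    # Collapse any existing CR LF first so it is not doubled.
    unified = text.replace("\r\n", "\n")
    return unified.replace("\n", "\r\n").encode("iso-8859-1")


# base64: every 3 bytes become 4 characters, padded with "=".
print(base64.b64encode(b"abc"))
print(base64.b64encode(b"ab"))
print(base64.b64encode(b"a"))

# Text is put into canonical form, then encoded.
data = to_canonical("caf\u00e9\n")
encoded = base64.b64encode(data)
print(encoded, base64.b64decode(encoded) == data)

# quoted-printable: the 8-bit byte 0xE9 becomes "=E9".
qp = quopri.encodestring("caf\u00e9".encode("iso-8859-1"))
print(qp, quopri.decodestring(qp))
```
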
6249 The Content-Description header allows further description of
6250 a body part, analogous to the use of Subject for messages.
6251
6252 Finally, the Content-ID header can be used to assign an
6253 identification to body parts, analogous to the assignment of
6254 identifications to messages by Message-ID.
6255
6256 Note that most of these headers are structured header
6257 fields, as defined in RFC 822. Consequently, comments are
6258 allowed in their values. The following is a legal MIME
6259 header:
6260
6261 Content-Type: (a comment) text (yeah) /
6262     plain (and now some params:) ; charset= (guess what)
6263     iso-8859-1 (we don't have iso-10646 yet, pity)
6264
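To show what a parser has to cope with, the following sketch
removes RFC 822 comments from a structured header value. It is
a simplified illustration, not a full RFC 822 parser: the name
strip_comments is hypothetical, and deleting all remaining
white space afterwards is a shortcut that is safe for this
example only (it would damage quoted strings containing
blanks).

```python
def strip_comments(value: str) -> str:
    """Remove (possibly nested) comments from a header value."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and (depth or in_quotes):
            # A quoted-pair: keep it unless it is inside a comment.
            if not depth:
                out.append(value[i:i + 2])
            i += 2
            continue
        if in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif depth == 0:
            if ch == '"':
                in_quotes = True
            out.append(ch)
        i += 1
    return "".join(out)


raw = ("(a comment) text (yeah) /\n"
       " plain (and now some params:) ; charset= (guess what)\n"
       " iso-8859-1 (we don't have iso-10646 yet, pity)")
compact = "".join(strip_comments(raw).split())
print(compact)
```

With the comments and folding removed, the value reduces to an
ordinary text/plain content type with a charset parameter.
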
6277 NOTE: Although the MIME specification was devel-
6278 oped for mail, there is nothing precluding its use
6279 for news as well. While it might simplify imple-
6280 mentation to restrict the MIME headers somewhat,
6281 in the same way that other news headers (e.g.
6282 From) are restricted subsets of the RFC-822 origi-
6283 nals, this would add yet another divergence
6284 between two formats that ought to be as compatible
6285 as possible. In the case of the MIME headers,
6286 there is no body of existing code posing compati-
6287 bility concerns. A full-featured MIME reading
6288 agent needs a full RFC-822 parser anyway, to prop-
6289 erly handle body parts of types like mes-
6290 sage/rfc822, so there is little gain from
6291 restricting MIME headers. Adopting the MIME spec-
6292 ification unchanged seems best. However, article-
6293 level MIME headers must still comply with the
6294 overall news header syntax given in section 4, so
6295 that news software which is NOT interested in MIME
6296 need not contain a full RFC-822 parser.
6297
6298 The second part of MIME, RFC 1342 (Representation of Non-
6299 ASCII Text in Internet Message Headers), addresses the prob-
6300 lem of non-ASCII characters in headers. An example of a
6301 header using the RFC 1342 mechanism is
6302
6303 From: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
6304
6305 Such encodings are allowed in selected headers, subject to
6306 the restrictions listed in RFC 1342.
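
For illustration, the header above can be decoded with the
email.header module of the Python standard library. The sketch
normalizes runs of white space afterwards, since the exact
spacing produced around the encoded word is an implementation
detail that should not be relied upon.

```python
from email.header import decode_header, make_header

raw = "=?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>"

# decode_header splits the value into (bytes, charset) pieces;
# make_header joins them back into one readable string.
decoded = str(make_header(decode_header(raw)))
readable = " ".join(decoded.split())
print(readable)
```

In the "Q" form used here, "=E9" stands for the byte 0xE9 in
the named character set and "_" stands for a blank.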
6307
6308 The MIME effort has also produced an RFC defining a Content-
6309 MD5 header [rrr 1544], containing an MD5-based "checksum" of
6310 the contents of an article or body part, giving high confi-
6311 dence of detecting accidental modifications to the contents.
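
A value for such a header might be computed as sketched below.
The sketch assumes, following the Content-MD5 RFC as understood
here, that the header carries the base64 form of the 16-byte
MD5 digest and that the digest is taken over the content in
canonical form; content_md5 is a hypothetical helper name.

```python
import base64
import hashlib


def content_md5(canonical_body: bytes) -> str:
    """Return the base64 text of the MD5 digest of the body."""
    digest = hashlib.md5(canonical_body).digest()  # 16 bytes
    return base64.b64encode(digest).decode("ascii")


body = b"body\r\nbody\r\n"
print("Content-MD5: " + content_md5(body))

# Any accidental change to the body changes the value.
print(content_md5(body) == content_md5(b"body\r\nbody\r\n"))
print(content_md5(body) == content_md5(b"body\r\nbodY\r\n"))
```

This detects accidental damage only; anyone able to alter the
body can recompute the header as well.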
6312
6313 The "metamail" software package [rrr] helps provide MIME
6314 support with minimal changes to mailers, and may also be
6315 relevant to news reading agents.
6316
6317 The PEM (Privacy Enhanced Mail) effort is pursuing analogous
6318 facilities to offer stronger guarantees against malicious
6319 modifications, unauthorized eavesdropping, and forgery.
6320 This work too may be applicable to news, once it is recon-
6321 ciled with MIME (by efforts now underway).
6322
6323
6324 C. Summary of Changes Since RFC 1036
6325
6326 This Draft is much longer than RFC 1036, so there is obvi-
6327 ously much change in content. Much of this is just
6328 increased precision and rigor. Noteworthy changes and addi-
6329 tions include:
6330
6343 + section 4.3's restrictions on article bodies
6344
6345 + all references to MIME facilities
6346
6347 + size limits on articles
6348
6349 + precise specification of Date-content syntax
6350
6351 + message IDs must never be re-used, ever
6352
6353 + "!" is the only Path delimiter
6354
6355 + multiple moderators in the Approved header
6356
6357 + rules on References trimming, and the _-_ mechanism
6358
6359 + generalization of the Xref rules
6360
6361 + multiple message IDs in Cancel and Supersedes
6362
6363 + Also-Control
6364
6365 + See-Also
6366
6367 + Article-Names
6368
6369 + Article-Replacing
6370
6371 + more precise rules for cancellation
6372
6373 + cancellation authorization based on From, not Sender
6374
6375 + "unmoderated" and descriptors in newgroup messages
6376
6377 + restrictive rules on handling of sendsys and version
6378 messages
6379
6380 + the whogets control message
6381
6382 + precise specification of checkgroups messages
6383
6384 + compression type preferably specified out-of-band
6385
6386 + rules for encapsulating news in MIME mail
6387
6388 + tighter specification of relayer functioning (section
6389 9.1)
6390
6391 + the "newsmaster" contact address
6392
6393 + rules for gatewaying (section 10)
6394
6395 + discussion of security issues (section 11)
6396
6397
6409 D. Summary of Completely New Features
6410
6411 Most of this Draft merely documents existing practice, but
6412 there are a few attempts to extend it. These are:
6413
6414 TBW
6415
6416
6417 E. Summary of Differences From RFC 822+1123
6418
6419 The following are noteworthy differences between this
6420 Draft's articles and MAIL messages:
6421
6422 + generally less-permissive header syntax
6423
6424 + notably, limited From syntax
6425
6426 + MAIL header comments allowed in only a few contexts
6427
6428 + slightly more restricted message-ID syntax
6429
6430 + several more mandatory headers
6431
6432 + duplicate headers forbidden
6433
6434 + References/See-Also versus In-Reply-To/References
6435 (section 6.5)
6436
6437 + case sensitivity in some contexts
6438
6439 + point-to-point headers, e.g. To and Cc, forbidden
6440 (section 6)
6441
6442 + several new headers
6443
6444
6445 References
6446
6447 [Sanderson] "Smileys", David Sanderson, O'Reilly & Associ-
6448 ates Ltd., 1993.
6449
6450 TBW
6451
6452
6453 Security Considerations
6454
6455 Section 11 discusses security considerations in detail.
6456
6457
6458 Author's Address
6459
6475 Henry Spencer
6476 henry@zoo.toronto.edu
6477
6478 SP Systems
6479 Box 280 Stn. A
6480 Toronto, Ont. M5W1B2 Canada