newslite User Manual


This document is made of the following parts:
The newslite archive contains the following files:


Many thanks to Jean-François Minet for his testing support, suggestions, and User Manual inputs.


Pierre Lescuyer - last update 29-Dec-2008 - pcles@free.fr 


Executive Summary

newslite is a UNIX/Linux command-line tool that downloads files from binary newsgroups and decodes multi-part and single-part yEnc and UU (Unix-to-Unix) encoded files.

There already exist a lot of programs like that. I built this new one to fulfill the following objectives:
At this point in time, newslite only supports yEnc and UU encoded binaries, the most widely used formats for binary posting. Any attempt to download files encoded in any other way will fail.
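
For readers unfamiliar with yEnc, the core of the decoding step can be sketched as below. This is a minimal illustration of the publicly documented yEnc byte transform, not newslite's actual code: the function name yenc_decode_line is hypothetical, and the control lines (=ybegin, =ypart, =yend) and CRC checks are deliberately left out.

```python
def yenc_decode_line(line: bytes) -> bytes:
    """Decode one yEnc data line (without its trailing CR/LF)."""
    out = bytearray()
    escaped = False
    for byte in line:
        if escaped:
            # An escaped byte was shifted by an extra 64 when encoded.
            out.append((byte - 64 - 42) % 256)
            escaped = False
        elif byte == ord("="):
            # '=' announces that the next byte is escaped.
            escaped = True
        else:
            # Every ordinary byte was shifted by 42 when encoded.
            out.append((byte - 42) % 256)
    return bytes(out)
```

A multi-part file would be rebuilt by decoding the data lines of each article in part order and concatenating the results.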

Launch first with -r to retrieve headers from the group you are interested in, and then launch again with -d to retrieve your files. All binaries are saved in the current directory, i.e. where newslite is launched.

The text output is kept to a minimum. For more details on how a session ran, read the log.txt file once the session is over.

newslite main features

Threaded server connections

With the -c option, newslite opens several connections to the server in order to speed up downloads (most servers limit the bandwidth per connection).
This option speeds up binary file retrieval (-d option), and also works with newsgroup header retrieval (-r option).
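
As an illustration of how a header range can be shared between connections, here is a minimal sketch. The function name split_article_range is hypothetical, and the equal-chunk rule is an assumption; it happens to reproduce the XOVER ranges shown in the session output example of this manual, but newslite may compute them differently.

```python
import math

def split_article_range(first: int, last: int, connections: int) -> list[tuple[int, int]]:
    """Split the inclusive range first..last into contiguous chunks, one per connection."""
    total = last - first + 1
    chunk = math.ceil(total / connections)
    ranges = []
    start = first
    while start <= last:
        end = min(start + chunk - 1, last)
        ranges.append((start, end))
        start = end + 1
    return ranges

# Article numbers taken from the session output example (65038 articles, 4 connections).
for low, high in split_article_range(12180007, 12245044, 4):
    print(f"XOVER {low}-{high}")
```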

Configuration file

To simplify launching newslite, an optional text configuration file can be used. This file is named "news.conf" and can be edited with any standard text editor. It specifies the following configuration information, which newslite reads at each launch:
  1. the server name (-s option); the program works with only one server.
  2. the number of simultaneous connections for file downloading (-c option)
  3. user name and password for authentication (-a option)
  4. the port number used to get connected to the server (-p option)
  5. the maximum number of files per newsgroup newslite can handle - this makes it possible to raise the default value of 10000
  6. the maximum number of titles per newsgroup newslite can handle - this makes it possible to raise the default value of 5000
For items (5) and (6), keep in mind that increasing the limits will have a direct impact on the allocated memory.

In any case, when used on the command line, the -s, -c, -a and -p options override the corresponding parameters specified in the "news.conf" file.
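
The precedence rule can be sketched as follows. The actual syntax of "news.conf" is not described here, so plain dictionaries stand in for both sources; the key names and the function effective_settings are hypothetical.

```python
def effective_settings(conf: dict, cli: dict) -> dict:
    """Return the news.conf values, overridden by any option given on the command line."""
    merged = dict(conf)
    # Only options actually present on the command line (not None) take precedence.
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged

conf = {"server": "news.isp.fr", "connections": 2, "port": 119}
cli = {"server": None, "connections": 4, "port": None}  # only -c 4 was given
print(effective_settings(conf, cli))
```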

NZB files

".nzb" files can be found somewhere on usenet groups. These files allow newsreader utilities to easily retrieve all the parts of a file being posted. Please check the newzbin site for further information.
newslite processes ".nzb" files using the -d option. A tutorial on "how to use .nzb files" is available further in this document.

Download resuming

If newslite is interrupted, or the connection is torn down (which causes the download task to stop), the download can be resumed with the -b option. When launched with the -d or -b option, newslite generates a "batch.idx" file listing all the files to download. When the -b option is used, newslite analyses what has already been downloaded, so that only the missing files are downloaded.
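
The idea behind resuming can be sketched as below. This is only an illustration: "batch.idx" is not a text file and its format is not described here, so a plain list of wanted filenames stands in for it, and "already downloaded" is assumed to mean "a file of that name exists in the download directory". The function name files_still_missing is hypothetical.

```python
from pathlib import Path

def files_still_missing(wanted: list[str], download_dir: str = ".") -> list[str]:
    """Return the wanted filenames that are not yet present in download_dir."""
    directory = Path(download_dir)
    return [name for name in wanted if not (directory / name).is_file()]
```

Note that a simple presence check like this one cannot tell a complete file from a partially written one.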

Fast and efficient message header retrieval

Newsgroup header retrieval can be a long process. As an illustration, retrieving 300000 headers is equivalent to downloading an 80 MB file. To limit the number of headers to be downloaded, newslite offers two options: -L, which limits the number of retrieved headers, and -D, which only retrieves the headers of new messages.
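
The figure above gives a rough rule of thumb for estimating the cost of a header retrieval. The sketch below assumes that the volume scales linearly with the number of headers and that 1 MB is 1000000 bytes; the function name is hypothetical.

```python
# About 267 bytes per header, derived from "300000 headers is roughly 80 MB".
BYTES_PER_HEADER = 80_000_000 / 300_000

def estimated_header_download_mb(header_count: int) -> float:
    """Estimate the download volume, in MB, for a given number of headers."""
    return header_count * BYTES_PER_HEADER / 1_000_000

# Estimated volume when only 10000 headers are retrieved.
print(round(estimated_header_download_mb(10_000), 1))
```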

Searching

The search feature of newslite performs a text search on the "subject" field of the articles within a given group, using the -S option. This feature is an alternative to the classical way of searching for items (i.e. retrieving all the group headers with the -r option, then searching in the resulting .txt file). The -S option is a good way to save time when looking for something specific.
This option makes use of the XPAT NNTP command. It may happen that your news server does not support it. This can easily be checked by launching newslite with the -h (news server help) option, which returns the list of commands supported by the news server.


Remarks and limitations

Hints

System requirements

From version 1.7, newslite is a universal binary, working on both PPC and Intel Mac platforms.



How to use newslite?

newslite does not do everything in one shot. This section describes the steps the user needs to follow to download binary files from newsgroups using newslite.


Step 1: getting the group list

This step is not mandatory if you already know which group(s) you are looking for.
This can be done using, for example, the following syntax: ./newslite -s my.news.server -l
At the end, a text file is produced, named after the news server with a .txt extension.

All the supported groups are listed in this file, in alphabetic order. Below is an example of what the text file contains:

alt.autos.studebaker    --> 6582 articles
alt.autos.subaru        --> 1660 articles
alt.autos.toyota        --> 4928 articles
alt.autos.toyota.camry  --> 1127 articles
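
Since the group list is a plain text file, it is easy to post-process. The sketch below parses lines in the format shown above and keeps only the groups with a minimum number of articles; the function name parse_group_list is hypothetical and the line format is assumed to be exactly that of the example.

```python
import re

GROUP_LINE = re.compile(r"^(\S+)\s+-->\s+(\d+) articles$")

def parse_group_list(text: str, min_articles: int = 0) -> dict[str, int]:
    """Map group name to article count, keeping groups with at least min_articles."""
    groups = {}
    for line in text.splitlines():
        match = GROUP_LINE.match(line.strip())
        if match and int(match.group(2)) >= min_articles:
            groups[match.group(1)] = int(match.group(2))
    return groups

sample = """\
alt.autos.studebaker    --> 6582 articles
alt.autos.subaru        --> 1660 articles
alt.autos.toyota        --> 4928 articles
alt.autos.toyota.camry  --> 1127 articles
"""
print(parse_group_list(sample, min_articles=2000))
```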


Step 2: searching for a specific content

This step is optional. However, if the user has something specific to look for, it avoids retrieving all the article headers, only to discover after downloading 80 Mbytes of data that the group does not contain the requested information.

The search can be performed using the following syntax: ./newslite -s my.news.server -g my.binary.group -S string-to-search

Depending on the number of messages in the group, the search may take some time, but only the subjects of the messages that contain the string are downloaded.

Here is an example of the text output generated by this command:

12142639 320 kbps - Miles Davis - The Man In The Green Shirt - 11 - Shout.mp3 - [12/15] - yEnc (02/22)
12142641 320 kbps - Miles Davis - The Man In The Green Shirt - 11 - Shout.mp3 - [12/15] - yEnc (07/22)
12142647 320 kbps - Miles Davis - The Man In The Green Shirt - 11 - Shout.mp3 - [12/15] - yEnc (08/22)

The first field of each line is the message number. The rest of the line is the "subject" field of the message, which should in principle contain the string given to the -S option.
For further information on the -S option and its limitations, please refer to the command syntax section below.
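
The search output described above can be split into its fields with a few lines of code. In this sketch the function name parse_search_line is hypothetical, and the trailing "(part/total)" marker is assumed to follow the convention visible in the example lines; subjects without it simply yield None.

```python
import re
from typing import Optional

PART_SUFFIX = re.compile(r"\((\d+)/(\d+)\)\s*$")

def parse_search_line(line: str) -> tuple[int, str, Optional[tuple[int, int]]]:
    """Return (message number, subject, (part, total parts) or None)."""
    number, subject = line.strip().split(" ", 1)
    match = PART_SUFFIX.search(subject)
    part = (int(match.group(1)), int(match.group(2))) if match else None
    return int(number), subject, part

line = ("12142639 320 kbps - Miles Davis - The Man In The Green Shirt"
        " - 11 - Shout.mp3 - [12/15] - yEnc (02/22)")
number, subject, part = parse_search_line(line)
print(number, part)
```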


Step 3: retrieving article headers

For newslite to know what is really available in a given group, the article headers need to be retrieved from the server and processed so that the user can check the available binary files. This step is mandatory before downloading anything (see next step), even if the -S (search) option has been used successfully beforehand.
The header retrieval can be done using, for example, the following syntax: ./newslite -s my.news.server -g my.binary.group -r

At the end of the process, two files are created: "my.binary.group.txt", which lists the binary files, and "my.binary.groupTITLES.txt", which lists the titles.
Here is an example of what the "my.binary.group.txt" file may contain:

(OK) 04 Apr / Joey-s01e01-hdtv.par2 / 1 parts / 18 Kbytes
(OK) 04 Apr / Joey-s01e01-hdtv.part01.rar / 61 parts / 15531 Kbytes
(OK) 04 Apr / Joey-s01e01-hdtv.part02.rar / 61 parts / 15531 Kbytes
(OK) 04 Apr / Joey-s01e01-hdtv.part03.rar / 61 parts / 15536 Kbytes

Each line identifies a binary file. It indicates a status (OK means all the parts have been found on the server), the file posting date, the file name, the number of parts (or number of articles) and the total file size in Kbytes.
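
The fields listed above can be extracted from each line as sketched below. The regular expression is derived only from the example lines of this manual; the names FileEntry and parse_file_line are hypothetical, and status values other than OK are assumed to use the same parenthesised form.

```python
import re
from typing import NamedTuple

class FileEntry(NamedTuple):
    status: str
    date: str
    name: str
    parts: int
    kbytes: int

FILE_LINE = re.compile(r"^\(([^)]+)\) (\d{2} \w{3}) / (.+) / (\d+) parts / (\d+) Kbytes$")

def parse_file_line(line: str) -> FileEntry:
    """Parse one line of a <groupname>.txt file into its fields."""
    match = FILE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    status, date, name, parts, kbytes = match.groups()
    return FileEntry(status, date, name, int(parts), int(kbytes))

entry = parse_file_line("(OK) 04 Apr / Joey-s01e01-hdtv.part01.rar / 61 parts / 15531 Kbytes")
print(entry.name, entry.parts, entry.kbytes)
```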

The concept of "title" is not specific to newslite. Most of the time, large binary files or disk images are cut into several parts. newslite attempts to gather all the binary files and identify the initial file title, so that the group content is presented even more concisely. Here is an example of what the "my.binary.groupTITLES.txt" file may contain:

(OK) 04 Apr / (http://abmd.info/1429) [000/104] - "dmd-lxg.nzb" yEnc  / 1 files / 696890 bytes
(OK) 04 Apr / (http://abmd.info/1429) [001/119] - "dmd-lxg.par2" yEnc  / 16 files / 149781 bytes
(OK) 04 Apr / (http://abmd.info/1429) [002/119] - "dmd-lxg-cd1.r00" yEnc  / 49 files / 752456426 bytes
(OK) 04 Apr / (http://abmd.info/1429) [051/119] - "dmd-lxg-cd1.rar" yEnc  / 1 files / 15548048 bytes

Each line identifies a title. It indicates a status (OK means all the individual binary files are complete), the title posting date, the subject (as posted on the group), the number of binary files which compose the title, and the total size in bytes.


Step 4: downloading files

Now that we know what is stored on the server, it's time to download...
This can be done using, for example, the following syntax: ./newslite -s my.news.server -g my.binary.group -d myfile

This command makes newslite look for the sub-string "myfile" in the identified filenames.
This is a very simple search: wildcards are not allowed.
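
In other words, the selection behaves like a plain substring test, as in the sketch below. The function name select_files is hypothetical, and the comparison is assumed to be case-sensitive, which this manual does not state.

```python
def select_files(filenames: list[str], needle: str) -> list[str]:
    """Keep the filenames that contain needle as a plain substring (no wildcards)."""
    return [name for name in filenames if needle in name]

names = ["Joey-s01e01-hdtv.par2", "Joey-s01e01-hdtv.part01.rar", "other.rar"]
print(select_files(names, "Joey"))
# A "*" is not a wildcard here: it would have to appear literally in the filename.
print(select_files(names, "Joey*rar"))
```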

Before starting the actual download process, newslite asks for confirmation, just to make sure the download list corresponds to what the user wants (this can be avoided by using the -f option).

Remember that if your download session fails or is terminated before completion for any reason (loss of connection, ...), it can be resumed very simply with: ./newslite -s my.news.server -b




Using ".nzb" files

NZB files have been designed to ease file downloading from newsgroups. NZB files are XML files; they are short summary files containing all the information a newsreader requires to download any given file.
In short, NZB files contain the following information:
The main interest of NZB files is that they make it simple to get the message IDs of all the file parts related to a title, and to check their availability on the news server before launching the download process.

Please check the newzbin site for further information.
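
To give an idea of what an NZB file looks like from a program's point of view, here is a minimal parsing sketch. The element names and the namespace URI follow the commonly published NZB layout as far as I know it; the sample document, its addresses and the function name list_nzb_files are invented for illustration, and this is not how newslite itself reads NZB files.

```python
import xml.etree.ElementTree as ET

NZB_NS = {"nzb": "http://www.newzbin.com/DTD/2003/nzb"}

def list_nzb_files(nzb_text: str) -> list[dict]:
    """Return, for each <file> element, its subject, groups and segment message IDs."""
    root = ET.fromstring(nzb_text)
    files = []
    for file_el in root.findall("nzb:file", NZB_NS):
        files.append({
            "subject": file_el.get("subject"),
            "groups": [g.text for g in file_el.findall("nzb:groups/nzb:group", NZB_NS)],
            "message_ids": [s.text for s in file_el.findall("nzb:segments/nzb:segment", NZB_NS)],
        })
    return files

# Invented sample data, for illustration only.
sample = """<nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
  <file poster="poster@example.invalid" date="1112572800" subject="example.part01.rar (1/2)">
    <groups><group>my.binary.group</group></groups>
    <segments>
      <segment bytes="100" number="1">part1@example.invalid</segment>
      <segment bytes="100" number="2">part2@example.invalid</segment>
    </segments>
  </file>
</nzb>"""

for entry in list_nzb_files(sample):
    print(entry["subject"], len(entry["message_ids"]), "segments")
```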

The following explains how you can use newslite to easily download files from news groups using NZB file descriptions.

Step 1: looking for .nzb files

If you already have NZB files, go directly to step 2.

If you don't have any NZB file, the best is to check whether or not the groups you have access to contain such files.
This can be done using the following command: ./newslite -s my.news.server -g my.binary.group -r

Once the command is finished, you can use the "grep" UNIX command to check the availability of .nzb files in my.binary.group.
This can be done using the following command: grep ".nzb" my.binary.groupTITLES.txt

It is better to run "grep" on the TITLES file, as the full subject is displayed, which makes it easier to identify the file content.
Here is an example of the result of the grep command:

(OK) 04 Apr / "The Blue Lagoon CD-1.part.nzb - yEnc [00/27]  / 1 files / 317658 bytes
(OK) 04 Apr / "The Blue Lagoon CD-2.part.nzb - yEnc [00/27]  / 1 files / 327162 bytes



Step 2: download files indexed by a .nzb

This can be done using the usual syntax: ./newslite -s my.news.server -g my.binary.group -d myfile.nzb

Two cases may occur:
NZB file analysis consists of a completeness check, i.e. checking that all message IDs are present in the newsgroup referenced in the NZB file. For that purpose, newslite attempts to open a <newsgroup>.idm file. .idm files are generated by newslite when launched with the -r (message header retrieval) option. If this .idm file is not present in the local directory, the program stops.
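
The completeness check amounts to a set membership test, as sketched below. The on-disk format of the .idm file is not described here, so a plain set of message IDs stands in for it; the function name missing_message_ids and the sample IDs are hypothetical.

```python
def missing_message_ids(nzb_ids: list[str], known_ids: set[str]) -> list[str]:
    """Return the message IDs listed in the NZB file that the group index does not contain."""
    return [message_id for message_id in nzb_ids if message_id not in known_ids]

known = {"part1@example.invalid", "part2@example.invalid"}
wanted = ["part1@example.invalid", "part2@example.invalid", "part3@example.invalid"]
missing = missing_message_ids(wanted, known)
# A file is complete (status OK) only when nothing is missing.
print("complete" if not missing else f"incomplete, missing {missing}")
```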

At the end of the NZB file processing, the following information is displayed:

FTD#331763 The Aviator divx - aviator.part.nzb
Father process, pid 5076
Connected to news server
pid 5076: download file FTD#331763 The Aviator divx - aviator.part.nzb
Processing message IDs from alt.binaries.movies.divx group
Checking .nzb file completeness
(OK)  [01/70] - 'FTD#331763 The Aviator divx - aviator.part01.rar' (01/17)
(OK)  [02/70] - 'FTD#331763 The Aviator divx - aviator.part02.rar' (01/17)
...
Do you confirm the download of 788 Mbytes - (OK) files only - (y/n) ?

Each line corresponds to a file described in the .nzb file. It indicates a status (OK means all the message IDs have been found on the server) and the subject (as posted on the group).

Before starting the actual download process, newslite asks for confirmation, just to make sure the download list corresponds to what the user wants (this can be avoided by using the -f option).

Limitations:
In this version of newslite, there are some limitations in the support of .nzb files:




newslite command syntax

As a summary, the following text is displayed when newslite is launched without any arguments.

newslite [-a [username password]] [-p port] -s servername -h
        To retrieve informations from the news server.

newslite [-a [username password]] [-p port] -s servername -l [articleNb]
        To list all available groups in news server.

newslite [-a [username password]] [-p port] -s servername -g groupname -t
        To test a given group is available from the news server.

newslite [-a [username password]] [-p port] -s servername -g groupname -r [-L nbHeader] [-c cnxNb] [-D]
        To retrieve news headers from a specific group hosted by the news server.

newslite [-a [username password]] [-p port] -s servername -g groupname -d string [-c cnxNb] [-f]
        To download files from a newsgroup.

newslite [-a [username password]] [-p port] -s servername -b [-c cnxNb]
        To resume an interrupted download session.

newslite [-a [username password]] [-p port] -s servername -g groupname -S string
        To search for an article in a specific group.

The rest of this section provides more in-depth information on each available option.

Option details:

-a [username password]
The "authentication" option. This option enables user authentication, which may be required by some news servers. The "username" and "password" fields can be omitted on the command line if the "news.conf" file is correctly filled in. When specified on the command line, "username" and "password" override the "news.conf" settings.

-g group name
Used to specify a group name to newslite.

-s server name
Used to specify which news server newslite shall connect to. This option can be omitted if the "news.conf" file is present and correct. However, when -s is present on the command line, it overrides the "news.conf" settings.

-p port number
Usually, news clients connect to news servers through the "nntp" (119) TCP port - this is also the value newslite uses by default. However, this port may sometimes be unavailable, filtered by firewalls, or bandwidth limited. For those reasons, news servers sometimes accept connections on TCP ports other than the usual "nntp" value. This option lets newslite connect using any port value. It can be omitted if the "news.conf" file is present and correct. However, when -p is present on the command line, it overrides the "news.conf" settings.

-r
The "retrieve" option, used to retrieve the message list from a newsgroup. This option produces a <groupname>.txt file which can be edited, or printed on the screen using the standard UNIX "more" command. This .txt file contains the list of yEnc files newslite has found in the newsgroup.
A <groupname>TITLES.txt file is also generated by newslite, presenting in a more compact way the binary files which appear to be part of the same entity. The title "status" indicates whether or not all the binary files are complete.

-d string
The "download" option. When present, newslite looks for the substring in the filenames, as they appear in the <groupname>.txt or <groupname>TITLES.txt files, and attempts to download all the files which match. Before this, newslite needs to have been launched with the -r option on the corresponding newsgroup.
If the string ends with ".nzb", newslite applies a specific process related to NZB files.
Use double quotes if the "string" contains spaces, e.g. -d "my file"

-S string
The "search" option. When present, newslite sends an XPAT command to the news server. As a result, the server returns every article whose subject contains the requested string. In this implementation of newslite, the search option is limited to the "subject" field of the header. It cannot search other fields such as "From" or "Date".
The "string" shall only contain alphanumeric characters (A to Z, a to z, 0 to 9) or spaces. Otherwise it is rejected.
Use double quotes if the "string" contains spaces, e.g. -S "my string".
The search is not case-sensitive (meaning, e.g., "test case" will match "Test Case").
Even if the "string" only contains basic ASCII characters, it will match the ISO 8859-1 character set (meaning, e.g., "ecran" will match "écran").
Before using this option, it may be useful to check whether or not the news server supports it (this can be done using the -h option), as it requires the XPAT command to be accepted by the server.
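
The rules above can be emulated locally as sketched below, for instance to pre-check a search string or to filter an already retrieved list in the same spirit. The real matching is performed by the news server through XPAT, so this is only an approximation; all function names are hypothetical.

```python
import re
import unicodedata

def is_valid_search_string(text: str) -> bool:
    """Accept only A-Z, a-z, 0-9 and spaces, as required for -S."""
    return re.fullmatch(r"[A-Za-z0-9 ]+", text) is not None

def fold(text: str) -> str:
    """Lower-case text and strip accents, so that "écran" compares equal to "ecran"."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def subject_matches(subject: str, needle: str) -> bool:
    """Case-insensitive, accent-insensitive substring test."""
    return fold(needle) in fold(subject)

print(is_valid_search_string("my string"), is_valid_search_string("my-file"))
print(subject_matches("Test Case", "test case"), subject_matches("Nouvel écran", "ecran"))
```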

-h
The "help" option. This option displays the news server help message (which includes the list of commands supported by the news server) and the format of overview headers stored by the news server.

-f
The "force" option. This prevents newslite from asking for confirmation before downloading files, i.e. when invoked with the -d option.
 
-l [articleNb]
The "list" option. This option requests newslite to retrieve the list of newsgroups. The result is stored in a <newsserver>.txt text file which can be either edited or printed on the screen, the group list being sorted in alphabetical order. Each line of this file contains a group name and the number of articles currently known by the server for this group.
If you want to check whether a given newsgroup is hosted by the news server, you can use the UNIX grep command:
grep "example.newsgroup" <newsserver>.txt
To list only active groups, a minimum number of articles can optionally be given with this option (e.g. with "-l 100", all groups having fewer than 100 messages are left out of the list).

-D
The "delta" option. This option requests newslite to only retrieve the headers of new messages, taking into account the results of the last -r header retrieval session. This avoids downloading information that has already been downloaded and analysed by the end-user. This is a useful option, as it reduces the time needed for both header downloading and analysis. Of course, if newslite is launched again without this option, the full list of headers will be retrieved.

-L nbHeader
The "limit" option. This option may be used to limit the number of retrieved headers. It only works with the -r option. This may be useful for very large newsgroups: some may store more than 500000 headers, which may turn header downloading into a painful process. As an illustration, retrieving 300000 headers is equivalent to downloading an 80 MB file.

-c cnxNb
The "connection limit" option. This option specifies the number of parallel connections to be set up for message header retrieval or file downloading. It is only valid with the -r, -d or -b option. The principle is that the files or headers to download are spread over the threads. This option can be omitted if the "news.conf" file is present and correct. However, when -c is present on the command line, it overrides the "news.conf" settings.
For file downloading, if there is only one file to download, even with multiple articles, only one thread will be active.
At this point in time, newslite limits the number of simultaneous connections to 4, which is compatible with most news servers.
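
One way to picture how files are spread over the connections is the sketch below. The round-robin rule and the function name assign_files are assumptions made for illustration; only the limit of 4 connections and the one-file-one-thread behaviour come from this manual.

```python
MAX_CONNECTIONS = 4  # limit stated in this manual

def assign_files(files: list[str], requested: int) -> list[list[str]]:
    """Spread files round-robin over at most MAX_CONNECTIONS workers."""
    # Never use more workers than there are files: a single file keeps a single thread busy.
    workers = max(1, min(requested, MAX_CONNECTIONS, len(files)))
    queues: list[list[str]] = [[] for _ in range(workers)]
    for index, name in enumerate(files):
        queues[index % workers].append(name)
    return queues

print(assign_files(["a", "b", "c", "d", "e"], requested=8))
print(assign_files(["only-one.rar"], requested=4))
```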

-b
This is the "batch" option. If the file download process (with the -d option) was interrupted (intentionally or not), this option helps to complete the job. When used, newslite looks for an existing "batch.idx" file and attempts to resume the file download, based on what has already been retrieved.

-t
The "test" option. This can be used together with option -g to simply check if a given group is available from the news server. As an output, newslite indicates if the group is present and how many articles are present in this group.

-v
The "verbose" option - for debugging only. Using it under normal operation is not advised.
The verbose option causes the log file to contain many more traces and to become less readable for standard use.




Command examples

./newslite -a mySelf myPass -l -s news.isp.fr
The full list of hosted newsgroups is downloaded from the "news.isp.fr" server and written to the "news.isp.fr.txt" text file. News server authentication is attempted with the "mySelf" username and the "myPass" password.

./newslite -l 300
Same as before, except that in this case newslite knows the server name by reading the "news.conf" file. Only groups having more than 300 articles are listed in the resulting "<server name>.txt" file.

./newslite -h -s news.isp.fr
This command displays the news server help message (i.e. the list of supported NNTP commands...) and the format of the headers retrieved by the "overview" NNTP command.

./newslite -r -s news.isp.fr -g alt.binaries.divx
All the headers of the "alt.binaries.divx" group are downloaded and stored in the "alt.binaries.divx.txt" text file. This file is analysed and the yEnc encoded binary files are printed on the screen.

./newslite -r -s news.isp.fr -g alt.binaries.divx -L 10000
Same as before, except that only the last 10000 headers are downloaded. This may be a helpful option for groups having too many messages.
 
./newslite -d dragon -s news.isp.fr -g alt.binaries.divx
In this case, newslite looks for all the files in the alt.binaries.divx group whose name contains "dragon". If there are any matches, the files are put in the download queue.

./newslite -b -s news.isp.fr -c 4
In this case, newslite looks for the "batch.idx" file, and attempts to resume the file download based on what has already been retrieved. As the -c option is set, newslite will attempt to open 4 connections to the news server to download all the files in parallel.




Session Output example

What follows is an example of a retrieval session with 4 threads:

$ newslite -r -c 4 -g alt.binaries.cd.image.french   <return>

newslite - V1.6 - 12-Jan-06 - pcles

pid 431: Connected to news server
Group: alt.binaries.cd.image.french
65038 articles
12180007: first article number
12245044: last article number

Wait for 65038 headers to be retrieved from the news server...

pid 431: father process continuing
pid 431: sending XOVER 12180007-12196266
pid 432: child process 1 starting
pid 433: child process 2 starting
pid 434: child process 3 starting
pid 432: Connected to news server
pid 434: Connected to news server
pid 432: sending XOVER 12196267-12212526
pid 434: sending XOVER 12228787-12245044
pid 433: Connected to news server
pid 433: sending XOVER 12212527-12228786
pid 434: child process 3 closing
pid 432: child process 1 closing
pid 431: 100 percent
pid 431: end of process 434 detected
pid 431: end of process 432 detected
pid 433: child process 2 closing
pid 431: end of process 433 detected

Processing alt.binaries.cd.image.french group - 65031 articles

100 percent processed
Re-ordering all 1813 files

1813 binary files found
File list file alt.binaries.cd.image.french.txt has been created.

retrieveFiles: Errors detected. Check log.txt file.

316 titles found
Title list alt.binaries.cd.image.frenchTITLES.txt has been created.



The percentage figures shown above are of course updated as the retrieval progresses.
You may notice here the ratios:
65038 articles are grouped into 1813 binary files that are further grouped into 316 titles.

Example of file list contents:

e.g. as in alt.binaries.cd.image.french.txt

(OK) 04 Apr / Acronis True Image v8.0.par2 / 1 parts / 3 Kbytes
(OK) 04 Apr / Acronis True Image v8.0.part1.rar / 14 parts / 5420 Kbytes
(OK) 04 Apr / Acronis True Image v8.0.part2.rar / 14 parts / 5420 Kbytes
(OK) 04 Apr / Acronis True Image v8.0.part3.rar / 14 parts / 5420 Kbytes
(OK) 04 Apr / Acronis True Image v8.0.part4.rar / 14 parts / 5420 Kbytes
(OK) 04 Apr / Acronis True Image v8.0.part5.rar / 7 parts / 2509 Kbytes
(OK) 04 Apr / Acronis True Image v8.0.vol0+1.PAR2 / 2 parts / 400 Kbytes
(OK) 04 Apr / Acronis True Image v8.0.vol1+2.PAR2 / 3 parts / 800 Kbytes
(OK) 04 Apr / Acronis True Image v8.0.vol3+3.PAR2 / 4 parts / 1197 Kbytes

and the corresponding titles, as in alt.binaries.cd.image.frenchTITLES.txt:

(OK) 04 Apr / (M8X Pa9nE Post) - "Acronis True Image v8.0.par2 yEnc  / 1 files / 3585 bytes
(OK) 04 Apr / (M8X Pa9nE Post) - "Acronis True Image v8.0.part1.rar yEnc  / 5 files / 24190851 bytes
(OK) 04 Apr / (M8X Pa9nE Post) - "Acronis True Image v8.0.vol0+1.PAR2 yEnc  / 3 files / 2398442 bytes





History

1.10 - 29/Dec/08

1.9 - 03/Feb/08

1.8 - 05/Nov/07

1.7 - 28/Aug/06

1.6 - 24/Jun/06
1.5 - 12/Nov/05
1.4b - 04/Apr/05
1.3 - 23/Feb/05
1.2 - 03/Jan/05
1.1b - 13/Dec/04
1.1a - 07/Dec/04
1.1 - 06/Dec/04
0.9 - 01/Dec/04




Files created by newslite

Once newslite is launched, some files are created in the current working directory:

log.txt
Log file recording last newslite session events.

<group name>.txt
This file contains the binary files identified by newslite in the group <group name>, as a result of the -r option.

<groupname>TITLES.txt
This file contains the titles identified by newslite, i.e. the list of binary files which appear to belong to the same element. It is produced by the -r option.

<news server name>.txt
This file contains the list of the news groups hosted by the server <news server name>, as a result of the -l option.



newslite also stores working files in a "temp" directory, created in the current working directory. These files are:

<group name>.idx
This file contains references to all the files and corresponding articles in group <group name>, as a result of the -r option.
This file is not text editable.

<group name>.idm
This file contains "message identities" of all the articles in group <group name>, as a result of the -r option. This file is used for checking .nzb file completeness.

batch.idx
This file contains references to all the files and corresponding articles the user has asked for using the -d option.
This file is not text editable.



The following files may be found in the current working directory. They all are temporary files which should not exist when newslite is not running.

listtmp.txt
This one should not appear under normal conditions.
This file contains the result of the list command sent to the server, i.e. the raw list of all groups hosted by the server.

logtmp.txt
This one should not appear under normal conditions.
This file is the temporary log file created to clean up errors contained in the log.txt file, when the -r option is used.

*xovertmp.txt
These ones should not appear under normal conditions.
These files contain the result of the xover commands sent to the server, i.e. all the article headers. They are usually removed at the end of the "-r" process.

*log.txt
These ones should not appear under normal conditions.
These files contain the log of the threaded download sessions. They are usually removed at the end of the "-d" process.

tmp*.bin
These ones should not appear under normal conditions.
These files contain all the yEncoded articles of a binary file. They are usually removed at the end of the yEnc decoding process.
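
After an abnormal termination, leftover temporary files can be spotted with a small script such as the sketch below. The patterns are taken from the list above; the function name leftover_temp_files is hypothetical. The regular log.txt is excluded explicitly because the "*log.txt" pattern would otherwise match it, and the script only lists files rather than deleting them, since a pattern like "*log.txt" may also match unrelated files of your own.

```python
from fnmatch import fnmatchcase
from pathlib import Path

TEMP_PATTERNS = ["listtmp.txt", "logtmp.txt", "*xovertmp.txt", "*log.txt", "tmp*.bin"]

def leftover_temp_files(directory: str = ".") -> list[str]:
    """List files matching the temporary-file patterns, excluding the regular log.txt."""
    names = sorted(p.name for p in Path(directory).iterdir() if p.is_file())
    return [
        name for name in names
        if name != "log.txt" and any(fnmatchcase(name, pattern) for pattern in TEMP_PATTERNS)
    ]
```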




The End