PWSTAT(1)               NetBSD General Commands Manual               PWSTAT(1)

NAME
     pwstat, pwstat3 -- generate a report from web logs

SYNOPSIS
     pwstat [-htgrRouUve] [-y formula] [-l] [-s N] [-j N] [-J N] [-k pattern]
            [-K pattern] [-L] [-mM] [-b pattern] [-B pattern] [-f pattern]
            [-F pattern] [-d date] [-D date] [-q cufr] [logfile ...]
            [logfile.Z ...] [logfile.gz ...]

     pwstat3 [-htgrouUvWRyAjJkKL] [-y formula] [-l] [-P] [-s N] [-j N] [-J N]
            [-k pattern] [-K pattern] [-L] [-mM] [-b pattern] [-B pattern]
            [-f pattern] [-F pattern] [-d date] [-D date] [-q cufr]
            [logfile ...] [logfile.Z ...]

DESCRIPTION
     These commands write to standard output an HTML-formatted statistical
     extract of a Panix user's web log.  pwstat requires the input files to be
     in the Panix Oldstyle Log Format (POLF), whereas pwstat3 accepts input in
     either POLF or Extended Common Log Format (XCLF).  If no log file(s) are
     specified, the Panix command 'getlogs' (or 'getclogs') is run
     automatically to supply input to pwstat (or pwstat3).  Note that the
     information fetched by 'getlogs' and 'getclogs' is normally "reset" at
     the beginning of every month.

     The report generated by pwstat (or pwstat3) includes the following
     information:

     -   Total traffic (in files and bytes delivered) from your site during
         the reporting period.
     -   Analysis of traffic by date.
     -   Analysis of traffic by hour of the day.
     -   Analysis of traffic by archive name; i.e., filename. See the pwlog
         discussion for an explanation of the "(u)" and "(c)" sequences which
         precede the filenames.
     -   Analysis of traffic by archive type; i.e., the filename extension.
         The archive type breakdown can be difficult to interpret if, for
         example, you make significant use of CGI scripts to deliver your
         site's content, since such requests are listed under "CGI".
     -   Analysis of traffic by requesting top-level domain; e.g., gov, com,
         uk, etc. (Note: See the -r option.)
     -   Analysis of traffic by requesting continent, as determined by
         examination of the requesting domain. (Note: See the -r option.)  The
         non-country top-level domains are classified as follows: com is
         treated as international and reported as "Commercial"; edu, although
         possibly international, is treated as US and reported as "North
         America"; gov and mil are reported as "North America"; net is treated
         as international and reported as "Network"; and org is treated as
         international and reported as "Noncommercial Organization".
     -   Analysis of traffic by requesting reversed sub-domain; e.g., requests
         from America On-Line users are reported as com.aol.*. (Note: See the
         -r option.)
     -   List of URLs most frequently referring visitors to your site.
     -   List of domains most frequently referring visitors to your site.
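
     As an illustration, the region rules for the non-country top-level
     domains described above can be expressed as a lookup table.  The
     following is a minimal Python sketch, not pwstat's own code.  The
     country-code table (COUNTRY_CONTINENTS) is a hypothetical two-entry
     stand-in, and the "Unresolved" and "Unknown" labels for hosts it cannot
     classify are assumptions.

```python
# Region labels for the non-country top-level domains, as described above.
NON_COUNTRY_TLDS = {
    "com": "Commercial",
    "edu": "North America",
    "gov": "North America",
    "mil": "North America",
    "net": "Network",
    "org": "Noncommercial Organization",
}

# Hypothetical stand-in for a full country-code -> continent table;
# pwstat's actual table is not documented here.
COUNTRY_CONTINENTS = {
    "uk": "Europe",
    "jp": "Asia",
}


def continent_for(host: str) -> str:
    """Return the report region for a requesting host (illustrative sketch)."""
    # A dotted string of digits is an unresolved IP number (assumed label).
    if host.replace(".", "").isdigit():
        return "Unresolved"
    tld = host.rsplit(".", 1)[-1].lower()
    if tld in NON_COUNTRY_TLDS:
        return NON_COUNTRY_TLDS[tld]
    # Unknown country codes fall through to an assumed "Unknown" label.
    return COUNTRY_CONTINENTS.get(tld, "Unknown")
```

     For example, continent_for("www.bbc.co.uk") looks only at the final
     label, "uk", which is how a top-level-domain classification works.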

     Web logs, as well as the output of the 'getlogs' and 'getclogs'
     commands, use IP numbers rather than hostnames to identify the machines
     requesting Web pages.  When you use pwstat (or pwstat3) with the -r
     option, the program attempts to resolve those numbers to hostnames and
     saves the resolved IP/hostname pairs in the .pwhosts file in your
     directory.  On later runs of pwstat, even without the -r flag, hosts
     whose IPs already appear in the .pwhosts file will by default be
     identified by hostname rather than by IP.  You can use the -R flag to
     turn this off and get all hosts listed by IP number, as if the .pwhosts
     information didn't exist.

     When reporting by reversed sub-domain, the default is to list resolved
     hostnames one level up from the machine name (e.g., gatekeeper.nytimes.com
     is included in "com.nytimes.*"), while all unresolved IP numbers are
     reported as "Unresolved".  Use the -o option for more hostname detail and
     the -u or -U option for more detail on IP numbers.
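
     The labeling rules for the reversed sub-domain section can be sketched
     as a single function.  This is an illustrative Python sketch, not
     pwstat's code: the parameter names full_host (standing in for -o) and
     ip_detail (standing in for -u and -U) are hypothetical, and the exact
     form in which -o prints a full hostname is assumed here to be the fully
     reversed name.

```python
import re

# Matches a dotted-quad IPv4 number such as 166.84.197.198.
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def subdomain_key(host: str, full_host: bool = False,
                  ip_detail: str = "none") -> str:
    """Return the reversed sub-domain label for one requesting host.

    full_host mirrors -o (assumed output form); ip_detail is "none"
    (default), "full" (mirrors -u) or "net" (mirrors -U).
    """
    if IPV4.match(host):
        if ip_detail == "full":
            return host                              # -u: every IP listed
        if ip_detail == "net":
            return host.rsplit(".", 1)[0] + ".*"     # -U: drop last number
        return "Unresolved"                          # default
    labels = host.lower().split(".")
    labels.reverse()
    if full_host or len(labels) < 2:
        return ".".join(labels)
    # Default: drop the machine name and report one level up.
    return ".".join(labels[:-1]) + ".*"
```

     With no options, subdomain_key("gatekeeper.nytimes.com") gives
     "com.nytimes.*", and subdomain_key("166.84.197.198", ip_detail="net")
     gives "166.84.197.*", matching the examples in this page.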

RUNNING PWSTAT
     If you have already run getlogs or pwlog, the procedure for obtaining web
     statistics is simply to type:

          pwstat logfilename > statfilename

     If you do not specify an input file name, pwstat will automatically call
     getlogs for you. In other words, instead of typing

          getlogs > logfilename
          pwstat logfilename > statfilename

     you can just type

          pwstat > statfilename

     You can also create a statistical extract from more than one input log
     file by typing

          pwstat logfile1name logfile2name > statfilename

OPTIONS
     The complete list of pwstat's options is included in its help message,
     which you can obtain by typing

          pwstat -h

     Notable among these options are:

     -b <pattern>
             Include only requests from hosts whose hostname or IP number
             matches this pattern (a Perl regexp). Note: If the -r option is
             used, this test is made after the IP-to-hostname conversion is
             attempted.

     -B <pattern>
             Omit requests from hosts whose hostname or IP number matches this
             pattern (a Perl regexp). Note: If you specify any combination of
             the -b, -B, -m and -M options, only one of them is evaluated, with
             preference in the order just given (i.e., -b always wins).

     -d <somedate>
             Omit requests before the specified date. The format of the date
             must be YYYY:MM:DD; for example, to obtain a report limited to
             requests on or after August 15, 1995, you would replace somedate
             with 1995:08:15. Note: Remember that the only requests which will
             be checked against the specified date are those from the log
             file(s) you've specified.

     -D <somedate>
             Omit requests after the specified date.

     -f <pattern>
             Include only requests for filenames which include this pattern (a
             Perl regexp).

     -F <pattern>
             Omit requests for filenames which include this pattern (a Perl
             regexp).

     -g      "Smash" the filenames of graphics, reducing any filename with
             extension bmp, gif, jpg, jpeg or png to "(gfx)". This is handy if
             you have directories full of GIFs and JPEGs that you don't want
             to see listed individually in your stats.

     -j <N>  In the list of URLs which most frequently referred visitors to
             your site, include only the N most frequent URLs. If this option
             is not specified, then the default is 25. If you do not want this
             section included in your pwstat report, then specify pwstat -j 0.

     -J <N>  In the list of domains which most frequently referred visitors to
             your site, include only the N most frequent domains.  If this
             option is not specified, then the default is 25. If you do not
             want this section included in your pwstat report, then specify
             pwstat -J 0.

     -k <pattern>
             In the list of URLs which most frequently referred visitors to
             your site, exclude URLs which match this pattern (a Perl regex).
             This option is most useful when you want to exclude referrals
             from within your own domain. For example, if your domain were
             www.skatecity.com, you could exclude self-referrals by specifying
             pwstat -k 'www.skatecity.com'.

     -K <pattern>
             In the list of domains which most frequently referred visitors to
             your site, exclude domains which match this pattern (a Perl
             regex).

     -l      Execute getlogs -o and use the result as input for pwstat.  This
             results in pwstat output based on the previous getlogs reporting
             period. This option is ignored if you specify an input log
             filename.

     -m      Omit any request coming from any *.panix.com and *.access.net
             host.

     -M      Omit any request coming from outside the *.panix.com and
             *.access.net domains.

     -o      In the reversed sub-domain section of the report, the last
             portion of a computer name is normally lopped off; e.g.,
             gatekeeper.nytimes.com would just be reported as com.nytimes.*,
             as would all requests from everyone else in the nytimes.com
             domain. To force hostnames to be reported in full, invoke the -o
             option.

     -q <list>
             Filter log entries by usage type, where "list" can be one or more
             of c, u, or f. If c, then we want corporate web hits included; if
             u, include personal web hits; and if f, include ftp transfers.
             Note: Most Panix users do not have both corporate and personal
             web traffic, but corporate users may want to use this option to
             generate separate reports for their web and ftp traffic.

     -r      Turns on IP-to-hostname resolving. In the raw weblogs, the
             machines requesting your webpages are normally identified by IP
             number, and to turn that number into a computer name, a host
             lookup must be performed. Every time you use the -r option to
             pwstat, the newly found resolutions of IP numbers into hostnames
             are appended to a special file in your directory, and these
             results are used during subsequent runs of pwstat.

             Avoid this option unless you really need to know the domains and
             subdomains of the computers visiting your site.

     -R      Reports all the requesting hosts by IP number, even in cases
             where the IPs have been previously resolved to hostnames and the
             results are available in the .pwhosts file.

     -s <N>  Execute getlogs -sN and use the result as input for pwstat. N is
             an integer giving the number of bytes at the beginning of the
             getlogs report to skip. This option is ignored if you specify an
             input log filename.

     -t      Generate a text-only report. The default is an HTML report.

     -u      Normally, unresolved IP numbers are listed in the domain and
             reversed sub-domain sections of the pwstat report as simply
             "Unresolved". To force all IP numbers to be individually reported
             in the reversed sub-domain section, invoke this option.

     -U      The -u option will likely result in more data than you want, but
             perhaps you still want some sort of guess-timate of the number of
             different sites visiting your webpages. The -U option will force
             partial reporting of unresolved IP numbers, ignoring the last
             number in the four-number sequence. For example, the IP number
             166.84.197.198 would be listed as 166.84.197.*, as would all
             other machines in the 166.84.197.* network that happened to visit
             your site.

     -y <scheme>
             The pwstat output includes, near the top, a line that says
             "Approx. Cost of External Transmissions $12.34". By default this
             cost is calculated using the formula for personal web service.
             The various levels of corporate web service have different cost
             formulas, however, and pwstat has no way of knowing which to use
             unless you tell it. Thus, you may specify one of the following
             schemes: personal, corporate, basic, standard or deluxe. (Note:
             Panix assesses monthly charges on your total traffic. If you have
             invoked any pwstat options which cause it to skip log entries,
             then the value calculated will not correspond with what you are
             actually charged.)
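
     The -d and -D date limits described above amount to an inclusive range
     check on each request's date.  The following Python sketch is
     illustrative only; the function names parse_pwstat_date and
     in_date_range are hypothetical, and only the YYYY:MM:DD format and the
     "before"/"after" semantics come from this page.

```python
from datetime import date, datetime
from typing import Optional


def parse_pwstat_date(text: str) -> date:
    """Parse a -d/-D argument in the YYYY:MM:DD format, e.g. 1995:08:15."""
    return datetime.strptime(text, "%Y:%m:%d").date()


def in_date_range(request_day: date,
                  start: Optional[str] = None,
                  end: Optional[str] = None) -> bool:
    """Keep a request unless it falls before -d or after -D.

    Both limits are inclusive: a request on the -d or -D date itself is kept.
    """
    if start is not None and request_day < parse_pwstat_date(start):
        return False
    if end is not None and request_day > parse_pwstat_date(end):
        return False
    return True
```

     So with start="1995:08:15", a request from August 15, 1995 is kept and
     one from August 14, 1995 is omitted.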
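
     The -g option's "smashing" can be pictured as a renaming step applied
     before files are counted.  This is a minimal Python sketch, not
     pwstat's code; the function name smash_graphics is hypothetical, and
     treating extensions case-insensitively (so that .GIF also counts) is an
     assumption.

```python
import os
from collections import Counter

# Extensions collapsed by -g, per the option description.
GRAPHICS_EXTENSIONS = {"bmp", "gif", "jpg", "jpeg", "png"}


def smash_graphics(filename: str) -> str:
    """Return "(gfx)" for graphics files, else the filename unchanged."""
    # Case-insensitive matching is an assumption of this sketch.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return "(gfx)" if ext in GRAPHICS_EXTENSIONS else filename


def count_by_archive_name(filenames):
    """Tally requests per archive name after smashing graphics."""
    return Counter(smash_graphics(f) for f in filenames)
```

     Because every graphics file maps to the same "(gfx)" key, a directory of
     hundreds of images contributes a single line to the archive-name
     section.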
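
     The precedence rule for the host filters (-b, then -B, then -m, then
     -M, with only one evaluated) can be sketched as follows.  This Python
     sketch is illustrative, not pwstat's code: Python's re module stands in
     for Perl regexps (simple patterns behave the same), the parameter names
     are hypothetical, and counting the bare panix.com and access.net names
     as local is an assumption.

```python
import re
from typing import Optional

# Hosts in *.panix.com or *.access.net (bare domain names assumed local too).
PANIX_DOMAINS = re.compile(r"(^|\.)(panix\.com|access\.net)$", re.IGNORECASE)


def keep_host(host: str,
              include: Optional[str] = None,   # mirrors -b pattern
              exclude: Optional[str] = None,   # mirrors -B pattern
              omit_local: bool = False,        # mirrors -m
              only_local: bool = False) -> bool:  # mirrors -M
    """Apply exactly one host filter, in the precedence -b, -B, -m, -M."""
    if include is not None:
        return re.search(include, host) is not None
    if exclude is not None:
        return re.search(exclude, host) is None
    if omit_local:
        return not PANIX_DOMAINS.search(host)
    if only_local:
        return bool(PANIX_DOMAINS.search(host))
    return True  # no host filter given
```

     Note that when both include and exclude are given, only the include
     test runs, which is what "-b always wins" means.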
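
     The -j and -k options together describe a filtered top-N ranking of
     referring URLs; -J and -K work the same way on referring domains.  The
     following Python sketch is illustrative only; the function name
     top_referrers is hypothetical, and tie ordering in pwstat's real output
     may differ.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def top_referrers(referrers: Iterable[str], n: int = 25,
                  exclude: Optional[str] = None) -> List[Tuple[str, int]]:
    """Rank referring URLs the way -j N and -k pattern describe.

    n defaults to 25, as -j does; n == 0 omits the section entirely.
    """
    if n <= 0:
        return []
    counts = Counter(r for r in referrers
                     if exclude is None or not re.search(exclude, r))
    return counts.most_common(n)
```

     In a regexp an unescaped "." matches any character, so the pattern
     'www.skatecity.com' would also match a name such as wwwXskatecity.com;
     writing 'www\.skatecity\.com' restricts the match to literal dots.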

SEE ALSO
     getlogs(1)

     getclogs(1)

     http://www.panix.com/web/faq/logs/

NetBSD 3.1                     February 6, 2007                     NetBSD 3.1