From xsov@mail.ru Wed May 11 23:30:30 2005
Received: 194.67.23.149 / smx1.brturbo.com
Received: from [212.48.205.42] (port=32246 helo=[192.168.0.77]) by
	mx3.mail.ru with asmtp  id 1DVwh2-0005nc-00 for orso@brturbo.com.br; Wed,
	11 May 2005 23:16:29 +0400
From: Oleg <xsov@mail.ru>
To: Pedro Lineu Orso <orso@brturbo.com.br>
Subject: Re: sarg-2.0.6 new patches
Date: Wed, 11 May 2005 23:30:30 +0400
User-Agent: KMail/1.7.2
References: <200504241642.06841.xsov@mail.ru>
	 <200505060430.57213.xsov@mail.ru> <1115814131.8900.1.camel@lcaklds49>
In-Reply-To: <1115814131.8900.1.camel@lcaklds49>
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary="Boundary-00=_W1lgCFYB9s7BJUS"
Message-Id: <200505112330.30670.xsov@mail.ru>
X-Evolution-Source: pop://orso@pop.brturbo.com.br/


--Boundary-00=_W1lgCFYB9s7BJUS
Content-Type: text/plain; charset="koi8-r"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Hello Pedro!

I found where the bug is. I had already fixed it in my previous topuser.diff patch
for 2.0.7 (not in those six patches for 2.0.6). Nevertheless, I am including an
example log file here together with my configuration files:
- sarg.conf;
- exclude_codes;
- access.log.

In this access.log example a single character is written as six characters (for
example: %u049F), so the url array touched by my previous topuser.diff patch must
be 1024*6 = 6144 bytes instead of 1024. I am including a new, modified
topuser.diff patch for 2.0.7 that fixes this bug. As a result, the part of my old
2.0.6 patch that adds '\0' at the end of the line is no longer necessary.

I am also including my recommendations for the default value of the
"download_suffix" option (see extensions-was.txt for the old values and
extensions-be.txt for the new ones). The most important changes are marked with
an asterisk "(*)".
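A download_suffix value is a comma-separated list of file suffixes. The C sketch
below shows one way such a list could be matched against a requested URL; it is
a hypothetical illustration, not sarg's implementation. The names
url_has_download_suffix and RECOMMENDED_DOWNLOAD_SUFFIX are made up, the list in
the macro is assembled from extensions-be.txt, the comparison is assumed to be
case-insensitive, and entries such as drv$ are compared literally.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: the suffixes listed in extensions-be.txt. */
#define RECOMMENDED_DOWNLOAD_SUFFIX \
    "tgz,tar,cpio,zip,arj,bzip,bz2,gz,rar,ace,lha,lzh,cab,7z," \
    "doc,mdb,ppt,rtf,mso,dot,bin,com,sys,exe,dll,scr,bat," \
    "iso,nrg,vcd,vob,mp3,avi,mpg,mpeg,wma,wmv,ogg,mov," \
    "adt,drv$,src,shs"

/* Case-insensitive comparison of the first n bytes of a and b. */
static int same_ci(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Hypothetical helper, not sarg's code: returns 1 if the file name at
 * the end of the URL path has a suffix listed in the comma-separated
 * `suffixes` string, 0 otherwise. Any "?query" part is ignored. */
int url_has_download_suffix(const char *url, const char *suffixes)
{
    const char *end, *p, *name, *dot, *ext, *s, *comma;
    size_t extlen, len;

    assert(url != NULL && suffixes != NULL);

    /* The path ends where the query string starts. */
    end = strchr(url, '?');
    if (end == NULL)
        end = url + strlen(url);

    /* Skip "scheme://host" so a dot in the host name is not taken
     * for a file suffix. */
    p = strstr(url, "://");
    p = (p != NULL && p < end) ? p + 3 : url;
    p = strchr(p, '/');
    if (p == NULL || p >= end)
        return 0;                 /* no path component */

    /* The file name is whatever follows the last '/' of the path. */
    name = p;
    for (; p < end; p++)
        if (*p == '/')
            name = p + 1;

    /* The suffix is whatever follows the last '.' of the file name. */
    dot = NULL;
    for (p = name; p < end; p++)
        if (*p == '.')
            dot = p;
    if (dot == NULL)
        return 0;
    ext = dot + 1;
    extlen = (size_t)(end - ext);
    if (extlen == 0)
        return 0;

    /* Compare the suffix with each comma-separated list entry. */
    for (s = suffixes;; s = comma + 1) {
        comma = strchr(s, ',');
        len = (comma != NULL) ? (size_t)(comma - s) : strlen(s);
        if (len == extlen && same_ci(s, ext, extlen))
            return 1;
        if (comma == NULL)
            return 0;
    }
}
```

Skipping the scheme and host keeps a host name such as example.com from
matching the com entry of the list.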

Regards, Oleg.

On 11 May 2005 16:22, Pedro Lineu Orso wrote:
PL > Hello Oleg,
PL >
PL > May I have some of that log file to try to hack sarg, and your sarg.conf
PL > file too, please?
PL >
PL > Thanks

--Boundary-00=_W1lgCFYB9s7BJUS
Content-Type: text/plain; charset="koi8-r"; name="extensions-be.txt"
Content-Disposition: attachment; filename="extensions-be.txt"
Content-Transfer-Encoding: 8bit

tgz   - archive
tar   - archive(*)
cpio  - archive
zip   - archive
arj   - archive
bzip  - archive
bz2   - archive
gz    - archive
rar   - archive
ace   - archive
lha   - archive
lzh   - archive
cab   - archive
7z    - archive

doc   - office suite document
mdb   - office suite document
ppt   - office suite document
rtf   - office suite document
mso   - office suite document(?)
dot   - office suite document(?)

bin   - contains binary executable
com   - contains binary executable
sys   - contains binary executable
exe   - contains binary executable
dll   - contains binary executable
scr   - contains binary executable(*)
bat   - contains binary executable

iso   - cd/dvd image
nrg   - cd/dvd image
vcd   - cd/dvd image(*)
vob   - cd/dvd image(*)

mp3   - multimedia
avi   - multimedia
mpg   - multimedia
mpeg  - multimedia
wma   - multimedia
wmv   - multimedia(*)
ogg   - multimedia
mov   - multimedia

===================================
adt   - don't know
drv$  - don't know
src   - don't know
shs   - don't know

--Boundary-00=_W1lgCFYB9s7BJUS
Content-Type: text/x-diff; charset="koi8-r"; name="topuser.diff"
Content-Disposition: attachment; filename="topuser.diff"
Content-Transfer-Encoding: 8bit

--- sarg-2.0.7/topuser.c	2005-05-02 17:55:04.000000000 +0400
+++ sarg-2.0.7rbs/topuser.c	2005-05-04 03:53:48.000000000 +0400
@@ -38,7 +38,7 @@
    int posicao=0;
    char olduser[MAXLEN], csort[MAXLEN], periodo[MAXLEN], arqper[MAXLEN];
    char wger[MAXLEN], top1[MAXLEN], top2[MAXLEN], top3[MAXLEN];
-   char user[MAXLEN], nacc[20], nbytes[20], url[1024], preg[8000], tusr[MAXLEN];
+   char user[MAXLEN], nacc[20], nbytes[20], url[6144], preg[8000], tusr[MAXLEN];
    char ip[MAXLEN], hora[9], data[11], elap[15], incac[15], oucac[15], html[MAXLEN];
    char ipantes[MAXLEN], nameantes[MAXLEN];
    char sfield[10]="2,2";

--Boundary-00=_W1lgCFYB9s7BJUS
Content-Type: text/plain; charset="koi8-r"; name="extensions-was.txt"
Content-Disposition: attachment; filename="extensions-was.txt"
Content-Transfer-Encoding: 8bit

tgz   - archive
zip   - archive
arj   - archive
bzip  - archive
bz2   - archive
gz    - archive
rar   - archive
ace   - archive
lha   - archive
lzh   - archive
cab   - archive

doc   - office suite document
mdb   - office suite document
ppt   - office suite document
rtf   - office suite document
mso   - office suite document(?)
dot   - office suite document(?)

bin   - contains binary executable
com   - contains binary executable
sys   - contains binary executable
exe   - contains binary executable
dll   - contains binary executable

iso   - cd/dvd image

mp3   - multimedia
avi   - multimedia
mpg   - multimedia
mpeg  - multimedia

===================================
adt   - don't know
drv$  - don't know
src   - don't know
shs   - don't know

--Boundary-00=_W1lgCFYB9s7BJUS
Content-Type: text/plain; charset="koi8-r"; name="sarg.conf"
Content-Disposition: attachment; filename="sarg.conf"
Content-Transfer-Encoding: 8bit

# sarg.conf
#
# TAG:  language 
#	Available languages:
#		Bulgarian_windows1251
#		Catalan
#		Czech
#		Dutch
#		English
#		French
#		German
#		Hungarian
#		Indonesian
#		Italian
#		Japanese
#		Latvian
#		Polish
#		Portuguese
#		Romanian
#		Russian_koi8
#		Russian_windows1251
#		Serbian
#		Spanish
#		Turkish
#
language Russian_koi8

# TAG:  access_log file
#       Location of the access.log file
#       sarg -l file
#
access_log /mnt/access.log

# TAG: graphs yes|no
#	Use graphics where possible.
#           graph_days_bytes_bar_color blue|green|yellow|orange|brown|red
#
graphs yes
graph_days_bytes_bar_color orange

# TAG:	title
# 	Specify the title for the html page.
#

# TAG:	font_face
# 	Specify the font for the html page.
#
font_face Tahoma,Verdana,Arial

# TAG:	header_color
# 	Specify the header color
#
header_color darkblue

# TAG:	header_bgcolor
# 	Specify the header bgcolor
#
header_bgcolor blanchedalmond

# TAG:	font_size
# 	Specify the text font size
#
font_size 10px

# TAG:	header_font_size
# 	Specify the header font size
#
header_font_size 10px

# TAG:	title_font_size
# 	Specify the title font size
#
title_font_size 20px

# TAG:	background_color
#	Html page background color
#
background_color white

# TAG:	text_color
#	Html page text color
#
text_color #000000

# TAG:	text_bgcolor
#	Html page text background color
#
text_bgcolor lavender

# TAG:	title_color
#	Html page title color
#
title_color green

# TAG:	logo_image
#	Html page logo.
#

# TAG:	logo_text
#	Html page logo text.
#
#logo_text ""

# TAG:	logo_text_color
#	Html page logo text color.
#
#logo_text_color #000000

# TAG:	logo_image_size
#	Html page logo image size. 
#       width height
#
image_size 253 35

# TAG:	background_image
#	Html page background image
#
#background_image none

# TAG:  password
#       User password file used by authentication
#       If used here, reports will be generated only for those users.
#
password none

# TAG:  temporary_dir
#       Temporary directory name for work files
#       sarg -w dir
#
temporary_dir /tmp

# TAG:  output_dir
#       The reports will be saved in that directory
#       sarg -o dir
#
output_dir /xxx

# TAG:  output_email
#       Email address to send the reports. If you use this tag, no html reports will be generated.
#       sarg -e email
#
output_email none

# TAG:  resolve_ip yes/no
#       Convert ip address to dns name
#       sarg -n
resolve_ip no

# TAG:  user_ip yes/no
#       Use the IP address instead of the userid in reports.
#       sarg -p
user_ip no

# TAG:  topuser_sort_field field normal/reverse
#       Sort field for the Topuser Report.
#       Allowed fields: USER CONNECT BYTES TIME
#
topuser_sort_field BYTES reverse

# TAG:  user_sort_field field normal/reverse
#       Sort field for the User Report.
#       Allowed fields: SITE CONNECT BYTES TIME
#
user_sort_field BYTES reverse

# TAG:  exclude_users file
#       users within the file will be excluded from reports.
#       you can use indexonly to have only index.html file.
#
exclude_users none

# TAG:  exclude_hosts file
#       Hosts, domains or subnets will be excluded from reports.
#
#       Eg.: 192.168.10.10 - exclude ip address only
#            192.168.10.0  - exclude full C class
#            s1.acme.foo   - exclude hostname only
#            acme.foo      - exclude full domain name
#
exclude_hosts none

# TAG:  useragent_log file
#       Location of useragent.log; setting it enables the useragent report.
#
#useragent_log none

# TAG:  date_format
#       Date format in reports: e (European=dd/mm/yy), u (American=mm/dd/yy), w (Weekly=yy.ww)
#       
date_format e

# TAG:  per_user_limit file MB
#       Saves the userid to file if the user's downloads exceed n MB.
#       This option allows you to disable access for users who exceed a download limit.
#       
per_user_limit none

# TAG: lastlog n
#      How many report files must be kept in the reports directory.
#      The oldest report file will be automatically removed.
#      0 - no limit.
#
lastlog 0

# TAG: remove_temp_files yes
#      Remove temporary files: geral, usuarios, top, periodo from root report directory.
#
remove_temp_files yes

# TAG: index yes|no|only
#      Generate the main index.html.
#      only - generate only the main index.html
#
index yes

# TAG: overwrite_report yes|no
#      yes - if a report for that date already exists, it will be overwritten.
#       no - if a report for that date already exists, it will be renamed to filename.n, filename.n+1
#
overwrite_report yes

# TAG: records_without_userid ignore|ip|everybody
#      What to do with records without a user id (no authentication) in the access.log file.
#
#      ignore - This record will be ignored.
#          ip - Use ip address instead. (default)
#   everybody - Use "everybody" instead.
#
records_without_userid ip

# TAG: use_comma no|yes
#      Use comma instead of point in reports.
#      Eg.: use_comma yes => 23,450,110
#           use_comma no  => 23.450.110
#
use_comma no

# TAG: mail_utility mail|mailx
#      Mail command to use to send reports via SMTP
#
#mail_utility mailx

# TAG: topsites_num n
#      How many sites in topsites report.
#
topsites_num 100

# TAG: topsites_sort_order CONNECT|BYTES A|D
#      Sort order for the topsites report, where A=Ascending, D=Descending
#
topsites_sort_order BYTES D

# TAG: index_sort_order A/D
#      Sort order for index.html, where A=Ascending, D=Descending
#
index_sort_order D

# TAG: exclude_codes file
#      Ignore records with these codes. Eg.: NONE/400
#
exclude_codes /xxx/exclude_codes

# TAG: replace_index string
#      Replace "index.html" in the main index file with this string
#      If null "index.html" is used 
#
#replace_index <?php echo str_replace(".", "_", $REMOTE_ADDR); echo ".html"; ?>

# TAG: max_elapsed milliseconds
#      If the elapsed time recorded in the log is greater than max_elapsed, use 0 for the elapsed time.
#      Use 0 for no checking 
#
max_elapsed 0
# 8 Hours
#max_elapsed 28800000

# TAG: report_type type
#      What kind of reports to generate.
#      topsites		   - shows the site, connect and bytes
#      sites_users	   - shows which users were accessing a site
#      users_sites	   - shows sites accessed by the user
#      date_time	   - shows the amount of bytes used by day and hour
#      denied		   - shows all denied sites with full URL
#      auth_failures       - shows authentication failures
#      site_user_time_date - shows sites, dates, times and bytes
#
#      Eg.: report_type topsites denied 
#
#report_type topsites sites_users users_sites date_time denied auth_failures site_user_time_date
report_type topsites sites_users users_sites date_time denied auth_failures site_user_time_date

# TAG: usertab filename
#      You can change the "userid" or the "ip address" to a real user name in the reports.
#      Table syntax:
# 		userid name   or   ip address name
#      Eg:
#		SirIsaac Isaac Newton
#		vinci Leonardo da Vinci
#		192.168.10.1 Karol Wojtyla
#      
#      Each line must be terminated with '\n'
#

# TAG: long_url yes|no
#      If yes, the full url is shown in the report.
#      If no, only the site is shown.
#
#      The yes option generates very big sort files and reports.
#
long_url yes

# TAG: date_time_by bytes|elap
#      Date/Time reports will use bytes or elapsed time?
#
date_time_by bytes

# TAG: charset name
#      ISO 8859 is a full series of 10 standardized multilingual single-byte coded (8bit)
#      graphic character sets for writing in alphabetic languages
#      You can use the following charsets:
#		Latin1 		- West European
#		Latin2 		- East European 
#		Latin3 		- South European 
#		Latin4 		- North European 
#		Cyrillic 
#		Arabic 
#		Greek 
#		Hebrew 
#		Latin5 		- Turkish 
#		Latin6
#		Windows-1251
#		Koi8-r
#
charset KOI8-R

# TAG: user_invalid_char "&/"
#      Records that contain invalid characters in userid will be ignored by Sarg.
#
#user_invalid_char "&/"

# TAG: privacy yes|no
#      privacy_string "***.***.***.***"
#      privacy_string_color blue
#      In some countries a restrictive law forbids the sysadmin from seeing the visited sites.
#      With privacy yes, the visited url is replaced by privacy_string and the link
#      is removed from reports.
#
privacy no
#privacy_string "***.***.***.***"
#privacy_string_color blue

# TAG: include_users "user1:user2:...:usern"
#      Reports will be generated only for listed users.
#
#include_users none

# TAG: exclude_string "string1:string2:...:stringn"
#      Records from access.log file that contain one of listed strings will be ignored.
#
#exclude_string none

# TAG: show_successful_message yes|no
#      Shows "Successful report generated on dir" at end of process.
#
show_successful_message no

# TAG: show_read_statistics yes|no
#      Shows some reading statistics.
#
show_read_statistics no

# TAG: topuser_fields
#      Which fields must be in Topuser report.
#
topuser_fields NUM DATE_TIME USERID CONNECT BYTES %BYTES TOTAL AVERAGE

# TAG: user_report_fields
#      Which fields must be in User report.
#
user_report_fields CONNECT BYTES %BYTES TOTAL AVERAGE

# TAG: topuser_num n
#      How many users in the topuser report. 0 = no limit
#
topuser_num 0

# TAG: site_user_time_date_type list|table
#      generate reports for site_user_time_date in list or table format
#
site_user_time_date_type table

# TAG: datafile file
#      Save the report results in a file to populate some database
#
#datafile none
#datafile /tmp/p8

# TAG: datafile_delimiter ";"
#      ascii character to use as a field separator in datafile
#
#datafile_delimiter ";"

# TAG: datafile_fields all
#      Which data fields must be in datafile
#      user;date;time;url;connect;bytes;in_cache;out_cache;elapsed
#
#datafile_fields user;date;time;url;connect;bytes;in_cache;out_cache;elapsed

# TAG: weekdays
#      The weekdays to take into account (Sunday->0, Saturday->6)
# Example:
#weekdays 1-3,5
# Default:
weekdays 0-6

# TAG: hours
#      The hours to take into account
# Example:
#hours 7-12,14,16,18-20
# Default:
hours 0-23

# TAG: squidguard_conf file
#      path to squidGuard.conf file
#      Generate reports from SquidGuard logs.
#      Use 'none' to disable.
#      squidguard_conf /usr/local/squidGuard/squidGuard.conf
#
#squidguard_conf none

# TAG: squidguard_log_format
#      Format string for SquidGuard logs.
#      REJIK       #year#-#mon#-#day# #hour# #list#:#tmp# #ip# #user# #tmp#/#tmp#/#url#/#end#
#      SQUIDGUARD  #year#-#mon#-#day# #hour# #tmp#/#list#/#tmp#/#tmp#/#url#/#tmp# #ip#/#tmp# #user# #end#
#squidguard_log_format #year#-#mon#-#day# #hour# #tmp#/#list#/#tmp#/#tmp#/#url#/#tmp# #ip#/#tmp# #user# #end#

# TAG: show_sarg_info yes|no
#      shows sarg information and the site path at the bottom of each report
#
show_sarg_info no

# TAG: show_sarg_logo yes|no
#      shows sarg logo
#
show_sarg_logo no

# TAG: parsed_output_log directory
#      Saves the processed log in a sarg format after parsing the squid log file.
#      This is a way to dump all of the data structures out, after parsing from 
#      the logs (presumably this data will be much smaller than the log files themselves),
#      and pull them back in for later processing and merging with data from previous logs.
#
#parsed_output_log none

# TAG parsed_output_log_compress /bin/gzip|/usr/bin/bzip2|nocompress
#      compression utility for sarg logs
#
#parsed_output_log_compress /bin/gzip

# TAG displayed_values bytes|abbreviation
#      how the values will be displayed in reports.
#      eg. bytes  	-  209.526
#          abbreviation -  210K
#
displayed_values bytes

# Report limits
# TAG authfail_report_limit n
# TAG denied_report_limit n
# TAG siteusers_report_limit n
# TAG squidguard_report_limit n
# TAG user_report_limit n
#      report limits (lines).
#      '0' no limit
#
authfail_report_limit 0
denied_report_limit 0
siteusers_report_limit 0
squidguard_report_limit 0
user_report_limit 0

# TAG www_document_root dir
#     Location of your Web DocumentRoot
#     Sarg will create sarg-php directory with some PHP modules:
#     - sarg-squidguard-block.php - add urls from user reports to squidGuard DB
#
www_document_root /xxx

# TAG block_it module_url
#     This tag allows you to pass urls from user reports to a cgi or php module,
#     to be blocked by some Squid acl
#
#     Eg.: block_it /sarg-php/sarg-block-it.php
#     sarg-block-it is a php that will append a url to a flat file.
#     You must change /var/www/html/sarg-php/sarg-block-it to point to your file
#     in $filename variable, and chown to a httpd owner.
#
#     sarg will pass http://module_url?url=url
#
block_it none

# TAG external_css_file path
#     This tag allows overriding sarg's internal css.
#     Sarg uses these style classes:
#     	.body		body class
#	.info		sarg information class, align=center
#	.title		title class, align=center
#	.header		header class, align:left
#	.header2	header class, align:right
#	.header3	header class, align:right
#	.text		text class, align:left
#	.data		table text class, align:right
#	.data2		table text class, align:right, border colors
#	.link  		link class
#
#     There is a sample in /usr/local/sarg/etc/css.tpl
#
#external_css_file none

# TAG user_authentication yes|no
#     Allow user authentication in User Reports using .htaccess
#     Parameters:  
#	AuthUserFile 	- where the user password file is
#	AuthName	- authentication realm. Eg "Members Only"
#	AuthType	- authentication type - basic
#	Require		- authorized users to see the report.
#                                          %u - user report
#
# user_authentication no
# AuthUserFile /usr/local/sarg/passwd
# AuthName "SARG, Restricted Access"
# AuthType Basic
# Require user admin %u

# TAG download_suffix "suffix,suffix,...,suffix"
#    file suffix to be considered as "download" in Download report.
#    Use 'none' to disable.    
#
download_suffix "tgz,zip,arj,bzip,bz2,gz,rar,ace,doc,iso,adt,bin,cab,com,dot,drv$,lha,lzh,mdb,mso,ppt,rtf,src,shs,sys,exe,dll,mp3,avi,mpg,mpeg"

--Boundary-00=_W1lgCFYB9s7BJUS
Content-Type: text/x-log; charset="koi8-r"; name="access.log"
Content-Disposition: attachment; filename="access.log"
Content-Transfer-Encoding: 8bit

1112196006.509    265 192.168.77.15 TCP_MISS/200 296 GET http://kmindex.ru/c/?id=313791&id2=48&v=30&l=http%3A//www.alkonvvs.ru/&r=http%3A//www.yandex.ru/yandsearch%3Ftext%3D%25D2%25EE%25F0%25E3%25EE%25E2%25FB%25E5+%25E2%25E8%25F2%25F0%25E8%25ED%25FB%26holdreq%3D%25D2%25EE%25F0%25E3%25EE%25E2%25EE%25E5+%25EE%25E1%25EE%25F0%25F3%25E4%25EE%25E2%25E0%25ED%25E8%25E5%26stype%3Dwww&t=%u0422%u043E%u0440%u0433%u043E%u0432%u043E%u0435%20%u043E%u0431%u043E%u0440%u0443%u0434%u043E%u0432%u0430%u043D%u0438%u0435%2C%20%u043F%u0440%u043E%u0434%u0430%u0436%u0430%20%u0442%u043E%u0440%u0433%u043E%u0432%u043E%u0433%u043E%20%u043E%u0431%u043E%u0440%u0443%u0434%u043E%u0432%u0430%u043D%u0438%u044F%3A%20%u0432%u0438%u0442%u0440%u0438%u043D%u0430%20%u0438%20%u0432%u0438%u0442%u0440%u0438%u043D%u044B%20%u0438%u0437%20%u0430%u043B%u044E%u043C%u0438%u043D%u0438%u0435%u0432%u043E%u0433%u043E%20%u043F%u0440%u043E%u0444%u0438%u043B%u044F%2C%20%u043F%u0440%u043E%u0441%u0442%u043E%20%u0430%u043B%u044E%u043C%u0438%u043D%u0438%u0435%u0432%u044B%u0439%20%u043F%u0440%u043E%u0444%u0438%u043B%u044C%20%u0434%u043B%u044F%20%u0432%u044B%u0441%u0442%u0430%u0432%u043E%u0447%u043D%u043E%u0433%u043E%20%u043E%u0431%u043E%u0440%u0443%u0434%u043E%u0432%u0430%u043D%u0438%u044F%20-%20%u043E%u0442%20%u043A%u043E%u043C%u043F%u0430%u043D%u0438%u0438%20ALCON-BBC&f=0&d=0.95783290883686160.014756022435909899 - DIRECT/217.174.98.3 image/gif

--Boundary-00=_W1lgCFYB9s7BJUS
Content-Type: text/plain; charset="koi8-r"; name="exclude_codes"
Content-Disposition: attachment; filename="exclude_codes"
Content-Transfer-Encoding: 8bit

NONE/400
TCP_MEM_HIT/200
TCP_REFRESH_HIT/304
TCP_REFRESH_HIT/200
TCP_IMS_HIT/304
TCP_HIT/200
TCP_NEGATIVE_HIT/404

--Boundary-00=_W1lgCFYB9s7BJUS--

