4 - YADU, May/2001 - Alek Komarnitsky


            yadu options - there's a bunch of 'em! ;-)
   
YES, there are a LOT of options/ways to use yadu
If you get serious about filesystem scanning,
Trust me that you will end up using 'em!  ;-)
   
$ yadu -help
yadu is Yet Another Disk Usage program and is a simple/quick-n-dirty tool
to parse directory(s) and determine how many files and bytes are being used 
sliced-n-diced various ways. This turns out to be kinda useful - some questions
that yadu can be used to easily/quickly answer are:
   - Are there any files out there owned by users that no longer exist?
   - Which files are really BIG and account for most of the disk usage?
   - Which files have not been accessed/modified in a LONG time?
   - What percentage of the disk space is .mp3's, .ppt's, etc.?
   - Are there any setUID files out there?
   - Are there any directories of excessive length and/or repeated path parts? 
   - I need some data for generic disk management reporting.
        yadu generates output that is easily importable.
yadu uses the lstat() call to generate sorted output in categories such as:
   atime -   last time file accessed (note: backup scripts can reset this)
   ctime -   last time inode changed
   mtime -   last time file changed
   dirdeep - Breakdown number of directory parts per path (branches)
   dirsize - Breakdown number of entries per directory (leaves)
   filelen - Breakdown of lengths of filenames (basename only)
   pathlen - Breakdown of lengths of full pathnames
   size -    Breakdown by size groupings
   user -    uid/owner of file    (not available on Windoze)
   gcos -    GCOS field of uid/owner of file    (not available on Windoze)
   group -   gid/group of file   (not available on Windoze)
   Extension (i.e. how many bytes/files by .doc, .tar, etc.)
   List SetUID Files (might as well do this security check)
   List SetGID Files (ditto)
   List Hardlinks (show files that have link count greater than one) 
You CAN do all of this using "find/ls/etc." ... yadu just makes it easier,
especially for a large number of files. And as alluded to above, it also runs
on Windoze (with some limitations), so it can be useful there as well.
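
To give a feel for the kind of tally described above, here is a minimal Python sketch that walks a tree, does an lstat on every non-directory entry, and totals files and bytes per extension. This is NOT yadu's own code (which is not shown here); the function name tally_by_extension and the "(none)" label are assumptions made up for this illustration.

```python
import os
from collections import defaultdict


def tally_by_extension(top):
    """Return {extension: [file_count, byte_count]} for entries under top."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            # os.path.splitext splits on the last "." and treats a leading
            # dot (".bashrc") as part of the name. "(none)" is a
            # hypothetical label for names without an extension.
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext][0] += 1
            totals[ext][1] += info.st_size
    return dict(totals)
```

Calling tally_by_extension("/some/dir") and sorting the result by byte count gives a rough answer to the "what percentage is .mp3's" question; yadu's real reports cover many more categories than this.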
   
yadu [options] file/directory(s)
      -check_symlinks   Turn on checking of what the symlink points to and
                           classify the symlink based on rules specified in 
                           the variable %symlinks_defs.  Can increase run-time
                           if there is a large number of links, since a 
                           readlink() has to be done for each symlink found.
                           WARNING: Doing a readlink() updates the access time
                           of the symlink. Also see -dump_breakdown=symlink
      -debug            Generate (voluminous) debug output to STDOUT
      -df               Do a "df on the file/directory(s)" and put this as a
                           comment in various reports. NOTE: This is meant as a
                           quick-n-dirty hack, so rather than use one of the
                           semi-supported modules for this, we just do a 
                           system call ... so while this should work on most 
                           UNIX-type systems, it won't on Windoze.
                           Disabled if -input_from_file option is used.
      -do_nothing       Do NOTHING (except the recursive find - used for timings)
      -do_only_dump     ONLY dump all filenames found (used for timings/dumps)
      -do_only_match    ONLY do the {dir,file}name match (speeds it up a lot!)
      -do_only_stat     ONLY do the file stat (used for timings)
      -dump_breakdown=? Dump filenames broken down by category. You can select
                           each category or ALL separated by commas (no space):
                              atime,ctime,mtime,group,user,ext,dirdeep,dirsize,
                              filelen,pathlen,size,symlink,ALL
                           I.e.  -dump_breakdown=atime,mtime,ext
                           NOTE: This can DRAMATICALLY increase the size
                           of the output files ... and also increases the
                           yadu memory requirements. The hashed arrays
                           are dumped every 10 * -heartbeat files in order
                           to try to prevent these from getting too large.
                           Also implicitly sets -log_to_files. Specifying
                           symlink implicitly turns on -check_symlinks ...
                           but you MUST say symlink (i.e. ALL does not turn it on).
      -dump_filenames   Dump a single file with a list of all filenames.
                           Note: This can possibly be useful as input to yadu
                           using the -input_from_file option. It can
                           generate quite voluminous output, but is not
                           memory intensive like -dump_breakdown is.
                           Also implicitly sets -log_to_files
      -dump_lstat_info  Dump a single file with a list of all filenames and
                           lstat info (see man page) which is:
                              dev ino mode nlink uid gid rdev size
                                 atime mtime ctime blksize blocks filename
                           filename is LAST because filenames may have all sorts
                           of interesting characters in them including a space;
                           so an appropriate delimiter would be tough!   ;-)
                           Note: This can be useful if you want to post-process
                           this raw data. It can generate voluminous output,
                           but is not memory intensive like -dump_breakdown is.
                           Also implicitly sets -log_to_files
      -examples         Shows examples of how to use yadu and then exit
      -ext_char=?       Set "extension character" to "?" 
                           For example, if this is a ".", then "file_name.xls" 
                           would be a ".xls" file. The default is:  .
                           NOTE: You can say more than one character for "?",
                           in which case it will split on EITHER character.
      -ext_num_min=###  If a file extension has fewer than ### occurrences,
                           then group 'em all together. I.e. if .whacko-smacko 
                           and a bunch of other file extensions that you don't
                           care about only show up once, then set this to 2 
                           (or higher) and they will all be grouped as:
                              EXTENSION_FILES_WITH_OCCURANCES_EQUAL_TO_###
                           The default is: 10
      -ext_on_first     Determine extension by using the first "." rather than 
                           last one.  I.e. "base.one.two" will be classified as 
                           ".one.two" instead of ".two"
      -future=###       Specify how many ### seconds in the future is "really"
                           the future (default: 300)
      -hardlinks        Show files with hardlinks (determined by nlink > 1)
      -heartbeat=###    Generate a "heartbeat" every ### files (default: 10000)
                           Note: typical scan rates are 5-50000+ files/minute
      -help OR -usage   Generate this listing
      -input_from_file  Use file(s) as LIST of stuff to scan - I.e. if you have
                           a long list of files that you want yadu to take a
                           look at for you, stuff 'em in a file and use this.
      -log_to_files     Log to "report.by" files (in CWD) instead of STDOUT
      -match_dirname=?  ONLY print directories that match string ? 
      -match_filename=? ONLY print files that match string ? 
                          NOTE: Use in conjunction with -do_only_match to make
                          this go REALLY fast since limited stats are done.
                          Note that the comparison is case-insensitive.
      -most_recent=?    Print the most recently accessed file by: 
                              atime, ctime, or mtime  (ex: -most_recent=atime)
                          Handy to see which file in the specified directory(s)
                          has been accessed/modified most recently. Directories
                          and any files more than 300 seconds in
                          the future are ignored. NOTE: Only one line is output
                          to STDOUT, so this could be handy for scripting.
      -no_print_ext     Do NOT print extension info (which can be lengthy)
      -no_push_dir_atime By default, yadu takes the directory access times 
                           and groups these into a separate category ...
                           since the atime is updated by yadu itself!
                           However, if you have a read-only medium, you may
                           want to see the actual access times, so you can
                           disable this separate categorization.
      -no_push_symlink_atime By default, yadu takes the symlink access times 
                           and groups these into a separate category ...
                           since the atime MAY be updated by yadu itself if you
                           have used the -check_symlinks option (or done
                           anything else that does a readlink()). However,
                           if you have a read-only medium, you may want to
                           see the actual access times, so you can disable
                           this separate categorization.
      -parseable        Generate output that is (semi ;-) parseable
                           Precedes each line of output by a keyword that 
                           shows what category it is in (ex: LISTOFWAYTOBIGS)
                           This makes parsing very easy to do - type:
                           yadu -parseable_help to see the categories.
      -parseable_help   Print categories that could be listed with -parseable
      -prune_dirs       If a directory is listed, then do NOT descend into it.
                           This is helpful if you have a list of dirs and files,
                           and you ONLY want data on just THOSE. 
      -prune_nfs        If an NFS (or LOFS) mount point is encountered, then do
                           NOT descend into it. This is quite useful if you just
                           want to do ALL local filesystems as you can simply
                           provide "/" as the starting location. Note that this
                           is VERY operating system dependent and has only been
                           tested on hpux and solaris. 
      -too_deep=###     If directory path has more than ### parts (default: 100)
                           then prune from there. Deals with ridiculously deep
                           directories that probably shouldn't exist. 
      -too_deep_repeat=### If directory path has more than ### parts (default: 10)
                           that are the SAME, then prune from there. Deals with
                           dirpaths that look something like this:
                              /a/a/a/a/a/a/a/a/a/a
      -too_long_file=## If filename (not the complete pathname) is longer than
                           ## characters (default: 100), then print this out. 
      -too_long_path=## If complete pathname is longer than ## characters 
                           (default: 500), then print this out. 
      -use_conf=???     Use filename ??? as a "configuration" file to override
                           the default settings. This is advanced stuff - type
                           yadu -use_conf_help for info on how this works
      -use_conf_help    Print info on how to use -use_conf=
      -way_too_big=###  Print files bigger than ### bytes (default: 104857600)
      -way_too_big_dir=###  Directories with more than ### entries (default: 1000)
      -yes              Don't ask Y/N? proceed ahead question - just do it!
                           Turned on by default for -log_to_files
      file/directory(s) Whitespace separated list of files and/or directories
                           to scan. Note that files/directories that start with
                           a hyphen (-) will need an extra hyphen - i.e.
                              yadu --dir
                           to run yadu on -dir
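
The -ext_char and -ext_on_first descriptions above can be sketched in a few lines of Python. This is only an illustration of the rules as the help text states them, not yadu's actual logic; the function name classify_extension and its parameters are hypothetical, and returning an empty string for names with no extension character is an assumption (the help text does not say what yadu does there).

```python
def classify_extension(filename, ext_chars=".", on_first=False):
    """Return the extension of filename (basename only), or "" if none.

    ext_chars mimics -ext_char: any ONE of its characters starts an
    extension. on_first mimics -ext_on_first: cut at the first such
    character instead of the last one.
    """
    # Positions of every extension character in the name.
    positions = [i for i, ch in enumerate(filename) if ch in ext_chars]
    if not positions:
        return ""  # assumption: no extension character means no extension
    cut = positions[0] if on_first else positions[-1]
    return filename[cut:]
```

With the defaults, "file_name.xls" comes back as ".xls" and "base.one.two" as ".two"; with on_first=True the latter becomes ".one.two", matching the examples in the option text.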
    # NOTE: Directories skipped are /yadu_output$ /yadu_skip_this_dir$ ^/proc$ /.snapshot$ ^/arc$ ^/app$ ^/appl$ ^/data$ ^/home$ ^/mail$ ^/net$ ^/nfs$ ^/parts$  
   # NOTE: /_CR_/_FF_/_NL_/_SPACE_/_TAB_ in reports.by.ext tweaked for readability/parsability
yadu version 1.2 (20030312) - questions/comments to AUTHOR
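
If you want to post-process the -dump_lstat_info output mentioned above, putting the filename LAST makes the split easy. The sketch below ASSUMES each line holds the 13 lstat fields as whitespace-separated integers followed by the filename; the exact on-disk format is not shown in the help text, so check a real dump before relying on this. The names LstatRecord and parse_lstat_line are made up for this example.

```python
from collections import namedtuple

# Field order as listed under -dump_lstat_info; the filename comes last.
FIELDS = ("dev ino mode nlink uid gid rdev size "
          "atime mtime ctime blksize blocks").split()
LstatRecord = namedtuple("LstatRecord", FIELDS + ["filename"])


def parse_lstat_line(line):
    """Split one dump line into 13 integer fields plus the filename."""
    # Splitting at most 13 times leaves the rest of the line untouched,
    # so spaces inside the filename survive.
    parts = line.rstrip("\n").split(None, len(FIELDS))
    if len(parts) != len(FIELDS) + 1:
        raise ValueError("expected %d fields plus a filename" % len(FIELDS))
    # Assumption: every lstat field is an integer (may not hold on Windows).
    numbers = [int(p) for p in parts[:-1]]
    return LstatRecord(*numbers, filename=parts[-1])
```

From there, something like rec.nlink > 1 or rec.size picks out hardlinks or big files. Note that a filename starting with whitespace would lose it under this split, which is one more reason to treat this as a starting point only.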