YADU - Yet Another Disk Usage program

yadu options - there's a bunch of 'em! ;-)

YES, there are a LOT of options/ways to use yadu
If you get serious about filesystem scanning,
Trust me that you will end up using 'em! ;-)

$ yadu -help
yadu is Yet Another Disk Usage program and is a simple/quick-n-dirty tool
to parse directory(s) and determine how many files and bytes are being used
sliced-n-diced various ways. This turns out to be kinda useful - some questions
that yadu can be used to easily/quickly answer are:
- Are there any files out there owned by users that no longer exist?
- Which files are really BIG and account for most of the disk usage?
- Which files have not been accessed/modified in a LONG time?
- What percentage of the disk space is .mp3's, .ppt's, etc.?
- Are there any setUID files out there?
- Are there any directories of excessive length and/or repeated path parts?
- I need some data for generic disk management reporting.
yadu generates output that is easily importable.
yadu uses the lstat() call to generate sorted output in categories such as:
atime - last time file accessed (note: backup scripts can reset this)
ctime - last time inode changed
mtime - last time file changed
dirdeep - Breakdown number of directory parts per path (branches)
dirsize - Breakdown number of entries per directory (leaves)
filelen - Breakdown of lengths of filenames (basename only)
pathlen - Breakdown of lengths of full pathnames
size - Breakdown by size groupings
user - uid/owner of file (not available on Windoze)
gcos - GCOS field of uid/owner of file (not available on Windoze)
group - gid/group of file (not available on Windoze)
Extension (i.e. how many bytes/files by .doc, .tar, etc.)
List SetUID Files (might as well do this security check)
List SetGID Files (ditto)
List Hardlinks (show files that have link count greater than one)
You CAN do all of this using "find/ls/etc." ... yadu just makes it easier,
especially for large number of files. And as alluded to above, it also runs
on Windoze (with some limitations), so it can be useful there as well.
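
The categories above all come from one lstat() per file. As a rough illustration (this is NOT yadu's own code, just a minimal Python sketch of the same idea), the hypothetical helper usage_by_extension below walks a tree and tallies files and bytes per extension. The function name and the "(none)" label for files without an extension are assumptions made for this example.

```python
import os
from collections import defaultdict


def usage_by_extension(root):
    """Walk root and tally (file count, total bytes) per file extension."""
    totals = defaultdict(lambda: [0, 0])  # extension -> [files, bytes]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: a symlink is not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # "(none)" is this sketch's own label, not a yadu category
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext][0] += 1
            totals[ext][1] += info.st_size
    return {ext: tuple(counts) for ext, counts in totals.items()}
```

Calling usage_by_extension("/some/dir") returns a dict such as {".txt": (files, bytes), ...}, which can then be sorted by bytes to see which extensions eat the most space.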

yadu [options] file/directory(s)
-check_symlinks Turn on checking of what the symlink points to and
classify the symlink based on rules specified in
the variable %symlinks_defs. Can increase run-time
if there is a large number of links, since a
readlink() has to be done for each symlink found.
WARNING: Doing a readlink() updates the access time
of the symlink. Also see -dump_breakdown=symlink
-debug Generate (voluminous) debug output to STDOUT
-df Do a "df on the file/directory(s)" and put this as a
comment in various reports. NOTE: This is meant as a
quick-n-dirty hack, so rather than use one of the
semi-supported modules for this, we just do a
system call ... so while this should work on most
UNIX-type systems, it won't on Windoze.
Disabled if -input_from_file option is used.
-do_nothing Do NOTHING (except the recursive find; used for timings)
-do_only_dump ONLY dump all filenames found (used for timings/dumps)
-do_only_match ONLY do the {dir,file}name match (speeds it up a lot!)
-do_only_stat ONLY do the file stat (used for timings)
-dump_breakdown=? Dump filenames broken down by category. You can select
each category or ALL, separated by commas (no spaces):
atime,ctime,mtime,group,user,ext,dirdeep,dirsize,
filelen,pathlen,size,symlink,ALL
I.e. -dump_breakdown=atime,mtime,ext
NOTE: This can DRAMATICALLY increase the size
of the output files ... and also increases the
yadu memory requirements. The hashed arrays
are dumped every 10 * -heartbeat files in order
to try to prevent these from getting too large.
Also implicitly sets -log_to_files. Specifying
symlink implicitly turns on -check_symlinks ...
but you MUST name symlink explicitly (ALL does not turn it on).
-dump_filenames Dump a single file with a list of all filenames.
Note: This can be useful as input to yadu
using the -input_from_file option. It can
generate quite voluminous output, but it is not
memory intensive like -dump_breakdown is.
Also implicitly sets -log_to_files
-dump_lstat_info Dump a single file with a list of all filenames and
lstat info (see man page) which is:
dev ino mode nlink uid gid rdev size
atime mtime ctime blksize blocks filename
filename is LAST because filenames may have all sorts
of interesting characters in them including a space;
so an appropriate delimiter would be tough! ;-)
Note: This can be useful if you want to post-process
this raw data. It can generate voluminous output,
but it is not memory intensive like -dump_breakdown is.
Also implicitly sets -log_to_files
-examples Show examples of how to use yadu and then exit
-ext_char=? Set "extension character" to "?"
For example, if this is a ".", then "file_name.xls"
would be a ".xls" file. The default is: .
NOTE: You can say more than one character for "?",
in which case it will split on EITHER character.
-ext_num_min=### If a file extension has fewer than ### occurrences,
then group 'em all together. I.e. if .whacko-smacko
and a bunch of other file extensions that you don't
care about only show up once, then set this to 2
(or higher) and they will all be grouped as:
EXTENSION_FILES_WITH_OCCURANCES_EQUAL_TO_###
The default is: 10
-ext_on_first Determine extension by using the first "." rather than
last one. I.e. "base.one.two" will be classified as
".one.two" instead of ".two"
-future=### Specify how many ### seconds in the future is "really"
the future (default: 300)
-hardlinks Show files with hardlinks (determined by nlink > 1)
-heartbeat=### Generate a "heartbeat" every ### files (default: 10000)
Note: typical scan rates are 5-50000+ files/minute
-help OR -usage Generate this listing
-input_from_file Use file(s) as LIST of stuff to scan - I.e. if you have
a long list of files that you want yadu to take a
look at for you, stuff 'em in a file and use this.
-log_to_files Log to "report.by" files (in CWD) instead of STDOUT
-match_dirname=? ONLY print directories that match string ?
-match_filename=? ONLY print files that match string ?
NOTE: Use in conjunction with -do_only_match to make
this go REALLY fast since limited stats are done.
Note that the comparison is case-insensitive.
-most_recent=? Print the most recently accessed file by:
atime, ctime, or mtime (ex: -most_recent=atime)
Handy to see which file in the specified directory(s)
has been accessed/modified most recently. Directories
and any files more than 300 seconds in
the future are ignored. NOTE: Only one line is output
to STDOUT, so this could be handy for scripting.
-no_print_ext Do NOT print extension info (which can be lengthy)
-no_push_dir_atime By default, yadu takes the directory access times
and groups these into a separate category ...
since the atime is updated by yadu itself!
However, if you have a read-only medium, you may
want to see the actual access times, so you can
disable this separate categorization.
-no_push_symlink_atime By default, yadu takes the symlink access times
and groups these into a separate category ...
since the atime MAY be updated by yadu itself if you
have used the -check_symlinks option (or done
anything else that does a readlink()). However,
if you have a read-only medium, you may want to
see the actual access times, so you can disable
this separate categorization.
-parseable Generate output that is (semi ;-) parseable
Precedes each line of output by a keyword that
shows what category it is in (ex: LISTOFWAYTOBIGS)
This makes parsing very easy to do - type:
yadu -parseable_help to see the categories.
-parseable_help Print categories that could be listed with -parseable
-prune_dirs If a directory is listed, then do NOT descend into it.
This is helpful if you have a list of dirs and files,
and you ONLY want data on just THOSE.
-prune_nfs If an NFS (or LOFS) mount point is encountered, then do
NOT descend into it. This is quite useful if you just
want to do ALL local filesystems as you can simply
provide "/" as the starting location. Note that this
is VERY operating system dependent and has only been
tested on hpux and solaris.
-too_deep=### If directory path has more than ### parts (default: 100)
then prune from there. Deals with ridiculously deep
directories that probably shouldn't exist.
-too_deep_repeat=### If directory path has more than ### parts (default: 10)
that are the SAME, then prune from there. Deals with
dirpaths that look something like this:
/a/a/a/a/a/a/a/a/a/a
-too_long_file=## If filename (not the complete pathname) is longer than
## characters (default: 100), then print this out.
-too_long_path=## If complete pathname is longer than ## characters
(default: 500), then print this out.
-use_conf=??? Use filename ??? as a "configuration" file to override
the default settings. This is advanced stuff - type
yadu -use_conf_help for info on how this works
-use_conf_help Print info on how to use -use_conf=
-way_too_big=### Print files bigger than ### bytes (default: 104857600)
-way_too_big_dir=### Directories with more than ### entries (default: 1000)
-yes Don't ask Y/N? proceed ahead question - just do it!
Turned on by default for -log_to_files
file/directory(s) Whitespace separated list of files and/or directories
to scan. Note that files/directories that start with
a hyphen (-) will need an extra hyphen - i.e.
yadu --dir
to run yadu on -dir
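
The -ext_char and -ext_on_first descriptions above can be pictured with a small Python sketch. This is an illustration of the rules as the help text states them, not yadu's actual code; the function name classify_extension and the choice to return an empty string when no extension character is found are assumptions made for the example.

```python
def classify_extension(filename, ext_chars=".", on_first=False):
    """Return the extension of filename, including its leading character.

    ext_chars plays the role of -ext_char: any one of its characters
    may start the extension. on_first plays the role of -ext_on_first.
    Returning "" when nothing matches is this sketch's own assumption.
    """
    # Positions of every character that counts as an extension character
    positions = [i for i, ch in enumerate(filename) if ch in ext_chars]
    if not positions:
        return ""
    cut = positions[0] if on_first else positions[-1]
    return filename[cut:]
```

With the defaults, classify_extension("base.one.two") gives ".two", and with on_first=True it gives ".one.two", matching the example in the -ext_on_first text.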
# NOTE: Directories skipped are /yadu_output$ /yadu_skip_this_dir$ ^/proc$ /.snapshot$ ^/arc$ ^/app$ ^/appl$ ^/data$ ^/home$ ^/mail$ ^/net$ ^/nfs$ ^/parts$
# NOTE: /_CR_/_FF_/_NL_/_SPACE_/_TAB_ in reports.by.ext tweaked for readability/parsability
yadu version 1.2 (20030312) - questions/comments to AUTHOR
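
If you want to post-process a -dump_lstat_info file, the "filename is LAST" layout means you can split off the 13 lstat fields and treat whatever is left as the filename. Here is a minimal Python sketch of that idea. It ASSUMES the fields are whitespace separated and all numeric, which the help text implies but does not state outright; the names LSTAT_FIELDS and parse_lstat_line are made up for this example.

```python
# Field order as listed in the -dump_lstat_info help text
LSTAT_FIELDS = (
    "dev", "ino", "mode", "nlink", "uid", "gid", "rdev", "size",
    "atime", "mtime", "ctime", "blksize", "blocks",
)


def parse_lstat_line(line):
    """Split one dump line into (stats dict, filename).

    Assumes whitespace-separated integer fields followed by the filename.
    Spaces inside the filename survive; leading spaces would be lost.
    """
    # maxsplit keeps everything after the 13th field as one string
    parts = line.rstrip("\n").split(None, len(LSTAT_FIELDS))
    if len(parts) != len(LSTAT_FIELDS) + 1:
        raise ValueError(f"expected 13 fields plus a filename: {line!r}")
    *numbers, filename = parts
    stats = {key: int(value) for key, value in zip(LSTAT_FIELDS, numbers)}
    return stats, filename
```

Check a few lines of your own dump against this before trusting it, since the exact separator and any header or comment lines in the real file are not described in the help text.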