OriginalBugID: 948 Bug
Version: 8.0.3
SubmitDate: '1998-12-14'
LastModified: '2000-06-22'
Severity: MED
Status: Assigned
Submitter: pat
ChangedBy: hobbs
OS: Windows NT
Machine: X86
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'
Name: Uwe Traum
ReproducibleScript:
proc dotest {{filename test.bin}} {
    set fid [open $filename w]
    fconfigure $fid -translation binary
    for { set i 0 } { $i < 2000 } { incr i } {
        set ind [expr {128*int(rand()*30000)}]
        #seek $fid $ind start
        puts -nonewline $fid "123456789012345678901234567890"
    }
    close $fid
}
time dotest 3
[rewritten by hobbs as proc]
ObservedBehavior:
Output:
NT4;local disk;PentiumPro 200: 155172000 microseconds per iteration
Solaris2.5;local disk;sparc20: 1844860 microseconds per iteration
On Unix it is roughly 80 times faster than on NT.
DesiredBehavior:
same speed
In FileOutputProc (tcl8.0.3/win/tclWinChan.c, line 560) there
is ALWAYS a call to FlushFileBuffers,
so every write goes straight to disk.
That's why the disk LED is permanently blinking.
What's the reason for this call?
Can it be removed?
thanks
--
This is verified in 8.4a1. The disk LED does stay permanently
on under NT. Using the Performance Monitor, it does seem that
excessive flushing may be occurring.
-- 06/22/2000 hobbs
The file channel driver on Win* forces a flush. It really doesn't need to, but some file tests depend on it doing a true write to disk. As a result, it is slower.
See also Bug #119300 - we've so many unclosed bugs that it is impractical to link related ones... <sigh>
Logged In: YES
user_id=75003
The actual id is #219300 after SF did its renumbering dance.
Logged In: YES
user_id=75003
This is the list of tests which fail if flushing is
disabled in the windows file driver:
io-27.2 FlushChannel, some output buffered
io-27.4 FlushChannel, implicit flush when buffer fills
io-27.5 FlushChannel, implicit flush when buffer fills and on close
io-29.4 Tcl_WriteChars, buffering in full buffering mode
io-29.5 Tcl_WriteChars, buffering in line buffering mode
io-29.6 Tcl_WriteChars, buffering in no buffering mode
io-29.7 Tcl_Flush, full buffering
io-29.8 Tcl_Flush, full buffering
io-29.17 Tcl_WriteChars buffers, then Tcl_Flush flushes
io-29.18 Tcl_WriteChars and Tcl_Flush intermixed
io-29.19 Explicit and implicit flushes
io-29.20 Implicit flush when buffer is full
io-29.28 Tcl_WriteChars, lf mode
io-39.6 Tcl_SetChannelOption, multiple options
io-39.7 Tcl_SetChannelOption, buffering, translation
io-39.8 Tcl_SetChannelOption, different buffering options
io-52.7 TclCopyChannel
Logged In: YES
user_id=75003
Ok, I now understand the problem much better. It is
partially an OS issue and partially an issue of how the
affected tests were written.
When Tcl 'flushes' a channel it actually only writes its
internal buffers to the OS and then forgets about the data.
The OS is free to delay the actual write to disk.
The affected tests try to check that the flushing behaviour
of Tcl is correct. To do so they perform some writes and
then check the size of the resulting file. But this
means that they actually check two things at once: the flushing
behaviour of Tcl itself, and how the OS deals with pending data
when it comes to reporting the size of a file.
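A minimal sketch of the pattern those tests rely on, for illustration only. The proc name sizeAfterFlush and the file name flushtest.bin are made up here; the real tests in the io test file differ in detail.

```tcl
# Write a little data, then compare [file size] before and after [flush].
proc sizeAfterFlush {{filename flushtest.bin}} {
    set fid [open $filename w]
    fconfigure $fid -translation binary -buffering full
    puts -nonewline $fid "0123456789"
    # The 10 bytes are still in Tcl's channel buffer at this point.
    set before [file size $filename]
    # [flush] hands the buffer to the OS; it does not by itself force a disk write.
    flush $fid
    # What comes back here depends on how the OS accounts for pending writes.
    set after [file size $filename]
    close $fid
    file delete $filename
    return [list $before $after]
}
puts [sizeAfterFlush]
```

A test written this way passes only if the OS includes data it has accepted but not yet committed when it reports the file size, which is exactly the platform dependency described in this comment.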
Both Unix and Win* platforms delay writing data to disk
until they have idle time, or group nearby blocks
together, etc. But Win* is evidently lazier than Unix
when it comes to reporting the size of a file with pending
writes. Win* reports the size actually on disk, no matter
how much data is pending. Unix goes to the trouble of
calculating the size of the file as if the pending data had
already been written to the disk.
The current solution to this problem is to force Win* to
actually write all the data written to it by Tcl to the
disk too, without delay. This gets us the reliable file
sizes the tests need to perform correctly, at the expense
of general I/O performance.
Logged In: NO
I agree 100% with your summary.
Logged In: YES
user_id=75003
Just for the record here are the results of running tclbench
for a tclsh with forced flushing (1) and without (2) for my
machine (Win NT 5, 128 MB). Used fcopy to exercise the I/O
system.
$ ./tcl/win/win-dll/tclsh84.exe tclbench/runbench.tcl \
    -match 'FCOPY*' -notk \
    -paths "./tcl/win/win-dll/ ./tcl.nf/win/win-dll/"
000 VERSIONS:               1:8.4a4  2:8.4a4
001 FCOPY binary: 164K      2320137    19575
002 FCOPY encoding: 164K    1583793    39857
003 FCOPY std: 164K         2435353    18588
003 BENCHMARKS              1:8.4a4  2:8.4a4
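To put those figures in proportion, the ratio between the two columns can be worked out with a few lines of Tcl. The numbers are copied from the output above; the proc name speedup is just an illustrative helper.

```tcl
# Ratio of the forced-flush timing (column 1) to the no-flush timing (column 2).
proc speedup {flushed unflushed} {
    return [expr {double($flushed) / $unflushed}]
}
set results {
    {"FCOPY binary"   2320137 19575}
    {"FCOPY encoding" 1583793 39857}
    {"FCOPY std"      2435353 18588}
}
foreach row $results {
    # Classic idiom for unpacking a list into variables.
    foreach {name flushed unflushed} $row break
    puts [format "%-15s %6.1fx" $name [speedup $flushed $unflushed]]
}
```

That works out to roughly 119x, 40x and 131x in favour of the build without forced flushing.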
Logged In: YES
user_id=75003
Ideas to solve this problem collected so far.
________________________________________
Just remove the forced OS flush for
Windows. Make the tests 'unixOnly'.
Anticipated Effects:
- Speedup for Windows I/O compared to
current solution.
- No change for the other platforms.
- The coverage of code paths by the
testsuite decreases. In other words,
the testsuite becomes worse.
________________________________________
Add counters in the channel structures
(on the driver side) to count how many
bytes were read and written to the OS.
Add testchannel subcommands to access this
information instead of using [file size].
The tests will have to be rewritten.
Anticipated Effects:
- General slowdown in the I/O system
for all platforms (Counter management).
Should be negligible though.
- Speedup for Windows I/O compared to
current solution.
- The testsuite stays in shape.
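A rough script-level illustration of the counter idea. The proposal puts the counters in the C driver; this sketch only mimics the bookkeeping from Tcl, and countedPuts, bytesToOS and countertest.bin are hypothetical names. It assumes unbuffered binary output of plain ASCII data, so that every [puts] goes straight to the OS and string length equals byte count.

```tcl
# Count bytes handed to the OS per channel instead of asking [file size].
proc countedPuts {chan data} {
    global bytesToOS
    if {![info exists bytesToOS($chan)]} {
        set bytesToOS($chan) 0
    }
    puts -nonewline $chan $data
    incr bytesToOS($chan) [string length $data]
}

set fid [open countertest.bin w]
# With -buffering none each write is passed to the OS immediately.
fconfigure $fid -translation binary -buffering none
countedPuts $fid "123456789012345678901234567890"
countedPuts $fid "abc"
puts "bytes handed to the OS: $bytesToOS($fid)"
close $fid
file delete countertest.bin
```

A test could then compare the counter with the expected byte count, which does not depend on how the OS reports the size of a file with pending writes.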
________________________________________
Handle the proposed counters only for Win*.
Write separate tests for Unix and Win*.
Anticipated Effects:
- Speedup for Windows.
- No change for the other platforms.
- The testsuite stays in shape.
________________________________________
Add a boolean flag to the Win* structures
(driver side). Indicates if a true flush
was done on the file channel.
Whenever a [file size] is requested the
system goes through the list of file
channels and does an OS flush on all with
the flag not set. The flag is set by this
action. Any write on the channel resets
the flag for that channel. When closing a
file channel do a true flush in the driver.
The testsuite needs no change.
Anticipated Effects:
- Slowdown of [file size] operation
for Win*.
- Speedup of Win* I/O in general.
- No change for the other platforms.
- Essentially emulates Unix behaviour
on Windows for Tcl.
- Adds interaction between the
filesystem and the I/O (channel)
code.
- The testsuite stays in shape.
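The flag idea can be modelled as pure bookkeeping, with no real I/O, to show why it saves work. Everything here (the model array and the modelOpen, modelWrite, modelSync and modelFileSize procs) is hypothetical. A real implementation would keep the flag in the Windows file channel structure, do an actual OS flush, and walk all open file channels rather than a single one.

```tcl
# Pure bookkeeping model of the "true flush done" flag; nothing touches the disk.
proc modelOpen {chan} {
    global model
    set model($chan,pending) 0   ;# bytes handed to the OS, not yet on disk
    set model($chan,disk)    0   ;# bytes the OS would report as file size
    set model($chan,synced)  1   ;# flag: true flush done since the last write
    set model($chan,syncs)   0   ;# number of OS flushes performed
}
proc modelWrite {chan nbytes} {
    global model
    incr model($chan,pending) $nbytes
    set model($chan,synced) 0    ;# any write resets the flag
}
proc modelSync {chan} {
    global model
    if {!$model($chan,synced)} {
        incr model($chan,disk) $model($chan,pending)
        set model($chan,pending) 0
        set model($chan,synced) 1
        incr model($chan,syncs)
    }
}
proc modelFileSize {chan} {
    global model
    modelSync $chan              ;# flush only if written since the last sync
    return $model($chan,disk)
}

modelOpen f1
for {set i 0} {$i < 2000} {incr i} {
    modelWrite f1 30
}
puts "size=[modelFileSize f1] syncs=$model(f1,syncs)"
```

In this model the 2000 writes from the reproducible script cost a single sync at the moment the size is requested, instead of one forced flush per write.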
Logged In: YES
user_id=75003
More ideas (coming from Jeff).
________________________________________
What happens on Windows if another process
opens the file? Does that process also
get the bogus file size?
________________________________________
Are there Win* APIs we could use to peek
into the buffering done by Windows?
We could use this instead of the counters.
Or we could use this in [file size] to
report a better size.
Logged In: YES
user_id=75003
Ideas from David Gravereaux:
The only thing I know is that if there are uncommitted
buffers the OS is holding, a request for file size won't
cause the OS to commit the buffers first.
A look at using I/O completion ports for writing to disk
from within Tcl >might< be a good work-around for tracking
what the OS hasn't committed yet. I can't say for sure.
The amount of code for tracking could get very large.
Adding an explicit flush to the channel driver might be the
best alternative, but explicit at the script level to the
user instead of the implicit one as it is now.
That's all I know.
>Hm. We have a flushproc in the driver, it is just not used yet.
>This could contain the OS-Flush on windows and be called by [flush]
>after it has committed the tcl buffers to the OS. This does not
>help with the tests which check file sizes to check the correctness
>of the 'implicit' flushes. And the moment we add the OS-flush to
>them we are back to the current situation.
Halfway there... add a [flush] to the tests, that will do
FlushFileBuffers() or whatever the API func was...
It's not the same. Make [flush] not only flush the channel
but commit the OS buffers, too. Normal mode flushing of the
channel buffer doesn't have to also mean flushing the OS
buffers.
Logged In: YES
user_id=75003
Added a patch solving the problem. Used the idea of a
boolean flag and flushing only the channels which were
written to, and only when requesting size information.
Fix, unified diff, v0
Logged In: YES
user_id=72656
Looks great.
Logged In: YES
user_id=75003
Committed to both head and core-8-3-1-branch.