[J3] SYSTEM_CLOCK
Vipul Parekh
parekhvs at gmail.com
Sat Jan 30 03:20:41 UTC 2021
On Fri, Jan 29, 2021 at 11:00 AM John Reid via J3 <j3 at mailman.j3-fortran.org>
wrote:
> ..
> Your idea of basing everything on a real R for the count rate on an
> image would not work for an old code using default integers on an image
> that has a fast clock. ..
>
John,
I am very much bothered by the sentence, "If an image has more than one clock,
the clock is determined by the kind of the integer arguments or is
processor dependent if there are no integer arguments."
All,
My experience is limited to a *single* clock on an image, one that is
generally fast but that the implementors might view as too fast for 32-bit
integer counts. So the processor might try to be "smart" and scale the
tick counts so that the clock serves as a good timer for the calling
program, whether the integer arguments are 32-bit or of a larger bit size.
Under the circumstances, the count rate becomes an arbitrary value such as
1,000 or 10,000 or 1,000,000, chosen simply to help with the scaling.
Conceptually, such an arbitrary fixed value for the tick rate is no
different from CLOCKS_PER_SEC in <time.h> with the C processor, which is an
implementation-defined constant (for example, 1,000,000 on POSIX systems),
or even the HZ constant with 'jiffies' on Linux kernel timers.
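To make the scaling idea concrete, here is a minimal sketch of how a fast
native tick count could be mapped onto a coarser count rate and wrapped at
a given maximum count. This is an illustration only, not how any
particular Fortran processor implements SYSTEM_CLOCK. The module name
tick_scaling, the function scale_ticks and all of its arguments are
hypothetical, and the sketch assumes the native rate is an exact multiple
of the target rate.

```fortran
module tick_scaling
   use, intrinsic :: iso_fortran_env, only : int64
   implicit none
contains
   ! Hypothetical helper: map a raw tick reading taken at native_rate
   ! (ticks per second) onto the coarser target_rate, wrapping the
   ! result so that it stays within 0..count_max.
   ! Assumptions: native_rate is an exact multiple of target_rate,
   ! raw >= 0, and count_max < huge(1_int64).
   pure function scale_ticks( raw, native_rate, target_rate, count_max ) &
         result( scaled )
      integer(int64), intent(in) :: raw, native_rate, target_rate, count_max
      integer(int64) :: scaled
      integer(int64) :: divisor
      divisor = native_rate / target_rate
      scaled = modulo( raw / divisor, count_max + 1_int64 )
   end function scale_ticks
end module tick_scaling
```

With a native rate of 10,000,000 and a target rate of 1,000, one second of
native ticks scales to 1,000 counts. With count_max = huge(1_int32), the
scaled count wraps back to 0 after 2,147,483,648 counts, which is roughly
24.9 days at 1,000 counts per second.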
Nonetheless, the processor has to ensure that the "scaled" value of COUNT
from SYSTEM_CLOCK, along with COUNT_RATE, is such that
real(COUNT,kind=K)/real(COUNT_RATE,kind=K) (where kind K corresponds to a
precision equal to or better than KIND(1D0)) provides an approximation of
the "processor time". Or better yet, that
real(COUNT2-COUNT1,kind=K)/real(COUNT_RATE,kind=K), where COUNT2 and COUNT1
are 2 readings via consistent invocations of SYSTEM_CLOCK, provides a
reasonable measure of the DELTA processor time elapsed between certain
processor instructions. This appears only natural, for it supports the
common use of the SYSTEM_CLOCK intrinsic in instrumented code. This can be
seen in the code example using MATMUL below; the program behavior with 2
different processors on Windows is shown first:
--- program output: gfortran ---
Timer                   Count Before        Count After         Count Rate
SYSTEM_CLOCK 32-bit     425628421           425629437           1000
SYSTEM_CLOCK 64-bit     4256544112416       4256554369340       10000000
Windows Tick Count      425628421           425629437           1000

Timer                   MATMUL CPU Usage
SYSTEM_CLOCK 32-bit     1.0160000000000000
SYSTEM_CLOCK 64-bit     1.0256924000000001
Windows Tick Count      1.0160000000000000
--- end output ---
--- program output: Intel Fortran ---
Timer                   Ticks Before        Ticks After         Rate
SYSTEM_CLOCK 32-bit     716841390           716857990           10000
SYSTEM_CLOCK 64-bit     1611970172139000    1611970173799000    1000000
Windows Tick Count      425542500           425544156           1000

Timer                   MATMUL CPU Usage
SYSTEM_CLOCK 32-bit     1.660000000000000
SYSTEM_CLOCK 64-bit     1.660000000000000
Windows Tick Count      1.656000000000000
--- end output ---
What this example shows is the common use of SYSTEM_CLOCK, where some
instrumentation toward a timing study is attempted and ultimately the value
of interest is merely the DELTA "time" in a floating-point representation
(e.g., real(COUNT2-COUNT1,kind=K)/real(COUNT_RATE,kind=K)). On this basis,
the distinction between a "32-bit clock" and a "64-bit clock" (or a
not-so-fast or fast clock) doesn't seem to matter all that much. It may
make a difference when the DELTA "time" is quite short, but then it's
difficult to read much into results when the elapsed times are too short
anyway.
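As a minimal sketch of the DELTA computation described above, the
expression can be wrapped in a small helper. The module name delta_timer
and the function elapsed_seconds are hypothetical names used only for
illustration. The sketch assumes both readings and the rate come from
SYSTEM_CLOCK calls with integer arguments of the same kind, and that the
count did not wrap between the two readings.

```fortran
module delta_timer
   use, intrinsic :: iso_fortran_env, only : int64, real64
   implicit none
contains
   ! Hypothetical helper: elapsed time in seconds between two readings.
   ! Assumptions: count1, count2 and count_rate were obtained with
   ! integer arguments of the same kind (converted to int64 here), and
   ! no wrap past COUNT_MAX occurred between the two readings.
   pure function elapsed_seconds( count1, count2, count_rate ) result( dt )
      integer(int64), intent(in) :: count1, count2, count_rate
      real(real64) :: dt
      dt = real( count2 - count1, kind=real64 ) / real( count_rate, kind=real64 )
   end function elapsed_seconds
end module delta_timer
```

Using the gfortran readings shown earlier, the 32-bit pair (425628421,
425629437) with rate 1000 gives 1.016 seconds, and the 64-bit pair
(4256544112416, 4256554369340) with rate 10000000 gives 1.0256924 seconds.
Pairing the 64-bit readings with the 32-bit rate of 1000 instead would
report about 10256.9 seconds, which is why the readings and the rate need
to come from consistent invocations.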
Keeping this example in mind, some questions:
1) What are the uses of SYSTEM_CLOCK besides the one shown above, which
helps approximate a DELTA "time" for some processor instructions?
2) Is there really more than one clock on an image with the processors
currently in use, or is it just one which returns a certain tick count?
3) If it's just one clock whose COUNTs the Fortran processor scales
relative to an arbitrary count rate, why introduce terminology in
the standard about more than one clock? It will only confuse readers.
4) If the most common, or perhaps the only, use of SYSTEM_CLOCK is as a
DELTA timer, and where the time values need to be in floating-point to be
useful, why not orient the intrinsic toward the same? That is what the
example in the current standard appears to attempt.
5) Can someone show an example of an existing code with default integers
that does not work with more than one clock?
Thanks,
Vipul Parekh
The results for SYSTEM_CLOCK shown above are based on this:
--- begin code ---
program t
   use, intrinsic :: iso_fortran_env, only : int32, int64, real64
   implicit none
   integer(int32) :: t0b, rate0b, t0a, rate0a
   integer(int64) :: t1b, rate1b, t1a, rate1a
   integer(int64) :: t2b, rate2b, t2a, rate2a
   integer, parameter :: N = 2048
   real(real64), allocatable :: a(:,:), b(:,:), c(:,:)
   character(len=*), parameter :: fmtg = "(g0,t25,g0,t45,g0,t65,g0)"
   allocate( a(N,N), b(N,N), c(N,N) )
   call random_number( a )
   call random_number( b )
   ! Readings before the work: 32-bit, 64-bit, then the Windows API
   call system_clock( count=t0b, count_rate=rate0b )
   call system_clock( count=t1b, count_rate=rate1b )
   call sys_clock( ticks=t2b, rate=rate2b )
   c = matmul( a, b )
   ! Readings after the work, in the same order
   call system_clock( count=t0a, count_rate=rate0a )
   call system_clock( count=t1a, count_rate=rate1a )
   call sys_clock( ticks=t2a, rate=rate2a )
   print fmtg, "Timer", "Count Before", "Count After", "Count Rate"
   print fmtg, "SYSTEM_CLOCK 32-bit", t0b, t0a, rate0b
   print fmtg, "SYSTEM_CLOCK 64-bit", t1b, t1a, rate1b
   print fmtg, "Windows Tick Count", t2b, t2a, rate2b
   print *
   print fmtg, "Timer", "MATMUL CPU Usage"
   print fmtg, "SYSTEM_CLOCK 32-bit", real((t0a-t0b), real64)/rate0b
   print fmtg, "SYSTEM_CLOCK 64-bit", real((t1a-t1b), real64)/rate1b
   print fmtg, "Windows Tick Count", real((t2a-t2b), real64)/rate2b
contains
   subroutine sys_clock( ticks, rate )
      use, intrinsic :: iso_c_binding, only : c_long_long
      interface
         function GetTickCount64() result(ticks) &
               bind(C, name="GetTickCount64")
            ! Microsoft API <sysinfoapi.h>
            ! ULONGLONG GetTickCount64();
            import :: c_long_long
            integer(c_long_long) :: ticks
         end function GetTickCount64
      end interface
      ! GetTickCount64 reports milliseconds
      integer(c_long_long), parameter :: FREQUENCY = 1000_c_long_long
      ! Argument list
      integer(c_long_long), intent(out) :: ticks
      integer(c_long_long), intent(out) :: rate
      ticks = GetTickCount64()
      rate = FREQUENCY
   end subroutine sys_clock
end program t
--- end code ---
Compilation and execution:
--- begin console output ---
C:\Temp>gfortran -O3 t.f90 -o t.exe
C:\Temp>t.exe
Timer                   Count Before        Count After         Count Rate
SYSTEM_CLOCK 32-bit     431543234           431544203           1000
SYSTEM_CLOCK 64-bit     4315692302527       4315701959260       10000000
Windows Tick Count      431543234           431544203           1000

Timer                   MATMUL CPU Usage
SYSTEM_CLOCK 32-bit     0.96899999999999997
SYSTEM_CLOCK 64-bit     0.96567329999999996
Windows Tick Count      0.96899999999999997
C:\Temp>
--- end console output ---