FileName
expanding_entropy_5.txt
Title
Expand Entropy Availability in Random Data Acquisition by using the TIME factor.
Abstract
Our goal is to enhance Global Entropy Gathering, i.e., increase efficiency in Random-Data acquisition.
Here we demonstrate and exemplify our proposal:
To use TIME as a MULTIPLYING factor in an efficient way, meaning to extract MORE and BETTER data using FEWER inputs.
In a sense we will extract entropy from the flow of time, expressed as a hash routine frequency.
Time will be used as an extra dimension, acting as a MULTIPLIER by increasing the universe of possible states.
This method can be applied in either software or dedicated hardware.
Author
Dutra de Lacerda,
Email: administrator@mail.telepac.pt
Replace {administrator} with {dulac} in the above address.
Introduction
- "Engineering is making the most with less" - Definition by 'unknown'.
Obtaining Random numbers from events is difficult and slow.
Our goal is to increase the efficiency of Random data acquisition...
Let's examine the problem and solutions:
1st Level Solution
Data is usually obtained from the source(s) in a one-by-one process.
This is the basic "collect-and-add" of Event values, where the entropy of the data collected is event dependent.
This leads to hard choices about which event sources to use.
E = (E1 + E2 + ... + En)
2nd Level Solution
Using several sources, crossing them in a Mix-and-Smash cumulative process.
Here we may have improved the process.
E = F1(E1,1+E1,2+...+E1,m) +
F2(E2,1+E2,2+...+E2,m) +
... +
Fn(En,1+En,2+...+En,m)
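The "Mix-and-Smash" cumulative process can be sketched in a few lines of Python. The rotate-and-XOR mixing function and the sample values below are illustrative assumptions, not part of the original proposal:

```python
def mix_and_smash(sources):
    """Fold samples from several event sources into one 32-bit pool.

    Each source contributes whatever entropy its samples carry; the
    rotate-and-XOR step spreads every sample across the whole pool."""
    pool = 0
    for samples in sources:           # one list of raw samples per source
        for s in samples:
            pool = ((pool << 5) | (pool >> 27)) & 0xFFFFFFFF  # rotate left 5
            pool ^= s & 0xFFFFFFFF                            # smash sample in
    return pool

# Hypothetical sources: key codes, tick deltas, mouse movements.
pool = mix_and_smash([[0x41, 0x42], [17, 3, 29], [5, 11]])
```

Every sample perturbs the whole pool, but estimating the per-source entropy contribution remains the caller's problem.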
Doubts remain because we have also inflated the Quality Analysis problem.
3rd Level Solution
Another improvement (proposed here) is to synergetically use different vectors.
IFF they are fully unrelated AND there is full interaction between them, THEN these vectors may represent new dimensions, so:
E = F1(E1,1+E1,2+...+E1,m) x
F2(E2,1+E2,2+...+E2,m) x
... x
Fn(En,1+En,2+...+En,m)
At this point we have a real problem: the difficulty of analysing the interaction between Events. We drop this form as problematic and inefficient.
A simplification will be the basis for our demonstration, where these problems will be overcome.
The Hash Resource
In the previous solutions, HASHING can be used in two ways:
at Security and at Non-Security levels.
For its real value it can be used by both,
if the amount of data is reduced down to its real entropy level and that value is known and fully understood, as the entropy level can be problematic to calculate and/or to understand correctly.
Description
The Scientist says something is impossible to do and explains why.
If he is wrong (99% chance) then another scientist makes the correction ...
... and along comes a Technician who builds that something.
If he is right (1% chance) then there is much fuss about it ...
... and along comes an Engineer who works around the impossibility and creates that something.
Rationale
- "Hacking is true Engineering, the essence of Innovation and the basis of Technology".
The basis of Randomness is transformation. Transformation is what gives us the notion of TIME.
Time, from a CPU point of view, corresponds to any change of state in the system.
Time Vector
It will be our BASE environment. So it is very special and different from EVENT vectors.
We will represent the transformations in time by a fast hash function in order to improve our results.
Time will be the Multiplier factor IFF there is at least 1 bit of entropy in the time variation between one event and the next.
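The condition above (at least 1 bit of entropy in the time variation between events) can be checked with a rough Shannon estimate over observed tick deltas. The delta values below are made up for illustration:

```python
from collections import Counter
from math import log2

def timing_entropy_bits(deltas):
    """Shannon entropy (bits per event) of the observed tick deltas."""
    counts = Counter(deltas)
    n = len(deltas)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Hypothetical deltas between keypresses, in timer ticks:
deltas = [3, 7, 3, 12, 5, 7, 9, 3, 11, 6]
ok_to_multiply = timing_entropy_bits(deltas) >= 1.0
```

A degenerate stream (all deltas equal) yields 0 bits, so time would add nothing as a multiplier in that case.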
Event Vector
An Event is expressed by a value and also by a MOMENT in TIME.
Analysis
We will analyse a simplified (more workable and faster) version where:
E = Ft x ( F1[]+F2[]+...+Fn[] )
and will present an easy-to-follow example using just one Event source.
E = Ft x F1[]
where Ft is the Time vector. Being a special event, it is notated differently.
The increase in the entropy of the outputs is proportional to:
(hash quality + its frequency) x ( inputs interval variance )
Example
This example is based on Keyboard Input.
The Event used in this routine is Keyboard input: the input is the KeyPressed value, and its use here is validated by its variance in time.
The Multiplier is the set of idle-time transforms between Events, driven by a hash based on the cipher TEA, chosen for its efficiency.
The example is presented for demonstration purposes and can be modified so each KeyPressed creates one new 32-bit word.
Code
// --------- Get Event Status ---------------------------------- //
//
// Input: None
//
// Output:
// Zero flag Set = ZF = No KeyPressed
// Zero Flag Clear = NZ = Key Waiting
// ------------------------------------------------------------- //
function GetKeybStatus()
Mov ah, 11h
Int 16h
ret
// --------- Get Event Value ----------------------------------- //
//
// Input: None
//
// Output:
// AH = Keyboard Scan Code
// AL = ASCII Char
// ------------------------------------------------------------- //
function GetKeyPressed()
Mov ah, 10h
Int 16h
ret
// --------- Get Moment in Time -------------------------------- //
//
// Input: None
//
// Output:
// CX = High( TickCount )
// DX = Low ( TickCount )
// ------------------------------------------------------------- //
function GetTickCount()
Mov ah, 00h
Int 1Ah
ret
// --------- Get Random ---------------------------------------- //
// Input: None
//
// Output:
// RANDOM in CX:DX
//
// Note1: Uses unknown idle time
// Note2: Entropy is directly proportional to
// Loops done
// and inversely proportional to
// interrupting factor frequency
// ------------------------------------------------------------- //
function Random()
Init:
    mov CODES, 0000
    mov RAND, 00000000
    mov SUM, 00000000
    mov DELTA, 9E3779B9
Loop:
    add SUM, DELTA            // TEA-based fast hash step
    mov RAND, SUM
    rol RAND, 3
    call GetKeybStatus
    jz Tick                   // ZF set = no key waiting
    jmp Key                   // key waiting
Tick:
    Call GetTickCount
    add RAND, CX:DX           // fold in the tick count
    jmp Next
Key:
    Call GetKeyPressed
    Mov CODES, AX
    add Low(RAND), CODES
    jmp Next
Next:
    XOR RAND, SUM
Until:
    mov AL, Low(CODES)
    Cmp AL, 0Ah               // Line Feed ends the loop
    JE Return
    Cmp AL, 0Dh               // Carriage Return ends the loop
    JE Return
    Cmp AL, 1Bh               // Escape ends the loop
    JE Return
    JMP Loop
Return:
    MOV CX, High( RAND )
    MOV DX, Low ( RAND )
    ret
// ------------------------------------------------------------- //
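For readers without a DOS/BIOS environment, here is a rough Python sketch of the Random() routine above, with the keyboard and timer interrupts replaced by a caller-supplied event stream and time.perf_counter_ns. It is a port under those assumptions, not the original routine:

```python
import time

DELTA = 0x9E3779B9          # TEA's golden-ratio constant
MASK = 0xFFFFFFFF

def random32(events, stop_codes=(0x0A, 0x0D, 0x1B), max_iters=1_000_000):
    """TEA-style idle-transform loop; `events` yields key codes or None."""
    rand = total = 0
    for _ in range(max_iters):
        total = (total + DELTA) & MASK              # fast TEA-based hash step
        rand = ((total << 3) | (total >> 29)) & MASK  # rol 3
        code = next(events, None)
        if code is None:                            # idle: fold in the ticks
            rand = (rand + time.perf_counter_ns()) & MASK
        else:                                       # key: fold in the code
            rand = (rand + code) & MASK
        rand ^= total
        if code in stop_codes:                      # LF, CR or ESC ends it
            break
    return rand

# Hypothetical usage: a mostly-idle stream ending with a carriage return.
value = random32(iter([None, None, 0x41, None, 0x0D]))
```

The idle branch plays the role of the unknown number of transforms between keypresses; on a real system the entropy comes from when the user interrupts that loop, not from the key value itself.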
Notes
These routines have an undetermined number of cycles while waiting for a keypress at a random moment within a finite set of time slices related to the timer (programmable, on a PC, for smaller slices and thus a bigger set of values).
However, the number of transforms between events is the major vector of the whole process. It should be counted as an indicator value.
If true random events cannot be used, even then a good degree of randomness can be achieved using other system events, taking into account their properties... These events could be used instead or in association. Possible sources: Microphone, Mouse, Disk, etc.
Such changes in the code are simple and do not affect the main routine, with one exception: a "director" routine *must* be used to evaluate the entropy (de facto) present in the random routine state.
Conclusions
1 - Entropy is hard to get...
...but there's plenty around us in every event.
2 - The faces of a die are NOT the only measure of the entropy in a die.
3 - Time is one of the greatest unused resources.
4 - A Cipher Output is the closest thing to True-Random data.
5 - The number of Rounds is of major importance in the quality of the output.
6 - Data from one Event can also be used as password to other Event data.
7 - Multisource cross can also be used.
The concept of extracting entropy from time is simple though unexpected.
It should be considered as an optimizer and by no means eliminates the analysis
of the final estimated entropy level achieved in the resulting data.
Worst-case situations in each user's system should be expected and analysed.
An analyser of such cases could be implemented as a manager routine for
control and/or report.
FAQ
1-What is the value of E(Key) ?
Q: Even if you modify your code to generate a 32 bit word for each KeyPressed, it is still essential to estimate how much entropy the KeyPressed event contains (certainly not 32 bits).
A: Yes. An estimation of how much entropy is available *IS* essential. But No to the 32-bit statement:
Actually the Tick Counter has more than 8 bits, and, considering the entropy associated with the frequency of code transformations in the idle time, the total may exceed 64 bits for the whole process.
Notice that the heart of the routine is the idle transformations interrupted by a keypress. Hence the importance of the frequency of the transforms while idle. What is such a frequency in a 686 at 200MHz?
There is another consideration: the synergetic aspect of Events (Data) with the TEA-like transformation affecting all bits (Time). I suppose that cannot be measured with full accuracy, but it is present and we can estimate its value to control the whole process safely.
The variance between Event intervals (delta_ticks) should be proportional to the entropy multiplier.
So the Event Entropy Multiplier IS Event_data x Event_interval_variance,
E[f] = E[ev] x E[t_var]
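A back-of-the-envelope reading of E[f] = E[ev] x E[t_var], with the variance term expressed in bits. The per-key figure and the interval values are assumptions for illustration, not measurements:

```python
from math import log2
from statistics import pvariance

def entropy_multiplier(event_bits, intervals):
    """E[ev] in bits times a log2 spread term standing in for E[t_var]."""
    spread = pvariance(intervals)          # variance of the tick intervals
    t_var_bits = log2(spread) if spread > 1 else 0.0
    return event_bits * t_var_bits

# Hypothetical: ~4 bits per keypress, tick intervals with some jitter.
bits = entropy_multiplier(4.0, [30, 45, 28, 60, 33, 51])
```

With perfectly regular intervals the spread term collapses to zero, matching the text's condition that time only multiplies when the inter-event variation carries entropy.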
2a-Could E(Key) = 1bit ?
>Suppose each KeyPressed event actually generates 1 bit of entropy,
>and you need 64 bits of random data for a key.
Actually it is up to 4 or 5 bits, because keys are randomly chosen by the user instead of being chosen words to memorize. However there can be up to 49 possibilities if Shift/Alt/Ctrl keys are used together with all keys available.
This is possible because we are giving random keys that do NOT need to be remembered. The expected 2 bits of entropy refer to words, not random Keys.
As previously stated: the Key used is almost unimportant compared with the number of iterations, affecting all bits, in the idle time. You actually get a number of bits acting as a multiplier of the (idle-time) transform counter. Also to be considered is the variance of the idle transforms AND the variance of the interval between events.
And every Key press adds its code, which continues to be mixed by the following iterations, up to the end of the routine with a terminating Key.
2b-Shouldn't we Hash the final result ?
Q: Once you've collected the data from 64 KeyPressed calls, you should hash this data (64 of your 32-bit words) with a cryptographically secure hash. This will yield more secure data than using the data from your routine directly.
A: Hashing does NOT perform that miracle of increasing security as a rule.
(Anyway, our goal is to get random data efficiently, in size as in quality, over a short period of time... safely!)
Explanation:
Hashing does NOT increase the amount of entropy. What it does is a translation from ANY amount of data to a single-size token. It can only reduce the entropy of the input, never increase it:
Used in a file:
It reduces the FULL file entropy down to a fingerprint.
Used in a password:
It maximizes the use of the cipher's limited password size: plain English is limited to around 2 bits per char (so more chars are needed), and a bigger passphrase allows that limit to be approached.
Hashing then compresses such passphrases down to the password size of the cipher.
Q: How does it do that?
This compress-to-fingerprint property, where data is reduced (and so is the full entropy of the data, unless the data entropy is smaller than the fingerprint), is due to one of the properties of crypto hash methods: every bit transforms the output, differently depending on its position and on the previous values processed.
However it does NOT increase the entropy of a small password. It just translates... with loss.
As a result, the entropy present is maintained, from the minimum present in the passphrase up to the maximum value allowed by the hash. We need hashing because we need more chars... to make better use of the password size limitation of the cipher.
This is usually not understood, because understanding is usually the result of a study-and-store process instead of grokking the subject of study.
Note: Blowfish doesn't need hashing because it allows passwords up to 576 bits (72 chars). Theoretically this is 2 x 72 = 144 bits of entropy. Personally I don't accept this value as more than an indicator.
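The claim that hashing translates entropy but never increases it can be shown directly: a secret with 1 bit of entropy still has only two reachable digests after SHA-256, however wide the 256-bit output looks:

```python
import hashlib

# A 1-bit secret: the only possible inputs are b"0" and b"1".
digests = {hashlib.sha256(secret).hexdigest() for secret in (b"0", b"1")}

# 256 bits of output width, but still only 2 reachable values:
distinct_outputs = len(digests)
```

An attacker who knows the input space simply enumerates it, so the effective entropy after hashing is still 1 bit.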
3-How many bits can I trust?
Q: It would be very, very foolish to use the output of your code to directly fill a key structure (e.g. two calls at 32 bits each yields your 64-bit key).
A: That is not even slightly suggested. Please note:
If using the Keys only (and not the described method), then definitely yes!
But seeing this method in the 'usual' way is an incorrect view of the algorithm.
It is a Hybrid algorithm. Thus a different analysis is needed...
(Engineering is getting the best results with the smallest expenses)
4-Isn't (1 cycle = 1 bit) ?
>> Notice that the heart of the routine is the idle transformations
>> interrupted by a keypress. Thus the importance of the frequency
>> of the transforms while idle. What is such a frequency in a 686
>> at 200MHz?
>
>even with the generous assumption that there is only one keypress per
>1 second interval, randomly distributed, and a full CPU operation per
>clock cycle, this only leads you to a maximum of ~18 bits of entropy
>per keypress.
Yes, that may be one way to see the problem.
However one must keep in mind that Entropy does not come in packs.
It is necessary to evaluate the global process, and this is very
tricky when dealing with a chaotic system like this one. Because
that is what is happening here:
We have 2 different dimensional sources of entropy:
The least important is the Key Chosen; 27 Chars to choose from.
The most important is WHEN to press; this is out of the usual.
Other relevant variables are the total number of rounds used, the
internal primitive Hash that continues to roll, and the number of
times the path is disturbed, with each disturbance affecting the
final result. How many paths are there through the internal Hash?!?
If we assume 2^10 for the first idle-transform run (until the
first key), a very conservative approach, we must add to the
exponent all other events, like 2^4 from the key chosen, and
again for each following cycle.
For 10 Keys we will get some 2^140 possible paths, more
or less... only limited by the primitive hash falling into
deterministic situations like data loops. The weak part is the
inner hash running at full speed for millions of iterations.
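The 2^140 figure follows from adding exponents per cycle. A tiny sketch of that arithmetic, using the text's own conservative estimates (2^10 idle transforms and 2^4 key choices per keypress) as defaults:

```python
def path_exponent(keys, idle_bits=10, key_bits=4):
    """Total exponent of the path count: (idle_bits + key_bits) per key."""
    return keys * (idle_bits + key_bits)

exponent = path_exponent(10)   # 10 * (10 + 4) = 140, i.e. ~2^140 paths
```

The estimate is only an upper bound on distinct paths; correlated timing or hash cycles would reduce it, which is why the text flags deterministic data loops as the weak part.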
5-How fair is the calculation ?
>you cannot get more entropy, no matter what your algorithm
>is doing, unless it's pulling in another truly random source.
Right... and Wrong!
You are assuming fixed sources of entropy... You forget time
as a source. If a cube has 6 faces, how many slices does one
second have?!?
In a computer you have the frequency of usage of a piece of code
in a loop... What is its entropy?!? These are NOT common waters.
This is the inside of such a small piece of code.
How many possible values can we obtain?!?
How many paths are possible, knowing that User interference will
change the actual path in ONE of the Millions of cycles?
6-How many paths do I have?
>your algorithm could be useful as the input to a hash which was then
>used to fill crypto purposes, but it's not safe to use directly.
If I have 2^104 possible results I am confident I can get a good
64-bit password, completely Random.
As a contribution to the study of this situation, I would propose
considering the TOTAL number of cycles in the Loop... minus the
minimum delay between keystrokes times the number of keystrokes.
This value should be close to the number of possible paths.
To it should be added the combinations of the number of keys used.
I expect a value much larger than 2^64 for the final result.