comp.lang.ada
 help / color / mirror / Atom feed
* Simple hash or pseudo-random function
@ 2018-07-16  8:20 gautier_niouzes
  2018-07-16 13:34 ` Shark8
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: gautier_niouzes @ 2018-07-16  8:20 UTC (permalink / raw)


Hello,
Does someone have a function that would take an integer (range 0 .. 2**63), not uniformily distributed (it's a code with some meaning) and return a number that is pseudo-random, uniformily distributed (could be a floating-point number or an integer in a range of minimum length 365) ?
The difficulty for me is to find a very simple function. There are many that are for hashing strings, but they could be too time-consuming: it's for a number-crunching program where this function will be called billions of times, so I'm looking for something really simple (and fast :-) ).
TIA
_________________________ 
Gautier's Ada programming 
http://gautiersblog.blogspot.com/search/label/Ada 
NB: follow the above link for a valid e-mail address 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-16  8:20 Simple hash or pseudo-random function gautier_niouzes
@ 2018-07-16 13:34 ` Shark8
  2018-07-17  4:44   ` robin.vowels
  2018-07-16 14:52 ` Marius Amado-Alves
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Shark8 @ 2018-07-16 13:34 UTC (permalink / raw)


On Monday, July 16, 2018 at 2:20:28 AM UTC-6, gautier...@hotmail.com wrote:
> Hello,
> Does someone have a function that would take an integer (range 0 .. 2**63), not uniformily distributed (it's a code with some meaning) and return a number that is pseudo-random, uniformily distributed (could be a floating-point number or an integer in a range of minimum length 365) ?
> The difficulty for me is to find a very simple function. There are many that are for hashing strings, but they could be too time-consuming: it's for a number-crunching program where this function will be called billions of times, so I'm looking for something really simple (and fast :-) ).
> TIA

There was somebody a few years back who was an older Mathematician who posted a RNG function to this board some years ago. That might be useful for you. (I want to say his method was KISS#### or something like that.)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-16  8:20 Simple hash or pseudo-random function gautier_niouzes
  2018-07-16 13:34 ` Shark8
@ 2018-07-16 14:52 ` Marius Amado-Alves
  2018-07-16 20:59   ` gautier_niouzes
  2018-07-16 17:17 ` Jeffrey R. Carter
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Marius Amado-Alves @ 2018-07-16 14:52 UTC (permalink / raw)


Once I did this very simple function: create the target bitstring on N bits (your integer range 1..365, N=9) from N segments (or N bits at fixed intervals) of the source bitstring (your long integer).


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-16  8:20 Simple hash or pseudo-random function gautier_niouzes
  2018-07-16 13:34 ` Shark8
  2018-07-16 14:52 ` Marius Amado-Alves
@ 2018-07-16 17:17 ` Jeffrey R. Carter
  2018-07-16 21:14   ` gautier_niouzes
  2018-07-17  0:40 ` Dan'l Miller
  2018-07-17  6:44 ` Paul Rubin
  4 siblings, 1 reply; 12+ messages in thread
From: Jeffrey R. Carter @ 2018-07-16 17:17 UTC (permalink / raw)


On 07/16/2018 10:20 AM, gautier_niouzes@hotmail.com wrote:
> 
> Does someone have a function that would take an integer (range 0 .. 2**63), not uniformily distributed (it's a code with some meaning) and return a number that is pseudo-random, uniformily distributed (could be a floating-point number or an integer in a range of minimum length 365) ?
> The difficulty for me is to find a very simple function. There are many that are for hashing strings, but they could be too time-consuming: it's for a number-crunching program where this function will be called billions of times, so I'm looking for something really simple (and fast :-) ).

Not sure what you mean by "an integer in a range of minimum length 365". You 
could simply use a RNG to generate 64-bit values and xor them with your values 
if you want a 64-bit result. By setting the initial seed, the sequence would be 
repeatable. Ada.Numerics.Discrete_Random would probably serve. If it's too slow, 
the PragmAda Reusable Components includes KISS, a very fast, pretty good RNG. It 
produces 32-bit values, so you'd probably need to call it twice.

-- 
Jeff Carter
"If you don't get the President of the United States on that
phone, ... you're going to have to answer to the Coca-Cola
Company."
Dr. Strangelove
32

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-16 14:52 ` Marius Amado-Alves
@ 2018-07-16 20:59   ` gautier_niouzes
  0 siblings, 0 replies; 12+ messages in thread
From: gautier_niouzes @ 2018-07-16 20:59 UTC (permalink / raw)


Le lundi 16 juillet 2018 16:52:04 UTC+2, Marius Amado-Alves a écrit :
> Once I did this very simple function: create the target bitstring on N bits (your integer range 1..365, N=9) from N segments (or N bits at fixed intervals) of the source bitstring (your long integer).

Thanks, that's something I'll explore!


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-16 17:17 ` Jeffrey R. Carter
@ 2018-07-16 21:14   ` gautier_niouzes
  2018-07-17  6:09     ` Jeffrey R. Carter
  0 siblings, 1 reply; 12+ messages in thread
From: gautier_niouzes @ 2018-07-16 21:14 UTC (permalink / raw)


Le lundi 16 juillet 2018 19:17:49 UTC+2, Jeffrey R. Carter a écrit :

> Not sure what you mean by "an integer in a range of minimum length 365". You 
> could simply use a RNG to generate 64-bit values and xor them with your values 
> if you want a 64-bit result. By setting the initial seed, the sequence would be 
> repeatable. Ada.Numerics.Discrete_Random would probably serve. If it's too slow, 
> the PragmAda Reusable Components includes KISS, a very fast, pretty good RNG. It 
> produces 32-bit values, so you'd probably need to call it twice.

The 64-bit value is the *input* and the output is a function of that input only.
e.g.
10562032 gives always 211
31375393 gives always 31
85232830 gives always 172
NB: the input codes can appear in a different order, so a pseudo-random *sequence* cannot be used.

I've tested different RNG's by initializing them with the input code and using only the first pseudo-random value using that seed. The good news is that they seem uniformly distributed even with successive seed values, but they are not random enough when seeds are similar. I'll check Marius' solution, or a hash function like CRC.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-16  8:20 Simple hash or pseudo-random function gautier_niouzes
                   ` (2 preceding siblings ...)
  2018-07-16 17:17 ` Jeffrey R. Carter
@ 2018-07-17  0:40 ` Dan'l Miller
  2018-07-17  6:44 ` Paul Rubin
  4 siblings, 0 replies; 12+ messages in thread
From: Dan'l Miller @ 2018-07-17  0:40 UTC (permalink / raw)


On Monday, July 16, 2018 at 3:20:28 AM UTC-5, gautier...@hotmail.com wrote:
> Hello,
> Does someone have a function that would take an integer (range 0 .. 2**63), not uniformily distributed
> (it's a code with some meaning) and return a number that is pseudo-random, uniformily distributed
> (could be a floating-point number or an integer in a range of minimum length 365) ?

Do you mean, e.g., a modular-arithmetic integer of length 365 bits?  Of range 0..2**365-1?

I ask because bitlength is the customary measurement of “length“ of an integer in cryptography, but the lengths of integers in crypto tend to be 512, 1024, 2048, and so forth, not 365 per se.  Bitlength of an integer is also the customary measurement of “length” of an integer keys for depth-first walks of radix tries* as well (extrapolated from binary/radix-2 PATRICIA tries*), but the lengths of depth-first walks of radix tries tends to be word sizes (e.g., 16, 32, 64, or 128 bits) or the lengths of depth-first walks of PATRICIA tries tends to be bitwidth-of-character multiplied by length of string, not 365 per se.

* The more-correct ‘trie’ is sometimes misspelled ‘tree’, despite the O(1) growth of trie search-time as opposed to log(n) growth of tree search-time for trees, as the quantity n of leaves increases arbitrarily large.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-16 13:34 ` Shark8
@ 2018-07-17  4:44   ` robin.vowels
  0 siblings, 0 replies; 12+ messages in thread
From: robin.vowels @ 2018-07-17  4:44 UTC (permalink / raw)


On Monday, July 16, 2018 at 11:34:02 PM UTC+10, Shark8 wrote:
> On Monday, July 16, 2018 at 2:20:28 AM UTC-6, gautier...@hotmail.com wrote:
> > Hello,
> > Does someone have a function that would take an integer (range 0 .. 2**63), not uniformily distributed (it's a code with some meaning) and return a number that is pseudo-random, uniformily distributed (could be a floating-point number or an integer in a range of minimum length 365) ?
> > The difficulty for me is to find a very simple function. There are many that are for hashing strings, but they could be too time-consuming: it's for a number-crunching program where this function will be called billions of times, so I'm looking for something really simple (and fast :-) ).
> > TIA
> 
> There was somebody a few years back who was an older Mathematician who posted a RNG function to this board some years ago. That might be useful for you. (I want to say his method was KISS#### or something like that.)

That was George Marsaglia, and he was author of a number of
excellend pseudo random number generators.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-16 21:14   ` gautier_niouzes
@ 2018-07-17  6:09     ` Jeffrey R. Carter
  2018-07-18 13:38       ` gautier_niouzes
  0 siblings, 1 reply; 12+ messages in thread
From: Jeffrey R. Carter @ 2018-07-17  6:09 UTC (permalink / raw)


On 07/16/2018 11:14 PM, gautier_niouzes@hotmail.com wrote:
> 
> The 64-bit value is the *input* and the output is a function of that input only.
> e.g.
> 10562032 gives always 211
> 31375393 gives always 31
> 85232830 gives always 172
> NB: the input codes can appear in a different order, so a pseudo-random *sequence* cannot be used.
> 
> I've tested different RNG's by initializing them with the input code and using only the first pseudo-random value using that seed. The good news is that they seem uniformly distributed even with successive seed values, but they are not random enough when seeds are similar. I'll check Marius' solution, or a hash function like CRC.

That makes things clearer. I would have suggested what you tried. Perhaps you 
could modify it a bit so that, for a value N, you discard the 1st N rem M values 
from the RNG before generating the output for N, and see if that improves the 
randomness of the results. M should be small enough that this is fast enough for 
your requirements but large enough to be significantly different from what 
you've already tried.

Another possibility would be to try (calling your 64-bit type U64)

subtype S8 is String (1 .. 8);

function To_S8 is new Ada.Unchecked_Conversion (Source => U64, Target => S8);

Ada.Strings.Hash (To_S8 (N) );

If Hash is a decent hash function ("It should be unlikely for similar strings to 
return the same value.") then it might be sufficiently random.

-- 
Jeff Carter
"What's the amount of the insult?"
Never Give a Sucker an Even Break
104


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-16  8:20 Simple hash or pseudo-random function gautier_niouzes
                   ` (3 preceding siblings ...)
  2018-07-17  0:40 ` Dan'l Miller
@ 2018-07-17  6:44 ` Paul Rubin
  4 siblings, 0 replies; 12+ messages in thread
From: Paul Rubin @ 2018-07-17  6:44 UTC (permalink / raw)


gautier_niouzes@hotmail.com writes:
> The difficulty for me is to find a very simple function. There are
> many that are for hashing strings, but they could be too
> time-consuming: it's for a number-crunching program where this
> function will be called billions of times, so I'm looking for
> something really simple (and fast :-) ).

String-hashing functions can be very fast, and if you're -really- all
out for speed, there might be machine intrinsics that you could use.
Also you can take the exact characteristics of the input and output
into account when optimizing.

From your description "not uniformily distributed (it's a code with some
meaning)", this may be of interest:

https://en.wikipedia.org/wiki/Zobrist_hashing

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-17  6:09     ` Jeffrey R. Carter
@ 2018-07-18 13:38       ` gautier_niouzes
  2018-08-01  9:08         ` gautier_niouzes
  0 siblings, 1 reply; 12+ messages in thread
From: gautier_niouzes @ 2018-07-18 13:38 UTC (permalink / raw)


Am Dienstag, 17. Juli 2018 08:09:10 UTC+2 schrieb Jeffrey R. Carter:

> Another possibility would be to try (calling your 64-bit type U64)
> 
> subtype S8 is String (1 .. 8);
> 
> function To_S8 is new Ada.Unchecked_Conversion (Source => U64, Target => S8);
> 
> Ada.Strings.Hash (To_S8 (N) );
> 
> If Hash is a decent hash function ("It should be unlikely for similar strings to 
> return the same value.") then it might be sufficiently random.

Brillant, works as expected (after a few adjustments on the final range to be really uniform).

Looking at GNAT's s-strhas.adb :

      H := 0;
      for J in Key'Range loop
         H := Char_Type'Pos (Key (J))
                + Shift_Left (H, 6) + Shift_Left (H, 16) - H;
      end loop;

I guess it is "random" enough on small changes of Key (Key'First .. Key'Last - 1) or on small changes (+1 or -1) of N if U64's are stored as little-endian.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Simple hash or pseudo-random function
  2018-07-18 13:38       ` gautier_niouzes
@ 2018-08-01  9:08         ` gautier_niouzes
  0 siblings, 0 replies; 12+ messages in thread
From: gautier_niouzes @ 2018-08-01  9:08 UTC (permalink / raw)


Just a followup on the last remark: as expected, GNAT's Hash function gives consecutive values for consecutive characters at the end of the string. People may want to append some constant string to get randomness in the hash code.

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Hash;

procedure Hash_test is
begin
  Put_Line (Ada.Strings.Hash ("Abcd")'Image);
  Put_Line (Ada.Strings.Hash ("Bbcd")'Image);
  Put_Line (Ada.Strings.Hash ("Cbcd")'Image);
  Put_Line ("--");
  Put_Line (Ada.Strings.Hash ("abCd")'Image);
  Put_Line (Ada.Strings.Hash ("abDd")'Image);
  Put_Line (Ada.Strings.Hash ("abEd")'Image);
  Put_Line ("--");
  Put_Line (Ada.Strings.Hash ("abcD")'Image);
  Put_Line (Ada.Strings.Hash ("abcE")'Image);
  Put_Line (Ada.Strings.Hash ("abcF")'Image);
end Hash_test;

 14682274
 795269473
 1575856672
--
 3516536994
 3516602593
 3516668192
--
 3518636130
 3518636131
 3518636132

NB: same behavior for ObjectAda (reverting here to pure Ada 2005 syntax):

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Hash;
with Ada.Containers; use Ada.Containers;

procedure Hash_test is
begin
  Put_Line (Hash_Type'Image (Ada.Strings.Hash ("Abcd")));
  Put_Line (Hash_Type'Image (Ada.Strings.Hash ("Bbcd")));
  Put_Line (Hash_Type'Image (Ada.Strings.Hash ("Cbcd")));
  Put_Line ("--");                                    
  Put_Line (Hash_Type'Image (Ada.Strings.Hash ("abCd")));
  Put_Line (Hash_Type'Image (Ada.Strings.Hash ("abDd")));
  Put_Line (Hash_Type'Image (Ada.Strings.Hash ("abEd")));
  Put_Line ("--");                                    
  Put_Line (Hash_Type'Image (Ada.Strings.Hash ("abcD")));
  Put_Line (Hash_Type'Image (Ada.Strings.Hash ("abcE")));
  Put_Line (Hash_Type'Image (Ada.Strings.Hash ("abcF")));
end Hash_test;

 803596578
 1584183777
 2364770976
--
 10484002
 10549601
 10615200
--
 12583138
 12583139
 12583140

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-08-01  9:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-16  8:20 Simple hash or pseudo-random function gautier_niouzes
2018-07-16 13:34 ` Shark8
2018-07-17  4:44   ` robin.vowels
2018-07-16 14:52 ` Marius Amado-Alves
2018-07-16 20:59   ` gautier_niouzes
2018-07-16 17:17 ` Jeffrey R. Carter
2018-07-16 21:14   ` gautier_niouzes
2018-07-17  6:09     ` Jeffrey R. Carter
2018-07-18 13:38       ` gautier_niouzes
2018-08-01  9:08         ` gautier_niouzes
2018-07-17  0:40 ` Dan'l Miller
2018-07-17  6:44 ` Paul Rubin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox