comp.lang.ada
 help / color / mirror / Atom feed
* Performance of records with variant parts
@ 2021-03-22 17:02 John Perry
  2021-03-22 17:32 ` Jeffrey R. Carter
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: John Perry @ 2021-03-22 17:02 UTC (permalink / raw)


Hello all

I encountered a significant penalty when using a record with variant parts in gnat community edition, on both MacOS and Linux. Suppose I have the following. (Kind of long, sorry.)

| type Object_Type is ( Sphere, Plane, None );
| type Surface_Type is ( Shiny, Checkerboard );
| type Thing_Type(kind: Object_Type := None; surface: Surface_Type := Shiny)
| is record
|    case kind is
|       when None =>
|          null;
|       when Sphere =>
|          radius2: Long_Float := 0.0;
|          center: Vector;
|       when Plane =>
|          offset: Long_Float := 0.0;
|          norm: Vector;
|    end case;
| end record;

Noticing that the fields are actually the same types, but different names, I also tried this:

| type Thing_Type is record
|    Kind: Object_Type := None;
|    Surface: Surface_Type := Shiny;
|    Size: Long_Float := 0.0;
|    Pos: Vector;
| end record;

This second version doesn't seem a good idea, but compiled output runs 15-30% faster. As far as I can tell***, it's related to this test which gets called ~3 million times:

|       case obj.kind is
|       when Sphere =>
|          -- do stuff
|       when Plane =>
|          -- do other stuff
|       when None =>
|          null;
|       end case;

Is there a reason that the record with variant parts runs so much slower? I can make the complete source available if need be.

john perry

***"as far as I can tell": gprof  stated at one point that this procedure sucks up most of the time; it's one of only two places where the discriminant is evaluated, and it's the only one that gets called A LOT; and this one change makes for the difference in performance.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 17:02 Performance of records with variant parts John Perry
@ 2021-03-22 17:32 ` Jeffrey R. Carter
  2021-03-22 17:49   ` John Perry
  2021-03-22 17:39 ` Dmitry A. Kazakov
  2021-03-22 18:59 ` Niklas Holsti
  2 siblings, 1 reply; 24+ messages in thread
From: Jeffrey R. Carter @ 2021-03-22 17:32 UTC (permalink / raw)


On 3/22/21 6:02 PM, John Perry wrote:
> 
> |       case obj.kind is
> |       when Sphere =>
> |          -- do stuff
> |       when Plane =>
> |          -- do other stuff
> |       when None =>
> |          null;
> |       end case;
> 
> Is there a reason that the record with variant parts runs so much slower? I can make the complete source available if need be.

1. Why is Surface a discriminant?
2. What are the quantitative timing requirements for the program? Does the 
difference affect whether or not they are met?
3. What is the declaration of Obj?

-- 
Jeff Carter
"One day he told me he was a gynecologist. He
couldn't speak no foreign languages."
Take the Money and Run
147

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 17:02 Performance of records with variant parts John Perry
  2021-03-22 17:32 ` Jeffrey R. Carter
@ 2021-03-22 17:39 ` Dmitry A. Kazakov
  2021-03-22 17:45   ` John Perry
  2021-03-22 18:59 ` Niklas Holsti
  2 siblings, 1 reply; 24+ messages in thread
From: Dmitry A. Kazakov @ 2021-03-22 17:39 UTC (permalink / raw)


On 2021-03-22 18:02, John Perry wrote:

> Is there a reason that the record with variant parts runs so much slower? I can make the complete source available if need be.

It must check the discriminant if the object is dynamically constrained. 
Checking the discriminant and accessing a scalar field is roughly in the 
same league, so 15-30% penalty is a quite good outcome. I would expect 
50% or even worse with cache misses between independent memory accesses.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 17:39 ` Dmitry A. Kazakov
@ 2021-03-22 17:45   ` John Perry
  2021-03-22 18:07     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 24+ messages in thread
From: John Perry @ 2021-03-22 17:45 UTC (permalink / raw)


On Monday, March 22, 2021 at 12:39:54 PM UTC-5, Dmitry A. Kazakov wrote:
> On 2021-03-22 18:02, John Perry wrote: 
> 
> > Is there a reason that the record with variant parts runs so much slower? I can make the complete source available if need be.
> It must check the discriminant if the object is dynamically constrained.

I'm not quite sure I understand. Are you saying that it has to check the discriminant at run-time? How would that differ from checking the value if it were merely a field of the record?

(Sorry if this is is a dumb question, but I guess I don't understand variant records as well as I thought.)

> Checking the discriminant and accessing a scalar field is roughly in the 
> same league, so 15-30% penalty is a quite good outcome. I would expect 
> 50% or even worse with cache misses between independent memory accesses. 

OK, so 15-30% is actually good news. Interesting.

john perry

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 17:32 ` Jeffrey R. Carter
@ 2021-03-22 17:49   ` John Perry
  2021-03-22 17:54     ` John Perry
  2021-03-22 19:31     ` Jeffrey R. Carter
  0 siblings, 2 replies; 24+ messages in thread
From: John Perry @ 2021-03-22 17:49 UTC (permalink / raw)


On Monday, March 22, 2021 at 12:32:39 PM UTC-5, Jeffrey R. Carter wrote:
> On 3/22/21 6:02 PM, John Perry wrote: 
> > 
> > | case obj.kind is 
> > | when Sphere => 
> > | -- do stuff 
> > | when Plane => 
> > | -- do other stuff 
> > | when None => 
> > | null; 
> > | end case; 
> > 
> > Is there a reason that the record with variant parts runs so much slower? I can make the complete source available if need be.
> 1. Why is Surface a discriminant? 

That's a good question. I don't remember why I made it a discriminant. That's easily fixed.

> 2. What are the quantitative timing requirements for the program? Does the 
> difference affect whether or not they are met? 

This is a raytracing program. There are no particular timing requirements, but if I wanted a larger image with more objects things would get ugly. This is really more a matter of my curiosity.

> 3. What is the declaration of Obj?

Obj is an "in" parameter to a function. Do you need more than that?

john perry

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 17:49   ` John Perry
@ 2021-03-22 17:54     ` John Perry
  2021-03-22 19:31     ` Jeffrey R. Carter
  1 sibling, 0 replies; 24+ messages in thread
From: John Perry @ 2021-03-22 17:54 UTC (permalink / raw)


On Monday, March 22, 2021 at 12:49:59 PM UTC-5, John Perry wrote:
> On Monday, March 22, 2021 at 12:32:39 PM UTC-5, Jeffrey R. Carter wrote: 
> > 1. Why is Surface a discriminant?
> That's a good question. I don't remember why I made it a discriminant. That's easily fixed.

(...but doesn't affect the performance)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 17:45   ` John Perry
@ 2021-03-22 18:07     ` Dmitry A. Kazakov
  2021-03-22 18:23       ` John Perry
  0 siblings, 1 reply; 24+ messages in thread
From: Dmitry A. Kazakov @ 2021-03-22 18:07 UTC (permalink / raw)


On 2021-03-22 18:45, John Perry wrote:
> On Monday, March 22, 2021 at 12:39:54 PM UTC-5, Dmitry A. Kazakov wrote:
>> On 2021-03-22 18:02, John Perry wrote:
>>
>>> Is there a reason that the record with variant parts runs so much slower? I can make the complete source available if need be.
>> It must check the discriminant if the object is dynamically constrained.
> 
> I'm not quite sure I understand. Are you saying that it has to check the discriminant at run-time?

Yes, if the field is in the variant part and the discriminant cannot be 
statically deduced to indicate that part [or not, so that it could 
always raise Constraint_Error]

> How would that differ from checking the value if it were merely a field of the record?

The field is always there. I think if you made it an Unchecked_Union you 
would have no penalty (and no safety).

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 18:07     ` Dmitry A. Kazakov
@ 2021-03-22 18:23       ` John Perry
  2021-03-22 20:30         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 24+ messages in thread
From: John Perry @ 2021-03-22 18:23 UTC (permalink / raw)


On Monday, March 22, 2021 at 1:07:02 PM UTC-5, Dmitry A. Kazakov wrote:
> On 2021-03-22 18:45, John Perry wrote: 
> > On Monday, March 22, 2021 at 12:39:54 PM UTC-5, Dmitry A. Kazakov wrote: 
> >> It must check the discriminant if the object is dynamically constrained. 
> > 
> > I'm not quite sure I understand. Are you saying that it has to check the discriminant at run-time?
> Yes, if the field is in the variant part and the discriminant cannot be 
> statically deduced to indicate that part [or not, so that it could 
> always raise Constraint_Error]

I think I understand that, but the check for "obj.kind" is there in both versions of the code. Isn't that a dynamic check even when I use a non-variant record? What's the difference in the check when I'm checking the discriminant of a variant record and the value of a field?

(Sorry if I'm not expressing myself well.)

> > How would that differ from checking the value if it were merely a field of the record?
> The field is always there. I think if you made it an Unchecked_Union you 
> would have no penalty (and no safety).

...and no access to check obj.kind, either :-(

john perry

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 17:02 Performance of records with variant parts John Perry
  2021-03-22 17:32 ` Jeffrey R. Carter
  2021-03-22 17:39 ` Dmitry A. Kazakov
@ 2021-03-22 18:59 ` Niklas Holsti
  2021-03-22 21:54   ` John Perry
  2 siblings, 1 reply; 24+ messages in thread
From: Niklas Holsti @ 2021-03-22 18:59 UTC (permalink / raw)


On 2021-03-22 19:02, John Perry wrote:
> Hello all
> 
> I encountered a significant penalty when using a record with variant
> parts in gnat community edition, on both MacOS and Linux. Suppose I have
> the following. (Kind of long, sorry.)
> 
> | type Object_Type is ( Sphere, Plane, None );
> | type Surface_Type is ( Shiny, Checkerboard );
> | type Thing_Type(kind: Object_Type := None; surface: Surface_Type := Shiny)
> | is record
> |    case kind is
> |       when None =>
> |          null;
> |       when Sphere =>
> |          radius2: Long_Float := 0.0;
> |          center: Vector;
> |       when Plane =>
> |          offset: Long_Float := 0.0;
> |          norm: Vector;
> |    end case;
> | end record;
> 
> Noticing that the fields are actually the same types, but different
> names, I also tried this:
> 
> | type Thing_Type is record
> |    Kind: Object_Type := None;
> |    Surface: Surface_Type := Shiny;
> |    Size: Long_Float := 0.0;
> |    Pos: Vector;
> | end record;
> 
> This second version doesn't seem a good idea, but compiled output 
> runs 15-30% faster. As far as I can tell***, it's related to this
> test which gets called ~3 million times:
> 
> |       case obj.kind is
> |       when Sphere =>
> |          -- do stuff
> |       when Plane =>
> |          -- do other stuff
> |       when None =>
> |          null;
> |       end case;
> 
> Is there a reason that the record with variant parts runs so much 
> slower? I can make the complete source available if need be.


What is your optimization level? -O2?

After the access to read obj.kind, I would not expect a significant 
slow-down in accessing the discriminant-dependent components of the 
record in each branch of the case statement, because the compiler should 
"know" what the discriminant is, and should not check it again -- at 
least if the record is a constant, as "in" parameters are supposed to be 
(in the absence of aliasing). Moreover, in your case the compiler should 
statically know the offset of each discriminant-dependent component, so 
accessing such a component should not require any overhead compared to a 
record without discriminants.

However, in the past I've seen unexpectedly slow code for copying 
records with discriminants. For example, I once measured a simple "swap" 
operation of this form, where Rec_Type had a single discriminant, with 
about 5 values, all components were of scalar type, and the number of 
discriminant-dependent components was about 4 in each case:

    procedure Swap (This, That : in out Rec_Type)
    is
       X : constant Rec_Type := This;
    begin
       This := That;
       That := X;
    end Swap;

On a SPARC v7 target, this procedure took more than 1000 clock cycles. I 
had a look at the machine code, and there seemed to be a ridiculous 
amount of code to compute how much space should be allocated for X, 
depending on the discriminant of This, and how much data should be 
copied in that initialization and in each of the two assignments. It 
seemed to me that the compiler was using some very general approach to 
code generation for record types with variants, and did not try much to 
simplify simple cases like this, or like your code.

But I don't see any similar variable-initialization or record-copying 
code in your example, so my experience with Swap may not be relevant.

If the compiler is stupidly checking the discriminant of obj for each 
obj-component access, you may be able to speed up the code by creating 
separate sub-procedures for "doing stuff" for each value of the 
discriminant (Sphere, Plane, ...), each having an "in" parameter of a 
constrained sub-type of Thing_Type, for example

    subtype Sphere_Thing is Thing_Type (Kind => Sphere);

    procedure Do_Sphere_Stuff (obj : in Sphere_Thing)
    is ... begin ... end Do_Sphere_Stuff;

    ...
    case obj.kind is
    when Sphere => Do_Sphere_Stuff (obj);
    ...

Of course this adds call overhead, unless the procedures are in-lined -- 
but the possible speed-up from the discriminant-constrained subtype may 
be lost if the compiler's in-lining methods are equally stupid. Perhaps 
worth a try, still.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 17:49   ` John Perry
  2021-03-22 17:54     ` John Perry
@ 2021-03-22 19:31     ` Jeffrey R. Carter
  2021-03-22 22:11       ` John Perry
  1 sibling, 1 reply; 24+ messages in thread
From: Jeffrey R. Carter @ 2021-03-22 19:31 UTC (permalink / raw)


On 3/22/21 6:49 PM, John Perry wrote:
> 
> This is a raytracing program. There are no particular timing requirements, but if I wanted a larger image with more objects things would get ugly. This is really more a matter of my curiosity.

If you have no timing requirements, then you don't care how long it takes and 
have no issue.

> 
> Obj is an "in" parameter to a function. Do you need more than that?

Yes.

What compiler and optimization option are you using? I would expect any compiler 
(except GNAT at -O0) to optimize away any discriminant checks on direct accesses 
to variant components in the branches of the case statement.

Is Obj passed to any subprograms from the branches of the case statement? Are 
any variant components accessed outside of the case statement?

-- 
Jeff Carter
"One day he told me he was a gynecologist. He
couldn't speak no foreign languages."
Take the Money and Run
147

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 18:23       ` John Perry
@ 2021-03-22 20:30         ` Dmitry A. Kazakov
  0 siblings, 0 replies; 24+ messages in thread
From: Dmitry A. Kazakov @ 2021-03-22 20:30 UTC (permalink / raw)


On 2021-03-22 19:23, John Perry wrote:

> I think I understand that, but the check for "obj.kind" is there in both versions of the code. Isn't that a dynamic check even when I use a non-variant record? What's the difference in the check when I'm checking the discriminant of a variant record and the value of a field?

1. Depending on the optimization level the compiler might keep the 
second check in place when accessing the variant field regardless the 
selected case statement alternative.

2. A varying size object might be allocated on the secondary stack 
rather than on the program stack. That could indict a sufficient penalty

3. Variant alternatives could be allocated outside the object. That 
would require an additional indirection and thus penalties when copying 
objects and accessing fields.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 18:59 ` Niklas Holsti
@ 2021-03-22 21:54   ` John Perry
  0 siblings, 0 replies; 24+ messages in thread
From: John Perry @ 2021-03-22 21:54 UTC (permalink / raw)


On Monday, March 22, 2021 at 1:59:34 PM UTC-5, Niklas Holsti wrote:
> What is your optimization level? -O2? 

I've tried several. The one with the best performance is -O3 -funroll-loops -gnatn.

> Moreover, in your case the compiler should 
> statically know the offset of each discriminant-dependent component, so 
> accessing such a component should not require any overhead compared to a 
> record without discriminants. 

Precisely this perplexes me. Nothing in this section of the code would be unknown after the discriminant check. -- Nothing that seems obvious to me, anyway. I'll copy some code in a reply to someone else in a moment.

> However, in the past I've seen unexpectedly slow code for copying 
> records with discriminants. ...
> [very interesting material excised for brevity]
> If the compiler is stupidly checking the discriminant of obj for each 
> obj-component access, you may be able to speed up the code by creating 
> separate sub-procedures

I liked this idea and thought it might be the problem, but that also didn't help out. I may have to look at the assembly, which I am loath to do. :-/

john perry

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 19:31     ` Jeffrey R. Carter
@ 2021-03-22 22:11       ` John Perry
  2021-03-23  9:31         ` Jeffrey R. Carter
  0 siblings, 1 reply; 24+ messages in thread
From: John Perry @ 2021-03-22 22:11 UTC (permalink / raw)


On Monday, March 22, 2021 at 2:31:46 PM UTC-5, Jeffrey R. Carter wrote:
> On 3/22/21 6:49 PM, John Perry wrote: 
> > Obj is an "in" parameter to a function. Do you need more than that?
> Yes. 
> 
> What compiler and optimization option are you using? I would expect any compiler 
> (except GNAT at -O0) to optimize away any discriminant checks on direct accesses 
> to variant components in the branches of the case statement. 
> 
> Is Obj passed to any subprograms from the branches of the case statement? Are 
> any variant components accessed outside of the case statement?

The compiler on the computer I'm using at the moment is GNAT Community 2020 (20200429-93) on x86_64-redhat-linux. I've also tested on MacOS, but I think that one is Community 2019. Both exhibit identical behavior.

Current optimization is -O3 -funroll-loops -gnatn, but the behavior remains consistent.

The full subprogram as it currently stands is below. As indicated in the OP, the *only* differences between this code and the non-variant code occurs lie in the variance of Thing_Type and the field names.

|  function Object_Intersect( obj: in Thing_Type; ray: in Ray_Type )
|                            return Long_Float
|  is
|  begin
|     case obj.kind is
|     when Sphere =>
|        declare
|           eo: Vector := obj.center - ray.start;
|           v: Long_Float := eo * ray.dir;
|        begin
|           if v > 0.0 then
|              declare
|                 disc: Long_Float := obj.radius2 - ( eo * eo - v * v );
|              begin
|                 if disc > 0.0 then
|                    return v - Sqrt(disc);
|                 end if;
|              end;
|           end if;
|        end;
|     when Plane =>
|        declare
|           denom: Long_Float := obj.norm * ray.dir;
|        begin
|           if denom <= 0.0 then
|              return ( obj.norm * ray.start + obj.offset ) / ( -denom );
|           end if;
|        end;
|        when None =>
|           return far_away;
|     end case;
|     return Far_Away;
|  end Object_Intersect;

john perry

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-22 22:11       ` John Perry
@ 2021-03-23  9:31         ` Jeffrey R. Carter
  2021-03-23 14:27           ` Simon Wright
  2021-03-23 23:00           ` John Perry
  0 siblings, 2 replies; 24+ messages in thread
From: Jeffrey R. Carter @ 2021-03-23  9:31 UTC (permalink / raw)


On 3/22/21 11:11 PM, John Perry wrote:
> 
> Current optimization is -O3 -funroll-loops -gnatn, but the behavior remains consistent.
> 
> |  function Object_Intersect( obj: in Thing_Type; ray: in Ray_Type )
> |                            return Long_Float
> |  is
> |  begin
> |     case obj.kind is
> |     when Sphere =>
> |        declare
> |           eo: Vector := obj.center - ray.start;
> |           v: Long_Float := eo * ray.dir;
> |        begin
> |           if v > 0.0 then
> |              declare
> |                 disc: Long_Float := obj.radius2 - ( eo * eo - v * v );
> |              begin
> |                 if disc > 0.0 then
> |                    return v - Sqrt(disc);
> |                 end if;
> |              end;
> |           end if;
> |        end;
> |     when Plane =>
> |        declare
> |           denom: Long_Float := obj.norm * ray.dir;
> |        begin
> |           if denom <= 0.0 then
> |              return ( obj.norm * ray.start + obj.offset ) / ( -denom );
> |           end if;
> |        end;
> |        when None =>
> |           return far_away;
> |     end case;
> |     return Far_Away;
> |  end Object_Intersect;

Interesting. With that optimization I would not expect this to differ noticeably 
from a record without discriminants.

I note that all your nested declarations could be declared constant.

Use of Long_Float is completely non-portable.

If you want to look into this further, IIRC, there is an option that will output 
a low-level, Ada-like description of the IR that the front end produces. You'll 
have to look in the top-secret GNAT documentation to find it.

-- 
Jeff Carter
"[I]f you ask, 'Why does Ada do/have this?',
the answer often makes you a better programmer."
Chad R. Meiners
177

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-23  9:31         ` Jeffrey R. Carter
@ 2021-03-23 14:27           ` Simon Wright
  2021-03-23 23:00           ` John Perry
  1 sibling, 0 replies; 24+ messages in thread
From: Simon Wright @ 2021-03-23 14:27 UTC (permalink / raw)


"Jeffrey R. Carter" <spam.jrcarter.not@spam.not.acm.org> writes:

> If you want to look into this further, IIRC, there is an option that
> will output a low-level, Ada-like description of the IR that the front
> end produces. You'll have to look in the top-secret GNAT documentation
> to find it.

-gnatG (-gnatGnn to output in lines up to nn long, vs default 72) to
 stdout

-gnatD (-gnatDnn likewise). The help (gnatmake --help) suggests it's for
 debug, but the main difference I see is that the output is to
 <input>.dg, e.g. foo.adb.dg

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-23  9:31         ` Jeffrey R. Carter
  2021-03-23 14:27           ` Simon Wright
@ 2021-03-23 23:00           ` John Perry
  2021-03-23 23:27             ` Jeffrey R. Carter
  1 sibling, 1 reply; 24+ messages in thread
From: John Perry @ 2021-03-23 23:00 UTC (permalink / raw)


On Tuesday, March 23, 2021 at 4:31:05 AM UTC-5, Jeffrey R. Carter wrote:
> On 3/22/21 11:11 PM, John Perry wrote: 
> Interesting. With that optimization I would not expect this to differ noticeably 
> from a record without discriminants. 
> 
> I note that all your nested declarations could be declared constant. 
> 
> Use of Long_Float is completely non-portable. 

Thank you for these tips.

> If you want to look into this further, IIRC, there is an option that will output 
> a low-level, Ada-like description of the IR that the front end produces. You'll 
> have to look in the top-secret GNAT documentation to find it. 

Yep, that shows the problem. For the declaration that immediately follows "when Sphere =>" we get

|           B_2 : declare
|              [constraint_error when
|               objects__thing_typeD2 (obj.kind)
|                "discriminant check failed"]
|              eo : vectors__vector := vectors__Osubtract (obj.center,

and that function translates as

|     function objects__thing_typeD2 (kind : objects__object_type)
|       return boolean is
|     begin
|        if kind = objects__sphere then
|           return false;
|        else
|           return true;
|        end if;
|     end objects__thing_typeD2;

So I'm getting a completely useless warning.

It's even worse for "case Plane =>"; it checks the same thing twice!

|              if denom <= 0.0 then
|                 [constraint_error when
|                   objects__thing_typeD3 (obj.kind)
|                   "discriminant check failed"]
|                 [constraint_error when
|                   objects__thing_typeD3 (obj.kind)
|                   "discriminant check failed"]
|                 return (vectors__vector_dot (obj.norm, ray.start) +
|                   obj.offset) / (-denom);

Am I right in thinking this is a bug? It shouldn't even check it, let alone check it twice.

regards
john perry

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-23 23:00           ` John Perry
@ 2021-03-23 23:27             ` Jeffrey R. Carter
  2021-03-26 15:38               ` Stephen Leake
  0 siblings, 1 reply; 24+ messages in thread
From: Jeffrey R. Carter @ 2021-03-23 23:27 UTC (permalink / raw)


On 3/24/21 12:00 AM, John Perry wrote:
> 
> |              if denom <= 0.0 then
> |                 [constraint_error when
> |                   objects__thing_typeD3 (obj.kind)
> |                   "discriminant check failed"]
> |                 [constraint_error when
> |                   objects__thing_typeD3 (obj.kind)
> |                   "discriminant check failed"]
> |                 return (vectors__vector_dot (obj.norm, ray.start) +
> |                   obj.offset) / (-denom);
> 
> Am I right in thinking this is a bug? It shouldn't even check it, let alone check it twice.

It's not technically an error. Both Obj.Norm and Obj.Offset depend on a 
discriminant, so having two checks is technically OK. But eliminating the checks 
completely seems like a basic optimization that one would expect at -O3.

It's possible that this kind of optimization takes place later, and I guess the 
only way to check that would be to look at the assembler.

You could perhaps create a reproducer that demonstrates the timing difference 
and submit it to AdaCore.

-- 
Jeff Carter
"[I]f you ask, 'Why does Ada do/have this?',
the answer often makes you a better programmer."
Chad R. Meiners
177

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-23 23:27             ` Jeffrey R. Carter
@ 2021-03-26 15:38               ` Stephen Leake
  2021-03-26 15:44                 ` John Perry
  0 siblings, 1 reply; 24+ messages in thread
From: Stephen Leake @ 2021-03-26 15:38 UTC (permalink / raw)


"Jeffrey R. Carter" <spam.jrcarter.not@spam.not.acm.org> writes:

> On 3/24/21 12:00 AM, John Perry wrote:
>> |              if denom <= 0.0 then
>> |                 [constraint_error when
>> |                   objects__thing_typeD3 (obj.kind)
>> |                   "discriminant check failed"]
>> |                 [constraint_error when
>> |                   objects__thing_typeD3 (obj.kind)
>> |                   "discriminant check failed"]
>> |                 return (vectors__vector_dot (obj.norm, ray.start) +
>> |                   obj.offset) / (-denom);
>> Am I right in thinking this is a bug? It shouldn't even check it,
>> let alone check it twice.

As a workaround, you can suppress checks in that block of code:

pragma Suppress (Discriminant_Check);


-- 
-- Stephe

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-26 15:38               ` Stephen Leake
@ 2021-03-26 15:44                 ` John Perry
  2021-03-30  7:12                   ` Emmanuel Briot
  0 siblings, 1 reply; 24+ messages in thread
From: John Perry @ 2021-03-26 15:44 UTC (permalink / raw)


On Friday, March 26, 2021 at 10:38:22 AM UTC-5, Stephen Leake wrote:
> As a workaround, you can suppress checks in that block of code: 
> 
> pragma Suppress (Discriminant_Check); 

Sorry, I should have updated earlier. I discovered tried this a couple of nights ago, and it did nothing to help. I checked the IR and that did change, so it certainly did something; it just didn't improve the performance as I expected. It has made me think that perhaps the problem is elsewhere, but I have no idea where.

So, thank you very much for remembering this and getting back to me! But by now I've concluded that gprof misled me. Wouldn't be the first time. The difference in performance is real, but must be caused elsewhere. I don't even feel confidence submitting an enhancement request to AdaCore because by now I have no idea what the problem is.

For what it's worth: I reimplemented it a fourth time as tagged records, and while the code is quite a bit wordier, it was more efficient than variant records.

(I also tried examining the generated assembly code, but I'm not strong enough in assembly to work that out.)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-26 15:44                 ` John Perry
@ 2021-03-30  7:12                   ` Emmanuel Briot
  2021-04-01  0:03                     ` John Perry
  0 siblings, 1 reply; 24+ messages in thread
From: Emmanuel Briot @ 2021-03-30  7:12 UTC (permalink / raw)


> So, thank you very much for remembering this and getting back to me! But by now I've concluded that gprof misled me. Wouldn't be the first time. The difference in performance is real, but must be caused elsewhere. I don't even feel confidence submitting an enhancement request to AdaCore because by now I have no idea what the problem is. 

Since you are on linux, you might want to look at `perf` for performance analysis:

    perf record --call-graph=fp your_program
    perf report -g

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-03-30  7:12                   ` Emmanuel Briot
@ 2021-04-01  0:03                     ` John Perry
  2021-04-01  6:45                       ` Emmanuel Briot
  2021-04-01 13:38                       ` Niklas Holsti
  0 siblings, 2 replies; 24+ messages in thread
From: John Perry @ 2021-04-01  0:03 UTC (permalink / raw)


On Tuesday, March 30, 2021 at 2:12:36 AM UTC-5, briot.e...@gmail.com wrote:
> Since you are on linux, you might want to look at `perf` for performance analysis: 
> 
> perf record --call-graph=fp your_program 
> perf report -g

I don't know why I didn't think of that; I've used perf before. It took a while to find the problem (inlining was an issue), but that did the trick! Both variant and non-variant versions now perform at the same speed.

The problem was indeed a different location. A function returns an Intersection_Type, which has several fields, one of which was a Thing_Type. So if Thing_Type is variant, Intersection_Type also has to be variant. The non-variant version in Ada was

|     return (
|             if which = 0 then No_Intersection
|             else (
|                   thing => things(which),
|                   ray => ray,
|                   dist => dist(which)
|                  )
|            );

...and -gnatD's output for the non-variant version was

|     return (if which = 0 then objects__no_intersection else (
|        thing => things (which),
|        ray => ray,
|        dist => dist (which)));

The non-variant Ada version was:

|     return (
|             if which = 0 then No_Intersection
|             else (
|                   kind => things(which).kind,
|                   thing => things(which),
|                   ray => ray,
|                   dist => dist(which)
|                  )
|            );

The -gnatD option (thanks, Simon!) translates this as:

|     R79b : constant objects__object_type := things (which).kind;
|     type objects__intersections__A84b is access all
|       objects__intersection_type;
|     freeze objects__intersections__A84b []
|     R83b : constant objects__intersections__A84b := (if which = 0
|       then objects__no_intersection else
|        do
|           [subtype objects__intersections__T78b is
|             objects__intersection_type (R79b)]
|           A81b : objects__intersections__T78b;
|           A81b.kind := R79b;
|           A81b.thing := things (which);
|           A81b.ray := ray;
|           A81b.dist := dist (which);
|        in A81b end
|     )'reference;
|     R87b : constant objects__object_type := (R83b.all).kind;
|     subtype objects__intersections__S86b is
|       objects__intersection_type (R87b);
|     return (objects__intersections__S86b!((R83b.all)));

That's... wow. Anyone know why all that is necessary? It seems to create a new inline type instead of using the existing type.

Studying the code, I realized that there was no reason that Intersections_Type had to keep a copy of thing(which); it could instead just remember which, and the caller could obtain thing(which) directly.

That's it. There was nothing else to change.

thank you again, very much
john perry

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-04-01  0:03                     ` John Perry
@ 2021-04-01  6:45                       ` Emmanuel Briot
  2021-04-01 13:38                       ` Niklas Holsti
  1 sibling, 0 replies; 24+ messages in thread
From: Emmanuel Briot @ 2021-04-01  6:45 UTC (permalink / raw)


> Studying the code, I realized that there was no reason that Intersections_Type had to keep a copy of thing(which); it could instead just remember which, and the caller could obtain thing(which) directly. 

Well done, seems like it wasn't so easy to find, even if the fix is simple in the end.
I think the -gnatD code you show is because GNAT is creating a temporary object then copying it in the output. So there's one extra copy. The like "A81b.think := things (which)" might be pretty costly too, depending on the size of thing.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-04-01  0:03                     ` John Perry
  2021-04-01  6:45                       ` Emmanuel Briot
@ 2021-04-01 13:38                       ` Niklas Holsti
  2021-04-02 16:07                         ` John Perry
  1 sibling, 1 reply; 24+ messages in thread
From: Niklas Holsti @ 2021-04-01 13:38 UTC (permalink / raw)



On 2021-04-01 3:03, John Perry wrote:

> A function returns an Intersection_Type, which has several fields,
> one of which was a Thing_Type. So if Thing_Type is variant,
> Intersection_Type also has to be variant.

No.

I assume by "X is variant" you mean that X has a variant_part (RM 3.8.1(2)).

If the discriminant of Thing_Type has a default value, Intersection_Type 
can have a Thing_Type component without supplying it with a discriminant 
value, and Intersection_Type need not have discriminants nor a variant part.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Performance of records with variant parts
  2021-04-01 13:38                       ` Niklas Holsti
@ 2021-04-02 16:07                         ` John Perry
  0 siblings, 0 replies; 24+ messages in thread
From: John Perry @ 2021-04-02 16:07 UTC (permalink / raw)


On Thursday, April 1, 2021 at 8:38:49 AM UTC-5, Niklas Holsti wrote:
> I assume by "X is variant" you mean that X has a variant_part (RM 3.8.1(2)). 
> 
> If the discriminant of Thing_Type has a default value, Intersection_Type 
> can have a Thing_Type component without supplying it with a discriminant 
> value, and Intersection_Type need not have discriminants nor a variant part.

You're right. Originally Thing_Type didn't have a default value. Later I added one, but I didn't think to change Intersection_Type.

Either way, the discriminant on Intersection_Type was the problem. Once I modify Intersection_Type so that it is no longer variant, Intersections can return an Intersection_Type that contains a Thing_Type without the performance hit it had when Intersection_Type was variant.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-04-02 16:07 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-22 17:02 Performance of records with variant parts John Perry
2021-03-22 17:32 ` Jeffrey R. Carter
2021-03-22 17:49   ` John Perry
2021-03-22 17:54     ` John Perry
2021-03-22 19:31     ` Jeffrey R. Carter
2021-03-22 22:11       ` John Perry
2021-03-23  9:31         ` Jeffrey R. Carter
2021-03-23 14:27           ` Simon Wright
2021-03-23 23:00           ` John Perry
2021-03-23 23:27             ` Jeffrey R. Carter
2021-03-26 15:38               ` Stephen Leake
2021-03-26 15:44                 ` John Perry
2021-03-30  7:12                   ` Emmanuel Briot
2021-04-01  0:03                     ` John Perry
2021-04-01  6:45                       ` Emmanuel Briot
2021-04-01 13:38                       ` Niklas Holsti
2021-04-02 16:07                         ` John Perry
2021-03-22 17:39 ` Dmitry A. Kazakov
2021-03-22 17:45   ` John Perry
2021-03-22 18:07     ` Dmitry A. Kazakov
2021-03-22 18:23       ` John Perry
2021-03-22 20:30         ` Dmitry A. Kazakov
2021-03-22 18:59 ` Niklas Holsti
2021-03-22 21:54   ` John Perry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox