comp.lang.ada
* Weird behavior of Get character with trailing new lines.
@ 2023-09-22 19:30 Blady
  2023-09-22 19:52 ` Niklas Holsti
  2023-09-22 20:05 ` Jeffrey R.Carter
  0 siblings, 2 replies; 10+ messages in thread
From: Blady @ 2023-09-22 19:30 UTC


Hello,

I'm reading a text file character by character with Text_IO's Get, in a 
while loop controlled by End_Of_File.

% cat test_20230922_get_char.adb
with Ada.Text_IO; use Ada.Text_IO;
procedure test_20230922_get_char is
    procedure Get is
       F : File_Type;
       Ch : Character;
    begin
       Open (F, In_File, "test_20230922_get_char.adb");
       while not End_Of_File(F) loop
          Get (F, Ch);
          Put (Ch);
       end loop;
       Close (F);
       Put_Line ("File read with get.");
    end;
begin
Get;
end;



All should be well; unfortunately, it is not!

Despite the End_Of_File check, I get an END_ERROR exception when there are 
several trailing new lines at the end of the text:

% test_20230922_get_char
with Ada.Text_IO; use Ada.Text_IO;procedure test_20230922_get_char is 
procedure Get is      F : File_Type;      Ch : Character;   begin 
Open (F, In_File, "test_20230922_get_char.adb");      while not 
End_Of_File(F) loop         Get (F, Ch);         Put (Ch);      end 
loop;      Close (F);      Put_Line ("File read with get."); 
end;beginGet;end;

Execution of ../bin/test_20230922_get_char terminated by unhandled exception
raised ADA.IO_EXCEPTIONS.END_ERROR : a-textio.adb:517

The code is compiled with GNAT. Does it comply with the standard?

A.10.7 Input-Output of Characters and Strings
For an item of type Character the following procedures are provided:
procedure Get(File : in File_Type; Item : out Character);
procedure Get(Item : out Character);
After skipping any line terminators and any page terminators, reads the 
next character from the specified input file and returns the value of 
this character in the out parameter Item.
The exception End_Error is propagated if an attempt is made to skip a 
file terminator.

This seems to be the case, so how can I avoid the exception?

Thanks, Pascal.



* Re: Weird behavior of Get character with trailing new lines.
From: Niklas Holsti @ 2023-09-22 19:52 UTC


On 2023-09-22 22:30, Blady wrote:
> Hello,
> 
> I'm reading a text file character by character with Text_IO's Get, in a 
> while loop controlled by End_Of_File.
> 
> % cat test_20230922_get_char.adb
> with Ada.Text_IO; use Ada.Text_IO;
> procedure test_20230922_get_char is
>     procedure Get is
>        F : File_Type;
>        Ch : Character;
>     begin
>        Open (F, In_File, "test_20230922_get_char.adb");
>        while not End_Of_File(F) loop
>           Get (F, Ch);
>           Put (Ch);
>        end loop;
>        Close (F);
>        Put_Line ("File read with get.");
>     end;
> begin
> Get;
> end;
> 
> 
> 
> All should be well; unfortunately, it is not!
> 
> Despite the End_Of_File check, I get an END_ERROR exception when there are 
> several trailing new lines at the end of the text:
> 
> % test_20230922_get_char
> with Ada.Text_IO; use Ada.Text_IO;procedure test_20230922_get_char is 
> procedure Get is      F : File_Type;      Ch : Character;   begin Open 
> (F, In_File, "test_20230922_get_char.adb");      while not 
> End_Of_File(F) loop         Get (F, Ch);         Put (Ch);      end 
> loop;      Close (F);      Put_Line ("File read with get."); 
> end;beginGet;end;
> 
> Execution of ../bin/test_20230922_get_char terminated by unhandled 
> exception
> raised ADA.IO_EXCEPTIONS.END_ERROR : a-textio.adb:517
> 
> The code is compiled with GNAT. Does it comply with the standard?
> 
> A.10.7 Input-Output of Characters and Strings
> For an item of type Character the following procedures are provided:
> procedure Get(File : in File_Type; Item : out Character);
> procedure Get(Item : out Character);
> After skipping any line terminators and any page terminators, reads the 
> next character from the specified input file and returns the value of 
> this character in the out parameter Item.
> The exception End_Error is propagated if an attempt is made to skip a 
> file terminator.
> 
> This seems to be the case, so how can I avoid the exception?


In Text_IO, a line terminator is not an ordinary character, so you must 
handle it separately, for example like this:

       while not End_Of_File(F) loop
          if End_Of_Line(F) then
             --  At a line terminator: echo it and step past it explicitly.
             New_Line;
             Skip_Line(F);
          else
             Get (F, Ch);
             Put (Ch);
          end if;
       end loop;





* Re: Weird behavior of Get character with trailing new lines.
From: Jeffrey R.Carter @ 2023-09-22 20:05 UTC


On 2023-09-22 21:30, Blady wrote:
> 
> A.10.7 Input-Output of Characters and Strings
> For an item of type Character the following procedures are provided:
> procedure Get(File : in File_Type; Item : out Character);
> procedure Get(Item : out Character);
> After skipping any line terminators and any page terminators, reads the next 
> character from the specified input file and returns the value of this character 
> in the out parameter Item.
> The exception End_Error is propagated if an attempt is made to skip a file 
> terminator.

As you have quoted, Get (Character) skips line terminators. End_Of_File returns 
True if there is a single line terminator before the file terminator, but False 
if there are multiple line terminators before the file terminator. So you either 
have to explicitly skip line terminators, or handle End_Error.

-- 
Jeff Carter
"Unix and C are the ultimate computer viruses."
Richard Gabriel
99

* Re: Weird behavior of Get character with trailing new lines.
From: J-P. Rosen @ 2023-09-23  7:02 UTC


On 22/09/2023 at 22:05, Jeffrey R.Carter wrote:
> On 2023-09-22 21:30, Blady wrote:
>>
>> A.10.7 Input-Output of Characters and Strings
>> For an item of type Character the following procedures are provided:
>> procedure Get(File : in File_Type; Item : out Character);
>> procedure Get(Item : out Character);
>> After skipping any line terminators and any page terminators, reads 
>> the next character from the specified input file and returns the value 
>> of this character in the out parameter Item.
>> The exception End_Error is propagated if an attempt is made to skip a 
>> file terminator.
> 
> As you have quoted, Get (Character) skips line terminators. End_Of_File 
> returns True if there is a single line terminator before the file 
> terminator, but False if there are multiple line terminators before the 
> file terminator. So you either have to explicitly skip line terminators, 
> or handle End_Error.
> 
And this works only if the input file is "well formed", i.e. if it has 
line terminators as the compiler expects them to be (e.g., you will be 
in trouble if the last line has no LF).
That's why I never check End_Of_File, but handle the End_Error 
exception. It always works.
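
For illustration, a minimal sketch of this exception-based style applied to 
the original program. The procedure name and the file name "input.txt" are 
placeholders, not taken from the thread. Here the handler sits at the 
subprogram level, so the closing work is done inside it:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Read_With_End_Error is
   F  : File_Type;
   Ch : Character;
begin
   Open (F, In_File, "input.txt");   --  hypothetical file name
   loop
      Get (F, Ch);   --  skips line and page terminators, as in A.10.7
      Put (Ch);
   end loop;
exception
   when End_Error =>
      --  Get tried to skip the file terminator: normal end of input.
      Close (F);
      New_Line;
      Put_Line ("File read with Get.");
end Read_With_End_Error;
```

As in the original program, Get drops the line terminators, so the echoed 
text comes out on a single line. Any other exception (for example Name_Error 
from Open) still propagates.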
-- 
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
https://www.adalog.fr https://www.adacontrol.fr

* Re: Weird behavior of Get character with trailing new lines.
From: Niklas Holsti @ 2023-09-23  8:39 UTC


On 2023-09-23 10:02, J-P. Rosen wrote:
> On 22/09/2023 at 22:05, Jeffrey R.Carter wrote:
>> On 2023-09-22 21:30, Blady wrote:
>>>
>>> A.10.7 Input-Output of Characters and Strings
>>> For an item of type Character the following procedures are provided:
>>> procedure Get(File : in File_Type; Item : out Character);
>>> procedure Get(Item : out Character);
>>> After skipping any line terminators and any page terminators, reads 
>>> the next character from the specified input file and returns the 
>>> value of this character in the out parameter Item.
>>> The exception End_Error is propagated if an attempt is made to skip a 
>>> file terminator.
>>
>> As you have quoted, Get (Character) skips line terminators. 
>> End_Of_File returns True if there is a single line terminator before 
>> the file terminator, but False if there are multiple line terminators 
>> before the file terminator. So you either have to explicitly skip line 
>> terminators, or handle End_Error.
>>
> And this works only if the input file is "well formed", i.e. if it has 
> line terminators as the compiler expects them to be (e.g., you will be 
> in trouble if the last line has no LF).


Hm. The code I suggested, which handles line terminators separately, 
does work without raising End_Error even if the last line has no line 
terminator, at least in the context of the OP's program.


> That's why I never check End_Of_File, but handle the End_Error 
> exception. It always works.


True, but it may not be convenient for the overall logic of the program 
that reads the file. That program often wants to do something with the 
contents, after reading the whole file, and having to enter that part of 
the program through an exception does complicate the code a little.

On the other hand, past posts on this issue say that using End_Error 
instead of the End_Of_File function is faster, probably because the 
Text_IO code that implements Get cannot know that the program has 
already checked for End_Of_File, so Get has to check for that case 
anyway, redundantly.

My usual method for reading text files is to use Text_IO.Get_Line, and 
(I admit) usually with End_Error termination.
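
A minimal sketch of that Get_Line style, assuming the function form of 
Get_Line (available since Ada 2005). The procedure name and the file name 
"input.txt" are placeholders:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Echo_Lines is
   F : File_Type;
begin
   Open (F, In_File, "input.txt");   --  hypothetical file name
   begin
      loop
         --  Get_Line returns one line without its terminator;
         --  Put_Line writes it back with a terminator.
         Put_Line (Get_Line (F));
      end loop;
   exception
      when End_Error =>
         null;   --  no more lines to read
   end;
   Close (F);
end Echo_Lines;
```

Unlike the character-by-character Get loop, this keeps the line structure 
of the input, including empty lines.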

* Re: Weird behavior of Get character with trailing new lines.
From: Dmitry A. Kazakov @ 2023-09-23  9:25 UTC


On 2023-09-23 10:39, Niklas Holsti wrote:
> On 2023-09-23 10:02, J-P. Rosen wrote:

>> That's why I never check End_Of_File, but handle the End_Error 
>> exception. It always works.
> 
> True, but it may not be convenient for the overall logic of the program 
> that reads the file. That program often wants to do something with the 
> contents, after reading the whole file, and having to enter that part of 
> the program through an exception does complicate the code a little.

It rather simplifies the code. You exit the loop and do whatever is 
necessary there.

Testing for the file end is unreliable and non-portable. Many types of 
files simply do not support that test. In other cases the test does not 
leave the file state unchanged, and its side effects can change the 
program logic.

You are well advised never to use it.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

* Re: Weird behavior of Get character with trailing new lines.
From: Niklas Holsti @ 2023-09-23 14:03 UTC

On 2023-09-23 12:25, Dmitry A. Kazakov wrote:
> On 2023-09-23 10:39, Niklas Holsti wrote:
>> On 2023-09-23 10:02, J-P. Rosen wrote:
> 
>>> That's why I never check End_Of_File, but handle the End_Error 
>>> exception. It always works.
>>
>> True, but it may not be convenient for the overall logic of the 
>> program that reads the file. That program often wants to do something 
>> with the contents, after reading the whole file, and having to enter 
>> that part of the program through an exception does complicate the code 
>> a little.
> 
> It rather simplifies the code. 


Oh?


> You exit the loop and do whatever is necessary there.

That is exactly what happens in the "while not End_Of_File" loop.

If you want to use End_Error instead, you have to add an exception 
handler, and if you want to stay in the subprogram's statement sequence 
without entering the subprogram-level exception handlers, you have to 
add a block to contain the reading loop and make the exception handler 
local to that block.
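
A minimal sketch of that structure, for illustration only. The names 
Count_Characters, Read_All and the file name "input.txt" are hypothetical:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Count_Characters is
   F     : File_Type;
   Ch    : Character;
   Count : Natural := 0;
begin
   Open (F, In_File, "input.txt");   --  hypothetical file name

   Read_All : begin
      loop
         Get (F, Ch);
         Count := Count + 1;
      end loop;
   exception
      when End_Error =>
         null;   --  end of input; continue after the block
   end Read_All;

   --  Back in the subprogram's normal statement sequence.
   Close (F);
   Put_Line ("Characters read:" & Natural'Image (Count));
end Count_Characters;
```

The extra cost compared with the End_Of_File loop is the block and its 
handler around the reading loop.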

To me that looks like adding code -> more complex. Of course not much 
more complex, but a little, as I said.


> Testing for the file end is unreliable and non-portable. Many types
> of files simply do not support that test. In other cases the test
> does not leave the file state unchanged, and its side effects can
> change the program logic.

I suppose you are talking about the need for End_Of_File to possibly 
read ahead past a line terminator? If not, please clarify.

That said, I certainly think that a program reading files should be 
prepared to handle End_Error, especially if a file is read at several 
places in the program (and not in a single loop as in the present program).

* Re: Weird behavior of Get character with trailing new lines.
From: Dmitry A. Kazakov @ 2023-09-24  7:50 UTC
On 2023-09-23 16:03, Niklas Holsti wrote:
> On 2023-09-23 12:25, Dmitry A. Kazakov wrote:

>> You exit the loop and do whatever is necessary there.
> 
> That is exactly what happens in the "while not End_Of_File" loop.

It does not, because you must handle I/O errors and close the file.

> If you want to use End_Error instead, you have to add an exception 
> handler, and if you want to stay in the subprogram's statement sequence 
> without entering the subprogram-level exception handlers, you have to 
> add a block to contain the reading loop and make the exception handler 
> local to that block.

You always have to do that anyway, in order to handle I/O errors.

> To me that looks like adding code -> more complex. Of course not much 
> more complex, but a little, as I said.

No, it is simpler if the code is production code rather than an 
exercise. Consider the typical case where the loop reads some message, 
block, etc. You have

    loop
       read something
       read another piece
       read some count
       read a block of count bytes
       ...

You cannot do it this way if you use an end-of-file test, because you must 
guard each minimal input item (e.g. a byte) with the test. It is massively 
obtrusive and would distort the program logic. You will end up with nested 
ifs or else gotos.

>> Testing for the file end is unreliable and non-portable. Many types
>> of files simply do not support that test. In other cases the test
>> does not leave the file state unchanged, and its side effects can
>> change the program logic.
> 
> I suppose you are talking about the need for End_Of_File to possibly 
> read ahead past a line terminator? If not, please clarify.

Yes, reading ahead, and also issues with blocking and with race conditions 
on shared files. Also, things like sockets do not have an end of file; a 
dropped connection is indicated by an empty read.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

* Re: Weird behavior of Get character with trailing new lines.
From: Blady @ 2023-09-25 19:55 UTC


On 24/09/2023 at 09:50, Dmitry A. Kazakov wrote:
> On 2023-09-23 16:03, Niklas Holsti wrote:
>> On 2023-09-23 10:02, J-P. Rosen wrote:
>>> On 22/09/2023 at 22:05, Jeffrey R.Carter wrote:
>>>> On 2023-09-22 21:30, Blady wrote:
>>>>
>>>> A.10.7 Input-Output of Characters and Strings
>>>> For an item of type Character the following procedures are provided:
>>>> procedure Get(File : in File_Type; Item : out Character);
>>>> procedure Get(Item : out Character);
>>>> After skipping any line terminators and any page terminators, reads 
>>>> the next character from the specified input file and returns the 
>>>> value of this character in the out parameter Item.
>>>> The exception End_Error is propagated if an attempt is made to skip 
>>>> a file terminator.

Thanks all for your helpful answers.

It actually helps.

Especially, I was not aware of the particular behavior of End_Of_File 
with a single line terminator before the file terminator.

In my case, I prefer to reserve exceptions for exceptional situations 
:-) so I've taken the code from Niklas's example.

Regards, Pascal.

* Re: Weird behavior of Get character with trailing new lines.
From: Randy Brukardt @ 2023-09-26  5:53 UTC


"J-P. Rosen" <rosen@adalog.fr> wrote in message 
news:uem2id$moia$1@dont-email.me...
> On 22/09/2023 at 22:05, Jeffrey R.Carter wrote:
>> On 2023-09-22 21:30, Blady wrote:
>>>
>>> A.10.7 Input-Output of Characters and Strings
>>> For an item of type Character the following procedures are provided:
>>> procedure Get(File : in File_Type; Item : out Character);
>>> procedure Get(Item : out Character);
>>> After skipping any line terminators and any page terminators, reads the 
>>> next character from the specified input file and returns the value of 
>>> this character in the out parameter Item.
>>> The exception End_Error is propagated if an attempt is made to skip a 
>>> file terminator.
>>
>> As you have quoted, Get (Character) skips line terminators. End_Of_File 
>> returns True if there is a single line terminator before the file 
>> terminator, but False if there are multiple line terminators before the 
>> file terminator. So you either have to explicitly skip line terminators, 
>> or handle End_Error.
>>
> And this works only if the input file is "well formed", i.e. if it has 
> line terminators as the compiler expects them to be (e.g., you will be in 
> trouble if the last line has no LF).
> That's why I never check End_Of_File, but handle the End_Error exception. 
> It always works.

Agreed. And if the file might contain a page terminator, things get even 
worse because you would have to mess around with End_of_Page in order to 
avoid hitting a combination that still will raise End_Error. It's not worth 
the mental energy to avoid it, especially in a program that will be used by 
others. (I've sometimes used the simplest possible way of writing a 
"quick&dirty" program for my own use; for such programs I skip the error 
handling as I figure I can work out what I did wrong by looking at the 
exception raised. But that's often a bad idea even in that case as such 
programs have a tendency to get reused years later and then the intended 
usage often isn't clear.)

                         Randy.

