GET$# inconsistency

Richard Russell
Posts: 366
Joined: Tue 18 Jun 2024, 09:32

GET$# inconsistency

Post by Richard Russell »

I've noticed an inconsistency in the way GET$# works in ARM BASIC V and BBC BASIC (Z80) v5.00 - the only two versions of BBC BASIC which support GET$# whilst also imposing a string length limit of 255 bytes, I think.

The difference can be illustrated by writing two 255-byte strings using BPUT# and reading them back using GET$#, for example:

Code: Select all

   10 F%=OPENOUT"TEMP"
   20 BPUT#F%,STRING$(255,"A")
   30 BPUT#F%,STRING$(255,"B")
   40 CLOSE #F%
   50 F%=OPENIN"TEMP"
   60 A$=GET$#F%
   70 PRINT LEN A$
   80 B$=GET$#F%
   90 PRINT LEN B$
  100 CLOSE #F%
In BBC BASIC (Z80) this code outputs:

Code: Select all

       255
       255
but in ARM BASIC V it outputs:

Code: Select all

       255
       0
By contrast if you run this code:

Code: Select all

   10 F%=OPENOUT"TEMP"
   20 BPUT#F%,STRING$(255,"A");
   30 BPUT#F%,STRING$(255,"B");
   40 CLOSE #F%
   50 F%=OPENIN"TEMP"
   60 A$=GET$#F%
   70 PRINT LEN A$
   80 B$=GET$#F%
   90 PRINT LEN B$
  100 CLOSE #F%
BBC BASIC (Z80) outputs:

Code: Select all

       0
       254
whilst ARM BASIC V outputs:

Code: Select all

       255
       255
I'm not going to pass judgement on whether one is 'right' and the other is 'wrong', but it's noteworthy - and arguably unfortunate - that there is a difference.

I would add that it's possible to make BBC BASIC (Z80) return the same result as ARM BASIC V in the latter case by making the following modification:

Code: Select all

   10 F%=OPENOUT"TEMP"
   20 BPUT#F%,STRING$(255,"A");
   30 BPUT#F%,STRING$(255,"B");
   40 CLOSE #F%
   50 F%=OPENIN"TEMP"
   60 A$=GET$#F% BY 255
   70 PRINT LEN A$
   80 B$=GET$#F%
   90 PRINT LEN B$
  100 CLOSE #F%
Of course ARM BASIC V doesn't support the 'BY' qualifier.
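
The outputs reported above for the two test programs are consistent with two different stopping rules. The following Python model is only a sketch and is not taken from either interpreter's source. `read_arm_style` assumes the read stops as soon as 255 characters are stored, without looking at the next byte. `read_z80_style` assumes up to 256 bytes are consumed while looking for a terminator, and that an empty string results if none is found. That second rule is an inference from the 0 / 254 output only. All function names are hypothetical.

```python
import io

TERMINATORS = (10, 13)  # LF and CR end a string
MAX_LEN = 255           # string length limit in both interpreters


def read_arm_style(f):
    """Stop at a terminator, at EOF, or as soon as 255 bytes are stored."""
    out = bytearray()
    while len(out) < MAX_LEN:
        b = f.read(1)
        if not b or b[0] in TERMINATORS:
            break
        out += b
    return bytes(out)


def read_z80_style(f):
    """Assumed model: consume up to 256 bytes looking for a terminator.

    If the 256th byte is still not a terminator the result is empty,
    which is what the 0 / 254 output suggests (an inference only).
    """
    out = bytearray()
    for _ in range(MAX_LEN + 1):
        b = f.read(1)
        if not b or b[0] in TERMINATORS:
            return bytes(out)
        out += b
    return b""  # 256 bytes consumed without finding a terminator


def lengths(reader, data):
    """Lengths of two successive reads from the same file image."""
    f = io.BytesIO(data)
    return [len(reader(f)), len(reader(f))]


terminated = b"A" * 255 + b"\n" + b"B" * 255 + b"\n"   # like BPUT#F%,A$
unterminated = b"A" * 255 + b"B" * 255                  # like BPUT#F%,A$;

for name, reader in (("Z80 model", read_z80_style), ("ARM model", read_arm_style)):
    print(name, lengths(reader, terminated), lengths(reader, unterminated))
```

Under these assumptions the Z80 model gives `[255, 255]` and `[0, 254]`, and the ARM model gives `[255, 0]` and `[255, 255]`, matching the four results quoted above.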
jgharston
Posts: 39
Joined: Thu 05 Apr 2018, 14:08

Re: GET$# inconsistency

Post by jgharston »

Code: Select all

   10 F%=OPENOUT"TEMP"
   20 BPUT#F%,STRING$(255,"A")
   30 BPUT#F%,STRING$(255,"B")
   40 CLOSE #F%
   50 F%=OPENIN"TEMP"
   60 A$=GET$#F%
   70 PRINT LEN A$
   80 B$=GET$#F%
   90 PRINT LEN B$
  100 CLOSE #F%
PDP11 BASIC also gives:

Code: Select all

       255
       0

Code: Select all

   10 F%=OPENOUT"TEMP"
   20 BPUT#F%,STRING$(255,"A");
   30 BPUT#F%,STRING$(255,"B");
   40 CLOSE #F%
   50 F%=OPENIN"TEMP"
   60 A$=GET$#F%
   70 PRINT LEN A$
   80 B$=GET$#F%
   90 PRINT LEN B$
  100 CLOSE #F%
PDP11 BASIC gives:

Code: Select all

       255
       254
!

That is peculiar, to say the least. I shall add it to my "to look at" list. The code logic is fairly naive:

Code: Select all

ADR  SV_STRING,R4	; Point to string buffer
CLR  R3			; String length=0
.fnGETlp
JSR  PC,IO_BGET
BCS  fnGETend		; End of file
CMP  R0,#10
BEQ  fnGETend		; LF - end of line
CMP  R0,#13
BEQ  fnGETend		; CR - end of line
MOVB R0,(R4)+		; Store in string buffer
INC  R3
CMP  R3,#255
BCS  fnGETlp		; Loop for up to 255 characters

Re: GET$# inconsistency

Post by Richard Russell »

If you want to write code which will successfully read a 255-byte LF-terminated string, typically written with BPUT#, and which will work in both ARM BASIC V and BBC BASIC (Z80) v5, you can do this:

Code: Select all

P% = PTR#F% : A$ = GET$#F% : IF LEN(A$)=255 THEN PTR#F% = P% + 256
This copies the file pointer before reading the string and then, if the string was 255 bytes long (which is the only problematical case), sets the file pointer to 256 beyond the copied position. The effect of this is to advance the pointer past the LF terminator in ARM BASIC V, but to do nothing in BBC BASIC (Z80) because the pointer has already been advanced.
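
To see why the pointer adjustment is harmless in one interpreter and corrective in the other, here is a small Python model of the one-liner. It is a sketch under stated assumptions: `get_string` is a simplified stand-in for GET$# that is only valid for terminated strings, and its `eats_terminator_after_255` flag (a hypothetical name) selects whether the byte after a full-length string is consumed (Z80-like) or left in place (ARM-like).

```python
import io

TERMINATORS = (10, 13)


def get_string(f, eats_terminator_after_255):
    """Simplified GET$# model for terminated strings only."""
    out = bytearray()
    while len(out) < 255:
        b = f.read(1)
        if not b or b[0] in TERMINATORS:
            return bytes(out)
        out += b
    if eats_terminator_after_255:
        f.read(1)  # Z80-like: the byte after a full-length string is consumed
    return bytes(out)  # ARM-like: the pointer is left on the terminator


def portable_get_string(f, eats_terminator_after_255):
    p = f.tell()                                   # P% = PTR#F%
    s = get_string(f, eats_terminator_after_255)   # A$ = GET$#F%
    if len(s) == 255:
        f.seek(p + 256)                            # PTR#F% = P% + 256
    return s


def read_lengths(reader, data, eats, count):
    """Lengths of `count` successive reads from one file image."""
    f = io.BytesIO(data)
    return [len(reader(f, eats)) for _ in range(count)]


data = b"A" * 255 + b"\n" + b"hello\n" + b"B" * 255 + b"\n" + b"x\n"

for eats in (False, True):
    print(eats,
          read_lengths(get_string, data, eats, 4),
          read_lengths(portable_get_string, data, eats, 4))
```

In this model the plain ARM-like read yields lengths `[255, 0, 5, 255]`, whereas the wrapped read yields `[255, 5, 255, 1]` for both settings of the flag.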

It's interesting, with hindsight, to speculate on why Sophie made ARM BASIC V work the way it does. She must have realised that it resulted in reading a 255-byte (terminated) string behaving differently from a string of any other length, which isn't ideal, but what it did also do was make it possible to read from a file containing unterminated data (by reading it in 255-byte 'chunks').

Perhaps she felt the latter capability was too valuable to sacrifice. Fortunately in BBC BASIC (Z80) v5 I've introduced the BY qualifier (which works the same way as it does in BB4W, BBCSDL and BBCTTY) so it's possible to read a file containing arbitrary data in 255-byte chunks without impacting on how terminated strings, including 255-byte strings, are read.

Re: GET$# inconsistency

Post by jgharston »

Richard Russell wrote: Sun 29 Dec 2024, 14:39 If you want to write code which will successfully read a 255-byte LF-terminated string, typically written with BPUT#, and which will work in both ARM BASIC V and BBC BASIC (Z80) v5, you can do this:

Code: Select all

P% = PTR#F% : A$ = GET$#F% : IF LEN(A$)=255 THEN PTR#F% = P% + 256

Or, if your program is happy with the I/O hit:

Code: Select all

P% = PTR#F% : A$ = GET$#F% : PTR#F% = P% + 1 + LEN A$

Richard Russell wrote: Sun 29 Dec 2024, 14:39 It's interesting, with hindsight, to speculate on why Sophie made ARM BASIC V work the way it does. She must have realised that it resulted in reading a 255-byte (terminated) string behaving differently from a string of any other length, which isn't ideal, but what it did also do was make it possible to read from a file containing unterminated data (by reading it in 255-byte 'chunks').

Without denigrating Sophie's programming skills, I'm inclined to believe it was a simple oversight.

I ran some tests on various BBC BASICs and collated them. I too am undecided about what should be the "right" implementation. The test code immediately showed I've got something wrong with EOF handling in the PDP11 Unix OS interface. ;) And I'm doing *something* wrong with unterminated strings.

Part of me says that if I can write a terminated string of any length, I should always be able to read back that same terminated string. But that would require reading ahead of a 255-byte string to see if it is terminated, and retreating if not.

Another part of me says the code should just be a simple loop until end_condition, the end condition being len=255 OR eof OR eol.

I had hoped the Acorn BBC BASIC reference manual would shed some light, but it could be read ambiguously:
"read(s) until a linefeed (ASCII 10), carriage return (ASCII 13) or the end of the file is encountered, or else the maximum of 255 characters is reached."

I was hoping it would be stricter, something along the lines of "reads a terminated string of up to 255 characters". But then, the documentation does describe the implementation: REPEAT UNTIL len=255 OR eof OR eol
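
The "read ahead and retreat" alternative mentioned above can be sketched in Python as follows. This is a hypothetical model, not code from any BBC BASIC implementation: after storing 255 characters it peeks at one more byte, keeps it consumed if it is a terminator, and steps the pointer back otherwise.

```python
import io

TERMINATORS = (10, 13)


def get_string_lookahead(f):
    """Read up to 255 bytes, then peek one byte to see if it is a terminator."""
    out = bytearray()
    while len(out) < 255:
        b = f.read(1)
        if not b or b[0] in TERMINATORS:
            return bytes(out)      # normal case: terminator (or EOF) consumed
        out += b
    pos = f.tell()
    nxt = f.read(1)                # read ahead past the full-length string
    if nxt and nxt[0] not in TERMINATORS:
        f.seek(pos)                # not a terminator: retreat
    return bytes(out)


def lengths(data, count):
    """Lengths of `count` successive reads from one file image."""
    f = io.BytesIO(data)
    return [len(get_string_lookahead(f)) for _ in range(count)]


terminated = b"A" * 255 + b"\n" + b"B" * 255 + b"\n"
unterminated = b"A" * 255 + b"B" * 255

print(lengths(terminated, 2), lengths(unterminated, 2))
```

In this model both the terminated and the unterminated file read back as two 255-byte strings. The price is an extra read and sometimes a pointer move, and there is a residual ambiguity: an unterminated 255-byte chunk of arbitrary data that happens to be followed by a byte with value 10 or 13 would lose that byte.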

Re: GET$# inconsistency

Post by Richard Russell »

jgharston wrote: Tue 31 Dec 2024, 14:21 Or, if your program is happy with the I/O hit:

Code: Select all

P% = PTR#F% : A$ = GET$#F% : PTR#F% = P% + 1 + LEN A$

Indeed, but setting the file pointer (even to its current location) is typically quite expensive and I wanted to avoid doing that except when it is actually necessary.

jgharston wrote: Without denigrating Sophie's programming skills, I'm inclined to believe it was a simple oversight.

We'll never know, but I wonder if perhaps she tested it with code similar to this. It writes LF-terminated strings with lengths 0-255 and reads them back again, checking that they agree with what was written:

Code: Select all

   10 F%=OPENOUT"TEMP"
   20 FOR I%=0 TO 255
   30   BPUT#F%,STRING$(I%,CHR$(64 + I% MOD 32))
   40 NEXT
   50 CLOSE #F%

   60 F%=OPENIN"TEMP"
   70 FOR I%=0 TO 255
   80   IF GET$#F%<>STRING$(I%,CHR$(64 + I% MOD 32)) PRINT "Failed!":STOP
   90 NEXT
  100 CLOSE #F%
This code will fail to detect the 'problem' with the 255-byte string because it's the last to be read, so the file-pointer being left mispositioned (before the LF terminator rather than after) isn't apparent. Reversing the order (FOR I% = 255 TO 0 STEP -1) would demonstrate it. ;)
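
The effect of the loop order can be checked without an ARM machine by modelling the read in Python. This is a sketch only: `read_arm_style` is an assumed model of the ARM BASIC V behaviour described in this thread (stop at 255 characters without consuming the terminator), and the helper names are hypothetical.

```python
import io

TERMINATORS = (10, 13)


def read_arm_style(f):
    """Stop at a terminator, at EOF, or as soon as 255 bytes are stored."""
    out = bytearray()
    while len(out) < 255:
        b = f.read(1)
        if not b or b[0] in TERMINATORS:
            break
        out += b
    return bytes(out)


def expected(i):
    """Same pattern as STRING$(I%,CHR$(64 + I% MOD 32))."""
    return bytes([64 + i % 32]) * i


def round_trip_ok(order):
    """Write LF-terminated strings in the given order, then read them back."""
    order = list(order)
    f = io.BytesIO(b"".join(expected(i) + b"\n" for i in order))
    return all(read_arm_style(f) == expected(i) for i in order)


print(round_trip_ok(range(0, 256)))        # ascending, 255-byte string last
print(round_trip_ok(range(255, -1, -1)))   # descending, 255-byte string first
```

With this model the ascending test passes and the descending test fails at the second read, because the LF left behind after the 255-byte string is returned as an empty string.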

jgharston wrote: Part of me says that if I can write a terminated string of any length, I should always be able to read back that same terminated string.

I agree, you should. As I mentioned before, my BY extension lets me off the hook, because it allows me always to read the LF terminator when there is no BY qualifier, but to read only the specified number of bytes when there is:

Code: Select all

   10 F%=OPENOUT"TEMP"
   20 FOR I%=0 TO 255
   30   BPUT#F%,STRING$(I%,CHR$(64 + I% MOD 32));
   40 NEXT
   50 CLOSE #F%

   60 F%=OPENIN"TEMP"
   70 FOR I%=1 TO 255
   80   IF (GET$#F% BY I%)<>STRING$(I%,CHR$(64 + I% MOD 32)) PRINT "Failed!":STOP
   90 NEXT
  100 CLOSE #F%
Personally I think this is the best of both worlds, and results in code that is nicely consistent.

Re: GET$# inconsistency

Post by jgharston »

jgharston wrote: Tue 31 Dec 2024, 14:21 I ran some tests on various BBC BASICs and collated them.
...
The test code immediately showed I've got something wrong with EOF handling in the PDP11 Unix OS interface. ;)

Doh! While the I/O layer was correctly setting an "eof already reported" flag so that you got byte n-2, byte n-1, byte n-0, Carry set, EOF error; the code path never *cleared* the "eof already reported" flag, so the *next* time you BGET'ed past the end of a file, you never got the "Carry set" state. :roll:
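
For readers unfamiliar with this kind of bug, here is a hypothetical Python model of a byte channel with two-stage end-of-file handling. It is not the PDP11 code. In particular, where the "eof already reported" flag ought to be cleared is an assumption made for illustration (here, whenever the pointer is moved), and the class and parameter names are invented.

```python
class Channel:
    """Hypothetical byte channel: soft EOF first, hard EOF error second."""

    def __init__(self, data, clear_flag_on_seek=True):
        self.data = data
        self.pos = 0
        self.eof_reported = False
        self.clear_flag_on_seek = clear_flag_on_seek  # False models the bug

    def seek(self, pos):
        self.pos = pos
        if self.clear_flag_on_seek:
            self.eof_reported = False   # assumed fix: forget the earlier EOF

    def bget(self):
        """Return (byte, carry); carry set means end of file was reached."""
        if self.pos < len(self.data):
            b = self.data[self.pos]
            self.pos += 1
            return b, False
        if not self.eof_reported:
            self.eof_reported = True    # first read past the end: carry set
            return 0, True
        raise EOFError("read past end of file")  # second read: EOF error


def read_all(ch):
    """Read bytes until the channel reports carry set."""
    out = []
    while True:
        b, carry = ch.bget()
        if carry:
            return out
        out.append(b)
```

With `clear_flag_on_seek=True`, calling `read_all`, rewinding with `seek(0)` and calling `read_all` again returns the same bytes both times. With `clear_flag_on_seek=False`, the second pass raises `EOFError` without ever reporting carry set, which is the symptom described above.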

Re: GET$# inconsistency

Post by Richard Russell »

jgharston wrote: Fri 03 Jan 2025, 13:48 so that you got byte n-2, byte n-1, byte n-0, Carry set, EOF error
I confess I've never bothered to implement, in any of my versions of BBC BASIC, the two-stage end-of-file processing whereby you initially get EOF#file set to TRUE, but if your program ignores that you later get an EOF error. It's probably quite a good system, but seemed overkill when laziness was my main motivation!