Programming Challenge - count capital letters in a string

Here you can talk about anything related to BBC BASIC, not covered in another category
User avatar
hellomike
Posts: 192
Joined: Sat 09 Jun 2018, 09:47
Location: Amsterdam

Re: Programming Challenge - count capital letters in a string

Post by hellomike »

Yes, from the documentation only, I would assume that adding this line:

Code: Select all

   75 PRINT actual$
would show that the global variable is already altered and would print "Output string".

Thanks for clarifying.

Mike
KenDown
Posts: 327
Joined: Wed 04 Apr 2018, 06:36

Re: Programming Challenge - count capital letters in a string

Post by KenDown »

You are, of course, absolutely right about speed. I was merely responding to the function in the previous example by hellomike, which seemed unnecessarily complicated (to my mind). I also copied his use of i% for PTR, so thank you for the correction that it should have been i%%. (And thank you to hellomike, as I was not previously aware of that use for PTR!)
Hated Moron

Re: Programming Challenge - count capital letters in a string

Post by Hated Moron »

KenDown wrote: Wed 01 Mar 2023, 11:13 I also copied his use of i% for PTR, so thank you for the correction that it should have been i%%.
It's not at all clear in the published listings which code is hellomike's and which is yours, but for the avoidance of doubt i% (i.e. a 32-bit integer) is entirely suitable as an array index (as in b&(i%)) but not as a pointer (as in ?i%). A pointer should be either a 64-bit integer (e.g. i%%) or a variant (e.g. i).

There was also a suggestion made by somebody that this code:

Code: Select all

      p%%=PTR(a$):e%%=p%%+LEN(a$)-1
      FOR i%% = p%% TO e%%
        IF ?i%>64 IF ?i%<91 n%+=1
      NEXT
should be faster than:

Code: Select all

      FOR i%%=PTR(a$) TO PTR(a$)+LEN(a$)-1
        IF ?i%>64 IF ?i%<91 n%+=1
      NEXT
It isn't, they will run at exactly the same speed (by which, for the pedants, I mean immeasurably different). I suspect this misunderstanding may have arisen because of an assumption that the final value of the FOR loop (the value after TO) is re-evaluated each time around the loop, but it's not. The FOR statement (including evaluation of the TO and optional STEP expression) is executed only once.
Hated Moron

Re: Programming Challenge - count capital letters in a string

Post by Hated Moron »

The profiler can be very useful in identifying bottlenecks. Here is a report for Jonathan's routine, except that I have split the IF statement into multiple lines so that the time taken by the individual tests can be compared:

Code: Select all

Profiler report for count_capitals_speed.bbc

Fri. 03 Mar 2023, 10:44:20

Figures in the first column indicate approximate
time in milliseconds spent in each program line.

Figures in the second column indicate approximate
percentage of the time spent in each program line.

Time spent profiling: 83.098 seconds.

         0:                   HIMEM = PAGE + 10000000
         0:                   
         2:                   test$ = STRING$(10000, "This is a TEST string. It has some Capital Letters, some 123456789 numbers, and some that are NOT so capital.")
         4:                   TIME = 0 : PRINT FNstr_capsJH(test$), TIME
         0:                   QUIT
         0:                   
         0:                   DEFFNstr_capsJH(a$)
         0:                   LOCAL I%,N%
         0:                   IF a$="":=0
         0:                   FOR I%=1 TO LEN a$
     48608:    58.49          IF MID$(a$,I%,1)>="@" THEN
     34314:    41.29          IF MID$(a$,I%,1)<="_"THEN
        20:     0.02          N%+=1
        18:     0.02          ENDIF
        22:     0.03          ENDIF
       110:     0.13          NEXT
         0:                   =N%
         0:                   

         0:                   Libraries and immediate mode
You can see that (with the particular test string used) more than 58% of the time is spent in the first MID$() test, 41% in the second MID$() test (which is only executed if the first succeeds) and only 0.2% everywhere else! It's noteworthy that the FOR statement doesn't even take enough time to be measurable.

Clearly this is an extreme example, but the profiler can draw attention to code which is dominating the execution time and therefore may well be a target for optimising.
User avatar
JeremyNicoll
Posts: 73
Joined: Sun 26 Jul 2020, 22:22
Location: Edinburgh

Re: Programming Challenge - count capital letters in a string

Post by JeremyNicoll »

Hated Moron wrote: Tue 28 Feb 2023, 17:29
But of course you're right that the reason Jonathan's code is so terribly slow is not primarily the loop, it's the use of the MID$() string-slicing function. And the reason that MID$() is so slow is because it has to assume that evaluating the numeric parameter(s) might modify the string (although it probably won't) and therefore it has to copy the string first.
I've been puzzling over this for a while, and I still don't get it. Why would any value of either of the numeric parameters cause the string that's being sliced to get altered?

It's not as if MID$ is on the lefthand side of an assignment statement.

I would have assumed that the main reason this method is slow would just be the overhead of lots and lots of function call-and-return processing.


Also ..
Hated Moron wrote: Tue 28 Feb 2023, 17:29 If the syntax of the MID$() function had been MID$(start, count, string$) rather than MID$(string$, start, count) it could be much faster,
Why does the syntax matter? Is it something to do with how parameters to (any) function get passed to it? And in particular, the first parameter?
Hated Moron

Re: Programming Challenge - count capital letters in a string

Post by Hated Moron »

JeremyNicoll wrote: Fri 03 Mar 2023, 12:35 Why would any value of either of the numeric parameters cause the string that's being sliced to get altered?
It's unlikely, but since it's possible (and in that event the result of the MID$ function would be wrong) you can't risk it. There are several similar situations in the interpreter where you have to go to extra effort (sometimes with a performance cost) to ensure it still works correctly in 'edge cases'.

There's a classic example of when passing parameters into a FN/PROC naively can fail:

Code: Select all

PROCtest(a$,b$)
....

DEF PROCtest(b$,a$)
...
ENDPROC
If you do it naively a$ gets copied into b$ and then b$ gets copied into a$. Whoops!
Why does the syntax matter?
Because the parameter order matters. If the string was the final parameter to MID$, you could use its contents 'in situ' because there's no way they can have changed before the actual slicing operation is carried out. But when the string isn't the final parameter the evaluation of the subsequent parameter(s) could change the contents of the memory it occupies (occupied).
User avatar
JeremyNicoll
Posts: 73
Joined: Sun 26 Jul 2020, 22:22
Location: Edinburgh

Re: Programming Challenge - count capital letters in a string

Post by JeremyNicoll »

Hated Moron wrote: Fri 03 Mar 2023, 20:05
JeremyNicoll wrote: Fri 03 Mar 2023, 12:35 Why would any value of either of the numeric parameters cause the string that's being sliced to get altered?
It's unlikely, but since it's possible (and in that event the result of the MID$ function would be wrong) you can't risk it. There are several similar situations in the interpreter where you have to go to extra effort (sometimes with a performance cost) to ensure it still works correctly in 'edge cases'.
I would have expected the code inside however it is that you implement MID$ to validity check the values of 'start' and 'length' and either raise an error if they are (for 'start') outwith the length of the string, and (for 'length') longer than the remaining number of characters after the 'start' position, or if they're bad, simply return eg "" - which they do depends on your implementation choice for such things.

I would not expect the formal parameter copy of the string being sliced to get altered.

Also as the formal parms for MID$ are local to that function, how could the caller's string get changed? Even if weird values of 'start'/'length' did somehow get used?

I'd welcome an example of an edge case where this could happen. I can't imagine I'm the only person here who doesn't see how this can happen.


Hated Moron wrote: Fri 03 Mar 2023, 20:05 There's a classic example of when passing parameters into a FN/PROC naively can fail:

Code: Select all

PROCtest(a$,b$)
....

DEF PROCtest(b$,a$)
...
ENDPROC
If you do it naively a$ gets copied into b$ and then b$ gets copied into a$. Whoops!
When you say "If /you/ do it naively" ... do you mean "you" as in the implementer of the language?

I'm quite sure you, the implementer, won't have done it naively.

It's completely irrelevant that - lexically - the names of the formal parameters happen to match any of those used, regardless of their order, in the actual parameters when PROCtest is called. They' should just be arbitrarily-named placeholders for "the first, second, third... passed parameters". The value of the caller's first variable should be copied to the storage location that the PROC is using for its first formal parameter, and the value of the caller's second variable to the second such location. It's up to the way the implementor has designed variable lookup within the body of a PROC/FN to make sure that those /local/ locations are used for local variables of those names while the PROC/FN is executing.


I don't see how this theoretical example (of how not to implement parameter passing) has any bearing on the reason that MID$ was slow in the programming challenge.


Hated Moron wrote: Fri 03 Mar 2023, 20:05
JeremyNicoll wrote: Fri 03 Mar 2023, 12:35 Why does the syntax matter?
Because the parameter order matters. If the string was the final parameter to MID$, you could use its contents 'in situ' because there's no way they can have changed before the actual slicing operation is carried out. But when the string isn't the final parameter the evaluation of the subsequent parameter(s) could change the contents of the memory it occupies (occupied).
Is that because the caller might code something like

p$ = MID$(A$, (an expression that involves A$, & may change it, but produces a number), (another such expression)) ?

If so, I'd expect the interpreter to do something equivalent to

tempvar2 = (an expression that involves A$, & may change it, but produces a number
tempvar3 = (another such expression)
p$ = MID$(A$,tempvar2,tempvar3)

or possibly

tempvar1 = A$
tempvar2 = (an expression that involves A$, & may change it, but produces a number
tempvar3 = (another such expression)
p$ = MID$(tempvar1,tempvar2,tempvar3)

In general I think I'd expect a programmer who wants to allow A$ to change its value prior to MID$ being called explicitly to use the tempvar approach themselves, in the order they want those expresssions to be determined, so they're completely aware that the A$ that will be passed to MID$ isn't necessarily the same as A$ was prior to the statement containing the call.

Is there anywhere in the BBC BASIC manual where you describe this potential problem?


Normally the only reason that I'd consider parameter order is if I wanted the name of the function to sort-of match that order, so (a poor example) eg

DEF PROC_place_person_on_animal(rider,animal)

or

DEF PROC_mount_on(animal,rider)

and that would probably have less to do with a specific function and more to do with how a whole set of functions/procedures have been named and whether the calling code is primarily working through a list of animals or riders.

In other words, it'd have nothing at all to do with how functions/procedures are implemented.
User avatar
hellomike
Posts: 192
Joined: Sat 09 Jun 2018, 09:47
Location: Amsterdam

Re: Programming Challenge - count capital letters in a string

Post by hellomike »

Consider this:

Code: Select all

      A$="This is a test"
      PRINT MID$(A$, FNfromlib)
      PRINT A$
      END

      DEF FNfromlib
      A$="The Quick Brown Fox"
      =6
What would you expect (want) the two PRINT statements to print?

Regards,

Mike
User avatar
JeremyNicoll
Posts: 73
Joined: Sun 26 Jul 2020, 22:22
Location: Edinburgh

Re: Programming Challenge - count capital letters in a string

Post by JeremyNicoll »

hellomike wrote: Sat 04 Mar 2023, 11:04 Consider this:

Code: Select all

      A$="This is a test"
      PRINT MID$(A$, FNfromlib)
      PRINT A$
      END

      DEF FNfromlib
      A$="The Quick Brown Fox"
      =6
What would you expect (want) the two PRINT statements to print?
I'd have no expectations without first checking what the implementation was meant to do.

I wouldn't write code like that myself. I've learned over the years to be a careful programmer; if I were to write a function like FNfromlib I'd have made certain that I knew it changed A$ (and probably named A$ something like GLOBAL_A$ to make this a lot more obvious, and the function name would have been more appropriate too). So if, then, I'd had the call look like

Code: Select all

      GLOBAL_A$="This is a test"
      PRINT MID$(GLOBAL_A$, FNupdate_globals_and_set_pos)
      PRINT GLOBAL_A$
      END


I would have been more alert to the risk. But in fact I would have coded:

Code: Select all

      GLOBAL_A$="This is a test"
      frompos = FNupdate_globals_and_set_pos ;                REM remember this may reset GLOBAL_A$
      PRINT MID$(GLOBAL_A$, frompos)
      PRINT GLOBAL_A$
      END


and then there would be no surprises.

It's this sort of thing - unexpected side-effects of functions etc - that can make tracking down bugs incredibly hard.
User avatar
hellomike
Posts: 192
Joined: Sat 09 Jun 2018, 09:47
Location: Amsterdam

Re: Programming Challenge - count capital letters in a string

Post by hellomike »

It really is totally beyond the point whether you would or wouldn't write code like this.

The issue is pure a technical one of how the interpreter works internally. I think the snippet of code that I provided illustrates exactly why MID$() and similar functions, internally first make a copy and doing the actual instruction on the copy.

Cheers,

Mike