Programming Challenge - count capital letters in a string

Here you can talk about anything related to BBC BASIC, not covered in another category
DDRM

Re: Programming Challenge - count capital letters in a string

Post by DDRM »

Hi Richard,

I'm interested! I didn't bother to comment, because I thought your explanation was perfectly clear. I'd been looking for a solution that would catch ALL the non-alphabetic characters in the range 64-95 at once, but I couldn't find one. Given that there are only a handful, though, filtering them individually is perfectly reasonable.

Best wishes,

D
Hated Moron

Re: Programming Challenge - count capital letters in a string

Post by Hated Moron »

On 28/02/2023 00:45, J.G.Harston wrote (cross-posted from the Discussion Group):
I think I'd just do it naively:

DEFFNstr_caps(A$)
LOCAL i%,n%
IF A$="":=0
FOR i%=1 TO LEN A$
IF MID$(A$,i%,1)>="A":IF MID$(A$,i%,1)<="Z":n%=n%+1
NEXT i%
=n%
A naïve approach is fine, when you don't care how long it takes, but with an interpreted language like BBC BASIC speed is often an important factor. I did a quick comparison of your method compared with David's, yours took about 2,500 (two-and-a-half-thousand) times longer than his!

For a 1 Mbyte string (which is not at all unreasonable in a Word Processing context) David's code took 70 ms to count the capital letters on this PC, yours took three minutes. :shock:

After all, the only reason the 'array arithmetic' operators were added to BBC BASIC (in 1986 or thereabouts) was speed; you can do the same things without them using a trivial loop (two nested loops in the case of the dot-product). If you eschew them in favour of the naïve approach, you pay a high price in performance.

By leveraging the array arithmetic in a non-standard ('clever') way you can reap even greater speed benefits, as I have shown.
HAB
Posts: 1
Joined: Thu 05 Apr 2018, 07:15

Re: Programming Challenge - count capital letters in a string

Post by HAB »

Hated Moron wrote: Mon 27 Feb 2023, 23:14 Judging by the deafening silence, the solution doesn't seem to have been of any interest either! Does nobody think BBC BASIC 'hacks' of this kind are worthy of note?
They absolutely are worthy of note and you have pointed out just how valuable an in-depth understanding of the language can be. Although this particular application might be narrowly specific, you have used it to demonstrate the worth of an important feature of the language which even experienced programmers might overlook.

I tend to keep a copy of these things in a folder which could be named "Don't when I will need this BUT one day ..."

Please keep up the good work!
User avatar
hellomike
Posts: 192
Joined: Sat 09 Jun 2018, 09:47
Location: Amsterdam

Re: Programming Challenge - count capital letters in a string

Post by hellomike »

To start with, BBC BASIC array arithmetic operators are extremely powerful. No question about it.

However I like to point out that using a loop as such is not the reason for the performance drop. I.e. this ´naïve´ approach is 160 times faster (tested using a 1Mbyte string as input).

Code: Select all

      DEFFNstr_caps_fast(RETURN A$)
      LOCAL i%,n%,b&()
      IF A$="":=0
      DIM b&(LEN(A$)) : $$^b&(0)=A$
      FOR i%=0 TO LEN A$-1
        IF b&(i%)>=65 IF b&(i%)<=90 n%+=1
      NEXT i%
      =n%
Regards,

Mike
Hated Moron

Re: Programming Challenge - count capital letters in a string

Post by Hated Moron »

hellomike wrote: Tue 28 Feb 2023, 14:59 However I like to point out that using a loop as such is not the reason for the performance drop. I.e. this ´naïve´ approach is 160 times faster (tested using a 1Mbyte string as input).

Code: Select all

      DEFFNstr_caps_fast(RETURN A$)
      LOCAL i%,n%,b&()
      IF A$="":=0
      DIM b&(LEN(A$)) : $$^b&(0)=A$
      FOR i%=0 TO LEN A$-1
        IF b&(i%)>=65 IF b&(i%)<=90 n%+=1
      NEXT i%
      =n%
It's still about fifteen times slower than DDRM's code though (by my measurement), so using array arithmetic wins by a large margin. And anyway your code is by no means 'naïve', in my judgement: 'standard' BBC BASIC doesn't even have byte arrays or the 'address of' operator so copying a string into a byte array is quite an advanced technique.

But of course you're right that the reason Jonathan's code is so terribly slow is not primarily the loop, it's the use of the MID$() string-slicing function. And the reason that MID$() is so slow is because it has to assume that evaluating the numeric parameter(s) might modify the string (although it probably won't) and therefore it has to copy the string first.

If the syntax of the MID$() function had been MID$(start, count, string$) rather than MID$(string$, start, count) it could be much faster, because the string could be used in-situ rather than a copy being made. A case of what was probably an arbitrary choice, originally, having major consequences downstream.

(Incidentally your code doesn't need to use RETURN A$ since it doesn't modify the string. The RETURN will slow it down fractionally).
User avatar
hellomike
Posts: 192
Joined: Sat 09 Jun 2018, 09:47
Location: Amsterdam

Re: Programming Challenge - count capital letters in a string

Post by hellomike »

Thanks for the information Richard. You are always very thorough and brilliant in explaining such details.

I used RETURN A$ because I thought calling by reference prevents the interpreter making a copy (and releasing it on exit) of the input on the stack, saving time, especially when 1MB chunks are involved.

Also I realized that the following function body is even easier and faster (240 times instead of 160 times).

Code: Select all

      DEFFNstr_caps_fast(RETURN A$)
      LOCAL i%,n%
      IF A$="":=0
      FOR i%=PTR(A$) TO PTR(A$)+LENA$-1
        IF ?i%>64 IF ?i%<91 n%+=1
      NEXT i%
      =n%
Still considerably slower than using array arithmetic operators but arguably easier to read/understand and maintain for a common programmer (like me).

Regards,

Mike
KenDown
Posts: 327
Joined: Wed 04 Apr 2018, 06:36

Re: Programming Challenge - count capital letters in a string

Post by KenDown »

But this is faster still and - to my simple brain at least - seems to be simpler both to write and to execute.

Code: Select all

      l%=1024*500
      a$=STRING$(l%,"*")
      FORi%=1TOl%:MID$(a$,i%,1)=CHR$(32+RND(124)):NEXT

      TIME=0
      PRINTFNstr_caps_fast(a$)
      PRINTTIME
      TIME=0
      PRINTFNcaps(a$)
      PRINTTIME
      END
      :
      DEFFNstr_caps_fast(RETURN A$)
      LOCAL i%,n%
      IF A$="":=0
      FOR i%=PTR(A$) TO PTR(A$)+LENA$-1
        IF ?i%>64 IF ?i%<91 n%+=1
      NEXT i%
      =n%
      :
      DEFFNcaps(a$):LOCALi%,n%,p%,e%
      IFa$="":=0
      p%=PTR(a$):e%=p%+LEN(a$)-1
      FORi%=p%TOe%
        IF?i%>64IF?i%<91n%+=1
      NEXT
      =n%
Not much in it, but still ...
Hated Moron

Re: Programming Challenge - count capital letters in a string

Post by Hated Moron »

hellomike wrote: Tue 28 Feb 2023, 18:38 I thought calling by reference prevents the interpreter making a copy (and releasing it on exit) of the input on the stack, saving time, especially when 1MB chunks are involved.
No, the opposite is the case: adding RETURN increases the number of string copies, it doesn't reduce them! When the function or procedure is called, the actual parameter is copied into the formal parameter; this happens whether a RETURN is present or not. On return from the FN/PROC, if there is no RETURN the formal parameter is simply discarded, but with RETURN the formal parameter is copied back into the actual parameter. So, in essence, without RETURN there is one copy and with RETURN there are two copies.

Although the documentation describes RETURN as passing the string parameter by reference it's a lie; the end result is similar but the mechanism is quite different. BBC BASIC has no variable aliases, there is no way that two different variables - in this case the actual parameter and the formal parameter - can refer to the same object in memory. Consider the following code:

Code: Select all

   10 actual$ = "Input string"
   20 PROCtest(actual$)
   30 PRINT actual$
   40 END
      
   50 DEF PROCtest(RETURN formal$)
   60 PRINT formal$
   70 formal$ = "Output string"
   80 ENDPROC
For BBC BASIC to support 'true' passed-by-reference parameters (as some BASICs do using the BYREF keyword) the variable formal$ would need to refer to the same string as the variable actual$, but there is simply no way that can be achieved given how BBC BASIC's variables are organised in memory.

This can easily be demonstrated by replacing the ENDPROC with GOTO 30.
Hated Moron

Re: Programming Challenge - count capital letters in a string

Post by Hated Moron »

I wrote this program to time the various different methods of counting the number of capital letters (more precisely the number of characters in columns 4 and 5 of the ASCII table) in a string:

Code: Select all

      HIMEM = PAGE + 10000000

      test$ = STRING$(10000, "This is a TEST string. It has some Capital Letters, some 123456789 numbers, and some that are NOT so capital.")
      TIME = 0 : PRINT FNstr_caps1(test$), TIME
      TIME = 0 : PRINT FNstr_caps2(test$), TIME
      TIME = 0 : PRINT FNstr_caps3(test$), TIME
      TIME = 0 : PRINT FNstr_caps4(test$), TIME
      END

      DEFFNstr_caps1(a$)
      LOCAL a&()
      DIM a&(LENa$)
      $$^a&(0) = a$
      a&() AND= %100000
      =LEN(a$)-SUM(a&())/32

      DEFFNstr_caps2(a$)
      LOCAL N%,p%%
      IF a$="":=0
      FOR p%% = PTR(a$) TO PTR(a$)+LEN(a$)-1
        IF ?p%%>=&40 IF ?p%%<=&5F N%+=1
      NEXT
      =N%

      DEFFNstr_caps3(a$)
      LOCAL I%,N%,b&()
      IF a$="":=0
      DIM b&(LEN(a$)) : $$^b&(0)=a$
      FOR I%=0 TO LEN a$-1
        IF b&(I%)>=&40 IF b&(I%)<=&5F N%+=1
      NEXT
      =N%

      DEFFNstr_caps4(a$)
      LOCAL I%,N%
      IF a$="":=0
      FOR I%=1 TO LEN a$
        IF MID$(a$,I%,1)>="@" IF MID$(a$,I%,1)<="_" N%+=1
      NEXT
      =N%
Try it yourself, you should find that DDRM's non-loop (array arithmetic) method is far faster than any of the others, albeit that it would also count characters in columns 0 and 1. That can be fixed, without introducing a loop, but will increase the time.
Hated Moron

Re: Programming Challenge - count capital letters in a string

Post by Hated Moron »

KenDown wrote: Tue 28 Feb 2023, 21:17 But this is faster still and - to my simple brain at least - seems to be simpler both to write and to execute.

Code: Select all

      DEFFNcaps(a$):LOCALi%,n%,p%,e%
      IFa$="":=0
      p%=PTR(a$):e%=p%+LEN(a$)-1
      FORi%=p%TOe%
        IF?i%>64IF?i%<91n%+=1
      NEXT
      =n%
Putting to one side that your code is broken (pointers need to be 64-bit capable, so for example i%% here rather than i%) this is yet another method using a loop. The challenge stated: "you are not allowed to use any kind of loop, iteration or recursive function call - linear code only!".

There are several 'non qualifying' methods using a loop, and whilst they vary in their efficiency even the fastest of them is about seven times slower than DDRM's non-loop solution, using his test string, or about four times slower if his solution is modified not to count characters in columns 0 and 1.

It's also worth noting that the 'loop' solutions will get even slower if the proportion of capital letters in the string increases, whereas the non-loop solution takes the same time however few or many capital letters there are.

So the moral of the story, if there is one, is that whenever iterating through all the elements of an array (and a string is conceptually an array of characters) think about whether you might be able to use array arithmetic. If you can, there will probably be a big speed benefit.