DeepSeek AI

Richard Russell
Posts: 458
Joined: Tue 18 Jun 2024, 09:32

Re: DeepSeek AI

Post by Richard Russell »

Richard Russell wrote: Tue 11 Feb 2025, 15:01 I asked DeepSeek to write BBC BASIC code for a two-input, two-layer Perceptron and this is what it produced.
The tabular output from the program doesn't give any indication of what the Perceptron does with input values other than the 'ideal' 0.0 and 1.0, so I got it to plot the entire 'landscape' of results with each input varying over the full range:

perceptron.jpg

Interestingly, although this particular output represents a near-perfect result, the learning process (even with 10,000 iterations) doesn't reliably generate it. Some runs give very different results, which whilst still solving the exclusive-or problem are far more 'marginal'. This is presumably because with training data consisting of only four different states, and nine weights to adjust, the model is under-constrained.
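To illustrate the 'landscape' idea and the difference between a near-perfect and a 'marginal' solution, here is a minimal sketch in Python (not the BBC BASIC code DeepSeek produced). It assumes a 2-2-1 sigmoid network, which is one layout with nine adjustable weights once the biases are counted. Both weight sets are hand-picked for illustration rather than trained, and the names SHARP and MARGINAL are mine.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Each unit is (weight for input 1, weight for input 2, bias).
# Two hidden units plus one output unit gives 3 x 3 = 9 parameters.
SHARP = [
    (20.0, 20.0, -10.0),    # hidden unit 1, roughly OR
    (-20.0, -20.0, 30.0),   # hidden unit 2, roughly NAND
    (20.0, 20.0, -30.0),    # output unit, roughly AND of the hidden units
]
# Same shape of solution with smaller weights: the corners are still on
# the right side of 0.5, but only just.
MARGINAL = [
    (5.0, 5.0, -2.5),
    (-5.0, -5.0, 7.5),
    (5.0, 5.0, -7.5),
]

def forward(weights, x1, x2):
    """Output of the 2-2-1 network for inputs x1, x2."""
    (a1, a2, ab), (b1, b2, bb), (o1, o2, ob) = weights
    h1 = sigmoid(a1 * x1 + a2 * x2 + ab)
    h2 = sigmoid(b1 * x1 + b2 * x2 + bb)
    return sigmoid(o1 * h1 + o2 * h2 + ob)

def landscape(weights, steps=11):
    """Grid of outputs with each input stepped from 0.0 to 1.0."""
    scale = steps - 1
    return [[forward(weights, i / scale, j / scale) for j in range(steps)]
            for i in range(steps)]

def render(weights, steps=11):
    """Crude text plot: darker characters mean an output nearer 1."""
    shades = " .:-=+*#%@"
    rows = []
    for row in landscape(weights, steps):
        rows.append("".join(shades[min(9, int(v * 10))] for v in row))
    return "\n".join(rows)

if __name__ == "__main__":
    for name, weights in (("SHARP", SHARP), ("MARGINAL", MARGINAL)):
        print(name)
        print(render(weights))
```

Both weight sets classify the four training states correctly, yet their outputs between the corners differ a good deal, which is the sense in which four training states leave nine weights under-constrained.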
Richard Russell
Posts: 458
Joined: Tue 18 Jun 2024, 09:32

Re: DeepSeek AI

Post by Richard Russell »

Here's a hint for anybody asking DeepSeek for help in writing a BBC BASIC program. If you know that your program is going to need some BB4W or BBCSDL-specific features, or is going to need to call functions in one or more libraries, click on the paperclip icon on the DeepSeek front page and upload any relevant documentation, such as sections from the BB4W or BBCSDL manual. That may involve downloading them to your PC first.

It's entirely possible that those documents, if online, have already been ingested by DeepSeek during its training phase, but explicitly uploading them will ensure that they are taken into account and may raise their priority in its deliberations (although I don't know for sure that's the case). You must re-upload them in every session, because DeepSeek won't remember them from one session to the next.

I've not tried this myself but in principle it should be a useful way of improving its understanding of BBC BASIC. With better-known languages (e.g. C or Python), for which much more online material is available for training, AIs are now approaching or exceeding the skills of the very best human programmers (one recent test assessed an AI as being better than 99.8% of human coders).
Trackside
Posts: 7
Joined: Fri 09 May 2025, 14:17

Re: DeepSeek AI

Post by Trackside »

I use BBC BASIC, mainly Sophie's version on a RPi3b+, but also yours on the road (because it runs on my phone), to code my new type of AI.

It's mainly maths on paper, but I'm working on a version 2 (FPGA) simulator, which will support a benchmark. Currently working with Bath University and Chipstart UK, who will run independent tests on the benchmark under NDA. (Hence also, no code sharing of the AI here.)

In order to benchmark it against other systems though, I need a basic ANN, preferably coded in BBC BASIC, to run as a comparison. Yours above, even though coded by an LLM, looks like a start. I'm also looking at NEAT (NeuroEvolution of Augmenting Topologies - evolutionary computing and ANN) in Lua, which could be converted and the evolutionary bit stripped out. That one comes with its own benchmark, which looks pretty, but isn't needed beyond its API. Description, in a video with links: https://www.youtube.com/watch?v=qv6UVOQ0F44
Richard Russell
Posts: 458
Joined: Tue 18 Jun 2024, 09:32

Re: DeepSeek AI

Post by Richard Russell »

DeepSeek confirms that it has only ingested plain text format BBC BASIC programs for training purposes, not tokenised / internal-format programs. Unfortunately this means that a large resource of potentially valuable training data (including all the libraries and example programs supplied with BBCSDL) has not been used, which is an unanticipated consequence of using tokenised files online for efficiency.

It's probably too late for this to be of much value now, but perhaps in the future more use should be made of plain-text BBC BASIC programs online. Or perhaps some web page could be created, by somebody cleverer than I am, which can de-tokenise on-the-fly to make the libraries and example programs available for AI training purposes.
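For anyone wondering what de-tokenising involves, here is a minimal sketch in Python. It assumes the line layout I understand BB4W and BBCSDL to use (a length byte, the line number low byte then high byte, the tokenised statement, and a terminating CR, with a zero length byte marking the end of the program); please check that against the manual before relying on it. The token table is deliberately partial (three keywords only), so a real tool would need the full table and would also have to decode embedded line numbers after GOTO and the like.

```python
# Partial token table, for illustration only; a real detokeniser needs
# every keyword.
TOKENS = {0xF1: "PRINT", 0xF4: "REM", 0xE0: "END"}
REM_TOKEN = 0xF4

def detokenise(data):
    """Return a list of (line_number, text) pairs from tokenised bytes."""
    lines = []
    pos = 0
    while pos < len(data):
        length = data[pos]
        if length == 0:               # zero length marks end of program
            break
        number = data[pos + 1] | (data[pos + 2] << 8)
        body = data[pos + 3:pos + length - 1]   # drop the trailing CR
        text = []
        in_quotes = False             # keywords are not expanded in strings
        literal = False               # nor anywhere after a REM
        for byte in body:
            if byte == 0x22:          # double quote toggles string state
                in_quotes = not in_quotes
            if byte in TOKENS and not in_quotes and not literal:
                text.append(TOKENS[byte])
                if byte == REM_TOKEN:
                    literal = True
            else:
                text.append(chr(byte))
        lines.append((number, "".join(text)))
        pos += length                 # length byte covers the whole line
    return lines

# Hand-built sample: 10 PRINT "HI" / 20 END
SAMPLE = bytes([
    0x0A, 0x0A, 0x00, 0xF1, 0x20, 0x22, 0x48, 0x49, 0x22, 0x0D,
    0x05, 0x14, 0x00, 0xE0, 0x0D,
    0x00, 0xFF, 0xFF,
])

if __name__ == "__main__":
    for number, text in detokenise(SAMPLE):
        print(number, text)
```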
jgharston
Posts: 50
Joined: Thu 05 Apr 2018, 14:08

Re: DeepSeek AI

Post by jgharston »

On my site where I have BASIC programs I've put a (L) link that detokenises the code and presents it as a textual listing, using an update of Ben Ryves' detokeniser, eg at: here.
jgharston
Posts: 50
Joined: Thu 05 Apr 2018, 14:08

Re: DeepSeek AI

Post by jgharston »

Richard Russell wrote: Sat 05 Jul 2025, 11:37 Or perhaps some web page could be created, by somebody cleverer than I am, which can de-tokenise on-the-fly to make the libraries and example programs available for AI training purposes.
It's fairly simple to do:
* Create a directory:
/bin
* Upload the files:
bbc.phps as /bin/bbc.php
bbctok.phps as /bin/bbctok.php
* Edit the /.htaccess file (or create one if it doesn't exist). If there is no RewriteEngine
section, add to the end:
RewriteEngine On
RewriteBase /
* Add to the end of the RewriteEngine section:
RewriteRule ^(.*?)\.(bas|BAS)$ /bin/bbc.php?file=$1.$2

Then any URL that ends in ".bas" or ".BAS" will be redirected to the detokeniser, which looks for the source file at the same URL with the extension changed to .bbc, .BBC, .src or .SRC.

E.g. a reference to blah/mytestprog.bas will redirect, fetch blah/mytestprog.bbc and display it as text.
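To make the effect of the RewriteRule concrete, here is a small Python sketch that applies the same regular expression to a request path. It assumes the rule sees the path without its leading slash, as is usual for a rule in a top-level .htaccess file. The candidate_sources helper and the order in which it tries the extensions are my own guess at what the PHP script does, not taken from its source.

```python
import re

# Same pattern as in the RewriteRule above.
RULE = re.compile(r"^(.*?)\.(bas|BAS)$")

def rewrite(path):
    """Return the rewritten target for a request path, or None if no match."""
    match = RULE.match(path)
    if match is None:
        return None
    return "/bin/bbc.php?file={}.{}".format(match.group(1), match.group(2))

def candidate_sources(path):
    """Tokenised files the detokeniser might look for (assumed order)."""
    match = RULE.match(path)
    if match is None:
        return []
    return [match.group(1) + ext for ext in (".bbc", ".BBC", ".src", ".SRC")]

if __name__ == "__main__":
    print(rewrite("blah/mytestprog.bas"))
    print(candidate_sources("blah/mytestprog.bas"))
```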
Richard Russell
Posts: 458
Joined: Tue 18 Jun 2024, 09:32

Re: DeepSeek AI

Post by Richard Russell »

jgharston wrote: Sat 05 Jul 2025, 15:32 It's fairly simple to do:
That sounds great, but - if I have understood your description correctly - I presume there would need to be some kind of index page, listing all the relevant programs but with .bas extensions rather than .bbc, which could be crawled by an AI looking for training data.
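The sort of index page I have in mind could be generated mechanically. Here is a minimal sketch in Python, assuming the list of tokenised file paths is already known; the function name and the bare list markup are purely illustrative.

```python
from html import escape
import posixpath

def index_page(tokenised_paths):
    """Build an HTML list linking each .bbc file under a .bas name."""
    rows = []
    for path in sorted(tokenised_paths):
        stem, ext = posixpath.splitext(path)
        if ext.lower() != ".bbc":     # skip anything that is not tokenised BASIC
            continue
        href = escape(stem + ".bas", quote=True)
        label = escape(posixpath.basename(stem) + ".bas")
        rows.append('<li><a href="{}">{}</a></li>'.format(href, label))
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"

if __name__ == "__main__":
    print(index_page(["demos/dice56.bbc", "readme.txt"]))
```

A crawler following those links would then request the .bas names and receive plain text, provided the links resolve on a site where the detokeniser is installed.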

What's more, that index page would presumably have to be somewhere other than at Github, which is where the programs themselves are to be found, because I can't put the required .htaccess file at Github itself.

Before putting any effort into implementing this, can you confirm that it should work in such a cross-site manner?
jgharston
Posts: 50
Joined: Thu 05 Apr 2018, 14:08

Re: DeepSeek AI

Post by jgharston »

Richard Russell wrote: Sat 05 Jul 2025, 16:50 That sounds great, but - if I have understood your description correctly - I presume there would need to be some kind of index page, listing all the relevant programs but with .bas extensions rather than .bbc, which could be crawled by an AI looking for training data.
No. Wherever you write a page and say:
Here is a spiffy program that Bob has written that simulates 5.6-dimensional dice: (a href=dice56.bbc)dice56.bbc(/a)
..you just also put....
text version (a href=dice56.bas)here(/a)
Richard Russell wrote: Sat 05 Jul 2025, 16:50 What's more, that index page would presumably have to be somewhere other than at Github, which is where the programs themselves are to be found, because I can't put the required .htaccess file at Github itself.
It does require that you control the site and have php execution and url rewriting available. You should be able to have a reference on github to a site with the tokenised file on it, such as to bbcbasic.com. It would be no different to me posting here: Cursor.bbc - a link from bbcbasic.net to a URL on mdfs.net.

As the code stands at the moment, the PHP code has to be on the same site as the tokenised source BASIC. You couldn't have a tokenised file on GitHub, point a GitHub URL to it ending in .bas and have it translated. It would have to go via the PHP on the site the PHP is on, something like:

On this GitHub repository is (a href=src/demos/dice56.bbc)dice56.bbc(/a) (a href=http:bbcbasic.com/bin/bbctok.php?http:github/rtrussell/src/demos/dice56.bas)text version(/a)

...but it would need some work on the PHP code to work. I'll have a look when I have a break from ANSI colours with BBC BASIC on RunCPM. ;) if I can get my Linux installation to boot properly. CentOS claims my x86-64 is a 32-bit 686. :(
Richard Russell wrote: Sat 05 Jul 2025, 16:50 Before putting any effort into implementing this, can you confirm that it should work in such a cross-site manner?
I've got a GitHub repository, so I've got bbcbasic program files on there that I can use for testing, so I think I've got everything set up to get it working.
Richard Russell
Posts: 458
Joined: Tue 18 Jun 2024, 09:32

Re: DeepSeek AI

Post by Richard Russell »

jgharston wrote: Sat 05 Jul 2025, 19:25 No. Wherever you write a page and say:
Here is a spiffy program that Bob has written that simulates 5.6-dimensional dice: (a href=dice56.bbc)dice56.bbc(/a)
..you just also put....
text version (a href=dice56.bas)here(/a)
But that's not the situation we have; the question is about crawling the web to obtain training data for an AI. Since no page on the web refers to the example programs as .bas files (which is why I was suggesting creating one), they will never be found, whether or not a PHP script is present to de-tokenise them.
As the code stands at the moment, the PHP code has to be on the same site as the tokenised source BASIC.
Ah, it's not going to work then.

The PHP proxy I use for my Ceefax simulator does read files from another site, so there would probably be a way to achieve the required behaviour, in principle, but I don't have the necessary skill.
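For what it's worth, the link-building side of such a cross-site arrangement might look something like this minimal Python sketch. Everything specific in it is an assumption for illustration: the proxy address, the "url" parameter name, and the idea of restricting the proxy to a short list of permitted hosts so that it cannot be used to fetch arbitrary pages.

```python
from urllib.parse import urlencode, urlparse

# Hosts the proxy would agree to fetch from (assumed policy).
ALLOWED_HOSTS = {"raw.githubusercontent.com"}

def proxy_link(proxy_base, source_url):
    """Build a 'text version' link that passes source_url to the proxy."""
    parts = urlparse(source_url)
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        raise ValueError("source not permitted: " + source_url)
    # urlencode percent-escapes the ':' and '/' in the embedded URL.
    return proxy_base + "?" + urlencode({"url": source_url})

if __name__ == "__main__":
    # Both addresses below are placeholders, not real locations.
    print(proxy_link(
        "https://example.org/bin/bbctok.php",
        "https://raw.githubusercontent.com/user/repo/main/dice56.bbc",
    ))
```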