Home-Server Machine Learning

Using AI in programming Part 3

To make it short: I’m now using an existing open-source tool for VSCode which can point to my local server. It’s called Open Copilot and is licensed under Apache 2.0, which allows commercial use.

Once you have set up the URL of your local server, you can query the extension “Cody” via the icon in the left sidebar or inline in your code.

It’s somewhat usable on my machine.
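Before wiring up the extension, it is worth checking that the local server actually answers. A minimal sketch in Python, assuming the llama.cpp server from Part 2 is listening on 192.168.1.50:8080 (a hypothetical address; adjust host and port to your setup) and that your llama.cpp build is recent enough to expose the /health endpoint:

```python
import urllib.request
import urllib.error

# Hypothetical address of the llama.cpp server from Part 2; adjust to your setup.
SERVER = "http://192.168.1.50:8080"

def server_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the llama.cpp server answers its /health endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns False, check the extension’s server URL setting last; firewalls and the VM’s network configuration are the more common culprits.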


Using AI in programming Part 2

Here I will describe how to install the Code Llama model on one of your VMs. The problem here is that you usually lose GPU support because Microsoft is still not able to provide GPU-P for Hyper-V. TY MS at this point.

First, prepare a VM; I’m choosing Ubuntu Server 22.04.3 LTS and assigning 8 cores and 64 GB of RAM.

Check if Git is installed (insert “git --version” in the terminal):

Clone the llama.cpp project and build it; insert the following commands in the terminal (“make” without any options builds CPU-only, and you may need to install the gcc compiler first):

sudo apt install build-essential   # gcc compiler and make
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make   # CPU-only build

Downloading the model from Hugging Face can be done with a nice helper script (insert the script’s URL after curl):

bash <(curl -sSL <download-script-url>) -m TheBloke/CodeLlama-70B-hf-GGUF --concurrent 8 --storage models/

After downloading the model, we can start the llama.cpp server:

./server \
    --model ./models/TheBloke_CodeLlama-70B-hf-GGUF/codellama-70b-hf.Q5_K_M.gguf \
    --host 0.0.0.0 \
    --port 8080 \
    -c 2048

The next step would be to create a plugin for VSCode, e.g. to autofill code from a comment block.
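Such a plugin essentially boils down to a single HTTP call. A minimal sketch in Python of how to query the llama.cpp server’s /completion endpoint; the host address is an assumption matching a server started as above on your VM:

```python
import json
import urllib.request

# Assumed address of the llama.cpp server started above; adjust to your VM.
SERVER_URL = "http://192.168.1.50:8080/completion"

def build_completion_request(prompt: str, n_predict: int = 128) -> urllib.request.Request:
    """Build a POST request for llama.cpp's /completion endpoint."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Uncomment once the server is running:
# req = build_completion_request("// C# method that reverses a string\n")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["content"])
```

An editor extension would do exactly this on a keystroke or command: send the comment block as the prompt and insert the returned “content” field below it.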


Using AI in programming

This is an example of how to use AI in your own environment. I’m using the popular LLM Code Llama from Meta, which is open source (also for commercial use) and supports C#, my main programming language. Another advantage is that you can use it in your own environment, so your most valuable asset, the source code, will not leave your network.

Code Llama comes in different sizes; I’m currently using the 70B model, which returns the best results but requires quite a few resources. I’ll show you how to self-host a tool that runs the LLM on a central server and how to write a simple VSCode extension that accesses the API to query results.

But first, let’s try to run it locally on my development machine. For this, just install Ollama (MIT license) and select the model from Meta.

It usually gives you a code snippet and an explanation. It works, but it may be a bit inconvenient. In the next post, I’ll show you how to run the model in your own virtual environment and access it via web or inside your code editor.
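Besides the interactive prompt, Ollama also exposes a local REST API (by default on port 11434), which is handy for scripting. A minimal Python sketch, assuming the codellama model has already been pulled with Ollama:

```python
import json
import urllib.request

# Ollama's default local endpoint; "codellama" assumes the model was pulled beforehand.
OLLAMA_URL = "http://localhost:11434/api/generate"

def generate_payload(prompt: str, model: str = "codellama") -> dict:
    """JSON body for a single, non-streaming /api/generate request."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_model(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the answer text."""
    data = json.dumps(generate_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# print(ask_model("Write a C# method that reverses a string."))
```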


Speeding up storage (with NVMe)

I just found this PCIe card from ASUS (ASUS Hyper M.2 x16 Gen 4) for about 50€, which lets you connect 4 additional NVMe drives to your mainboard. It’s for my EPYC board (ASRock Rack ROMED8-2T), but it also works with sTRX40 boards. I’ve put 4x Samsung 970 EVO Plus in a RAID 1 configuration. The storage will be used primarily for the VMs and the databases. For the adapter to work, you will need to change the PCIe lane configuration (bifurcation) of the slot from 16x to 4x+4x+4x+4x.


My old NAS

This is my old NAS, based on an Intel Core i3-6100 from 2016. The system uses 5 Seagate NAS HDDs (ST4000VN000) in a Storage Spaces configuration using RAID 5 mode. It was reliable and quiet, and it ran almost 5 years without any issue. However, it was too slow and I needed more space. It will serve as a playground for my future YouTube videos and maybe later as a backup machine.

My old NAS in a Node 304 casing
Status of my old NAS, some data is already moved to my new Server.

New HBA has arrived

Finally, I received my new Microsemi HBA 1100-16i SAS controller. Now I can begin setting up my hyperconverged server. This host bus adapter offers 4×4 internal SATA/SAS channels. Together with the ports on the mainboard, this should be more than enough for any upgrades in the next few years.

Microsemi HBA 1100-16i


I recently moved my files from a NAS to a Windows Server 2019 installation. For this I’m using Storage Spaces with tiered storage.

See my wiki for the how-to: Create tiered storage spaces, or my YouTube video.