recap 2024 - updates at OSSO
2024 – a story in four acts
The end of the year. A good time to reflect on what happened the past twelve months. The ups, the downs. What did we achieve? And what can we look forward to?
Some of you have asked to be updated on what we’re working on at OSSO. We’ll gladly share the gist here in this recap, which is in English for once (after four Dutch editions in 2020, 2021, 2022 and 2023). Now our expat friends who have been slow to learn Dutch can read it too. You know who you are.
The year 2024 has been “eventful”, if we may call it that. On a personal level, a personnel level and an interpersonal level: we have lost Herman’s beloved father, we’re saying goodbye to a colleague, but we’re gaining two, and we’ve discovered that project pressure can cause previously unseen strain between otherwise happy colleagues.
The year has also been fruitful. Things mentioned in previous years (multitenant logging, hardened Kubernetes security) have been deployed for various customers.
We had one grave incident this year. On the 6th of December a DDoS was not properly detected due to human error. This caused severe packet loss for most of our incoming traffic for about 12 minutes. But other than that, we can say we did well, from an uptime perspective.
Here’s an overview of the rest of this post, in case you want to skip to a particular section:
A story in four acts
- Act I: Certification Frustration – Words about certifications, pain points and the learning we did.
- Act II: Rejuvenation – About parting with people and fresh faces.
- Act III: Technicalities – The tech we’ve been working on for the past year, what improvements we have made, and a glimpse behind the curtain for next year.
- Act IV: Fun and Future – Things we did as a team and things to come.
Act I: Certification Frustration
Certification is not the most fun.
PCI aggravation
Ah, the acronym PCI, short for PCI-DSS, short for Payment Card Industry Data Security Standard. Three letters that can evoke strong emotions at HQ.
In 2023 we started a process to be PCI-DSS certified for a subset of our systems. This process turned out to be more involved than we had anticipated. We had the ISO/NEN certifications and even an undisclosed third one. How much work could another one be?
A lot. Apparently.
This is not the place to rehash all the details needed to certify, but among others it involved:
- altering our workflow when accessing systems;
- adding technical layers of security in prescribed (in detail!) forms;
- doing more reviews and approvals.
If you look at the requirements from a bird’s-eye perspective, many are good. You can’t be against more layers of security. We love extra security. And this whole process has absolutely improved certain aspects of the systems we manage.
But adding more layers and sidecars can chafe when it objectively does nothing but check a box.
You see, we believe in running lean systems. Removing parts that we know aren’t going to be useful reduces upgrade burdens and attack scope. This has been an integral part of our modus operandi.
“Just deploy it already.”
When we felt forced to add stuff that just added bloat (or worse) for no benefit, this went against the OSSO-grain and caused frustration between colleagues. That took more than a single beer to remediate.
Communication was lacking. Assumptions were made. Conclusions were drawn.
Words matter. It’s important for some of us to recognize that not everyone voices their frustrations right away.
PCI goodness
Luckily there was also a flip side.
We worked hard to get everything in place. Everyone played their part. And aside from having to make the compromises we mentioned, we did in the end gain more than we lost. Security has been improving. Consistency across deployments and core systems has improved. Documentation about resources has been amended. And we gained the change management workflow.
The verdict isn’t completely out on that one yet. It certainly slows things down. But everyone can agree that, in the long run, documenting changes and the reasoning behind them is a good thing. And the extra review can reduce future booby-traps.
Additionally, this process of extra documentation and review made our other audits this year a breeze. In fact, the auditor even cancelled the second day of auditing, deeming it unnecessary.
The pressure was high and it was stressful. But we learnt things, about ourselves and each other. And OSSO is now a better OSSO.
Certifications in 2024
- BHV-hulpverlening ✓ – like last year, we made it through a fun and educational day of Company Emergency Response
- ISO27001 ✓
- NEN7510 ✓
- PCI-DSS (for the in-scope systems) ✓ – we pulled through, and we do benefit from its implementation
Act II: Rejuvenation
We lost and we gained.
Ronald, gone-ald
Our network engineer, necromancer, guy who could revive a brick if needed, always happy to indulge us with a chuckle after a bad wordplay, will be leaving us at the end of the year. He had already been gone, twice. But he had returned twice as well.
It looks like this time he’ll be leaving for good. While we did not discuss all his motivations, it’s safe to assume that the dreaded PCI acronym had an influence on Ronald’s decision to move on from OSSO.
Lucky for us, Ronald won’t be leaving for Australia (again), but for the complementary business 2Hip, which is in the close vicinity, so we hope to see him around.
Actually, I know for a fact we’ll be seeing remnants of Ronald in the infrastructure for the foreseeable future anyway. Meanwhile, Jordi has taken over Ronald’s desk and has been gaining extra powers every day.
Pery
On the 17th of July this year, Herman’s father Pery Bos passed away. He will be remembered as a fit father and grandfather. Always enthusiastic, happy to chat and happy to help.
You will be missed.
Young blood
Lucky for us, we’ve also welcomed new OSSO family members: more youngsters.
We’re thrilled to see them thrive: they love to learn, and we love to teach. It’s a win for the organization, too, as they shed light on things that might not be obvious to the old guard.
We’ve had to pick up some Gen-Z lingo just to keep up. But hey, it was totally worth it. LET’S GOOOOOOO!
Seppe
Seppe joined at the start of 2024 and since July 2024 he has been incorporated into the 24/7 on-call schedule.
He writes:
At first it took some getting used to, because OSSO is a close-knit group of people who have been working together for a long time. Luckily I already knew Robin from another company where we had worked together, so it clicked quickly. […] What stands out is that people at OSSO don’t sit still for long and there is always plenty to do: customers first, and otherwise there are plenty of internal projects or lists of possible improvements. There is also an insane amount of knowledge within the team. Alex kubectl’s with his eyes closed and Walter seems to be able to do almost anything. And Ronald has the entire OSSO network in his head. […] One minor gripe: I have been working here for almost a year and I’m still not on the OSSO website. Luckily we have better things to do than update the website.
We’re glad to inform Seppe that the website issue has been resolved. ✨
Seppe is a really nice guy. Eager and enthusiastic, always ready to go the extra mile. And he knows a hell of a lot already.
Sometimes he can be a bit of a loose cannon (a little too much so, I suppose), but we think we’ve got it managed.
Emilia
Emilia started as a trainee in September 2024. Unlike most colleagues, she skipped the Hanze part. No biggie. We all know that experience and drive are not taught in school.
She writes:
I was very happy when I heard from Herman that I could do a traineeship at OSSO. At my previous job, the challenge and my interest had faded somewhat. There was no room to dig deeper into things, because there was no need for it. […] When I came to OSSO and started working with Kubernetes (and Cilium, Argo, etc.), Ansible, BGP and more, I thought that was super cool. I got to start right away on completely replacing the test environments with five new clusters. I found this really fun and educational, because it immediately gave me an idea of roughly how setting up a new customer goes. […] I found the SONiC trip to Germany very interesting, and it was also a perfect opportunity to get to know my colleagues better. This contributed enormously to finding my place within the team.
In the relatively short time she’s been here, we’ve grown fond of Emilia. She picks up things quickly. And she doesn’t even mind having her Pull Requests scrutinized line by line by Walter or Harm.
Act III: Technicalities
Logging, logging ..
This year, we finally got around to dotting the i’s and crossing the t’s on the Grafana/Loki logging front.
Sure, there were always logs, but they were dispersed and hard to aggregate. Last year, we got the Loki ingestion with the Grafana dashboard up and running, but it wasn’t usable for everyone. The forwarding agent was still in flux. Now we’re using grafana-alloy everywhere.
More and more of you are now using it. We’re still optimizing and tweaking, but are confident that it is up to standards. Let us know, and we can help you get up and running.
.. and more logging
Apart from the regular logging, we’ve also been playing with NATS, a publish and subscribe messaging service.
Vector passes logs on to both Loki and NATS. NATS then gets a shitload of logs from most systems to process. NATS o' Match gets to do an initial filter on the logs, so we get relevant messages in relevant streams. From there on, we do post-processing on streams of lower-volume logs.
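To make the "relevant messages in relevant streams" idea a bit more tangible, here is a minimal sketch in Python of what such an initial filter could look like. It is not how NATS o' Match is actually implemented: the rules, the subject names and the `route`/`partition` helpers are all made up for illustration, and no real NATS connection is involved.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical filter rule: lines matching `pattern` go to `subject`."""
    pattern: re.Pattern
    subject: str


# Illustrative rules only; the patterns and subject names are assumptions.
RULES = [
    Rule(re.compile(r"Failed password for"), "logs.auth.failed"),
    Rule(re.compile(r"Out of memory: Killed process"), "logs.kernel.oom"),
]


def route(line, rules=RULES):
    """Return the subject of the first matching rule, or None to drop the line."""
    for rule in rules:
        if rule.pattern.search(line):
            return rule.subject
    return None


def partition(lines, rules=RULES):
    """Group the matching lines per subject; non-matching lines are dropped."""
    streams = {}
    for line in lines:
        subject = route(line, rules)
        if subject is not None:
            streams.setdefault(subject, []).append(line)
    return streams
```

The point of filtering early is that the high-volume firehose is reduced to a handful of low-volume streams, which are cheap to post-process.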
For instance, here we also aggregate the logs that Tetragon produces when using eBPF to record execve() calls. That gives us a detailed history of all commands that someone ran when doing work on a system.
And we trigger alerts for unexpected events.
Of course this can be greatly improved by considering groups of events. That will be a project for 2025.
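As a rough illustration of alerting on single unexpected events, the Python sketch below checks JSON exec events against an allowlist of binaries. It assumes events shaped like `{"process_exec": {"process": {"binary": ..., "arguments": ...}}}`; the allowlist and the helper names are hypothetical, and this is not our production rule set.

```python
import json

# Hypothetical allowlist of binaries we expect to see during regular work.
EXPECTED_BINARIES = frozenset({
    "/usr/bin/ls",
    "/usr/bin/cat",
    "/usr/bin/systemctl",
})


def parse_exec(raw):
    """Return (binary, arguments) for an exec event, or None for anything else."""
    event = json.loads(raw)
    exec_event = event.get("process_exec")
    if exec_event is None:
        return None
    process = exec_event.get("process", {})
    binary = process.get("binary")
    if not binary:
        return None
    return binary, process.get("arguments", "")


def unexpected_commands(raw_events, expected=EXPECTED_BINARIES):
    """Return the command lines whose binary is not on the allowlist."""
    alerts = []
    for raw in raw_events:
        parsed = parse_exec(raw)
        if parsed is None:
            continue
        binary, arguments = parsed
        if binary not in expected:
            alerts.append(f"{binary} {arguments}".strip())
    return alerts
```

A per-event check like this is easy to reason about, but it cannot see patterns that only show up across several events, which is exactly why grouping is on the list for 2025.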
.. and metrics
We’ve begun collecting the first metrics in Mimir.
Expect more of this in 2025, along with Tempo and Pyroscope for better insight into your applications.
Let us know if you want to be kept in the loop!
Kubernetes
Our Managed Kubernetes has been taken to the next level.
We use Cilium networking for all new setups. We add network policies and have Kyverno enforce them. And we’re now managing all Kubernetes core components using a combination of Ansible and Argo-CD. All your live-hacks will be found.
(Customers can now also get Cilium metrics.)
Did you know that in 2024, Kubernetes turned 10? This year we decommissioned our oldest cluster, which went live in February 2018 with Kubernetes v1.10, back then provisioned using SaltStack and using Canal networking.
Our testing clusters on dostno have all been upgraded to the latest and greatest by our newest colleague Emilia.
MinIO has proven to be a useful addition to individual clusters and is deployed for those who need private Object Storage within their clusters.
General improvements
What PCI (see above) has brought is better internal scrutiny of the changes that we make. Change management helps us stay in control.
We use Gitea for those sensitive projects, documentation and reports. GitLab is nice and feature-rich, but it has a new CVE coming out every fortnight. With Gitea we get fine-grained permissions, a lower footprint and better sleep.
And of course there is more. For instance, an in-house suave tool to keep track of CVE reports. And various utility scripts in vcutil like nbdig to get host and IP information from Netbox on the command line.
OSSO and AI
Large language model (LLM)-style AI is powerful. This is not the place for a long-winded argument about why, but you should keep the following points in mind:
- Uploading data to the cloud is risky (to any cloud, especially those hosted in other countries or even on other continents) because you can’t be sure what happens to your data.
  - Will it be used for training and end up in other people’s results?
  - Will the government of that country respect your privacy, or can they access your data regardless of the company’s promises?
  - Is the data secure from hackers?
- LLMs can hallucinate. Since they are essentially sophisticated autocomplete tools, they often produce convincing but entirely false information. Don’t take anything from an LLM at face value: always double-check the results.
For these reasons, and because we like to have control over every component, we are cautious about using AI. While this might feel like we’re missing out, it also ensures that OSSO is not feeding your sensitive medical logs into some cloud.
Together with our customers, we will continue to explore the possibilities of AI at a sensible and deliberate pace.
But one can still make good use of the available resources.
For instance, during the writing of this text, ChatGPT was consulted as an assistant to come up with more natural-sounding English. Using it like a native-speaker friend seems to work well.
This dictionary-style lookup is really convenient as it picks the right synonym most of the time. This obviously works best when you’re already proficient in the language.
The other thing we used AI for was the Christmas card many of you got in the mail. The images were generated using Stable Diffusion WebUI Forge with the FLUX.1-schnell model, which has an Apache-2.0 license.
This ran on one of our hypervisors with a dedicated GPU. If you want access to one of those, just give us a call.
Act IV: Fun and Future
There were fewer blogs written in 2024. A three-letter acronym from the first Act that consumed so much energy might be partly to blame. We can probably do better next year.
Did we do fun stuff in 2024? We certainly did.
Office improvements
Rona and Ronald (Ro-Ro Plant Services) and Alex have been busy keeping the greenery in the office green. The avocado plants, which we grew from kernels, are thriving nicely.
We’ve upgraded our coffee machine. Making a cappuccino is no harder than taking milk from the fridge. Don’t hesitate to ask for one if you’re at HQ!
And just in the nick of time, we put some software on the Pimoroni Cosmic Unicorn boards. They now display pretty green dots that come and go… unless there is a Disaster trigger in Zabbix monitoring, at which point they turn red to alert everyone in the office.
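The decision logic behind the display is simple enough to sketch in a few lines of Python. This is not the code that runs on the boards: the colour tuples and the `display_colour` helper are assumptions, and fetching the active triggers from Zabbix is left out. The only fact relied upon is that Zabbix trigger severities run from 0 (not classified) to 5 (disaster).

```python
DISASTER = 5  # Zabbix trigger severity: 0 (not classified) up to 5 (disaster)

GREEN = (0, 255, 0)  # assumed RGB value for the "all is well" dots
RED = (255, 0, 0)    # assumed RGB value for the alert state


def display_colour(active_severities):
    """Return RED if any active trigger is a Disaster, otherwise GREEN.

    Severities may be ints or numeric strings, since the Zabbix API
    returns the trigger priority as a string.
    """
    if any(int(severity) >= DISASTER for severity in active_severities):
        return RED
    return GREEN
```

Everything below Disaster is deliberately ignored here, so the office only turns red when it really matters.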
Finsterwolde
As a team outing, we took an after-PCI mini-trip to Finsterwolde. We rented a villa where we relaxed and spent time together.
Cramped in the car with the food.
Enjoying a game of pool. Chilling in the sun.
Being outside and eating good food.
That was well deserved.
SONiC in Stuttgart
The other trip we did was to Stuttgart, or actually Herrenberg. There we did a three-day training course on SONiC with a consultant from Stordis.
Our Cumulus-based Closso internal network infrastructure (see the 2021 blog) isn’t that old, but it could use some updates and improvements. Cumulus Networks was bought by NVIDIA, and after the acquisition support for Broadcom switches was ended. This means that we cannot use that upgrade path.
Instead, we’ve been looking into SONiC as a replacement. And because the one with the most network expertise (Ronald) is leaving, we thought it would be a good idea to do a training together. This is certainly something that a bit of self-study could’ve achieved, but by being away from the office we made time to work on it together and get a shared vocabulary on the subject.
The Stordis consultant gave us a friendly environment where we could tinker with some virtual switches. And the dedicated time made sure we weren’t distracted by other tasks that would normally get our attention.
SONiC promises to do the things we do now, and do more, like VXLAN for easier movement of IPs across physical locations. But how it’s built is vastly different from how Cumulus did it and it’s not entirely mature yet (see: [1] [2]).
But aside from the tech, the trip to Germany was a nice way to bond, especially with our newer colleagues.
Schwäbisch dinner.
After dinner frisbee at the main square of the old town.
Chilling at the hotel bar. And a view from the observation tower.
We did not get to see the city of Stuttgart, but we did enjoy the small walkable town of Herrenberg with its pretty houses. And the forest with the observation tower provided a nice little hike. Let’s hope Alex, Jordi and Rona can join us next time.
As for SONiC: we’ll see in 2025 if this is going to be the replacement of our current Software-Defined Network (SDN).
As for other networking in 2025: 2Hip has been busy migrating our older Juniper MX204’s to newer and faster Nokia 7750 SR-1’s. Most of that is now done. And Duocast is working on getting 100Gbit links from Amsterdam to Groningen. Aside from possibly noticeable speed gains, it will certainly further reduce the impact of DDoSes on our network.
WHY2025
In 2022, the entire company went with our families to the hacker camp MCH. In 2025, its successor WHY is scheduled.
WHY2025 is a nonprofit outdoor hacker camp where knowledge sharing, technological advancement, experimentation, connecting and hacking are the focus. MCH2022 was a really fun and enjoyable experience. The extended OSSO family will again be present at the Family Village in WHY.
WHY don’t you join us in 2025? *cringe*
Finishing up
As we do every year, we’ve filled the office with Christmas lights galore to create a warm and welcoming atmosphere for our Christmas dinner. It will also provide the backlight to the yearly OSSO Gaming Day on Friday the 20th of December.
The families, including the little ones, were present for dinner.
The food and the company were good.
OSSO wishes you all the best for 2025. You know how to reach us!
P.S. It has been rumoured that OSSO B.V. has its 20-year anniversary in 2025. That might call for a party…
Whoa
Congratulations, you made it all the way down! We like you. ❤️
See you at HQ, or in Slack.