Embracing Mediocrity


I was talking with a coworker the other day about one of the new AI offerings and how well it simulated human interaction. I was asking it to summarize some notes I had taken on a book, and it did a fantastic job - but I then mentioned something I truly believe:

AI is mediocre, at best

It has to be, by definition. It's just repeating back to us the most common and probable things that we have said to it. Or, as Scott Hanselman describes it: "AI is a sock puppet on your own hand":

Understanding Probability and Information Theory

Claude Shannon is one of the towering figures of Computer Science and mathematics that no one has heard of. He laid the groundwork for digital circuit design by synthesizing George Boole's work with true and false (Boolean Algebra) and the base-two numbering system (aka "binary"). That was his Master's Thesis back in the 1930s.

Later in his life, he single-handedly started the Information Age by publishing his theorems on how you could "quantify" information. His breakthroughs led to compression algorithms, the notion of "entropy" in cryptography, and error correction.

He also laid the foundation for predictive analysis, which is key when it comes to error correction.

Surprise!

Back in the 1930s and 40s, people wanted to know just how much pure information they could send from point to point. This used to be done by telegraph, person-to-person, or actual letters in the mail. Wars, however, were becoming larger and more complicated, so the need to send more information in a shorter period of time was crucial.

Shannon realized that people communicate in extremely "redundant" ways, filling sentences with "noise" that doesn't communicate anything aside from pleasantries.

To him, the true measure of a sentence was its "surprise" - the unexpected thing that, quite literally, would surprise you, and that you would learn from.
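Shannon gave "surprise" a precise form, which he called self-information: the less probable a word or symbol, the more bits of information it carries. Here's a tiny sketch in Python - the probabilities are made-up numbers purely for illustration:

```python
import math

def surprisal(p):
    """Shannon's self-information, in bits: -log2(p).
    Rare events carry more surprise than common ones."""
    return -math.log2(p)

# A word you fully expected (hypothetical probability 0.5) carries 1 bit...
print(surprisal(0.5))   # 1.0

# ...while a genuinely unexpected word (probability 0.01) carries far more.
print(surprisal(0.01))  # ~6.64 bits
```

A coin flip you can't predict at all is worth exactly one bit; a sentence you could have finished yourself is worth almost nothing.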

Consider a text from my mom, checking in on me:

Hi Rob, hope you're having a nice day. I wanted to know if you could send me the link to the article you and I talked about the other day. Call me if you can. Love, mom

Note: yes, she really does sign off that way.

This message is full of redundancy that, to an Information Theorist (or a network admin trying to reduce load on his pipes), could be made much more efficient, thus increasing its "surprise":

ChatGPT 🔗🤦🏼‍♀️

In my mom's first message, each sentence has a degree of predictability and form. In other words: it's not all that surprising. The only thing I learn is that she forgot the link for the ChatGPT article we talked about.

The second message makes this blindingly clear.

Shannon realized two things from this: surprise is useful for compressing information, but redundancy is good for predicting what will be said.
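You can see the compression half of that insight with nothing but the standard library. A redundant message (like Mom's) squeezes down dramatically, while high-surprise data (random bytes stand in for it here) barely compresses at all - a rough sketch, not a formal demonstration:

```python
import os
import zlib

# Redundant text, like Mom's check-in, repeated for effect...
redundant = b"Hi Rob, hope you're having a nice day. " * 20

# ...versus maximally "surprising" data of the same length.
random_bytes = os.urandom(len(redundant))

# Redundancy compresses well; surprise doesn't.
print(len(redundant), "->", len(zlib.compress(redundant)))
print(len(random_bytes), "->", len(zlib.compress(random_bytes)))
```

The redundant message shrinks to a fraction of its size; the random bytes come out roughly as big as they went in (sometimes slightly bigger). Surprise is, in a very real sense, the part you can't compress away.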

The Surprise of ChatGPT Responses

This brings us to the mediocrity part, and I do realize that it was a bit of a wander, but hey, it's Saturday and it's fun to learn CS stuff, isn't it?

When an LLM does its thing and crunches massive amounts of data, it's following a path laid down by Claude Shannon almost a century ago. Each bit of information is related to a previous bit based on the structure of the language being used - which is English, in most cases. Sometimes it's code, I suppose, but either way there has to be some underlying structure or nothing will work.

If I type the letter "q", what letter will follow? In almost every case, it will be a "u" because that's the way English works. If I type the phrase "The Little" into Google's search box, the predictive bits in the background will do their thing for me:

Google is able to do this by leveraging a specialized tree data structure called a "Digital Tree", better known as a "Trie":

This is, of course, an extremely simplified doodle by yours truly - but the idea is the same: each letter you type decreases the possible result set, which increases the likelihood of an answer.

If I typed in "HA" and used this trie to predict what was next, my choices would be "half", "hang", "hand", "handball", and "handbook". If I added an "n", we'd get closer to the thing you're looking for, assuming we're working with word structures.
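The trie above can be sketched in a few lines of Python. This is a bare-bones version - real autocomplete systems also rank results by popularity - but it shows how each extra letter narrows the candidates:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # letter -> child node
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix):
        """Walk down to the prefix, then collect every word below it."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        def walk(n, path):
            if n.is_word:
                results.append(prefix + path)
            for ch, child in n.children.items():
                walk(child, path + ch)
        walk(node, "")
        return results

trie = Trie()
for word in ["half", "hang", "hand", "handball", "handbook"]:
    trie.insert(word)

print(trie.complete("ha"))   # all five words
print(trie.complete("han"))  # "half" drops out; four candidates remain
```

Typing "ha" matches all five words; adding the "n" prunes "half" away. Every keystroke is a step down the tree, shrinking the result set.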

But here's the real question: how was this trie created? Randomly from a dictionary, or from an encyclopedia? Maybe it's a collection of every tweet ever written, which would be horrifying!

The transformers and algorithmic voodoo that LLMs use to predict an answer for a question work in much the same way: using probability, based on the underlying model, to hand you the most likely answer.

The most likely thing tends to be the most common thing, which can be good or bad depending on what you need.
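Stripped of all the transformer machinery, the core move looks something like this - the candidates and probabilities below are made-up numbers, purely to show the idea:

```python
# A toy "next answer" distribution. Real LLMs score thousands of tokens
# and often sample with a temperature rather than always taking the top pick,
# but the default pull is toward the highest-probability choice.
candidates = {
    "the most common answer":  0.62,
    "a decent answer":         0.25,
    "a brilliant outlier":     0.13,
}

most_likely = max(candidates, key=candidates.get)
print(most_likely)  # → "the most common answer"
```

The argmax of a probability distribution over everything everyone has ever said is, almost by construction, the average of it all.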

If you have a "frozen pizza" type of problem, then a Walmart answer will probably work just fine for you. If, however, you're trying to figure out your latte art with your new Rocket Espresso Cinquantotto Mignon, a Starbucks answer might be problematic.

This is what I think about, every time I use AI to help with a coding problem: this suggested answer is the most mediocre solution possible, given the source.

The Missing CS Degree

As you can tell, I love spending my weekends learning new things and writing about them if I find them interesting. That's how The Imposter's Handbook (and the more recent The Imposter's Roadmap) was born!

Which got me thinking...

I wrote the first book over 8 years ago, and while I think it's still very relevant... I think there's a lot more I could do with the subject. Which led me to an idea...

Do you think a course with a title like "The Missing CS Degree" would be worthwhile? I've been thinking about this since my last post a few weeks ago - and no, I wouldn't charge a ton of money for it! I was thinking 6-8 hours of in-depth Computer Science and how it applies to what we do today.

If this is interesting to you, let me know! I'm sketching out a few ideas for an outline.

Happy Saturday, thanks for reading!

Rob

🥷🏽 Notes From an Imposter Programmer

I taught myself to code in 1998 and within 7 years had a client list that included Google, Microsoft, Starbucks, Ameritech, KLA-Tencor, PayPal, and Visa. In 2014 I decided that I really needed to understand core Computer Science concepts, so I dove in, using the free resources from MIT and Stanford. In 2016 I shared what I learned with The Imposter's Handbook.
