I was talking with a coworker the other day about one of the new AI offerings and how well it simulated human interaction. I was asking it to summarize some notes I had taken on a book, and it did a fantastic job - but I then mentioned something I truly believe: AI is mediocre, at best.

It has to be, by definition. It's just repeating back to us the most common and probable things that we have said to it. Or, as Scott Hanselman describes it: "AI is a sock puppet on your own hand".

Understanding Probability and Information Theory

Claude Shannon is one of the towering figures of Computer Science and mathematics that no one has heard of. He created the digital circuit by combining George Boole's work on true and false (Boolean Algebra) with the base two numbering system (aka "binary"). That was for his Master's Thesis back in the 1930s. Later in his life, he single-handedly started the Information Age by publishing his theorems on how you could "quantify" information.

His breakthroughs led to compression algorithms, the notion of "entropy" in cryptography, and error correction. He also laid the foundation for predictive analysis, which is key when it comes to error correction.

Surprise!

Back in the 1930s and 40s, people wanted to know just how much pure information they could send from point to point. This used to be done by telegraph, person-to-person, or actual letters in the mail. Wars, however, were becoming larger and more complicated, so the need to send more information in a shorter period of time was crucial.

Shannon realized that people communicate in extremely "redundant" ways, filling sentences with "noise" that doesn't communicate anything aside from non-useful pleasantries. To him, the true measure of a sentence was its "surprise" - the unexpected thing that, quite literally, would surprise you, and that you would learn from.

Consider a text from my mom, checking in on me:

Hi Rob, hope you're having a nice day.
I wanted to know if you could send me the link to the article you and I talked about the other day. Call me if you can. Love, mom

Note: yes, she really does sign off that way.

This message is full of redundancy that, to an Information Theorist (or a network admin trying to reduce load on his pipes), could be made much more efficient, thus increasing its "surprise":

ChatGPT 🔗🤦🏼♀️

In my mom's first message, each sentence has a degree of predictability and form. In other words: it's not all that surprising. The only thing I learn is that she forgot the link for the ChatGPT article we talked about. The second message makes this bleedingly clear.

Shannon realized two things from this: surprise is useful for compressing information, but redundancy is good for predicting what will be said.

The Surprise of ChatGPT Responses

This brings us to the mediocrity part, and I do realize that it was a bit of a wander, but hey, it's Saturday and it's fun to learn CS stuff, isn't it?

When an LLM does its thing and chunks massive amounts of data, it's following a path laid down by Claude Shannon almost a century ago. Each bit of information is related to a previous bit based on the structure of the language being used - which is English, in most cases. Sometimes it's code, I suppose, but either way there has to be some underlying structure or nothing will work.

If I type the letter "q", what letter will follow? In almost every case, it will be a "u" because that's the way English works. If I type the phrase "The Little" into Google's search box, the predictive bits in the background will do their thing for me.

Google is able to do this by leveraging a more advanced form of a graph data structure called a "Digital Tree", also called a "Trie":

This is, of course, an extremely simplified doodle by yours truly - but the idea is the same: each letter you type decreases the possible result set, which increases the likelihood of an answer.
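To make that narrowing concrete, here is a minimal Python sketch of a trie with prefix completion. The `Trie` and `TrieNode` classes, the `insert` and `complete` method names, and the small word list are all illustrative assumptions; this is a toy, not a description of how Google's autocomplete is actually built.

```python
class TrieNode:
    """One letter position in the trie (hypothetical helper class)."""

    def __init__(self):
        self.children = {}    # maps a letter to the next TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    """A toy trie supporting insertion and prefix completion."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for letter in word:
            # Reuse the existing branch for this letter, or create one.
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True

    def complete(self, prefix):
        """Return every stored word that starts with `prefix`, sorted."""
        node = self.root
        for letter in prefix:
            node = node.children.get(letter)
            if node is None:
                return []  # no stored word starts with this prefix

        # Walk everything below the prefix node and collect whole words.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for letter, child in current.children.items():
                stack.append((child, text + letter))
        return sorted(results)


# Assumed sample vocabulary, for illustration only.
trie = Trie(["half", "hang", "hand", "handball", "handbook"])
print(trie.complete("ha"))   # five candidates
print(trie.complete("han"))  # one more letter, four candidates
```

Each extra letter walks one level deeper into the structure, so the set of words still reachable can only shrink or stay the same. Note that this sketch treats every candidate as equally likely; a real predictive system would also rank them by how common they are.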
If I typed in "HA" and used this trie to predict what was next, my choices would be "half", "hang", "hand", "handball", and "handbook". If I added an "n", I'd get closer to the thing I'm looking for, assuming we're working with word structures.

But here's the real question: how was this trie created? Randomly from a dictionary, or from an encyclopedia? Maybe it's a collection of every tweet ever written, which would be horrifying!

The transformers and algorithmic voodoo that LLMs use to predict an answer for a question work in much the same way: using probability, based on the underlying model, to hand you the most likely answer.

The most likely thing tends to be the most common thing, which can be good or bad depending on what you need. If you have a "frozen pizza" type of problem, then a Walmart answer will probably work just fine for you. If, however, you're trying to figure out your latte art with your new Rocket Espresso Cinquantotto Mignon, a Starbucks answer might be problematic.

This is what I think about, every time I use AI to help with a coding problem: this suggested answer is the most mediocre solution possible, given the source.

The Missing CS Degree

As you can tell, I love spending my weekends learning new things and writing about them if I find them interesting. That's how The Imposter's Handbook (and the more recent The Imposter's Roadmap) was born!

Which got me thinking... I wrote the first book over 8 years ago, and while I think it's still very relevant... I think there's a lot more I could do with the subject.

Which led me to an idea... Do you think a course with a title like "The Missing CS Degree" would be worthwhile? I've been thinking about this since my last post a few weeks ago - and no, I wouldn't charge a ton of money for it! But I was thinking 6-8 hours of in-depth Computer Science and how it applies to what we do today.

If this is interesting to you, let me know! I'm sketching out a few ideas for an outline.
Happy Saturday, thanks for reading!

Rob
I taught myself to code in 1998 and within 7 years had a client list that included Google, Microsoft, Starbucks, Ameritech, KLA-Tencor, PayPal, and Visa. In 2014 I decided that I really needed to understand core Computer Science concepts, so I dove in, using the free resources from MIT and Stanford. In 2016 I shared what I learned with The Imposter's Handbook.