Scribbling through TensorFlow.js
I’ve been trying to wrap my head around machine learning lately. Today I worked through the TensorFlow.js tutorial on recognizing handwritten numbers with a neural network. Herein, my notes and scribbles.
Image caption: TensorFlow: it’s about turning linear algebra into models built of layers built of math
My previous forays into machine learning left me a little frustrated: I could tell there was a language, patterns, and notation to this, but I couldn’t see them past the novelty of new-to-me words like sigmoids, convolution, and hidden layers. Turns out those are part of the language.
But the really handy idioms are encoded in TensorFlow’s high-level model-and-layer API. A model encapsulates a chunk of machine learning that can be trained to classify inputs (images, texts, etc.) based on a mess of training data (pre-classified stuff). Every model is built from a network of layers; layers use linear algebra to transform numbers into classifications.
Once you’ve built a model, you feed it a bunch of training data so that it can learn the coefficients and other number-stuff that goes inside the math-y network. You also provide it with an optimizer and loss function so that as the model is trained, it can know whether it’s getting better or worse at classifying data.
A really cool thing is you can run this training process on your computer’s GPU. GPUs, like machine learning models, are big networks of fast math-y stuff. Beautiful symmetry! On the other hand, you usually can’t fit your whole training data set into GPU memory, so you end up batching your training data and submitting it to the GPU in loops.
Once all this runs, you’ve got a trained model that can take image inputs (in this case, hand-written digits) and classify them to decimal numbers (0-9). Magic!
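To make the moving parts concrete for myself, here is a toy sketch in plain Ruby rather than TensorFlow.js: a one-layer "model" (just a weight and a bias), a mean-squared-error loss function, and a training loop that feeds the data through in small batches. Everything in it (the data, the names train and mean_squared_error, the learning rate) is made up for illustration; a real network has far more coefficients and leaves this work to the library and the GPU.

```ruby
# Toy training data generated from y = 3x + 2 (an assumption for this sketch).
TRAINING_DATA = (0...40).map do |i|
  x = i / 10.0
  [x, 3.0 * x + 2.0]
end.freeze

# Loss function: how wrong the model is, averaged over a set of examples.
def mean_squared_error(weight, bias, examples)
  examples.sum { |x, y| (weight * x + bias - y)**2 } / examples.size
end

# Plain gradient descent standing in for the optimizer: nudge the
# coefficients downhill on the loss, one batch at a time.
def train(examples, epochs: 500, batch_size: 8, learning_rate: 0.02)
  weight = 0.0
  bias = 0.0

  epochs.times do
    examples.each_slice(batch_size) do |batch|
      # Gradients of the mean squared error for this batch only.
      weight_gradient = batch.sum { |x, y| 2 * (weight * x + bias - y) * x } / batch.size
      bias_gradient = batch.sum { |x, y| 2 * (weight * x + bias - y) } / batch.size

      weight -= learning_rate * weight_gradient
      bias -= learning_rate * bias_gradient
    end
  end

  [weight, bias]
end

weight, bias = train(TRAINING_DATA)
puts format("learned y = %.3fx + %.3f", weight, bias)
```

The each_slice call is the batching: the loop never needs the whole data set at once, which is the same reason the tutorial submits batches to the GPU.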
Code minutiae, October 23, 2017
For some reason, identifier schemes that are globally unique, coordination-free, somewhat humanely-representable, and efficiently indexed by databases are a thing I really like. Universally Unique Lexicographically Sortable Identifier (ulid, for humans) is one of those things. Implementations available for dozens of languages! They look like this: 01ARZ3NDEKTSV4RRFFQ69G5FAV.
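For the curious, the layout is simple enough to sketch: a 48-bit millisecond timestamp followed by 80 bits of randomness, written out as 26 characters of Crockford's base32. The Ruby below is my own rough approximation, not any of the published implementations; the names ulid_sketch and encode_base32 are made up, and it ignores the spec's rules for ids generated within the same millisecond.

```ruby
require "securerandom"

# Crockford's base32 alphabet: no I, L, O, or U, and it sorts in ASCII order.
CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze

# Encode a non-negative integer as exactly `length` base32 characters.
def encode_base32(number, length)
  digits = Array.new(length) do
    digit = number % 32
    number /= 32
    CROCKFORD32[digit]
  end
  digits.reverse.join # least significant digit was produced first
end

# 10 characters of timestamp followed by 16 characters of randomness.
def ulid_sketch(time = Time.now)
  milliseconds = time.to_i * 1000 + time.nsec / 1_000_000
  randomness = SecureRandom.random_number(2**80)
  encode_base32(milliseconds, 10) + encode_base32(randomness, 16)
end

puts ulid_sketch
```

Because the timestamp comes first and the alphabet is in ASCII order, sorting the strings sorts ids from different milliseconds by creation time, which is what makes them friendly to database indexes.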
Paul Ford’s website is twenty years old. For maybe half that time I’ve been extremely jealous of how well he writes about technology without being dry and technical. When I grow up, I’ll write like that!
How Awesome Engineers Ask For Help. So much good stuff there, I can’t quote it. There’s something in there for new and experienced engineers alike. In particular: don’t give up, actively participate in the process of getting unstuck, take and share notes, give thanks afterwards.
The best time to work on your dotfiles is on weekends between high-intensity project pushes at work. No better time to do some lateral thinking and improving of your workflow. Feels good, man.
You must be this tall to ride the services
If I were trying to convince myself to extract a (micro)service, today, I’d do it like this. First I’d have a conversation with myself:
- you are making tactical changes slightly easier at the expense of making strategic changes quite hard; is that really the trade-off you're after?
- you must have the operational acumen to provision and deploy new services in less than a week
- you must have the operational acumen to instrument, monitor, and debug how your applications interact with each other over unreliable datacenter networks
- you must have the design and refactoring acumen to patiently encapsulate the service you want to build inside your current application until you get the boundaries just right and only then does it make sense to start thinking about pulling a service out
I would reflect upon how most of the required acumen is operational and wonder if I’m trying to solve a design problem with operational complexity. If I still thought that operational complexity was worthwhile, I’d then reflect upon how close the code in question was to the necessary design. If it wasn’t, I would again kick the can down the road; if I can’t refactor the code when it’s objects and methods, there’s little hope I can refactor it once it’s spread across two codebases and interacting via network calls as API endpoints, clients, data formats, etc.
If, upon all that reflection, I was sure in my heart that I was ready to extract a service, it’d go something like this:
- try to encapsulate the service in question inside the current app
- spike out an internal API just for that service; this API will become the client contract
- wrap an HTTP API around the encapsulation
- make sure I have an ops buddy who can help me at every provisioning and deployment step, especially if this sort of thing is new and a monolith is the status quo
- test the monolith calling itself with the new API
- trial deploy the service and make some cross-cutting changes (client and server) to make sure I know the change process
- start transferring traffic from the monolith to the service
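The first few steps above might look something like this in Ruby. It is only a sketch: Billing, InProcessClient, and Checkout are hypothetical names, and the hash standing in for storage is an assumption to keep the example self-contained.

```ruby
# Steps 1 and 2: everything billing-related lives behind one small internal
# API. This class is the future client contract.
module Billing
  class InProcessClient
    def initialize(ledger = {})
      @ledger = ledger # stand-in for whatever storage the real code uses
    end

    # Returns the stored invoice as a plain hash, the kind of data that
    # would survive a trip over HTTP later.
    def create_invoice(customer_id:, amount_cents:)
      id = @ledger.size + 1
      @ledger[id] = { id: id, customer_id: customer_id, amount_cents: amount_cents }
    end

    def find_invoice(id)
      @ledger.fetch(id)
    end
  end
end

# The rest of the monolith only ever talks to the client it is handed.
class Checkout
  def initialize(billing:)
    @billing = billing
  end

  def complete(customer_id:, total_cents:)
    invoice = @billing.create_invoice(customer_id: customer_id, amount_cents: total_cents)
    invoice[:id]
  end
end

billing = Billing::InProcessClient.new
puts Checkout.new(billing: billing).complete(customer_id: 7, total_cents: 1250)
```

Because Checkout receives its billing client rather than reaching for billing classes directly, step 3 becomes a matter of writing a second client with the same two methods that makes HTTP calls instead, and the monolith calling itself through that client is the test in step 5.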
In short, I still don’t think service extraction is as awesome as it sounds on paper. But, if you can get to the point of making a Modular Monolith, and if you can level up your operations to deal with the demands of multiple services, you might successfully pull off (micro)services.
How methodical and quality might keep up with fast and loose
I’ve previously thought that a developer moving fast and coding loose will always outpace a developer moving methodically and intentionally. Cynically stated, someone making a mess will always make more mess than someone else can clean up or produce offsetting code of The Quality.
I’ve recently had luck changing my mindset to “make The Quality by making the quantity”. That is, I’m trying to make more stuff that expresses some aspect of The Quality I’m going for. Notably, I’m not worrying too much if I have An Eternal Quality or A Complete Expression of the Quality. I’m a lot less perfectionist and doing more experiments with my own style to match the code around me.
I now suspect that given the first two developers, it’s possible to make noticeably more Quality by putting little bits of thoughtfulness throughout the code. Unless the person moving fast and loose is actively undermining the quality of the system, they will notice the Quality practices or idioms and adopt them. Code review is the first line of defense to pump the brakes and inform someone moving a little too fast/loose that there’s a Quality way to do what they’re after without slowing down too much.
Sometimes, I’m an optimist.
A strange world of mathematical and computational complexity
Over the past few weekends, I’ve been reading on two topics which are way out of my technical confidence. I’ve spent the majority of my software development career building web applications and neither of these topics overlaps much with web apps right now:
- blockchains, cryptocurrencies, and autonomous contracts
- machine learning, neural networks, general purpose GPUs, deep learning
With blockchain stuff, there are very interesting fundamentals underlying a sprawling system of hype and information asymmetry. Every time I go in, it’s “shields up!”, time to defend myself from people trading reputation for short-term speculation or actively spreading inaccurate information. In other words, here come the snake oil salesmen!
That said, there are cool ideas in there. Solidity is a language built into Ethereum for writing programs that run alongside the blockchain. You wouldn’t want to build a normal application this way, but if you want some degree of confidence in a system, like voting or accounting, a system inside Ethereum and Solidity might make sense. Even more strange, to a web developer, you have to pay for the compute time that program requires in Ether, Ethereum’s own currency. Strange and intriguing!
By comparison, machine learning is equally hyped but has little speculation. They both involve about the same level of mathematical and computational complexity. Which is probably how I’ve managed to avoid both so far: I’m far better at social reasoning, which is a big deal in web applications, than I am at math. But I’m trying to change that!
I found deeplearning.js and it seems like a nice gateway into the domain of building neural networks for machine learning, computer vision, etc. And it utilizes your GPUs, if present, which is pretty neat because GPUs are strange little computers we seem to have increasingly more of as the days go on.
No idea where this line of thinking is going. All I know is it’s more fun than reading about yet another client or server side framework. ;)
Just keep writing, October 16, 2017
I watched pal Drew Yeaton work in Ableton briefly and it was pretty incredible. He laid down a keyboard and drums beat, fixed up all the off-beat stuff, and proceeded to tinker with his myriad of synthesizers and effects rack with speed. I had no idea what his hands were doing as he moved from MIDI keyboards, mouse, and computer keyboard like a blur. Seems pretty cool!
I talked myself into and out of porting this website to Jekyll three times over the past week. Hence, the writing dropped off, which is silly because I just blogged about not tinkering with blog tools in the last month. WordPress.com doesn’t quite do the things I want it to and its syntax highlighting is keeping the dream of the nineties alive. I’m writing these short form bits in lieu of a sidebar thing for now. No idea how I’ll make do with the code highlighting.
The Good Place is an amazing show. Ted Danson, Kristen Bell, and the rest of the cast are fantastic. There is an amazing-for-a-comedy twist. Do not read the internet until you watch the first season of this show. It’s just started season two, get on board now!
One step closer to a good pipeline operator for Ruby
I’ve previously yearned for something like Elm and Elixir’s |> operator in Ruby. Turns out, this clever bit of concision is in Ruby 2.5:
    object.yield_self {|x| block } → an_object
    # Yields self to the block and returns the result of the block.

    class Object
      def yield_self
        yield(self)
      end
    end
I would prefer then or even | to the verbosely literal yield_self, but I’ll take anything. Surprisingly, both of my options are legal method names!
    class Object
      def then
        yield self
      end

      def |
        yield self
      end
    end

    require "pathname"

    __FILE__.
      then { |s| Pathname.new(s) }.
      yield_self { |p| p.read }.
      | { |source| source.each_line }.
      select { |line| line.match /^\W*def ([\S]*)/ }.
      map { |defn| p defn }
However, | already has 20+ implementations, either of the mathematical logical-OR variety or of the shell piping variety. Given the latter, maybe there’s a chance!
Next, all we need is:
- a syntax to curry a method by name (which is in the works!)
- a syntax to partially apply said curry
If those two things make their way into Ruby, I can move on to my next pet feature request: a module/non-global namespace scheme ala Python, ES6, Elixir, etc. A guy can dream!
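In the meantime, a wordier version of that wish list is already possible with method, curry, and yield_self. This is only a sketch of the idea on Ruby 2.5 or later; answer_pipeline and the scale lambda are names I made up for the example.

```ruby
def answer_pipeline(text)
  # A two-argument lambda; curry lets us supply the arguments one at a time.
  scale = ->(factor, number) { number * factor }
  double = scale.curry[2] # partially applied: waits for `number`

  # Look up an existing method by name so it can be passed as a block.
  parse = method(:Integer)

  text
    .yield_self(&parse)   # "21" becomes 21
    .yield_self(&double)  # 21 becomes 42
    .yield_self(&:to_s)   # 42 becomes "42"
end

puts answer_pipeline("21")
```

It works, but every step pays a tax in ampersands and yield_self calls, which is why dedicated syntax for currying and partial application would be welcome.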
Heck yeah, October 09, 2017
Simon Willison returns to blogging, in peak form nonetheless. Heck yeah!
Janet Jackson, “Rhythm Nation”, posted by Billy Eichner. Heck yeah!
A rocket engine made of nuclear fission. Note the “poison rod” in the schematic. Heck yeah engineering, heck no they actually tested this on Earth, heck yeah they never flew it!
Strange Loop 2017
I was lucky enough to attend Strange Loop this year. I described the conference to friends as a gathering of minds interested in programming esoterica. The talks I attended were appropriately varied: from very academic slides to illustrated hero’s journeys, from using decomposed mushrooms to create materials to programming GPUs, from JavaScript to Ruby. Gotcha, that last one was not particularly varied.
In short, most of the language-centric conferences I’ve been to in the past were about “hey look at what I did with this library or weird corner of the language”, though the most recent Ruby/Rails conferences are more varied than this. By comparison, Strange Loop was more about “I did this thing that I’m excited about and it’s a little brainy but not intimidating and also I’m really excited about it.”
Elm Conf 2017
I started the weekend off checking out the Elm community. I already think pretty highly of the language. I would certainly use it for a green-field project.
Size, excitement, and employment-wise, Elm is about where Ruby was when I joined the community in 2005. Lots of excited folks, a smattering of employed folks, and a good technical/social setup for growth.
A nice thing about the community is that there is no “other” that Elm is set against. Elm code often needs to interface with JavaScript to get at functionality like location or databases, so they don’t turn their nose up at it. It’s a symbiotic relationship. Further, most Elm developers are probably coming from JavaScript, so it’s a pretty friendly relationship. This is a nice shift from the tribalism of yore.
It’s also exciting that Elm is already more diverse than Ruby was at the same point in its growth/inflection curve. Fewer dudes, more beginners, and none of the “pure Ruby” sort of condescension towards Rails and web development.
Favorite talks:
- “Teaching Elm to Beginners” (no talk video), Richard Feldman. Using Elm at work requires teaching Elm to beginners. Teaching is a totally different skill set, disjoint from programming. When answering a question, introduce as few new concepts as possible. Find the most direct path to helping someone understand. It’s unimportant to be precise, include lots of details, or be entertaining when teaching. You can avoid types and still help students build a substantial Elm program.
- If Coco Chanel Reviewed Elm, Tereza Sokol: Elm as seen through the lens of high and low fashion. Elm is a carefully curated, slow releasing collection of parts ala Coco Chanel. It is not the hectic variety of an H&M store.
- Accessibility with Elm, Tessa Kelly: Make accessible applications by enforcing view/DOM helpers with functional encapsulation and types. Your program won’t compile if you forget an accessibility annotation. A pretty good idea!
- “Mogee, or how we fit Elm in a 64x64 grid”, Andrew Kuzmin: A postmortem on building games with Elm. Key insight: work on the game, not on the code or engine. Don’t frivolously polish code. Use entity-component-system modeling. Build sprite/bitmap graphics in WebGL by making one pixel out of two triangles.
The majority of the talks referenced Elm creator Evan Czaplicki’s approach to designing APIs. He is humble enough that I don’t think this will backlash like DHH’s opinions did with Rails.
By far the biggest corporate footprint in the community and talks was NoRedInk. Nearly half of the talks were by someone at the company.
Most practical talks from StrangeLoop
Types for Ruby: it seems like they’ve implemented a full-blown type system for Ruby. It’s got all the gizmos and gadgets you might expect: unions, generics, gradual typing. It applies all its checks at runtime though, and they didn’t say if it does exhaustive checking, so I’m not sure how handy it would be in the way that e.g. Elm or Flow are. On my list of things to check out later.
Level up your concurrency skills with Rust. Learning Rust’s concepts for memory and concurrency safety, i.e. resources, ownership, and lifetimes, can help you program in any language. Putting concurrency into a system is refactoring for out-of-orderness and most likely a retrofit of the underlying structure. Rust models memory like a resource, much as the operating system models file handles or network sockets. Rust resource borrowing in summary: if you can read it, no one else can write it; if you can write it, no one else can read or write it; borrows are checked at compile time so there is no runtime overhead/cost.
GPGPU programming with Metal. Your processor core has a medium sized arithmetic logic unit and a giant control unit (plus as much memory/cache as they can spare). A GPU is thousands of arithmetic logic units. Besides drawing amazing pictures, you can use all those arithmetic logic units to train/implement a neural network, do machine vision or image processing, run machine learning algorithms, and any kind of linear algebra or vector calculus. Work is sent to the GPU by loading data/state into buffers, converting math instructions to GPU code and loading that into GPU buffers, and then letting the GPU go wild executing it.
Seeking a better culture and organization of open source maintainership (no talk video). Projects are getting smaller, more fragmented, and attracting no community (ed. the unintended consequence of extreme modularity?). Bitcoin and Ethereum have very little backing despite the astronomical amounts of money in the ecosystem. We need a new perspective on funding open source work. Consumption of open source has won, but production of open source is still in a pretty bad place.
How to be a compiler. Knitting is programming; you can even compile between knitting description pseudo-languages. Implemented Design by Numbers, a Processing predecessor, as a transpiler to SVG.
Random cool things people are really doing
Measuring and optimizing tail latency. Activating instrumentation and “slow-path” techniques on live web requests that run so long they will fall into the 99th percentile. Switch processor voltage to “power up” a processor that’s running a slow request so it will finish faster, e.g. switch a core from low power/500MHz mode to high power/2GHz mode.
Really using functional ideas of composition and state in production, consumer-facing applications (e.g. the NY Times) and using ML-style type checkers with JavaScript (e.g. Flow and Elm).
My two favorite talks by far: Making digital art with JavaScript, WebGL, vdom and immutability. Scraping/querying/aggregating image data from various space missions (e.g. Jupiter and Pluto flybys).
Facebook stopped using datacenter routers and started building their own servers that program the networking chips a router would use from CentOS, basically giving them programmable routers that deploy the same way you would update infrastructure software like Nginx or memcached. I wonder when/if treating network devices as software will scale down to your typical large company?
Strange Loop takeaways
- a conference of diverse backgrounds and experiences is a better one
- my favorite talks told a hero’s journey story through illustrations
- folks in this sphere of technology are taking privacy and security very seriously, but the politics of code, e.g. user safety and information war, were not particularly up there in the talks I went to (probably by self-selection)
- way more people are doing machine learning applications than I’d realized; someone said off-hand that we’d “emerged from the AI winter in 2012” and that struck me as pretty accurate
- everyone gets the impostor syndrome, even conference speakers and wildly successful special effects and TV personalities like Adam Savage
If you get the chance, you should go to Strange Loop!
exa in 30 seconds
What is it? exa is ls reimagined for modern times, in Rust. And more colorfully. It is nifty, but not life-changing. I mostly still use ls, because muscle memory is strong and it’s basically the only mildly friendly thing about Unix.
How do I do boring old ls things?
Spoiler alert: basically the same.
- ls -a: exa -a
- ls -l: exa -l
- ls -lR: exa -lR
How do I do things I rarely had the gumption to do with ls?
- exa -rs created: simple listing, sort files reverse by created time. Other options: name, extension, size, type, modified, accessed, created, inode
- exa -hl: show a long listing with headers for each column
- exa -T: recurse into directories ala tree
- exa -l --git: show git metadata alongside file info
Generalization and specialization: more of column A, a little less of column B
- Now, I attempt to write in the style of a tweetstorm. But about code. For my website. Not for tweets.
- For a long time, we have been embracing specialization. It's taken for granted even more than capitalism. But maybe not as much as the sun rising in the morning.
- From specialization comes modularization, inheritance, microservices, pizza teams, Conway's Law, and lots of other things we sometimes consider as righteous as apple pie.
- Specialization comes at a cost though. Because a specialized entity is specific, it is useless out of context. It cannot exist except for the support of other specialized things.
- Interconnectedness is the unintended consequence of specialization. Little things depend on other things.
- Those dependencies may prove surprising, fragile, unstable, chaotic, or create a bottleneck.
- Specialization also requires some level of infrastructure to even get started. You can't share code in a library until you have the infrastructure to import it at runtime (dynamic linking) or resolve the library's dependencies (package managers).
- The expensive open secret of microservices and disposable infrastructure is that you need a high level of operational acumen to even consider starting down the road.
- You're either going to buy this as a service, buy it as software you host, or build it yourself. Either way, you're going to pay for this decision right in the budget.
- On the flip side is generalization. The grand vision of interchangeable cogs that can work any project.
- A year ago I would have thought this was as foolish as microservices. But the ecosystems and tooling are getting really good. And, JavaScript is getting good enough and continues to have the most amazing reach across platforms and devices.
- A year ago I would have told you generalization is the foolish dream of the capitalist who wants to drive down his costs by treating every person as a commodity. I suspect this exists in parts of our trade, but developers are generally rare enough that finding a good one is difficult enough, let alone a good one that knows your ecosystem and domain already.
- Generalization gives you a cushion when you need to help a short-handed team get something out the door. You can shift a generalist over to take care of the dozen detail things so the existing team can stay focused on the core, important things. Shifting a generalist over for a day doesn't get you 8 developer hours, but it might get you 4 when you really need it.
- Generalization means more people can help each other. Anyone can grab anyone else and ask to pair, for a code review, for a sanity check, etc.
- When we speak of increasing our team's bus number, we are talking about generalizing along some axis. Ecosystem generalists, domain knowledge generalists, operational generalists, etc.
- On balance, I still want to make myself a T-shaped person. But, I think the top of the T is fatter than people think. Or, it's wider than it is tall, by a factor of one or two.
- Organizationally, I think we should choose the tools and processes we use carefully so that we don't end up where only one or two people can do something. That's creating fragility and overhead where it doesn't yield any benefit.
Not-a-Science Science
Any field with the word "science" in its name is guaranteed not to be a science.
– Gerald Weinberg, An Introduction to General Systems Thinking
The loungification of luxury cars
High-end luxury cars are starting to resemble first-class airport lounges and it’s bothering me.
The Porsche Panamera has a dang tray table. Just about every German luxury car has the option to put an LCD screen on the back of the front seats, for entertainment. Who puts $100k+ down on a car so that someone else can drive you around? The seats recline, have tablets to control their massage and scent-control functions. Of course they’re heated and ventilated.
I’m fine with cars as things that merely get you from point A to point B, and I’m fine with rich people buying extravagant cars, but I’m not okay with this airport lounge stuff. No one likes airports! They’re miserable! Stop designing things to resemble airports!
Categorizing and understanding magical code
Sometimes, programmers like to disparage “magical code”. They say magical code is causing their bugs, magical code is offensive to use, magical code is harder to understand, we should try to write “less magical” code.
“Wait, what’s magic?”, I hear you say. That’s what I’m here to talk about! (Warning: this post contains an above-average density of “air quotes”, ask your doctor if your heart is strong enough for “humorous quoting”.)
Magic is code I have yet to understand
It’s not inscrutable code. It’s not bad code. It doesn’t intentionally defy understanding, like an obfuscated code contest or code golfing.
I can start to understand why a bit of code is frustratingly magical to me by categorizing it. (Hi, I’m Adam, I love categorizing things, it’s awful.)
“Mathemagical” code escapes my understanding due to its foundation in math and my lack of understanding therein. I recently read Purely Functional Data Structures, which is a great book, but the parts on proving e.g. worst-case cost for amortized operations on data structures are completely beyond my patience or confidence in math. Once Greek symbols enter the text, my brain kinda “nope!”s out.
“Metamagic” is hard to understand due to use of metaprogramming. Code that generates code inside code is a) really cool and b) a bit of a mind exploder at first. When it works, it’s glorious and not “magical”. When it falls short, it’s a mess of violated expectations and complaints about magic. PSA: don’t metaprogram when you can program.
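A small, made-up Ruby illustration of that PSA: Settings generates its predicate methods with define_method, while PlainSettings gets the same job done with one ordinary method. Both class names and the flag names are hypothetical.

```ruby
# The metaprogrammed version: methods appear that are written nowhere in the
# source, so searching the codebase for "def dark_mode?" finds nothing.
class Settings
  FLAGS = %i[dark_mode beta_features].freeze

  def initialize(enabled)
    @enabled = enabled
  end

  # Generates dark_mode? and beta_features? when the class is loaded.
  FLAGS.each do |flag|
    define_method("#{flag}?") { @enabled.include?(flag) }
  end
end

# The plain version: one boring method, easy to find and to step through.
class PlainSettings
  def initialize(enabled)
    @enabled = enabled
  end

  def enabled?(flag)
    @enabled.include?(flag)
  end
end

puts Settings.new([:dark_mode]).dark_mode?
puts PlainSettings.new([:dark_mode]).enabled?(:dark_mode)
```

Neither is wrong. The generated methods read nicely at the call site; the cost shows up later, when someone has to work out where dark_mode? comes from.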
“Sleight of hand” makes it harder for me to understand code because I don’t know where the control flow or logic goes. Combining inheritance and mixins when using Ruby is a good example of control flow sleight-of-hand. If a class extends Foo, includes Bar, and all three define a method do_the_thing, which one gets called (trick question: all of them, provided each calls super; trick follow-up question: in what order!)? The Rails router is a good example of logical sleight-of-hand. If I’m wondering how root to: "some_controller#index" works and I have only the Rails sources on me, where would I start looking to find that logic? For the first few years of Rails, I’d dig around in various files before I found the trail to that answer.
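The answer to the control-flow trick question can be checked in a few lines. Widget is a hypothetical class name; Foo, Bar, and do_the_thing come from the paragraph above. Each definition calls super so that all three run:

```ruby
class Foo
  def do_the_thing
    ["Foo"]
  end
end

module Bar
  def do_the_thing
    ["Bar"] + super
  end
end

class Widget < Foo
  include Bar

  def do_the_thing
    ["Widget"] + super
  end
end

# Ruby looks in the class first, then its included modules, then the superclass.
p Widget.ancestors.first(3) # prints [Widget, Bar, Foo]
p Widget.new.do_the_thing   # prints ["Widget", "Bar", "Foo"]
```

Asking a class for its ancestors is the quickest way I know to see through this particular trick.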
“Multi-level magic schemes” is my new tongue-in-cheek way to explain a tool like tmux. It’s a wonderful tool for those of us who prefer to work in (several) shells all day. I’m terrified of when things go wrong with it, though. To multiplex several shells into one process while persisting that state across user sessions requires tmux to operate at the intersection of Unix shells, process trees, and redrawing interfaces to a terminal emulator. I understand the first two in isolation, but when you put it all together, my brain again “nope!”s out of trying to solve any problems that arise. Other multi-level magic schemes include object-relational mappers, game engines, operating system containers, and datacenter networking.
I can understand magic and so can you!
I’m writing this because I often see ineffective reactions to “magical” code. Namely, 1) identify code that is frustrating, 2) complain on Twitter or Slack, 3) there is no step 3. Getting frustrated is okay and normal! Contributing only negative energy to the situation is not.
Instead, once I find a thing frustrating, I try to step back and figure out what’s going on. How does this bit of code or tool work? Am I doing something that it recommends against or doesn’t expect? Can I get back on the “golden path” the code is built for? Can I find the code and understand what’s going on by reading it? Often some combination of these side quests puts me back on my way and out of frustration’s way.
Other times, I don’t have time for a side quest of understanding. If that’s the case, I make a mental note that “here be dragons” and try to work around it until I’m done with my main quest. Next time I come across that mental map and remember “oh, there were dragons here!”, I try to understand the situation a little better.
For example, I have a “barely tolerating” relationship with webpack. I’m glad it exists, it mostly works well, but I feel its human factors leave a lot to be desired. It took a few dives into how it works and how to configure it before I started to develop a mental model for what’s going on such that I didn’t feel like it was constantly burning me. I probably even complained about this in the confidence of friends, but for my own personal assurances, attached the caveat of “this is magical because it’s unfamiliar to me.”
Which brings me to my last caveat: all this advice works for me because I’ve been programming for quite a while. I have tons of knowledge, the kind anyone can read and the kind you have to win by experience, to draw upon. If you’re still in your first decade of programming, nearly everything will seem like magic. Worse, it’s hard to tell what’s useful magic, what’s virtuous magic, and what’s plain-old mediocre code. In that case: when you’re confronted with magic, consult me or your nearest Adam-like collaborator.
Bias to small, digestible review requests. When possible, try to break down your large refactor into smaller, easier to reason about changes, which can be reviewed in sequence (or better still, orthogonally). When your review request gets bigger than about 400 lines of code, ask yourself if it can be compartmentalized. If everyone is efficient at reviewing code as it is published, there’s no advantage to batching small changes together, and there are distinct disadvantages. The most dangerous outcome of a large review request is that reviewers are unable to sustain focus over many lines, and the code isn’t reviewed well or at all.
This has made code review of big features way more plausible on my current team. Large work is organized into epic branches which have review branches which are individually reviewed. This makes the final merge and review way more tractable.
Your description should tell the story of your change. It should not be an automated list of commits. Instead, you should talk about why you’re making the change, what problem you’re solving, what code you changed, what classes you introduced, how you tested it. The description should tell the reviewers what specific pieces of the change they should take extra care in reviewing.
This is a good start for a style guide ala git commits!
Fewer changes are faster to deploy than more changes
Itamar Turner-Trauring, Incremental results: how to succeed at large software projects:
- Faster feedback...
- Less unnecessary features...
- Less cancellation risk...
- Less deployment risk...
👏 👏 👏 👏 👏 read the whole thing, Itamar’s tale is well told.
Consider: incremental approaches consist of taking a large scope and finding smaller, still-valuable scopes inside of it. Risk is directly proportional to scope. Time-to-deliver grows as scope grows. Cancellation and deployment risk grow as time-to-deliver grows. It’s not quite math, but it is easy to demonstrate on a whiteboard, in case you happen to need to work with someone who wants large scope and low risk and low time-to-delivery.