Truck Tech: Smarter Search

January 7, 2018 at 7:06 PM

Source: Weirdly enough, I didn't even have to make this image. It was already a part of the big icon collection I subscribe to.

Obligatory warning: This post is about some of the inner technical workings of the site. If that doesn't tickle your fancy, you might find this unfathomably boring.

Sometimes, when I've exhausted all other interesting or productive things to do, I'll blow the dust off the code that makes this site hum along, and tweak it in one way or another. I've got to say, it's a pretty thankless task. Brandon in 2015 didn't really know what he was doing when it came to building web services in Golang, so the core codebase has historically been…unkempt, to say the least. To say the most, it looked like it was haphazardly slapped together by a team of monkeys banging on typewriters in the wee morning hours at the end of a hackathon.

But with a few more years of professional experience working with these tools, and a couple late nights refactoring the site into submission, making changes and adding functionality isn't nearly as painful these days. So I decided to sit down and fix one of the weaker parts of the site: the search functionality.

I've talked about my original search implementation before. I was (slightly) young(er) and ambitious, and I wanted to do everything myself (except the frontend, because HTML/CSS/JavaScript are the devil). The search code looked something like this:

func buildATerribleSearchIndex() map[string]map[int64]bool {
  // Our "search index" is a big map from words to the post IDs that they
  // appear in.
  invIndex := make(map[string]map[int64]bool)

  for _, post := range getAllOfThePublishedPosts() {
    for _, word := range extractAllTheWordsFromThePostButDoItPoorly(post) {
      // If the word isn't in the index yet, make a submap for it.
      if _, ok := invIndex[word]; !ok {
        invIndex[word] = make(map[int64]bool)
      }
      invIndex[word][post.ID] = true
    }
  }

  return invIndex
}

I removed some of the error-handling for brevity, and made the function names obnoxious because that's who I am, but this is basically what I had. There are a few major problems here:

It's not persisted anywhere. It's stored in-memory. This was pure laziness on my part, and it means we have to rebuild the index every time our server dies. By virtue of the way App Engine manages server instances, it's pretty much guaranteed to kill your server randomly for fun and sport.
We go through allllll the posts. Now I'm not the most prolific writer, but there are over 100 posts, and some of them are fairly long, and App Engine servers are small (by default), and the parsing/delimiting operation isn't exactly cheap. Reindexing all the posts before a page load was actually noticeably slow, on the order of a second.
Index invalidation is hard. Say I want to edit a post (because someone points out I've spelled the word "truk" wrong, for example). How do I update this index? Well, I blow it away and start from scratch, obviously! Instead of just removing references to the edited post and reindexing that, I reindex all the things. Again, because laziness.

So, how do we solve any/all of those problems? Well, we put our huge ego aside and leave searching up to people who know what they're doing: Google, for example. Sure enough, Google has a search API for Go on App Engine, which is exactly what we want. Ripping out my own search code and subbing in calls to Google's API was fairly straightforward, with each post mapping to a unique document.

The result is a faster search system that's easier for me to maintain. The search results are also more accurate, thanks to Google's smarter tokenization rules. And since I'm not indexing anything crazy large, it's also free for me to run. Definitely a big win for the site.

There was just one little problem left to fix: keyword highlighting. When someone searches for something, they're probably interested in finding the text where that keyword occurs. Previously, I had home-rolled a simple (but highly questionable) highlighting system on the backend that would find the keyword and wrap it in a <span class="highlight"></span>. This worked, except when your keyword also matched something inside an HTML tag (because I write my posts in normal HTML), at which point it would just unceremoniously clobber the HTML and mangle the search results.

I figured, if I'm putting in the effort to make search better, I might as well make the whole experience actually work. So I sprinkled some mark.js magic on the search page in the frontend, and all was well with the ~~world~~ end-to-end search experience. Well, except for the fact that the search box got moved into obscurity at the bottom of the page after my last site redesign, but I'll probably fix that in a future update. Until then, feel free to scroll allllll the way to the bottom of the page and give the search functionality a try. As always, email me or drop a comment if you have any problems.

Next Up

Like I alluded to in my last post, I spent a lot of time the past few months ~~not writing blog posts~~ building different websites as Christmas presents for my friends and family. This was mostly an excuse to learn how to use modern frontend technologies for building responsive single-page applications, including actual frontend tests and minified/productionized assets. I'm currently in the process of rewriting the blog as two separate pieces: an API server that speaks a basic CRUD protocol to serve posts, questions, etc., and a static frontend that makes calls to this API server. If all goes well, nobody will even notice when I roll it out.

Site Stuff Truck Tech

Truck Tech: You've Got Mail (and TLS and other stuff)

August 27, 2016 at 2:21 PM

Source: Amalgamated from Jim Browne Chevy and Bob 'n' Bee. Basically, I took a way nicer, newer version of my truck and Photoshopped someone else's mail icon onto it.

I've gotten into this wonderful rhythm of publishing posts that are tragically untimely. My Spring Cleaning post was five days too late, I'd had the insulation and skylight for weeks before I'd recounted the tale, and I'd successfully picked up and put down my weight goals way before I picked up the pen and put down a post. So it's only fitting that I'm just getting around to talking about a feature I added to the site over eight months ago.

Every so often, I get bored and fidgety and change things around on the site. Sometimes I'll tweak the savings calculation, sometimes I'll move around items on the mobile layout, and yet other times I'll sit down with a coffee and some music and plop a few new features into existence. The title and picture probably provide plenty perspective, but this time around, I added email subscriptions. I noted in this post (a relative eternity ago) that I was thinking about adding them because, in theory, it's pretty simple. Someone gives me their email address and clicks a button, then they get an email every time I write a new post. Unfortunately, like pretty much every problem I've sized up thus far, it wasn't nearly as simple as I was expecting, and I definitely hit a few snags along the way.

Forewarning: this is another post for my fellow nerds. For anyone else, it's a valid substitute for a tranquilizer.

The Nitty Gritty Technical Details

Depending on how closely you've been following my various love affairs with trendy web technologies, you may or may not know that my site runs on Google's App Engine platform. Since I knew very little about how email works, the App Engine Mail API documentation seemed like it could maybe potentially be a good place to start. The more I read, though, the more foreign concepts I wandered across. What is DKIM? How does SPF work? Do I need an alias? Whose SMTP server am I using? I started to wonder, how bad do I really want email subscriptions anyway? But I, with ample hesitation, plodded down the path anyway.

DKIM

The ideas behind DKIM are pretty similar to the ones that power TLS and consequently all the important things that happen on the Internet. My understanding of DKIM is that, like, you have a DNS record saying "Yo, this is my public key, use it if you want to see if an email from me is legit" and anyone can see that when they look up records for your domain name (FromInsideTheBox.com in my case). Then, every time I send a message from an @frominsidethebox.com address, App Engine adds a special magic signature in the header that's signed with the corresponding private key, and the recipient can use the public key to verify that the signature makes sense and thus the email must have really been sent by me. Cool stuff.

SPF

Turns out it's not just for sunscreen anymore. SPF works kind of like DKIM, minus all of the fancy encryption stuff. SPF works by adding another DNS record, but this one says "Bro, if you get a message from my buddy App Engine, you can trust it, he's chill." This way, you can send mail from someone else's mail server and still show that it can be trusted.

At this point, you may be wondering a few things, like Why are all of these things necessary? and Why is Brandon personifying computing protocols and making them sound like frat bros? To answer the former question (and ignore the latter), it's because the Internet is a giant pile of dusty, wobbling, unstable, duct-taped-together systems stacked on top of each other like some shoddy version of digital Jenga. The underlying protocol for sending emails (SMTP) reared its head in 1982, and offered literally no way to authenticate where messages were coming from. It makes sense; it evolved at a time when the "Internet" consisted entirely of academics and the military, so you could just trust the network. It wasn't like some drunk researcher was going to send prank emails, like:

From: Ronald Reagan <rreagan@whitehouse.gov>

To: Alexander Haig <ahaig@whitehouse.gov>

Subject: fire the nukes

Date: April 1, 1985

Ay Hags,

Launch one of those suckers at the moon, I wanna make it rain nacho cheese.

LOL,
Ya boy El Prezzo

...so yeah. That'd definitely be a problem today, thus DKIM and SPF.

CAN SPAM

There was one other acronym I had forgotten to take into consideration: CAN SPAM. CAN SPAM is a US law signed in 2003. It's also the reason all of the (legitimate) emails you get have an "Unsubscribe" link. Not wanting to be at the mercy of the US Government (well, any more than my taxes already cause me to be), I figured it prudent to add one of these "Unsubscribe" links myself. That takes work and effort, though: now we're not just blindly (and indefinitely) sending out emails to anyone who sends me an address. Instead, we have to maintain an active database and remove people when they ask.

I'm making this sound harder and more complicated than it actually was. All I had to do was add a few web endpoints, and add a new type to my Datastore. The end code (removing the extra boring parts) looked something like:

// Email actions
http.HandleFunc("/subscribe", confirmationHandler)
http.HandleFunc("/confirm", subscribeHandler)
http.HandleFunc("/unsubscribe", unsubscribeHandler)

// When someone submits their email
func confirmationHandler(w http.ResponseWriter, r *http.Request) {
  address := strings.TrimSpace(r.PostFormValue("email"))

  // Do a few basic validations

  exists, err := db.EMailExists(address)
  if err != nil || exists {
    // Return an error
    return
  }

  id, err := db.CreateEMail(address)
  if err != nil {
    // Return a different error
    return
  }

  data := struct {
    ID int64
  }{
    id,
  }
  var buf bytes.Buffer
  if err := emailTemplate.ExecuteTemplate(&buf, "confirm.html", data); err != nil {
    // Yet another error
    return
  }
  // Send email with data in buffer, that basically says "Click here to subscribe"
}

// Someone clicking the confirm link I sent them
func subscribeHandler(w http.ResponseWriter, r *http.Request) {
  id, err := strconv.ParseInt(r.FormValue("key"), 10, 64)
  if err != nil {
    // Uh oh, looks like nobody is subscribing today
    return
  }

  if err := db.ConfirmEMail(id); err != nil {
    // Experiencing some technical difficulties
    return
  }

  render(w, BasicTemplate{
    Description: "Your subscription has been added.",
    Header:      "You're all set!",
    Message:     "Good to go.",
    Type:        "success",
  })
}

// Someone doesn't want my ramblings in their inbox anymore
func unsubscribeHandler(w http.ResponseWriter, r *http.Request) {
  id, err := strconv.ParseInt(r.FormValue("key"), 10, 64)
  if err != nil {
    // Uh oh, they aren't going to like this
    return
  }

  if err := db.DeleteEMail(id); err != nil {
    // Looks like they're stuck with me until I learn how to program better
    return
  }

  render(w, BasicTemplate{
    Description: "Your subscription has been removed.",
    Header:      "You're all set!",
    Message:     "Good to go.",
    Type:        "success",
  })
}

// The struct used to hold subscriptions
type Subscription struct {
  Address   string
  Confirmed bool
  Added     time.Time
  ID        int64 `datastore:"-"`
}

Woah Brandon, when did you add highlighted code snippets to the site?!

I'm glad you noticed, hypothetical reader! I added it a few months ago, but this is the first post to actually utilize it. It uses highlight.js behind the scenes, and took like five lines of code to set up. But I'm getting sidetracked here. As for the code above, that's really all I needed to satisfy the CAN SPAM act requirements; all in all, not too bad. Granted, that doesn't include the code for actually sending emails, but we're getting there.
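To make the confirm/unsubscribe lifecycle concrete, here's an in-memory stand-in for the `db` calls used by the handlers above. The method names mirror those calls, but the `memDB` type, its argument lists (no context), and its behavior are illustrative assumptions rather than the Datastore-backed code: an address starts unconfirmed, only confirmed rows get mailed, and unsubscribing deletes the row outright.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
	"time"
)

// Subscription mirrors the struct above, minus the datastore tag.
type Subscription struct {
	Address   string
	Confirmed bool
	Added     time.Time
	ID        int64
}

var errNotFound = errors.New("subscription not found")

// memDB is a hypothetical in-memory stand-in for the Datastore-backed db.
type memDB struct {
	mu     sync.Mutex
	nextID int64
	subs   map[int64]*Subscription
}

func newMemDB() *memDB { return &memDB{subs: make(map[int64]*Subscription)} }

// CreateEMail stores an unconfirmed subscription and returns its ID.
func (db *memDB) CreateEMail(addr string) (int64, error) {
	db.mu.Lock()
	defer db.mu.Unlock()
	db.nextID++
	db.subs[db.nextID] = &Subscription{Address: addr, Added: time.Now(), ID: db.nextID}
	return db.nextID, nil
}

// ConfirmEMail marks a subscription as opted in.
func (db *memDB) ConfirmEMail(id int64) error {
	db.mu.Lock()
	defer db.mu.Unlock()
	s, ok := db.subs[id]
	if !ok {
		return errNotFound
	}
	s.Confirmed = true
	return nil
}

// DeleteEMail removes a subscription entirely (the unsubscribe path).
func (db *memDB) DeleteEMail(id int64) error {
	db.mu.Lock()
	defer db.mu.Unlock()
	if _, ok := db.subs[id]; !ok {
		return errNotFound
	}
	delete(db.subs, id)
	return nil
}

// Subscriptions returns only confirmed subscribers, sorted by ID.
func (db *memDB) Subscriptions() []Subscription {
	db.mu.Lock()
	defer db.mu.Unlock()
	var out []Subscription
	for _, s := range db.subs {
		if s.Confirmed {
			out = append(out, *s)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	db := newMemDB()
	a, _ := db.CreateEMail("a@example.com")
	db.CreateEMail("b@example.com") // never confirms, so never gets mail
	db.ConfirmEMail(a)
	fmt.Println(len(db.Subscriptions())) // 1
	db.DeleteEMail(a)
	fmt.Println(len(db.Subscriptions())) // 0
}
```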

Actually Sending Mail

Okay, so at this point in the narrative, we've set up DKIM and SPF to work with App Engine, and we've got a storage system set up for subscriptions. According to the documentation, all we need to do now is add:

subs, err := db.Subscriptions(ctx)
if err != nil {
  // Looks like no emails are going out today
  return
}

for _, sub := range subs {
  data := struct {
    ID int64
  }{
    sub.ID,
  }
  var buf bytes.Buffer
  if err := emailTemplate.ExecuteTemplate(&buf, "new_post.html", data); err != nil {
    // Guess we're not sending this one
    continue
  }
  // Send can fail (quota, bad address, etc.), so check it
  if err := mail.Send(ctx, &mail.Message{
    Sender:  "post-notifier@frominsidethebox.com",
    To:      []string{sub.Address},
    Subject: "New Post on From Inside The Box: " + post.Title,
    Body:    buf.String(),
  }); err != nil {
    // Log which subscriber this one failed for
  }
}

Hook that up to some authenticated web endpoint, pass in the post ID as a parameter, and boom, you've got a working subscription system. Shockingly enough, that's pretty much all it took to get working. Not only did it work, it worked on (nearly) the first try, no less. In fact, it was so easy that I was sure something would break in the not-too-distant future, as my experience with trucks and things and life in general has shown me to be universally true.

Mistake #1: Not Reading the Manual

A bit of background: I have a special button I push when I want to send out emails to subscribers. It's separate from the special button for publishing posts, because in the event that I accidentally publish something too early, I'd rather not spam everyone with my half-baked ramblings. If anything goes wrong in the email-sending process after I push my special button, the server returns an error message. For a month or two, this process worked well. Things were going along swimmingly, and then one time I pushed the special email button, and things stopped swimming. I got an error saying that 6 of my 106 subscription emails didn't send.

Huh, that's a bit suspicious, exactly 100 emails sent successfully...

After the slightest bit of sleuthing, I found out that App Engine has some limits and quotas, a particularly relevant one being that I can't send more than 100 emails a day. This works fine when ≤100 people are subscribed, but not so well when 106 people are subscribed. Sorry to the 6 people who didn't receive an email when I put up this post. Unfortunately, my logging was also bad enough that I didn't know which six subscribers didn't get a letter, so the best thing I could do was fix the problem for the future. Enter stage right, SendGrid.

SendGrid is a mail-delivery service. They do lots of other things too, but those aren't quite as relevant or interesting to me. All I cared about was that they'd happily send way more than 100 emails. So I ripped out the App Engine mailing code, and pulled in SendGrid's Go client library. It took about an hour to throw together, and the only real difference is that I have to pass the SendGrid API key, which I embed directly in my code because I have no sense for design or regard for security.*

Mistake #2: Just Generally Being Incompetent

If you look at the mail snippet above, you might notice one teensy little problem, namely that I'm just iterating over all the subscriptions and making a blocking call to send each message. This is fine when you only have a few users, but the total time scales linearly with the number of subscriptions. Not only that, but most of the time spent sending each message is waiting for the request to trot off to The Internet At Large™ and mosey on back. This is a shame, because Go has all these fancy concurrency primitives that I'm not taking advantage of, and they'd be particularly well-suited to this embarrassingly parallel problem. Using goroutines, we can fire off all the requests, which will then do all that waiting in the background. Similar to my last performance post, it means we can send all of the emails in (almost, kinda sorta, if you squint a little bit) constant time, instead of watching it get slower as I get more subscribers.

Sounds great right? Well it would be if I was a less shoddy programmer. Concurrency can be tricky to get right (even in a language built for it) if you aren't being careful, and I wasn't thinking. Here's the problem: each email is slightly different because the unsubscribe link has a unique ID for each subscriber, so I can't fire off the same email for everyone. See if you can spot where I went wrong in this first implementation:

var buf bytes.Buffer
data := struct {
  Unsub  int64
  PostID int64
  Desc   string
}{
  0,
  post.ID,
  post.Desc(),
}

var wg sync.WaitGroup
// Iterate over list of subscribers
for _, email := range emails {
  wg.Add(1)
  go func(email *EMail) {
    defer wg.Done()
    data.Unsub = email.ID
    if err := emailTemplate.ExecuteTemplate(&buf, "new_post.html", data); err != nil {
      // Always check your errors kids
    }

    message := sendgrid.NewMail()
    // Build the rest of the message here

    if err := sg.Send(message); err != nil {
      // Log the errors and whatnot
    }

    buf.Reset()
  }(email)
}
// Wait for all the emails to be sent
wg.Wait()

...did you see the ~~gorilla~~ mistake? I thought I was being clever by using one buffer for the emails to save memory, but I was really setting myself up for an awful and awfully obvious race condition by writing to the same buffer from 100 different goroutines with no synchronization whatsoever (and, for good measure, every goroutine was also overwriting the shared data.Unsub field). The end result could have been a number of problems of varying unpleasantness, but what happened, in reality, was that every person got an email containing the emails for EVERYONE concatenated together. Aside from just looking silly, it means that anyone who got one of those emails could have clicked (and still can click) each and every one of the unsubscribe links in the email and unsubscribed >100 people from my mailing list. I greatly appreciate everyone continuing to not do that, but I regularly back up the mailing list just in case.

The first thing I did to fix this was to give each goroutine its own buffer. Then, I wrote some beefy unit and integration tests for the mailing system. Once I was satisfied with the passing tests, I ripped out all the mailing code, and replaced it with a pool of mailer threads that get fed subscriber IDs via a channel. That way, I only need as many buffers as workers in the pool. The first few attempts caused the tests to fail for one reason or another (malformed message bodies, deadlocks, etc), but I eventually got it working. I'm going to skip including that code, because this post is already probably too long, but you get the idea.

Last Thing: TLS

All this talk of email is great, but there was one oversight on my part that I'd been avoiding because it was convenient to do so. My site was being served over HTTP, not HTTPS, so if you clicked the "Subscribe" button, it'd be sending your email address over the open Internet in plain text. Not a huge deal, but it's 2016 and I can do better than that. So with that in mind (and a bit of prodding from a friend), I started looking up the process for getting TLS on a custom domain with App Engine. I also wasn't ecstatic about the idea of paying a certificate authority for a certificate, so I looked into Let's Encrypt. I ended up finding a few great guides on how to set it all up. Then I just had to switch my image serving over to HTTPS and add "secure: always" to the handlers in my App Engine app.yaml configuration so that it would always redirect to the HTTPS version of the site, and that was it! I don't know about you, but the green lock in the address bar gives me a warm, fuzzy feeling.

*I'm joking, of course. What kind of idiot would mix credentials with code? Kidding again, I'm that kind of idiot and that's exactly what I do, but I swear I have every intention of having a separate Datastore table to store all of the private keys I need for the application. In the meantime, my source code is stored in a private GitHub repository, only accessible via two-factor or private key authentication. Granted, I'm still putting quite a bit of faith in GitHub here, but the stakes aren't all that high for this particular case.

Site Stuff Truck Tech

Truck Tech

November 24, 2015 at 10:09 PM

Source: My App Engine console, with a few minor tweaks

What CMS are you using?

What WordPress plugin do I use to make a clock like yours?

You're using AppEngine right? I can tell by your blog's IP address.

Brandon, when are you going to nerd out and talk about technical stuff?

Right now, as it turns out. First, a little disclaimer:

This post is more for the technical types. You'll likely find this post mind-numbingly boring if you don't have an interest in programming or web development.

Still reading? Sweet. While I can't talk about what I work on in my day to day professional life, I can talk about my background and blog-related things, which in my opinion are more interesting. If you want juicy technical details, skip the "Background" section, that's just me reminiscing over a time in my life where I didn't have to come to grips with being an adult and stuff.

Background

On the sidebar, I mention that I'm a "Software Engineer". I capitalize Software and Engineer because it's my Official Title™ and it makes me feel Very Important™. I didn't always hold this position though, so let's start at the beginning.

For some reason unbeknownst to my current-self, a smaller, younger version of me had it in his silly little head that he wanted to be a lawyer. Maybe it was the allure of bucket-loads of bucks and the promise of putting particularly unpleasant people in prison, I can't say for sure. In any case, I shot down that dream while filling out a health form (or something similar) and realizing that I hated paperwork. Considering being a lawyer is like 99% paperwork (with the other 1% being bureaucracy and more paperwork), I opted for a far less miserable dream.

My first formal introduction to programming was at nerd camp when I was 13. I learned the fundamentals of C, building a basic command-line choose-your-own-adventure game. That was cool and all, but it was actually one of the other classes that caught my attention. A different course was using Multimedia Fusion, a programming platform with an easy click-and-drag UI. When you're 13 years old and you just made your first text-based game, that feels pretty sweet. When you look over and the kid next to you has made a full-on graphical game complete with sprites and sounds, you know it's time to step up your game.

So after camp was all over and done with, I started experimenting with this Multimedia Fusion software. I eventually got good enough with it to actually work with Clickteam, the company producing the software, and I built some educational children's games to showcase their tools. Multimedia Fusion was really a gateway programming language though, and it wasn't long before I started looking elsewhere to get my fix.

I spent five summers in my hometown as a parking lot attendant, and as the beach/lot became more popular over the years, people started asking for a way to make parking reservations online. Being the enterprising 17-year-old that I like to think I was, I whipped up a web app in PHP (for no other reason than I didn't know any better), and an accompanying Android app. This PHP app was hands-down the worst thing I'd ever written in my life. Not only did it have zero testing/version control/productionization/comments/coherence/style of any kind, the codebase was essentially a Franken-program of half-working snippets from Stack Overflow duct-taped to a MySQL instance hosted on a now-defunct web host, with all the worst CSS attributes in existence plastered onto the frontend. Somehow it worked, though, and every day I'd use the Android app to keep a live-updating count of how many cars were in the lot, and people would pay me a dollar apiece to reserve a spot.

In college, I did a little bit of everything. I drove buses, programmed bus websites, graded for classes, and picked up contracts for any project I could get my hands on. Along the way I dabbled in Ruby/Rails, Python, Node.js, some Android stuff, a ragtag collection of web frameworks/technologies, and even a bit of assembly. I eventually stumbled upon Golang, and that's what most of the code I write of my own volition ends up being in, including this blog.

The Blog

At a high level: this blog is written in Golang and hosted on Google App Engine. Posts and questions are stored in Datastore, and images are stored in Blobstore. The frontend is your standard HTML/CSS/JavaScript stack with a sprinkling of Bootstrap because, let's be honest, nobody actually enjoys frontend web dev. I don't use Markdown or any other markup languages; posts are written in plain ole HTML, with the odd CSS class added to my stylesheet for new functionality.

Golang

If you haven't heard of Golang before, allow me to make the introduction. It's a fairly simple language, looks like a bastard child of C and Pascal, and puts primitives for concurrency right into the language. It's compiled and garbage-collected, which means deployments come in the form of single solitary binaries, and memory leaks are less likely to exist/ruin your day. If you're familiar with C or Java, you can probably learn the language in an afternoon. Built right into the standard library is a world-class HTTP package, as well as a templating system, and there's an extensive network of tooling covering everything from linting and formatting to deadlock detection and code coverage.

App Engine

App Engine is the magical piece of technology I use to host my code. It's kind of like Heroku, if Heroku didn't hold your hand quite so tightly and was built on big-boy infrastructure. It provides a great local development environment for me to iterate on, and deployments take <30 seconds. The built-in logging, monitoring, profiling, load-balancing, versioning, and auto-scaling, among a dozen other features, mean that I can focus on the blog and not unrelated productionization details.

Savings Clock

The code for the savings clock can be found at the bottom of this JavaScript file. It's basically just a function that runs once a second and calculates my savings given some hard-coded values and the current date/time. Since my insurance rate is variable (as is the cost of rent), I divide the time between May and whenever into "epochs", each of which has a start and an end date, and the insurance and rent prices for those respective time periods. Then I iterate over the epochs, noting whether or not we're in the middle of one, and sum up the savings over all of them. Before updating the display on the page, I check if the value is negative (which it isn't anymore!) and format it appropriately.

Optimizations

I threw the blog together over the course of a few days in May, back when I was still unsure whether or not I could actually live out of a truck. Being the hectic time that it was, my main concern wasn't code quality or adequate testing coverage, but rather getting my thoughts onto the proverbial paper while they were still fresh in my mushy, unreliable mind. This was fine for a while, but I couldn't help but notice that each new feature took longer to integrate, and was more frequently broken than not. I knew it was time for an overhaul.

The Refactoring

The scenario is familiar to anyone who's ever done a bad job at maintaining a codebase: editing existing code becomes painful, and adding new features becomes brittle and burdensome, normally involving lots of copy and pasting and manual testing. In my case in particular, there was little to no isolation between components, and I was passing around global objects like it was nobody's business. So I started a new branch (because I use git and GitHub like a reasonable human being), and set to work. The first big change I made was wrapping App Engine's Context with my own and passing that into all my request handlers. Then I added interfaces for database lookups to decouple the implementation details from the business logic. Wrapping the default templating system with my own helped to get rid of the code duplication surrounding rendering common components, and a bunch of other small code cleanup tasks around the site reduced complexity and fragility. Abstracting things like pagination and post creation out into their own independent ideas further slimmed the handler methods. Since everything is easier to test with proper abstraction, I tossed in a few more unit tests for basic functionality, and finished up by running the golint and go vet tools and implementing the suggestions they provided. A few weeks and a few thousand lines of code later, we're here, with a much more maintainable web app. To celebrate, I started adding a few new features.

Caching

When this blog was only getting 10 hits a day, it didn't matter how inefficiently I served up my content. After all, Datastore read/write requests are charged by the millions. I could have run the website on a Raspberry Pi and nobody would have been any the wiser. But my fleeting collision with the limelight meant that suddenly those redundant calls started to add up. It wasn't until recently that I realized something that should have been abundantly obvious from the outset: I'm the only one who updates the content on the site, so I don't need to check the database to see if it's been updated. I can store pretty much everything in memory and serve from there, refreshing the in-memory representations anytime I create or edit a post.

Uh Brandon, you know that there's Memcached for that sort of thing, right? In fact, it's even integrated into App Engine.

I…I actually didn't know that. And if I had known that earlier, I probably would have done that. But since I had just finished my refactoring, it was a snap to add in a caching interface, and then drop the business logic into my new Context before it delegated calls to the Datastore.
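A caching layer like that can be sketched as a decorator around a lookup interface: reads come from memory after the first fetch, and the entry is dropped whenever a post is created or edited. `PostStore`, `countingStore`, and `Invalidate` are hypothetical names. Note that a cache like this lives in one process; unlike Memcache, it isn't shared across App Engine instances, so an edit only invalidates the instance that handled it.

```go
package main

import (
	"fmt"
	"sync"
)

type Post struct {
	ID    int64
	Title string
}

type PostStore interface {
	Post(id int64) (Post, error)
}

// countingStore is a hypothetical backing store that counts its reads.
// It is not safe for concurrent use; it exists only for the demo.
type countingStore struct {
	posts map[int64]Post
	reads int
}

func (c *countingStore) Post(id int64) (Post, error) {
	c.reads++
	p, ok := c.posts[id]
	if !ok {
		return Post{}, fmt.Errorf("post %d not found", id)
	}
	return p, nil
}

// cachedStore wraps another PostStore and keeps successful reads in memory.
type cachedStore struct {
	mu    sync.RWMutex
	next  PostStore
	cache map[int64]Post
}

func newCachedStore(next PostStore) *cachedStore {
	return &cachedStore{next: next, cache: make(map[int64]Post)}
}

func (c *cachedStore) Post(id int64) (Post, error) {
	c.mu.RLock()
	p, ok := c.cache[id]
	c.mu.RUnlock()
	if ok {
		return p, nil
	}
	p, err := c.next.Post(id)
	if err != nil {
		return Post{}, err // errors are not cached
	}
	c.mu.Lock()
	c.cache[id] = p
	c.mu.Unlock()
	return p, nil
}

// Invalidate is meant to be called after creating or editing a post.
func (c *cachedStore) Invalidate(id int64) {
	c.mu.Lock()
	delete(c.cache, id)
	c.mu.Unlock()
}

func main() {
	backing := &countingStore{posts: map[int64]Post{1: {ID: 1, Title: "Truck Tech"}}}
	store := newCachedStore(backing)
	for i := 0; i < 3; i++ {
		store.Post(1)
	}
	fmt.Println(backing.reads) // 1
	store.Invalidate(1)
	store.Post(1)
	fmt.Println(backing.reads) // 2
}
```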

Parallelism

A couple days ago, I stumbled upon the Cloud Trace feature of App Engine. So I turned it on, loaded the main page, and saw the following:

Well that's no good.

Can you spot the cardinal sin? I'm using a highly-concurrent programming language, and yet I'm loading all of my image URLs for a given page synchronously. There are five requests, because I put five posts per page and each one has a single title image. So I made a new type to represent a list of posts, and then added a prefetch function that would spin up a separate goroutine for each post, and wait until all of them had finished, using a WaitGroup. As you can imagine, it's approximately five times faster now, as shown below.

Much better.

But it turns out I had an even worse offender, the search functionality (which actually works now). A search for a common word, like "the" or "truck" or "deranged" (kidding) would load the entire corpus of posts, and all of their images serially, as shown below.

It's hard to see, but this request took nearly 3 seconds!

But running our handy-dandy new parallel prefetcher on this made quick work of that.

Down to 700ms.

Naturally, I'm still at the mercy of the longest-running lookup, plus building up the inverted index, which takes nearly 500 ms (and will grow linearly with the number of posts I write), but it's definitely an improvement.

New Features?

Now that I have a lean, reasonably sane codebase, what's next? I have a few ideas.

Post Comments

People have their own ideas about my ideas, and sometimes they'd like to share them in a forum visible to the rest of the Internet-connected planet. I'll definitely add these at some point, once I build up sufficient tooling and reasonable spam blocking.

Microposts

All of my posts are about the various mundane aspects of my life: whether it's leaky roofs, insurance, bicycles, or the weather, if the topic is boring and has no right being expounded about for paragraphs on end, there's a good chance I'll write about it. But some of the prosaic things I want to talk about don't always warrant a whole essay, just a little blurb. What I'm describing is basically Twitter, but as an engineer with no interest in actually using Twitter, it seems reasonable to just whip up my own little system.

E-mail Subscription

For people who aren't a fan of RSS, but still want to read about my apathetic adventures.

Interactive Questions/Comments

Right now I have a comments/questions section, but it's one-way, so if people have a burning question and don't want to e-mail me or wait for me to do another Q&A, they're currently out of luck. I think it'd be cool if when you ask a question, you could get a link that would act as a private, topical chat and we could reply back and forth on that.

I have a few other features in the pipeline: I'd like to improve my search functionality, and maybe add a post listing by month or topic. I've been adding little features here and there and updating the layout, and if you have any good ideas for new features, I'm all ears.

Site Stuff Truck Tech

Thoughts from Inside the Box

Posts tagged "Truck Tech"
