I'm a full-stack web developer, quilter, sewist, avid reader, and a bunch of other things. This is my blog. Please connect with me on LinkedIn or visit my GitHub for more!
  • Learnin' Kubernetes

    A Greek trireme, steered by a helmsman… get it?

    At work, all teams are being asked to adopt Karpenter, which, as you can tell from the name, is related to Kubernetes.

    The adoption is a relatively simple process, thanks to pre-work other teams have done to automate most of the hard stuff. Realistically, I should just have to change a few values in a couple YAML files and be done with it. However, when you’re talking about modifying deployed services, it always pays to be careful.

    I felt like what I was being asked to do would be relatively simple if I only understood some of the terminology being used. I mean:

    Identify taints to add to provisioned nodes. If a pod doesn’t have a matching toleration for the taint, the effect set by the taint occurs (NoSchedule, PreferNoSchedule, or NoExecute).

    Yikes?

    So some basic terminology larnin’ is in order. I’m not really a devops girl so I am starting with the true basics. Here are some of my notes. Any mistakes here, however, are fully my own.

    Note: I dithered for four days about whether to post this, but then I read Comfortable with the struggle and was reminded that not understanding things is the default mode for software developers. It’s okay that I couldn’t tell you the difference between a Kubernetes pod and a node until this week, although I feel better now that I have learned to tell them apart.

    Kubernetes

    As explained by its docs, Kubernetes is an “open source system for automating deployment, scaling, and management of containerized applications.” In other words, it manages your apps running in containers: with Kubernetes, you can easily scale an app up or down (increase or decrease the number of pods running your app), handle load balancing, and (for example) automatically restart a pod if it is failing.
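    For example, scaling is a one-liner; here’s a sketch, assuming a hypothetical Deployment called my-app:

    # tell Kubernetes to run 5 replicas (pods) of the my-app Deployment
    kubectl scale deployment my-app --replicas=5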

    Fun fact because I am a language nerd: Kubernetes comes from kybernetes, the Greek word for “helmsman” (the idea being that if a Docker container is like a shipping container, then the helmsman is the person who steers all those containers around). Know what else can claim kybernetes as a root word? If you guessed cybernetics, you’d be correct: Norbert Wiener claimed he coined the term because he was interested in systems of feedback, and said that ships’ steering engines were “one of the earliest and best-developed forms of feedback mechanisms.” (“Governor” is also derived from kybernetes via Latin. Now that’s cool.)

    Karpenter

    An add-on for Kubernetes. Its value proposition is “just-in-time nodes.” It is aimed at helping users reduce costs by automatically scaling down nodes when they are underutilized, or by switching to “spot instances” (an AWS product that basically runs your app on ‘spare’ computers), among other things.

    Pods / nodes / clusters

    A pod is the smallest deployable “unit” in Kubernetes - a single container, for example. (However, a pod can hold more than one container!) A node is the machine that pods run on, typically a physical server or a virtual machine. All pods need a node to run on, but not all nodes have pods scheduled to them. A cluster is a group of nodes. Typically a service would have one cluster per deploy environment, so one for dev, one for staging, etc. The Kubernetes control plane lives at the cluster level.
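    You can see the pod/node relationship directly with kubectl (standard commands, run against whatever cluster you’re pointed at):

    kubectl get nodes              # list the nodes in the cluster
    kubectl get pods -o wide       # the NODE column shows which node each pod landed on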

    Control plane

    Just a fancy name for the software that actually orchestrates the containers. The scheduler is an important part of the control plane as it is the software that assigns pods to nodes. It can do this automatically, applying some sensible defaults, but if you work on a big production app you probably want more granular control, hence the next set of terms.

    Node affinity

    Pods can have what are called node affinities, which means they either want to run on a certain node (or a type of node, or a node location), or want to avoid a certain node. For example, you might want two services that communicate with each other to be in the same geographic region, to reduce latency. It’s all still electrons, after all. Or, you might want certain pods to only run on nodes with some minimum CPU power, or you might just want to make sure that your app is evenly spread across regions to better protect it from downtime.
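    As a minimal sketch (the pod name, image, and region here are all hypothetical), a pod that must run in a particular region looks something like this:

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:
        - name: app
          image: nginx
      affinity:
        nodeAffinity:
          # "required" is a hard rule; there is also a "preferred" variant for soft preferences
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/region
                    operator: In
                    values:
                      - us-east-1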

    Taint

    A taint is like the opposite of a node affinity: it’s set on a node and says “don’t schedule pods here.” Unless a pod can tolerate the taint…

    Toleration

    A pod that has a toleration matching a node’s taint can still be scheduled on that node.

    The syntax for taints and tolerations (in addition to all the new terminology) confused me.

    This is the command to apply a taint to a node:

    kubectl taint nodes node1 key1=value1:NoSchedule

    This means that no pods can be scheduled on (assigned to) that node unless they have a toleration matching key1=value1. (I had assumed it meant that no pods can be scheduled on that node if their key/value matches, but it is the opposite.)
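    For example, a pod that tolerates that exact taint might look like this (the pod and image names are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:
        - name: app
          image: nginx
      tolerations:
        # matches the key1=value1:NoSchedule taint applied above
        - key: "key1"
          operator: "Equal"
          value: "value1"
          effect: "NoSchedule"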

    So now that we know (or at least have familiarized ourselves with) this terminology, we can go back to the Karpenter documentation from before:

    Identify taints to add to provisioned nodes. If a pod doesn’t have a matching toleration for the taint, the effect set by the taint occurs (NoSchedule, PreferNoSchedule, or NoExecute).

    This just means Karpenter can control how the nodes it provisions are tainted, rather than you applying the taints manually through kubectl.

    In the end, to adopt Karpenter on my service at work, it truly was a pretty easy lift; as I said, other teams did most of the hard work so I just had to update some YAML to tell Karpenter my service’s tolerations and taints. For now, the taint is essentially, “don’t run any pods on this node that don’t belong to my team” and the toleration is “I belong to Rachel’s team,” so it’s not too complicated. But now that it’s set up we can (possibly) come up with some more interesting customizations in the future.
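    For the curious, the Karpenter side of that looks roughly like the snippet below. This is only a sketch against the v1beta1 NodePool API (older Karpenter versions use a Provisioner resource instead), and the key/value are stand-ins rather than my team’s real config:

    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: my-team-nodes   # hypothetical name
    spec:
      template:
        spec:
          # pods without a matching toleration won't be scheduled on nodes from this pool
          taints:
            - key: team
              value: rachels-team
              effect: NoSchedule

    Our pods then carry the matching toleration, just like the pod example earlier.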


  • Fruit Tracker, my first Fitbit app, is live

    Image of a beautiful apple in the grass

    My first Fitbit app is live in the Fitbit app store.

    You can read about what it does here. Below are some things I learned while making it.

    • The Fitbit onboard computer is very stupid. This is probably why Fitbits have such good battery life. You can do very basic calculations on the Fitbit itself, and render or unrender components, but if you want to do anything else, like:
      • Connect to the internet/fetch data from an API
      • Store a variable to be retrieved next time the app is opened
      • Download images
      • Get a GPS heading
      • Do anything while the watch is off/idle

      You need what is called a “companion app,” another Node app that runs on your phone rather than on the watch itself. The heavy compute functions are offloaded to the phone, and you simply pass data back and forth between the phone and the watch using the built-in Messaging API (there’s a short sketch of this after this list), or a different File Transfer API if you need to move large files. (The Messaging API is limited to 1KB of data per message, which was not a problem for Fruit Tracker, but has caused me to have to get creative with another app I’m working on.)

    • By the way, the companion app and watch app can only send strings, so get comfortable with JSON.stringify() and JSON.parse().

    • Manual positioning of elements is for suckers. You can totally declare an svg element and position it like:

          <text fill="white" x="50" y="15" text-anchor="middle" font-size="20" id="yesterday"></text>
      

      And in fact I have done that with a number of components, but the winning way is to use Fitbit components. (I mentioned those components in a previous post.) Those come prestyled, they are easy to place, and using them also ensures that you’re consistent with the rest of the watch UI.

      Finally, the last thing I learned from building Fruit Tracker is….

    • Eating the ‘right’ amount of fruits and vegetables is freakin’ hard. I thought I was doing well until I started actually tracking my consumption. Good news is, I have almost immediately increased my fruit and vegetable intake by at least 50%. I can’t imagine a better example of using measurement tools to make change.
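    As promised above, here’s a rough sketch of the watch/companion message passing. The messaging module is part of the Fitbit SDK; the payload fields are made up for illustration:

    // companion/index.js - runs on the phone
    import * as messaging from "messaging";

    messaging.peerSocket.addEventListener("open", () => {
      // objects get stringified before sending (and each message must stay under the ~1KB limit)
      messaging.peerSocket.send(JSON.stringify({ fruit: "apple", servings: 2 }));
    });

    // app/index.js - runs on the watch
    import * as messaging from "messaging";

    messaging.peerSocket.addEventListener("message", (evt) => {
      const data = JSON.parse(evt.data);
      console.log(`Logged ${data.servings} serving(s) of ${data.fruit}`);
    });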
  • What Is A Snowflake Stage?

    Some snowflakes photographed under a microscope

    At work the past week, I’ve been working with large datasets, trying to figure out the most efficient way to process and transform the data without overloading other teams’ servers.

    Disclaimer, I’m not a data scientist and SQL is not my strong suit. Any mistakes in the below are mine :)

    So imagine you have a table like this:

    First    Last        Favorite_color   Zip_code
    Angela   Apple       red              90210
    Bruce    Banana      yellow           02134
    Claire   Canteloupe  orange           20008
    Zed      Zucchini    …zcolor          12345

    And let’s say that table has eleventy billion rows in it. The actual number is not important, for the purposes of our exercise the fact that it is “huge” is the important part.

    Now let’s say you want to get the first name of everyone whose ZIP code matches one of 10,000 arbitrary ZIP codes.*

    You could do: SELECT First FROM my_table WHERE Zip_code IN (10001, 10002, 10003 ... 20000)

    but that is very slow! I don’t know how long it takes on this hypothetical table but in my real-world scenario, a similar query took 20 seconds for 1000 items in the “in” clause. With the amount of data we need to pull from this table (which is significantly more than 10,000 rows), batching in groups of 1000 would be prohibitively slow.

    However, if you had your desired ZIP codes as a separate table, my_codes:

    Desired_zip_codes
    10001
    10002
    20000

    you could run a much faster query:

    SELECT First FROM my_table INNER JOIN my_codes ON my_table.Zip_code = my_codes.Desired_zip_codes

    This is much faster because SQL magic?**

    In my case, I had confirmed that a JOIN would be faster than an IN clause. So how do I get my list of codes into a JOIN-able format?

    A colleague suggested using a Snowflake Stage, which I had never heard of before. The basic documentation on stages explains how to create them. Essentially it’s a loading area for ingesting structured data that is not in your database.

    In my case I have a CSV uploaded to S3 with the data I want to join on. So I need to load that into a stage and then I can join against it.

    So we create a stage:

    CREATE STAGE my_stage URL='s3://mybucket/mypath'

    There are a million other params you can pass in to tweak the behavior of said stage, see more on that here. But that’s all it takes to create a very basic stage. Depending on how your database and AWS integrations are set up you probably need to add some permissions. This will be different for every setup so I’ll spare you the gory details of what I did here (it was mostly Terraform and pinging our data-platform team for help). But assuming you’ve created the stage and your database role has permissions to read from it, you are now in business!

    Say you have a file in your S3 storage: mybucket/mypath/data.csv. It’s now as simple as:

    SELECT $1, $2 FROM @my_stage/data.csv

    Snowflake doesn’t know that the header row of your CSV contains its column names (assuming you’re not using a headerless CSV, which is a whole other thing), which is why we have to use positional arguments. However, we can fix that by importing the CSV into a temporary (or permanent, depending on our use case) table:

    CREATE TEMP TABLE mytemptable (ZIP number);

    Let’s now say our ZIP code list is a one-column CSV:

    COPY INTO mytemptable FROM @my_stage/data.csv
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);  -- assuming the CSV has a header row to skip

    SELECT First FROM my_table JOIN mytemptable ON mytemptable.ZIP = my_table.Zip_code;
    

    As before, there are about a thousand different configurable options here. I especially like MATCH_BY_COLUMN_NAME, which, when paired with parse_header=TRUE, automagically determines the column names in your CSV (with headers) and inserts them into the table, even if the columns in the table are in a different order.
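    A sketch of what that might look like, reusing the hypothetical stage and temp table from above (this assumes the CSV’s header names actually match the table’s column names):

    COPY INTO mytemptable
      FROM @my_stage/data.csv
      FILE_FORMAT = (TYPE = CSV PARSE_HEADER = TRUE)
      MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;  -- columns are matched by name, not position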

    Note: MATCH_BY_COLUMN_NAME doesn’t get you a free pass on creating the temp table if you want to select by column name. Dynamically creating the temp table without knowing the CSV’s headers in advance is a topic for a separate post. Or see “Generate column description” in my resources below.

    That said, I’ve now created a script that does everything I want: it selects the data I need from Snowflake, joining on my imported CSV! It’s still quite slow, taking a few minutes to run each time, but compared to the previous process, this is a huge improvement!

    Resources:

    *I realize this is a contrived example; believe me that my real-world use-case is more relevant.

    **I learned in the process of writing this blog post that a JOIN is not always faster than an IN, and ‘real’ data scientists and database admins have dedicated many words to analyzing which query is faster under which circumstances. (Don’t forget that with most database languages you can run something like EXPLAIN PLAN before actually running the query to see how it will be executed and roughly how expensive it will be.)
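    In Snowflake specifically, I believe it’s just EXPLAIN in front of the query; reusing the hypothetical tables from above:

    EXPLAIN SELECT First FROM my_table INNER JOIN my_codes ON my_table.Zip_code = my_codes.Desired_zip_codes;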

  • More Fitbit Dev Resources

    A stock photo of a person wearing a smartwatch. I'm not even sure if this is an actual Fitbit.

    The developing-for-fitbit journey continues…

    Breaking changes between SDK 4.x and 5.x

    The older Fitbit I have, a Fitbit Versa 2, uses software written with version 4.x (or lower?) of the Fitbit SDK. Modern Fitbits use… I think it’s up to 6.x? And excitingly, there were a ton of breaking changes between version 4.x and 5.x.

    I solved a number of the issues here, but am running into more, especially with premade Fitbit components like buttons–which, incidentally, are very poorly documented.

    However! Pure luck and a lot of random web searching led me to this official Fitbit demo project: https://github.com/Fitbit/sdk-exercise/. And we’re in luck: the initial version was written with SDK 3.0 and then abandoned shortly after, even though all of Fitbit’s developer docs were updated for 5.0 when the Versa 3 came out. In other words, it’s an ideal resource for learning how things ‘used to’ be done.

    From that I found that if I want to place a button on the screen on the Versa 3, I would import <link rel="import" href="/mnt/sysassets/widgets/text_button.defs" />, but if I want to place a button on the screen on the Versa 2, I should import <link rel="import" href="/mnt/sysassets/widgets/square_button_widget.gui" />.

    I’m not even a hundred percent sure how to browse that widgets directory, if it’s even possible; I suspect it’s built into Fitbit’s firmware and can’t be accessed by humans. Which makes this dead repo possibly the best source of truth for SDK 4.0 widget names?

    There are a number of other demo projects in various states of completion in the Fitbit github organization. I believe the repos beginning with sdk- are the most useful, but take a look.

    General help

    Bless this Aussie developer who has 1) made about a thousand watchfaces for Fitbit (including not one, but two watchfaces that actually play Pong) and 2) written up his own SDK guide filled with helpful tips.

    Hopefully these resources help you as they are helping me. And I hope I will have an update on my own app(s) soon!

  • Common Errors When Developing for Fitbit

    Or maybe just common to me?

    Image of the Fitbit developer bridge on a watch

    Problem:

    Install failed: RPC call to 'app.install.stream.begin' could not be completed as the RPC stream is closed

    Cause:

    The jury’s out on what causes this. It seems to happen when I put my laptop to sleep; the Fitbit simulator doesn’t seem to be able to recover.

    Solution:

    Remove ~/Library/Application\ Support/Fitbit\ OS\ Simulator/, which contains caches, preferences, etc. You’ll have to reset your preferences afterwards, of course.
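    On macOS, that’s something like:

    rm -rf ~/Library/Application\ Support/Fitbit\ OS\ Simulator/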

