Thursday, May 10, 2012

your data is always wrong and other things I wish I knew a long time ago

Data analysis is tough work. I've been doing it for eight years or so, and I still have an infinite amount to learn and master.

However, there are a few things I have picked up, and some of them are simple enough for the 22-year-old Chris Perry to understand, so I'd like to formally request that whenever any of my descendants gets around to inventing time travel,1 they pass these on to my former self.

The first item you should tell young Chris Perry, of course, is to INVEST EVERY DIME YOU HAVE in Apple stock while it's sub-$50/share, just like your roommate John IS TELLING YOU TO DO EVERY NIGHT YOU IDIOT, but I'll spare you the rest of the non-data-related items for today.

Your data is always wrong

Always.

Your data is always wrong.

It's wrong for all sorts of reasons: the data collection mechanism is broken, the servers went down, there are strange confounding effects due to your pseudo-random selection, ghosts are somehow causing weird corner cases every millionth row, solar flares, you can name them all, but the real reason is this: just like mom said, life isn't fair.

Data is not perfect. It never has been. It never will be. Your stats teacher couldn't tell you that because she wanted to perpetuate the myth of some Santa Clausian dataset which exists without blemish and travels the world giving gifts to Analysts on every Pi Day Eve, but the truth is the world is just imperfect, and there are always imperfections in your data.

Always.

Once you accept this, you'll be a much better analyst. The data you are looking at, right this second, is wrong. Most data isn't so terribly wrong that you need to go running around outside screaming that the Bayesians are coming, because it usually falls within the margin of error,2 so you're usually okay, but you must always keep this in mind.

Thou shalt not give bad results

This is the only commandment in all of data analysis. You can never ever transgress the most holy commandment and make a mistake that causes you to deliver bad results.

This does not contradict my first point, because there's a big difference between calling Indiana for McCain and telling me that 84% of the U.S. population is Jewish. I did the former, and I blame the margin of error, and I saw the latter happen and it wasn't pretty.

If you quote numbers to your boss, you are held to a high standard. If you retract them later after you discover an error, or worse, someone else does, you look like an idiot. You must be right.

Know your numbers

Speaking of, be sure and know your numbers. Before you present to anyone, have the answers to a few obvious questions they might ask about those numbers on the tip of your tongue. To use a completely hypothetical example, if you were to list out segments of your user base that return at high rates, you might want to, and again, this is completely hypothetical, have stats prepared on what those people do when they return, you nincompoop.

Have relevant figures ready and in your head when presenting. It will help you not look stupid. Trust me.

Always double check

Even if you just spot-check a result or two by hand, you really need to verify your results.

If you were a software developer, you would have the privilege of writing code that can, at absolute bare minimum, be released and tested on users out in the wild, and by their screams and angry hacker news comments and your burning servers, you'd be able to tell that something is broken.

Not so for data analysis. If your code is buggy, you are screwed beyond belief. With satanically horrific frequency, it is not at all obvious that there are problems with your results.

If you're not verifying your results using some independent method, then you are just relying on your own gut check to make sure the numbers are right, and let me tell you, relying on your so-called "gut check" just means you're going to get your gut checked by a physician someday soon because when the CEO calls you out for screwing something up, you will learn the true meaning of ulcerating.
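For what it's worth, here is a minimal sketch of what "some independent method" might look like, in Python. Everything in it is made up for illustration: the `visits` rows, the segment names, and both function names are assumptions, not anything from a real project. The point is only that the fast way and the slow, obvious way should agree.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, segment, returned_within_week)
visits = [
    (1, "mobile", True),
    (2, "mobile", False),
    (3, "desktop", True),
    (4, "desktop", True),
    (5, "mobile", True),
]

def return_rate_by_segment(rows):
    """Main method: one pass over the data, accumulating counts per segment."""
    returned = defaultdict(int)
    total = defaultdict(int)
    for _user_id, segment, did_return in rows:
        total[segment] += 1
        if did_return:
            returned[segment] += 1
    return {seg: returned[seg] / total[seg] for seg in total}

def spot_check(rows, segment):
    """Independent method: filter one segment and count the slow, obvious way."""
    in_segment = [row for row in rows if row[1] == segment]
    came_back = [row for row in in_segment if row[2]]
    return len(came_back) / len(in_segment)

rates = return_rate_by_segment(visits)
for segment in rates:
    # If the two methods disagree, stop before the numbers reach anyone's boss.
    assert abs(rates[segment] - spot_check(visits, segment)) < 1e-9, segment
```

On a real dataset you would probably spot-check only a segment or two, by hand or with a separate query, rather than every one.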

You can ignore some stuff

BUT DON'T IGNORE THE WRONG STUFF.

It's routine and common to see numbers that are close-ish to each other and declare it's close enough for government work,3 and continue on. And that usually works.

But, and going back to your gut here, sometimes you'll see something just a little bit off, and you'll have this momentary little nagging thought that'll tell you to investigate that further.

If you ignore this, you will almost certainly regret it. Write it down, and check it out later when you've fallen out of your groove and you need something else to spend time on before you get stuck going back and ensuring compliance with your logging spec.

You will always perform a task another time

At least. Always. Without fail. No wait, let me hear you tell me that this analysis is special and you're only doing it this one time, so you're just going to whip up some cheap code and you won't save it, or maybe, horror of horrors, you just plan on doing something by hand in Excel,4 and let me tell you that it is a Grand Law of the Universe that either your boss, your co-worker, or you will want you to do it again in the future, or an analysis almost exactly like unto it.

You are one hundred percent guaranteed to do it again. Therefore, script it. You must script it. If it's not easily repeatable by script, you have failed.

I once had a data export project for a product worth literally millions of dollars that depended on one guy who fiddled with some exporter by hand, thinking he would only have to do it once. We did it over a dozen times. It wasn't until the dozenth that I realized he was doing it by hand, and I suddenly understood why he wanted to stick me with a rusty shiv every time I came down telling him it blew up again.

Document your scripts

While you're scripting, please, for the love of everything happy and kind on this earth, please document your scripts. You learned the comment character in your language of choice. Use it. You'll thank yourself in a year.

Also, spend an extra ten seconds and think of a descriptive name for your script. Make it really easy to find.

Lastly, naming your variables bob, jim, foo1, foo2, etc., makes for a very sad you eons later when you're trying to decipher what went on.5 You are guaranteed to forget everything about the script you are now writing within a week. I promise.
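As a rough illustration, a documented script with a findable name might look something like the sketch below. The file name, the CSV column names, and the function name are all hypothetical; the only claim is that a header comment, a docstring, and variables not named bob go a long way.

```python
"""weekly_return_rate.py (hypothetical name)

Computes the share of users who came back within a week.
Input: CSV with columns user_id, returned_within_week (0 or 1).
Usage: python weekly_return_rate.py visits.csv
"""
import csv
import sys

def weekly_return_rate(csv_lines):
    """Return the fraction of rows where returned_within_week is 1."""
    users_seen = 0
    users_returned = 0
    for row in csv.DictReader(csv_lines):
        users_seen += 1
        users_returned += int(row["returned_within_week"])
    if users_seen == 0:
        # An empty file is a data problem, not a 0% return rate.
        raise ValueError("no rows in input")
    return users_returned / users_seen

# Only runs when called as a script with exactly one file argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], newline="") as visits_file:
        print(f"{weekly_return_rate(visits_file):.1%}")
```

Because it takes a file path on the command line, running it the dozenth time costs the same as running it the first.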

Always start small

I know it always seems like your code is bug free, and you can run the analysis over the entire dataset, and it doesn't matter if you have to wait a minute or two for results, because, hey, even if there is a bug in your code, there will only be one, and you'll only need to re-run it once, and the Easter Bunny is real and someday a politician will voluntarily balance the budget.

No. I normally hesitate to contradict people in such strong terms, but you are an idiot. Your code has errors, and you're going to need a few cycles in order to iron it all out, and you'll save yourself a lot of wasted time if you just run everything on a small subset, then, once you're sure everything works, run it on the entire dataset.

Even if this is only a difference of 30 seconds, you will still save yourself loads of time, because do you know what happens to your brain when you wait for 30 seconds in order to change something? It shuts off completely and starts singing the theme song to Spongebob Squarepants. You are taxing your faculties trying to maintain everything in your mental memory, so keep the feedback loop as short as possible.
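One way to sketch the small-subset habit in Python, assuming the data fits in a list of rows: `analyze`, `small_subset`, and the fake `full_dataset` below are stand-ins invented for this example. The fixed seed is a design choice, so that every debugging run sees the same rows.

```python
import random

def analyze(rows):
    """Stand-in for the real analysis: the mean of a numeric column."""
    values = [row["value"] for row in rows]
    return sum(values) / len(values)

def small_subset(rows, n=100, seed=0):
    """Fixed-seed random sample, so repeated runs debug against the same rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

# Hypothetical dataset; in practice this would be loaded from disk.
full_dataset = [{"value": i % 10} for i in range(100_000)]

subset = small_subset(full_dataset)
preview = analyze(subset)      # iterate here until the code is right
final = analyze(full_dataset)  # then pay for the full run once
```

The preview number will not match the final one, and that is fine. The subset is there to shake out bugs quickly, not to produce results.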

There are two steps to performing analysis

Step 1: Spec out what you are going to do.
Step 2: Do it.

If you try and merge those steps, and just set off down your road less traveled with your silly hopes and dreams, you're going to run into that tree down the path, and instead of busting out your chainsaw and knocking that sucker out of the way, you're going to alter your course a little bit to the left because, hey, it's still roughly the same direction and it'll kind of get you to the same place, and maybe you'll mumble something about the law of large numbers on your way.

No.

Spec out exactly what you are going to do as a separate and distinct step. This will force you to have the discipline necessary to tackle the roadblocks that fall in your path, instead of sissying out and performing a crummier analysis for it.

You need a data buddy

You need someone to bounce ideas off of, to help you think through big problems, and, in general, be a sounding board. Find someone smart who isn't afraid to tell you you're wrong. You'll thank her or him, and me, later when you produce, as Moe said, "the best damn [analysis] in town!"6

1. And you're taking your sweet time you insouciant child.
2. This term was invented by statisticians to force other disciplines to give us a break whenever our numbers are slightly off.
3. This is my favorite sissying-out phrase, beating out appeals to the margin of error.
4. Though if you are doing "data analysis" by hand in Excel, you are living deep in sin and must needs repent.
5. The phrasing of this sentence is basically stolen from my good friend Chris who was kind enough to look over a draft of this. Thanks. And thanks Jamie for looking it over too. Oh, and Britt, thanks for helping so much on this during finals. You're the best.
6. Due to copyright restrictions, my favorite Simpsons moment ever has been removed from YouTube. But you should watch Homer vs. the 18th Amendment sometime.

Tuesday, May 8, 2012

cleaning schedules

A friend of mine recently posted a cleaning schedule on her blog, with cleaning tasks broken out by frequency: daily, weekly, monthly, and yearly. This reminded me of the cleaning schedule my wife wants us to very loosely follow, with cleaning tasks broken out by misery: really miserable, really miserable, and really miserable.

All of this talk about cleaning schedules is unnecessary, because, like most things in life, I already had the perfect system organized as a bachelor, as follows:

Daily

Don't be messy, you turd.

Weekly

Hire a maid.

Monthly

Hire a maid.

Yearly

Move.1

1. I'm totally kidding about this post honey! I love vacuuming. I'll get right on that tonight.

Tuesday, March 27, 2012

deodorant

Britt: Why do you put deodorant on your chest??
Me: It's perfectly normal! Look (pointing at the illustration on the can), I'm putting it on exactly like this guy!
Britt: Congratulations, you can put deodorant on as good as a functionally literate person.
Me: I'm mumble...functionally...mumble...literate...mumble.
Britt: Oh, it looks like he's using it as a body spray.
Me: (Just now learned about the concept of body spray) Oh...yes...that...was...my...idea.
Britt: Honey, deodorants have actual chemicals that kill odors when you apply it in the right places.
Me: Wait a minute, are you saying I smell bad?
Britt: (awkward pause) ...ask me no questions...1

1. Suffice it to say, my deodorant application methods have since changed.

Monday, March 19, 2012

things to talk about when you don't actually have a topic

Many of you know what it's like: you haven't posted on your blog in three weeks, you got married, went on a honeymoon, and spent the entire last weekend trying to situate your house and you should have accepted the offer of help from your wife when you went to bring the couch up from the truck, but you didn't1 so your entire body feels like you spent last night being beaten senseless by mafioso ducks and you haven't posted in three weeks and you need to come up with a topic or something or at least finish this sentence because nobody can read books this long, let alone sentences this long, and oh my gosh I have to stop this somehow but I can't.

I have but a few thoughts to leave you with this fine evening.

The latest Mission Impossible film is called Mission Impossible - Ghost Protocol.

It is not called Mission Impossible III. Mission Impossible III is a film that came out six years ago, and contains many scenes that you might remember, had you seen the film six years ago. Mission Impossible - Ghost Protocol contains none of these scenes. It does not, in fact, re-use any footage from the previous film. This would be stupid and nonsensical. If you happen to have rented Mission Impossible III on your honeymoon under the mistaken impression that it was the latest Mission Impossible film, you might want to HOLD YOUR TONGUE AND NOT SOUND LIKE A STUPID CRAZYPERSON IN FRONT OF YOUR WIFE when you recognize the scenes, characters, and plot, and suggest that elements have been reused because THAT MAKES NO SENSE AT ALL.2

Check the phone number before you call your home teachee.

In my church, they have a program called home teaching, wherein people check in on each other to see if they need any help. You are assigned individuals, and you typically visit them in their home every month, but, if you're two days away from your wedding and you have no time and you're just trying to survive, you might just call some of your people. However, and this is really important, make sure you have the CORRECT phone number for them, because, and this is purely hypothetical, one hundred percent hypothetical and not at all autobiographical, you might call the phone number you have stored for an individual, and greet them enthusiastically, and ask about their work, life, etc., and realize in the middle of a forced and awkward conversation that you have THE WRONG NUMBER.3

People in my life are awesome.

Not the ladies who I waited to pay for parking for five minutes,4 but everyone else. We've been so surprised by all of the kind words, notes, thoughts, gifts, and everything sent our way. People really have been so sweet and loving and generous, and we'll be thanking you all individually of course,5 but collectively, thank you. My faith in humanity hasn't been restored, of course, because seriously, if my wife is in law school, the least you could do on our first time in Sunday school in the new congregation is not proclaim, "you can tell how wicked a society is by the number of lawyers in it", but my faith in the people in my life continues unwavering.

I am never serious here, but let me break from character to just say that the best thing about my life is the people I know; my wife foremost, naturally,6 but I have the privilege of knowing and associating with the absolute best people on the planet. I never cease to find that astounding.

Thanks. All of you. Thanks.

1. In truth, it was just a big Ikea box and I, as venerable George W. Bush would say, misunderestimated the size of said box by at least fifty percent. She showered, and I slowly pushed and pulled a box the size of a motorcycle up to our apartment. Not my finest moment, intellect wise.
2. And if you don't realize your error until two thirds of the way through the film, may god have mercy on your soul.
3. Whoever it was I called must have been so confused. We literally talked for five minutes with me asking him how he was doing, how work was, offering to help, and he just kept on pretending like he knew who I was and both of us kept that game of chicken up until the end of the phone call. The good news is, he's working two jobs now and is doing all right, and doesn't need any help, and thanks me. Mr. Haynie, the man I intended to call, and who I called afterwards, is doing wonderfully as well, I'm happy to report.
4. They sat and laughed to each other about everything, including their inability to figure out the machine, their inability to use a card to pay, and the fact that the dollar that they tried to put in the credit card slot did not work. When I offered them quarters to get them out of my way, they thanked me and said, and I quote, "You must be from Harvard!" Yes, yes I am. Because the Crimson are so well known for their ability to give quarters to brainless parkers. Also, look at my sweatshirt: it says the University of Utah. I don't understand people at all.
5. Please excuse whatever genericisms I send your way, but have you ever tried to write fifty thank you notes?
6. I'm almost done typing while you're trying to sleep; I'm sorry honey, you are the best and most patient.

Monday, February 27, 2012

the vow

The Vow is a heart-warming tale of one young lover and one young former lover as they discover the joys of making the men in the audience keep down bile and check their watch every ten minutes hoping the pain is over. It teaches us the helpful moral of loving the one you...no, choosing the one you...no, loving to choose to lo...no, actually, there weren’t any morals, which is fine by me because if I wanted to get preached to, I’d move back home.1

If there was a moral it was something along the lines of "next time hire a casting director that realizes that Channing Tatum looks more like He-Man than a hipster". I live within spitting distance of San Francisco,2 and let me assure you that no hipster male has that body, much less has any sort of muscle definition on the whole of his body.3 But4 I know all about his body because the director went to great trouble to make sure we saw his booty after he spent the night naked on the couch, and I know all about involuntary convulsions because I then spent the rest of the movie with my face contorted in supreme disgust trying to decide how exactly I would BURN MY COUCH TO THE GROUND if anyone ever slept naked on it.

Can you think of anything more disgusting? I cannot. Maybe burning the couch was in a deleted scene, but seriously dude, if your wife kicks you out to the couch, maybe the problem isn't amnesia, maybe it's because you need to rethink what sort of horrors you are leaving on your living room furniture before you plop down for the night.5

1. Where I’d be preached to by my little brother, of course, and no, I was not in any way referring to my mother.
2. Though in SF they call it peeing distance, because spitting is so yesterday, and the ENTIRE CITY SMELLS LIKE URINE.
3. I'd be less inclined to insult hipsters if I hadn't spent a flight out of SF next to one who physically pushed me off the arm rest I had the audacity to use for less than three minutes, so, I'm sorry, but I hate you all, and please change out of those ridiculous jeans.
4. This pun is intentional.
5. Also, I'd like to thank Britt for letting me steal from her to construct everything that was funny in this post, and no, she did not agree to this beforehand, but thank you honey, welcome to community property!

Sunday, February 19, 2012

startups

Between embarrassing myself publicly on the internets and planning a wedding which will feature a woman who hopefully doesn't mind getting married to the village idiot, one thing I love to do in my spare time is dream up startup ideas.

Thinking of startup ideas is great, because you get all of the fun and glory1 of coming up with the most revolutionary and innovative solutions that have ever been dreamed up in the history of the human condition, with none of the actual effort.

Lately I've come up with the most revolutionary and innovative email experience in the history of mankind. It will work like this: imagine threaded mail conversations, sort of like gmail. Next, imagine chatting tightly integrated, sort of like gmail. Next, imagine keyboard shortcuts, and excellent search, sort of like gmail.

Okay, okay, just imagine gmail. Except, better.2 Because instead of ads at the top of the screen which stay still, think about ads at the top of the screen that move. You see where I'm going with this? That's right, an ad ticker! Magic. Innovative. Revolutionary. Steve Jobs award, here I come.

Also, I'd probably add in features like backgrounds that include sharks, people getting eaten by sharks, maybe spiders on your face, and other gross things, all to discourage me from tabbing back to my mail client and checking my email every thirty seconds like a squirrel hopped up on speed.

My other great idea is to start a site that lets people post funny pictures of themselves and add "funny" commentary, and possibly even include features like letting people post funny pictures of things that aren't themselves, and maybe even, if you ask me really nicely, including the feature wherein everyone on the entire planet can spend the past week beating a horse dead posting and re-posting pictures that tell me what society and your mother thinks you do and make me want to punch myself in the face every time I log in to that site and wonder why in the world I keep coming back and getting sucked in to looking at those things.

I kid, I kid. I don't wonder why I keep coming back. It's because the background doesn't include spiders on my face.

1. Where glory is defined as me spending the entire day thinking I'm brilliant for coming up with an idea that features shipping people mixed dry ingredients that they can buy at stores themselves for a fraction of the cost, taking into account the lack of shipping expense for relatively low-value items, and fun is defined as me getting laughed at to this day by my co-workers to whom I ran and told excitedly of said ridiculousness.
2. It wouldn't, for instance, show you those stupid yellow arrows which try and tell you which email conversations are important and consistently get it wrong: email from my niece including cat pictures? Not important. Bill from my credit card company? REALLY IMPORTANT.

Monday, February 13, 2012

things you should never, under any circumstances, pray for

While you should pray for many things, allow me to urge you to never pray for the following:

To drive less

This is a terrible idea. Let me assure you that you are better off living your life driving, or you might, to imagine up an entirely hypothetical situation with no real-world parallels, be tempted to wish to be able to drive less in your life, and one day return to your truck, and start said truck, and attempt to put said truck into gear, only to find the truck will not go into gear. You might try several different ways of getting the truck into gear. You might mildly curse in the language of your choice.

You might spend the following days begging for rides and walking all over the peninsula in the rain.1

These are just a few of the things that might happen to you, should you wish to drive less.

Patience

Again, a terrible idea. Have you ever been taught patience? Let me assure you that you are better off living your life without praying for it, or you might, to again imagine up an entirely hypothetical situation with no real-world parallels, find yourself spending two hours on a hard chair inside Whole Foods, slurping their free wifi, waiting for a tow truck to show up, and trying your best not to facially punch the hipster who keeps jolting you every time he walks up to get yet another glass of water from the jug next to you.2

Humility

Even more so than all of the previous, and more than any other quality that has ever existed, you should NEVER DESIRE THIS TRAIT EVER EVER EVER. Don't pray for it, don't wish you had it, DON'T EVEN THINK ABOUT WANTING IT.

Have you ever been taught humility? You're better off without it, or you might, to dream up a completely ridiculous hypothetical that is so outlandish that I laugh at your insinuation that this might have ever happened to anyone you know, and one that you should never ever think has anything to do with my own life in any way, and one that you SHOULD NEVER MENTION TO ME EVER UNDER PAIN OF DEATH,3 you might be unable to get your truck into gear, you might have it towed, it might be a horrendous inconvenience in your pitiful life, and you might run yourself ragged trying to get it fixed.

And when you call the mechanic that fateful day to check on the progress, well, he might laugh at you. And laugh. And ask you with sincere confusion why you placed the car in neutral with the four-wheel drive lever.4

In unrelated news, I'm selling my truck, because I'm too stupid to own it.5

Also, Brittney, I'm sorry you're marrying an idiot. I can be smrt sometimes, I promise.

1. That would be two hours of walking yesterday, if the hypothetical hiker were counting in this hypothetical situation.
2. I swear he had ten cups of water. What is wrong with people in this world?
3. I am one hundred percent serious. I realize the hypocrisy of blogging this, but consider it exposure therapy.
4. This is the most embarrassed I think I have been in my adult life. I cannot express in words the dread that filled my soul as I went to pick up the car from the mechanic. I inadvertently put my truck in neutral. I then had it towed like an idiot. And then I had to face down a mechanic who, with good reason, thought I was the stupidest person on the planet. The only consolation is there is a very slight, minuscule chance that he actually fixed the car and then made up this story to make me look retarded. The chance of that being true is somewhere around the probability of a meteor crushing me at this moment and sparing me from my severe humiliation, but, I would like to just make sure you all know it still is a very real possibility.
5. My only real worry about putting this story in the public domain is being fired from my job for gross incompetence, and never being able to find employment again due to obvious abject stupidity.