DEATH NOTE: L, ANONYMITY & ELUDING ENTROPY

In the manga Death Note, the protagonist Light Yagami is given the supernatural weapon “Death Note”, which can kill anyone on demand, and begins using it to reshape the world. The genius detective L attempts to track him down with analysis and trickery, and ultimately succeeds. Death Note is almost a thought-experiment: given the perfect murder weapon, how can you screw up anyway? I consider the various steps of L’s process from the perspective of computer security, cryptography, and information theory, to quantify Light’s initial anonymity and how L gradually de-anonymizes him, and consider which mistake was the largest, as follows:

  1. Light’s fun­da­men­tal mis­take is to kill in ways un­re­lated to his goal.

    Killing through heart attacks does not just make him visible early on; the deaths also reveal that his assassination method is impossibly precise and that something profoundly anomalous is going on. L has been tipped off that Kira exists. Whatever the bogus justification may be, this is a major victory for his opponents. (To deter criminals and villains, it is not necessary for there to be a globally-known single anomalous or supernatural killer, when it would be equally effective to arrange for all the killings to be done naturalistically by ordinary mechanisms such as third parties/police/judiciary or used indirectly as parallel construction to crack cases.)

  2. Worse, the deaths are non-ran­dom in other ways—they tend to oc­cur at par­tic­u­lar times!

    Just the scheduling of deaths cost Light 6 bits of anonymity.

  3. Light’s third mistake was reacting to the blatant provocation of Lind L. Tailor.

    Taking the bait let L narrow his target down to 1⁄3 the original Japanese population, for a gain of ~1.6 bits.

  4. Light’s fourth mistake was to use confidential police information stolen using his policeman father’s credentials.

    This mistake was the largest in bits lost. It cost him 11 bits of anonymity; in other words, it cost him almost twice what his scheduling cost him and almost 7 times the murder of Tailor!

  5. Killing Ray Penbar and the FBI team.

    If we assume Penbar was tasked 200 leads out of the 10,000, then murdering him and the fiancee cost Light just 6 bits, or a little over half the fourth mistake and comparable to the original scheduling mistake.

  6. Endgame: At this point in the plot, L resorts to direct measures and enters Light’s life directly, enrolling at the university, with Light unable to perfectly play the role of innocent under intense in-person surveillance.

    From that point on, Light is screwed as he is now playing a deadly game of “Mafia” with L & the investigative team. He frittered away >25 bits of anonymity and then L intuited the rest and suspected him all along.

Fi­nal­ly, I sug­gest how Light could have most effec­tively em­ployed the Death Note and lim­ited his loss of anonymi­ty. In an ap­pen­dix, I dis­cuss the max­i­mum amount of in­for­ma­tion leak­age pos­si­ble from us­ing a Death Note as a com­mu­ni­ca­tion de­vice.

(Note: This essay assumes a familiarity with the early plot of Death Note. If you are unfamiliar with DN, see my essay, consult a plot summary, or read the DN rules.)

DETECTIVE STORIES AS OPTIMIZATION PROBLEMS

In Light’s case, L starts with the world’s en­tire pop­u­la­tion of 7 bil­lion peo­ple and needs to nar­row it down to 1 per­son. It’s a search prob­lem. It maps fairly di­rectly onto ba­sic in­for­ma­tion the­ory⁠, in fact. (See also Sim­u­la­tion in­fer­ences⁠, The 3 Grenades⁠, and for case stud­ies in ap­plied deanonymiza­tion, Tor DNM-related ar­rests, 2011–2015⁠.) To uniquely spec­ify one item out of 7 bil­lion, you need 33 bits of in­for­ma­tion be­cause $\log_2(7000000000) \approx 32.7$; to use an anal­o­gy, your 32-bit com­puter can only ad­dress one unique lo­ca­tion in mem­ory out of 4 bil­lion lo­ca­tions, and adding an­other bit dou­bles the ca­pac­ity to >8 bil­lion. Is 33 bits of in­for­ma­tion a lot?

Not really. L could get one bit just by looking at history or crime statistics, and noting that mass murderers are, to an astonishing degree, male1, thereby ruling out half the world population and actually starting L off with a requirement to obtain only 32 bits to break Light’s anonymity.2 If Death Note users were sufficiently rational & knowledgeable, they could draw on concepts like superrationality to acausally cooperate3 to avoid this information leakage… by arranging to pass on Death Notes to females4 to restore a 50:50 gender ratio. For example, if for every female who obtained a Death Note there were 3 males with Death Notes, then each male user could roll a 3-sided die (1d3) and, on a 1, pass his Death Note on to a female, keeping it on a 2 or 3; on average that leaves 2 male and 2 female users out of every 4.
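The arithmetic so far is easy to check. A minimal sketch (the helper name `bits_to_identify` is my own, purely for illustration):

```python
import math

def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one member of a population."""
    return math.log2(population)

world = 7_000_000_000
print(round(bits_to_identify(world), 1))    # 32.7
print(math.ceil(bits_to_identify(world)))   # 33 whole bits
print(2**32 < world <= 2**33)               # True: 32 bits are too few, 33 suffice

# Ruling out half the candidates (e.g. "the killer is male") is worth 1 bit.
halved = world // 2
print(round(bits_to_identify(world) - bits_to_identify(halved), 6))  # 1.0
```

Any fact that halves the candidate pool is worth exactly one bit, whatever the pool's size; that is why a logarithmic scale is convenient here.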

We should first point out that Light is al­ways go­ing to leak some bits. The only way he could re­main per­fectly hid­den is to not use the Death Note at all. If you change the world in even the slight­est way, then you have leaked in­for­ma­tion about your­self in prin­ci­ple. Every­thing is con­nected in some sense; you can­not mag­i­cally wave away the ex­is­tence of fire with­out cre­at­ing a cas­cade of con­se­quences that re­sult in every liv­ing thing dy­ing⁠. For ex­am­ple, the fun­da­men­tal point of Light ex­e­cut­ing crim­i­nals is to shorten their lifes­pan—there’s no way to hide that. You can’t both shorten their lives and not shorten their lives. He is go­ing to re­veal him­self this way, at the least, to the ac­tu­ar­ies and sta­tis­ti­cians.

More historically, this has been a challenge for cryptographers, like in WWII: how did they exploit the Enigma & other communications without revealing they had done so? Their solution was misdirection: constantly arranging for plausible alternatives, like search planes that ‘just happened’ to find German submarines or leaks to controlled known German agents about there being undiscovered spies. (However, the famous story that Winston Churchill allowed the town of Coventry to be bombed rather than risk the secret of Ultra has since been put into question.) This worked in part because of German overconfidence, in part because the war did not last too long, and in part because each cover story was plausible on its own and no one was, in the chaos of war, able to see the whole picture and realize that there were too many lucky search planes and too many undiscoverable moles; eventually, however, someone would realize, and apparently some Germans did conclude that Enigma had to have been broken (but much too late). It’s not clear to me what would be the best misdirection for Light to mask his normal killings: perhaps use the Death Note’s control features to invent an anti-criminal terrorist organization?

So there is a real chal­lenge here: one party is try­ing to in­fer as much as pos­si­ble from ob­served effects, and the other is try­ing to min­i­mize how much the for­mer can ob­serve while not stop­ping en­tire­ly. How well does Light bal­ance the com­pet­ing de­mands?

MISTAKE 1

Light cannot avoid leaking some bits, but he can try to reduce the leakage and make his anonymity set as large as possible. For example, killing every criminal with a heart attack is a dead give-away. Criminals do not die of heart attacks that often. (The point is more dramatic if you replace ‘heart attack’ with ‘lupus’; as we all know, in real life it’s never lupus.) Heart attacks are a subset of all deaths, and by restricting himself, Light makes it easier to detect his activities. 1000 deaths from lupus are a blaring red alarm; 1000 deaths from heart attacks are an oddity; and 1000 deaths distributed over the statistically likely suspects of cancer and heart disease etc. are almost invisible (but still noticeable in principle).

So, Light’s fundamental mistake is to kill in ways unrelated to his goal. Killing through heart attacks does not just make him visible early on; the deaths also reveal that his assassination method is supernaturally precise. L has been tipped off that Kira exists. Whatever the bogus justification may be, this is a major victory for his opponents.

First mis­take, and a clas­sic one of se­r­ial killers (eg the BTK killer’s vaunt­ing was less anony­mous than he be­lieved): delu­sions of grandeur and the de­sire to taunt, play with, and con­trol their vic­tims and demon­strate their power over the gen­eral pop­u­la­tion. From a lit­er­ary per­spec­tive, this sim­i­lar­ity is clearly not an ac­ci­dent, as we are meant to read Light as the So­ciopath Hero ar­che­type: his ul­ti­mate down­fall is the con­se­quence of his fa­tal per­son­al­ity flaw⁠, hubris⁠, par­tic­u­larly in the orig­i­nal sadis­tic sense. Light can­not help but self­-s­ab­o­tage like this.

(This is also deeply problematic from the point of view of carrying out Light’s theory of deterrence: to deter criminals and villains, it is not necessary for there to be a globally-known single supernatural killer, when it would be equally effective to arrange for all the killings to be done naturalistically by third parties/police/judiciary or used indirectly to crack cases. Arguably the deterrence would be more effective the more diffused it’s believed to be, since a single killer has a finite lifespan, finite knowledge, fallibility, and idiosyncratic preferences which reduce the threat and connection to criminality, while if all the deaths were ascribed to unusually effective police or detectives, this would be inferred as a general increase in all kinds of police competence, one which will not instantly disappear when one person gets bored or hit by a bus.)

MISTAKE 2

Worse, the deaths are non-ran­dom in other ways—they tend to oc­cur at par­tic­u­lar times! Graphed, daily pat­terns jump out.

L was able to nar­row down the ac­tive times of the pre­sum­able stu­dent or worker to a par­tic­u­lar range of lon­gi­tude, say 125–150° out of 180°; and what coun­try is most promi­nent in that range? Japan. So that cut down the 7 bil­lion peo­ple to around 0.128 bil­lion; 0.128 bil­lion re­quires 27 bits ($\log_2(128000000) \approx 26.93$) so just the sched­ul­ing of deaths cost Light 6 bits of anonymi­ty!
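Spelled out with the unrounded figures (a worked check of the numbers above, nothing more), the timing pattern is worth a little under 6 bits; the round “6” comes from comparing 33 with 27:

$$\log_2\left(\frac{7 \times 10^9}{1.28 \times 10^8}\right) = \log_2(7 \times 10^9) - \log_2(1.28 \times 10^8) \approx 32.70 - 26.93 = 5.77 \text{ bits}$$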

DE-ANONYMIZATION

On a side-note, some might be skeptical that one can infer much of anything from the graph and that Death Note was just glossing over this part. “How can anyone infer that it was someone living in Japan just from 2 clumpy lines at morning and evening in Japan?” But actually, such a graph is surprisingly precise. I learned this years before I watched Death Note, when I was heavily active on Wikipedia; often I would wonder if two editors were the same person or roughly where an editor lived. If their edits or user page did not reveal anything useful, I would go to “Kate’s edit counter” and examine the times of day at which all their hundreds or thousands of edits were made. Typically, one would see ~4 hours with no edits whatsoever, then ~4 hours with moderate to high activity, a trough, then another gradual rise up to 8 hours later, and a further decline back down to the 4 hours of no activity. These periods quite clearly corresponded to sleep (pretty much everyone is asleep at 4 AM), morning, lunch & work hours, evening, and then night with people occasionally staying up late and editing5. There was noise, of course, from people staying up especially late or getting in a bunch of editing during their workday or occasionally traveling, but the overall patterns were clear: never did I discover that someone was actually a nightwatchman and my guess was an entire hemisphere off. (Academic estimates based on user editing patterns correlate well with what is predicted on the basis of the geography of IP edits.6)
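To make the edit-time trick concrete, here is a rough sketch of how one might guess a time zone from activity timestamps. Everything in it is my own illustration: the function name, the choice of a 4-hour quiet window, and the assumption that the quiet window starts around 2 AM local time are not taken from any real tool.

```python
from collections import Counter

def estimate_utc_offset(utc_hours, quiet_len=4, assumed_local_quiet_start=2):
    """Guess a UTC offset from the hours (0-23, in UTC) at which activity occurred.

    Assumption (for illustration only): the quietest `quiet_len`-hour stretch
    is sleep, and it begins around `assumed_local_quiet_start` o'clock local time.
    """
    counts = Counter(h % 24 for h in utc_hours)

    def window_total(start):
        # Activity inside the circular window [start, start + quiet_len).
        return sum(counts[(start + i) % 24] for i in range(quiet_len))

    # Ties are broken by taking the earliest UTC hour; real data would need more care.
    quiet_start_utc = min(range(24), key=window_total)
    offset = (assumed_local_quiet_start - quiet_start_utc) % 24
    return offset - 24 if offset > 12 else offset

# Synthetic example: someone in Japan (UTC+9), active at every local hour
# except 02:00-05:59.
local_hours = [h for h in range(24) if h not in (2, 3, 4, 5)] * 10
utc_hours = [(h - 9) % 24 for h in local_hours]
print(estimate_utc_offset(utc_hours))  # 9
```

Real edit histories are noisier than this synthetic one, so the estimate should be read as "roughly this band of longitude", which is all L needed.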

Com­puter se­cu­rity re­search offers more scary re­sults. Per­haps be­cause “every­thing is cor­re­lated”⁠, there are an amaz­ing num­ber of ways to break some­one’s pri­vacy and de-anonymize them (back­ground⁠; there is also fi­nan­cial in­cen­tive to do so in or­der to ad­ver­tise & price dis­crim­i­nate):

  1. small er­rors in their com­put­er’s clock’s time (even over Tor)

  2. Web browsing history7 or just the browser version and plugins8; and this is when random Firefox or Google Docs or Facebook bugs don’t leak your identity

  3. based on how slow pages load9 (how many there are; tim­ing at­tacks can also be used to learn web­site user­names or # of pri­vate pho­tos)

  4. Knowledge of what ‘groups’ a person was in could uniquely identify 42%10 of people on one social networking site, and possibly Facebook & 6 others

  5. Similarly, knowing a few of the movies someone has watched11, popular or obscure, often grants access to the rest of their profile if it was included in the released Netflix data. (This was more dramatic than the AOL search data release because AOL searches had a great deal of personal information embedded in the search queries, but in contrast, the Netflix data seems impossibly impoverished: there’s nothing obviously identifying about what anime one has watched unless one watches obscure ones.)

  6. The researchers went on to find isomorphisms between arbitrary graphs12 (such as social networks stripped of any and all data except for the graph structure), and give many examples of public datasets that could be de-anonymized13, such as your Amazon purchases (Calandrino et al 2011; blog). These attacks are on just the data that is left after attempts to anonymize data; they don’t exploit the observation that the choice of what data to remove is as interesting as what is left, which has been called “The Redactor’s Dilemma”.

  7. User­names hardly bear dis­cussing

  8. Your hospital records can be de-anonymized just by looking at public voting rolls.14 That researcher later went on to run “experiments on the identifiability of de-identified survey data [cite], pharmacy data [cite], clinical trial data [cite], criminal data [State of Delaware v. Gannett Publishing], DNA [cite, cite, cite], tax data, public health registries [cite (sealed by court), etc.], web logs, and partial Social Security numbers [cite].” (Whew.)

  9. Your typing is surprisingly unique, and the sounds of typing and arm movements can identify you or be used to snoop on input

  10. Knowing your morning commute as loosely as to the individual blocks (or less granular) uniquely identifies you (Golle & Partridge 2009); knowing your commute to the zip code/census tract uniquely identifies 5% of people

  11. Your hand­writ­ing is fairly unique, sure—but so is how you fill in bub­bles on tests15

  12. Speak­ing of hand­writ­ing, your writ­ing style can be pretty unique too

  13. the unnoticeable background electrical hum may uniquely date audio recordings. Unnoticeable sounds can also be used to persistently track devices/people, exfiltrate information across air gaps, and monitor room presence/activity or even tapping noises

  14. you may have heard of laser microphones for eavesdropping… but what about eavesdropping via video recording of potato chip bags or candy wrappers (press release), or cellphone gyroscopes? Lasers are good for detecting your heartbeat as well, which is (of course) uniquely identifying.

  15. steering & driving patterns are sufficiently unique as to allow identification of drivers from as little as 1 turn in some cases. These attacks also work on smartphones for time zone, barometric pressure, public transportation timing, IP address, & pattern of connecting to WiFi or cellular networks (Mosenia et al 2017)

  16. smart­phones can be IDed by the pat­tern of pixel noise, due to sen­sor noise such as small im­per­fec­tions in the CCD sen­sors and lenses (and Face­book has even patented this)

  17. smartphone usage patterns, such as app preferences, app switching rates, consistency of commute patterns, overall geographic mobility, and slower or less driving, have been correlated with Alzheimer’s disease (Kourtis et al 2019) and personality.16

    Eye track­ing is ⁠.

  18. voices cor­re­late with not just age/­gen­der/eth­nic­i­ty, but… ?

(The only sur­pris­ing thing about DNA-related pri­vacy breaks is how long they have taken to show up.)

To sum­ma­rize: differ­en­tial pri­vacy is al­most im­pos­si­ble17 and pri­vacy is dead18⁠. (See also “Bro­ken Promises of Pri­va­cy: Re­spond­ing to the Sur­pris­ing Fail­ure of Anonymiza­tion”⁠.)

MISTAKE 3

Light’s third mistake was reacting to the canary-trap provocation of the Lind L. Tailor broadcast criticizing Kira: Light lashed out, using the clearly-visible name & face to kill Lind L. Tailor. The live broadcast was a blatant attempt to provoke a reaction (any reaction) from a surprised & unprepared Light, and that alone should have been sufficient reason to simply ignore it (even if Light could not have reasonably known exactly how it was a trap): one should never do what an enemy wants one to do on ground & terms & timing prepared by the enemy. (Light had the option to use the Death Note at any time in the future, and that would have been almost as good a demonstration of his power as doing so during a live broadcast.)

Running the broadcast in 1 region was also a gamble & a potential mistake on L’s part; he had no real reason to think Light was in Kanto (or if he did already have priors/information to that effect, he should’ve been bisecting Kanto) and should have arranged for it to be broadcast to exactly half of Japan’s population, which maximizes the expected gain at 1 bit. But the gamble paid off; he narrowed his target down to 1⁄3 the original Japanese population, for a gain of ~1.6 bits. (You can see it was a gamble by considering if Light had been outside Kanto; since he would not see it live, he would not have reacted, and all L would learn is that his suspect was in that other 2⁄3 of the population, for a gain of only ~0.6 bits.)
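The gamble can be quantified with the binary entropy function. This is a sketch under simplifying assumptions of my own: L's prior is uniform over Japan's population, Kanto holds exactly 1⁄3 of it, and Kira reacts if and only if he sees the broadcast. The function name `expected_bits` is illustrative.

```python
import math

def expected_bits(p: float) -> float:
    """Expected information (bits) from a yes/no test that says 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

kanto = 1 / 3  # the essay's round figure for Kanto's share of Japan
print(round(math.log2(1 / kanto), 2))        # 1.58 bits if Kira reacts (he is in Kanto)
print(round(math.log2(1 / (1 - kanto)), 2))  # 0.58 bits if he does not
print(round(expected_bits(kanto), 2))        # 0.92 bits on average
print(round(expected_bits(0.5), 2))          # 1.0 bits: an even split is the best case
```

So the one-region broadcast gave up only a small amount of expected information compared with a perfect bisection, in exchange for a chance at the larger payoff.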

But even this wasn’t a huge mistake. He lost 6 bits to his schedule of killing, and lost another 1.6 bits to temperamentally killing Lind L. Tailor, but since the male population of Kanto is 21.5 million (43 million total), he still has ~24 bits of anonymity left ($\log_2(21500000) \approx 24.36$). That’s not too terrible, and the loss is mitigated even further by other details of this mistake, as pointed out by Zmflavius; specifically, that unlike “being male” or “being Japanese”, the information about being in Kanto is subject to decay, since people move around all the time for all sorts of reasons:

…quite pos­si­bly Light’s biggest mis­take was in­ad­ver­tently re­veal­ing his con­nec­tion to the po­lice hi­er­ar­chy by hack­ing his dad’s com­put­er. Whereas even the Lind L. Tay­lor de­ba­cle only re­vealed his killing me­chan­ics and nar­rowed him down to “some­one in the Kanto re­gion” (which is, while an im­pres­sive ac­com­plish­ment based on the in­for­ma­tion he had, en­tirely mean­ing­less for ac­tu­ally find­ing a sus­pec­t), there were per­haps a few hun­dred peo­ple who had ac­cess to the in­for­ma­tion Light’s dad had. There’s also the fact that L knew that Light was prob­a­bly some­one in their late teens, mean­ing that there was an ex­tremely high chance that at the end of the school year, even that coup of his would ex­pire, thanks to stu­dents head­ing off to uni­ver­sity all over Japan (of course, Light went to ⁠, and a stu­dent of his cal­iber not at­tend­ing such a uni­ver­sity would be sus­pi­cious, but L had no way of know­ing that then). I mean, per­haps L had hoped that Kira would re­veal him­self by sud­denly mov­ing away from the Kanto re­gion, but come the next May, he would have no way of mon­i­tor­ing un­usual move­ments among late teenagers, be­cause a large per­cent­age of them would be mov­ing for le­git­i­mate rea­sons.

(One could still run the in­fer­ence “back­wards” on any par­tic­u­lar per­son to ver­ify they were in Kanto in the right time pe­ri­od, but as time pass­es, it be­comes less pos­si­ble to run the in­fer­ence “for­wards” and only ex­am­ine peo­ple in Kan­to.)

This mistake also shows us that the important thing that information theory buys us, really, is not the bit (we could be using log10 rather than log2, and comparing “dits” rather than “bits”) so much as comparing events in the plot on a logarithmic scale. If we simply looked at the absolute number of people ruled out at each step, we’d conclude that the scheduling mistake by Light was a debacle without compare since it let L rule out >6 billion people, approximately 60× more people than all the other mistakes put together would let L rule out. Mistakes are relative to each other, not absolutes.

MISTAKE 4

Light’s fourth mis­take was to use con­fi­den­tial po­lice in­for­ma­tion stolen us­ing his po­lice­man fa­ther’s cre­den­tials. This was un­nec­es­sary as there are count­less crim­i­nals he could still ex­e­cute us­ing pub­lic in­for­ma­tion (face+­name is not typ­i­cally diffi­cult to get), and if for some rea­son he needed a spe­cific crim­i­nal, he could ei­ther re­strict use of se­cret in­for­ma­tion to a few high­-pri­or­ity vic­tim­s—if only to avoid sus­pi­cions of hack­ing & sub­se­quent se­cu­rity up­grades cost­ing him ac­cess!—or man­u­fac­ture, us­ing the Death Note’s co­er­cive pow­ers or Ki­ra’s pub­lic sup­port, a way to re­lease in­for­ma­tion such as a ‘leak’ or pass­ing pub­lic trans­parency laws.

This mis­take was the largest in bits lost. But in­ter­est­ing­ly, many or even most Death Note fans do not seem to re­gard this as his largest mis­take, in­stead point­ing to his killing Lind L. Tai­lor or per­haps re­ly­ing too much on Mika­mi. The in­for­ma­tion the­o­ret­i­cal per­spec­tive strongly dis­agrees, and lets us quan­tify how large this mis­take was.

When he acts on the secret police information, he instantly cuts down his possible identity to one out of a few thousand people connected to the police. Let’s be generous and say 10,000. It takes 14 bits to specify 1 person out of 10,000 ($\log_2(10000) \approx 13.29$), as compared to the 24–25 bits to specify a Kanto dweller.

This mistake cost him 11 bits of anonymity; in other words, this mistake cost him almost twice what his scheduling cost him and almost 7 times the murder of Tailor!

MISTAKE 5

In comparison, the fifth mistake, murdering Ray Penbar and his fiancee and thereby focusing L’s suspicion on Penbar’s assigned targets, was positively cheap. If we assume Penbar was tasked 200 leads out of the 10,000, then murdering him and the fiancee dropped Light from 14 bits to 8 bits ($\log_2(200) \approx 7.64$), a loss of just 6 bits, or a little over half the fourth mistake and comparable to the original scheduling mistake.
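Putting the mistakes side by side, a small ledger using the essay's own round numbers shows how the candidate pool shrinks. The step labels and the helper `bits_lost` are mine; the “male” halving is applied after Kanto purely for bookkeeping, which is harmless because bits add up regardless of order.

```python
import math

def bits_lost(before: int, after: int) -> float:
    """Bits of anonymity lost when the candidate pool shrinks from `before` to `after`."""
    return math.log2(before / after)

# Candidate-pool sizes after each step (round figures from the text).
ledger = [
    ("world population", 7_000_000_000),
    ("Japan, from timing of deaths", 128_000_000),
    ("Kanto, from the Tailor broadcast", 43_000_000),
    ("male", 21_500_000),
    ("connected to the police", 10_000),
    ("Penbar's assigned leads", 200),
]

previous = None
for label, pool in ledger:
    remaining = math.log2(pool)
    lost = "" if previous is None else f"  (lost {bits_lost(previous, pool):.1f})"
    print(f"{label:34s} {remaining:5.1f} bits left{lost}")
    previous = pool
```

Unrounded, the police-information step comes to about 11.1 bits and the Penbar step to about 5.6 bits, consistent with the rounded figures of 11 and 6 used in the text.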

ENDGAME

At this point in the plot, L re­sorts to di­rect mea­sures and en­ters Light’s life di­rect­ly, en­rolling at the uni­ver­si­ty. From this point on, Light is screwed as he is now play­ing a deadly game of Mafia with L & the in­ves­tiga­tive team. He frit­tered away >25 bits of anonymity and then L in­tu­ited the rest and sus­pected him all along. (We could jus­tify L skip­ping over the re­main­ing 8 bits by point­ing out that L can an­a­lyze the deaths and in­fer psy­cho­log­i­cal char­ac­ter­is­tics like ar­ro­gance, puz­zle-solv­ing, and great in­tel­li­gence, which com­bined with heuris­ti­cally search­ing the re­main­ing can­di­dates, could lead him to zero in on Light.)

From the the­o­ret­i­cal point of view, the game was over at that point. The chal­lenge for L then be­came prov­ing it to L’s sat­is­fac­tion un­der his self­-im­posed moral con­straints.19

SECURITY IS HARD (LET’S GO SHOPPING)

What should Light have done? That’s easy to an­swer, but tricky to im­ple­ment.

One could try to man­u­fac­ture disin­for­ma­tion. Ter­ence Tao re­hearses many of the above points about in­for­ma­tion the­ory & anonymi­ty, and goes on to loosely dis­cuss the pos­si­ble ben­e­fits of fak­ing in­for­ma­tion:

…one additional way to gain more anonymity is through deliberate disinformation. For instance, suppose that one reveals 100 independent bits of information about oneself. Ordinarily, this would cost 100 bits of anonymity (assuming that each bit was a priori equally likely to be true or false), by cutting the number of possibilities down by a factor of $2^{100}$; but if 5 of these 100 bits (chosen randomly and not revealed in advance) are deliberately falsified, then the number of possibilities increases again by a factor of (100 choose 5) ~ $2^{26}$, recovering about 26 bits of anonymity. In practice one gains even more anonymity than this, because to dispel the disinformation one needs to solve a problem, which can be notoriously intractable computationally, although this additional protection may dissipate with time as algorithms improve (e.g. by incorporating ideas from ).
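Tao's figure is easy to verify. A minimal check (variable names are mine):

```python
import math

revealed_bits = 100  # independent bits disclosed
falsified = 5        # how many of them are secretly false

# An observer must consider every way of choosing which 5 of the 100 bits are lies.
ways = math.comb(revealed_bits, falsified)
print(ways)                       # 75287520
print(round(math.log2(ways), 1))  # 26.2 bits of anonymity recovered
```

Note the asymmetry: falsifying only 5% of the disclosed bits wins back roughly a quarter of the anonymity that the disclosure cost.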

RANDOMIZING

The diffi­culty with sug­gest­ing that Light should—or could—have used dis­in­for­ma­tion on the tim­ing of deaths is that we are, in effect, en­gag­ing in a sort of hind­sight bias⁠. How ex­actly is Light or any­one sup­posed to know that L could de­duce his time­zone from his killings? I men­tioned an ex­am­ple of us­ing Wikipedia ed­its to lo­cal­ize ed­i­tors, but that tech­nique was unique to me among WP ed­i­tors20 and no doubt there are many other forms of in­for­ma­tion leak­age I have never heard of de­spite com­pil­ing a list; if I were Light, even if I re­mem­bered my Wikipedia tech­nique, I might not bother evenly dis­trib­ut­ing my killing over the clock or adopt­ing a de­cep­tive pat­tern (eg sug­gest­ing I was in Eu­rope rather than Japan). If Light had known he was leak­ing tim­ing in­for­ma­tion but did­n’t know that some­one out there was clever enough to use it (a “known un­known”), then we might blame him; but how is Light sup­posed to know these “un­known un­knowns”?

Ran­dom­iza­tion is the an­swer. Ran­dom­iza­tion and en­cryp­tion scram­ble the cor­re­la­tions be­tween in­put and out­put, and they would serve as well in Death Note as they do in cryp­tog­ra­phy & sta­tis­tics in the real world, at the cost of some effi­cien­cy. The point of ran­dom­iza­tion, both in cryp­tog­ra­phy and in sta­tis­ti­cal ex­per­i­ments, is to not just pre­vent the leaked in­for­ma­tion or con­founders (re­spec­tive­ly) you do know about but also the ones you do not yet know about.

To steal & para­phrase an ex­am­ple from Jim Manz­i’s Un­con­trolled: you’re run­ning a weight-loss ex­per­i­ment. You know that the effec­tive­ness might vary with each sub­jec­t’s pre-ex­ist­ing weight, but you don’t be­lieve in ran­dom­iza­tion (y­ou’re a prac­ti­cal man! only prissy sta­tis­ti­cians worry about ran­dom­iza­tion!); so you split the sub­jects by weight, and for con­ve­nience you al­lo­cate them by when they show up to your ex­per­i­men­t—in the end, there are ex­actly 10 ex­per­i­men­tal sub­jects over 150 pounds and 10 con­trols over 150 pounds, and so on and so forth. Un­for­tu­nate­ly, it turns out that un­be­knownst to you, a ge­netic vari­ant con­trols weight gain and a whole ex­tended fam­ily showed up at your ex­per­i­ment early on and they all got al­lo­cated to ‘ex­per­i­men­tal’ and none of them to ‘con­trol’ (s­ince you did­n’t need to ran­dom­ize, right? you were mak­ing sure the groups were matched on weight!). Your ex­per­i­ment is now bo­gus and mis­lead­ing. Of course, you could run a sec­ond ex­per­i­ment where you make sure the ex­per­i­men­tal and con­trol groups are matched on weight and also now matched on that ge­netic vari­ant… but now there’s the po­ten­tial for some third con­founder to hit you. If only you had used ran­dom­iza­tion—then you would prob­a­bly have put some of the vari­ants into the other group as well and your re­sults would­n’t’ve been bo­gus!
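The failure mode in that story can be simulated in a few lines. This is a toy sketch with made-up subjects: I assume 20 subjects of whom the first 6 to arrive are the extended family, and I model “allocate by arrival” as “first half experimental, second half control”.

```python
import random

def allocate_by_arrival(subjects):
    """First half of arrivals -> experimental, second half -> control."""
    half = len(subjects) // 2
    return subjects[:half], subjects[half:]

def allocate_randomly(subjects, rng):
    """Shuffle a copy, then split: every subject has the same chance of either group."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def carriers(group):
    """Count members of the family carrying the unmeasured genetic variant."""
    return sum(1 for kind, _ in group if kind == "family")

# Hypothetical subjects in arrival order; the family shows up first.
subjects = [("family", i) for i in range(6)] + [("other", i) for i in range(14)]

exp, ctl = allocate_by_arrival(subjects)
print(carriers(exp), carriers(ctl))  # 6 0: the confounder lines up perfectly with treatment

rng = random.Random(0)  # seeded only so the sketch is repeatable
exp, ctl = allocate_randomly(subjects, rng)
print(carriers(exp), carriers(ctl))  # usually a mix of carriers in both groups
```

Randomization does not guarantee balance in any single run, but it breaks the systematic link between arrival order and treatment, including for confounders nobody thought to measure.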

So to deal with Light’s scheduling mistake, simply scheduling every death on the hour will not work because the wake-sleep cycle is still present. If he set up a list and wrote down n criminals for each hour to eliminate the peak-troughs rather than randomizing, could that still go wrong? Maybe: we don’t know what information might be left in the data which an L or Turing could decipher. I can speculate about one possibility: the allocation of each kind of criminal to each hour. If one were to draw up lists and go in order (hey, one doesn’t need randomization, right?), then the order might go ‘criminals in the morning newspaper, criminals on TV, criminals whose details were not immediately given but were available online, criminals from years ago, historical criminals etc’; if the morning-newspaper-criminals start at say 6 AM Japan time… And allocating evenly might be hard, since there are naturally going to be shortfalls when there just aren’t many criminals that day or the newspapers aren’t publishing (holidays?) etc., so the shortfall periods will pinpoint what the Kira considers ‘end of the day’.

A much safer pro­ce­dure is thor­ough-go­ing ran­dom­iza­tion ap­plied to tim­ing, sub­jects, and man­ner of death. Even if we as­sume that Light was bound and de­ter­mined to re­veal the ex­is­tence of Kira and gain pub­lic­ity and in­ter­na­tional no­to­ri­ety (a ma­jor char­ac­ter flaw in its own right; ac­com­plish­ing things, tak­ing cred­it—­choose one), he still did not have to re­duce his anonymity much past 32 bits.

  1. Each execution’s time could be determined by a random dice roll (say, a 24-sided die for hours and a 60-sided die for minutes).
  2. Se­lect­ing method of death could be done sim­i­larly based on eas­ily re­searched de­mo­graphic data, al­though per­haps ir­rel­e­vant (serv­ing mostly to con­ceal that a killing has taken place).
  3. Se­lect­ing crim­i­nals could be based on in­ter­na­tion­ally ac­ces­si­ble pe­ri­od­i­cals that plau­si­bly every hu­man has ac­cess to, such as the New York Times, and deaths could be de­layed by months or years to broaden the pos­si­bil­i­ties as to where the Kira learned of the vic­tim (TV? books? the In­ter­net?) and avoid­ing is­sues like killing a crim­i­nal only pub­li­cized on one ob­scure Japan­ese pub­lic tele­vi­sion chan­nel. And so on.
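As an illustration of the first item only, here is what the “24-sided and 60-sided dice” look like in code, along with a check that the resulting hours show no daily rhythm. The function name is hypothetical, and physical dice would serve the fictional Light just as well.

```python
import random
from collections import Counter

def random_time_of_day(rng):
    """Roll the '24-sided die' for the hour and the '60-sided die' for the minute."""
    return rng.randrange(24), rng.randrange(60)

# OS-provided entropy; a seeded pseudo-random generator would itself be a pattern to find.
rng = random.SystemRandom()

times = [random_time_of_day(rng) for _ in range(10_000)]
per_hour = Counter(hour for hour, _ in times)

# Every hour should be hit about equally often: no sleep/wake trough to graph.
print(min(per_hour.values()), max(per_hour.values()))
```

With uniform rolls, the hour-of-day histogram that gave L his first 6 bits carries no information about the roller's time zone.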

Let’s remember that all this is predicated on anonymity, and on Light using low-tech strategies; as one person asked me, “why doesn’t Light set up a cryptographic assassination market or just take over the world? He would win without all this cleverness.” Well, then it would not be Death Note.

APPENDICES

COMMUNICATING WITH A DEATH NOTE

One might won­der how much in­for­ma­tion one could send in­ten­tion­ally with a Death Note, as op­posed to in­ad­ver­tently leak bits about one’s iden­ti­ty. As deaths are by and large pub­licly known in­for­ma­tion, we’ll as­sume the sender and re­cip­i­ent have some sort of pre-arranged key or one-time pad (although one would won­der why they’d use such an im­moral and clumsy sys­tem as op­posed to steganog­ra­phy or mes­sages on­line).

A death in­flicted by a Death Note has 3 main dis­tin­guish­ing traits which one can con­trol—who, when, and how:

  1. the per­son

    The ‘who?’ is al­ready cal­cu­lated for us: if it takes 33 bits to spec­ify a unique hu­man, then a par­tic­u­lar hu­man can con­vey 33 bits. Con­cerns about learn­abil­ity (how would you learn of an Ama­zon tribesman’s death?) im­ply that it’s re­ally <33 bits.

    If you try some scheme to en­code more bits into the choice of as­sas­si­na­tion, you ei­ther wind up with 33 bits or you wind up un­able to con­vey cer­tain com­bi­na­tions of bits and effec­tively 33 bits any­way—y­our scheme will tell you that to con­vey your des­per­ately im­por­tant mes­sage X of 50 bits telling all about L’s true iden­tity and how you dis­cov­ered it, you need to kill an Ola­fur Ja­cobs of Tan­za­nia who weighs more than 200 pounds and is from Tai­wan, but alas! Ja­cobs does­n’t ex­ist for you to kill.

  2. the time

    The ‘when’ is handled by similar reasoning. There is a certain granularity to Death Note kills: even if it is capable of timing deaths down to the nanosecond, one can’t actually witness this or receive records of it. Doctors may note time of death down to the minute, but no finer (and how do you get such precise medical records anyway?). News reports may be even less accurate, noting merely that it happened in the morning or in the late evening. In rare cases like live broadcasts, one may be able to do a little better, but even they tend to be delayed by a few seconds or minutes to allow for buffering, for technical glitches to be fixed, for the stenographers to produce the closed captioning, or simply to guard against embarrassing events (like Janet Jackson’s nipple-slip). So we’ll not assume the timing can be more accurate than the minute.

    But which minutes does a Death Note user have to choose from? Inasmuch as the Death Note is apparently incapable of influencing the past or causing Pratchettian[21] superluminal effects, the past is off-limits; but messages also have to be sent in time for whatever they are supposed to influence, so one cannot afford to have a window of a century. If the message needs to affect something within the day, then the user has a window of only 60 · 24 = 1440 minutes, which is log2(1440) = 10.49 bits; if the user has a window of a year, that’s slightly better, as a death’s timing down to the minute could embody as much as log2(60 · 24 · 365) = 19 bits. (Over a decade then is 22.3 bits, etc.) If we allow timing down to the second, then a year would be 24.9 bits. In any case, it’s clear that we’re not going to get more than 33 bits from the date. On the plus side, an ‘IP over Death’ protocol would be superior to some other protocols: here, the worse your latency, the more bits you could extract from the packet’s timestamp!

    Ryan North on compression schemes:

    “Yeah, but there’s more to being smart than knowing compression schemes!” “No there’s not!” “Shoot—he knows the secret!!” –Ryan North

  3. the circumstances (such as the place)

    The ‘how’… has many more degrees of freedom, and the circumstances are much more difficult to calculate. We can subdivide them in a lot of ways; here’s one:

    1. Lo­ca­tion (eg. lat­i­tude/­lon­gi­tude)

      Earth has ~510,072,000 km² of surface area; most of it is entirely useless from our perspective. If someone is in an airplane and dies, how on earth does one figure out the exact spot he was above? Or on the oceans? Earth has ~148,940,000 km² of land, which is more usable. No report will locate a death to the square meter, so (as a rough assumption) count in cells of 1,000 m², about 32 m on a side: that is 148,940,000,000 cells, and the usual calculation gives us log2(148940000000) = 37.12 bits. (Surprised at how similar to the ‘who?’ bit calculation this is? But 37.12 − 33 = 4.12 and 2^4.12 = 17.4, or ~17 cells of land per person. The SF classic Stand on Zanzibar drew its name from the observation that the 7 billion people alive in 2010 would fit in Zanzibar only if they stood shoulder to shoulder; spread them out, and multiply that area by ~18…) This raises an issue that affects all 3: how much can the Death Note control? Can it move victims to arbitrary points in, say, Siberia? Or is it limited to within driving distance? etc. Any of those issues could shrink the 37 bits by a great deal.

    2. Cause Of Death

      The In­ter­na­tional Clas­si­fi­ca­tion of Dis­eases lists up­wards of 20,000 dis­eases, and we can imag­ine thou­sands of pos­si­ble ac­ci­den­tal or de­lib­er­ate deaths. But what mat­ters is what gets com­mu­ni­cat­ed: if there are 500 dis­tinct brain can­cers but the death is only re­ported as ‘brain can­cer’, the 500 count as 1 for our pur­pos­es. But we’ll be gen­er­ous and go with 20,000 for re­ported dis­eases plus ac­ci­dents, which is log2(20000) = 14.3 bits.

    3. Ac­tion Prior To Death

      Actions prior to death overlap with accidental causes; here the series doesn’t help us. Light’s early experiments culminating in the message “L, do you know death gods love apples?” seem to imply that actions are limited in entropy, as each word took a death (assuming the ordinary English vocabulary of 50,000 words, 16 bits), but other plot events imply that humans can undertake long complex plans at the order of Death Notes (like Mikami bringing the fake Death Note to the final confrontation with Near). Actions before death could be reported in great detail, or they could be hidden under official secrecy like the aforementioned death-gods message (Light was uniquely privileged in learning that it succeeded, as part of L testing him). I can’t begin to guess how many distinct narratives would survive transmission or what limits the Note would set. We must leave this one undefined: it’s almost surely more than 10 bits, but how many?

Sum­ming, we get <33 + <19 + 17 + <37 + 14 + ? = 120? bits per death.
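
Each per-channel figure above is just log2 of a count of distinguishable outcomes, so the arithmetic is easy to recompute. The sketch below uses only counts quoted in the text (2^33 people, minute resolution over a year, the 148,940,000,000 land-area count, 20,000 causes of death); the helper name `bits` and the dictionary labels are illustrative. It totals just the four channels for which the text derives a number, so it comes out lower than the ~120 quoted above, which also includes the 17-bit and open-ended ‘?’ terms.

```python
import math


def bits(outcomes):
    """Upper bound on bits conveyed by one choice among `outcomes` equally likely options."""
    return math.log2(outcomes)


# Channels for which the text gives an explicit count.
channels = {
    "who (2^33 people)": bits(2 ** 33),
    "when (minutes in a year)": bits(60 * 24 * 365),
    "where (land-area count from the text)": bits(148_940_000_000),
    "cause of death (20,000 reported causes)": bits(20_000),
}

# Other timing windows mentioned in the text, for comparison.
timing = {
    "minutes in a day": bits(60 * 24),
    "minutes in a decade": bits(60 * 24 * 365 * 10),
    "seconds in a year": bits(60 * 60 * 24 * 365),
}

derived_total = sum(channels.values())

if __name__ == "__main__":
    for name, b in {**channels, **timing}.items():
        print(f"{name:42s} {b:6.2f} bits")
    print(f"sum of the four derived channels: {derived_total:.1f} bits")
```

These are upper bounds: they assume every outcome is equally usable and that the recipient can observe it exactly, which the caveats in each item argue against.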

“BAYESIAN JURISPRUDENCE”

E.T. Jaynes in his posthumous Probability Theory: The Logic of Science (on Bayesian statistics) includes a chapter 5, “Queer Uses For Probability Theory”, discussing such topics as ESP; miracles; heuristics & biases; how visual perception is theory-laden; philosophy of science with regard to Newtonian mechanics and the famed discovery of Neptune; horse-racing & weather forecasting; and finally, in section 5.8, “Bayesian jurisprudence”. Jaynes’s analysis is somewhat similar in spirit to my above analysis, although mine is not explicitly Bayesian except perhaps in the discussion of gender as eliminating one necessary bit.

The following is an excerpt; see also “Bayesian Justice”.

It is in­ter­est­ing to ap­ply prob­a­bil­ity the­ory in var­i­ous sit­u­a­tions in which we can’t al­ways re­duce it to num­bers very well, but still it shows au­to­mat­i­cally what kind of in­for­ma­tion would be rel­e­vant to help us do plau­si­ble rea­son­ing. Sup­pose some­one in New York City has com­mit­ted a mur­der, and you don’t know at first who it is, but you know that there are 10 mil­lion peo­ple in New York City. On the ba­sis of no knowl­edge but this, e(Guilty|X) = −70 db is the plau­si­bil­ity that any par­tic­u­lar per­son is the guilty one.

How much pos­i­tive ev­i­dence for guilt is nec­es­sary be­fore we de­cide that some man should be put away? Per­haps +40 db, al­though your re­ac­tion may be that this is not safe enough, and the num­ber ought to be high­er. If we raise this num­ber we give in­creased pro­tec­tion to the in­no­cent, but at the cost of mak­ing it more diffi­cult to con­vict the guilty; and at some point the in­ter­ests of so­ci­ety as a whole can­not be ig­nored.

For ex­am­ple, if 1000 guilty men are set free, we know from only too much ex­pe­ri­ence that 200 or 300 of them will pro­ceed im­me­di­ately to in­flict still more crimes upon so­ci­ety, and their es­cap­ing jus­tice will en­cour­age 100 more to take up crime. So it is clear that the dam­age to so­ci­ety as a whole caused by al­low­ing 1000 guilty men to go free, is far greater than that caused by falsely con­vict­ing one in­no­cent man.

If you have an emo­tional re­ac­tion against this state­ment, I ask you to think: if you were a judge, would you rather face one man whom you had con­victed false­ly; or 100 vic­tims of crimes that you could have pre­vent­ed? Set­ting the thresh­old at +40 db will mean, crude­ly, that on the av­er­age not more than one con­vic­tion in 10,000 will be in er­ror; a judge who re­quired ju­ries to fol­low this rule would prob­a­bly not make one false con­vic­tion in a work­ing life­time on the bench.

In any event, if we took +40 db start­ing out from −70 db, this means that in or­der to en­sure a con­vic­tion you would have to pro­duce about 110 db of ev­i­dence for the guilt of this par­tic­u­lar per­son. Sup­pose now we learn that this per­son had a mo­tive. What does that do to the plau­si­bil­ity for his guilt? Prob­a­bil­ity the­ory says

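$$e(\text{Guilty} \mid \text{Motive}) = e(\text{Guilty} \mid X) + 10 \log_{10} \frac{P(\text{Motive} \mid \text{Guilty})}{P(\text{Motive} \mid \text{Not Guilty})} \simeq -70 - 10 \log_{10} P(\text{Motive} \mid \text{Not Guilty}) \qquad \text{(5-38)}$$

since $P(\text{Motive} \mid \text{Guilty}) \simeq 1$, i.e. we consider it quite unlikely that the crime had no motive at all. Thus, the [importance] of learning that the person had a motive depends almost entirely on the probability $P(\text{Motive} \mid \text{Not Guilty})$ that an innocent person would also have a motive.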

This ev­i­dently agrees with our com­mon sense, if we pon­der it for a mo­ment. If the de­ceased were kind and loved by all, hardly any­one would have a mo­tive to do him in. Learn­ing that, nev­er­the­less, our sus­pect did have a mo­tive, would then be very [im­por­tant] in­for­ma­tion. If the vic­tim had been an un­sa­vory char­ac­ter, who took great de­light in all sorts of foul deeds, then a great many peo­ple would have a mo­tive, and learn­ing that our sus­pect was one of them is not so [im­por­tan­t]. The point of this is that we don’t know what to make of the in­for­ma­tion that our sus­pect had a mo­tive, un­less we also know some­thing about the char­ac­ter of the de­ceased. But how many mem­bers of ju­ries would re­al­ize that, un­less it was pointed out to them?

Suppose that a very enlightened judge, with powers not given to judges under present law, had perceived this fact and, when testimony about the motive was introduced, he directed his assistants to determine for the jury the number of people in New York City who had a motive. If this number is $N_m$ then

$$P(\text{Motive} \mid \text{Not Guilty}) = \frac{N_m - 1}{(\text{number of people in New York}) - 1} \simeq 10^{-7} (N_m - 1)$$

and equation (5-38) reduces, for all practical purposes, to

$$e(\text{Guilty} \mid \text{Motive}) \simeq -10 \log_{10} (N_m - 1) \qquad \text{(5-39)}$$

You see that the population of New York has canceled out of the equation; as soon as we know the number of people who had a motive, then it doesn’t matter any more how large the city was. Note that (5-39) continues to say the right thing even when $N_m$ is only 1 or 2.

You can go on this way for a long time, and we think you will find it both enlightening and entertaining to do so. For example, we now learn that the suspect was seen near the scene of the crime shortly before. From Bayes’ theorem, the [importance] of this depends almost entirely on how many innocent persons were also in the vicinity. If you have ever been told not to trust Bayes’ theorem, you should follow a few examples like this a good deal further, and see how infallibly it tells you what information would be relevant, what irrelevant, in plausible reasoning.[22]

In re­cent years there has grown up a con­sid­er­able lit­er­a­ture on Bayesian ju­rispru­dence; for a re­view with many ref­er­ences, see Vi­g­naux and Robert­son (1996) [This is ap­par­ently In­ter­pret­ing Ev­i­dence: Eval­u­at­ing Foren­sic Sci­ence in the Court­room –Ed­i­tor].

Even in situations where we would be quite unable to say that numerical values should be used, Bayes’ theorem still reproduces qualitatively just what your common sense (after perhaps some meditation) tells you. This is the fact that George Polya demonstrated in such exhaustive detail that the present writer was convinced that the connection must be more than qualitative.
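
To make the decibel bookkeeping in the excerpt concrete, here is a small numerical sketch. It assumes Jaynes's definition of evidence as 10 · log10 of the odds P/(1 − P), and takes P(Motive|Guilty) = 1 as the excerpt does; the function names are illustrative and not taken from the book.

```python
import math


def evidence_db(p):
    """Evidence in decibels for probability p: 10 * log10 of the odds p / (1 - p)."""
    return 10 * math.log10(p / (1 - p))


def probability(db):
    """Inverse of evidence_db: convert decibels of evidence back to a probability."""
    odds = 10 ** (db / 10)
    return odds / (1 + odds)


def evidence_given_motive(n_motive, population=10_000_000):
    """Prior evidence plus the likelihood-ratio term, taking P(Motive|Guilty) = 1.

    Requires n_motive >= 2; with n_motive == 1 the suspect is the only
    candidate and the evidence is unbounded.
    """
    prior = evidence_db(1 / population)
    p_motive_if_innocent = (n_motive - 1) / (population - 1)
    return prior - 10 * math.log10(p_motive_if_innocent)


if __name__ == "__main__":
    prior = evidence_db(1 / 10_000_000)
    print(f"prior: {prior:.1f} db")                       # about -70
    print(f"needed to reach +40 db: {40 - prior:.1f} db")  # about 110
    print(f"P(guilty) at +40 db: {probability(40):.5f}")   # about 0.9999
    for n in (2, 11, 1001):
        print(f"{n} people with a motive: {evidence_given_motive(n):.1f} db")
```

Changing `population` leaves the output of `evidence_given_motive` unchanged, which is the cancellation the excerpt points out: once the number of people with a motive is known, the size of the city no longer matters.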