DEATH NOTE: L, ANONYMITY & ELUDING ENTROPY

Detective stories as optimization problems
Mistakes
- Mistake 1
- Mistake 2
  - De-anonymization
- Mistake 3
- Mistake 4
- Mistake 5
- Endgame
Security is Hard (Let’s Go Shopping)
- Randomizing
See Also
External Links
Appendices
- Communicating with a Death Note
- “Bayesian Jurisprudence”
Footnotes

In the manga Death Note, the protagonist Light Yagami is given the supernatural weapon “Death Note” which can kill anyone on demand, and begins using it to reshape the world. The genius detective L attempts to track him down with analysis and trickery, and ultimately succeeds. Death Note is almost a thought experiment: given the perfect murder weapon, how can you screw up anyway? I consider the various steps of L’s process from the perspective of computer security, cryptography, and information theory, to quantify Light’s initial anonymity and how L gradually de-anonymizes him, and consider which mistake was the largest, as follows:

1. Light’s fundamental mistake is to kill in ways unrelated to his goal.
  
  Killing through heart attacks does not just make him visible early on, but the deaths reveal that his assassination method is impossibly precise and something profoundly anomalous is going on. L has been tipped off that Kira exists. Whatever the bogus justification may be, this is a major victory for his opponents. (To deter criminals and villains, it is not necessary for there to be a globally-known single anomalous or supernatural killer, when it would be equally effective to arrange for all the killings to be done naturalistically by ordinary mechanisms such as third parties/police/judiciary or used indirectly as parallel construction to crack cases.)
2. Worse, the deaths are non-random in other ways: they tend to occur at particular times!
  
  Just the scheduling of deaths cost Light 6 bits of anonymity.
3. Light’s third mistake was reacting to the blatant provocation of Lind L. Tailor.
  
  Taking the bait let L narrow his target down to 1⁄3 the original Japanese population, for a gain of ~1.6 bits.
4. Light’s fourth mistake was to use confidential police information stolen using his policeman father’s credentials.
  
  This mistake was the largest in bits lost. This mistake cost him 11 bits of anonymity; in other words, this mistake cost him nearly twice what his scheduling cost him and almost 7 times the murder of Tailor!
5. Killing Ray Penbar and the FBI team.
  
  If we assume Penbar was tasked with 200 leads out of the 10,000, then murdering him and the fiancee cost Light just 6 bits, or a little over half the fourth mistake and comparable to the original scheduling mistake.
6. Endgame: At this point in the plot, L resorts to direct measures and enters Light’s life directly, enrolling at the university, with Light unable to perfectly play the role of innocent under intense in-person surveillance.
  
  From that point on, Light is screwed as he is now playing a deadly game of “Mafia” with L & the investigative team. He frittered away >25 bits of anonymity and then L intuited the rest and suspected him all along.

Finally, I suggest how Light could have most effectively employed the Death Note and limited his loss of anonymity. In an appendix, I discuss the maximum amount of information leakage possible from using a Death Note as a communication device.

(Note: This essay assumes a familiarity with the early plot of Death Note and Light Yagami. If you are unfamiliar with DN, see my Death Note Ending essay or consult Wikipedia or read the DN rules⁠.)

DETECTIVE STORIES AS OPTIMIZATION PROBLEMS

In Light’s case, L starts with the world’s entire population of 7 billion people and needs to narrow it down to 1 person. It’s a search problem. It maps fairly directly onto basic information theory⁠, in fact. (See also Simulation inferences⁠, The 3 Grenades⁠, and for case studies in applied deanonymization, Tor DNM-related arrests, 2011–2015⁠.) To uniquely specify one item out of 7 billion, you need 33 bits of information because $\log_2(7000000000) \approx 32.7$; to use an analogy, your 32-bit computer can only address one unique location in memory out of 4 billion locations, and adding another bit doubles the capacity to >8 billion. Is 33 bits of information a lot?

Not really. L could get one bit just by looking at history or crime statistics, and noting that mass murderers are, to an astonishing degree, male¹, thereby ruling out half the world population and actually starting L off with a requirement to obtain only 32 bits to break Light’s anonymity.² If Death Note users were sufficiently rational & knowledgeable, they could draw on concepts like superrationality to acausally cooperate³ to avoid this information leakage… by arranging to pass on Death Notes to females⁴ to restore a 50:50 gender ratio; for example, if for every female who obtained a Death Note there were 3 males with Death Notes, then each male user could roll a 1d3 die and, on a 1, pass his Death Note on to a female, leaving on average 2 males and 2 females out of every 4 users.

We should first point out that Light is always going to leak some bits. The only way he could remain perfectly hidden is to not use the Death Note at all. If you change the world in even the slightest way, then you have leaked information about yourself in principle. Everything is connected in some sense; you cannot magically wave away the existence of fire without creating a cascade of consequences that result in every living thing dying⁠. For example, the fundamental point of Light executing criminals is to shorten their lifespan—there’s no way to hide that. You can’t both shorten their lives and not shorten their lives. He is going to reveal himself this way, at the least, to the actuaries and statisticians.

More historically, this has been a challenge for cryptographers, like in WWII: how did they exploit the Enigma & other communications without revealing they had done so? Their solution was misdirection: constantly arranging for plausible alternatives, like search planes that ‘just happened’ to find German submarines or leaks to controlled known German agents about there being undiscovered spies. (However, the famous story that Winston Churchill allowed the town of Coventry to be bombed rather than risk the secret of Ultra has since been put into question.) This worked in part because of German overconfidence, because the war did not last too long, and in part because each cover story was plausible on its own and no one was, in the chaos of war, able to see the whole picture and realize that there were too many lucky search planes and too many undiscoverable moles; eventually, however, someone would realize, and apparently some Germans did conclude that Enigma had to have been broken (but much too late). It’s not clear to me what would be the best misdirection for Light to mask his normal killings: use the Death Note’s control features to invent an anti-criminal terrorist organization?

So there is a real challenge here: one party is trying to infer as much as possible from observed effects, and the other is trying to minimize how much the former can observe while not stopping entirely. How well does Light balance the competing demands?

MISTAKE 1

However, he can try to reduce the leakage and make his anonymity set as large as possible. For example, killing every criminal with a heart attack is a dead give-away. Criminals do not die of heart attacks that often. (The point is more dramatic if you replace ‘heart attack’ with ‘lupus’; as we all know, in real life it’s never lupus.) Heart attacks are a subset of all deaths, and by restricting himself, Light makes it easier to detect his activities. 1000 deaths of lupus are a blaring red alarm; 1000 deaths of heart attacks are an oddity; and 1000 deaths distributed over the statistically likely suspects of cancer and heart disease etc. are almost invisible (but still noticeable in principle).

So, Light’s fundamental mistake is to kill in ways unrelated to his goal. Killing through heart attacks does not just make him visible early on, but the deaths reveal that his assassination method is supernaturally precise. L has been tipped off that Kira exists. Whatever the bogus justification may be, this is a major victory for his opponents.

First mistake, and a classic one of serial killers (eg the BTK killer’s vaunting was less anonymous than he believed): delusions of grandeur and the desire to taunt, play with, and control their victims and demonstrate their power over the general population. From a literary perspective, this similarity is clearly not an accident, as we are meant to read Light as the Sociopath Hero archetype: his ultimate downfall is the consequence of his fatal personality flaw⁠, hubris⁠, particularly in the original sadistic sense. Light cannot help but self-sabotage like this.

(This is also deeply problematic from the point of carrying out Light’s theory of deterrence: to deter criminals and villains, it is not necessary for there to be a globally-known single supernatural killer, when it would be equally effective to arrange for all the killings to be done naturalistically by third parties/police/judiciary or used indirectly to crack cases. Arguably the deterrence would be more effective the more diffused it’s believed to be—since a single killer has a finite lifespan, finite knowledge, fallibility, and idiosyncratic preferences which reduce the threat and connection to criminality, while if all the deaths were ascribed to unusually effective police or detectives, this would be inferred as a general increase in all kinds of police competence, one which will not instantly disappear when one person gets bored or hit by a bus.)

MISTAKE 2

Worse, the deaths are non-random in other ways—they tend to occur at particular times! Graphed, daily patterns jump out.

L was able to narrow down the active times of the presumable student or worker to a particular range of longitude, say 125–150° out of 180°; and what country is most prominent in that range? Japan. So that cut down the 7 billion people to around 0.128 billion; 0.128 billion requires 27 bits ($\log_2(128000000) \approx 26.93$) so just the scheduling of deaths cost Light 6 bits of anonymity!
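The bit counts above are just base-2 logarithms of the size of the remaining suspect pool. A minimal sketch of that arithmetic follows; the function name `bits_to_identify` is a hypothetical helper, and the population figures are the ones quoted in the text:

```python
import math

def bits_to_identify(population: float) -> float:
    """Bits needed to single out one member of a pool of this size."""
    return math.log2(population)

WORLD = 7_000_000_000   # world population used in the text
JAPAN = 128_000_000     # Japanese population used in the text

world_bits = bits_to_identify(WORLD)
japan_bits = bits_to_identify(JAPAN)
scheduling_loss = world_bits - japan_bits

print(f"world: {world_bits:.2f} bits")
print(f"Japan: {japan_bits:.2f} bits")
print(f"lost to scheduling: {scheduling_loss:.2f} bits")
```

The difference comes out a little under 6 bits, which the text rounds to 6.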

De-Anonymization

On a side-note, some might be skeptical that one can infer much of anything from the graph and that Death Note was just glossing over this part. “How can anyone infer that it was someone living in Japan just from 2 clumpy lines at morning and evening in Japan?” But actually, such a graph is surprisingly precise. I learned this years before I watched Death Note, when I was heavily active on Wikipedia; often I would wonder if two editors were the same person or roughly where an editor lived. What I would do if their edits or user page did not reveal anything useful is I would go to “Kate’s edit counter” and I would examine the times of day all their hundreds or thousands of edits were made at. Typically, what one would see was ~4 hours where there were no edits whatsoever, then ~4 hours with moderate to high activity, a trough, then another gradual rise to 8 hours later and a further decline down to the first 4 hours of no activity. These periods quite clearly corresponded to sleep (pretty much everyone is asleep at 4 AM), morning, lunch & work hours, evening, and then night with people occasionally staying up late and editing⁵. There was noise, of course, from people staying up especially late or getting in a bunch of editing during their workday or occasionally traveling, but the overall patterns were clear: never did I discover that someone was actually a nightwatchman and my guess was an entire hemisphere off. (Academic estimates based on user editing patterns correlate well with what is predicted on the basis of the geography of IP edits.⁶)
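The edit-time heuristic can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of any real tool: it assumes timestamps are given as UTC hours, that the quietest 4-hour stretch is sleep, and that this stretch begins around 02:00 local time (so that it is centered on 4 AM). The function names are hypothetical.

```python
from collections import Counter

def quietest_window_start(hours_utc, window=4):
    """Return the UTC hour at which the least-active `window`-hour stretch begins."""
    counts = Counter(h % 24 for h in hours_utc)

    def activity(start):
        # Total events in the window, wrapping around midnight.
        return sum(counts[(start + k) % 24] for k in range(window))

    return min(range(24), key=activity)

def guess_utc_offset(hours_utc, window=4, assumed_local_sleep_start=2):
    """Guess a UTC offset, ASSUMING the quiet window starts at ~02:00 local time."""
    start = quietest_window_start(hours_utc, window)
    offset = (assumed_local_sleep_start - start) % 24
    # Fold into the range -11..+12 so western offsets come out negative.
    return offset - 24 if offset > 12 else offset

# Synthetic editor in Japan (UTC+9): 5 edits in every local hour except 02:00-05:59.
local_hours = [h for h in range(24) if h not in (2, 3, 4, 5) for _ in range(5)]
sample_hours_utc = [(h - 9) % 24 for h in local_hours]
estimated_offset = guess_utc_offset(sample_hours_utc)
```

Real data is noisier than this synthetic sample, so an estimate of this kind would only narrow a suspect down to a band of longitudes, much as L does.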

Computer security research offers more scary results. Perhaps because “everything is correlated”⁠, there are an amazing number of ways to break someone’s privacy and de-anonymize them (background⁠; there is also financial incentive to do so in order to advertise & price discriminate):

- small errors in their computer’s clock’s time (even over Tor)
- Web browsing history⁷ or just the version and plugins⁸; and this is when random Firefox or Google Docs or Facebook bugs don’t leak your identity
- Timing attacks based on how slow pages load⁹ (how many cache misses there are; timing attacks can also be used to learn website usernames or # of private photos)
- Knowledge of what ‘groups’ a person was in could uniquely identify 42%¹⁰ of people on social networking site XING, and possibly Facebook & 6 others
- Similarly, knowing just a few movies someone has watched¹¹, popular or obscure, through Netflix often grants access to the rest of their profile if it was included in the Netflix Prize. (This was more dramatic than the AOL search data scandal because AOL searches had a great deal of personal information embedded in the search queries, but in contrast, the Netflix data seems impossibly impoverished: there’s nothing obviously identifying about what anime one has watched unless one watches obscure ones.)
- The researchers generalized their Netflix work to find isomorphisms between arbitrary graphs¹² (such as social networks stripped of any and all data except for the graph structure), for example Flickr and Twitter, and give many examples of public datasets that could be de-anonymized¹³, such as your Amazon purchases (Calandrino et al 2011; blog). These attacks are on just the data that is left after attempts to anonymize data; they don’t exploit the observation that the choice of what data to remove is as interesting as what is left, what Julian Sanchez calls “The Redactor’s Dilemma”.
- Usernames hardly bear discussing
- Your hospital records can be de-anonymized just by looking at public voting rolls.¹⁴ That researcher later went on to run “experiments on the identifiability of de-identified survey data [cite], pharmacy data [cite], clinical trial data [cite], criminal data [State of Delaware v. Gannett Publishing], DNA [cite, cite, cite], tax data, public health registries [cite (sealed by court), etc.], web logs, and partial Social Security numbers [cite].” (Whew.)
- Your typing is surprisingly unique and the sounds of typing and arm movements can identify you or be used to snoop on input & steal passwords
- Knowing your morning commute as loosely as to the individual blocks (or less granular) uniquely identifies you (Golle & Partridge 2009); knowing your commute to the zip code/census tract uniquely identifies 5% of people
- Your handwriting is fairly unique, sure, but so is how you fill in bubbles on tests¹⁵
- Speaking of handwriting, your writing style can be pretty unique too
- the unnoticeable background electrical hum may uniquely date audio recordings. Unnoticeable sounds can also be used to persistently track devices/people, exfiltrate information across air gaps, and can be used to monitor room presence/activity, and even monitor finger movements or tapping noises to help break passphrases or copy physical keys
- you may have heard of laser microphones for eavesdropping… but what about eavesdropping via video recording of potato chip bags or candy wrappers or hanging light bulbs (press release), or cellphone gyroscopes? Lasers are good for detecting your heartbeat as well, which is, of course, uniquely identifying. And hard drives can be turned into microphones. Soon even Light’s potato chips will no longer be safe…
- steering & driving patterns are sufficiently unique as to allow identification of drivers from as little as 1 turn in some cases: Hallac et al 2017. These attacks also work on smartphones for time zone, barometric pressure, public transportation timing, IP address, & pattern of connecting to WiFi or cellular networks (Mosenia et al 2017)
- smartphones can be IDed by the pattern of pixel noise, due to sensor noise such as small imperfections in the CCD sensors and lenses (and Facebook has even patented this)
- smartphone usage patterns, such as app preferences, app switching rates, consistency of commute patterns, overall geographic mobility, slower or less driving have been correlated with Alzheimer’s disease (Kourtis et al 2019) and personality (Stachl et al 2019).¹⁶ Eye tracking is also interesting.
- voices correlate with not just age/gender/ethnicity, but… overall facial appearance?

(The only surprising thing about DNA-related privacy breaks is how long they have taken to show up.)

To summarize: differential privacy is almost impossible¹⁷ and privacy is dead¹⁸⁠. (See also “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization”⁠.)

MISTAKE 3

Light’s third mistake was reacting to the canary-trap provocation of the Lind L. Tailor broadcast, which criticized Kira: Light lashed out and used the clearly-visible name & face to kill Lind L. Tailor. The live broadcast was a blatant attempt to provoke a reaction, any reaction, from a surprised & unprepared Light, and that alone should have been sufficient reason to simply ignore it (even if Light could not have reasonably known exactly how it was a trap): one should never do what an enemy wants one to do on ground & terms & timing prepared by the enemy. (Light had the option to use the Death Note at any time in the future, and that would have been almost as good a demonstration of his power as doing so during a live broadcast.)

Running the broadcast in 1 region was also a gamble & a potential mistake on L’s part; he had no real reason to think Light was in Kanto (or if he did already have priors/information to that effect, he should’ve been bisecting Kanto) and should have arranged for it to be broadcast to exactly half of Japan’s population, obtaining an expected maximum of 1 bit. But it was one that paid off; he narrowed his target down to 1⁄3 the original Japanese population, for a gain of ~1.6 bits. (You can see it was a gamble by considering if Light had been outside Kanto; since he would not see it live, he would not have reacted, and all L would learn is that his suspect was in that other 2⁄3 of the population, for a gain of only ~0.6 bits, since log₂ (3⁄2) ≈ 0.58.)
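The gamble can be made concrete with a short calculation. This sketch assumes, as a simplification, that before the broadcast the suspect is equally likely to be any resident of Japan, so the chance he is in the broadcast region equals that region's share of the population; the function names are hypothetical.

```python
import math

def bits_gained(fraction_remaining: float) -> float:
    """Information gained when the suspect pool shrinks to this fraction of its size."""
    return -math.log2(fraction_remaining)

def expected_bits(p_in_region: float) -> float:
    """Expected gain from a test that splits the pool into shares p and 1 - p."""
    q = 1 - p_in_region
    return p_in_region * bits_gained(p_in_region) + q * bits_gained(q)

kanto_hit = bits_gained(1 / 3)        # Light reacts: pool shrinks to 1/3
kanto_miss = bits_gained(2 / 3)       # no reaction: pool shrinks to 2/3
expected_kanto = expected_bits(1 / 3) # average over both outcomes
expected_half = expected_bits(1 / 2)  # an even split of the population

print(f"hit: {kanto_hit:.3f}, miss: {kanto_miss:.3f}")
print(f"expected (1/3 split): {expected_kanto:.3f}, expected (1/2 split): {expected_half:.3f}")
```

Under that assumption a hit is worth about 1.58 bits and a miss about 0.58 bits, for an expected gain of about 0.92 bits, slightly below the 1 bit an even split would guarantee.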

But even this wasn’t a huge mistake. He lost 6 bits to his schedule of killing, and lost another 1.6 bits to temperamentally killing Lind L. Tailor, but since the male population of Kanto is 21.5 million (43 million total), he still has ~24 bits of anonymity left (log₂ (21500000) ≈ 24.36). That’s not too terrible, and the loss is mitigated even further by other details of this mistake, as pointed out by Zmflavius⁠; specifically, that unlike “being male” or “being Japanese”, the information about being in Kanto is subject to decay, since people move around all the time for all sorts of reasons:

…quite possibly Light’s biggest mistake was inadvertently revealing his connection to the police hierarchy by hacking his dad’s computer. Whereas even the Lind L. Taylor debacle only revealed his killing mechanics and narrowed him down to “someone in the Kanto region” (which is, while an impressive accomplishment based on the information he had, entirely meaningless for actually finding a suspect), there were perhaps a few hundred people who had access to the information Light’s dad had. There’s also the fact that L knew that Light was probably someone in their late teens, meaning that there was an extremely high chance that at the end of the school year, even that coup of his would expire, thanks to students heading off to university all over Japan (of course, Light went to Toudai⁠, and a student of his caliber not attending such a university would be suspicious, but L had no way of knowing that then). I mean, perhaps L had hoped that Kira would reveal himself by suddenly moving away from the Kanto region, but come the next May, he would have no way of monitoring unusual movements among late teenagers, because a large percentage of them would be moving for legitimate reasons.

(One could still run the inference “backwards” on any particular person to verify they were in Kanto in the right time period, but as time passes, it becomes less possible to run the inference “forwards” and only examine people in Kanto.)

This mistake also shows us that the important thing that information theory buys us, really, is not the bit (we could be using log₁₀ rather than log₂, and comparing “dits” rather than “bits”) so much as comparing events in the plot on a logarithmic scale. If we simply looked at the absolute number of people ruled out at each step, we’d conclude that Light’s scheduling mistake was a debacle without compare, since it let L rule out >6 billion people, roughly 54× more people than all the other mistakes put together would let L rule out. Mistakes are relative to each other, not absolutes.

MISTAKE 4

Light’s fourth mistake was to use confidential police information stolen using his policeman father’s credentials. This was unnecessary as there are countless criminals he could still execute using public information (face+name is not typically difficult to get), and if for some reason he needed a specific criminal, he could either restrict use of secret information to a few high-priority victims—if only to avoid suspicions of hacking & subsequent security upgrades costing him access!—or manufacture, using the Death Note’s coercive powers or Kira’s public support, a way to release information such as a ‘leak’ or passing public transparency laws.

This mistake was the largest in bits lost. But interestingly, many or even most Death Note fans do not seem to regard this as his largest mistake, instead pointing to his killing Lind L. Tailor or perhaps relying too much on Mikami. The information theoretical perspective strongly disagrees, and lets us quantify how large this mistake was.

When he acts on the secret police information, he instantly cuts down his possible identity to one out of a few thousand people connected to the police. Let’s be generous and say 10,000. It takes 14 bits to specify 1 person out of 10,000 (log₂ (10000) ≈ 13.29)—as compared to the 24–25 bits to specify a Kanto dweller.

This mistake cost him 11 bits of anonymity; in other words, this mistake cost him nearly twice what his scheduling cost him and almost 7 times the murder of Tailor!

MISTAKE 5

In comparison, the fifth mistake, murdering Ray Penbar’s fiancee and focusing L’s suspicion on Penbar’s assigned targets, was positively cheap. If we assume Penbar was tasked with 200 leads out of the 10,000, then murdering him and the fiancee dropped Light from 14 bits to 8 bits (log₂ (200) ≈ 7.64), a loss of just 6 bits: a little over half the fourth mistake and comparable to the original scheduling mistake.
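The running tally across the plot can be reproduced as a small ledger. This is a sketch using the pool sizes quoted in the text; the step labels are informal, and note that the Kanto figure counts males only, so that step also absorbs the 1 bit from the earlier sex inference.

```python
import math

# Suspect-pool sizes quoted in the text, in plot order.
steps = [
    ("start: world population", 7_000_000_000),
    ("scheduling: Japan", 128_000_000),
    ("Tailor broadcast: Kanto, males only", 21_500_000),
    ("police data", 10_000),
    ("Penbar's leads", 200),
]

# Bits lost at each step = drop in log2(pool size) from the previous step.
losses = {}
for (_, before), (label, after) in zip(steps, steps[1:]):
    losses[label] = math.log2(before) - math.log2(after)

for label, lost in losses.items():
    print(f"{label:38s} {lost:5.2f} bits")

total_lost = sum(losses.values())
largest = max(losses, key=losses.get)
print(f"total: {total_lost:.2f} bits; largest single loss: {largest}")
```

On these figures the police-data step is the largest single loss at about 11 bits, and the total comes to about 25 bits, leaving under 8.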

ENDGAME

At this point in the plot, L resorts to direct measures and enters Light’s life directly, enrolling at the university. From this point on, Light is screwed as he is now playing a deadly game of Mafia with L & the investigative team. He frittered away >25 bits of anonymity and then L intuited the rest and suspected him all along. (We could justify L skipping over the remaining 8 bits by pointing out that L can analyze the deaths and infer psychological characteristics like arrogance, puzzle-solving, and great intelligence, which combined with heuristically searching the remaining candidates, could lead him to zero in on Light.)

From the theoretical point of view, the game was over at that point. The challenge for L then became proving it to L’s satisfaction under his self-imposed moral constraints.¹⁹

SECURITY IS HARD (LET’S GO SHOPPING)

What should Light have done? That’s easy to answer, but tricky to implement.

One could try to manufacture disinformation. Terence Tao rehearses many of the above points about information theory & anonymity, and goes on to loosely discuss the possible benefits of faking information:

…one additional way to gain more anonymity is through deliberate disinformation. For instance, suppose that one reveals 100 independent bits of information about oneself. Ordinarily, this would cost 100 bits of anonymity (assuming that each bit was a priori equally likely to be true or false), by cutting the number of possibilities down by a factor of 2¹⁰⁰; but if 5 of these 100 bits (chosen randomly and not revealed in advance) are deliberately falsified, then the number of possibilities increases again by a factor of (100 choose 5) ~ 2²⁶, recovering about 26 bits of anonymity. In practice one gains even more anonymity than this, because to dispel the disinformation one needs to solve a satisfiability problem, which can be notoriously intractable computationally, although this additional protection may dissipate with time as algorithms improve (e.g. by incorporating ideas from compressed sensing).
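The arithmetic in the quoted example is easy to check. The sketch below only verifies the counting step (how many ways 5 of 100 revealed bits could be the falsified ones); it says nothing about the computational hardness Tao also mentions. Variable names are illustrative.

```python
import math

revealed_bits = 100   # independent bits revealed about oneself
falsified = 5         # how many of them are secretly false

# Number of ways to choose which 5 of the 100 bits are the false ones.
possible_lie_patterns = math.comb(revealed_bits, falsified)

# Each pattern is a distinct hypothesis the adversary must consider,
# so the anonymity recovered is the log2 of that count.
recovered_bits = math.log2(possible_lie_patterns)

print(f"{possible_lie_patterns:,} patterns = {recovered_bits:.2f} bits recovered")
```

This gives 75,287,520 patterns, or a little over 26 bits, matching the figure in the quote.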

RANDOMIZING

The difficulty with suggesting that Light should—or could—have used disinformation on the timing of deaths is that we are, in effect, engaging in a sort of hindsight bias⁠. How exactly is Light or anyone supposed to know that L could deduce his timezone from his killings? I mentioned an example of using Wikipedia edits to localize editors, but that technique was unique to me among WP editors²⁰ and no doubt there are many other forms of information leakage I have never heard of despite compiling a list; if I were Light, even if I remembered my Wikipedia technique, I might not bother evenly distributing my killing over the clock or adopting a deceptive pattern (eg suggesting I was in Europe rather than Japan). If Light had known he was leaking timing information but didn’t know that someone out there was clever enough to use it (a “known unknown”), then we might blame him; but how is Light supposed to know these “unknown unknowns”?

Randomization is the answer. Randomization and encryption scramble the correlations between input and output, and they would serve as well in Death Note as they do in cryptography & statistics in the real world, at the cost of some efficiency. The point of randomization, both in cryptography and in statistical experiments, is to not just prevent the leaked information or confounders (respectively) you do know about but also the ones you do not yet know about.

To steal & paraphrase an example from Jim Manzi’s Uncontrolled: you’re running a weight-loss experiment. You know that the effectiveness might vary with each subject’s pre-existing weight, but you don’t believe in randomization (you’re a practical man! only prissy statisticians worry about randomization!); so you split the subjects by weight, and for convenience you allocate them by when they show up to your experiment—in the end, there are exactly 10 experimental subjects over 150 pounds and 10 controls over 150 pounds, and so on and so forth. Unfortunately, it turns out that unbeknownst to you, a genetic variant controls weight gain and a whole extended family showed up at your experiment early on and they all got allocated to ‘experimental’ and none of them to ‘control’ (since you didn’t need to randomize, right? you were making sure the groups were matched on weight!). Your experiment is now bogus and misleading. Of course, you could run a second experiment where you make sure the experimental and control groups are matched on weight and also now matched on that genetic variant… but now there’s the potential for some third confounder to hit you. If only you had used randomization—then you would probably have put some of the variants into the other group as well and your results wouldn’t’ve been bogus!
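A toy version of this failure can be simulated. The sketch below is a deliberate simplification: it ignores the weight matching and simply compares assignment by arrival order against random assignment, using a made-up cohort of 20 subjects in which the first 6 arrivals are the family carrying the variant. All names and numbers are hypothetical.

```python
import random

def assign_by_arrival(subjects):
    """First half of arrivals become 'experimental', the rest 'control'."""
    half = len(subjects) // 2
    return subjects[:half], subjects[half:]

def assign_at_random(subjects, rng):
    """Shuffle a copy, then split in half, so arrival order cannot matter."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def carriers(group):
    """Count subjects in a group who carry the hidden variant."""
    return sum(s["carrier"] for s in group)

# Hypothetical cohort in arrival order; the early-arriving family are carriers.
subjects = [{"id": i, "carrier": i < 6} for i in range(20)]

experimental, control = assign_by_arrival(subjects)
print("by arrival:", carriers(experimental), "vs", carriers(control))

# Repeat the random assignment many times to see its typical behavior.
rng = random.Random(0)
trials = [carriers(assign_at_random(subjects, rng)[0]) for _ in range(2000)]
mean_carriers = sum(trials) / len(trials)
print(f"random assignment: {mean_carriers:.2f} carriers in 'experimental' on average")
```

Assignment by arrival puts all 6 carriers in one arm, while random assignment puts about 3 in each arm on average, without the experimenter ever needing to know the variant exists.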

So to deal with Light’s scheduling mistake, simply scheduling every death on the hour will not work because the wake-sleep cycle is still present. If he set up a list and wrote down n criminals for each hour to eliminate the peak-troughs rather than randomizing, could that still go wrong? Maybe: we don’t know what information might be left in the data which an L or Turing could decipher. I can speculate about one possibility: the allocation of each kind of criminal to each hour. If one were to draw up lists and go in order (hey, one doesn’t need randomization, right?), then the order might go ‘criminals in the morning newspaper, criminals on TV, criminals whose details were not immediately given but were available online, criminals from years ago, historical criminals etc’; if the morning-newspaper-criminals start at say 6 AM Japan time… And allocating evenly might be hard, since there are naturally going to be shortfalls when there just aren’t many criminals that day or the newspapers aren’t publishing (holidays?) etc., so the shortfall periods will pinpoint what the Kira considers ‘end of the day’.

A much safer procedure is thorough-going randomization applied to timing, subjects, and manner of death. Even if we assume that Light was bound and determined to reveal the existence of Kira and gain publicity and international notoriety (a major character flaw in its own right; accomplishing things, taking credit—choose one), he still did not have to reduce his anonymity much past 32 bits.

- Each execution’s time could be determined by a random die roll (say, a 24-sided die for hours and a 60-sided die for minutes).
- Selecting method of death could be done similarly based on easily researched demographic data, although perhaps irrelevant (serving mostly to conceal that a killing has taken place).
- Selecting criminals could be based on internationally accessible periodicals that plausibly every human has access to, such as the New York Times, and deaths could be delayed by months or years to broaden the possibilities as to where the Kira learned of the victim (TV? books? the Internet?) and avoid issues like killing a criminal only publicized on one obscure Japanese public television channel. And so on.
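The first two suggestions amount to drawing the time and manner of death from distributions that do not depend on the user's own habits. A minimal sketch follows; the cause list and its weights are invented placeholders (real weights would come from mortality tables), and `draw_execution_plan` is a hypothetical name.

```python
import random

# OS-backed randomness, so the draws do not depend on a guessable seed.
rng = random.SystemRandom()

# HYPOTHETICAL causes and weights, for illustration only.
CAUSES = ["heart disease", "cancer", "stroke", "accident"]
WEIGHTS = [30, 30, 10, 5]

def draw_execution_plan():
    """Draw a time of day and a cause of death, independent of the user's schedule."""
    hour = rng.randrange(24)      # the 24-sided die
    minute = rng.randrange(60)    # the 60-sided die
    cause = rng.choices(CAUSES, weights=WEIGHTS, k=1)[0]
    return hour, minute, cause

hour, minute, cause = draw_execution_plan()
print(f"{hour:02d}:{minute:02d} by {cause}")
```

The point of the design is that the schedule carries no trace of when the user is awake; it does nothing about leaks through the choice of victims, which the third suggestion addresses.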

Let’s remember that all this is predicated on anonymity, and on Light using low-tech strategies; as one person asked me, “why doesn’t Light set up a cryptographic assassination market or just take over the world? He would win without all this cleverness.” Well, then it would not be Death Note.

APPENDICES

COMMUNICATING WITH A DEATH NOTE

One might wonder how much information one could send intentionally with a Death Note, as opposed to inadvertently leak bits about one’s identity. As deaths are by and large publicly known information, we’ll assume the sender and recipient have some sort of pre-arranged key or one-time pad (although one would wonder why they’d use such an immoral and clumsy system as opposed to steganography or messages online).

A death inflicted by a Death Note has 3 main distinguishing traits which one can control—who, when, and how:

the person

The ‘who?’ is already calculated for us: if it takes 33 bits to specify a unique human, then a particular human can convey 33 bits. Concerns about learnability (how would you learn of an Amazon tribesman’s death?) imply that it’s really <33 bits.

If you try some scheme to encode more bits into the choice of assassination, you either wind up with 33 bits or you wind up unable to convey certain combinations of bits and effectively 33 bits anyway—your scheme will tell you that to convey your desperately important message X of 50 bits telling all about L’s true identity and how you discovered it, you need to kill an Olafur Jacobs of Tanzania who weighs more than 200 pounds and is from Taiwan, but alas! Jacobs doesn’t exist for you to kill.
the time

The ‘when’ is handled by similar reasoning. There is a certain granularity to Death Note kills: even if it is capable of timing deaths down to the nanosecond, one can’t actually witness this or receive records of this. Doctors may note time of death down to the minute, but no finer (and how do you get such precise medical records anyway?). News reports may be even less accurate, noting merely that it happened in the morning or in the late evening. In rare cases like live broadcasts, one may be able to do a little better, but even they tend to be delayed by a few seconds or minutes to allow for buffering, for technical glitches to be fixed, for the stenographers to produce the closed captioning, or simply to guard against embarrassing events (like Janet Jackson’s nipple-slip). So we’ll not assume the timing can be more accurate than the minute. But which minutes does a Death Note user have to choose from? Inasmuch as the Death Note is apparently incapable of influencing the past or causing Pratchettian²¹ superluminal effects, the past is off-limits; but messages also have to be sent in time for whatever they are supposed to influence, so one cannot afford to have a window of a century. If the message needs to affect something within the day, then the user has a window of only 60 · 24 = 1440 minutes, which is log₂(1440) ≈ 10.49 bits; if the user has a window of a year, that’s slightly better, as a death’s timing down to the minute could embody as much as log₂(60 · 24 · 365) ≈ 19 bits. (Over a decade then is 22.3 bits, etc.) If we allow timing down to the second, then a year would be 24.9 bits. In any case, it’s clear that we’re not going to get more than 33 bits from the date. On the plus side, an ‘IP over Death’ protocol would be superior to some other protocols: here, the worse your latency, the more bits you could extract from the packet’s timestamp! (See also Dinosaur Comics on compression schemes.)
the circumstances (such as the place)

The ‘how’… has many more degrees of freedom. The circumstances are much more difficult to calculate. We can subdivide them in a lot of ways; here’s one:
1. Location (eg. latitude/longitude)
  
  Earth has ~510,072,000 square kilometers of surface area; most of it is entirely useless from our perspective: if someone is in an airplane and dies, how on earth does one figure out the exact spot he was above? Or on the oceans? Earth has ~148,940,000 square kilometers of land, which is more usable: if we assume places can only be told apart at a resolution of about 1,000 square meters, that is ~148,940,000,000 distinguishable locations, and the usual calculation gives us log₂(148940000000) = 37.12 bits. (Surprised at how similar to the ‘who?’ bit calculation this is? But 37.12 - 33 = 4.12 and 2^4.12 = 17.4. The SF classic Stand on Zanzibar drew its name from the observation that the 7 billion people alive in 2010 would fit in Zanzibar only if they stood shoulder to shoulder; spread them out, and multiply that area by ~18…) This raises an issue that affects all 3: how much can the Death Note control? Can it move victims to arbitrary points in, say, Siberia? Or is it limited to within driving distance? etc. Any of those issues could shrink the 37 bits by a great deal.
2. Cause Of Death
  
  The International Classification of Diseases lists upwards of 20,000 diseases, and we can imagine thousands of possible accidental or deliberate deaths. But what matters is what gets communicated: if there are 500 distinct brain cancers but the death is only reported as ‘brain cancer’, the 500 count as 1 for our purposes. But we’ll be generous and go with 20,000 for reported diseases plus accidents, which is log₂(20000) = 14.3 bits.
3. Action Prior To Death
  
  Actions prior to death overlap with accidental causes; here the series doesn’t help us. Light’s early experiments, culminating in the message “L, do you know death gods love apples?”, seem to imply that actions are limited in entropy, as each word took a death (assuming the ordinary English vocabulary of 50,000 words, 16 bits), but other plot events imply that humans can undertake long complex plans on the orders of a Death Note (like Mikami bringing the fake Death Note to the final confrontation with Near). Actions before death could be reported in great detail, or they could be hidden under official secrecy like the aforementioned death-gods message (Light was uniquely privileged to learn that it had succeeded, as part of L testing him). I can’t begin to guess how many distinct narratives would survive transmission or what limits the Note would set. We must leave this one undefined: it’s almost surely more than 10 bits, but how many?
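
The figures in the list above are all just log₂ of a count, so they are easy to reproduce. A rough sketch, using the counts quoted in the text as if every option were equally likely and equally reportable (a generous assumption); the constant names are mine:

```python
import math

# Counts quoted in the text; treat them as rough upper bounds.
LOCATION_COUNT = 148_940_000_000   # distinguishable land locations used in the text
CAUSES_OF_DEATH = 20_000           # reported diseases plus accidents
VOCABULARY = 50_000                # ordinary English vocabulary
WHO_BITS = 33                      # bits from the choice of victim

location_bits = math.log2(LOCATION_COUNT)
cause_bits = math.log2(CAUSES_OF_DEATH)
word_bits = math.log2(VOCABULARY)

# How many locations there are per possible victim (the 2^4.12 aside).
locations_per_person = 2 ** (location_bits - WHO_BITS)

print(f"location: {location_bits:.2f} bits")
print(f"cause of death: {cause_bits:.2f} bits")
print(f"one word of action: {word_bits:.2f} bits")
print(f"locations per person: {locations_per_person:.1f}")
```

Because each channel only contributes the logarithm of its count, even large errors in the counts move the totals by a handful of bits.
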

Summing, we get <33 + <19 + 17 + <37 + 14 + ? = 120? bits per death.

“BAYESIAN JURISPRUDENCE”

E.T. Jaynes in his posthumous Probability Theory: The Logic of Science (on Bayesian statistics) includes a chapter 5, “Queer Uses For Probability Theory”, discussing such topics as ESP; miracles; heuristics & biases; how visual perception is theory-laden; philosophy of science with regard to Newtonian mechanics and the famed discovery of Neptune; horse-racing & weather forecasting; and finally, in section 5.8, “Bayesian jurisprudence”. Jaynes’s analysis is somewhat similar in spirit to my above analysis, although mine is not explicitly Bayesian except perhaps in the discussion of gender as eliminating one necessary bit.

The following is an excerpt; see also “Bayesian Justice”⁠.

It is interesting to apply probability theory in various situations in which we can’t always reduce it to numbers very well, but still it shows automatically what kind of information would be relevant to help us do plausible reasoning. Suppose someone in New York City has committed a murder, and you don’t know at first who it is, but you know that there are 10 million people in New York City. On the basis of no knowledge but this, e(Guilty|X) = −70 db is the plausibility that any particular person is the guilty one.

How much positive evidence for guilt is necessary before we decide that some man should be put away? Perhaps +40 db, although your reaction may be that this is not safe enough, and the number ought to be higher. If we raise this number we give increased protection to the innocent, but at the cost of making it more difficult to convict the guilty; and at some point the interests of society as a whole cannot be ignored.

For example, if 1000 guilty men are set free, we know from only too much experience that 200 or 300 of them will proceed immediately to inflict still more crimes upon society, and their escaping justice will encourage 100 more to take up crime. So it is clear that the damage to society as a whole caused by allowing 1000 guilty men to go free, is far greater than that caused by falsely convicting one innocent man.

If you have an emotional reaction against this statement, I ask you to think: if you were a judge, would you rather face one man whom you had convicted falsely; or 100 victims of crimes that you could have prevented? Setting the threshold at +40 db will mean, crudely, that on the average not more than one conviction in 10,000 will be in error; a judge who required juries to follow this rule would probably not make one false conviction in a working lifetime on the bench.

In any event, if we took +40 db starting out from −70 db, this means that in order to ensure a conviction you would have to produce about 110 db of evidence for the guilt of this particular person. Suppose now we learn that this person had a motive. What does that do to the plausibility for his guilt? Probability theory says

$e(\text{Guilty} \mid \text{Motive}) = e(\text{Guilty} \mid X) + 10 \log_{10} \frac{P(\text{Motive} \mid \text{Guilty})}{P(\text{Motive} \mid \text{Not Guilty})}$ (5-38)

$\simeq -70 - 10 \log_{10} P(\text{Motive} \mid \text{Not Guilty})$

since $P(\text{Motive} \mid \text{Guilty}) \simeq 1$, i.e. we consider it quite unlikely that the crime had no motive at all. Thus, the [importance] of learning that the person had a motive depends almost entirely on the probability $P(\text{Motive} \mid \text{Not Guilty})$ that an innocent person would also have a motive.

This evidently agrees with our common sense, if we ponder it for a moment. If the deceased were kind and loved by all, hardly anyone would have a motive to do him in. Learning that, nevertheless, our suspect did have a motive, would then be very [important] information. If the victim had been an unsavory character, who took great delight in all sorts of foul deeds, then a great many people would have a motive, and learning that our suspect was one of them is not so [important]. The point of this is that we don’t know what to make of the information that our suspect had a motive, unless we also know something about the character of the deceased. But how many members of juries would realize that, unless it was pointed out to them?

Suppose that a very enlightened judge, with powers not given to judges under present law, had perceived this fact and, when testimony about the motive was introduced, he directed his assistants to determine for the jury the number of people in New York City who had a motive. If this number is $N_{m}$ then

$P(\text{Motive} \mid \text{Not Guilty}) = \frac{N_{m} - 1}{(\text{Number of people in New York}) - 1} \simeq 10^{-7} (N_{m} - 1)$

and equation (5-38) reduces, for all practical purposes, to

$e(\text{Guilty} \mid \text{Motive}) \simeq -10 \log_{10} (N_{m} - 1)$ (5-39)

You see that the population of New York has canceled out of the equation; as soon as we know the number of people who had a motive, then it doesn’t matter any more how large the city was. Note that (5-39) continues to say the right thing even when $N_{m}$ is only 1 or 2.

You can go on this way for a long time, and we think you will find it both enlightening and entertaining to do so. For example, we now learn that the suspect was seen near the scene of the crime shortly before. From Bayes’ theorem, the [importance] of this depends almost entirely on how many innocent persons were also in the vicinity. If you have ever been told not to trust Bayes’ theorem, you should follow a few examples like this a good deal further, and see how infallibly it tells you what information would be relevant, what irrelevant, in plausible reasoning.²²

In recent years there has grown up a considerable literature on Bayesian jurisprudence; for a review with many references, see Vignaux and Robertson (1996) [This is apparently Interpreting Evidence: Evaluating Forensic Science in the Courtroom –Editor].

Even in situations where we would be quite unable to say that numerical values should be used, Bayes’ theorem still reproduces qualitatively just what your common sense (after perhaps some meditation) tells you. This is the fact that George Polya demonstrated in such exhaustive detail that the present writer was convinced that the connection must be more than qualitative.
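
Jaynes’s decibel bookkeeping in the excerpt is easy to reproduce. The sketch below assumes his evidence function is 10·log₁₀ of the odds (which matches the −70 db figure for one suspect in 10 million) and takes P(Motive | Guilty) as exactly 1; the function names are mine, not Jaynes’s.

```python
import math

def evidence_db(p: float) -> float:
    """Evidence in decibels: 10 * log10 of the odds p / (1 - p)."""
    return 10 * math.log10(p / (1 - p))

def update_for_motive(prior_db: float, n_motive: int, population: int) -> float:
    """Equation (5-38), with P(Motive | Guilty) taken as 1."""
    p_motive_given_innocent = (n_motive - 1) / (population - 1)
    return prior_db - 10 * math.log10(p_motive_given_innocent)

NYC = 10_000_000
prior = evidence_db(1 / NYC)   # about -70 db
needed = 40 - prior            # about 110 db of evidence to reach +40 db
print(f"prior: {prior:.1f} db, evidence needed: {needed:.1f} db")

# n_motive = 1 is skipped: log10(0) is undefined in floating point,
# although (5-39) then correctly implies unbounded evidence of guilt.
for n_motive in (2, 11, 1001):
    posterior = update_for_motive(prior, n_motive, NYC)
    shortcut = -10 * math.log10(n_motive - 1)   # equation (5-39)
    print(f"N_m = {n_motive}: {posterior:.2f} db via (5-38), "
          f"{shortcut:.2f} db via (5-39)")
```

The two columns agree for every N_m, which is the cancellation Jaynes points out: once the number of people with a motive is known, the size of the city drops out.
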