Last year I signed up for a membership at DaringFireball. I had been thinking about doing this since John first wrote about memberships in June 2004. Note that I did not sign up at that time. Why not? For many reasons, but not the least of which is that I wasn’t really into RSS yet (for those who don’t know, RSS is a technology which allows you to know when a site has been updated without having to go to it. Gross oversimplification, but the explanation that you need for my purposes if you don’t understand RSS). I just regularly pulled up the website and looked to see if there was anything new.

Nope, I didn’t actually join until the October 2005 membership drive with prizes was announced. One might conclude this makes me either a cheap bastard or a sucker for a contest. I saw it as motivation to metaphorically get off my literal duff, support the writing of someone whose website I enjoy, get a t-shirt, and maybe win a prize. Actually, I signed up while sitting down, so I guess the duff thing was also metaphorical.

I actually paid $29 and got a t-shirt, which I have yet to wear. There’s probably something to be said about me psychologically when I didn’t pay $19 for something I was using and then paid an extra $10 for something I didn’t use, but let’s not dwell on that.

What does it get you?

The best quote about why to support DaringFireball comes from DaringFireball.

“But please, I implore you, do not think of this as paying $20 just to get a full-content RSS feed. Think of it as a small token of my gratitude for supporting my writing at this site. It’s like when you pledge $100 to PBS and they send you a tote bag; no one does it to get the tote bag.”

The first rule of supporting DF is that you do not support DF for the t-shirt. Maybe that’s why I haven’t worn mine.

After all, I can read the site for free, I can use the DaringFireball summary RSS feed for free. I could manually check the Linked List myself or write a shell script to scrape it and alert me when it finds new stuff. I don’t need to subscribe to DF. That was the first realization, and it was an important one.

It’s not about a tote bag, a t-shirt, or an RSS feed.

It’s about supporting someone whose writing and observations I enjoy. It was about giving a little back, as I was making a bit of money from AdSense at the time. People were paying to read my stuff, and I wanted to pay it forward, spread the wealth, insert your trite analogy here.

That was how I thought about signing up, but even more so, I felt that I was paying for what had already been written. As I said, I had been reading the site for a while, and I had enjoyed it. It was as much a “thank you” as anything else. After all, it could have been that the next day/week/month DF would have had a post announcing “Sorry it didn’t work out, I’m going to work for Microsoft.” Ok maybe not the last part, but he could have decided not to write anymore.

Why did I do it? I did it because of excellent articles and analysis like the review of OmniWeb 5 public beta which had originally gotten my attention. I’ve also written in-depth about a for-pay browser in a free-browser world, maybe that’s another thing we have in common.

In fact, I think that’s probably something worth looking at: someone who is willing to pay for something better than the free alternatives. If you use OmniWeb or if you used Opera before it was free, you were paying for a better alternative than what was available for free. Is paying to support content otherwise available for free really that much of a jump?

As I said, I looked at it in different terms. My $29 got me a t-shirt and a chance to win some other software, and was something of an expression of thanks for the site I had already enjoyed.

Now that the time to renew was approaching, it was time to ask again: What does my $19 get me? Or, to think in my terms, what did my $19 get me for the past year? So I decided to take a closer look at what my subscription supported last year:

  1. Membership Numbers 27 Oct 2005
  2. Full Metal Jacket 11 Nov 2005
  3. Full Metal Jacket Addenda 16 Nov 2005
  4. ‘Restart Apache’ AppleScript 30 Nov 2005
  5. A Public Service Announcement Regarding ‘callto:’ URIs and Safari 16 Dec 2005
  6. Merry 26 Dec 2005
  7. MWSF ’06 Predictions 10 Jan 2006
  8. iLife ’06 From the Perspective of an Anthropomorphized Brushed Metal Interface 13 Jan 2006
  9. iTunes MiniStore Is Now Opt-In 18 Jan 2006
  10. Macworld Expo 2006 in Review 20 Jan 2006
  11. Smart Crash Reports 30 Jan 2006
  12. Bedecked 2 Feb 2006
  13. Smart Crash Reports Addenda 13 Feb 2006
  14. Joyeur 20 Feb 2006
  15. The Safari Shell Script Execution Exploit 22 Feb 2006
  16. Familiarity Breeds a User Base 5 Mar 2006
  17. Switching Is Afoot 8 Mar 2006
  18. Speaking at SXSW 10 Mar 2006
  19. The Annals of Journalism 10 Mar 2006
  20. The iPod Juggernaut 24 Mar 2006
  21. Stubborn Chronicle Staff Writer Is an Ignoramus 27 Mar 2006
  22. Adios Avie 31 Mar 2006
  23. ‘Repair Permissions’ Is Not a Recommended Step When Applying System Updates 3 Apr 2006
  24. Windows: The New Classic 6 Apr 2006
  25. Several Asinine and/or Risky Ideas Regarding Apple’s Strategy That Boot Camp Does Not Portend 10 Apr 2006
  26. Seriously, ‘Repair Permissions’ Is Voodoo 12 Apr 2006
  27. O’Grady v. Superior Court 13 Apr 2006
  28. New T-Shirts 19 Apr 2006
  29. Initiative 20 Apr 2006
  30. Cringely’s Machinations 22 Apr 2006
  31. When ‘Smart’ Cut/Copy/Paste Attacks 26 Apr 2006
  32. More on NSTextView’s ‘Smart’ Cut/Copy/Paste 27 Apr 2006
  33. Aperture Dirt 28 Apr 2006
  34. Feed Me 30 Apr 2006
  35. Good Journalism 2 May 2006
  36. More Aperture Dirt 4 May 2006
  37. Using .htaccess Redirection to Standardize Web Server Addresses 5 May 2006
  38. Goal 9 May 2006
  39. Jackass of the Week: Rob Glaser 11 May 2006
  40. Tim Bray on iCal 12 May 2006
  41. The Last Pixel 12 May 2006
  42. ‘Web Kit’ vs. ‘WebKit’ 15 May 2006
  43. Confidence Game 26 May 2006
  44. And Oranges 15 Jun 2006
  45. Why Apple Won’t Open Source Its Apps 19 Jun 2006
  46. Interoperability and DRM Are Mutually Exclusive 20 Jun 2006
  47. Hot Off the Press 26 Jun 2006
  48. The Mac OS X Tipping Point 8 Jul 2006
  49. Standing in Line With Mr. Jimmy 15 Jul 2006
  50. Magic 8-Ball Answers Your Questions Regarding Microsoft’s ‘Zune’ 25 Jul 2006
  51. Regarding Brian Krebs’s Reporting on the Supposed MacBook Wi-Fi Exploit 3 Aug 2006
  52. Highly Selective 4 Aug 2006
  53. WWDC Prelude 6 Aug 2006
  54. Regarding the Hardware Announcements at WWDC 2006 16 Aug 2006
  55. Site Preferences and T-Shirts 17 Aug 2006
  56. Jackass of the Week: Paul Thurrott 18 Aug 2006
  57. The Curious Case of the Supposed MacBook Wi-Fi Hack 21 Aug 2006
  58. A Bit More Regarding the MacBook Wireless Security Saga 23 Aug 2006
  59. Vacation, All I Ever Wanted 30 Aug 2006
  60. An Open Challenge to David Maynor and Jon Ellch 1 Sep 2006
  61. Update on the MacBook Wi-Fi Exploit Challenge 5 Sep 2006
  62. Lies, Damned Lies, and MacBook Wi-Fi Hacks 11 Sep 2006
  63. Buy New iPods From Amazon and Support Daring Fireball 12 Sep 2006
  64. Showtime: The Big Picture 13 Sep 2006
  65. High on Vapor Fumes 15 Sep 2006
  66. Showtime: The New iPods 15 Sep 2006
  67. Regarding Analyst Speculation on iPod Pricing 18 Sep 2006
  68. Regarding the Features and Capabilities of the Various Fifth Generation iPods 20 Sep 2006
  69. The AirPort Security Update and the Supposed MacBook Wi-Fi Hack 21 Sep 2006
  70. Jackass of the Week: Kieren McCarthy 26 Sep 2006
  71. Generalissimo Francisco Franco: Still Dead; Kieren McCarthy: Still a Jackass 28 Sep 2006
  72. New in 10.4.8: Zoom Using Scroll Wheel 29 Sep 2006
  73. Brand New 2 Oct 2006
  74. Some Assembly Required 5 Oct 2006
  75. BBColors 1.0 8 Oct 2006
  76. How to Determine if a Certain App Is Running Using AppleScript and Perl 10 Oct 2006
  77. Processing Processes 13 Oct 2006
  78. Using Keyboard Maestro to Intercept Keyboard Shortcuts Usurped by the System 16 Oct 2006
  79. Membership Renewal 17 Oct 2006
  80. Feed Me 18 Oct 2006
  81. Jackasses of the Week: Gartner Analysts Mark Stahlman and Charles Smulders 19 Oct 2006
  82. My Jackass Stamp Is Running Out of Ink 20 Oct 2006
  83. Jackass of the Week: Rush Limbaugh 25 Oct 2006
  84. Yet More Jackasses: Neal Mueller and the Business Editors of The Washington Post 26 Oct 2006
  85. Can I Get an ‘Hallelujah’ for Auto-Completion With the Esc and F5 Keys? 27 Oct 2006

I made that into a nifty ordered list in HTML to save me all the counting. [1] Wow, 85 articles! At $19 (the price for those who didn’t order t-shirts that they didn’t wear) that works out to about $0.22/post. Less than a quarter per article.

OK I pretty much already knew that I was going to renew anyway, but 85 articles? You really can’t complain about that.

And yet, the Unix geek in me wouldn’t stop.

Because if you’re like me (and, as David Letterman says, “I pray to God that you’re not”) part of you wants to know, wants desperately to know, how many words were in those 85 articles? I mean, heck, anyone can write 85 articles if they’re 3 paragraphs long.

Go ahead, guess….. Guess… I can wait here all day. Ok, did you guess? Ok, but did you guess 109,398? No, probably not, and even if you did I wouldn’t believe you. Ok Rainman, how much is that per word? I’m not sure but I think it’s something like $0.00017368 per word [2]. I’m not great at “The Math” but whatever it is, it’s pretty small.
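For anyone who wants to check “The Math” without trusting me, here is a minimal sketch that hands the division to awk. The function name cost_per is made up for illustration; the inputs are just the figures quoted above ($19, 85 articles, 109,398 words).

```shell
#!/bin/sh
# cost_per TOTAL COUNT PLACES
# Divide a total price by a count and print the result rounded to
# PLACES decimal places. (Hypothetical helper, not one of the original scripts.)
cost_per() {
    awk -v total="$1" -v count="$2" -v places="$3" 'BEGIN {
        fmt = "%." places "f\n"      # build a format such as "%.4f\n"
        printf fmt, total / count
    }'
}

cost_per 19 85 4         # dollars per article
cost_per 19 109398 8     # dollars per word
```

Building the format string by hand avoids relying on awk’s “%.*f” syntax, which not every awk accepts.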

Was it good for you, you Inner Unix Geek you? I see that look in your eyes, you want more data, compiled via as many geeky Unix scripts as you can get. First I bet you want to know how many articles there were per month.

Get Your Geek On

Alright Sports Fans: here come the scripts.

The first thing to do was get a local copy of the articles, so I could run various tests on them without having to keep hitting the server whenever I wanted some arcane bits (bytes?) of information. But we might as well dump the stuff that we aren’t going to do anything with right away.

For example, what if I could isolate the text of the articles without the header/sidebar/footer information? Well it turns out I can do just that because I’m a Unix Geek. By harnessing the power of Lynx and sed and forking and redirecting.

Fortunately for us the site is fairly standardized in its display. The bits that we want to ignore are fairly static for any short term period. lynx has a -dump flag which will linearize the text of the page, giving you just the parts that you would normally see and a few extras. If you don’t have access to Lynx directly, let me describe the output: there are 15 or 16 lines before a line that includes “Ads via The Deck”. Then comes the article, formatted to 80 characters, and then at the end there are links to the “Previous” and “Next” articles. Of course you have no way of knowing how long the article will be.

Sed can do this. It can start at the first line and delete to the line that contains “Ads via The Deck” (and since that line is likely to be unique, it’s a good candidate for a match). It can also match the line “Previous:” but that’s a dangerous line because it’s not as likely to be unique, so we have to be more careful. Looking at the format of lynx -dump, that “Previous” line has 3 spaces from the beginning of the line, the word Previous, then a colon and another space. That translates to /^   Previous: /

So we’ll put that all together and it will look something like this:

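for DF_URLS in http://url.one http://url.two http://url.three
do
# name the output file after the last part of the URL
short=`basename $DF_URLS`
# drop everything through the ad line, and everything
# from the "Previous:" link to the end of the page
lynx -dump $DF_URLS |
sed '1,/Ads via The Deck/d; /^   Previous: /,$d' > $short
done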

Of course replace the URLs with the actual URLs from DaringFireball. Do this in a clean empty directory so the only files in it will be the output files. Then run ‘wc -w *’ and it will give you the word count for each file and then the total.
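If you’d like to see what that sed expression does before pointing it at a real page, here is a small sketch that runs it on a made-up sample. The function names strip_chrome and sample_page, and the sample text itself, are invented for illustration; real lynx -dump output will be longer and may differ in detail.

```shell
#!/bin/sh
# strip_chrome: read a lynx -dump style page on stdin, print only the article.
# First range: line 1 through the "Ads via The Deck" line is deleted.
# Second range: the line starting "   Previous: " through end of input is deleted.
strip_chrome() {
    sed '1,/Ads via The Deck/d; /^   Previous: /,$d'
}

# Invented stand-in for the output of lynx -dump (an assumption, not real data).
sample_page() {
    printf '%s\n' \
        '   Daring Fireball' \
        '   Ads via The Deck' \
        '   Friday, 27 October 2006' \
        '   Body text. The word Previous: is harmless in mid-line.' \
        '   Previous: An Older Post' \
        '   Next: A Newer Post'
}

sample_page | strip_chrome
```

Only the date line and the body line survive. The anchored pattern is what keeps a stray “Previous: ” in the middle of a sentence from cutting the article short.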

I bet you want to know: “What days of the week should I expect to see a DF post?”

for DAY in Sunday Monday Tuesday Wednesday Thursday Friday Saturday
do
echo -n "$DAY: "
egrep "^$DAY, .*200(5|6)$" * | wc -l
done

Sunday: 4
Monday: 14
Tuesday: 10
Wednesday: 15
Thursday: 16
Friday: 22
Saturday: 3

Well what about months?

for MONTH in November December January February \
March April May June July August September October
do
echo -n "$MONTH: "
egrep " $MONTH 200(5|6)$" * | wc -l
done

November: 3
December: 2
January: 5
February: 4
March: 7
April: 12
May: 9
June: 4
July: 3
August: 9
September: 13
October: 14

We need to tweak that output because one of those October posts was from 2005 and the rest were from 2006, so October 2006’s result is really 13.
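One way to avoid the manual tweak is to match the month and the year together. This is only a sketch: count_posts is a made-up name, the demo files and their names are invented, and it assumes the date line of each saved article ends in “Month Year” the way the egrep above expects.

```shell
#!/bin/sh
# count_posts MONTH YEAR DIR
# Count the lines, across every file in DIR, that end in " MONTH YEAR".
# (Hypothetical helper; assumes one such date line per saved article.)
count_posts() {
    # grep -c exits non-zero when the count is 0, but still prints the 0
    cat "$3"/* | grep -c " $1 $2\$" || true
}

# Invented demo files standing in for three saved articles.
demo=$(mktemp -d)
printf 'Thursday, 27 October 2005\nSome text\n' > "$demo/post_one"
printf 'Monday, 2 October 2006\nSome text\n'    > "$demo/post_two"
printf 'Thursday, 5 October 2006\nSome text\n'  > "$demo/post_three"

echo "October 2005: $(count_posts October 2005 "$demo")"
echo "October 2006: $(count_posts October 2006 "$demo")"
rm -rf "$demo"
```

With the year in the pattern, the two Octobers stay separate and no after-the-fact subtraction is needed.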

Except for a bit of a light summer (June/July), you can see that post frequency went up a great deal after he went full-time on 20 April.

Ok, now you’re wondering: “How many of those posts included the word ‘jackass’?” Well just run a quick “fgrep -li jackass *”

  1. and_oranges
  2. gartner_jackasses
  3. jackass_kieren_mccarthy
  4. jackass_paul_thurrott
  5. jackass_rush_limbaugh
  6. jackass_stamp
  7. magic_8ball_zune
  8. mccarthy_still_a_jackass
  9. neal_mueller_washington_post
  10. rob_glaser_jackass

Man I’m tired but it’s a good, scripty tired.

Linked List Love

If you follow a lot of Mac sites, you read the same news over and over again. One site gets it, and 10 echo it. I actually have a folder of RSS feeds I call “Mac News” for sites which all pretty much cover the same thing. I’ve never bothered to check to see if any of them are better than the others or any of them just echo stuff I could hear elsewhere. It’s easy enough to just check that whole folder and scan the headlines.

The Linked List is something different. It not only covers Mac news, but also stuff around the ’net that the author finds interesting. I’ve realized over the past year that he and I share similar interests, from movies (The Departed was great) to basketball teams (Bird-era Celtics) to parenting to video games that can be played without controllers using 72 buttons and 3 joysticks. Yes there are rumors that he roots for a certain baseball team from New York and I’m from Boston, but these are things you do not talk about.

Where was I? Oh yes, the Linked List. There’s just a bunch of interesting stuff there, much of which I wouldn’t have found on my own. So that’s another reason that I like being a member of the site.

Then I started to wonder about the Linked List. Just how many posts have there been to it? So I loaded up each of the Linked List archives for the months in question.

Whoa. Just the size of the scrollbar alone told me “I’m not going to even try to count that.”

Clearly I’m going to have to script that.

for MONTH in november december january february \
march april may june july august september october 
do 
if [ "$MONTH" = "november" -o "$MONTH" = "december" ] 
then 
YEAR=2005 
else 
YEAR=2006
fi 

lynx -dump http://daringfireball.net/linked/$YEAR/$MONTH |\
fgrep -i http://|\
fgrep -vi http://daringfireball.net|\
awk '{print $2}'|\
cat -n > df.ll.$YEAR.$MONTH.txt
done

For those of you who don’t speak shell script, let me translate: We are in a FOR loop which will execute one time for each of the months named there. Note that I begin with November and end with October as those were the months of my subscription, but they needn’t be listed in the order there. My original idea was to use a counter which would increment each time through the loop and after 2 loops it would go from 2005 to 2006, but I decided that the IF/ELSE was more elegant/fewer moving parts.

Inside the loop we check to see if the month is either november or december. If so then it must be 2005, otherwise it is 2006. Then I ran lynx -dump against the archive of the Linked List for each of those month/year combinations. I looked through the output for URLs (the first ‘fgrep’ line) and then excluded links to DF itself (the second fgrep line). I then picked out the 2nd item of each resulting line (which is the URL, not the number.. check the output of lynx -dump and you’ll see for yourself). Then I took that and numbered each line (cat -n) and stuck the output in a file. This last step was not necessary, since I could have just run ‘wc -l’ (count lines) against each file. In fact I did such a thing:

wc -l *
   141 df.ll.2005.december.txt
   102 df.ll.2005.november.txt
   170 df.ll.2006.april.txt
   203 df.ll.2006.august.txt
   114 df.ll.2006.february.txt
   169 df.ll.2006.january.txt
   203 df.ll.2006.july.txt
   228 df.ll.2006.june.txt
   176 df.ll.2006.march.txt
   225 df.ll.2006.may.txt
   298 df.ll.2006.october.txt
   320 df.ll.2006.september.txt
  2349 total

You can see that the Linked List is hugely active, and as I said before, many of these stories are things that I had not seen elsewhere. This is not just the duplication of content, links, stories that you get at all the “Mac News” sites, in fact many of them are not Mac related at all, but still interesting.
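Going back to the Linked List loop for a moment: its two decisions (which year a month belongs to, and which lines count as outside links) can be tried without touching the network. What follows is a sketch under assumptions. The names year_for_month, external_links and sample_dump are invented, the sample reference list is made up, and it uses a case statement and “grep -F” where the loop above uses IF/ELSE and fgrep, since newer versions of GNU grep warn that fgrep is obsolescent.

```shell
#!/bin/sh
# year_for_month MONTH
# Map a lowercase month name to its year within the
# November 2005 to October 2006 subscription period.
year_for_month() {
    case "$1" in
        november|december) echo 2005 ;;
        *)                 echo 2006 ;;
    esac
}

# external_links: read lynx -dump output on stdin and print the URL
# (second field) of every reference line that does not point at DF itself.
external_links() {
    grep -F -i 'http://' |
    grep -F -v -i 'http://daringfireball.net' |
    awk '{print $2}'
}

# Invented stand-in for the "References" section of a lynx -dump.
sample_dump() {
    printf '%s\n' \
        'References' \
        '' \
        '   1. http://daringfireball.net/' \
        '   2. http://example.com/story' \
        '   3. http://DaringFireball.net/linked/' \
        '   4. http://example.org/other'
}

echo "november is in $(year_for_month november)"
echo "june is in $(year_for_month june)"
sample_dump | external_links
```

The case statement does the same job as the IF/ELSE; which one reads better is a matter of taste.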

I did wonder about duplication. How many of those 2,349 links were unique (i.e. how many times would you see the same link referred to on DF’s Linked List)? That too was easy to deduce using “awk '{print $2}' *|sort -u|wc -l” which translates to “Give me the 2nd column [which is the URL] from all the files (awk), sort them so that I get just the unique lines (sort -u), and count the resulting lines (wc -l).” Answer: 2205. So 144 duplicates. Part of the problem was that I didn’t even try to filter out things like links to “The Deck,” the highly unobtrusive ads which run on the site. Still, 2200 links in a year, filtered by a real human, almost all of it stuff that I haven’t seen 18 other times and places. And each piece comes with a line or two telling you what it’s about, so you know if it is something that you are interested in.
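If you want to see which links repeat and not just how many, sort and uniq will tell you. A minimal sketch, with made-up function names (count_unique, list_duplicates, sample_links) and invented sample URLs; it reads one URL per line on stdin, so the numbered files above would still need the awk step first.

```shell
#!/bin/sh
# count_unique: number of distinct lines on stdin.
count_unique() {
    sort -u | wc -l | tr -d ' '    # tr strips the padding some wc versions add
}

# list_duplicates: print each line that occurs more than once, one time each.
list_duplicates() {
    sort | uniq -d
}

# Invented sample: four links, one of them repeated.
sample_links() {
    printf '%s\n' \
        'http://example.com/a' \
        'http://example.com/b' \
        'http://example.com/a' \
        'http://example.org/c'
}

sample_links | count_unique
sample_links | list_duplicates

# The subtraction from the text: total links minus unique links.
echo $((2349 - 2205))
```

uniq only compares neighboring lines, which is why the sort has to come first.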

Whew.

Well if I wasn’t convinced already, here was certainly a mountain of evidence.

Think I’m a bit weird? Here’s the kicker: I had already sent in my renewal before I did all this. Why?

Because at the end of the day it still comes down to the fact that I like and enjoy the site enough to spend some money on it. The rest is just frosting.

Update 3 November 2006: So this post made the Linked List which explains how anyone else saw it :-)

“The humble side of me doesn’t want to link to this, but it’s too good to pass up.”

Ok, well I didn’t do it to get on the Linked List, but I probably would have been disappointed if it hadn’t made it.

“TJ probably could have saved some time if he’d known that you can just add a ‘.text’ extension to the permalink URL for any full article to get it in Markdown-formatted plain text.”

Dude, what kind of lesson would that have been for padawan Unix Geeks?

Footnotes:

  1. BTW you might think that I just copied and pasted that list from the actual DaringFireball archives but I totally didn’t. DF shows the list in descending order (newest first) whereas I have it in ascending order (oldest first). Or maybe it’s the other way around… it’s like the nearsighted/farsighted thing, I can never remember which is which. Oh, and I have mine in an ordered list, whereas DF has them in paragraphs, much to the chagrin of semantic web enthusiasts everywhere (all 6 of them).

    I did, of course, totally crib the footnotes.

  2. How I counted words: I made a list of all the URLs during my subscription period (see above ordered list) and then ran this loop:
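    # replace the placeholder URLs with the full list from above, each separated by a space
    for DF_URLS in http://url.one http://url.two http://url.three
    do
    short=`basename $DF_URLS`
    lynx -dump $DF_URLS |
    sed '1,/Ads via The Deck/d; /   Previous: /,$d' > $short
    # read from stdin so that wc prints only the number, not the filename
    WORDS=`wc -w < $short`
    echo "$short ($WORDS)"
    done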

    This gave me a local copy of each article so I didn’t have to keep hitting the DF site, and worked fine except for the fact that there were two posts named “Feed Me” (30 Apr and 18 Oct 2006). So I manually ran lynx(1) for that URL and saved it to a different filename (feed_me-2). I then ran wc(1) — the Unix word count utility — with the -w flag to give me the number of words in each document. Note that this amount is slightly inflated since I did not bother to delete the words for the date/time stamp at the top of each post, which would count for approximately 4 words per post, or 340 words, which would bring the total to 109,058 words. Also note that this is not necessarily a 100% accurate count as I believe that wc(1) considers pretty much any character surrounded by whitespace as a “word” including things like footnote digits. So maybe we’re down to 109,000 words. Statistically insignificant.

    Note that there are three spaces before the word “Previous: ” This relates more to the format of the output of “lynx -dump” than DF’s authoring style itself. If I had been thinking more clearly I would have used an ^ to anchor the regex at the beginning of the line,