ChangeLog diary - 初学者の箸置

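I'm a fad-chaser who is easily swayed by what other people say, so I've been keeping a memo diary in ChangeLog format for about two years now.
I write tags into the entries while I'm at it, so I got curious about which tags show up the most and went ahead and reinvented the wheel.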

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
import sys
import getopt

## Option
def usage():
   print "usage: %s [-ncda] filename\n"\
         "       -n ...  sorted by name\n"\
         "       -c ...  sorted by count (default)\n"\
         "       -d ...  sorted in descending order (default)\n"\
         "       -a ...  sorted in ascending order\n" % sys.argv[0]
   sys.exit(1)

check, dir = 1, -1    # sort by count and in descending order
try:
   opts, args = getopt.getopt(sys.argv[1:], "ncdah?")
except getopt.GetoptError:
   usage()           # unknown option: show usage instead of a traceback
if not args:
   usage()

for opt,arg in opts:
   if opt == "-n":      check = 0
   elif opt == "-c":    check = 1
   elif opt == "-d":    dir = -1
   elif opt == "-a":    dir = 1
   elif opt == "-h" or opt == "-?": usage()

## preparing regex
entry  = re.compile(r"^\s+\*\s+.+\[")  # match "   * topicname: [" entry lines (must contain a tag)
tagpat = re.compile(r"\[(.+?)\]")      # match [...]-ish tag & capture it
## extract/count
tags = {}
for line in open(args[0], "r").readlines():
   if entry.match(line):
      for keyword in tagpat.findall(line):
         keyword = keyword.lower()
         try:
            tags[keyword] += 1
         except KeyError:
            tags[keyword] = 1
tagitems = tags.items()

## sort it with count (or name)
def create_sorter(check, dir):
   return lambda x,y: dir * cmp(x[check], y[check])

sorter = create_sorter(check, dir)
tagitems.sort(sorter)

for key,item in tagitems:
   print "%d: %s" % (item, key)
print "-------\ntotal=%d tags" % len(tagitems)
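
For readers on Python 3, where the print statement and the cmp-based list.sort used above are gone, the same idea might look roughly like the sketch below. It is not the original script: count_tags, sorted_tags, and the sample lines are names and data made up here, and the ChangeLog layout of the sample is an assumption based on the regex comments.

```python
import re
from collections import Counter

# Same patterns as the script above, written as raw strings.
ENTRY = re.compile(r"^\s+\*\s+.+\[")   # indented "* topic ... [" entry line
TAGPAT = re.compile(r"\[(.+?)\]")      # non-greedy capture of each [tag]


def count_tags(lines):
    """Count lower-cased [tag] occurrences on ChangeLog entry lines."""
    counts = Counter()
    for line in lines:
        if ENTRY.match(line):
            counts.update(tag.lower() for tag in TAGPAT.findall(line))
    return counts


def sorted_tags(counts, by_count=True, descending=True):
    """Return (tag, count) pairs ordered like the -n/-c and -d/-a options."""
    index = 1 if by_count else 0
    return sorted(counts.items(), key=lambda item: item[index], reverse=descending)


if __name__ == "__main__":
    # Hypothetical sample in ChangeLog memo style; not taken from the real diary.
    sample = [
        "2007-01-01  Someone  <someone@example.com>\n",
        "\n",
        "\t* regex memo: [Python][regex] findall returns captured groups\n",
        "\tbody text with [brackets] that is not an entry line\n",
        "\t* editor tip: [Emacs][python] indentation settings\n",
    ]
    for tag, n in sorted_tags(count_tags(sample)):
        print("%d: %s" % (n, tag))
```

Counter takes the place of the try/except counting, and a key function with reverse= takes the place of the cmp-style sorter.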

Output:

86: perl
84: lang
64: emacs
57: python
55: english
45: elisp
38: word
32: math
31: javascript
31: linux
30: web
30: haskell
29: macosx
26: ruby
25: idea
23: japanese
23: sicp
22: algorithm
21: sgx
21: vlsi
20: c++
18: unix
16: c#
16: hardware
15: ocaml
15: game
14: book
13: programming
13: physics
13: systemc
12: flash
12: asm
12: sql
11: 3d
11: gdi
11: database
10: java
10: life
9: scheme
　　：
　　：

Huh, that's fewer than I expected. And of all things, perl is at the top...

# Maybe my tags are just spread too thin.
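
One rough way to check whether the tags really are spread thin is to look at how much of the total the top few tags cover and how many tags were used only once. A minimal sketch, assuming a dict of tag counts shaped like the tags dictionary in the script; spread_summary is a made-up name and the sample numbers are invented for illustration, not the real diary data.

```python
def spread_summary(counts, top_n=5):
    """Summarize how concentrated a {tag: count} mapping is."""
    total = sum(counts.values())
    ranked = sorted(counts.values(), reverse=True)
    # Share of all tag uses that belongs to the top_n most frequent tags.
    top_share = sum(ranked[:top_n]) / float(total) if total else 0.0
    # Tags that appear exactly once.
    singletons = sum(1 for n in counts.values() if n == 1)
    return {
        "distinct_tags": len(counts),
        "total_uses": total,
        "top_share": top_share,
        "singletons": singletons,
    }


# Hypothetical counts for illustration only.
example = {"perl": 6, "emacs": 3, "math": 1, "idea": 1, "sql": 1}
summary = spread_summary(example, top_n=2)
print("top 2 tags cover %.0f%% of uses, %d tags used once"
      % (summary["top_share"] * 100, summary["singletons"]))
```

A low top share together with many single-use tags would suggest the tags are indeed scattered.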

Weird tags...

Basically, any tag I wrote in Japanese is weird.

[まだピンときてない] ("still hasn't clicked")

So when exactly is it going to click? This one was about Z-bias.

[いい加減おぼえよう] ("just memorize it already")

No kidding. This one was about the Python encoding declaration:

# -*- coding: utf-8 -*-

[馬鹿対策] ("idiot-proofing")

You're the idiot...
This one was about how to submit a small-memory job to the batch job system at work.
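
To fish out the Japanese tags like these automatically, one cheap approximation is to keep every tag that contains a non-ASCII character. This is only a sketch under that assumption (it would also catch accented Latin text, for instance), and non_ascii_tags is a name made up here.

```python
def non_ascii_tags(tags):
    """Return the tags that contain at least one non-ASCII character."""
    return [tag for tag in tags if any(ord(ch) > 127 for ch in tag)]


# A few tags that appear in this post; the counts are not needed here.
sample_tags = ["perl", "c++", "3d", "馬鹿対策", "まだピンときてない"]
print(non_ascii_tags(sample_tags))
```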