peicheng note

Thursday, October 31, 2013

[Linux] failed to bring up eth0 in Ubuntu

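I was using a pre-built VM image configured with two NICs, one bridged and one NAT. A DHCP server outside the VM hands out the IP addresses.
After the VM had been running for a while, I could no longer connect from outside to its DHCP-assigned IP. My first instinct was to run /etc/init.d/networking restart,
which printed "failed to bring up eth0" on Ubuntu.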

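Next I ran ifconfig to check whether the device name had changed. It had not.
My guess was that the MAC address bound earlier no longer matched the current one, and that this was why DHCP had stopped working,
so I went to /etc/udev/rules.d/ to check the network rules file.
It turned out that file had never been generated.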

So I simply refreshed the DHCP lease:
sudo dhclient

It works.

peicheng@TW-PCLIAO:~/code/git_col/repeating-phrases
$ ping 10.1.192.142
PING 10.1.192.142 (10.1.192.142) 56(84) bytes of data.
64 bytes from 10.1.192.142: icmp_req=1 ttl=64 time=0.320 ms
64 bytes from 10.1.192.142: icmp_req=2 ttl=64 time=1.74 ms
64 bytes from 10.1.192.142: icmp_req=3 ttl=64 time=0.596 ms

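Tuesday, October 22, 2013

[Taiwanese martial arts] 13.10.20: first stop of the "Taiwan Fist Touring Showcase" (台灣拳頭巡迴大匯演), held by 台灣拳頭會 at the Tsung-Yeh Arts and Cultural Center in Madou

The first stop of the "Taiwan Fist Touring Showcase", held by 台灣拳頭會 at the Tsung-Yeh Arts and Cultural Center (總爺藝文中心) in Madou on the afternoon of Sunday, October 20,
was a complete success.

More than 20 groups and over 50 boxing styles gathered in one place to exchange ideas and learn from each other, with well-known masters from every school giving their best.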

Thursday, October 17, 2013

[solr] Solr vs SolrCloud / The difference between Solr and SolrCloud

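SolrCloud fixes the shortcomings of Solr's original distributed mode.
Before SolrCloud, Solr had no distributed indexing feature, so splitting a core into shards was a manual job.
You had to know which node each record file should be sent to for indexing; Solr did not manage any of these steps for you.
When the cores became unbalanced, you also had to fix that by hand. There was no failover either, so part of the index could become unreachable.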
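
To make the manual routing described above concrete, here is a minimal sketch of what a client had to do on its own before SolrCloud. The shard URLs and the hash-modulo rule are assumptions chosen for illustration; Solr did not prescribe any particular scheme, and this is not how SolrCloud itself routes documents.

```python
import zlib

# Hypothetical shard URLs; a real deployment would list its own cores.
SHARDS = [
    "http://node1:8983/solr/core0",
    "http://node2:8983/solr/core1",
    "http://node3:8983/solr/core2",
]


def shard_for(doc_id, shards=SHARDS):
    """Pick a shard for a document id with a stable hash (an assumed scheme)."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def group_by_shard(doc_ids, shards=SHARDS):
    """Group document ids by the shard the client would have to post them to."""
    batches = {shard: [] for shard in shards}
    for doc_id in doc_ids:
        batches[shard_for(doc_id, shards)].append(doc_id)
    return batches


batches = group_by_shard(["doc-%d" % i for i in range(10)])
for shard, ids in batches.items():
    print(shard, ids)
```

Nothing in this sketch rebalances shards or notices a dead node, which is exactly the bookkeeping the client was left with.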

SolrCloud introduces ZooKeeper

SolrCloud brings in ZooKeeper, the coordination service widely used across the Hadoop ecosystem, to handle failover and load balancing. This makes the whole search engine more robust.

There were, however, several problems with the distributed approach that necessitated improvement with SolrCloud:

- Splitting of the core into shards was somewhat manual.
- There was no support for distributed indexing, which meant that you needed to explicitly send documents to a specific shard; Solr couldn't figure out on its own what shards to send documents to.
- There was no load balancing or failover, so if you got a high number of queries, you needed to figure out where to send them and if one shard died it was just gone.

SolrCloud fixes all those problems. There is support for distributing both the index process and the queries automatically, and ZooKeeper provides failover and load balancing. Additionally, every shard can also have multiple replicas for additional robustness.

Shards and Indexing Data in SolrCloud - Apache Solr Reference Guide - Apache Software Foundation
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

Wednesday, October 16, 2013

[hadoop] MapReduce old vs new API: org.apache.hadoop.mapred vs org.apache.hadoop.mapreduce

A quick note:

Before 0.20: the old org.apache.hadoop.mapred interfaces.
0.20 introduced the new API in org.apache.hadoop.mapreduce.

From 0.20 onward, use:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Mapper.html

@InterfaceAudience.Public
@InterfaceStability.Stable
public class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
extends Object
Maps input key/value pairs to a set of intermediate key/value pairs.

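The new API uses extends.

The new API introduces a Context object that directly replaces the OutputCollector and Reporter objects used in the map() and reduce() methods. Key/value pairs are now emitted by calling Context.write().

From 0.20 onward, Mapper and Reducer are classes, so you write extends Mapper and extends Reducer.
Before that, Mapper and Reducer were interfaces.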

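Sunday, October 13, 2013

[mac] Previewing images on the Mac

While sorting through pictures on the Mac, I kept wondering why there was no handy image viewer.
It turns out the built-in preview is all you need.

Select an image and press space (Quick Look); the arrow keys then move from one image to the next.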

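Friday, October 11, 2013

[hadoop] MapFile

A MapFile is a sorted Hadoop SequenceFile with an index.
On HDFS a MapFile is a directory containing two files: index, which holds the key index, and data, which holds the sorted original records.
A lookup only needs the index loaded into memory; a binary search over it then locates the key quickly.

Contents of index:
# hadoop fs -text numbers.map/index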

1 128
129 5820
257 11539
385 17255
513 22971
641 28676
769 34388
897 40107

One index entry is written for every 128 keys; the second column is the offset into the data file.

data holds the sorted key/value pairs.

# hadoop fs -text numbers.map/data
13/10/11 17:17:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/10/11 17:17:23 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/10/11 17:17:23 INFO compress.CodecPool: Got brand-new decompressor
1 one 11 fsdf afd fsdf 111
2 two 222 fsdf d fsd sd 222
3 thref sfd sfsdf e 333 fsd 333
4 four 44 fds 4sfsd fsdfs 4444
5 five 555 fsd fdsf fsd f sf sdfsdfsdf 5555
6 one 11 fsdf afd fsdf 111
7 two 222 fsdf d fsd sd 222
8 thref sfd sfsdf e 333 fsd 333
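
The lookup described above (binary search over the in-memory index, then a short scan of the data) can be sketched in plain Python. This is only a model of the idea, not Hadoop's implementation: the records live in a list, the "offset" is a list position rather than a byte offset, and the names build_index and lookup are made up for this example.

```python
from bisect import bisect_right

INDEX_INTERVAL = 128  # one index entry per 128 keys, as in the listing above


def build_index(records, interval=INDEX_INTERVAL):
    """Return (key, position) for every `interval`-th record of a sorted list."""
    return [(records[i][0], i) for i in range(0, len(records), interval)]


def lookup(index, records, key, interval=INDEX_INTERVAL):
    """Binary-search the small index, then scan at most `interval` records."""
    index_keys = [k for k, _ in index]
    slot = bisect_right(index_keys, key) - 1  # last index entry with key <= target
    if slot < 0:
        return None  # target is smaller than the first key
    start = index[slot][1]
    for k, v in records[start:start + interval]:
        if k == key:
            return v
        if k > key:
            break  # records are sorted, so the key is absent
    return None


# Sorted sample data standing in for the data file.
records = [(k, "value-%d" % k) for k in range(1, 1001)]
index = build_index(records)
print(index[:3])
print(lookup(index, records, 300))
```

Only the index needs to stay in memory, which is why the Javadoc below advises keeping keys small.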

org.apache.hadoop.io
Class MapFile

java.lang.Object
  extended by org.apache.hadoop.io.MapFile

Direct Known Subclasses: ArrayFile, SetFile

public class MapFile extends Object

A file-based map from keys to values.

A map is a directory containing two files, the data file, containing all keys and values in the map, and a smaller index file, containing a fraction of the keys. The fraction is determined by MapFile.Writer.getIndexInterval().

The index file is read entirely into memory. Thus key implementations should try to keep themselves small.

Map files are created by adding entries in-order. To maintain a large database, perform updates by copying the previous version of a database and merging in a sorted change list, to create a new version of the database in a new file. Sorting large change lists can be done with SequenceFile.Sorter.

Tuesday, October 8, 2013

[hadoop] Ambari HDP default JobTracker and NameNode ports

IPC ports
JobTracker: 8021
NameNode: 8020

Web UI ports
JobTracker WebUI: 50030
NameNode WebUI: 50070

Wednesday, October 2, 2013

[eclipse][ubuntu] Eclipse cannot start: java.lang.UnsatisfiedLinkError: Could not load SWT library. Reasons:

$ cat 1380704328894.log
!SESSION 2013-10-02 16:58:48.757 -----------------------------------------------
eclipse.buildId=I20110613-1736
java.version=1.6.0_34
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=linux, ARCH=x86_64, WS=gtk, NL=zh_TW
Command-line arguments: -os linux -ws gtk -arch x86_64

!ENTRY org.eclipse.osgi 4 0 2013-10-02 16:58:50.031
!MESSAGE Application error
!STACK 1
java.lang.UnsatisfiedLinkError: Could not load SWT library. Reasons:
no swt-gtk-3740 in java.library.path
no swt-gtk in java.library.path
Can't load library: /home/peicheng/.swt/lib/linux/x86_64/libswt-gtk-3740.so
Can't load library: /home/peicheng/.swt/lib/linux/x86_64/libswt-gtk.so

at org.eclipse.swt.internal.Library.loadLibrary(Library.java:285)
at org.eclipse.swt.internal.Library.loadLibrary(Library.java:194)
at org.eclipse.swt.internal.C.<clinit>(C.java:21)
at org.eclipse.swt.internal.Converter.wcsToMbcs(Converter.java:63)
at org.eclipse.swt.internal.Converter.wcsToMbcs(Converter.java:54)
at org.eclipse.swt.widgets.Display.<clinit>(Display.java:132)
at org.eclipse.ui.internal.Workbench.createDisplay(Workbench.java:695)
at org.eclipse.ui.PlatformUI.createDisplay(PlatformUI.java:161)
at org.eclipse.ui.internal.ide.application.IDEApplication.createDisplay(IDEApplication.java:153)
at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:95)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:344)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:622)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:577)
at org.eclipse.equinox.launcher.Main.run(Main.java:1410)
at org.eclipse.equinox.launcher.Main.main(Main.java:1386)

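How to fix it

Symlink the system SWT libraries into the directory Eclipse searches. On a 32-bit install:
ln -s /usr/lib/jni/libswt-* ~/.swt/lib/linux/x86/
or, on a 64-bit install (the case in the log above):
ln -s /usr/lib/jni/libswt-* ~/.swt/lib/linux/x86_64/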

[hive] left semi join

Tutorial - Apache Hive - Apache Software Foundation
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Joins

The Hive tutorial only lists a few join variants.
One of them is LEFT SEMI JOIN:

In order to check the existence of a key in another table, the user can use LEFT SEMI JOIN as illustrated by the following example.

INSERT OVERWRITE TABLE pv_users
SELECT u.*
FROM user u LEFT SEMI JOIN page_view pv ON (pv.userid = u.id)
WHERE pv.date = '2008-03-03';

Suppose there are two tables, A and B:

A
id name
1 abc
2 edf

B
id city
1 taipei
2 ku
1 yl

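With LEFT SEMI JOIN, each row of A is returned at most once, no matter how many rows of B match it. Here id 1 matches two rows in B but still appears only once, so the result is de-duplicated.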
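
To see the difference on the A and B tables above, here is a small Python model of the two join types. It only mimics the semantics on in-memory tuples; inner_join and left_semi_join are names invented for this sketch, not Hive functions.

```python
# Rows of table A (id, name) and table B (id, city), taken from the example above.
A = [(1, "abc"), (2, "edf")]
B = [(1, "taipei"), (2, "ku"), (1, "yl")]


def inner_join(left, right):
    """Ordinary join: one output row per matching (left, right) pair."""
    return [(lid, name, city)
            for lid, name in left
            for rid, city in right
            if lid == rid]


def left_semi_join(left, right):
    """Semi join: keep a left row if any right row matches; left columns only."""
    right_ids = {rid for rid, _ in right}
    return [row for row in left if row[0] in right_ids]


print(inner_join(A, B))      # id 1 appears twice, once per matching row of B
print(left_semi_join(A, B))  # each row of A appears at most once
```

The semi join returns only the columns of A, which matches the SELECT u.* in the Hive example.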

cf.

Hive Join(翻译自Hive wiki) - ggjucheng - 博客园
http://www.cnblogs.com/ggjucheng/archive/2013/01/15/2860723.html
