[TRACE] org.apache.hadoop.hbase.util.Merge

org.apache.hadoop.hbase.util.Merge
is the tool used in HBase 0.94 to merge regions.
It is invoked as follows:

$ ./bin/hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>
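
The usage line implies exactly three positional arguments. As a rough illustration only, the sketch below shows one way such a check could look. The class MergeArgsSketch, the method check(), the rule that a region name begins with "<table-name>,", and the rule that a region cannot be merged with itself are all assumptions made for this example; none of it is taken from the HBase source.

```java
// Hypothetical stand-in for the tool's argument check; not the HBase source.
class MergeArgsSketch {
    /** Returns 0 when the arguments look usable, -1 otherwise. */
    static int check(String[] args) {
        if (args.length != 3) {
            System.err.println("Usage: bin/hbase merge <table-name> <region-1> <region-2>");
            return -1;
        }
        String table = args[0];
        // Assumption: a region name begins with "<table-name>,".
        for (int i = 1; i <= 2; i++) {
            if (!args[i].startsWith(table + ",")) {
                System.err.println("Region " + args[i] + " does not belong to table " + table);
                return -1;
            }
        }
        // Assumption: merging a region with itself is rejected.
        if (args[1].equals(args[2])) {
            System.err.println("Cannot merge a region with itself");
            return -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(check(args));
    }
}
```

Returning -1 instead of throwing mirrors the return-code style that the tool itself uses for its startup checks.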


When the program starts,
org.apache.hadoop.hbase.util.Merge first checks three things:
the input arguments, that HDFS is available, and that HBase is shut down:

if (parseArgs(args) != 0) {
  return -1;
}

// Verify file system is up.
FileSystem fs = FileSystem.get(getConf());              // get DFS handle
LOG.info("Verifying that file system is available...");
try {
  FSUtils.checkFileSystemAvailable(fs);
} catch (IOException e) {
  LOG.fatal("File system is not available", e);
  return -1;
}

// Verify HBase is down
LOG.info("Verifying that HBase is not running...");
try {
  HBaseAdmin.checkHBaseAvailable(getConf());
  LOG.fatal("HBase cluster must be off-line, and is not. Aborting.");
  return -1;
} catch (ZooKeeperConnectionException zkce) {
  // If no zk, presume no master.
} catch (MasterNotRunningException e) {
  // Expected. Ignore.
}
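
The "verify HBase is down" step is unusual because the exception is the success path: if the availability check returns normally, the cluster is up and the tool aborts. A minimal sketch of that control flow follows, written without any HBase dependency. The names OfflineCheckSketch, ClusterProbe and ClusterDownException are hypothetical stand-ins introduced only for this illustration.

```java
// Hypothetical model of the "must be offline" check; the probe throwing is the success path.
class OfflineCheckSketch {
    /** Stands in for MasterNotRunningException in this sketch. */
    static class ClusterDownException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    /** Hypothetical probe: returns normally if the cluster is up, throws if it is down. */
    interface ClusterProbe {
        void checkAvailable() throws ClusterDownException;
    }

    /** Returns 0 when the cluster is confirmed down, -1 when it is still running. */
    static int verifyOffline(ClusterProbe probe) {
        try {
            probe.checkAvailable();
            // Reaching this line means the cluster answered, so the caller must abort.
            return -1;
        } catch (ClusterDownException expected) {
            // Expected: nothing is running, so it is safe to continue.
            return 0;
        }
    }

    public static void main(String[] args) {
        // A probe that always throws models a cluster that is shut down.
        System.out.println(verifyOffline(() -> { throw new ClusterDownException(); }));
    }
}
```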

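Merging starts (through mergeTwoRegions()) only once the input arguments
(table name, region-1, region-2) are valid, HDFS is up, and HBase is shut down.
mergeTwoRegions() first obtains the .META. region through utils.getMetaRegion(),
then uses the names of region-1 and region-2 to look up their region information, storing it in info1 and info2.
Once the information for both regions is confirmed, it calls merge() to produce the merged region
and adds that region to the .META. table. The code of mergeTwoRegions() is as follows: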

/*
 * Merges two regions from a user table.
 */
private void mergeTwoRegions() throws IOException {
  LOG.info("Merging regions " + Bytes.toStringBinary(this.region1) + " and " +
      Bytes.toStringBinary(this.region2) + " in table " + this.tableName);
  HRegion meta = this.utils.getMetaRegion();
  Get get = new Get(region1);
  get.addColumn(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER);
  Result result1 =  meta.get(get);
  Preconditions.checkState(!result1.isEmpty(),
      "First region cells can not be null");
  HRegionInfo info1 = HRegionInfo.getHRegionInfo(result1);
  if (info1 == null) {
    throw new NullPointerException("info1 is null using key " +
        Bytes.toStringBinary(region1) + " in " + meta);
  }
  get = new Get(region2);
  get.addColumn(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER);
  Result result2 =  meta.get(get);
  Preconditions.checkState(!result2.isEmpty(),
      "Second region cells can not be null");
  HRegionInfo info2 = HRegionInfo.getHRegionInfo(result2);
  if (info2 == null) {
    throw new NullPointerException("info2 is null using key " + meta);
  }
  TableDescriptor htd = FSTableDescriptors.getTableDescriptorFromFs(FileSystem.get(getConf()),
    this.rootdir, this.tableName);
  HRegion merged = merge(htd.getHTableDescriptor(), meta, info1, info2);

  LOG.info("Adding " + merged.getRegionInfo() + " to " +
      meta.getRegionInfo());

  HRegion.addRegionToMETA(meta, merged);
  merged.close();
}
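
The lookup half of mergeTwoRegions() follows a simple pattern: fetch the catalog row for a region name and fail fast if it is missing. The sketch below models that pattern with an in-memory map in place of the .META. region. MetaLookupSketch and its methods are hypothetical names for illustration; the only behaviour borrowed from the listing is that a missing row raises an IllegalStateException, which is what Guava's Preconditions.checkState throws.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the catalog table; not the HBase API.
class MetaLookupSketch {
    // region name -> region info (a plain string in this sketch)
    private final Map<String, String> rows = new HashMap<>();

    void put(String regionName, String regionInfo) {
        rows.put(regionName, regionInfo);
    }

    /** Fails fast when the row is missing, like the guards in the listing. */
    String lookup(String regionName) {
        String info = rows.get(regionName);
        if (info == null) {
            throw new IllegalStateException("No catalog row for region " + regionName);
        }
        return info;
    }

    public static void main(String[] args) {
        MetaLookupSketch meta = new MetaLookupSketch();
        meta.put("t1,,1.aaa.", "info1");
        System.out.println(meta.lookup("t1,,1.aaa."));
    }
}
```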

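merge() does two things.
First, it merges region-1 and region-2 through HRegion.merge(r1, r2).
Second, it deletes the entries for region-1 and region-2 from the .META. table.
As a result, mergeTwoRegions() only has to add the new region's entry to the .META. table.
The source of merge() is as follows: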

/*
 * Actually merge two regions and update their info in the meta region(s)
 * Returns HRegion object for newly merged region
 */
private HRegion merge(final HTableDescriptor htd, HRegion meta,
                      HRegionInfo info1, HRegionInfo info2)
throws IOException {
  if (info1 == null) {
    throw new IOException("Could not find " + Bytes.toStringBinary(region1) + " in " +
        Bytes.toStringBinary(meta.getRegionName()));
  }
  if (info2 == null) {
    throw new IOException("Could not find " + Bytes.toStringBinary(region2) + " in " +
        Bytes.toStringBinary(meta.getRegionName()));
  }
  HRegion merged = null;
  HRegion r1 = HRegion.openHRegion(info1, htd, utils.getLog(info1), getConf());
  try {
    HRegion r2 = HRegion.openHRegion(info2, htd, utils.getLog(info2), getConf());
    try {
      merged = HRegion.merge(r1, r2);
    } finally {
      if (!r2.isClosed()) {
        r2.close();
      }
    }
  } finally {
    if (!r1.isClosed()) {
      r1.close();
    }
  }

  // Remove the old regions from meta.
  // HRegion.merge has already deleted their files

  removeRegionFromMeta(meta, info1);
  removeRegionFromMeta(meta, info2);

  this.mergeInfo = merged.getRegionInfo();
  return merged;
}
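
Taken together, merge() and mergeTwoRegions() replace two catalog rows with one. The sketch below models that bookkeeping with a TreeMap in place of the .META. table. MergeBookkeepingSketch, Range, union() and replaceInMeta() are hypothetical names. The rule that the merged region spans from the smaller start key to the larger end key is an assumption about what HRegion.merge() produces, not something shown in the listings above. An empty key is treated as unbounded, following the HBase convention for the first and last region of a table.

```java
import java.util.TreeMap;

// Hypothetical model of the catalog bookkeeping around a merge; not the HBase API.
class MergeBookkeepingSketch {
    /** A region covers [startKey, endKey); "" means unbounded on that side. */
    static final class Range {
        final String startKey;
        final String endKey;

        Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Assumption: the merged region spans from the smaller start to the larger end. */
    static Range union(Range a, Range b) {
        String start = (a.startKey.isEmpty() || b.startKey.isEmpty()) ? ""
                : (a.startKey.compareTo(b.startKey) <= 0 ? a.startKey : b.startKey);
        String end = (a.endKey.isEmpty() || b.endKey.isEmpty()) ? ""
                : (a.endKey.compareTo(b.endKey) >= 0 ? a.endKey : b.endKey);
        return new Range(start, end);
    }

    /** Removes both old rows, then adds the merged row. */
    static void replaceInMeta(TreeMap<String, Range> meta,
                              String name1, String name2, String mergedName) {
        Range merged = union(meta.get(name1), meta.get(name2));
        meta.remove(name1);
        meta.remove(name2);
        meta.put(mergedName, merged);
    }

    public static void main(String[] args) {
        TreeMap<String, Range> meta = new TreeMap<>();
        meta.put("t1,,1", new Range("", "m"));
        meta.put("t1,m,2", new Range("m", ""));
        replaceInMeta(meta, "t1,,1", "t1,m,2", "t1,,3");
        System.out.println(meta.keySet());
    }
}
```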

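The merge tool in HBase 0.94 merges regions offline.
However, the trace so far does not explain why the offline restriction exists
(why not simply modify the .META. table online?).
Next, we will continue by tracing HRegion.merge()
to observe how the regions are actually merged.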

References:
https://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/util/Merge.html
https://hbase.apache.org/xref/org/apache/hadoop/hbase/util/Merge.html
http://blog.csdn.net/macyang/article/details/6624482
