[TRACE] org.apache.hadoop.hbase.regionserver

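While tracing org.apache.hadoop.hbase.util.Merge,
we saw that the merge tool uses HRegion.merge()
to combine two Regions into one. To understand the merge flow in more detail,
we continue by tracing the code of HRegion.merge()
and pick out the merge-related parts of HRegion.
HRegion.merge() is a function of two arguments that returns the merged Region.
Both its inputs and its output are HRegion objects. It is used as follows: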

HRegion merged = null;
HRegion r1 = HRegion.openHRegion(info1, htd, utils.getLog(info1), getConf());
HRegion r2 = HRegion.openHRegion(info2, htd, utils.getLog(info2), getConf());
merged = HRegion.merge(r1, r2);


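The method starts by checking that the two Regions to be merged belong to the same table.
(The listing has no explicit null check; a null Region would simply fail on the first dereference.)
Once that is confirmed, it flushes the in-memory data to disk
and compacts the store files so that each column family maps to exactly one file.
Next, it creates a RegionMergeTransaction(a, b, true) to carry out the merge.
Through the RegionMergeTransaction object we obtain
the information of the merged Region: its name, its start key, and its end key.
After the transaction executes, the merged Region is compacted to produce new HFiles,
the HFiles of the original Regions (a and b) are archived, and the merge is complete.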

  /**
   * Merge two regions whether they are adjacent or not.
   *
   * @param a region a
   * @param b region b
   * @return new merged region
   * @throws IOException
   */
  public static HRegion merge(final HRegion a, final HRegion b) throws IOException {
    if (!a.getRegionInfo().getTable().equals(b.getRegionInfo().getTable())) {
      throw new IOException("Regions do not belong to the same table");
    }

    FileSystem fs = a.getRegionFileSystem().getFileSystem();
    // Make sure each region's cache is empty
    a.flushcache(true);
    b.flushcache(true);

    // Compact each region so we only have one store file per family
    a.compactStores(true);
    if (LOG.isDebugEnabled()) {
      LOG.debug("Files for region: " + a);
      a.getRegionFileSystem().logFileSystemState(LOG);
    }
    b.compactStores(true);
    if (LOG.isDebugEnabled()) {
      LOG.debug("Files for region: " + b);
      b.getRegionFileSystem().logFileSystemState(LOG);
    }

    RegionMergeTransaction rmt = new RegionMergeTransaction(a, b, true);
    if (!rmt.prepare(null)) {
      throw new IOException("Unable to merge regions " + a + " and " + b);
    }
    HRegionInfo mergedRegionInfo = rmt.getMergedRegionInfo();
    LOG.info("starting merge of regions: " + a + " and " + b
        + " into new region " + mergedRegionInfo.getRegionNameAsString()
        + " with start key <"
        + Bytes.toStringBinary(mergedRegionInfo.getStartKey())
        + "> and end key <"
        + Bytes.toStringBinary(mergedRegionInfo.getEndKey()) + ">");
    HRegion dstRegion;
    try {
      dstRegion = rmt.execute(null, null);
    } catch (IOException ioe) {
      rmt.rollback(null, null);
      throw new IOException("Failed merging region " + a + " and " + b
          + ", and successfully rolled back");
    }
    dstRegion.compactStores(true);

    if (LOG.isDebugEnabled()) {
      LOG.debug("Files for new region");
      dstRegion.getRegionFileSystem().logFileSystemState(LOG);
    }

    if (dstRegion.getRegionFileSystem().hasReferences(dstRegion.getTableDesc())) {
      throw new IOException("Merged region " + dstRegion
          + " still has references after the compaction, is compaction canceled?");
    }

    // Archiving the 'A' region
    HFileArchiver.archiveRegion(a.getBaseConf(), fs, a.getRegionInfo());
    // Archiving the 'B' region
    HFileArchiver.archiveRegion(b.getBaseConf(), fs, b.getRegionInfo());

    LOG.info("merge completed. New region is " + dstRegion);
    return dstRegion;
  }
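
The prepare / execute / rollback sequence used by merge() can be looked at in isolation.
What follows is only a minimal, self-contained sketch and not HBase code:
Transaction, MergeDriver, and run are hypothetical names made up for illustration.
One deliberate difference from the listing: the sketch passes the original exception
along as the cause of the rethrown IOException, so the root failure is not lost.

```java
import java.io.IOException;

/** Hypothetical three-phase operation; NOT the HBase RegionMergeTransaction API. */
interface Transaction<T> {
    boolean prepare() throws IOException;  // validate preconditions only
    T execute() throws IOException;        // perform the actual work
    void rollback() throws IOException;    // undo any partial work
}

final class MergeDriver {
    private MergeDriver() {}

    /** Runs prepare, then execute; on failure rolls back and rethrows with the cause attached. */
    static <T> T run(Transaction<T> tx) throws IOException {
        if (!tx.prepare()) {
            throw new IOException("Unable to start: preconditions not met");
        }
        try {
            return tx.execute();
        } catch (IOException ioe) {
            tx.rollback();
            // Keep the original exception as the cause (a choice made in this sketch).
            throw new IOException("Execution failed and was rolled back", ioe);
        }
    }

    public static void main(String[] args) {
        StringBuilder journal = new StringBuilder();
        Transaction<String> failing = new Transaction<String>() {
            public boolean prepare() { journal.append("prepare;"); return true; }
            public String execute() throws IOException {
                journal.append("execute;");
                throw new IOException("disk error");
            }
            public void rollback() { journal.append("rollback;"); }
        };
        try {
            run(failing);
        } catch (IOException e) {
            // The journal shows the order in which the phases were invoked.
            System.out.println(journal + " cause=" + e.getCause().getMessage());
        }
    }
}
```

Keeping prepare free of side effects means a failed precondition needs no rollback at all,
which is why merge() can throw directly when prepare returns false.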


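A Region can be in one of several states (see 65.2.4. Region State Transition),
so we can ask whether a given Region is currently allowed to be merged.
In the state-transition diagram, only a Region in the OPEN state can be merged.
The function below tests whether a Region is mergeable: it must be available
(neither closing nor closed) and must not hold any reference files: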

  /**
   * @return true if region is mergeable
   */
  public boolean isMergeable() {
    if (!isAvailable()) {
      LOG.debug("Region " + this.getRegionNameAsString()
          + " is not mergeable because it is closing or closed");
      return false;
    }
    if (hasReferences()) {
      LOG.debug("Region " + this.getRegionNameAsString()
          + " is not mergeable because it has references");
      return false;
    }

    return true;
  }
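
The two conditions in isMergeable() can be mirrored in a small standalone model.
This is only a sketch under assumptions: RegionModel and its fields are hypothetical,
and availability is modelled as "neither closing nor closed", following the log message in the listing.

```java
/** Toy model of a region; the fields are illustrative assumptions, not HBase's HRegion. */
final class RegionModel {
    private final String name;
    private final boolean closing;
    private final boolean closed;
    private final int referenceFileCount;

    RegionModel(String name, boolean closing, boolean closed, int referenceFileCount) {
        this.name = name;
        this.closing = closing;
        this.closed = closed;
        this.referenceFileCount = referenceFileCount;
    }

    /** Available means the region is neither closing nor closed. */
    boolean isAvailable() {
        return !closing && !closed;
    }

    /** True when the region still holds at least one reference file. */
    boolean hasReferences() {
        return referenceFileCount > 0;
    }

    /** Mirrors the two checks: the region must be available and free of references. */
    boolean isMergeable() {
        return isAvailable() && !hasReferences();
    }

    public static void main(String[] args) {
        RegionModel[] regions = {
            new RegionModel("open-region", false, false, 0),
            new RegionModel("region-with-references", false, false, 2),
            new RegionModel("closing-region", true, false, 0),
        };
        for (RegionModel r : regions) {
            System.out.println(r.name + " mergeable: " + r.isMergeable());
        }
    }
}
```

The reference check matters because a Region that still points at files owned by another Region
has not yet been fully rewritten by compaction; merge() applies the same test to the merged Region before archiving a and b.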

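When Regions are merged, the merged Region has to be created ahead of time.
The code below does this: it instantiates a new HRegion for mergedRegionInfo,
sets its read and write request counts to the sums of the two source Regions,
and commits the merged Region directory.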

  /**
   * Create a merged region given a temp directory with the region data.
   * @param region_b another merging region
   * @return merged HRegion
   * @throws IOException
   */
  HRegion createMergedRegionFromMerges(final HRegionInfo mergedRegionInfo,
      final HRegion region_b) throws IOException {
    HRegion r = HRegion.newHRegion(this.fs.getTableDir(), this.getWAL(),
        fs.getFileSystem(), this.getBaseConf(), mergedRegionInfo,
        this.getTableDesc(), this.rsServices);
    r.readRequestsCount.set(this.getReadRequestsCount()
        + region_b.getReadRequestsCount());
    r.writeRequestsCount.set(this.getWriteRequestsCount()
        + region_b.getWriteRequestsCount());
    this.fs.commitMergedRegion(mergedRegionInfo);
    return r;
  }

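mergeAdjacent() merges Region a and Region b only after confirming that they are adjacent.
It first orders the two Regions by start key, then requires that the end key of the first
equals the start key of the second. merge() itself, as its comment says,
accepts two Regions whether they are adjacent or not.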

  /**
   * Merge two HRegions.  The regions must be adjacent and must not overlap.
   *
   * @return new merged HRegion
   * @throws IOException
   */
  public static HRegion mergeAdjacent(final HRegion srcA, final HRegion srcB)
  throws IOException {
    HRegion a = srcA;
    HRegion b = srcB;

    // Make sure that srcA comes first; important for key-ordering during
    // write of the merged file.
    if (srcA.getStartKey() == null) {
      if (srcB.getStartKey() == null) {
        throw new IOException("Cannot merge two regions with null start key");
      }
      // A's start key is null but B's isn't. Assume A comes before B
    } else if ((srcB.getStartKey() == null) ||
      (Bytes.compareTo(srcA.getStartKey(), srcB.getStartKey()) > 0)) {
      a = srcB;
      b = srcA;
    }

    if (!(Bytes.compareTo(a.getEndKey(), b.getStartKey()) == 0)) {
      throw new IOException("Cannot merge non-adjacent regions");
    }
    return merge(a, b);
  }
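
The ordering and adjacency test can be tried out without HBase.
The sketch below is an illustration under assumptions: KeyRange, order, and adjacent are hypothetical names,
keys are plain byte arrays built from strings, and java.util.Arrays.compareUnsigned (Java 9+)
stands in for the lexicographic unsigned byte comparison that Bytes.compareTo performs.
The null start-key handling of the listing is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy key range [startKey, endKey); illustrative only, not an HBase class. */
final class KeyRange {
    final byte[] startKey;
    final byte[] endKey;

    KeyRange(String start, String end) {
        this.startKey = start.getBytes(StandardCharsets.UTF_8);
        this.endKey = end.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns the two ranges ordered by start key (unsigned lexicographic order). */
    static KeyRange[] order(KeyRange x, KeyRange y) {
        return Arrays.compareUnsigned(x.startKey, y.startKey) <= 0
            ? new KeyRange[] {x, y}
            : new KeyRange[] {y, x};
    }

    /** True when the earlier range ends exactly where the later range starts. */
    static boolean adjacent(KeyRange x, KeyRange y) {
        KeyRange[] ordered = order(x, y);
        return Arrays.equals(ordered[0].endKey, ordered[1].startKey);
    }

    public static void main(String[] args) {
        KeyRange first = new KeyRange("a", "m");
        KeyRange second = new KeyRange("m", "z");
        KeyRange far = new KeyRange("q", "z");
        // Argument order does not matter because the ranges are sorted first.
        System.out.println("second/first adjacent: " + adjacent(second, first));
        System.out.println("first/far adjacent: " + adjacent(first, far));
    }
}
```

Sorting before comparing is what lets the caller pass the two Regions in either order,
just as mergeAdjacent() swaps srcA and srcB when needed.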

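RegionMergeTransaction also first checks conditions such as the state of the Regions and whether they are adjacent.
Once the conditions are met, it opens an empty Region and the corresponding HFiles as the target,
performs the merge, and closes the original Regions.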
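
The start key and end key of the target Region follow from the two inputs.
The sketch below shows one way such a range could be derived. It rests on two assumptions:
the merged Region spans both inputs, and an empty end key means "up to the end of the table"
(the convention for the last Region of a table). MergedRange and its methods are hypothetical names;
this is not the RegionMergeTransaction implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative helper for deriving a merged key range; not an HBase class. */
final class MergedRange {
    private MergedRange() {}

    /** Converts a string to a key; used only to keep the example readable. */
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** The merged start key is the smaller start key; an empty key sorts first. */
    static byte[] mergedStart(byte[] startA, byte[] startB) {
        return Arrays.compareUnsigned(startA, startB) <= 0 ? startA : startB;
    }

    /** The merged end key is the larger end key; an empty end key (end of table) wins. */
    static byte[] mergedEnd(byte[] endA, byte[] endB) {
        if (endA.length == 0 || endB.length == 0) {
            return new byte[0];
        }
        return Arrays.compareUnsigned(endA, endB) >= 0 ? endA : endB;
    }

    public static void main(String[] args) {
        // Region a covers [b, f), region b covers [f, end of table).
        byte[] start = mergedStart(key("b"), key("f"));
        byte[] end = mergedEnd(key("f"), key(""));
        System.out.println("start=<" + new String(start, StandardCharsets.UTF_8)
            + "> end=<" + new String(end, StandardCharsets.UTF_8) + ">");
    }
}
```

Treating the empty end key as a special case is needed because, compared as plain bytes,
an empty array would sort before every other key rather than after it.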

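Because this process involves file operations in HDFS
as well as changes of Region state, it must be run with the whole cluster shut down.
This is the source of the Region-merge limitation in HBase 0.94.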

References:
https://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/HRegion.html
https://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.html
http://hbase.apache.org/book.html#regions.arch
