[TRACE] org.apache.hadoop.hbase.regionserver
While tracing org.apache.hadoop.hbase.util.Merge,
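we saw that the merge tool combines two Regions into one by calling HRegion.merge().
To understand the merge flow in more detail,
we continue by tracing the code of HRegion.merge()
and picking out the merge-related parts of HRegion.
HRegion.merge() is a static method that takes two arguments and returns the merged Region.
Both its inputs and its output are HRegion objects. It is used as follows: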
HRegion merged = null;
HRegion r1 = HRegion.openHRegion(info1, htd, utils.getLog(info1), getConf());
HRegion r2 = HRegion.openHRegion(info2, htd, utils.getLog(info2), getConf());
merged = HRegion.merge(r1, r2);
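The method first checks that the two Regions to be merged belong to the same table,
and throws an IOException if they do not.
It then flushes each Region's in-memory data to disk
and compacts the store files so that each column family maps to exactly one file.
Next, it constructs a RegionMergeTransaction(a, b, true) and calls prepare() to confirm that the merge can proceed.
From the RegionMergeTransaction object we can obtain
the information for the merged Region, including its name, start key, and end key.
execute() then produces the merged Region (with a rollback on failure), which is compacted once more.
Finally, the HFiles of the original Regions (a and b) are archived, which completes the merge.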
/**
 * Merge two regions whether they are adjacent or not.
 *
 * @param a region a
 * @param b region b
 * @return new merged region
 * @throws IOException
 */
public static HRegion merge(final HRegion a, final HRegion b) throws IOException {
  if (!a.getRegionInfo().getTable().equals(b.getRegionInfo().getTable())) {
    throw new IOException("Regions do not belong to the same table");
  }

  FileSystem fs = a.getRegionFileSystem().getFileSystem();
  // Make sure each region's cache is empty
  a.flushcache(true);
  b.flushcache(true);

  // Compact each region so we only have one store file per family
  a.compactStores(true);
  if (LOG.isDebugEnabled()) {
    LOG.debug("Files for region: " + a);
    a.getRegionFileSystem().logFileSystemState(LOG);
  }
  b.compactStores(true);
  if (LOG.isDebugEnabled()) {
    LOG.debug("Files for region: " + b);
    b.getRegionFileSystem().logFileSystemState(LOG);
  }

  RegionMergeTransaction rmt = new RegionMergeTransaction(a, b, true);
  if (!rmt.prepare(null)) {
    throw new IOException("Unable to merge regions " + a + " and " + b);
  }
  HRegionInfo mergedRegionInfo = rmt.getMergedRegionInfo();
  LOG.info("starting merge of regions: " + a + " and " + b
      + " into new region " + mergedRegionInfo.getRegionNameAsString()
      + " with start key <"
      + Bytes.toStringBinary(mergedRegionInfo.getStartKey())
      + "> and end key <"
      + Bytes.toStringBinary(mergedRegionInfo.getEndKey()) + ">");
  HRegion dstRegion;
  try {
    dstRegion = rmt.execute(null, null);
  } catch (IOException ioe) {
    rmt.rollback(null, null);
    throw new IOException("Failed merging region " + a + " and " + b
        + ", and successfully rolled back");
  }
  dstRegion.compactStores(true);

  if (LOG.isDebugEnabled()) {
    LOG.debug("Files for new region");
    dstRegion.getRegionFileSystem().logFileSystemState(LOG);
  }

  if (dstRegion.getRegionFileSystem().hasReferences(dstRegion.getTableDesc())) {
    throw new IOException("Merged region " + dstRegion
        + " still has references after the compaction, is compaction canceled?");
  }

  // Archiving the 'A' region
  HFileArchiver.archiveRegion(a.getBaseConf(), fs, a.getRegionInfo());
  // Archiving the 'B' region
  HFileArchiver.archiveRegion(b.getBaseConf(), fs, b.getRegionInfo());

  LOG.info("merge completed. New region is " + dstRegion);
  return dstRegion;
}
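The prepare / execute / rollback sequence in merge() follows a common transaction pattern. The following is a minimal, self-contained sketch of that control flow only. The `Transaction` interface, the `Recording` class, and the `run` method are hypothetical names invented for illustration and are not part of HBase. One deliberate difference from the listing: the sketch attaches the original exception as the cause when it rethrows.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a prepare/execute/rollback flow; not HBase code.
class MergeFlowSketch {

  /** Assumed minimal transaction contract (invented for this sketch). */
  interface Transaction<T> {
    boolean prepare();              // check preconditions, change nothing
    T execute() throws IOException; // do the work, may fail midway
    void rollback();                // undo partial work after a failure
  }

  /** Runs a transaction: prepare first, then execute, rolling back on error. */
  static <T> T run(Transaction<T> tx) throws IOException {
    if (!tx.prepare()) {
      throw new IOException("Preconditions not met, nothing was changed");
    }
    try {
      return tx.execute();
    } catch (IOException ioe) {
      tx.rollback();
      // Keep the original exception as the cause so the failure stays diagnosable.
      throw new IOException("Execution failed and was rolled back", ioe);
    }
  }

  /** Toy transaction that records which steps were called. */
  static final class Recording implements Transaction<String> {
    final List<String> calls = new ArrayList<>();
    private final boolean failOnExecute;

    Recording(boolean failOnExecute) {
      this.failOnExecute = failOnExecute;
    }

    @Override public boolean prepare() {
      calls.add("prepare");
      return true;
    }

    @Override public String execute() throws IOException {
      calls.add("execute");
      if (failOnExecute) {
        throw new IOException("simulated failure");
      }
      return "merged";
    }

    @Override public void rollback() {
      calls.add("rollback");
    }
  }

  public static void main(String[] args) {
    Recording failing = new Recording(true);
    try {
      run(failing);
    } catch (IOException e) {
      System.out.println(e.getMessage() + " / steps: " + failing.calls);
    }
  }
}
```

The point of separating prepare() from execute() is that a failed precondition check leaves nothing to undo, so rollback is only needed once execute() has started changing state.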
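A Region can be in one of several states (see 65.2.4. Region State Transition),
so we can ask whether a given Region is allowed to be merged.
The state transition diagram shows that only a Region in the OPEN state can be merged.
The following method tests whether a Region is mergeable. It returns false when the Region is closing or closed, or when it still has reference files: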
/**
 * @return true if region is mergeable
 */
public boolean isMergeable() {
  if (!isAvailable()) {
    LOG.debug("Region " + this.getRegionNameAsString()
        + " is not mergeable because it is closing or closed");
    return false;
  }
  if (hasReferences()) {
    LOG.debug("Region " + this.getRegionNameAsString()
        + " is not mergeable because it has references");
    return false;
  }

  return true;
}
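The two conditions in isMergeable() can be modelled in a few lines. This is only a sketch: the `State` enum and the `referenceFiles` counter are hypothetical simplifications invented here, and the real HRegion tracks its state and its store files quite differently.

```java
// Hypothetical model of a mergeability check; the enum and parameters are invented for this sketch.
class MergeableSketch {

  /** Simplified region states (assumption: only OPEN regions are serving requests). */
  enum State { OPENING, OPEN, CLOSING, CLOSED }

  /**
   * A region is treated as mergeable only when it is OPEN and
   * has no leftover reference files.
   */
  static boolean isMergeable(State state, int referenceFiles) {
    if (state != State.OPEN) {
      return false; // not available: opening, closing or closed
    }
    return referenceFiles == 0; // leftover reference files block the merge
  }

  public static void main(String[] args) {
    System.out.println("open, no references: " + isMergeable(State.OPEN, 0));
    System.out.println("open, 2 references:  " + isMergeable(State.OPEN, 2));
    System.out.println("closing:             " + isMergeable(State.CLOSING, 0));
  }
}
```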
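Before the merge takes place, a location for the merged Region has to be created in advance.
The following code creates the merged Region. It also initializes the new Region's read and write request counters to the sum of the counters of the two source Regions.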
/**
 * Create a merged region given a temp directory with the region data.
 * @param region_b another merging region
 * @return merged HRegion
 * @throws IOException
 */
HRegion createMergedRegionFromMerges(final HRegionInfo mergedRegionInfo,
    final HRegion region_b) throws IOException {
  HRegion r = HRegion.newHRegion(this.fs.getTableDir(), this.getWAL(),
      fs.getFileSystem(), this.getBaseConf(), mergedRegionInfo,
      this.getTableDesc(), this.rsServices);
  r.readRequestsCount.set(this.getReadRequestsCount()
      + region_b.getReadRequestsCount());
  r.writeRequestsCount.set(this.getWriteRequestsCount()
      + region_b.getWriteRequestsCount());
  this.fs.commitMergedRegion(mergedRegionInfo);
  return r;
}
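The mergedRegionInfo passed to createMergedRegionFromMerges() has to carry the key range of the merged Region. As a rough illustration of how such a range could be derived from the two source ranges, here is a sketch under stated assumptions; it is not the code in RegionMergeTransaction. It assumes keys are compared as unsigned bytes, that an empty start key means "start of table", and that an empty end key means "end of table". The class and method names are hypothetical, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of deriving a merged key range; not HBase code.
class MergedRangeSketch {

  /** Smaller start key wins; an empty key sorts before every other key. */
  static byte[] mergedStart(byte[] startA, byte[] startB) {
    return Arrays.compareUnsigned(startA, startB) <= 0 ? startA : startB;
  }

  /** Larger end key wins; an empty end key means "end of table" and always wins. */
  static byte[] mergedEnd(byte[] endA, byte[] endB) {
    if (endA.length == 0 || endB.length == 0) {
      return new byte[0];
    }
    return Arrays.compareUnsigned(endA, endB) >= 0 ? endA : endB;
  }

  /** Helper for readable examples: turns a string into a row key. */
  static byte[] key(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Merge ["", "m") with ["m", "t").
    byte[] start = mergedStart(key(""), key("m"));
    byte[] end = mergedEnd(key("m"), key("t"));
    System.out.println("start key length: " + start.length);
    System.out.println("end key: " + new String(end, StandardCharsets.UTF_8));
  }
}
```

The empty end key needs special handling because, compared as plain bytes, it would sort first even though it denotes the largest possible boundary.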
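When Region a and Region b are merged through mergeAdjacent(), the code first determines whether a and b are adjacent Regions,
and merges them only if they are. It orders the two Regions by start key, then requires the end key of the first to equal the start key of the second. (HRegion.merge() itself, according to its Javadoc, merges two Regions whether they are adjacent or not.)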
/**
 * Merge two HRegions. The regions must be adjacent and must not overlap.
 *
 * @return new merged HRegion
 * @throws IOException
 */
public static HRegion mergeAdjacent(final HRegion srcA, final HRegion srcB)
    throws IOException {
  HRegion a = srcA;
  HRegion b = srcB;

  // Make sure that srcA comes first; important for key-ordering during
  // write of the merged file.
  if (srcA.getStartKey() == null) {
    if (srcB.getStartKey() == null) {
      throw new IOException("Cannot merge two regions with null start key");
    }
    // A's start key is null but B's isn't. Assume A comes before B
  } else if ((srcB.getStartKey() == null) ||
      (Bytes.compareTo(srcA.getStartKey(), srcB.getStartKey()) > 0)) {
    a = srcB;
    b = srcA;
  }

  if (!(Bytes.compareTo(a.getEndKey(), b.getStartKey()) == 0)) {
    throw new IOException("Cannot merge non-adjacent regions");
  }
  return merge(a, b);
}
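The ordering and adjacency test in mergeAdjacent() can be tried out without HBase. The sketch below models a Region as a plain key range; the `Range` class and `areAdjacent` method are hypothetical names for illustration. It uses `Arrays.compareUnsigned` (Java 9 or later) as a stand-in for an unsigned lexicographic byte comparison, and it adds one guard that is an assumption of this sketch rather than something taken from the listing: a range whose end key is empty is treated as the last one and therefore has no successor.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical key-range model used only to illustrate the adjacency test; not HBase code.
class AdjacencySketch {

  static final class Range {
    final byte[] start; // inclusive; empty = start of table
    final byte[] end;   // exclusive; empty = end of table

    Range(String start, String end) {
      this.start = start.getBytes(StandardCharsets.UTF_8);
      this.end = end.getBytes(StandardCharsets.UTF_8);
    }
  }

  /** True when one range ends exactly where the other begins, in either argument order. */
  static boolean areAdjacent(Range x, Range y) {
    // Put the range with the smaller start key first.
    boolean xFirst = Arrays.compareUnsigned(x.start, y.start) <= 0;
    Range first = xFirst ? x : y;
    Range second = xFirst ? y : x;
    // Guard added by this sketch: the last range of a table has no successor.
    if (first.end.length == 0) {
      return false;
    }
    return Arrays.equals(first.end, second.start);
  }

  public static void main(String[] args) {
    Range head = new Range("", "m");
    Range middle = new Range("m", "t");
    Range tail = new Range("t", "");
    System.out.println("head/middle adjacent: " + areAdjacent(head, middle));
    System.out.println("head/tail adjacent:   " + areAdjacent(head, tail));
  }
}
```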
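RegionMergeTransaction likewise first checks conditions such as the state of the Regions and whether they are adjacent.
Once the conditions are met, it opens an empty Region, with its corresponding HFiles, as the target,
performs the merge, and closes the original Regions.
Because this process involves file operations in HDFS
as well as changes to Region state, the merge tool has to be run with the entire cluster shut down.
This is also the reason for the restriction on Region merging in HBase 0.94.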
References:
https://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/HRegion.html
https://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/RegionMergeTransaction.html
http://hbase.apache.org/book.html#regions.arch