How to merge histogram bins (edges and counts) by bin-count condition?
Question:
- See the Stack Overflow question How to merge histogram bins (edges and counts) by bin-count condition?
My answer to this question
I ran into this problem in my own research project, so I looked up this question and posted my answer, hoping it helps others and serves as a reference for myself.
- See my answer:
Assume the current histogram counts hist and bin edges bin_edges were returned by the np.histogram() function, and we want to merge small bins (i.e., bins whose hist value is below some threshold) into larger ones. The code is shown below; it takes the current hist and bin_edges as inputs and returns the merged ones, along with the threshold that was used.
import numpy as np

def merge_hist_bins(hist, bin_edges,
                    hist_value_thred=1,  # read as a percentage (1 means 1%) if is_percentile is True
                    is_percentile=False):
    total = np.sum(hist)
    if is_percentile:
        # Convert a percentage of the total count into an absolute count.
        hist_thred = int(total * hist_value_thred * 0.01)
    else:
        hist_thred = int(hist_value_thred)
    print("[***] hist_thred = ", hist_thred)
    assert len(hist) == len(bin_edges) - 1
    bin_dict = {}
    i_rightmost = 0
    for i in range(0, len(hist)):
        # Skip bins already absorbed into the previous merged bin.
        if i < i_rightmost:
            continue
        edge_left = bin_edges[i]
        # Always take bin i itself (so edge_right is defined even when
        # hist_thred is 0), then keep absorbing bins to the right until
        # the running sum reaches the threshold or the bins run out.
        tmp_hist_sum = hist[i]
        j = i + 1
        while tmp_hist_sum < hist_thred and j < len(hist):
            tmp_hist_sum += hist[j]
            j += 1
        edge_right = bin_edges[j]
        bin_dict[(edge_left, edge_right)] = tmp_hist_sum
        i_rightmost = j
    new_hist = []
    new_bin_edges = [bin_edges[0]]
    # Dicts keep insertion order (Python 3.7+), so bins come out left to right.
    for k, v in bin_dict.items():
        new_hist.append(v)
        new_bin_edges.append(k[1])
        print("key {} : {}".format(k, v))
    print("[***] done, hist_thred = ", hist_thred)
    # Report bin counts (not edge counts).
    print("[***] old bin # = {}, new bin # = {}".format(len(hist), len(new_hist)))
    return np.array(new_hist), np.array(new_bin_edges), hist_thred
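To make the greedy left-to-right merging easier to follow, here is a minimal single-pass sketch of the same idea with a tiny worked example. The function name merge_small_bins and the parameter min_count are hypothetical names chosen for this illustration, not part of the original answer, and the sketch assumes the threshold is already an absolute count.

```python
import numpy as np

def merge_small_bins(hist, bin_edges, min_count):
    """Greedily merge adjacent bins from left to right until each merged
    bin holds at least `min_count` samples (the last bin may hold fewer)."""
    new_hist = []
    new_edges = [bin_edges[0]]
    running = 0
    for count, right_edge in zip(hist, bin_edges[1:]):
        running += count
        if running >= min_count:
            # Threshold reached: close the merged bin at this right edge.
            new_hist.append(running)
            new_edges.append(right_edge)
            running = 0
    # Bins left over at the right end never reached the threshold;
    # close them as one final (possibly under-threshold) bin.
    if new_edges[-1] != bin_edges[-1]:
        new_hist.append(running)
        new_edges.append(bin_edges[-1])
    return np.array(new_hist), np.array(new_edges)

# Toy input: 7 unit-width bins on [0, 7].
hist = np.array([5, 1, 0, 2, 7, 1, 1])
bin_edges = np.arange(8)

merged_hist, merged_edges = merge_small_bins(hist, bin_edges, min_count=3)
print(merged_hist)   # [5 3 7 2]
print(merged_edges)  # [0 1 4 5 7]
```

Note that the trailing bins (counts 1 and 1) sum to only 2, below the threshold of 3, so the last merged bin can end up smaller than the threshold. The total count is preserved either way.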
We will show the histogram with the following function:
import matplotlib.pyplot as plt

def show_hist(bin_edges, hist, fig_file=None):
    d_min = bin_edges[0]
    d_max = bin_edges[-1]
    d_num = len(bin_edges)  # number of bin edges
    fig, ax = plt.subplots()  # create figure and axes
    # Draw precomputed counts: one sample per bin (its left edge),
    # weighted by that bin's count.
    plt.hist(x=bin_edges[:-1], bins=bin_edges, weights=hist)
    plt.xlabel('Value')
    plt.ylabel('Frequency')
    plt.title('My Very Own Histogram')
    # Annotate the plot (in axes coordinates) with the edge range and edge count.
    plt.text(x=0.5, y=0.5,
             s=r'$D_{min}=$' + "{}".format(d_min) + r', $D_{max}=$' + "{}".format(d_max)
               + r', $D_{num}=$' + "{}".format(d_num),
             transform=ax.transAxes)
    if fig_file:
        plt.savefig("./results/{}.png".format(fig_file))
        print("saved ", "./results/{}.png".format(fig_file))
    plt.show()
    # NOTE: npz_file and write_to_file are not defined in this snippet;
    # they are presumably defined in the complete code.
    txt_fn = "./results/" + npz_file + ".csv"
    comment = "#right_bin_edge, hist_value"
    # One "right_bin_edge,hist_value" row per bin; counts of 50 or less are written as 0.5.
    file_lists = ["{},{}".format(i, j if j > 50 else 0.5) for (i, j) in zip(bin_edges[1:], hist)]
    file_lists = [comment] + file_lists
    write_to_file(txt_fn, file_lists)
See the histogram before and after the bin merging. In this example, the input histogram has 256 bins and the merged histogram has 95, with the threshold being 12% of sum(hist).
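One way to sanity-check a merge result, assuming the raw samples are still available, is to re-bin the data directly on the merged edges: because the merged edges are a subset of the original edges, np.histogram() on them should reproduce the merged counts. The sketch below uses synthetic data and a hand-picked set of kept edges (the keep array is a hypothetical merge result, not the output of the code above), and uses np.add.reduceat() to sum the fine counts between consecutive kept edges.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, illustrative data only
data = rng.normal(size=1000)

fine_edges = np.linspace(-4.0, 4.0, 17)   # 16 fine bins
fine_hist, _ = np.histogram(data, bins=fine_edges)

# Hypothetical merge result: indices of the original edges that are kept.
keep = np.array([0, 4, 6, 8, 10, 12, 16])
merged_edges = fine_edges[keep]
# Sum the fine counts between consecutive kept edges.
merged_hist = np.add.reduceat(fine_hist, keep[:-1])

# Re-binning the raw data on the merged edges should give the same counts.
rebinned_hist, _ = np.histogram(data, bins=merged_edges)
print(merged_hist)
print(rebinned_hist)
```

If the two printed arrays differ, the merged edges and merged counts are out of sync, which usually points to an off-by-one error in the edge bookkeeping.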
Complete Code
- See the complete code here