Message ID | b20292f5-e080-2b66-bae6-4a230cb0627a@codeaurora.org (mailing list archive)
---|---
State | Not Applicable, archived
Delegated to | Andy Gross
On Mon, Jan 29, 2018 at 1:34 AM, Chintan Pandya <cpandya@codeaurora.org> wrote:
>
>> I was curious, so I implemented it. It ends up being similar to Rasmus's
>> 1st suggestion. The difference is we don't try to store all entries, but
>> rather implement a hash table that doesn't handle collisions. Relying on
>> the fact that phandles are just linearly allocated from 0, we just mask
>> the high bits of the phandle to get the index.
>
> I think this is the most resourceful way.
>
>> Can you try it out on your setup with different array sizes?
>
> Here are my test results. However, I simply considered overall boot time to
> compare the different scenarios, because profiling of_find_node_by_phandle()
> in early boot fails.
>
> Scenarios:
> [1] Cache size 1024 + early cache build-up [small change in your cache patch,
>     see the patch below]
> [2] Hash 64 approach [my original v2 patch]
> [3] Cache size 64
> [4] Cache size 128
> [5] Cache size 256
> [6] Base build
>
> Result (boot to shell, in sec):
> [1] 14.292498 14.370994 14.313537 --> 850ms avg gain
> [2] 14.340981 14.395900 14.398149 --> 800ms avg gain
> [3] 14.546429 14.488783 14.468694 --> 680ms avg gain
> [4] 14.506007 14.497487 14.523062 --> 670ms avg gain
> [5] 14.671100 14.643344 14.731853 --> 500ms avg gain

It's strange that bigger sizes are slower. Based on this data, I'd pick [3].

How many phandles do you have? I thought it was hundreds, so 1024
entries would be more than enough and you should see some curve to a
max gain as cache size approaches the number of phandles.

Rob
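To make the direct-mapped scheme Rob describes concrete, here is a minimal sketch. It is illustrative only: slow_tree_walk() is a hypothetical stand-in for the existing for_each_of_allnodes() scan, and none of these names come from the actual patch (which appears further below).

/*
 * Sketch only: a direct-mapped cache with no collision handling.
 * Because phandles are handed out linearly from 0, masking off the
 * high bits spreads the first CACHE_SZ phandles over distinct slots;
 * a colliding entry simply overwrites the previous occupant.
 */
struct device_node { unsigned int phandle; /* ... */ };
struct device_node *slow_tree_walk(unsigned int ph);	/* the O(n) scan */

#define CACHE_SZ	64				/* power of two */
#define CACHE_IDX(ph)	((ph) & (CACHE_SZ - 1))

static struct device_node *cache[CACHE_SZ];

struct device_node *cached_lookup(unsigned int ph)
{
	struct device_node *np = cache[CACHE_IDX(ph)];

	if (np && np->phandle == ph)
		return np;				/* hit: O(1) */

	np = slow_tree_walk(ph);			/* miss: full scan */
	if (np)
		cache[CACHE_IDX(ph)] = np;		/* remember it */
	return np;
}

The appeal of ignoring collisions is that a wrong guess costs only one comparison before falling back to the scan, so even a small table pays for itself.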
>> Scenarios:
>> [1] Cache size 1024 + early cache build-up [small change in your cache patch,
>>     see the patch below]
>> [2] Hash 64 approach [my original v2 patch]
>> [3] Cache size 64
>> [4] Cache size 128
>> [5] Cache size 256
>> [6] Base build
>>
>> Result (boot to shell, in sec):
>> [1] 14.292498 14.370994 14.313537 --> 850ms avg gain
>> [2] 14.340981 14.395900 14.398149 --> 800ms avg gain
>> [3] 14.546429 14.488783 14.468694 --> 680ms avg gain
>> [4] 14.506007 14.497487 14.523062 --> 670ms avg gain
>> [5] 14.671100 14.643344 14.731853 --> 500ms avg gain
>
> It's strange that bigger sizes are slower. Based on this data, I'd pick [3].
>
> How many phandles do you have? I thought it was hundreds, so 1024
> entries would be more than enough and you should see some curve to a
> max gain as cache size approaches the number of phandles.

1063 phandles for my device. In one of the previous mails I estimated it to
be a few hundred, but that fell well short of the actual number. However,
1063 still doesn't explain why [4] and [5] are not better than [3].

I would still be interested in finding a way to dynamically allocate an
array sized close to the total number of phandles, with the mapping
pre-stored, and to free that array once we are done with it. At present,
though, I have no idea how to achieve this. If you can share any pointers
on this, that would help!

Thanks,
Chintan Pandya
diff --git a/drivers/of/base.c b/drivers/of/base.c
index a0bccb5..671e7cf 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1084,6 +1084,8 @@ int of_modalias_node(struct device_node *node, char *modalias, int len)
 }
 EXPORT_SYMBOL_GPL(of_modalias_node);
 
+struct device_node *phandle_cache[PHANDLE_CACHE_SZ];
+
 /**
  * of_find_node_by_phandle - Find a node given a phandle
  * @handle: phandle of the node to find
@@ -1100,9 +1102,15 @@ struct device_node *of_find_node_by_phandle(phandle handle)
 		return NULL;
 
 	raw_spin_lock_irqsave(&devtree_lock, flags);
-	for_each_of_allnodes(np)
-		if (np->phandle == handle)
-			break;
+	np = phandle_cache[PHANDLE_CACHE_HASH(handle)];
+	if (!np || np->phandle != handle) {
+		for_each_of_allnodes(np)
+			if (np->phandle == handle) {
+				phandle_cache[PHANDLE_CACHE_HASH(handle)] = np;
+				break;
+			}
+
+	}
 	of_node_get(np);
 	raw_spin_unlock_irqrestore(&devtree_lock, flags);
 	return np;
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c0914fb..e90a458 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -227,6 +227,9 @@ static void populate_properties(const void *blob,
 		pprev = &pp->next;
 	}
 
+	if (!dryrun && np->phandle)
+		phandle_cache[PHANDLE_CACHE_HASH(np->phandle)] = np;
+
 	/* With version 0x10 we may not have the name property,
 	 * recreate it here from the unit name if absent
 	 */
diff --git a/include/linux/of.h b/include/linux/of.h
index 299aeb1..b1200be 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -92,6 +92,10 @@ struct of_phandle_iterator {
 	struct device_node *node;
 };
 
+#define PHANDLE_CACHE_SZ 1024
+#define PHANDLE_CACHE_HASH(x) ((x) & (PHANDLE_CACHE_SZ - 1))
+extern struct device_node *phandle_cache[PHANDLE_CACHE_SZ];
+
 struct of_reconfig_data {
 	struct device_node *dn;
 	struct property *prop;
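On Chintan's open question above (an array sized near the total number of phandles, populated up front and freed once it has served its purpose), one possible shape is sketched below. This is untested and not from the thread: the function names are hypothetical, sizing via roundup_pow_of_two() is an assumption, and locking against concurrent tree updates (devtree_lock) is deliberately omitted.

/*
 * Sketch only, not a tested patch. Sizes the cache from the largest
 * phandle in use (phandles are allocated linearly, so max_phandle is
 * close to the node count), pre-stores every mapping, and can be torn
 * down once boot-time lookups are over.
 */
#include <linux/of.h>
#include <linux/slab.h>
#include <linux/log2.h>
#include <linux/errno.h>

static struct device_node **dyn_phandle_cache;	/* hypothetical names */
static u32 dyn_phandle_mask;

int of_populate_phandle_cache(void)
{
	struct device_node *np;
	u32 max_phandle = 0, size;

	for_each_of_allnodes(np)
		if (np->phandle > max_phandle)
			max_phandle = np->phandle;

	/* power-of-two size so a mask can replace a modulo */
	size = roundup_pow_of_two(max_phandle + 1);
	dyn_phandle_cache = kcalloc(size, sizeof(*dyn_phandle_cache),
				    GFP_KERNEL);
	if (!dyn_phandle_cache)
		return -ENOMEM;
	dyn_phandle_mask = size - 1;

	for_each_of_allnodes(np)
		if (np->phandle)
			dyn_phandle_cache[np->phandle & dyn_phandle_mask] = np;
	return 0;
}

void of_free_phandle_cache(void)
{
	kfree(dyn_phandle_cache);
	dyn_phandle_cache = NULL;
	dyn_phandle_mask = 0;
}

Lookups would then check dyn_phandle_cache[handle & dyn_phandle_mask] before falling back to the full scan; as long as that path tolerates a NULL cache pointer, of_free_phandle_cache() can be called at any point after boot.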