
[v2] of: use hash based search in of_find_node_by_phandle

Message ID b20292f5-e080-2b66-bae6-4a230cb0627a@codeaurora.org (mailing list archive)
State Not Applicable, archived
Delegated to: Andy Gross
Headers show

Commit Message

Chintan Pandya Jan. 29, 2018, 7:34 a.m. UTC
> I was curious, so I implemented it. It ends up being similar to Rasmus's
> 1st suggestion. The difference is we don't try to store all entries, but
> rather implement a hash table that doesn't handle collisions. Relying on
> the fact that phandles are just linearly allocated from 0, we just mask
> the high bits of the phandle to get the index.
I think this is the most resourceful way.
> Can you try out on your setup and try different
> array sizes.
Here are my test results. However, I simply considered overall boot time to
compare different scenarios because profiling of_find_node_by_phandle() in
early boot fails.

Scenarios:
[1] Cache size 1024 + early cache build up [Small change in your cache 
patch,
     see the patch below]
[2] Hash 64 approach [my original v2 patch]
[3] Cache size 64
[4] Cache size 128
[5] Cache size 256
[6] Base build

Result (boot to shell in sec):
[1] 14.292498 14.370994 14.313537 --> 850ms avg gain
[2] 14.340981 14.395900 14.398149 --> 800ms avg gain
[3] 14.546429 14.488783 14.468694 --> 680ms avg gain
[4] 14.506007 14.497487 14.523062 --> 670ms avg gain
[5] 14.671100 14.643344 14.731853 --> 500ms avg gain
[6] 15.263386 15.246155 15.041847 --> 0

Previously reported 400ms gain for [2] was from a different set up. These tests
and new data are from my own debug set up. When we take any of these patches to
production, results might deviate accordingly.

Chintan Pandya

Patch for case [1]
<--------------------------------------------------------------------------->

Comments

Rob Herring (Arm) Jan. 29, 2018, 3:10 p.m. UTC | #1
On Mon, Jan 29, 2018 at 1:34 AM, Chintan Pandya <cpandya@codeaurora.org> wrote:
>
>> I was curious, so I implemented it. It ends up being similar to Rasmus's
>> 1st suggestion. The difference is we don't try to store all entries, but
>> rather implement a hash table that doesn't handle collisions. Relying on
>> the fact that phandles are just linearly allocated from 0, we just mask
>> the high bits of the phandle to get the index.
>
> I think this is the most resourceful way.
>>
>> Can you try out on your setup and try different
>> array sizes.
>
> Here are my test results. However, I simply considered overall boot time to
> compare different scenarios because profiling of_find_node_by_phandle() in
> early boot fails.
>
> Scenarios:
> [1] Cache size 1024 + early cache build up [Small change in your cache
> patch,
>     see the patch below]
> [2] Hash 64 approach [my original v2 patch]
> [3] Cache size 64
> [4] Cache size 128
> [5] Cache size 256
> [6] Base build
>
> Result (boot to shell in sec):
> [1] 14.292498 14.370994 14.313537 --> 850ms avg gain
> [2] 14.340981 14.395900 14.398149 --> 800ms avg gain
> [3] 14.546429 14.488783 14.468694 --> 680ms avg gain
> [4] 14.506007 14.497487 14.523062 --> 670ms avg gain
> [5] 14.671100 14.643344 14.731853 --> 500ms avg gain

It's strange that bigger sizes are slower. Based on this data, I'd pick [3].

How many phandles do you have? I thought it was hundreds, so 1024
entries would be more than enough and you should see some curve to a
max gain as cache size approaches # of phandles.

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chintan Pandya Jan. 29, 2018, 6:18 p.m. UTC | #2
>> Scenarios:
>> [1] Cache size 1024 + early cache build up [Small change in your cache
>> patch,
>>      see the patch below]
>> [2] Hash 64 approach [my original v2 patch]
>> [3] Cache size 64
>> [4] Cache size 128
>> [5] Cache size 256
>> [6] Base build
>>
>> Result (boot to shell in sec):
>> [1] 14.292498 14.370994 14.313537 --> 850ms avg gain
>> [2] 14.340981 14.395900 14.398149 --> 800ms avg gain
>> [3] 14.546429 14.488783 14.468694 --> 680ms avg gain
>> [4] 14.506007 14.497487 14.523062 --> 670ms avg gain
>> [5] 14.671100 14.643344 14.731853 --> 500ms avg gain
> It's strange that bigger sizes are slower. Based on this data, I'd pick [3].
>
> How many phandles do you have? I thought it was hundreds, so 1024
> entries would be more than enough and you should see some curve to a
> max gain as cache size approaches # of phandles.
>
1063 phandles for my device. In one of the previous mails, I estimated it to
be a few hundred, but I was too short of the actual number. However, 1063 still
doesn't justify why [4] and [5] are not better than [3].

I would still be interested in finding a way to dynamically allocate an array
sized close to the total # of phandles, with pre-stored mappings, and to free
this array once done with it. But at present, I have no idea how I would
achieve this. If you can share any pointers around this, that would help!

Thanks,
Chintan Pandya

Patch

diff --git a/drivers/of/base.c b/drivers/of/base.c
index a0bccb5..671e7cf 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1084,6 +1084,8 @@ int of_modalias_node(struct device_node *node, char *modalias, int len)
  }
  EXPORT_SYMBOL_GPL(of_modalias_node);

+struct device_node *phandle_cache[PHANDLE_CACHE_SZ];
+
  /**
   * of_find_node_by_phandle - Find a node given a phandle
   * @handle:    phandle of the node to find
@@ -1100,9 +1102,15 @@ struct device_node *of_find_node_by_phandle(phandle handle)
                 return NULL;

         raw_spin_lock_irqsave(&devtree_lock, flags);
-       for_each_of_allnodes(np)
-               if (np->phandle == handle)
-                       break;
+       np = phandle_cache[PHANDLE_CACHE_HASH(handle)];
+       if (!np || np->phandle != handle) {
+               for_each_of_allnodes(np)
+                       if (np->phandle == handle) {
+                               phandle_cache[PHANDLE_CACHE_HASH(handle)] = np;
+                               break;
+                       }
+
+       }
         of_node_get(np);
         raw_spin_unlock_irqrestore(&devtree_lock, flags);
         return np;
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c0914fb..e90a458 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -227,6 +227,9 @@  static void populate_properties(const void *blob,
                 pprev      = &pp->next;
         }

+       if (!dryrun && np->phandle)
+               phandle_cache[PHANDLE_CACHE_HASH(np->phandle)] = np;
+
         /* With version 0x10 we may not have the name property,
          * recreate it here from the unit name if absent
          */
diff --git a/include/linux/of.h b/include/linux/of.h
index 299aeb1..b1200be 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -92,6 +92,10 @@  struct of_phandle_iterator {
         struct device_node *node;
  };

+#define PHANDLE_CACHE_SZ       1024
+#define PHANDLE_CACHE_HASH(x)  ((x) & (PHANDLE_CACHE_SZ - 1))
+extern struct device_node *phandle_cache[PHANDLE_CACHE_SZ];
+
  struct of_reconfig_data {
         struct device_node      *dn;
         struct property         *prop;