Message ID: 20161028031747.68472ac7@roar.ozlabs.ibm.com (mailing list archive)
State: Deferred, archived
Snipping in various places:

Dave Chinner wrote:
> directory operations:
>       lookups = 79418, creates = 460612
>       removes = 460544, getdents = 5685160
>
> So, we have 80k directory lookups to instantiate an inode, but 460k
> creates and removes, and 5.6M readdir calls. That's a lot of readdir
> calls....
>
>> trans 0 3491960 0
>
> 3.5 million asynchronous transaction commits. That's a lot, so far I
> can account for about 1.5M of them (creates, removes and a handful
> of data extents). Oh, a couple of million data writes - I bet the
> rest are timestamp/i_version updates.
>
> [...]
> log force sleeps = 143932    (fsync waits!)
> ...
> Identifying what that is will give you some idea
> of how to reduce the fsync load and hence potentially decrease the
> log and CRC overhead that it is causing....
----
Back when the free-inode performance enhancement was introduced, it
was required that crc'ing of metadata also be enabled. Have these
options been separated yet? If so, save yourself some reading and
just get the "thanks!", and ignore the rest of this, but if not...
please read on, and please reconsider...

I asked for a separation of these options on the basis that crc'ing
of the file system was not something I wanted on my files due to
performance, data security, and recovery considerations. I was told
that the crc'ing of the data wouldn't be noticeable (something I was
skeptical of), and that the options wouldn't be offered separately.
My inner cynic thought that this was because many, if not most, users
wouldn't need the crc and it would only be another source of problems
-- one that would disable data access if the CRCs couldn't be
verified.

I really, still, don't want the crc's on my disks. I don't see that
they would provide any positive benefit in my usage -- nor in many
uses of xfs -- which ISN'T to say I can't see the need and/or desire
to have them for many classes of xfs users -- just not one that I
belong to at this point. I was more worried about performance than
anything else (always trying to eke the most out of a fixed budget!).

I see the speedup from the free-inode tree in saving time during the
real-time search for free space, but only saw the crc calculations as
time-wasting overhead for the customer not needing such integrity
guarantees. Is the free-space inode feature anything more than
something to offset the speed lost by crc calculations?

Wouldn't xfs be more flexible if it could be tuned for performance OR
integrity (or a mix of both)? Even if someone only wants to support
the combination in some official release, allowing those options to
be selected separately would allow better testing as well as
isolation of features for specific workloads, testing and/or
benchmarks.

L. Walsh
Hi Linda,

On Tue, Nov 01, 2016 at 07:18:32PM -0700, L.A. Walsh wrote:
> Snipping in various places:
>
> Dave Chinner wrote:
> >directory operations:
> >      lookups = 79418, creates = 460612
> >      removes = 460544, getdents = 5685160
> >
> >So, we have 80k directory lookups to instantiate an inode, but 460k
> >creates and removes, and 5.6M readdir calls. That's a lot of readdir
> >calls....
> >>trans 0 3491960 0
> >3.5 million asynchronous transaction commits. That's a lot, so far I
> >can account for about 1.5M of them (creates, removes and a handful
> >of data extents). Oh, a couple of million data writes - I bet the
> >rest are timestamp/i_version updates.
> >[...]
> >log force sleeps = 143932    (fsync waits!)
> >...
> >Identifying what that is will give you some idea
> >of how to reduce the fsync load and hence potentially decrease the
> >log and CRC overhead that it is causing....
> ----
> Back when the free-inode performance enhancement was introduced,
> it was required that crc'ing of metadata also be enabled. Have
> these options been separated yet?

No, and they won't be. The answer has not changed.

.....

> I really, still, don't want the crc's on my disks. I don't see
> that they would provide any positive benefit in my usage -- nor in
> many uses of xfs -- which ISN'T to say I can't see the need and/or
> desire to have them for many classes of xfs users -- just not one
> that I belong to at this point. I was more worried about performance
> than anything else (always trying to eke the most out of a fixed
> budget!).

You may not want CRCs, but they /aren't for you/. The benefits of
CRCs are realised when things go wrong, not when things go right.
IOWs, CRCs are most useful for the people supporting XFS, as they
provide certainty about where errors and corruptions have originated.

As most users never have things go wrong, all they think is "CRCs are
unnecessary overhead". It's just like backups - how many people don't
make backups because they cost money right now and there's no
tangible benefit until something goes wrong, which almost never
happens?

> I see the speedup from the free-inode tree in saving time during
> the real-time search for free space, but only saw the crc
> calculations as time-wasting overhead for the customer not needing
> such integrity guarantees.

Exactly my point. Humans are terrible at risk assessment and
mitigation because most people are unaware of the unconscious
cognitive biases that affect this sort of decision making.

> Is the free-space inode feature anything more than something to
> offset the speed lost by crc calculations?

That's not what the free inode btree does. It actually slows down
allocation on an empty filesystem and trades off that increase in
"empty filesystem" overhead for significantly better aged filesystem
inode allocation performance. i.e. the finobt provides more
deterministic inode allocation overhead, not "faster" allocation.

Let me demonstrate with some numbers on empty filesystem create rate:

                        create rate    sys CPU time   write rate
                         (files/s)       (seconds)      (MB/s)
crc = 0, finobt = 0:      238943           2629          ~200
crc = 1, finobt = 0:      231582           2711           ~40
crc = 1, finobt = 1:      232563           2766           ~40
*hacked* crc disable:     231435           2789           ~40

Std deviation of the file create rate is about +/-10%, so the
differences between the create rate results (3%) are not significant.
Yes, there's a slight decrease in performance with CRCs enabled, but
that's because of the increased btree footprint due to the increased
btree header size in the v5 format. Hence lookups, modifications,
etc. take slightly longer and cost more CPU.

We can see that the system CPU time increased by 3.1% with the
"addition of CRCs". The CPU usage increases by a further 2% with the
addition of the free inode btree, which should give you an idea of
how much CPU time even a small btree consumes. The allocated inode
btree is huge in comparison to the finobt in this workload, which is
why even a small change in header size (when CRCs are enabled) makes
a large difference in CPU usage.

To verify that CRC has no significant impact on inode allocation,
let's look at the actual CPU used by the CRC calculations in this
workload:

   0.28%  [kernel]  [k] crc32c_pcl_intel_update

Only a small proportion of the entire increase in CPU consumption
comes from "turning on CRCs". Indeed, the "*hacked* CRC disable"
results are from skipping CRC calculations in the code altogether and
returning "verify ok" without calculating them. The create rate is
identical to the crc=1,finobt=1 numbers and the CPU usage is
/slightly higher/ than when CRCs are enabled.

IOWs, for most workloads CRCs have no impact on filesystem
performance. Yes, there are cases where there is some impact (as Nick
has pointed out), but these are situations where minor optimisations
can make a big difference, such as Nick's suggestion to zero before
write rather than do two partial calcs. This is not a severe issue -
it's just the normal process of iterative improvement.

Cheers,

Dave.
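As an aside for readers skimming the thread: the "zero before write
rather than do two partial calcs" optimisation mentioned above is the
approach taken by the patch quoted at the end of this thread. The
following standalone sketch is illustrative only -- it uses a slow
bitwise software CRC32C as a stand-in for the kernel's accelerated
crc32c() -- and shows why the two schemes are interchangeable:
zeroing the in-buffer CRC field and making one pass produces exactly
the same checksum as three partial calls that skip the field, while
handing the CRC implementation one large contiguous chunk.

/* crc_onepass.c - illustrative sketch, not the thread's patch.
 * Build: cc -O2 crc_onepass.c -o crc_onepass
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise software CRC32C; a stand-in for the kernel's crc32c(). */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
        const unsigned char *p = buf;

        while (len--) {
                crc ^= *p++;
                for (int i = 0; i < 8; i++)
                        crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1));
        }
        return crc;
}

#define CRC_SEED  (~0u)
#define BLKSZ     4096
#define CRC_OFF   16            /* offset of the on-disk CRC field */

/* Old scheme: three partial calculations around the CRC field. */
static uint32_t cksum_split(const char *buf, size_t len, size_t off)
{
        uint32_t zero = 0;
        uint32_t crc;

        crc = crc32c(CRC_SEED, buf, off);           /* before field */
        crc = crc32c(crc, &zero, sizeof(zero));     /* field as 0   */
        return crc32c(crc, buf + off + sizeof(zero),
                      len - (off + sizeof(zero)));  /* after field  */
}

/* New scheme: zero the field in place, one pass over the buffer. */
static uint32_t cksum_onepass(char *buf, size_t len, size_t off)
{
        memset(buf + off, 0, sizeof(uint32_t));
        return crc32c(CRC_SEED, buf, len);
}

int main(void)
{
        static char blk[BLKSZ];

        for (size_t i = 0; i < BLKSZ; i++)
                blk[i] = (char)(i * 131 + 7);

        uint32_t a = cksum_split(blk, BLKSZ, CRC_OFF);
        uint32_t b = cksum_onepass(blk, BLKSZ, CRC_OFF);

        /* Identical byte streams, so identical CRCs. */
        assert(a == b);
        printf("split %#x  onepass %#x\n", a, b);
        return 0;
}

The one-pass form matters because vectorised and hardware CRC
implementations have per-call startup overhead and run fastest on
large aligned buffers, which is exactly the point Nick makes later in
this thread.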
Dave Chinner wrote:
>
> As most users never have things go wrong, all they think is "CRCs
> are unnecessary overhead". It's just like backups - how many people
> don't make backups because they cost money right now and there's no
> tangible benefit until something goes wrong, which almost never
> happens?
----
But it's not like backups. You can't run a util program upon
discovering bad CRCs that will fix the file system, because the file
system is no longer usable. That means you have to restore from
backup. Thus, for those keeping backups, there is no benefit, as
they'll have to restore from backups in either case.

> Exactly my point. Humans are terrible at risk assessment and
> mitigation because most people are unaware of the unconscious
> cognitive biases that affect this sort of decision making.
---
My risk is near 0, since my file systems are monitored by a raid
controller with read patrols made over the data on a periodic basis.
I'll assert that the chance of data randomly going corrupt is much
higher because there is a lot more data than metadata. On top of
that, because I keep backups, my risk is, at worst, the same without
crc's as with them.

> It actually slows down
> allocation on an empty filesystem and trades off that increase in
> "empty filesystem" overhead for significantly better aged filesystem
> inode allocation performance.
----
Ok, let's see, ages of my file systems:
  4 from 2009 (7 years)
  1 from 2013 (3 years)
  9 from 2014 (2 years)
I don't think I have any empty or new filesystems (FWIW, I store the
creation time in the UUID).

> i.e. the finobt provides more
> deterministic inode allocation overhead, not "faster" allocation.
>
> Let me demonstrate with some numbers on empty filesystem create
> rate:
>
>                         create rate    sys CPU time   write rate
>                          (files/s)       (seconds)      (MB/s)
> crc = 0, finobt = 0:      238943           2629          ~200
> crc = 1, finobt = 0:      231582           2711           ~40
> crc = 1, finobt = 1:      232563           2766           ~40
> *hacked* crc disable:     231435           2789           ~40
>
> We can see that the system CPU time increased by 3.1% with the
> "addition of CRCs". The CPU usage increases by a further 2% with
> the addition of the free inode btree,
---
On an empty file system, or older ones that are >50% used?

It's *nice* to be able to run benchmarks, but not allowing crc to be
disabled removes that possibility -- and that's sorta the point. In
order to prove your point, you created a benchmark with crc's
disabled. But the thing about benchmarks is making them so others can
reproduce your results. That's the problem. If I could do the same
benchmarks and get similar results, I'd give up on finobt as not
being worth it. But I'm not able to run such tests on my workload
and/or filesystems. The common advice about performance numbers and
how they are affected by options is to do benchmarks on your own
systems with your own workload and see if the option helps. That's
what I want to do. Why deny that?

> which should give you an idea
> of how much CPU time even a small btree consumes.
---
In a non-real-world case, on empty file systems. How does it work in
the real world on file systems like mine? I know the MB/s isn't
close, w/my max sustained I/O rates being about 1GB/s (all using
direct i/o -- the rate drops significantly if I use kernel
buffering). Even not pre-allocating and defragmenting the test file
will noticeably affect I/O rates.

Showing the result on an empty file system is when finobt would have
the *least* effect, since it is when the kernel has to search for
free space that things slow down; but if the free space is
pre-allocated in a dedicated b-tree, then the kernel doesn't have to
search -- which would be a much bigger difference than on an empty
file system.

> The allocated
> inode btree is huge in comparison to the finobt in this workload,
> which is why even a small change in header size (when CRCs are
> enabled) makes a large difference in CPU usage.
>
> To verify that CRC has no significant impact on inode allocation,
> let's look at the actual CPU used by the CRC calculations in this
> workload:
>
>    0.28%  [kernel]  [k] crc32c_pcl_intel_update
---
And how much is spent searching for free space? On multi-gig files it
can reduce I/O rates by 30% or more.

> Only a small proportion of the entire increase in CPU consumption
> comes from "turning on CRCs". Indeed, the "*hacked* CRC
> disable" results are from skipping CRC calculations in the code
> altogether and returning "verify ok" without calculating them. The
> create rate is identical to the crc=1,finobt=1 numbers and the CPU
> usage is /slightly higher/ than when CRCs are enabled.
>
> IOWs, for most workloads CRCs have no impact on filesystem
> performance.
---
Too bad no one can test the effect on their own workloads, though if
not doing crc's takes more CPU, then it sounds like an algorithm
problem: crc calculations don't take "negative time", and a benchmark
showing they do indicates something else is causing the slowdown.

> Cheers,
> Dave.
----
Sigh... and Cheers to you too! ;-)

Linda
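To make the finobt disagreement above concrete: without a free-inode
index, inode allocation has to search the by-number inode btree for a
record that still has a free slot, and on an aged filesystem that
search grows with the number of records; the finobt indexes only
records with free inodes, so lookup cost stays flat. The toy model
below is a conceptual illustration only -- emphatically not the XFS
implementation; a flat array stands in for the btrees:

/* finobt_sketch.c - conceptual illustration only.
 * Build: cc -O2 finobt_sketch.c -o finobt_sketch
 */
#include <stdio.h>
#include <stdlib.h>

#define NCHUNKS 1000000                 /* inode chunks on an aged fs */

struct chunk { int free; };             /* >0: chunk has free inodes */

/* No finobt: walk chunk records until one has a free inode;
 * cost grows with how far the first free chunk is. */
static long alloc_scan(const struct chunk *c, long n)
{
        for (long i = 0; i < n; i++)
                if (c[i].free > 0)
                        return i;
        return -1;
}

/* finobt: a side index holding only chunks with free inodes, so
 * allocation is one lookup regardless of filesystem age. */
static long alloc_finobt(const long *free_idx, long nfree)
{
        return nfree ? free_idx[0] : -1;
}

int main(void)
{
        struct chunk *c = calloc(NCHUNKS, sizeof(*c));
        long free_idx[1];

        if (!c)
                return 1;
        /* Aged fs: the only chunk with free inodes is near the end. */
        c[NCHUNKS - 1].free = 3;
        free_idx[0] = NCHUNKS - 1;

        printf("scan found chunk %ld\n", alloc_scan(c, NCHUNKS));
        printf("finobt found chunk %ld\n", alloc_finobt(free_idx, 1));
        free(c);
        return 0;
}

This is the "deterministic overhead" Dave describes: the scan cost
depends on filesystem age, while the indexed lookup does not.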
On 11/3/16 11:04 AM, L.A. Walsh wrote:
>
> Dave Chinner wrote:
>>
>> As most users never have things go wrong, all they think is "CRCs
>> are unnecessary overhead". It's just like backups - how many people
>> don't make backups because they cost money right now and there's no
>> tangible benefit until something goes wrong, which almost never
>> happens?
> ----
> But it's not like backups. You can't run a util
> program upon discovering bad CRCs that will fix the file system,
> because the file system is no longer usable.

Sure you can - xfs_repair knows what to do in such a case. It's not
guaranteed that you get /all/ your data back in every corrupted fs
situation, but that's not because of CRCs - a bad CRC is just the
messenger about filesystem corruption.

> That means you
> have to restore from backup. Thus, for those keeping
> backups, there is no benefit, as they'll have to restore
> from backups in either case.

Again, as Dave said, the format changes implemented for CRCs help us
support XFS. It's not something /directly/ useful to the user, other
than possibly stopping a corruption before it gets worse, by early
detection. If you have a corruption which is so bad that you have to
go to backups, that would be true with or without CRCs.

-Eric
On Thu, Nov 03, 2016 at 09:04:42AM -0700, L.A. Walsh wrote:
> Dave Chinner wrote:
> >
> >As most users never have things go wrong, all they think is "CRCs
> >are unnecessary overhead". It's just like backups - how many people
> >don't make backups because they cost money right now and there's no
> >tangible benefit until something goes wrong, which almost never
> >happens?
> ----
> But it's not like backups. You can't run a util
> program upon discovering bad CRCs that will fix the file system,
> because the file system is no longer usable.

xfs_repair will fix it, just like it will fix the same corruption on
non-CRC filesystems.

> >Exactly my point. Humans are terrible at risk assessment and
> >mitigation because most people are unaware of the unconscious
> >cognitive biases that affect this sort of decision making.
> ---
> My risk is near 0, since my file systems are monitored
> by a raid controller with read patrols made over the data on
> a periodic basis.

If I had a dollar for every time someone said "hardware raid protects
me" I'd have retired years ago. Media scrubbing does not protect
against misdirected writes, corruptions to/from the storage, memory
errors, software bugs, bad compilers (yes, we've already had XFS CRCs
find a compiler bug), etc.

> I'll assert that the chance of data randomly
> going corrupt is much higher because there is a lot more data than
> metadata. On top of that, because I keep backups, my risk is,
> at worst, the same without crc's as with them.

The /scale of disaster/ for metadata corruption is far higher than
for file data - a single bit error can trash the entire filesystem.
You may not care about this, but plenty of other XFS users do.

> >i.e. the finobt provides more
> >deterministic inode allocation overhead, not "faster" allocation.
> >
> >Let me demonstrate with some numbers on empty filesystem create
> >rate:
> >
> >                        create rate    sys CPU time   write rate
> >                         (files/s)       (seconds)      (MB/s)
> >crc = 0, finobt = 0:      238943           2629          ~200
> >crc = 1, finobt = 0:      231582           2711           ~40
> >crc = 1, finobt = 1:      232563           2766           ~40
> >*hacked* crc disable:     231435           2789           ~40
> >
> >We can see that the system CPU time increased by 3.1% with the
> >"addition of CRCs". The CPU usage increases by a further 2% with
> >the addition of the free inode btree,
> ---
> On an empty file system, or older ones that are >50%
> used?
>
> It's *nice* to be able to run benchmarks, but not allowing
> crc to be disabled removes that possibility -- and that's
> sorta the point.

If you want to reproduce the above numbers, the script is below. You
don't need the "CRC disable" hack to test whether CRCs have overhead
or not; CPU profiles are sufficient for that.

But, really, I don't care about whether you can reproduce these
tests, because microbenchmarks don't matter to production systems.
That is, you haven't provided any numbers to back up your assertions
that CRCs have excessive overhead for /your workload/. For me to care
about what you are saying, you need to demonstrate a performance
degradation between v4 and v5 filesystem formats for /your
workloads/.

I can't do this for you. I don't know what your workload is, nor what
hardware you use. *Give me numbers* that I can work with - something
we can measure and fix. You need to do the work to show there's an
issue - I can't do that for you, and no amount of demanding that I do
will change that.

> >IOWs, for most workloads CRCs have no impact on filesystem
> >performance.
> ---
> Too bad no one can test the effect on their
> own workloads, though if not doing crc's takes more CPU, then
> it sounds like an algorithm problem: crc calculations don't
> take "negative time", and a benchmark showing they do indicates
> something else is causing the slowdown.

I'm guessing that you aren't aware of how memory access patterns
affect performance on modern CPUs, i.e. sequential memory access to a
structure is much faster than random memory access because the
hardware prefetchers detect the sequential accesses and minimise
cache miss latency.

e.g. for a typical 4k btree block, doing a binary search on a cache
cold block requires 10-12 cache misses for a complete search.
However, if we run a CRC check on it, we take a couple of cache
misses before the hardware prefetcher kicks in, then it's just CPU
time to run the calc. Then the binary search doesn't take a cache
miss at all. Hence if the CRC calc is fast enough (and with h/w
acceleration it is fast enough) adding CRCs will make the code run
faster....

This is actually a well known behaviour of modern CPUs - for years
we've been using memset() to initialise structures when it's
technically not necessary, because it's the fastest way to prime the
CPU caches for upcoming accesses to that structure.

Cheers,

Dave.
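Dave's prefetcher argument is easy to sanity-check in userspace. The
micro-benchmark below is an illustrative sketch (my construction, not
anything from this thread; the block count and the slow software
CRC32C are stand-ins for real btree blocks and the kernel's
accelerated crc32c()): it times binary searches over cache-cold 4k
blocks, then re-times the same searches immediately after a
sequential pass over each block, subtracting a CRC-only pass to
isolate the search cost. Exact numbers vary by CPU and memory system.

/* prefetch_prime.c - illustrative micro-benchmark only.
 * Build: cc -O2 prefetch_prime.c -o prefetch_prime
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BLKSZ   4096
#define NKEYS   (BLKSZ / sizeof(uint32_t))
#define NBLOCKS (64 * 1024)     /* 256MB: much larger than any cache */

/* Slow bitwise CRC32C; a stand-in for hardware crc32c(). */
static uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
        const unsigned char *p = buf;

        while (len--) {
                crc ^= *p++;
                for (int i = 0; i < 8; i++)
                        crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1));
        }
        return crc;
}

static size_t bsearch_blk(const uint32_t *keys, uint32_t needle)
{
        size_t lo = 0, hi = NKEYS;

        while (lo < hi) {
                size_t mid = (lo + hi) / 2;

                if (keys[mid] < needle)
                        lo = mid + 1;
                else
                        hi = mid;
        }
        return lo;
}

static double seconds(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Visit blocks in a permuted order so stride prefetchers can't
 * learn the access pattern across blocks. */
static size_t permute(size_t i)
{
        return (i * 40503) & (NBLOCKS - 1);
}

int main(void)
{
        uint32_t *mem = malloc((size_t)NBLOCKS * BLKSZ);
        volatile size_t sink = 0;
        volatile uint32_t csink = 0;
        double t, t_search, t_crc, t_both;

        if (!mem)
                return 1;
        for (size_t b = 0; b < NBLOCKS; b++)
                for (size_t k = 0; k < NKEYS; k++)
                        mem[b * NKEYS + k] = (uint32_t)k * 7; /* sorted */

        t = seconds();          /* cold binary searches */
        for (size_t i = 0; i < NBLOCKS; i++)
                sink += bsearch_blk(mem + permute(i) * NKEYS, 3000);
        t_search = seconds() - t;

        t = seconds();          /* CRC-only pass, for subtraction */
        for (size_t i = 0; i < NBLOCKS; i++)
                csink += crc32c_sw(~0u, mem + permute(i) * NKEYS, BLKSZ);
        t_crc = seconds() - t;

        t = seconds();          /* CRC primes the cache, then search */
        for (size_t i = 0; i < NBLOCKS; i++) {
                const uint32_t *blk = mem + permute(i) * NKEYS;

                csink += crc32c_sw(~0u, blk, BLKSZ);
                sink += bsearch_blk(blk, 3000);
        }
        t_both = seconds() - t;

        printf("cold search %.3fs, crc only %.3fs, crc+search %.3fs\n",
               t_search, t_crc, t_both);
        printf("search after crc: ~%.3fs (vs %.3fs cold)\n",
               t_both - t_crc, t_search);
        free(mem);
        return 0;
}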
On Fri, 4 Nov 2016 11:12:48 +1100
Dave Chinner <david@fromorbit.com> wrote:

> On Fri, Oct 28, 2016 at 03:17:47AM +1100, Nicholas Piggin wrote:
> > Hi guys,
> >
> > We're seeing crc32c_le show up in xfs log checksumming on a MySQL
> > benchmark on powerpc. I could reproduce similar overheads with
> > dbench as well.
> >
> >     1.11%  mysqld  [kernel.vmlinux]  [k] __crc32c_le
> >            |
> >            ---__crc32c_le
> >               |
> >                --1.11%--chksum_update
> >                         |
> >                          --1.11%--crypto_shash_update
> >                                    crc32c
> >                                    xlog_cksum
> >                                    xlog_sync
> >                                    _xfs_log_force_lsn
> >                                    xfs_file_fsync
> >                                    vfs_fsync_range
> >                                    do_fsync
> >                                    sys_fsync
> >                                    system_call
> >                                    0x17738
> >                                    0x17704
> >                                    os_file_flush_func
> >                                    fil_flush
> >
> > As a rule, it helps the crc implementation if it can operate on as
> > large a chunk as possible (alignment, startup overhead, etc). So I
> > did a quick hack at getting XFS checksumming to feed crc32c() with
> > larger chunks, by setting the existing crc to 0 before running over
> > the entire buffer. Together with some small work on the powerpc crc
> > implementation, crc drops below 0.1%.
> >
> > I don't know if something like this would be acceptable? It's not
> > pretty, but I didn't see an easier way.
>
> Here's an alternative, slightly cleaner patch that optimises the CRC
> update side but leaves the verify side as it is. I've not yet
> decided exactly what is cleanest for the xlog_cksum() call in log
> recovery, but that won't change the performance of the code. Can
> you give this a run through, Nick?

Hi Dave,

Yeah, sorry for the slow response. I've been getting a more realistic
MySQL benchmark setup working (what I had reproduced appeared to be
the same overhead, but I wanted to get something better to retest
with). So your patch comes at a good time (and thanks for working on
it). I'll see if I can get something running and have results for you
by next week.

Thanks,
Nick
Dave Chinner wrote:
> If I had a dollar for every time someone said "hardware raid
> protects me" I'd have retired years ago.
----
It's not a sole line of protection, but it does a fair job of
*detection*.

> Media scrubbing does not protect against misdirected writes,
> corruptions to/from the storage, memory errors, software bugs, bad
> compilers (yes, we've already had XFS CRCs find a compiler bug),
> etc.
---
Crc's don't _protect_ against such things either -- they can allow
detection, but for protection... I make do with xfsdump. As for
crc's -- I already had the feature prevent me from creating and using
new partitions where it didn't like me setting the disk UUID when
creating a new partition -- something that I heard would be fixed --
but something that prevented me from testing the new finobt feature I
was interested in. I'm sure I'm not alone in wanting to test in my
environment, _both_ with and without the crc option.

I seem to remember, recently, that it took other kernel developers to
disable xfs panicking the entire system on problem detection, with
the idea of only disabling what was broken. Along the same lines, if
I am using crc's and badness is detected, I'd want to be able to keep
the disk online, though in "read-only" mode, to allow me to explore
what was wrong and find the best solution.

> You may not care about this, but plenty of other XFS users do.
---
That's a crock -- I never said I didn't care about having it -- just
that I wanted to be able to run finobt with crc's disabled. I noted,
after my last response, that you said your crc testing only jumped
around the tests to compare crc's -- which sounded like they were
still being generated but not checked -- no wonder the no-crc option
was the slowest of the tested options.

> I can't do this for you. I don't know what your workload is, nor
> what hardware you use. *Give me numbers* that I can work with -
> something we can measure and fix. You need to do the work to show
> there's an issue - I can't do that for you, and no amount of
> demanding that I do will change that.

I don't want to bother others with my testing, I just want the
platform to be open enough for me to quietly do my own testing and go
forward from there. I may be ignorant, but I'm also picky about
things I care about -- thus I prefer to do _some_ things myself,
thank-you.

I realize that closed platforms, where the undesirable is bundled
with the desirable and where end-users have no choice, are the wave
of the future. I realize that in many cases, large corporations are
pushing these changes. Microsoft's last "open update" that allowed
selecting what updates you wanted is history, and now you are forced
to take everything or nothing. Such offerings are not made for the
benefit of the users -- it's not something they want. But it doesn't
matter; it's where large companies are pushing things so consumers
will be easier to control.

I wouldn't be surprised if this feature bundling was being pushed by
project sponsors and that developers had little choice, and I may not
either, as I don't have the smallest fraction of the time I would
need to add in options or changes to the SW I use just to hold the
"status quo", let alone make improvements. All I can usually do is
ask and later "re-ask" -- since policies in effect at one point in
time may not be present later.

Cheers!
-l
On 11/4/16 1:56 AM, L.A. Walsh wrote:
>
> Dave Chinner wrote:
>> If I had a dollar for every time someone said "hardware raid
>> protects me" I'd have retired years ago.
> ----
> It's not a sole line of protection, but it does a fair
> job of *detection*.
>> Media scrubbing does not protect against misdirected writes,
>> corruptions to/from the storage, memory errors, software bugs, bad
>> compilers (yes, we've already had XFS CRCs find a compiler bug),
>> etc.
> ---
> Crc's don't _protect_ against such things either -- they
> can allow detection, but for protection... I make do with
> xfsdump. As for crc's -- I already had the feature
> prevent me from creating and using new partitions where it
> didn't like me setting the disk UUID when creating a
> new partition -- something that I heard
> would be fixed -- but something that prevented me from testing
> the new finobt feature I was interested in.

That has been fixed for over a year now, FWIW: v4.3 kernel / v4.2.0
xfsprogs.

> I'm sure I'm not alone in wanting to test in my environment,
> _both_ with and without the crc option.
> I seem to remember, recently, that it took other kernel
> developers to disable xfs panicking the entire system on problem
> detection, with the idea of only disabling what was broken.

Sorry, what?

> Along the same lines, if I am using crc's and badness is
> detected, I'd want to be able to keep the disk online, though
> in "read-only" mode, to allow me to explore what was wrong and
> find the best solution.
>> You may not care about this, but plenty of other XFS users do.
> ---
> That's a crock -- I never said I didn't care about having
> it -- just that I wanted to be able to run finobt with crc's
> disabled.

If you want an untestable a la carte mishmash of every option, there
are other filesystems which will do that for you. They have lots of
weird broken corner cases as a result. The xfs development crew has
limited resources, and part of the decision to /not/ make infinite
combinations of features available was so that we can get reliable
coverage in testing and produce a robust result. But we already went
through all this on a similar thread a year ago...

> I noted, after my last response, that you said your crc
> testing only jumped around the tests to compare crc's -- which
> sounded like they were still being generated but not checked --
> no wonder the no-crc option was the slowest of the tested options.
>
>> I can't do this for you. I don't know what your workload is, nor
>> what hardware you use. *Give me numbers* that I can work with -
>> something we can measure and fix. You need to do the work to show
>> there's an issue - I can't do that for you, and no amount of
>> demanding that I do will change that.
>
> I don't want to bother others with my testing, I just want the
> platform to be open enough for me to quietly do my own testing
> and go forward from there. I may be ignorant, but I'm also
> picky about things I care about -- thus I prefer to do _some_
> things myself, thank-you.
>
> I realize that closed platforms, where the undesirable is bundled
> with the desirable and where end-users have no choice, are the
> wave of the future. I realize that in many cases, large
> corporations are pushing these changes.

Oh come on, now. Open source doesn't mean everyone gets a pet pony in
their favorite color. Failing to offer said ponies to each user
doesn't make this a "closed platform", and I resent the assertion on
behalf of Dave and the other xfs developers.

We're performing a herculean task with a skeleton crew. Dave, in
particular, has worked tirelessly for /years/ helping to create what
is now arguably the highest-performing, most robust, most scalable
filesystem available on Linux. A large part of XFS's success has been
due to wise design and development decisions, /including/ the
decision to keep the test matrix under control.

And I'm not sure what to make of your mention of "large
corporations", but I can assure you that decisions like this are made
within the xfs developer community, without outside influence.

-Eric
diff --git a/fs/xfs/libxfs/xfs_cksum.h b/fs/xfs/libxfs/xfs_cksum.h
index fad1676..88f4431 100644
--- a/fs/xfs/libxfs/xfs_cksum.h
+++ b/fs/xfs/libxfs/xfs_cksum.h
@@ -9,20 +9,9 @@
  * cksum_offset parameter.
  */
 static inline __uint32_t
-xfs_start_cksum(char *buffer, size_t length, unsigned long cksum_offset)
+xfs_start_cksum(void *buffer, size_t length)
 {
-	__uint32_t zero = 0;
-	__uint32_t crc;
-
-	/* Calculate CRC up to the checksum. */
-	crc = crc32c(XFS_CRC_SEED, buffer, cksum_offset);
-
-	/* Skip checksum field */
-	crc = crc32c(crc, &zero, sizeof(__u32));
-
-	/* Calculate the rest of the CRC. */
-	return crc32c(crc, &buffer[cksum_offset + sizeof(__be32)],
-		      length - (cksum_offset + sizeof(__be32)));
+	return crc32c(XFS_CRC_SEED, buffer, length);
 }
 
 /*
@@ -42,22 +31,29 @@ xfs_end_cksum(__uint32_t crc)
  * Helper to generate the checksum for a buffer.
  */
 static inline void
-xfs_update_cksum(char *buffer, size_t length, unsigned long cksum_offset)
+xfs_update_cksum(void *buffer, size_t length, unsigned long cksum_offset)
 {
-	__uint32_t crc = xfs_start_cksum(buffer, length, cksum_offset);
+	__le32 *crcp = buffer + cksum_offset;
 
-	*(__le32 *)(buffer + cksum_offset) = xfs_end_cksum(crc);
+	*crcp = 0;	/* set to 0 before calculating checksum */
+	*crcp = xfs_end_cksum(xfs_start_cksum(buffer, length));
 }
 
 /*
  * Helper to verify the checksum for a buffer.
  */
 static inline int
-xfs_verify_cksum(char *buffer, size_t length, unsigned long cksum_offset)
+xfs_verify_cksum(void *buffer, size_t length, unsigned long cksum_offset)
 {
-	__uint32_t crc = xfs_start_cksum(buffer, length, cksum_offset);
+	__le32 *crcp = buffer + cksum_offset;
+	__le32 crc = *crcp;
+	__le32 new_crc;
+
+	*crcp = 0;
+	new_crc = xfs_end_cksum(xfs_start_cksum(buffer, length));
+	*crcp = crc;	/* restore original CRC in place */
 
-	return *(__le32 *)(buffer + cksum_offset) == xfs_end_cksum(crc);
+	return new_crc == crc;
 }
 
 #endif /* _XFS_CKSUM_H */
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 8de9a3a..efd8daa 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -419,15 +419,11 @@ xfs_dinode_calc_crc(
 	struct xfs_mount	*mp,
 	struct xfs_dinode	*dip)
 {
-	__uint32_t		crc;
-
 	if (dip->di_version < 3)
 		return;
 
 	ASSERT(xfs_sb_version_hascrc(&mp->m_sb));
-	crc = xfs_start_cksum((char *)dip, mp->m_sb.sb_inodesize,
-			      XFS_DINODE_CRC_OFF);
-	dip->di_crc = xfs_end_cksum(crc);
+	xfs_update_cksum(dip, mp->m_sb.sb_inodesize, XFS_DINODE_CRC_OFF);
 }
 
 /*
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 3b74fa0..4e242f0 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1658,7 +1658,7 @@ xlog_pack_data(
  * This is a little more complicated than it should be because the various
  * headers and the actual data are non-contiguous.
  */
-__le32
+void
 xlog_cksum(
 	struct xlog		*log,
 	struct xlog_rec_header	*rhead,
@@ -1667,10 +1667,9 @@ xlog_cksum(
 {
 	__uint32_t		crc;
 
-	/* first generate the crc for the record header ... */
-	crc = xfs_start_cksum((char *)rhead,
-			      sizeof(struct xlog_rec_header),
-			      offsetof(struct xlog_rec_header, h_crc));
+	/* first generate the crc for the record header with 0 crc... */
+	rhead->h_crc = 0;
+	crc = xfs_start_cksum(rhead, sizeof(struct xlog_rec_header));
 
 	/* ... then for additional cycle data for v2 logs ... */
 	if (xfs_sb_version_haslogv2(&log->l_mp->m_sb)) {
@@ -1691,7 +1690,7 @@ xlog_cksum(
 	/* ... and finally for the payload */
 	crc = crc32c(crc, dp, size);
 
-	return xfs_end_cksum(crc);
+	rhead->h_crc = xfs_end_cksum(crc);
 }
 
 /*
@@ -1840,8 +1839,7 @@ xlog_sync(
 	}
 
 	/* calculcate the checksum */
-	iclog->ic_header.h_crc = xlog_cksum(log, &iclog->ic_header,
-					    iclog->ic_datap, size);
+	xlog_cksum(log, &iclog->ic_header, iclog->ic_datap, size);
 #ifdef DEBUG
 	/*
 	 * Intentionally corrupt the log record CRC based on the error injection
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 2b6eec5..18ba385 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -432,7 +432,7 @@ xlog_recover_finish(
 extern int
 xlog_recover_cancel(struct xlog *);
 
-extern __le32	 xlog_cksum(struct xlog *log, struct xlog_rec_header *rhead,
+extern void	 xlog_cksum(struct xlog *log, struct xlog_rec_header *rhead,
 		    char *dp, int size);
 
 extern kmem_zone_t *xfs_log_ticket_zone;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 9b3d7c7..3f62d9a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -5114,8 +5114,13 @@ xlog_recover_process(
 {
 	int			error;
 	__le32			crc;
+	__le32			old_crc;
 
-	crc = xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
+	old_crc = rhead->h_crc;
+	rhead->h_crc = 0;
+	xlog_cksum(log, rhead, dp, be32_to_cpu(rhead->h_len));
+	crc = rhead->h_crc;
+	rhead->h_crc = old_crc;
 
 	/*
 	 * Nothing else to do if this is a CRC verification pass. Just return