De-duplicating a Maildir directory
d...@brannerchinese.com
2021-12-17 09:29:43 UTC
Does Alpine contain functionality for de-duplicating a Maildir directory?

It sometimes happens that a single message gets saved more than once to an archiving directory, and I'd like to know if there is already functionality for removing such duplicates.

Thanks!

- dpb
d...@brannerchinese.com
2021-12-17 09:37:14 UTC
I'm aware of this free-standing application: https://github.com/kdeldycke/mail-deduplicate

But I'm wondering if there is anything comparable built into Alpine itself.

- dpb
J.O. Aho
2021-12-17 12:58:06 UTC
Post by ***@brannerchinese.com
I'm aware of this free-standing application: https://github.com/kdeldycke/mail-deduplicate
But I'm wondering if there is anything comparable built into Alpine itself.
I think de-duplication is a file system feature. ZFS has such
functionality: it stores only one copy of a block with the same data and
points every file that uses it to that block. When you delete the last
file pointing to that block, the block content is deleted too.
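The block-level idea can be illustrated with a toy content-addressed store that keeps reference counts. This is a minimal Python sketch, assuming fixed-size blocks keyed by their SHA-256 hash; it is not how ZFS implements deduplication internally, and the class name DedupStore is hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny for the demo; real file systems use much larger blocks


class DedupStore:
    """Toy content-addressed block store with reference counting (hypothetical)."""

    def __init__(self):
        self.blocks = {}    # block hash -> block bytes, stored only once
        self.refcount = {}  # block hash -> number of references from files
        self.files = {}     # file name -> ordered list of block hashes

    def write(self, name, data):
        if name in self.files:
            self.delete(name)  # overwriting: drop the old references first
        hashes = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(block).hexdigest()
            if h not in self.blocks:
                self.blocks[h] = block  # first time this content is seen: store it
            self.refcount[h] = self.refcount.get(h, 0) + 1
            hashes.append(h)
        self.files[name] = hashes

    def read(self, name):
        return b"".join(self.blocks[h] for h in self.files[name])

    def delete(self, name):
        for h in self.files.pop(name):
            self.refcount[h] -= 1
            if self.refcount[h] == 0:
                # The last file pointing at this block is gone, so free the block.
                del self.refcount[h]
                del self.blocks[h]


store = DedupStore()
store.write("copy1", b"same message")
store.write("copy2", b"same message")
print(len(store.blocks))  # 3: "same", " mes", "sage", shared by both files
store.delete("copy1")
print(len(store.blocks))  # still 3, because copy2 still references them
store.delete("copy2")
print(len(store.blocks))  # 0
```

Note that both file names still exist after block-level deduplication: it only saves disk space, so a mail client would still list both copies of the message.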

No, Alpine doesn't have a function for deleting duplicate mails; you
should look at tools made for this, for example
https://github.com/kdeldycke/mail-deduplicate
--
//Aho
d...@brannerchinese.com
2021-12-18 11:40:52 UTC
I find mail-deduplicate inadequately documented, and some of the functionality doesn't work as expected. Output, for instance, always seems to be in mbox format, even when I specify Maildir input.

However, I find fdupes (available through many package managers) helpful.
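fdupes compares whole files, so it only catches copies that are byte-for-byte identical. A rough Python approximation of that check is sketched below, assuming the usual Maildir layout with cur/ and new/ subdirectories; find_identical_files is a hypothetical helper, this is not fdupes's own algorithm, and it only reports groups without deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def find_identical_files(maildir):
    """Group message files in maildir/cur and maildir/new by the SHA-256 of their bytes."""
    groups = defaultdict(list)  # content hash -> list of file paths
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                groups[digest].append(path)
    # Only hashes shared by two or more files are duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

For example, find_identical_files(os.path.expanduser("~/Maildir/.Archive")), where the path is a hypothetical archive folder, returns a list of path groups. Maildir keeps message flags in the file name, so two copies with different flags still match on content.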

- dpb
Henning Hucke
2021-12-23 08:07:11 UTC
[...]
Thunderbird has an addon to do this. It searches a folder, and produces
a window listing duplicates (it displays several fields), offering to
delete them. I find it a useful function.
Strange thing, this is! I have never had (real) duplicates except intentional ones.
The last part of the sentence means that it does happen that I save
a mail to another folder without deleting the "original".
Aside from this, duplicates show up from sources which obviously don't
understand the purpose of a message ID and the need to avoid duplicates, or
which don't know how to generate unique identifiers.

Atlassian and Jira are a bad example of that...

Nonetheless, these are not real duplicates in the sense of being
identical in both message ID and mail body.
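The distinction drawn here (same message ID versus same message ID and same body) can be made concrete with a small Python sketch. It assumes messages are given as raw bytes keyed by, for example, file name, treats everything after the first blank line as the body, and uses hypothetical helper names; any body difference within a group marks the whole group as an ID collision.

```python
import hashlib
from collections import defaultdict
from email.parser import BytesHeaderParser


def _body_bytes(raw):
    """Return everything after the first empty line that ends the header block."""
    found = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n") if raw.find(sep) != -1]
    if not found:
        return b""
    index, sep = min(found)
    return raw[index + len(sep):]


def classify_by_message_id(raw_messages):
    """Split same-Message-ID groups into true duplicates and ID collisions."""
    parser = BytesHeaderParser()
    groups = defaultdict(list)  # Message-ID -> [(key, body hash)]
    for key, raw in raw_messages.items():
        msg_id = parser.parsebytes(raw).get("Message-ID")
        if msg_id is None:
            continue  # nothing to compare on
        body_hash = hashlib.sha256(_body_bytes(raw)).hexdigest()
        groups[msg_id.strip()].append((key, body_hash))

    duplicates, collisions = {}, {}
    for msg_id, entries in groups.items():
        if len(entries) < 2:
            continue
        keys = [key for key, _ in entries]
        if len({h for _, h in entries}) == 1:
            duplicates[msg_id] = keys  # same ID and same body: real duplicates
        else:
            collisions[msg_id] = keys  # same ID but different bodies
    return duplicates, collisions
```

A sender that reuses IDs, as described above, would show up in the collisions result rather than the duplicates one, so a clean-up tool could treat the two cases differently.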

Best regards,
Henning
--
In the first place, God made idiots;
this was for practice; then he made school boards.
-- Mark Twain
Eduardo Chappa
2021-12-18 17:41:54 UTC
Post by ***@brannerchinese.com
Does Alpine contain functionality for de-duplicating a Maildir directory?
It sometimes happens that a single message gets saved more than once to
an archiving directory, and I'd like to know if there is already
functionality for removing such duplicates.
Dear dpb,

If you build Alpine with Maildir support, then the mailutil program
bundled with Alpine will be able to read a Maildir folder and remove
duplicates. To do so, run mailutil as

mailutil dedup MAILBOX_NAME

If you omit MAILBOX_NAME, mailutil will remove duplicates from your
INBOX. For this purpose, two messages are considered duplicates if they
have the same Message-ID.
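Without a Maildir-enabled build, the same rule (same Message-ID means duplicate) can be approximated with Python's standard mailbox module. This is a hypothetical sketch, not mailutil's code: it keeps the first copy in sorted key order (Maildir keys usually, but not always, start with a delivery timestamp), leaves messages without a Message-ID alone, and only reports by default (dry_run=True).

```python
import mailbox


def dedup_maildir(path, dry_run=True):
    """Remove later messages whose Message-ID was already seen; return their keys."""
    box = mailbox.Maildir(path, create=False)
    seen = set()
    to_remove = []
    for key in sorted(box.keys()):
        msg_id = box[key].get("Message-ID")
        if msg_id is None:
            continue  # cannot decide without a Message-ID
        msg_id = msg_id.strip()
        if msg_id in seen:
            to_remove.append(key)  # a later copy of an already-seen message
        else:
            seen.add(msg_id)
    if not dry_run:
        for key in to_remove:
            box.remove(key)  # deletes the message file from cur/ or new/
    return to_remove
```

Running it first with the default dry_run=True on a hypothetical folder such as ~/Maildir/.Archive shows what would be removed before anything is deleted.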

I hope this helps.
--
Eduardo
https://tinyurl.com/yc377wlh (web)
http://repo.or.cz/alpine.git (Git)