#archiveteam 2014-10-01,Wed

Time Nickname Message
06:49 🔗 midas scan _all_ the internet
07:08 🔗 SketchCow Scanning
10:48 🔗 Muad-Dib just grabbed all videos on this playlist, hahaha oh wow
10:48 🔗 Muad-Dib http://www.youtube.com/watch?v=u9MpsAftCDk&list=PLAX8JHUJcFR2gh_WG3YJBITuO-tODVCcJ&index=3
14:11 🔗 balrog http://wwdbam.com/category/podcasts/ keeps archives but they purge old ones frequently
14:11 🔗 balrog so it's not really "archives"
14:30 🔗 godane balrog: i'm sending it to archivebot
14:30 🔗 balrog godane: thing is, it's something that would need to be archived periodically :/
14:31 🔗 godane i know
14:50 🔗 joepie91 perhaps archivebot should have a --scheduled flag, cc yipdw
16:07 🔗 SketchCow OK, I need help.
16:07 🔗 SketchCow ftp.sunet.se
16:08 🔗 SketchCow It's too big. I can't have FOS do the work of downloading it. Can people please team up and take pieces?
16:29 🔗 Muad-Dib SketchCow, try #effteepee
17:05 🔗 GChriss what's the best way to propose a site as a new archive project?
17:05 🔗 GChriss also:
17:05 🔗 GChriss WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
17:05 🔗 Kazzy yahoosucks
17:11 🔗 GChriss it's not a deathwatch project in the traditional sense, but important content has a tendency to go missing after a few years
17:11 🔗 GChriss most ppl don't notice due to influx of new content
17:11 🔗 GChriss and it's non-accessible by archive.org's crawlbot
17:13 🔗 GChriss +email inquiries for missing content go unanswered
17:16 🔗 Kazzy GChriss: if it's not an absolutely huge site, you could get someone to check it out in #archivebot
17:17 🔗 SketchCow Don't keep us in suspense, bro
17:18 🔗 GChriss it's moderate in all: mostly text + occasional video
17:18 🔗 GChriss that would be the Knight News Challenge
17:19 🔗 SketchCow oh that!
17:19 🔗 * SketchCow is on that
17:20 🔗 GChriss things would be easier if IA's "archive this page" was a single URL, w/o "click here to archive" javascript
17:20 🔗 schbirid there is a bookmarklet but it never really worked well for me iirc
17:21 🔗 GChriss there are new security restrictions that limit bookmarklet functionality
17:22 🔗 GChriss URL downloads no longer supported
17:36 🔗 SketchCow Archivebot can handle it.
17:40 🔗 GChriss there's a "View More" button at the bottom of the entries page: can archive bot read past this? (I think not?)
17:40 🔗 GChriss https://www.newschallenge.org/challenge/libraries/submissions/
17:40 🔗 GChriss also no robots.txt
17:41 🔗 GChriss I've manually submitted the ~600 projects to IA in the last round
17:42 🔗 GChriss don't let that throw you
17:46 🔗 DFJustin probably not
18:48 🔗 SketchCow I've got a process/project underway to get as much data off FOS as possible, one of those clean-throughs I do every month or so. If you see a shitload of stuff I'm uploading, that's what it is.
19:16 🔗 SketchCow Ancestry is 2tb of love, that's going in
19:24 🔗 ersi Holy moly.
19:32 🔗 arkiver Awesome SketchCow! I'm excited to see it show up in the wayback machine :)
19:53 🔗 SketchCow TONS of tiny accounts in these.
19:53 🔗 SketchCow I dropped per-item to 40gb because there's so many in each one.
19:53 🔗 SketchCow Which means lots of items.
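SketchCow's 40gb per-item cap above amounts to splitting a long run of small account dumps into consecutive batches. The following is a minimal sketch of that kind of split. It rests on two assumptions, since nothing in the log says how the packing was actually done: files are packed in their existing order, and "40gb" means 40 × 10^9 bytes. The function name `pack_into_items` is hypothetical.

```python
from typing import List, Tuple

# Assumption: decimal gigabytes; the log does not say GB vs GiB.
GB = 1000 ** 3


def pack_into_items(files: List[Tuple[str, int]], cap_bytes: int = 40 * GB) -> List[List[str]]:
    """Greedily pack (name, size) pairs, in order, into items of at most cap_bytes.

    Hypothetical helper for illustration. A single file larger than the cap
    still gets an item of its own rather than being dropped.
    """
    if cap_bytes <= 0:
        raise ValueError("cap_bytes must be positive")
    items: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Start a new item if adding this file would push the current one over the cap.
        if current and current_size + size > cap_bytes:
            items.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        items.append(current)
    return items


if __name__ == "__main__":
    # Toy sizes with a toy cap of 40 bytes, just to show the batching.
    demo = [("a", 30), ("b", 15), ("c", 5), ("d", 50)]
    print(pack_into_items(demo, cap_bytes=40))
```

A lower cap yields more, smaller items from the same input, which is the trade-off SketchCow mentions.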
19:56 🔗 godane SketchCow: i'm doing my monthly upload cleaning too
19:56 🔗 godane at least get the news collection up to date
20:04 🔗 godane SketchCow: i uploaded 3 dvds of linux format the other day
20:04 🔗 godane disk 186, 187, and 188
20:07 🔗 SketchCow Great
20:40 🔗 Arkiver2 SketchCow: there are 4 websites for ancestry: mundia.com, myfamily.com, mycanvas.com and genealogy.com/familytreemaker.genealogy.com/familyorigins.com
20:40 🔗 Arkiver2 mycanvas is staying (see websites)
20:41 🔗 Arkiver2 mundia and myfamily are going away
20:41 🔗 Arkiver2 genealogy has announced it will make everything read-only
20:41 🔗 Arkiver2 so I think it would be a good idea to keep archiving everything from genealogy, since it's now read-only and no changes will be made anymore
21:05 🔗 SketchCow No arguments here.
21:05 🔗 SketchCow I'm just shoving out stuff from the buffer machine into the wayback.
21:28 🔗 Muad-Dib https://8chan.co/rip.txt ;_;7
22:10 🔗 kyan Hi! Is there a copy of the file "urls-2011-11-29-2200.tar.bz2" available? It was at http://db.tt/GNrEh61y (linked from http://archiveteam.org/index.php?title=Knol ) but is now gone. Also: the wiki page on Knol lists it as "saved", with a link to the Archives page, but I don't see any reference to it there. Thanks!
22:28 🔗 * joepie91 looks at shortened URL and hisses
22:31 🔗 joepie91 okay, fair, it was a service-specific shortened URL
22:31 🔗 joepie91 but still.
22:33 🔗 xmc imo, expanded dropbox urls aren't any better than db.tt urls
22:51 🔗 DFJustin kyan: that's an older grab before we had our processes fully figured out, I checked the usual places and don't see it so I don't know where it ended up
22:52 🔗 DFJustin hopefully whoever did it is still around
22:55 🔗 xmc would someone be able to jog my memory? i have here a few tens of GB of hg and svn repo dumps in a directory named "~/archiveteam/oracle", timestamped around mid february 2013
22:56 🔗 joepie91 xmc: Sun panicsave, maybe?
22:56 🔗 xmc right, but what was it? :P
22:56 🔗 xmc some xen stuff
22:56 🔗 joepie91 not sure
22:56 🔗 joepie91 Oracle acquisition of Sun seems like a valid reason to me to Save All The Things
22:56 🔗 xmc right
22:57 🔗 xmc well, ok.
22:57 🔗 xmc I would look at my irc logs but I'm kind of doing other things
22:58 🔗 DFJustin [15:58:21] <balrog-> in case you aren't aware, the opensolaris website is going away soon
22:58 🔗 DFJustin [15:58:23] <balrog-> it needs to be archived and the Mercurial repositories do as well
22:58 🔗 xmc every time I want to free up space on my laptop I notice that directory, and then forget later to check where it has gone
22:58 🔗 xmc ok, that must be it
22:59 🔗 xmc looks like this stuff never made it onto IA: https://archive.org/search.php?query=opensolaris%20collection%3Aarchiveteam-fire
22:59 🔗 xmc I'll push it up later today when I'm at a place with better neternets
22:59 🔗 kyan DFJustin: Oh, oh well :(
23:02 🔗 DFJustin looks like it was http://archiveteam.org/index.php?title=User:Emijrp
23:02 🔗 xmc emi
23:02 🔗 kyan DFJustin: Thanks, i'll send them an email :)
23:03 🔗 xmc I think emijrp is around still intermittently
23:03 🔗 DFJustin let us know how it turns out, it needs to get reuploaded into an archive.org item in our collection
23:10 🔗 kyan Shot an email off to them: https://archive.org/download/mail.google.com-saved-1Oct2014/mail.google.com-saved-1Oct2014.mail
23:13 🔗 xmc that's a very weird thing to put on IA
23:13 🔗 xmc but ok
23:14 🔗 kyan I usually upload anything that seems like it might be of interest to anyone, correspondence, archives of websites, home videos, etc
23:14 🔗 kyan I really have a visceral hatred of data being discarded
23:15 🔗 kyan so I almost always save things. In as many places as possible.
23:15 🔗 xmc fair enough
23:16 🔗 yipdw_ rm stuff
23:20 🔗 joepie91 xmc: weird shit makes the world go 'round
23:20 🔗 joepie91 :)
23:20 🔗 joepie91 (and then there's those fools who think it was this thing called 'money'...)
23:21 🔗 xmc heh
23:21 🔗 yipdw money doesn't make the world go 'round but it is a good lubricant
23:25 🔗 joepie91 SketchCow: around?
23:25 🔗 joepie91 somebody got a "no space left on device" on IA
23:26 🔗 joepie91 that's probably Not Good
23:26 🔗 joepie91 said somebody is in this channel...
23:26 🔗 * joepie91 stares
23:26 🔗 DFJustin it happens all the time on individual nodes I think
23:26 🔗 joepie91 suggested workaround?
23:27 🔗 ohhdemgir joepie91, :3
23:27 🔗 DFJustin eventually someone comes around and moves stuff off the affected node
23:28 🔗 joepie91 DFJustin: ia python module sends sizehint, does it not?
23:28 🔗 DFJustin there's plenty of space overall https://home.archive.org/~tracey/mrtg/df-week.png
23:28 🔗 joepie91 shouldn't that theoretically keep stuff like this from occurring?
23:29 🔗 DFJustin I don't know if it sends it or not but yes that is supposed to prevent it
23:29 🔗 DFJustin if the item is in the terabytes range then it may be inevitable
23:30 🔗 joepie91 309G
23:30 🔗 joepie91 per ohhdemgir
23:30 🔗 joepie91 single tar
23:31 🔗 DFJustin I dunno how they arrange things but it's conceivable that no node would have that much free at any given time and it would just give you the least full one
23:31 🔗 joepie91 hrm.
23:43 🔗 joepie91 also, context: https://catalogd.archive.org/history/2014.09.vimeoartofnakedness
23:45 🔗 DFJustin ah, so
23:45 🔗 DFJustin the item was initially created with a txt file
23:45 🔗 DFJustin then the .tar file was attempted to be added in another operation
23:46 🔗 DFJustin the size hint thing only affects the initial item creation as that is when it picks which node to put the stuff on
23:50 🔗 DFJustin it looks like there is space on the server in question so they may have fixed it by now and it may be enough to just re-run the archive job but I'll leave that to someone who knows more
23:52 🔗 underscor The disk it's on only has 277gb free
23:52 🔗 underscor Emptying it to 320G now
23:53 🔗 underscor https://catalogd.archive.org/log/337363494
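DFJustin's point is that the size hint only matters on the request that creates the item, because that is when a node is chosen. The following is a minimal sketch. It assumes the IA S3-style upload endpoint and its `x-archive-size-hint` header, whose value is in bytes. The idea is to compute the hint from every file the item is expected to hold and attach it to the first request. The helper name `initial_upload_headers` is hypothetical. Whether the `ia` Python module sends this header on its own is left open here, as it was in the conversation.

```python
from typing import Dict, Iterable


def initial_upload_headers(planned_file_sizes: Iterable[int]) -> Dict[str, str]:
    """Build headers for the request that first creates an IA item.

    Assumption: the item is created through the S3-style API, which (per the
    discussion above) consults x-archive-size-hint, in bytes, only at creation
    time to pick a node. The hint therefore totals every file the item will
    eventually hold, not just the file sent in this first request.
    """
    sizes = list(planned_file_sizes)
    if any(size < 0 for size in sizes):
        raise ValueError("file sizes must be non-negative")
    return {"x-archive-size-hint": str(sum(sizes))}


if __name__ == "__main__":
    # Hypothetical sizes: a small .txt uploaded first, a large .tar added later.
    txt_bytes = 2_000
    tar_bytes = 309 * 1000 ** 3
    print(initial_upload_headers([txt_bytes, tar_bytes]))
```

DFJustin reads the catalog history as showing that the item was first created with only a .txt file. If so, a hint covering just that file would not have reserved room for the roughly 309G tar added in the later operation.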

irclogger-viewer