Time |
Nickname |
Message |
06:49
🔗
|
midas |
scan _all_ the internet |
07:08
🔗
|
SketchCow |
Scanning |
10:48
🔗
|
Muad-Dib |
just grabbed all videos on this playlist, hahaha oh wow |
10:48
🔗
|
Muad-Dib |
http://www.youtube.com/watch?v=u9MpsAftCDk&list=PLAX8JHUJcFR2gh_WG3YJBITuO-tODVCcJ&index=3 |
14:11
🔗
|
balrog |
http://wwdbam.com/category/podcasts/ keeps archives but they purge old ones frequently |
14:11
🔗
|
balrog |
so it's not really "archives" |
14:30
🔗
|
godane |
balrog: i'm sending it to archivebot |
14:30
🔗
|
balrog |
godane: thing is, it's something that would need to be archived periodically :/ |
14:31
🔗
|
godane |
i know |
14:50
🔗
|
joepie91 |
perhaps archivebot should have a --scheduled flag, cc yipdw |
16:07
🔗
|
SketchCow |
OK, I need help. |
16:07
🔗
|
SketchCow |
ftp.sunet.se |
16:08
🔗
|
SketchCow |
It's too big. I can't have FOS do the work of downloading it. Can people please team up and take pieces? |
16:29
🔗
|
Muad-Dib |
SketchCow, try #effteepee |
17:05
🔗
|
GChriss |
what's the best way to propose a site as a new archive project? |
17:05
🔗
|
GChriss |
also: |
17:05
🔗
|
GChriss |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
17:05
🔗
|
Kazzy |
yahoosucks |
17:11
🔗
|
GChriss |
it's not a deathwatch project in the traditional sense, but important content has a tendency to go missing after a few years |
17:11
🔗
|
GChriss |
most ppl don't notice due to influx of new content |
17:11
🔗
|
GChriss |
and it's non-accessible by archive.org's crawlbot |
17:13
🔗
|
GChriss |
+email inquiries for missing content go unanswered |
17:16
🔗
|
Kazzy |
GChriss: if it's not an absolutely huge site, you could get someone to check it out in #archivebot |
17:17
🔗
|
SketchCow |
Don't keep us in suspense, bro |
17:18
🔗
|
GChriss |
it's moderate in all: mostly text + occasional video |
17:18
🔗
|
GChriss |
that would be the Knight News Challenge |
17:19
🔗
|
SketchCow |
oh that! |
17:19
🔗
|
* |
SketchCow is on that |
17:20
🔗
|
GChriss |
things would be easier if IA's "archive this page" was a single URL, w/o "click here to archive" javascript |
17:20
🔗
|
schbirid |
there is a bookmarklet but it never really worked well for me iirc |
17:21
🔗
|
GChriss |
there's new security restrictions that limit bookmarklet functionality |
17:22
🔗
|
GChriss |
URL downloads no longer supported |
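Editor's note on the single-URL "archive this page" wish above: the Wayback Machine accepts save requests at an address of the form https://web.archive.org/save/ followed by the page URL. The sketch below only builds such addresses and sends nothing. `save_url` is a hypothetical helper name, and whether the endpoint behaved this way at the time of this log is an assumption.

```python
from urllib.parse import quote

# Prefix of the Wayback Machine "save" endpoint.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_url(page_url: str) -> str:
    """Return a single URL that asks the Wayback Machine to capture page_url.

    Hypothetical helper for illustration; it does not perform any request.
    """
    # Leave URL delimiters and existing percent-escapes alone,
    # escape only characters such as spaces.
    return SAVE_PREFIX + quote(page_url, safe=":/?&=%")


if __name__ == "__main__":
    print(save_url("https://www.newschallenge.org/challenge/libraries/submissions/"))
```

Requesting the resulting address in a browser or with any HTTP client would trigger the capture; rate limits and login requirements are not covered here.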
17:36
🔗
|
SketchCow |
Archivebot can handle it. |
17:40
🔗
|
GChriss |
there's a "View More" button at the bottom of the entries page: can archive bot read past this? (I think not?) |
17:40
🔗
|
GChriss |
https://www.newschallenge.org/challenge/libraries/submissions/ |
17:40
🔗
|
GChriss |
also no robots.txt |
17:41
🔗
|
GChriss |
I've manually submitted the ~600 projects to IA in the last round |
17:42
🔗
|
GChriss |
don't let that throw you |
17:46
🔗
|
DFJustin |
probably not |
18:48
🔗
|
SketchCow |
I've got a process/project underway to get as much data off FOS as possible, one of those clean-throughs I do every month or so. If you see a shitload of stuff I'm uploading, that's what it is. |
19:16
🔗
|
SketchCow |
Ancestry is 2tb of love, that's going in |
19:24
🔗
|
ersi |
Holy moly. |
19:32
🔗
|
arkiver |
Awesome SketchCow! I'm excited to see it show up in the wayback machine :) |
19:53
🔗
|
SketchCow |
TONS of tiny accounts in these. |
19:53
🔗
|
SketchCow |
I dropped per-item to 40gb because there's so many in each one. |
19:53
🔗
|
SketchCow |
Which means lots of items. |
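Editor's note: capping each item at 40 GB, as described above, amounts to cutting a file list into size-limited batches. The sketch below is one possible way to do that, not the script actually used. `batch_by_size` is a hypothetical name, and reading "40gb" as 40 * 10**9 bytes is an assumption.

```python
def batch_by_size(files, limit_bytes):
    """Group (name, size) pairs, in order, into batches whose total size
    stays at or under limit_bytes.

    A single file larger than the limit still gets a batch of its own.
    """
    batches = []
    current = []
    current_size = 0
    for name, size in files:
        # Close the running batch if adding this file would exceed the cap.
        if current and current_size + size > limit_bytes:
            batches.append(current)
            current = []
            current_size = 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    ITEM_LIMIT = 40 * 10**9  # assumed meaning of "40gb"
    accounts = [("acct-a.warc.gz", 30 * 10**9), ("acct-b.warc.gz", 20 * 10**9)]
    print(batch_by_size(accounts, ITEM_LIMIT))
```

Lowering the cap produces more, smaller batches, which matches the "lots of items" remark.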
19:56
🔗
|
godane |
SketchCow: i'm doing my monthly upload cleaning too |
19:56
🔗
|
godane |
at least get the news collection up to date |
20:04
🔗
|
godane |
SketchCow: i uploaded 3 dvds of linux format the other day |
20:04
🔗
|
godane |
disk 186, 187, and 188 |
20:07
🔗
|
SketchCow |
Great |
20:40
🔗
|
Arkiver2 |
SketchCow: there are 4 websites for ancestry: mundia.com, myfamily.com, mycanvas.com and genealogy.com/familytreemaker.genealogy.com/familyorigins.com |
20:40
🔗
|
Arkiver2 |
mycanvas is staying (see websites) |
20:41
🔗
|
Arkiver2 |
mundia and myfamily are going away |
20:41
🔗
|
Arkiver2 |
genealogy has announced it will make everything read-only |
20:41
🔗
|
Arkiver2 |
so I think it would be a good idea to keep archiving everything from genealogy, since it's now read-only and no changes will be made anymore |
21:05
🔗
|
SketchCow |
No arguments here. |
21:05
🔗
|
SketchCow |
I'm just shoving out stuff from the buffer machine into the wayback. |
21:28
🔗
|
Muad-Dib |
https://8chan.co/rip.txt ;_;7 |
22:10
🔗
|
kyan |
Hi! Is there a copy of the file "urls-2011-11-29-2200.tar.bz2" available? It was at http://db.tt/GNrEh61y (linked from http://archiveteam.org/index.php?title=Knol ) but is now gone. Also: the wiki page on Knol lists it as "saved", with a link to the Archives page, but I don't see any reference to it there. Thanks! |
22:28
🔗
|
* |
joepie91 looks at shortened URL and hisses |
22:31
🔗
|
joepie91 |
okay, fair, it was a service-specific shortened URL |
22:31
🔗
|
joepie91 |
but still. |
22:33
🔗
|
xmc |
imo, expanded dropbox urls aren't any better than db.tt urls |
22:51
🔗
|
DFJustin |
kyan: that's an older grab before we had our processes fully figured out, I checked the usual places and don't see it so I don't know where it ended up |
22:52
🔗
|
DFJustin |
hopefully whoever did it is still around |
22:55
🔗
|
xmc |
would someone be able to jog my memory? i have here a few tens of GB of hg and svn repo dumps in a directory named "~/archiveteam/oracle", timestamped around mid february 2013 |
22:56
🔗
|
joepie91 |
xmc: Sun panicsave, maybe? |
22:56
🔗
|
xmc |
right, but what was it? :P |
22:56
🔗
|
xmc |
some xen stuff |
22:56
🔗
|
joepie91 |
not sure |
22:56
🔗
|
joepie91 |
Oracle acquisition of Sun seems like a valid reason to me to Save All The Things |
22:56
🔗
|
xmc |
right |
22:57
🔗
|
xmc |
well, ok. |
22:57
🔗
|
xmc |
I would look at my irc logs but I'm kind of doing other things |
22:58
🔗
|
DFJustin |
[15:58:21] <balrog-> in case you aren't aware, the opensolaris website is going away soon |
22:58
🔗
|
DFJustin |
[15:58:23] <balrog-> it needs to be archived and the Mercurial repositories do as well |
22:58
🔗
|
xmc |
every time I want to free up space on my laptop I notice that directory, and then forget later to check where it has gone |
22:58
🔗
|
xmc |
ok, that must be it |
22:59
🔗
|
xmc |
looks like this stuff never made it onto IA: https://archive.org/search.php?query=opensolaris%20collection%3Aarchiveteam-fire |
22:59
🔗
|
xmc |
I'll push it up later today when I'm at a place with better neternets |
22:59
🔗
|
kyan |
DFJustin: Oh, oh well :( |
23:02
🔗
|
DFJustin |
looks like it was http://archiveteam.org/index.php?title=User:Emijrp |
23:02
🔗
|
xmc |
emi |
23:02
🔗
|
kyan |
DFJustin: Thanks, i'll send them an email :) |
23:03
🔗
|
xmc |
I think emijrp is around still intermittently |
23:03
🔗
|
DFJustin |
let us know how it turns out, it needs to get reuploaded into an archive.org item in our collection |
23:10
🔗
|
kyan |
Shot an email off to them: https://archive.org/download/mail.google.com-saved-1Oct2014/mail.google.com-saved-1Oct2014.mail |
23:13
🔗
|
xmc |
that's a very weird thing to put on IA |
23:13
🔗
|
xmc |
but ok |
23:14
🔗
|
kyan |
I usually upload anything that seems like it might be of interest to anyone, correspondence, archives of websites, home videos, etc |
23:14
🔗
|
kyan |
I really have a visceral hatred of data being discarded |
23:15
🔗
|
kyan |
so I almost always save things. In as many places as possible. |
23:15
🔗
|
xmc |
fair enough |
23:16
🔗
|
yipdw_ |
rm stuff |
23:20
🔗
|
joepie91 |
xmc: weird shit makes the world go 'round |
23:20
🔗
|
joepie91 |
:) |
23:20
🔗
|
joepie91 |
(and then there's those fools who think it was this thing called 'money'...) |
23:21
🔗
|
xmc |
heh |
23:21
🔗
|
yipdw |
money doesn't make the world go 'round but it is a good lubricant |
23:25
🔗
|
joepie91 |
SketchCow: around? |
23:25
🔗
|
joepie91 |
somebody got a "no space left on device" on IA |
23:26
🔗
|
joepie91 |
that's probably Not Good |
23:26
🔗
|
joepie91 |
said somebody is in this channel... |
23:26
🔗
|
* |
joepie91 stares |
23:26
🔗
|
DFJustin |
it happens all the time on individual nodes I think |
23:26
🔗
|
joepie91 |
suggested workaround? |
23:27
🔗
|
ohhdemgir |
joepie91, :3 |
23:27
🔗
|
DFJustin |
eventually someone comes around and moves stuff off the affected node |
23:28
🔗
|
joepie91 |
DFJustin: ia python module sends sizehint, does it not? |
23:28
🔗
|
DFJustin |
there's plenty of space overall https://home.archive.org/~tracey/mrtg/df-week.png |
23:28
🔗
|
joepie91 |
shouldn't that theoretically keep stuff like this from occurring? |
23:29
🔗
|
DFJustin |
I don't know if it sends it or not but yes that is supposed to prevent it |
23:29
🔗
|
DFJustin |
if the item is in the terabytes range then it may be inevitable |
23:30
🔗
|
joepie91 |
309G |
23:30
🔗
|
joepie91 |
per ohhdemgir |
23:30
🔗
|
joepie91 |
single tar |
23:31
🔗
|
DFJustin |
I dunno how they arrange things but it's conceivable that no node would have that much free at any given time and it would just give you the least full one |
23:31
🔗
|
joepie91 |
hrm. |
23:43
🔗
|
joepie91 |
also, context: https://catalogd.archive.org/history/2014.09.vimeoartofnakedness |
23:45
🔗
|
DFJustin |
ah, so |
23:45
🔗
|
DFJustin |
the item was initially created with a txt file |
23:45
🔗
|
DFJustin |
then the .tar file was attempted to be added in another operation |
23:46
🔗
|
DFJustin |
the size hint thing only affects the initial item creation as that is when it picks which node to put the stuff on |
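Editor's note: if the size hint is only consulted when the item is first created, the first upload should declare the total of everything planned for the item, not just the first file. The sketch below builds headers for that first request and sends nothing. The header names `x-archive-size-hint` and `x-archive-auto-make-bucket` are taken from the Internet Archive's S3-like API as the editor recalls it; treat them, the hypothetical helper `first_upload_headers`, and the file names as assumptions to verify.

```python
def first_upload_headers(planned_sizes):
    """Build headers for the request that creates an item.

    planned_sizes maps file name -> size in bytes for every file that will
    eventually go into the item, including ones uploaded in later operations.
    """
    total = sum(planned_sizes.values())
    return {
        # Ask for the item to be created on first upload (assumed header name).
        "x-archive-auto-make-bucket": "1",
        # Declare the expected total size so a node with enough room can be
        # chosen at creation time (assumed header name).
        "x-archive-size-hint": str(total),
    }


if __name__ == "__main__":
    # Hypothetical item: a small text file first, a 309 GB tar added later.
    planned = {"notes.txt": 2048, "grab.tar": 309 * 10**9}
    print(first_upload_headers(planned))
```

In the case discussed here, creating the item with only the txt file would have produced a hint of a couple of kilobytes, which says nothing about the tar that followed.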
23:50
🔗
|
DFJustin |
it looks like there is space on the server in question so they may have fixed it by now and it may be enough to just re-run the archive job but I'll leave that to someone who knows more |
23:52
🔗
|
underscor |
The disk it's on only has 277gb free |
23:52
🔗
|
underscor |
Emptying it to 320G now |
23:53
🔗
|
underscor |
https://catalogd.archive.org/log/337363494 |