Time |
Nickname |
Message |
00:46
|
DFJustin |
http://arts.nationalpost.com/2014/01/10/the-book-closes-on-a-golden-age/ |
03:42
|
SketchCow |
FINISHED --2014-01-13 20:42:32-- |
03:42
|
SketchCow |
Total wall clock time: 6d 4h 42m 3s |
03:42
|
SketchCow |
Downloaded: 70061 files, 269G in 5d 11h 20m 3s (596 KB/s) |
03:42
|
SketchCow |
ha ha, these never stop being hilarious. |
04:05
|
kyan_lapt |
DFJustin, "I’m happy to see a lot of [these] books go to the landfill" that guy says… that attitude should rot in hell |
04:05
|
kyan_lapt |
… umm to be blunt |
04:05
|
kyan_lapt |
I hope he gives the books to IA :) |
04:06
|
kyan_lapt |
would be a better home for them |
04:06
|
kyan_lapt |
than someone who thinks like that |
04:06
|
kyan_lapt |
</rant> |
05:21
|
SketchCow |
A rant? In #archiveteam? Well I never |
07:47
|
joepie91 |
kyan__: jesus that guy is bitter |
08:59
|
Nemo_bis |
Look what they made me do, Steiner... https://archive.org/search.php?query=creator%3A%22Rudolf+Steiner%22 |
09:04
|
Nemo_bis |
"Instead, he’s been driving into the city three days a week to manage the store": three days a week opening? I doubt that's an effective way to do commerce |
12:07
|
blit |
I have a personal interest in erotic literature and the sites that collect them. I am looking to expand my collection and this neatly dovetails with Archive Team's interest in these sites. Being a programmer puts me in the position of being able to do something: specifically I want to make an up to date archive of asstr.org. |
12:08
|
blit |
I've attempted this earlier in the year using wget on the http version of the site, but the results I got were only partial and it was a bit of a clusterfuck. |
12:09
|
blit |
I'm wondering if any y'all got advice on how I should go about this. For reference my primary os is ubuntu raring 13.04 and I'm a ruby programmer, but I'm open to anything that will work better than wget |
12:11
|
blit |
of course if I can get this to happen successfully, I'd be happy to put up a torrent - though alas my regular upstream is only 100KB/s |
12:13
|
blit |
also: I'm highly interested in doing the same for literotica.com and storiesonline.net |
12:22
|
dashcloud |
so, if you want stuff to be in the Internet Archive, you'll need to make WARCs of it - wget-warc is probably the easiest one: http://www.archiveteam.org/index.php?title=Wget , but you should also take a look at http://www.archiveteam.org/index.php?title=ArchiveBot which greatly simplifies things once it is running |
12:23
|
blit |
dashcloud, thanks, having a look at those links now |
12:25
|
blit |
ok, so looks like wget may be the way to go after all... well however I decide to do it I'll make it output warcs to make it easier for y'all |
12:26
|
blit |
looking at archivebot, it seems like it's for smaller sites? my (limited) dump that I've already done is 400,000 files+ |
12:30
|
blit |
eugh that's right, they hide all the www folders if you access it via ftp |
12:32
|
blit |
hmm, as long as I take a dump of the ftp site _and_ the web site then it should be complete |
12:45
|
blit |
I might sleep on this and try to attack the problem tomorrow night. But this is something I'm really keen on, so expect to see me back here ;) |
15:15
|
balrog |
for ftp sites I've had best results with lftp |
16:58
|
SketchCow |
Downloaded: 394892 files, 37G in 2d 22h 18m 51s (154 KB/s) |
16:58
|
SketchCow |
FINISHED --2014-01-14 12:03:47-- |
16:58
|
SketchCow |
Total wall clock time: 7d 4h 56m 16s |
16:58
|
SketchCow |
Thank youuuuuuuuuuu slow connection. |
17:07
|
xmc |
you're welcome! |
17:18
|
SketchCow |
It's nvg.org. |
17:18
|
SketchCow |
It's a really nice collection of other FTP sites, as per my blog entry. |
17:36
|
sep332 |
http://www.operationwardiary.org/ - transcribe WWI British regimental diaries |
17:43
|
SketchCow |
so, I'm probably going to split off the CD-ROM cover disks into a separate sub-collection of cdbbsarchive, or maybe up into a under-software one |
17:46
|
Nemo_bis |
there are many indeed, and a lot of effort worth some visibility |
17:54
|
DFJustin |
personally I would have it as a sub-collection of cdbbsarchive, and move pc_cdrom to under software |
17:55
|
joepie91 |
SketchCow: great news! I have a working DVD drive hooked up now, so I can continue making images of a bunch of shareware-ish discs I have laying around, probably |
17:55
|
joepie91 |
SketchCow: relatedly, I think the few discs that I already uploaded aren't in that collection yet |
17:55
|
SketchCow |
Oh good. |
17:56
|
joepie91 |
shall I dig up the URLs? |
17:56
|
joepie91 |
actually, that was easier than I expected |
17:57
|
joepie91 |
https://archive.org/details/InteraktiefSpellenDeel1, https://archive.org/details/ComputerEasyMagazineDiscFebruary2003, https://archive.org/details/CompuKidsCDRom2001 |
17:57
|
joepie91 |
interaktiefspellen is a little broken, but I managed to mend most of the damage |
17:59
|
joepie91 |
(as noted in the description, also) |
17:59
|
joepie91 |
turns out that whiteboard marker and white stickers are actually a really good way to make several damaged discs mostly readable again when there's foil damage |
17:59
|
joepie91 |
severely * |
18:28
|
asie |
hello |
18:46
|
SketchCow |
https://archive.org/details/coverdiscs |
20:57
|
godane |
SketchCow: just know i have more coverdisks on cdbbsarchive |
20:58
|
godane |
ok looks like game.exe is all in coverdisk collection |
20:59
|
godane |
also if your putting tons of my stuff in to coverdisk can i have access |
20:59
|
godane |
cause if its not going to be in cdbbsarchive then i will not have access to it |
21:07
|
SketchCow |
You have it |
21:14
|
godane |
thanks |
21:16
|
godane |
SketchCow: right now i'm grabbing cnn all access podcasts |
21:17
|
godane |
that podcast has ended in 2008 so its going up first |
21:19
|
godane |
also based on the xml data for cnn videos i think are mostly under 20mb |
21:20
|
godane |
also i was able to push it up to 640x360 video res |
21:22
|
godane |
i'm also grabbing stuff like student news podcast |
22:44
|
dashcloud |
relevant to SketchCow 's latest blog post about FTP sites: https://www.piratepad.ca/p/old-ftp-list a list of noteworthy FTP sites from 1996 (from the book Internet Games Directory, also available on IA) |
22:49
|
SketchCow |
I'm happy to use this as a hitlist. |
23:16
|
SketchCow |
Is Alex Handy around |
23:16
|
ersi |
That'd be.. handy. |
23:16
|
* |
ersi lets himself out |
23:28
|
dashcloud |
probably all the URLs in that book (which was way ahead of its time and actually included an ebook made up of HTML pages) would probably be good archive candidates |
23:49
|
xmc |
SketchCow: could you please move http://archive.org/details/2013.09.bbs.bajer.cz into ftpsites ? thx |
23:51
|
SketchCow |
You got it. |
23:51
|
SketchCow |
Want access? |
23:51
|
xmc |
that'd be great |
23:52
|
SketchCow |
Done |
23:52
|
xmc |
rad |
23:52
|
xmc |
now uploading ftp.redcom.ru, 20G |
23:53
|
SketchCow |
Let's blow people away |
23:54
|
xmc |
wooooo |
23:55
|
xmc |
also I have ftp.oracle.com ftp.funet.fi ftp.3gpp.org |
23:55
|
SketchCow |
I decided, if it isn't obvious, we're just going to download all FTP sites. |
23:55
|
xmc |
sounds good |
23:55
|
SketchCow |
Fuck it, let's dupe it |
23:55
|
SketchCow |
We'll move to per-ip scanning soon |
23:55
|
SketchCow |
We're like that. |
23:55
|
xmc |
why not |
23:56
|
SketchCow |
5.8G vim |
23:56
|
SketchCow |
root@teamarchive0:/1/FTPSITE/ftp.icm.edu.pl/pub# du -sh vim |
23:56
|
SketchCow |
vim. |
23:56
|
SketchCow |
VIM. |
23:56
|
SketchCow |
6gb. |
23:57
|
xmc |
dag |
23:57
|
SketchCow |
That shit cray |
23:57
|
SketchCow |
I'm sure the rest are equivalently insane. |
23:59
|
xmc |
hm, where the fuck did my grab of ftp.qwest.net go |
23:59
|
xmc |
qwest recently-bought-by-centurylink |
23:59
|
xmc |
or did I not get that in time |
23:59
|
xmc |
:| |