Time |
Nickname |
Message |
00:44
🔗
|
godane |
so i got the very first ac360 podcast |
00:44
🔗
|
godane |
so it looks like i may be able to get all of them |
03:43
🔗
|
xmc |
SketchCow: I expect a 150M .txt.gz of ip addresses |
03:44
🔗
|
xmc |
about 26,400,000 addrs |
04:42
🔗
|
xmc |
now I have a way to check them automatedly |
04:42
🔗
|
xmc |
ncftp is fortunately not shitty |
05:00
🔗
|
xmc |
https://github.com/ArchiveTeam/ftp-nab |
05:27
🔗
|
bsmith093 |
blit wanted an asstr archive , already done one recently |
05:42
🔗
|
xmc |
bsmith093: I can't parse that, say again? |
05:43
🔗
|
bsmith093 |
i'm scrolling through the logs, and what i meant to say was that i've already uploaded a fairly recent asstr archive, as of about a year ago |
05:56
🔗
|
xmc |
hm, ok. what's asstr? |
07:37
🔗
|
Nemo_bis |
Today I discovered two million+ pages wikis we archived just in time: http://wikiindex.org/Wikinfo (2013) http://wikiindex.org/WebsiteWiki (2012) |
07:40
🔗
|
Nemo_bis |
xmc: great! I want to join the party and download some of those FTP sites when you're done nicely listing them :) no more huge ones though, at most 4-500 GB or so |
07:42
🔗
|
SketchCow |
I finally wrote a script to deal with this nightmare site. |
07:43
🔗
|
SketchCow |
It will: utterly download an entire subdirectory, remove the index.html?*=* files that happen, tar up the directory, delete it, and put a "already got it, don't get again" into a list. |
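The cycle SketchCow describes (download a subdirectory, strip the `index.html?*=*` files, tar it, delete it, note it as done) could be sketched roughly as below. This is not the actual script, which is not shown in the log. The function names, the `done_list` file format, and the choice of `wget` as the downloader are all assumptions made for illustration; the download step is passed in as a parameter so it can be swapped out.

```python
import fnmatch
import os
import shutil
import subprocess
import tarfile


def mirror_with_wget(url, dest_dir):
    """Recursively fetch `url` into `dest_dir` (assumes wget is installed)."""
    subprocess.run(
        ["wget", "--recursive", "--no-parent", "--no-verbose",
         "--directory-prefix", dest_dir, url],
        check=True,
    )


def remove_index_debris(root):
    """Delete listing files such as 'index.html?C=N;O=D'; return how many went."""
    removed = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            # '[?]' matches a literal question mark in fnmatch patterns.
            if fnmatch.fnmatchcase(name, "index.html[?]*=*"):
                os.remove(os.path.join(dirpath, name))
                removed += 1
    return removed


def load_done(done_list):
    """Read the 'already got it' list (hypothetical format: one name per line)."""
    if not os.path.exists(done_list):
        return set()
    with open(done_list) as fh:
        return {line.strip() for line in fh if line.strip()}


def archive_subdir(name, url, work_dir, done_list, fetch=mirror_with_wget):
    """Run one download/clean/tar/delete cycle.

    Returns the tar path, or None if `name` was already in the done list.
    """
    if name in load_done(done_list):
        return None
    dest = os.path.join(work_dir, name)
    os.makedirs(dest, exist_ok=True)
    fetch(url, dest)
    remove_index_debris(dest)
    tar_path = os.path.join(work_dir, name + ".tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(dest, arcname=name)
    shutil.rmtree(dest)  # free the disk space before the next subdirectory
    with open(done_list, "a") as fh:
        fh.write(name + "\n")
    return tar_path
```

Deleting the working directory right after the tar is written is what keeps disk usage bounded to roughly one subdirectory plus its tarball at a time.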
07:47
🔗
|
SketchCow |
It's already yanked the machine back from the brink - had one drive full of 8tb of material. |
07:49
🔗
|
midas |
always nice if it keeps downloading the same data over and over again |
07:50
🔗
|
xmc |
yow |
07:58
🔗
|
Nemo_bis |
SketchCow: how big will the uploaded items be? |
07:59
🔗
|
SketchCow |
Probably 200gb apiece |
07:59
🔗
|
Nemo_bis |
a script to download FTP sites without worrying about running out of disk or *cough cough* uploading 2 TB items would be wonderful |
07:59
🔗
|
Nemo_bis |
sensible |
07:59
🔗
|
SketchCow |
Hmmph. |
08:00
🔗
|
SketchCow |
Well, the big issue right now is that a lot of things break on archive.org dealing with that. |
08:01
🔗
|
SketchCow |
-rw-r--r-- 1 root root 46146570240 Jan 15 02:35 ftp.icm.edu.pl_amiga.tar |
08:01
🔗
|
SketchCow |
-rw-r--r-- 1 root root 1003048960 Jan 15 07:39 ftp.icm.edu.pl_beos.tar |
08:01
🔗
|
SketchCow |
-rw-r--r-- 1 root root 1185730560 Jan 15 07:14 ftp.icm.edu.pl_garbo.tar |
08:01
🔗
|
SketchCow |
etc |
08:05
🔗
|
SketchCow |
I suspect the issue is the FreeBSD and BSD directories. |
08:06
🔗
|
SketchCow |
I think they're as big as they get |
08:09
🔗
|
godane |
i found flash video of cnn going back to 2008 |
08:09
🔗
|
godane |
this is website video, NOT a podcast |
08:09
🔗
|
godane |
look here: http://money.cnn.com/sitemap_videos_0001.xml |
08:10
🔗
|
godane |
if you know the image path you can find the video |
08:10
🔗
|
godane |
money/video/news becomes money/big/news |
08:11
🔗
|
godane |
then change the host domain to ht3.cdn.turner.com and replace the WxH.jpg with _576x324_dl.flv |
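godane's recipe for turning a thumbnail URL into a video URL could be written as a small function like the one below. It is a sketch under assumptions: the log does not show a real thumbnail URL, so the exact shape of the `WxH.jpg` suffix (here taken to be digits, `x`, digits, `.jpg`, optionally preceded by `.` or `_`) is a guess, and the function name is made up.

```python
import re
from urllib.parse import urlsplit, urlunsplit

VIDEO_HOST = "ht3.cdn.turner.com"
VIDEO_SUFFIX = "_576x324_dl.flv"

# Assumption: thumbnail names end in "<W>x<H>.jpg", optionally led by "." or "_".
_THUMB_SUFFIX = re.compile(r"[._]?\d+x\d+\.jpg$")


def thumbnail_to_video_url(image_url):
    """Apply the three rewrites from the chat to a thumbnail URL."""
    parts = urlsplit(image_url)
    # 1. money/video/... becomes money/big/...
    path = parts.path.replace("/money/video/", "/money/big/", 1)
    # 2. the WxH.jpg tail is replaced by the .flv suffix
    path, swapped = _THUMB_SUFFIX.subn(VIDEO_SUFFIX, path)
    if not swapped:
        raise ValueError("no WxH.jpg suffix found in %r" % image_url)
    # 3. the host is replaced by the video CDN host
    return urlunsplit((parts.scheme, VIDEO_HOST, path, "", ""))
```

Feeding it the image paths listed in the sitemap XML would give candidate video URLs, which would still need to be checked one by one since older files may be gone.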
08:49
🔗
|
godane |
SketchCow: looks like the older videos may still be around |
12:28
🔗
|
ZoeB |
Does anyone have a full backup of this site? http://www.heavensgate.com/ |
12:30
🔗
|
joepie91 |
if not yet, then we will soon |
12:30
🔗
|
* |
joepie91 added it to archivebot |
12:32
🔗
|
ZoeB |
Thanks! |
12:34
🔗
|
ZoeB |
I hear you're going after every FTP site now too? |
12:37
🔗
|
ZoeB |
Might I ask you make sure ftp://ftp.modland.com/ is on that list? It's 81.4GB of Amiga mods and their derivatives (ie the music part of the demoscene), including tracker software and related utilities for multiple platforms. I would be grabbing it myself already but my Raspberry Pi doesn't have that much space, and I'm not leaving my laptop on for several weeks straight... :) |
12:37
🔗
|
ZoeB |
Username "anonymous", no password, I believe. Be gentle / slow! |
12:39
🔗
|
Baljem |
81.4GB of MODs? oh my. having Inertia Player flashbacks now... |
12:46
🔗
|
joepie91 |
ZoeB: sure, I'll grab that FTP too |
12:46
🔗
|
joepie91 |
I think wget does FTP |
12:46
🔗
|
joepie91 |
what |
12:46
🔗
|
joepie91 |
er |
12:46
🔗
|
joepie91 |
what kind of delay between requests would you recommend * |
12:47
🔗
|
* |
joepie91 has 4TB disk now, so that wouldn't be a problem to grab |
12:47
🔗
|
* |
joepie91 is also on 100mbit |
12:48
🔗
|
Nemo_bis |
100 both up and down? |
12:49
🔗
|
Nemo_bis |
delay for FTP sites? hahahahhahahahahaha |
12:50
🔗
|
joepie91 |
Nemo_bis: theoretical, yes |
12:51
🔗
|
joepie91 |
practically, it's more like 85/55 |
12:51
🔗
|
joepie91 |
because my ISP is balls |
12:51
🔗
|
joepie91 |
this is FttH, not cable, so such a large difference between theoretical and practical is ridiculous |
12:51
🔗
|
joepie91 |
I -very- rarely hit 90mbit up |
12:53
🔗
|
Nemo_bis |
mine is only 10 Mb/s but I always have 100 % of it |
12:59
🔗
|
midas |
joepie91: he said be gentle |
12:59
🔗
|
midas |
put some lube on your fiber. |
12:59
🔗
|
joepie91 |
lol |
12:59
🔗
|
joepie91 |
that's why I asked about a delay |
13:51
🔗
|
ZoeB |
Sorry, was having lunch |
13:51
🔗
|
ZoeB |
Back now! |
13:51
🔗
|
ZoeB |
To give you an idea of how busy that FTP site usually is, there server_stats.txt file says: |
13:51
🔗
|
ZoeB |
Number of bytes downloaded last 24 hours: 1751.2 MB |
13:51
🔗
|
ZoeB |
Number of files downloaded last 24 hours: 19740 |
13:52
🔗
|
ZoeB |
s/there/their |
13:52
🔗
|
ZoeB |
So, uh, please don't dwarf that, I guess! :) |
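One way to turn the server_stats.txt figures into a concrete answer to joepie91's delay question is the arithmetic below. It is only a sketch: it assumes decimal units (1 GB = 1000 MB, 1 MB = 1000 KB) and treats "don't dwarf that" as "add no more than the site's usual daily traffic", which is an interpretation, not something ZoeB specified.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_kb_per_s(mb_per_day):
    """Average transfer rate implied by a daily byte total."""
    return mb_per_day * 1000 / SECONDS_PER_DAY


def days_to_mirror(total_gb, budget_mb_per_day):
    """Days needed to fetch `total_gb` at a fixed daily budget."""
    return total_gb * 1000 / budget_mb_per_day


def seconds_between_files(files_per_day):
    """Average gap between file downloads at a given daily file count."""
    return SECONDS_PER_DAY / files_per_day


# Figures quoted in the chat.
print(round(average_rate_kb_per_s(1751.2), 1))  # about 20.3 KB/s
print(round(days_to_mirror(81.4, 1751.2), 1))   # about 46.5 days
print(round(seconds_between_files(19740), 1))   # about 4.4 s per file
```

So merely doubling the site's normal load would stretch the 81.4GB grab to around a month and a half, in line with ZoeB's "several weeks straight".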
13:54
🔗
|
ZoeB |
And yes, 81.4GB of MODs. :) 30GB of Fast Tracker 2 files, 17GB of Impulse Tracker files, etc etc. It's quite the collection! |
13:55
🔗
|
ZoeB |
Thank you very much! ^.^ |
17:37
🔗
|
xmc |
isp claims I have 10M up, looks reasonable: http://zeppelin.xrtc.net/corp.xrtc.net/kyat.corp.xrtc.net/if_eth1-day.png |
17:50
🔗
|
xmc |
had to stop the ftp scan early |
17:50
🔗
|
xmc |
17:16:00 83% (3h36m left); send: 3553485314 57.3 Kp/s (57.2 Kp/s avg); recv: 22972016 355 p/s (369 p/s avg); drops: 0 p/s (4 p/s avg); hits: 0.65% |
17:51
🔗
|
xmc |
output from that run is at http://bl-r.com/trx/ftp.txt.gz |
17:52
🔗
|
xmc |
(had to stop because the isp sent the hosting provider 5 nastygrams in 10 minutes) |
18:13
🔗
|
xmc |
zcat | wc -l gives 22,961,651 addresses in that file |
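The same count can be taken without a shell. A minimal sketch that mirrors `wc -l` by counting newline bytes in the decompressed stream, reading in chunks so the whole list never sits in memory (the function name and chunk size are arbitrary choices):

```python
import gzip


def count_newlines_gz(path, chunk_size=1 << 20):
    """Count newline bytes in a gzip file, like `zcat path | wc -l`."""
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total
```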
18:36
🔗
|
xmc |
ding-dong ditch |
18:36
🔗
|
xmc |
people bitch |
18:40
🔗
|
SketchCow |
I've downloaded modland before. |
18:40
🔗
|
SketchCow |
All for it being downloaded again. |
18:40
🔗
|
SketchCow |
The owner hates me |
18:42
🔗
|
arkiver |
is schemer.com already fully saved? |
19:12
🔗
|
SketchCow |
[2:12:01 PM] Hank Bromley (Internet Archive): for anyone keeping score at home, anand has succeeded in changing the size column in the metadata table from integer to bigint, and that monstrous 2.1 TB item has managed to update its row, which now shows a "size" value of 2331388015 (that's in KB) |
19:12
🔗
|
SketchCow |
For the people not keeping track, that means that the Archive Team just forced Internet Archive to work with 2.1 terabyte items |
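A quick check shows why the column type had to change. This assumes the old "integer" column was a signed 32-bit type and "bigint" a signed 64-bit one, as in common SQL databases; the log does not say which database is involved. It also assumes 1 KB = 1024 bytes.

```python
INT32_MAX = 2**31 - 1  # 2147483647, assumed ceiling of the old integer column
INT64_MAX = 2**63 - 1  # assumed ceiling of the new bigint column

size_kb = 2331388015   # "size" value quoted in the chat, in KB


def fits(value, limit):
    """True if a non-negative value can be stored under the given ceiling."""
    return 0 <= value <= limit


print(fits(size_kb, INT32_MAX))  # False: too large for a 32-bit integer
print(fits(size_kb, INT64_MAX))  # True: fits in a bigint

size_bytes = size_kb * 1024
print(round(size_bytes / 2**40, 2))  # 2.17, i.e. the "2.1 TB" item
```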
19:22
🔗
|
SketchCow |
http://archive.org/details/http://tectonicablog.com/wp-content/uploads/2010/04/lakata.org-01.jpg |
19:22
🔗
|
SketchCow |
http://archive.org/details/ftp-ftp.hp.com_pub-2013-10 sorry |
19:26
🔗
|
arkiver |
wow... |
19:26
🔗
|
arkiver |
that's a really big item SketchCow.. great! :) |
19:27
🔗
|
arkiver |
are the other directories from ftp://ftp.hp.com/ also going to be done? |
19:28
🔗
|
arkiver |
and do you have some kind of tutorial on how to create an FTP copy to upload like the other FTP uploads? |
19:29
🔗
|
yipdw |
something tells me it'd be funny to write an FTP server that used IA as a backend |
19:29
🔗
|
yipdw |
although I guess you can do that now with the IA FUSE module |