#archiveteam 2014-01-15,Wed

Time Nickname Message
00:44 🔗 godane so i got the very first ac360 podcast
00:44 🔗 godane so looks like i'm maybe able to get all of them
03:43 🔗 xmc SketchCow: I expect a 150M .txt.gz of ip addresses
03:44 🔗 xmc about 26,400,000 addrs
04:42 🔗 xmc now I have a way to check them automatedly
04:42 🔗 xmc ncftp is fortunately not shitty
05:00 🔗 xmc https://github.com/ArchiveTeam/ftp-nab
05:27 🔗 bsmith093 blit wanted an asstr archive , already done one recently
05:42 🔗 xmc bsmith093: I can't parse that, say again?
05:43 🔗 bsmith093 i'm scrolling through the logs, and what i meant to say was that i've already uploaded a fairly recent asstr archive, as of about a year ago
05:56 🔗 xmc hm, ok. what's asstr?
07:37 🔗 Nemo_bis Today I discovered two million+ pages wikis we archived just in time: http://wikiindex.org/Wikinfo (2013) http://wikiindex.org/WebsiteWiki (2012)
07:40 🔗 Nemo_bis xmc: great! I want to join the party and download some of those FTP sites when you're done nicely listing them :) no more huge ones though, at most 4-500 GB or so
07:42 🔗 SketchCow I finally wrote a script to deal with this nightmare site.
07:43 🔗 SketchCow It will: utterly download an entire subdirectory, remove the index.html?*=* files that happen, tar up the directory, delete it, and put an "already got it, don't get again" into a list.
07:47 🔗 SketchCow It's already yanked the machine back from the brink - had one drive full of 8tb of material.
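The clean-up steps SketchCow lists can be sketched roughly as below. This is not his script, which is not shown in the log; the function names, the `.tar` naming, and the plain-text done list are all assumptions made for illustration, and the download step itself is left out.

```python
import fnmatch
import os
import shutil
import tarfile


def already_done(name, done_list):
    """Return True if `name` is already recorded in the done-list file."""
    if not os.path.exists(done_list):
        return False
    with open(done_list) as fh:
        return name in (line.strip() for line in fh)


def pack_subdir(parent, name, done_list):
    """Clean, tar, delete, and record one already-downloaded subdirectory.

    Hypothetical helper: `parent/name` is assumed to hold the finished download.
    """
    if already_done(name, done_list):
        return None
    src = os.path.join(parent, name)
    # Remove sorted-listing pages such as "index.html?C=N;O=D".
    for dirpath, _dirs, files in os.walk(src):
        for fname in fnmatch.filter(files, "index.html[?]*=*"):
            os.remove(os.path.join(dirpath, fname))
    # Tar up the directory, then delete it to free the disk.
    tar_path = os.path.join(parent, name + ".tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src, arcname=name)
    shutil.rmtree(src)
    # Record the name so it is not fetched again.
    with open(done_list, "a") as fh:
        fh.write(name + "\n")
    return tar_path
```

Deleting each directory right after it is tarred is what keeps disk use bounded to roughly one subdirectory plus its tarball at a time.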
07:49 🔗 midas always nice if it keeps downloading the same data over and over again
07:50 🔗 xmc yow
07:58 🔗 Nemo_bis SketchCow: how big will the uploaded items be?
07:59 🔗 SketchCow Probably 200gb apiece
07:59 🔗 Nemo_bis a script to download FTP sites without worrying about running out of disk or *cough cough* uploading 2 TB items would be wonderful
07:59 🔗 Nemo_bis sensible
07:59 🔗 SketchCow Hmmph.
08:00 🔗 SketchCow Well, the big issue right now is that a lot of things break on archive.org dealing with that.
08:01 🔗 SketchCow -rw-r--r-- 1 root root 46146570240 Jan 15 02:35 ftp.icm.edu.pl_amiga.tar
08:01 🔗 SketchCow -rw-r--r-- 1 root root 1003048960 Jan 15 07:39 ftp.icm.edu.pl_beos.tar
08:01 🔗 SketchCow -rw-r--r-- 1 root root 1185730560 Jan 15 07:14 ftp.icm.edu.pl_garbo.tar
08:01 🔗 SketchCow etc
08:05 🔗 SketchCow I suspect the issue is the FreeBSD and BSD directories.
08:06 🔗 SketchCow I think they're as big as they get
08:09 🔗 godane i found flash video of cnn going back to 2008
08:09 🔗 godane this is a website video NOT a podcast
08:09 🔗 godane look here: http://money.cnn.com/sitemap_videos_0001.xml
08:10 🔗 godane if you know the image path you can find the video
08:10 🔗 godane money/video/news becomes money/big/news
08:11 🔗 godane then change host domain to ht3.cdn.turner.com and add _576x324_dl.flv to replace the WxH.jpg
08:49 🔗 godane SketchCow: looks like the older videos may still be around
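godane's image-to-video rewrite can be sketched as a small function. The exact shape of the image URLs is not given in the log, so the example URL in the usage note is hypothetical, as is the function name. How any separator before the dimensions should be handled is also not stated, so this sketch replaces only the `WxH.jpg` part and leaves whatever precedes it untouched.

```python
import re
from urllib.parse import urlsplit, urlunsplit

VIDEO_HOST = "ht3.cdn.turner.com"   # host named in the log
VIDEO_SUFFIX = "_576x324_dl.flv"    # suffix named in the log


def image_to_video_url(image_url):
    """Rewrite a thumbnail URL into a candidate video URL (hypothetical helper)."""
    parts = urlsplit(image_url)
    # money/video/news becomes money/big/news
    path = parts.path.replace("money/video/news", "money/big/news", 1)
    # Replace the trailing WxH.jpg (e.g. 120x90.jpg) with the video suffix.
    path, count = re.subn(r"\d+x\d+\.jpg$", VIDEO_SUFFIX, path)
    if count == 0:
        raise ValueError("no WxH.jpg suffix in %r" % image_url)
    # Swap the host; drop any query string or fragment.
    return urlunsplit((parts.scheme, VIDEO_HOST, path, "", ""))
```

For a made-up input such as `http://images.example/money/video/news/2008/01/02/clip.120x90.jpg`, this returns a URL on `ht3.cdn.turner.com` under `money/big/news`. Whether that URL actually resolves would have to be checked per video.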
12:28 🔗 ZoeB Does anyone have a full backup of this site? http://www.heavensgate.com/
12:30 🔗 joepie91 if not yet, then we will soon
12:30 🔗 * joepie91 added it to archivebot
12:32 🔗 ZoeB Thanks!
12:34 🔗 ZoeB I hear you're going after every FTP site now too?
12:37 🔗 ZoeB Might I ask you make sure ftp://ftp.modland.com/ is on that list? It's 81.4GB of Amiga mods and their derivatives (ie the music part of the demoscene), including tracker software and related utilities for multiple platforms. I would be grabbing it myself already but my Raspberry Pi doesn't have that much space, and I'm not leaving my laptop on for several weeks straight... :)
12:37 🔗 ZoeB Username "anonymous", no password, I believe. Be gentle / slow!
12:39 🔗 Baljem 81.4GB of MODs? oh my. having Inertia Player flashbacks now...
12:46 🔗 joepie91 ZoeB: sure, I'll grab that FTP too
12:46 🔗 joepie91 I think wget does FTP
12:46 🔗 joepie91 what
12:46 🔗 joepie91 er
12:46 🔗 joepie91 what kind of delay between requests would you recommend *
12:47 🔗 * joepie91 has 4TB disk now, so that wouldn't be a problem to grab
12:47 🔗 * joepie91 is also on 100mbit
12:48 🔗 Nemo_bis 100 both up and down?
12:49 🔗 Nemo_bis delay for FTP sites? hahahahhahahahahaha
12:50 🔗 joepie91 Nemo_bis: theoretical, yes
12:51 🔗 joepie91 practically, it's more like 85/55
12:51 🔗 joepie91 because my ISP is balls
12:51 🔗 joepie91 this is FttH, not cable, so such a large difference between theoretical and practical is ridiculous
12:51 🔗 joepie91 I -very- rarely hit 90mbit up
12:53 🔗 Nemo_bis mine is only 10 Mb/s but I always have 100 % of it
12:59 🔗 midas joepie91: he said be gentle
12:59 🔗 midas put some lube on your fiber.
12:59 🔗 joepie91 lol
12:59 🔗 joepie91 that's why I asked about a delay
13:51 🔗 ZoeB Sorry, was having lunch
13:51 🔗 ZoeB Back now!
13:51 🔗 ZoeB To give you an idea of how busy that FTP site usually is, there server_stats.txt file says:
13:51 🔗 ZoeB Number of bytes downloaded last 24 hours: 1751.2 MB
13:51 🔗 ZoeB Number of files downloaded last 24 hours: 19740
13:52 🔗 ZoeB s/there/their
13:52 🔗 ZoeB So, uh, please don't dwarf that, I guess! :)
13:54 🔗 ZoeB And yes, 81.4GB of MODs. :) 30GB of Fast Tracker 2 files, 17GB of Impulse Tracker files, etc etc. It's quite the collection!
13:55 🔗 ZoeB Thank you very much! ^.^
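ZoeB's server_stats.txt figures give a rough baseline for a "gentle" crawl. The arithmetic below is only a sketch: it assumes decimal units (1 GB = 1000 MB) and treats the site's usual daily traffic as the pace to stay near, which nobody in the channel actually agreed on.

```python
SECONDS_PER_DAY = 24 * 60 * 60

mb_per_day = 1751.2      # "Number of bytes downloaded last 24 hours"
files_per_day = 19740    # "Number of files downloaded last 24 hours"
site_size_gb = 81.4      # total size quoted by ZoeB

# Average load the server normally sees.
kb_per_second = mb_per_day * 1000 / SECONDS_PER_DAY
seconds_per_file = SECONDS_PER_DAY / files_per_day

# Time to mirror the whole site at exactly that pace (assumes 1 GB = 1000 MB).
days_at_normal_pace = site_size_gb * 1000 / mb_per_day

print("average rate: %.1f KB/s" % kb_per_second)
print("one file every %.1f s" % seconds_per_file)
print("full mirror at that pace: %.1f days" % days_at_normal_pace)
```

At the server's usual pace a full mirror would take over six weeks, so any practical crawl will exceed the normal daily traffic; the delay question is about by how much.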
17:37 🔗 xmc isp claims I have 10M up, looks reasonable: http://zeppelin.xrtc.net/corp.xrtc.net/kyat.corp.xrtc.net/if_eth1-day.png
17:50 🔗 xmc had to stop the ftp scan early
17:50 🔗 xmc 17:16:00 83% (3h36m left); send: 3553485314 57.3 Kp/s (57.2 Kp/s avg); recv: 22972016 355 p/s (369 p/s avg); drops: 0 p/s (4 p/s avg); hits: 0.65%
17:51 🔗 xmc output from that run is at http://bl-r.com/trx/ftp.txt.gz
17:52 🔗 xmc (had to stop because the isp sent the hosting provider 5 nastygrams in 10 minutes)
18:13 🔗 xmc zcat | wc -l gives 22,961,651 addresses in that file
18:36 🔗 xmc ding-dong ditch
18:36 🔗 xmc people bitch
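The numbers xmc quotes can be cross-checked with a few lines. This sketch counts lines in a gzipped list the way `zcat | wc -l` does, and recomputes a hit rate as packets received over packets sent. That definition is an assumption about the scanner's "hits" column; it happens to reproduce the 0.65% in the status line. The function names are made up for illustration.

```python
import gzip


def count_lines(path):
    """Count newline-terminated lines in a gzip file without loading it all."""
    with gzip.open(path, "rb") as fh:
        return sum(1 for _ in fh)


def hit_rate_percent(sent, received):
    """Share of probes that drew a response, as a percentage."""
    return 100.0 * received / sent


# Figures from the scanner status line in the log.
print("%.2f%%" % hit_rate_percent(3553485314, 22972016))
```

Running `count_lines` on a local copy of ftp.txt.gz should match the 22,961,651 figure if the file is unchanged.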
18:40 🔗 SketchCow I've downloaded modland before.
18:40 🔗 SketchCow All for it being downloaded again.
18:40 🔗 SketchCow The owner hates me
18:42 🔗 arkiver is schemer.com already fully saved?
19:12 🔗 SketchCow [2:12:01 PM] Hank Bromley (Internet Archive): for anyone keeping score at home, anand has succeeded in changing the size column in the metadata table from integer to bigint, and that monstrous 2.1 TB item has managed to update its row, which now shows a "size" value of 2331388015 (that's in KB)
19:12 🔗 SketchCow For the people not keeping track, that means that the Archive Team just forced Internet Archive to work with 2.1 terabyte items
19:22 🔗 SketchCow http://archive.org/details/http://tectonicablog.com/wp-content/uploads/2010/04/lakata.org-01.jpg
19:22 🔗 SketchCow http://archive.org/details/ftp-ftp.hp.com_pub-2013-10 sorry
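The size value Hank quotes shows why the column had to change. The check below assumes the old `integer` column was a 32-bit signed type and that the KB figure uses 1024-byte units; neither is stated in the log.

```python
INT32_MAX = 2**31 - 1        # largest value of a 32-bit signed integer (assumed column type)
size_kb = 2331388015         # "size" value quoted in the log, in KB

overflows_int32 = size_kb > INT32_MAX
size_tib = size_kb / 2**30   # KB -> TiB, assuming 1 KB = 1024 bytes

print("int32 max:", INT32_MAX)
print("fits in 32-bit signed integer:", not overflows_int32)
print("size: %.2f TiB" % size_tib)
```

Under those assumptions the value exceeds the 32-bit limit of 2147483647 and works out to a little over 2.1 TiB, consistent with the "2.1 TB" in the quote.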
19:26 🔗 arkiver wow...
19:26 🔗 arkiver that's a really big item SketchCow.. great! :)
19:27 🔗 arkiver are the other directories from ftp://ftp.hp.com/ also going to be done?
19:28 🔗 arkiver and do you have some kind of tutorial on how to create a ftp copy to upload like the other ftp uploads?
19:29 🔗 yipdw something tells me it'd be funny to write an FTP server that used IA as a backend
19:29 🔗 yipdw although I guess you can do that now with the IA FUSE module
