[00:13] godane: How much data did you get before you had to stop?
[00:21] it was only 2.6gb
[00:21] i may do a it different now looking at
[00:26] anyways the worst that would happen is that all of this data goes to FOS
[00:32] ok i'm starting to upload it to FOS
[01:05] what's the purpose of the files ftp_check.py produces in the `archive` directory?
[01:15] *** brayden_ has joined #archiveteam-bs
[01:15] *** swebb sets mode: +o brayden_
[01:20] *** brayden has quit IRC (Ping timeout: 633 seconds)
[01:36] *** coretx has quit IRC (Remote host closed the connection)
[01:37] *** Somebody has joined #archiveteam-bs
[01:39] *** coretx has joined #archiveteam-bs
[01:50] *** DiscantX has joined #archiveteam-bs
[02:08] *** i336 has quit IRC (Remote host closed the connection)
[02:13] *** DiscantX has quit IRC (Read error: Operation timed out)
[02:14] *** DFJustin has quit IRC (Remote host closed the connection)
[02:15] *** Start has quit IRC (Remote host closed the connection)
[02:19] *** Start has joined #archiveteam-bs
[02:19] *** DFJustin has joined #archiveteam-bs
[02:19] *** Start has quit IRC (Client Quit)
[02:20] *** DFJustin has quit IRC (Remote host closed the connection)
[02:25] *** DFJustin has joined #archiveteam-bs
[02:33] *** Start has joined #archiveteam-bs
[02:57] *** Ravenloft has quit IRC ()
[02:59] *** ravetcofx has quit IRC (Read error: Operation timed out)
[03:09] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[03:15] *** ravetcofx has joined #archiveteam-bs
[03:16] *** Sk1d has joined #archiveteam-bs
[04:40] *** ravetcofx has quit IRC (Read error: Operation timed out)
[04:49] *** ravetcofx has joined #archiveteam-bs
[05:16] *** Frogging has quit IRC (El Psy Kongroo!)
[05:24] *** Frogging has joined #archiveteam-bs
[05:32] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[05:38] *** Sk1d has joined #archiveteam-bs
[06:11] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[06:42] so i'm starting to upload BBC World Service Newshour
[06:42] i have about 46gb from 2009-06-23 to end of 2011
[06:50] *** ravetcofx has quit IRC (Read error: Operation timed out)
[07:06] *** ravetcofx has joined #archiveteam-bs
[07:15] *** brayden_ is now known as brayden
[07:19] SketchCow: did you miss getting a magazine called MCmicrocomputer?
[07:20] it's a computer magazine from italy
[07:21] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[07:22] *** dashcloud has joined #archiveteam-bs
[07:23] based on what i could tell it's not on archive.org
[07:24] *** BlueMaxim has joined #archiveteam-bs
[07:45] *** Honno has joined #archiveteam-bs
[08:20] *** ndiddy has quit IRC (Read error: Connection reset by peer)
[09:05] *** kristian_ has joined #archiveteam-bs
[09:21] *** DiscantX has joined #archiveteam-bs
[09:29] *** GE has joined #archiveteam-bs
[09:34] *** Smiley has joined #archiveteam-bs
[09:35] *** RichardG_ has joined #archiveteam-bs
[09:39] *** kanzure_ has joined #archiveteam-bs
[09:39] *** Igloo_ has joined #archiveteam-bs
[09:39] *** GE has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** RichardG has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** Yoshimura has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** godane has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** SmileyG has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** achip has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** superkuh has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** Igloo has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** kanzure has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** tpw_rules has quit IRC (hub.efnet.us irc.Prison.NET)
[09:39] *** aschmitz has quit IRC (hub.efnet.us irc.Prison.NET)
[09:52] *** Honno has quit IRC (Read error: Operation timed out)
[09:56] *** godane has joined #archiveteam-bs
[10:50] *** DiscantX has quit IRC (Ping timeout: 492 seconds)
[11:14] *** superkuh has joined #archiveteam-bs
[11:15] *** Igloo_ is now known as Igloo
[11:46] PurpleSym: what did you make https://archive.org/download/ftp-mayn-de-2016-08-04 with?
[11:47] downloading it now too, but if you're here, please let me know
[11:57] arkiver: Uh, wget, I think.
[11:58] Yes, header confirms it was wget.
[12:08] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[12:12] ah, I didn't know wget does FTP to WARC
[12:22] *** GE has joined #archiveteam-bs
[12:47] *** Honno has joined #archiveteam-bs
[13:29] I let my slow FTP crawler (1 folder per ~1.5s) take over ftp.sec.gov overnight, just to make a file list and get an estimated total size. It's still not finished: the file list alone is 18MB.
[13:31] most files are 4-20kB in size
[13:58] *** RichardG_ is now known as RichardG
[14:25] *** wp494 has quit IRC (Ping timeout: 506 seconds)
[14:27] *** wp494 has joined #archiveteam-bs
[15:22] *** REiN^ has quit IRC (Max SendQ exceeded)
[15:22] *** REiN^ has joined #archiveteam-bs
[15:31] *** Start has quit IRC (Quit: Disconnected.)
[15:32] *** REiN^ has quit IRC (Max SendQ exceeded)
[15:32] *** REiN^ has joined #archiveteam-bs
[16:35] VADemon: what information are you saving about the files?
[16:41] *** godane has quit IRC (Quit: Leaving.)
[16:41] *** godane has joined #archiveteam-bs
[16:54] ContraCT: download went fine, I think
[16:54] oops
[17:05] *** Somebody has quit IRC (Ping timeout: 370 seconds)
[17:07] *** kristian_ has quit IRC (Quit: Leaving)
[17:09] *** Fletcher has quit IRC (Ping timeout: 244 seconds)
[17:16] *** Fletcher has joined #archiveteam-bs
[17:51] *** Stiletto has quit IRC ()
[18:00] *** Start has joined #archiveteam-bs
[18:12] *** Start has quit IRC (Quit: Disconnected.)
[18:59] *** GE_ has joined #archiveteam-bs
[19:00] *** GE has quit IRC (Ping timeout: 255 seconds)
[19:00] *** GE_ is now known as GE
[19:23] godane: There's anything I possibly could get
[19:23] I always miss a few.
[19:32] *** jrwr has quit IRC (Remote host closed the connection)
[19:38] i think MC microcomputer may have been missed cause it's on issuu
[19:38] anyways i'm grabbing it and making cbz files out of the images
[19:42] even their website only directly goes to the issuu.com urls: http://www.mc-online.it/
[19:43] and to top it off i had to fix my issuu.sh script so it will work with the new html code
[19:43] *** Start has joined #archiveteam-bs
[20:24] *** bwn has quit IRC (Ping timeout: 244 seconds)
[20:25] *** Stiletto has joined #archiveteam-bs
[20:35] *** bwn has joined #archiveteam-bs
[20:37] *** Stiletto has quit IRC (Ping timeout: 362 seconds)
[20:38] *** Stiletto has joined #archiveteam-bs
[20:53] *** tpw_rules has joined #archiveteam-bs
[20:58] *** Start has quit IRC (Quit: Disconnected.)
[21:03] *** Start has joined #archiveteam-bs
[21:12] *** Stilett0 has joined #archiveteam-bs
[21:14] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[21:20] *** antomati_ is now known as antomatic
[21:24] *** Stilett0 has quit IRC (Read error: Connection reset by peer)
[21:30] *** DiscantX has joined #archiveteam-bs
[21:49] *** Stiletto has joined #archiveteam-bs
[21:50] waybackmachine.org in google - no meta data because of robots.txt
[21:50] I LOL'd
[21:54] *** DiscantX has quit IRC (Ping timeout: 492 seconds)
[21:58] Anyone know if there is another way to access the wayback machine? I'm trying to get around a web blocker
[22:02] Pay for proxy
[22:15] *** Famicoma1 has quit IRC (Ping timeout: 260 seconds)
[22:19] There isn't an alternate domain similar to how archive.is has archive.li and archive.fo ?
[22:20] set up a server
[22:20] SSH tunnel
[22:22] *** Stiletto has quit IRC (Remote host closed the connection)
[22:22] hook54321: remind me in the week, I'll sort something
[22:22] *** Stiletto has joined #archiveteam-bs
[22:27] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[22:28] *** Stiletto has joined #archiveteam-bs
[22:28] *** Start has quit IRC (Quit: Disconnected.)
[22:35] *** sep332 has quit IRC (Quit: konversation out)
[22:50] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[22:51] *** RichardG has joined #archiveteam-bs
[22:57] *** kristian_ has joined #archiveteam-bs
[23:00] HCross2: Remind you in a week or during the week?
[23:09] arkiver: file paths and sizes. it's still not finished.
[23:09] the website states that users shall use the index files to quickly get what they want, so if we believe that these indexes are a complete representation of the server, we may use them to plan out the mirroring
[23:11] *** BlueMaxim has joined #archiveteam-bs
[23:13] 345k+ files so far. i think of a warrior project..?
[23:13] *** GE has quit IRC (Remote host closed the connection)
[23:23] *** Smiley has quit IRC (Ping timeout: 250 seconds)
[23:24] *** Famicoma1 has joined #archiveteam-bs
[23:27] *** Smiley has joined #archiveteam-bs
[23:31] *** wp494_ has joined #archiveteam-bs
[23:32] *** wp494 has quit IRC (Ping timeout: 245 seconds)
[23:33] SketchCow: i got ftp://ftp.sec.noaa.gov
[23:33] *** wp494_ is now known as wp494
[23:35] How big
[23:36] 2.6gb zip
[23:36] it's about 3.8gb uncompressed
[23:37] i'm going after smaller ftps so i'm not stuck uploading it forever
[23:47] it might be better to grab FTPs as WARCs
[23:51] *** Stiletto has quit IRC (Read error: Operation timed out)
[23:52] i'm just grabbing them as zips for the file boneyards
[23:52] *** Stiletto has joined #archiveteam-bs
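
Note on the slow FTP listing crawl discussed above (one folder per ~1.5 s, recording file paths and sizes to estimate a total): the following is a minimal Python sketch of that idea, not the crawler actually used in the channel. It assumes the server supports the MLSD command (RFC 3659), which many older FTP servers do not; the names `crawl` and `summarize` are hypothetical and exist only for this illustration.

```python
import time
from ftplib import FTP  # only needed when connecting to a real server


def crawl(ftp, root="/", delay=1.5, sleep=time.sleep):
    """Walk an FTP tree and yield (path, size_in_bytes) for every file.

    `ftp` is any object with an ftplib-style mlsd(path) method.
    Assumption: the server implements MLSD and reports a "size" fact.
    """
    pending = [root]
    while pending:
        folder = pending.pop()
        for name, facts in ftp.mlsd(folder):
            kind = facts.get("type", "").lower()
            path = folder.rstrip("/") + "/" + name
            if kind == "dir":
                pending.append(path)      # visit this subfolder later
            elif kind == "file":
                yield path, int(facts.get("size", 0))
            # "cdir"/"pdir" entries (current/parent dir) are skipped
        sleep(delay)                      # pause once per folder listed


def summarize(entries):
    """Return (file_count, total_bytes) for an iterable of (path, size)."""
    entries = list(entries)
    return len(entries), sum(size for _, size in entries)
```

Against a real server this could be driven with something like `ftp = FTP(host); ftp.login(); count, total = summarize(crawl(ftp))`, where `host` is whatever server is being surveyed. Keeping the delay per folder rather than per file matches the "1 folder per ~1.5s" pace mentioned in the log and keeps the load on the server low.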