[00:27] *** SmileyG has quit IRC (Remote host closed the connection)
[00:29] *** Kirk has quit IRC (Ping timeout: 240 seconds)
[00:30] *** BlueMaxim has quit IRC (Ping timeout: 265 seconds)
[00:30] *** Kirk has joined #archiveteam-bs
[00:31] *** Start has quit IRC (Read error: Connection reset by peer)
[00:31] *** BlueMaxim has joined #archiveteam-bs
[00:33] *** Start has joined #archiveteam-bs
[00:34] *** wp494 has quit IRC (Read error: Operation timed out)
[00:37] *** Smiley has joined #archiveteam-bs
[00:49] *** wp494 has joined #archiveteam-bs
[00:52] *** GLaDOS has joined #archiveteam-bs
[00:52] *** swebb sets mode: +o GLaDOS
[01:11] *** aaaaaaaaa has quit IRC (Read error: Operation timed out)
[01:14] *** aaaaaaaaa has joined #archiveteam-bs
[01:15] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:21] *** dashcloud has joined #archiveteam-bs
[01:32] *** primus104 has quit IRC (Leaving.)
[02:39] *** RedType_ has quit IRC (Remote host closed the connection)
[02:58] *** mistym has quit IRC (Remote host closed the connection)
[03:29] *** mistym has joined #archiveteam-bs
[03:38] *** dashcloud has quit IRC (hub.dk irc.homelien.no)
[03:38] *** ionpulse has quit IRC (hub.dk irc.homelien.no)
[03:38] *** pikhq has quit IRC (hub.dk irc.homelien.no)
[03:38] *** altlabel has quit IRC (hub.dk irc.homelien.no)
[03:54] *** dashcloud has joined #archiveteam-bs
[04:04] *** SN4T14 has joined #archiveteam-bs
[05:00] *** mistym has quit IRC (Remote host closed the connection)
[05:23] *** aaaaaaaaa has quit IRC (Leaving)
[05:45] *** mistym has joined #archiveteam-bs
[05:57] *** mistym has quit IRC (Remote host closed the connection)
[06:03] *** RedType has joined #archiveteam-bs
[06:12] *** RedType has quit IRC (Quit: Lost terminal)
[06:19] *** RedType has joined #archiveteam-bs
[06:23] *** mistym has joined #archiveteam-bs
[06:26] so there may be an archive of PRI's The World on audible.com
[06:33] *** RedType has quit IRC (Client Quit)
[07:04] *** Muad-Dib has quit IRC (Ping timeout: 260 seconds)
[07:08] *** Muad-Dib has joined #archiveteam-bs
[07:15] *** pikhq has joined #archiveteam-bs
[07:17] *** ionpulse has joined #archiveteam-bs
[07:19] https://medium.com/message/archive-fever-2a330b627274
[07:24] "The archivist produces more archive, and that is why the archive is never closed. It opens out of the future."
[08:00] can we crawl the arduino sites? there's some sort of split happening http://hackaday.com/2015/02/25/arduino-v-arduino/
[08:12] *** mistym has quit IRC (Remote host closed the connection)
[08:18] *** primus104 has joined #archiveteam-bs
[08:25] *** acridAxid has quit IRC (Quit: Quitting)
[08:29] *** dashcloud has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** wp494 has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** rejon has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** Famicoma1 has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** balrog has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** lrkj has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** slash` has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** Baljem has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** espes__ has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** Mayonaise has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** marvinw has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** Cameron_D has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** ohhdemgir has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** eprillios has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** Rickster has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** fenn has quit IRC (west.us.hub irc.eversible.com)
[08:29] *** xmc has quit IRC (west.us.hub irc.eversible.com)
[08:30] *** acridAxid has joined #archiveteam-bs
[08:30] *** wp494_ has joined #archiveteam-bs
[08:30] *** wp494_ has quit IRC (Excess Flood)
[08:30] *** wp494_ has joined #archiveteam-bs
[08:31] *** Rickster` has joined #archiveteam-bs
[08:32] *** lrkj_ has joined #archiveteam-bs
[08:36] *** acridAxid has quit IRC (Read error: Operation timed out)
[08:41] *** acridAxid has joined #archiveteam-bs
[08:44] *** Rickster` is now known as Rickster
[08:44] *** rejon has joined #archiveteam-bs
[08:44] *** balrog has joined #archiveteam-bs
[08:44] *** slash` has joined #archiveteam-bs
[08:44] *** Baljem has joined #archiveteam-bs
[08:44] *** espes__ has joined #archiveteam-bs
[08:44] *** Mayonaise has joined #archiveteam-bs
[08:44] *** marvinw has joined #archiveteam-bs
[08:44] *** Cameron_D has joined #archiveteam-bs
[08:44] *** xmc has joined #archiveteam-bs
[08:44] *** irc.eversible.com sets mode: +oo balrog xmc
[08:44] *** swebb sets mode: +o balrog
[08:44] *** swebb sets mode: +o xmc
[08:48] *** fenn has joined #archiveteam-bs
[08:51] *** fenn has quit IRC (west.us.hub irc.eversible.com)
[08:51] *** rejon has quit IRC (west.us.hub irc.eversible.com)
[08:51] *** balrog has quit IRC (west.us.hub irc.eversible.com)
[08:51] *** slash` has quit IRC (west.us.hub irc.eversible.com)
[08:51] *** Baljem has quit IRC (west.us.hub irc.eversible.com)
[08:51] *** espes__ has quit IRC (west.us.hub irc.eversible.com)
[08:51] *** Mayonaise has quit IRC (west.us.hub irc.eversible.com)
[08:51] *** marvinw has quit IRC (west.us.hub irc.eversible.com)
[08:51] *** Cameron_D has quit IRC (west.us.hub irc.eversible.com)
[08:51] *** xmc has quit IRC (west.us.hub irc.eversible.com)
[08:54] *** espes___ has joined #archiveteam-bs
[08:55] *** schbirid has joined #archiveteam-bs
[08:55] *** marvinw_ has joined #archiveteam-bs
[08:55] *** Baljem_ has joined #archiveteam-bs
[09:06] *** eprillios has joined #archiveteam-bs
[09:10] *** dashcloud has joined #archiveteam-bs
[09:10] *** rejon has joined #archiveteam-bs
[09:10] *** balrog has joined #archiveteam-bs
[09:10] *** Mayonaise has joined #archiveteam-bs
[09:10] *** Cameron_D has joined #archiveteam-bs
[09:10] *** xmc has joined #archiveteam-bs
[09:10] *** irc.eversible.com sets mode: +oo balrog xmc
[09:10] *** swebb sets mode: +o balrog
[09:10] *** swebb sets mode: +o xmc
[09:19] *** fenn has joined #archiveteam-bs
[09:21] *** primus104 has quit IRC (Leaving.)
[09:46] *** eprillios has quit IRC (Ping timeout: 506 seconds)
[09:52] *** eprillios has joined #archiveteam-bs
[09:59] *** Famicoman has joined #archiveteam-bs
[10:18] *** swebb has quit IRC (Read error: Operation timed out)
[10:22] *** swebb has joined #archiveteam-bs
[12:00] *** slash` has joined #archiveteam-bs
[12:11] whoop whoop
[12:11] first mirror-to-IA code in Node.js completed
[12:11] handles unicode, newlines, the whole shebang :D
[12:12] * ersi pats joepie91_
[12:23] * BlueMaxim pats ersi
[12:25] *** primus104 has joined #archiveteam-bs
[12:27] * ersi explodes
[12:53] damnit BlueMaxim
[12:53] I told you not to pat the creeper
[12:54] now I have to rebuild my code
[12:54] :(
[12:56] * BlueMaxim is exploded
[13:04] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:11] *** wp494_ has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES)
[13:11] *** wp494 has joined #archiveteam-bs
[13:24] *** ohhdemgir has joined #archiveteam-bs
[13:46] *** sankin has joined #archiveteam-bs
[14:50] *** sankin has quit IRC (Leaving.)
[14:58] *** sankin has joined #archiveteam-bs
[15:02] *** primus104 has quit IRC (Leaving.)
[15:32] *** mistym has joined #archiveteam-bs
[15:50] *** mistym has quit IRC (Remote host closed the connection)
[15:53] *** mhazinsk has quit IRC (Ping timeout: 186 seconds)
[15:56] *** mhazinsk has joined #archiveteam-bs
[16:11] *** mistym has joined #archiveteam-bs
[16:33] *** Rickster has quit IRC (hub.se efnet.port80.se)
[16:33] *** Muad-Dib has quit IRC (hub.se efnet.port80.se)
[16:33] *** GLaDOS has quit IRC (hub.se efnet.port80.se)
[16:33] *** WubTheCap has quit IRC (hub.se efnet.port80.se)
[16:33] *** Sue_ has quit IRC (hub.se efnet.port80.se)
[16:33] *** danneh_ has quit IRC (hub.se efnet.port80.se)
[16:33] *** deathy has quit IRC (hub.se efnet.port80.se)
[16:36] *** aaaaaaaaa has joined #archiveteam-bs
[16:37] *** Sue__ has joined #archiveteam-bs
[16:49] *** WubTheCap has joined #archiveteam-bs
[17:12] *** mistym has quit IRC (Remote host closed the connection)
[17:27] *** Start_ has joined #archiveteam-bs
[17:27] *** Start has quit IRC (Read error: Connection reset by peer)
[17:27] *** Start_ is now known as Start
[17:38] *** primus104 has joined #archiveteam-bs
[17:46] *** primus104 has quit IRC (Leaving.)
[17:59] *** xmc sets mode: +o swebb
[18:06] *** mistym has joined #archiveteam-bs
[18:07] *** mistym_ has joined #archiveteam-bs
[18:16] *** mistym has quit IRC (Ping timeout: 600 seconds)
[18:27] *** primus104 has joined #archiveteam-bs
[18:34] *** slash` has quit IRC (Ping timeout: 512 seconds)
[18:36] *** balrog has quit IRC (Ping timeout: 512 seconds)
[18:37] *** balrog has joined #archiveteam-bs
[18:37] *** swebb sets mode: +o balrog
[19:11] *** RedType has joined #archiveteam-bs
[19:23] *** slash` has joined #archiveteam-bs
[19:25] *** RedType_ has joined #archiveteam-bs
[19:26] *** RedType_ has quit IRC (Client Quit)
[19:28] *** RedType_ has joined #archiveteam-bs
[19:42] *** RedType has quit IRC (Quit: Lost terminal)
[20:12] SketchCow: i'm grabbing web ahead episodes
[20:13] *** kyan has quit IRC (Quit: Leaving)
[20:17] *** BlueMaxim has joined #archiveteam-bs
[20:19] Wow, unbelievably sad..
[20:19] Idiots in "IS" have destroyed pieces up to 3500 years old at the museum of Mosul, northern Iraq
[20:23] i heard about that yesterday on glenn beck show
[20:30] was the Google Reader grab really 9TB or am I mathing wrong?
[20:59] *** RedType_ has quit IRC (Quit: leaving)
[20:59] *** RedType has joined #archiveteam-bs
[21:01] http://imgur.com/gallery/KO9V4
[21:07] i'm uploading 2008-03 urls of theguardian.com
[21:08] turns out i grabbed that in 20141006
[21:10] *** mistym_ has quit IRC (Remote host closed the connection)
[21:31] they took out the mosul library too http://www.csmonitor.com/Books/chapter-and-verse/2015/0225/ISIS-burns-Mosul-library-Why-terrorists-target-books
[21:56] *** mistym has joined #archiveteam-bs
[22:01] *** sankin has quit IRC (Leaving.)
[22:15] *** n00b740 has joined #archiveteam-bs
[22:15] *** n00b740 has quit IRC (Client Quit)
[22:28] *** lelandbat has joined #archiveteam-bs
[22:28] *** cbb2 has joined #archiveteam-bs
[22:29] Hello?
[22:31] sup
[22:33] I was invited to the channel by sep332 to discuss a quandary I'm facing
[22:33] I'll just dump here:
[22:36] A while ago, I built a site to make gifs out of YouTube videos. I put no restrictions on size, duration, or number. Since then, about 30,000 gifs have been made, about 200GB in total.
[22:38] I don't want to host 200GB of gifs, many of which are not particularly popular. What do I do?
[22:39] hm
[22:39] Or more accurately, what would people recommend that I do? I have a backup of all gifs on my own computer, but I've been deleting old gifs on the site in an effort to reduce space.
[22:39] do you want to keep them at their current urls
[22:39] ok
[22:39] so, if you don't care that much, stick them into a .tar file and then http://archive.org/upload/
[22:40] best to include a file with original urls, other metadata
[22:40] How would I include this metadata?
[22:40] uh, what do you have now and what form is it in
[22:42] so i think archive doesn't count duplicates right
[22:43] the 0511full.wma in theworld.org/content url will say the 2 urls are unique
[22:43] i grabbed both mp3s and they have the same md5sum
[22:47] Right now, I have no metadata (other than the preserved file creation dates). Since the files were just served out of an apache directory, the url for any given gif is just a hostname+path+gif_name
[22:52] What format should metadata be in?
[22:59] if all you've got is file modification times and paths, tar should be fin
[22:59] e
[22:59] if you have anything else relevant, like what youtube video they came out of, you probably should include that as well
[23:00] in a csv or whatever. it's not a giant machine-readable dataset. probably just useful to humans. i'd suggest writing up a quick readme too
[23:17] *** cbb2 has quit IRC (hub.dk irc.efnet.pl)
[23:17] *** primus104 has quit IRC (hub.dk irc.efnet.pl)
[23:17] *** schbirid has quit IRC (hub.dk irc.efnet.pl)
[23:17] *** Zebranky_ has quit IRC (hub.dk irc.efnet.pl)
[23:17] *** cbb2 has joined #archiveteam-bs
[23:17] *** primus104 has joined #archiveteam-bs
[23:17] *** schbirid has joined #archiveteam-bs
[23:17] *** Zebranky_ has joined #archiveteam-bs
[23:18] Ah, I see. But if I could get that metadata, such as the original video a gif was from, that would be interesting?
[23:22] IA can address files inside of .tar as well so you could set up your server to redirect to the archived versions and not break links
[23:22] the more metadata the merrier generally
[23:24] How would I set up my server to redirect to archived versions? I thought about doing that myself: converting the gifs into much more efficient webm videos, then redirecting to those. I'd be fine with that since webm is about 15x smaller than gifs (in my testing), but I don't know how to set up the redirects.
[23:24] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[23:24] look for a mod_rewrite tutorial for apache, it's not too hard
[23:31] once you have the .tar uploaded to an internet archive item like this https://archive.org/details/ftp.cs.umanitoba.ca
[23:32] https://archive.org/details/Y2K_Family_Survival_Guide_With_Leonard_Nimoy_Palsojom1.X264.CG
[23:32] you can then access files inside like this http://archive.org/download/ftp.cs.umanitoba.ca/2014.06.ftp.cs.umanitoba.ca.tar/ftp.cs.umanitoba.ca/pub/andersj/nelson/CGoodmanR.jpg
[23:32] basically archive.org/itemname/filename.tar/directories/inside/tar/file.name
[23:33] oh, how cool!
[23:33] er archive.org/download/itemname/filename.tar/directories/inside/tar/file.name
[23:33] we've been systematically putting up old ftps this way https://archive.org/details/ftpsites
[23:34] How could I efficiently upload 200+GB to Archive.org?
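(Sketch, not from the channel: one way to produce the "file with original urls, other metadata" suggested above, given only an Apache directory of gifs. The function name `build_manifest`, the column names, and the example base URL are all hypothetical; the only inputs assumed are the file paths and modification times mentioned in the conversation.)

```python
import csv
import os
from datetime import datetime, timezone
from urllib.parse import quote


def build_manifest(gif_dir, base_url, out_csv):
    """Write one CSV row per .gif under gif_dir and return the row count.

    base_url is the public URL the directory used to be served from
    (an assumption here: hostname + path, as described in the chat).
    """
    rows = []
    for root, _dirs, files in os.walk(gif_dir):
        for name in files:
            if not name.lower().endswith(".gif"):
                continue
            full = os.path.join(root, name)
            # Path relative to the served directory, with forward slashes.
            rel = os.path.relpath(full, gif_dir).replace(os.sep, "/")
            st = os.stat(full)
            mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
            original_url = base_url.rstrip("/") + "/" + quote(rel)
            rows.append((original_url, rel, st.st_size, mtime.isoformat()))
    rows.sort()  # stable output regardless of directory walk order
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["original_url", "path_in_tar", "size_bytes", "mtime_utc"])
        writer.writerows(rows)
    return len(rows)
```

The resulting CSV can sit next to a short readme inside the .tar; extra columns (for example the source YouTube video, if it can be recovered) could be appended later without changing the rest.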
[23:34] note that it will redirect to a url like ia802506.us.archive.org, you don't want to use that as periodically the ia##### server number will change
[23:34] good to know
[23:36] there is a command line upload tool https://pypi.python.org/pypi/internetarchive
[23:36] perfect, thank you!
[23:36] I would recommend testing with something less than 200gb first though
[23:37] *** BlueMaxim has joined #archiveteam-bs
[23:38] you can delete and replace files within an item
[23:38] but you can't delete an item
[23:38] you can mark an item as 'test' when creating it, which means it'll be deleted soonish
[23:40] .zip also works but is probably pointless for .gif content
[23:40] .tar.gz etc does not work
[23:45] *** Jonimus has joined #archiveteam-bs
[23:50] *** nico_32_ has joined #archiveteam-bs
[23:50] *** nico_32 has quit IRC (Read error: Connection reset by peer)
[23:50] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[23:53] *** BlueMaxim has joined #archiveteam-bs
[23:57] yep. plain tarfile
[23:58] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[23:59] *** BlueMaxim has joined #archiveteam-bs
[23:59] is linking to files inside tar files performant enough and welcomed by IA?
[23:59] i mean lots of tiny files
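(Sketch, not from the channel: the URL pattern described above, archive.org/download/itemname/filename.tar/directories/inside/tar/file.name, written out as a helper that a redirect rule could call. `tar_member_url` and `redirect_target` are hypothetical names, and the item and tar names passed to `redirect_target` are placeholders. It uses https and the stable archive.org hostname rather than an ia##### server, per the advice above.)

```python
from urllib.parse import quote, unquote


def tar_member_url(item, tar_name, member_path):
    """Build the archive.org/download URL for a file inside an uploaded .tar."""
    parts = [item, tar_name] + member_path.strip("/").split("/")
    # Percent-encode each path segment separately so "/" stays a separator.
    return "https://archive.org/download/" + "/".join(quote(p, safe="") for p in parts)


def redirect_target(request_path, item, tar_name, tar_root=""):
    """Map a request path on the old site to its archived location.

    tar_root is the top-level directory inside the tar, if there is one
    (an assumption about how the tar was built).
    """
    pieces = [tar_root.strip("/"), unquote(request_path).lstrip("/")]
    member = "/".join(p for p in pieces if p)
    return tar_member_url(item, tar_name, member)
```

With the example from the log, `tar_member_url("ftp.cs.umanitoba.ca", "2014.06.ftp.cs.umanitoba.ca.tar", "ftp.cs.umanitoba.ca/pub/andersj/nelson/CGoodmanR.jpg")` gives the same path as the link pasted at 23:32, only with https. An Apache mod_rewrite rule would express the same mapping in its own syntax.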