[00:26] Upload stuff and tell me if you want to see it added.
[00:57] I take it the external drives were cheaper to buy than OEM/internal ones? https://en.wikipedia.org/wiki/File:Incoming_additional_storage_at_Internet_Archive.jpg
[00:58] Nemo_bis: using x-archive-meta-mediatype ?
[01:12] odie5533: yes
[01:12] that was during the thailand floods
[01:12] externals were cheaper than internals generally
[01:25] plastic case and USB3 controller = packing material ;)
[01:47] How do I get a warc upload added to the waybackmachine?
[02:39] odie5533: SketchCow or another IAer has to give it the 'web' mediatype
[02:50] ivan`: I already set it to web
[02:52] lol.. some poor lacky had to bust all those externals open. If they were USB3 you could get a few bucks each out of them.. trade 20 for one more HD
[03:53] Nemo_bis: ( is it included twice?) yes, i'm almost certain it is. tar adds files in the order listed. I've tar'd up source code into smaller files by reversing strings, sorting, and reversing again, to get the mostly-similar files as close as possible. (pre 7z/lzma era)
[03:56] is there some manifest of google video content i can check against to see if the files on this server are stored elsewhere?
[06:42] http://ask.metafilter.com/251635/With-every-good-wish-cordially
[06:43] no idea what should happen there. maybe SketchCow could tell the op what to do, but those tapes should either go to ann druyan (carl sagan's widow) or the library of congress (where the rest of carl sagan's papers reside) most likely
[06:43] (in my opinion)
[07:58] odie5533: yes
[07:58] Nemo_bis: thanks. I was able to set it.
[07:58] Coderjoe: sure, it follows the same order, but does it add *the same file* twice if listed twice in the same command?
[07:58] :)
[07:58] Nemo_bis: after using the s3 interface, the normal interface to add files is broken.
[08:01] Nemo_bis: it only shows the original files, and then when I try to upload another file using the web interface, it deletes the files I'd uploaded via the s3 interface.
[08:02] chfoo: hey. Does warcat have its own code for reading in WARC files, similar to hanzo warctools?
[08:04] odie5533: never saw such a thing
[08:04] remember not to do two things at once, or one upload/derive process may overwrite/delete the files of the other
[08:06] odie5533: it's an independent implementation of the warc file format. the inspiration for writing it was to have more diversity so i purposely didn't look at the code in hanzo warctools.
[08:08] That Sagan thing is stupid and I won't set foot in it.
[08:12] Kurisu: Yes, we can safely say that we have a very good system to detect penis drawings. Hopefully player’s Miiverse experiences will be a fun one!
[08:12] According to a survey of Wii U users, around 20% of drawings encountered through the Miiverse include one or more penises.
[08:13] today i learned: the new chromebook update interprets right click as paste in crosh!
[08:13] Kurisu: Yes, we can safely say that we have a very good system to detect penis drawings. Hopefully player’s Miiverse experiences will be a fun one!
[08:13] According to a survey of Wii U users, around 20% of drawings encountered through the Miiverse include one or more penises.
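
[Regarding the x-archive-meta-mediatype exchange above (00:58–02:50): a minimal sketch of setting a mediatype while uploading through archive.org's S3-like API. The item name, filename, and credential values are hypothetical placeholders, and, as noted at 02:39, an IA admin still has to approve the 'web' mediatype before a WARC feeds the Wayback Machine.]

    # Sketch only, assuming the 'requests' library and placeholder names;
    # not the exact commands used in the chat.
    import requests

    ACCESS_KEY = "YOUR_IA_ACCESS_KEY"   # hypothetical credentials
    SECRET_KEY = "YOUR_IA_SECRET_KEY"

    with open("example.warc.gz", "rb") as warc:
        resp = requests.put(
            "https://s3.us.archive.org/example-item/example.warc.gz",
            data=warc,
            headers={
                "authorization": f"LOW {ACCESS_KEY}:{SECRET_KEY}",
                "x-archive-auto-make-bucket": "1",   # create the item if it does not exist
                "x-archive-meta-mediatype": "web",   # the header discussed at 00:58
            },
        )
    resp.raise_for_status()
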
[08:14] you know, im going to just stop trying to open that url
[08:14] http://www.p4rgaming.com/iwata-asks-miiverse-penis-drawing-detection-took-weeks-to-develop/ <- not sure if serious or not
[08:15] it's not real fyi
[08:16] i guess that's what i get for being on the alpha channel
[08:24] chfoo: Would you recommend its usage over warctools? Do you know how it compares to warctools?
[08:29] odie5533: it's up to you. it's not meant to replace warctools but let you choose an alternative. i don't know how it compares, someone said they were going to do a comparison report but that person hasn't come back.
[08:30] chfoo: Does warcat store an entire record in memory as it reads?
[08:34] odie5533: no, it buffers parts of the files to disk so its a bit disk io heavy
[08:34] huh, what parts does it buffer, and why?
[08:37] odie5533: because python doesn't offer a low level api to gzip to seek gzip boundaries, i have to seek the entire warc.gz file. so what it does is keep 500MB decompressed chunks before and after the current location in the file so seeking is faster.
[08:44] oh, i also noticed both warctools and mine cannot handle gzip compressed chunked transfer encoding
[08:44] warctools seems to use a GzipRecordFile class for seeking, though I'm not sure
[08:55] chfoo: The warctools code appears to call file_handle.seek(offset) and then open the file_handle as a gzip. Does this method work?
[09:02] odie5533: as long as it doesn't seek backwards, or all around, it should be ok. i need to use peek() for gzip but it was only recently implemented in python 3.2. so i ended up writing buffering so i could seek anywhere i want.
[09:08] chfoo: there's a utility in warctools that decompresses and dechunks gzip-compressed chunked data
[09:09] warc2warc -D
[09:29] yipdw: yeah, i know. warctools has a lot more features. i was just explaining why i went for a pseudo clean room design approach.
[09:44] This is going to sound like a simple question, but how would I search a site through a web archive? For example, I see userscripts.org is being backed up, but how would one search it after it's submitted to IA?
[09:46] There's no good way, but you can search the urls
[09:46] by adding /* to the end of the url in the archive view thing
[10:10] Hmm, so if the title isn't in the URL there's no advanced search option to search text?
[10:11] That's the problem not having any index of site content, it requires knowledge of the specific address to be useful in the future.
[10:24] Sum1: There is I believe the archiveit archive, which indexes it
[10:24] but most of the stuff is not indexed as far as I know.
[10:25] Sum1: https://archive-it.org/
[10:27] Nice. Seems to be used only for colleges, etc though.
[15:02] Lord_Nigh you around?
[16:25] there is a way to get FLACs from jamendo now, one by one. it would probably be best if one person did this, so i am not posting the details directly here. contact me if you cant find it yourself. might be great to get them added to the existing album dump (and maybe update that?) to replace the mp3 and mp3->ogg transcodes with properly named files
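
[On the seek-then-wrap question above (08:55–09:02): a hedged sketch of reading a single gzip member at a known byte offset. The filename and offset are hypothetical and this is not code from warctools or warcat; it uses zlib.decompressobj rather than gzip.GzipFile because GzipFile's read() continues through the following members of a multi-member warc.gz instead of stopping at the member boundary.]

    # Sketch only, assuming you already know the member's byte offset
    # (e.g. from a CDX index).
    import zlib

    def read_member_at(path, offset, chunk_size=64 * 1024):
        """Decompress exactly one gzip member starting at a known offset."""
        decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # expect a gzip header
        parts = []
        with open(path, "rb") as f:
            f.seek(offset)                  # must land exactly on a member boundary
            while not decomp.eof:           # eof flips once this member's trailer is read
                data = f.read(chunk_size)
                if not data:
                    break
                parts.append(decomp.decompress(data))
        return b"".join(parts)

    # hypothetical usage: first record of a record-per-member warc.gz
    print(read_member_at("example.warc.gz", 0)[:64])
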
[16:27] Cowering: back
[16:54] Cowering: poke
[17:07] [Tue-11:06] 2sf dsf gbs gsf hoot nsf psf psf3 s98 spc usf wsr xbox 3do fmtowns gcn hes kss ncd pc psf2 psp smd ssf wii x360
[17:07] [Tue-11:06] none are on wayback machine but he does not care if you mirror since he last updated them in 2010
[17:07] [Tue-11:06] stick that as the subdomains for joshw.info and watch the fun
[17:07] [Tue-11:07] about 800GB
[17:07] [Tue-11:07] jeez
[17:07] [Tue-11:07] should post that in #archiveteam
[17:08] i'm BW limited.. so go to it guys :)
[17:08] sounds like a job for SketchCow
[17:08] some of the filenames look like toesuck.. do they track music?
[19:44] anyone know why the 301works urlteam content is marked as not publicly accessible on Internet Archive? https://archive.org/details/301works
[19:54] edsu: what item(s)?
[19:54] try the bit.ly ones
[19:55] https://ia801505.us.archive.org/4/items/Jan2010Part00000/
[19:55] jan2010-00000
[19:56] hum
[19:56] weird
[21:52] can someone op a bunch of us?
[21:53] have some snail hats
[21:53] thanks chronomex :P
[21:54] and before you go all "who the fuck is chazchaz", he's a friend of mine who seems to have a stable connection
[21:54] hey hey I trust ya bro
[21:54] it's not you I'm worried about
[21:54] * xmc points
[21:55] that guy.
[21:55] * BlueMax ducks
[21:55] it's that guy I'm scared of.
[22:44] So you should be xmc now stay on topic!
[22:44] >___>
[23:52] * GLaDOS slaps xmc down
[23:52] NO.
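
[For the joshw.info mirroring discussion above (17:07–17:08): a hedged sketch that turns the pasted subdomain list into wget mirror jobs. The URL scheme and site layout are my assumptions, not confirmed in the chat, and the wait flag is there because the host said he is bandwidth-limited.]

    # Sketch only: assumes each token is a subdomain of joshw.info served
    # over plain HTTP with browsable directory listings.
    import subprocess

    SUBDOMAINS = ("2sf dsf gbs gsf hoot nsf psf psf3 s98 spc usf wsr "
                  "xbox 3do fmtowns gcn hes kss ncd pc psf2 psp smd ssf wii x360").split()

    for sub in SUBDOMAINS:
        # no check=True: a failing subdomain just moves on to the next one
        subprocess.run(
            ["wget", "--mirror", "--no-parent", "--continue",
             "--wait=1",                  # go easy on a bandwidth-limited host
             f"http://{sub}.joshw.info/"],
        )
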