#archiveteam 2013-11-12,Tue

Time Nickname Message
00:26 SketchCow Upload stuff and tell me if you want to see it added.
00:57 odie5533 I take it the external drives were cheaper to buy than OEM/internal ones? https://en.wikipedia.org/wiki/File:Incoming_additional_storage_at_Internet_Archive.jpg
00:58 odie5533 Nemo_bis: using x-archive-meta-mediatype ?
01:12 joepie91 odie5533: yes
01:12 joepie91 that was during the thailand floods
01:12 joepie91 externals were cheaper than internals generally
01:25 ivan` plastic case and USB3 controller = packing material ;)
01:47 odie5533 How do I get a warc upload added to the waybackmachine?
02:39 ivan` odie5533: SketchCow or another IAer has to give it the 'web' mediatype
02:50 odie5533 ivan`: I already set it to web
02:52 Cowering lol.. some poor lackey had to bust all those externals open. If they were USB3 you could get a few bucks each out of them.. trade 20 for one more HD
03:53 Coderjoe Nemo_bis: ( is it included twice?) yes, i'm almost certain it is. tar adds files in the order listed. I've tar'd up source code into smaller files by reversing strings, sorting, and reversing again, to get the mostly-similar files as close as possible. (pre 7z/lzma era)
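Coderjoe's ordering trick can be sketched in Python as below. Sorting paths by their reversed spelling puts files that share an ending (same extension, or the same filename in different directories) next to each other, so a solid compressor sees similar content close together. The function name `suffix_sorted` is hypothetical; feeding its output to tar as a file list (for example with GNU tar's `-T -`) is an assumption about how it would be used.

```python
def suffix_sorted(paths):
    """Order paths by their reversed spelling: reverse, sort, reverse back.

    Equivalent to sorted(paths, key=lambda p: p[::-1]); files with the same
    suffix end up adjacent, which helps solid compression.
    """
    reversed_paths = sorted(p[::-1] for p in paths)
    return [r[::-1] for r in reversed_paths]
```

For example, `suffix_sorted(["a/x.c", "b/y.h", "c/z.c"])` groups the two `.c` files ahead of the `.h` file.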
03:56 Coderjoe is there some manifest of google video content i can check against to see if the files on this server are stored elsewhere?
06:42 Lord_Nigh http://ask.metafilter.com/251635/With-every-good-wish-cordially
06:43 Lord_Nigh no idea what should happen there. maybe SketchCow could tell the op what to do, but those tapes should either go to ann druyan (carl sagan's widow) or the library of congress (where the rest of carl sagan's papers reside) most likely
06:43 Lord_Nigh (in my opinion)
07:58 Nemo_bis odie5533: yes
07:58 odie5533 Nemo_bis: thanks. I was able to set it.
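A minimal sketch of the S3-style upload odie5533 and Nemo_bis are talking about, assuming the Internet Archive's S3-compatible endpoint at `s3.us.archive.org` and its `LOW access:secret` authorization scheme. Only the `x-archive-meta-mediatype` header comes from the conversation; the endpoint, `x-amz-auto-make-bucket`, and the helper name `build_ia_upload` are assumptions. The function only builds the request and sends nothing. As ivan` says, a `web` mediatype may still need an IA admin before the WARC shows up in the Wayback Machine.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed endpoint of the Internet Archive's S3-like API.
IA_S3_ENDPOINT = "https://s3.us.archive.org"


def build_ia_upload(identifier, filename, data, access_key, secret_key,
                    mediatype="web"):
    """Build (but do not send) a PUT request uploading one file to an item."""
    url = f"{IA_S3_ENDPOINT}/{quote(identifier)}/{quote(filename)}"
    headers = {
        # Assumed IA auth scheme: "LOW <access key>:<secret key>".
        "authorization": f"LOW {access_key}:{secret_key}",
        # Assumed header: create the item if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
        # Item metadata travels as x-archive-meta-* headers.
        "x-archive-meta-mediatype": mediatype,
    }
    return Request(url, data=data, headers=headers, method="PUT")
```

Sending it would be a matter of `urllib.request.urlopen(req)` with real keys.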
07:58 Nemo_bis Coderjoe: sure, it follows the same order, but does it add *the same file* twice if listed twice in the same command?
07:58 Nemo_bis :)
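Nemo_bis's question can at least be checked for Python's `tarfile` module (not GNU tar, whose behavior may differ): adding the same path twice to one archive, as in this sketch, stores two members with the same name. The helper name and the temporary file are illustrative only.

```python
import io
import os
import tarfile
import tempfile


def members_when_added_twice():
    """Add one small file to an in-memory tar twice and list the members."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "same.txt")
        with open(path, "w") as f:
            f.write("hello\n")
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            for _ in range(2):
                tar.add(path, arcname="same.txt")
        buf.seek(0)
        with tarfile.open(fileobj=buf, mode="r") as tar:
            return [m.name for m in tar.getmembers()]
```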
07:58 odie5533 Nemo_bis: after using the s3 interface, the normal interface to add files is broken.
08:01 odie5533 Nemo_bis: it only shows the original files, and then when I try to upload another file using the web interface, it deletes the files I'd uploaded via the s3 interface.
08:02 odie5533 chfoo: hey. Does warcat have its own code for reading in WARC files, similar to hanzo warctools?
08:04 Nemo_bis odie5533: never saw such a thing
08:04 Nemo_bis remember not to do two things at once, or one upload/derive process may overwrite/delete the files of the other
08:06 chfoo odie5533: it's an independent implementation of the warc file format. the inspiration for writing it was to have more diversity so i purposely didn't look at the code in hanzo warctools.
08:08 SketchCow That Sagan thing is stupid and I won't set foot in it.
08:12 RedType Kurisu: Yes, we can safely say that we have a very good system to detect penis drawings. Hopefully player's Miiverse experiences will be a fun one!
08:12 RedType According to a survey of Wii U users, around 20% of drawings encountered through the Miiverse include one or more penises.
08:13 RedType today i learned: the new chromebook update interprets right click as paste in crosh!
08:13 RedType Kurisu: Yes, we can safely say that we have a very good system to detect penis drawings. Hopefully player's Miiverse experiences will be a fun one!
08:13 RedType According to a survey of Wii U users, around 20% of drawings encountered through the Miiverse include one or more penises.
08:14 RedType you know, im going to just stop trying to open that url
08:14 Lord_Nigh http://www.p4rgaming.com/iwata-asks-miiverse-penis-drawing-detection-took-weeks-to-develop/ <- not sure if serious or not
08:15 RedType it's not real fyi
08:16 RedType i guess that's what i get for being on the alpha channel
08:24 odie5533 chfoo: Would you recommend its usage over warctools? Do you know how it compares to warctools?
08:29 chfoo odie5533: it's up to you. it's not meant to replace warctools but let you choose an alternative. i don't know how it compares, someone said they were going to do a comparison report but that person hasn't come back.
08:30 odie5533 chfoo: Does warcat store an entire record in memory was it reads?
08:30 odie5533 *as it reads
08:34 chfoo odie5533: no, it buffers parts of the files to disk so it's a bit disk io heavy
08:34 odie5533 huh, what parts does it buffer, and why?
08:37 chfoo odie5533: because python doesn't offer a low level api to gzip to seek gzip boundaries, i have to seek the entire warc.gz file. so what it does is keep 500MB decompressed chunks before and after the current location in the file so seeking is faster.
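chfoo's idea can be approximated by decompressing the stream once into a disk-backed temporary file, so any later read, including a backwards seek, becomes a plain file seek. This is a simplified sketch, not warcat's code: warcat keeps roughly 500MB windows around the current position, while this hypothetical `DiskBufferedGzip` class keeps everything decompressed so far. `gzip.open` handles the multi-member files that `.warc.gz` usually is.

```python
import gzip
import tempfile


class DiskBufferedGzip:
    """Random access into a .gz file via a disk-backed cache of decompressed bytes."""

    def __init__(self, path, chunk_size=1 << 20):
        self._stream = gzip.open(path, "rb")   # reads across gzip members
        self._cache = tempfile.TemporaryFile()  # decompressed bytes live on disk
        self._cached = 0                        # bytes decompressed so far
        self._eof = False
        self._chunk_size = chunk_size

    def _fill_to(self, offset):
        """Decompress forward until at least `offset` bytes are cached."""
        self._cache.seek(0, 2)  # append at the end of the cache
        while self._cached < offset and not self._eof:
            data = self._stream.read(self._chunk_size)
            if not data:
                self._eof = True
                break
            self._cache.write(data)
            self._cached += len(data)

    def read_at(self, offset, size):
        """Read `size` decompressed bytes at `offset`; backwards seeks are cheap."""
        self._fill_to(offset + size)
        self._cache.seek(offset)
        return self._cache.read(size)

    def close(self):
        self._stream.close()
        self._cache.close()
```

The trade-off is the one chfoo mentions: less memory, more disk I/O.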
08:44 chfoo oh, i also noticed both warctools and mine cannot handle gzip compressed chunked transfer encoding
08:44 odie5533 warctools seems to use a GzipRecordFile class for seeking, though I'm not sure
08:55 odie5533 chfoo: The warctools code appears to call file_handle.seek(offset) and then open the file_handle as a gzip. Does this method work?
09:02 chfoo odie5533: as long as it doesn't seek backwards, or all around, it should be ok. i need to use peek() for gzip but it was only recently implemented in python 3.2. so i ended up writing buffering so i could seek anywhere i want.
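The approach odie5533 describes (seek to an offset, then decompress from there) works whenever the offset is the start of a gzip member, which is the usual layout for `.warc.gz` files with one record per member. A sketch with the hypothetical helper `read_member_at`, using `zlib` so decompression stops at the end of that single member:

```python
import zlib


def read_member_at(path, offset, chunk_size=64 * 1024):
    """Decompress exactly one gzip member starting at byte `offset`."""
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16+ means: expect a gzip header
    out = []
    with open(path, "rb") as f:
        f.seek(offset)
        while not decomp.eof:  # eof turns True at the end of this member
            chunk = f.read(chunk_size)
            if not chunk:
                break
            out.append(decomp.decompress(chunk))  # bytes past the member go to unused_data
    return b"".join(out)
```

Jumping to a known member boundary works in either direction here; what this cannot do cheaply is land on an arbitrary byte inside a member, which is the case chfoo's buffering covers.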
09:08 yipdw chfoo: there's a utility in warctools that decompresses and dechunks gzip-compressed chunked data
09:09 yipdw warc2warc -D
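For the gzip-compressed chunked bodies chfoo mentions, decoding order matters: undo the chunked transfer encoding first, then the gzip content encoding. A minimal sketch of that with hypothetical helpers `dechunk` and `decode_http_body`; this is not how `warc2warc -D` is implemented, and chunk trailers are ignored.

```python
import gzip


def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked transfer-encoded body (trailers ignored)."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size_field = body[pos:eol].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:  # last-chunk marker
            break
        out += body[pos:pos + size]
        pos += size + 2  # skip chunk data and its trailing CRLF
    return bytes(out)


def decode_http_body(body: bytes, chunked: bool, content_encoding: str) -> bytes:
    """Remove transfer encoding first, then content encoding."""
    if chunked:
        body = dechunk(body)
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    return body
```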
09:29 chfoo yipdw: yeah, i know. warctools has a lot more features. i was just explaining why i went for a pseudo clean room design approach.
09:44 Sum1 This is going to sound like a simple question, but how would I search a site through a web archive? For example, I see userscripts.org is being backed up, but how would one search it after it's submitted to IA?
09:46 odie5533 There's no good way, but you can search the urls
09:46 odie5533 by adding /* to the end of the url in the archive view thing
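odie5533's tip as a sketch: appending `/*` to a site in the Wayback Machine lists the captured URLs under it. The same prefix listing can be requested as data from the Wayback CDX server; that endpoint and its `matchType`/`output`/`limit` parameters are an assumption added here, not something mentioned in the channel. Either way this searches URLs, not page text.

```python
from urllib.parse import urlencode


def wayback_prefix_url(site):
    """Wayback Machine listing of every captured URL under `site`."""
    return f"https://web.archive.org/web/*/{site}/*"


def cdx_prefix_query(site, limit=100):
    """Assumed CDX server query returning captures under `site` as JSON rows."""
    params = {"url": site, "matchType": "prefix", "output": "json", "limit": limit}
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)
```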
10:10 Sum1 Hmm, so if the title isn't in the URL there's no advanced search option to search text?
10:11 Sum1 That's the problem with not having any index of site content, it requires knowledge of the specific address to be useful in the future.
10:24 odie5533 Sum1: There is I believe the archiveit archive, which indexes it
10:24 odie5533 but most of the stuff is not indexed as far as I know.
10:25 odie5533 Sum1: https://archive-it.org/
10:27 Sum1 Nice. Seems to be used only for colleges, etc though.
15:02 Cowering Lord_Nigh you around?
16:25 Schbirid there is a way to get FLACs from jamendo now, one by one. it would probably be best if one person did this, so i am not posting the details directly here. contact me if you cant find it yourself. might be great to get them added to the existing album dump (and maybe update that?) to replace the mp3 and mp3->ogg transcodes with properly named files
16:27 Lord_Nigh Cowering: back
16:54 Lord_Nigh Cowering: poke
17:07 Cowering [Tue-11:06] <Cowering> 2sf dsf gbs gsf hoot nsf psf psf3 s98 spc usf wsr xbox 3do fmtowns gcn hes kss ncd pc psf2 psp smd ssf wii x360
17:07 Cowering [Tue-11:06] <Cowering> none are on wayback machine but he does not care if you mirror since he last updated them in 2010
17:07 Cowering [Tue-11:06] <Cowering> stick that as the subdomains for joshw.info and watch the fun
17:07 Cowering [Tue-11:07] <Cowering> about 800GB
17:07 Cowering [Tue-11:07] <Lord_Nigh> jeez
17:07 Cowering [Tue-11:07] <Lord_Nigh> should post that in #archiveteam
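Cowering's list turned into seed URLs the way he suggests, one `joshw.info` subdomain per format. The `http://` scheme and the trailing slash are assumptions and the helper name is hypothetical; the format list itself is copied from the log.

```python
# Format names as listed by Cowering, each assumed to be a joshw.info subdomain.
FORMATS = ("2sf dsf gbs gsf hoot nsf psf psf3 s98 spc usf wsr xbox 3do fmtowns "
           "gcn hes kss ncd pc psf2 psp smd ssf wii x360").split()


def joshw_seed_urls(formats=FORMATS):
    """One root URL per format subdomain, usable as a crawler seed list."""
    return [f"http://{name}.joshw.info/" for name in formats]
```

Writing `"\n".join(joshw_seed_urls())` to a file gives a seed list; at about 800GB total, bandwidth rather than URL discovery is the limiting factor.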
17:08 Cowering i'm BW limited.. so go to it guys :)
17:08 DFJustin sounds like a job for SketchCow
17:08 Cowering some of the filenames look like toesuck.. do they track music?
19:44 edsu anyone know why the 301works urlteam content is marked as not publicly accessible on Internet Archive? https://archive.org/details/301works
19:54 Nemo_bis edsu: what item(s)?
19:54 edsu try the bit.ly ones
19:55 edsu https://ia801505.us.archive.org/4/items/Jan2010Part00000/
19:55 edsu jan2010-00000
19:56 xmc hum
19:56 xmc weird
21:52 balrog can someone op a bunch of us?
21:53 xmc have some snail hats
21:53 BlueMax thanks chronomex :P
21:54 xmc and before you go all "who the fuck is chazchaz", he's a friend of mine who seems to have a stable connection
21:54 BlueMax hey hey I trust ya bro
21:54 xmc it's not you I'm worried about
21:54 * xmc points
21:55 xmc that guy.
21:55 * BlueMax ducks
21:55 xmc it's that guy I'm scared of.
22:44 SmileyG So you should be xmc now stay on topic!
22:44 BlueMax >___>
23:52 * GLaDOS slaps xmc down
23:52 GLaDOS NO.
23:52 πŸ”— GLaDOS NO.
