[00:01] Hmm
[00:01] *** tomwsmf_ has joined #archiveteam-bs
[00:03] The archive was completed in 2012, not recently
[00:26] Right. Unfortunately, something IRL is forcing me to leave. Thanks for the help.
[00:26] *** Matt_Lock has quit IRC (Quit: ChatZilla 0.9.92 [Firefox 48.0.2/20160823121617])
[00:31] *** Aranje has quit IRC (Remote host closed the connection)
[00:32] *** Aranje has joined #archiveteam-bs
[00:34] lol
[00:36] *** Aranje has quit IRC (Client Quit)
[00:49] *** Stiletto has joined #archiveteam-bs
[00:50] *** kristian_ has quit IRC (Quit: Leaving)
[00:51] *** JesseW has joined #archiveteam-bs
[00:53] *** balrog has quit IRC (Quit: Bye)
[00:55] *** balrog has joined #archiveteam-bs
[00:55] *** swebb sets mode: +o balrog
[01:03] if Matt_Lock comes back, someone relay that one way to get individual entries out of a megawarc is
[01:03] (1) scan the .cdx.idx for the URL they're looking for and note the byte offset and size
[01:04] (2) download that byte range out of the .cdx.gz
[01:04] (3) get the record's byte offset and size out of that piece of the .cdx.gz
[01:04] (4) use that in a ranged request on the original megawarc
[01:05] dashcloud: http://schools.hsd.k12.or.us/Portals/20/academics/LHS%20Senior%20Portfolio%202015-16.pdf
[01:07] worked example: https://gitlab.peach-bun.com/snippets/34
[01:08] I don't claim this isn't cumbersome, but it gets results
[01:09] there are probably better ways but this is the first one that I thought up
[01:19] *** Stiletto has quit IRC (Read error: Operation timed out)
[01:20] *** Stiletto has joined #archiveteam-bs
[01:45] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[01:50] *** kristian_ has joined #archiveteam-bs
[02:03] yipdw: while I think that's useful, I know what the follow-up question(s) is, and we're stuck once again
[02:22] *** kristian_ has quit IRC (Quit: Leaving)
[02:25] *** JesseW has joined #archiveteam-bs
[03:09] *** dashcloud has quit IRC (Read error: Operation timed out)
[03:15] *** dashcloud has joined #archiveteam-bs
[03:27] *** tomwsmf_ has quit IRC (Ping timeout: 255 seconds)
[03:53] *** GE has joined #archiveteam-bs
[04:39] *** GE has quit IRC (Remote host closed the connection)
[04:43] Is there some sort of trick to getting pages like this to load without almost crashing my browser? http://web.archive.org/web/*/http://hsd.k12.or.us/*
[04:43] Like, archive.is is even having issues with saving it.
[04:54] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[05:01] *** Sk1d has joined #archiveteam-bs
[05:06] Chromium 52 will load it fine here, but it's a giant amount of markup to load
[05:07] actually never mind, that tab just ate it
[05:07] I wonder if Servo can do it
[05:07] ate it?
[05:11] ok that's annoying, servo isn't even starting up for me now
[05:11] boo
[05:12] Is it just the URL?
[05:13] no, it's the 64,288-ish URLs that each have their own markup
[05:14] throwing that much HTML at any browser poses challenges
[05:14] at some point I suspect a wayback engineer will implement pagination on that page
[05:17] that page is also being streamed in and I wonder if that causes additional reflow work on each new chunk or something
[05:30] *** balrog has quit IRC (Read error: Operation timed out)
[05:34] *** balrog has joined #archiveteam-bs
[05:34] *** swebb sets mode: +o balrog
[06:03] weird. I thought the wayback had pagination in ye olden days
[06:03] for wildcard searches
[06:04] guess I was wrong, maybe there is a case for archiving wayback
[06:05] well, screenshots thereof, certainly
[06:05] maybe not actual pages
[07:01] *** Stilett0 has joined #archiveteam-bs
[07:02] *** JesseW has quit IRC (Read error: Operation timed out)
[07:02] *** fie_ has joined #archiveteam-bs
[07:03] *** Stiletto has quit IRC (Read error: Operation timed out)
[07:09] *** fie__ has quit IRC (Read error: Operation timed out)
[07:20] *** Stilett0 has quit IRC (Read error: Operation timed out)
[07:20] *** Stiletto has joined #archiveteam-bs
[07:34] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:37] *** dashcloud has joined #archiveteam-bs
[07:52] *** schbirid has joined #archiveteam-bs
[08:37] *** ravetcofx has quit IRC (Ping timeout: 370 seconds)
[08:42] *** GE has joined #archiveteam-bs
[09:51] *** GE has quit IRC (Quit: zzz)
[10:00] "In this paper, we illustrate for the first time how an adversary could leverage a maliciously controlled charging station to exfiltrate data from the smartphone via a USB charging cable (i.e., without using the data transfer functionality), controlling a simple app running on the device, and without requiring any permission to be granted by the user to send data out of the device. We show the feasibility of the proposed attack through a
[10:00] prototype implementation in Android, which is able to send out potentially sensitive information, such as IMEI, contacts' phone number, and pictures."
[10:00] https://arxiv.org/abs/1609.02750
[10:11] *** godane has quit IRC (Ping timeout: 260 seconds)
[10:23] *** VADemon has joined #archiveteam-bs
[10:25] *** godane has joined #archiveteam-bs
[10:30] joepie91: uh, if this requires that the user install an application, why doesn't it just exfiltrate the data over HTTP instead of the incredibly improbable charging vector?
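Going back to the four megawarc steps posted at 01:03-01:04, the lookup can be sketched roughly as below. This is only an illustration and not the linked worked example: the `.cdx.idx` row layout (first key, offset, size), the 11-field CDX line layout (original URL third, then compressed size, offset, filename last), and the helper names `pick_idx_block`, `find_record` and `range_header` are all assumptions made here. The download steps are simulated with an in-memory byte string so the sketch runs without network access.

```python
import bisect
import gzip


def range_header(offset, size):
    """HTTP Range header value covering `size` bytes starting at `offset`."""
    return f"bytes={offset}-{offset + size - 1}"


def pick_idx_block(idx_rows, key):
    """Step 1: idx_rows is a sorted list of (first_key, offset, size).

    Return (offset, size) of the .cdx.gz block that should contain `key`.
    The row layout is an assumption, not a documented format.
    """
    keys = [row[0] for row in idx_rows]
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        raise KeyError(key)
    return idx_rows[i][1], idx_rows[i][2]


def find_record(cdx_block, url):
    """Step 3: cdx_block is one gzip member cut out of the .cdx.gz.

    Return (offset, size) of the WARC record for `url`.
    """
    for line in gzip.decompress(cdx_block).decode("utf-8").splitlines():
        fields = line.split(" ")
        # Assumed layout: field 3 is the original URL; fields 9 and 10
        # are the compressed record size and its offset in the megawarc.
        if len(fields) == 11 and fields[2] == url:
            return int(fields[9]), int(fields[8])
    raise KeyError(url)


# Tiny in-memory stand-in for a .cdx.gz made of two gzip members.
block_a = gzip.compress(
    b"com,example)/a 20120101000000 http://example.com/a "
    b"text/html 200 AAAA - - 512 100 mega.warc.gz\n"
)
block_b = gzip.compress(
    b"com,example)/b 20120101000000 http://example.com/b "
    b"text/html 200 BBBB - - 2048 9000 mega.warc.gz\n"
)
cdx_gz = block_a + block_b
idx_rows = [
    ("com,example)/a", 0, len(block_a)),
    ("com,example)/b", len(block_a), len(block_b)),
]

off, size = pick_idx_block(idx_rows, "com,example)/b")          # step 1
block = cdx_gz[off:off + size]                                  # step 2 (a ranged GET in practice)
rec_off, rec_size = find_record(block, "http://example.com/b")  # step 3
print(range_header(rec_off, rec_size))                          # step 4: prints bytes=9000-11047
```

In real use, step 2 and step 4 would each be an HTTP request carrying the `Range` header built by `range_header`, and the bytes returned by step 4 would be a single gzipped WARC record that can be decompressed on its own.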
[10:32] this is the kind of incredibly lazy research people do when they can't come up with any good attacks
[11:09] *** zenguy_pc has quit IRC (Excess Flood)
[11:11] *** zenguy_pc has joined #archiveteam-bs
[11:17] *** Honno has joined #archiveteam-bs
[11:46] *** Honno_ has joined #archiveteam-bs
[11:51] *** GE has joined #archiveteam-bs
[11:55] *** Honno has quit IRC (Read error: Operation timed out)
[12:26] *** kristian_ has joined #archiveteam-bs
[12:55] so GoogleScraper in selenium mode is broken
[12:56] or.. not always
[12:58] 'num_results_for_query': 'Přibližný počet výsledků: 363 (0,25 s)\xa0',
[12:58] I guess it can't parse this
[12:58] so it only scrapes the front page
[12:59] hmmmm.
[12:59] https://archive.org/details/hackercons?and[]=defcon
[12:59] is there a collection I'm not aware of
[13:00] or is IA's DEFCON content just very incomplete?
[13:01] oh hm, looks like --num-pages-for-keyword is a thing
[13:02] it's just very crashy...
[13:24] *** useretail has quit IRC (Ping timeout: 244 seconds)
[13:39] *** tomwsmf_ has joined #archiveteam-bs
[13:50] i love reddit and helping people, but holy crap some people need to stop being idiots
[13:50] i don't mind you wanting scripting help because you don't know how to script, but when you don't even explain what language or OS, wtf do you want me to do to help? lol
[14:14] *** fie_ has quit IRC (Quit: Leaving)
[14:15] *** Start has quit IRC (Quit: Disconnected.)
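On the `num_results_for_query` string pasted at 12:58: whether that localized text is really what trips GoogleScraper is the speaker's guess, and the snippet below is not GoogleScraper's code. It is a minimal sketch of how a result count could be pulled out of such a string regardless of language, with `parse_result_count` being a hypothetical helper name.

```python
import re


def parse_result_count(text):
    """Return the first integer in a localized 'about N results' string, or None."""
    # Drop the trailing "(0,25 s)" timing so its digits are not picked up.
    head = text.split("(", 1)[0]
    # Accept digits grouped by comma, dot, space, or non-breaking spaces.
    match = re.search(r"\d[\d.,\u00a0\u202f ]*", head)
    if match is None:
        return None
    # Keep only the digits and convert them to an integer.
    return int(re.sub(r"\D", "", match.group(0)))


print(parse_result_count("Přibližný počet výsledků: 363 (0,25 s)\xa0"))  # prints 363
```

The approach deliberately ignores the surrounding words, so it does not need a per-language pattern; the trade-off is that it trusts the first number before the parenthesis to be the count.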
[14:28] *** purplebot has quit IRC (Ping timeout: 244 seconds)
[14:32] *** useretail has joined #archiveteam-bs
[14:34] *** purplebot has joined #archiveteam-bs
[15:03] *** rduser has quit IRC (Ping timeout: 260 seconds)
[15:19] *** dashcloud has quit IRC (Read error: Operation timed out)
[15:22] *** dashcloud has joined #archiveteam-bs
[15:44] *** useretail has quit IRC (Ping timeout: 244 seconds)
[15:48] *** ravetcofx has joined #archiveteam-bs
[15:57] *** JesseW has joined #archiveteam-bs
[15:57] *** BlueMaxim has quit IRC (Quit: Leaving)
[15:58] *** rduser has joined #archiveteam-bs
[16:10] *** metalcamp has joined #archiveteam-bs
[16:23] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:35] *** dashcloud has quit IRC (Read error: Operation timed out)
[16:37] *** dashcloud has joined #archiveteam-bs
[16:44] *** JW_work has quit IRC (Quit: Leaving.)
[16:46] *** JW_work has joined #archiveteam-bs
[16:50] *** JW_work has quit IRC (Client Quit)
[16:51] *** JW_work has joined #archiveteam-bs
[16:53] *** JW_work has quit IRC (Client Quit)
[16:54] *** JW_work has joined #archiveteam-bs
[16:56] *** JW_work has quit IRC (Client Quit)
[16:57] *** JW_work has joined #archiveteam-bs
[17:23] *** useretail has joined #archiveteam-bs
[17:46] *** Aranje has joined #archiveteam-bs
[17:58] *** RedType has quit IRC (Read error: Operation timed out)
[18:03] *** RedType has joined #archiveteam-bs
[18:54] https://twitter.com/joepie91/status/775769220063846400 (cc SketchCow )
[19:03] this looks gross
[19:05] that's because it is :)
[19:05] *** VADemon has quit IRC (left4dead)
[19:05] this is like, the perfect example of the "corporations have taken over the internet" shit that I'm constantly complaining about
[19:15] *** bsmith093 has joined #archiveteam-bs
[19:17] so i'm at 858k items now
[19:22] *** BartoCH has quit IRC (Ping timeout: 260 seconds)
[19:27] *** BartoCH has joined #archiveteam-bs
[19:33] *** zino has quit IRC (Quit: Leaving)
[19:34] *** zino has joined #archiveteam-bs
[20:05] *** kristian_ has quit IRC (Quit: Leaving)
[20:06] *** schbirid has quit IRC (Quit: Leaving)
[20:25] :)
[20:41] so looks like the blaze tv is going to be a problem now
[20:42] after 9/1 they got rid of the akamai flash stream ondemand
[20:42] stream that has it : http://web.gbtv.com/gen/multimedia/detail/6/8/3/1077120683.xml
[20:43] stream that doesn't : http://web.gbtv.com/gen/multimedia/detail/7/8/3/1118796783.xml
[20:43] whats even more funny is i can't get the stream at all to play in firefox
[20:43] *** JW_work has quit IRC (Quit: Leaving.)
[20:44] 08:59 <@joepie91> hmmmm.
[20:44] 08:59 <@joepie91> https://archive.org/details/hackercons?and[]=defcon
[20:45] 08:59 <@joepie91> is there a collection I'm not aware of
[20:45] Dude.
[20:45] Let me introduce you to the staff of that project
[20:45] So, here's me
[20:45] ...
[20:45] Wait
[20:45] Wait
[20:45] huh
[20:45] Appears to be one guy
[20:45] One busy guy
[20:45] Someone should get on that
[20:45] I'll ask the staff
[20:45] Wait, that's also me
[20:46] Hmmm, let me look on FOS
[20:46] root@teamarchive0:/0/SCREENDEAT# du -sh /0/HACKERCONS/
[20:46] 336G /0/HACKERCONS/
[20:46] Huh
[20:46] That appears to be a 336gb directory of files
[20:46] That need to go on IA
[20:46] As if
[20:46] AS IF
[20:46] It's a task
[20:46] In a task list
[20:46] To be done by a staff
[20:46] Of one guy
[20:46] Because as you know
[20:46] AS YOU KNOW
[20:46] *** metalcamp has quit IRC (Ping timeout: 506 seconds)
[20:46] Of ALL THE GROUPS IN THE WOULD
[20:46] THAT IS, ALL THE SINGULAR GROUPS IN THE WORLD
[20:46] Hacker cons
[20:47] White guys on stage
[20:47] talking about white guy stuff
[20:47] Is the single most at-risk content in the world
[20:47] Because of the total lack of members gazing up at their own asses to kiss
[20:48] To listen to someone with Asperger's and hygiene difficulties
[20:48] explain about stack overflows for 64 minutes out of an allotted 45
[20:48] forever
[20:48] PS It's my birthday
[20:48] *** JW_work has joined #archiveteam-bs
[20:48] could you transcribe the conference talks while you upload them, also? :)
[20:48] On it
[20:49] * SketchCow adds it to the task list
[20:49] Happy Birthday!
[20:49] * Celebrate Birthday
[20:49] * Do Everything Else In the World
[20:49] * Strangle Joepie in a hotel bathtub and make it look like the illuminati did it
[20:49] * Upload hacker talks
[20:50] * Transcribe hacker talks
[20:50] Anything else need to be added
[20:50] * Work for Yahoo
[20:51] I was actually working on transcribing hacker talks some time ago
[20:51] we were trying to use amazon turk
[20:52] I wanted to do something like that myself, but there's never enough hours in the day..
[20:52] The text you wanted to save was blocked by the spam filter. This is probably caused by a link to a blacklisted external site.
[20:52] The following text is what triggered our spam filter: .ru
[20:52] fukkkkk
[20:53] the results were like https://drive.google.com/open?id=0B8vMVOeBjgixUFJDMFhkc2QxaUU
[20:53] Nice!
[20:53] how much did you offer for that?
[20:53] SketchCow: Working undercover for Yahoo while silently making copies of their hard drives actually sounds like a good idea
[20:54] Transcripts like that would make great subtitles/captions too.
[20:54] I can't link my homepage because it's on a .rustedlogic.net subdomain, hooray
[20:54] defcon does that essentially
[20:55] Happy Birthday SketchCow!
[20:55] Sanqui - link to a url shortener maybe? [usually bad form, but if needs must...]
[20:55] For the record
[20:56] The Internet Archive has secret access to a transcriber
[20:56] antomatic, you {swearword}
[20:56] Which could then be cleaned up
[20:56] oooo
[20:56] 1. They must be requested
[20:56] 2. They need a .wav to work from
[20:56] '`'. happy
[20:56] .'·` birthday
[20:56] (sorry if that was particularly obnoxious)
[20:57] I was going to suggest a while back that IA could try some auto-transcription of video and then shove the results through your keyword creator script. (For some reason I was thinking particularly about Computer Beach Party and what kind of keywords it might devolve from the text)
[20:57] Happy Birthday SketchCow :-D
[20:57] antomatic: The following text is what triggered our spam filter: tinyurl.com/j3mwk92
[20:57] jesus christ
[20:58] yikes
[20:59] does anybody have access to the ArchiveTeam wiki server? this filter should be removed because we vet all accounts anyway. I believe it's somewhere in the LocalConfig.php file
[20:59] so now it's a battle to find a url shortener that is not in that site's spam filter. :)
[20:59] LocalSettings.php
[21:00] *** kristian_ has joined #archiveteam-bs
[21:02] change the wiki article to "Nifty!"
[21:02] then purplebot can look really excited
[21:03] but it's Nifty not Yahoo!
[21:04] let me introduce you to my new startup, excla.mr
[21:04] we add exclamation marks to the world's domains
[21:19] *** Stiletto has quit IRC (Read error: Connection reset by peer)
[21:24] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:25] YIPDW
[21:25] STOP
[21:25] has yipdw gone too far? signs point to yous
[21:25] *yes
[21:28] *** dashcloud has joined #archiveteam-bs
[21:30] Our Fabulous Journey!
[21:51] fabjourn.er
[21:56] happy birthday SketchCow!
[22:00] *** fie has joined #archiveteam-bs
[22:09] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:11] *** fie has quit IRC (Quit: Leaving)
[22:13] *** sep332 has joined #archiveteam-bs
[22:13] *** dashcloud has joined #archiveteam-bs
[22:36] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[22:50] *** JW_work has quit IRC (Quit: Leaving.)
[22:55] *** zenguy_pc has joined #archiveteam-bs
[23:04] *** zenguy_pc has quit IRC (Read error: Operation timed out)
[23:15] *** GE has quit IRC (Quit: zzz)
[23:41] microsofti.er?
[23:47] *** Honno_ has quit IRC (Read error: Operation timed out)