[00:23] *** JesseW has joined #archiveteam-bs
[00:47] *** ItsYoda has quit IRC (Ping timeout: 261 seconds)
[00:53] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[00:57] *** ItsYoda has joined #archiveteam-bs
[01:10] *** Coderjoe has quit IRC (Read error: Operation timed out)
[01:11] *** Coderjoe has joined #archiveteam-bs
[01:15] *** JW_work1 has joined #archiveteam-bs
[01:21] *** JW_work has quit IRC (Read error: Operation timed out)
[01:44] (I know I'm being lazy here) does anyone know the flags for youtube-dl to dump a youtube channel's videos to a file?
[01:51] i've been using: --write-description --write-info-json --write-annotations --write-thumbnail --write-sub --write-auto-sub --all-subs --recode-video mp4 -f best
[01:52] which is just something i cobbled together and stuffed into a random note to myself, i'm sure i'm missing something :)
[01:52] oops I mean video URL list to a textfile
[01:53] oh, sorry
[01:56] i think i've done that before just passing the user's main url? i didn't leave a note..
[03:42] rip kickass torrents
[04:04] http://kickasstorrentsan.com/
[04:09] *** RichardG has quit IRC (Quit: Keyboard not found, press F1 to continue)
[04:11] *** RichardG has joined #archiveteam-bs
[04:47] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:48] *** JesseW has joined #archiveteam-bs
[04:53] https://www.githubarchive.org/ <- should probably be mentioned on the wiki (maybe in the github article)
[04:54] *** Sk1d has joined #archiveteam-bs
[04:55] *** kristian_ has joined #archiveteam-bs
[04:56] https://speakerdeck.com/filosottile/the-code-archive-hope-xi about http://codearchive.org
[04:58] githubarchive.org is already mentioned
[06:24] *** kristian_ has quit IRC (Leaving)
[06:35] Anyone know how to get this to work? https://github.com/bibanon/webcache-scraper
[06:41] *** tomwsmf has quit IRC (Read error: Operation timed out)
[06:56] dxrt: I don't think there's any way to get the URLs exactly, but you can use --get-id to get a list of the video IDs and use that to build the URLs
[06:56] *** DoomTay has quit IRC (Quit: Page closed)
[07:10] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[07:21] *** davidar_ has quit IRC (Quit: Connection closed for inactivity)
[07:45] *** metalcamp has joined #archiveteam-bs
[07:59] *** r3c0d3x has quit IRC (Quit: Leaving)
[08:01] *** schbirid has joined #archiveteam-bs
[09:05] *** nightpool has joined #archiveteam-bs
[09:11] *** nightpool has quit IRC (Read error: Operation timed out)
[09:29] MrRadar: Thanks!
[10:23] *** vitzli has joined #archiveteam-bs
[10:31] *** metal_cam has joined #archiveteam-bs
[10:32] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[11:28] *** pguth_ has joined #archiveteam-bs
[11:30] *** pguth_ has left
[11:34] *** davidar_ has joined #archiveteam-bs
[12:08] *** nightpool has joined #archiveteam-bs
[12:22] *** nightpool has quit IRC (Read error: Operation timed out)
[12:51] *** nightpool has joined #archiveteam-bs
[12:56] xmc: did you ever get started on the tumblr code?
[13:00] Hi
[13:00] What is up with tumblr?
[13:00] I thought that was going to stay?
[13:04] *** nightpool has quit IRC (Ping timeout: 250 seconds)
[13:05] *** nightpool has joined #archiveteam-bs
[13:06] hm not sure why I dc'd
[13:08] arkiver: it's a huge loss center for Yahoo and i can't imagine the new owner is really going to keep it around for long. it's very hard to monetize, and even if they do manage it, it'll probably become so unusable people will just leave (making it even more of a poor investment for whoever buys Yahoo)
[13:08] wait
[13:08] I just read yahoo is being bought by verizon??
[13:09] that might be newer, I've been pretty busy over the last day or so
[13:09] last I heard it was just "looking likely"
[13:11] Yeah looks like nothing is final yet with the verizon deal, it's all just "sources say" at this point
[13:12] Anyway I can't imagine Verizon is going to have much love for tumblr either. It's built on some pretty old tech and therefore costs a *ton* of money to run
[13:20] *** nightpool has quit IRC (Quit: what the water wants is hurricanes)
[13:31] *** nightpool has joined #archiveteam-bs
[13:59] *** BlueMaxim has quit IRC (Quit: Leaving)
[14:00] *** metal_cam is now known as metalcamp
[14:20] *** r3c0d3x has joined #archiveteam-bs
[15:14] *** fie_ has joined #archiveteam-bs
[15:14] *** fie has quit IRC (Read error: Connection reset by peer)
[15:36] *** JesseW has joined #archiveteam-bs
[16:15] arkiver: yes, I was shocked
[16:26] *** DoomTay has joined #archiveteam-bs
[16:29] Can Warrior code parse HTML?
[16:30] well like
[16:30] what do you mean
[16:31] I'm going to go with a blanket 'yes' for now, because there's no reason you can't
[16:31] Say, if you want to see if a page has a DIV with the id "bob"
[16:32] Or look at the contents thereof
[16:33] yeah, no reason why not really
[16:34] warrior projects are literally just python scripts packaged up a bit, so the warrior will look for a script and run it
[16:34] examples are at https://github.com/archiveteam - any repo ending in "-grab" is your best bet
[16:37] *** brayden has quit IRC (Quit: Leaving)
[16:37] *** brayden has joined #archiveteam-bs
[16:37] *** swebb sets mode: +o brayden
[16:45] *** brayden has quit IRC (Quit: Leaving)
[16:50] *** brayden has joined #archiveteam-bs
[16:50] *** swebb sets mode: +o brayden
[17:10] *** fie__ has joined #archiveteam-bs
[17:10] *** fie_ has quit IRC (Read error: Connection reset by peer)
[17:14] DoomTay: what do you want to do exactly?
[17:14] and why do you want to do it with the warrior
[17:16] Maybe after the current batch is done, we could look at artist galleries like http://portalgraphics.net/pg/illust/individual/?user_id=3425. DOM parsing might be the best way to "find" the total number of pages
[17:17] you to get pages like
[17:17] http://portalgraphics.net/pg/illust/individual.php?user_id=3425&page=2&key=&method=
[17:17] http://portalgraphics.net/pg/illust/individual.php?user_id=3425&page=3&key=&method=
[17:17] etc.?
[17:18] Da
[17:18] Is that a yes?
[17:18] Yes
[17:18] ok
[17:19] I don't think we have to find the total number of pages
[17:19] links to other pages are just in the html, so we can just extract that
[17:19] I hadn't seen Microsoft's "10 Immutable Laws of Computer Security" before: https://technet.microsoft.com/en-us/library/hh278941.aspx -- quite amusing (if also serious).
[17:19] Oh
[17:20] I'll get profile downloading in in a few days
[17:20] Cool
[17:20] It will be ready before this batch is finished
[17:20] JesseW: Any idea when that was written?
[17:21] DoomTay: right around the year 2000 I think
[17:21] Ha, oh wow
[17:21] And then they pulled that thing with the Windows 10 upgrade
[17:27] *** vitzli has quit IRC (Quit: Leaving)
[17:30] sometimes i want to punch people through the internet https://bbs.archlinux.org/viewtopic.php?id=203232
[17:31] wut
[17:34] "try reinstalling windows" would be a similarly useful suggestion
[17:38] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[17:39] We should save that for posterity
[18:04] schbirid: I recognize that issue
[18:04] had it as well
[18:04] and unfortunately I think there really *is* no fix
[18:04] other than to update
[18:05] * joepie91_ spent way too many hours trying to track this down
[18:09] yeah
[18:10] it is incredibly annoying
[18:10] you can disable sounds in the advanced options
[18:10] but now i am waiting 45 minutes until that current download finishes so i can restart =(
[18:13] *** Ravenloft has joined #archiveteam-bs
[18:27] *** fie_ has joined #archiveteam-bs
[18:27] *** fie__ has quit IRC (Read error: Connection reset by peer)
[18:40] *** tomwsmf has joined #archiveteam-bs
[18:46] joepie91_: disabling captcha sound seems to have fixed it here
[18:58] 2011 episodes of The Dan Patrick Show are uploaded
[19:25] nightpool: not yet, no
[19:26] maybe today, idk, i'm feeling antsy/flighty so i might just disappear for the afternoon
[19:32] *** metal_cam has joined #archiveteam-bs
[19:33] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[19:35] https://s-media-cache-ak0.pinimg.com/564x/bf/26/0f/bf260f47e2e7d91792de3af1f3219b91.jpg
[20:02] https://www.swordsearcher.com/history-of-swordsearcher-bible-software.html
[20:42] *** Ravenloft has quit IRC (Ping timeout: 244 seconds)
[21:13] *** metal_cam has quit IRC (Read error: Operation timed out)
[21:23] *** Ravenloft has joined #archiveteam-bs
[21:38] *** Aranje has joined #archiveteam-bs
[22:03] *** Ravenloft has quit IRC (Ping timeout: 260 seconds)
[22:03] *** Ravenloft has joined #archiveteam-bs
[22:07] *** schbirid has quit IRC (Quit: Leaving)
[22:29] *** Ravenloft has quit IRC (Ping timeout: 250 seconds)
[22:34] *** godane has quit IRC (Ping timeout: 501 seconds)
[22:39] *** fie__ has joined #archiveteam-bs
[22:39] *** fie_ has quit IRC (Read error: Connection reset by peer)
[22:58] *** Ravenloft has joined #archiveteam-bs
[23:00] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[23:19] *** Ravenloft has quit IRC (Ping timeout: 633 seconds)
[23:49] *** BlueMaxim has joined #archiveteam-bs
[23:59] xmc:
[23:59] oops shit
[23:59] xmc: okay, do you think you're going to start on it this week sometime? if not I can try to get started on my own too