[00:04] *** dashcloud has joined #archiveteam
[00:19] *** wvdp___ has joined #archiveteam
[00:22] *** Stilett0 has joined #archiveteam
[00:23] *** Stiletto has quit IRC (Ping timeout: 306 seconds)
[00:24] *** JesseW has joined #archiveteam
[00:27] *** godane has quit IRC (Ping timeout: 258 seconds)
[00:29] *** godane has joined #archiveteam
[00:38] *** godane has quit IRC (Quit: Leaving.)
[00:38] *** godane has joined #archiveteam
[00:43] *** philpem has quit IRC (Ping timeout: 252 seconds)
[00:45] *** Stilett0 is now known as Stiletto
[01:05] *** JesseW has quit IRC (Ping timeout: 600 seconds)
[01:13] *** Start has joined #archiveteam
[01:21] *** JesseW has joined #archiveteam
[01:25] *** primus104 has quit IRC (Leaving.)
[01:40] Whatever happened to http://archiveteam.org/index.php?title=FlickrFckr ?
[01:49] *** Start has quit IRC (Quit: Disconnected.)
[01:55] arkiver: i can only give logs for 2015-08-02 and 2015-08-07. the rest of the logs are on the old tracker
[02:02] i checked redis and i don't know why it has settings stored as an item
[02:27] yipdw how do i compile wget-lua from our repo
[02:28] S[h]O[r]T: autoconf; ./configure --prefix=PREFIX; make install
[02:28] alternatively one of us can update the tarball but that will take some time
[02:28] you can also try wpull if you're starting a new project
[02:31] im trying to get downloaders up for blip grab and that 5.18 error is stopping me :(
[02:31] getting problems configuring...trying to fix that
[02:32] try adding the following to the get-wget-lua script
[02:32] sed -e "s/\(item \)\([0-9]\)/\1\.\2/" ./doc/wget.texi > ./doc/wget.texi.tmp && mv ./doc/wget.texi.tmp ./doc/wget.texi
[02:32] at around line 38
[02:35] wget-lua successfully built.
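For readers who do not read sed fluently, the substitution posted at 02:32 can be sketched in Python. This is only an illustration of what the expression does to each line of wget.texi, not part of the get-wget-lua script; the function names and the sample lines in the usage note are made up for the example.

```python
import re

# Same pattern as the sed expression: the text "item " followed by one digit.
ITEM_NUMBER = re.compile(r"(item )([0-9])")


def patch_texi_line(line):
    """Insert a '.' between 'item ' and the digit, first match only.

    sed without the 'g' flag replaces only the first match on each line,
    which count=1 reproduces here.
    """
    return ITEM_NUMBER.sub(r"\1.\2", line, count=1)


def patch_texi(text):
    """Apply the substitution line by line, keeping line endings intact."""
    return "".join(patch_texi_line(line) for line in text.splitlines(keepends=True))
```

A line such as "@item 0" comes out as "@item .0", while a line such as "@item foo" is left untouched because no digit follows "item ".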
[02:36] aaaaaaaaa is a useful name ive now learned
[02:45] *** robink has quit IRC (Ping timeout: 492 seconds)
[02:55] *** Start has joined #archiveteam
[03:01] *** JesseW has quit IRC (Read error: Operation timed out)
[03:10] *** robink has joined #archiveteam
[04:10] *** aaaaaaaaa has quit IRC (Leaving)
[04:31] *** JesseW has joined #archiveteam
[04:42] *** xk_id has joined #archiveteam
[04:52] *** xk_id has quit IRC (Remote host closed the connection)
[05:07] *** brayden_ has joined #archiveteam
[05:07] *** brayden has quit IRC (Read error: Connection reset by peer)
[05:12] *** godane has quit IRC (Leaving.)
[05:13] *** godane has joined #archiveteam
[06:21] *** xk_id has joined #archiveteam
[06:39] *** JesseW has quit IRC (Read error: Operation timed out)
[06:55] *** bassiexp_ has joined #archiveteam
[07:11] *** bassiexp_ has quit IRC (Quit: Page closed)
[07:36] *** bentpins has joined #archiveteam
[07:37] Any thought on soundcloud? http://thump.vice.com/en_au/article/the-great-soundcloud-purge-of-2015-has-begun
[08:12] i'm grabbing the first 100k urls rss feeds
[08:14] after that i can then give you guys an mp3 list
[08:15] after 200 users rss urls i got 94 mp3 urls
[08:16] godane: from what?
[08:21] each url has an rss feed: http://feeds.soundcloud.com/users/soundcloud:users:648/sounds.rss
[08:21] and a number
[08:21] so it's easily brute-forceable
[08:23] *** schbirid has joined #archiveteam
[08:25] there are m4a files also: http://feeds.soundcloud.com/users/soundcloud:users:2/sounds.rss
[08:26] code for getting mp3 urls in web archive: zcat *.warc.gz | grep url= | sed 's|.* url="||g' | sed 's|" .*||g'
[08:57] Good stuff
[09:01] I see
[09:11] *** primus104 has joined #archiveteam
[09:22] SketchCow: the way the situation is right now, we are likely not able to get blip saved 100% before the deadline
[09:22] SketchCow: I see you've been in contact with someone from blip, can you please ask him if blip's shutdown can be delayed by two weeks?
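Going back to the soundcloud feeds discussed between 08:21 and 08:26: the two ideas there (feed URLs differ only by a user number, and the media links sit in url="..." attributes) can be sketched in Python. This is a rough illustration, not godane's actual script. The feed URL pattern is taken from the log; the function names and the sample feed line in the usage note are assumptions, and the regular expression is a stricter stand-in for the grep/sed pipeline rather than an exact port of it.

```python
import re

# URL pattern as posted in the channel; only the user number changes.
FEED_TEMPLATE = "http://feeds.soundcloud.com/users/soundcloud:users:{user_id}/sounds.rss"

# Captures the value of any url="..." attribute (stand-in for grep + sed).
URL_ATTR = re.compile(r'\burl="([^"]+)"')


def feed_urls(first_id, last_id):
    """Build the feed URL for every user number in the inclusive range."""
    return [FEED_TEMPLATE.format(user_id=i) for i in range(first_id, last_id + 1)]


def media_urls(feed_text):
    """Return every url="..." value found in already-downloaded feed text."""
    return URL_ATTR.findall(feed_text)
```

Calling feed_urls(1, 100000) would produce the "first 100k" list mentioned at 08:12, and media_urls can be run over the decompressed WARC contents to collect the mp3 and m4a links.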
[09:33] *** primus104 has quit IRC (Leaving.)
[10:55] *** nmnn has joined #archiveteam
[11:04] *** xk_id has quit IRC (Remote host closed the connection)
[11:52] *** Ungstein has joined #archiveteam
[11:53] *** Ungstein has quit IRC (Client Quit)
[11:54] *** primus104 has joined #archiveteam
[12:02] *** Ungstein has joined #archiveteam
[12:35] does anyone know a decent twitter scraper for selected accounts that will grab their timeline, tweets and images (:orig!) without requiring you to submit your blood type and food preferences for OAuth twitter access?
[12:35] for running daily or something
[12:40] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[12:41] what is your quest
[12:41] prices are a bit old http://www.archiveteam.org/index.php?title=Storage_Media
[12:42] yahoosucks
[12:42] cheers
[12:42] <3
[12:42] bentpins: Thanks for updating them :)
[12:42] and welcome~
[12:44] ^
[12:45] *** sivoais has quit IRC (Read error: Operation timed out)
[12:45] *** espes__ has quit IRC (Read error: Operation timed out)
[12:46] *** dashcloud has quit IRC (Read error: Operation timed out)
[12:48] *** PurpleSym has joined #archiveteam
[12:54] *** dashcloud has joined #archiveteam
[12:57] *** sivoais has joined #archiveteam
[13:05] OK, I have two coding requests, if possible.
[13:05] The first is a piece of code that, given a Wiki, pulls every external reference of that Wiki out and submits it to archivebot or Internet Archive.
[13:07] The second can wait
[13:07] What kind of wiki? (MediaWiki?)
[13:07] *** SketchCow sets mode: +oooo beardicus BlueMaxim Cameron_D chfoo
[13:07] *** SketchCow sets mode: +oooo dashcloud db48x dcmorton DFJustin
[13:07] *** SketchCow sets mode: +oo ersi Famicoman
[13:08] Yes, Mediawiki.
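The first request at 13:05 (pull every external reference out of a wiki and hand it to an archiver) can be sketched roughly as follows. This is a minimal Python illustration that works on wikitext you already have, for example from a dump; it is not the code anyone in the channel went on to write. The regular expression, the function names and the sample text are assumptions, and the Wayback Machine save prefix is mentioned only as one possible submission target.

```python
import re

# Rough URL matcher: stops at whitespace and at characters that usually
# delimit links in wikitext (brackets, pipes, braces, angle brackets, quotes).
URL_RE = re.compile(r'https?://[^\s\[\]<>"|{}]+')


def external_links(wikitext):
    """Return the distinct external URLs in wikitext, in first-seen order."""
    found = []
    for match in URL_RE.findall(wikitext):
        # Drop punctuation that belongs to the sentence, not the URL.
        url = match.rstrip(".,;:!?)'")
        if url and url not in found:
            found.append(url)
    return found


def wayback_save_url(url):
    """Build a save request URL (assumed submission target, not fetched here)."""
    return "https://web.archive.org/save/" + url
```

Nothing here touches the network: the resulting list could be written to a file for archivebot or fed to whatever submission step is preferred.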
[13:09] Hm~
[13:09] I guess we can use the wikiteam dump scripts and then suck out the URLs from the dump
[13:18] *** espes__ has joined #archiveteam
[13:22] If the wiki is still online there’s https://www.mediawiki.org/wiki/Help:Linksearch
[13:22] SketchCow: I'll write a bit of code for that
[13:26] *** wvdp_ has joined #archiveteam
[13:27] I didn't know about :Linksearch
[13:27] That's very nice. It might be useful for the script.
[13:32] *** wvdp___ has quit IRC (Read error: Operation timed out)
[13:39] *** nmnn has quit IRC (Ping timeout: 483 seconds)
[13:40] *** Stiletto has quit IRC (Read error: Operation timed out)
[13:41] *** Stiletto has joined #archiveteam
[13:50] *** expr_ has joined #archiveteam
[13:52] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[13:52] https://news.ycombinator.com/item?id=10064565
[13:57] Fuck THAT guy and his manuals
[14:08] The guy who runs the store?
[14:10] *** rogal has joined #archiveteam
[14:12] hi! After some time I'm reloading my project of archiving ownlog.com blog service. I'm an author of ownlog-grab scripts on archiveteam's github. Tracker for this project is also ready
[14:13] What's the next step?
I suppose I should have some permissions to upload items to the tracker - and I need an rsync account created for this project
[14:14] *** rogal has quit IRC (Read error: Connection reset by peer)
[14:14] *** rogal has joined #archiveteam
[14:27] *** rogal has quit IRC (Read error: Connection reset by peer)
[14:29] *** rogal has joined #archiveteam
[14:39] *** chfoo has quit IRC (Ping timeout: 258 seconds)
[14:43] *** Stiletto has quit IRC ()
[14:54] *** xk_id has joined #archiveteam
[15:14] *** SimpBrain has joined #archiveteam
[15:15] *** Stiletto has joined #archiveteam
[15:22] *** rogal has quit IRC (Read error: Connection reset by peer)
[15:22] *** rogal has joined #archiveteam
[15:26] *** chfoo has joined #archiveteam
[15:29] *** rogal has quit IRC (Read error: Connection reset by peer)
[15:30] *** rogal has joined #archiveteam
[15:42] *** Stiletto has quit IRC ()
[15:45] *** Froggypwn has quit IRC (Ping timeout: 606 seconds)
[15:46] *** Froggypwn has joined #archiveteam
[15:53] *** Stiletto has joined #archiveteam
[15:56] *** rogal has quit IRC (Read error: Connection reset by peer)
[15:56] *** rogal has joined #archiveteam
[16:37] *** nmnn has joined #archiveteam
[16:37] *** xk_id has quit IRC (Read error: Connection reset by peer)
[16:40] *** chfoo0 has joined #archiveteam
[16:41] *** bamboo has joined #archiveteam
[16:41] hi
[16:41] anyone here working on blingee
[16:46] *** chfoo has quit IRC (Ping timeout: 483 seconds)
[16:46] i'd like to try scraping the stamps, which are stored as swfs
[16:47] bit of a process to get at them
[16:49] they're all stored as swfs
[16:52] *** xk_id has joined #archiveteam
[16:57] *** chfoo0 is now known as chfoo
[16:58] *** rogal has quit IRC (Read error: Connection reset by peer)
[16:58] *** rogal has joined #archiveteam
[17:16] I am!
[17:16] bamboo, Do you have an example stamp/swf url?
[17:16] *** chfoo0 has joined #archiveteam
[17:19] *** nmnn has quit IRC (Ping timeout: 483 seconds)
[17:22] *** nertzy has joined #archiveteam
[17:23] *** chfoo has quit IRC (Ping timeout: 483 seconds)
[17:27] *** JesseW has joined #archiveteam
[17:27] trying to get one, i don't think i can generate them programmatically
[17:28] i was going to scrape their search pages http://blingee.com/stamp/embedded_list?query=cat
[17:28] which pass an encrypted string back to the main blingee editor (flash app) which i decompiled and am looking through
[17:28] they're AES encrypted
[17:28] the key appears to be "rAI1P8bpXoReutED8XOTT0lh26MWhWz87IH4t39LjJp3wxLkEHDKE2Er"
[17:32] From what I've seen, you can access stamps via http://blingee.com/stamp/view/$ID and then search the html for the bigbox div.
[17:32] For example, http://blingee.com/stamp/view/4906955 and http://image.blingee.com/images18/content/output/000/000/000/04a/670662943_920758.gif
[17:32] Not sure if that works for all of them though.
[17:35] *** rogal has quit IRC (Read error: Connection reset by peer)
[17:36] *** rogal has joined #archiveteam
[17:36] *** expr_ has quit IRC (My Mac has gone to sleep. ZZZzzz…)
[17:40] *** rogal has quit IRC (Read error: Connection reset by peer)
[17:40] *** rogal has joined #archiveteam
[17:43] *** aaaaaaaaa has joined #archiveteam
[17:48] the gifs are useless though, they all have that checkerboard pattern
[17:49] i decrypted this thing finally lol
[17:49] http://image.blingee.com/images19/content/output/000/000/000/083/856589260_1244670.swf
[17:49] this is what the app is actually using
[17:49] they have transparency
[17:49] you can't generate the swf url from the gif alas
[17:50] the swf stickers actually have full alpha transparency
[17:51] ah
[17:52] i think it would be feasible to scrape search, decode these strings, and grab the swfs
[17:52] i'll see if there's something else we can scrape
[17:53] *** JesseW has quit IRC (Leaving.)
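The approach suggested at 17:32 (open http://blingee.com/stamp/view/$ID and look inside the bigbox div) can be sketched in Python with the standard-library HTML parser. This is an illustration under assumptions: the log does not say whether "bigbox" is an id or a class, so the sketch accepts either, and it assumes the image is a plain img tag inside that div. The class and function names are made up, no page is fetched, and as noted at 17:48 the gif found this way lacks the transparency of the swf.

```python
from html.parser import HTMLParser


def stamp_view_url(stamp_id):
    """Build the stamp page URL in the form quoted in the channel."""
    return "http://blingee.com/stamp/view/{}".format(stamp_id)


class BigboxImageFinder(HTMLParser):
    """Collect img src values that appear inside a div marked 'bigbox'."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # div nesting level inside bigbox, 0 = outside
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "div":
                self.depth += 1
            elif tag == "img" and attrs.get("src"):
                self.image_urls.append(attrs["src"])
        elif tag == "div":
            # Assumption: 'bigbox' may be either the id or one of the classes.
            classes = (attrs.get("class") or "").split()
            if attrs.get("id") == "bigbox" or "bigbox" in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def bigbox_images(html_text):
    """Return the image URLs found inside the bigbox div of a page."""
    finder = BigboxImageFinder()
    finder.feed(html_text)
    return finder.image_urls
```

Given the saved HTML of a stamp page, bigbox_images(page_html) returns the image URLs in that div, which can then be queued for download.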
[17:53] *** rogal has quit IRC (Read error: Connection reset by peer)
[17:53] it seems like the archive bot has a lot of blingee captured already
[17:54] *** rogal has joined #archiveteam
[17:54] but these stamps are valuable, other gif-stamp sites exist but don't have the range
[17:54] alarming: the top stamp names are in korean
[17:54] *** rogal has quit IRC (Read error: Connection reset by peer)
[17:55] *** rogal has joined #archiveteam
[17:55] *** nmnn has joined #archiveteam
[17:56] *** rogal has quit IRC (Read error: Connection reset by peer)
[17:56] *** Start has quit IRC (Quit: Disconnected.)
[17:56] *** rogal has joined #archiveteam
[17:57] bamboo, do you know if there are swf urls for the actual blingees? or is it just gifs?
[17:57] i just pasted a swf url
[17:58] I mean non-stamps, like http://blingee.com/blingee/view/1
[17:58] ah no, i think the final output is a gif
[17:59] lol 3000 cat swfs
[18:04] *** nmnn has quit IRC (Ping timeout: 483 seconds)
[18:06] wonder how you could get a list of stamp tags
[18:06] ah the stamp pages themselves have them
[18:12] *** rogal has quit IRC (Read error: Connection reset by peer)
[18:14] *** rogal has joined #archiveteam
[18:19] *** rogal has quit IRC (Read error: Connection reset by peer)
[18:20] meta-meta2-B /buffer +1
[18:20] sorry
[18:26] the funny thing is a lot of these stamps have that gradient watermark pattern on them, idgi
[18:30] *** brayden_ has quit IRC (Read error: Connection reset by peer)
[18:31] *** primus104 has quit IRC (Leaving.)
[18:33] *** chfoo0 has quit IRC (Ping timeout: 483 seconds)
[18:43] *** nmnn has joined #archiveteam
[18:46] *** JesseW has joined #archiveteam
[19:08] *** godane has quit IRC (Quit: Leaving.)
[19:15] *** primus104 has joined #archiveteam
[19:27] *** aliz has quit IRC (Ping timeout: 252 seconds)
[19:27] *** nmnn has quit IRC (Ping timeout: 483 seconds)
[19:34] *** habi has joined #archiveteam
[19:40] *** habi has left
[19:43] *** yan has joined #archiveteam
[19:46] *** yan has quit IRC (Client Quit)
[20:07] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[20:15] *** nertzy has joined #archiveteam
[20:30] *** philpem has joined #archiveteam
[20:32] *** dashcloud has quit IRC (Remote host closed the connection)
[20:34] *** dashcloud has joined #archiveteam
[20:39] *** bentpins has quit IRC (Quit: Leaving)
[20:50] *** JesseW has quit IRC (Leaving.)
[21:10] *** PurpleSym has quit IRC (Remote host closed the connection)
[21:17] *** JesseW has joined #archiveteam
[21:19] *** SimpBrain has quit IRC (Leaving)
[21:19] welp i'm fetching 188 pages of "cat" swfs
[21:20] my friend wrote something to dump pngs out of the swfs
[21:21] Great!
[21:21] would it be funny to merge the pngs into an apng
[21:27] *** aliz has joined #archiveteam
[21:29] That would be one massive collage
[21:29] oh, oops read that wrong
[21:32] *** aliz has quit IRC (Remote host closed the connection)
[21:37] *** JesseW has quit IRC (Ping timeout: 600 seconds)
[21:43] *** godane has joined #archiveteam
[22:05] *** chfoo has joined #archiveteam
[22:17] *** nertzy has quit IRC (Quit: This computer has gone to sleep)
[22:21] *** nertzy has joined #archiveteam
[22:42] bamboo: scripts for blingee for a warrior project are (hopefully) ready tomorrow
[22:44] if you have anything you think we should know about, please write something about it here http://archiveteam.org/index.php?title=Blingee
[22:47] oh nice
[22:47] WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD
[22:47] yahoosucks
[22:47] yahoosucks
[22:47] heh
[22:47] lol
[22:48] i should probably incorporate my thing into yours, i made something to focus on the swfs
[22:49] I'm almost done with the scripts:
https://github.com/garyrh/blingee-grab
[22:49] would need to move it over to lua, presumably
[22:49] You could do it in Lua, or in Python and just call it from the Lua script.
[22:49] cool i'll have a look later
[22:51] *** dcmorton has quit IRC (Quit: ZNC - http://znc.in)
[22:54] *** dcmorton has joined #archiveteam
[23:00] *** wvdp___ has joined #archiveteam
[23:01] my swf scraper is here https://github.com/julescarbon/blingee-stamp
[23:01] written in javascript because it was ready-to-hand, hope that's cool
[23:06] *** wvdp_ has quit IRC (Read error: Operation timed out)
[23:26] *** aaaaaaaa_ has joined #archiveteam
[23:26] *** aaaaaaaaa has quit IRC (Read error: Connection reset by peer)
[23:27] *** aaaaaaaa_ is now known as aaaaaaaaa
[23:36] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:39] *** dashcloud has joined #archiveteam
[23:40] *** Start has joined #archiveteam
[23:46] *** dcmorton_ has joined #archiveteam
[23:50] *** BlueMaxim has joined #archiveteam