[00:06] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:09] *** dashcloud has joined #archiveteam
[00:10] I have no access to that
[00:15] *** FalconK has quit IRC (Ping timeout: 260 seconds)
[00:16] *** FalconK has joined #archiveteam
[00:17] do chfoo or xmc have access to the wiki box? I /think/ it may be just SketchCow, but I'm not sure.
[00:17] I have no idea
[00:18] not i
[00:22] i don't. i'm not even an admin on the wiki
[00:24] OK, good to know. We probably just have to wait for Jason then.
[00:44] Checking
[00:48] Just submitted a ticket to the hosters.
[00:49] Just paid my invoice too, just in case.
[00:49] :-)
[00:50] *** sigkell has quit IRC (Ping timeout: 260 seconds)
[00:52] Hi Jason,
[00:52] We're currently having a drive problem with the server your site is hosted on and are working to resolve this ASAP. We'll keep you updated on the issue status.
[00:52] Thanks,
[00:52] Sky M
[00:52] *** philpem has joined #archiveteam
[00:53] well, that's ... mildly alarming.
[00:53] If you're 19
[00:53] And haven't dealt with servers for 25 years
[00:53] I'm sure it'll be fine.
[00:54] if we lose all the data, we back things up here
[00:54] heh. granted.
[00:56] SketchCow: Do you happen to remember why you added "stream_only" to the gamepocket_library collection back in May 2014? https://catalogd.archive.org/log/302043910 Collections don't normally contain anything that would get downloaded, AFAIK. I'm trying to figure out if it was an accident, or if I'm missing something. (well, I *know* I'm missing a lot of things, but...)
[00:59] *** Stiletto has quit IRC (Read error: Operation timed out)
[01:00] *** sigkell has joined #archiveteam
[01:14] *** philpem has quit IRC (Ping timeout: 260 seconds)
[01:17] *** Stiletto has joined #archiveteam
[01:45] *** Atros has quit IRC (Read error: Operation timed out)
[01:45] *** atrocity has joined #archiveteam
[01:47] and it's back!
[02:00] *** Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[02:01] *** Yoshimura has joined #archiveteam
[02:04] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:05] *** SN4T14 has quit IRC (Read error: Connection reset by peer)
[02:08] *** dashcloud has joined #archiveteam
[03:06] Archiveteam was rebooted and is back.
[03:07] I am sure I added stream_only because I was experimenting with adding it to see if it would force all sub-items into stream_only.
[03:07] It didn't, but there was no initiative to change it back, so there it stayed.
[03:17] cool, good to know, thanks!
[03:18] *** dashcloud has quit IRC (Read error: Operation timed out)
[03:19] *** bwn has quit IRC (Ping timeout: 492 seconds)
[03:20] *** BnA-Rob1n has quit IRC (Ping timeout: 244 seconds)
[03:22] *** dashcloud has joined #archiveteam
[03:22] *** BnA-Rob1n has joined #archiveteam
[03:22] *** Simpbrai_ has quit IRC (Ping timeout: 244 seconds)
[03:22] *** Simpbrai_ has joined #archiveteam
[03:27] *** Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[03:30] *** Yoshimura has joined #archiveteam
[03:33] I have 15 windows open to FOS, all doing things now.
[03:40] Back to full effort.
[04:04] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:12] *** bwn has joined #archiveteam
[04:13] *** Sk1d has joined #archiveteam
[04:43] *** metalcamp has joined #archiveteam
[04:48] *** tomwsmf-a has quit IRC (Ping timeout: 258 seconds)
[05:05] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[05:10] *** Stilett0 has joined #archiveteam
[05:11] *** beardicus has quit IRC (Read error: Operation timed out)
[05:11] *** thefinn93 has quit IRC (Read error: Operation timed out)
[05:12] *** aMunster has quit IRC (Read error: Operation timed out)
[05:12] *** mhazinsk has quit IRC (Read error: Operation timed out)
[05:12] *** Stiletto has quit IRC (Read error: Operation timed out)
[05:12] *** nwf has quit IRC (Read error: Operation timed out)
[05:12] *** thefinn93 has joined #archiveteam
[05:12] *** schbirid has joined #archiveteam
[05:13] *** Jonimus has quit IRC (Read error: Operation timed out)
[05:16] *** Mayonaise has quit IRC (Read error: Operation timed out)
[05:17] *** Honno has joined #archiveteam
[05:32] *** mhazinsk has joined #archiveteam
[05:44] *** beardicus has joined #archiveteam
[05:44] *** swebb sets mode: +o beardicus
[05:44] *** Mayonaise has joined #archiveteam
[05:48] https://www.theguardian.com/technology/2016/apr/11/daily-mail-publisher-private-equity-companies-yahoo-takeover
[05:58] *** n00b264 has joined #archiveteam
[05:59] *** n00b264 is now known as apples
[06:00] hello
[06:01] apples: yes, what?
[06:02] What ever happened to the videos from the google video archive project?
[06:04] I assume you've read http://archiveteam.org/index.php?title=Google_Video
[06:04] yes
[06:06] all it says is that videos are still being downloaded
[06:08] hm, good question. Hopefully one of the people who was involved with that can clarify, then it can be added to the page.
[06:11] Did asking the question put that into motion or should I come back and keep asking it like the website says?
[06:15] https://archive.org/details/googlevideo2011?and[]=google%20video
[06:20] Are there videos hidden inside the crawldata files?
[06:21] thanks. It looks like that data is not available for direct download (like most of the Wayback Machine data). I'll add a link to it from the wiki page anyway.
[06:24] So the crawldata is what I'm looking for but I can't get at it because of some reason?
[06:28] if you know of a particular URL, try getting it through the Wayback Machine.
[06:32] *** metalcamp has joined #archiveteam
[06:37] ugh. looks like wbm has the page but so far the video isn't showing up.
[06:38] you probably have to dig it out
[06:38] and/or try pointing youtube-dl and the wayback URL :-)
[06:38] er, *at* the wayback url
[06:39] thanks. This is the closest I've gotten. I'll have to roll up my sleeves I guess
[06:40] good luck
[06:40] what's the URL, in case someone else happens to feel like taking a crack at it?
[06:42] http://video.google.com/videoplay?docid=-2988947153633950316#
[06:42] *** aMunster has joined #archiveteam
[06:43] *** Jonimus has joined #archiveteam
[06:43] *** swebb sets mode: +o Jonimus
[06:44] "Project Popcorn" eh? neat
[06:48] dang that's an inconvenient format
[06:51] looks like you'd need to walk every cdx file
[06:51] but it doesn't seem to let me download any of them
[06:51] " The item is not available due to issues with the item's content. "
[06:53] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:55] *** ariscop has quit IRC (Quit: Leaving)
[06:55] I was hoping that it'd just be the videos :-/
[06:59] https://www.irccloud.com/pastebin/jjLiiMBk
[06:59] Argh. Why did it do that
[07:00] http://gu.com/p/4t8jt?CMP=Share_AndroidApp_Copy_to_clipboard
[07:03] *** nwf has joined #archiveteam
[07:20] *** Honno has quit IRC (Read error: Operation timed out)
[07:30] *** apples has quit IRC (Quit: http://chat.efnet.org )
[07:31] *** bwn has quit IRC (Read error: Operation timed out)
[07:37] *** atomotic has joined #archiveteam
[07:41] *** ariscop has joined #archiveteam
[07:59] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[08:05] *** metalcamp has joined #archiveteam
[08:07] *** zenguy has quit IRC (Read error: Operation timed out)
[08:11] *** bwn has joined #archiveteam
[08:45] *** WinterFox has joined #archiveteam
[08:46] *** dashcloud has quit IRC (Read error: Operation timed out)
[08:49] *** dashcloud has joined #archiveteam
[09:33] *** lbft has quit IRC (Quit: Bye)
[09:34] *** lbft has joined #archiveteam
[09:43] *** lbft has quit IRC (Quit: Bye)
[09:47] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[09:58] *** Yoshimura has quit IRC (http://www.kiwiirc.com/ - A hand crafted IRC client)
[10:42] *** Emcy has quit IRC (Read error: Operation timed out)
[10:45] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[11:42] *** atomotic has joined #archiveteam
[11:51] *** metalcamp has joined #archiveteam
[12:37] *** BlueMaxim has quit IRC (Read error: Operation timed out)
[13:04] *** SN4T14 has joined #archiveteam
[13:06] *** Emcy has joined #archiveteam
[13:27] *** GLaDOS has quit IRC (Ping timeout: 260 seconds)
[13:30] *** GLaDOS has joined #archiveteam
[13:33] *** VADemon has joined #archiveteam
[13:37] *** WinterFox has quit IRC (Remote host closed the connection)
[13:49] *** Stilett0 is now known as Stiletto
[13:56] *** Honno has joined #archiveteam
[13:57] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[13:59] *** Start has quit IRC (Quit: Disconnected.)
[14:26] *** atomotic has joined #archiveteam
[14:29] *** Yoshimura has joined #archiveteam
[14:31] *** Kaz has joined #archiveteam
[14:32] *** kurt has quit IRC (Quit: leaving)
[14:39] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[14:54] *** Start has joined #archiveteam
[15:21] *** JesseW has joined #archiveteam
[15:47] *** kris33 has joined #archiveteam
[15:57] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[16:05] *** Stiletto has quit IRC (Ping timeout: 260 seconds)
[16:06] *** Start has quit IRC (Quit: Disconnected.)
[16:45] *** kris33 has quit IRC (Textual IRC Client: www.textualapp.com)
[16:50] *** philpem has joined #archiveteam
[16:55] *** atomotic has joined #archiveteam
[16:58] We now have 10.6% of Fotolog saved!
[17:02] *** Stiletto has joined #archiveteam
[17:05] I tried accessing it, but it yells it's over capacity. So I hope those 10% pages are not "over capacity" xD
[17:17] *** atomotic has quit IRC (Quit: My Mac has gone to sleep. ZZZzzz…)
[17:18] *** atomotic has joined #archiveteam
[17:20] So http://fos.textfiles.com/ARCHIVETEAM/
[17:20] I'm still working on this - but the rough idea is to have a reporting mechanism for the pipelines of uploading the group works.
[17:21] So as things are shoved into the archive, they'll show here. Right now, a simple textfile and table, I'm not a great programmer.
[17:21] Problem right now is that I have to test it by generating a 50gb WARC, so that'll take a tad.
[17:22] SketchCow: Not a great programmer either, but can help anyhow?
[17:23] Btw, why is fotolog archived page by page? Is that intentional, why not ask them to provide an API? Or do you not ask so they don't block, or?
[17:28] What do you use on the server? Language. PHP, Python, Lua, ...?
[17:29] *** Honno has quit IRC (Read error: Operation timed out)
[17:31] we archive page by page so that it will work in http://web.archive.org/
[17:32] Oh yeah, my bad. There could have been API to make equivalent outputs faster, but yeah.
[17:32] yes, if we do an API grab it's difficult for ordinary users to access the archive
[17:33] if we can we try to get both
[17:33] it's a question of priorities
[17:33] anyway
[17:33] I meant API from fotolog side.
[17:34] But actual API would be worth for the archived stuff, so one can utilize it better.
[17:34] I wonder how Archive.org stores the data.
[17:35] warc
[17:35] woop woop woop off-topic siren
[17:35] take it to #archiveteam-bas
[17:35] er
[17:35] take it to #archiveteam-bs
[17:47] #archiveteam-bass
[17:50] *** Start has joined #archiveteam
[17:52] *** bwn has quit IRC (Ping timeout: 246 seconds)
[18:05] *** Honno has joined #archiveteam
[18:09] *** JW_work has joined #archiveteam
[18:09] *** Start has quit IRC (Quit: Disconnected.)
[18:17] *** Honno_ has joined #archiveteam
[18:21] *** Honno has quit IRC (Read error: Operation timed out)
[18:22] *** Start has joined #archiveteam
[18:25] *** bwn has joined #archiveteam
[18:29] *** Honno has joined #archiveteam
[18:36] *** ndizzle has quit IRC (Read error: Connection reset by peer)
[18:38] *** Honno_ has quit IRC (Read error: Operation timed out)
[18:38] *** ndiddy has joined #archiveteam
[18:42] *** scyther has joined #archiveteam
[18:42] *** Tomcat_ has joined #archiveteam
[18:49] Yoshimura: no! That's all taken care of in the script
[18:49] It looks like fotolog is having a bad 'over capacity' day once every week/two weeks
[18:51] Well, if it has right response code I believe it is :)
[18:53] Not sure if appropriate channel. Would it be fine to make own warrior code and stuff, maintaining all compatibility?
[18:53] Response code of the over capacity message is 200
[18:53] Oh. Can I have a hint on how you detect it then in script?
[18:53] in this case that would not be the right response code
[18:53] #archiveteam-bs , ask me there
[19:10] *** K4k has quit IRC (Quit: WeeChat 1.4)
[19:10] *** K4k has joined #archiveteam
[19:44] *** Start has quit IRC (Quit: Disconnected.)
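Since the "over capacity" page is said above to come back with response code 200, the status code alone cannot tell a good page from a bad one. The log does not show how the actual grab script handles this, so what follows is only a minimal sketch of one common approach: check the body for a marker phrase. The function name `should_retry` and the marker text "over capacity" are assumptions made for illustration, not taken from the real script.

```python
def should_retry(status_code, body, markers=("over capacity",)):
    """Decide whether a fetched page should be discarded and retried.

    Hypothetical helper: the marker phrase is an assumption about what
    the error page contains, not the real script's logic.
    """
    # Anything other than 200 is treated as a failure outright.
    if status_code != 200:
        return True
    # A 200 can still be an error page, so look inside the body.
    lowered = body.lower()
    return any(marker in lowered for marker in markers)
```

A page flagged this way would be left out of the archive and fetched again later, so that error pages are not saved as if they were real content.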
[19:56] *** schbirid has quit IRC (Quit: Leaving)
[20:02] *** Simon has joined #archiveteam
[20:05] Hi. I'm here because I know of a website that is quite possibly going to die. At the very least, the iOS app (which was the only way to interact with this site) is entirely unmaintained and has been so for at least a year. I only today found that there was a way to reliably spider the site for content, and I'm not technically-savvy enough to do so myself because their audio player is pretty sophisticated and strange. It was basically a social network formed around 90 second messages, and was never insanely popular but still has many hours of audio. Any idea where I should go from here, or what I can do? I was sent here by the people over in #youtube-dl (my original idea for how I could download posts).
[20:06] maybe mention what site it is?
[20:06] :P www.meltapp.com
[20:07] Someone here might have an idea what to do
[20:08] *** atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com)
[20:09] Simon: what do you know about how to spider the site? What is the format of the URL, etc.
[20:09] you can go to www.meltapp.com/users and get a list of active (or something) users. www.meltapp.com/users/username will pull up a profile. The app is no longer available on the app store and before I thought to try the /users thing, the only way to get a share link to a specific post was to generate it from the app. Also the audio player doesn't give the file URL in the clear; it's some sort of function that retrieves it. That's all I know
[20:09] cool, that's something
[20:09] oh awesome
[20:09] http://www.meltapp.com/users/jason
[20:09] Yeah; I have been wanting to do this for months but there's no reference to the /users thing anywhere. It was purely an accidental discovery
[20:09] so many audios
[20:10] nod, that was one of the head devs
[20:10] that's a ridiculously small number of users
[20:10] SketchCow: see above. Get them or not?
[20:10] list is not complete
[20:10] if I pull up the page in a browser, there is a link right above the "download melt to reply" thing that leads directly to the audio file. But that link is not in the html source
[20:10] http://www.meltapp.com/users/jason is not in the ilst
[20:10] list*
[20:11] yeah you'd pretty much have to generate a list of users by branching out from the ones displayed.
[20:11] We can do thast
[20:11] grab followers/friends, repeat
[20:11] that*
[20:12] Simon: on what exact page do you see the "download melt to reply" button which leads to the audio file?
[20:12] I'm only directed to itunes
[20:12] Hm
[20:12] for example http://www.meltapp.com/melts/WNu1s9Nky6
[20:12] I see it here http://www.meltapp.com/users/jyenners
[20:13] Helllllo
[20:13] yeah
[20:13] I went to my own profile (/users/simon) and grabbed the first link (stuck in the past). The URL is this: http://www.meltapp.com/melts/wrBinAB7xP
[20:14] ^ there I'm directed to itunes when clicking Download Melt to reply.
[20:14] apparently some are at: http://files.parsetfss.com/b96e3be4-34e2-4803-b1aa-9b710e6d80e9/tfss-3a78678b-7ec2-4051-91c6-843ef43b2de7-recording.m4a
[20:14] but we'll get the audio
[20:14] yeah
[20:14] just investigating why
[20:14] If I go to this page, right above the reply button there's a link. i'm using screen reading software and I have no idea how the page renders visually, but I can tab (and shift+tab) between the links. The audio file URL is something like this:
[20:14] http://files.parsetfss.com/b96e3be4-34e2-4803-b1aa-9b710e6d80e9/tfss-30e63716-d417-4877-9d9d-fbe0d9642041-recording.m4a
[20:15] if I pull up the page in a browser, there is a link right above the "download melt to reply" thing that leads directly to the audio file. But that link is not in the html source
[20:15] doesn't work for me
[20:19] the audio file is requested with a POST at https://api.parse.com/1/classes/Memo
[20:19] which sucks, since that won't playback well in the Wayback Machine
[20:19] yuck
[20:25] The main user page might not hold up, but it seems like people's profiles will show all their posts. Jason's has hundreds of them. So there's that.
[20:26] There's a standard limit that comes with the POST request of 500
[20:26] Let me know if you find a page with 500 items, so I can test if it works if we just raise that number
[20:27] let me check a couple of the people I know to be frequent posters
[20:27] or if you find anything with more than 500, let me know too
[20:28] we should probably take this to a separate channel, I suggest #melted
[20:29] Simon, arkiver
[20:29] sure
[20:43] *** scyther has quit IRC (Read error: Connection reset by peer)
[20:48] *** SimpBrain has quit IRC (Quit: Leaving)
[20:56] *** Tomcat_ has quit IRC (Remote host closed the connection)
[20:57] Egad
[20:58] En'gard, even.
[20:58] So wait. We'd be grabbing Melt just because?
[20:59] Because it's a lot of user-contributed content on a service whose main entry point is no longer available?
[20:59] Where's the news on it not being available.
[21:00] Simon claimed the iPhone app is no longer available.
[21:00] He didn't say not available.
[21:00] He said undermaintained.
[21:00] * JW_work goes to check upthread
[21:00] It's no longer available in the app store as far as I'm aware; I just happen to still have it
[21:00] Yes, that's what I thought.
[21:01] Search for "melt voice". There's another social network thing called "melt" and it's definitely not what we want.
[21:02] If this link doesn't work it's safe to say the app is gone: https://itunes.apple.com/us/app/melt-social-voice-recordings/id655838005
[21:03] also, main=only. There was never an app on any other platform. So yeah, it's dead. Forgot about that detail.
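The 500-item limit mentioned above raises the question of what to do for a profile with more posts than one request returns. Raising the number is one option; another, sketched here under assumptions, is to page through the results. The log does not say whether the Memo endpoint accepts an offset, so `fetch_page`, `limit` and `skip` are hypothetical names standing in for whatever the real request supports.

```python
def fetch_all(fetch_page, page_size=500):
    """Collect every item by requesting successive pages.

    fetch_page is a caller-supplied function (hypothetical) that takes
    limit and skip keyword arguments and returns a list of items.
    """
    items = []
    skip = 0
    while True:
        page = fetch_page(limit=page_size, skip=skip)
        items.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return items
        skip += len(page)
```

A profile with exactly 500 posts costs one extra request that returns an empty page, which is how the loop tells "exactly full" apart from "truncated".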
[21:04] Well, just let me know if we want it and I'll make sure a discovery/grabbing project will be in the warrior soon
[21:05] Off the top of my head I know a lot of people who no longer have the app and would love to have their stuff back. It is a comparatively small community though.
[21:05] *** dashcloud has quit IRC (Read error: Operation timed out)
[21:09] *** dashcloud has joined #archiveteam
[21:24] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[21:34] Well, can we find out what the space amount is?
[21:47] *** ariscop has quit IRC (Leaving)
[22:02] *** tomwsmf-a has joined #archiveteam
[22:14] *** Start has joined #archiveteam
[22:18] *** VADemon has quit IRC (Quit: left4dead)
[22:19] *** chfoo- has quit IRC (Read error: Operation timed out)
[22:19] *** chfoo- has joined #archiveteam
[22:22] *** wacky_ has quit IRC (Ping timeout: 244 seconds)
[22:23] *** wacky has joined #archiveteam
[22:40] *** Deewiant_ has quit IRC (Ping timeout: 250 seconds)
[22:40] *** Fletcher_ has quit IRC (Ping timeout: 250 seconds)
[22:40] *** koon has quit IRC (Ping timeout: 250 seconds)
[22:40] *** espes__ has quit IRC (Ping timeout: 250 seconds)
[22:40] *** lukeman has quit IRC (Ping timeout: 250 seconds)
[22:40] *** espes__ has joined #archiveteam
[22:41] *** lukeman has joined #archiveteam
[22:41] *** Deewiant_ has joined #archiveteam
[22:50] SketchCow: There is no complete list of users, so we can't say how many there are
[22:51] Total size estimate isn't very possible for thi
[22:51] this
[22:51] they never issued a press release with a number of users in it?
[22:52] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[22:54] xmc: no idea
[22:54] hm ok yeah i can't find their website or anything
[22:54] *** ariscop has joined #archiveteam
[23:02] Turns out fotolog is shutting down anymore.
[23:02] I'd still like to grab them though
[23:03] is not shutting down*
[23:04] I bet 90% of melt users follow Shane or Jason. Yank from there.
[23:04] I'm going to use the same user discovery system as we are currently using for fotolog.
[23:04] Works very well.
[23:05] I'll trust your judgment. You already figured out more than I did about this.
[23:37] *** koon has joined #archiveteam
[23:54] *** tomwsmf-a has joined #archiveteam
[23:58] *** Stiletto has quit IRC (Read error: Operation timed out)
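The discovery idea described in the channel ("grab followers/friends, repeat", starting from well-connected accounts) amounts to a breadth-first walk over the follow graph. The log does not show the fotolog discovery system itself, so this is only a rough sketch of that general technique. `get_neighbours` is a hypothetical callback that would return the followers and friends found on one user's profile.

```python
from collections import deque


def discover_users(seeds, get_neighbours):
    """Breadth-first discovery of usernames reachable from the seeds.

    get_neighbours is a caller-supplied function (hypothetical) that
    returns the followers/friends listed for a given username.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        for other in get_neighbours(user):
            # Only queue users we have not met before, so the walk ends.
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

Starting from seeds such as the two accounts named above, the walk reaches every user connected to them through any chain of follows. Accounts with no link to that cluster would be missed, which matches the point made in the channel that no complete user list exists.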