[00:12] so this month is sucking for me: https://archive.org/details/@chris85?and[]=addeddate:2017-05
[00:13] to be fair i'm ripping old original broadcasts from vhs
[00:26] so this person tons of old McDonalds stuff: http://myworld.ebay.com/uptowngirljeni/
[00:26] including sets of old happy meal boxs
[01:18] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[01:19] *** j08nY has quit IRC (Quit: Leaving)
[01:25] *** Sk1d has joined #archiveteam-bs
[01:32] *** username1 has joined #archiveteam-bs
[01:35] *** schbirid2 has quit IRC (Read error: Operation timed out)
[02:37] *** ndiddy has quit IRC ()
[02:57] *** nyany has quit IRC (Leaving)
[04:05] https://www.imzy.com/imzy/post/imzy_is_shutting_down
[04:05] we have about 1 month to get it
[04:08] *** nyany has joined #archiveteam-bs
[05:02] godane: greenie in here is imzy staff
[05:14] *** DopefishJ has joined #archiveteam-bs
[05:14] *** swebb sets mode: +o DopefishJ
[05:16] *** DFJustin has quit IRC (Ping timeout: 260 seconds)
[06:42] *** wabu_ has quit IRC (Read error: Operation timed out)
[06:52] *** wabu has joined #archiveteam-bs
[06:58] *** w0rp has quit IRC (Read error: Operation timed out)
[06:59] *** w0rp has joined #archiveteam-bs
[08:15] *** icedice has joined #archiveteam-bs
[08:31] *** user_ has quit IRC (Read error: Operation timed out)
[08:34] *** bwn has quit IRC (Ping timeout: 260 seconds)
[08:54] *** bwn has joined #archiveteam-bs
[09:10] *** j08nY has joined #archiveteam-bs
[09:14] *** GE has joined #archiveteam-bs
[10:24] *** j08nY has quit IRC (Read error: Operation timed out)
[10:31] *** davidar has joined #archiveteam-bs
[10:34] *** GE has quit IRC (Remote host closed the connection)
[11:25] *** BlueMaxim has quit IRC (Quit: Leaving)
[11:59] Speed: 0.9 B/s \m/
[12:04] *** pizzaiolo has joined #archiveteam-bs
[12:22] *** GE has joined #archiveteam-bs
[12:58] *** vitzli has joined #archiveteam-bs
[12:59] *** icedice2 has joined #archiveteam-bs
[13:00] *** icedice2 has quit IRC (Client Quit)
[13:01] *** GE is now known as SHODAN_UI
[13:05] *** j08nY has joined #archiveteam-bs
[13:24] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[13:33] *** vitzli_ has joined #archiveteam-bs
[13:36] *** vitzli has quit IRC (Ping timeout: 250 seconds)
[13:38] *** vitzli_ has quit IRC (Quit: Leaving)
[14:23] *** jtn2 has quit IRC (Ping timeout: 250 seconds)
[14:26] *** godane has quit IRC (Quit: Leaving.)
[14:28] *** schbirid2 has joined #archiveteam-bs
[14:30] *** jtn2 has joined #archiveteam-bs
[14:31] *** username1 has quit IRC (Read error: Operation timed out)
[15:23] *** icedice has quit IRC (Quit: Leaving)
[15:31] *** Sanqui has quit IRC (Remote host closed the connection)
[15:37] *** Sanqui has joined #archiveteam-bs
[16:26] *** DopefishJ is now known as DFJustin
[16:28] *** schbirid2 has quit IRC (Quit: Leaving)
[16:28] *** godane has joined #archiveteam-bs
[17:36] *** j08nY has quit IRC (Remote host closed the connection)
[17:39] *** j08nY has joined #archiveteam-bs
[17:42] *** wolfpld has quit IRC (Quit: WeeChat 1.6)
[17:51] *** SHODAN_UI has quit IRC (Quit: zzz)
[18:26] *** namespace has joined #archiveteam-bs
[18:44] *** SHODAN_UI has joined #archiveteam-bs
[19:05] *** ndiddy has joined #archiveteam-bs
[20:13] *** wacky_ has joined #archiveteam-bs
[20:48] yeah godane i think my backend engineer is emailing with alembic, we're not sure what format y'all want the dump of url slugs in
[20:50] but we're gonna try to get that to yall. if folks have questions, i can try to help relay. we have some individual users asking for various data dumps but right now we're focusing on getting users their own data, and secondly working with archivers who are interested or need info from us
[21:46] do we need a warrior project for imzy?
[21:46] alembic ^
[21:47] greenie with url slugs you mean a list of URLs?
[21:49] scripts will be at https://github.com/ArchiveTeam/imzy-grab when created
[21:51] PurpleSym: do you have the dump you made of imzy somewhere?
[21:51] let's create a channel for this also
[21:54] looking forward to this, social media sites are always fun
[21:55] arkiver: the email we got says "Because the posts are routed via slugs and not incremental IDs, it is currently very difficult to crawl and archive imsy. A list of these slugs would solve this issue."
[21:55] I see
[21:55] I'm not sure what information PurpleSym has exactly
[21:56] but a list of groups like https://www.imzy.com/worldbuilding and users like https://www.imzy.com/@blues_sevenfold would already help a lot
[21:56] just in like a txt file or something?
[21:57] arkiver: But a list of posts would be import too, right?
[21:57] *important
[21:57] I think a txt file would be fine
[21:57] json too
[21:58] timmc: yes
[21:58] but else we'll go over the groups and users an get all the posts and comments they have made
[21:59] however, very large groups with tens or hundreds of thousands of posts would be problematical
[22:00] I haven't had a very good look yet at the website, so I might be missing stuff
[22:01] The one thing I can think of that might not be easy to pick up would be posts that had a chat room instead of a comment section.
[22:02] No idea how common those were, I never participated in one myself.
[22:02] we dont have many of those, comparatively
[22:02] it was a feature we were planning on killing and totally reworking in a different/better way, because it functioned poorly and wasn't used for the most part. While I'm not an archivist, I don't think theres important stuff in them, tbh
[22:03] *** SHODAN_UI has quit IRC (Remote host closed the connection)
[22:04] okay so what we'll try to get you guys is three files, basically. 1. community URLs, 2. profile URLs, 3. post URLs. we'll only send you the URLs for communities/profiles that actually have content
[22:04] Well, the more we can of the site the better I think
[22:04] If there's any other special section on the site other than communities and profiles, please also try to get the URLs for those
[22:05] or the names/IDs of them in the URLs
[22:05] timmc: do you have an example of a chat room?
[22:05] cool. weffey (our backend engineer) is gonna be afk over the weekend, but will try to get yall a "how does this look?" set of files on tuesday-ish
[22:05] thanks!
[22:06] I think I'll have the script ready by then
[22:07] rad. weffey can be added to the repo if yall want, weffey@gmail.com and can try to help
[22:07] i dont really know yalls style, so im sorta just functioning as a middleperson here
[22:08] https://github.com/weffey
[22:09] we have some weird ajax paging stuff, that wouldnt just be a simple wget
[22:09] timmc has maybe figured some of this out already, im not sure
[22:09] we have custom scripts
[22:09] for example
[22:10] for flickr: https://github.com/ArchiveTeam/flickr-grab/
[22:10] where the real custom stuff is done in https://github.com/ArchiveTeam/flickr-grab/blob/master/flickr.lua
[22:10] (afk, heading home, will check it out in a bit)
[22:10] thanks
[22:17] arkiver: Each community also has an /about, /rules, and /leaders page. Example: https://www.imzy.com/boston/rules
[22:18] There are also some static pages linked from the footer (or the bottom of the sidebar on infinite-scroll pages).
[22:21] thank you
[22:21] will have a look at those
[22:23] I see the IP is saved on the page
[22:23] In the footer?
[22:24] in the script imzy-state
[22:25] remoteIp in window.__IMZY__.sessionStorageCache
[22:25] Yeah, I noticed that before, and I *think* it's only the viewer's IP... but on some static pages, it might leak a previous viewer's IP.
[22:25] uh yeah
[22:25] so in the footer
[22:25] greenie: ^ I had mentioned this before, not sure if it was investigated.
[22:26] so this will be a warning for people running the project
[22:26] hmm, yeah
[22:26] It also saves a session ID that I haven't been able to connect to any other session identifiers.
[22:27] yeah, but the sessions ID is not directly identifiable as coming from someone by non-imzy staff
[22:27] Most forums we save have session IDs stored
[22:43] I'll run it past weff, timmc. I don't want to speak in absolutes about that particular sort of thing without being quite confident
[22:43] yes of course
[22:45] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[22:45] (I *think* I'd reported this, at least!)
[22:46] yeah it sounds familiar but rather than me digging through my github stuff i'm just gonna be lazy and double check with weff
[22:51] greenie: You can probably search issues for "remoteIp" or "less-trafficked" if there's any chance you copied my email into github.
[23:00] *** Stilett0 has joined #archiveteam-bs
[23:24] *** BlueMaxim has joined #archiveteam-bs
[23:42] arkiver unsure if it needs a warrior, but we won't need to do link discovery
[23:42] I think we'll do a warrior project
[23:42] Imsy engineer I'm speaking to would like to know what format is best
[23:42] for the list of urls
[23:42] I'm guessing just a \n delimited list?
[23:46] yes
[23:46] we also talked with greenie about it a bit
[23:46] see log
[23:46] okidoki
[23:47] yeah ive been sorta relaying, sorry for the multiple means of communication
[23:47] np, I've held off emailing Lesley at Imsy because I've figured as much
[23:50] greenie if it's ok with you guys, we'll just to stick to IRC going forward?
[23:52] i should clarify, lesley = weffey that ive been referring to. The server I host my IRC client on is old and fussy and tends to kill my ssh connection, but yeah I can manage IRC if thats easier for you. However, lesley/weffey is the person who will actually be doing the technical end of things on our side, is our lead backend dev. So may end up being a bit of a game of telephone, unless they hop
[23:52] on IRC, which I'm not sure they do anymore
[23:54] ahaha ok, we'll stick to our motley hybrid of IRC/email then :P
[23:54] hehe okay
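
Editor's sketch, not part of the log: the format agreed on above is a \n delimited list of URLs, covering communities, profiles and posts. A minimal Python sketch of reading such a list and sorting it into those buckets might look like the following. The path shapes are inferred only from the example URLs quoted in the log (https://www.imzy.com/worldbuilding, https://www.imzy.com/@blues_sevenfold, https://www.imzy.com/imzy/post/imzy_is_shutting_down, https://www.imzy.com/boston/rules), and the names `classify`, `read_url_list` and `group_urls` are hypothetical. This is not the actual imzy-grab script.

```python
from urllib.parse import urlparse

# Community sub-pages mentioned in the log (/about, /rules, /leaders).
COMMUNITY_SUBPAGES = {"about", "rules", "leaders"}


def classify(url):
    """Return 'profile', 'community', 'post', 'community_page' or 'other'.

    ASSUMPTION: the path shapes are guessed from the example URLs in the
    log; this is not a documented imzy URL scheme. Static footer pages with
    a single path component would be reported as 'community' here.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) == 1 and parts[0].startswith("@"):
        return "profile"
    if len(parts) == 1:
        return "community"
    if len(parts) == 3 and parts[1] == "post":
        return "post"
    if len(parts) == 2 and parts[1] in COMMUNITY_SUBPAGES:
        return "community_page"
    return "other"


def read_url_list(text):
    """Parse a newline-delimited URL list, skipping blanks and duplicates."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def group_urls(text):
    """Group the URLs of a newline-delimited list by their guessed kind."""
    groups = {}
    for url in read_url_list(text):
        groups.setdefault(classify(url), []).append(url)
    return groups
```

Calling `group_urls` on the contents of one combined txt file would return a dict keyed by kind. Keeping an "other" bucket rather than raising on unknown shapes is a deliberate choice here, since the log notes there may be special sections of the site nobody has looked at yet.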