#archiveteam-bs 2017-05-25,Thu


Time Nickname Message
00:12 🔗 godane so this month is sucking for me: https://archive.org/details/@chris85?and[]=addeddate:2017-05
00:13 🔗 godane to be fair i'm ripping old original broadcasts from vhs
00:26 🔗 godane so this person has tons of old McDonald's stuff: http://myworld.ebay.com/uptowngirljeni/
00:26 🔗 godane including sets of old happy meal boxes
01:18 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
01:19 🔗 j08nY has quit IRC (Quit: Leaving)
01:25 🔗 Sk1d has joined #archiveteam-bs
01:32 🔗 username1 has joined #archiveteam-bs
01:35 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
02:37 🔗 ndiddy has quit IRC ()
02:57 🔗 nyany has quit IRC (Leaving)
04:05 🔗 godane https://www.imzy.com/imzy/post/imzy_is_shutting_down
04:05 🔗 godane we have about 1 month to get it
04:08 🔗 nyany has joined #archiveteam-bs
05:02 🔗 Lord_Nigh godane: greenie in here is imzy staff
05:14 🔗 DopefishJ has joined #archiveteam-bs
05:14 🔗 swebb sets mode: +o DopefishJ
05:16 🔗 DFJustin has quit IRC (Ping timeout: 260 seconds)
06:42 🔗 wabu_ has quit IRC (Read error: Operation timed out)
06:52 🔗 wabu has joined #archiveteam-bs
06:58 🔗 w0rp has quit IRC (Read error: Operation timed out)
06:59 🔗 w0rp has joined #archiveteam-bs
08:15 🔗 icedice has joined #archiveteam-bs
08:31 🔗 user_ has quit IRC (Read error: Operation timed out)
08:34 🔗 bwn has quit IRC (Ping timeout: 260 seconds)
08:54 🔗 bwn has joined #archiveteam-bs
09:10 🔗 j08nY has joined #archiveteam-bs
09:14 🔗 GE has joined #archiveteam-bs
10:24 🔗 j08nY has quit IRC (Read error: Operation timed out)
10:31 🔗 davidar has joined #archiveteam-bs
10:34 🔗 GE has quit IRC (Remote host closed the connection)
11:25 🔗 BlueMaxim has quit IRC (Quit: Leaving)
11:59 🔗 JAA Speed: 0.9 B/s \m/
12:04 🔗 pizzaiolo has joined #archiveteam-bs
12:22 🔗 GE has joined #archiveteam-bs
12:58 🔗 vitzli has joined #archiveteam-bs
12:59 🔗 icedice2 has joined #archiveteam-bs
13:00 🔗 icedice2 has quit IRC (Client Quit)
13:01 🔗 GE is now known as SHODAN_UI
13:05 🔗 j08nY has joined #archiveteam-bs
13:24 🔗 pizzaiolo has quit IRC (Quit: pizzaiolo)
13:33 🔗 vitzli_ has joined #archiveteam-bs
13:36 🔗 vitzli has quit IRC (Ping timeout: 250 seconds)
13:38 🔗 vitzli_ has quit IRC (Quit: Leaving)
14:23 🔗 jtn2 has quit IRC (Ping timeout: 250 seconds)
14:26 🔗 godane has quit IRC (Quit: Leaving.)
14:28 🔗 schbirid2 has joined #archiveteam-bs
14:30 🔗 jtn2 has joined #archiveteam-bs
14:31 🔗 username1 has quit IRC (Read error: Operation timed out)
15:23 🔗 icedice has quit IRC (Quit: Leaving)
15:31 🔗 Sanqui has quit IRC (Remote host closed the connection)
15:37 🔗 Sanqui has joined #archiveteam-bs
16:26 🔗 DopefishJ is now known as DFJustin
16:28 🔗 schbirid2 has quit IRC (Quit: Leaving)
16:28 🔗 godane has joined #archiveteam-bs
17:36 🔗 j08nY has quit IRC (Remote host closed the connection)
17:39 🔗 j08nY has joined #archiveteam-bs
17:42 🔗 wolfpld has quit IRC (Quit: WeeChat 1.6)
17:51 🔗 SHODAN_UI has quit IRC (Quit: zzz)
18:26 🔗 namespace has joined #archiveteam-bs
18:44 🔗 SHODAN_UI has joined #archiveteam-bs
19:05 🔗 ndiddy has joined #archiveteam-bs
20:13 🔗 wacky_ has joined #archiveteam-bs
20:48 🔗 greenie yeah godane i think my backend engineer is emailing with alembic, we're not sure what format y'all want the dump of url slugs in
20:50 🔗 greenie but we're gonna try to get that to yall. if folks have questions, i can try to help relay. we have some individual users asking for various data dumps but right now we're focusing on getting users their own data, and secondly working with archivers who are interested or need info from us
21:46 🔗 arkiver do we need a warrior project for imzy?
21:46 🔗 arkiver alembic ^
21:47 🔗 arkiver greenie with url slugs you mean a list of URLs?
21:49 🔗 arkiver scripts will be at https://github.com/ArchiveTeam/imzy-grab when created
21:51 🔗 arkiver PurpleSym: do you have the dump you made of imzy somewhere?
21:51 🔗 arkiver let's create a channel for this also
21:54 🔗 arkiver looking forward to this, social media sites are always fun
21:55 🔗 greenie arkiver: the email we got says "Because the posts are routed via slugs and not incremental IDs, it is currently very difficult to crawl and archive imsy. A list of these slugs would solve this issue."
21:55 🔗 arkiver I see
21:55 🔗 arkiver I'm not sure what information PurpleSym has exactly
21:56 🔗 arkiver but a list of groups like https://www.imzy.com/worldbuilding and users like https://www.imzy.com/@blues_sevenfold would already help a lot
21:56 🔗 greenie just in like a txt file or something?
21:57 🔗 timmc arkiver: But a list of posts would be import too, right?
21:57 🔗 timmc *important
21:57 🔗 arkiver I think a txt file would be fine
21:57 🔗 arkiver json too
21:58 🔗 arkiver timmc: yes
21:58 🔗 arkiver but otherwise we'll go over the groups and users and get all the posts and comments they have made
21:59 🔗 arkiver however, very large groups with tens or hundreds of thousands of posts would be problematical
22:00 🔗 arkiver I haven't had a very good look yet at the website, so I might be missing stuff
22:01 🔗 timmc The one thing I can think of that might not be easy to pick up would be posts that had a chat room instead of a comment section.
22:02 🔗 timmc No idea how common those were, I never participated in one myself.
22:02 🔗 greenie we dont have many of those, comparatively
22:02 🔗 greenie it was a feature we were planning on killing and totally reworking in a different/better way, because it functioned poorly and wasn't used for the most part. While I'm not an archivist, I don't think there's important stuff in them, tbh
22:03 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
22:04 🔗 greenie okay so what we'll try to get you guys is three files, basically. 1. community URLs, 2. profile URLs, 3. post URLs. we'll only send you the URLs for communities/profiles that actually have content
22:04 🔗 arkiver Well, the more of the site we can get, the better I think
22:04 🔗 arkiver If there's any other special section on the site other than communities and profiles, please also try to get the URLs for those
22:05 🔗 arkiver or the names/IDs of them in the URLs
22:05 🔗 arkiver timmc: do you have an example of a chat room?
22:05 🔗 greenie cool. weffey (our backend engineer) is gonna be afk over the weekend, but will try to get yall a "how does this look?" set of files on tuesday-ish
22:05 🔗 arkiver thanks!
22:06 🔗 arkiver I think I'll have the script ready by then
22:07 🔗 greenie rad. weffey can be added to the repo if yall want, weffey@gmail.com and can try to help
22:07 🔗 greenie i dont really know yalls style, so im sorta just functioning as a middleperson here
22:08 🔗 greenie https://github.com/weffey
22:09 🔗 greenie we have some weird ajax paging stuff, that wouldnt just be a simple wget
22:09 🔗 greenie timmc has maybe figured some of this out already, im not sure
22:09 🔗 arkiver we have custom scripts
22:09 🔗 arkiver for example
22:10 🔗 arkiver for flickr: https://github.com/ArchiveTeam/flickr-grab/
22:10 🔗 arkiver where the real custom stuff is done in https://github.com/ArchiveTeam/flickr-grab/blob/master/flickr.lua
22:10 🔗 greenie (afk, heading home, will check it out in a bit)
22:10 🔗 arkiver thanks
22:17 🔗 timmc arkiver: Each community also has an /about, /rules, and /leaders page. Example: https://www.imzy.com/boston/rules
22:18 🔗 timmc There are also some static pages linked from the footer (or the bottom of the sidebar on infinite-scroll pages).
22:21 🔗 arkiver thank you
22:21 🔗 arkiver will have a look at those
22:23 🔗 arkiver I see the IP is saved on the page
22:23 🔗 timmc In the footer?
22:24 🔗 arkiver in the script imzy-state
22:25 🔗 arkiver remoteIp in window.__IMZY__.sessionStorageCache
22:25 🔗 timmc Yeah, I noticed that before, and I *think* it's only the viewer's IP... but on some static pages, it might leak a previous viewer's IP.
22:25 🔗 arkiver uh yeah
22:25 🔗 arkiver so in the footer
22:25 🔗 timmc greenie: ^ I had mentioned this before, not sure if it was investigated.
22:26 🔗 arkiver so this will be a warning for people running the project
22:26 🔗 timmc hmm, yeah
22:26 🔗 timmc It also saves a session ID that I haven't been able to connect to any other session identifiers.
22:27 🔗 arkiver yeah, but the session ID is not directly identifiable as coming from someone by non-imzy staff
22:27 🔗 arkiver Most forums we save have session IDs stored
22:43 🔗 greenie I'll run it past weff, timmc. I don't want to speak in absolutes about that particular sort of thing without being quite confident
22:43 🔗 arkiver yes of course
22:45 🔗 Stiletto has quit IRC (Ping timeout: 246 seconds)
22:45 🔗 timmc (I *think* I'd reported this, at least!)
22:46 🔗 greenie yeah it sounds familiar but rather than me digging through my github stuff i'm just gonna be lazy and double check with weff
22:51 🔗 timmc greenie: You can probably search issues for "remoteIp" or "less-trafficked" if there's any chance you copied my email into github.
23:00 🔗 Stilett0 has joined #archiveteam-bs
23:24 🔗 BlueMaxim has joined #archiveteam-bs
23:42 🔗 alembic arkiver unsure if it needs a warrior, but we won't need to do link discovery
23:42 🔗 arkiver I think we'll do a warrior project
23:42 🔗 alembic Imzy engineer I'm speaking to would like to know what format is best
23:42 🔗 alembic for the list of urls
23:42 🔗 alembic I'm guessing just a \n delimited list?
23:46 🔗 arkiver yes
23:46 🔗 arkiver we also talked with greenie about it a bit
23:46 🔗 arkiver see log
23:46 🔗 alembic okidoki
23:47 🔗 greenie yeah ive been sorta relaying, sorry for the multiple means of communication
23:47 🔗 alembic np, I've held off emailing Lesley at Imzy because I've figured as much
23:50 🔗 alembic greenie if it's ok with you guys, we'll just stick to IRC going forward?
23:52 🔗 greenie i should clarify, lesley = weffey that ive been referring to. The server I host my IRC client on is old and fussy and tends to kill my ssh connection, but yeah I can manage IRC if thats easier for you. However, lesley/weffey is the person who will actually be doing the technical end of things on our side, and is our lead backend dev. So it may end up being a bit of a game of telephone, unless they hop
23:52 🔗 greenie on IRC, which I'm not sure they do anymore
23:54 🔗 alembic ahaha ok, we'll stick to our motley hybrid of IRC/email then :P
23:54 🔗 greenie hehe okay
